[jira] [Updated] (SPARK-24780) DataFrame.column_name should resolve to a distinct ref

holdenk (JIRA) Tue, 10 Jul 2018 18:48:07 -0700


     [ https://issues.apache.org/jira/browse/SPARK-24780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


holdenk updated SPARK-24780:
----------------------------
    Summary: DataFrame.column_name should resolve to a distinct ref  (was: DataFrame.column_name should take into account DataFrame alias for future joins)

> DataFrame.column_name should resolve to a distinct ref
> ------------------------------------------------------
>
>                 Key: SPARK-24780
>                 URL: https://issues.apache.org/jira/browse/SPARK-24780
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 2.4.0
>            Reporter: holdenk
>            Priority: Minor
>
> If we join a DataFrame with another DataFrame that has a column with the same
> name as one used in the join condition (e.g. because the two share lineage),
> then even when the condition is written with the full name, the returned
> columns don't carry the DataFrame alias, so the join becomes a cross-join.
> For example, this currently works even if both posts_by_sampled_authors and
> mailing_list_posts_in_reply_to contain both in_reply_to and message_id fields:
>  
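> {code:python}
> from pyspark.sql import functions as F
>
> # Qualified names resolve against the DataFrame aliases
> # (assumed to have been set with .alias(...)).
> posts_with_replies = posts_by_sampled_authors.join(
>     mailing_list_posts_in_reply_to,
>     [F.col("mailing_list_posts_in_reply_to.in_reply_to") ==
>      F.col("posts_by_sampled_authors.message_id")],
>     "inner"){code}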
>  
> But a similarly written expression:
> {code:python}
> posts_with_replies = posts_by_sampled_authors.join(
>     mailing_list_posts_in_reply_to,
>     [mailing_list_posts_in_reply_to.in_reply_to ==
>      posts_by_sampled_authors.message_id],
>     "inner"){code}
> will fail, because it ends up as a cross-join.
>  
> I'm not super sure what's going on inside the resolution that's causing it
> to get confused.
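One plausible mechanism can be sketched with a toy model in plain Python (not PySpark, and not Spark's actual analyzer). It assumes that every resolved column carries an expression ID, that a DataFrame derived from another reuses those IDs, and that before a join condition is resolved, right-side IDs that clash with the left side are swapped for fresh ones. Under those assumptions, a column named by string (as with F.col("alias.name")) is looked up after the renaming and lands on the right side, while a column object taken from the DataFrame (as with df.column_name) is already bound to the shared ID and so points at the left side, leaving the right side unconstrained. All names below (Attr, Unresolved, dedup, degenerates_to_cross_join, ...) are hypothetical.

```python
# Toy model only: NOT Spark's analyzer. All names are hypothetical.
import itertools
from dataclasses import dataclass, replace
from typing import List, Optional, Union

_ids = itertools.count(1)  # stand-in for an expression-ID generator


@dataclass(frozen=True)
class Attr:
    """A resolved column: a name bound to a unique expression ID."""
    name: str
    expr_id: int
    qualifier: Optional[str] = None


@dataclass(frozen=True)
class Unresolved:
    """A column named by string, like F.col("alias.name")."""
    qualifier: str
    name: str


Ref = Union[Attr, Unresolved]


def relation(*names: str) -> List[Attr]:
    """A fresh source: every column gets a new expression ID."""
    return [Attr(n, next(_ids)) for n in names]


def alias(rel: List[Attr], q: str) -> List[Attr]:
    """Aliasing sets the qualifier but keeps the same expression IDs."""
    return [replace(a, qualifier=q) for a in rel]


def dedup(left: List[Attr], right: List[Attr]) -> List[Attr]:
    """Give right-side columns fresh IDs where they clash with the left side."""
    left_ids = {a.expr_id for a in left}
    return [replace(a, expr_id=next(_ids)) if a.expr_id in left_ids else a
            for a in right]


def resolve(ref: Ref, children: List[Attr]) -> Attr:
    """Look up string references by qualifier and name; pass Attr through."""
    if isinstance(ref, Attr):
        return ref  # already bound, possibly to an ID the left side owns
    matches = [a for a in children
               if a.qualifier == ref.qualifier and a.name == ref.name]
    if len(matches) != 1:
        raise ValueError(f"cannot resolve {ref.qualifier}.{ref.name}")
    return matches[0]


def degenerates_to_cross_join(left: List[Attr], right: List[Attr],
                              lhs: Ref, rhs: Ref) -> bool:
    """True if the resolved condition lhs == rhs never touches the right side."""
    new_right = dedup(left, right)
    children = left + new_right
    a, b = resolve(lhs, children), resolve(rhs, children)
    right_ids = {x.expr_id for x in new_right}
    return a.expr_id not in right_ids and b.expr_id not in right_ids


base = relation("message_id", "in_reply_to")
posts = alias(base, "posts_by_sampled_authors")
replies = alias(base, "mailing_list_posts_in_reply_to")  # shared lineage

# Like F.col("alias.name"): resolved by name after the right side is renamed.
by_name = degenerates_to_cross_join(
    posts, replies,
    Unresolved("mailing_list_posts_in_reply_to", "in_reply_to"),
    Unresolved("posts_by_sampled_authors", "message_id"))

# Like df.column_name: already bound to the ID both sides share.
by_attr = degenerates_to_cross_join(
    posts, replies,
    replies[1],  # mailing_list_posts_in_reply_to.in_reply_to
    posts[0])    # posts_by_sampled_authors.message_id

print(by_name, by_attr)  # False True
```

In this toy model, the change proposed in the new summary amounts to having df.column_name yield a reference that stays distinct per DataFrame instead of one tied to the ID shared through lineage.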



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
