[ https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071709#comment-15071709 ]
Apache Spark commented on SPARK-12520:
--------------------------------------

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/10477

> Python API dataframe join returns wrong results on outer join
> -------------------------------------------------------------
>
>                 Key: SPARK-12520
>                 URL: https://issues.apache.org/jira/browse/SPARK-12520
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.4.1
>            Reporter: Aravind B
>
> Consider the following dataframes:
> """
> left_table:
> +------------+------------+---------+--------------+
> |head_id_left|tail_id_left|   weight|joining_column|
> +------------+------------+---------+--------------+
> |           1|           2|        1|           1~2|
> +------------+------------+---------+--------------+
> right_table:
> +-------------+-------------+--------------+
> |head_id_right|tail_id_right|joining_column|
> +-------------+-------------+--------------+
> +-------------+-------------+--------------+
> """
> The following code returns an empty dataframe:
> """
> joined_table = left_table.join(right_table, "joining_column", "outer")
> """
> joined_table has zero rows.
> However:
> """
> joined_table = left_table.join(right_table, left_table.joining_column ==
> right_table.joining_column, "outer")
> """
> returns the correct answer with one row.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
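
For reference, the result the reporter expects from the outer join in SPARK-12520 can be illustrated without Spark. The following is only a plain-Python sketch of full outer join semantics on a single shared column, not Spark's implementation and not the fix in the pull request. The helper name `full_outer_join` and its `left_cols`/`right_cols` parameters are hypothetical, and SQL null-key handling is not modeled.

```python
def full_outer_join(left_rows, right_rows, key, left_cols, right_cols):
    """Full outer join of two lists of dicts on one shared column.

    Hypothetical helper for illustration only. Column names are passed
    explicitly because an empty table has no rows to infer them from.
    """
    joined = []
    matched_right = set()

    for left in left_rows:
        matches = [i for i, r in enumerate(right_rows) if r[key] == left[key]]
        if not matches:
            # Unmatched left row: keep it, fill right-side columns with None.
            row = {key: left[key]}
            row.update({c: left[c] for c in left_cols})
            row.update({c: None for c in right_cols})
            joined.append(row)
        for i in matches:
            matched_right.add(i)
            row = {key: left[key]}
            row.update({c: left[c] for c in left_cols})
            row.update({c: right_rows[i][c] for c in right_cols})
            joined.append(row)

    for i, right in enumerate(right_rows):
        if i not in matched_right:
            # Unmatched right row: keep it, fill left-side columns with None.
            row = {key: right[key]}
            row.update({c: None for c in left_cols})
            row.update({c: right[c] for c in right_cols})
            joined.append(row)

    return joined


# Data taken from the issue description: one left row, empty right table.
left_table = [
    {"head_id_left": 1, "tail_id_left": 2, "weight": 1, "joining_column": "1~2"}
]
right_table = []

joined_table = full_outer_join(
    left_table,
    right_table,
    key="joining_column",
    left_cols=["head_id_left", "tail_id_left", "weight"],
    right_cols=["head_id_right", "tail_id_right"],
)
print(joined_table)
```

Under these semantics the left row survives with the right-side columns set to None, which matches the one-row result the reporter gets from the explicit join condition.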