Aravind B created SPARK-12520:
---------------------------------

             Summary: Python API dataframe join returns wrong results on outer join
                 Key: SPARK-12520
                 URL: https://issues.apache.org/jira/browse/SPARK-12520
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.4.1
            Reporter: Aravind B

Consider the following dataframes:

"""
left_table:
+------------+------------+------+--------------+
|head_id_left|tail_id_left|weight|joining_column|
+------------+------------+------+--------------+
|           1|           2|     1|           1~2|
+------------+------------+------+--------------+

right_table:
+-------------+-------------+--------------+
|head_id_right|tail_id_right|joining_column|
+-------------+-------------+--------------+
+-------------+-------------+--------------+
"""

The following code returns an empty dataframe:

"""
joined_table = left_table.join(right_table, "joining_column", "outer")
"""

joined_table has zero rows.

However:

"""
joined_table = left_table.join(right_table, left_table.joining_column == right_table.joining_column, "outer")
"""

returns the correct answer with one row.
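
To make the expected result concrete, here is a minimal pure-Python sketch of full outer join semantics on a shared key column, applied to the two tables from this report. It is only a reference model, not Spark's implementation: `full_outer_join` is a hypothetical helper, the rows are plain dicts, and the integer types for the id and weight columns are an assumption (the report does not show the schema).

```python
def full_outer_join(left_rows, left_cols, right_rows, right_cols, key):
    """Reference model of a full outer join on one shared column name.

    Hypothetical helper for illustration only; rows are dicts, and the
    column lists are passed explicitly so that empty tables keep a schema.
    """
    left_rest = [c for c in left_cols if c != key]
    right_rest = [c for c in right_cols if c != key]

    joined = []
    matched_right = set()

    for l in left_rows:
        # SQL-style matching: a null key never equals anything.
        hits = [i for i, r in enumerate(right_rows)
                if l[key] is not None and r[key] == l[key]]
        matched_right.update(hits)
        # An unmatched left row is still emitted, padded with None.
        for r in ([right_rows[i] for i in hits] or [None]):
            row = {key: l[key]}
            row.update({c: l[c] for c in left_rest})
            row.update({c: (r[c] if r is not None else None) for c in right_rest})
            joined.append(row)

    # Unmatched right rows are emitted too, padded with None on the left.
    for i, r in enumerate(right_rows):
        if i not in matched_right:
            row = {key: r[key]}
            row.update({c: None for c in left_rest})
            row.update({c: r[c] for c in right_rest})
            joined.append(row)

    return joined


left_cols = ["head_id_left", "tail_id_left", "weight", "joining_column"]
right_cols = ["head_id_right", "tail_id_right", "joining_column"]

# Data from the report; integer types are assumed.
left_table = [{"head_id_left": 1, "tail_id_left": 2, "weight": 1,
               "joining_column": "1~2"}]
right_table = []  # empty, as in the report

joined_table = full_outer_join(left_table, left_cols,
                               right_table, right_cols, "joining_column")
print(len(joined_table))  # 1
```

Under these semantics the single left row survives with `None` in `head_id_right` and `tail_id_right`, which matches the one-row result the report describes for the explicit column-equality form of the join.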