[jira] [Updated] (SPARK-12520) Python API dataframe join returns wrong results on outer join
     [ https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu updated SPARK-12520:
-------------------------------
    Assignee: Xiao Li

> Python API dataframe join returns wrong results on outer join
> -------------------------------------------------------------
>
>                 Key: SPARK-12520
>                 URL: https://issues.apache.org/jira/browse/SPARK-12520
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.4.1
>            Reporter: Aravind B
>            Assignee: Xiao Li
>             Fix For: 1.6.0, 2.0.0
>
>
> Consider the following dataframes:
> """
> left_table:
> +------------+------------+------+--------------+
> |head_id_left|tail_id_left|weight|joining_column|
> +------------+------------+------+--------------+
> |           1|           2|     1|           1~2|
> +------------+------------+------+--------------+
> right_table:
> +-------------+-------------+--------------+
> |head_id_right|tail_id_right|joining_column|
> +-------------+-------------+--------------+
> +-------------+-------------+--------------+
> """
> The following code returns an empty dataframe:
> """
> joined_table = left_table.join(right_table, "joining_column", "outer")
> """
> joined_table has zero rows.
> However:
> """
> joined_table = left_table.join(right_table, left_table.joining_column ==
> right_table.joining_column, "outer")
> """
> returns the correct answer with one row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
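To make the expected result in the report concrete, here is a minimal pure-Python sketch of full outer join semantics on a shared column. It does not use Spark and is not Spark's implementation; the helper name full_outer_join, the dict-based row representation, and the integer column values are assumptions made for illustration only. With the single left row and an empty right table from the report, an outer join should keep the left row and fill the right-side columns with nulls.

```python
def full_outer_join(left_rows, right_rows, key, left_cols, right_cols):
    """Full outer join of two lists of dict rows on the shared column `key`.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    joined = []
    matched_right = set()  # indexes of right rows that found a partner

    for left in left_rows:
        matched = False
        for i, right in enumerate(right_rows):
            # SQL-style equality: a null key never matches anything.
            if left[key] is not None and left[key] == right[key]:
                matched = True
                matched_right.add(i)
                row = {key: left[key]}
                row.update((c, left[c]) for c in left_cols)
                row.update((c, right[c]) for c in right_cols)
                joined.append(row)
        if not matched:
            # Unmatched left row is kept; right-side columns become None.
            row = {key: left[key]}
            row.update((c, left[c]) for c in left_cols)
            row.update((c, None) for c in right_cols)
            joined.append(row)

    for i, right in enumerate(right_rows):
        if i not in matched_right:
            # Unmatched right row is kept; left-side columns become None.
            row = {key: right[key]}
            row.update((c, None) for c in left_cols)
            row.update((c, right[c]) for c in right_cols)
            joined.append(row)

    return joined


# Data from the report (value types are assumed to be integers here).
left_table = [
    {"head_id_left": 1, "tail_id_left": 2, "weight": 1, "joining_column": "1~2"},
]
right_table = []  # same schema as in the report, but no rows

joined_table = full_outer_join(
    left_table,
    right_table,
    key="joining_column",
    left_cols=["head_id_left", "tail_id_left", "weight"],
    right_cols=["head_id_right", "tail_id_right"],
)
print(len(joined_table))
print(joined_table)
```

Under these semantics the result has one row, which matches the output the reporter gets from the explicit-condition form of the join and differs from the zero rows returned by the column-name form.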
[jira] [Updated] (SPARK-12520) Python API dataframe join returns wrong results on outer join
     [ https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu updated SPARK-12520:
-------------------------------
    Fix Version/s: 1.6.0

> Python API dataframe join returns wrong results on outer join
> -------------------------------------------------------------
>
>                 Key: SPARK-12520
>                 URL: https://issues.apache.org/jira/browse/SPARK-12520
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.4.1
>            Reporter: Aravind B
>             Fix For: 1.6.0, 2.0.0
[jira] [Updated] (SPARK-12520) Python API dataframe join returns wrong results on outer join
     [ https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu updated SPARK-12520:
-------------------------------
    Fix Version/s: 1.5.3

> Python API dataframe join returns wrong results on outer join
> -------------------------------------------------------------
>
>                 Key: SPARK-12520
>                 URL: https://issues.apache.org/jira/browse/SPARK-12520
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.4.1
>            Reporter: Aravind B
>            Assignee: Xiao Li
>             Fix For: 1.5.3, 1.6.0, 2.0.0
[jira] [Updated] (SPARK-12520) Python API dataframe join returns wrong results on outer join
     [ https://issues.apache.org/jira/browse/SPARK-12520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-12520:
----------------------------
    Component/s: SQL

> Python API dataframe join returns wrong results on outer join
> -------------------------------------------------------------
>
>                 Key: SPARK-12520
>                 URL: https://issues.apache.org/jira/browse/SPARK-12520
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.4.1
>            Reporter: Aravind B