[ https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-6743:
------------------------------
    Labels: correctness  (was: )

> Join with empty projection on one side produces invalid results
> ---------------------------------------------------------------
>
>                 Key: SPARK-6743
>                 URL: https://issues.apache.org/jira/browse/SPARK-6743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Santiago M. Mola
>            Assignee: Michael Armbrust
>            Priority: Critical
>              Labels: correctness
>             Fix For: 1.4.0
>
>
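> {code:java}
> import org.apache.spark.sql.SQLContext
>
> // sc is the SparkContext provided by spark-shell
> val sqlContext = new SQLContext(sc)
> val tab0 = sc.parallelize(Seq(
>   (83, 0, 38),
>   (26, 0, 79),
>   (43, 81, 24)
> ))
> // Tuple fields become the columns _1, _2, _3
> sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), "tab0")
> sqlContext.cacheTable("tab0")
>
> // Query 1: both sides of the self-join project a column (correct results)
> val df1 = sqlContext.sql(
>   "SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP BY tab0._2, cor0._2")
> val result1 = df1.collect()
>
> // Query 2: nothing is projected from the left side, tab0 (wrong results)
> val df2 = sqlContext.sql(
>   "SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY cor0._2")
> val result2 = df2.collect()
>
> // Query 3: no join, used as the reference (correct results)
> val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
> val result3 = df3.collect()
> {code}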
> Given the code above, result2 is Row(43), Row(83), Row(26), which is wrong:
> these values come from cor0._1 instead of cor0._2. The correct result is
> Row(0), Row(81), which is what the third query returns. The first query also
> returns correct results; the only difference is that it projects a column from
> the left side of the join (tab0._2), so the left-side projection is not empty.
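As a cross-check of the expected output, the second query can be recomputed with plain Scala collections, without Spark. This is a hypothetical sketch, not part of the original report: the object name `ExpectedResult` is illustrative, and it assumes the comma in `FROM tab0, tab0 cor0` is an unconditioned cross join, so every row of `tab0` is paired with every row of `cor0`.

```scala
// Hypothetical sanity check (not from the original report): recompute the
// expected answer of query 2 with plain Scala collections, no Spark involved.
object ExpectedResult {
  // Same rows as tab0 in the report, as (_1, _2, _3)
  val tab0: Seq[(Int, Int, Int)] = Seq((83, 0, 38), (26, 0, 79), (43, 81, 24))

  // FROM tab0, tab0 cor0 is assumed to be a cross join: 3 x 3 = 9 row pairs
  val crossJoin: Seq[((Int, Int, Int), (Int, Int, Int))] =
    for (t <- tab0; cor0 <- tab0) yield (t, cor0)

  // GROUP BY cor0._2 leaves one row per distinct value of cor0._2
  val expected: Set[Int] = crossJoin.map { case (_, cor0) => cor0._2 }.toSet

  // Distinct values of the first column, which match the wrong rows reported
  val firstColumn: Set[Int] = tab0.map(_._1).toSet

  def main(args: Array[String]): Unit = {
    println(crossJoin.size)            // 9
    println(expected.toList.sorted)    // List(0, 81)
    println(firstColumn.toList.sorted) // List(26, 43, 83)
  }
}
```

The expected set matches the output of the third query, while the values actually returned by the second query are exactly the distinct values of `_1`.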



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)