[ https://issues.apache.org/jira/browse/SPARK-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-17348:
------------------------------------

    Assignee: Apache Spark

> Incorrect results from subquery transformation
> ----------------------------------------------
>
>                 Key: SPARK-17348
>                 URL: https://issues.apache.org/jira/browse/SPARK-17348
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Nattavut Sutyanyong
>            Assignee: Apache Spark
>              Labels: correctness
>
> {noformat}
> Seq((1,1)).toDF("c1","c2").createOrReplaceTempView("t1")
> Seq((1,1),(2,0)).toDF("c1","c2").createOrReplaceTempView("t2")
> sql("select c1 from t1 where c1 in (select max(t2.c1) from t2 where t1.c2 >= t2.c2)").show
> +---+
> | c1|
> +---+
> |  1|
> +---+
> {noformat}
> The correct result of the above query should be an empty set. Here is an
> explanation:
> Both rows from T2 satisfy the correlated predicate T1.C2 >= T2.C2 for the
> single row of T1 (C1 = 1, C2 = 1), so both rows need to be processed in the
> same group of the aggregation in the subquery. The aggregation therefore
> yields MAX(T2.C1) = 2. The predicate T1.C1 (which is 1) IN MAX(T2.C1)
> (which is 2) then evaluates to false, and the query should return an empty
> set.
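The reasoning in the report can be checked without Spark. The following is a minimal plain-Scala sketch that models the intended semantics of the query on ordinary collections; it is not Spark's implementation and does not reproduce the bug. The object name `Spark17348Sketch` and the helpers `maxC1For` and `expected` are hypothetical names introduced here for illustration only.

```scala
// Plain-Scala model of the query semantics described in the report.
// No Spark dependency; all names below are illustrative assumptions.
object Spark17348Sketch {
  // Rows are (c1, c2) pairs, mirroring the temp views t1 and t2.
  val t1: Seq[(Int, Int)] = Seq((1, 1))
  val t2: Seq[(Int, Int)] = Seq((1, 1), (2, 0))

  // Models the subquery for one outer row:
  //   select max(t2.c1) from t2 where <outer c2> >= t2.c2
  // None stands in for SQL NULL when no t2 row qualifies.
  def maxC1For(outerC2: Int): Option[Int] = {
    val qualifying = t2.collect { case (c1, c2) if outerC2 >= c2 => c1 }
    if (qualifying.isEmpty) None else Some(qualifying.max)
  }

  // Models the outer query: keep t1.c1 only when it equals the
  // single value produced by the subquery for that row.
  def expected: Seq[Int] =
    t1.collect { case (c1, c2) if maxC1For(c2).contains(c1) => c1 }

  def main(args: Array[String]): Unit = {
    println(maxC1For(1)) // Some(2): both t2 rows fall into one group
    println(expected)    // List(): 1 is not in {2}, so no row is returned
  }
}
```

Under these assumptions both T2 rows qualify for the outer row, the maximum is 2, and the outer row is filtered out, which matches the empty result the reporter expects.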