[ https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-19017:
------------------------------------

    Assignee: Apache Spark

> NOT IN subquery with more than one column may return incorrect results
> ----------------------------------------------------------------------
>
>                 Key: SPARK-19017
>                 URL: https://issues.apache.org/jira/browse/SPARK-19017
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
>            Reporter: Nattavut Sutyanyong
>            Assignee: Apache Spark
>
> When a NOT IN predicate compares more than one column, the query may return
> incorrect results if the data contains a null. We can demonstrate the problem
> with the following data set and query:
> {code}
> Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1")
> Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2")
> sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show
> +---+---+
> | a1| b1|
> +---+---+
> +---+---+
> {code}
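
For readers who want to see why the empty result above looks wrong, the following is a minimal sketch of standard SQL three-valued logic applied to the same data. It does not use Spark and is not taken from the issue or from any Spark fix. The names `NotInSketch`, `Tri`, `sqlEq`, `tAnd`, `tOr`, `tNot`, `rowEq` and `notIn` are all hypothetical, and `None` is assumed to stand for both a SQL null value and the UNKNOWN truth value.

```scala
// Sketch only: models SQL three-valued logic in plain Scala, without Spark.
object NotInSketch {
  // Some(true) = TRUE, Some(false) = FALSE, None = UNKNOWN.
  type Tri = Option[Boolean]
  // A two-column row; None models a SQL null.
  type Row = (Option[Int], Option[Int])

  // SQL equality: UNKNOWN if either operand is null.
  def sqlEq(x: Option[Int], y: Option[Int]): Tri =
    for (a <- x; b <- y) yield a == b

  // SQL AND: FALSE dominates UNKNOWN.
  def tAnd(p: Tri, q: Tri): Tri = (p, q) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // SQL OR: TRUE dominates UNKNOWN.
  def tOr(p: Tri, q: Tri): Tri = (p, q) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // SQL NOT: UNKNOWN stays UNKNOWN.
  def tNot(p: Tri): Tri = p.map(!_)

  // (a1, b1) = (a2, b2) is a1 = a2 AND b1 = b2.
  def rowEq(l: Row, r: Row): Tri = tAnd(sqlEq(l._1, r._1), sqlEq(l._2, r._2))

  // NOT IN is NOT (l = r1 OR l = r2 OR ...).
  def notIn(l: Row, rows: Seq[Row]): Tri =
    tNot(rows.map(rowEq(l, _)).foldLeft[Tri](Some(false))(tOr))

  def main(args: Array[String]): Unit = {
    val t1Row: Row = (Some(2), Some(1))      // the single row of t1
    val t2: Seq[Row] = Seq((Some(1), None))  // the single row of t2
    println(notIn(t1Row, t2))                // prints Some(true)
  }
}
```

Under this model, `2 = 1` is FALSE, so the row comparison is FALSE even though `1 = null` is UNKNOWN, and the NOT IN predicate evaluates to TRUE. A WHERE clause keeps rows whose predicate is TRUE, so the row (2,1) would be expected in the output rather than an empty result.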