[jira] [Commented] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results

Nattavut Sutyanyong (JIRA) Thu, 05 Jan 2017 07:21:25 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801624#comment-15801624
 ]


Nattavut Sutyanyong commented on SPARK-19017:
---------------------------------------------

One way to interpret the behaviour of the comparison of tuples is it operates 
strictly on 2-value logic of the comparison of each element of the tuples.

That is,

Case 1:
(a1, a2) = (b1, b2)

is interpreted as

IF (a1 = b1) is true THEN true ELSE false
AND
IF (a2 = b2) is true THEN true ELSE false

Case 2:
(a1, a2) <> (b1, b2)

is interpreted as a negation of the equality operator on the 2-value logic of 
each element.

IF (a1 = b1) is true THEN false ELSE true
OR
IF (a2 = b2) is true THEN false ELSE true


The NOT IN semantics is different that it preserves the 3-value logic in the 
comparison of each element.

I don't know that we should align the two usages to the same semantics or not. 

> NOT IN subquery with more than one column may return incorrect results
> ----------------------------------------------------------------------
>
>                 Key: SPARK-19017
>                 URL: https://issues.apache.org/jira/browse/SPARK-19017
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
>            Reporter: Nattavut Sutyanyong
>
> When putting more than one column in the NOT IN, the query may not return 
> correctly if there is a null data. We can demonstrate the problem with the 
> following data set and query:
> {code}
> Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1")
> Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2")
> sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show
> +---+---+
> | a1| b1|
> +---+---+
> +---+---+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results

Reply via email to