[ https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miles Yucht updated SPARK-24395:
--------------------------------
    Description: 
Spark does not return the correct answer when evaluating NOT IN in some cases. 
For example:
{code:java}
CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES
  (null, null)
  AS m(a, b);

SELECT *
FROM   m
WHERE  a IS NULL AND b IS NULL
       AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1))));
{code}
According to the semantics of null-aware anti-join, this should return no rows. 
However, it actually returns the row {{NULL NULL}}. This was found by 
inspecting the unit tests added for SPARK-24381 
([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822]).
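
For context on why "no rows" is correct: under SQL's three-valued logic, a NOT IN comparison involving NULL evaluates to UNKNOWN rather than TRUE, and WHERE keeps a row only when its predicate is TRUE. A minimal sketch of that rule using Python's bundled sqlite3 rather than Spark (any engine with standard NULL semantics behaves the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1 NOT IN (2, NULL) is NOT(1=2 OR 1=NULL) = NOT(FALSE OR UNKNOWN) = UNKNOWN,
# which the driver surfaces as None.
(pred,) = conn.execute("SELECT 1 NOT IN (2, NULL)").fetchone()
print(pred)  # None (UNKNOWN)

# WHERE treats UNKNOWN like FALSE, so the predicate filters the row out.
rows = conn.execute("SELECT 1 WHERE 1 NOT IN (2, NULL)").fetchall()
print(rows)  # []
```

The same reasoning, applied to the tuple comparison in the query above, is why the {{NULL NULL}} row must be filtered.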

*Acceptance Criteria*:
 * We should be able to add the following test cases back to 
{{subquery/in-subquery/not-in-unit-test-multi-column-literal.sql}}:
{code:java}
  -- Case 3
  -- (probe-side columns are all null -> row not returned)
SELECT *
FROM   m
WHERE  a IS NULL AND b IS NULL -- Matches only (null, null)
       AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1))));

  -- Case 4
  -- (one column null, other column matches a row in the subquery result -> row not returned)
SELECT *
FROM   m
WHERE  b = 1.0 -- Matches (null, 1.0)
       AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1))));
{code}
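
To make the expected outcomes of Case 3 and Case 4 concrete, here is a small, hypothetical Python model of a null-aware multi-column NOT IN under three-valued logic (True/False/None standing in for TRUE/FALSE/UNKNOWN). It illustrates the intended semantics only; it is not Spark's implementation:

```python
def tuple_eq(row, candidate):
    """Three-valued equality of two tuples: False on a definite mismatch,
    UNKNOWN (None) if any pairwise comparison involves NULL, else True."""
    saw_unknown = False
    for x, y in zip(row, candidate):
        if x is None or y is None:
            saw_unknown = True
        elif x != y:
            return False  # one definite mismatch makes the whole tuple unequal
    return None if saw_unknown else True

def not_in(row, candidates):
    """row NOT IN (candidates): False if any candidate matches, UNKNOWN (None)
    if any comparison is UNKNOWN, else True."""
    saw_unknown = False
    for cand in candidates:
        eq = tuple_eq(row, cand)
        if eq is True:
            return False
        if eq is None:
            saw_unknown = True
    return None if saw_unknown else True

values = [(0, 1.0), (2, 3.0), (4, None)]  # (4, CAST(null AS DECIMAL(2, 1)))

print(not_in((None, None), values))  # None (UNKNOWN) -> Case 3 row filtered out
print(not_in((None, 1.0), values))   # None: (None, 1.0) vs (0, 1.0) is UNKNOWN -> Case 4 row filtered out
print(not_in((5, 6.0), values))      # True: definitely unequal to every candidate -> row returned
```

Since WHERE keeps a row only when the predicate is True, both test cases above must return no rows.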

 

cc [~smilegator] [~juliuszsompolski]

  was:
Spark does not return the correct answer when evaluating NOT IN in some cases. 
For example:
{code:java}
CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES
  (null, null)
  AS m(a, b);

SELECT *
FROM   m
WHERE  a IS NULL AND b IS NULL
       AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, 1))));
{code}
According to the semantics of null-aware anti-join, this should return no rows. 
However, it actually returns the row {{NULL NULL}}. This was found by 
inspecting the unit tests added for SPARK-24381 
([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822]).

cc [~smilegator] [~juliuszsompolski]


> Fix Behavior of NOT IN with Literals Containing NULL
> ----------------------------------------------------
>
>                 Key: SPARK-24395
>                 URL: https://issues.apache.org/jira/browse/SPARK-24395
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Miles Yucht
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
