[jira] [Commented] (SPARK-24395) Fix Behavior of NOT IN with Literals Containing NULL
[ https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571892#comment-16571892 ] Apache Spark commented on SPARK-24395: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/22029 > Fix Behavior of NOT IN with Literals Containing NULL > > > Key: SPARK-24395 > URL: https://issues.apache.org/jira/browse/SPARK-24395 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: Miles Yucht >Priority: Major > > Spark does not return the correct answer when evaluating NOT IN in some > cases. For example: > {code:java} > CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES > (null, null) > AS m(a, b); > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, > 1;{code} > According to the semantics of null-aware anti-join, this should return no > rows. However, it actually returns the row {{NULL NULL}}. This was found by > inspecting the unit tests added for SPARK-24381 > ([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822).] > *Acceptance Criteria*: > * We should be able to add the following test cases back to > {{subquery/in-subquery/not-in-unit-test-multi-column-literal.sql}}: > {code:java} > -- Case 2 > -- (subquery contains a row with null in all columns -> row not returned) > SELECT * > FROM m > WHERE (a, b) NOT IN ((CAST (null AS INT), CAST (null AS DECIMAL(2, 1; > -- Case 3 > -- (probe-side columns are all null -> row not returned) > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL -- Matches only (null, null) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, > 1; > -- Case 4 > -- (one column null, other column matches a row in the subquery result -> > row not returned) > SELECT * > FROM m > WHERE b = 1.0 -- Matches (null, 1.0) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, > 1; > {code} > > cc [~smilegator] [~juliuszsompolski] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24395) Fix Behavior of NOT IN with Literals Containing NULL
[ https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495068#comment-16495068 ] Marco Gaido commented on SPARK-24395: - The main issue here is that {{(null, null) = (1, 2)}} in Spark evaluates to {{false}}, while on other DBs evaluates to null. So we should revisit all our comparisons for struct. > Fix Behavior of NOT IN with Literals Containing NULL > > > Key: SPARK-24395 > URL: https://issues.apache.org/jira/browse/SPARK-24395 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: Miles Yucht >Priority: Major > > Spark does not return the correct answer when evaluating NOT IN in some > cases. For example: > {code:java} > CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES > (null, null) > AS m(a, b); > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, > 1;{code} > According to the semantics of null-aware anti-join, this should return no > rows. However, it actually returns the row {{NULL NULL}}. This was found by > inspecting the unit tests added for SPARK-24381 > ([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822).] > *Acceptance Criteria*: > * We should be able to add the following test cases back to > {{subquery/in-subquery/not-in-unit-test-multi-column-literal.sql}}: > {code:java} > -- Case 2 > -- (subquery contains a row with null in all columns -> row not returned) > SELECT * > FROM m > WHERE (a, b) NOT IN ((CAST (null AS INT), CAST (null AS DECIMAL(2, 1; > -- Case 3 > -- (probe-side columns are all null -> row not returned) > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL -- Matches only (null, null) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, > 1; > -- Case 4 > -- (one column null, other column matches a row in the subquery result -> > row not returned) > SELECT * > FROM m > WHERE b = 1.0 -- Matches (null, 1.0) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, > 1; > {code} > > cc [~smilegator] [~juliuszsompolski] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24395) Fix Behavior of NOT IN with Literals Containing NULL
[ https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494769#comment-16494769 ] Xiao Li commented on SPARK-24395: - I think Oracle returns a different answer. We should fix them. > Fix Behavior of NOT IN with Literals Containing NULL > > > Key: SPARK-24395 > URL: https://issues.apache.org/jira/browse/SPARK-24395 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: Miles Yucht >Priority: Major > > Spark does not return the correct answer when evaluating NOT IN in some > cases. For example: > {code:java} > CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES > (null, null) > AS m(a, b); > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, > 1;{code} > According to the semantics of null-aware anti-join, this should return no > rows. However, it actually returns the row {{NULL NULL}}. This was found by > inspecting the unit tests added for SPARK-24381 > ([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822).] > *Acceptance Criteria*: > * We should be able to add the following test cases back to > {{subquery/in-subquery/not-in-unit-test-multi-column-literal.sql}}: > {code:java} > -- Case 2 > -- (subquery contains a row with null in all columns -> row not returned) > SELECT * > FROM m > WHERE (a, b) NOT IN ((CAST (null AS INT), CAST (null AS DECIMAL(2, 1; > -- Case 3 > -- (probe-side columns are all null -> row not returned) > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL -- Matches only (null, null) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, > 1; > -- Case 4 > -- (one column null, other column matches a row in the subquery result -> > row not returned) > SELECT * > FROM m > WHERE b = 1.0 -- Matches (null, 1.0) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, > 1; > {code} > > cc [~smilegator] [~juliuszsompolski] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24395) Fix Behavior of NOT IN with Literals Containing NULL
[ https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494181#comment-16494181 ] Juliusz Sompolski commented on SPARK-24395: --- The question is whether the literals should be treated as structs, or unpacked? If like structs, then the current behavior is correct, I think. But when a similar query is IN / NOT IN subquery, it is currently treated as if the left hand side was unpacked into independent columns. cc [~mgaido] [~hvanhovell] > Fix Behavior of NOT IN with Literals Containing NULL > > > Key: SPARK-24395 > URL: https://issues.apache.org/jira/browse/SPARK-24395 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: Miles Yucht >Priority: Major > > Spark does not return the correct answer when evaluating NOT IN in some > cases. For example: > {code:java} > CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES > (null, null) > AS m(a, b); > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, > 1;{code} > According to the semantics of null-aware anti-join, this should return no > rows. However, it actually returns the row {{NULL NULL}}. This was found by > inspecting the unit tests added for SPARK-24381 > ([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822).] > *Acceptance Criteria*: > * We should be able to add the following test cases back to > {{subquery/in-subquery/not-in-unit-test-multi-column-literal.sql}}: > {code:java} > -- Case 2 > -- (subquery contains a row with null in all columns -> row not returned) > SELECT * > FROM m > WHERE (a, b) NOT IN ((CAST (null AS INT), CAST (null AS DECIMAL(2, 1; > -- Case 3 > -- (probe-side columns are all null -> row not returned) > SELECT * > FROM m > WHERE a IS NULL AND b IS NULL -- Matches only (null, null) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, > 1; > -- Case 4 > -- (one column null, other column matches a row in the subquery result -> > row not returned) > SELECT * > FROM m > WHERE b = 1.0 -- Matches (null, 1.0) >AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2, > 1; > {code} > > cc [~smilegator] [~juliuszsompolski] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org