[jira] [Commented] (SPARK-28186) array_contains returns null instead of false when one of the items in the array is null
[ https://issues.apache.org/jira/browse/SPARK-28186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876650#comment-16876650 ]

Takeshi Yamamuro commented on SPARK-28186:
------------------------------------------

I also think this is the right behaviour, as Marco said. If there are no more comments, I'll close this. Thanks.

> array_contains returns null instead of false when one of the items in the array is null
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-28186
>                 URL: https://issues.apache.org/jira/browse/SPARK-28186
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Alex Kushnir
>            Priority: Major
>
> If an array contains a null item, then array_contains returns true if the
> item is found, but if the item is not found it returns null instead of false.
>
> Seq(
>   (1, Seq("a", "b", "c")),
>   (2, Seq("a", "b", null, "c"))
> ).toDF("id", "vals").createOrReplaceTempView("tbl")
> spark.sql("select id, vals, array_contains(vals, 'a') as has_a, array_contains(vals, 'd') as has_d from tbl").show
>
> +---+----------+-----+-----+
> | id|      vals|has_a|has_d|
> +---+----------+-----+-----+
> |  1| [a, b, c]| true|false|
> |  2|[a, b,, c]| true| null|
> +---+----------+-----+-----+

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28186) array_contains returns null instead of false when one of the items in the array is null
[ https://issues.apache.org/jira/browse/SPARK-28186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876509#comment-16876509 ]

Marco Gaido commented on SPARK-28186:
-------------------------------------

You're right about that. The equivalent in Postgres is {{=ANY}}, which behaves like current Spark, so I don't see a strong motivation to change the current Spark behavior.
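
For readers following the thread, the three-valued logic behind the result in the issue description can be sketched in plain Scala. This is only an illustrative model, not Spark's implementation: the object ThreeValuedContains and its two helpers are hypothetical names, and Option[Boolean] stands in for a nullable SQL boolean, with None playing the role of NULL.

```scala
// Illustrative model only (hypothetical names, not Spark source code).
// None stands for SQL NULL; Some(b) stands for a known boolean.
object ThreeValuedContains {

  // Models the behaviour reported for Spark in this issue.
  def arrayContains[A](arr: Seq[Option[A]], value: A): Option[Boolean] =
    if (arr.contains(Some(value))) Some(true) // a definite match wins
    else if (arr.contains(None)) None         // a NULL element might have matched: unknown
    else Some(false)                          // no match and no NULLs: definitely false

  // Models the result the reporter describes for Hive: unknown collapses to false.
  def containsOrFalse[A](arr: Seq[Option[A]], value: A): Boolean =
    arrayContains(arr, value).getOrElse(false)

  def main(args: Array[String]): Unit = {
    val clean    = Seq(Some("a"), Some("b"), Some("c"))
    val withNull = Seq(Some("a"), Some("b"), None, Some("c"))

    println(arrayContains(clean, "d"))      // Some(false)
    println(arrayContains(withNull, "a"))   // Some(true)
    println(arrayContains(withNull, "d"))   // None
    println(containsOrFalse(withNull, "d")) // false
  }
}
```

In Spark SQL itself, the step modelled by containsOrFalse would presumably be written as coalesce(array_contains(vals, 'd'), false). Note that this also turns the result for a null array into false, which may or may not match what a ported workload expects.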
[jira] [Commented] (SPARK-28186) array_contains returns null instead of false when one of the items in the array is null
[ https://issues.apache.org/jira/browse/SPARK-28186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876386#comment-16876386 ]

Alex Kushnir commented on SPARK-28186:
--------------------------------------

I'm porting a Hive workload to Spark. It works in Hive as expected:

select array_contains(array('a','b',null,'c'), 'a'), array_contains(array('a','b',null,'c'), 'd')

returns true, false
[jira] [Commented] (SPARK-28186) array_contains returns null instead of false when one of the items in the array is null
[ https://issues.apache.org/jira/browse/SPARK-28186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876357#comment-16876357 ]

Marco Gaido commented on SPARK-28186:
-------------------------------------

Do you know of any SQL DB with the behavior you are suggesting?
[jira] [Commented] (SPARK-28186) array_contains returns null instead of false when one of the items in the array is null
[ https://issues.apache.org/jira/browse/SPARK-28186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876304#comment-16876304 ]

Alex Kushnir commented on SPARK-28186:
--------------------------------------

Because the array ["a","b",null,"c"] clearly does not contain "d", I would expect it to return false and not null. Why are you saying that this is the correct behavior?
[jira] [Commented] (SPARK-28186) array_contains returns null instead of false when one of the items in the array is null
[ https://issues.apache.org/jira/browse/SPARK-28186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875447#comment-16875447 ]

Marco Gaido commented on SPARK-28186:
-------------------------------------

This is the right behavior AFAIK. Why are you saying it is wrong?