[ https://issues.apache.org/jira/browse/SPARK-43156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716312#comment-17716312 ]
Ignite TC Bot commented on SPARK-43156:
---------------------------------------

User 'jchen5' has created a pull request for this issue:
https://github.com/apache/spark/pull/40946

> Correctness COUNT bug in correlated scalar subselect with `COUNT(*) is null`
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-43156
>                 URL: https://issues.apache.org/jira/browse/SPARK-43156
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Jack Chen
>            Priority: Major
>
> Example query:
> {code:java}
> spark.sql("select *, (select (count(1)) is null from t1 where t0.a = t1.c) from t0").collect()
> res6: Array[org.apache.spark.sql.Row] = Array([1,1.0,null], [2,2.0,false])
> {code}
> In this subquery, count(1) always evaluates to a non-null integer value, so count(1) is null is always false, and the correct result of the subquery is always false.
> We incorrectly evaluate it to null for empty groups. The reason is that NullPropagation rewrites Aggregate [c] [isnull(count(1))] to Aggregate [c] [false]. That rewrite is correct in isolation, but in the context of a scalar subquery it breaks our count bug handling in RewriteCorrelatedScalarSubquery.constructLeftJoins. By the time we get there, the query no longer appears to have the count bug: it looks the same as if the original query had a subquery with select any_value(false) from r..., and that case is _not_ subject to the count bug.
>
> A Postgres comparison shows the correct always-false result:
> http://sqlfiddle.com/#!17/67822/5
>
> DDL for the example:
> {code:java}
> create or replace temp view t0 (a, b)
> as values
>   (1, 1.0),
>   (2, 2.0);
>
> create or replace temp view t1 (c, d)
> as values
>   (2, 3.0);
> {code}
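>
> As a sketch of why this goes wrong (illustrative SQL only, not the exact plan Spark produces): the count-bug handling in constructLeftJoins decorrelates the subquery into a left outer join and, for rows with no match, substitutes the value the aggregate expression would produce on empty input (here isnull(count(1)) = isnull(0) = false). Once NullPropagation has replaced the aggregate expression with the literal false, that substitution no longer happens:
> {code:java}
> -- Count-bug-aware rewrite (illustrative): rows of t0 with no matching
> -- group get the value the aggregate would produce on empty input,
> -- via a null check on the join key.
> select t0.a, t0.b,
>        case when agg.c is null then false  -- isnull(count(1)) over no rows
>             else agg.result end as subquery_result
> from t0
> left outer join
>   (select c, count(1) is null as result from t1 group by c) agg
> on t0.a = agg.c;
>
> -- After NullPropagation the subquery body is just the literal false, so
> -- the rewrite emits a plain left join with no default substitution;
> -- unmatched rows of t0 pick up null, reproducing [1,1.0,null] above.
> select t0.a, t0.b, agg.result
> from t0
> left outer join
>   (select c, false as result from t1 group by c) agg
> on t0.a = agg.c;
> {code}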
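>
> For contrast, a subquery that genuinely has the any_value shape returns null for empty groups correctly, because an empty scalar subquery evaluates to null. A hypothetical check against the views above (assuming any_value, available in Spark 3.4):
> {code:java}
> spark.sql("select *, (select any_value(false) from t1 where t0.a = t1.c) from t0").collect()
> // expected: Array([1,1.0,null], [2,2.0,false]) -- null for a=1 is correct
> // here: the subquery produces no rows, so the scalar subquery is null,
> // unlike count(1), which yields 0 on empty input.
> {code}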