Jack Chen created SPARK-43098:
---------------------------------

             Summary: Should not handle the COUNT bug when the GROUP BY clause of a correlated scalar subquery is non-empty
                 Key: SPARK-43098
                 URL: https://issues.apache.org/jira/browse/SPARK-43098
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Jack Chen


From [~allisonwang-db]:

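There is no COUNT bug when the correlated equality predicates are also in the GROUP BY clause: an outer row with no matching inner rows produces no groups, so the subquery should return NULL. However, the current logic for handling the COUNT bug still adds the default aggregate function value (0 for COUNT), and therefore returns incorrect results.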

 
{code:sql}
create view t1(c1, c2) as values (0, 1), (1, 2);
create view t2(c1, c2) as values (0, 2), (0, 3);

select c1, c2, (select count(*) from t2 where t1.c1 = t2.c1 group by c1) from t1;

-- Correct answer: [(0, 1, 2), (1, 2, null)]
-- Current (incorrect) result:
+---+---+------------------+
|c1 |c2 |scalarsubquery(c1)|
+---+---+------------------+
|0  |1  |2                 |
|1  |2  |0                 |
+---+---+------------------+
{code}
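To make the expected semantics concrete, here is a small sketch that runs the same query against SQLite through Python's standard-library sqlite3 module. SQLite is only assumed here as a reference engine for standard SQL semantics; it is not Spark, and the helper name run is hypothetical. The sketch also contrasts the query with the same subquery without GROUP BY, where returning 0 for the unmatched row is correct and the COUNT bug handling is actually needed.

{code:python}
import sqlite3


def run(query: str) -> list:
    """Run `query` against a fresh in-memory SQLite database (hypothetical helper).

    SQLite stands in as a reference engine for standard SQL semantics; it is not Spark.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Same data as the t1/t2 views in the reproduction, as plain tables.
        conn.executescript("""
            CREATE TABLE t1(c1 INTEGER, c2 INTEGER);
            CREATE TABLE t2(c1 INTEGER, c2 INTEGER);
            INSERT INTO t1 VALUES (0, 1), (1, 2);
            INSERT INTO t2 VALUES (0, 2), (0, 3);
        """)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# With GROUP BY on the correlated column: an outer row with no matching t2 rows
# yields zero groups, so the scalar subquery is NULL (None in Python), not 0.
WITH_GROUP_BY = """
    SELECT c1, c2,
           (SELECT count(*) FROM t2 WHERE t1.c1 = t2.c1 GROUP BY c1)
    FROM t1 ORDER BY c1
"""

# Without GROUP BY: count(*) over an empty input still returns one row holding 0.
# This is the case the COUNT bug handling exists for.
WITHOUT_GROUP_BY = """
    SELECT c1, c2,
           (SELECT count(*) FROM t2 WHERE t1.c1 = t2.c1)
    FROM t1 ORDER BY c1
"""

print(run(WITH_GROUP_BY))     # [(0, 1, 2), (1, 2, None)]
print(run(WITHOUT_GROUP_BY))  # [(0, 1, 2), (1, 2, 0)]
{code}

The only difference between the two queries is the GROUP BY clause, and it is what decides whether the unmatched row should get 0 or NULL.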
 

This bug affects scalar subqueries, which are rewritten in RewriteCorrelatedScalarSubquery. Lateral subqueries, which are handled in DecorrelateInnerQuery, already treat this case correctly. Related: https://issues.apache.org/jira/browse/SPARK-36113

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
