Asif created SPARK-41141:
----------------------------

             Summary: avoid introducing a new aggregate expression in the 
analysis phase when subquery is referencing it
                 Key: SPARK-41141
                 URL: https://issues.apache.org/jira/browse/SPARK-41141
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Asif


Currently the  analyzer phase rules on subquery referencing the aggregate 
expression in outer query, avoids introducing a new aggregate only for a single 
level aggregate function. It introduces new aggregate expression for nested 
aggregate functions.

It is possible to avoid  adding this extra aggregate expression  easily, 
atleast if the outer projection involving aggregate function is exactly same as 
the one that is used in subquery, or if the outer query's projection involving 
aggregate function is a subtree of the subquery's expression.

 

Thus consider the following 2 cases:

1) select  cos (sum(a)) , b from t1  group by b having exists (select x from t2 
where y = cos(sum(a)) )

2) select  sum(a) , b from t1  group by b having exists (select x from t2 where 
y = cos(sum(a)) )

 

In both the above cases, there is no need for adding extra aggregate expression.

 

I am also investigating if its possible to avoid if the case is 

 

3) select  Cos(sum(a)) , b from t1  group by b having exists (select x from t2 
where y = sum(a) )

 

This Jira also is needed for another issue where subquery datasource v2  is 
projecting columns which are not needed. ( no Jira filed yet for that, will do 
that..)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to