[ 
https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Asif updated SPARK-41141:
-------------------------
    Priority: Minor  (was: Major)

> avoid introducing a new aggregate expression in the analysis phase when 
> subquery is referencing it
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-41141
>                 URL: https://issues.apache.org/jira/browse/SPARK-41141
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.1
>            Reporter: Asif
>            Priority: Minor
>              Labels: spark-sql
>
> Currently, when a subquery references an aggregate expression from the 
> outer query, the analyzer rules avoid introducing a new aggregate 
> expression only for a single-level aggregate function. For nested 
> aggregate functions, they introduce a new aggregate expression.
> It is possible to avoid adding this extra aggregate expression easily, 
> at least if the outer projection involving the aggregate function is exactly 
> the same as the one used in the subquery, or if the outer query's projection 
> involving the aggregate function is a subtree of the subquery's expression.
>  
> Thus consider the following 2 cases:
> 1) select cos(sum(a)), b from t1 group by b having exists (select x from 
> t2 where y = cos(sum(a)))
> 2) select sum(a), b from t1 group by b having exists (select x from t2 
> where y = cos(sum(a)))
>  
> In both of the above cases, there is no need to add an extra aggregate 
> expression.
>  
> I am also investigating whether it is possible to avoid it in this case:
>  
> 3) select cos(sum(a)), b from t1 group by b having exists (select x from 
> t2 where y = sum(a))
>  
> This Jira is also needed for another issue, where a subquery on a 
> datasource v2 projects columns which are not needed (no Jira has been filed 
> for that yet; I will file one).
>  
> I will open a PR for this soon.
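
The condition described for cases 1 and 2 (the outer projection is the same as, or a subtree of, the expression referenced in the subquery) can be illustrated with a minimal Python sketch. The `Expr` class and the function names below are hypothetical and are not Spark's Catalyst API; the sketch only shows the structural check, not the analyzer change itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical, minimal expression node (not Spark's Catalyst Expression)."""
    name: str
    children: Tuple["Expr", ...] = ()


def contains(tree: Expr, target: Expr) -> bool:
    """Return True if `target` equals `tree` or any subtree of `tree`."""
    if tree == target:
        return True
    return any(contains(child, target) for child in tree.children)


def can_reuse_outer_aggregate(outer: Expr, subquery: Expr) -> bool:
    """Cases 1 and 2: the outer projection is identical to, or a subtree of,
    the expression referenced in the subquery."""
    return contains(subquery, outer)


a = Expr("a")
sum_a = Expr("sum", (a,))
cos_sum_a = Expr("cos", (sum_a,))

print(can_reuse_outer_aggregate(cos_sum_a, cos_sum_a))  # case 1
print(can_reuse_outer_aggregate(sum_a, cos_sum_a))      # case 2
print(can_reuse_outer_aggregate(cos_sum_a, sum_a))      # case 3: not covered by this check
```

Case 3 fails this check because the outer projection wraps the subquery's expression rather than being contained in it, which is why the issue lists it as still under investigation.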



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

