[ https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-41141:
------------------------------------

    Assignee: (was: Apache Spark)

> avoid introducing a new aggregate expression in the analysis phase when
> subquery is referencing it
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-41141
>                 URL: https://issues.apache.org/jira/browse/SPARK-41141
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.1
>            Reporter: Asif
>            Priority: Minor
>              Labels: spark-sql
>
> Currently, the analyzer rules that handle a subquery referencing an
> aggregate expression from the outer query avoid introducing a new
> aggregate expression only for a single-level aggregate function; for
> nested aggregate functions they introduce a new aggregate expression.
> It is possible to avoid adding this extra aggregate expression, at
> least when the outer projection involving the aggregate function is
> exactly the same as the one used in the subquery, or when the outer
> query's projection involving the aggregate function is a subtree of
> the subquery's expression.
>
> Consider the following two cases:
>
> 1) select cos(sum(a)), b from t1 group by b having exists (select x from t2 where y = cos(sum(a)))
> 2) select sum(a), b from t1 group by b having exists (select x from t2 where y = cos(sum(a)))
>
> In both of the above cases, there is no need to add an extra
> aggregate expression.
>
> I am also investigating whether it is possible to avoid it in the
> following case:
>
> 3) select cos(sum(a)), b from t1 group by b having exists (select x from t2 where y = sum(a))
>
> This Jira is also needed for another issue where a subquery over
> DataSource V2 projects columns which are not needed. (No Jira filed
> for that yet; will do so.)
>
> Will be opening a PR for this soon.
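The reuse condition described above can be sketched with a small, hypothetical model (this is not Spark's Catalyst code; expressions are mocked as nested tuples, and the function names are illustrative only). The idea: the outer aggregate projection can be reused, with no extra aggregate expression, when it equals the subquery's expression (case 1) or is a subtree of it (case 2); case 3, where the subquery expression is instead a subtree of the outer projection, is the direction still under investigation.

```python
# Hypothetical, simplified model of the reuse check described in this ticket.
# A nested tuple like ("cos", ("sum", "a")) stands for the expression cos(sum(a)).

def contains_subtree(expr, target):
    """Return True if `target` equals `expr` or appears as a subtree of it."""
    if expr == target:
        return True
    if isinstance(expr, tuple):
        # Index 0 is the function name; recurse into the argument subtrees.
        return any(contains_subtree(child, target) for child in expr[1:])
    return False

def can_reuse_outer_aggregate(outer_proj, subquery_expr):
    """Reuse is possible (no new aggregate expression needed) when the outer
    projection is the subquery expression itself, or a subtree of it --
    cases 1 and 2 in the ticket."""
    return contains_subtree(subquery_expr, outer_proj)

sum_a = ("sum", "a")
cos_sum_a = ("cos", sum_a)

# Case 1: outer = cos(sum(a)), subquery uses cos(sum(a))  -> reusable
print(can_reuse_outer_aggregate(cos_sum_a, cos_sum_a))   # True
# Case 2: outer = sum(a),      subquery uses cos(sum(a))  -> reusable
print(can_reuse_outer_aggregate(sum_a, cos_sum_a))       # True
# Case 3: outer = cos(sum(a)), subquery uses sum(a)       -> not covered by this check
print(can_reuse_outer_aggregate(cos_sum_a, sum_a))       # False
```

Case 3 fails this check because the containment runs the other way, which is why the ticket treats it separately.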
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org