[ https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Asif updated SPARK-41141:
-------------------------
    Description:
Currently, the analyzer-phase rules for a subquery that references an aggregate expression in the outer query avoid introducing a new aggregate only for a single-level aggregate function. They introduce a new aggregate expression for nested aggregate functions.
It is possible to avoid adding this extra aggregate expression easily, at least if the outer projection involving the aggregate function is exactly the same as the one used in the subquery, or if the outer query's projection involving the aggregate function is a subtree of the subquery's expression.

Thus consider the following 2 cases:
1) select cos(sum(a)), b from t1 group by b having exists (select x from t2 where y = cos(sum(a)))
2) select sum(a), b from t1 group by b having exists (select x from t2 where y = cos(sum(a)))

In both of the above cases, there is no need to add an extra aggregate expression.

I am also investigating whether it's possible to avoid it in this case:

3) select Cos(sum(a)), b from t1 group by b having exists (select x from t2 where y = sum(a))

This Jira is also needed for another issue, where a subquery on datasource v2 projects columns that are not needed. (No Jira filed yet for that; will do that.)

Will be opening a PR for this soon.

  was:
Currently, the analyzer-phase rules for a subquery that references an aggregate expression in the outer query avoid introducing a new aggregate only for a single-level aggregate function. They introduce a new aggregate expression for nested aggregate functions.
It is possible to avoid adding this extra aggregate expression easily, at least if the outer projection involving the aggregate function is exactly the same as the one used in the subquery, or if the outer query's projection involving the aggregate function is a subtree of the subquery's expression.

Thus consider the following 2 cases:
1) select cos(sum(a)), b from t1 group by b having exists (select x from t2 where y = cos(sum(a)))
2) select sum(a), b from t1 group by b having exists (select x from t2 where y = cos(sum(a)))

In both of the above cases, there is no need to add an extra aggregate expression.

I am also investigating whether it's possible to avoid it in this case:

3) select Cos(sum(a)), b from t1 group by b having exists (select x from t2 where y = sum(a))

This Jira is also needed for another issue, where a subquery on datasource v2 projects columns that are not needed. (No Jira filed yet for that; will do that.)


> avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-41141
>                 URL: https://issues.apache.org/jira/browse/SPARK-41141
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.1
>            Reporter: Asif
>            Priority: Major
>              Labels: spark-sql
>
> Currently, the analyzer-phase rules for a subquery that references an aggregate expression in the outer query avoid introducing a new aggregate only for a single-level aggregate function. They introduce a new aggregate expression for nested aggregate functions.
> It is possible to avoid adding this extra aggregate expression easily, at least if the outer projection involving the aggregate function is exactly the same as the one used in the subquery, or if the outer query's projection involving the aggregate function is a subtree of the subquery's expression.
>
> Thus consider the following 2 cases:
> 1) select cos(sum(a)), b from t1 group by b having exists (select x from t2 where y = cos(sum(a)))
> 2) select sum(a), b from t1 group by b having exists (select x from t2 where y = cos(sum(a)))
>
> In both of the above cases, there is no need to add an extra aggregate expression.
>
> I am also investigating whether it's possible to avoid it in this case:
>
> 3) select Cos(sum(a)), b from t1 group by b having exists (select x from t2 where y = sum(a))
>
> This Jira is also needed for another issue, where a subquery on datasource v2 projects columns that are not needed. (No Jira filed yet for that; will do that.)
>
> Will be opening a PR for this soon.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
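
The "exactly the same, or a subtree" condition from the description can be illustrated with a small standalone sketch. This is not Spark's Catalyst code and uses none of its APIs: `Expr`, `is_subtree`, and `can_reuse_outer_aggregate` are hypothetical names invented here, and expressions are modeled as plain function-name trees purely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical minimal expression node: a name plus child expressions."""
    name: str
    children: Tuple["Expr", ...] = ()


def is_subtree(candidate: Expr, tree: Expr) -> bool:
    """Return True if `candidate` equals `tree` or occurs anywhere inside it."""
    if candidate == tree:
        return True
    return any(is_subtree(candidate, child) for child in tree.children)


def can_reuse_outer_aggregate(outer: Expr, subquery: Expr) -> bool:
    """Assumed rule from the description: no extra aggregate expression is
    needed when the outer projection's aggregate expression is identical to,
    or a subtree of, the expression referenced in the subquery."""
    return is_subtree(outer, subquery)


a = Expr("a")
sum_a = Expr("sum", (a,))
cos_sum_a = Expr("cos", (sum_a,))

# Case 1: outer cos(sum(a)), subquery cos(sum(a)) -> identical
case1 = can_reuse_outer_aggregate(cos_sum_a, cos_sum_a)
# Case 2: outer sum(a), subquery cos(sum(a)) -> outer is a subtree
case2 = can_reuse_outer_aggregate(sum_a, cos_sum_a)
# Case 3: outer cos(sum(a)), subquery sum(a) -> not covered by this check
case3 = can_reuse_outer_aggregate(cos_sum_a, sum_a)

print(case1, case2, case3)
```

Under these assumptions the check accepts cases 1 and 2 and rejects case 3, which matches the description: case 3 is the one still under investigation, since there the subquery's expression is contained in the outer projection rather than the other way around.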