[jira] [Commented] (SPARK-41141) avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it
[ https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635989#comment-17635989 ]

Asif commented on SPARK-41141:
------------------------------

Opened the following PR:
[SPARK-41141-PR|https://github.com/apache/spark/pull/38714/files]

> avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-41141
>                 URL: https://issues.apache.org/jira/browse/SPARK-41141
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.1
>            Reporter: Asif
>            Priority: Minor
>              Labels: spark-sql
>
> Currently, when a subquery references an aggregate expression of the outer
> query, the analyzer rules avoid introducing a new aggregate expression only
> for a single-level aggregate function. For nested aggregate functions they
> introduce a new aggregate expression.
> It is possible to avoid adding this extra aggregate expression easily, at
> least if the outer projection involving the aggregate function is exactly
> the same as the one used in the subquery, or if the outer query's projection
> involving the aggregate function is a subtree of the subquery's expression.
>
> Thus consider the following 2 cases:
> 1) select cos(sum(a)), b from t1 group by b having exists (select x from
> t2 where y = cos(sum(a)))
> 2) select sum(a), b from t1 group by b having exists (select x from t2
> where y = cos(sum(a)))
>
> In both of the above cases, there is no need to add an extra aggregate
> expression.
>
> I am also investigating whether it can be avoided in the following case:
>
> 3) select cos(sum(a)), b from t1 group by b having exists (select x from
> t2 where y = sum(a))
>
> This Jira is also needed for another issue, where a subquery over a
> DataSource V2 source projects columns that are not needed (no Jira filed
> for that yet; I will file one).
>
> Will be opening a PR for this soon.
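The "exactly the same, or a subtree" condition in the description can be illustrated with a small toy model. This is only a sketch under stated assumptions: it does not use Spark or its analyzer, and `Expr`, `call`, `col`, `AGG_FUNCS`, and `can_reuse` are hypothetical names invented for the illustration, not anything from the PR.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical, not a Spark class)."""
    name: str
    children: Tuple["Expr", ...] = ()


# Assumption for this sketch: only these names count as aggregate functions.
AGG_FUNCS = {"sum", "min", "max", "avg", "count"}


def col(name: str) -> Expr:
    """Build a leaf node for a column reference."""
    return Expr(name)


def call(fn: str, *args: Expr) -> Expr:
    """Build a function-call node; the name is lowercased so Cos == cos."""
    return Expr(fn.lower(), tuple(args))


def subtrees(expr: Expr) -> Iterator[Expr]:
    """Yield the expression itself and every nested sub-expression."""
    yield expr
    for child in expr.children:
        yield from subtrees(child)


def contains_aggregate(expr: Expr) -> bool:
    """True if any node in the tree is a call to an aggregate function."""
    return any(s.name in AGG_FUNCS and s.children for s in subtrees(expr))


def can_reuse(outer_projection: Expr, subquery_expr: Expr) -> bool:
    """True if the outer projection involves an aggregate and is identical
    to the subquery's expression or appears as a subtree of it."""
    return contains_aggregate(outer_projection) and any(
        s == outer_projection for s in subtrees(subquery_expr)
    )


sum_a = call("sum", col("a"))
cos_sum_a = call("cos", sum_a)

case1 = can_reuse(cos_sum_a, cos_sum_a)  # outer cos(sum(a)), subquery cos(sum(a))
case2 = can_reuse(sum_a, cos_sum_a)      # outer sum(a), subquery cos(sum(a))
case3 = can_reuse(cos_sum_a, sum_a)      # outer cos(sum(a)), subquery sum(a)
```

Under this model, cases 1 and 2 satisfy the condition and case 3 does not, which matches the description: case 3 is the one still under investigation, because the subquery needs `sum(a)` while the outer projection only exposes `cos(sum(a))`.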
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41141) avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it
[ https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635988#comment-17635988 ]

Apache Spark commented on SPARK-41141:
--------------------------------------

User 'ahshahid' has created a pull request for this issue:
https://github.com/apache/spark/pull/38714