[jira] [Commented] (SPARK-41141) avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-18 Thread Asif (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635989#comment-17635989
 ] 

Asif commented on SPARK-41141:
--

Opened the following PR:

[SPARK-41141-PR|https://github.com/apache/spark/pull/38714/files]

 

> avoid introducing a new aggregate expression in the analysis phase when 
> subquery is referencing it
> --
>
> Key: SPARK-41141
> URL: https://issues.apache.org/jira/browse/SPARK-41141
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: Asif
> Priority: Minor
> Labels: spark-sql
>
> Currently, the analyzer rules that handle a subquery referencing an 
> aggregate expression of the outer query avoid introducing a new aggregate 
> expression only for a single-level aggregate function. For nested aggregate 
> functions, they introduce a new aggregate expression.
> It is possible to avoid adding this extra aggregate expression easily, at 
> least if the outer projection involving the aggregate function is exactly the 
> same as the one used in the subquery, or if the outer query's projection 
> involving the aggregate function is a subtree of the subquery's expression.
>  
> Thus consider the following 2 cases:
> 1) select cos(sum(a)), b from t1 group by b having exists (select x from 
> t2 where y = cos(sum(a)))
> 2) select sum(a), b from t1 group by b having exists (select x from t2 
> where y = cos(sum(a)))
>  
> In both of the above cases, there is no need to add an extra aggregate 
> expression.
>  
> I am also investigating whether it is possible to avoid it in the following 
> case:
>  
> 3) select cos(sum(a)), b from t1 group by b having exists (select x from 
> t2 where y = sum(a))
>  
> This Jira is also needed for another issue, where a subquery on a 
> DataSource V2 source projects columns that are not needed. (No Jira has been 
> filed for that yet; I will do that.)
>  
> I will be opening a PR for this soon.
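
The two conditions named in the description (exact match, or the outer projection being a subtree of the subquery's expression) can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not Catalyst code and not the change made in the PR. `Expr`, `col`, `fn`, `contains`, `has_aggregate`, `can_reuse_outer_aggregate` and the `AGGREGATE_FUNCTIONS` set are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Assumption: a small, illustrative set of aggregate function names.
AGGREGATE_FUNCTIONS = frozenset({"sum", "count", "min", "max", "avg"})


@dataclass(frozen=True)
class Expr:
    """Toy expression node: a column or function name plus child expressions."""
    name: str
    children: Tuple["Expr", ...] = ()


def col(name: str) -> Expr:
    """Build a column reference (a leaf node)."""
    return Expr(name)


def fn(name: str, *args: Expr) -> Expr:
    """Build a function call; names are lower-cased, as SQL function names are case-insensitive."""
    return Expr(name.lower(), tuple(args))


def contains(tree: Expr, target: Expr) -> bool:
    """True if `target` is structurally equal to `tree` or to any node below it."""
    return tree == target or any(contains(child, target) for child in tree.children)


def has_aggregate(expr: Expr) -> bool:
    """True if `expr` contains a call to one of the assumed aggregate functions."""
    is_agg_call = bool(expr.children) and expr.name in AGGREGATE_FUNCTIONS
    return is_agg_call or any(has_aggregate(child) for child in expr.children)


def can_reuse_outer_aggregate(outer_projections: Sequence[Expr], subquery_expr: Expr) -> bool:
    """Hypothetical check: some outer projection involving an aggregate is
    identical to, or a subtree of, the expression used in the subquery."""
    return any(has_aggregate(p) and contains(subquery_expr, p) for p in outer_projections)


sum_a = fn("sum", col("a"))
cos_sum_a = fn("cos", sum_a)

# Case 1: the outer projection cos(sum(a)) is identical to the subquery's expression.
case1 = can_reuse_outer_aggregate([cos_sum_a, col("b")], cos_sum_a)
# Case 2: the outer projection sum(a) is a subtree of the subquery's cos(sum(a)).
case2 = can_reuse_outer_aggregate([sum_a, col("b")], cos_sum_a)
# Case 3: the subquery uses sum(a), but the outer query only projects cos(sum(a)).
case3 = can_reuse_outer_aggregate([cos_sum_a, col("b")], sum_a)

print(case1, case2, case3)
```

Under this simple check, cases 1 and 2 qualify and case 3 does not, which matches the description's statement that case 3 is still being investigated.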



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41141) avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635988#comment-17635988
 ] 

Apache Spark commented on SPARK-41141:
--

User 'ahshahid' has created a pull request for this issue:
https://github.com/apache/spark/pull/38714



