beliefer opened a new pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
URL: https://github.com/apache/spark/pull/27428
 
 
   ### What changes were proposed in this pull request?
   This PR builds on https://github.com/apache/spark/pull/26656, which only supports the FILTER clause on aggregate expressions without DISTINCT.
   This PR extends that feature so that the FILTER clause can also be used with one or more DISTINCT aggregate expressions, for example:
   ```sql
   select sum(distinct id) filter (where sex = 'man') from student;
   select class_id, sum(distinct id) filter (where sex = 'man') from student group by class_id;
   select count(id) filter (where class_id = 1), sum(distinct id) filter (where sex = 'man') from student;
   select class_id, count(id) filter (where class_id = 1), sum(distinct id) filter (where sex = 'man') from student group by class_id;
   select sum(distinct id), sum(distinct id) filter (where sex = 'man') from student;
   select class_id, sum(distinct id), sum(distinct id) filter (where sex = 'man') from student group by class_id;
   select class_id, count(id), count(id) filter (where class_id = 1), sum(distinct id), sum(distinct id) filter (where sex = 'man') from student group by class_id;
   ```
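   To make the intended semantics concrete, here is a minimal Python sketch (not Spark code) of what `count(id) filter (where class_id = 1)` and `sum(distinct id) filter (where sex = 'man')` compute per `class_id`. The FILTER predicate first selects each aggregate's input rows, and DISTINCT is then applied only to the rows that pass it. The `student` rows and the `filtered_aggregates` helper are hypothetical and exist only for illustration.

   ```python
   from collections import defaultdict

   # Hypothetical rows of the `student` table: (id, class_id, sex).
   # The data is made up purely for illustration.
   STUDENT = [
       (1, 1, "man"),
       (2, 1, "woman"),
       (3, 1, "man"),
       (3, 1, "man"),   # duplicate id, so DISTINCT matters
       (4, 2, "man"),
       (5, 2, "woman"),
       (6, 2, "man"),
   ]

   def filtered_aggregates(rows):
       """Per class_id, compute
       count(id) FILTER (WHERE class_id = 1) and
       sum(DISTINCT id) FILTER (WHERE sex = 'man').
       The FILTER predicate selects each aggregate's input rows;
       DISTINCT is applied only to the rows that pass the filter."""
       groups = defaultdict(list)
       for row in rows:
           groups[row[1]].append(row)
       result = {}
       for class_id, grp in sorted(groups.items()):
           # count(id) ignores NULL ids and counts only rows passing the filter.
           cnt = sum(1 for (id_, cid, _) in grp if cid == 1 and id_ is not None)
           # De-duplicate only the ids that pass the filter.
           distinct_ids = {id_ for (id_, _, sex) in grp
                           if sex == "man" and id_ is not None}
           # SQL SUM over an empty input is NULL, modelled here as None.
           total = sum(distinct_ids) if distinct_ids else None
           result[class_id] = (cnt, total)
       return result

   print(filtered_aggregates(STUDENT))  # → {1: (4, 4), 2: (0, 10)}
   ```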
   
   **Note:**
   In https://github.com/apache/spark/pull/26656, we use `AggregationIterator` to evaluate the filter conditions of aggregate expressions. This works well because the filter can be evaluated locally, in the first aggregation.
   If we also used `AggregationIterator` here, the filter conditions of DISTINCT aggregate expressions would be evaluated in the second or third aggregation.
   To reduce cost, it is better to evaluate the filter conditions of DISTINCT aggregate expressions locally, in the first aggregation.
   So this PR uses `Expand` to ensure that these filters are evaluated locally.
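   The `Expand`-based approach can be modelled roughly as follows. This is a simplified Python sketch under stated assumptions, not the code in this PR. It assumes each filter is folded into the `Expand` projection as `if(cond, value, null)`, so rows that fail a filter contribute NULL (here `None`), which aggregates ignore. The first aggregation groups by `(class_id, gid, distinct value)`, which de-duplicates the DISTINCT input, and the second aggregation produces the final results. No filter has to be re-evaluated after `Expand`. The data and the `expand`, `first_aggregate`, `second_aggregate` and `rewritten_query` names are hypothetical.

   ```python
   from collections import defaultdict

   # Hypothetical `student` rows: (id, class_id, sex).
   STUDENT = [
       (1, 1, "man"), (2, 1, "woman"), (3, 1, "man"), (3, 1, "man"),
       (4, 2, "man"), (5, 2, "woman"), (6, 2, "man"),
   ]

   def expand(rows):
       """Emit one projected row per input row and per aggregate group (gid).
       gid 0 feeds count(id) FILTER (WHERE class_id = 1);
       gid 1 feeds sum(DISTINCT id) FILTER (WHERE sex = 'man').
       Assumption: each filter is evaluated here, on the local input row, as
       if(cond, value, null), so later stages never need the filter again."""
       for id_, class_id, sex in rows:
           # (group key, gid, regular agg input, distinct agg input)
           yield (class_id, 0, id_ if class_id == 1 else None, None)
           yield (class_id, 1, None, id_ if sex == "man" else None)

   def first_aggregate(expanded):
       """Group by (class_id, gid, distinct value). This de-duplicates the
       DISTINCT input; the regular count is computed partially here."""
       partial = defaultdict(int)
       for class_id, gid, regular, distinct in expanded:
           partial[(class_id, gid, distinct)] += 0 if regular is None else 1
       return partial

   def second_aggregate(partial):
       """Group by class_id and produce the final (count, sum) per class."""
       counts = defaultdict(int)
       sums = {}
       for (class_id, gid, distinct), cnt in partial.items():
           if gid == 0:
               counts[class_id] += cnt
           else:
               counts[class_id] += 0  # make sure the group appears
               if distinct is not None:  # filtered-out rows are NULL: skip
                   sums[class_id] = sums.get(class_id, 0) + distinct
       # A class with no passing distinct input gets NULL (None) for the sum.
       return {c: (counts[c], sums.get(c)) for c in sorted(counts)}

   def rewritten_query(rows):
       return second_aggregate(first_aggregate(expand(rows)))

   print(rewritten_query(STUDENT))
   ```

   With the sample rows this yields the same per-class results as applying the filters directly to each aggregate's input, namely `{1: (4, 4), 2: (0, 10)}`.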
   
   ### Why are the changes needed?
   Spark SQL currently only supports the FILTER clause on aggregate expressions without DISTINCT.
   This PR allows the FILTER clause to be used together with DISTINCT.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   Existing and new unit tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
