beliefer opened a new pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
URL: https://github.com/apache/spark/pull/27428

### What changes were proposed in this pull request?
This PR is related to https://github.com/apache/spark/pull/26656, which only supports the FILTER clause on aggregate expressions without DISTINCT. This PR extends that feature so that the FILTER clause can also be used on one or more DISTINCT aggregate expressions.

For example:
```
select sum(distinct id) filter (where sex = 'man') from student;
select class_id, sum(distinct id) filter (where sex = 'man') from student group by class_id;
select count(id) filter (where class_id = 1), sum(distinct id) filter (where sex = 'man') from student;
select class_id, count(id) filter (where class_id = 1), sum(distinct id) filter (where sex = 'man') from student group by class_id;
select sum(distinct id), sum(distinct id) filter (where sex = 'man') from student;
select class_id, sum(distinct id), sum(distinct id) filter (where sex = 'man') from student group by class_id;
select class_id, count(id), count(id) filter (where class_id = 1), sum(distinct id), sum(distinct id) filter (where sex = 'man') from student group by class_id;
```

**Note:** In https://github.com/apache/spark/pull/26656, `AggregationIterator` handles the filter conditions of aggregate expressions. That works well because the filter can be evaluated locally, in the first aggregate. If `AggregationIterator` were used here as well, the filter conditions of DISTINCT aggregate expressions would be evaluated in the second or third aggregate. To reduce cost, it is better to evaluate those filter conditions locally, in the first aggregate. This PR therefore uses `Expand` to ensure that the evaluation happens locally.

### Why are the changes needed?
Spark SQL currently supports the FILTER clause only on aggregate expressions without DISTINCT. This PR allows FILTER and DISTINCT to be used on the same aggregate expression.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing and new UTs.
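
The idea in the note above (evaluate the filter per row before de-duplication, then aggregate the distinct values) can be illustrated with a small plain-Python sketch. This is not Spark code and not the implementation in this PR: the sample rows, the helper names `expand_local` and `sum_distinct_filtered`, and the choice to replace filtered-out values with `None` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the `student` table.
STUDENTS = [
    {"id": 1, "class_id": 1, "sex": "man"},
    {"id": 1, "class_id": 1, "sex": "man"},    # duplicate id, must count once
    {"id": 2, "class_id": 1, "sex": "woman"},  # rejected by the filter
    {"id": 3, "class_id": 2, "sex": "man"},
    {"id": 4, "class_id": 2, "sex": "man"},
]


def expand_local(rows, key, column, predicate):
    """Stage 1 (local): evaluate the filter on each row and emit
    (group key, value), with the value set to None when the filter fails.
    Assumption: this only mimics the effect of evaluating the filter early."""
    for row in rows:
        yield row[key], (row[column] if predicate(row) else None)


def sum_distinct_filtered(rows, key, column, predicate):
    """Stage 2: de-duplicate the projected values per group, then sum them,
    skipping None. A group with no surviving values yields None, like SQL NULL."""
    distinct = defaultdict(set)
    for group, value in expand_local(rows, key, column, predicate):
        distinct[group].add(value)

    result = {}
    for group, values in distinct.items():
        kept = [v for v in values if v is not None]
        result[group] = sum(kept) if kept else None
    return result


# Mirrors: select class_id, sum(distinct id) filter (where sex = 'man')
#          from student group by class_id;
print(sum_distinct_filtered(STUDENTS, "class_id", "id", lambda r: r["sex"] == "man"))
# {1: 1, 2: 7}
```

Because the predicate is applied in the first pass, later stages only see values that already passed the filter, which is the cost argument made in the note.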