andygrove opened a new issue, #2292: URL: https://github.com/apache/datafusion-comet/issues/2292
### What is the problem the feature request solves?

The expression `COUNT(DISTINCT expr)` is relatively common and is used in TPC-H, so it would be good to be able to accelerate it in Comet. Spark supports multiple expressions, e.g. `COUNT(DISTINCT a, b, c)`, but DataFusion does not, so we should only attempt to accelerate this when there is a single input expression.

Implementing this feature is not trivial because of some design issues with how we currently support partial aggregates. Specifically, we do not report the correct output schema from the partial aggregate. For the aggregate expressions that we currently support, this does not matter because the partial and final aggregates have the same output type. For example, `SUM(int_column)` produces a `long` for both the partial and the final aggregate. For `COUNT(DISTINCT int_column)`, however, the output of the partial aggregate is a **list** of int, while the output of the final aggregate is a long.

### Describe the potential solution

_No response_

### Additional context

_No response_
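To make the partial/final output-type mismatch concrete, here is a minimal, self-contained Rust sketch. It does not use the real DataFusion or Comet APIs: `DataType`, `PhaseTypes`, and all of the functions below are hypothetical stand-ins. It models only two things: the output type each phase would need to report, and the values each phase produces when the input is split across two partitions.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for an Arrow-style data type.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    List(Box<DataType>),
}

/// Output type of each phase of a two-phase (partial/final) aggregate.
struct PhaseTypes {
    partial_output: DataType,
    final_output: DataType,
}

/// SUM over an integer column: both phases produce a 64-bit sum.
fn sum_types(_input: &DataType) -> PhaseTypes {
    PhaseTypes { partial_output: DataType::Int64, final_output: DataType::Int64 }
}

/// COUNT(DISTINCT x): the partial phase emits the distinct values it has seen
/// (a list of the input type); the final phase emits the count.
fn count_distinct_types(input: &DataType) -> PhaseTypes {
    PhaseTypes {
        partial_output: DataType::List(Box::new(input.clone())),
        final_output: DataType::Int64,
    }
}

/// Partial COUNT(DISTINCT): collect this partition's distinct values.
fn count_distinct_partial(rows: &[i32]) -> Vec<i32> {
    let set: HashSet<i32> = rows.iter().copied().collect();
    let mut values: Vec<i32> = set.into_iter().collect();
    values.sort_unstable(); // deterministic order, for readability only
    values
}

/// Final COUNT(DISTINCT): merge the per-partition lists, then count.
fn count_distinct_final(partials: &[Vec<i32>]) -> i64 {
    let merged: HashSet<i32> = partials.iter().flatten().copied().collect();
    merged.len() as i64
}

/// Partial SUM: widen to i64 and sum this partition.
fn sum_partial(rows: &[i32]) -> i64 {
    rows.iter().map(|&x| i64::from(x)).sum()
}

/// Final SUM: add up the partial sums (same type in and out).
fn sum_final(partials: &[i64]) -> i64 {
    partials.iter().sum()
}

fn main() {
    let input = DataType::Int32;

    // SUM: reporting the final type for the partial phase happens to be correct.
    let sum = sum_types(&input);
    assert_eq!(sum.partial_output, sum.final_output);

    // COUNT(DISTINCT): the partial phase must report List<Int32>, not Int64.
    let cd = count_distinct_types(&input);
    assert_ne!(cd.partial_output, cd.final_output);

    // Two "partitions" of an int column.
    let p1 = [1, 2, 2, 3];
    let p2 = [3, 4, 4];

    let states = vec![count_distinct_partial(&p1), count_distinct_partial(&p2)];
    println!("partial states: {:?}", states); // [[1, 2, 3], [3, 4]]
    println!("COUNT(DISTINCT): {}", count_distinct_final(&states)); // 4

    let sums = vec![sum_partial(&p1), sum_partial(&p2)];
    println!("SUM: {}", sum_final(&sums)); // 19
}
```

The design point the sketch illustrates: a plan that reports the final output type for every stage works for `SUM` only by coincidence. For `COUNT(DISTINCT ...)`, the schema reported for the partial stage would need to be derived from the aggregate's intermediate state (here, a list of the input type) rather than from its final result.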
