[I] Add support for `COUNT(DISTINCT expr)` [datafusion-comet]

via GitHub Wed, 03 Sep 2025 09:20:57 -0700


andygrove opened a new issue, #2292:
URL: https://github.com/apache/datafusion-comet/issues/2292


   ### What is the problem the feature request solves?
   
   The expression `COUNT(DISTINCT expr)` is relatively common and it is used in 
TPC-H, so it would be good to be able to accelerate this in Comet.
   
   Spark supports multiple expressions, e.g. `COUNT(DISTINCT a, b, c)`, but 
DataFusion does not, so we should only attempt to accelerate this if there is a 
single input expression.
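
The single-expression restriction above could be sketched as a simple guard. This is only an illustration: the function name `can_accelerate_count_distinct` and the representation of the arguments as a list of strings are hypothetical and are not taken from Comet's code base.

```python
from typing import Sequence


def can_accelerate_count_distinct(distinct_args: Sequence[str]) -> bool:
    """Hypothetical guard: accept COUNT(DISTINCT ...) only with one argument.

    `distinct_args` stands in for the input expressions of the aggregate,
    e.g. ["a"] for COUNT(DISTINCT a) or ["a", "b", "c"] for
    COUNT(DISTINCT a, b, c). Anything other than exactly one expression
    would be left to Spark instead of being accelerated.
    """
    return len(distinct_args) == 1
```

With this sketch, `COUNT(DISTINCT a)` would be eligible for acceleration, while `COUNT(DISTINCT a, b, c)` would fall back to Spark.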
   
   Implementing this feature is not trivial because there are some design 
issues with how we currently support partial aggregates. Specifically, we do 
not report the correct output schema from the partial aggregate. For the 
aggregate expressions that we currently support, this doesn't matter because the 
output type of the partial and final aggregates is the same. For example, 
`SUM(int_column)` will have the output type `long` for both partial and final. 
For `COUNT(DISTINCT int_column)` the output of the partial will be a **list** 
of int and the output of the final will be a long.
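
The schema mismatch described above can be illustrated with a small, self-contained model of two-phase aggregation. This is a sketch in plain Python, not Comet or DataFusion code: the function names are hypothetical, each partition is modelled as a list of optional integers, and NULLs are assumed to be skipped as in standard SQL aggregates.

```python
from typing import Iterable, List, Optional


def sum_partial(values: Iterable[Optional[int]]) -> int:
    # Partial SUM state is a single number, the same type as the final result.
    return sum(v for v in values if v is not None)


def sum_final(partials: Iterable[int]) -> int:
    # Final SUM merges the partial numbers into one number.
    return sum(partials)


def count_distinct_partial(values: Iterable[Optional[int]]) -> List[int]:
    # Partial COUNT(DISTINCT) state is the list of distinct values seen in
    # this partition, so its type is list-of-int, not a count.
    return sorted({v for v in values if v is not None})


def count_distinct_final(partials: Iterable[List[int]]) -> int:
    # Final COUNT(DISTINCT) unions the per-partition lists and returns a count.
    merged = set()
    for partial in partials:
        merged.update(partial)
    return len(merged)


# Three hypothetical partitions of an integer column.
partitions = [[1, 2, 2, 3], [3, 4, None, 4], [1, 5]]

sum_states = [sum_partial(p) for p in partitions]
distinct_states = [count_distinct_partial(p) for p in partitions]

total = sum_final(sum_states)
distinct_count = count_distinct_final(distinct_states)
```

In this model the `SUM` states are plain integers at both stages, while the `COUNT(DISTINCT)` states are lists that only become a single number in the final stage. That is why the partial aggregate has to report its own output schema rather than reuse the final one.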
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

