drusso commented on a change in pull request #8222:
URL: https://github.com/apache/arrow/pull/8222#discussion_r500984291



##########
File path: rust/datafusion/src/physical_plan/aggregates.rs
##########
@@ -124,14 +126,30 @@ pub fn create_aggregate_expr(
 
     let return_type = return_type(&fun, &arg_types)?;
 
-    Ok(match fun {
-        AggregateFunction::Count => {
+    Ok(match (fun, distinct) {
+        (AggregateFunction::Count, false) => {
             Arc::new(expressions::Count::new(arg, name, return_type))
         }
-        AggregateFunction::Sum => Arc::new(expressions::Sum::new(arg, name, 
return_type)),
-        AggregateFunction::Min => Arc::new(expressions::Min::new(arg, name, 
return_type)),
-        AggregateFunction::Max => Arc::new(expressions::Max::new(arg, name, 
return_type)),
-        AggregateFunction::Avg => Arc::new(expressions::Avg::new(arg, name, 
return_type)),
+        (AggregateFunction::Count, true) => {
+            Arc::new(distinct_expressions::DistinctCount::new(
+                arg_types,
+                args.clone(),
+                name,
+                return_type,
+            ))
+        }
+        (AggregateFunction::Sum, _) => {

Review comment:
       Sounds good, and I will do the same for `Avg`. I will leave `Min` and 
`Max` as `_` since the `distinct` flag shouldn't have any effect on these 
aggregations.
   
   The only change in behaviour to highlight would be that existing queries 
that happen to include a `sum(distinct)` or `avg(distinct)` would start to 
fail, rather than be handled implicitly as a plain `sum()` or `avg()`, but I 
believe this would be a non-issue. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to