Jefffrey commented on code in PR #18607:
URL: https://github.com/apache/datafusion/pull/18607#discussion_r2514628553


##########
docs/source/user-guide/sql/aggregate_functions.md:
##########
@@ -48,6 +48,28 @@ FROM employees;
 
 Note: When no rows pass the filter, `COUNT` returns `0` while 
`SUM`/`AVG`/`MIN`/`MAX` return `NULL`.
 
+## WITHIN GROUP / Ordered-set aggregates
+
+Some aggregate functions support the SQL `WITHIN GROUP (ORDER BY ...)` clause. 
This clause is used by
+ordered-set aggregate functions (for example, percentile and rank-like 
aggregations) to specify the ordering
+of inputs that the aggregate relies on. In DataFusion, only aggregate 
functions that explicitly opt into
+ordered-set semantics via their implementation will accept `WITHIN GROUP`; 
attempting to use `WITHIN GROUP`
+with a regular aggregate (for example `SUM(x) WITHIN GROUP (ORDER BY x)`) will 
fail during planning with an
+error. This matches Postgres semantics and ensures ordered-set behavior is 
opt-in for user-defined aggregates.
+
+Example (ordered-set aggregate):
+
+```sql
+percentile_cont(0.5) WITHIN GROUP (ORDER BY value)
+```
+
+Example (invalid usage — planner will error):
+
+```sql
+-- This will fail: SUM is not an ordered-set aggregate
+SELECT SUM(x) WITHIN GROUP (ORDER BY x) FROM t;
+```

Review Comment:
   If we decide to add this to our documentation we should be explicit about 
which functions are ordered set aggregates. As it is, the user cannot tell by 
this alone.



##########
datafusion/sql/src/expr/function.rs:
##########
@@ -490,31 +496,35 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
                 let (mut args, mut arg_names) =
                     self.function_args_to_expr_with_names(args, schema, 
planner_context)?;
 
-                let order_by = if fm.supports_within_group_clause() {
-                    let within_group = self.order_by_to_sort_expr(
-                        within_group,
-                        schema,
-                        planner_context,
-                        false,
-                        None,
-                    )?;
-
-                    // Add the WITHIN GROUP ordering expressions to the front 
of the argument list
-                    // So function(arg) WITHIN GROUP (ORDER BY x) becomes 
function(x, arg)
-                    if !within_group.is_empty() {
-                        // Prepend None arg names for each WITHIN GROUP 
expression
-                        let within_group_count = within_group.len();
-                        arg_names = std::iter::repeat_n(None, 
within_group_count)
-                            .chain(arg_names)
-                            .collect();
-
-                        args = within_group
-                            .iter()
-                            .map(|sort| sort.expr.clone())
-                            .chain(args)
-                            .collect::<Vec<_>>();
-                    }
-                    within_group
+                // Enforce explicit opt-in for WITHIN GROUP: UDAFs must 
advertise
+                // `supports_within_group_clause()` to accept WITHIN GROUP 
syntax.
+                // Previously the planner forwarded WITHIN GROUP to any
+                // order-sensitive aggregate; that was too permissive.
+                let supports_within_group = fm.supports_within_group_clause();
+
+                if !within_group.is_empty() && !supports_within_group {
+                    return plan_err!(
+                        "WITHIN GROUP is only supported for ordered-set 
aggregate functions"
+                    );
+                }
+
+                // If the UDAF opted into WITHIN GROUP handling, convert the
+                // WITHIN GROUP ordering into sort expressions and prepend the
+                // ordering expressions to the function argument list (as 
unnamed
+                // args). This helper centralizes the argument-prepending 
protocol
+                // so it's easier to maintain and reason about.

Review Comment:
   Ditto for these comments



##########
datafusion/sql/src/expr/function.rs:
##########
@@ -807,6 +817,41 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
         Ok((exprs, names))
     }
 
+    fn extract_and_prepend_within_group_args(
+        &self,
+        within_group: Vec<OrderByExpr>,
+        args: Vec<Expr>,
+        arg_names: Vec<Option<String>>,
+        schema: &DFSchema,
+        planner_context: &mut PlannerContext,
+    ) -> Result<WithinGroupExtraction> {
+        let mut args = args;
+        let mut arg_names = arg_names;

Review Comment:
   ```suggestion
           mut args: Vec<Expr>,
           mut arg_names: Vec<Option<String>>,
           schema: &DFSchema,
           planner_context: &mut PlannerContext,
       ) -> Result<WithinGroupExtraction> {
   ```



##########
datafusion/sql/src/expr/function.rs:
##########
@@ -212,6 +213,11 @@ impl FunctionArgs {
     }
 }
 
+// Type alias to simplify very complex tuple return types used by helper
+// functions in this module. Using a named alias reduces the type complexity
+// shown in compiler errors and improves readability.

Review Comment:
   These comments feel overly verbose
   
   Also having this type alias so far away from its only usage feels odd to me



##########
datafusion/sql/tests/sql_integration.rs:
##########


Review Comment:
   I'm not sure adding these tests here are necessary; SLTs should be 
sufficient and I believe some of the tests added here are already in SLTs, e.g.
   
   
https://github.com/apache/datafusion/blob/d57e215f95fb2a015a825ceec1849edb32d189da/datafusion/sqllogictest/test_files/aggregate.slt#L174-L179



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to