kosiew opened a new pull request, #18607:
URL: https://github.com/apache/datafusion/pull/18607

   ## Which issue does this PR close?
   
   Closes #18109.
   
   ## Rationale for this change
   
   Previously, the SQL planner accepted `WITHIN GROUP` clauses for all 
aggregate UDAFs, even those that did not explicitly support ordered-set 
semantics. This behavior was too permissive and inconsistent with PostgreSQL. 
For example, queries such as `SUM(x) WITHIN GROUP (ORDER BY x)` were allowed, 
even though `SUM` is not an ordered-set aggregate.
   
   This PR enforces stricter validation so that only UDAFs that explicitly 
return `true` from `supports_within_group_clause()` may use `WITHIN GROUP`. All 
other aggregates now produce a clear planner error when this syntax is used.
   
   ## What changes are included in this PR?
   
   * Added type alias `WithinGroupExtraction` to simplify complex tuple return 
types used by helper functions.
   * Introduced a new helper method `extract_and_prepend_within_group_args` to 
centralize logic for handling `WITHIN GROUP` argument rewriting.
   * Updated the planner to:
   
     * Validate that only UDAFs with `supports_within_group_clause()` can 
accept `WITHIN GROUP`.
     * Prepend `WITHIN GROUP` ordering expressions to function arguments only 
for supported ordered-set aggregates.
     * Produce clear error messages when `WITHIN GROUP` is used incorrectly.
   * Added comprehensive unit tests verifying correct behavior and failure 
cases:
   
     * `WITHIN GROUP` rejected for non-ordered-set aggregates (`MIN`, `SUM`, 
etc.).
     * `WITHIN GROUP` accepted for ordered-set aggregates such as 
`percentile_cont`.
     * Validation for named arguments, multiple ordering expressions, and 
semantic conflicts with `OVER` clauses.
   * Updated SQL logic tests (`aggregate.slt`) to reflect new rejection 
behavior.
   * Updated documentation:
   
     * `aggregate_functions.md` and developer docs to clarify when and how 
`WITHIN GROUP` can be used.
     * `upgrading.md` to inform users of this stricter enforcement and 
migration guidance.
   
   ## Are these changes tested?
   
   ✅ Yes.
   
   * New tests in `sql_integration.rs` validate acceptance, rejection, and 
argument behavior of `WITHIN GROUP` for both valid and invalid cases.
   * SQL logic tests (`aggregate.slt`) include negative test cases confirming 
planner rejections.
   
   ## Are there any user-facing changes?
   
   ✅ Yes.
   
   * Users attempting to use `WITHIN GROUP` with regular aggregates (e.g. 
`SUM`, `AVG`, `MIN`, `MAX`) will now see a planner error:
   
     > `WITHIN GROUP is only supported for ordered-set aggregate functions`
   
   * Documentation has been updated to clearly describe `WITHIN GROUP` 
semantics and provide examples of valid and invalid usage.
   
   No API-breaking changes were introduced; only stricter planner validation 
and improved error messaging.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to