findepi commented on PR #16625: URL: https://github.com/apache/datafusion/pull/16625#issuecomment-3093779036
> I re-read the comments on this PR and I wonder if you tried implementing the solution suggested by @ozankabak in [#16625 (comment)](https://github.com/apache/datafusion/pull/16625#issuecomment-3019757986): > > > If getting this over the finish line in the short term is important to you, I think a reasonable step forward is to take a look at what solving it entails: There are basically two steps: (1) AggregateExec needs to consult the UDAF definition as it forms its required input ordering (this is the easy step), (2) The enforce sorting rule needs to address the case when there is already a sort with a prefix of a soft requirement, and just extend the sort keys (this is the harder step). > > I think the solution will ultimately be small in terms of LOC changes, but the second step will require some thinking. > > This PR seems similar except that it adds the `SoftRequirement` stage as well. If we could avoid the need for `SoftRequirement` I think this PR would be pretty great https://github.com/apache/datafusion/pull/16625#discussion_r2178514868 https://github.com/apache/datafusion/pull/16625#issuecomment-3029193406 I am not convinced it's actually desired to make the existing `Beneficial` behave like the new `SoftRequirement` does though. Consider first_value / top_1 function, which is O(n). It benefits from the input being sorted, becoming O(1) in such case. It does not, however, want to impose input sorting, as that would be O(n log n). This is different from e.g. sorted array_agg, which needs to sort either way. One can argue that sorting for "first_value order by a, b" can be added only if there already is some sorting on a. It's not a bad argument, but note that within the group of least a value, it's still O(group size) -> O(group size * log group size) change. Thus, it seems optimal to be able to distinguish functions that - benefit from ordering, e.g. first_value (`Beneficial`) - those are simply better if input can be ordered, e.g. ordered array_agg in this PR (`SoftRequirement` being added in this PR) - those which cannot execute if input is not pre-ordered, e.g. ordered array_agg before this PR (`HardRequirement`) - those which do not care about input ordering (`Insensitive`) Thus it makes sense for the `AggregateOrderSensitivity` to have 4 options. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org