zhuqi-lucas commented on PR #21976:
URL: https://github.com/apache/datafusion/pull/21976#issuecomment-4384563140
cc @alamb @adriangb @xudong963 @Dandandan @2010YOUY01
We've hit several production bugs caused by EnforceDistribution and
EnforceSorting running as separate rules — they're coupled through
SortExec.preserve_partitioning, which makes the composition non-idempotent.
Other engines (Spark's EnsureRequirements, Trino's AddExchanges) handle both
in a single rule, which sidesteps this entire bug class.
I initially attempted a true single-pass rewrite, but the interactions
between distribution and sorting requirements turned out to be deeper than
anticipated.
This PR takes a more conservative approach — composing the existing rules
into one with the necessary correctness fixes. A clean single-pass design is
left as a follow-up once the foundation is in place.
If a default change feels too risky for a first step, I'm happy to gate it
behind a config flag (e.g. `datafusion.optimizer.use_ensure_requirements`) so
users can opt in and we can validate equivalence in CI before flipping the
default.
Would appreciate any thoughts on the overall direction.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]