zhuqi-lucas commented on PR #21976:
URL: https://github.com/apache/datafusion/pull/21976#issuecomment-4384563140

   cc @alamb @adriangb @xudong963  @Dandandan @2010YOUY01 
   
   We've hit several production bugs caused by EnforceDistribution and 
EnforceSorting running as separate rules — they're coupled through 
SortExec.preserve_partitioning, which makes the composition non-idempotent.
   Other engines (Spark's EnsureRequirements, Trino's AddExchanges) handle both 
in a single rule, which sidesteps this entire bug class.
   
    I initially attempted a true single-pass rewrite, but the interactions 
between distribution and sorting requirements turned out to be deeper than 
anticipated.
   
   This PR takes a more conservative approach — composing the existing rules 
into one with the necessary correctness fixes. A clean single-pass design is 
left as a follow-up once the foundation is in place.
   
   If a default change feels too risky for a first step, I'm happy to gate it 
behind a config flag (e.g. `datafusion.optimizer.use_ensure_requirements`) so 
users can opt in and we can validate equivalence in CI before flipping the 
default.
   
   Would appreciate any thoughts on the overall direction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to