alamb commented on issue #8043:
URL: 
https://github.com/apache/arrow-datafusion/issues/8043#issuecomment-1806069322

   Ok, I spent a while debugging our problem today
   
   What is happening is that a `RepartitionExec` was accidentally keeping the `preserve_order` flag set even when its input wasn't sorted, which caused the wrong variant (`SortPreservingRepartitionExec`) to be used
   
   This diff fixed my problem:
   
   ```diff
   diff --git a/datafusion/physical-plan/src/repartition/mod.rs b/datafusion/physical-plan/src/repartition/mod.rs
   index 66f7037e5c..17babcd109 100644
   --- a/datafusion/physical-plan/src/repartition/mod.rs
   +++ b/datafusion/physical-plan/src/repartition/mod.rs
   @@ -644,16 +644,16 @@ impl RepartitionExec {
            })
        }
   
   -    /// Set Order preserving flag
   +
   +    /// Set Order preserving flag, which controls if this node is
   +    /// `RepartitionExec` or `SortPreservingRepartitionExec`. If the input is
   +    /// not ordered, or has only one partition, remains a `RepartitionExec`.
        pub fn with_preserve_order(mut self, preserve_order: bool) -> Self {
   -        // Set "preserve order" mode only if the input partition count is larger than 1
   -        // Because in these cases naive `RepartitionExec` cannot maintain ordering. Using
   -        // `SortPreservingRepartitionExec` is necessity. However, when input partition number
   -        // is 1, `RepartitionExec` can maintain ordering. In this case, we don't need to use
   -        // `SortPreservingRepartitionExec` variant to maintain ordering.
   -        if self.input.output_partitioning().partition_count() > 1 {
   -            self.preserve_order = preserve_order
   -        }
   +        self.preserve_order = preserve_order &&
   +                // If the input isn't ordered, there is no ordering to preserve
   +                self.input.output_ordering().is_some() &&
   +                // If there is only one input partition, merging is not required to maintain order
   +                self.input.output_partitioning().partition_count() > 1;
            self
        }
   ```
   
   I will prepare a PR with this fix shortly
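
To make the new condition easier to check at a glance, here is a minimal standalone sketch of the decision the patched `with_preserve_order` makes. It does not use DataFusion itself: `InputProps` and `effective_preserve_order` are hypothetical names invented for illustration, standing in for `self.input.output_ordering().is_some()` and `self.input.output_partitioning().partition_count()` from the diff.

```rust
/// Hypothetical stand-in for the two input properties the patch consults.
/// (Not a DataFusion type; invented for this sketch.)
#[derive(Debug, Clone, Copy)]
struct InputProps {
    /// Models `self.input.output_ordering().is_some()`.
    is_ordered: bool,
    /// Models `self.input.output_partitioning().partition_count()`.
    partition_count: usize,
}

/// Mirrors the boolean expression assigned to `preserve_order` in the diff:
/// the flag is kept only if it was requested, the input has an ordering,
/// and there is more than one input partition.
fn effective_preserve_order(requested: bool, input: InputProps) -> bool {
    requested && input.is_ordered && input.partition_count > 1
}

fn main() {
    let cases = [
        (true, InputProps { is_ordered: true, partition_count: 4 }),
        (true, InputProps { is_ordered: false, partition_count: 4 }),
        (true, InputProps { is_ordered: true, partition_count: 1 }),
        (false, InputProps { is_ordered: true, partition_count: 4 }),
    ];
    for (requested, input) in cases {
        println!(
            "requested={requested}, input={input:?} => preserve_order={}",
            effective_preserve_order(requested, input)
        );
    }
}
```

Only the first case keeps the flag. The second case (unsorted input with several partitions) is the one the old check let through, since it only looked at the partition count.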

