[GitHub] [arrow-datafusion] alamb commented on pull request #1776: Update `ExecutionPlan` to know about sortedness and repartitioning optimizer pass respect the invariants

GitBox Tue, 08 Feb 2022 09:07:53 -0800


alamb commented on pull request #1776:
URL: 
https://github.com/apache/arrow-datafusion/pull/1776#issuecomment-1032853053



   > However, I am a little bit uncertain about `output_ordering`. My 
understanding is it is present to allow repartitioning of branches with 
order-sensitive operators, such as limit, but no explicit order.
   
   I think that is correct. The specific case that `output_order` is required 
at the moment to get correct is distinguishing between
   
   ```
   Limit
   Filter
   Scan
   ```
   
   And 
   ```
   Limit
   Sort
   Scan
   ```
   
   
   > I worry that this will lead two classes of hard to track down bugs:
   > 
   > 1. ExecutionPlan that incorrectly report `None` for `output_ordering`
   > 2. Plans that make assumptions about ordering without encoding this into 
Datafusion
   
   yes, I think these are indeed two classes of hard to track down bugs that 
can/will occur if DataFusion starts optimizing based on sort orders.  (cc 
@NGA-TRAN). One might argue that we already have one example of such a bug in 
https://github.com/apache/arrow-datafusion/issues/423 😆 . I will add some more 
comments to try and make it harder to forget. 
   
   > I guess I just wonder if this is really worth the potential headaches 
:sweat_smile: 
   
   Well the real question is what is the alternative 🤔  Some thoughts are:
   1. Be conservative for operators like `Limit` and simply don't repartition / 
do anything to their inputs
   2. I could also special case Limit (for example look for a `SortExec` or 
`SortPreservingMerge` anywhere below it)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] alamb commented on pull request #1776: Update `ExecutionPlan` to know about sortedness and repartitioning optimizer pass respect the invariants

Reply via email to