pepijnve commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2930199048

   > Making the rule smarter, by utilizing EmissionType information, so that it 
only adds YieldExecs when necessary. If the path from a leaf to the root does 
not involve any operator that is pipeline-breaking, there is no need to insert 
a YieldExec as the parent of that leaf. I think this is similar to @pepijnve's 
thinking.
   
   My thinking was that we could use EmissionType to insert the yield wrapper 
closer to where it's needed rather than at the leaves.
   
   So if you have
   
   ```
   AggregateExec -> final
     FilterExec -> copy input behavior
       ProjectionExec -> copy input behavior
         DataSourceExec -> incremental
   ```
   
   rather than adding a YieldExec as the parent of the leaves
   
   ```
   AggregateExec -> final
     FilterExec -> copy input behavior
       ProjectionExec -> copy input behavior
         YieldExec -> copy input behavior
           DataSourceExec -> incremental
   ```
   
   you would add it as the parent of each child of the node with emission 
type final
   
   ```
   AggregateExec -> final
     YieldExec -> copy input behavior
       FilterExec -> copy input behavior
         ProjectionExec -> copy input behavior
           DataSourceExec -> incremental
   ```
   
   I'll admit that I didn't work out all the possible scenarios, but my 
reasoning is that if you have more than one pipeline-breaking operator in a 
chain, this will work more reliably, since you're 'fixing' each of them 
directly rather than injecting yield points at the leaves and hoping that this 
propagates through the entire chain.
   An additional benefit might be that the plan transformation is simpler to 
implement this way, since it only requires very local context (i.e. the 
'final' nodes themselves).
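
To illustrate how local the transformation is, here is a rough sketch on a toy plan tree. Everything in it is hypothetical: `Node`, `Emission`, `insert_yield_below_final`, `render` and `example_plan` are stand-ins made up for this comment, not DataFusion's actual `ExecutionPlan` or optimizer rule API, and `CopyInput` merely models the "copy input behavior" entries in the plans above.

```rust
// Toy model only: these types are hypothetical stand-ins, not DataFusion's API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Emission {
    Incremental,
    Final,
    CopyInput, // models the "copy input behavior" entries in the plans above
}

#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: String,
    emission: Emission,
    children: Vec<Node>,
}

impl Node {
    fn new(name: &str, emission: Emission, children: Vec<Node>) -> Self {
        Node { name: name.to_string(), emission, children }
    }
}

/// Rewrites the plan bottom-up, wrapping every child of a `Final` node in a
/// yield node. Only the node's own emission type is inspected.
fn insert_yield_below_final(node: Node) -> Node {
    let Node { name, emission, children } = node;
    let children = children
        .into_iter()
        .map(insert_yield_below_final)
        .map(|child| {
            // Skip children that are already yield nodes so the rewrite is idempotent.
            if emission == Emission::Final && child.name != "YieldExec" {
                Node::new("YieldExec", Emission::CopyInput, vec![child])
            } else {
                child
            }
        })
        .collect();
    Node { name, emission, children }
}

/// Renders the plan as an indented list of operator names, one per line.
fn render(node: &Node) -> String {
    fn walk(node: &Node, depth: usize, out: &mut String) {
        out.push_str(&"  ".repeat(depth));
        out.push_str(&node.name);
        out.push('\n');
        for child in &node.children {
            walk(child, depth + 1, out);
        }
    }
    let mut out = String::new();
    walk(node, 0, &mut out);
    out
}

/// Builds the four-operator example plan from this comment.
fn example_plan() -> Node {
    let source = Node::new("DataSourceExec", Emission::Incremental, vec![]);
    let projection = Node::new("ProjectionExec", Emission::CopyInput, vec![source]);
    let filter = Node::new("FilterExec", Emission::CopyInput, vec![projection]);
    Node::new("AggregateExec", Emission::Final, vec![filter])
}
```

Calling `render(&insert_yield_below_final(example_plan()))` gives the third plan shape above, with `YieldExec` directly under `AggregateExec`. A chain with several 'final' operators would get one yield node under each of them, which is the scenario I have in mind, though I have not checked this against the real operators.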


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
