[GitHub] [arrow] westonpace commented on pull request #10204: [WIP] ARROW-11928: [C++] Execution engine API

GitBox Fri, 07 May 2021 12:15:30 -0700


westonpace commented on pull request #10204:
URL: https://github.com/apache/arrow/pull/10204#issuecomment-834711745



   This article has an interesting description of how Materialize uses DAGs 
(which imply that a node's output can feed multiple consumers) to optimize 
query plans: https://scattered-thoughts.net/writing/materialize-decorrelation
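   To make "multiple outputs" concrete, here is a minimal, purely illustrative 
sketch of a plan DAG in which a shared scan feeds two consumers. `PlanNode` and 
`output_counts` are hypothetical names, not Arrow or Materialize APIs. Counting 
consumers per node shows which outputs would have to be multicast.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical plan node; `inputs` are the nodes whose output it consumes."""

    name: str
    inputs: list[PlanNode] = field(default_factory=list)


def output_counts(sinks: list[PlanNode]) -> Counter[str]:
    """Count how many consumers read each node's output.

    Each node is visited once (shared subplans are not re-walked), so every
    edge is counted exactly once. A count > 1 marks a multicast point.
    """
    counts: Counter[str] = Counter()
    seen: set[int] = set()
    stack = list(sinks)
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        for child in node.inputs:
            counts[child.name] += 1
            stack.append(child)
    return counts


# Usage: one scan shared by a filter branch and an aggregate branch.
scan = PlanNode("scan")
plan = PlanNode("join", [PlanNode("filter", [scan]), PlanNode("aggregate", [scan])])
print(output_counts([plan])["scan"])  # 2: scan's output feeds two consumers
```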
   
   I don't know nearly enough to say how common or essential this is.
   
   As for complications, supporting multiple outputs introduces buffering (in 
both pull and push models).  While you are delivering a result to consumer 1, 
you have to buffer it so that you can later deliver it to consumer 2.  If your 
query plan's bottleneck is on the consumer 1 path, results could accumulate in 
the multicasting operator, and you would need to trigger backpressure.
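   A minimal sketch of such a multicasting operator, assuming one FIFO queue 
per consumer and a high/low-water-mark backpressure policy. The `Multicast` 
class, `high_water`, and `low_water` are hypothetical illustrations, not part 
of the Arrow API.

```python
from collections import deque


class Multicast:
    """Hypothetical fan-out operator that delivers every batch to each consumer.

    Each consumer has its own queue, so a batch stays referenced until the
    slowest consumer has taken it. When the deepest queue reaches `high_water`
    the producer is asked to pause; it may resume once every queue has drained
    to `low_water` or below.
    """

    def __init__(self, num_consumers: int, high_water: int, low_water: int):
        if num_consumers < 1:
            raise ValueError("need at least one consumer")
        if not 0 <= low_water < high_water:
            raise ValueError("need 0 <= low_water < high_water")
        self.queues = [deque() for _ in range(num_consumers)]
        self.high_water = high_water
        self.low_water = low_water
        self.paused = False  # True while upstream has been asked to pause

    def _max_depth(self) -> int:
        return max(len(q) for q in self.queues)

    def push(self, batch) -> bool:
        """Buffer `batch` for every consumer; return True if upstream should pause."""
        for q in self.queues:
            q.append(batch)  # same object in each queue: references, not copies
        if self._max_depth() >= self.high_water:
            self.paused = True
        return self.paused

    def pull(self, consumer: int):
        """Return the next batch for `consumer`, or None if its queue is empty."""
        q = self.queues[consumer]
        batch = q.popleft() if q else None
        if self.paused and self._max_depth() <= self.low_water:
            self.paused = False  # slowest consumer caught up: resume upstream
        return batch


# Usage: consumer 0 keeps up, consumer 1 never pulls.
m = Multicast(num_consumers=2, high_water=3, low_water=1)
for i in range(3):
    should_pause = m.push(i)
    m.pull(0)
print(should_pause)  # True: consumer 1 now has 3 batches queued
```

Keying backpressure on the deepest queue means the slowest consumer sets the 
pace for every branch, which is the cost of multicasting described above.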
   
   That's the main complication that comes to mind.  That said, this 
"multicasting" is one of the more confusing aspects of Rx (Reactive 
Extensions), but that may just stem from the dynamic and linear way in which 
observers are chained.  Since you're already building a graph (which presumably 
does not change for the duration of the execution), that shouldn't be a problem.


