[ https://issues.apache.org/jira/browse/ARROW-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sasha Krassovsky reassigned ARROW-16522: ---------------------------------------- Assignee: Sasha Krassovsky > [C++] Evolution of exec plan > ---------------------------- > > Key: ARROW-16522 > URL: https://issues.apache.org/jira/browse/ARROW-16522 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ > Reporter: Weston Pace > Assignee: Sasha Krassovsky > Priority: Major > > This is a high level JIRA that will be broken into a number of subtasks. > There are a number of things that each node has to reinvent (or, more likely, > copy/paste) and it we probably know enough at this point that we can enhance > the ExecNode/ExecPlan API a bit to move some of the burden out of the nodes > and into the plan. At a high level: > * All nodes that schedule tasks use an async task group so they can make > sure that all scheduled tasks are finished before the node is marked finish. > Task scheduling could happen at the plan level and there would be one spot > keeping track of running tasks. > * If we have centralized scheduling then that opens up the door for a > centralized handling of errors and backpressure. A node's default error > behavior is then "stop scheduling tasks" and a node's default backpressure > behavior is "stop running tasks". > * There are no nodes that emit to multiple outputs today. Sending data to > multiple outputs is not simple. This is a pipeline breaking activity and new > tasks will need to be scheduled and the data will need to be held onto until > each downstream task has run, and backpressure may need to be applied if one > output is slower than the other. Node's should not be given the choice of > multiple outputs unless they are solving specifically that problem (e.g. > temporary table). -- This message was sent by Atlassian Jira (v8.20.7#820007)