An ExecPlan is composed of a set of implicit “pipelines”. Each node in a 
pipeline (starting with a source node) implements `InputReceived` and 
`InputFinished`. On `InputReceived`, it performs its computation and calls 
`InputReceived` on its output; on `InputFinished`, it performs any cleanup and 
calls `InputFinished` on its output. (Note that in the code, `outputs_` is a 
vector, but we only ever use `outputs_[0]`; that will probably get cleaned up 
at some point.) As such, there is an implicit pipeline of chained calls to 
`InputReceived`. Some nodes, such as Join, GroupBy, or Sort, are pipeline 
breakers: they must accumulate the whole dataset before performing their 
computation and starting off the next pipeline. Pipeline breakers make use of 
utilities like TaskGroup to manage the work they defer. 
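To make that concrete, here is a simplified sketch of the pattern. This is 
not the actual ExecNode API (which also deals with errors, backpressure, and 
so on); `Batch`, `ProjectNode`, and `SortNode` are stand-ins for illustration:

#include <utility>
#include <vector>

struct Batch { /* columnar data elided */ };

class Node {
 public:
  virtual ~Node() = default;
  virtual void InputReceived(Node* input, Batch batch) = 0;
  virtual void InputFinished(Node* input, int total_batches) = 0;

 protected:
  // A vector in the real code, though only outputs_[0] is ever used.
  std::vector<Node*> outputs_;
};

// A streaming node: compute on each batch as it arrives and immediately
// pass the result downstream, forming one link in the implicit pipeline.
class ProjectNode : public Node {
 public:
  void InputReceived(Node* input, Batch batch) override {
    outputs_[0]->InputReceived(this, Project(std::move(batch)));
  }
  void InputFinished(Node* input, int total_batches) override {
    // No state to clean up here; just propagate the signal.
    outputs_[0]->InputFinished(this, total_batches);
  }

 private:
  Batch Project(Batch batch) { /* per-batch computation elided */ return batch; }
};

// A pipeline breaker: hold every batch, and only once the input is
// finished do the computation and start off the next pipeline. (This
// ignores concurrency; see the note on multithreaded sources below.)
class SortNode : public Node {
 public:
  void InputReceived(Node* input, Batch batch) override {
    accumulated_.push_back(std::move(batch));
  }
  void InputFinished(Node* input, int total_batches) override {
    // Sort accumulated_ here (elided), then start the next pipeline.
    for (auto& batch : accumulated_) {
      outputs_[0]->InputReceived(this, std::move(batch));
    }
    outputs_[0]->InputFinished(this, static_cast<int>(accumulated_.size()));
  }

 private:
  std::vector<Batch> accumulated_;
};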

So the model of parallelism is driven by the source nodes: if your source node 
is multithreaded, then you may have several concurrent calls to 
`InputReceived`. Weston mentioned to me today that there may be a way to 
guarantee “almost-ordered” input, which may be enough to make streaming work 
well (you would only have to accumulate at most `num_threads` extra batches in 
memory at a time). I'm not sure of the details, but it may be possible. 
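I don't know the mechanism Weston has in mind, but to illustrate why the 
memory bound would be roughly `num_threads` batches: a small sequencing 
buffer in front of an order-sensitive node could hold back early batches and 
release them in order. Everything here (`SequencingBuffer`, the `Push` 
interface, the per-batch index tag) is hypothetical, building on the sketch 
above:

#include <cstdint>
#include <map>
#include <utility>

// Hypothetical reordering buffer. If the source tags each batch with its
// position in the stream and delivery is "almost ordered" (a batch arrives
// at most num_threads - 1 positions early), pending_ never holds more than
// num_threads - 1 batches. Synchronization (e.g. a mutex around Push) is
// elided.
class SequencingBuffer {
 public:
  explicit SequencingBuffer(Node* output) : output_(output) {}

  void Push(int64_t index, Batch batch) {
    pending_.emplace(index, std::move(batch));
    // Flush everything now contiguous with what we have already emitted.
    while (!pending_.empty() && pending_.begin()->first == next_) {
      output_->InputReceived(/*input=*/nullptr, std::move(pending_.begin()->second));
      pending_.erase(pending_.begin());
      ++next_;
    }
  }

 private:
  Node* output_;
  int64_t next_ = 0;                  // next index to emit downstream
  std::map<int64_t, Batch> pending_;  // early batches held back
};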

Hopefully the description of how parallelism works was at least helpful!

Sasha

> On Apr 26, 2022, at 12:54 PM, Li Jin <ice.xell...@gmail.com> wrote:
> 
> sure how they would output. (i.e., do they output batches / call
