An ExecPlan is composed of a set of implicit "pipelines". Each node in a pipeline (starting with a source node) implements `InputReceived` and `InputFinished`. On `InputReceived`, a node performs its computation and calls `InputReceived` on its output; on `InputFinished`, it performs any cleanup and calls `InputFinished` on its output. (Note that in the code, `outputs_` is a vector, but we only ever use `outputs_[0]`; this will probably get cleaned up at some point.) So there's an implicit pipeline of chained calls to `InputReceived`. Some nodes, such as Join, GroupBy, or Sort, are pipeline breakers: they must accumulate the whole dataset before performing their computation and kicking off the next pipeline. Pipeline breakers would make use of utilities like TaskGroup to schedule that work.
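In case a concrete picture helps, here's a minimal toy sketch of that shape. The types and names here are invented for illustration; this is not Arrow's actual `ExecNode` interface, which also deals in statuses, batch counts, and the `outputs_` vector mentioned above:

```cpp
// Toy sketch only -- simplified stand-ins, not Arrow's real classes.
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

// Stand-in for an ExecBatch: just a chunk of values.
struct Batch {
  std::vector<int> values;
};

// Simplified node interface.
class Node {
 public:
  virtual ~Node() = default;
  virtual void InputReceived(Batch batch) = 0;
  virtual void InputFinished() = 0;
  void SetOutput(Node* out) { output_ = out; }
 protected:
  Node* output_ = nullptr;  // plays the role of outputs_[0]
};

// A streaming node: compute on each batch and forward it immediately,
// so the pipeline is just a chain of InputReceived calls.
class FilterNode : public Node {
 public:
  void InputReceived(Batch batch) override {
    Batch out;
    for (int v : batch.values)
      if (v % 2 == 0) out.values.push_back(v);  // keep even values
    output_->InputReceived(std::move(out));
  }
  void InputFinished() override { output_->InputFinished(); }
};

// A pipeline breaker: accumulate the whole input, and only on
// InputFinished do the work and kick off the next pipeline.
class SortNode : public Node {
 public:
  void InputReceived(Batch batch) override {
    all_.insert(all_.end(), batch.values.begin(), batch.values.end());
  }
  void InputFinished() override {
    std::sort(all_.begin(), all_.end());
    output_->InputReceived(Batch{std::move(all_)});  // next pipeline starts here
    output_->InputFinished();
  }
 private:
  std::vector<int> all_;
};

class SinkNode : public Node {
 public:
  void InputReceived(Batch batch) override {
    for (int v : batch.values) std::cout << v << ' ';
  }
  void InputFinished() override { std::cout << "(finished)\n"; }
};

int main() {
  FilterNode filter;
  SortNode sort;
  SinkNode sink;
  filter.SetOutput(&sort);
  sort.SetOutput(&sink);
  filter.InputReceived({{5, 2, 8}});
  filter.InputReceived({{4, 7, 6}});
  filter.InputFinished();  // sort flushes: prints "2 4 6 8 (finished)"
}
```

The important part is the shape: FilterNode forwards each batch as it arrives, while SortNode buffers everything until `InputFinished` and only then starts the next pipeline.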
So the model of parallelism is driven by the source nodes: if your source node is multithreaded, then you may have several concurrent calls to `InputReceived`, and any state a node keeps has to be safe against that. Weston mentioned to me today that there may be a way to give some sort of guarantee of "almost-ordered" input, which may be enough to make streaming work well (you'd only have to accumulate at most `num_threads` extra batches in memory at a time). I'm not sure of the details, but that may be possible.

Hopefully the description of how parallelism works was at least helpful!

Sasha
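P.S. Here's a rough sketch of what a multithreaded source looks like, reusing the toy `Node`/`Batch` types from above. The `ThreadedSource` class is made up for illustration; it is not how Arrow's source nodes are actually written (they schedule work through the executor, and Arrow's real `InputFinished` carries a total batch count):

```cpp
// Toy sketch: a source that scans on several threads, so downstream
// InputReceived calls run concurrently. Not Arrow's real source-node code.
#include <atomic>
#include <thread>
#include <vector>

class ThreadedSource {
 public:
  explicit ThreadedSource(Node* out) : output_(out) {}

  void Run(const std::vector<Batch>& batches, int num_threads) {
    std::atomic<size_t> next{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
      workers.emplace_back([&] {
        // Each worker claims the next unprocessed batch, so downstream
        // InputReceived calls interleave in whatever order the scans
        // happen to finish.
        for (size_t i = next++; i < batches.size(); i = next++) {
          output_->InputReceived(batches[i]);
        }
      });
    }
    for (auto& w : workers) w.join();
    output_->InputFinished();  // only after every worker has drained
  }

 private:
  Node* output_;
};
```

Under a source like this, a pipeline breaker such as the SortNode above would need a mutex around its accumulation, and a streaming node that cares about order would have to buffer; the "almost-ordered" guarantee is what would bound that buffering at roughly `num_threads` extra batches.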