Thanks for your quick replies, Bobby and Arun!

On 8 Feb 2016, at 18:03, Arun Mahadevan <[email protected]> wrote:
> The execute phase is pipelined and only the commits are strictly ordered.
>
> So a trident bolt could receive tuples from batch1, batch2 and again batch1
> and so on. The framework internally maintains separate context for each batch
> and the execute is invoked with the respective batch’s context. The bolts
> could also emit tuples which are forwarded to the next bolt in the DAG
> without waiting for the batch to complete.

Just to make sure I get this right: the intermixing of tuples from different batches only happens when pipelining is enabled, doesn’t it? So, could the properties be summarized as follows?

Without pipelining: Tuples are assigned to a batch and emitted as soon as possible. When all tuples of a batch have completed processing, a commit is issued, and only afterwards do tuples of the next batch begin processing.

With pipelining: Tuples assigned to multiple different batches (at most `topology.max.spout.pending` batches) may be active at a time. When all tuples of a batch have completed processing, results from that batch are committed. While one commit is still in progress, no other commit is started.

Regards,
Felix
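
P.S. To make the two cases concrete, here is a toy model of the behaviour as I understand it. It is plain Python, not Storm code: the function `simulate`, the round-robin interleaving and the event tuples are all my own assumptions for illustration, and I am assuming that a pending limit of 1 corresponds to the non-pipelined case. It only shows that tuples of several batches can interleave in the execute phase while commits still happen strictly in batch order.

```python
from collections import deque


def simulate(batches, max_pending):
    """Toy model (not Storm's implementation) of batch execution.

    batches:     list of lists; batches[i] holds the tuples of batch i + 1.
    max_pending: hypothetical stand-in for topology.max.spout.pending,
                 i.e. how many uncommitted batches may be in flight (>= 1).
    Returns a log of ("execute", batch_id, tuple) and ("commit", batch_id).
    """
    log = []
    pending = deque((i + 1, deque(t)) for i, t in enumerate(batches))
    active = deque()   # batches that still have tuples to execute
    done = set()       # fully executed batches waiting for their commit turn
    next_commit = 1    # commits must happen in this order

    while pending or active:
        # Admit new batches as long as the in-flight limit allows it.
        while pending and len(active) + len(done) < max_pending:
            active.append(pending.popleft())

        # Execute one tuple of the next active batch (round-robin, an
        # assumption; the real interleaving depends on the topology).
        batch_id, tuples = active.popleft()
        if tuples:
            log.append(("execute", batch_id, tuples.popleft()))
        if tuples:
            active.append((batch_id, tuples))
        else:
            done.add(batch_id)

        # Commit strictly in batch order, one commit after the other.
        while next_commit in done:
            done.remove(next_commit)
            log.append(("commit", next_commit))
            next_commit += 1

    return log


if __name__ == "__main__":
    example = [["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]
    for limit in (1, 2):
        print(f"max_pending={limit}")
        for event in simulate(example, limit):
            print("  ", event)
```

With a limit of 1 each batch is executed and committed before the next one starts. With a limit of 2, batch 2 finishes executing before batch 1 but its commit still waits until batch 1 has been committed.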
