Hi Felix,

What you have described is correct. The commits are ordered, i.e. batch 1, batch 2, etc., in that order, even if the tuples of batch 2 complete processing before those of batch 1 (with pipelining).
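To illustrate the ordering guarantee, here is a toy model in Python. It is only a sketch of the idea, not Storm's actual implementation: the `OrderedCommitter` class and its `batch_finished` method are hypothetical names made up for this example, and it assumes batch ids are consecutive integers starting at 1.

```python
import heapq


class OrderedCommitter:
    """Toy model: batches may finish executing in any order,
    but commits are released strictly in batch-id order."""

    def __init__(self, first_batch_id=1):
        self.next_to_commit = first_batch_id
        self.finished = []   # min-heap of batch ids that finished executing
        self.committed = []  # batch ids in the order they were committed

    def batch_finished(self, batch_id):
        """Record that all tuples of batch_id completed processing,
        then commit every batch that is now next in line."""
        heapq.heappush(self.finished, batch_id)
        while self.finished and self.finished[0] == self.next_to_commit:
            self.committed.append(heapq.heappop(self.finished))
            self.next_to_commit += 1
        return list(self.committed)


committer = OrderedCommitter()
# Batch 2 finishes executing first: its commit has to wait for batch 1.
after_batch2 = committer.batch_finished(2)
# Batch 1 finishes: batch 1 is committed, then the waiting batch 2.
after_batch1 = committer.batch_finished(1)
print(after_batch2, after_batch1)
```

The heap only holds batches that are done executing but still waiting for an earlier batch to commit, which is the situation pipelining makes possible.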
- Arun

On 2/9/16, 4:55 AM, "Felix Dreissig" <[email protected]> wrote:

>Thanks for your quick replies, Bobby and Arun!
>
>On 8 Feb 2016, at 18:03, Arun Mahadevan <[email protected]> wrote:
>> The execute phase is pipelined and only the commits are strictly ordered.
>>
>> So a trident bolt could receive tuples from batch1, batch2 and again batch1
>> and so on. The framework internally maintains separate context for each
>> batch and the execute is invoked with the respective batch’s context. The
>> bolts could also emit tuples which are forwarded to the next bolt in the DAG
>> without waiting for the batch to complete.
>
>Just to make sure I get this right: The intermixing of tuples from different
>batches only happens when pipelining is enabled, doesn’t it?
>
>So, could the properties be summarized as follows?
>Without pipelining: Tuples are assigned to a batch and emitted as soon as
>possible. When all tuples of a batch have completed processing, a commit is
>issued and afterwards, tuples of the next batch will begin processing.
>With pipelining: Tuples assigned to multiple different batches (at most
>`topology.max.spout.pending` batches) may be active at a time. When all tuples
>of a batch have completed processing, results from that batch are committed.
>As long as the commit isn’t finished, no second commit will be started.
>
>Regards,
>Felix
