Dear Samza users,

We recently discovered an issue with the way we handle state in Samza Beam
and Samza High-Level API Window operators. Under certain situations, at
least once processing guarantee is violated.


*Details on the issue*

The Samza high-level API includes operators such as windows which can hold
messages and emit them at a later time. eg: a time based window will buffer
its messages and emit results only at periodic time intervals. Here’s the
sequence of operations when a window operator is ready to emit results:

   1.

   Obtain the results ready to be emitted for the window
   2.

   Remove the operator state corresponding to those results from its
   state-store
   3.

   Propagate the window results to down-stream operators in the pipeline.
   The results eventually makes it to a terminal operator, which emits the
   final output.
   4.

   At some future point, issue a commit operation, which flushes the
   producers, the state stores and persists the input offsets.


Scenarios

   -

   An exception in a downstream operator in step 3.
   -

   An unclean shutdown, e.g., due to a “kill -9” to the container before
   the outputs have been flushed in step 4


Both these examples violate Samza’s at-least once processing guarantee,
since they cause state to be modified even though the processed outputs may
not have been emitted.


*Solution*

Here's a proposal
<https://docs.google.com/document/d/1wtSAUMrns14cWCf5Wdf4YwXd7K5Hf-bm7W3SQ3sXZsQ/edit>
to fix the above issue. We are working on addressing this with the highest
priority.


Thanks,

Jagadish

Reply via email to