Support non-keyed stateful ParDo

Xinyu Liu Wed, 25 Apr 2018 17:46:15 -0700

Hi,

I am working on adding stateful ParDo support to the upcoming Beam Samza
runner, and realized that the state for each ParDo processElement() call is
associated not only with the window of the element, but also with the key of
the element. I chatted with Kenneth over email about this design decision,
which has the following benefits for keyed state:

1) No synchronization
2) Simple programming model
3) No communication between workers
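To make the keyed scoping concrete, here is a small plain-Java model (not
Beam's API; the class and method names are hypothetical, and records need
Java 16+) in which each state cell is addressed by a (key, window) pair, so
an element can only read and update the cell for its own key and window:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Beam's API: state cells are addressed by (key, window),
// mirroring how a stateful ParDo scopes its state per key and window.
public class KeyedStateModel {

    // Composite address of one state cell.
    record StateAddress(String key, String window) {}

    private final Map<StateAddress, Integer> counts = new HashMap<>();

    // Called once per element; only the cell for this element's key and window
    // is visible, so the count never mixes different keys or windows.
    public int processElement(String key, String window) {
        StateAddress address = new StateAddress(key, window);
        int updated = counts.getOrDefault(address, 0) + 1;
        counts.put(address, updated);
        return updated;
    }

    public static void main(String[] args) {
        KeyedStateModel model = new KeyedStateModel();
        model.processElement("user-a", "w1");
        model.processElement("user-a", "w1");
        model.processElement("user-b", "w1");
        System.out.println(model.processElement("user-a", "w1")); // 3
        System.out.println(model.processElement("user-b", "w1")); // 2
        System.out.println(model.processElement("user-a", "w2")); // 1
    }
}
```

Because no two keys share a cell, each key can be processed independently,
which is where the benefits listed above come from.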

The current design doesn't support accessing state across different keys,
which seems to be a more general use case. This use case is also very
common inside LinkedIn, where users have access to the entire state of an
operator/task and perform lookups and computations on top of it. It's quite
hard to make every user here aware that the state is also tightly
associated with the key of the element. From the stateful ParDo API, the
state looks pretty general too. I am wondering whether it is possible to
extend the current API to support both keyed and non-keyed state, even if
internally Beam just assigns a dummy key to associate the state with all
the elements. It would be very beneficial to existing Samza users and help
them adopt Beam.
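The dummy-key approach can be sketched in the same hypothetical plain-Java
style (again not Beam's API; all names here are made up for illustration):
every element is rekeyed to one constant key before the stateful step, so a
single per-window cell holds a table covering all original keys, and each
element can look up the others. In a real pipeline the rekeying would be a
transform such as Beam's `WithKeys`, which this model does not use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Beam's API, of the dummy-key workaround.
public class DummyKeyModel {

    record StateAddress(String key, String window) {}

    // Every element is mapped to this constant key before the stateful step.
    static final String DUMMY_KEY = "all";

    // One state cell per (DUMMY_KEY, window); each cell is a table over original keys.
    private final Map<StateAddress, Map<String, Integer>> tables = new HashMap<>();

    // Adds value under originalKey in the window's shared table, then returns
    // the sum across all original keys seen so far in that window.
    public int processElement(String originalKey, String window, int value) {
        Map<String, Integer> table = tables.computeIfAbsent(
            new StateAddress(DUMMY_KEY, window), address -> new HashMap<>());
        table.merge(originalKey, value, Integer::sum);
        int total = 0;
        for (int v : table.values()) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        DummyKeyModel model = new DummyKeyModel();
        model.processElement("user-a", "w1", 5);
        model.processElement("user-b", "w1", 7);
        System.out.println(model.processElement("user-a", "w1", 1)); // 13
    }
}
```

The trade-off is that, within a window, all elements now share one key, so
they are processed serially for that key instead of in parallel across keys.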

Thanks,
Xinyu
