Hi!

I’m curious about the fault-tolerance properties of stateful streaming 
operations, specifically updateStateByKey.
What happens if a node fails during processing? Is the state recoverable?

Our use case is the following: messages arrive from a message queue, each 
requesting an update to the resource it specifies. When such an update request 
arrives, we wait a fixed amount of time, and if another update message for the 
same resource arrives within that window, we batch them together. Once the 
window, measured from the first message in it, has elapsed, we apply the 
batched update to the resource. We thought about using updateStateByKey with 
the resource identifier as the key.
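
To make the idea concrete, here is a minimal sketch of the per-key update 
logic we have in mind. It is plain Python with no Spark dependency; the names 
Pending and update_resource_state, the string payloads, the explicit `now` 
argument and the 10-second default window are all assumptions made for 
illustration. The first two arguments follow the (new_values, previous_state) 
shape that, as far as I understand, PySpark's updateStateByKey passes to its 
update function, where returning None drops the key.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Pending(NamedTuple):
    """Hypothetical per-resource state kept between micro-batches."""
    window_start: float        # arrival time of the first message in the window
    updates: Tuple[str, ...]   # payloads collected so far in the open window
    flushed: Tuple[str, ...]   # batch released in this step (empty if none)


def update_resource_state(
    new_values: Sequence[str],
    state: Optional[Pending],
    now: float,
    window_seconds: float = 10.0,
) -> Optional[Pending]:
    """Fold new update requests into the state of one resource key.

    `now` is passed in explicitly (an assumption) so the logic stays
    deterministic and easy to test.
    """
    carried = state.updates if state else ()
    pending = carried + tuple(new_values)
    if not pending:
        # Nothing waiting: drop the key (this also clears a flushed batch).
        return None
    # The window is measured from the first message that opened it.
    start = state.window_start if carried else now
    if now - start >= window_seconds:
        # Window elapsed: release everything as one batched update.
        return Pending(start, (), pending)
    return Pending(start, pending, ())
```

A downstream step would apply `flushed` to the resource whenever it is 
non-empty. The sketch only covers the batching; it does not by itself make the 
update exactly-once.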

It is important to guarantee exactly-once processing of the messages: every 
update must be applied, and no more than once.

Is this a good way to go?

Cheers,

--
David Szakallas | Software Engineer, RisingStack
Monitoring with Trace: http://trace.risingstack.com 
<http://trace.risingstack.com/>
http://risingstack.com <http://risingstack.com/> | http://blog.risingstack.com 
<http://blog.risingstack.com/>
Twitter: @szdavid92




