Hi Aaron,

Someone else may suggest another option, but if I understand your problem correctly, you essentially have two choices:

 a) use the compare-and-swap mechanism you described in the sink

 b) buffer elements inside the outputting ParDo and output them only once the watermark passes (using a timer).
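
For what it's worth, option a) could look roughly like the sketch below. It is plain Python with no Beam dependency; `InMemoryCache`, `Versioned` and `write_if_newer` are hypothetical names standing in for your real db or cache, which would need its own atomic conditional update instead of the lock used here.

```python
import threading
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Versioned:
    """A value tagged with the event time of the pane that produced it."""
    event_time: float
    value: Any


class InMemoryCache:
    """Hypothetical stand-in for the real db/cache (not a Beam API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, Versioned] = {}

    def write_if_newer(self, key: str, candidate: Versioned) -> bool:
        """Store the candidate only if it is later than what is stored.

        Returns True if the write happened, False if it was rejected
        because the stored value is as new or newer.
        """
        with self._lock:  # assumption: a real store offers an atomic equivalent
            current = self._data.get(key)
            if current is not None and current.event_time >= candidate.event_time:
                return False
            self._data[key] = candidate
            return True

    def read(self, key: str) -> Optional[Versioned]:
        with self._lock:
            return self._data.get(key)


# Out-of-order arrival: the later pane lands first, the earlier one is dropped.
cache = InMemoryCache()
cache.write_if_newer("user-1", Versioned(event_time=10.0, value="late pane"))
cache.write_if_newer("user-1", Versioned(event_time=5.0, value="early pane"))
```

The sink's process method would call something like `write_if_newer` once per triggered value, so the stored value never regresses regardless of arrival order.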

There is actually an ongoing discussion about how to make option b) user-friendly and part of Beam itself, but currently there is no out-of-the-box solution for that.
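
Until then, the logic you would have to write yourself for option b) is roughly the following. This is only a simulation of the idea in plain Python, not Beam code: `WatermarkBuffer`, `add` and `on_watermark` are hypothetical names, and in a real pipeline the buffer would live in per-key state and `on_watermark` would be the body of an event-time timer callback.

```python
import heapq
from typing import Any, List, Tuple


class WatermarkBuffer:
    """Holds elements and releases them in event-time order.

    Hypothetical helper for illustration; it mimics what a stateful
    ParDo with a buffer and an event-time timer would do.
    """

    def __init__(self) -> None:
        # Heap entries are (event_time, arrival_sequence, value); the
        # sequence number breaks ties so values are never compared.
        self._heap: List[Tuple[float, int, Any]] = []
        self._seq = 0

    def add(self, event_time: float, value: Any) -> None:
        """Buffer an element instead of writing it to the sink right away."""
        heapq.heappush(self._heap, (event_time, self._seq, value))
        self._seq += 1

    def on_watermark(self, watermark: float) -> List[Tuple[float, Any]]:
        """Release, in order, every element strictly older than the watermark.

        Assumption: elements with event_time < watermark are complete,
        so emitting them now cannot be overtaken by an earlier element.
        """
        released = []
        while self._heap and self._heap[0][0] < watermark:
            event_time, _, value = heapq.heappop(self._heap)
            released.append((event_time, value))
        return released


# Elements arrive out of order but leave the buffer sorted by event time.
buf = WatermarkBuffer()
buf.add(3.0, "c")
buf.add(1.0, "a")
buf.add(2.0, "b")
first = buf.on_watermark(2.5)   # [(1.0, "a"), (2.0, "b")]
rest = buf.on_watermark(10.0)   # [(3.0, "c")]
```

The price is latency: nothing reaches the sink until the watermark has moved past it, so you give up the per-element, real-time updates.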

Jan

On 11/25/19 10:27 PM, Aaron Dixon wrote:
Suppose I trigger a Combine per-element (in a high-volume stream) and use a ParDo as a sink.

I assume there is no guarantee about the order in which my ParDo will see these triggers, especially since it processes in parallel anyway.

That said, my sink writes to a db or cache and I would not like the cache to ever regress its value to something "before" what it has already written.

Is the best way to solve this problem to always write the event time to the cache and do a compare-and-swap, updating the sink only if the triggered value in hand is later than the value already stored?

Or is there a better way to guarantee that my ParDo sink will process elements in order? (E.g., if I can give up per-event/real-time updates, then a delay-based trigger would probably be sufficient, I imagine.)

Thanks for any advice!
