@Jan @Pablo Thank you! In this case it's a single globally-windowed Combine.perKey, triggered per element. Keys are few (client accounts), so they can live forever.

It looks like, just by virtue of using a stateful ParDo, I could get this final execution "serialized" per key. (Then I could simply do the compare-and-swap using Beam's state mechanism to keep track of the "latest trigger timestamp", instead of having to orchestrate compare-and-swap in the target store :thinking:.)
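Something like this is what I'm picturing. A minimal sketch under my own assumptions: GuardedWriteFn and writeToStore are hypothetical names, and I'm assuming the triggered Combine output arrives as KV<String, Long>:

import org.apache.beam.sdk.coders.InstantCoder;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Instant;

// Hypothetical sketch: a stateful DoFn executes serially per key, so a
// ValueState can hold the latest written trigger timestamp and act as
// the compare-and-swap guard, instead of doing CAS in the target store.
class GuardedWriteFn extends DoFn<KV<String, Long>, Void> {

  @StateId("latestTimestamp")
  private final StateSpec<ValueState<Instant>> latestTimestampSpec =
      StateSpecs.value(InstantCoder.of());

  @ProcessElement
  public void processElement(
      ProcessContext ctx,
      @StateId("latestTimestamp") ValueState<Instant> latestTimestamp) {
    Instant lastWritten = latestTimestamp.read();
    // Only write panes whose event time is newer than the last write;
    // out-of-order (re-ordered or late) panes are dropped here.
    if (lastWritten == null || ctx.timestamp().isAfter(lastWritten)) {
      writeToStore(ctx.element().getKey(), ctx.element().getValue());
      latestTimestamp.write(ctx.timestamp());
    }
  }

  // Placeholder for the actual db/cache write.
  private void writeToStore(String key, Long value) {
    // e.g. cache.put(key, value);
  }
}

Since Beam state is partitioned per key (and window), each key's writes go through this check one at a time, which is exactly the serialization I was after.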
On Mon, Nov 25, 2019 at 4:14 PM Jan Lukavský <je...@seznam.cz> wrote:

> One addition, to make the list of options exhaustive, there is
> probably one more option:
>
> c) create a ParDo keyed by the primary key of your sink, cache the
> last write in there, and compare it locally, without the need to
> query the database.
>
> It would still need some timer to clear values after watermark +
> allowed lateness, because otherwise you would have to cache your
> whole database on workers. But because you don't need actual
> ordering, just the most recent value (if I got it right), this might
> be an option.
>
> Jan
>
> On 11/25/19 10:53 PM, Jan Lukavský wrote:
> > Hi Aaron,
> >
> > maybe someone else will give another option, but if I understand
> > correctly what you want to solve, then you essentially have to do
> > either:
> >
> > a) use the compare & swap mechanism in the sink you described, or
> >
> > b) use a buffer to buffer elements inside the outputting ParDo and
> > only output them when the watermark passes (using a timer).
> >
> > There is actually an ongoing discussion about how to make option b)
> > user-friendly and part of Beam itself, but currently there is no
> > out-of-the-box solution for that.
> >
> > Jan
> >
> > On 11/25/19 10:27 PM, Aaron Dixon wrote:
> >> Suppose I trigger a Combine per element (in a high-volume stream)
> >> and use a ParDo as a sink.
> >>
> >> I assume there is no guarantee about the order in which my ParDo
> >> will see these triggers, especially as it processes in parallel
> >> anyway.
> >>
> >> That said, my sink writes to a db or cache, and I would not like
> >> the cache to ever regress its value to something "before" what it
> >> has already written.
> >>
> >> Is the best way to solve this problem to always write the event
> >> time in the cache and do a compare-and-swap, only updating the
> >> sink if the triggered value in hand is later than the target
> >> value?
> >>
> >> Or is there a better way to guarantee that my ParDo sink will
> >> process elements in order? (E.g., if I can give up per-event /
> >> real-time processing, then a delay-based trigger would probably be
> >> sufficient, I imagine.)
> >>
> >> Thanks for the advice!
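PS For option b) above, I imagine the buffering would look roughly like this. Again a minimal sketch under my own assumptions: BufferUntilWatermarkFn is a hypothetical name, I'm assuming KV<String, Long> elements, and this is hand-rolled state and timers, not the built-in support Jan mentioned is under discussion:

import org.apache.beam.sdk.coders.InstantCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Instant;

// Hypothetical sketch of option b): buffer elements per key and emit
// them only once the event-time watermark has passed the newest
// buffered element.
class BufferUntilWatermarkFn extends DoFn<KV<String, Long>, KV<String, Long>> {

  @StateId("buffer")
  private final StateSpec<BagState<KV<String, Long>>> bufferSpec =
      StateSpecs.bag(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of()));

  @StateId("maxTimestamp")
  private final StateSpec<ValueState<Instant>> maxTimestampSpec =
      StateSpecs.value(InstantCoder.of());

  @TimerId("flush")
  private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void processElement(
      ProcessContext ctx,
      @StateId("buffer") BagState<KV<String, Long>> buffer,
      @StateId("maxTimestamp") ValueState<Instant> maxTimestamp,
      @TimerId("flush") Timer flush) {
    buffer.add(ctx.element());
    // Track the newest buffered timestamp and (re)set the flush timer
    // to it; when the timer fires, the watermark has passed everything
    // currently in the buffer.
    Instant max = maxTimestamp.read();
    if (max == null || ctx.timestamp().isAfter(max)) {
      max = ctx.timestamp();
      maxTimestamp.write(max);
    }
    flush.set(max);
  }

  @OnTimer("flush")
  public void onFlush(
      OnTimerContext ctx,
      @StateId("buffer") BagState<KV<String, Long>> buffer,
      @StateId("maxTimestamp") ValueState<Instant> maxTimestamp) {
    for (KV<String, Long> element : buffer.read()) {
      ctx.output(element);
    }
    buffer.clear();
    maxTimestamp.clear();
  }
}

As Jan noted, late data past allowed lateness would still need separate handling, and the state has to be cleared eventually so workers don't cache indefinitely.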