I understand that triggers which fire multiple times per window tend to be non-deterministic, but here is an example use case. A pipeline reads Pub/Sub messages that describe account links. Each message names two accounts that are linked together, so the user produces both KV<AccountA, AccountB> and KV<AccountB, AccountA> and outputs them to a multimap PCollectionView. Another portion of the pipeline consumes account update messages from a different Pub/Sub topic and makes sure that updates are applied to all linked accounts. Could the author of the pipeline know that the multimap will contain a consistent view of these bidirectional mappings, i.e. that whenever AccountA -> AccountB is visible, AccountB -> AccountA is visible as well?
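
To make "consistent view" concrete, here is a minimal sketch in plain Java. It is not Beam code: the class and method names are hypothetical, accounts are modeled as plain strings, and the "partial" snapshot is just an assumption about what a firing in the middle of a bundle might expose. It only illustrates the property I would like the side input to guarantee.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java illustration only; none of these names are Beam API.
final class LinkedAccountsSketch {

    // One link message becomes two entries, mirroring KV<AccountA, AccountB>
    // and KV<AccountB, AccountA> from the use case.
    static List<Map.Entry<String, String>> expandLink(String a, String b) {
        return List.of(Map.entry(a, b), Map.entry(b, a));
    }

    // Groups the entries visible so far into a multimap-like snapshot.
    static Map<String, Set<String>> snapshot(List<Map.Entry<String, String>> visible) {
        Map<String, Set<String>> view = new TreeMap<>();
        for (Map.Entry<String, String> e : visible) {
            view.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).add(e.getValue());
        }
        return view;
    }

    // A snapshot is consistent if every a -> b has a matching b -> a.
    static boolean isSymmetric(Map<String, Set<String>> view) {
        for (Map.Entry<String, Set<String>> e : view.entrySet()) {
            for (String linked : e.getValue()) {
                if (!view.getOrDefault(linked, Set.of()).contains(e.getKey())) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> emitted = expandLink("acct-1", "acct-2");
        // Snapshot in which both entries are visible.
        System.out.println(isSymmetric(snapshot(emitted)));
        // Hypothetical snapshot in which only the first entry is visible.
        System.out.println(isSymmetric(snapshot(emitted.subList(0, 1))));
    }
}
```

Whether a trigger that fires multiple times can ever expose the second kind of snapshot to the consuming ParDo is exactly what I am asking.
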

On Thu, Apr 11, 2019 at 9:44 AM Reuven Lax <[email protected]> wrote:

> One thing to keep in mind: triggers that fire multiple times per window
> already tend to be non-deterministic. These are element-count or
> processing-time triggers, both of which are fairly non-deterministic in
> firing.
>
> Reuven
>
> On Thu, Apr 11, 2019 at 9:27 AM Lukasz Cwik <[email protected]> wrote:
>
>> Today, we define that a side input becomes available to be consumed once
>> at least one firing occurs or when the runner detects that no such output
>> could be produced (e.g. watermark is beyond the end of the window when
>> using the default trigger). For triggers that fire at most once, consumers
>> are guaranteed to have a consistent view of the contents of the side input.
>> But what happens when the trigger fires multiple times?
>>
>> Let's say we have a pipeline containing:
>> ParDo(A) --> PCollectionView S
>>          \-> PCollectionView T
>>
>>   ...
>>    |
>> ParDo(C) <-(side input)- PCollectionView S and PCollectionView T
>>    |
>>   ...
>>
>> 1) Let's say ParDo(A) outputs (during a single bundle) X and Y to
>> PCollectionView S. Should ParDo(C) be guaranteed to see X only if it
>> can also see Y (and vice versa)?
>>
>> 2) Let's say ParDo(A) outputs (during a single bundle) X to
>> PCollectionView S and Y to PCollectionView T. Should ParDo(C) be guaranteed
>> to see X only if it can also see Y?
>>
>
