Re: Throttling stream outputs per trigger?

Vincent Marquez Tue, 08 Dec 2020 15:35:25 -0800

*~Vincent*

On Tue, Dec 8, 2020 at 3:13 PM Boyuan Zhang <boyu...@google.com> wrote:

> Please note that each record output from ReadFromKafkaDoFn is in a
> GlobalWindow. The workflow is:
> ReadFromKafkaDoFn -> Reshuffle -> Window.into(FixedWindows) ->
> Max.longsPerKey -> CommitDoFn
>                                                |
>                                                ---> downstream consumers
>
> but won't there still be 5 commits that happen as fast as possible for
>> each of the windows that were constructed from the initial fetch?
>
> I'm not sure what you mean here. Would you like to elaborate more on your
> questions?
>

Sure, I'll try to explain, it's very possible I just am misunderstanding
Windowing here.

Assumption 1:  Windowing works on the output timestamp.
Assumption 2:  Max.longsPerKey will fire as fast as it can, in other words,
there is no throttling.

So, if we have a topic that has the following msgs:
msg | timestamp (mm,ss)
-----------------------
   A  |  01:00
   B  |  01:01
   D  |  06:00
   E  |  06:04
   F  |  12:02

and we read them all at once, we will have one window that contains [A,B]
and another one that has [D,E], and a third that has [F].  Once we get the
max offset for all three, won't they fire back to back without delay? So F
will fire as soon as E is finished committing, which fires immediately
after B is committed?

Re: Throttling stream outputs per trigger?

Reply via email to