Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Rui Wang
Hi Community,

I am trying to understand the Beam model and have a question about
accumulating mode and panes:

Accumulating mode means that every time a trigger fires, it emits all
values seen so far in the window (hence "accumulating"). An example
from the Beam programming guide[1] uses a repeating trigger that fires
every time 3 elements arrive in a 10-minute fixed window, with the
following emitted results:

  First trigger firing:  [5, 8, 3]
  Second trigger firing: [5, 8, 3, 15, 19, 23]
  Third trigger firing:  [5, 8, 3, 15, 19, 23, 9, 13, 10]
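
The firings above can be reproduced with a small plain-Python model. This is only a sketch of the semantics, not Beam code: the function name fire_every_n and its parameters are made up for illustration, and it models a single window with an element-count trigger. It also shows what discarding mode would emit for the same input, for contrast.

```python
from typing import Iterable, List


def fire_every_n(elements: Iterable[int], n: int, accumulating: bool) -> List[List[int]]:
    """Model a repeating element-count trigger on a single window.

    Hypothetical helper for illustration only; not part of any Beam SDK.
    """
    panes: List[List[int]] = []
    seen: List[int] = []     # everything that has arrived in the window so far
    pending: List[int] = []  # elements that arrived since the last firing
    for value in elements:
        seen.append(value)
        pending.append(value)
        if len(pending) == n:
            # Accumulating mode emits everything seen so far;
            # discarding mode emits only the elements since the last firing.
            panes.append(list(seen) if accumulating else list(pending))
            pending = []
    return panes


window = [5, 8, 3, 15, 19, 23, 9, 13, 10]
accumulating_panes = fire_every_n(window, 3, accumulating=True)
discarding_panes = fire_every_n(window, 3, accumulating=False)

for index, pane in enumerate(accumulating_panes):
    print(index, pane)
```

In this model each accumulating pane is a superset of the previous one, which is what makes "overwrite with the latest pane" attractive for a downstream consumer.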


The original Dataflow paper[2] also mentions that accumulating mode is
useful when downstream consumers overwrite old values with new values.

To support this "overwriting" use case, it seems to me that the Beam model
provides:
1. Triggers fire in order. In the example above, the second trigger firing
should come after the first, so that downstream transforms see [5, 8, 3]
before [5, 8, 3, 15, 19, 23].
2. Downstream transforms process panes in order. If this is not true, new
values (from later panes) might end up being overwritten by old values
(from earlier panes).

Do I have a correct understanding?


Thanks,
Rui



[1]:
https://beam.apache.org/documentation/programming-guide/#setting-a-trigger
[2]: https://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf


Re: Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Robert Bradshaw
There is no promise that panes will arrive in order (especially the
further you get "downstream"). Though they may be approximately so,
it's dangerous to assume that. You can inspect the sequential index in
PaneInfo to determine whether a pane is older than other panes you
have seen.
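
A minimal sketch of that check, in plain Python rather than Beam code: the class LatestPaneStore and its methods are hypothetical, and the pane index is passed in as a plain integer standing in for the sequential index carried by PaneInfo. The store accepts a pane only if its index is newer than the one already stored for the same key and window, so a late-arriving older pane cannot overwrite a newer one.

```python
from typing import Dict, List, Tuple


class LatestPaneStore:
    """Hypothetical sink that keeps only the newest pane per (key, window)."""

    def __init__(self) -> None:
        # Maps (key, window) to (pane index, pane values).
        self._state: Dict[Tuple[str, str], Tuple[int, List[int]]] = {}

    def write(self, key: str, window: str, pane_index: int, values: List[int]) -> bool:
        """Store the pane unless an equal or newer pane is already stored."""
        slot = (key, window)
        current = self._state.get(slot)
        if current is not None and current[0] >= pane_index:
            return False  # stale pane: ignore it instead of overwriting
        self._state[slot] = (pane_index, list(values))
        return True

    def read(self, key: str, window: str) -> List[int]:
        return self._state[(key, window)][1]


# Panes 0, 2, 1 arrive out of order; pane 1 must not replace pane 2.
arrivals = [
    (0, [5, 8, 3]),
    (2, [5, 8, 3, 15, 19, 23, 9, 13, 10]),
    (1, [5, 8, 3, 15, 19, 23]),
]
store = LatestPaneStore()
results = [store.write("k", "w0", index, values) for index, values in arrivals]
print(results, store.read("k", "w0"))
```

The comparison only makes sense within one key and window, which is why the sketch keys its state on both.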



Re: Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Steve Niemitz
There was a thread about this a few months ago as well:
https://lists.apache.org/thread.html/20d11046d26174969ef44a781e409a1cb9f7c736e605fa40fdf98397@%3Cuser.beam.apache.org%3E




Re: Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Rui Wang
Thanks! That thread was really helpful!

-Rui



Re: Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Reuven Lax
Correct, however I think our triggering model is close to useless (or at
least close to unusable) without such a guarantee, for both accumulating
and discarding. What's worse - AFAIK all streaming runners today
practically do provide these panes in order; this means that likely many
users implicitly rely on this "non guarantee," probably without even
knowing they are relying on it.



Re: Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Rui Wang
>
>
>  AFAIK all streaming runners today practically do provide these panes in
> order;
>
Does this refer to "the stage immediately after the GBK itself processes
fired panes in order" in streaming runners? Could you share more information?



> this means that likely many users implicitly rely on this "non guarantee,"
> probably without even knowing they are relying on it.
>
If streaming runners already provide or process panes in order, and many
users likely rely on it already, why not make pane ordering an explicit
part of the model?


-Rui


Re: Accumulating mode implies that panes are processed in order?

2019-06-26 Thread Robert Bradshaw
On Thu, Jun 27, 2019 at 1:52 AM Rui Wang  wrote:
>>
>>
>>  AFAIK all streaming runners today practically do provide these panes in 
>> order;
>
> Does it refer to "the stage immediately after GBK itself processes fired 
> panes in order" in streaming runners? Could you share more information?
>
>
>>
>> this means that likely many users implicitly rely on this "non guarantee," 
>> probably without even knowing they are relying on it.
>
> If streaming runners have already provided or processed panes in order, and 
> likely many users rely on it already, why not make order of panes a part of 
> model explicitly?

Most runners produce panes in order, but they don't necessarily
preserve the order downstream (at least beyond what's fused into the
same stage, which is where it gets difficult).


Re: Accumulating mode implies that panes are processed in order?

2019-06-27 Thread Reuven Lax
On Thu, Jun 27, 2019 at 3:32 AM Robert Bradshaw  wrote:

> On Thu, Jun 27, 2019 at 1:52 AM Rui Wang  wrote:
> >>
> >>
> >>  AFAIK all streaming runners today practically do provide these panes
> in order;
> >
> > Does it refer to "the stage immediately after GBK itself processes fired
> panes in order" in streaming runners? Could you share more information?
> >
> >
> >>
> >> this means that likely many users implicitly rely on this "non
> guarantee," probably without even knowing they are relying on it.
> >
> > If streaming runners have already provided or processed panes in order,
> and likely many users rely on it already, why not make order of panes a
> part of model explicitly?
>
> Most runners produce panes in order, but they don't necessarily
> preserve the order downstream (at least beyond what's fused into the
> same stage, which is where it gets difficult).
>

Actually they generally do preserve them at least to the very next fused
stage. That's why I assume many users rely on this non guarantee.


Re: Accumulating mode implies that panes are processed in order?

2019-06-27 Thread Rui Wang
Makes sense. At least for accumulating mode, maintaining pane ordering
across stages would be very useful, but it is indeed difficult to do.

Now I can see why triggering at sinks might be a better approach.


-Rui


