On Thu, Oct 31, 2019 at 2:11 AM Jan Lukavský <je...@seznam.cz> wrote:

> Hi Kenn,
>
> does there still remain some use for trigger to finish? If we don't drop
> data, would it still be of any use to users? If not, would it be better
> to just remove the functionality completely, so that users who use it
> (and it will possibly break for them) are aware of it at compile time?
>
> Jan
>

Good point. I believe there is no good use for a top-level trigger
finishing. As mentioned, the intended uses aren't really met by triggers,
but are met by stateful DoFn.

Eugene's bug even has this title :-). We could leave behavior unchanged and
just reject pipelines with broken top-level triggers. This is probably a
better solution, because if a user has a broken trigger, the new behavior
is probably not enough to magically fix their pipeline. They are better off
knowing that their pipeline is broken and fixing it.

And at that point, there is a lot of dead code and my PR is really just
cleaning it up as a simplification.

Kenn
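
To illustrate the stateful DoFn point above, here is a minimal, Beam-free
sketch of the "answer from a smaller amount of data, then drop the rest"
use case. The function name take_first_n_per_key and the parameter n are
hypothetical and are not Beam API. In a real pipeline the per-key counter
would presumably live in state managed by the runner rather than in a
Python dict.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, Tuple


def take_first_n_per_key(
    elements: Iterable[Tuple[Hashable, int]], n: int
) -> Iterator[Tuple[Hashable, int]]:
    """Emit at most n elements per key and silently drop the rest.

    Toy stand-in for per-key state: `seen` plays the role of a counter
    that a stateful DoFn would keep for each key.
    """
    seen: Dict[Hashable, int] = defaultdict(int)
    for key, value in elements:
        if seen[key] < n:
            seen[key] += 1
            yield key, value
        # Otherwise the element is dropped on purpose. The decision is
        # made by user code, not by a trigger finishing.


if __name__ == "__main__":
    data = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]
    print(list(take_first_n_per_key(data, n=2)))
```

The point of the sketch is that dropping becomes an explicit choice in
user code, which is easier to reason about than a window closing as a
side effect of a trigger finishing.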



> On 10/30/19 11:26 PM, Kenneth Knowles wrote:
> > Problem: a trigger can "finish" which causes a window to "close" and
> > drop all remaining data arriving for that window.
> >
> > This has been discussed many times, and I thought it was fixed, but it
> > seems not to be. It does not seem to have its own Jira or thread that
> > I can find. But here are some pointers:
> >
> >  - data loss bug:
> >    https://lists.apache.org/thread.html/ce413231d0b7d52019668765186ef27a7ffb69b151fdb34f4bf80b0f@%3Cdev.beam.apache.org%3E
> >  - user hitting the bug:
> >    https://lists.apache.org/thread.html/28879bc80cd5c7ef1a3e38cb1d2c063165d40c13c02894bbccd66aca@%3Cuser.beam.apache.org%3E
> >  - user confusion:
> >    https://lists.apache.org/thread.html/2707aa449c8c6de1c6e3e8229db396323122304c14931c44d0081449@%3Cuser.beam.apache.org%3E
> >  - thread from 2016 on the topic:
> >    https://lists.apache.org/thread.html/5f44b62fdaf34094ccff8da2a626b7cd344d29a8a0fff6eac8e148ea@%3Cdev.beam.apache.org%3E
> >
> > In theory, trigger finishing was intended for users who can get their
> > answers from a smaller amount of data and then drop the rest. In
> > practice, triggers aren't really expressive enough for this. Stateful
> > DoFn is the solution for these cases.
> >
> > I've opened https://github.com/apache/beam/pull/9942 which makes the
> > following changes:
> >
> >  - when a trigger says it is finished, it never fires again but data
> > is still kept
> >  - at GC time the final output will be emitted
> >
> > As with all bugfixes, this is backwards-incompatible (if your pipeline
> > relies on buggy behavior, it will stop working). So this is a major
> > change that I wanted to discuss on dev@.
> >
> > Kenn
> >
>
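
The difference between the behavior described as the bug and the two
changes listed in the quoted message can be sketched with a toy model of a
single window. This is not Beam code. The class WindowModel, the field
fire_after, and the flag keep_data_after_finish are hypothetical names, and
the model assumes a fire-once trigger with discarding panes purely for
illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class WindowModel:
    """Toy model of one window with a trigger that fires once, then finishes."""

    fire_after: int               # assumed: trigger fires once this many elements are buffered
    keep_data_after_finish: bool  # False: old behavior, True: behavior described for the PR
    buffer: List[int] = field(default_factory=list)
    finished: bool = False
    outputs: List[List[int]] = field(default_factory=list)
    dropped: List[int] = field(default_factory=list)

    def on_element(self, value: int) -> None:
        if self.finished:
            if self.keep_data_after_finish:
                # Trigger never fires again, but the data is still kept.
                self.buffer.append(value)
            else:
                # Window is treated as closed: the element is lost.
                self.dropped.append(value)
            return
        self.buffer.append(value)
        if len(self.buffer) >= self.fire_after:
            # The one and only trigger firing; afterwards it is finished.
            self.outputs.append(list(self.buffer))
            self.buffer.clear()
            self.finished = True

    def on_gc(self) -> None:
        # At garbage-collection time, emit whatever is still buffered.
        if self.buffer:
            self.outputs.append(list(self.buffer))
            self.buffer.clear()


def run(values: Iterable[int], fire_after: int, keep: bool) -> WindowModel:
    """Feed all values into a fresh window, then garbage-collect it."""
    window = WindowModel(fire_after=fire_after, keep_data_after_finish=keep)
    for v in values:
        window.on_element(v)
    window.on_gc()
    return window


if __name__ == "__main__":
    old = run([1, 2, 3, 4, 5], fire_after=2, keep=False)
    new = run([1, 2, 3, 4, 5], fire_after=2, keep=True)
    print("old:", old.outputs, "dropped:", old.dropped)
    print("new:", new.outputs, "dropped:", new.dropped)
```

In this model the old behavior loses elements 3, 4 and 5, while the new
behavior emits them as a final output at GC time.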
