Re: Upgrading from 2.15 to 2.19 makes compilation fail on trigger

2020-05-05 Thread Kyle Weaver
> Maybe we should add a statement like "did you mean to wrap it in
> Repeatedly.forever?" to the error message

+1. IMO the less indirection between the user and the fix, the better.

On Tue, May 5, 2020 at 12:08 PM Luke Cwik wrote:


Re: Upgrading from 2.15 to 2.19 makes compilation fail on trigger

2020-05-05 Thread Luke Cwik
Pointing users to the website with additional details in the error messages
would likely help as well.

On Tue, May 5, 2020 at 8:45 AM Brian Hulette wrote:


Re: Upgrading from 2.15 to 2.19 makes compilation fail on trigger

2020-05-05 Thread Brian Hulette
In both SDKs this is an unsafe trigger because it will only fire once for
the first window (per key), and any subsequent data on the same key will be
dropped. In 2.18, we closed BEAM-3288 with PR
https://github.com/apache/beam/pull/9960, which detects these cases and
fails early. Probably the fix is to add Repeatedly.forever around your
AfterWatermark trigger.
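
As a concrete illustration of that suggestion, here is a sketch of the trigger from the original message with the Repeatedly.forever wrapper applied. `input` is a hypothetical PCollection<String> standing in for the pipeline's real upstream collection; the window, lateness, and pane-accumulation settings are taken from the reported pipeline:

```java
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

// `input` is a hypothetical PCollection<String>, not part of the
// original report.
input.apply("15min Window",
    Window.<String>into(FixedWindows.of(Duration.standardMinutes(15)))
        // Repeatedly.forever keeps the trigger from finishing after its
        // first firing, so panes for subsequent data on the same key are
        // not silently dropped.
        .triggering(Repeatedly.forever(AfterWatermark.pastEndOfWindow()))
        .withAllowedLateness(Duration.standardMinutes(60))
        .discardingFiredPanes());
```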

This is noted if you read through
https://s.apache.org/finishing-triggers-drop-data but it's not super clear
from a user perspective. Maybe we should add a statement like "did you mean
to wrap it in Repeatedly.forever?" to the error message, and/or update
https://s.apache.org/finishing-triggers-drop-data with clear directions for
users. +Kenneth Knowles 

On Tue, May 5, 2020 at 5:18 AM Eddy G wrote:


Re: Flink Runner with RequiresStableInput fails after a certain number of checkpoints

2020-05-05 Thread Eleanore Jin
Hi Max,

Thanks for the info!

Eleanore

On Tue, May 5, 2020 at 4:01 AM Maximilian Michels wrote:

> Hey Eleanore,
>
> The change will be part of the 2.21.0 release.
>
> -Max

Upgrading from 2.15 to 2.19 makes compilation fail on trigger

2020-05-05 Thread Eddy G
Hey all!

I've recently been updating our Beam pipelines to 2.19, and the following
trigger, which previously worked flawlessly with 2.15, has stopped doing so;
the project doesn't even compile now.

.apply("15min Window",
    Window.into(FixedWindows.of(Duration.standardMinutes(15)))
        .triggering(AfterWatermark.pastEndOfWindow())
        .withAllowedLateness(Duration.standardMinutes(60))
        .discardingFiredPanes())

It fails with the following error:

Exception in thread "main" java.lang.IllegalArgumentException: Unsafe trigger 
may lose data, see https://s.apache.org/finishing-triggers-drop-data: 
AfterWatermark.pastEndOfWindow()

Reviewing the changelog, I don't see any changes regarding AfterWatermark. Am I
missing something?


Re: Flink Runner with RequiresStableInput fails after a certain number of checkpoints

2020-05-05 Thread Maximilian Michels
Hey Eleanore,

The change will be part of the 2.21.0 release.

-Max

On 04.05.20 19:14, Eleanore Jin wrote:
> Hi Max,
>
> Thanks for the information, and I saw this PR is already merged. Just
> wondering, is it backported to the affected versions already
> (i.e. 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, 2.19.0, 2.20.0)? Or do I
> have to wait for the 2.20.1 release?
>
> Thanks a lot!
> Eleanore
>
> On Wed, Apr 22, 2020 at 2:31 AM Maximilian Michels wrote:
>
> > Hi Eleanore,
> >
> > Exactly-once is not affected, but the pipeline can fail to checkpoint
> > after the maximum number of state cells has been reached. We are
> > working on a fix [1].
> >
> > Cheers,
> > Max
> >
> > [1] https://github.com/apache/beam/pull/11478
> >
> > On 22.04.20 07:19, Eleanore Jin wrote:
> > > Hi Maxi,
> > >
> > > I assume this will impact the exactly-once semantics that Beam
> > > provides, since in the KafkaExactlyOnceSink the processElement
> > > method is also annotated with @RequiresStableInput?
> > >
> > > Thanks a lot!
> > > Eleanore
> > >
> > > On Tue, Apr 21, 2020 at 12:58 AM Maximilian Michels wrote:
> > >
> > > > Hi Stephen,
> > > >
> > > > Thanks for reporting the issue! David, good catch!
> > > >
> > > > I think we have to resort to only using a single state cell for
> > > > buffering on checkpoints, instead of using a new one for every
> > > > checkpoint. I was under the assumption that, if the state cell
> > > > was cleared, it would not be checkpointed, but that does not seem
> > > > to be the case.
> > > >
> > > > Thanks,
> > > > Max
> > > >
> > > > On 21.04.20 09:29, David Morávek wrote:
> > > > > Hi Stephen,
> > > > >
> > > > > nice catch and awesome report! ;) This definitely needs a
> > > > > proper fix. I've created a new JIRA to track the issue and will
> > > > > try to resolve it soon, as this seems critical to me.
> > > > >
> > > > > https://issues.apache.org/jira/browse/BEAM-9794
> > > > >
> > > > > Thanks,
> > > > > D.
> > > > >
> > > > > On Mon, Apr 20, 2020 at 10:41 PM Stephen Patel wrote:
> > > > >
> > > > > > I was able to reproduce this in a unit test:
> > > > > >
> > > > > >     @Test
> > > > > >     public void test() throws InterruptedException, ExecutionException {
> > > > > >       FlinkPipelineOptions options =
> > > > > >           PipelineOptionsFactory.as(FlinkPipelineOptions.class);
> > > > > >       options.setCheckpointingInterval(10L);
> > > > > >       options.setParallelism(1);
> > > > > >       options.setStreaming(true);
> > > > > >       options.setRunner(FlinkRunner.class);
> > > > > >       options.setFlinkMaster("[local]");
> > > > > >       options.setStateBackend(new MemoryStateBackend(Integer.MAX_VALUE));
> > > > > >
> > > > > >       Pipeline pipeline = Pipeline.create(options);
> > > > > >       pipeline
> > > > > >           .apply(Create.of((Void) null))
> > > > > >           .apply(
> > > > > >               ParDo.of(
> > > > > >                   new DoFn<Void, Void>() {
> > > > > >                     private static final long serialVersionUID = 1L;
> > > > > >
> > > > > >                     @RequiresStableInput
> > > > > >                     @ProcessElement
> > > > > >                     public void processElement() {}
> > > > > >                   }));
> > > > > >       pipeline.run();
> > > > > >     }
> > > > > >
> > > > > > It took a while to get to checkpoint 32,767, but eventually it
> > > > > > did, and it failed with the same error I listed above.
> > > > > >
> > > > > > On Thu, Apr 16, 2020 at 11:26 AM Stephen Patel
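
A side note on the number in the reproduction above: checkpoint 32,767 is exactly Short.MAX_VALUE, which suggests the per-checkpoint state cells are exhausting a signed 16-bit counter in the state metadata. That is an inference from the reported number, not a statement about Flink's internals. A quick sanity check of the arithmetic:

```java
public class CheckpointLimitCheck {
    public static void main(String[] args) {
        // 32,767 is the largest value a signed 16-bit counter can hold.
        System.out.println((int) Short.MAX_VALUE);  // 32767

        // Registering one more state cell past that limit would wrap a
        // 16-bit counter around to a negative value.
        short next = (short) (Short.MAX_VALUE + 1);
        System.out.println(next);  // -32768
    }
}
```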