Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-03 Thread Amit Sela
I think this problem emphasise the importance of triggers - for tests
alone, it's a huge improvement over non-deterministic and optimistic hacks
(wait for... tail log to see... etc.)

As a runner developer, I'd like to be able to develop incrementally with
tests that allow me to say I support the Source API in streaming, but not
yet triggers. This is not possible now as PAssert relies on GroupByKey and
Never.ever() trigger.

If I had a quick solution for this, I'd offer it, but I understand why it's
hard (exactly because of the discussed issues) - I thought that
filtering-out the empty iterable would be ok because runners are asserting
the success/failure aggregators.

Anyway, I think it's something worth considering as part of the Runner API
- being able to break-down the development effort to testable stages.

Thanks,
Amit

On Thu, Nov 3, 2016 at 12:11 AM Kenneth Knowles  wrote:

> Interesting problem.
>
> First, I do think that PAssert is defined entirely in terms of the model
> so this is a real problem with the results that are produced.
>
> Then, I think the answer might lie with triggering. When PAssert rewindows
> into the global window and triggers "never" this actually means that its
> internal GBK emits all of its output exactly once, when the window is
> expiring. So producing more than one output - even if spread across
> microbatches - is actually the trouble.
>
> On Wed, Nov 2, 2016 at 10:20 AM Amit Sela  wrote:
>
> Spark won't evaluate empty PCollections as well, and I've handled it by
> asserting the SUCCESS/FAIL aggregators in PAssert in the TestSparkRunner.
>
> Main problem for Spark is when running gracefully it might take sometime
> to terminate and execute another batch which is an empty iterable, then
> fail on AssertionError <0> but expected <100>.
> Running ungracefully seemed to come back and bite me in the past, so I'm
> trying to avoid it.
>
> Thoughts ?
>
> On Wed, Nov 2, 2016 at 6:00 PM Dan Halperin 
> wrote:
>
> (Also: I meant "tests will [incorrectly] pass silently")
>
> On Wed, Nov 2, 2016 at 8:57 AM, Kenneth Knowles  wrote:
>
> > FWIW if the runner is set up properly the tests will still fail with a
> > timeout waiting for the assertion aggregators to reach expected values.
> > Unfortunately we haven't yet centralized this functionality into
> > TestPipeline or thereabouts.
> >
> > On Wed, Nov 2, 2016 at 8:56 AM Dan Halperin  wrote:
> >
> >> +Kenn
> >>
> >> I believe this is done because if there is no output, no assertions will
> >> be run and tests will fail silently. (This was a side effect of
> switching
> >> from side inputs to groupbykey for this flow, which enabled testing of
> >> triggers/panes/etc.)
> >>
> >> On Wed, Nov 2, 2016 at 5:19 AM, Amit Sela  wrote:
> >>
> >> I've proposed https://github.com/apache/incubator-beam/pull/1257 (also
> >> opened a ticket).
> >> Tested locally Direct/Flink/Spark runners, and it looks fine, I've
> issued
> >> a
> >> PR to see if it affects Dataflow runner.
> >>
> >> Amit.
> >>
> >> On Wed, Nov 2, 2016 at 11:56 AM Jean-Baptiste Onofré 
> >> wrote:
> >>
> >> > Agree, this element should be removed.
> >> >
> >> > Regards
> >> > JB
> >> >
> >> > On 11/02/2016 10:53 AM, Amit Sela wrote:
> >> > > Hi all,
> >> > >
> >> > > I've been looking at PAssert and I've notice that
> >> PAssert.GroupedGlobally
> >> > > points
> >> > > <
> >> > https://github.com/apache/incubator-beam/blob/master/sdks/
> >> java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L825
> >> > >
> >> > > that it will result in a singe empty iterable even if the input
> >> > PCollection
> >> > > is empty.
> >> > > This is a weird behaviour as it may cause following assertions to
> >> fail.
> >> > >
> >> > > Wouldn't it be more correct to remove (filter out ?) this element ?
> >> > >
> >> > > Thanks,
> >> > > Amit
> >> > >
> >> >
> >> > --
> >> > Jean-Baptiste Onofré
> >> > jbono...@apache.org
> >> > http://blog.nanthrax.net
> >> > Talend - http://www.talend.com
> >> >
> >>
> >>
> >>
>
>


Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Kenneth Knowles
Interesting problem.

First, I do think that PAssert is defined entirely in terms of the model so
this is a real problem with the results that are produced.

Then, I think the answer might lie with triggering. When PAssert rewindows
into the global window and triggers "never" this actually means that its
internal GBK emits all of its output exactly once, when the window is
expiring. So producing more than one output - even if spread across
microbatches - is actually the trouble.

On Wed, Nov 2, 2016 at 10:20 AM Amit Sela  wrote:

> Spark won't evaluate empty PCollections as well, and I've handled it by
> asserting the SUCCESS/FAIL aggregators in PAssert in the TestSparkRunner.
>
> Main problem for Spark is when running gracefully it might take sometime
> to terminate and execute another batch which is an empty iterable, then
> fail on AssertionError <0> but expected <100>.
> Running ungracefully seemed to come back and bite me in the past, so I'm
> trying to avoid it.
>
> Thoughts ?
>
> On Wed, Nov 2, 2016 at 6:00 PM Dan Halperin 
> wrote:
>
> (Also: I meant "tests will [incorrectly] pass silently")
>
> On Wed, Nov 2, 2016 at 8:57 AM, Kenneth Knowles  wrote:
>
> > FWIW if the runner is set up properly the tests will still fail with a
> > timeout waiting for the assertion aggregators to reach expected values.
> > Unfortunately we haven't yet centralized this functionality into
> > TestPipeline or thereabouts.
> >
> > On Wed, Nov 2, 2016 at 8:56 AM Dan Halperin  wrote:
> >
> >> +Kenn
> >>
> >> I believe this is done because if there is no output, no assertions will
> >> be run and tests will fail silently. (This was a side effect of
> switching
> >> from side inputs to groupbykey for this flow, which enabled testing of
> >> triggers/panes/etc.)
> >>
> >> On Wed, Nov 2, 2016 at 5:19 AM, Amit Sela  wrote:
> >>
> >> I've proposed https://github.com/apache/incubator-beam/pull/1257 (also
> >> opened a ticket).
> >> Tested locally Direct/Flink/Spark runners, and it looks fine, I've
> issued
> >> a
> >> PR to see if it affects Dataflow runner.
> >>
> >> Amit.
> >>
> >> On Wed, Nov 2, 2016 at 11:56 AM Jean-Baptiste Onofré 
> >> wrote:
> >>
> >> > Agree, this element should be removed.
> >> >
> >> > Regards
> >> > JB
> >> >
> >> > On 11/02/2016 10:53 AM, Amit Sela wrote:
> >> > > Hi all,
> >> > >
> >> > > I've been looking at PAssert and I've notice that
> >> PAssert.GroupedGlobally
> >> > > points
> >> > > <
> >> > https://github.com/apache/incubator-beam/blob/master/sdks/
> >> java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L825
> >> > >
> >> > > that it will result in a singe empty iterable even if the input
> >> > PCollection
> >> > > is empty.
> >> > > This is a weird behaviour as it may cause following assertions to
> >> fail.
> >> > >
> >> > > Wouldn't it be more correct to remove (filter out ?) this element ?
> >> > >
> >> > > Thanks,
> >> > > Amit
> >> > >
> >> >
> >> > --
> >> > Jean-Baptiste Onofré
> >> > jbono...@apache.org
> >> > http://blog.nanthrax.net
> >> > Talend - http://www.talend.com
> >> >
> >>
> >>
> >>
>
>


Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Amit Sela
Spark won't evaluate empty PCollections as well, and I've handled it by
asserting the SUCCESS/FAIL aggregators in PAssert in the TestSparkRunner.

Main problem for Spark is when running gracefully it might take sometime to
terminate and execute another batch which is an empty iterable, then fail
on AssertionError <0> but expected <100>.
Running ungracefully seemed to come back and bite me in the past, so I'm
trying to avoid it.

Thoughts ?

On Wed, Nov 2, 2016 at 6:00 PM Dan Halperin 
wrote:

> (Also: I meant "tests will [incorrectly] pass silently")
>
> On Wed, Nov 2, 2016 at 8:57 AM, Kenneth Knowles  wrote:
>
> > FWIW if the runner is set up properly the tests will still fail with a
> > timeout waiting for the assertion aggregators to reach expected values.
> > Unfortunately we haven't yet centralized this functionality into
> > TestPipeline or thereabouts.
> >
> > On Wed, Nov 2, 2016 at 8:56 AM Dan Halperin  wrote:
> >
> >> +Kenn
> >>
> >> I believe this is done because if there is no output, no assertions will
> >> be run and tests will fail silently. (This was a side effect of
> switching
> >> from side inputs to groupbykey for this flow, which enabled testing of
> >> triggers/panes/etc.)
> >>
> >> On Wed, Nov 2, 2016 at 5:19 AM, Amit Sela  wrote:
> >>
> >> I've proposed https://github.com/apache/incubator-beam/pull/1257 (also
> >> opened a ticket).
> >> Tested locally Direct/Flink/Spark runners, and it looks fine, I've
> issued
> >> a
> >> PR to see if it affects Dataflow runner.
> >>
> >> Amit.
> >>
> >> On Wed, Nov 2, 2016 at 11:56 AM Jean-Baptiste Onofré 
> >> wrote:
> >>
> >> > Agree, this element should be removed.
> >> >
> >> > Regards
> >> > JB
> >> >
> >> > On 11/02/2016 10:53 AM, Amit Sela wrote:
> >> > > Hi all,
> >> > >
> >> > > I've been looking at PAssert and I've notice that
> >> PAssert.GroupedGlobally
> >> > > points
> >> > > <
> >> > https://github.com/apache/incubator-beam/blob/master/sdks/
> >> java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L825
> >> > >
> >> > > that it will result in a singe empty iterable even if the input
> >> > PCollection
> >> > > is empty.
> >> > > This is a weird behaviour as it may cause following assertions to
> >> fail.
> >> > >
> >> > > Wouldn't it be more correct to remove (filter out ?) this element ?
> >> > >
> >> > > Thanks,
> >> > > Amit
> >> > >
> >> >
> >> > --
> >> > Jean-Baptiste Onofré
> >> > jbono...@apache.org
> >> > http://blog.nanthrax.net
> >> > Talend - http://www.talend.com
> >> >
> >>
> >>
> >>
>


Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Kenneth Knowles
The iterable is the entirety of the contents of the PCollection. So empty
iterable -> empty PCollection.

It is actually main purpose/complexity in this transform to make sure it is
non-empty, because otherwise downstream asserts do not run.

On Wed, Nov 2, 2016 at 5:20 AM Amit Sela  wrote:

> I've proposed https://github.com/apache/incubator-beam/pull/1257 (also
> opened a ticket).
> Tested locally Direct/Flink/Spark runners, and it looks fine, I've issued a
> PR to see if it affects Dataflow runner.
>
> Amit.
>
> On Wed, Nov 2, 2016 at 11:56 AM Jean-Baptiste Onofré 
> wrote:
>
> > Agree, this element should be removed.
> >
> > Regards
> > JB
> >
> > On 11/02/2016 10:53 AM, Amit Sela wrote:
> > > Hi all,
> > >
> > > I've been looking at PAssert and I've notice that
> PAssert.GroupedGlobally
> > > points
> > > <
> >
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L825
> > >
> > > that it will result in a singe empty iterable even if the input
> > PCollection
> > > is empty.
> > > This is a weird behaviour as it may cause following assertions to fail.
> > >
> > > Wouldn't it be more correct to remove (filter out ?) this element ?
> > >
> > > Thanks,
> > > Amit
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Dan Halperin
(Also: I meant "tests will [incorrectly] pass silently")

On Wed, Nov 2, 2016 at 8:57 AM, Kenneth Knowles  wrote:

> FWIW if the runner is set up properly the tests will still fail with a
> timeout waiting for the assertion aggregators to reach expected values.
> Unfortunately we haven't yet centralized this functionality into
> TestPipeline or thereabouts.
>
> On Wed, Nov 2, 2016 at 8:56 AM Dan Halperin  wrote:
>
>> +Kenn
>>
>> I believe this is done because if there is no output, no assertions will
>> be run and tests will fail silently. (This was a side effect of switching
>> from side inputs to groupbykey for this flow, which enabled testing of
>> triggers/panes/etc.)
>>
>> On Wed, Nov 2, 2016 at 5:19 AM, Amit Sela  wrote:
>>
>> I've proposed https://github.com/apache/incubator-beam/pull/1257 (also
>> opened a ticket).
>> Tested locally Direct/Flink/Spark runners, and it looks fine, I've issued
>> a
>> PR to see if it affects Dataflow runner.
>>
>> Amit.
>>
>> On Wed, Nov 2, 2016 at 11:56 AM Jean-Baptiste Onofré 
>> wrote:
>>
>> > Agree, this element should be removed.
>> >
>> > Regards
>> > JB
>> >
>> > On 11/02/2016 10:53 AM, Amit Sela wrote:
>> > > Hi all,
>> > >
>> > > I've been looking at PAssert and I've notice that
>> PAssert.GroupedGlobally
>> > > points
>> > > <
>> > https://github.com/apache/incubator-beam/blob/master/sdks/
>> java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L825
>> > >
>> > > that it will result in a singe empty iterable even if the input
>> > PCollection
>> > > is empty.
>> > > This is a weird behaviour as it may cause following assertions to
>> fail.
>> > >
>> > > Wouldn't it be more correct to remove (filter out ?) this element ?
>> > >
>> > > Thanks,
>> > > Amit
>> > >
>> >
>> > --
>> > Jean-Baptiste Onofré
>> > jbono...@apache.org
>> > http://blog.nanthrax.net
>> > Talend - http://www.talend.com
>> >
>>
>>
>>


Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Kenneth Knowles
FWIW if the runner is set up properly the tests will still fail with a
timeout waiting for the assertion aggregators to reach expected values.
Unfortunately we haven't yet centralized this functionality into
TestPipeline or thereabouts.

On Wed, Nov 2, 2016 at 8:56 AM Dan Halperin  wrote:

> +Kenn
>
> I believe this is done because if there is no output, no assertions will
> be run and tests will fail silently. (This was a side effect of switching
> from side inputs to groupbykey for this flow, which enabled testing of
> triggers/panes/etc.)
>
> On Wed, Nov 2, 2016 at 5:19 AM, Amit Sela  wrote:
>
> I've proposed https://github.com/apache/incubator-beam/pull/1257 (also
> opened a ticket).
> Tested locally Direct/Flink/Spark runners, and it looks fine, I've issued a
> PR to see if it affects Dataflow runner.
>
> Amit.
>
> On Wed, Nov 2, 2016 at 11:56 AM Jean-Baptiste Onofré 
> wrote:
>
> > Agree, this element should be removed.
> >
> > Regards
> > JB
> >
> > On 11/02/2016 10:53 AM, Amit Sela wrote:
> > > Hi all,
> > >
> > > I've been looking at PAssert and I've notice that
> PAssert.GroupedGlobally
> > > points
> > > <
> >
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L825
> > >
> > > that it will result in a singe empty iterable even if the input
> > PCollection
> > > is empty.
> > > This is a weird behaviour as it may cause following assertions to fail.
> > >
> > > Wouldn't it be more correct to remove (filter out ?) this element ?
> > >
> > > Thanks,
> > > Amit
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
>
>


Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Dan Halperin
+Kenn

I believe this is done because if there is no output, no assertions will be
run and tests will fail silently. (This was a side effect of switching from
side inputs to groupbykey for this flow, which enabled testing of
triggers/panes/etc.)

On Wed, Nov 2, 2016 at 5:19 AM, Amit Sela  wrote:

> I've proposed https://github.com/apache/incubator-beam/pull/1257 (also
> opened a ticket).
> Tested locally Direct/Flink/Spark runners, and it looks fine, I've issued a
> PR to see if it affects Dataflow runner.
>
> Amit.
>
> On Wed, Nov 2, 2016 at 11:56 AM Jean-Baptiste Onofré 
> wrote:
>
> > Agree, this element should be removed.
> >
> > Regards
> > JB
> >
> > On 11/02/2016 10:53 AM, Amit Sela wrote:
> > > Hi all,
> > >
> > > I've been looking at PAssert and I've notice that
> PAssert.GroupedGlobally
> > > points
> > > <
> > https://github.com/apache/incubator-beam/blob/master/
> sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L825
> > >
> > > that it will result in a singe empty iterable even if the input
> > PCollection
> > > is empty.
> > > This is a weird behaviour as it may cause following assertions to fail.
> > >
> > > Wouldn't it be more correct to remove (filter out ?) this element ?
> > >
> > > Thanks,
> > > Amit
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Amit Sela
I've proposed https://github.com/apache/incubator-beam/pull/1257 (also
opened a ticket).
Tested locally Direct/Flink/Spark runners, and it looks fine, I've issued a
PR to see if it affects Dataflow runner.

Amit.

On Wed, Nov 2, 2016 at 11:56 AM Jean-Baptiste Onofré 
wrote:

> Agree, this element should be removed.
>
> Regards
> JB
>
> On 11/02/2016 10:53 AM, Amit Sela wrote:
> > Hi all,
> >
> > I've been looking at PAssert and I've notice that PAssert.GroupedGlobally
> > points
> > <
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L825
> >
> > that it will result in a singe empty iterable even if the input
> PCollection
> > is empty.
> > This is a weird behaviour as it may cause following assertions to fail.
> >
> > Wouldn't it be more correct to remove (filter out ?) this element ?
> >
> > Thanks,
> > Amit
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Jean-Baptiste Onofré

Agree, this element should be removed.

Regards
JB

On 11/02/2016 10:53 AM, Amit Sela wrote:

Hi all,

I've been looking at PAssert and I've notice that PAssert.GroupedGlobally
points

that it will result in a singe empty iterable even if the input PCollection
is empty.
This is a weird behaviour as it may cause following assertions to fail.

Wouldn't it be more correct to remove (filter out ?) this element ?

Thanks,
Amit



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com