I think this problem emphasise the importance of triggers - for tests
alone, it's a huge improvement over non-deterministic and optimistic hacks
(wait for... tail log to see... etc.)

As a runner developer, I'd like to be able to develop incrementally with
tests that allow me to say I support the Source API in streaming, but not
yet triggers. This is not possible now as PAssert relies on GroupByKey and
Never.ever() trigger.

If I had a quick solution for this, I'd offer it, but I understand why it's
hard (exactly because of the discussed issues) - I thought that
filtering-out the empty iterable would be ok because runners are asserting
the success/failure aggregators.

Anyway, I think it's something worth considering as part of the Runner API
- being able to break-down the development effort to testable stages.

Thanks,
Amit

On Thu, Nov 3, 2016 at 12:11 AM Kenneth Knowles <k...@google.com> wrote:

> Interesting problem.
>
> First, I do think that PAssert is defined entirely in terms of the model
> so this is a real problem with the results that are produced.
>
> Then, I think the answer might lie with triggering. When PAssert rewindows
> into the global window and triggers "never" this actually means that its
> internal GBK emits all of its output exactly once, when the window is
> expiring. So producing more than one output - even if spread across
> microbatches - is actually the trouble.
>
> On Wed, Nov 2, 2016 at 10:20 AM Amit Sela <amitsel...@gmail.com> wrote:
>
> Spark won't evaluate empty PCollections as well, and I've handled it by
> asserting the SUCCESS/FAIL aggregators in PAssert in the TestSparkRunner.
>
> Main problem for Spark is when running gracefully it might take sometime
> to terminate and execute another batch which is an empty iterable, then
> fail on AssertionError <0> but expected <100>.
> Running ungracefully seemed to come back and bite me in the past, so I'm
> trying to avoid it.
>
> Thoughts ?
>
> On Wed, Nov 2, 2016 at 6:00 PM Dan Halperin <dhalp...@google.com.invalid>
> wrote:
>
> (Also: I meant "tests will [incorrectly] pass silently")
>
> On Wed, Nov 2, 2016 at 8:57 AM, Kenneth Knowles <k...@google.com> wrote:
>
> > FWIW if the runner is set up properly the tests will still fail with a
> > timeout waiting for the assertion aggregators to reach expected values.
> > Unfortunately we haven't yet centralized this functionality into
> > TestPipeline or thereabouts.
> >
> > On Wed, Nov 2, 2016 at 8:56 AM Dan Halperin <dhalp...@google.com> wrote:
> >
> >> +Kenn
> >>
> >> I believe this is done because if there is no output, no assertions will
> >> be run and tests will fail silently. (This was a side effect of
> switching
> >> from side inputs to groupbykey for this flow, which enabled testing of
> >> triggers/panes/etc.)
> >>
> >> On Wed, Nov 2, 2016 at 5:19 AM, Amit Sela <amitsel...@gmail.com> wrote:
> >>
> >> I've proposed https://github.com/apache/incubator-beam/pull/1257 (also
> >> opened a ticket).
> >> Tested locally Direct/Flink/Spark runners, and it looks fine, I've
> issued
> >> a
> >> PR to see if it affects Dataflow runner.
> >>
> >> Amit.
> >>
> >> On Wed, Nov 2, 2016 at 11:56 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> >> wrote:
> >>
> >> > Agree, this element should be removed.
> >> >
> >> > Regards
> >> > JB
> >> >
> >> > On 11/02/2016 10:53 AM, Amit Sela wrote:
> >> > > Hi all,
> >> > >
> >> > > I've been looking at PAssert and I've notice that
> >> PAssert.GroupedGlobally
> >> > > points
> >> > > <
> >> > https://github.com/apache/incubator-beam/blob/master/sdks/
> >> java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L825
> >> > >
> >> > > that it will result in a singe empty iterable even if the input
> >> > PCollection
> >> > > is empty.
> >> > > This is a weird behaviour as it may cause following assertions to
> >> fail.
> >> > >
> >> > > Wouldn't it be more correct to remove (filter out ?) this element ?
> >> > >
> >> > > Thanks,
> >> > > Amit
> >> > >
> >> >
> >> > --
> >> > Jean-Baptiste Onofré
> >> > jbono...@apache.org
> >> > http://blog.nanthrax.net
> >> > Talend - http://www.talend.com
> >> >
> >>
> >>
> >>
>
>

Reply via email to