I think this problem emphasise the importance of triggers - for tests alone, it's a huge improvement over non-deterministic and optimistic hacks (wait for... tail log to see... etc.)
As a runner developer, I'd like to be able to develop incrementally with tests that allow me to say I support the Source API in streaming, but not yet triggers. This is not possible now as PAssert relies on GroupByKey and Never.ever() trigger. If I had a quick solution for this, I'd offer it, but I understand why it's hard (exactly because of the discussed issues) - I thought that filtering-out the empty iterable would be ok because runners are asserting the success/failure aggregators. Anyway, I think it's something worth considering as part of the Runner API - being able to break-down the development effort to testable stages. Thanks, Amit On Thu, Nov 3, 2016 at 12:11 AM Kenneth Knowles <k...@google.com> wrote: > Interesting problem. > > First, I do think that PAssert is defined entirely in terms of the model > so this is a real problem with the results that are produced. > > Then, I think the answer might lie with triggering. When PAssert rewindows > into the global window and triggers "never" this actually means that its > internal GBK emits all of its output exactly once, when the window is > expiring. So producing more than one output - even if spread across > microbatches - is actually the trouble. > > On Wed, Nov 2, 2016 at 10:20 AM Amit Sela <amitsel...@gmail.com> wrote: > > Spark won't evaluate empty PCollections as well, and I've handled it by > asserting the SUCCESS/FAIL aggregators in PAssert in the TestSparkRunner. > > Main problem for Spark is when running gracefully it might take sometime > to terminate and execute another batch which is an empty iterable, then > fail on AssertionError <0> but expected <100>. > Running ungracefully seemed to come back and bite me in the past, so I'm > trying to avoid it. > > Thoughts ? > > On Wed, Nov 2, 2016 at 6:00 PM Dan Halperin <dhalp...@google.com.invalid> > wrote: > > (Also: I meant "tests will [incorrectly] pass silently") > > On Wed, Nov 2, 2016 at 8:57 AM, Kenneth Knowles <k...@google.com> wrote: > > > FWIW if the runner is set up properly the tests will still fail with a > > timeout waiting for the assertion aggregators to reach expected values. > > Unfortunately we haven't yet centralized this functionality into > > TestPipeline or thereabouts. > > > > On Wed, Nov 2, 2016 at 8:56 AM Dan Halperin <dhalp...@google.com> wrote: > > > >> +Kenn > >> > >> I believe this is done because if there is no output, no assertions will > >> be run and tests will fail silently. (This was a side effect of > switching > >> from side inputs to groupbykey for this flow, which enabled testing of > >> triggers/panes/etc.) > >> > >> On Wed, Nov 2, 2016 at 5:19 AM, Amit Sela <amitsel...@gmail.com> wrote: > >> > >> I've proposed https://github.com/apache/incubator-beam/pull/1257 (also > >> opened a ticket). > >> Tested locally Direct/Flink/Spark runners, and it looks fine, I've > issued > >> a > >> PR to see if it affects Dataflow runner. > >> > >> Amit. > >> > >> On Wed, Nov 2, 2016 at 11:56 AM Jean-Baptiste Onofré <j...@nanthrax.net> > >> wrote: > >> > >> > Agree, this element should be removed. > >> > > >> > Regards > >> > JB > >> > > >> > On 11/02/2016 10:53 AM, Amit Sela wrote: > >> > > Hi all, > >> > > > >> > > I've been looking at PAssert and I've notice that > >> PAssert.GroupedGlobally > >> > > points > >> > > < > >> > https://github.com/apache/incubator-beam/blob/master/sdks/ > >> java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L825 > >> > > > >> > > that it will result in a singe empty iterable even if the input > >> > PCollection > >> > > is empty. > >> > > This is a weird behaviour as it may cause following assertions to > >> fail. > >> > > > >> > > Wouldn't it be more correct to remove (filter out ?) this element ? > >> > > > >> > > Thanks, > >> > > Amit > >> > > > >> > > >> > -- > >> > Jean-Baptiste Onofré > >> > jbono...@apache.org > >> > http://blog.nanthrax.net > >> > Talend - http://www.talend.com > >> > > >> > >> > >> > >