A limited number of streaming pipelines run as post-commits on Jenkins,
but that coverage isn't comprehensive compared to the ValidatesRunner set.

Googlers regularly import the GitHub code into Google and find issues with
implementations that have been merged. Sometimes it's bugs in the
implementation and sometimes it's issues related to how the code is
integrated within Google. We typically work with the PR author to fix
things, or we fix them ourselves.

As always, we hope to improve this situation by running more tests
externally, but we first need to address some issues with how we launch
one Dataflow job per test, since we have GCP project quota limits.
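
On forcing --streaming into the test arguments (suggested further down in
this thread): here is a minimal sketch of the idea in plain Java, not the
actual Gradle change. The class and method names (StreamingArgs,
withStreaming) and the sample option values are hypothetical. The sketch
assumes the pipeline options arrive as a list of "--name=value" strings and
that the boolean option can be written as --streaming=true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; it is not part of Beam.
final class StreamingArgs {

  /**
   * Returns a copy of the given pipeline arguments in which any existing
   * --streaming setting is dropped and --streaming=true is appended, so the
   * resulting arguments always request streaming mode.
   */
  static List<String> withStreaming(List<String> args) {
    List<String> result = new ArrayList<>();
    for (String arg : args) {
      boolean isStreamingFlag =
          arg.equals("--streaming") || arg.startsWith("--streaming=");
      if (!isStreamingFlag) {
        result.add(arg); // keep every unrelated option as-is
      }
    }
    result.add("--streaming=true");
    return result;
  }

  public static void main(String[] unused) {
    // Sample values are placeholders, not taken from the Beam build.
    List<String> original =
        Arrays.asList("--runner=TestDataflowRunner", "--streaming=false");
    // prints [--runner=TestDataflowRunner, --streaming=true]
    System.out.println(withStreaming(original));
  }
}
```

Overriding an existing value, rather than only adding the flag when it is
missing, keeps a stray --streaming=false from silently leaving the tests in
batch mode.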

On Thu, Apr 9, 2020 at 2:33 PM Reuven Lax <re...@google.com> wrote:

> You can also use Create and call setIsBoundedInternal(IsBounded.UNBOUNDED)
> on the resulting PCollection, which will force the streaming runner to be
> used.
>
> On Thu, Apr 9, 2020 at 2:25 PM Steve Niemitz <sniem...@twitter.com> wrote:
>
>> Ah yeah I forgot that you can force a pipeline into streaming mode with
>> that flag.
>>
>> It sounds like the story here is there are tests for the streaming
>> worker, but they run "on the side" in Google's environment?  My concern is
>> that (publicly at least) there seems to be no test coverage on the
>> streaming worker, i.e., if I make changes, how do I know if I broke anything?
>>
>> On Thu, Apr 9, 2020 at 5:20 PM Luke Cwik <lc...@google.com> wrote:
>>
>>> You can use Create in streaming pipelines as well, but you want to ensure
>>> that --streaming is passed as a flag.
>>> You could update the existing test target and force --streaming to be
>>> inserted, for example here:
>>>
>>> https://github.com/lukecwik/incubator-beam/blob/8097972b4d0ed759aa45f6710ac02b982c6e8deb/runners/google-cloud-dataflow-java/build.gradle#L156
>>>
>>> Create can't exercise a lot of complex windowing/triggering semantics,
>>> though, or cannot produce deterministic output. That is why TestStream
>>> was created: it is like Create but allows you to control watermark and
>>> processing time advancement. This allows for more complicated triggering
>>> and execution, but it only works with portable Dataflow pipelines that
>>> also use the additional experiment 'use_unified_worker'. Such pipelines
>>> can be launched with this target:
>>>
>>> https://github.com/lukecwik/incubator-beam/blob/8097972b4d0ed759aa45f6710ac02b982c6e8deb/runners/google-cloud-dataflow-java/build.gradle#L246
>>> We are looking to have the first API-stable version using the
>>> portability framework in 2.21.0, which should mean that running these
>>> tests outside of Google will be possible.
>>>
>>>
>>> On Thu, Apr 9, 2020 at 7:17 AM Steve Niemitz <sniem...@apache.org>
>>> wrote:
>>>
>>>> I was trying to run a @ValidatesRunner test for the streaming dataflow
>>>> runner, but I actually can't find any way to run them in streaming.  It
>>>> looks like all the tests are set up using the Create transform,
>>>> which generates a batch pipeline.
>>>>
>>>> Are there actually no @ValidatesRunner tests for the streaming dataflow
>>>> runner?  That seems like a big gap in coverage.  Am I missing a setting
>>>> somewhere?
>>>>
>>>
