As an update.

Direct and Twister2 are done.
Samza: is ready for review[1].
Flink: is almost ready for review. [2] lays all the groundwork for the
migration and [3] finishes the migration (there is a timeout happening in
FlinkSubmissionTest that I'm trying to figure out).
No further updates on Spark[4] or Jet[5].

@Maximilian Michels <m...@apache.org> or @t...@apache.org
<thomas.we...@gmail.com>, can either of you take a look at the Flink PRs?
@ke.wu...@icloud.com <ke.wu...@icloud.com>, Since Xinyu delegated to you,
can you take another look at the Samza PR?

1: https://github.com/apache/beam/pull/12617
2: https://github.com/apache/beam/pull/12706
3: https://github.com/apache/beam/pull/12708
4: https://github.com/apache/beam/pull/12603
5: https://github.com/apache/beam/pull/12616

On Tue, Aug 18, 2020 at 11:42 AM Pulasthi Supun Wickramasinghe <
pulasthi...@gmail.com> wrote:

> Hi Luke
>
> Will take a look at this as soon as possible and get back to you.
>
> Best Regards,
> Pulasthi
>
> On Tue, Aug 18, 2020 at 2:30 PM Luke Cwik <lc...@google.com> wrote:
>
>> I have made some good progress here and have gotten to the following
>> state for non-portable runners:
>>
>> DirectRunner[1]: Merged. Supports Read.Bounded and Read.Unbounded.
>> Twister2[2]: Ready for review. Supports Read.Bounded, the current runner
>> doesn't support unbounded pipelines.
>> Spark[3]: WIP. Supports Read.Bounded, Nexmark suite passes. Not certain
>> about level of unbounded pipeline support coverage since Spark uses its own
>> tiny suite of tests to get unbounded pipeline coverage instead of the
>> validates runner set.
>> Jet[4]: WIP. Supports Read.Bounded. Read.Unbounded definitely needs
>> additional work.
>> Sazma[5]: WIP. Supports Read.Bounded. Not certain about level of
>> unbounded pipeline support coverage since Spark uses its own tiny suite of
>> tests to get unbounded pipeline coverage instead of the validates runner
>> set.
>> Flink: Unstarted.
>>
>> @Pulasthi Supun Wickramasinghe <pulasthi...@gmail.com> , can you help me
>> with the Twister2 PR[2]?
>> @Ismaël Mejía <ieme...@gmail.com>, is PR[3] the expected level of
>> support for unbounded pipelines and hence ready for review?
>> @Jozsef Bartok <jo...@hazelcast.com>, can you help me out to get support
>> for unbounded splittable DoFn's into Jet[4]?
>> @Xinyu Liu <xinyuliu...@gmail.com>, is PR[5] the expected level of
>> support for unbounded pipelines and hence ready for review?
>>
>> 1: https://github.com/apache/beam/pull/12519
>> 2: https://github.com/apache/beam/pull/12594
>> 3: https://github.com/apache/beam/pull/12603
>> 4: https://github.com/apache/beam/pull/12616
>> 5: https://github.com/apache/beam/pull/12617
>>
>> On Tue, Aug 11, 2020 at 10:55 AM Luke Cwik <lc...@google.com> wrote:
>>
>>> There shouldn't be any changes required since the wrapper will smoothly
>>> transition the execution to be run as an SDF. New IOs should strongly
>>> prefer to use SDF since it should be simpler to write and will be more
>>> flexible but they can use the "*Source"-based APIs. Eventually we'll
>>> deprecate the APIs but we will never stop supporting them. Eventually they
>>> should all be migrated to use SDF and if there is another major Beam
>>> version, we'll finally be able to remove them.
>>>
>>> On Tue, Aug 11, 2020 at 8:40 AM Alexey Romanenko <
>>> aromanenko....@gmail.com> wrote:
>>>
>>>> Hi Luke,
>>>>
>>>> Great to hear about such progress on this!
>>>>
>>>> Talking about opt-out for all runners in the future, will it require
>>>> any code changes for current “*Source”-based IOs or the wrappers should
>>>> completely smooth this transition?
>>>> Do we need to require to create new IOs only based on SDF or again, the
>>>> wrappers should help to avoid this?
>>>>
>>>> On 10 Aug 2020, at 22:59, Luke Cwik <lc...@google.com> wrote:
>>>>
>>>> In the past couple of months wrappers[1, 2] have been added to the Beam
>>>> Java SDK which can execute BoundedSource and UnboundedSource as Splittable
>>>> DoFns. These have been opt-out for portable pipelines (e.g. Dataflow runner
>>>> v2, XLang pipelines on Flink/Spark) and opt-in using an experiment for all
>>>> other pipelines.
>>>>
>>>> I would like to start making the non-portable pipelines starting with
>>>> the DirectRunner[3] to be opt-out with the plan that eventually all runners
>>>> will only execute splittable DoFns and the BoundedSource/UnboundedSource
>>>> specific execution logic from the runners will be removed.
>>>>
>>>> Users will be able to opt-in any pipeline using the experiment
>>>> 'use_sdf_read' and opt-out with the experiment 'use_deprecated_read'. (For
>>>> portable pipelines these experiments were 'beam_fn_api' and
>>>> 'beam_fn_api_use_deprecated_read' respectively and I have added these two
>>>> additional aliases to make the experience less confusing).
>>>>
>>>> 1:
>>>> https://github.com/apache/beam/blob/af1ce643d8fde5352d4519a558de4a2dfd24721d/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L275
>>>> 2:
>>>> https://github.com/apache/beam/blob/af1ce643d8fde5352d4519a558de4a2dfd24721d/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L449
>>>> 3: https://github.com/apache/beam/pull/12519
>>>>
>>>>
>>>>
>
> --
> Pulasthi S. Wickramasinghe
> PhD Candidate  | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
> cell: 224-386-9035 <(224)%20386-9035>
>

Reply via email to