As an update:

Direct and Twister2 are done.
Samza is ready for review[1].
Flink is almost ready for review. [2] lays the groundwork for the migration and [3] finishes it (FlinkSubmissionTest is hitting a timeout that I'm still trying to figure out).
No further updates on Spark[4] or Jet[5].

@Maximilian Michels <m...@apache.org> or @t...@apache.org <thomas.we...@gmail.com>, can either of you take a look at the Flink PRs?
@ke.wu...@icloud.com <ke.wu...@icloud.com>, since Xinyu delegated to you, can you take another look at the Samza PR?

1: https://github.com/apache/beam/pull/12617
2: https://github.com/apache/beam/pull/12706
3: https://github.com/apache/beam/pull/12708
4: https://github.com/apache/beam/pull/12603
5: https://github.com/apache/beam/pull/12616

On Tue, Aug 18, 2020 at 11:42 AM Pulasthi Supun Wickramasinghe <pulasthi...@gmail.com> wrote:

> Hi Luke
>
> Will take a look at this as soon as possible and get back to you.
>
> Best Regards,
> Pulasthi
>
> On Tue, Aug 18, 2020 at 2:30 PM Luke Cwik <lc...@google.com> wrote:
>
>> I have made some good progress here and have gotten to the following state for non-portable runners:
>>
>> DirectRunner[1]: Merged. Supports Read.Bounded and Read.Unbounded.
>> Twister2[2]: Ready for review. Supports Read.Bounded; the current runner doesn't support unbounded pipelines.
>> Spark[3]: WIP. Supports Read.Bounded, Nexmark suite passes. Not certain about level of unbounded pipeline support coverage since Spark uses its own tiny suite of tests to get unbounded pipeline coverage instead of the validates runner set.
>> Jet[4]: WIP. Supports Read.Bounded. Read.Unbounded definitely needs additional work.
>> Samza[5]: WIP. Supports Read.Bounded. Not certain about level of unbounded pipeline support coverage since Spark uses its own tiny suite of tests to get unbounded pipeline coverage instead of the validates runner set.
>> Flink: Unstarted.
>>
>> @Pulasthi Supun Wickramasinghe <pulasthi...@gmail.com>, can you help me with the Twister2 PR[2]?
>> @Ismaël Mejía <ieme...@gmail.com>, is PR[3] the expected level of support for unbounded pipelines and hence ready for review?
>> @Jozsef Bartok <jo...@hazelcast.com>, can you help me out to get support for unbounded splittable DoFn's into Jet[4]?
>> @Xinyu Liu <xinyuliu...@gmail.com>, is PR[5] the expected level of support for unbounded pipelines and hence ready for review?
>>
>> 1: https://github.com/apache/beam/pull/12519
>> 2: https://github.com/apache/beam/pull/12594
>> 3: https://github.com/apache/beam/pull/12603
>> 4: https://github.com/apache/beam/pull/12616
>> 5: https://github.com/apache/beam/pull/12617
>>
>> On Tue, Aug 11, 2020 at 10:55 AM Luke Cwik <lc...@google.com> wrote:
>>
>>> There shouldn't be any changes required since the wrapper will smoothly transition the execution to be run as an SDF. New IOs should strongly prefer to use SDF since it should be simpler to write and will be more flexible but they can use the "*Source"-based APIs. Eventually we'll deprecate the APIs but we will never stop supporting them. Eventually they should all be migrated to use SDF and if there is another major Beam version, we'll finally be able to remove them.
>>>
>>> On Tue, Aug 11, 2020 at 8:40 AM Alexey Romanenko <aromanenko....@gmail.com> wrote:
>>>
>>>> Hi Luke,
>>>>
>>>> Great to hear about such progress on this!
>>>>
>>>> Talking about opt-out for all runners in the future, will it require any code changes for current “*Source”-based IOs or the wrappers should completely smooth this transition?
>>>> Do we need to require to create new IOs only based on SDF or again, the wrappers should help to avoid this?
>>>>
>>>> On 10 Aug 2020, at 22:59, Luke Cwik <lc...@google.com> wrote:
>>>>
>>>> In the past couple of months wrappers[1, 2] have been added to the Beam Java SDK which can execute BoundedSource and UnboundedSource as Splittable DoFns. These have been opt-out for portable pipelines (e.g. Dataflow runner v2, XLang pipelines on Flink/Spark) and opt-in using an experiment for all other pipelines.
>>>>
>>>> I would like to start making the non-portable pipelines starting with the DirectRunner[3] to be opt-out with the plan that eventually all runners will only execute splittable DoFns and the BoundedSource/UnboundedSource specific execution logic from the runners will be removed.
>>>>
>>>> Users will be able to opt-in any pipeline using the experiment 'use_sdf_read' and opt-out with the experiment 'use_deprecated_read'. (For portable pipelines these experiments were 'beam_fn_api' and 'beam_fn_api_use_deprecated_read' respectively and I have added these two additional aliases to make the experience less confusing).
>>>>
>>>> 1: https://github.com/apache/beam/blob/af1ce643d8fde5352d4519a558de4a2dfd24721d/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L275
>>>> 2: https://github.com/apache/beam/blob/af1ce643d8fde5352d4519a558de4a2dfd24721d/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L449
>>>> 3: https://github.com/apache/beam/pull/12519
>
> --
> Pulasthi S. Wickramasinghe
> PhD Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
> cell: 224-386-9035
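
For anyone skimming the thread for the experiment names, here is a rough sketch of how the opt-in/opt-out flags described in the original message might interact with a runner's default. It is a toy model written in Python, not Beam's code: the function name `reads_as_sdf` and the `sdf_by_default` parameter are made up for illustration, and the rule that an explicit opt-out beats an opt-in when both are set is an assumption, not something stated in the thread. Only the four experiment strings come from the message above.

```python
# Toy model only: this is NOT Beam's implementation.
# The experiment names are the ones quoted in the thread; everything
# else (function name, parameter, precedence rule) is hypothetical.

OPT_IN = {"use_sdf_read", "beam_fn_api"}
OPT_OUT = {"use_deprecated_read", "beam_fn_api_use_deprecated_read"}


def reads_as_sdf(experiments, sdf_by_default):
    """Return True if a Read would be executed via the SDF wrapper.

    experiments: iterable of experiment strings set on the pipeline.
    sdf_by_default: True for an "opt-out" runner, False for "opt-in".
    """
    requested = set(experiments)
    if requested & OPT_OUT:
        # Assumption: an explicit opt-out wins over everything else.
        return False
    if requested & OPT_IN:
        return True
    # No relevant experiment set: fall back to the runner's default.
    return sdf_by_default


if __name__ == "__main__":
    # An opt-in runner with no experiments keeps the legacy read path.
    print(reads_as_sdf([], sdf_by_default=False))
    # An opt-out runner can be pushed back to the legacy path.
    print(reads_as_sdf(["use_deprecated_read"], sdf_by_default=True))
```

In this model, moving a runner from opt-in to opt-out corresponds to flipping `sdf_by_default` from False to True; pipelines that already set one of the experiments behave the same before and after the flip.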