Most of it should probably go to https://beam.apache.org/contribute/portability/
Also for reference, here is the prototype doc:
https://s.apache.org/beam-portability-team-doc

Thomas

On Fri, May 18, 2018 at 5:35 AM, Kenneth Knowles <k...@google.com> wrote:
> This is awesome. Would you be up for adding a brief description at
> https://beam.apache.org/contribute/#works-in-progress and maybe a pointer
> to a gdoc with something like the contents of this email? (My reasoning is
> that (a) the contribution guide should stay concise, but (b) all this
> detail is helpful, yet (c) the detail may be ever-changing, so a separate
> web page is not the best format.)
>
> Kenn
>
> On Thu, May 17, 2018 at 3:13 PM Eugene Kirpichov <kirpic...@google.com>
> wrote:
>
>> Hi all,
>>
>> A little over a month ago, a large group of Beam community members built
>> a prototype of a portable Flink runner - that is, a runner that can
>> execute Beam pipelines on Flink via the Portability API
>> <https://s.apache.org/beam-runner-api>. The prototype was developed in a
>> separate branch <https://github.com/bsidhom/beam/tree/hacking-job-server>
>> and was successfully demonstrated at Flink Forward, where it ran Python
>> and Go pipelines in a limited setting.
>>
>> Since then, a smaller group of us (Ankur Goenka, Axel Magnuson, Ben
>> Sidhom and I) have been working on productionizing the prototype to
>> address its limitations and do things "the right way", preparing to
>> reuse this work for developing other portable runners (e.g. Spark). This
>> involves a surprising amount of work, since many important design and
>> implementation concerns could be ignored for the purposes of a
>> prototype. I wanted to give an update on where we stand now.
>>
>> Our immediate milestone in sight is to *run Java and Python batch
>> WordCount examples against a distributed remote Flink cluster*. That
>> involves a few moving parts, roughly in order of appearance:
>>
>> *Job submission:*
>> - The SDK is configured to use a "portable runner", whose responsibility
>> is to run the pipeline against a given JobService endpoint (see the
>> sketch after these lists).
>> - The portable runner converts the pipeline to a portable Pipeline
>> proto.
>> - The runner determines which artifacts it needs to stage, and stages
>> them against an ArtifactStagingService.
>> - A Flink-specific JobService receives the Pipeline proto, performs some
>> optimizations (e.g. fusion) and translates it to Flink datasets and
>> functions.
>>
>> *Job execution:*
>> - A Flink function executes a fused chain of Beam transforms (an
>> "executable stage") by converting the input and the stage to bundles and
>> executing them against an SDK harness.
>> - The function starts the proper SDK harness, as well as auxiliary
>> services (e.g. artifact retrieval, side input handling), and wires them
>> together.
>> - The function feeds the data to the harness and receives data back.
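>>
>> To make the submission path concrete, here is a rough sketch of what
>> running a pipeline through the portable runner might look like from the
>> Python SDK. The runner name and the --job_endpoint flag reflect the
>> in-flight PRs and should be treated as assumptions that may change as
>> this work lands:
>>
>>   import apache_beam as beam
>>   from apache_beam.options.pipeline_options import PipelineOptions
>>
>>   # Point the SDK at a running JobService endpoint (e.g. the Flink
>>   # JobService). The portable runner converts the pipeline to a
>>   # Pipeline proto, stages artifacts, and submits the job there.
>>   options = PipelineOptions([
>>       '--runner=PortableRunner',
>>       '--job_endpoint=localhost:8099',
>>   ])
>>
>>   with beam.Pipeline(options=options) as p:
>>     (p
>>      | 'Read' >> beam.Create(['a whale', 'a snail'])
>>      | 'Split' >> beam.FlatMap(lambda line: line.split())
>>      | 'Count' >> beam.combiners.Count.PerElement()
>>      | 'Format' >> beam.Map(lambda kv: '%s: %d' % kv)
>>      | 'Write' >> beam.io.WriteToText('/tmp/counts'))
>>
>> Note that nothing here is Flink-specific: the same program should submit
>> to any JobService implementation, which is what makes the work reusable
>> for other portable runners.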
>>
>> *And here is our status of implementation for these parts:* basically,
>> almost everything is either done or in review.
>>
>> *Job submission:*
>> - General-purpose portable runner in the Python SDK: done
>> <https://github.com/apache/beam/pull/5301>; Java SDK: also done
>> <https://github.com/apache/beam/pull/5150>
>> - Artifact staging from the Python SDK: in review (PR
>> <https://github.com/apache/beam/pull/5273>, PR
>> <https://github.com/apache/beam/pull/5251>); in Java, it's also done
>> - Flink JobService: in review <https://github.com/apache/beam/pull/5262>
>> - Translation from a Pipeline proto to Flink datasets and functions:
>> done <https://github.com/apache/beam/pull/5226>
>> - ArtifactStagingService implementation that stages artifacts to a
>> location on a distributed filesystem: in development (design is clear)
>>
>> *Job execution:*
>> - Flink function for executing via an SDK harness: done
>> <https://github.com/apache/beam/pull/5285>
>> - APIs for managing the lifecycle of an SDK harness: done
>> <https://github.com/apache/beam/pull/5152>
>> - Specific implementation of those APIs using Docker: part done
>> <https://github.com/apache/beam/pull/5189>, part in review
>> <https://github.com/apache/beam/pull/5392>
>> - ArtifactRetrievalService that retrieves artifacts from the location
>> where ArtifactStagingService staged them: in development
>>
>> We expect that the in-review parts will be done, and the in-development
>> parts developed, in the next 2-3 weeks. We will, of course, update the
>> community when this important milestone is reached.
>>
>> *After that, the next milestones include:*
>> - Set up Java, Python and Go ValidatesRunner tests to run against the
>> portable Flink runner, and get them to pass
>> - Bring Python and Go to parity with Java in terms of such test coverage
>> - Implement the portable Spark runner, with a similar lifecycle but
>> reusing almost all of the Flink work
>> - Add support for streaming to both (which requires SDF - that work is
>> progressing in parallel and by this point should be in a suitable place)
>>
>> *For people who would like to get involved in this effort:* you can
>> already help out by improving ValidatesRunner test coverage in Python
>> and Go. Java has >300 such tests, while Python has only a handful. There
>> will be a large amount of parallelizable work once we get the VR test
>> suites running - stay tuned. SDF + portability is also expected to
>> produce a lot of parallelizable work up for grabs within several weeks.
>>
>> Thanks!
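>>
>> P.S. To make the call for ValidatesRunner coverage concrete, here is a
>> minimal sketch of what such a test looks like in Python. The
>> @attr('ValidatesRunner') nose marker is how the Python suite tags these
>> tests today; treat the exact names here as assumptions:
>>
>>   import unittest
>>
>>   from nose.plugins.attrib import attr
>>
>>   import apache_beam as beam
>>   from apache_beam.testing.test_pipeline import TestPipeline
>>   from apache_beam.testing.util import assert_that, equal_to
>>
>>   class FlatMapTest(unittest.TestCase):
>>
>>     # The attribute lets the VR suites select this test and run it
>>     # against a particular runner (eventually, the portable Flink
>>     # runner).
>>     @attr('ValidatesRunner')
>>     def test_flat_map(self):
>>       with TestPipeline() as p:
>>         result = (p
>>                   | beam.Create(['a b', 'c'])
>>                   | beam.FlatMap(lambda s: s.split()))
>>         assert_that(result, equal_to(['a', 'b', 'c']))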