Re: Current progress on Portable runners

2018-05-22 Thread Eugene Kirpichov
Thanks all! Yeah, I'll update the Portability page with the status of this
project and other pointers this week or next (mostly out of office this
week).


Re: Current progress on Portable runners

2018-05-18 Thread Thomas Weise
> - Flink JobService: in review

That's TODO (the above PR was merged, but it doesn't contain the Flink job
service).

Discussion about it is here:
https://docs.google.com/document/d/1xOaEEJrMmiSHprd-WiYABegfT129qqF-idUBINjxz8s/edit?ts=5afa1238

Thanks,
Thomas




Re: Current progress on Portable runners

2018-05-18 Thread Thomas Weise
Most of it should probably go to
https://beam.apache.org/contribute/portability/

Also for reference, here is the prototype doc:
https://s.apache.org/beam-portability-team-doc

Thomas


Re: Current progress on Portable runners

2018-05-18 Thread Kenneth Knowles
This is awesome. Would you be up for adding a brief description at
https://beam.apache.org/contribute/#works-in-progress and maybe a pointer
to a gdoc with something like the contents of this email? (my reasoning is
(a) keep the contribution guide concise but (b) all this detail is helpful
yet (c) the detail may be ever-changing so making a separate web page is
not the best format)

Kenn


Re: Current progress on Portable runners

2018-05-17 Thread Robert Bradshaw
On Thu, May 17, 2018 at 10:25 PM Thomas Weise wrote:

> Hi Eugene,

> Thanks for putting this together, this is a very nice update and brings
> much needed visibility to those hoping to make use of the portability
> features or contribute to them.

+1, this is a great summary.

> Since the P1 (MVP) milestone is "wordcount" and some of the next things
> listed are more contributor oriented, perhaps we can get more detail on
> what functionality users can expect?

The way things are structured, once we can run wordcount, we'll be able to
run almost all batch pipelines.

> The next P2 milestone is basically everything and that is a lot. It might
> actually help to break this down a bit more. A couple of things that I'm
> specifically interested in for Python on Flink:

> AFAIK state and timer support in Python are not being worked on yet. Is
> anyone planning to, and any idea when the SDK and portable runner might
> support it?

IIRC, Luke's starting to investigate portable support for state (and
timers). The Python SDK does not offer them yet; this work is ripe for
someone to pick up. I'd be more than happy to talk to anyone about design
here. (The Python SDK already has some notion of state in the side input
implementation, but how exactly timers will be plumbed back is still a bit
less clear.)

> Session windows are supported in the Python SDK, but will they (and all
> other windowing features) work equally well on the portable Flink runner?
> We know that custom window functions will need work..

The "standard" known window fn URNs (global, fixed, sliding, and sessions)
will work equally well on the portable Flink runner. WindowFns with URNs
that are not built into the runner core (e.g. "PickledPythonWindowFn") are
a bit harder, and there's still design work to do for how best to support
merging in that case.
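
To make that concrete, here is a minimal Python sketch of the kind of
pipeline this covers - session windows via the standard Sessions window
fn, which travels as a well-known URN rather than pickled Python code.
The pipeline itself is illustrative, not taken from any of the PRs above:

    # Hedged sketch: the standard Sessions window fn is carried by a
    # well-known URN, so a portable runner can understand it natively.
    import apache_beam as beam
    from apache_beam.transforms import window

    with beam.Pipeline() as p:
        _ = (p
             | 'Create' >> beam.Create([('alice', 1.0), ('alice', 20.0),
                                        ('bob', 300.0)])
             # Use the value as an event timestamp (illustrative only).
             | 'Stamp' >> beam.Map(
                 lambda kv: window.TimestampedValue(kv, kv[1]))
             | 'Sessionize' >> beam.WindowInto(window.Sessions(gap_size=60))
             | 'PairOne' >> beam.Map(lambda kv: (kv[0], 1))
             | 'CountPerKey' >> beam.CombinePerKey(sum)
             | 'Print' >> beam.Map(print))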

Windowing also impacts side inputs, and this should "just work" as long as
the runner can understand the window coder (currently we only have Global
and IntervalWindows) and probably wouldn't be too hard to augment with
length prefixing for arbitrary window types (given that the window mapping
happens in the SDK).

> BTW, can you clarify the dependency between streaming support (which I'm
> working on) and SDF? Does it refer to new connectors?

It's pretty straightforward to translate a bounded source into an impulse,
a DoFn that outputs splits, and then a DoFn that reads the splits, none of
which requires the S part of SDF (until one wants to do liquid sharding, of
course). We'll actually need the full splittable protocol over the Fn API
to wrap existing unbounded sources (optimistically hoping there's a clean
mapping) as (a series of) SDFs, or to write new ones.
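
For illustration, a rough Python sketch of that translation using the
public BoundedSource API - the helper names (SplitSource, ReadFromSplit,
read_bounded) are mine, not from any Beam module:

    # Hedged sketch of the translation described above:
    # Impulse -> DoFn that outputs splits -> DoFn that reads each split.
    import apache_beam as beam

    class SplitSource(beam.DoFn):
        """Splits a BoundedSource into independently readable bundles."""
        def __init__(self, source, desired_bundle_size):
            self._source = source
            self._desired_bundle_size = desired_bundle_size

        def process(self, unused_impulse):
            # BoundedSource.split yields SourceBundle objects.
            for bundle in self._source.split(self._desired_bundle_size):
                yield bundle

    class ReadFromSplit(beam.DoFn):
        """Reads one split to completion (no dynamic work rebalancing,
        hence no need for the S part of SDF)."""
        def process(self, bundle):
            tracker = bundle.source.get_range_tracker(
                bundle.start_position, bundle.stop_position)
            for record in bundle.source.read(tracker):
                yield record

    def read_bounded(p, source, desired_bundle_size=64 << 20):
        return (p
                | beam.Impulse()
                | beam.ParDo(SplitSource(source, desired_bundle_size))
                # Reshuffle lets the runner rebalance splits across workers.
                | beam.Reshuffle()
                | beam.ParDo(ReadFromSplit()))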

- Robert



Re: Current progress on Portable runners

2018-05-17 Thread Thomas Weise
Hi Eugene,

Thanks for putting this together, this is a very nice update and brings
much needed visibility to those hoping to make use of the portability
features or contribute to them.

Since the P1 (MVP) milestone is "wordcount" and some of the next things
listed are more contributor oriented, perhaps we can get more detail on
what functionality users can expect?

The next P2 milestone is basically everything and that is a lot. It might
actually help to break this down a bit more. A couple of things that I'm
specifically interested in for Python on Flink:

AFAIK state and timer support in Python are not being worked on yet. Is
anyone planning to, and any idea when the SDK and portable runner might
support it?

Session windows are supported in the Python SDK, but will they (and all
other windowing features) work equally well on the portable Flink runner?
We know that custom window functions will need work..

BTW, can you clarify the dependency between streaming support (which I'm
working on) and SDF? Does it refer to new connectors?

Thanks,
Thomas



Current progress on Portable runners

2018-05-17 Thread Eugene Kirpichov
Hi all,

A little over a month ago, a large group of Beam community members started
working on a prototype of a portable Flink runner - that is, a runner that
can execute Beam pipelines on Flink via the Portability API. The prototype
was developed in a separate branch and was successfully demonstrated at
Flink Forward, where it ran Python and Go pipelines in a limited setting.

Since then, a smaller group of people (Ankur Goenka, Axel Magnuson, Ben
Sidhom and myself) have been working on productionizing the prototype to
address its limitations and do things "the right way", preparing to reuse
this work for developing other portable runners (e.g. Spark). This involves
a surprising amount of work, since many important design and implementation
concerns could be ignored for the purposes of a prototype. I wanted to give
an update on where we stand now.

Our immediate milestone in sight is *Run Java and Python batch WordCount
examples against a distributed remote Flink cluster*. That involves a few
moving parts, roughly in order of appearance:

*Job submission:*
- The SDK is configured to use a "portable runner", whose responsibility is
to run the pipeline against a given JobService endpoint (a sketch from the
user's side follows this list).
- The portable runner converts the pipeline to a portable Pipeline proto
- The runner finds out which artifacts it needs to stage, and stages them
against an ArtifactStagingService
- A Flink-specific JobService receives the Pipeline proto, performs some
optimizations (e.g. fusion) and translates it to Flink datasets and
functions
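
To give a feel for the first two steps from the user's side, here is a
hedged sketch of configuring a Python pipeline to submit against a
JobService endpoint. The flag spellings (--runner=PortableRunner,
--job_endpoint) reflect the direction of this work and may differ from
what finally lands:

    # Hedged sketch of submitting a pipeline to a JobService endpoint.
    # Flag names are illustrative of the portable-runner direction.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        '--runner=PortableRunner',        # general-purpose portable runner
        '--job_endpoint=localhost:8099',  # Flink JobService (illustrative)
    ])

    with beam.Pipeline(options=options) as p:
        _ = (p
             | beam.Create(['to be or not to be'])
             | beam.FlatMap(str.split)
             | beam.Map(lambda word: (word, 1))
             | beam.CombinePerKey(sum)
             | beam.Map(print))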

*Job execution:*
- A Flink function executes a fused chain of Beam transforms (an
"executable stage") by converting the input and the stage to bundles and
executing them against an SDK harness
- The function starts the proper SDK harness, auxiliary services (e.g.
artifact retrieval, side input handling) and wires them together
- The function feeds the data to the harness and receives data back.
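
And on the execution side, a hedged sketch of how a user might select
which SDK harness container the runner starts, assuming Docker-based
environment options along the lines of --environment_type /
--environment_config (treat the exact names and the image tag as
illustrative):

    # Hedged sketch: choosing the Docker image for the SDK harness.
    # Option names and the image tag are illustrative assumptions.
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        '--runner=PortableRunner',
        '--job_endpoint=localhost:8099',
        '--environment_type=DOCKER',
        # Image containing the SDK harness entry point; the runner-side
        # function boots it and wires the control/data/artifact channels.
        '--environment_config=apachebeam/python3.7_sdk:latest',
    ])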

*And here is our status of implementation for these parts:* basically,
almost everything is either done or in review.

*Job submission:*
- General-purpose portable runner in the Python SDK: done; Java SDK: also
done
- Artifact staging from the Python SDK: in review (two PRs); in Java, it's
also done
- Flink JobService: in review
- Translation from a Pipeline proto to Flink datasets and functions: done
- ArtifactStagingService implementation that stages artifacts to a
location on a distributed filesystem: in development (design is clear)

*Job execution:*
- Flink function for executing via an SDK harness: done
- APIs for managing lifecycle of an SDK harness: done
- Specific implementation of those APIs using Docker: part done, part in
review
- ArtifactRetrievalService that retrieves artifacts from the location
where ArtifactStagingService staged them: in development.

We expect the in-review parts to be merged, and the in-development parts
to be completed, in the next 2-3 weeks. We will, of course, update the
community when this important milestone is reached.

*After that, the next milestones include:*
- Set up Java, Python and Go ValidatesRunner tests to run against the
portable Flink runner, and get them to pass
- Expand Python and Go to parity in terms of such test coverage
- Implement the portable Spark runner, with a similar lifecycle but reusing
almost all of the Flink work
- Add support for streaming to both (which requires SDF - that work is
progressing in parallel and by this point should be in a suitable place)

*For people who would like to get involved in this effort: *You can already
help out by improving ValidatesRunner test coverage in Python and Go. Java
has >300 such tests, while Python has only a handful. There'll be a large
amount of parallelizable work once we get the VR test suites running - stay
tuned. SDF+Portability is also expected to produce a lot of parallelizable
work up for grabs within several weeks.

Thanks!