Good questions, Anton. I can't give *definitive* answers to any of these,
but I can at least explain how I've been interpreting the move to the
Python ULR.

>  - if the focus is on python runner for portability efforts, how does java
> SDK (and other languages) tie into this? E.g. how do we run, test, measure,
> and develop things (pipelines, aspects of the SDK, runner);


You should be able to run anything that worked with the Java ULR on the
Python one. Thanks to portability, the runner and the SDK can be completely
independent. For example, when I was working on the Java ULR I got it
running the Python validatesRunner tests that are currently used to test
the Python ULR, and the reverse should hold true. I don't want to go too
deep into how it and other local portable runners are used, but the short
version is that you start the runner as a separate process on your
machine and then specify the runner you're using and the port it's on in
your Pipeline Options.
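
To make that a bit more concrete, here is a rough sketch of what "indicate
the runner and the port" could look like from the Python SDK side. The
helper name, host, and port below are placeholders I made up for
illustration, and I'm assuming the `--runner=PortableRunner` and
`--job_endpoint` flags as the Python SDK spells them; other SDKs spell
their options differently, so treat this as a sketch rather than the
definitive setup.

```python
# Sketch only: portable_runner_args, the host, and the port are
# hypothetical values chosen for illustration.
def portable_runner_args(host="localhost", port=8099):
    """Build pipeline-option flags pointing an SDK at a runner that was
    started separately and is listening on host:port."""
    return [
        "--runner=PortableRunner",        # submit through the portability API
        f"--job_endpoint={host}:{port}",  # where the separate runner process listens
    ]


if __name__ == "__main__":
    # Print the flags so they can be copied onto a pipeline's command line.
    print(" ".join(portable_runner_args()))
```

In the Python SDK a list like this would typically be handed to
`PipelineOptions(...)` when constructing the pipeline, after the runner
process has already been started on that port.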

The main obstacle I see is that recommending a Python runner to people
running Java pipelines is counterintuitive. It would require users to have
Python installed on their machine just to test their Java code, which is a
difficult situation to explain.

>  - what's our approach to developing new features, should we make sure
> python runner supports them as early as possible (e.g. schemas and SQL)?
>

That was the original hope with the Java ULR: that it would be a good place
to start implementing and iterating on new features without having to
implement them in a more complex runner. Of course, we never actually
reached that goal, but we might be able to with the Python ULR since it's
so much further along in development.

>  - java DirectRunner is still there:
>     - it is still the primary tool for java SDK development purposes, and
> as Kenn mentioned in the linked threads it adds value by making sure users
> don't rely on implementation details of specific runners. Do we have a
> similar story for portable scenarios?
>

I think a long-term goal when it comes to portable runners is that we only
have one local runner, in one language, that all developers use across
multiple SDKs. In that sense, yes, the Python ULR would have a similar
story for all SDKs, but only for portable pipelines.

But we've had differing ideas about this and how far it should go. For
example, is this runner supposed to be good for debugging, or just for
running already validated pipelines? Do we still want non-portable local
runners for each SDK for performance or debugging reasons? Questions like
that haven't really been answered. I think one of the threads I linked to
in the OP has some discussion about this, if you want to take a look.

>     - I assume that extra validations in the DirectRunner have impact on
> performance in various ways (potentially non-deterministic). While this
> doesn't matter in some cases, it might do in others. Having a local runner
> that is (better) optimized for execution would probably make more sense for
> perf measurements, integration tests, and maybe even local production jobs.
> Is this something potentially worth looking into?
>

Basically what I mentioned above: there are no specific plans, so it's
mainly something that's up for community discussion.

My personal opinion is that it's worth looking into, but I think a basic
implementation of portable features should come first. Once portability
has reached feature parity with non-portable pipelines, we can start
thinking about having runners with more niche uses.

On Fri, Apr 26, 2019 at 9:54 AM Anton Kedin <ke...@google.com> wrote:

> If there is no plans to invest in ULR then it makes sense to remove it.
>
> Going forward, however, I think we should try to document the higher level
> approach we're taking with runners (and portability) now that we have
> something working and can reflect on it. For example, couple of things that
> are not 100% clear to me:
>  - if the focus is on python runner for portability efforts, how does java
> SDK (and other languages) tie into this? E.g. how do we run, test, measure,
> and develop things (pipelines, aspects of the SDK, runner);
>  - what's our approach to developing new features, should we make sure
> python runner supports them as early as possible (e.g. schemas and SQL)?
>  - java DirectRunner is still there:
>     - it is still the primary tool for java SDK development purposes, and
> as Kenn mentioned in the linked threads it adds value by making sure users
> don't rely on implementation details of specific runners. Do we have a
> similar story for portable scenarios?
>     - I assume that extra validations in the DirectRunner have impact on
> performance in various ways (potentially non-deterministic). While this
> doesn't matter in some cases, it might do in others. Having a local runner
> that is (better) optimized for execution would probably make more sense for
> perf measurements, integration tests, and maybe even local production jobs.
> Is this something potentially worth looking into?
>
> Regards,
> Anton
>
>
> On Fri, Apr 26, 2019 at 4:41 AM Maximilian Michels <m...@apache.org> wrote:
>
>> Thanks for following up with this. I have mixed feelings to see the
>> portable Java DirectRunner go, but I'm in favor of this change because
>> it removes a lot of code that we do not really make use of.
>>
>> -Max
>>
>> On 26.04.19 02:58, Kenneth Knowles wrote:
>> > Thanks for providing all this background on the PR. It is very easy to
>> > see where it came from. Definitely nice to have less code and fewer
>> > things that can break. Perhaps lazy consensus is enough.
>> >
>> > Kenn
>> >
>> > On Thu, Apr 25, 2019 at 4:01 PM Daniel Oliveira <danolive...@google.com
>> > <mailto:danolive...@google.com>> wrote:
>> >
>> >     Hey everyone,
>> >
>> >     I made a preliminary PR for removing all the Java Reference Runner
>> >     code (PR-8380 <https://github.com/apache/beam/pull/8380>) since I
>> >     wanted to see if it could be done easily. It seems to be working
>> >     fine, so I wanted to open up this discussion to make sure people are
>> >     still in agreement on getting rid of this code and that people don't
>> >     have any concerns.
>> >
>> >     For those who need additional context about this, this previous
>> >     thread
>> >     <
>> https://lists.apache.org/thread.html/b235f8ee55a737ea399756edd80b1218ed34d3439f7b0ed59bfa8e40@%3Cdev.beam.apache.org%3E
>> >
>> >     is where we discussed deprecating the Java Reference Runner (in some
>> >     places it's called the ULR or Universal Local Runner, but it's the
>> >     same thing). Then there's this thread
>> >     <
>> https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E
>> >
>> >     where we discussed removing the code from the repo since it's been
>> >     deprecated.
>> >
>> >     If no one has any objections to trying to remove the code I'll have
>> >     someone review the PR I wrote and start a vote to have it merged.
>> >
>> >     Thanks,
>> >     Daniel Oliveira
>> >
>>
>
