Thomas has a point on the PortableRunner name, I was super confused
because of the `PortableRunner` not being a runner, I don't know if
too late but maybe it is still worth to give it a better name.

On Thu, Apr 30, 2020 at 8:41 PM Thomas Weise <[email protected]> wrote:
>
> +1 for removing the default runner. It has always been the Beam user 
> expectation that a runner needs to be selected.
>
> "PortableRunner" isn't a runner (despite its name) - it's a proxy to a runner 
> that the user specifies via job_endpoint.
>
> Thanks for cleaning this up!
>
> On Thu, Apr 30, 2020 at 10:11 AM Kyle Weaver <[email protected]> wrote:
>>
>> I'll bite :) Thanks for the feedback everyone!
>>
>> On Thu, Apr 30, 2020 at 1:01 PM Robert Bradshaw <[email protected]> wrote:
>>>
>>> I filed https://issues.apache.org/jira/browse/BEAM-9860. Any takers?
>>>
>>> On Thu, Apr 30, 2020 at 5:49 AM Ismaël Mejía <[email protected]> wrote:
>>>>
>>>> +1 for A there are zero reasons to have a default runner set by
>>>> default, being explicit is better as Robert suggests and it resolves
>>>> the confusion that the user reported.
>>>>
>>>> On Wed, Apr 29, 2020 at 10:05 PM Robert Bradshaw <[email protected]> 
>>>> wrote:
>>>> >
>>>> > +1, I was actually thinking about this just the other day. 
>>>> > PortableRunner should require job_endpoint to be set, and we can have a 
>>>> > nice error message directing the explicit use of FlinkRunner for the old 
>>>> > behavior.
>>>> >
>>>> > On Wed, Apr 29, 2020 at 11:50 AM Kyle Weaver <[email protected]> wrote:
>>>> >>
>>>> >> > Could the error message suggest switching to FlinkRunner (and/or 
>>>> >> > other runners that start a job server for you)? Then it seems like 
>>>> >> > the breakage would only be a minor annoyance.
>>>> >>
>>>> >> Definitely.
>>>> >>
>>>> >> On Wed, Apr 29, 2020 at 2:49 PM Brian Hulette <[email protected]> 
>>>> >> wrote:
>>>> >>>
>>>> >>> Could the error message suggest switching to FlinkRunner (and/or other 
>>>> >>> runners that start a job server for you)? Then it seems like the 
>>>> >>> breakage would only be a minor annoyance.
>>>> >>>
>>>> >>> Brian
>>>> >>>
>>>> >>> On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <[email protected]> 
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> Hi all,
>>>> >>>>
>>>> >>>> Currently, when running a pipeline that has the options 
>>>> >>>> runner=PortableRunner and job_endpoint unset, the Python SDK spins up 
>>>> >>>> a Dockerized Flink job server [1]. This is problematic because the 
>>>> >>>> PortableRunner can be used by any portable runner. So for example, a 
>>>> >>>> Spark runner user was recently baffled when their job ran 
>>>> >>>> successfully but printed a bunch of Flink log messages.
>>>> >>>>
>>>> >>>> There are not too many uses of this default behavior to my knowledge, 
>>>> >>>> at least within Beam itself. The only example I could find was in the 
>>>> >>>> portableWordCount tests, which is mostly the same as 
>>>> >>>> portableWordCountFlinkRunner tests [2]. The default behavior is 
>>>> >>>> entirely superseded by the FlinkRunner class, which provides better 
>>>> >>>> encapsulation.
>>>> >>>>
>>>> >>>> I also noticed that DockerizedJobServer is only used by [3]. In 
>>>> >>>> FlinkRunner, we pull the job server from Maven if necessary and call 
>>>> >>>> Java directly. In general, I think there are already quite enough 
>>>> >>>> knobs in the portability framework, so we should remove it unless 
>>>> >>>> there is reason to prefer running the job server with Docker instead 
>>>> >>>> of calling Java directly.
>>>> >>>>
>>>> >>>> There are a couple options:
>>>> >>>>
>>>> >>>> A) Remove the default behavior and require job_endpoint to always be 
>>>> >>>> set when using PortableRunner. This would be a breaking change.
>>>> >>>> B) Keep the current behavior, but warn when the user sets 
>>>> >>>> runner=PortableRunner without job_endpoint. This is easy to miss, but 
>>>> >>>> it's better than nothing.
>>>> >>>>
>>>> >>>> What do you think?
>>>> >>>>
>>>> >>>> [1] 
>>>> >>>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
>>>> >>>> [2] 
>>>> >>>> https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
>>>> >>>> [3] 
>>>> >>>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163

Reply via email to