> I'd like to see this issue resolved before 2.17 as changing the public
API once it's released will be harder.

+1. In particular, I misunderstood that [auto] is not supported by
`FlinkUberJarJobServer`. Since [auto] is now the default, it's broken for
Python 3.6+.

requests.exceptions.InvalidURL: Failed to parse: [auto]/v1/config

We definitely should fix that, if nothing else.

> One concern with this is that just supplying host:port is the existing
> behavior, so we can't start requiring the http://.

The user shouldn't have to specify a protocol for Python, I think it's
preferable and reasonable to handle that for them in order to maintain
existing behavior and align with Java SDK.

> 2. Deprecate the "[auto]" and "[local]" values. It should be sufficient
> to have either a non-empty address string or an empty one. The empty
> string would either mean local execution or, in the context of the Flink
> CLI tool, loading the master address from the config. The non-empty
> string would be interpreted as a cluster address.

Looks like we also have a [collection] configuration value [1].

If we're using [auto] as the default, I don't think this really makes so
much of a difference (as long as we're supporting and documenting these
properly, of course). I'm not sure there's a compelling reason to change
this?

> always run locally (the least surprising to me

I agree a local cluster should remain the default, whether that is achieved
through [local] or [auto] or some new mechanism such as the above.

[1]
https://github.com/apache/beam/blob/2f1b56ccc506054e40afe4793a8b556e872e1865/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java#L80

On Mon, Oct 28, 2019 at 6:35 AM Maximilian Michels <m...@apache.org> wrote:

> Hi,
>
> Robert and Kyle have been doing great work to simplify submitting
> portable pipelines with the Flink Runner. Part of this is having a
> Python "FlinkRunner" which handles bringing up a Beam job server and
> submitting the pipeline directly via the Flink REST API. One building
> block is the creation of "executable Jars" which contain the
> materialized / translated Flink pipeline and do not require the Beam job
> server or the Python driver anymore.
>
> While unifying a newly introduced option "flink_master_url" with the
> pre-existing "flink_master" [1][2], some questions came up about Flink's
> execution modes. (The two options are meant to do the same thing:
> provide the address of the Flink master to hand-over the translated
> pipeline.)
>
> Historically, Flink had a proprietary protocol for submitting pipelines,
> running on port 9091. This has since been replaced with a REST protocol
> at port 8081. To this date, this has implications how you submit
> programs, e.g. the Flink client libraries expects the address to be of
> form "host:port", without a protocol scheme. On the other hand, external
> Rest libraries typically expect a protocol scheme.
>
> But this is only half of the fun. There are also special addresses for
> "flink_master" that influence submission of the pipeline. If you specify
> "[local]" as the address, the pipeline won't be submitted but executed
> in a local in-process Flink cluster. If you specify "[auto]" and you use
> the CLI tool that comes bundled with Flink, then the master address will
> be loaded from the Flink config, including any configuration like SSL.
> If none is found, then it falls back to "[local]".
>
> This is a bit odd, and after a discussion with Robert and Thomas in [1],
> we figured that this needs to be changed:
>
> 1. Make the master address a URL. Add "http://"; to "flink_master" in
> Python if no scheme is specified. Similarly, remove any "http://"; in
> Java, since the Java rest client does not expect a scheme. In case of
> "http_s_://", we have a special treatment to load the SSL settings from
> the Flink config.
>
> 2. Deprecate the "[auto]" and "[local]" values. It should be sufficient
> to have either a non-empty address string or an empty one. The empty
> string would either mean local execution or, in the context of the Flink
> CLI tool, loading the master address from the config. The non-empty
> string would be interpreted as a cluster address.
>
>
> Any opinions on this?
>
>
> Thanks,
> Max
>
>
> [1] https://github.com/apache/beam/pull/9803
> [2] https://github.com/apache/beam/pull/9844
>

Reply via email to