> I'd like to see this issue resolved before 2.17 as changing the public API once it's released will be harder.
+1. In particular, I misunderstood that [auto] is not supported by `FlinkUberJarJobServer`. Since [auto] is now the default, it's broken for Python 3.6+. requests.exceptions.InvalidURL: Failed to parse: [auto]/v1/config We definitely should fix that, if nothing else. > One concern with this is that just supplying host:port is the existing > behavior, so we can't start requiring the http://. The user shouldn't have to specify a protocol for Python, I think it's preferable and reasonable to handle that for them in order to maintain existing behavior and align with Java SDK. > 2. Deprecate the "[auto]" and "[local]" values. It should be sufficient > to have either a non-empty address string or an empty one. The empty > string would either mean local execution or, in the context of the Flink > CLI tool, loading the master address from the config. The non-empty > string would be interpreted as a cluster address. Looks like we also have a [collection] configuration value [1]. If we're using [auto] as the default, I don't think this really makes so much of a difference (as long as we're supporting and documenting these properly, of course). I'm not sure there's a compelling reason to change this? > always run locally (the least surprising to me I agree a local cluster should remain the default, whether that is achieved through [local] or [auto] or some new mechanism such as the above. [1] https://github.com/apache/beam/blob/2f1b56ccc506054e40afe4793a8b556e872e1865/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java#L80 On Mon, Oct 28, 2019 at 6:35 AM Maximilian Michels <m...@apache.org> wrote: > Hi, > > Robert and Kyle have been doing great work to simplify submitting > portable pipelines with the Flink Runner. Part of this is having a > Python "FlinkRunner" which handles bringing up a Beam job server and > submitting the pipeline directly via the Flink REST API. One building > block is the creation of "executable Jars" which contain the > materialized / translated Flink pipeline and do not require the Beam job > server or the Python driver anymore. > > While unifying a newly introduced option "flink_master_url" with the > pre-existing "flink_master" [1][2], some questions came up about Flink's > execution modes. (The two options are meant to do the same thing: > provide the address of the Flink master to hand-over the translated > pipeline.) > > Historically, Flink had a proprietary protocol for submitting pipelines, > running on port 9091. This has since been replaced with a REST protocol > at port 8081. To this date, this has implications how you submit > programs, e.g. the Flink client libraries expects the address to be of > form "host:port", without a protocol scheme. On the other hand, external > Rest libraries typically expect a protocol scheme. > > But this is only half of the fun. There are also special addresses for > "flink_master" that influence submission of the pipeline. If you specify > "[local]" as the address, the pipeline won't be submitted but executed > in a local in-process Flink cluster. If you specify "[auto]" and you use > the CLI tool that comes bundled with Flink, then the master address will > be loaded from the Flink config, including any configuration like SSL. > If none is found, then it falls back to "[local]". > > This is a bit odd, and after a discussion with Robert and Thomas in [1], > we figured that this needs to be changed: > > 1. Make the master address a URL. Add "http://" to "flink_master" in > Python if no scheme is specified. Similarly, remove any "http://" in > Java, since the Java rest client does not expect a scheme. In case of > "http_s_://", we have a special treatment to load the SSL settings from > the Flink config. > > 2. Deprecate the "[auto]" and "[local]" values. It should be sufficient > to have either a non-empty address string or an empty one. The empty > string would either mean local execution or, in the context of the Flink > CLI tool, loading the master address from the config. The non-empty > string would be interpreted as a cluster address. > > > Any opinions on this? > > > Thanks, > Max > > > [1] https://github.com/apache/beam/pull/9803 > [2] https://github.com/apache/beam/pull/9844 >