Hey all,

I think that approach 2 is more idiomatic for container deployments where it 
can be cumbersome to manually map flink-conf.yaml contents to env vars [1]. The 
precedence order outlined by Till would also cover Steven's hierarchical 
overwrite requirement.

I'm really excited about this feature as it will make Flink deployments a lot 
more ergonomic. The implementation seems to be not too complicated (which makes 
we wonder why we didn't tackle this earlier or whether I'm missing something).

I'd also be happy to shepherd this contribution if there is consensus on the 
need for it and the approach. Does it make sense to formalize this decision a 
bit with a short FLIP?

– Ufuk

[1] In Ververica Platform, we support approach 1, because the Flink 
configuration is part of the specification for a single Deployment and it's 
minimally more convenient to have something like

flinkConfiguration:
  foo: ${BAR}

for us. I don't think this approach would feel natural when manually deploying 
Flink. There would be a clear migration path for our customers, so I'm not 
concerned about this too much.

On Tue, Jan 19, 2021, at 10:01 AM, Till Rohrmann wrote:
> Hi everyone,
> 
> Thanks for starting this discussion Ingo. I think being able to use env
> variables to change Flink's configuration will be a very useful feature.
> 
> Concerning the two approaches I would be in favour of the second approach
> ($FLINK_CONFIG_S3_ACCESS_KEY) because it does not require the user to
> prepare a special flink-conf.yaml where he inserts env variables for every
> config value he wants to configure. Since this is not required with the
> second approach, I think it is more general and easier to use. Also, the
> user does not have to remember a second set of names (env names) which he
> has to/can set.
> 
> For how to substitute the values, I think it should happen when we load the
> Flink configuration. First we read the file and then overwrite values
> specified via an env variable or dynamic properties in some defined order.
> For env.java.opts and other options which are used for starting the JVM we
> might need special handling in the bash scripts.
> 
> Cheers,
> Till
> 
> On Tue, Jan 19, 2021 at 9:46 AM Ingo Bürk <i...@ververica.com> wrote:
> 
> > Hi Yang,
> >
> > 1. As you said I think this doesn't affect Ververica Platform, really, so
> > I'm more than happy to hear and follow the thoughts of people more
> > experienced with Flink than me.
> > 2. I wasn't aware of env.java.opts, but that's definitely a candidate where
> > a user may want to "escape" it so it doesn't get substituted immediately, I
> > agree.
> >
> >
> > Regards
> > Ingo
> >
> > On Tue, Jan 19, 2021 at 4:47 AM Yang Wang <danrtsey...@gmail.com> wrote:
> >
> > > Hi Ingo,
> > >
> > > Thanks for your response.
> > >
> > > 1. Not distinguishing JM/TM is reasonable, but what about the client
> > side.
> > > For Yarn/K8s deployment,
> > > the local flink-conf.yaml will be shipped to JM/TM. So I am just confused
> > > about where should the environment
> > > variables be replaced? IIUC, it is not an issue for Ververica Platform
> > > since it is always done in the JM/TM side.
> > >
> > > 2. I believe we should support not do the substitution for specific key.
> > A
> > > typical use case is "env.java.opts". If the
> > > value contains environment variables, they are expected to be replaced
> > > exactly when the java command is executed,
> > > not after the java process is started. Maybe escaping with single quote
> > is
> > > enough.
> > >
> > > 3. The substitution only takes effects on the value makes sense to me.
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > Steven Wu <stevenz...@gmail.com> 于2021年1月19日周二 上午12:36写道:
> > >
> > > > Variable substitution (proposed here) is definitely useful.
> > > >
> > > > For us, hierarchical override is more useful.  E.g., we may have the
> > > > default value of "state.checkpoints.dir=path1" defined in
> > > flink-conf.yaml.
> > > > But maybe we want to override it to "state.checkpoints.dir=path2" via
> > > > environment variable in some scenarios. Otherwise, we have to define a
> > > > corresponding shell variable (like STATE_CHECKPOINTS_DIR) for the Flink
> > > > config, which is annoying.
> > > >
> > > > As Ingo pointed, it is also annoying to handle Java property key naming
> > > > convention (dots separated), as dots aren't allowed in shell env var
> > > naming
> > > > (All caps, separated with underscore). Shell will complain. We have to
> > > > bundle all env var overrides (k-v pairs) in a single property value
> > (JSON
> > > > and base64 encode) to avoid it.
> > > >
> > > > On Mon, Jan 18, 2021 at 8:15 AM Ingo Bürk <i...@ververica.com> wrote:
> > > >
> > > > > Hi Yang,
> > > > >
> > > > > thanks for your questions! I'm glad to see this feature is being
> > > received
> > > > > positively.
> > > > >
> > > > > ad 1) We don't distinguish JM/TM, and I can't think of a good reason
> > > why
> > > > a
> > > > > user would want to do so. I'm not very experienced with Flink,
> > however,
> > > > so
> > > > > please excuse me if I'm overlooking some obvious reason here. :-)
> > > > > ad 2) Admittedly I don't have a good overview on all the
> > configuration
> > > > > options that exist, but from those that I do know I can't imagine
> > > someone
> > > > > wanting to pass a value like "${MY_VAR}" verbatim. In Ververica
> > > Platform
> > > > as
> > > > > of now we ignore this problem. If, however, this needs to be
> > > addressed, a
> > > > > possible solution could be to allow escaping syntax such as
> > > "\${MY_VAR}".
> > > > >
> > > > > Another point to consider here is when exactly the substitution takes
> > > > > place: on the "raw" file, or on the parsed key / value separately,
> > and
> > > if
> > > > > so, should it support both key and value? My current thinking is that
> > > > > substituting only the value of the parsed entry should be sufficient.
> > > > >
> > > > >
> > > > > Regards
> > > > > Ingo
> > > > >
> > > > > On Mon, Jan 18, 2021 at 3:48 PM Yang Wang <danrtsey...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Thanks for kicking off the discussion.
> > > > > >
> > > > > > I think supporting environment variables rendering in the Flink
> > > > > > configuration yaml file is a good idea. Especially for
> > > > > > the Kubernetes environment since we are using the secret resource
> > to
> > > > > store
> > > > > > the authentication information.
> > > > > >
> > > > > > But I have some questions for how to do it?
> > > > > > 1. The environments in Flink configuration yaml will be replaced in
> > > > > client,
> > > > > > JobManager, TaskManager or all of them?
> > > > > > 2. If users do not want some config options to be replaced, how to
> > > > > > achieve that?
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > Khachatryan Roman <khachatryan.ro...@gmail.com> 于2021年1月18日周一
> > > > 下午8:55写道:
> > > > > >
> > > > > > > Hi Ingo,
> > > > > > >
> > > > > > > Thanks a lot for this proposal!
> > > > > > >
> > > > > > > We had a related discussion recently in the context of
> > FLINK-19520
> > > > > > > (randomizing tests configuration) [1].
> > > > > > > I believe other scenarios will benefit as well.
> > > > > > >
> > > > > > > For the end users, I think substitution in configuration files is
> > > > > > > preferable over parsing env vars in Flink code.
> > > > > > > And for cases without such a file, we could have a default one on
> > > the
> > > > > > > classpath with all substitutions defined (and then merge
> > everything
> > > > > from
> > > > > > > the user-supplied file).
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-19520
> > > > > > >
> > > > > > > Regards,
> > > > > > > Roman
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jan 18, 2021 at 11:11 AM Ingo Bürk <i...@ververica.com>
> > > > wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > in Ververica Platform we offer a feature to use environment
> > > > variables
> > > > > > in
> > > > > > > > the Flink configuration¹, e.g.
> > > > > > > >
> > > > > > > > ```
> > > > > > > > s3.access-key: ${S3_ACCESS_KEY}
> > > > > > > > ```
> > > > > > > >
> > > > > > > > We've been discussing internally whether contributing such a
> > > > feature
> > > > > to
> > > > > > > > Flink directly would make sense and wanted to start a
> > discussion
> > > on
> > > > > > this
> > > > > > > > topic.
> > > > > > > >
> > > > > > > > An alternative way to do so from the above would be parsing
> > those
> > > > > > > directly
> > > > > > > > based on their name, so instead of having it defined in the
> > Flink
> > > > > > > > configuration as above, it would get automatically set if
> > > something
> > > > > > like
> > > > > > > > $FLINK_CONFIG_S3_ACCESS_KEY was set in the environment. This is
> > > > > > somewhat
> > > > > > > > similar to what e.g. Spring does, and faces similar challenges
> > > > > (dealing
> > > > > > > > with "."s etc.)
> > > > > > > >
> > > > > > > > Although I view both of these approaches as mostly orthogonal,
> > > > > > supporting
> > > > > > > > both very likely wouldn't make sense, of course. So I was
> > > wondering
> > > > > > what
> > > > > > > > your opinion is in terms of whether the project would benefit
> > > from
> > > > > > > > environment variable support for the Flink configuration, and
> > > > whether
> > > > > > > there
> > > > > > > > are tendencies as to which approach to go with.
> > > > > > > >
> > > > > > > > ¹
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://docs.ververica.com/user_guide/application_operations/deployments/configure_flink.html#environment-variables
> > > > > > > >
> > > > > > > > Best regards
> > > > > > > > Ingo
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 

Reply via email to