Hi Steven,

understood, thanks for expanding on your point. I will prepare a FLIP now
going for the solution of S3_ACCESS_KEY style environment variables since
overall this fits Flink use-cases better. I think your point that users
have to know how to convert s3.access-key to S3_ACCESS_KEY is valid, but it
would follow a logical pattern also used by a big project like Spring.

I'll update this thread with the FLIP number once I have written the first
draft. Thanks everyone!


Regards
Ingo

On Tue, Jan 19, 2021 at 6:28 PM Steven Wu <stevenz...@gmail.com> wrote:

> Ingo,
>
> regarding "state.checkpoints.dir: ${CHECKPOINTS_DIR:-path1}", it definitely
> can work. but now users need to know that we can use "CHECKPOINTS_DIR" env
> var to override "state.checkpoints.dir". That is the inconvenience that I
> am trying to avoid. "state.checkpoints.dir" is well documented in the Flink
> website. Now, we need to document "CHECKPOINTS_DIR" separately.
>
> Ideally, we want to just let the user define a env var like
> "state.checkpoints.dir=some/overriden/path". But it suffers the problem
> that dot chars are invalid characters for shell env var names. That is the
> dilemma.
>
> That is why we bundled all the non-conformning env var overrides (where
> variable names containing non-conforming chars) into a single base64
> encoded string (like "NON_CONFORMING_OVERRIDES_BASE64=<base64 encoded json
> string>"  in our deployment infrastructure and unpack them in the container
> startup. It is hacky. I am hoping that there is a more elegant solution.
>
> If the configuration key is conforming to shell standard (like
> S3_ACCESS_KEY), then we don't have a problem. but it will be deviating from
> the Flink config naming convention (Java property style, dot separated).
>
> Thanks,
> Steven
>
>
> On Tue, Jan 19, 2021 at 6:56 AM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > I think a short FLIP would be awesome.
> >
> > I guess this feature hasn't been implemented yet because it has not been
> > implemented yet ;-) I agree that this feature will improve configuration
> > ergonomics big time :-)
> >
> > Cheers,
> > Till
> >
> > On Tue, Jan 19, 2021 at 12:28 PM Ufuk Celebi <u...@apache.org> wrote:
> >
> > > Hey all,
> > >
> > > I think that approach 2 is more idiomatic for container deployments
> where
> > > it can be cumbersome to manually map flink-conf.yaml contents to env
> vars
> > > [1]. The precedence order outlined by Till would also cover Steven's
> > > hierarchical overwrite requirement.
> > >
> > > I'm really excited about this feature as it will make Flink
> deployments a
> > > lot more ergonomic. The implementation seems to be not too complicated
> > > (which makes we wonder why we didn't tackle this earlier or whether I'm
> > > missing something).
> > >
> > > I'd also be happy to shepherd this contribution if there is consensus
> on
> > > the need for it and the approach. Does it make sense to formalize this
> > > decision a bit with a short FLIP?
> > >
> > > – Ufuk
> > >
> > > [1] In Ververica Platform, we support approach 1, because the Flink
> > > configuration is part of the specification for a single Deployment and
> > it's
> > > minimally more convenient to have something like
> > >
> > > flinkConfiguration:
> > >   foo: ${BAR}
> > >
> > > for us. I don't think this approach would feel natural when manually
> > > deploying Flink. There would be a clear migration path for our
> customers,
> > > so I'm not concerned about this too much.
> > >
> > > On Tue, Jan 19, 2021, at 10:01 AM, Till Rohrmann wrote:
> > > > Hi everyone,
> > > >
> > > > Thanks for starting this discussion Ingo. I think being able to use
> env
> > > > variables to change Flink's configuration will be a very useful
> > feature.
> > > >
> > > > Concerning the two approaches I would be in favour of the second
> > approach
> > > > ($FLINK_CONFIG_S3_ACCESS_KEY) because it does not require the user to
> > > > prepare a special flink-conf.yaml where he inserts env variables for
> > > every
> > > > config value he wants to configure. Since this is not required with
> the
> > > > second approach, I think it is more general and easier to use. Also,
> > the
> > > > user does not have to remember a second set of names (env names)
> which
> > he
> > > > has to/can set.
> > > >
> > > > For how to substitute the values, I think it should happen when we
> load
> > > the
> > > > Flink configuration. First we read the file and then overwrite values
> > > > specified via an env variable or dynamic properties in some defined
> > > order.
> > > > For env.java.opts and other options which are used for starting the
> JVM
> > > we
> > > > might need special handling in the bash scripts.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Jan 19, 2021 at 9:46 AM Ingo Bürk <i...@ververica.com>
> wrote:
> > > >
> > > > > Hi Yang,
> > > > >
> > > > > 1. As you said I think this doesn't affect Ververica Platform,
> > really,
> > > so
> > > > > I'm more than happy to hear and follow the thoughts of people more
> > > > > experienced with Flink than me.
> > > > > 2. I wasn't aware of env.java.opts, but that's definitely a
> candidate
> > > where
> > > > > a user may want to "escape" it so it doesn't get substituted
> > > immediately, I
> > > > > agree.
> > > > >
> > > > >
> > > > > Regards
> > > > > Ingo
> > > > >
> > > > > On Tue, Jan 19, 2021 at 4:47 AM Yang Wang <danrtsey...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Ingo,
> > > > > >
> > > > > > Thanks for your response.
> > > > > >
> > > > > > 1. Not distinguishing JM/TM is reasonable, but what about the
> > client
> > > > > side.
> > > > > > For Yarn/K8s deployment,
> > > > > > the local flink-conf.yaml will be shipped to JM/TM. So I am just
> > > confused
> > > > > > about where should the environment
> > > > > > variables be replaced? IIUC, it is not an issue for Ververica
> > > Platform
> > > > > > since it is always done in the JM/TM side.
> > > > > >
> > > > > > 2. I believe we should support not do the substitution for
> specific
> > > key.
> > > > > A
> > > > > > typical use case is "env.java.opts". If the
> > > > > > value contains environment variables, they are expected to be
> > > replaced
> > > > > > exactly when the java command is executed,
> > > > > > not after the java process is started. Maybe escaping with single
> > > quote
> > > > > is
> > > > > > enough.
> > > > > >
> > > > > > 3. The substitution only takes effects on the value makes sense
> to
> > > me.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > Steven Wu <stevenz...@gmail.com> 于2021年1月19日周二 上午12:36写道:
> > > > > >
> > > > > > > Variable substitution (proposed here) is definitely useful.
> > > > > > >
> > > > > > > For us, hierarchical override is more useful.  E.g., we may
> have
> > > the
> > > > > > > default value of "state.checkpoints.dir=path1" defined in
> > > > > > flink-conf.yaml.
> > > > > > > But maybe we want to override it to
> "state.checkpoints.dir=path2"
> > > via
> > > > > > > environment variable in some scenarios. Otherwise, we have to
> > > define a
> > > > > > > corresponding shell variable (like STATE_CHECKPOINTS_DIR) for
> the
> > > Flink
> > > > > > > config, which is annoying.
> > > > > > >
> > > > > > > As Ingo pointed, it is also annoying to handle Java property
> key
> > > naming
> > > > > > > convention (dots separated), as dots aren't allowed in shell
> env
> > > var
> > > > > > naming
> > > > > > > (All caps, separated with underscore). Shell will complain. We
> > > have to
> > > > > > > bundle all env var overrides (k-v pairs) in a single property
> > value
> > > > > (JSON
> > > > > > > and base64 encode) to avoid it.
> > > > > > >
> > > > > > > On Mon, Jan 18, 2021 at 8:15 AM Ingo Bürk <i...@ververica.com>
> > > wrote:
> > > > > > >
> > > > > > > > Hi Yang,
> > > > > > > >
> > > > > > > > thanks for your questions! I'm glad to see this feature is
> > being
> > > > > > received
> > > > > > > > positively.
> > > > > > > >
> > > > > > > > ad 1) We don't distinguish JM/TM, and I can't think of a good
> > > reason
> > > > > > why
> > > > > > > a
> > > > > > > > user would want to do so. I'm not very experienced with
> Flink,
> > > > > however,
> > > > > > > so
> > > > > > > > please excuse me if I'm overlooking some obvious reason here.
> > :-)
> > > > > > > > ad 2) Admittedly I don't have a good overview on all the
> > > > > configuration
> > > > > > > > options that exist, but from those that I do know I can't
> > imagine
> > > > > > someone
> > > > > > > > wanting to pass a value like "${MY_VAR}" verbatim. In
> Ververica
> > > > > > Platform
> > > > > > > as
> > > > > > > > of now we ignore this problem. If, however, this needs to be
> > > > > > addressed, a
> > > > > > > > possible solution could be to allow escaping syntax such as
> > > > > > "\${MY_VAR}".
> > > > > > > >
> > > > > > > > Another point to consider here is when exactly the
> substitution
> > > takes
> > > > > > > > place: on the "raw" file, or on the parsed key / value
> > > separately,
> > > > > and
> > > > > > if
> > > > > > > > so, should it support both key and value? My current thinking
> > is
> > > that
> > > > > > > > substituting only the value of the parsed entry should be
> > > sufficient.
> > > > > > > >
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > Ingo
> > > > > > > >
> > > > > > > > On Mon, Jan 18, 2021 at 3:48 PM Yang Wang <
> > danrtsey...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for kicking off the discussion.
> > > > > > > > >
> > > > > > > > > I think supporting environment variables rendering in the
> > Flink
> > > > > > > > > configuration yaml file is a good idea. Especially for
> > > > > > > > > the Kubernetes environment since we are using the secret
> > > resource
> > > > > to
> > > > > > > > store
> > > > > > > > > the authentication information.
> > > > > > > > >
> > > > > > > > > But I have some questions for how to do it?
> > > > > > > > > 1. The environments in Flink configuration yaml will be
> > > replaced in
> > > > > > > > client,
> > > > > > > > > JobManager, TaskManager or all of them?
> > > > > > > > > 2. If users do not want some config options to be replaced,
> > > how to
> > > > > > > > > achieve that?
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yang
> > > > > > > > >
> > > > > > > > > Khachatryan Roman <khachatryan.ro...@gmail.com>
> > 于2021年1月18日周一
> > > > > > > 下午8:55写道:
> > > > > > > > >
> > > > > > > > > > Hi Ingo,
> > > > > > > > > >
> > > > > > > > > > Thanks a lot for this proposal!
> > > > > > > > > >
> > > > > > > > > > We had a related discussion recently in the context of
> > > > > FLINK-19520
> > > > > > > > > > (randomizing tests configuration) [1].
> > > > > > > > > > I believe other scenarios will benefit as well.
> > > > > > > > > >
> > > > > > > > > > For the end users, I think substitution in configuration
> > > files is
> > > > > > > > > > preferable over parsing env vars in Flink code.
> > > > > > > > > > And for cases without such a file, we could have a
> default
> > > one on
> > > > > > the
> > > > > > > > > > classpath with all substitutions defined (and then merge
> > > > > everything
> > > > > > > > from
> > > > > > > > > > the user-supplied file).
> > > > > > > > > >
> > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-19520
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Roman
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Jan 18, 2021 at 11:11 AM Ingo Bürk <
> > > i...@ververica.com>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi everyone,
> > > > > > > > > > >
> > > > > > > > > > > in Ververica Platform we offer a feature to use
> > environment
> > > > > > > variables
> > > > > > > > > in
> > > > > > > > > > > the Flink configuration¹, e.g.
> > > > > > > > > > >
> > > > > > > > > > > ```
> > > > > > > > > > > s3.access-key: ${S3_ACCESS_KEY}
> > > > > > > > > > > ```
> > > > > > > > > > >
> > > > > > > > > > > We've been discussing internally whether contributing
> > such
> > > a
> > > > > > > feature
> > > > > > > > to
> > > > > > > > > > > Flink directly would make sense and wanted to start a
> > > > > discussion
> > > > > > on
> > > > > > > > > this
> > > > > > > > > > > topic.
> > > > > > > > > > >
> > > > > > > > > > > An alternative way to do so from the above would be
> > parsing
> > > > > those
> > > > > > > > > > directly
> > > > > > > > > > > based on their name, so instead of having it defined in
> > the
> > > > > Flink
> > > > > > > > > > > configuration as above, it would get automatically set
> if
> > > > > > something
> > > > > > > > > like
> > > > > > > > > > > $FLINK_CONFIG_S3_ACCESS_KEY was set in the environment.
> > > This is
> > > > > > > > > somewhat
> > > > > > > > > > > similar to what e.g. Spring does, and faces similar
> > > challenges
> > > > > > > > (dealing
> > > > > > > > > > > with "."s etc.)
> > > > > > > > > > >
> > > > > > > > > > > Although I view both of these approaches as mostly
> > > orthogonal,
> > > > > > > > > supporting
> > > > > > > > > > > both very likely wouldn't make sense, of course. So I
> was
> > > > > > wondering
> > > > > > > > > what
> > > > > > > > > > > your opinion is in terms of whether the project would
> > > benefit
> > > > > > from
> > > > > > > > > > > environment variable support for the Flink
> configuration,
> > > and
> > > > > > > whether
> > > > > > > > > > there
> > > > > > > > > > > are tendencies as to which approach to go with.
> > > > > > > > > > >
> > > > > > > > > > > ¹
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
> https://docs.ververica.com/user_guide/application_operations/deployments/configure_flink.html#environment-variables
> > > > > > > > > > >
> > > > > > > > > > > Best regards
> > > > > > > > > > > Ingo
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to