Hi, sorry, I almost forgot, so just to update this thread: I now started a new thread for the actual FLIP:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-161-Configuration-through-environment-variables-td48094.html Ingo On Wed, Jan 20, 2021 at 11:10 AM Ingo Bürk <i...@ververica.com> wrote: > Hi Steven, > > understood, thanks for expanding on your point. I will prepare a FLIP now > going for the solution of S3_ACCESS_KEY style environment variables since > overall this fits Flink use-cases better. I think your point that users > have to know how to convert s3.access-key to S3_ACCESS_KEY is valid, but it > would follow a logical pattern also used by a big project like Spring. > > I'll update this thread with the FLIP number once I have written the first > draft. Thanks everyone! > > > Regards > Ingo > > On Tue, Jan 19, 2021 at 6:28 PM Steven Wu <stevenz...@gmail.com> wrote: > >> Ingo, >> >> regarding "state.checkpoints.dir: ${CHECKPOINTS_DIR:-path1}", it >> definitely >> can work. but now users need to know that we can use "CHECKPOINTS_DIR" env >> var to override "state.checkpoints.dir". That is the inconvenience that I >> am trying to avoid. "state.checkpoints.dir" is well documented in the >> Flink >> website. Now, we need to document "CHECKPOINTS_DIR" separately. >> >> Ideally, we want to just let the user define a env var like >> "state.checkpoints.dir=some/overriden/path". But it suffers the problem >> that dot chars are invalid characters for shell env var names. That is the >> dilemma. >> >> That is why we bundled all the non-conformning env var overrides (where >> variable names containing non-conforming chars) into a single base64 >> encoded string (like "NON_CONFORMING_OVERRIDES_BASE64=<base64 encoded json >> string>" in our deployment infrastructure and unpack them in the >> container >> startup. It is hacky. I am hoping that there is a more elegant solution. >> >> If the configuration key is conforming to shell standard (like >> S3_ACCESS_KEY), then we don't have a problem. but it will be deviating >> from >> the Flink config naming convention (Java property style, dot separated). >> >> Thanks, >> Steven >> >> >> On Tue, Jan 19, 2021 at 6:56 AM Till Rohrmann <trohrm...@apache.org> >> wrote: >> >> > I think a short FLIP would be awesome. >> > >> > I guess this feature hasn't been implemented yet because it has not been >> > implemented yet ;-) I agree that this feature will improve configuration >> > ergonomics big time :-) >> > >> > Cheers, >> > Till >> > >> > On Tue, Jan 19, 2021 at 12:28 PM Ufuk Celebi <u...@apache.org> wrote: >> > >> > > Hey all, >> > > >> > > I think that approach 2 is more idiomatic for container deployments >> where >> > > it can be cumbersome to manually map flink-conf.yaml contents to env >> vars >> > > [1]. The precedence order outlined by Till would also cover Steven's >> > > hierarchical overwrite requirement. >> > > >> > > I'm really excited about this feature as it will make Flink >> deployments a >> > > lot more ergonomic. The implementation seems to be not too complicated >> > > (which makes we wonder why we didn't tackle this earlier or whether >> I'm >> > > missing something). >> > > >> > > I'd also be happy to shepherd this contribution if there is consensus >> on >> > > the need for it and the approach. Does it make sense to formalize this >> > > decision a bit with a short FLIP? >> > > >> > > – Ufuk >> > > >> > > [1] In Ververica Platform, we support approach 1, because the Flink >> > > configuration is part of the specification for a single Deployment and >> > it's >> > > minimally more convenient to have something like >> > > >> > > flinkConfiguration: >> > > foo: ${BAR} >> > > >> > > for us. I don't think this approach would feel natural when manually >> > > deploying Flink. There would be a clear migration path for our >> customers, >> > > so I'm not concerned about this too much. >> > > >> > > On Tue, Jan 19, 2021, at 10:01 AM, Till Rohrmann wrote: >> > > > Hi everyone, >> > > > >> > > > Thanks for starting this discussion Ingo. I think being able to use >> env >> > > > variables to change Flink's configuration will be a very useful >> > feature. >> > > > >> > > > Concerning the two approaches I would be in favour of the second >> > approach >> > > > ($FLINK_CONFIG_S3_ACCESS_KEY) because it does not require the user >> to >> > > > prepare a special flink-conf.yaml where he inserts env variables for >> > > every >> > > > config value he wants to configure. Since this is not required with >> the >> > > > second approach, I think it is more general and easier to use. Also, >> > the >> > > > user does not have to remember a second set of names (env names) >> which >> > he >> > > > has to/can set. >> > > > >> > > > For how to substitute the values, I think it should happen when we >> load >> > > the >> > > > Flink configuration. First we read the file and then overwrite >> values >> > > > specified via an env variable or dynamic properties in some defined >> > > order. >> > > > For env.java.opts and other options which are used for starting the >> JVM >> > > we >> > > > might need special handling in the bash scripts. >> > > > >> > > > Cheers, >> > > > Till >> > > > >> > > > On Tue, Jan 19, 2021 at 9:46 AM Ingo Bürk <i...@ververica.com> >> wrote: >> > > > >> > > > > Hi Yang, >> > > > > >> > > > > 1. As you said I think this doesn't affect Ververica Platform, >> > really, >> > > so >> > > > > I'm more than happy to hear and follow the thoughts of people more >> > > > > experienced with Flink than me. >> > > > > 2. I wasn't aware of env.java.opts, but that's definitely a >> candidate >> > > where >> > > > > a user may want to "escape" it so it doesn't get substituted >> > > immediately, I >> > > > > agree. >> > > > > >> > > > > >> > > > > Regards >> > > > > Ingo >> > > > > >> > > > > On Tue, Jan 19, 2021 at 4:47 AM Yang Wang <danrtsey...@gmail.com> >> > > wrote: >> > > > > >> > > > > > Hi Ingo, >> > > > > > >> > > > > > Thanks for your response. >> > > > > > >> > > > > > 1. Not distinguishing JM/TM is reasonable, but what about the >> > client >> > > > > side. >> > > > > > For Yarn/K8s deployment, >> > > > > > the local flink-conf.yaml will be shipped to JM/TM. So I am just >> > > confused >> > > > > > about where should the environment >> > > > > > variables be replaced? IIUC, it is not an issue for Ververica >> > > Platform >> > > > > > since it is always done in the JM/TM side. >> > > > > > >> > > > > > 2. I believe we should support not do the substitution for >> specific >> > > key. >> > > > > A >> > > > > > typical use case is "env.java.opts". If the >> > > > > > value contains environment variables, they are expected to be >> > > replaced >> > > > > > exactly when the java command is executed, >> > > > > > not after the java process is started. Maybe escaping with >> single >> > > quote >> > > > > is >> > > > > > enough. >> > > > > > >> > > > > > 3. The substitution only takes effects on the value makes sense >> to >> > > me. >> > > > > > >> > > > > > >> > > > > > Best, >> > > > > > Yang >> > > > > > >> > > > > > Steven Wu <stevenz...@gmail.com> 于2021年1月19日周二 上午12:36写道: >> > > > > > >> > > > > > > Variable substitution (proposed here) is definitely useful. >> > > > > > > >> > > > > > > For us, hierarchical override is more useful. E.g., we may >> have >> > > the >> > > > > > > default value of "state.checkpoints.dir=path1" defined in >> > > > > > flink-conf.yaml. >> > > > > > > But maybe we want to override it to >> "state.checkpoints.dir=path2" >> > > via >> > > > > > > environment variable in some scenarios. Otherwise, we have to >> > > define a >> > > > > > > corresponding shell variable (like STATE_CHECKPOINTS_DIR) for >> the >> > > Flink >> > > > > > > config, which is annoying. >> > > > > > > >> > > > > > > As Ingo pointed, it is also annoying to handle Java property >> key >> > > naming >> > > > > > > convention (dots separated), as dots aren't allowed in shell >> env >> > > var >> > > > > > naming >> > > > > > > (All caps, separated with underscore). Shell will complain. We >> > > have to >> > > > > > > bundle all env var overrides (k-v pairs) in a single property >> > value >> > > > > (JSON >> > > > > > > and base64 encode) to avoid it. >> > > > > > > >> > > > > > > On Mon, Jan 18, 2021 at 8:15 AM Ingo Bürk <i...@ververica.com >> > >> > > wrote: >> > > > > > > >> > > > > > > > Hi Yang, >> > > > > > > > >> > > > > > > > thanks for your questions! I'm glad to see this feature is >> > being >> > > > > > received >> > > > > > > > positively. >> > > > > > > > >> > > > > > > > ad 1) We don't distinguish JM/TM, and I can't think of a >> good >> > > reason >> > > > > > why >> > > > > > > a >> > > > > > > > user would want to do so. I'm not very experienced with >> Flink, >> > > > > however, >> > > > > > > so >> > > > > > > > please excuse me if I'm overlooking some obvious reason >> here. >> > :-) >> > > > > > > > ad 2) Admittedly I don't have a good overview on all the >> > > > > configuration >> > > > > > > > options that exist, but from those that I do know I can't >> > imagine >> > > > > > someone >> > > > > > > > wanting to pass a value like "${MY_VAR}" verbatim. In >> Ververica >> > > > > > Platform >> > > > > > > as >> > > > > > > > of now we ignore this problem. If, however, this needs to be >> > > > > > addressed, a >> > > > > > > > possible solution could be to allow escaping syntax such as >> > > > > > "\${MY_VAR}". >> > > > > > > > >> > > > > > > > Another point to consider here is when exactly the >> substitution >> > > takes >> > > > > > > > place: on the "raw" file, or on the parsed key / value >> > > separately, >> > > > > and >> > > > > > if >> > > > > > > > so, should it support both key and value? My current >> thinking >> > is >> > > that >> > > > > > > > substituting only the value of the parsed entry should be >> > > sufficient. >> > > > > > > > >> > > > > > > > >> > > > > > > > Regards >> > > > > > > > Ingo >> > > > > > > > >> > > > > > > > On Mon, Jan 18, 2021 at 3:48 PM Yang Wang < >> > danrtsey...@gmail.com >> > > > >> > > > > > wrote: >> > > > > > > > >> > > > > > > > > Thanks for kicking off the discussion. >> > > > > > > > > >> > > > > > > > > I think supporting environment variables rendering in the >> > Flink >> > > > > > > > > configuration yaml file is a good idea. Especially for >> > > > > > > > > the Kubernetes environment since we are using the secret >> > > resource >> > > > > to >> > > > > > > > store >> > > > > > > > > the authentication information. >> > > > > > > > > >> > > > > > > > > But I have some questions for how to do it? >> > > > > > > > > 1. The environments in Flink configuration yaml will be >> > > replaced in >> > > > > > > > client, >> > > > > > > > > JobManager, TaskManager or all of them? >> > > > > > > > > 2. If users do not want some config options to be >> replaced, >> > > how to >> > > > > > > > > achieve that? >> > > > > > > > > >> > > > > > > > > Best, >> > > > > > > > > Yang >> > > > > > > > > >> > > > > > > > > Khachatryan Roman <khachatryan.ro...@gmail.com> >> > 于2021年1月18日周一 >> > > > > > > 下午8:55写道: >> > > > > > > > > >> > > > > > > > > > Hi Ingo, >> > > > > > > > > > >> > > > > > > > > > Thanks a lot for this proposal! >> > > > > > > > > > >> > > > > > > > > > We had a related discussion recently in the context of >> > > > > FLINK-19520 >> > > > > > > > > > (randomizing tests configuration) [1]. >> > > > > > > > > > I believe other scenarios will benefit as well. >> > > > > > > > > > >> > > > > > > > > > For the end users, I think substitution in configuration >> > > files is >> > > > > > > > > > preferable over parsing env vars in Flink code. >> > > > > > > > > > And for cases without such a file, we could have a >> default >> > > one on >> > > > > > the >> > > > > > > > > > classpath with all substitutions defined (and then merge >> > > > > everything >> > > > > > > > from >> > > > > > > > > > the user-supplied file). >> > > > > > > > > > >> > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-19520 >> > > > > > > > > > >> > > > > > > > > > Regards, >> > > > > > > > > > Roman >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > On Mon, Jan 18, 2021 at 11:11 AM Ingo Bürk < >> > > i...@ververica.com> >> > > > > > > wrote: >> > > > > > > > > > >> > > > > > > > > > > Hi everyone, >> > > > > > > > > > > >> > > > > > > > > > > in Ververica Platform we offer a feature to use >> > environment >> > > > > > > variables >> > > > > > > > > in >> > > > > > > > > > > the Flink configuration¹, e.g. >> > > > > > > > > > > >> > > > > > > > > > > ``` >> > > > > > > > > > > s3.access-key: ${S3_ACCESS_KEY} >> > > > > > > > > > > ``` >> > > > > > > > > > > >> > > > > > > > > > > We've been discussing internally whether contributing >> > such >> > > a >> > > > > > > feature >> > > > > > > > to >> > > > > > > > > > > Flink directly would make sense and wanted to start a >> > > > > discussion >> > > > > > on >> > > > > > > > > this >> > > > > > > > > > > topic. >> > > > > > > > > > > >> > > > > > > > > > > An alternative way to do so from the above would be >> > parsing >> > > > > those >> > > > > > > > > > directly >> > > > > > > > > > > based on their name, so instead of having it defined >> in >> > the >> > > > > Flink >> > > > > > > > > > > configuration as above, it would get automatically >> set if >> > > > > > something >> > > > > > > > > like >> > > > > > > > > > > $FLINK_CONFIG_S3_ACCESS_KEY was set in the >> environment. >> > > This is >> > > > > > > > > somewhat >> > > > > > > > > > > similar to what e.g. Spring does, and faces similar >> > > challenges >> > > > > > > > (dealing >> > > > > > > > > > > with "."s etc.) >> > > > > > > > > > > >> > > > > > > > > > > Although I view both of these approaches as mostly >> > > orthogonal, >> > > > > > > > > supporting >> > > > > > > > > > > both very likely wouldn't make sense, of course. So I >> was >> > > > > > wondering >> > > > > > > > > what >> > > > > > > > > > > your opinion is in terms of whether the project would >> > > benefit >> > > > > > from >> > > > > > > > > > > environment variable support for the Flink >> configuration, >> > > and >> > > > > > > whether >> > > > > > > > > > there >> > > > > > > > > > > are tendencies as to which approach to go with. >> > > > > > > > > > > >> > > > > > > > > > > ¹ >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > >> > >> https://docs.ververica.com/user_guide/application_operations/deployments/configure_flink.html#environment-variables >> > > > > > > > > > > >> > > > > > > > > > > Best regards >> > > > > > > > > > > Ingo >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> >