Thanks for the further feedback, Thomas and Yangze.

> A generic, dynamic configuration mechanism based on environment variables
is essential and it is already supported via envsubst and an environment
variable that can supply a configuration fragment

True, we already have this. As I understand it, this was introduced for
flexibility: to template a custom flink-conf.yaml with env vars, put it into
FLINK_PROPERTIES and merge it with the default one.
Could we achieve the same with dynamic properties (e.g. -Drpc.port=1234),
passed as image args when running it, instead of FLINK_PROPERTIES?
They could also be parametrised with env vars. This would require
jobmanager.sh to properly propagate them to
the StandaloneSessionClusterEntrypoint, though:
https://github.com/docker-flink/docker-flink/pull/82#issuecomment-525285552
cc @Till
This would provide a unified configuration approach.
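To make the idea concrete, here is a minimal sketch of how an entry point could split the running mode off from dynamic properties and forward the latter. This is not the current docker-entrypoint.sh; the structure and the sample option keys are assumptions for illustration only:

```shell
# Hedged sketch: first arg is the running mode, any -D... args are
# collected as dynamic properties to forward to the Flink process.
parse_args() {
  MODE="$1"
  shift
  DYNAMIC_PROPS=""
  for arg in "$@"; do
    case "$arg" in
      -D*) DYNAMIC_PROPS="$DYNAMIC_PROPS $arg" ;;
    esac
  done
}

# e.g. docker run flink jobmanager -Drpc.port=1234
parse_args jobmanager -Drpc.port=1234
echo "mode=$MODE props=$DYNAMIC_PROPS"
```

The env var parametrisation mentioned above would then just be `docker run -e RPC_PORT=1234 flink jobmanager -Drpc.port=${RPC_PORT}` on the user side, with no extra templating machinery in the image.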

> On the flip side, attempting to support a fixed subset of configuration
options is brittle and will probably lead to compatibility issues down the
road

I agree with that. The idea was just to have some scripted shortcut
functions for setting options in flink-conf.yaml from a custom Dockerfile or
entry point script.
TASK_MANAGER_NUMBER_OF_TASK_SLOTS could be set as a dynamic property of
the started JM.
I am not sure how many users depend on it. Maybe we could remove it.
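If we keep the env var for backwards compatibility, it could be translated into a dynamic property instead of rewriting flink-conf.yaml. A minimal sketch (the option key taskmanager.numberOfTaskSlots is real; the helper itself is hypothetical):

```shell
# Hedged sketch: map the legacy env var to a dynamic property argument
# rather than editing flink-conf.yaml in place.
slots_to_dynamic_prop() {
  # $1: value of TASK_MANAGER_NUMBER_OF_TASK_SLOTS (may be empty/unset)
  if [ -n "$1" ]; then
    echo "-Dtaskmanager.numberOfTaskSlots=$1"
  fi
}

slots_to_dynamic_prop "${TASK_MANAGER_NUMBER_OF_TASK_SLOTS}"
```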
It also looks like we already have a somewhat unclean state in
docker-entrypoint.sh, where some ports are set to hardcoded values
and then FLINK_PROPERTIES is applied, potentially duplicating options in
the resulting flink-conf.yaml.
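A simplified illustration of the duplication concern (not the actual entrypoint code; the option key is real, the rest is a sketch): the hardcoded value is written first, then the FLINK_PROPERTIES fragment is appended, so the same key can appear twice in the resulting file.

```shell
# Hedged sketch of the append-after-hardcode behaviour.
CONF="$(mktemp)"
echo "blob.server.port: 6124" >> "$CONF"       # hardcoded by the script
FLINK_PROPERTIES="blob.server.port: 6125"      # user-supplied fragment
echo "$FLINK_PROPERTIES" >> "$CONF"            # appended afterwards
grep -c '^blob.server.port' "$CONF"            # prints 2: duplicated key
rm -f "$CONF"
```

Which of the two values wins then depends on how Flink resolves duplicate keys when parsing the file, which is exactly the kind of ambiguity a unified approach would avoid.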

I can see some potential use for env vars alongside the standard entry
point args, but only for purposes that cannot be achieved by passing entry
point args, unlike changing flink-conf.yaml options. Nothing comes to my
mind at the moment; it could be some setting specific to the running mode
of the entry point. The mode itself can stay the first arg of the entry
point.

> I would second that it is desirable to support Java 11

> Regarding supporting JAVA 11:
> - Not sure if it is necessary to ship JAVA. Maybe we could just change
> the base image from openjdk:8-jre to openjdk:11-jre in template docker
> file[1]. Correct me if I understand incorrectly. Also, I agree to move
> this out of the scope of this FLIP if it indeed takes much extra
> effort.

This is what I meant by bumping up the Java version in the docker hub Flink
image:
FROM openjdk:8-jre -> FROM openjdk:11-jre
We could poll the user mailing list about this separately.

> and in general use a base image that allows the (straightforward) use of
more recent versions of other software (Python etc.)

We could also poll whether to always include some version of Python in
the docker hub image.
A potential problem is that once it is there, it is a hassle to
remove or change it in a custom extended Dockerfile.

It would also be nice to avoid maintaining images for every combination
of installed Java/Scala/Python versions on docker hub.

> Regarding building from local dist:
> - Yes, I bring this up mostly for development purpose. Since k8s is
> popular, I believe more and more developers would like to test their
> work on k8s cluster. I'm not sure should all developers write a custom
> docker file themselves in this scenario. Thus, I still prefer to
> provide a script for devs.
> - I agree to keep the scope of this FLIP mostly for those normal
> users. But as far as I can see, supporting building from local dist
> would not take much extra effort.
> - The maven docker plugin sounds good. I'll take a look at it.

I would see any scripts introduced in this FLIP also as potential building
blocks for a custom dev Dockerfile.
Maybe this will be all we need for dev images, or we could write a highly
parametrised dev Dockerfile for building dev images.
If the scripts stay in apache/flink-docker, it is somewhat inconvenient,
though possible, to use them in the main Flink repo.
If we move them to apache/flink, then we will have to e.g. include them in
the release to make them easily available in apache/flink-docker, and
maintain them in the main repo although they are docker-specific.
All in all, I would say once we implement them, we can revisit this topic.

Best,
Andrey

On Wed, Mar 11, 2020 at 8:58 AM Yangze Guo <karma...@gmail.com> wrote:

> Thanks for the reply, Andrey.
>
> Regarding building from local dist:
> - Yes, I bring this up mostly for development purpose. Since k8s is
> popular, I believe more and more developers would like to test their
> work on k8s cluster. I'm not sure should all developers write a custom
> docker file themselves in this scenario. Thus, I still prefer to
> provide a script for devs.
> - I agree to keep the scope of this FLIP mostly for those normal
> users. But as far as I can see, supporting building from local dist
> would not take much extra effort.
> - The maven docker plugin sounds good. I'll take a look at it.
>
> Regarding supporting JAVA 11:
> - Not sure if it is necessary to ship JAVA. Maybe we could just change
> the base image from openjdk:8-jre to openjdk:11-jre in template docker
> file[1]. Correct me if I understand incorrectly. Also, I agree to move
> this out of the scope of this FLIP if it indeed takes much extra
> effort.
>
> Regarding the custom configuration, the mechanism that Thomas mentioned
> LGTM.
>
> [1]
> https://github.com/apache/flink-docker/blob/master/Dockerfile-debian.template
>
> Best,
> Yangze Guo
>
> On Wed, Mar 11, 2020 at 5:52 AM Thomas Weise <t...@apache.org> wrote:
> >
> > Thanks for working on improvements to the Flink Docker container images.
> This will be important as more and more users are looking to adopt
> Kubernetes and other deployment tooling that relies on Docker images.
> >
> > A generic, dynamic configuration mechanism based on environment
> variables is essential and it is already supported via envsubst and an
> environment variable that can supply a configuration fragment:
> >
> >
> https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L88
> >
> https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L85
> >
> > This gives the necessary control for infrastructure use cases that aim
> to supply deployment tooling other users. An example in this category this
> is the FlinkK8sOperator:
> >
> > https://github.com/lyft/flinkk8soperator/tree/master/examples/wordcount
> >
> > On the flip side, attempting to support a fixed subset of configuration
> options is brittle and will probably lead to compatibility issues down the
> road:
> >
> >
> https://github.com/apache/flink-docker/blob/09adf2dcd99abfb6180e1e2b5b917b288e0c01f6/docker-entrypoint.sh#L97
> >
> > Besides the configuration, it may be worthwhile to see in which other
> ways the base Docker images can provide more flexibility to incentivize
> wider adoption.
> >
> > I would second that it is desirable to support Java 11 and in general
> use a base image that allows the (straightforward) use of more recent
> versions of other software (Python etc.)
> >
> >
> https://github.com/apache/flink-docker/blob/d3416e720377e9b4c07a2d0f4591965264ac74c5/Dockerfile-debian.template#L19
> >
> > Thanks,
> > Thomas
> >
> > On Tue, Mar 10, 2020 at 12:26 PM Andrey Zagrebin <azagre...@apache.org>
> wrote:
> >>
> >> Hi All,
> >>
> >> Thanks a lot for the feedback!
> >>
> >> *@Yangze Guo*
> >>
> >> - Regarding the flink_docker_utils#install_flink function, I think it
> >> > should also support build from local dist and build from a
> >> > user-defined archive.
> >>
> >> I suppose you bring this up mostly for development purpose or powerful
> >> users.
> >> Most of normal users are usually interested in mainstream released
> versions
> >> of Flink.
> >> Although, you are bring a valid concern, my idea was to keep scope of
> this
> >> FLIP mostly for those normal users.
> >> The powerful users are usually capable to design a completely
> >> custom Dockerfile themselves.
> >> At the moment, we already have custom Dockerfiles e.g. for tests in
> >>
> flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile.
> >> We can add something similar for development purposes and maybe
> introduce a
> >> special maven goal. There is a maven docker plugin, afaik.
> >> I will add this to FLIP as next step.
> >>
> >> - It seems that the install_shaded_hadoop could be an option of
> >> > install_flink
> >>
> >> I woud rather think about this as a separate independent optional step.
> >>
> >> - Should we support JAVA 11? Currently, most of the docker file based on
> >> > JAVA 8.
> >>
> >> Indeed, it is a valid concern. Java version is a fundamental property of
> >> the docker image.
> >> To customise this in the current mainstream image is difficult, this
> would
> >> require to ship it w/o Java at all.
> >> Or this is a separate discussion whether we want to distribute docker
> hub
> >> images with different Java versions or just bump it to Java 11.
> >> This should be easy in a custom Dockerfile for development purposes
> though
> >> as mentioned before.
> >>
> >> - I do not understand how to set config options through
> >>
> >> "flink_docker_utils configure"? Does this step happen during the image
> >> > build or the container start? If it happens during the image build,
> >> > there would be a new image every time we change the config. If it just
> >> > a part of the container entrypoint, I think there is no need to add a
> >> > configure command, we could just add all dynamic config options to the
> >> > args list of "start_jobmaster"/"start_session_jobmanager". Am I
> >> > understanding this correctly?
> >>
> >>  `flink_docker_utils configure ...` can be called everywhere:
> >> - while building a custom image (`RUN flink_docker_utils configure ..`)
> by
> >> extending our base image from docker hub (`from flink`)
> >> - in a custom entry point as well
> >> I will check this but if user can also pass a dynamic config option it
> also
> >> sounds like a good option
> >> Our standard entry point script in base image could just properly
> forward
> >> the arguments to the Flink process.
> >>
> >> @Yang Wang
> >>
> >> > About docker utils
> >> > I really like the idea to provide some utils for the docker file and
> entry
> >> > point. The
> >> > `flink_docker_utils` will help to build the image easier. I am not
> sure
> >> > about the
> >> > `flink_docker_utils start_jobmaster`. Do you mean when we build a
> docker
> >> > image, we
> >> > need to add `RUN flink_docker_utils start_jobmaster` in the docker
> file?
> >> > Why do we need this?
> >>
> >> This is a scripted action to start JM. It can be called everywhere.
> >> Indeed, it does not make too much sense to run it in Dockerfile.
> >> Mostly, the idea was to use in a custom entry point. When our base
> docker
> >> hub image is started its entry point can be also completely overridden.
> >> The actions are also sorted in the FLIP: for Dockerfile or for entry
> point.
> >> E.g. our standard entry point script in the base docker hub image can
> >> already use it.
> >> Anyways, it was just an example, the details are to be defined in Jira,
> imo.
> >>
> >> > About docker entry point
> >> > I agree with you that the docker entry point could more powerful with
> more
> >> > functionality.
> >> > Mostly, it is about to override the config options. If we support
> dynamic
> >> > properties, i think
> >> > it is more convenient for users without any learning curve.
> >> > `docker run flink session_jobmanager -D rest.bind-port=8081`
> >>
> >> Indeed, as mentioned before, it can be a better option.
> >> The standard entry point also decides at least what to run JM or TM. I
> >> think we will see what else makes sense to include there during the
> >> implementation.
> >> Some specifics may be more convenient to set with env vars as Konstantin
> >> mentioned.
> >>
> >> > About the logging
> >> > Updating the `log4j-console.properties` to support multiple appender
> is a
> >> > better option.
> >> > Currently, the native K8s is suggesting users to debug the logs in
> this
> >> > way[1]. However,
> >> > there is also some problems. The stderr and stdout of JM/TM processes
> could
> >> > not be
> >> > forwarded to the docker container console.
> >>
> >> Strange, we should check maybe there is a docker option to query the
> >> container's stderr output as well.
> >> If we forward Flink process stdout as usual in bash console, it should
> not
> >> be a problem. Why can it not be forwarded?
> >>
> >> @Konstantin Knauf
> >>
> >> For the entrypoint, have you considered to also allow setting
> configuration
> >> > via environment variables as in "docker run -e
> FLINK_REST_BIN_PORT=8081
> >> > ..."? This is quite common and more flexible, e.g. it makes it very
> easy to
> >> > pass values of Kubernetes Secrets into the Flink configuration.
> >>
> >> This is indeed an interesting option to pass arguments to the entry
> point
> >> in general.
> >> For the config options, the dynamic args can be a better option as
> >> mentioned above.
> >>
> >> With respect to logging, I would opt to keep this very basic and to only
> >> > support logging to the console (maybe with a fix for the web user
> >> > interface). For everything else, users can easily build their own
> images
> >> > based on library/flink (provide the dependencies, change the logging
> >> > configuration).
> >>
> >> agree
> >>
> >> Thanks,
> >> Andrey
> >>
> >> On Sun, Mar 8, 2020 at 8:55 PM Konstantin Knauf <
> konstan...@ververica.com>
> >> wrote:
> >>
> >> > Hi Andrey,
> >> >
> >> > thanks a lot for this proposal. The variety of Docker files in the
> project
> >> > has been causing quite some confusion.
> >> >
> >> > For the entrypoint, have you considered to also allow setting
> >> > configuration via environment variables as in "docker run -e
> >> > FLINK_REST_BIN_PORT=8081 ..."? This is quite common and more
> flexible, e.g.
> >> > it makes it very easy to pass values of Kubernetes Secrets into the
> Flink
> >> > configuration.
> >> >
> >> > With respect to logging, I would opt to keep this very basic and to
> only
> >> > support logging to the console (maybe with a fix for the web user
> >> > interface). For everything else, users can easily build their own
> images
> >> > based on library/flink (provide the dependencies, change the logging
> >> > configuration).
> >> >
> >> > Cheers,
> >> >
> >> > Konstantin
> >> >
> >> >
> >> > On Thu, Mar 5, 2020 at 11:01 AM Yang Wang <danrtsey...@gmail.com>
> wrote:
> >> >
> >> >> Hi Andrey,
> >> >>
> >> >>
> >> >> Thanks for driving this significant FLIP. From the user ML, we could
> also
> >> >> know there are
> >> >> many users running Flink in container environment. Then the docker
> image
> >> >> will be the
> >> >> very basic requirement. Just as you say, we should provide a unified
> >> >> place for all various
> >> >> usage(e.g. session, job, native k8s, swarm, etc.).
> >> >>
> >> >>
> >> >> > About docker utils
> >> >>
> >> >> I really like the idea to provide some utils for the docker file and
> >> >> entry point. The
> >> >> `flink_docker_utils` will help to build the image easier. I am not
> sure
> >> >> about the
> >> >> `flink_docker_utils start_jobmaster`. Do you mean when we build a
> docker
> >> >> image, we
> >> >> need to add `RUN flink_docker_utils start_jobmaster` in the docker
> file?
> >> >> Why do we need this?
> >> >>
> >> >>
> >> >> > About docker entry point
> >> >>
> >> >> I agree with you that the docker entry point could more powerful with
> >> >> more functionality.
> >> >> Mostly, it is about to override the config options. If we support
> dynamic
> >> >> properties, i think
> >> >> it is more convenient for users without any learning curve.
> >> >> `docker run flink session_jobmanager -D rest.bind-port=8081`
> >> >>
> >> >>
> >> >> > About the logging
> >> >>
> >> >> Updating the `log4j-console.properties` to support multiple appender
> is a
> >> >> better option.
> >> >> Currently, the native K8s is suggesting users to debug the logs in
> this
> >> >> way[1]. However,
> >> >> there is also some problems. The stderr and stdout of JM/TM processes
> >> >> could not be
> >> >> forwarded to the docker container console.
> >> >>
> >> >>
> >> >> [1].
> >> >>
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#log-files
> >> >>
> >> >>
> >> >> Best,
> >> >> Yang
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Andrey Zagrebin <azagre...@apache.org> 于2020年3月4日周三 下午5:34写道:
> >> >>
> >> >>> Hi All,
> >> >>>
> >> >>> If you have ever touched the docker topic in Flink, you
> >> >>> probably noticed that we have multiple places in docs and repos
> which
> >> >>> address its various concerns.
> >> >>>
> >> >>> We have prepared a FLIP [1] to simplify the perception of docker
> topic in
> >> >>> Flink by users. It mostly advocates for an approach of extending
> official
> >> >>> Flink image from the docker hub. For convenience, it can come with
> a set
> >> >>> of
> >> >>> bash utilities and documented examples of their usage. The utilities
> >> >>> allow
> >> >>> to:
> >> >>>
> >> >>>    - run the docker image in various modes (single job, session
> master,
> >> >>>    task manager etc)
> >> >>>    - customise the extending Dockerfile
> >> >>>    - and its entry point
> >> >>>
> >> >>> Eventually, the FLIP suggests to remove all other user facing
> Dockerfiles
> >> >>> and building scripts from Flink repo, move all docker docs to
> >> >>> apache/flink-docker and adjust existing docker use cases to refer
> to this
> >> >>> new approach (mostly Kubernetes now).
> >> >>>
> >> >>> The first contributed version of Flink docker integration also
> contained
> >> >>> example and docs for the integration with Bluemix in IBM cloud. We
> also
> >> >>> suggest to maintain it outside of Flink repository (cc Markus
> Müller).
> >> >>>
> >> >>> Thanks,
> >> >>> Andrey
> >> >>>
> >> >>> [1]
> >> >>>
> >> >>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification
> >> >>>
> >> >>
> >> >
> >> > --
> >> >
> >> > Konstantin Knauf | Head of Product
> >> >
> >> > +49 160 91394525
> >> >
> >> >
> >> > Follow us @VervericaData Ververica <https://www.ververica.com/>
> >> >
> >> >
> >> > --
> >> >
> >> > Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> >> > Conference
> >> >
> >> > Stream Processing | Event Driven | Real Time
> >> >
> >> > --
> >> >
> >> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >> >
> >> > --
> >> > Ververica GmbH
> >> > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> >> > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason,
> Ji
> >> > (Tony) Cheng
> >> >
>
