Hi All,

Thanks a lot for the feedback!

*@Yangze Guo*

> Regarding the flink_docker_utils#install_flink function, I think it
> should also support build from local dist and build from a
> user-defined archive.

I suppose you bring this up mostly for development purposes or for power
users. Most normal users are interested in the mainstream released versions
of Flink. Although you raise a valid concern, my idea was to keep the scope
of this FLIP focused mostly on those normal users. Power users are usually
capable of designing a completely custom Dockerfile themselves.
At the moment, we already have custom Dockerfiles, e.g. for tests in
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile.
We could add something similar for development purposes and maybe introduce
a special Maven goal. There is a Maven Docker plugin, afaik.
I will add this to the FLIP as a next step; see the sketch below for what
such a development Dockerfile could look like.
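For illustration only, a minimal sketch of a development image built from a
local dist (the base image tag and the flink-dist paths are assumptions
about the local build layout, not a settled design):

    # hypothetical development Dockerfile, built from a local dist
    FROM flink:latest
    # overwrite the released distribution with the locally built
    # flink-dist (path depends on the local Maven build layout)
    COPY flink-dist/target/flink-*-bin/flink-*/ /opt/flink/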

> It seems that the install_shaded_hadoop could be an option of
> install_flink

I would rather think of this as a separate, independent, optional step, as
in the sketch below.
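A sketch of what that could look like in a custom Dockerfile (the utility
names come from the FLIP; everything else is illustrative):

    FROM flink
    # optional, independent step, not baked into install_flink:
    RUN flink_docker_utils install_shaded_hadoop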

> Should we support JAVA 11? Currently, most of the docker file based on
> JAVA 8.

Indeed, this is a valid concern. The Java version is a fundamental property
of the docker image. Customising it in the current mainstream image is
difficult; that would require shipping the image without Java at all.
Whether we want to distribute docker hub images with different Java
versions, or just bump it to Java 11, is a separate discussion.
As mentioned before, though, this should be easy to do in a custom
Dockerfile for development purposes, as sketched below.
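A rough sketch of such a development Dockerfile on a Java 11 base (the base
image tag and the download URL are illustrative assumptions, not part of
the FLIP):

    # development sketch: Flink on a Java 11 base image
    FROM openjdk:11-jre
    ADD https://archive.apache.org/dist/flink/flink-1.10.0/flink-1.10.0-bin-scala_2.11.tgz /opt/
    RUN tar -xzf /opt/flink-1.10.0-bin-scala_2.11.tgz -C /opt/ \
        && mv /opt/flink-1.10.0 /opt/flink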

> I do not understand how to set config options through
> "flink_docker_utils configure"? Does this step happen during the image
> build or the container start? If it happens during the image build,
> there would be a new image every time we change the config. If it is
> just a part of the container entrypoint, I think there is no need to add
> a configure command, we could just add all dynamic config options to the
> args list of "start_jobmaster"/"start_session_jobmanager". Am I
> understanding this correctly?

`flink_docker_utils configure ...` can be called anywhere:
- while building a custom image (`RUN flink_docker_utils configure ..`)
that extends our base image from docker hub (`FROM flink`)
- in a custom entry point as well
I will check this, but letting the user pass dynamic config options also
sounds like a good idea.
Our standard entry point script in the base image could then just forward
those arguments to the Flink process. A sketch of the image-build case
follows below.
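A minimal sketch of the image-build case (the configure syntax and the
config key are illustrative; the exact CLI of the utility is still to be
defined in Jira):

    # extend the base docker hub image and bake a config option in
    FROM flink
    RUN flink_docker_utils configure taskmanager.numberOfTaskSlots 4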

*@Yang Wang*

> About docker utils
> I really like the idea to provide some utils for the docker file and entry
> point. The
> `flink_docker_utils` will help to build the image easier. I am not sure
> about the
> `flink_docker_utils start_jobmaster`. Do you mean when we build a docker
> image, we
> need to add `RUN flink_docker_utils start_jobmaster` in the docker file?
> Why do we need this?

This is a scripted action to start the JM. It can be called anywhere.
Indeed, it does not make much sense to run it in a Dockerfile.
Mostly, the idea was to use it in a custom entry point. When our base
docker hub image is started, its entry point can also be completely
overridden, as sketched below.
The actions are also grouped in the FLIP: for the Dockerfile or for the
entry point. E.g. our standard entry point script in the base docker hub
image can already use it.
Anyway, it was just an example; the details are to be defined in Jira, imo.
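To illustrate, a sketch of such a custom entry point (the action names
follow the FLIP examples; the configure call and its arguments are
assumptions):

    #!/usr/bin/env bash
    # custom entry point, overriding the one from the base image
    # user-specific setup first, e.g. adjusting the config:
    flink_docker_utils configure rest.bind-port 8081
    # then start the job master in the foreground:
    exec flink_docker_utils start_jobmaster "$@"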

> About docker entry point
> I agree with you that the docker entry point could more powerful with more
> functionality.
> Mostly, it is about to override the config options. If we support dynamic
> properties, i think
> it is more convenient for users without any learning curve.
> `docker run flink session_jobmanager -D rest.bind-port=8081`

Indeed, as mentioned before, it can be a better option.
The standard entry point also decides, at the very least, what to run: JM
or TM. I think we will see what else makes sense to include there during
the implementation.
Some specifics may be more convenient to set with env vars, as Konstantin
mentioned. A sketch of such an entry point follows below.
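A sketch of how the standard entry point could dispatch on the first
argument and forward the dynamic properties (the dispatch logic and the
start-foreground calls are assumptions about the implementation, not the
FLIP's final design):

    #!/usr/bin/env bash
    # first arg selects the process; the rest (e.g. -D key=value
    # dynamic properties) are forwarded to it
    case "$1" in
      session_jobmanager)
        shift
        exec "${FLINK_HOME}/bin/jobmanager.sh" start-foreground "$@"
        ;;
      taskmanager)
        shift
        exec "${FLINK_HOME}/bin/taskmanager.sh" start-foreground "$@"
        ;;
      *)
        # fall back to running the given command as-is
        exec "$@"
        ;;
    esac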

> About the logging
> Updating the `log4j-console.properties` to support multiple appender is a
> better option.
> Currently, the native K8s is suggesting users to debug the logs in this
> way[1]. However,
> there is also some problems. The stderr and stdout of JM/TM processes could
> not be
> forwarded to the docker container console.

Strange, we should check; maybe there is a docker option to query the
container's stderr output as well.
If we forward the Flink process stdout to the console as usual in bash, it
should not be a problem. Why can it not be forwarded?
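For reference, `docker logs` shows both the stdout and stderr of the
container's main process, so as long as the entry point keeps the Flink
process in the foreground both streams should be visible (a sketch; the
redirect is just the standard shell idiom, nothing FLIP-specific):

    # run the process in the foreground so the container console
    # (and `docker logs`) sees both stdout and stderr
    exec "${FLINK_HOME}/bin/jobmanager.sh" start-foreground "$@" 2>&1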

*@Konstantin Knauf*

> For the entrypoint, have you considered to also allow setting
> configuration via environment variables as in "docker run -e
> FLINK_REST_BIN_PORT=8081 ..."? This is quite common and more flexible,
> e.g. it makes it very easy to pass values of Kubernetes Secrets into the
> Flink configuration.

This is indeed an interesting option for passing arguments to the entry
point in general.
For the config options, the dynamic args may be a better option, as
mentioned above, but both could coexist, as in the sketch below.
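For example, the entry point could translate such environment variables
into config options before starting the process (a sketch; the
FLINK_REST_BIND_PORT naming scheme is an assumption based on Konstantin's
example, and the configure call follows the FLIP utilities):

    # map an environment variable to a config option, if set
    if [ -n "${FLINK_REST_BIND_PORT}" ]; then
      flink_docker_utils configure rest.bind-port "${FLINK_REST_BIND_PORT}"
    fi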

> With respect to logging, I would opt to keep this very basic and to only
> support logging to the console (maybe with a fix for the web user
> interface). For everything else, users can easily build their own images
> based on library/flink (provide the dependencies, change the logging
> configuration).

Agreed.

Thanks,
Andrey

On Sun, Mar 8, 2020 at 8:55 PM Konstantin Knauf <konstan...@ververica.com>
wrote:

> Hi Andrey,
>
> thanks a lot for this proposal. The variety of Docker files in the project
> has been causing quite some confusion.
>
> For the entrypoint, have you considered to also allow setting
> configuration via environment variables as in "docker run -e
> FLINK_REST_BIN_PORT=8081 ..."? This is quite common and more flexible, e.g.
> it makes it very easy to pass values of Kubernetes Secrets into the Flink
> configuration.
>
> With respect to logging, I would opt to keep this very basic and to only
> support logging to the console (maybe with a fix for the web user
> interface). For everything else, users can easily build their own images
> based on library/flink (provide the dependencies, change the logging
> configuration).
>
> Cheers,
>
> Konstantin
>
>
> On Thu, Mar 5, 2020 at 11:01 AM Yang Wang <danrtsey...@gmail.com> wrote:
>
>> Hi Andrey,
>>
>>
>> Thanks for driving this significant FLIP. From the user ML, we could also
>> know there are
>> many users running Flink in container environment. Then the docker image
>> will be the
>> very basic requirement. Just as you say, we should provide a unified
>> place for all various
>> usage(e.g. session, job, native k8s, swarm, etc.).
>>
>>
>> > About docker utils
>>
>> I really like the idea to provide some utils for the docker file and
>> entry point. The
>> `flink_docker_utils` will help to build the image easier. I am not sure
>> about the
>> `flink_docker_utils start_jobmaster`. Do you mean when we build a docker
>> image, we
>> need to add `RUN flink_docker_utils start_jobmaster` in the docker file?
>> Why do we need this?
>>
>>
>> > About docker entry point
>>
>> I agree with you that the docker entry point could more powerful with
>> more functionality.
>> Mostly, it is about to override the config options. If we support dynamic
>> properties, i think
>> it is more convenient for users without any learning curve.
>> `docker run flink session_jobmanager -D rest.bind-port=8081`
>>
>>
>> > About the logging
>>
>> Updating the `log4j-console.properties` to support multiple appender is a
>> better option.
>> Currently, the native K8s is suggesting users to debug the logs in this
>> way[1]. However,
>> there is also some problems. The stderr and stdout of JM/TM processes
>> could not be
>> forwarded to the docker container console.
>>
>>
>> [1].
>> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#log-files
>>
>>
>> Best,
>> Yang
>>
>>
>>
>>
>> Andrey Zagrebin <azagre...@apache.org> wrote on Wed, Mar 4, 2020 at 5:34 PM:
>>
>>> Hi All,
>>>
>>> If you have ever touched the docker topic in Flink, you
>>> probably noticed that we have multiple places in docs and repos which
>>> address its various concerns.
>>>
>>> We have prepared a FLIP [1] to simplify the perception of docker topic in
>>> Flink by users. It mostly advocates for an approach of extending official
>>> Flink image from the docker hub. For convenience, it can come with a set
>>> of
>>> bash utilities and documented examples of their usage. The utilities
>>> allow
>>> to:
>>>
>>>    - run the docker image in various modes (single job, session master,
>>>    task manager etc)
>>>    - customise the extending Dockerfile
>>>    - and its entry point
>>>
>>> Eventually, the FLIP suggests to remove all other user facing Dockerfiles
>>> and building scripts from Flink repo, move all docker docs to
>>> apache/flink-docker and adjust existing docker use cases to refer to this
>>> new approach (mostly Kubernetes now).
>>>
>>> The first contributed version of Flink docker integration also contained
>>> example and docs for the integration with Bluemix in IBM cloud. We also
>>> suggest to maintain it outside of Flink repository (cc Markus Müller).
>>>
>>> Thanks,
>>> Andrey
>>>
>>> [1]
>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification
>>>
>>
>
> --
>
> Konstantin Knauf | Head of Product
>
> +49 160 91394525
>
>
> Follow us @VervericaData Ververica <https://www.ververica.com/>
>
>
> --
>
> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> Conference
>
> Stream Processing | Event Driven | Real Time
>
> --
>
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> (Tony) Cheng
>
