Thanks Yun Tang for starting this discussion.

I think this is very important when deploying Flink in container
environments in production. I just have a quick question. Could we ship
both memory allocators (e.g. glibc, jemalloc) in the official Flink image
and enable a specific one by setting an ENV variable?
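
To make the idea concrete, here is a minimal sketch in Python of how an
entrypoint could select the allocator before launching the JVM. The variable
name FLINK_MEMORY_ALLOCATOR and the library path are assumptions made up for
illustration; neither exists in the current images, and the real path depends
on the base image.

```python
# Hypothetical names: neither the variable nor the path comes from the Flink images.
ALLOCATOR_ENV = "FLINK_MEMORY_ALLOCATOR"
JEMALLOC_PATH = "/usr/lib/x86_64-linux-gnu/libjemalloc.so"  # assumed install location


def allocator_env(env, jemalloc_path=JEMALLOC_PATH):
    """Return a copy of `env` adjusted for the requested memory allocator.

    "glibc" (the default) leaves the environment untouched; "jemalloc"
    puts the jemalloc shared library at the front of LD_PRELOAD.
    """
    result = dict(env)  # never mutate the caller's mapping
    choice = result.get(ALLOCATOR_ENV, "glibc").lower()
    if choice == "jemalloc":
        # Keep any libraries that were already preloaded.
        parts = [p for p in result.get("LD_PRELOAD", "").split(":") if p]
        if jemalloc_path not in parts:
            parts.insert(0, jemalloc_path)
        result["LD_PRELOAD"] = ":".join(parts)
    elif choice != "glibc":
        raise ValueError("unknown memory allocator: " + choice)
    return result
```

An entrypoint could pass the result as the environment of the process it
starts, e.g. subprocess.run(cmd, env=allocator_env(os.environ)), so that
images built from one Dockerfile would serve both groups of users.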

Best,
Yang

On Wed, 14 Oct 2020 at 12:23, Yu Li <car...@gmail.com> wrote:

> Thanks for debugging and resolving the issue and driving the discussion
> Yun!
>
> For the given solutions, I prefer option 1 (supply another Dockerfile using
> jemalloc as default memory allocator) because of the below reasons:
>
> 1. It's hard to say jemalloc is always better than ptmalloc (glibc malloc),
> or else glibc should have already adopted it as the default memory
> allocator. And as indicated here [1], in some cases jemalloc will
> consume as much as twice the memory of glibc.
>
> 2. All existing Flink docker images use glibc. If we change the default
> memory allocator to jemalloc and only supply one series of images, we will
> leave those who get better performance with glibc no choice but to stay
> with the old images. In other words, choosing option 2 risks introducing
> new problems while fixing an existing one.
>
> And there is a third option, given the effort of maintaining more images
> if the memory leak issue is not widely observed: we could document the
> steps for building an image with jemalloc as the default allocator so that
> users could build it when needed. That leaves the burden on our users,
> though, so for me it's not the best option.
>
> Best Regards,
> Yu
>
> [1] https://stackoverflow.com/a/33993215
>
> On Tue, 13 Oct 2020 at 15:34, Yun Tang <myas...@live.com> wrote:
>
> > Hi all
> >
> > Users have reported a serious memory leak when submitting jobs
> > continuously in session mode on k8s (please refer to FLINK-18712 [1]).
> > I also reproduced it and found that it is caused by glibc memory
> > fragmentation [2][3]. There are two ways to fix it:
> >
> >   *   Quick but not very clean solution: limit the glibc memory pool by
> > setting MALLOC_ARENA_MAX to 2
> >
> >   *   More general solution: rebuild the image to install
> > libjemalloc-dev and add libjemalloc.so to LD_PRELOAD
> >
> > The reporter eventually adopted the 2nd solution to fix this issue. Thus,
> > I began to wonder whether we should change our Dockerfile to adopt jemalloc
> > as the default memory allocator [4].
> > From my point of view, we have two choices:
> >
> >   1.  Introduce another Dockerfile using jemalloc as the default memory
> > allocator, which means Flink needs another two new image tags for the
> > jemalloc images while the default images still use glibc.
> >   2.  Set jemalloc as the default memory allocator in our existing
> > Dockerfiles, which means Flink offers Docker images with jemalloc by
> > default.
> >
> > I prefer the 2nd option, as our company already uses jemalloc as the default
> > memory allocator for the JDK in our production environment, following
> > warnings from our OS team about glibc's memory fragmentation.
> > Moreover, I found several open source projects that adopted jemalloc as the
> > default memory allocator in their images to resolve memory fragmentation
> > problems, e.g. Fluent Bit [5] and home-assistant [6].
> >
> > What do you guys think of this issue?
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-18712
> > [2] https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc
> > [3] https://sourceware.org/bugzilla/show_bug.cgi?id=15321
> > [4] https://issues.apache.org/jira/browse/FLINK-19125
> > [5] https://docs.fluentbit.io/manual/v/1.0/installation/docker#why-there-is-no-fluent-bit-docker-image-based-on-alpine-linux
> > [6] https://github.com/home-assistant/core/pull/33237
> >
> >
> > Best
> > Yun Tang
> >
>
