Hi

I think Till's view deserves wider discussion.
I'd like to add my two cents from debugging the RocksDB OOM problem that Nico 
reported, together with him.
Jemalloc has a mechanism for profiling memory allocation [1], which is widely 
used to analyze memory leaks.
Once we set jemalloc as the default memory allocator, the frequency of OOMs 
decreased noticeably.
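
For anyone who wants to try the profiling mentioned above, here is a minimal sketch. It only builds the MALLOC_CONF string described in the jemalloc wiki [1]; the helper name and the default dump prefix are made up for illustration, and it assumes a jemalloc build with profiling support (configured with --enable-prof).

```shell
#!/bin/sh
# Sketch: build a MALLOC_CONF value that turns on jemalloc heap profiling.
# Only has an effect if jemalloc was built with --enable-prof.
jemalloc_prof_conf() {
    # $1: prefix for the dump files (the default here is a hypothetical choice)
    prefix="${1:-/tmp/jeprof.out}"
    # prof:true            enable heap profiling
    # lg_prof_interval:30  dump a profile roughly every 2^30 bytes allocated
    # lg_prof_sample:17    sample on average once per 2^17 bytes allocated
    echo "prof:true,lg_prof_interval:30,lg_prof_sample:17,prof_prefix:${prefix}"
}

# Example: set this before starting the process to be profiled.
# export MALLOC_CONF="$(jemalloc_prof_conf /tmp/taskmanager)"
```

The resulting heap dumps can then be inspected with the jeprof tool that ships with jemalloc.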

Considering the OOM-kill problem in k8s, changing the default memory allocator 
to jemalloc could be beneficial.
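
To make the idea concrete, a rough sketch of how an entrypoint script could switch allocators is below. This is not the actual docker-entrypoint.sh; the function name, the switch values, and the library path are assumptions for illustration (the path depends on the distribution and architecture).

```shell
#!/bin/sh
# Hypothetical helper: choose the allocator before starting the JVM.
# JEMALLOC_LIB is an assumed path; check where your image installs the library.
JEMALLOC_LIB="${JEMALLOC_LIB:-/usr/lib/x86_64-linux-gnu/libjemalloc.so}"

select_allocator() {
    # $1 is "jemalloc" or "glibc" (hypothetical switch names)
    case "$1" in
        jemalloc)
            if [ -f "$JEMALLOC_LIB" ]; then
                # Append jemalloc to any existing LD_PRELOAD entries
                export LD_PRELOAD="${LD_PRELOAD:+$LD_PRELOAD:}$JEMALLOC_LIB"
            else
                echo "WARNING: $JEMALLOC_LIB not found, staying with glibc" >&2
            fi
            ;;
        glibc)
            # Workaround discussed in this thread: cap the number of malloc arenas
            export MALLOC_ARENA_MAX=2
            ;;
        *)
            echo "unknown allocator: $1" >&2
            return 1
            ;;
    esac
}
```

Calling `select_allocator jemalloc` before launching the Flink process would preload jemalloc only when the library is present, so the same image can serve both allocators.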

[1] https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Heap-Profiling

Best
Yun Tang

________________________________
From: Till Rohrmann <trohrm...@apache.org>
Sent: Thursday, October 29, 2020 18:34
To: dev <dev@flink.apache.org>
Cc: Yun Tang <myas...@live.com>
Subject: Re: [DISCUSS][docker] Adopt Jemalloc as default memory allocator for 
debian based Flink docker image

Hi Yu,

I see your point why you are in favour of continuing using glibc.

I honestly don't have a lot of experience with malloc libraries so I can
only argue from a general perspective: We know that glibc has some problems
with respect to memory fragmentation, which can cause processes to exceed their memory
limit. jemalloc seems to address this problem by avoiding memory
fragmentation. However, it might be the case that jemalloc introduces other
problems. In order to figure this out we would need to collect more data
points/experience. So in the end we have to weigh glibc memory
fragmentation against unknown problems in jemalloc.

Given that choosing the malloc library will be quite an expert setting, I
don't expect many Flink users to use it in case they see OOMs. Hence, I
would be slightly in favour of choosing jemalloc as the default because it
would allow us to gather feedback from users and solve the existing
problem. In the best case, users won't run into new problems. If they do,
then we will have to re-evaluate this decision and might have to switch the
malloc library back. W/o setting jemalloc as the default, I fear that we
will never gather enough feedback to make jemalloc confidently the default.

What one could argue, though, is that making jemalloc the default so
shortly before the feature freeze limits the timespan we can observe how it
behaves before the actual release. Making such a change at the beginning of
a release cycle would allow us to gain more confidence.

In any case, I believe that we should add a big release note and document
what users should do in case they see memory issues (OOM kills, slower
performance, etc.).

Cheers,
Till

On Tue, Oct 20, 2020 at 5:39 PM Yu Li <car...@gmail.com> wrote:

> True, thanks for the reminder Till!
>
> I suggest using glibc malloc as default. On one hand this follows our old
> behavior (only with glibc malloc support in the image), on the other hand I
> believe glibc isn't using jemalloc as its default memory allocator for some
> reason.
>
> Please let me know your thoughts. Thanks.
>
> Best Regards,
> Yu
>
>
> On Tue, 20 Oct 2020 at 21:45, Till Rohrmann <trohrm...@apache.org> wrote:
>
> > The only question left would be what will be the default value?
> >
> > Cheers,
> > Till
> >
> > On Tue, Oct 20, 2020 at 10:16 AM Yu Li <car...@gmail.com> wrote:
> >
> > > I'm also +1 on making it configurable in the same Docker image.
> > >
> > > > It seems we have reached consensus and there are already enough +1s to
> > > > move forward, so I suggest @Yun conclude the discussion directly if
> > > > there are no objections.
> > >
> > > Thanks.
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > > On Fri, 16 Oct 2020 at 23:16, Till Rohrmann <trohrm...@apache.org> wrote:
> > >
> > > > +1 for making it configurable in the same Docker image.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > > On Fri, Oct 16, 2020 at 12:56 PM Chesnay Schepler <ches...@apache.org> wrote:
> > > >
> > > > > > If it is possible to support both allocators in a single image then
> > > > > > we should definitely go with that option.
> > > > >
> > > > > On 10/16/2020 12:21 PM, Yun Tang wrote:
> > > > > > > Thanks for Yang's suggestion. I think this could be a better choice.
> > > > > > > We could install jemalloc and only add it to LD_PRELOAD when the user
> > > > > > > passes a specific configuration to docker-entrypoint.sh.
> > > > > > > By doing so, we could avoid creating additional docker image tags and
> > > > > > > still offer a way to reduce the memory fragmentation problem.
> > > > > >
> > > > > > Does anyone else have other ideas?
> > > > > >
> > > > > > Best
> > > > > > Yun Tang
> > > > > > ________________________________
> > > > > > From: Yang Wang <danrtsey...@gmail.com>
> > > > > > Sent: Thursday, October 15, 2020 14:59
> > > > > > To: dev <dev@flink.apache.org>
> > > > > > > Subject: Re: [DISCUSS][docker] Adopt Jemalloc as default memory
> > > > > > > allocator for debian based Flink docker image
> > > > > >
> > > > > > Thanks Yun Tang for starting this discussion.
> > > > > >
> > > > > > > I think this is very important when deploying Flink in container
> > > > > > > environments in production. I just have a quick question. Could we
> > > > > > > have both memory allocators (e.g. glibc, jemalloc) in the official
> > > > > > > Flink image and enable a specific one by setting an ENV variable?
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > > On Wed, Oct 14, 2020 at 12:23 PM, Yu Li <car...@gmail.com> wrote:
> > > > > >
> > > > > > >> Thanks for debugging and resolving the issue and driving the
> > > > > > >> discussion, Yun!
> > > > > >>
> > > > > > >> For the given solutions, I prefer option 1 (supply another Dockerfile
> > > > > > >> using jemalloc as default memory allocator) for the reasons below:
> > > > > >>
> > > > > > >> 1. It's hard to say jemalloc is always better than ptmalloc (glibc
> > > > > > >> malloc), or else glibc should have already adopted it as the default
> > > > > > >> memory allocator. And as indicated here [1], in some cases jemalloc
> > > > > > >> will consume as much as twice the memory of glibc.
> > > > > >>
> > > > > > >> 2. All existing Flink docker images use glibc. If we change the
> > > > > > >> default memory allocator to jemalloc and only supply one series of
> > > > > > >> images, we will leave those having better performance with glibc no
> > > > > > >> other choice but to stay with old images. In other words, there's a
> > > > > > >> risk of introducing new problems while fixing an existing one if we
> > > > > > >> choose option 2.
> > > > > >>
> > > > > > >> There is also a third option, considering the effort of maintaining
> > > > > > >> more images if the memory leak issue is not widely observed: we could
> > > > > > >> document the steps for building a Dockerfile with jemalloc as the
> > > > > > >> default allocator so users could build it when needed. This leaves
> > > > > > >> the burden to our users, so for me it's not the best option.
> > > > > >>
> > > > > >> Best Regards,
> > > > > >> Yu
> > > > > >>
> > > > > >> [1] https://stackoverflow.com/a/33993215
> > > > > >>
> > > > > > >> On Tue, 13 Oct 2020 at 15:34, Yun Tang <myas...@live.com> wrote:
> > > > > >>
> > > > > >>> Hi all
> > > > > >>>
> > > > > > >>> Users report a serious memory leak when submitting jobs continuously
> > > > > > >>> in session mode on k8s (please refer to FLINK-18712 [1]). I reproduced
> > > > > > >>> this and found that it is caused by memory fragmentation in glibc
> > > > > > >>> [2][3]. There are two ways to fix it:
> > > > > >>>
> > > > > > >>>    *   Quick but not very clean solution: limit the memory pools of
> > > > > > >>> glibc by setting MALLOC_ARENA_MAX to 2
> > > > > > >>>
> > > > > > >>>    *   More general solution: rebuild the image to install
> > > > > > >>> libjemalloc-dev and add libjemalloc.so to LD_PRELOAD
> > > > > > >>> The reporter eventually adopted the 2nd solution to fix this issue.
> > > > > > >>> Thus, I began to think about whether we should change our Dockerfile
> > > > > > >>> to adopt jemalloc as the default memory allocator [4].
> > > > > > >>> From my point of view, we have two choices:
> > > > > >>>
> > > > > > >>>    1.  Introduce another Dockerfile using jemalloc as the default
> > > > > > >>> memory allocator, which means Flink needs another two new image tags
> > > > > > >>> to build docker images with jemalloc while the default images still
> > > > > > >>> use glibc.
> > > > > > >>>    2.  Set the default memory allocator to jemalloc in our existing
> > > > > > >>> Dockerfiles, which means Flink offers docker images with jemalloc by
> > > > > > >>> default.
> > > > > > >>> I prefer the 2nd option, as our company already uses jemalloc as the
> > > > > > >>> default memory allocator for the JDK in our production environment,
> > > > > > >>> following warnings from our OS team about glibc's memory
> > > > > > >>> fragmentation.
> > > > > > >>> Moreover, I found several open source projects adopting jemalloc as
> > > > > > >>> the default memory allocator within their images to resolve memory
> > > > > > >>> fragmentation problems, e.g. fluent [5], home-assistant [6].
> > > > > >>>
> > > > > >>> What do you guys think of this issue?
> > > > > >>>
> > > > > >>> [1] https://issues.apache.org/jira/browse/FLINK-18712
> > > > > >>> [2]
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc
> > > > > >>> [3] https://sourceware.org/bugzilla/show_bug.cgi?id=15321
> > > > > >>> [4] https://issues.apache.org/jira/browse/FLINK-19125
> > > > > >>> [5]
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://docs.fluentbit.io/manual/v/1.0/installation/docker#why-there-is-no-fluent-bit-docker-image-based-on-alpine-linux
> > > > > >>> [6] https://github.com/home-assistant/core/pull/33237
> > > > > >>>
> > > > > >>>
> > > > > >>> Best
> > > > > >>> Yun Tang
> > > > > >>>
> > > > >
> > > > >
> > > >
> > >
> >
>
