GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/1593
[FLINK-3120] [runtime] Manually configure Netty's ByteBufAllocator
tl;dr Change default Netty configuration to be relative to number of slots,
i.e. configure one memory arena (in PooledByteBufAllocator) per slot and use
one event loop thread per slot. Behaviour can still be manually overridden.
With this change, we can expect 16 MB of direct memory allocated per task slot
by Netty.
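The 16 MB figure can be traced back to Netty's default pooled-allocator sizing. A small sketch of the arithmetic, assuming Netty 4.0's default page size (8192 bytes) and maxOrder (11), under which each arena allocates memory in chunks of pageSize << maxOrder bytes; with one arena per slot, that is one 16 MiB chunk per slot:

```java
// Sketch: where "16 MB of direct memory per task slot" comes from,
// assuming Netty's default sizing parameters (not taken from the PR itself).
public class ArenaSizeSketch {
    public static void main(String[] args) {
        int pageSize = 8192;  // Netty default page size in bytes
        int maxOrder = 11;    // Netty default: chunk = pageSize * 2^maxOrder

        int chunkSizeBytes = pageSize << maxOrder;      // bytes per chunk
        int chunkSizeMb = chunkSizeBytes / (1024 * 1024);

        // One arena per slot => roughly one chunk, i.e. 16 MiB, per slot.
        System.out.println(chunkSizeMb + " MB per arena/slot");
    }
}
```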
Problem: We were using Netty's default PooledByteBufAllocator instance,
which is subject to changing behaviour between Netty versions (this
happened between versions 4.0.27.Final and 4.0.28.Final, resulting in
increased memory consumption) and whose default memory consumption depends
on the number of available cores in the system. This can be problematic,
for example, in YARN setups where users run one slot per task manager on
machines with many cores, resulting in a relatively high amount of
allocated memory.
Solution: We instantiate a PooledByteBufAllocator instance manually and
wrap it as a NettyBufferPool. Our instance configures one arena per task
slot by default. It's desirable to have the number of arenas match the
number of event loop threads to minimize lock contention (Netty's default
tried to ensure this as well), hence the number of threads is changed as
well to match the number of slots by default. Both number of threads and
arenas can still be manually configured.
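A minimal sketch of how such an allocator could be constructed, using Netty 4.0's public PooledByteBufAllocator constructor. This is an illustration under stated assumptions, not the PR's actual NettyBufferPool code; the variable numberOfSlots and the choice of zero heap arenas are assumptions here (Flink's network stack works with direct buffers):

```java
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.channel.nio.NioEventLoopGroup;

// Hypothetical sketch: configure one direct-memory arena and one event
// loop thread per task slot, instead of relying on Netty's core-count
// based defaults.
public class SlotScopedNettyConfig {
    public static void main(String[] args) {
        int numberOfSlots = 4;  // hypothetical slot count for illustration
        int pageSize = 8192;    // Netty default page size
        int maxOrder = 11;      // chunk size = pageSize << maxOrder = 16 MiB

        // One direct arena per slot, no heap arenas (assumption: only
        // direct buffers are needed for the network stack).
        PooledByteBufAllocator allocator = new PooledByteBufAllocator(
                true,            // preferDirect
                0,               // nHeapArena
                numberOfSlots,   // nDirectArena: one arena per slot
                pageSize,
                maxOrder);

        // Match the number of event loop threads to the number of arenas
        // to minimize lock contention on the arenas.
        NioEventLoopGroup eventLoopGroup = new NioEventLoopGroup(numberOfSlots);
    }
}
```

Keeping threads and arenas at the same count means each event loop thread tends to get its own arena via Netty's thread-local cache, which is the lock-contention property the description refers to.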
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink 3120-buffers
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1593.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1593
commit 613ed9cce07d36e7b229e444dad3996db1bdb8c6
Author: Ufuk Celebi
Date: 2016-02-03T15:05:37Z
[FLINK-3120] [runtime] Manually configure Netty's ByteBufAllocator
tl;dr Change default Netty configuration to be relative to number of slots,
i.e. configure one memory arena (in PooledByteBufAllocator) per slot and
use one event loop thread per slot. Behaviour can still be manually
overridden. With this change, we can expect 16 MB of direct memory
allocated per task slot by Netty.
Problem: We were using Netty's default PooledByteBufAllocator instance,
which is subject to changing behaviour between Netty versions (this
happened between versions 4.0.27.Final and 4.0.28.Final, resulting in
increased memory consumption) and whose default memory consumption depends
on the number of available cores in the system. This can be problematic,
for example, in YARN setups where users run one slot per task manager on
machines with many cores, resulting in a relatively high amount of
allocated memory.
Solution: We instantiate a PooledByteBufAllocator instance manually and
wrap it as a NettyBufferPool. Our instance configures one arena per task
slot by default. It's desirable to have the number of arenas match the
number of event loop threads to minimize lock contention (Netty's default
tried to ensure this as well), hence the number of threads is changed as
well to match the number of slots by default. Both number of threads and
arenas can still be manually configured.