Peter,
Greetings from DigitalOcean. We're experiencing the same symptoms
without this patch.
We have, collectively, many gigabytes of un-planned-for RSS being used
per-hypervisor that we would like to get rid of =).
Without having tried this patch yet (we will do that ASAP), we noticed
that the 192MB mentioned immediately melts away (yay) when we explicitly
disable the coroutine thread pool. That leaves another ~100MB of
additional stack usage, which would likely also go away if we applied
the entirety of your patch series.
Is there any chance you have revisited this or have a timeline for it?
- Michael
/*
* Michael R. Hines
* Senior Engineer, DigitalOcean.
*/
On 06/28/2016 04:01 AM, Peter Lieven wrote:
I recently found that QEMU is using several hundred megabytes more RSS
memory than older versions such as QEMU 2.2.0. So I started tracing
memory allocation and found two major reasons for this.
1) We changed the QEMU coroutine pool to have a per-thread pool and a
global release pool. The chosen pool size and the changed algorithm
could lead to up to 192 free coroutines with just a single iothread.
Each coroutine in the pool has 1MB of stack memory.
2) Between QEMU 2.2.0 and 2.3.0, RCU was introduced, which led to
delayed freeing of memory. This led to higher heap allocations which
could not effectively be returned to the kernel (most likely due to
fragmentation).
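The arithmetic behind the first point can be sketched in a few lines of C. This is only an illustration: the helper name and the macros are made up for this example, the 1MB and 192 figures are taken from the text above, and the 64kB figure is taken from the title of the stack-size patch in the series. The result is an upper bound on reserved stack memory; how much of it shows up as RSS depends on how much of each stack has actually been touched.

```c
#include <stddef.h>

/* Assumed values, taken from the cover letter: up to 192 pooled
 * coroutines, 1MB stacks today, 64kB stacks after the proposed change. */
#define MAX_POOLED      192u
#define STACK_SIZE_OLD  (1024u * 1024u)
#define STACK_SIZE_NEW  (64u * 1024u)

/* Hypothetical helper: bytes of stack held by idle pooled coroutines. */
static size_t pooled_stack_bytes(size_t coroutines, size_t stack_size)
{
    return coroutines * stack_size;
}
```

With these numbers, pooled_stack_bytes(MAX_POOLED, STACK_SIZE_OLD) gives 192MB, matching the figure discussed in this thread, while the same pool with STACK_SIZE_NEW would hold 12MB.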
The following series is what I came up with. Besides the coroutine
patches, I changed some allocations to forcibly use mmap. None of these
allocations is made repeatedly at runtime, so the impact of using mmap
should be negligible.
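For readers unfamiliar with the technique, allocating private anonymous memory with mmap might look roughly like the sketch below. The function names anon_alloc and anon_free are hypothetical and are not the helpers added by this series; only the POSIX mmap and munmap calls themselves are real. The point is that such a mapping is returned to the kernel as soon as it is unmapped, regardless of heap fragmentation.

```c
/* MAP_ANONYMOUS may need a feature-test macro in strict -std modes. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: get zero-filled, private anonymous memory
 * straight from the kernel, bypassing the malloc heap.
 * Returns NULL on failure. */
static void *anon_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: unmap the region. The caller must pass the same
 * size that was used for the allocation. Returns 0 on success. */
static int anon_free(void *p, size_t size)
{
    return munmap(p, size);
}
```

The trade-off is that every allocation and release is a system call and sizes are rounded up to whole pages, which is why this only makes sense for large allocations that are not made repeatedly.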
There are still some big malloced allocations left which cannot be
easily changed (e.g. the pixman buffers in VNC). So it might be an idea
to set a lower mmap threshold for malloc, since this threshold seems to
be on the order of several megabytes on modern systems.
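On glibc, one way to lower the threshold would be mallopt with M_MMAP_THRESHOLD, as in the sketch below. This is glibc-specific and not part of the series; the wrapper name and the 128kB value are assumptions chosen purely for illustration. According to the mallopt(3) man page, setting this parameter explicitly also disables glibc's dynamic adjustment of the threshold.

```c
#include <malloc.h>

/* Assumed value for illustration only: serve malloc requests of
 * 128kB and larger via mmap. */
#define EXAMPLE_MMAP_THRESHOLD (128 * 1024)

/* Hypothetical wrapper around glibc's mallopt().
 * Returns 1 on success and 0 on error, following mallopt's convention. */
static int lower_mmap_threshold(void)
{
    return mallopt(M_MMAP_THRESHOLD, EXAMPLE_MMAP_THRESHOLD);
}
```

Such a call would have to happen early at startup, before the large allocations in question are made.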
Peter Lieven (15):
  coroutine-ucontext: mmap stack memory
  coroutine-ucontext: add a switch to monitor maximum stack size
  coroutine-ucontext: reduce stack size to 64kB
  coroutine: add a knob to disable the shared release pool
  util: add a helper to mmap private anonymous memory
  exec: use mmap for subpages
  qapi: use mmap for QmpInputVisitor
  virtio: use mmap for VirtQueue
  loader: use mmap for ROMs
  vmware_svga: use mmap for scratch pad
  qom: use mmap for bigger Objects
  util: add a function to realloc mmapped memory
  exec: use mmap for PhysPageMap->nodes
  vnc-tight: make the encoding palette static
  vnc: use mmap for VncState
configure | 33 ++++++++++++++++++--
exec.c | 11 ++++---
hw/core/loader.c | 16 +++++-----
hw/display/vmware_vga.c | 3 +-
hw/virtio/virtio.c | 5 +--
include/qemu/mmap-alloc.h | 7 +++++
include/qom/object.h | 1 +
qapi/qmp-input-visitor.c | 5 +--
qom/object.c | 20 ++++++++++--
ui/vnc-enc-tight.c | 21 ++++++-------
ui/vnc.c | 5 +--
ui/vnc.h | 1 +
util/coroutine-ucontext.c | 66 +++++++++++++++++++++++++++++++++++++--
util/mmap-alloc.c | 27 ++++++++++++++++
util/qemu-coroutine.c | 79 ++++++++++++++++++++++++++---------------------
15 files changed, 225 insertions(+), 75 deletions(-)