On 26.08.24 18:25, David Hildenbrand wrote:
On 26.08.24 17:56, David Hildenbrand wrote:
On 26.08.24 17:38, David Hildenbrand wrote:
On 20.08.24 10:50, Peter Maydell wrote:
On Mon, 19 Aug 2024 at 20:07, David Hildenbrand <da...@redhat.com> wrote:

On 19.08.24 18:24, Peter Maydell wrote:
Hi; I'm looking at a memory leak apparently in the host memory backend
code that you can see from the qmp-cmd-test. Repro instructions:

Hi Peter,


(1) build QEMU with '--cc=clang' '--cxx=clang++' '--enable-debug'
'--target-list=x86_64-softmmu' '--enable-sanitizers'
(2) run 'make check'. More specifically, to get just this
failure ('make check' on current head-of-tree produces some
other unrelated leak errors) you can run the relevant single test:

(cd build/asan && ASAN_OPTIONS="fast_unwind_on_malloc=0"
QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/qmp-cmd-test
--tap -k -p /x86_64/qmp/object-add-failure-modes)

The test case is doing a variety of object-add then object-del
of the "memory-backend-ram" object, and this add-del cycle seems
to result in a fairly large leak:

Direct leak of 1572864 byte(s) in 6 object(s) allocated from:
         #0 0x555c1336efd8 in __interceptor_calloc
(/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-x86_64+0x218efd8)
(BuildId: fc7566a39db1253aed91d500b5b1784e0c438397)
         #1 0x7f5bf3472c50 in g_malloc0 
debian/build/deb/../../../glib/gmem.c:161:13
         #2 0x555c155bb134 in bitmap_new
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/qemu/bitmap.h:102:12
         #3 0x555c155ba4ee in dirty_memory_extend system/physmem.c:1831:37
         #4 0x555c15585a2b in ram_block_add system/physmem.c:1907:9
         #5 0x555c15589e50 in qemu_ram_alloc_internal system/physmem.c:2109:5
         #6 0x555c1558a096 in qemu_ram_alloc system/physmem.c:2129:12
         #7 0x555c15518b69 in memory_region_init_ram_flags_nomigrate
system/memory.c:1571:21
         #8 0x555c1464fd27 in ram_backend_memory_alloc 
backends/hostmem-ram.c:34:12
         #9 0x555c146510ac in host_memory_backend_memory_complete
backends/hostmem.c:345:10
         #10 0x555c1580bc90 in user_creatable_complete 
qom/object_interfaces.c:28:9
         #11 0x555c1580c6f8 in user_creatable_add_type 
qom/object_interfaces.c:125:10
         #12 0x555c1580ccc4 in user_creatable_add_qapi 
qom/object_interfaces.c:157:11
         #13 0x555c15ff0e2c in qmp_object_add qom/qom-qmp-cmds.c:227:5
         #14 0x555c161ce508 in qmp_marshal_object_add
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qapi/qapi-commands-qom.c:337:5
         #15 0x555c162a7139 in do_qmp_dispatch_bh qapi/qmp-dispatch.c:128:5
         #16 0x555c16387921 in aio_bh_call util/async.c:171:5
         #17 0x555c163887fc in aio_bh_poll util/async.c:218:13
         #18 0x555c162e1288 in aio_dispatch util/aio-posix.c:423:5
         #19 0x555c1638f7be in aio_ctx_dispatch util/async.c:360:5
         #20 0x7f5bf3469d3a in g_main_dispatch
debian/build/deb/../../../glib/gmain.c:3419:28
         #21 0x7f5bf3469d3a in g_main_context_dispatch
debian/build/deb/../../../glib/gmain.c:4137:7
         #22 0x555c163935c9 in glib_pollfds_poll util/main-loop.c:287:9
         #23 0x555c16391f03 in os_host_main_loop_wait util/main-loop.c:310:5
         #24 0x555c16391acc in main_loop_wait util/main-loop.c:589:11
         #25 0x555c14614917 in qemu_main_loop system/runstate.c:801:9
         #26 0x555c16008b8c in qemu_default_main system/main.c:37:14
         #27 0x555c16008bd7 in main system/main.c:48:12
         #28 0x7f5bf12fbd8f in __libc_start_call_main
csu/../sysdeps/nptl/libc_start_call_main.h:58:16

My initial suspicion here is that the problem is that
TYPE_MEMORY_BACKEND has a UserCreatableClass::complete method which
calls HostMemoryBackend::alloc, but there is no corresponding
"now free this" in instance_finalize. So ram_backend_memory_alloc()
calls memory_region_init_ram_flags_nomigrate(), which allocates
RAM, dirty blocks, etc, but nothing ever destroys the MR and the
memory is leaked when the TYPE_MEMORY_BACKEND object is finalized.

But there isn't a "free" method in HostMemoryBackendClass,
only an "alloc", so this looks like an API with "leaks memory"
baked into it. How is the freeing of the memory on object
deletion intended to work?

I *think* during object_del(), we would be un-refing the contained
memory-region, which in turn will make the refcount go to 0 and end up
calling memory_region_finalize().

Oh, yes, I'd forgotten about the MemoryRegions being refcounted.
That explains why the MR itself doesn't show up as a leak, only
these dirty memory bitmaps.

In memory_region_finalize, we do various things, including calling
mr->destructor(mr).

For memory_region_init_ram_flags_nomigrate(), the destructor is set
to memory_region_destructor_ram(). This is the place where we call
qemu_ram_free(mr->ram_block);

There we clean up.

What we *don't* clean up is the allocation you are seeing:
dirty_memory_extend() will extend the ram_list.dirty_memory bitmap as
needed. It is not stored in the RAMBlock, it's a global list.
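For orientation: each of those chunks is one bitmap_new(DIRTY_MEMORY_BLOCK_SIZE)
allocation of 256 KiB, which matches the ASan report above (6 objects,
1572864 bytes, i.e. 6 x 256 KiB). Here is a minimal standalone sketch of the
layout being described; it is not QEMU code, and the 4 KiB page size is just
an assumption for the x86_64 guest:

/*
 * One global array per dirty-memory type holds pointers to fixed-size
 * bitmap chunks; each chunk covers DIRTY_MEMORY_BLOCK_SIZE guest pages,
 * one bit per page.
 */
#include <stdio.h>

#define TARGET_PAGE_BITS 12                    /* 4 KiB guest pages (assumed) */
#define BITS_PER_CHUNK   (256ULL * 1024 * 8)   /* mirrors DIRTY_MEMORY_BLOCK_SIZE */

int main(void)
{
    unsigned long long ram_bytes = 1ULL << 30; /* e.g. a 1 GiB RAMBlock */
    unsigned long long pages  = ram_bytes >> TARGET_PAGE_BITS;
    unsigned long long chunks = (pages + BITS_PER_CHUNK - 1) / BITS_PER_CHUNK;

    /* One bit per page, so each chunk's bitmap is 256 KiB of host memory. */
    printf("%llu pages -> %llu chunk(s) of %llu KiB per dirty-memory type\n",
           pages, chunks, BITS_PER_CHUNK / 8 / 1024);
    return 0;
}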

It's not really a leak I think: when we object_del + object_add *I
think* that bitmap will simply get reused.

I think there probably is a leak here somewhere, though --
lsan will only report if the memory is unreachable from
anywhere on program exit, AIUI. If we still had the global
list available to reuse on the next object-creation
shouldn't it still be reachable from somewhere?

Yes, that's what's confusing me here. It's a global array that holds these
bitmap chunks. I don't see how there would be a leak, but maybe I'm
missing something. Let me have another look.


It's possible the leak only happens in some of the
"check failure cases of object-add" code paths that the
test is exercising, of course.

Right, but I think in any case we would keep the global array
consistent. Let me try to re-understand that code.


Ah, yes there is a memory leak. The issue is that on unplug, when the
RAMBlock goes away, we don't free any entries that dirty_memory_extend()
allocated.

The next time a RAMBlock is added, we call last_ram_page() again, to see
if we have to extend. And we think we have to extend.

As last_ram_page() is computed from the RAMBlocks, and the relevant RAMBlock
has vanished, we would call dirty_memory_extend() again, doing a
bitmap_new() for slots where a bitmap_new() had already been done before;
those earlier bitmaps are never freed.
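To see the pattern in isolation, here is a standalone sketch; it is not QEMU
code, extend() merely stands in for dirty_memory_extend(): it trusts a
caller-computed "old" chunk count, copies that many pointers and allocates
fresh bitmaps for the rest, so passing a too-small count (which is what
happens after the RAMBlock vanished) drops still-live allocations:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES (256 * 1024)   /* size of one bitmap_new(DIRTY_MEMORY_BLOCK_SIZE) */

static unsigned long **chunks;     /* stands in for ram_list.dirty_memory[i]->blocks[] */

static void extend(size_t old_nr, size_t new_nr)
{
    unsigned long **new_chunks = malloc(new_nr * sizeof(*new_chunks));

    if (old_nr) {                  /* keep the chunks we know about ... */
        memcpy(new_chunks, chunks, old_nr * sizeof(*chunks));
    }
    for (size_t i = old_nr; i < new_nr; i++) {
        new_chunks[i] = calloc(CHUNK_BYTES, 1);    /* ... and allocate the rest */
    }
    free(chunks);                  /* QEMU frees the old array via RCU instead */
    chunks = new_chunks;
}

int main(void)
{
    extend(0, 2);  /* object_add: the new RAMBlock needs two chunks           */
                   /* object_del: the RAMBlock goes away, the chunks stay     */
    extend(0, 2);  /* object_add again: last_ram_page() shrank back, so the   */
                   /* caller passes old_nr = 0, and the two still-live chunks */
                   /* are replaced by fresh ones: 512 KiB leaked              */
    printf("leaked %d bytes\n", 2 * CHUNK_BYTES);
    return 0;
}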

That whole code is quite fragile, I'm afraid. It would probably be best
to just store the number of allocated blocks per
ram_list.dirty_memory[i] entry, and maybe try getting rid of this
"old_ram_size" thingy completely.

Of course, freeing bitmaps where possible might also be an option when
removing RAMBlocks ...

The following should fix it (only compile-tested so far):

From a4d1c4ebbcb93a8ba32d86ee334e30b602ac640f Mon Sep 17 00:00:00 2001
From: David Hildenbrand <da...@redhat.com>
Date: Mon, 26 Aug 2024 18:14:46 +0200
Subject: [PATCH v1] softmmu/physmem: fix memory leak in dirty_memory_extend()

As reported by Peter, we might be leaking memory when removing the
highest RAMBlock (in the weird ram_addr_t space), and adding a new one.

We will fail to realize that we already allocated bitmaps for further
dirty memory blocks, and will effectively discard those bitmaps by
overwriting them with newly allocated ones.

Fix it by getting rid of last_ram_page() and simply storing the number
of dirty memory blocks that have been allocated.

Signed-off-by: David Hildenbrand <da...@redhat.com>
---
 include/exec/ramlist.h |  1 +
 system/physmem.c       | 42 +++++++++++++-----------------------------
 2 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index 2ad2a81acc..f2a965f293 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -41,6 +41,7 @@ typedef struct RAMBlockNotifier RAMBlockNotifier;
 #define DIRTY_MEMORY_BLOCK_SIZE ((ram_addr_t)256 * 1024 * 8)
 typedef struct {
     struct rcu_head rcu;
+    unsigned int num_blocks;
     unsigned long *blocks[];
 } DirtyMemoryBlocks;
 
diff --git a/system/physmem.c b/system/physmem.c
index 94600a33ec..54384e6f34 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1534,18 +1534,6 @@ static ram_addr_t find_ram_offset(ram_addr_t size)
     return offset;
 }
 
-static unsigned long last_ram_page(void)
-{
-    RAMBlock *block;
-    ram_addr_t last = 0;
-
-    RCU_READ_LOCK_GUARD();
-    RAMBLOCK_FOREACH(block) {
-        last = MAX(last, block->offset + block->max_length);
-    }
-    return last >> TARGET_PAGE_BITS;
-}
-
 static void qemu_ram_setup_dump(void *addr, ram_addr_t size)
 {
     int ret;
@@ -1799,28 +1787,29 @@ void qemu_ram_msync(RAMBlock *block, ram_addr_t start, ram_addr_t length)
 }
 
 /* Called with ram_list.mutex held */
-static void dirty_memory_extend(ram_addr_t old_ram_size,
-                                ram_addr_t new_ram_size)
+static void dirty_memory_extend(ram_addr_t new_ram_size)
 {
-    ram_addr_t old_num_blocks = DIV_ROUND_UP(old_ram_size,
-                                             DIRTY_MEMORY_BLOCK_SIZE);
     ram_addr_t new_num_blocks = DIV_ROUND_UP(new_ram_size,
                                              DIRTY_MEMORY_BLOCK_SIZE);
     int i;
 
-    /* Only need to extend if block count increased */
-    if (new_num_blocks <= old_num_blocks) {
-        return;
-    }
-
     for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
         DirtyMemoryBlocks *old_blocks;
         DirtyMemoryBlocks *new_blocks;
+        ram_addr_t old_num_blocks;
         int j;
 
         old_blocks = qatomic_rcu_read(&ram_list.dirty_memory[i]);
+        old_num_blocks = old_blocks->num_blocks;
+
+        /* Only need to extend if block count increased */
+        if (new_num_blocks <= old_num_blocks) {
+            return;
+        }
+
         new_blocks = g_malloc(sizeof(*new_blocks) +
                               sizeof(new_blocks->blocks[0]) * new_num_blocks);
+        new_blocks->num_blocks = new_num_blocks;
 
         if (old_num_blocks) {
             memcpy(new_blocks->blocks, old_blocks->blocks,
@@ -1846,11 +1835,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
     RAMBlock *block;
     RAMBlock *last_block = NULL;
     bool free_on_error = false;
-    ram_addr_t old_ram_size, new_ram_size;
+    ram_addr_t new_ram_size;
     Error *err = NULL;
 
-    old_ram_size = last_ram_page();
-
     qemu_mutex_lock_ramlist();
     new_block->offset = find_ram_offset(new_block->max_length);
 
@@ -1901,11 +1888,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
         }
     }
 
-    new_ram_size = MAX(old_ram_size,
-              (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS);
-    if (new_ram_size > old_ram_size) {
-        dirty_memory_extend(old_ram_size, new_ram_size);
-    }
+    new_ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
+    dirty_memory_extend(new_ram_size);
     /* Keep the list sorted from biggest to smallest block.  Unlike QTAILQ,
      * QLIST (which has an RCU-friendly variant) does not have insertion at
      * tail, so save the last element in last_block.

[talking to myself] there is a BUG in the above: old_blocks can still be
NULL here, before the very first dirty bitmap chunk has been allocated:

diff --git a/system/physmem.c b/system/physmem.c
index 54384e6f34..e744d5304a 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1796,15 +1796,17 @@ static void dirty_memory_extend(ram_addr_t new_ram_size)
     for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
         DirtyMemoryBlocks *old_blocks;
         DirtyMemoryBlocks *new_blocks;
-        ram_addr_t old_num_blocks;
+        ram_addr_t old_num_blocks = 0;
         int j;
 
         old_blocks = qatomic_rcu_read(&ram_list.dirty_memory[i]);
-        old_num_blocks = old_blocks->num_blocks;
+        if (old_blocks) {
+            old_num_blocks = old_blocks->num_blocks;
 
-        /* Only need to extend if block count increased */
-        if (new_num_blocks <= old_num_blocks) {
-            return;
+            /* Only need to extend if block count increased */
+            if (new_num_blocks <= old_num_blocks) {
+                return;
+            }
         }
 
         new_blocks = g_malloc(sizeof(*new_blocks) +


Will send out a patch once I've done more testing.

--
Cheers,

David / dhildenb

