Re: [Qemu-devel] [PATCH] memory: Fix double unref of flatview
On 13/02/2015 04:29, Matthew Rosato wrote: FYI, then this probably also affects the places you hit in d8d9581460 memory: convert memory_region_destroy to object_unparent, as that's what I modeled this approach on -- but I haven't tested any of them. Luckily not, because only real regions (not aliases and containers) end up in a FlatView. So you can do object_unparent on aliases and containers. It's ugly and should be avoided, but not buggy. There's only three cases (VFIO, for which patches have been posted and reviewed already, plus other two in PCI) that have to be modified. Paolo
Re: [Qemu-devel] [PATCH] memory: Fix double unref of flatview
On 02/12/2015 12:34 PM, Paolo Bonzini wrote: On 12/02/2015 17:21, Matthew Rosato wrote: Since 374f2981d1 memory: protect current_map by RCU, address_space_update_topology unrefs the old_flatview twice, once by call_rcu and once by direct call. This patch removes the direct call in favor of the call_rcu. Fixes at least one assertion failure seen in s390, where a ref count for a memory region attempts to go negative during hot-unplug of guest memory. This is buggy: MemoryRegion *standby_ram = g_new(MemoryRegion, 1); memory_region_init_ram(standby_ram, NULL, id, this_subregion_size, error_abort); ... object_unparent(OBJECT(mr)); g_free(mr); Memory might not be released after object_unparent (and will not be released if the RCU callbacks haven't run yet). Your patch worked around it because it never freed the FlatView, and thus the RCU callbacks never did anything on the memory regions. Instead, you have to do this: MemoryRegion *standby_ram = g_new(MemoryRegion, 1); Object *obj = OBJECT(standby_ram); memory_region_init_ram(standby_ram, NULL, id, this_subregion_size, error_abort); OBJECT(mr)-free = g_free; ... object_unparent(OBJECT(mr)); and the freeing will happen once the reference count goes to zero. Paolo Thanks Paolo, I like this change but it doesn't seem to help; still hitting the same assertion with it applied. Could it be that the order in which flatview_unref (and therefore memory_region_unref) vs object_unparent(mr) matters (ie, object_unparent should always happen last)? Prior to RCUification, seems like the old_view was always unreferenced prior to return from memory_region_del_subregion (so, memory_region_unref always occurred before the object_unparent(mr)). Now, the two could happen in either order -- and looking at memory_region_unref, it is specifically looking at the existence of obj-parent. Still investigating this, but I performed a litmus test: if I directly call flatview_unref twice in address_space_update_topology (rather than 1 direct 1 via call_rcu), the assertion goes away. Matt Signed-off-by: Matthew Rosato mjros...@linux.vnet.ibm.com --- memory.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/memory.c b/memory.c index 130152c..d08abe5 100644 --- a/memory.c +++ b/memory.c @@ -755,7 +755,6 @@ static void address_space_update_topology(AddressSpace *as) /* Writes are protected by the BQL. */ atomic_rcu_set(as-current_map, new_view); -call_rcu(old_view, flatview_unref, rcu); /* Note that all the old MemoryRegions are still alive up to this * point. This relieves most MemoryListeners from the need to @@ -763,7 +762,7 @@ static void address_space_update_topology(AddressSpace *as) * outside the iothread mutex, in which case precise reference * counting is necessary. */ -flatview_unref(old_view); +call_rcu(old_view, flatview_unref, rcu); address_space_update_ioeventfds(as); }
Re: [Qemu-devel] [PATCH] memory: Fix double unref of flatview
On 12/02/2015 20:32, Matthew Rosato wrote: Could it be that the order in which flatview_unref (and therefore memory_region_unref) vs object_unparent(mr) matters (ie, object_unparent should always happen last)? Prior to RCUification, seems like the old_view was always unreferenced prior to return from memory_region_del_subregion (so, memory_region_unref always occurred before the object_unparent(mr)). Now, the two could happen in either order -- and looking at memory_region_unref, it is specifically looking at the existence of obj-parent. I think you're right. To test your theory, you could try using call_rcu to unparent the memory region. The complete fix would be to wrap the MemoryRegion within a container device, and do object_unparent on the container device. Then the following happens: 1) the container object's unparent method calls memory_region_del_subregion 2) when the old flatview is destroyed, it calls memory_region_unref but the parent is still in place 3) the container object is destroyed 4) the container object destroys the container device. Note that these rules apply only to actual memory regions (I/O or RAM), not aliases or containers. I hope to post a doc patch next week, since I'm completing the patches that fix the only other problematic case (PCIe MMCONFIG). I was not expecting this to trigger so early, which is why I had left this for part 3. Paolo
Re: [Qemu-devel] [PATCH] memory: Fix double unref of flatview
On 02/12/2015 03:43 PM, Paolo Bonzini wrote: On 12/02/2015 20:32, Matthew Rosato wrote: Could it be that the order in which flatview_unref (and therefore memory_region_unref) vs object_unparent(mr) matters (ie, object_unparent should always happen last)? Prior to RCUification, seems like the old_view was always unreferenced prior to return from memory_region_del_subregion (so, memory_region_unref always occurred before the object_unparent(mr)). Now, the two could happen in either order -- and looking at memory_region_unref, it is specifically looking at the existence of obj-parent. I think you're right. FYI, then this probably also affects the places you hit in d8d9581460 memory: convert memory_region_destroy to object_unparent, as that's what I modeled this approach on -- but I haven't tested any of them. To test your theory, you could try using call_rcu to unparent the memory region. The complete fix would be to wrap the MemoryRegion within a Yep, this test worked. container device, and do object_unparent on the container device. Then the following happens: 1) the container object's unparent method calls memory_region_del_subregion 2) when the old flatview is destroyed, it calls memory_region_unref but the parent is still in place 3) the container object is destroyed 4) the container object destroys the container device. OK great! I noticed this problem while re-structuring this area of code anyway, so I'll roll this in. Note that these rules apply only to actual memory regions (I/O or RAM), not aliases or containers. I hope to post a doc patch next week, since I'm completing the patches that fix the only other problematic case (PCIe MMCONFIG). I was not expecting this to trigger so early, which is why I had left this for part 3. Looking forward to it -- Thanks for the help Paolo. Matt Paolo
Re: [Qemu-devel] [PATCH] memory: Fix double unref of flatview
On 12/02/2015 17:21, Matthew Rosato wrote: Since 374f2981d1 memory: protect current_map by RCU, address_space_update_topology unrefs the old_flatview twice, once by call_rcu and once by direct call. This patch removes the direct call in favor of the call_rcu. Fixes at least one assertion failure seen in s390, where a ref count for a memory region attempts to go negative during hot-unplug of guest memory. Signed-off-by: Matthew Rosato mjros...@linux.vnet.ibm.com The two unrefs are correct. One is needed to balance address_space_get_flatview; the other is needed because as-current_map does not point to old_view anymore. You can remove them with something like this (your patch plus one extra hunk): diff --git a/memory.c b/memory.c index a844ced..5add529 100644 --- a/memory.c +++ b/memory.c @@ -747,7 +747,7 @@ static void address_space_update_topology_pass(AddressSpace *as, static void address_space_update_topology(AddressSpace *as) { -FlatView *old_view = address_space_get_flatview(as); +FlatView *old_view = as-current_map; FlatView *new_view = generate_memory_topology(as-root); address_space_update_topology_pass(as, old_view, new_view, false); @@ -755,7 +755,6 @@ static void address_space_update_topology(AddressSpace *as) /* Writes are protected by the BQL. */ atomic_rcu_set(as-current_map, new_view); -call_rcu(old_view, flatview_unref, rcu); /* Note that all the old MemoryRegions are still alive up to this * point. This relieves most MemoryListeners from the need to @@ -763,7 +762,7 @@ static void address_space_update_topology(AddressSpace *as) * outside the iothread mutex, in which case precise reference * counting is necessary. */ -flatview_unref(old_view); +call_rcu(old_view, flatview_unref, rcu); address_space_update_ioeventfds(as); } but it wouldn't affect your bug. Paolo
Re: [Qemu-devel] [PATCH] memory: Fix double unref of flatview
On 12/02/2015 17:21, Matthew Rosato wrote: Since 374f2981d1 memory: protect current_map by RCU, address_space_update_topology unrefs the old_flatview twice, once by call_rcu and once by direct call. This patch removes the direct call in favor of the call_rcu. Fixes at least one assertion failure seen in s390, where a ref count for a memory region attempts to go negative during hot-unplug of guest memory. This is buggy: MemoryRegion *standby_ram = g_new(MemoryRegion, 1); memory_region_init_ram(standby_ram, NULL, id, this_subregion_size, error_abort); ... object_unparent(OBJECT(mr)); g_free(mr); Memory might not be released after object_unparent (and will not be released if the RCU callbacks haven't run yet). Your patch worked around it because it never freed the FlatView, and thus the RCU callbacks never did anything on the memory regions. Instead, you have to do this: MemoryRegion *standby_ram = g_new(MemoryRegion, 1); Object *obj = OBJECT(standby_ram); memory_region_init_ram(standby_ram, NULL, id, this_subregion_size, error_abort); OBJECT(mr)-free = g_free; ... object_unparent(OBJECT(mr)); and the freeing will happen once the reference count goes to zero. Paolo Signed-off-by: Matthew Rosato mjros...@linux.vnet.ibm.com --- memory.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/memory.c b/memory.c index 130152c..d08abe5 100644 --- a/memory.c +++ b/memory.c @@ -755,7 +755,6 @@ static void address_space_update_topology(AddressSpace *as) /* Writes are protected by the BQL. */ atomic_rcu_set(as-current_map, new_view); -call_rcu(old_view, flatview_unref, rcu); /* Note that all the old MemoryRegions are still alive up to this * point. This relieves most MemoryListeners from the need to @@ -763,7 +762,7 @@ static void address_space_update_topology(AddressSpace *as) * outside the iothread mutex, in which case precise reference * counting is necessary. */ -flatview_unref(old_view); +call_rcu(old_view, flatview_unref, rcu); address_space_update_ioeventfds(as); }