Re: [Qemu-devel] [PATCH] memory: Fix double unref of flatview

2015-02-13 Thread Paolo Bonzini


On 13/02/2015 04:29, Matthew Rosato wrote:
 FYI, then this probably also affects the places you hit in d8d9581460
 memory: convert memory_region_destroy to object_unparent, as that's
 what I modeled this approach on -- but I haven't tested any of them.

Luckily not, because only real regions (not aliases and containers)
end up in a FlatView.  So you can do object_unparent on aliases and
containers.  It's ugly and should be avoided, but not buggy.

There's only three cases (VFIO, for which patches have been posted and
reviewed already, plus other two in PCI) that have to be modified.

Paolo



Re: [Qemu-devel] [PATCH] memory: Fix double unref of flatview

2015-02-12 Thread Matthew Rosato
On 02/12/2015 12:34 PM, Paolo Bonzini wrote:
 
 
 On 12/02/2015 17:21, Matthew Rosato wrote:
 Since 374f2981d1 memory: protect current_map by RCU,
 address_space_update_topology unrefs the old_flatview twice,
 once by call_rcu and once by direct call.  This patch removes
 the direct call in favor of the call_rcu.  Fixes at least one
 assertion failure seen in s390, where a ref count for a memory
 region attempts to go negative during hot-unplug of guest memory.
 
 This is buggy:
 
 MemoryRegion *standby_ram = g_new(MemoryRegion, 1);
 memory_region_init_ram(standby_ram, NULL, id, 
 this_subregion_size, error_abort);
 
 ...
 object_unparent(OBJECT(mr));
 g_free(mr);
 
 Memory might not be released after object_unparent (and will not be
 released if the RCU callbacks haven't run yet).
 
 Your patch worked around it because it never freed the FlatView, and
 thus the RCU callbacks never did anything on the memory regions.
 
 Instead, you have to do this:
 
 MemoryRegion *standby_ram = g_new(MemoryRegion, 1);
 Object *obj = OBJECT(standby_ram);
 
 memory_region_init_ram(standby_ram, NULL, id, 
 this_subregion_size, error_abort);
 OBJECT(mr)-free = g_free;
 ...
 object_unparent(OBJECT(mr));
 
 and the freeing will happen once the reference count goes to zero.
 
 Paolo
 

Thanks Paolo, I like this change but it doesn't seem to help; still
hitting the same assertion with it applied.

Could it be that the order in which flatview_unref (and therefore
memory_region_unref) vs object_unparent(mr) matters (ie, object_unparent
should always happen last)?  Prior to RCUification, seems like the
old_view was always unreferenced prior to return from
memory_region_del_subregion (so, memory_region_unref always occurred
before the object_unparent(mr)).  Now, the two could happen in either
order -- and looking at memory_region_unref, it is specifically looking
at the existence of obj-parent.

Still investigating this, but I performed a litmus test:  if I directly
call flatview_unref twice in address_space_update_topology (rather than
1 direct  1 via call_rcu), the assertion goes away.

Matt



 Signed-off-by: Matthew Rosato mjros...@linux.vnet.ibm.com
 ---
  memory.c |3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

 diff --git a/memory.c b/memory.c
 index 130152c..d08abe5 100644
 --- a/memory.c
 +++ b/memory.c
 @@ -755,7 +755,6 @@ static void address_space_update_topology(AddressSpace 
 *as)
  
  /* Writes are protected by the BQL.  */
  atomic_rcu_set(as-current_map, new_view);
 -call_rcu(old_view, flatview_unref, rcu);
  
  /* Note that all the old MemoryRegions are still alive up to this
   * point.  This relieves most MemoryListeners from the need to
 @@ -763,7 +762,7 @@ static void address_space_update_topology(AddressSpace 
 *as)
   * outside the iothread mutex, in which case precise reference
   * counting is necessary.
   */
 -flatview_unref(old_view);
 +call_rcu(old_view, flatview_unref, rcu);
  
  address_space_update_ioeventfds(as);
  }

 




Re: [Qemu-devel] [PATCH] memory: Fix double unref of flatview

2015-02-12 Thread Paolo Bonzini


On 12/02/2015 20:32, Matthew Rosato wrote:
 Could it be that the order in which flatview_unref (and therefore
 memory_region_unref) vs object_unparent(mr) matters (ie, object_unparent
 should always happen last)?  Prior to RCUification, seems like the
 old_view was always unreferenced prior to return from
 memory_region_del_subregion (so, memory_region_unref always occurred
 before the object_unparent(mr)).  Now, the two could happen in either
 order -- and looking at memory_region_unref, it is specifically looking
 at the existence of obj-parent.

I think you're right.

To test your theory, you could try using call_rcu to unparent the memory
region.  The complete fix would be to wrap the MemoryRegion within a
container device, and do object_unparent on the container device.  Then
the following happens:

1) the container object's unparent method calls memory_region_del_subregion

2) when the old flatview is destroyed, it calls memory_region_unref but
the parent is still in place

3) the container object is destroyed

4) the container object destroys the container device.

Note that these rules apply only to actual memory regions (I/O or RAM),
not aliases or containers.  I hope to post a doc patch next week, since
I'm completing the patches that fix the only other problematic case
(PCIe MMCONFIG).  I was not expecting this to trigger so early, which is
why I had left this for part 3.

Paolo



Re: [Qemu-devel] [PATCH] memory: Fix double unref of flatview

2015-02-12 Thread Matthew Rosato
On 02/12/2015 03:43 PM, Paolo Bonzini wrote:
 
 
 On 12/02/2015 20:32, Matthew Rosato wrote:
 Could it be that the order in which flatview_unref (and therefore
 memory_region_unref) vs object_unparent(mr) matters (ie, object_unparent
 should always happen last)?  Prior to RCUification, seems like the
 old_view was always unreferenced prior to return from
 memory_region_del_subregion (so, memory_region_unref always occurred
 before the object_unparent(mr)).  Now, the two could happen in either
 order -- and looking at memory_region_unref, it is specifically looking
 at the existence of obj-parent.
 
 I think you're right.

FYI, then this probably also affects the places you hit in d8d9581460
memory: convert memory_region_destroy to object_unparent, as that's
what I modeled this approach on -- but I haven't tested any of them.

 
 To test your theory, you could try using call_rcu to unparent the memory
 region.  The complete fix would be to wrap the MemoryRegion within a

Yep, this test worked.

 container device, and do object_unparent on the container device.  Then
 the following happens:
 
 1) the container object's unparent method calls memory_region_del_subregion
 
 2) when the old flatview is destroyed, it calls memory_region_unref but
 the parent is still in place
 
 3) the container object is destroyed
 
 4) the container object destroys the container device.

OK great!  I noticed this problem while re-structuring this area of code
anyway, so I'll roll this in.

 
 Note that these rules apply only to actual memory regions (I/O or RAM),
 not aliases or containers.  I hope to post a doc patch next week, since
 I'm completing the patches that fix the only other problematic case
 (PCIe MMCONFIG).  I was not expecting this to trigger so early, which is
 why I had left this for part 3.

Looking forward to it -- Thanks for the help Paolo.

Matt

 
 Paolo
 




Re: [Qemu-devel] [PATCH] memory: Fix double unref of flatview

2015-02-12 Thread Paolo Bonzini


On 12/02/2015 17:21, Matthew Rosato wrote:
 Since 374f2981d1 memory: protect current_map by RCU,
 address_space_update_topology unrefs the old_flatview twice,
 once by call_rcu and once by direct call.  This patch removes
 the direct call in favor of the call_rcu.  Fixes at least one
 assertion failure seen in s390, where a ref count for a memory
 region attempts to go negative during hot-unplug of guest memory.
 
 Signed-off-by: Matthew Rosato mjros...@linux.vnet.ibm.com

The two unrefs are correct.

One is needed to balance address_space_get_flatview; the other is 
needed because as-current_map does not point to old_view anymore.

You can remove them with something like this (your patch plus one
extra hunk):

diff --git a/memory.c b/memory.c
index a844ced..5add529 100644
--- a/memory.c
+++ b/memory.c
@@ -747,7 +747,7 @@ static void address_space_update_topology_pass(AddressSpace 
*as,
 
 static void address_space_update_topology(AddressSpace *as)
 {
-FlatView *old_view = address_space_get_flatview(as);
+FlatView *old_view = as-current_map;
 FlatView *new_view = generate_memory_topology(as-root);
 
 address_space_update_topology_pass(as, old_view, new_view, false);
@@ -755,7 +755,6 @@ static void address_space_update_topology(AddressSpace *as)
 
 /* Writes are protected by the BQL.  */
 atomic_rcu_set(as-current_map, new_view);
-call_rcu(old_view, flatview_unref, rcu);
 
 /* Note that all the old MemoryRegions are still alive up to this
  * point.  This relieves most MemoryListeners from the need to
@@ -763,7 +762,7 @@ static void address_space_update_topology(AddressSpace *as)
  * outside the iothread mutex, in which case precise reference
  * counting is necessary.
  */
-flatview_unref(old_view);
+call_rcu(old_view, flatview_unref, rcu);
 
 address_space_update_ioeventfds(as);
 }

but it wouldn't affect your bug.

Paolo



Re: [Qemu-devel] [PATCH] memory: Fix double unref of flatview

2015-02-12 Thread Paolo Bonzini


On 12/02/2015 17:21, Matthew Rosato wrote:
 Since 374f2981d1 memory: protect current_map by RCU,
 address_space_update_topology unrefs the old_flatview twice,
 once by call_rcu and once by direct call.  This patch removes
 the direct call in favor of the call_rcu.  Fixes at least one
 assertion failure seen in s390, where a ref count for a memory
 region attempts to go negative during hot-unplug of guest memory.

This is buggy:

MemoryRegion *standby_ram = g_new(MemoryRegion, 1);
memory_region_init_ram(standby_ram, NULL, id, this_subregion_size, 
error_abort);

...
object_unparent(OBJECT(mr));
g_free(mr);

Memory might not be released after object_unparent (and will not be
released if the RCU callbacks haven't run yet).

Your patch worked around it because it never freed the FlatView, and
thus the RCU callbacks never did anything on the memory regions.

Instead, you have to do this:

MemoryRegion *standby_ram = g_new(MemoryRegion, 1);
Object *obj = OBJECT(standby_ram);

memory_region_init_ram(standby_ram, NULL, id, this_subregion_size, 
error_abort);
OBJECT(mr)-free = g_free;
...
object_unparent(OBJECT(mr));

and the freeing will happen once the reference count goes to zero.

Paolo

 
 Signed-off-by: Matthew Rosato mjros...@linux.vnet.ibm.com
 ---
  memory.c |3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)
 
 diff --git a/memory.c b/memory.c
 index 130152c..d08abe5 100644
 --- a/memory.c
 +++ b/memory.c
 @@ -755,7 +755,6 @@ static void address_space_update_topology(AddressSpace 
 *as)
  
  /* Writes are protected by the BQL.  */
  atomic_rcu_set(as-current_map, new_view);
 -call_rcu(old_view, flatview_unref, rcu);
  
  /* Note that all the old MemoryRegions are still alive up to this
   * point.  This relieves most MemoryListeners from the need to
 @@ -763,7 +762,7 @@ static void address_space_update_topology(AddressSpace 
 *as)
   * outside the iothread mutex, in which case precise reference
   * counting is necessary.
   */
 -flatview_unref(old_view);
 +call_rcu(old_view, flatview_unref, rcu);
  
  address_space_update_ioeventfds(as);
  }