On 09/25/2015 05:09 PM, Denis V. Lunev wrote:
> Release the QEMU global mutex (QGM) before calling synchronize_rcu().
> synchronize_rcu() waits for all readers to finish their critical
> sections. There is at least one critical section in which we try
> to take the QGM (the critical section is in address_space_rw(), and
> prepare_mmio_access() tries to acquire the QGM).
> 
> Both functions (migration_end() and migration_bitmap_extend())
> are called from the main thread, which holds the QGM.
> 
> Thus there is a race condition that ends in a deadlock:
> main thread     working thread
> Lock QGM                |
> |             Call KVM_EXIT_IO handler
> |                       |
> |        Open rcu reader's critical section
> Migration cleanup bh    |
> |                       |
> synchronize_rcu() is    |
> waiting for readers     |
> |            prepare_mmio_access() is waiting for QGM
>   \                   /
>          deadlock
> 
> Patches here are quick and dirty, compile-tested only to validate the
> architectural approach.
> 
> Igor, Anna, can you please start your tests with these patches instead of
> your original one? Thank you.

Can you give me the backtrace of the working thread?

I think it is very bad to wait for a lock inside an RCU reader's critical section.
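
To make the pattern concrete, here is a minimal sketch with plain pthreads.
It is not QEMU code and not the patch itself: big_lock stands in for the QGM,
a reader counter with a condition variable stands in for RCU reader tracking
(a simplification, since real synchronize_rcu() only waits for readers that
started before it), and reader_enter(), reader_exit(), wait_for_readers() and
run_demo() are hypothetical names. The worker takes the big lock inside its
read-side section, as in the diagram above, and the main thread drops the big
lock before waiting for readers, which is the ordering the cover letter
proposes.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-ins only, none of this is QEMU code:
 * big_lock plays the role of the QEMU global mutex (QGM);
 * the reader counter plus condvar plays the role of RCU reader tracking. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rd_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t rd_cond = PTHREAD_COND_INITIALIZER;
static int readers;   /* protected by rd_lock */
static int mmio_done; /* protected by big_lock */

/* Hypothetical analogue of entering a read-side critical section. */
static void reader_enter(void)
{
    pthread_mutex_lock(&rd_lock);
    readers++;
    pthread_cond_broadcast(&rd_cond);
    pthread_mutex_unlock(&rd_lock);
}

/* Hypothetical analogue of leaving a read-side critical section. */
static void reader_exit(void)
{
    pthread_mutex_lock(&rd_lock);
    readers--;
    pthread_cond_broadcast(&rd_cond);
    pthread_mutex_unlock(&rd_lock);
}

/* Only used to make the demo deterministic: wait until a reader is active. */
static void wait_reader_present(void)
{
    pthread_mutex_lock(&rd_lock);
    while (readers == 0)
        pthread_cond_wait(&rd_cond, &rd_lock);
    pthread_mutex_unlock(&rd_lock);
}

/* Simplified stand-in for synchronize_rcu(): wait until no reader is active. */
static void wait_for_readers(void)
{
    pthread_mutex_lock(&rd_lock);
    while (readers > 0)
        pthread_cond_wait(&rd_cond, &rd_lock);
    pthread_mutex_unlock(&rd_lock);
}

/* Worker: takes the big lock while inside its read-side section. */
static void *worker(void *arg)
{
    (void)arg;
    reader_enter();
    pthread_mutex_lock(&big_lock); /* blocks while main holds big_lock */
    mmio_done = 1;
    pthread_mutex_unlock(&big_lock);
    reader_exit();
    return NULL;
}

/* Returns 0 if the worker completed its access, nonzero otherwise. */
int run_demo(void)
{
    pthread_t t;
    int ok, rc;

    pthread_mutex_lock(&big_lock);
    mmio_done = 0;
    if (pthread_create(&t, NULL, worker, NULL) != 0) {
        pthread_mutex_unlock(&big_lock);
        return -1;
    }
    wait_reader_present(); /* worker is now inside its read-side section */

    /* Calling wait_for_readers() here with big_lock still held would hang:
     * the worker cannot leave its section until it gets big_lock. */
    pthread_mutex_unlock(&big_lock);
    wait_for_readers();
    pthread_mutex_lock(&big_lock);

    ok = mmio_done;
    pthread_mutex_unlock(&big_lock);

    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return ok ? 0 : 1;
}
```

Calling run_demo() from a main() and building with -pthread should return 0.
Removing the unlock/lock pair around wait_for_readers() reproduces the hang in
this toy model. Whether taking a lock inside the read-side section is
acceptable at all is a separate question that the sketch does not answer.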

To Paolo:
Do we allow this in an RCU critical section?

Thanks
Wen Congyang

> 
> Signed-off-by: Denis V. Lunev <d...@openvz.org>
> CC: Igor Redko <red...@virtuozzo.com>
> CC: Anna Melekhova <an...@virtuozzo.com>
> CC: Juan Quintela <quint...@redhat.com>
> CC: Amit Shah <amit.s...@redhat.com>
> 
> Denis V. Lunev (2):
>   migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
>   migration: fix deadlock
> 
>  migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
>  1 file changed, 28 insertions(+), 17 deletions(-)
> 

