* Igor Redko (red...@virtuozzo.com) wrote:
> On Fri., 2015-09-25 at 17:46 +0800, Wen Congyang wrote:
> > On 09/25/2015 05:09 PM, Denis V. Lunev wrote:
> > > Release the qemu global mutex before calling synchronize_rcu().
> > > synchronize_rcu() waits for all readers to finish their critical
> > > sections. There is at least one critical section in which we try
> > > to get the QGM (the critical section is in address_space_rw(), and
> > > prepare_mmio_access() is trying to acquire the QGM).
> > >
> > > Both functions (migration_end() and migration_bitmap_extend())
> > > are called from the main thread, which is holding the QGM.
> > >
> > > Thus there is a race condition that ends up with deadlock:
> > >
> > >   main thread                  working thread
> > >   Lock QGM                     |
> > >   |                            Call KVM_EXIT_IO handler
> > >   |                            |
> > >   |                            Open rcu reader's critical section
> > >   Migration cleanup bh         |
> > >   |                            |
> > >   synchronize_rcu() is         |
> > >   waiting for readers          |
> > >   |                            prepare_mmio_access() is waiting for QGM
> > >             \                 /
> > >                  deadlock
> > >
> > > The patches here are quick and dirty, compile-tested only, to validate
> > > the architectural approach.
> > >
> > > Igor, Anna, can you please start your tests with these patches instead
> > > of your original one. Thank you.
> >
> > Can you give me the backtrace of the working thread?
> >
> > I think it is very bad to wait for a lock inside an rcu reader's critical section.
>
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x00007f1ef113ccfd in __GI___pthread_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
> #2  0x00007f1ef3c36546 in qemu_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at util/qemu-thread-posix.c:73
> #3  0x00007f1ef387ff46 in qemu_mutex_lock_iothread () at /home/user/my_qemu/qemu/cpus.c:1170
> #4  0x00007f1ef38514a2 in prepare_mmio_access (mr=0x7f1ef612f200) at /home/user/my_qemu/qemu/exec.c:2390
> #5  0x00007f1ef385157e in address_space_rw (as=0x7f1ef40ec940 <address_space_io>, addr=49402, attrs=..., buf=0x7f1ef3f97000 "\001", len=1, is_write=true) at /home/user/my_qemu/qemu/exec.c:2425
> #6  0x00007f1ef3897c53 in kvm_handle_io (port=49402, attrs=..., data=0x7f1ef3f97000, direction=1, size=1, count=1) at /home/user/my_qemu/qemu/kvm-all.c:1680
> #7  0x00007f1ef3898144 in kvm_cpu_exec (cpu=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/kvm-all.c:1849
> #8  0x00007f1ef387fa91 in qemu_kvm_cpu_thread_fn (arg=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/cpus.c:979
> #9  0x00007f1ef113a6aa in start_thread (arg=0x7f1eef0b9700) at pthread_create.c:333
> #10 0x00007f1ef0e6feed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Do you have a test to run in the guest that easily triggers this?

Dave

> > >
> > > Signed-off-by: Denis V. Lunev <d...@openvz.org>
> > > CC: Igor Redko <red...@virtuozzo.com>
> > > CC: Anna Melekhova <an...@virtuozzo.com>
> > > CC: Juan Quintela <quint...@redhat.com>
> > > CC: Amit Shah <amit.s...@redhat.com>
> > >
> > > Denis V. Lunev (2):
> > >   migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
> > >   migration: fix deadlock
> > >
> > >  migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
> > >  1 file changed, 28 insertions(+), 17 deletions(-)

-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK