On Mon, 2015-09-28 at 13:55 +0300, Igor Redko wrote:
> On Fri, 2015-09-25 at 17:46 +0800, Wen Congyang wrote:
> > On 09/25/2015 05:09 PM, Denis V. Lunev wrote:
> > > Release the QEMU global mutex (QGM) before calling synchronize_rcu().
> > > synchronize_rcu() waits for all readers to finish their critical
> > > sections. There is at least one critical section in which we try
> > > to get the QGM (the critical section is in address_space_rw(), and
> > > prepare_mmio_access() is trying to acquire the QGM).
> > >
> > > Both functions (migration_end() and migration_bitmap_extend())
> > > are called from the main thread, which is holding the QGM.
> > >
> > > Thus there is a race condition that ends up in a deadlock:
> > > main thread               working thread
> > > Lock QGM                        |
> > > |                   Call KVM_EXIT_IO handler
> > > |                               |
> > > |                   Open rcu reader's critical section
> > > Migration cleanup bh            |
> > > |                               |
> > > synchronize_rcu() is            |
> > > waiting for readers             |
> > > |                   prepare_mmio_access() is waiting for QGM
> > >   \                           /
> > >              deadlock
> > >
> > > The patches here are quick and dirty, compile-tested only, to validate
> > > the architectural approach.
> > >
> > > Igor, Anna, can you please start your tests with these patches instead
> > > of your original one. Thank you.
> >
> > Can you give me the backtrace of the working thread?
> >
> > I think it is very bad to wait for a lock inside an RCU reader's critical section.
> >
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x00007f1ef113ccfd in __GI___pthread_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
> #2  0x00007f1ef3c36546 in qemu_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at util/qemu-thread-posix.c:73
> #3  0x00007f1ef387ff46 in qemu_mutex_lock_iothread () at /home/user/my_qemu/qemu/cpus.c:1170
> #4  0x00007f1ef38514a2 in prepare_mmio_access (mr=0x7f1ef612f200) at /home/user/my_qemu/qemu/exec.c:2390
> #5  0x00007f1ef385157e in address_space_rw (as=0x7f1ef40ec940 <address_space_io>, addr=49402, attrs=..., buf=0x7f1ef3f97000 "\001", len=1, is_write=true) at /home/user/my_qemu/qemu/exec.c:2425
> #6  0x00007f1ef3897c53 in kvm_handle_io (port=49402, attrs=..., data=0x7f1ef3f97000, direction=1, size=1, count=1) at /home/user/my_qemu/qemu/kvm-all.c:1680
> #7  0x00007f1ef3898144 in kvm_cpu_exec (cpu=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/kvm-all.c:1849
> #8  0x00007f1ef387fa91 in qemu_kvm_cpu_thread_fn (arg=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/cpus.c:979
> #9  0x00007f1ef113a6aa in start_thread (arg=0x7f1eef0b9700) at pthread_create.c:333
> #10 0x00007f1ef0e6feed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Backtrace of the main thread:

#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f6ade8e9a17 in futex_wait (ev=0x7f6adf26e120 <rcu_gp_event>, val=4294967295) at util/qemu-thread-posix.c:301
#2  0x00007f6ade8e9b2d in qemu_event_wait (ev=0x7f6adf26e120 <rcu_gp_event>) at util/qemu-thread-posix.c:399
#3  0x00007f6ade8fd7ec in wait_for_readers () at util/rcu.c:120
#4  0x00007f6ade8fd875 in synchronize_rcu () at util/rcu.c:149
#5  0x00007f6ade5640fd in migration_end () at /home/user/my_qemu/qemu/migration/ram.c:1036
#6  0x00007f6ade564194 in ram_migration_cancel (opaque=0x0) at /home/user/my_qemu/qemu/migration/ram.c:1054
#7  0x00007f6ade567d5a in qemu_savevm_state_cancel () at /home/user/my_qemu/qemu/migration/savevm.c:915
#8  0x00007f6ade7b4bdf in migrate_fd_cleanup (opaque=0x7f6aded8fd40 <current_migration>) at migration/migration.c:582
#9  0x00007f6ade804d15 in aio_bh_poll (ctx=0x7f6adf895e50) at async.c:87
#10 0x00007f6ade814dcb in aio_dispatch (ctx=0x7f6adf895e50) at aio-posix.c:135
#11 0x00007f6ade8050b5 in aio_ctx_dispatch (source=0x7f6adf895e50, callback=0x0, user_data=0x0) at async.c:226
#12 0x00007f6adc9a3c3d in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#13 0x00007f6ade813274 in glib_pollfds_poll () at main-loop.c:208
#14 0x00007f6ade813351 in os_host_main_loop_wait (timeout=422420000) at main-loop.c:253
#15 0x00007f6ade813410 in main_loop_wait (nonblocking=0) at main-loop.c:502
#16 0x00007f6ade64ae6a in main_loop () at vl.c:1902
#17 0x00007f6ade652c32 in main (argc=70, argv=0x7ffcc3b674e8, envp=0x7ffcc3b67720) at vl.c:4653

>
> >
> > >
> > > Signed-off-by: Denis V. Lunev <d...@openvz.org>
> > > CC: Igor Redko <red...@virtuozzo.com>
> > > CC: Anna Melekhova <an...@virtuozzo.com>
> > > CC: Juan Quintela <quint...@redhat.com>
> > > CC: Amit Shah <amit.s...@redhat.com>
> > >
> > > Denis V. Lunev (2):
> > >   migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
> > >   migration: fix deadlock
> > >
> > >  migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
> > >  1 file changed, 28 insertions(+), 17 deletions(-)
> > >
> > >
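
For reference, the two backtraces together show the lock ordering from the cover letter: the main thread sits in synchronize_rcu() while holding the QGM, and the vCPU thread sits in qemu_mutex_lock_iothread() inside its reader section. Below is a toy model of the "release the QGM before waiting" idea using plain pthreads. It is only a sketch and not QEMU code or the patch itself: big_lock, toy_read_lock(), toy_read_unlock(), toy_synchronize() and run_demo() are hypothetical names, and a simple reader counter stands in for real RCU.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not QEMU code. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* plays the QGM */
static pthread_mutex_t rd_lock = PTHREAD_MUTEX_INITIALIZER;  /* guards reader state */
static pthread_cond_t rd_cond = PTHREAD_COND_INITIALIZER;
static int active_readers;   /* readers currently inside a critical section */
static bool reader_started;  /* set once the reader has entered its section */
static int mmio_done;        /* written by the reader under big_lock */

static void toy_read_lock(void)
{
    pthread_mutex_lock(&rd_lock);
    active_readers++;
    reader_started = true;
    pthread_cond_broadcast(&rd_cond);
    pthread_mutex_unlock(&rd_lock);
}

static void toy_read_unlock(void)
{
    pthread_mutex_lock(&rd_lock);
    active_readers--;
    pthread_cond_broadcast(&rd_cond);
    pthread_mutex_unlock(&rd_lock);
}

/* Toy replacement for synchronize_rcu(): block until no reader is active. */
static void toy_synchronize(void)
{
    pthread_mutex_lock(&rd_lock);
    while (active_readers > 0) {
        pthread_cond_wait(&rd_cond, &rd_lock);
    }
    pthread_mutex_unlock(&rd_lock);
}

/* Plays the vCPU thread: takes the big lock inside its reader section. */
static void *reader_thread(void *arg)
{
    (void)arg;
    toy_read_lock();
    pthread_mutex_lock(&big_lock);   /* like prepare_mmio_access() */
    mmio_done = 1;
    pthread_mutex_unlock(&big_lock);
    toy_read_unlock();
    return NULL;
}

/* Plays the main thread. Returns 0 if the reader ran to completion. */
int run_demo(void)
{
    pthread_t tid;
    int rc, done;

    pthread_mutex_lock(&big_lock);   /* main thread holds the big lock */
    mmio_done = 0;
    pthread_mutex_lock(&rd_lock);
    reader_started = false;
    pthread_mutex_unlock(&rd_lock);

    rc = pthread_create(&tid, NULL, reader_thread, NULL);
    assert(rc == 0);
    (void)rc;

    /* Force the bad interleaving: wait until the reader is in its section. */
    pthread_mutex_lock(&rd_lock);
    while (!reader_started) {
        pthread_cond_wait(&rd_cond, &rd_lock);
    }
    pthread_mutex_unlock(&rd_lock);

    /* Calling toy_synchronize() here with big_lock held would hang forever. */
    pthread_mutex_unlock(&big_lock); /* drop the lock around the wait */
    toy_synchronize();
    pthread_mutex_lock(&big_lock);   /* take it back for the rest of cleanup */

    done = mmio_done;
    pthread_mutex_unlock(&big_lock);
    pthread_join(tid, NULL);
    return done == 1 ? 0 : -1;
}
```

Build with something like "cc -pthread" and call run_demo() from a small main(). If the unlock/lock pair around toy_synchronize() is removed, the toy hangs in the same way as the two backtraces above. Whether it is safe to drop the real QGM at those points in migration_end() and migration_bitmap_extend() is a separate question that this model does not answer.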