On 29.09.2015 11:47, Dr. David Alan Gilbert wrote:
* Igor Redko (red...@virtuozzo.com) wrote:
On Fri., 2015-09-25 at 17:46 +0800, Wen Congyang wrote:
On 09/25/2015 05:09 PM, Denis V. Lunev wrote:
Release the QEMU global mutex (QGM) before calling synchronize_rcu().
synchronize_rcu() waits for all readers to finish their critical
sections, and there is at least one critical section in which we try
to take the QGM: address_space_rw() opens an RCU read section, and
inside it prepare_mmio_access() tries to acquire the QGM.
Both migration_end() and migration_bitmap_extend() are called from
the main thread while it holds the QGM.
Thus there is a race condition that ends up in a deadlock:
main thread                          working thread
-----------                          --------------
lock QGM                                   |
    |                         KVM_EXIT_IO handler is called
    |                                      |
    |                         RCU reader's critical section opened
migration cleanup bh                       |
    |                                      |
synchronize_rcu() waits                    |
for readers                                |
    |                         prepare_mmio_access() waits for QGM
     \                                    /
                  deadlock
The patches here are quick and dirty, compile-tested only, to validate
the architectural approach.
Igor, Anna, can you please start your tests with these patches instead
of your original one? Thank you.
Can you give me the backtrace of the working thread?
I think it is very bad to wait for a lock inside an RCU reader's critical section.
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f1ef113ccfd in __GI___pthread_mutex_lock (mutex=0x7f1ef4145ce0
<qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007f1ef3c36546 in qemu_mutex_lock (mutex=0x7f1ef4145ce0
<qemu_global_mutex>) at util/qemu-thread-posix.c:73
#3 0x00007f1ef387ff46 in qemu_mutex_lock_iothread () at
/home/user/my_qemu/qemu/cpus.c:1170
#4 0x00007f1ef38514a2 in prepare_mmio_access (mr=0x7f1ef612f200) at
/home/user/my_qemu/qemu/exec.c:2390
#5 0x00007f1ef385157e in address_space_rw (as=0x7f1ef40ec940 <address_space_io>,
addr=49402, attrs=..., buf=0x7f1ef3f97000 "\001", len=1, is_write=true)
at /home/user/my_qemu/qemu/exec.c:2425
#6 0x00007f1ef3897c53 in kvm_handle_io (port=49402, attrs=...,
data=0x7f1ef3f97000, direction=1, size=1, count=1) at
/home/user/my_qemu/qemu/kvm-all.c:1680
#7 0x00007f1ef3898144 in kvm_cpu_exec (cpu=0x7f1ef5010fc0) at
/home/user/my_qemu/qemu/kvm-all.c:1849
#8 0x00007f1ef387fa91 in qemu_kvm_cpu_thread_fn (arg=0x7f1ef5010fc0) at
/home/user/my_qemu/qemu/cpus.c:979
#9 0x00007f1ef113a6aa in start_thread (arg=0x7f1eef0b9700) at
pthread_create.c:333
#10 0x00007f1ef0e6feed in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Do you have a test to run in the guest that easily triggers this?
Dave
There are two ways to trigger this. Both of them need 2 hosts with
qemu+libvirt (host0 and host1) configured for migration.
First way:
0. Create VM on host0 and install centos7
1. Shutdown VM.
2. Start the VM (virsh start <VM_name>) and right after that start
migration to host1 (something like 'virsh migrate --live --verbose
<VM_name> "qemu+ssh://host1/system"')
3. Stop the migration after ~1 sec (after the migration process has
started, but before it completes; for example, when you see
"Migration: [ 5 %]")
Result: deadlock. No response from the VM and no response from the
qemu monitor (for example, 'virsh qemu-monitor-command --hmp
<VM_NAME> "info migrate"' will hang indefinitely). Reproduces about
9 times out of 10.
Second way:
0. Create VM with e1000 network card on host0 and install centos7
1. Run iperf on VM (or any other load on network)
2. Start migration
3. Stop the migration before it completes.
For this approach the e1000 network card is essential because it
generates KVM_EXIT_MMIO exits.
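For reference, the "first way" above can be scripted roughly as follows. This is only a sketch: the VM name and destination URI are assumptions, two libvirt hosts set up for migration are required, and cancelling via 'virsh domjobabort' is my guess at how the migration was stopped (the report does not say).

```shell
#!/bin/sh
# Sketch of the "first way" repro; VM name and destination are assumptions.
VM=centos7-test
DEST="qemu+ssh://host1/system"

virsh start "$VM"
virsh migrate --live --verbose "$VM" "$DEST" &   # migration in background
sleep 1                                          # started, not yet complete
virsh domjobabort "$VM"                          # cancel mid-migration
# If the deadlock triggers, the monitor never answers:
timeout 10 virsh qemu-monitor-command --hmp "$VM" "info migrate" \
    || echo "monitor hung: deadlock reproduced"
```

The final `timeout ... || echo` probe just turns the indefinite monitor hang into a visible message; on an unaffected build the "info migrate" command returns normally.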
Igor
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Igor Redko <red...@virtuozzo.com>
CC: Anna Melekhova <an...@virtuozzo.com>
CC: Juan Quintela <quint...@redhat.com>
CC: Amit Shah <amit.s...@redhat.com>
Denis V. Lunev (2):
migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
migration: fix deadlock
migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
1 file changed, 28 insertions(+), 17 deletions(-)
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK