Hi Kevin,
Here we go. I created a blocking multipath device (interrupted all
paths). qemu-kvm hangs with 100% CPU, and the monitor is not
responding either. If I restore at least one path, the VM continues.
BR,
Peter
^C
Program received signal SIGINT, Interrupt.
0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) bt
#0 0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
#1 0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
#2 0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3 0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
#4 0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
#5 0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
#6 0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
#7 0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
#8 0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
(gdb) bt full
#0 0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
No symbol table info available.
#1 0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
No symbol table info available.
#2 0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
No symbol table info available.
#3 0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
No locals.
#4 0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
No locals.
#5 0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
ioh = (IOHandlerRecord *) 0x0
rfds = {fds_bits = {1048576, 0 <repeats 15 times>}}
wfds = {fds_bits = {0 <repeats 16 times>}}
xfds = {fds_bits = {0 <repeats 16 times>}}
ret = 1
nfds = 21
tv = {tv_sec = 0, tv_usec = 999761}
#6 0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
fds = {18, 19}
mask = {__val = {268443712, 0 <repeats 15 times>}}
sigfd = 20
#7 0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
r = 0
#8 0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
gdbstub_dev = 0x0
boot_devices_bitmap = 12
i = 0
snapshot = 0
linux_boot = 0
initrd_filename = 0x0
kernel_filename = 0x0
kernel_cmdline = 0x588fac ""
boot_devices = "dc", '\0' <repeats 30 times>
ds = (DisplayState *) 0x198bf00
dcl = (DisplayChangeListener *) 0x0
cyls = 0
heads = 0
secs = 0
translation = 0
hda_opts = (QemuOpts *) 0x0
opts = (QemuOpts *) 0x1957390
optind = 30
r = 0x7fff266a8a23 "-usbdevice"
optarg = 0x7fff266a8a2e "tablet"
loadvm = 0x0
machine = (QEMUMachine *) 0x861720
cpu_model = 0x7fff266a8917 "qemu64,model_id=Intel(R) Xeon(R) CPU", ' ' <repeats 11 times>, "E5520 @ 2.27GHz"
fds = {644511720, 32767}
tb_size = 0
pid_file = 0x7fff266a89bb "/var/run/qemu/vm-150.pid"
incoming = 0x0
fd = 0
pwd = (struct passwd *) 0x0
chroot_dir = 0x0
run_as = 0x0
env = (struct CPUX86State *) 0x0
show_vnc_port = 0
params = {0x58cc76 "order", 0x58cc7c "once", 0x58cc81 "menu", 0x0}
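The rfds and mask locals in frames #5 and #6 are raw bitmasks. The following is a minimal sketch for decoding them. It assumes x86_64 Linux, where fd_set.fds_bits and sigset_t.__val are arrays of 64-bit words, a file descriptor fd occupies bit fd % 64 of word fd // 64, and signal number n occupies bit n - 1. The helper names are hypothetical.

```python
# Decode the bitmask locals that gdb printed in frames #5 and #6.
# Assumption: x86_64 Linux layout, i.e. arrays of 64-bit unsigned longs,
# fd -> bit (fd % 64) of word (fd // 64), signal n -> bit (n - 1).
WORD_BITS = 64


def set_bits(words):
    """Return the global indices of all set bits in an array of 64-bit words."""
    bits = []
    for w, value in enumerate(words):
        for b in range(WORD_BITS):
            if (value >> b) & 1:
                bits.append(w * WORD_BITS + b)
    return bits


def fds_in_set(fds_bits):
    """File descriptors marked in an fd_set."""
    return set_bits(fds_bits)


def signals_in_mask(val):
    """Signal numbers contained in a sigset_t (bit n - 1 means signal n)."""
    return [b + 1 for b in set_bits(val)]


# Values copied from the backtrace, with "0 <repeats 15 times>" expanded.
rfds = [1048576] + [0] * 15
mask = [268443712] + [0] * 15

print(fds_in_set(rfds))       # [20]
print(signals_in_mask(mask))  # [7, 14, 29]
```

Read this way, fd 20 is the only descriptor left in rfds after select() returned ret = 1, and it matches sigfd = 20 in frame #6. Presumably the wakeup came from the signalfd, after which the main thread blocked on the iothread mutex (frames #3 and #4). On Linux x86_64, signals 7, 14 and 29 are SIGBUS, SIGALRM and SIGIO.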
Kevin Wolf wrote:
On 04.05.2010 15:42, Peter Lieven wrote:
Hi Kevin,
you did it *g*
Looks promising. I applied this patch and have not been able to reproduce
the problem yet :-)
A reliable way to reproduce it was to shut down all multipath paths and
then initiate I/O in the VM (e.g. start an application). Of course,
everything hangs at this point. After re-enabling one path, the VM
crashed. Now it seems to behave correctly: it just reports a DMA timeout
and continues normally afterwards.
Great, I'm going to submit it as a proper patch then.
Christoph, by now I'm pretty sure it's right, but could you take another
look anyway to confirm that it is correct?
Can you think of any way to prevent the VM from consuming 100% CPU in
that waiting state?
My current approach is to run all VMs with nice 1, which helped keep the
machine responsive when all VMs (64 on one box in my test case) had hanging
I/O at the same time.
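The nice 1 workaround can also be applied from a launcher script. Below is a minimal sketch that assumes the VMs are started from a Python wrapper. The function name and the qemu-kvm command line you would pass to it are hypothetical. From a shell, `nice -n 1 <command>` has the same effect.

```python
import os
import subprocess
import sys


def launch_niced(argv, increment=1, **popen_kwargs):
    """Start argv as a child process whose niceness is raised by `increment`.

    os.nice() runs in the child between fork and exec (via preexec_fn),
    so only the launched process is deprioritised, not this launcher.
    """
    return subprocess.Popen(argv, preexec_fn=lambda: os.nice(increment),
                            **popen_kwargs)


if __name__ == "__main__":
    # Demo: a child Python process reports its own niceness.
    child = launch_niced(
        [sys.executable, "-c", "import os; print(os.nice(0))"],
        stdout=subprocess.PIPE, text=True)
    out, _ = child.communicate()
    print("parent:", os.nice(0), "child:", out.strip())
```

Raising niceness needs no privileges, and Linux caps it at 19. Only lowering it again would require root.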
I don't have anything particular in mind, but you could just attach gdb
and get another backtrace while it consumes 100% CPU (you'll need to use
"thread apply all bt" to catch everything). Then we should see where
it's hanging.
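The suggested "thread apply all bt" can be collected non-interactively. This is a minimal sketch using gdb's -batch, -p and -ex options. It assumes the pid file path visible in the backtrace above (/var/run/qemu/vm-150.pid) and enough privileges to ptrace the process (typically root). The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def read_pid(pid_file):
    """Read the process ID that qemu wrote to its pid file."""
    return int(Path(pid_file).read_text().strip())


def gdb_all_threads_cmd(pid):
    """Build a gdb invocation that attaches to `pid`, prints a backtrace
    of every thread, and exits when the command has run (-batch)."""
    return ["gdb", "-batch", "-p", str(pid), "-ex", "thread apply all bt"]


def dump_all_threads(pid_file):
    """Attach gdb to the process named in pid_file and print all backtraces."""
    subprocess.run(gdb_all_threads_cmd(read_pid(pid_file)), check=True)
```

Calling `dump_all_threads("/var/run/qemu/vm-150.pid")` while the VM spins at 100% CPU should show every thread, including the one that holds the mutex the main thread is waiting for in frame #3.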
Kevin
--
Kind regards
Peter Lieven
..........................................................................................................
KAMP Netzwerkdienste GmbH
Vestische Str. 89-91 | 46117 Oberhausen
Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
mailto:p...@kamp.de | http://www.kamp.de
Managing Directors: Heiner Lante | Michael Lante
Registered at Amtsgericht Duisburg (Duisburg Local Court) | HRB No. 12154
VAT ID No.: DE 120607556
.........................................................................................................