On Tue, 06/09 11:01, Christian Borntraeger wrote:
> On 09.06.2015 at 04:28, Fam Zheng wrote:
> > On Tue, 06/02 16:36, Christian Borntraeger wrote:
> >> Paolo,
> >>
> >> I bisected
> >> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
> >> Author:     Paolo Bonzini <pbonz...@redhat.com>
> >> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
> >> Commit:     Kevin Wolf <kw...@redhat.com>
> >> CommitDate: Tue Apr 28 15:36:08 2015 +0200
> >>
> >>     iothread: release iothread around aio_poll
> >>
> >> to cause a problem with hanging guests.
> >>
> >> Having many guests all with a kernel/ramdisk (via -kernel) and
> >> several null block devices will result in hangs. All hanging
> >> guests are in partition detection code waiting for an I/O to return
> >> so very early maybe even the first I/O.
> >>
> >> Reverting that commit "fixes" the hangs.
> >> Any ideas?
> >
> > Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
> > have a reproducer for x86? Or could you collect backtraces for all the
> > threads in QEMU when it hangs?
> >
> > My long shot is that the main loop is blocked at aio_context_acquire(ctx),
> > while the iothread of that ctx is blocked at aio_poll(ctx, blocking).
>
> Here is a backtrace on s390. I need 2 or more disks, (one is not enough).

It shows iothreads and main loop are all waiting for events, and the vcpu
threads are running guest code.

It could be the requests being leaked. Do you see this problem with a regular
file based image or null-co driver? Maybe we're missing something about the
AioContext in block/null.c.

Fam

>
> Thread 5 (Thread 0x3fffb406910 (LWP 74602)):
> #0  0x000003fffc0bde8e in syscall () from /lib64/libc.so.6
> #1  0x00000000801dd282 in futex_wait (val=4294967295, ev=0x8079c6c4 <rcu_call_ready_event>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:301
> #2  qemu_event_wait (ev=ev@entry=0x8079c6c4 <rcu_call_ready_event>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:399
> #3  0x00000000801ec75c in call_rcu_thread (opaque=<optimized out>) at /home/cborntra/REPOS/qemu/util/rcu.c:233
> #4  0x000003fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003fffc0c30fa in thread_start () from /lib64/libc.so.6
>
> Thread 4 (Thread 0x3fffabf7910 (LWP 74604)):
> #0  0x000003fffc0b75d6 in ppoll () from /lib64/libc.so.6
> #1  0x000000008016fbd0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:310
> #3  0x0000000080170d32 in aio_poll (ctx=0x807d6a70, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:274
> #4  0x00000000800b3758 in iothread_run (opaque=0x807d6690) at /home/cborntra/REPOS/qemu/iothread.c:41
> #5  0x000003fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
> #6  0x000003fffc0c30fa in thread_start () from /lib64/libc.so.6
>
> Thread 3 (Thread 0x3fffa3f7910 (LWP 74605)):
> #0  0x000003fffc0b75d6 in ppoll () from /lib64/libc.so.6
> #1  0x000000008016fbd0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:310
> #3  0x0000000080170d32 in aio_poll (ctx=0x807d94a0, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:274
> #4  0x00000000800b3758 in iothread_run (opaque=0x807d6f60) at /home/cborntra/REPOS/qemu/iothread.c:41
> #5  0x000003fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
> #6  0x000003fffc0c30fa in thread_start () from /lib64/libc.so.6
>
> Thread 2 (Thread 0x3fff8a21910 (LWP 74625)):
> #0  0x000003fffc0b90a2 in ioctl () from /lib64/libc.so.6
> #1  0x0000000080056a46 in kvm_vcpu_ioctl (cpu=cpu@entry=0x81f3b620, type=type@entry=44672) at /home/cborntra/REPOS/qemu/kvm-all.c:1916
> #2  0x0000000080056b08 in kvm_cpu_exec (cpu=cpu@entry=0x81f3b620) at /home/cborntra/REPOS/qemu/kvm-all.c:1775
> #3  0x00000000800445de in qemu_kvm_cpu_thread_fn (arg=0x81f3b620) at /home/cborntra/REPOS/qemu/cpus.c:979
> #4  0x000003fffc16f4e6 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003fffc0c30fa in thread_start () from /lib64/libc.so.6
>
> Thread 1 (Thread 0x3fffb408bc0 (LWP 74580)):
> #0  0x000003fffc0b75d6 in ppoll () from /lib64/libc.so.6
> #1  0x000000008016fbb0 in ppoll (__ss=0x0, __timeout=0x3ffffd64438, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=999000000) at /home/cborntra/REPOS/qemu/qemu-timer.c:322
> #3  0x000000008016f230 in os_host_main_loop_wait (timeout=999000000) at /home/cborntra/REPOS/qemu/main-loop.c:239
> #4  main_loop_wait (nonblocking=<optimized out>) at /home/cborntra/REPOS/qemu/main-loop.c:494
> #5  0x000000008001346a in main_loop () at /home/cborntra/REPOS/qemu/vl.c:1789
> #6  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/cborntra/REPOS/qemu/vl.c:4391
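
For reference, the "long shot" scenario quoted earlier (main loop stuck in aio_context_acquire(ctx) while the iothread of that ctx sits in a blocking aio_poll) can be pictured with a toy model. This is only a sketch in Python, not QEMU code: `ctx_lock`, `event`, `polling` and `main_loop_can_acquire` are hypothetical stand-ins, and the real AioContext locking is more involved. It just shows that a thread blocking in a poll with the lock held starves another thread that wants the lock, whereas releasing the lock around the poll does not.

```python
import threading


def main_loop_can_acquire(hold_lock_while_polling, timeout=0.2):
    """Toy model: can the 'main loop' take the context lock while the
    'iothread' sits in a blocking poll? All names here are hypothetical."""
    ctx_lock = threading.Lock()   # stand-in for the AioContext lock
    event = threading.Event()     # stand-in for the event the poll waits on
    polling = threading.Event()   # set once the iothread is inside its poll

    def iothread():
        if hold_lock_while_polling:
            with ctx_lock:
                polling.set()
                event.wait()      # blocking poll with the lock held
        else:
            polling.set()
            event.wait()          # blocking poll with the lock released

    t = threading.Thread(target=iothread)
    t.start()
    polling.wait()                # make sure the iothread is in its poll

    got = ctx_lock.acquire(timeout=timeout)  # the 'main loop' wants the lock
    if got:
        ctx_lock.release()

    event.set()                   # deliver the event so the iothread exits
    t.join()
    return got


if __name__ == "__main__":
    print("lock held across poll:", main_loop_can_acquire(True))
    print("lock released around poll:", main_loop_can_acquire(False))
```

In the backtrace above, Thread 1 is in ppoll under os_host_main_loop_wait rather than waiting on a lock, so the traces do not show this particular pattern.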