On 04/12/2017 04:46 PM, Jeff Cody wrote:
>
> This occurs on v2.9.0-rc4, but not on v2.8.0.
>
> When running QEMU with an iothread, and then performing a block-mirror,
> if we do a system-reset after the BLOCK_JOB_READY event has been
> emitted, qemu becomes deadlocked.
>
> The block job is not paused, nor cancelled, so we are stuck in the
> while loop in block_job_detach_aio_context():
>
> static void block_job_detach_aio_context(void *opaque)
> {
>     BlockJob *job = opaque;
>
>     /* In case the job terminates during aio_poll()... */
>     block_job_ref(job);
>
>     block_job_pause(job);
>
>     while (!job->paused && !job->completed) {
>         block_job_drain(job);
>     }
Looks like when block_job_drain() calls block_job_enter() from this
context (the main thread, since we're trying to do a system_reset...),
we cannot enter the coroutine because it's the wrong context, so we
schedule an entry instead with aio_co_schedule(ctx, co). But that entry
never happens, so the job never wakes up, and we never make enough
progress in the coroutine to gracefully pause, so we wedge here.

>
>     block_job_unref(job);
> }
>
>
> Reproducer script and QAPI commands:
>
> # QEMU script:
> gdb --args /home/user/deploy-${1}/bin/qemu-system-x86_64 -enable-kvm -smp 4 \
>     -object iothread,id=iothread0 \
>     -drive file=${2},if=none,id=drive-virtio-disk0,aio=native,cache=none,discard=unmap \
>     -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,iothread=iothread0 \
>     -m 1024 -boot menu=on -qmp stdio \
>     -drive file=${3},if=none,id=drive-data-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop \
>     -device virtio-blk-pci,drive=drive-data-disk0,id=data-disk0,iothread=iothread0,bus=pci.0,addr=0x7
>
> # QAPI commands:
> { "execute": "drive-mirror", "arguments": { "device": "drive-data-disk0",
>   "target": "/home/user/sn1", "format": "qcow2", "mode": "absolute-paths",
>   "sync": "full", "speed": 1000000000, "on-source-error": "stop",
>   "on-target-error": "stop" } }
>
> # after BLOCK_JOB_READY, do system reset
> { "execute": "system_reset" }
>
> gdb bt:
>
> (gdb) bt
> #0  0x0000555555aa79f3 in bdrv_drain_recurse (bs=bs@entry=0x55555783e900)
>     at block/io.c:164
> #1  0x0000555555aa825d in bdrv_drained_begin (bs=bs@entry=0x55555783e900)
>     at block/io.c:231
> #2  0x0000555555aa8449 in bdrv_drain (bs=0x55555783e900) at block/io.c:265
> #3  0x0000555555a9c356 in blk_drain (blk=<optimized out>)
>     at block/block-backend.c:1383
> #4  0x0000555555aa3cfd in mirror_drain (job=<optimized out>)
>     at block/mirror.c:1000
> #5  0x0000555555a66e11 in block_job_detach_aio_context (opaque=0x555557a19a40)
>     at blockjob.c:142
> #6  0x0000555555a62f4d in bdrv_detach_aio_context (bs=bs@entry=0x555557839410)
>     at block.c:4357
> #7  0x0000555555a63116 in bdrv_set_aio_context (bs=bs@entry=0x555557839410,
>     new_context=new_context@entry=0x55555668bc20) at block.c:4418
> #8  0x0000555555a9d326 in blk_set_aio_context (blk=0x5555566db520,
>     new_context=0x55555668bc20) at block/block-backend.c:1662
> #9  0x00005555557e38da in virtio_blk_data_plane_stop (vdev=<optimized out>)
>     at /home/jcody/work/upstream/qemu-kvm/hw/block/dataplane/virtio-blk.c:262
> #10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd (bus=bus@entry=0x5555583089a8)
>     at hw/virtio/virtio-bus.c:246
> #11 0x00005555559fa49b in virtio_bus_stop_ioeventfd (bus=bus@entry=0x5555583089a8)
>     at hw/virtio/virtio-bus.c:238
> #12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd (proxy=0x555558300510)
>     at hw/virtio/virtio-pci.c:348
> #13 0x00005555559f6a18 in virtio_pci_reset (qdev=<optimized out>)
>     at hw/virtio/virtio-pci.c:1872
> #14 0x00005555559139a9 in qdev_reset_one (dev=<optimized out>, opaque=<optimized out>)
>     at hw/core/qdev.c:310
> #15 0x0000555555916738 in qbus_walk_children (bus=0x55555693aa30, pre_devfn=0x0,
>     pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>,
>     post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/bus.c:59
> #16 0x0000555555913318 in qdev_walk_children (dev=0x5555569387d0, pre_devfn=0x0,
>     pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>,
>     post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/qdev.c:617
> #17 0x0000555555916738 in qbus_walk_children (bus=0x555556756f70, pre_devfn=0x0,
>     pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>,
>     post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/bus.c:59
> #18 0x00005555559168ca in qemu_devices_reset () at hw/core/reset.c:69
> #19 0x000055555581fcbb in pc_machine_reset ()
>     at /home/jcody/work/upstream/qemu-kvm/hw/i386/pc.c:2234
> #20 0x00005555558a4d96 in qemu_system_reset (report=<optimized out>) at vl.c:1697
> #21 0x000055555577157a in main_loop_should_exit () at vl.c:1865
> #22 0x000055555577157a in main_loop () at vl.c:1902
> #23 0x000055555577157a in main (argc=<optimized out>, argv=<optimized out>,
>     envp=<optimized out>) at vl.c:4709
>
> -Jeff

Here's a backtrace for an unoptimized build showing all threads:

https://paste.fedoraproject.org/paste/lLnm8jKeq2wLKF6yEaoEM15M1UNdIGYhyRLivL9gydE=

--js