On Tue, 01/24 09:53, Stefan Hajnoczi wrote: > AioHandlers marked ->is_external must be skipped when aio_node_check() > fails. bdrv_drained_begin() needs this to prevent dataplane from > submitting new I/O requests while another thread accesses the device and > relies on it being quiesced. > > This patch fixes the following segfault: > > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at > qemu/block/io.c:2650 > 2650 bdrv_io_plug(child->bs); > [Current thread is 1 (Thread 0x7ff5c4bd1c80 (LWP 10917))] > (gdb) bt > #0 0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at > qemu/block/io.c:2650 > #1 0x00005577f6114363 in blk_io_plug (blk=0x5577f7b8ba20) at > qemu/block/block-backend.c:1561 > #2 0x00005577f5d4091d in virtio_blk_handle_vq (s=0x5577f9ada030, > vq=0x5577f9b3d2a0) at qemu/hw/block/virtio-blk.c:589 > #3 0x00005577f5d4240d in virtio_blk_data_plane_handle_output > (vdev=0x5577f9ada030, vq=0x5577f9b3d2a0) at > qemu/hw/block/dataplane/virtio-blk.c:158 > #4 0x00005577f5d88acd in virtio_queue_notify_aio_vq (vq=0x5577f9b3d2a0) at > qemu/hw/virtio/virtio.c:1304 > #5 0x00005577f5d8aaaf in virtio_queue_host_notifier_aio_poll > (opaque=0x5577f9b3d308) at qemu/hw/virtio/virtio.c:2134 > #6 0x00005577f60ca077 in run_poll_handlers_once (ctx=0x5577f79ddbb0) at > qemu/aio-posix.c:493 > #7 0x00005577f60ca268 in try_poll_mode (ctx=0x5577f79ddbb0, blocking=true) > at qemu/aio-posix.c:569 > #8 0x00005577f60ca331 in aio_poll (ctx=0x5577f79ddbb0, blocking=true) at > qemu/aio-posix.c:601 > #9 0x00005577f612722a in bdrv_flush (bs=0x5577f7c20970) at > qemu/block/io.c:2403 > #10 0x00005577f60c1b2d in bdrv_close (bs=0x5577f7c20970) at > qemu/block.c:2322 > #11 0x00005577f60c20e7 in bdrv_delete (bs=0x5577f7c20970) at > qemu/block.c:2465 > #12 0x00005577f60c3ecf in bdrv_unref (bs=0x5577f7c20970) at > qemu/block.c:3425 > #13 0x00005577f60bf951 in bdrv_root_unref_child (child=0x5577f7a2de70) at > qemu/block.c:1361 > #14 0x00005577f6112162 in blk_remove_bs (blk=0x5577f7b8ba20) at > qemu/block/block-backend.c:491 > #15 0x00005577f6111b1b in blk_remove_all_bs () at > qemu/block/block-backend.c:245 > #16 0x00005577f60c1db6 in bdrv_close_all () at qemu/block.c:2382 > #17 0x00005577f5e60cca in main (argc=20, argv=0x7ffea6eb8398, > envp=0x7ffea6eb8440) at qemu/vl.c:4684 > > The key thing is that bdrv_close() uses bdrv_drained_begin() and > virtio_queue_host_notifier_aio_poll() must not be called. > > Thanks to Fam Zheng <f...@redhat.com> for identifying the root cause of > this crash. > > Reported-by: Alberto Garcia <be...@igalia.com> > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com> > --- > aio-posix.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/aio-posix.c b/aio-posix.c > index 9453d83..a8d7090 100644 > --- a/aio-posix.c > +++ b/aio-posix.c > @@ -508,7 +508,8 @@ static bool run_poll_handlers_once(AioContext *ctx) > > QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) { > if (!node->deleted && node->io_poll && > - node->io_poll(node->opaque)) { > + aio_node_check(ctx, node->is_external) && > + node->io_poll(node->opaque)) { > progress = true; > } > > -- > 2.9.3 > >
The patch is not wrong and I believe it is enough to fix the crash, however it's not enough... All in all I think we should skip external handlers regardless of aio_disable_external(), or even skip try_poll_mode, in nested aio_poll()'s. The reasons are 1) many nested aio_poll()'s don't have bdrv_drained_begin, so this check is not sufficient; 2) aio_poll() on qemu_aio_context doesn't look at ioeventfd before, but this was changed by adding try_poll_mode(), which is not very correct. These two factors combined together make it possible for bdrv_flush() etc to spin longer than necessary, if not forever, when the guest keeps submitting more requests with ioeventfd. Fam