deadlock when using iothread during backup_clean()
Hi,

as the title says, a deadlock is possible in backup_clean() when using a drive with an IO thread. The main thread keeps waiting in bdrv_graph_wrlock() for reader_count() to become 0, while the IO thread waits for its AioContext lock, which the main thread holds (because it is also the backup job's AioContext). See the stack traces below [1].

The reason why it happens becomes clear with the following commit message:

> commit 31b2ddfea304afc498aca8cac171020ef33eb89b
> Author: Kevin Wolf
> Date: Mon Jun 5 10:57:10 2023 +0200
>
>     graph-lock: Unlock the AioContext while polling
>
>     If the caller keeps the AioContext lock for a block node in an iothread,
>     polling in bdrv_graph_wrlock() deadlocks if the condition isn't
>     fulfilled immediately.
>
>     Now that all callers make sure to actually have the AioContext locked
>     when they call bdrv_replace_child_noperm() like they should, we can
>     change bdrv_graph_wrlock() to take a BlockDriverState whose AioContext
>     lock the caller holds (NULL if it doesn't) and unlock it temporarily
>     while polling.

and by noting that bdrv_graph_wrlock() is called with bs being NULL while the caller holds the IO thread's AioContext lock.

The question is: where should the IO thread's AioContext lock be dropped by the main thread? The "Graph locking part 4 (node management)" series [0] (I also reproduced the deadlock with that applied) moves the bdrv_graph_wrlock() further up to block_job_remove_all_bdrv(), but it still passes NULL as the argument. Can we assume block_job_remove_all_bdrv() to always hold the job's AioContext? And if yes, can we just tell bdrv_graph_wrlock() that it needs to release that lock before polling to fix the deadlock?
Best Regards,
Fiona

[0]: https://lists.nongnu.org/archive/html/qemu-devel/2023-08/msg02831.html

[1]:

IO thread:

> Thread 3 (Thread 0x7fa2209596c0 (LWP 119031) "qemu-system-x86"):
> #0  futex_wait (private=0, expected=2, futex_word=0x5648fc93b2c0) at ../sysdeps/nptl/futex-internal.h:146
> #1  __GI___lll_lock_wait (futex=futex@entry=0x5648fc93b2c0, private=0) at ./nptl/lowlevellock.c:49
> #2  0x7fa2240e532a in lll_mutex_lock_optimized (mutex=0x5648fc93b2c0) at ./nptl/pthread_mutex_lock.c:48
> #3  ___pthread_mutex_lock (mutex=0x5648fc93b2c0) at ./nptl/pthread_mutex_lock.c:128
> #4  0x5648fa1742f8 in qemu_mutex_lock_impl (mutex=0x5648fc93b2c0, file=0x5648fa446579 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:94
> #5  0x5648fa174528 in qemu_rec_mutex_lock_impl (mutex=0x5648fc93b2c0, file=0x5648fa446579 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:149
> #6  0x5648fa18dd6f in aio_context_acquire (ctx=0x5648fc93b260) at ../util/async.c:728
> #7  0x5648fa18dce4 in aio_co_enter (ctx=0x5648fc93b260, co=0x7fa217f49210) at ../util/async.c:710
> #8  0x5648fa18dc31 in aio_co_wake (co=0x7fa217f49210) at ../util/async.c:695
> #9  0x5648fa08e4a2 in luring_process_completions (s=0x5648fdf2dd00) at ../block/io_uring.c:216
> #10 0x5648fa08e6c7 in luring_process_completions_and_submit (s=0x5648fdf2dd00) at ../block/io_uring.c:268
> #11 0x5648fa08e727 in qemu_luring_completion_cb (opaque=0x5648fdf2dd00) at ../block/io_uring.c:284
> #12 0x5648fa16f433 in aio_dispatch_handler (ctx=0x5648fc93b260, node=0x5648fdf2de50) at ../util/aio-posix.c:372
> #13 0x5648fa16f506 in aio_dispatch_ready_handlers (ctx=0x5648fc93b260, ready_list=0x7fa220954180) at ../util/aio-posix.c:401
> #14 0x5648fa16ff7d in aio_poll (ctx=0x5648fc93b260, blocking=true) at ../util/aio-posix.c:723
> #15 0x5648f9fbaa59 in iothread_run (opaque=0x5648fc5ed6b0) at ../iothread.c:63
> #16 0x5648fa175046 in qemu_thread_start (args=0x5648fc93b8f0) at ../util/qemu-thread-posix.c:541
> #17 0x7fa2240e2044 in start_thread (arg=) at ./nptl/pthread_create.c:442
> #18 0x7fa2241625fc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

main thread:

> Thread 1 (Thread 0x7fa2214bf340 (LWP 119029) "qemu-system-x86"):
> #0  0x7fa224155136 in __ppoll (fds=0x5648fdf2ce50, nfds=2, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
> #1  0x5648fa193102 in qemu_poll_ns (fds=0x5648fdf2ce50, nfds=2, timeout=-1) at ../util/qemu-timer.c:339
> #2  0x5648fa17042f in fdmon_poll_wait (ctx=0x5648fc6c9740, ready_list=0x72757260, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x5648fa16fd6b in aio_poll (ctx=0x5648fc6c9740, blocking=true) at ../util/aio-posix.c:670
> #4  0x5648f9ffc0bc in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:145
> #5  0x5648f9fc78fa in bdrv_replace_child_noperm (child=0x5648fd729fd0, new_bs=0x0) at ../block.c:2897
> #6  0x5648f9fc8487 in bdrv_root_unref_child (child=0x5648fd729fd0) at ../block.c:3227
> #7  0x5648f9fd4b50 in block_job_remove_all_bdrv (job=0x5648fe0b6960) at ../blockjob.c:208
> #8  0x5648f9feb11b in bac
Re: deadlock when using iothread during backup_clean()
Am 05.09.23 um 12:01 schrieb Fiona Ebner:
>
> Can we assume block_job_remove_all_bdrv() to always hold the job's
> AioContext? And if yes, can we just tell bdrv_graph_wrlock() that it
> needs to release that before polling to fix the deadlock?
>

I tried doing something similar as a proof of concept:

> diff --git a/blockjob.c b/blockjob.c
> index 58c5d64539..1a696241a0 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -198,19 +198,19 @@ void block_job_remove_all_bdrv(BlockJob *job)
>       * one to make sure that such a concurrent access does not attempt
>       * to process an already freed BdrvChild.
>       */
> -    bdrv_graph_wrlock(NULL);
>     while (job->nodes) {
>         GSList *l = job->nodes;
>         BdrvChild *c = l->data;
>
>         job->nodes = l->next;
>
> +        bdrv_graph_wrlock(c->bs);
>         bdrv_op_unblock_all(c->bs, job->blocker);
>         bdrv_root_unref_child(c);
> +        bdrv_graph_wrunlock();
>
>         g_slist_free_1(l);
>     }
> -    bdrv_graph_wrunlock();
> }

and while it did get slightly further, I ran into another deadlock with:

> #0  0x7f1941155136 in __ppoll (fds=0x55992068fb20, nfds=2, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
> #1  0x55991c6a1a3f in qemu_poll_ns (fds=0x55992068fb20, nfds=2, timeout=-1) at ../util/qemu-timer.c:339
> #2  0x55991c67ed6c in fdmon_poll_wait (ctx=0x55991f058810, ready_list=0x7ffda8c987b0, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x55991c67e6a8 in aio_poll (ctx=0x55991f058810, blocking=true) at ../util/aio-posix.c:670
> #4  0x55991c50a763 in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:145
> #5  0x55991c4daf85 in bdrv_close (bs=0x55991fff2f30) at ../block.c:5166
> #6  0x55991c4dc050 in bdrv_delete (bs=0x55991fff2f30) at ../block.c:5606
> #7  0x55991c4df205 in bdrv_unref (bs=0x55991fff2f30) at ../block.c:7173
> #8  0x55991c4fb8ca in bdrv_cbw_drop (bs=0x55991fff2f30) at ../block/copy-before-write.c:566
> #9  0x55991c4f9685 in backup_clean (job=0x55992016d0b0) at ../block/backup.c:105
Re: deadlock when using iothread during backup_clean()
On 9/5/23 12:01, Fiona Ebner wrote:
> Can we assume block_job_remove_all_bdrv() to always hold the job's
> AioContext?

I think so, see job_unref_locked(), job_prepare_locked() and job_finalize_single_locked(). These call the callbacks that ultimately get to block_job_remove_all_bdrv().

> And if yes, can we just tell bdrv_graph_wrlock() that it needs to
> release that before polling to fix the deadlock?

No, but I think it should be released and re-acquired in block_job_remove_all_bdrv() itself. mirror_exit_common(), however, holds _two_ AioContext locks at the time it calls block_job_remove_all_bdrv(), so the lock for qemu_get_aio_context() has to be released and reacquired in mirror_exit_common() itself.

Paolo
Re: deadlock when using iothread during backup_clean()
Am 05.09.23 um 13:42 schrieb Paolo Bonzini:
> On 9/5/23 12:01, Fiona Ebner wrote:
>> Can we assume block_job_remove_all_bdrv() to always hold the job's
>> AioContext?
>
> I think so, see job_unref_locked(), job_prepare_locked() and
> job_finalize_single_locked(). These call the callbacks that ultimately
> get to block_job_remove_all_bdrv().
>
>> And if yes, can we just tell bdrv_graph_wrlock() that it
>> needs to release that before polling to fix the deadlock?
>
> No, but I think it should be released and re-acquired in
> block_job_remove_all_bdrv() itself.
>

For fixing the backup cancel deadlock, I tried the following:

> diff --git a/blockjob.c b/blockjob.c
> index 58c5d64539..fd6132ebfe 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -198,7 +198,9 @@ void block_job_remove_all_bdrv(BlockJob *job)
>       * one to make sure that such a concurrent access does not attempt
>       * to process an already freed BdrvChild.
>       */
> +    aio_context_release(job->job.aio_context);
>     bdrv_graph_wrlock(NULL);
> +    aio_context_acquire(job->job.aio_context);
>     while (job->nodes) {
>         GSList *l = job->nodes;
>         BdrvChild *c = l->data;

but it's not enough unfortunately. And I don't just mean the later deadlock during bdrv_close() (via bdrv_cbw_drop()) mentioned in the other mail.

Even when I got lucky and that deadlock didn't trigger by chance, or with an additional change to try and avoid that one:

> diff --git a/block.c b/block.c
> index e7f349b25c..02d2c4e777 100644
> --- a/block.c
> +++ b/block.c
> @@ -5165,7 +5165,7 @@ static void bdrv_close(BlockDriverState *bs)
>         bs->drv = NULL;
>     }
>
> -    bdrv_graph_wrlock(NULL);
> +    bdrv_graph_wrlock(bs);
>     QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
>         bdrv_unref_child(bs, child);
>     }

guest IO would often get completely stuck after canceling the backup. There's nothing obvious to me in the backtraces at that point: the vCPU and main threads seem to be running as usual, while the IO thread is stuck in aio_poll(), i.e. it never returns from the __ppoll() call. This would happen with both a VirtIO SCSI and a VirtIO block disk, and with both aio=io_uring and aio=threads.

I should also mention I'm using

> fio --name=file --size=4k --direct=1 --rw=randwrite --bs=4k --ioengine=psync --numjobs=5 --runtime=6000 --time_based

inside the guest during canceling of the backup. I'd be glad for any pointers on what to look for and happy to provide more information.

Best Regards,
Fiona
Re: deadlock when using iothread during backup_clean()
On 05.09.23 14:25, Fiona Ebner wrote:
> Am 05.09.23 um 12:01 schrieb Fiona Ebner:
>> Can we assume block_job_remove_all_bdrv() to always hold the job's
>> AioContext? And if yes, can we just tell bdrv_graph_wrlock() that it
>> needs to release that before polling to fix the deadlock?
>
> I tried doing something similar as a proof of concept
>
> [diff to blockjob.c snipped]
>
> and while it did get slightly further, I ran into another deadlock with:
>
> #0  0x7f1941155136 in __ppoll (fds=0x55992068fb20, nfds=2, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
> #1  0x55991c6a1a3f in qemu_poll_ns (fds=0x55992068fb20, nfds=2, timeout=-1) at ../util/qemu-timer.c:339
> #2  0x55991c67ed6c in fdmon_poll_wait (ctx=0x55991f058810, ready_list=0x7ffda8c987b0, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x55991c67e6a8 in aio_poll (ctx=0x55991f058810, blocking=true) at ../util/aio-posix.c:670
> #4  0x55991c50a763 in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:145

Interesting, why do we pass NULL to bdrv_graph_wrlock() in bdrv_close()? Shouldn't bdrv_close() be called with the bs AioContext locked?

> #5  0x55991c4daf85 in bdrv_close (bs=0x55991fff2f30) at ../block.c:5166
> #6  0x55991c4dc050 in bdrv_delete (bs=0x55991fff2f30) at ../block.c:5606
> #7  0x55991c4df205 in bdrv_unref (bs=0x55991fff2f30) at ../block.c:7173
> #8  0x55991c4fb8ca in bdrv_cbw_drop (bs=0x55991fff2f30) at ../block/copy-before-write.c:566
> #9  0x55991c4f9685 in backup_clean (job=0x55992016d0b0) at ../block/backup.c:105

--
Best regards,
Vladimir
Re: deadlock when using iothread during backup_clean()
On 28.09.23 11:06, Fiona Ebner wrote:
> Am 05.09.23 um 13:42 schrieb Paolo Bonzini:
>> No, but I think it should be released and re-acquired in
>> block_job_remove_all_bdrv() itself.
>
> For fixing the backup cancel deadlock, I tried the following:
>
> [diffs to blockjob.c and block.c snipped]
>
> but it's not enough unfortunately. [...] often guest IO would get
> completely stuck after canceling the backup. There's nothing obvious to
> me in the backtraces at that point: the vCPU and main threads seem to be
> running as usual, while the IO thread is stuck in aio_poll(), i.e. it
> never returns from the __ppoll() call. This would happen with both a
> VirtIO SCSI and a VirtIO block disk, and with both aio=io_uring and
> aio=threads.

When IO is stuck, it may be helpful to look at the bs->tracked_requests list in gdb. If there are requests, each one has a field .co, which points to the coroutine of the request. The next step is to look at the coroutine stack. Something like (in gdb):

  source scripts/qemu-gdb.py
  qemu coroutine

may work. ("may", because it was long ago when I used this last time)

> I should also mention I'm using
>
>> fio --name=file --size=4k --direct=1 --rw=randwrite --bs=4k --ioengine=psync --numjobs=5 --runtime=6000 --time_based
>
> inside the guest during canceling of the backup. I'd be glad for any
> pointers on what to look for and happy to provide more information.
>
> Best Regards,
> Fiona

--
Best regards,
Vladimir
Re: deadlock when using iothread during backup_clean()
Am 04.10.23 um 19:08 schrieb Vladimir Sementsov-Ogievskiy:
> On 28.09.23 11:06, Fiona Ebner wrote:
>> For fixing the backup cancel deadlock, I tried the following:
>>
>>> diff --git a/blockjob.c b/blockjob.c
>>> index 58c5d64539..fd6132ebfe 100644
>>> --- a/blockjob.c
>>> +++ b/blockjob.c
>>> @@ -198,7 +198,9 @@ void block_job_remove_all_bdrv(BlockJob *job)
>>>      * one to make sure that such a concurrent access does not attempt
>>>      * to process an already freed BdrvChild.
>>>      */
>>> +    aio_context_release(job->job.aio_context);
>>>     bdrv_graph_wrlock(NULL);
>>> +    aio_context_acquire(job->job.aio_context);
>>>     while (job->nodes) {
>>>         GSList *l = job->nodes;
>>>         BdrvChild *c = l->data;
>>
>> but it's not enough unfortunately. And I don't just mean with the later
>> deadlock during bdrv_close() (via bdrv_cbw_drop()) as mentioned in the
>> other mail.
>>
>> Even when I got lucky and that deadlock didn't trigger by chance or with
>> an additional change to try and avoid that one
>>
>>> diff --git a/block.c b/block.c
>>> index e7f349b25c..02d2c4e777 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -5165,7 +5165,7 @@ static void bdrv_close(BlockDriverState *bs)
>>>         bs->drv = NULL;
>>>     }
>>>
>>> -    bdrv_graph_wrlock(NULL);
>>> +    bdrv_graph_wrlock(bs);
>>>     QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
>>>         bdrv_unref_child(bs, child);
>>>     }
>>
>> often guest IO would get completely stuck after canceling the backup.
>> There's nothing obvious to me in the backtraces at that point, but it
>> seems the vCPU and main threads are running as usual, while the IO
>> thread is stuck in aio_poll(), i.e. never returns from the __ppoll()
>> call.
>
> When IO is stuck, it may be helpful to look at bs->tracked_requests list
> in gdb. If there are requests, each one has field .co, which points to
> the coroutine of request.
After the IO was stuck in the guest, I used bdrv_next_all_states() to iterate over the states, and there's only the bdrv_raw and the bdrv_host_device. For both, tracked_requests was empty.

What is also very interesting is that the IO isn't always dead immediately. It can be that the fio command still runs with lower speed for a while (sometimes even up to about a minute, but most often about 10-15 seconds or so). During that time, I can still see calls to virtio_scsi_handle_cmd() and blk_aio_write_entry(). Then they suddenly stop.

>> I should also mention I'm using
>>
>>> fio --name=file --size=4k --direct=1 --rw=randwrite --bs=4k --ioengine=psync --numjobs=5 --runtime=6000 --time_based
>>
>> inside the guest during canceling of the backup.

One single time, it got stuck polling while canceling [0]. I usually need to do the backup+cancel a few times, because the IO doesn't get stuck each time, so this was a subsequent invocation. The reentrancy into virtio_scsi_handle_cmd() etc. seems strange at first glance. Can that be normal?
Best Regards,
Fiona

[0]:

> Thread 3 (Thread 0x7ff7f28946c0 (LWP 1815909) "qemu-system-x86"):
> #0  0x7ff7f5f55136 in __ppoll (fds=0x7ff7e40030c0, nfds=8, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
> #1  0x5587132615ab in qemu_poll_ns (fds=0x7ff7e40030c0, nfds=8, timeout=-1) at ../util/qemu-timer.c:339
> #2  0x55871323e8b1 in fdmon_poll_wait (ctx=0x55871598d5e0, ready_list=0x7ff7f288ebe0, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x55871323e1ed in aio_poll (ctx=0x55871598d5e0, blocking=true) at ../util/aio-posix.c:670
> #4  0x558713089efa in bdrv_poll_co (s=0x7ff7f288ec90) at /home/febner/repos/qemu/block/block-gen.h:43
> #5  0x55871308c362 in blk_is_available (blk=0x55871599e2f0) at block/block-gen.c:1426
> #6  0x558712f6843b in virtio_scsi_ctx_check (s=0x558716c049c0, d=0x55871581cd30) at ../hw/scsi/virtio-scsi.c:290
> #7  0x558712f697e4 in virtio_scsi_handle_cmd_req_prepare (s=0x558716c049c0, req=0x7ff7e400b650) at ../hw/scsi/virtio-scsi.c:788
> #8  0x558712f699b0 in virtio_scsi_handle_cmd_vq (s=0x558716c049c0, vq=0x558716c0d2a8) at ../hw/scsi/virtio-scsi.c:831
> #9  0x558712f69bcb in virtio_scsi_handle_cmd (vdev=0x558716c049c0, vq=0x558716c0d2a8) at ../hw/scsi/virtio-scsi.c:867
> #10 0x558712f96812 in virtio_queue_notify_vq (vq=0x558716c0d2a8) at ../hw/virtio/virtio.c:2263
> #11 0x558712f99b75 in virtio_queue_host_notifier_read (n=0x558716c0d31c) at ../hw/virtio/virtio.c:3575
> #12 0x55871323d8b5 in aio_dispatch_handler (ctx=0x55871598d5e0, node=0x558716771000) at ../util/aio-posix.c:372
> #13 0x55871323d988 in aio_dispatch_ready_handlers (ctx=0x55871598d5e0, ready_list=0x7ff7f288eeb0) at ../util/aio-posix.c:401
> #14 0x55871323e3ff in aio_poll (ctx=0x55871598d5e0, blocking=true) at ../util/aio-posix.c:723
> #15 0x558713089ef
Re: deadlock when using iothread during backup_clean()
Am 06.10.23 um 14:18 schrieb Fiona Ebner:
> Am 04.10.23 um 19:08 schrieb Vladimir Sementsov-Ogievskiy:
>> On 28.09.23 11:06, Fiona Ebner wrote:
>>> For fixing the backup cancel deadlock, I tried the following:
>>>
>>> [diffs to blockjob.c and block.c snipped]
>>>
>>> often guest IO would get completely stuck after canceling the backup.
>>> There's nothing obvious to me in the backtraces at that point: the
>>> vCPU and main threads seem to be running as usual, while the IO thread
>>> is stuck in aio_poll(), i.e. it never returns from the __ppoll() call.
>>> This would happen with both a VirtIO SCSI and a VirtIO block disk, and
>>> with both aio=io_uring and aio=threads.
>>
>> When IO is stuck, it may be helpful to look at bs->tracked_requests list
>> in gdb. If there are requests, each one has field .co, which points to
>> the coroutine of request.
> After the IO was stuck in the guest, I used bdrv_next_all_states() to
> iterate over the states, and there's only the bdrv_raw and the
> bdrv_host_device. For both, tracked_requests was empty.
>
> What is also very interesting is that the IO isn't always dead
> immediately. It can be that the fio command still runs with lower speed
> for a while (sometimes even up to about a minute, but most often about
> 10-15 seconds or so). During that time, I can still see calls to
> virtio_scsi_handle_cmd() and blk_aio_write_entry(). Then they suddenly stop.
>
> One single time, it got stuck polling while canceling [0]. I usually
> need to do the backup+cancel a few times, because the IO doesn't get
> stuck each time, so this was a subsequent invocation. The reentrancy
> into virtio_scsi_handle_cmd() etc. seems strange at first glance. Can
> that be normal?

I ran into similar issues now with mirror (both deadlocks and stuck guest IO at other times) and, interestingly, also during job start.

Also had a backtrace similar to [0] once, so I took a closer look. Probably was obvious to others already, but for the record:

1. the graph is locked by the main thread
2. the iothread holds the AioContext lock
3. the main thread waits on the AioContext lock
4. the iothread waits for the coroutine spawned by blk_is_available()

As for why it doesn't progress, blk_co_is_available_entry() uses bdrv_graph_co_rdlock() and can't get it, because the main thread has the write lock. This should be fixed once the AioContext locks are gone, but I'm not sure what should be done to avoid it until then.

I'm still trying to figure out what happens in the cases where the guest IO gets stuck. I should mention that I'm not experiencing the issues when disabling graph locking. It's rather obvious for the deadlocks, but I also can't reproduce the issues with stuck guest IO. Did it on top of QEMU 8.1.1, because stuff like bdrv_schedule_unref() came in later and I didn't want to mess that up :)

Best Regards,
Fiona

[0]:

> Thread 3 (Thread 0x7ff7f28946c0 (LWP 1815909) "qemu-system-x86"):
> #0  0x7ff7f5f55136 in __ppoll (fds=0x7ff7e40030c0, nfds=8, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
> #1  0x5587132615ab in qemu_poll_ns (fds=0x7ff7e40030c0, nfds=8, timeout=-1) at ../util/qemu-timer.c:339
> #2  0x55871323e8b1 in fdmon_poll_wait (ctx=0x55871598d5e0, ready_list=0x7ff7f288ebe0, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x55871323e1ed in aio_poll (ctx=0x55871598d5e0, blocking=true) at ../util/aio-posix.c:670
> #4  0x558713089efa in bdrv_poll_co (s=0x7ff7f288ec90) at /home/febner/repos/qemu/block/block-gen.h:43
Re: deadlock when using iothread during backup_clean()
Am 17.10.2023 um 12:18 hat Fiona Ebner geschrieben:
> I ran into similar issues now with mirror (both deadlocks and stuck
> guest IO at other times) and, interestingly, also during job start.
>
> Also had a backtrace similar to [0] once, so I took a closer look.
> Probably was obvious to others already, but for the record:
>
> 1. the graph is locked by the main thread
> 2. the iothread holds the AioContext lock
> 3. the main thread waits on the AioContext lock
> 4. the iothread waits for the coroutine spawned by blk_is_available()

Where does this blk_is_available() in the iothread come from? Having it wait without dropping the AioContext lock sounds like something that we'd want to avoid. Ideally, devices using iothreads shouldn't use synchronous requests at all, but I think scsi-disk might have some of them.

> As for why it doesn't progress, blk_co_is_available_entry() uses
> bdrv_graph_co_rdlock() and can't get it, because the main thread has
> the write lock. Should be fixed once the AioContext locks are gone, but
> not sure what should be done to avoid it until then.

Then the nested event loop in blk_is_available() would probably be enough to make progress, yes.

Maybe we could actually drop the lock (and immediately reacquire it) in AIO_WAIT_WHILE() even if we're in the home thread? That should give the main thread a chance to make progress.

But I think we're actually not very far from removing the AioContext lock, so if we can't find an easy fix in the meantime, waiting for that could be a realistic option.

> I'm still trying to figure out what happens in the cases wher
Re: deadlock when using iothread during backup_clean()
Am 17.10.23 um 14:12 schrieb Kevin Wolf: > Am 17.10.2023 um 12:18 hat Fiona Ebner geschrieben: >> I ran into similar issues now with mirror, (both deadlocks and stuck >> guest IO at other times), and interestingly, also during job start. >> >> Also had a backtrace similar to [0] once, so I took a closer look. >> Probably was obvious to others already, but for the record: >> >> 1. the graph is locked by the main thread >> 2. the iothread holds the AioContext lock >> 3. the main thread waits on the AioContext lock >> 4. the iothread waits for coroutine spawned by blk_is_available() > > Where does this blk_is_available() in the iothread come from? Having it > wait without dropping the AioContext lock sounds like something that > we'd want to avoid. Ideally, devices using iothreads shouldn't use > synchronous requests at all, but I think scsi-disk might have some of > them. > It's part of the request handling in virtio-scsi: > #0 0x7ff7f5f55136 in __ppoll (fds=0x7ff7e40030c0, nfds=8, > timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42 > #1 0x5587132615ab in qemu_poll_ns (fds=0x7ff7e40030c0, nfds=8, > timeout=-1) at ../util/qemu-timer.c:339 > #2 0x55871323e8b1 in fdmon_poll_wait (ctx=0x55871598d5e0, > ready_list=0x7ff7f288ebe0, timeout=-1) at ../util/fdmon-poll.c:79 > #3 0x55871323e1ed in aio_poll (ctx=0x55871598d5e0, blocking=true) at > ../util/aio-posix.c:670 > #4 0x558713089efa in bdrv_poll_co (s=0x7ff7f288ec90) at > /home/febner/repos/qemu/block/block-gen.h:43 > #5 0x55871308c362 in blk_is_available (blk=0x55871599e2f0) at > block/block-gen.c:1426 > #6 0x558712f6843b in virtio_scsi_ctx_check (s=0x558716c049c0, > d=0x55871581cd30) at ../hw/scsi/virtio-scsi.c:290 > #7 0x558712f697e4 in virtio_scsi_handle_cmd_req_prepare > (s=0x558716c049c0, req=0x7ff7e400b650) at ../hw/scsi/virtio-scsi.c:788 > #8 0x558712f699b0 in virtio_scsi_handle_cmd_vq (s=0x558716c049c0, > vq=0x558716c0d2a8) at ../hw/scsi/virtio-scsi.c:831 > #9 0x558712f69bcb in 
virtio_scsi_handle_cmd (vdev=0x558716c049c0, > vq=0x558716c0d2a8) at ../hw/scsi/virtio-scsi.c:867 > #10 0x558712f96812 in virtio_queue_notify_vq (vq=0x558716c0d2a8) at > ../hw/virtio/virtio.c:2263 > #11 0x558712f99b75 in virtio_queue_host_notifier_read (n=0x558716c0d31c) > at ../hw/virtio/virtio.c:3575 > #12 0x55871323d8b5 in aio_dispatch_handler (ctx=0x55871598d5e0, > node=0x558716771000) at ../util/aio-posix.c:372 > #13 0x55871323d988 in aio_dispatch_ready_handlers (ctx=0x55871598d5e0, > ready_list=0x7ff7f288eeb0) at ../util/aio-posix.c:401 >> As for why it doesn't progress, blk_co_is_available_entry() uses >> bdrv_graph_co_rdlock() and can't get it, because the main thread has the >> write lock. Should be fixed once the AioContext locks are gone, but not >> sure what should be done to avoid it until then. > > Then the nested event loop in blk_is_available() would probably be > enough to make progress, yes. > > Maybe we could actually drop the lock (and immediately reacquire it) in > AIO_WAIT_WHILE() even if we're in the home thread? That should give the > main thread a chance to make progress. > Seems to work :) I haven't run into the issue with the following change anymore, but I have to say, running into that specific deadlock only happened every 10-15 tries or so before. Did 30 tests now. But unfortunately, the stuck IO issue is still there. 
> diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
> index 5449b6d742..da159501ca 100644
> --- a/include/block/aio-wait.h
> +++ b/include/block/aio-wait.h
> @@ -88,7 +88,13 @@ extern AioWait global_aio_wait;
>      smp_mb__after_rmw();                               \
>      if (ctx_ && in_aio_context_home_thread(ctx_)) {    \
>          while ((cond)) {                               \
> +            if (unlock && ctx_) {                      \
> +                aio_context_release(ctx_);             \
> +            }                                          \
>              aio_poll(ctx_, true);                      \
> +            if (unlock && ctx_) {                      \
> +                aio_context_acquire(ctx_);             \
> +            }                                          \
>              waited_ = true;                            \
>          }                                              \
>      } else {                                           \
>
> But I think we're actually not very far from removing the AioContext
> lock, so if we can't find an easy fix in the meantime, waiting for that
> could be a realistic option.

Glad to hear :) Do you think it will be in time for QEMU 8.2? Otherwise,
I'll go ahead and send what I have for fixing the deadlocks from this
mail thread in the following days. The stuck guest IO can happen even with
Re: deadlock when using iothread during backup_clean()
On 17.10.2023 at 15:37, Fiona Ebner wrote:
> On 17.10.23 at 14:12, Kevin Wolf wrote:
>> On 17.10.2023 at 12:18, Fiona Ebner wrote:
>>> I ran into similar issues now with mirror (both deadlocks and stuck
>>> guest IO at other times) and, interestingly, also during job start.
>>>
>>> Also had a backtrace similar to [0] once, so I took a closer look.
>>> Probably was obvious to others already, but for the record:
>>>
>>> 1. the graph is locked by the main thread
>>> 2. the iothread holds the AioContext lock
>>> 3. the main thread waits on the AioContext lock
>>> 4. the iothread waits for the coroutine spawned by blk_is_available()
>>
>> Where does this blk_is_available() in the iothread come from? Having it
>> wait without dropping the AioContext lock sounds like something that
>> we'd want to avoid. Ideally, devices using iothreads shouldn't use
>> synchronous requests at all, but I think scsi-disk might have some of
>> them.
>
> It's part of the request handling in virtio-scsi:
>
>> #0  0x7ff7f5f55136 in __ppoll (fds=0x7ff7e40030c0, nfds=8, timeout=,
>>     sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
>> (...)
>> #4  0x558713089efa in bdrv_poll_co (s=0x7ff7f288ec90) at
>>     /home/febner/repos/qemu/block/block-gen.h:43
>> #5  0x55871308c362 in blk_is_available (blk=0x55871599e2f0) at
>>     block/block-gen.c:1426
>> #6  0x558712f6843b in virtio_scsi_ctx_check (s=0x558716c049c0,
>>     d=0x55871581cd30) at ../hw/scsi/virtio-scsi.c:290

Oh... So essentially for an assertion.

I wonder if the blk_is_available() check introduced in 2a2d69f490c is
even necessary any more, because BlockBackend has its own AioContext
now. And if blk_bs(blk) != NULL isn't what we actually want to check if
the check is necessary, because calling bdrv_is_inserted() doesn't seem
to have been intended. blk_bs() wouldn't have to poll.

>> #7  0x558712f697e4 in virtio_scsi_handle_cmd_req_prepare
>>     (s=0x558716c049c0, req=0x7ff7e400b650) at ../hw/scsi/virtio-scsi.c:788
>> (...)
>> #13 0x55871323d988 in aio_dispatch_ready_handlers (ctx=0x55871598d5e0,
>>     ready_list=0x7ff7f288eeb0) at ../util/aio-posix.c:401
>
>>> As for why it doesn't progress, blk_co_is_available_entry() uses
>>> bdrv_graph_co_rdlock() and can't get it, because the main thread has the
>>> write lock. Should be fixed once the AioContext locks are gone, but not
>>> sure what should be done to avoid it until then.
>>
>> Then the nested event loop in blk_is_available() would probably be
>> enough to make progress, yes.
>>
>> Maybe we could actually drop the lock (and immediately reacquire it) in
>> AIO_WAIT_WHILE() even if we're in the home thread? That should give the
>> main thread a chance to make progress.
>
> Seems to work :) I haven't run into the issue with the following change
> anymore, but I have to say, running into that specific deadlock only
> happened every 10-15 tries or so before. Did 30 tests now. But
> unfortunately, the stuck IO issue is still there.
>
>> diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
>> index 5449b6d742..da159501ca 100644
>> (...)
Re: deadlock when using iothread during backup_clean()
On 17.10.23 at 16:20, Kevin Wolf wrote:
> On 17.10.2023 at 15:37, Fiona Ebner wrote:
>> (...)
>>
>> It's part of the request handling in virtio-scsi:
>>
>>> (...)
>>> #5  0x55871308c362 in blk_is_available (blk=0x55871599e2f0) at
>>>     block/block-gen.c:1426
>>> #6  0x558712f6843b in virtio_scsi_ctx_check (s=0x558716c049c0,
>>>     d=0x55871581cd30) at ../hw/scsi/virtio-scsi.c:290
>
> Oh... So essentially for an assertion.
>
> I wonder if the blk_is_available() check introduced in 2a2d69f490c is
> even necessary any more, because BlockBackend has its own AioContext
> now. And if blk_bs(blk) != NULL isn't what we actually want to check if
> the check is necessary, because calling bdrv_is_inserted() doesn't seem
> to have been intended. blk_bs() wouldn't have to poll.

Could virtio_scsi_hotunplug() be an issue with removing or modifying
the check? There's a call there which sets the blk's AioContext to
qemu_get_aio_context(). Or are we sure that the assert in
virtio_scsi_ctx_check() can't be reached after that?

> (...)
Re: deadlock when using iothread during backup_clean()
On 18.10.2023 at 11:42, Fiona Ebner wrote:
> On 17.10.23 at 16:20, Kevin Wolf wrote:
>> (...)
>>
>> I wonder if the blk_is_available() check introduced in 2a2d69f490c is
>> even necessary any more, because BlockBackend has its own AioContext
>> now. And if blk_bs(blk) != NULL isn't what we actually want to check if
>> the check is necessary, because calling bdrv_is_inserted() doesn't seem
>> to have been intended. blk_bs() wouldn't have to poll.
>
> Could virtio_scsi_hotunplug() be an issue with removing or modifying
> the check? There's a call there which sets the blk's AioContext to
> qemu_get_aio_context(). Or are we sure that the assert in
> virtio_scsi_ctx_check() can't be reached after that?

I think that would be the kind of bug that the assertion tries to
catch, because then we would be sending requests to blk from a thread
that doesn't match its AioContext (which will be allowed soon, but not
quite yet).

Before resetting the AioContext, virtio_scsi_hotunplug() calls
qdev_simple_device_unplug_cb(), which unrealizes the SCSI device. This
calls scsi_qdev_unrealize() -> scsi_device_purge_requests(), which in
turn drains blk and cancels all pending requests. So there should be
nothing left that could call into virtio_scsi_ctx_check() any more.

The other argument is that after unrealize, virtio_scsi_device_get()
would return NULL anyway, so even if a request were still pending, it
would just fail instead of accessing the unplugged device.

(...)
Re: deadlock when using iothread during backup_clean()
On 19.10.23 at 14:14, Kevin Wolf wrote:
> On 18.10.2023 at 11:42, Fiona Ebner wrote:
>> (...)
>>
>> Could virtio_scsi_hotunplug() be an issue with removing or modifying
>> the check? There's a call there which sets the blk's AioContext to
>> qemu_get_aio_context(). Or are we sure that the assert in
>> virtio_scsi_ctx_check() can't be reached after that?
>
> I think that would be the kind of bug that the assertion tries to
> catch, because then we would be sending requests to blk from a thread
> that doesn't match its AioContext (which will be allowed soon, but not
> quite yet).
>
> Before resetting the AioContext, virtio_scsi_hotunplug() calls
> qdev_simple_device_unplug_cb(), which unrealizes the SCSI device. This
> calls scsi_qdev_unrealize() -> scsi_device_purge_requests(), which in
> turn drains blk and cancels all pending requests. So there should be
> nothing left that could call into virtio_scsi_ctx_check() any more.
>
> The other argument is that after unrealize, virtio_scsi_device_get()
> would return NULL anyway, so even if a request were still pending, it
> would just fail instead of accessing the unplugged device.

Okay, sounds like a way to get around that deadlock issue then :)

(...)

>>> What does the stuck I/O look like? Is it stuck in the backend, i.e.
>>> the device started requests that never complete? Or stuck from the
>>> guest perspective, i.e. the device never checks for new requests?
>>
>> AFAICT, from the guest perspective.
>>
>>> I don't really have an idea immediately, we'd have to find out where
>>> the stuck I/O stops being processed.
>>
>> I've described it in an earlier mail in this thread:
>> https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg01900.html
>>
>> Quoting from there:
>>
>>> After the IO was stuck in the guest, I used bdrv_next_all_states() to
>>> iterate over the states and there's only the bdrv_raw and the
>>> bdrv_host_device. For both, tracked_requests was empty.
>
> And bs->in_flight and blk->in_flight are 0, too?

Yes. And queued_requests in the BlockBackend is also empty.

> Is anything quiesced?

No. quiesce_counter is 0 for both BlockDriverState instances as well as
for the BlockBackend. quiesced_parent is false for both parents (i.e.
child_root for the bdrv_raw and child_of_bds for the bdrv_file (this
time I used VirtIO SCSI, in the quote it was VirtIO block)).

>>> What is also very interesting is that the IO isn't always dead
>>> immediately. It can be that the fio command still runs with lower speed
>>> for a while (sometimes even up to about a minute,
Re: deadlock when using iothread during backup_clean()
On 19.10.23 at 15:53, Fiona Ebner wrote:
> On 19.10.23 at 14:14, Kevin Wolf wrote:
>> (...)
>>
>> Before resetting the AioContext, virtio_scsi_hotunplug() calls
>> qdev_simple_device_unplug_cb(), which unrealizes the SCSI device. This
>> calls scsi_qdev_unrealize() -> scsi_device_purge_requests(), which in
>> turn drains blk and cancels all pending requests. So there should be
>> nothing left that could call into virtio_scsi_ctx_check() any more.
>>
>> The other argument is that after unrealize, virtio_scsi_device_get()
>> would return NULL anyway, so even if a request were still pending, it
>> would just fail instead of accessing the unplugged device.
>
> Okay, sounds like a way to get around that deadlock issue then :)

Unfortunately, scsi_dma_command() also has a blk_is_available() call and
I ran into a similar deadlock with that once.

> (...)
Re: deadlock when using iothread during backup_clean()
On 20.10.23 at 15:52, Fiona Ebner wrote:
> And I found that with drive-mirror, the issue during starting seems to
> manifest with the bdrv_open() call. Adding a return before it, the guest
> IO didn't get stuck in my testing, but adding a return after it, it can
> get stuck. I'll try to see if I can further narrow it down next week,
> but maybe that's already a useful hint?

In the end, I was able to find a reproducer that just does draining and
bisected the issue (doesn't seem related to the graph lock after all).

I replied there, to avoid all the overhead from this thread:
https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg00681.html

Best Regards,
Fiona