deadlock when using iothread during backup_clean()
Hi,

as the title says, a deadlock is possible in backup_clean() when using a drive with an IO thread. The main thread keeps waiting in bdrv_graph_wrlock() for reader_count() to become 0, while the IO thread waits for its AioContext lock, which the main thread holds (because it is also the backup job's AioContext). See the stack traces below [1].

The reason why it happens becomes clear with the following commit message:

> commit 31b2ddfea304afc498aca8cac171020ef33eb89b
> Author: Kevin Wolf
> Date: Mon Jun 5 10:57:10 2023 +0200
>
>     graph-lock: Unlock the AioContext while polling
>
>     If the caller keeps the AioContext lock for a block node in an iothread,
>     polling in bdrv_graph_wrlock() deadlocks if the condition isn't
>     fulfilled immediately.
>
>     Now that all callers make sure to actually have the AioContext locked
>     when they call bdrv_replace_child_noperm() like they should, we can
>     change bdrv_graph_wrlock() to take a BlockDriverState whose AioContext
>     lock the caller holds (NULL if it doesn't) and unlock it temporarily
>     while polling.

and by noting that bdrv_graph_wrlock() is called with bs being NULL while the caller holds the IO thread's AioContext lock.

The question is: where should the IO thread's AioContext lock be dropped by the main thread? The "Graph locking part 4 (node management)" series [0] (I also reproduced the deadlock with that applied) moves the bdrv_graph_wrlock() further up to block_job_remove_all_bdrv(), but it still passes NULL as the argument. Can we assume block_job_remove_all_bdrv() to always hold the job's AioContext? And if yes, can we just tell bdrv_graph_wrlock() that it needs to release that lock before polling to fix the deadlock?
Best Regards,
Fiona

[0]: https://lists.nongnu.org/archive/html/qemu-devel/2023-08/msg02831.html

[1]:

IO thread:

> Thread 3 (Thread 0x7fa2209596c0 (LWP 119031) "qemu-system-x86"):
> #0  futex_wait (private=0, expected=2, futex_word=0x5648fc93b2c0) at ../sysdeps/nptl/futex-internal.h:146
> #1  __GI___lll_lock_wait (futex=futex@entry=0x5648fc93b2c0, private=0) at ./nptl/lowlevellock.c:49
> #2  0x7fa2240e532a in lll_mutex_lock_optimized (mutex=0x5648fc93b2c0) at ./nptl/pthread_mutex_lock.c:48
> #3  ___pthread_mutex_lock (mutex=0x5648fc93b2c0) at ./nptl/pthread_mutex_lock.c:128
> #4  0x5648fa1742f8 in qemu_mutex_lock_impl (mutex=0x5648fc93b2c0, file=0x5648fa446579 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:94
> #5  0x5648fa174528 in qemu_rec_mutex_lock_impl (mutex=0x5648fc93b2c0, file=0x5648fa446579 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:149
> #6  0x5648fa18dd6f in aio_context_acquire (ctx=0x5648fc93b260) at ../util/async.c:728
> #7  0x5648fa18dce4 in aio_co_enter (ctx=0x5648fc93b260, co=0x7fa217f49210) at ../util/async.c:710
> #8  0x5648fa18dc31 in aio_co_wake (co=0x7fa217f49210) at ../util/async.c:695
> #9  0x5648fa08e4a2 in luring_process_completions (s=0x5648fdf2dd00) at ../block/io_uring.c:216
> #10 0x5648fa08e6c7 in luring_process_completions_and_submit (s=0x5648fdf2dd00) at ../block/io_uring.c:268
> #11 0x5648fa08e727 in qemu_luring_completion_cb (opaque=0x5648fdf2dd00) at ../block/io_uring.c:284
> #12 0x5648fa16f433 in aio_dispatch_handler (ctx=0x5648fc93b260, node=0x5648fdf2de50) at ../util/aio-posix.c:372
> #13 0x5648fa16f506 in aio_dispatch_ready_handlers (ctx=0x5648fc93b260, ready_list=0x7fa220954180) at ../util/aio-posix.c:401
> #14 0x5648fa16ff7d in aio_poll (ctx=0x5648fc93b260, blocking=true) at ../util/aio-posix.c:723
> #15 0x5648f9fbaa59 in iothread_run (opaque=0x5648fc5ed6b0) at ../iothread.c:63
> #16 0x5648fa175046 in qemu_thread_start (args=0x5648fc93b8f0) at ../util/qemu-thread-posix.c:541
> #17 0x7fa2240e2044 in start_thread (arg=) at ./nptl/pthread_create.c:442
> #18 0x7fa2241625fc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

main thread:

> Thread 1 (Thread 0x7fa2214bf340 (LWP 119029) "qemu-system-x86"):
> #0  0x7fa224155136 in __ppoll (fds=0x5648fdf2ce50, nfds=2, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
> #1  0x5648fa193102 in qemu_poll_ns (fds=0x5648fdf2ce50, nfds=2, timeout=-1) at ../util/qemu-timer.c:339
> #2  0x5648fa17042f in fdmon_poll_wait (ctx=0x5648fc6c9740, ready_list=0x72757260, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x5648fa16fd6b in aio_poll (ctx=0x5648fc6c9740, blocking=true) at ../util/aio-posix.c:670
> #4  0x5648f9ffc0bc in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:145
> #5  0x5648f9fc78fa in bdrv_replace_child_noperm (child=0x5648fd729fd0, new_bs=0x0) at ../block.c:2897
> #6  0x5648f9fc8487 in bdrv_root_unref_child (child=0x5648fd729fd0) at ../block.c:3227
> #7  0x5648f9fd4b50 in block_job_remove_all_bdrv (job=0x5648fe0b6960) at ../blockjob.c:208
> #8  0x5648f9feb11b in bac
Re: deadlock when using iothread during backup_clean()
Am 05.09.23 um 12:01 schrieb Fiona Ebner:
>
> Can we assume block_job_remove_all_bdrv() to always hold the job's
> AioContext? And if yes, can we just tell bdrv_graph_wrlock() that it
> needs to release that before polling to fix the deadlock?
>

I tried doing something similar as a proof of concept:

> diff --git a/blockjob.c b/blockjob.c
> index 58c5d64539..1a696241a0 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -198,19 +198,19 @@ void block_job_remove_all_bdrv(BlockJob *job)
>       * one to make sure that such a concurrent access does not attempt
>       * to process an already freed BdrvChild.
>       */
> -    bdrv_graph_wrlock(NULL);
>     while (job->nodes) {
>         GSList *l = job->nodes;
>         BdrvChild *c = l->data;
>
>         job->nodes = l->next;
>
> +        bdrv_graph_wrlock(c->bs);
>         bdrv_op_unblock_all(c->bs, job->blocker);
>         bdrv_root_unref_child(c);
> +        bdrv_graph_wrunlock();
>
>         g_slist_free_1(l);
>     }
> -    bdrv_graph_wrunlock();
> }

and while it did get slightly further, I ran into another deadlock with:

> #0  0x7f1941155136 in __ppoll (fds=0x55992068fb20, nfds=2, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
> #1  0x55991c6a1a3f in qemu_poll_ns (fds=0x55992068fb20, nfds=2, timeout=-1) at ../util/qemu-timer.c:339
> #2  0x55991c67ed6c in fdmon_poll_wait (ctx=0x55991f058810, ready_list=0x7ffda8c987b0, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x55991c67e6a8 in aio_poll (ctx=0x55991f058810, blocking=true) at ../util/aio-posix.c:670
> #4  0x55991c50a763 in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:145
> #5  0x55991c4daf85 in bdrv_close (bs=0x55991fff2f30) at ../block.c:5166
> #6  0x55991c4dc050 in bdrv_delete (bs=0x55991fff2f30) at ../block.c:5606
> #7  0x55991c4df205 in bdrv_unref (bs=0x55991fff2f30) at ../block.c:7173
> #8  0x55991c4fb8ca in bdrv_cbw_drop (bs=0x55991fff2f30) at ../block/copy-before-write.c:566
> #9  0x55991c4f9685 in backup_clean (job=0x55992016d0b0) at ../block/backup.c:105
Re: deadlock when using iothread during backup_clean()
On 9/5/23 12:01, Fiona Ebner wrote:
> Can we assume block_job_remove_all_bdrv() to always hold the job's
> AioContext?

I think so, see job_unref_locked(), job_prepare_locked() and job_finalize_single_locked(). These call the callbacks that ultimately get to block_job_remove_all_bdrv().

> And if yes, can we just tell bdrv_graph_wrlock() that it needs to
> release that before polling to fix the deadlock?

No, but I think it should be released and re-acquired in block_job_remove_all_bdrv() itself. mirror_exit_common(), however, holds _two_ AioContext locks at the time it calls block_job_remove_all_bdrv(), so the lock for qemu_get_aio_context() has to be released and reacquired in mirror_exit_common() itself.

Paolo
Re: deadlock when using iothread during backup_clean()
Am 05.09.23 um 13:42 schrieb Paolo Bonzini:
> On 9/5/23 12:01, Fiona Ebner wrote:
>> Can we assume block_job_remove_all_bdrv() to always hold the job's
>> AioContext?
>
> I think so, see job_unref_locked(), job_prepare_locked() and
> job_finalize_single_locked(). These call the callbacks that ultimately
> get to block_job_remove_all_bdrv().
>
>> And if yes, can we just tell bdrv_graph_wrlock() that it
>> needs to release that before polling to fix the deadlock?
>
> No, but I think it should be released and re-acquired in
> block_job_remove_all_bdrv() itself.
>

For fixing the backup cancel deadlock, I tried the following:

> diff --git a/blockjob.c b/blockjob.c
> index 58c5d64539..fd6132ebfe 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -198,7 +198,9 @@ void block_job_remove_all_bdrv(BlockJob *job)
>       * one to make sure that such a concurrent access does not attempt
>       * to process an already freed BdrvChild.
>       */
> +    aio_context_release(job->job.aio_context);
>     bdrv_graph_wrlock(NULL);
> +    aio_context_acquire(job->job.aio_context);
>     while (job->nodes) {
>         GSList *l = job->nodes;
>         BdrvChild *c = l->data;

but it's not enough unfortunately. And I don't just mean the later deadlock during bdrv_close() (via bdrv_cbw_drop()) mentioned in the other mail.

Even when I got lucky and that deadlock didn't trigger by chance, or with an additional change to try and avoid that one:

> diff --git a/block.c b/block.c
> index e7f349b25c..02d2c4e777 100644
> --- a/block.c
> +++ b/block.c
> @@ -5165,7 +5165,7 @@ static void bdrv_close(BlockDriverState *bs)
>         bs->drv = NULL;
>     }
>
> -    bdrv_graph_wrlock(NULL);
> +    bdrv_graph_wrlock(bs);
>     QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
>         bdrv_unref_child(bs, child);
>     }

guest IO would often get completely stuck after canceling the backup. There's nothing obvious to me in the backtraces at that point: the vCPU and main threads seem to be running as usual, while the IO thread is stuck in aio_poll(), i.e. it never returns from the __ppoll() call. This would happen with both a VirtIO SCSI and a VirtIO block disk, and with both aio=io_uring and aio=threads.

I should also mention I'm using

> fio --name=file --size=4k --direct=1 --rw=randwrite --bs=4k --ioengine=psync --numjobs=5 --runtime=6000 --time_based

inside the guest during canceling of the backup. I'd be glad for any pointers on what to look for and happy to provide more information.

Best Regards,
Fiona
Re: deadlock when using iothread during backup_clean()
On 05.09.23 14:25, Fiona Ebner wrote:
> Am 05.09.23 um 12:01 schrieb Fiona Ebner:
>> Can we assume block_job_remove_all_bdrv() to always hold the job's
>> AioContext? And if yes, can we just tell bdrv_graph_wrlock() that it
>> needs to release that before polling to fix the deadlock?
>
> I tried doing something similar as a proof of concept
>
> [diff to blockjob.c snipped]
>
> and while it did get slightly further, I ran into another deadlock with:
>
> #0  0x7f1941155136 in __ppoll (fds=0x55992068fb20, nfds=2, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
> #1  0x55991c6a1a3f in qemu_poll_ns (fds=0x55992068fb20, nfds=2, timeout=-1) at ../util/qemu-timer.c:339
> #2  0x55991c67ed6c in fdmon_poll_wait (ctx=0x55991f058810, ready_list=0x7ffda8c987b0, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x55991c67e6a8 in aio_poll (ctx=0x55991f058810, blocking=true) at ../util/aio-posix.c:670
> #4  0x55991c50a763 in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:145

Interesting, why do we pass NULL to bdrv_graph_wrlock() in bdrv_close()? Shouldn't bdrv_close() be called with the bs AioContext locked?

> #5  0x55991c4daf85 in bdrv_close (bs=0x55991fff2f30) at ../block.c:5166
> #6  0x55991c4dc050 in bdrv_delete (bs=0x55991fff2f30) at ../block.c:5606
> #7  0x55991c4df205 in bdrv_unref (bs=0x55991fff2f30) at ../block.c:7173
> #8  0x55991c4fb8ca in bdrv_cbw_drop (bs=0x55991fff2f30) at ../block/copy-before-write.c:566
> #9  0x55991c4f9685 in backup_clean (job=0x55992016d0b0) at ../block/backup.c:105

--
Best regards,
Vladimir
Re: deadlock when using iothread during backup_clean()
On 28.09.23 11:06, Fiona Ebner wrote:
> Am 05.09.23 um 13:42 schrieb Paolo Bonzini:
>> No, but I think it should be released and re-acquired in
>> block_job_remove_all_bdrv() itself.
>
> For fixing the backup cancel deadlock, I tried the following:
>
> [diffs to blockjob.c and block.c snipped]
>
> but it's not enough unfortunately. [...] often guest IO would get
> completely stuck after canceling the backup. There's nothing obvious to
> me in the backtraces at that point: the vCPU and main threads seem to be
> running as usual, while the IO thread is stuck in aio_poll(), i.e. it
> never returns from the __ppoll() call. This would happen with both a
> VirtIO SCSI and a VirtIO block disk, and with both aio=io_uring and
> aio=threads.

When IO is stuck, it may be helpful to look at the bs->tracked_requests list in gdb. If there are requests, each one has a field .co, which points to the coroutine of the request. The next step is to look at the coroutine stack. Something like (in gdb):

  source scripts/qemu-gdb.py
  qemu coroutine

may work. ("may", because it was long ago when I used this last time)

> I should also mention I'm using
>
>> fio --name=file --size=4k --direct=1 --rw=randwrite --bs=4k --ioengine=psync --numjobs=5 --runtime=6000 --time_based
>
> inside the guest during canceling of the backup. I'd be glad for any
> pointers on what to look for and happy to provide more information.
>
> Best Regards,
> Fiona

--
Best regards,
Vladimir
Re: deadlock when using iothread during backup_clean()
Am 04.10.23 um 19:08 schrieb Vladimir Sementsov-Ogievskiy:
> On 28.09.23 11:06, Fiona Ebner wrote:
>> For fixing the backup cancel deadlock, I tried the following:
>>
>>> diff --git a/blockjob.c b/blockjob.c
>>> index 58c5d64539..fd6132ebfe 100644
>>> --- a/blockjob.c
>>> +++ b/blockjob.c
>>> @@ -198,7 +198,9 @@ void block_job_remove_all_bdrv(BlockJob *job)
>>>      * one to make sure that such a concurrent access does not attempt
>>>      * to process an already freed BdrvChild.
>>>      */
>>> +    aio_context_release(job->job.aio_context);
>>>     bdrv_graph_wrlock(NULL);
>>> +    aio_context_acquire(job->job.aio_context);
>>>     while (job->nodes) {
>>>         GSList *l = job->nodes;
>>>         BdrvChild *c = l->data;
>>
>> but it's not enough unfortunately. And I don't just mean with the later
>> deadlock during bdrv_close() (via bdrv_cbw_drop()) as mentioned in the
>> other mail.
>>
>> Even when I got lucky and that deadlock didn't trigger by chance or with
>> an additional change to try and avoid that one
>>
>>> diff --git a/block.c b/block.c
>>> index e7f349b25c..02d2c4e777 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -5165,7 +5165,7 @@ static void bdrv_close(BlockDriverState *bs)
>>>         bs->drv = NULL;
>>>     }
>>>
>>> -    bdrv_graph_wrlock(NULL);
>>> +    bdrv_graph_wrlock(bs);
>>>     QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
>>>         bdrv_unref_child(bs, child);
>>>     }
>>
>> often guest IO would get completely stuck after canceling the backup.
>> There's nothing obvious to me in the backtraces at that point, but it
>> seems the vCPU and main threads are running as usual, while the IO
>> thread is stuck in aio_poll(), i.e. never returns from the __ppoll()
>> call.
>
> When IO is stuck, it may be helpful to look at bs->tracked_requests list
> in gdb. If there are requests, each one has field .co, which points to
> the coroutine of request.
After the IO was stuck in the guest, I used bdrv_next_all_states() to iterate over the states, and there's only the bdrv_raw and the bdrv_host_device. For both, tracked_requests was empty.

What is also very interesting is that the IO isn't always dead immediately. It can be that the fio command still runs with lower speed for a while (sometimes even up to about a minute, but most often about 10-15 seconds or so). During that time, I can still see calls to virtio_scsi_handle_cmd() and blk_aio_write_entry(). Then they suddenly stop.

>> I should also mention I'm using
>>
>>> fio --name=file --size=4k --direct=1 --rw=randwrite --bs=4k --ioengine=psync --numjobs=5 --runtime=6000 --time_based
>>
>> inside the guest during canceling of the backup.

One single time, it got stuck polling while canceling [0]. I usually need to do the backup+cancel a few times, because the IO doesn't get stuck each time, so this was a subsequent invocation. The reentrancy into virtio_scsi_handle_cmd() etc. seems strange at first glance. Can that be normal?
Best Regards,
Fiona

[0]:

> Thread 3 (Thread 0x7ff7f28946c0 (LWP 1815909) "qemu-system-x86"):
> #0  0x7ff7f5f55136 in __ppoll (fds=0x7ff7e40030c0, nfds=8, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
> #1  0x5587132615ab in qemu_poll_ns (fds=0x7ff7e40030c0, nfds=8, timeout=-1) at ../util/qemu-timer.c:339
> #2  0x55871323e8b1 in fdmon_poll_wait (ctx=0x55871598d5e0, ready_list=0x7ff7f288ebe0, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x55871323e1ed in aio_poll (ctx=0x55871598d5e0, blocking=true) at ../util/aio-posix.c:670
> #4  0x558713089efa in bdrv_poll_co (s=0x7ff7f288ec90) at /home/febner/repos/qemu/block/block-gen.h:43
> #5  0x55871308c362 in blk_is_available (blk=0x55871599e2f0) at block/block-gen.c:1426
> #6  0x558712f6843b in virtio_scsi_ctx_check (s=0x558716c049c0, d=0x55871581cd30) at ../hw/scsi/virtio-scsi.c:290
> #7  0x558712f697e4 in virtio_scsi_handle_cmd_req_prepare (s=0x558716c049c0, req=0x7ff7e400b650) at ../hw/scsi/virtio-scsi.c:788
> #8  0x558712f699b0 in virtio_scsi_handle_cmd_vq (s=0x558716c049c0, vq=0x558716c0d2a8) at ../hw/scsi/virtio-scsi.c:831
> #9  0x558712f69bcb in virtio_scsi_handle_cmd (vdev=0x558716c049c0, vq=0x558716c0d2a8) at ../hw/scsi/virtio-scsi.c:867
> #10 0x558712f96812 in virtio_queue_notify_vq (vq=0x558716c0d2a8) at ../hw/virtio/virtio.c:2263
> #11 0x558712f99b75 in virtio_queue_host_notifier_read (n=0x558716c0d31c) at ../hw/virtio/virtio.c:3575
> #12 0x55871323d8b5 in aio_dispatch_handler (ctx=0x55871598d5e0, node=0x558716771000) at ../util/aio-posix.c:372
> #13 0x55871323d988 in aio_dispatch_ready_handlers (ctx=0x55871598d5e0, ready_list=0x7ff7f288eeb0) at ../util/aio-posix.c:401
> #14 0x55871323e3ff in aio_poll (ctx=0x55871598d5e0, blocking=true) at ../util/aio-posix.c:723
> #15 0x558713089ef
Re: deadlock when using iothread during backup_clean()
Am 06.10.23 um 14:18 schrieb Fiona Ebner:
> Am 04.10.23 um 19:08 schrieb Vladimir Sementsov-Ogievskiy:
>> On 28.09.23 11:06, Fiona Ebner wrote:
>>> For fixing the backup cancel deadlock, I tried the following:
>>>
>>> [diffs to blockjob.c and block.c snipped]
>>>
>>> often guest IO would get completely stuck after canceling the backup.
>>> There's nothing obvious to me in the backtraces at that point: the
>>> vCPU and main threads seem to be running as usual, while the IO thread
>>> is stuck in aio_poll(), i.e. it never returns from the __ppoll() call.
>>> This would happen with both a VirtIO SCSI and a VirtIO block disk, and
>>> with both aio=io_uring and aio=threads.
>>
>> When IO is stuck, it may be helpful to look at bs->tracked_requests list
>> in gdb. If there are requests, each one has field .co, which points to
>> the coroutine of request.
> After the IO was stuck in the guest, I used bdrv_next_all_states() to
> iterate over the states, and there's only the bdrv_raw and the
> bdrv_host_device. For both, tracked_requests was empty.
>
> What is also very interesting is that the IO isn't always dead
> immediately. It can be that the fio command still runs with lower speed
> for a while (sometimes even up to about a minute, but most often about
> 10-15 seconds or so). During that time, I can still see calls to
> virtio_scsi_handle_cmd() and blk_aio_write_entry(). Then they suddenly stop.
>
> One single time, it got stuck polling while canceling [0]. I usually
> need to do the backup+cancel a few times, because the IO doesn't get
> stuck each time, so this was a subsequent invocation. The reentrancy
> into virtio_scsi_handle_cmd() etc. seems strange at first glance. Can
> that be normal?

I ran into similar issues now with mirror (both deadlocks and stuck guest IO at other times) and, interestingly, also during job start.

Also had a backtrace similar to [0] once, so I took a closer look. Probably was obvious to others already, but for the record:

1. the graph is locked by the main thread
2. the iothread holds the AioContext lock
3. the main thread waits on the AioContext lock
4. the iothread waits for the coroutine spawned by blk_is_available()

As for why it doesn't progress, blk_co_is_available_entry() uses bdrv_graph_co_rdlock() and can't get it, because the main thread has the write lock. This should be fixed once the AioContext locks are gone, but I'm not sure what should be done to avoid it until then.

I'm still trying to figure out what happens in the cases where the guest IO gets stuck. I should mention that I'm not experiencing the issues when disabling graph locking. It's rather obvious for the deadlocks, but I also can't reproduce the issues with stuck guest IO. Did it on top of QEMU 8.1.1, because stuff like bdrv_schedule_unref() came in later and I didn't want to mess that up :)

Best Regards,
Fiona

[0]:

> Thread 3 (Thread 0x7ff7f28946c0 (LWP 1815909) "qemu-system-x86"):
> #0  0x7ff7f5f55136 in __ppoll (fds=0x7ff7e40030c0, nfds=8, timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
> #1  0x5587132615ab in qemu_poll_ns (fds=0x7ff7e40030c0, nfds=8, timeout=-1) at ../util/qemu-timer.c:339
> #2  0x55871323e8b1 in fdmon_poll_wait (ctx=0x55871598d5e0, ready_list=0x7ff7f288ebe0, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x55871323e1ed in aio_poll (ctx=0x55871598d5e0, blocking=true) at ../util/aio-posix.c:670
> #4  0x558713089efa in bdrv_poll_co (s=0x7ff7f288ec90) at /home/febner/repos/qemu/block/block-gen.h:43
Re: deadlock when using iothread during backup_clean()
Am 17.10.2023 um 12:18 hat Fiona Ebner geschrieben:
> I ran into similar issues now with mirror (both deadlocks and stuck
> guest IO at other times) and, interestingly, also during job start.
>
> Also had a backtrace similar to [0] once, so I took a closer look.
> Probably was obvious to others already, but for the record:
>
> 1. the graph is locked by the main thread
> 2. the iothread holds the AioContext lock
> 3. the main thread waits on the AioContext lock
> 4. the iothread waits for the coroutine spawned by blk_is_available()

Where does this blk_is_available() in the iothread come from? Having it wait without dropping the AioContext lock sounds like something that we'd want to avoid. Ideally, devices using iothreads shouldn't use synchronous requests at all, but I think scsi-disk might have some of them.

> As for why it doesn't progress, blk_co_is_available_entry() uses
> bdrv_graph_co_rdlock() and can't get it, because the main thread has
> the write lock. Should be fixed once the AioContext locks are gone, but
> not sure what should be done to avoid it until then.

Then the nested event loop in blk_is_available() would probably be enough to make progress, yes.

Maybe we could actually drop the lock (and immediately reacquire it) in AIO_WAIT_WHILE() even if we're in the home thread? That should give the main thread a chance to make progress.

But I think we're actually not very far from removing the AioContext lock, so if we can't find an easy fix in the meantime, waiting for that could be a realistic option.

> I'm still trying to figure out what happens in the cases wher
Re: deadlock when using iothread during backup_clean()
Am 17.10.23 um 14:12 schrieb Kevin Wolf: > Am 17.10.2023 um 12:18 hat Fiona Ebner geschrieben: >> I ran into similar issues now with mirror, (both deadlocks and stuck >> guest IO at other times), and interestingly, also during job start. >> >> Also had a backtrace similar to [0] once, so I took a closer look. >> Probably was obvious to others already, but for the record: >> >> 1. the graph is locked by the main thread >> 2. the iothread holds the AioContext lock >> 3. the main thread waits on the AioContext lock >> 4. the iothread waits for coroutine spawned by blk_is_available() > > Where does this blk_is_available() in the iothread come from? Having it > wait without dropping the AioContext lock sounds like something that > we'd want to avoid. Ideally, devices using iothreads shouldn't use > synchronous requests at all, but I think scsi-disk might have some of > them. > It's part of the request handling in virtio-scsi: > #0 0x7ff7f5f55136 in __ppoll (fds=0x7ff7e40030c0, nfds=8, > timeout=, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42 > #1 0x5587132615ab in qemu_poll_ns (fds=0x7ff7e40030c0, nfds=8, > timeout=-1) at ../util/qemu-timer.c:339 > #2 0x55871323e8b1 in fdmon_poll_wait (ctx=0x55871598d5e0, > ready_list=0x7ff7f288ebe0, timeout=-1) at ../util/fdmon-poll.c:79 > #3 0x55871323e1ed in aio_poll (ctx=0x55871598d5e0, blocking=true) at > ../util/aio-posix.c:670 > #4 0x558713089efa in bdrv_poll_co (s=0x7ff7f288ec90) at > /home/febner/repos/qemu/block/block-gen.h:43 > #5 0x55871308c362 in blk_is_available (blk=0x55871599e2f0) at > block/block-gen.c:1426 > #6 0x558712f6843b in virtio_scsi_ctx_check (s=0x558716c049c0, > d=0x55871581cd30) at ../hw/scsi/virtio-scsi.c:290 > #7 0x558712f697e4 in virtio_scsi_handle_cmd_req_prepare > (s=0x558716c049c0, req=0x7ff7e400b650) at ../hw/scsi/virtio-scsi.c:788 > #8 0x558712f699b0 in virtio_scsi_handle_cmd_vq (s=0x558716c049c0, > vq=0x558716c0d2a8) at ../hw/scsi/virtio-scsi.c:831 > #9 0x558712f69bcb in 
virtio_scsi_handle_cmd (vdev=0x558716c049c0, > vq=0x558716c0d2a8) at ../hw/scsi/virtio-scsi.c:867 > #10 0x558712f96812 in virtio_queue_notify_vq (vq=0x558716c0d2a8) at > ../hw/virtio/virtio.c:2263 > #11 0x558712f99b75 in virtio_queue_host_notifier_read (n=0x558716c0d31c) > at ../hw/virtio/virtio.c:3575 > #12 0x55871323d8b5 in aio_dispatch_handler (ctx=0x55871598d5e0, > node=0x558716771000) at ../util/aio-posix.c:372 > #13 0x55871323d988 in aio_dispatch_ready_handlers (ctx=0x55871598d5e0, > ready_list=0x7ff7f288eeb0) at ../util/aio-posix.c:401 >> As for why it doesn't progress, blk_co_is_available_entry() uses >> bdrv_graph_co_rdlock() and can't get it, because the main thread has the >> write lock. Should be fixed once the AioContext locks are gone, but not >> sure what should be done to avoid it until then. > > Then the nested event loop in blk_is_available() would probably be > enough to make progress, yes. > > Maybe we could actually drop the lock (and immediately reacquire it) in > AIO_WAIT_WHILE() even if we're in the home thread? That should give the > main thread a chance to make progress. > Seems to work :) I haven't run into the issue with the following change anymore, but I have to say, running into that specific deadlock only happened every 10-15 tries or so before. Did 30 tests now. But unfortunately, the stuck IO issue is still there. 
> diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
> index 5449b6d742..da159501ca 100644
> --- a/include/block/aio-wait.h
> +++ b/include/block/aio-wait.h
> @@ -88,7 +88,13 @@ extern AioWait global_aio_wait;
>      smp_mb__after_rmw();                               \
>      if (ctx_ && in_aio_context_home_thread(ctx_)) {    \
>          while ((cond)) {                               \
> +            if (unlock && ctx_) {                      \
> +                aio_context_release(ctx_);             \
> +            }                                          \
>              aio_poll(ctx_, true);                      \
> +            if (unlock && ctx_) {                      \
> +                aio_context_acquire(ctx_);             \
> +            }                                          \
>              waited_ = true;                            \
>          }                                              \
>      } else {                                           \
>
> But I think we're actually not very far from removing the AioContext
> lock, so if we can't find an easy fix in the meantime, waiting for that
> could be a realistic option.

Glad to hear :) Do you think it will be in time for QEMU 8.2? Otherwise,
I'll go ahead and send what I have for fixing the deadlocks from this
mail thread in the following days. The stuck guest IO can happen even with
Re: deadlock when using iothread during backup_clean()
On 17.10.2023 at 15:37, Fiona Ebner wrote:
> On 17.10.23 at 14:12, Kevin Wolf wrote:
>> On 17.10.2023 at 12:18, Fiona Ebner wrote:
>>> I ran into similar issues now with mirror (both deadlocks and stuck
>>> guest IO at other times) and, interestingly, also during job start.
>>>
>>> Also had a backtrace similar to [0] once, so I took a closer look.
>>> Probably was obvious to others already, but for the record:
>>>
>>> 1. the graph is locked by the main thread
>>> 2. the iothread holds the AioContext lock
>>> 3. the main thread waits on the AioContext lock
>>> 4. the iothread waits for the coroutine spawned by blk_is_available()
>>
>> Where does this blk_is_available() in the iothread come from? Having it
>> wait without dropping the AioContext lock sounds like something that
>> we'd want to avoid. Ideally, devices using iothreads shouldn't use
>> synchronous requests at all, but I think scsi-disk might have some of
>> them.
>
> It's part of the request handling in virtio-scsi:
>
>> #0  0x7ff7f5f55136 in __ppoll (fds=0x7ff7e40030c0, nfds=8, timeout=,
>>     sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
>> (...)
>> #4  0x558713089efa in bdrv_poll_co (s=0x7ff7f288ec90) at
>>     /home/febner/repos/qemu/block/block-gen.h:43
>> #5  0x55871308c362 in blk_is_available (blk=0x55871599e2f0) at
>>     block/block-gen.c:1426
>> #6  0x558712f6843b in virtio_scsi_ctx_check (s=0x558716c049c0,
>>     d=0x55871581cd30) at ../hw/scsi/virtio-scsi.c:290

Oh... So essentially for an assertion.

I wonder if the blk_is_available() check introduced in 2a2d69f490c is
even necessary any more, because BlockBackend has its own AioContext
now. And if blk_bs(blk) != NULL isn't what we actually want to check if
the check is necessary, because calling bdrv_is_inserted() doesn't seem
to have been intended. blk_bs() wouldn't have to poll.

>> #7  0x558712f697e4 in virtio_scsi_handle_cmd_req_prepare
>>     (s=0x558716c049c0, req=0x7ff7e400b650) at ../hw/scsi/virtio-scsi.c:788
>> (...)
>> #13 0x55871323d988 in aio_dispatch_ready_handlers (ctx=0x55871598d5e0,
>>     ready_list=0x7ff7f288eeb0) at ../util/aio-posix.c:401
>
>>> As for why it doesn't progress, blk_co_is_available_entry() uses
>>> bdrv_graph_co_rdlock() and can't get it, because the main thread has the
>>> write lock. Should be fixed once the AioContext locks are gone, but not
>>> sure what should be done to avoid it until then.
>>
>> Then the nested event loop in blk_is_available() would probably be
>> enough to make progress, yes.
>>
>> Maybe we could actually drop the lock (and immediately reacquire it) in
>> AIO_WAIT_WHILE() even if we're in the home thread? That should give the
>> main thread a chance to make progress.
>
> Seems to work :) I haven't run into the issue with the following change
> anymore, but I have to say, running into that specific deadlock only
> happened every 10-15 tries or so before. Did 30 tests now. But
> unfortunately, the stuck IO issue is still there.
>
>> diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
>> index 5449b6d742..da159501ca 100644
>> (...)
Re: deadlock when using iothread during backup_clean()
On 17.10.23 at 16:20, Kevin Wolf wrote:
> On 17.10.2023 at 15:37, Fiona Ebner wrote:
>> (...)
>>
>> It's part of the request handling in virtio-scsi:
>>
>>> (...)
>>> #5  0x55871308c362 in blk_is_available (blk=0x55871599e2f0) at
>>>     block/block-gen.c:1426
>>> #6  0x558712f6843b in virtio_scsi_ctx_check (s=0x558716c049c0,
>>>     d=0x55871581cd30) at ../hw/scsi/virtio-scsi.c:290
>
> Oh... So essentially for an assertion.
>
> I wonder if the blk_is_available() check introduced in 2a2d69f490c is
> even necessary any more, because BlockBackend has its own AioContext
> now. And if blk_bs(blk) != NULL isn't what we actually want to check if
> the check is necessary, because calling bdrv_is_inserted() doesn't seem
> to have been intended. blk_bs() wouldn't have to poll.

Could virtio_scsi_hotunplug() be an issue with removing or modifying
the check? There's a call there which sets the blk's AioContext to
qemu_get_aio_context(). Or are we sure that the assert in
virtio_scsi_ctx_check() can't be reached after that?

> (...)
Re: deadlock when using iothread during backup_clean()
On 18.10.2023 at 11:42, Fiona Ebner wrote:
> On 17.10.23 at 16:20, Kevin Wolf wrote:
>> (...)
>>
>> I wonder if the blk_is_available() check introduced in 2a2d69f490c is
>> even necessary any more, because BlockBackend has its own AioContext
>> now. And if blk_bs(blk) != NULL isn't what we actually want to check if
>> the check is necessary, because calling bdrv_is_inserted() doesn't seem
>> to have been intended. blk_bs() wouldn't have to poll.
>
> Could virtio_scsi_hotunplug() be an issue with removing or modifying
> the check? There's a call there which sets the blk's AioContext to
> qemu_get_aio_context(). Or are we sure that the assert in
> virtio_scsi_ctx_check() can't be reached after that?

I think that would be the kind of bug that the assertion tries to
catch, because then we would be sending requests to blk from a thread
that doesn't match its AioContext (which will be allowed soon, but not
quite yet).

Before resetting the AioContext, virtio_scsi_hotunplug() calls
qdev_simple_device_unplug_cb(), which unrealizes the SCSI device. This
calls scsi_qdev_unrealize() -> scsi_device_purge_requests(), which in
turn drains blk and cancels all pending requests. So there should be
nothing left that could call into virtio_scsi_ctx_check() any more.

The other argument is that after unrealize, virtio_scsi_device_get()
would return NULL anyway, so even if a request were still pending, it
would just fail instead of accessing the unplugged device.

(...)
Re: deadlock when using iothread during backup_clean()
On 19.10.23 at 14:14, Kevin Wolf wrote:
> On 18.10.2023 at 11:42, Fiona Ebner wrote:
>> (...)
>>
>> Could virtio_scsi_hotunplug() be an issue with removing or modifying
>> the check? There's a call there which sets the blk's AioContext to
>> qemu_get_aio_context(). Or are we sure that the assert in
>> virtio_scsi_ctx_check() can't be reached after that?
>
> I think that would be the kind of bug that the assertion tries to
> catch, because then we would be sending requests to blk from a thread
> that doesn't match its AioContext (which will be allowed soon, but not
> quite yet).
>
> Before resetting the AioContext, virtio_scsi_hotunplug() calls
> qdev_simple_device_unplug_cb(), which unrealizes the SCSI device. This
> calls scsi_qdev_unrealize() -> scsi_device_purge_requests(), which in
> turn drains blk and cancels all pending requests. So there should be
> nothing left that could call into virtio_scsi_ctx_check() any more.
>
> The other argument is that after unrealize, virtio_scsi_device_get()
> would return NULL anyway, so even if a request were still pending, it
> would just fail instead of accessing the unplugged device.

Okay, sounds like a way to get around that deadlock issue then :)

(...)

>>> What does the stuck I/O look like? Is it stuck in the backend, i.e.
>>> the device started requests that never complete? Or stuck from the
>>> guest perspective, i.e. the device never checks for new requests?
>>
>> AFAICT, from the guest perspective.
>>
>>> I don't really have an idea immediately, we'd have to find out where
>>> the stuck I/O stops being processed.
>>
>> I've described it in an earlier mail in this thread:
>> https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg01900.html
>>
>> Quoting from there:
>>
>>> After the IO was stuck in the guest, I used bdrv_next_all_states() to
>>> iterate over the states and there's only the bdrv_raw and the
>>> bdrv_host_device. For both, tracked_requests was empty.
>
> And bs->in_flight and blk->in_flight are 0, too?

Yes. And queued_requests in the BlockBackend is also empty.

> Is anything quiesced?

No. quiesce_counter is 0 for both BlockDriverState instances as well as
for the BlockBackend. quiesced_parent is false for both parents (i.e.
child_root for the bdrv_raw and child_of_bds for the bdrv_file (this
time I used VirtIO SCSI, in the quote it was VirtIO block)).

>>> What is also very interesting is that the IO isn't always dead
>>> immediately. It can be that the fio command still runs with lower speed
>>> for a while (sometimes even up to about a minute,
Re: deadlock when using iothread during backup_clean()
On 19.10.23 at 15:53, Fiona Ebner wrote:
> On 19.10.23 at 14:14, Kevin Wolf wrote:
>> (...)
>>
>> Before resetting the AioContext, virtio_scsi_hotunplug() calls
>> qdev_simple_device_unplug_cb(), which unrealizes the SCSI device. This
>> calls scsi_qdev_unrealize() -> scsi_device_purge_requests(), which in
>> turn drains blk and cancels all pending requests. So there should be
>> nothing left that could call into virtio_scsi_ctx_check() any more.
>>
>> The other argument is that after unrealize, virtio_scsi_device_get()
>> would return NULL anyway, so even if a request were still pending, it
>> would just fail instead of accessing the unplugged device.
>
> Okay, sounds like a way to get around that deadlock issue then :)

Unfortunately, scsi_dma_command() also has a blk_is_available() call and
I ran into a similar deadlock with that once.

> (...)
Re: deadlock when using iothread during backup_clean()
On 20.10.23 at 15:52, Fiona Ebner wrote:
> And I found that with drive-mirror, the issue during starting seems to
> manifest with the bdrv_open() call. Adding a return before it, the guest
> IO didn't get stuck in my testing, but adding a return after it, it can
> get stuck. I'll try to see if I can further narrow it down next week,
> but maybe that's already a useful hint?

In the end, I was able to find a reproducer that just does draining and
bisected the issue (doesn't seem related to the graph lock after all).

I replied there, to avoid all the overhead from this thread:
https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg00681.html

Best Regards,
Fiona