Re: [PATCH for-10.2] Revert "nvme: Fix coroutine waking"

Hanna Czenczek Fri, 12 Dec 2025 09:32:59 -0800

On 12.12.25 11:25, Hanna Czenczek wrote:

This reverts commit 0f142cbd919fcb6cea7aa176f7e4939925806dd9.


Lukáš Doktor reported a simple single-threaded nvme test case hanging
and bisected it to this commit.  While we are still investigating, it is
best to revert the commit for now.

(This breaks multiqueue for nvme, but better to have single-queue
working than neither.)

Cc: [email protected]
Reported-by: Lukáš Doktor <[email protected]>
Signed-off-by: Hanna Czenczek <[email protected]>
---
  block/nvme.c | 56 +++++++++++++++++++++++++---------------------------
  1 file changed, 27 insertions(+), 29 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 919e14cef9..c3d3b99d1f 100644
--- a/block/nvme.c
+++ b/block/nvme.c


[...]

  /* Put into NVMeRequest.cb, so runs in the BDS's main AioContext */
  static void nvme_rw_cb(void *opaque, int ret)
  {


[...]

-        aio_co_wake(data->co);

[...]

+    replay_bh_schedule_oneshot_event(data->ctx, nvme_rw_cb_bh, data);
  }

From testing, this bit seems to be the important one: The hang seems to
be caused by entering the coroutine directly instead of always going
through a BH. Why that is, I haven’t yet found out, only that
s/aio_co_wake()/aio_co_schedule()/ seems to make it work.
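
To make "entering directly" versus "going through a BH" concrete, here is
a minimal toy model in plain C. It is not QEMU code and does not explain
the hang. ToyLoop, toy_schedule(), toy_complete() and the other toy_*
names are hypothetical and exist only for illustration. All it shows is
the difference in ordering: with a direct wake, the resumed request runs
nested inside the completion handler; with a deferred wake, it runs only
after the handler has returned.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PENDING 4
#define TOY_LOG_SIZE 16

typedef void (*toy_cb)(void *opaque);

/* Hypothetical stand-in for an event loop with a list of deferred callbacks */
typedef struct ToyLoop {
    toy_cb pending_fn[TOY_MAX_PENDING];
    void *pending_opaque[TOY_MAX_PENDING];
    size_t n_pending;
    char log[TOY_LOG_SIZE];   /* NUL-terminated trace of events */
    size_t log_len;
} ToyLoop;

/* Append one event character to the trace */
static void toy_log(ToyLoop *loop, char event)
{
    if (loop->log_len + 1 < TOY_LOG_SIZE) {
        loop->log[loop->log_len++] = event;
        loop->log[loop->log_len] = '\0';
    }
}

/* Deferred path: remember the callback, run it later from the loop */
static void toy_schedule(ToyLoop *loop, toy_cb fn, void *opaque)
{
    if (loop->n_pending < TOY_MAX_PENDING) {
        loop->pending_fn[loop->n_pending] = fn;
        loop->pending_opaque[loop->n_pending] = opaque;
        loop->n_pending++;
    }
}

/* One loop iteration: run everything that was deferred */
static void toy_run_pending(ToyLoop *loop)
{
    for (size_t i = 0; i < loop->n_pending; i++) {
        loop->pending_fn[i](loop->pending_opaque[i]);
    }
    loop->n_pending = 0;
}

/* Stand-in for the request coroutine being resumed */
static void toy_resume_request(void *opaque)
{
    toy_log(opaque, 'R');
}

/* Stand-in for a completion handler: '[' on entry, ']' on exit */
static void toy_complete(ToyLoop *loop, bool defer)
{
    toy_log(loop, '[');
    if (defer) {
        toy_schedule(loop, toy_resume_request, loop);  /* resume later */
    } else {
        toy_resume_request(loop);                      /* resume nested */
    }
    toy_log(loop, ']');
}

/* Reset the loop, complete one request, drain it, return the trace */
static const char *toy_trace(ToyLoop *loop, bool defer)
{
    *loop = (ToyLoop){0};
    toy_complete(loop, defer);
    toy_run_pending(loop);
    return loop->log;
}
```

In this model, toy_trace(&loop, false) yields "[R]" (the request resumes
inside the handler) and toy_trace(&loop, true) yields "[]R" (the request
resumes after the handler has finished). Whether that ordering difference
is what matters for the nvme hang is exactly what is still unknown.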


I’ll spend more time trying to find out why.

(The only thing I know so far is that iscsi similarly should not use
aio_co_wake(), and for that we do have a documented reason:
https://gitlab.com/qemu-project/qemu/-/commit/8b9dfe9098 – in light of
that, it probably makes sense not to use aio_co_wake() for NFS either,
which was the third case in the original series where I replaced a
oneshot schedule by aio_co_wake().)


Hanna
