Am 28.10.2025 um 17:33 hat Hanna Czenczek geschrieben: > nvme wakes the request coroutine via qemu_coroutine_enter() from a BH > scheduled in the BDS AioContext. This may not be the same context as > the one in which the request originally ran, which would be wrong: > - It could mean we enter the coroutine before it yields, > - We would move the coroutine in to a different context. > > (Can be reproduced with multiqueue by adding a usleep(100000) before the > `while (data.ret == -EINPROGRESS)` loop.) > > To fix that, schedule nvme_rw_cb_bh() in the coroutine AioContext. > (Just like in the preceding iscsi and nfs patches, we could drop the > trivial nvme_rw_cb_bh() and just use aio_co_wake() directly, but don’t > do that to keep replay_bh_schedule_oneshot_event().) > > With this, we can remove NVMeCoData.ctx. > > Note the check of data->co == NULL to bypass the BH/yield combination in > case nvme_rw_cb() is called from nvme_submit_command(): We probably want > to keep this fast path for performance reasons, but we have to be quite > careful about it: > - We cannot overload .ret for this, but have to use a dedicated > .skip_yield field. Otherwise, if nvme_rw_cb() runs in a different > thread than the coroutine, it may see .ret set and skip the yield, > while nvme_rw_cb() will still schedule a BH for waking. Therefore, > the signal to skip the yield can only be set in nvme_rw_cb() if waking > too is skipped, which is independent from communicating the return > value. > - We can only skip the yield if nvme_rw_cb() actually runs in the > request coroutine. Otherwise (specifically if they run in different > AioContexts), the order between this function’s execution and the > coroutine yielding (or not yielding) is not reliable. > - There is no point to yielding in a loop; there are no spurious wakes, > so once we yield, we will only be re-entered once the command is done. > Replace `while` by `if`. > > Cc: [email protected] > Signed-off-by: Hanna Czenczek <[email protected]>
As suggested in an earlier patch, I prefer to keep setting -EINPROGRESS, even if we don't check for it elsewhere, for better debugability. Kevin
