On Mon, Aug 02, 2021 at 02:40:36PM +0200, Kevin Wolf wrote:
On 29.07.2021 at 11:10, Fabian Ebner wrote:
Linux SCSI can return spurious -EAGAIN errors in some corner cases in its
completion path, and that error then ends up as the result of the
completed io_uring request.
Resubmitting such requests should allow block jobs to complete, even
if such spurious errors are encountered.
Co-authored-by: Stefan Hajnoczi <stefa...@gmail.com>
Reviewed-by: Stefano Garzarella <sgarz...@redhat.com>
Signed-off-by: Fabian Ebner <f.eb...@proxmox.com>
---
Changes from v1:
* Focus on what's relevant for the patch itself in the commit
message.
* Add Stefan's comment.
* Add Stefano's R-b tag (I hope that's fine, since there was no
change code-wise).
block/io_uring.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/block/io_uring.c b/block/io_uring.c
index 00a3ee9fb8..dfa475cc87 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -165,7 +165,21 @@ static void luring_process_completions(LuringState *s)
         total_bytes = ret + luringcb->total_read;
 
         if (ret < 0) {
-            if (ret == -EINTR) {
+            /*
+             * Only writev/readv/fsync requests on regular files or host block
+             * devices are submitted. Therefore -EAGAIN is not expected but it's
+             * known to happen sometimes with Linux SCSI. Submit again and hope
+             * the request completes successfully.
+             *
+             * For more information, see:
+             * https://lore.kernel.org/io-uring/20210727165811.284510-3-ax...@kernel.dk/T/#u
+             *
+             * If the code is changed to submit other types of requests in the
+             * future, then this workaround may need to be extended to deal with
+             * genuine -EAGAIN results that should not be resubmitted
+             * immediately.
+             */
+            if (ret == -EINTR || ret == -EAGAIN) {
                 luring_resubmit(s, luringcb);
                 continue;
             }
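To make the effect of the patch concrete, here is a minimal, self-contained sketch in plain C. It is not QEMU code: `CompletionAction`, `classify_completion()`, `FakeDevice` and `submit_until_done()` are hypothetical names, and the fake device simply fails with -EAGAIN a fixed number of times before completing the whole transfer. The classification mirrors the condition in the hunk above: -EINTR and -EAGAIN are resubmitted, and every other result is handed back to the caller.

```c
#include <assert.h>
#include <errno.h>

/* What to do with one completed request (hypothetical, not a QEMU type). */
typedef enum {
    COMPLETION_DONE,     /* hand the result back to the caller */
    COMPLETION_RESUBMIT, /* queue the same request again */
} CompletionAction;

/* ret: bytes transferred, or a negative errno value. */
static CompletionAction classify_completion(int ret)
{
    /* -EINTR was retried before the patch; the patch adds -EAGAIN. */
    if (ret == -EINTR || ret == -EAGAIN) {
        return COMPLETION_RESUBMIT;
    }
    return COMPLETION_DONE;
}

/* Hypothetical device that fails spuriously a fixed number of times. */
typedef struct {
    int spurious_failures;
} FakeDevice;

static int fake_complete(FakeDevice *dev, int len)
{
    assert(len >= 0);
    if (dev->spurious_failures > 0) {
        dev->spurious_failures--;
        return -EAGAIN;
    }
    return len; /* full transfer */
}

/* Resubmit until the result is no longer transient; count the submissions. */
static int submit_until_done(FakeDevice *dev, int len, int *attempts)
{
    int ret;

    *attempts = 0;
    do {
        ret = fake_complete(dev, len);
        (*attempts)++;
    } while (classify_completion(ret) == COMPLETION_RESUBMIT);
    return ret;
}
```

With `FakeDevice d = { 2 };`, `submit_until_done(&d, 4096, &n)` returns 4096 after three submissions. Like the patch, the sketch places no bound on retries, so a device that kept returning -EAGAIN would be resubmitted indefinitely.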
Reviewed-by: Kevin Wolf <kw...@redhat.com>
Question about the preexisting code, though: luring_resubmit() requires
that the caller calls ioq_submit() later so that the request doesn't
just sit in a queue without getting any attention, but actually gets
submitted to the kernel.
In the call chain ioq_submit() -> luring_process_completions() ->
luring_resubmit(), who takes care of that?
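The situation described above can be reproduced in a toy model, assuming (as the call chain ioq_submit() -> luring_process_completions() -> luring_resubmit() implies) that completions are reaped right after submitting. All names below are hypothetical, the model completes every in-flight request immediately, and the comments only indicate which QEMU function each toy function stands in for; this is not QEMU's implementation.

```c
#include <assert.h>

/*
 * Hypothetical toy model of the submission path; none of these names
 * exist in QEMU. Every in-flight request completes immediately.
 */
typedef struct {
    int in_queue;    /* waiting in the userspace submission queue */
    int in_flight;   /* handed to the "kernel" */
    int eagain_left; /* completions that will fail with a spurious -EAGAIN */
} ToyRing;

/* Stand-in for luring_resubmit(): only queues, never submits. */
static void toy_resubmit(ToyRing *r)
{
    r->in_queue++;
}

/* Stand-in for luring_process_completions(). */
static void toy_process_completions(ToyRing *r)
{
    while (r->in_flight > 0) {
        r->in_flight--;
        if (r->eagain_left > 0) {
            r->eagain_left--;
            toy_resubmit(r);
        }
        /* otherwise the request completed successfully */
    }
}

/* Stand-in for ioq_submit(): submit everything queued, then reap. */
static void toy_submit(ToyRing *r)
{
    r->in_flight += r->in_queue;
    r->in_queue = 0;
    toy_process_completions(r);
    assert(r->in_flight == 0);
    /* Nothing here submits what toy_process_completions() re-queued. */
}
```

Starting from `ToyRing r = { 1, 0, 1 };`, a single `toy_submit(&r)` leaves `r.in_queue == 1` and `r.in_flight == 0`: the resubmitted request sits in the queue, and nothing submits it unless something calls `toy_submit()` again.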
Mmm, good point.
There should be the same problem with ioq_submit() ->
luring_process_completions() -> luring_resubmit_short_read() ->
luring_resubmit().
Should we schedule a BH (bottom half), for example in luring_resubmit(),
to make sure that ioq_submit() is invoked after a resubmission?
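The BH idea can be sketched with a small toy model, under explicit assumptions: the `submit_scheduled` flag stands in for a scheduled bottom half, `toy_run_pending()` stands in for one event-loop iteration that runs it, and all names are hypothetical. It does not use QEMU's real BH API; it only shows that deferring a submit from the resubmit path drains the queue even when the resubmission happens inside the submit path.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model; none of these names exist in QEMU.
 * Every in-flight request completes immediately.
 */
typedef struct {
    int in_queue;          /* waiting in the userspace submission queue */
    int in_flight;         /* handed to the "kernel" */
    int eagain_left;       /* completions that will fail with -EAGAIN */
    bool submit_scheduled; /* stand-in for a pending bottom half */
} ToyRing;

/* Stand-in for luring_resubmit(), now also deferring a submit. */
static void toy_resubmit(ToyRing *r)
{
    r->in_queue++;
    /* Do not rely on the caller; setting the flag twice is harmless. */
    r->submit_scheduled = true;
}

/* Stand-in for luring_process_completions(). */
static void toy_process_completions(ToyRing *r)
{
    while (r->in_flight > 0) {
        r->in_flight--;
        if (r->eagain_left > 0) {
            r->eagain_left--;
            toy_resubmit(r);
        }
    }
}

/* Stand-in for ioq_submit(): submit everything queued, then reap. */
static void toy_submit(ToyRing *r)
{
    r->in_flight += r->in_queue;
    r->in_queue = 0;
    toy_process_completions(r);
    assert(r->in_flight == 0);
}

/* One event-loop iteration: run the deferred submit, if any. */
static bool toy_run_pending(ToyRing *r)
{
    if (!r->submit_scheduled) {
        return false;
    }
    r->submit_scheduled = false; /* clear first so a resubmit can re-arm it */
    toy_submit(r);
    return true;
}
```

Because `toy_run_pending()` clears the flag before calling `toy_submit()`, a resubmission during that call schedules another round. With `ToyRing r = { 1, 0, 3, false };`, one `toy_submit(&r)` followed by running pending work until none is left takes three deferred rounds and ends with an empty queue.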
Thanks,
Stefano