On Mon, Mar 11, 2024 at 04:40:05PM +0100, Kevin Wolf wrote: > Am 11.03.2024 um 14:09 hat Stefan Hajnoczi geschrieben: > > On Mon, Mar 11, 2024 at 11:13:33AM +0530, Prasad Pandit wrote: > > > From: Prasad Pandit <p...@fedoraproject.org> > > > > > > Libaio defines IO_CMD_FDSYNC command to sync all outstanding > > > asynchronous I/O operations, by flushing out file data to the > > > disk storage. > > > > > > Enable linux-aio to submit such aio request. This helps to > > > reduce latency induced via pthread_create calls by > > > thread-pool (aio=threads). > > > > > > Signed-off-by: Prasad Pandit <p...@fedoraproject.org> > > > --- > > > block/file-posix.c | 12 ++++++++++++ > > > block/linux-aio.c | 5 ++++- > > > 2 files changed, 16 insertions(+), 1 deletion(-) > > > > > > v2: if IO_CMD_FDSYNC is not supported by the kernel, > > > fallback on thread-pool flush. > > > -> > > > https://lists.nongnu.org/archive/html/qemu-devel/2024-03/msg01986.html > > > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > > index 35684f7e21..4f2195d01d 100644 > > > --- a/block/file-posix.c > > > +++ b/block/file-posix.c > > > @@ -2599,6 +2599,18 @@ static int coroutine_fn > > > raw_co_flush_to_disk(BlockDriverState *bs) > > > if (raw_check_linux_io_uring(s)) { > > > return luring_co_submit(bs, s->fd, 0, NULL, QEMU_AIO_FLUSH); > > > } > > > +#endif > > > +#ifdef CONFIG_LINUX_AIO > > > + if (raw_check_linux_aio(s)) { > > > + ret = laio_co_submit(s->fd, 0, NULL, QEMU_AIO_FLUSH, 0); > > > + if (ret >= 0) { > > > + /* > > > + * if AIO_FLUSH is supported return > > > + * else fallback on thread-pool flush. > > > + */ > > > + return ret; > > > + } > > > > Falling back every time on an older host kernel might be a noticeable > > performance regression. That can be avoided with a variable that keeps > > track of whether -EINVAL was seen before and skips Linux AIO in that > > case. > > > > However, it appears that popular distributions starting from Debian 10, > > Ubuntu 20.04, Fedora 27, CentOS 8, and OpenSUSE Leap 15.5 have the > > necessary minimum Linux 4.18 kernel: > > https://repology.org/project/linux/versions > > > > Fallback should be very rare, so I don't think it needs to be optimized: > > > > Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com> > > We might need this approach for a different reason: This is an > io_submit() error, so while we retry the flush with the fallback path, > other requests in the same batch may incorrectly return errors. This > probably explains the errors Prasad saw in the guest when the kernel > doesn't have support for flush in Linux AIO. > > So in order to avoid this, we'll probably have to send one flush just to > probe (while making sure that no other request is pending - maybe > immediately when opening the image?) and then remember whether it > worked. > > Or we'd have to change the error handling around io_submit(), so that we > don't always fail the first request in the batch, but first fail any > flushes and only then the rest of the requests.
I don't see the behavior you are describing in the code. My interpretation of ioq_submit() is that only the flush request fails. Other queued requests (before and after the flush) are submitted successfully: static void ioq_submit(LinuxAioState *s) { int ret, len; struct qemu_laiocb *aiocb; struct iocb *iocbs[MAX_EVENTS]; QSIMPLEQ_HEAD(, qemu_laiocb) completed; do { if (s->io_q.in_flight >= MAX_EVENTS) { break; } len = 0; QSIMPLEQ_FOREACH(aiocb, &s->io_q.pending, next) { iocbs[len++] = &aiocb->iocb; if (s->io_q.in_flight + len >= MAX_EVENTS) { break; } } ret = io_submit(s->ctx, len, iocbs); if (ret == -EAGAIN) { break; } if (ret < 0) { /* Fail the first request, retry the rest */ aiocb = QSIMPLEQ_FIRST(&s->io_q.pending); QSIMPLEQ_REMOVE_HEAD(&s->io_q.pending, next); s->io_q.in_queue--; aiocb->ret = ret; qemu_laio_process_completion(aiocb); continue; } s->io_q.in_flight += ret; s->io_q.in_queue -= ret; aiocb = container_of(iocbs[ret - 1], struct qemu_laiocb, iocb); QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed); } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending)); Have I missed something? Thanks, Stefan
signature.asc
Description: PGP signature