Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
On Thu, Jul 28, 2011 at 02:41:02PM +0200, Kevin Wolf wrote:
> > More or less. There's one corner case for all Linux I/O, and that is
> > only writes up to INT_MAX are supported, and larger writes (and reads)
> > get truncated to it. It's pretty nasty, but Linus has been vocally
> > opposed to fixing this issue.
>
> I think we can safely ignore this.
>
> So just replacing the current ret = -EINVAL; by a
>
>     memset(buf + ret, 0, len - ret);
>     ret = 0;
>
> should be okay, right? (Of course using the qiov versions, but you get
> the idea)

This should be safe.
Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
On Thu, Jul 28, 2011 at 1:15 PM, Christoph Hellwig h...@lst.de wrote:
> On Thu, Jul 28, 2011 at 09:47:05AM +0200, Kevin Wolf wrote:
> > > Indeed. This has come up a few times, and actually is a mostly trivial
> > > task. Maybe we should give up waiting for -blockdev and separate cache
> > > mode settings and allow a nocache-writethrough or similar mode now?
> > > It's going to be around 10 lines of code + documentation.
> >
> > I understand that there may be reasons for using O_DIRECT | O_DSYNC, but
> > what is the explanation for O_DSYNC improving performance?
>
> There isn't any, at least for modern Linux. O_DSYNC at this point is
> equivalent to a range fdatasync for each write call, and given that
> we're doing O_DIRECT the range flush doesn't matter. If you have a
> modern host and an old guest it might end up being faster, because the
> barrier implementation in Linux used to suck so badly, but that's not
> inherent to the I/O model.
>
> If your guest doesn't support cache flushes at all, however,
> O_DIRECT | O_DSYNC is the only sane model to use for local filesystems
> and block devices.

I can rebase this cache=directsync patch and send it:

http://repo.or.cz/w/qemu/stefanha.git/commitdiff/6756719a46ac9876ac6d0460a33ad98e96a3a923

The other weird caching-related option I was playing with is -drive
...,readahead=on|off. It lets you disable the host kernel readahead in
buffered modes (cache=writeback|writethrough):

http://repo.or.cz/w/qemu/stefanha.git/commitdiff/f2fc2b297a2b2dd0cccd1dc2f7c519f3b0374e0d

Stefan
Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
Am 27.07.2011 21:57, schrieb Christoph Hellwig:
> On Wed, Jul 27, 2011 at 09:52:51PM +0200, Frediano Ziglio wrote:
> > Also I notice that combining XFS, Linux AIO, O_DIRECT and O_DSYNC
> > gives impressive performance, but currently there is no way to
> > specify all these flags together because nocache enables O_DIRECT
> > while O_DSYNC is enabled with writethrough.
>
> Indeed. This has come up a few times, and actually is a mostly trivial
> task. Maybe we should give up waiting for -blockdev and separate cache
> mode settings and allow a nocache-writethrough or similar mode now?
> It's going to be around 10 lines of code + documentation.

I understand that there may be reasons for using O_DIRECT | O_DSYNC, but
what is the explanation for O_DSYNC improving performance?

Christoph, on another note: can we rely on Linux AIO never returning
short writes except on EOF? Currently we return -EINVAL in this case, so
I hope it's true or we wouldn't return the correct error code.

The reason I'm asking is that I want to allow reads across EOF for
growable images and pad them with zeros (the synchronous code does this
today in order to allow bdrv_pread/pwrite to work, and when we start
using coroutines in the block layer, these cases will hit the AIO
paths).

Kevin
Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
On Thu, Jul 28, 2011 at 09:47:05AM +0200, Kevin Wolf wrote:
> > Indeed. This has come up a few times, and actually is a mostly trivial
> > task. Maybe we should give up waiting for -blockdev and separate cache
> > mode settings and allow a nocache-writethrough or similar mode now?
> > It's going to be around 10 lines of code + documentation.
>
> I understand that there may be reasons for using O_DIRECT | O_DSYNC, but
> what is the explanation for O_DSYNC improving performance?

There isn't any, at least for modern Linux. O_DSYNC at this point is
equivalent to a range fdatasync for each write call, and given that
we're doing O_DIRECT the range flush doesn't matter. If you have a
modern host and an old guest it might end up being faster, because the
barrier implementation in Linux used to suck so badly, but that's not
inherent to the I/O model.

If your guest doesn't support cache flushes at all, however,
O_DIRECT | O_DSYNC is the only sane model to use for local filesystems
and block devices.

> Christoph, on another note: can we rely on Linux AIO never returning
> short writes except on EOF? Currently we return -EINVAL in this case, so
> I hope it's true or we wouldn't return the correct error code.

More or less. There's one corner case for all Linux I/O, and that is
only writes up to INT_MAX are supported, and larger writes (and reads)
get truncated to it. It's pretty nasty, but Linus has been vocally
opposed to fixing this issue.
Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
Am 28.07.2011 14:15, schrieb Christoph Hellwig:
> > Christoph, on another note: can we rely on Linux AIO never returning
> > short writes except on EOF? Currently we return -EINVAL in this case, so

Short reads I meant, of course.

> > I hope it's true or we wouldn't return the correct error code.
>
> More or less. There's one corner case for all Linux I/O, and that is
> only writes up to INT_MAX are supported, and larger writes (and reads)
> get truncated to it. It's pretty nasty, but Linus has been vocally
> opposed to fixing this issue.

I think we can safely ignore this.

So just replacing the current ret = -EINVAL; by a

    memset(buf + ret, 0, len - ret);
    ret = 0;

should be okay, right? (Of course using the qiov versions, but you get
the idea)

Kevin
[Qemu-devel] [PATCH 1/2] linux aio: support flush operation
Signed-off-by: Frediano Ziglio fredd...@gmail.com
---
 block/raw-posix.c |    7 +++++++
 linux-aio.c       |    3 +++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 3c6bd4b..27ae81e 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -628,6 +628,13 @@ static BlockDriverAIOCB *raw_aio_flush(BlockDriverState *bs,
     if (fd_open(bs) < 0)
         return NULL;

+#ifdef CONFIG_LINUX_AIO
+    if (s->use_aio) {
+        return laio_submit(bs, s->aio_ctx, s->fd, 0, NULL,
+                           0, cb, opaque, QEMU_AIO_FLUSH);
+    }
+#endif
+
     return paio_submit(bs, s->fd, 0, NULL, 0, cb, opaque, QEMU_AIO_FLUSH);
 }

diff --git a/linux-aio.c b/linux-aio.c
index 68f4b3d..d07c435 100644
--- a/linux-aio.c
+++ b/linux-aio.c
@@ -215,6 +215,9 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
     case QEMU_AIO_READ:
         io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
         break;
+    case QEMU_AIO_FLUSH:
+        io_prep_fdsync(iocbs, fd);
+        break;
     default:
         fprintf(stderr, "%s: invalid AIO request type 0x%x.\n",
                 __func__, type);
--
1.7.1
Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
Did you test this at all?

On Wed, Jul 27, 2011 at 08:25:25PM +0200, Frediano Ziglio wrote:
> +    case QEMU_AIO_FLUSH:
> +        io_prep_fdsync(iocbs, fd);
> +        break;

Looks great, but doesn't work as expected. Hint: grep for aio_fsync in
the Linux kernel.
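The hint is that the kernel accepts IOCB_CMD_FDSYNC in its ABI but, at the time of this thread, no filesystem implemented the ->aio_fsync hook, so submission failed with -EINVAL. A hedged sketch of probing this at runtime via the raw syscalls (`aio_fsync_supported` is a hypothetical name; on current kernels the probe may well report support):

```c
#include <errno.h>
#include <linux/aio_abi.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical probe: submit IOCB_CMD_FDSYNC through the raw io_submit
 * syscall and report whether the kernel actually implements aio fsync
 * for this fd.  Returns 1 if supported, 0 if not (e.g. -EINVAL), and
 * a negative errno if the aio context cannot even be set up. */
static int aio_fsync_supported(int fd)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    struct io_event ev;
    long ret;

    if (syscall(SYS_io_setup, 1, &ctx) < 0) {
        return -errno;
    }

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_FDSYNC;

    ret = syscall(SYS_io_submit, ctx, 1, cbs);
    if (ret == 1) {
        /* Submission accepted: reap the completion event. */
        syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL);
        ret = 1;
    } else {
        ret = 0;                /* e.g. EINVAL: opcode not implemented */
    }

    syscall(SYS_io_destroy, ctx);
    return (int)ret;
}
```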
Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
Il giorno 27/lug/2011, alle ore 20:31, Christoph Hellwig h...@lst.de ha scritto:
> Did you test this at all?

Yes! Not at the kernel level :-) Usually I trust documentation and man
pages.

> On Wed, Jul 27, 2011 at 08:25:25PM +0200, Frediano Ziglio wrote:
> > +    case QEMU_AIO_FLUSH:
> > +        io_prep_fdsync(iocbs, fd);
> > +        break;
>
> Looks great, but doesn't work as expected. Hint: grep for aio_fsync in
> the Linux kernel.

Thanks. I'll try to port misaligned access to Linux AIO. I'll also add
some comments to the code so nobody repeats the same mistake I made.

Mainly, however, -k in qemu-img and aio=native in blockdev options are
silently ignored if nocache is not enabled.

Also I notice that combining XFS, Linux AIO, O_DIRECT and O_DSYNC gives
impressive performance, but currently there is no way to specify all
these flags together because nocache enables O_DIRECT while O_DSYNC is
enabled with writethrough.

Frediano
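The flag conflict described here is that each existing cache mode picks one open(2) flag, with no way to combine them. A hypothetical sketch of what a combined mode would map to (the mode names and the helper are illustrative, not QEMU's actual option parser):

```c
#define _GNU_SOURCE             /* O_DIRECT is a GNU extension on Linux */
#include <fcntl.h>
#include <string.h>

/* Hypothetical mapping from cache mode names to open(2) flags.  The
 * proposed "directsync" mode is the O_DIRECT | O_DSYNC combination
 * that the current nocache/writethrough split cannot express. */
static int cache_mode_flags(const char *mode)
{
    if (strcmp(mode, "writeback") == 0) {
        return 0;                   /* host page cache, no per-write sync */
    }
    if (strcmp(mode, "writethrough") == 0) {
        return O_DSYNC;             /* flush on every write */
    }
    if (strcmp(mode, "none") == 0) {
        return O_DIRECT;            /* bypass the host page cache */
    }
    if (strcmp(mode, "directsync") == 0) {
        return O_DIRECT | O_DSYNC;  /* the combination discussed here */
    }
    return -1;                      /* unknown mode */
}
```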
Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
On Wed, Jul 27, 2011 at 09:52:51PM +0200, Frediano Ziglio wrote:
> Yes! Not at the kernel level :-)

In that case we have a bad error handling problem somewhere in qemu.
The IOCB_CMD_FDSYNC aio opcode will always return EINVAL currently, and
we really should have caught that somewhere in qemu.

> Thanks. I'll try to port misaligned access to Linux AIO. I'll also add
> some comments to the code so nobody repeats the same mistake I made.

It's the direct I/O code in general that doesn't handle misaligned
access. Given that we should never get misaligned I/O from guests, I
just didn't bother duplicating the read-modify-write code for that code
path as well.

> Mainly, however, -k in qemu-img and aio=native in blockdev options are
> silently ignored if nocache is not enabled.

Maybe we should indeed error out instead. Care to prepare a patch for
that?

> Also I notice that combining XFS, Linux AIO, O_DIRECT and O_DSYNC gives
> impressive performance, but currently there is no way to specify all
> these flags together because nocache enables O_DIRECT while O_DSYNC is
> enabled with writethrough.

Indeed. This has come up a few times, and actually is a mostly trivial
task. Maybe we should give up waiting for -blockdev and separate cache
mode settings and allow a nocache-writethrough or similar mode now?
It's going to be around 10 lines of code + documentation.
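For context on the misaligned-access point: O_DIRECT requires buffer addresses, file offsets and lengths to be multiples of the device's logical block size, which is what forces a read-modify-write path for anything misaligned. A minimal sketch of allocating a suitably aligned buffer (512 bytes is an assumed sector size, not something this thread specifies):

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate a buffer usable with O_DIRECT: the address is aligned to an
 * assumed 512-byte logical sector size.  Returns NULL on failure; the
 * caller frees with free().  Real code would query the actual logical
 * block size (e.g. via the BLKSSZGET ioctl) instead of assuming 512. */
static void *alloc_direct_buf(size_t len)
{
    void *buf = NULL;

    if (posix_memalign(&buf, 512, len) != 0) {
        return NULL;
    }
    return buf;
}
```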