Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation

2011-07-29 Thread Christoph Hellwig
On Thu, Jul 28, 2011 at 02:41:02PM +0200, Kevin Wolf wrote:
  More or less.  There's one corner case for all Linux I/O, and that is
  only writes up to INT_MAX are supported, and larger writes (and reads)
  get truncated to it.  It's pretty nasty, but Linux has been vocally
  opposed to fixing this issue.
 
 I think we can safely ignore this. So just replacing the current
 ret = -EINVAL; by a memset(buf + ret, 0, len - ret); ret = 0; should be
 okay, right? (Of course using the qiov versions, but you get the idea)

This should be safe.




Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation

2011-07-29 Thread Stefan Hajnoczi
On Thu, Jul 28, 2011 at 1:15 PM, Christoph Hellwig h...@lst.de wrote:
 On Thu, Jul 28, 2011 at 09:47:05AM +0200, Kevin Wolf wrote:
  Indeed.  This has come up a few times, and actually is a mostly trivial
  task.  Maybe we should give up waiting for -blockdev and separate cache
  mode settings and allow a nocache-writethrough or similar mode now?  It's
  going to be around 10 lines of code + documentation.

 I understand that there may be reasons for using O_DIRECT | O_DSYNC, but
 what is the explanation for O_DSYNC improving performance?

 There isn't any, at least for modern Linux.  O_DSYNC at this point is
 equivalent to a range fdatasync for each write call, and given that we're
 doing O_DIRECT the range flush doesn't matter.  If you do have a modern
 host and an old guest it might end up being faster because the barrier
 implementation in Linux used to suck so badly, but that's not inherent
 to the I/O model.  If your guest however doesn't support cache flushes at
 all, O_DIRECT | O_DSYNC is the only sane model to use for local filesystems
 and block devices.

I can rebase this cache=directsync patch and send it:
http://repo.or.cz/w/qemu/stefanha.git/commitdiff/6756719a46ac9876ac6d0460a33ad98e96a3a923

The other weird caching-related option I was playing with is -drive
...,readahead=on|off.  It lets you disable the host kernel readahead
on buffered modes (cache=writeback|writethrough):
http://repo.or.cz/w/qemu/stefanha.git/commitdiff/f2fc2b297a2b2dd0cccd1dc2f7c519f3b0374e0d

Stefan



Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation

2011-07-28 Thread Kevin Wolf
Am 27.07.2011 21:57, schrieb Christoph Hellwig:
 On Wed, Jul 27, 2011 at 09:52:51PM +0200, Frediano Ziglio wrote:
 Also I noticed that combining XFS, Linux AIO, O_DIRECT and O_DSYNC gives
 impressive performance, but currently there is no way to specify all these
 flags together, because nocache enables O_DIRECT while O_DSYNC is enabled
 with writethrough.
 
 Indeed.  This has come up a few times, and actually is a mostly trivial
 task.  Maybe we should give up waiting for -blockdev and separate cache
 mode settings and allow a nocache-writethrough or similar mode now?  It's
 going to be around 10 lines of code + documentation.

I understand that there may be reasons for using O_DIRECT | O_DSYNC, but
what is the explanation for O_DSYNC improving performance?

Christoph, on another note: Can we rely on Linux AIO never returning
short writes except on EOF? Currently we return -EINVAL in this case, so
I hope it's true or we wouldn't return the correct error code.

The reason why I'm asking is because I want to allow reads across EOF
for growable images and pad with zeros (the synchronous code does this
today in order to allow bdrv_pread/pwrite to work, and when we start
using coroutines in the block layer, these cases will hit the AIO paths).

Kevin



Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation

2011-07-28 Thread Christoph Hellwig
On Thu, Jul 28, 2011 at 09:47:05AM +0200, Kevin Wolf wrote:
  Indeed.  This has come up a few times, and actually is a mostly trivial
  task.  Maybe we should give up waiting for -blockdev and separate cache
  mode settings and allow a nocache-writethrough or similar mode now?  It's
  going to be around 10 lines of code + documentation.
 
 I understand that there may be reasons for using O_DIRECT | O_DSYNC, but
 what is the explanation for O_DSYNC improving performance?

There isn't any, at least for modern Linux.  O_DSYNC at this point is
equivalent to a range fdatasync for each write call, and given that we're
doing O_DIRECT the range flush doesn't matter.  If you do have a modern
host and an old guest it might end up being faster because the barrier
implementation in Linux used to suck so badly, but that's not inherent
to the I/O model.  If your guest however doesn't support cache flushes at
all, O_DIRECT | O_DSYNC is the only sane model to use for local filesystems
and block devices.
 

 Christoph, on another note: Can we rely on Linux AIO never returning
 short writes except on EOF? Currently we return -EINVAL in this case, so
 I hope it's true or we wouldn't return the correct error code.

More or less.  There's one corner case for all Linux I/O, and that is
only writes up to INT_MAX are supported, and larger writes (and reads)
get truncated to it.  It's pretty nasty, but Linux has been vocally
opposed to fixing this issue.




Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation

2011-07-28 Thread Kevin Wolf
Am 28.07.2011 14:15, schrieb Christoph Hellwig:
 Christoph, on another note: Can we rely on Linux AIO never returning
 short writes except on EOF? Currently we return -EINVAL in this case, so

short reads I meant, of course.

 I hope it's true or we wouldn't return the correct error code.
 
 More or less.  There's one corner case for all Linux I/O, and that is
 only writes up to INT_MAX are supported, and larger writes (and reads)
 get truncated to it.  It's pretty nasty, but Linux has been vocally
 opposed to fixing this issue.

I think we can safely ignore this. So just replacing the current
ret = -EINVAL; by a memset(buf + ret, 0, len - ret); ret = 0; should be
okay, right? (Of course using the qiov versions, but you get the idea)

Kevin



Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation

2011-07-27 Thread Christoph Hellwig
Did you test this at all?

On Wed, Jul 27, 2011 at 08:25:25PM +0200, Frediano Ziglio wrote:
 +case QEMU_AIO_FLUSH:
 +io_prep_fdsync(iocbs, fd);
 +break;

Looks great, but doesn't work as expected.

Hint: grep for aio_fsync in the linux kernel.




Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation

2011-07-27 Thread Frediano Ziglio

Il giorno 27/lug/2011, alle ore 20:31, Christoph Hellwig h...@lst.de ha 
scritto:

 Did you test this at all?
 

Yes! Not at kernel level :-)
Usually I trust documentation and man pages.

 On Wed, Jul 27, 2011 at 08:25:25PM +0200, Frediano Ziglio wrote:
 +case QEMU_AIO_FLUSH:
 +io_prep_fdsync(iocbs, fd);
 +break;
 
 Looks great, but doesn't work as expected.
 
 Hint: grep for aio_fsync in the linux kernel.
 

Thanks. I'll try to port the misaligned access handling to Linux AIO. I'll also
add some comments to the code so nobody else makes the same mistake I did.
More importantly, qemu-img's -k option and aio=native in the blockdev options
are silently ignored if nocache is not enabled. I also noticed that combining
XFS, Linux AIO, O_DIRECT and O_DSYNC gives impressive performance, but
currently there is no way to specify all these flags together, because nocache
enables O_DIRECT while O_DSYNC is enabled with writethrough.

Frediano


Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation

2011-07-27 Thread Christoph Hellwig
On Wed, Jul 27, 2011 at 09:52:51PM +0200, Frediano Ziglio wrote:
  
 
 Yes! Not at kernel level :-)

In that case we have a bad error handling problem somewhere in qemu.
The IOCB_CMD_FDSYNC aio opcode currently always returns EINVAL, and we
really should have caught that somewhere in qemu.

 Thanks. I'll try to port the misaligned access handling to Linux AIO. I'll
 also add some comments to the code so nobody else makes the same mistake I did.

It's direct I/O code in general that doesn't handle misaligned access.
Given that we should never get misaligned I/O from guests I just didn't
bother duplicating the read-modify-write code for that code path as well.

 More importantly, qemu-img's -k option and aio=native in the blockdev options
 are silently ignored if nocache is not enabled.

Maybe we should indeed error out instead.  Care to prepare a patch for that?

 Also I noticed that combining XFS, Linux AIO, O_DIRECT and O_DSYNC gives
 impressive performance, but currently there is no way to specify all these
 flags together, because nocache enables O_DIRECT while O_DSYNC is enabled
 with writethrough.

Indeed.  This has come up a few times, and actually is a mostly trivial
task.  Maybe we should give up waiting for -blockdev and separate cache
mode settings and allow a nocache-writethrough or similar mode now?  It's
going to be around 10 lines of code + documentation.