Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-05 Thread Anthony Liguori

Gerd Hoffmann wrote:
> Anthony Liguori wrote:
>> Gerd Hoffmann wrote:
>>>   Hi,
>>
>> I really want to use readv/writev though.  With virtio, we get a
>> scatter/gather list for each IO request.
>
> Yep, I've also missed pwritev (or whatever that syscall would be named).
>
>> Once I post the virtio-blk driver, I'll follow up a little later with
>> some refactoring of the block device layers.  I think it can be made
>> much simpler while still remaining asynchronous.
>>
>>> IMHO the only alternative to that scheme would be to turn the block
>>> drivers into some kind of remapping drivers for the various file formats
>>> which don't actually perform the I/O.  Then you can handle the actual
>>> I/O in a generic way using whatever API is available, be it posix-aio,
>>> linux-aio or slow-sync-io.
>>
>> That's part of my plan.
>
> Oh, cool.  Can you also turn them into a sane shared library while you're
> at it?  The current approach of compiling it once for qemu and once for
> qemu-img with -DQEMU_TOOL isn't that great.  But if you factor out the
> actual I/O, the block-raw.c code should have no need to mess with qemu
> internals any more and become much cleaner and simpler ...

Yeah, it is definitely something that should be turned into a shared
library.  I don't think I'll attempt that at first, but I do agree it's
the right direction to move toward.


Regards,

Anthony Liguori








Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-04 Thread Gerd Hoffmann
Anthony Liguori wrote:
> Gerd Hoffmann wrote:
  Hi,

> I really want to use readv/writev though.  With virtio, we get a
> scatter/gather list for each IO request.

Yep, I've also missed pwritev (or whatever that syscall would be named).
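
As a rough illustration of the fallback that implies (a sketch only, not
code from any patch in this thread): until a real pwritev() syscall exists,
a vectored positioned write can be emulated with one pwrite() per iovec
element, at the cost of atomicity and extra syscalls:

#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: emulate pwritev() with one pwrite() per element.
 * Unlike a real syscall this is not atomic with respect to other writers,
 * but it never touches the shared file offset the way lseek()+writev()
 * would. */
static ssize_t my_pwritev(int fd, const struct iovec *iov, int iovcnt,
                          off_t offset)
{
    ssize_t total = 0;
    int i;

    for (i = 0; i < iovcnt; i++) {
        ssize_t done = pwrite(fd, iov[i].iov_base, iov[i].iov_len, offset);
        if (done < 0)
            return total ? total : -1;   /* errno set by pwrite() */
        total += done;
        offset += done;
        if ((size_t)done < iov[i].iov_len)
            break;                       /* short write, stop here */
    }
    return total;
}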

> Once I post the virtio-blk driver, I'll follow up a little later with
> some refactoring of the block device layers.  I think it can be made
> much simpler while still remaining asynchronous.
> 
>> IMHO the only alternative to that scheme would be to turn the block
>> drivers into some kind of remapping drivers for the various file formats
>> which don't actually perform the I/O.  Then you can handle the actual
>> I/O in a generic way using whatever API is available, be it posix-aio,
>> linux-aio or slow-sync-io.
> 
> That's part of my plan.

Oh, cool.  Can you also turn them into a sane shared library while you're
at it?  The current approach of compiling it once for qemu and once for
qemu-img with -DQEMU_TOOL isn't that great.  But if you factor out the
actual I/O the block-raw.c code should have no need to mess with qemu
internals any more and become much cleaner and simpler ...

cheers,
  Gerd





Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-04 Thread Anthony Liguori

Gerd Hoffmann wrote:
> Anthony Liguori wrote:
>>> IMHO it would be a much better idea to kill the aio interface altogether
>>> and instead make the block drivers reentrant.  Then you can use
>>> (multiple) posix threads to run the I/O async if you want.
>>
>> Threads are a poor substitute for a proper AIO interface.  linux-aio
>> gives you everything you could possibly want in an interface since it
>> allows you to submit multiple vectored operations in a single syscall,
>> use an fd to signal request completion, complete multiple requests in a
>> single syscall, and inject barriers via fdsync.
>
> I still think implementing async i/o at block driver level is the wrong
> thing to do.  You'll end up reinventing the wheel over and over again
> and add complexity to the block drivers which simply doesn't belong
> there (or not supporting async I/O for most file formats).  Just look at
> the insane file size of the block driver for the simplest possible disk
> format: block-raw.c.  It will become even worse when adding a
> linux-specific aio variant.
>
> In contrast:  Making the disk drivers reentrant should be easy for most
> of them.  For the raw driver it should be just using pread/pwrite
> syscalls instead of lseek + read/write (also saves a syscall along the
> way, yea!).  Others probably need an additional lock for metadata
> updates.  With that in place you can easily implement async I/O via
> threads one layer above, and only once, in block.c.

I really want to use readv/writev though.  With virtio, we get a
scatter/gather list for each IO request.

Once I post the virtio-blk driver, I'll follow up a little later with
some refactoring of the block device layers.  I think it can be made
much simpler while still remaining asynchronous.

> IMHO the only alternative to that scheme would be to turn the block
> drivers into some kind of remapping drivers for the various file formats
> which don't actually perform the I/O.  Then you can handle the actual
> I/O in a generic way using whatever API is available, be it posix-aio,
> linux-aio or slow-sync-io.

That's part of my plan.

Regards,

Anthony Liguori








Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-04 Thread Gerd Hoffmann
Anthony Liguori wrote:
>> IMHO it would be a much better idea to kill the aio interface altogether
>> and instead make the block drivers reentrant.  Then you can use
>> (multiple) posix threads to run the I/O async if you want.
> 
> Threads are a poor substitute for a proper AIO interface.  linux-aio
> gives you everything you could possibly want in an interface since it
> allows you to submit multiple vectored operations in a single syscall,
> use an fd to signal request completion, complete multiple requests in a
> single syscall, and inject barriers via fdsync.

I still think implementing async i/o at block driver level is the wrong
thing to do.  You'll end up reinventing the wheel over and over again
and add complexity to the block drivers which simply doesn't belong
there (or not supporting async I/O for most file formats).  Just look at
the insane file size of the block driver for the simplest possible disk
format: block-raw.c.  It will become even worse when adding a
linux-specific aio variant.

In contrast:  Making the disk drivers reentrant should be easy for most
of them.  For the raw driver it should be just using pread/pwrite
syscalls instead of lseek + read/write (also saves a syscall along the
way, yea!).  Others probably need an additional lock for metadata
updates.  With that in place you can easily implement async I/O via
threads one layer above, and only once, in block.c.
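
A minimal sketch of the difference (illustrative only, not QEMU code): the
lseek()+read() pattern shares the file offset between callers, while
pread() carries the offset in the call, so block.c could issue it from
several worker threads on the same fd concurrently:

#include <unistd.h>

/* Not reentrant: another thread may move the shared file offset between
 * the two calls, and it costs two syscalls per request. */
static ssize_t raw_read_seek(int fd, void *buf, size_t len, off_t offset)
{
    if (lseek(fd, offset, SEEK_SET) < 0)
        return -1;
    return read(fd, buf, len);
}

/* Reentrant: the offset travels with the call, one syscall per request. */
static ssize_t raw_read_at(int fd, void *buf, size_t len, off_t offset)
{
    return pread(fd, buf, len, offset);
}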

IMHO the only alternative to that scheme would be to turn the block
drivers into some kind of remapping drivers for the various file formats
which don't actually perform the I/O.  Then you can handle the actual
I/O in a generic way using whatever API is available, be it posix-aio,
linux-aio or slow-sync-io.

cheers,
  Gerd




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-04 Thread Laurent Vivier
On Tuesday 04 December 2007 at 13:49 +0100, Gerd Hoffmann wrote:
> Anthony Liguori wrote:
> > I have a patch that uses linux-aio for the virtio-blk driver I'll be
> > posting tomorrow and I'm extremely happy with the results.  In recent
> > kernels, you can use an eventfd interface along with linux-aio so that
> > polling is unnecessary.
> 
> Which kernel version is "recent"?

I think it is 2.6.22 and after

Laurent
-- 
- [EMAIL PROTECTED]  --
   "Any sufficiently advanced technology is
  indistinguishable from magic." - Arthur C. Clarke




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-04 Thread Gerd Hoffmann
Anthony Liguori wrote:
> I have a patch that uses linux-aio for the virtio-blk driver I'll be
> posting tomorrow and I'm extremely happy with the results.  In recent
> kernels, you can use an eventfd interface along with linux-aio so that
> polling is unnecessary.

Which kernel version is "recent"?

cheers,
  Gerd






Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-04 Thread Laurent Vivier
On Monday 03 December 2007 at 19:16, Paul Brook wrote:
> > Yes, librt is providing posix-aio, and librt coming with GNU libc uses
> > threads.
> > But if I remember correctly librt coming with RHEL uses a mix of threads
> > and linux kernel AIO (you can have a look to the .srpm of libc).
> >
> > BTW, if everyone thinks it could be a good idea I can port block-raw.c
> > to use linux kernel AIO (without removing POSIX AIO support, of course)
> 
> This seems rather pointless, given a user can just use a linux-AIO librt 
> instead.

Just a comment: to use linux-aio, the file must be opened with O_DIRECT.
(It's a good reason to include my patch, isn't it?)
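
For illustration, a minimal sketch of what that requirement means in
practice (assuming 512-byte alignment; the real constraint is the device's
logical block size, and the file name is made up):

#define _GNU_SOURCE            /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    void *buf;
    int fd = open("disk.img", O_RDWR | O_DIRECT);

    if (fd < 0)
        return 1;
    if (posix_memalign(&buf, 512, 4096) != 0)   /* aligned buffer */
        return 1;
    memset(buf, 0, 4096);

    /* buffer, length and file offset must all respect the alignment */
    if (pwrite(fd, buf, 4096, 0) != 4096)
        return 1;

    free(buf);
    close(fd);
    return 0;
}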

Laurent
-- 
- [EMAIL PROTECTED]  --
   "Any sufficiently advanced technology is
  indistinguishable from magic." - Arthur C. Clarke




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Anthony Liguori

Gerd Hoffmann wrote:
>   Hi,
>
>> BTW, if everyone thinks it could be a good idea I can port block-raw.c
>> to use linux kernel AIO (without removing POSIX AIO support, of course)
>
> IMHO it would be a much better idea to kill the aio interface altogether
> and instead make the block drivers reentrant.  Then you can use
> (multiple) posix threads to run the I/O async if you want.


Threads are a poor substitute for a proper AIO interface.  linux-aio 
gives you everything you could possibly want in an interface since it 
allows you to submit multiple vectored operations in a single syscall, 
use an fd to signal request completion, complete multiple requests in a 
single syscall, and inject barriers via fdsync.
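
As a standalone sketch of those properties (illustrative only, not the
virtio-blk patch being described): with libaio, several vectored requests
go down in one io_submit() call and several completions come back from one
io_getevents() call.  The file name and sizes below are made up, and with
O_DIRECT the buffers and offsets would additionally have to be
sector-aligned.

/* Sketch: submit two vectored reads with one syscall, reap both
 * completions with another.  Error handling is minimal. */
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    io_context_t ctx = 0;
    struct iocb cb[2], *cbs[2] = { &cb[0], &cb[1] };
    struct io_event events[2];
    struct iovec iov[2];
    void *buf0, *buf1;
    int fd = open("disk.img", O_RDONLY);

    if (fd < 0 || io_setup(16, &ctx) < 0)
        return 1;
    if (posix_memalign(&buf0, 512, 4096) || posix_memalign(&buf1, 512, 4096))
        return 1;

    iov[0].iov_base = buf0; iov[0].iov_len = 4096;
    iov[1].iov_base = buf1; iov[1].iov_len = 4096;

    /* two scatter/gather requests, one syscall to submit them ... */
    io_prep_preadv(&cb[0], fd, &iov[0], 1, 0);
    io_prep_preadv(&cb[1], fd, &iov[1], 1, 4096);
    if (io_submit(ctx, 2, cbs) != 2)
        return 1;

    /* ... and one syscall to collect both completions */
    if (io_getevents(ctx, 2, 2, events, NULL) != 2)
        return 1;

    io_destroy(ctx);
    close(fd);
    return 0;
}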


Regards,

Anthony Liguori








Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Anthony Liguori

Paul Brook wrote:
>> Yes, librt is providing posix-aio, and librt coming with GNU libc uses
>> threads.
>> But if I remember correctly librt coming with RHEL uses a mix of threads
>> and linux kernel AIO (you can have a look to the .srpm of libc).
>>
>> BTW, if everyone thinks it could be a good idea I can port block-raw.c
>> to use linux kernel AIO (without removing POSIX AIO support, of course)
>
> This seems rather pointless, given a user can just use a linux-AIO librt
> instead.


Not at all.  linux-aio is the only interface that allows you to do 
asynchronous fdsync, which simulates a barrier and so allows for an 
ordered queue.
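
A sketch of how such a barrier could be queued through the same submission
path as the writes (assuming the kernel and the filesystem actually accept
IOCB_CMD_FDSYNC for the fd; whether it orders against still-queued writes
is implementation-dependent, so this is illustrative only):

#include <libaio.h>

/* Queue a flush behind the writes already submitted on this context;
 * its completion event marks the point where the earlier data should be
 * on stable storage. */
static int submit_flush(io_context_t ctx, struct iocb *cb, int fd,
                        void *opaque)
{
    struct iocb *cbs[1] = { cb };

    io_prep_fdsync(cb, fd);
    cb->data = opaque;              /* completion cookie, as for the writes */
    return io_submit(ctx, 1, cbs);  /* 1 on success */
}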


I have a patch that uses linux-aio for the virtio-blk driver I'll be 
posting tomorrow and I'm extremely happy with the results.  In recent 
kernels, you can use an eventfd interface along with linux-aio so that 
polling is unnecessary.  Along with O_DIRECT and the preadv/pwritev 
interface, you can make a block backend in userspace that performs just 
as well as if it were in the kernel.
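
For illustration, a sketch of the eventfd wiring (this assumes a kernel
with eventfd support, 2.6.22 or so, and a libaio new enough to provide
io_set_eventfd(); the helper names are made up, not taken from the patch
being described).  The eventfd can then be registered with the existing
select()-based main loop, so completions are reaped only when the kernel
signals them:

#include <libaio.h>
#include <stdint.h>
#include <unistd.h>

/* Submit one read whose completion will bump the eventfd counter. */
static int submit_with_eventfd(io_context_t ctx, struct iocb *cb, int fd,
                               void *buf, size_t len, off_t offset, int efd)
{
    struct iocb *cbs[1] = { cb };

    io_prep_pread(cb, fd, buf, len, offset);
    io_set_eventfd(cb, efd);        /* completion signals the eventfd */
    return io_submit(ctx, 1, cbs);
}

/* Called when the main loop sees the eventfd readable: drain the counter,
 * then reap whatever completions are pending without blocking. */
static void on_aio_ready(io_context_t ctx, int efd)
{
    uint64_t count;
    struct io_event events[16];

    if (read(efd, &count, sizeof(count)) == sizeof(count))
        io_getevents(ctx, 0, 16, events, NULL);
}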


The posix-aio interface simply doesn't provide a mechanism to do these 
things.


Regards,

Anthony Liguori








Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Samuel Thibault
Gerd Hoffmann, on Mon 03 Dec 2007 22:13:07 +0100, wrote:
> > BTW, if everyone thinks it could be a good idea I can port block-raw.c
> > to use linux kernel AIO (without removing POSIX AIO support, of course)
> 
> IMHO it would be a much better idea to kill the aio interface altogether
> and instead make the block drivers reentrant.  Then you can use
> (multiple) posix threads to run the I/O async if you want.

Mmm, that will not make my life easier... I'm precisely trying to avoid
threads so as to get better throughput.

Samuel




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Gerd Hoffmann
  Hi,

> BTW, if everyone thinks it could be a good idea I can port block-raw.c
> to use linux kernel AIO (without removing POSIX AIO support, of course)

IMHO it would be a much better idea to kill the aio interface altogether
and instead make the block drivers reentrant.  Then you can use
(multiple) posix threads to run the I/O async if you want.

cheers,
  Gerd






Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Samuel Thibault
Paul Brook, on Mon 03 Dec 2007 15:39:48, wrote:
> I think host caching is still useful enough to be enabled by default, and 
> provides a significant performance increase in several cases. 
> 
> - The guest typically has a relatively small quantity of RAM, compared to a 
> modern machine.  Allowing the host OS to act as a demand-based L2 cache 
> allows this to be used without having to dedicate excessive quantities of ram 
> to qemu.
> - I've seen reports that it significantly speeds up the windows installer.
> - Host cache is persistent between multiple qemu runs. If you're doing anything 
> that requires frequent guest reboots (e.g. kernel debugging) this is going to 
> be a huge win.
> - You're running a host OS that has limited or no caching (e.g. DOS).

Yes, and in other cases (e.g. real-production KVM/Xen servers), this is
just cache duplication.

> I'd hope that the host OS would have cache use heuristics that would help 
> limit cache pollution.

How could it?  It can't detect that the guest also has a buffer/page
cache.

Samuel




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Paul Brook
> Yes, librt is providing posix-aio, and librt coming with GNU libc uses
> threads.
> But if I remember correctly librt coming with RHEL uses a mix of threads
> and linux kernel AIO (you can have a look to the .srpm of libc).
>
> BTW, if everyone thinks it could be a good idea I can port block-raw.c
> to use linux kernel AIO (without removing POSIX AIO support, of course)

This seems rather pointless, given a user can just use a linux-AIO librt 
instead.

Paul




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Paul Brook
> Well, let's separate a few things.  QEMU uses posix-aio which uses
> threads and normal read/write operations.  It also limits the number of
> threads that aio uses to 1 which effectively makes everything
> synchronous anyway.

This is a bug. Allegedly this is to work around an old broken glibc, so we 
should probably make it conditional on old glibc.

Paul




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Laurent Vivier
On Monday 03 December 2007 at 12:06 -0600, Anthony Liguori wrote:
> Samuel Thibault wrote:
> > Anthony Liguori, on Mon 03 Dec 2007 09:54:47 -0600, wrote:
> >   
> >> Have you done any performance testing?  Buffered IO should absolutely 
> >> beat direct IO simply because buffered IO allows writes to complete 
> >> before they actually hit disk.
> >> 
> >
> > Since qemu can use the aio interface, that shouldn't matter.
> >   
> 
> Well, let's separate a few things.  QEMU uses posix-aio which uses 
> threads and normal read/write operations.  It also limits the number of 
> threads that aio uses to 1 which effectively makes everything 
> synchronous anyway.

Yes, librt is providing posix-aio, and librt coming with GNU libc uses
threads.
But if I remember correctly the librt coming with RHEL uses a mix of threads
and linux kernel AIO (you can have a look at the .srpm of libc).

There is also the libaio I wrote some years ago (with Sébastien Dugué)
which is purely linux kernel AIO (but the kernel patches were never merged
because of Zach Brown's Asynchronous System Call work).

BTW, if everyone thinks it could be a good idea I can port block-raw.c
to use linux kernel AIO (without removing POSIX AIO support, of course)

> But it still doesn't matter.  When you issue a write() on an O_DIRECT 
> fd, the write does not complete until the data has made it's way to 
> disk.  The guest can still run if you're using O_NONBLOCK but the IDE 
> device will not submit another IO request until you complete the DMA 
> operation.
> 
> The SCSI device supports multiple outstanding operations but it's 
> limited to 16 but you'll never see more than one request at a time in 
> QEMU currently because of the limitation to a single thread.
> 
> Regards,
> 
> Anthony Liguori
> 
> > Samuel
> >
> >
> >
> >   
> 
> 
> 
-- 
- [EMAIL PROTECTED]  --
   "Any sufficiently advanced technology is
  indistinguishable from magic." - Arthur C. Clarke




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Laurent Vivier
On Monday 03 December 2007 at 09:54 -0600, Anthony Liguori wrote:
> Laurent Vivier wrote:
> > On Monday 03 December 2007 at 11:23 +0100, Fabrice Bellard wrote:
> >   
> >> Laurent Vivier wrote:
> >> 
> >>> This patch enhances the "-drive ,cache=off" mode with IDE drive emulation
> >>> by removing the buffer used in the IDE emulation.
> >>> ---
> >>>  block.c     |   10 +++
> >>>  block.h     |    2 
> >>>  block_int.h |    1 
> >>>  cpu-all.h   |    1 
> >>>  exec.c      |   19 ++
> >>>  hw/ide.c    |  176 +---
> >>>  vl.c        |    1 
> >>>  7 files changed, 204 insertions(+), 6 deletions(-)
> >>>   
> >> What's the use of keeping the buffered case ?
> >> 
> >
> > Well, I don't like to remove code written by others...
> > and I don't want to break something.
> >
> > But if you think I should remove the buffered case, I can.
> >
> > BTW, do you think I should enable "cache=off" by default ?
> > Or even remove the option from the command line and always use
> > O_DIRECT ?
> >   
> 
> Hi Laurent,

Hi Anthony,

> Have you done any performance testing?  Buffered IO should absolutely 
> beat direct IO simply because buffered IO allows writes to complete 
> before they actually hit disk.  I've observed this myself.  Plus the 
> host typically has a much larger page cache than the guest so the second 
> level of caching helps an awful lot.

I don't have real benchmarks. I just saw some improvements with dbench
(which is not a good benchmark, I know...)

Direct I/O can be good in some cases (because it avoids multiple copies)
and bad in others (because buffered I/O avoids disk accesses and, as you
say, doesn't wait for I/O completion).

But there are at least two other good reasons to use it:

- reliability: by avoiding the cache we improve the probability that the
data is on disk (and the ordering of I/O). And as you say, since we wait
for write completion, we are sure the data has been written.

- isolation: it avoids polluting the host cache with guest data (and if
we have several guests, it avoids performance interference between guests
at the cache level).

But there is no perfect solution, which is why I think it is a good thing
to leave the choice to the user.

Laurent
- 
-- 
- [EMAIL PROTECTED]  --
   "Any sufficiently advanced technology is
  indistinguishable from magic." - Arthur C. Clarke




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Jamie Lokier
Anthony Liguori wrote:
> >With the IDE emulation, when the emulated "disk write cache" flag is
> >on it may be reasonable to report a write as completed when the AIO is
> >dispatched, without waiting for the AIO to complete. 
> >
> >An IDE flush cache command would wait for all outstanding write AIOs
> >to complete, and then issue a flush cache (fdatasync) to the real
> >device before reporting it has completed.
> >
> >That's roughly equivalent to what an IDE disk with write caching does,
> >and it would provide exactly the guarantees for safe storage to the
> >real physical medium that a journalling filesystem or database in the
> >guest requires.
> 
> Except that in an enterprise environment, you typically have battery 
> backed disk cache.  It really doesn't matter though b/c in QEMU today, 
> submitting the request blocks until it's completed anyway (which is 
> nearly instant anyway since I/O is buffered).

Buffered I/O is less reliable in a sense.

With buffered I/O, if the host crashes, you may lose data that a
filesystem or database on the guest reported as committed to
applications.  That can result, on those rare occasions, in guest
journalled filesystem corruption (something that should be
impossible), and in database corruption or durability failure.

With direct I/O and write cache emulation (as described), when a guest
journalling filesystem or database reports data is committed, it has
much the same commitment/durability guarantee that the same
applications would have running on the host.  Namely, the data has
reached the disk, and the disk has reported it's committed.

This may matter if you want to run those sort of applications in a
guest, which clearly people often do, especially with KVM or Xen.

Anecdote: This is already a problem in some environments.  I have a
rented virtual machine; it's running UML.  The UML disk uses O_SYNC
writes (nowadays), because buffered host writes resulted in occasional
guest data loss, and journalled filesystem corruption.  Unfortunately,
this is a performance slowdown, but it's better than occasional
corruption.  I imagine similar things apply with Qemu machines
occasionally.

-- Jamie




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Anthony Liguori

Jamie Lokier wrote:
> Paul Brook wrote:
>> On Monday 03 December 2007, Samuel Thibault wrote:
>>> Anthony Liguori, on Mon 03 Dec 2007 09:54:47 -0600, wrote:
>>>> Have you done any performance testing?  Buffered IO should absolutely
>>>> beat direct IO simply because buffered IO allows writes to complete
>>>> before they actually hit disk.
>>>
>>> Since qemu can use the aio interface, that shouldn't matter.
>>
>> Only if the emulated hardware and guest OS support multiple concurrent
>> commands.  IDE supports async operation, but not concurrent commands.  In
>> practice this means you only get full performance if you're using the SCSI
>> emulation.
>
> With the IDE emulation, when the emulated "disk write cache" flag is
> on it may be reasonable to report a write as completed when the AIO is
> dispatched, without waiting for the AIO to complete.
>
> An IDE flush cache command would wait for all outstanding write AIOs
> to complete, and then issue a flush cache (fdatasync) to the real
> device before reporting it has completed.
>
> That's roughly equivalent to what an IDE disk with write caching does,
> and it would provide exactly the guarantees for safe storage to the
> real physical medium that a journalling filesystem or database in the
> guest requires.
>
> If a guest doesn't use journalling with IDE write cache safely
> (e.g. 2.4 Linux and earlier), it can simply turn off the IDE "disk
> write cache" flag, which is what it has to do on a real physical disk
> too.
>
> Terminating the qemu process abruptly might cancel some AIOs, but even
> that is ok, as it's equivalent to pulling the power on a real disk
> with uncommitted cached writes.


Except that in an enterprise environment, you typically have battery 
backed disk cache.  It really doesn't matter though b/c in QEMU today, 
submitting the request blocks until it's completed anyway (which is 
nearly instant anyway since I/O is buffered).


Regards,

Anthony Liguori








Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Anthony Liguori

Samuel Thibault wrote:
> Anthony Liguori, on Mon 03 Dec 2007 09:54:47 -0600, wrote:
>> Have you done any performance testing?  Buffered IO should absolutely
>> beat direct IO simply because buffered IO allows writes to complete
>> before they actually hit disk.
>
> Since qemu can use the aio interface, that shouldn't matter.


Well, let's separate a few things.  QEMU uses posix-aio which uses 
threads and normal read/write operations.  It also limits the number of 
threads that aio uses to 1 which effectively makes everything 
synchronous anyway.
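
The limit in question is set through a glibc extension; roughly, something
along these lines (a sketch only, with illustrative values):

#define _GNU_SOURCE
#include <aio.h>
#include <string.h>

/* glibc-specific: configure the POSIX AIO helper-thread pool before the
 * first aio_read()/aio_write().  With aio_threads = 1 every request is
 * effectively serialized; a larger value allows real parallelism. */
static void setup_posix_aio(int nthreads)
{
    struct aioinit ai;

    memset(&ai, 0, sizeof(ai));
    ai.aio_threads = nthreads;
    ai.aio_num = 64;        /* expected number of simultaneous requests */
    aio_init(&ai);
}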


But it still doesn't matter.  When you issue a write() on an O_DIRECT 
fd, the write does not complete until the data has made its way to 
disk.  The guest can still run if you're using O_NONBLOCK but the IDE 
device will not submit another IO request until you complete the DMA 
operation.


The SCSI device supports multiple outstanding operations (though it's 
limited to 16), but you'll never see more than one request at a time in 
QEMU currently because of the limitation to a single thread.


Regards,

Anthony Liguori








Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Jamie Lokier
Paul Brook wrote:
> On Monday 03 December 2007, Samuel Thibault wrote:
> > Anthony Liguori, on Mon 03 Dec 2007 09:54:47 -0600, wrote:
> > > Have you done any performance testing?  Buffered IO should absolutely
> > > beat direct IO simply because buffered IO allows writes to complete
> > > before they actually hit disk.
> >
> > Since qemu can use the aio interface, that shouldn't matter.
> 
> Only if the emulated hardware and guest OS support multiple concurrent 
> commands.  IDE supports async operation, but not concurrent commands. In 
> practice this means you only get full performance if you're using the SCSI 
> emulation.

With the IDE emulation, when the emulated "disk write cache" flag is
on it may be reasonable to report a write as completed when the AIO is
dispatched, without waiting for the AIO to complete. 

An IDE flush cache command would wait for all outstanding write AIOs
to complete, and then issue a flush cache (fdatasync) to the real
device before reporting it has completed.

That's roughly equivalent to what an IDE disk with write caching does,
and it would provide exactly the guarantees for safe storage to the
real physical medium that a journalling filesystem or database in the
guest requires.

If a guest doesn't use journalling with IDE write cache safely
(e.g. 2.4 Linux and earlier), it can simply turn off the IDE "disk
write cache" flag, which is what it has to do on a real physical disk
too.

Terminating the qemu process abruptly might cancel some AIOs, but even
that is ok, as it's equivalent to pulling the power on a real disk
with uncommitted cached writes.
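
A self-contained toy model of that scheme (illustrative only, not QEMU
code): a write is acknowledged as soon as it is dispatched, a counter
tracks writes still in flight, and the emulated FLUSH CACHE waits for the
counter to drain before calling fdatasync() on the backing file.

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  drained = PTHREAD_COND_INITIALIZER;
static int outstanding_writes;

/* Called when an asynchronous write is handed to the AIO layer; the guest
 * sees the command complete now, before the data is on disk. */
static void write_dispatched(void)
{
    pthread_mutex_lock(&lock);
    outstanding_writes++;
    pthread_mutex_unlock(&lock);
}

/* Called from the AIO completion callback. */
static void write_completed(void)
{
    pthread_mutex_lock(&lock);
    if (--outstanding_writes == 0)
        pthread_cond_broadcast(&drained);
    pthread_mutex_unlock(&lock);
}

/* Emulated FLUSH CACHE: report success only once every dispatched write
 * has completed and the host has pushed the data to the medium. */
static int flush_cache(int backing_fd)
{
    pthread_mutex_lock(&lock);
    while (outstanding_writes > 0)
        pthread_cond_wait(&drained, &lock);
    pthread_mutex_unlock(&lock);
    return fdatasync(backing_fd);
}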

-- Jamie




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Paul Brook
On Monday 03 December 2007, Samuel Thibault wrote:
> Anthony Liguori, on Mon 03 Dec 2007 09:54:47 -0600, wrote:
> > Have you done any performance testing?  Buffered IO should absolutely
> > beat direct IO simply because buffered IO allows writes to complete
> > before they actually hit disk.
>
> Since qemu can use the aio interface, that shouldn't matter.

Only if the emulated hardware and guest OS support multiple concurrent 
commands.  IDE supports async operation, but not concurrent commands. In 
practice this means you only get full performance if you're using the SCSI 
emulation.

Paul




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Samuel Thibault
Anthony Liguori, on Mon 03 Dec 2007 09:54:47 -0600, wrote:
> Have you done any performance testing?  Buffered IO should absolutely 
> beat direct IO simply because buffered IO allows writes to complete 
> before they actually hit disk.

Since qemu can use the aio interface, that shouldn't matter.

Samuel




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Anthony Liguori

Laurent Vivier wrote:
> On Monday 03 December 2007 at 11:23 +0100, Fabrice Bellard wrote:
>> Laurent Vivier wrote:
>>> This patch enhances the "-drive ,cache=off" mode with IDE drive emulation
>>> by removing the buffer used in the IDE emulation.
>>> ---
>>>  block.c     |   10 +++
>>>  block.h     |    2 
>>>  block_int.h |    1 
>>>  cpu-all.h   |    1 
>>>  exec.c      |   19 ++
>>>  hw/ide.c    |  176 +---
>>>  vl.c        |    1 
>>>  7 files changed, 204 insertions(+), 6 deletions(-)
>>
>> What's the use of keeping the buffered case ?
>
> Well, I don't like to remove code written by others...
> and I don't want to break something.
>
> But if you think I should remove the buffered case, I can.
>
> BTW, do you think I should enable "cache=off" by default ?
> Or even remove the option from the command line and always use
> O_DIRECT ?

Hi Laurent,

Have you done any performance testing?  Buffered IO should absolutely 
beat direct IO simply because buffered IO allows writes to complete 
before they actually hit disk.  I've observed this myself.  Plus the 
host typically has a much larger page cache than the guest, so the second 
level of caching helps an awful lot.


Regards,

Anthony Liguori








Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Paul Brook
On Monday 03 December 2007, Markus Hitter wrote:
> On 03.12.2007 at 11:30, Laurent Vivier wrote:
> > But if you think I should remove the buffered case, I can.
>
> When in doubt, less code is always better. For the unlikely case you broke
> something badly, there's always the option to revert the patch.
>
> > BTW, do you think I should enable "cache=off" by default ?
>
> This would be fine for a transition phase, but the cache=on case would
> likely be forgotten and never removed later. So, do it now.

I think host caching is still useful enough to be enabled by default, and 
provides a significant performance increase in several cases. 

- The guest typically has a relatively small quantity of RAM, compared to a 
modern machine.  Allowing the host OS to act as a demand-based L2 cache 
allows this to be used without having to dedicate excessive quantities of ram 
to qemu.
- I've seen reports that it significantly speeds up the windows installer.
- Host cache is persistent between multiple qemu runs. If you're doing anything 
that requires frequent guest reboots (e.g. kernel debugging) this is going to 
be a huge win.
- You're running a host OS that has limited or no caching (e.g. DOS).

I'd hope that the host OS would have cache use heuristics that would help 
limit cache pollution.

Paul




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Markus Hitter


On 03.12.2007 at 11:30, Laurent Vivier wrote:

> But if you think I should remove the buffered case, I can.

When in doubt, less code is always better. For the unlikely case you broke
something badly, there's always the option to revert the patch.

> BTW, do you think I should enable "cache=off" by default ?

This would be fine for a transition phase, but the cache=on case would
likely be forgotten and never removed later. So, do it now.

my $0.02,
Markus

- - - - - - - - - - - - - - - - - - -
Dipl. Ing. Markus Hitter
http://www.jump-ing.de/








Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Johannes Schindelin
Hi,

On Mon, 3 Dec 2007, Fabrice Bellard wrote:

> Laurent Vivier wrote:
> > This patch enhances the "-drive ,cache=off" mode with IDE drive emulation
> > by removing the buffer used in the IDE emulation.
> > ---
> >  block.c     |   10 +++
> >  block.h     |    2 
> >  block_int.h |    1 
> >  cpu-all.h   |    1 
> >  exec.c      |   19 ++
> >  hw/ide.c    |  176 +---
> >  vl.c        |    1 
> >  7 files changed, 204 insertions(+), 6 deletions(-)
> 
> What's the use of keeping the buffered case ?

AFAICT if your guest is DOS without a disk caching driver, you do not 
really want to use O_DIRECT.

Ciao,
Dscho





Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Laurent Vivier
On Monday 03 December 2007 at 11:23 +0100, Fabrice Bellard wrote:
> Laurent Vivier wrote:
> > This patch enhances the "-drive ,cache=off" mode with IDE drive emulation
> > by removing the buffer used in the IDE emulation.
> > ---
> >  block.c     |   10 +++
> >  block.h     |    2 
> >  block_int.h |    1 
> >  cpu-all.h   |    1 
> >  exec.c      |   19 ++
> >  hw/ide.c    |  176 +---
> >  vl.c        |    1 
> >  7 files changed, 204 insertions(+), 6 deletions(-)
> 
> What's the use of keeping the buffered case ?

Well, I don't like to remove code written by others...
and I don't want to break something.

But if you think I should remove the buffered case, I can.

BTW, do you think I should enable "cache=off" by default ?
Or even remove the option from the command line and always use
O_DIRECT ?

Regards,
Laurent
-- 
- [EMAIL PROTECTED]  --
   "Any sufficiently advanced technology is
  indistinguishable from magic." - Arthur C. Clarke




Re: [Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Fabrice Bellard

Laurent Vivier wrote:
> This patch enhances the "-drive ,cache=off" mode with IDE drive emulation
> by removing the buffer used in the IDE emulation.
> ---
>  block.c     |   10 +++
>  block.h     |    2 
>  block_int.h |    1 
>  cpu-all.h   |    1 
>  exec.c      |   19 ++
>  hw/ide.c    |  176 +---
>  vl.c        |    1 
>  7 files changed, 204 insertions(+), 6 deletions(-)

What's the use of keeping the buffered case ?

Fabrice.




[Qemu-devel] [PATCH 2/2 v2] Direct IDE I/O

2007-12-03 Thread Laurent Vivier

This patch enhances the "-drive ,cache=off" mode with IDE drive emulation
by removing the buffer used in the IDE emulation.
---
 block.c     |   10 +++
 block.h     |    2 
 block_int.h |    1 
 cpu-all.h   |    1 
 exec.c      |   19 ++
 hw/ide.c    |  176 +---
 vl.c        |    1 
 7 files changed, 204 insertions(+), 6 deletions(-)

Index: qemu/block.c
===
--- qemu.orig/block.c   2007-12-03 09:54:47.0 +0100
+++ qemu/block.c2007-12-03 09:54:53.0 +0100
@@ -758,6 +758,11 @@ void bdrv_set_translation_hint(BlockDriv
 bs->translation = translation;
 }
 
+void bdrv_set_cache_hint(BlockDriverState *bs, int cache)
+{
+bs->cache = cache;
+}
+
 void bdrv_get_geometry_hint(BlockDriverState *bs,
 int *pcyls, int *pheads, int *psecs)
 {
@@ -786,6 +791,11 @@ int bdrv_is_read_only(BlockDriverState *
 return bs->read_only;
 }
 
+int bdrv_is_cached(BlockDriverState *bs)
+{
+return bs->cache;
+}
+
 /* XXX: no longer used */
 void bdrv_set_change_cb(BlockDriverState *bs,
 void (*change_cb)(void *opaque), void *opaque)
Index: qemu/block.h
===
--- qemu.orig/block.h   2007-12-03 09:54:47.0 +0100
+++ qemu/block.h2007-12-03 09:54:53.0 +0100
@@ -113,6 +113,7 @@ void bdrv_set_geometry_hint(BlockDriverS
 int cyls, int heads, int secs);
 void bdrv_set_type_hint(BlockDriverState *bs, int type);
 void bdrv_set_translation_hint(BlockDriverState *bs, int translation);
+void bdrv_set_cache_hint(BlockDriverState *bs, int cache);
 void bdrv_get_geometry_hint(BlockDriverState *bs,
 int *pcyls, int *pheads, int *psecs);
 int bdrv_get_type_hint(BlockDriverState *bs);
@@ -120,6 +121,7 @@ int bdrv_get_translation_hint(BlockDrive
 int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_inserted(BlockDriverState *bs);
+int bdrv_is_cached(BlockDriverState *bs);
 int bdrv_media_changed(BlockDriverState *bs);
 int bdrv_is_locked(BlockDriverState *bs);
 void bdrv_set_locked(BlockDriverState *bs, int locked);
Index: qemu/block_int.h
===
--- qemu.orig/block_int.h   2007-12-03 09:53:30.0 +0100
+++ qemu/block_int.h2007-12-03 09:54:53.0 +0100
@@ -124,6 +124,7 @@ struct BlockDriverState {
drivers. They are not used by the block driver */
 int cyls, heads, secs, translation;
 int type;
+int cache;
 char device_name[32];
 BlockDriverState *next;
 };
Index: qemu/vl.c
===
--- qemu.orig/vl.c  2007-12-03 09:54:47.0 +0100
+++ qemu/vl.c   2007-12-03 09:54:53.0 +0100
@@ -5112,6 +5112,7 @@ static int drive_init(const char *str, i
 bdrv_flags |= BDRV_O_SNAPSHOT;
 if (!cache)
 bdrv_flags |= BDRV_O_DIRECT;
+bdrv_set_cache_hint(bdrv, cache);
 if (bdrv_open(bdrv, file, bdrv_flags) < 0 || qemu_key_check(bdrv, file)) {
 fprintf(stderr, "qemu: could not open disk image %s\n",
 file);
Index: qemu/hw/ide.c
===
--- qemu.orig/hw/ide.c  2007-12-03 09:54:47.0 +0100
+++ qemu/hw/ide.c   2007-12-03 09:54:53.0 +0100
@@ -816,7 +816,7 @@ static int dma_buf_rw(BMDMAState *bm, in
 }
 
 /* XXX: handle errors */
-static void ide_read_dma_cb(void *opaque, int ret)
+static void ide_read_dma_cb_buffered(void *opaque, int ret)
 {
 BMDMAState *bm = opaque;
 IDEState *s = bm->ide_if;
@@ -856,7 +856,86 @@ static void ide_read_dma_cb(void *opaque
 printf("aio_read: sector_num=%lld n=%d\n", sector_num, n);
 #endif
 bm->aiocb = bdrv_aio_read(s->bs, sector_num, s->io_buffer, n,
-  ide_read_dma_cb, bm);
+  ide_read_dma_cb_buffered, bm);
+}
+
+static void ide_read_dma_cb_unbuffered(void *opaque, int ret)
+{
+BMDMAState *bm = opaque;
+IDEState *s = bm->ide_if;
+int64_t sector_num;
+int nsector;
+int len;
+uint8_t *phy_addr;
+
+if (s->nsector == 0) {
+s->status = READY_STAT | SEEK_STAT;
+   ide_set_irq(s);
+eot:
+   bm->status &= ~BM_STATUS_DMAING;
+   bm->status |= BM_STATUS_INT;
+   bm->dma_cb = NULL;
+   bm->ide_if = NULL;
+   bm->aiocb = NULL;
+   return;
+}
+
+/* launch next transfer */
+
+if (bm->cur_prd_len == 0) {
+struct {
+uint32_t addr;
+uint32_t size;
+} prd;
+
+cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+
+bm->cur_addr += 8;
+prd.addr = le32_to_cpu(prd.addr);
+prd.size = le32_to_cpu(prd.size);
+len = prd.size