Am 28.02.2014 um 15:35 hat Peter Lieven geschrieben: > On 27.02.2014 09:57, Stefan Hajnoczi wrote: > >On Wed, Feb 26, 2014 at 05:01:52PM +0100, Peter Lieven wrote: > >>On 26.02.2014 16:41, Stefan Hajnoczi wrote: > >>>On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote: > >>>>I was wondering if it would be a good idea to set the O_DIRECT mode for > >>>>the source > >>>>files of a qemu-img convert process if the source is a host_device? > >>>> > >>>>Currently the backup of a host device is polluting the page cache. > >>>Points to consider: > >>> > >>>1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening > >>> the file. A fallback is necessary. > >>> > >>>2. O_DIRECT has no readahead so performance could actually decrease. > >>> The question is, how important is reahead versus polluting page > >>> cache? > >>> > >>>3. For raw files it would make sense to tell the kernel that access is > >>> sequential and data will be used only once. Then we can get the best > >>> of both worlds (avoid polluting page cache but still get readahead). > >>> This is done using posix_fadvise(2). > >>> > >>> The problem is what to do for image formats. An image file can be > >>> very fragmented so the readahead might not be a win. Does this mean > >>> that for image formats we should tell the kernel access will be > >>> random? > >>> > >>> Furthermore, maybe it's best to do readahead inside QEMU so that even > >>> network protocols (nbd, iscsi, etc) can get good performance. They > >>> act like O_DIRECT is always on. > >>your comments are regarding qemu-img convert, right? > >>How would you implement this? A new open flag because > >>the fadvise had to goto inside the protocol driver. > >> > >>I would start with host_devices first and see how it performs there. > >> > >>For qemu-img convert I would issue a FADV_DONTNEED after > >>a write for the bytes that have been written > >>(i have tested this with Linux and it seems to work quite well). > >> > >>Question is, what is the right paramter for reads? Also FADV_DONTNEED? > >I think so but this should be justified with benchmark results. > > I ran some benchmarks at found that a FADV_DONTNEED issues after > a read does not hurt regarding to performance. But it avoids buffers > increasing while I read from a host_device of raw file.
Okay, sounds reasonable. > As for writing it does only work if I issue a fdatasync after each write, but > this should be equivalent to O_DIRECT. So I would keep the patch > to support qemu-img convert sources if they are host_device or file. Doing an fdatasync() is not an option (and not equivalent to O_DIRECT at all). > Here is a proposal for a patch: > > diff --git a/block.c b/block.c > index 2fd5482..2445433 100644 > --- a/block.c > +++ b/block.c > @@ -2626,6 +2626,14 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t > offset, > qemu_aio_wait(); > } > } > + > +#ifdef POSIX_FADV_DONTNEED > + if (!rwco.ret && bs->open_flags & BDRV_O_SEQUENTIAL && > + bs->drv->bdrv_fadvise && !is_write) { > + bs->drv->bdrv_fadvise(bs, offset, qiov->size, POSIX_FADV_DONTNEED); > + } > +#endif > + This #ifdef should be in the raw-posix driver. Please try to keep the qemu interface backend agnostic and leave POSIX_FADV_DONTNEED and friends as an implementation detail of block drivers. > diff --git a/include/block/block.h b/include/block/block.h > index 780f48b..a4dcc3c 100644 > --- a/include/block/block.h > +++ b/include/block/block.h > @@ -105,6 +105,9 @@ typedef enum { > #define BDRV_O_PROTOCOL 0x8000 /* if no block driver is explicitly given: > select an appropriate protocol driver, > ignoring the format layer */ > +#define BDRV_O_SEQUENTIAL 0x10000 /* open device for sequential read/write > */ > + > + > > #define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | > BDRV_O_NO_FLUSH) Why two additional newlines? BDRV_O_SEQUENTIAL works for me as the external interface. Kevin