Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-13 Thread Stefan Hajnoczi
Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com



Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-12 Thread Stefan Hajnoczi
On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V
aneesh.ku...@linux.vnet.ibm.com wrote:
 diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c
 index 5c8b5ed..441a37f 100644
 --- a/hw/9pfs/virtio-9p-handle.c
 +++ b/hw/9pfs/virtio-9p-handle.c
 @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, 
 const struct iovec *iov,
         return writev(fd, iov, iovcnt);

The sync_file_range(2) call below is dead-code since we'll return
immediately after writev(2) completes.  The writev(2) return value
needs to be saved temporarily and then returned after
sync_file_range(2).

     }
  #endif
 +    if (ctx-cache_flags  V9FS_WRITETHROUGH_CACHE) {

-drive cache=writethrough means something different from 9pfs
writethrough.  This is confusing so I wonder if there is a better
name like immediate write-out.

 +        /*
 +         * Initiate a writeback. This is not a data integrity sync.
 +         * We want to ensure that we don't leave dirty pages in the cache
 +         * after write when cache=writethrough is sepcified.
 +         */
 +        sync_file_range(fd, offset, 0,
 +                        SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE);
 +    }

I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary.  As a
best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although
a client that rapidly rewrites may be able to leave dirty pages in the
host page cache.  SYNC_FILE_RANGE_WAIT_BEFORE ensures that dirty pages
get written out but it is no longer asynchronous because it blocks.

Stefan



Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-12 Thread Aneesh Kumar K.V
On Wed, 12 Oct 2011 10:55:21 +0100, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V
 aneesh.ku...@linux.vnet.ibm.com wrote:
  diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c
  index 5c8b5ed..441a37f 100644
  --- a/hw/9pfs/virtio-9p-handle.c
  +++ b/hw/9pfs/virtio-9p-handle.c
  @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, 
  const struct iovec *iov,
          return writev(fd, iov, iovcnt);
 
 The sync_file_range(2) call below is dead-code since we'll return
 immediately after writev(2) completes.  The writev(2) return value
 needs to be saved temporarily and then returned after
 sync_file_range(2).

Missed that. Will fix in the next update

 
      }
   #endif
  +    if (ctx-cache_flags  V9FS_WRITETHROUGH_CACHE) {
 
 -drive cache=writethrough means something different from 9pfs
 writethrough.  This is confusing so I wonder if there is a better
 name like immediate write-out.
 

cache=immediate-write-out ? 

  +        /*
  +         * Initiate a writeback. This is not a data integrity sync.
  +         * We want to ensure that we don't leave dirty pages in the cache
  +         * after write when cache=writethrough is sepcified.
  +         */
  +        sync_file_range(fd, offset, 0,
  +                        SYNC_FILE_RANGE_WAIT_BEFORE | 
  SYNC_FILE_RANGE_WRITE);
  +    }
 
 I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary.  As a
 best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although
 a client that rapidly rewrites may be able to leave dirty pages in the
 host page cache

Shouldn't we prevent this ? That is the reason for me to use that
WAIT_BEFORE ?

.  SYNC_FILE_RANGE_WAIT_BEFORE ensures that dirty pages
 get written out but it is no longer asynchronous because it blocks.

-aneesh




Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-12 Thread Stefan Hajnoczi
On Wed, Oct 12, 2011 at 2:19 PM, Aneesh Kumar K.V
aneesh.ku...@linux.vnet.ibm.com wrote:
 On Wed, 12 Oct 2011 10:55:21 +0100, Stefan Hajnoczi stefa...@gmail.com 
 wrote:
 On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V
 aneesh.ku...@linux.vnet.ibm.com wrote:
  diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c
  index 5c8b5ed..441a37f 100644
  --- a/hw/9pfs/virtio-9p-handle.c
  +++ b/hw/9pfs/virtio-9p-handle.c
  @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, 
  const struct iovec *iov,
          return writev(fd, iov, iovcnt);

 The sync_file_range(2) call below is dead-code since we'll return
 immediately after writev(2) completes.  The writev(2) return value
 needs to be saved temporarily and then returned after
 sync_file_range(2).

 Missed that. Will fix in the next update


      }
   #endif
  +    if (ctx-cache_flags  V9FS_WRITETHROUGH_CACHE) {

 -drive cache=writethrough means something different from 9pfs
 writethrough.  This is confusing so I wonder if there is a better
 name like immediate write-out.


 cache=immediate-write-out ?

  +        /*
  +         * Initiate a writeback. This is not a data integrity sync.
  +         * We want to ensure that we don't leave dirty pages in the cache
  +         * after write when cache=writethrough is sepcified.
  +         */
  +        sync_file_range(fd, offset, 0,
  +                        SYNC_FILE_RANGE_WAIT_BEFORE | 
  SYNC_FILE_RANGE_WRITE);
  +    }

 I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary.  As a
 best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although
 a client that rapidly rewrites may be able to leave dirty pages in the
 host page cache

 Shouldn't we prevent this ? That is the reason for me to use that
 WAIT_BEFORE ?

The flag will cause sync_file_range(2) to wait on in-flight I/O.  The
guest will notice slow I/O.  You should at least specify a range
instead of nbytes=0 in the arguments.

Stefan



Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-12 Thread Aneesh Kumar K.V
On Wed, 12 Oct 2011 15:32:36 +0100, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Wed, Oct 12, 2011 at 2:19 PM, Aneesh Kumar K.V
 aneesh.ku...@linux.vnet.ibm.com wrote:
  On Wed, 12 Oct 2011 10:55:21 +0100, Stefan Hajnoczi stefa...@gmail.com 
  wrote:
  On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V
  aneesh.ku...@linux.vnet.ibm.com wrote:
   diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c
   index 5c8b5ed..441a37f 100644
   --- a/hw/9pfs/virtio-9p-handle.c
   +++ b/hw/9pfs/virtio-9p-handle.c
   @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int 
   fd, const struct iovec *iov,
           return writev(fd, iov, iovcnt);
 
  The sync_file_range(2) call below is dead-code since we'll return
  immediately after writev(2) completes.  The writev(2) return value
  needs to be saved temporarily and then returned after
  sync_file_range(2).
 
  Missed that. Will fix in the next update
 
 
       }
    #endif
   +    if (ctx-cache_flags  V9FS_WRITETHROUGH_CACHE) {
 
  -drive cache=writethrough means something different from 9pfs
  writethrough.  This is confusing so I wonder if there is a better
  name like immediate write-out.
 
 
  cache=immediate-write-out ?
 
   +        /*
   +         * Initiate a writeback. This is not a data integrity sync.
   +         * We want to ensure that we don't leave dirty pages in the 
   cache
   +         * after write when cache=writethrough is sepcified.
   +         */
   +        sync_file_range(fd, offset, 0,
   +                        SYNC_FILE_RANGE_WAIT_BEFORE | 
   SYNC_FILE_RANGE_WRITE);
   +    }
 
  I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary.  As a
  best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although
  a client that rapidly rewrites may be able to leave dirty pages in the
  host page cache
 
  Shouldn't we prevent this ? That is the reason for me to use that
  WAIT_BEFORE ?
 
 The flag will cause sync_file_range(2) to wait on in-flight I/O.  The
 guest will notice slow I/O.  You should at least specify a range
 instead of nbytes=0 in the arguments.

version 3 i have 

commit db3cb2ac561d837176f9e0d02075a898c57ad309
Author: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Date:   Wed Oct 12 19:11:23 2011 +0530

hw/9pfs: Add new virtfs option writeout=immediate skip host page cache

writeout=immediate implies the after pwritev we do a sync_file_range.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index 8de8abf..af3ecbe 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -53,12 +53,16 @@ struct xattr_operations;
 /* FsContext flag values */
 #define PATHNAME_FSCONTEXT 0x1
 
+/* export flags */
+#define V9FS_IMMEDIATE_WRITEOUT 0x1
+
 typedef struct FsContext
 {
 int flags;
 char *fs_root;
 SecModel fs_sm;
 uid_t uid;
+int export_flags;
 struct xattr_operations **xops;
 /* fs driver specific data */
 void *private;
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index 768819f..946bad7 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts)
 const char *fstype = qemu_opt_get(opts, fstype);
 const char *path = qemu_opt_get(opts, path);
 const char *sec_model = qemu_opt_get(opts, security_model);
+const char *writeout = qemu_opt_get(opts, writeout);
+
 
 if (!fsdev_id) {
 fprintf(stderr, fsdev: No id specified\n);
@@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts)
 fsle-fse.path = g_strdup(path);
 fsle-fse.security_model = g_strdup(sec_model);
 fsle-fse.ops = FsTypes[i].ops;
-
+fsle-fse.export_flags = 0;
+if (writeout) {
+if (!strcmp(writeout, immediate)) {
+fsle-fse.export_flags = V9FS_IMMEDIATE_WRITEOUT;
+}
+}
 QTAILQ_INSERT_TAIL(fstype_entries, fsle, next);
 return 0;
-
 }
 
 FsTypeEntry *get_fsdev_fsentry(char *id)
diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h
index e04931a..3b2feb4 100644
--- a/fsdev/qemu-fsdev.h
+++ b/fsdev/qemu-fsdev.h
@@ -41,6 +41,7 @@ typedef struct FsTypeEntry {
 char *fsdev_id;
 char *path;
 char *security_model;
+int export_flags;
 FileOperations *ops;
 } FsTypeEntry;
 
diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index e5b68da..403eed0 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -115,6 +115,7 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf 
*conf)
 exit(1);
 }
 
+s-ctx.export_flags = fse-export_flags;
 s-ctx.fs_root = g_strdup(fse-path);
 len = strlen(conf-tag);
 if (len  MAX_TAG_LEN) {
diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c
index 5c8b5ed..91d757a 100644
--- a/hw/9pfs/virtio-9p-handle.c
+++ b/hw/9pfs/virtio-9p-handle.c
@@ -192,16 +192,27 @@ static ssize_t handle_preadv(FsContext *ctx, int fd, 
const struct iovec *iov,
 static ssize_t 

Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-10 Thread Aneesh Kumar K.V
On Mon, 10 Oct 2011 05:54:56 +0100, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Sun, Oct 9, 2011 at 7:35 PM, Aneesh Kumar K.V
 aneesh.ku...@linux.vnet.ibm.com wrote:
  On Sun, 9 Oct 2011 17:16:50 +0100, Stefan Hajnoczi stefa...@gmail.com 
  wrote:
  On Sun, Oct 9, 2011 at 4:34 PM, Aneesh Kumar K.V
  aneesh.ku...@linux.vnet.ibm.com wrote:
   On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi stefa...@gmail.com 
   wrote:
   On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V
   aneesh.ku...@linux.vnet.ibm.com wrote:
cache=writethrough implies the file are opened in the host with 
O_SYNC open flag
   
Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 fsdev/file-op-9p.h         |    1 +
 fsdev/qemu-fsdev.c         |   10 --
 fsdev/qemu-fsdev.h         |    2 ++
 hw/9pfs/virtio-9p-device.c |    5 +
 hw/9pfs/virtio-9p.c        |   24 ++--
 qemu-config.c              |    6 ++
 qemu-options.hx            |   17 -
 vl.c                       |    6 ++
 8 files changed, 58 insertions(+), 13 deletions(-)
  
   When would this be used?  For serving up vanilla 9P?
  
   I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does 
   not.
  
  
   TFSYNC is added by 9p.L. So we would need this for 9p.u.
 
  I think 9p.u is covered by this wstat hack in
  http://ericvh.github.com/9p-rfc/rfc9p2000.html#anchor32:
 
  if all the elements of the directory entry in a Twstat message are
  ``don't touch'' val- ues, the server may interpret it as a request to
  guarantee that the contents of the associated file are committed to
  stable storage before the Rwstat message is returned.
 
  A real TFSYNC operation is nicer though and could be mandatory (the
  9P2000 RFC only says the server *may*).
 
   Another use
   case is to ensure that we don't leave pages on host as dirty. That would
   ensure that large writeback from a guest don't result in large number of
   dirty pages on the host, thereby resulting in writeback in the host. It
   would be needed for predictable I/O behavior in a setup where we have
   multiple guest.
 
  I see.  I'm mostly curious about this change because the caching modes
  are a nightmare with block devices - a lot of time is spent discussing
  and benchmarking them, and they cause confusion when configuring KVM.
 
  It sounds like O_SYNC is being used in order to keep page cache clean.
   But keeping the page cache clean is a side-effect of O_SYNC's
  behavior: writing out each page and synchronizing the disk write
  cache.  If you are attempting to bypass the page cache, just use
  O_DIRECT without O_SYNC.
 
  But how about reads. I want to make sure i get to use the page cache for
  reads and also want to keep the page cache clean.
 
   O_SYNC is doing the additional disk write
  cache synchronization which will slow down I/O and prevent the server
  from using disk write cache.  O_SYNC is not the right flag to use.
 
  O_DIRECT have additional requirement on buffer alignment, and we don't
  want to send every read to disk.  VirtFS also support zero copy
  read/write, so that buffer alignment will always not be possible.
  We want to make sure writes don't leave the page cache dirty so
  that host doesn't spent much time in write back of data dirtied by the 
  guest.
 
 sync_file_range(2) kicks off the write-out but does not flush file
 system metadata or synchronize the disk write cache.
 

something like below ?

commit bc7c28877b30155264f351d26692b3cceca853bb
Author: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Date:   Mon May 23 11:29:15 2011 +0530

hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

cache=writethrough implies the file are opened in the host with O_SYNC open 
flag

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index 8de8abf..84ec9e0 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -53,12 +53,16 @@ struct xattr_operations;
 /* FsContext flag values */
 #define PATHNAME_FSCONTEXT 0x1
 
+/* cache flags */
+#define V9FS_WRITETHROUGH_CACHE 0x1
+
 typedef struct FsContext
 {
 int flags;
 char *fs_root;
 SecModel fs_sm;
 uid_t uid;
+int cache_flags;
 struct xattr_operations **xops;
 /* fs driver specific data */
 void *private;
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index 768819f..fce016b 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts)
 const char *fstype = qemu_opt_get(opts, fstype);
 const char *path = qemu_opt_get(opts, path);
 const char *sec_model = qemu_opt_get(opts, security_model);
+const char *cache = qemu_opt_get(opts, cache);
+
 
 if (!fsdev_id) {
 fprintf(stderr, fsdev: No id specified\n);
@@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts)
 fsle-fse.path = 

Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-09 Thread Aneesh Kumar K.V
On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V
 aneesh.ku...@linux.vnet.ibm.com wrote:
  cache=writethrough implies the file are opened in the host with O_SYNC open 
  flag
 
  Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  ---
   fsdev/file-op-9p.h         |    1 +
   fsdev/qemu-fsdev.c         |   10 --
   fsdev/qemu-fsdev.h         |    2 ++
   hw/9pfs/virtio-9p-device.c |    5 +
   hw/9pfs/virtio-9p.c        |   24 ++--
   qemu-config.c              |    6 ++
   qemu-options.hx            |   17 -
   vl.c                       |    6 ++
   8 files changed, 58 insertions(+), 13 deletions(-)
 
 When would this be used?  For serving up vanilla 9P?
 
 I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does not.
 

TFSYNC is added by 9p.L. So we would need this for 9p.u. Another use
case is to ensure that we don't leave pages on host as dirty. That would
ensure that large writeback from a guest don't result in large number of
dirty pages on the host, thereby resulting in writeback in the host. It
would be needed for predictable I/O behavior in a setup where we have
multiple guest.

-aneesh



Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-09 Thread Stefan Hajnoczi
On Sun, Oct 9, 2011 at 4:34 PM, Aneesh Kumar K.V
aneesh.ku...@linux.vnet.ibm.com wrote:
 On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V
 aneesh.ku...@linux.vnet.ibm.com wrote:
  cache=writethrough implies the file are opened in the host with O_SYNC 
  open flag
 
  Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  ---
   fsdev/file-op-9p.h         |    1 +
   fsdev/qemu-fsdev.c         |   10 --
   fsdev/qemu-fsdev.h         |    2 ++
   hw/9pfs/virtio-9p-device.c |    5 +
   hw/9pfs/virtio-9p.c        |   24 ++--
   qemu-config.c              |    6 ++
   qemu-options.hx            |   17 -
   vl.c                       |    6 ++
   8 files changed, 58 insertions(+), 13 deletions(-)

 When would this be used?  For serving up vanilla 9P?

 I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does not.


 TFSYNC is added by 9p.L. So we would need this for 9p.u.

I think 9p.u is covered by this wstat hack in
http://ericvh.github.com/9p-rfc/rfc9p2000.html#anchor32:

if all the elements of the directory entry in a Twstat message are
``don't touch'' val- ues, the server may interpret it as a request to
guarantee that the contents of the associated file are committed to
stable storage before the Rwstat message is returned.

A real TFSYNC operation is nicer though and could be mandatory (the
9P2000 RFC only says the server *may*).

 Another use
 case is to ensure that we don't leave pages on host as dirty. That would
 ensure that large writeback from a guest don't result in large number of
 dirty pages on the host, thereby resulting in writeback in the host. It
 would be needed for predictable I/O behavior in a setup where we have
 multiple guest.

I see.  I'm mostly curious about this change because the caching modes
are a nightmare with block devices - a lot of time is spent discussing
and benchmarking them, and they cause confusion when configuring KVM.

It sounds like O_SYNC is being used in order to keep page cache clean.
 But keeping the page cache clean is a side-effect of O_SYNC's
behavior: writing out each page and synchronizing the disk write
cache.  If you are attempting to bypass the page cache, just use
O_DIRECT without O_SYNC.  O_SYNC is doing the additional disk write
cache synchronization which will slow down I/O and prevent the server
from using disk write cache.  O_SYNC is not the right flag to use.

Stefan



Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-09 Thread Aneesh Kumar K.V
On Sun, 9 Oct 2011 17:16:50 +0100, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Sun, Oct 9, 2011 at 4:34 PM, Aneesh Kumar K.V
 aneesh.ku...@linux.vnet.ibm.com wrote:
  On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi stefa...@gmail.com 
  wrote:
  On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V
  aneesh.ku...@linux.vnet.ibm.com wrote:
   cache=writethrough implies the file are opened in the host with O_SYNC 
   open flag
  
   Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
   ---
    fsdev/file-op-9p.h         |    1 +
    fsdev/qemu-fsdev.c         |   10 --
    fsdev/qemu-fsdev.h         |    2 ++
    hw/9pfs/virtio-9p-device.c |    5 +
    hw/9pfs/virtio-9p.c        |   24 ++--
    qemu-config.c              |    6 ++
    qemu-options.hx            |   17 -
    vl.c                       |    6 ++
    8 files changed, 58 insertions(+), 13 deletions(-)
 
  When would this be used?  For serving up vanilla 9P?
 
  I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does not.
 
 
  TFSYNC is added by 9p.L. So we would need this for 9p.u.
 
 I think 9p.u is covered by this wstat hack in
 http://ericvh.github.com/9p-rfc/rfc9p2000.html#anchor32:
 
 if all the elements of the directory entry in a Twstat message are
 ``don't touch'' val- ues, the server may interpret it as a request to
 guarantee that the contents of the associated file are committed to
 stable storage before the Rwstat message is returned.
 
 A real TFSYNC operation is nicer though and could be mandatory (the
 9P2000 RFC only says the server *may*).
 
  Another use
  case is to ensure that we don't leave pages on host as dirty. That would
  ensure that large writeback from a guest don't result in large number of
  dirty pages on the host, thereby resulting in writeback in the host. It
  would be needed for predictable I/O behavior in a setup where we have
  multiple guest.
 
 I see.  I'm mostly curious about this change because the caching modes
 are a nightmare with block devices - a lot of time is spent discussing
 and benchmarking them, and they cause confusion when configuring KVM.
 
 It sounds like O_SYNC is being used in order to keep page cache clean.
  But keeping the page cache clean is a side-effect of O_SYNC's
 behavior: writing out each page and synchronizing the disk write
 cache.  If you are attempting to bypass the page cache, just use
 O_DIRECT without O_SYNC.

But how about reads. I want to make sure i get to use the page cache for
reads and also want to keep the page cache clean.

  O_SYNC is doing the additional disk write
 cache synchronization which will slow down I/O and prevent the server
 from using disk write cache.  O_SYNC is not the right flag to use.

O_DIRECT have additional requirement on buffer alignment, and we don't
want to send every read to disk. VirtFS also support zero copy
read/write, so that buffer alignment will always not be possible.
We want to make sure writes don't leave the page cache dirty so
that host doesn't spent much time in write back of data dirtied by the guest.


-aneesh




Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-09 Thread Stefan Hajnoczi
On Sun, Oct 9, 2011 at 7:35 PM, Aneesh Kumar K.V
aneesh.ku...@linux.vnet.ibm.com wrote:
 On Sun, 9 Oct 2011 17:16:50 +0100, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Sun, Oct 9, 2011 at 4:34 PM, Aneesh Kumar K.V
 aneesh.ku...@linux.vnet.ibm.com wrote:
  On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi stefa...@gmail.com 
  wrote:
  On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V
  aneesh.ku...@linux.vnet.ibm.com wrote:
   cache=writethrough implies the file are opened in the host with O_SYNC 
   open flag
  
   Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
   ---
    fsdev/file-op-9p.h         |    1 +
    fsdev/qemu-fsdev.c         |   10 --
    fsdev/qemu-fsdev.h         |    2 ++
    hw/9pfs/virtio-9p-device.c |    5 +
    hw/9pfs/virtio-9p.c        |   24 ++--
    qemu-config.c              |    6 ++
    qemu-options.hx            |   17 -
    vl.c                       |    6 ++
    8 files changed, 58 insertions(+), 13 deletions(-)
 
  When would this be used?  For serving up vanilla 9P?
 
  I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does not.
 
 
  TFSYNC is added by 9p.L. So we would need this for 9p.u.

 I think 9p.u is covered by this wstat hack in
 http://ericvh.github.com/9p-rfc/rfc9p2000.html#anchor32:

 if all the elements of the directory entry in a Twstat message are
 ``don't touch'' val- ues, the server may interpret it as a request to
 guarantee that the contents of the associated file are committed to
 stable storage before the Rwstat message is returned.

 A real TFSYNC operation is nicer though and could be mandatory (the
 9P2000 RFC only says the server *may*).

  Another use
  case is to ensure that we don't leave pages on host as dirty. That would
  ensure that large writeback from a guest don't result in large number of
  dirty pages on the host, thereby resulting in writeback in the host. It
  would be needed for predictable I/O behavior in a setup where we have
  multiple guest.

 I see.  I'm mostly curious about this change because the caching modes
 are a nightmare with block devices - a lot of time is spent discussing
 and benchmarking them, and they cause confusion when configuring KVM.

 It sounds like O_SYNC is being used in order to keep page cache clean.
  But keeping the page cache clean is a side-effect of O_SYNC's
 behavior: writing out each page and synchronizing the disk write
 cache.  If you are attempting to bypass the page cache, just use
 O_DIRECT without O_SYNC.

 But how about reads. I want to make sure i get to use the page cache for
 reads and also want to keep the page cache clean.

  O_SYNC is doing the additional disk write
 cache synchronization which will slow down I/O and prevent the server
 from using disk write cache.  O_SYNC is not the right flag to use.

 O_DIRECT have additional requirement on buffer alignment, and we don't
 want to send every read to disk.  VirtFS also support zero copy
 read/write, so that buffer alignment will always not be possible.
 We want to make sure writes don't leave the page cache dirty so
 that host doesn't spent much time in write back of data dirtied by the guest.

sync_file_range(2) kicks off the write-out but does not flush file
system metadata or synchronize the disk write cache.

Stefan



Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-08 Thread Stefan Hajnoczi
On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V
aneesh.ku...@linux.vnet.ibm.com wrote:
 cache=writethrough implies the file are opened in the host with O_SYNC open 
 flag

 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
  fsdev/file-op-9p.h         |    1 +
  fsdev/qemu-fsdev.c         |   10 --
  fsdev/qemu-fsdev.h         |    2 ++
  hw/9pfs/virtio-9p-device.c |    5 +
  hw/9pfs/virtio-9p.c        |   24 ++--
  qemu-config.c              |    6 ++
  qemu-options.hx            |   17 -
  vl.c                       |    6 ++
  8 files changed, 58 insertions(+), 13 deletions(-)

When would this be used?  For serving up vanilla 9P?

I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does not.

Stefan



[Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-10-07 Thread Aneesh Kumar K.V
cache=writethrough implies the file are opened in the host with O_SYNC open flag

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 fsdev/file-op-9p.h |1 +
 fsdev/qemu-fsdev.c |   10 --
 fsdev/qemu-fsdev.h |2 ++
 hw/9pfs/virtio-9p-device.c |5 +
 hw/9pfs/virtio-9p.c|   24 ++--
 qemu-config.c  |6 ++
 qemu-options.hx|   17 -
 vl.c   |6 ++
 8 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index 8de8abf..5d088d4 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -59,6 +59,7 @@ typedef struct FsContext
 char *fs_root;
 SecModel fs_sm;
 uid_t uid;
+int open_flags;
 struct xattr_operations **xops;
 /* fs driver specific data */
 void *private;
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index 768819f..fce016b 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts)
 const char *fstype = qemu_opt_get(opts, fstype);
 const char *path = qemu_opt_get(opts, path);
 const char *sec_model = qemu_opt_get(opts, security_model);
+const char *cache = qemu_opt_get(opts, cache);
+
 
 if (!fsdev_id) {
 fprintf(stderr, fsdev: No id specified\n);
@@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts)
 fsle-fse.path = g_strdup(path);
 fsle-fse.security_model = g_strdup(sec_model);
 fsle-fse.ops = FsTypes[i].ops;
-
+fsle-fse.cache_model = 0;
+if (cache) {
+if (!strcmp(cache, writethrough)) {
+fsle-fse.cache_model = V9FS_WRITETHROUGH_CACHE;
+}
+}
 QTAILQ_INSERT_TAIL(fstype_entries, fsle, next);
 return 0;
-
 }
 
 FsTypeEntry *get_fsdev_fsentry(char *id)
diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h
index e04931a..4e53966 100644
--- a/fsdev/qemu-fsdev.h
+++ b/fsdev/qemu-fsdev.h
@@ -34,6 +34,7 @@ typedef struct FsTypeTable {
 FileOperations *ops;
 } FsTypeTable;
 
+#define V9FS_WRITETHROUGH_CACHE 0x1
 /*
  * Structure to store the various fsdev's passed through command line.
  */
@@ -41,6 +42,7 @@ typedef struct FsTypeEntry {
 char *fsdev_id;
 char *path;
 char *security_model;
+int cache_model;
 FileOperations *ops;
 } FsTypeEntry;
 
diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index e5b68da..a267f01 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -115,6 +115,11 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf 
*conf)
 exit(1);
 }
 
+if (fse-cache_model  V9FS_WRITETHROUGH_CACHE) {
+s-ctx.open_flags = O_SYNC;
+} else {
+s-ctx.open_flags = 0;
+}
 s-ctx.fs_root = g_strdup(fse-path);
 len = strlen(conf-tag);
 if (len  MAX_TAG_LEN) {
diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
index c01c31a..1ca3c8e 100644
--- a/hw/9pfs/virtio-9p.c
+++ b/hw/9pfs/virtio-9p.c
@@ -80,6 +80,22 @@ void cred_init(FsCred *credp)
 credp-fc_rdev = -1;
 }
 
+static int get_dotl_openflags(V9fsState *s, int oflags)
+{
+int flags;
+/*
+ * Filter the client open flags
+ */
+flags = s-ctx.open_flags;
+flags |= oflags;
+flags = ~(O_NOCTTY | O_ASYNC | O_CREAT);
+/*
+ * Ignore direct disk access hint until the server supports it.
+ */
+flags = ~O_DIRECT;
+return flags;
+}
+
 void v9fs_string_init(V9fsString *str)
 {
 str-data = NULL;
@@ -1598,10 +1614,7 @@ static void v9fs_open(void *opaque)
 err = offset;
 } else {
 if (s-proto_version == V9FS_PROTO_2000L) {
-flags = mode;
-flags = ~(O_NOCTTY | O_ASYNC | O_CREAT);
-/* Ignore direct disk access hint until the server supports it. */
-flags = ~O_DIRECT;
+flags = get_dotl_openflags(s, mode);
 } else {
 flags = omode_to_uflags(mode);
 }
@@ -1650,8 +1663,7 @@ static void v9fs_lcreate(void *opaque)
 goto out_nofid;
 }
 
-/* Ignore direct disk access hint until the server supports it. */
-flags = ~O_DIRECT;
+flags = get_dotl_openflags(pdu-s, flags);
 err = v9fs_co_open2(pdu, fidp, name, gid,
 flags | O_CREAT, mode, stbuf);
 if (err  0) {
diff --git a/qemu-config.c b/qemu-config.c
index 7a7854f..b2ab0b2 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -177,6 +177,9 @@ QemuOptsList qemu_fsdev_opts = {
 }, {
 .name = security_model,
 .type = QEMU_OPT_STRING,
+}, {
+.name = cache,
+.type = QEMU_OPT_STRING,
 },
 { /*End of list */ }
 },
@@ -199,6 +202,9 @@ QemuOptsList qemu_virtfs_opts = {
 }, {
 .name = security_model,
 .type = QEMU_OPT_STRING,
+}, {
+.name = cache,
+.type = QEMU_OPT_STRING,
 },
 
   

[Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache

2011-06-06 Thread Aneesh Kumar K.V
cache=writethrough implies the file are opened in the host with O_SYNC open flag

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 fsdev/file-op-9p.h |1 +
 fsdev/qemu-fsdev.c |   10 --
 fsdev/qemu-fsdev.h |2 ++
 hw/9pfs/virtio-9p-device.c |6 ++
 hw/9pfs/virtio-9p.c|   28 
 qemu-config.c  |6 ++
 qemu-options.hx|   17 -
 vl.c   |6 ++
 8 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index 1eda342..fbf9afb 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -54,6 +54,7 @@ typedef struct FsContext
 char *fs_root;
 SecModel fs_sm;
 uid_t uid;
+int open_flags;
 struct xattr_operations **xops;
 } FsContext;
 
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index 0b33290..fccf011 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -33,6 +33,8 @@ int qemu_fsdev_add(QemuOpts *opts)
 const char *fstype = qemu_opt_get(opts, fstype);
 const char *path = qemu_opt_get(opts, path);
 const char *sec_model = qemu_opt_get(opts, security_model);
+const char *cache = qemu_opt_get(opts, cache);
+
 
 if (!fsdev_id) {
 fprintf(stderr, fsdev: No id specified\n);
@@ -71,10 +73,14 @@ int qemu_fsdev_add(QemuOpts *opts)
 fsle-fse.path = qemu_strdup(path);
 fsle-fse.security_model = qemu_strdup(sec_model);
 fsle-fse.ops = FsTypes[i].ops;
-
+fsle-fse.cache_model = 0;
+if (cache) {
+if (!strcmp(cache, writethrough)) {
+fsle-fse.cache_model = V9FS_WRITETHROUGH_CACHE;
+}
+}
 QTAILQ_INSERT_TAIL(fstype_entries, fsle, next);
 return 0;
-
 }
 
 FsTypeEntry *get_fsdev_fsentry(char *id)
diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h
index f9f08d3..8cc010b 100644
--- a/fsdev/qemu-fsdev.h
+++ b/fsdev/qemu-fsdev.h
@@ -34,6 +34,7 @@ typedef struct FsTypeTable {
 FileOperations *ops;
 } FsTypeTable;
 
+#define V9FS_WRITETHROUGH_CACHE 0x1
 /*
  * Structure to store the various fsdev's passed through command line.
  */
@@ -41,6 +42,7 @@ typedef struct FsTypeEntry {
 char *fsdev_id;
 char *path;
 char *security_model;
+int cache_model;
 FileOperations *ops;
 } FsTypeEntry;
 
diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 4bec8bf..55edc45 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -114,6 +114,12 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf 
*conf)
 exit(1);
 }
 
+if (fse-cache_model  V9FS_WRITETHROUGH_CACHE) {
+s-ctx.open_flags = O_SYNC;
+} else {
+s-ctx.open_flags = 0;
+}
+
 s-ctx.fs_root = qemu_strdup(fse-path);
 len = strlen(conf-tag);
 if (len  MAX_TAG_LEN) {
diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
index 55aca93..9d1813e 100644
--- a/hw/9pfs/virtio-9p.c
+++ b/hw/9pfs/virtio-9p.c
@@ -80,6 +80,22 @@ void cred_init(FsCred *credp)
 credp-fc_rdev = -1;
 }
 
+static int get_dotl_openflags(V9fsState *s, int oflags)
+{
+int flags;
+/*
+ * Filter the client open flags
+ */
+flags = s-ctx.open_flags;
+flags |= oflags;
+flags = ~(O_NOCTTY | O_ASYNC | O_CREAT);
+/*
+ * Ignore direct disk access hint until the server supports it.
+ */
+flags = ~O_DIRECT;
+return flags;
+}
+
 static void v9fs_string_init(V9fsString *str)
 {
 str-data = NULL;
@@ -1540,10 +1556,7 @@ static void v9fs_open(void *opaque)
 err = offset;
 } else {
 if (s-proto_version == V9FS_PROTO_2000L) {
-flags = mode;
-flags = ~(O_NOCTTY | O_ASYNC | O_CREAT);
-/* Ignore direct disk access hint until the server supports it. */
-flags = ~O_DIRECT;
+flags = get_dotl_openflags(s, mode);
 } else {
 flags = omode_to_uflags(mode);
 }
@@ -1596,10 +1609,9 @@ static void v9fs_lcreate(void *opaque)
 }
 v9fs_string_sprintf(fullname, %s/%s, fidp-path.data, name.data);
 
-/* Ignore direct disk access hint until the server supports it. */
-flags = ~O_DIRECT;
-
-err = v9fs_co_open2(pdu-s, fidp, fullname.data, gid, flags, mode);
+flags = get_dotl_openflags(pdu-s, flags);
+err = v9fs_co_open2(pdu-s, fidp, fullname.data, gid,
+flags | O_CREAT, mode);
 if (err  0) {
 goto out;
 }
diff --git a/qemu-config.c b/qemu-config.c
index 5d7ffa2..1125fb3 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -170,6 +170,9 @@ QemuOptsList qemu_fsdev_opts = {
 }, {
 .name = security_model,
 .type = QEMU_OPT_STRING,
+}, {
+.name = cache,
+.type = QEMU_OPT_STRING,
 },
 { /*End of list */ }
 },
@@ -192,6 +195,9 @@ QemuOptsList qemu_virtfs_opts = {
 }, {
 .name = security_model,