Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..441a37f 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, const struct iovec *iov, return writev(fd, iov, iovcnt); The sync_file_range(2) call below is dead-code since we'll return immediately after writev(2) completes. The writev(2) return value needs to be saved temporarily and then returned after sync_file_range(2). } #endif + if (ctx-cache_flags V9FS_WRITETHROUGH_CACHE) { -drive cache=writethrough means something different from 9pfs writethrough. This is confusing so I wonder if there is a better name like immediate write-out. + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when cache=writethrough is sepcified. + */ + sync_file_range(fd, offset, 0, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE); + } I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary. As a best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although a client that rapidly rewrites may be able to leave dirty pages in the host page cache. SYNC_FILE_RANGE_WAIT_BEFORE ensures that dirty pages get written out but it is no longer asynchronous because it blocks. Stefan
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Wed, 12 Oct 2011 10:55:21 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..441a37f 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, const struct iovec *iov, return writev(fd, iov, iovcnt); The sync_file_range(2) call below is dead-code since we'll return immediately after writev(2) completes. The writev(2) return value needs to be saved temporarily and then returned after sync_file_range(2). Missed that. Will fix in the next update } #endif + if (ctx-cache_flags V9FS_WRITETHROUGH_CACHE) { -drive cache=writethrough means something different from 9pfs writethrough. This is confusing so I wonder if there is a better name like immediate write-out. cache=immediate-write-out ? + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when cache=writethrough is sepcified. + */ + sync_file_range(fd, offset, 0, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE); + } I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary. As a best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although a client that rapidly rewrites may be able to leave dirty pages in the host page cache Shouldn't we prevent this ? That is the reason for me to use that WAIT_BEFORE ? . SYNC_FILE_RANGE_WAIT_BEFORE ensures that dirty pages get written out but it is no longer asynchronous because it blocks. -aneesh
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Wed, Oct 12, 2011 at 2:19 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: On Wed, 12 Oct 2011 10:55:21 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..441a37f 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, const struct iovec *iov, return writev(fd, iov, iovcnt); The sync_file_range(2) call below is dead-code since we'll return immediately after writev(2) completes. The writev(2) return value needs to be saved temporarily and then returned after sync_file_range(2). Missed that. Will fix in the next update } #endif + if (ctx-cache_flags V9FS_WRITETHROUGH_CACHE) { -drive cache=writethrough means something different from 9pfs writethrough. This is confusing so I wonder if there is a better name like immediate write-out. cache=immediate-write-out ? + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when cache=writethrough is sepcified. + */ + sync_file_range(fd, offset, 0, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE); + } I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary. As a best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although a client that rapidly rewrites may be able to leave dirty pages in the host page cache Shouldn't we prevent this ? That is the reason for me to use that WAIT_BEFORE ? The flag will cause sync_file_range(2) to wait on in-flight I/O. The guest will notice slow I/O. You should at least specify a range instead of nbytes=0 in the arguments. Stefan
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Wed, 12 Oct 2011 15:32:36 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Wed, Oct 12, 2011 at 2:19 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: On Wed, 12 Oct 2011 10:55:21 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Oct 10, 2011 at 10:06 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..441a37f 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -202,6 +202,15 @@ static ssize_t handle_pwritev(FsContext *ctx, int fd, const struct iovec *iov, return writev(fd, iov, iovcnt); The sync_file_range(2) call below is dead-code since we'll return immediately after writev(2) completes. The writev(2) return value needs to be saved temporarily and then returned after sync_file_range(2). Missed that. Will fix in the next update } #endif + if (ctx-cache_flags V9FS_WRITETHROUGH_CACHE) { -drive cache=writethrough means something different from 9pfs writethrough. This is confusing so I wonder if there is a better name like immediate write-out. cache=immediate-write-out ? + /* + * Initiate a writeback. This is not a data integrity sync. + * We want to ensure that we don't leave dirty pages in the cache + * after write when cache=writethrough is sepcified. + */ + sync_file_range(fd, offset, 0, + SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE); + } I'm not sure whether SYNC_FILE_RANGE_WAIT_BEFORE is necessary. As a best-effort mechanism just SYNC_FILE_RANGE_WRITE does the job although a client that rapidly rewrites may be able to leave dirty pages in the host page cache Shouldn't we prevent this ? That is the reason for me to use that WAIT_BEFORE ? The flag will cause sync_file_range(2) to wait on in-flight I/O. The guest will notice slow I/O. You should at least specify a range instead of nbytes=0 in the arguments. version 3 i have commit db3cb2ac561d837176f9e0d02075a898c57ad309 Author: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Date: Wed Oct 12 19:11:23 2011 +0530 hw/9pfs: Add new virtfs option writeout=immediate skip host page cache writeout=immediate implies the after pwritev we do a sync_file_range. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 8de8abf..af3ecbe 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -53,12 +53,16 @@ struct xattr_operations; /* FsContext flag values */ #define PATHNAME_FSCONTEXT 0x1 +/* export flags */ +#define V9FS_IMMEDIATE_WRITEOUT 0x1 + typedef struct FsContext { int flags; char *fs_root; SecModel fs_sm; uid_t uid; +int export_flags; struct xattr_operations **xops; /* fs driver specific data */ void *private; diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 768819f..946bad7 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts) const char *fstype = qemu_opt_get(opts, fstype); const char *path = qemu_opt_get(opts, path); const char *sec_model = qemu_opt_get(opts, security_model); +const char *writeout = qemu_opt_get(opts, writeout); + if (!fsdev_id) { fprintf(stderr, fsdev: No id specified\n); @@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts) fsle-fse.path = g_strdup(path); fsle-fse.security_model = g_strdup(sec_model); fsle-fse.ops = FsTypes[i].ops; - +fsle-fse.export_flags = 0; +if (writeout) { +if (!strcmp(writeout, immediate)) { +fsle-fse.export_flags = V9FS_IMMEDIATE_WRITEOUT; +} +} QTAILQ_INSERT_TAIL(fstype_entries, fsle, next); return 0; - } FsTypeEntry *get_fsdev_fsentry(char *id) diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index e04931a..3b2feb4 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -41,6 +41,7 @@ typedef struct FsTypeEntry { char *fsdev_id; char *path; char *security_model; +int export_flags; FileOperations *ops; } FsTypeEntry; diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index e5b68da..403eed0 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -115,6 +115,7 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) exit(1); } +s-ctx.export_flags = fse-export_flags; s-ctx.fs_root = g_strdup(fse-path); len = strlen(conf-tag); if (len MAX_TAG_LEN) { diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index 5c8b5ed..91d757a 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -192,16 +192,27 @@ static ssize_t handle_preadv(FsContext *ctx, int fd, const struct iovec *iov, static ssize_t
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Mon, 10 Oct 2011 05:54:56 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Sun, Oct 9, 2011 at 7:35 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: On Sun, 9 Oct 2011 17:16:50 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Sun, Oct 9, 2011 at 4:34 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: cache=writethrough implies the file are opened in the host with O_SYNC open flag Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- fsdev/file-op-9p.h | 1 + fsdev/qemu-fsdev.c | 10 -- fsdev/qemu-fsdev.h | 2 ++ hw/9pfs/virtio-9p-device.c | 5 + hw/9pfs/virtio-9p.c | 24 ++-- qemu-config.c | 6 ++ qemu-options.hx | 17 - vl.c | 6 ++ 8 files changed, 58 insertions(+), 13 deletions(-) When would this be used? For serving up vanilla 9P? I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does not. TFSYNC is added by 9p.L. So we would need this for 9p.u. I think 9p.u is covered by this wstat hack in http://ericvh.github.com/9p-rfc/rfc9p2000.html#anchor32: if all the elements of the directory entry in a Twstat message are ``don't touch'' val- ues, the server may interpret it as a request to guarantee that the contents of the associated file are committed to stable storage before the Rwstat message is returned. A real TFSYNC operation is nicer though and could be mandatory (the 9P2000 RFC only says the server *may*). Another use case is to ensure that we don't leave pages on host as dirty. That would ensure that large writeback from a guest don't result in large number of dirty pages on the host, thereby resulting in writeback in the host. It would be needed for predictable I/O behavior in a setup where we have multiple guest. I see. I'm mostly curious about this change because the caching modes are a nightmare with block devices - a lot of time is spent discussing and benchmarking them, and they cause confusion when configuring KVM. It sounds like O_SYNC is being used in order to keep page cache clean. But keeping the page cache clean is a side-effect of O_SYNC's behavior: writing out each page and synchronizing the disk write cache. If you are attempting to bypass the page cache, just use O_DIRECT without O_SYNC. But how about reads. I want to make sure i get to use the page cache for reads and also want to keep the page cache clean. O_SYNC is doing the additional disk write cache synchronization which will slow down I/O and prevent the server from using disk write cache. O_SYNC is not the right flag to use. O_DIRECT have additional requirement on buffer alignment, and we don't want to send every read to disk. VirtFS also support zero copy read/write, so that buffer alignment will always not be possible. We want to make sure writes don't leave the page cache dirty so that host doesn't spent much time in write back of data dirtied by the guest. sync_file_range(2) kicks off the write-out but does not flush file system metadata or synchronize the disk write cache. something like below ? commit bc7c28877b30155264f351d26692b3cceca853bb Author: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Date: Mon May 23 11:29:15 2011 +0530 hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache cache=writethrough implies the file are opened in the host with O_SYNC open flag Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 8de8abf..84ec9e0 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -53,12 +53,16 @@ struct xattr_operations; /* FsContext flag values */ #define PATHNAME_FSCONTEXT 0x1 +/* cache flags */ +#define V9FS_WRITETHROUGH_CACHE 0x1 + typedef struct FsContext { int flags; char *fs_root; SecModel fs_sm; uid_t uid; +int cache_flags; struct xattr_operations **xops; /* fs driver specific data */ void *private; diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 768819f..fce016b 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts) const char *fstype = qemu_opt_get(opts, fstype); const char *path = qemu_opt_get(opts, path); const char *sec_model = qemu_opt_get(opts, security_model); +const char *cache = qemu_opt_get(opts, cache); + if (!fsdev_id) { fprintf(stderr, fsdev: No id specified\n); @@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts) fsle-fse.path =
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: cache=writethrough implies the file are opened in the host with O_SYNC open flag Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- fsdev/file-op-9p.h | 1 + fsdev/qemu-fsdev.c | 10 -- fsdev/qemu-fsdev.h | 2 ++ hw/9pfs/virtio-9p-device.c | 5 + hw/9pfs/virtio-9p.c | 24 ++-- qemu-config.c | 6 ++ qemu-options.hx | 17 - vl.c | 6 ++ 8 files changed, 58 insertions(+), 13 deletions(-) When would this be used? For serving up vanilla 9P? I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does not. TFSYNC is added by 9p.L. So we would need this for 9p.u. Another use case is to ensure that we don't leave pages on host as dirty. That would ensure that large writeback from a guest don't result in large number of dirty pages on the host, thereby resulting in writeback in the host. It would be needed for predictable I/O behavior in a setup where we have multiple guest. -aneesh
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Sun, Oct 9, 2011 at 4:34 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: cache=writethrough implies the file are opened in the host with O_SYNC open flag Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- fsdev/file-op-9p.h | 1 + fsdev/qemu-fsdev.c | 10 -- fsdev/qemu-fsdev.h | 2 ++ hw/9pfs/virtio-9p-device.c | 5 + hw/9pfs/virtio-9p.c | 24 ++-- qemu-config.c | 6 ++ qemu-options.hx | 17 - vl.c | 6 ++ 8 files changed, 58 insertions(+), 13 deletions(-) When would this be used? For serving up vanilla 9P? I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does not. TFSYNC is added by 9p.L. So we would need this for 9p.u. I think 9p.u is covered by this wstat hack in http://ericvh.github.com/9p-rfc/rfc9p2000.html#anchor32: if all the elements of the directory entry in a Twstat message are ``don't touch'' val- ues, the server may interpret it as a request to guarantee that the contents of the associated file are committed to stable storage before the Rwstat message is returned. A real TFSYNC operation is nicer though and could be mandatory (the 9P2000 RFC only says the server *may*). Another use case is to ensure that we don't leave pages on host as dirty. That would ensure that large writeback from a guest don't result in large number of dirty pages on the host, thereby resulting in writeback in the host. It would be needed for predictable I/O behavior in a setup where we have multiple guest. I see. I'm mostly curious about this change because the caching modes are a nightmare with block devices - a lot of time is spent discussing and benchmarking them, and they cause confusion when configuring KVM. It sounds like O_SYNC is being used in order to keep page cache clean. But keeping the page cache clean is a side-effect of O_SYNC's behavior: writing out each page and synchronizing the disk write cache. If you are attempting to bypass the page cache, just use O_DIRECT without O_SYNC. O_SYNC is doing the additional disk write cache synchronization which will slow down I/O and prevent the server from using disk write cache. O_SYNC is not the right flag to use. Stefan
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Sun, 9 Oct 2011 17:16:50 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Sun, Oct 9, 2011 at 4:34 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: cache=writethrough implies the file are opened in the host with O_SYNC open flag Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- fsdev/file-op-9p.h | 1 + fsdev/qemu-fsdev.c | 10 -- fsdev/qemu-fsdev.h | 2 ++ hw/9pfs/virtio-9p-device.c | 5 + hw/9pfs/virtio-9p.c | 24 ++-- qemu-config.c | 6 ++ qemu-options.hx | 17 - vl.c | 6 ++ 8 files changed, 58 insertions(+), 13 deletions(-) When would this be used? For serving up vanilla 9P? I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does not. TFSYNC is added by 9p.L. So we would need this for 9p.u. I think 9p.u is covered by this wstat hack in http://ericvh.github.com/9p-rfc/rfc9p2000.html#anchor32: if all the elements of the directory entry in a Twstat message are ``don't touch'' val- ues, the server may interpret it as a request to guarantee that the contents of the associated file are committed to stable storage before the Rwstat message is returned. A real TFSYNC operation is nicer though and could be mandatory (the 9P2000 RFC only says the server *may*). Another use case is to ensure that we don't leave pages on host as dirty. That would ensure that large writeback from a guest don't result in large number of dirty pages on the host, thereby resulting in writeback in the host. It would be needed for predictable I/O behavior in a setup where we have multiple guest. I see. I'm mostly curious about this change because the caching modes are a nightmare with block devices - a lot of time is spent discussing and benchmarking them, and they cause confusion when configuring KVM. It sounds like O_SYNC is being used in order to keep page cache clean. But keeping the page cache clean is a side-effect of O_SYNC's behavior: writing out each page and synchronizing the disk write cache. If you are attempting to bypass the page cache, just use O_DIRECT without O_SYNC. But how about reads. I want to make sure i get to use the page cache for reads and also want to keep the page cache clean. O_SYNC is doing the additional disk write cache synchronization which will slow down I/O and prevent the server from using disk write cache. O_SYNC is not the right flag to use. O_DIRECT have additional requirement on buffer alignment, and we don't want to send every read to disk. VirtFS also support zero copy read/write, so that buffer alignment will always not be possible. We want to make sure writes don't leave the page cache dirty so that host doesn't spent much time in write back of data dirtied by the guest. -aneesh
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Sun, Oct 9, 2011 at 7:35 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: On Sun, 9 Oct 2011 17:16:50 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Sun, Oct 9, 2011 at 4:34 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: On Sat, 8 Oct 2011 12:24:37 +0100, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: cache=writethrough implies the file are opened in the host with O_SYNC open flag Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- fsdev/file-op-9p.h | 1 + fsdev/qemu-fsdev.c | 10 -- fsdev/qemu-fsdev.h | 2 ++ hw/9pfs/virtio-9p-device.c | 5 + hw/9pfs/virtio-9p.c | 24 ++-- qemu-config.c | 6 ++ qemu-options.hx | 17 - vl.c | 6 ++ 8 files changed, 58 insertions(+), 13 deletions(-) When would this be used? For serving up vanilla 9P? I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does not. TFSYNC is added by 9p.L. So we would need this for 9p.u. I think 9p.u is covered by this wstat hack in http://ericvh.github.com/9p-rfc/rfc9p2000.html#anchor32: if all the elements of the directory entry in a Twstat message are ``don't touch'' val- ues, the server may interpret it as a request to guarantee that the contents of the associated file are committed to stable storage before the Rwstat message is returned. A real TFSYNC operation is nicer though and could be mandatory (the 9P2000 RFC only says the server *may*). Another use case is to ensure that we don't leave pages on host as dirty. That would ensure that large writeback from a guest don't result in large number of dirty pages on the host, thereby resulting in writeback in the host. It would be needed for predictable I/O behavior in a setup where we have multiple guest. I see. I'm mostly curious about this change because the caching modes are a nightmare with block devices - a lot of time is spent discussing and benchmarking them, and they cause confusion when configuring KVM. It sounds like O_SYNC is being used in order to keep page cache clean. But keeping the page cache clean is a side-effect of O_SYNC's behavior: writing out each page and synchronizing the disk write cache. If you are attempting to bypass the page cache, just use O_DIRECT without O_SYNC. But how about reads. I want to make sure i get to use the page cache for reads and also want to keep the page cache clean. O_SYNC is doing the additional disk write cache synchronization which will slow down I/O and prevent the server from using disk write cache. O_SYNC is not the right flag to use. O_DIRECT have additional requirement on buffer alignment, and we don't want to send every read to disk. VirtFS also support zero copy read/write, so that buffer alignment will always not be possible. We want to make sure writes don't leave the page cache dirty so that host doesn't spent much time in write back of data dirtied by the guest. sync_file_range(2) kicks off the write-out but does not flush file system metadata or synchronize the disk write cache. Stefan
Re: [Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
On Fri, Oct 7, 2011 at 7:46 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: cache=writethrough implies the file are opened in the host with O_SYNC open flag Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- fsdev/file-op-9p.h | 1 + fsdev/qemu-fsdev.c | 10 -- fsdev/qemu-fsdev.h | 2 ++ hw/9pfs/virtio-9p-device.c | 5 + hw/9pfs/virtio-9p.c | 24 ++-- qemu-config.c | 6 ++ qemu-options.hx | 17 - vl.c | 6 ++ 8 files changed, 58 insertions(+), 13 deletions(-) When would this be used? For serving up vanilla 9P? I think 9P.u and 9P.l have support for fsync(2) while vanilla 9P does not. Stefan
[Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
cache=writethrough implies the file are opened in the host with O_SYNC open flag Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- fsdev/file-op-9p.h |1 + fsdev/qemu-fsdev.c | 10 -- fsdev/qemu-fsdev.h |2 ++ hw/9pfs/virtio-9p-device.c |5 + hw/9pfs/virtio-9p.c| 24 ++-- qemu-config.c |6 ++ qemu-options.hx| 17 - vl.c |6 ++ 8 files changed, 58 insertions(+), 13 deletions(-) diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 8de8abf..5d088d4 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -59,6 +59,7 @@ typedef struct FsContext char *fs_root; SecModel fs_sm; uid_t uid; +int open_flags; struct xattr_operations **xops; /* fs driver specific data */ void *private; diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 768819f..fce016b 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -34,6 +34,8 @@ int qemu_fsdev_add(QemuOpts *opts) const char *fstype = qemu_opt_get(opts, fstype); const char *path = qemu_opt_get(opts, path); const char *sec_model = qemu_opt_get(opts, security_model); +const char *cache = qemu_opt_get(opts, cache); + if (!fsdev_id) { fprintf(stderr, fsdev: No id specified\n); @@ -72,10 +74,14 @@ int qemu_fsdev_add(QemuOpts *opts) fsle-fse.path = g_strdup(path); fsle-fse.security_model = g_strdup(sec_model); fsle-fse.ops = FsTypes[i].ops; - +fsle-fse.cache_model = 0; +if (cache) { +if (!strcmp(cache, writethrough)) { +fsle-fse.cache_model = V9FS_WRITETHROUGH_CACHE; +} +} QTAILQ_INSERT_TAIL(fstype_entries, fsle, next); return 0; - } FsTypeEntry *get_fsdev_fsentry(char *id) diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index e04931a..4e53966 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -34,6 +34,7 @@ typedef struct FsTypeTable { FileOperations *ops; } FsTypeTable; +#define V9FS_WRITETHROUGH_CACHE 0x1 /* * Structure to store the various fsdev's passed through command line. */ @@ -41,6 +42,7 @@ typedef struct FsTypeEntry { char *fsdev_id; char *path; char *security_model; +int cache_model; FileOperations *ops; } FsTypeEntry; diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index e5b68da..a267f01 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -115,6 +115,11 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) exit(1); } +if (fse-cache_model V9FS_WRITETHROUGH_CACHE) { +s-ctx.open_flags = O_SYNC; +} else { +s-ctx.open_flags = 0; +} s-ctx.fs_root = g_strdup(fse-path); len = strlen(conf-tag); if (len MAX_TAG_LEN) { diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c index c01c31a..1ca3c8e 100644 --- a/hw/9pfs/virtio-9p.c +++ b/hw/9pfs/virtio-9p.c @@ -80,6 +80,22 @@ void cred_init(FsCred *credp) credp-fc_rdev = -1; } +static int get_dotl_openflags(V9fsState *s, int oflags) +{ +int flags; +/* + * Filter the client open flags + */ +flags = s-ctx.open_flags; +flags |= oflags; +flags = ~(O_NOCTTY | O_ASYNC | O_CREAT); +/* + * Ignore direct disk access hint until the server supports it. + */ +flags = ~O_DIRECT; +return flags; +} + void v9fs_string_init(V9fsString *str) { str-data = NULL; @@ -1598,10 +1614,7 @@ static void v9fs_open(void *opaque) err = offset; } else { if (s-proto_version == V9FS_PROTO_2000L) { -flags = mode; -flags = ~(O_NOCTTY | O_ASYNC | O_CREAT); -/* Ignore direct disk access hint until the server supports it. */ -flags = ~O_DIRECT; +flags = get_dotl_openflags(s, mode); } else { flags = omode_to_uflags(mode); } @@ -1650,8 +1663,7 @@ static void v9fs_lcreate(void *opaque) goto out_nofid; } -/* Ignore direct disk access hint until the server supports it. */ -flags = ~O_DIRECT; +flags = get_dotl_openflags(pdu-s, flags); err = v9fs_co_open2(pdu, fidp, name, gid, flags | O_CREAT, mode, stbuf); if (err 0) { diff --git a/qemu-config.c b/qemu-config.c index 7a7854f..b2ab0b2 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -177,6 +177,9 @@ QemuOptsList qemu_fsdev_opts = { }, { .name = security_model, .type = QEMU_OPT_STRING, +}, { +.name = cache, +.type = QEMU_OPT_STRING, }, { /*End of list */ } }, @@ -199,6 +202,9 @@ QemuOptsList qemu_virtfs_opts = { }, { .name = security_model, .type = QEMU_OPT_STRING, +}, { +.name = cache, +.type = QEMU_OPT_STRING, },
[Qemu-devel] [PATCH 1/2] hw/9pfs: Add new virtfs option cache=writethrough to skip host page cache
cache=writethrough implies the file are opened in the host with O_SYNC open flag Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- fsdev/file-op-9p.h |1 + fsdev/qemu-fsdev.c | 10 -- fsdev/qemu-fsdev.h |2 ++ hw/9pfs/virtio-9p-device.c |6 ++ hw/9pfs/virtio-9p.c| 28 qemu-config.c |6 ++ qemu-options.hx| 17 - vl.c |6 ++ 8 files changed, 61 insertions(+), 15 deletions(-) diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 1eda342..fbf9afb 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -54,6 +54,7 @@ typedef struct FsContext char *fs_root; SecModel fs_sm; uid_t uid; +int open_flags; struct xattr_operations **xops; } FsContext; diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c index 0b33290..fccf011 100644 --- a/fsdev/qemu-fsdev.c +++ b/fsdev/qemu-fsdev.c @@ -33,6 +33,8 @@ int qemu_fsdev_add(QemuOpts *opts) const char *fstype = qemu_opt_get(opts, fstype); const char *path = qemu_opt_get(opts, path); const char *sec_model = qemu_opt_get(opts, security_model); +const char *cache = qemu_opt_get(opts, cache); + if (!fsdev_id) { fprintf(stderr, fsdev: No id specified\n); @@ -71,10 +73,14 @@ int qemu_fsdev_add(QemuOpts *opts) fsle-fse.path = qemu_strdup(path); fsle-fse.security_model = qemu_strdup(sec_model); fsle-fse.ops = FsTypes[i].ops; - +fsle-fse.cache_model = 0; +if (cache) { +if (!strcmp(cache, writethrough)) { +fsle-fse.cache_model = V9FS_WRITETHROUGH_CACHE; +} +} QTAILQ_INSERT_TAIL(fstype_entries, fsle, next); return 0; - } FsTypeEntry *get_fsdev_fsentry(char *id) diff --git a/fsdev/qemu-fsdev.h b/fsdev/qemu-fsdev.h index f9f08d3..8cc010b 100644 --- a/fsdev/qemu-fsdev.h +++ b/fsdev/qemu-fsdev.h @@ -34,6 +34,7 @@ typedef struct FsTypeTable { FileOperations *ops; } FsTypeTable; +#define V9FS_WRITETHROUGH_CACHE 0x1 /* * Structure to store the various fsdev's passed through command line. */ @@ -41,6 +42,7 @@ typedef struct FsTypeEntry { char *fsdev_id; char *path; char *security_model; +int cache_model; FileOperations *ops; } FsTypeEntry; diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index 4bec8bf..55edc45 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -114,6 +114,12 @@ VirtIODevice *virtio_9p_init(DeviceState *dev, V9fsConf *conf) exit(1); } +if (fse-cache_model V9FS_WRITETHROUGH_CACHE) { +s-ctx.open_flags = O_SYNC; +} else { +s-ctx.open_flags = 0; +} + s-ctx.fs_root = qemu_strdup(fse-path); len = strlen(conf-tag); if (len MAX_TAG_LEN) { diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c index 55aca93..9d1813e 100644 --- a/hw/9pfs/virtio-9p.c +++ b/hw/9pfs/virtio-9p.c @@ -80,6 +80,22 @@ void cred_init(FsCred *credp) credp-fc_rdev = -1; } +static int get_dotl_openflags(V9fsState *s, int oflags) +{ +int flags; +/* + * Filter the client open flags + */ +flags = s-ctx.open_flags; +flags |= oflags; +flags = ~(O_NOCTTY | O_ASYNC | O_CREAT); +/* + * Ignore direct disk access hint until the server supports it. + */ +flags = ~O_DIRECT; +return flags; +} + static void v9fs_string_init(V9fsString *str) { str-data = NULL; @@ -1540,10 +1556,7 @@ static void v9fs_open(void *opaque) err = offset; } else { if (s-proto_version == V9FS_PROTO_2000L) { -flags = mode; -flags = ~(O_NOCTTY | O_ASYNC | O_CREAT); -/* Ignore direct disk access hint until the server supports it. */ -flags = ~O_DIRECT; +flags = get_dotl_openflags(s, mode); } else { flags = omode_to_uflags(mode); } @@ -1596,10 +1609,9 @@ static void v9fs_lcreate(void *opaque) } v9fs_string_sprintf(fullname, %s/%s, fidp-path.data, name.data); -/* Ignore direct disk access hint until the server supports it. */ -flags = ~O_DIRECT; - -err = v9fs_co_open2(pdu-s, fidp, fullname.data, gid, flags, mode); +flags = get_dotl_openflags(pdu-s, flags); +err = v9fs_co_open2(pdu-s, fidp, fullname.data, gid, +flags | O_CREAT, mode); if (err 0) { goto out; } diff --git a/qemu-config.c b/qemu-config.c index 5d7ffa2..1125fb3 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -170,6 +170,9 @@ QemuOptsList qemu_fsdev_opts = { }, { .name = security_model, .type = QEMU_OPT_STRING, +}, { +.name = cache, +.type = QEMU_OPT_STRING, }, { /*End of list */ } }, @@ -192,6 +195,9 @@ QemuOptsList qemu_virtfs_opts = { }, { .name = security_model,