Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
On 06/23/2014 03:17 AM, Chunyan Liu wrote:
> Add 'nocow' option so that users could have a chance to set NOCOW flag to
> newly created files. It's useful on btrfs file system to enhance performance.
>
> Btrfs has low performance when hosting VM images, even more when the guest
> in those VM are also using btrfs as file system. One way to mitigate this bad
> performance is to turn off COW attributes on VM files. Generally, there are
> two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then
> all newly created files will be NOCOW. b) per file. Add the NOCOW file
> attribute. It could only be done to empty or new files.
>
> This patch tries the second way, according to the option, it could add NOCOW
> per file.
>
> For most block drivers, since the create file step is in raw-posix.c, so we
> can do setting NOCOW flag ioctl in raw-posix.c only.
>
> But there are some exceptions, like block/vpc.c and block/vdi.c, they are
> creating file by calling qemu_open directly. For them, do the same setting
> NOCOW flag ioctl work in them separately.

Design question (not a patch review): It looks like your patch allows one to
set the NOCOW flag via ioctl when requested. But how does one learn if the
flag is already set? Can you update 'qemu-img info' to show whether a file
currently has the flag set? Can 'qemu-img amend' be taught to set and/or clear
the flag on an already existing file?

> @@ -1291,6 +1296,21 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
>          result = -errno;
>          error_setg_errno(errp, -result, "Could not create file");
>      } else {
> +        if (nocow) {
> +#ifdef __linux__
> +            /* Set NOCOW flag to solve performance issue on fs like btrfs.
> +             * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value
> +             * will be ignored since any failure of this operation should not
> +             * block the left work.
> +             */
> +            int attr;
> +            if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
> +                attr |= FS_NOCOW_FL;
> +                ioctl(fd, FS_IOC_SETFLAGS, &attr);
> +            }
> +#endif
> +        }

This silently ignores the nocow flag on non-Linux. Wouldn't it be better to
reject the option as unsupported?

What happens if the ioctl fails? Would it be better to fail the qemu-img
creation if the flag is requested but can't be honored?

> +++ b/qemu-doc.texi
> @@ -589,6 +589,22 @@ check -r all} is required, which may take some time.
>  This option can only be enabled if @code{compat=1.1} is specified.
> +@item nocow
> +If this option is set to @code{on}, it will trun off COW of the file. It's only

s/trun/turn/

> +valid on btrfs, no effect on other file systems.

This sort of statement may get stale if other file systems learn to honor the
same ioctl as btrfs.

> +
> +Btrfs has low performance when hosting a VM image file, even more when the guest
> +on the VM also using btrfs as file system. Turning off COW is a way to mitigate
> +this bad performance. Generally there are two ways to turn off COW on btrfs:
> +a) Disable it by mounting with nodatacow, then all newly created files will be
> +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this option
> +does.
> +
> +Note: this option is only valid to new or empty files. If there is an existing
> +file which is COW and has data blocks already, it couldn't be changed to NOCOW
> +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check if
> +the NOCOW flag is set or not (Capitabl 'C' is NOCOW flag).

s/Capitabl/Capital/

Oh, so it looks like setting the attribute is one-way, and can't be undone once
something is written? Or is it that it can only be set on an empty file, but can
be cleared at any time? Again, making people refer to lsattr to learn if the
flag is already set seems painful; can qemu-img info be taught to expose this
information, so that one tool is sufficient to manage the entire experience?

> +++ b/qemu-img.texi
> @@ -474,6 +474,22 @@ check -r all} is required, which may take some time.
>  This option can only be enabled if @code{compat=1.1} is specified.
> +@item nocow
> +If this option is set to @code{on}, it will trun off COW of the file. It's only

s/trun/turn/

> +valid on btrfs, no effect on other file systems.
> +
> +Btrfs has low performance when hosting a VM image file, even more when the guest
> +on the VM also using btrfs as file system. Turning off COW is a way to mitigate
> +this bad performance. Generally there are two ways to turn off COW on btrfs:
> +a) Disable it by mounting with nodatacow, then all newly created files will be
> +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this option
> +does.
> +
> +Note: this option is only valid to new or empty files. If there is an existing
> +file which is COW and has data blocks already, it couldn't be changed to NOCOW
> +by setting @code{nocow=on}. One can issue @code{lsattr
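Eric's question of how to learn whether the flag is already set can be answered with the same ioctl the patch already uses: FS_IOC_GETFLAGS reads the inode flags, and FS_NOCOW_FL is the bit that lsattr displays as 'C'. Below is a minimal sketch of what a 'qemu-img info' style check might look like, assuming a Linux host with <linux/fs.h>; the helper names fd_get_nocow() and attr_has_nocow() are hypothetical and not part of QEMU.

```c
#include <errno.h>
#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_NOCOW_FL */
#endif

/* Hypothetical helper: 1 if FS_NOCOW_FL is set in an inode-flags word. */
static int attr_has_nocow(int attr)
{
#ifdef FS_NOCOW_FL
    return (attr & FS_NOCOW_FL) != 0;
#else
    (void)attr;
    return 0;
#endif
}

/*
 * Hypothetical helper: report whether the file behind fd currently has the
 * NOCOW attribute (lsattr's 'C').
 * Returns 1 (NOCOW), 0 (COW), or -errno if the flags cannot be read, for
 * example when the file system does not implement FS_IOC_GETFLAGS.
 */
static int fd_get_nocow(int fd)
{
#if defined(__linux__) && defined(FS_IOC_GETFLAGS)
    int attr = 0;

    /* The flags word is exchanged through a pointer to an int. */
    if (ioctl(fd, FS_IOC_GETFLAGS, &attr) < 0) {
        return -errno;
    }
    return attr_has_nocow(attr);
#else
    (void)fd;
    return -ENOTSUP;
#endif
}
```

Returning -errno instead of a plain 0/1 lets a caller tell "this file is COW" apart from "the flags could not be read at all".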
Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
On 7/2/2014 at 05:02 AM, in message <53b321cc.7070...@redhat.com>, Eric Blake <ebl...@redhat.com> wrote:
> On 06/23/2014 03:17 AM, Chunyan Liu wrote:
>> Add 'nocow' option so that users could have a chance to set NOCOW flag to
>> newly created files. [...]
>
> Design question (not a patch review): It looks like your patch allows one to
> set the NOCOW flag via ioctl when requested. But how does one learn if the
> flag is already set? Can you update 'qemu-img info' to show whether a file
> currently has the flag set?

The reliable way to check whether the file is really NOCOW is 'lsattr
filename'. '-o nocow=on' does not necessarily mean NOCOW was successfully set:
we don't block file creation even if the ioctl fails (see below). Maybe
'qemu-img info' could call 'lsattr filename' to determine whether the file is
NOCOW.

> Can 'qemu-img amend' be taught to set and/or clear the flag on an already
> existing file?

No. It's a one-time decision: a COW file cannot be changed to NOCOW, and vice
versa.

>> @@ -1291,6 +1296,21 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
>>          result = -errno;
>>          error_setg_errno(errp, -result, "Could not create file");
>>      } else {
>> +        if (nocow) {
>> +#ifdef __linux__
>> +            /* Set NOCOW flag to solve performance issue on fs like btrfs.
>> +             * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value
>> +             * will be ignored since any failure of this operation should not
>> +             * block the left work.
>> +             */
>> +            int attr;
>> +            if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
>> +                attr |= FS_NOCOW_FL;
>> +                ioctl(fd, FS_IOC_SETFLAGS, &attr);
>> +            }
>> +#endif
>> +        }
>
> This silently ignores the nocow flag on non-Linux. Wouldn't it be better to
> reject the option as unsupported?

First, other file systems that don't support the flag (at least ext3 and ext4,
as I checked) simply return 0, so we can't rely on the ioctl return value to
tell whether the option is unsupported.

> What happens if the ioctl fails? Would it be better to fail the qemu-img
> creation if the flag is requested but can't be honored?

NOCOW is a performance optimization, and we don't want it to block file
creation. In other words, if the ioctl fails, we still want the image file to
be created. Another reason is that the ioctl return value is not consistent
across file systems:
  0:  does not always mean success; it could also mean unsupported.
  !0: the flag could be supported but setting it failed (as on btrfs), or the
      ioctl could be unsupported entirely and fail with some other error
      (other file systems; in this case we don't want to block file creation).

>> +++ b/qemu-doc.texi
>> @@ -589,6 +589,22 @@ check -r all} is required, which may take some time.
>>  This option can only be enabled if @code{compat=1.1} is specified.
>> +@item nocow
>> +If this option is set to @code{on}, it will trun off COW of the file. It's only
>
> s/trun/turn/
>
>> +valid on btrfs, no effect on other file systems.
>
> This sort of statement may get stale if other file systems learn to honor the
> same ioctl as btrfs.

Yes. Currently only btrfs supports this ioctl. I wrote it here because we don't
check what the current file system is, and we have no reliable way to report
that the ioctl is unsupported (as explained above), so the man page warns users
instead.

>> +
>> +Btrfs has low performance when hosting a VM image file, even more when the guest
>> +on the VM also using btrfs as file system. Turning off COW is a way to mitigate
>> +this bad performance. Generally there are two ways to turn off COW on btrfs:
>> +a) Disable it by mounting with nodatacow, then all newly created files will be
>> +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this option
>> +does.
>> +
>> +Note: this option is only valid to new or empty files. If there is an existing
>> +file which is COW and has data blocks already, it
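Since a 0 return from FS_IOC_SETFLAGS does not by itself prove the file system kept the bit, one option is to read the flags back after setting them. The sketch below assumes a Linux host; fd_try_set_nocow() is a hypothetical name, not a QEMU function. It reports three distinct outcomes so a caller could choose between the patch's best-effort behavior and Eric's suggestion of failing when the flag cannot be honored.

```c
#include <errno.h>
#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */
#endif

/*
 * Hypothetical helper: request FS_NOCOW_FL on an (empty) file, then re-read
 * the flags to see whether the request actually took effect.
 * Returns 1 if the flag is now set, 0 if SETFLAGS returned success but the
 * bit is not present, or -errno on failure (-ENOTSUP when not on Linux).
 */
static int fd_try_set_nocow(int fd)
{
#if defined(__linux__) && defined(FS_NOCOW_FL)
    int attr = 0;

    if (ioctl(fd, FS_IOC_GETFLAGS, &attr) < 0) {
        return -errno;
    }
    if (attr & FS_NOCOW_FL) {
        return 1;                       /* already NOCOW, nothing to do */
    }

    attr |= FS_NOCOW_FL;
    if (ioctl(fd, FS_IOC_SETFLAGS, &attr) < 0) {
        return -errno;
    }

    /* Read back: a 0 return above does not guarantee the bit was kept. */
    attr = 0;
    if (ioctl(fd, FS_IOC_GETFLAGS, &attr) < 0) {
        return -errno;
    }
    return (attr & FS_NOCOW_FL) ? 1 : 0;
#else
    (void)fd;
    return -ENOTSUP;
#endif
}
```

Whether a result of 0 should be a warning or an error is a policy decision; the sketch only makes the difference observable.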
Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
On 6/27/2014 at 07:48 PM, in message <20140627114806.gm12...@stefanha-thinkpad.muc.redhat.com>, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> On Mon, Jun 23, 2014 at 05:17:02PM +0800, Chunyan Liu wrote:
>> Add 'nocow' option so that users could have a chance to set NOCOW flag to
>> newly created files. [...]
>
> Are you sure it's necessary to touch all image formats in order to pass
> through the nocow option?
>
> Looking at bdrv_img_create() I think it will work without touching all image
> formats since both drv and proto_drv->create_opts are appended:

Right. For drivers that call bdrv_create_file() to create the file, it's not
necessary to add the NOCOW option to their .create_opts; adding NOCOW to
raw-posix.c is enough. Users will see no difference when they run:
  qemu-img create -f fmt name size -o nocow=on
or
  qemu-img create -f fmt name size -o ?

> void bdrv_img_create(const char *filename, const char *fmt,
>                      const char *base_filename, const char *base_fmt,
>                      char *options, uint64_t img_size, int flags,
>                      Error **errp, bool quiet)
> {
>     QemuOptsList *create_opts = NULL;
>     ...
>     create_opts = qemu_opts_append(create_opts, drv->create_opts);
>     create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
>
>     /* Create parameter list with default values */
>     opts = qemu_opts_create(create_opts, NULL, 0, &error_abort);
>     qemu_opt_set_number(opts, BLOCK_OPT_SIZE, img_size);
>
>     /* Parse -o options */
>     if (options) {
>         if (qemu_opts_do_parse(opts, options, NULL) != 0) {
>             error_setg(errp, "Invalid options for file format '%s'", fmt);
>             goto out;
>         }
>     }
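The argument above rests on bdrv_img_create() appending the protocol driver's create_opts after the format driver's, so an option declared only in raw-posix.c is still accepted when -o is parsed. The toy model below illustrates that merge. OptDesc, OptList, optlist_append() and optlist_accepts() are simplified, hypothetical stand-ins, not QEMU's QemuOptsList API, and the model assumes that a name already in the merged list is not added a second time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified option descriptor (not QEMU's QemuOptDesc). */
typedef struct {
    const char *name;
} OptDesc;

#define MAX_OPTS 32

/* Hypothetical merged list of descriptors. */
typedef struct {
    const OptDesc *desc[MAX_OPTS];
    size_t count;
} OptList;

/* Append each descriptor from src (terminated by a NULL name) that dst does
 * not already contain; earlier entries win on duplicate names. */
static void optlist_append(OptList *dst, const OptDesc *src)
{
    for (; src->name != NULL; src++) {
        int dup = 0;
        for (size_t i = 0; i < dst->count; i++) {
            if (strcmp(dst->desc[i]->name, src->name) == 0) {
                dup = 1;
                break;
            }
        }
        if (!dup && dst->count < MAX_OPTS) {
            dst->desc[dst->count++] = src;
        }
    }
}

/* Return 1 if the merged list accepts an option with this name. */
static int optlist_accepts(const OptList *list, const char *name)
{
    for (size_t i = 0; i < list->count; i++) {
        if (strcmp(list->desc[i]->name, name) == 0) {
            return 1;
        }
    }
    return 0;
}

/* A format driver that does not declare nocow ... */
static const OptDesc fmt_opts[] = { { "size" }, { "cluster_size" }, { NULL } };
/* ... and a protocol driver that does. */
static const OptDesc proto_opts[] = { { "size" }, { "nocow" }, { NULL } };
```

Merging fmt_opts and then proto_opts yields size, cluster_size and nocow, so "-o nocow=on" parses even though the format driver never mentions it.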
Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
On Mon, Jun 23, 2014 at 05:17:02PM +0800, Chunyan Liu wrote:
> Add 'nocow' option so that users could have a chance to set NOCOW flag to
> newly created files. [...]
>
> Signed-off-by: Chunyan Liu <cy...@suse.com>
> ---
> Changes to v2:
>   * based on QemuOpts instead of old QEMUOptionParameters
>   * add nocow description in man page and html doc
>
> Old v2 is here:
> http://lists.gnu.org/archive/html/qemu-devel/2013-11/msg02429.html
>
> ---
>  block/cow.c               |  5 +
>  block/qcow.c              |  5 +
>  block/qcow2.c             |  5 +
>  block/qed.c               | 11 ---
>  block/raw-posix.c         | 25 +
>  block/vdi.c               | 29 +
>  block/vhdx.c              |  5 +
>  block/vmdk.c              | 11 ---
>  block/vpc.c               | 29 +
>  include/block/block_int.h |  1 +
>  qemu-doc.texi             | 16
>  qemu-img.texi             | 16
>  12 files changed, 152 insertions(+), 6 deletions(-)

Are you sure it's necessary to touch all image formats in order to pass
through the nocow option?

Looking at bdrv_img_create() I think it will work without touching all image
formats since both drv and proto_drv->create_opts are appended:

void bdrv_img_create(const char *filename, const char *fmt,
                     const char *base_filename, const char *base_fmt,
                     char *options, uint64_t img_size, int flags,
                     Error **errp, bool quiet)
{
    QemuOptsList *create_opts = NULL;
    ...
    create_opts = qemu_opts_append(create_opts, drv->create_opts);
    create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);

    /* Create parameter list with default values */
    opts = qemu_opts_create(create_opts, NULL, 0, &error_abort);
    qemu_opt_set_number(opts, BLOCK_OPT_SIZE, img_size);

    /* Parse -o options */
    if (options) {
        if (qemu_opts_do_parse(opts, options, NULL) != 0) {
            error_setg(errp, "Invalid options for file format '%s'", fmt);
            goto out;
        }
    }
Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
Hi Stefan, Kevin,

Could you please take a look at this version? We discussed this last November,
and this version switches it to QemuOpts.

Thanks,
Chunyan

On 6/23/2014 at 05:17 PM, in message <1403515022-24802-1-git-send-email-cy...@suse.com>, Chunyan Liu <cy...@suse.com> wrote:
> Add 'nocow' option so that users could have a chance to set NOCOW flag to
> newly created files. [...]
[Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
Add 'nocow' option so that users could have a chance to set NOCOW flag to
newly created files. It's useful on btrfs file system to enhance performance.

Btrfs has low performance when hosting VM images, even more when the guest
in those VM are also using btrfs as file system. One way to mitigate this bad
performance is to turn off COW attributes on VM files. Generally, there are
two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then
all newly created files will be NOCOW. b) per file. Add the NOCOW file
attribute. It could only be done to empty or new files.

This patch tries the second way, according to the option, it could add NOCOW
per file.

For most block drivers, since the create file step is in raw-posix.c, so we
can do setting NOCOW flag ioctl in raw-posix.c only.

But there are some exceptions, like block/vpc.c and block/vdi.c, they are
creating file by calling qemu_open directly. For them, do the same setting
NOCOW flag ioctl work in them separately.

Signed-off-by: Chunyan Liu <cy...@suse.com>
---
Changes to v2:
  * based on QemuOpts instead of old QEMUOptionParameters
  * add nocow description in man page and html doc

Old v2 is here:
http://lists.gnu.org/archive/html/qemu-devel/2013-11/msg02429.html

---
 block/cow.c               |  5 +
 block/qcow.c              |  5 +
 block/qcow2.c             |  5 +
 block/qed.c               | 11 ---
 block/raw-posix.c         | 25 +
 block/vdi.c               | 29 +
 block/vhdx.c              |  5 +
 block/vmdk.c              | 11 ---
 block/vpc.c               | 29 +
 include/block/block_int.h |  1 +
 qemu-doc.texi             | 16
 qemu-img.texi             | 16
 12 files changed, 152 insertions(+), 6 deletions(-)

diff --git a/block/cow.c b/block/cow.c
index a05a92c..43b537c 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -401,6 +401,11 @@ static QemuOptsList cow_create_opts = {
             .type = QEMU_OPT_STRING,
             .help = "File name of a base image"
         },
+        {
+            .name = BLOCK_OPT_NOCOW,
+            .type = QEMU_OPT_BOOL,
+            .help = "Turn off copy-on-write (valid only on btrfs)"
+        },
         { /* end of list */ }
     }
 };
diff --git a/block/qcow.c b/block/qcow.c
index 1f2bac8..5b23540 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -928,6 +928,11 @@ static QemuOptsList qcow_create_opts = {
             .help = "Encrypt the image",
             .def_value_str = "off"
         },
+        {
+            .name = BLOCK_OPT_NOCOW,
+            .type = QEMU_OPT_BOOL,
+            .help = "Turn off copy-on-write (valid only on btrfs)"
+        },
         { /* end of list */ }
     }
 };
diff --git a/block/qcow2.c b/block/qcow2.c
index b9d2fa6..3a4cc8a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2382,6 +2382,11 @@ static QemuOptsList qcow2_create_opts = {
             .help = "Postpone refcount updates",
             .def_value_str = "off"
         },
+        {
+            .name = BLOCK_OPT_NOCOW,
+            .type = QEMU_OPT_BOOL,
+            .help = "Turn off copy-on-write (valid only on btrfs)"
+        },
         { /* end of list */ }
     }
 };
diff --git a/block/qed.c b/block/qed.c
index 092e6fb..460ac92 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -567,7 +567,7 @@ static void bdrv_qed_close(BlockDriverState *bs)
 static int qed_create(const char *filename, uint32_t cluster_size,
                       uint64_t image_size, uint32_t table_size,
                       const char *backing_file, const char *backing_fmt,
-                      Error **errp)
+                      QemuOpts *opts, Error **errp)
 {
     QEDHeader header = {
         .magic = QED_MAGIC,
@@ -586,7 +586,7 @@ static int qed_create(const char *filename, uint32_t cluster_size,
     int ret = 0;
     BlockDriverState *bs;
 
-    ret = bdrv_create_file(filename, NULL, &local_err);
+    ret = bdrv_create_file(filename, opts, &local_err);
     if (ret < 0) {
         error_propagate(errp, local_err);
         return ret;
@@ -682,7 +682,7 @@ static int bdrv_qed_create(const char *filename, QemuOpts *opts, Error **errp)
     }
 
     ret = qed_create(filename, cluster_size, image_size, table_size,
-                     backing_file, backing_fmt, errp);
+                     backing_file, backing_fmt, opts, errp);
 
 finish:
     g_free(backing_file);
@@ -1644,6 +1644,11 @@ static QemuOptsList qed_create_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "L1/L2 table size (in clusters)"
         },
+        {
+            .name = BLOCK_OPT_NOCOW,
+            .type = QEMU_OPT_BOOL,
+            .help = "Turn off copy-on-write (valid only on btrfs)"
+        },
         { /* end of list */ }
     }
 };
diff --git a/block/raw-posix.c b/block/raw-posix.c
index dacf4fb..825a0c8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -55,6 +55,9 @@
 #include
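The raw_create() hunk quoted earlier in the thread requests NOCOW immediately after the file is opened, because the attribute can only be applied to new or empty files. A minimal standalone sketch of that ordering follows, assuming Linux and plain POSIX open()/ftruncate() instead of QEMU's qemu_open() and Error plumbing; create_image_file() is a hypothetical name, not QEMU code.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */
#endif

/*
 * Hypothetical sketch of a create path: open a new, empty file, optionally
 * request NOCOW while it is still empty, then give it its final size.
 * The NOCOW request is best-effort (its failure is ignored, as in the patch);
 * open/ftruncate/close errors are returned as -errno, success as 0.
 */
static int create_image_file(const char *filename, off_t size, int nocow)
{
    int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -errno;
    }

#if defined(__linux__) && defined(FS_NOCOW_FL)
    if (nocow) {
        int attr = 0;
        /* Must happen before any data is written; flags go through &attr. */
        if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
            attr |= FS_NOCOW_FL;
            (void)ioctl(fd, FS_IOC_SETFLAGS, &attr);
        }
    }
#else
    (void)nocow;
#endif

    if (ftruncate(fd, size) < 0) {
        int ret = -errno;
        close(fd);
        return ret;
    }
    if (close(fd) < 0) {
        return -errno;
    }
    return 0;
}
```

As in the patch, a failed or ignored NOCOW request does not abort creation; only errors from open(), ftruncate() and close() are reported to the caller.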