Re: [Qemu-block] [Qemu-devel] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-24 Thread 858585 jemmy
On Mon, Apr 17, 2017 at 12:00 PM, 858585 jemmy  wrote:
> On Mon, Apr 17, 2017 at 11:49 AM, Fam Zheng  wrote:
>> On Fri, 04/14 14:30, 858585 jemmy wrote:
>>> Do you know some other format which have very small cluster size?
>>
>> 64k is the default cluster size for qcow2 but it can be configured at image
>> creation time, as 512 bytes, for example:
>>
>> $ qemu-img create -f qcow2 test.qcow2 -o cluster_size=512 1G
>
> Thanks, i will test the performance again.

I find the performance reduce when cluster size is 512.
I will optimize the performance and submit a patch later.
Thanks.

>>
>> Fam



Re: [Qemu-block] [Qemu-devel] [QEMU-2.8] Source QEMU crashes with: "bdrv_co_pwritev: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed"

2017-04-24 Thread Kashyap Chamarthy
On Sat, Apr 22, 2017 at 05:23:49PM +0800, Hailiang Zhang wrote:
> Hi,

Hi Hailiang,

> I think the bellow patch can fix your problme.
> [PATCH 2/4] qmp-cont: invalidate on RUN_STATE_PRELAUNCH
> https://patchwork.kernel.org/patch/9591885/

Hmm, the above patch ("qmp-cont: invalidate on RUN_STATE_PRELAUNCH") is
not merged in Git, as it's stalled on design discussion between Kevin
Wolf and Vladimir.

And the below patch, from you, seems to be not submitted upstream (2.8
stable tree, perhaps).  Do you intend to do so?

> Actually, we encounter the same problem in our test, we fix it with the 
> follow patch:
> 
>  From 0e4d6d706afd9909b5fd71536b45c58af60892f8 Mon Sep 17 00:00:00 2001
>  From: zhanghailiang
>  Date: Tue, 21 Mar 2017 09:44:36 +0800
>  Subject: [PATCH] migration: Re-activate blocks whenever migration been
>   cancelled
> 
>  In commit 1d2acc3162d9c7772510c973f446353fbdd1f9a8, we try to fix the bug
>  'bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed'
>  which occured in migration cancelling process.
> 
>  But it seems that we didn't cover all the cases, we caught such a case 
> which
>  slipped from the old fixup in our test: if libvirtd cancelled the 
> migration
>  process for a shutting down VM, it will send 'system_reset' command 
> first,
>  and then 'cont' command behind, after VM resumes to run, it will trigger 
> the above
>  error reports, because we didn't regain the control of blocks for VM.
> 
>  Signed-off-by: zhanghailiang
>  Signed-off-by: Hongyang Yang
>  ---
>   block.c   | 12 +++-
>   include/block/block.h |  1 +
>   include/migration/migration.h |  3 ---
>   migration/migration.c |  7 +--
>   qmp.c |  4 +---
>   5 files changed, 14 insertions(+), 13 deletions(-)

[...]

-- 
/kashyap



[Qemu-block] [PATCH v5 0/4] Improve convert and dd commands

2017-04-24 Thread Daniel P. Berrange
Update to

  v1: https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg05699.html
  v2: https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg00728.html
  v3: https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg04391.html

This series is in response to Max pointing out that you cannot
use 'convert' for an encrypted target image.

The 'convert' and 'dd' commands need to first create the image
and then open it. The bdrv_create() method takes a set of options
for creating the image, which let us provide a key-secret for the
encryption key. When the commands then open the new image, they
don't provide any options, so the image is unable to be opened
due to lack of encryption key. It is also not possible to use
the --image-opts argument to provide structured options in the
target image name - it must be a plain filename to satisfy the
bdrv_create() API contract.

This series addresses these problems to some extent

 - Adds a new --target-image-opts flag which is used to say
   that the target filename is using structured options.
   It is *only* permitted to use this when -n is also set.
   ie the target image must be pre-created so convert/dd
   don't need to run bdrv_create().

 - When --target-image-opts is not used, add special case
   code that identifies options passed to bdrv_create()
   named "*key-secret" and adds them to the options used
   to open the new image

In future it is desirable to make --target-image-opts work even when -n is
*not* given. This requires considerable work to create a new bdrv_create()
API impl.

The first patch fixes a bug in the 'dd' command while the second adds support
for the missing '--object' arg to 'dd', allowing it to reference secrets when
opening files.  The last two patches implement the new features described above
for the 'convert' command.

Changed in v5:

 - Fix return value (Max)
 - Misc doc changes (Max)
 - Use error_abort (Max)

Changed in v4:

 - Refactor img_open_new_file in terms of img_open_file (Kevin)

Changed in v3:

 - Drop all patches affecting the 'dd' command except for the clear bug fix
   and the --object support. They can be re-considered once dd is rewritten
   to run ontop of convert.
 - Use consistent return/goto style in dd command (Max)
 - Fix error reporting when using compressed image and skip-create (Max)
 - Unconditionally create QDict when open files (Max)

Changed in v2:

 - Replace dd -n flag with support for conv=nocreat,notrunc
 - Misc typos (Eric, Fam)


Daniel P. Berrange (4):
  qemu-img: add support for --object with 'dd' command
  qemu-img: fix --image-opts usage with dd command
  qemu-img: introduce --target-image-opts for 'convert' command
  qemu-img: copy *key-secret opts when opening newly created files

 qemu-img-cmds.hx |   4 +-
 qemu-img.c   | 148 ---
 qemu-img.texi|  12 -
 3 files changed, 130 insertions(+), 34 deletions(-)

-- 
2.9.3




[Qemu-block] [PATCH v5 1/4] qemu-img: add support for --object with 'dd' command

2017-04-24 Thread Daniel P. Berrange
The qemu-img dd command added --image-opts support, but missed
the corresponding --object support. This prevented passing
secrets (eg auth passwords) needed by certain disk images.

Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
---
 qemu-img.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index b220cf7..2249c21 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4159,6 +4159,7 @@ static int img_dd(int argc, char **argv)
 };
 const struct option long_options[] = {
 { "help", no_argument, 0, 'h'},
+{ "object", required_argument, 0, OPTION_OBJECT},
 { "image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
 { 0, 0, 0, 0 }
 };
@@ -4183,6 +4184,15 @@ static int img_dd(int argc, char **argv)
 case 'h':
 help();
 break;
+case OPTION_OBJECT: {
+QemuOpts *opts;
+opts = qemu_opts_parse_noisily(&qemu_object_opts,
+   optarg, true);
+if (!opts) {
+ret = -1;
+goto out;
+}
+}   break;
 case OPTION_IMAGE_OPTS:
 image_opts = true;
 break;
@@ -4227,6 +4237,14 @@ static int img_dd(int argc, char **argv)
 ret = -1;
 goto out;
 }
+
+if (qemu_opts_foreach(&qemu_object_opts,
+  user_creatable_add_opts_foreach,
+  NULL, NULL)) {
+ret = -1;
+goto out;
+}
+
 blk1 = img_open(image_opts, in.filename, fmt, 0, false, false);
 
 if (!blk1) {
-- 
2.9.3




[Qemu-block] [PATCH v5 3/4] qemu-img: introduce --target-image-opts for 'convert' command

2017-04-24 Thread Daniel P. Berrange
The '--image-opts' flags indicates whether the source filename
includes options. The target filename has to remain in the
plain filename format though, since it needs to be passed to
bdrv_create().  When using --skip-create though, it would be
possible to use image-opts syntax. This adds --target-image-opts
to indicate that the target filename includes options. Currently
this mandates use of the --skip-create flag too.

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
---
 qemu-img-cmds.hx |  4 +--
 qemu-img.c   | 86 
 qemu-img.texi| 12 ++--
 3 files changed, 73 insertions(+), 29 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 8ac7822..93b50ef 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -40,9 +40,9 @@ STEXI
 ETEXI
 
 DEF("convert", img_convert,
-"convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f fmt] 
[-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] 
[-l snapshot_param] [-S sparse_size] [-m num_coroutines] [-W] filename 
[filename2 [...]] output_filename")
+"convert [--object objectdef] [--image-opts] [--target-image-opts] [-c] 
[-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-o options] 
[-s snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m 
num_coroutines] [-W] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [--object @var{objectdef}] [--image-opts] [-c] [-p] [-q] [-n] 
[-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-o 
@var{options}] [-s @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S 
@var{sparse_size}] [-m @var{num_coroutines}] [-W] @var{filename} 
[@var{filename2} [...]] @var{output_filename}
+@item convert [--object @var{objectdef}] [--image-opts] [--target-image-opts] 
[-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O 
@var{output_fmt}] [-o @var{options}] [-s @var{snapshot_id_or_name}] [-l 
@var{snapshot_param}] [-S @var{sparse_size}] [-m @var{num_coroutines}] [-W] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("dd", img_dd,
diff --git a/qemu-img.c b/qemu-img.c
index 83aff5e..2344e64 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -59,6 +59,7 @@ enum {
 OPTION_PATTERN = 260,
 OPTION_FLUSH_INTERVAL = 261,
 OPTION_NO_DRAIN = 262,
+OPTION_TARGET_IMAGE_OPTS = 263,
 };
 
 typedef enum OutputFormat {
@@ -1921,7 +1922,7 @@ static int img_convert(int argc, char **argv)
 int progress = 0, flags, src_flags;
 bool writethrough, src_writethrough;
 const char *fmt, *out_fmt, *cache, *src_cache, *out_baseimg, *out_filename;
-BlockDriver *drv, *proto_drv;
+BlockDriver *drv = NULL, *proto_drv = NULL;
 BlockBackend **blk = NULL, *out_blk = NULL;
 BlockDriverState **bs = NULL, *out_bs = NULL;
 int64_t total_sectors;
@@ -1941,9 +1942,10 @@ static int img_convert(int argc, char **argv)
 bool image_opts = false;
 bool wr_in_order = true;
 long num_coroutines = 8;
+bool tgt_image_opts = false;
 
+out_fmt = NULL;
 fmt = NULL;
-out_fmt = "raw";
 cache = "unsafe";
 src_cache = BDRV_DEFAULT_CACHE;
 out_baseimg = NULL;
@@ -1954,6 +1956,7 @@ static int img_convert(int argc, char **argv)
 {"help", no_argument, 0, 'h'},
 {"object", required_argument, 0, OPTION_OBJECT},
 {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
+{"target-image-opts", no_argument, 0, OPTION_TARGET_IMAGE_OPTS},
 {0, 0, 0, 0}
 };
 c = getopt_long(argc, argv, ":hf:O:B:ce6o:s:l:S:pt:T:qnm:W",
@@ -2075,9 +2078,16 @@ static int img_convert(int argc, char **argv)
 case OPTION_IMAGE_OPTS:
 image_opts = true;
 break;
+case OPTION_TARGET_IMAGE_OPTS:
+tgt_image_opts = true;
+break;
 }
 }
 
+if (!out_fmt && !tgt_image_opts) {
+out_fmt = "raw";
+}
+
 if (qemu_opts_foreach(&qemu_object_opts,
   user_creatable_add_opts_foreach,
   NULL, NULL)) {
@@ -2090,6 +2100,12 @@ static int img_convert(int argc, char **argv)
 goto fail_getopt;
 }
 
+if (tgt_image_opts && !skip_create) {
+error_report("--target-image-opts requires use of -n flag");
+ret = -1;
+goto fail_getopt;
+}
+
 /* Initialize before goto out */
 if (quiet) {
 progress = 0;
@@ -2100,8 +2116,14 @@ static int img_convert(int argc, char **argv)
 out_filename = bs_n >= 1 ? argv[argc - 1] : NULL;
 
 if (options && has_help_option(options)) {
-ret = print_block_option_help(out_filename, out_fmt);
-goto out;
+if (out_fmt) {
+ret = print_block_option_help(out_filename, out_fmt);
+goto out;
+} else {
+error_report("Option help requires a format be specified");
+ret = -

[Qemu-block] [PATCH v5 4/4] qemu-img: copy *key-secret opts when opening newly created files

2017-04-24 Thread Daniel P. Berrange
The qemu-img dd/convert commands will create a image file and
then try to open it. Historically it has been possible to open
new files without passing any options. With encrypted files
though, the *key-secret options are mandatory, so we need to
provide those options when opening the newly created file.

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
---
 qemu-img.c | 41 +++--
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 2344e64..cee28db 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -305,15 +305,17 @@ static BlockBackend *img_open_opts(const char *optstr,
 }
 
 static BlockBackend *img_open_file(const char *filename,
+   QDict *options,
const char *fmt, int flags,
bool writethrough, bool quiet)
 {
 BlockBackend *blk;
 Error *local_err = NULL;
-QDict *options = NULL;
 
 if (fmt) {
-options = qdict_new();
+if (!options) {
+options = qdict_new();
+}
 qdict_put(options, "driver", qstring_from_str(fmt));
 }
 
@@ -332,6 +334,33 @@ static BlockBackend *img_open_file(const char *filename,
 }
 
 
+static int img_add_key_secrets(void *opaque,
+   const char *name, const char *value,
+   Error **errp)
+{
+QDict *options = opaque;
+
+if (g_str_has_suffix(name, "key-secret")) {
+qdict_put(options, name, qstring_from_str(value));
+}
+
+return 0;
+}
+
+static BlockBackend *img_open_new_file(const char *filename,
+   QemuOpts *create_opts,
+   const char *fmt, int flags,
+   bool writethrough, bool quiet)
+{
+QDict *options = NULL;
+
+options = qdict_new();
+qemu_opt_foreach(create_opts, img_add_key_secrets, options, &error_abort);
+
+return img_open_file(filename, options, fmt, flags, writethrough, quiet);
+}
+
+
 static BlockBackend *img_open(bool image_opts,
   const char *filename,
   const char *fmt, int flags, bool writethrough,
@@ -351,7 +380,7 @@ static BlockBackend *img_open(bool image_opts,
 }
 blk = img_open_opts(filename, opts, flags, writethrough, quiet);
 } else {
-blk = img_open_file(filename, fmt, flags, writethrough, quiet);
+blk = img_open_file(filename, NULL, fmt, flags, writethrough, quiet);
 }
 return blk;
 }
@@ -2303,8 +2332,8 @@ static int img_convert(int argc, char **argv)
  * That has to wait for bdrv_create to be improved
  * to allow filenames in option syntax
  */
-out_blk = img_open_file(out_filename, out_fmt,
-flags, writethrough, quiet);
+out_blk = img_open_new_file(out_filename, opts, out_fmt,
+flags, writethrough, quiet);
 }
 if (!out_blk) {
 ret = -1;
@@ -4353,7 +4382,7 @@ static int img_dd(int argc, char **argv)
  * with the bdrv_create() call above which does not
  * support image-opts style.
  */
-blk2 = img_open_file(out.filename, out_fmt, BDRV_O_RDWR,
+blk2 = img_open_file(out.filename, NULL, out_fmt, BDRV_O_RDWR,
  false, false);
 
 if (!blk2) {
-- 
2.9.3




[Qemu-block] [PATCH v5 2/4] qemu-img: fix --image-opts usage with dd command

2017-04-24 Thread Daniel P. Berrange
The --image-opts flag can only be used to affect the parsing
of the source image. The target image has to be specified in
the traditional style regardless, since it needs to be passed
to the bdrv_create() API which does not support the new style
opts.

Reviewed-by: Max Reitz 
Signed-off-by: Daniel P. Berrange 
---
 qemu-img.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 2249c21..83aff5e 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4312,8 +4312,13 @@ static int img_dd(int argc, char **argv)
 goto out;
 }
 
-blk2 = img_open(image_opts, out.filename, out_fmt, BDRV_O_RDWR,
-false, false);
+/* TODO, we can't honour --image-opts for the target,
+ * since it needs to be given in a format compatible
+ * with the bdrv_create() call above which does not
+ * support image-opts style.
+ */
+blk2 = img_open_file(out.filename, out_fmt, BDRV_O_RDWR,
+ false, false);
 
 if (!blk2) {
 ret = -1;
-- 
2.9.3




Re: [Qemu-block] [PATCH v5 3/4] qemu-img: introduce --target-image-opts for 'convert' command

2017-04-24 Thread Fam Zheng
On Mon, 04/24 10:16, Daniel P. Berrange wrote:
> The '--image-opts' flags indicates whether the source filename

s/flags/flag/ or s/indicates/indicate/, I think?

> includes options. The target filename has to remain in the
> plain filename format though, since it needs to be passed to
> bdrv_create().  When using --skip-create though, it would be
> possible to use image-opts syntax. This adds --target-image-opts
> to indicate that the target filename includes options. Currently
> this mandates use of the --skip-create flag too.

Fam



Re: [Qemu-block] [PATCH v5 3/4] qemu-img: introduce --target-image-opts for 'convert' command

2017-04-24 Thread Daniel P. Berrange
On Mon, Apr 24, 2017 at 05:45:12PM +0800, Fam Zheng wrote:
> On Mon, 04/24 10:16, Daniel P. Berrange wrote:
> > The '--image-opts' flags indicates whether the source filename
> 
> s/flags/flag/ or s/indicates/indicate/, I think?

Yes to the first, no to the second

> > includes options. The target filename has to remain in the
> > plain filename format though, since it needs to be passed to
> > bdrv_create().  When using --skip-create though, it would be
> > possible to use image-opts syntax. This adds --target-image-opts
> > to indicate that the target filename includes options. Currently
> > this mandates use of the --skip-create flag too.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH v5 4/4] qemu-img: copy *key-secret opts when opening newly created files

2017-04-24 Thread Fam Zheng
On Mon, 04/24 10:16, Daniel P. Berrange wrote:
> The qemu-img dd/convert commands will create a image file and

s/a image/an image/

> then try to open it. Historically it has been possible to open
> new files without passing any options. With encrypted files
> though, the *key-secret options are mandatory, so we need to
> provide those options when opening the newly created file.

Fam



Re: [Qemu-block] [PATCH v5 0/4] Improve convert and dd commands

2017-04-24 Thread Fam Zheng
On Mon, 04/24 10:16, Daniel P. Berrange wrote:
> Update to
> 
>   v1: https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg05699.html
>   v2: https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg00728.html
>   v3: https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg04391.html
> 
> This series is in response to Max pointing out that you cannot
> use 'convert' for an encrypted target image.

Looks good to me. Apart from the commit message tweaks:

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [PATCH v14 04/20] qemu-img: Add --share-rw option to subcommands

2017-04-24 Thread Kevin Wolf
Am 24.04.2017 um 08:10 hat Fam Zheng geschrieben:
> On Fri, 04/21 15:25, Kevin Wolf wrote:
> > Am 21.04.2017 um 05:55 hat Fam Zheng geschrieben:
> > > Similar to share-rw qdev property, this will force the opened images to
> > > allow shared write permission of other programs.
> > > 
> > > Signed-off-by: Fam Zheng 
> > 
> > General observation: We were considering to make share-rw require
> > read-only. Some of the commands converted here always open the image
> > read-write, so if we go ahead with the restriction, will the option
> > become useless in many of the subcommands?
> 
> share-rw on qdev doesn't require read-only, so I personally perfer we
> follow that manner.

It's not really completely comparable to qdev's share-rw because qdev
only shares writes on the top level (and the qcow2 driver restricts this
again down the tree), while this option propagates all the way down.
Which is why you called the block layer option "force-shared-write".
Maybe that would be the better name here as well.

> Because even with --share-rw for the read-write commands, the image is
> still protected from corruption by the fact that the other side
> probably uses non-share-rw.

If the other side "probably" uses non-share-rw, then the image is only
"probably" protected. I think you're right about the common case, but if
two qemu instances use force-shared-write=on, then we get actual image
corruption.

As far as I know, our real use cases for the option are read-only: We
want to inspect images which are in use by a VM. Do we have any use
cases for read-write access?

Note that this is different from qdev in that share-rw on the qdev level
affects only the user data and can work e.g. if the guest uses a cluster
file system. But this option affects metadata as well and qcow2 never
supports this, so opening two images read-write at the same time is
guaranteed to corrupt the image.

> But on the other hand, we can always add the option when necessary, so
> it's okay to leave them as is. If you insist, I can remove them in
> next version.

Yes, I think we really need a check in bdrv_open_common() that forbids
force-shared-write=on on writable images. And then, the options in
qemu-img become useless when applied to writable images because they
would only produce errors.

> > >  qemu-img.c | 155 
> > > +++--
> > >  1 file changed, 119 insertions(+), 36 deletions(-)
> > > 
> > > diff --git a/qemu-img.c b/qemu-img.c
> > > index ed24371..df88a79 100644
> > > --- a/qemu-img.c
> > > +++ b/qemu-img.c
> > > @@ -28,6 +28,7 @@
> > >  #include "qapi/qobject-output-visitor.h"
> > >  #include "qapi/qmp/qerror.h"
> > >  #include "qapi/qmp/qjson.h"
> > > +#include "qapi/qmp/qbool.h"
> > >  #include "qemu/cutils.h"
> > >  #include "qemu/config-file.h"
> > >  #include "qemu/option.h"
> > > @@ -283,12 +284,15 @@ static int img_open_password(BlockBackend *blk, 
> > > const char *filename,
> > >  
> > >  static BlockBackend *img_open_opts(const char *optstr,
> > > QemuOpts *opts, int flags, bool 
> > > writethrough,
> > > -   bool quiet)
> > > +   bool quiet, bool share_rw)
> > >  {
> > >  QDict *options;
> > >  Error *local_err = NULL;
> > >  BlockBackend *blk;
> > >  options = qemu_opts_to_qdict(opts, NULL);
> > > +if (share_rw) {
> > > +qdict_put(options, BDRV_OPT_FORCE_SHARED_WRITE, 
> > > qbool_from_bool(true));
> > > +}
> > 
> > It's interesting that you chose a conditional qdict_put for true rather
> > than an unconditional one for share_rw here. The difference becomes
> > visible when someone sets both -U and share-rw=off; we need to decide
> > which one should take precedence.
> 
> I don't have a preference here.  Setting both -U and share-rw=off is
> inconsistent, it's not a problem to yield an "undefined" result.

We could just check whether the entry already exists and error out.
Maybe that's the best option.

Kevin



Re: [Qemu-block] [PATCH v14 04/20] qemu-img: Add --share-rw option to subcommands

2017-04-24 Thread Fam Zheng
On Mon, 04/24 12:13, Kevin Wolf wrote:
> Am 24.04.2017 um 08:10 hat Fam Zheng geschrieben:
> > On Fri, 04/21 15:25, Kevin Wolf wrote:
> > > Am 21.04.2017 um 05:55 hat Fam Zheng geschrieben:
> > > > Similar to share-rw qdev property, this will force the opened images to
> > > > allow shared write permission of other programs.
> > > > 
> > > > Signed-off-by: Fam Zheng 
> > > 
> > > General observation: We were considering to make share-rw require
> > > read-only. Some of the commands converted here always open the image
> > > read-write, so if we go ahead with the restriction, will the option
> > > become useless in many of the subcommands?
> > 
> > share-rw on qdev doesn't require read-only, so I personally perfer we
> > follow that manner.
> 
> It's not really completely comparable to qdev's share-rw because qdev
> only shares writes on the top level (and the qcow2 driver restricts this
> again down the tree), while this option propagates all the way down.
> Which is why you called the block layer option "force-shared-write".
> Maybe that would be the better name here as well.

Makes sense to me.

> 
> > Because even with --share-rw for the read-write commands, the image is
> > still protected from corruption by the fact that the other side
> > probably uses non-share-rw.
> 
> If the other side "probably" uses non-share-rw, then the image is only
> "probably" protected. I think you're right about the common case, but if
> two qemu instances use force-shared-write=on, then we get actual image
> corruption.
> 
> As far as I know, our real use cases for the option are read-only: We
> want to inspect images which are in use by a VM. Do we have any use
> cases for read-write access?
> 
> Note that this is different from qdev in that share-rw on the qdev level
> affects only the user data and can work e.g. if the guest uses a cluster
> file system. But this option affects metadata as well and qcow2 never
> supports this, so opening two images read-write at the same time is
> guaranteed to corrupt the image.

OK, I think that makes sense too.

> 
> > But on the other hand, we can always add the option when necessary, so
> > it's okay to leave them as is. If you insist, I can remove them in
> > next version.
> 
> Yes, I think we really need a check in bdrv_open_common() that forbids
> force-shared-write=on on writable images. And then, the options in
> qemu-img become useless when applied to writable images because they
> would only produce errors.
> 
> > > >  qemu-img.c | 155 
> > > > +++--
> > > >  1 file changed, 119 insertions(+), 36 deletions(-)
> > > > 
> > > > diff --git a/qemu-img.c b/qemu-img.c
> > > > index ed24371..df88a79 100644
> > > > --- a/qemu-img.c
> > > > +++ b/qemu-img.c
> > > > @@ -28,6 +28,7 @@
> > > >  #include "qapi/qobject-output-visitor.h"
> > > >  #include "qapi/qmp/qerror.h"
> > > >  #include "qapi/qmp/qjson.h"
> > > > +#include "qapi/qmp/qbool.h"
> > > >  #include "qemu/cutils.h"
> > > >  #include "qemu/config-file.h"
> > > >  #include "qemu/option.h"
> > > > @@ -283,12 +284,15 @@ static int img_open_password(BlockBackend *blk, 
> > > > const char *filename,
> > > >  
> > > >  static BlockBackend *img_open_opts(const char *optstr,
> > > > QemuOpts *opts, int flags, bool 
> > > > writethrough,
> > > > -   bool quiet)
> > > > +   bool quiet, bool share_rw)
> > > >  {
> > > >  QDict *options;
> > > >  Error *local_err = NULL;
> > > >  BlockBackend *blk;
> > > >  options = qemu_opts_to_qdict(opts, NULL);
> > > > +if (share_rw) {
> > > > +qdict_put(options, BDRV_OPT_FORCE_SHARED_WRITE, 
> > > > qbool_from_bool(true));
> > > > +}
> > > 
> > > It's interesting that you chose a conditional qdict_put for true rather
> > > than an unconditional one for share_rw here. The difference becomes
> > > visible when someone sets both -U and share-rw=off; we need to decide
> > > which one should take precedence.
> > 
> > I don't have a preference here.  Setting both -U and share-rw=off is
> > inconsistent, it's not a problem to yield an "undefined" result.
> 
> We could just check whether the entry already exists and error out.
> Maybe that's the best option.

Sounds good, will add the check.

Fam




Re: [Qemu-block] proposed qcow2 extension: cluster reservations [was: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-24 Thread Kevin Wolf
Am 22.04.2017 um 19:56 hat Max Reitz geschrieben:
> On 21.04.2017 23:09, Eric Blake wrote:
> > And meanwhile, it looks like I have some patches to propose (and
> > qemu-iotests to write) if I can help fix the bugs I've pointed out.
> 
> You mean these?
> https://github.com/XanClic/qemu/commits/random-qcow2-stuff-v1
> 
> ;-)
> 
> I'll send them once I've written tests.

You were quicker with writing the patches than I could reply that I
already started a patch to reuse preallocated zero clusters, but never
got around to finish it:

http://repo.or.cz/qemu/kevin.git/commitdiff/60b9f813bb258978455944b4ace1ab5d93b5e7d4

I seem to remember that it still had a problem (and probably also the
refcount > 0 bug that Eric mentioned), but maybe you can reuse at least
the test case from there.

Kevin


pgpqOkNNHlp3i.pgp
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-24 Thread 858585 jemmy
On Mon, Apr 24, 2017 at 3:40 PM, 858585 jemmy  wrote:
> On Mon, Apr 17, 2017 at 12:00 PM, 858585 jemmy  wrote:
>> On Mon, Apr 17, 2017 at 11:49 AM, Fam Zheng  wrote:
>>> On Fri, 04/14 14:30, 858585 jemmy wrote:
 Do you know some other format which have very small cluster size?
>>>
>>> 64k is the default cluster size for qcow2 but it can be configured at image
>>> creation time, as 512 bytes, for example:
>>>
>>> $ qemu-img create -f qcow2 test.qcow2 -o cluster_size=512 1G
>>
>> Thanks, i will test the performance again.
>
> I find the performance reduce when cluster size is 512.
> I will optimize the performance and submit a patch later.
> Thanks.

after optimize the code, i find the destination qemu process still have very
bad performance when cluster_size is 512. the reason is cause by
qcow2_check_metadata_overlap.

if cluster_size is 512, the destination qemu process reach 100% cpu usage.
and the perf top result is below:

Samples: 32K of event 'cycles', Event count (approx.): 20105269445
 91.68%  qemu-system-x86_64   [.] qcow2_check_metadata_overlap
  3.33%  qemu-system-x86_64   [.] range_get_last
  2.76%  qemu-system-x86_64   [.] ranges_overlap
  0.61%  qemu-system-x86_64   [.] qcow2_cache_do_get

very large l1_size.
(gdb) p s->l1_size
$3 = 1310720

(gdb) p s->max_refcount_table_index
$5 = 21905

the backtrace:

Breakpoint 1, qcow2_check_metadata_overlap (bs=0x16feb00, ign=0,
offset=440329728, size=4096) at block/qcow2-refcount.c:2344
2344{
(gdb) bt
#0  qcow2_check_metadata_overlap (bs=0x16feb00, ign=0,
offset=440329728, size=4096) at block/qcow2-refcount.c:2344
#1  0x00878d9f in qcow2_pre_write_overlap_check (bs=0x16feb00,
ign=0, offset=440329728, size=4096) at block/qcow2-refcount.c:2473
#2  0x0086e382 in qcow2_co_pwritev (bs=0x16feb00,
offset=771047424, bytes=704512, qiov=0x7fd026bfdb90, flags=0) at
block/qcow2.c:1653
#3  0x008aeace in bdrv_driver_pwritev (bs=0x16feb00,
offset=770703360, bytes=1048576, qiov=0x7fd026bfdb90, flags=0) at
block/io.c:871
#4  0x008b015c in bdrv_aligned_pwritev (child=0x171b630,
req=0x7fd026bfd980, offset=770703360, bytes=1048576, align=1,
qiov=0x7fd026bfdb90, flags=0) at block/io.c:1371
#5  0x008b0d77 in bdrv_co_pwritev (child=0x171b630,
offset=770703360, bytes=1048576, qiov=0x7fd026bfdb90, flags=0) at
block/io.c:1622
#6  0x0089a76d in blk_co_pwritev (blk=0x16fe920,
offset=770703360, bytes=1048576, qiov=0x7fd026bfdb90, flags=0) at
block/block-backend.c:992
#7  0x0089a878 in blk_write_entry (opaque=0x7fd026bfdb70) at
block/block-backend.c:1017
#8  0x0089a95d in blk_prw (blk=0x16fe920, offset=770703360,
buf=0x362b050 "", bytes=1048576, co_entry=0x89a81a ,
flags=0) at block/block-backend.c:1045
#9  0x0089b222 in blk_pwrite (blk=0x16fe920, offset=770703360,
buf=0x362b050, count=1048576, flags=0) at block/block-backend.c:1208
#10 0x007d480d in block_load (f=0x1784fa0, opaque=0xfd46a0,
version_id=1) at migration/block.c:992
#11 0x0049dc58 in vmstate_load (f=0x1784fa0, se=0x16fbdc0,
version_id=1) at /data/qemu/migration/savevm.c:730
#12 0x004a0752 in qemu_loadvm_section_part_end (f=0x1784fa0,
mis=0xfd4160) at /data/qemu/migration/savevm.c:1923
#13 0x004a0842 in qemu_loadvm_state_main (f=0x1784fa0,
mis=0xfd4160) at /data/qemu/migration/savevm.c:1954
#14 0x004a0a33 in qemu_loadvm_state (f=0x1784fa0) at
/data/qemu/migration/savevm.c:2020
#15 0x007c2d33 in process_incoming_migration_co
(opaque=0x1784fa0) at migration/migration.c:404
#16 0x00966593 in coroutine_trampoline (i0=27108400, i1=0) at
util/coroutine-ucontext.c:79
#17 0x7fd03946b8f0 in ?? () from /lib64/libc.so.6
#18 0x7fff869c87e0 in ?? ()
#19 0x in ?? ()

when the cluster_size is too small, the write performance is very bad.
How to solve this problem? Any suggestion?
1. when the cluster_size is too small, not invoke qcow2_pre_write_overlap_check.
2.limit the qcow2 cluster_size range, don't allow set the cluster_size
too small.
which way is better?

>
>>>
>>> Fam



Re: [Qemu-block] [Qemu-devel] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-24 Thread Fam Zheng
On Mon, 04/24 19:54, 858585 jemmy wrote:
> On Mon, Apr 24, 2017 at 3:40 PM, 858585 jemmy  wrote:
> > On Mon, Apr 17, 2017 at 12:00 PM, 858585 jemmy  
> > wrote:
> >> On Mon, Apr 17, 2017 at 11:49 AM, Fam Zheng  wrote:
> >>> On Fri, 04/14 14:30, 858585 jemmy wrote:
>  Do you know some other format which have very small cluster size?
> >>>
> >>> 64k is the default cluster size for qcow2 but it can be configured at 
> >>> image
> >>> creation time, as 512 bytes, for example:
> >>>
> >>> $ qemu-img create -f qcow2 test.qcow2 -o cluster_size=512 1G
> >>
> >> Thanks, i will test the performance again.
> >
> > I find the performance reduce when cluster size is 512.
> > I will optimize the performance and submit a patch later.
> > Thanks.
> 
> after optimize the code, i find the destination qemu process still have very
> bad performance when cluster_size is 512. the reason is cause by
> qcow2_check_metadata_overlap.
> 
> if cluster_size is 512, the destination qemu process reach 100% cpu usage.
> and the perf top result is below:
> 
> Samples: 32K of event 'cycles', Event count (approx.): 20105269445
>  91.68%  qemu-system-x86_64   [.] qcow2_check_metadata_overlap
>   3.33%  qemu-system-x86_64   [.] range_get_last
>   2.76%  qemu-system-x86_64   [.] ranges_overlap
>   0.61%  qemu-system-x86_64   [.] qcow2_cache_do_get
> 
> very large l1_size.
> (gdb) p s->l1_size
> $3 = 1310720
> 
> (gdb) p s->max_refcount_table_index
> $5 = 21905
> 
> the backtrace:
> 
> Breakpoint 1, qcow2_check_metadata_overlap (bs=0x16feb00, ign=0,
> offset=440329728, size=4096) at block/qcow2-refcount.c:2344
> 2344{
> (gdb) bt
> #0  qcow2_check_metadata_overlap (bs=0x16feb00, ign=0,
> offset=440329728, size=4096) at block/qcow2-refcount.c:2344
> #1  0x00878d9f in qcow2_pre_write_overlap_check (bs=0x16feb00,
> ign=0, offset=440329728, size=4096) at block/qcow2-refcount.c:2473
> #2  0x0086e382 in qcow2_co_pwritev (bs=0x16feb00,
> offset=771047424, bytes=704512, qiov=0x7fd026bfdb90, flags=0) at
> block/qcow2.c:1653
> #3  0x008aeace in bdrv_driver_pwritev (bs=0x16feb00,
> offset=770703360, bytes=1048576, qiov=0x7fd026bfdb90, flags=0) at
> block/io.c:871
> #4  0x008b015c in bdrv_aligned_pwritev (child=0x171b630,
> req=0x7fd026bfd980, offset=770703360, bytes=1048576, align=1,
> qiov=0x7fd026bfdb90, flags=0) at block/io.c:1371
> #5  0x008b0d77 in bdrv_co_pwritev (child=0x171b630,
> offset=770703360, bytes=1048576, qiov=0x7fd026bfdb90, flags=0) at
> block/io.c:1622
> #6  0x0089a76d in blk_co_pwritev (blk=0x16fe920,
> offset=770703360, bytes=1048576, qiov=0x7fd026bfdb90, flags=0) at
> block/block-backend.c:992
> #7  0x0089a878 in blk_write_entry (opaque=0x7fd026bfdb70) at
> block/block-backend.c:1017
> #8  0x0089a95d in blk_prw (blk=0x16fe920, offset=770703360,
> buf=0x362b050 "", bytes=1048576, co_entry=0x89a81a ,
> flags=0) at block/block-backend.c:1045
> #9  0x0089b222 in blk_pwrite (blk=0x16fe920, offset=770703360,
> buf=0x362b050, count=1048576, flags=0) at block/block-backend.c:1208
> #10 0x007d480d in block_load (f=0x1784fa0, opaque=0xfd46a0,
> version_id=1) at migration/block.c:992
> #11 0x0049dc58 in vmstate_load (f=0x1784fa0, se=0x16fbdc0,
> version_id=1) at /data/qemu/migration/savevm.c:730
> #12 0x004a0752 in qemu_loadvm_section_part_end (f=0x1784fa0,
> mis=0xfd4160) at /data/qemu/migration/savevm.c:1923
> #13 0x004a0842 in qemu_loadvm_state_main (f=0x1784fa0,
> mis=0xfd4160) at /data/qemu/migration/savevm.c:1954
> #14 0x004a0a33 in qemu_loadvm_state (f=0x1784fa0) at
> /data/qemu/migration/savevm.c:2020
> #15 0x007c2d33 in process_incoming_migration_co
> (opaque=0x1784fa0) at migration/migration.c:404
> #16 0x00966593 in coroutine_trampoline (i0=27108400, i1=0) at
> util/coroutine-ucontext.c:79
> #17 0x7fd03946b8f0 in ?? () from /lib64/libc.so.6
> #18 0x7fff869c87e0 in ?? ()
> #19 0x in ?? ()
> 
> when the cluster_size is too small, the write performance is very bad.
> How to solve this problem? Any suggestion?
> 1. when the cluster_size is too small, not invoke 
> qcow2_pre_write_overlap_check.
> 2.limit the qcow2 cluster_size range, don't allow set the cluster_size
> too small.
> which way is better?

It's a separate problem.

I think what should be done in this patch (or a follow up) is coalescing the
same type of write as much as possible (by type I mean "zeroed" or "normal"
write).  With that, cluster size won't matter that much.

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-24 Thread 858585 jemmy
On Mon, Apr 24, 2017 at 8:09 PM, Fam Zheng  wrote:
> On Mon, 04/24 19:54, 858585 jemmy wrote:
>> On Mon, Apr 24, 2017 at 3:40 PM, 858585 jemmy  wrote:
>> > On Mon, Apr 17, 2017 at 12:00 PM, 858585 jemmy  
>> > wrote:
>> >> On Mon, Apr 17, 2017 at 11:49 AM, Fam Zheng  wrote:
>> >>> On Fri, 04/14 14:30, 858585 jemmy wrote:
>>  Do you know some other format which have very small cluster size?
>> >>>
>> >>> 64k is the default cluster size for qcow2 but it can be configured at 
>> >>> image
>> >>> creation time, as 512 bytes, for example:
>> >>>
>> >>> $ qemu-img create -f qcow2 test.qcow2 -o cluster_size=512 1G
>> >>
>> >> Thanks, i will test the performance again.
>> >
>> > I find the performance reduce when cluster size is 512.
>> > I will optimize the performance and submit a patch later.
>> > Thanks.
>>
>> after optimize the code, i find the destination qemu process still have very
>> bad performance when cluster_size is 512. the reason is cause by
>> qcow2_check_metadata_overlap.
>>
>> if cluster_size is 512, the destination qemu process reach 100% cpu usage.
>> and the perf top result is below:
>>
>> Samples: 32K of event 'cycles', Event count (approx.): 20105269445
>>  91.68%  qemu-system-x86_64   [.] qcow2_check_metadata_overlap
>>   3.33%  qemu-system-x86_64   [.] range_get_last
>>   2.76%  qemu-system-x86_64   [.] ranges_overlap
>>   0.61%  qemu-system-x86_64   [.] qcow2_cache_do_get
>>
>> very large l1_size.
>> (gdb) p s->l1_size
>> $3 = 1310720
>>
>> (gdb) p s->max_refcount_table_index
>> $5 = 21905
>>
>> the backtrace:
>>
>> Breakpoint 1, qcow2_check_metadata_overlap (bs=0x16feb00, ign=0,
>> offset=440329728, size=4096) at block/qcow2-refcount.c:2344
>> 2344{
>> (gdb) bt
>> #0  qcow2_check_metadata_overlap (bs=0x16feb00, ign=0,
>> offset=440329728, size=4096) at block/qcow2-refcount.c:2344
>> #1  0x00878d9f in qcow2_pre_write_overlap_check (bs=0x16feb00,
>> ign=0, offset=440329728, size=4096) at block/qcow2-refcount.c:2473
>> #2  0x0086e382 in qcow2_co_pwritev (bs=0x16feb00,
>> offset=771047424, bytes=704512, qiov=0x7fd026bfdb90, flags=0) at
>> block/qcow2.c:1653
>> #3  0x008aeace in bdrv_driver_pwritev (bs=0x16feb00,
>> offset=770703360, bytes=1048576, qiov=0x7fd026bfdb90, flags=0) at
>> block/io.c:871
>> #4  0x008b015c in bdrv_aligned_pwritev (child=0x171b630,
>> req=0x7fd026bfd980, offset=770703360, bytes=1048576, align=1,
>> qiov=0x7fd026bfdb90, flags=0) at block/io.c:1371
>> #5  0x008b0d77 in bdrv_co_pwritev (child=0x171b630,
>> offset=770703360, bytes=1048576, qiov=0x7fd026bfdb90, flags=0) at
>> block/io.c:1622
>> #6  0x0089a76d in blk_co_pwritev (blk=0x16fe920,
>> offset=770703360, bytes=1048576, qiov=0x7fd026bfdb90, flags=0) at
>> block/block-backend.c:992
>> #7  0x0089a878 in blk_write_entry (opaque=0x7fd026bfdb70) at
>> block/block-backend.c:1017
>> #8  0x0089a95d in blk_prw (blk=0x16fe920, offset=770703360,
>> buf=0x362b050 "", bytes=1048576, co_entry=0x89a81a ,
>> flags=0) at block/block-backend.c:1045
>> #9  0x0089b222 in blk_pwrite (blk=0x16fe920, offset=770703360,
>> buf=0x362b050, count=1048576, flags=0) at block/block-backend.c:1208
>> #10 0x007d480d in block_load (f=0x1784fa0, opaque=0xfd46a0,
>> version_id=1) at migration/block.c:992
>> #11 0x0049dc58 in vmstate_load (f=0x1784fa0, se=0x16fbdc0,
>> version_id=1) at /data/qemu/migration/savevm.c:730
>> #12 0x004a0752 in qemu_loadvm_section_part_end (f=0x1784fa0,
>> mis=0xfd4160) at /data/qemu/migration/savevm.c:1923
>> #13 0x004a0842 in qemu_loadvm_state_main (f=0x1784fa0,
>> mis=0xfd4160) at /data/qemu/migration/savevm.c:1954
>> #14 0x004a0a33 in qemu_loadvm_state (f=0x1784fa0) at
>> /data/qemu/migration/savevm.c:2020
>> #15 0x007c2d33 in process_incoming_migration_co
>> (opaque=0x1784fa0) at migration/migration.c:404
>> #16 0x00966593 in coroutine_trampoline (i0=27108400, i1=0) at
>> util/coroutine-ucontext.c:79
>> #17 0x7fd03946b8f0 in ?? () from /lib64/libc.so.6
>> #18 0x7fff869c87e0 in ?? ()
>> #19 0x in ?? ()
>>
>> when the cluster_size is too small, the write performance is very bad.
>> How to solve this problem? Any suggestion?
>> 1. when the cluster_size is too small, not invoke 
>> qcow2_pre_write_overlap_check.
>> 2.limit the qcow2 cluster_size range, don't allow set the cluster_size
>> too small.
>> which way is better?
>
> It's a separate problem.
>
> I think what should be done in this patch (or a follow up) is coalescing the
> same type of write as much as possible (by type I mean "zeroed" or "normal"
> write).  With that, cluster size won't matter that much.
yes, i have already optimize the code this way. i will send the patch later.
but the performance is still bad. It's a separate problem.
qcow2_check_metadata_overlap use a lot of cpu usage.

after i optimize the code, the blk_pwrite already coalescing the same type.
you can see in the 

Re: [Qemu-block] [Qemu-devel] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-24 Thread Fam Zheng
On Mon, 04/24 20:09, Fam Zheng wrote:
> It's a separate problem.

To be specific:

1) there is an option "overlap-check" that one can use to
disable the costly metadata check;

2) qcow2 with cluster_size = 512 is probably too uncommon to be optimized for.

Both are irrelevant to why and how this patch can be improved, IMO.

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-24 Thread 858585 jemmy
On Mon, Apr 24, 2017 at 8:19 PM, Fam Zheng  wrote:
> On Mon, 04/24 20:09, Fam Zheng wrote:
>> It's a separate problem.
>
> To be specific:
>
> 1) there is an option "overlap-check" that one can use to
> disable the costly metadata check;
yes, i will disable metadata check, and test the performance again.

>
> 2) qcow2 with cluster_size = 512 is probably too uncommon to be optimized for.
if culster_size is very small, should disable metadata check default?

>
> Both are irrelevant to why and how this patch can be improved, IMO.
>
> Fam



Re: [Qemu-block] [Qemu-devel] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-24 Thread Fam Zheng
On Mon, 04/24 20:26, 858585 jemmy wrote:
> > 2) qcow2 with cluster_size = 512 is probably too uncommon to be optimized 
> > for.
> if culster_size is very small, should disable metadata check default?
> 

No, I don't think it's worth the inconsistent behavior. People who want
performance shouldn't use 512 bytes anyway.

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-24 Thread 858585 jemmy
On Mon, Apr 24, 2017 at 8:36 PM, Fam Zheng  wrote:
> On Mon, 04/24 20:26, 858585 jemmy wrote:
>> > 2) qcow2 with cluster_size = 512 is probably too uncommon to be optimized 
>> > for.
>> if culster_size is very small, should disable metadata check default?
>>
>
> No, I don't think it's worth the inconsistent behavior. People who want
> performance shouldn't use 512 bytes anyway.
ok,thanks.

>
> Fam



Re: [Qemu-block] proposed qcow2 extension: cluster reservations [was: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-24 Thread Alberto Garcia
On Sat 22 Apr 2017 07:56:57 PM CEST, Max Reitz wrote:
>> So, if you got this far in reading, the question becomes whether
>> having a mode where you can mark a cluster as
>> mapping-reserved-but-unallocated has enough use case to be worth
>> pursuing, knowing that it will burn an incompatible feature bit; or
>> if it should be rolled into the subcluster proposal, or whether it's
>> not a feature that anyone needs after all.
>
> I just forgot that just saying !ALLOCATED will be enough, regardless
> of the ZERO flag... Yeah, subclusters will give us this for free, and
> I think it's therefore reasonable to just require them if you want
> this feature (preallocation with a backing file).

I agree with Max here.

Berto



Re: [Qemu-block] [PATCH v2] qemu-img: use blk_co_pwrite_zeroes for zero sectors when compressed

2017-04-24 Thread 858585 jemmy
Hi Everyone:
Any suggestion about this patch?
Thanks.

On Sun, Apr 23, 2017 at 5:53 PM, 858585 jemmy  wrote:
> I test four test case for this patch.
> 1.qcow2 image with lots of zero cluster, and convert with compress
> 2.qcow2 image with lots of zero cluster, and convert with no compress
> 3.qcow2 image with lots of non-zero clusters, and convert with compress
> 4.qcow2 image with lots of non-zero clusters, and convert with no compress
> all test case pass.
>
> the qcow2 image with lots of zero clusters, the time reduce obviously.
> and have no bad effort in other cases.
>
>
> On Fri, Apr 21, 2017 at 5:57 PM,   wrote:
>> From: Lidong Chen 
>>
>> when the buffer is zero, blk_co_pwrite_zeroes is more effectively than
>> blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED. this patch can reduces
>> the time when converts the qcow2 image with lots of zero.
>>
>> Signed-off-by: Lidong Chen 
>> ---
>> v2 changelog:
>> unify the compressed and non-compressed code paths
>> ---
>>  qemu-img.c | 41 +++--
>>  1 file changed, 11 insertions(+), 30 deletions(-)
>>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index b220cf7..60c9adf 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -1661,6 +1661,8 @@ static int coroutine_fn 
>> convert_co_write(ImgConvertState *s, int64_t sector_num,
>>
>>  while (nb_sectors > 0) {
>>  int n = nb_sectors;
>> +BdrvRequestFlags flags = s->compressed ? BDRV_REQ_WRITE_COMPRESSED 
>> : 0;
>> +
>>  switch (status) {
>>  case BLK_BACKING_FILE:
>>  /* If we have a backing file, leave clusters unallocated that 
>> are
>> @@ -1670,43 +1672,21 @@ static int coroutine_fn 
>> convert_co_write(ImgConvertState *s, int64_t sector_num,
>>  break;
>>
>>  case BLK_DATA:
>> -/* We must always write compressed clusters as a whole, so don't
>> - * try to find zeroed parts in the buffer. We can only save the
>> - * write if the buffer is completely zeroed and we're allowed to
>> - * keep the target sparse. */
>> -if (s->compressed) {
>> -if (s->has_zero_init && s->min_sparse &&
>> -buffer_is_zero(buf, n * BDRV_SECTOR_SIZE))
>> -{
>> -assert(!s->target_has_backing);
>> -break;
>> -}
>> -
>> -iov.iov_base = buf;
>> -iov.iov_len = n << BDRV_SECTOR_BITS;
>> -qemu_iovec_init_external(&qiov, &iov, 1);
>> -
>> -ret = blk_co_pwritev(s->target, sector_num << 
>> BDRV_SECTOR_BITS,
>> - n << BDRV_SECTOR_BITS, &qiov,
>> - BDRV_REQ_WRITE_COMPRESSED);
>> -if (ret < 0) {
>> -return ret;
>> -}
>> -break;
>> -}
>> -
>> -/* If there is real non-zero data or we're told to keep the 
>> target
>> - * fully allocated (-S 0), we must write it. Otherwise we can 
>> treat
>> +/* If we're told to keep the target fully allocated (-S 0) or 
>> there
>> + * is real non-zero data, we must write it. Otherwise we can 
>> treat
>>   * it as zero sectors. */
>>  if (!s->min_sparse ||
>> -is_allocated_sectors_min(buf, n, &n, s->min_sparse))
>> -{
>> +(!s->compressed &&
>> + is_allocated_sectors_min(buf, n, &n, s->min_sparse)) ||
>> +(s->compressed &&
>> + !buffer_is_zero(buf, n * BDRV_SECTOR_SIZE))) {
>> +
>>  iov.iov_base = buf;
>>  iov.iov_len = n << BDRV_SECTOR_BITS;
>>  qemu_iovec_init_external(&qiov, &iov, 1);
>>
>>  ret = blk_co_pwritev(s->target, sector_num << 
>> BDRV_SECTOR_BITS,
>> - n << BDRV_SECTOR_BITS, &qiov, 0);
>> + n << BDRV_SECTOR_BITS, &qiov, flags);
>>  if (ret < 0) {
>>  return ret;
>>  }
>> @@ -1716,6 +1696,7 @@ static int coroutine_fn 
>> convert_co_write(ImgConvertState *s, int64_t sector_num,
>>
>>  case BLK_ZERO:
>>  if (s->has_zero_init) {
>> +assert(!s->target_has_backing);
>>  break;
>>  }
>>  ret = blk_co_pwrite_zeroes(s->target,
>> --
>> 1.8.3.1
>>



Re: [Qemu-block] [PATCH v14 19/20] file-posix: Add image locking in perm operations

2017-04-24 Thread Kevin Wolf
Am 21.04.2017 um 05:56 hat Fam Zheng geschrieben:
> virtlockd in libvirt locks the first byte, so we start looking at the
> file bytes from 0x10.
> 
> The complication is in the transactional interface.  To make the reopen
> logic managable, and allow better reuse, the code is internally
> organized with a table from old mode to the new one.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/file-posix.c | 739 
> -
>  1 file changed, 736 insertions(+), 3 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 2114720..c307493 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -129,8 +129,54 @@ do { \
>  
>  #define MAX_BLOCKSIZE4096
>  
> +/* Posix file locking bytes. Libvirt takes byte 0, we start from byte 0x10,
> + * leaving a few more bytes for its future use. */
> +#define RAW_LOCK_BYTE_MIN 0x10
> +#define RAW_LOCK_BYTE_NO_OTHER_WRITER 0x10
> +#define RAW_LOCK_BYTE_WRITE   0x11
> +#ifdef F_OFD_SETLK
> +#define RAW_LOCK_SUPPORTED 1
> +#else
> +#define RAW_LOCK_SUPPORTED 0
> +#endif
> +
> +/*
> + ** reader that can tolerate writers: Don't do anything
> + *
> + ** reader that can't tolerate writers: Take shared lock on byte 0x10. Test
> + *  byte 0x11 is unlocked.
> + *
> + ** shared writer: Take shared lock on byte 0x11. Test byte 0x10 is unlocked.
> + *
> + ** exclusive writer: Take exclusive locks on both bytes.
> + */

There are few things that made this patch basically degenerate into a
ton of special cases, the most important of them being that we thought
that we'd need acquire exclusive locks to avoid races and that read-only
file descriptors can't take an exclusive lock and therefore can't test
whether anyone else is holding a shared lock on the file descriptor.

After looking a bit more into this and discussing it on IRC with Fam who
found the last missing puzzle piece, it turned out that while it's true
that F_OFD_SETLK for an exclusive lock doesn't work for read-only FDs,
in fact F_OFD_GETLK (which tests if a lock could be set) does work, so
we can use it to check whether anyone else is holding a shared lock.

We also never need to actually acquire any exclusive locks, just
acquiring shared locks everywhere and testing whether we could set
exclusive locks is enough for our needs. Both of these operations work
fine with read-only FDs.

With these observations, it should be easy to even map _all_ 64 bits of
perm and ~shared to file locks. I think the algorithm looks roughly like
this:

1. Acquire shared @perm locks for (old_perm | new_perm)
2. Acquire shared @~shared locks for (~old_shared | ~new_shared)
3. Test that nobody else holds @perm locks for ~new_shared
4. Test that nobody else holds @~shared locks for new_perm
5. Drop @perm locks for (old_perm & ~new_perm)
6. Drop @~shared locks for (~old_shared & new_shared)

Having one universal algorithm for all state transitions allows us to
greatly simplify the code in this patch.

Kevin



[Qemu-block] [PATCH] migration/block: optimize the performance by coalescing the same write type

2017-04-24 Thread jemmy858585
From: Lidong Chen 

This patch optimizes the performance by coalescing the same write type.
When the zero/non-zero state changes, perform the write for the accumulated
cluster count.

Signed-off-by: Lidong Chen 
---
Thanks Fam Zheng and Stefan's advice.
---
 migration/block.c | 66 +--
 1 file changed, 49 insertions(+), 17 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index 060087f..e9c5e21 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -40,6 +40,8 @@
 
 #define MAX_INFLIGHT_IO 512
 
+#define MIN_CLUSTER_SIZE 8192
+
 //#define DEBUG_BLK_MIGRATION
 
 #ifdef DEBUG_BLK_MIGRATION
@@ -923,10 +925,11 @@ static int block_load(QEMUFile *f, void *opaque, int 
version_id)
 }
 
 ret = bdrv_get_info(blk_bs(blk), &bdi);
-if (ret == 0 && bdi.cluster_size > 0 &&
-bdi.cluster_size <= BLOCK_SIZE &&
-BLOCK_SIZE % bdi.cluster_size == 0) {
+if (ret == 0 && bdi.cluster_size > 0) {
 cluster_size = bdi.cluster_size;
+while (cluster_size < MIN_CLUSTER_SIZE) {
+cluster_size *= 2;
+}
 } else {
 cluster_size = BLOCK_SIZE;
 }
@@ -943,29 +946,58 @@ static int block_load(QEMUFile *f, void *opaque, int 
version_id)
 nr_sectors * BDRV_SECTOR_SIZE,
 BDRV_REQ_MAY_UNMAP);
 } else {
-int i;
-int64_t cur_addr;
-uint8_t *cur_buf;
+int64_t cur_addr = addr * BDRV_SECTOR_SIZE;
+uint8_t *cur_buf = NULL;
+int64_t last_addr = addr * BDRV_SECTOR_SIZE;
+uint8_t *last_buf = NULL;
+int64_t end_addr = addr * BDRV_SECTOR_SIZE + BLOCK_SIZE;
 
 buf = g_malloc(BLOCK_SIZE);
 qemu_get_buffer(f, buf, BLOCK_SIZE);
-for (i = 0; i < BLOCK_SIZE / cluster_size; i++) {
-cur_addr = addr * BDRV_SECTOR_SIZE + i * cluster_size;
-cur_buf = buf + i * cluster_size;
-
-if ((!block_mig_state.zero_blocks ||
-cluster_size < BLOCK_SIZE) &&
-buffer_is_zero(cur_buf, cluster_size)) {
-ret = blk_pwrite_zeroes(blk, cur_addr,
-cluster_size,
+cur_buf = buf;
+last_buf = buf;
+
+while (last_addr < end_addr) {
+int is_zero = 0;
+int buf_size = MIN(end_addr - cur_addr, cluster_size);
+
+/* If the "zero blocks" migration capability is enabled
+ * and the buf_size == BLOCK_SIZE, then the source QEMU
+ * process has already scanned for zeroes. CPU is wasted
+ * scanning for zeroes in the destination QEMU process.
+ */
+if (block_mig_state.zero_blocks &&
+buf_size == BLOCK_SIZE) {
+is_zero = 0;
+} else {
+is_zero = buffer_is_zero(cur_buf, buf_size);
+}
+
+cur_addr += buf_size;
+cur_buf += buf_size;
+while (cur_addr < end_addr) {
+buf_size = MIN(end_addr - cur_addr, cluster_size);
+if (is_zero != buffer_is_zero(cur_buf, buf_size)) {
+break;
+}
+cur_addr += buf_size;
+cur_buf += buf_size;
+}
+
+if (is_zero) {
+ret = blk_pwrite_zeroes(blk, last_addr,
+cur_addr - last_addr,
 BDRV_REQ_MAY_UNMAP);
 } else {
-ret = blk_pwrite(blk, cur_addr, cur_buf,
- cluster_size, 0);
+ret = blk_pwrite(blk, last_addr, last_buf,
+ cur_addr - last_addr, 0);
 }
 if (ret < 0) {
 break;
 }
+
+last_addr = cur_addr;
+last_buf = cur_buf;
 }
 g_free(buf);
 }
-- 
1.8.3.1




Re: [Qemu-block] [PATCH] migration/block: optimize the performance by coalescing the same write type

2017-04-24 Thread 858585 jemmy
the reason of MIN_CLUSTER_SIZE is 8192 is base on the performance
test result. the performance is only reduce obviously when cluster_size is
less than 8192.

I write this code, run in guest os. to create the worst condition.

#include 
#include 
#include 

int main()
{
char *zero;
char *nonzero;
FILE* fp = fopen("./test.dat", "ab");

zero = malloc(sizeof(char)*512*8);
nonzero = malloc(sizeof(char)*512*8);

memset(zero, 0, sizeof(char)*512*8);
memset(nonzero, 1, sizeof(char)*512*8);

while (1) {
fwrite(zero, sizeof(char)*512*8, 1, fp);
fwrite(nonzero, sizeof(char)*512*8, 1, fp);
}
fclose(fp);
}


On Mon, Apr 24, 2017 at 9:55 PM,   wrote:
> From: Lidong Chen 
>
> This patch optimizes the performance by coalescing the same write type.
> When the zero/non-zero state changes, perform the write for the accumulated
> cluster count.
>
> Signed-off-by: Lidong Chen 
> ---
> Thanks Fam Zheng and Stefan's advice.
> ---
>  migration/block.c | 66 
> +--
>  1 file changed, 49 insertions(+), 17 deletions(-)
>
> diff --git a/migration/block.c b/migration/block.c
> index 060087f..e9c5e21 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -40,6 +40,8 @@
>
>  #define MAX_INFLIGHT_IO 512
>
> +#define MIN_CLUSTER_SIZE 8192
> +
>  //#define DEBUG_BLK_MIGRATION
>
>  #ifdef DEBUG_BLK_MIGRATION
> @@ -923,10 +925,11 @@ static int block_load(QEMUFile *f, void *opaque, int 
> version_id)
>  }
>
>  ret = bdrv_get_info(blk_bs(blk), &bdi);
> -if (ret == 0 && bdi.cluster_size > 0 &&
> -bdi.cluster_size <= BLOCK_SIZE &&
> -BLOCK_SIZE % bdi.cluster_size == 0) {
> +if (ret == 0 && bdi.cluster_size > 0) {
>  cluster_size = bdi.cluster_size;
> +while (cluster_size < MIN_CLUSTER_SIZE) {
> +cluster_size *= 2;
> +}
>  } else {
>  cluster_size = BLOCK_SIZE;
>  }
> @@ -943,29 +946,58 @@ static int block_load(QEMUFile *f, void *opaque, int 
> version_id)
>  nr_sectors * BDRV_SECTOR_SIZE,
>  BDRV_REQ_MAY_UNMAP);
>  } else {
> -int i;
> -int64_t cur_addr;
> -uint8_t *cur_buf;
> +int64_t cur_addr = addr * BDRV_SECTOR_SIZE;
> +uint8_t *cur_buf = NULL;
> +int64_t last_addr = addr * BDRV_SECTOR_SIZE;
> +uint8_t *last_buf = NULL;
> +int64_t end_addr = addr * BDRV_SECTOR_SIZE + BLOCK_SIZE;
>
>  buf = g_malloc(BLOCK_SIZE);
>  qemu_get_buffer(f, buf, BLOCK_SIZE);
> -for (i = 0; i < BLOCK_SIZE / cluster_size; i++) {
> -cur_addr = addr * BDRV_SECTOR_SIZE + i * cluster_size;
> -cur_buf = buf + i * cluster_size;
> -
> -if ((!block_mig_state.zero_blocks ||
> -cluster_size < BLOCK_SIZE) &&
> -buffer_is_zero(cur_buf, cluster_size)) {
> -ret = blk_pwrite_zeroes(blk, cur_addr,
> -cluster_size,
> +cur_buf = buf;
> +last_buf = buf;
> +
> +while (last_addr < end_addr) {
> +int is_zero = 0;
> +int buf_size = MIN(end_addr - cur_addr, cluster_size);
> +
> +/* If the "zero blocks" migration capability is enabled
> + * and the buf_size == BLOCK_SIZE, then the source QEMU
> + * process has already scanned for zeroes. CPU is wasted
> + * scanning for zeroes in the destination QEMU process.
> + */
> +if (block_mig_state.zero_blocks &&
> +buf_size == BLOCK_SIZE) {
> +is_zero = 0;
> +} else {
> +is_zero = buffer_is_zero(cur_buf, buf_size);
> +}
> +
> +cur_addr += buf_size;
> +cur_buf += buf_size;
> +while (cur_addr < end_addr) {
> +buf_size = MIN(end_addr - cur_addr, cluster_size);
> +if (is_zero != buffer_is_zero(cur_buf, buf_size)) {
> +break;
> +}
> +cur_addr += buf_size;
> +cur_buf += buf_size;
> +}
> +
> +if (is_zero) {
> +ret = blk_pwrite_zeroes(blk, last_addr,
> +cur_addr - last_addr,
>  BDRV_REQ_MAY_U

Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] qemu-img: fix some spelling errors

2017-04-24 Thread Eric Blake
On 04/23/2017 09:33 AM, jemmy858...@gmail.com wrote:
> From: Lidong Chen 
> 
> Fix some spelling errors in is_allocated_sectors comment.
> 
> Signed-off-by: Lidong Chen 
> ---
>  qemu-img.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index df6d165..0b3349c 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1033,8 +1033,8 @@ done:
>  }
>  
>  /*
> - * Returns true iff the first sector pointed to by 'buf' contains at least
> - * a non-NUL byte.
> + * Returns true if the first sector pointed to by 'buf' contains at least
> + * a non-NULL byte.

NACK to both changes.  'iff' is an English word that is shorthand for
"if and only if".  "NUL" means the one-byte character, while "NULL"
means the 8-byte (or 4-byte, on 32-bit platform) pointer value.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 1/2] qemu-img: make sure contain the consecutive number of zero bytes

2017-04-24 Thread Eric Blake
On 04/23/2017 09:33 AM, jemmy858...@gmail.com wrote:
> From: Lidong Chen 
> 
> is_allocated_sectors_min don't guarantee to contain the
> consecutive number of zero bytes. this patch fixes this bug.

This message was sent without an 'In-Reply-To' header pointing to a 0/2
cover letter.  When sending a series, please always thread things to a
cover letter; you may find 'git config format.coverletter auto' to be
helpful.

> 
> Signed-off-by: Lidong Chen 
> ---
>  qemu-img.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index b220cf7..df6d165 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1060,9 +1060,9 @@ static int is_allocated_sectors(const uint8_t *buf, int 
> n, int *pnum)
>  }
>  
>  /*
> - * Like is_allocated_sectors, but if the buffer starts with a used sector,
> - * up to 'min' consecutive sectors containing zeros are ignored. This avoids
> - * breaking up write requests for only small sparse areas.
> + * Like is_allocated_sectors, but up to 'min' consecutive sectors
> + * containing zeros are ignored. This avoids breaking up write requests
> + * for only small sparse areas.
>   */
>  static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
>  int min)
> @@ -1071,11 +1071,12 @@ static int is_allocated_sectors_min(const uint8_t 
> *buf, int n, int *pnum,
>  int num_checked, num_used;
>  
>  if (n < min) {
> -min = n;
> +*pnum = n;
> +return 1;
>  }
>  
>  ret = is_allocated_sectors(buf, n, pnum);
> -if (!ret) {
> +if (!ret && *pnum >= min) {

I seem to recall past attempts to try and patch this function, which
were then turned down, although I haven't scrubbed the archives for a
quick URL to point to. I'm worried that there are more subtleties here
than what you realize.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] qemu-img: fix some spelling errors

2017-04-24 Thread Philippe Mathieu-Daudé

Hi Eric,

On 04/24/2017 11:40 AM, Eric Blake wrote:

On 04/23/2017 09:33 AM, jemmy858...@gmail.com wrote:

From: Lidong Chen 

Fix some spelling errors in is_allocated_sectors comment.

Signed-off-by: Lidong Chen 
---
 qemu-img.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index df6d165..0b3349c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1033,8 +1033,8 @@ done:
 }

 /*
- * Returns true iff the first sector pointed to by 'buf' contains at least
- * a non-NUL byte.
+ * Returns true if the first sector pointed to by 'buf' contains at least
+ * a non-NULL byte.


NACK to both changes.  'iff' is an English word that is shorthand for
"if and only if".  "NUL" means the one-byte character, while "NULL"
means the 8-byte (or 4-byte, on 32-bit platform) pointer value.


I agree with Lidong shorthands are not obvious from non-native speaker.

What about this?

 * Returns true if (and only if) the first sector pointed to by 'buf' 
contains

 * at least a non-null character.



Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] qemu-img: fix some spelling errors

2017-04-24 Thread Eric Blake
On 04/24/2017 10:37 AM, Philippe Mathieu-Daudé wrote:

>>>  /*
>>> - * Returns true iff the first sector pointed to by 'buf' contains at
>>> least
>>> - * a non-NUL byte.
>>> + * Returns true if the first sector pointed to by 'buf' contains at
>>> least
>>> + * a non-NULL byte.
>>
>> NACK to both changes.  'iff' is an English word that is shorthand for
>> "if and only if".  "NUL" means the one-byte character, while "NULL"
>> means the 8-byte (or 4-byte, on 32-bit platform) pointer value.
> 
> I agree with Lidong shorthands are not obvious from non-native speaker.
> 
> What about this?
> 
>  * Returns true if (and only if) the first sector pointed to by 'buf'
> contains

That might be okay.

>  * at least a non-null character.

But that still doesn't make sense.  The character name is NUL, and
non-NULL refers to something that is a pointer, not a character.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] qemu-img: fix some spelling errors

2017-04-24 Thread Eric Blake
On 04/24/2017 10:47 AM, Eric Blake wrote:
> On 04/24/2017 10:37 AM, Philippe Mathieu-Daudé wrote:
> 
  /*
 - * Returns true iff the first sector pointed to by 'buf' contains at
 least
 - * a non-NUL byte.
 + * Returns true if the first sector pointed to by 'buf' contains at
 least
 + * a non-NULL byte.
>>>
>>> NACK to both changes.  'iff' is an English word that is shorthand for
>>> "if and only if".  "NUL" means the one-byte character, while "NULL"
>>> means the 8-byte (or 4-byte, on 32-bit platform) pointer value.
>>
>> I agree with Lidong shorthands are not obvious from non-native speaker.
>>
>> What about this?
>>
>>  * Returns true if (and only if) the first sector pointed to by 'buf'
>> contains
> 
> That might be okay.
> 
>>  * at least a non-null character.
> 
> But that still doesn't make sense.  The character name is NUL, and
> non-NULL refers to something that is a pointer, not a character.

What's more, the NUL character can actually occupy more than one byte
(think UTF-16, where it is the two-byte 0 value).  Referring to NUL byte
rather than NUL character (or even the 'zero byte') makes it obvious
that this function is NOT encoding-sensitive, and doesn't start
mis-behaving just because the data picks a multi-byte character encoding.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH V2] qemu-img: simplify img_convert

2017-04-24 Thread Kevin Wolf
Am 21.04.2017 um 11:11 hat Peter Lieven geschrieben:
> img_convert has been around before there was an ImgConvertState or
> a block backend, but it has never been modified to directly use
> these structs. Change this by parsing parameters directly into
> the ImgConvertState and directly use BlockBackend where possible.
> Furthermore variable initialization has been reworked and sorted.
> 
> Signed-off-by: Peter Lieven 

Thanks, updated in the block-next branch.

Kevin



Re: [Qemu-block] [PATCH v2 1/1] qemu-img: wait for convert coroutines to complete

2017-04-24 Thread Anton Nefedov

On 04/21/2017 03:37 PM, Peter Lieven wrote:

Am 21.04.2017 um 14:19 schrieb Anton Nefedov:

On 04/21/2017 01:44 PM, Peter Lieven wrote:

Am 21.04.2017 um 12:04 schrieb Anton Nefedov:

On error path (like i/o error in one of the coroutines), it's required to
  - wait for coroutines completion before cleaning the common structures
  - reenter dependent coroutines so they ever finish

Introduced in 2d9187bc65.

Signed-off-by: Anton Nefedov 
---
[..]




And what if we error out in the read path? Wouldn't be something like this 
easier?


diff --git a/qemu-img.c b/qemu-img.c
index 22f559a..4ff1085 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1903,6 +1903,16 @@ static int convert_do_copy(ImgConvertState *s)
 main_loop_wait(false);
 }

+/* on error path we need to enter all coroutines that are still
+ * running before cleaning up common structures */
+if (s->ret) {
+for (i = 0; i < s->num_coroutines; i++) {
+ if (s->co[i]) {
+ qemu_coroutine_enter(s->co[i]);
+ }
+}
+}
+
 if (s->compressed && !s->ret) {
 /* signal EOF to align */
 ret = blk_pwrite_compressed(s->target, 0, NULL, 0);


Peter



seemed a bit too daring to me to re-enter every coroutine potentially including 
the ones that yielded waiting for I/O completion.
If that's ok - that is for sure easier :)


I think we should enter every coroutine that is still running and have it 
terminate correctly. It was my mistake that I have not
done this in the original patch. Can you check if the above fixes your test 
cases that triggered the bug?



hi, sorry I'm late with the answer

this segfaults in bdrv_close(). Looks like it tries to finish some i/o 
which coroutine we have already entered and terminated?


(gdb) run
Starting program: /vz/anefedov/qemu-build/us/./qemu-img convert -O qcow2 
./harddisk.hdd.c ./harddisk.hdd

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffeac2d700 (LWP 436020)]
[New Thread 0x7fffe4ed6700 (LWP 436021)]
qemu-img: error while reading sector 20480: Input/output error
qemu-img: error while writing sector 19712: Operation now in progress

Program received signal SIGSEGV, Segmentation fault.
aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
454 ctx = atomic_read(&co->ctx);
(gdb) bt
#0  aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
/* [Anton]: thread_pool_co_cb () here */
#1  0x55634629 in thread_pool_completion_bh 
(opaque=0x55cfe020) at /mnt/code/us-qemu/util/thread-pool.c:189
#2  0x55633b31 in aio_bh_call (bh=0x55cfe0f0) at 
/mnt/code/us-qemu/util/async.c:90
#3  aio_bh_poll (ctx=ctx@entry=0x55cee6d0) at 
/mnt/code/us-qemu/util/async.c:118
#4  0x55636f14 in aio_poll (ctx=ctx@entry=0x55cee6d0, 
blocking=) at /mnt/code/us-qemu/util/aio-posix.c:682
#5  0x555c52d4 in bdrv_drain_recurse 
(bs=bs@entry=0x55d22560) at /mnt/code/us-qemu/block/io.c:164
#6  0x555c5aed in bdrv_drained_begin 
(bs=bs@entry=0x55d22560) at /mnt/code/us-qemu/block/io.c:248
#7  0x55581443 in bdrv_close (bs=0x55d22560) at 
/mnt/code/us-qemu/block.c:2909

#8  bdrv_delete (bs=0x55d22560) at /mnt/code/us-qemu/block.c:3100
#9  bdrv_unref (bs=0x55d22560) at /mnt/code/us-qemu/block.c:4087
#10 0x555baf44 in blk_remove_bs (blk=blk@entry=0x55d22380) 
at /mnt/code/us-qemu/block/block-backend.c:552
#11 0x555bb173 in blk_delete (blk=0x55d22380) at 
/mnt/code/us-qemu/block/block-backend.c:238
#12 blk_unref (blk=blk@entry=0x55d22380) at 
/mnt/code/us-qemu/block/block-backend.c:282
#13 0x5557a22c in img_convert (argc=, 
argv=) at /mnt/code/us-qemu/qemu-img.c:2359
#14 0x55574189 in main (argc=5, argv=0x7fffe4a0) at 
/mnt/code/us-qemu/qemu-img.c:4464




Peter



/Anton



Re: [Qemu-block] [PATCH v5 09/18] qcow: convert QCow to use QCryptoBlock for encryption

2017-04-24 Thread Daniel P. Berrange
On Tue, Feb 21, 2017 at 02:19:46PM +0100, Alberto Garcia wrote:
> On Tue 21 Feb 2017 12:55:03 PM CET, Daniel P. Berrange wrote:
> > @@ -175,8 +185,31 @@ static int qcow_open(BlockDriverState *bs, QDict 
> > *options, int flags,
> >  ret = -ENOSYS;
> >  goto fail;
> >  }
> > +if (s->crypt_method_header == QCOW_CRYPT_AES) {
> > +crypto_opts = block_crypto_open_opts_init(
> > +Q_CRYPTO_BLOCK_FORMAT_QCOW, opts, "aes-", &local_err);
> > +if (local_err) {
> > +error_propagate(errp, local_err);
> > +ret = -EINVAL;
> > +goto fail;
> > +}
> >
> > +if (flags & BDRV_O_NO_IO) {
> > +cflags |= QCRYPTO_BLOCK_OPEN_NO_IO;
> > +}
> > +s->crypto = qcrypto_block_open(crypto_opts, NULL, NULL,
> > +   cflags, errp);
> 
> You don't call qcrypto_block_free() if qcow_open() eventually fails.
> Although qcow_close() takes care of that, a failure to open the image
> sets bs->drv = NULL in bdrv_open_common(), preventing qcow_close() from
> being called.

Yes, I'll fix that leak.

> 
> > @@ -260,14 +293,17 @@ static int qcow_open(BlockDriverState *bs, QDict 
> > *options, int flags,
> >  goto fail;
> >  }
> >  
> > +qapi_free_QCryptoBlockOpenOptions(crypto_opts);
> >  qemu_co_mutex_init(&s->lock);
> >  return 0;
> >  
> >   fail:
> > +qemu_opts_del(opts);
> 
> You need to delete opts as well if this function succeeds, don't you?

New version will avoid the use of QemuOpts entirely - using the QDict
APIs instead, but I'll have equiv cleanup todo.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH v5 11/18] qcow2: convert QCow2 to use QCryptoBlock for encryption

2017-04-24 Thread Daniel P. Berrange
On Tue, Feb 21, 2017 at 02:30:10PM +0100, Alberto Garcia wrote:
> On Tue 21 Feb 2017 12:55:05 PM CET, Daniel P. Berrange wrote:
> > +switch (s->crypt_method_header) {
> > +case QCOW_CRYPT_NONE:
> > +break;
> > +
> > +case QCOW_CRYPT_AES:
> > +r->crypto_opts = block_crypto_open_opts_init(
> > +Q_CRYPTO_BLOCK_FORMAT_QCOW, opts, "aes-", errp);
> > +break;
> > +
> > +default:
> > +error_setg(errp, "Unsupported encryption method %d",
> > +   s->crypt_method_header);
> > +break;
> > +}
> > +if (s->crypt_method_header && !r->crypto_opts) {
> > +ret = -EINVAL;
> > +goto fail;
> > +}
> 
> This last condition relies on the assumption that QCOW_CRYPT_NONE == 0.
> 
> I think it's safe to assume that its value is never going to change and
> therefore this isn't too important, but I'm just pointing it out in case
> you want to make it explicit.

Yeah, I'll make it explicit to be kinder to future reviewers :-)

> 
> > @@ -1122,6 +1145,24 @@ static int qcow2_open(BlockDriverState *bs, QDict 
> > *options, int flags,
> >  goto fail;
> >  }
> >  
> > +if (s->crypt_method_header == QCOW_CRYPT_AES) {
> > +unsigned int cflags = 0;
> > +if (flags & BDRV_O_NO_IO) {
> > +cflags |= QCRYPTO_BLOCK_OPEN_NO_IO;
> > +}
> > +/* TODO how do we pass the same crypto opts down to the
> > + * backing file by default, so we don't have to manually
> > + * provide the same key-secret property against the full
> > + * backing chain
> > + */
> > +s->crypto = qcrypto_block_open(s->crypto_opts, NULL, NULL,
> > +   cflags, errp);
> > +if (!s->crypto) {
> > +ret = -EINVAL;
> > +goto fail;
> > +}
> > +}
> 
> Actually this has the same problem that I mentioned for patch 9: if
> qcow2_open() fails then s->crypto is leaked.

Yep, and the crypto_opts actually


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH v5 13/18] qcow2: add support for LUKS encryption format

2017-04-24 Thread Daniel P. Berrange
On Tue, Feb 21, 2017 at 03:13:03PM +0100, Alberto Garcia wrote:
> On Tue 21 Feb 2017 12:55:07 PM CET, Daniel P. Berrange wrote:
> >  static int qcow2_set_up_encryption(BlockDriverState *bs, QemuOpts *opts,
> > -   Error **errp)
> > +   const char *fmtstr, Error **errp)
> >  {
> >  BDRVQcow2State *s = bs->opaque;
> >  QCryptoBlockCreateOptions *cryptoopts = NULL;
> >  QCryptoBlock *crypto = NULL;
> >  int ret = -EINVAL;
> > +int fmt = qcow2_crypt_method_from_format(fmtstr);
> >  
> > -cryptoopts = block_crypto_create_opts_init(
> > -Q_CRYPTO_BLOCK_FORMAT_QCOW, opts, "aes-", errp);
> > +switch (fmt) {
> > +case QCOW_CRYPT_LUKS:
> > +cryptoopts = block_crypto_create_opts_init(
> > +Q_CRYPTO_BLOCK_FORMAT_LUKS, opts, "luks-", errp);
> > +break;
> > +case QCOW_CRYPT_AES:
> > +cryptoopts = block_crypto_create_opts_init(
> > +Q_CRYPTO_BLOCK_FORMAT_QCOW, opts, "aes-", errp);
> > +break;
> > +default:
> > +error_setg(errp, "Unknown encryption format %s", fmtstr);
> > +break;
> > +}
> > +s->crypt_method_header = fmt;
> >  if (!cryptoopts) {
> >  ret = -EINVAL;
> >  goto out;
> >  }
> 
> I don't believe it has any practical effect in this case, but I think
> it's cleaner to set s->crypt_method_header = fmt after checking that
> everything went well. Otherwise an incorrect format string will result
> in s->crypt_method_header = -EINVAL, which doesn't make sense (it's not
> even the correct type: the variable is unsigned).

Yes, makes sense.

> Other than that, the patch looks good to me.
> 
> Reviewed-by: Alberto Garcia 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [Qemu-stable] [PATCH v2 1/1] qemu-img: wait for convert coroutines to complete

2017-04-24 Thread Peter Lieven


> Am 24.04.2017 um 18:27 schrieb Anton Nefedov :
> 
>> On 04/21/2017 03:37 PM, Peter Lieven wrote:
>>> Am 21.04.2017 um 14:19 schrieb Anton Nefedov:
 On 04/21/2017 01:44 PM, Peter Lieven wrote:
> Am 21.04.2017 um 12:04 schrieb Anton Nefedov:
> On error path (like i/o error in one of the coroutines), it's required to
>  - wait for coroutines completion before cleaning the common structures
>  - reenter dependent coroutines so they ever finish
> 
> Introduced in 2d9187bc65.
> 
> Signed-off-by: Anton Nefedov 
> ---
> [..]
> 
 
 
 And what if we error out in the read path? Wouldn't be something like this 
 easier?
 
 
 diff --git a/qemu-img.c b/qemu-img.c
 index 22f559a..4ff1085 100644
 --- a/qemu-img.c
 +++ b/qemu-img.c
 @@ -1903,6 +1903,16 @@ static int convert_do_copy(ImgConvertState *s)
 main_loop_wait(false);
 }
 
 +/* on error path we need to enter all coroutines that are still
 + * running before cleaning up common structures */
 +if (s->ret) {
 +for (i = 0; i < s->num_coroutines; i++) {
 + if (s->co[i]) {
 + qemu_coroutine_enter(s->co[i]);
 + }
 +}
 +}
 +
 if (s->compressed && !s->ret) {
 /* signal EOF to align */
 ret = blk_pwrite_compressed(s->target, 0, NULL, 0);
 
 
 Peter
 
>>> 
>>> seemed a bit too daring to me to re-enter every coroutine potentially 
>>> including the ones that yielded waiting for I/O completion.
>>> If that's ok - that is for sure easier :)
>> 
>> I think we should enter every coroutine that is still running and have it 
>> terminate correctly. It was my mistake that I have not
>> done this in the original patch. Can you check if the above fixes your test 
>> cases that triggered the bug?
>> 
> 
> hi, sorry I'm late with the answer
> 
> this segfaults in bdrv_close(). Looks like it tries to finish some i/o which 
> coroutine we have already entered and terminated?
> 
> (gdb) run
> Starting program: /vz/anefedov/qemu-build/us/./qemu-img convert -O qcow2 
> ./harddisk.hdd.c ./harddisk.hdd
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7fffeac2d700 (LWP 436020)]
> [New Thread 0x7fffe4ed6700 (LWP 436021)]
> qemu-img: error while reading sector 20480: Input/output error
> qemu-img: error while writing sector 19712: Operation now in progress
> 
> Program received signal SIGSEGV, Segmentation fault.
> aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
> 454 ctx = atomic_read(&co->ctx);
> (gdb) bt
> #0  aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
> /* [Anton]: thread_pool_co_cb () here */
> #1  0x55634629 in thread_pool_completion_bh (opaque=0x55cfe020) 
> at /mnt/code/us-qemu/util/thread-pool.c:189
> #2  0x55633b31 in aio_bh_call (bh=0x55cfe0f0) at 
> /mnt/code/us-qemu/util/async.c:90
> #3  aio_bh_poll (ctx=ctx@entry=0x55cee6d0) at 
> /mnt/code/us-qemu/util/async.c:118
> #4  0x55636f14 in aio_poll (ctx=ctx@entry=0x55cee6d0, 
> blocking=) at /mnt/code/us-qemu/util/aio-posix.c:682
> #5  0x555c52d4 in bdrv_drain_recurse (bs=bs@entry=0x55d22560) at 
> /mnt/code/us-qemu/block/io.c:164
> #6  0x555c5aed in bdrv_drained_begin (bs=bs@entry=0x55d22560) at 
> /mnt/code/us-qemu/block/io.c:248
> #7  0x55581443 in bdrv_close (bs=0x55d22560) at 
> /mnt/code/us-qemu/block.c:2909
> #8  bdrv_delete (bs=0x55d22560) at /mnt/code/us-qemu/block.c:3100
> #9  bdrv_unref (bs=0x55d22560) at /mnt/code/us-qemu/block.c:4087
> #10 0x555baf44 in blk_remove_bs (blk=blk@entry=0x55d22380) at 
> /mnt/code/us-qemu/block/block-backend.c:552
> #11 0x555bb173 in blk_delete (blk=0x55d22380) at 
> /mnt/code/us-qemu/block/block-backend.c:238
> #12 blk_unref (blk=blk@entry=0x55d22380) at 
> /mnt/code/us-qemu/block/block-backend.c:282
> #13 0x5557a22c in img_convert (argc=, argv= out>) at /mnt/code/us-qemu/qemu-img.c:2359
> #14 0x55574189 in main (argc=5, argv=0x7fffe4a0) at 
> /mnt/code/us-qemu/qemu-img.c:4464
> 
> 
>> Peter
>> 
> 
> /Anton
> 

it seems that this is a bit tricky, can you share how your test case works?

thanks,
peter



[Qemu-block] [PULL v2 02/12] block/vxhs.c: Add qemu-iotests for new block device type "vxhs"

2017-04-24 Thread Jeff Cody
From: Ashish Mittal 

These changes use a vxhs test server that is a part of the following
repository:
https://github.com/VeritasHyperScale/libqnio.git

Signed-off-by: Ashish Mittal 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Jeff Cody 
Signed-off-by: Jeff Cody 
Message-id: 1491277689-24949-3-git-send-email-ashish.mit...@veritas.com
---
 tests/qemu-iotests/common|  6 ++
 tests/qemu-iotests/common.config | 13 +
 tests/qemu-iotests/common.filter |  1 +
 tests/qemu-iotests/common.rc | 19 +++
 4 files changed, 39 insertions(+)

diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index 4d5650d..9c6f972 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -157,6 +157,7 @@ check options
 -sshtest ssh
 -nfstest nfs
 -luks   test luks
+-vxhs   test vxhs
 -xdiff  graphical mode diff
 -nocacheuse O_DIRECT on backing file
 -misalign   misalign memory allocations
@@ -260,6 +261,11 @@ testlist options
 xpand=false
 ;;
 
+-vxhs)
+IMGPROTO=vxhs
+xpand=false
+;;
+
 -ssh)
 IMGPROTO=ssh
 xpand=false
diff --git a/tests/qemu-iotests/common.config b/tests/qemu-iotests/common.config
index 55527aa..c4b51b3 100644
--- a/tests/qemu-iotests/common.config
+++ b/tests/qemu-iotests/common.config
@@ -105,6 +105,10 @@ if [ -z "$QEMU_NBD_PROG" ]; then
 export QEMU_NBD_PROG="`set_prog_path qemu-nbd`"
 fi
 
+if [ -z "$QEMU_VXHS_PROG" ]; then
+export QEMU_VXHS_PROG="`set_prog_path qnio_server`"
+fi
+
 _qemu_wrapper()
 {
 (
@@ -156,10 +160,19 @@ _qemu_nbd_wrapper()
 )
 }
 
+_qemu_vxhs_wrapper()
+{
+(
+echo $BASHPID > "${TEST_DIR}/qemu-vxhs.pid"
+exec "$QEMU_VXHS_PROG" $QEMU_VXHS_OPTIONS "$@"
+)
+}
+
 export QEMU=_qemu_wrapper
 export QEMU_IMG=_qemu_img_wrapper
 export QEMU_IO=_qemu_io_wrapper
 export QEMU_NBD=_qemu_nbd_wrapper
+export QEMU_VXHS=_qemu_vxhs_wrapper
 
 QEMU_IMG_EXTRA_ARGS=
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 1040013..c9a2d5c 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -122,6 +122,7 @@ _filter_img_info()
 -e "s#$TEST_DIR#TEST_DIR#g" \
 -e "s#$IMGFMT#IMGFMT#g" \
 -e 's#nbd://127.0.0.1:10810$#TEST_DIR/t.IMGFMT#g' \
+-e 's#json.*vdisk-id.*vxhs"}}#TEST_DIR/t.IMGFMT#' \
 -e "/encrypted: yes/d" \
 -e "/cluster_size: [0-9]\\+/d" \
 -e "/table_size: [0-9]\\+/d" \
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 7d4781d..62529ee 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -85,6 +85,9 @@ else
 elif [ "$IMGPROTO" = "nfs" ]; then
 TEST_DIR="nfs://127.0.0.1/$TEST_DIR"
 TEST_IMG=$TEST_DIR/t.$IMGFMT
+elif [ "$IMGPROTO" = "vxhs" ]; then
+TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
+TEST_IMG="vxhs://127.0.0.1:/t.$IMGFMT"
 else
 TEST_IMG=$IMGPROTO:$TEST_DIR/t.$IMGFMT
 fi
@@ -171,6 +174,12 @@ _make_test_img()
 eval "$QEMU_NBD -v -t -b 127.0.0.1 -p 10810 -f $IMGFMT  $TEST_IMG_FILE 
>/dev/null &"
 sleep 1 # FIXME: qemu-nbd needs to be listening before we continue
 fi
+
+# Start QNIO server on image directory for vxhs protocol
+if [ $IMGPROTO = "vxhs" ]; then
+eval "$QEMU_VXHS -d  $TEST_DIR > /dev/null &"
+sleep 1 # Wait for server to come up.
+fi
 }
 
 _rm_test_img()
@@ -197,6 +206,16 @@ _cleanup_test_img()
 fi
 rm -f "$TEST_IMG_FILE"
 ;;
+vxhs)
+if [ -f "${TEST_DIR}/qemu-vxhs.pid" ]; then
+local QEMU_VXHS_PID
+read QEMU_VXHS_PID < "${TEST_DIR}/qemu-vxhs.pid"
+kill ${QEMU_VXHS_PID} >/dev/null 2>&1
+rm -f "${TEST_DIR}/qemu-vxhs.pid"
+fi
+rm -f "$TEST_IMG_FILE"
+;;
+
 file)
 _rm_test_img "$TEST_DIR/t.$IMGFMT"
 _rm_test_img "$TEST_DIR/t.$IMGFMT.orig"
-- 
2.9.3




[Qemu-block] [PULL v2 00/12] Block patches

2017-04-24 Thread Jeff Cody
The following changes since commit 4c55b1d0bad8a703f0499fe62e3761a0cd288da3:

  Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2017-04-24' into 
staging (2017-04-24 14:49:48 +0100)

are available in the git repository at:

  git://github.com/codyprime/qemu-kvm-jtc.git tags/block-pull-request

for you to fetch changes up to ecfa185400ade2abc9915efa924cbad1e15a21a4:

  qemu-iotests: _cleanup_qemu must be called on exit (2017-04-24 15:09:33 -0400)


Pull v2, with 32-bit errors fixed.  I don't have OS X to test compile on,
but I think it is safe to assume the cause of the compile error was the same.


Ashish Mittal (2):
  block/vxhs.c: Add support for a new block device type called "vxhs"
  block/vxhs.c: Add qemu-iotests for new block device type "vxhs"

Jeff Cody (10):
  qemu-iotests: exclude vxhs from image creation via protocol
  block: add bdrv_set_read_only() helper function
  block: do not set BDS read_only if copy_on_read enabled
  block: honor BDRV_O_ALLOW_RDWR when clearing bs->read_only
  block: code movement
  block: introduce bdrv_can_set_read_only()
  block: use bdrv_can_set_read_only() during reopen
  block/rbd - update variable names to more apt names
  block/rbd: Add support for reopen()
  qemu-iotests: _cleanup_qemu must be called on exit

 block.c  |  56 +++-
 block/Makefile.objs  |   2 +
 block/bochs.c|   5 +-
 block/cloop.c|   5 +-
 block/dmg.c  |   6 +-
 block/rbd.c  |  65 +++--
 block/trace-events   |  17 ++
 block/vvfat.c|  19 +-
 block/vxhs.c | 575 +++
 configure|  39 +++
 include/block/block.h|   2 +
 qapi/block-core.json |  23 +-
 tests/qemu-iotests/017   |   1 +
 tests/qemu-iotests/020   |   1 +
 tests/qemu-iotests/028   |   1 +
 tests/qemu-iotests/029   |   1 +
 tests/qemu-iotests/073   |   1 +
 tests/qemu-iotests/094   |  11 +-
 tests/qemu-iotests/102   |   5 +-
 tests/qemu-iotests/109   |   1 +
 tests/qemu-iotests/114   |   1 +
 tests/qemu-iotests/117   |   1 +
 tests/qemu-iotests/130   |   2 +
 tests/qemu-iotests/134   |   1 +
 tests/qemu-iotests/140   |   1 +
 tests/qemu-iotests/141   |   1 +
 tests/qemu-iotests/143   |   1 +
 tests/qemu-iotests/156   |   2 +
 tests/qemu-iotests/158   |   1 +
 tests/qemu-iotests/common|   6 +
 tests/qemu-iotests/common.config |  13 +
 tests/qemu-iotests/common.filter |   1 +
 tests/qemu-iotests/common.rc |  19 ++
 33 files changed, 844 insertions(+), 42 deletions(-)
 create mode 100644 block/vxhs.c

-- 
2.9.3




[Qemu-block] [PULL v2 04/12] block: add bdrv_set_read_only() helper function

2017-04-24 Thread Jeff Cody
We have a helper wrapper for checking for the BDS read_only flag,
add a helper wrapper to set the read_only flag as well.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Jeff Cody 
Reviewed-by: John Snow 
Message-id: 
9b18972d05f5fa2ac16c014f0af98d680553048d.1491597120.git.jc...@redhat.com
---
 block.c   | 5 +
 block/bochs.c | 2 +-
 block/cloop.c | 2 +-
 block/dmg.c   | 2 +-
 block/rbd.c   | 2 +-
 block/vvfat.c | 4 ++--
 include/block/block.h | 1 +
 7 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 7eda9a4..6f21145 100644
--- a/block.c
+++ b/block.c
@@ -192,6 +192,11 @@ void path_combine(char *dest, int dest_size,
 }
 }
 
+void bdrv_set_read_only(BlockDriverState *bs, bool read_only)
+{
+bs->read_only = read_only;
+}
+
 void bdrv_get_full_backing_filename_from_filename(const char *backed,
   const char *backing,
   char *dest, size_t sz,
diff --git a/block/bochs.c b/block/bochs.c
index 516da56..bdc2831 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -110,7 +110,7 @@ static int bochs_open(BlockDriverState *bs, QDict *options, 
int flags,
 return -EINVAL;
 }
 
-bs->read_only = true; /* no write support yet */
+bdrv_set_read_only(bs, true); /* no write support yet */
 
 ret = bdrv_pread(bs->file, 0, &bochs, sizeof(bochs));
 if (ret < 0) {
diff --git a/block/cloop.c b/block/cloop.c
index a6c7b9d..11f17c8 100644
--- a/block/cloop.c
+++ b/block/cloop.c
@@ -72,7 +72,7 @@ static int cloop_open(BlockDriverState *bs, QDict *options, 
int flags,
 return -EINVAL;
 }
 
-bs->read_only = true;
+bdrv_set_read_only(bs, true);
 
 /* read header */
 ret = bdrv_pread(bs->file, 128, &s->block_size, 4);
diff --git a/block/dmg.c b/block/dmg.c
index a7d25fc..27ce4a6 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -420,7 +420,7 @@ static int dmg_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 
 block_module_load_one("dmg-bz2");
-bs->read_only = true;
+bdrv_set_read_only(bs, true);
 
 s->n_chunks = 0;
 s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
diff --git a/block/rbd.c b/block/rbd.c
index 1ceeeb5..6ad2904 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -641,7 +641,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto failed_open;
 }
 
-bs->read_only = (s->snap != NULL);
+bdrv_set_read_only(bs, (s->snap != NULL));
 
 qemu_opts_del(opts);
 return 0;
diff --git a/block/vvfat.c b/block/vvfat.c
index af5153d..d4ce6d7 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1157,7 +1157,7 @@ static int vvfat_open(BlockDriverState *bs, QDict 
*options, int flags,
 s->current_cluster=0x;
 
 /* read only is the default for safety */
-bs->read_only = true;
+bdrv_set_read_only(bs, true);
 s->qcow = NULL;
 s->qcow_filename = NULL;
 s->fat2 = NULL;
@@ -1173,7 +1173,7 @@ static int vvfat_open(BlockDriverState *bs, QDict 
*options, int flags,
 if (ret < 0) {
 goto fail;
 }
-bs->read_only = false;
+bdrv_set_read_only(bs, false);
 }
 
 bs->total_sectors = cyls * heads * secs;
diff --git a/include/block/block.h b/include/block/block.h
index 466de49..99d49f2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -434,6 +434,7 @@ int bdrv_is_allocated_above(BlockDriverState *top, 
BlockDriverState *base,
 int64_t sector_num, int nb_sectors, int *pnum);
 
 bool bdrv_is_read_only(BlockDriverState *bs);
+void bdrv_set_read_only(BlockDriverState *bs, bool read_only);
 bool bdrv_is_sg(BlockDriverState *bs);
 bool bdrv_is_inserted(BlockDriverState *bs);
 int bdrv_media_changed(BlockDriverState *bs);
-- 
2.9.3




[Qemu-block] [PULL v2 08/12] block: introduce bdrv_can_set_read_only()

2017-04-24 Thread Jeff Cody
Introduce check function for setting read_only flags.  Will return < 0 on
error, with appropriate Error value set.  Does not alter any flags.

Signed-off-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: John Snow 
Message-id: 
e2bba34ac3bc76a0c42adc390413f358ae0566e8.1491597120.git.jc...@redhat.com
---
 block.c   | 14 +-
 include/block/block.h |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 123c982..1ac05c1 100644
--- a/block.c
+++ b/block.c
@@ -197,7 +197,7 @@ bool bdrv_is_read_only(BlockDriverState *bs)
 return bs->read_only;
 }
 
-int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
+int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
 {
 /* Do not set read_only if copy_on_read is enabled */
 if (bs->copy_on_read && read_only) {
@@ -213,6 +213,18 @@ int bdrv_set_read_only(BlockDriverState *bs, bool 
read_only, Error **errp)
 return -EPERM;
 }
 
+return 0;
+}
+
+int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
+{
+int ret = 0;
+
+ret = bdrv_can_set_read_only(bs, read_only, errp);
+if (ret < 0) {
+return ret;
+}
+
 bs->read_only = read_only;
 return 0;
 }
diff --git a/include/block/block.h b/include/block/block.h
index 1d7fd19..144df0d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -434,6 +434,7 @@ int bdrv_is_allocated_above(BlockDriverState *top, 
BlockDriverState *base,
 int64_t sector_num, int nb_sectors, int *pnum);
 
 bool bdrv_is_read_only(BlockDriverState *bs);
+int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only, Error **errp);
 int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp);
 bool bdrv_is_sg(BlockDriverState *bs);
 bool bdrv_is_inserted(BlockDriverState *bs);
-- 
2.9.3




[Qemu-block] [PULL v2 05/12] block: do not set BDS read_only if copy_on_read enabled

2017-04-24 Thread Jeff Cody
A few block drivers will set the BDS read_only flag from their
.bdrv_open() function.  This means the bs->read_only flag could
be set after we enable copy_on_read, as the BDRV_O_COPY_ON_READ
flag check occurs prior to the call to bdrv->bdrv_open().

This adds an error return to bdrv_set_read_only(), and an error will be
return if we try to set the BDS to read_only while copy_on_read is
enabled.

This patch also changes the behavior of vvfat.  Before, vvfat could
override the drive 'readonly' flag with its own, internal 'rw' flag.

For instance, this -drive parameter would result in a writable image:

"-drive format=vvfat,dir=/tmp/vvfat,rw,if=virtio,readonly=on"

This is not correct.  Now, attempting to use the above -drive parameter
will result in an error (i.e., 'rw' is incompatible with 'readonly=on').

Signed-off-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: John Snow 
Message-id: 
0c5b4c1cc2c651471b131f21376dfd5ea24d2196.1491597120.git.jc...@redhat.com
---
 block.c   | 10 +-
 block/bochs.c |  5 -
 block/cloop.c |  5 -
 block/dmg.c   |  6 +-
 block/rbd.c   | 11 ++-
 block/vvfat.c | 19 +++
 include/block/block.h |  2 +-
 7 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/block.c b/block.c
index 6f21145..67ae35a 100644
--- a/block.c
+++ b/block.c
@@ -192,9 +192,17 @@ void path_combine(char *dest, int dest_size,
 }
 }
 
-void bdrv_set_read_only(BlockDriverState *bs, bool read_only)
+int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
 {
+/* Do not set read_only if copy_on_read is enabled */
+if (bs->copy_on_read && read_only) {
+error_setg(errp, "Can't set node '%s' to r/o with copy-on-read 
enabled",
+   bdrv_get_device_or_node_name(bs));
+return -EINVAL;
+}
+
 bs->read_only = read_only;
+return 0;
 }
 
 void bdrv_get_full_backing_filename_from_filename(const char *backed,
diff --git a/block/bochs.c b/block/bochs.c
index bdc2831..a759b6e 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -110,7 +110,10 @@ static int bochs_open(BlockDriverState *bs, QDict 
*options, int flags,
 return -EINVAL;
 }
 
-bdrv_set_read_only(bs, true); /* no write support yet */
+ret = bdrv_set_read_only(bs, true, errp); /* no write support yet */
+if (ret < 0) {
+return ret;
+}
 
 ret = bdrv_pread(bs->file, 0, &bochs, sizeof(bochs));
 if (ret < 0) {
diff --git a/block/cloop.c b/block/cloop.c
index 11f17c8..d6597fc 100644
--- a/block/cloop.c
+++ b/block/cloop.c
@@ -72,7 +72,10 @@ static int cloop_open(BlockDriverState *bs, QDict *options, 
int flags,
 return -EINVAL;
 }
 
-bdrv_set_read_only(bs, true);
+ret = bdrv_set_read_only(bs, true, errp);
+if (ret < 0) {
+return ret;
+}
 
 /* read header */
 ret = bdrv_pread(bs->file, 128, &s->block_size, 4);
diff --git a/block/dmg.c b/block/dmg.c
index 27ce4a6..900ae5a 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -419,8 +419,12 @@ static int dmg_open(BlockDriverState *bs, QDict *options, 
int flags,
 return -EINVAL;
 }
 
+ret = bdrv_set_read_only(bs, true, errp);
+if (ret < 0) {
+return ret;
+}
+
 block_module_load_one("dmg-bz2");
-bdrv_set_read_only(bs, true);
 
 s->n_chunks = 0;
 s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
diff --git a/block/rbd.c b/block/rbd.c
index 6ad2904..1c43171 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -635,13 +635,22 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto failed_shutdown;
 }
 
+/* rbd_open is always r/w */
 r = rbd_open(s->io_ctx, s->name, &s->image, s->snap);
 if (r < 0) {
 error_setg_errno(errp, -r, "error reading header from %s", s->name);
 goto failed_open;
 }
 
-bdrv_set_read_only(bs, (s->snap != NULL));
+/* If we are using an rbd snapshot, we must be r/o, otherwise
+ * leave as-is */
+if (s->snap != NULL) {
+r = bdrv_set_read_only(bs, true, &local_err);
+if (r < 0) {
+error_propagate(errp, local_err);
+goto failed_open;
+}
+}
 
 qemu_opts_del(opts);
 return 0;
diff --git a/block/vvfat.c b/block/vvfat.c
index d4ce6d7..b509d55 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1156,8 +1156,6 @@ static int vvfat_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 s->current_cluster=0x;
 
-/* read only is the default for safety */
-bdrv_set_read_only(bs, true);
 s->qcow = NULL;
 s->qcow_filename = NULL;
 s->fat2 = NULL;
@@ -1169,11 +1167,24 @@ static int vvfat_open(BlockDriverState *bs, QDict 
*options, int flags,
 s->sector_count = cyls * heads * secs - (s->first_sectors_number - 1);
 
 if (qemu_opt_get_bool(opts, "rw", false)) {
-ret = enable_write_target(bs, errp);
+if (!bdrv_is_read

[Qemu-block] [PULL v2 07/12] block: code movement

2017-04-24 Thread Jeff Cody
Move bdrv_is_read_only() up with its friends.

Reviewed-by: Stefan Hajnoczi 
Reviewed-by: John Snow 
Signed-off-by: Jeff Cody 
Message-id: 
73b2399459760c32506f9407efb9dddb3a2789de.1491597120.git.jc...@redhat.com
---
 block.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index 4752fee..123c982 100644
--- a/block.c
+++ b/block.c
@@ -192,6 +192,11 @@ void path_combine(char *dest, int dest_size,
 }
 }
 
+bool bdrv_is_read_only(BlockDriverState *bs)
+{
+return bs->read_only;
+}
+
 int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
 {
 /* Do not set read_only if copy_on_read is enabled */
@@ -3375,11 +3380,6 @@ void bdrv_get_geometry(BlockDriverState *bs, uint64_t 
*nb_sectors_ptr)
 *nb_sectors_ptr = nb_sectors < 0 ? 0 : nb_sectors;
 }
 
-bool bdrv_is_read_only(BlockDriverState *bs)
-{
-return bs->read_only;
-}
-
 bool bdrv_is_sg(BlockDriverState *bs)
 {
 return bs->sg;
-- 
2.9.3




[Qemu-block] [PULL v2 06/12] block: honor BDRV_O_ALLOW_RDWR when clearing bs->read_only

2017-04-24 Thread Jeff Cody
The BDRV_O_ALLOW_RDWR flag allows / prohibits the changing of
the BDS 'read_only' state, but there are a few places where it
is ignored.  In the bdrv_set_read_only() helper, make sure to
honor the flag.

Signed-off-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: John Snow 
Message-id: 
be2e5fb2d285cbece2b6d06bed54a6f56520d251.1491597120.git.jc...@redhat.com
---
 block.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block.c b/block.c
index 67ae35a..4752fee 100644
--- a/block.c
+++ b/block.c
@@ -201,6 +201,13 @@ int bdrv_set_read_only(BlockDriverState *bs, bool 
read_only, Error **errp)
 return -EINVAL;
 }
 
+/* Do not clear read_only if it is prohibited */
+if (!read_only && !(bs->open_flags & BDRV_O_ALLOW_RDWR)) {
+error_setg(errp, "Node '%s' is read only",
+   bdrv_get_device_or_node_name(bs));
+return -EPERM;
+}
+
 bs->read_only = read_only;
 return 0;
 }
-- 
2.9.3




[Qemu-block] [PULL v2 03/12] qemu-iotests: exclude vxhs from image creation via protocol

2017-04-24 Thread Jeff Cody
The protocol VXHS does not support image creation.  Some tests expect
to be able to create images through the protocol.  Exclude VXHS from
these tests.

Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/017 | 1 +
 tests/qemu-iotests/020 | 1 +
 tests/qemu-iotests/029 | 1 +
 tests/qemu-iotests/073 | 1 +
 tests/qemu-iotests/114 | 1 +
 tests/qemu-iotests/130 | 1 +
 tests/qemu-iotests/134 | 1 +
 tests/qemu-iotests/156 | 1 +
 tests/qemu-iotests/158 | 1 +
 9 files changed, 9 insertions(+)

diff --git a/tests/qemu-iotests/017 b/tests/qemu-iotests/017
index e3f9e75..4f9302d 100755
--- a/tests/qemu-iotests/017
+++ b/tests/qemu-iotests/017
@@ -41,6 +41,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # Any format supporting backing files
 _supported_fmt qcow qcow2 vmdk qed
 _supported_proto generic
+_unsupported_proto vxhs
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"
 
diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 9c4a68c..7a0 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -43,6 +43,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # Any format supporting backing files
 _supported_fmt qcow qcow2 vmdk qed
 _supported_proto generic
+_unsupported_proto vxhs
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" \
  "subformat=twoGbMaxExtentFlat" \
diff --git a/tests/qemu-iotests/029 b/tests/qemu-iotests/029
index e639ac0..30bab24 100755
--- a/tests/qemu-iotests/029
+++ b/tests/qemu-iotests/029
@@ -42,6 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # Any format supporting intenal snapshots
 _supported_fmt qcow2
 _supported_proto generic
+_unsupported_proto vxhs
 _supported_os Linux
 # Internal snapshots are (currently) impossible with refcount_bits=1
 _unsupported_imgopts 'refcount_bits=1[^0-9]'
diff --git a/tests/qemu-iotests/073 b/tests/qemu-iotests/073
index ad37a61..40f85b1 100755
--- a/tests/qemu-iotests/073
+++ b/tests/qemu-iotests/073
@@ -39,6 +39,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _supported_fmt qcow2
 _supported_proto generic
+_unsupported_proto vxhs
 _supported_os Linux
 
 CLUSTER_SIZE=64k
diff --git a/tests/qemu-iotests/114 b/tests/qemu-iotests/114
index f110d4f..5b7dc54 100755
--- a/tests/qemu-iotests/114
+++ b/tests/qemu-iotests/114
@@ -39,6 +39,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _supported_fmt qcow2
 _supported_proto generic
+_unsupported_proto vxhs
 _supported_os Linux
 
 
diff --git a/tests/qemu-iotests/130 b/tests/qemu-iotests/130
index ecc8a5b..f941fc9 100755
--- a/tests/qemu-iotests/130
+++ b/tests/qemu-iotests/130
@@ -42,6 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _supported_fmt qcow2
 _supported_proto generic
+_unsupported_proto vxhs
 _supported_os Linux
 
 qemu_comm_method="monitor"
diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
index af618b8..acce946 100755
--- a/tests/qemu-iotests/134
+++ b/tests/qemu-iotests/134
@@ -39,6 +39,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _supported_fmt qcow2
 _supported_proto generic
+_unsupported_proto vxhs
 _supported_os Linux
 
 
diff --git a/tests/qemu-iotests/156 b/tests/qemu-iotests/156
index cc95ff1..78deaff 100755
--- a/tests/qemu-iotests/156
+++ b/tests/qemu-iotests/156
@@ -48,6 +48,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _supported_fmt qcow2 qed
 _supported_proto generic
+_unsupported_proto vxhs
 _supported_os Linux
 
 # Create source disk
diff --git a/tests/qemu-iotests/158 b/tests/qemu-iotests/158
index a6cdd6d..ef8d70f 100755
--- a/tests/qemu-iotests/158
+++ b/tests/qemu-iotests/158
@@ -39,6 +39,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _supported_fmt qcow2
 _supported_proto generic
+_unsupported_proto vxhs
 _supported_os Linux
 
 
-- 
2.9.3




[Qemu-block] [PULL v2 01/12] block/vxhs.c: Add support for a new block device type called "vxhs"

2017-04-24 Thread Jeff Cody
From: Ashish Mittal 

Source code for the qnio library that this code loads can be downloaded from:
https://github.com/VeritasHyperScale/libqnio.git

Sample command line using JSON syntax:
./x86_64-softmmu/qemu-system-x86_64 -name instance-0008 -S -vnc 0.0.0.0:0
-k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
-msg timestamp=on
'json:{"driver":"vxhs","vdisk-id":"c3e9095a-a5ee-4dce-afeb-2a59fb387410",
"server":{"host":"172.172.17.4","port":""}}'

Sample command line using URI syntax:
qemu-img convert -f raw -O raw -n
/var/lib/nova/instances/_base/0c5eacd5ebea5ed914b6a3e7b18f1ce734c386ad
vxhs://192.168.0.1:/c6718f6b-0401-441d-a8c3-1f0064d75ee0

Sample command line using TLS credentials (run in secure mode):
./qemu-io --object
tls-creds-x509,id=tls0,dir=/etc/pki/qemu/vxhs,endpoint=client -c 'read
-v 66000 2.5k' 'json:{"server.host": "127.0.0.1", "server.port": "",
"vdisk-id": "/test.raw", "driver": "vxhs", "tls-creds":"tls0"}'

[Jeff: Modified trace-events with the correct string formatting]

Signed-off-by: Ashish Mittal 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Jeff Cody 
Signed-off-by: Jeff Cody 
Message-id: 1491277689-24949-2-git-send-email-ashish.mit...@veritas.com
---
 block/Makefile.objs  |   2 +
 block/trace-events   |  17 ++
 block/vxhs.c | 575 +++
 configure|  39 
 qapi/block-core.json |  23 ++-
 5 files changed, 654 insertions(+), 2 deletions(-)
 create mode 100644 block/vxhs.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index de96f8e..ea95530 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -19,6 +19,7 @@ block-obj-$(CONFIG_LIBNFS) += nfs.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
 block-obj-$(CONFIG_GLUSTERFS) += gluster.o
+block-obj-$(CONFIG_VXHS) += vxhs.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o dirty-bitmap.o
 block-obj-y += write-threshold.o
@@ -38,6 +39,7 @@ rbd.o-cflags   := $(RBD_CFLAGS)
 rbd.o-libs := $(RBD_LIBS)
 gluster.o-cflags   := $(GLUSTERFS_CFLAGS)
 gluster.o-libs := $(GLUSTERFS_LIBS)
+vxhs.o-libs:= $(VXHS_LIBS)
 ssh.o-cflags   := $(LIBSSH2_CFLAGS)
 ssh.o-libs := $(LIBSSH2_LIBS)
 block-obj-$(if $(CONFIG_BZIP2),m,n) += dmg-bz2.o
diff --git a/block/trace-events b/block/trace-events
index 0bc5c0a..9a71c7f 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -110,3 +110,20 @@ qed_aio_write_data(void *s, void *acb, int ret, uint64_t 
offset, size_t len) "s
 qed_aio_write_prefill(void *s, void *acb, uint64_t start, size_t len, uint64_t 
offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64
 qed_aio_write_postfill(void *s, void *acb, uint64_t start, size_t len, 
uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64
 qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t len) 
"s %p acb %p ret %d offset %"PRIu64" len %zu"
+
+# block/vxhs.c
+vxhs_iio_callback(int error) "ctx is NULL: error %d"
+vxhs_iio_callback_chnfail(int err, int error) "QNIO channel failed, no i/o %d, 
%d"
+vxhs_iio_callback_unknwn(int opcode, int err) "unexpected opcode %d, errno %d"
+vxhs_aio_rw_invalid(int req) "Invalid I/O request iodir %d"
+vxhs_aio_rw_ioerr(char *guid, int iodir, uint64_t size, uint64_t off, void 
*acb, int ret, int err) "IO ERROR (vDisk %s) FOR : Read/Write = %d size = 
%"PRIu64" offset = %"PRIu64" ACB = %p. Error = %d, errno = %d"
+vxhs_get_vdisk_stat_err(char *guid, int ret, int err) "vDisk (%s) stat ioctl 
failed, ret = %d, errno = %d"
+vxhs_get_vdisk_stat(char *vdisk_guid, uint64_t vdisk_size) "vDisk %s stat 
ioctl returned size %"PRIu64
+vxhs_complete_aio(void *acb, uint64_t ret) "aio failed acb %p ret %"PRIu64
+vxhs_parse_uri_filename(const char *filename) "URI passed via 
bdrv_parse_filename %s"
+vxhs_open_vdiskid(const char *vdisk_id) "Opening vdisk-id %s"
+vxhs_open_hostinfo(char *of_vsa_addr, int port) "Adding host %s:%d to 
BDRVVXHSState"
+vxhs_open_iio_open(const char *host) "Failed to connect to storage agent on 
host %s"
+vxhs_parse_uri_hostinfo(char *host, int port) "Host: IP %s, Port %d"
+vxhs_close(char *vdisk_guid) "Closing vdisk %s"
+vxhs_get_creds(const char *cacert, const char *client_key, const char 
*client_cert) "cacert %s, client_key %s, client_cert %s"
diff --git a/block/vxhs.c b/block/vxhs.c
new file mode 100644
index 000..9ffe9d3
--- /dev/null
+++ b/block/vxhs.c
@@ -0,0 +1,575 @@
+/*
+ * QEMU Block driver for Veritas HyperScale (VxHS)
+ *
+ * Copyright (c) 2017 Veritas Technologies LLC.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+#include "block/block_int.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
+#include "trace.h"
+#include "qemu/uri.h"
+#include "qapi/error.h"
+#include "qemu/uuid.h"
+#i

[Qemu-block] [PULL v2 09/12] block: use bdrv_can_set_read_only() during reopen

2017-04-24 Thread Jeff Cody
Signed-off-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: John Snow 
Message-id: 
00aed7ffdd7be4b9ed9ce1007d50028a72b34ebe.1491597120.git.jc...@redhat.com
---
 block.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 1ac05c1..5db266b 100644
--- a/block.c
+++ b/block.c
@@ -2789,6 +2789,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 BlockDriver *drv;
 QemuOpts *opts;
 const char *value;
+bool read_only;
 
 assert(reopen_state != NULL);
 assert(reopen_state->bs->drv != NULL);
@@ -2817,12 +2818,13 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 qdict_put(reopen_state->options, "driver", qstring_from_str(value));
 }
 
-/* if we are to stay read-only, do not allow permission change
- * to r/w */
-if (!(reopen_state->bs->open_flags & BDRV_O_ALLOW_RDWR) &&
-reopen_state->flags & BDRV_O_RDWR) {
-error_setg(errp, "Node '%s' is read only",
-   bdrv_get_device_or_node_name(reopen_state->bs));
+/* If we are to stay read-only, do not allow permission change
+ * to r/w. Attempting to set to r/w may fail if either BDRV_O_ALLOW_RDWR is
+ * not set, or if the BDS still has copy_on_read enabled */
+read_only = !(reopen_state->flags & BDRV_O_RDWR);
+ret = bdrv_can_set_read_only(reopen_state->bs, read_only, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
 goto error;
 }
 
-- 
2.9.3




[Qemu-block] [PULL v2 10/12] block/rbd - update variable names to more apt names

2017-04-24 Thread Jeff Cody
Update 'clientname' to be 'user', which tracks better with both
the QAPI and rados variable naming.

Update 'name' to be 'image_name', as it indicates the rbd image.
Naming it 'image' would have been ideal, but we are using that for
the rados_image_t value returned by rbd_open().

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Jeff Cody 
Reviewed-by: John Snow 
Message-id: 
b7ec1fb2e1cf36f9b6911631447a5b0422590b7d.1491597120.git.jc...@redhat.com
---
 block/rbd.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 1c43171..35853c9 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -94,7 +94,7 @@ typedef struct BDRVRBDState {
 rados_t cluster;
 rados_ioctx_t io_ctx;
 rbd_image_t image;
-char *name;
+char *image_name;
 char *snap;
 } BDRVRBDState;
 
@@ -350,7 +350,7 @@ static int qemu_rbd_create(const char *filename, QemuOpts 
*opts, Error **errp)
 int64_t bytes = 0;
 int64_t objsize;
 int obj_order = 0;
-const char *pool, *name, *conf, *clientname, *keypairs;
+const char *pool, *image_name, *conf, *user, *keypairs;
 const char *secretid;
 rados_t cluster;
 rados_ioctx_t io_ctx;
@@ -393,11 +393,11 @@ static int qemu_rbd_create(const char *filename, QemuOpts 
*opts, Error **errp)
  */
 pool   = qdict_get_try_str(options, "pool");
 conf   = qdict_get_try_str(options, "conf");
-clientname = qdict_get_try_str(options, "user");
-name   = qdict_get_try_str(options, "image");
+user   = qdict_get_try_str(options, "user");
+image_name = qdict_get_try_str(options, "image");
 keypairs   = qdict_get_try_str(options, "=keyvalue-pairs");
 
-ret = rados_create(&cluster, clientname);
+ret = rados_create(&cluster, user);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "error initializing");
 goto exit;
@@ -434,7 +434,7 @@ static int qemu_rbd_create(const char *filename, QemuOpts 
*opts, Error **errp)
 goto shutdown;
 }
 
-ret = rbd_create(io_ctx, name, bytes, &obj_order);
+ret = rbd_create(io_ctx, image_name, bytes, &obj_order);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "error rbd create");
 }
@@ -540,7 +540,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
  Error **errp)
 {
 BDRVRBDState *s = bs->opaque;
-const char *pool, *snap, *conf, *clientname, *name, *keypairs;
+const char *pool, *snap, *conf, *user, *image_name, *keypairs;
 const char *secretid;
 QemuOpts *opts;
 Error *local_err = NULL;
@@ -567,24 +567,24 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 pool   = qemu_opt_get(opts, "pool");
 conf   = qemu_opt_get(opts, "conf");
 snap   = qemu_opt_get(opts, "snapshot");
-clientname = qemu_opt_get(opts, "user");
-name   = qemu_opt_get(opts, "image");
+user   = qemu_opt_get(opts, "user");
+image_name = qemu_opt_get(opts, "image");
 keypairs   = qemu_opt_get(opts, "=keyvalue-pairs");
 
-if (!pool || !name) {
+if (!pool || !image_name) {
 error_setg(errp, "Parameters 'pool' and 'image' are required");
 r = -EINVAL;
 goto failed_opts;
 }
 
-r = rados_create(&s->cluster, clientname);
+r = rados_create(&s->cluster, user);
 if (r < 0) {
 error_setg_errno(errp, -r, "error initializing");
 goto failed_opts;
 }
 
 s->snap = g_strdup(snap);
-s->name = g_strdup(name);
+s->image_name = g_strdup(image_name);
 
 /* try default location when conf=NULL, but ignore failure */
 r = rados_conf_read_file(s->cluster, conf);
@@ -636,9 +636,10 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 /* rbd_open is always r/w */
-r = rbd_open(s->io_ctx, s->name, &s->image, s->snap);
+r = rbd_open(s->io_ctx, s->image_name, &s->image, s->snap);
 if (r < 0) {
-error_setg_errno(errp, -r, "error reading header from %s", s->name);
+error_setg_errno(errp, -r, "error reading header from %s",
+ s->image_name);
 goto failed_open;
 }
 
@@ -660,7 +661,7 @@ failed_open:
 failed_shutdown:
 rados_shutdown(s->cluster);
 g_free(s->snap);
-g_free(s->name);
+g_free(s->image_name);
 failed_opts:
 qemu_opts_del(opts);
 g_free(mon_host);
@@ -674,7 +675,7 @@ static void qemu_rbd_close(BlockDriverState *bs)
 rbd_close(s->image);
 rados_ioctx_destroy(s->io_ctx);
 g_free(s->snap);
-g_free(s->name);
+g_free(s->image_name);
 rados_shutdown(s->cluster);
 }
 
-- 
2.9.3




[Qemu-block] [PULL v2 11/12] block/rbd: Add support for reopen()

2017-04-24 Thread Jeff Cody
This adds support for reopen in rbd, for changing between r/w and r/o.

Note, that this is only a flag change, but we will block a change from
r/o to r/w if we are using an RBD internal snapshot.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Jeff Cody 
Reviewed-by: John Snow 
Message-id: 
d4e87539167ec6527d44c97b164eabcccf96e4f3.1491597120.git.jc...@redhat.com
---
 block/rbd.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index 35853c9..6471f4f 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -668,6 +668,26 @@ failed_opts:
 return r;
 }
 
+
+/* Since RBD is currently always opened R/W via the API,
+ * we just need to check if we are using a snapshot or not, in
+ * order to determine if we will allow it to be R/W */
+static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
+   BlockReopenQueue *queue, Error **errp)
+{
+BDRVRBDState *s = state->bs->opaque;
+int ret = 0;
+
+if (s->snap && state->flags & BDRV_O_RDWR) {
+error_setg(errp,
+   "Cannot change node '%s' to r/w when using RBD snapshot",
+   bdrv_get_device_or_node_name(state->bs));
+ret = -EINVAL;
+}
+
+return ret;
+}
+
 static void qemu_rbd_close(BlockDriverState *bs)
 {
 BDRVRBDState *s = bs->opaque;
@@ -1074,6 +1094,7 @@ static BlockDriver bdrv_rbd = {
 .bdrv_parse_filename= qemu_rbd_parse_filename,
 .bdrv_file_open = qemu_rbd_open,
 .bdrv_close = qemu_rbd_close,
+.bdrv_reopen_prepare= qemu_rbd_reopen_prepare,
 .bdrv_create= qemu_rbd_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
 .bdrv_get_info  = qemu_rbd_getinfo,
-- 
2.9.3




[Qemu-block] [PULL v2 12/12] qemu-iotests: _cleanup_qemu must be called on exit

2017-04-24 Thread Jeff Cody
For the tests that use the common.qemu functions for running a QEMU
process, _cleanup_qemu must be called in the exit function.

If it is not, if the qemu process aborts, then not all of the droppings
are cleaned up (e.g. pidfile, fifos).

This updates those tests that did not have a cleanup in qemu-iotests.

(I swapped spaces for tabs in test 102 as well)

Reported-by: Eric Blake 
Reviewed-by: Eric Blake 
Signed-off-by: Jeff Cody 
Message-id: 
d59c2f6ad6c1da8b9b3c7f357c94a7122ccfc55a.1492544096.git.jc...@redhat.com
---
 tests/qemu-iotests/028 |  1 +
 tests/qemu-iotests/094 | 11 ---
 tests/qemu-iotests/102 |  5 +++--
 tests/qemu-iotests/109 |  1 +
 tests/qemu-iotests/117 |  1 +
 tests/qemu-iotests/130 |  1 +
 tests/qemu-iotests/140 |  1 +
 tests/qemu-iotests/141 |  1 +
 tests/qemu-iotests/143 |  1 +
 tests/qemu-iotests/156 |  1 +
 10 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
index 7783e57..97a8869 100755
--- a/tests/qemu-iotests/028
+++ b/tests/qemu-iotests/028
@@ -32,6 +32,7 @@ status=1  # failure is the default!
 
 _cleanup()
 {
+_cleanup_qemu
 rm -f "${TEST_IMG}.copy"
 _cleanup_test_img
 }
diff --git a/tests/qemu-iotests/094 b/tests/qemu-iotests/094
index 0ba0b0c..9aa01e3 100755
--- a/tests/qemu-iotests/094
+++ b/tests/qemu-iotests/094
@@ -27,7 +27,14 @@ echo "QA output created by $seq"
 here="$PWD"
 status=1   # failure is the default!
 
-trap "exit \$status" 0 1 2 3 15
+_cleanup()
+{
+_cleanup_qemu
+_cleanup_test_img
+rm -f "$TEST_DIR/source.$IMGFMT"
+}
+
+trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # get standard environment, filters and checks
 . ./common.rc
@@ -73,8 +80,6 @@ _send_qemu_cmd $QEMU_HANDLE \
 
 wait=1 _cleanup_qemu
 
-_cleanup_test_img
-rm -f "$TEST_DIR/source.$IMGFMT"
 
 # success, all done
 echo '*** done'
diff --git a/tests/qemu-iotests/102 b/tests/qemu-iotests/102
index 64b4af9..87db1bb 100755
--- a/tests/qemu-iotests/102
+++ b/tests/qemu-iotests/102
@@ -25,11 +25,12 @@ seq=$(basename $0)
 echo "QA output created by $seq"
 
 here=$PWD
-status=1   # failure is the default!
+status=1# failure is the default!
 
 _cleanup()
 {
-   _cleanup_test_img
+_cleanup_qemu
+_cleanup_test_img
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
diff --git a/tests/qemu-iotests/109 b/tests/qemu-iotests/109
index 927151a..6161633 100755
--- a/tests/qemu-iotests/109
+++ b/tests/qemu-iotests/109
@@ -29,6 +29,7 @@ status=1  # failure is the default!
 
 _cleanup()
 {
+_cleanup_qemu
 rm -f $TEST_IMG.src
_cleanup_test_img
 }
diff --git a/tests/qemu-iotests/117 b/tests/qemu-iotests/117
index e955d52..6c83461 100755
--- a/tests/qemu-iotests/117
+++ b/tests/qemu-iotests/117
@@ -29,6 +29,7 @@ status=1  # failure is the default!
 
 _cleanup()
 {
+_cleanup_qemu
_cleanup_test_img
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
diff --git a/tests/qemu-iotests/130 b/tests/qemu-iotests/130
index f941fc9..e7e43de 100755
--- a/tests/qemu-iotests/130
+++ b/tests/qemu-iotests/130
@@ -31,6 +31,7 @@ status=1  # failure is the default!
 
 _cleanup()
 {
+_cleanup_qemu
 _cleanup_test_img
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
diff --git a/tests/qemu-iotests/140 b/tests/qemu-iotests/140
index 49f9df4..8c80a5a 100755
--- a/tests/qemu-iotests/140
+++ b/tests/qemu-iotests/140
@@ -33,6 +33,7 @@ status=1  # failure is the default!
 
 _cleanup()
 {
+_cleanup_qemu
 _cleanup_test_img
 rm -f "$TEST_DIR/nbd"
 }
diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
index 27fb1cc..40a3405 100755
--- a/tests/qemu-iotests/141
+++ b/tests/qemu-iotests/141
@@ -29,6 +29,7 @@ status=1  # failure is the default!
 
 _cleanup()
 {
+_cleanup_qemu
 _cleanup_test_img
 rm -f "$TEST_DIR/{b,m,o}.$IMGFMT"
 }
diff --git a/tests/qemu-iotests/143 b/tests/qemu-iotests/143
index ec4ef22..5ff1944 100755
--- a/tests/qemu-iotests/143
+++ b/tests/qemu-iotests/143
@@ -29,6 +29,7 @@ status=1  # failure is the default!
 
 _cleanup()
 {
+_cleanup_qemu
 rm -f "$TEST_DIR/nbd"
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
diff --git a/tests/qemu-iotests/156 b/tests/qemu-iotests/156
index 78deaff..d799b73 100755
--- a/tests/qemu-iotests/156
+++ b/tests/qemu-iotests/156
@@ -37,6 +37,7 @@ status=1  # failure is the default!
 
 _cleanup()
 {
+_cleanup_qemu
 rm -f "$TEST_IMG{,.target}{,.backing,.overlay}"
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
-- 
2.9.3




Re: [Qemu-block] [Qemu-stable] [PATCH v2 1/1] qemu-img: wait for convert coroutines to complete

2017-04-24 Thread Anton Nefedov


On 24/04/2017 21:16, Peter Lieven wrote:




Am 24.04.2017 um 18:27 schrieb Anton Nefedov :


On 04/21/2017 03:37 PM, Peter Lieven wrote:

Am 21.04.2017 um 14:19 schrieb Anton Nefedov:

On 04/21/2017 01:44 PM, Peter Lieven wrote:

Am 21.04.2017 um 12:04 schrieb Anton Nefedov:
On error path (like i/o error in one of the coroutines), it's required to
 - wait for coroutines completion before cleaning the common structures
 - reenter dependent coroutines so they ever finish

Introduced in 2d9187bc65.

Signed-off-by: Anton Nefedov 
---
[..]




And what if we error out in the read path? Wouldn't be something like this 
easier?


diff --git a/qemu-img.c b/qemu-img.c
index 22f559a..4ff1085 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1903,6 +1903,16 @@ static int convert_do_copy(ImgConvertState *s)
main_loop_wait(false);
}

+/* on error path we need to enter all coroutines that are still
+ * running before cleaning up common structures */
+if (s->ret) {
+for (i = 0; i < s->num_coroutines; i++) {
+ if (s->co[i]) {
+ qemu_coroutine_enter(s->co[i]);
+ }
+}
+}
+
if (s->compressed && !s->ret) {
/* signal EOF to align */
ret = blk_pwrite_compressed(s->target, 0, NULL, 0);


Peter



seemed a bit too daring to me to re-enter every coroutine potentially including 
the ones that yielded waiting for I/O completion.
If that's ok - that is for sure easier :)


I think we should enter every coroutine that is still running and have it 
terminate correctly. It was my mistake that I have not
done this in the original patch. Can you check if the above fixes your test 
cases that triggered the bug?



hi, sorry I'm late with the answer

this segfaults in bdrv_close(). Looks like it tries to finish some i/o which 
coroutine we have already entered and terminated?

(gdb) run
Starting program: /vz/anefedov/qemu-build/us/./qemu-img convert -O qcow2 
./harddisk.hdd.c ./harddisk.hdd
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffeac2d700 (LWP 436020)]
[New Thread 0x7fffe4ed6700 (LWP 436021)]
qemu-img: error while reading sector 20480: Input/output error
qemu-img: error while writing sector 19712: Operation now in progress

Program received signal SIGSEGV, Segmentation fault.
aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
454 ctx = atomic_read(&co->ctx);
(gdb) bt
#0  aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
/* [Anton]: thread_pool_co_cb () here */
#1  0x55634629 in thread_pool_completion_bh (opaque=0x55cfe020) at 
/mnt/code/us-qemu/util/thread-pool.c:189
#2  0x55633b31 in aio_bh_call (bh=0x55cfe0f0) at 
/mnt/code/us-qemu/util/async.c:90
#3  aio_bh_poll (ctx=ctx@entry=0x55cee6d0) at 
/mnt/code/us-qemu/util/async.c:118
#4  0x55636f14 in aio_poll (ctx=ctx@entry=0x55cee6d0, 
blocking=) at /mnt/code/us-qemu/util/aio-posix.c:682
#5  0x555c52d4 in bdrv_drain_recurse (bs=bs@entry=0x55d22560) at 
/mnt/code/us-qemu/block/io.c:164
#6  0x555c5aed in bdrv_drained_begin (bs=bs@entry=0x55d22560) at 
/mnt/code/us-qemu/block/io.c:248
#7  0x55581443 in bdrv_close (bs=0x55d22560) at 
/mnt/code/us-qemu/block.c:2909
#8  bdrv_delete (bs=0x55d22560) at /mnt/code/us-qemu/block.c:3100
#9  bdrv_unref (bs=0x55d22560) at /mnt/code/us-qemu/block.c:4087
#10 0x555baf44 in blk_remove_bs (blk=blk@entry=0x55d22380) at 
/mnt/code/us-qemu/block/block-backend.c:552
#11 0x555bb173 in blk_delete (blk=0x55d22380) at 
/mnt/code/us-qemu/block/block-backend.c:238
#12 blk_unref (blk=blk@entry=0x55d22380) at 
/mnt/code/us-qemu/block/block-backend.c:282
#13 0x5557a22c in img_convert (argc=, argv=) at /mnt/code/us-qemu/qemu-img.c:2359
#14 0x55574189 in main (argc=5, argv=0x7fffe4a0) at 
/mnt/code/us-qemu/qemu-img.c:4464



Peter



/Anton



it seems that this is a bit tricky, can you share how your test case works?

thanks,
peter



how I tested today was basically

diff --git a/qemu-img.c b/qemu-img.c
index 4425aaa..3d2d506 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1788,6 +1788,10 @@ static void coroutine_fn convert_co_do_copy(void 
*opaque)


 if (status == BLK_DATA) {
 ret = convert_co_read(s, sector_num, n, buf);
+const uint64_t fsector = 10*1024*1024/512;
+if (sector_num <= fsector && fsector < sector_num+n) {
+ret = -EIO;
+}
 if (ret < 0) {
 error_report("error while reading sector %" PRId64
  ": %s", sector_num, strerror(-ret));




Re: [Qemu-block] [PATCH 17/17] block: Make bdrv_is_allocated_above() byte-based

2017-04-24 Thread John Snow


On 04/11/2017 06:29 PM, Eric Blake wrote:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  In the common case, allocation is unlikely to ever use
> values that are not naturally sector-aligned, but it is possible
> that byte-based values will let us be more precise about allocation
> at the end of an unaligned file that can do byte-based access.
> 
> Changing the signature of the function to use int64_t *pnum ensures
> that the compiler enforces that all callers are updated.  For now,
> the io.c layer still assert()s that all callers are sector-aligned,
> but that can be relaxed when a later patch implements byte-based
> block status.  Therefore, for the most part this patch is just the
> addition of scaling at the callers followed by inverse scaling at
> bdrv_is_allocated().  But some code, particularly stream_run(),
> gets a lot simpler because it no longer has to mess with sectors.
> 
> For ease of review, bdrv_is_allocated() was tackled separately.
> 
> Signed-off-by: Eric Blake 
> ---
>  include/block/block.h |  2 +-
>  block/commit.c| 20 
>  block/io.c| 36 +++-
>  block/mirror.c|  5 -
>  block/replication.c   | 17 -
>  block/stream.c| 21 +
>  qemu-img.c| 10 +++---
>  7 files changed, 56 insertions(+), 55 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 8641149..740cb86 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -425,7 +425,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
>  int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
>int64_t *pnum);
>  int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
> -int64_t sector_num, int nb_sectors, int *pnum);
> +int64_t offset, int64_t bytes, int64_t *pnum);
> 
>  bool bdrv_is_read_only(BlockDriverState *bs);
>  bool bdrv_is_sg(BlockDriverState *bs);
> diff --git a/block/commit.c b/block/commit.c
> index 4d6bb2a..989de7d 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -132,7 +132,7 @@ static void coroutine_fn commit_run(void *opaque)
>  int64_t offset;
>  uint64_t delay_ns = 0;
>  int ret = 0;
> -int n = 0; /* sectors */
> +int64_t n = 0; /* bytes */
>  void *buf = NULL;
>  int bytes_written = 0;
>  int64_t base_len;
> @@ -157,7 +157,7 @@ static void coroutine_fn commit_run(void *opaque)
> 
>  buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
> 
> -for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) 
> {
> +for (offset = 0; offset < s->common.len; offset += n) {
>  bool copy;
> 
>  /* Note that even when no rate limit is applied we need to yield
> @@ -169,15 +169,12 @@ static void coroutine_fn commit_run(void *opaque)
>  }
>  /* Copy if allocated above the base */
>  ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
> -  offset / BDRV_SECTOR_SIZE,
> -  COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
> -  &n);
> +  offset, COMMIT_BUFFER_SIZE, &n);
>  copy = (ret == 1);
> -trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
> +trace_commit_one_iteration(s, offset, n, ret);
>  if (copy) {
> -ret = commit_populate(s->top, s->base, offset,
> -  n * BDRV_SECTOR_SIZE, buf);
> -bytes_written += n * BDRV_SECTOR_SIZE;
> +ret = commit_populate(s->top, s->base, offset, n, buf);
> +bytes_written += n;
>  }
>  if (ret < 0) {
>  BlockErrorAction action =
> @@ -190,11 +187,10 @@ static void coroutine_fn commit_run(void *opaque)
>  }
>  }
>  /* Publish progress */
> -s->common.offset += n * BDRV_SECTOR_SIZE;
> +s->common.offset += n;
> 
>  if (copy && s->common.speed) {
> -delay_ns = ratelimit_calculate_delay(&s->limit,
> - n * BDRV_SECTOR_SIZE);
> +delay_ns = ratelimit_calculate_delay(&s->limit, n);
>  }
>  }
> 
> diff --git a/block/io.c b/block/io.c
> index 438a493..9218329 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1930,52 +1930,46 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState 
> *bs, int64_t offset,
>  /*
>   * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
>   *
> - * Return true if the given sector is allocated in any image between
> + * Return true if the given offset is allocated in any image between

perhaps "range" instead of "offset"?

>   * BASE and TOP (inclusive).  BASE can be NULL to check if the given
> - 

Re: [Qemu-block] [PATCH 17/17] block: Make bdrv_is_allocated_above() byte-based

2017-04-24 Thread Eric Blake
On 04/24/2017 06:06 PM, John Snow wrote:
> 
> 
> On 04/11/2017 06:29 PM, Eric Blake wrote:
>> We are gradually moving away from sector-based interfaces, towards
>> byte-based.  In the common case, allocation is unlikely to ever use
>> values that are not naturally sector-aligned, but it is possible
>> that byte-based values will let us be more precise about allocation
>> at the end of an unaligned file that can do byte-based access.
>>
>> Changing the signature of the function to use int64_t *pnum ensures
>> that the compiler enforces that all callers are updated.  For now,
>> the io.c layer still assert()s that all callers are sector-aligned,
>> but that can be relaxed when a later patch implements byte-based
>> block status.  Therefore, for the most part this patch is just the
>> addition of scaling at the callers followed by inverse scaling at
>> bdrv_is_allocated().  But some code, particularly stream_run(),
>> gets a lot simpler because it no longer has to mess with sectors.
>>

>> +++ b/block/io.c
>> @@ -1930,52 +1930,46 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState 
>> *bs, int64_t offset,
>>  /*
>>   * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
>>   *
>> - * Return true if the given sector is allocated in any image between
>> + * Return true if the given offset is allocated in any image between
> 
> perhaps "range" instead of "offset"?
> 
>>   * BASE and TOP (inclusive).  BASE can be NULL to check if the given
>> - * sector is allocated in any image of the chain.  Return false otherwise,
>> + * offset is allocated in any image of the chain.  Return false otherwise,
>>   * or negative errno on failure.

Seems reasonable.


>>  /*
>> - * [sector_num, nb_sectors] is unallocated on top but intermediate
>> - * might have
>> - *
>> - * [sector_num+x, nr_sectors] allocated.
>> + * [offset, bytes] is unallocated on top but intermediate
>> + * might have [offset+x, bytes-x] allocated.
>>   */
>> -if (n > psectors_inter &&
>> +if (n > pnum_inter &&
>>  (intermediate == top ||
>> - sector_num + psectors_inter < intermediate->total_sectors)) {
> 
> 
> 
>> -n = psectors_inter;
>> + offset + pnum_inter < (intermediate->total_sectors *
>> +BDRV_SECTOR_SIZE))) {
> 
> Naive question: not worth using either bdrv_getlength for bytes, or the
> bdrv_nb_sectors helpers?

bdrv_getlength(intermediate) should indeed be the same as
intermediate->total_sectors * BDRV_SECTOR_SIZE (for now - ultimately it
would be nice to track a byte length rather than a sector length for
images). I can make that cleanup for v2.

> 
> Reviewed-by: John Snow 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 1/2] qemu-img: make sure contain the consecutive number of zero bytes

2017-04-24 Thread 858585 jemmy
On Mon, Apr 24, 2017 at 10:43 PM, Eric Blake  wrote:
> On 04/23/2017 09:33 AM, jemmy858...@gmail.com wrote:
>> From: Lidong Chen 
>>
>> is_allocated_sectors_min don't guarantee to contain the
>> consecutive number of zero bytes. this patch fixes this bug.
>
> This message was sent without an 'In-Reply-To' header pointing to a 0/2
> cover letter.  When sending a series, please always thread things to a
> cover letter; you may find 'git config format.coverletter auto' to be
> helpful.

Thanks for your kind advises.

>
>>
>> Signed-off-by: Lidong Chen 
>> ---
>>  qemu-img.c | 11 ++-
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index b220cf7..df6d165 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -1060,9 +1060,9 @@ static int is_allocated_sectors(const uint8_t *buf, 
>> int n, int *pnum)
>>  }
>>
>>  /*
>> - * Like is_allocated_sectors, but if the buffer starts with a used sector,
>> - * up to 'min' consecutive sectors containing zeros are ignored. This avoids
>> - * breaking up write requests for only small sparse areas.
>> + * Like is_allocated_sectors, but up to 'min' consecutive sectors
>> + * containing zeros are ignored. This avoids breaking up write requests
>> + * for only small sparse areas.
>>   */
>>  static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
>>  int min)
>> @@ -1071,11 +1071,12 @@ static int is_allocated_sectors_min(const uint8_t 
>> *buf, int n, int *pnum,
>>  int num_checked, num_used;
>>
>>  if (n < min) {
>> -min = n;
>> +*pnum = n;
>> +return 1;
>>  }
>>
>>  ret = is_allocated_sectors(buf, n, pnum);
>> -if (!ret) {
>> +if (!ret && *pnum >= min) {
>
> I seem to recall past attempts to try and patch this function, which
> were then turned down, although I haven't scrubbed the archives for a
> quick URL to point to. I'm worried that there are more subtleties here
> than what you realize.

Hi Eric:
Do you mean this URL?
https://lists.gnu.org/archive/html/qemu-block/2017-01/msg00306.html

But I think the code is not consistent with qemu-img --help.
qemu-img --help
  '-S' indicates the consecutive number of bytes (defaults to 4k) that must
   contain only zeros for qemu-img to create a sparse image during
   conversion. If the number of bytes is 0, the source will not be
scanned for
   unallocated or zero sectors, and the destination image will always be
   fully allocated.

another reason:
if s->has_zero_init is 1(the qcow2 image which have backing_file), the empty
space at the beginning of the buffer still need write and invoke
blk_co_pwrite_zeroes.
and split a single write operation into two just because there is small empty
space at the beginning.

Thanks.

>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>



Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] qemu-img: fix some spelling errors

2017-04-24 Thread 858585 jemmy
On Mon, Apr 24, 2017 at 11:53 PM, Eric Blake  wrote:
> On 04/24/2017 10:47 AM, Eric Blake wrote:
>> On 04/24/2017 10:37 AM, Philippe Mathieu-Daudé wrote:
>>
>  /*
> - * Returns true iff the first sector pointed to by 'buf' contains at
> least
> - * a non-NUL byte.
> + * Returns true if the first sector pointed to by 'buf' contains at
> least
> + * a non-NULL byte.

 NACK to both changes.  'iff' is an English word that is shorthand for
 "if and only if".  "NUL" means the one-byte character, while "NULL"
 means the 8-byte (or 4-byte, on 32-bit platform) pointer value.
>>>
>>> I agree with Lidong shorthands are not obvious from non-native speaker.
>>>
>>> What about this?
>>>
>>>  * Returns true if (and only if) the first sector pointed to by 'buf'
>>> contains
>>
>> That might be okay.
>>
>>>  * at least a non-null character.
>>
>> But that still doesn't make sense.  The character name is NUL, and
>> non-NULL refers to something that is a pointer, not a character.
>
> What's more, the NUL character can actually occupy more than one byte
> (think UTF-16, where it is the two-byte 0 value).  Referring to NUL byte
> rather than NUL character (or even the ' byte') makes it obvious
> that this function is NOT encoding-sensitive, and doesn't start
> mis-behaving just because the data picks a multi-byte character encoding.

How about this?

 * Returns true  if (and only if) the first sector pointed to by 'buf'
contains at least
 * a non-zero byte.

Thanks.

>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>