[Qemu-block] [PULL 2/2] iscsi: add missing colons to the qapi docs

2017-02-27 Thread Jeff Cody
The missing colons make the iscsi part of the documentation not render
quite as nicely, so add those in.

Signed-off-by: Jeff Cody 
---
 qapi/block-core.json | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5f82d35..cf24c04 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2625,29 +2625,29 @@
 ##
 # @BlockdevOptionsIscsi:
 #
-# @transport   The iscsi transport type
+# @transport:   The iscsi transport type
 #
-# @portal   The address of the iscsi portal
+# @portal:  The address of the iscsi portal
 #
-# @target   The target iqn name
+# @target:  The target iqn name
 #
-# @lun  #optional LUN to connect to. Defaults to 0.
+# @lun: #optional LUN to connect to. Defaults to 0.
 #
-# @user #optional User name to log in with. If omitted, no CHAP
+# @user: #optional User name to log in with. If omitted, no CHAP
 #   authentication is performed.
 #
-# @password-secret  #optional The ID of a QCryptoSecret object providing
+# @password-secret: #optional The ID of a QCryptoSecret object providing
 #   the password for the login. This option is required if
 #   @user is specified.
 #
-# @initiator-name   #optional The iqn name we want to identify to the target
+# @initiator-name:  #optional The iqn name we want to identify to the target
 #   as. If this option is not specified, an initiator name is
 #   generated automatically.
 #
-# @header-digest   #optional The desired header digest. Defaults to
+# @header-digest:   #optional The desired header digest. Defaults to
 #   none-crc32c.
 #
-# @timeout  #optional Timeout in seconds after which a request will
+# @timeout: #optional Timeout in seconds after which a request will
 #   timeout. 0 means no timeout and is the default.
 #
 # Driver specific block device options for iscsi
-- 
2.9.3




[Qemu-block] [PULL 1/2] block/mirror: fix broken sparseness detection

2017-02-27 Thread Jeff Cody
From: John Snow 

int64_t is in all likelihood the actual scalar type we want.
Yep, really.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1219541

Signed-off-by: John Snow 
Reviewed-by: Jeff Cody 
Signed-off-by: Jeff Cody 
---
 block/mirror.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 3d50857..1b34b36 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -386,7 +386,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 nb_chunks * sectors_per_chunk);
 bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
 while (nb_chunks > 0 && sector_num < end) {
-int ret;
+int64_t ret;
 int io_sectors, io_sectors_acct;
 BlockDriverState *file;
 enum MirrorMethod {
-- 
2.9.3
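
To illustrate why the width of `ret` matters in the one-line change above: if the value being stored is a 64-bit status word (flag bits in the low end, a byte offset above them), a plain `int` cannot hold it once the offset grows large. The sketch below is only an illustration under that assumption; `demo_block_status`, `DEMO_FLAG_DATA`, and `fits_in_int` are hypothetical names and not QEMU API.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag; stands in for the low flag bits of a status word. */
#define DEMO_FLAG_DATA 0x1

/* Hypothetical status word: flag bits ORed with a 512-byte-aligned offset. */
static int64_t demo_block_status(int64_t offset)
{
    return (offset & ~INT64_C(511)) | DEMO_FLAG_DATA;
}

/* True when v can be stored in a plain int without changing its value. */
static bool fits_in_int(int64_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}
```

With a 32-bit `int`, a status word for a small offset fits, while one for an offset of several GiB does not, so the receiving variable needs to be `int64_t`.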




[Qemu-block] [PULL 0/2] Block patches

2017-02-27 Thread Jeff Cody
The following changes since commit 8f2d7c341184a95d05476ea3c45dbae2b9ddbe51:

  Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2017-02-27-1' into staging (2017-02-27 15:33:21 +)

are available in the git repository at:

  https://github.com/codyprime/qemu-kvm-jtc.git tags/block-pull-request

for you to fetch changes up to 51654aa52a94612edfaf76dcb51c0a0b7821c90d:

  iscsi: add missing colons to the qapi docs (2017-02-27 23:33:41 -0500)


Block patches for 2.9


Jeff Cody (1):
  iscsi: add missing colons to the qapi docs

John Snow (1):
  block/mirror: fix broken sparseness detection

 block/mirror.c   |  2 +-
 qapi/block-core.json | 18 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

-- 
2.9.3




Re: [Qemu-block] [PATCH 1/1] iscsi: add missing colons to the qapi docs

2017-02-27 Thread Jeff Cody
On Mon, Feb 27, 2017 at 11:29:07PM -0500, Jeff Cody wrote:
> The missing colons make the iscsi part of the documentation not render
> quite as nicely, so add those in.
> 
> Signed-off-by: Jeff Cody 
> ---
>  qapi/block-core.json | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 5f82d35..cf24c04 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2625,29 +2625,29 @@
>  ##
>  # @BlockdevOptionsIscsi:
>  #
> -# @transport   The iscsi transport type
> +# @transport:   The iscsi transport type
>  #
> -# @portal   The address of the iscsi portal
> +# @portal:  The address of the iscsi portal
>  #
> -# @target   The target iqn name
> +# @target:  The target iqn name
>  #
> -# @lun  #optional LUN to connect to. Defaults to 0.
> +# @lun: #optional LUN to connect to. Defaults to 0.
>  #
> -# @user #optional User name to log in with. If omitted, no CHAP
> +# @user: #optional User name to log in with. If omitted, no CHAP
>  #   authentication is performed.
>  #
> -# @password-secret  #optional The ID of a QCryptoSecret object providing
> +# @password-secret: #optional The ID of a QCryptoSecret object providing
>  #   the password for the login. This option is required if
>  #   @user is specified.
>  #
> -# @initiator-name   #optional The iqn name we want to identify to the target
> +# @initiator-name:  #optional The iqn name we want to identify to the target
>  #   as. If this option is not specified, an initiator name is
>  #   generated automatically.
>  #
> -# @header-digest   #optional The desired header digest. Defaults to
> +# @header-digest:   #optional The desired header digest. Defaults to
>  #   none-crc32c.
>  #
> -# @timeout  #optional Timeout in seconds after which a request will
> +# @timeout: #optional Timeout in seconds after which a request will
>  #   timeout. 0 means no timeout and is the default.
>  #
>  # Driver specific block device options for iscsi
> -- 
> 2.9.3
> 

Thanks,

Applied to my block branch:

git://github.com/codyprime/qemu-kvm-jtc.git block

-Jeff



[Qemu-block] [PATCH 1/1] iscsi: add missing colons to the qapi docs

2017-02-27 Thread Jeff Cody
The missing colons make the iscsi part of the documentation not render
quite as nicely, so add those in.

Signed-off-by: Jeff Cody 
---
 qapi/block-core.json | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5f82d35..cf24c04 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2625,29 +2625,29 @@
 ##
 # @BlockdevOptionsIscsi:
 #
-# @transport   The iscsi transport type
+# @transport:   The iscsi transport type
 #
-# @portal   The address of the iscsi portal
+# @portal:  The address of the iscsi portal
 #
-# @target   The target iqn name
+# @target:  The target iqn name
 #
-# @lun  #optional LUN to connect to. Defaults to 0.
+# @lun: #optional LUN to connect to. Defaults to 0.
 #
-# @user #optional User name to log in with. If omitted, no CHAP
+# @user: #optional User name to log in with. If omitted, no CHAP
 #   authentication is performed.
 #
-# @password-secret  #optional The ID of a QCryptoSecret object providing
+# @password-secret: #optional The ID of a QCryptoSecret object providing
 #   the password for the login. This option is required if
 #   @user is specified.
 #
-# @initiator-name   #optional The iqn name we want to identify to the target
+# @initiator-name:  #optional The iqn name we want to identify to the target
 #   as. If this option is not specified, an initiator name is
 #   generated automatically.
 #
-# @header-digest   #optional The desired header digest. Defaults to
+# @header-digest:   #optional The desired header digest. Defaults to
 #   none-crc32c.
 #
-# @timeout  #optional Timeout in seconds after which a request will
+# @timeout: #optional Timeout in seconds after which a request will
 #   timeout. 0 means no timeout and is the default.
 #
 # Driver specific block device options for iscsi
-- 
2.9.3




[Qemu-block] [PATCH v3 5/5] block/rbd: add support for 'mon_host', 'auth_supported' via QAPI

2017-02-27 Thread Jeff Cody
This adds support for three additional options that may be specified
by QAPI in blockdev-add:

server: host, port
auth method: either 'cephx' or 'none'

The "server" and "auth-supported" QAPI parameters are arrays.  To conform
with the rados API, the array items are joined as a single string with a ';'
character as a delimiter when setting the configuration values.
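
As a rough illustration of the joining step described above, the sketch below concatenates an array of strings with ';' using only the C standard library. It is not the code from this patch (which uses glib helpers); `join_with_semicolon` is a hypothetical name chosen for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Join n strings with ';' into a newly malloc'd string (caller frees).
 * Returns NULL when n == 0 or when allocation fails. */
static char *join_with_semicolon(const char *const *items, size_t n)
{
    size_t len = 0;
    char *out, *p;

    if (n == 0) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        len += strlen(items[i]) + 1;   /* item plus ';' or the final NUL */
    }
    out = malloc(len);
    if (!out) {
        return NULL;
    }
    p = out;
    for (size_t i = 0; i < n; i++) {
        size_t l = strlen(items[i]);
        if (i > 0) {
            *p++ = ';';                /* delimiter only between items */
        }
        memcpy(p, items[i], l);
        p += l;
    }
    *p = '\0';
    return out;
}
```

The resulting single string is the kind of value that would then be handed to a configuration setter expecting ';'-separated entries.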

Signed-off-by: Jeff Cody 
---
 block/rbd.c  | 119 +++
 qapi/block-core.json |  29 +
 2 files changed, 148 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index cc43f42..dfa52cc 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -405,6 +405,19 @@ static QemuOptsList runtime_opts = {
 .type = QEMU_OPT_STRING,
 .help = "Legacy rados key/value option parameters",
 },
+{
+.name = "host",
+.type = QEMU_OPT_STRING,
+},
+{
+.name = "port",
+.type = QEMU_OPT_STRING,
+},
+{
+.name = "auth",
+.type = QEMU_OPT_STRING,
+.help = "Supported authentication method, either cephx or none",
+},
 { /* end of list */ }
 },
 };
@@ -565,14 +578,89 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 qemu_aio_unref(acb);
 }
 
+#define RBD_MON_HOST  0
+#define RBD_AUTH_SUPPORTED 1
+static char *qemu_rbd_array_opts(QDict *options, const char *prefix, int type,
+ Error **errp)
+{
+size_t num_entries;
+QemuOpts *opts = NULL;
+QDict *sub_options;
+const char *host;
+const char *port;
+char *str;
+char *rados_str = NULL;
+Error *local_err = NULL;
+
+assert(type == RBD_MON_HOST || type == RBD_AUTH_SUPPORTED);
+
+num_entries = qdict_array_entries(options, prefix);
+
+if (num_entries) {
+for (int i = 0; i < num_entries; i++) {
+char *tmp = NULL;
+const char *value;
+char *rados_str_tmp;
+
+str = g_strdup_printf("%s%d.", prefix, i);
+qdict_extract_subqdict(options, &sub_options, str);
+g_free(str);
+
+opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, sub_options, &local_err);
+QDECREF(sub_options);
+if (local_err) {
+error_propagate(errp, local_err);
+goto exit;
+}
+
+if (type == RBD_MON_HOST) {
+host = qemu_opt_get(opts, "host");
+port = qemu_opt_get(opts, "port");
+
+value = host;
+if (port) {
+tmp = g_strdup_printf("%s:%s", host, port);
+value = tmp;
+}
+} else {
+value = qemu_opt_get(opts, "auth");
+}
+
+
+/* each iteration in the for loop will build upon the string,
+ * and if rados_str is NULL then it is our first pass */
+if (rados_str) {
+/* separate options with ';', as that  is what rados_conf_set()
+ * requires */
+rados_str_tmp = rados_str;
+rados_str = g_strdup_printf("%s;%s", rados_str_tmp, value);
+g_free(rados_str_tmp);
+} else {
+rados_str = g_strdup(value);
+}
+
+g_free(tmp);
+qemu_opts_del(opts);
+opts = NULL;
+}
+}
+
+exit:
+qemu_opts_del(opts);
+return rados_str;
+}
+
 static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
  Error **errp)
 {
 BDRVRBDState *s = bs->opaque;
 const char *pool, *snap, *conf, *clientname, *name, *keypairs;
+const char *auth_supported;
 const char *secretid;
 QemuOpts *opts;
 Error *local_err = NULL;
+char *mon_host = NULL;
 int r;
 
 opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
@@ -583,6 +671,22 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 return -EINVAL;
 }
 
+   auth_supported = qemu_rbd_array_opts(options, "auth-supported.",
+ RBD_AUTH_SUPPORTED, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+r = -EINVAL;
+goto failed_opts;
+}
+
+mon_host = qemu_rbd_array_opts(options, "server.",
+   RBD_MON_HOST, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+r = -EINVAL;
+goto failed_opts;
+}
+
 secretid = qemu_opt_get(opts, "password-secret");
 
 pool   = qemu_opt_get(opts, "pool");
@@ -615,6 +719,20 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 goto failed_shutdown;
 }
 
+if (mon_host) {
+r = rados_conf_set(s->cluster, "mon_host", mon_host);
+if (r < 

[Qemu-block] [PATCH v3 0/5] RBD: blockdev-add (for 2.9?)

2017-02-27 Thread Jeff Cody

This series adds blockdev-add for rbd.


Changes from v2:

Patch 2: Updated commit message, and documented the runtime opts
 (Thanks Eric)

Patch 3: Fixed commit type, added "FIXME" in ugly string concat spot
 (Thanks Eric)

Patch 4: Fixed all the nits - deleted lines, spaces.  Kept list 
 alphabetical.  (Thanks Eric)

Patch 5: Significant changes.  'mon_host' became 'server', and an array.
 'auth_supported' became 'auth-supported', and an array.
 (Thanks Daniel, Eric)

  Patch 5 also contains a new function, qemu_rbd_array_opts(), to
  parse the array options.


Changes from v1:

Overall:

* QAPI interface does not allow arbitrary key/value pairs
  in v2 (Thanks Daniel)

* QAPI interface adds 'mon_host' and 'auth_supported' options (Thanks Daniel)

* Use 'user' instead of 'rbd-id' (Thanks Daniel)

By patch:

Patch 1:
 * Fixed some indentation in patch 1 (Thanks Markus)

Patch 2:
 * 'rbd-id' becomes 'user', and the commit message is fixed. (Thanks Daniel)

Patch 3:
 * Ripple-through from changes in patch 2
 * Removed the string unescape from qemu_rbd_set_keypairs(), because the
   strings have already been unescaped by the time they hit this function.

Patch 4:
 * 'rbd-id' becomes 'user'
 * drop the 'keyvalue-pairs' from the QAPI  (both, thanks Daniel)

Patch 5:
 * new patch
 * Adds the 'server' (mon_host) and 'auth_supported' options to the
   QAPI (Thanks Daniel)


Jeff Cody (5):
  block/rbd: don't copy strings in qemu_rbd_next_tok()
  block/rbd: add all the currently supported runtime_opts
  block/rbd: parse all options via bdrv_parse_filename
  block/rbd: add blockdev-add support
  block/rbd: add support for 'mon_host', 'auth_supported' via QAPI

 block/rbd.c  | 553 ++-
 qapi/block-core.json |  62 +-
 2 files changed, 427 insertions(+), 188 deletions(-)

-- 
2.9.3




[Qemu-block] [PATCH v3 1/5] block/rbd: don't copy strings in qemu_rbd_next_tok()

2017-02-27 Thread Jeff Cody
This patch is prep work for parsing options for .bdrv_parse_filename,
and using QDict options.

The function qemu_rbd_next_tok() searched for various key/value pairs,
and copied them into buffers.  This will soon be an unnecessary extra
step, so we will now return found strings by reference only, and
offload the responsibility for safely handling/copying these strings to
the caller.

This also cleans up error handling some, as the callers now rely on
the Error object to determine if there is a parse error.

Reviewed-by: Eric Blake 
Reviewed-by: Markus Armbruster 
Signed-off-by: Jeff Cody 
---
 block/rbd.c | 99 +++--
 1 file changed, 64 insertions(+), 35 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 22e8e69..33c21d8 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -102,10 +102,10 @@ typedef struct BDRVRBDState {
 char *snap;
 } BDRVRBDState;
 
-static int qemu_rbd_next_tok(char *dst, int dst_len,
- char *src, char delim,
- const char *name,
- char **p, Error **errp)
+static char *qemu_rbd_next_tok(int max_len,
+   char *src, char delim,
+   const char *name,
+   char **p, Error **errp)
 {
 int l;
 char *end;
@@ -127,17 +127,15 @@ static int qemu_rbd_next_tok(char *dst, int dst_len,
 }
 }
 l = strlen(src);
-if (l >= dst_len) {
+if (l >= max_len) {
 error_setg(errp, "%s too long", name);
-return -EINVAL;
+return NULL;
 } else if (l == 0) {
 error_setg(errp, "%s too short", name);
-return -EINVAL;
+return NULL;
 }
 
-pstrcpy(dst, dst_len, src);
-
-return 0;
+return src;
 }
 
 static void qemu_rbd_unescape(char *src)
@@ -162,7 +160,9 @@ static int qemu_rbd_parsename(const char *filename,
 {
 const char *start;
 char *p, *buf;
-int ret;
+int ret = 0;
+char *found_str;
+Error *local_err = NULL;
 
 if (!strstart(filename, "rbd:", &start)) {
 error_setg(errp, "File name must start with 'rbd:'");
@@ -174,36 +174,60 @@ static int qemu_rbd_parsename(const char *filename,
 *snap = '\0';
 *conf = '\0';
 
-ret = qemu_rbd_next_tok(pool, pool_len, p,
-'/', "pool name", &p, errp);
-if (ret < 0 || !p) {
+found_str = qemu_rbd_next_tok(pool_len, p,
+  '/', "pool name", &p, &local_err);
+if (local_err) {
+goto done;
+}
+if (!p) {
 ret = -EINVAL;
+error_setg(errp, "Pool name is required");
 goto done;
 }
-qemu_rbd_unescape(pool);
+qemu_rbd_unescape(found_str);
+g_strlcpy(pool, found_str, pool_len);
 
 if (strchr(p, '@')) {
-ret = qemu_rbd_next_tok(name, name_len, p,
-'@', "object name", &p, errp);
-if (ret < 0) {
+found_str = qemu_rbd_next_tok(name_len, p,
+  '@', "object name", &p, &local_err);
+if (local_err) {
 goto done;
 }
-ret = qemu_rbd_next_tok(snap, snap_len, p,
-':', "snap name", &p, errp);
-qemu_rbd_unescape(snap);
+qemu_rbd_unescape(found_str);
+g_strlcpy(name, found_str, name_len);
+
+found_str = qemu_rbd_next_tok(snap_len, p,
+  ':', "snap name", &p, &local_err);
+if (local_err) {
+goto done;
+}
+qemu_rbd_unescape(found_str);
+g_strlcpy(snap, found_str, snap_len);
 } else {
-ret = qemu_rbd_next_tok(name, name_len, p,
-':', "object name", &p, errp);
+found_str = qemu_rbd_next_tok(name_len, p,
+  ':', "object name", &p, &local_err);
+if (local_err) {
+goto done;
+}
+qemu_rbd_unescape(found_str);
+g_strlcpy(name, found_str, name_len);
 }
-qemu_rbd_unescape(name);
-if (ret < 0 || !p) {
+if (!p) {
 goto done;
 }
 
-ret = qemu_rbd_next_tok(conf, conf_len, p,
-'\0', "configuration", &p, errp);
+found_str = qemu_rbd_next_tok(conf_len, p,
+  '\0', "configuration", &p, &local_err);
+if (local_err) {
+goto done;
+}
+g_strlcpy(conf, found_str, conf_len);
 
 done:
+if (local_err) {
+ret = -EINVAL;
+error_propagate(errp, local_err);
+}
 g_free(buf);
 return ret;
 }
@@ -262,17 +286,18 @@ static int qemu_rbd_set_conf(rados_t cluster, const char *conf,
  Error **errp)
 {
 char *p, *buf;
-char name[RBD_MAX_CONF_NAME_SIZE];
-char value[RBD_MAX_CONF_VAL_SIZE];
+char *name;
+char *value;
+Error *local_err = NULL;
 int ret = 0;
 
  

[Qemu-block] [PATCH v3 2/5] block/rbd: add all the currently supported runtime_opts

2017-02-27 Thread Jeff Cody
This adds all the currently supported runtime opts, which
are the options as parsed from the filename.  All of these
options are explicitly checked for during runtime,
with the exception of the "keyvalue-pairs" option.

This option contains all the key/value pairs that the QEMU rbd
driver merely unescapes, and passes along blindly to rados.  This
option is a "legacy" option, and will not be exposed in the QAPI
or available for introspection.

Reviewed-by: Eric Blake 
Signed-off-by: Jeff Cody 
---
 block/rbd.c | 68 -
 1 file changed, 49 insertions(+), 19 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 33c21d8..67d680c 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -357,6 +357,55 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
 }
 }
 
+static QemuOptsList runtime_opts = {
+.name = "rbd",
+.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
+.desc = {
+{
+.name = "filename",
+.type = QEMU_OPT_STRING,
+.help = "Specification of the rbd image",
+},
+{
+.name = "password-secret",
+.type = QEMU_OPT_STRING,
+.help = "ID of secret providing the password",
+},
+{
+.name = "conf",
+.type = QEMU_OPT_STRING,
+.help = "Rados config file location",
+},
+{
+.name = "pool",
+.type = QEMU_OPT_STRING,
+.help = "Rados pool name",
+},
+{
+.name = "image",
+.type = QEMU_OPT_STRING,
+.help = "Image name in the pool",
+},
+{
+.name = "snapshot",
+.type = QEMU_OPT_STRING,
+.help = "Ceph snapshot name",
+},
+{
+/* maps to 'id' in rados_create() */
+.name = "user",
+.type = QEMU_OPT_STRING,
+.help = "Rados id name",
+},
+{
+.name = "keyvalue-pairs",
+.type = QEMU_OPT_STRING,
+.help = "Legacy rados key/value option parameters",
+},
+{ /* end of list */ }
+},
+};
+
 static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 Error *local_err = NULL;
@@ -500,25 +549,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 qemu_aio_unref(acb);
 }
 
-/* TODO Convert to fine grained options */
-static QemuOptsList runtime_opts = {
-.name = "rbd",
-.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
-.desc = {
-{
-.name = "filename",
-.type = QEMU_OPT_STRING,
-.help = "Specification of the rbd image",
-},
-{
-.name = "password-secret",
-.type = QEMU_OPT_STRING,
-.help = "ID of secret providing the password",
-},
-{ /* end of list */ }
-},
-};
-
 static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
  Error **errp)
 {
-- 
2.9.3




Re: [Qemu-block] [PATCH v2 5/5] block/rbd: add support for 'mon_host', 'auth_supported' via QAPI

2017-02-27 Thread Jeff Cody
On Mon, Feb 27, 2017 at 04:47:54PM -0600, Eric Blake wrote:
> On 02/27/2017 12:58 PM, Jeff Cody wrote:
> > This adds support for two additional options that may be specified
> > by QAPI in blockdev-add:
> > 
> > mon_host: servername and port
> > auth_supported: either 'cephx' or 'none'
> 
> Please spell new options with '-'
> 
> > 
> > Signed-off-by: Jeff Cody 
> > ---
> >  block/rbd.c  | 39 +++
> >  qapi/block-core.json |  8 
> >  2 files changed, 47 insertions(+)
> > 
> > diff --git a/block/rbd.c b/block/rbd.c
> > index e04a5e1..51e971e 100644
> > --- a/block/rbd.c
> > +++ b/block/rbd.c
> > @@ -394,6 +394,18 @@ static QemuOptsList runtime_opts = {
> >  .name = "keyvalue-pairs",
> >  .type = QEMU_OPT_STRING,
> >  },
> > +{
> > +.name = "server.host",
> > +.type = QEMU_OPT_STRING,
> > +},
> > +{
> > +.name = "server.port",
> > +.type = QEMU_OPT_STRING,
> > +},
> 
> See Dan's comment about supporting more than one server via an array in
> QAPI.
> 
> > +{
> > +.name = "auth_supported",
> 
> Should be auth-supported in QAPI.
> 
> 
> > @@ -604,6 +620,29 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
> > *options, int flags,
> >  goto failed_shutdown;
> >  }
> >  
> > +/* if mon_host was specified */
> > +if (host) {
> > +const char *hostname = host;
> > +char *mon_host = NULL;
> > +
> > +if (port) {
> > +mon_host = g_strdup_printf("%s:%s", host, port);
> 
> Does Ceph care about IPv6 (in which case you may need [host]:port when
> host itself includes ':')?
>

Some quick sanity testing seems to show that it does not need [] for ipv6
addresses, which is nice.


> > +hostname = mon_host;
> > +}
> > +r = rados_conf_set(s->cluster, "mon_host", hostname);
> > +g_free(mon_host);
> > +if (r < 0) {
> > +goto failed_shutdown;
> > +}
> > +}
> > +
> > +if (auth_supported) {
> > +r = rados_conf_set(s->cluster, "auth_supported", auth_supported);
> 
> Translating QAPI auth-supported to rados auth_supported is fine.
> 
> 
> > +++ b/qapi/block-core.json
> > @@ -2680,6 +2680,12 @@
> >  #
> >  # @user:   #optional Ceph id name.
> >  #
> > +# @server: #optional Monitor host address and port.  This maps
> > +#  to the "mon_host" Ceph option.
> > +#
> > +# @auth_supported: #optional Authentication supported.
> > +#  Either "cephx" or"none".
> 
> Missing a space.
> 
> If you're going to support only a finite set of strings, this should be
> a QAPI enum type, not 'str'.
> 
> > +#
> >  # @password-secret:#optional The ID of a QCryptoSecret object providing
> >  #   the password for the login.
> >  #
> > @@ -2691,6 +2697,8 @@
> >  '*conf': 'str',
> >  '*snapshot': 'str',
> >  '*user': 'str',
> > +'*server': 'InetSocketAddress',
> > +'*auth_supported': 'str',
> >  '*password-secret': 'str' } }
> >  
> >  ##
> > 
> 
> Looks like we'll need a v3 to tweak this one.
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 






Re: [Qemu-block] [Qemu-devel] [PATCH v16 08/22] qcow2: add bitmaps extension

2017-02-27 Thread John Snow


On 02/27/2017 07:27 AM, Max Reitz wrote:
> On 25.02.2017 18:07, Vladimir Sementsov-Ogievskiy wrote:
>> Add bitmap extension as specified in docs/specs/qcow2.txt.
>> For now, just mirror extension header into Qcow2 state and check
>> constraints. Also, calculate refcounts for qcow2 bitmaps, to not break
>> qemu-img check.
>>
>> For now, disable image resize if it has bitmaps. It will be fixed later.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>  block/Makefile.objs|   2 +-
>>  block/qcow2-bitmap.c   | 439 
>> +
>>  block/qcow2-refcount.c |   6 +
>>  block/qcow2.c  | 124 +-
>>  block/qcow2.h  |  27 +++
>>  5 files changed, 592 insertions(+), 6 deletions(-)
>>  create mode 100644 block/qcow2-bitmap.c
> 
> Somehow I have the feeling Kevin will find bad things in this patch, but
> since I have already approved of all of the previous patches this one is
> composed of and the changes on top of that look OK to me:
> 
> Reviewed-by: Max Reitz 
> 

Pretty much the same sentiment as Max.

The patchset is now organized a bit strangely, but I did R-B all the
component pieces before, and the changes look fine.


Reviewed-by: John Snow 



Re: [Qemu-block] [PATCH v2 5/5] block/rbd: add support for 'mon_host', 'auth_supported' via QAPI

2017-02-27 Thread Jeff Cody
On Mon, Feb 27, 2017 at 04:47:54PM -0600, Eric Blake wrote:
> On 02/27/2017 12:58 PM, Jeff Cody wrote:
> > This adds support for two additional options that may be specified
> > by QAPI in blockdev-add:
> > 
> > mon_host: servername and port
> > auth_supported: either 'cephx' or 'none'
> 
> Please spell new options with '-'
> 

OK, thanks.

> > 
> > Signed-off-by: Jeff Cody 
> > ---
> >  block/rbd.c  | 39 +++
> >  qapi/block-core.json |  8 
> >  2 files changed, 47 insertions(+)
> > 
> > diff --git a/block/rbd.c b/block/rbd.c
> > index e04a5e1..51e971e 100644
> > --- a/block/rbd.c
> > +++ b/block/rbd.c
> > @@ -394,6 +394,18 @@ static QemuOptsList runtime_opts = {
> >  .name = "keyvalue-pairs",
> >  .type = QEMU_OPT_STRING,
> >  },
> > +{
> > +.name = "server.host",
> > +.type = QEMU_OPT_STRING,
> > +},
> > +{
> > +.name = "server.port",
> > +.type = QEMU_OPT_STRING,
> > +},
> 
> See Dan's comment about supporting more than one server via an array in
> QAPI.
> 
> > +{
> > +.name = "auth_supported",
> 
> Should be auth-supported in QAPI.
> 
> 

OK

> > @@ -604,6 +620,29 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
> > *options, int flags,
> >  goto failed_shutdown;
> >  }
> >  
> > +/* if mon_host was specified */
> > +if (host) {
> > +const char *hostname = host;
> > +char *mon_host = NULL;
> > +
> > +if (port) {
> > +mon_host = g_strdup_printf("%s:%s", host, port);
> 
> Does Ceph care about IPv6 (in which case you may need [host]:port when
> host itself includes ':')?
> 

Good question - it looks like it can be enabled in the conf file at least,
from this: 

http://docs.ceph.com/docs/master/install/manual-deployment/


> > +hostname = mon_host;
> > +}
> > +r = rados_conf_set(s->cluster, "mon_host", hostname);
> > +g_free(mon_host);
> > +if (r < 0) {
> > +goto failed_shutdown;
> > +}
> > +}
> > +
> > +if (auth_supported) {
> > +r = rados_conf_set(s->cluster, "auth_supported", auth_supported);
> 
> Translating QAPI auth-supported to rados auth_supported is fine.
> 
> 
> > +++ b/qapi/block-core.json
> > @@ -2680,6 +2680,12 @@
> >  #
> >  # @user:   #optional Ceph id name.
> >  #
> > +# @server: #optional Monitor host address and port.  This maps
> > +#  to the "mon_host" Ceph option.
> > +#
> > +# @auth_supported: #optional Authentication supported.
> > +#  Either "cephx" or"none".
> 
> Missing a space.
> 
> If you're going to support only a finite set of strings, this should be
> a QAPI enum type, not 'str'.
>

OK, I'll make it an enum (and fix the space, of course).

> > +#
> >  # @password-secret:#optional The ID of a QCryptoSecret object providing
> >  #   the password for the login.
> >  #
> > @@ -2691,6 +2697,8 @@
> >  '*conf': 'str',
> >  '*snapshot': 'str',
> >  '*user': 'str',
> > +'*server': 'InetSocketAddress',
> > +'*auth_supported': 'str',
> >  '*password-secret': 'str' } }
> >  
> >  ##
> > 
> 
> Looks like we'll need a v3 to tweak this one.
>

Yep - working on that now.  Thanks!

-Jeff




Re: [Qemu-block] [PATCH v2 3/5] block/rbd: parse all options via bdrv_parse_filename

2017-02-27 Thread Eric Blake
On 02/27/2017 04:56 PM, Jeff Cody wrote:

>>>  static BlockDriver bdrv_rbd = {
> 
>>> -.instance_size  = sizeof(BDRVRBDState),
>>> -.bdrv_needs_filename = true,
>>> -.bdrv_file_open = qemu_rbd_open,
>>> -.bdrv_close = qemu_rbd_close,
>>> -.bdrv_create= qemu_rbd_create,
>>> -.bdrv_has_zero_init = bdrv_has_zero_init_1,
>>> -.bdrv_get_info  = qemu_rbd_getinfo,
>>> -.create_opts= &qemu_rbd_create_opts,
>>> -.bdrv_getlength = qemu_rbd_getlength,
>>> -.bdrv_truncate  = qemu_rbd_truncate,
>>> -.protocol_name  = "rbd",
>>> +.format_name= "rbd",
>>> +.instance_size  = sizeof(BDRVRBDState),
>>> +.bdrv_parse_filename= qemu_rbd_parse_filename,
>>> +.bdrv_file_open = qemu_rbd_open,
>>> +.bdrv_close = qemu_rbd_close,
>>> +.bdrv_create= qemu_rbd_create,
>>> +.bdrv_has_zero_init = bdrv_has_zero_init_1,
>>> +.bdrv_get_info  = qemu_rbd_getinfo,
>>> +.create_opts= &qemu_rbd_create_opts,
>>
>> Pointless &; might as well remove it for consistency while touching it.
>>
> 
> I'm not sure I understand - we need the '&' here for the .create_opts
> initializer, unless I am overlooking something.

Nope, I'm overlooking. I assumed that everything here was a function
pointer, but you are right that .create_opts takes an object (not a
function) pointer, so the & is necessary.  Maybe I need to take a short
break from reviewing...
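
For readers following the '&' exchange above, the distinction can be sketched in a few lines of C: a function name converts to a function pointer on its own, so '&' is optional there, whereas taking the address of an object requires it. All names below (`demo_driver`, `demo_opts`, `demo_open`) are hypothetical and only mirror the shape of the initializer under discussion.

```c
/* Hypothetical option list type; stands in for an options object. */
struct demo_opts {
    int dummy;
};

/* Hypothetical driver table with one function pointer and one object pointer. */
struct demo_driver {
    int (*open)(void);              /* function pointer member */
    struct demo_opts *create_opts;  /* object pointer member */
};

static int demo_open(void)
{
    return 0;
}

static struct demo_opts demo_create_opts = { 42 };

static struct demo_driver drv_a = {
    .open        = demo_open,          /* function name converts to a pointer */
    .create_opts = &demo_create_opts,  /* an object needs an explicit '&' */
};

static struct demo_driver drv_b = {
    .open        = &demo_open,         /* '&' is allowed here but redundant */
    .create_opts = &demo_create_opts,
};
```

Both tables end up with identical pointers; only the object member would fail to compile without the '&'.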

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [PATCH v2 3/5] block/rbd: parse all options via bdrv_parse_filename

2017-02-27 Thread Jeff Cody
On Mon, Feb 27, 2017 at 04:35:58PM -0600, Eric Blake wrote:
> On 02/27/2017 12:58 PM, Jeff Cody wrote:
> > Get rid of qemu_rbd_parsename in favor of bdrv_parse_filename.
> > This simplifies a lot of the parsing as well, as we can treat everything
> > a bit simpler since nonexistent options are simply NULL pointers instead
> > of empy strings.
> 
> s/empy/empty/
>

Thanks

> > 
> > An important item to note:
> > 
> > Ceph has many extra option values that can be specified as key/value
> > pairs.  This was handled previously in the driver by extracting the
> > values that the QEMU driver cared about, and then blindly passing all
> > extra options to rbd after splitting them into key/value pairs, and
> > cleaning up any special character escaping.
> > 
> > The practice is continued in this patch; there is an option
> > "keyvalue-pairs" that is populated with all the key/value pairs that the
> > QEMU driver does not care about.  These key/value pairs will override
> > any settings in the 'conf' configuration file, just as they did before.
> > 
> > Signed-off-by: Jeff Cody 
> > ---
> >  block/rbd.c | 298 
> > ++--
> >  1 file changed, 148 insertions(+), 150 deletions(-)
> > 
> 
> > +
> > +/* The following are essentially all key/value pairs, and we treat
> > + * 'id' and 'conf' a bit special.  Key/value pairs may be in any order. */
> > +while (p) {
> 
> > +if (!strcmp(name, "conf")) {
> > +qdict_put(options, "conf", qstring_from_str(value));
> > +} else if (!strcmp(name, "id")) {
> > +qdict_put(options, "user" , qstring_from_str(value));
> > +} else {
> > +char *tmp = g_malloc0(max_keypair_size);
> > +/* only use a delimiter if it is not the first keypair found */
> > +/* These are sets of unknown key/value pairs we'll pass along
> > + * to ceph */
> > +if (keypairs[0]) {
> > +snprintf(tmp, max_keypair_size, ":%s=%s", name, value);
> > +pstrcat(keypairs, max_keypair_size, tmp);
> > +} else {
> > +snprintf(keypairs, max_keypair_size, "%s=%s", name, value);
> > +}
> > +g_free(tmp);
> > +}
> > +}
> > +
> > +if (keypairs[0]) {
> > +qdict_put(options, "keyvalue-pairs", qstring_from_str(keypairs));
> 
> Uggh.  Why are we compressing this into a single string, instead of
> using a GList?  True, we aren't exposing it through QAPI, but I still
> wonder if a smarter representation than a flat string is warranted.
> 

Yes, a bit gross.  I wouldn't mind cleaning it up (and maybe some other rbd
cleanup as well), but for this series I wanted to keep the "other" parsing
the same as before, and just try to (as much as possible) layer the
blockdev-add QAPI on top.

As you suggest below, I'll add a 'FIXME' here detailing that this is left in
place as legacy code, and should be cleaned up into something more palatable
like a GList (or at least _some_ sort of structured data).

> > @@ -434,35 +421,55 @@ static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
> >  if (objsize) {
> >  if ((objsize - 1) & objsize) {/* not a power of 2? */
> 
> Drive-by comment (if you fix it, do it as a separate followup patch): we
> have is_power_of_2() to make code like this more legible.
>

I'll add that to my post 2.9 rbd cleanup queue, thanks.
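For reference, a standalone sketch of the check such a helper performs. QEMU's own is_power_of_2() is the one to use in the tree; the function below is a hypothetical stand-in written from the usual bit trick, not a copy of QEMU's source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: true when exactly one bit is set in value.
 * Zero is reported as "not a power of 2". */
static bool sketch_is_power_of_2(uint64_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}
```

The open-coded test in qemu_rbd_create() relies on the enclosing "if (objsize)" to exclude zero; the helper form folds that case into the check itself.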

> >  
> >  static BlockDriver bdrv_rbd = {

> > -.instance_size  = sizeof(BDRVRBDState),
> > -.bdrv_needs_filename = true,
> > -.bdrv_file_open = qemu_rbd_open,
> > -.bdrv_close = qemu_rbd_close,
> > -.bdrv_create= qemu_rbd_create,
> > -.bdrv_has_zero_init = bdrv_has_zero_init_1,
> > -.bdrv_get_info  = qemu_rbd_getinfo,
> > -.create_opts= &qemu_rbd_create_opts,
> > -.bdrv_getlength = qemu_rbd_getlength,
> > -.bdrv_truncate  = qemu_rbd_truncate,
> > -.protocol_name  = "rbd",
> > +.format_name= "rbd",
> > +.instance_size  = sizeof(BDRVRBDState),
> > +.bdrv_parse_filename= qemu_rbd_parse_filename,
> > +.bdrv_file_open = qemu_rbd_open,
> > +.bdrv_close = qemu_rbd_close,
> > +.bdrv_create= qemu_rbd_create,
> > +.bdrv_has_zero_init = bdrv_has_zero_init_1,
> > +.bdrv_get_info  = qemu_rbd_getinfo,
> > +.create_opts= &qemu_rbd_create_opts,
> 
> Pointless &; might as well remove it for consistency while touching it.
> 

I'm not sure I understand - we need the '&' here for the .create_opts
initializer, unless I am overlooking something.

> > +.bdrv_getlength = qemu_rbd_getlength,
> > +.bdrv_truncate  = qemu_rbd_truncate,
> > +.protocol_name  = "rbd",
> >  
> 
> I don't know if it is worth respinning to change keyvalue-pairs into a
> more 

Re: [Qemu-block] [PATCH v2 5/5] block/rbd: add support for 'mon_host', 'auth_supported' via QAPI

2017-02-27 Thread Eric Blake
On 02/27/2017 12:58 PM, Jeff Cody wrote:
> This adds support for two additional options that may be specified
> by QAPI in blockdev-add:
> 
> mon_host: servername and port
> auth_supported: either 'cephx' or 'none'

Please spell new options with '-'

> 
> Signed-off-by: Jeff Cody 
> ---
>  block/rbd.c  | 39 +++
>  qapi/block-core.json |  8 
>  2 files changed, 47 insertions(+)
> 
> diff --git a/block/rbd.c b/block/rbd.c
> index e04a5e1..51e971e 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -394,6 +394,18 @@ static QemuOptsList runtime_opts = {
>  .name = "keyvalue-pairs",
>  .type = QEMU_OPT_STRING,
>  },
> +{
> +.name = "server.host",
> +.type = QEMU_OPT_STRING,
> +},
> +{
> +.name = "server.port",
> +.type = QEMU_OPT_STRING,
> +},

See Dan's comment about supporting more than one server via an array in
QAPI.

> +{
> +.name = "auth_supported",

Should be auth-supported in QAPI.


> @@ -604,6 +620,29 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
>  goto failed_shutdown;
>  }
>  
> +/* if mon_host was specified */
> +if (host) {
> +const char *hostname = host;
> +char *mon_host = NULL;
> +
> +if (port) {
> +mon_host = g_strdup_printf("%s:%s", host, port);

Does Ceph care about IPv6 (in which case you may need [host]:port when
host itself includes ':')?
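A minimal sketch of the bracketing in question, assuming that "[host]:port" is the form wanted for IPv6 literals. Whether Ceph expects that form is the open question above, so treat this as an illustration only. format_host_port() is a hypothetical helper, not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "host:port" into buf, wrapping host in
 * brackets when it contains ':' and is not already bracketed.
 * Returns 0 on success, -1 if the result did not fit in buf. */
static int format_host_port(char *buf, size_t size,
                            const char *host, const char *port)
{
    int n;

    if (strchr(host, ':') && host[0] != '[') {
        n = snprintf(buf, size, "[%s]:%s", host, port);
    } else {
        n = snprintf(buf, size, "%s:%s", host, port);
    }
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```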

> +hostname = mon_host;
> +}
> +r = rados_conf_set(s->cluster, "mon_host", hostname);
> +g_free(mon_host);
> +if (r < 0) {
> +goto failed_shutdown;
> +}
> +}
> +
> +if (auth_supported) {
> +r = rados_conf_set(s->cluster, "auth_supported", auth_supported);

Translating QAPI auth-supported to rados auth_supported is fine.


> +++ b/qapi/block-core.json
> @@ -2680,6 +2680,12 @@
>  #
>  # @user:   #optional Ceph id name.
>  #
> +# @server: #optional Monitor host address and port.  This maps
> +#  to the "mon_host" Ceph option.
> +#
> +# @auth_supported: #optional Authentication supported.
> +#  Either "cephx" or"none".

Missing a space.

If you're going to support only a finite set of strings, this should be
a QAPI enum type, not 'str'.
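To illustrate what an enum buys over a free-form string: in QAPI the enum would be declared in the schema and the string lookup generated for you. The hand-written C below only sketches the resulting behaviour, and every name in it (AuthSketch, auth_sketch_parse) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum covering the two values named in the patch. */
typedef enum {
    AUTH_SKETCH_CEPHX,
    AUTH_SKETCH_NONE,
    AUTH_SKETCH_MAX
} AuthSketch;

static const char *const auth_sketch_names[AUTH_SKETCH_MAX] = {
    [AUTH_SKETCH_CEPHX] = "cephx",
    [AUTH_SKETCH_NONE]  = "none",
};

/* Map a string to its enum value.
 * Returns 0 and stores the value in *out on a match, -1 otherwise. */
static int auth_sketch_parse(const char *str, AuthSketch *out)
{
    int i;

    for (i = 0; i < AUTH_SKETCH_MAX; i++) {
        if (strcmp(str, auth_sketch_names[i]) == 0) {
            *out = (AuthSketch)i;
            return 0;
        }
    }
    return -1;
}
```

Anything outside the finite set is rejected at parse time instead of being passed through to rados_conf_set() unchecked.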

> +#
>  # @password-secret:#optional The ID of a QCryptoSecret object providing
>  #   the password for the login.
>  #
> @@ -2691,6 +2697,8 @@
>  '*conf': 'str',
>  '*snapshot': 'str',
>  '*user': 'str',
> +'*server': 'InetSocketAddress',
> +'*auth_supported': 'str',
>  '*password-secret': 'str' } }
>  
>  ##
> 

Looks like we'll need a v3 to tweak this one.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [PATCH v2 3/5] block/rbd: parse all options via bdrv_parse_filename

2017-02-27 Thread Eric Blake
On 02/27/2017 12:58 PM, Jeff Cody wrote:
> Get rid of qemu_rbd_parsename in favor of bdrv_parse_filename.
> This simplifies a lot of the parsing as well, as we can treat everything
> a bit simpler since nonexistent options are simply NULL pointers instead
> of empy strings.

s/empy/empty/

> 
> An important item to note:
> 
> Ceph has many extra option values that can be specified as key/value
> pairs.  This was handled previously in the driver by extracting the
> values that the QEMU driver cared about, and then blindly passing all
> extra options to rbd after splitting them into key/value pairs, and
> cleaning up any special character escaping.
> 
> The practice is continued in this patch; there is an option
> "keyvalue-pairs" that is populated with all the key/value pairs that the
> QEMU driver does not care about.  These key/value pairs will override
> any settings in the 'conf' configuration file, just as they did before.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block/rbd.c | 298 ++--
>  1 file changed, 148 insertions(+), 150 deletions(-)
> 

> +
> +/* The following are essentially all key/value pairs, and we treat
> + * 'id' and 'conf' a bit special.  Key/value pairs may be in any order. */
> +while (p) {

> +if (!strcmp(name, "conf")) {
> +qdict_put(options, "conf", qstring_from_str(value));
> +} else if (!strcmp(name, "id")) {
> +qdict_put(options, "user" , qstring_from_str(value));
> +} else {
> +char *tmp = g_malloc0(max_keypair_size);
> +/* only use a delimiter if it is not the first keypair found */
> +/* These are sets of unknown key/value pairs we'll pass along
> + * to ceph */
> +if (keypairs[0]) {
> +snprintf(tmp, max_keypair_size, ":%s=%s", name, value);
> +pstrcat(keypairs, max_keypair_size, tmp);
> +} else {
> +snprintf(keypairs, max_keypair_size, "%s=%s", name, value);
> +}
> +g_free(tmp);
> +}
> +}
> +
> +if (keypairs[0]) {
> +qdict_put(options, "keyvalue-pairs", qstring_from_str(keypairs));

Uggh.  Why are we compressing this into a single string, instead of
using a GList?  True, we aren't exposing it through QAPI, but I still
wonder if a smarter representation than a flat string is warranted.

> @@ -434,35 +421,55 @@ static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
>  if (objsize) {
>  if ((objsize - 1) & objsize) {/* not a power of 2? */

Drive-by comment (if you fix it, do it as a separate followup patch): we
have is_power_of_2() to make code like this more legible.

>  
>  static BlockDriver bdrv_rbd = {
> -.format_name= "rbd",
> -.instance_size  = sizeof(BDRVRBDState),
> -.bdrv_needs_filename = true,
> -.bdrv_file_open = qemu_rbd_open,
> -.bdrv_close = qemu_rbd_close,
> -.bdrv_create= qemu_rbd_create,
> -.bdrv_has_zero_init = bdrv_has_zero_init_1,
> -.bdrv_get_info  = qemu_rbd_getinfo,
> -.create_opts= &qemu_rbd_create_opts,
> -.bdrv_getlength = qemu_rbd_getlength,
> -.bdrv_truncate  = qemu_rbd_truncate,
> -.protocol_name  = "rbd",
> +.format_name= "rbd",
> +.instance_size  = sizeof(BDRVRBDState),
> +.bdrv_parse_filename= qemu_rbd_parse_filename,
> +.bdrv_file_open = qemu_rbd_open,
> +.bdrv_close = qemu_rbd_close,
> +.bdrv_create= qemu_rbd_create,
> +.bdrv_has_zero_init = bdrv_has_zero_init_1,
> +.bdrv_get_info  = qemu_rbd_getinfo,
> +.create_opts= &qemu_rbd_create_opts,

Pointless &; might as well remove it for consistency while touching it.

> +.bdrv_getlength = qemu_rbd_getlength,
> +.bdrv_truncate  = qemu_rbd_truncate,
> +.protocol_name  = "rbd",
>  

I don't know if it is worth respinning to change keyvalue-pairs into a
more appropriate data type; given our desire to make blockdev-add stable
for 2.9 and the fact that keyvalue-pairs is not exposed to QAPI, I can
live with passing around a flat string.  You may want to add FIXME
comments to call attention to the fact that we know it is gross but why
we do it anyways.

But since adding comments, and fixing minor things like &, doesn't
change the real meat of this patch, I can live with:

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [PATCH v2 2/5] block/rbd: add all the currently supported runtime_opts

2017-02-27 Thread Jeff Cody
On Mon, Feb 27, 2017 at 04:18:57PM -0600, Eric Blake wrote:
> On 02/27/2017 12:58 PM, Jeff Cody wrote:
> > This adds all the currently supported runtime opts, which
> > are the options as parsed from the filename.  All of these
> > options are explicitly checked for during runtime,
> > with an exception to the "keyvalue-pairs" option.
> > 
> > This option contains all the key/value pairs that the QEMU rbd
> > driver merely unescapes, and passes along blindly to rados.
> 
> Maybe worth adding a comment that keyvalue-pairs will NOT be exposed in
> QAPI in the later patches, making it command-line only and
> non-introspectible.
>

Yes, I will do that.


> > 
> > Signed-off-by: Jeff Cody 
> > ---
> >  block/rbd.c | 62 ++---
> >  1 file changed, 43 insertions(+), 19 deletions(-)
> > 
> 
> > +static QemuOptsList runtime_opts = {
> > +.name = "rbd",
> > +.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
> > +.desc = {
> > +{
> > +.name = "filename",
> > +.type = QEMU_OPT_STRING,
> > +.help = "Specification of the rbd image",
> > +},
> > +{
> > +.name = "password-secret",
> > +.type = QEMU_OPT_STRING,
> > +.help = "ID of secret providing the password",
> > +},
> > +{
> > +.name = "conf",
> 
> Is "conf" the best name, or do we want "configuration"?
> 

I chose "conf" because it matches the rados option name (and the command
line option name; it is of the form "conf=filename").

> > +.type = QEMU_OPT_STRING,
> > +},
> 
> Worth documenting all the options?
> 

Yes, probably so - and especially to map them up with what rados/ceph
options they correspond to.

> I'm not seeing where "keyvalue-pairs" is used yet, but assume it is in a
> later patch. But assuming the QAPI version in a later patch matches,
> other than keyvalue-pairs, I think you're okay.
> 

Yep, the next patch (where we switch over to .bdrv_parse_filename()).

> Reviewed-by: Eric Blake 
>

Thanks!

-Jeff



Re: [Qemu-block] [Qemu-devel] [PATCH] option: Tweak invalid size error message and unbreak iotest 049

2017-02-27 Thread Christian Borntraeger
On 02/27/2017 01:55 PM, Markus Armbruster wrote:
> Commit 75cdcd1 neglected to update tests/qemu-iotests/049.out, and
> made the error message for negative size worse.  Fix that.
> 
> Reported-by: Thomas Huth 
> Signed-off-by: Markus Armbruster 
Tested-by: Christian Borntraeger 


> ---
>  tests/qemu-iotests/049.out | 14 +-
>  util/qemu-option.c |  2 +-
>  2 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
> index 4673b67..34e66db 100644
> --- a/tests/qemu-iotests/049.out
> +++ b/tests/qemu-iotests/049.out
> @@ -95,14 +95,14 @@ qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- -1024
>  qemu-img: Image size must be less than 8 EiB!
> 
>  qemu-img create -f qcow2 -o size=-1024 TEST_DIR/t.qcow2
> -qemu-img: Parameter 'size' expects a non-negative number below 2^64
> +qemu-img: Value '-1024' is out of range for parameter 'size'
>  qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
> 
>  qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- -1k
>  qemu-img: Image size must be less than 8 EiB!
> 
>  qemu-img create -f qcow2 -o size=-1k TEST_DIR/t.qcow2
> -qemu-img: Parameter 'size' expects a non-negative number below 2^64
> +qemu-img: Value '-1k' is out of range for parameter 'size'
>  qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
> 
>  qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1kilobyte
> @@ -110,15 +110,19 @@ qemu-img: Invalid image size specified! You may use k, 
> M, G, T, P or E suffixes
>  qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.
> 
>  qemu-img create -f qcow2 -o size=1kilobyte TEST_DIR/t.qcow2
> -Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
> cluster_size=65536 lazy_refcounts=off refcount_bits=16
> +qemu-img: Parameter 'size' expects a non-negative number below 2^64
> +Optional suffix k, M, G, T, P or E means kilo-, mega-, giga-, tera-, peta-
> +and exabytes, respectively.
> +qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
> 
>  qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- foobar
>  qemu-img: Invalid image size specified! You may use k, M, G, T, P or E 
> suffixes for
>  qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.
> 
>  qemu-img create -f qcow2 -o size=foobar TEST_DIR/t.qcow2
> -qemu-img: Parameter 'size' expects a size
> -You may use k, M, G or T suffixes for kilobytes, megabytes, gigabytes and 
> terabytes.
> +qemu-img: Parameter 'size' expects a non-negative number below 2^64
> +Optional suffix k, M, G, T, P or E means kilo-, mega-, giga-, tera-, peta-
> +and exabytes, respectively.
>  qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
> 
>  == Check correct interpretation of suffixes for cluster size ==
> diff --git a/util/qemu-option.c b/util/qemu-option.c
> index 419f252..5ce1b5c 100644
> --- a/util/qemu-option.c
> +++ b/util/qemu-option.c
> @@ -179,7 +179,7 @@ void parse_option_size(const char *name, const char 
> *value,
> 
>  err = qemu_strtosz(value, NULL, &size);
>  if (err == -ERANGE) {
> -error_setg(errp, "Value '%s' is too large for parameter '%s'",
> +error_setg(errp, "Value '%s' is out of range for parameter '%s'",
> value, name);
>  return;
>  }
> 




Re: [Qemu-block] [PATCH v2 1/5] block/rbd: don't copy strings in qemu_rbd_next_tok()

2017-02-27 Thread Eric Blake
On 02/27/2017 12:58 PM, Jeff Cody wrote:
> This patch is prep work for parsing options for .bdrv_parse_filename,
> and using QDict options.
> 
> The function qemu_rbd_next_tok() searched for various key/value pairs,
> and copied them into buffers.  This will soon be an unnecessary extra
> step, so we will now return found strings by reference only, and
> offload the responsibility for safely handling/copying these strings to
> the caller.
> 
> This also cleans up error handling some, as the callers now rely on
> the Error object to determine if there is a parse error.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block/rbd.c | 99 +++--
>  1 file changed, 64 insertions(+), 35 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [PATCH V2] qemu-img: make convert async

2017-02-27 Thread Eric Blake
On 02/27/2017 05:03 AM, Peter Lieven wrote:
> the convert process is currently completely implemented with sync operations.
> That means it reads one buffer and then writes it. No parallelism and each 
> sync
> request takes as long as it takes until it is completed.
> 

> 
> This patch introduces 2 new cmdline parameters. The -m parameter to specify
> the number of coroutines running in parallel (defaults to 8). And the -W 
> paremeter to

s/paremeter/parameter/

> allow qemu-img to write to the target out of order rather than sequential. 
> This improves
> performance as the writes do not have to wait for each other to complete.
> 
> Signed-off-by: Peter Lieven 
> ---

> @@ -1798,7 +1908,7 @@ static int img_convert(int argc, char **argv)
>  {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
>  {0, 0, 0, 0}
>  };
> -c = getopt_long(argc, argv, "hf:O:B:ce6o:s:l:S:pt:T:qn",
> +c = getopt_long(argc, argv, "hf:O:B:ce6o:s:l:S:pt:T:qnm:W",
>  long_options, NULL);
>  if (c == -1) {
>  break;
> @@ -1890,6 +2000,18 @@ static int img_convert(int argc, char **argv)
>  case 'n':
>  skip_create = 1;
>  break;
> +case 'm':
> +num_coroutines = atoi(optarg);

atoi() should be avoided. It has no error checking, so it treats '-m 1'
and '-m 1k' identically.  You are a bit justified in that '-m junk' gets
treated like '-m 0' and rejected, but it's still a poor error message in
that case.
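A minimal sketch of the checked alternative, using only standard strtol(). QEMU has its own wrappers (such as qemu_strtol()) that would be the natural choice in the tree; parse_int_range() below is a hypothetical helper for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse s as a base-10 integer within [min, max].
 * Returns 0 and stores the value in *out on success; returns -1 on
 * empty input, trailing junk, overflow, or an out-of-range value. */
static int parse_int_range(const char *s, int min, int max, int *out)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE) {
        return -1;
    }
    if (val < min || val > max) {
        return -1;
    }
    *out = (int)val;
    return 0;
}
```

Unlike atoi(), this distinguishes '-m 1' from '-m 1k', and lets the caller report '-m junk' as malformed input rather than as an out-of-range count.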

> +if (num_coroutines < 1 || num_coroutines > MAX_COROUTINES) {
> +error_report("Allowed number of coroutines is between 1 and 
> %d",
> + MAX_COROUTINES);

> +++ b/qemu-img.texi
> @@ -137,6 +137,12 @@ Parameters to convert subcommand:
>  
>  @item -n
>  Skip the creation of the target volume
> +@item -m
> +Number of parallel coroutines for the convert process
> +@item -W
> +Allow to write out of order to the destination. This is option improves 
> performance,

Grammar suggestion:

Allow out-of-order writes to the destination.  This option ...

> +but is only recommened for preallocated devices like host devices or other

s/recommened/recommended/

> +raw block devices.
>  @end table
>  


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [PATCH 12/17] migration: add postcopy migration of dirty bitmaps

2017-02-27 Thread Dr. David Alan Gilbert
* Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
> 24.02.2017 16:26, Dr. David Alan Gilbert wrote:
> > * Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
> > > Postcopy migration of dirty bitmaps. Only named dirty bitmaps,
> > > associated with root nodes and non-root named nodes are migrated.
> > > 
> > > If destination qemu is already containing a dirty bitmap with the same 
> > > name
> > > > as a migrated bitmap (for the same node), then, if their granularities are
> > > the same the migration will be done, otherwise the error will be 
> > > generated.
> > > 
> > > If destination qemu doesn't contain such bitmap it will be created.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > > ---
> > >   include/migration/block.h  |   1 +
> > >   include/migration/migration.h  |   4 +
> > >   migration/Makefile.objs|   2 +-
> > > >   migration/block-dirty-bitmap.c | 679 +
> > >   migration/migration.c  |   3 +
> > >   migration/savevm.c |   2 +
> > >   migration/trace-events |  14 +
> > >   vl.c   |   1 +
> > >   8 files changed, 705 insertions(+), 1 deletion(-)
> > >   create mode 100644 migration/block-dirty-bitmap.c
> > > 
> > > diff --git a/include/migration/block.h b/include/migration/block.h
> > > index 41a1ac8..8333c43 100644
> > > --- a/include/migration/block.h
> > > +++ b/include/migration/block.h
> > > @@ -14,6 +14,7 @@
> > >   #ifndef MIGRATION_BLOCK_H
> > >   #define MIGRATION_BLOCK_H
> > > +void dirty_bitmap_mig_init(void);
> > >   void blk_mig_init(void);
> > >   int blk_mig_active(void);
> > >   uint64_t blk_mig_bytes_transferred(void);
> > > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > index 46645f4..03a4993 100644
> > > --- a/include/migration/migration.h
> > > +++ b/include/migration/migration.h
> > > @@ -371,4 +371,8 @@ int ram_save_queue_pages(MigrationState *ms, const 
> > > char *rbname,
> > >   PostcopyState postcopy_state_get(void);
> > >   /* Set the state and return the old state */
> > >   PostcopyState postcopy_state_set(PostcopyState new_state);
> > > +
> > > +void dirty_bitmap_mig_before_vm_start(void);
> > > +void init_dirty_bitmap_incoming_migration(void);
> > > +
> > >   #endif
> > > diff --git a/migration/Makefile.objs b/migration/Makefile.objs
> > > index 480dd49..fa3bf6a 100644
> > > --- a/migration/Makefile.objs
> > > +++ b/migration/Makefile.objs
> > > @@ -9,5 +9,5 @@ common-obj-y += qjson.o
> > >   common-obj-$(CONFIG_RDMA) += rdma.o
> > > -common-obj-y += block.o
> > > +common-obj-y += block.o block-dirty-bitmap.o
> > > diff --git a/migration/block-dirty-bitmap.c 
> > > b/migration/block-dirty-bitmap.c
> > > new file mode 100644
> > > index 000..28e3732
> > > --- /dev/null
> > > +++ b/migration/block-dirty-bitmap.c
> > > @@ -0,0 +1,679 @@
> > > +/*
> > > + * Block dirty bitmap postcopy migration
> > > + *
> > > + * Copyright IBM, Corp. 2009
> > > + * Copyright (C) 2016 Parallels IP Holdings GmbH. All rights reserved.
> > > + *
> > > + * Authors:
> > > + *  Liran Schour   
> > > + *  Vladimir Sementsov-Ogievskiy 
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2.  See
> > > + * the COPYING file in the top-level directory.
> > > > + * This file is derived from migration/block.c, so its author and IBM copyright
> > > + * are here, although content is quite different.
> > > + *
> > > + * Contributions after 2012-01-13 are licensed under the terms of the
> > > + * GNU GPL, version 2 or (at your option) any later version.
> > > + *
> > > + ****
> > > + *
> > > + * Here postcopy migration of dirty bitmaps is realized. Only named dirty
> > > + * bitmaps, associated with root nodes and non-root named nodes are 
> > > migrated.
> > > + *
> > > + * If destination qemu is already containing a dirty bitmap with the 
> > > same name
> > > + * as a migrated bitmap (for the same node), then, if their 
> > > granularities are
> > > + * the same the migration will be done, otherwise the error will be 
> > > generated.
> > > + *
> > > + * If destination qemu doesn't contain such bitmap it will be created.
> > > + *
> > > + * format of migration:
> > > + *
> > > + * # Header (shared for different chunk types)
> > > > + * 1, 2 or 4 bytes: flags (see qemu_{get,put}_flags)
> > > + * [ 1 byte: node name size ] \  flags & DEVICE_NAME
> > > + * [ n bytes: node name ] /
> > > + * [ 1 byte: bitmap name size ] \  flags & BITMAP_NAME
> > > + * [ n bytes: bitmap name ] /
> > > + *
> > > + * # Start of bitmap migration (flags & START)
> > > + * header
> > > + * be64: granularity
> > > + * 1 byte: bitmap enabled flag
> > > + *
> > > + * # Complete of bitmap migration (flags & COMPLETE)
> > > + * header
> > > + *
> > > + * # Data chunk of bitmap migration
> > > + * header

[Qemu-block] [PATCH v2 42/43] block: Add Error parameter to bdrv_set_backing_hd()

2017-02-27 Thread Kevin Wolf
Not all callers of bdrv_set_backing_hd() know for sure that attaching
the backing file will be allowed by the permission system. Return the
error from the function rather than aborting.

Signed-off-by: Kevin Wolf 
---
 block.c   | 30 +++---
 block/commit.c| 14 +++---
 block/mirror.c|  7 ++-
 block/stream.c|  9 -
 block/vvfat.c |  2 +-
 include/block/block.h |  3 ++-
 6 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/block.c b/block.c
index 5118828..1a6e033 100644
--- a/block.c
+++ b/block.c
@@ -1873,7 +1873,8 @@ static void bdrv_parent_cb_resize(BlockDriverState *bs)
  * Sets the backing file link of a BDS. A new reference is created; callers
  * which don't need their own reference any more must call bdrv_unref().
  */
-void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
+void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
+ Error **errp)
 {
 if (backing_hd) {
 bdrv_ref(backing_hd);
@@ -1887,9 +1888,12 @@ void bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd)
 bs->backing = NULL;
 goto out;
 }
-/* FIXME Error handling */
+
 bs->backing = bdrv_attach_child(bs, backing_hd, "backing", &child_backing,
-&error_abort);
+errp);
+if (!bs->backing) {
+bdrv_unref(backing_hd);
+}
 
 out:
 bdrv_refresh_limits(bs, NULL);
@@ -1973,8 +1977,12 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 
 /* Hook up the backing file link; drop our reference, bs owns the
  * backing_hd reference now */
-bdrv_set_backing_hd(bs, backing_hd);
+bdrv_set_backing_hd(bs, backing_hd, &local_err);
 bdrv_unref(backing_hd);
+if (local_err) {
+ret = -EINVAL;
+goto free_exit;
+}
 
 qdict_del(parent_options, bdref_key);
 
@@ -2808,7 +2816,7 @@ static void bdrv_close(BlockDriverState *bs)
 bs->drv->bdrv_close(bs);
 bs->drv = NULL;
 
-bdrv_set_backing_hd(bs, NULL);
+bdrv_set_backing_hd(bs, NULL, &error_abort);
 
 if (bs->file != NULL) {
 bdrv_unref_child(bs, bs->file);
@@ -2915,7 +2923,8 @@ void bdrv_append(BlockDriverState *bs_new, 
BlockDriverState *bs_top)
 bdrv_ref(bs_top);
 
 change_parent_backing_link(bs_top, bs_new);
-bdrv_set_backing_hd(bs_new, bs_top);
+/* FIXME Error handling */
+bdrv_set_backing_hd(bs_new, bs_top, &error_abort);
 bdrv_unref(bs_top);
 
 /* bs_new is now referenced by its new parents, we don't need the
@@ -3063,6 +3072,7 @@ int bdrv_drop_intermediate(BlockDriverState *active, 
BlockDriverState *top,
BlockDriverState *base, const char 
*backing_file_str)
 {
 BlockDriverState *new_top_bs = NULL;
+Error *local_err = NULL;
 int ret = -EIO;
 
 if (!top->drv || !base->drv) {
@@ -3095,7 +3105,13 @@ int bdrv_drop_intermediate(BlockDriverState *active, 
BlockDriverState *top,
 if (ret) {
 goto exit;
 }
-bdrv_set_backing_hd(new_top_bs, base);
+
+bdrv_set_backing_hd(new_top_bs, base, &local_err);
+if (local_err) {
+ret = -EPERM;
+error_report_err(local_err);
+goto exit;
+}
 
 ret = 0;
 exit:
diff --git a/block/commit.c b/block/commit.c
index 1e0f531..22a0a4d 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -121,7 +121,7 @@ static void commit_complete(BlockJob *job, void *opaque)
  * filter driver from the backing chain. Do this as the final step so that
  * the 'consistent read' permission can be granted.  */
 if (remove_commit_top_bs) {
-bdrv_set_backing_hd(overlay_bs, top);
+bdrv_set_backing_hd(overlay_bs, top, &error_abort);
 }
 }
 
@@ -316,8 +316,8 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 goto fail;
 }
 
-bdrv_set_backing_hd(commit_top_bs, top);
-bdrv_set_backing_hd(overlay_bs, commit_top_bs);
+bdrv_set_backing_hd(commit_top_bs, top, &error_abort);
+bdrv_set_backing_hd(overlay_bs, commit_top_bs, &error_abort);
 
 s->commit_top_bs = commit_top_bs;
 bdrv_unref(commit_top_bs);
@@ -390,7 +390,7 @@ fail:
 blk_unref(s->top);
 }
 if (commit_top_bs) {
-bdrv_set_backing_hd(overlay_bs, top);
+bdrv_set_backing_hd(overlay_bs, top, &error_abort);
 }
 block_job_unref(&s->common);
 }
@@ -451,8 +451,8 @@ int bdrv_commit(BlockDriverState *bs)
 goto ro_cleanup;
 }
 
-bdrv_set_backing_hd(commit_top_bs, backing_file_bs);
-bdrv_set_backing_hd(bs, commit_top_bs);
+bdrv_set_backing_hd(commit_top_bs, backing_file_bs, &error_abort);
+bdrv_set_backing_hd(bs, commit_top_bs, &error_abort);
 
 ret = blk_insert_bs(backing, backing_file_bs, &local_err);
 if (ret < 0) {
@@ -532,7 +532,7 @@ ro_cleanup:
 
 blk_unref(backing);
 if (backing_file_bs) {
-

[Qemu-block] [PATCH v2 41/43] block: Assertions for resize permission

2017-02-27 Thread Kevin Wolf
This adds an assertion that ensures that the necessary resize permission
has been granted before bdrv_truncate() is called.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c| 3 +++
 block/io.c | 1 +
 2 files changed, 4 insertions(+)

diff --git a/block.c b/block.c
index af2b8ff..5118828 100644
--- a/block.c
+++ b/block.c
@@ -3110,6 +3110,9 @@ int bdrv_truncate(BdrvChild *child, int64_t offset)
 BlockDriverState *bs = child->bs;
 BlockDriver *drv = bs->drv;
 int ret;
+
+assert(child->perm & BLK_PERM_RESIZE);
+
 if (!drv)
 return -ENOMEDIUM;
 if (!drv->bdrv_truncate)
diff --git a/block/io.c b/block/io.c
index 4c79745..8f38d46 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1339,6 +1339,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 assert(req->overlap_offset <= offset);
 assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
 assert(child->perm & BLK_PERM_WRITE);
+assert(end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE);
 
 ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
 
-- 
1.8.3.1




[Qemu-block] [PATCH v2 38/43] tests: Remove FIXME comments

2017-02-27 Thread Kevin Wolf
Not requesting any permissions is actually correct for these test cases
because no actual I/O or other operation covered by the permission
system is performed.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 tests/test-blockjob.c | 2 +-
 tests/test-throttle.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index 1afe17b..740e740 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -54,7 +54,7 @@ static BlockJob *do_test_id(BlockBackend *blk, const char *id,
  * BlockDriverState inserted. */
 static BlockBackend *create_blk(const char *name)
 {
-/* FIXME Use real permissions */
+/* No I/O is performed on this device */
 BlockBackend *blk = blk_new(0, BLK_PERM_ALL);
 BlockDriverState *bs;
 
diff --git a/tests/test-throttle.c b/tests/test-throttle.c
index 5846433..bd7c501 100644
--- a/tests/test-throttle.c
+++ b/tests/test-throttle.c
@@ -593,7 +593,7 @@ static void test_groups(void)
 BlockBackend *blk1, *blk2, *blk3;
 BlockBackendPublic *blkp1, *blkp2, *blkp3;
 
-/* FIXME Use real permissions */
+/* No actual I/O is performed on these devices */
 blk1 = blk_new(0, BLK_PERM_ALL);
 blk2 = blk_new(0, BLK_PERM_ALL);
 blk3 = blk_new(0, BLK_PERM_ALL);
-- 
1.8.3.1




[Qemu-block] [PATCH v2 36/43] migration/block: Use real permissions

2017-02-27 Thread Kevin Wolf
Request BLK_PERM_CONSISTENT_READ for the source of block migration, and
handle potential permission errors as good as we can in this place
(which is not very good, but it matches the other failure cases).

Signed-off-by: Kevin Wolf 
---
 migration/block.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index d259936..1941bc2 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -379,7 +379,7 @@ static void unset_dirty_tracking(void)
 }
 }
 
-static void init_blk_migration(QEMUFile *f)
+static int init_blk_migration(QEMUFile *f)
 {
 BlockDriverState *bs;
 BlkMigDevState *bmds;
@@ -390,6 +390,8 @@ static void init_blk_migration(QEMUFile *f)
 BlkMigDevState *bmds;
 BlockDriverState *bs;
 } *bmds_bs;
+Error *local_err = NULL;
+int ret;
 
 block_mig_state.submitted = 0;
 block_mig_state.read_done = 0;
@@ -411,12 +413,12 @@ static void init_blk_migration(QEMUFile *f)
 
 sectors = bdrv_nb_sectors(bs);
 if (sectors <= 0) {
+ret = sectors;
 goto out;
 }
 
 bmds = g_new0(BlkMigDevState, 1);
-/* FIXME Use real permissions */
-bmds->blk = blk_new(0, BLK_PERM_ALL);
+bmds->blk = blk_new(BLK_PERM_CONSISTENT_READ, BLK_PERM_ALL);
 bmds->blk_name = g_strdup(bdrv_get_device_name(bs));
 bmds->bulk_completed = 0;
 bmds->total_sectors = sectors;
@@ -446,7 +448,11 @@ static void init_blk_migration(QEMUFile *f)
 BlockDriverState *bs = bmds_bs[i].bs;
 
 if (bmds) {
-blk_insert_bs(bmds->blk, bs, &error_abort);
+ret = blk_insert_bs(bmds->blk, bs, &local_err);
+if (ret < 0) {
+error_report_err(local_err);
+goto out;
+}
 
 alloc_aio_bitmap(bmds);
 error_setg(&bmds->blocker, "block device is in use by migration");
@@ -454,8 +460,10 @@ static void init_blk_migration(QEMUFile *f)
 }
 }
 
+ret = 0;
 out:
 g_free(bmds_bs);
+return ret;
 }
 
 /* Called with no lock taken.  */
@@ -706,7 +714,11 @@ static int block_save_setup(QEMUFile *f, void *opaque)
 block_mig_state.submitted, block_mig_state.transferred);
 
 qemu_mutex_lock_iothread();
-init_blk_migration(f);
+ret = init_blk_migration(f);
+if (ret < 0) {
+qemu_mutex_unlock_iothread();
+return ret;
+}
 
 /* start track dirty blocks */
 ret = set_dirty_tracking();
-- 
1.8.3.1




[Qemu-block] [PATCH v2 33/43] mirror: Add filter-node-name to blockdev-mirror

2017-02-27 Thread Kevin Wolf
Management tools need to be able to know about every node in the graph
and need a way to address them. Changing the graph structure was okay
because libvirt doesn't really manage the node level yet, but future
libvirt versions need to deal with both new and old version of qemu.

This new option to blockdev-mirror allows the client to set a node-name
for the automatically inserted filter driver, and at the same time
serves as a witness for a future libvirt that this version of qemu does
automatically insert a filter driver.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/mirror.c| 14 --
 blockdev.c| 12 +++-
 include/block/block_int.h |  5 -
 qapi/block-core.json  |  8 +++-
 4 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 70faf77..c08b7e0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1084,7 +1084,7 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
  void *opaque, Error **errp,
  const BlockJobDriver *driver,
  bool is_none_mode, BlockDriverState *base,
- bool auto_complete)
+ bool auto_complete, const char *filter_node_name)
 {
 MirrorBlockJob *s;
 BlockDriverState *mirror_top_bs;
@@ -1110,8 +1110,8 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 /* In the case of active commit, add dummy driver to provide consistent
  * reads on the top, while disabling it in the intermediate nodes, and make
  * the backing chain writable. */
-mirror_top_bs = bdrv_new_open_driver(&bdrv_mirror_top, NULL, BDRV_O_RDWR,
- errp);
+mirror_top_bs = bdrv_new_open_driver(&bdrv_mirror_top, filter_node_name,
+ BDRV_O_RDWR, errp);
 if (mirror_top_bs == NULL) {
 return;
 }
@@ -1221,7 +1221,7 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
   MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
   BlockdevOnError on_source_error,
   BlockdevOnError on_target_error,
-  bool unmap, Error **errp)
+  bool unmap, const char *filter_node_name, Error **errp)
 {
 bool is_none_mode;
 BlockDriverState *base;
@@ -1235,7 +1235,8 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
 mirror_start_job(job_id, bs, BLOCK_JOB_DEFAULT, target, replaces,
  speed, granularity, buf_size, backing_mode,
  on_source_error, on_target_error, unmap, NULL, NULL, errp,
- &mirror_job_driver, is_none_mode, base, false);
+ &mirror_job_driver, is_none_mode, base, false,
+ filter_node_name);
 }
 
 void commit_active_start(const char *job_id, BlockDriverState *bs,
@@ -1256,7 +1257,8 @@ void commit_active_start(const char *job_id, 
BlockDriverState *bs,
 mirror_start_job(job_id, bs, creation_flags, base, NULL, speed, 0, 0,
  MIRROR_LEAVE_BACKING_CHAIN,
  on_error, on_error, true, cb, opaque, &local_err,
- &commit_active_job_driver, false, base, auto_complete);
+ &commit_active_job_driver, false, base, auto_complete,
+ NULL);
 if (local_err) {
 error_propagate(errp, local_err);
 goto error_restore_flags;
diff --git a/blockdev.c b/blockdev.c
index 2374973..5bd09f8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3365,6 +3365,8 @@ static void blockdev_mirror_common(const char *job_id, 
BlockDriverState *bs,
bool has_on_target_error,
BlockdevOnError on_target_error,
bool has_unmap, bool unmap,
+   bool has_filter_node_name,
+   const char *filter_node_name,
Error **errp)
 {
 
@@ -3386,6 +3388,9 @@ static void blockdev_mirror_common(const char *job_id, 
BlockDriverState *bs,
 if (!has_unmap) {
 unmap = true;
 }
+if (!has_filter_node_name) {
+filter_node_name = NULL;
+}
 
 if (granularity != 0 && (granularity < 512 || granularity > 1048576 * 64)) 
{
 error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "granularity",
@@ -3415,7 +3420,8 @@ static void blockdev_mirror_common(const char *job_id, 
BlockDriverState *bs,
 mirror_start(job_id, bs, target,
  has_replaces ? replaces : NULL,
  speed, granularity, buf_size, sync, backing_mode,
- on_source_error, on_target_error, unmap, errp);
+ on_source_error, on_target_error, unmap, filter_node_name,
+ errp);
 }
 
 void 

Re: [Qemu-block] [PATCH V2] qemu-img: make convert async

2017-02-27 Thread Kevin Wolf
Am 27.02.2017 um 12:03 hat Peter Lieven geschrieben:
> the convert process is currently completely implemented with sync operations.
> That means it reads one buffer and then writes it. No parallelism and each
> sync request takes as long as it takes until it is completed.
> 
> This can be a big performance hit when the convert process reads and writes
> to devices which do not benefit from kernel readahead or pagecache.
> In our environment we heavily have the following two use cases when using
> qemu-img convert.
> 
> a) reading from NFS and writing to iSCSI for deploying templates
> b) reading from iSCSI and writing to NFS for backups
> 
> In both processes we use libiscsi and libnfs so we have no kernel cache.
> 
> This patch changes the convert process to work with parallel running
> coroutines, which can significantly improve performance for network storage
> devices:
> 
> qemu-img (master)
>  nfs -> iscsi 22.8 secs
>  nfs -> ram   11.7 secs
>  ram -> iscsi 12.3 secs
> 
> qemu-img-async (8 coroutines, in-order write disabled)
>  nfs -> iscsi 11.0 secs
>  nfs -> ram   10.4 secs
>  ram -> iscsi  9.0 secs
> 
> This patch introduces two new cmdline parameters: the -m parameter, which
> specifies the number of coroutines running in parallel (defaults to 8), and
> the -W parameter, which allows qemu-img to write to the target out of order
> rather than sequentially. This improves performance as the writes do not
> have to wait for each other to complete.
> 
> Signed-off-by: Peter Lieven 

Thanks, applied to the block branch.

Kevin



[Qemu-block] [PATCH v2 31/43] mirror: Use real permissions in mirror/active commit block job

2017-02-27 Thread Kevin Wolf
The mirror block job is mainly used for two different scenarios:
Mirroring to an otherwise unused, independent target node, or for active
commit where the target node is part of the backing chain of the source.

Similarly to the commit block job patch, we need to insert a new filter
node to keep the permissions correct during active commit.

Note that one change this implies is that job->blk points to
mirror_top_bs as its root now, and mirror_top_bs (rather than the actual
source node) contains the bs->job pointer. This requires qemu-img commit
to get the job by name now rather than just taking bs->job.

Signed-off-by: Kevin Wolf 
---
 block/mirror.c | 212 +
 qemu-img.c |   6 +-
 tests/qemu-iotests/141 |   2 +-
 tests/qemu-iotests/141.out |   4 +-
 4 files changed, 186 insertions(+), 38 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index beaac6f..70faf77 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -38,7 +38,10 @@ typedef struct MirrorBlockJob {
 BlockJob common;
 RateLimit limit;
 BlockBackend *target;
+BlockDriverState *mirror_top_bs;
+BlockDriverState *source;
 BlockDriverState *base;
+
 /* The name of the graph node to replace */
 char *replaces;
 /* The BDS to replace */
@@ -327,7 +330,7 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
 
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
-BlockDriverState *source = blk_bs(s->common.blk);
+BlockDriverState *source = s->source;
 int64_t sector_num, first_chunk;
 uint64_t delay_ns = 0;
 /* At least the first dirty chunk is mirrored in one iteration. */
@@ -497,12 +500,24 @@ static void mirror_exit(BlockJob *job, void *opaque)
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
 MirrorExitData *data = opaque;
 AioContext *replace_aio_context = NULL;
-BlockDriverState *src = blk_bs(s->common.blk);
+BlockDriverState *src = s->source;
 BlockDriverState *target_bs = blk_bs(s->target);
+BlockDriverState *mirror_top_bs = s->mirror_top_bs;
 
 /* Make sure that the source BDS doesn't go away before we called
  * block_job_completed(). */
 bdrv_ref(src);
+bdrv_ref(mirror_top_bs);
+
+/* We don't access the source any more. Dropping any WRITE/RESIZE is
+ * required before it could become a backing file of target_bs. */
+bdrv_child_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL);
+if (s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
+BlockDriverState *backing = s->is_none_mode ? src : s->base;
+if (backing_bs(target_bs) != backing) {
+bdrv_set_backing_hd(target_bs, backing);
+}
+}
 
 if (s->to_replace) {
 replace_aio_context = bdrv_get_aio_context(s->to_replace);
@@ -524,13 +539,6 @@ static void mirror_exit(BlockJob *job, void *opaque)
 bdrv_drained_begin(target_bs);
 bdrv_replace_in_backing_chain(to_replace, target_bs);
 bdrv_drained_end(target_bs);
-
-/* We just changed the BDS the job BB refers to, so switch the BB back
- * so the cleanup does the right thing. We don't need any permissions
- * any more now. */
-blk_remove_bs(job->blk);
-blk_set_perm(job->blk, 0, BLK_PERM_ALL, &error_abort);
-blk_insert_bs(job->blk, src, &error_abort);
 }
 if (s->to_replace) {
 bdrv_op_unblock_all(s->to_replace, s->replace_blocker);
@@ -543,9 +551,23 @@ static void mirror_exit(BlockJob *job, void *opaque)
 g_free(s->replaces);
 blk_unref(s->target);
 s->target = NULL;
+
+/* Remove the mirror filter driver from the graph */
+bdrv_replace_in_backing_chain(mirror_top_bs, backing_bs(mirror_top_bs));
+
+/* We just changed the BDS the job BB refers to (with either or both of the
+ * bdrv_replace_in_backing_chain() calls), so switch the BB back so the
+ * cleanup does the right thing. We don't need any permissions any more
+ * now. */
+blk_remove_bs(job->blk);
+blk_set_perm(job->blk, 0, BLK_PERM_ALL, &error_abort);
+blk_insert_bs(job->blk, mirror_top_bs, &error_abort);
+
 block_job_completed(&s->common, data->ret);
+
 g_free(data);
 bdrv_drained_end(src);
+bdrv_unref(mirror_top_bs);
 bdrv_unref(src);
 }
 
@@ -565,7 +587,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 {
 int64_t sector_num, end;
 BlockDriverState *base = s->base;
-BlockDriverState *bs = blk_bs(s->common.blk);
+BlockDriverState *bs = s->source;
 BlockDriverState *target_bs = blk_bs(s->target);
 int ret, n;
 
@@ -647,7 +669,7 @@ static void coroutine_fn mirror_run(void *opaque)
 {
 MirrorBlockJob *s = opaque;
 MirrorExitData *data;
-BlockDriverState *bs = blk_bs(s->common.blk);
+BlockDriverState *bs = s->source;
 BlockDriverState *target_bs = blk_bs(s->target);
 bool need_drain = true;
 int64_t length;
@@ 

[Qemu-block] [PATCH v2 29/43] block: BdrvChildRole.attach/detach() callbacks

2017-02-27 Thread Kevin Wolf
Backing files are somewhat special compared to other kinds of children
because they are attached and detached using bdrv_set_backing_hd()
rather than the normal set of functions, which does a few more things
like setting backing blockers, toggling the BDRV_O_NO_BACKING flag,
setting parent_bs->backing_file, etc.

These special features are a reason why change_parent_backing_link()
can't handle backing files yet. With abstracting the additional features
into .attach/.detach callbacks, we get a step closer to a function that
can actually deal with this.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c   | 94 +--
 include/block/block_int.h |  3 ++
 2 files changed, 62 insertions(+), 35 deletions(-)

diff --git a/block.c b/block.c
index 7926c6c..86567a8 100644
--- a/block.c
+++ b/block.c
@@ -807,6 +807,57 @@ const BdrvChildRole child_format = {
 .drained_end = bdrv_child_cb_drained_end,
 };
 
+static void bdrv_backing_attach(BdrvChild *c)
+{
+BlockDriverState *parent = c->opaque;
+BlockDriverState *backing_hd = c->bs;
+
+assert(!parent->backing_blocker);
+error_setg(&parent->backing_blocker,
+   "node is used as backing hd of '%s'",
+   bdrv_get_device_or_node_name(parent));
+
+parent->open_flags &= ~BDRV_O_NO_BACKING;
+pstrcpy(parent->backing_file, sizeof(parent->backing_file),
+backing_hd->filename);
+pstrcpy(parent->backing_format, sizeof(parent->backing_format),
+backing_hd->drv ? backing_hd->drv->format_name : "");
+
+bdrv_op_block_all(backing_hd, parent->backing_blocker);
+/* Otherwise we won't be able to commit or stream */
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
+parent->backing_blocker);
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_STREAM,
+parent->backing_blocker);
+/*
+ * We do backup in 3 ways:
+ * 1. drive backup
+ *The target bs is new opened, and the source is top BDS
+ * 2. blockdev backup
+ *Both the source and the target are top BDSes.
+ * 3. internal backup(used for block replication)
+ *Both the source and the target are backing file
+ *
+ * In case 1 and 2, neither the source nor the target is the backing file.
+ * In case 3, we will block the top BDS, so there is only one block job
+ * for the top BDS and its backing chain.
+ */
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+parent->backing_blocker);
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+parent->backing_blocker);
+}
+
+static void bdrv_backing_detach(BdrvChild *c)
+{
+BlockDriverState *parent = c->opaque;
+
+assert(parent->backing_blocker);
+bdrv_op_unblock_all(c->bs, parent->backing_blocker);
+error_free(parent->backing_blocker);
+parent->backing_blocker = NULL;
+}
+
 /*
  * Returns the options and flags that bs->backing should get, based on the
  * given options and flags for the parent BDS
@@ -833,6 +884,8 @@ static void bdrv_backing_options(int *child_flags, QDict 
*child_options,
 
 const BdrvChildRole child_backing = {
 .get_parent_desc = bdrv_child_get_parent_desc,
+.attach  = bdrv_backing_attach,
+.detach  = bdrv_backing_detach,
 .inherit_options = bdrv_backing_options,
 .drained_begin   = bdrv_child_cb_drained_begin,
 .drained_end = bdrv_child_cb_drained_end,
@@ -1672,6 +1725,9 @@ static void bdrv_replace_child(BdrvChild *child, 
BlockDriverState *new_bs)
 if (old_bs->quiesce_counter && child->role->drained_end) {
 child->role->drained_end(child);
 }
+if (child->role->detach) {
+child->role->detach(child);
+}
 QLIST_REMOVE(child, next_parent);
 bdrv_update_perm(old_bs);
 }
@@ -1684,6 +1740,9 @@ static void bdrv_replace_child(BdrvChild *child, 
BlockDriverState *new_bs)
 child->role->drained_begin(child);
 }
 bdrv_update_perm(new_bs);
+if (child->role->attach) {
+child->role->attach(child);
+}
 }
 }
 
@@ -1821,52 +1880,17 @@ void bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd)
 }
 
 if (bs->backing) {
-assert(bs->backing_blocker);
-bdrv_op_unblock_all(bs->backing->bs, bs->backing_blocker);
 bdrv_unref_child(bs, bs->backing);
-} else if (backing_hd) {
-error_setg(&bs->backing_blocker,
-   "node is used as backing hd of '%s'",
-   bdrv_get_device_or_node_name(bs));
 }
 
 if (!backing_hd) {
-error_free(bs->backing_blocker);
-bs->backing_blocker = NULL;
 bs->backing = NULL;
 goto out;
 }
 /* FIXME Error handling */
 bs->backing = bdrv_attach_child(bs, backing_hd, "backing", &child_backing,
  

[Qemu-block] [PATCH v2 30/43] block: Allow backing file links in change_parent_backing_link()

2017-02-27 Thread Kevin Wolf
Now that the backing file child role implements .attach/.detach
callbacks, nothing prevents us from modifying the graph even if that
involves changing backing file links.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 86567a8..af2b8ff 100644
--- a/block.c
+++ b/block.c
@@ -2872,9 +2872,9 @@ static void change_parent_backing_link(BlockDriverState 
*from,
 continue;
 }
 if (c->role == &child_backing) {
-/* @from is generally not allowed to be a backing file, except for
- * when @to is the overlay. In that case, @from may not be replaced
- * by @to as @to's backing node. */
+/* If @from is a backing file of @to, ignore the child to avoid
+ * creating a loop. We only want to change the pointer of other
+ * parents. */
 QLIST_FOREACH(to_c, &to->children, next) {
 if (to_c == c) {
 break;
@@ -2885,7 +2885,6 @@ static void change_parent_backing_link(BlockDriverState 
*from,
 }
 }
 
-assert(c->role != &child_backing);
 bdrv_ref(to);
 bdrv_replace_child(c, to);
 bdrv_unref(from);
-- 
1.8.3.1




[Qemu-block] [PATCH v2 27/43] backup: Use real permissions in backup block job

2017-02-27 Thread Kevin Wolf
The backup block job doesn't have very complicated requirements: It
needs to read from the source and write to the target, but it's fine
with either side being changed. The only restriction is that we can't
resize the image because the job uses a cached value.

qemu-iotests 055 needs to be changed because it used a target which was
already attached to a virtio-blk device. The permission system correctly
forbids this (virtio-blk can't accept another writer with its default
share-rw=off).

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/backup.c | 15 ++-
 tests/qemu-iotests/055 | 11 +++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 405f271..d1ab617 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -618,15 +618,20 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 goto error;
 }
 
-/* FIXME Use real permissions */
-job = block_job_create(job_id, &backup_job_driver, bs, 0, BLK_PERM_ALL,
+/* job->common.len is fixed, so we can't allow resize */
+job = block_job_create(job_id, &backup_job_driver, bs,
+   BLK_PERM_CONSISTENT_READ,
+   BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
+   BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
speed, creation_flags, cb, opaque, errp);
 if (!job) {
 goto error;
 }
 
-/* FIXME Use real permissions */
-job->target = blk_new(0, BLK_PERM_ALL);
+/* The target must match the source in size, so no resize here either */
+job->target = blk_new(BLK_PERM_WRITE,
+  BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
+  BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
 ret = blk_insert_bs(job->target, target, errp);
 if (ret < 0) {
 goto error;
@@ -657,7 +662,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
 }
 
-/* FIXME Use real permissions */
+/* Required permissions are already taken with target's blk_new() */
 block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
 &error_abort);
 job->common.len = len;
diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
index 1d3fd04..aafcd24 100755
--- a/tests/qemu-iotests/055
+++ b/tests/qemu-iotests/055
@@ -48,7 +48,8 @@ class TestSingleDrive(iotests.QMPTestCase):
 def setUp(self):
 qemu_img('create', '-f', iotests.imgfmt, blockdev_target_img, 
str(image_len))
 
-self.vm = 
iotests.VM().add_drive(test_img).add_drive(blockdev_target_img)
+self.vm = iotests.VM().add_drive(test_img)
+self.vm.add_drive(blockdev_target_img, interface="none")
 if iotests.qemu_default_machine == 'pc':
 self.vm.add_drive(None, 'media=cdrom', 'ide')
 self.vm.launch()
@@ -164,7 +165,8 @@ class TestSetSpeed(iotests.QMPTestCase):
 def setUp(self):
 qemu_img('create', '-f', iotests.imgfmt, blockdev_target_img, 
str(image_len))
 
-self.vm = 
iotests.VM().add_drive(test_img).add_drive(blockdev_target_img)
+self.vm = iotests.VM().add_drive(test_img)
+self.vm.add_drive(blockdev_target_img, interface="none")
 self.vm.launch()
 
 def tearDown(self):
@@ -247,7 +249,8 @@ class TestSingleTransaction(iotests.QMPTestCase):
 def setUp(self):
 qemu_img('create', '-f', iotests.imgfmt, blockdev_target_img, 
str(image_len))
 
-self.vm = 
iotests.VM().add_drive(test_img).add_drive(blockdev_target_img)
+self.vm = iotests.VM().add_drive(test_img)
+self.vm.add_drive(blockdev_target_img, interface="none")
 if iotests.qemu_default_machine == 'pc':
 self.vm.add_drive(None, 'media=cdrom', 'ide')
 self.vm.launch()
@@ -460,7 +463,7 @@ class TestDriveCompression(iotests.QMPTestCase):
 
 qemu_img('create', '-f', fmt, blockdev_target_img,
  str(TestDriveCompression.image_len), *args)
-self.vm.add_drive(blockdev_target_img, format=fmt)
+self.vm.add_drive(blockdev_target_img, format=fmt, interface="none")
 
 self.vm.launch()
 
-- 
1.8.3.1




[Qemu-block] [PATCH v2 43/43] block: Add Error parameter to bdrv_append()

2017-02-27 Thread Kevin Wolf
Aborting on error in bdrv_append() isn't correct. This patch fixes it
and lets the callers handle failures.

Test case 085 needs a reference output update. This is caused by the
reversed order of bdrv_set_backing_hd() and change_parent_backing_link()
in bdrv_append(): When the backing file of the new node is set, the
parent nodes are still pointing to the old top, so the backing blocker
is now initialised with the node name rather than the BlockBackend name.

Signed-off-by: Kevin Wolf 
---
 block.c| 23 +--
 block/mirror.c |  9 -
 blockdev.c | 18 +++---
 include/block/block.h  |  3 ++-
 tests/qemu-iotests/085.out |  2 +-
 5 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 1a6e033..1581335 100644
--- a/block.c
+++ b/block.c
@@ -2077,6 +2077,7 @@ static BlockDriverState 
*bdrv_append_temp_snapshot(BlockDriverState *bs,
 int64_t total_size;
 QemuOpts *opts = NULL;
 BlockDriverState *bs_snapshot;
+Error *local_err = NULL;
 int ret;
 
 /* if snapshot, we create a temporary backing file and open it
@@ -2126,7 +2127,12 @@ static BlockDriverState 
*bdrv_append_temp_snapshot(BlockDriverState *bs,
  * call bdrv_unref() on it), so in order to be able to return one, we have
  * to increase bs_snapshot's refcount here */
 bdrv_ref(bs_snapshot);
-bdrv_append(bs_snapshot, bs);
+bdrv_append(bs_snapshot, bs, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto out;
+}
 
 g_free(tmp_filename);
 return bs_snapshot;
@@ -2915,20 +2921,25 @@ static void change_parent_backing_link(BlockDriverState 
*from,
  * parents of bs_top after bdrv_append() returns. If the caller needs to keep a
  * reference of its own, it must call bdrv_ref().
  */
-void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
+void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
+ Error **errp)
 {
+Error *local_err = NULL;
+
 assert(!atomic_read(&bs_top->in_flight));
 assert(!atomic_read(&bs_new->in_flight));
 
-bdrv_ref(bs_top);
+bdrv_set_backing_hd(bs_new, bs_top, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto out;
+}
 
 change_parent_backing_link(bs_top, bs_new);
-/* FIXME Error handling */
-bdrv_set_backing_hd(bs_new, bs_top, &error_abort);
-bdrv_unref(bs_top);
 
 /* bs_new is now referenced by its new parents, we don't need the
  * additional reference any more. */
+out:
 bdrv_unref(bs_new);
 }
 
diff --git a/block/mirror.c b/block/mirror.c
index f94f3f8..321f7c2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1095,6 +1095,7 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 BlockDriverState *mirror_top_bs;
 bool target_graph_mod;
 bool target_is_backing;
+Error *local_err = NULL;
 int ret;
 
 if (granularity == 0) {
@@ -1126,9 +1127,15 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
  * it alive until block_job_create() even if bs has no parent. */
 bdrv_ref(mirror_top_bs);
 bdrv_drained_begin(bs);
-bdrv_append(mirror_top_bs, bs);
+bdrv_append(mirror_top_bs, bs, &local_err);
 bdrv_drained_end(bs);
 
+if (local_err) {
+bdrv_unref(mirror_top_bs);
+error_propagate(errp, local_err);
+return;
+}
+
 /* Make sure that the source is not resized while the job is running */
 s = block_job_create(job_id, driver, mirror_top_bs,
  BLK_PERM_CONSISTENT_READ,
diff --git a/blockdev.c b/blockdev.c
index 34b522b..97fbc7e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1767,6 +1767,17 @@ static void external_snapshot_prepare(BlkActionState 
*common,
 
 if (!state->new_bs->drv->supports_backing) {
 error_setg(errp, "The snapshot does not support backing images");
+return;
+}
+
+/* This removes our old bs and adds the new bs. This is an operation that
+ * can fail, so we need to do it in .prepare; undoing it for abort is
+ * always possible. */
+bdrv_ref(state->new_bs);
+bdrv_append(state->new_bs, state->old_bs, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
 }
 }
 
@@ -1777,8 +1788,6 @@ static void external_snapshot_commit(BlkActionState 
*common)
 
 bdrv_set_aio_context(state->new_bs, state->aio_context);
 
-/* This removes our old bs and adds the new bs */
-bdrv_append(state->new_bs, state->old_bs);
 /* We don't need (or want) to use the transactional
  * bdrv_reopen_multiple() across all the entries at once, because we
  * don't want to abort all of them if one of them fails the reopen */
@@ -1793,7 +1802,9 @@ static void external_snapshot_abort(BlkActionState 
*common)
 ExternalSnapshotState *state =

[Qemu-block] [PATCH v2 39/43] block: Pass BdrvChild to bdrv_aligned_preadv/pwritev and copy-on-read

2017-02-27 Thread Kevin Wolf
This is where we want to check the permissions, so we need to have the
BdrvChild around where they are stored.

Signed-off-by: Kevin Wolf 
---
 block/io.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/block/io.c b/block/io.c
index d5c4544..2592ca1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -925,9 +925,11 @@ bdrv_driver_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 return drv->bdrv_co_pwritev_compressed(bs, offset, bytes, qiov);
 }
 
-static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
+static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
 int64_t offset, unsigned int bytes, QEMUIOVector *qiov)
 {
+BlockDriverState *bs = child->bs;
+
 /* Perform I/O through a temporary buffer so that users who scribble over
  * their read buffer while the operation is in progress do not end up
  * modifying the image file.  This is critical for zero-copy guest I/O
@@ -1001,10 +1003,11 @@ err:
  * handles copy on read, zeroing after EOF, and fragmentation of large
  * reads; any other features must be implemented by the caller.
  */
-static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
+static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
 BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
 int64_t align, QEMUIOVector *qiov, int flags)
 {
+BlockDriverState *bs = child->bs;
 int64_t total_bytes, max_bytes;
 int ret = 0;
 uint64_t bytes_remaining = bytes;
@@ -1050,7 +1053,7 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }
 
 if (!ret || pnum != nb_sectors) {
-ret = bdrv_co_do_copy_on_readv(bs, offset, bytes, qiov);
+ret = bdrv_co_do_copy_on_readv(child, offset, bytes, qiov);
 goto out;
 }
 }
@@ -1158,7 +1161,7 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
 }
 
 tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_READ);
-ret = bdrv_aligned_preadv(bs, &req, offset, bytes, align,
+ret = bdrv_aligned_preadv(child, &req, offset, bytes, align,
   use_local_qiov ? &local_qiov : qiov,
   flags);
 tracked_request_end(&req);
@@ -1306,10 +1309,11 @@ fail:
  * Forwards an already correctly aligned write request to the BlockDriver,
  * after possibly fragmenting it.
  */
-static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
+static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
 BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
 int64_t align, QEMUIOVector *qiov, int flags)
 {
+BlockDriverState *bs = child->bs;
 BlockDriver *drv = bs->drv;
 bool waited;
 int ret;
@@ -1397,12 +1401,13 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 return ret;
 }
 
-static int coroutine_fn bdrv_co_do_zero_pwritev(BlockDriverState *bs,
+static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
 int64_t offset,
 unsigned int bytes,
 BdrvRequestFlags flags,
 BdrvTrackedRequest *req)
 {
+BlockDriverState *bs = child->bs;
 uint8_t *buf = NULL;
 QEMUIOVector local_qiov;
 struct iovec iov;
@@ -1430,7 +1435,7 @@ static int coroutine_fn 
bdrv_co_do_zero_pwritev(BlockDriverState *bs,
 mark_request_serialising(req, align);
 wait_serialising_requests(req);
 bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_HEAD);
-ret = bdrv_aligned_preadv(bs, req, offset & ~(align - 1), align,
+ret = bdrv_aligned_preadv(child, req, offset & ~(align - 1), align,
   align, &local_qiov, 0);
 if (ret < 0) {
 goto fail;
@@ -1438,7 +1443,7 @@ static int coroutine_fn 
bdrv_co_do_zero_pwritev(BlockDriverState *bs,
 bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_AFTER_HEAD);
 
 memset(buf + head_padding_bytes, 0, zero_bytes);
-ret = bdrv_aligned_pwritev(bs, req, offset & ~(align - 1), align,
+ret = bdrv_aligned_pwritev(child, req, offset & ~(align - 1), align,
 align, &local_qiov,
flags & ~BDRV_REQ_ZERO_WRITE);
 if (ret < 0) {
@@ -1452,7 +1457,7 @@ static int coroutine_fn 
bdrv_co_do_zero_pwritev(BlockDriverState *bs,
 if (bytes >= align) {
 /* Write the aligned part in the middle. */
 uint64_t aligned_bytes = bytes & ~(align - 1);
-ret = bdrv_aligned_pwritev(bs, req, offset, aligned_bytes, align,
+ret = bdrv_aligned_pwritev(child, req, offset, aligned_bytes, align,
NULL, flags);
 if (ret < 0) {
 goto fail;
@@ -1468,7 +1473,7 @@ static int coroutine_fn 

[Qemu-block] [PATCH v2 26/43] commit: Use real permissions for HMP 'commit'

2017-02-27 Thread Kevin Wolf
This is a little simpler than the commit block job because it's
synchronous and only commits into the immediate backing file, but
otherwise it does more or less the same.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/commit.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 8de4473..f18026b 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -401,11 +401,14 @@ fail:
 int bdrv_commit(BlockDriverState *bs)
 {
 BlockBackend *src, *backing;
+BlockDriverState *backing_file_bs = NULL;
+BlockDriverState *commit_top_bs = NULL;
 BlockDriver *drv = bs->drv;
 int64_t sector, total_sectors, length, backing_length;
 int n, ro, open_flags;
 int ret = 0;
 uint8_t *buf = NULL;
+Error *local_err = NULL;
 
 if (!drv)
 return -ENOMEDIUM;
@@ -428,17 +431,31 @@ int bdrv_commit(BlockDriverState *bs)
 }
 }
 
-/* FIXME Use real permissions */
-src = blk_new(0, BLK_PERM_ALL);
-backing = blk_new(0, BLK_PERM_ALL);
+src = blk_new(BLK_PERM_CONSISTENT_READ, BLK_PERM_ALL);
+backing = blk_new(BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
 
-ret = blk_insert_bs(src, bs, NULL);
+ret = blk_insert_bs(src, bs, &local_err);
 if (ret < 0) {
+error_report_err(local_err);
+goto ro_cleanup;
+}
+
+/* Insert commit_top block node above backing, so we can write to it */
+backing_file_bs = backing_bs(bs);
+
+commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
+ &local_err);
+if (commit_top_bs == NULL) {
+error_report_err(local_err);
 goto ro_cleanup;
 }
 
-ret = blk_insert_bs(backing, bs->backing->bs, NULL);
+bdrv_set_backing_hd(commit_top_bs, backing_file_bs);
+bdrv_set_backing_hd(bs, commit_top_bs);
+
+ret = blk_insert_bs(backing, backing_file_bs, &local_err);
 if (ret < 0) {
+error_report_err(local_err);
 goto ro_cleanup;
 }
 
@@ -512,8 +529,12 @@ int bdrv_commit(BlockDriverState *bs)
 ro_cleanup:
 qemu_vfree(buf);
 
-blk_unref(src);
 blk_unref(backing);
+if (backing_file_bs) {
+bdrv_set_backing_hd(bs, backing_file_bs);
+}
+bdrv_unref(commit_top_bs);
+blk_unref(src);
 
 if (ro) {
 /* ignoring error return here */
-- 
1.8.3.1




[Qemu-block] [PATCH v2 23/43] block: Add BdrvChildRole.stay_at_node

2017-02-27 Thread Kevin Wolf
When the parents' child links are updated in bdrv_append() or
bdrv_replace_in_backing_chain(), this should affect all child links of
BlockBackends or other nodes, but not child links held for other
purposes (like for setting permissions). This patch makes it possible to
control the behaviour per BdrvChildRole.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c   | 3 +++
 include/block/block_int.h | 4 
 2 files changed, 7 insertions(+)

diff --git a/block.c b/block.c
index a0c1886..6456917 100644
--- a/block.c
+++ b/block.c
@@ -2844,6 +2844,9 @@ static void change_parent_backing_link(BlockDriverState 
*from,
 BdrvChild *c, *next, *to_c;
 
 QLIST_FOREACH_SAFE(c, &from->parents, next_parent, next) {
+if (c->role->stay_at_node) {
+continue;
+}
 if (c->role == &child_backing) {
 /* @from is generally not allowed to be a backing file, except for
  * when @to is the overlay. In that case, @from may not be replaced
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 3177b9f..a0d9328 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -441,6 +441,10 @@ typedef struct BdrvAioNotifier {
 } BdrvAioNotifier;
 
 struct BdrvChildRole {
+/* If true, bdrv_replace_in_backing_chain() doesn't change the node this
+ * BdrvChild points to. */
+bool stay_at_node;
+
 void (*inherit_options)(int *child_flags, QDict *child_options,
 int parent_flags, QDict *parent_options);
 
-- 
1.8.3.1




[Qemu-block] [PATCH v2 37/43] nbd/server: Use real permissions for NBD exports

2017-02-27 Thread Kevin Wolf
NBD can't cope with device size changes, so resize must be forbidden,
but otherwise we can tolerate anything. We always require consistent
reads, and additionally writes if the export is writable.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
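A small sketch of the permission choice described above. The bit values
and the names PERM_*, struct perm_pair and nbd_export_perms() are
illustrative assumptions; the real constants are QEMU's BLK_PERM_*.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values, assumed for this sketch only. */
enum {
    PERM_CONSISTENT_READ = 1 << 0,
    PERM_WRITE           = 1 << 1,
    PERM_WRITE_UNCHANGED = 1 << 2,
    PERM_RESIZE          = 1 << 3,
    PERM_GRAPH_MOD       = 1 << 4,
    PERM_ALL             = 0x1f
};

struct perm_pair {
    uint64_t perm;    /* what the export itself needs */
    uint64_t shared;  /* what other users of the node may do */
};

/* Hypothetical helper: permissions for an NBD export. */
struct perm_pair nbd_export_perms(bool read_only)
{
    struct perm_pair p;

    p.perm = PERM_CONSISTENT_READ;
    if (!read_only) {
        p.perm |= PERM_WRITE;
    }
    /* Everything except resize is shared with other users. */
    p.shared = PERM_ALL & ~PERM_RESIZE;
    assert((p.shared & PERM_RESIZE) == 0);
    return p;
}
```

The shared mask is the interesting part: the export never asks for
resize itself, it only refuses to let anyone else have it.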
 nbd/server.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 89362ba..924a1fe 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -891,10 +891,17 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t 
dev_offset, off_t size,
 {
 BlockBackend *blk;
 NBDExport *exp = g_malloc0(sizeof(NBDExport));
+uint64_t perm;
 int ret;
 
-/* FIXME Use real permissions */
-blk = blk_new(0, BLK_PERM_ALL);
+/* Don't allow resize while the NBD server is running, otherwise we don't
+ * care what happens with the node. */
+perm = BLK_PERM_CONSISTENT_READ;
+if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
+perm |= BLK_PERM_WRITE;
+}
+blk = blk_new(perm, BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
+BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
 ret = blk_insert_bs(blk, bs, errp);
 if (ret < 0) {
 goto fail;
-- 
1.8.3.1




[Qemu-block] [PATCH v2 22/43] block: Include details on permission errors in message

2017-02-27 Thread Kevin Wolf
Instead of just reporting that there was some conflict, we can be
specific and say which permissions were in conflict and in which
direction the conflict goes.

Signed-off-by: Kevin Wolf 
---
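The two directions of a conflict can be sketched in standalone C as
follows. This is an illustration under assumptions, not the patch: the
PERM_* values, check_perm() and the fixed-buffer perm_names() are
invented here (the patch itself builds the string with glib).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative bit values, assumed for this sketch only. */
enum {
    PERM_CONSISTENT_READ = 1 << 0,
    PERM_WRITE           = 1 << 1,
    PERM_WRITE_UNCHANGED = 1 << 2,
    PERM_RESIZE          = 1 << 3,
    PERM_GRAPH_MOD       = 1 << 4,
    PERM_ALL             = 0x1f
};

/* Write a comma-separated list of the permissions set in @perm to @buf. */
void perm_names(uint64_t perm, char *buf, size_t len)
{
    static const struct { uint64_t perm; const char *name; } table[] = {
        { PERM_CONSISTENT_READ, "consistent read" },
        { PERM_WRITE,           "write" },
        { PERM_WRITE_UNCHANGED, "write unchanged" },
        { PERM_RESIZE,          "resize" },
        { PERM_GRAPH_MOD,       "change children" },
    };
    size_t i, used = 0;

    assert(len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (perm & table[i].perm) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? ", " : "", table[i].name);
            if (n < 0 || (size_t)n >= len - used) {
                return;  /* output truncated, stop here */
            }
            used += (size_t)n;
        }
    }
}

/* Returns 0 if a new user fits next to an existing one, 1 if the
 * existing user does not allow something the new user needs, and 2 if
 * the existing user uses something the new user does not allow.
 * The offending permission bits are stored in *conflict. */
int check_perm(uint64_t new_used, uint64_t new_shared,
               uint64_t cur_perm, uint64_t cur_shared, uint64_t *conflict)
{
    if ((new_used & cur_shared) != new_used) {
        *conflict = new_used & ~cur_shared;
        return 1;
    }
    if ((cur_perm & new_shared) != cur_perm) {
        *conflict = cur_perm & ~new_shared;
        return 2;
    }
    *conflict = 0;
    return 0;
}
```

Splitting the check in two is what makes the message precise: each
branch knows both which side objects and which bits are at fault.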
 block.c | 67 ++---
 1 file changed, 56 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index c4696f1..a0c1886 100644
--- a/block.c
+++ b/block.c
@@ -1471,6 +1471,43 @@ static void bdrv_update_perm(BlockDriverState *bs)
 bdrv_set_perm(bs, cumulative_perms, cumulative_shared_perms);
 }
 
+static char *bdrv_child_user_desc(BdrvChild *c)
+{
+if (c->role->get_parent_desc) {
+return c->role->get_parent_desc(c);
+}
+
+return g_strdup("another user");
+}
+
+static char *bdrv_perm_names(uint64_t perm)
+{
+struct perm_name {
+uint64_t perm;
+const char *name;
+} permissions[] = {
+{ BLK_PERM_CONSISTENT_READ, "consistent read" },
+{ BLK_PERM_WRITE,   "write" },
+{ BLK_PERM_WRITE_UNCHANGED, "write unchanged" },
+{ BLK_PERM_RESIZE,  "resize" },
+{ BLK_PERM_GRAPH_MOD,   "change children" },
+{ 0, NULL }
+};
+
+char *result = g_strdup("");
+struct perm_name *p;
+
+for (p = permissions; p->name; p++) {
+if (perm & p->perm) {
+char *old = result;
+result = g_strdup_printf("%s%s%s", old, *old ? ", " : "", p->name);
+g_free(old);
+}
+}
+
+return result;
+}
+
 /*
  * Checks whether a new reference to @bs can be added if the new user requires
  * @new_used_perm/@new_shared_perm as its permissions. If @ignore_child is set,
@@ -1495,17 +1532,25 @@ static int bdrv_check_update_perm(BlockDriverState *bs, 
uint64_t new_used_perm,
 continue;
 }
 
-if ((new_used_perm & c->shared_perm) != new_used_perm ||
-(c->perm & new_shared_perm) != c->perm)
-{
-const char *user = NULL;
-if (c->role->get_name) {
-user = c->role->get_name(c);
-if (user && !*user) {
-user = NULL;
-}
-}
-error_setg(errp, "Conflicts with %s", user ?: "another operation");
+if ((new_used_perm & c->shared_perm) != new_used_perm) {
+char *user = bdrv_child_user_desc(c);
+char *perm_names = bdrv_perm_names(new_used_perm & ~c->shared_perm);
+error_setg(errp, "Conflicts with use by %s as '%s', which does not "
+ "allow '%s' on %s",
+   user, c->name, perm_names, bdrv_get_node_name(c->bs));
+g_free(user);
+g_free(perm_names);
+return -EPERM;
+}
+
+if ((c->perm & new_shared_perm) != c->perm) {
+char *user = bdrv_child_user_desc(c);
+char *perm_names = bdrv_perm_names(c->perm & ~new_shared_perm);
+error_setg(errp, "Conflicts with use by %s as '%s', which uses "
+ "'%s' on %s",
+   user, c->name, perm_names, bdrv_get_node_name(c->bs));
+g_free(user);
+g_free(perm_names);
 return -EPERM;
 }
 
-- 
1.8.3.1




[Qemu-block] [PATCH v2 40/43] block: Assertions for write permissions

2017-02-27 Thread Kevin Wolf
This adds assertions that ensure that the necessary write permissions
have been granted before someone attempts to write to a node.

Signed-off-by: Kevin Wolf 
---
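As a rough illustration of the two assertions, assuming made-up PERM_*
values and helper names: a copy-on-read write is satisfied by either
'write unchanged' or 'write', while an ordinary write needs 'write'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values, assumed for this sketch only. */
enum {
    PERM_WRITE           = 1 << 1,
    PERM_WRITE_UNCHANGED = 1 << 2
};

/* Copy-on-read only rewrites data that reads back the same. */
bool may_copy_on_read(uint64_t perm)
{
    return (perm & (PERM_WRITE_UNCHANGED | PERM_WRITE)) != 0;
}

/* A normal write changes guest-visible data. */
bool may_write(uint64_t perm)
{
    return (perm & PERM_WRITE) != 0;
}

/* Hypothetical write path guarded the same way the patch guards it. */
void checked_write(uint64_t perm)
{
    assert(may_write(perm));
    /* ... the actual write would happen here ... */
}
```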
 block/io.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/io.c b/block/io.c
index 2592ca1..4c79745 100644
--- a/block/io.c
+++ b/block/io.c
@@ -945,6 +945,8 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild 
*child,
 size_t skip_bytes;
 int ret;
 
+assert(child->perm & (BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE));
+
 /* Cover entire cluster so no additional backing file I/O is required when
  * allocating cluster in the image file.
  */
@@ -1336,6 +1338,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 assert(!waited || !req->serialising);
 assert(req->overlap_offset <= offset);
 assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
+assert(child->perm & BLK_PERM_WRITE);
 
 ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
 
-- 
1.8.3.1




[Qemu-block] [PATCH v2 34/43] commit: Add filter-node-name to block-commit

2017-02-27 Thread Kevin Wolf
Management tools need to be able to know about every node in the graph
and need a way to address them. Changing the graph structure was okay
because libvirt doesn't really manage the node level yet, but future
libvirt versions need to deal with both new and old versions of qemu.

This new option to block-commit allows the client to set a node-name
for the automatically inserted filter driver, and at the same time
serves as a witness for a future libvirt that this version of qemu does
automatically insert a filter driver.

Signed-off-by: Kevin Wolf 
---
 block/commit.c|  5 +++--
 block/mirror.c|  3 ++-
 block/replication.c   |  2 +-
 blockdev.c| 10 +++---
 include/block/block_int.h | 13 ++---
 qapi/block-core.json  |  8 +++-
 qemu-img.c|  4 ++--
 7 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index f18026b..1e0f531 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -257,7 +257,7 @@ static BlockDriver bdrv_commit_top = {
 void commit_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, BlockDriverState *top, int64_t speed,
   BlockdevOnError on_error, const char *backing_file_str,
-  Error **errp)
+  const char *filter_node_name, Error **errp)
 {
 CommitBlockJob *s;
 BlockReopenQueue *reopen_queue = NULL;
@@ -310,7 +310,8 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 
 /* Insert commit_top block node above top, so we can block consistent read
  * on the backing chain below it */
-commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, 0, errp);
+commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, filter_node_name, 0,
+ errp);
 if (commit_top_bs == NULL) {
 goto fail;
 }
diff --git a/block/mirror.c b/block/mirror.c
index c08b7e0..eea2d76 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1242,6 +1242,7 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
 void commit_active_start(const char *job_id, BlockDriverState *bs,
  BlockDriverState *base, int creation_flags,
  int64_t speed, BlockdevOnError on_error,
+ const char *filter_node_name,
  BlockCompletionFunc *cb, void *opaque, Error **errp,
  bool auto_complete)
 {
@@ -1258,7 +1259,7 @@ void commit_active_start(const char *job_id, 
BlockDriverState *bs,
  MIRROR_LEAVE_BACKING_CHAIN,
  on_error, on_error, true, cb, opaque, &local_err,
  &commit_active_job_driver, false, base, auto_complete,
- NULL);
+ filter_node_name);
 if (local_err) {
 error_propagate(errp, local_err);
 goto error_restore_flags;
diff --git a/block/replication.c b/block/replication.c
index 91465cb..22f170f 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -644,7 +644,7 @@ static void replication_stop(ReplicationState *rs, bool 
failover, Error **errp)
 s->replication_state = BLOCK_REPLICATION_FAILOVER;
 commit_active_start(NULL, s->active_disk->bs, s->secondary_disk->bs,
 BLOCK_JOB_INTERNAL, 0, BLOCKDEV_ON_ERROR_REPORT,
-replication_done, bs, errp, true);
+NULL, replication_done, bs, errp, true);
 break;
 default:
 aio_context_release(aio_context);
diff --git a/blockdev.c b/blockdev.c
index 5bd09f8..34b522b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3031,6 +3031,7 @@ void qmp_block_commit(bool has_job_id, const char 
*job_id, const char *device,
   bool has_top, const char *top,
   bool has_backing_file, const char *backing_file,
   bool has_speed, int64_t speed,
+  bool has_filter_node_name, const char *filter_node_name,
   Error **errp)
 {
 BlockDriverState *bs;
@@ -3046,6 +3047,9 @@ void qmp_block_commit(bool has_job_id, const char 
*job_id, const char *device,
 if (!has_speed) {
 speed = 0;
 }
+if (!has_filter_node_name) {
+filter_node_name = NULL;
+}
 
 /* Important Note:
  *  libvirt relies on the DeviceNotFound error class in order to probe for
@@ -3120,8 +3124,8 @@ void qmp_block_commit(bool has_job_id, const char 
*job_id, const char *device,
 goto out;
 }
 commit_active_start(has_job_id ? job_id : NULL, bs, base_bs,
-BLOCK_JOB_DEFAULT, speed, on_error, NULL, NULL,
-&local_err, false);
+BLOCK_JOB_DEFAULT, speed, on_error,
+filter_node_name, NULL, NULL, &local_err, false);
 } else {
 BlockDriverState 

[Qemu-block] [PATCH v2 35/43] hmp: Request permissions in qemu-io

2017-02-27 Thread Kevin Wolf
The HMP command 'qemu-io' is a bit tricky because it wants to work on
the original BlockBackend, but additional permissions could be required.
The details are explained in a comment in the code, but in summary, just
request whatever permissions the current qemu-io command needs.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/block-backend.c  |  6 ++
 hmp.c  | 26 +-
 include/qemu-io.h  |  1 +
 include/sysemu/block-backend.h |  1 +
 qemu-io-cmds.c | 28 
 5 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 38a3858..daa7908 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -584,6 +584,12 @@ int blk_set_perm(BlockBackend *blk, uint64_t perm, 
uint64_t shared_perm,
 return 0;
 }
 
+void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm)
+{
+*perm = blk->perm;
+*shared_perm = blk->shared_perm;
+}
+
 static int blk_do_attach_dev(BlockBackend *blk, void *dev)
 {
 if (blk->dev) {
diff --git a/hmp.c b/hmp.c
index e219f97..7b44e64 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2051,7 +2051,6 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
 if (!blk) {
 BlockDriverState *bs = bdrv_lookup_bs(NULL, device, &err);
 if (bs) {
-/* FIXME Use real permissions */
 blk = local_blk = blk_new(0, BLK_PERM_ALL);
 ret = blk_insert_bs(blk, bs, &err);
 if (ret < 0) {
@@ -2065,6 +2064,31 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
 aio_context = blk_get_aio_context(blk);
 aio_context_acquire(aio_context);
 
+/*
+ * Notably absent: Proper permission management. This is sad, but it seems
+ * almost impossible to achieve without changing the semantics and thereby
+ * limiting the use cases of the qemu-io HMP command.
+ *
+ * In an ideal world we would unconditionally create a new BlockBackend for
+ * qemuio_command(), but we have commands like 'reopen' and want them to
+ * take effect on the exact BlockBackend whose name the user passed instead
+ * of just on a temporary copy of it.
+ *
+ * Another problem is that deleting the temporary BlockBackend involves
+ * draining all requests on it first, but some qemu-iotests cases want to
+ * issue multiple aio_read/write requests and expect them to complete in
+ * the background while the monitor has already returned.
+ *
+ * This is also what prevents us from saving the original permissions and
+ * restoring them later: We can't revoke permissions until all requests
+ * have completed, and we don't know when that is nor can we really let
+ * anything else run before we have revoked them to avoid race conditions.
+ *
+ * What happens now is that command() in qemu-io-cmds.c can extend the
+ * permissions if necessary for the qemu-io command. And they simply stay
+ * extended, possibly resulting in a read-only guest device keeping write
+ * permissions. Ugly, but it appears to be the lesser evil.
+ */
 qemuio_command(blk, command);
 
 aio_context_release(aio_context);
diff --git a/include/qemu-io.h b/include/qemu-io.h
index 4d402b9..196fde0 100644
--- a/include/qemu-io.h
+++ b/include/qemu-io.h
@@ -36,6 +36,7 @@ typedef struct cmdinfo {
 const char  *args;
 const char  *oneline;
 helpfunc_t  help;
+uint64_tperm;
 } cmdinfo_t;
 
 extern bool qemuio_misalign;
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index b23f683..096c17f 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -107,6 +107,7 @@ bool bdrv_has_blk(BlockDriverState *bs);
 bool bdrv_is_root_node(BlockDriverState *bs);
 int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
  Error **errp);
+void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm);
 
 void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow);
 void blk_iostatus_enable(BlockBackend *blk);
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 7ac1576..2c48f9c 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -83,6 +83,29 @@ static int command(BlockBackend *blk, const cmdinfo_t *ct, 
int argc,
 }
 return 0;
 }
+
+/* Request additional permissions if necessary for this command. The caller
+ * is responsible for restoring the original permissions afterwards if this
+ * is what it wants. */
+if (ct->perm && blk_is_available(blk)) {
+uint64_t orig_perm, orig_shared_perm;
+blk_get_perm(blk, &orig_perm, &orig_shared_perm);
+
+if (ct->perm & ~orig_perm) {
+uint64_t new_perm;
+Error *local_err = NULL;
+int ret;
+
+new_perm = orig_perm | ct->perm;
+
+ret = blk_set_perm(blk, 

[Qemu-block] [PATCH v2 21/43] block: Add BdrvChildRole.get_parent_desc()

2017-02-27 Thread Kevin Wolf
For meaningful error messages in the permission system, we need to get
some human-readable description of the parent of a BdrvChild.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
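A self-contained sketch of the callback-with-fallback pattern, assuming
simplified stand-in types and invented helper names (dup_string,
named_parent_desc, child_user_desc). As in the patch, the callback
returns a heap-allocated string that the caller must free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct child;

/* Hypothetical stand-in for BdrvChildRole: the callback is optional. */
struct role {
    char *(*get_parent_desc)(struct child *c);
};

struct child {
    const struct role *role;
    void *opaque;  /* parent object; here simply a name string */
};

/* Portable replacement for strdup(); returns NULL on allocation failure. */
char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p) {
        memcpy(p, s, n);
    }
    return p;
}

/* Example callback: the parent is identified by the name in opaque. */
char *named_parent_desc(struct child *c)
{
    return dup_string((const char *)c->opaque);
}

/* Describe the parent of @c for a human reader; caller frees the result. */
char *child_user_desc(struct child *c)
{
    assert(c != NULL && c->role != NULL);
    if (c->role->get_parent_desc) {
        return c->role->get_parent_desc(c);
    }
    return dup_string("another user");
}
```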
 block.c   |  9 +
 block/block-backend.c | 21 +
 include/block/block_int.h |  6 ++
 3 files changed, 36 insertions(+)

diff --git a/block.c b/block.c
index 189351e..c4696f1 100644
--- a/block.c
+++ b/block.c
@@ -707,6 +707,12 @@ int bdrv_parse_cache_mode(const char *mode, int *flags, 
bool *writethrough)
 return 0;
 }
 
+static char *bdrv_child_get_parent_desc(BdrvChild *c)
+{
+BlockDriverState *parent = c->opaque;
+return g_strdup(bdrv_get_device_or_node_name(parent));
+}
+
 static void bdrv_child_cb_drained_begin(BdrvChild *child)
 {
 BlockDriverState *bs = child->opaque;
@@ -774,6 +780,7 @@ static void bdrv_inherited_options(int *child_flags, QDict 
*child_options,
 }
 
 const BdrvChildRole child_file = {
+.get_parent_desc = bdrv_child_get_parent_desc,
 .inherit_options = bdrv_inherited_options,
 .drained_begin   = bdrv_child_cb_drained_begin,
 .drained_end = bdrv_child_cb_drained_end,
@@ -794,6 +801,7 @@ static void bdrv_inherited_fmt_options(int *child_flags, 
QDict *child_options,
 }
 
 const BdrvChildRole child_format = {
+.get_parent_desc = bdrv_child_get_parent_desc,
 .inherit_options = bdrv_inherited_fmt_options,
 .drained_begin   = bdrv_child_cb_drained_begin,
 .drained_end = bdrv_child_cb_drained_end,
@@ -824,6 +832,7 @@ static void bdrv_backing_options(int *child_flags, QDict 
*child_options,
 }
 
 const BdrvChildRole child_backing = {
+.get_parent_desc = bdrv_child_get_parent_desc,
 .inherit_options = bdrv_backing_options,
 .drained_begin   = bdrv_child_cb_drained_begin,
 .drained_end = bdrv_child_cb_drained_end,
diff --git a/block/block-backend.c b/block/block-backend.c
index fcc42b5..38a3858 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -80,6 +80,7 @@ static const AIOCBInfo block_backend_aiocb_info = {
 
 static void drive_info_del(DriveInfo *dinfo);
 static BlockBackend *bdrv_first_blk(BlockDriverState *bs);
+static char *blk_get_attached_dev_id(BlockBackend *blk);
 
 /* All BlockBackends */
 static QTAILQ_HEAD(, BlockBackend) block_backends =
@@ -102,6 +103,25 @@ static void blk_root_drained_end(BdrvChild *child);
 static void blk_root_change_media(BdrvChild *child, bool load);
 static void blk_root_resize(BdrvChild *child);
 
+static char *blk_root_get_parent_desc(BdrvChild *child)
+{
+BlockBackend *blk = child->opaque;
+char *dev_id;
+
+if (blk->name) {
+return g_strdup(blk->name);
+}
+
+dev_id = blk_get_attached_dev_id(blk);
+if (*dev_id) {
+return dev_id;
+} else {
+/* TODO Callback into the BB owner for something more detailed */
+g_free(dev_id);
+return g_strdup("a block device");
+}
+}
+
 static const char *blk_root_get_name(BdrvChild *child)
 {
 return blk_name(child->opaque);
@@ -113,6 +133,7 @@ static const BdrvChildRole child_root = {
 .change_media   = blk_root_change_media,
 .resize = blk_root_resize,
 .get_name   = blk_root_get_name,
+.get_parent_desc= blk_root_get_parent_desc,
 
 .drained_begin  = blk_root_drained_begin,
 .drained_end= blk_root_drained_end,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e00d0f4..3177b9f 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -452,6 +452,12 @@ struct BdrvChildRole {
  * name), or NULL if the parent can't provide a better name. */
 const char* (*get_name)(BdrvChild *child);
 
+/* Returns a malloced string that describes the parent of the child for a
+ * human reader. This could be a node-name, BlockBackend name, qdev ID or
+ * QOM path of the device owning the BlockBackend, job type and ID etc. The
+ * caller is responsible for freeing the memory. */
+char* (*get_parent_desc)(BdrvChild *child);
+
 /*
  * If this pair of functions is implemented, the parent doesn't issue new
  * requests after returning from .drained_begin() until .drained_end() is
-- 
1.8.3.1




[Qemu-block] [PATCH v2 28/43] block: Fix pending requests check in bdrv_append()

2017-02-27 Thread Kevin Wolf
bdrv_append() cares about isolation of the node that it modifies, but
not about activity in some subtree below it. Instead of using the
recursive bdrv_requests_pending(), directly check bs->in_flight, which
considers only the node in question.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
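The difference between the two checks, sketched with a hypothetical
node type that has a request counter and a fixed number of child
slots. Neither function is QEMU code; they only model "any node in the
subtree is busy" versus "this node is busy".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_CHILDREN = 2 };

/* Hypothetical, simplified node. */
struct node {
    unsigned in_flight;                  /* requests on this node only */
    struct node *children[MAX_CHILDREN]; /* unused slots are NULL */
};

/* True if this node or anything below it has requests in flight. */
bool requests_pending_recursive(const struct node *n)
{
    size_t i;

    assert(n != NULL);
    if (n->in_flight) {
        return true;
    }
    for (i = 0; i < MAX_CHILDREN; i++) {
        if (n->children[i] && requests_pending_recursive(n->children[i])) {
            return true;
        }
    }
    return false;
}

/* True only if this particular node has requests in flight. */
bool node_has_requests(const struct node *n)
{
    assert(n != NULL);
    return n->in_flight != 0;
}
```

With a busy node somewhere below an idle one, the recursive check
fires for the idle node while the per-node check does not, which is the
case the patch is about.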
 block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 6456917..7926c6c 100644
--- a/block.c
+++ b/block.c
@@ -2886,8 +2886,8 @@ static void change_parent_backing_link(BlockDriverState 
*from,
  */
 void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
 {
-assert(!bdrv_requests_pending(bs_top));
-assert(!bdrv_requests_pending(bs_new));
+assert(!atomic_read(&bs_top->in_flight));
+assert(!atomic_read(&bs_new->in_flight));
 
 bdrv_ref(bs_top);
 
-- 
1.8.3.1




[Qemu-block] [PATCH v2 24/43] blockjob: Add permissions to block_job_add_bdrv()

2017-02-27 Thread Kevin Wolf
Block jobs don't actually do I/O through the reference they create
with block_job_add_bdrv(), but they might want to use the permission
system to express what the block job does to intermediate nodes. This
adds permissions to block_job_add_bdrv() to provide the means to request
permissions.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/backup.c   |  4 +++-
 block/commit.c   |  8 ++--
 block/mirror.c   |  9 +++--
 block/stream.c   |  4 +++-
 blockjob.c   | 36 ++--
 include/block/blockjob.h |  5 -
 6 files changed, 53 insertions(+), 13 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index c759684..405f271 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -657,7 +657,9 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
 }
 
-block_job_add_bdrv(&job->common, target);
+/* FIXME Use real permissions */
+block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
+   &error_abort);
 job->common.len = len;
 block_job_txn_add_job(txn, &job->common);
 
diff --git a/block/commit.c b/block/commit.c
index 60d29a9..b69586f 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -267,13 +267,17 @@ void commit_start(const char *job_id, BlockDriverState 
*bs,
  * disappear from the chain after this operation. */
 assert(bdrv_chain_contains(top, base));
 for (iter = top; iter != backing_bs(base); iter = backing_bs(iter)) {
-block_job_add_bdrv(&s->common, iter);
+/* FIXME Use real permissions */
+block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
+   BLK_PERM_ALL, &error_abort);
 }
 /* overlay_bs must be blocked because it needs to be modified to
  * update the backing image string, but if it's the root node then
  * don't block it again */
 if (bs != overlay_bs) {
-block_job_add_bdrv(&s->common, overlay_bs);
+/* FIXME Use real permissions */
+block_job_add_bdrv(&s->common, "overlay of top", overlay_bs, 0,
+   BLK_PERM_ALL, &error_abort);
 }
 
 /* FIXME Use real permissions */
diff --git a/block/mirror.c b/block/mirror.c
index cd4e7db..beaac6f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1052,13 +1052,18 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 return;
 }
 
-block_job_add_bdrv(&s->common, target);
+/* FIXME Use real permissions */
+block_job_add_bdrv(&s->common, "target", target, 0, BLK_PERM_ALL,
+   &error_abort);
+
 /* In commit_active_start() all intermediate nodes disappear, so
  * any jobs in them must be blocked */
 if (bdrv_chain_contains(bs, target)) {
 BlockDriverState *iter;
 for (iter = backing_bs(bs); iter != target; iter = backing_bs(iter)) {
-block_job_add_bdrv(&s->common, iter);
+/* FIXME Use real permissions */
+block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
+   BLK_PERM_ALL, &error_abort);
 }
 }
 
diff --git a/block/stream.c b/block/stream.c
index 7f49279..ba8650f 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -248,7 +248,9 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 /* Block all intermediate nodes between bs and base, because they
  * will disappear from the chain after this operation */
 for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
-block_job_add_bdrv(&s->common, iter);
+/* FIXME Use real permissions */
+block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
+   BLK_PERM_ALL, &error_abort);
 }
 
 s->base = base;
diff --git a/blockjob.c b/blockjob.c
index 27833c7..4216cde 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -55,6 +55,19 @@ struct BlockJobTxn {
 
 static QLIST_HEAD(, BlockJob) block_jobs = QLIST_HEAD_INITIALIZER(block_jobs);
 
+static char *child_job_get_parent_desc(BdrvChild *c)
+{
+BlockJob *job = c->opaque;
+return g_strdup_printf("%s job '%s'",
+   BlockJobType_lookup[job->driver->job_type],
+   job->id);
+}
+
+static const BdrvChildRole child_job = {
+.get_parent_desc= child_job_get_parent_desc,
+.stay_at_node   = true,
+};
+
 BlockJob *block_job_next(BlockJob *job)
 {
 if (!job) {
@@ -115,11 +128,22 @@ static void block_job_detach_aio_context(void *opaque)
 block_job_unref(job);
 }
 
-void block_job_add_bdrv(BlockJob *job, BlockDriverState *bs)
+int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
+   uint64_t perm, uint64_t shared_perm, Error **errp)
 {
-job->nodes = g_slist_prepend(job->nodes, bs);
+BdrvChild *c;
+
+c = bdrv_root_attach_child(bs, name, 

[Qemu-block] [PATCH v2 25/43] commit: Use real permissions in commit block job

2017-02-27 Thread Kevin Wolf
This is probably one of the most interesting conversions to the new
op blocker system because a commit block job intentionally leaves some
intermediate block nodes in the backing chain that aren't valid on their
own any more; only the whole chain together results in a valid view.

In order to provide the 'consistent read' permission to the parents of
the 'top' node of the commit job, a new filter block driver is inserted
above 'top' which doesn't require 'consistent read' on its backing
chain. Subsequently, the commit job can block 'consistent read' on all
intermediate nodes without causing a conflict.

Signed-off-by: Kevin Wolf 
---
 block/commit.c | 113 -
 1 file changed, 95 insertions(+), 18 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index b69586f..8de4473 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -36,6 +36,7 @@ typedef struct CommitBlockJob {
 BlockJob common;
 RateLimit limit;
 BlockDriverState *active;
+BlockDriverState *commit_top_bs;
 BlockBackend *top;
 BlockBackend *base;
 BlockdevOnError on_error;
@@ -83,12 +84,23 @@ static void commit_complete(BlockJob *job, void *opaque)
 BlockDriverState *active = s->active;
 BlockDriverState *top = blk_bs(s->top);
 BlockDriverState *base = blk_bs(s->base);
-BlockDriverState *overlay_bs = bdrv_find_overlay(active, top);
+BlockDriverState *overlay_bs = bdrv_find_overlay(active, s->commit_top_bs);
 int ret = data->ret;
+bool remove_commit_top_bs = false;
+
+/* Remove base node parent that still uses BLK_PERM_WRITE/RESIZE before
+ * the normal backing chain can be restored. */
+blk_unref(s->base);
 
 if (!block_job_is_cancelled(&s->common) && ret == 0) {
 /* success */
-ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
+ret = bdrv_drop_intermediate(active, s->commit_top_bs, base,
+ s->backing_file_str);
+} else if (overlay_bs) {
+/* XXX Can (or should) we somehow keep 'consistent read' blocked even
+ * after the failed/cancelled commit job is gone? If we already wrote
+ * something to base, the intermediate images aren't valid any more. */
+remove_commit_top_bs = true;
 }
 
 /* restore base open flags here if appropriate (e.g., change the base back
@@ -102,9 +114,15 @@ static void commit_complete(BlockJob *job, void *opaque)
 }
 g_free(s->backing_file_str);
 blk_unref(s->top);
-blk_unref(s->base);
 block_job_completed(&s->common, ret);
 g_free(data);
+
+/* If bdrv_drop_intermediate() didn't already do that, remove the commit
+ * filter driver from the backing chain. Do this as the final step so that
+ * the 'consistent read' permission can be granted.  */
+if (remove_commit_top_bs) {
+bdrv_set_backing_hd(overlay_bs, top);
+}
 }
 
 static void coroutine_fn commit_run(void *opaque)
@@ -208,6 +226,34 @@ static const BlockJobDriver commit_job_driver = {
 .start = commit_run,
 };
 
+static int coroutine_fn bdrv_commit_top_preadv(BlockDriverState *bs,
+uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
+{
+return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
+}
+
+static void bdrv_commit_top_close(BlockDriverState *bs)
+{
+}
+
+static void bdrv_commit_top_child_perm(BlockDriverState *bs, BdrvChild *c,
+   const BdrvChildRole *role,
+   uint64_t perm, uint64_t shared,
+   uint64_t *nperm, uint64_t *nshared)
+{
+*nperm = 0;
+*nshared = BLK_PERM_ALL;
+}
+
+/* Dummy node that provides consistent read to its users without requiring it
+ * from its backing file and that allows writes on the backing file chain. */
+static BlockDriver bdrv_commit_top = {
+.format_name= "commit_top",
+.bdrv_co_preadv = bdrv_commit_top_preadv,
+.bdrv_close = bdrv_commit_top_close,
+.bdrv_child_perm= bdrv_commit_top_child_perm,
+};
+
 void commit_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, BlockDriverState *top, int64_t speed,
   BlockdevOnError on_error, const char *backing_file_str,
@@ -219,6 +265,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 int orig_base_flags;
 BlockDriverState *iter;
 BlockDriverState *overlay_bs;
+BlockDriverState *commit_top_bs = NULL;
 Error *local_err = NULL;
 int ret;
 
@@ -235,7 +282,6 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 return;
 }
 
-/* FIXME Use real permissions */
 s = block_job_create(job_id, &commit_job_driver, bs, 0, BLK_PERM_ALL,
  speed, BLOCK_JOB_DEFAULT, NULL, NULL, errp);
 if (!s) {
@@ -262,34 +308,62 @@ void commit_start(const char *job_id, 

[Qemu-block] [PATCH v2 32/43] stream: Use real permissions in streaming block job

2017-02-27 Thread Kevin Wolf
The correct permissions are relatively obvious here (and explained in
code comments). For intermediate streaming, we need to reopen the top
node read-write before creating the job now because the permissions
system catches attempts to get the BLK_PERM_WRITE_UNCHANGED permission
on a read-only node.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/stream.c | 37 +
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index ba8650f..0c30d41 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -229,28 +229,35 @@ void stream_start(const char *job_id, BlockDriverState 
*bs,
 BlockDriverState *iter;
 int orig_bs_flags;
 
-/* FIXME Use real permissions */
-s = block_job_create(job_id, &stream_job_driver, bs, 0, BLK_PERM_ALL,
- speed, BLOCK_JOB_DEFAULT, NULL, NULL, errp);
-if (!s) {
-return;
-}
-
 /* Make sure that the image is opened in read-write mode */
 orig_bs_flags = bdrv_get_flags(bs);
 if (!(orig_bs_flags & BDRV_O_RDWR)) {
 if (bdrv_reopen(bs, orig_bs_flags | BDRV_O_RDWR, errp) != 0) {
-block_job_unref(&s->common);
 return;
 }
 }
 
-/* Block all intermediate nodes between bs and base, because they
- * will disappear from the chain after this operation */
+/* Prevent concurrent jobs trying to modify the graph structure here, we
+ * already have our own plans. Also don't allow resize as the image size is
+ * queried only at the job start and then cached. */
+s = block_job_create(job_id, &stream_job_driver, bs,
+ BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
+ BLK_PERM_GRAPH_MOD,
+ BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
+ BLK_PERM_WRITE,
+ speed, BLOCK_JOB_DEFAULT, NULL, NULL, errp);
+if (!s) {
+goto fail;
+}
+
+/* Block all intermediate nodes between bs and base, because they will
+ * disappear from the chain after this operation. The streaming job reads
+ * every block only once, assuming that it doesn't change, so block writes
+ * and resizes. */
 for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
-/* FIXME Use real permissions */
 block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
-   BLK_PERM_ALL, &error_abort);
+   BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED,
+   &error_abort);
 }
 
 s->base = base;
@@ -260,4 +267,10 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 s->on_error = on_error;
 trace_stream_start(bs, base, s);
 block_job_start(&s->common);
+return;
+
+fail:
+if (orig_bs_flags != bdrv_get_flags(bs)) {
+bdrv_reopen(bs, orig_bs_flags, NULL);
+}
 }
-- 
1.8.3.1




[Qemu-block] [PATCH v2 08/43] block: Request child permissions in format drivers

2017-02-27 Thread Kevin Wolf
This makes use of the .bdrv_child_perm() implementation for formats that
we just added. All format drivers expose the permissions they actually
need now, so that they can be set accordingly and updated when parents
are attached or detached.

The only format not included here is raw, which was already converted
with the other filter drivers.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/bochs.c | 1 +
 block/cloop.c | 1 +
 block/crypto.c| 1 +
 block/dmg.c   | 1 +
 block/parallels.c | 1 +
 block/qcow.c  | 1 +
 block/qcow2.c | 1 +
 block/qed.c   | 1 +
 block/vdi.c   | 1 +
 block/vhdx.c  | 1 +
 block/vmdk.c  | 1 +
 block/vpc.c   | 1 +
 12 files changed, 12 insertions(+)

diff --git a/block/bochs.c b/block/bochs.c
index 7dd2ac4..516da56 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -293,6 +293,7 @@ static BlockDriver bdrv_bochs = {
 .instance_size = sizeof(BDRVBochsState),
 .bdrv_probe= bochs_probe,
 .bdrv_open = bochs_open,
+.bdrv_child_perm = bdrv_format_default_perms,
 .bdrv_refresh_limits = bochs_refresh_limits,
 .bdrv_co_preadv = bochs_co_preadv,
 .bdrv_close= bochs_close,
diff --git a/block/cloop.c b/block/cloop.c
index 877c9b0..a6c7b9d 100644
--- a/block/cloop.c
+++ b/block/cloop.c
@@ -290,6 +290,7 @@ static BlockDriver bdrv_cloop = {
 .instance_size  = sizeof(BDRVCloopState),
 .bdrv_probe = cloop_probe,
 .bdrv_open  = cloop_open,
+.bdrv_child_perm = bdrv_format_default_perms,
 .bdrv_refresh_limits = cloop_refresh_limits,
 .bdrv_co_preadv = cloop_co_preadv,
 .bdrv_close = cloop_close,
diff --git a/block/crypto.c b/block/crypto.c
index 7cb2ff2..4a20388 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -628,6 +628,7 @@ BlockDriver bdrv_crypto_luks = {
 .bdrv_probe = block_crypto_probe_luks,
 .bdrv_open  = block_crypto_open_luks,
 .bdrv_close = block_crypto_close,
+.bdrv_child_perm= bdrv_format_default_perms,
 .bdrv_create= block_crypto_create_luks,
 .bdrv_truncate  = block_crypto_truncate,
 .create_opts= &block_crypto_create_opts_luks,
diff --git a/block/dmg.c b/block/dmg.c
index 8e387cd..a7d25fc 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -697,6 +697,7 @@ static BlockDriver bdrv_dmg = {
 .bdrv_probe = dmg_probe,
 .bdrv_open  = dmg_open,
 .bdrv_refresh_limits = dmg_refresh_limits,
+.bdrv_child_perm = bdrv_format_default_perms,
 .bdrv_co_preadv = dmg_co_preadv,
 .bdrv_close = dmg_close,
 };
diff --git a/block/parallels.c b/block/parallels.c
index b2ec09f..6b0c0a9 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -762,6 +762,7 @@ static BlockDriver bdrv_parallels = {
 .bdrv_probe= parallels_probe,
 .bdrv_open = parallels_open,
 .bdrv_close= parallels_close,
+.bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_co_get_block_status = parallels_co_get_block_status,
 .bdrv_has_zero_init   = bdrv_has_zero_init_1,
 .bdrv_co_flush_to_os  = parallels_co_flush_to_os,
diff --git a/block/qcow.c b/block/qcow.c
index 038b05a..eb5d54c 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -1052,6 +1052,7 @@ static BlockDriver bdrv_qcow = {
 .bdrv_probe= qcow_probe,
 .bdrv_open = qcow_open,
 .bdrv_close= qcow_close,
+.bdrv_child_perm= bdrv_format_default_perms,
 .bdrv_reopen_prepare= qcow_reopen_prepare,
 .bdrv_create= qcow_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
diff --git a/block/qcow2.c b/block/qcow2.c
index 21e6142..ef028f6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3403,6 +3403,7 @@ BlockDriver bdrv_qcow2 = {
 .bdrv_reopen_commit   = qcow2_reopen_commit,
 .bdrv_reopen_abort= qcow2_reopen_abort,
 .bdrv_join_options= qcow2_join_options,
+.bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create= qcow2_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
 .bdrv_co_get_block_status = qcow2_co_get_block_status,
diff --git a/block/qed.c b/block/qed.c
index 62a0a09..d8f947a 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1704,6 +1704,7 @@ static BlockDriver bdrv_qed = {
 .bdrv_open= bdrv_qed_open,
 .bdrv_close   = bdrv_qed_close,
 .bdrv_reopen_prepare  = bdrv_qed_reopen_prepare,
+.bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create  = bdrv_qed_create,
 .bdrv_has_zero_init   = bdrv_has_zero_init_1,
 .bdrv_co_get_block_status = bdrv_qed_co_get_block_status,
diff --git a/block/vdi.c b/block/vdi.c
index 18b4773..fd6e26d 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -891,6 +891,7 @@ static BlockDriver bdrv_vdi = {
 .bdrv_open = vdi_open,
 .bdrv_close = 

[Qemu-block] [PATCH v2 09/43] vvfat: Implement .bdrv_child_perm()

2017-02-27 Thread Kevin Wolf
vvfat is the last remaining driver that can have children, but doesn't
implement .bdrv_child_perm() yet. The default handlers aren't suitable
here, so let's implement a very simple driver-specific one that protects
the internal child from being used by other users as well as our
permissions permit.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c   |  2 +-
 block/vvfat.c | 22 ++
 include/block/block_int.h |  1 +
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 20dd1c5..7ae8264 100644
--- a/block.c
+++ b/block.c
@@ -823,7 +823,7 @@ static void bdrv_backing_options(int *child_flags, QDict 
*child_options,
 *child_flags = flags;
 }
 
-static const BdrvChildRole child_backing = {
+const BdrvChildRole child_backing = {
 .inherit_options = bdrv_backing_options,
 .drained_begin   = bdrv_child_cb_drained_begin,
 .drained_end = bdrv_child_cb_drained_end,
diff --git a/block/vvfat.c b/block/vvfat.c
index 7f230be..72b482c 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3052,6 +3052,27 @@ err:
 return ret;
 }
 
+static void vvfat_child_perm(BlockDriverState *bs, BdrvChild *c,
+ const BdrvChildRole *role,
+ uint64_t perm, uint64_t shared,
+ uint64_t *nperm, uint64_t *nshared)
+{
+BDRVVVFATState *s = bs->opaque;
+
+assert(c == s->qcow || role == &child_backing);
+
+if (c == s->qcow) {
+/* This is a private node, nobody should try to attach to it */
+*nperm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
+*nshared = BLK_PERM_WRITE_UNCHANGED;
+} else {
+/* The backing file is there so 'commit' can use it. vvfat doesn't
+ * access it in any way. */
+*nperm = 0;
+*nshared = BLK_PERM_ALL;
+}
+}
+
 static void vvfat_close(BlockDriverState *bs)
 {
 BDRVVVFATState *s = bs->opaque;
@@ -3077,6 +3098,7 @@ static BlockDriver bdrv_vvfat = {
 .bdrv_file_open = vvfat_open,
 .bdrv_refresh_limits= vvfat_refresh_limits,
 .bdrv_close = vvfat_close,
+.bdrv_child_perm= vvfat_child_perm,
 
 .bdrv_co_preadv = vvfat_co_preadv,
 .bdrv_co_pwritev= vvfat_co_pwritev,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index eb0598e..63d5446 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -466,6 +466,7 @@ struct BdrvChildRole {
 
 extern const BdrvChildRole child_file;
 extern const BdrvChildRole child_format;
+extern const BdrvChildRole child_backing;
 
 struct BdrvChild {
 BlockDriverState *bs;
-- 
1.8.3.1




[Qemu-block] [PATCH v2 20/43] blockjob: Add permissions to block_job_create()

2017-02-27 Thread Kevin Wolf
This function creates a BlockBackend internally, so the block jobs need
to tell it what they want to do with the BB.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/backup.c   | 5 +++--
 block/commit.c   | 5 +++--
 block/mirror.c   | 5 +++--
 block/stream.c   | 5 +++--
 blockjob.c   | 6 +++---
 include/block/blockjob_int.h | 4 +++-
 tests/test-blockjob-txn.c| 6 +++---
 tests/test-blockjob.c| 5 +++--
 8 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index f38d1d0..c759684 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -618,8 +618,9 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 goto error;
 }
 
-job = block_job_create(job_id, &backup_job_driver, bs, speed,
-   creation_flags, cb, opaque, errp);
+/* FIXME Use real permissions */
+job = block_job_create(job_id, &backup_job_driver, bs, 0, BLK_PERM_ALL,
+   speed, creation_flags, cb, opaque, errp);
 if (!job) {
 goto error;
 }
diff --git a/block/commit.c b/block/commit.c
index 2ad8138..60d29a9 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -235,8 +235,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 return;
 }
 
-s = block_job_create(job_id, &commit_job_driver, bs, speed,
- BLOCK_JOB_DEFAULT, NULL, NULL, errp);
+/* FIXME Use real permissions */
+s = block_job_create(job_id, &commit_job_driver, bs, 0, BLK_PERM_ALL,
+ speed, BLOCK_JOB_DEFAULT, NULL, NULL, errp);
 if (!s) {
 return;
 }
diff --git a/block/mirror.c b/block/mirror.c
index 7eeeb97..cd4e7db 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1015,8 +1015,9 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 buf_size = DEFAULT_MIRROR_BUF_SIZE;
 }
 
-s = block_job_create(job_id, driver, bs, speed, creation_flags,
- cb, opaque, errp);
+/* FIXME Use real permissions */
+s = block_job_create(job_id, driver, bs, 0, BLK_PERM_ALL, speed,
+ creation_flags, cb, opaque, errp);
 if (!s) {
 return;
 }
diff --git a/block/stream.c b/block/stream.c
index 1523ba7..7f49279 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -229,8 +229,9 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 BlockDriverState *iter;
 int orig_bs_flags;
 
-s = block_job_create(job_id, &stream_job_driver, bs, speed,
- BLOCK_JOB_DEFAULT, NULL, NULL, errp);
+/* FIXME Use real permissions */
+s = block_job_create(job_id, &stream_job_driver, bs, 0, BLK_PERM_ALL,
+ speed, BLOCK_JOB_DEFAULT, NULL, NULL, errp);
 if (!s) {
 return;
 }
diff --git a/blockjob.c b/blockjob.c
index 72b7d4c..27833c7 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -123,7 +123,8 @@ void block_job_add_bdrv(BlockJob *job, BlockDriverState *bs)
 }
 
 void *block_job_create(const char *job_id, const BlockJobDriver *driver,
-   BlockDriverState *bs, int64_t speed, int flags,
+   BlockDriverState *bs, uint64_t perm,
+   uint64_t shared_perm, int64_t speed, int flags,
BlockCompletionFunc *cb, void *opaque, Error **errp)
 {
 BlockBackend *blk;
@@ -160,8 +161,7 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 }
 }
 
-/* FIXME Use real permissions */
-blk = blk_new(0, BLK_PERM_ALL);
+blk = blk_new(perm, shared_perm);
 ret = blk_insert_bs(blk, bs, errp);
 if (ret < 0) {
 blk_unref(blk);
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 8223822..3f86cc5 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -119,6 +119,7 @@ struct BlockJobDriver {
  * generated automatically.
  * @job_type: The class object for the newly-created job.
  * @bs: The block
+ * @perm, @shared_perm: Permissions to request for @bs
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @cb: Completion function for the job.
  * @opaque: Opaque pointer value passed to @cb.
@@ -134,7 +135,8 @@ struct BlockJobDriver {
  * called from a wrapper that is specific to the job type.
  */
 void *block_job_create(const char *job_id, const BlockJobDriver *driver,
-   BlockDriverState *bs, int64_t speed, int flags,
+   BlockDriverState *bs, uint64_t perm,
+   uint64_t shared_perm, int64_t speed, int flags,
BlockCompletionFunc *cb, void *opaque, Error **errp);
 
 /**
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index f6dfd08..4ccbda1 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -101,9 +101,9 @@ static 

[Qemu-block] [PATCH v2 18/43] hw/block: Request permissions

2017-02-27 Thread Kevin Wolf
This makes all device emulations with a qdev drive property request
permissions on their BlockBackend. The only thing we block at this point
is resizing images for some devices that can't support it.
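
The permission computation that this patch adds to blkconf_apply_backend_options() can be sketched in isolation as follows. This is a standalone model, not QEMU code: the PERM_* constants, the DevicePerms struct and sketch_device_perms() are hypothetical stand-ins with illustrative bit values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BLK_PERM_* bits; values are illustrative. */
enum {
    PERM_CONSISTENT_READ = 0x01,
    PERM_WRITE           = 0x02,
    PERM_WRITE_UNCHANGED = 0x04,
    PERM_RESIZE          = 0x08,
    PERM_GRAPH_MOD       = 0x10,
    PERM_ALL             = 0x1f,
};

/* Hypothetical result type: what the device needs and what it tolerates. */
typedef struct {
    uint64_t perm;
    uint64_t shared_perm;
} DevicePerms;

/* Mirrors the logic in the hunk: always read, write unless read-only,
 * and share resize only if the device can cope with a size change. */
static DevicePerms sketch_device_perms(bool readonly, bool resizable)
{
    DevicePerms p;

    p.perm = PERM_CONSISTENT_READ;
    if (!readonly) {
        p.perm |= PERM_WRITE;
    }

    p.shared_perm = PERM_CONSISTENT_READ | PERM_WRITE_UNCHANGED |
                    PERM_GRAPH_MOD | PERM_WRITE;
    if (resizable) {
        p.shared_perm |= PERM_RESIZE;
    }

    return p;
}
```

A device that passes resizable=false therefore leaves the resize bit out of its shared set, which is the only restriction the commit message describes at this point.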

Signed-off-by: Kevin Wolf 
---
 hw/block/block.c | 22 +-
 hw/block/fdc.c   | 25 +++--
 hw/block/m25p80.c|  8 
 hw/block/nand.c  |  7 +++
 hw/block/nvme.c  |  8 +++-
 hw/block/onenand.c   |  7 +++
 hw/block/pflash_cfi01.c  | 18 --
 hw/block/pflash_cfi02.c  | 19 +--
 hw/block/virtio-blk.c|  8 +++-
 hw/core/qdev-properties-system.c |  1 -
 hw/ide/qdev.c|  8 ++--
 hw/nvram/spapr_nvram.c   |  8 
 hw/scsi/scsi-disk.c  |  9 +++--
 hw/sd/sd.c   |  6 ++
 hw/usb/dev-storage.c |  6 +-
 include/hw/block/block.h |  3 ++-
 tests/qemu-iotests/051.pc.out|  6 +++---
 17 files changed, 142 insertions(+), 27 deletions(-)

diff --git a/hw/block/block.c b/hw/block/block.c
index 8dc9d84..7059ba1 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -51,11 +51,31 @@ void blkconf_blocksizes(BlockConf *conf)
 }
 }
 
-void blkconf_apply_backend_options(BlockConf *conf)
+void blkconf_apply_backend_options(BlockConf *conf, bool readonly,
+   bool resizable, Error **errp)
 {
 BlockBackend *blk = conf->blk;
 BlockdevOnError rerror, werror;
+uint64_t perm, shared_perm;
 bool wce;
+int ret;
+
+perm = BLK_PERM_CONSISTENT_READ;
+if (!readonly) {
+perm |= BLK_PERM_WRITE;
+}
+
+/* TODO Remove BLK_PERM_WRITE unless explicitly configured so */
+shared_perm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
+  BLK_PERM_GRAPH_MOD | BLK_PERM_WRITE;
+if (resizable) {
+shared_perm |= BLK_PERM_RESIZE;
+}
+
+ret = blk_set_perm(blk, perm, shared_perm, errp);
+if (ret < 0) {
+return;
+}
 
 switch (conf->wce) {
 case ON_OFF_AUTO_ON:wce = true; break;
diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 5f6c496..a328693 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -186,6 +186,7 @@ typedef enum FDiskFlags {
 struct FDrive {
 FDCtrl *fdctrl;
 BlockBackend *blk;
+BlockConf *conf;
 /* Drive status */
 FloppyDriveType drive;/* CMOS drive type*/
 uint8_t perpendicular;/* 2.88 MB access mode*/
@@ -472,6 +473,19 @@ static void fd_revalidate(FDrive *drv)
 static void fd_change_cb(void *opaque, bool load, Error **errp)
 {
 FDrive *drive = opaque;
+Error *local_err = NULL;
+
+if (!load) {
+blk_set_perm(drive->blk, 0, BLK_PERM_ALL, &error_abort);
+} else {
+blkconf_apply_backend_options(drive->conf,
+  blk_is_read_only(drive->blk), false,
+  &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
 
 drive->media_changed = 1;
 drive->media_validated = false;
@@ -508,6 +522,7 @@ static int floppy_drive_init(DeviceState *qdev)
 FloppyDrive *dev = FLOPPY_DRIVE(qdev);
 FloppyBus *bus = FLOPPY_BUS(qdev->parent_bus);
 FDrive *drive;
+Error *local_err = NULL;
 int ret;
 
 if (dev->unit == -1) {
@@ -533,7 +548,6 @@ static int floppy_drive_init(DeviceState *qdev)
 
 if (!dev->conf.blk) {
 /* Anonymous BlockBackend for an empty drive */
-/* FIXME Use real permissions */
 dev->conf.blk = blk_new(0, BLK_PERM_ALL);
 ret = blk_attach_dev(dev->conf.blk, qdev);
 assert(ret == 0);
@@ -552,7 +566,13 @@ static int floppy_drive_init(DeviceState *qdev)
  * blkconf_apply_backend_options(). */
 dev->conf.rerror = BLOCKDEV_ON_ERROR_AUTO;
 dev->conf.werror = BLOCKDEV_ON_ERROR_AUTO;
-blkconf_apply_backend_options(&dev->conf);
+
+blkconf_apply_backend_options(&dev->conf, blk_is_read_only(dev->conf.blk),
+  false, &local_err);
+if (local_err) {
+error_report_err(local_err);
+return -1;
+}
 
 /* 'enospc' is the default for -drive, 'report' is what blk_new() gives us
  * for empty drives. */
@@ -566,6 +586,7 @@ static int floppy_drive_init(DeviceState *qdev)
 return -1;
 }
 
+drive->conf = &dev->conf;
 drive->blk = dev->conf.blk;
 drive->fdctrl = bus->fdc;
 
diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 2d6eb46..190573c 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -1215,6 +1215,7 @@ static void m25p80_realize(SSISlave *ss, Error **errp)
 {
 Flash *s = M25P80(ss);
 M25P80Class *mc = M25P80_GET_CLASS(s);
+int ret;
 
 s->pi = mc->pi;
 
@@ -1222,6 +1223,13 @@ static void m25p80_realize(SSISlave *ss, Error 

[Qemu-block] [PATCH v2 17/43] block: Allow error return in BlockDevOps.change_media_cb()

2017-02-27 Thread Kevin Wolf
Some devices allow a media change between read-only and read-write
media. They need to adapt the permissions in their .change_media_cb()
implementation, which can fail. So add an Error parameter to the
function.
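
The calling convention this introduces can be sketched as a standalone model. Everything here is hypothetical: a plain string pointer stands in for QEMU's Error object, and SketchDrive, sketch_drive_change_cb() and sketch_notify_media_change() are illustrative names, not functions from the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: reports failure by setting *errp. */
typedef void (*SketchChangeMediaFn)(void *opaque, bool load,
                                    const char **errp);

/* Hypothetical device state for the example callback. */
typedef struct {
    bool needs_write;       /* device wants to write to the medium */
    bool medium_read_only;  /* inserted medium cannot be written */
} SketchDrive;

/* Loading can fail if the permissions cannot be adapted; eject cannot. */
static void sketch_drive_change_cb(void *opaque, bool load,
                                   const char **errp)
{
    SketchDrive *d = opaque;

    if (load && d->needs_write && d->medium_read_only) {
        *errp = "medium is read-only";
    }
}

/* Collect the error locally, then hand it to the caller if it asked. */
static int sketch_notify_media_change(SketchChangeMediaFn cb, void *opaque,
                                      bool load, const char **errp)
{
    const char *local_err = NULL;

    if (!cb) {
        return 0;
    }

    cb(opaque, load, &local_err);
    if (local_err) {
        assert(load); /* an eject is never expected to fail */
        if (errp) {
            *errp = local_err;
        }
        return -1;
    }
    return 0;
}
```

The local error variable lets the notifier inspect the failure (and assert that it only happens on load) before passing it on, which is the same shape as the local_err handling in the hunks below.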

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/block-backend.c  | 20 +++-
 blockdev.c | 19 +++
 hw/block/fdc.c |  2 +-
 hw/ide/core.c  |  2 +-
 hw/scsi/scsi-disk.c|  2 +-
 hw/sd/sd.c |  2 +-
 include/block/block_int.h  |  2 +-
 include/sysemu/block-backend.h |  2 +-
 8 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 03d5495..fcc42b5 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -677,19 +677,29 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps 
*ops,
 
 /*
  * Notify @blk's attached device model of media change.
- * If @load is true, notify of media load.
- * Else, notify of media eject.
+ *
+ * If @load is true, notify of media load. This action can fail, meaning that
+ * the medium cannot be loaded. @errp is set then.
+ *
+ * If @load is false, notify of media eject. This can never fail.
+ *
  * Also send DEVICE_TRAY_MOVED events as appropriate.
  */
-void blk_dev_change_media_cb(BlockBackend *blk, bool load)
+void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp)
 {
 if (blk->dev_ops && blk->dev_ops->change_media_cb) {
 bool tray_was_open, tray_is_open;
+Error *local_err = NULL;
 
 assert(!blk->legacy_dev);
 
 tray_was_open = blk_dev_is_tray_open(blk);
-blk->dev_ops->change_media_cb(blk->dev_opaque, load);
+blk->dev_ops->change_media_cb(blk->dev_opaque, load, &local_err);
+if (local_err) {
+assert(load == true);
+error_propagate(errp, local_err);
+return;
+}
 tray_is_open = blk_dev_is_tray_open(blk);
 
 if (tray_was_open != tray_is_open) {
@@ -703,7 +713,7 @@ void blk_dev_change_media_cb(BlockBackend *blk, bool load)
 
 static void blk_root_change_media(BdrvChild *child, bool load)
 {
-blk_dev_change_media_cb(child->opaque, load);
+blk_dev_change_media_cb(child->opaque, load, NULL);
 }
 
 /*
diff --git a/blockdev.c b/blockdev.c
index 011871b..2374973 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2310,7 +2310,7 @@ static int do_open_tray(const char *blk_name, const char 
*qdev_id,
 }
 
 if (!locked || force) {
-blk_dev_change_media_cb(blk, false);
+blk_dev_change_media_cb(blk, false, &error_abort);
 }
 
 if (locked && !force) {
@@ -2348,6 +2348,7 @@ void qmp_blockdev_close_tray(bool has_device, const char 
*device,
  Error **errp)
 {
 BlockBackend *blk;
+Error *local_err = NULL;
 
 device = has_device ? device : NULL;
 id = has_id ? id : NULL;
@@ -2371,7 +2372,11 @@ void qmp_blockdev_close_tray(bool has_device, const char 
*device,
 return;
 }
 
-blk_dev_change_media_cb(blk, true);
+blk_dev_change_media_cb(blk, true, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
 }
 
 void qmp_x_blockdev_remove_medium(bool has_device, const char *device,
@@ -2424,7 +2429,7 @@ void qmp_x_blockdev_remove_medium(bool has_device, const 
char *device,
  * called at all); therefore, the medium needs to be ejected here.
  * Do it after blk_remove_bs() so blk_is_inserted(blk) returns the @load
  * value passed here (i.e. false). */
-blk_dev_change_media_cb(blk, false);
+blk_dev_change_media_cb(blk, false, &error_abort);
 }
 
 out:
@@ -2434,6 +2439,7 @@ out:
 static void qmp_blockdev_insert_anon_medium(BlockBackend *blk,
 BlockDriverState *bs, Error **errp)
 {
+Error *local_err = NULL;
 bool has_device;
 int ret;
 
@@ -2466,7 +2472,12 @@ static void qmp_blockdev_insert_anon_medium(BlockBackend 
*blk,
  * slot here.
  * Do it after blk_insert_bs() so blk_is_inserted(blk) returns the @load
  * value passed here (i.e. true). */
-blk_dev_change_media_cb(blk, true);
+blk_dev_change_media_cb(blk, true, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+blk_remove_bs(blk);
+return;
+}
 }
 }
 
diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 74f3634..5f6c496 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -469,7 +469,7 @@ static void fd_revalidate(FDrive *drv)
 }
 }
 
-static void fd_change_cb(void *opaque, bool load)
+static void fd_change_cb(void *opaque, bool load, Error **errp)
 {
 FDrive *drive = opaque;
 
diff --git a/hw/ide/core.c b/hw/ide/core.c
index cfa5de6..db509b3 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1120,7 +1120,7 @@ static void 

[Qemu-block] [PATCH v2 16/43] block: Request real permissions in blk_new_open()

2017-02-27 Thread Kevin Wolf
We can figure out the necessary permissions from the flags that the
caller passed.
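
As a rough standalone sketch of that mapping, under the assumption that only the read-write and resize flags matter: the O_* and PERM_* constants and sketch_perm_from_flags() below are hypothetical stand-ins with illustrative values, not the definitions from the QEMU headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for BDRV_O_RDWR / BDRV_O_RESIZE (values illustrative). */
enum {
    O_RDWR_FLAG   = 0x02,
    O_RESIZE_FLAG = 0x04,
};

/* Hypothetical stand-ins for the BLK_PERM_* bits (values illustrative). */
enum {
    PERM_CONSISTENT_READ = 0x01,
    PERM_WRITE           = 0x02,
    PERM_RESIZE          = 0x08,
};

/* Reading is always requested; write and resize only when flagged. */
static uint64_t sketch_perm_from_flags(int flags)
{
    uint64_t perm = PERM_CONSISTENT_READ;

    if (flags & O_RDWR_FLAG) {
        perm |= PERM_WRITE;
    }
    if (flags & O_RESIZE_FLAG) {
        perm |= PERM_RESIZE;
    }
    return perm;
}
```

The shared side is not modelled because the patch keeps it at BLK_PERM_ALL, leaving any blocking to the guest devices.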

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/block-backend.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 299948f..03d5495 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -166,17 +166,33 @@ BlockBackend *blk_new_open(const char *filename, const 
char *reference,
 {
 BlockBackend *blk;
 BlockDriverState *bs;
+uint64_t perm;
+
+/* blk_new_open() is mainly used in .bdrv_create implementations and the
+ * tools where sharing isn't a concern because the BDS stays private, so we
+ * just request permission according to the flags.
+ *
+ * The exceptions are xen_disk and blockdev_init(); in these cases, the
+ * caller of blk_new_open() doesn't make use of the permissions, but they
+ * shouldn't hurt either. We can still share everything here because the
+ * guest devices will add their own blockers if they can't share. */
+perm = BLK_PERM_CONSISTENT_READ;
+if (flags & BDRV_O_RDWR) {
+perm |= BLK_PERM_WRITE;
+}
+if (flags & BDRV_O_RESIZE) {
+perm |= BLK_PERM_RESIZE;
+}
 
-blk = blk_new(0, BLK_PERM_ALL);
+blk = blk_new(perm, BLK_PERM_ALL);
 bs = bdrv_open(filename, reference, options, flags, errp);
 if (!bs) {
 blk_unref(blk);
 return NULL;
 }
 
-/* FIXME Use real permissions */
 blk->root = bdrv_root_attach_child(bs, "root", &child_root,
-   0, BLK_PERM_ALL, blk, &error_abort);
+   perm, BLK_PERM_ALL, blk, &error_abort);
 
 return blk;
 }
-- 
1.8.3.1




[Qemu-block] [PATCH v2 12/43] block: Add permissions to BlockBackend

2017-02-27 Thread Kevin Wolf
The BlockBackend can now store the permissions that its user requires.
This is necessary because nodes can be ejected from or inserted into a
BlockBackend and all of these operations must make sure that the user
still gets what it requested initially.
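
A minimal standalone model of this idea is sketched below. It is not the QEMU implementation: SketchNode, SketchBackend and the conflict rule in sketch_node_check() are assumptions made for illustration (a request is refused if it needs something other users do not share, or if it refuses to share something other users already hold).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BLK_PERM_* bits (values illustrative). */
enum {
    PERM_CONSISTENT_READ = 0x01,
    PERM_WRITE           = 0x02,
    PERM_ALL             = 0x1f,
};

/* Hypothetical node: summarises what its other users hold and share. */
typedef struct {
    uint64_t others_perm;
    uint64_t others_shared;
} SketchNode;

/* Hypothetical backend: remembers the permissions of its own user. */
typedef struct {
    SketchNode *root;       /* NULL while no node is inserted */
    uint64_t perm;
    uint64_t shared_perm;
} SketchBackend;

/* Assumed conflict rule, see the lead-in. */
static int sketch_node_check(const SketchNode *n, uint64_t perm,
                             uint64_t shared)
{
    if (perm & ~n->others_shared) {
        return -1;
    }
    if (n->others_perm & ~shared) {
        return -1;
    }
    return 0;
}

/* Store the new masks only if the attached node (if any) accepts them. */
static int sketch_set_perm(SketchBackend *b, uint64_t perm, uint64_t shared)
{
    if (b->root) {
        int ret = sketch_node_check(b->root, perm, shared);
        if (ret < 0) {
            return ret;
        }
    }

    b->perm = perm;
    b->shared_perm = shared;
    return 0;
}
```

Because the masks live in the backend rather than in the node, they survive an eject and can be checked again against whatever node is inserted next.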

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/block-backend.c  | 27 +++
 include/sysemu/block-backend.h |  2 ++
 2 files changed, 29 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 9bb4528..1ed75c6 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -59,6 +59,9 @@ struct BlockBackend {
 bool iostatus_enabled;
 BlockDeviceIoStatus iostatus;
 
+uint64_t perm;
+uint64_t shared_perm;
+
 bool allow_write_beyond_eof;
 
 NotifierList remove_bs_notifiers, insert_bs_notifiers;
@@ -126,6 +129,8 @@ BlockBackend *blk_new(void)
 
 blk = g_new0(BlockBackend, 1);
 blk->refcnt = 1;
+blk->perm = 0;
+blk->shared_perm = BLK_PERM_ALL;
 blk_set_enable_write_cache(blk, true);
 
 qemu_co_queue_init(&blk->public.throttled_reqs[0]);
@@ -511,6 +516,27 @@ void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
 }
 }
 
+/*
+ * Sets the permission bitmasks that the user of the BlockBackend needs.
+ */
+int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
+ Error **errp)
+{
+int ret;
+
+if (blk->root) {
+ret = bdrv_child_try_set_perm(blk->root, perm, shared_perm, errp);
+if (ret < 0) {
+return ret;
+}
+}
+
+blk->perm = perm;
+blk->shared_perm = shared_perm;
+
+return 0;
+}
+
 static int blk_do_attach_dev(BlockBackend *blk, void *dev)
 {
 if (blk->dev) {
@@ -557,6 +583,7 @@ void blk_detach_dev(BlockBackend *blk, void *dev)
 blk->dev_ops = NULL;
 blk->dev_opaque = NULL;
 blk->guest_block_size = 512;
+blk_set_perm(blk, 0, BLK_PERM_ALL, &error_abort);
 blk_unref(blk);
 }
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index f365a51..4a18e86 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -105,6 +105,8 @@ void blk_remove_bs(BlockBackend *blk);
 void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs);
 bool bdrv_has_blk(BlockDriverState *bs);
 bool bdrv_is_root_node(BlockDriverState *bs);
+int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
+ Error **errp);
 
 void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow);
 void blk_iostatus_enable(BlockBackend *blk);
-- 
1.8.3.1




[Qemu-block] [PATCH v2 07/43] block: Default .bdrv_child_perm() for format drivers

2017-02-27 Thread Kevin Wolf
Almost all format drivers have the same characteristics as far as
permissions are concerned: They have one or more children for storing
their own data and, more importantly, metadata (can be written to and
grow even without external write requests, must be protected against
other writers and present consistent data) and optionally a backing file
(this is just data, so like for a filter, it only depends on what the
parent nodes need).

This provides a default implementation that can be shared by most of
our format drivers.
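
The rules described above can be modelled in a few lines of standalone C. This is a simplified sketch, not the function added below: the PERM_* constants and sketch_format_perms() are hypothetical, the bit values are illustrative, and the initial pass through the filter defaults that the real function performs for bs->file is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BLK_PERM_* bits (values illustrative). */
enum {
    PERM_CONSISTENT_READ = 0x01,
    PERM_WRITE           = 0x02,
    PERM_WRITE_UNCHANGED = 0x04,
    PERM_RESIZE          = 0x08,
    PERM_GRAPH_MOD       = 0x10,
    PERM_ALL             = 0x1f,
};

/* perm/shared: what the parents of the format node need and tolerate.
 * nperm/nshared: what the format node requests on the given child. */
static void sketch_format_perms(bool is_backing, bool read_only,
                                uint64_t perm, uint64_t shared,
                                uint64_t *nperm, uint64_t *nshared)
{
    if (!is_backing) {
        /* Metadata may be written even if the guest never writes. */
        if (!read_only) {
            perm |= PERM_WRITE | PERM_RESIZE;
        }
        /* Metadata must be consistent and safe from other writers. */
        perm |= PERM_CONSISTENT_READ;
        shared &= ~(uint64_t)(PERM_WRITE | PERM_RESIZE);
    } else {
        /* A backing file is only ever read. */
        perm &= PERM_CONSISTENT_READ;
        if (shared & PERM_WRITE) {
            shared = PERM_WRITE | PERM_RESIZE;
        } else {
            shared = 0;
        }
        shared |= PERM_CONSISTENT_READ | PERM_GRAPH_MOD |
                  PERM_WRITE_UNCHANGED;
    }

    *nperm = perm;
    *nshared = shared;
}
```

For a read-write image whose parent only reads and shares everything, the file child ends up with read, write and resize requested and with write and resize removed from the shared set.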

Signed-off-by: Kevin Wolf 
---
 block.c   | 43 +++
 include/block/block_int.h |  8 
 2 files changed, 51 insertions(+)

diff --git a/block.c b/block.c
index 597da9a..20dd1c5 100644
--- a/block.c
+++ b/block.c
@@ -1554,6 +1554,49 @@ void bdrv_filter_default_perms(BlockDriverState *bs, 
BdrvChild *c,
(c->shared_perm & DEFAULT_PERM_UNCHANGED);
 }
 
+void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
+   const BdrvChildRole *role,
+   uint64_t perm, uint64_t shared,
+   uint64_t *nperm, uint64_t *nshared)
+{
+bool backing = (role == &child_backing);
+assert(role == &child_backing || role == &child_file);
+
+if (!backing) {
+/* Apart from the modifications below, the same permissions are
+ * forwarded and left alone as for filters */
+bdrv_filter_default_perms(bs, c, role, perm, shared, &perm, &shared);
+
+/* Format drivers may touch metadata even if the guest doesn't write */
+if (!bdrv_is_read_only(bs)) {
+perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
+}
+
+/* bs->file always needs to be consistent because of the metadata. We
+ * can never allow other users to resize or write to it. */
+perm |= BLK_PERM_CONSISTENT_READ;
+shared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
+} else {
+/* We want consistent read from backing files if the parent needs it.
+ * No other operations are performed on backing files. */
+perm &= BLK_PERM_CONSISTENT_READ;
+
+/* If the parent can deal with changing data, we're okay with a
+ * writable and resizable backing file. */
+/* TODO Require !(perm & BLK_PERM_CONSISTENT_READ), too? */
+if (shared & BLK_PERM_WRITE) {
+shared = BLK_PERM_WRITE | BLK_PERM_RESIZE;
+} else {
+shared = 0;
+}
+
+shared |= BLK_PERM_CONSISTENT_READ | BLK_PERM_GRAPH_MOD |
+  BLK_PERM_WRITE_UNCHANGED;
+}
+
+*nperm = perm;
+*nshared = shared;
+}
 
 static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
 {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 17f4c2d..eb0598e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -880,6 +880,14 @@ void bdrv_filter_default_perms(BlockDriverState *bs, 
BdrvChild *c,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared);
 
+/* Default implementation for BlockDriver.bdrv_child_perm() that can be used by
+ * (non-raw) image formats: Like above for bs->backing, but for bs->file it
+ * requires WRITE | RESIZE for read-write images, always requires
+ * CONSISTENT_READ and doesn't share WRITE. */
+void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
+   const BdrvChildRole *role,
+   uint64_t perm, uint64_t shared,
+   uint64_t *nperm, uint64_t *nshared);
 
 const char *bdrv_get_parent_name(const BlockDriverState *bs);
 void blk_dev_change_media_cb(BlockBackend *blk, bool load);
-- 
1.8.3.1




[Qemu-block] [PATCH v2 15/43] block: Add BDRV_O_RESIZE for blk_new_open()

2017-02-27 Thread Kevin Wolf
blk_new_open() is a convenience function that processes flags rather
than QDict options as a simple way to just open an image file.

In order to keep it convenient in the future, it must automatically
request the necessary permissions. This can easily be inferred from the
flags for read and write, but we need another flag that tells us whether
to get the resize permission.

We can't just always request it because that means that no block jobs
can run on the resulting BlockBackend (which is something that e.g.
qemu-img commit wants to do), but we also can't leave it out entirely because
most of the .bdrv_create() implementations call blk_truncate().

The solution is to introduce another flag that is passed by all users
that want to resize the image.

Signed-off-by: Kevin Wolf 
---
 block/parallels.c | 3 ++-
 block/qcow.c  | 3 ++-
 block/qcow2.c | 6 --
 block/qed.c   | 3 ++-
 block/sheepdog.c  | 2 +-
 block/vdi.c   | 3 ++-
 block/vhdx.c  | 3 ++-
 block/vmdk.c  | 6 --
 block/vpc.c   | 3 ++-
 include/block/block.h | 1 +
 qemu-img.c| 2 +-
 11 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 6b0c0a9..19935e2 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -488,7 +488,8 @@ static int parallels_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 
 file = blk_new_open(filename, NULL, NULL,
-BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
+BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL,
+&local_err);
 if (file == NULL) {
 error_propagate(errp, local_err);
 return -EIO;
diff --git a/block/qcow.c b/block/qcow.c
index eb5d54c..9d6ac83 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -823,7 +823,8 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 
 qcow_blk = blk_new_open(filename, NULL, NULL,
-BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
+BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL,
+&local_err);
 if (qcow_blk == NULL) {
 error_propagate(errp, local_err);
 ret = -EIO;
diff --git a/block/qcow2.c b/block/qcow2.c
index 6f79df8..6a92d2e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2202,7 +2202,8 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 }
 
 blk = blk_new_open(filename, NULL, NULL,
-   BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
+   BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL,
+   &local_err);
 if (blk == NULL) {
 error_propagate(errp, local_err);
 return -EIO;
@@ -2266,7 +2267,8 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 options = qdict_new();
 qdict_put(options, "driver", qstring_from_str("qcow2"));
 blk = blk_new_open(filename, NULL, options,
-   BDRV_O_RDWR | BDRV_O_NO_FLUSH, &local_err);
+   BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_NO_FLUSH,
+   &local_err);
 if (blk == NULL) {
 error_propagate(errp, local_err);
 ret = -EIO;
diff --git a/block/qed.c b/block/qed.c
index d8f947a..5ec7fd8 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -625,7 +625,8 @@ static int qed_create(const char *filename, uint32_t 
cluster_size,
 }
 
 blk = blk_new_open(filename, NULL, NULL,
-   BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
+   BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL,
+   &local_err);
 if (blk == NULL) {
 error_propagate(errp, local_err);
 return -EIO;
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 860ba61..7434710 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1609,7 +1609,7 @@ static int sd_prealloc(const char *filename, Error **errp)
 int ret;
 
 blk = blk_new_open(filename, NULL, NULL,
-   BDRV_O_RDWR | BDRV_O_PROTOCOL, errp);
+   BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL, errp);
 if (blk == NULL) {
 ret = -EIO;
 goto out_with_err_set;
diff --git a/block/vdi.c b/block/vdi.c
index fd6e26d..9b4f70e 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -763,7 +763,8 @@ static int vdi_create(const char *filename, QemuOpts *opts, 
Error **errp)
 }
 
 blk = blk_new_open(filename, NULL, NULL,
-   BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
+   BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL,
+   &local_err);
 if (blk == NULL) {
 error_propagate(errp, local_err);
 ret = -EIO;
diff --git a/block/vhdx.c b/block/vhdx.c
index ab747f6..052a753 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1859,7 +1859,8 @@ static int vhdx_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 
 blk = 

[Qemu-block] [PATCH v2 14/43] block: Add error parameter to blk_insert_bs()

2017-02-27 Thread Kevin Wolf
Now that blk_insert_bs() requests the BlockBackend permissions for the
node it attaches to, it can fail. Instead of aborting, pass the errors
to the callers.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c  |  5 -
 block/backup.c   |  5 -
 block/block-backend.c| 13 -
 block/commit.c   | 38 ++
 block/mirror.c   | 15 ---
 block/qcow2.c| 10 --
 blockdev.c   | 11 +--
 blockjob.c   |  7 ++-
 hmp.c|  6 +-
 hw/core/qdev-properties-system.c |  7 ++-
 include/sysemu/block-backend.h   |  2 +-
 migration/block.c|  2 +-
 nbd/server.c |  6 +-
 tests/test-blockjob.c|  2 +-
 14 files changed, 100 insertions(+), 29 deletions(-)

diff --git a/block.c b/block.c
index 50c94ce..189351e 100644
--- a/block.c
+++ b/block.c
@@ -2185,8 +2185,11 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 }
 if (file_bs != NULL) {
 file = blk_new(BLK_PERM_CONSISTENT_READ, BLK_PERM_ALL);
-blk_insert_bs(file, file_bs);
+blk_insert_bs(file, file_bs, &local_err);
 bdrv_unref(file_bs);
+if (local_err) {
+goto fail;
+}
 
 qdict_put(options, "file",
   qstring_from_str(bdrv_get_node_name(file_bs)));
diff --git a/block/backup.c b/block/backup.c
index 4b3c94c..f38d1d0 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -626,7 +626,10 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 
 /* FIXME Use real permissions */
 job->target = blk_new(0, BLK_PERM_ALL);
-blk_insert_bs(job->target, target);
+ret = blk_insert_bs(job->target, target, errp);
+if (ret < 0) {
+goto error;
+}
 
 job->on_source_error = on_source_error;
 job->on_target_error = on_target_error;
diff --git a/block/block-backend.c b/block/block-backend.c
index 0319220..299948f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -508,19 +508,22 @@ void blk_remove_bs(BlockBackend *blk)
 /*
  * Associates a new BlockDriverState with @blk.
  */
-void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
+int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
 {
-bdrv_ref(bs);
-/* FIXME Error handling */
 blk->root = bdrv_root_attach_child(bs, "root", &child_root,
-   blk->perm, blk->shared_perm, blk,
-   &error_abort);
+   blk->perm, blk->shared_perm, blk, errp);
+if (blk->root == NULL) {
+return -EPERM;
+}
+bdrv_ref(bs);
 
 notifier_list_notify(&blk->insert_bs_notifiers, blk);
 if (blk->public.throttle_state) {
 throttle_timers_attach_aio_context(
 &blk->public.throttle_timers, bdrv_get_aio_context(bs));
 }
+
+return 0;
 }
 
 /*
diff --git a/block/commit.c b/block/commit.c
index 1897e98..2ad8138 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -220,6 +220,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 BlockDriverState *iter;
 BlockDriverState *overlay_bs;
 Error *local_err = NULL;
+int ret;
 
 assert(top != bs);
 if (top == base) {
@@ -256,8 +257,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 bdrv_reopen_multiple(bdrv_get_aio_context(bs), reopen_queue, 
&local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
-block_job_unref(&s->common);
-return;
+goto fail;
 }
 }
 
@@ -277,11 +277,17 @@ void commit_start(const char *job_id, BlockDriverState 
*bs,
 
 /* FIXME Use real permissions */
 s->base = blk_new(0, BLK_PERM_ALL);
-blk_insert_bs(s->base, base);
+ret = blk_insert_bs(s->base, base, errp);
+if (ret < 0) {
+goto fail;
+}
 
 /* FIXME Use real permissions */
 s->top = blk_new(0, BLK_PERM_ALL);
-blk_insert_bs(s->top, top);
+ret = blk_insert_bs(s->top, top, errp);
+if (ret < 0) {
+goto fail;
+}
 
 s->active = bs;
 
@@ -294,6 +300,16 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 
 trace_commit_start(bs, base, top, s);
 block_job_start(&s->common);
+return;
+
+fail:
+if (s->base) {
+blk_unref(s->base);
+}
+if (s->top) {
+blk_unref(s->top);
+}
+block_job_unref(&s->common);
 }
 
 
@@ -332,11 +348,17 @@ int bdrv_commit(BlockDriverState *bs)
 
 /* FIXME Use real permissions */
 src = blk_new(0, BLK_PERM_ALL);
-blk_insert_bs(src, bs);
-
-/* FIXME Use real permissions */
 backing = blk_new(0, BLK_PERM_ALL);
-blk_insert_bs(backing, bs->backing->bs);

[Qemu-block] [PATCH v2 13/43] block: Add permissions to blk_new()

2017-02-27 Thread Kevin Wolf
We want every user to be specific about the permissions it needs, so
we'll pass the initial permissions as parameters to blk_new(). A user
only needs to call blk_set_perm() if it wants to change the permissions
after the fact.

The permissions are stored in the BlockBackend and applied whenever a
BlockDriverState should be attached in blk_insert_bs().

This does not include actually choosing the right set of permissions
everywhere yet. Instead, the usual FIXME comment is added to each place
and will be addressed in individual patches.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c  |  2 +-
 block/backup.c   |  3 ++-
 block/block-backend.c| 21 ++---
 block/commit.c   | 12 
 block/mirror.c   |  3 ++-
 block/qcow2.c|  2 +-
 blockdev.c   |  4 ++--
 blockjob.c   |  3 ++-
 hmp.c|  3 ++-
 hw/block/fdc.c   |  3 ++-
 hw/core/qdev-properties-system.c |  3 ++-
 hw/ide/qdev.c|  3 ++-
 hw/scsi/scsi-disk.c  |  3 ++-
 include/sysemu/block-backend.h   |  2 +-
 migration/block.c|  3 ++-
 nbd/server.c |  3 ++-
 tests/test-blockjob.c|  3 ++-
 tests/test-throttle.c|  7 ---
 18 files changed, 53 insertions(+), 30 deletions(-)

diff --git a/block.c b/block.c
index 08476bb..50c94ce 100644
--- a/block.c
+++ b/block.c
@@ -2184,7 +2184,7 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 goto fail;
 }
 if (file_bs != NULL) {
-file = blk_new();
+file = blk_new(BLK_PERM_CONSISTENT_READ, BLK_PERM_ALL);
 blk_insert_bs(file, file_bs);
 bdrv_unref(file_bs);
 
diff --git a/block/backup.c b/block/backup.c
index fe010e7..4b3c94c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -624,7 +624,8 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 goto error;
 }
 
-job->target = blk_new();
+/* FIXME Use real permissions */
+job->target = blk_new(0, BLK_PERM_ALL);
 blk_insert_bs(job->target, target);
 
 job->on_source_error = on_source_error;
diff --git a/block/block-backend.c b/block/block-backend.c
index 1ed75c6..0319220 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -120,17 +120,23 @@ static const BdrvChildRole child_root = {
 
 /*
  * Create a new BlockBackend with a reference count of one.
- * Store an error through @errp on failure, unless it's null.
+ *
+ * @perm is a bitmask of BLK_PERM_* constants which describes the permissions
+ * to request for a block driver node that is attached to this BlockBackend.
+ * @shared_perm is a bitmask which describes which permissions may be granted
+ * to other users of the attached node.
+ * Both sets of permissions can be changed later using blk_set_perm().
+ *
  * Return the new BlockBackend on success, null on failure.
  */
-BlockBackend *blk_new(void)
+BlockBackend *blk_new(uint64_t perm, uint64_t shared_perm)
 {
 BlockBackend *blk;
 
 blk = g_new0(BlockBackend, 1);
 blk->refcnt = 1;
-blk->perm = 0;
-blk->shared_perm = BLK_PERM_ALL;
+blk->perm = perm;
+blk->shared_perm = shared_perm;
 blk_set_enable_write_cache(blk, true);
 
 qemu_co_queue_init(&blk->public.throttled_reqs[0]);
@@ -161,7 +167,7 @@ BlockBackend *blk_new_open(const char *filename, const char 
*reference,
 BlockBackend *blk;
 BlockDriverState *bs;
 
-blk = blk_new();
+blk = blk_new(0, BLK_PERM_ALL);
 bs = bdrv_open(filename, reference, options, flags, errp);
 if (!bs) {
 blk_unref(blk);
@@ -505,9 +511,10 @@ void blk_remove_bs(BlockBackend *blk)
 void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
 {
 bdrv_ref(bs);
-/* FIXME Use real permissions */
+/* FIXME Error handling */
 blk->root = bdrv_root_attach_child(bs, "root", &child_root,
-   0, BLK_PERM_ALL, blk, &error_abort);
+   blk->perm, blk->shared_perm, blk,
+   &error_abort);
 
 notifier_list_notify(&blk->insert_bs_notifiers, blk);
 if (blk->public.throttle_state) {
diff --git a/block/commit.c b/block/commit.c
index c284e85..1897e98 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -275,10 +275,12 @@ void commit_start(const char *job_id, BlockDriverState 
*bs,
 block_job_add_bdrv(&s->common, overlay_bs);
 }
 
-s->base = blk_new();
+/* FIXME Use real permissions */
+s->base = blk_new(0, BLK_PERM_ALL);
 blk_insert_bs(s->base, base);
 
-s->top = blk_new();
+/* FIXME Use real permissions */
+s->top = blk_new(0, BLK_PERM_ALL);
 blk_insert_bs(s->top, top);
 
 s->active = bs;
@@ -328,10 +330,12 @@ int bdrv_commit(BlockDriverState *bs)
 

[Qemu-block] [PATCH v2 04/43] block: Involve block drivers in permission granting

2017-02-27 Thread Kevin Wolf
In many cases, the required permissions of one node on its children
depend on what its parents require from it. For example, the raw format
or most filter drivers only need to request consistent reads if that's
something that one of their parents wants.

In order to achieve this, this patch introduces two new BlockDriver
callbacks. The first one lets drivers first check (recursively) whether
the requested permissions can be set; the second one actually sets the
new permission bitmask.
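
The check/commit/abort split described above can be sketched in isolation.
This is only an illustration under assumptions: ToyNode and the toy_*()
functions are hypothetical stand-ins and not part of this patch; only the
rule that a check must be followed by either a set or an abort is taken from
the patch itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum { TOY_PERM_WRITE = 0x02 };  /* same value as BLK_PERM_WRITE */

/* Hypothetical node with one committed and one prepared permission mask */
typedef struct ToyNode {
    bool read_only;
    uint64_t perm;       /* committed permissions */
    uint64_t pending;    /* prepared by a check, not yet committed */
    bool has_pending;
} ToyNode;

/* Phase 1: prepare the update and report whether it is possible */
static int toy_check_perm(ToyNode *n, uint64_t perm)
{
    n->pending = perm;
    n->has_pending = true;  /* preparation that must be committed or undone */
    if ((perm & TOY_PERM_WRITE) && n->read_only) {
        return -EPERM;
    }
    return 0;
}

/* Phase 2a: commit a successfully checked update; cannot fail */
static void toy_set_perm(ToyNode *n)
{
    n->perm = n->pending;
    n->has_pending = false;
}

/* Phase 2b: undo the preparations made by the check */
static void toy_abort_perm_update(ToyNode *n)
{
    n->has_pending = false;
}

/* Every check is followed by exactly one of set or abort */
static int toy_try_set_perm(ToyNode *n, uint64_t perm)
{
    int ret = toy_check_perm(n, perm);
    if (ret < 0) {
        toy_abort_perm_update(n);
        return ret;
    }
    toy_set_perm(n);
    return 0;
}
```

Keeping the commit step infallible is what lets a recursive update over many
nodes either apply everywhere or be rolled back everywhere.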

Also add helper functions that drivers can use in their implementation
of the callbacks to update their permissions on a specific child.

Signed-off-by: Kevin Wolf 
---
 block.c   | 177 ++
 include/block/block_int.h |  61 
 2 files changed, 238 insertions(+)

diff --git a/block.c b/block.c
index 9628c7a..803a688 100644
--- a/block.c
+++ b/block.c
@@ -1326,11 +1326,145 @@ static int bdrv_fill_options(QDict **options, const 
char *filename,
 return 0;
 }
 
+/*
+ * Check whether permissions on this node can be changed in a way that
+ * @cumulative_perms and @cumulative_shared_perms are the new cumulative
+ * permissions of all its parents. This involves checking whether all necessary
+ * permission changes to child nodes can be performed.
+ *
+ * A call to this function must always be followed by a call to bdrv_set_perm()
+ * or bdrv_abort_perm_update().
+ */
+static int bdrv_check_perm(BlockDriverState *bs, uint64_t cumulative_perms,
+   uint64_t cumulative_shared_perms, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+BdrvChild *c;
+int ret;
+
+if (!drv) {
+error_setg(errp, "Block node is not opened");
+return -EINVAL;
+}
+
+/* Write permissions never work with read-only images */
+if ((cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) &&
+bdrv_is_read_only(bs))
+{
+error_setg(errp, "Block node is read-only");
+return -EPERM;
+}
+
+/* Check this node */
+if (drv->bdrv_check_perm) {
+return drv->bdrv_check_perm(bs, cumulative_perms,
+cumulative_shared_perms, errp);
+}
+
+/* Drivers may not have .bdrv_child_perm() */
+if (!drv->bdrv_child_perm) {
+return 0;
+}
+
+/* Check all children */
+QLIST_FOREACH(c, &bs->children, next) {
+uint64_t cur_perm, cur_shared;
+drv->bdrv_child_perm(bs, c, c->role,
+ cumulative_perms, cumulative_shared_perms,
+ &cur_perm, &cur_shared);
+ret = bdrv_child_check_perm(c, cur_perm, cur_shared, errp);
+if (ret < 0) {
+return ret;
+}
+}
+
+return 0;
+}
+
+/*
+ * Notifies drivers that after a previous bdrv_check_perm() call, the
+ * permission update is not performed and any preparations made for it (e.g.
+ * taken file locks) need to be undone.
+ *
+ * This function recursively notifies all child nodes.
+ */
+static void bdrv_abort_perm_update(BlockDriverState *bs)
+{
+BlockDriver *drv = bs->drv;
+BdrvChild *c;
+
+if (!drv) {
+return;
+}
+
+if (drv->bdrv_abort_perm_update) {
+drv->bdrv_abort_perm_update(bs);
+}
+
+QLIST_FOREACH(c, &bs->children, next) {
+bdrv_child_abort_perm_update(c);
+}
+}
+
+static void bdrv_set_perm(BlockDriverState *bs, uint64_t cumulative_perms,
+  uint64_t cumulative_shared_perms)
+{
+BlockDriver *drv = bs->drv;
+BdrvChild *c;
+
+if (!drv) {
+return;
+}
+
+/* Update this node */
+if (drv->bdrv_set_perm) {
+drv->bdrv_set_perm(bs, cumulative_perms, cumulative_shared_perms);
+}
+
+/* Drivers may not have .bdrv_child_perm() */
+if (!drv->bdrv_child_perm) {
+return;
+}
+
+/* Update all children */
+QLIST_FOREACH(c, &bs->children, next) {
+uint64_t cur_perm, cur_shared;
+drv->bdrv_child_perm(bs, c, c->role,
+ cumulative_perms, cumulative_shared_perms,
+ &cur_perm, &cur_shared);
+bdrv_child_set_perm(c, cur_perm, cur_shared);
+}
+}
+
+static void bdrv_update_perm(BlockDriverState *bs)
+{
+BdrvChild *c;
+uint64_t cumulative_perms = 0;
+uint64_t cumulative_shared_perms = BLK_PERM_ALL;
+
+QLIST_FOREACH(c, &bs->parents, next_parent) {
+cumulative_perms |= c->perm;
+cumulative_shared_perms &= c->shared_perm;
+}
+
+bdrv_set_perm(bs, cumulative_perms, cumulative_shared_perms);
+}
+
+/*
+ * Checks whether a new reference to @bs can be added if the new user requires
+ * @new_used_perm/@new_shared_perm as its permissions. If @ignore_child is set,
+ * this old reference is ignored in the calculations; this allows checking
+ * permission updates for an existing reference.
+ *
+ * Needs to be followed by a call to either bdrv_set_perm() or
+ * 

[Qemu-block] [PATCH v2 11/43] block: Request real permissions in bdrv_attach_child()

2017-02-27 Thread Kevin Wolf
Now that all block drivers with children tell us what permissions they
need from each of their children, bdrv_attach_child() can use this
information and make the right requirements while trying to attach new
children.
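
As a rough illustration of the cumulative permissions that are fed into
.bdrv_child_perm() here: the permissions required by all parents are OR-ed
together and the permissions they share are AND-ed. ParentLink and
cumulative_perm() are hypothetical names used only for this sketch; the
real code walks bs->parents.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of BLK_PERM_ALL from patch 01 of this series */
#define TOY_PERM_ALL 0x1fULL

/* Hypothetical stand-in for one parent's BdrvChild link */
typedef struct ParentLink {
    uint64_t perm;         /* what this parent needs from the node */
    uint64_t shared_perm;  /* what it allows other parents to do */
} ParentLink;

/* Combine the requirements of all parents of a node */
static void cumulative_perm(const ParentLink *parents, size_t n,
                            uint64_t *perm, uint64_t *shared_perm)
{
    uint64_t cumulative_perms = 0;
    uint64_t cumulative_shared_perms = TOY_PERM_ALL;
    size_t i;

    for (i = 0; i < n; i++) {
        cumulative_perms |= parents[i].perm;                /* union */
        cumulative_shared_perms &= parents[i].shared_perm;  /* intersection */
    }

    *perm = cumulative_perms;
    *shared_perm = cumulative_shared_perms;
}
```

A node without parents ends up requiring nothing and sharing everything,
which matches the initial values used in the loop.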

Signed-off-by: Kevin Wolf 
---
 block.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index d2c8126..08476bb 100644
--- a/block.c
+++ b/block.c
@@ -1438,7 +1438,8 @@ static void bdrv_set_perm(BlockDriverState *bs, uint64_t 
cumulative_perms,
 }
 }
 
-static void bdrv_update_perm(BlockDriverState *bs)
+static void bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
+ uint64_t *shared_perm)
 {
 BdrvChild *c;
 uint64_t cumulative_perms = 0;
@@ -1449,6 +1450,15 @@ static void bdrv_update_perm(BlockDriverState *bs)
 cumulative_shared_perms &= c->shared_perm;
 }
 
+*perm = cumulative_perms;
+*shared_perm = cumulative_shared_perms;
+}
+
+static void bdrv_update_perm(BlockDriverState *bs)
+{
+uint64_t cumulative_perms, cumulative_shared_perms;
+
+bdrv_get_cumulative_perm(bs, &cumulative_perms, &cumulative_shared_perms);
 bdrv_set_perm(bs, cumulative_perms, cumulative_shared_perms);
 }
 
@@ -1661,10 +1671,16 @@ BdrvChild *bdrv_attach_child(BlockDriverState 
*parent_bs,
  Error **errp)
 {
 BdrvChild *child;
+uint64_t perm, shared_perm;
+
+bdrv_get_cumulative_perm(parent_bs, &perm, &shared_perm);
+
+assert(parent_bs->drv);
+parent_bs->drv->bdrv_child_perm(parent_bs, NULL, child_role,
+perm, shared_perm, &perm, &shared_perm);
 
-/* FIXME Use real permissions */
 child = bdrv_root_attach_child(child_bs, child_name, child_role,
-   0, BLK_PERM_ALL, parent_bs, errp);
+   perm, shared_perm, parent_bs, errp);
 if (child == NULL) {
 return NULL;
 }
-- 
1.8.3.1




[Qemu-block] [PATCH v2 10/43] block: Require .bdrv_child_perm() with child nodes

2017-02-27 Thread Kevin Wolf
All block drivers that can have child nodes implement .bdrv_child_perm()
now. Make this officially a requirement by asserting that only drivers
without children can omit .bdrv_child_perm().

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 7ae8264..d2c8126 100644
--- a/block.c
+++ b/block.c
@@ -1361,8 +1361,9 @@ static int bdrv_check_perm(BlockDriverState *bs, uint64_t 
cumulative_perms,
 cumulative_shared_perms, errp);
 }
 
-/* Drivers may not have .bdrv_child_perm() */
+/* Drivers that never have children can omit .bdrv_child_perm() */
 if (!drv->bdrv_child_perm) {
+assert(QLIST_EMPTY(&bs->children));
 return 0;
 }
 
@@ -1421,8 +1422,9 @@ static void bdrv_set_perm(BlockDriverState *bs, uint64_t 
cumulative_perms,
 drv->bdrv_set_perm(bs, cumulative_perms, cumulative_shared_perms);
 }
 
-/* Drivers may not have .bdrv_child_perm() */
+/* Drivers that never have children can omit .bdrv_child_perm() */
 if (!drv->bdrv_child_perm) {
+assert(QLIST_EMPTY(&bs->children));
 return;
 }
 
-- 
1.8.3.1




[Qemu-block] [PATCH v2 06/43] block: Request child permissions in filter drivers

2017-02-27 Thread Kevin Wolf
All callers will have to request permissions for all of their child
nodes. Block drivers that act simply as filters can use the default
implementation of .bdrv_child_perm().

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/blkdebug.c| 2 ++
 block/blkreplay.c   | 1 +
 block/blkverify.c   | 1 +
 block/quorum.c  | 2 ++
 block/raw-format.c  | 1 +
 block/replication.c | 1 +
 6 files changed, 8 insertions(+)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 6117ce5..67e8024 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -734,6 +734,8 @@ static BlockDriver bdrv_blkdebug = {
 .bdrv_file_open = blkdebug_open,
 .bdrv_close = blkdebug_close,
 .bdrv_reopen_prepare= blkdebug_reopen_prepare,
+.bdrv_child_perm= bdrv_filter_default_perms,
+
 .bdrv_getlength = blkdebug_getlength,
 .bdrv_truncate  = blkdebug_truncate,
 .bdrv_refresh_filename  = blkdebug_refresh_filename,
diff --git a/block/blkreplay.c b/block/blkreplay.c
index cfc8c5b..e110211 100755
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -137,6 +137,7 @@ static BlockDriver bdrv_blkreplay = {
 
 .bdrv_file_open = blkreplay_open,
 .bdrv_close = blkreplay_close,
+.bdrv_child_perm= bdrv_filter_default_perms,
 .bdrv_getlength = blkreplay_getlength,
 
 .bdrv_co_preadv = blkreplay_co_preadv,
diff --git a/block/blkverify.c b/block/blkverify.c
index 43a940c..9a1e21c 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -320,6 +320,7 @@ static BlockDriver bdrv_blkverify = {
 .bdrv_parse_filename  = blkverify_parse_filename,
 .bdrv_file_open   = blkverify_open,
 .bdrv_close   = blkverify_close,
+.bdrv_child_perm  = bdrv_filter_default_perms,
 .bdrv_getlength   = blkverify_getlength,
 .bdrv_refresh_filename= blkverify_refresh_filename,
 
diff --git a/block/quorum.c b/block/quorum.c
index bdbcec6..40205fb 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1133,6 +1133,8 @@ static BlockDriver bdrv_quorum = {
 .bdrv_add_child = quorum_add_child,
 .bdrv_del_child = quorum_del_child,
 
+.bdrv_child_perm= bdrv_filter_default_perms,
+
 .is_filter  = true,
 .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
 };
diff --git a/block/raw-format.c b/block/raw-format.c
index ce34d1b..86fbc65 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -467,6 +467,7 @@ BlockDriver bdrv_raw = {
 .bdrv_reopen_abort= &raw_reopen_abort,
 .bdrv_open= &raw_open,
 .bdrv_close   = &raw_close,
+.bdrv_child_perm  = bdrv_filter_default_perms,
 .bdrv_create  = &raw_create,
 .bdrv_co_preadv   = &raw_co_preadv,
 .bdrv_co_pwritev  = &raw_co_pwritev,
diff --git a/block/replication.c b/block/replication.c
index eff85c7..91465cb 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -660,6 +660,7 @@ BlockDriver bdrv_replication = {
 
 .bdrv_open  = replication_open,
 .bdrv_close = replication_close,
+.bdrv_child_perm= bdrv_filter_default_perms,
 
 .bdrv_getlength = replication_getlength,
 .bdrv_co_readv  = replication_co_readv,
-- 
1.8.3.1




[Qemu-block] [PATCH v2 03/43] block: Let callers request permissions when attaching a child node

2017-02-27 Thread Kevin Wolf
When attaching a node as a child to a new parent, the required and
shared permissions for this parent are checked against all other parents
of the node now, and an error is returned if there is a conflict.

This allows error returns to a function that previously always
succeeded, and the same is true for quite a few callers and their
callers. Converting all of them within the same patch would be too much,
so for now everyone declares that they don't need any permissions and
allows everyone else to do anything. This way we can use &error_abort
initially and convert caller by caller to pass actual permission
requirements and implement error handling.

All these places are marked with FIXME comments and it will be the job
of the next patches to clean them up again.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c   | 66 +--
 block/block-backend.c |  8 --
 include/block/block_int.h | 15 ++-
 3 files changed, 78 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index 65240fa..9628c7a 100644
--- a/block.c
+++ b/block.c
@@ -1326,6 +1326,38 @@ static int bdrv_fill_options(QDict **options, const char 
*filename,
 return 0;
 }
 
+static int bdrv_check_update_perm(BlockDriverState *bs, uint64_t new_used_perm,
+  uint64_t new_shared_perm,
+  BdrvChild *ignore_child, Error **errp)
+{
+BdrvChild *c;
+
+/* There is no reason why anyone couldn't tolerate write_unchanged */
+assert(new_shared_perm & BLK_PERM_WRITE_UNCHANGED);
+
+QLIST_FOREACH(c, &bs->parents, next_parent) {
+if (c == ignore_child) {
+continue;
+}
+
+if ((new_used_perm & c->shared_perm) != new_used_perm ||
+(c->perm & new_shared_perm) != c->perm)
+{
+const char *user = NULL;
+if (c->role->get_name) {
+user = c->role->get_name(c);
+if (user && !*user) {
+user = NULL;
+}
+}
+error_setg(errp, "Conflicts with %s", user ?: "another operation");
+return -EPERM;
+}
+}
+
+return 0;
+}
+
 static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
 {
 BlockDriverState *old_bs = child->bs;
@@ -1350,14 +1382,25 @@ static void bdrv_replace_child(BdrvChild *child, 
BlockDriverState *new_bs)
 BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
   const char *child_name,
   const BdrvChildRole *child_role,
-  void *opaque)
+  uint64_t perm, uint64_t shared_perm,
+  void *opaque, Error **errp)
 {
-BdrvChild *child = g_new(BdrvChild, 1);
+BdrvChild *child;
+int ret;
+
+ret = bdrv_check_update_perm(child_bs, perm, shared_perm, NULL, errp);
+if (ret < 0) {
+return NULL;
+}
+
+child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
-.bs = NULL,
-.name   = g_strdup(child_name),
-.role   = child_role,
-.opaque = opaque,
+.bs = NULL,
+.name   = g_strdup(child_name),
+.role   = child_role,
+.perm   = perm,
+.shared_perm= shared_perm,
+.opaque = opaque,
 };
 
 bdrv_replace_child(child, child_bs);
@@ -1371,8 +1414,15 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
  const BdrvChildRole *child_role,
  Error **errp)
 {
-BdrvChild *child = bdrv_root_attach_child(child_bs, child_name, child_role,
-  parent_bs);
+BdrvChild *child;
+
+/* FIXME Use real permissions */
+child = bdrv_root_attach_child(child_bs, child_name, child_role,
+   0, BLK_PERM_ALL, parent_bs, errp);
+if (child == NULL) {
+return NULL;
+}
+
 QLIST_INSERT_HEAD(&parent_bs->children, child, next);
 return child;
 }
diff --git a/block/block-backend.c b/block/block-backend.c
index 492e71e..9bb4528 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -163,7 +163,9 @@ BlockBackend *blk_new_open(const char *filename, const char 
*reference,
 return NULL;
 }
 
-blk->root = bdrv_root_attach_child(bs, "root", &child_root, blk);
+/* FIXME Use real permissions */
+blk->root = bdrv_root_attach_child(bs, "root", &child_root,
+   0, BLK_PERM_ALL, blk, &error_abort);
 
 return blk;
 }
@@ -498,7 +500,9 @@ void blk_remove_bs(BlockBackend *blk)
 void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
 {
 bdrv_ref(bs);
-blk->root = bdrv_root_attach_child(bs, "root", &child_root, blk);
+/* FIXME Use real permissions */
+blk->root 

[Qemu-block] [PATCH v2 05/43] block: Default .bdrv_child_perm() for filter drivers

2017-02-27 Thread Kevin Wolf
Most filters need permissions related to read and write for their
children, but only if the node has a parent that wants to use the same
operation on the filter. The same is true for resize.

This adds a default implementation that simply forwards all necessary
permissions to all children of the node and leaves the other permissions
unchanged.
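
A small worked example may make the forwarding rule concrete. The sketch
below covers only the case of a child that is about to be attached (no
existing BdrvChild to take the remaining bits from); new_child_perms() is a
hypothetical name used for illustration, while the constants and the two
masks are the ones from this series.

```c
#include <stdint.h>

/* Permission bits as defined in patch 01 of this series */
enum {
    BLK_PERM_CONSISTENT_READ = 0x01,
    BLK_PERM_WRITE           = 0x02,
    BLK_PERM_WRITE_UNCHANGED = 0x04,
    BLK_PERM_RESIZE          = 0x08,
    BLK_PERM_GRAPH_MOD       = 0x10,
    BLK_PERM_ALL             = 0x1f,
};

#define DEFAULT_PERM_PASSTHROUGH (BLK_PERM_CONSISTENT_READ \
                                  | BLK_PERM_WRITE \
                                  | BLK_PERM_WRITE_UNCHANGED \
                                  | BLK_PERM_RESIZE)
#define DEFAULT_PERM_UNCHANGED (BLK_PERM_ALL & ~DEFAULT_PERM_PASSTHROUGH)

/*
 * Hypothetical helper: permissions a filter requests for a new child,
 * given the cumulative permissions of the filter's own parents.
 */
static void new_child_perms(uint64_t perm, uint64_t shared,
                            uint64_t *nperm, uint64_t *nshared)
{
    /* Forward only the read/write/resize bits that the parents use */
    *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
    /* Share what the parents share, plus the bits that are not forwarded */
    *nshared = (shared & DEFAULT_PERM_PASSTHROUGH) | DEFAULT_PERM_UNCHANGED;
}
```

For instance, parents that need CONSISTENT_READ and GRAPH_MOD (0x11) and
share only WRITE (0x02) result in a child request of 0x01 with 0x12 shared.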

Signed-off-by: Kevin Wolf 
---
 block.c   | 24 
 include/block/block_int.h |  8 
 2 files changed, 32 insertions(+)

diff --git a/block.c b/block.c
index 803a688..597da9a 100644
--- a/block.c
+++ b/block.c
@@ -1531,6 +1531,30 @@ int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, 
uint64_t shared,
 return 0;
 }
 
+#define DEFAULT_PERM_PASSTHROUGH (BLK_PERM_CONSISTENT_READ \
+ | BLK_PERM_WRITE \
+ | BLK_PERM_WRITE_UNCHANGED \
+ | BLK_PERM_RESIZE)
+#define DEFAULT_PERM_UNCHANGED (BLK_PERM_ALL & ~DEFAULT_PERM_PASSTHROUGH)
+
+void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
+   const BdrvChildRole *role,
+   uint64_t perm, uint64_t shared,
+   uint64_t *nperm, uint64_t *nshared)
+{
+if (c == NULL) {
+*nperm = perm & DEFAULT_PERM_PASSTHROUGH;
+*nshared = (shared & DEFAULT_PERM_PASSTHROUGH) | 
DEFAULT_PERM_UNCHANGED;
+return;
+}
+
+*nperm = (perm & DEFAULT_PERM_PASSTHROUGH) |
+ (c->perm & DEFAULT_PERM_UNCHANGED);
+*nshared = (shared & DEFAULT_PERM_PASSTHROUGH) |
+   (c->shared_perm & DEFAULT_PERM_UNCHANGED);
+}
+
+
 static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
 {
 BlockDriverState *old_bs = child->bs;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index cef2b6e..17f4c2d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -872,6 +872,14 @@ void bdrv_child_abort_perm_update(BdrvChild *c);
 int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
 Error **errp);
 
+/* Default implementation for BlockDriver.bdrv_child_perm() that can be used by
+ * block filters: Forward CONSISTENT_READ, WRITE, WRITE_UNCHANGED and RESIZE to
+ * all children */
+void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
+   const BdrvChildRole *role,
+   uint64_t perm, uint64_t shared,
+   uint64_t *nperm, uint64_t *nshared);
+
 
 const char *bdrv_get_parent_name(const BlockDriverState *bs);
 void blk_dev_change_media_cb(BlockBackend *blk, bool load);
-- 
1.8.3.1




[Qemu-block] [PATCH v2 02/43] block: Add Error argument to bdrv_attach_child()

2017-02-27 Thread Kevin Wolf
It will have to return an error soon, so prepare the callers for it.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c   | 16 +---
 block/quorum.c|  9 -
 include/block/block.h |  3 ++-
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index b663204..65240fa 100644
--- a/block.c
+++ b/block.c
@@ -1368,7 +1368,8 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState 
*child_bs,
 BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
  BlockDriverState *child_bs,
  const char *child_name,
- const BdrvChildRole *child_role)
+ const BdrvChildRole *child_role,
+ Error **errp)
 {
 BdrvChild *child = bdrv_root_attach_child(child_bs, child_name, child_role,
   parent_bs);
@@ -1469,7 +1470,9 @@ void bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd)
 bs->backing = NULL;
 goto out;
 }
-bs->backing = bdrv_attach_child(bs, backing_hd, "backing", &child_backing);
+/* FIXME Error handling */
+bs->backing = bdrv_attach_child(bs, backing_hd, "backing", &child_backing,
+&error_abort);
 bs->open_flags &= ~BDRV_O_NO_BACKING;
 pstrcpy(bs->backing_file, sizeof(bs->backing_file), backing_hd->filename);
 pstrcpy(bs->backing_format, sizeof(bs->backing_format),
@@ -1648,6 +1651,7 @@ BdrvChild *bdrv_open_child(const char *filename,
const BdrvChildRole *child_role,
bool allow_none, Error **errp)
 {
+BdrvChild *c;
 BlockDriverState *bs;
 
 bs = bdrv_open_child_bs(filename, options, bdref_key, parent, child_role,
@@ -1656,7 +1660,13 @@ BdrvChild *bdrv_open_child(const char *filename,
 return NULL;
 }
 
-return bdrv_attach_child(parent, bs, bdref_key, child_role);
+c = bdrv_attach_child(parent, bs, bdref_key, child_role, errp);
+if (!c) {
+bdrv_unref(bs);
+return NULL;
+}
+
+return c;
 }
 
 static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
diff --git a/block/quorum.c b/block/quorum.c
index 86e2072..bdbcec6 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1032,10 +1032,17 @@ static void quorum_add_child(BlockDriverState *bs, 
BlockDriverState *child_bs,
 
 /* We can safely add the child now */
 bdrv_ref(child_bs);
-child = bdrv_attach_child(bs, child_bs, indexstr, &child_format);
+
+child = bdrv_attach_child(bs, child_bs, indexstr, &child_format, errp);
+if (child == NULL) {
+s->next_child_index--;
+bdrv_unref(child_bs);
+goto out;
+}
 s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
 s->children[s->num_children++] = child;
 
+out:
 bdrv_drained_end(bs);
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index f62f38e..ff951ea 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -520,7 +520,8 @@ void bdrv_unref_child(BlockDriverState *parent, BdrvChild 
*child);
 BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
  BlockDriverState *child_bs,
  const char *child_name,
- const BdrvChildRole *child_role);
+ const BdrvChildRole *child_role,
+ Error **errp);
 
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
-- 
1.8.3.1




[Qemu-block] [PATCH v2 00/43] New op blocker system, part 1

2017-02-27 Thread Kevin Wolf
This series is the first part of implementing the new op blocker system
whose design was agreed on quite a while ago, but proved a bit tricky to
implement in places. There is more work to do to fully replace the old op
blocker system, but realistically we don't have that much time until the 2.9
freeze. So let's merge this series to complement the traditional op blockers
and plan for a second part in the 2.10 timeframe.

The basic idea is that every user of a block node (including things outside the
block layer that go through a BlockBackend, and also other block nodes that
hold references to it) has to declare which low-level operations/permissions it
needs and which operation it allows other users to perform on the same node.
Depending on these declarations, conflicts are avoided by returning an error
for attempts to attach a conflicting user to the same node.
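
To illustrate the rule, here is a minimal standalone sketch of the conflict
test between two users of the same node. NodeUser and users_conflict() are
hypothetical names invented for this example; the permission constants are
the ones introduced in patch 01, and the condition is meant to be equivalent
to the check done when a new user is attached.

```c
#include <stdbool.h>
#include <stdint.h>

/* Permission bits as defined in patch 01 of this series */
enum {
    BLK_PERM_CONSISTENT_READ = 0x01,
    BLK_PERM_WRITE           = 0x02,
    BLK_PERM_WRITE_UNCHANGED = 0x04,
    BLK_PERM_RESIZE          = 0x08,
    BLK_PERM_GRAPH_MOD       = 0x10,
    BLK_PERM_ALL             = 0x1f,
};

/* Hypothetical stand-in for one user of a block node */
typedef struct NodeUser {
    uint64_t perm;         /* operations this user needs */
    uint64_t shared_perm;  /* operations it allows other users to perform */
} NodeUser;

/*
 * Two users conflict if either one needs a permission that the other
 * one does not share.
 */
static bool users_conflict(const NodeUser *a, const NodeUser *b)
{
    return (a->perm & ~b->shared_perm) != 0 ||
           (b->perm & ~a->shared_perm) != 0;
}
```

A writer (perm = BLK_PERM_WRITE) conflicts with a reader that shares
everything except BLK_PERM_WRITE, but not with a reader that shares
BLK_PERM_ALL.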

After this series, all users request permissions, and hopefully all of the
permissions they need. For writes and resize, getting the permission first is
actually enforced with assertions. Asserting it for consistent reads is in
theory doable, but would mean introducing a request flag that tells us that
inconsistent reads are okay, and making all block drivers propagate this flag
recursively through the backing chain. It might not be worth it.

As stated above, the series doesn't remove the old op blockers yet, though in
theory the new op blockers should block everything that needs to be blocked.
In practice, the read/write/resize blockers should be okay, but
BLK_PERM_GRAPH_MOD isn't to be taken too seriously at the moment. It isn't
really applied consistently and doesn't do much useful yet. Making proper use
of it is left for the part 2 series.

Kevin Wolf (43):
  block: Add op blocker permission constants
  block: Add Error argument to bdrv_attach_child()
  block: Let callers request permissions when attaching a child node
  block: Involve block drivers in permission granting
  block: Default .bdrv_child_perm() for filter drivers
  block: Request child permissions in filter drivers
  block: Default .bdrv_child_perm() for format drivers
  block: Request child permissions in format drivers
  vvfat: Implement .bdrv_child_perm()
  block: Require .bdrv_child_perm() with child nodes
  block: Request real permissions in bdrv_attach_child()
  block: Add permissions to BlockBackend
  block: Add permissions to blk_new()
  block: Add error parameter to blk_insert_bs()
  block: Add BDRV_O_RESIZE for blk_new_open()
  block: Request real permissions in blk_new_open()
  block: Allow error return in BlockDevOps.change_media_cb()
  hw/block: Request permissions
  hw/block: Introduce share-rw qdev property
  blockjob: Add permissions to block_job_create()
  block: Add BdrvChildRole.get_parent_desc()
  block: Include details on permission errors in message
  block: Add BdrvChildRole.stay_at_node
  blockjob: Add permissions to block_job_add_bdrv()
  commit: Use real permissions in commit block job
  commit: Use real permissions for HMP 'commit'
  backup: Use real permissions in backup block job
  block: Fix pending requests check in bdrv_append()
  block: BdrvChildRole.attach/detach() callbacks
  block: Allow backing file links in change_parent_backing_link()
  mirror: Use real permissions in mirror/active commit block job
  stream: Use real permissions in streaming block job
  mirror: Add filter-node-name to blockdev-mirror
  commit: Add filter-node-name to block-commit
  hmp: Request permissions in qemu-io
  migration/block: Use real permissions
  nbd/server: Use real permissions for NBD exports
  tests: Remove FIXME comments
  block: Pass BdrvChild to bdrv_aligned_preadv/pwritev and copy-on-read
  block: Assertions for write permissions
  block: Assertions for resize permission
  block: Add Error parameter to bdrv_set_backing_hd()
  block: Add Error parameter to bdrv_append()

 block.c  | 563 ++-
 block/backup.c   |  22 +-
 block/blkdebug.c |   2 +
 block/blkreplay.c|   1 +
 block/blkverify.c|   1 +
 block/block-backend.c| 116 +++-
 block/bochs.c|   1 +
 block/cloop.c|   1 +
 block/commit.c   | 176 ++--
 block/crypto.c   |   1 +
 block/dmg.c  |   1 +
 block/io.c   |  41 +--
 block/mirror.c   | 233 ++--
 block/parallels.c|   4 +-
 block/qcow.c |   4 +-
 block/qcow2.c|  19 +-
 block/qed.c  |   4 +-
 block/quorum.c   |  11 +-
 block/raw-format.c   |   1 +
 block/replication.c  |   3 +-
 block/sheepdog.c |   2 +-
 block/stream.c   |  45 +++-
 block/vdi.c  |   4 +-
 block/vhdx.c |   4 +-
 

[Qemu-block] [PATCH v2 01/43] block: Add op blocker permission constants

2017-02-27 Thread Kevin Wolf
This patch defines the permission categories that will be used by the
new op blocker system.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 include/block/block.h | 36 
 1 file changed, 36 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index bde5ebd..f62f38e 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -187,6 +187,42 @@ typedef enum BlockOpType {
 BLOCK_OP_TYPE_MAX,
 } BlockOpType;
 
+/* Block node permission constants */
+enum {
+/**
+ * A user that has the "permission" of consistent reads is guaranteed that
+ * their view of the contents of the block device is complete and
+ * self-consistent, representing the contents of a disk at a specific
+ * point.
+ *
+ * For most block devices (including their backing files) this is true, but
+ * the property cannot be maintained in a few situations like for
+ * intermediate nodes of a commit block job.
+ */
+BLK_PERM_CONSISTENT_READ= 0x01,
+
+/** This permission is required to change the visible disk contents. */
+BLK_PERM_WRITE  = 0x02,
+
+/**
+ * This permission (which is weaker than BLK_PERM_WRITE) is both enough and
+ * required for writes to the block node when the caller promises that
+ * the visible disk content doesn't change.
+ */
+BLK_PERM_WRITE_UNCHANGED= 0x04,
+
+/** This permission is required to change the size of a block node. */
+BLK_PERM_RESIZE = 0x08,
+
+/**
+ * This permission is required to change the node that this BdrvChild
+ * points to.
+ */
+BLK_PERM_GRAPH_MOD  = 0x10,
+
+BLK_PERM_ALL= 0x1f,
+};
+
 /* disk I/O throttling */
 void bdrv_init(void);
 void bdrv_init_with_whitelist(void);
-- 
1.8.3.1
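
For readers following along, here is a minimal sketch of how bitmask constants like these are typically combined and tested. The enum values are copied from the patch; the helpers perm_covers() and perm_conflicts() and the notion of a "shared" mask are assumptions made for illustration and are not defined by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch above. */
enum {
    BLK_PERM_CONSISTENT_READ = 0x01,
    BLK_PERM_WRITE           = 0x02,
    BLK_PERM_WRITE_UNCHANGED = 0x04,
    BLK_PERM_RESIZE          = 0x08,
    BLK_PERM_GRAPH_MOD       = 0x10,
    BLK_PERM_ALL             = 0x1f,
};

/* Hypothetical helper: true if @granted contains every bit of @required. */
static bool perm_covers(uint64_t granted, uint64_t required)
{
    return (granted & required) == required;
}

/*
 * Hypothetical helper: true if a user asking for @perm clashes with
 * another user that only tolerates the permissions in @other_shared.
 */
static bool perm_conflicts(uint64_t perm, uint64_t other_shared)
{
    return (perm & ~other_shared) != 0;
}
```

With these helpers, a user that tolerates everything except resizing would pass BLK_PERM_ALL & ~BLK_PERM_RESIZE as its shared mask, and a resize request from anyone else would then be reported as a conflict.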




Re: [Qemu-block] [Qemu-devel] [PATCH 48/54] nbd/server: Use real permissions for NBD exports

2017-02-27 Thread Eric Blake
On 02/21/2017 08:58 AM, Kevin Wolf wrote:
> NBD can't cope with device size changes, so resize must be forbidden,
> but otherwise we can tolerate anything. Depending on whether the export
> is writable or not, we only require consistent reads and writes.

Well, there is a proposal for NBD to grow an extension to support
resizes, but nothing that will land in time for qemu 2.9 :)

> 
> Signed-off-by: Kevin Wolf 
> ---
>  nbd/server.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [PATCH v2 5/5] block/rbd: add support for 'mon_host', 'auth_supported' via QAPI

2017-02-27 Thread Daniel P. Berrange
On Mon, Feb 27, 2017 at 01:58:48PM -0500, Jeff Cody wrote:
> This adds support for two additional options that may be specified
> by QAPI in blockdev-add:
> 
> mon_host: servername and port
> auth_supported: either 'cephx' or 'none'
> 
> Signed-off-by: Jeff Cody 
> ---
>  block/rbd.c  | 39 +++
>  qapi/block-core.json |  8 
>  2 files changed, 47 insertions(+)
> 
> diff --git a/block/rbd.c b/block/rbd.c
> index e04a5e1..51e971e 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -394,6 +394,18 @@ static QemuOptsList runtime_opts = {
>  .name = "keyvalue-pairs",
>  .type = QEMU_OPT_STRING,
>  },
> +{
> +.name = "server.host",
> +.type = QEMU_OPT_STRING,
> +},
> +{
> +.name = "server.port",
> +.type = QEMU_OPT_STRING,
> +},
> +{
> +.name = "auth_supported",
> +.type = QEMU_OPT_STRING,
> +},
>  { /* end of list */ }
>  },
>  };
> @@ -559,6 +571,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  {
>  BDRVRBDState *s = bs->opaque;
>  const char *pool, *snap, *conf, *clientname, *name, *keypairs;
> +const char *host, *port, *auth_supported;
>  const char *secretid;
>  QemuOpts *opts;
>  Error *local_err = NULL;
> @@ -580,6 +593,9 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  clientname = qemu_opt_get(opts, "user");
>  name   = qemu_opt_get(opts, "image");
>  keypairs   = qemu_opt_get(opts, "keyvalue-pairs");
> +host   = qemu_opt_get(opts, "server.host");
> +port   = qemu_opt_get(opts, "server.port");
> +auth_supported = qemu_opt_get(opts, "auth_supported");
>  
>  r = rados_create(&s->cluster, clientname);
>  if (r < 0) {
> @@ -604,6 +620,29 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  goto failed_shutdown;
>  }
>  
> +/* if mon_host was specified */
> +if (host) {
> +const char *hostname = host;
> +char *mon_host = NULL;
> +
> +if (port) {
> +mon_host = g_strdup_printf("%s:%s", host, port);
> +hostname = mon_host;
> +}
> +r = rados_conf_set(s->cluster, "mon_host", hostname);
> +g_free(mon_host);
> +if (r < 0) {
> +goto failed_shutdown;
> +}
> +}
> +
> +if (auth_supported) {
> +r = rados_conf_set(s->cluster, "auth_supported", auth_supported);
> +if (r < 0) {
> +goto failed_shutdown;
> +}
> +}
> +
>  if (qemu_rbd_set_auth(s->cluster, secretid, errp) < 0) {
>  r = -EIO;
>  goto failed_shutdown;
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 5b311ff..376512c 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2680,6 +2680,12 @@
>  #
>  # @user:   #optional Ceph id name.
>  #
> +# @server: #optional Monitor host address and port.  This maps
> +#  to the "mon_host" Ceph option.
> +#
> +# @auth_supported: #optional Authentication supported.
> +#  Either "cephx" or "none".
> +#
>  # @password-secret:#optional The ID of a QCryptoSecret object providing
>  #   the password for the login.
>  #
> @@ -2691,6 +2697,8 @@
>  '*conf': 'str',
>  '*snapshot': 'str',
>  '*user': 'str',
> +'*server': 'InetSocketAddress',

This needs to be an array

> +'*auth_supported': 'str',

IIUC, you're allowed to list multiple auth options too

>  '*password-secret': 'str' } }
>  


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|
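
To make the "needs to be an array" point concrete, here is a rough sketch of joining several monitor addresses into one string. It assumes that Ceph's "mon_host" option accepts a comma-separated list of host[:port] entries; struct mon_addr and build_mon_host() are hypothetical names, not part of the patch or of librados.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one monitor address; @port may be NULL. */
struct mon_addr {
    const char *host;
    const char *port;
};

/*
 * Join @n addresses as "host1:port1,host2,..." into @buf.
 * Returns 0 on success, -1 if the result does not fit in @buf_len bytes.
 */
static int build_mon_host(const struct mon_addr *mons, size_t n,
                          char *buf, size_t buf_len)
{
    size_t used = 0;

    if (buf_len == 0) {
        return -1;
    }
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int w;

        if (mons[i].port) {
            w = snprintf(buf + used, buf_len - used, "%s%s:%s",
                         i ? "," : "", mons[i].host, mons[i].port);
        } else {
            w = snprintf(buf + used, buf_len - used, "%s%s",
                         i ? "," : "", mons[i].host);
        }
        /* snprintf() reports the length it wanted; detect truncation. */
        if (w < 0 || (size_t)w >= buf_len - used) {
            return -1;
        }
        used += (size_t)w;
    }
    return 0;
}
```

The resulting string could then be handed to rados_conf_set() in a single call, in the same way the patch passes its single host:port value.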



Re: [Qemu-block] [Qemu-devel] Non-flat command line option argument syntax

2017-02-27 Thread Eric Blake
On 02/27/2017 07:36 AM, Markus Armbruster wrote:

> 
>>  Maybe just 'foo.array[]' (without any =) for an empty
>> array or something like that.
> 
> Yes, that should do.  Likewise foo.object{} for empty object.
> 
> {} doesn't even need quoting.  [] may.

[contents] needs quoting, but [] does NOT need shell quoting (no shells
treat it as a glob), for the same reason that 'if [ -e "$file" ];' needs
no quoting around the [ or ] (the shell only requires quotes for [ if
the rest of the shell word can look like a valid glob, but globs require
intermediate content before the ]).

>> Before we introduce anything like this, do we actually need it?
> 
> I don't know whether anything needs optional, present and empty.  But
> even if the answer is "no" today, it need not remain "no".
> 
> Anyone running into a case of "yes", will have to fall back to the JSON
> form of -blockdev.  Strengthens my belief that providing JSON there is a
> good idea.
> 
> The insufficient generality of dotted keys bugs me a bit.  Not sure
> whether it justifies more syntax now.  But we should document it.

I agree that documenting it as a shortcoming of dotted form and pointing
to JSON form is okay.  I also like that we are leaving the door open for
future expansion, if needed, and think that is better than inventing the
syntax now, especially for what we are trying to get into 2.9.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH v2 1/5] block/rbd: don't copy strings in qemu_rbd_next_tok()

2017-02-27 Thread Markus Armbruster
Jeff Cody  writes:

> This patch is prep work for parsing options for .bdrv_parse_filename,
> and using QDict options.
>
> The function qemu_rbd_next_tok() searched for various key/value pairs,
> and copied them into buffers.  This will soon be an unnecessary extra
> step, so we will now return found strings by reference only, and
> offload the responsibility for safely handling/copying these strings to
> the caller.
>
> This also cleans up error handling some, as the callers now rely on
> the Error object to determine if there is a parse error.
>
> Signed-off-by: Jeff Cody 

Reviewed-by: Markus Armbruster 



Re: [Qemu-block] [PATCH] option: Tweak invalid size error message and unbreak iotest 049

2017-02-27 Thread Thomas Huth
On 27.02.2017 13:55, Markus Armbruster wrote:
> Commit 75cdcd1 neglected to update tests/qemu-iotests/049.out, and
> made the error message for negative size worse.  Fix that.
> 
> Reported-by: Thomas Huth 
> Signed-off-by: Markus Armbruster 
> ---
>  tests/qemu-iotests/049.out | 14 +-
>  util/qemu-option.c |  2 +-
>  2 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
> index 4673b67..34e66db 100644
> --- a/tests/qemu-iotests/049.out
> +++ b/tests/qemu-iotests/049.out
> @@ -95,14 +95,14 @@ qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- -1024
>  qemu-img: Image size must be less than 8 EiB!
>  
>  qemu-img create -f qcow2 -o size=-1024 TEST_DIR/t.qcow2
> -qemu-img: Parameter 'size' expects a non-negative number below 2^64
> +qemu-img: Value '-1024' is out of range for parameter 'size'
>  qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
>  
>  qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- -1k
>  qemu-img: Image size must be less than 8 EiB!
>  
>  qemu-img create -f qcow2 -o size=-1k TEST_DIR/t.qcow2
> -qemu-img: Parameter 'size' expects a non-negative number below 2^64
> +qemu-img: Value '-1k' is out of range for parameter 'size'
>  qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
>  
>  qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1kilobyte
> @@ -110,15 +110,19 @@ qemu-img: Invalid image size specified! You may use k, 
> M, G, T, P or E suffixes
>  qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.
>  
>  qemu-img create -f qcow2 -o size=1kilobyte TEST_DIR/t.qcow2
> -Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off 
> cluster_size=65536 lazy_refcounts=off refcount_bits=16
> +qemu-img: Parameter 'size' expects a non-negative number below 2^64
> +Optional suffix k, M, G, T, P or E means kilo-, mega-, giga-, tera-, peta-
> +and exabytes, respectively.
> +qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
>  
>  qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- foobar
>  qemu-img: Invalid image size specified! You may use k, M, G, T, P or E 
> suffixes for
>  qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.
>  
>  qemu-img create -f qcow2 -o size=foobar TEST_DIR/t.qcow2
> -qemu-img: Parameter 'size' expects a size
> -You may use k, M, G or T suffixes for kilobytes, megabytes, gigabytes and 
> terabytes.
> +qemu-img: Parameter 'size' expects a non-negative number below 2^64
> +Optional suffix k, M, G, T, P or E means kilo-, mega-, giga-, tera-, peta-
> +and exabytes, respectively.
>  qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
>  
>  == Check correct interpretation of suffixes for cluster size ==
> diff --git a/util/qemu-option.c b/util/qemu-option.c
> index 419f252..5ce1b5c 100644
> --- a/util/qemu-option.c
> +++ b/util/qemu-option.c
> @@ -179,7 +179,7 @@ void parse_option_size(const char *name, const char 
> *value,
>  
>  err = qemu_strtosz(value, NULL, &size);
>  if (err == -ERANGE) {
> -error_setg(errp, "Value '%s' is too large for parameter '%s'",
> +error_setg(errp, "Value '%s' is out of range for parameter '%s'",
> value, name);
>  return;
>  }

Reviewed-by: Thomas Huth 
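
The distinction the new messages draw (a malformed value versus a well-formed value that is out of range) can be sketched in isolation. This is a simplified stand-in written for illustration, not QEMU's qemu_strtosz(): parse_size() is a hypothetical name, it handles integers only, and it accepts just the k, M, G, T, P and E suffixes mentioned in the test output.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a non-negative size with an optional binary suffix.
 * Returns 0 and stores the result in *@out on success,
 * -ERANGE for negative or too-large values,
 * -EINVAL for anything that is not a number plus one valid suffix.
 */
static int parse_size(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;
    int shift = 0;

    /* A negative number is well-formed but out of range. */
    if (s[0] == '-' && isdigit((unsigned char)s[1])) {
        return -ERANGE;
    }
    if (!isdigit((unsigned char)s[0])) {
        return -EINVAL;
    }

    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno == ERANGE) {
        return -ERANGE;
    }

    switch (*end) {
    case 'k': case 'K': shift = 10; end++; break;
    case 'M':           shift = 20; end++; break;
    case 'G':           shift = 30; end++; break;
    case 'T':           shift = 40; end++; break;
    case 'P':           shift = 50; end++; break;
    case 'E':           shift = 60; end++; break;
    case '\0':          break;
    default:            return -EINVAL;
    }
    if (*end != '\0') {
        return -EINVAL;     /* trailing junk, e.g. "1kilobyte" */
    }
    if (shift && v > (UINT64_MAX >> shift)) {
        return -ERANGE;     /* scaling would overflow 64 bits */
    }

    *out = (uint64_t)v << shift;
    return 0;
}
```

Under this sketch "-1k" maps to -ERANGE (the "out of range" message) while "1kilobyte" and "foobar" map to -EINVAL (the "expects a non-negative number" message), which mirrors the split shown in the updated 049.out.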




Re: [Qemu-block] [PATCH] block/mirror: fix broken sparseness detection

2017-02-27 Thread Jeff Cody
On Mon, Feb 27, 2017 at 12:13:14PM -0500, John Snow wrote:
> int64_t is in all likelihood the actual scalar type we want.
> Yep, really.
> 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1219541
> 
> Signed-off-by: John Snow 
> ---
>  block/mirror.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index ca8547b..7b4cfbd 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -386,7 +386,7 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  nb_chunks * sectors_per_chunk);
>  bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, 
> nb_chunks);
>  while (nb_chunks > 0 && sector_num < end) {
> -int ret;
> +int64_t ret;
>  int io_sectors, io_sectors_acct;
>  BlockDriverState *file;
>  enum MirrorMethod {
> -- 
> 2.9.3
>

Reviewed-by: Jeff Cody 
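
A small sketch of why the narrower type matters, under the assumption that the value stored in ret comes from a function returning int64_t with meaningful upper bits. STATUS_ZERO, make_status() and the two checks are hypothetical and only model the effect; conversion to a 32-bit int is shown through uint32_t because that conversion is well defined in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status flag kept in the low bits of a 64-bit result. */
#define STATUS_ZERO 0x2

/* Hypothetical 64-bit status: an offset in the upper bits plus flags. */
static int64_t make_status(int64_t offset, int flags)
{
    return offset | flags;
}

/* True if squeezing @v into 32 bits would drop information. */
static bool lost_in_32_bits(int64_t v)
{
    return (int64_t)(uint32_t)v != v;
}

/* True if bit 31 is set, i.e. a 32-bit int would typically read as negative. */
static bool sign_bit_if_truncated(int64_t v)
{
    return ((uint32_t)v >> 31) != 0;
}
```

A positive 64-bit status whose bit 31 happens to be set would look like an error code once stored in a plain int, which is the kind of misreading the one-line type change avoids.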



[Qemu-block] [PATCH v2 1/5] block/rbd: don't copy strings in qemu_rbd_next_tok()

2017-02-27 Thread Jeff Cody
This patch is prep work for parsing options for .bdrv_parse_filename,
and using QDict options.

The function qemu_rbd_next_tok() searched for various key/value pairs,
and copied them into buffers.  This will soon be an unnecessary extra
step, so we will now return found strings by reference only, and
offload the responsibility for safely handling/copying these strings to
the caller.

This also cleans up error handling some, as the callers now rely on
the Error object to determine if there is a parse error.

Signed-off-by: Jeff Cody 
---
 block/rbd.c | 99 +++--
 1 file changed, 64 insertions(+), 35 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 22e8e69..33c21d8 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -102,10 +102,10 @@ typedef struct BDRVRBDState {
 char *snap;
 } BDRVRBDState;
 
-static int qemu_rbd_next_tok(char *dst, int dst_len,
- char *src, char delim,
- const char *name,
- char **p, Error **errp)
+static char *qemu_rbd_next_tok(int max_len,
+   char *src, char delim,
+   const char *name,
+   char **p, Error **errp)
 {
 int l;
 char *end;
@@ -127,17 +127,15 @@ static int qemu_rbd_next_tok(char *dst, int dst_len,
 }
 }
 l = strlen(src);
-if (l >= dst_len) {
+if (l >= max_len) {
 error_setg(errp, "%s too long", name);
-return -EINVAL;
+return NULL;
 } else if (l == 0) {
 error_setg(errp, "%s too short", name);
-return -EINVAL;
+return NULL;
 }
 
-pstrcpy(dst, dst_len, src);
-
-return 0;
+return src;
 }
 
 static void qemu_rbd_unescape(char *src)
@@ -162,7 +160,9 @@ static int qemu_rbd_parsename(const char *filename,
 {
 const char *start;
 char *p, *buf;
-int ret;
+int ret = 0;
+char *found_str;
+Error *local_err = NULL;
 
 if (!strstart(filename, "rbd:", &start)) {
 error_setg(errp, "File name must start with 'rbd:'");
@@ -174,36 +174,60 @@ static int qemu_rbd_parsename(const char *filename,
 *snap = '\0';
 *conf = '\0';
 
-ret = qemu_rbd_next_tok(pool, pool_len, p,
-'/', "pool name", &p, errp);
-if (ret < 0 || !p) {
+found_str = qemu_rbd_next_tok(pool_len, p,
+  '/', "pool name", &p, &local_err);
+if (local_err) {
+goto done;
+}
+if (!p) {
 ret = -EINVAL;
+error_setg(errp, "Pool name is required");
 goto done;
 }
-qemu_rbd_unescape(pool);
+qemu_rbd_unescape(found_str);
+g_strlcpy(pool, found_str, pool_len);
 
 if (strchr(p, '@')) {
-ret = qemu_rbd_next_tok(name, name_len, p,
-'@', "object name", &p, errp);
-if (ret < 0) {
+found_str = qemu_rbd_next_tok(name_len, p,
+  '@', "object name", &p, &local_err);
+if (local_err) {
 goto done;
 }
-ret = qemu_rbd_next_tok(snap, snap_len, p,
-':', "snap name", &p, errp);
-qemu_rbd_unescape(snap);
+qemu_rbd_unescape(found_str);
+g_strlcpy(name, found_str, name_len);
+
+found_str = qemu_rbd_next_tok(snap_len, p,
+  ':', "snap name", &p, &local_err);
+if (local_err) {
+goto done;
+}
+qemu_rbd_unescape(found_str);
+g_strlcpy(snap, found_str, snap_len);
 } else {
-ret = qemu_rbd_next_tok(name, name_len, p,
-':', "object name", &p, errp);
+found_str = qemu_rbd_next_tok(name_len, p,
+  ':', "object name", &p, &local_err);
+if (local_err) {
+goto done;
+}
+qemu_rbd_unescape(found_str);
+g_strlcpy(name, found_str, name_len);
 }
-qemu_rbd_unescape(name);
-if (ret < 0 || !p) {
+if (!p) {
 goto done;
 }
 
-ret = qemu_rbd_next_tok(conf, conf_len, p,
-'\0', "configuration", &p, errp);
+found_str = qemu_rbd_next_tok(conf_len, p,
+  '\0', "configuration", &p, &local_err);
+if (local_err) {
+goto done;
+}
+g_strlcpy(conf, found_str, conf_len);
 
 done:
+if (local_err) {
+ret = -EINVAL;
+error_propagate(errp, local_err);
+}
 g_free(buf);
 return ret;
 }
@@ -262,17 +286,18 @@ static int qemu_rbd_set_conf(rados_t cluster, const char 
*conf,
  Error **errp)
 {
 char *p, *buf;
-char name[RBD_MAX_CONF_NAME_SIZE];
-char value[RBD_MAX_CONF_VAL_SIZE];
+char *name;
+char *value;
+Error *local_err = NULL;
 int ret = 0;
 
 buf = g_strdup(conf);
 p = buf;
 
 while (p) {
-ret = 

[Qemu-block] [PATCH v2 0/5] RBD: blockdev-add

2017-02-27 Thread Jeff Cody

This series adds blockdev-add for rbd.

Changes from v1:

Overall:

* QAPI interface does not allow arbitrary key/value pairs
  in v2 (Thanks Daniel)

* QAPI interface adds 'mon_host' and 'auth_supported' options (Thanks Daniel)

* Use 'user' instead of 'rbd-id' (Thanks Daniel)


By patch:

Patch 1:
 * Fixed some indentation in patch 1 (Thanks Markus)

Patch 2:
 * 'rbd-id' becomes 'user', and the commit message is fixed. (Thanks Daniel)

Patch 3:
 * Ripple-through from changes in patch 2
 * Removed the string unescape from qemu_rbd_set_keypairs(), because the
   strings have already been unescaped by the time they hit this function.

Patch 4:
 * 'rbd-id' becomes 'user'
 * drop the 'keyvalue-pairs' from the QAPI  (both, thanks Daniel)

Patch 5:
 * new patch
 * Adds the 'server' (mon_host) and 'auth_supported' options to the
   QAPI (Thanks Daniel)


Jeff Cody (5):
  block/rbd: don't copy strings in qemu_rbd_next_tok()
  block/rbd: add all the currently supported runtime_opts
  block/rbd: parse all options via bdrv_parse_filename
  block/rbd: add blockdev-add support
  block/rbd: add support for 'mon_host', 'auth_supported' via QAPI

 block/rbd.c  | 464 ++-
 qapi/block-core.json |  42 -
 2 files changed, 316 insertions(+), 190 deletions(-)

-- 
2.9.3



[Qemu-block] [PATCH v2 4/5] block/rbd: add blockdev-add support

2017-02-27 Thread Jeff Cody
Signed-off-by: Jeff Cody 
---
 qapi/block-core.json | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5f82d35..5b311ff 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2111,6 +2111,7 @@
 # @replication: Since 2.8
 # @ssh: Since 2.8
 # @iscsi: Since 2.9
+# @rbd: Since 2.9
 #
 # Since: 2.0
 ##
@@ -2120,7 +2121,7 @@
 'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
 'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed',
 'quorum', 'raw', 'replication', 'ssh', 'vdi', 'vhdx', 'vmdk',
-'vpc', 'vvfat' ] }
+'vpc', 'vvfat', 'rbd' ] }
 
 ##
 # @BlockdevOptionsFile:
@@ -2376,7 +2377,6 @@
 'path': 'str',
 '*user': 'str' } }
 
-
 ##
 # @BlkdebugEvent:
 #
@@ -2666,6 +2666,34 @@
 '*timeout': 'int' } }
 
 ##
+# @BlockdevOptionsRbd:
+#
+# @pool:   Ceph pool name.
+#
+# @image:  Image name in the Ceph pool.
+#
+# @conf:   # optional path to Ceph configuration file.  Values
+#  in the configuration file will be overridden by
+#  options specified via QAPI.
+#
+# @snapshot:   #optional Ceph snapshot name.
+#
+# @user:   #optional Ceph id name.
+#
+# @password-secret:#optional The ID of a QCryptoSecret object providing
+#   the password for the login.
+#
+# Since: 2.9
+##
+{ 'struct': 'BlockdevOptionsRbd',
+  'data': { 'pool': 'str',
+'image': 'str',
+'*conf': 'str',
+'*snapshot': 'str',
+'*user': 'str',
+'*password-secret': 'str' } }
+
+##
 # @ReplicationMode:
 #
 # An enumeration of replication modes.
@@ -2863,7 +2891,7 @@
   'qed':'BlockdevOptionsGenericCOWFormat',
   'quorum': 'BlockdevOptionsQuorum',
   'raw':'BlockdevOptionsRaw',
-# TODO rbd: Wait for structured options
+  'rbd':'BlockdevOptionsRbd',
   'replication':'BlockdevOptionsReplication',
 # TODO sheepdog: Wait for structured options
   'ssh':'BlockdevOptionsSsh',
-- 
2.9.3




[Qemu-block] [PATCH v2 2/5] block/rbd: add all the currently supported runtime_opts

2017-02-27 Thread Jeff Cody
This adds all the currently supported runtime opts, which
are the options as parsed from the filename.  All of these
options are explicitly checked for during runtime,
with the exception of the "keyvalue-pairs" option.

This option contains all the key/value pairs that the QEMU rbd
driver merely unescapes, and passes along blindly to rados.

Signed-off-by: Jeff Cody 
---
 block/rbd.c | 62 ++---
 1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 33c21d8..ff5def4 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -357,6 +357,49 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
 }
 }
 
+static QemuOptsList runtime_opts = {
+.name = "rbd",
+.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
+.desc = {
+{
+.name = "filename",
+.type = QEMU_OPT_STRING,
+.help = "Specification of the rbd image",
+},
+{
+.name = "password-secret",
+.type = QEMU_OPT_STRING,
+.help = "ID of secret providing the password",
+},
+{
+.name = "conf",
+.type = QEMU_OPT_STRING,
+},
+{
+.name = "pool",
+.type = QEMU_OPT_STRING,
+},
+{
+.name = "image",
+.type = QEMU_OPT_STRING,
+},
+{
+.name = "snapshot",
+.type = QEMU_OPT_STRING,
+},
+{
+/* maps to 'id' in rados_create() */
+.name = "user",
+.type = QEMU_OPT_STRING,
+},
+{
+.name = "keyvalue-pairs",
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
 static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 Error *local_err = NULL;
@@ -500,25 +543,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 qemu_aio_unref(acb);
 }
 
-/* TODO Convert to fine grained options */
-static QemuOptsList runtime_opts = {
-.name = "rbd",
-.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
-.desc = {
-{
-.name = "filename",
-.type = QEMU_OPT_STRING,
-.help = "Specification of the rbd image",
-},
-{
-.name = "password-secret",
-.type = QEMU_OPT_STRING,
-.help = "ID of secret providing the password",
-},
-{ /* end of list */ }
-},
-};
-
 static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
  Error **errp)
 {
-- 
2.9.3
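
The option list in this patch is a sentinel-terminated table: the empty "end of list" entry marks where iteration stops. A stripped-down sketch of that pattern follows; struct opt_desc and find_opt() are hypothetical simplifications for illustration and do not reflect the real QemuOptsList or QemuOptDesc layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified option descriptor. */
struct opt_desc {
    const char *name;
    const char *help;   /* may be NULL */
};

/* Option names taken from the patch; the last entry is the sentinel. */
static const struct opt_desc rbd_opts[] = {
    { "filename",        "Specification of the rbd image" },
    { "password-secret", "ID of secret providing the password" },
    { "conf",            NULL },
    { "pool",            NULL },
    { "image",           NULL },
    { "snapshot",        NULL },
    { "user",            NULL },
    { "keyvalue-pairs",  NULL },
    { NULL,              NULL }  /* end of list */
};

/* Walk @list until the sentinel; return the matching entry or NULL. */
static const struct opt_desc *find_opt(const struct opt_desc *list,
                                       const char *name)
{
    for (; list->name; list++) {
        if (strcmp(list->name, name) == 0) {
            return list;
        }
    }
    return NULL;
}
```

An unknown key such as the old 'rbd-id' simply falls off the end of the table, which is how a lookup of this shape would reject options that are not listed.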




[Qemu-block] [PATCH v2 5/5] block/rbd: add support for 'mon_host', 'auth_supported' via QAPI

2017-02-27 Thread Jeff Cody
This adds support for two additional options that may be specified
by QAPI in blockdev-add:

mon_host: servername and port
auth_supported: either 'cephx' or 'none'

Signed-off-by: Jeff Cody 
---
 block/rbd.c  | 39 +++
 qapi/block-core.json |  8 
 2 files changed, 47 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index e04a5e1..51e971e 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -394,6 +394,18 @@ static QemuOptsList runtime_opts = {
 .name = "keyvalue-pairs",
 .type = QEMU_OPT_STRING,
 },
+{
+.name = "server.host",
+.type = QEMU_OPT_STRING,
+},
+{
+.name = "server.port",
+.type = QEMU_OPT_STRING,
+},
+{
+.name = "auth_supported",
+.type = QEMU_OPT_STRING,
+},
 { /* end of list */ }
 },
 };
@@ -559,6 +571,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 {
 BDRVRBDState *s = bs->opaque;
 const char *pool, *snap, *conf, *clientname, *name, *keypairs;
+const char *host, *port, *auth_supported;
 const char *secretid;
 QemuOpts *opts;
 Error *local_err = NULL;
@@ -580,6 +593,9 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 clientname = qemu_opt_get(opts, "user");
 name   = qemu_opt_get(opts, "image");
 keypairs   = qemu_opt_get(opts, "keyvalue-pairs");
+host   = qemu_opt_get(opts, "server.host");
+port   = qemu_opt_get(opts, "server.port");
+auth_supported = qemu_opt_get(opts, "auth_supported");
 
 r = rados_create(&s->cluster, clientname);
 if (r < 0) {
@@ -604,6 +620,29 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto failed_shutdown;
 }
 
+/* if mon_host was specified */
+if (host) {
+const char *hostname = host;
+char *mon_host = NULL;
+
+if (port) {
+mon_host = g_strdup_printf("%s:%s", host, port);
+hostname = mon_host;
+}
+r = rados_conf_set(s->cluster, "mon_host", hostname);
+g_free(mon_host);
+if (r < 0) {
+goto failed_shutdown;
+}
+}
+
+if (auth_supported) {
+r = rados_conf_set(s->cluster, "auth_supported", auth_supported);
+if (r < 0) {
+goto failed_shutdown;
+}
+}
+
 if (qemu_rbd_set_auth(s->cluster, secretid, errp) < 0) {
 r = -EIO;
 goto failed_shutdown;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5b311ff..376512c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2680,6 +2680,12 @@
 #
 # @user:   #optional Ceph id name.
 #
+# @server: #optional Monitor host address and port.  This maps
+#  to the "mon_host" Ceph option.
+#
+# @auth_supported: #optional Authentication supported.
+#  Either "cephx" or "none".
+#
 # @password-secret:#optional The ID of a QCryptoSecret object providing
 #   the password for the login.
 #
@@ -2691,6 +2697,8 @@
 '*conf': 'str',
 '*snapshot': 'str',
 '*user': 'str',
+'*server': 'InetSocketAddress',
+'*auth_supported': 'str',
 '*password-secret': 'str' } }
 
 ##
-- 
2.9.3




[Qemu-block] [PATCH v2 3/5] block/rbd: parse all options via bdrv_parse_filename

2017-02-27 Thread Jeff Cody
Get rid of qemu_rbd_parsename in favor of bdrv_parse_filename.
This simplifies a lot of the parsing as well, as we can treat everything
a bit simpler since nonexistent options are simply NULL pointers instead
of empty strings.

An important item to note:

Ceph has many extra option values that can be specified as key/value
pairs.  This was handled previously in the driver by extracting the
values that the QEMU driver cared about, and then blindly passing all
extra options to rbd after splitting them into key/value pairs, and
cleaning up any special character escaping.

The practice is continued in this patch; there is an option
"keyvalue-pairs" that is populated with all the key/value pairs that the
QEMU driver does not care about.  These key/value pairs will override
any settings in the 'conf' configuration file, just as they did before.

Signed-off-by: Jeff Cody 
---
 block/rbd.c | 298 ++--
 1 file changed, 148 insertions(+), 150 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index ff5def4..e04a5e1 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -18,6 +18,7 @@
 #include "block/block_int.h"
 #include "crypto/secret.h"
 #include "qemu/cutils.h"
+#include "qapi/qmp/qstring.h"
 
 #include 
 
@@ -151,113 +152,129 @@ static void qemu_rbd_unescape(char *src)
 *p = '\0';
 }
 
-static int qemu_rbd_parsename(const char *filename,
-  char *pool, int pool_len,
-  char *snap, int snap_len,
-  char *name, int name_len,
-  char *conf, int conf_len,
-  Error **errp)
+static void qemu_rbd_parse_filename(const char *filename, QDict *options,
+Error **errp)
 {
 const char *start;
-char *p, *buf;
-int ret = 0;
+char *p, *buf, *keypairs;
 char *found_str;
+size_t max_keypair_size;
 Error *local_err = NULL;
 
 if (!strstart(filename, "rbd:", &start)) {
 error_setg(errp, "File name must start with 'rbd:'");
-return -EINVAL;
+return;
 }
 
+max_keypair_size = strlen(start) + 1;
 buf = g_strdup(start);
+keypairs = g_malloc0(max_keypair_size);
 p = buf;
-*snap = '\0';
-*conf = '\0';
 
-found_str = qemu_rbd_next_tok(pool_len, p,
+found_str = qemu_rbd_next_tok(RBD_MAX_POOL_NAME_SIZE, p,
   '/', "pool name", &p, &local_err);
 if (local_err) {
 goto done;
 }
 if (!p) {
-ret = -EINVAL;
 error_setg(errp, "Pool name is required");
 goto done;
 }
 qemu_rbd_unescape(found_str);
-g_strlcpy(pool, found_str, pool_len);
+qdict_put(options, "pool", qstring_from_str(found_str));
 
 if (strchr(p, '@')) {
-found_str = qemu_rbd_next_tok(name_len, p,
+found_str = qemu_rbd_next_tok(RBD_MAX_IMAGE_NAME_SIZE, p,
   '@', "object name", &p, &local_err);
 if (local_err) {
 goto done;
 }
 qemu_rbd_unescape(found_str);
-g_strlcpy(name, found_str, name_len);
+qdict_put(options, "image", qstring_from_str(found_str));
 
-found_str = qemu_rbd_next_tok(snap_len, p,
+found_str = qemu_rbd_next_tok(RBD_MAX_SNAP_NAME_SIZE, p,
   ':', "snap name", &p, &local_err);
 if (local_err) {
 goto done;
 }
 qemu_rbd_unescape(found_str);
-g_strlcpy(snap, found_str, snap_len);
+qdict_put(options, "snapshot", qstring_from_str(found_str));
 } else {
-found_str = qemu_rbd_next_tok(name_len, p,
+found_str = qemu_rbd_next_tok(RBD_MAX_IMAGE_NAME_SIZE, p,
   ':', "object name", &p, &local_err);
 if (local_err) {
 goto done;
 }
 qemu_rbd_unescape(found_str);
-g_strlcpy(name, found_str, name_len);
+qdict_put(options, "image", qstring_from_str(found_str));
 }
 if (!p) {
 goto done;
 }
 
-found_str = qemu_rbd_next_tok(conf_len, p,
+found_str = qemu_rbd_next_tok(RBD_MAX_CONF_NAME_SIZE, p,
   '\0', "configuration", &p, &local_err);
 if (local_err) {
 goto done;
 }
-g_strlcpy(conf, found_str, conf_len);
+
+p = found_str;
+
+/* The following are essentially all key/value pairs, and we treat
+ * 'id' and 'conf' a bit special.  Key/value pairs may be in any order. */
+while (p) {
+char *name, *value;
+name = qemu_rbd_next_tok(RBD_MAX_CONF_NAME_SIZE, p,
+ '=', "conf option name", &p, &local_err);
+if (local_err) {
+break;
+}
+
+if (!p) {
+error_setg(errp, "conf option %s has no value", name);
+break;
+}
+
+qemu_rbd_unescape(name);
+
+value = 

Re: [Qemu-block] [PATCH v2] vl: disable default cdrom when using explicitely scsi-hd

2017-02-27 Thread Hervé Poussineau

Ping?

Le 20/02/2017 à 21:41, Hervé Poussineau a écrit :

In commit af6bf1328ef90fae617857c02697e0174b84d596 (May 2011),
ide-hd, ide-cd and scsi-cd have been added to disable default cdrom,
"or else you can't put one on secondary master without -nodefaults".

Make it the same for scsi-hd, so you can put one on scsi-id 2 without
using -nodefaults.
scsi-hd has probably been forgotten, as it has been added in the
preceding commit (b443ae67130d32ad06b06fc9aa6d04d05ccd93ce).

Affected users are those who use a machine with SCSI devices and start QEMU
with -device scsi-hd but without -device scsi-cd or -cdrom.
In that case, the default cdrom device will disappear instead of being empty.

Signed-off-by: Hervé Poussineau 
---
 vl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/vl.c b/vl.c
index 27d9829..4af95b3 100644
--- a/vl.c
+++ b/vl.c
@@ -226,6 +226,7 @@ static struct {
 { .driver = "ide-hd",   .flag = &default_cdrom },
 { .driver = "ide-drive",.flag = &default_cdrom },
 { .driver = "scsi-cd",  .flag = &default_cdrom },
+{ .driver = "scsi-hd",  .flag = &default_cdrom },
 { .driver = "virtio-serial-pci",.flag = &default_virtcon   },
 { .driver = "virtio-serial",.flag = &default_virtcon   },
 { .driver = "VGA",  .flag = &default_vga   },
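
For context, the table touched by this patch maps a driver name to the "default device" flag that an explicit -device of that driver should clear. A self-contained sketch of that mechanism is below; note_device() and the trimmed-down table are assumptions for illustration, not the actual code in vl.c.

```c
#include <stddef.h>
#include <string.h>

/* 1 means "still create the default device". */
static int default_cdrom = 1;
static int default_vga = 1;

/* Trimmed-down stand-in for the driver-to-flag table. */
static const struct {
    const char *driver;
    int *flag;
} default_list[] = {
    { "ide-cd",  &default_cdrom },
    { "scsi-cd", &default_cdrom },
    { "scsi-hd", &default_cdrom },   /* the entry this patch adds */
    { "VGA",     &default_vga   },
};

/* Hypothetical hook: called once per -device argument. */
static void note_device(const char *driver)
{
    size_t n = sizeof(default_list) / sizeof(default_list[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(driver, default_list[i].driver) == 0) {
            *default_list[i].flag = 0;
        }
    }
}
```

In this model, seeing "scsi-hd" clears default_cdrom and leaves default_vga alone, which matches the behaviour change described in the commit message.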






Re: [Qemu-block] [Qemu-devel] [PATCH 1/4] block/rbd: don't copy strings in qemu_rbd_next_tok()

2017-02-27 Thread Jeff Cody
On Mon, Feb 27, 2017 at 05:39:45PM +0100, Markus Armbruster wrote:
> Jeff Cody  writes:
> 
> > This patch is prep work for parsing options for .bdrv_parse_filename,
> > and using QDict options.
> >
> > The function qemu_rbd_next_tok() searched for various key/value pairs,
> > and copied them into buffers.  This will soon be an unnecessary extra
> > step, so we will now return found strings by reference only, and
> > offload the responsibility for safely handling/copying these strings to
> > the caller.
> >
> > This also cleans up error handling some, as the callers now rely on
> > the Error object to determine if there is a parse error.
> >
> > Signed-off-by: Jeff Cody 
> > ---
> >  block/rbd.c | 99 
> > +++--
> >  1 file changed, 64 insertions(+), 35 deletions(-)
> >
> > diff --git a/block/rbd.c b/block/rbd.c
> > index 22e8e69..3f1a9de 100644
> > --- a/block/rbd.c
> > +++ b/block/rbd.c
> > @@ -102,10 +102,10 @@ typedef struct BDRVRBDState {
> >  char *snap;
> >  } BDRVRBDState;
> >  
> > -static int qemu_rbd_next_tok(char *dst, int dst_len,
> > - char *src, char delim,
> > - const char *name,
> > - char **p, Error **errp)
> > +static char *qemu_rbd_next_tok(int max_len,
> > +   char *src, char delim,
> > +   const char *name,
> > +   char **p, Error **errp)
> >  {
> >  int l;
> >  char *end;
> 
>*p = NULL;
> 
>if (delim != '\0') {
>for (end = src; *end; ++end) {
>if (*end == delim) {
>break;
>}
>if (*end == '\\' && end[1] != '\0') {
>end++;
>}
>}
>if (*end == delim) {
>*p = end + 1;
>*end = '\0';
> >  }
> >  }
> >  l = strlen(src);
> 
> Not this patch's problem: this is a rather thoughtless way to say
> 
>l = end - src;
> 
> > -if (l >= dst_len) {
> > +if (l >= max_len) {
> >  error_setg(errp, "%s too long", name);
> > -return -EINVAL;
> > +return NULL;
> >  } else if (l == 0) {
> >  error_setg(errp, "%s too short", name);
> > -return -EINVAL;
> > +return NULL;
> >  }
> >  
> > -pstrcpy(dst, dst_len, src);
> > -
> > -return 0;
> > +return src;
> >  }
> 
> Note for later:
> 
> 1. This function always dereferences @src.
> 2. If @delim, it sets *@p to point behind @src plus the delimiter,
>else to NULL
> 3. It returns NULL exactly when it sets an error.
> 4. It returns NULL and sets an error when @src is empty.
> 
> Not this patch's problem, but I have to say it: whoever wrote this code
> should either write simpler functions or get into the habit of writing
> function contract comments.  Because having to document your
> embarrassingly complicated shit is great motivation to simplify (pardon
> my french).
>

Heh.  I had to read and re-read this function multiple times to get a feel
for what it was doing.


> >  
> >  static void qemu_rbd_unescape(char *src)
> > @@ -162,7 +160,9 @@ static int qemu_rbd_parsename(const char *filename,
> 
> This is a parser.  As so often, it is a parser without any hint on what
> it's supposed to parse, let alone a grammar.  Experience tells that
> these are wrong more often than not, and exposing me to yet another one
> is a surefire way to make me grumpy.  Not your fault, of course.
> 
> >  {
> >  const char *start;
> >  char *p, *buf;
> > -int ret;
> > +int ret = 0;
> > +char *found_str;
> > +Error *local_err = NULL;
> >  
> >  if (!strstart(filename, "rbd:", &start)) {
> >  error_setg(errp, "File name must start with 'rbd:'");
>return -EINVAL;
>}
> 
>buf = g_strdup(start);
>p = buf;
> 
> This assignment to @p ...
> 
> >  *snap = '\0';
> >  *conf = '\0';
> >  
> > -ret = qemu_rbd_next_tok(pool, pool_len, p,
> > -'/', "pool name", &p, errp);
> > -if (ret < 0 || !p) {
> > +found_str = qemu_rbd_next_tok(pool_len, p,
> > + '/', "pool name", &p, &local_err);
> 
> ... is dead, because qemu_rbd_next_tok() assigns to it unconditionally.
> 

While that is true, @p is also used as the src argument to
qemu_rbd_next_tok() in addition (second arg).  We could just pass in @buf
for that argument, but using @p keeps it consistent with the other calls.



> The call extracts the part up to the first unescaped '/' or the end of
> the string.
> 
> > +if (local_err) {
> > +goto done;
> > +}
> > +if (!p) {
> 
> We extracted to end of string, i.e. we didn't find '/'.
> 
> >  ret = -EINVAL;
> > +error_setg(errp, "Pool name is required");
> >  goto done;
> >  }
> > 

Re: [Qemu-block] [PATCH 36/54] commit: Use real permissions in commit block job

2017-02-27 Thread Kevin Wolf
Am 24.02.2017 um 18:29 hat Max Reitz geschrieben:
> On 21.02.2017 15:58, Kevin Wolf wrote:
> > This is probably one of the most interesting conversions to the new
> > op blocker system because a commit block job intentionally leaves some
> > intermediate block nodes in the backing chain that aren't valid on their
> > own any more; only the whole chain together results in a valid view.
> > 
> > In order to provide the 'consistent read' permission to the parents of
> > the 'top' node of the commit job, a new filter block driver is inserted
> > above 'top' which doesn't require 'consistent read' on its backing
> > chain. Subsequently, the commit job can block 'consistent read' on all
> > intermediate nodes without causing a conflict.
> > 
> > Signed-off-by: Kevin Wolf 

> > @@ -262,34 +305,62 @@ void commit_start(const char *job_id, 
> > BlockDriverState *bs,
> >  }
> >  }
> >  
> > > +/* Insert commit_top block node above top, so we can block consistent read
> > > + * on the backing chain below it */
> > > +commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
> 
> Why RDWR when the driver only allows reads anyway?

Good question. I'll try to change it, maybe it doesn't break everything.

> > + errp);
> > +if (commit_top_bs == NULL) {
> > +goto fail;
> > +}
> > +
> > +bdrv_set_backing_hd(commit_top_bs, top);
> > +bdrv_set_backing_hd(overlay_bs, commit_top_bs);
> > +
> > +s->commit_top_bs = commit_top_bs;
> > +bdrv_unref(commit_top_bs);
> >  
> >  /* Block all nodes between top and base, because they will
> >   * disappear from the chain after this operation. */
> >  assert(bdrv_chain_contains(top, base));
> > -for (iter = top; iter != backing_bs(base); iter = backing_bs(iter)) {
> > -/* FIXME Use real permissions */
> > > -block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
> > > -   BLK_PERM_ALL, &error_abort);
> > > +for (iter = top; iter != base; iter = backing_bs(iter)) {
> > > +/* XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
> > > + * at s->base.
> 
> As far as I can see, the loop doesn't even touch base, though...?

If bs isn't writable, bs->backing generally isn't writable either, and
we are touching a parent of base.

> > > The other options would be a second filter driver above
> > > + * s->base. */
> > > +ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
> 
> Don't we need CONSISTENT_READ at least for top?

top can't provide CONSISTENT_READ because its backing files can't
provide it. It's the job (one of the jobs) of commit_top_bs to shield
the parents of top from the loss of CONSISTENT_READ.

> > + BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
> > + errp);
> > +if (ret < 0) {
> > +goto fail;
> > +}
> >  }
> > +
> > > +ret = block_job_add_bdrv(&s->common, "base", base, 0, BLK_PERM_ALL, errp);
> > +if (ret < 0) {
> > +goto fail;
> > +}
> > +
> >  /* overlay_bs must be blocked because it needs to be modified to
> > - * update the backing image string, but if it's the root node then
> > - * don't block it again */
> > -if (bs != overlay_bs) {
> > -/* FIXME Use real permissions */
> > > -block_job_add_bdrv(&s->common, "overlay of top", overlay_bs, 0,
> > > -   BLK_PERM_ALL, &error_abort);
> > > + * update the backing image string. */
> > > +ret = block_job_add_bdrv(&s->common, "overlay of top", overlay_bs,
> > + BLK_PERM_GRAPH_MOD, BLK_PERM_ALL, errp);
> > +if (ret < 0) {
> > +goto fail;
> >  }
> >  
> > -/* FIXME Use real permissions */
> > -s->base = blk_new(0, BLK_PERM_ALL);
> > +s->base = blk_new(BLK_PERM_CONSISTENT_READ
> 
> Do we actually need CONSISTENT_READ for the base?

If base doesn't provide CONSISTENT_READ, commit_top_bs wouldn't be able
to provide it either.

If we ever find a case where this is too restrictive because the parents
of commit_top_bs don't need CONSISTENT_READ, we can probably be less
strict in this case, but just getting commit to work is already tricky
enough that I wouldn't like to do it in this patch (or even series).

Kevin




Re: [Qemu-block] [PATCH v3 5/6] replication: Implement block replication for shared disk case

2017-02-27 Thread Stefan Hajnoczi
On Fri, Jan 20, 2017 at 11:47:59AM +0800, zhanghailiang wrote:
> Just as the scenario of non-shared disk block replication,
> we are going to implement block replication from many basic
> blocks that are already in QEMU.
> The architecture is:
> 
> [ASCII architecture diagram garbled in the archive. Labels that survive:
> on the Primary side, virtio-blk, replication(1), NBD client,
> drive-backup sync=none and the shared disk, with path (3) running from
> virtio-blk down to the shared disk; on the Secondary side, virtio-blk,
> replication(5), NBD server, hidden disk (2) and active disk(4). The NBD
> client feeds the NBD server, which writes to the hidden disk; the hidden
> disk sits below the active disk and has the shared disk as its backing
> file.]
> 
> 1) Primary writes will read original data and forward it to Secondary
>    QEMU.
> 2) The hidden-disk is created automatically. It buffers the original
>    content that is modified by the primary VM. It should also be an
>    empty disk, and the driver supports bdrv_make_empty() and backing
>    file.
> 3) Primary write requests will be written to Shared disk.
> 4) Secondary write requests will be buffered in the active disk and it
>    will overwrite the existing sector content in the buffer.
> 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Wen Congyang 
> Signed-off-by: Zhang Chen 

Are there any restrictions on the shared disk?  For example the -drive
cache= mode must be 'none'.  If the cache mode isn't 'none' the
secondary host might have old data in the host page cache.  The
Secondary QEMU would have an inconsistent view of the shared disk.

Are image file formats like qcow2 supported for the shared disk?  Extra
steps are required to achieve consistency, see bdrv_invalidate_cache().

Stefan




Re: [Qemu-block] [PATCH 1/3] block: implement bdrv_snapshot_goto for blkreplay

2017-02-27 Thread Kevin Wolf
Am 27.02.2017 um 13:07 hat Pavel Dovgalyuk geschrieben:
> > From: Kevin Wolf [mailto:kw...@redhat.com]
> > Am 31.01.2017 um 12:57 hat Pavel Dovgalyuk geschrieben:
> > > This patch enables making snapshots with blkreplay used in
> > > block devices.
> > >
> > > Signed-off-by: Pavel Dovgalyuk 
> > 
> > Specifically, I think it avoids the blkreplay_open/close sequence. Is
> > this what is needed to make it work?
> 
> Then I'll need to implement bdrv_open, because there is only bdrv_file_open
> for blkreplay now.
> 
> Which way is better?

I was just checking whether I understood the reason for this correctly.

If I did, then I think your solution is fine and we should just make the
commit message a bit more explicit.

Kevin

> > We should probably mention in the commit message the exact reason why
> > implementing .bdrv_snapshot_goto, but not the other snapshot related
> > callbacks, fixes things. If you confirm my assumption, I can add that
> > while applying.
> 
> Pavel Dovgalyuk
> 



[Qemu-block] [PATCH] block/mirror: fix broken sparseness detection

2017-02-27 Thread John Snow
int64_t is in all likelihood the actual scalar type we want.
Yep, really.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1219541

Signed-off-by: John Snow 
---
 block/mirror.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index ca8547b..7b4cfbd 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -386,7 +386,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 nb_chunks * sectors_per_chunk);
 bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
 while (nb_chunks > 0 && sector_num < end) {
-int ret;
+int64_t ret;
 int io_sectors, io_sectors_acct;
 BlockDriverState *file;
 enum MirrorMethod {
-- 
2.9.3
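
The commit message above is terse, so here is a minimal sketch of the kind of narrowing problem a change from `int` to `int64_t` typically addresses. It assumes the status call returns a 64-bit value that combines an offset in the upper bits with flags in the low bits; `example_block_status()` and `EXAMPLE_STATUS_ZERO` are hypothetical stand-ins, not QEMU's API, and the exact failure in block/mirror.c may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit, for illustration only. */
#define EXAMPLE_STATUS_ZERO 0x2

/*
 * Hypothetical stand-in for a block-status query: a sector-aligned byte
 * offset in the upper bits, flags in the low bits. A negative return
 * value would mean an error.
 */
static int64_t example_block_status(int64_t offset)
{
    assert(offset >= 0);
    return (offset & ~INT64_C(0x1ff)) | EXAMPLE_STATUS_ZERO;
}

/* Post-patch pattern: keep the full 64-bit value before testing it. */
static int status_is_error_wide(int64_t offset)
{
    int64_t ret = example_block_status(offset);
    return ret < 0;
}

/*
 * Pre-patch pattern: the value is narrowed to int first. The result of
 * an out-of-range conversion is implementation-defined; common
 * two's-complement targets keep the low 32 bits, so bit 31 of the
 * offset ends up as the sign bit and the status looks like an error.
 */
static int status_is_error_narrow(int64_t offset)
{
    int ret = (int)example_block_status(offset);
    return ret < 0;
}
```

With an offset of 2 GiB (0x80000000), the wide variant reports success while the narrow variant reports a spurious error on typical platforms, which would be enough to send a caller down its fallback path.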




Re: [Qemu-block] [PATCH v3 2/6] replication: add shared-disk and shared-disk-id options

2017-02-27 Thread Stefan Hajnoczi
On Fri, Jan 20, 2017 at 11:47:56AM +0800, zhanghailiang wrote:
> @@ -119,12 +136,31 @@ static int replication_open(BlockDriverState *bs, QDict *options,
> "The option mode's value should be primary or secondary");
>  goto fail;
>  }
> +s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
> +  false);
> +if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
> +shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
> +if (!shared_disk_id) {
> +error_setg(&local_err, "Missing shared disk blk option");
> +goto fail;
> +}
> +s->shared_disk_id = g_strdup(shared_disk_id);
> +blk = blk_by_name(s->shared_disk_id);
> +if (!blk) {
> +error_setg(&local_err, "There is no %s block", s->shared_disk_id);
> +goto fail;
> +}
> +/* We can't access root member of BlockBackend directly */
> +tmp_bs = blk_bs(blk);
> +s->primary_disk = QLIST_FIRST(&tmp_bs->parents);

Why is this necessary?

We already have the BlockBackend for the primary disk.  I'm not sure why
the BdrvChild is needed.

Stefan




Re: [Qemu-block] [Qemu-devel] [PATCH 1/4] block/rbd: don't copy strings in qemu_rbd_next_tok()

2017-02-27 Thread Markus Armbruster
Jeff Cody  writes:

> This patch is prep work for parsing options for .bdrv_parse_filename,
> and using QDict options.
>
> The function qemu_rbd_next_tok() searched for various key/value pairs,
> and copied them into buffers.  This will soon be an unnecessary extra
> step, so we will now return found strings by reference only, and
> offload the responsibility for safely handling/coping these strings to
> the caller.
>
> This also cleans up error handling some, as the callers now rely on
> the Error object to determine if there is a parse error.
>
> Signed-off-by: Jeff Cody 
> ---
>  block/rbd.c | 99 
> +++--
>  1 file changed, 64 insertions(+), 35 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 22e8e69..3f1a9de 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -102,10 +102,10 @@ typedef struct BDRVRBDState {
>  char *snap;
>  } BDRVRBDState;
>  
> -static int qemu_rbd_next_tok(char *dst, int dst_len,
> - char *src, char delim,
> - const char *name,
> - char **p, Error **errp)
> +static char *qemu_rbd_next_tok(int max_len,
> +   char *src, char delim,
> +   const char *name,
> +   char **p, Error **errp)
>  {
>  int l;
>  char *end;

   *p = NULL;

   if (delim != '\0') {
   for (end = src; *end; ++end) {
   if (*end == delim) {
   break;
   }
   if (*end == '\\' && end[1] != '\0') {
   end++;
   }
   }
   if (*end == delim) {
   *p = end + 1;
   *end = '\0';
>  }
>  }
>  l = strlen(src);

Not this patch's problem: this is a rather thoughtless way to say

   l = end - src;

> -if (l >= dst_len) {
> +if (l >= max_len) {
>  error_setg(errp, "%s too long", name);
> -return -EINVAL;
> +return NULL;
>  } else if (l == 0) {
>  error_setg(errp, "%s too short", name);
> -return -EINVAL;
> +return NULL;
>  }
>  
> -pstrcpy(dst, dst_len, src);
> -
> -return 0;
> +return src;
>  }

Note for later:

1. This function always dereferences @src.
2. If @delim, it sets *@p to point behind @src plus the delimiter,
   else to NULL
3. It returns NULL exactly when it sets an error.
4. It returns NULL and sets an error when @src is empty.
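
The four points above can be restated as a contract comment on a standalone tokenizer. This is only a sketch under stated assumptions: `next_tok()` is a hypothetical name, errors are reported through a plain `const char **` instead of QEMU's `Error` object, and the length is computed as `end - src` rather than with `strlen()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of a tokenizer with an explicit contract (not QEMU's code).
 *
 * - @src must be a non-NULL, writable, NUL-terminated string; it is
 *   always dereferenced.
 * - If @delim is not '\0' and an unescaped @delim occurs in @src, it is
 *   overwritten with '\0' and *@next points just past it. In every
 *   other case *@next is set to NULL.
 * - Returns @src on success and sets *@err to NULL.
 * - Returns NULL exactly when it sets *@err: the token is empty, or its
 *   length is >= @max_len.
 */
static char *next_tok(size_t max_len, char *src, char delim,
                      char **next, const char **err)
{
    char *end;
    size_t len;

    assert(src != NULL && next != NULL && err != NULL);

    *next = NULL;
    *err = NULL;

    if (delim != '\0') {
        for (end = src; *end != '\0'; ++end) {
            if (*end == delim) {
                break;
            }
            /* A backslash escapes the character that follows it. */
            if (*end == '\\' && end[1] != '\0') {
                ++end;
            }
        }
        if (*end == delim) {
            *next = end + 1;
            *end = '\0';
        }
    } else {
        /* No delimiter requested: the token runs to the end of @src. */
        end = src + strlen(src);
    }

    len = (size_t)(end - src);
    if (len >= max_len) {
        *err = "token too long";
        return NULL;
    }
    if (len == 0) {
        *err = "token too short";
        return NULL;
    }
    return src;
}
```

Calling it on a writable copy of "pool/image@snap" with '/' as the delimiter yields "pool" and leaves the cursor at "image@snap"; a second call with '\0' as the delimiter returns the remainder and sets the cursor to NULL.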

Not this patch's problem, but I have to say it: whoever wrote this code
should either write simpler functions or get into the habit of writing
function contract comments.  Because having to document your
embarrassingly complicated shit is great motivation to simplify (pardon
my french).

>  
>  static void qemu_rbd_unescape(char *src)
> @@ -162,7 +160,9 @@ static int qemu_rbd_parsename(const char *filename,

This is a parser.  As so often, it is a parser without any hint on what
it's supposed to parse, let alone a grammar.  Experience tells that
these are wrong more often than not, and exposing me to yet another one
is a surefire way to make me grumpy.  Not your fault, of course.

>  {
>  const char *start;
>  char *p, *buf;
> -int ret;
> +int ret = 0;
> +char *found_str;
> +Error *local_err = NULL;
>  
>  if (!strstart(filename, "rbd:", &start)) {
>  error_setg(errp, "File name must start with 'rbd:'");
   return -EINVAL;
   }

   buf = g_strdup(start);
   p = buf;

This assignment to @p ...

>  *snap = '\0';
>  *conf = '\0';
>  
> -ret = qemu_rbd_next_tok(pool, pool_len, p,
> -'/', "pool name", &p, errp);
> -if (ret < 0 || !p) {
> +found_str = qemu_rbd_next_tok(pool_len, p,
> + '/', "pool name", &p, &local_err);

... is dead, because qemu_rbd_next_tok() assigns to it unconditionally.

The call extracts the part up to the first unescaped '/' or the end of
the string.

> +if (local_err) {
> +goto done;
> +}
> +if (!p) {

We extracted to end of string, i.e. we didn't find '/'.

>  ret = -EINVAL;
> +error_setg(errp, "Pool name is required");
>  goto done;
>  }
> -qemu_rbd_unescape(pool);
> +qemu_rbd_unescape(found_str);
> +g_strlcpy(pool, found_str, pool_len);

Before, we copy, then unescape the copy.

After, we unescape in place, then copy.

Unescaping can't lengthen the string.  Therefore, this is safe as long
as nothing else uses this part of @buf.
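
To illustrate why in-place unescaping cannot overrun the buffer, here is a minimal sketch of a backslash unescaper. `unescape_in_place()` is a hypothetical helper written for this note and is not claimed to match qemu_rbd_unescape() line for line; the point is only that the write cursor never gets ahead of the read cursor.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Remove backslash escapes from @s in place and return the new length.
 * Each "\x" pair becomes "x"; a lone trailing backslash is kept as is.
 * The output pointer never passes the input pointer, so the result is
 * never longer than the original string.
 */
static size_t unescape_in_place(char *s)
{
    char *in = s;
    char *out = s;

    assert(s != NULL);

    while (*in != '\0') {
        if (*in == '\\' && in[1] != '\0') {
            ++in;           /* drop the backslash, keep what follows */
        }
        *out++ = *in++;
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

Because the string can only shrink, copying it afterwards into a destination that was sized for the escaped form stays within bounds.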

>  
>  if (strchr(p, '@')) {
> -ret = qemu_rbd_next_tok(name, name_len, p,
> -'@', "object name", &p, errp);
> -if (ret < 0) {
> +found_str = qemu_rbd_next_tok(name_len, p,
> + '@', "object name", &p, &local_err);

Extracts from first unescaped '/' to next unescaped '@' or 

Re: [Qemu-block] [PATCH] blk: Add discard=sparse mode

2017-02-27 Thread Max Reitz
On 27.02.2017 17:33, Samuel Thibault wrote:
> Hello,
> 
> Max Reitz, on lun. 27 févr. 2017 17:12:47 +0100, wrote:
>>>  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>>> -if (s->has_discard && s->has_fallocate) {
>>> +if (s->has_discard && (s->has_fallocate || open_flags & 
>>> BDRV_O_SPARSE)) {
>>
>> s->has_fallocate has a meaning. I wouldn't try to call do_fallocate() if
>> s->has_fallocate is false.
> 
> Ah, sorry, I didn't realize that that test wasn't only to check that
> we'll be able to call fallocate(0) further down.
> 
>> Therefore, I consider this to effectively be a no-op.
> 
> Yes.
> 
>>> @@ -1098,7 +1102,8 @@ static ssize_t 
>>> handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>>>  #endif
>>>  
>>>  #ifdef CONFIG_FALLOCATE
>>> -if (s->has_fallocate && aiocb->aio_offset >= 
>>> bdrv_getlength(aiocb->bs)) {
>>> +if (s->has_fallocate && !(open_flags & BDRV_O_SPARSE)
>>> +&& aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
>>
>> First, this part is only invoked if everything before it has failed.
> 
> I misread the code indeed.
> 
>> Unless I'm mistaken, unmap/trim requests from the guest should result in
>> a discard request in the block layer. This should always trigger
>> handle_aiocb_discard() here and that should do what you want it to.
> 
> Mmm, indeed.  I guess I got lost in the hairy block code.

I can understand that very well. ;-)

>> Could you maybe give me the configuration that results in the issue
>> you're describing in the commit message?
> 
> Actually I can't reproduce the issue any more.  I'm now wondering how I
> ended up there.
> 
> Anyway, I'm really sorry for the noise, and thanks for the good work :)

Not a problem at all. In case you happen to encounter the issue again,
just send a report to the qemu-block list.

Max





Re: [Qemu-block] [PATCH] blk: Add discard=sparse mode

2017-02-27 Thread Samuel Thibault
Hello,

Max Reitz, on lun. 27 févr. 2017 17:12:47 +0100, wrote:
> >  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> > -if (s->has_discard && s->has_fallocate) {
> > +if (s->has_discard && (s->has_fallocate || open_flags & 
> > BDRV_O_SPARSE)) {
> 
> s->has_fallocate has a meaning. I wouldn't try to call do_fallocate() if
> s->has_fallocate is false.

Ah, sorry, I didn't realize that that test wasn't only to check that
we'll be able to call fallocate(0) further down.

> Therefore, I consider this to effectively be a no-op.

Yes.

> > @@ -1098,7 +1102,8 @@ static ssize_t 
> > handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
> >  #endif
> >  
> >  #ifdef CONFIG_FALLOCATE
> > -if (s->has_fallocate && aiocb->aio_offset >= 
> > bdrv_getlength(aiocb->bs)) {
> > +if (s->has_fallocate && !(open_flags & BDRV_O_SPARSE)
> > +&& aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
> 
> First, this part is only invoked if everything before it has failed.

I misread the code indeed.

> Unless I'm mistaken, unmap/trim requests from the guest should result in
> a discard request in the block layer. This should always trigger
> handle_aiocb_discard() here and that should do what you want it to.

Mmm, indeed.  I guess I got lost in the hairy block code.

> Could you maybe give me the configuration that results in the issue
> you're describing in the commit message?

Actually I can't reproduce the issue any more.  I'm now wondering how I
ended up there.

Anyway, I'm really sorry for the noise, and thanks for the good work :)

Samuel



Re: [Qemu-block] [PATCH] blk: Add discard=sparse mode

2017-02-27 Thread Max Reitz
Hi,

On 27.02.2017 01:45, Samuel Thibault wrote:
> By default, on discard requests, the posix block backend punches holes but
> re-fallocates them to keep the allocated size intact. In some situations
> it is however convenient, when using sparse disk images, to see disk image
> sizes shrink on discard requests.
> 
> This commit adds a discard=sparse mode which does this, by disabling the
> fallocate call.
> 
> Signed-off-by: Samuel Thibault 
> 
> diff --git a/block.c b/block.c
> index b663204f3f..e9cd83210a 100644
> --- a/block.c
> +++ b/block.c
> @@ -665,12 +665,14 @@ static void bdrv_join_options(BlockDriverState *bs, 
> QDict *options,
>   */
>  int bdrv_parse_discard_flags(const char *mode, int *flags)
>  {
> -*flags &= ~BDRV_O_UNMAP;
> +*flags &= ~(BDRV_O_UNMAP | BDRV_O_SPARSE);
>  
>  if (!strcmp(mode, "off") || !strcmp(mode, "ignore")) {
>  /* do nothing */
>  } else if (!strcmp(mode, "on") || !strcmp(mode, "unmap")) {
>  *flags |= BDRV_O_UNMAP;
> +} else if (!strcmp(mode, "sparse")) {
> +*flags |= BDRV_O_UNMAP | BDRV_O_SPARSE;
>  } else {
>  return -1;
>  }
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 4de1abd023..f9efadc5be 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1057,6 +1057,10 @@ static ssize_t 
> handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>  BDRVRawState *s = aiocb->bs->opaque;
>  #endif
>  
> +#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || defined(CONFIG_FALLOCATE)
> +int open_flags = aiocb->bs->open_flags;
> +#endif
> +
>  if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
>  return handle_aiocb_write_zeroes_block(aiocb);
>  }
> @@ -1079,7 +1083,7 @@ static ssize_t 
> handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>  #endif
>  
>  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> -if (s->has_discard && s->has_fallocate) {
> +if (s->has_discard && (s->has_fallocate || open_flags & BDRV_O_SPARSE)) {

s->has_fallocate has a meaning. I wouldn't try to call do_fallocate() if
s->has_fallocate is false. Therefore, I consider this to effectively be
a no-op.

>  int ret = do_fallocate(s->fd,
> FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> aiocb->aio_offset, aiocb->aio_nbytes);
> @@ -1098,7 +1102,8 @@ static ssize_t 
> handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>  #endif
>  
>  #ifdef CONFIG_FALLOCATE
> -if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
> +if (s->has_fallocate && !(open_flags & BDRV_O_SPARSE)
> +&& aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {

First, this part is only invoked if everything before it has failed.
Second, this condition will only be true if we try to write zeroes
beyond the end of the file. This is a pretty unusual request that
normally doesn't happen (it may happen during qcow2 image creation with
preallocation or something like that).

This part here is just a final fallback for growing files on systems
which do not have any advanced fallocate() modes. It seems very unlikely
to me that anyone would hit this in normal operation.

So all in all I don't know how this patch changes anything.


Also, this function is for writing zeroes, not for handling discards.
That is done in handle_aiocb_discard().

And as far as I can tell, handle_aiocb_discard() does exactly what you
want it to do. It's invoked by raw_aio_pdiscard() and also by
raw_co_pwrite_zeroes() if BDRV_REQ_MAY_UNMAP is set. If that flag is not
set, raw_co_pwrite_zeroes() will use the above zero-writing function.

Unless I'm mistaken, unmap/trim requests from the guest should result in
a discard request in the block layer. This should always trigger
handle_aiocb_discard() here and that should do what you want it to.

I don't know exactly what you are doing so maybe for some reason the
request doesn't arrive as a discard but as a write-zeroes. As I said,
even if that is so, I don't see how this patch then changes the behavior
when compared to discard=unmap.

What might make sense is to make BDRV_O_SPARSE always set
BDRV_REQ_MAY_UNMAP for any zero-write request. But I don't know why that
would be necessary. With virtio-scsi, discard requests from the guest
should result in discard requests in the block layer anyway. And
detect-zeroes=unmap does set BDRV_REQ_MAY_UNMAP for the write-zeroes
requests it generates.


Could you maybe give me the configuration that results in the issue
you're describing in the commit message?

Max

>  int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, 
> aiocb->aio_nbytes);
>  if (ret == 0 || ret != -ENOTSUP) {
>  return ret;
> diff --git a/include/block/block.h b/include/block/block.h
> index bde5ebda18..103313bee0 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -97,6 +97,8 @@ typedef struct HDGeometry {
>select an 

Re: [Qemu-block] [PATCH v16 13/22] qcow2: add persistent dirty bitmaps support

2017-02-27 Thread Max Reitz
On 25.02.2017 18:07, Vladimir Sementsov-Ogievskiy wrote:
> Store persistent dirty bitmaps in qcow2 image.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block.c  |   6 +-
>  block/qcow2-bitmap.c | 473 
> +++
>  block/qcow2.c|   9 +
>  block/qcow2.h|   1 +
>  4 files changed, 486 insertions(+), 3 deletions(-)
> 
> diff --git a/block.c b/block.c
> index a0346c80c6..16cf522219 100644
> --- a/block.c
> +++ b/block.c
> @@ -2322,9 +2322,6 @@ static void bdrv_close(BlockDriverState *bs)
>  bdrv_flush(bs);
>  bdrv_drain(bs); /* in case flush left pending I/O */
>  
> -bdrv_release_named_dirty_bitmaps(bs);
> -assert(QLIST_EMPTY(&bs->dirty_bitmaps));
> -
>  if (bs->drv) {
>  BdrvChild *child, *next;
>  
> @@ -2363,6 +2360,9 @@ static void bdrv_close(BlockDriverState *bs)
>  bs->full_open_options = NULL;
>  }
>  
> +bdrv_release_named_dirty_bitmaps(bs);
> +assert(QLIST_EMPTY(&bs->dirty_bitmaps));
> +
>  QLIST_FOREACH_SAFE(ban, &bs->aio_notifiers, list, ban_next) {
>  g_free(ban);
>  }

Might deserve an own patch, but I don't mind.

> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index ba72b7d2ac..e377215d5c 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c

[...]

> @@ -127,6 +145,70 @@ static int check_table_entry(uint64_t entry, int 
> cluster_size)
>  return 0;
>  }
>  
> +static int check_constraints_on_bitmap(BlockDriverState *bs,
> +   const char *name,
> +   uint32_t granularity,
> +   Error **errp)
> +{
> +BDRVQcow2State *s = bs->opaque;
> +int granularity_bits = ctz32(granularity);
> +int64_t len = bdrv_getlength(bs);
> +
> +assert(granularity > 0);
> +assert((granularity & (granularity - 1)) == 0);
> +
> +if (len < 0) {
> +error_setg_errno(errp, -len, "Failed to get size of '%s'",
> + bdrv_get_device_or_node_name(bs));
> +return len;
> +}
> +
> +if (granularity_bits > BME_MAX_GRANULARITY_BITS) {
> +error_setg(errp, "Granularity exceeds maximum (%u bytes)",
> +   1 << BME_MAX_GRANULARITY_BITS);

This will overflow because 1 << 31 is not representable in int (and 1 is
an int). The %u saves it by converting it back, but it's still
implementation-defined behavior at most.

I'd prefer a plain 1ull and %llu. That way, this would be safe no matter
what value BME_MAX_GRANULARITY_BITS is.
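
A minimal sketch of the suggestion, assuming a limit of 31 bits as implied by the remark about 1 << 31. `EXAMPLE_MAX_GRANULARITY_BITS`, `granularity_bytes()` and `format_limit()` are hypothetical names for illustration, and snprintf() stands in for error_setg():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical limit, chosen to match the 1 << 31 case discussed here. */
#define EXAMPLE_MAX_GRANULARITY_BITS 31

/*
 * Shift an unsigned long long constant, so the result is well defined
 * for any bit count below 64 instead of overflowing a signed int.
 */
static unsigned long long granularity_bytes(unsigned bits)
{
    assert(bits < 64);
    return 1ULL << bits;
}

/* Format the limit with %llu, the conversion matching the operand type. */
static int format_limit(char *buf, size_t len, unsigned bits)
{
    return snprintf(buf, len, "Granularity exceeds maximum (%llu bytes)",
                    granularity_bytes(bits));
}
```

The same two functions work unchanged if the limit is later raised above 31, which is the robustness the review comment asks for.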

> +return -EINVAL;
> +}
> +if (granularity_bits < BME_MIN_GRANULARITY_BITS) {
> +error_setg(errp, "Granularity is under minimum (%u bytes)",
> +   1 << BME_MIN_GRANULARITY_BITS);

The same applies here, although this does not have the overflow issue.

Rest looks good (to me O:-)).

Max

> +return -EINVAL;
> +}
> +
> +if ((len > (uint64_t)BME_MAX_PHYS_SIZE << granularity_bits) ||
> +(len > (uint64_t)BME_MAX_TABLE_SIZE * s->cluster_size <<
> +   granularity_bits))
> +{
> +error_setg(errp, "Too much space will be occupied by the bitmap. "
> +   "Use larger granularity");
> +return -EINVAL;
> +}
> +
> +if (strlen(name) > BME_MAX_NAME_SIZE) {
> +error_setg(errp, "Name length exceeds maximum (%u characters)",
> +   BME_MAX_NAME_SIZE);
> +return -EINVAL;
> +}
> +
> +return 0;
> +}





Re: [Qemu-block] [PATCH v16 15/22] qcow2: add .bdrv_can_store_new_dirty_bitmap

2017-02-27 Thread Max Reitz
On 25.02.2017 18:07, Vladimir Sementsov-Ogievskiy wrote:
> Realize .bdrv_can_store_new_dirty_bitmap interface.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: John Snow 
> ---
>  block/qcow2-bitmap.c | 51 +++
>  block/qcow2.c|  2 ++
>  block/qcow2.h|  4 
>  3 files changed, 57 insertions(+)
> 
> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index e377215d5c..8c0c24c208 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c
> @@ -1280,3 +1280,54 @@ fail:
>  
>  bitmap_list_free(bm_list);
>  }
> +
> +bool qcow2_can_store_new_dirty_bitmap(BlockDriverState *bs,
> +  const char *name,
> +  uint32_t granularity,
> +  Error **errp)
> +{
> +BDRVQcow2State *s = bs->opaque;
> +bool found;
> +Qcow2BitmapList *bm_list;
> +
> +if (check_constraints_on_bitmap(bs, name, granularity, errp) != 0) {
> +goto fail;
> +}
> +
> +if (s->nb_bitmaps == 0) {
> +return true;
> +}
> +
> +if (s->nb_bitmaps >= QCOW2_MAX_BITMAPS) {
> +error_setg(errp,
> +   "Maximum number of persistent bitmaps is already reached");
> +goto fail;
> +}
> +
> +if (s->bitmap_directory_size + calc_dir_entry_size(strlen(name), 0) >
> +QCOW2_MAX_BITMAP_DIRECTORY_SIZE)
> +{
> +error_setg(errp, "No enough space in the bitmap directory");

*Not

With that fixed:

Reviewed-by: Max Reitz 

> +goto fail;
> +}
> +
> +bm_list = bitmap_list_load(bs, s->bitmap_directory_offset,
> +   s->bitmap_directory_size, errp);
> +if (bm_list == NULL) {
> +goto fail;
> +}
> +
> +found = find_bitmap_by_name(bm_list, name);
> +bitmap_list_free(bm_list);
> +if (found) {
> +error_setg(errp, "Bitmap with the same name is already stored");
> +goto fail;
> +}
> +
> +return true;
> +
> +fail:
> +error_prepend(errp, "Can't make bitmap '%s' persistent in '%s': ",
> +  name, bdrv_get_device_or_node_name(bs));
> +return false;
> +}
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 201e7186f0..8f7937dc50 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -3555,6 +3555,8 @@ BlockDriver bdrv_qcow2 = {
>  
>  .bdrv_detach_aio_context  = qcow2_detach_aio_context,
>  .bdrv_attach_aio_context  = qcow2_attach_aio_context,
> +
> +.bdrv_can_store_new_dirty_bitmap = qcow2_can_store_new_dirty_bitmap,
>  };
>  
>  static void bdrv_qcow2_init(void)
> diff --git a/block/qcow2.h b/block/qcow2.h
> index e2ef5698cd..c291858425 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -622,5 +622,9 @@ int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>int64_t *refcount_table_size);
>  void qcow2_load_autoloading_dirty_bitmaps(BlockDriverState *bs, Error **errp);
>  void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp);
> +bool qcow2_can_store_new_dirty_bitmap(BlockDriverState *bs,
> +  const char *name,
> +  uint32_t granularity,
> +  Error **errp);
>  
>  #endif
> 






Re: [Qemu-block] [PATCH 4/4] block/rbd: Add blockdev-add support

2017-02-27 Thread Jeff Cody
On Mon, Feb 27, 2017 at 01:45:47PM +, Daniel P. Berrange wrote:
> On Mon, Feb 27, 2017 at 02:30:41AM -0500, Jeff Cody wrote:
> > Signed-off-by: Jeff Cody 
> > ---
> >  qapi/block-core.json | 47 ---
> >  1 file changed, 44 insertions(+), 3 deletions(-)
> > 
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 5f82d35..08a1419 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -2111,6 +2111,7 @@
> >  # @replication: Since 2.8
> >  # @ssh: Since 2.8
> >  # @iscsi: Since 2.9
> > +# @rbd: Since 2.9
> >  #
> >  # Since: 2.0
> >  ##
> > @@ -2120,7 +2121,7 @@
> >  'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
> >  'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed',
> >  'quorum', 'raw', 'replication', 'ssh', 'vdi', 'vhdx', 'vmdk',
> > -'vpc', 'vvfat' ] }
> > +'vpc', 'vvfat', 'rbd' ] }
> >  
> >  ##
> >  # @BlockdevOptionsFile:
> > @@ -2376,7 +2377,6 @@
> >  'path': 'str',
> >  '*user': 'str' } }
> >  
> > -
> >  ##
> >  # @BlkdebugEvent:
> >  #
> > @@ -2666,6 +2666,47 @@
> >  '*timeout': 'int' } }
> >  
> >  ##
> > +# @BlockdevOptionsRbd:
> > +#
> > +# @pool:   Ceph pool name
> > +#
> > +# @image:  Image name in the Ceph pool
> > +#
> > +# @conf:   # optional path to Ceph configuration file.  Values
> > +#  in the configuration file will be overridden by
> > +#  options specified via QAPI.
> > +#
> > +# @snapshot:   #optional Ceph snapshot name
> > +#
> > +# @rbd-id: #optional Ceph id name
> 
> BTW, I think I'd suggest 'user' or 'username' for this, since that is the more
> common terminology we seem to use for other block drivers
>

OK, I will go with 'user' instead of 'rbd-id'.

I think that fits with the usage terminology in rados_create()
documentation as well:

int rados_create(rados_t * cluster, const char *const id)

[...]

Parameters
   * cluster: where to store the handle
   * id: the user to connect as (i.e. admin, not client.admin)

-Jeff



Re: [Qemu-block] [PATCH 18/54] block: Default .bdrv_child_perm() for format drivers

2017-02-27 Thread Max Reitz
On 27.02.2017 15:05, Kevin Wolf wrote:
> On 27.02.2017 at 13:34, Max Reitz wrote:
>> On 27.02.2017 13:33, Kevin Wolf wrote:
>>> On 25.02.2017 at 12:57, Max Reitz wrote:
>>>> On 21.02.2017 15:58, Kevin Wolf wrote:
>>>>> Almost all format drivers have the same characteristics as far as
>>>>> permissions are concerned: They have one or more children for storing
>>>>> their own data and, more importantly, metadata (can be written to and
>>>>> grow even without external write requests, must be protected against
>>>>> other writers and present consistent data) and optionally a backing file
>>>>> (this is just data, so like for a filter, it only depends on what the
>>>>> parent nodes need).
>>>>>
>>>>> This provides a default implementation that can be shared by most of
>>>>> our format drivers.
>>>>>
>>>>> Signed-off-by: Kevin Wolf 
>>>>> ---
>>>>>  block.c   | 42 ++
>>>>>  include/block/block_int.h |  8 
>>>>>  2 files changed, 50 insertions(+)
>>>>>
>>>>> diff --git a/block.c b/block.c
>>>>> index 523cbd3..f2e7178 100644
>>>>> --- a/block.c
>>>>> +++ b/block.c
>>>>> @@ -1554,6 +1554,48 @@ void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
>>>>> (c->shared_perm & DEFAULT_PERM_UNCHANGED);
>>>>>  }
>>>>>  
>>>>> +void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>>>>> +   const BdrvChildRole *role,
>>>>> +   uint64_t perm, uint64_t shared,
>>>>> +   uint64_t *nperm, uint64_t *nshared)
>>>>> +{
>>>>> +bool backing = (role == &child_backing);
>>>>> +assert(role == &child_backing || role == &child_file);
>>>>> +
>>>>> +if (!backing) {
>>>>> +/* Apart from the modifications below, the same permissions are
>>>>> + * forwarded and left alone as for filters */
>>>>> +bdrv_filter_default_perms(bs, c, role, perm, shared, &perm, &shared);
>>>>> +
>>>>> +/* Format drivers may touch metadata even if the guest doesn't write */
>>>>> +if (!bdrv_is_read_only(bs)) {
>>>>> +perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
>>>>> +}
>>>>> +
>>>>> +/* bs->file always needs to be consistent because of the metadata. We
>>>>> + * can never allow other users to resize or write to it. */
>>>>> +perm |= BLK_PERM_CONSISTENT_READ;
>>>>> +shared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
>>>>> +} else {
>>>>> +/* We want consistent read from backing files if the parent needs it.
>>>>> + * No other operations are performed on backing files. */
>>>>> +perm &= BLK_PERM_CONSISTENT_READ;
>>>>> +
>>>>> +/* If the parent can deal with changing data, we're okay with a
>>>>> + * writable and resizable backing file. */
>>>>> +if (shared & BLK_PERM_WRITE) {
>>>>> +shared = BLK_PERM_WRITE | BLK_PERM_RESIZE;
>>>>
>>>> Wouldn't this break CONSISTENT_READ?
>>>
>>> WRITE (even for multiple users) and CONSISTENT_READ aren't mutually
>>> exclusive. I was afraid that I didn't define CONSISTENT_READ right, but
>>> it appears that the definition is fine:
>>>
>>>  * A user that has the "permission" of consistent reads is guaranteed that
>>>  * their view of the contents of the block device is complete and
>>>  * self-consistent, representing the contents of a disk at a specific
>>>  * point.
>>
>> Right, but writes to the backing file at least to me appear to be a
>> different matter. If those don't break CONSISTENT_READ, then I don't see
>> how commit breaks CONSISTENT_READ for the intermediate nodes.
> 
> There's probably multiple ways to interpret such actions. You could
> understand a commit job as writing the desired image to the base node
> and at the same time it's a shared writer for the intermediate nodes
> that happens to write garbage.

Agreed.

> The question is if this is a useful way
> of seeing it when the job is to prevent accidental data corruption.

Agreed. But then I would infer that any write to a backing file breaks
CONSISTENT_READ on the overlay.

> Note that we need writable backing files for commit, so taking away
> BLK_PERM_WRITE from shared wouldn't work. We could probably make it
> dependent on cleared CONSISTENT_READ (commit jobs don't require this
> anyway), if you think that the current version is too permissive.

I agree that we need to be able to share WRITE for a backing file. But I
think this should only be set if the overlay's parents do not require
CONSISTENT_READ from the overlay.
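
To make that concrete, here is a minimal sketch of the rule I have in mind for the backing branch. It is not the code from the patch: the SK_PERM_* bits and sk_backing_shared() are made-up stand-ins for illustration, and returning 0 in the fallback case is an assumption, since the else branch is not visible in the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in permission bits for this sketch; the real BLK_PERM_* values are
 * defined in QEMU's block layer headers and may differ. */
#define SK_PERM_CONSISTENT_READ  0x01u
#define SK_PERM_WRITE            0x02u
#define SK_PERM_RESIZE           0x04u
#define SK_PERM_ALL              0x07u

/* Hypothetical helper, not part of the patch: decide what a format node
 * shares on its backing child. 'perm' is what the overlay's parents require
 * from the overlay, 'shared' is what they allow other users to do. */
uint64_t sk_backing_shared(uint64_t perm, uint64_t shared)
{
    assert((perm & ~(uint64_t)SK_PERM_ALL) == 0);
    assert((shared & ~(uint64_t)SK_PERM_ALL) == 0);

    /* Share WRITE/RESIZE only if the parents tolerate writers AND nobody
     * above the overlay relies on consistent reads. */
    if ((shared & SK_PERM_WRITE) && !(perm & SK_PERM_CONSISTENT_READ)) {
        return SK_PERM_WRITE | SK_PERM_RESIZE;
    }

    /* Assumption: otherwise nothing extra is shared (the else branch is not
     * visible in the quoted hunk). */
    return 0;
}
```

With this rule a commit job, which does not require CONSISTENT_READ, still gets a writable backing file, while a guest device that does require it keeps the backing file protected from other writers.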

Max





Re: [Qemu-block] [PATCH 22/54] block: Request real permissions in bdrv_attach_child()

2017-02-27 Thread Kevin Wolf
On 22.02.2017 at 15:31, Max Reitz wrote:
> On 21.02.2017 15:58, Kevin Wolf wrote:
> > Now that all block drivers with children tell us what permissions they
> > need from each of their children, bdrv_attach_child() can use this
> > information and make the right requirements while trying to attach new
> > children.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  block.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block.c b/block.c
> > index 1c5f211..054e6f0 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -1659,10 +1659,14 @@ BdrvChild *bdrv_attach_child(BlockDriverState 
> > *parent_bs,
> >   Error **errp)
> >  {
> >  BdrvChild *child;
> > +uint64_t perm, shared_perm;
> > +
> > +assert(parent_bs->drv);
> > +parent_bs->drv->bdrv_child_perm(parent_bs, NULL, child_role,
> > +0, BLK_PERM_ALL, &perm, &shared_perm);
> 
> Another Second Thought™: Why do we request no permissions for the new
> child here? Seems weird to me. Shouldn't the caller specify the
> necessary permissions and what can be shared?

Actually not the caller, but we should calculate the cumulative
permissions of parent_bs, like in bdrv_update_perm().

Kevin




Re: [Qemu-block] [PATCH 18/54] block: Default .bdrv_child_perm() for format drivers

2017-02-27 Thread Kevin Wolf
On 27.02.2017 at 13:34, Max Reitz wrote:
> On 27.02.2017 13:33, Kevin Wolf wrote:
> > On 25.02.2017 at 12:57, Max Reitz wrote:
> >> On 21.02.2017 15:58, Kevin Wolf wrote:
> >>> Almost all format drivers have the same characteristics as far as
> >>> permissions are concerned: They have one or more children for storing
> >>> their own data and, more importantly, metadata (can be written to and
> >>> grow even without external write requests, must be protected against
> >>> other writers and present consistent data) and optionally a backing file
> >>> (this is just data, so like for a filter, it only depends on what the
> >>> parent nodes need).
> >>>
> >>> This provides a default implementation that can be shared by most of
> >>> our format drivers.
> >>>
> >>> Signed-off-by: Kevin Wolf 
> >>> ---
> >>>  block.c   | 42 ++
> >>>  include/block/block_int.h |  8 
> >>>  2 files changed, 50 insertions(+)
> >>>
> >>> diff --git a/block.c b/block.c
> >>> index 523cbd3..f2e7178 100644
> >>> --- a/block.c
> >>> +++ b/block.c
> >>> @@ -1554,6 +1554,48 @@ void bdrv_filter_default_perms(BlockDriverState 
> >>> *bs, BdrvChild *c,
> >>> (c->shared_perm & DEFAULT_PERM_UNCHANGED);
> >>>  }
> >>>  
> >>> +void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
> >>> +   const BdrvChildRole *role,
> >>> +   uint64_t perm, uint64_t shared,
> >>> +   uint64_t *nperm, uint64_t *nshared)
> >>> +{
> >>> +bool backing = (role == &child_backing);
> >>> +assert(role == &child_backing || role == &child_file);
> >>> +
> >>> +if (!backing) {
> >>> +/* Apart from the modifications below, the same permissions are
> >>> + * forwarded and left alone as for filters */
> >>> +bdrv_filter_default_perms(bs, c, role, perm, shared, &perm, &shared);
> >>> +
> >>> +/* Format drivers may touch metadata even if the guest doesn't 
> >>> write */
> >>> +if (!bdrv_is_read_only(bs)) {
> >>> +perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
> >>> +}
> >>> +
> >>> +/* bs->file always needs to be consistent because of the 
> >>> metadata. We
> >>> + * can never allow other users to resize or write to it. */
> >>> +perm |= BLK_PERM_CONSISTENT_READ;
> >>> +shared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
> >>> +} else {
> >>> +/* We want consistent read from backing files if the parent 
> >>> needs it.
> >>> + * No other operations are performed on backing files. */
> >>> +perm &= BLK_PERM_CONSISTENT_READ;
> >>> +
> >>> +/* If the parent can deal with changing data, we're okay with a
> >>> + * writable and resizable backing file. */
> >>> +if (shared & BLK_PERM_WRITE) {
> >>> +shared = BLK_PERM_WRITE | BLK_PERM_RESIZE;
> >>
> >> Wouldn't this break CONSISTENT_READ?
> > 
> > WRITE (even for multiple users) and CONSISTENT_READ aren't mutually
> > exclusive. I was afraid that I didn't define CONSISTENT_READ right, but
> > it appears that the definition is fine:
> > 
> >  * A user that has the "permission" of consistent reads is guaranteed that
> >  * their view of the contents of the block device is complete and
> >  * self-consistent, representing the contents of a disk at a specific
> >  * point.
> 
> Right, but writes to the backing file at least to me appear to be a
> different matter. If those don't break CONSISTENT_READ, then I don't see
> how commit breaks CONSISTENT_READ for the intermediate nodes.

There's probably multiple ways to interpret such actions. You could
understand a commit job as writing the desired image to the base node
and at the same time it's a shared writer for the intermediate nodes
that happens to write garbage. The question is if this is a useful way
of seeing it when the job is to prevent accidental data corruption.

Note that we need writable backing files for commit, so taking away
BLK_PERM_WRITE from shared wouldn't work. We could probably make it
dependent on cleared CONSISTENT_READ (commit jobs don't require this
anyway), if you think that the current version is too permissive.

Kevin




Re: [Qemu-block] [PATCH 4/4] block/rbd: Add blockdev-add support

2017-02-27 Thread Daniel P. Berrange
On Mon, Feb 27, 2017 at 02:30:41AM -0500, Jeff Cody wrote:
> Signed-off-by: Jeff Cody 
> ---
>  qapi/block-core.json | 47 ---
>  1 file changed, 44 insertions(+), 3 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 5f82d35..08a1419 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2111,6 +2111,7 @@
>  # @replication: Since 2.8
>  # @ssh: Since 2.8
>  # @iscsi: Since 2.9
> +# @rbd: Since 2.9
>  #
>  # Since: 2.0
>  ##
> @@ -2120,7 +2121,7 @@
>  'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
>  'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed',
>  'quorum', 'raw', 'replication', 'ssh', 'vdi', 'vhdx', 'vmdk',
> -'vpc', 'vvfat' ] }
> +'vpc', 'vvfat', 'rbd' ] }
>  
>  ##
>  # @BlockdevOptionsFile:
> @@ -2376,7 +2377,6 @@
>  'path': 'str',
>  '*user': 'str' } }
>  
> -
>  ##
>  # @BlkdebugEvent:
>  #
> @@ -2666,6 +2666,47 @@
>  '*timeout': 'int' } }
>  
>  ##
> +# @BlockdevOptionsRbd:
> +#
> +# @pool:   Ceph pool name
> +#
> +# @image:  Image name in the Ceph pool
> +#
> +# @conf:   # optional path to Ceph configuration file.  Values
> +#  in the configuration file will be overridden by
> +#  options specified via QAPI.
> +#
> +# @snapshot:   #optional Ceph snapshot name
> +#
> +# @rbd-id: #optional Ceph id name

BTW, I'd suggest 'user' or 'username' for this, since that is the more
common terminology we use for other block drivers.

> +#
> +# @password-secret:#optional The ID of a QCryptoSecret object providing
> +#   the password for the login.
> +#
> +# @keyvalue-pairs: #optional  string containing key/value pairs for
> +#  additional Ceph configuration, not including "id" or 
> "conf"
> +#  options. This can be used to specify any of the 
> options
> +#  that Ceph supports.  The format is of the form:
> +#   key1=value1:key2=value2:[...]
> +#
> +#  Special characters such as ":" and "=" can be escaped
> +#  with a '\' character, which means the QAPI needs an
> +#  extra '\' character to pass the needed escape 
> character.
> +#  For example:
> +#"keyvalue-pairs": "mon_host=127.0.0.1\\:6321"
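
For reference, the escaping rule described in the quoted documentation can be sketched roughly as below. This is only an illustration under stated assumptions, not the rbd driver's parser: sk_next_pair() and sk_pair_has_key() are hypothetical names, and the input is the string as it looks after JSON decoding, i.e. with a single '\' in front of the escaped ':'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the next field of 'src' (up to the first
 * unescaped ':') into 'dst', dropping the escaping backslashes. Returns a
 * pointer to the start of the following field, or NULL when 'src' is
 * exhausted. Fields longer than 'cap - 1' bytes are truncated. */
const char *sk_next_pair(const char *src, char *dst, size_t cap)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL && cap > 0);

    while (*src != '\0' && *src != ':') {
        if (*src == '\\' && src[1] != '\0') {
            src++;                      /* drop '\', keep the next char */
        }
        if (n + 1 < cap) {
            dst[n++] = *src;
        }
        src++;
    }
    dst[n] = '\0';
    return *src == ':' ? src + 1 : NULL;
}

/* Returns nonzero if the decoded pair starts with "key=". */
int sk_pair_has_key(const char *pair, const char *key)
{
    size_t klen = strlen(key);

    return strncmp(pair, key, klen) == 0 && pair[klen] == '=';
}
```

A caller would loop on sk_next_pair() until it returns NULL, and could use sk_pair_has_key() to reject "id" or "conf", which the documentation says must not appear in the string. Note that the sketch decodes escapes while splitting, so an escaped '=' inside a key can no longer be told apart from the separator afterwards.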

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-block] [PATCH 4/4] block/rbd: Add blockdev-add support

2017-02-27 Thread Jeff Cody
On Mon, Feb 27, 2017 at 01:30:46PM +0000, Daniel P. Berrange wrote:
> On Mon, Feb 27, 2017 at 08:18:59AM -0500, Jeff Cody wrote:
> > On Mon, Feb 27, 2017 at 09:31:21AM +0000, Daniel P. Berrange wrote:
> > > On Mon, Feb 27, 2017 at 02:36:13AM -0500, Jeff Cody wrote:
> > > > On Mon, Feb 27, 2017 at 02:30:41AM -0500, Jeff Cody wrote:
> > > > > Signed-off-by: Jeff Cody 
> > > > > ---
> > > > >  qapi/block-core.json | 47 
> > > > > ---
> > > > >  1 file changed, 44 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > > > > index 5f82d35..08a1419 100644
> > > > > --- a/qapi/block-core.json
> > > > > +++ b/qapi/block-core.json
> > > > > @@ -2111,6 +2111,7 @@
> > > > >  # @replication: Since 2.8
> > > > >  # @ssh: Since 2.8
> > > > >  # @iscsi: Since 2.9
> > > > > +# @rbd: Since 2.9
> > > > >  #
> > > > >  # Since: 2.0
> > > > >  ##
> > > > > @@ -2120,7 +2121,7 @@
> > > > >  'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 
> > > > > 'nfs',
> > > > >  'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 
> > > > > 'qed',
> > > > >  'quorum', 'raw', 'replication', 'ssh', 'vdi', 'vhdx', 
> > > > > 'vmdk',
> > > > > -'vpc', 'vvfat' ] }
> > > > > +'vpc', 'vvfat', 'rbd' ] }
> > > > >  
> > > > >  ##
> > > > >  # @BlockdevOptionsFile:
> > > > > @@ -2376,7 +2377,6 @@
> > > > >  'path': 'str',
> > > > >  '*user': 'str' } }
> > > > >  
> > > > > -
> > > > >  ##
> > > > >  # @BlkdebugEvent:
> > > > >  #
> > > > > @@ -2666,6 +2666,47 @@
> > > > >  '*timeout': 'int' } }
> > > > >  
> > > > >  ##
> > > > > +# @BlockdevOptionsRbd:
> > > > > +#
> > > > > +# @pool:   Ceph pool name
> > > > > +#
> > > > > +# @image:  Image name in the Ceph pool
> > > > > +#
> > > > > +# @conf:   # optional path to Ceph configuration file.  
> > > > > Values
> > > > > +#  in the configuration file will be overridden 
> > > > > by
> > > > > +#  options specified via QAPI.
> > > > > +#
> > > > > +# @snapshot:   #optional Ceph snapshot name
> > > > > +#
> > > > > +# @rbd-id: #optional Ceph id name
> > > > > +#
> > > > > +# @password-secret:#optional The ID of a QCryptoSecret object 
> > > > > providing
> > > > > +#   the password for the login.
> > > > > +#
> > > > > +# @keyvalue-pairs: #optional  string containing key/value pairs 
> > > > > for
> > > > > +#  additional Ceph configuration, not including 
> > > > > "id" or "conf"
> > > > > +#  options. This can be used to specify any of 
> > > > > the options
> > > > > +#  that Ceph supports.  The format is of the 
> > > > > form:
> > > > > +#   key1=value1:key2=value2:[...]
> > > > > +#
> > > > > +#  Special characters such as ":" and "=" can be 
> > > > > escaped
> > > > > +#  with a '\' character, which means the QAPI 
> > > > > needs an
> > > > > +#  extra '\' character to pass the needed escape 
> > > > > character.
> > > > > +#  For example:
> > > > > +#"keyvalue-pairs": 
> > > > > "mon_host=127.0.0.1\\:6321"
> > > > > +#
> > > > 
> > > > This is the key / value pair issue mentioned in the cover letter.  
> > > > Encoding
> > > > all the options as a string like this is ugly.  What is the preference 
> > > > on
> > > > how to handle these via QAPI, when the actual key and value pairs could 
> > > > be
> > > > anything?   Talking with Markus on IRC, one option he mentioned was an 
> > > > array
> > > > of a generic struct of 'key' and 'value' pairs.
> > > > 
> > > > Do the libvirt folks have any interface preferences here?
> > > 
> > > IMHO, we should formally model each option that we need to be able to 
> > > provide
> > > and *not* provide any generic passthrough feature in QAPI.
> > > 
> > > Particularly for the server hostname/port, we should have the same QAPI
> > > modelling approach that we did for other network protocols.
> > > 
> > >
> > 
> > That is a sane position to take, but the problem is I really have no idea
> > all the options to include or not include here.
> 
> Libvirt relies on the following
> 
>  - id - to provide the username
>  - mon_host   - to provide the list of host+ports
>  - auth_supported - to provide the list of authentication schemes to try
> >  - conf   - to provide the ceph config file
> 
> 
> > However, maybe it doesn't matter, at least for 2.9 - for the QAPI command,
> > we could drop the extra arguments completely (i.e., just drop the
> > keyvalue-pairs argument, above).  The extra options could still be set via a
> > config file passed in via 'conf', and in release > 2.9 we can gradually (or
> > 

Re: [Qemu-block] [PATCH v16 11/22] block: introduce persistent dirty bitmaps

2017-02-27 Thread Max Reitz
On 25.02.2017 18:07, Vladimir Sementsov-Ogievskiy wrote:
> New field BdrvDirtyBitmap.persistent means, that bitmap should be saved
> on bdrv_close, using format driver.

Somehow this sentence stays valid, but it has a much different meaning
now. bdrv_close() no longer directly takes care of saving bitmaps;
it's completely up to the format driver. In any case, this patch no
longer has anything to do with that, so I think this statement should be
changed.

Max

> Format driver should maintain bitmap
> storing.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/dirty-bitmap.c | 26 ++
>  block/qcow2-bitmap.c |  1 +
>  include/block/dirty-bitmap.h |  6 ++
>  3 files changed, 33 insertions(+)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index a9dfce8d00..d2fbf55964 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -44,6 +44,7 @@ struct BdrvDirtyBitmap {
>  int64_t size;   /* Size of the bitmap (Number of sectors) */
>  bool disabled;  /* Bitmap is read-only */
>  int active_iterators;   /* How many iterators are active */
> +bool persistent;/* bitmap must be saved to owner disk image 
> */
>  bool autoload;  /* For persistent bitmaps: bitmap must be
> autoloaded on image opening */
>  QLIST_ENTRY(BdrvDirtyBitmap) list;
> @@ -72,6 +73,7 @@ void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
>  assert(!bdrv_dirty_bitmap_frozen(bitmap));
>  g_free(bitmap->name);
>  bitmap->name = NULL;
> +bitmap->persistent = false;
>  bitmap->autoload = false;
>  }
>  
> @@ -241,6 +243,8 @@ BdrvDirtyBitmap 
> *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs,
>  bitmap->name = NULL;
>  successor->name = name;
>  bitmap->successor = NULL;
> +successor->persistent = bitmap->persistent;
> +bitmap->persistent = false;
>  successor->autoload = bitmap->autoload;
>  bitmap->autoload = false;
>  bdrv_release_dirty_bitmap(bs, bitmap);
> @@ -555,3 +559,25 @@ bool bdrv_dirty_bitmap_get_autoload(const 
> BdrvDirtyBitmap *bitmap)
>  {
>  return bitmap->autoload;
>  }
> +
> +void bdrv_dirty_bitmap_set_persistance(BdrvDirtyBitmap *bitmap, bool 
> persistent)
> +{
> +bitmap->persistent = persistent;
> +}
> +
> +bool bdrv_dirty_bitmap_get_persistance(BdrvDirtyBitmap *bitmap)
> +{
> +return bitmap->persistent;
> +}
> +
> +bool bdrv_has_persistent_bitmaps(BlockDriverState *bs)
> +{
> +BdrvDirtyBitmap *bm;
> +QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
> +if (bm->persistent) {
> +return true;
> +}
> +}
> +
> +return false;
> +}
> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index 6b1a2c9c67..ba72b7d2ac 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c
> @@ -780,6 +780,7 @@ void 
> qcow2_load_autoloading_dirty_bitmaps(BlockDriverState *bs, Error **errp)
>  goto fail;
>  }
>  
> +bdrv_dirty_bitmap_set_persistance(bitmap, true);
>  bdrv_dirty_bitmap_set_autoload(bitmap, true);
>  bm->flags |= BME_FLAG_IN_USE;
>  created_dirty_bitmaps =
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index 45a389a20a..8dbd16b040 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -77,4 +77,10 @@ void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap 
> *bitmap);
>  
>  void bdrv_dirty_bitmap_set_autoload(BdrvDirtyBitmap *bitmap, bool autoload);
>  bool bdrv_dirty_bitmap_get_autoload(const BdrvDirtyBitmap *bitmap);
> +void bdrv_dirty_bitmap_set_persistance(BdrvDirtyBitmap *bitmap,
> +bool persistent);
> +bool bdrv_dirty_bitmap_get_persistance(BdrvDirtyBitmap *bitmap);
> +
> +bool bdrv_has_persistent_bitmaps(BlockDriverState *bs);
> +
>  #endif
> 






Re: [Qemu-block] [Qemu-devel] Non-flat command line option argument syntax

2017-02-27 Thread Markus Armbruster
Kevin Wolf  writes:

> On 27.02.2017 at 11:27, Markus Armbruster wrote:
>> Markus Armbruster  writes:
>> 
>> [...]
>> > === Dotted keys ===
>> >
>> > One sufficiently powerful syntax extension already exists: the dotted
>> > key convention.  It's syntactically unambiguous only when none of the
>> > KEYs involved contains '.'  To adopt it across the board, we'd have to
>> > outlaw '.' in KEYs.  QAPI outlaws '.' already, but we have a bunch of
>> > QOM properties names with '.'.  We'd have to rename at least the ones
>> > that need to be accessible in -object.
>> >
>> > Dotted keys can't express member names that look like integers.  We'd
>> > have to outlaw them at least for the objects that are accessible on the
>> > command line.  Once again, QAPI outlaws such names already.  QOM is
>> > anarchy when it comes to names, however.
>> >
>> > The way dotted keys do arrays is inconsistent with how QOM's automatic
>> > arrayification (commit 3396590) does them: foo.0 vs. foo[0].  Backward
>> > compatibility makes changing the dotted key convention awkward.  Perhaps
>> > we can still change QOM.
>> 
>> Design flaw: there is no good way to denote an empty array or object
>> other than the root object.
>> 
>> Empty KEY=VALUE,... is valid and results in an empty root object.
>> 
>> Presence of a KEY that contains periods results in additional non-root
>> objects or arrays.  For instance, KEY a.b.c results in root object
>> member "a" that has member "b" that has (scalar) member "c".
>> 
>> These additional objects and arrays all have at least one member, by
>> construction.
>> 
>> Begs the question how to denote an empty object or array other than the
>> root.
>> 
>> A natural idea is to interpret "absent in KEY=VALUE,..." as empty.
>> After all, removing one key from it removes one member when there are
>> more, so why not when there aren't.
>> 
>> Sadly, it doesn't work: "absent in KEY=VALUE,..." already means
>> "optional object/array absent", which isn't the same as "empty
>> object/array present".
>> 
>> Without additional syntax, all we can do is choose what exactly to make
>> impossible:
>> 
>> * Absent key means absent, period.  No way to do empty array or object.
>>   This is what I implemented.
>
> I'm not currently aware of any places where the difference between a
> present, but empty array and an absent array is actually significant, so
> this is probably the most consistent and useful way to interpret things.
>
> In other words, I agree with your implementation.
>
>> * Absent key means absent, except when the member is visited it means
>>   empty.  No way to do absent optional array or object.
>> 
>> * Likewise, but if the visit is preceded by a test for presence with
>>   visit_optional(), it means absent again.  No way to do present
>>   optional empty array or object.  This requires keeping additional
>>   state.
>> 
>> Any bright ideas on how to avoid making things impossible?
>
> I can't see any other option than extending the syntax if we need this.
> We can't tell the difference between a string and any other object
> description after =, so we would need to make use of reserved characters
> in the key name.

Think so, too.

>  Maybe just 'foo.array[]' (without any =) for an empty
> array or something like that.

Yes, that should do.  Likewise foo.object{} for empty object.

{} doesn't even need quoting.  [] may.

A trailing period without '=' makes some sense, but looks a bit
error-prone, and can't distinguish between array and object.
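
A rough sketch of how a parser could tell these token shapes apart, purely for illustration. Nothing like this exists in the tree; sk_classify_token() and the SkTokenKind names are made up, and the sketch assumes every existing token has the form KEY=VALUE.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    SK_TOKEN_KEY_VALUE,     /* KEY=VALUE, the existing syntax */
    SK_TOKEN_EMPTY_ARRAY,   /* KEY[] without '=', proposed in this thread */
    SK_TOKEN_EMPTY_OBJECT,  /* KEY{} without '=', proposed in this thread */
    SK_TOKEN_INVALID
} SkTokenKind;

/* Hypothetical classifier for one comma-separated token of a dotted-key
 * option string. It only looks at the token's shape. */
SkTokenKind sk_classify_token(const char *tok)
{
    size_t len;

    assert(tok != NULL);
    len = strlen(tok);

    if (strchr(tok, '=') != NULL) {
        /* An '=' with an empty key in front of it is not a valid token. */
        return tok[0] == '=' ? SK_TOKEN_INVALID : SK_TOKEN_KEY_VALUE;
    }
    if (len > 2 && strcmp(tok + len - 2, "[]") == 0) {
        return SK_TOKEN_EMPTY_ARRAY;
    }
    if (len > 2 && strcmp(tok + len - 2, "{}") == 0) {
        return SK_TOKEN_EMPTY_OBJECT;
    }
    return SK_TOKEN_INVALID;
}
```

Because the suffix is only examined when the token has no '=', a value that happens to end in "[]" or "{}" stays an ordinary string. A trailing period such as "foo." is rejected here, in line with the concern above that it cannot distinguish array from object.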

> Before we introduce anything like this, do we actually need it?

I don't know whether anything needs optional, present and empty.  But
even if the answer is "no" today, it need not remain "no".

Anyone running into a case of "yes" will have to fall back to the JSON
form of -blockdev.  Strengthens my belief that providing JSON there is a
good idea.

The insufficient generality of dotted keys bugs me a bit.  Not sure
whether it justifies more syntax now.  But we should document it.



Re: [Qemu-block] [PATCH V2] qemu-img: make convert async

2017-02-27 Thread Stefan Hajnoczi
On Mon, Feb 27, 2017 at 12:03:14PM +0100, Peter Lieven wrote:
> The convert process is currently implemented entirely with sync operations.
> That means it reads one buffer and then writes it. There is no parallelism,
> and each sync request takes as long as it takes until it is completed.
> 
> This can be a big performance hit when the convert process reads and writes
> to devices which do not benefit from kernel readahead or pagecache.
> In our environment we heavily have the following two use cases when using
> qemu-img convert.
> 
> a) reading from NFS and writing to iSCSI for deploying templates
> b) reading from iSCSI and writing to NFS for backups
> 
> In both processes we use libiscsi and libnfs so we have no kernel cache.
> 
> This patch changes the convert process to work with coroutines running in
> parallel, which can significantly improve performance for network storage
> devices:
> 
> qemu-img (master)
>  nfs -> iscsi 22.8 secs
>  nfs -> ram   11.7 secs
>  ram -> iscsi 12.3 secs
> 
> qemu-img-async (8 coroutines, in-order write disabled)
>  nfs -> iscsi 11.0 secs
>  nfs -> ram   10.4 secs
>  ram -> iscsi  9.0 secs
> 
> This patch introduces 2 new cmdline parameters: the -m parameter to specify
> the number of coroutines running in parallel (defaults to 8), and the -W
> parameter to allow qemu-img to write to the target out of order rather than
> sequentially. This improves performance as the writes do not have to wait
> for each other to complete.
> 
> Signed-off-by: Peter Lieven 
> ---
> V1->V2: - do not calculate source partition globally [Kevin]
> - don't use s->status outside the global lock [Kevin]
> - remove accidentally left bracket in qemu-img.texi [Kevin]
> - reworked -W paragraph in documentation [Stefan]
> 
>RFC->V1: - add documentation
> - add missing coroutine_fn annotation [Stefan]
> - add a comment why it is safe to call coroutine_enter [Stefan]
> - check -m parameter for values < 1 [Stefan]
> - disallow -W parameter with compression [Stefan]
> 
> RFC V3->V4: - avoid to prepare a request queue upfront [Kevin]
> - do not ignore the BLK_BACKING_FILE status [Kevin]
> - redesign the interface to the read and write routines [Kevin]
> 
> RFC V2->V3: - updated stats in the commit msg from a host with a better network card
> - only wake up the coroutine that is actually waiting for a write to complete.
>   this was not only overhead, but also breaking at least linux AIO.
> - fix coding style complaints
> - rename some variables and structs
> 
> RFC V1->V2: - using coroutine as worker "threads". [Max]
> - keeping the request queue as otherwise it happens
>   that we wait on BLK_ZERO chunks while keeping the write order.
>   it also avoids redundant calls to get_block_status and helps
>   to skip some conditions for fully allocated images (!s->min_sparse)
> 
> ---
>  qemu-img-cmds.hx |   4 +-
>  qemu-img.c   | 322 
> ++-
>  qemu-img.texi|  16 ++-
>  3 files changed, 243 insertions(+), 99 deletions(-)

I haven't checked the locking issues that Kevin pointed out, but I'm
happy with the other aspects:

Reviewed-by: Stefan Hajnoczi 




Re: [Qemu-block] [PATCH 4/4] block/rbd: Add blockdev-add support

2017-02-27 Thread Daniel P. Berrange
On Mon, Feb 27, 2017 at 08:18:59AM -0500, Jeff Cody wrote:
> On Mon, Feb 27, 2017 at 09:31:21AM +, Daniel P. Berrange wrote:
> > On Mon, Feb 27, 2017 at 02:36:13AM -0500, Jeff Cody wrote:
> > > On Mon, Feb 27, 2017 at 02:30:41AM -0500, Jeff Cody wrote:
> > > > Signed-off-by: Jeff Cody 
> > > > ---
> > > >  qapi/block-core.json | 47 
> > > > ---
> > > >  1 file changed, 44 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > > > index 5f82d35..08a1419 100644
> > > > --- a/qapi/block-core.json
> > > > +++ b/qapi/block-core.json
> > > > @@ -2111,6 +2111,7 @@
> > > >  # @replication: Since 2.8
> > > >  # @ssh: Since 2.8
> > > >  # @iscsi: Since 2.9
> > > > +# @rbd: Since 2.9
> > > >  #
> > > >  # Since: 2.0
> > > >  ##
> > > > @@ -2120,7 +2121,7 @@
> > > >  'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 
> > > > 'nfs',
> > > >  'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed',
> > > >  'quorum', 'raw', 'replication', 'ssh', 'vdi', 'vhdx', 
> > > > 'vmdk',
> > > > -'vpc', 'vvfat' ] }
> > > > +'vpc', 'vvfat', 'rbd' ] }
> > > >  
> > > >  ##
> > > >  # @BlockdevOptionsFile:
> > > > @@ -2376,7 +2377,6 @@
> > > >  'path': 'str',
> > > >  '*user': 'str' } }
> > > >  
> > > > -
> > > >  ##
> > > >  # @BlkdebugEvent:
> > > >  #
> > > > @@ -2666,6 +2666,47 @@
> > > >  '*timeout': 'int' } }
> > > >  
> > > >  ##
> > > > +# @BlockdevOptionsRbd:
> > > > +#
> > > > +# @pool:   Ceph pool name
> > > > +#
> > > > +# @image:  Image name in the Ceph pool
> > > > +#
> > > > +# @conf:   # optional path to Ceph configuration file.  
> > > > Values
> > > > +#  in the configuration file will be overridden by
> > > > +#  options specified via QAPI.
> > > > +#
> > > > +# @snapshot:   #optional Ceph snapshot name
> > > > +#
> > > > +# @rbd-id: #optional Ceph id name
> > > > +#
> > > > +# @password-secret:#optional The ID of a QCryptoSecret object 
> > > > providing
> > > > +#   the password for the login.
> > > > +#
> > > > +# @keyvalue-pairs: #optional  string containing key/value pairs for
> > > > +#  additional Ceph configuration, not including 
> > > > "id" or "conf"
> > > > +#  options. This can be used to specify any of the 
> > > > options
> > > > +#  that Ceph supports.  The format is of the form:
> > > > +#   key1=value1:key2=value2:[...]
> > > > +#
> > > > +#  Special characters such as ":" and "=" can be 
> > > > escaped
> > > > +#  with a '\' character, which means the QAPI 
> > > > needs an
> > > > +#  extra '\' character to pass the needed escape 
> > > > character.
> > > > +#  For example:
> > > > +#"keyvalue-pairs": 
> > > > "mon_host=127.0.0.1\\:6321"
> > > > +#
> > > 
> > > This is the key / value pair issue mentioned in the cover letter.  
> > > Encoding
> > > all the options as a string like this is ugly.  What is the preference on
> > > how to handle these via QAPI, when the actual key and value pairs could be
> > > anything?   Talking with Markus on IRC, one option he mentioned was an 
> > > array
> > > of a generic struct of 'key' and 'value' pairs.
> > > 
> > > Do the libvirt folks have any interface preferences here?
> > 
> > IMHO, we should formally model each option that we need to be able to 
> > provide
> > and *not* provide any generic passthrough feature in QAPI.
> > 
> > Particularly for the server hostname/port, we should have the same QAPI
> > modelling approach that we did for other network protocols.
> > 
> >
> 
> That is a sane position to take, but the problem is I really have no idea
> all the options to include or not include here.

Libvirt relies on the following

 - id - to provide the username
 - mon_host   - to provide the list of host+ports
 - auth_supported - to provide the list of authentication schemes to try
 - conf   - to provide the ceph config file


> However, maybe it doesn't matter, at least for 2.9 - for the QAPI command,
> we could drop the extra arguments completely (i.e., just drop the
> keyvalue-pairs argument, above).  The extra options could still be set via a
> config file passed in via 'conf', and in release > 2.9 we can gradually (or
> not-so-gradually) add in additional options directly supported via QAPI.
> 
> The filename parsing would remain the same, for backwards compatibility, of
> course.
> 
> Does this sound reasonable to you?

If we support the pieces libvirt needs, then I've no objection to dropping
the rest.

Regards,
Daniel
-- 
|: http://berrange.com  -o-
