Re: [Qemu-block] [Qemu-devel] [PATCH 1/1] MAINTAINERS: Fix typo, block/stream.h -> block/stream.c

2016-03-09 Thread Fam Zheng
On Wed, 03/09 21:54, Jeff Cody wrote:
> There is no block/stream.h, the intended filename is block/stream.c
> instead.
> 
> Signed-off-by: Jeff Cody 
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index dc0aa54..e04e9e9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1008,7 +1008,7 @@ F: blockjob.c
>  F: include/block/blockjob.h
>  F: block/backup.c
>  F: block/commit.c
> -F: block/stream.h
> +F: block/stream.c
>  F: block/mirror.c
>  T: git git://github.com/codyprime/qemu-kvm-jtc.git block
>  
> -- 
> 1.9.3
> 
> 

Reviewed-by: Fam Zheng 



[Qemu-block] [PATCH v12 1/3] Add new block driver interface to add/delete a BDS's child

2016-03-09 Thread Changlong Xie
From: Wen Congyang 

In some cases, we want to take a quorum child offline, and take
another child online.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Reviewed-by: Max Reitz 
---
 block.c   | 49 +++
 include/block/block.h |  4 
 include/block/block_int.h |  5 +
 3 files changed, 58 insertions(+)

diff --git a/block.c b/block.c
index ba24b8e..d48f441 100644
--- a/block.c
+++ b/block.c
@@ -4395,3 +4395,52 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 QDECREF(json);
 }
 }
+
+/*
+ * Hot add/remove a BDS's child. So the user can take a child offline when
+ * it is broken and take a new child online
+ */
+void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
+Error **errp)
+{
+
+if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
+error_setg(errp, "The node %s does not support adding a child",
+   bdrv_get_device_or_node_name(parent_bs));
+return;
+}
+
+if (!QLIST_EMPTY(_bs->parents)) {
+error_setg(errp, "The node %s already has a parent",
+   child_bs->node_name);
+return;
+}
+
+parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
+}
+
+void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error 
**errp)
+{
+BdrvChild *tmp;
+
+if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
+error_setg(errp, "The node %s does not support removing a child",
+   bdrv_get_device_or_node_name(parent_bs));
+return;
+}
+
+QLIST_FOREACH(tmp, _bs->children, next) {
+if (tmp == child) {
+break;
+}
+}
+
+if (!tmp) {
+error_setg(errp, "The node %s does not have child named %s",
+   bdrv_get_device_or_node_name(parent_bs),
+   bdrv_get_device_or_node_name(child->bs));
+return;
+}
+
+parent_bs->drv->bdrv_del_child(parent_bs, child, errp);
+}
diff --git a/include/block/block.h b/include/block/block.h
index 1c4f4d8..7378e74 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -582,4 +582,8 @@ void bdrv_drained_begin(BlockDriverState *bs);
  */
 void bdrv_drained_end(BlockDriverState *bs);
 
+void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
+Error **errp);
+void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 9ef823a..704efe5 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -305,6 +305,11 @@ struct BlockDriver {
  */
 void (*bdrv_drain)(BlockDriverState *bs);
 
+void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
+   Error **errp);
+void (*bdrv_del_child)(BlockDriverState *parent, BdrvChild *child,
+   Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
1.9.3






[Qemu-block] [PATCH v12 3/3] qmp: add monitor command to add/remove a child

2016-03-09 Thread Changlong Xie
From: Wen Congyang 

The new QMP command name is x-blockdev-change. It's just for adding/removing
quorum's child now, and doesn't support all kinds of children, all kinds of
operations, nor all block drivers. So it is experimental now.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Reviewed-by: Max Reitz 
---
 blockdev.c   | 55 
 qapi/block-core.json | 32 ++
 qmp-commands.hx  | 54 +++
 3 files changed, 141 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 0f20c65..435631e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -4017,6 +4017,61 @@ out:
 aio_context_release(aio_context);
 }
 
+static BdrvChild *bdrv_find_child(BlockDriverState *parent_bs,
+  const char *child_name)
+{
+BdrvChild *child;
+
+QLIST_FOREACH(child, _bs->children, next) {
+if (strcmp(child->name, child_name) == 0) {
+return child;
+}
+}
+
+return NULL;
+}
+
+void qmp_x_blockdev_change(const char *parent, bool has_child,
+   const char *child, bool has_node,
+   const char *node, Error **errp)
+{
+BlockDriverState *parent_bs, *new_bs = NULL;
+BdrvChild *p_child;
+
+parent_bs = bdrv_lookup_bs(parent, parent, errp);
+if (!parent_bs) {
+return;
+}
+
+if (has_child == has_node) {
+if (has_child) {
+error_setg(errp, "The parameters child and node are in conflict");
+} else {
+error_setg(errp, "Either child or node must be specified");
+}
+return;
+}
+
+if (has_child) {
+p_child = bdrv_find_child(parent_bs, child);
+if (!p_child) {
+error_setg(errp, "Node '%s' does not have child '%s'",
+   parent, child);
+return;
+}
+bdrv_del_child(parent_bs, p_child, errp);
+}
+
+if (has_node) {
+new_bs = bdrv_find_node(node);
+if (!new_bs) {
+error_setg(errp, "Node '%s' not found", node);
+return;
+}
+bdrv_add_child(parent_bs, new_bs, errp);
+}
+}
+
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
 BlockJobInfoList *head = NULL, **p_next = 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9bf1b22..bc3fd0b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2548,3 +2548,35 @@
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @x-blockdev-change
+#
+# Dynamically reconfigure the block driver state graph. It can be used
+# to add, remove, insert or replace a graph node. Currently only the
+# Quorum driver implements this feature to add or remove its child. This
+# is useful to fix a broken quorum child.
+#
+# If @node is specified, it will be inserted under @parent. @child
+# may not be specified in this case. If both @parent and @child are
+# specified but @node is not, @child will be detached from @parent.
+#
+# @parent: the id or name of the parent node.
+#
+# @child: #optional the name of a child under the given parent node.
+#
+# @node: #optional the name of the node that will be added.
+#
+# Note: this command is experimental, and its API is not stable. It
+# does not support all kinds of operations, all kinds of children, nor
+# all block drivers.
+#
+# Warning: The data in a new quorum child MUST be consistent with that of
+# the rest of the array.
+#
+# Since: 2.6
+##
+{ 'command': 'x-blockdev-change',
+  'data' : { 'parent': 'str',
+ '*child': 'str',
+ '*node': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index b629673..2a55135 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4398,6 +4398,60 @@ Example:
 EQMP
 
 {
+.name   = "x-blockdev-change",
+.args_type  = "parent:B,child:B?,node:B?",
+.mhandler.cmd_new = qmp_marshal_x_blockdev_change,
+},
+
+SQMP
+x-blockdev-change
+-
+
+Dynamically reconfigure the block driver state graph. It can be used
+to add, remove, insert or replace a graph node. Currently only the
+Quorum driver implements this feature to add or remove its child. This
+is useful to fix a broken quorum child.
+
+If @node is specified, it will be inserted under @parent. @child
+may not be specified in this case. If both @parent and @child are
+specified but @node is not, @child will be detached from @parent.
+
+Arguments:
+- "parent": the id or name of the parent node (json-string)
+- "child": the name of a child under the given parent node (json-string, 
optional)
+- "node": the name of the node that will be added (json-string, 

[Qemu-block] [PATCH v12 0/3] qapi: child add/delete support

2016-03-09 Thread Changlong Xie
ChangLog:
v11~v12:
1. Address comments from Max
p1. Add R-B
p2. Add R-B, remove unnecessary "endptr" "value"
p3. Add R-B

v10~v11:
1. Rebase to the newest codes
2. Address comment from Max
Don't use contractions in error messages,
p1: Remove R-Bs, and use "BdrvChild *child" in bdrv_del_child
p2: Fix error logic in get_new_child_index/remove_child_index, and prefect
child->name parsing
p3: Make bdrv_find_child return BdrvChild *, and add missing explanation

v9~v10:
1. Rebase to the newest codes
2. Address comments from Berto and Max, update the documents in block-core.json 
   and qmp-commands.hx 
3. Remove redundant codes in quorum_add_child() and quorum_del_child()
v8:
1. Rebase to the newest codes
2. Address the comments from Eric Blake
v7:
1. Remove the qmp command x-blockdev-change's parameter operation according
   to Kevin's comments.
2. Remove the hmp command.
v6:
1. Use a single qmp command x-blockdev-change to replace x-blockdev-child-add
   and x-blockdev-child-delete
v5:
1. Address Eric Blake's comments
v4:
1. drop nbd driver's implementation. We can use human-monitor-command
   to do it.
2. Rename the command name.
v3:
1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
   created by the QMP command blockdev-add.
2. The driver NBD can support filename, path, host:port now.
v2:
1. Use bdrv_get_device_or_node_name() instead of new function
   bdrv_get_id_or_node_name()
2. Update the error message
3. Update the documents in block-core.json

Wen Congyang (3):
  Add new block driver interface to add/delete a BDS's child
  quorum: implement bdrv_add_child() and bdrv_del_child()
  qmp: add monitor command to add/remove a child

 block.c   |  57 --
 block/quorum.c| 121 +-
 blockdev.c|  55 +
 include/block/block.h |   8 +++
 include/block/block_int.h |   5 ++
 qapi/block-core.json  |  32 
 qmp-commands.hx   |  54 +
 7 files changed, 326 insertions(+), 6 deletions(-)

-- 
1.9.3






Re: [Qemu-block] [PATCH v3 1/1] block/sheepdog: fix argument passed to qemu_strtoul()

2016-03-09 Thread Jeff Cody
On Wed, Mar 02, 2016 at 11:24:42AM -0500, Jeff Cody wrote:
> The function qemu_strtoul() reads 'unsigned long' sized data,
> which is larger than uint32_t on 64-bit machines.
> 
> Even though the snap_id field in the header is 32-bits, we must
> accomodate the full size in qemu_strtoul().
> 
> This patch also adds more meaningful error handling to the
> qemu_strtoul() call, and subsequent results.
> 
> Reported-by: Paolo Bonzini 
> Signed-off-by: Jeff Cody 
> ---
>  block/sheepdog.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index 8739acc..87f0027 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -2543,7 +2543,7 @@ static int sd_snapshot_delete(BlockDriverState *bs,
>const char *name,
>Error **errp)
>  {
> -uint32_t snap_id = 0;
> +unsigned long snap_id = 0;
>  char snap_tag[SD_MAX_VDI_TAG_LEN];
>  Error *local_err = NULL;
>  int fd, ret;
> @@ -2565,12 +2565,15 @@ static int sd_snapshot_delete(BlockDriverState *bs,
>  memset(buf, 0, sizeof(buf));
>  memset(snap_tag, 0, sizeof(snap_tag));
>  pstrcpy(buf, SD_MAX_VDI_LEN, s->name);
> -if (qemu_strtoul(snapshot_id, NULL, 10, (unsigned long *)_id)) {
> -return -1;
> +ret = qemu_strtoul(snapshot_id, NULL, 10, _id);
> +if (ret || snap_id > UINT32_MAX) {
> +error_setg(errp, "Invalid snapshot ID: %s",
> + snapshot_id ? snapshot_id : "");
> +return -EINVAL;
>  }
>  
>  if (snap_id) {
> -hdr.snapid = snap_id;
> +hdr.snapid = (uint32_t) snap_id;
>  } else {
>  pstrcpy(snap_tag, sizeof(snap_tag), snapshot_id);
>  pstrcpy(buf + SD_MAX_VDI_LEN, SD_MAX_VDI_TAG_LEN, snap_tag);
> -- 
> 1.9.3
> 

Thanks,

Applied to my block branch:

git://github.com/codyprime/qemu-kvm-jtc.git block

-Jeff



Re: [Qemu-block] [PATCH v11 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-03-09 Thread Changlong Xie

On 03/10/2016 02:11 AM, Max Reitz wrote:

On 09.03.2016 04:51, Changlong Xie wrote:

From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
  block.c   |   8 ++--
  block/quorum.c| 123 +-
  include/block/block.h |   4 ++
  3 files changed, 129 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index d48f441..66d21af 100644
--- a/block.c
+++ b/block.c
@@ -1194,10 +1194,10 @@ static int bdrv_fill_options(QDict **options, const 
char *filename,
  return 0;
  }

-static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-BlockDriverState *child_bs,
-const char *child_name,
-const BdrvChildRole *child_role)
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+ BlockDriverState *child_bs,
+ const char *child_name,
+ const BdrvChildRole *child_role)
  {
  BdrvChild *child = g_new(BdrvChild, 1);
  *child = (BdrvChild) {
diff --git a/block/quorum.c b/block/quorum.c
index 11cc60b..469e4a3 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -24,6 +24,7 @@
  #include "qapi/qmp/qstring.h"
  #include "qapi-event.h"
  #include "crypto/hash.h"
+#include "qemu/bitmap.h"

  #define HASH_LENGTH 32

@@ -81,6 +82,8 @@ typedef struct BDRVQuorumState {
  bool rewrite_corrupted;/* true if the driver must rewrite-on-read 
corrupted
  * block if Quorum is reached.
  */
+unsigned long *index_bitmap;
+int bsize;

  QuorumReadPattern read_pattern;
  } BDRVQuorumState;
@@ -877,9 +880,9 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
  ret = -EINVAL;
  goto exit;
  }
-if (s->num_children < 2) {
+if (s->num_children < 1) {
  error_setg(_err,
-   "Number of provided children must be greater than 1");
+   "Number of provided children must be 1 or more");
  ret = -EINVAL;
  goto exit;
  }
@@ -928,6 +931,7 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
  /* allocate the children array */
  s->children = g_new0(BdrvChild *, s->num_children);
  opened = g_new0(bool, s->num_children);
+s->index_bitmap = bitmap_new(s->num_children);

  for (i = 0; i < s->num_children; i++) {
  char indexstr[32];
@@ -943,6 +947,8 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,

  opened[i] = true;
  }
+bitmap_set(s->index_bitmap, 0, s->num_children);
+s->bsize = s->num_children;

  g_free(opened);
  goto exit;
@@ -999,6 +1005,116 @@ static void quorum_attach_aio_context(BlockDriverState 
*bs,
  }
  }

+static int get_new_child_index(BDRVQuorumState *s)
+{
+int index;
+
+index = find_next_zero_bit(s->index_bitmap, s->bsize, 0);
+if (index < s->bsize) {
+return index;
+}
+
+s->index_bitmap = bitmap_zero_extend(s->index_bitmap, s->bsize,
+ s->bsize + 1);


If s->bsize == INT_MAX, then this will overflow to INT_MIN (probably).
This negative value will then be converted to a smaller negative value
by BITS_TO_LONGS() * sizeof(long) in bitmap_zero_extend(), and this
negative value will then be implicitly casted to a size_t value for the
g_realloc() call. On both 32 and 64 bit systems, allocating this will
probably fail due to insufficient memory which will then crash qemu.



It's a problem.


One way to prevent this: Prevent the overflow in this function by
failing if s->bsize == INT_MAX before bitmap_zero_extend() is called.

Another way: Do not limit the number of children in quorum_add_child()
(and additionally in quorum_open()) to INT_MAX, but to something more
sane like 256 or 1024 or 65536 if you want to go really high (I can't
imagine anyone using more than 32 children). That way, s->bsize can
never grow to be INT_MAX in the first place.

In any case, qemu will probably crash long before this overflows because
trying to create 2G BDSs will definitely break something. This is why
I'd prefer the second approach (limiting the number of children to a


Me too


sane amount), and this is also why I don't actually care about this
overflow here:

In my opinion you don't need to change anything here. A follow-up patch
can take care of limiting the number of quorum children to a sane amount.


Okay




+return s->bsize++;
+}
+
+static void remove_child_index(BDRVQuorumState *s, int index)
+{
+int last_index, old_bsize;
+size_t new_len;
+
+assert(index < s->bsize);
+
+clear_bit(index, 

Re: [Qemu-block] [Qemu-devel] [PATCH v2 3/3] vmdk: Switch to heap arrays for vmdk_parent_open

2016-03-09 Thread Fam Zheng
On Wed, 03/09 10:50, Kevin Wolf wrote:
> Am 09.03.2016 um 01:43 hat Fam Zheng geschrieben:
> > On Tue, 03/08 16:24, Fam Zheng wrote:
> > > Signed-off-by: Fam Zheng 
> > > ---
> > >  block/vmdk.c | 17 +++--
> > >  1 file changed, 11 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/block/vmdk.c b/block/vmdk.c
> > > index c68f456..03be7f0 100644
> > > --- a/block/vmdk.c
> > > +++ b/block/vmdk.c
> > > @@ -343,15 +343,16 @@ static int vmdk_reopen_prepare(BDRVReopenState 
> > > *state,
> > >  static int vmdk_parent_open(BlockDriverState *bs)
> > >  {
> > >  char *p_name;
> > > -char desc[DESC_SIZE + 1];
> > > +char *desc;
> > >  BDRVVmdkState *s = bs->opaque;
> > >  int ret;
> > >  
> > > -desc[DESC_SIZE] = '\0';
> > > +desc = g_malloc0(DESC_SIZE + 1);
> > >  ret = bdrv_pread(bs->file->bs, s->desc_offset, desc, DESC_SIZE);
> > >  if (ret < 0) {
> > > -return ret;
> > > +goto out;
> > >  }
> > > +ret = 0;
> > 
> > Kevin, were you referring to this "ret = 0" in the cover letter? If so, it 
> > is
> > not useless, because ret was set to DESC_SIZE by bdrv_pread. :)
> 
> Nope, I meant the initialisation in patch 1.

You're right.

Fam



Re: [Qemu-block] [PATCH 1/2] block/qapi: make two printf() formats literal

2016-03-09 Thread Peter Xu
On Wed, Mar 09, 2016 at 03:14:03PM -0700, Eric Blake wrote:
> > +func_fprintf(f, "%*s[%i]:%c", indentation * 4, "", i,
> > + composite ? '\n' : ' ');
> 
> [The nerd in me wants to point out that you could avoid the ternary by
> writing '"\n "[composite]', but that's too ugly to use outside of IOCCC
> submissions, and I wouldn't be surprised if it (rightfully) triggers
> clang warnings]

Do you mean something like:

int i = 0;
printf("%c", '"\n "[i]');

Is this a grammar btw?

Peter



[Qemu-block] [PATCH v5 11/14] qapi: Don't special-case simple union wrappers

2016-03-09 Thread Eric Blake
Simple unions were carrying a special case that hid their 'data'
QMP member from the resulting C struct, via the hack method
QAPISchemaObjectTypeVariant.simple_union_type().  But by using
the work we started by unboxing flat union and alternate
branches, coupled with the ability to visit the members of an
implicit type, we can now expose the simple union's implicit
type in qapi-types.h:

| struct q_obj_ImageInfoSpecificQCow2_wrapper {
| ImageInfoSpecificQCow2 *data;
| };
|
| struct q_obj_ImageInfoSpecificVmdk_wrapper {
| ImageInfoSpecificVmdk *data;
| };
...
| struct ImageInfoSpecific {
| ImageInfoSpecificKind type;
| union { /* union tag is @type */
| void *data;
|-ImageInfoSpecificQCow2 *qcow2;
|-ImageInfoSpecificVmdk *vmdk;
|+q_obj_ImageInfoSpecificQCow2_wrapper qcow2;
|+q_obj_ImageInfoSpecificVmdk_wrapper vmdk;
| } u;
| };

Doing this removes asymmetry between QAPI's QMP side and its
C side (both sides now expose 'data'), and means that the
treatment of a simple union as sugar for a flat union is now
equivalent in both languages (previously the two approaches used
a different layer of dereferencing, where the simple union could
be converted to a flat union with equivalent C layout but
different {} on the wire, or to an equivalent QMP wire form
but with different C representation).  Using the implicit type
also lets us get rid of the simple_union_type() hack.

Of course, now all clients of simple unions have to adjust from
using su->u.member to using su->u.member.data; while this touches
a number of files in the tree, some earlier cleanup patches
helped minimize the change to the initialization of a temporary
variable rather than every single member access.  The generated
qapi-visit.c code is also affected by the layout change:

|@@ -7393,10 +7393,10 @@ void visit_type_ImageInfoSpecific_member
| }
| switch (obj->type) {
| case IMAGE_INFO_SPECIFIC_KIND_QCOW2:
|-visit_type_ImageInfoSpecificQCow2(v, "data", >u.qcow2, );
|+visit_type_q_obj_ImageInfoSpecificQCow2_wrapper_members(v, 
>u.qcow2, );
| break;
| case IMAGE_INFO_SPECIFIC_KIND_VMDK:
|-visit_type_ImageInfoSpecificVmdk(v, "data", >u.vmdk, );
|+visit_type_q_obj_ImageInfoSpecificVmdk_wrapper_members(v, 
>u.vmdk, );
| break;
| default:
| abort();

Signed-off-by: Eric Blake 

---
v5: rebase to q_obj naming change, hoist assertion earlier in series
v4: improve commit message, rebase onto exposed implicit types
[no v3]
v2: rebase onto s/fields/members/ change, changes in master; pick
up missing net/ files
---
 scripts/qapi.py | 10 -
 scripts/qapi-types.py   |  8 +---
 scripts/qapi-visit.py   | 22 ++-
 backends/baum.c |  2 +-
 backends/msmouse.c  |  2 +-
 block/nbd.c |  6 +--
 block/qcow2.c   |  6 +--
 block/vmdk.c|  8 ++--
 blockdev.c  | 24 ++--
 hmp.c   |  8 ++--
 hw/char/escc.c  |  2 +-
 hw/input/hid.c  |  8 ++--
 hw/input/ps2.c  |  6 +--
 hw/input/virtio-input-hid.c |  8 ++--
 hw/mem/pc-dimm.c|  2 +-
 net/dump.c  |  2 +-
 net/hub.c   |  2 +-
 net/l2tpv3.c|  2 +-
 net/net.c   |  4 +-
 net/netmap.c|  2 +-
 net/slirp.c |  2 +-
 net/socket.c|  2 +-
 net/tap-win32.c |  2 +-
 net/tap.c   |  4 +-
 net/vde.c   |  2 +-
 net/vhost-user.c|  2 +-
 numa.c  |  4 +-
 qemu-char.c | 82 +
 qemu-nbd.c  |  6 +--
 replay/replay-input.c   | 44 +++---
 spice-qemu-char.c   | 14 ---
 tests/test-io-channel-socket.c  | 40 ++--
 tests/test-qmp-commands.c   |  2 +-
 tests/test-qmp-input-visitor.c  | 25 +++--
 tests/test-qmp-output-visitor.c | 24 ++--
 tpm.c   |  2 +-
 ui/console.c|  4 +-
 ui/input-keymap.c   | 10 ++---
 ui/input-legacy.c   |  8 ++--
 ui/input.c  | 34 -
 ui/vnc-auth-sasl.c  |  3 +-
 ui/vnc.c| 29 ---
 util/qemu-sockets.c | 35 +-
 43 files changed, 246 insertions(+), 268 deletions(-)

diff --git a/scripts/qapi.py b/scripts/qapi.py
index 08d63bf..d91af94 100644
--- a/scripts/qapi.py
+++ b/scripts/qapi.py
@@ -1115,16 +1115,6 @@ class 
QAPISchemaObjectTypeVariant(QAPISchemaObjectTypeMember):
 def __init__(self, name, typ):
 QAPISchemaObjectTypeMember.__init__(self, name, typ, False)

-# This 

Re: [Qemu-block] [PATCH v2] block/gluster: add support for SEEK_DATA/SEEK_HOLE

2016-03-09 Thread Jeff Cody
On Wed, Mar 09, 2016 at 07:12:41PM +0100, Niels de Vos wrote:
> On Wed, Mar 09, 2016 at 10:46:02AM -0500, Jeff Cody wrote:
> > On Wed, Mar 09, 2016 at 01:30:14PM +0100, Niels de Vos wrote:
> > > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > > it possible to detect sparse areas in files.
> > > 
> > > Signed-off-by: Niels de Vos 
> > > 
> > > ---
> > > Tested by compiling and running "qemu-img map gluster://..." with a
> > > build of the current master branch of glusterfs. Using a Fedora cloud
> > > image (in raw format) shows many SEEK procudure calls going back and
> > > forth over the network. The output of "qemu map" matches the output when
> > > run against the image on the local filesystem.
> > > 
> > > v2 based on feedback from Jeff Cody:
> > > - Replace compile time detection by runtime detection
> > > - Update return pointer (new argument) for .bdrv_co_get_block_status
> > > ---
> > >  block/gluster.c | 186 
> > > 
> > >  1 file changed, 186 insertions(+)
> > > 
> > > diff --git a/block/gluster.c b/block/gluster.c
> > > index 65077a0..b01ab52 100644
> > > --- a/block/gluster.c
> > > +++ b/block/gluster.c
> > > @@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
> > >  typedef struct BDRVGlusterState {
> > >  struct glfs *glfs;
> > >  struct glfs_fd *fd;
> > > +bool supports_seek_data;
> > >  } BDRVGlusterState;
> > >  
> > >  typedef struct GlusterConf {
> > > @@ -286,6 +287,33 @@ static void qemu_gluster_parse_flags(int bdrv_flags, 
> > > int *open_flags)
> > >  }
> > >  }
> > >  
> > > +/*
> > > + * Do SEEK_DATA/HOLE to detect if it is functional. In order to be 
> > > usable, it
> > > + * should return different values for the start of data and the start of 
> > > a
> > > + * hole. There are three different cases to handle:
> > > + *
> > > + *  - the same position is returned for data/hole (indicates broken 
> > > gfapi)
> > > + *  - an error is returned:
> > > + * - ENXIO only gets returned if there is valid support on 
> > > client+server
> > > + * - EINVAL is returned when gfapi or the server does not support it
> > > + */
> > > +static bool qemu_gluster_test_seek(struct glfs_fd *fd)
> > > +{
> > > +off_t start_data, start_hole;
> > > +bool supports_seek_data = false;
> > > +
> > > +start_data = glfs_lseek(fd, 0, SEEK_DATA);
> > > +if (start_data != -1) {
> > 
> > I recommend just checking if the returned value is >= 0.
> 
> Ok, I can change that.
> 
> > > +start_hole = glfs_lseek(fd, 0, SEEK_HOLE);
> > > +if (start_hole != -1)
> > 
> > Minor formatting nit: per QEMU coding standard, all conditional
> > statements require brackets.
> 
> Ah, sure, will do.
> 
> > > +supports_seek_data = !(start_data == start_hole);
> > > +} else if (errno == ENXIO) {
> > 
> > This errno check for ENXIO won't catch the case if an ENXIO error
> > occurs in the SEEK_HOLE call.
> 
> I'm not sure if I'm following. lseek(SEEK_DATA) returns -1 and sets
> errno to ENXIO when the position in the filedescriptor is EOF. In this
> test, we check from position=0, so when ENXIO is returned, we know the
> file is empty and the return value+errno has gone through the whole
> Gluster stack.
> 
> EINVAL would be an other error that is expected, in case either (a
> current) gfapi or the server do not support SEEK_DATA.
> 
> I do not think there is a need to check for ENXIO on lseek(SEEK_HOLE).


Hmm, I think you are right - I can't think of any scenario that
SEEK_HOLE should return ENXIO that SEEK_DATA would not.

As matter of fact, if we can rely on ENXIO always indicating if
SEEK_DATA is supported by gluster, then we can make the whole
detection process much simpler, like this:

static bool qemu_gluster_test_seek(struct glfs_fd *fd)
{
off_t ret, eof;
eof = glfs_lseek(fd, 0, SEEK_END);
if (eof < 0) {
/* this shouldn't occur */
return false;
}
/* this should always fail with ENXIO if SEEK_DATA is supported */
ret = glfs_lseek(fd, eof, SEEK_DATA);
return (ret < 0) && (errno == ENXIO);
}


> 
> Do you have a preference on how I should change it?
> 
> > > +supports_seek_data = true;
> > > +}
> > > +
> > > +return supports_seek_data;
> > > +}
> > > +
> > >  static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> > >   int bdrv_flags, Error **errp)
> > >  {
> > > @@ -320,6 +348,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  
> > > QDict *options,
> > >  ret = -errno;
> > >  }
> > >  
> > > +s->supports_seek_data = qemu_gluster_test_seek(s->fd);
> > > +
> > >  out:
> > >  qemu_opts_del(opts);
> > >  qemu_gluster_gconf_free(gconf);
> > > @@ -677,6 +707,158 @@ static int 
> > > qemu_gluster_has_zero_init(BlockDriverState *bs)
> > >  return 0;
> > >  }
> > >  
> > > +/*
> > > + * Find allocation range in @bs around offset @start.
> 

Re: [Qemu-block] [PATCH 2/2] block/qapi: fix unbounded stack for dump_qdict

2016-03-09 Thread Eric Blake
On 03/08/2016 10:56 PM, Peter Xu wrote:
> Using heap instead of stack for better safety.
> 
> Signed-off-by: Peter Xu 
> ---
>  block/qapi.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/block/qapi.c b/block/qapi.c
> index c4c2115..b798e35 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -636,9 +636,8 @@ static void dump_qdict(fprintf_function func_fprintf, 
> void *f, int indentation,
>  for (entry = qdict_first(dict); entry; entry = qdict_next(dict, entry)) {
>  QType type = qobject_type(entry->value);
>  bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
> -char key[strlen(entry->key) + 1];
> +char *key = g_malloc(strlen(entry->key) + 1);
>  int i;
> -
>  /* replace dashes with spaces in key (variable) names */
>  for (i = 0; entry->key[i]; i++) {
>  key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];
> @@ -650,6 +649,7 @@ static void dump_qdict(fprintf_function func_fprintf, 
> void *f, int indentation,
>  if (!composite) {
>  func_fprintf(f, "\n");
>  }
> +g_free(key);
>  }
>  }
>  
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 1/2] block/qapi: make two printf() formats literal

2016-03-09 Thread Eric Blake
On 03/08/2016 10:56 PM, Peter Xu wrote:
> Fix two places to use literal printf format when possible.
> 
> Signed-off-by: Peter Xu 
> ---
>  block/qapi.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/block/qapi.c b/block/qapi.c
> index db2d3fb..c4c2115 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -619,9 +619,8 @@ static void dump_qlist(fprintf_function func_fprintf, 
> void *f, int indentation,
>  for (entry = qlist_first(list); entry; entry = qlist_next(entry), i++) {
>  QType type = qobject_type(entry->value);
>  bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
> -const char *format = composite ? "%*s[%i]:\n" : "%*s[%i]: ";
> -
> -func_fprintf(f, format, indentation * 4, "", i);
> +func_fprintf(f, "%*s[%i]:%c", indentation * 4, "", i,
> + composite ? '\n' : ' ');

[The nerd in me wants to point out that you could avoid the ternary by
writing '"\n "[composite]', but that's too ugly to use outside of IOCCC
submissions, and I wouldn't be surprised if it (rightfully) triggers
clang warnings]

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v11 3/3] qmp: add monitor command to add/remove a child

2016-03-09 Thread Max Reitz
On 09.03.2016 04:51, Changlong Xie wrote:
> From: Wen Congyang 
> 
> The new QMP command name is x-blockdev-change. It's just for adding/removing
> quorum's child now, and doesn't support all kinds of children, all kinds of
> operations, nor all block drivers. So it is experimental now.
> 
> Signed-off-by: Wen Congyang 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Gonglei 
> Signed-off-by: Changlong Xie 
> ---
>  blockdev.c   | 55 
> 
>  qapi/block-core.json | 32 ++
>  qmp-commands.hx  | 54 +++
>  3 files changed, 141 insertions(+)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2] block/gluster: add support for SEEK_DATA/SEEK_HOLE

2016-03-09 Thread Niels de Vos
On Wed, Mar 09, 2016 at 10:46:02AM -0500, Jeff Cody wrote:
> On Wed, Mar 09, 2016 at 01:30:14PM +0100, Niels de Vos wrote:
> > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > it possible to detect sparse areas in files.
> > 
> > Signed-off-by: Niels de Vos 
> > 
> > ---
> > Tested by compiling and running "qemu-img map gluster://..." with a
> > build of the current master branch of glusterfs. Using a Fedora cloud
> > image (in raw format) shows many SEEK procudure calls going back and
> > forth over the network. The output of "qemu map" matches the output when
> > run against the image on the local filesystem.
> > 
> > v2 based on feedback from Jeff Cody:
> > - Replace compile time detection by runtime detection
> > - Update return pointer (new argument) for .bdrv_co_get_block_status
> > ---
> >  block/gluster.c | 186 
> > 
> >  1 file changed, 186 insertions(+)
> > 
> > diff --git a/block/gluster.c b/block/gluster.c
> > index 65077a0..b01ab52 100644
> > --- a/block/gluster.c
> > +++ b/block/gluster.c
> > @@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
> >  typedef struct BDRVGlusterState {
> >  struct glfs *glfs;
> >  struct glfs_fd *fd;
> > +bool supports_seek_data;
> >  } BDRVGlusterState;
> >  
> >  typedef struct GlusterConf {
> > @@ -286,6 +287,33 @@ static void qemu_gluster_parse_flags(int bdrv_flags, 
> > int *open_flags)
> >  }
> >  }
> >  
> > +/*
> > + * Do SEEK_DATA/HOLE to detect if it is functional. In order to be usable, 
> > it
> > + * should return different values for the start of data and the start of a
> > + * hole. There are three different cases to handle:
> > + *
> > + *  - the same position is returned for data/hole (indicates broken gfapi)
> > + *  - an error is returned:
> > + * - ENXIO only gets returned if there is valid support on 
> > client+server
> > + * - EINVAL is returned when gfapi or the server does not support it
> > + */
> > +static bool qemu_gluster_test_seek(struct glfs_fd *fd)
> > +{
> > +off_t start_data, start_hole;
> > +bool supports_seek_data = false;
> > +
> > +start_data = glfs_lseek(fd, 0, SEEK_DATA);
> > +if (start_data != -1) {
> 
> I recommend just checking if the returned value is >= 0.

Ok, I can change that.

> > +start_hole = glfs_lseek(fd, 0, SEEK_HOLE);
> > +if (start_hole != -1)
> 
> Minor formatting nit: per QEMU coding standard, all conditional
> statements require brackets.

Ah, sure, will do.

> > +supports_seek_data = !(start_data == start_hole);
> > +} else if (errno == ENXIO) {
> 
> This errno check for ENXIO won't catch the case if an ENXIO error
> occurs in the SEEK_HOLE call.

I'm not sure if I'm following. lseek(SEEK_DATA) returns -1 and sets
errno to ENXIO when the position in the filedescriptor is EOF. In this
test, we check from position=0, so when ENXIO is returned, we know the
file is empty and the return value+errno has gone through the whole
Gluster stack.

EINVAL would be an other error that is expected, in case either (a
current) gfapi or the server do not support SEEK_DATA.

I do not think there is a need to check for ENXIO on lseek(SEEK_HOLE).

Do you have a preference on how I should change it?

> > +supports_seek_data = true;
> > +}
> > +
> > +return supports_seek_data;
> > +}
> > +
> >  static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
> >   int bdrv_flags, Error **errp)
> >  {
> > @@ -320,6 +348,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  
> > QDict *options,
> >  ret = -errno;
> >  }
> >  
> > +s->supports_seek_data = qemu_gluster_test_seek(s->fd);
> > +
> >  out:
> >  qemu_opts_del(opts);
> >  qemu_gluster_gconf_free(gconf);
> > @@ -677,6 +707,158 @@ static int 
> > qemu_gluster_has_zero_init(BlockDriverState *bs)
> >  return 0;
> >  }
> >  
> > +/*
> > + * Find allocation range in @bs around offset @start.
> > + * May change underlying file descriptor's file offset.
> > + * If @start is not in a hole, store @start in @data, and the
> > + * beginning of the next hole in @hole, and return 0.
> > + * If @start is in a non-trailing hole, store @start in @hole and the
> > + * beginning of the next non-hole in @data, and return 0.
> > + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> > + * If we can't find out, return a negative errno other than -ENXIO.
> > + *
> > + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> > + */
> > +static int find_allocation(BlockDriverState *bs, off_t start,
> > +   off_t *data, off_t *hole)
> > +{
> > +BDRVGlusterState *s = bs->opaque;
> > +off_t offs;
> > +
> > +if (!s->supports_seek_data)
> 
> Another formatting nit: brackets needed here as well.

Ok!

> > +return -EINVAL;
> 
> -ENOTSUP would probably be a better fit here, but I don't care 

Re: [Qemu-block] [PATCH v11 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-03-09 Thread Max Reitz
On 09.03.2016 04:51, Changlong Xie wrote:
> From: Wen Congyang 
> 
> Signed-off-by: Wen Congyang 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Gonglei 
> Signed-off-by: Changlong Xie 
> ---
>  block.c   |   8 ++--
>  block/quorum.c| 123 
> +-
>  include/block/block.h |   4 ++
>  3 files changed, 129 insertions(+), 6 deletions(-)
> 
> diff --git a/block.c b/block.c
> index d48f441..66d21af 100644
> --- a/block.c
> +++ b/block.c
> @@ -1194,10 +1194,10 @@ static int bdrv_fill_options(QDict **options, const 
> char *filename,
>  return 0;
>  }
>  
> -static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
> -BlockDriverState *child_bs,
> -const char *child_name,
> -const BdrvChildRole *child_role)
> +BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
> + BlockDriverState *child_bs,
> + const char *child_name,
> + const BdrvChildRole *child_role)
>  {
>  BdrvChild *child = g_new(BdrvChild, 1);
>  *child = (BdrvChild) {
> diff --git a/block/quorum.c b/block/quorum.c
> index 11cc60b..469e4a3 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -24,6 +24,7 @@
>  #include "qapi/qmp/qstring.h"
>  #include "qapi-event.h"
>  #include "crypto/hash.h"
> +#include "qemu/bitmap.h"
>  
>  #define HASH_LENGTH 32
>  
> @@ -81,6 +82,8 @@ typedef struct BDRVQuorumState {
>  bool rewrite_corrupted;/* true if the driver must rewrite-on-read 
> corrupted
>  * block if Quorum is reached.
>  */
> +unsigned long *index_bitmap;
> +int bsize;
>  
>  QuorumReadPattern read_pattern;
>  } BDRVQuorumState;
> @@ -877,9 +880,9 @@ static int quorum_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  ret = -EINVAL;
>  goto exit;
>  }
> -if (s->num_children < 2) {
> +if (s->num_children < 1) {
>  error_setg(_err,
> -   "Number of provided children must be greater than 1");
> +   "Number of provided children must be 1 or more");
>  ret = -EINVAL;
>  goto exit;
>  }
> @@ -928,6 +931,7 @@ static int quorum_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  /* allocate the children array */
>  s->children = g_new0(BdrvChild *, s->num_children);
>  opened = g_new0(bool, s->num_children);
> +s->index_bitmap = bitmap_new(s->num_children);
>  
>  for (i = 0; i < s->num_children; i++) {
>  char indexstr[32];
> @@ -943,6 +947,8 @@ static int quorum_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  
>  opened[i] = true;
>  }
> +bitmap_set(s->index_bitmap, 0, s->num_children);
> +s->bsize = s->num_children;
>  
>  g_free(opened);
>  goto exit;
> @@ -999,6 +1005,116 @@ static void quorum_attach_aio_context(BlockDriverState 
> *bs,
>  }
>  }
>  
> +static int get_new_child_index(BDRVQuorumState *s)
> +{
> +int index;
> +
> +index = find_next_zero_bit(s->index_bitmap, s->bsize, 0);
> +if (index < s->bsize) {
> +return index;
> +}
> +
> +s->index_bitmap = bitmap_zero_extend(s->index_bitmap, s->bsize,
> + s->bsize + 1);

If s->bsize == INT_MAX, then this will overflow to INT_MIN (probably).
This negative value will then be converted to a smaller negative value
by BITS_TO_LONGS() * sizeof(long) in bitmap_zero_extend(), and this
negative value will then be implicitly casted to a size_t value for the
g_realloc() call. On both 32 and 64 bit systems, allocating this will
probably fail due to insufficient memory which will then crash qemu.

One way to prevent this: Prevent the overflow in this function by
failing if s->bsize == INT_MAX before bitmap_zero_extend() is called.

Another way: Do not limit the number of children in quorum_add_child()
(and additionally in quorum_open()) to INT_MAX, but to something more
sane like 256 or 1024 or 65536 if you want to go really high (I can't
imagine anyone using more than 32 children). That way, s->bsize can
never grow to be INT_MAX in the first place.

In any case, qemu will probably crash long before this overflows because
trying to create 2G BDSs will definitely break something. This is why
I'd prefer the second approach (limiting the number of children to a
sane amount), and this is also why I don't actually care about this
overflow here:

In my opinion you don't need to change anything here. A follow-up patch
can take care of limiting the number of quorum children to a sane amount.

> +return s->bsize++;
> +}
> +
> +static void remove_child_index(BDRVQuorumState *s, int index)
> +{
> +

Re: [Qemu-block] [PATCH v11 1/3] Add new block driver interface to add/delete a BDS's child

2016-03-09 Thread Max Reitz
On 09.03.2016 04:51, Changlong Xie wrote:
> From: Wen Congyang 
> 
> In some cases, we want to take a quorum child offline, and take
> another child online.
> 
> Signed-off-by: Wen Congyang 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Gonglei 
> Signed-off-by: Changlong Xie 
> ---
>  block.c   | 49 
> +++
>  include/block/block.h |  4 
>  include/block/block_int.h |  5 +
>  3 files changed, 58 insertions(+)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 4/4] iotests: Add test for QMP event rates

2016-03-09 Thread Max Reitz
On 09.03.2016 17:11, Alberto Garcia wrote:
> This test verifies that the rate-limited QMP events are emitted at a
> maximum rate of 1 per second as defined in monitor_qapi_event_conf in
> monitor.c
> 
> It also checks that QUORUM_REPORT_BAD events generated from different
> nodes are kept in separate queues so they don't mask each other.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/146 | 129 
> +
>  tests/qemu-iotests/146.out |   5 ++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 135 insertions(+)
>  create mode 100644 tests/qemu-iotests/146
>  create mode 100644 tests/qemu-iotests/146.out
> 
> diff --git a/tests/qemu-iotests/146 b/tests/qemu-iotests/146
> new file mode 100644
> index 000..30bc379
> --- /dev/null
> +++ b/tests/qemu-iotests/146

[...]

> +# I/O errors in different children: all events are emitted
> +delay = 2 * event_rate
> +for i in range(len(imgs)):
> +self.vm.hmp_qemu_io("drive0", "aio_read %d %d" %
> +((offset + i) * sector_size, sector_size))
> +self.vm.qtest("clock_step %d" % delay)
> +self.do_check_event('img%d' % i, offset + i)
> +

A self.vm.qtest("clock_step %d" % (2 * event_rate)) may be useful here,
but it's not necessary.

Reviewed-by: Max Reitz 

> +# No more pending events
> +self.do_check_event(None)
> +
> +if __name__ == '__main__':
> +iotests.main(supported_fmts=["raw"])



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 3/4] monitor: Use QEMU_CLOCK_VIRTUAL for the event queue in qtest mode

2016-03-09 Thread Max Reitz
On 09.03.2016 17:11, Alberto Garcia wrote:
> This allows us to perform tests on the monitor queues to verify that
> the rate limits are enforced.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  monitor.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/monitor.c b/monitor.c
> index c9fe862..d689b83 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -76,6 +76,7 @@
>  #include "qapi-event.h"
>  #include "qmp-introspect.h"
>  #include "sysemu/block-backend.h"
> +#include "sysemu/qtest.h"
>  
>  /* for hmp_info_irq/pic */
>  #if defined(TARGET_SPARC)
> @@ -232,6 +233,8 @@ static const mon_cmd_t qmp_cmds[];
>  
>  Monitor *cur_mon;
>  
> +static QEMUClockType clock_type = QEMU_CLOCK_REALTIME;

Maybe event_clock_type would be a better name.

Regardless:

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 2/4] monitor: Separate QUORUM_REPORT_BAD events according to the node name

2016-03-09 Thread Max Reitz
On 09.03.2016 17:11, Alberto Garcia wrote:
> The QUORUM_REPORT_BAD event is emitted whenever there's an I/O error
> in a child of a Quorum device. This event is emitted at a maximum rate
> of 1 per second. This means that an error in one of the children will
> mask errors in the other children if they happen within the same 1
> second interval.
> 
> This patch modifies qapi_event_throttle_equal() so QUORUM_REPORT_BAD
> events are kept separately if they come from different children.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  monitor.c | 9 +
>  1 file changed, 9 insertions(+)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] bogus bdrv_check_request in bdrv_co_discard

2016-03-09 Thread Olaf Hering
On Wed, Mar 09, Olaf Hering wrote:

> Why does the code use signed ints anyway for sectors and offset?!

I have to check mainline (next week), at least this fixes mkfs for me:

+++ xen-4.4.3-testing/tools/qemu-xen-dir-remote/block/raw-posix.c
@@ -792,8 +792,8 @@ static BlockDriverAIOCB *paio_submit(Blo
 acb->aio_iov = qiov->iov;
 acb->aio_niov = qiov->niov;
 }
-acb->aio_nbytes = nb_sectors * 512;
-acb->aio_offset = sector_num * 512;
+acb->aio_nbytes = nb_sectors * 512U;
+acb->aio_offset = sector_num * 512U;
 
 trace_paio_submit(acb, opaque, sector_num, nb_sectors, type);
 pool = aio_get_thread_pool(bdrv_get_aio_context(bs));

Olaf



Re: [Qemu-block] bogus bdrv_check_request in bdrv_co_discard

2016-03-09 Thread Olaf Hering
On Wed, Mar 09, Kevin Wolf wrote:

> Removing integer overflow checks without removing the potentially
> overflowing operation doesn't feel like a particularly good idea,
> though.

Why does the code use signed ints anyway for sectors and offset?!

Olaf



[Qemu-block] [PATCH 2/4] monitor: Separate QUORUM_REPORT_BAD events according to the node name

2016-03-09 Thread Alberto Garcia
The QUORUM_REPORT_BAD event is emitted whenever there's an I/O error
in a child of a Quorum device. This event is emitted at a maximum rate
of 1 per second. This means that an error in one of the children will
mask errors in the other children if they happen within the same 1
second interval.

This patch modifies qapi_event_throttle_equal() so QUORUM_REPORT_BAD
events are kept separately if they come from different children.

Signed-off-by: Alberto Garcia 
---
 monitor.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/monitor.c b/monitor.c
index e99ca8c..c9fe862 100644
--- a/monitor.c
+++ b/monitor.c
@@ -572,6 +572,10 @@ static unsigned int qapi_event_throttle_hash(const void 
*key)
 hash += g_str_hash(qdict_get_str(evstate->data, "id"));
 }
 
+if (evstate->event == QAPI_EVENT_QUORUM_REPORT_BAD) {
+hash += g_str_hash(qdict_get_str(evstate->data, "node-name"));
+}
+
 return hash;
 }
 
@@ -589,6 +593,11 @@ static gboolean qapi_event_throttle_equal(const void *a, 
const void *b)
qdict_get_str(evb->data, "id"));
 }
 
+if (eva->event == QAPI_EVENT_QUORUM_REPORT_BAD) {
+return !strcmp(qdict_get_str(eva->data, "node-name"),
+   qdict_get_str(evb->data, "node-name"));
+}
+
 return TRUE;
 }
 
-- 
2.7.0




[Qemu-block] [PATCH 0/4] Separate QUORUM_REPORT_BAD events according to their node name

2016-03-09 Thread Alberto Garcia
Hi,

this was proposed by Eric in a recent email, but I'll summarize it
here:

QUORUM_REPORT_BAD events are limited to a maximum rate of 1 per
second. While this is not a problem in itself, this means that an
error in one a Quorum child will mask errors in the other children if
they happen within the same 1 second interval.

This series fixes that problem by separating these events in different
queues if they come from different nodes. Once we add the 'type' field
to QUORUM_REPORT_BAD we will also be able to classify them according
to the type if we want.

In addition to the above, this series also fixes a crash that happens
if there's an I/O error in one of the children. This is serious enough
so I'll send the patch to fix this crash to qemu-stable as well.

Regards,

Beto

Alberto Garcia (4):
  quorum: Fix crash in quorum_aio_cb()
  monitor: Separate QUORUM_REPORT_BAD events according to the node name
  monitor: Use QEMU_CLOCK_VIRTUAL for the event queue in qtest mode
  iotests: Add test for QMP event rates

 block/quorum.c |  12 +++--
 monitor.c  |  22 ++--
 tests/qemu-iotests/146 | 129 +
 tests/qemu-iotests/146.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 5 files changed, 161 insertions(+), 8 deletions(-)
 create mode 100644 tests/qemu-iotests/146
 create mode 100644 tests/qemu-iotests/146.out

-- 
2.7.0




[Qemu-block] [PATCH 3/4] monitor: Use QEMU_CLOCK_VIRTUAL for the event queue in qtest mode

2016-03-09 Thread Alberto Garcia
This allows us to perform tests on the monitor queues to verify that
the rate limits are enforced.

Signed-off-by: Alberto Garcia 
---
 monitor.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/monitor.c b/monitor.c
index c9fe862..d689b83 100644
--- a/monitor.c
+++ b/monitor.c
@@ -76,6 +76,7 @@
 #include "qapi-event.h"
 #include "qmp-introspect.h"
 #include "sysemu/block-backend.h"
+#include "sysemu/qtest.h"
 
 /* for hmp_info_irq/pic */
 #if defined(TARGET_SPARC)
@@ -232,6 +233,8 @@ static const mon_cmd_t qmp_cmds[];
 
 Monitor *cur_mon;
 
+static QEMUClockType clock_type = QEMU_CLOCK_REALTIME;
+
 static void monitor_command_cb(void *opaque, const char *cmdline,
void *readline_opaque);
 
@@ -513,7 +516,7 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, 
Error **errp)
  * monitor_qapi_event_handler() in evconf->rate ns.  Any
  * events arriving before then will be delayed until then.
  */
-int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+int64_t now = qemu_clock_get_ns(clock_type);
 
 monitor_qapi_event_emit(event, qdict);
 
@@ -522,7 +525,7 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, 
Error **errp)
 evstate->data = data;
 QINCREF(evstate->data);
 evstate->qdict = NULL;
-evstate->timer = timer_new_ns(QEMU_CLOCK_REALTIME,
+evstate->timer = timer_new_ns(clock_type,
   monitor_qapi_event_handler,
   evstate);
 g_hash_table_add(monitor_qapi_event_state, evstate);
@@ -547,7 +550,7 @@ static void monitor_qapi_event_handler(void *opaque)
 qemu_mutex_lock(_lock);
 
 if (evstate->qdict) {
-int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+int64_t now = qemu_clock_get_ns(clock_type);
 
 monitor_qapi_event_emit(evstate->event, evstate->qdict);
 QDECREF(evstate->qdict);
@@ -603,6 +606,10 @@ static gboolean qapi_event_throttle_equal(const void *a, 
const void *b)
 
 static void monitor_qapi_event_init(void)
 {
+if (qtest_enabled()) {
+clock_type = QEMU_CLOCK_VIRTUAL;
+}
+
 monitor_qapi_event_state = g_hash_table_new(qapi_event_throttle_hash,
 qapi_event_throttle_equal);
 qmp_event_set_func_emit(monitor_qapi_event_queue);
-- 
2.7.0




[Qemu-block] [PATCH 1/4] quorum: Fix crash in quorum_aio_cb()

2016-03-09 Thread Alberto Garcia
quorum_aio_cb() emits the QUORUM_REPORT_BAD event if there's
an I/O error in a Quorum child. However sacb->aiocb must be
correctly initialized for this to happen. read_quorum_children() and
read_fifo_child() are not doing this, which results in a QEMU crash.

Signed-off-by: Alberto Garcia 
---
 block/quorum.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 11cc60b..73ff309 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -648,8 +648,9 @@ static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
 }
 
 for (i = 0; i < s->num_children; i++) {
-bdrv_aio_readv(s->children[i]->bs, acb->sector_num, >qcrs[i].qiov,
-   acb->nb_sectors, quorum_aio_cb, >qcrs[i]);
+acb->qcrs[i].aiocb = bdrv_aio_readv(s->children[i]->bs, 
acb->sector_num,
+>qcrs[i].qiov, 
acb->nb_sectors,
+quorum_aio_cb, >qcrs[i]);
 }
 
 return >common;
@@ -664,9 +665,10 @@ static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
 qemu_iovec_init(>qcrs[acb->child_iter].qiov, acb->qiov->niov);
 qemu_iovec_clone(>qcrs[acb->child_iter].qiov, acb->qiov,
  acb->qcrs[acb->child_iter].buf);
-bdrv_aio_readv(s->children[acb->child_iter]->bs, acb->sector_num,
-   >qcrs[acb->child_iter].qiov, acb->nb_sectors,
-   quorum_aio_cb, >qcrs[acb->child_iter]);
+acb->qcrs[acb->child_iter].aiocb =
+bdrv_aio_readv(s->children[acb->child_iter]->bs, acb->sector_num,
+   >qcrs[acb->child_iter].qiov, acb->nb_sectors,
+   quorum_aio_cb, >qcrs[acb->child_iter]);
 
 return >common;
 }
-- 
2.7.0




[Qemu-block] [PATCH 4/4] iotests: Add test for QMP event rates

2016-03-09 Thread Alberto Garcia
This test verifies that the rate-limited QMP events are emitted at a
maximum rate of 1 per second as defined in monitor_qapi_event_conf in
monitor.c

It also checks that QUORUM_REPORT_BAD events generated from different
nodes are kept in separate queues so they don't mask each other.

Signed-off-by: Alberto Garcia 
---
 tests/qemu-iotests/146 | 129 +
 tests/qemu-iotests/146.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 135 insertions(+)
 create mode 100644 tests/qemu-iotests/146
 create mode 100644 tests/qemu-iotests/146.out

diff --git a/tests/qemu-iotests/146 b/tests/qemu-iotests/146
new file mode 100644
index 000..30bc379
--- /dev/null
+++ b/tests/qemu-iotests/146
@@ -0,0 +1,129 @@
+#!/usr/bin/env python
+#
+# Test the rate limit of QMP events
+#
+# Copyright (C) 2016 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import iotests
+
+imgs = (os.path.join(iotests.test_dir, 'quorum0.img'),
+os.path.join(iotests.test_dir, 'quorum1.img'),
+os.path.join(iotests.test_dir, 'quorum2.img'))
+
+img_conf = (os.path.join(iotests.test_dir, 'quorum0.conf'),
+os.path.join(iotests.test_dir, 'quorum1.conf'),
+os.path.join(iotests.test_dir, 'quorum2.conf'))
+
+event_rate = 10
+sector_size = 512
+offset = 10
+
+class TestQuorumEvents(iotests.QMPTestCase):
+
+def create_blkdebug_file(self, blkdebug_file, bad_sector):
+file = open(blkdebug_file, 'w')
+file.write('''
+[inject-error]
+event = "read_aio"
+errno = "5"
+sector = "%d"
+''' % bad_sector)
+file.close()
+
+def setUp(self):
+driveopts = ['driver=quorum', 'vote-threshold=2']
+for i in range(len(imgs)):
+iotests.qemu_img('create', '-f', iotests.imgfmt, imgs[i], '1M')
+self.create_blkdebug_file(img_conf[i], i + offset)
+driveopts.append('children.%d.driver=%s' % (i, iotests.imgfmt))
+driveopts.append('children.%d.file.driver=blkdebug' % i)
+driveopts.append('children.%d.file.config=%s' % (i, img_conf[i]))
+driveopts.append('children.%d.file.image.filename=%s' % (i, 
imgs[i]))
+driveopts.append('children.%d.node-name=img%d' % (i, i))
+self.vm = iotests.VM()
+self.vm.add_drive(None, opts = ','.join(driveopts))
+self.vm.launch()
+
+def tearDown(self):
+self.vm.shutdown()
+for i in range(len(imgs)):
+os.remove(imgs[i])
+os.remove(img_conf[i])
+
+def do_check_event(self, node, sector = 0):
+if node == None:
+self.assertEqual(self.vm.get_qmp_event(), None)
+return
+
+for event in self.vm.get_qmp_events(wait=True):
+if event['event'] == 'QUORUM_REPORT_BAD':
+self.assert_qmp(event, 'data/node-name', node)
+self.assert_qmp(event, 'data/sector-num', sector)
+
+def testQuorum(self):
+if not 'quorum' in iotests.qemu_img_pipe('--help'):
+return
+
+# Generate an error and get an event
+self.vm.hmp_qemu_io("drive0", "aio_read %d %d" %
+(offset * sector_size, sector_size))
+self.vm.qtest("clock_step 10")
+self.do_check_event('img0', offset)
+
+# I/O errors in the same child: only one event is emitted
+delay = 10
+for i in range(3):
+self.vm.hmp_qemu_io("drive0", "aio_read %d %d" %
+(offset * sector_size, sector_size))
+self.vm.qtest("clock_step %d" % delay)
+self.do_check_event(None)
+
+# Wait enough so the event is finally emitted
+self.vm.qtest("clock_step %d" % (2 * event_rate))
+self.do_check_event('img0', offset)
+
+# I/O errors in the same child: all events are emitted
+delay = 2 * event_rate
+for i in range(3):
+self.vm.hmp_qemu_io("drive0", "aio_read %d %d" %
+(offset * sector_size, sector_size))
+self.vm.qtest("clock_step %d" % delay)
+self.do_check_event('img0', offset)
+
+# I/O errors in different children: all events are emitted
+delay = 10
+for i in range(len(imgs)):
+

Re: [Qemu-block] [PATCH v2] block/gluster: add support for SEEK_DATA/SEEK_HOLE

2016-03-09 Thread Jeff Cody
On Wed, Mar 09, 2016 at 01:30:14PM +0100, Niels de Vos wrote:
> GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> it possible to detect sparse areas in files.
> 
> Signed-off-by: Niels de Vos 
> 
> ---
> Tested by compiling and running "qemu-img map gluster://..." with a
> build of the current master branch of glusterfs. Using a Fedora cloud
> image (in raw format) shows many SEEK procudure calls going back and
> forth over the network. The output of "qemu map" matches the output when
> run against the image on the local filesystem.
> 
> v2 based on feedback from Jeff Cody:
> - Replace compile time detection by runtime detection
> - Update return pointer (new argument) for .bdrv_co_get_block_status
> ---
>  block/gluster.c | 186 
> 
>  1 file changed, 186 insertions(+)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index 65077a0..b01ab52 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
>  typedef struct BDRVGlusterState {
>  struct glfs *glfs;
>  struct glfs_fd *fd;
> +bool supports_seek_data;
>  } BDRVGlusterState;
>  
>  typedef struct GlusterConf {
> @@ -286,6 +287,33 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int 
> *open_flags)
>  }
>  }
>  
> +/*
> + * Do SEEK_DATA/HOLE to detect if it is functional. In order to be usable, it
> + * should return different values for the start of data and the start of a
> + * hole. There are three different cases to handle:
> + *
> + *  - the same position is returned for data/hole (indicates broken gfapi)
> + *  - an error is returned:
> + * - ENXIO only gets returned if there is valid support on client+server
> + * - EINVAL is returned when gfapi or the server does not support it
> + */
> +static bool qemu_gluster_test_seek(struct glfs_fd *fd)
> +{
> +off_t start_data, start_hole;
> +bool supports_seek_data = false;
> +
> +start_data = glfs_lseek(fd, 0, SEEK_DATA);
> +if (start_data != -1) {

I recommend just checking if the returned value is >= 0.

> +start_hole = glfs_lseek(fd, 0, SEEK_HOLE);
> +if (start_hole != -1)

Minor formatting nit: per QEMU coding standard, all conditional
statements require brackets.

> +supports_seek_data = !(start_data == start_hole);
> +} else if (errno == ENXIO) {

This errno check for ENXIO won't catch the case if an ENXIO error
occurs in the SEEK_HOLE call.

> +supports_seek_data = true;
> +}
> +
> +return supports_seek_data;
> +}
> +
>  static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
>   int bdrv_flags, Error **errp)
>  {
> @@ -320,6 +348,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict 
> *options,
>  ret = -errno;
>  }
>  
> +s->supports_seek_data = qemu_gluster_test_seek(s->fd);
> +
>  out:
>  qemu_opts_del(opts);
>  qemu_gluster_gconf_free(gconf);
> @@ -677,6 +707,158 @@ static int qemu_gluster_has_zero_init(BlockDriverState 
> *bs)
>  return 0;
>  }
>  
> +/*
> + * Find allocation range in @bs around offset @start.
> + * May change underlying file descriptor's file offset.
> + * If @start is not in a hole, store @start in @data, and the
> + * beginning of the next hole in @hole, and return 0.
> + * If @start is in a non-trailing hole, store @start in @hole and the
> + * beginning of the next non-hole in @data, and return 0.
> + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> + * If we can't find out, return a negative errno other than -ENXIO.
> + *
> + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> + */
> +static int find_allocation(BlockDriverState *bs, off_t start,
> +   off_t *data, off_t *hole)
> +{
> +BDRVGlusterState *s = bs->opaque;
> +off_t offs;
> +
> +if (!s->supports_seek_data)

Another formatting nit: brackets needed here as well.

> +return -EINVAL;

-ENOTSUP would probably be a better fit here, but I don't care too
much, since the error code isn't passed along outside the gluster
driver.

> +
> +/*
> + * SEEK_DATA cases:
> + * D1. offs == start: start is in data
> + * D2. offs > start: start is in a hole, next data at offs
> + * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
> + *  or start is beyond EOF
> + * If the latter happens, the file has been truncated behind
> + * our back since we opened it.  All bets are off then.
> + * Treating like a trailing hole is simplest.
> + * D4. offs < 0, errno != ENXIO: we learned nothing
> + */
> +offs = glfs_lseek(s->fd, start, SEEK_DATA);
> +if (offs < 0) {
> +return -errno;  /* D3 or D4 */
> +}
> +assert(offs >= start);
> +
> +if (offs > start) {
> +/* D2: in hole, next data at offs */
> +

Re: [Qemu-block] [PATCH] qed: fix bdrv_qed_drain

2016-03-09 Thread Stefan Hajnoczi
On Tue, Mar 08, 2016 at 10:58:47AM +0100, Kevin Wolf wrote:
> Am 07.03.2016 um 21:56 hat Stefan Hajnoczi geschrieben:
> > On Mon, Mar 07, 2016 at 05:57:41PM +0100, Kevin Wolf wrote:
> > > Am 23.02.2016 um 14:54 hat Paolo Bonzini geschrieben:
> > > > 
> > > > 
> > > > On 23/02/2016 13:49, Fam Zheng wrote:
> > > > > On Tue, 02/23 11:43, Paolo Bonzini wrote:
> > > > >>
> > > > >>
> > > > >> On 23/02/2016 06:57, Fam Zheng wrote:
> > > > >> +qed_cancel_need_check_timer(s);
> > > > >> +qed_need_check_timer_cb(s);
> > > > >> +}
> > > > >
> > > > > What if an allocating write is queued (the else branch case)? Its 
> > > > > completion
> > > > > will be in bdrv_drain and it could arm the need_check_timer which 
> > > > > is wrong.
> > > > >
> > > > > We need to drain the allocating_write_reqs queue before checking 
> > > > > the timer.
> > > > 
> > > >  You're right, but how?  That's what bdrv_drain(bs) does, it's a
> > > >  chicken-and-egg problem.
> > > > >>>
> > > > >>> Maybe use an aio_poll loop before the if?
> > > > >>
> > > > >> That would not change the fact that you're reimplementing bdrv_drain
> > > > >> inside bdrv_qed_drain.
> > > > > 
> > > > > But it fulfills the contract of .bdrv_drain. This is the easy way, 
> > > > > the hard way
> > > > > would be iterating through the allocating_write_reqs list and process 
> > > > > reqs one
> > > > > by one synchronously, which still involves aio_poll indirectly.
> > > > 
> > > > The easy way would be better then.
> > > > 
> > > > Stefan, any second opinion?
> > > 
> > > What's the status here? It would be good to have qed not segfaulting all
> > > the time.
> > 
> > I think the timer concept itself is troublesome.  A radical approach but
> > limited to changing QED itself is to drop the timer and instead keep a
> > timestamp for the last allocating write request.  Next time a write
> > request (allocating or rewriting) is about to complete, do the flush and
> > clear the need check flag as part of the write request (if 5 seconds
> > have passed since the timestamp).
> 
> Isn't that kind of backwards, though, because it's very likely that
> after some new activity a second write request will come in soon and
> mark the image dirty again?
> 
> Apart from that, doesn't it fail to provide the intended effect, that
> the image doesn't stay dirty for a long time and a repair isn't usually
> needed after a crash?

Yes, it relies on a non-allocating write request after the timeout.

I've reverted the drain patch for now.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v10 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-03-09 Thread Max Reitz
On 08.03.2016 03:57, Changlong Xie wrote:
> On 03/08/2016 12:02 AM, Max Reitz wrote:
>> On 07.03.2016 17:02, Eric Blake wrote:
>>> On 03/05/2016 11:13 AM, Max Reitz wrote:
>>>
> +index = atoi(child->name + 9);

 Optional: Assert absence of an error:

>>>
>>> Indeed, atoi() is worthless, because it cannot do error detection.
>>>
 unsigned long index;
 char *endptr;

 index = strtoul(child->name + 9, , 10);
 assert(index >= 0 && !*endptr);
>>>
>>> Still incorrect; you aren't handling errno properly for detecting all
>>> errors.  Even better is to use qemu_strtoul(), which already handles
>>> proper error detection.
>>
>> Yeah, I keep forgetting that it returns ULONG_MAX on range error...
> 
> Yes, we should limit the range to INT_MAX. How do you like the following
> codes, i just steal it from xen_host_pci_get_value().
> 
> int rc;
> const char *endptr;
> unsigned long value;
> 
> assert(!strncmp(child->name, "children.", 9));
> rc = qemu_strtoul(child->name + 9, , 10, );

Passing NULL instead of  will make qemu_strtoul() check that the
string passed to it (child->name + 9) only consists of a number; which
should be true here, so you can do that (pass NULL instead of ).

> if (!rc) {
> assert(value <= INT_MAX);
> index = value;
> } else {
> error_setg_errno(errp, -rc, "Failed to parse value '%s'",
>  child->name + 9);
> return;
> }

You could simplify this as

assert(!rc && value <= INT_MAX);
index = value;

(It should be impossible for qemu_strtoul() to return an error here, so
an assert() is just as fine as a normal error.)

And you could get rid of the index = value assignment by making index an
unsigned long and replacing all instances of "value" by "index".

Max

> 
> Thanks
> -Xie
> 
>>
>> Max
>>
> 
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] bogus bdrv_check_request in bdrv_co_discard

2016-03-09 Thread Kevin Wolf
Am 09.03.2016 um 15:45 hat Olaf Hering geschrieben:
> On Wed, Mar 09, Paolo Bonzini wrote:
> 
> > It probably should range check max_unmap_size and max_io_size against
> > BDRV_REQUEST_MAX_SECTORS, and reject anything bigger than that, though.
> 
> Are you sure? Shouldnt the only check be if sect+num is inside the disk
> size? And everything smaller should be automatically split by qemu to
> whatever num.
> 
> I will see what works for me, probably something like this is already enough
> (for 2.0.2):
> 
> --- xen-4.5.2-testing.orig/tools/qemu-xen-dir-remote/block.c
> +++ xen-4.5.2-testing/tools/qemu-xen-dir-remote/block.c
> @@ -4898,7 +4898,8 @@ int coroutine_fn bdrv_co_discard(BlockDr
>  
>  if (!bs->drv) {
>  return -ENOMEDIUM;
> -} else if (bdrv_check_request(bs, sector_num, nb_sectors)) {
> +} else if (bdrv_check_byte_request(bs, sector_num * BDRV_SECTOR_SIZE,
> +   nb_sectors * BDRV_SECTOR_SIZE)) {
>  return -EIO;

Removing integer overflow checks without removing the potentially
overflowing operation doesn't feel like a particularly good idea,
though.

Kevin



[Qemu-block] bogus bdrv_check_request in bdrv_co_discard

2016-03-09 Thread Olaf Hering
What is the purpose of the bdrv_check_request() call in bdrv_co_discard?

It seems a frontend cant possibly know what the limit is in the
qemu-of-the-day, I found no interface to propagate
BDRV_REQUEST_MAX_SECTORS into the guest.

I think to handle nb_sectors > BDRV_REQUEST_MAX_SECTORS bdrv_co_discard
has to split the request into smaller chunks, just as it does a few
lines down in the function.


Olaf



Re: [Qemu-block] [Qemu-devel] [PATCH 0/3] Early release of -drive QemuOpts

2016-03-09 Thread Markus Armbruster
Kevin Wolf  writes:

> Am 09.03.2016 um 13:20 hat Paolo Bonzini geschrieben:
>> On 22/02/2016 15:39, Paolo Bonzini wrote:
>> > In short, this patch gets rid of blockdev_mark_auto_del and
>> > blockdev_auto_del.
>> > 
>> > With these patches, it is possible to create a new -drive with the same
>> > id as soon as the DEVICE_DELETED event is delivered (which equals to
>> > unrealize).
>> > 
>> > I'm sorry I'm not able to explain the history (and probably do not
>> > understand the full ramifications) of this.  That's why this is just
>> > an RFC.
>> > 
>> > The idea here is that reference counting the BlockBackend is enough to
>> > defer the deletion of the block device as much as necessary; anticipating
>> > the destruction of the DriveInfo is not a problem, and has the desired
>> > effect of freeing the QemuOpts.
>> > 
>> > Patches 1 and 3 are mostly similar to the version I had earlier sent as
>> > RFC, but they now pass all unit tests.  Patch 2 is new, but I don't know
>> > of a test that fails it.
>> > 
>> > Paolo
>> > 
>> > Paolo Bonzini (3):
>> >   block: detach devices from DriveInfo at unrealize time
>> >   block: keep BlockBackend alive until device finalize time
>> >   block: remove legacy_dinfo at blk_detach_dev time
>> > 
>> >  block/block-backend.c| 13 +
>> >  blockdev.c   | 28 +---
>> >  hw/block/virtio-blk.c|  4 +++-
>> >  hw/block/xen_disk.c  |  1 +
>> >  hw/core/qdev-properties-system.c | 13 +++--
>> >  hw/ide/piix.c|  3 +++
>> >  hw/scsi/scsi-bus.c   |  4 +++-
>> >  hw/usb/dev-storage.c |  3 ++-
>> >  include/sysemu/blockdev.h|  5 ++---
>> >  9 files changed, 43 insertions(+), 31 deletions(-)
>> > 
>> 
>> Ping?!?
>
> Markus, can you please review this? You seem to have a better
> understanding of DriveInfo and related magic.

Okay, I'll try to get to it this week.



[Qemu-block] [PATCH v2] block/gluster: add support for SEEK_DATA/SEEK_HOLE

2016-03-09 Thread Niels de Vos
GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
it possible to detect sparse areas in files.

Signed-off-by: Niels de Vos 

---
Tested by compiling and running "qemu-img map gluster://..." with a
build of the current master branch of glusterfs. Using a Fedora cloud
image (in raw format) shows many SEEK procudure calls going back and
forth over the network. The output of "qemu map" matches the output when
run against the image on the local filesystem.

v2 based on feedback from Jeff Cody:
- Replace compile time detection by runtime detection
- Update return pointer (new argument) for .bdrv_co_get_block_status
---
 block/gluster.c | 186 
 1 file changed, 186 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index 65077a0..b01ab52 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -23,6 +23,7 @@ typedef struct GlusterAIOCB {
 typedef struct BDRVGlusterState {
 struct glfs *glfs;
 struct glfs_fd *fd;
+bool supports_seek_data;
 } BDRVGlusterState;
 
 typedef struct GlusterConf {
@@ -286,6 +287,33 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int 
*open_flags)
 }
 }
 
+/*
+ * Do SEEK_DATA/HOLE to detect if it is functional. In order to be usable, it
+ * should return different values for the start of data and the start of a
+ * hole. There are three different cases to handle:
+ *
+ *  - the same position is returned for data/hole (indicates broken gfapi)
+ *  - an error is returned:
+ * - ENXIO only gets returned if there is valid support on client+server
+ * - EINVAL is returned when gfapi or the server does not support it
+ */
+static bool qemu_gluster_test_seek(struct glfs_fd *fd)
+{
+off_t start_data, start_hole;
+bool supports_seek_data = false;
+
+start_data = glfs_lseek(fd, 0, SEEK_DATA);
+if (start_data != -1) {
+start_hole = glfs_lseek(fd, 0, SEEK_HOLE);
+if (start_hole != -1)
+supports_seek_data = !(start_data == start_hole);
+} else if (errno == ENXIO) {
+supports_seek_data = true;
+}
+
+return supports_seek_data;
+}
+
 static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
  int bdrv_flags, Error **errp)
 {
@@ -320,6 +348,8 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict 
*options,
 ret = -errno;
 }
 
+s->supports_seek_data = qemu_gluster_test_seek(s->fd);
+
 out:
 qemu_opts_del(opts);
 qemu_gluster_gconf_free(gconf);
@@ -677,6 +707,158 @@ static int qemu_gluster_has_zero_init(BlockDriverState 
*bs)
 return 0;
 }
 
+/*
+ * Find allocation range in @bs around offset @start.
+ * May change underlying file descriptor's file offset.
+ * If @start is not in a hole, store @start in @data, and the
+ * beginning of the next hole in @hole, and return 0.
+ * If @start is in a non-trailing hole, store @start in @hole and the
+ * beginning of the next non-hole in @data, and return 0.
+ * If @start is in a trailing hole or beyond EOF, return -ENXIO.
+ * If we can't find out, return a negative errno other than -ENXIO.
+ *
+ * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
+ */
+static int find_allocation(BlockDriverState *bs, off_t start,
+   off_t *data, off_t *hole)
+{
+BDRVGlusterState *s = bs->opaque;
+off_t offs;
+
+if (!s->supports_seek_data)
+return -EINVAL;
+
+/*
+ * SEEK_DATA cases:
+ * D1. offs == start: start is in data
+ * D2. offs > start: start is in a hole, next data at offs
+ * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
+ *  or start is beyond EOF
+ * If the latter happens, the file has been truncated behind
+ * our back since we opened it.  All bets are off then.
+ * Treating like a trailing hole is simplest.
+ * D4. offs < 0, errno != ENXIO: we learned nothing
+ */
+offs = glfs_lseek(s->fd, start, SEEK_DATA);
+if (offs < 0) {
+return -errno;  /* D3 or D4 */
+}
+assert(offs >= start);
+
+if (offs > start) {
+/* D2: in hole, next data at offs */
+*hole = start;
+*data = offs;
+return 0;
+}
+
+/* D1: in data, end not yet known */
+
+/*
+ * SEEK_HOLE cases:
+ * H1. offs == start: start is in a hole
+ * If this happens here, a hole has been dug behind our back
+ * since the previous lseek().
+ * H2. offs > start: either start is in data, next hole at offs,
+ *   or start is in trailing hole, EOF at offs
+ * Linux treats trailing holes like any other hole: offs ==
+ * start.  Solaris seeks to EOF instead: offs > start (blech).
+ * If that happens here, a hole has been dug behind our back
+ * since the previous lseek().
+ * H3. offs < 0, errno = ENXIO: start is beyond EOF
+ * If this 

Re: [Qemu-block] [PATCH 0/3] Early release of -drive QemuOpts

2016-03-09 Thread Kevin Wolf
Am 09.03.2016 um 13:20 hat Paolo Bonzini geschrieben:
> On 22/02/2016 15:39, Paolo Bonzini wrote:
> > In short, this patch gets rid of blockdev_mark_auto_del and
> > blockdev_auto_del.
> > 
> > With these patches, it is possible to create a new -drive with the same
> > id as soon as the DEVICE_DELETED event is delivered (which equals to
> > unrealize).
> > 
> > I'm sorry I'm not able to explain the history (and probably do not
> > understand the full ramifications) of this.  That's why this is just
> > an RFC.
> > 
> > The idea here is that reference counting the BlockBackend is enough to
> > defer the deletion of the block device as much as necessary; anticipating
> > the destruction of the DriveInfo is not a problem, and has the desired
> > effect of freeing the QemuOpts.
> > 
> > Patches 1 and 3 are mostly similar to the version I had earlier sent as
> > RFC, but they now pass all unit tests.  Patch 2 is new, but I don't know
> > of a test that fails it.
> > 
> > Paolo
> > 
> > Paolo Bonzini (3):
> >   block: detach devices from DriveInfo at unrealize time
> >   block: keep BlockBackend alive until device finalize time
> >   block: remove legacy_dinfo at blk_detach_dev time
> > 
> >  block/block-backend.c| 13 +
> >  blockdev.c   | 28 +---
> >  hw/block/virtio-blk.c|  4 +++-
> >  hw/block/xen_disk.c  |  1 +
> >  hw/core/qdev-properties-system.c | 13 +++--
> >  hw/ide/piix.c|  3 +++
> >  hw/scsi/scsi-bus.c   |  4 +++-
> >  hw/usb/dev-storage.c |  3 ++-
> >  include/sysemu/blockdev.h|  5 ++---
> >  9 files changed, 43 insertions(+), 31 deletions(-)
> > 
> 
> Ping?!?

Markus, can you please review this? You seem to have a better
understanding of DriveInfo and related magic.

Kevin



Re: [Qemu-block] bogus bdrv_check_request in bdrv_co_discard

2016-03-09 Thread Paolo Bonzini


On 09/03/2016 13:11, Olaf Hering wrote:
> What is the purpose of the bdrv_check_request() call in bdrv_co_discard?
> 
> It seems a frontend cant possibly know what the limit is in the
> qemu-of-the-day, I found no interface to propagate
> BDRV_REQUEST_MAX_SECTORS into the guest.

It depends on the backend.  For example SCSI uses the block limits VPD
page.  It has a default max-unmap-size of 1 GiB, which happens to be
smaller than BDRV_REQUEST_MAX_SECTORS too.

It probably should range check max_unmap_size and max_io_size against
BDRV_REQUEST_MAX_SECTORS, and reject anything bigger than that, though.

Paolo

> I think to handle nb_sectors > BDRV_REQUEST_MAX_SECTORS bdrv_co_discard
> has to split the request into smaller chunks, just as it does a few
> lines down in the function.



Re: [Qemu-block] [PATCH 01/11] block: Use writeback in .bdrv_create() implementations

2016-03-09 Thread Kevin Wolf
Am 09.03.2016 um 13:14 hat Paolo Bonzini geschrieben:
> 
> 
> On 08/03/2016 17:34, Kevin Wolf wrote:
> > There's no reason to use a writethrough cache mode while creating an
> > image.
> 
> There's no reason to do flushes in fact, so you could use
> BDRV_O_NO_FLUSH too. :)

That's true. On the other hand, we don't issue flushes anyway, so
there's little reason to ignore non-existing flushes. :-)

The only part where it might make sense is where image formats don't
open the raw image file, but actually open the image with themselves and
allocate things. qcow2 at least already sets BDRV_O_NO_FLUSH for that
part.

In fact, the only bdrv_open() touched in this patch that doesn't have
BDRV_O_PROTOCOL set, is one instance in VMDK where it opens the backing
file to read some ID from the header. So no flush involved there.

Kevin



Re: [Qemu-block] [PATCH 0/3] Early release of -drive QemuOpts

2016-03-09 Thread Paolo Bonzini


On 22/02/2016 15:39, Paolo Bonzini wrote:
> In short, this patch gets rid of blockdev_mark_auto_del and
> blockdev_auto_del.
> 
> With these patches, it is possible to create a new -drive with the same
> id as soon as the DEVICE_DELETED event is delivered (which equals to
> unrealize).
> 
> I'm sorry I'm not able to explain the history (and probably do not
> understand the full ramifications) of this.  That's why this is just
> an RFC.
> 
> The idea here is that reference counting the BlockBackend is enough to
> defer the deletion of the block device as much as necessary; anticipating
> the destruction of the DriveInfo is not a problem, and has the desired
> effect of freeing the QemuOpts.
> 
> Patches 1 and 3 are mostly similar to the version I had earlier sent as
> RFC, but they now pass all unit tests.  Patch 2 is new, but I don't know
> of a test that fails it.
> 
> Paolo
> 
> Paolo Bonzini (3):
>   block: detach devices from DriveInfo at unrealize time
>   block: keep BlockBackend alive until device finalize time
>   block: remove legacy_dinfo at blk_detach_dev time
> 
>  block/block-backend.c| 13 +
>  blockdev.c   | 28 +---
>  hw/block/virtio-blk.c|  4 +++-
>  hw/block/xen_disk.c  |  1 +
>  hw/core/qdev-properties-system.c | 13 +++--
>  hw/ide/piix.c|  3 +++
>  hw/scsi/scsi-bus.c   |  4 +++-
>  hw/usb/dev-storage.c |  3 ++-
>  include/sysemu/blockdev.h|  5 ++---
>  9 files changed, 43 insertions(+), 31 deletions(-)
> 

Ping?!?



Re: [Qemu-block] [PATCH 01/11] block: Use writeback in .bdrv_create() implementations

2016-03-09 Thread Paolo Bonzini


On 08/03/2016 17:34, Kevin Wolf wrote:
> There's no reason to use a writethrough cache mode while creating an
> image.

There's no reason to do flushes in fact, so you could use
BDRV_O_NO_FLUSH too. :)

Paolo

> Signed-off-by: Kevin Wolf 
> ---
>  block/parallels.c | 3 ++-
>  block/qcow.c  | 3 ++-
>  block/qcow2.c | 3 ++-
>  block/sheepdog.c  | 6 --
>  block/vdi.c   | 3 ++-
>  block/vhdx.c  | 3 ++-
>  block/vmdk.c  | 9 ++---
>  block/vpc.c   | 3 ++-
>  8 files changed, 22 insertions(+), 11 deletions(-)
> 
> diff --git a/block/parallels.c b/block/parallels.c
> index 645521d..3f9fd48 100644
> --- a/block/parallels.c
> +++ b/block/parallels.c
> @@ -479,7 +479,8 @@ static int parallels_create(const char *filename, 
> QemuOpts *opts, Error **errp)
>  
>  file = NULL;
>  ret = bdrv_open(, filename, NULL, NULL,
> -BDRV_O_RDWR | BDRV_O_PROTOCOL, _err);
> +BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
> +_err);
>  if (ret < 0) {
>  error_propagate(errp, local_err);
>  return ret;
> diff --git a/block/qcow.c b/block/qcow.c
> index 251910c..c46810c 100644
> --- a/block/qcow.c
> +++ b/block/qcow.c
> @@ -798,7 +798,8 @@ static int qcow_create(const char *filename, QemuOpts 
> *opts, Error **errp)
>  
>  qcow_bs = NULL;
>  ret = bdrv_open(_bs, filename, NULL, NULL,
> -BDRV_O_RDWR | BDRV_O_PROTOCOL, _err);
> +BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
> +_err);
>  if (ret < 0) {
>  error_propagate(errp, local_err);
>  goto cleanup;
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 8babecd..5a79177 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2173,7 +2173,8 @@ static int qcow2_create2(const char *filename, int64_t 
> total_size,
>  }
>  
>  bs = NULL;
> -ret = bdrv_open(, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
> +ret = bdrv_open(, filename, NULL, NULL,
> +BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
>  _err);
>  if (ret < 0) {
>  error_propagate(errp, local_err);
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index 8739acc..5f31ab3 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -1646,7 +1646,8 @@ static int sd_prealloc(const char *filename, Error 
> **errp)
>  void *buf = NULL;
>  int ret;
>  
> -ret = bdrv_open(, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
> +ret = bdrv_open(, filename, NULL, NULL,
> +BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
>  errp);
>  if (ret < 0) {
>  goto out_with_err_set;
> @@ -1839,7 +1840,8 @@ static int sd_create(const char *filename, QemuOpts 
> *opts,
>  }
>  
>  bs = NULL;
> -ret = bdrv_open(, backing_file, NULL, NULL, BDRV_O_PROTOCOL, 
> errp);
> +ret = bdrv_open(, backing_file, NULL, NULL,
> +BDRV_O_PROTOCOL | BDRV_O_CACHE_WB, errp);
>  if (ret < 0) {
>  goto out;
>  }
> diff --git a/block/vdi.c b/block/vdi.c
> index b403243..12407c4 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -766,7 +766,8 @@ static int vdi_create(const char *filename, QemuOpts 
> *opts, Error **errp)
>  error_propagate(errp, local_err);
>  goto exit;
>  }
> -ret = bdrv_open(, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
> +ret = bdrv_open(, filename, NULL, NULL,
> +BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
>  _err);
>  if (ret < 0) {
>  error_propagate(errp, local_err);
> diff --git a/block/vhdx.c b/block/vhdx.c
> index 9a51428..ea030ad 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -1838,7 +1838,8 @@ static int vhdx_create(const char *filename, QemuOpts 
> *opts, Error **errp)
>  }
>  
>  bs = NULL;
> -ret = bdrv_open(, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
> +ret = bdrv_open(, filename, NULL, NULL,
> +BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
>  _err);
>  if (ret < 0) {
>  error_propagate(errp, local_err);
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 03be7f0..dd80936 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1664,7 +1664,8 @@ static int vmdk_create_extent(const char *filename, 
> int64_t filesize,
>  }
>  
>  assert(bs == NULL);
> -ret = bdrv_open(, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
> +ret = bdrv_open(, filename, NULL, NULL,
> +BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
>  _err);
>  if (ret < 0) {
>  error_propagate(errp, local_err);
> @@ -1944,7 +1945,8 @@ static int vmdk_create(const char *filename, QemuOpts 
> *opts, Error **errp)
>  ret = -ENOENT;
>  goto exit;
>  

Re: [Qemu-block] [PATCH 0/2] block: drive_add/del without BlockBackend

2016-03-09 Thread Kevin Wolf
Am 23.02.2016 um 18:16 hat Kevin Wolf geschrieben:
> See patch 1 for the detailed description why this is needed. The short
> version is that libvirt needs to enable detect-zeroes on a mirror
> target.
> 
> Peter, can you please check if this provides what you need? I checked
> whether I could do the mirroring operation manually; and if I did it
> right, it looks good to me. (Shortened) protocol of my test:
> [...]

Applied to the block branch.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication

2016-03-09 Thread Changlong Xie

On 03/05/2016 01:53 AM, Stefan Hajnoczi wrote:

On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:

+static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
+{
+Error *local_err = NULL;
+int ret;
+
+if (!s->secondary_disk->bs->job) {
+error_setg(errp, "Backup job was cancelled unexpectedly");
+return;
+}


Do you have test cases for the replication driver?

This error code path seems like something that should be exercised to
make sure it works.

Basic start/checkpoint/stop and checks that the active/secondary/hidden
disks contain the expected data are also important.

Probably the easiest way to do this would be a file called
tests/test-replication.c that calls start/checkpoint/stop and read/write
functions directly instead of running a guest.



Hi Stefan

I'm working on the testcase now, and encount some issues.

Replication is different from other block drivers, we just use it for 
COLO live migration what means "primary" and "secondary" *MUST*

co-work with each other.

(1)
There are two test "prototype":
1 "COLO prototype":
simulate COLO environment, run two guests at the same time and 
setup nbd on both 'primary'/'secondary' side.


2 "Separate prototype"
test relevant APIs in separate 'primary'/'secondary' mode, so we
don't need run two guests and setup nbd on both side

I prefer *"Separate prototype"*. Which one do you prefer?

(2)
Since we have replication_(start/stop/do_checkpoint/get_error)_all APIs, 
i know how to test start/checkpoint/stop/get_error callbacks.


But I'm confused about how to test read/write of replication, are there 
some ready-made APIs for qemu test system?


I know, for interpreted languages, we can use:
 $QEMU_IO -s -c "read 0 $size" "$TEST_IMG"
 $QEMU_IO -s -c "write -P 0xa 0 $size" "$TEST_IMG"

Also are some APIs to check disks contents(we need check 
active/secondary/hidden disk contents)?


Thanks
-Xie





Re: [Qemu-block] [Qemu-devel] [PATCH v2 3/3] vmdk: Switch to heap arrays for vmdk_parent_open

2016-03-09 Thread Kevin Wolf
Am 09.03.2016 um 01:43 hat Fam Zheng geschrieben:
> On Tue, 03/08 16:24, Fam Zheng wrote:
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/vmdk.c | 17 +++--
> >  1 file changed, 11 insertions(+), 6 deletions(-)
> > 
> > diff --git a/block/vmdk.c b/block/vmdk.c
> > index c68f456..03be7f0 100644
> > --- a/block/vmdk.c
> > +++ b/block/vmdk.c
> > @@ -343,15 +343,16 @@ static int vmdk_reopen_prepare(BDRVReopenState *state,
> >  static int vmdk_parent_open(BlockDriverState *bs)
> >  {
> >  char *p_name;
> > -char desc[DESC_SIZE + 1];
> > +char *desc;
> >  BDRVVmdkState *s = bs->opaque;
> >  int ret;
> >  
> > -desc[DESC_SIZE] = '\0';
> > +desc = g_malloc0(DESC_SIZE + 1);
> >  ret = bdrv_pread(bs->file->bs, s->desc_offset, desc, DESC_SIZE);
> >  if (ret < 0) {
> > -return ret;
> > +goto out;
> >  }
> > +ret = 0;
> 
> Kevin, were you referring to this "ret = 0" in the cover letter? If so, it is
> not useless, because ret was set to DESC_SIZE by bdrv_pread. :)

Nope, I meant the initialisation in patch 1.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH 13/16] block: only call aio_poll from iothread

2016-03-09 Thread Fam Zheng
On Wed, 03/09 10:10, Paolo Bonzini wrote:
> 
> 
> On 09/03/2016 09:30, Fam Zheng wrote:
> > > -aio_poll(bdrv_get_aio_context(bs), true);
> > > +if (aio_context_in_iothread(ctx)) {
> > > +/* This case should not occur at all, except for the
> > > + * main thread.
> > > + */
> > 
> > Maybe assert ctx == qemu_get_aio_context()?
> 
> Actually it happens for block/mirror.c's bdrv_drained_begin, but it's safe.

Oh yes, then we cannot assert, and the comment need adjustion. Thanks for
pointing out.

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH 13/16] block: only call aio_poll from iothread

2016-03-09 Thread Paolo Bonzini


On 09/03/2016 09:30, Fam Zheng wrote:
> > -aio_poll(bdrv_get_aio_context(bs), true);
> > +if (aio_context_in_iothread(ctx)) {
> > +/* This case should not occur at all, except for the
> > + * main thread.
> > + */
> 
> Maybe assert ctx == qemu_get_aio_context()?

Sure.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite

2016-03-09 Thread Fam Zheng
On Tue, 02/16 18:56, Paolo Bonzini wrote:
> So the fine-grained locking series has grown from 2 parts to 3...
> 
> This first part stops where we remove RFifoLock.

Modulo to the minor comments I had:

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [PATCH 06/16] block: add BDS field to count in-flight requests

2016-03-09 Thread Fam Zheng
On Wed, 03/09 09:22, Paolo Bonzini wrote:
> 
> 
> On 09/03/2016 09:00, Fam Zheng wrote:
> >> > On 09/03/2016 04:35, Fam Zheng wrote:
> > > >> >  enum BdrvTrackedRequestType {
> > > >> >  BDRV_TRACKED_READ,
> > > >> >  BDRV_TRACKED_WRITE,
> > > >> > -BDRV_TRACKED_FLUSH,
> > > >> > -BDRV_TRACKED_IOCTL,
> > > >> >  BDRV_TRACKED_DISCARD,
> >>> > > Okay, so flush and ioctl are not needed, but why is discard different?
> >> > 
> >> > Discard can modify the contents of the device, so I think it's safer to
> >> > serialize it against RMW and copy-on-read operations.
> > Okay, that makes sense, but ioctl like SG_IO can also modify content, no?
> 
> If you use SG_IO you shouldn't use RMW (because scsi_read_complete traps
> READ CAPACITY and sets the host block size as the guest block size) or
> copy-on-read (because raw has no backing file).
> 
> Besides, BDRV_TRACKED_IOCTL didn't include sector_num/nr_sectors
> operations so it didn't provide serialization.
> 

Yes, that's right. Thanks.

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH 13/16] block: only call aio_poll from iothread

2016-03-09 Thread Fam Zheng
On Tue, 02/16 18:56, Paolo Bonzini wrote:
> aio_poll is not thread safe; for example bdrv_drain can hang if
> the last in-flight I/O operation is completed in the I/O thread after
> the main thread has checked bs->in_flight.
> 
> The bug remains latent as long as all of it is called within
> aio_context_acquire/aio_context_release, but this will change soon.
> 
> To fix this, if bdrv_drain is called from outside the I/O thread handle
> it internally in the BDS, without involving AioContext and aio_poll.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block.c   |  2 ++
>  block/io.c| 21 ++---
>  include/block/block_int.h |  5 -
>  3 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/block.c b/block.c
> index fb02d7f..601a73f 100644
> --- a/block.c
> +++ b/block.c
> @@ -267,6 +267,7 @@ BlockDriverState *bdrv_new(void)
>  qemu_co_queue_init(>throttled_reqs[1]);
>  bs->refcnt = 1;
>  bs->aio_context = qemu_get_aio_context();
> +qemu_event_init(>in_flight_event, true);
>  
>  QTAILQ_INSERT_TAIL(_bdrv_states, bs, bs_list);
>  
> @@ -2395,6 +2396,7 @@ static void bdrv_delete(BlockDriverState *bs)
>  bdrv_make_anon(bs);
>  
>  QTAILQ_REMOVE(_bdrv_states, bs, bs_list);
> +qemu_event_destroy(>in_flight_event);
>  
>  g_free(bs);
>  }
> diff --git a/block/io.c b/block/io.c
> index 04b52c8..ea0546f 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -251,11 +251,24 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
>  
>  static bool bdrv_drain_io_recurse(BlockDriverState *bs)
>  {
> -BdrvChild *child;
> +AioContext *ctx = bdrv_get_aio_context(bs);
>  bool waited = false;
> +BdrvChild *child;
>  
>  while (atomic_read(>in_flight) > 0) {
> -aio_poll(bdrv_get_aio_context(bs), true);
> +if (aio_context_in_iothread(ctx)) {
> +/* This case should not occur at all, except for the
> + * main thread.
> + */

Maybe assert ctx == qemu_get_aio_context()?

> +aio_poll(bdrv_get_aio_context(bs), true);
> +} else {
> +qemu_event_reset(>in_flight_event);
> +if (atomic_read(>in_flight) > 0) {
> +aio_context_release(bdrv_get_aio_context(bs));
> +qemu_event_wait(>in_flight_event);
> +aio_context_acquire(bdrv_get_aio_context(bs));
> +}
> +}
>  waited = true;
>  }
>  
> @@ -465,7 +478,9 @@ void bdrv_inc_in_flight(BlockDriverState *bs)
>  
>  void bdrv_dec_in_flight(BlockDriverState *bs)
>  {
> -atomic_dec(>in_flight);
> +if (atomic_fetch_dec(>in_flight) == 1) {
> +qemu_event_set(>in_flight_event);
> +}
>  }
>  
>  static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 89c38c0..9c96d5d 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -404,9 +404,12 @@ struct BlockDriverState {
>  /* Callback before write request is processed */
>  NotifierWithReturnList before_write_notifiers;
>  
> -/* number of in-flight requests; overall and serialising */
> +/* number of in-flight requests; overall and serialising.
> + * in_flight_event is set when in_flight becomes 0.
> + */
>  unsigned int in_flight;
>  unsigned int serialising_in_flight;
> +QemuEvent in_flight_event;
>  
>  /* I/O throttling.
>   * throttle_state tells us if this BDS has I/O limits configured.
> -- 
> 2.5.0
> 
> 
> 



Re: [Qemu-block] [PATCH 06/16] block: add BDS field to count in-flight requests

2016-03-09 Thread Paolo Bonzini


On 09/03/2016 09:00, Fam Zheng wrote:
>> > On 09/03/2016 04:35, Fam Zheng wrote:
> > >> >  enum BdrvTrackedRequestType {
> > >> >  BDRV_TRACKED_READ,
> > >> >  BDRV_TRACKED_WRITE,
> > >> > -BDRV_TRACKED_FLUSH,
> > >> > -BDRV_TRACKED_IOCTL,
> > >> >  BDRV_TRACKED_DISCARD,
>>> > > Okay, so flush and ioctl are not needed, but why is discard different?
>> > 
>> > Discard can modify the contents of the device, so I think it's safer to
>> > serialize it against RMW and copy-on-read operations.
> Okay, that makes sense, but ioctl like SG_IO can also modify content, no?

If you use SG_IO you shouldn't use RMW (because scsi_read_complete traps
READ CAPACITY and sets the host block size as the guest block size) or
copy-on-read (because raw has no backing file).

Besides, BDRV_TRACKED_IOCTL didn't include sector_num/nr_sectors
operations so it didn't provide serialization.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 09/16] block: wait for all pending I/O when doing synchronous requests

2016-03-09 Thread Paolo Bonzini


On 09/03/2016 09:13, Fam Zheng wrote:
>> > @@ -352,7 +352,9 @@ static void qed_start_need_check_timer(BDRVQEDState *s)
>> >  static void qed_cancel_need_check_timer(BDRVQEDState *s)
>> >  {
>> >  trace_qed_cancel_need_check_timer(s);
>> > -timer_del(s->need_check_timer);
>> > +if (s->need_check_timer) {
>> > +timer_del(s->need_check_timer);
>> > +}
>> >  }
> Not clear why this change is needed in this patch, but it is obviously not
> wrong.  If this is to mask a bug, it at least deserves a comment.
> 
> The other parts of the patch looks good to me.

I need to check. :)

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 06/16] block: add BDS field to count in-flight requests

2016-03-09 Thread Fam Zheng
On Wed, 03/09 08:43, Paolo Bonzini wrote:
> 
> 
> On 09/03/2016 04:35, Fam Zheng wrote:
> >> >  enum BdrvTrackedRequestType {
> >> >  BDRV_TRACKED_READ,
> >> >  BDRV_TRACKED_WRITE,
> >> > -BDRV_TRACKED_FLUSH,
> >> > -BDRV_TRACKED_IOCTL,
> >> >  BDRV_TRACKED_DISCARD,
> > Okay, so flush and ioctl are not needed, but why is discard different?
> 
> Discard can modify the contents of the device, so I think it's safer to
> serialize it against RMW and copy-on-read operations.

Okay, that makes sense, but ioctl like SG_IO can also modify content, no?

Fam