Re: [Qemu-block] [Qemu-devel] [PATCH 07/16] block: change drain to look only at one child at a time

2016-03-08 Thread Paolo Bonzini


On 09/03/2016 04:41, Fam Zheng wrote:
> > bdrv_requests_pending is checking children to also wait until internal
> > requests (such as metadata writes) have completed.  However, checking
> > children is in general overkill because, apart from this special case,
> > the parent's in_flight count will always be incremented by at least one
> > for every request in the child.
> 
> While the patch looks good, I'm not sure I understand this: aren't children's
> requests generated by the child itself, how is parent's in_flight incremented?

What I mean is that there are two cases of bs generating I/O on
bs->file->bs:

1) writes caused by an operation on bs, e.g. a bdrv_aio_write to bs.  In
this case, the bdrv_aio_write increments bs's in_flight count

2) asynchronous metadata writes or flushes.  Such writes can be started
even if bs's in_flight count is zero, but not after the .bdrv_drain
callback has been invoked.  So the correct sequence is:

- finish current I/O

- call bdrv_drain callback

- recurse on children

This is what is needed for QED's callback to work correctly, and I'll
implement it this way in v2.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 06/16] block: add BDS field to count in-flight requests

2016-03-08 Thread Paolo Bonzini


On 09/03/2016 04:35, Fam Zheng wrote:
>> >  enum BdrvTrackedRequestType {
>> >  BDRV_TRACKED_READ,
>> >  BDRV_TRACKED_WRITE,
>> > -BDRV_TRACKED_FLUSH,
>> > -BDRV_TRACKED_IOCTL,
>> >  BDRV_TRACKED_DISCARD,
> Okay, so flush and ioctl are not needed, but why is discard different?

Discard can modify the contents of the device, so I think it's safer to
serialize it against RMW and copy-on-read operations.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 05/16] mirror: use bottom half to re-enter coroutine

2016-03-08 Thread Paolo Bonzini


On 09/03/2016 04:19, Fam Zheng wrote:
>> > +/* The I/O operation is not finished until the callback returns.
>> > + * If we call qemu_coroutine_enter here, there is the possibility
>> > + * of a deadlock when the coroutine calls bdrv_drained_begin.
>> > + */
>> > +op->co_enter_bh = qemu_bh_new(mirror_bh_cb, op);
> Shouldn't this be aio_bh_new()?

Yes, of course.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 03/16] block: introduce bdrv_no_throttling_begin/end

2016-03-08 Thread Paolo Bonzini


On 09/03/2016 02:45, Fam Zheng wrote:
>> > @@ -555,11 +574,7 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t 
>> > offset,
>> >   * will not fire; so the I/O throttling function has to be disabled 
>> > here
>> >   * if it has been enabled.
>> >   */
>> > -if (bs->io_limits_enabled) {
>> > -fprintf(stderr, "Disabling I/O throttling on '%s' due "
>> > -"to synchronous I/O.\n", 
>> > bdrv_get_device_name(bs));
>> > -bdrv_io_limits_disable(bs);
>> > -}
>> > +bdrv_no_throttling_begin(bs);
>> >  
>> >  if (qemu_in_coroutine()) {
>> >  /* Fast-path if already in coroutine context */
>> > @@ -573,6 +588,8 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t 
>> > offset,
>> >  aio_poll(aio_context, true);
>> >  }
>> >  }
>> > +
>> > +bdrv_no_throttling_end(bs);
> 
> Does this change the behavior? There wasn't a bdrv_io_limits_enable() here, 
> and
> the throttle doesn't come back automatically. Just want to make sure it's
> intended.

Yes, it's intended.  As long as the I/O stays synchronous, throttling is
disabled.  If it starts to be asynchronous, it will be re-enabled
automatically.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 02/16] block: move restarting of throttled reqs to block/throttle-groups.c

2016-03-08 Thread Paolo Bonzini


On 09/03/2016 02:26, Fam Zheng wrote:
>> diff --git a/block/throttle-groups.c b/block/throttle-groups.c
>> index 4920e09..eccfc0d 100644
>> --- a/block/throttle-groups.c
>> +++ b/block/throttle-groups.c
>> @@ -313,6 +313,17 @@ void coroutine_fn 
>> throttle_group_co_io_limits_intercept(BlockDriverState *bs,
>>  qemu_mutex_unlock(>lock);
>>  }
>>  
>> +void throttle_group_restart_bs(BlockDriverState *bs)
>> +{
>> +int i;
>> +
>> +for (i = 0; i < 2; i++) {
>> +while (qemu_co_enter_next(>throttled_reqs[i])) {
>> +;
>> +}
>> +}
>> +}
>> +
>>  /* Update the throttle configuration for a particular group. Similar
>>   * to throttle_config(), but guarantees atomicity within the
>>   * throttling group.
>> @@ -335,6 +346,10 @@ void throttle_group_config(BlockDriverState *bs, 
>> ThrottleConfig *cfg)
>>  }
>>  throttle_config(ts, tt, cfg);
>>  qemu_mutex_unlock(>lock);
>> +
>> +aio_context_acquire(bdrv_get_aio_context(bs));
>> +throttle_group_restart_bs(bs);
>> +aio_context_release(bdrv_get_aio_context(bs));
> 
> Could you explain why does this hunk belong to this patch?
> 
> Otherwise looks good.

Sure.  It's moved from bdrv_set_io_limits, which calls
throttle_group_config, to throttle_group_config itself:

>>  void bdrv_set_io_limits(BlockDriverState *bs,
>>  ThrottleConfig *cfg)
>>  {
>> -int i;
>> -
>>  throttle_group_config(bs, cfg);
>> -
>> -for (i = 0; i < 2; i++) {
>> -qemu_co_enter_next(>throttled_reqs[i]);
>> -}
>>  }
>>  

But in v2 I'll change it to only restart the first request so there is
no semantic change.

Paolo



[Qemu-block] [PATCH 0/2] block/qapi: trivial fixes

2016-03-08 Thread Peter Xu
One is to use literal printf format when possible.

One is to fix an unbounded usage of stack.

Peter Xu (2):
  block/qapi: make two printf() formats literal
  block/qapi: fix unbounded stack for dump_qdict

 block/qapi.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

-- 
2.4.3




[Qemu-block] [PATCH 1/2] block/qapi: make two printf() formats literal

2016-03-08 Thread Peter Xu
Fix two places to use literal printf format when possible.

Signed-off-by: Peter Xu 
---
 block/qapi.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index db2d3fb..c4c2115 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -619,9 +619,8 @@ static void dump_qlist(fprintf_function func_fprintf, void 
*f, int indentation,
 for (entry = qlist_first(list); entry; entry = qlist_next(entry), i++) {
 QType type = qobject_type(entry->value);
 bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
-const char *format = composite ? "%*s[%i]:\n" : "%*s[%i]: ";
-
-func_fprintf(f, format, indentation * 4, "", i);
+func_fprintf(f, "%*s[%i]:%c", indentation * 4, "", i,
+ composite ? '\n' : ' ');
 dump_qobject(func_fprintf, f, indentation + 1, entry->value);
 if (!composite) {
 func_fprintf(f, "\n");
@@ -637,7 +636,6 @@ static void dump_qdict(fprintf_function func_fprintf, void 
*f, int indentation,
 for (entry = qdict_first(dict); entry; entry = qdict_next(dict, entry)) {
 QType type = qobject_type(entry->value);
 bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
-const char *format = composite ? "%*s%s:\n" : "%*s%s: ";
 char key[strlen(entry->key) + 1];
 int i;
 
@@ -646,8 +644,8 @@ static void dump_qdict(fprintf_function func_fprintf, void 
*f, int indentation,
 key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];
 }
 key[i] = 0;
-
-func_fprintf(f, format, indentation * 4, "", key);
+func_fprintf(f, "%*s%s:%c", indentation * 4, "", key,
+ composite ? '\n' : ' ');
 dump_qobject(func_fprintf, f, indentation + 1, entry->value);
 if (!composite) {
 func_fprintf(f, "\n");
-- 
2.4.3




[Qemu-block] [PATCH v11 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-03-08 Thread Changlong Xie
From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 block.c   |   8 ++--
 block/quorum.c| 123 +-
 include/block/block.h |   4 ++
 3 files changed, 129 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index d48f441..66d21af 100644
--- a/block.c
+++ b/block.c
@@ -1194,10 +1194,10 @@ static int bdrv_fill_options(QDict **options, const 
char *filename,
 return 0;
 }
 
-static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-BlockDriverState *child_bs,
-const char *child_name,
-const BdrvChildRole *child_role)
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+ BlockDriverState *child_bs,
+ const char *child_name,
+ const BdrvChildRole *child_role)
 {
 BdrvChild *child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
diff --git a/block/quorum.c b/block/quorum.c
index 11cc60b..469e4a3 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -24,6 +24,7 @@
 #include "qapi/qmp/qstring.h"
 #include "qapi-event.h"
 #include "crypto/hash.h"
+#include "qemu/bitmap.h"
 
 #define HASH_LENGTH 32
 
@@ -81,6 +82,8 @@ typedef struct BDRVQuorumState {
 bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
 * block if Quorum is reached.
 */
+unsigned long *index_bitmap;
+int bsize;
 
 QuorumReadPattern read_pattern;
 } BDRVQuorumState;
@@ -877,9 +880,9 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 ret = -EINVAL;
 goto exit;
 }
-if (s->num_children < 2) {
+if (s->num_children < 1) {
 error_setg(_err,
-   "Number of provided children must be greater than 1");
+   "Number of provided children must be 1 or more");
 ret = -EINVAL;
 goto exit;
 }
@@ -928,6 +931,7 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 /* allocate the children array */
 s->children = g_new0(BdrvChild *, s->num_children);
 opened = g_new0(bool, s->num_children);
+s->index_bitmap = bitmap_new(s->num_children);
 
 for (i = 0; i < s->num_children; i++) {
 char indexstr[32];
@@ -943,6 +947,8 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 opened[i] = true;
 }
+bitmap_set(s->index_bitmap, 0, s->num_children);
+s->bsize = s->num_children;
 
 g_free(opened);
 goto exit;
@@ -999,6 +1005,116 @@ static void quorum_attach_aio_context(BlockDriverState 
*bs,
 }
 }
 
+static int get_new_child_index(BDRVQuorumState *s)
+{
+int index;
+
+index = find_next_zero_bit(s->index_bitmap, s->bsize, 0);
+if (index < s->bsize) {
+return index;
+}
+
+s->index_bitmap = bitmap_zero_extend(s->index_bitmap, s->bsize,
+ s->bsize + 1);
+return s->bsize++;
+}
+
+static void remove_child_index(BDRVQuorumState *s, int index)
+{
+int last_index, old_bsize;
+size_t new_len;
+
+assert(index < s->bsize);
+
+clear_bit(index, s->index_bitmap);
+if (index < s->bsize - 1) {
+/* The last bit is always set */
+return;
+}
+
+/* Clear last bit */
+old_bsize = s->bsize;
+last_index = find_last_bit(s->index_bitmap, s->bsize);
+assert(last_index < old_bsize);
+s->bsize = last_index + 1;
+
+if (BITS_TO_LONGS(old_bsize) == BITS_TO_LONGS(s->bsize)) {
+return;
+}
+
+new_len = BITS_TO_LONGS(s->bsize) * sizeof(unsigned long);
+s->index_bitmap = g_realloc(s->index_bitmap, new_len);
+}
+
+static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+BdrvChild *child;
+char indexstr[32];
+int index, ret;
+
+index = get_new_child_index(s);
+ret = snprintf(indexstr, 32, "children.%d", index);
+if (ret < 0 || ret >= 32) {
+error_setg(errp, "cannot generate child name");
+return;
+}
+
+bdrv_drain(bs);
+
+assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
+if (s->num_children == INT_MAX / sizeof(BdrvChild *)) {
+error_setg(errp, "Too many children");
+return;
+}
+s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
+
+bdrv_ref(child_bs);
+child = bdrv_attach_child(bs, child_bs, indexstr, _format);
+s->children[s->num_children++] = child;
+set_bit(index, s->index_bitmap);
+}
+
+static void 

[Qemu-block] [PATCH v11 1/3] Add new block driver interface to add/delete a BDS's child

2016-03-08 Thread Changlong Xie
From: Wen Congyang 

In some cases, we want to take a quorum child offline, and take
another child online.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 block.c   | 49 +++
 include/block/block.h |  4 
 include/block/block_int.h |  5 +
 3 files changed, 58 insertions(+)

diff --git a/block.c b/block.c
index ba24b8e..d48f441 100644
--- a/block.c
+++ b/block.c
@@ -4395,3 +4395,52 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 QDECREF(json);
 }
 }
+
+/*
+ * Hot add/remove a BDS's child. So the user can take a child offline when
+ * it is broken and take a new child online
+ */
+void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
+Error **errp)
+{
+
+if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
+error_setg(errp, "The node %s does not support adding a child",
+   bdrv_get_device_or_node_name(parent_bs));
+return;
+}
+
+if (!QLIST_EMPTY(_bs->parents)) {
+error_setg(errp, "The node %s already has a parent",
+   child_bs->node_name);
+return;
+}
+
+parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
+}
+
+void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error 
**errp)
+{
+BdrvChild *tmp;
+
+if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
+error_setg(errp, "The node %s does not support removing a child",
+   bdrv_get_device_or_node_name(parent_bs));
+return;
+}
+
+QLIST_FOREACH(tmp, _bs->children, next) {
+if (tmp == child) {
+break;
+}
+}
+
+if (!tmp) {
+error_setg(errp, "The node %s does not have child named %s",
+   bdrv_get_device_or_node_name(parent_bs),
+   bdrv_get_device_or_node_name(child->bs));
+return;
+}
+
+parent_bs->drv->bdrv_del_child(parent_bs, child, errp);
+}
diff --git a/include/block/block.h b/include/block/block.h
index 1c4f4d8..7378e74 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -582,4 +582,8 @@ void bdrv_drained_begin(BlockDriverState *bs);
  */
 void bdrv_drained_end(BlockDriverState *bs);
 
+void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
+Error **errp);
+void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 9ef823a..704efe5 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -305,6 +305,11 @@ struct BlockDriver {
  */
 void (*bdrv_drain)(BlockDriverState *bs);
 
+void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
+   Error **errp);
+void (*bdrv_del_child)(BlockDriverState *parent, BdrvChild *child,
+   Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
1.9.3






[Qemu-block] [PATCH v11 3/3] qmp: add monitor command to add/remove a child

2016-03-08 Thread Changlong Xie
From: Wen Congyang 

The new QMP command name is x-blockdev-change. It's just for adding/removing
quorum's child now, and doesn't support all kinds of children, all kinds of
operations, nor all block drivers. So it is experimental now.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 blockdev.c   | 55 
 qapi/block-core.json | 32 ++
 qmp-commands.hx  | 54 +++
 3 files changed, 141 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 0f20c65..435631e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -4017,6 +4017,61 @@ out:
 aio_context_release(aio_context);
 }
 
+static BdrvChild *bdrv_find_child(BlockDriverState *parent_bs,
+  const char *child_name)
+{
+BdrvChild *child;
+
+QLIST_FOREACH(child, _bs->children, next) {
+if (strcmp(child->name, child_name) == 0) {
+return child;
+}
+}
+
+return NULL;
+}
+
+void qmp_x_blockdev_change(const char *parent, bool has_child,
+   const char *child, bool has_node,
+   const char *node, Error **errp)
+{
+BlockDriverState *parent_bs, *new_bs = NULL;
+BdrvChild *p_child;
+
+parent_bs = bdrv_lookup_bs(parent, parent, errp);
+if (!parent_bs) {
+return;
+}
+
+if (has_child == has_node) {
+if (has_child) {
+error_setg(errp, "The parameters child and node are in conflict");
+} else {
+error_setg(errp, "Either child or node must be specified");
+}
+return;
+}
+
+if (has_child) {
+p_child = bdrv_find_child(parent_bs, child);
+if (!p_child) {
+error_setg(errp, "Node '%s' does not have child '%s'",
+   parent, child);
+return;
+}
+bdrv_del_child(parent_bs, p_child, errp);
+}
+
+if (has_node) {
+new_bs = bdrv_find_node(node);
+if (!new_bs) {
+error_setg(errp, "Node '%s' not found", node);
+return;
+}
+bdrv_add_child(parent_bs, new_bs, errp);
+}
+}
+
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
 BlockJobInfoList *head = NULL, **p_next = 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9bf1b22..bc3fd0b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2548,3 +2548,35 @@
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @x-blockdev-change
+#
+# Dynamically reconfigure the block driver state graph. It can be used
+# to add, remove, insert or replace a graph node. Currently only the
+# Quorum driver implements this feature to add or remove its child. This
+# is useful to fix a broken quorum child.
+#
+# If @node is specified, it will be inserted under @parent. @child
+# may not be specified in this case. If both @parent and @child are
+# specified but @node is not, @child will be detached from @parent.
+#
+# @parent: the id or name of the parent node.
+#
+# @child: #optional the name of a child under the given parent node.
+#
+# @node: #optional the name of the node that will be added.
+#
+# Note: this command is experimental, and its API is not stable. It
+# does not support all kinds of operations, all kinds of children, nor
+# all block drivers.
+#
+# Warning: The data in a new quorum child MUST be consistent with that of
+# the rest of the array.
+#
+# Since: 2.6
+##
+{ 'command': 'x-blockdev-change',
+  'data' : { 'parent': 'str',
+ '*child': 'str',
+ '*node': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index b629673..2a55135 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4398,6 +4398,60 @@ Example:
 EQMP
 
 {
+.name   = "x-blockdev-change",
+.args_type  = "parent:B,child:B?,node:B?",
+.mhandler.cmd_new = qmp_marshal_x_blockdev_change,
+},
+
+SQMP
+x-blockdev-change
+-
+
+Dynamically reconfigure the block driver state graph. It can be used
+to add, remove, insert or replace a graph node. Currently only the
+Quorum driver implements this feature to add or remove its child. This
+is useful to fix a broken quorum child.
+
+If @node is specified, it will be inserted under @parent. @child
+may not be specified in this case. If both @parent and @child are
+specified but @node is not, @child will be detached from @parent.
+
+Arguments:
+- "parent": the id or name of the parent node (json-string)
+- "child": the name of a child under the given parent node (json-string, 
optional)
+- "node": the name of the node that will be added (json-string, optional)
+
+Note: this command is 

[Qemu-block] [PATCH v11 0/3] qapi: child add/delete support

2016-03-08 Thread Changlong Xie
ChangLog:
v10~v11:
1. Rebase to the newest codes
2. Address comment from Max
Don't use contractions in error messages,
p1: Remove R-Bs, and use "BdrvChild *child" in bdrv_del_child
p2: Fix error logic in get_new_child_index/remove_child_index, and prefect
child->name parsing
p3: Make bdrv_find_child return BdrvChild *, and add missing explanation

v9~v10:
1. Rebase to the newest codes
2. Address comments from Berto and Max, update the documents in block-core.json 
   and qmp-commands.hx 
3. Remove redundant codes in quorum_add_child() and quorum_del_child()
v8:
1. Rebase to the newest codes
2. Address the comments from Eric Blake
v7:
1. Remove the qmp command x-blockdev-change's parameter operation according
   to Kevin's comments.
2. Remove the hmp command.
v6:
1. Use a single qmp command x-blockdev-change to replace x-blockdev-child-add
   and x-blockdev-child-delete
v5:
1. Address Eric Blake's comments
v4:
1. drop nbd driver's implementation. We can use human-monitor-command
   to do it.
2. Rename the command name.
v3:
1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
   created by the QMP command blockdev-add.
2. The driver NBD can support filename, path, host:port now.
v2:
1. Use bdrv_get_device_or_node_name() instead of new function
   bdrv_get_id_or_node_name()
2. Update the error message
3. Update the documents in block-core.json

Wen Congyang (3):
  Add new block driver interface to add/delete a BDS's child
  quorum: implement bdrv_add_child() and bdrv_del_child()
  qmp: add monitor command to add/remove a child

 block.c   |  57 +++--
 block/quorum.c| 123 +-
 blockdev.c|  55 +
 include/block/block.h |   8 +++
 include/block/block_int.h |   5 ++
 qapi/block-core.json  |  32 
 qmp-commands.hx   |  54 
 7 files changed, 328 insertions(+), 6 deletions(-)

-- 
1.9.3






Re: [Qemu-block] [Qemu-devel] [PATCH 07/16] block: change drain to look only at one child at a time

2016-03-08 Thread Fam Zheng
On Tue, 02/16 18:56, Paolo Bonzini wrote:
> bdrv_requests_pending is checking children to also wait until internal
> requests (such as metadata writes) have completed.  However, checking
> children is in general overkill because, apart from this special case,
> the parent's in_flight count will always be incremented by at least one
> for every request in the child.

While the patch looks good, I'm not sure I understand this: aren't children's
requests generated by the child itself, how is parent's in_flight incremented?

> 
> Since internal requests are only generated by the parent in the child,
> instead visit the tree parent first, and then wait for internal I/O in
> the children to complete.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/io.c | 27 +++
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index a9a23a6..e0c9215 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -249,6 +249,23 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
>  }
>  }
>  
> +static bool bdrv_drain_io_recurse(BlockDriverState *bs)
> +{
> +BdrvChild *child;
> +bool waited = false;
> +
> +while (atomic_read(>in_flight) > 0) {
> +aio_poll(bdrv_get_aio_context(bs), true);
> +waited = true;
> +}
> +
> +QLIST_FOREACH(child, >children, next) {
> +waited |= bdrv_drain_io_recurse(child->bs);
> +}
> +
> +return waited;
> +}
> +
>  /*
>   * Wait for pending requests to complete on a single BlockDriverState 
> subtree,
>   * and suspend block driver's internal I/O until next request arrives.
> @@ -265,10 +282,7 @@ void bdrv_drain(BlockDriverState *bs)
>  bdrv_no_throttling_begin(bs);
>  bdrv_io_unplugged_begin(bs);
>  bdrv_drain_recurse(bs);
> -while (bdrv_requests_pending(bs)) {
> -/* Keep iterating */
> - aio_poll(bdrv_get_aio_context(bs), true);
> -}
> +bdrv_drain_io_recurse(bs);
>  bdrv_io_unplugged_end(bs);
>  bdrv_no_throttling_end(bs);
>  }
> @@ -319,10 +333,7 @@ void bdrv_drain_all(void)
>  aio_context_acquire(aio_context);
>  while ((bs = bdrv_next(bs))) {
>  if (aio_context == bdrv_get_aio_context(bs)) {
> -if (bdrv_requests_pending(bs)) {
> -aio_poll(aio_context, true);
> -waited = true;
> -}
> +waited |= bdrv_drain_io_recurse(bs);
>  }
>  }
>  aio_context_release(aio_context);
> -- 
> 2.5.0
> 
> 
> 



Re: [Qemu-block] [Qemu-devel] [PATCH 06/16] block: add BDS field to count in-flight requests

2016-03-08 Thread Fam Zheng
On Tue, 02/16 18:56, Paolo Bonzini wrote:
> Unlike tracked_requests, this field also counts throttled requests,
> and remains non-zero if an AIO operation needs a BH to be "really"
> completed.
> 
> With this change, it is no longer necessary to have a dummy
> BdrvTrackedRequest for requests that are never serialising, and
> it is no longer necessary to poll the AioContext once after
> bdrv_requests_pending(bs) returns false.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/io.c| 71 
> +++
>  block/mirror.c|  2 ++
>  include/block/block_int.h |  8 --
>  3 files changed, 54 insertions(+), 27 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index d8d50b7..a9a23a6 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -224,13 +224,7 @@ bool bdrv_requests_pending(BlockDriverState *bs)
>  {
>  BdrvChild *child;
>  
> -if (!QLIST_EMPTY(>tracked_requests)) {
> -return true;
> -}
> -if (!qemu_co_queue_empty(>throttled_reqs[0])) {
> -return true;
> -}
> -if (!qemu_co_queue_empty(>throttled_reqs[1])) {
> +if (atomic_read(>in_flight)) {
>  return true;
>  }
>  
> @@ -268,15 +262,12 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
>   */
>  void bdrv_drain(BlockDriverState *bs)
>  {
> -bool busy = true;
> -
>  bdrv_no_throttling_begin(bs);
>  bdrv_io_unplugged_begin(bs);
>  bdrv_drain_recurse(bs);
> -while (busy) {
> +while (bdrv_requests_pending(bs)) {
>  /* Keep iterating */
> - busy = bdrv_requests_pending(bs);
> - busy |= aio_poll(bdrv_get_aio_context(bs), busy);
> + aio_poll(bdrv_get_aio_context(bs), true);
>  }
>  bdrv_io_unplugged_end(bs);
>  bdrv_no_throttling_end(bs);
> @@ -291,7 +282,7 @@ void bdrv_drain(BlockDriverState *bs)
>  void bdrv_drain_all(void)
>  {
>  /* Always run first iteration so any pending completion BHs run */
> -bool busy = true;
> +bool waited = true;
>  BlockDriverState *bs = NULL;
>  GSList *aio_ctxs = NULL, *ctx;
>  
> @@ -318,8 +309,8 @@ void bdrv_drain_all(void)
>   * request completion.  Therefore we must keep looping until there was no
>   * more activity rather than simply draining each device independently.
>   */
> -while (busy) {
> -busy = false;
> +while (waited) {
> +waited = false;
>  
>  for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
>  AioContext *aio_context = ctx->data;
> @@ -329,12 +320,11 @@ void bdrv_drain_all(void)
>  while ((bs = bdrv_next(bs))) {
>  if (aio_context == bdrv_get_aio_context(bs)) {
>  if (bdrv_requests_pending(bs)) {
> -busy = true;
> -aio_poll(aio_context, busy);
> +aio_poll(aio_context, true);
> +waited = true;
>  }
>  }
>  }
> -busy |= aio_poll(aio_context, false);
>  aio_context_release(aio_context);
>  }
>  }
> @@ -457,6 +447,16 @@ static bool tracked_request_overlaps(BdrvTrackedRequest 
> *req,
>  return true;
>  }
>  
> +void bdrv_inc_in_flight(BlockDriverState *bs)
> +{
> +atomic_inc(>in_flight);
> +}
> +
> +void bdrv_dec_in_flight(BlockDriverState *bs)
> +{
> +atomic_dec(>in_flight);
> +}
> +
>  static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
>  {
>  BlockDriverState *bs = self->bs;
> @@ -963,6 +963,8 @@ static int coroutine_fn 
> bdrv_co_do_preadv(BlockDriverState *bs,
>  return ret;
>  }
>  
> +bdrv_inc_in_flight(bs);
> +
>  /* Don't do copy-on-read if we read data before write operation */
>  if (bs->copy_on_read && !(flags & BDRV_REQ_NO_SERIALISING)) {
>  flags |= BDRV_REQ_COPY_ON_READ;
> @@ -1003,6 +1005,7 @@ static int coroutine_fn 
> bdrv_co_do_preadv(BlockDriverState *bs,
>use_local_qiov ? _qiov : qiov,
>flags);
>  tracked_request_end();
> +bdrv_dec_in_flight(bs);
>  
>  if (use_local_qiov) {
>  qemu_iovec_destroy(_qiov);
> @@ -1310,6 +1313,8 @@ static int coroutine_fn 
> bdrv_co_do_pwritev(BlockDriverState *bs,
>  return ret;
>  }
>  
> +bdrv_inc_in_flight(bs);
> +
>  /* throttling disk I/O */
>  if (bs->throttle_state) {
>  throttle_group_co_io_limits_intercept(bs, bytes, true);
> @@ -1408,6 +1413,7 @@ fail:
>  qemu_vfree(tail_buf);
>  out:
>  tracked_request_end();
> +bdrv_dec_in_flight(bs);
>  return ret;
>  }
>  
> @@ -2065,6 +2071,7 @@ static const AIOCBInfo bdrv_em_aiocb_info = {
>  static void bdrv_aio_bh_cb(void *opaque)
>  {
>  BlockAIOCBSync *acb = opaque;
> +BlockDriverState *bs = acb->common.bs;
>  
>  if (!acb->is_write && acb->ret >= 0) {
>  

Re: [Qemu-block] [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication

2016-03-08 Thread Changlong Xie

On 03/05/2016 01:39 AM, Stefan Hajnoczi wrote:

On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:

+static void replication_start(ReplicationState *rs, ReplicationMode mode,
+  Error **errp)
+{
+BlockDriverState *bs = rs->opaque;
+BDRVReplicationState *s = bs->opaque;
+int64_t active_length, hidden_length, disk_length;
+AioContext *aio_context;
+Error *local_err = NULL;
+
+if (s->replication_state != BLOCK_REPLICATION_NONE) {


Dereferencing bs and s is not thread-safe since the AioContext isn't
acquired here.  Unless you have other locking rules, the default is that
everything in BlockDriverState and bs->opaque must be accessed with the
AioContext acquired.

Please apply this comment to the rest of the patch series, too.
Functions that are not part of BlockDriver are probably called outside
the AioContext and must acquire it before they are allowed to access
anything else in bs.


+s->top_bs = get_top_bs(bs);
+if (!s->top_bs) {
+error_setg(errp, "Cannot get the top block driver state to do"
+   " internal backup");
+return;
+}


Traversing the BDS graph using the parent pointer is not safe - you
cannot stash the parent pointer because it changes if a parent node is
inserted/removed.

I suggest passing the drive name as an argument so that bdrv_lookup_bs()
can be used when installing the op blocker.  Then the BdrvChild.parent
pointer patch and get_top_bs() can be deleted.

Jeff Cody is currently working on a new op blocker system.  Hopefully
this system will allow you to install blockers on bs instead of on the
drive's top BDS.  But let's not worry about that for now and just use
the drive name as an argument.


+/*
+ * Must protect backup target if backup job was stopped/cancelled
+ * unexpectedly
+ */
+bdrv_ref(s->hidden_disk->bs);


Where is the matching bdrv_unref() call?


Two conditions
1. job is cancelled, so "local_err != 0"

449 if (local_err) {
450 error_propagate(errp, local_err);
451 backup_job_cleanup(s);
452 bdrv_unref(s->hidden_disk->bs);
453 return;
454 }

2. backup completed
backup_complete
bdrv_unref(s->target);

If i'm wrong, pls correct me.

Thanks
-Xie











Re: [Qemu-block] [Qemu-devel] [PATCH 05/16] mirror: use bottom half to re-enter coroutine

2016-03-08 Thread Fam Zheng
On Tue, 02/16 18:56, Paolo Bonzini wrote:
> mirror is calling bdrv_drain from an AIO callback---more precisely,
> the bdrv_drain happens far away from the AIO callback, in the coroutine that
> the AIO callback enters.
> 
> This used to be okay because bdrv_drain more or less tried to guess
> when all AIO callbacks were done; however it will cause a deadlock once
> bdrv_drain really checks for AIO callbacks to be complete.  The situation
> here is admittedly underdefined, and Stefan pointed out that the same
> solution is found in many other places in the QEMU block layer, therefore
> I think this workaround is acceptable.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/mirror.c | 23 +++
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 2c0edfa..793c20c 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -71,6 +71,7 @@ typedef struct MirrorOp {
>  QEMUIOVector qiov;
>  int64_t sector_num;
>  int nb_sectors;
> +QEMUBH *co_enter_bh;
>  } MirrorOp;
>  
>  static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
> @@ -86,6 +87,18 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob 
> *s, bool read,
>  }
>  }
>  
> +static void mirror_bh_cb(void *opaque)
> +{
> +MirrorOp *op = opaque;
> +MirrorBlockJob *s = op->s;
> +
> +qemu_bh_delete(op->co_enter_bh);
> +g_free(op);
> +if (s->waiting_for_io) {
> +qemu_coroutine_enter(s->common.co, NULL);
> +}
> +}
> +
>  static void mirror_iteration_done(MirrorOp *op, int ret)
>  {
>  MirrorBlockJob *s = op->s;
> @@ -116,11 +129,13 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>  }
>  
>  qemu_iovec_destroy(>qiov);
> -g_free(op);
>  
> -if (s->waiting_for_io) {
> -qemu_coroutine_enter(s->common.co, NULL);
> -}
> +/* The I/O operation is not finished until the callback returns.
> + * If we call qemu_coroutine_enter here, there is the possibility
> + * of a deadlock when the coroutine calls bdrv_drained_begin.
> + */
> +op->co_enter_bh = qemu_bh_new(mirror_bh_cb, op);

Shouldn't this be aio_bh_new()?

Fam

> +qemu_bh_schedule(op->co_enter_bh);
>  }
>  
>  static void mirror_write_complete(void *opaque, int ret)
> -- 
> 2.5.0
> 
> 
> 



Re: [Qemu-block] [Qemu-devel] [PATCH 2/8] block: fix unbounded stack for dump_qdict

2016-03-08 Thread Peter Xu
On Tue, Mar 08, 2016 at 01:17:03PM +0100, Paolo Bonzini wrote:
> 
> 
> On 08/03/2016 09:12, Markus Armbruster wrote:
> > I'm afraid this isn't a good idea.  It relies on the non-local argument
> > that nobody will ever put a key longer than 255 into a qdict that gets
> > dumped.  That may even be the case, but you need to *prove* it, not just
> > assert it.  The weakest acceptable proof might be assertions in every
> > place that put keys into a dict that might get dumped.  I suspect that's
> > practical and maintainable only if there's a single place that does it.
> > 
> > If this was a good idea, I'd recommend to avoid the awkward macro:
> > 
> >char key[256];
> >int i;
> >
> >assert(strlen(entry->key) + 1 <= ARRAY_SIZE(key));
> > 
> > There are several other ways to limit the stack usage:
> > 
> > 1. Move the array from stack to heap.  Fine unless it's on a hot path.
> >As far as I can tell, this dumping business is for HMP and qemu-io,
> >i.e. not hot.
> 
> I think this is the best.  You can just g_strdup, modify in place, print
> and free.

g_strdup() will bring one more loop? One to copy the strings, one
for replacing "-" to " ". Though I will first need to replace
g_malloc0() with g_malloc(), which seems more suitable here. :)

Thanks!
Peter



Re: [Qemu-block] [Qemu-devel] [PATCH 2/8] block: fix unbounded stack for dump_qdict

2016-03-08 Thread Peter Xu
On Tue, Mar 08, 2016 at 02:47:31PM +0100, Markus Armbruster wrote:
> Fam Zheng  writes:
> > Also I think the double underscore identifiers are considered reserved in C,
> > no?
> 
> Correct.  C99 7.1.3 Reserved identifiers: All identifiers that begin
> with an underscore and either an uppercase letter or another underscore
> are always reserved for any use.
> 
> [...]

Fam, Markus,

Thanks to point out. Will fix.

Peter



Re: [Qemu-block] [Qemu-devel] [PATCH 03/16] block: introduce bdrv_no_throttling_begin/end

2016-03-08 Thread Fam Zheng
On Tue, 02/16 18:56, Paolo Bonzini wrote:
> Extract the handling of throttling from bdrv_flush_io_queue.

Looks good overall. Have two questions below.

> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block.c   |  1 -
>  block/io.c| 56 
> +--
>  block/throttle-groups.c   |  4 
>  include/block/block_int.h |  6 ++---
>  4 files changed, 42 insertions(+), 25 deletions(-)
> 
> diff --git a/block.c b/block.c
> index efc3c43..b4f328f 100644
> --- a/block.c
> +++ b/block.c
> @@ -2314,7 +2314,6 @@ static void swap_feature_fields(BlockDriverState 
> *bs_top,
>  
>  assert(!bs_new->throttle_state);
>  if (bs_top->throttle_state) {
> -assert(bs_top->io_limits_enabled);
>  bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
>  bdrv_io_limits_disable(bs_top);
>  }
> diff --git a/block/io.c b/block/io.c
> index 5ee5032..272caac 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -69,28 +69,43 @@ void bdrv_set_io_limits(BlockDriverState *bs,
>  throttle_group_config(bs, cfg);
>  }
>  
> -static void bdrv_start_throttled_reqs(BlockDriverState *bs)
> +static void bdrv_no_throttling_begin(BlockDriverState *bs)
>  {
> -bool enabled = bs->io_limits_enabled;
> +BdrvChild *child;
> +
> +QLIST_FOREACH(child, >children, next) {
> +bdrv_no_throttling_begin(child->bs);
> +}
>  
> -bs->io_limits_enabled = false;
> -throttle_group_restart_bs(bs);
> -bs->io_limits_enabled = enabled;
> +if (bs->io_limits_disabled++ == 0) {
> +throttle_group_restart_bs(bs);
> +}
> +}
> +
> +static void bdrv_no_throttling_end(BlockDriverState *bs)
> +{
> +BdrvChild *child;
> +
> +--bs->io_limits_disabled;
> +
> +QLIST_FOREACH(child, >children, next) {
> +bdrv_no_throttling_end(child->bs);
> +}
>  }
>  
>  void bdrv_io_limits_disable(BlockDriverState *bs)
>  {
> -bs->io_limits_enabled = false;
> -bdrv_start_throttled_reqs(bs);
> +assert(bs->throttle_state);
> +bdrv_no_throttling_begin(bs);
>  throttle_group_unregister_bs(bs);
> +bdrv_no_throttling_end(bs);
>  }
>  
>  /* should be called before bdrv_set_io_limits if a limit is set */
>  void bdrv_io_limits_enable(BlockDriverState *bs, const char *group)
>  {
> -assert(!bs->io_limits_enabled);
> +assert(!bs->throttle_state);
>  throttle_group_register_bs(bs, group);
> -bs->io_limits_enabled = true;
>  }
>  
>  void bdrv_io_limits_update_group(BlockDriverState *bs, const char *group)
> @@ -255,6 +270,7 @@ void bdrv_drain(BlockDriverState *bs)
>  {
>  bool busy = true;
>  
> +bdrv_no_throttling_begin(bs);
>  bdrv_drain_recurse(bs);
>  while (busy) {
>  /* Keep iterating */
> @@ -262,6 +278,7 @@ void bdrv_drain(BlockDriverState *bs)
>   busy = bdrv_requests_pending(bs);
>   busy |= aio_poll(bdrv_get_aio_context(bs), busy);
>  }
> +bdrv_no_throttling_end(bs);
>  }
>  
>  /*
> @@ -284,6 +301,7 @@ void bdrv_drain_all(void)
>  if (bs->job) {
>  block_job_pause(bs->job);
>  }
> +bdrv_no_throttling_begin(bs);
>  bdrv_drain_recurse(bs);
>  aio_context_release(aio_context);
>  
> @@ -325,6 +343,7 @@ void bdrv_drain_all(void)
>  AioContext *aio_context = bdrv_get_aio_context(bs);
>  
>  aio_context_acquire(aio_context);
> +bdrv_no_throttling_end(bs);
>  if (bs->job) {
>  block_job_resume(bs->job);
>  }
> @@ -555,11 +574,7 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t 
> offset,
>   * will not fire; so the I/O throttling function has to be disabled here
>   * if it has been enabled.
>   */
> -if (bs->io_limits_enabled) {
> -fprintf(stderr, "Disabling I/O throttling on '%s' due "
> -"to synchronous I/O.\n", bdrv_get_device_name(bs));
> -bdrv_io_limits_disable(bs);
> -}
> +bdrv_no_throttling_begin(bs);
>  
>  if (qemu_in_coroutine()) {
>  /* Fast-path if already in coroutine context */
> @@ -573,6 +588,8 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t 
> offset,
>  aio_poll(aio_context, true);
>  }
>  }
> +
> +bdrv_no_throttling_end(bs);

Does this change the behavior? There wasn't a bdrv_io_limits_enable() here, and
the throttle doesn't come back automatically. Just want to make sure it's
intended.

>  return rwco.ret;
>  }
>  
> @@ -608,13 +625,11 @@ int bdrv_read(BlockDriverState *bs, int64_t sector_num,
>  int bdrv_read_unthrottled(BlockDriverState *bs, int64_t sector_num,
>uint8_t *buf, int nb_sectors)
>  {
> -bool enabled;
>  int ret;
>  
> -enabled = bs->io_limits_enabled;
> -bs->io_limits_enabled = false;
> +bdrv_no_throttling_begin(bs);
>  ret = bdrv_read(bs, sector_num, buf, nb_sectors);
> -bs->io_limits_enabled = 

Re: [Qemu-block] [Qemu-devel] [PATCH 02/16] block: move restarting of throttled reqs to block/throttle-groups.c

2016-03-08 Thread Fam Zheng
On Tue, 02/16 18:56, Paolo Bonzini wrote:
> We want to remove throttled_reqs from block/io.c.  This is the easy
> part---hide the handling of throttled_reqs during disable/enable of
> throttling within throttle-groups.c.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/io.c  | 15 +--
>  block/throttle-groups.c | 15 +++
>  include/block/throttle-groups.h |  1 +
>  3 files changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index e58cfe2..5ee5032 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -66,28 +66,15 @@ static int coroutine_fn 
> bdrv_co_do_write_zeroes(BlockDriverState *bs,
>  void bdrv_set_io_limits(BlockDriverState *bs,
>  ThrottleConfig *cfg)
>  {
> -int i;
> -
>  throttle_group_config(bs, cfg);
> -
> -for (i = 0; i < 2; i++) {
> -qemu_co_enter_next(>throttled_reqs[i]);
> -}
>  }
>  
>  static void bdrv_start_throttled_reqs(BlockDriverState *bs)
>  {
>  bool enabled = bs->io_limits_enabled;
> -int i;
>  
>  bs->io_limits_enabled = false;
> -
> -for (i = 0; i < 2; i++) {
> -while (qemu_co_enter_next(>throttled_reqs[i])) {
> -;
> -}
> -}
> -
> +throttle_group_restart_bs(bs);
>  bs->io_limits_enabled = enabled;
>  }
>  
> diff --git a/block/throttle-groups.c b/block/throttle-groups.c
> index 4920e09..eccfc0d 100644
> --- a/block/throttle-groups.c
> +++ b/block/throttle-groups.c
> @@ -313,6 +313,17 @@ void coroutine_fn 
> throttle_group_co_io_limits_intercept(BlockDriverState *bs,
>  qemu_mutex_unlock(>lock);
>  }
>  
> +void throttle_group_restart_bs(BlockDriverState *bs)
> +{
> +int i;
> +
> +for (i = 0; i < 2; i++) {
> +while (qemu_co_enter_next(>throttled_reqs[i])) {
> +;
> +}
> +}
> +}
> +
>  /* Update the throttle configuration for a particular group. Similar
>   * to throttle_config(), but guarantees atomicity within the
>   * throttling group.
> @@ -335,6 +346,10 @@ void throttle_group_config(BlockDriverState *bs, 
> ThrottleConfig *cfg)
>  }
>  throttle_config(ts, tt, cfg);
>  qemu_mutex_unlock(>lock);
> +
> +aio_context_acquire(bdrv_get_aio_context(bs));
> +throttle_group_restart_bs(bs);
> +aio_context_release(bdrv_get_aio_context(bs));

Could you explain why does this hunk belong to this patch?

Otherwise looks good.

Fam

>  }
>  
>  /* Get the throttle configuration from a particular group. Similar to
> diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
> index aba28f3..395f72d 100644
> --- a/include/block/throttle-groups.h
> +++ b/include/block/throttle-groups.h
> @@ -38,6 +38,7 @@ void throttle_group_get_config(BlockDriverState *bs, 
> ThrottleConfig *cfg);
>  
>  void throttle_group_register_bs(BlockDriverState *bs, const char *groupname);
>  void throttle_group_unregister_bs(BlockDriverState *bs);
> +void throttle_group_restart_bs(BlockDriverState *bs);
>  
>  void coroutine_fn throttle_group_co_io_limits_intercept(BlockDriverState *bs,
>  unsigned int bytes,
> -- 
> 2.5.0
> 
> 
> 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 3/3] vmdk: Switch to heap arrays for vmdk_parent_open

2016-03-08 Thread Fam Zheng
On Tue, 03/08 16:24, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 
> ---
>  block/vmdk.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index c68f456..03be7f0 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -343,15 +343,16 @@ static int vmdk_reopen_prepare(BDRVReopenState *state,
>  static int vmdk_parent_open(BlockDriverState *bs)
>  {
>  char *p_name;
> -char desc[DESC_SIZE + 1];
> +char *desc;
>  BDRVVmdkState *s = bs->opaque;
>  int ret;
>  
> -desc[DESC_SIZE] = '\0';
> +desc = g_malloc0(DESC_SIZE + 1);
>  ret = bdrv_pread(bs->file->bs, s->desc_offset, desc, DESC_SIZE);
>  if (ret < 0) {
> -return ret;
> +goto out;
>  }
> +ret = 0;

Kevin, were you referring to this "ret = 0" in the cover letter? If so, it is
not useless, because ret was set to DESC_SIZE by bdrv_pread. :)

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH v4 07/10] qapi: Don't special-case simple union wrappers

2016-03-08 Thread Markus Armbruster
Eric Blake  writes:

> On 03/08/2016 08:59 AM, Markus Armbruster wrote:
>> Eric Blake  writes:
>> 
>>> Simple unions were carrying a special case that hid their 'data'
>>> QMP member from the resulting C struct, via the hack method
>>> QAPISchemaObjectTypeVariant.simple_union_type().  But by using
>>> the work we started by unboxing flat union and alternate
>>> branches, coupled with the ability to visit the members of an
>>> implicit type, we can now expose the simple union's implicit
>>> type in qapi-types.h:
>>>
>
>>> +++ b/scripts/qapi.py
>>> @@ -1006,7 +1006,6 @@ class QAPISchemaObjectType(QAPISchemaType):
>>>  return c_name(self.name) + pointer_suffix
>>>
>>>  def c_unboxed_type(self):
>>> -assert not self.is_implicit()
>> 
>> Doesn't this belong into PATCH 04?
>> 
>>>  return c_name(self.name)
>
> Maybe. Patch 3 kept the assertion out of straight code refactoring, and
> patch 4 didn't use c_unboxed_type(), so this was the first place where I
> had to weaken the assertion.  But moving it into patch 4 doesn't seem
> like it would hurt, as it is still semantically related to the fact that
> we are planning on allowing an unboxed implicit type.

PATCH 04 drops the assertion from c_name().  Let's keep the two
consistent.

>>> -visit_type_%(c_type)s_members(v, >u.%(c_name)s, );
>>> -''',
>>> - c_type=var.type.c_name(),
>>> - c_name=c_name(var.name))
>>> -ret += mcgen('''
>>> -break;
>>> -''')
>>> +   
>>> variants.tag_member.type.prefix),
>>> + c_type=var.type.c_name(), c_name=c_name(var.name))
>>>
>>>  ret += mcgen('''
>>>  default:
>> 
>> This stupid special case has given us enough trouble, good to see it
>> gone!
>
> Yeah, it was a nice feeling to get to this point!



Re: [Qemu-block] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite

2016-03-08 Thread Paolo Bonzini


On 16/02/2016 18:56, Paolo Bonzini wrote:
> So the fine-grained locking series has grown from 2 parts to 3...
> 
> This first part stops where we remove RFifoLock.
> 
> In the previous submission, the bulk of the series took care of
> associating an AioContext to a thread, so that aio_poll could run event
> handlers only on that thread.  This was done to fix a race in bdrv_drain.
> There were two disadvantages:
> 
> 1) reporting progress from aio_poll to the main thread was Really Hard.
> Despite being relatively sure of the correctness---also thanks to formal
> models---it's not the kind of code that I'd lightly donate to posterity.
> 
> 2) the strict dependency between bdrv_drain and a single AioContext
> would have been bad for multiqueue.
> 
> Instead, this series does the same trick (do not run dataplane event handlers
> from the main thread) but reports progress directly at the BlockDriverState
> level.
> 
> To do so, the mechanism to track in-flight requests is changed to a
> simple counter.  This is more flexible than BdrvTrackedRequest, and lets
> the block/io.c code track precisely the initiation and completion of the
> requests.  In particular, the counter remains nonzero when a bottom half
> is required to "really" complete the request.  bdrv_drain doesn't rely
> anymore on a non-blocking aio_poll to execute those bottom halves.
> 
> It is also more efficient; while BdrvTrackedRequest has to remain
> for request serialisation, with this series we can in principle make
> BdrvTrackedRequest optional and enable it only when something (automatic
> RMW or copy-on-read) requires request serialisation.
> 
> In general this flows nicely, the only snag being patch 5.  The problem
> here is that mirror is calling bdrv_drain from an AIO callback, which
> used to be okay (because bdrv_drain more or less tried to guess when
> all AIO callbacks were done) but now causes a deadlock (because bdrv_drain
> really checks for AIO callbacks to be complete).  To add to the complication,
> the bdrv_drain happens far away from the AIO callback, in the coroutine that
> the AIO callback enters.  The situation here is admittedly underdefined,
> and Stefan pointed out that the same solution is found in many other places
> in the QEMU block layer.  Therefore I think it's acceptable.  I'm pointing
> it out explicitly though, because (together with patch 8) it is an example
> of the issues caused by the new, stricter definition of bdrv_drain.
> 
> Patches 1-7 organize bdrv_drain into separate phases, with a well-defined
> order between the BDS and it children.  The phases are: 1) disable
> throttling; 2) disable io_plug; 3) call bdrv_drain callbacks; 4) wait
> for I/O to finish; 5) re-enable io_plug and throttling.  Previously,
> throttling of throttling and io_plug were handled by bdrv_flush_io_queue,
> which was repeatedly called as part of the I/O wait loop.  This part of
> the series removes bdrv_flush_io_queue.
> 
> Patch 8-11 replace aio_poll with bdrv_drain so that the work done
> so far applies to all former callers of aio_poll.
> 
> Patches 12-13 introduce the logic that lets the main thread wait remotely
> for an iothread to drain the BDS.  Patches 14-16 then achieve the purpose
> of this series, removing the long-running AioContext critical section
> from iothread_run and removing RFifoLock.
> 
> The series passes check-block.sh with a few fixes applied for pre-existing
> bugs:
> 
> vl: fix migration from prelaunch state
> migration: fix incorrect memory_global_dirty_log_start outside BQL
> qed: fix bdrv_qed_drain
> 
> Paolo
> 
> Paolo Bonzini (16):
>   block: make bdrv_start_throttled_reqs return void
>   block: move restarting of throttled reqs to block/throttle-groups.c
>   block: introduce bdrv_no_throttling_begin/end
>   block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
>   mirror: use bottom half to re-enter coroutine
>   block: add BDS field to count in-flight requests
>   block: change drain to look only at one child at a time
>   blockjob: introduce .drain callback for jobs
>   block: wait for all pending I/O when doing synchronous requests
>   nfs: replace aio_poll with bdrv_drain
>   sheepdog: disable dataplane
>   aio: introduce aio_context_in_iothread
>   block: only call aio_poll from iothread
>   iothread: release AioContext around aio_poll
>   qemu-thread: introduce QemuRecMutex
>   aio: convert from RFifoLock to QemuRecMutex
> 
>  async.c |  28 +---
>  block.c |   4 +-
>  block/backup.c  |   7 +
>  block/io.c  | 281 
> +---
>  block/linux-aio.c   |  13 +-
>  block/mirror.c  |  37 +-
>  block/nfs.c |  50 ---
>  block/qed-table.c   |  16 +--
>  block/qed.c |   4 +-
>  block/raw-aio.h |   2 +-
>  block/raw-posix.c   |  16 +--
>  

[Qemu-block] [PATCH 10/11] vmdk: Use BB functions in .bdrv_create()

2016-03-08 Thread Kevin Wolf
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf 
---
 block/vmdk.c | 77 +---
 1 file changed, 43 insertions(+), 34 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index dd80936..23bd57e 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -26,6 +26,7 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "block/block_int.h"
+#include "sysemu/block-backend.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
@@ -1650,7 +1651,7 @@ static int vmdk_create_extent(const char *filename, 
int64_t filesize,
   QemuOpts *opts, Error **errp)
 {
 int ret, i;
-BlockDriverState *bs = NULL;
+BlockBackend *blk = NULL;
 VMDK4Header header;
 Error *local_err = NULL;
 uint32_t tmp, magic, grains, gd_sectors, gt_size, gt_count;
@@ -1663,17 +1664,19 @@ static int vmdk_create_extent(const char *filename, 
int64_t filesize,
 goto exit;
 }
 
-assert(bs == NULL);
-ret = bdrv_open(, filename, NULL, NULL,
-BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
-_err);
-if (ret < 0) {
+blk = blk_new_open("extent", filename, NULL, NULL,
+   BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
+   _err);
+if (blk == NULL) {
 error_propagate(errp, local_err);
+ret = -EIO;
 goto exit;
 }
 
+blk_set_allow_write_beyond_eof(blk, true);
+
 if (flat) {
-ret = bdrv_truncate(bs, filesize);
+ret = blk_truncate(blk, filesize);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not truncate file");
 }
@@ -1728,18 +1731,18 @@ static int vmdk_create_extent(const char *filename, 
int64_t filesize,
 header.check_bytes[3] = 0xa;
 
 /* write all the data */
-ret = bdrv_pwrite(bs, 0, , sizeof(magic));
+ret = blk_pwrite(blk, 0, , sizeof(magic));
 if (ret < 0) {
 error_setg(errp, QERR_IO_ERROR);
 goto exit;
 }
-ret = bdrv_pwrite(bs, sizeof(magic), , sizeof(header));
+ret = blk_pwrite(blk, sizeof(magic), , sizeof(header));
 if (ret < 0) {
 error_setg(errp, QERR_IO_ERROR);
 goto exit;
 }
 
-ret = bdrv_truncate(bs, le64_to_cpu(header.grain_offset) << 9);
+ret = blk_truncate(blk, le64_to_cpu(header.grain_offset) << 9);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not truncate file");
 goto exit;
@@ -1752,8 +1755,8 @@ static int vmdk_create_extent(const char *filename, 
int64_t filesize,
  i < gt_count; i++, tmp += gt_size) {
 gd_buf[i] = cpu_to_le32(tmp);
 }
-ret = bdrv_pwrite(bs, le64_to_cpu(header.rgd_offset) * BDRV_SECTOR_SIZE,
-  gd_buf, gd_buf_size);
+ret = blk_pwrite(blk, le64_to_cpu(header.rgd_offset) * BDRV_SECTOR_SIZE,
+ gd_buf, gd_buf_size);
 if (ret < 0) {
 error_setg(errp, QERR_IO_ERROR);
 goto exit;
@@ -1764,8 +1767,8 @@ static int vmdk_create_extent(const char *filename, 
int64_t filesize,
  i < gt_count; i++, tmp += gt_size) {
 gd_buf[i] = cpu_to_le32(tmp);
 }
-ret = bdrv_pwrite(bs, le64_to_cpu(header.gd_offset) * BDRV_SECTOR_SIZE,
-  gd_buf, gd_buf_size);
+ret = blk_pwrite(blk, le64_to_cpu(header.gd_offset) * BDRV_SECTOR_SIZE,
+ gd_buf, gd_buf_size);
 if (ret < 0) {
 error_setg(errp, QERR_IO_ERROR);
 goto exit;
@@ -1773,8 +1776,8 @@ static int vmdk_create_extent(const char *filename, 
int64_t filesize,
 
 ret = 0;
 exit:
-if (bs) {
-bdrv_unref(bs);
+if (blk) {
+blk_unref(blk);
 }
 g_free(gd_buf);
 return ret;
@@ -1823,7 +1826,7 @@ static int filename_decompose(const char *filename, char 
*path, char *prefix,
 static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 int idx = 0;
-BlockDriverState *new_bs = NULL;
+BlockBackend *new_blk = NULL;
 Error *local_err = NULL;
 char *desc = NULL;
 int64_t total_size = 0, filesize;
@@ -1934,7 +1937,7 @@ static int vmdk_create(const char *filename, QemuOpts 
*opts, Error **errp)
 goto exit;
 }
 if (backing_file) {
-BlockDriverState *bs = NULL;
+BlockBackend *blk;
 char *full_backing = g_new0(char, PATH_MAX);
 bdrv_get_full_backing_filename_from_filename(filename, backing_file,
  full_backing, PATH_MAX,
@@ -1945,19 +1948,21 @@ static int vmdk_create(const char *filename, QemuOpts 
*opts, Error **errp)
 ret = -ENOENT;
 goto exit;
 }
-ret = bdrv_open(, full_backing, NULL, NULL,
-BDRV_O_NO_BACKING | 

[Qemu-block] [PATCH 06/11] qed: Use BB functions in .bdrv_create()

2016-03-08 Thread Kevin Wolf
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf 
---
 block/qed.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index 404be1e..8de7dd0 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -18,6 +18,7 @@
 #include "qed.h"
 #include "qapi/qmp/qerror.h"
 #include "migration/migration.h"
+#include "sysemu/block-backend.h"
 
 static const AIOCBInfo qed_aiocb_info = {
 .aiocb_size = sizeof(QEDAIOCB),
@@ -580,7 +581,7 @@ static int qed_create(const char *filename, uint32_t 
cluster_size,
 size_t l1_size = header.cluster_size * header.table_size;
 Error *local_err = NULL;
 int ret = 0;
-BlockDriverState *bs;
+BlockBackend *blk;
 
 ret = bdrv_create_file(filename, opts, _err);
 if (ret < 0) {
@@ -588,17 +589,18 @@ static int qed_create(const char *filename, uint32_t 
cluster_size,
 return ret;
 }
 
-bs = NULL;
-ret = bdrv_open(, filename, NULL, NULL,
-BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
-_err);
-if (ret < 0) {
+blk = blk_new_open("image", filename, NULL, NULL,
+   BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
+   _err);
+if (blk == NULL) {
 error_propagate(errp, local_err);
-return ret;
+return -EIO;
 }
 
+blk_set_allow_write_beyond_eof(blk, true);
+
 /* File must start empty and grow, check truncate is supported */
-ret = bdrv_truncate(bs, 0);
+ret = blk_truncate(blk, 0);
 if (ret < 0) {
 goto out;
 }
@@ -614,18 +616,18 @@ static int qed_create(const char *filename, uint32_t 
cluster_size,
 }
 
 qed_header_cpu_to_le(, _header);
-ret = bdrv_pwrite(bs, 0, _header, sizeof(le_header));
+ret = blk_pwrite(blk, 0, _header, sizeof(le_header));
 if (ret < 0) {
 goto out;
 }
-ret = bdrv_pwrite(bs, sizeof(le_header), backing_file,
-  header.backing_filename_size);
+ret = blk_pwrite(blk, sizeof(le_header), backing_file,
+ header.backing_filename_size);
 if (ret < 0) {
 goto out;
 }
 
 l1_table = g_malloc0(l1_size);
-ret = bdrv_pwrite(bs, header.l1_table_offset, l1_table, l1_size);
+ret = blk_pwrite(blk, header.l1_table_offset, l1_table, l1_size);
 if (ret < 0) {
 goto out;
 }
@@ -633,7 +635,7 @@ static int qed_create(const char *filename, uint32_t 
cluster_size,
 ret = 0; /* success */
 out:
 g_free(l1_table);
-bdrv_unref(bs);
+blk_unref(blk);
 return ret;
 }
 
-- 
1.8.3.1




[Qemu-block] [PATCH 07/11] sheepdog: Use BB functions in .bdrv_create()

2016-03-08 Thread Kevin Wolf
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf 
---
 block/sheepdog.c | 43 ---
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 5f31ab3..335a60e 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -18,6 +18,7 @@
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
 #include "block/block_int.h"
+#include "sysemu/block-backend.h"
 #include "qemu/bitops.h"
 
 #define SD_PROTO_VER 0x01
@@ -1637,7 +1638,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t 
*vdi_id, int snapshot,
 
 static int sd_prealloc(const char *filename, Error **errp)
 {
-BlockDriverState *bs = NULL;
+BlockBackend *blk = NULL;
 BDRVSheepdogState *base = NULL;
 unsigned long buf_size;
 uint32_t idx, max_idx;
@@ -1646,20 +1647,23 @@ static int sd_prealloc(const char *filename, Error 
**errp)
 void *buf = NULL;
 int ret;
 
-ret = bdrv_open(, filename, NULL, NULL,
-BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
-errp);
-if (ret < 0) {
+blk = blk_new_open("image-prealloc", filename, NULL, NULL,
+   BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
+   errp);
+if (blk == NULL) {
+ret = -EIO;
 goto out_with_err_set;
 }
 
-vdi_size = bdrv_getlength(bs);
+blk_set_allow_write_beyond_eof(blk, true);
+
+vdi_size = blk_getlength(blk);
 if (vdi_size < 0) {
 ret = vdi_size;
 goto out;
 }
 
-base = bs->opaque;
+base = blk_bs(blk)->opaque;
 object_size = (UINT32_C(1) << base->inode.block_size_shift);
 buf_size = MIN(object_size, SD_DATA_OBJ_SIZE);
 buf = g_malloc0(buf_size);
@@ -1671,23 +1675,24 @@ static int sd_prealloc(const char *filename, Error 
**errp)
  * The created image can be a cloned image, so we need to read
  * a data from the source image.
  */
-ret = bdrv_pread(bs, idx * buf_size, buf, buf_size);
+ret = blk_pread(blk, idx * buf_size, buf, buf_size);
 if (ret < 0) {
 goto out;
 }
-ret = bdrv_pwrite(bs, idx * buf_size, buf, buf_size);
+ret = blk_pwrite(blk, idx * buf_size, buf, buf_size);
 if (ret < 0) {
 goto out;
 }
 }
 
+ret = 0;
 out:
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Can't pre-allocate");
 }
 out_with_err_set:
-if (bs) {
-bdrv_unref(bs);
+if (blk) {
+blk_unref(blk);
 }
 g_free(buf);
 
@@ -1827,7 +1832,7 @@ static int sd_create(const char *filename, QemuOpts *opts,
 }
 
 if (backing_file) {
-BlockDriverState *bs;
+BlockBackend *blk;
 BDRVSheepdogState *base;
 BlockDriver *drv;
 
@@ -1839,23 +1844,23 @@ static int sd_create(const char *filename, QemuOpts 
*opts,
 goto out;
 }
 
-bs = NULL;
-ret = bdrv_open(, backing_file, NULL, NULL,
-BDRV_O_PROTOCOL | BDRV_O_CACHE_WB, errp);
-if (ret < 0) {
+blk = blk_new_open("backing", backing_file, NULL, NULL,
+   BDRV_O_PROTOCOL | BDRV_O_CACHE_WB, errp);
+if (blk == NULL) {
+ret = -EIO;
 goto out;
 }
 
-base = bs->opaque;
+base = blk_bs(blk)->opaque;
 
 if (!is_snapshot(>inode)) {
 error_setg(errp, "cannot clone from a non snapshot vdi");
-bdrv_unref(bs);
+blk_unref(blk);
 ret = -EINVAL;
 goto out;
 }
 s->inode.vdi_id = base->inode.vdi_id;
-bdrv_unref(bs);
+blk_unref(blk);
 }
 
 s->aio_context = qemu_get_aio_context();
-- 
1.8.3.1




[Qemu-block] [PATCH 11/11] vpc: Use BB functions in .bdrv_create()

2016-03-08 Thread Kevin Wolf
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf 
---
 block/vpc.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 1db47d6..0d1524d 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "block/block_int.h"
+#include "sysemu/block-backend.h"
 #include "qemu/module.h"
 #include "migration/migration.h"
 #if defined(CONFIG_UUID)
@@ -758,7 +759,7 @@ static int calculate_geometry(int64_t total_sectors, 
uint16_t* cyls,
 return 0;
 }
 
-static int create_dynamic_disk(BlockDriverState *bs, uint8_t *buf,
+static int create_dynamic_disk(BlockBackend *blk, uint8_t *buf,
int64_t total_sectors)
 {
 VHDDynDiskHeader *dyndisk_header =
@@ -772,13 +773,13 @@ static int create_dynamic_disk(BlockDriverState *bs, 
uint8_t *buf,
 block_size = 0x20;
 num_bat_entries = (total_sectors + block_size / 512) / (block_size / 512);
 
-ret = bdrv_pwrite_sync(bs, offset, buf, HEADER_SIZE);
+ret = blk_pwrite(blk, offset, buf, HEADER_SIZE);
 if (ret) {
 goto fail;
 }
 
 offset = 1536 + ((num_bat_entries * 4 + 511) & ~511);
-ret = bdrv_pwrite_sync(bs, offset, buf, HEADER_SIZE);
+ret = blk_pwrite(blk, offset, buf, HEADER_SIZE);
 if (ret < 0) {
 goto fail;
 }
@@ -788,7 +789,7 @@ static int create_dynamic_disk(BlockDriverState *bs, 
uint8_t *buf,
 
 memset(buf, 0xFF, 512);
 for (i = 0; i < (num_bat_entries * 4 + 511) / 512; i++) {
-ret = bdrv_pwrite_sync(bs, offset, buf, 512);
+ret = blk_pwrite(blk, offset, buf, 512);
 if (ret < 0) {
 goto fail;
 }
@@ -815,7 +816,7 @@ static int create_dynamic_disk(BlockDriverState *bs, 
uint8_t *buf,
 // Write the header
 offset = 512;
 
-ret = bdrv_pwrite_sync(bs, offset, buf, 1024);
+ret = blk_pwrite(blk, offset, buf, 1024);
 if (ret < 0) {
 goto fail;
 }
@@ -824,7 +825,7 @@ static int create_dynamic_disk(BlockDriverState *bs, 
uint8_t *buf,
 return ret;
 }
 
-static int create_fixed_disk(BlockDriverState *bs, uint8_t *buf,
+static int create_fixed_disk(BlockBackend *blk, uint8_t *buf,
  int64_t total_size)
 {
 int ret;
@@ -832,12 +833,12 @@ static int create_fixed_disk(BlockDriverState *bs, 
uint8_t *buf,
 /* Add footer to total size */
 total_size += HEADER_SIZE;
 
-ret = bdrv_truncate(bs, total_size);
+ret = blk_truncate(blk, total_size);
 if (ret < 0) {
 return ret;
 }
 
-ret = bdrv_pwrite_sync(bs, total_size - HEADER_SIZE, buf, HEADER_SIZE);
+ret = blk_pwrite(blk, total_size - HEADER_SIZE, buf, HEADER_SIZE);
 if (ret < 0) {
 return ret;
 }
@@ -860,7 +861,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, 
Error **errp)
 int ret = -EIO;
 bool force_size;
 Error *local_err = NULL;
-BlockDriverState *bs = NULL;
+BlockBackend *blk = NULL;
 
 /* Read out options */
 total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
@@ -886,14 +887,18 @@ static int vpc_create(const char *filename, QemuOpts 
*opts, Error **errp)
 error_propagate(errp, local_err);
 goto out;
 }
-ret = bdrv_open(, filename, NULL, NULL,
-BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
-_err);
-if (ret < 0) {
+
+blk = blk_new_open("image", filename, NULL, NULL,
+   BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
+   _err);
+if (blk == NULL) {
 error_propagate(errp, local_err);
+ret = -EIO;
 goto out;
 }
 
+blk_set_allow_write_beyond_eof(blk, true);
+
 /*
  * Calculate matching total_size and geometry. Increase the number of
  * sectors requested until we get enough (or fail). This ensures that
@@ -965,13 +970,13 @@ static int vpc_create(const char *filename, QemuOpts 
*opts, Error **errp)
 footer->checksum = cpu_to_be32(vpc_checksum(buf, HEADER_SIZE));
 
 if (disk_type == VHD_DYNAMIC) {
-ret = create_dynamic_disk(bs, buf, total_sectors);
+ret = create_dynamic_disk(blk, buf, total_sectors);
 } else {
-ret = create_fixed_disk(bs, buf, total_size);
+ret = create_fixed_disk(blk, buf, total_size);
 }
 
 out:
-bdrv_unref(bs);
+blk_unref(blk);
 g_free(disk_type_param);
 return ret;
 }
-- 
1.8.3.1




[Qemu-block] [PATCH 04/11] qcow: Use BB functions in .bdrv_create()

2016-03-08 Thread Kevin Wolf
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf 
---
 block/qcow.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index c46810c..2fd5ee6 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -24,6 +24,7 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "block/block_int.h"
+#include "sysemu/block-backend.h"
 #include "qemu/module.h"
 #include 
 #include "qapi/qmp/qerror.h"
@@ -780,7 +781,7 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 int flags = 0;
 Error *local_err = NULL;
 int ret;
-BlockDriverState *qcow_bs;
+BlockBackend *qcow_blk;
 
 /* Read out options */
 total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
@@ -796,16 +797,18 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 goto cleanup;
 }
 
-qcow_bs = NULL;
-ret = bdrv_open(_bs, filename, NULL, NULL,
-BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
-_err);
-if (ret < 0) {
+qcow_blk = blk_new_open("image", filename, NULL, NULL,
+BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
+_err);
+if (qcow_blk == NULL) {
 error_propagate(errp, local_err);
+ret = -EIO;
 goto cleanup;
 }
 
-ret = bdrv_truncate(qcow_bs, 0);
+blk_set_allow_write_beyond_eof(qcow_blk, true);
+
+ret = blk_truncate(qcow_blk, 0);
 if (ret < 0) {
 goto exit;
 }
@@ -845,13 +848,13 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 
 /* write all the data */
-ret = bdrv_pwrite(qcow_bs, 0, , sizeof(header));
+ret = blk_pwrite(qcow_blk, 0, , sizeof(header));
 if (ret != sizeof(header)) {
 goto exit;
 }
 
 if (backing_file) {
-ret = bdrv_pwrite(qcow_bs, sizeof(header),
+ret = blk_pwrite(qcow_blk, sizeof(header),
 backing_file, backing_filename_len);
 if (ret != backing_filename_len) {
 goto exit;
@@ -861,7 +864,7 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 tmp = g_malloc0(BDRV_SECTOR_SIZE);
 for (i = 0; i < ((sizeof(uint64_t)*l1_size + BDRV_SECTOR_SIZE - 1)/
 BDRV_SECTOR_SIZE); i++) {
-ret = bdrv_pwrite(qcow_bs, header_size +
+ret = blk_pwrite(qcow_blk, header_size +
 BDRV_SECTOR_SIZE*i, tmp, BDRV_SECTOR_SIZE);
 if (ret != BDRV_SECTOR_SIZE) {
 g_free(tmp);
@@ -872,7 +875,7 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 g_free(tmp);
 ret = 0;
 exit:
-bdrv_unref(qcow_bs);
+blk_unref(qcow_blk);
 cleanup:
 g_free(backing_file);
 return ret;
-- 
1.8.3.1




[Qemu-block] [PATCH 00/11] block: Use BB function in .bdrv_create() implementations

2016-03-08 Thread Kevin Wolf
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementations are such users, so this series
converts them.

This series (specifically patch 1) will also help with moving the
writethrough cache implementation from BDS to BB, where it really
belongs. Once this is moved, it wouldn't be possible to use bdrv_open()
to use a standalone BDS writethrough any more.

Of course, writethrough doesn't make any sense in .bdrv_create() anyway,
so while I'm still converting everything to BB where it would keep
working (because that's the Right Thing), the drivers don't actually
make use of this fact any more after this series.


Kevin Wolf (11):
  block: Use writeback in .bdrv_create() implementations
  block: Introduce blk_set_allow_write_beyond_eof()
  parallels: Use BB functions in .bdrv_create()
  qcow: Use BB functions in .bdrv_create()
  qcow2: Use BB functions in .bdrv_create()
  qed: Use BB functions in .bdrv_create()
  sheepdog: Use BB functions in .bdrv_create()
  vdi: Use BB functions in .bdrv_create()
  vhdx: Use BB functions in .bdrv_create()
  vmdk: Use BB functions in .bdrv_create()
  vpc: Use BB functions in .bdrv_create()

 block/block-backend.c  | 23 +
 block/parallels.c  | 24 --
 block/qcow.c   | 24 --
 block/qcow2.c  | 61 ++
 block/qed.c| 28 
 block/sheepdog.c   | 41 +--
 block/vdi.c| 22 -
 block/vhdx.c   | 28 +---
 block/vmdk.c   | 74 --
 block/vpc.c| 36 +++-
 include/sysemu/block-backend.h |  1 +
 11 files changed, 211 insertions(+), 151 deletions(-)

-- 
1.8.3.1




[Qemu-block] [PATCH 01/11] block: Use writeback in .bdrv_create() implementations

2016-03-08 Thread Kevin Wolf
There's no reason to use a writethrough cache mode while creating an
image.

Signed-off-by: Kevin Wolf 
---
 block/parallels.c | 3 ++-
 block/qcow.c  | 3 ++-
 block/qcow2.c | 3 ++-
 block/sheepdog.c  | 6 --
 block/vdi.c   | 3 ++-
 block/vhdx.c  | 3 ++-
 block/vmdk.c  | 9 ++---
 block/vpc.c   | 3 ++-
 8 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 645521d..3f9fd48 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -479,7 +479,8 @@ static int parallels_create(const char *filename, QemuOpts 
*opts, Error **errp)
 
 file = NULL;
 ret = bdrv_open(, filename, NULL, NULL,
-BDRV_O_RDWR | BDRV_O_PROTOCOL, _err);
+BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
+_err);
 if (ret < 0) {
 error_propagate(errp, local_err);
 return ret;
diff --git a/block/qcow.c b/block/qcow.c
index 251910c..c46810c 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -798,7 +798,8 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 
 qcow_bs = NULL;
 ret = bdrv_open(_bs, filename, NULL, NULL,
-BDRV_O_RDWR | BDRV_O_PROTOCOL, _err);
+BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
+_err);
 if (ret < 0) {
 error_propagate(errp, local_err);
 goto cleanup;
diff --git a/block/qcow2.c b/block/qcow2.c
index 8babecd..5a79177 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2173,7 +2173,8 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 }
 
 bs = NULL;
-ret = bdrv_open(, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+ret = bdrv_open(, filename, NULL, NULL,
+BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
 _err);
 if (ret < 0) {
 error_propagate(errp, local_err);
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 8739acc..5f31ab3 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1646,7 +1646,8 @@ static int sd_prealloc(const char *filename, Error **errp)
 void *buf = NULL;
 int ret;
 
-ret = bdrv_open(, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+ret = bdrv_open(, filename, NULL, NULL,
+BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
 errp);
 if (ret < 0) {
 goto out_with_err_set;
@@ -1839,7 +1840,8 @@ static int sd_create(const char *filename, QemuOpts *opts,
 }
 
 bs = NULL;
-ret = bdrv_open(, backing_file, NULL, NULL, BDRV_O_PROTOCOL, errp);
+ret = bdrv_open(, backing_file, NULL, NULL,
+BDRV_O_PROTOCOL | BDRV_O_CACHE_WB, errp);
 if (ret < 0) {
 goto out;
 }
diff --git a/block/vdi.c b/block/vdi.c
index b403243..12407c4 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -766,7 +766,8 @@ static int vdi_create(const char *filename, QemuOpts *opts, 
Error **errp)
 error_propagate(errp, local_err);
 goto exit;
 }
-ret = bdrv_open(, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+ret = bdrv_open(, filename, NULL, NULL,
+BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
 _err);
 if (ret < 0) {
 error_propagate(errp, local_err);
diff --git a/block/vhdx.c b/block/vhdx.c
index 9a51428..ea030ad 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1838,7 +1838,8 @@ static int vhdx_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 
 bs = NULL;
-ret = bdrv_open(, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+ret = bdrv_open(, filename, NULL, NULL,
+BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
 _err);
 if (ret < 0) {
 error_propagate(errp, local_err);
diff --git a/block/vmdk.c b/block/vmdk.c
index 03be7f0..dd80936 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1664,7 +1664,8 @@ static int vmdk_create_extent(const char *filename, 
int64_t filesize,
 }
 
 assert(bs == NULL);
-ret = bdrv_open(, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+ret = bdrv_open(, filename, NULL, NULL,
+BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
 _err);
 if (ret < 0) {
 error_propagate(errp, local_err);
@@ -1944,7 +1945,8 @@ static int vmdk_create(const char *filename, QemuOpts 
*opts, Error **errp)
 ret = -ENOENT;
 goto exit;
 }
-ret = bdrv_open(, full_backing, NULL, NULL, BDRV_O_NO_BACKING, 
errp);
+ret = bdrv_open(, full_backing, NULL, NULL,
+BDRV_O_NO_BACKING | BDRV_O_CACHE_WB, errp);
 g_free(full_backing);
 if (ret != 0) {
 goto exit;
@@ -2015,7 +2017,8 @@ static int vmdk_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 

[Qemu-block] [PATCH 02/11] block: Introduce blk_set_allow_write_beyond_eof()

2016-03-08 Thread Kevin Wolf
We check that the guest can't write beyond the end of its disk, but for
other internal users it can make sense to allow growing a file.

Signed-off-by: Kevin Wolf 
---
 block/block-backend.c  | 23 ---
 include/sysemu/block-backend.h |  1 +
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index ebdf78a..03e71b4 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -50,6 +50,8 @@ struct BlockBackend {
 bool iostatus_enabled;
 BlockDeviceIoStatus iostatus;
 
+bool allow_write_beyond_eof;
+
 NotifierList remove_bs_notifiers, insert_bs_notifiers;
 };
 
@@ -579,6 +581,11 @@ void blk_iostatus_set_err(BlockBackend *blk, int error)
 }
 }
 
+void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow)
+{
+blk->allow_write_beyond_eof = allow;
+}
+
 static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
   size_t size)
 {
@@ -592,17 +599,19 @@ static int blk_check_byte_request(BlockBackend *blk, 
int64_t offset,
 return -ENOMEDIUM;
 }
 
-len = blk_getlength(blk);
-if (len < 0) {
-return len;
-}
-
 if (offset < 0) {
 return -EIO;
 }
 
-if (offset > len || len - offset < size) {
-return -EIO;
+if (!blk->allow_write_beyond_eof) {
+len = blk_getlength(blk);
+if (len < 0) {
+return len;
+}
+
+if (offset > len || len - offset < size) {
+return -EIO;
+}
 }
 
 return 0;
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 66c5cf2..00d69ba 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -78,6 +78,7 @@ void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs);
 
 void blk_hide_on_behalf_of_hmp_drive_del(BlockBackend *blk);
 
+void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow);
 void blk_iostatus_enable(BlockBackend *blk);
 bool blk_iostatus_is_enabled(const BlockBackend *blk);
 BlockDeviceIoStatus blk_iostatus(const BlockBackend *blk);
-- 
1.8.3.1




Re: [Qemu-block] [Qemu-devel] [PATCH v4 07/10] qapi: Don't special-case simple union wrappers

2016-03-08 Thread Eric Blake
On 03/08/2016 08:59 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> Simple unions were carrying a special case that hid their 'data'
>> QMP member from the resulting C struct, via the hack method
>> QAPISchemaObjectTypeVariant.simple_union_type().  But by using
>> the work we started by unboxing flat union and alternate
>> branches, coupled with the ability to visit the members of an
>> implicit type, we can now expose the simple union's implicit
>> type in qapi-types.h:
>>

>> +++ b/scripts/qapi.py
>> @@ -1006,7 +1006,6 @@ class QAPISchemaObjectType(QAPISchemaType):
>>  return c_name(self.name) + pointer_suffix
>>
>>  def c_unboxed_type(self):
>> -assert not self.is_implicit()
> 
> Doesn't this belong into PATCH 04?
> 
>>  return c_name(self.name)

Maybe. Patch 3 kept the assertion out of straight code refactoring, and
patch 4 didn't use c_unboxed_type(), so this was the first place where I
had to weaken the assertion.  But moving it into patch 4 doesn't seem
like it would hurt, as it is still semantically related to the fact that
we are planning on allowing an unboxed implicit type.

>> -visit_type_%(c_type)s_members(v, >u.%(c_name)s, );
>> -''',
>> - c_type=var.type.c_name(),
>> - c_name=c_name(var.name))
>> -ret += mcgen('''
>> -break;
>> -''')
>> +   variants.tag_member.type.prefix),
>> + c_type=var.type.c_name(), c_name=c_name(var.name))
>>
>>  ret += mcgen('''
>>  default:
> 
> This stupid special case has given us enough trouble, good to see it
> gone!

Yeah, it was a nice feeling to get to this point!

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH 2/8] block: fix unbounded stack for dump_qdict

2016-03-08 Thread Peter Xu
Suggested-by: Paolo Bonzini 
CC: Markus Armbruster 
CC: Kevin Wolf 
CC: qemu-block@nongnu.org
Signed-off-by: Peter Xu 
---
 block/qapi.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/qapi.c b/block/qapi.c
index db2d3fb..687e577 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -638,9 +638,12 @@ static void dump_qdict(fprintf_function func_fprintf, void 
*f, int indentation,
 QType type = qobject_type(entry->value);
 bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
 const char *format = composite ? "%*s%s:\n" : "%*s%s: ";
-char key[strlen(entry->key) + 1];
+#define __KEY_LEN (256)
+char key[__KEY_LEN];
 int i;
 
+assert(strlen(entry->key) + 1 <= __KEY_LEN);
+#undef __KEY_LEN
 /* replace dashes with spaces in key (variable) names */
 for (i = 0; entry->key[i]; i++) {
 key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];
-- 
2.4.3




Re: [Qemu-block] [Qemu-devel] [PATCH 2/8] block: fix unbounded stack for dump_qdict

2016-03-08 Thread Peter Xu
On Tue, Mar 08, 2016 at 09:12:45AM +0100, Markus Armbruster wrote:
> Peter Xu  writes:
> >  const char *format = composite ? "%*s%s:\n" : "%*s%s: ";
> 
> Unrelated to your patch: ugh!
> 
> Printf formats should be literals whenever possible, to make it easy for
> the compiler to warn you when you screw up.  It's trivially possible
> here!  Instead of
> 
>func_fprintf(f, format, indentation * 4, "", key);
> 
> do
> 
>func_fprintf(f, "%*s%s:%c", indentation * 4, "", key,
> composite ? '\n', ' ');;

Yes, I can do that too. Do you think I should split the patchset
into several smaller ones possibly? So that I can use a 2/2 for this
specific function, one for the printf() issue, one for the stack
allocation issue.

> 
> > -char key[strlen(entry->key) + 1];
> > +#define __KEY_LEN (256)
> > +char key[__KEY_LEN];
> >  int i;
> >  
> > +assert(strlen(entry->key) + 1 <= __KEY_LEN);
> > +#undef __KEY_LEN
> >  /* replace dashes with spaces in key (variable) names */
> >  for (i = 0; entry->key[i]; i++) {
> >  key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];
> 
> I'm afraid this isn't a good idea.  It relies on the non-local argument
> that nobody will ever put a key longer than 255 into a qdict that gets
> dumped.  That may even be the case, but you need to *prove* it, not just
> assert it.  The weakest acceptable proof might be assertions in every
> place that put keys into a dict that might get dumped.  I suspect that's
> practical and maintainable only if there's a single place that does it.

Yes. Actually I doubted whether I should do this change... since I
am not experienced enough to estimate a value which will be 100%
working all the time. Here I just assumed 256 is big enough for
qdict keys, which indeed is lack of proof.

> 
> If this was a good idea, I'd recommend to avoid the awkward macro:
> 
>char key[256];
>int i;
>
>assert(strlen(entry->key) + 1 <= ARRAY_SIZE(key));

Yes, possibly! I forgot ARRAY_SIZE() macro. Thanks to point out.

> 
> There are several other ways to limit the stack usage:
> 
> 1. Move the array from stack to heap.  Fine unless it's on a hot path.
>As far as I can tell, this dumping business is for HMP and qemu-io,
>i.e. not hot.
> 
> 2. Use a stack array for short strings, switch to the heap for large
>strings.  More complex, but may be worth it on a hot path where most
>strings are short.
> 
> 3. Instead of creating a new string with '-' replaced for printing,
>print characters.  Can be okay with buffered I/O, but obviously not
>on a hot path.
> 
> 4. Like 3., but print substrings not containing '-'.

Thanks for all the suggestions and ideas.

To avoid the limitation of 256 (and also consider this path is not
hot path), I'd like to choose (1) if you would not mind, in a split
patch maybe.

Thanks.
Peter



Re: [Qemu-block] [PATCH] vmdk: Create streamOptimized as version 3

2016-03-08 Thread Radoslav Gerganov
On 17.09.2015 08:04, Fam Zheng wrote:
> VMware products accept only version 3 for streamOptimized, let's bump
> the version.
> 
> Reported-by: Radoslav Gerganov 
> Signed-off-by: Fam Zheng 
> ---
>  block/vmdk.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)

Hi Fam,

I am really sorry for the late reply.  Your patch looks great, I confirm that
VMware products expect VMDK version 3 for streamOptimized images.

I will try to follow-up the VMDK work promptly from now on.  Please keep
adding me in CC.  Thanks! 

-Rado



Re: [Qemu-block] [Qemu-devel] [PATCH] vmdk: Switch to heap arrays for vmdk_write_cid

2016-03-08 Thread Peter Xu
On Tue, Mar 08, 2016 at 02:18:35PM +0800, Fam Zheng wrote:
> It is only called once for each opened image, so we can do it the easy
> way.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/vmdk.c | 25 ++---
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index a8db5d9..1ec2452 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -274,36 +274,39 @@ static uint32_t vmdk_read_cid(BlockDriverState *bs, int 
> parent)
>  
>  static int vmdk_write_cid(BlockDriverState *bs, uint32_t cid)

If to use heap in write, do we need to do the same for e.g.,
vmdk_parent_open() and vmdk_read_cid(), to make all things at least
aligned?

>  {
> -char desc[DESC_SIZE], tmp_desc[DESC_SIZE];
> +char *desc, *tmp_desc;
>  char *p_name, *tmp_str;
>  BDRVVmdkState *s = bs->opaque;
> -int ret;
> +int ret = 0;
>  
> +desc = g_malloc0(DESC_SIZE);
> +tmp_desc = g_malloc0(DESC_SIZE);
>  ret = bdrv_pread(bs->file->bs, s->desc_offset, desc, DESC_SIZE);
>  if (ret < 0) {
> -return ret;
> +goto out;
>  }
>  
>  desc[DESC_SIZE - 1] = '\0';
>  tmp_str = strstr(desc, "parentCID");
>  if (tmp_str == NULL) {
> -return -EINVAL;
> +ret = -EINVAL;
> +goto out;
>  }
>  
> -pstrcpy(tmp_desc, sizeof(tmp_desc), tmp_str);
> +pstrcpy(tmp_desc, DESC_SIZE, tmp_str);
>  p_name = strstr(desc, "CID");
>  if (p_name != NULL) {
>  p_name += sizeof("CID");
> -snprintf(p_name, sizeof(desc) - (p_name - desc), "%" PRIx32 "\n", 
> cid);
> -pstrcat(desc, sizeof(desc), tmp_desc);
> +snprintf(p_name, DESC_SIZE - (p_name - desc), "%" PRIx32 "\n", cid);
> +pstrcat(desc, DESC_SIZE, tmp_desc);
>  }
>  
>  ret = bdrv_pwrite_sync(bs->file->bs, s->desc_offset, desc, DESC_SIZE);
> -if (ret < 0) {
> -return ret;
> -}
>  
> -return 0;
> +out:
> +g_free(desc);
> +g_free(tmp_desc);
> +return ret;
>  }
>  
>  static int vmdk_is_cid_valid(BlockDriverState *bs)
> -- 
> 2.4.3
> 
> 

Besides the above nit-pick:

Reviewed-by: Peter Xu 



[Qemu-block] [PATCH 0/8] Fix several unbounded stack usage

2016-03-08 Thread Peter Xu
Suggested by Paolo.

Ths patchset fixes several of the warnings generated by
"-Wstack-usage=10".  There are about 20-30 unbound stack cases
during my build, and this patch is only fixing several of them,
those which are obvious and easy.  For the rest, most of them need
some knowledge on specific area (e.g., USB, net, block) to have a
better assessment on the limitiation values, and are not covered in
this patchset.

One thing to mention about patch 4: I still cannot figure out why
the function xhci_dma_write_u32s() cannot be inlined in short
time... However, the current fix can at least keep the code behavior
not changed while making it stack bounded.  Please let me know if
anyone knows.

Thanks.
Peter

CC: Markus Armbruster 
CC: Kevin Wolf 
CC: "Michael S. Tsirkin" 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: Gerd Hoffmann 
CC: Juan Quintela 
CC: Amit Shah 
CC: Luiz Capitulino 
CC: qemu-block@nongnu.org

Peter Xu (8):
  qdict: fix unbounded stack for qdict_array_entries
  block: fix unbounded stack for dump_qdict
  usb: fix unbounded stack for ohci_td_pkt
  usb: fix unbounded stack for xhci_dma_write_u32s
  usb: fix unbounded stack for inotify_watchfn
  usb: fix unbounded stack for usb_mtp_add_str
  migration: fix unbounded stack for source_return_path_thread
  hw/i386: fix unbounded stack for load_multiboot

 block/qapi.c  |  5 -
 hw/i386/multiboot.c   | 10 +-
 hw/usb/dev-mtp.c  | 13 +
 hw/usb/hcd-ohci.c |  7 ---
 hw/usb/hcd-xhci.c | 12 
 migration/migration.c |  7 ---
 qobject/qdict.c   | 15 +--
 7 files changed, 47 insertions(+), 22 deletions(-)

-- 
2.4.3




Re: [Qemu-block] [Qemu-devel] [PATCH 2/8] block: fix unbounded stack for dump_qdict

2016-03-08 Thread Markus Armbruster
Peter Xu  writes:

> On Tue, Mar 08, 2016 at 09:12:45AM +0100, Markus Armbruster wrote:
>> Peter Xu  writes:
>> >  const char *format = composite ? "%*s%s:\n" : "%*s%s: ";
>> 
>> Unrelated to your patch: ugh!
>> 
>> Printf formats should be literals whenever possible, to make it easy for
>> the compiler to warn you when you screw up.  It's trivially possible
>> here!  Instead of
>> 
>>func_fprintf(f, format, indentation * 4, "", key);
>> 
>> do
>> 
>>func_fprintf(f, "%*s%s:%c", indentation * 4, "", key,
>> composite ? '\n', ' ');;
>
> Yes, I can do that too. Do you think I should split the patchset
> into several smaller ones possibly? So that I can use a 2/2 for this
> specific function, one for the printf() issue, one for the stack
> allocation issue.

Since the format string cleanup and the unbounded stack issue aren't
related, separate patches make sense.

>> 
>> > -char key[strlen(entry->key) + 1];
>> > +#define __KEY_LEN (256)
>> > +char key[__KEY_LEN];
>> >  int i;
>> >  
>> > +assert(strlen(entry->key) + 1 <= __KEY_LEN);
>> > +#undef __KEY_LEN
>> >  /* replace dashes with spaces in key (variable) names */
>> >  for (i = 0; entry->key[i]; i++) {
>> >  key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];
>> 
>> I'm afraid this isn't a good idea.  It relies on the non-local argument
>> that nobody will ever put a key longer than 255 into a qdict that gets
>> dumped.  That may even be the case, but you need to *prove* it, not just
>> assert it.  The weakest acceptable proof might be assertions in every
>> place that put keys into a dict that might get dumped.  I suspect that's
>> practical and maintainable only if there's a single place that does it.
>
> Yes. Actually I doubted whether I should do this change... since I
> am not experienced enough to estimate a value which will be 100%
> working all the time. Here I just assumed 256 is big enough for
> qdict keys, which indeed is lack of proof.
>
>> 
>> If this was a good idea, I'd recommend to avoid the awkward macro:
>> 
>>char key[256];
>>int i;
>>
>>assert(strlen(entry->key) + 1 <= ARRAY_SIZE(key));
>
> Yes, possibly! I forgot ARRAY_SIZE() macro. Thanks to point out.
>
>> 
>> There are several other ways to limit the stack usage:
>> 
>> 1. Move the array from stack to heap.  Fine unless it's on a hot path.
>>As far as I can tell, this dumping business is for HMP and qemu-io,
>>i.e. not hot.
>> 
>> 2. Use a stack array for short strings, switch to the heap for large
>>strings.  More complex, but may be worth it on a hot path where most
>>strings are short.
>> 
>> 3. Instead of creating a new string with '-' replaced for printing,
>>print characters.  Can be okay with buffered I/O, but obviously not
>>on a hot path.
>> 
>> 4. Like 3., but print substrings not containing '-'.
>
> Thanks for all the suggestions and ideas.
>
> To avoid the limitation of 256 (and also consider this path is not
> hot path), I'd like to choose (1) if you would not mind, in a split
> patch maybe.

Sounds good to me.



Re: [Qemu-block] [Qemu-devel] [PATCH 2/8] block: fix unbounded stack for dump_qdict

2016-03-08 Thread Markus Armbruster
Fam Zheng  writes:

> On Tue, 03/08 09:12, Markus Armbruster wrote:
>> Peter Xu  writes:
>> 
>> > Suggested-by: Paolo Bonzini 
>> > CC: Markus Armbruster 
>> > CC: Kevin Wolf 
>> > CC: qemu-block@nongnu.org
>> > Signed-off-by: Peter Xu 
>> > ---
>> >  block/qapi.c | 5 -
>> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/block/qapi.c b/block/qapi.c
>> > index db2d3fb..687e577 100644
>> > --- a/block/qapi.c
>> > +++ b/block/qapi.c
>> > @@ -638,9 +638,12 @@ static void dump_qdict(fprintf_function func_fprintf, 
>> > void *f, int indentation,
>> >  QType type = qobject_type(entry->value);
>> >  bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
>> >  const char *format = composite ? "%*s%s:\n" : "%*s%s: ";
>> 
>> Unrelated to your patch: ugh!
>> 
>> Printf formats should be literals whenever possible, to make it easy for
>> the compiler to warn you when you screw up.  It's trivially possible
>> here!  Instead of
>> 
>>func_fprintf(f, format, indentation * 4, "", key);
>> 
>> do
>> 
>>func_fprintf(f, "%*s%s:%c", indentation * 4, "", key,
>> composite ? '\n', ' ');;
>> 
>> > -char key[strlen(entry->key) + 1];
>> > +#define __KEY_LEN (256)
>> > +char key[__KEY_LEN];
>> >  int i;
>> >  
>> > +assert(strlen(entry->key) + 1 <= __KEY_LEN);
>> > +#undef __KEY_LEN
>> >  /* replace dashes with spaces in key (variable) names */
>> >  for (i = 0; entry->key[i]; i++) {
>> >  key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];
>> 
>> I'm afraid this isn't a good idea.  It relies on the non-local argument
>> that nobody will ever put a key longer than 255 into a qdict that gets
>> dumped.  That may even be the case, but you need to *prove* it, not just
>> assert it.  The weakest acceptable proof might be assertions in every
>> place that put keys into a dict that might get dumped.  I suspect that's
>> practical and maintainable only if there's a single place that does it.
>> 
>> If this was a good idea, I'd recommend to avoid the awkward macro:
>
> Also I think the double underscore identifiers are considered reserved in C,
> no?

Correct.  C99 7.1.3 Reserved identifiers: All identifiers that begin
with an underscore and either an uppercase letter or another underscore
are always reserved for any use.

[...]



Re: [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE

2016-03-08 Thread Niels de Vos
On Tue, Mar 08, 2016 at 01:53:26PM +0100, Kevin Wolf wrote:
> Am 08.03.2016 um 05:21 hat Niels de Vos geschrieben:
> > On Mon, Mar 07, 2016 at 01:27:38PM -0500, Jeff Cody wrote:
> > > On Mon, Mar 07, 2016 at 07:04:15PM +0100, Niels de Vos wrote:
> > > > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > > > it possible to detect sparse areas in files.
> > > > 
> > > > Signed-off-by: Niels de Vos 
> > > > 
> > > > --
> > > > Tested by compiling and running "qemu-img map gluster://..." with a
> > > > build of the current master branch of glusterfs. Using a Fedora
> > > > cloud image (in raw format) shows many SEEK procudure calls going back
> > > > and forth over the network. The output of "qemu map" matches the output
> > > > when run against the image on the local filesystem.
> > > > ---
> > > >  block/gluster.c | 159 
> > > > 
> > > >  configure   |  25 +
> > > >  2 files changed, 184 insertions(+)
> > > > 
> > > > diff --git a/block/gluster.c b/block/gluster.c
> > > > index 65077a0..1430010 100644
> > > > --- a/block/gluster.c
> > > > +++ b/block/gluster.c
> > > > @@ -677,6 +677,153 @@ static int 
> > > > qemu_gluster_has_zero_init(BlockDriverState *bs)
> > > >  return 0;
> > > >  }
> > > >  
> > > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > > 
> > > Why do we need to make this a compile-time option?  Version checking
> > > is problematic; for instance, different distributions may have
> > > backported bug fixes / features, that are not reflected by the
> > > reported version number, etc..  Ideally, we can determine
> > > functionality during runtime, and behave accordingly.
> > 
> > This will not get backported to older Gluster versions, it required a
> > protocol change.
> > 
> > > If SEEK_DATA and SEEK_HOLE are not supported,
> > > qemu_gluster_co_get_block_status can return that sectors are all
> > > allocated (which is what happens in block/io.c anyway if the driver
> > > doesn't support the function).
> > 
> > Ok, good to know.
> > 
> > > As long as glfs_lseek() will return error (e.g. EINVAL) for an invalid
> > > whence value,  we can handle it runtime.  Does glfs_lseek() behave
> > > sanely?
> > 
> > Unfortunately older versions of libgfapi do not return EINVAL when
> > SEEK_DATA/HOLE is used. It is something we'll need to fix in the stable
> > releases. We can not assume that all users have installed a version of
> > the library that handles SEEK_DATA/HOLE correctly (return EINVAL) when
> > there is no support in the network protocol or on the server.
> > 
> > To be sure that we don't get some undefined behaviour, the compile time
> > check is needed.
> 
> The code could be compiled on a host with newer libgfapi, but run on a
> different host with an older version. This is why having (only) compile
> time checks is rarely a good idea.

Oh, yes, that is possible. glfs_lseek() is not a new function, so the
symbol version did not need to change.

> Jeff's suggestion to probe the actual behaviour on the host we're
> running on in .bdrv_open() sounds reasonable to me.

Yes, it sure is. I'll send a v2 patch soon.

Thanks,
Niels



Re: [Qemu-block] [PATCH v2 0/3] vmdk: Move descriptor buffers to heap

2016-03-08 Thread Kevin Wolf
Am 08.03.2016 um 09:24 hat Fam Zheng geschrieben:
> All three functions are not in hot path (all run once for the BDS lifecycle),
> so it's okay to convert to g_malloc0.

Yeah, it's okay, but still kind of useless to memset the whole buffer
before you overwrite it anyway. There's also a useless ret = 0 somewhere.
I'd prefer not to do that in the next series.

Anyway, thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE

2016-03-08 Thread Niels de Vos
On Tue, Mar 08, 2016 at 07:33:26AM -0500, Jeff Cody wrote:
> On Tue, Mar 08, 2016 at 05:21:48AM +0100, Niels de Vos wrote:
> > On Mon, Mar 07, 2016 at 01:27:38PM -0500, Jeff Cody wrote:
> > > On Mon, Mar 07, 2016 at 07:04:15PM +0100, Niels de Vos wrote:
> > > > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > > > it possible to detect sparse areas in files.
> > > > 
> > > > Signed-off-by: Niels de Vos 
> > > > 
> > > > --
> > > > Tested by compiling and running "qemu-img map gluster://..." with a
> > > > build of the current master branch of glusterfs. Using a Fedora
> > > > cloud image (in raw format) shows many SEEK procudure calls going back
> > > > and forth over the network. The output of "qemu map" matches the output
> > > > when run against the image on the local filesystem.
> > > > ---
> > > >  block/gluster.c | 159 
> > > > 
> > > >  configure   |  25 +
> > > >  2 files changed, 184 insertions(+)
> > > > 
> > > > diff --git a/block/gluster.c b/block/gluster.c
> > > > index 65077a0..1430010 100644
> > > > --- a/block/gluster.c
> > > > +++ b/block/gluster.c
> > > > @@ -677,6 +677,153 @@ static int 
> > > > qemu_gluster_has_zero_init(BlockDriverState *bs)
> > > >  return 0;
> > > >  }
> > > >  
> > > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > > 
> > > Why do we need to make this a compile-time option?  Version checking
> > > is problematic; for instance, different distributions may have
> > > backported bug fixes / features, that are not reflected by the
> > > reported version number, etc..  Ideally, we can determine
> > > functionality during runtime, and behave accordingly.
> > 
> > This will not get backported to older Gluster versions, it required a
> > protocol change.
> > 
> > > If SEEK_DATA and SEEK_HOLE are not supported,
> > > qemu_gluster_co_get_block_status can return that sectors are all
> > > allocated (which is what happens in block/io.c anyway if the driver
> > > doesn't support the function).
> > 
> > Ok, good to know.
> > 
> > > As long as glfs_lseek() will return error (e.g. EINVAL) for an invalid
> > > whence value,  we can handle it runtime.  Does glfs_lseek() behave
> > > sanely?
> > 
> > Unfortunately older versions of libgfapi do not return EINVAL when
> > SEEK_DATA/HOLE is used. It is something we'll need to fix in the stable
> > releases. We can not assume that all users have installed a version of
> > the library that handles SEEK_DATA/HOLE correctly (return EINVAL) when
> > there is no support in the network protocol or on the server.
> > 
> > To be sure that we don't get some undefined behaviour, the compile time
> > check is needed.
> >
> 
> That's unfortunate.
> 
> However, we may still be able to work around that with some probing in
> the qemu_gluster_open() function.  I peeked at the libgfapi code, and
> it looks like for an unknown whence, the current offset is just
> returned by glfs_lseek() - is that correct?

Yes, that is correct.

> With the same offset input, an lseek should always return something
> different for a SEEK_DATA and SEEK_HOLE whence (SEEK_HOLE will return
> EOF offset if there are no further holes).
> 
> In qemu_gluster_open(), we can then probe for support.  if we call
> glfs_lseek() twice with the same offset, first with SEEK_DATA and
> then with SEEK_HOLE, then we can see if the libgfapi version
> supports SEEK_DATA/SEEK_HOLE.  If the resulting offsets from the calls
> are equal, it doesn't support it.  If the offsets differ, then
> presumably it does.  We can then set and remember that in the
> BDRVGlusterState struct (e.g. bool supports_seek_data).

Hmm, yes, that should work. And it'll obviously catch the case where
glfs_lseek() returns EINVAL too.

Thanks for the idea, I'll try to get this done over the next few days.

Niels


> 
> > > > +/*
> > > > + * Find allocation range in @bs around offset @start.
> > > > + * May change underlying file descriptor's file offset.
> > > > + * If @start is not in a hole, store @start in @data, and the
> > > > + * beginning of the next hole in @hole, and return 0.
> > > > + * If @start is in a non-trailing hole, store @start in @hole and the
> > > > + * beginning of the next non-hole in @data, and return 0.
> > > > + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> > > > + * If we can't find out, return a negative errno other than -ENXIO.
> > > > + *
> > > > + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> > > > + */
> > > > +static int find_allocation(BlockDriverState *bs, off_t start,
> > > > +   off_t *data, off_t *hole)
> > > > +{
> > > > +BDRVGlusterState *s = bs->opaque;
> > > > +off_t offs;
> > > > +
> > > > +/*
> > > > + * SEEK_DATA cases:
> > > > + * D1. offs == start: start is in data
> > > > + * D2. offs > start: start is in a hole, next data at offs
> > > > + * D3. offs < 0, errno = ENXIO: 

Re: [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE

2016-03-08 Thread Kevin Wolf
Am 08.03.2016 um 05:21 hat Niels de Vos geschrieben:
> On Mon, Mar 07, 2016 at 01:27:38PM -0500, Jeff Cody wrote:
> > On Mon, Mar 07, 2016 at 07:04:15PM +0100, Niels de Vos wrote:
> > > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > > it possible to detect sparse areas in files.
> > > 
> > > Signed-off-by: Niels de Vos 
> > > 
> > > --
> > > Tested by compiling and running "qemu-img map gluster://..." with a
> > > build of the current master branch of glusterfs. Using a Fedora
> > > cloud image (in raw format) shows many SEEK procudure calls going back
> > > and forth over the network. The output of "qemu map" matches the output
> > > when run against the image on the local filesystem.
> > > ---
> > >  block/gluster.c | 159 
> > > 
> > >  configure   |  25 +
> > >  2 files changed, 184 insertions(+)
> > > 
> > > diff --git a/block/gluster.c b/block/gluster.c
> > > index 65077a0..1430010 100644
> > > --- a/block/gluster.c
> > > +++ b/block/gluster.c
> > > @@ -677,6 +677,153 @@ static int 
> > > qemu_gluster_has_zero_init(BlockDriverState *bs)
> > >  return 0;
> > >  }
> > >  
> > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > 
> > Why do we need to make this a compile-time option?  Version checking
> > is problematic; for instance, different distributions may have
> > backported bug fixes / features, that are not reflected by the
> > reported version number, etc..  Ideally, we can determine
> > functionality during runtime, and behave accordingly.
> 
> This will not get backported to older Gluster versions, it required a
> protocol change.
> 
> > If SEEK_DATA and SEEK_HOLE are not supported,
> > qemu_gluster_co_get_block_status can return that sectors are all
> > allocated (which is what happens in block/io.c anyway if the driver
> > doesn't support the function).
> 
> Ok, good to know.
> 
> > As long as glfs_lseek() will return error (e.g. EINVAL) for an invalid
> > whence value,  we can handle it runtime.  Does glfs_lseek() behave
> > sanely?
> 
> Unfortunately older versions of libgfapi do not return EINVAL when
> SEEK_DATA/HOLE is used. It is something we'll need to fix in the stable
> releases. We can not assume that all users have installed a version of
> the library that handles SEEK_DATA/HOLE correctly (return EINVAL) when
> there is no support in the network protocol or on the server.
> 
> To be sure that we don't get some undefined behaviour, the compile time
> check is needed.

The code could be compiled on a host with newer libgfapi, but run on a
different host with an older version. This is why having (only) compile
time checks is rarely a good idea.

Jeff's suggestion to probe the actual behaviour on the host we're
running on in .bdrv_open() sounds reasonable to me.

Kevin



[Qemu-block] [PATCH 8/8] block: Use blk_co_pwritev() in blk_co_write_zeroes()

2016-03-08 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 2947e8c..1c2ea72 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1465,12 +1465,13 @@ void *blk_aio_get(const AIOCBInfo *aiocb_info, 
BlockBackend *blk,
 int coroutine_fn blk_co_write_zeroes(BlockBackend *blk, int64_t sector_num,
  int nb_sectors, BdrvRequestFlags flags)
 {
-int ret = blk_check_request(blk, sector_num, nb_sectors);
-if (ret < 0) {
-return ret;
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+return -EINVAL;
 }
 
-return bdrv_co_write_zeroes(blk_bs(blk), sector_num, nb_sectors, flags);
+return blk_co_pwritev(blk, sector_num << BDRV_SECTOR_BITS,
+  nb_sectors << BDRV_SECTOR_BITS, NULL,
+  BDRV_REQ_ZERO_WRITE);
 }
 
 int blk_write_compressed(BlockBackend *blk, int64_t sector_num,
-- 
1.8.3.1




[Qemu-block] [PATCH 6/8] block: Use blk_prw() in blk_pread()/blk_pwrite()

2016-03-08 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 5b9d387..579cc09 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -747,9 +747,9 @@ static void blk_write_entry(void *opaque)
rwco->qiov, rwco->flags);
 }
 
-static int blk_rw(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
-  int nb_sectors, CoroutineEntry co_entry,
-  BdrvRequestFlags flags)
+static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
+   int64_t bytes, CoroutineEntry co_entry,
+   BdrvRequestFlags flags)
 {
 AioContext *aio_context;
 QEMUIOVector qiov;
@@ -757,19 +757,15 @@ static int blk_rw(BlockBackend *blk, int64_t sector_num, 
uint8_t *buf,
 Coroutine *co;
 BlkRwCo rwco;
 
-if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-return -EINVAL;
-}
-
 iov = (struct iovec) {
 .iov_base = buf,
-.iov_len = nb_sectors * BDRV_SECTOR_SIZE,
+.iov_len = bytes,
 };
 qemu_iovec_init_external(, , 1);
 
 rwco = (BlkRwCo) {
 .blk= blk,
-.offset = sector_num << BDRV_SECTOR_BITS,
+.offset = offset,
 .qiov   = ,
 .flags  = flags,
 .ret= NOT_DONE,
@@ -786,6 +782,18 @@ static int blk_rw(BlockBackend *blk, int64_t sector_num, 
uint8_t *buf,
 return rwco.ret;
 }
 
+static int blk_rw(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
+  int nb_sectors, CoroutineEntry co_entry,
+  BdrvRequestFlags flags)
+{
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+return -EINVAL;
+}
+
+return blk_prw(blk, sector_num << BDRV_SECTOR_BITS, buf,
+   nb_sectors << BDRV_SECTOR_BITS, co_entry, flags);
+}
+
 int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
  int nb_sectors)
 {
@@ -866,22 +874,20 @@ BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, 
int64_t sector_num,
 
 int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count)
 {
-int ret = blk_check_byte_request(blk, offset, count);
+int ret = blk_prw(blk, offset, buf, count, blk_read_entry, 0);
 if (ret < 0) {
 return ret;
 }
-
-return bdrv_pread(blk_bs(blk), offset, buf, count);
+return count;
 }
 
 int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count)
 {
-int ret = blk_check_byte_request(blk, offset, count);
+int ret = blk_prw(blk, offset, (void*) buf, count, blk_write_entry, 0);
 if (ret < 0) {
 return ret;
 }
-
-return bdrv_pwrite(blk_bs(blk), offset, buf, count);
+return count;
 }
 
 int64_t blk_getlength(BlockBackend *blk)
-- 
1.8.3.1




[Qemu-block] [PATCH 5/8] block: Use blk_co_pwritev() in blk_write_zeroes()

2016-03-08 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 4fd0545..5b9d387 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -748,7 +748,8 @@ static void blk_write_entry(void *opaque)
 }
 
 static int blk_rw(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
-  int nb_sectors, CoroutineEntry co_entry)
+  int nb_sectors, CoroutineEntry co_entry,
+  BdrvRequestFlags flags)
 {
 AioContext *aio_context;
 QEMUIOVector qiov;
@@ -770,6 +771,7 @@ static int blk_rw(BlockBackend *blk, int64_t sector_num, 
uint8_t *buf,
 .blk= blk,
 .offset = sector_num << BDRV_SECTOR_BITS,
 .qiov   = ,
+.flags  = flags,
 .ret= NOT_DONE,
 };
 
@@ -787,7 +789,7 @@ static int blk_rw(BlockBackend *blk, int64_t sector_num, 
uint8_t *buf,
 int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
  int nb_sectors)
 {
-return blk_rw(blk, sector_num, buf, nb_sectors, blk_read_entry);
+return blk_rw(blk, sector_num, buf, nb_sectors, blk_read_entry, 0);
 }
 
 int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
@@ -812,18 +814,15 @@ int blk_read_unthrottled(BlockBackend *blk, int64_t 
sector_num, uint8_t *buf,
 int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
   int nb_sectors)
 {
-return blk_rw(blk, sector_num, (uint8_t*) buf, nb_sectors, 
blk_write_entry);
+return blk_rw(blk, sector_num, (uint8_t*) buf, nb_sectors,
+  blk_write_entry, 0);
 }
 
 int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
  int nb_sectors, BdrvRequestFlags flags)
 {
-int ret = blk_check_request(blk, sector_num, nb_sectors);
-if (ret < 0) {
-return ret;
-}
-
-return bdrv_write_zeroes(blk_bs(blk), sector_num, nb_sectors, flags);
+return blk_rw(blk, sector_num, NULL, nb_sectors, blk_write_entry,
+  BDRV_REQ_ZERO_WRITE);
 }
 
 static void error_callback_bh(void *opaque)
-- 
1.8.3.1




[Qemu-block] [PATCH 7/8] block: Use blk_aio_prwv() for aio_read/write/write_zeroes

2016-03-08 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 104 +++---
 1 file changed, 91 insertions(+), 13 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 579cc09..2947e8c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -859,17 +859,95 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
 return >common;
 }
 
+typedef struct BlkAioEmAIOCB {
+BlockAIOCB common;
+BlkRwCo rwco;
+bool has_returned;
+QEMUBH* bh;
+} BlkAioEmAIOCB;
+
+static const AIOCBInfo blk_aio_em_aiocb_info = {
+.aiocb_size = sizeof(BlkAioEmAIOCB),
+};
+
+static void blk_aio_complete(BlkAioEmAIOCB *acb)
+{
+if (acb->bh) {
+assert(acb->has_returned);
+qemu_bh_delete(acb->bh);
+}
+if (acb->has_returned) {
+acb->common.cb(acb->common.opaque, acb->rwco.ret);
+qemu_aio_unref(acb);
+}
+}
+
+static void blk_aio_complete_bh(void *opaque)
+{
+blk_aio_complete(opaque);
+}
+
+static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset,
+QEMUIOVector *qiov, CoroutineEntry co_entry,
+BdrvRequestFlags flags,
+BlockCompletionFunc *cb, void *opaque)
+{
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+
+acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.offset = offset,
+.qiov   = qiov,
+.flags  = flags,
+.ret= NOT_DONE,
+};
+acb->bh = NULL;
+acb->has_returned = false;
+
+co = qemu_coroutine_create(co_entry);
+qemu_coroutine_enter(co, acb);
+
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+acb->bh = aio_bh_new(blk_get_aio_context(blk), blk_aio_complete_bh, 
acb);
+qemu_bh_schedule(acb->bh);
+}
+
+return >common;
+}
+
+static void blk_aio_read_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = >rwco;
+
+rwco->ret = blk_co_preadv(rwco->blk, rwco->offset, rwco->qiov->size,
+  rwco->qiov, rwco->flags);
+blk_aio_complete(acb);
+}
+
+static void blk_aio_write_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = >rwco;
+
+rwco->ret = blk_co_pwritev(rwco->blk, rwco->offset, rwco->qiov->size,
+   rwco->qiov, rwco->flags);
+blk_aio_complete(acb);
+}
+
 BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
  int nb_sectors, BdrvRequestFlags flags,
  BlockCompletionFunc *cb, void *opaque)
 {
-int ret = blk_check_request(blk, sector_num, nb_sectors);
-if (ret < 0) {
-return blk_abort_aio_request(blk, cb, opaque, ret);
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+return blk_abort_aio_request(blk, cb, opaque, -EINVAL);
 }
 
-return bdrv_aio_write_zeroes(blk_bs(blk), sector_num, nb_sectors, flags,
- cb, opaque);
+return blk_aio_prwv(blk, sector_num << BDRV_SECTOR_BITS, NULL,
+blk_aio_write_entry, BDRV_REQ_ZERO_WRITE, cb, opaque);
 }
 
 int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count)
@@ -921,24 +999,24 @@ BlockAIOCB *blk_aio_readv(BlockBackend *blk, int64_t 
sector_num,
   QEMUIOVector *iov, int nb_sectors,
   BlockCompletionFunc *cb, void *opaque)
 {
-int ret = blk_check_request(blk, sector_num, nb_sectors);
-if (ret < 0) {
-return blk_abort_aio_request(blk, cb, opaque, ret);
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+return blk_abort_aio_request(blk, cb, opaque, -EINVAL);
 }
 
-return bdrv_aio_readv(blk_bs(blk), sector_num, iov, nb_sectors, cb, 
opaque);
+return blk_aio_prwv(blk, sector_num << BDRV_SECTOR_BITS, iov,
+blk_aio_read_entry, 0, cb, opaque);
 }
 
 BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
QEMUIOVector *iov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque)
 {
-int ret = blk_check_request(blk, sector_num, nb_sectors);
-if (ret < 0) {
-return blk_abort_aio_request(blk, cb, opaque, ret);
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+return blk_abort_aio_request(blk, cb, opaque, -EINVAL);
 }
 
-return bdrv_aio_writev(blk_bs(blk), sector_num, iov, nb_sectors, cb, 
opaque);
+return blk_aio_prwv(blk, sector_num << BDRV_SECTOR_BITS, iov,
+blk_aio_write_entry, 0, cb, opaque);
 }
 
 BlockAIOCB *blk_aio_flush(BlockBackend *blk,
-- 
1.8.3.1




[Qemu-block] [PATCH 1/8] block: Use BdrvChild in BlockBackend

2016-03-08 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
---
 block.c   |  47 ++---
 block/block-backend.c | 254 --
 include/block/block_int.h |   5 +
 3 files changed, 192 insertions(+), 114 deletions(-)

diff --git a/block.c b/block.c
index 11ba73c..57056b5 100644
--- a/block.c
+++ b/block.c
@@ -1195,10 +1195,9 @@ static int bdrv_fill_options(QDict **options, const char 
*filename,
 return 0;
 }
 
-static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-BlockDriverState *child_bs,
-const char *child_name,
-const BdrvChildRole *child_role)
+BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
+  const char *child_name,
+  const BdrvChildRole *child_role)
 {
 BdrvChild *child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
@@ -1207,24 +1206,43 @@ static BdrvChild *bdrv_attach_child(BlockDriverState 
*parent_bs,
 .role   = child_role,
 };
 
-QLIST_INSERT_HEAD(_bs->children, child, next);
 QLIST_INSERT_HEAD(_bs->parents, child, next_parent);
 
 return child;
 }
 
+static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+BlockDriverState *child_bs,
+const char *child_name,
+const BdrvChildRole *child_role)
+{
+BdrvChild *child = bdrv_root_attach_child(child_bs, child_name, 
child_role);
+QLIST_INSERT_HEAD(_bs->children, child, next);
+return child;
+}
+
 static void bdrv_detach_child(BdrvChild *child)
 {
-QLIST_REMOVE(child, next);
+if (child->next.le_prev) {
+QLIST_REMOVE(child, next);
+child->next.le_prev = NULL;
+}
 QLIST_REMOVE(child, next_parent);
 g_free(child->name);
 g_free(child);
 }
 
-void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
+void bdrv_root_unref_child(BdrvChild *child)
 {
 BlockDriverState *child_bs;
 
+child_bs = child->bs;
+bdrv_detach_child(child);
+bdrv_unref(child_bs);
+}
+
+void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
+{
 if (child == NULL) {
 return;
 }
@@ -1233,9 +1251,7 @@ void bdrv_unref_child(BlockDriverState *parent, BdrvChild 
*child)
 child->bs->inherits_from = NULL;
 }
 
-child_bs = child->bs;
-bdrv_detach_child(child);
-bdrv_unref(child_bs);
+bdrv_root_unref_child(child);
 }
 
 /*
@@ -2275,6 +2291,14 @@ static void change_parent_backing_link(BlockDriverState 
*from,
 {
 BdrvChild *c, *next;
 
+if (from->blk) {
+/* FIXME We bypass blk_set_bs(), so we need to make these updates
+ * manually. The root problem is not in this change function, but the
+ * existence of BlockDriverState.blk. */
+to->blk = from->blk;
+from->blk = NULL;
+}
+
 QLIST_FOREACH_SAFE(c, >parents, next_parent, next) {
 assert(c->role != _backing);
 c->bs = to;
@@ -2283,9 +2307,6 @@ static void change_parent_backing_link(BlockDriverState 
*from,
 bdrv_ref(to);
 bdrv_unref(from);
 }
-if (from->blk) {
-blk_set_bs(from->blk, to);
-}
 }
 
 static void swap_feature_fields(BlockDriverState *bs_top,
diff --git a/block/block-backend.c b/block/block-backend.c
index d1621ec..056d79e 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -27,7 +27,7 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
 struct BlockBackend {
 char *name;
 int refcnt;
-BlockDriverState *bs;
+BdrvChild *root;
 DriveInfo *legacy_dinfo;/* null unless created by drive_new() */
 QTAILQ_ENTRY(BlockBackend) link; /* for blk_backends */
 QTAILQ_ENTRY(BlockBackend) monitor_link; /* for monitor_block_backends */
@@ -78,6 +78,17 @@ static QTAILQ_HEAD(, BlockBackend) blk_backends =
 static QTAILQ_HEAD(, BlockBackend) monitor_block_backends =
 QTAILQ_HEAD_INITIALIZER(monitor_block_backends);
 
+static void blk_root_inherit_options(int *child_flags, QDict *child_options,
+ int parent_flags, QDict *parent_options)
+{
+/* We're not supposed to call this function for root nodes */
+abort();
+}
+
+static const BdrvChildRole child_root = {
+.inherit_options = blk_root_inherit_options,
+};
+
 /*
  * Create a new BlockBackend with @name, with a reference count of one.
  * @name must not be null or empty.
@@ -129,7 +140,7 @@ BlockBackend *blk_new_with_bs(const char *name, Error 
**errp)
 }
 
 bs = bdrv_new_root();
-blk->bs = bs;
+blk->root = bdrv_root_attach_child(bs, "root", _root);
 bs->blk = blk;
 return blk;
 }
@@ -159,7 +170,7 @@ BlockBackend *blk_new_open(const char *name, const char 
*filename,
 return NULL;
 }
 
-ret = bdrv_open(>bs, filename, reference, 

[Qemu-block] [PATCH 4/8] block: Pull up blk_read_unthrottled() implementation

2016-03-08 Thread Kevin Wolf
Use blk_read(), so that it goes through blk_co_preadv() like all read
requests from the BB to the BDS.

Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 12 ++--
 block/io.c| 14 --
 include/block/block.h |  2 --
 3 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index e159647..4fd0545 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -793,12 +793,20 @@ int blk_read(BlockBackend *blk, int64_t sector_num, 
uint8_t *buf,
 int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
  int nb_sectors)
 {
-int ret = blk_check_request(blk, sector_num, nb_sectors);
+BlockDriverState *bs = blk_bs(blk);
+bool enabled;
+int ret;
+
+ret = blk_check_request(blk, sector_num, nb_sectors);
 if (ret < 0) {
 return ret;
 }
 
-return bdrv_read_unthrottled(blk_bs(blk), sector_num, buf, nb_sectors);
+enabled = bs->io_limits_enabled;
+bs->io_limits_enabled = false;
+ret = blk_read(blk, sector_num, buf, nb_sectors);
+bs->io_limits_enabled = enabled;
+return ret;
 }
 
 int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
diff --git a/block/io.c b/block/io.c
index aa8537c..41d954ca 100644
--- a/block/io.c
+++ b/block/io.c
@@ -615,20 +615,6 @@ int bdrv_read(BlockDriverState *bs, int64_t sector_num,
 return bdrv_rw_co(bs, sector_num, buf, nb_sectors, false, 0);
 }
 
-/* Just like bdrv_read(), but with I/O throttling temporarily disabled */
-int bdrv_read_unthrottled(BlockDriverState *bs, int64_t sector_num,
-  uint8_t *buf, int nb_sectors)
-{
-bool enabled;
-int ret;
-
-enabled = bs->io_limits_enabled;
-bs->io_limits_enabled = false;
-ret = bdrv_read(bs, sector_num, buf, nb_sectors);
-bs->io_limits_enabled = enabled;
-return ret;
-}
-
 /* Return < 0 if error. Important errors are:
   -EIO generic I/O error (may happen for all errors)
   -ENOMEDIUM   No media inserted.
diff --git a/include/block/block.h b/include/block/block.h
index d2e1a51..3a78414 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -227,8 +227,6 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state);
 void bdrv_reopen_abort(BDRVReopenState *reopen_state);
 int bdrv_read(BlockDriverState *bs, int64_t sector_num,
   uint8_t *buf, int nb_sectors);
-int bdrv_read_unthrottled(BlockDriverState *bs, int64_t sector_num,
-  uint8_t *buf, int nb_sectors);
 int bdrv_write(BlockDriverState *bs, int64_t sector_num,
const uint8_t *buf, int nb_sectors);
 int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num,
-- 
1.8.3.1




[Qemu-block] [PATCH 2/8] block: Use blk_co_preadv() for blk_read()

2016-03-08 Thread Kevin Wolf
This patch introduces blk_co_preadv() as a central function on the
BlockBackend level that is supposed to handle all read requests from the
BB to its root BDS eventually.

Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 64 ---
 block/io.c|  5 +---
 include/block/block_int.h |  4 +++
 3 files changed, 65 insertions(+), 8 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 056d79e..d97d1ac 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -22,6 +22,8 @@
 /* Number of coroutines to reserve per attached device model */
 #define COROUTINE_POOL_RESERVATION 64
 
+#define NOT_DONE 0x7fff /* used while emulated sync operation in progress 
*/
+
 static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
 
 struct BlockBackend {
@@ -697,15 +699,69 @@ static int blk_check_request(BlockBackend *blk, int64_t 
sector_num,
   nb_sectors * BDRV_SECTOR_SIZE);
 }
 
-int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
- int nb_sectors)
+static int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
+  unsigned int bytes, QEMUIOVector *qiov,
+  BdrvRequestFlags flags)
 {
-int ret = blk_check_request(blk, sector_num, nb_sectors);
+int ret = blk_check_byte_request(blk, offset, bytes);
 if (ret < 0) {
 return ret;
 }
 
-return bdrv_read(blk_bs(blk), sector_num, buf, nb_sectors);
+return bdrv_co_do_preadv(blk_bs(blk), offset, bytes, qiov, flags);
+}
+
+typedef struct BlkRwCo {
+BlockBackend *blk;
+int64_t offset;
+QEMUIOVector *qiov;
+int ret;
+BdrvRequestFlags flags;
+} BlkRwCo;
+
+static void blk_read_entry(void *opaque)
+{
+BlkRwCo *rwco = opaque;
+
+rwco->ret = blk_co_preadv(rwco->blk, rwco->offset, rwco->qiov->size,
+  rwco->qiov, rwco->flags);
+}
+
+int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
+ int nb_sectors)
+{
+AioContext *aio_context;
+QEMUIOVector qiov;
+struct iovec iov;
+Coroutine *co;
+BlkRwCo rwco;
+
+if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+return -EINVAL;
+}
+
+iov = (struct iovec) {
+.iov_base = buf,
+.iov_len = nb_sectors * BDRV_SECTOR_SIZE,
+};
+qemu_iovec_init_external(, , 1);
+
+rwco = (BlkRwCo) {
+.blk= blk,
+.offset = sector_num << BDRV_SECTOR_BITS,
+.qiov   = ,
+.ret= NOT_DONE,
+};
+
+co = qemu_coroutine_create(blk_read_entry);
+qemu_coroutine_enter(co, );
+
+aio_context = blk_get_aio_context(blk);
+while (rwco.ret == NOT_DONE) {
+aio_poll(aio_context, true);
+}
+
+return rwco.ret;
 }
 
 int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
diff --git a/block/io.c b/block/io.c
index 5f9b6d6..c7084e4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -44,9 +44,6 @@ static int coroutine_fn bdrv_co_readv_em(BlockDriverState *bs,
 static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors,
  QEMUIOVector *iov);
-static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
-int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
-BdrvRequestFlags flags);
 static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
 int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags);
@@ -939,7 +936,7 @@ out:
 /*
  * Handle a read request in coroutine context
  */
-static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
+int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
 int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags)
 {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 9d3a2cc..db830f6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -508,6 +508,10 @@ extern BlockDriver bdrv_qcow2;
  */
 void bdrv_setup_io_funcs(BlockDriver *bdrv);
 
+int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
+int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
+BdrvRequestFlags flags);
+
 int get_tmp_filename(char *filename, int size);
 BlockDriver *bdrv_probe_all(const uint8_t *buf, int buf_size,
 const char *filename);
-- 
1.8.3.1




[Qemu-block] [PATCH 3/8] block: Use blk_co_pwritev() for blk_write()

2016-03-08 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 39 ++-
 block/io.c|  5 +
 include/block/block_int.h |  3 +++
 3 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index d97d1ac..e159647 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -711,6 +711,18 @@ static int coroutine_fn blk_co_preadv(BlockBackend *blk, 
int64_t offset,
 return bdrv_co_do_preadv(blk_bs(blk), offset, bytes, qiov, flags);
 }
 
+static int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
+  unsigned int bytes, QEMUIOVector *qiov,
+  BdrvRequestFlags flags)
+{
+int ret = blk_check_byte_request(blk, offset, bytes);
+if (ret < 0) {
+return ret;
+}
+
+return bdrv_co_do_pwritev(blk_bs(blk), offset, bytes, qiov, flags);
+}
+
 typedef struct BlkRwCo {
 BlockBackend *blk;
 int64_t offset;
@@ -727,8 +739,16 @@ static void blk_read_entry(void *opaque)
   rwco->qiov, rwco->flags);
 }
 
-int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
- int nb_sectors)
+static void blk_write_entry(void *opaque)
+{
+BlkRwCo *rwco = opaque;
+
+rwco->ret = blk_co_pwritev(rwco->blk, rwco->offset, rwco->qiov->size,
+   rwco->qiov, rwco->flags);
+}
+
+static int blk_rw(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
+  int nb_sectors, CoroutineEntry co_entry)
 {
 AioContext *aio_context;
 QEMUIOVector qiov;
@@ -753,7 +773,7 @@ int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t 
*buf,
 .ret= NOT_DONE,
 };
 
-co = qemu_coroutine_create(blk_read_entry);
+co = qemu_coroutine_create(co_entry);
 qemu_coroutine_enter(co, );
 
 aio_context = blk_get_aio_context(blk);
@@ -764,6 +784,12 @@ int blk_read(BlockBackend *blk, int64_t sector_num, 
uint8_t *buf,
 return rwco.ret;
 }
 
+int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
+ int nb_sectors)
+{
+return blk_rw(blk, sector_num, buf, nb_sectors, blk_read_entry);
+}
+
 int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
  int nb_sectors)
 {
@@ -778,12 +804,7 @@ int blk_read_unthrottled(BlockBackend *blk, int64_t 
sector_num, uint8_t *buf,
 int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
   int nb_sectors)
 {
-int ret = blk_check_request(blk, sector_num, nb_sectors);
-if (ret < 0) {
-return ret;
-}
-
-return bdrv_write(blk_bs(blk), sector_num, buf, nb_sectors);
+return blk_rw(blk, sector_num, (uint8_t*) buf, nb_sectors, 
blk_write_entry);
 }
 
 int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
diff --git a/block/io.c b/block/io.c
index c7084e4..aa8537c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -44,9 +44,6 @@ static int coroutine_fn bdrv_co_readv_em(BlockDriverState *bs,
 static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors,
  QEMUIOVector *iov);
-static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
-int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
-BdrvRequestFlags flags);
 static BlockAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
  int64_t sector_num,
  QEMUIOVector *qiov,
@@ -1281,7 +1278,7 @@ fail:
 /*
  * Handle a write request in coroutine context
  */
-static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
+int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
 int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags)
 {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index db830f6..35ba42f 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -511,6 +511,9 @@ void bdrv_setup_io_funcs(BlockDriver *bdrv);
 int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
 int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags);
+int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
+int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
+BdrvRequestFlags flags);
 
 int get_tmp_filename(char *filename, int size);
 BlockDriver *bdrv_probe_all(const uint8_t *buf, int buf_size,
-- 
1.8.3.1




[Qemu-block] [PATCH 0/8] block: Introduce common read/write function

2016-03-08 Thread Kevin Wolf
BlockBackends support a few different interfaces for reads and writes. Until
now they used to forward the requests directly to the BDS layer, which
implemented wrappers around one central common read/write function that
contained the actual implementation of features provided by the block layer.

This only works as long as none of the features are actually on the
BlockBackend level. As it happens, we have features (writethrough cache, I/O
throttling) that are currently implemented in the BDS, but must move to the BB.

As a preparation, this series introduces the mapping of the existing APIs to a
single coroutine based preadv/pwritev function (as we already have on the BDS
level) to the BB layer. The BDS version of the emulation can't go away just yet
because there are internal users of them, but the goal is to remove them in the
long term.

Depends on Max's "blockdev: Further BlockBackend work".

Kevin Wolf (8):
  block: Use BdrvChild in BlockBackend
  block: Use blk_co_preadv() for blk_read()
  block: Use blk_co_pwritev() for blk_write()
  block: Pull up blk_read_unthrottled() implementation
  block: Use blk_co_pwritev() in blk_write_zeroes()
  block: Use blk_prw() in blk_pread()/blk_pwrite()
  block: Use blk_aio_prwv() for aio_read/write/write_zeroes
  block: Use blk_co_pwritev() in blk_co_write_zeroes()

 block.c   |  47 +++--
 block/block-backend.c | 485 +-
 block/io.c|  24 +--
 include/block/block.h |   2 -
 include/block/block_int.h |  12 ++
 5 files changed, 401 insertions(+), 169 deletions(-)

-- 
1.8.3.1




Re: [Qemu-block] [PATCH v2 0/3] vmdk: Move descriptor buffers to heap

2016-03-08 Thread Paolo Bonzini


On 08/03/2016 09:24, Fam Zheng wrote:
> All three functions are not in hot path (all run once for the BDS lifecycle),
> so it's okay to convert to g_malloc0.
> 
> Fam
> 
> 
> Fam Zheng (3):
>   vmdk: Switch to heap arrays for vmdk_write_cid
>   vmdk: Switch to heap arrays for vmdk_read_cid
>   vmdk: Switch to heap arrays for vmdk_parent_open
> 
>  block/vmdk.c | 47 +--
>  1 file changed, 29 insertions(+), 18 deletions(-)
> 

Reviewed-by: Paolo Bonzini 



Re: [Qemu-block] [PATCH] block/gluster: add support for SEEK_DATA/SEEK_HOLE

2016-03-08 Thread Jeff Cody
On Tue, Mar 08, 2016 at 05:21:48AM +0100, Niels de Vos wrote:
> On Mon, Mar 07, 2016 at 01:27:38PM -0500, Jeff Cody wrote:
> > On Mon, Mar 07, 2016 at 07:04:15PM +0100, Niels de Vos wrote:
> > > GlusterFS 3.8 contains support for SEEK_DATA and SEEK_HOLE. This makes
> > > it possible to detect sparse areas in files.
> > > 
> > > Signed-off-by: Niels de Vos 
> > > 
> > > --
> > > Tested by compiling and running "qemu-img map gluster://..." with a
> > > build of the current master branch of glusterfs. Using a Fedora
> > > cloud image (in raw format) shows many SEEK procudure calls going back
> > > and forth over the network. The output of "qemu map" matches the output
> > > when run against the image on the local filesystem.
> > > ---
> > >  block/gluster.c | 159 
> > > 
> > >  configure   |  25 +
> > >  2 files changed, 184 insertions(+)
> > > 
> > > diff --git a/block/gluster.c b/block/gluster.c
> > > index 65077a0..1430010 100644
> > > --- a/block/gluster.c
> > > +++ b/block/gluster.c
> > > @@ -677,6 +677,153 @@ static int 
> > > qemu_gluster_has_zero_init(BlockDriverState *bs)
> > >  return 0;
> > >  }
> > >  
> > > +#ifdef CONFIG_GLUSTERFS_SEEK_DATA
> > 
> > Why do we need to make this a compile-time option?  Version checking
> > is problematic; for instance, different distributions may have
> > backported bug fixes / features, that are not reflected by the
> > reported version number, etc..  Ideally, we can determine
> > functionality during runtime, and behave accordingly.
> 
> This will not get backported to older Gluster versions, it required a
> protocol change.
> 
> > If SEEK_DATA and SEEK_HOLE are not supported,
> > qemu_gluster_co_get_block_status can return that sectors are all
> > allocated (which is what happens in block/io.c anyway if the driver
> > doesn't support the function).
> 
> Ok, good to know.
> 
> > As long as glfs_lseek() will return error (e.g. EINVAL) for an invalid
> > whence value,  we can handle it runtime.  Does glfs_lseek() behave
> > sanely?
> 
> Unfortunately older versions of libgfapi do not return EINVAL when
> SEEK_DATA/HOLE is used. It is something we'll need to fix in the stable
> releases. We can not assume that all users have installed a version of
> the library that handles SEEK_DATA/HOLE correctly (return EINVAL) when
> there is no support in the network protocol or on the server.
> 
> To be sure that we don't get some undefined behaviour, the compile time
> check is needed.
>

That's unfortunate.

However, we may still be able to work around that with some probing in
the qemu_gluster_open() function.  I peeked at the libgfapi code, and
it looks like for an unknown whence, the current offset is just
returned by glfs_lseek() - is that correct?


With the same offset input, an lseek should always return something
different for a SEEK_DATA and SEEK_HOLE whence (SEEK_HOLE will return
EOF offset if there are no further holes).

In qemu_gluster_open(), we can then probe for support.  if we call
glfs_lseek() twice with the same offset, first with SEEK_DATA and
then with SEEK_HOLE, then we can see if the libgfapi version
supports SEEK_DATA/SEEK_HOLE.  If the resulting offsets from the calls
are equal, it doesn't support it.  If the offsets differ, then
presumably it does.  We can then set and remember that in the
BDRVGlusterState struct (e.g. bool supports_seek_data).

> > > +/*
> > > + * Find allocation range in @bs around offset @start.
> > > + * May change underlying file descriptor's file offset.
> > > + * If @start is not in a hole, store @start in @data, and the
> > > + * beginning of the next hole in @hole, and return 0.
> > > + * If @start is in a non-trailing hole, store @start in @hole and the
> > > + * beginning of the next non-hole in @data, and return 0.
> > > + * If @start is in a trailing hole or beyond EOF, return -ENXIO.
> > > + * If we can't find out, return a negative errno other than -ENXIO.
> > > + *
> > > + * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
> > > + */
> > > +static int find_allocation(BlockDriverState *bs, off_t start,
> > > +   off_t *data, off_t *hole)
> > > +{
> > > +BDRVGlusterState *s = bs->opaque;
> > > +off_t offs;
> > > +
> > > +/*
> > > + * SEEK_DATA cases:
> > > + * D1. offs == start: start is in data
> > > + * D2. offs > start: start is in a hole, next data at offs
> > > + * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
> > > + *  or start is beyond EOF
> > > + * If the latter happens, the file has been truncated behind
> > > + * our back since we opened it.  All bets are off then.
> > > + * Treating like a trailing hole is simplest.
> > > + * D4. offs < 0, errno != ENXIO: we learned nothing
> > > + */
> > > +offs = glfs_lseek(s->fd, start, SEEK_DATA);
> > > +if (offs < 0) {
> > 

Re: [Qemu-block] [Qemu-devel] [PATCH 2/8] block: fix unbounded stack for dump_qdict

2016-03-08 Thread Paolo Bonzini


On 08/03/2016 09:12, Markus Armbruster wrote:
> I'm afraid this isn't a good idea.  It relies on the non-local argument
> that nobody will ever put a key longer than 255 into a qdict that gets
> dumped.  That may even be the case, but you need to *prove* it, not just
> assert it.  The weakest acceptable proof might be assertions in every
> place that put keys into a dict that might get dumped.  I suspect that's
> practical and maintainable only if there's a single place that does it.
> 
> If this was a good idea, I'd recommend to avoid the awkward macro:
> 
>char key[256];
>int i;
>
>assert(strlen(entry->key) + 1 <= ARRAY_SIZE(key));
> 
> There are several other ways to limit the stack usage:
> 
> 1. Move the array from stack to heap.  Fine unless it's on a hot path.
>As far as I can tell, this dumping business is for HMP and qemu-io,
>i.e. not hot.

I think this is the best.  You can just g_strdup, modify in place, print
and free.

Paolo



Re: [Qemu-block] strange crash in tracked_request_begin

2016-03-08 Thread Stefan Hajnoczi
On Mon, Mar 07, 2016 at 08:00:49PM +0100, Christian Borntraeger wrote:
> On 03/07/2016 06:01 PM, Stefan Hajnoczi wrote:
> > On Mon, Mar 07, 2016 at 01:29:08PM +0100, Christian Borntraeger wrote:
> >> Folks,
> >>
> >> I had a crash of a qemu guest in tracked_request_begin.
> >> The testcase was a guest with ramdisk/kernel that reboots in a 
> >> loop. (about 10 times per second) with a single null-co disk 
> >> attached. No idea how to reproduce this, seems to be a lucky hit.
> >>
> >> (gdb) bt
> >> #0  0x101db5ba in tracked_request_begin 
> >> (req=req@entry=0x3ff90f1bdc0, bs=bs@entry=0x42a39190, 
> >> offset=offset@entry=0, bytes=bytes@entry=4096, 
> >> type=type@entry=BDRV_TRACKED_READ)
> >> at /home/cborntra/REPOS/qemu/block/io.c:390
> >> #1  0x101de91e in bdrv_co_do_preadv (bs=0x42a39190, offset=0, 
> >> bytes=4096, qiov=0x3ff7400cbd8, flags=, 
> >> flags@entry=(unknown: 0))
> >> at /home/cborntra/REPOS/qemu/block/io.c:1001
> >> #2  0x101dfc3e in bdrv_co_do_readv (flags=(unknown: 0), 
> >> qiov=, nb_sectors=, sector_num= >> out>, bs=)
> >> at /home/cborntra/REPOS/qemu/block/io.c:1024
> >> #3  bdrv_co_do_rw (opaque=0x3ff7400e370) at 
> >> /home/cborntra/REPOS/qemu/block/io.c:2173
> >> #4  0x1022d8f6 in coroutine_trampoline (i0=, 
> >> i1=-1946150928) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:79
> >> #5  0x03ff95ed150a in __makecontext_ret () from /lib64/libc.so.6
> >>
> >> looking at the code we are at
> >>
> >> QLIST_INSERT_HEAD(>tracked_requests, req, list);
> >> which translates to
> >>
> >> if (((req)->list.le_next = (>tracked_requests)->lh_first) != NULL) 
> >> (>tracked_requests)->lh_first->list.le_prev = &(req)->list.le_next;
> >> (>tracked_requests)->lh_first = (req);   
> >> (req)->list.le_prev = &(>tracked_requests)->lh_first;
> >>
> >> gdb says, that (>tracked_requests)->lh_first) is zero in the corefile
> >> (gdb) print /x bs->tracked_requests
> >> $6 = {lh_first = 0x0}
> >>
> >> Now looking at the code I am asking myself if this can happen in parallel
> >> to another code that touches tracked_requests, because gcc seems to read
> >> >tracked_requests)->lh_first twice (first to check the value, then
> >> to use it as pointer)
> > 
> > tracked_requests is protected by AioContext.  Perhaps something is doing
> > I/O without acquiring AioContext?
> 
> Hmm, the guest was rebooting, which resets all devices. Maybe something
> in that code is still not right? I will have a look.

virtio_blk_reset() does acquire AioContext so at least that part should
be safe with running IOThreads.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH] qed: fix bdrv_qed_drain

2016-03-08 Thread Stefan Hajnoczi
On Mon, Mar 07, 2016 at 10:22:58PM +0100, Paolo Bonzini wrote:
> On 07/03/2016 21:56, Stefan Hajnoczi wrote:
> > I think the timer concept itself is troublesome.  A radical approach but
> > limited to changing QED itself is to drop the timer and instead keep a
> > timestamp for the last allocating write request.  Next time a write
> > request (allocating or rewriting) is about to complete, do the flush and
> > clear the need check flag as part of the write request (if 5 seconds
> > have passed since the timestamp).
> 
> bdrv_qed_drain should be easy to fix in my new drain implementation,
> which is based on draining the parent before the BdrvChild-ren.  It's
> just troublesome in the current one which alternates flushing and draining.
> 
> I would just revert the patch that introduced bdrv_qed_drain for now,
> and reintroduce it later (note however that something was messed up in
> commit df9a681, "qed: Implement .bdrv_drain", 2015-11-12, because it
> includes some dirty bitmap stuff).

You're right, that might be the best solution for the time being.  AFAIK
the need check write is harmless.  It does not result in a guest-visible
change and is basically just a flush + header update.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH] block: Fix snapshot=on cache modes

2016-03-08 Thread Kevin Wolf
Am 07.03.2016 um 19:24 hat Max Reitz geschrieben:
> On 07.03.2016 13:26, Kevin Wolf wrote:
> > Since commit 91a097e, we end up with a somewhat weird cache mode
> > configuration with snapshot=on: The commit broke the cache mode
> > inheritance for the snapshot overlay so that it is opened as
> > writethrough instead of unsafe now. The following bdrv_append() call to
> > put it on top of the tree swaps the WCE flag with the snapshot's backing
> > file (i.e. the originally given file), so what we eventually get is
> > cache=writeback on the temporary overlay and
> > cache=writethrough,cache.no-flush=on on the real image file.
> > 
> > This patch changes things so that the temporary overlay gets
> > cache=unsafe again like it used to, and the real images get whatever the
> > user specified. This means that cache.direct is now respected even with
> > snapshot=on, and in the case of committing changes, the final flush is
> > no longer ignored except explicitly requested by the user.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  block.c   | 34 --
> >  blockdev.c|  7 ---
> >  include/block/block.h |  1 -
> >  3 files changed, 24 insertions(+), 18 deletions(-)
> > 
> > diff --git a/block.c b/block.c
> > index ba24b8e..e3fe8ed 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -687,13 +687,19 @@ int bdrv_parse_cache_flags(const char *mode, int 
> > *flags)
> >  }
> >  
> >  /*
> > - * Returns the flags that a temporary snapshot should get, based on the
> > - * originally requested flags (the originally requested image will have 
> > flags
> > - * like a backing file)
> > + * Returns the options and flags that a temporary snapshot should get, 
> > based on
> > + * the originally requested flags (the originally requested image will have
> > + * flags like a backing file)
> >   */
> > -static int bdrv_temp_snapshot_flags(int flags)
> > +static void bdrv_temp_snapshot_options(int *child_flags, QDict 
> > *child_options,
> > +   int parent_flags, QDict 
> > *parent_options)
> >  {
> > -return (flags & ~BDRV_O_SNAPSHOT) | BDRV_O_TEMPORARY;
> > +*child_flags = (parent_flags & ~BDRV_O_SNAPSHOT) | BDRV_O_TEMPORARY;
> > +
> > +/* For temporary files, unconditional cache=unsafe is fine */
> > +qdict_set_default_str(child_options, BDRV_OPT_CACHE_WB, "on");
> > +qdict_set_default_str(child_options, BDRV_OPT_CACHE_DIRECT, "off");
> > +qdict_set_default_str(child_options, BDRV_OPT_CACHE_NO_FLUSH, "on");
> >  }
> >  
> >  /*
> > @@ -1424,13 +1430,13 @@ done:
> >  return c;
> >  }
> >  
> > -int bdrv_append_temp_snapshot(BlockDriverState *bs, int flags, Error 
> > **errp)
> > +static int bdrv_append_temp_snapshot(BlockDriverState *bs, int flags,
> > + QDict *snapshot_options, Error **errp)
> >  {
> >  /* TODO: extra byte is a hack to ensure MAX_PATH space on Windows. */
> >  char *tmp_filename = g_malloc0(PATH_MAX + 1);
> >  int64_t total_size;
> >  QemuOpts *opts = NULL;
> > -QDict *snapshot_options;
> >  BlockDriverState *bs_snapshot;
> >  Error *local_err = NULL;
> >  int ret;
> > @@ -1465,7 +1471,6 @@ int bdrv_append_temp_snapshot(BlockDriverState *bs, 
> > int flags, Error **errp)
> >  }
> >  
> >  /* Prepare a new options QDict for the temporary file */
> 
> This comment reads a bit weird now.

Good catch, will s/a new// before sending a pull request.

> Rest looks good and this is not exactly critical, so:
> 
> Reviewed-by: Max Reitz 

Thanks.

Kevin


pgplt5YHMGRZF.pgp
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH 2/8] block: fix unbounded stack for dump_qdict

2016-03-08 Thread Fam Zheng
On Tue, 03/08 09:12, Markus Armbruster wrote:
> Peter Xu  writes:
> 
> > Suggested-by: Paolo Bonzini 
> > CC: Markus Armbruster 
> > CC: Kevin Wolf 
> > CC: qemu-block@nongnu.org
> > Signed-off-by: Peter Xu 
> > ---
> >  block/qapi.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/qapi.c b/block/qapi.c
> > index db2d3fb..687e577 100644
> > --- a/block/qapi.c
> > +++ b/block/qapi.c
> > @@ -638,9 +638,12 @@ static void dump_qdict(fprintf_function func_fprintf, 
> > void *f, int indentation,
> >  QType type = qobject_type(entry->value);
> >  bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
> >  const char *format = composite ? "%*s%s:\n" : "%*s%s: ";
> 
> Unrelated to your patch: ugh!
> 
> Printf formats should be literals whenever possible, to make it easy for
> the compiler to warn you when you screw up.  It's trivially possible
> here!  Instead of
> 
>func_fprintf(f, format, indentation * 4, "", key);
> 
> do
> 
>func_fprintf(f, "%*s%s:%c", indentation * 4, "", key,
> composite ? '\n', ' ');;
> 
> > -char key[strlen(entry->key) + 1];
> > +#define __KEY_LEN (256)
> > +char key[__KEY_LEN];
> >  int i;
> >  
> > +assert(strlen(entry->key) + 1 <= __KEY_LEN);
> > +#undef __KEY_LEN
> >  /* replace dashes with spaces in key (variable) names */
> >  for (i = 0; entry->key[i]; i++) {
> >  key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];
> 
> I'm afraid this isn't a good idea.  It relies on the non-local argument
> that nobody will ever put a key longer than 255 into a qdict that gets
> dumped.  That may even be the case, but you need to *prove* it, not just
> assert it.  The weakest acceptable proof might be assertions in every
> place that put keys into a dict that might get dumped.  I suspect that's
> practical and maintainable only if there's a single place that does it.
> 
> If this was a good idea, I'd recommend to avoid the awkward macro:

Also I think the double underscore identifiers are considered reserved in C,
no?

> 
>char key[256];
>int i;
>
>assert(strlen(entry->key) + 1 <= ARRAY_SIZE(key));
> 

<...>

Fam



[Qemu-block] [PATCH v2 3/3] vmdk: Switch to heap arrays for vmdk_parent_open

2016-03-08 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 block/vmdk.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index c68f456..03be7f0 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -343,15 +343,16 @@ static int vmdk_reopen_prepare(BDRVReopenState *state,
 static int vmdk_parent_open(BlockDriverState *bs)
 {
 char *p_name;
-char desc[DESC_SIZE + 1];
+char *desc;
 BDRVVmdkState *s = bs->opaque;
 int ret;
 
-desc[DESC_SIZE] = '\0';
+desc = g_malloc0(DESC_SIZE + 1);
 ret = bdrv_pread(bs->file->bs, s->desc_offset, desc, DESC_SIZE);
 if (ret < 0) {
-return ret;
+goto out;
 }
+ret = 0;
 
 p_name = strstr(desc, "parentFileNameHint");
 if (p_name != NULL) {
@@ -360,16 +361,20 @@ static int vmdk_parent_open(BlockDriverState *bs)
 p_name += sizeof("parentFileNameHint") + 1;
 end_name = strchr(p_name, '\"');
 if (end_name == NULL) {
-return -EINVAL;
+ret = -EINVAL;
+goto out;
 }
 if ((end_name - p_name) > sizeof(bs->backing_file) - 1) {
-return -EINVAL;
+ret = -EINVAL;
+goto out;
 }
 
 pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
 }
 
-return 0;
+out:
+g_free(desc);
+return ret;
 }
 
 /* Create and append extent to the extent array. Return the added VmdkExtent
-- 
2.4.3




[Qemu-block] [PATCH v2 0/3] vmdk: Move descriptor buffers to heap

2016-03-08 Thread Fam Zheng
All three functions are not in hot path (all run once for the BDS lifecycle),
so it's okay to convert to g_malloc0.

Fam


Fam Zheng (3):
  vmdk: Switch to heap arrays for vmdk_write_cid
  vmdk: Switch to heap arrays for vmdk_read_cid
  vmdk: Switch to heap arrays for vmdk_parent_open

 block/vmdk.c | 47 +--
 1 file changed, 29 insertions(+), 18 deletions(-)

-- 
2.4.3




[Qemu-block] [PATCH v2 1/3] vmdk: Switch to heap arrays for vmdk_write_cid

2016-03-08 Thread Fam Zheng
It is only called once for each opened image, so we can do it the easy
way.

Reviewed-by: Peter Xu 
Signed-off-by: Fam Zheng 
---
 block/vmdk.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index a8db5d9..1ec2452 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -274,36 +274,39 @@ static uint32_t vmdk_read_cid(BlockDriverState *bs, int 
parent)
 
 static int vmdk_write_cid(BlockDriverState *bs, uint32_t cid)
 {
-char desc[DESC_SIZE], tmp_desc[DESC_SIZE];
+char *desc, *tmp_desc;
 char *p_name, *tmp_str;
 BDRVVmdkState *s = bs->opaque;
-int ret;
+int ret = 0;
 
+desc = g_malloc0(DESC_SIZE);
+tmp_desc = g_malloc0(DESC_SIZE);
 ret = bdrv_pread(bs->file->bs, s->desc_offset, desc, DESC_SIZE);
 if (ret < 0) {
-return ret;
+goto out;
 }
 
 desc[DESC_SIZE - 1] = '\0';
 tmp_str = strstr(desc, "parentCID");
 if (tmp_str == NULL) {
-return -EINVAL;
+ret = -EINVAL;
+goto out;
 }
 
-pstrcpy(tmp_desc, sizeof(tmp_desc), tmp_str);
+pstrcpy(tmp_desc, DESC_SIZE, tmp_str);
 p_name = strstr(desc, "CID");
 if (p_name != NULL) {
 p_name += sizeof("CID");
-snprintf(p_name, sizeof(desc) - (p_name - desc), "%" PRIx32 "\n", cid);
-pstrcat(desc, sizeof(desc), tmp_desc);
+snprintf(p_name, DESC_SIZE - (p_name - desc), "%" PRIx32 "\n", cid);
+pstrcat(desc, DESC_SIZE, tmp_desc);
 }
 
 ret = bdrv_pwrite_sync(bs->file->bs, s->desc_offset, desc, DESC_SIZE);
-if (ret < 0) {
-return ret;
-}
 
-return 0;
+out:
+g_free(desc);
+g_free(tmp_desc);
+return ret;
 }
 
 static int vmdk_is_cid_valid(BlockDriverState *bs)
-- 
2.4.3




Re: [Qemu-block] [Qemu-devel] [PATCH 2/8] block: fix unbounded stack for dump_qdict

2016-03-08 Thread Markus Armbruster
Peter Xu  writes:

> Suggested-by: Paolo Bonzini 
> CC: Markus Armbruster 
> CC: Kevin Wolf 
> CC: qemu-block@nongnu.org
> Signed-off-by: Peter Xu 
> ---
>  block/qapi.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/block/qapi.c b/block/qapi.c
> index db2d3fb..687e577 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -638,9 +638,12 @@ static void dump_qdict(fprintf_function func_fprintf, 
> void *f, int indentation,
>  QType type = qobject_type(entry->value);
>  bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
>  const char *format = composite ? "%*s%s:\n" : "%*s%s: ";

Unrelated to your patch: ugh!

Printf formats should be literals whenever possible, to make it easy for
the compiler to warn you when you screw up.  It's trivially possible
here!  Instead of

   func_fprintf(f, format, indentation * 4, "", key);

do

   func_fprintf(f, "%*s%s:%c", indentation * 4, "", key,
composite ? '\n', ' ');;

> -char key[strlen(entry->key) + 1];
> +#define __KEY_LEN (256)
> +char key[__KEY_LEN];
>  int i;
>  
> +assert(strlen(entry->key) + 1 <= __KEY_LEN);
> +#undef __KEY_LEN
>  /* replace dashes with spaces in key (variable) names */
>  for (i = 0; entry->key[i]; i++) {
>  key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];

I'm afraid this isn't a good idea.  It relies on the non-local argument
that nobody will ever put a key longer than 255 into a qdict that gets
dumped.  That may even be the case, but you need to *prove* it, not just
assert it.  The weakest acceptable proof might be assertions in every
place that put keys into a dict that might get dumped.  I suspect that's
practical and maintainable only if there's a single place that does it.

If this was a good idea, I'd recommend to avoid the awkward macro:

   char key[256];
   int i;
   
   assert(strlen(entry->key) + 1 <= ARRAY_SIZE(key));

There are several other ways to limit the stack usage:

1. Move the array from stack to heap.  Fine unless it's on a hot path.
   As far as I can tell, this dumping business is for HMP and qemu-io,
   i.e. not hot.

2. Use a stack array for short strings, switch to the heap for large
   strings.  More complex, but may be worth it on a hot path where most
   strings are short.

3. Instead of creating a new string with '-' replaced for printing,
   print characters.  Can be okay with buffered I/O, but obviously not
   on a hot path.

4. Like 3., but print substrings not containing '-'.