Re: [Qemu-block] [Qemu-devel] [PATCH v5 4/6] qemu-io: Allow unaligned access by default

2016-05-12 Thread Eric Blake
On 05/12/2016 09:50 AM, Eric Blake wrote:
>> This breaks qemu-iotests 136 for raw. It's pretty obvious that this is a
>> test case problem (uses unaligned requests to test error accounting), so
>> I'm not dropping the patch, but please do send a follow-up.
> 
> ...which explains why I missed this failure with ./check -raw.  Will
> fix, and maybe I should have grepped a bit harder, since it is fairly
> obvious:
> 
> tests/qemu-iotests/136:# Two types of invalid operations:
> unaligned length and unaligned offset
> 
> I will also check if this needs updating:
> 
> tests/qemu-iotests/109:# qemu-img compare can't handle unaligned
> file sizes

Turns out the comment was stale, even before my recent patches, but I
didn't bother bisecting to find when qemu-img learned to handle
unaligned raw images.  But see my comments in my other mail on the patch
for this file: 'qemu-img compare' doesn't necessarily give the nicest of
error messages for unaligned files.

> 
> as both of those tests run under -raw but not -qcow2
> 
>>
>> Maybe negative length and offset work as a replacement.

Sadly, no, because cvtnum() doesn't like things larger than INT64_MAX,
so you can't pass in a negative number.  I added a new '-i' flag
instead; series now available for review.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
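A minimal sketch of why a negative offset cannot stand in for an invalid request. It assumes, as the mail above implies, a size parser that hands back a signed 64-bit value and reserves negative results for errors. The function parse_size() below is a hypothetical stand-in written for illustration; it is not qemu-io's cvtnum().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cvtnum()-style parser: returns the parsed
 * byte count, or a negative errno value on failure. */
static int64_t parse_size(const char *s)
{
    char *end;
    long long val;

    assert(s != NULL);
    errno = 0;
    val = strtoll(s, &end, 0);
    if (end == s || *end != '\0') {
        return -EINVAL;          /* not a number, or trailing junk */
    }
    if (errno == ERANGE) {
        return -ERANGE;          /* does not fit in a signed 64-bit value */
    }
    if (val < 0) {
        return -EINVAL;          /* negative input is rejected, never passed on */
    }
    return val;
}
```

Because every negative return already means "parse error" to the caller, a command built on such a parser never sees a negative offset or length, which is why a dedicated flag is the simpler way to provoke an invalid request.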





[Qemu-block] [PATCH 1/3] qemu-io: Fix missing getopt() updates

2016-05-12 Thread Eric Blake
Commit 770e0e0e [*] forgot to implement 'writev -f'.  Likewise,
commit c3e001c forgot to implement 'aio_write -u -z'.

[*] does it sound "ech0e" in here? :)

Signed-off-by: Eric Blake 
---
 qemu-io-cmds.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 4a00bc6..415be25 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1150,7 +1150,7 @@ static int writev_f(BlockBackend *blk, int argc, char **argv)
 int pattern = 0xcd;
 QEMUIOVector qiov;

-while ((c = getopt(argc, argv, "CqP:")) != -1) {
+while ((c = getopt(argc, argv, "CfqP:")) != -1) {
 switch (c) {
 case 'C':
 Cflag = true;
@@ -1595,7 +1595,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv)
 int flags = 0;

 ctx->blk = blk;
-while ((c = getopt(argc, argv, "CfqP:z")) != -1) {
+while ((c = getopt(argc, argv, "CfqP:uz")) != -1) {
 switch (c) {
 case 'C':
 ctx->Cflag = true;
-- 
2.5.5




[Qemu-block] [PATCH 0/3] Fix recent qemu-iotests issues

2016-05-12 Thread Eric Blake
I introduced a couple of bugs in my recent qemu-io enhancements;
time to fix them back up now that the broken patches are already
part of mainline.

Eric Blake (3):
  qemu-io: Fix missing getopt() updates
  qemu-iotests: Simplify 109 with unaligned qemu-img compare
  qemu-iotests: Fix regression in 136 on aio_read invalid

 qemu-io-cmds.c | 22 +-
 tests/qemu-iotests/109 |  2 --
 tests/qemu-iotests/109.out |  4 
 tests/qemu-iotests/136 | 18 +++---
 4 files changed, 24 insertions(+), 22 deletions(-)

-- 
2.5.5




[Qemu-block] [PATCH 3/3] qemu-iotests: Fix regression in 136 on aio_read invalid

2016-05-12 Thread Eric Blake
Commit 093ea232 removed the ability for aio_read and aio_write
to artificially inflate the invalid statistics counters for
block devices, since it no longer flags unaligned offset or
length.  Add 'aio_read -i' and 'aio_write -i' to restore
the ability, and update test 136 to use it.

Reported-by: Kevin Wolf 
Signed-off-by: Eric Blake 
---
 qemu-io-cmds.c | 20 
 tests/qemu-iotests/136 | 18 +++---
 2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 415be25..059b8ee 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1476,6 +1476,7 @@ static void aio_read_help(void)
 " used to ensure all outstanding aio requests have been completed.\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -P, -- use a pattern to verify read data\n"
+" -i, -- treat request as invalid, for exercising stats\n"
 " -v, -- dump buffer to standard output\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
 "\n");
@@ -1488,7 +1489,7 @@ static const cmdinfo_t aio_read_cmd = {
 .cfunc  = aio_read_f,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-Cqv] [-P pattern] off len [len..]",
+.args   = "[-Ciqv] [-P pattern] off len [len..]",
 .oneline= "asynchronously reads a number of bytes",
 .help   = aio_read_help,
 };
@@ -1499,7 +1500,7 @@ static int aio_read_f(BlockBackend *blk, int argc, char **argv)
 struct aio_ctx *ctx = g_new0(struct aio_ctx, 1);

 ctx->blk = blk;
-while ((c = getopt(argc, argv, "CP:qv")) != -1) {
+while ((c = getopt(argc, argv, "CP:iqv")) != -1) {
 switch (c) {
 case 'C':
 ctx->Cflag = true;
@@ -1512,6 +1513,11 @@ static int aio_read_f(BlockBackend *blk, int argc, char **argv)
 return 0;
 }
 break;
+case 'i':
+printf("injecting invalid read request\n");
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ);
+g_free(ctx);
+return 0;
 case 'q':
 ctx->qflag = true;
 break;
@@ -1569,6 +1575,7 @@ static void aio_write_help(void)
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -f, -- use Force Unit Access semantics\n"
+" -i, -- treat request as invalid, for exercising stats\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
 " -u, -- with -z, allow unmapping\n"
 " -z, -- write zeroes using blk_aio_write_zeroes\n"
@@ -1582,7 +1589,7 @@ static const cmdinfo_t aio_write_cmd = {
 .cfunc  = aio_write_f,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-Cfquz] [-P pattern] off len [len..]",
+.args   = "[-Cfiquz] [-P pattern] off len [len..]",
 .oneline= "asynchronously writes a number of bytes",
 .help   = aio_write_help,
 };
@@ -1595,7 +1602,7 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv)
 int flags = 0;

 ctx->blk = blk;
-while ((c = getopt(argc, argv, "CfqP:uz")) != -1) {
+while ((c = getopt(argc, argv, "CfiqP:uz")) != -1) {
 switch (c) {
 case 'C':
 ctx->Cflag = true;
@@ -1616,6 +1623,11 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv)
 return 0;
 }
 break;
+case 'i':
+printf("injecting invalid write request\n");
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE);
+g_free(ctx);
+return 0;
 case 'z':
 ctx->zflag = true;
 break;
diff --git a/tests/qemu-iotests/136 b/tests/qemu-iotests/136
index e8c6937..5e92c4b 100644
--- a/tests/qemu-iotests/136
+++ b/tests/qemu-iotests/136
@@ -226,18 +226,14 @@ sector = "%d"

 highest_offset = wr_ops * wr_size

-# Two types of invalid operations: unaligned length and unaligned offset
-for i in range(invalid_rd_ops / 2):
-ops.append("aio_read 0 511")
+# Block layer abstracts away unaligned length and offset, so we
+# can't trigger an invalid op with any addresses; use qemu-io's
+# invalid injection feature instead
+for i in range(invalid_rd_ops):
+ops.append("aio_read -i 0 512")

-for i in range(invalid_rd_ops / 2, invalid_rd_ops):
-ops.append("aio_read 13 512")
-
-for i in range(invalid_wr_ops / 2):
-ops.append("aio_write 0 511")
-
-for i in range(invalid_wr_ops / 2, invalid_wr_ops):
-ops.append("aio_write 13 512")
+for i in range(invalid_wr_ops):
+ops.append("aio_write -i 0 512")

 for i in range(failed_rd_ops):
 ops.append("aio_read %d 512" % bad_offset)
-- 
2.5.5




Re: [Qemu-block] [PATCH v4 24/26] block: rip out all traces of password prompting

2016-05-12 Thread Eric Blake
On 02/29/2016 05:00 AM, Daniel P. Berrange wrote:
> Now that qcow & qcow2 are wired up to get encryption keys
> via the QCryptoSecret object, nothing is relying on the
> interactive prompting for passwords. All the code related
> to password prompting can thus be ripped out.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  hmp.c | 31 -
>  hw/usb/dev-storage.c  | 34 
>  include/monitor/monitor.h |  7 -
>  include/qemu/osdep.h  |  2 --
>  monitor.c | 68 
> ---
>  qemu-img.c| 31 -
>  qemu-io.c | 21 ---
>  qmp.c | 10 +--
>  tests/qemu-iotests/087|  2 ++
>  util/oslib-posix.c| 66 -
>  util/oslib-win32.c| 24 -
>  11 files changed, 3 insertions(+), 293 deletions(-)

Missed a spot: in qapi-schema.json, human-monitor-command states:

# Notes: This command only exists as a stop-gap.  Its use is highly
#discouraged.  The semantics of this command are not guaranteed.
...
#   o Commands that prompt the user for data (eg. 'cont' when the block
# device is encrypted) don't currently work

but after your series, cont no longer prompts for passwords so the
comment is (or will be) stale and worth removing.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-block] [PULL 59/69] qmp: add monitor command to add/remove a child

2016-05-12 Thread Kevin Wolf
From: Wen Congyang 

The new QMP command name is x-blockdev-change. It's just for adding/removing
quorum's child now, and doesn't support all kinds of children, all kinds of
operations, nor all block drivers. So it is experimental now.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 
Message-id: 1462865799-19402-4-git-send-email-xiecl.f...@cn.fujitsu.com
Signed-off-by: Max Reitz 
---
 blockdev.c   | 55 
 qapi/block-core.json | 32 ++
 qmp-commands.hx  | 53 ++
 3 files changed, 140 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index f74eb43..1892b8e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -4092,6 +4092,61 @@ out:
 aio_context_release(aio_context);
 }
 
+static BdrvChild *bdrv_find_child(BlockDriverState *parent_bs,
+  const char *child_name)
+{
+BdrvChild *child;
+
+QLIST_FOREACH(child, &parent_bs->children, next) {
+if (strcmp(child->name, child_name) == 0) {
+return child;
+}
+}
+
+return NULL;
+}
+
+void qmp_x_blockdev_change(const char *parent, bool has_child,
+   const char *child, bool has_node,
+   const char *node, Error **errp)
+{
+BlockDriverState *parent_bs, *new_bs = NULL;
+BdrvChild *p_child;
+
+parent_bs = bdrv_lookup_bs(parent, parent, errp);
+if (!parent_bs) {
+return;
+}
+
+if (has_child == has_node) {
+if (has_child) {
+error_setg(errp, "The parameters child and node are in conflict");
+} else {
+error_setg(errp, "Either child or node must be specified");
+}
+return;
+}
+
+if (has_child) {
+p_child = bdrv_find_child(parent_bs, child);
+if (!p_child) {
+error_setg(errp, "Node '%s' does not have child '%s'",
+   parent, child);
+return;
+}
+bdrv_del_child(parent_bs, p_child, errp);
+}
+
+if (has_node) {
+new_bs = bdrv_find_node(node);
+if (!new_bs) {
+error_setg(errp, "Node '%s' not found", node);
+return;
+}
+bdrv_add_child(parent_bs, new_bs, errp);
+}
+}
+
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
 BlockJobInfoList *head = NULL, **p_next = &head;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1d09079..98a20d2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2556,3 +2556,35 @@
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @x-blockdev-change
+#
+# Dynamically reconfigure the block driver state graph. It can be used
+# to add, remove, insert or replace a graph node. Currently only the
+# Quorum driver implements this feature to add or remove its child. This
+# is useful to fix a broken quorum child.
+#
+# If @node is specified, it will be inserted under @parent. @child
+# may not be specified in this case. If both @parent and @child are
+# specified but @node is not, @child will be detached from @parent.
+#
+# @parent: the id or name of the parent node.
+#
+# @child: #optional the name of a child under the given parent node.
+#
+# @node: #optional the name of the node that will be added.
+#
+# Note: this command is experimental, and its API is not stable. It
+# does not support all kinds of operations, all kinds of children, nor
+# all block drivers.
+#
+# Warning: The data in a new quorum child MUST be consistent with that of
+# the rest of the array.
+#
+# Since: 2.7
+##
+{ 'command': 'x-blockdev-change',
+  'data' : { 'parent': 'str',
+ '*child': 'str',
+ '*node': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index de896a5..94847e5 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4398,6 +4398,59 @@ Example:
 EQMP
 
 {
+.name   = "x-blockdev-change",
+.args_type  = "parent:B,child:B?,node:B?",
+.mhandler.cmd_new = qmp_marshal_x_blockdev_change,
+},
+
+SQMP
+x-blockdev-change
+-----------------
+
+Dynamically reconfigure the block driver state graph. It can be used
+to add, remove, insert or replace a graph node. Currently only the
+Quorum driver implements this feature to add or remove its child. This
+is useful to fix a broken quorum child.
+
+If @node is specified, it will be inserted under @parent. @child
+may not be specified in this case. If both @parent and @child are
+specified but @node is not, @child will be detached from @parent.
+
+Arguments:
+- "parent": the id or name of the parent node 
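The argument check in qmp_x_blockdev_change() above boils down to "exactly one of child and node must be given". A minimal sketch of that decision follows; the enum and the helper classify_change() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of the argument check (hypothetical names). */
enum change_op {
    OP_ERR_CONFLICT,    /* both child and node given */
    OP_ERR_MISSING,     /* neither child nor node given */
    OP_DEL_CHILD,       /* only child given: detach it from the parent */
    OP_ADD_NODE         /* only node given: attach it under the parent */
};

/* Decide which operation a request asks for. Comparing the two flags
 * for equality catches both error cases with a single test, as the
 * patch does. */
static enum change_op classify_change(bool has_child, bool has_node)
{
    if (has_child == has_node) {
        return has_child ? OP_ERR_CONFLICT : OP_ERR_MISSING;
    }
    return has_child ? OP_DEL_CHILD : OP_ADD_NODE;
}
```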

[Qemu-block] [PULL 41/69] sd: Switch to byte-based block access

2016-05-12 Thread Kevin Wolf
From: Eric Blake 

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Greatly simplifies the code, now that we let the block layer
take care of alignment and read-modify-write on our behalf :)
In fact, we no longer need to include 'buf' in the migration
stream (although we do have to ensure that the stream remains
compatible).

Signed-off-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 hw/sd/sd.c | 51 ---
 1 file changed, 4 insertions(+), 47 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index b66e5d2..87e3d23 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -123,7 +123,6 @@ struct SDState {
 qemu_irq readonly_cb;
 qemu_irq inserted_cb;
 BlockBackend *blk;
-uint8_t *buf;
 
 bool enable;
 };
@@ -551,7 +550,7 @@ static const VMStateDescription sd_vmstate = {
 VMSTATE_UINT64(data_start, SDState),
 VMSTATE_UINT32(data_offset, SDState),
 VMSTATE_UINT8_ARRAY(data, SDState, 512),
-VMSTATE_BUFFER_POINTER_UNSAFE(buf, SDState, 1, 512),
+VMSTATE_UNUSED_V(1, 512),
 VMSTATE_BOOL(enable, SDState),
 VMSTATE_END_OF_LIST()
 },
@@ -1577,57 +1576,17 @@ send_response:
 
 static void sd_blk_read(SDState *sd, uint64_t addr, uint32_t len)
 {
-uint64_t end = addr + len;
-
 DPRINTF("sd_blk_read: addr = 0x%08llx, len = %d\n",
 (unsigned long long) addr, len);
-if (!sd->blk || blk_read(sd->blk, addr >> 9, sd->buf, 1) < 0) {
+if (!sd->blk || blk_pread(sd->blk, addr, sd->data, len) < 0) {
 fprintf(stderr, "sd_blk_read: read error on host side\n");
-return;
 }
-
-if (end > (addr & ~511) + 512) {
-memcpy(sd->data, sd->buf + (addr & 511), 512 - (addr & 511));
-
-if (blk_read(sd->blk, end >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_read: read error on host side\n");
-return;
-}
-memcpy(sd->data + 512 - (addr & 511), sd->buf, end & 511);
-} else
-memcpy(sd->data, sd->buf + (addr & 511), len);
 }
 
 static void sd_blk_write(SDState *sd, uint64_t addr, uint32_t len)
 {
-uint64_t end = addr + len;
-
-if ((addr & 511) || len < 512)
-if (!sd->blk || blk_read(sd->blk, addr >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: read error on host side\n");
-return;
-}
-
-if (end > (addr & ~511) + 512) {
-memcpy(sd->buf + (addr & 511), sd->data, 512 - (addr & 511));
-if (blk_write(sd->blk, addr >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: write error on host side\n");
-return;
-}
-
-if (blk_read(sd->blk, end >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: read error on host side\n");
-return;
-}
-memcpy(sd->buf, sd->data + 512 - (addr & 511), end & 511);
-if (blk_write(sd->blk, end >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: write error on host side\n");
-}
-} else {
-memcpy(sd->buf + (addr & 511), sd->data, len);
-if (!sd->blk || blk_write(sd->blk, addr >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: write error on host side\n");
-}
+if (!sd->blk || blk_pwrite(sd->blk, addr, sd->data, len, 0) < 0) {
+fprintf(stderr, "sd_blk_write: write error on host side\n");
 }
 }
 
@@ -1925,8 +1884,6 @@ static void sd_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-sd->buf = blk_blockalign(sd->blk, 512);
-
 if (sd->blk) {
 blk_set_dev_ops(sd->blk, &sd_block_ops, sd);
 }
-- 
1.8.3.1
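The code removed above had to work out by hand which 512-byte sectors an unaligned request touches. As a rough sketch of that bookkeeping (not of how the block layer implements it), the helper below computes the sector-aligned range covering an arbitrary byte request. The names covering_range() and SECTOR_SIZE are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u    /* sector size used by the removed code */

/* Compute the sector-aligned byte range [*start, *end) that covers the
 * request [addr, addr + len). Assumes len > 0 and that addr + len does
 * not overflow. */
static void covering_range(uint64_t addr, uint32_t len,
                           uint64_t *start, uint64_t *end)
{
    const uint64_t mask = ~(uint64_t)(SECTOR_SIZE - 1);

    assert(len > 0);
    *start = addr & mask;                           /* round down */
    *end = (addr + len + SECTOR_SIZE - 1) & mask;   /* round up */
}
```

A 512-byte request at offset 13 spans two sectors (bytes 0 to 1024), which is the case the old sd_blk_read() and sd_blk_write() handled with a second blk_read() and extra memcpy() calls; with byte-based access the device model no longer has to care.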




[Qemu-block] [PULL 38/69] nand: Switch to byte-based block access

2016-05-12 Thread Kevin Wolf
From: Eric Blake 

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

This file is doing some complex computations to map various
flash page sizes (256, 512, and 2048) atop generic uses of
512-byte sector operations.  Perhaps someone will want to tidy
up the file for fewer gymnastics in managing addresses and
offsets, and less wasteful visits of 256-byte pages, but it
was out of scope for this series, where I just went with the
mechanical conversion.

Signed-off-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 hw/block/nand.c | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/hw/block/nand.c b/hw/block/nand.c
index 29c6596..c69e675 100644
--- a/hw/block/nand.c
+++ b/hw/block/nand.c
@@ -663,7 +663,8 @@ static void glue(nand_blk_write_, PAGE_SIZE)(NANDFlashState *s)
 sector = SECTOR(s->addr);
 off = (s->addr & PAGE_MASK) + s->offset;
 soff = SECTOR_OFFSET(s->addr);
-if (blk_read(s->blk, sector, iobuf, PAGE_SECTORS) < 0) {
+if (blk_pread(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+  PAGE_SECTORS << BDRV_SECTOR_BITS) < 0) {
 printf("%s: read error in sector %" PRIu64 "\n", __func__, sector);
 return;
 }
@@ -675,21 +676,24 @@ static void glue(nand_blk_write_, PAGE_SIZE)(NANDFlashState *s)
 MIN(OOB_SIZE, off + s->iolen - PAGE_SIZE));
 }
 
-if (blk_write(s->blk, sector, iobuf, PAGE_SECTORS) < 0) {
+if (blk_pwrite(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+   PAGE_SECTORS << BDRV_SECTOR_BITS, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, sector);
 }
 } else {
 off = PAGE_START(s->addr) + (s->addr & PAGE_MASK) + s->offset;
 sector = off >> 9;
 soff = off & 0x1ff;
-if (blk_read(s->blk, sector, iobuf, PAGE_SECTORS + 2) < 0) {
+if (blk_pread(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+  (PAGE_SECTORS + 2) << BDRV_SECTOR_BITS) < 0) {
 printf("%s: read error in sector %" PRIu64 "\n", __func__, sector);
 return;
 }
 
 mem_and(iobuf + soff, s->io, s->iolen);
 
-if (blk_write(s->blk, sector, iobuf, PAGE_SECTORS + 2) < 0) {
+if (blk_pwrite(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+   (PAGE_SECTORS + 2) << BDRV_SECTOR_BITS, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, sector);
 }
 }
@@ -716,17 +720,20 @@ static void glue(nand_blk_erase_, PAGE_SIZE)(NANDFlashState *s)
 i = SECTOR(addr);
 page = SECTOR(addr + (1 << (ADDR_SHIFT + s->erase_shift)));
 for (; i < page; i ++)
-if (blk_write(s->blk, i, iobuf, 1) < 0) {
+if (blk_pwrite(s->blk, i << BDRV_SECTOR_BITS, iobuf,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, i);
 }
 } else {
 addr = PAGE_START(addr);
 page = addr >> 9;
-if (blk_read(s->blk, page, iobuf, 1) < 0) {
+if (blk_pread(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+  BDRV_SECTOR_SIZE) < 0) {
 printf("%s: read error in sector %" PRIu64 "\n", __func__, page);
 }
 memset(iobuf + (addr & 0x1ff), 0xff, (~addr & 0x1ff) + 1);
-if (blk_write(s->blk, page, iobuf, 1) < 0) {
+if (blk_pwrite(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, page);
 }
 
@@ -734,18 +741,20 @@ static void glue(nand_blk_erase_, PAGE_SIZE)(NANDFlashState *s)
 i = (addr & ~0x1ff) + 0x200;
 for (addr += ((PAGE_SIZE + OOB_SIZE) << s->erase_shift) - 0x200;
 i < addr; i += 0x200) {
-if (blk_write(s->blk, i >> 9, iobuf, 1) < 0) {
+if (blk_pwrite(s->blk, i, iobuf, BDRV_SECTOR_SIZE, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n",
__func__, i >> 9);
 }
 }
 
 page = i >> 9;
-if (blk_read(s->blk, page, iobuf, 1) < 0) {
+if (blk_pread(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+  BDRV_SECTOR_SIZE) < 0) {
 printf("%s: read error in sector %" PRIu64 "\n", __func__, page);
 }
 memset(iobuf, 0xff, ((addr - 1) & 0x1ff) + 1);
-if (blk_write(s->blk, page, iobuf, 1) < 0) {
+if (blk_pwrite(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, page);
 }
 }
@@ -760,7 +769,8 @@ static void 
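The mechanical conversion in this patch turns each (sector, sector count) pair into a (byte offset, byte count) pair by shifting. A small sketch of that mapping follows, assuming 512-byte sectors as the surrounding code does with its ">> 9" and "0x1ff". SECTOR_BITS, struct byte_range and sectors_to_bytes() are illustrative names, not QEMU identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9   /* assumed: 512-byte sectors */

/* A byte-based request: where it starts and how long it is. */
struct byte_range {
    uint64_t offset;
    uint64_t bytes;
};

/* Convert a sector-based request into its byte-based equivalent. The
 * assertion guards against the shift overflowing 64 bits. */
static struct byte_range sectors_to_bytes(uint64_t sector, uint32_t nb_sectors)
{
    struct byte_range r;

    assert(sector <= (UINT64_MAX >> SECTOR_BITS));
    r.offset = sector << SECTOR_BITS;
    r.bytes = (uint64_t)nb_sectors << SECTOR_BITS;
    return r;
}
```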

Re: [Qemu-block] [PATCH v7 06/19] scsi-disk: Switch to byte-based aio block access

2016-05-12 Thread Eric Blake
On 05/12/2016 05:25 AM, Paolo Bonzini wrote:
> 
> 
> On 06/05/2016 18:26, Eric Blake wrote:
>> @@ -340,11 +338,12 @@ static void scsi_do_read(SCSIDiskReq *r, int ret)
>>  r->req.aiocb = dma_blk_read(s->qdev.conf.blk, r->req.sg, r->sector,
>>  scsi_dma_complete, r);
> 
> This is broken, it should be changed to an offset in the previous patch.
> 
> Please rename the function too, so that it is obvious that you have
> changed all callers.
> 
> How was this patch tested?

Sadly, my testing was limited to NBD use of the blk_ functions, then
compile testing and close audit of all other drivers, to limit the
changes to ONLY the blk_ calling conventions. That's why I kept sector
alignment in other calls (as you later noticed, that means
dma_blk_read() was unchanged in semantics).

Yes, there are more cleanups possible, such as altering things to a new
dma_blk_pread() with byte semantics, but I'm hoping that it can be done
on a per-driver basis by someone more familiar with how to test the
changes.  I also like the fact that my patches were done at the very
beginning of the 2.7 cycle to maximize testing.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-block] [PULL 36/69] xen_disk: Switch to byte-based aio block access

2016-05-12 Thread Kevin Wolf
From: Eric Blake 

Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

Signed-off-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 hw/block/xen_disk.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index d4ce380..064c116 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -554,9 +554,8 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
 block_acct_start(blk_get_stats(blkdev->blk), &ioreq->acct,
  ioreq->v.size, BLOCK_ACCT_READ);
 ioreq->aio_inflight++;
-blk_aio_readv(blkdev->blk, ioreq->start / BLOCK_SIZE,
-  &ioreq->v, ioreq->v.size / BLOCK_SIZE,
-  qemu_aio_complete, ioreq);
+blk_aio_preadv(blkdev->blk, ioreq->start, &ioreq->v, 0,
+   qemu_aio_complete, ioreq);
 break;
 case BLKIF_OP_WRITE:
 case BLKIF_OP_FLUSH_DISKCACHE:
@@ -569,9 +568,8 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
  ioreq->req.operation == BLKIF_OP_WRITE ?
  BLOCK_ACCT_WRITE : BLOCK_ACCT_FLUSH);
 ioreq->aio_inflight++;
-blk_aio_writev(blkdev->blk, ioreq->start / BLOCK_SIZE,
-   &ioreq->v, ioreq->v.size / BLOCK_SIZE,
-   qemu_aio_complete, ioreq);
+blk_aio_pwritev(blkdev->blk, ioreq->start, &ioreq->v, 0,
+qemu_aio_complete, ioreq);
 break;
 case BLKIF_OP_DISCARD:
 {
-- 
1.8.3.1




[Qemu-block] [PULL 51/69] nbd: Simplify client FUA handling

2016-05-12 Thread Kevin Wolf
From: Eric Blake 

Now that the block layer honors per-bds FUA support, we don't
have to duplicate the fallback flush at the NBD layer.  The
static function nbd_co_writev_flags() is no longer needed, and
the driver can just directly use nbd_client_co_writev().

Signed-off-by: Eric Blake 
Reviewed-by: Fam Zheng 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block/nbd-client.c |  8 
 block/nbd-client.h |  2 +-
 block/nbd.c| 25 +++--
 3 files changed, 8 insertions(+), 27 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 5fc96e9..4d13444 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -243,15 +243,15 @@ static int nbd_co_readv_1(BlockDriverState *bs, int64_t sector_num,
 
 static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov,
-   int offset, int *flags)
+   int offset, int flags)
 {
 NbdClientSession *client = nbd_get_client_session(bs);
 struct nbd_request request = { .type = NBD_CMD_WRITE };
 struct nbd_reply reply;
 ssize_t ret;
 
-if ((*flags & BDRV_REQ_FUA) && (client->nbdflags & NBD_FLAG_SEND_FUA)) {
-*flags &= ~BDRV_REQ_FUA;
+if (flags & BDRV_REQ_FUA) {
+assert(client->nbdflags & NBD_FLAG_SEND_FUA);
 request.type |= NBD_CMD_FLAG_FUA;
 }
 
@@ -291,7 +291,7 @@ int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
 }
 
 int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
- int nb_sectors, QEMUIOVector *qiov, int *flags)
+ int nb_sectors, QEMUIOVector *qiov, int flags)
 {
 int offset = 0;
 int ret;
diff --git a/block/nbd-client.h b/block/nbd-client.h
index bc7aec0..c618dad 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -48,7 +48,7 @@ int nbd_client_co_discard(BlockDriverState *bs, int64_t sector_num,
   int nb_sectors);
 int nbd_client_co_flush(BlockDriverState *bs);
 int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
- int nb_sectors, QEMUIOVector *qiov, int *flags);
+ int nb_sectors, QEMUIOVector *qiov, int flags);
 int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov);
 
diff --git a/block/nbd.c b/block/nbd.c
index a4fba91..6015e8b 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -355,25 +355,6 @@ static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
 return nbd_client_co_readv(bs, sector_num, nb_sectors, qiov);
 }
 
-static int nbd_co_writev_flags(BlockDriverState *bs, int64_t sector_num,
-   int nb_sectors, QEMUIOVector *qiov, int flags)
-{
-int ret;
-
-ret = nbd_client_co_writev(bs, sector_num, nb_sectors, qiov, &flags);
-if (ret < 0) {
-return ret;
-}
-
-/* The flag wasn't sent to the server, so we need to emulate it with an
- * explicit flush */
-if (flags & BDRV_REQ_FUA) {
-ret = nbd_client_co_flush(bs);
-}
-
-return ret;
-}
-
 static int nbd_co_flush(BlockDriverState *bs)
 {
 return nbd_client_co_flush(bs);
@@ -470,7 +451,7 @@ static BlockDriver bdrv_nbd = {
 .bdrv_parse_filename= nbd_parse_filename,
 .bdrv_file_open = nbd_open,
 .bdrv_co_readv  = nbd_co_readv,
-.bdrv_co_writev_flags   = nbd_co_writev_flags,
+.bdrv_co_writev_flags   = nbd_client_co_writev,
 .bdrv_close = nbd_close,
 .bdrv_co_flush_to_os= nbd_co_flush,
 .bdrv_co_discard= nbd_co_discard,
@@ -488,7 +469,7 @@ static BlockDriver bdrv_nbd_tcp = {
 .bdrv_parse_filename= nbd_parse_filename,
 .bdrv_file_open = nbd_open,
 .bdrv_co_readv  = nbd_co_readv,
-.bdrv_co_writev_flags   = nbd_co_writev_flags,
+.bdrv_co_writev_flags   = nbd_client_co_writev,
 .bdrv_close = nbd_close,
 .bdrv_co_flush_to_os= nbd_co_flush,
 .bdrv_co_discard= nbd_co_discard,
@@ -506,7 +487,7 @@ static BlockDriver bdrv_nbd_unix = {
 .bdrv_parse_filename= nbd_parse_filename,
 .bdrv_file_open = nbd_open,
 .bdrv_co_readv  = nbd_co_readv,
-.bdrv_co_writev_flags   = nbd_co_writev_flags,
+.bdrv_co_writev_flags   = nbd_client_co_writev,
 .bdrv_close = nbd_close,
 .bdrv_co_flush_to_os= nbd_co_flush,
 .bdrv_co_discard= nbd_co_discard,
-- 
1.8.3.1
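The fallback that this patch drops from the NBD driver can be sketched in isolation: when a driver cannot honor FUA itself, a generic layer strips the flag, performs the write, and follows it with a flush. Everything below (REQ_FUA, struct fake_driver, generic_write()) is a hypothetical model written for illustration and says nothing about the actual layout of QEMU's block layer.

```c
#include <assert.h>
#include <stdbool.h>

#define REQ_FUA 0x1     /* hypothetical flag value */

/* Toy driver that just counts what it is asked to do. */
struct fake_driver {
    bool supports_fua;  /* can the driver honor FUA natively? */
    int writes;
    int fua_writes;
    int flushes;
};

static int driver_write(struct fake_driver *d, int flags)
{
    d->writes++;
    if (flags & REQ_FUA) {
        /* Same invariant as the assert() added in nbd_co_writev_1(). */
        assert(d->supports_fua);
        d->fua_writes++;
    }
    return 0;
}

static int driver_flush(struct fake_driver *d)
{
    d->flushes++;
    return 0;
}

/* Generic layer: pass FUA through when supported, otherwise emulate it
 * with a plain write followed by an explicit flush. */
static int generic_write(struct fake_driver *d, int flags)
{
    bool emulate = (flags & REQ_FUA) && !d->supports_fua;
    int ret;

    if (emulate) {
        flags &= ~REQ_FUA;
    }
    ret = driver_write(d, flags);
    if (ret < 0) {
        return ret;
    }
    if (emulate) {
        ret = driver_flush(d);
    }
    return ret;
}
```

Once such emulation lives in one generic place, a driver only needs to assert that it never sees a FUA request it did not advertise support for, which is the shape of the simplified nbd_co_writev_1() above.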




Re: [Qemu-block] [Qemu-devel] [PULL 00/69] Block layer patches

2016-05-12 Thread Peter Maydell
On 12 May 2016 at 15:34, Kevin Wolf  wrote:
> The following changes since commit 26617924e9a329bdff81936d2d277983f0c4d372:
>
>   Open 2.7 development tree (2016-05-12 12:35:25 +0100)
>
> are available in the git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to efc2645f714aae1bcf22e8165cad51c57f34fdf3:
>
>   Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-05-12' 
> into queue-block (2016-05-12 15:35:20 +0200)
>
> 
>
> Block layer patches
>
> 

Applied, thanks.

-- PMM



Re: [Qemu-block] [Qemu-devel] [PATCH v4 00/11] nbd: tighter protocol compliance

2016-05-12 Thread Alex Bligh

On 11 May 2016, at 23:39, Eric Blake  wrote:

> Fix several corner-case bugs in our implementation of the NBD
> protocol, both as client and as server.

I thought I'd added a Reviewed-By: line to more of these before.
On a very very quick look, they all look good to me.

-- 
Alex Bligh







[Qemu-block] [PULL 56/69] qemu-img: check block status of backing file when converting.

2016-05-12 Thread Kevin Wolf
From: Ren Kimura 

When converting images, check the block status of its backing file chain
to avoid needlessly reading zeros.

Signed-off-by: Ren Kimura 
Message-id: 1461773098-20356-1-git-send-email-rkx1209...@gmail.com
Signed-off-by: Max Reitz 
---
 qemu-img.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 491a460..4792366 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1475,10 +1475,21 @@ static int convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
 } else if (!s->target_has_backing) {
 /* Without a target backing file we must copy over the contents of
  * the backing file as well. */
-/* TODO Check block status of the backing file chain to avoid
+/* Check block status of the backing file chain to avoid
  * needlessly reading zeroes and limiting the iteration to the
  * buffer size */
-s->status = BLK_DATA;
+ret = bdrv_get_block_status_above(blk_bs(s->src[s->src_cur]), NULL,
+  sector_num - s->src_cur_offset,
+  n, &n, &file);
+if (ret < 0) {
+return ret;
+}
+
+if (ret & BDRV_BLOCK_ZERO) {
+s->status = BLK_ZERO;
+} else {
+s->status = BLK_DATA;
+}
 } else {
 s->status = BLK_BACKING_FILE;
 }
-- 
1.8.3.1




Re: [Qemu-block] [Qemu-devel] [PATCH v4 09/11] nbd: Add qemu-nbd -D for human-readable description

2016-05-12 Thread Daniel P. Berrange
On Thu, May 12, 2016 at 09:38:58AM -0600, Eric Blake wrote:
> On 05/12/2016 01:47 AM, Daniel P. Berrange wrote:
> > On Wed, May 11, 2016 at 04:39:42PM -0600, Eric Blake wrote:
> >> The NBD protocol allows servers to advertise a human-readable
> >> description alongside an export name during NBD_OPT_LIST.  Add
> >> an option to pass through the user's string to the NBD client.
> >>
> >> Doing this also makes it easier to test commit 200650d4, which
> >> is the client counterpart of receiving the description.
> >>
> 
> >> -@item -x NAME, --export-name=NAME
> >> +@item -x, --export-name=@var{name}
> > 
> > Why this change - that reads as saying that '-x' doesn't take any value
> > which is wrong IMHO
> 
> It's consistent with other options-with-arguments in the same file, such as:
> 
> @item -p, --port=@var{port}
> @item -o, --offset=@var{offset}
> @item -b, --bind=@var{iface}
> @item -k, --socket=@var{path}
> 
> etc. Basically, we want to use this common escape hatch (see 'ls
> --help', for example):
> 
> Mandatory arguments to long options are mandatory for short options too.

Ah, I didn't realize it was standard practice to do that. I personally
have just historically included the arg value in both, but if QEMU
doesn't, that's ok; consistency is more important.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
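The convention discussed in this exchange ("mandatory arguments to long options are mandatory for short options too") is what getopt_long() gives you when a short option with a trailing colon is paired with a long option marked required_argument. The sketch below uses a made-up one-entry option table, not qemu-nbd's real one, and the helper parse_export_name() is hypothetical.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>     /* strcmp() for callers comparing the result */

/* Return the export name given via -x NAME or --export-name=NAME, or
 * NULL if the option is absent. Illustrative only. */
static const char *parse_export_name(int argc, char **argv)
{
    static const struct option longopts[] = {
        { "export-name", required_argument, NULL, 'x' },
        { NULL, 0, NULL, 0 },
    };
    const char *name = NULL;
    int c;

    assert(argc >= 1);
    optind = 1;     /* restart scanning from the first argument */
    opterr = 0;     /* suppress getopt's own error messages */
    while ((c = getopt_long(argc, argv, "x:", longopts, NULL)) != -1) {
        if (c == 'x') {
            name = optarg;  /* set for both the short and the long form */
        }
    }
    return name;
}
```

Both spellings take the same mandatory argument, so documenting the value only on the long form, as in "-x, --export-name=@var{name}", loses no information.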



Re: [Qemu-block] [PATCH v5 4/6] qemu-io: Allow unaligned access by default

2016-05-12 Thread Eric Blake
On 05/12/2016 08:38 AM, Kevin Wolf wrote:
> Am 08.05.2016 um 05:16 hat Eric Blake geschrieben:
>> There's no reason to require the user to specify a flag just so
>> they can pass in unaligned numbers.  Keep 'read -p' and 'write -p'
>> as no-ops so that I don't have to hunt down and update all users
>> of qemu-io, but otherwise make their behavior default as 'read' and
>> 'write'.  Also fix 'write -z', 'readv', 'writev', 'writev',
>> 'aio_read', 'aio_write', and 'aio_write -z'.  For now, 'read -b',
>> 'write -b', and 'write -c' still require alignment (and 'multiwrite',
>> but that's slated to die soon).
>>
>> qemu-iotest 23 is updated to match, as the only test that was
>> previously explicitly expecting an error on an unaligned request.

I found that one by 'git grep "sector aligned"', and tested with ./check
-qcow2...

>>
>> Signed-off-by: Eric Blake 
> 
> This breaks qemu-iotests 136 for raw. It's pretty obvious that this is a
> test case problem (uses unaligned requests to test error accounting), so
> I'm not dropping the patch, but please do send a follow-up.

...which explains why I missed this failure with ./check -raw.  Will
fix, and maybe I should have grepped a bit harder, since it is fairly
obvious:

tests/qemu-iotests/136:# Two types of invalid operations: unaligned length and unaligned offset

I will also check if this needs updating:

tests/qemu-iotests/109:# qemu-img compare can't handle unaligned file sizes

as both of those tests run under -raw but not -qcow2

> 
> Maybe negative length and offset work as a replacement.

Indeed, since unaligned length and unaligned offset are now explicitly
handled by the block layer.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-block] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-12 Thread Alex Bligh

On 11 May 2016, at 22:12, Wouter Verhelst wrote:

> On Tue, May 10, 2016 at 04:38:29PM +0100, Alex Bligh wrote:
>> On 10 May 2016, at 16:29, Eric Blake wrote:
>>> So the kernel is currently one of the clients that does NOT honor block
>>> sizes, and as such, servers should be prepared for ANY size up to
>>> UINT_MAX (other than DoS handling).
>> 
>> Or not to permit a connection.
> 
> Right -- and this is why I was recommending against making this a MUST in the
> first place.

Indeed, and it currently is a 'MAY':

> except that if a server believes a client's behaviour constitutes
> a denial of service, it MAY initiate a hard disconnect.

-- 
Alex Bligh







Re: [Qemu-block] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer

2016-05-12 Thread Kevin Wolf
Am 12.05.2016 um 15:47 hat Alberto Garcia geschrieben:
> On Tue 03 May 2016 03:48:47 PM CEST, Kevin Wolf wrote:
> > Am 03.05.2016 um 15:33 hat Alberto Garcia geschrieben:
> >> On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
> >> >> c) we fix bdrv_reopen() so we can actually run both jobs at the same
> >> >>    time. I'm wondering if pausing all block jobs between
> >> >>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
> >> >>    trick. Opinions?
> >> >
> >> > I would have to read up the details of the problem again, but I think
> >> > with bdrv_drained_begin/end() we actually have the right tool now to fix
> >> > it properly. We may need to pull up the drain (bdrv_drain_all() today)
> >> > from bdrv_reopen_multiple() to its caller and just assert it in the
> >> > function itself, but there shouldn't be much more to it than that.
> >> 
> >> I think that's not enough, see point 2) here:
> >> 
> >> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> >> 
> >>   "I've been taking a look at the bdrv_drained_begin/end() API, but as I
> >>    understand it it prevents requests from a different AioContext.
> >>    Since all BDS in the same chain share the same context it does not
> >>    really help here."
> >
> > Yes, that's the part I meant with pulling up the calls.
> >
> > If I understand correctly, the problem is that first bdrv_reopen_queue()
> > queues a few BDSes for reopen, then bdrv_drain_all() completes all
> > running requests and can indirectly trigger a graph modification, and
> > then bdrv_reopen_multiple() uses the queue which doesn't match reality
> > any more.
> >
> > The solution to that should be simply changing the order of things:
> >
> > 1. bdrv_drained_begin()
> > 2. bdrv_reopen_queue()
> > 3. bdrv_reopen_multiple()
> > >    * Instead of bdrv_drain_all(), assert that no requests are pending
> > >    * We don't run requests, so we can't complete a block job and
> > >      manipulate the graph any more
> > 4. then bdrv_drained_end()
> 
> This doesn't work. Here's what happens:
> 
> 1) Block job (a) starts (block-stream).
> 
> 2) Block job (b) starts (block-stream, or block-commit).
> 
> 3) job (b) calls bdrv_reopen() and does the drain call.
> 
> 4) job (b) creates reopen_queue and calls bdrv_reopen_multiple().
>    There are no pending requests at this point, but job (a) is sleeping.
> 
> 5) bdrv_reopen_multiple() iterates over reopen_queue and calls
>    bdrv_reopen_prepare() -> bdrv_flush() -> bdrv_co_flush() ->
>    qemu_coroutine_yield().

I think between here and the next step is what I don't understand.

bdrv_reopen_multiple() is not called in coroutine context, right? All
block jobs use block_job_defer_to_main_loop() before they call
bdrv_reopen(), as far as I can see. So bdrv_flush() shouldn't take the
shortcut, but use a nested event loop.

What is it that calls into job (a) from that event loop? It can't be a
request completion because we already drained all requests. Is it a
timer?

> 6) job (a) resumes, finishes the job and removes nodes from the graph.
> 
> 7) job (b) continues with bdrv_reopen_multiple() but now reopen_queue
>    contains invalid pointers.

I don't fully understand the problem yet, but as a shot in the dark,
would pausing block jobs in bdrv_drained_begin() help?
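
To make that shot in the dark a bit more concrete, here is a toy model of the idea: the drained section bumps a pause counter on every job, and a nested event loop iteration only lets unpaused jobs complete. Everything here (ToyJob, toy_drained_begin() and friends) is made up for illustration and is not the QEMU API; it only shows why a paused job could not finish and rewrite the graph while the reopen queue is in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are QEMU types. */
typedef struct ToyJob {
    int pause_count;   /* > 0 means the job must not make progress */
    bool completed;    /* completion is what would modify the graph */
} ToyJob;

typedef struct ToyJobList {
    ToyJob *jobs;
    size_t n;
} ToyJobList;

/* Hypothetical drained_begin(): besides quiescing requests, pause jobs. */
void toy_drained_begin(ToyJobList *l)
{
    for (size_t i = 0; i < l->n; i++) {
        l->jobs[i].pause_count++;
    }
}

void toy_drained_end(ToyJobList *l)
{
    for (size_t i = 0; i < l->n; i++) {
        assert(l->jobs[i].pause_count > 0);
        l->jobs[i].pause_count--;
    }
}

/* One iteration of a nested event loop (such as the one a flush would
 * run): every job that is not paused runs to completion.
 * Returns the number of jobs that completed in this iteration. */
int toy_event_loop_iteration(ToyJobList *l)
{
    int completed = 0;

    for (size_t i = 0; i < l->n; i++) {
        ToyJob *j = &l->jobs[i];
        if (!j->completed && j->pause_count == 0) {
            j->completed = true;
            completed++;
        }
    }
    return completed;
}
```

In this model, steps 5) to 7) of the scenario cannot happen between toy_drained_begin() and toy_drained_end(), because job (a) never gets to run there.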

Kevin



[Qemu-block] [PULL 49/69] block: Make supported_write_flags a per-bds property

2016-05-12 Thread Kevin Wolf
From: Eric Blake 

Pre-patch, .supported_write_flags lives at the driver level, which
means we are blindly declaring that all block devices using a
given driver will either equally support FUA, or that we need a
fallback at the block layer.  But there are drivers where FUA
support is a per-block decision: the NBD block driver is dependent
on the remote server advertising NBD_FLAG_SEND_FUA (and has
fallback code to duplicate the flush that the block layer would do
if NBD had not set .supported_write_flags); and the iscsi block
driver is dependent on the mode sense bits advertised by the
underlying device (and is currently silently ignoring FUA requests
if the underlying device does not support FUA).

The fix is to make supported flags a per-BDS option, set during
.bdrv_open().  This patch moves the variable and fixes NBD and iscsi
to set it only conditionally; later patches will then further
simplify the NBD driver to quit duplicating work done at the block
layer, as well as tackle the fact that SCSI does not support FUA
semantics on WRITESAME(10/16) but only on WRITE(10/16).
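
As a rough illustration of the resulting flow (a self-contained toy, not the block layer code; ToyDevice, toy_open() and toy_write() are invented names and TOY_REQ_FUA merely stands in for BDRV_REQ_FUA): the capability is recorded per device at open time, the driver only sees flags the device supports, and anything left over is emulated with a flush.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_REQ_FUA 0x1u   /* stand-in for BDRV_REQ_FUA; value is arbitrary */

typedef struct ToyDevice {
    unsigned supported_write_flags; /* set once, at open time */
    unsigned last_driver_flags;     /* flags the "driver" actually saw */
    int writes;
    int flushes;
} ToyDevice;

/* Hypothetical open(): the capability comes from the device that was
 * opened, not from the driver that opened it. */
void toy_open(ToyDevice *d, int device_has_fua)
{
    assert(d != NULL);
    d->supported_write_flags = device_has_fua ? TOY_REQ_FUA : 0;
    d->last_driver_flags = 0;
    d->writes = 0;
    d->flushes = 0;
}

void toy_write(ToyDevice *d, unsigned flags)
{
    /* Hand the driver only the flags this device supports ... */
    d->last_driver_flags = flags & d->supported_write_flags;
    d->writes++;

    /* ... and emulate whatever is left over. */
    flags &= ~d->supported_write_flags;
    if (flags & TOY_REQ_FUA) {
        d->flushes++;   /* write followed by flush approximates FUA */
    }
}
```

Two devices behind the same toy driver then behave differently: the one that advertises FUA gets the flag passed through with no extra flush, the other gets a plain write plus a flush.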

Signed-off-by: Eric Blake 
Reviewed-by: Fam Zheng 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block/io.c|  9 -
 block/iscsi.c | 10 +++---
 block/nbd-client.c|  3 +++
 block/nbd.c   |  3 ---
 block/raw_bsd.c   |  2 +-
 include/block/block_int.h |  4 ++--
 6 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/block/io.c b/block/io.c
index 0db1146..1fb7afe 100644
--- a/block/io.c
+++ b/block/io.c
@@ -841,9 +841,10 @@ static int coroutine_fn bdrv_driver_pwritev(BlockDriverState *bs,
 
 if (drv->bdrv_co_writev_flags) {
 ret = drv->bdrv_co_writev_flags(bs, sector_num, nb_sectors, qiov,
-flags);
+flags & bs->supported_write_flags);
+flags &= ~bs->supported_write_flags;
 } else if (drv->bdrv_co_writev) {
-assert(drv->supported_write_flags == 0);
+assert(!bs->supported_write_flags);
 ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
 } else {
 BlockAIOCB *acb;
@@ -862,9 +863,7 @@ static int coroutine_fn bdrv_driver_pwritev(BlockDriverState *bs,
 }
 
 emulate_flags:
-if (ret == 0 && (flags & BDRV_REQ_FUA) &&
-!(drv->supported_write_flags & BDRV_REQ_FUA))
-{
+if (ret == 0 && (flags & BDRV_REQ_FUA)) {
 ret = bdrv_co_flush(bs);
 }
 
diff --git a/block/iscsi.c b/block/iscsi.c
index 4f75204..6d5c1f6 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -456,8 +456,11 @@ iscsi_co_writev_flags(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 struct IscsiTask iTask;
 uint64_t lba;
 uint32_t num_sectors;
-bool fua;
+bool fua = flags & BDRV_REQ_FUA;
 
+if (fua) {
+assert(iscsilun->dpofua);
+}
 if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
 return -EINVAL;
 }
@@ -472,7 +475,6 @@ iscsi_co_writev_flags(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 num_sectors = sector_qemu2lun(nb_sectors, iscsilun);
 iscsi_co_init_iscsitask(iscsilun, &iTask);
 retry:
-fua = iscsilun->dpofua && (flags & BDRV_REQ_FUA);
 if (iscsilun->use_16_for_rw) {
 iTask.task = iscsi_write16_task(iscsilun->iscsi, iscsilun->lun, lba,
 NULL, num_sectors * iscsilun->block_size,
@@ -1548,6 +1550,9 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
 task = NULL;
 
 iscsi_modesense_sync(iscsilun);
+if (iscsilun->dpofua) {
+bs->supported_write_flags = BDRV_REQ_FUA;
+}
 
 /* Check the write protect flag of the LUN if we want to write */
 if (iscsilun->type == TYPE_DISK && (flags & BDRV_O_RDWR) &&
@@ -1841,7 +1846,6 @@ static BlockDriver bdrv_iscsi = {
 .bdrv_co_write_zeroes = iscsi_co_write_zeroes,
 .bdrv_co_readv = iscsi_co_readv,
 .bdrv_co_writev_flags  = iscsi_co_writev_flags,
-.supported_write_flags = BDRV_REQ_FUA,
 .bdrv_co_flush_to_disk = iscsi_co_flush,
 
 #ifdef __linux__
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 878e879..5fc96e9 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -414,6 +414,9 @@ int nbd_client_init(BlockDriverState *bs,
 logout("Failed to negotiate with the NBD server\n");
 return ret;
 }
+if (client->nbdflags & NBD_FLAG_SEND_FUA) {
+bs->supported_write_flags = BDRV_REQ_FUA;
+}
 
 qemu_co_mutex_init(&client->send_mutex);
 qemu_co_mutex_init(&client->free_sema);
diff --git a/block/nbd.c b/block/nbd.c
index fccbfef..a4fba91 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -471,7 +471,6 @@ static BlockDriver bdrv_nbd = {
 .bdrv_file_open = nbd_open,
 .bdrv_co_readv  = nbd_co_readv,
 

[Qemu-block] [PULL 52/69] block: Invalidate all children

2016-05-12 Thread Kevin Wolf
From: Fam Zheng 

Currently we only recurse to bs->file, which will miss the children in quorum
and VMDK.

Recurse into the whole subtree to avoid that.

Signed-off-by: Fam Zheng 
Reviewed-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index d70ae55..71b523c 100644
--- a/block.c
+++ b/block.c
@@ -3198,6 +3198,7 @@ void bdrv_init_with_whitelist(void)
 
 void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
 {
+BdrvChild *child;
 Error *local_err = NULL;
 int ret;
 
@@ -3212,13 +3213,20 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
 
 if (bs->drv->bdrv_invalidate_cache) {
 bs->drv->bdrv_invalidate_cache(bs, &local_err);
-} else if (bs->file) {
-bdrv_invalidate_cache(bs->file->bs, &local_err);
+if (local_err) {
+bs->open_flags |= BDRV_O_INACTIVE;
+error_propagate(errp, local_err);
+return;
+}
 }
-if (local_err) {
-bs->open_flags |= BDRV_O_INACTIVE;
-error_propagate(errp, local_err);
-return;
+
+QLIST_FOREACH(child, &bs->children, next) {
+bdrv_invalidate_cache(child->bs, &local_err);
+if (local_err) {
+bs->open_flags |= BDRV_O_INACTIVE;
+error_propagate(errp, local_err);
+return;
+}
 }
 
 ret = refresh_total_sectors(bs, bs->total_sectors);
-- 
1.8.3.1
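
The control flow of the patch above (handle the node itself, then recurse into every child, stop and mark the node inactive on the first error) can be modelled on a plain tree. This is only a sketch under simplifying assumptions: ToyNode and toy_invalidate() are invented for the example, and a fixed-size child array replaces the QLIST of BdrvChild entries.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

typedef struct ToyNode {
    struct ToyNode *children[TOY_MAX_CHILDREN];
    size_t n_children;
    int fail;         /* non-zero: this node's own invalidation fails */
    int invalidated;  /* set once the node itself has been handled */
    int inactive;     /* models setting BDRV_O_INACTIVE again on error */
} ToyNode;

/* Returns 0 on success, -1 on the first failure in the subtree. */
int toy_invalidate(ToyNode *node)
{
    assert(node != NULL && node->n_children <= TOY_MAX_CHILDREN);

    /* First the node's own callback ... */
    if (node->fail) {
        node->inactive = 1;
        return -1;
    }
    node->invalidated = 1;

    /* ... then every child, not just a single "file" child. */
    for (size_t i = 0; i < node->n_children; i++) {
        if (toy_invalidate(node->children[i]) < 0) {
            node->inactive = 1;   /* the error propagates to the parent */
            return -1;
        }
    }
    return 0;
}
```

With three children, as a quorum-like node would have, all of them are visited; with the old "only bs->file" logic two of them would have been skipped.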




Re: [Qemu-block] [PATCH v5 4/6] qemu-io: Allow unaligned access by default

2016-05-12 Thread Kevin Wolf
Am 08.05.2016 um 05:16 hat Eric Blake geschrieben:
> There's no reason to require the user to specify a flag just so
> they can pass in unaligned numbers.  Keep 'read -p' and 'write -p'
> as no-ops so that I don't have to hunt down and update all users
> of qemu-io, but otherwise make their behavior default as 'read' and
> 'write'.  Also fix 'write -z', 'readv', 'writev', 'writev',
> 'aio_read', 'aio_write', and 'aio_write -z'.  For now, 'read -b',
> 'write -b', and 'write -c' still require alignment (and 'multiwrite',
> but that's slated to die soon).
> 
> qemu-iotest 23 is updated to match, as the only test that was
> previously explicitly expecting an error on an unaligned request.
> 
> Signed-off-by: Eric Blake 

This breaks qemu-iotests 136 for raw. It's pretty obvious that this is a
test case problem (uses unaligned requests to test error accounting), so
I'm not dropping the patch, but please do send a follow-up.

Maybe negative length and offset work as a replacement.

Kevin



[Qemu-block] [PULL 32/69] block: Introduce byte-based aio read/write

2016-05-12 Thread Kevin Wolf
From: Eric Blake 

blk_aio_readv() and blk_aio_writev() are annoying in that they
can't access sub-sector granularity, and cannot pass flags.
Also, they require the caller to pass redundant information
about the size of the I/O (qiov->size in bytes must match
nb_sectors in sectors).

Add new blk_aio_preadv() and blk_aio_pwritev() functions to fix
the flaws. The next few patches will upgrade callers, then
finally delete the old interfaces.

Signed-off-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c  | 18 +-
 include/sysemu/block-backend.h |  8 +++-
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index f8f88a6..6ac76d0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1,7 +1,7 @@
 /*
  * QEMU Block backends
  *
- * Copyright (C) 2014 Red Hat, Inc.
+ * Copyright (C) 2014-2016 Red Hat, Inc.
  *
  * Authors:
  *  Markus Armbruster ,
@@ -998,6 +998,14 @@ BlockAIOCB *blk_aio_readv(BlockBackend *blk, int64_t sector_num,
 blk_aio_read_entry, 0, cb, opaque);
 }
 
+BlockAIOCB *blk_aio_preadv(BlockBackend *blk, int64_t offset,
+   QEMUIOVector *qiov, BdrvRequestFlags flags,
+   BlockCompletionFunc *cb, void *opaque)
+{
+return blk_aio_prwv(blk, offset, qiov->size, qiov,
+blk_aio_read_entry, flags, cb, opaque);
+}
+
 BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
QEMUIOVector *iov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque)
@@ -1011,6 +1019,14 @@ BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
 blk_aio_write_entry, 0, cb, opaque);
 }
 
+BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
+QEMUIOVector *qiov, BdrvRequestFlags flags,
+BlockCompletionFunc *cb, void *opaque)
+{
+return blk_aio_prwv(blk, offset, qiov->size, qiov,
+blk_aio_write_entry, flags, cb, opaque);
+}
+
 BlockAIOCB *blk_aio_flush(BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque)
 {
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 851376b..73df1a6 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -1,7 +1,7 @@
 /*
  * QEMU Block backends
  *
- * Copyright (C) 2014 Red Hat, Inc.
+ * Copyright (C) 2014-2016 Red Hat, Inc.
  *
  * Authors:
  *  Markus Armbruster ,
@@ -110,9 +110,15 @@ int64_t blk_nb_sectors(BlockBackend *blk);
 BlockAIOCB *blk_aio_readv(BlockBackend *blk, int64_t sector_num,
   QEMUIOVector *iov, int nb_sectors,
   BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_preadv(BlockBackend *blk, int64_t offset,
+   QEMUIOVector *qiov, BdrvRequestFlags flags,
+   BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
QEMUIOVector *iov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
+QEMUIOVector *qiov, BdrvRequestFlags flags,
+BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_flush(BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_discard(BlockBackend *blk,
-- 
1.8.3.1
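
To spell out the two flaws named in the commit message, here is a small standalone sketch. The Toy* names are invented for the example and 512-byte sectors are assumed. The sector interface makes the caller state the length twice and cannot describe sub-sector requests, while a byte-based request needs only an offset plus the vector size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SECTOR_BITS 9
#define TOY_SECTOR_SIZE (1 << TOY_SECTOR_BITS)

/* A byte-based request: the length comes from the vector alone. */
typedef struct ToyByteReq {
    int64_t offset;
    size_t bytes;
} ToyByteReq;

/* Sector interface: the length is passed both as the vector size and
 * as nb_sectors, and the two must agree. */
bool toy_sector_args_consistent(size_t qiov_size, int nb_sectors)
{
    return nb_sectors >= 0 &&
           qiov_size == (size_t)nb_sectors * TOY_SECTOR_SIZE;
}

/* Converting an old-style call into a byte-based request. */
ToyByteReq toy_from_sectors(int64_t sector_num, size_t qiov_size)
{
    ToyByteReq req;

    assert(sector_num >= 0);
    req.offset = sector_num * TOY_SECTOR_SIZE;
    req.bytes = qiov_size;
    return req;
}

/* Only the byte interface can express a request like this. */
bool toy_is_sub_sector(const ToyByteReq *req)
{
    return (req->offset % TOY_SECTOR_SIZE) != 0 ||
           (req->bytes % TOY_SECTOR_SIZE) != 0;
}
```

A request for 100 bytes at offset 700 is a perfectly valid ToyByteReq, but there is no (sector_num, nb_sectors) pair that means the same thing.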




[Qemu-block] [PULL 34/69] scsi-disk: Switch to byte-based aio block access

2016-05-12 Thread Kevin Wolf
From: Eric Blake 

Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

As part of the cleanup, scsi_init_iovec() no longer needs to return
a value; also reword a comment.

[ kwolf: Fix read accounting change ]

Signed-off-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 hw/scsi/scsi-disk.c | 41 -
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 1335392..ce89c98 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -108,7 +108,7 @@ static void scsi_check_condition(SCSIDiskReq *r, SCSISense sense)
 scsi_req_complete(&r->req, CHECK_CONDITION);
 }
 
-static uint32_t scsi_init_iovec(SCSIDiskReq *r, size_t size)
+static void scsi_init_iovec(SCSIDiskReq *r, size_t size)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
@@ -118,7 +118,6 @@ static uint32_t scsi_init_iovec(SCSIDiskReq *r, size_t size)
 }
 r->iov.iov_len = MIN(r->sector_count * 512, r->buflen);
 qemu_iovec_init_external(&r->qiov, &r->iov, 1);
-return r->qiov.size / 512;
 }
 
 static void scsi_disk_save_request(QEMUFile *f, SCSIRequest *req)
@@ -316,7 +315,6 @@ done:
 static void scsi_do_read(SCSIDiskReq *r, int ret)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-uint32_t n;
 
 assert (r->req.aiocb == NULL);
 
@@ -340,11 +338,12 @@ static void scsi_do_read(SCSIDiskReq *r, int ret)
 r->req.aiocb = dma_blk_read(s->qdev.conf.blk, r->req.sg, r->sector,
 scsi_dma_complete, r);
 } else {
-n = scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
+scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
 block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
- n * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-r->req.aiocb = blk_aio_readv(s->qdev.conf.blk, r->sector, &r->qiov, n,
- scsi_read_complete, r);
+ r->qiov.size, BLOCK_ACCT_READ);
+r->req.aiocb = blk_aio_preadv(s->qdev.conf.blk,
+  r->sector << BDRV_SECTOR_BITS, &r->qiov,
+  0, scsi_read_complete, r);
 }
 
 done:
@@ -504,7 +503,6 @@ static void scsi_write_data(SCSIRequest *req)
 {
 SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-uint32_t n;
 
 /* No data transfer may already be in progress */
 assert(r->req.aiocb == NULL);
@@ -544,11 +542,11 @@ static void scsi_write_data(SCSIRequest *req)
 r->req.aiocb = dma_blk_write(s->qdev.conf.blk, r->req.sg, r->sector,
  scsi_dma_complete, r);
 } else {
-n = r->qiov.size / 512;
 block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
- n * BDRV_SECTOR_SIZE, BLOCK_ACCT_WRITE);
-r->req.aiocb = blk_aio_writev(s->qdev.conf.blk, r->sector, &r->qiov, n,
-  scsi_write_complete, r);
+ r->qiov.size, BLOCK_ACCT_WRITE);
+r->req.aiocb = blk_aio_pwritev(s->qdev.conf.blk,
+   r->sector << BDRV_SECTOR_BITS, &r->qiov,
+   0, scsi_write_complete, r);
 }
 }
 
@@ -1730,13 +1728,13 @@ static void scsi_write_same_complete(void *opaque, int ret)
 if (data->iov.iov_len) {
 block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
  data->iov.iov_len, BLOCK_ACCT_WRITE);
-/* blk_aio_write doesn't like the qiov size being different from
- * nb_sectors, make sure they match.
- */
+/* Reinitialize qiov, to handle unaligned WRITE SAME request
+ * where final qiov may need smaller size */
 qemu_iovec_init_external(&data->qiov, &data->iov, 1);
-r->req.aiocb = blk_aio_writev(s->qdev.conf.blk, data->sector,
-  &data->qiov, data->iov.iov_len / 512,
-  scsi_write_same_complete, data);
+r->req.aiocb = blk_aio_pwritev(s->qdev.conf.blk,
+   data->sector << BDRV_SECTOR_BITS,
+   &data->qiov, 0,
+   scsi_write_same_complete, data);
 return;
 }
 
@@ -1803,9 +1801,10 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
 scsi_req_ref(&r->req);
 block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
  data->iov.iov_len, BLOCK_ACCT_WRITE);
-r->req.aiocb = blk_aio_writev(s->qdev.conf.blk, data->sector,
-  &data->qiov, data->iov.iov_len / 512,
-  scsi_write_same_complete, data);
+r->req.aiocb = 

[Qemu-block] [PULL 40/69] pflash: Switch to byte-based block access

2016-05-12 Thread Kevin Wolf
From: Eric Blake 

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 hw/block/pflash_cfi01.c | 12 ++--
 hw/block/pflash_cfi02.c | 12 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 106a775..3a1f85d 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -413,11 +413,11 @@ static void pflash_update(pflash_t *pfl, int offset,
 int offset_end;
 if (pfl->blk) {
 offset_end = offset + size;
-/* round to sectors */
-offset = offset >> 9;
-offset_end = (offset_end + 511) >> 9;
-blk_write(pfl->blk, offset, pfl->storage + (offset << 9),
-  offset_end - offset);
+/* widen to sector boundaries */
+offset = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
+offset_end = QEMU_ALIGN_UP(offset_end, BDRV_SECTOR_SIZE);
+blk_pwrite(pfl->blk, offset, pfl->storage + offset,
+   offset_end - offset, 0);
 }
 }
 
@@ -739,7 +739,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
 
 if (pfl->blk) {
 /* read the initial flash content */
-ret = blk_read(pfl->blk, 0, pfl->storage, total_len >> 9);
+ret = blk_pread(pfl->blk, 0, pfl->storage, total_len);
 
 if (ret < 0) {
 vmstate_unregister_ram(&pfl->mem, DEVICE(pfl));
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index b13172c..5f10610 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -253,11 +253,11 @@ static void pflash_update(pflash_t *pfl, int offset,
 int offset_end;
 if (pfl->blk) {
 offset_end = offset + size;
-/* round to sectors */
-offset = offset >> 9;
-offset_end = (offset_end + 511) >> 9;
-blk_write(pfl->blk, offset, pfl->storage + (offset << 9),
-  offset_end - offset);
+/* widen to sector boundaries */
+offset = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
+offset_end = QEMU_ALIGN_UP(offset_end, BDRV_SECTOR_SIZE);
+blk_pwrite(pfl->blk, offset, pfl->storage + offset,
+   offset_end - offset, 0);
 }
 }
 
@@ -622,7 +622,7 @@ static void pflash_cfi02_realize(DeviceState *dev, Error **errp)
 pfl->chip_len = chip_len;
 if (pfl->blk) {
 /* read the initial flash content */
-ret = blk_read(pfl->blk, 0, pfl->storage, chip_len >> 9);
+ret = blk_pread(pfl->blk, 0, pfl->storage, chip_len);
 if (ret < 0) {
 vmstate_unregister_ram(&pfl->orig_mem, DEVICE(pfl));
 error_setg(errp, "failed to read the initial flash content");
-- 
1.8.3.1
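
The "widen to sector boundaries" step in pflash_update() can be tried out in isolation. The sketch below is an assumption-laden stand-in rather than the QEMU code: TOY_ALIGN_DOWN()/TOY_ALIGN_UP() are local macros playing the role of QEMU_ALIGN_DOWN()/QEMU_ALIGN_UP(), and toy_widen() is a made-up helper that returns the byte range that would be written.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE 512
/* Local stand-ins for QEMU_ALIGN_DOWN()/QEMU_ALIGN_UP(). */
#define TOY_ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define TOY_ALIGN_UP(n, m)   TOY_ALIGN_DOWN((n) + (m) - 1, (m))

typedef struct ToyRange {
    int64_t start;  /* first byte to write, sector aligned */
    int64_t len;    /* number of bytes, a multiple of the sector size */
} ToyRange;

/* Widen [offset, offset + size) outwards to whole sectors. */
ToyRange toy_widen(int64_t offset, int64_t size)
{
    int64_t end = offset + size;
    ToyRange r;

    assert(offset >= 0 && size >= 0);
    r.start = TOY_ALIGN_DOWN(offset, TOY_SECTOR_SIZE);
    r.len = TOY_ALIGN_UP(end, TOY_SECTOR_SIZE) - r.start;
    return r;
}
```

Unlike the old ">> 9" code, the offset stays in bytes throughout, so it can be used directly both as the device offset and as the index into the storage buffer.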




[Qemu-block] [PULL 19/69] vmdk: Implement .bdrv_co_preadv() interface

2016-05-12 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
---
 block/vmdk.c | 98 
 1 file changed, 53 insertions(+), 45 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index f1e01f9..6c447ad 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1381,8 +1381,8 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
 }
 
 static int vmdk_read_extent(VmdkExtent *extent, int64_t cluster_offset,
-int64_t offset_in_cluster, uint8_t *buf,
-int nb_sectors)
+int64_t offset_in_cluster, QEMUIOVector *qiov,
+int bytes)
 {
 int ret;
 int cluster_bytes, buf_bytes;
@@ -1394,14 +1394,13 @@ static int vmdk_read_extent(VmdkExtent *extent, int64_t cluster_offset,
 
 
 if (!extent->compressed) {
-ret = bdrv_pread(extent->file->bs,
-  cluster_offset + offset_in_cluster,
-  buf, nb_sectors * 512);
-if (ret == nb_sectors * 512) {
-return 0;
-} else {
-return -EIO;
+ret = bdrv_co_preadv(extent->file->bs,
+ cluster_offset + offset_in_cluster, bytes,
+ qiov, 0);
+if (ret < 0) {
+return ret;
 }
+return 0;
 }
 cluster_bytes = extent->cluster_sectors * 512;
 /* Read two clusters in case GrainMarker + compressed data > one cluster */
@@ -1433,11 +1432,11 @@ static int vmdk_read_extent(VmdkExtent *extent, int64_t cluster_offset,
 
 }
 if (offset_in_cluster < 0 ||
-offset_in_cluster + nb_sectors * 512 > buf_len) {
+offset_in_cluster + bytes > buf_len) {
 ret = -EINVAL;
 goto out;
 }
-memcpy(buf, uncomp_buf + offset_in_cluster, nb_sectors * 512);
+qemu_iovec_from_buf(qiov, 0, uncomp_buf + offset_in_cluster, bytes);
 ret = 0;
 
  out:
@@ -1446,64 +1445,73 @@ static int vmdk_read_extent(VmdkExtent *extent, int64_t cluster_offset,
 return ret;
 }
 
-static int vmdk_read(BlockDriverState *bs, int64_t sector_num,
-uint8_t *buf, int nb_sectors)
+static int coroutine_fn
+vmdk_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+   QEMUIOVector *qiov, int flags)
 {
 BDRVVmdkState *s = bs->opaque;
 int ret;
-uint64_t n, index_in_cluster;
+uint64_t n_bytes, offset_in_cluster;
 VmdkExtent *extent = NULL;
+QEMUIOVector local_qiov;
 uint64_t cluster_offset;
+uint64_t bytes_done = 0;
 
-while (nb_sectors > 0) {
-extent = find_extent(s, sector_num, extent);
+qemu_iovec_init(&local_qiov, qiov->niov);
+qemu_co_mutex_lock(&s->lock);
+
+while (bytes > 0) {
+extent = find_extent(s, offset >> BDRV_SECTOR_BITS, extent);
 if (!extent) {
-return -EIO;
+ret = -EIO;
+goto fail;
 }
 ret = get_cluster_offset(bs, extent, NULL,
- sector_num << 9, false, &cluster_offset,
- 0, 0);
-index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
-n = extent->cluster_sectors - index_in_cluster;
-if (n > nb_sectors) {
-n = nb_sectors;
-}
+ offset, false, &cluster_offset, 0, 0);
+offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
+
+n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
+ - offset_in_cluster);
+
 if (ret != VMDK_OK) {
 /* if not allocated, try to read from parent image, if exist */
 if (bs->backing && ret != VMDK_ZEROED) {
 if (!vmdk_is_cid_valid(bs)) {
-return -EINVAL;
+ret = -EINVAL;
+goto fail;
 }
-ret = bdrv_read(bs->backing->bs, sector_num, buf, n);
+
+qemu_iovec_reset(&local_qiov);
+qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
+
+ret = bdrv_co_preadv(bs->backing->bs, offset, n_bytes,
+ &local_qiov, 0);
 if (ret < 0) {
-return ret;
+goto fail;
 }
 } else {
-memset(buf, 0, 512 * n);
+qemu_iovec_memset(qiov, bytes_done, 0, n_bytes);
 }
 } else {
-ret = vmdk_read_extent(extent,
-cluster_offset, index_in_cluster * 512,
-buf, n);
+qemu_iovec_reset(&local_qiov);
+qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
+
+ret = vmdk_read_extent(extent, cluster_offset, offset_in_cluster,
+   &local_qiov, n_bytes);
 if (ret) {
-  
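
The core of the new vmdk_co_preadv() loop is the way it cuts a byte range into pieces that never cross a cluster boundary (n_bytes = MIN(bytes, cluster size - offset_in_cluster)). A minimal standalone sketch of just that arithmetic follows; toy_split_by_cluster() is a hypothetical helper, and it assumes one fixed cluster size starting at offset 0 instead of per-extent lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Walks [offset, offset + bytes) cluster by cluster and returns how many
 * chunks were needed.  If first_chunk is non-NULL it receives the size of
 * the first chunk.  cluster_size is a hypothetical fixed cluster size. */
int toy_split_by_cluster(uint64_t offset, uint64_t bytes,
                         uint64_t cluster_size, uint64_t *first_chunk)
{
    int chunks = 0;

    assert(cluster_size > 0);
    while (bytes > 0) {
        uint64_t offset_in_cluster = offset % cluster_size;
        /* Never cross a cluster boundary within one chunk. */
        uint64_t n_bytes = TOY_MIN(bytes, cluster_size - offset_in_cluster);

        if (chunks == 0 && first_chunk) {
            *first_chunk = n_bytes;
        }
        /* A real driver would read n_bytes at this point. */
        offset += n_bytes;
        bytes -= n_bytes;
        chunks++;
    }
    return chunks;
}
```

With 64 KiB clusters, a 70000-byte read starting at offset 65000 is served as 536 bytes, then one full cluster, then the remaining 3928 bytes.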

[Qemu-block] [PULL 14/69] cloop: Implement .bdrv_co_preadv() interface

2016-05-12 Thread Kevin Wolf
This implements .bdrv_co_preadv() for the cloop block driver. While
updating the error paths, change -1 to a valid -errno code.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 block/cloop.c | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/block/cloop.c b/block/cloop.c
index a84f140..fc1ca3a 100644
--- a/block/cloop.c
+++ b/block/cloop.c
@@ -66,6 +66,7 @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
 int ret;
 
 bs->read_only = 1;
+bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
 
 /* read header */
 ret = bdrv_pread(bs->file->bs, 128, &s->block_size, 4);
@@ -229,33 +230,38 @@ static inline int cloop_read_block(BlockDriverState *bs, int block_num)
 return 0;
 }
 
-static int cloop_read(BlockDriverState *bs, int64_t sector_num,
-uint8_t *buf, int nb_sectors)
+static int coroutine_fn
+cloop_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+QEMUIOVector *qiov, int flags)
 {
 BDRVCloopState *s = bs->opaque;
-int i;
+uint64_t sector_num = offset >> BDRV_SECTOR_BITS;
+int nb_sectors = bytes >> BDRV_SECTOR_BITS;
+int ret, i;
+
+assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+
+qemu_co_mutex_lock(&s->lock);
 
 for (i = 0; i < nb_sectors; i++) {
+void *data;
 uint32_t sector_offset_in_block =
 ((sector_num + i) % s->sectors_per_block),
 block_num = (sector_num + i) / s->sectors_per_block;
 if (cloop_read_block(bs, block_num) != 0) {
-return -1;
+ret = -EIO;
+goto fail;
 }
-memcpy(buf + i * 512,
-s->uncompressed_block + sector_offset_in_block * 512, 512);
+
+data = s->uncompressed_block + sector_offset_in_block * 512;
+qemu_iovec_from_buf(qiov, i * 512, data, 512);
 }
-return 0;
-}
 
-static coroutine_fn int cloop_co_read(BlockDriverState *bs, int64_t sector_num,
-  uint8_t *buf, int nb_sectors)
-{
-int ret;
-BDRVCloopState *s = bs->opaque;
-qemu_co_mutex_lock(&s->lock);
-ret = cloop_read(bs, sector_num, buf, nb_sectors);
+ret = 0;
+fail:
 qemu_co_mutex_unlock(&s->lock);
+
 return ret;
 }
 
@@ -273,7 +279,7 @@ static BlockDriver bdrv_cloop = {
 .instance_size  = sizeof(BDRVCloopState),
 .bdrv_probe = cloop_probe,
 .bdrv_open  = cloop_open,
-.bdrv_read  = cloop_co_read,
+.bdrv_co_preadv = cloop_co_preadv,
 .bdrv_close = cloop_close,
 };
 
-- 
1.8.3.1
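
Because cloop sets request_alignment to a full sector, the new byte-based entry point can convert straight back to sectors and then locate each sector inside a compressed block. The mapping used in the loop can be sketched on its own. ToyBlockPos and toy_locate() are invented for the example, and sectors_per_block would come from the image header in the real driver.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTOR_BITS 9

typedef struct ToyBlockPos {
    uint32_t block_num;               /* which compressed block */
    uint32_t sector_offset_in_block;  /* sector index inside that block */
} ToyBlockPos;

/* Locate the i-th sector of a request that starts at byte 'offset'.
 * offset must be sector aligned, mirroring the assertion in the patch. */
ToyBlockPos toy_locate(uint64_t offset, uint32_t i, uint32_t sectors_per_block)
{
    uint64_t sector_num = offset >> TOY_SECTOR_BITS;
    ToyBlockPos pos;

    assert((offset & ((1u << TOY_SECTOR_BITS) - 1)) == 0);
    assert(sectors_per_block > 0);
    pos.block_num = (uint32_t)((sector_num + i) / sectors_per_block);
    pos.sector_offset_in_block =
        (uint32_t)((sector_num + i) % sectors_per_block);
    return pos;
}
```

With 128 sectors per block, a request starting at sector 130 begins at sector 2 of block 1, and its 127th sector (i = 126) is the first sector of block 2.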




[Qemu-block] [PULL 18/69] vmdk: Add vmdk_find_offset_in_cluster()

2016-05-12 Thread Kevin Wolf
This is a byte granularity version of vmdk_find_index_in_cluster().

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
---
 block/vmdk.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 45f9d3c..f1e01f9 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1259,15 +1259,26 @@ static VmdkExtent *find_extent(BDRVVmdkState *s,
 return NULL;
 }
 
+static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
+   int64_t offset)
+{
+uint64_t offset_in_cluster, extent_begin_offset, extent_relative_offset;
+uint64_t cluster_size = extent->cluster_sectors * BDRV_SECTOR_SIZE;
+
+extent_begin_offset =
+(extent->end_sector - extent->sectors) * BDRV_SECTOR_SIZE;
+extent_relative_offset = offset - extent_begin_offset;
+offset_in_cluster = extent_relative_offset % cluster_size;
+
+return offset_in_cluster;
+}
+
 static inline uint64_t vmdk_find_index_in_cluster(VmdkExtent *extent,
   int64_t sector_num)
 {
-uint64_t index_in_cluster, extent_begin_sector, extent_relative_sector_num;
-
-extent_begin_sector = extent->end_sector - extent->sectors;
-extent_relative_sector_num = sector_num - extent_begin_sector;
-index_in_cluster = extent_relative_sector_num % extent->cluster_sectors;
-return index_in_cluster;
+uint64_t offset;
+offset = vmdk_find_offset_in_cluster(extent, sector_num * BDRV_SECTOR_SIZE);
+return offset / BDRV_SECTOR_SIZE;
 }
 
 static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
-- 
1.8.3.1
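
A worked example may help when reviewing the arithmetic. The sketch below repeats the calculation on a cut-down struct (ToyExtent holds only the three fields involved and is not the real VmdkExtent), and keeps the sector-based helper as a thin wrapper in the same way the patch does.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE 512

/* Only the fields the calculation needs; not the real VmdkExtent. */
typedef struct ToyExtent {
    int64_t end_sector;       /* first sector past the extent */
    int64_t sectors;          /* extent length in sectors */
    int64_t cluster_sectors;  /* cluster size in sectors */
} ToyExtent;

/* Byte offset of 'offset' within its cluster, relative to the extent. */
uint64_t toy_offset_in_cluster(const ToyExtent *e, int64_t offset)
{
    uint64_t cluster_size = e->cluster_sectors * TOY_SECTOR_SIZE;
    uint64_t extent_begin = (e->end_sector - e->sectors) * TOY_SECTOR_SIZE;

    assert(cluster_size > 0);
    return (offset - extent_begin) % cluster_size;
}

/* The sector-based helper becomes a thin wrapper. */
uint64_t toy_index_in_cluster(const ToyExtent *e, int64_t sector_num)
{
    return toy_offset_in_cluster(e, sector_num * TOY_SECTOR_SIZE)
           / TOY_SECTOR_SIZE;
}
```

For an extent covering sectors 1024 to 2047 with 128-sector clusters, byte 700 of the extent's second cluster yields 700, and sector 1157 yields index 5.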




[Qemu-block] [PULL 31/69] block: Switch blk_*write_zeroes() to byte interface

2016-05-12 Thread Kevin Wolf
From: Eric Blake 

Sector-based blk_write() should die; convert the one-off
variant blk_write_zeroes() to use an offset/count interface
instead.  Likewise for blk_co_write_zeroes() and
blk_aio_write_zeroes().

Signed-off-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c  | 33 +++--
 block/parallels.c  |  3 ++-
 hw/scsi/scsi-disk.c|  4 ++--
 include/sysemu/block-backend.h | 12 ++--
 qemu-img.c |  3 ++-
 qemu-io-cmds.c |  6 ++
 6 files changed, 25 insertions(+), 36 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index e5a8a07..f8f88a6 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -814,11 +814,11 @@ int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
   blk_write_entry, 0);
 }
 
-int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
- int nb_sectors, BdrvRequestFlags flags)
+int blk_write_zeroes(BlockBackend *blk, int64_t offset,
+ int count, BdrvRequestFlags flags)
 {
-return blk_rw(blk, sector_num, NULL, nb_sectors, blk_write_entry,
-  flags | BDRV_REQ_ZERO_WRITE);
+return blk_prw(blk, offset, NULL, count, blk_write_entry,
+   flags | BDRV_REQ_ZERO_WRITE);
 }
 
 static void error_callback_bh(void *opaque)
@@ -930,18 +930,12 @@ static void blk_aio_write_entry(void *opaque)
 blk_aio_complete(acb);
 }
 
-BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
- int nb_sectors, BdrvRequestFlags flags,
+BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t offset,
+ int count, BdrvRequestFlags flags,
  BlockCompletionFunc *cb, void *opaque)
 {
-if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-return blk_abort_aio_request(blk, cb, opaque, -EINVAL);
-}
-
-return blk_aio_prwv(blk, sector_num << BDRV_SECTOR_BITS,
-nb_sectors << BDRV_SECTOR_BITS, NULL,
-blk_aio_write_entry, flags | BDRV_REQ_ZERO_WRITE,
-cb, opaque);
+return blk_aio_prwv(blk, offset, count, NULL, blk_aio_write_entry,
+flags | BDRV_REQ_ZERO_WRITE, cb, opaque);
 }
 
 int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count)
@@ -1444,15 +1438,10 @@ void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
 return qemu_aio_get(aiocb_info, blk_bs(blk), cb, opaque);
 }
 
-int coroutine_fn blk_co_write_zeroes(BlockBackend *blk, int64_t sector_num,
- int nb_sectors, BdrvRequestFlags flags)
+int coroutine_fn blk_co_write_zeroes(BlockBackend *blk, int64_t offset,
+ int count, BdrvRequestFlags flags)
 {
-if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-return -EINVAL;
-}
-
-return blk_co_pwritev(blk, sector_num << BDRV_SECTOR_BITS,
-  nb_sectors << BDRV_SECTOR_BITS, NULL,
+return blk_co_pwritev(blk, offset, count, NULL,
   flags | BDRV_REQ_ZERO_WRITE);
 }
 
diff --git a/block/parallels.c b/block/parallels.c
index 2d8bc87..cddbfc4 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -516,7 +516,8 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
 if (ret < 0) {
 goto exit;
 }
-ret = blk_write_zeroes(file, 1, bat_sectors - 1, 0);
+ret = blk_write_zeroes(file, BDRV_SECTOR_SIZE,
+   (bat_sectors - 1) << BDRV_SECTOR_BITS, 0);
 if (ret < 0) {
 goto exit;
 }
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index c3ce54a..1335392 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1781,8 +1781,8 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
  nb_sectors * s->qdev.blocksize,
 BLOCK_ACCT_WRITE);
 r->req.aiocb = blk_aio_write_zeroes(s->qdev.conf.blk,
-r->req.cmd.lba * (s->qdev.blocksize / 512),
-nb_sectors * (s->qdev.blocksize / 512),
+r->req.cmd.lba * s->qdev.blocksize,
+nb_sectors * s->qdev.blocksize,
 flags, scsi_aio_complete, r);
 return;
 }
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 662a106..851376b 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -96,10 +96,10 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
   int count);
 int blk_write(BlockBackend *blk, int64_t sector_num, const 

[Qemu-block] [PULL 27/69] Allow users to specify the vmdk virtual hardware version.

2016-05-12 Thread Kevin Wolf
From: Janne Karhunen 

VMDK images carry metadata indicating the VMware virtual
hardware version the image was created/tested to run with.
Allow users to specify that version via a new 'hwversion'
option.

[ kwolf: Adjust qemu-iotests common.filter ]

Signed-off-by: Janne Karhunen 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
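The interaction between 'compat6' and 'hwversion' in vmdk_create() can
be summarized as a small decision function. This is only a sketch of
the logic in the diff, not code from the patch; resolve_hw_version() is
a hypothetical name, and "undefined" is the option's default value as
declared in vmdk_create_opts:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical standalone helper: pick the virtualHWVersion string
 * from the two creation options.  Returns NULL when compat6 is
 * combined with an explicit hwversion, the case the patch rejects
 * with -EINVAL.
 */
static const char *resolve_hw_version(const char *hwversion, int compat6)
{
    int is_default;

    assert(hwversion != NULL);
    is_default = strcmp(hwversion, "undefined") == 0;

    if (compat6) {
        return is_default ? "6" : NULL;
    }
    return is_default ? "4" : hwversion;
}
```

With neither option given the result is "4", with only compat6 it is
"6", and an explicit hwversion is passed through unchanged.
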
 block/vmdk.c | 27 +++
 include/block/block_int.h|  2 +-
 qemu-doc.texi|  3 +++
 tests/qemu-iotests/common.filter |  2 ++
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index f243527..61e20af 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1909,8 +1909,8 @@ static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
 int64_t total_size = 0, filesize;
 char *adapter_type = NULL;
 char *backing_file = NULL;
+char *hw_version = NULL;
 char *fmt = NULL;
-int flags = 0;
 int ret = 0;
 bool flat, split, compress;
 GString *ext_desc_lines;
@@ -1941,7 +1941,7 @@ static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
 "# The Disk Data Base\n"
 "#DDB\n"
 "\n"
-"ddb.virtualHWVersion = \"%d\"\n"
+"ddb.virtualHWVersion = \"%s\"\n"
 "ddb.geometry.cylinders = \"%" PRId64 "\"\n"
 "ddb.geometry.heads = \"%" PRIu32 "\"\n"
 "ddb.geometry.sectors = \"63\"\n"
@@ -1958,8 +1958,20 @@ static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
   BDRV_SECTOR_SIZE);
 adapter_type = qemu_opt_get_del(opts, BLOCK_OPT_ADAPTER_TYPE);
 backing_file = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
+hw_version = qemu_opt_get_del(opts, BLOCK_OPT_HWVERSION);
 if (qemu_opt_get_bool_del(opts, BLOCK_OPT_COMPAT6, false)) {
-flags |= BLOCK_FLAG_COMPAT6;
+if (strcmp(hw_version, "undefined")) {
+error_setg(errp,
+   "compat6 cannot be enabled with hwversion set");
+ret = -EINVAL;
+goto exit;
+}
+g_free(hw_version);
+hw_version = g_strdup("6");
+}
+if (strcmp(hw_version, "undefined") == 0) {
+g_free(hw_version);
+hw_version = g_strdup("4");
 }
 fmt = qemu_opt_get_del(opts, BLOCK_OPT_SUBFMT);
 if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ZEROED_GRAIN, false)) {
@@ -2081,7 +2093,7 @@ static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
fmt,
parent_desc_line,
ext_desc_lines->str,
-   (flags & BLOCK_FLAG_COMPAT6 ? 6 : 4),
+   hw_version,
total_size /
(int64_t)(63 * number_heads * BDRV_SECTOR_SIZE),
number_heads,
@@ -2127,6 +2139,7 @@ exit:
 }
 g_free(adapter_type);
 g_free(backing_file);
+g_free(hw_version);
 g_free(fmt);
 g_free(desc);
 g_free(path);
@@ -2378,6 +2391,12 @@ static QemuOptsList vmdk_create_opts = {
 .def_value_str = "off"
 },
 {
+.name = BLOCK_OPT_HWVERSION,
+.type = QEMU_OPT_STRING,
+.help = "VMDK hardware version",
+.def_value_str = "undefined"
+},
+{
 .name = BLOCK_OPT_SUBFMT,
 .type = QEMU_OPT_STRING,
 .help =
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c512074..6fbe648 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -38,12 +38,12 @@
 #include "qemu/throttle.h"
 
 #define BLOCK_FLAG_ENCRYPT  1
-#define BLOCK_FLAG_COMPAT6  4
 #define BLOCK_FLAG_LAZY_REFCOUNTS   8
 
 #define BLOCK_OPT_SIZE  "size"
 #define BLOCK_OPT_ENCRYPT   "encryption"
 #define BLOCK_OPT_COMPAT6   "compat6"
+#define BLOCK_OPT_HWVERSION "hwversion"
 #define BLOCK_OPT_BACKING_FILE  "backing_file"
 #define BLOCK_OPT_BACKING_FMT   "backing_fmt"
 #define BLOCK_OPT_CLUSTER_SIZE  "cluster_size"
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 79141d3..f37fd31 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -693,6 +693,9 @@ Supported options:
 File name of a base image (see @option{create} subcommand).
 @item compat6
 Create a VMDK version 6 image (instead of version 4)
+@item hwversion
+Specify vmdk virtual hardware version. Compat6 flag cannot be enabled
+if hwversion is specified.
 @item subformat
 Specifies which VMDK subformat to use. Valid options are
 @code{monolithicSparse} (default),
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 8a6e1b5..72f77fa 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -92,6 +92,7 @@ _filter_img_create()
  

[Qemu-block] [PULL 22/69] vpc: Implement .bdrv_co_pwritev() interface

2016-05-12 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
---
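The write loop in this patch walks the request in byte units and never
lets a single chunk cross a block boundary. A minimal sketch of just
that splitting step, under the assumption of a fixed block size; both
function names are hypothetical and do not appear in block/vpc.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: length of the next chunk of a request that
 * starts at `offset` with `bytes` left, clipped to the end of the
 * block_size-sized block containing `offset`. */
static uint64_t next_chunk_len(uint64_t offset, uint64_t bytes,
                               uint64_t block_size)
{
    uint64_t to_block_end;

    assert(block_size > 0);
    to_block_end = block_size - (offset % block_size);
    return bytes < to_block_end ? bytes : to_block_end;
}

/* Hypothetical loop skeleton: count the chunks a request is split into. */
static unsigned count_chunks(uint64_t offset, uint64_t bytes,
                             uint64_t block_size)
{
    unsigned chunks = 0;

    while (bytes > 0) {
        uint64_t n = next_chunk_len(offset, bytes, block_size);

        offset += n;
        bytes -= n;
        chunks++;
    }
    return chunks;
}
```

In the driver, each chunk is additionally mapped to an image offset and
written through a local QEMUIOVector slice; the sketch leaves that out.
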
 block/vpc.c | 86 ++---
 1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 01f5f27..2da4126 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -518,7 +518,7 @@ static int rewrite_footer(BlockDriverState* bs)
  *
  * Returns the sectors' offset in the image file on success and < 0 on error
  */
-static int64_t alloc_block(BlockDriverState* bs, int64_t sector_num)
+static int64_t alloc_block(BlockDriverState* bs, int64_t offset)
 {
 BDRVVPCState *s = bs->opaque;
 int64_t bat_offset;
@@ -527,14 +527,13 @@ static int64_t alloc_block(BlockDriverState* bs, int64_t sector_num)
 uint8_t bitmap[s->bitmap_size];
 
 /* Check if sector_num is valid */
-if ((sector_num < 0) || (sector_num > bs->total_sectors))
-return -1;
+if ((offset < 0) || (offset > bs->total_sectors * BDRV_SECTOR_SIZE)) {
+return -EINVAL;
+}
 
 /* Write entry into in-memory BAT */
-index = (sector_num * 512) / s->block_size;
-if (s->pagetable[index] != 0xFFFFFFFF)
-return -1;
-
+index = offset / s->block_size;
+assert(s->pagetable[index] == 0xFFFFFFFF);
 s->pagetable[index] = s->free_data_block_offset / 512;
 
 /* Initialize the block's bitmap */
@@ -558,11 +557,11 @@ static int64_t alloc_block(BlockDriverState* bs, int64_t sector_num)
 if (ret < 0)
 goto fail;
 
-return get_sector_offset(bs, sector_num, 0);
+return get_image_offset(bs, offset, false);
 
 fail:
 s->free_data_block_offset -= (s->block_size + s->bitmap_size);
-return -1;
+return ret;
 }
 
 static int vpc_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
@@ -627,55 +626,56 @@ fail:
 return ret;
 }
 
-static int vpc_write(BlockDriverState *bs, int64_t sector_num,
-const uint8_t *buf, int nb_sectors)
+static int coroutine_fn
+vpc_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+   QEMUIOVector *qiov, int flags)
 {
 BDRVVPCState *s = bs->opaque;
-int64_t offset;
-int64_t sectors, sectors_per_block;
+int64_t image_offset;
+int64_t n_bytes;
+int64_t bytes_done = 0;
 int ret;
 VHDFooter *footer =  (VHDFooter *) s->footer_buf;
+QEMUIOVector local_qiov;
 
 if (be32_to_cpu(footer->type) == VHD_FIXED) {
-return bdrv_write(bs->file->bs, sector_num, buf, nb_sectors);
+return bdrv_co_pwritev(bs->file->bs, offset, bytes, qiov, 0);
 }
-while (nb_sectors > 0) {
-offset = get_sector_offset(bs, sector_num, 1);
 
-sectors_per_block = s->block_size >> BDRV_SECTOR_BITS;
-sectors = sectors_per_block - (sector_num % sectors_per_block);
-if (sectors > nb_sectors) {
-sectors = nb_sectors;
-}
+qemu_co_mutex_lock(&s->lock);
+qemu_iovec_init(&local_qiov, qiov->niov);
+
+while (bytes > 0) {
+image_offset = get_image_offset(bs, offset, true);
+n_bytes = MIN(bytes, s->block_size - (offset % s->block_size));
 
-if (offset == -1) {
-offset = alloc_block(bs, sector_num);
-if (offset < 0)
-return -1;
+if (image_offset == -1) {
+image_offset = alloc_block(bs, offset);
+if (image_offset < 0) {
+ret = image_offset;
+goto fail;
+}
 }
 
-ret = bdrv_pwrite(bs->file->bs, offset, buf,
-  sectors * BDRV_SECTOR_SIZE);
-if (ret != sectors * BDRV_SECTOR_SIZE) {
-return -1;
+qemu_iovec_reset(&local_qiov);
+qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
+
+ret = bdrv_co_pwritev(bs->file->bs, image_offset, n_bytes,
+  &local_qiov, 0);
+if (ret < 0) {
+goto fail;
 }
 
-nb_sectors -= sectors;
-sector_num += sectors;
-buf += sectors * BDRV_SECTOR_SIZE;
+bytes -= n_bytes;
+offset += n_bytes;
+bytes_done += n_bytes;
 }
 
-return 0;
-}
-
-static coroutine_fn int vpc_co_write(BlockDriverState *bs, int64_t sector_num,
- const uint8_t *buf, int nb_sectors)
-{
-int ret;
-BDRVVPCState *s = bs->opaque;
-qemu_co_mutex_lock(&s->lock);
-ret = vpc_write(bs, sector_num, buf, nb_sectors);
+ret = 0;
+fail:
+qemu_iovec_destroy(&local_qiov);
 qemu_co_mutex_unlock(&s->lock);
+
 return ret;
 }
 
@@ -1062,7 +1062,7 @@ static BlockDriver bdrv_vpc = {
 .bdrv_create= vpc_create,
 
 .bdrv_co_preadv = vpc_co_preadv,
-.bdrv_write = vpc_co_write,
+.bdrv_co_pwritev= vpc_co_pwritev,
 .bdrv_co_get_block_status   = vpc_co_get_block_status,
 
 .bdrv_get_info  = vpc_get_info,
-- 
1.8.3.1




[Qemu-block] [PULL 26/69] block: always compile-check debug prints

2016-05-12 Thread Kevin Wolf
From: Zhou Jie 

Files with conditional debug statements should ensure that the printf is
always compiled. This prevents bitrot of the format string of the debug
statement. Also switch debug output from stdout to stderr.

Signed-off-by: Zhou Jie 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
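The pattern applied to both files, reduced to a standalone sketch.
DEBUG_DEMO and demo_read() are hypothetical names used only for
illustration (the patch itself uses DEBUG_CURL and DEBUG_SDOG), and
`## __VA_ARGS__` is the same GNU extension the original macros rely on:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical switch name; define DEBUG_DEMO to enable output. */
#ifdef DEBUG_DEMO
#define DEBUG_DEMO_PRINT 1
#else
#define DEBUG_DEMO_PRINT 0
#endif

/* The fprintf() call is always parsed and type-checked.  When the
 * condition is the constant 0, the compiler can discard the branch. */
#define DPRINTF(fmt, ...)                         \
    do {                                          \
        if (DEBUG_DEMO_PRINT) {                   \
            fprintf(stderr, fmt, ## __VA_ARGS__); \
        }                                         \
    } while (0)

/* Hypothetical caller: a mismatched format specifier here is still
 * seen by the compiler when DEBUG_DEMO is not defined. */
static int demo_read(int len)
{
    DPRINTF("reading %d bytes\n", len);
    return len;
}
```

With the old `#define DPRINTF(fmt, ...) do { } while (0)` form, the
arguments vanish during preprocessing, so a stale format string goes
unnoticed until someone turns debugging on.
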
 block/curl.c | 10 --
 block/sheepdog.c | 13 -
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 5a8f8b6..da9f5e8 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -36,10 +36,16 @@
 // #define DEBUG_VERBOSE
 
 #ifdef DEBUG_CURL
-#define DPRINTF(fmt, ...) do { printf(fmt, ## __VA_ARGS__); } while (0)
+#define DEBUG_CURL_PRINT 1
 #else
-#define DPRINTF(fmt, ...) do { } while (0)
+#define DEBUG_CURL_PRINT 0
 #endif
+#define DPRINTF(fmt, ...)\
+do { \
+if (DEBUG_CURL_PRINT) {  \
+fprintf(stderr, fmt, ## __VA_ARGS__);\
+}\
+} while (0)
 
 #if LIBCURL_VERSION_NUM >= 0x071000
 /* The multi interface timer callback was introduced in 7.16.0 */
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 33e0a33..9023686 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -294,13 +294,16 @@ static inline size_t count_data_objs(const struct SheepdogInode *inode)
 
 #undef DPRINTF
 #ifdef DEBUG_SDOG
-#define DPRINTF(fmt, args...)   \
-do {\
-fprintf(stdout, "%s %d: " fmt, __func__, __LINE__, ##args); \
-} while (0)
+#define DEBUG_SDOG_PRINT 1
 #else
-#define DPRINTF(fmt, args...)
+#define DEBUG_SDOG_PRINT 0
 #endif
+#define DPRINTF(fmt, args...)   \
+do {\
+if (DEBUG_SDOG_PRINT) { \
+fprintf(stderr, "%s %d: " fmt, __func__, __LINE__, ##args); \
+}   \
+} while (0)
 
 typedef struct SheepdogAIOCB SheepdogAIOCB;
 
-- 
1.8.3.1




[Qemu-block] [PULL 17/69] vdi: Implement .bdrv_co_pwritev() interface

2016-05-12 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
---
 block/vdi.c | 72 +++--
 1 file changed, 41 insertions(+), 31 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 8295511..e5fe4e8 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -611,53 +611,55 @@ vdi_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 return ret;
 }
 
-static int vdi_co_write(BlockDriverState *bs,
-int64_t sector_num, const uint8_t *buf, int nb_sectors)
+static int coroutine_fn
+vdi_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+   QEMUIOVector *qiov, int flags)
 {
 BDRVVdiState *s = bs->opaque;
+QEMUIOVector local_qiov;
 uint32_t bmap_entry;
 uint32_t block_index;
-uint32_t sector_in_block;
-uint32_t n_sectors;
+uint32_t offset_in_block;
+uint32_t n_bytes;
 uint32_t bmap_first = VDI_UNALLOCATED;
 uint32_t bmap_last = VDI_UNALLOCATED;
 uint8_t *block = NULL;
+uint64_t bytes_done = 0;
 int ret = 0;
 
 logout("\n");
 
-while (ret >= 0 && nb_sectors > 0) {
-block_index = sector_num / s->block_sectors;
-sector_in_block = sector_num % s->block_sectors;
-n_sectors = s->block_sectors - sector_in_block;
-if (n_sectors > nb_sectors) {
-n_sectors = nb_sectors;
-}
+qemu_iovec_init(&local_qiov, qiov->niov);
 
-logout("will write %u sectors starting at sector %" PRIu64 "\n",
-   n_sectors, sector_num);
+while (ret >= 0 && bytes > 0) {
+block_index = offset / s->block_size;
+offset_in_block = offset % s->block_size;
+n_bytes = MIN(bytes, s->block_size - offset_in_block);
+
+logout("will write %u bytes starting at offset %" PRIu64 "\n",
+   n_bytes, offset);
 
 /* prepare next AIO request */
 bmap_entry = le32_to_cpu(s->bmap[block_index]);
 if (!VDI_IS_ALLOCATED(bmap_entry)) {
 /* Allocate new block and write to it. */
-uint64_t offset;
+uint64_t data_offset;
 bmap_entry = s->header.blocks_allocated;
 s->bmap[block_index] = cpu_to_le32(bmap_entry);
 s->header.blocks_allocated++;
-offset = s->header.offset_data / SECTOR_SIZE +
- (uint64_t)bmap_entry * s->block_sectors;
+data_offset = s->header.offset_data +
+  (uint64_t)bmap_entry * s->block_size;
 if (block == NULL) {
 block = g_malloc(s->block_size);
 bmap_first = block_index;
 }
 bmap_last = block_index;
 /* Copy data to be written to new block and zero unused parts. */
-memset(block, 0, sector_in_block * SECTOR_SIZE);
-memcpy(block + sector_in_block * SECTOR_SIZE,
-   buf, n_sectors * SECTOR_SIZE);
-memset(block + (sector_in_block + n_sectors) * SECTOR_SIZE, 0,
-   (s->block_sectors - n_sectors - sector_in_block) * SECTOR_SIZE);
+memset(block, 0, offset_in_block);
+qemu_iovec_to_buf(qiov, bytes_done, block + offset_in_block,
+  n_bytes);
+memset(block + offset_in_block + n_bytes, 0,
+   s->block_size - n_bytes - offset_in_block);
 
 /* Note that this coroutine does not yield anywhere from reading the
  * bmap entry until here, so in regards to all the coroutines trying
@@ -667,12 +669,12 @@ static int vdi_co_write(BlockDriverState *bs,
  * acquire the lock and thus the padded cluster is written before
  * the other coroutines can write to the affected area. */
 qemu_co_mutex_lock(&s->write_lock);
-ret = bdrv_write(bs->file->bs, offset, block, s->block_sectors);
+ret = bdrv_pwrite(bs->file->bs, data_offset, block, s->block_size);
 qemu_co_mutex_unlock(&s->write_lock);
 } else {
-uint64_t offset = s->header.offset_data / SECTOR_SIZE +
-  (uint64_t)bmap_entry * s->block_sectors +
-  sector_in_block;
+uint64_t data_offset = s->header.offset_data +
+   (uint64_t)bmap_entry * s->block_size +
+   offset_in_block;
 qemu_co_mutex_lock(&s->write_lock);
 /* This lock is only used to make sure the following write operation
  * is executed after the write issued by the coroutine allocating
@@ -683,16 +685,23 @@ static int vdi_co_write(BlockDriverState *bs,
  * that that write operation has returned (there may be other writes
  * in flight, but they do not concern this very operation). */
 qemu_co_mutex_unlock(&s->write_lock);
-ret = bdrv_write(bs->file->bs, offset, buf, 

[Qemu-block] [PULL 16/69] vdi: Implement .bdrv_co_preadv() interface

2016-05-12 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
---
 block/vdi.c | 55 ---
 1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 75d4819..8295511 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -557,48 +557,57 @@ static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs,
 return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
 }
 
-static int vdi_co_read(BlockDriverState *bs,
-int64_t sector_num, uint8_t *buf, int nb_sectors)
+static int coroutine_fn
+vdi_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+  QEMUIOVector *qiov, int flags)
 {
 BDRVVdiState *s = bs->opaque;
+QEMUIOVector local_qiov;
 uint32_t bmap_entry;
 uint32_t block_index;
-uint32_t sector_in_block;
-uint32_t n_sectors;
+uint32_t offset_in_block;
+uint32_t n_bytes;
+uint64_t bytes_done = 0;
 int ret = 0;
 
 logout("\n");
 
-while (ret >= 0 && nb_sectors > 0) {
-block_index = sector_num / s->block_sectors;
-sector_in_block = sector_num % s->block_sectors;
-n_sectors = s->block_sectors - sector_in_block;
-if (n_sectors > nb_sectors) {
-n_sectors = nb_sectors;
-}
+qemu_iovec_init(&local_qiov, qiov->niov);
 
-logout("will read %u sectors starting at sector %" PRIu64 "\n",
-   n_sectors, sector_num);
+while (ret >= 0 && bytes > 0) {
+block_index = offset / s->block_size;
+offset_in_block = offset % s->block_size;
+n_bytes = MIN(bytes, s->block_size - offset_in_block);
+
+logout("will read %u bytes starting at offset %" PRIu64 "\n",
+   n_bytes, offset);
 
 /* prepare next AIO request */
 bmap_entry = le32_to_cpu(s->bmap[block_index]);
 if (!VDI_IS_ALLOCATED(bmap_entry)) {
 /* Block not allocated, return zeros, no need to wait. */
-memset(buf, 0, n_sectors * SECTOR_SIZE);
+qemu_iovec_memset(qiov, bytes_done, 0, n_bytes);
 ret = 0;
 } else {
-uint64_t offset = s->header.offset_data / SECTOR_SIZE +
-  (uint64_t)bmap_entry * s->block_sectors +
-  sector_in_block;
-ret = bdrv_read(bs->file->bs, offset, buf, n_sectors);
+uint64_t data_offset = s->header.offset_data +
+   (uint64_t)bmap_entry * s->block_size +
+   offset_in_block;
+
+qemu_iovec_reset(&local_qiov);
+qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
+
+ret = bdrv_co_preadv(bs->file->bs, data_offset, n_bytes,
+ &local_qiov, 0);
 }
-logout("%u sectors read\n", n_sectors);
+logout("%u bytes read\n", n_bytes);
 
-nb_sectors -= n_sectors;
-sector_num += n_sectors;
-buf += n_sectors * SECTOR_SIZE;
+bytes -= n_bytes;
+offset += n_bytes;
+bytes_done += n_bytes;
 }
 
+qemu_iovec_destroy(&local_qiov);
+
 return ret;
 }
 
@@ -903,7 +912,7 @@ static BlockDriver bdrv_vdi = {
 .bdrv_co_get_block_status = vdi_co_get_block_status,
 .bdrv_make_empty = vdi_make_empty,
 
-.bdrv_read = vdi_co_read,
+.bdrv_co_preadv = vdi_co_preadv,
 #if defined(CONFIG_VDI_WRITE)
 .bdrv_write = vdi_co_write,
 #endif
-- 
1.8.3.1




[Qemu-block] [PULL 11/69] block: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev

2016-05-12 Thread Kevin Wolf
It used to be an internal helper function just for implementing
bdrv_co_do_readv/writev(), but now that it's a public interface, it
deserves a name without "do" in it.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 block/block-backend.c |  4 ++--
 block/io.c| 24 
 block/raw_bsd.c   |  4 ++--
 hw/ide/macio.c|  4 ++--
 include/block/block_int.h |  4 ++--
 5 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 9538e79..a7623e8 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -692,7 +692,7 @@ static int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
 return ret;
 }
 
-return bdrv_co_do_preadv(blk_bs(blk), offset, bytes, qiov, flags);
+return bdrv_co_preadv(blk_bs(blk), offset, bytes, qiov, flags);
 }
 
 static int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
@@ -710,7 +710,7 @@ static int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
 flags |= BDRV_REQ_FUA;
 }
 
-return bdrv_co_do_pwritev(blk_bs(blk), offset, bytes, qiov, flags);
+return bdrv_co_pwritev(blk_bs(blk), offset, bytes, qiov, flags);
 }
 
 typedef struct BlkRwCo {
diff --git a/block/io.c b/block/io.c
index fbde5e0..feddb71 100644
--- a/block/io.c
+++ b/block/io.c
@@ -577,13 +577,13 @@ static void coroutine_fn bdrv_rw_co_entry(void *opaque)
 RwCo *rwco = opaque;
 
 if (!rwco->is_write) {
-rwco->ret = bdrv_co_do_preadv(rwco->bs, rwco->offset,
-  rwco->qiov->size, rwco->qiov,
-  rwco->flags);
+rwco->ret = bdrv_co_preadv(rwco->bs, rwco->offset,
+   rwco->qiov->size, rwco->qiov,
+   rwco->flags);
 } else {
-rwco->ret = bdrv_co_do_pwritev(rwco->bs, rwco->offset,
-   rwco->qiov->size, rwco->qiov,
-   rwco->flags);
+rwco->ret = bdrv_co_pwritev(rwco->bs, rwco->offset,
+rwco->qiov->size, rwco->qiov,
+rwco->flags);
 }
 }
 
@@ -1042,7 +1042,7 @@ out:
 /*
  * Handle a read request in coroutine context
  */
-int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
+int coroutine_fn bdrv_co_preadv(BlockDriverState *bs,
 int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags)
 {
@@ -1124,8 +1124,8 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
 return -EINVAL;
 }
 
-return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
- nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
+return bdrv_co_preadv(bs, sector_num << BDRV_SECTOR_BITS,
+  nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
 }
 
 int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
@@ -1385,7 +1385,7 @@ fail:
 /*
  * Handle a write request in coroutine context
  */
-int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
+int coroutine_fn bdrv_co_pwritev(BlockDriverState *bs,
 int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
 BdrvRequestFlags flags)
 {
@@ -1520,8 +1520,8 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
 return -EINVAL;
 }
 
-return bdrv_co_do_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
-  nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
+return bdrv_co_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
+   nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
 }
 
 int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 9c9d39b..5e65fb0 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -105,8 +105,8 @@ raw_co_writev_flags(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-ret = bdrv_co_do_pwritev(bs->file->bs, sector_num * BDRV_SECTOR_SIZE,
- nb_sectors * BDRV_SECTOR_SIZE, qiov, flags);
+ret = bdrv_co_pwritev(bs->file->bs, sector_num * BDRV_SECTOR_SIZE,
+  nb_sectors * BDRV_SECTOR_SIZE, qiov, flags);
 
 fail:
 if (qiov == &local_qiov) {
diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index 76256eb..ae29b6f 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -55,8 +55,8 @@ static const int debug_macio = 0;
 /*
  * Unaligned DMA read/write access functions required for OS X/Darwin which
  * don't perform DMA transactions on sector boundaries. These functions are
- * modelled on bdrv_co_do_preadv()/bdrv_co_do_pwritev() and so should be
- * easy to remove if the unaligned block APIs are ever exposed.
+ * modelled on 

[Qemu-block] [PULL 10/69] block: Support AIO drivers in bdrv_driver_preadv/pwritev()

2016-05-12 Thread Kevin Wolf
Instead of registering emulation functions as .bdrv_co_writev, just
directly check whether the function is there or not, and use the AIO
interface if it isn't. This makes the read/write functions more
consistent with how things are done in other places (flush, discard,
etc.)

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
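The idea of checking for the callback at call time, rather than
installing an emulation function at registration time, can be sketched
without any QEMU types. Everything below is hypothetical and heavily
simplified: the real BlockDriver callbacks take different arguments,
and the real AIO fallback yields the coroutine until the completion
callback re-enters it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified driver table. */
typedef struct DemoDriver {
    int (*co_readv)(int sector);    /* preferred interface, may be NULL */
    int (*aio_readv)(int sector);   /* fallback interface */
} DemoDriver;

/* Hypothetical callbacks that return distinguishable values. */
static int demo_co_readv(int sector)  { return sector + 100; }
static int demo_aio_readv(int sector) { return sector + 200; }

/* Dispatch on whichever callback the driver actually provides. */
static int demo_driver_read(const DemoDriver *drv, int sector)
{
    if (drv->co_readv) {
        return drv->co_readv(sector);
    }
    assert(drv->aio_readv != NULL);
    return drv->aio_readv(sector);
}
```

A driver that only fills in aio_readv is served by the fallback branch,
which is the role the new else-branches in bdrv_driver_preadv() and
bdrv_driver_pwritev() play.
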
 block/io.c   | 126 ---
 trace-events |   1 -
 2 files changed, 52 insertions(+), 75 deletions(-)

diff --git a/block/io.c b/block/io.c
index 53b4f2c..fbde5e0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -40,12 +40,6 @@ static BlockAIOCB *bdrv_aio_readv_em(BlockDriverState *bs,
 static BlockAIOCB *bdrv_aio_writev_em(BlockDriverState *bs,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockCompletionFunc *cb, void *opaque);
-static int coroutine_fn bdrv_co_readv_em(BlockDriverState *bs,
- int64_t sector_num, int nb_sectors,
- QEMUIOVector *iov);
-static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
- int64_t sector_num, int nb_sectors,
- QEMUIOVector *iov);
 static BlockAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
  int64_t sector_num,
  QEMUIOVector *qiov,
@@ -112,19 +106,13 @@ void bdrv_io_limits_update_group(BlockDriverState *bs, const char *group)
 
 void bdrv_setup_io_funcs(BlockDriver *bdrv)
 {
-/* Block drivers without coroutine functions need emulation */
-if (!bdrv->bdrv_co_readv) {
-bdrv->bdrv_co_readv = bdrv_co_readv_em;
-bdrv->bdrv_co_writev = bdrv_co_writev_em;
-
-/* bdrv_co_readv_em()/brdv_co_writev_em() work in terms of aio, so if
- * the block driver lacks aio we need to emulate that too.
- */
-if (!bdrv->bdrv_aio_readv) {
-/* add AIO emulation layer */
-bdrv->bdrv_aio_readv = bdrv_aio_readv_em;
-bdrv->bdrv_aio_writev = bdrv_aio_writev_em;
-}
+/* bdrv_co_readv_em()/brdv_co_writev_em() work in terms of aio, so if
+ * the block driver lacks aio we need to emulate that.
+ */
+if (!bdrv->bdrv_aio_readv) {
+/* add AIO emulation layer */
+bdrv->bdrv_aio_readv = bdrv_aio_readv_em;
+bdrv->bdrv_aio_writev = bdrv_aio_writev_em;
 }
 }
 
@@ -797,6 +785,19 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
 return 0;
 }
 
+typedef struct CoroutineIOCompletion {
+Coroutine *coroutine;
+int ret;
+} CoroutineIOCompletion;
+
+static void bdrv_co_io_em_complete(void *opaque, int ret)
+{
+CoroutineIOCompletion *co = opaque;
+
+co->ret = ret;
+qemu_coroutine_enter(co->coroutine, NULL);
+}
+
 static int coroutine_fn bdrv_driver_preadv(BlockDriverState *bs,
uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
@@ -809,7 +810,23 @@ static int coroutine_fn bdrv_driver_preadv(BlockDriverState *bs,
 assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 assert((bytes >> BDRV_SECTOR_BITS) <= BDRV_REQUEST_MAX_SECTORS);
 
-return drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+if (drv->bdrv_co_readv) {
+return drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+} else {
+BlockAIOCB *acb;
+CoroutineIOCompletion co = {
+.coroutine = qemu_coroutine_self(),
+};
+
+acb = bs->drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
+  bdrv_co_io_em_complete, &co);
+if (acb == NULL) {
+return -EIO;
+} else {
+qemu_coroutine_yield();
+return co.ret;
+}
+}
 }
 
 static int coroutine_fn bdrv_driver_pwritev(BlockDriverState *bs,
@@ -828,9 +845,23 @@ static int coroutine_fn bdrv_driver_pwritev(BlockDriverState *bs,
 if (drv->bdrv_co_writev_flags) {
 ret = drv->bdrv_co_writev_flags(bs, sector_num, nb_sectors, qiov,
 flags);
-} else {
+} else if (drv->bdrv_co_writev) {
 assert(drv->supported_write_flags == 0);
 ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
+} else {
+BlockAIOCB *acb;
+CoroutineIOCompletion co = {
+.coroutine = qemu_coroutine_self(),
+};
+
+acb = bs->drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
+   bdrv_co_io_em_complete, &co);
+if (acb == NULL) {
+return -EIO;
+} else {
+qemu_coroutine_yield();
+return co.ret;
+}
 }
 
 if (ret == 0 && (flags & BDRV_REQ_FUA) &&
@@ -2348,59 +2379,6 @@ void 

[Qemu-block] [PULL 13/69] bochs: Implement .bdrv_co_preadv() interface

2016-05-12 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 block/bochs.c | 51 +--
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/block/bochs.c b/block/bochs.c
index af8b7ab..f0e18c0 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -104,6 +104,7 @@ static int bochs_open(BlockDriverState *bs, QDict *options, int flags,
 int ret;
 
 bs->read_only = 1; // no write support yet
+bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
 
 ret = bdrv_pread(bs->file->bs, 0, &bochs, sizeof(bochs));
 if (ret < 0) {
@@ -221,38 +222,52 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
 return bitmap_offset + (512 * (s->bitmap_blocks + extent_offset));
 }
 
-static int bochs_read(BlockDriverState *bs, int64_t sector_num,
-uint8_t *buf, int nb_sectors)
+static int coroutine_fn
+bochs_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+QEMUIOVector *qiov, int flags)
 {
+BDRVBochsState *s = bs->opaque;
+uint64_t sector_num = offset >> BDRV_SECTOR_BITS;
+int nb_sectors = bytes >> BDRV_SECTOR_BITS;
+uint64_t bytes_done = 0;
+QEMUIOVector local_qiov;
 int ret;
 
+assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+
+qemu_iovec_init(&local_qiov, qiov->niov);
+qemu_co_mutex_lock(&s->lock);
+
 while (nb_sectors > 0) {
 int64_t block_offset = seek_to_sector(bs, sector_num);
 if (block_offset < 0) {
-return block_offset;
-} else if (block_offset > 0) {
-ret = bdrv_pread(bs->file->bs, block_offset, buf, 512);
+ret = block_offset;
+goto fail;
+}
+
+qemu_iovec_reset(&local_qiov);
+qemu_iovec_concat(&local_qiov, qiov, bytes_done, 512);
+
+if (block_offset > 0) {
+ret = bdrv_co_preadv(bs->file->bs, block_offset, 512,
+ &local_qiov, 0);
 if (ret < 0) {
-return ret;
+goto fail;
 }
 } else {
-memset(buf, 0, 512);
+qemu_iovec_memset(&local_qiov, 0, 0, 512);
 }
 nb_sectors--;
 sector_num++;
-buf += 512;
+bytes_done += 512;
 }
-return 0;
-}
 
-static coroutine_fn int bochs_co_read(BlockDriverState *bs, int64_t sector_num,
-  uint8_t *buf, int nb_sectors)
-{
-int ret;
-BDRVBochsState *s = bs->opaque;
-qemu_co_mutex_lock(&s->lock);
-ret = bochs_read(bs, sector_num, buf, nb_sectors);
+ret = 0;
+fail:
 qemu_co_mutex_unlock(&s->lock);
+qemu_iovec_destroy(&local_qiov);
+
 return ret;
 }
 
@@ -267,7 +282,7 @@ static BlockDriver bdrv_bochs = {
 .instance_size = sizeof(BDRVBochsState),
 .bdrv_probe= bochs_probe,
 .bdrv_open = bochs_open,
-.bdrv_read  = bochs_co_read,
+.bdrv_co_preadv = bochs_co_preadv,
 .bdrv_close= bochs_close,
 };
 
-- 
1.8.3.1




[Qemu-block] [PULL 01/69] block: Don't disable I/O throttling on sync requests

2016-05-12 Thread Kevin Wolf
We had to disable I/O throttling with synchronous requests because we
didn't use to run timers in nested event loops when the code was
introduced. This isn't true any more, and throttling works just fine
even when using the synchronous API.

The removed code is in fact dead code since commit a8823a3b ('block: Use
blk_co_pwritev() for blk_write()') because I/O throttling can only be
set on the top layer, but BlockBackend always uses the coroutine
interface now instead of using the sync API emulation in block.c.

Signed-off-by: Kevin Wolf 
Message-Id: <1458660792-3035-2-git-send-email-kw...@redhat.com>
Signed-off-by: Paolo Bonzini 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block/io.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/block/io.c b/block/io.c
index a7dbf85..a91d862 100644
--- a/block/io.c
+++ b/block/io.c
@@ -608,17 +608,6 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
 .flags = flags,
 };
 
-/**
- * In sync call context, when the vcpu is blocked, this throttling timer
- * will not fire; so the I/O throttling function has to be disabled here
- * if it has been enabled.
- */
-if (bs->io_limits_enabled) {
-fprintf(stderr, "Disabling I/O throttling on '%s' due "
-"to synchronous I/O.\n", bdrv_get_device_name(bs));
-bdrv_io_limits_disable(bs);
-}
-
 if (qemu_in_coroutine()) {
 /* Fast-path if already in coroutine context */
 bdrv_rw_co_entry(&rwco);
-- 
1.8.3.1




[Qemu-block] [PULL 06/69] block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end

2016-05-12 Thread Kevin Wolf
From: Paolo Bonzini 

Extract the handling of io_plug "depth" from linux-aio.c and let the
main bdrv_drain loop do nothing but wait on I/O.

Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug
now operate on all children.  The visit order is now symmetrical between
plug and unplug, making it possible for formats to implement plug/unplug.
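
[Annotation] The counter logic described above can be modelled in a few
lines. This is a minimal sketch, not code from the patch: the Node type, its
single child pointer, and the driver_plugged flag are hypothetical stand-ins
for BlockDriverState, its child list, and the driver callbacks. The re-plug at
the end of node_unplugged_end() is an assumption, because the quoted diff is
cut off at that point.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a block node; NOT the real BlockDriverState. */
typedef struct Node {
    struct Node *child;         /* one child for brevity; QEMU walks a list */
    unsigned io_plugged;        /* nesting depth of plug requests */
    unsigned io_plug_disabled;  /* nesting depth of "unplugged" sections */
    int driver_plugged;         /* what a driver callback would observe */
} Node;

static void node_plug(Node *n)
{
    /* Children first, then this node. */
    if (n->child) {
        node_plug(n->child);
    }
    if (n->io_plugged++ == 0 && n->io_plug_disabled == 0) {
        n->driver_plugged = 1;
    }
}

static void node_unplug(Node *n)
{
    /* Mirror image of node_plug(): this node first, then children. */
    assert(n->io_plugged);
    if (--n->io_plugged == 0 && n->io_plug_disabled == 0) {
        n->driver_plugged = 0;
    }
    if (n->child) {
        node_unplug(n->child);
    }
}

static void node_unplugged_begin(Node *n)
{
    /* The outermost begin forces an unplug if the node is plugged. */
    if (n->io_plug_disabled++ == 0 && n->io_plugged > 0) {
        n->driver_plugged = 0;
    }
    if (n->child) {
        node_unplugged_begin(n->child);
    }
}

static void node_unplugged_end(Node *n)
{
    assert(n->io_plug_disabled);
    if (n->child) {
        node_unplugged_end(n->child);
    }
    /* Assumed: the outermost end restores the plug if one is still held. */
    if (--n->io_plug_disabled == 0 && n->io_plugged > 0) {
        n->driver_plugged = 1;
    }
}
```

Keeping two independent counters lets a drain section be nested inside a
plugged section (or the other way round) without either side having to know
about the other.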

Reviewed-by: Fam Zheng 
Signed-off-by: Paolo Bonzini 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block/io.c| 76 ---
 block/linux-aio.c | 13 
 block/raw-aio.h   |  2 +-
 block/raw-posix.c | 16 +-
 include/block/block.h |  3 +-
 include/block/block_int.h |  5 +++-
 6 files changed, 71 insertions(+), 44 deletions(-)

diff --git a/block/io.c b/block/io.c
index b798040..b903270 100644
--- a/block/io.c
+++ b/block/io.c
@@ -253,7 +253,6 @@ static void bdrv_drain_poll(BlockDriverState *bs)
 
 while (busy) {
 /* Keep iterating */
-bdrv_flush_io_queue(bs);
 busy = bdrv_requests_pending(bs);
 busy |= aio_poll(bdrv_get_aio_context(bs), busy);
 }
@@ -307,20 +306,24 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs)
 void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
 {
 bdrv_no_throttling_begin(bs);
+bdrv_io_unplugged_begin(bs);
 bdrv_drain_recurse(bs);
 bdrv_co_yield_to_drain(bs);
+bdrv_io_unplugged_end(bs);
 bdrv_no_throttling_end(bs);
 }
 
 void bdrv_drain(BlockDriverState *bs)
 {
 bdrv_no_throttling_begin(bs);
+bdrv_io_unplugged_begin(bs);
 bdrv_drain_recurse(bs);
 if (qemu_in_coroutine()) {
 bdrv_co_yield_to_drain(bs);
 } else {
 bdrv_drain_poll(bs);
 }
+bdrv_io_unplugged_end(bs);
 bdrv_no_throttling_end(bs);
 }
 
@@ -345,6 +348,7 @@ void bdrv_drain_all(void)
 block_job_pause(bs->job);
 }
 bdrv_no_throttling_begin(bs);
+bdrv_io_unplugged_begin(bs);
 bdrv_drain_recurse(bs);
 aio_context_release(aio_context);
 
@@ -369,7 +373,6 @@ void bdrv_drain_all(void)
 aio_context_acquire(aio_context);
 while ((bs = bdrv_next(bs))) {
 if (aio_context == bdrv_get_aio_context(bs)) {
-bdrv_flush_io_queue(bs);
 if (bdrv_requests_pending(bs)) {
 busy = true;
 aio_poll(aio_context, busy);
@@ -386,6 +389,7 @@ void bdrv_drain_all(void)
 AioContext *aio_context = bdrv_get_aio_context(bs);
 
 aio_context_acquire(aio_context);
+bdrv_io_unplugged_end(bs);
 bdrv_no_throttling_end(bs);
 if (bs->job) {
 block_job_resume(bs->job);
@@ -2756,31 +2760,67 @@ void bdrv_add_before_write_notifier(BlockDriverState *bs,
 
 void bdrv_io_plug(BlockDriverState *bs)
 {
-BlockDriver *drv = bs->drv;
-if (drv && drv->bdrv_io_plug) {
-drv->bdrv_io_plug(bs);
-} else if (bs->file) {
-bdrv_io_plug(bs->file->bs);
+BdrvChild *child;
+
+QLIST_FOREACH(child, &bs->children, next) {
+bdrv_io_plug(child->bs);
+}
+
+if (bs->io_plugged++ == 0 && bs->io_plug_disabled == 0) {
+BlockDriver *drv = bs->drv;
+if (drv && drv->bdrv_io_plug) {
+drv->bdrv_io_plug(bs);
+}
 }
 }
 
 void bdrv_io_unplug(BlockDriverState *bs)
 {
-BlockDriver *drv = bs->drv;
-if (drv && drv->bdrv_io_unplug) {
-drv->bdrv_io_unplug(bs);
-} else if (bs->file) {
-bdrv_io_unplug(bs->file->bs);
+BdrvChild *child;
+
+assert(bs->io_plugged);
+if (--bs->io_plugged == 0 && bs->io_plug_disabled == 0) {
+BlockDriver *drv = bs->drv;
+if (drv && drv->bdrv_io_unplug) {
+drv->bdrv_io_unplug(bs);
+}
+}
+
+QLIST_FOREACH(child, &bs->children, next) {
+bdrv_io_unplug(child->bs);
 }
 }
 
-void bdrv_flush_io_queue(BlockDriverState *bs)
+void bdrv_io_unplugged_begin(BlockDriverState *bs)
 {
-BlockDriver *drv = bs->drv;
-if (drv && drv->bdrv_flush_io_queue) {
-drv->bdrv_flush_io_queue(bs);
-} else if (bs->file) {
-bdrv_flush_io_queue(bs->file->bs);
+BdrvChild *child;
+
+if (bs->io_plug_disabled++ == 0 && bs->io_plugged > 0) {
+BlockDriver *drv = bs->drv;
+if (drv && drv->bdrv_io_unplug) {
+drv->bdrv_io_unplug(bs);
+}
+}
+
+QLIST_FOREACH(child, &bs->children, next) {
+bdrv_io_unplugged_begin(child->bs);
+}
+}
+
+void bdrv_io_unplugged_end(BlockDriverState *bs)
+{
+BdrvChild *child;
+
+assert(bs->io_plug_disabled);
+QLIST_FOREACH(child, &bs->children, next) {
+bdrv_io_unplugged_end(child->bs);
+}
+
+if (--bs->io_plug_disabled == 0 && bs->io_plugged > 0) {
+BlockDriver *drv = bs->drv;
+if (drv 

[Qemu-block] [PULL 08/69] block: Introduce bdrv_driver_preadv()

2016-05-12 Thread Kevin Wolf
This is a function that simply calls into the block driver for doing a
read, providing the byte granularity interface we want to eventually
have everywhere, and using whatever interface that driver supports.

For now, this is just a wrapper for calling bs->drv->bdrv_co_readv().
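
[Annotation] As a rough illustration of what such a wrapper has to do, the
sketch below converts an aligned byte range into a sector number and count.
The SectorRange type and bytes_to_sectors() are hypothetical names for this
example only; the 512-byte sector size and the alignment assertions follow the
diff further down.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror QEMU's 512-byte sectors; the names are local to this sketch. */
#define SECTOR_BITS 9
#define SECTOR_SIZE (1ULL << SECTOR_BITS)

typedef struct SectorRange {
    int64_t sector_num;
    unsigned int nb_sectors;
} SectorRange;

/* Convert an aligned byte range to sectors; asserts on unaligned input. */
static SectorRange bytes_to_sectors(uint64_t offset, uint64_t bytes)
{
    SectorRange r;

    assert((offset & (SECTOR_SIZE - 1)) == 0);
    assert((bytes & (SECTOR_SIZE - 1)) == 0);

    r.sector_num = (int64_t)(offset >> SECTOR_BITS);
    r.nb_sectors = (unsigned int)(bytes >> SECTOR_BITS);
    return r;
}
```

For example, bytes_to_sectors(4096, 1024) describes two sectors starting at
sector 8.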

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 block/io.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/block/io.c b/block/io.c
index b903270..d3617fe 100644
--- a/block/io.c
+++ b/block/io.c
@@ -797,6 +797,21 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
 return 0;
 }
 
+static int coroutine_fn bdrv_driver_preadv(BlockDriverState *bs,
+   uint64_t offset, uint64_t bytes,
+   QEMUIOVector *qiov, int flags)
+{
+BlockDriver *drv = bs->drv;
+int64_t sector_num = offset >> BDRV_SECTOR_BITS;
+unsigned int nb_sectors = bytes >> BDRV_SECTOR_BITS;
+
+assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert((bytes >> BDRV_SECTOR_BITS) <= BDRV_REQUEST_MAX_SECTORS);
+
+return drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+}
+
 static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
 {
@@ -833,8 +848,9 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
 
 qemu_iovec_init_external(&bounce_qiov, &iov, 1);
 
-ret = drv->bdrv_co_readv(bs, cluster_sector_num, cluster_nb_sectors,
- _qiov);
+ret = bdrv_driver_preadv(bs, cluster_sector_num * BDRV_SECTOR_SIZE,
+ cluster_nb_sectors * BDRV_SECTOR_SIZE,
+ _qiov, 0);
 if (ret < 0) {
 goto err;
 }
@@ -877,7 +893,6 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
 BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
 int64_t align, QEMUIOVector *qiov, int flags)
 {
-BlockDriver *drv = bs->drv;
 int ret;
 
 int64_t sector_num = offset >> BDRV_SECTOR_BITS;
@@ -918,7 +933,7 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
 
 /* Forward the request to the BlockDriver */
 if (!bs->zero_beyond_eof) {
-ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0);
 } else {
 /* Read zeros after EOF */
 int64_t total_sectors, max_nb_sectors;
@@ -932,7 +947,7 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
 max_nb_sectors = ROUND_UP(MAX(0, total_sectors - sector_num),
   align >> BDRV_SECTOR_BITS);
 if (nb_sectors < max_nb_sectors) {
-ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0);
 } else if (max_nb_sectors > 0) {
 QEMUIOVector local_qiov;
 
@@ -940,8 +955,9 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
 qemu_iovec_concat(&local_qiov, qiov, 0,
   max_nb_sectors * BDRV_SECTOR_SIZE);
 
-ret = drv->bdrv_co_readv(bs, sector_num, max_nb_sectors,
- &local_qiov);
+ret = bdrv_driver_preadv(bs, offset,
+ max_nb_sectors * BDRV_SECTOR_SIZE,
+ &local_qiov, 0);
 
 qemu_iovec_destroy(&local_qiov);
 } else {
-- 
1.8.3.1




[Qemu-block] [PULL 03/69] block: move restarting of throttled reqs to block/throttle-groups.c

2016-05-12 Thread Kevin Wolf
From: Paolo Bonzini 

We want to remove throttled_reqs from block/io.c.  This is the easy
part---hide the handling of throttled_reqs during disable/enable of
throttling within throttle-groups.c.

Signed-off-by: Paolo Bonzini 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block/io.c  | 15 +--
 block/throttle-groups.c | 14 ++
 include/block/throttle-groups.h |  1 +
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/block/io.c b/block/io.c
index 691baa6..9201b89 100644
--- a/block/io.c
+++ b/block/io.c
@@ -62,28 +62,15 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
 void bdrv_set_io_limits(BlockDriverState *bs,
 ThrottleConfig *cfg)
 {
-int i;
-
 throttle_group_config(bs, cfg);
-
-for (i = 0; i < 2; i++) {
-qemu_co_enter_next(&bs->throttled_reqs[i]);
-}
 }
 
 static void bdrv_start_throttled_reqs(BlockDriverState *bs)
 {
 bool enabled = bs->io_limits_enabled;
-int i;
 
 bs->io_limits_enabled = false;
-
-for (i = 0; i < 2; i++) {
-while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
-;
-}
-}
-
+throttle_group_restart_bs(bs);
 bs->io_limits_enabled = enabled;
 }
 
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 4920e09..b796f6b 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -313,6 +313,17 @@ void coroutine_fn throttle_group_co_io_limits_intercept(BlockDriverState *bs,
 qemu_mutex_unlock(&tg->lock);
 }
 
+void throttle_group_restart_bs(BlockDriverState *bs)
+{
+int i;
+
+for (i = 0; i < 2; i++) {
+while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
+;
+}
+}
+}
+
 /* Update the throttle configuration for a particular group. Similar
  * to throttle_config(), but guarantees atomicity within the
  * throttling group.
@@ -335,6 +346,9 @@ void throttle_group_config(BlockDriverState *bs, ThrottleConfig *cfg)
 }
 throttle_config(ts, tt, cfg);
 qemu_mutex_unlock(&tg->lock);
+
+qemu_co_enter_next(&bs->throttled_reqs[0]);
+qemu_co_enter_next(&bs->throttled_reqs[1]);
 }
 
 /* Get the throttle configuration from a particular group. Similar to
diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
index aba28f3..395f72d 100644
--- a/include/block/throttle-groups.h
+++ b/include/block/throttle-groups.h
@@ -38,6 +38,7 @@ void throttle_group_get_config(BlockDriverState *bs, ThrottleConfig *cfg);
 
 void throttle_group_register_bs(BlockDriverState *bs, const char *groupname);
 void throttle_group_unregister_bs(BlockDriverState *bs);
+void throttle_group_restart_bs(BlockDriverState *bs);
 
 void coroutine_fn throttle_group_co_io_limits_intercept(BlockDriverState *bs,
 unsigned int bytes,
-- 
1.8.3.1




[Qemu-block] [PULL 05/69] block: introduce bdrv_no_throttling_begin/end

2016-05-12 Thread Kevin Wolf
From: Paolo Bonzini 

Extract the handling of throttling from bdrv_flush_io_queue.  These
new functions will soon become BdrvChildRole callbacks, as they can
be generalized to "beginning of drain" and "end of drain".
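
[Annotation] The begin/end pair is a nesting counter. The sketch below is a
toy model under stated assumptions: ThrottleModel and its restarts field are
hypothetical, and would_throttle() encodes my reading that a request is only
throttled when limits are configured and no no-throttling section is open
(the part of the diff that would confirm this is not quoted in this mail).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; field names follow the patch, the struct itself is hypothetical. */
typedef struct ThrottleModel {
    bool has_throttle_state;      /* stands in for bs->throttle_state != NULL */
    unsigned io_limits_disabled;  /* nesting depth of no-throttling sections */
    unsigned restarts;            /* times queued requests would be restarted */
} ThrottleModel;

static void no_throttling_begin(ThrottleModel *m)
{
    /* Only the outermost begin restarts the throttled requests. */
    if (m->io_limits_disabled++ == 0) {
        m->restarts++;
    }
}

static void no_throttling_end(ThrottleModel *m)
{
    assert(m->io_limits_disabled);
    --m->io_limits_disabled;
}

/* Assumption: throttle only if limits exist and no section is open. */
static bool would_throttle(const ThrottleModel *m)
{
    return m->has_throttle_state && m->io_limits_disabled == 0;
}
```

Compared with saving and restoring a boolean, the counter stays correct when
sections nest, for instance when a drain happens inside an unthrottled read.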

Reviewed-by: Alberto Garcia 
Signed-off-by: Paolo Bonzini 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block.c   |  1 -
 block/block-backend.c |  6 ++
 block/io.c| 33 +
 block/throttle-groups.c   |  4 
 include/block/block_int.h |  9 ++---
 5 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/block.c b/block.c
index d4939b4..6cbad0e 100644
--- a/block.c
+++ b/block.c
@@ -2261,7 +2261,6 @@ static void swap_feature_fields(BlockDriverState *bs_top,
 
 assert(!bs_new->throttle_state);
 if (bs_top->throttle_state) {
-assert(bs_top->io_limits_enabled);
 bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
 bdrv_io_limits_disable(bs_top);
 }
diff --git a/block/block-backend.c b/block/block-backend.c
index 16c9d5e..9538e79 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -794,7 +794,6 @@ int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
  int nb_sectors)
 {
 BlockDriverState *bs = blk_bs(blk);
-bool enabled;
 int ret;
 
 ret = blk_check_request(blk, sector_num, nb_sectors);
@@ -802,10 +801,9 @@ int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
 return ret;
 }
 
-enabled = bs->io_limits_enabled;
-bs->io_limits_enabled = false;
+bdrv_no_throttling_begin(bs);
 ret = blk_read(blk, sector_num, buf, nb_sectors);
-bs->io_limits_enabled = enabled;
+bdrv_no_throttling_end(bs);
 return ret;
 }
 
diff --git a/block/io.c b/block/io.c
index c484856..b798040 100644
--- a/block/io.c
+++ b/block/io.c
@@ -65,28 +65,32 @@ void bdrv_set_io_limits(BlockDriverState *bs,
 throttle_group_config(bs, cfg);
 }
 
-static void bdrv_start_throttled_reqs(BlockDriverState *bs)
+void bdrv_no_throttling_begin(BlockDriverState *bs)
 {
-bool enabled = bs->io_limits_enabled;
+if (bs->io_limits_disabled++ == 0) {
+throttle_group_restart_bs(bs);
+}
+}
 
-bs->io_limits_enabled = false;
-throttle_group_restart_bs(bs);
-bs->io_limits_enabled = enabled;
+void bdrv_no_throttling_end(BlockDriverState *bs)
+{
+assert(bs->io_limits_disabled);
+--bs->io_limits_disabled;
 }
 
 void bdrv_io_limits_disable(BlockDriverState *bs)
 {
-bs->io_limits_enabled = false;
-bdrv_start_throttled_reqs(bs);
+assert(bs->throttle_state);
+bdrv_no_throttling_begin(bs);
 throttle_group_unregister_bs(bs);
+bdrv_no_throttling_end(bs);
 }
 
 /* should be called before bdrv_set_io_limits if a limit is set */
 void bdrv_io_limits_enable(BlockDriverState *bs, const char *group)
 {
-assert(!bs->io_limits_enabled);
+assert(!bs->throttle_state);
 throttle_group_register_bs(bs, group);
-bs->io_limits_enabled = true;
 }
 
 void bdrv_io_limits_update_group(BlockDriverState *bs, const char *group)
@@ -302,18 +306,22 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs)
  */
 void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
 {
+bdrv_no_throttling_begin(bs);
 bdrv_drain_recurse(bs);
 bdrv_co_yield_to_drain(bs);
+bdrv_no_throttling_end(bs);
 }
 
 void bdrv_drain(BlockDriverState *bs)
 {
+bdrv_no_throttling_begin(bs);
 bdrv_drain_recurse(bs);
 if (qemu_in_coroutine()) {
 bdrv_co_yield_to_drain(bs);
 } else {
 bdrv_drain_poll(bs);
 }
+bdrv_no_throttling_end(bs);
 }
 
 /*
@@ -336,6 +344,7 @@ void bdrv_drain_all(void)
 if (bs->job) {
 block_job_pause(bs->job);
 }
+bdrv_no_throttling_begin(bs);
 bdrv_drain_recurse(bs);
 aio_context_release(aio_context);
 
@@ -377,6 +386,7 @@ void bdrv_drain_all(void)
 AioContext *aio_context = bdrv_get_aio_context(bs);
 
 aio_context_acquire(aio_context);
+bdrv_no_throttling_end(bs);
 if (bs->job) {
 block_job_resume(bs->job);
 }
@@ -980,7 +990,7 @@ int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
 }
 
 /* throttling disk I/O */
-if (bs->io_limits_enabled) {
+if (bs->throttle_state) {
 throttle_group_co_io_limits_intercept(bs, bytes, false);
 }
 
@@ -1330,7 +1340,7 @@ int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
 }
 
 /* throttling disk I/O */
-if (bs->io_limits_enabled) {
+if (bs->throttle_state) {
 throttle_group_co_io_limits_intercept(bs, bytes, true);
 }
 
@@ -2772,7 +2782,6 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
 } else if (bs->file) {
 bdrv_flush_io_queue(bs->file->bs);
 }
-

[Qemu-block] [PULL 12/69] block: Introduce .bdrv_co_preadv/pwritev BlockDriver function

2016-05-12 Thread Kevin Wolf
Many parts of the block layer are already byte granularity. The block
driver interface, however, was still missing an interface that allows
making use of this. This patch introduces a new BlockDriver interface,
which is based on coroutines, vectored, has flags and uses a byte
granularity. This is now the preferred interface for new drivers.
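
[Annotation] The dispatch this introduces can be sketched as "use the
byte-based callback if the driver provides one, else fall back to the
sector-based one". ToyDriver, toy_driver_preadv(), and the two sample
callbacks are hypothetical simplifications for illustration; the real
callbacks also take a BlockDriverState and a QEMUIOVector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BITS 9
#define SECTOR_SIZE (1ULL << SECTOR_BITS)

/* Hypothetical, trimmed-down driver table; not the real struct BlockDriver. */
typedef struct ToyDriver {
    int (*co_readv)(int64_t sector_num, int nb_sectors);
    int (*co_preadv)(uint64_t offset, uint64_t bytes, int flags);
} ToyDriver;

static int toy_driver_preadv(const ToyDriver *drv, uint64_t offset,
                             uint64_t bytes, int flags)
{
    /* Preferred path: byte granularity, no alignment requirement here. */
    if (drv->co_preadv) {
        return drv->co_preadv(offset, bytes, flags);
    }

    /* Fallback: the sector-based callback needs sector-aligned requests. */
    assert((offset & (SECTOR_SIZE - 1)) == 0);
    assert((bytes & (SECTOR_SIZE - 1)) == 0);
    return drv->co_readv((int64_t)(offset >> SECTOR_BITS),
                         (int)(bytes >> SECTOR_BITS));
}

/* Sample callbacks that only report which path was taken. */
static int sample_readv(int64_t sector_num, int nb_sectors)
{
    (void)sector_num;
    return nb_sectors;          /* number of sectors requested */
}

static int sample_preadv(uint64_t offset, uint64_t bytes, int flags)
{
    (void)offset;
    (void)flags;
    return (int)bytes;          /* number of bytes requested */
}
```

A driver that sets only co_readv keeps working unchanged, while one that sets
co_preadv receives unaligned requests as they are.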

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 block/io.c| 28 ++--
 include/block/block_int.h |  4 
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/block/io.c b/block/io.c
index feddb71..70caadd 100644
--- a/block/io.c
+++ b/block/io.c
@@ -803,8 +803,15 @@ static int coroutine_fn bdrv_driver_preadv(BlockDriverState *bs,
QEMUIOVector *qiov, int flags)
 {
 BlockDriver *drv = bs->drv;
-int64_t sector_num = offset >> BDRV_SECTOR_BITS;
-unsigned int nb_sectors = bytes >> BDRV_SECTOR_BITS;
+int64_t sector_num;
+unsigned int nb_sectors;
+
+if (drv->bdrv_co_preadv) {
+return drv->bdrv_co_preadv(bs, offset, bytes, qiov, flags);
+}
+
+sector_num = offset >> BDRV_SECTOR_BITS;
+nb_sectors = bytes >> BDRV_SECTOR_BITS;
 
 assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
 assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
@@ -834,10 +841,18 @@ static int coroutine_fn bdrv_driver_pwritev(BlockDriverState *bs,
 QEMUIOVector *qiov, int flags)
 {
 BlockDriver *drv = bs->drv;
-int64_t sector_num = offset >> BDRV_SECTOR_BITS;
-unsigned int nb_sectors = bytes >> BDRV_SECTOR_BITS;
+int64_t sector_num;
+unsigned int nb_sectors;
 int ret;
 
+if (drv->bdrv_co_pwritev) {
+ret = drv->bdrv_co_pwritev(bs, offset, bytes, qiov, flags);
+goto emulate_flags;
+}
+
+sector_num = offset >> BDRV_SECTOR_BITS;
+nb_sectors = bytes >> BDRV_SECTOR_BITS;
+
 assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
 assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 assert((bytes >> BDRV_SECTOR_BITS) <= BDRV_REQUEST_MAX_SECTORS);
@@ -857,13 +872,14 @@ static int coroutine_fn bdrv_driver_pwritev(BlockDriverState *bs,
 acb = bs->drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
bdrv_co_io_em_complete, &co);
 if (acb == NULL) {
-return -EIO;
+ret = -EIO;
 } else {
 qemu_coroutine_yield();
-return co.ret;
+ret = co.ret;
 }
 }
 
+emulate_flags:
 if (ret == 0 && (flags & BDRV_REQ_FUA) &&
 !(drv->supported_write_flags & BDRV_REQ_FUA))
 {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 804bc1d..565f795 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -153,10 +153,14 @@ struct BlockDriver {
 
 int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs,
+uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
 int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn (*bdrv_co_writev_flags)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int flags);
+int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
+uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
 
 int supported_write_flags;
 
-- 
1.8.3.1




[Qemu-block] [PULL 02/69] block: make bdrv_start_throttled_reqs return void

2016-05-12 Thread Kevin Wolf
From: Paolo Bonzini 

The return value is unused and I am not sure why it would be useful.

Reviewed-by: Fam Zheng 
Signed-off-by: Paolo Bonzini 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block/io.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/block/io.c b/block/io.c
index a91d862..691baa6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -71,10 +71,8 @@ void bdrv_set_io_limits(BlockDriverState *bs,
 }
 }
 
-/* this function drain all the throttled IOs */
-static bool bdrv_start_throttled_reqs(BlockDriverState *bs)
+static void bdrv_start_throttled_reqs(BlockDriverState *bs)
 {
-bool drained = false;
 bool enabled = bs->io_limits_enabled;
 int i;
 
@@ -82,13 +80,11 @@ static bool bdrv_start_throttled_reqs(BlockDriverState *bs)
 
 for (i = 0; i < 2; i++) {
 while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
-drained = true;
+;
 }
 }
 
 bs->io_limits_enabled = enabled;
-
-return drained;
 }
 
 void bdrv_io_limits_disable(BlockDriverState *bs)
-- 
1.8.3.1




Re: [Qemu-block] [PATCH v7 06/19] scsi-disk: Switch to byte-based aio block access

2016-05-12 Thread Paolo Bonzini


On 12/05/2016 13:25, Paolo Bonzini wrote:
>> >  r->req.aiocb = dma_blk_read(s->qdev.conf.blk, r->req.sg, r->sector,
>> >  scsi_dma_complete, r);
> This is broken, it should be changed to an offset in the previous patch.
> 
> Please rename the function too, so that it is obvious that you have
> changed all callers.

Oh, dma_blk_read is still sector-based...

Paolo



Re: [Qemu-block] [PATCH v7 06/19] scsi-disk: Switch to byte-based aio block access

2016-05-12 Thread Paolo Bonzini


On 06/05/2016 18:26, Eric Blake wrote:
> @@ -340,11 +338,12 @@ static void scsi_do_read(SCSIDiskReq *r, int ret)
>  r->req.aiocb = dma_blk_read(s->qdev.conf.blk, r->req.sg, r->sector,
>  scsi_dma_complete, r);

This is broken, it should be changed to an offset in the previous patch.

Please rename the function too, so that it is obvious that you have
changed all callers.

How was this patch tested?

Paolo

>  } else {
> -n = scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
> +scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
>  block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
> - n * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -r->req.aiocb = blk_aio_readv(s->qdev.conf.blk, r->sector, &r->qiov, n,
> - scsi_read_complete, r);
> + SCSI_DMA_BUF_SIZE, BLOCK_ACCT_READ);
> +r->req.aiocb = blk_aio_preadv(s->qdev.conf.blk,
> +  r->sector << BDRV_SECTOR_BITS, &r->qiov,
> +  0, scsi_read_complete, r);
>  }



Re: [Qemu-block] [Qemu-devel] [PATCH v4 09/11] nbd: Add qemu-nbd -D for human-readable description

2016-05-12 Thread Daniel P. Berrange
On Wed, May 11, 2016 at 04:39:42PM -0600, Eric Blake wrote:
> The NBD protocol allows servers to advertise a human-readable
> description alongside an export name during NBD_OPT_LIST.  Add
> an option to pass through the user's string to the NBD client.
> 
> Doing this also makes it easier to test commit 200650d4, which
> is the client counterpart of receiving the description.
> 
> Signed-off-by: Eric Blake 
> ---
>  include/block/nbd.h |  1 +
>  nbd/nbd-internal.h  |  5 +++--
>  nbd/server.c| 34 ++
>  qemu-nbd.c  | 12 +++-
>  qemu-nbd.texi   |  5 -
>  5 files changed, 45 insertions(+), 12 deletions(-)

> diff --git a/qemu-nbd.texi b/qemu-nbd.texi
> index 9f23343..923de74 100644
> --- a/qemu-nbd.texi
> +++ b/qemu-nbd.texi
> @@ -79,9 +79,12 @@ Disconnect the device @var{dev}
>  Allow up to @var{num} clients to share the device (default @samp{1})
>  @item -t, --persistent
>  Don't exit on the last connection
> -@item -x NAME, --export-name=NAME
> +@item -x, --export-name=@var{name}

Why this change? It reads as if '-x' doesn't take a value, which is
wrong IMHO.

>  Set the NBD volume export name. This switches the server to use
>  the new style NBD protocol negotiation
> +@item -D, --description=@var{description}

Likewise, this suggests that -D doesn't take a value.

> +Set the NBD volume export description, as a human-readable
> +string. Requires the use of @option{-x}
>  @item --tls-creds=ID
>  Enable mandatory TLS encryption for the server by setting the ID
>  of the TLS credentials object previously created with the --object

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|