[Qemu-block] [PATCH RFC 0/1] block: Handle NULL options correctly in raw_open

2017-03-07 Thread Dong Jia Shi
Trying to restore rbd image on ceph cluster from snapshot with
qemu-img could trigger a calling to raw_open with a NULL @options
after a calling to raw_close, and that will lead to a failure of
the snapshot applying.

[root@s8345007 ~]# gdb --args qemu-img snapshot -a snap1 rbd:test_pool/dj_image
... ...
Program received signal SIGSEGV, Segmentation fault.
0x801395a8 in qdict_next_entry (qdict=0x0, first_bucket=0) at 
qobject/qdict.c:327
327 if (!QLIST_EMPTY(>table[i])) {
(gdb) bt
#0  0x801395a8 in qdict_next_entry (qdict=0x0, first_bucket=0) at 
qobject/qdict.c:327
#1  0x80139626 in qdict_first (qdict=0x0) at qobject/qdict.c:340
#2  0x8013a00c in qdict_extract_subqdict (src=0x0, dst=0x3ffec50, 
start=0x80698260 "file.")
at qobject/qdict.c:576
#3  0x80019c26 in bdrv_open_child_bs (filename=0x0, options=0x0, 
bdref_key=0x8017ab38 "file", 
parent=0x80630300, child_role=0x80176108 , allow_none=false, 
errp=0x0) at block.c:2018
#4  0x80019dfa in bdrv_open_child (filename=0x0, options=0x0, 
bdref_key=0x8017ab38 "file", 
parent=0x80630300, child_role=0x80176108 , allow_none=false, 
errp=0x0) at block.c:2065
#5  0x8002b9a0 in raw_open (bs=0x80630300, options=0x0, flags=8194, 
errp=0x0) at block/raw-format.c:387
#6  0x80087516 in bdrv_snapshot_goto (bs=0x80630300, 
snapshot_id=0x3fff75c "snap1")
at block/snapshot.c:194
#7  0x80010b8c in img_snapshot (argc=4, argv=0x3fff4c0) at 
qemu-img.c:2937
#8  0x800140e4 in main (argc=4, argv=0x3fff4c0) at qemu-img.c:4373

The problematic caller of raw_open is block/snapshot.c:194:
178 int bdrv_snapshot_goto(BlockDriverState *bs,
179const char *snapshot_id)
180 {
181 BlockDriver *drv = bs->drv;
182 int ret, open_ret;
183 
184 if (!drv) {
185 return -ENOMEDIUM;
186 }
187 if (drv->bdrv_snapshot_goto) {
188 return drv->bdrv_snapshot_goto(bs, snapshot_id);
189 }
190 
191 if (bs->file) {
192 drv->bdrv_close(bs);
193 ret = bdrv_snapshot_goto(bs->file->bs, snapshot_id);
194 open_ret = drv->bdrv_open(bs, NULL, bs->open_flags, NULL);
195 if (open_ret < 0) {
196 bdrv_unref(bs->file->bs);
197 bs->drv = NULL;
198 return open_ret;
199 }
200 return ret;
201 }
202 
203 return -ENOTSUP;
204 }

AFAIU, that's because raw_open does not take NULL @options as a
legal parameter. @options should have a "file" key-value pair to
ensure a success open.

I'm not familiar with these code, but after reading them for a
while, I think that raw_close always does nothing, so it seems to
me that a raw_open after a raw_close probably should also do nothing
as well.

I admit that there may be more than one way to detect whether
raw_open is called after raw_close. E.g. introducing a flag to
indicate whether the current BDS is closed, or try to reuse some
existing fields in BDS. But taking NULL @options as a sign of
trying to open again could be the simplest working method jumped
into my mind.

I'd like to send this as a RFC here for more advice, in case I
walked along a wrong way to far. Comments are welcome.

Dong Jia Shi (1):
  block: Handle NULL options correctly in raw_open

 block/raw-format.c | 8 
 1 file changed, 8 insertions(+)

-- 
2.8.4




[Qemu-block] [PATCH RFC 1/1] block: Handle NULL options correctly in raw_open

2017-03-07 Thread Dong Jia Shi
A normal call for raw_open should always pass in a non-NULL @options,
but for some certain cases (e.g. trying to applying snapshot on a RBD
image), they call raw_open with a NULL @options right after the calling
for raw_close.

Let's take the NULL @options as a sign of trying to do raw_open again,
and just simply return a success code.

Signed-off-by: Dong Jia Shi 
---
 block/raw-format.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index 86fbc65..ee05730 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -384,6 +384,14 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 BDRVRawState *s = bs->opaque;
 int ret;
 
+/*
+ * Notice:
+ * NULL options is only sensible when applying a snapshot.
+ */
+if (!options) {
+return 0;
+}
+
 bs->file = bdrv_open_child(NULL, options, "file", bs, _file,
false, errp);
 if (!bs->file) {
-- 
2.8.4




Re: [Qemu-block] [PATCH RFC 1/1] vmstate: draft fix for failed iotests case 68 and 91

2017-03-07 Thread QingFeng Hao



在 2017/3/7 18:19, Halil Pasic 写道:


On 03/07/2017 11:05 AM, Kevin Wolf wrote:

Am 07.03.2017 um 10:54 hat Halil Pasic geschrieben:


On 03/07/2017 10:29 AM, Kevin Wolf wrote:

Am 07.03.2017 um 03:53 hat QingFeng Hao geschrieben:

I am not very clear about the logic in vmstate.c, but from its context in
vmstate_save_state, it seems size should not be 0, otherwise the followed
for loop will keep working on the same element. So I just add a simple
check to pass that case, not sure if it's right but it can pass iotest
case 68 and 91 now.

The iotest's failed output is:
068 1s ... - output mismatch (see 068.out.bad)
--- 
/home/haoqf/KVMonz/gitcheck/work/qemu-master/tree/qemu/tests/qemu-iotests/068.out
   2017-03-06 05:52:24.817328899 +0100
+++ 068.out.bad 2017-03-07 03:28:44.426714519 +0100
@@ -3,9 +3,13 @@
  === Saving and reloading a VM state to/from a qcow2 image ===

  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
+qemu-system-s390x: migration/vmstate.c:336: vmstate_save_state: Assertion 
`first_elem || !n_elems' failed.
+./common.config: line 109: 52497 Aborted ( if [ -n 
"${QEMU_NEED_PID}" ]; then
+echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
+fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )


Signed-off-by: QingFeng Hao 
---
  migration/vmstate.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/migration/vmstate.c b/migration/vmstate.c
index 78b3cd4..ff28dde 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -106,6 +106,10 @@ int vmstate_load_state(QEMUFile *f, const 
VMStateDescription *vmsd,
  int i, n_elems = vmstate_n_elems(opaque, field);
  int size = vmstate_size(opaque, field);
  
+if (size == 0) {

+field++;
+continue;
+}
  vmstate_handle_alloc(first_elem, field, opaque);
  if (field->flags & VMS_POINTER) {
  first_elem = *(void **)first_elem;
@@ -322,6 +326,10 @@ void vmstate_save_state(QEMUFile *f, const 
VMStateDescription *vmsd,
  int64_t old_offset, written_bytes;
  QJSON *vmdesc_loop = vmdesc;
  
+if (size == 0) {

+field++;
+continue;
+}
  trace_vmstate_save_state_loop(vmsd->name, field->name, n_elems);
  if (field->flags & VMS_POINTER) {
  first_elem = *(void **)first_elem;

This is really a live migration fix, so I'm adding Juan and Dave to CC.

You are right, this is migration stuff and has very little to do with
qemu-block.

I suspect the real question is why a field with size 0 was even stored
in the vmstate to begin with.


I have looked onto the issue. It affects s390x only if we
are running without KVM. Basically, S390CPU.irqstate is unused
if we do not use KVM, and thus no buffer is allocated.

IMHO this is a missing field and the cleaner way to handle such
missing fields is exist. However this used to work, and I recommended
QuiFeng Hao to discuss the problem upstream.

By the way, I think, if we want to go back to the old behavior
and support VMS_VBUFFER with size 0 and nullptr, a much
cleaner way to do the fix is to change the assert to:

assert(first_elem  || !n_elems || !size)

Obviously the other clean way to fix is to implement exists.

If you're right that this specific vmstate was valid in earlier
versions, then I think it's clear that we need to make it work again.
Otherwise we're breaking migration from old versions.

Not really. We would not break migration because nothing was written to
the stream for VMS_VBUFFER of size 0 except the vmdesc which is at the end,
'debug only', and does not affect migration compatibility.

IMHO it is an API question. I would have said, there is no data, therefore
there is no field if it's from scratch. But with prior history, I agree
with Dave, we should restore old behavior -- which was changed unintentionally
because I made a wrong assumption.

Halil,
Unfortunately, another assert failed after I change the code as below in
vmstate_save_state and vmstate_load_state:
assert(first_elem  || !n_elems || !size);

The error is:
+qemu-system-s390x: migration/vmstate.c:341: vmstate_save_state: 
Assertion `field->flags & VMS_ARRAY_OF_POINTER' failed.

It's the code as below:
 if (!curr_elem) {
/* if null pointer write placeholder and do not 
follow  */

  assert(field->flags & VMS_ARRAY_OF_POINTER);
After debug, I found that the assert failure was still of irqstate and 
its field flags is 0x102(VMS_VBUFFER | VMS_POINTER),
while VMS_ARRAY_OF_POINTER is 0x040. The flags's value matches Dave's 
former assumption

on machine.c although machine.c wasn't compiled, which also confuses me.

Then I commented out that assert(field->flags & VMS_ARRAY_OF_POINTER) 
and the case 68 & 91

can both work!

The question is: can we do that change and what the second assert of 
field's flags is used 

[Qemu-block] [PATCH v2] file-posix: Incoporate max_segments in block limit

2017-03-07 Thread Fam Zheng
Linux exposes a separate limit, /sys/block/.../queue/max_segments, which
in the worst case can be more restrictive than BLKSECTGET (as they are
two different things). Similar to the BLKSECTGET story, guests don't see
this limit and send big requests will get -EINVAL error on SG_IO.

Lean on the safer side to clamp max_transfer according to max_segments
and page size, because in the end what host HBA gets is the mapped host
pages rather than a guest buffer.

Signed-off-by: Fam Zheng 

---

v2: Use /sys/dev/block/MAJOR:MINOR/queue/max_segments. [Paolo]
---
 block/file-posix.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 4de1abd..c4c0663 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -668,6 +668,48 @@ static int hdev_get_max_transfer_length(BlockDriverState 
*bs, int fd)
 #endif
 }
 
+static int hdev_get_max_segments(const struct stat *st)
+{
+#ifdef CONFIG_LINUX
+char buf[32];
+const char *end;
+char *sysfspath;
+int ret;
+int fd = -1;
+long max_segments;
+
+sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
+major(st->st_rdev), minor(st->st_rdev));
+fd = open(sysfspath, O_RDONLY);
+if (fd == -1) {
+ret = -errno;
+goto out;
+}
+do {
+ret = read(fd, buf, sizeof(buf));
+} while (ret == -1 && errno == EINTR);
+if (ret < 0) {
+ret = -errno;
+goto out;
+} else if (ret == 0) {
+ret = -EIO;
+goto out;
+}
+buf[ret] = 0;
+/* The file is ended with '\n', pass 'end' to accept that. */
+ret = qemu_strtol(buf, , 10, _segments);
+if (ret == 0 && end && *end == '\n') {
+ret = max_segments;
+}
+
+out:
+g_free(sysfspath);
+return ret;
+#else
+return -ENOTSUP;
+#endif
+}
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
@@ -679,6 +721,11 @@ static void raw_refresh_limits(BlockDriverState *bs, Error 
**errp)
 if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
 bs->bl.max_transfer = pow2floor(ret);
 }
+ret = hdev_get_max_segments();
+if (ret > 0) {
+bs->bl.max_transfer = MIN(bs->bl.max_transfer,
+  ret * getpagesize());
+}
 }
 }
 
-- 
2.9.3




Re: [Qemu-block] [PATCH] file-posix: Incoporate max_segments in block limit

2017-03-07 Thread Fam Zheng
On Tue, 03/07 11:58, Paolo Bonzini wrote:
> 
> 
> On 07/03/2017 03:17, Fam Zheng wrote:
> > Linux exposes a separate limit, /sys/block/.../queue/max_segments, which
> > in the worst case can be more restrictive than BLKSECTGET (as they are
> > two different things). Similar to the BLKSECTGET story, guests don't see
> > this limit and send big requests will get -EINVAL error on SG_IO.
> > 
> > Lean on the safer side to clamp max_transfer according to max_segments
> > and page size, because in the end what host HBA gets is the mapped host
> > pages rather than a guest buffer.
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/file-posix.c | 58 
> > ++
> >  1 file changed, 58 insertions(+)
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 4de1abd..b615262 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -668,6 +668,59 @@ static int 
> > hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
> >  #endif
> >  }
> >  
> > +static int hdev_get_max_segments(BlockDriverState *bs)
> > +{
> > +#ifdef CONFIG_LINUX
> > +char buf[32];
> > +const char *end;
> > +char *sysfspath, *fullpath;
> > +int ret;
> > +int fd = -1;
> > +long max_segments;
> > +
> > +fullpath = realpath(bs->exact_filename, NULL);
> > +if (!fullpath) {
> > +return -errno;
> > +}
> > +if (strncmp(fullpath, "/dev/", 5) || !fullpath[5]) {
> > +ret = -ENOTSUP;
> > +goto out;
> > +}
> > +sysfspath = g_strdup_printf("/sys/block/%s/queue/max_segments",
> > +[5]);
> 
> I think you cannot rely on the /dev/... path.  Luckily, there is an
> alternative path to "queue" via /sys/dev/block/MAJOR:MINOR/queue, where
> MAJOR:MINOR can be retrieved via fstat.  This also avoids any possible
> TOC-TOU races.

Sounds good, I'll send another version.

Fam



[Qemu-block] [PATCH v6 10/10] tests: Add coverage for recent block geometry fixes

2017-03-07 Thread Eric Blake
Use blkdebug's new geometry constraints to emulate setups that
have caused recent regression fixes: write zeroes asserting
when running through a loopback block device with max-transfer
smaller than cluster size, and discard rounding away portions
of requests not aligned to preferred boundaries.  Also, add
coverage that the block layer is honoring max transfer limits.

For now, a single iotest performs all actions, with the idea
that we can add future blkdebug constraint test cases in the
same file; but it can be split into multiple iotests if we find
reason to run one portion of the test in more setups than what
are possible in the other.

For reference, the final portion of the test (checking whether
discard passes as much as possible to the lowest layers of the
stack) works as follows:

qemu-io: discard 30M at 8001, passed to blkdebug
  blkdebug: discard 511 bytes at 8001, -ENOTSUP (smaller than
blkdebug's 512 align)
  blkdebug: discard 14371328 bytes at 8512, passed to qcow2
qcow2: discard 739840 bytes at 8512, -ENOTSUP (smaller than
qcow2's 1M align)
qcow2: discard 13M bytes at 77M, succeeds
  blkdebug: discard 15M bytes at 90M, passed to qcow2
qcow2: discard 15M bytes at 90M, succeeds
  blkdebug: discard 1356800 bytes at 105M, passed to qcow2
qcow2: discard 1M at 105M, succeeds
qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's
1M align)
  blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than
blkdebug's 512 align)

Signed-off-by: Eric Blake 
Reviewed-by: Max Reitz 

---
v6: rebase to master by renumbering s/175/177/
v5: rebase to master by renumbering s/173/175/
v4: clean up some comments, nicer backing file creation, more commit message
v3: make comments tied more to test at hand, rather than the
particular hardware that led to the earlier patches being tested
v2: new patch
---
 tests/qemu-iotests/177 | 114 +
 tests/qemu-iotests/177.out |  49 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 164 insertions(+)
 create mode 100755 tests/qemu-iotests/177
 create mode 100644 tests/qemu-iotests/177.out

diff --git a/tests/qemu-iotests/177 b/tests/qemu-iotests/177
new file mode 100755
index 000..e4ddec7
--- /dev/null
+++ b/tests/qemu-iotests/177
@@ -0,0 +1,114 @@
+#!/bin/bash
+#
+# Test corner cases with unusual block geometries
+#
+# Copyright (C) 2016-2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=ebl...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+
+CLUSTER_SIZE=1M
+size=128M
+options=driver=blkdebug,image.driver=qcow2
+
+echo
+echo "== setting up files =="
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $size
+$QEMU_IO -c "write -P 11 0 $size" "$TEST_IMG.base" | _filter_qemu_io
+_make_test_img -b "$TEST_IMG.base"
+$QEMU_IO -c "write -P 22 0 $size" "$TEST_IMG" | _filter_qemu_io
+
+# Limited to 64k max-transfer
+echo
+echo "== constrained alignment and max-transfer =="
+limits=align=4k,max-transfer=64k
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+ -c "write -P 33 1000 128k" -c "read -P 33 1000 128k" | _filter_qemu_io
+
+echo
+echo "== write zero with constrained max-transfer =="
+limits=align=512,max-transfer=64k,opt-write-zero=$CLUSTER_SIZE
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+ -c "write -z 8003584 2093056" | _filter_qemu_io
+
+# non-power-of-2 write-zero/discard alignments
+echo
+echo "== non-power-of-2 write zeroes limits =="
+
+limits=align=512,opt-write-zero=15M,max-write-zero=15M,opt-discard=15M,max-discard=15M
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+ -c "write -z 32M 32M" | _filter_qemu_io
+
+echo
+echo "== non-power-of-2 discard limits =="
+
+limits=align=512,opt-write-zero=15M,max-write-zero=15M,opt-discard=15M,max-discard=15M
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+ -c "discard 8001 30M" | _filter_qemu_io
+
+echo
+echo "== verify image content =="
+
+function verify_io()
+{
+   

[Qemu-block] [PATCH v6 09/10] blkdebug: Add ability to override unmap geometries

2017-03-07 Thread Eric Blake
Make it easier to simulate various unusual hardware setups (for
example, recent commits 3482b9b and b8d0a98 affect the Dell
Equallogic iSCSI with its 15M preferred and maximum unmap and
write zero sizing, or b2f95fe deals with the Linux loopback
block device having a max_transfer of 64k), by allowing blkdebug
to wrap any other device with further restrictions on various
alignments.

Signed-off-by: Eric Blake 

---
v6: more tweaks in docs and error messages
v5: tweak docs regarding max-transfer minimum
v4: relax 512 byte minimum now that blkdebug is byte-based, fix doc typo
v3: improve legibility of bounds checking, improve docs
v2: new patch
---
 qapi/block-core.json | 31 +++--
 block/blkdebug.c | 96 +++-
 2 files changed, 124 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9bb7f4a..e8aeb8e 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2476,7 +2476,32 @@
 # @config:  #optional filename of the configuration file
 #
 # @align:   #optional required alignment for requests in bytes,
-#   must be power of 2, or 0 for default
+#   must be positive power of 2, or 0 for default
+#
+# @max-transfer:#optional maximum size for I/O transfers in bytes,
+#   must be positive multiple of @align and of the
+#   underlying file's request alignment (but need not be
+#   a power of 2), or 0 for default (since 2.10)
+#
+# @opt-write-zero:  #optional preferred alignment for write zero requests
+#   in bytes, must be positive multiple of @align and of
+#   the underlying file's request alignment (but need not
+#   be a power of 2), or 0 for default (since 2.10)
+#
+# @max-write-zero:  #optional maximum size for write zero requests in bytes,
+#   must be positive multiple of @align, of @opt-write-zero,
+#   and of the underlying file's request alignment (but
+#   need not be a power of 2), or 0 for default (since 2.10)
+#
+# @opt-discard: #optional preferred alignment for discard requests
+#   in bytes, must be positive multiple of @align and of
+#   the underlying file's request alignment (but need not
+#   be a power of 2), or 0 for default (since 2.10)
+#
+# @max-discard: #optional maximum size for discard requests in bytes,
+#   must be positive multiple of @align, of @opt-discard,
+#   and of the underlying file's request alignment (but
+#   need not be a power of 2), or 0 for default (since 2.10)
 #
 # @inject-error:#optional array of error injection descriptions
 #
@@ -2487,7 +2512,9 @@
 { 'struct': 'BlockdevOptionsBlkdebug',
   'data': { 'image': 'BlockdevRef',
 '*config': 'str',
-'*align': 'int',
+'*align': 'int', '*max-transfer': 'int32',
+'*opt-write-zero': 'int32', '*max-write-zero': 'int32',
+'*opt-discard': 'int32', '*max-discard': 'int32',
 '*inject-error': ['BlkdebugInjectErrorOptions'],
 '*set-state': ['BlkdebugSetStateOptions'] } }

diff --git a/block/blkdebug.c b/block/blkdebug.c
index dd5d85d..4e2c392 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -39,6 +39,11 @@ typedef struct BDRVBlkdebugState {
 int state;
 int new_state;
 uint64_t align;
+uint64_t max_transfer;
+uint64_t opt_write_zero;
+uint64_t max_write_zero;
+uint64_t opt_discard;
+uint64_t max_discard;

 /* For blkdebug_refresh_filename() */
 char *config_file;
@@ -343,6 +348,31 @@ static QemuOptsList runtime_opts = {
 .type = QEMU_OPT_SIZE,
 .help = "Required alignment in bytes",
 },
+{
+.name = "max-transfer",
+.type = QEMU_OPT_SIZE,
+.help = "Maximum transfer size in bytes",
+},
+{
+.name = "opt-write-zero",
+.type = QEMU_OPT_SIZE,
+.help = "Optimum write zero alignment in bytes",
+},
+{
+.name = "max-write-zero",
+.type = QEMU_OPT_SIZE,
+.help = "Maximum write zero size in bytes",
+},
+{
+.name = "opt-discard",
+.type = QEMU_OPT_SIZE,
+.help = "Optimum discard alignment in bytes",
+},
+{
+.name = "max-discard",
+.type = QEMU_OPT_SIZE,
+.help = "Maximum discard size in bytes",
+},
 { /* end of list */ }
 },
 };
@@ -354,6 +384,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 QemuOpts *opts;
 Error *local_err = NULL;
 int ret;
+uint64_t align;

 opts = qemu_opts_create(_opts, NULL, 0, _abort);
 qemu_opts_absorb_qdict(opts, 

[Qemu-block] [PATCH v6 05/10] blkdebug: Sanity check block layer guarantees

2017-03-07 Thread Eric Blake
Commits 04ed95f4 and 1a62d0ac updated the block layer to auto-fragment
any I/O to fit within device boundaries. Additionally, when using a
minimum alignment of 4k, we want to ensure the block layer does proper
read-modify-write rather than requesting I/O on a slice of a sector.
Let's enforce that the contract is obeyed when using blkdebug.  For
now, blkdebug only allows alignment overrides, and just inherits other
limits from whatever device it is wrapping, but a future patch will
further enhance things.

Signed-off-by: Eric Blake 
Reviewed-by: Kevin Wolf 
Reviewed-by: Max Reitz 

---
v5: no change
v4: no change
v3: rebase to byte-based interfaces
v2: new patch
---
 block/blkdebug.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 67e8024..7c804b2 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -431,6 +431,13 @@ blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 BDRVBlkdebugState *s = bs->opaque;
 BlkdebugRule *rule = NULL;

+/* Sanity check block layer guarantees */
+assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
+if (bs->bl.max_transfer) {
+assert(bytes <= bs->bl.max_transfer);
+}
+
 QSIMPLEQ_FOREACH(rule, >active_rules, active_next) {
 uint64_t inject_offset = rule->options.inject.offset;

@@ -455,6 +462,13 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 BDRVBlkdebugState *s = bs->opaque;
 BlkdebugRule *rule = NULL;

+/* Sanity check block layer guarantees */
+assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
+if (bs->bl.max_transfer) {
+assert(bytes <= bs->bl.max_transfer);
+}
+
 QSIMPLEQ_FOREACH(rule, >active_rules, active_next) {
 uint64_t inject_offset = rule->options.inject.offset;

-- 
2.9.3




[Qemu-block] [PATCH v6 08/10] blkdebug: Simplify override logic

2017-03-07 Thread Eric Blake
Rather than store into a local variable, then copy to the struct
if the value is valid, then reporting errors otherwise, it is
simpler to just store into the struct and report errors if the
value is invalid.  This however requires that the struct store
a 64-bit number, rather than a narrower type.  Move the errno
assignment into a label that will be reused from more places in
the next patch.

Signed-off-by: Eric Blake 
Reviewed-by: Max Reitz 
Reviewed-by: Kevin Wolf 

---
v6: tweak error message, but R-b kept
v5: no change
v4: fix typo in commit message, move errno assignment
v3: new patch
---
 block/blkdebug.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index b3aa5c3..dd5d85d 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -38,7 +38,7 @@
 typedef struct BDRVBlkdebugState {
 int state;
 int new_state;
-int align;
+uint64_t align;

 /* For blkdebug_refresh_filename() */
 char *config_file;
@@ -353,7 +353,6 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 BDRVBlkdebugState *s = bs->opaque;
 QemuOpts *opts;
 Error *local_err = NULL;
-uint64_t align;
 int ret;

 opts = qemu_opts_create(_opts, NULL, 0, _abort);
@@ -389,12 +388,10 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 bs->file->bs->supported_zero_flags;

 /* Set request alignment */
-align = qemu_opt_get_size(opts, "align", 0);
-if (align < INT_MAX && is_power_of_2(align)) {
-s->align = align;
-} else if (align) {
-error_setg(errp, "Invalid alignment");
-ret = -EINVAL;
+s->align = qemu_opt_get_size(opts, "align", 0);
+if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
+error_setg(errp, "Cannot meet constraints with align %" PRIu64,
+   s->align);
 goto fail_unref;
 }

@@ -402,6 +399,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto out;

 fail_unref:
+ret = -EINVAL;
 bdrv_unref_child(bs, bs->file);
 out:
 if (ret < 0) {
-- 
2.9.3




[Qemu-block] [PATCH v6 06/10] blkdebug: Refactor error injection

2017-03-07 Thread Eric Blake
Rather than repeat the logic at each caller of checking if a Rule
exists that warrants an error injection, fold that logic into
inject_error(); and rename it to rule_check() for legibility.
This will help the next patch, which adds two more callers that
need to check rules for the potential of injecting errors.

Signed-off-by: Eric Blake 

---
v6: new patch
---
 block/blkdebug.c | 74 +---
 1 file changed, 33 insertions(+), 41 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 7c804b2..f721bcb 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -405,11 +405,30 @@ out:
 return ret;
 }

-static int inject_error(BlockDriverState *bs, BlkdebugRule *rule)
+static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
 {
 BDRVBlkdebugState *s = bs->opaque;
-int error = rule->options.inject.error;
-bool immediately = rule->options.inject.immediately;
+BlkdebugRule *rule = NULL;
+int error;
+bool immediately;
+
+QSIMPLEQ_FOREACH(rule, >active_rules, active_next) {
+uint64_t inject_offset = rule->options.inject.offset;
+
+if (inject_offset == -1 ||
+(bytes && inject_offset >= offset &&
+ inject_offset < offset + bytes))
+{
+break;
+}
+}
+
+if (!rule || !rule->options.inject.error) {
+return 0;
+}
+
+immediately = rule->options.inject.immediately;
+error = rule->options.inject.error;

 if (rule->options.inject.once) {
 QSIMPLEQ_REMOVE(>active_rules, rule, BlkdebugRule, active_next);
@@ -428,8 +447,7 @@ static int coroutine_fn
 blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
 {
-BDRVBlkdebugState *s = bs->opaque;
-BlkdebugRule *rule = NULL;
+int err;

 /* Sanity check block layer guarantees */
 assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
@@ -438,18 +456,9 @@ blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 assert(bytes <= bs->bl.max_transfer);
 }

-QSIMPLEQ_FOREACH(rule, >active_rules, active_next) {
-uint64_t inject_offset = rule->options.inject.offset;
-
-if (inject_offset == -1 ||
-(inject_offset >= offset && inject_offset < offset + bytes))
-{
-break;
-}
-}
-
-if (rule && rule->options.inject.error) {
-return inject_error(bs, rule);
+err = rule_check(bs, offset, bytes);
+if (err) {
+return err;
 }

 return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
@@ -459,8 +468,7 @@ static int coroutine_fn
 blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 QEMUIOVector *qiov, int flags)
 {
-BDRVBlkdebugState *s = bs->opaque;
-BlkdebugRule *rule = NULL;
+int err;

 /* Sanity check block layer guarantees */
 assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
@@ -469,18 +477,9 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 assert(bytes <= bs->bl.max_transfer);
 }

-QSIMPLEQ_FOREACH(rule, >active_rules, active_next) {
-uint64_t inject_offset = rule->options.inject.offset;
-
-if (inject_offset == -1 ||
-(inject_offset >= offset && inject_offset < offset + bytes))
-{
-break;
-}
-}
-
-if (rule && rule->options.inject.error) {
-return inject_error(bs, rule);
+err = rule_check(bs, offset, bytes);
+if (err) {
+return err;
 }

 return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
@@ -488,17 +487,10 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t 
offset, uint64_t bytes,

 static int blkdebug_co_flush(BlockDriverState *bs)
 {
-BDRVBlkdebugState *s = bs->opaque;
-BlkdebugRule *rule = NULL;
+int err = rule_check(bs, 0, 0);

-QSIMPLEQ_FOREACH(rule, >active_rules, active_next) {
-if (rule->options.inject.offset == -1) {
-break;
-}
-}
-
-if (rule && rule->options.inject.error) {
-return inject_error(bs, rule);
+if (err) {
+return err;
 }

 return bdrv_co_flush(bs->file->bs);
-- 
2.9.3




[Qemu-block] [PATCH v6 01/10] iotests: fix 097 when run with qcow

2017-03-07 Thread Eric Blake
From: "Daniel P. Berrange" 

The previous commit:

  commit a3e1505daec31ef56f0489f8c8fff1b8e4ca92bd
  Author: Eric Blake 
  Date:   Mon Dec 5 09:49:34 2016 -0600

qcow2: Don't strand clusters near 2G intervals during commit

extended the 097 test case so that it did two passes, once
with an internal snapshot, once without.

qcow (v1) does not support internal snapshots, so this change
broke test 097 when run against qcow.

This splits 097 in two, creating a new 176 that tests the
internal snapshot codepath, effectively putting 097 back
to its content before the above commit.

Reviewed-by: Max Reitz 
Signed-off-by: Daniel P. Berrange 
Message-Id: <20170221115512.21918-8-berra...@redhat.com>
[eblake: test collisions: s/173/176/g]
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/097 |  10 +---
 tests/qemu-iotests/097.out | 125 ++--
 tests/qemu-iotests/176 | 126 +
 tests/qemu-iotests/176.out | 119 ++
 tests/qemu-iotests/group   |   1 +
 5 files changed, 251 insertions(+), 130 deletions(-)
 create mode 100755 tests/qemu-iotests/176
 create mode 100644 tests/qemu-iotests/176.out

diff --git a/tests/qemu-iotests/097 b/tests/qemu-iotests/097
index 4c33e80..1d28aff 100755
--- a/tests/qemu-iotests/097
+++ b/tests/qemu-iotests/097
@@ -56,26 +56,19 @@ _supported_os Linux
 #  3: Two-layer backing chain, commit to lower backing file
 # (in this case, the top image will implicitly stay unchanged)
 #
-# Each pass is run twice, since qcow2 has different code paths for cleaning
-# an image depending on whether it has a snapshot.
-#
 # 020 already tests committing, so this only tests whether image chains are
 # working properly and that all images above the base are emptied; therefore,
 # no complicated patterns are necessary.  Check near the 2G mark, as qcow2
 # has been buggy at that boundary in the past.
 for i in 0 1 2 3; do
-for j in 0 1; do

 echo
-echo "=== Test pass $i.$j ==="
+echo "=== Test pass $i ==="
 echo

 TEST_IMG="$TEST_IMG.base" _make_test_img 2100M
 TEST_IMG="$TEST_IMG.itmd" _make_test_img -b "$TEST_IMG.base" 2100M
 _make_test_img -b "$TEST_IMG.itmd" 2100M
-if [ $j -eq 0 ]; then
-$QEMU_IMG snapshot -c snap "$TEST_IMG"
-fi

 $QEMU_IO -c 'write -P 1 0x7ffd 192k' "$TEST_IMG.base" | _filter_qemu_io
 $QEMU_IO -c 'write -P 2 0x7ffe 128k' "$TEST_IMG.itmd" | _filter_qemu_io
@@ -121,7 +114,6 @@ $QEMU_IMG map "$TEST_IMG.itmd" | _filter_qemu_img_map
 $QEMU_IMG map "$TEST_IMG" | _filter_qemu_img_map

 done
-done


 # success, all done
diff --git a/tests/qemu-iotests/097.out b/tests/qemu-iotests/097.out
index 8106cc9..81fc225 100644
--- a/tests/qemu-iotests/097.out
+++ b/tests/qemu-iotests/097.out
@@ -1,6 +1,6 @@
 QA output created by 097

-=== Test pass 0.0 ===
+=== Test pass 0 ===

 Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=2202009600
 Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=2202009600 
backing_file=TEST_DIR/t.IMGFMT.base
@@ -29,66 +29,7 @@ Offset  Length  File
 0x7ffd  0x1 TEST_DIR/t.IMGFMT.base
 0x7ffe  0x2 TEST_DIR/t.IMGFMT.itmd

-=== Test pass 0.1 ===
-
-Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=2202009600
-Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=2202009600 
backing_file=TEST_DIR/t.IMGFMT.base
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2202009600 
backing_file=TEST_DIR/t.IMGFMT.itmd
-wrote 196608/196608 bytes at offset 2147287040
-192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 131072/131072 bytes at offset 2147352576
-128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 65536/65536 bytes at offset 2147418112
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-Image committed.
-read 196608/196608 bytes at offset 2147287040
-192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 65536/65536 bytes at offset 2147287040
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 65536/65536 bytes at offset 2147352576
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 65536/65536 bytes at offset 2147418112
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-Offset  Length  File
-0x7ffd  0x3 TEST_DIR/t.IMGFMT.base
-Offset  Length  File
-0x7ffd  0x1 TEST_DIR/t.IMGFMT.base
-0x7ffe  0x2 TEST_DIR/t.IMGFMT.itmd
-Offset  Length  File
-0x7ffd  0x1 TEST_DIR/t.IMGFMT.base
-0x7ffe  0x2 TEST_DIR/t.IMGFMT.itmd
-
-=== Test pass 1.0 ===
-
-Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=2202009600
-Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=2202009600 
backing_file=TEST_DIR/t.IMGFMT.base
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2202009600 

[Qemu-block] [PATCH v6 07/10] blkdebug: Add pass-through write_zero and discard support

2017-03-07 Thread Eric Blake
In order to test the effects of artificial geometry constraints
on operations like write zero or discard, we first need blkdebug
to manage these actions.  It also allows us to inject errors on
those operations, just like we can for read/write/flush.

We can also test the contract promised by the block layer; namely,
if a device has specified limits on alignment or maximum size,
then those limits must be obeyed (for now, the blkdebug driver
merely inherits limits from whatever it is wrapping, but the next
patch will further enhance it to allow specific limit overrides).

This patch intentionally refuses to service requests smaller than
the requested alignments; this is because an upcoming patch adds
a qemu-iotest to prove that the block layer is correctly handling
fragmentation, but the test only works if there is a way to tell
the difference at artificial alignment boundaries when blkdebug is
using a larger-than-default alignment.  If we let the blkdebug
layer always defer to the underlying layer, which potentially has
a smaller granularity, the iotest will be thwarted.

Tested by setting up an NBD server with export 'foo', then invoking:
$ ./qemu-io
qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
qemu-io> d 0 15M
qemu-io> w -z 0 15M

Pre-patch, the server never sees the discard (it was silently
eaten by the block layer); post-patch it is passed across the
wire.  Likewise, pre-patch the write is always passed with
NBD_WRITE (with 15M of zeroes on the wire), while post-patch
it can utilize NBD_WRITE_ZEROES (for less traffic).

Signed-off-by: Eric Blake 

---
v6: tighten check of unaligned requests, rebase on rule check
refactoring, drop R-b
v5: include 2017 copyright
v4: correct error injection to respect byte range, tweak formatting
v3: rebase to byte-based read/write, improve docs on why no
partial write zero passthrough
v2: new patch
---
 block/blkdebug.c | 74 
 1 file changed, 74 insertions(+)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index f721bcb..b3aa5c3 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -1,6 +1,7 @@
 /*
  * Block protocol for I/O error injection
  *
+ * Copyright (C) 2016-2017 Red Hat, Inc.
  * Copyright (c) 2010 Kevin Wolf 
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -382,6 +383,11 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto out;
 }

+bs->supported_write_flags = BDRV_REQ_FUA &
+bs->file->bs->supported_write_flags;
+bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+bs->file->bs->supported_zero_flags;
+
 /* Set request alignment */
 align = qemu_opt_get_size(opts, "align", 0);
 if (align < INT_MAX && is_power_of_2(align)) {
@@ -496,6 +502,72 @@ static int blkdebug_co_flush(BlockDriverState *bs)
 return bdrv_co_flush(bs->file->bs);
 }

+static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
+  int64_t offset, int count,
+  BdrvRequestFlags flags)
+{
+uint32_t align = MAX(bs->bl.request_alignment,
+ bs->bl.pwrite_zeroes_alignment);
+int err;
+
+/* Only pass through requests that are larger than requested
+ * preferred alignment (so that we test the fallback to writes on
+ * unaligned portions), and check that the block layer never hands
+ * us anything unaligned that crosses an alignment boundary.  */
+if (count < align) {
+assert(QEMU_IS_ALIGNED(offset, align) ||
+   QEMU_IS_ALIGNED(offset + count, align) ||
+   DIV_ROUND_UP(offset, align) ==
+   DIV_ROUND_UP(offset + count, align));
+return -ENOTSUP;
+}
+assert(QEMU_IS_ALIGNED(offset, align));
+assert(QEMU_IS_ALIGNED(count, align));
+if (bs->bl.max_pwrite_zeroes) {
+assert(count <= bs->bl.max_pwrite_zeroes);
+}
+
+err = rule_check(bs, offset, count);
+if (err) {
+return err;
+}
+
+return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
+}
+
+static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
+ int64_t offset, int count)
+{
+uint32_t align = bs->bl.pdiscard_alignment;
+int err;
+
+/* Only pass through requests that are larger than requested
+ * minimum alignment, and ensure that unaligned requests do not
+ * cross optimum discard boundaries. */
+if (count < bs->bl.request_alignment) {
+assert(QEMU_IS_ALIGNED(offset, align) ||
+   QEMU_IS_ALIGNED(offset + count, align) ||
+   DIV_ROUND_UP(offset, align) ==
+   DIV_ROUND_UP(offset + count, align));
+return -ENOTSUP;
+}
+assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+

[Qemu-block] [PATCH v6 00/10] add blkdebug tests

2017-03-07 Thread Eric Blake
Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-blkdebug-v6

v5 was:
https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg03042.html

Since then:
- Rebase to master
- Pull in Dan's patch that splits test 97 for qcow (patch 1)
- Address comments from Kevin (includes creating new patch 2, 6)
- Update patches to reflect that this is now 2.10 material, since it
missed soft freeze and is not fixing a bug

001/10:[down] 'iotests: fix 097 when run with qcow'
002/10:[down] 'iotests: Improve image-clear tests on non-aligned image'
003/10:[0007] [FC] 'qcow2: Assert that cluster operations are aligned'
004/10:[0002] [FC] 'qcow2: Discard/zero clusters by byte count'
005/10:[] [--] 'blkdebug: Sanity check block layer guarantees'
006/10:[down] 'blkdebug: Refactor error injection'
007/10:[0046] [FC] 'blkdebug: Add pass-through write_zero and discard support'
008/10:[0003] [FC] 'blkdebug: Simplify override logic'
009/10:[0045] [FC] 'blkdebug: Add ability to override unmap geometries'
010/10:[0004] [FC] 'tests: Add coverage for recent block geometry fixes'

Daniel P. Berrange (1):
  iotests: fix 097 when run with qcow

Eric Blake (9):
  iotests: Improve image-clear tests on non-aligned image
  qcow2: Assert that cluster operations are aligned
  qcow2: Discard/zero clusters by byte count
  blkdebug: Sanity check block layer guarantees
  blkdebug: Refactor error injection
  blkdebug: Add pass-through write_zero and discard support
  blkdebug: Simplify override logic
  blkdebug: Add ability to override unmap geometries
  tests: Add coverage for recent block geometry fixes

 qapi/block-core.json   |  31 +-
 block/qcow2.h  |   9 +-
 block/blkdebug.c   | 264 +
 block/qcow2-cluster.c  |  51 +
 block/qcow2-snapshot.c |   7 +-
 block/qcow2.c  |  21 ++--
 tests/qemu-iotests/097 |  17 +--
 tests/qemu-iotests/097.out | 149 +++--
 tests/qemu-iotests/176 | 127 ++
 tests/qemu-iotests/176.out | 119 
 tests/qemu-iotests/177 | 114 
 tests/qemu-iotests/177.out |  49 +
 tests/qemu-iotests/group   |   2 +
 13 files changed, 724 insertions(+), 236 deletions(-)
 create mode 100755 tests/qemu-iotests/176
 create mode 100644 tests/qemu-iotests/176.out
 create mode 100755 tests/qemu-iotests/177
 create mode 100644 tests/qemu-iotests/177.out

-- 
2.9.3




[Qemu-block] [PATCH v6 03/10] qcow2: Assert that cluster operations are aligned

2017-03-07 Thread Eric Blake
qcow2_discard_clusters() is set up to silently ignore sub-cluster
head or tail on unaligned requests.  However, it is easy to audit
the various callers: qcow2_snapshot_create() has always passed
aligned data since the call was introduced in 1ebf561;
qcow2_co_pdiscard() has passed aligned clusters since commit
ecdbead taught the block layer the preferred discard alignment (the
block layer can still pass sub-cluster values, but those are
handled directly in qcow2_co_pdiscard()); and qcow2_make_empty()
was fixed to pass aligned clusters in commit a3e1505.  Replace
rounding with assertions to hold us to the tighter contract,
eliminating the now-impossible case of an early exit for a
sub-cluster request.

qcow2_zero_clusters() has always been called with cluster-aligned
arguments from its lone caller qcow2_co_pwrite_zeroes() (like
qcow2_co_pdiscard(), the caller takes care of sub-cluster requests
from the block layer; and qcow2_zero_clusters() would have
misbehaved on unaligned requests), but it deserves the same
assertion for symmetry.

Signed-off-by: Eric Blake 

---
v6: avoid assertion on non-cluster-aligned image, use s->cluster_sectors
to avoid a shift, drop R-b
v5: no change
v4: new patch
---
 block/qcow2-cluster.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 78c11d4..046fbd8 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1519,13 +1519,10 @@ int qcow2_discard_clusters(BlockDriverState *bs, 
uint64_t offset,

 end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);

-/* Round start up and end down */
-offset = align_offset(offset, s->cluster_size);
-end_offset = start_of_cluster(s, end_offset);
-
-if (offset > end_offset) {
-return 0;
-}
+/* Caller must pass aligned values, except at image end */
+assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
+assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
+   end_offset == bs->total_sectors << BDRV_SECTOR_BITS);

 nb_clusters = size_to_clusters(s, end_offset - offset);

@@ -1600,6 +1597,10 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t 
offset, int nb_sectors,
 uint64_t nb_clusters;
 int ret;

+/* Caller must pass aligned values */
+assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
+assert(QEMU_IS_ALIGNED(nb_sectors, s->cluster_sectors));
+
 /* The zero flag is only supported by version 3 and newer */
 if (s->qcow_version < 3) {
 return -ENOTSUP;
-- 
2.9.3




[Qemu-block] [PATCH v6 04/10] qcow2: Discard/zero clusters by byte count

2017-03-07 Thread Eric Blake
Passing a byte offset, but sector count, when we ultimately
want to operate on cluster granularity, is madness.  Clean up
the external interfaces to take both offset and count as bytes,
while still keeping the assertion added previously that the
caller must align the values to a cluster.  Then rename things
to make sure backports don't get confused by changed units:
instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().

The internal functions still operate on clusters at a time, and
return an int for number of cleared clusters; but on an image
with 2M clusters, a single L2 table holds 256k entries that each
represent a 2M cluster, totalling well over INT_MAX bytes if we
ever had a request for that many bytes at once.  All our callers
currently limit themselves to 32-bit bytes (and therefore fewer
clusters), but by making this function 64-bit clean, we have one
less place to clean up if we later improve the block layer to
support 64-bit bytes through all operations (with the block layer
auto-fragmenting on behalf of more-limited drivers), rather than
the current state where some interfaces are artificially limited
to INT_MAX at a time.

Signed-off-by: Eric Blake 
Reviewed-by: Kevin Wolf 

---
v6: rebase due to context
v5: s/count/byte/ to make the units obvious, and rework the math
to ensure no 32-bit integer overflow on large clusters
v4: improve function names, split assertion additions into earlier patch
[no v3 or v2]
v1: https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00339.html
---
 block/qcow2.h  |  9 +
 block/qcow2-cluster.c  | 38 +-
 block/qcow2-snapshot.c |  7 +++
 block/qcow2.c  | 21 +
 4 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index f8aeb08..808104c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -544,10 +544,11 @@ uint64_t 
qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
  int compressed_size);

 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-int nb_sectors, enum qcow2_discard_type type, bool full_discard);
-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
-int flags);
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+  uint64_t bytes, enum qcow2_discard_type type,
+  bool full_discard);
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+  uint64_t bytes, int flags);

 int qcow2_expand_zero_clusters(BlockDriverState *bs,
BlockDriverAmendStatusCB *status_cb,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 046fbd8..ca9e7e6 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1509,16 +1509,16 @@ static int discard_single_l2(BlockDriverState *bs, 
uint64_t offset,
 return nb_clusters;
 }

-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-int nb_sectors, enum qcow2_discard_type type, bool full_discard)
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+  uint64_t bytes, enum qcow2_discard_type type,
+  bool full_discard)
 {
 BDRVQcow2State *s = bs->opaque;
-uint64_t end_offset;
+uint64_t end_offset = offset + bytes;
 uint64_t nb_clusters;
+int64_t cleared;
 int ret;

-end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
-
 /* Caller must pass aligned values, except at image end */
 assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
 assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
@@ -1530,13 +1530,15 @@ int qcow2_discard_clusters(BlockDriverState *bs, 
uint64_t offset,

 /* Each L2 table is handled by its own loop iteration */
 while (nb_clusters > 0) {
-ret = discard_single_l2(bs, offset, nb_clusters, type, full_discard);
-if (ret < 0) {
+cleared = discard_single_l2(bs, offset, nb_clusters, type,
+full_discard);
+if (cleared < 0) {
+ret = cleared;
 goto fail;
 }

-nb_clusters -= ret;
-offset += (ret * s->cluster_size);
+nb_clusters -= cleared;
+offset += (cleared * s->cluster_size);
 }

 ret = 0;
@@ -1590,16 +1592,17 @@ static int zero_single_l2(BlockDriverState *bs, 
uint64_t offset,
 return nb_clusters;
 }

-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
-int flags)
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+  uint64_t bytes, int flags)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t 

[Qemu-block] [PATCH v6 02/10] iotests: Improve image-clear tests on non-aligned image

2017-03-07 Thread Eric Blake
Tweak 097 and 176 to operate on an image that is not cluster-aligned,
to give further coverage of clearing out an entire image.

Signed-off-by: Eric Blake 

---
v6: new patch
---
 tests/qemu-iotests/097 |  7 ---
 tests/qemu-iotests/097.out | 24 
 tests/qemu-iotests/176 |  7 ---
 tests/qemu-iotests/176.out | 24 
 4 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/tests/qemu-iotests/097 b/tests/qemu-iotests/097
index 1d28aff..2e3a886 100755
--- a/tests/qemu-iotests/097
+++ b/tests/qemu-iotests/097
@@ -66,9 +66,10 @@ echo
 echo "=== Test pass $i ==="
 echo

-TEST_IMG="$TEST_IMG.base" _make_test_img 2100M
-TEST_IMG="$TEST_IMG.itmd" _make_test_img -b "$TEST_IMG.base" 2100M
-_make_test_img -b "$TEST_IMG.itmd" 2100M
+len=$((2100 * 1024 * 1024 + 512)) # larger than 2G, and not cluster aligned
+TEST_IMG="$TEST_IMG.base" _make_test_img $len
+TEST_IMG="$TEST_IMG.itmd" _make_test_img -b "$TEST_IMG.base" $len
+_make_test_img -b "$TEST_IMG.itmd" $len

 $QEMU_IO -c 'write -P 1 0x7ffd 192k' "$TEST_IMG.base" | _filter_qemu_io
 $QEMU_IO -c 'write -P 2 0x7ffe 128k' "$TEST_IMG.itmd" | _filter_qemu_io
diff --git a/tests/qemu-iotests/097.out b/tests/qemu-iotests/097.out
index 81fc225..30a64ba 100644
--- a/tests/qemu-iotests/097.out
+++ b/tests/qemu-iotests/097.out
@@ -2,9 +2,9 @@ QA output created by 097

 === Test pass 0 ===

-Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=2202009600
-Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=2202009600 
backing_file=TEST_DIR/t.IMGFMT.base
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2202009600 
backing_file=TEST_DIR/t.IMGFMT.itmd
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=2202010112
+Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=2202010112 
backing_file=TEST_DIR/t.IMGFMT.base
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2202010112 
backing_file=TEST_DIR/t.IMGFMT.itmd
 wrote 196608/196608 bytes at offset 2147287040
 192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 131072/131072 bytes at offset 2147352576
@@ -31,9 +31,9 @@ Offset  Length  File

 === Test pass 1 ===

-Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=2202009600
-Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=2202009600 
backing_file=TEST_DIR/t.IMGFMT.base
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2202009600 
backing_file=TEST_DIR/t.IMGFMT.itmd
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=2202010112
+Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=2202010112 
backing_file=TEST_DIR/t.IMGFMT.base
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2202010112 
backing_file=TEST_DIR/t.IMGFMT.itmd
 wrote 196608/196608 bytes at offset 2147287040
 192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 131072/131072 bytes at offset 2147352576
@@ -61,9 +61,9 @@ Offset  Length  File

 === Test pass 2 ===

-Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=2202009600
-Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=2202009600 
backing_file=TEST_DIR/t.IMGFMT.base
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2202009600 
backing_file=TEST_DIR/t.IMGFMT.itmd
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=2202010112
+Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=2202010112 
backing_file=TEST_DIR/t.IMGFMT.base
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2202010112 
backing_file=TEST_DIR/t.IMGFMT.itmd
 wrote 196608/196608 bytes at offset 2147287040
 192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 131072/131072 bytes at offset 2147352576
@@ -91,9 +91,9 @@ Offset  Length  File

 === Test pass 3 ===

-Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=2202009600
-Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=2202009600 
backing_file=TEST_DIR/t.IMGFMT.base
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2202009600 
backing_file=TEST_DIR/t.IMGFMT.itmd
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=2202010112
+Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=2202010112 
backing_file=TEST_DIR/t.IMGFMT.base
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2202010112 
backing_file=TEST_DIR/t.IMGFMT.itmd
 wrote 196608/196608 bytes at offset 2147287040
 192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 131072/131072 bytes at offset 2147352576
diff --git a/tests/qemu-iotests/176 b/tests/qemu-iotests/176
index 9a70a87..39fc6fb 100755
--- a/tests/qemu-iotests/176
+++ b/tests/qemu-iotests/176
@@ -69,9 +69,10 @@ echo
 echo "=== Test pass $i ==="
 echo

-TEST_IMG="$TEST_IMG.base" _make_test_img 2100M
-TEST_IMG="$TEST_IMG.itmd" _make_test_img -b "$TEST_IMG.base" 2100M
-_make_test_img -b "$TEST_IMG.itmd" 2100M
+len=$((2100 * 1024 * 1024 + 512)) # larger than 2G, and not cluster aligned
+TEST_IMG="$TEST_IMG.base" _make_test_img $len
+TEST_IMG="$TEST_IMG.itmd" _make_test_img -b "$TEST_IMG.base" $len
+_make_test_img -b "$TEST_IMG.itmd" $len
 

Re: [Qemu-block] [PATCH v5 6/7] blkdebug: Add ability to override unmap geometries

2017-03-07 Thread Eric Blake
On 02/15/2017 10:20 AM, Kevin Wolf wrote:
> Am 14.02.2017 um 20:25 hat Eric Blake geschrieben:
>> Make it easier to simulate various unusual hardware setups (for
>> example, recent commits 3482b9b and b8d0a98 affect the Dell
>> Equallogic iSCSI with its 15M preferred and maximum unmap and
>> write zero sizing, or b2f95fe deals with the Linux loopback
>> block device having a max_transfer of 64k), by allowing blkdebug
>> to wrap any other device with further restrictions on various
>> alignments.
>>
>> Signed-off-by: Eric Blake 
>>

>>  # @inject-error:#optional array of error injection descriptions
>>  #
>>  # @set-state:   #optional array of state-change descriptions
>> @@ -2472,7 +2493,9 @@
>>  { 'struct': 'BlockdevOptionsBlkdebug',
>>'data': { 'image': 'BlockdevRef',
>>  '*config': 'str',
>> -'*align': 'int',
>> +'*align': 'int', '*max-transfer': 'int32',
>> +'*opt-write-zero': 'int32', '*max-write-zero': 'int32',
>> +'*opt-discard': 'int32', '*max-discard': 'int32',
> 
> Hm, strictly speaking, this schema allows for negative values. Should we
> document that they aren't allowed?

Doesn't power of two imply positive? But yes, I can tweak the wording.


>> -/* Set request alignment */
>> +/* Set alignment overrides */
>>  s->align = qemu_opt_get_size(opts, "align", 0);
>>  if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
>>  error_setg(errp, "Invalid alignment, align must be integer power of 
>> 2");
>>  goto fail_unref;
>>  }
>> +align = MAX(s->align, bs->file->bs->bl.request_alignment);
>> +
>> +s->max_transfer = qemu_opt_get_size(opts, "max-transfer", 0);
>> +if (s->max_transfer &&
>> +(s->max_transfer >= INT_MAX ||
>> + !QEMU_IS_ALIGNED(s->max_transfer, align))) {
>> +error_setg(errp, "Invalid max-transfer, must be multiple of align");
> 
> It's not that I'm generally a friend of unspecific messages, but in this
> case I think I would prefer being unspecific to the potentially wrong
> error message that is returned here. We're checking multiple conditions
> and the error message mentions only one of them as the reason, which may
> or may not be the right one.
> 
> This is the same for all new options.

I'm leaning towards "Cannot meet constraints with max-transfer %lld" as
being non-specific enough to cover all the cases while still describing
the problem to be fixed.  Any other wording suggestions would be
welcome, if someone has a particular bikeshed color they like.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v4] backup: allow target without .bdrv_get_info

2017-03-07 Thread Vladimir Sementsov-Ogievskiy

07.03.2017 16:47, Kevin Wolf wrote:

Am 28.02.2017 um 20:33 hat Vladimir Sementsov-Ogievskiy geschrieben:

Currently backup to nbd target is broken, as nbd doesn't have
.bdrv_get_info realization.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

v4: use error_report()
 add article

v3: fix compilation (I feel like an idiot)
 adjust wording (Fam)
 
v2: add WARNING


===

Since commit

commit 4c9bca7e39a6e07ad02c1dcde3478363344ec60b
Author: John Snow 
Date:   Thu Feb 25 15:58:30 2016 -0500

 block/backup: avoid copying less than full target clusters

backup to nbd target is broken, we have "Couldn't determine the cluster size of
the target image".

Proposed NBD protocol extension - NBD_OPT_INFO should finally solve this 
problem.
But until it is not realized, we need allow backup to nbd target due to backward
compatibility.

Furthermore, is it entirely ok to disallow backup if bds lacks .bdrv_get_info?
Which behavior should be default: to fail backup or to use default cluster size?

  block/backup.c | 12 +++-
  1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/block/backup.c b/block/backup.c
index ea38733849..ea160e9e82 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -24,6 +24,7 @@
  #include "qemu/cutils.h"
  #include "sysemu/block-backend.h"
  #include "qemu/bitmap.h"
+#include "qemu/error-report.h"
  
  #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)

  #define SLICE_TIME 1ULL /* ns */
@@ -638,7 +639,16 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
   * backup cluster size is smaller than the target cluster size. Even for
   * targets with a backing file, try to avoid COW if possible. */
  ret = bdrv_get_info(target, );
-if (ret < 0 && !target->backing) {
+if (ret == -ENOTSUP) {

I think this should be if (ret == -ENOTSUP && !target->backing) because
the warning explicitly says "doesn't have a backing file", and the case
with a backing file is already handled below (without a warning).

I can fix this while applying if you agree.


Ok. no problem




+/* Cluster size is not defined */
+error_report("WARNING: The target block device doesn't provide "
+ "information about the block size and it doesn't have a "
+ "backing file. The default block size of %u bytes is "
+ "used. If the actual block size of the target exceeds "
+ "this default, the backup may be unusable",
+ BACKUP_CLUSTER_SIZE_DEFAULT);
+job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
+} else if (ret < 0 && !target->backing) {
  error_setg_errno(errp, -ret,
  "Couldn't determine the cluster size of the target image, "
  "which has no backing file");

Kevin



--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PATCH v5 07/18] iotests: fix 097 when run with qcow

2017-03-07 Thread Daniel P. Berrange
On Tue, Mar 07, 2017 at 09:44:02AM -0600, Eric Blake wrote:
> On 02/22/2017 05:46 PM, Eric Blake wrote:
> > On 02/21/2017 05:55 AM, Daniel P. Berrange wrote:
> >> The previous commit:
> >>
> >>   commit a3e1505daec31ef56f0489f8c8fff1b8e4ca92bd
> >>   Author: Eric Blake 
> >>   Date:   Mon Dec 5 09:49:34 2016 -0600
> >>
> >> qcow2: Don't strand clusters near 2G intervals during commit
> >>
> >> extended the 097 test case so that it did two passes, once
> >> with an internal snapshot, once without.
> >>
> >> qcow (v1) does not support internal snapshots, so this change
> >> broke test 097 when run against qcow.
> >>
> >> This splits 097 in two, creating a new 173 that tests the
> 
> 173 is wrong, and 175 is now taken.
> 
> >> internal snapshot codepath, effectively putting 097 back
> >> to its content before the above commit.
> >>
> >> Reviewed-by: Max Reitz 
> >> Signed-off-by: Daniel P. Berrange 
> >> ---
> >>  tests/qemu-iotests/097 |  10 +---
> >>  tests/qemu-iotests/097.out | 125 
> >> ++--
> >>  tests/qemu-iotests/175 | 126 
> >> +
> >>  tests/qemu-iotests/175.out | 119 
> >> ++
> 
> I'll be incorporating this patch into my series on blkdebug
> improvements, as I have a further enhancement (to both 97 and what is
> now 176) to cover an image that is not cluster-aligned.  You may want to
> rebase the rest of your series (which adds two more tests) on top of
> that posting.
> 
> > I used 175 in v5 of my blkdebug series, but Kevin had comments that are
> > worth me respinning, so I'm rebasing mine to use 176.
> > https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg03049.html
> 
> Particularly since my blkdebug series will now be using 177.

/me can't help thinking that we should being giving tests names instead
of numbers so we avoid the frequent clashes between people's series :-)

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-block] [Qemu-devel] [PATCH v5 07/18] iotests: fix 097 when run with qcow

2017-03-07 Thread Eric Blake
On 02/22/2017 05:46 PM, Eric Blake wrote:
> On 02/21/2017 05:55 AM, Daniel P. Berrange wrote:
>> The previous commit:
>>
>>   commit a3e1505daec31ef56f0489f8c8fff1b8e4ca92bd
>>   Author: Eric Blake 
>>   Date:   Mon Dec 5 09:49:34 2016 -0600
>>
>> qcow2: Don't strand clusters near 2G intervals during commit
>>
>> extended the 097 test case so that it did two passes, once
>> with an internal snapshot, once without.
>>
>> qcow (v1) does not support internal snapshots, so this change
>> broke test 097 when run against qcow.
>>
>> This splits 097 in two, creating a new 173 that tests the

173 is wrong, and 175 is now taken.

>> internal snapshot codepath, effectively putting 097 back
>> to its content before the above commit.
>>
>> Reviewed-by: Max Reitz 
>> Signed-off-by: Daniel P. Berrange 
>> ---
>>  tests/qemu-iotests/097 |  10 +---
>>  tests/qemu-iotests/097.out | 125 
>> ++--
>>  tests/qemu-iotests/175 | 126 
>> +
>>  tests/qemu-iotests/175.out | 119 ++

I'll be incorporating this patch into my series on blkdebug
improvements, as I have a further enhancement (to both 97 and what is
now 176) to cover an image that is not cluster-aligned.  You may want to
rebase the rest of your series (which adds two more tests) on top of
that posting.

> I used 175 in v5 of my blkdebug series, but Kevin had comments that are
> worth me respinning, so I'm rebasing mine to use 176.
> https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg03049.html

Particularly since my blkdebug series will now be using 177.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PULL 24/27] qapi-schema: Rename SocketAddressFlat's variant tcp to inet

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

QAPI type SocketAddressFlat differs from SocketAddress pointlessly:
the discriminator value for variant InetSocketAddress is 'tcp' instead
of 'inet'.  Rename.

The type is so far only used by the Gluster block drivers.  Take care
to keep 'tcp' working in things like -drive's file.server.0.type=tcp.
The "gluster+tcp" URI scheme in pseudo-filenames stays the same.
blockdev-add changes, but it has changed incompatibly since 2.8
already.

Signed-off-by: Markus Armbruster 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/gluster.c  | 59 +---
 qapi-schema.json |  8 
 2 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 64b0217..a577dae 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -152,7 +152,7 @@ static QemuOptsList runtime_type_opts = {
 {
 .name = GLUSTER_OPT_TYPE,
 .type = QEMU_OPT_STRING,
-.help = "tcp|unix",
+.help = "inet|unix",
 },
 { /* end of list */ }
 },
@@ -171,14 +171,14 @@ static QemuOptsList runtime_unix_opts = {
 },
 };
 
-static QemuOptsList runtime_tcp_opts = {
-.name = "gluster_tcp",
-.head = QTAILQ_HEAD_INITIALIZER(runtime_tcp_opts.head),
+static QemuOptsList runtime_inet_opts = {
+.name = "gluster_inet",
+.head = QTAILQ_HEAD_INITIALIZER(runtime_inet_opts.head),
 .desc = {
 {
 .name = GLUSTER_OPT_TYPE,
 .type = QEMU_OPT_STRING,
-.help = "tcp|unix",
+.help = "inet|unix",
 },
 {
 .name = GLUSTER_OPT_HOST,
@@ -337,14 +337,14 @@ static int qemu_gluster_parse_uri(BlockdevOptionsGluster 
*gconf,
 
 /* transport */
 if (!uri->scheme || !strcmp(uri->scheme, "gluster")) {
-gsconf->type = SOCKET_ADDRESS_FLAT_TYPE_TCP;
+gsconf->type = SOCKET_ADDRESS_FLAT_TYPE_INET;
 } else if (!strcmp(uri->scheme, "gluster+tcp")) {
-gsconf->type = SOCKET_ADDRESS_FLAT_TYPE_TCP;
+gsconf->type = SOCKET_ADDRESS_FLAT_TYPE_INET;
 } else if (!strcmp(uri->scheme, "gluster+unix")) {
 gsconf->type = SOCKET_ADDRESS_FLAT_TYPE_UNIX;
 is_unix = true;
 } else if (!strcmp(uri->scheme, "gluster+rdma")) {
-gsconf->type = SOCKET_ADDRESS_FLAT_TYPE_TCP;
+gsconf->type = SOCKET_ADDRESS_FLAT_TYPE_INET;
 error_report("Warning: rdma feature is not supported, falling "
  "back to tcp");
 } else {
@@ -374,11 +374,11 @@ static int qemu_gluster_parse_uri(BlockdevOptionsGluster 
*gconf,
 }
 gsconf->u.q_unix.path = g_strdup(qp->p[0].value);
 } else {
-gsconf->u.tcp.host = g_strdup(uri->server ? uri->server : "localhost");
+gsconf->u.inet.host = g_strdup(uri->server ? uri->server : 
"localhost");
 if (uri->port) {
-gsconf->u.tcp.port = g_strdup_printf("%d", uri->port);
+gsconf->u.inet.port = g_strdup_printf("%d", uri->port);
 } else {
-gsconf->u.tcp.port = g_strdup_printf("%d", GLUSTER_DEFAULT_PORT);
+gsconf->u.inet.port = g_strdup_printf("%d", GLUSTER_DEFAULT_PORT);
 }
 }
 
@@ -416,15 +416,15 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 ret = glfs_set_volfile_server(glfs, "unix",
server->value->u.q_unix.path, 0);
 } else {
-if (parse_uint_full(server->value->u.tcp.port, , 10) < 0 ||
+if (parse_uint_full(server->value->u.inet.port, , 10) < 0 ||
 port > 65535) {
 error_setg(errp, "'%s' is not a valid port number",
-   server->value->u.tcp.port);
+   server->value->u.inet.port);
 errno = EINVAL;
 goto out;
 }
 ret = glfs_set_volfile_server(glfs, "tcp",
-   server->value->u.tcp.host,
+   server->value->u.inet.host,
(int)port);
 }
 
@@ -448,8 +448,8 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
   server->value->u.q_unix.path);
 } else {
 error_append_hint(errp, "hint: failed on host %s and port %s ",
-  server->value->u.tcp.host,
-  server->value->u.tcp.port);
+  server->value->u.inet.host,
+  server->value->u.inet.port);
 }
 }
 
@@ -536,21 +536,24 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
*gconf,
 
 }
 gsconf = g_new0(SocketAddressFlat, 1);
+if (!strcmp(ptr, "tcp")) {
+ptr = 

[Qemu-block] [PULL 23/27] qapi-schema: Rename GlusterServer to SocketAddressFlat

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

As its documentation says, it's not specific to Gluster.  Rename it,
as I'm going to use it for something else.

Signed-off-by: Markus Armbruster 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/gluster.c  | 38 +++---
 qapi-schema.json | 38 ++
 qapi/block-core.json | 46 +-
 3 files changed, 58 insertions(+), 64 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 991f18f..64b0217 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -321,7 +321,7 @@ static int parse_volume_options(BlockdevOptionsGluster 
*gconf, char *path)
 static int qemu_gluster_parse_uri(BlockdevOptionsGluster *gconf,
   const char *filename)
 {
-GlusterServer *gsconf;
+SocketAddressFlat *gsconf;
 URI *uri;
 QueryParams *qp = NULL;
 bool is_unix = false;
@@ -332,19 +332,19 @@ static int qemu_gluster_parse_uri(BlockdevOptionsGluster 
*gconf,
 return -EINVAL;
 }
 
-gconf->server = g_new0(GlusterServerList, 1);
-gconf->server->value = gsconf = g_new0(GlusterServer, 1);
+gconf->server = g_new0(SocketAddressFlatList, 1);
+gconf->server->value = gsconf = g_new0(SocketAddressFlat, 1);
 
 /* transport */
 if (!uri->scheme || !strcmp(uri->scheme, "gluster")) {
-gsconf->type = GLUSTER_TRANSPORT_TCP;
+gsconf->type = SOCKET_ADDRESS_FLAT_TYPE_TCP;
 } else if (!strcmp(uri->scheme, "gluster+tcp")) {
-gsconf->type = GLUSTER_TRANSPORT_TCP;
+gsconf->type = SOCKET_ADDRESS_FLAT_TYPE_TCP;
 } else if (!strcmp(uri->scheme, "gluster+unix")) {
-gsconf->type = GLUSTER_TRANSPORT_UNIX;
+gsconf->type = SOCKET_ADDRESS_FLAT_TYPE_UNIX;
 is_unix = true;
 } else if (!strcmp(uri->scheme, "gluster+rdma")) {
-gsconf->type = GLUSTER_TRANSPORT_TCP;
+gsconf->type = SOCKET_ADDRESS_FLAT_TYPE_TCP;
 error_report("Warning: rdma feature is not supported, falling "
  "back to tcp");
 } else {
@@ -396,7 +396,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 struct glfs *glfs;
 int ret;
 int old_errno;
-GlusterServerList *server;
+SocketAddressFlatList *server;
 unsigned long long port;
 
 glfs = glfs_find_preopened(gconf->volume);
@@ -412,7 +412,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 glfs_set_preopened(gconf->volume, glfs);
 
 for (server = gconf->server; server; server = server->next) {
-if (server->value->type  == GLUSTER_TRANSPORT_UNIX) {
+if (server->value->type  == SOCKET_ADDRESS_FLAT_TYPE_UNIX) {
 ret = glfs_set_volfile_server(glfs, "unix",
server->value->u.q_unix.path, 0);
 } else {
@@ -443,7 +443,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 error_setg(errp, "Gluster connection for volume %s, path %s failed"
  " to connect", gconf->volume, gconf->path);
 for (server = gconf->server; server; server = server->next) {
-if (server->value->type  == GLUSTER_TRANSPORT_UNIX) {
+if (server->value->type  == SOCKET_ADDRESS_FLAT_TYPE_UNIX) {
 error_append_hint(errp, "hint: failed on socket %s ",
   server->value->u.q_unix.path);
 } else {
@@ -480,8 +480,8 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
*gconf,
   QDict *options, Error **errp)
 {
 QemuOpts *opts;
-GlusterServer *gsconf = NULL;
-GlusterServerList *curr = NULL;
+SocketAddressFlat *gsconf = NULL;
+SocketAddressFlatList *curr = NULL;
 QDict *backing_options = NULL;
 Error *local_err = NULL;
 char *str = NULL;
@@ -535,9 +535,9 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
*gconf,
 goto out;
 
 }
-gsconf = g_new0(GlusterServer, 1);
-gsconf->type = qapi_enum_parse(GlusterTransport_lookup, ptr,
-   GLUSTER_TRANSPORT__MAX, -1,
+gsconf = g_new0(SocketAddressFlat, 1);
+gsconf->type = qapi_enum_parse(SocketAddressFlatType_lookup, ptr,
+   SOCKET_ADDRESS_FLAT_TYPE__MAX, -1,
_err);
 if (local_err) {
 error_append_hint(_err,
@@ -548,7 +548,7 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
*gconf,
 }
 qemu_opts_del(opts);
 
-if (gsconf->type == GLUSTER_TRANSPORT_TCP) {
+if (gsconf->type == SOCKET_ADDRESS_FLAT_TYPE_TCP) {
 /* create opts info from runtime_tcp_opts list */
 opts = qemu_opts_create(_tcp_opts, NULL, 0, 

[Qemu-block] [PULL 21/27] gluster: Don't duplicate qapi-util.c's qapi_enum_parse()

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

Signed-off-by: Markus Armbruster 
Reviewed-by: Niels de Vos 
Signed-off-by: Kevin Wolf 
---
 block/gluster.c | 30 +-
 1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 7236d59..6fbcf9e 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -12,6 +12,7 @@
 #include "block/block_int.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
+#include "qapi/util.h"
 #include "qemu/uri.h"
 #include "qemu/error-report.h"
 #include "qemu/cutils.h"
@@ -472,23 +473,6 @@ out:
 return NULL;
 }
 
-static int qapi_enum_parse(const char *opt)
-{
-int i;
-
-if (!opt) {
-return GLUSTER_TRANSPORT__MAX;
-}
-
-for (i = 0; i < GLUSTER_TRANSPORT__MAX; i++) {
-if (!strcmp(opt, GlusterTransport_lookup[i])) {
-return i;
-}
-}
-
-return i;
-}
-
 /*
  * Convert the json formatted command line into qapi.
 */
@@ -546,16 +530,20 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
*gconf,
 
 ptr = qemu_opt_get(opts, GLUSTER_OPT_TYPE);
 gsconf = g_new0(GlusterServer, 1);
-gsconf->type = qapi_enum_parse(ptr);
+gsconf->type = qapi_enum_parse(GlusterTransport_lookup, ptr,
+   GLUSTER_TRANSPORT__MAX,
+   GLUSTER_TRANSPORT__MAX,
+   _err);
 if (!ptr) {
 error_setg(_err, QERR_MISSING_PARAMETER, GLUSTER_OPT_TYPE);
 error_append_hint(_err, GERR_INDEX_HINT, i);
 goto out;
 
 }
-if (gsconf->type == GLUSTER_TRANSPORT__MAX) {
-error_setg(_err, QERR_INVALID_PARAMETER_VALUE,
-   GLUSTER_OPT_TYPE, "tcp or unix");
+if (local_err) {
+error_append_hint(_err,
+  "Parameter '%s' may be 'tcp' or 'unix'\n",
+  GLUSTER_OPT_TYPE);
 error_append_hint(_err, GERR_INDEX_HINT, i);
 goto out;
 }
-- 
1.8.3.1




[Qemu-block] [PULL 18/27] sheepdog: Use SocketAddress and socket_connect()

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

sd_parse_uri() builds a string from host and port parts for
inet_connect().  inet_connect() parses it into host, port and options.
Whether this gets exactly the same host, port and no options for all
inputs is not obvious.

Cut out the string middleman and build a SocketAddress for
socket_connect() instead.

Signed-off-by: Markus Armbruster 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block/sheepdog.c | 53 ++---
 1 file changed, 30 insertions(+), 23 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 161932d..9b1e121 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -374,8 +374,7 @@ struct BDRVSheepdogState {
 uint32_t cache_flags;
 bool discard_supported;
 
-char *host_spec;
-bool is_unix;
+SocketAddress *addr;
 int fd;
 
 CoMutex lock;
@@ -532,16 +531,12 @@ static int connect_to_sdog(BDRVSheepdogState *s, Error 
**errp)
 {
 int fd;
 
-if (s->is_unix) {
-fd = unix_connect(s->host_spec, errp);
-} else {
-fd = inet_connect(s->host_spec, errp);
+fd = socket_connect(s->addr, errp, NULL, NULL);
 
-if (fd >= 0) {
-int ret = socket_set_nodelay(fd);
-if (ret < 0) {
-error_report("%s", strerror(errno));
-}
+if (s->addr->type == SOCKET_ADDRESS_KIND_INET && fd >= 0) {
+int ret = socket_set_nodelay(fd);
+if (ret < 0) {
+error_report("%s", strerror(errno));
 }
 }
 
@@ -820,8 +815,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 case AIOCB_DISCARD_OBJ:
 switch (rsp.result) {
 case SD_RES_INVALID_PARMS:
-error_report("sheep(%s) doesn't support discard command",
- s->host_spec);
+error_report("server doesn't support discard command");
 rsp.result = SD_RES_SUCCESS;
 s->discard_supported = false;
 break;
@@ -962,8 +956,10 @@ static void sd_parse_uri(BDRVSheepdogState *s, const char 
*filename,
  Error **errp)
 {
 Error *err = NULL;
-URI *uri;
 QueryParams *qp = NULL;
+SocketAddress *addr = NULL;
+bool is_unix;
+URI *uri;
 
 uri = uri_parse(filename);
 if (!uri) {
@@ -973,11 +969,11 @@ static void sd_parse_uri(BDRVSheepdogState *s, const char 
*filename,
 
 /* transport */
 if (!strcmp(uri->scheme, "sheepdog")) {
-s->is_unix = false;
+is_unix = false;
 } else if (!strcmp(uri->scheme, "sheepdog+tcp")) {
-s->is_unix = false;
+is_unix = false;
 } else if (!strcmp(uri->scheme, "sheepdog+unix")) {
-s->is_unix = true;
+is_unix = true;
 } else {
 error_setg(, "URI scheme must be 'sheepdog', 'sheepdog+tcp',"
" or 'sheepdog+unix'");
@@ -994,8 +990,9 @@ static void sd_parse_uri(BDRVSheepdogState *s, const char 
*filename,
 }
 
 qp = query_params_parse(uri->query);
+addr = g_new0(SocketAddress, 1);
 
-if (s->is_unix) {
+if (is_unix) {
 /* sheepdog+unix:///vdiname?socket=path */
 if (uri->server || uri->port) {
 error_setg(, "URI scheme %s doesn't accept a server address",
@@ -1012,15 +1009,20 @@ static void sd_parse_uri(BDRVSheepdogState *s, const 
char *filename,
 error_setg(, "unexpected query parameters");
 goto out;
 }
-s->host_spec = g_strdup(qp->p[0].value);
+addr->type = SOCKET_ADDRESS_KIND_UNIX;
+addr->u.q_unix.data = g_new0(UnixSocketAddress, 1);
+addr->u.q_unix.data->path = g_strdup(qp->p[0].value);
 } else {
 /* sheepdog[+tcp]://[host:port]/vdiname */
 if (qp->n) {
 error_setg(, "unexpected query parameters");
 goto out;
 }
-s->host_spec = g_strdup_printf("%s:%d", uri->server ?: SD_DEFAULT_ADDR,
-   uri->port ?: SD_DEFAULT_PORT);
+addr->type = SOCKET_ADDRESS_KIND_INET;
+addr->u.inet.data = g_new0(InetSocketAddress, 1);
+addr->u.inet.data->host = g_strdup(uri->server ?: SD_DEFAULT_ADDR);
+addr->u.inet.data->port = g_strdup_printf("%d",
+uri->port ?: SD_DEFAULT_PORT);
 }
 
 /* snapshot tag */
@@ -1035,7 +1037,12 @@ static void sd_parse_uri(BDRVSheepdogState *s, const 
char *filename,
 }
 
 out:
-error_propagate(errp, err);
+if (err) {
+error_propagate(errp, err);
+qapi_free_SocketAddress(addr);
+} else {
+s->addr = addr;
+}
 if (qp) {
 query_params_free(qp);
 }
@@ -1998,7 +2005,7 @@ static void sd_close(BlockDriverState *bs)
 aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd,
false, NULL, NULL, NULL, NULL);
 closesocket(s->fd);
-

[Qemu-block] [PULL 19/27] sheepdog: Implement bdrv_parse_filename()

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

This permits configuration with driver-specific options in addition to
pseudo-filename parsed as URI.  For instance,

--drive driver=sheepdog,host=fido,vdi=dolly

instead of

--drive driver=sheepdog,file=sheepdog://fido/dolly

It's also a first step towards supporting blockdev-add.

Signed-off-by: Markus Armbruster 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/sheepdog.c | 230 +--
 1 file changed, 174 insertions(+), 56 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 9b1e121..89e98ed 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -14,6 +14,8 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qint.h"
 #include "qemu/uri.h"
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
@@ -526,6 +528,25 @@ static void sd_aio_setup(SheepdogAIOCB *acb, 
BDRVSheepdogState *s,
 QLIST_INSERT_HEAD(>inflight_aiocb_head, acb, aiocb_siblings);
 }
 
+static SocketAddress *sd_socket_address(const char *path,
+const char *host, const char *port)
+{
+SocketAddress *addr = g_new0(SocketAddress, 1);
+
+if (path) {
+addr->type = SOCKET_ADDRESS_KIND_UNIX;
+addr->u.q_unix.data = g_new0(UnixSocketAddress, 1);
+addr->u.q_unix.data->path = g_strdup(path);
+} else {
+addr->type = SOCKET_ADDRESS_KIND_INET;
+addr->u.inet.data = g_new0(InetSocketAddress, 1);
+addr->u.inet.data->host = g_strdup(host ?: SD_DEFAULT_ADDR);
+addr->u.inet.data->port = g_strdup(port ?: stringify(SD_DEFAULT_PORT));
+}
+
+return addr;
+}
+
 /* Return -EIO in case of error, file descriptor on success */
 static int connect_to_sdog(BDRVSheepdogState *s, Error **errp)
 {
@@ -951,17 +972,37 @@ static bool sd_parse_snapid_or_tag(const char *str,
 return true;
 }
 
-static void sd_parse_uri(BDRVSheepdogState *s, const char *filename,
- char *vdi, uint32_t *snapid, char *tag,
+typedef struct {
+const char *path;   /* non-null iff transport is tcp */
+const char *host;   /* valid when transport is tcp */
+int port;   /* valid when transport is tcp */
+char vdi[SD_MAX_VDI_LEN];
+char tag[SD_MAX_VDI_TAG_LEN];
+uint32_t snap_id;
+/* Remainder is only for sd_config_done() */
+URI *uri;
+QueryParams *qp;
+} SheepdogConfig;
+
+static void sd_config_done(SheepdogConfig *cfg)
+{
+if (cfg->qp) {
+query_params_free(cfg->qp);
+}
+uri_free(cfg->uri);
+}
+
+static void sd_parse_uri(SheepdogConfig *cfg, const char *filename,
  Error **errp)
 {
 Error *err = NULL;
 QueryParams *qp = NULL;
-SocketAddress *addr = NULL;
 bool is_unix;
 URI *uri;
 
-uri = uri_parse(filename);
+memset(cfg, 0, sizeof(*cfg));
+
+cfg->uri = uri = uri_parse(filename);
 if (!uri) {
 error_setg(, "invalid URI");
 goto out;
@@ -984,13 +1025,13 @@ static void sd_parse_uri(BDRVSheepdogState *s, const 
char *filename,
 error_setg(, "missing file path in URI");
 goto out;
 }
-if (g_strlcpy(vdi, uri->path + 1, SD_MAX_VDI_LEN) >= SD_MAX_VDI_LEN) {
+if (g_strlcpy(cfg->vdi, uri->path + 1, SD_MAX_VDI_LEN)
+>= SD_MAX_VDI_LEN) {
 error_setg(, "VDI name is too long");
 goto out;
 }
 
-qp = query_params_parse(uri->query);
-addr = g_new0(SocketAddress, 1);
+cfg->qp = qp = query_params_parse(uri->query);
 
 if (is_unix) {
 /* sheepdog+unix:///vdiname?socket=path */
@@ -1009,44 +1050,34 @@ static void sd_parse_uri(BDRVSheepdogState *s, const 
char *filename,
 error_setg(, "unexpected query parameters");
 goto out;
 }
-addr->type = SOCKET_ADDRESS_KIND_UNIX;
-addr->u.q_unix.data = g_new0(UnixSocketAddress, 1);
-addr->u.q_unix.data->path = g_strdup(qp->p[0].value);
+cfg->path = qp->p[0].value;
 } else {
 /* sheepdog[+tcp]://[host:port]/vdiname */
 if (qp->n) {
 error_setg(, "unexpected query parameters");
 goto out;
 }
-addr->type = SOCKET_ADDRESS_KIND_INET;
-addr->u.inet.data = g_new0(InetSocketAddress, 1);
-addr->u.inet.data->host = g_strdup(uri->server ?: SD_DEFAULT_ADDR);
-addr->u.inet.data->port = g_strdup_printf("%d",
-uri->port ?: SD_DEFAULT_PORT);
+cfg->host = uri->server;
+cfg->port = uri->port;
 }
 
 /* snapshot tag */
 if (uri->fragment) {
-if (!sd_parse_snapid_or_tag(uri->fragment, snapid, tag)) {
+if (!sd_parse_snapid_or_tag(uri->fragment,
+>snap_id, cfg->tag)) {
 error_setg(, "'%s' is not a valid 

[Qemu-block] [PULL 20/27] gluster: Drop assumptions on SocketTransport names

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

qemu_gluster_glfs_init() passes the names of QAPI enumeration type
SocketTransport to glfs_set_volfile_server().  Works, because they
were chosen to match.  But the coupling is artificial.  Use the
appropriate literal strings instead.

Signed-off-by: Markus Armbruster 
Reviewed-by: Niels de Vos 
Signed-off-by: Kevin Wolf 
---
 block/gluster.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 56b4abe..7236d59 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -412,8 +412,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 
 for (server = gconf->server; server; server = server->next) {
 if (server->value->type  == GLUSTER_TRANSPORT_UNIX) {
-ret = glfs_set_volfile_server(glfs,
-   
GlusterTransport_lookup[server->value->type],
+ret = glfs_set_volfile_server(glfs, "unix",
server->value->u.q_unix.path, 0);
 } else {
 if (parse_uint_full(server->value->u.tcp.port, , 10) < 0 ||
@@ -423,8 +422,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 errno = EINVAL;
 goto out;
 }
-ret = glfs_set_volfile_server(glfs,
-   
GlusterTransport_lookup[server->value->type],
+ret = glfs_set_volfile_server(glfs, "tcp",
server->value->u.tcp.host,
(int)port);
 }
-- 
1.8.3.1




[Qemu-block] [PULL 27/27] commit: Don't use error_abort in commit_start

2017-03-07 Thread Kevin Wolf
From: Fam Zheng 

bdrv_set_backing_hd failure needn't be abort. Since we already have
error parameter, use it.

Signed-off-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/commit.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index e57c1cf..9c41988 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -316,8 +316,20 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 goto fail;
 }
 
-bdrv_set_backing_hd(commit_top_bs, top, _abort);
-bdrv_set_backing_hd(overlay_bs, commit_top_bs, _abort);
+bdrv_set_backing_hd(commit_top_bs, top, _err);
+if (local_err) {
+bdrv_unref(commit_top_bs);
+commit_top_bs = NULL;
+error_propagate(errp, local_err);
+goto fail;
+}
+bdrv_set_backing_hd(overlay_bs, commit_top_bs, _err);
+if (local_err) {
+bdrv_unref(commit_top_bs);
+commit_top_bs = NULL;
+error_propagate(errp, local_err);
+goto fail;
+}
 
 s->commit_top_bs = commit_top_bs;
 bdrv_unref(commit_top_bs);
-- 
1.8.3.1




[Qemu-block] [PULL 26/27] block: Don't use error_abort in blk_new_open

2017-03-07 Thread Kevin Wolf
From: Fam Zheng 

We have an errp and bdrv_root_attach_child can fail permission check,
error_abort is not the best choice here.

Signed-off-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index daa7908..5742c09 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -213,7 +213,12 @@ BlockBackend *blk_new_open(const char *filename, const 
char *reference,
 }
 
 blk->root = bdrv_root_attach_child(bs, "root", _root,
-   perm, BLK_PERM_ALL, blk, _abort);
+   perm, BLK_PERM_ALL, blk, errp);
+if (!blk->root) {
+bdrv_unref(bs);
+blk_unref(blk);
+return NULL;
+}
 
 return blk;
 }
-- 
1.8.3.1




[Qemu-block] [PULL 25/27] sheepdog: Support blockdev-add

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

Signed-off-by: Markus Armbruster 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index d63be0a..9bb7f4a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2124,6 +2124,7 @@
 # @ssh: Since 2.8
 # @iscsi: Since 2.9
 # @rbd: Since 2.9
+# @sheepdog: Since 2.9
 #
 # Since: 2.0
 ##
@@ -2132,8 +2133,8 @@
 'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom',
 'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
 'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed',
-'quorum', 'raw', 'rbd', 'replication', 'ssh', 'vdi', 'vhdx', 
'vmdk',
-'vpc', 'vvfat' ] }
+'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh',
+'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsFile:
@@ -2692,6 +2693,26 @@
 '*password-secret': 'str' } }
 
 ##
+# @BlockdevOptionsSheepdog:
+#
+# Driver specific block device options for sheepdog
+#
+# @vdi: Virtual disk image name
+# @addr:The Sheepdog server to connect to
+# @snap-id: Snapshot ID
+# @tag: Snapshot tag name
+#
+# Only one of @snap-id and @tag may be present.
+#
+# Since: 2.9
+##
+{ 'struct': 'BlockdevOptionsSheepdog',
+  'data': { 'addr': 'SocketAddressFlat',
+'vdi': 'str',
+'*snap-id': 'uint32',
+'*tag': 'str' } }
+
+##
 # @ReplicationMode:
 #
 # An enumeration of replication modes.
@@ -2891,7 +2912,7 @@
   'raw':'BlockdevOptionsRaw',
   'rbd':'BlockdevOptionsRbd',
   'replication':'BlockdevOptionsReplication',
-# TODO sheepdog: Wait for structured options
+  'sheepdog':   'BlockdevOptionsSheepdog',
   'ssh':'BlockdevOptionsSsh',
   'vdi':'BlockdevOptionsGenericFormat',
   'vhdx':   'BlockdevOptionsGenericFormat',
-- 
1.8.3.1




[Qemu-block] [PULL 22/27] gluster: Plug memory leaks in qemu_gluster_parse_json()

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

To reproduce, run

$ valgrind qemu-system-x86_64 --nodefaults -S --drive 
driver=gluster,volume=testvol,path=/a/b/c,server.0.type=xxx

Signed-off-by: Markus Armbruster 
Reviewed-by: Niels de Vos 
Signed-off-by: Kevin Wolf 
---
 block/gluster.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 6fbcf9e..991f18f 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -480,7 +480,7 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
*gconf,
   QDict *options, Error **errp)
 {
 QemuOpts *opts;
-GlusterServer *gsconf;
+GlusterServer *gsconf = NULL;
 GlusterServerList *curr = NULL;
 QDict *backing_options = NULL;
 Error *local_err = NULL;
@@ -529,17 +529,16 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
*gconf,
 }
 
 ptr = qemu_opt_get(opts, GLUSTER_OPT_TYPE);
-gsconf = g_new0(GlusterServer, 1);
-gsconf->type = qapi_enum_parse(GlusterTransport_lookup, ptr,
-   GLUSTER_TRANSPORT__MAX,
-   GLUSTER_TRANSPORT__MAX,
-   _err);
 if (!ptr) {
 error_setg(_err, QERR_MISSING_PARAMETER, GLUSTER_OPT_TYPE);
 error_append_hint(_err, GERR_INDEX_HINT, i);
 goto out;
 
 }
+gsconf = g_new0(GlusterServer, 1);
+gsconf->type = qapi_enum_parse(GlusterTransport_lookup, ptr,
+   GLUSTER_TRANSPORT__MAX, -1,
+   _err);
 if (local_err) {
 error_append_hint(_err,
   "Parameter '%s' may be 'tcp' or 'unix'\n",
@@ -626,8 +625,10 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
*gconf,
 curr->next->value = gsconf;
 curr = curr->next;
 }
+gsconf = NULL;
 
-qdict_del(backing_options, str);
+QDECREF(backing_options);
+backing_options = NULL;
 g_free(str);
 str = NULL;
 }
@@ -636,11 +637,10 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster 
*gconf,
 
 out:
 error_propagate(errp, local_err);
+qapi_free_GlusterServer(gsconf);
 qemu_opts_del(opts);
-if (str) {
-qdict_del(backing_options, str);
-g_free(str);
-}
+g_free(str);
+QDECREF(backing_options);
 errno = EINVAL;
 return -errno;
 }
-- 
1.8.3.1




[Qemu-block] [PULL 16/27] sheepdog: Don't truncate long VDI name in _open(), _create()

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

sd_parse_uri() truncates long VDI names silently.  Reject them
instead.

Signed-off-by: Markus Armbruster 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Kevin Wolf 
---
 block/sheepdog.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index d3d3731..fed7264 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -985,7 +985,10 @@ static int sd_parse_uri(BDRVSheepdogState *s, const char 
*filename,
 ret = -EINVAL;
 goto out;
 }
-pstrcpy(vdi, SD_MAX_VDI_LEN, uri->path + 1);
+if (g_strlcpy(vdi, uri->path + 1, SD_MAX_VDI_LEN) >= SD_MAX_VDI_LEN) {
+ret = -EINVAL;
+goto out;
+}
 
 qp = query_params_parse(uri->query);
 if (qp->n > 1 || (s->is_unix && !qp->n) || (!s->is_unix && qp->n)) {
-- 
1.8.3.1




[Qemu-block] [PULL 12/27] sheepdog: Fix error handling in sd_snapshot_delete()

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

As a bdrv_snapshot_delete() method, sd_snapshot_delete() must set an
error and return negative errno on failure.  It sometimes returns -1,
and sometimes neglects to set an error.  It also prints error messages
with error_report().  Fix all that.

Moreover, its handling of an attempt to delete a nonexistent snapshot
is wrong: it error_report()s and succeeds.  Fix it to set an error and
return -ENOENT instead.

Signed-off-by: Markus Armbruster 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/sheepdog.c | 41 +++--
 1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index c3ee4ce..0a0803e 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2405,18 +2405,15 @@ out:
 
 #define NR_BATCHED_DISCARD 128
 
-static bool remove_objects(BDRVSheepdogState *s)
+static int remove_objects(BDRVSheepdogState *s, Error **errp)
 {
 int fd, i = 0, nr_objs = 0;
-Error *local_err = NULL;
-int ret = 0;
-bool result = true;
+int ret;
 SheepdogInode *inode = >inode;
 
-fd = connect_to_sdog(s, _err);
+fd = connect_to_sdog(s, errp);
 if (fd < 0) {
-error_report_err(local_err);
-return false;
+return fd;
 }
 
 nr_objs = count_data_objs(inode);
@@ -2446,15 +2443,15 @@ static bool remove_objects(BDRVSheepdogState *s)
 data_vdi_id[start_idx]),
false, s->cache_flags);
 if (ret < 0) {
-error_report("failed to discard snapshot inode.");
-result = false;
+error_setg(errp, "Failed to discard snapshot inode");
 goto out;
 }
 }
 
+ret = 0;
 out:
 closesocket(fd);
-return result;
+return ret;
 }
 
 static int sd_snapshot_delete(BlockDriverState *bs,
@@ -2464,7 +2461,6 @@ static int sd_snapshot_delete(BlockDriverState *bs,
 {
 unsigned long snap_id = 0;
 char snap_tag[SD_MAX_VDI_TAG_LEN];
-Error *local_err = NULL;
 int fd, ret;
 char buf[SD_MAX_VDI_LEN + SD_MAX_VDI_TAG_LEN];
 BDRVSheepdogState *s = bs->opaque;
@@ -2477,8 +2473,9 @@ static int sd_snapshot_delete(BlockDriverState *bs,
 };
 SheepdogVdiRsp *rsp = (SheepdogVdiRsp *)
 
-if (!remove_objects(s)) {
-return -1;
+ret = remove_objects(s, errp);
+if (ret) {
+return ret;
 }
 
 memset(buf, 0, sizeof(buf));
@@ -2498,36 +2495,36 @@ static int sd_snapshot_delete(BlockDriverState *bs,
 pstrcpy(buf + SD_MAX_VDI_LEN, SD_MAX_VDI_TAG_LEN, snap_tag);
 }
 
-ret = find_vdi_name(s, s->name, snap_id, snap_tag, , true,
-_err);
+ret = find_vdi_name(s, s->name, snap_id, snap_tag, , true, errp);
 if (ret) {
 return ret;
 }
 
-fd = connect_to_sdog(s, _err);
+fd = connect_to_sdog(s, errp);
 if (fd < 0) {
-error_report_err(local_err);
-return -1;
+return fd;
 }
 
 ret = do_req(fd, s->bs, (SheepdogReq *),
  buf, , );
 closesocket(fd);
 if (ret) {
+error_setg_errno(errp, -ret, "Couldn't send request to server");
 return ret;
 }
 
 switch (rsp->result) {
 case SD_RES_NO_VDI:
-error_report("%s was already deleted", s->name);
+error_setg(errp, "Can't find the snapshot");
+return -ENOENT;
 case SD_RES_SUCCESS:
 break;
 default:
-error_report("%s, %s", sd_strerror(rsp->result), s->name);
-return -1;
+error_setg(errp, "%s", sd_strerror(rsp->result));
+return -EIO;
 }
 
-return ret;
+return 0;
 }
 
 static int sd_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
-- 
1.8.3.1




[Qemu-block] [PULL 11/27] sheepdog: Defuse time bomb in sd_open() error handling

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

When qemu_opts_absorb_qdict() fails, sd_open() closes stdin, because
sd->fd is still zero.  Fortunately, qemu_opts_absorb_qdict() can't
fail, because:

1. it only fails when qemu_opt_parse() fails, and
2. the only member of runtime_opts.desc[] is a QEMU_OPT_STRING, and
3. qemu_opt_parse() can't fail for QEMU_OPT_STRING.

Defuse this ticking time bomb by jumping behind the file descriptor
cleanup on error.

Also do that for the error paths where sd->fd is still -1.  The file
descriptor cleanup happens to do nothing then, but let's not rely on
that here.

While there, rename label out to err, because it's on the error path,
not the normal path out of the function.

Signed-off-by: Markus Armbruster 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/sheepdog.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 7434710..c3ee4ce 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1392,7 +1392,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, 
int flags,
 if (local_err) {
 error_propagate(errp, local_err);
 ret = -EINVAL;
-goto out;
+goto err_no_fd;
 }
 
 filename = qemu_opt_get(opts, "filename");
@@ -1412,17 +1412,17 @@ static int sd_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 if (ret < 0) {
 error_setg(errp, "Can't parse filename");
-goto out;
+goto err_no_fd;
 }
 s->fd = get_sheep_fd(s, errp);
 if (s->fd < 0) {
 ret = s->fd;
-goto out;
+goto err_no_fd;
 }
 
 ret = find_vdi_name(s, vdi, snapid, tag, , true, errp);
 if (ret) {
-goto out;
+goto err;
 }
 
 /*
@@ -1443,7 +1443,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, 
int flags,
 fd = connect_to_sdog(s, errp);
 if (fd < 0) {
 ret = fd;
-goto out;
+goto err;
 }
 
 buf = g_malloc(SD_INODE_SIZE);
@@ -1454,7 +1454,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, 
int flags,
 
 if (ret) {
 error_setg(errp, "Can't read snapshot inode");
-goto out;
+goto err;
 }
 
 memcpy(>inode, buf, sizeof(s->inode));
@@ -1466,12 +1466,12 @@ static int sd_open(BlockDriverState *bs, QDict 
*options, int flags,
 qemu_opts_del(opts);
 g_free(buf);
 return 0;
-out:
+
+err:
 aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd,
false, NULL, NULL, NULL, NULL);
-if (s->fd >= 0) {
-closesocket(s->fd);
-}
+closesocket(s->fd);
+err_no_fd:
 qemu_opts_del(opts);
 g_free(buf);
 return ret;
-- 
1.8.3.1




[Qemu-block] [PULL 10/27] block: Fix error handling in bdrv_replace_in_backing_chain()

2017-03-07 Thread Kevin Wolf
When adding an Error parameter, bdrv_replace_in_backing_chain() would
become nothing more than a wrapper around change_parent_backing_link().
So make the latter public, renamed as bdrv_replace_node(), and remove
bdrv_replace_in_backing_chain().

Most of the callers just remove a node from the graph that they just
inserted, so they can use _abort, but completion of a mirror job
with 'replaces' set can actually fail.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
---
 block.c   | 25 ++---
 block/mirror.c| 15 +--
 blockdev.c|  2 +-
 include/block/block.h |  4 ++--
 include/block/block_int.h |  4 ++--
 5 files changed, 20 insertions(+), 30 deletions(-)

diff --git a/block.c b/block.c
index a310132..dd9ded8 100644
--- a/block.c
+++ b/block.c
@@ -2932,8 +2932,8 @@ static bool should_update_child(BdrvChild *c, 
BlockDriverState *to)
 return true;
 }
 
-static void change_parent_backing_link(BlockDriverState *from,
-   BlockDriverState *to, Error **errp)
+void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
+   Error **errp)
 {
 BdrvChild *c, *next;
 GSList *list = NULL, *p;
@@ -2941,6 +2941,9 @@ static void change_parent_backing_link(BlockDriverState 
*from,
 uint64_t perm = 0, shared = BLK_PERM_ALL;
 int ret;
 
+assert(!atomic_read(>in_flight));
+assert(!atomic_read(>in_flight));
+
 /* Make sure that @from doesn't go away until we have successfully attached
  * all of its parents to @to. */
 bdrv_ref(from);
@@ -3003,16 +3006,13 @@ void bdrv_append(BlockDriverState *bs_new, 
BlockDriverState *bs_top,
 {
 Error *local_err = NULL;
 
-assert(!atomic_read(_top->in_flight));
-assert(!atomic_read(_new->in_flight));
-
 bdrv_set_backing_hd(bs_new, bs_top, _err);
 if (local_err) {
 error_propagate(errp, local_err);
 goto out;
 }
 
-change_parent_backing_link(bs_top, bs_new, _err);
+bdrv_replace_node(bs_top, bs_new, _err);
 if (local_err) {
 error_propagate(errp, local_err);
 bdrv_set_backing_hd(bs_new, NULL, _abort);
@@ -3025,19 +3025,6 @@ out:
 bdrv_unref(bs_new);
 }
 
-void bdrv_replace_in_backing_chain(BlockDriverState *old, BlockDriverState 
*new)
-{
-assert(!bdrv_requests_pending(old));
-assert(!bdrv_requests_pending(new));
-
-bdrv_ref(old);
-
-/* FIXME Proper error handling */
-change_parent_backing_link(old, new, _abort);
-
-bdrv_unref(old);
-}
-
 static void bdrv_delete(BlockDriverState *bs)
 {
 assert(!bs->job);
diff --git a/block/mirror.c b/block/mirror.c
index f24dc51..a5d30ee 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -550,8 +550,12 @@ static void mirror_exit(BlockJob *job, void *opaque)
 /* The mirror job has no requests in flight any more, but we need to
  * drain potential other users of the BDS before changing the graph. */
 bdrv_drained_begin(target_bs);
-bdrv_replace_in_backing_chain(to_replace, target_bs);
+bdrv_replace_node(to_replace, target_bs, _err);
 bdrv_drained_end(target_bs);
+if (local_err) {
+error_report_err(local_err);
+data->ret = -EPERM;
+}
 }
 if (s->to_replace) {
 bdrv_op_unblock_all(s->to_replace, s->replace_blocker);
@@ -570,12 +574,11 @@ static void mirror_exit(BlockJob *job, void *opaque)
  * block the removal. */
 block_job_remove_all_bdrv(job);
 bdrv_child_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL);
-bdrv_replace_in_backing_chain(mirror_top_bs, backing_bs(mirror_top_bs));
+bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), _abort);
 
 /* We just changed the BDS the job BB refers to (with either or both of the
- * bdrv_replace_in_backing_chain() calls), so switch the BB back so the
- * cleanup does the right thing. We don't need any permissions any more
- * now. */
+ * bdrv_replace_node() calls), so switch the BB back so the cleanup does
+ * the right thing. We don't need any permissions any more now. */
 blk_remove_bs(job->blk);
 blk_set_perm(job->blk, 0, BLK_PERM_ALL, _abort);
 blk_insert_bs(job->blk, mirror_top_bs, _abort);
@@ -1234,7 +1237,7 @@ fail:
 }
 
 bdrv_child_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL);
-bdrv_replace_in_backing_chain(mirror_top_bs, backing_bs(mirror_top_bs));
+bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), _abort);
 }
 
 void mirror_start(const char *job_id, BlockDriverState *bs,
diff --git a/blockdev.c b/blockdev.c
index af67ce4..f1f49bd 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1806,7 +1806,7 @@ static void external_snapshot_abort(BlkActionState 
*common)
  DO_UPCAST(ExternalSnapshotState, common, common);
 if (state->new_bs) {
 

[Qemu-block] [PULL 09/27] block: Handle permission errors in change_parent_backing_link()

2017-03-07 Thread Kevin Wolf
Instead of just trying to change parents by parent over to reference @to
instead of @from, and abort()ing whenever the permissions don't allow
this, do proper permission checking beforehand and pass any error to the
callers.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
---
 block.c | 50 --
 1 file changed, 44 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index a7b09d3..a310132 100644
--- a/block.c
+++ b/block.c
@@ -2933,21 +2933,53 @@ static bool should_update_child(BdrvChild *c, 
BlockDriverState *to)
 }
 
 static void change_parent_backing_link(BlockDriverState *from,
-   BlockDriverState *to)
+   BlockDriverState *to, Error **errp)
 {
 BdrvChild *c, *next;
+GSList *list = NULL, *p;
+uint64_t old_perm, old_shared;
+uint64_t perm = 0, shared = BLK_PERM_ALL;
+int ret;
+
+/* Make sure that @from doesn't go away until we have successfully attached
+ * all of its parents to @to. */
+bdrv_ref(from);
 
+/* Put all parents into @list and calculate their cumulative permissions */
 QLIST_FOREACH_SAFE(c, >parents, next_parent, next) {
 if (!should_update_child(c, to)) {
 continue;
 }
+list = g_slist_prepend(list, c);
+perm |= c->perm;
+shared &= c->shared_perm;
+}
+
+/* Check whether the required permissions can be granted on @to, ignoring
+ * all BdrvChild in @list so that they can't block themselves. */
+ret = bdrv_check_update_perm(to, perm, shared, list, errp);
+if (ret < 0) {
+bdrv_abort_perm_update(to);
+goto out;
+}
+
+/* Now actually perform the change. We performed the permission check for
+ * all elements of @list at once, so set the permissions all at once at the
+ * very end. */
+for (p = list; p != NULL; p = p->next) {
+c = p->data;
 
 bdrv_ref(to);
-/* FIXME Are we sure that bdrv_replace_child() can't run into
- * _abort because of permissions? */
-bdrv_replace_child(c, to, true);
+bdrv_replace_child_noperm(c, to);
 bdrv_unref(from);
 }
+
+bdrv_get_cumulative_perm(to, _perm, _shared);
+bdrv_set_perm(to, old_perm | perm, old_shared | shared);
+
+out:
+g_slist_free(list);
+bdrv_unref(from);
 }
 
 /*
@@ -2980,7 +3012,12 @@ void bdrv_append(BlockDriverState *bs_new, 
BlockDriverState *bs_top,
 goto out;
 }
 
-change_parent_backing_link(bs_top, bs_new);
+change_parent_backing_link(bs_top, bs_new, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+bdrv_set_backing_hd(bs_new, NULL, _abort);
+goto out;
+}
 
 /* bs_new is now referenced by its new parents, we don't need the
  * additional reference any more. */
@@ -2995,7 +3032,8 @@ void bdrv_replace_in_backing_chain(BlockDriverState *old, 
BlockDriverState *new)
 
 bdrv_ref(old);
 
-change_parent_backing_link(old, new);
+/* FIXME Proper error handling */
+change_parent_backing_link(old, new, _abort);
 
 bdrv_unref(old);
 }
-- 
1.8.3.1




[Qemu-block] [PULL 14/27] sheepdog: Mark sd_snapshot_delete() lossage FIXME

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

sd_snapshot_delete() should delete the snapshot whose ID matches
@snapshot_id and whose name matches @name.  But that's not what it
does.  If @snapshot_id is a valid ID, it deletes the snapshot with
that ID, else it deletes the snapshot with that name.  It doesn't use
@name at all.  Add suitable FIXME comments, so someone who actually
knows Sheepdog can fix it.

Signed-off-by: Markus Armbruster 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/sheepdog.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index be3db1f..187bcd8 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2457,6 +2457,10 @@ static int sd_snapshot_delete(BlockDriverState *bs,
   const char *name,
   Error **errp)
 {
+/*
+ * FIXME should delete the snapshot matching both @snapshot_id and
+ * @name, but @name not used here
+ */
 unsigned long snap_id = 0;
 char snap_tag[SD_MAX_VDI_TAG_LEN];
 int fd, ret;
@@ -2481,6 +2485,11 @@ static int sd_snapshot_delete(BlockDriverState *bs,
 pstrcpy(buf, SD_MAX_VDI_LEN, s->name);
 ret = qemu_strtoul(snapshot_id, NULL, 10, _id);
 if (ret || snap_id > UINT32_MAX) {
+/*
+ * FIXME Since qemu_strtoul() returns -EINVAL when
+ * @snapshot_id is null, @snapshot_id is mandatory.  Correct
+ * would be to require at least one of @snapshot_id and @name.
+ */
 error_setg(errp, "Invalid snapshot ID: %s",
  snapshot_id ? snapshot_id : "");
 return -EINVAL;
@@ -2489,6 +2498,7 @@ static int sd_snapshot_delete(BlockDriverState *bs,
 if (snap_id) {
 hdr.snapid = (uint32_t) snap_id;
 } else {
+/* FIXME I suspect we should use @name here */
 pstrcpy(snap_tag, sizeof(snap_tag), snapshot_id);
 pstrcpy(buf + SD_MAX_VDI_LEN, SD_MAX_VDI_TAG_LEN, snap_tag);
 }
-- 
1.8.3.1




[Qemu-block] [PULL 04/27] mirror: Fix error path for dirty bitmap creation

2017-03-07 Thread Kevin Wolf
mirror_top_bs must be removed from the graph again when creating the
dirty bitmap fails.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
---
 block/mirror.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 001b5f0..f24dc51 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1197,10 +1197,7 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 
 s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
 if (!s->dirty_bitmap) {
-g_free(s->replaces);
-blk_unref(s->target);
-block_job_unref(>common);
-return;
+goto fail;
 }
 
 /* Required permissions are already taken with blk_new() */
-- 
1.8.3.1




[Qemu-block] [PULL 07/27] block: Factor out bdrv_replace_child_noperm()

2017-03-07 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
---
 block.c | 38 +-
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index 6dc02b8..d4570c8 100644
--- a/block.c
+++ b/block.c
@@ -1713,11 +1713,10 @@ void bdrv_format_default_perms(BlockDriverState *bs, 
BdrvChild *c,
 *nshared = shared;
 }
 
-static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs,
-   bool check_new_perm)
+static void bdrv_replace_child_noperm(BdrvChild *child,
+  BlockDriverState *new_bs)
 {
 BlockDriverState *old_bs = child->bs;
-uint64_t perm, shared_perm;
 
 if (old_bs) {
 if (old_bs->quiesce_counter && child->role->drained_end) {
@@ -1727,7 +1726,29 @@ static void bdrv_replace_child(BdrvChild *child, 
BlockDriverState *new_bs,
 child->role->detach(child);
 }
 QLIST_REMOVE(child, next_parent);
+}
+
+child->bs = new_bs;
+
+if (new_bs) {
+QLIST_INSERT_HEAD(_bs->parents, child, next_parent);
+if (new_bs->quiesce_counter && child->role->drained_begin) {
+child->role->drained_begin(child);
+}
+
+if (child->role->attach) {
+child->role->attach(child);
+}
+}
+}
 
+static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs,
+   bool check_new_perm)
+{
+BlockDriverState *old_bs = child->bs;
+uint64_t perm, shared_perm;
+
+if (old_bs) {
 /* Update permissions for old node. This is guaranteed to succeed
  * because we're just taking a parent away, so we're loosening
  * restrictions. */
@@ -1736,23 +1757,14 @@ static void bdrv_replace_child(BdrvChild *child, 
BlockDriverState *new_bs,
 bdrv_set_perm(old_bs, perm, shared_perm);
 }
 
-child->bs = new_bs;
+bdrv_replace_child_noperm(child, new_bs);
 
 if (new_bs) {
-QLIST_INSERT_HEAD(_bs->parents, child, next_parent);
-if (new_bs->quiesce_counter && child->role->drained_begin) {
-child->role->drained_begin(child);
-}
-
 bdrv_get_cumulative_perm(new_bs, , _perm);
 if (check_new_perm) {
 bdrv_check_perm(new_bs, perm, shared_perm, _abort);
 }
 bdrv_set_perm(new_bs, perm, shared_perm);
-
-if (child->role->attach) {
-child->role->attach(child);
-}
 }
 }
 
-- 
1.8.3.1




[Qemu-block] [PULL 17/27] sheepdog: Report errors in pseudo-filename more usefully

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

Errors in the pseudo-filename are all reported with the same laconic
"Can't parse filename" message.

Add real error reporting, such as:

$ qemu-system-x86_64 --drive driver=sheepdog,filename=sheepdog:///
qemu-system-x86_64: --drive driver=sheepdog,filename=sheepdog:///: missing 
file path in URI
$ qemu-system-x86_64 --drive driver=sheepdog,filename=sheepgod:///vdi
qemu-system-x86_64: --drive driver=sheepdog,filename=sheepgod:///vdi: URI 
scheme must be 'sheepdog', 'sheepdog+tcp', or 'sheepdog+unix'
$ qemu-system-x86_64 --drive 
driver=sheepdog,filename=sheepdog+unix:///vdi?socke=sheepdog.sock
qemu-system-x86_64: --drive 
driver=sheepdog,filename=sheepdog+unix:///vdi?socke=sheepdog.sock: unexpected 
query parameters

The code to translate legacy syntax to URI fails to escape URI
meta-characters.  The new error messages are misleading then.  Replace
them by the old "Can't parse filename" message.  "Internal error"
would be more honest.  Anyway, no worse than before.  Also add a FIXME
comment.

Signed-off-by: Markus Armbruster 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block/sheepdog.c | 88 +---
 1 file changed, 59 insertions(+), 29 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index fed7264..161932d 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -957,16 +957,18 @@ static bool sd_parse_snapid_or_tag(const char *str,
 return true;
 }
 
-static int sd_parse_uri(BDRVSheepdogState *s, const char *filename,
-char *vdi, uint32_t *snapid, char *tag)
+static void sd_parse_uri(BDRVSheepdogState *s, const char *filename,
+ char *vdi, uint32_t *snapid, char *tag,
+ Error **errp)
 {
+Error *err = NULL;
 URI *uri;
 QueryParams *qp = NULL;
-int ret = 0;
 
 uri = uri_parse(filename);
 if (!uri) {
-return -EINVAL;
+error_setg(, "invalid URI");
+goto out;
 }
 
 /* transport */
@@ -977,34 +979,46 @@ static int sd_parse_uri(BDRVSheepdogState *s, const char 
*filename,
 } else if (!strcmp(uri->scheme, "sheepdog+unix")) {
 s->is_unix = true;
 } else {
-ret = -EINVAL;
+error_setg(, "URI scheme must be 'sheepdog', 'sheepdog+tcp',"
+   " or 'sheepdog+unix'");
 goto out;
 }
 
 if (uri->path == NULL || !strcmp(uri->path, "/")) {
-ret = -EINVAL;
+error_setg(, "missing file path in URI");
 goto out;
 }
 if (g_strlcpy(vdi, uri->path + 1, SD_MAX_VDI_LEN) >= SD_MAX_VDI_LEN) {
-ret = -EINVAL;
+error_setg(, "VDI name is too long");
 goto out;
 }
 
 qp = query_params_parse(uri->query);
-if (qp->n > 1 || (s->is_unix && !qp->n) || (!s->is_unix && qp->n)) {
-ret = -EINVAL;
-goto out;
-}
 
 if (s->is_unix) {
 /* sheepdog+unix:///vdiname?socket=path */
-if (uri->server || uri->port || strcmp(qp->p[0].name, "socket")) {
-ret = -EINVAL;
+if (uri->server || uri->port) {
+error_setg(, "URI scheme %s doesn't accept a server address",
+   uri->scheme);
+goto out;
+}
+if (!qp->n) {
+error_setg(,
+   "URI scheme %s requires query parameter 'socket'",
+   uri->scheme);
+goto out;
+}
+if (qp->n != 1 || strcmp(qp->p[0].name, "socket")) {
+error_setg(, "unexpected query parameters");
 goto out;
 }
 s->host_spec = g_strdup(qp->p[0].value);
 } else {
 /* sheepdog[+tcp]://[host:port]/vdiname */
+if (qp->n) {
+error_setg(, "unexpected query parameters");
+goto out;
+}
 s->host_spec = g_strdup_printf("%s:%d", uri->server ?: SD_DEFAULT_ADDR,
uri->port ?: SD_DEFAULT_PORT);
 }
@@ -1012,7 +1026,8 @@ static int sd_parse_uri(BDRVSheepdogState *s, const char 
*filename,
 /* snapshot tag */
 if (uri->fragment) {
 if (!sd_parse_snapid_or_tag(uri->fragment, snapid, tag)) {
-ret = -EINVAL;
+error_setg(, "'%s' is not a valid snapshot ID",
+   uri->fragment);
 goto out;
 }
 } else {
@@ -1020,11 +1035,11 @@ static int sd_parse_uri(BDRVSheepdogState *s, const 
char *filename,
 }
 
 out:
+error_propagate(errp, err);
 if (qp) {
 query_params_free(qp);
 }
 uri_free(uri);
-return ret;
 }
 
 /*
@@ -1044,12 +1059,14 @@ out:
  * You can run VMs outside the Sheepdog cluster by specifying
  * `hostname' and `port' (experimental).
  */
-static int parse_vdiname(BDRVSheepdogState *s, const char *filename,
-  

[Qemu-block] [PULL 15/27] sheepdog: Fix snapshot ID parsing in _open(), _create, _goto()

2017-03-07 Thread Kevin Wolf
From: Markus Armbruster 

sd_parse_uri() and sd_snapshot_goto() screw up error checking after
strtoul(), and truncate long tag names silently.  Fix by replacing
those parts by new sd_parse_snapid_or_tag(), which checks more
carefully.

sd_snapshot_delete() also parses snapshot IDs, but is currently too
broken for me to touch.  Mark TODO.

Two calls of strtol() without error checking remain in
parse_redundancy().  Mark them FIXME.

More silent truncation of configuration strings remains elsewhere.
Not marked.

Signed-off-by: Markus Armbruster 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/sheepdog.c | 66 ++--
 1 file changed, 55 insertions(+), 11 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 187bcd8..d3d3731 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -914,6 +914,49 @@ static int get_sheep_fd(BDRVSheepdogState *s, Error **errp)
 return fd;
 }
 
+/*
+ * Parse numeric snapshot ID in @str
+ * If @str can't be parsed as number, return false.
+ * Else, if the number is zero or too large, set *@snapid to zero and
+ * return true.
+ * Else, set *@snapid to the number and return true.
+ */
+static bool sd_parse_snapid(const char *str, uint32_t *snapid)
+{
+unsigned long ul;
+int ret;
+
+ret = qemu_strtoul(str, NULL, 10, );
+if (ret == -ERANGE) {
+ul = ret = 0;
+}
+if (ret) {
+return false;
+}
+if (ul > UINT32_MAX) {
+ul = 0;
+}
+
+*snapid = ul;
+return true;
+}
+
+static bool sd_parse_snapid_or_tag(const char *str,
+   uint32_t *snapid, char tag[])
+{
+if (!sd_parse_snapid(str, snapid)) {
+*snapid = 0;
+if (g_strlcpy(tag, str, SD_MAX_VDI_TAG_LEN) >= SD_MAX_VDI_TAG_LEN) {
+return false;
+}
+} else if (!*snapid) {
+return false;
+} else {
+tag[0] = 0;
+}
+return true;
+}
+
 static int sd_parse_uri(BDRVSheepdogState *s, const char *filename,
 char *vdi, uint32_t *snapid, char *tag)
 {
@@ -965,9 +1008,9 @@ static int sd_parse_uri(BDRVSheepdogState *s, const char 
*filename,
 
 /* snapshot tag */
 if (uri->fragment) {
-*snapid = strtoul(uri->fragment, NULL, 10);
-if (*snapid == 0) {
-pstrcpy(tag, SD_MAX_VDI_TAG_LEN, uri->fragment);
+if (!sd_parse_snapid_or_tag(uri->fragment, snapid, tag)) {
+ret = -EINVAL;
+goto out;
 }
 } else {
 *snapid = CURRENT_VDI_ID; /* search current vdi */
@@ -1685,6 +1728,7 @@ static int parse_redundancy(BDRVSheepdogState *s, const 
char *opt)
 }
 
 copy = strtol(n1, NULL, 10);
+/* FIXME fix error checking by switching to qemu_strtol() */
 if (copy > SD_MAX_COPIES || copy < 1) {
 return -EINVAL;
 }
@@ -1699,6 +1743,7 @@ static int parse_redundancy(BDRVSheepdogState *s, const 
char *opt)
 }
 
 parity = strtol(n2, NULL, 10);
+/* FIXME fix error checking by switching to qemu_strtol() */
 if (parity >= SD_EC_MAX_STRIP || parity < 1) {
 return -EINVAL;
 }
@@ -2365,19 +2410,16 @@ static int sd_snapshot_goto(BlockDriverState *bs, const 
char *snapshot_id)
 BDRVSheepdogState *old_s;
 char tag[SD_MAX_VDI_TAG_LEN];
 uint32_t snapid = 0;
-int ret = 0;
+int ret;
+
+if (!sd_parse_snapid_or_tag(snapshot_id, , tag)) {
+return -EINVAL;
+}
 
 old_s = g_new(BDRVSheepdogState, 1);
 
 memcpy(old_s, s, sizeof(BDRVSheepdogState));
 
-snapid = strtoul(snapshot_id, NULL, 10);
-if (snapid) {
-tag[0] = 0;
-} else {
-pstrcpy(tag, sizeof(tag), snapshot_id);
-}
-
 ret = reload_inode(s, snapid, tag);
 if (ret) {
 goto out;
@@ -2483,6 +2525,7 @@ static int sd_snapshot_delete(BlockDriverState *bs,
 memset(buf, 0, sizeof(buf));
 memset(snap_tag, 0, sizeof(snap_tag));
 pstrcpy(buf, SD_MAX_VDI_LEN, s->name);
+/* TODO Use sd_parse_snapid() once this mess is cleaned up */
 ret = qemu_strtoul(snapshot_id, NULL, 10, _id);
 if (ret || snap_id > UINT32_MAX) {
 /*
@@ -2499,6 +2542,7 @@ static int sd_snapshot_delete(BlockDriverState *bs,
 hdr.snapid = (uint32_t) snap_id;
 } else {
 /* FIXME I suspect we should use @name here */
+/* FIXME don't truncate silently */
 pstrcpy(snap_tag, sizeof(snap_tag), snapshot_id);
 pstrcpy(buf + SD_MAX_VDI_LEN, SD_MAX_VDI_TAG_LEN, snap_tag);
 }
-- 
1.8.3.1




[Qemu-block] [PULL 06/27] block: Factor out should_update_child()

2017-03-07 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
---
 block.c | 42 +++---
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index f293ccb..6dc02b8 100644
--- a/block.c
+++ b/block.c
@@ -2886,28 +2886,40 @@ void bdrv_close_all(void)
 assert(QTAILQ_EMPTY(_bdrv_states));
 }
 
+static bool should_update_child(BdrvChild *c, BlockDriverState *to)
+{
+BdrvChild *to_c;
+
+if (c->role->stay_at_node) {
+return false;
+}
+
+if (c->role == _backing) {
+/* If @from is a backing file of @to, ignore the child to avoid
+ * creating a loop. We only want to change the pointer of other
+ * parents. */
+QLIST_FOREACH(to_c, >children, next) {
+if (to_c == c) {
+break;
+}
+}
+if (to_c) {
+return false;
+}
+}
+
+return true;
+}
+
 static void change_parent_backing_link(BlockDriverState *from,
BlockDriverState *to)
 {
-BdrvChild *c, *next, *to_c;
+BdrvChild *c, *next;
 
 QLIST_FOREACH_SAFE(c, >parents, next_parent, next) {
-if (c->role->stay_at_node) {
+if (!should_update_child(c, to)) {
 continue;
 }
-if (c->role == _backing) {
-/* If @from is a backing file of @to, ignore the child to avoid
- * creating a loop. We only want to change the pointer of other
- * parents. */
-QLIST_FOREACH(to_c, >children, next) {
-if (to_c == c) {
-break;
-}
-}
-if (to_c) {
-continue;
-}
-}
 
 bdrv_ref(to);
 /* FIXME Are we sure that bdrv_replace_child() can't run into
-- 
1.8.3.1




[Qemu-block] [PULL 05/27] block: Fix blockdev-snapshot error handling

2017-03-07 Thread Kevin Wolf
For blockdev-snapshot, external_snapshot_prepare() accepts an arbitrary
node reference at first and only checks later whether it already has a
backing file. Between those places, other errors can occur.

Therefore checking in external_snapshot_abort() whether state->new_bs
has a backing file is not sufficient to tell whether bdrv_append() was
already completed or not. Trying to undo the bdrv_append() when it
wasn't even executed is wrong.

Introduce a new boolean flag in the state to fix this.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
---
 blockdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 8eb4e84..af67ce4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1614,6 +1614,7 @@ typedef struct ExternalSnapshotState {
 BlockDriverState *old_bs;
 BlockDriverState *new_bs;
 AioContext *aio_context;
+bool overlay_appended;
 } ExternalSnapshotState;
 
 static void external_snapshot_prepare(BlkActionState *common,
@@ -1780,6 +1781,7 @@ static void external_snapshot_prepare(BlkActionState 
*common,
 error_propagate(errp, local_err);
 return;
 }
+state->overlay_appended = true;
 }
 
 static void external_snapshot_commit(BlkActionState *common)
@@ -1803,7 +1805,7 @@ static void external_snapshot_abort(BlkActionState 
*common)
 ExternalSnapshotState *state =
  DO_UPCAST(ExternalSnapshotState, common, common);
 if (state->new_bs) {
-if (state->new_bs->backing) {
+if (state->overlay_appended) {
 bdrv_replace_in_backing_chain(state->new_bs, state->old_bs);
 }
 }
-- 
1.8.3.1




[Qemu-block] [PULL 02/27] mirror: Fix permission problem with 'replaces'

2017-03-07 Thread Kevin Wolf
The 'replaces' option of drive-mirror can be used to mirror a Quorum
node to a new image and then let the target image replace one of the
Quorum children. In order for this graph modification to succeed, the
mirror job needs to lift its restrictions on the target node first
before actually replacing the child.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
---
 block/mirror.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 57f26c3..c9185b3 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -509,6 +509,13 @@ static void mirror_exit(BlockJob *job, void *opaque)
  * block_job_completed(). */
 bdrv_ref(src);
 bdrv_ref(mirror_top_bs);
+bdrv_ref(target_bs);
+
+/* Remove target parent that still uses BLK_PERM_WRITE/RESIZE before
+ * inserting target_bs at s->to_replace, where we might not be able to get
+ * these permissions. */
+blk_unref(s->target);
+s->target = NULL;
 
 /* We don't access the source any more. Dropping any WRITE/RESIZE is
  * required before it could become a backing file of target_bs. */
@@ -555,8 +562,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
 aio_context_release(replace_aio_context);
 }
 g_free(s->replaces);
-blk_unref(s->target);
-s->target = NULL;
+bdrv_unref(target_bs);
 
 /* Remove the mirror filter driver from the graph. Before this, get rid of
  * the blockers on the intermediate nodes so that the resulting state is
-- 
1.8.3.1




[Qemu-block] [PULL 00/27] Block layer fixes for 2.9.0-rc0

2017-03-07 Thread Kevin Wolf
The following changes since commit ff79d5e939c38677a575e3493eb9b4d36eb21865:

  Merge remote-tracking branch 'remotes/xtensa/tags/20170306-xtensa' into 
staging (2017-03-07 09:57:14 +)

are available in the git repository at:


  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to b69f00dde490e88d55f5ee731545e690b2dc61f8:

  commit: Don't use error_abort in commit_start (2017-03-07 14:53:29 +0100)


Block layer fixes for 2.9.0-rc0


Fam Zheng (2):
  block: Don't use error_abort in blk_new_open
  commit: Don't use error_abort in commit_start

Kevin Wolf (10):
  commit: Fix error handling
  mirror: Fix permission problem with 'replaces'
  mirror: Fix permissions for removing mirror_top_bs
  mirror: Fix error path for dirty bitmap creation
  block: Fix blockdev-snapshot error handling
  block: Factor out should_update_child()
  block: Factor out bdrv_replace_child_noperm()
  block: Ignore multiple children in bdrv_check_update_perm()
  block: Handle permission errors in change_parent_backing_link()
  block: Fix error handling in bdrv_replace_in_backing_chain()

Markus Armbruster (15):
  sheepdog: Defuse time bomb in sd_open() error handling
  sheepdog: Fix error handling in sd_snapshot_delete()
  sheepdog: Fix error handling sd_create()
  sheepdog: Mark sd_snapshot_delete() lossage FIXME
  sheepdog: Fix snapshot ID parsing in _open(), _create, _goto()
  sheepdog: Don't truncate long VDI name in _open(), _create()
  sheepdog: Report errors in pseudo-filename more usefully
  sheepdog: Use SocketAddress and socket_connect()
  sheepdog: Implement bdrv_parse_filename()
  gluster: Drop assumptions on SocketTransport names
  gluster: Don't duplicate qapi-util.c's qapi_enum_parse()
  gluster: Plug memory leaks in qemu_gluster_parse_json()
  qapi-schema: Rename GlusterServer to SocketAddressFlat
  qapi-schema: Rename SocketAddressFlat's variant tcp to inet
  sheepdog: Support blockdev-add

 block.c   | 182 ---
 block/block-backend.c |   7 +-
 block/commit.c|  18 +-
 block/gluster.c   | 127 ++---
 block/mirror.c|  35 ++--
 block/sheepdog.c  | 453 +-
 blockdev.c|   6 +-
 include/block/block.h |   4 +-
 include/block/block_int.h |   6 +-
 qapi-schema.json  |  38 
 qapi/block-core.json  |  73 +++-
 11 files changed, 623 insertions(+), 326 deletions(-)



[Qemu-block] [PULL 03/27] mirror: Fix permissions for removing mirror_top_bs

2017-03-07 Thread Kevin Wolf
mirror_top_bs takes write permissions on its backing file, which can
make it impossible to attach that backing file node to another parent.
However, this is exactly what needs to be done in order to remove
mirror_top_bs from the backing chain. So give up the write permission
first.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
---
 block/mirror.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index c9185b3..001b5f0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -566,8 +566,10 @@ static void mirror_exit(BlockJob *job, void *opaque)
 
 /* Remove the mirror filter driver from the graph. Before this, get rid of
  * the blockers on the intermediate nodes so that the resulting state is
- * valid. */
+ * valid. Also give up permissions on mirror_top_bs->backing, which might
+ * block the removal. */
 block_job_remove_all_bdrv(job);
+bdrv_child_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL);
 bdrv_replace_in_backing_chain(mirror_top_bs, backing_bs(mirror_top_bs));
 
 /* We just changed the BDS the job BB refers to (with either or both of the
@@ -1234,6 +1236,7 @@ fail:
 block_job_unref(>common);
 }
 
+bdrv_child_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL);
 bdrv_replace_in_backing_chain(mirror_top_bs, backing_bs(mirror_top_bs));
 }
 
-- 
1.8.3.1




[Qemu-block] [PULL 01/27] commit: Fix error handling

2017-03-07 Thread Kevin Wolf
Apparently some kind of mismerge happened in commit 8dfba279, which
broke the error handling without any real reason by removing the
assignment of the return value to ret in a blk_insert_bs() call.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Philippe Mathieu-Daudé 
---
 block/commit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/commit.c b/block/commit.c
index 22a0a4d..e57c1cf 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -364,7 +364,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 
 /* Required permissions are already taken with block_job_add_bdrv() */
 s->top = blk_new(0, BLK_PERM_ALL);
-blk_insert_bs(s->top, top, errp);
+ret = blk_insert_bs(s->top, top, errp);
 if (ret < 0) {
 goto fail;
 }
-- 
1.8.3.1




Re: [Qemu-block] [PATCH v3 5/6] replication: Implement block replication for shared disk case

2017-03-07 Thread Hailiang Zhang

Hi Stefan,

Sorry for the delayed reply.

On 2017/2/28 1:37, Stefan Hajnoczi wrote:

On Fri, Jan 20, 2017 at 11:47:59AM +0800, zhanghailiang wrote:

Just as the scenario of non-shared disk block replication,
we are going to implement block replication from many basic
blocks that are already in QEMU.
The architecture is:

  virtio-blk ||   
.--
  /  ||   | 
Secondary
 /   ||   
'--
/|| 
virtio-blk
   / || 
 |
   | ||   
replication(5)
   |NBD  >   NBD   (2)  
 |
   |  client ||server ---> hidden disk <-- 
active disk(4)
   | ^   ||  |
   |  replication(1) ||  |
   | |   ||  |
   |   +-'   ||  |
  (3)  |drive-backup sync=none   ||  |
. |   +-+   ||  |
Primary | | |   ||   backing|
' | |   ||  |
   V |   |
+---+|
|   shared disk | <--+
+---+

 1) Primary writes will read original data and forward it to Secondary
QEMU.
 2) The hidden-disk is created automatically. It buffers the original 
content
that is modified by the primary VM. It should also be an empty disk, and
the driver supports bdrv_make_empty() and backing file.
 3) Primary write requests will be written to Shared disk.
 4) Secondary write requests will be buffered in the active disk and it
will overwrite the existing sector content in the buffer.

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 


Are there any restrictions on the shared disk?  For example the -drive
cache= mode must be 'none'.  If the cache mode isn't 'none' the
secondary host might have old data in the host page cache.  The


While do checkpoint, we will call vm_stop(), in which, the bdrv_flush_all()
will be called, is it enough ?


Secondary QEMU would have an inconsistent view of the shared disk.

Are image file formats like qcow2 supported for the shared disk?  Extra


In the above scenario, it has no limitation of formats for the shared disk.


steps are required to achieve consistency, see bdrv_invalidate_cache().



Hmm, in that case, we should call bdrv_invalidate_cache_all() while checkpoint.


Thanks,
Hailiang


Stefan






Re: [Qemu-block] [PATCH] block: Constify data passed by pointer to blk_name

2017-03-07 Thread Kevin Wolf
Am 05.03.2017 um 22:44 hat Krzysztof Kozlowski geschrieben:
> blk_name() is not modifying data passed to it through pointer and it
> returns also a pointer to const so the argument can be made const for
> code safeness.
> 
> Signed-off-by: Krzysztof Kozlowski 

Thanks, applied to block-next for qemu 2.10.

Kevin



Re: [Qemu-block] [PATCH v4] backup: allow target without .bdrv_get_info

2017-03-07 Thread Kevin Wolf
Am 28.02.2017 um 20:33 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Currently backup to nbd target is broken, as nbd doesn't have
> .bdrv_get_info realization.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> v4: use error_report()
> add article
> 
> v3: fix compilation (I feel like an idiot)
> adjust wording (Fam)
> 
> v2: add WARNING
> 
> ===
> 
> Since commit
> 
> commit 4c9bca7e39a6e07ad02c1dcde3478363344ec60b
> Author: John Snow 
> Date:   Thu Feb 25 15:58:30 2016 -0500
> 
> block/backup: avoid copying less than full target clusters
> 
> backup to nbd target is broken, we have "Couldn't determine the cluster size 
> of
> the target image".
> 
> Proposed NBD protocol extension - NBD_OPT_INFO should finally solve this 
> problem.
> But until it is not realized, we need allow backup to nbd target due to 
> backward
> compatibility.
> 
> Furthermore, is it entirely ok to disallow backup if bds lacks .bdrv_get_info?
> Which behavior should be default: to fail backup or to use default cluster 
> size?
> 
>  block/backup.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index ea38733849..ea160e9e82 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -24,6 +24,7 @@
>  #include "qemu/cutils.h"
>  #include "sysemu/block-backend.h"
>  #include "qemu/bitmap.h"
> +#include "qemu/error-report.h"
>  
>  #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>  #define SLICE_TIME 1ULL /* ns */
> @@ -638,7 +639,16 @@ BlockJob *backup_job_create(const char *job_id, 
> BlockDriverState *bs,
>   * backup cluster size is smaller than the target cluster size. Even for
>   * targets with a backing file, try to avoid COW if possible. */
>  ret = bdrv_get_info(target, );
> -if (ret < 0 && !target->backing) {
> +if (ret == -ENOTSUP) {

I think this should be if (ret == -ENOTSUP && !target->backing) because
the warning explicitly says "doesn't have a backing file", and the case
with a backing file is already handled below (without a warning).

I can fix this while applying if you agree.

> +/* Cluster size is not defined */
> +error_report("WARNING: The target block device doesn't provide "
> + "information about the block size and it doesn't have a 
> "
> + "backing file. The default block size of %u bytes is "
> + "used. If the actual block size of the target exceeds "
> + "this default, the backup may be unusable",
> + BACKUP_CLUSTER_SIZE_DEFAULT);
> +job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
> +} else if (ret < 0 && !target->backing) {
>  error_setg_errno(errp, -ret,
>  "Couldn't determine the cluster size of the target image, "
>  "which has no backing file");

Kevin



Re: [Qemu-block] [PATCH v2 00/15] block: A bunch of fixes for Sheepdog and Gluster

2017-03-07 Thread Kevin Wolf
Am 06.03.2017 um 20:00 hat Markus Armbruster geschrieben:
> Bad error handling, memory leaks, and lack of blockdev-add support.

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH v2 0/2] block: More robust error handling around perm check

2017-03-07 Thread Kevin Wolf
Am 07.03.2017 um 12:07 hat Fam Zheng geschrieben:
> v2: Fix error handling in both patches. [Kevin]
> 
> These are a couple cases added with the op blocker changes, where errp is
> available but we still use error_abort. Fix them for obvious reasons.

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH 00/10] block: Op blocker fixes

2017-03-07 Thread Kevin Wolf
Am 06.03.2017 um 17:21 hat Kevin Wolf geschrieben:
> This series fixes a few problems introduced recently with the op blocker
> series. It includes mainly fix for cases where qemu would abort()
> instead of doing proper error handling previously. These changes also
> happen to result in more complete and correct permission checking.

Applied to the block branch.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH 05/10] block: Fix blockdev-snapshot error handling

2017-03-07 Thread Kevin Wolf
Am 06.03.2017 um 21:23 hat Eric Blake geschrieben:
> On 03/06/2017 10:21 AM, Kevin Wolf wrote:
> > For blockdev-snapshot, external_snapshot_prepare() accepts an arbitrary
> > node reference at first and only checks later whether it already has a
> > backing file. Between those places, other errors can occur.
> > 
> > Therefore checking in external_snapshot_abort() whether state->new_bs
> > has a backing file is not sufficient to tell whether bdrv_append() was
> > already completed or not. Trying to undo the bdrv_append() when it
> > wasn't even executed is wrong.
> > 
> > Introduce a new boolean flag in the state to fix this.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  blockdev.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> 
> Reviewed-by: Eric Blake 
> 
> By the way, how are you finding all these spots? Is it existing qemu-io
> tests that are failing? And if so, would mentioning which test exposed
> the problem being fixed be worth adding in the commit messages?  If not,
> are there some qemu-io tests to be added?

Most of the fixes here are based on qemu-iotests failures that only
appeared after fixing the error path in change_parent_backing_link(),
which is now later in this series. So there is no commit at which
qemu-iotests cases are actually failing, even though they helped me spot
some problems.

Kevin


pgp5wp89HNTc6.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH 1/4] iotests: add migration corner cases test

2017-03-07 Thread Dr. David Alan Gilbert
* Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

If I understand this correctly, these are tests that currently fail and
are fixed by the following patches;  in which case it needs reordering
to keep stuff bisectable.

Dave

> ---
>  tests/qemu-iotests/175 | 71 
> ++
>  tests/qemu-iotests/175.out |  5 
>  tests/qemu-iotests/group   |  1 +
>  3 files changed, 77 insertions(+)
>  create mode 100644 tests/qemu-iotests/175
>  create mode 100644 tests/qemu-iotests/175.out
> 
> diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
> new file mode 100644
> index 00..ef86c70db5
> --- /dev/null
> +++ b/tests/qemu-iotests/175
> @@ -0,0 +1,71 @@
> +#!/usr/bin/env python
> +#
> +# Test migration corner-cases
> +#
> +# Copyright (C) Vladimir Sementsov-Ogievskiy 2017
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see .
> +#
> +
> +import os
> +import iotests
> +import time
> +from iotests import qemu_img
> +
> +disk = os.path.join(iotests.test_dir, 'disk')
> +
> +class TestMigrationCornerCases(iotests.QMPTestCase):
> +def setUp(self):
> +qemu_img('create', '-f', iotests.imgfmt, disk, '10M')
> +self.vm = iotests.VM().add_drive(disk)
> +self.vm.launch()
> +
> +def tearDown(self):
> +self.vm.shutdown()
> +os.remove(disk)
> +
> +def test_migrate_reset_cont_write(self):
> +result = self.vm.qmp('migrate', uri='exec:cat>/dev/null')
> +self.assert_qmp(result, 'return', {})
> +time.sleep(4)
> +
> +result = self.vm.qmp('human-monitor-command',
> + command_line='system_reset')
> +self.assert_qmp(result, 'return', '')
> +
> +result = self.vm.qmp('cont')
> +self.assert_qmp(result, 'return', {})
> +
> +result = self.vm.qmp('human-monitor-command',
> + command_line='qemu-io drive0 "write 0 512"')
> +self.assert_qmp(result, 'return', '')
> +
> +def test_migrate_savevm(self):
> +result = self.vm.qmp('migrate', uri='exec:cat>/dev/null')
> +self.assert_qmp(result, 'return', {})
> +time.sleep(4)
> +
> +result = self.vm.qmp('human-monitor-command', command_line='savevm')
> +self.assert_qmp(result, 'return', '')
> +
> +def test_savevm_set_speed_savevm(self):
> +for i in range(10):
> +result = self.vm.qmp('human-monitor-command', 
> command_line='savevm')
> +self.assert_qmp(result, 'return', '')
> +
> +result = self.vm.qmp('migrate_set_speed', 
> value=9223372036853727232)
> +self.assert_qmp(result, 'return', {})
> +
> +if __name__ == '__main__':
> +iotests.main()
> diff --git a/tests/qemu-iotests/175.out b/tests/qemu-iotests/175.out
> new file mode 100644
> index 00..8d7e996700
> --- /dev/null
> +++ b/tests/qemu-iotests/175.out
> @@ -0,0 +1,5 @@
> +...
> +--
> +Ran 3 tests
> +
> +OK
> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
> index 985b9a6a36..1f4bf03185 100644
> --- a/tests/qemu-iotests/group
> +++ b/tests/qemu-iotests/group
> @@ -167,3 +167,4 @@
>  172 auto
>  173 rw auto
>  174 auto
> +175 auto quick
> -- 
> 2.11.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [PATCH 3/4] savevm: fix savevm after migration

2017-03-07 Thread Dr. David Alan Gilbert
* Kevin Wolf (kw...@redhat.com) wrote:
> Am 07.03.2017 um 10:59 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > 07.03.2017 12:53, Kevin Wolf wrote:
> > >Am 25.02.2017 um 20:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > >>After migration all drives are inactive and savevm will fail with
> > >>
> > >>qemu-kvm: block/io.c:1406: bdrv_co_do_pwritev:
> > >>Assertion `!(bs->open_flags & 0x0800)' failed.
> > >>
> > >>Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > >What's the exact state you're in? I tried to reproduce this, but just
> > >doing a live migration and then savevm on the destination works fine for
> > >me.
> > >
> > >Hm... Or do you mean on the source? In that case, I think the operation
> > >must fail, but of course more gracefully than now.
> > 
> > Yes, I mean on the source. It may not be migration for "mirgration",
> > but for dumping state to file. In that case it seems not wrong to
> > make a snapshot on source.
> 
> Yes, resuming on the source is definitely valid. I'm only questioning if
> 'savevm' (and by extension, any other command that modifies the VM or
> its images) should automagically regain control of the VM, or whether
> that should be more explicit.

How many things other than 'cont' and 'savevm' are there?

We should definitely fix it so  a savevm there doesn't assert,
but I'm also unsure if it's worth a separate explicit command.

Dave

> Kevin
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-block] [PATCH v2 1/2] block: Don't use error_abort in blk_new_open

2017-03-07 Thread Fam Zheng
We have an errp and bdrv_root_attach_child can fail permission check,
error_abort is not the best choice here.

Signed-off-by: Fam Zheng 
---
 block/block-backend.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index daa7908..5742c09 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -213,7 +213,12 @@ BlockBackend *blk_new_open(const char *filename, const 
char *reference,
 }
 
 blk->root = bdrv_root_attach_child(bs, "root", _root,
-   perm, BLK_PERM_ALL, blk, _abort);
+   perm, BLK_PERM_ALL, blk, errp);
+if (!blk->root) {
+bdrv_unref(bs);
+blk_unref(blk);
+return NULL;
+}
 
 return blk;
 }
-- 
2.9.3




[Qemu-block] [PATCH v2 0/2] block: More robust error handling around perm check

2017-03-07 Thread Fam Zheng
v2: Fix error handling in both patches. [Kevin]

These are a couple cases added with the op blocker changes, where errp is
available but we still use error_abort. Fix them for obvious reasons.

Fam Zheng (2):
  block: Don't use error_abort in blk_new_open
  commit: Don't use error_abort in commit_start

 block/block-backend.c |  7 ++-
 block/commit.c| 16 ++--
 2 files changed, 20 insertions(+), 3 deletions(-)

-- 
2.9.3




[Qemu-block] [PATCH v2 2/2] commit: Don't use error_abort in commit_start

2017-03-07 Thread Fam Zheng
bdrv_set_backing_hd failure needn't be abort. Since we already have
error parameter, use it.

Signed-off-by: Fam Zheng 
---
 block/commit.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 22a0a4d..55f5cee 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -316,8 +316,20 @@ void commit_start(const char *job_id, BlockDriverState *bs,
 goto fail;
 }
 
-bdrv_set_backing_hd(commit_top_bs, top, _abort);
-bdrv_set_backing_hd(overlay_bs, commit_top_bs, _abort);
+bdrv_set_backing_hd(commit_top_bs, top, _err);
+if (local_err) {
+bdrv_unref(commit_top_bs);
+commit_top_bs = NULL;
+error_propagate(errp, local_err);
+goto fail;
+}
+bdrv_set_backing_hd(overlay_bs, commit_top_bs, _err);
+if (local_err) {
+bdrv_unref(commit_top_bs);
+commit_top_bs = NULL;
+error_propagate(errp, local_err);
+goto fail;
+}
 
 s->commit_top_bs = commit_top_bs;
 bdrv_unref(commit_top_bs);
-- 
2.9.3




Re: [Qemu-block] [PATCH 3/4] savevm: fix savevm after migration

2017-03-07 Thread Kevin Wolf
Am 07.03.2017 um 10:59 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 07.03.2017 12:53, Kevin Wolf wrote:
> >Am 25.02.2017 um 20:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>After migration all drives are inactive and savevm will fail with
> >>
> >>qemu-kvm: block/io.c:1406: bdrv_co_do_pwritev:
> >>Assertion `!(bs->open_flags & 0x0800)' failed.
> >>
> >>Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >What's the exact state you're in? I tried to reproduce this, but just
> >doing a live migration and then savevm on the destination works fine for
> >me.
> >
> >Hm... Or do you mean on the source? In that case, I think the operation
> >must fail, but of course more gracefully than now.
> 
> Yes, I mean on the source. It may not be migration for "mirgration",
> but for dumping state to file. In that case it seems not wrong to
> make a snapshot on source.

Yes, resuming on the source is definitely valid. I'm only questioning if
'savevm' (and by extension, any other command that modifies the VM or
its images) should automagically regain control of the VM, or whether
that should be more explicit.

Kevin



Re: [Qemu-block] [PATCH] file-posix: Incoporate max_segments in block limit

2017-03-07 Thread Paolo Bonzini


On 07/03/2017 03:17, Fam Zheng wrote:
> Linux exposes a separate limit, /sys/block/.../queue/max_segments, which
> in the worst case can be more restrictive than BLKSECTGET (as they are
> two different things). Similar to the BLKSECTGET story, guests don't see
> this limit and send big requests will get -EINVAL error on SG_IO.
> 
> Lean on the safer side to clamp max_transfer according to max_segments
> and page size, because in the end what host HBA gets is the mapped host
> pages rather than a guest buffer.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/file-posix.c | 58 
> ++
>  1 file changed, 58 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 4de1abd..b615262 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -668,6 +668,59 @@ static int hdev_get_max_transfer_length(BlockDriverState 
> *bs, int fd)
>  #endif
>  }
>  
> +static int hdev_get_max_segments(BlockDriverState *bs)
> +{
> +#ifdef CONFIG_LINUX
> +char buf[32];
> +const char *end;
> +char *sysfspath, *fullpath;
> +int ret;
> +int fd = -1;
> +long max_segments;
> +
> +fullpath = realpath(bs->exact_filename, NULL);
> +if (!fullpath) {
> +return -errno;
> +}
> +if (strncmp(fullpath, "/dev/", 5) || !fullpath[5]) {
> +ret = -ENOTSUP;
> +goto out;
> +}
> +sysfspath = g_strdup_printf("/sys/block/%s/queue/max_segments",
> +[5]);

I think you cannot rely on the /dev/... path.  Luckily, there is an
alternative path to "queue" via /sys/dev/block/MAJOR:MINOR/queue, where
MAJOR:MINOR can be retrieved via fstat.  This also avoids any possible
TOC-TOU races.

Thanks,

Paolo

> +fd = open(sysfspath, O_RDONLY);
> +if (fd == -1) {
> +ret = -errno;
> +goto out;
> +}
> +do {
> +ret = read(fd, buf, sizeof(buf));
> +} while (ret == -1 && errno == EINTR);
> +if (ret < 0) {
> +ret = -errno;
> +goto out;
> +} else if (ret == 0) {
> +ret = -EIO;
> +goto out;
> +}
> +buf[ret] = 0;
> +/* The file is ended with '\n', pass 'end' to accept that. */
> +ret = qemu_strtol(buf, , 10, _segments);
> +if (ret == 0 && end && *end == '\n') {
> +ret = max_segments;
> +}
> +
> +out:
> +if (fd >= 0) {
> +close(fd);
> +}
> +free(fullpath);
> +return ret;
> +#else
> +return -ENOTSUP;
> +#endif
> +}
> +
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>  BDRVRawState *s = bs->opaque;
> @@ -679,6 +732,11 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
>  bs->bl.max_transfer = pow2floor(ret);
>  }
> +ret = hdev_get_max_segments(bs);
> +if (ret > 0) {
> +bs->bl.max_transfer = MIN(bs->bl.max_transfer,
> +  ret * getpagesize());
> +}
>  }
>  }
>  
> 



Re: [Qemu-block] [PATCH for-2.10 1/3] block: Add errp to b{lk, drv}_truncate()

2017-03-07 Thread Kevin Wolf
Am 06.03.2017 um 20:54 hat Max Reitz geschrieben:
> For one thing, this allows us to drop the error message generation from
> qemu-img.c and blockdev.c and instead have it unified in
> bdrv_truncate().
> 
> Signed-off-by: Max Reitz 

>  block/commit.c |  5 +++--
>  block/mirror.c |  2 +-

These still pass NULL after the series. Fixing it would require to add a
way to complete jobs with an Error object, but maybe we should do this
sooner or later anyway. Not necessarily part of this series, though.

>  block/vhdx.c   |  6 +++---
>  block/vpc.c|  2 +-

vpc can easily be fixed to pass the actual errp from vpc_create() to the
blk_truncate() call. In vhdx.c, the blk_truncate() call is a bit more
deeply nested, but it doesn't seem completely unreasonable there either.

> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1569,7 +1569,7 @@ static int truncate_f(BlockBackend *blk, int argc, char 
> **argv)
>  return 0;
>  }
>  
> -ret = blk_truncate(blk, offset);
> +ret = blk_truncate(blk, offset, NULL);
>  if (ret < 0) {
>  printf("truncate: %s\n", strerror(-ret));
>  return 0;

Come on, using NULL immediately before a printf()? Doing the right thing
here is trivial.

Kevin



Re: [Qemu-block] [Qemu-devel] [PATCH for-2.10 3/3] block: Add some bdrv_truncate() error messages

2017-03-07 Thread Kevin Wolf
Am 06.03.2017 um 21:53 hat Eric Blake geschrieben:
> On 03/06/2017 02:48 PM, Max Reitz wrote:
> > On 06.03.2017 21:21, Eric Blake wrote:
> >> On 03/06/2017 02:14 PM, Max Reitz wrote:
> >>
> >  if (S_ISREG(st.st_mode)) {
> >  if (ftruncate(s->fd, offset) < 0) {
> > +/* The generic error message will be fine */
> >  return -errno;
> 
>  Relying on a generic error message in the caller is awkward. I see it as
>  evidence of a partial conversion if we have an interface that requires a
>  return of a negative errno value to make up for when errp is not set.
> >>>
> >>> I'm not sure what you mean by this exactly.
> >>
> >> The ideal conversion would be having .bdrv_truncate switch to a void
> >> return; then no caller is messing with negative errno values (especially
> >> since some of them are just synthesizing errno values, rather than
> >> actually encountering one, and since some of them are probably using -1
> >> when they should be using errno). But such a conversion requires that
> >> all drivers be updated to properly set errp.
> > 
> > Not sure if that is an ideal conversion. I very much prefer negative
> > return value + error object because that allows constructs like
> > 
> > if (foo(..., errp) < 0) {
> > return;
> > }
> 
> Fair point - Markus has argued that we should convert functions from
> void to easy-to-spot return sentinels for easier logic (less boilerplate
> in creating a local error, checking it, and propagating it).
> 
> But the point remains that returning -1 is simpler to code than
> returning negative errno, when some of the existing drivers are already
> synthesizing errno.  And (temporarily) forcing a void return would make
> it easy to spot who still needs conversion, even if we then go back to
> returning int (but this time, with the simpler -1 for error, rather than
> negative errno).

Note that bdrv_truncate() is a bit special because it is called in some
paths that only have a -errno return and no Error** parameter, like
bdrv_co_pwritev() implementations, and I don't think we intend to
convert those to Error**.

Sharing some code between bdrv_truncate() and write after EOF is
something that I'd like to have anyway. Some of the things that
bdrv_truncate() does are missing in bdrv_aligned_pwritev(), and in the
age of -blockdev where you can use any node for anything, this is a bug.

Kevin


pgpIeDcgntcD7.pgp
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH 1/2] virtio-blk: Remove useless condition around g_free()

2017-03-07 Thread Fam Zheng
On Tue, 02/07 21:27, Fam Zheng wrote:
> Laszlo spotted and studied this wasteful "if". He pointed out:
> 
> The original virtio_blk_free_request needed an "if" as it accesses one
> field, since 671ec3f05655 ("virtio-blk: Convert VirtIOBlockReq.elem to
> pointer", 2014-06-11); later on in f897bf751fbd ("virtio-blk: embed
> VirtQueueElement in VirtIOBlockReq", 2014-07-09) the field became
> embedded, so the "if" became unnecessary (at which point we were using
> g_slice_free(), but it is the same.
> 
> Now drop it.
> 
> Reported-by: Laszlo Ersek 
> Signed-off-by: Fam Zheng 
> ---
>  hw/block/virtio-blk.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 702eda8..2858c31 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -42,9 +42,7 @@ static void virtio_blk_init_request(VirtIOBlock *s, 
> VirtQueue *vq,
>  
>  static void virtio_blk_free_request(VirtIOBlockReq *req)
>  {
> -if (req) {
> -g_free(req);
> -}
> +g_free(req);
>  }
>  
>  static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char 
> status)
> -- 
> 2.9.3
> 
> 

Cc: qemu-triv...@nongnu.org

(Let's drop 2/2 and perhaps merge this via trivial)

Fam



Re: [Qemu-block] [PATCH 2/4] qmp-cont: invalidate on RUN_STATE_PRELAUNCH

2017-03-07 Thread Kevin Wolf
Am 07.03.2017 um 11:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 07.03.2017 13:02, Kevin Wolf wrote:
> >Am 25.02.2017 um 20:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>We must invalidate on RUN_STATE_PRELAUNCH too, as it is available
> >>through qmp_system_reset from RUN_STATE_POSTMIGRATE. Otherwise, we will
> >>come to
> >>
> >>qemu-kvm: block/io.c:1406: bdrv_co_do_pwritev:
> >>Assertion `!(bs->open_flags & 0x0800)' failed.
> >>
> >>on the first write after vm start.
> >>
> >>Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >Wouldn't it make more sense to invalidate in qmp_system_reset() where
> >the migration states are left?
> >
> >Or maybe BDRV_O_INACTIVE could even be tied directly to runstates? Not
> >sure how realistic this one is, but if we start adding invalidate_cache
> >calls all over the place, maybe that's a sign that we need to look for a
> >more central place.
> 
> I've proposed it in cover letter) These bugs and my fixes are just
> show that something should be rethought.. I don't claim that these
> fixes are true way, they are just the simplest.

Ah, sorry, I read the cover letter a while ago, but not again before
reading the patches now. You're right that it's something that needs to
be addressed in some way. Your patches may be simple, but I think there
is no hope to ever get it completely correct with that approach because
there are so many cases (basically each QMP command is one case).

The cover letter seems to propose that INACTIVE could be a runstate. I
don't think that's quite right, because we already have many runstates
that would qualify as inactive in this sense, but that still need to be
separate. But possibly we can have a mapping from runstates to the
inactive flag.

One (implementation) problem with directly coupling inactive with
runstates is that runstate_set() can't fail at the moment, but
bdrv_inactivate/invalidate_cache() can.

Kevin



Re: [Qemu-block] [PATCH RFC 1/1] vmstate: draft fix for failed iotests case 68 and 91

2017-03-07 Thread Halil Pasic


On 03/07/2017 11:05 AM, Kevin Wolf wrote:
> Am 07.03.2017 um 10:54 hat Halil Pasic geschrieben:
>>
>>
>> On 03/07/2017 10:29 AM, Kevin Wolf wrote:
>>> Am 07.03.2017 um 03:53 hat QingFeng Hao geschrieben:
 I am not very clear about the logic in vmstate.c, but from its context in
 vmstate_save_state, it seems size should not be 0, otherwise the followed
 for loop will keep working on the same element. So I just add a simple
 check to pass that case, not sure if it's right but it can pass iotest
 case 68 and 91 now.

 The iotest's failed output is:
 068 1s ... - output mismatch (see 068.out.bad)
 --- 
 /home/haoqf/KVMonz/gitcheck/work/qemu-master/tree/qemu/tests/qemu-iotests/068.out
2017-03-06 05:52:24.817328899 +0100
 +++ 068.out.bad 2017-03-07 03:28:44.426714519 +0100
 @@ -3,9 +3,13 @@
  === Saving and reloading a VM state to/from a qcow2 image ===

  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
 +qemu-system-s390x: migration/vmstate.c:336: vmstate_save_state: Assertion 
 `first_elem || !n_elems' failed.
 +./common.config: line 109: 52497 Aborted ( if [ -n 
 "${QEMU_NEED_PID}" ]; then
 +echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
 +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
  QEMU X.Y.Z monitor - type 'help' for more information
  (qemu) savevm 0
 -(qemu) quit
 +qemu-system-s390x: Device 'virtio0' does not have the requested snapshot 
 '0'
  QEMU X.Y.Z monitor - type 'help' for more information
  (qemu) quit
  *** done

 091 1s ... [failed, exit status 1] - output mismatch (see 091.out.bad)
 --- tests/qemu-iotests/091.out 2016-08-30 12:35:04.207683276 +0200
 +++ 091.out.bad2017-03-06 13:08:03.717135426 +0100
 @@ -11,18 +11,23 @@
  
  vm1: qemu-io disk write complete
  vm1: live migration started
 -vm1: live migration completed
 -
 -=== VM 2: Post-migration, write to disk, verify running ===
 -
 -vm2: qemu-io disk write complete
 -vm2: qemu process running successfully
 -vm2: flush io, and quit
 -Check image pattern
 -read 4194304/4194304 bytes at offset 0
 -4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 -Running 'qemu-img check -r all $TEST_IMG'
 -No errors were found on the image.
 -80/16384 = 0.49% allocated, 0.00% fragmented, 0.00% compressed 
 clusters
 -Image end offset: 5570560
 -*** done
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +./common.qemu: line 110: write error: Broken pipe
 +Timeout waiting for completed on handle 0

 Signed-off-by: QingFeng Hao 
 ---
  migration/vmstate.c | 8 
  1 file changed, 8 insertions(+)

 diff --git a/migration/vmstate.c b/migration/vmstate.c
 index 78b3cd4..ff28dde 100644
 --- a/migration/vmstate.c
 +++ b/migration/vmstate.c
 @@ -106,6 +106,10 @@ int vmstate_load_state(QEMUFile *f, const 
 VMStateDescription *vmsd,
  int i, n_elems = vmstate_n_elems(opaque, field);
  int size = vmstate_size(opaque, field);
  
 +if (size == 0) {
 +field++;
 +continue;
 +}
  vmstate_handle_alloc(first_elem, field, opaque);
  if (field->flags & VMS_POINTER) {
  first_elem = *(void **)first_elem;
 @@ -322,6 +326,10 @@ void vmstate_save_state(QEMUFile *f, const 
 VMStateDescription *vmsd,
  int64_t old_offset, written_bytes;
  QJSON *vmdesc_loop = vmdesc;
  
 +if (size == 0) {
 +field++;
 +continue;
 +}
  

Re: [Qemu-block] [PATCH RFC 1/1] vmstate: draft fix for failed iotests case 68 and 91

2017-03-07 Thread Halil Pasic


On 03/07/2017 11:03 AM, Dr. David Alan Gilbert wrote:
> * Halil Pasic (pa...@linux.vnet.ibm.com) wrote:
>>
>>
>> On 03/07/2017 10:29 AM, Kevin Wolf wrote:
>>> Am 07.03.2017 um 03:53 hat QingFeng Hao geschrieben:
 I am not very clear about the logic in vmstate.c, but from its context in
 vmstate_save_state, it seems size should not be 0, otherwise the followed
 for loop will keep working on the same element. So I just add a simple
 check to pass that case, not sure if it's right but it can pass iotest
 case 68 and 91 now.

[..]

>>
>> I have looked onto the issue. It affects s390x only if we
>> are running without KVM. Basically, S390CPU.irqstate is unused
>> if we do not use KVM, and thus no buffer is allocated.
>>
>> IMHO this is a missing field and the cleaner way to handle such
>> missing fields is exist. However this used to work, and I recommended
>> QuiFeng Hao to discuss the problem upstream.
>>
>> By the way, I think, if we want to go back to the old behavior
>> and support VMS_VBUFFER with size 0 and nullptr, a much
>> cleaner way to do the fix is to change the assert to:
>>
>> assert(first_elem  || !n_elems || !size)
>>
>> Obviously the other clean way to fix is to implement exists.
>>
>> I think the migration maintainers (Juan and Dave) should make a
>> call regarding if the described usage  of VMS_BUFFER is a legal
>> or an illegal one.
> 
> So is it this code in target/s390x/machine.c ?
> 
> VMSTATE_VBUFFER_UINT32(irqstate, S390CPU, 4, NULL,
>irqstate_saved_size),
> 

Yes!

> it should be legal for that to be zero length.
> It also makes sense that if that's zero length it should be OK for
> the pointer to be NULL.
> 
> I think it's best if you do change that assert.
> 

Makes sense. I think QingFeng will agree too and send a
new version soon.

Regards,
Halil

> Dave




Re: [Qemu-block] [PATCH 2/4] qmp-cont: invalidate on RUN_STATE_PRELAUNCH

2017-03-07 Thread Vladimir Sementsov-Ogievskiy

07.03.2017 13:02, Kevin Wolf wrote:

Am 25.02.2017 um 20:31 hat Vladimir Sementsov-Ogievskiy geschrieben:

We must invalidate on RUN_STATE_PRELAUNCH too, as it is available
through qmp_system_reset from RUN_STATE_POSTMIGRATE. Otherwise, we will
come to

qemu-kvm: block/io.c:1406: bdrv_co_do_pwritev:
Assertion `!(bs->open_flags & 0x0800)' failed.

on the first write after vm start.

Signed-off-by: Vladimir Sementsov-Ogievskiy 

Wouldn't it make more sense to invalidate in qmp_system_reset() where
the migration states are left?

Or maybe BDRV_O_INACTIVE could even be tied directly to runstates? Not
sure how realistic this one is, but if we start adding invalidate_cache
calls all over the place, maybe that's a sign that we need to look for a
more central place.


I've proposed it in cover letter) These bugs and my fixes are just show 
that something should be rethought.. I don't claim that these fixes are 
true way, they are just the simplest.



Kevin


  qmp.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qmp.c b/qmp.c
index dfaabac1a6..e61795d033 100644
--- a/qmp.c
+++ b/qmp.c
@@ -198,7 +198,8 @@ void qmp_cont(Error **errp)
  /* Continuing after completed migration. Images have been inactivated to
   * allow the destination to take control. Need to get control back now. */
  if (runstate_check(RUN_STATE_FINISH_MIGRATE) ||
-runstate_check(RUN_STATE_POSTMIGRATE))
+runstate_check(RUN_STATE_POSTMIGRATE) ||
+runstate_check(RUN_STATE_PRELAUNCH))
  {
  bdrv_invalidate_cache_all(_err);
  if (local_err) {
--
2.11.1




--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH RFC 1/1] vmstate: draft fix for failed iotests case 68 and 91

2017-03-07 Thread Kevin Wolf
Am 07.03.2017 um 10:54 hat Halil Pasic geschrieben:
> 
> 
> On 03/07/2017 10:29 AM, Kevin Wolf wrote:
> > Am 07.03.2017 um 03:53 hat QingFeng Hao geschrieben:
> >> I am not very clear about the logic in vmstate.c, but from its context in
> >> vmstate_save_state, it seems size should not be 0, otherwise the followed
> >> for loop will keep working on the same element. So I just add a simple
> >> check to pass that case, not sure if it's right but it can pass iotest
> >> case 68 and 91 now.
> >>
> >> The iotest's failed output is:
> >> 068 1s ... - output mismatch (see 068.out.bad)
> >> --- 
> >> /home/haoqf/KVMonz/gitcheck/work/qemu-master/tree/qemu/tests/qemu-iotests/068.out
> >>2017-03-06 05:52:24.817328899 +0100
> >> +++ 068.out.bad 2017-03-07 03:28:44.426714519 +0100
> >> @@ -3,9 +3,13 @@
> >>  === Saving and reloading a VM state to/from a qcow2 image ===
> >>
> >>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
> >> +qemu-system-s390x: migration/vmstate.c:336: vmstate_save_state: Assertion 
> >> `first_elem || !n_elems' failed.
> >> +./common.config: line 109: 52497 Aborted ( if [ -n 
> >> "${QEMU_NEED_PID}" ]; then
> >> +echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
> >> +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
> >>  QEMU X.Y.Z monitor - type 'help' for more information
> >>  (qemu) savevm 0
> >> -(qemu) quit
> >> +qemu-system-s390x: Device 'virtio0' does not have the requested snapshot 
> >> '0'
> >>  QEMU X.Y.Z monitor - type 'help' for more information
> >>  (qemu) quit
> >>  *** done
> >>
> >> 091 1s ... [failed, exit status 1] - output mismatch (see 091.out.bad)
> >> --- tests/qemu-iotests/091.out 2016-08-30 12:35:04.207683276 +0200
> >> +++ 091.out.bad2017-03-06 13:08:03.717135426 +0100
> >> @@ -11,18 +11,23 @@
> >>  
> >>  vm1: qemu-io disk write complete
> >>  vm1: live migration started
> >> -vm1: live migration completed
> >> -
> >> -=== VM 2: Post-migration, write to disk, verify running ===
> >> -
> >> -vm2: qemu-io disk write complete
> >> -vm2: qemu process running successfully
> >> -vm2: flush io, and quit
> >> -Check image pattern
> >> -read 4194304/4194304 bytes at offset 0
> >> -4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >> -Running 'qemu-img check -r all $TEST_IMG'
> >> -No errors were found on the image.
> >> -80/16384 = 0.49% allocated, 0.00% fragmented, 0.00% compressed 
> >> clusters
> >> -Image end offset: 5570560
> >> -*** done
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +Timeout waiting for completed on handle 0
> >>
> >> Signed-off-by: QingFeng Hao 
> >> ---
> >>  migration/vmstate.c | 8 
> >>  1 file changed, 8 insertions(+)
> >>
> >> diff --git a/migration/vmstate.c b/migration/vmstate.c
> >> index 78b3cd4..ff28dde 100644
> >> --- a/migration/vmstate.c
> >> +++ b/migration/vmstate.c
> >> @@ -106,6 +106,10 @@ int vmstate_load_state(QEMUFile *f, const 
> >> VMStateDescription *vmsd,
> >>  int i, n_elems = vmstate_n_elems(opaque, field);
> >>  int size = vmstate_size(opaque, field);
> >>  
> >> +if (size == 0) {
> >> +field++;
> >> +continue;
> >> +}
> >>  vmstate_handle_alloc(first_elem, field, opaque);
> >>  if (field->flags & VMS_POINTER) {
> >>  first_elem = *(void **)first_elem;
> >> @@ -322,6 +326,10 @@ void vmstate_save_state(QEMUFile *f, const 
> >> VMStateDescription *vmsd,
> >>  int64_t old_offset, written_bytes;
> >>  QJSON *vmdesc_loop = vmdesc;
> >>  
> >> +if (size == 0) {
> >> +field++;
> >> +continue;
> >> +}
> >>  trace_vmstate_save_state_loop(vmsd->name, field->name, 
> >> 

Re: [Qemu-block] [PATCH RFC 1/1] vmstate: draft fix for failed iotests case 68 and 91

2017-03-07 Thread Dr. David Alan Gilbert
* Halil Pasic (pa...@linux.vnet.ibm.com) wrote:
> 
> 
> On 03/07/2017 10:29 AM, Kevin Wolf wrote:
> > Am 07.03.2017 um 03:53 hat QingFeng Hao geschrieben:
> >> I am not very clear about the logic in vmstate.c, but from its context in
> >> vmstate_save_state, it seems size should not be 0, otherwise the followed
> >> for loop will keep working on the same element. So I just add a simple
> >> check to pass that case, not sure if it's right but it can pass iotest
> >> case 68 and 91 now.
> >>
> >> The iotest's failed output is:
> >> 068 1s ... - output mismatch (see 068.out.bad)
> >> --- 
> >> /home/haoqf/KVMonz/gitcheck/work/qemu-master/tree/qemu/tests/qemu-iotests/068.out
> >>2017-03-06 05:52:24.817328899 +0100
> >> +++ 068.out.bad 2017-03-07 03:28:44.426714519 +0100
> >> @@ -3,9 +3,13 @@
> >>  === Saving and reloading a VM state to/from a qcow2 image ===
> >>
> >>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
> >> +qemu-system-s390x: migration/vmstate.c:336: vmstate_save_state: Assertion 
> >> `first_elem || !n_elems' failed.
> >> +./common.config: line 109: 52497 Aborted ( if [ -n 
> >> "${QEMU_NEED_PID}" ]; then
> >> +echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
> >> +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
> >>  QEMU X.Y.Z monitor - type 'help' for more information
> >>  (qemu) savevm 0
> >> -(qemu) quit
> >> +qemu-system-s390x: Device 'virtio0' does not have the requested snapshot 
> >> '0'
> >>  QEMU X.Y.Z monitor - type 'help' for more information
> >>  (qemu) quit
> >>  *** done
> >>
> >> 091 1s ... [failed, exit status 1] - output mismatch (see 091.out.bad)
> >> --- tests/qemu-iotests/091.out 2016-08-30 12:35:04.207683276 +0200
> >> +++ 091.out.bad2017-03-06 13:08:03.717135426 +0100
> >> @@ -11,18 +11,23 @@
> >>  
> >>  vm1: qemu-io disk write complete
> >>  vm1: live migration started
> >> -vm1: live migration completed
> >> -
> >> -=== VM 2: Post-migration, write to disk, verify running ===
> >> -
> >> -vm2: qemu-io disk write complete
> >> -vm2: qemu process running successfully
> >> -vm2: flush io, and quit
> >> -Check image pattern
> >> -read 4194304/4194304 bytes at offset 0
> >> -4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >> -Running 'qemu-img check -r all $TEST_IMG'
> >> -No errors were found on the image.
> >> -80/16384 = 0.49% allocated, 0.00% fragmented, 0.00% compressed 
> >> clusters
> >> -Image end offset: 5570560
> >> -*** done
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +./common.qemu: line 110: write error: Broken pipe
> >> +Timeout waiting for completed on handle 0
> >>
> >> Signed-off-by: QingFeng Hao 
> >> ---
> >>  migration/vmstate.c | 8 
> >>  1 file changed, 8 insertions(+)
> >>
> >> diff --git a/migration/vmstate.c b/migration/vmstate.c
> >> index 78b3cd4..ff28dde 100644
> >> --- a/migration/vmstate.c
> >> +++ b/migration/vmstate.c
> >> @@ -106,6 +106,10 @@ int vmstate_load_state(QEMUFile *f, const 
> >> VMStateDescription *vmsd,
> >>  int i, n_elems = vmstate_n_elems(opaque, field);
> >>  int size = vmstate_size(opaque, field);
> >>  
> >> +if (size == 0) {
> >> +field++;
> >> +continue;
> >> +}
> >>  vmstate_handle_alloc(first_elem, field, opaque);
> >>  if (field->flags & VMS_POINTER) {
> >>  first_elem = *(void **)first_elem;
> >> @@ -322,6 +326,10 @@ void vmstate_save_state(QEMUFile *f, const 
> >> VMStateDescription *vmsd,
> >>  int64_t old_offset, written_bytes;
> >>  QJSON *vmdesc_loop = vmdesc;
> >>  
> >> +if (size == 0) {
> >> +field++;
> >> +continue;
> >> +}
> >>  trace_vmstate_save_state_loop(vmsd->name, field->name, 
> >> n_elems);

Re: [Qemu-block] [PATCH 2/4] qmp-cont: invalidate on RUN_STATE_PRELAUNCH

2017-03-07 Thread Kevin Wolf
Am 25.02.2017 um 20:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
> We must invalidate on RUN_STATE_PRELAUNCH too, as it is available
> through qmp_system_reset from RUN_STATE_POSTMIGRATE. Otherwise, we will
> come to
> 
> qemu-kvm: block/io.c:1406: bdrv_co_do_pwritev:
>Assertion `!(bs->open_flags & 0x0800)' failed.
> 
> on the first write after vm start.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Wouldn't it make more sense to invalidate in qmp_system_reset() where
the migration states are left?

Or maybe BDRV_O_INACTIVE could even be tied directly to runstates? Not
sure how realistic this one is, but if we start adding invalidate_cache
calls all over the place, maybe that's a sign that we need to look for a
more central place.

Kevin

>  qmp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/qmp.c b/qmp.c
> index dfaabac1a6..e61795d033 100644
> --- a/qmp.c
> +++ b/qmp.c
> @@ -198,7 +198,8 @@ void qmp_cont(Error **errp)
>  /* Continuing after completed migration. Images have been inactivated to
>   * allow the destination to take control. Need to get control back now. 
> */
>  if (runstate_check(RUN_STATE_FINISH_MIGRATE) ||
> -runstate_check(RUN_STATE_POSTMIGRATE))
> +runstate_check(RUN_STATE_POSTMIGRATE) ||
> +runstate_check(RUN_STATE_PRELAUNCH))
>  {
>  bdrv_invalidate_cache_all(_err);
>  if (local_err) {
> -- 
> 2.11.1
> 



Re: [Qemu-block] [PATCH 3/4] savevm: fix savevm after migration

2017-03-07 Thread Vladimir Sementsov-Ogievskiy

07.03.2017 12:53, Kevin Wolf wrote:

Am 25.02.2017 um 20:31 hat Vladimir Sementsov-Ogievskiy geschrieben:

After migration all drives are inactive and savevm will fail with

qemu-kvm: block/io.c:1406: bdrv_co_do_pwritev:
Assertion `!(bs->open_flags & 0x0800)' failed.

Signed-off-by: Vladimir Sementsov-Ogievskiy 

What's the exact state you're in? I tried to reproduce this, but just
doing a live migration and then savevm on the destination works fine for
me.

Hm... Or do you mean on the source? In that case, I think the operation
must fail, but of course more gracefully than now.


Yes, I mean on the source. It may not be migration for "mirgration", but 
for dumping state to file. In that case it seems not wrong to make a 
snapshot on source.




Actually, the question that you're asking implicitly here is how the
source qemu process should be "reactivated" after a failed migration.
Currently, as far as I know, this is only with issuing a "cont" command.
It might make sense to provide a way to get control without resuming the
VM, but I doubt that adding automatic resume to every QMP command is the
right way to achieve it.

Dave, Juan, what do you think?


diff --git a/block/snapshot.c b/block/snapshot.c
index bf5c2ca5e1..256d06ac9f 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -145,7 +145,8 @@ bool bdrv_snapshot_find_by_id_and_name(BlockDriverState *bs,
  int bdrv_can_snapshot(BlockDriverState *bs)
  {
  BlockDriver *drv = bs->drv;
-if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
+if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
+(bs->open_flags & BDRV_O_INACTIVE)) {
  return 0;
  }

I wasn't sure if this doesn't disable too much, but it seems it only
makes 'info snapshots' turn up empty, which might not be nice, but maybe
tolerable.

At least it should definitely fix the assertion.


diff --git a/migration/savevm.c b/migration/savevm.c
index 5ecd264134..75e56d2d07 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2068,6 +2068,17 @@ int save_vmstate(Monitor *mon, const char *name)
  Error *local_err = NULL;
  AioContext *aio_context;
  
+if (runstate_check(RUN_STATE_FINISH_MIGRATE) ||

+runstate_check(RUN_STATE_POSTMIGRATE) ||
+runstate_check(RUN_STATE_PRELAUNCH))
+{
+bdrv_invalidate_cache_all(_err);
+if (local_err) {
+error_report_err(local_err);
+return -EINVAL;
+}
+}
+

This hunk can't go in before the more general question of implicitly or
explicitly regaining control after a failed migration is answered.

Kevin



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH RFC 1/1] vmstate: draft fix for failed iotests case 68 and 91

2017-03-07 Thread Fam Zheng
On Tue, 03/07 09:41, Dr. David Alan Gilbert wrote:
> > This is really a live migration fix, so I'm adding Juan and Dave to CC.
> > 
> > I suspect the real question is why a field with size 0 was even stored
> > in the vmstate to begin with.
> 
> Yes; which field is it that's failing?

See Qingfeng's other reply in this thread. It is irqstate.

Fam



Re: [Qemu-block] [PATCH RFC 1/1] vmstate: draft fix for failed iotests case 68 and 91

2017-03-07 Thread Halil Pasic


On 03/07/2017 10:29 AM, Kevin Wolf wrote:
> Am 07.03.2017 um 03:53 hat QingFeng Hao geschrieben:
>> I am not very clear about the logic in vmstate.c, but from its context in
>> vmstate_save_state, it seems size should not be 0, otherwise the followed
>> for loop will keep working on the same element. So I just add a simple
>> check to pass that case, not sure if it's right but it can pass iotest
>> case 68 and 91 now.
>>
>> The iotest's failed output is:
>> 068 1s ... - output mismatch (see 068.out.bad)
>> --- 
>> /home/haoqf/KVMonz/gitcheck/work/qemu-master/tree/qemu/tests/qemu-iotests/068.out
>>2017-03-06 05:52:24.817328899 +0100
>> +++ 068.out.bad 2017-03-07 03:28:44.426714519 +0100
>> @@ -3,9 +3,13 @@
>>  === Saving and reloading a VM state to/from a qcow2 image ===
>>
>>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
>> +qemu-system-s390x: migration/vmstate.c:336: vmstate_save_state: Assertion 
>> `first_elem || !n_elems' failed.
>> +./common.config: line 109: 52497 Aborted ( if [ -n 
>> "${QEMU_NEED_PID}" ]; then
>> +echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
>> +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
>>  QEMU X.Y.Z monitor - type 'help' for more information
>>  (qemu) savevm 0
>> -(qemu) quit
>> +qemu-system-s390x: Device 'virtio0' does not have the requested snapshot '0'
>>  QEMU X.Y.Z monitor - type 'help' for more information
>>  (qemu) quit
>>  *** done
>>
>> 091 1s ... [failed, exit status 1] - output mismatch (see 091.out.bad)
>> --- tests/qemu-iotests/091.out   2016-08-30 12:35:04.207683276 +0200
>> +++ 091.out.bad  2017-03-06 13:08:03.717135426 +0100
>> @@ -11,18 +11,23 @@
>>  
>>  vm1: qemu-io disk write complete
>>  vm1: live migration started
>> -vm1: live migration completed
>> -
>> -=== VM 2: Post-migration, write to disk, verify running ===
>> -
>> -vm2: qemu-io disk write complete
>> -vm2: qemu process running successfully
>> -vm2: flush io, and quit
>> -Check image pattern
>> -read 4194304/4194304 bytes at offset 0
>> -4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> -Running 'qemu-img check -r all $TEST_IMG'
>> -No errors were found on the image.
>> -80/16384 = 0.49% allocated, 0.00% fragmented, 0.00% compressed clusters
>> -Image end offset: 5570560
>> -*** done
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +./common.qemu: line 110: write error: Broken pipe
>> +Timeout waiting for completed on handle 0
>>
>> Signed-off-by: QingFeng Hao 
>> ---
>>  migration/vmstate.c | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/migration/vmstate.c b/migration/vmstate.c
>> index 78b3cd4..ff28dde 100644
>> --- a/migration/vmstate.c
>> +++ b/migration/vmstate.c
>> @@ -106,6 +106,10 @@ int vmstate_load_state(QEMUFile *f, const 
>> VMStateDescription *vmsd,
>>  int i, n_elems = vmstate_n_elems(opaque, field);
>>  int size = vmstate_size(opaque, field);
>>  
>> +if (size == 0) {
>> +field++;
>> +continue;
>> +}
>>  vmstate_handle_alloc(first_elem, field, opaque);
>>  if (field->flags & VMS_POINTER) {
>>  first_elem = *(void **)first_elem;
>> @@ -322,6 +326,10 @@ void vmstate_save_state(QEMUFile *f, const 
>> VMStateDescription *vmsd,
>>  int64_t old_offset, written_bytes;
>>  QJSON *vmdesc_loop = vmdesc;
>>  
>> +if (size == 0) {
>> +field++;
>> +continue;
>> +}
>>  trace_vmstate_save_state_loop(vmsd->name, field->name, n_elems);
>>  if (field->flags & VMS_POINTER) {
>>  first_elem = *(void **)first_elem;
> 
> This is really a live migration fix, so I'm adding Juan and Dave to CC.

You are right, this is migration stuff and has very little to do with
qemu-block.
> 
> I suspect the 

Re: [Qemu-block] [PATCH 3/4] savevm: fix savevm after migration

2017-03-07 Thread Kevin Wolf
Am 25.02.2017 um 20:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
> After migration all drives are inactive and savevm will fail with
> 
> qemu-kvm: block/io.c:1406: bdrv_co_do_pwritev:
>Assertion `!(bs->open_flags & 0x0800)' failed.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

What's the exact state you're in? I tried to reproduce this, but just
doing a live migration and then savevm on the destination works fine for
me.

Hm... Or do you mean on the source? In that case, I think the operation
must fail, but of course more gracefully than now.

Actually, the question that you're asking implicitly here is how the
source qemu process should be "reactivated" after a failed migration.
Currently, as far as I know, this is only with issuing a "cont" command.
It might make sense to provide a way to get control without resuming the
VM, but I doubt that adding automatic resume to every QMP command is the
right way to achieve it.

Dave, Juan, what do you think?

> diff --git a/block/snapshot.c b/block/snapshot.c
> index bf5c2ca5e1..256d06ac9f 100644
> --- a/block/snapshot.c
> +++ b/block/snapshot.c
> @@ -145,7 +145,8 @@ bool bdrv_snapshot_find_by_id_and_name(BlockDriverState 
> *bs,
>  int bdrv_can_snapshot(BlockDriverState *bs)
>  {
>  BlockDriver *drv = bs->drv;
> -if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
> +if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
> +(bs->open_flags & BDRV_O_INACTIVE)) {
>  return 0;
>  }

I wasn't sure if this doesn't disable too much, but it seems it only
makes 'info snapshots' turn up empty, which might not be nice, but maybe
tolerable.

At least it should definitely fix the assertion.

> diff --git a/migration/savevm.c b/migration/savevm.c
> index 5ecd264134..75e56d2d07 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2068,6 +2068,17 @@ int save_vmstate(Monitor *mon, const char *name)
>  Error *local_err = NULL;
>  AioContext *aio_context;
>  
> +if (runstate_check(RUN_STATE_FINISH_MIGRATE) ||
> +runstate_check(RUN_STATE_POSTMIGRATE) ||
> +runstate_check(RUN_STATE_PRELAUNCH))
> +{
> +bdrv_invalidate_cache_all(_err);
> +if (local_err) {
> +error_report_err(local_err);
> +return -EINVAL;
> +}
> +}
> +

This hunk can't go in before the more general question of implicitly or
explicitly regaining control after a failed migration is answered.

Kevin



Re: [Qemu-block] [PATCH RFC 1/1] vmstate: draft fix for failed iotests case 68 and 91

2017-03-07 Thread Dr. David Alan Gilbert
* Kevin Wolf (kw...@redhat.com) wrote:
> Am 07.03.2017 um 03:53 hat QingFeng Hao geschrieben:
> > I am not very clear about the logic in vmstate.c, but from its context in
> > vmstate_save_state, it seems size should not be 0, otherwise the followed
> > for loop will keep working on the same element. So I just add a simple
> > check to pass that case, not sure if it's right but it can pass iotest
> > case 68 and 91 now.
> > 
> > The iotest's failed output is:
> > 068 1s ... - output mismatch (see 068.out.bad)
> > --- 
> > /home/haoqf/KVMonz/gitcheck/work/qemu-master/tree/qemu/tests/qemu-iotests/068.out
> >2017-03-06 05:52:24.817328899 +0100
> > +++ 068.out.bad 2017-03-07 03:28:44.426714519 +0100
> > @@ -3,9 +3,13 @@
> >  === Saving and reloading a VM state to/from a qcow2 image ===
> > 
> >  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
> > +qemu-system-s390x: migration/vmstate.c:336: vmstate_save_state: Assertion 
> > `first_elem || !n_elems' failed.
> > +./common.config: line 109: 52497 Aborted ( if [ -n 
> > "${QEMU_NEED_PID}" ]; then
> > +echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
> > +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
> >  QEMU X.Y.Z monitor - type 'help' for more information
> >  (qemu) savevm 0
> > -(qemu) quit
> > +qemu-system-s390x: Device 'virtio0' does not have the requested snapshot 
> > '0'
> >  QEMU X.Y.Z monitor - type 'help' for more information
> >  (qemu) quit
> >  *** done
> > 
> > 091 1s ... [failed, exit status 1] - output mismatch (see 091.out.bad)
> > --- tests/qemu-iotests/091.out  2016-08-30 12:35:04.207683276 +0200
> > +++ 091.out.bad 2017-03-06 13:08:03.717135426 +0100
> > @@ -11,18 +11,23 @@
> >  
> >  vm1: qemu-io disk write complete
> >  vm1: live migration started
> > -vm1: live migration completed
> > -
> > -=== VM 2: Post-migration, write to disk, verify running ===
> > -
> > -vm2: qemu-io disk write complete
> > -vm2: qemu process running successfully
> > -vm2: flush io, and quit
> > -Check image pattern
> > -read 4194304/4194304 bytes at offset 0
> > -4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > -Running 'qemu-img check -r all $TEST_IMG'
> > -No errors were found on the image.
> > -80/16384 = 0.49% allocated, 0.00% fragmented, 0.00% compressed clusters
> > -Image end offset: 5570560
> > -*** done
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +./common.qemu: line 110: write error: Broken pipe
> > +Timeout waiting for completed on handle 0
> > 
> > Signed-off-by: QingFeng Hao 
> > ---
> >  migration/vmstate.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/migration/vmstate.c b/migration/vmstate.c
> > index 78b3cd4..ff28dde 100644
> > --- a/migration/vmstate.c
> > +++ b/migration/vmstate.c
> > @@ -106,6 +106,10 @@ int vmstate_load_state(QEMUFile *f, const 
> > VMStateDescription *vmsd,
> >  int i, n_elems = vmstate_n_elems(opaque, field);
> >  int size = vmstate_size(opaque, field);
> >  
> > +if (size == 0) {
> > +field++;
> > +continue;
> > +}
> >  vmstate_handle_alloc(first_elem, field, opaque);
> >  if (field->flags & VMS_POINTER) {
> >  first_elem = *(void **)first_elem;
> > @@ -322,6 +326,10 @@ void vmstate_save_state(QEMUFile *f, const 
> > VMStateDescription *vmsd,
> >  int64_t old_offset, written_bytes;
> >  QJSON *vmdesc_loop = vmdesc;
> >  
> > +if (size == 0) {
> > +field++;
> > +continue;
> > +}
> >  trace_vmstate_save_state_loop(vmsd->name, field->name, 
> > n_elems);
> >  if (field->flags & VMS_POINTER) {
> >  first_elem = *(void **)first_elem;
> 
> This is really a live migration fix, so I'm adding Juan and 

Re: [Qemu-block] [PATCH RFC 1/1] vmstate: draft fix for failed iotests case 68 and 91

2017-03-07 Thread Kevin Wolf
Am 07.03.2017 um 03:53 hat QingFeng Hao geschrieben:
> I am not very clear about the logic in vmstate.c, but from its context in
> vmstate_save_state, it seems size should not be 0, otherwise the followed
> for loop will keep working on the same element. So I just add a simple
> check to pass that case, not sure if it's right but it can pass iotest
> case 68 and 91 now.
> 
> The iotest's failed output is:
> 068 1s ... - output mismatch (see 068.out.bad)
> --- 
> /home/haoqf/KVMonz/gitcheck/work/qemu-master/tree/qemu/tests/qemu-iotests/068.out
>2017-03-06 05:52:24.817328899 +0100
> +++ 068.out.bad 2017-03-07 03:28:44.426714519 +0100
> @@ -3,9 +3,13 @@
>  === Saving and reloading a VM state to/from a qcow2 image ===
> 
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
> +qemu-system-s390x: migration/vmstate.c:336: vmstate_save_state: Assertion 
> `first_elem || !n_elems' failed.
> +./common.config: line 109: 52497 Aborted ( if [ -n 
> "${QEMU_NEED_PID}" ]; then
> +echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
> +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
>  QEMU X.Y.Z monitor - type 'help' for more information
>  (qemu) savevm 0
> -(qemu) quit
> +qemu-system-s390x: Device 'virtio0' does not have the requested snapshot '0'
>  QEMU X.Y.Z monitor - type 'help' for more information
>  (qemu) quit
>  *** done
> 
> 091 1s ... [failed, exit status 1] - output mismatch (see 091.out.bad)
> --- tests/qemu-iotests/091.out2016-08-30 12:35:04.207683276 +0200
> +++ 091.out.bad   2017-03-06 13:08:03.717135426 +0100
> @@ -11,18 +11,23 @@
>  
>  vm1: qemu-io disk write complete
>  vm1: live migration started
> -vm1: live migration completed
> -
> -=== VM 2: Post-migration, write to disk, verify running ===
> -
> -vm2: qemu-io disk write complete
> -vm2: qemu process running successfully
> -vm2: flush io, and quit
> -Check image pattern
> -read 4194304/4194304 bytes at offset 0
> -4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -Running 'qemu-img check -r all $TEST_IMG'
> -No errors were found on the image.
> -80/16384 = 0.49% allocated, 0.00% fragmented, 0.00% compressed clusters
> -Image end offset: 5570560
> -*** done
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +./common.qemu: line 110: write error: Broken pipe
> +Timeout waiting for completed on handle 0
> 
> Signed-off-by: QingFeng Hao 
> ---
>  migration/vmstate.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/migration/vmstate.c b/migration/vmstate.c
> index 78b3cd4..ff28dde 100644
> --- a/migration/vmstate.c
> +++ b/migration/vmstate.c
> @@ -106,6 +106,10 @@ int vmstate_load_state(QEMUFile *f, const 
> VMStateDescription *vmsd,
>  int i, n_elems = vmstate_n_elems(opaque, field);
>  int size = vmstate_size(opaque, field);
>  
> +if (size == 0) {
> +field++;
> +continue;
> +}
>  vmstate_handle_alloc(first_elem, field, opaque);
>  if (field->flags & VMS_POINTER) {
>  first_elem = *(void **)first_elem;
> @@ -322,6 +326,10 @@ void vmstate_save_state(QEMUFile *f, const 
> VMStateDescription *vmsd,
>  int64_t old_offset, written_bytes;
>  QJSON *vmdesc_loop = vmdesc;
>  
> +if (size == 0) {
> +field++;
> +continue;
> +}
>  trace_vmstate_save_state_loop(vmsd->name, field->name, n_elems);
>  if (field->flags & VMS_POINTER) {
>  first_elem = *(void **)first_elem;

This is really a live migration fix, so I'm adding Juan and Dave to CC.

I suspect the real question is why a field with size 0 was even stored
in the vmstate to begin with.

Kevin



Re: [Qemu-block] [PATCH 2/4] qmp-cont: invalidate on RUN_STATE_PRELAUNCH

2017-03-07 Thread Fam Zheng
On Sat, 02/25 22:31, Vladimir Sementsov-Ogievskiy wrote:
> We must invalidate on RUN_STATE_PRELAUNCH too, as it is available
> through qmp_system_reset from RUN_STATE_POSTMIGRATE. Otherwise, we will
> come to
> 
> qemu-kvm: block/io.c:1406: bdrv_co_do_pwritev:
>Assertion `!(bs->open_flags & 0x0800)' failed.
> 
> on the first write after vm start.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  qmp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/qmp.c b/qmp.c
> index dfaabac1a6..e61795d033 100644
> --- a/qmp.c
> +++ b/qmp.c
> @@ -198,7 +198,8 @@ void qmp_cont(Error **errp)
>  /* Continuing after completed migration. Images have been inactivated to
>   * allow the destination to take control. Need to get control back now. 
> */
>  if (runstate_check(RUN_STATE_FINISH_MIGRATE) ||
> -runstate_check(RUN_STATE_POSTMIGRATE))
> +runstate_check(RUN_STATE_POSTMIGRATE) ||
> +runstate_check(RUN_STATE_PRELAUNCH))
>  {
>  bdrv_invalidate_cache_all(_err);
>  if (local_err) {
> -- 
> 2.11.1
> 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [PATCH 1/4] iotests: add migration corner cases test

2017-03-07 Thread Fam Zheng
On Sat, 02/25 22:31, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/175 | 71 
> ++
>  tests/qemu-iotests/175.out |  5 
>  tests/qemu-iotests/group   |  1 +
>  3 files changed, 77 insertions(+)
>  create mode 100644 tests/qemu-iotests/175
>  create mode 100644 tests/qemu-iotests/175.out
> 
> diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
> new file mode 100644
> index 00..ef86c70db5
> --- /dev/null
> +++ b/tests/qemu-iotests/175
> @@ -0,0 +1,71 @@
> +#!/usr/bin/env python
> +#
> +# Test migration corner-cases
> +#
> +# Copyright (C) Vladimir Sementsov-Ogievskiy 2017
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see .
> +#
> +
> +import os
> +import iotests
> +import time
> +from iotests import qemu_img
> +
> +disk = os.path.join(iotests.test_dir, 'disk')
> +
> +class TestMigrationCornerCases(iotests.QMPTestCase):
> +def setUp(self):
> +qemu_img('create', '-f', iotests.imgfmt, disk, '10M')
> +self.vm = iotests.VM().add_drive(disk)
> +self.vm.launch()
> +
> +def tearDown(self):
> +self.vm.shutdown()
> +os.remove(disk)
> +
> +def test_migrate_reset_cont_write(self):
> +result = self.vm.qmp('migrate', uri='exec:cat>/dev/null')
> +self.assert_qmp(result, 'return', {})
> +time.sleep(4)

Should you query the migration status instead of blink sleep?

> +
> +result = self.vm.qmp('human-monitor-command',
> + command_line='system_reset')
> +self.assert_qmp(result, 'return', '')
> +
> +result = self.vm.qmp('cont')
> +self.assert_qmp(result, 'return', {})
> +
> +result = self.vm.qmp('human-monitor-command',
> + command_line='qemu-io drive0 "write 0 512"')
> +self.assert_qmp(result, 'return', '')
> +
> +def test_migrate_savevm(self):
> +result = self.vm.qmp('migrate', uri='exec:cat>/dev/null')
> +self.assert_qmp(result, 'return', {})
> +time.sleep(4)

Ditto.

> +
> +result = self.vm.qmp('human-monitor-command', command_line='savevm')
> +self.assert_qmp(result, 'return', '')
> +
> +def test_savevm_set_speed_savevm(self):
> +for i in range(10):
> +result = self.vm.qmp('human-monitor-command', 
> command_line='savevm')
> +self.assert_qmp(result, 'return', '')
> +
> +result = self.vm.qmp('migrate_set_speed', 
> value=9223372036853727232)
> +self.assert_qmp(result, 'return', {})
> +
> +if __name__ == '__main__':
> +iotests.main()
> diff --git a/tests/qemu-iotests/175.out b/tests/qemu-iotests/175.out
> new file mode 100644
> index 00..8d7e996700
> --- /dev/null
> +++ b/tests/qemu-iotests/175.out
> @@ -0,0 +1,5 @@
> +...
> +--
> +Ran 3 tests
> +
> +OK
> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
> index 985b9a6a36..1f4bf03185 100644
> --- a/tests/qemu-iotests/group
> +++ b/tests/qemu-iotests/group
> @@ -167,3 +167,4 @@
>  172 auto
>  173 rw auto
>  174 auto
> +175 auto quick

Not sure it still qualifies as "quick" given the two "time.sleep(4)", but more
importantly I wonder if the sleep should be improved.

Fam



Re: [Qemu-block] [PATCH 00/10] block: Op blocker fixes

2017-03-07 Thread Fam Zheng
On Mon, 03/06 17:21, Kevin Wolf wrote:
> This series fixes a few problems introduced recently with the op blocker
> series. It includes mainly fix for cases where qemu would abort()
> instead of doing proper error handling previously. These changes also
> happen to result in more complete and correct permission checking.
> 
> Kevin Wolf (10):
>   commit: Fix error handling
>   mirror: Fix permission problem with 'replaces'
>   mirror: Fix permissions for removing mirror_top_bs
>   mirror: Fix error path for dirty bitmap creation
>   block: Fix blockdev-snapshot error handling
>   block: Factor out should_update_child()
>   block: Factor out bdrv_replace_child_noperm()
>   block: Ignore multiple children in bdrv_check_update_perm()
>   block: Handle permission errors in change_parent_backing_link()
>   block: Fix error handling in bdrv_replace_in_backing_chain()
> 
>  block.c   | 182 
> ++
>  block/commit.c|   2 +-
>  block/mirror.c|  35 +
>  blockdev.c|   6 +-
>  include/block/block.h |   4 +-
>  include/block/block_int.h |   6 +-
>  6 files changed, 152 insertions(+), 83 deletions(-)
> 
> -- 
> 1.8.3.1
> 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [Qemu-devel] [PATCH 06/10] block: Factor out should_update_child()

2017-03-07 Thread Philippe Mathieu-Daudé
On Mon, Mar 6, 2017 at 5:35 PM, Eric Blake  wrote:
> On 03/06/2017 10:21 AM, Kevin Wolf wrote:
>> Signed-off-by: Kevin Wolf 
>> ---
>>  block.c | 42 +++---
>>  1 file changed, 27 insertions(+), 15 deletions(-)
>>
>
> Reviewed-by: Eric Blake 

Reviewed-by: Philippe Mathieu-Daudé 

>
> --
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>