date:20180307

Re: [Qemu-block] [PATCH v4] iotests: Tweak 030 in order to trigger a race condition with parallel jobs

2018-03-07 Thread Alberto Garcia

On Wed 07 Mar 2018 06:54:51 PM CET, Max Reitz wrote:
>> v4: Mention that commit 1a63a907507fbbcfaee3f622907ec24 also
>> contributes to solving the original bug (both commits need to be
>> reverted in order to reproduce this bug reliably).
>> 
>> Rewrite the loop that writes data into the images to make it more
>> readable.
>
> Thanks!  Applied to my block tree:
>
> https://github.com/XanClic/qemu/commits/block
>
> (Still took me a couple of attempts to get it to fail with both
> commits reverted, though...)

Odd, I can reproduce it in 100% of the cases. Were you maybe running the
tests on tmpfs?

Anyway, thanks!

Berto

Re: [Qemu-block] [PATCH] nbd/server: Honor FUA request on NBD_CMD_TRIM

2018-03-07 Thread Paolo Bonzini


> The NBD spec states that since trim requests can affect disk contents,
> then they should allow for FUA semantics just like writes for ensuring
> the disk has settled before returning.  As bdrv_[co_]pdiscard() does
> not (yet?) support a flags argument, we can't pass FUA down the block
> layer stack, and must therefore emulate it with a flush at the NBD
> layer.

TRIM requests should not need FUA since they're just advisory.  On
the other hand, WRITE ZEROES requests need to support FUA.

Paolo

Re: [Qemu-block] [PATCH v3 0/4] vl: introduce vm_shutdown()

2018-03-07 Thread Fam Zheng

On Wed, 03/07 14:42, Stefan Hajnoczi wrote:
> v3:
>  * Rebase on qemu.git/master after AIO_WAIT_WHILE() was merged [Fam]
> v2:
>  * Tackle the .ioeventfd_stop() vs vq handler race by removing the ioeventfd
>    from a BH in the IOThread [Fam]
> 
> There are several race conditions in virtio-blk/virtio-scsi dataplane code.
> This patch series addresses them, see the commit description for details on
> the individual cases.

Reviewed-by: Fam Zheng

Re: [Qemu-block] [Qemu-devel] [PATCH 0/2] block/ssh: Implement .bdrv_refresh_filename()

2018-03-07 Thread Fam Zheng

On Wed, 03/07 12:50, John Snow wrote:
> It's something I'd like to see patchew do, actually:
> 
> "Here's a list of what's on the list that has no reviews or NACKs, and
> needs some love"

It's not hard to define a search condition for that:

http://patchew.org/search-help

http://patchew.org/search?q=project%3AQEMU+age%3A%3E1m+not%3Areviewed+not%3Areplied+not%3Amerged+is%3Atested+to%3Aqemu-block

> 
> coupled with a 30 day "Hey, nobody looked at this" ping to the list
> before it NACKs a set for being too old.

If the initial landing of the patch didn't get enough attention, chances are the
pings will not change much about it, especially if they come from a bot.

A summary list sounds good, though.

Fam

Re: [Qemu-block] block migration and dirty bitmap reset

2018-03-07 Thread Fam Zheng

On Wed, 03/07 09:06, Peter Lieven wrote:
> Hi,
> 
> while looking at the code I wonder if the order of blk_aio_preadv and
> bdrv_reset_dirty_bitmap must be swapped in mig_save_device_bulk:
> 
> qemu_mutex_lock_iothread();
> aio_context_acquire(blk_get_aio_context(bmds->blk));
> blk->aiocb = blk_aio_preadv(bb, cur_sector * BDRV_SECTOR_SIZE, &blk->qiov,
> 0, blk_mig_read_cb, blk);
> 
> bdrv_reset_dirty_bitmap(bmds->dirty_bitmap, cur_sector * BDRV_SECTOR_SIZE,
> nr_sectors * BDRV_SECTOR_SIZE);
> aio_context_release(blk_get_aio_context(bmds->blk));
> qemu_mutex_unlock_iothread();
> 
> In mig_save_device_dirty we first reset the dirty bitmap and then read,
> which sounds like a better idea.

Yes, that sounds right to me.

Fam

Re: [Qemu-block] [Qemu-devel] [PATCH 0/2] block: fix nbd-server-stop crash after blockdev-snapshot-sync

2018-03-07 Thread Eric Blake


On 03/06/2018 02:48 PM, Stefan Hajnoczi wrote:

The blockdev-snapshot-sync command uses bdrv_append() to update all parents to
point at the external snapshot node.  This breaks BlockBackend's
blk_add/remove_aio_context_notifier(), which doesn't expect a BDS change.

Patch 1 fixes this by tracking AioContext notifiers in BlockBackend.

See the test case in Patch 2 for a reproducer.

Stefan Hajnoczi (2):
   block: let blk_add/remove_aio_context_notifier() tolerate BDS changes
   iotests: add 208 nbd-server + blockdev-snapshot-sync test case

  block/block-backend.c  | 63 ++
  block/trace-events |  2 ++
  tests/qemu-iotests/208 | 55 
  tests/qemu-iotests/208.out |  9 +++
  tests/qemu-iotests/group   |  1 +
  5 files changed, 130 insertions(+)
  create mode 100755 tests/qemu-iotests/208
  create mode 100644 tests/qemu-iotests/208.out


Whose tree should this series go through?  MAINTAINERS didn't flag it as 
directly touching any files that normally affect my NBD queue, but given 
that the iotest that reproduces the problem uses NBD, I'm fine if you 
want it to go through me.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-block] [Qemu-devel] [PATCH v2] block: make BDRV_POLL_WHILE() re-entrancy safe

2018-03-07 Thread Eric Blake


On 03/07/2018 06:46 AM, Stefan Hajnoczi wrote:

Nested BDRV_POLL_WHILE() calls can occur.  Currently
assert(!wait_->wakeup) fails in AIO_WAIT_WHILE() when this happens.

This patch converts the bool wait_->need_kick flag to an unsigned
wait_->num_waiters counter.

Nesting works correctly because outer AIO_WAIT_WHILE() callers evaluate
the condition again after the inner caller completes (invoking the inner
caller counts as aio_poll() progress).

Reported-by: "fuweiwei (C)" 
Cc: Paolo Bonzini 
Signed-off-by: Stefan Hajnoczi 
---
v2:
  * Rebase onto qemu.git/master now that AIO_WAIT_WHILE() has landed
[Kevin]

  include/block/aio-wait.h | 61 


Looks big due to whitespace change when column for trailing \ changed. 
Viewing the diff with whitespace ignored made it easier to review.


Reviewed-by: Eric Blake 

diff --git c/include/block/aio-wait.h w/include/block/aio-wait.h
index a48c744fa87..74cde07bef3 100644
--- c/include/block/aio-wait.h
+++ w/include/block/aio-wait.h
@@ -50,8 +50,8 @@
  *   }
  */
 typedef struct {
-/* Is the main loop waiting for a kick?  Accessed with atomic ops. */
-bool need_kick;
+/* Number of waiting AIO_WAIT_WHILE() callers. Accessed with atomic ops. */
+unsigned num_waiters;
 } AioWait;

 /**
@@ -84,9 +84,8 @@ typedef struct {
 } else {   \
 assert(qemu_get_current_aio_context() ==   \
qemu_get_aio_context());\
-assert(!wait_->need_kick);  \
-/* Set wait_->need_kick before evaluating cond.  */ \
-atomic_mb_set(&wait_->need_kick, true); \
+/* Increment wait_->num_waiters before evaluating cond. */ \
+atomic_inc(&wait_->num_waiters);   \
 while (busy_) {\
 if ((cond)) {  \
 waited_ = busy_ = true;\
@@ -98,7 +97,7 @@ typedef struct {
 waited_ |= busy_;  \
 }  \
 }  \
-atomic_set(&wait_->need_kick, false);   \
+atomic_dec(&wait_->num_waiters);   \
 }  \
 waited_; })


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
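
To illustrate why a counter tolerates nested waiters where a single boolean flag does not, here is a minimal standalone sketch using C11 atomics. It is not the QEMU macro: `WaitState`, `wait_enter()`, `wait_exit()` and `wait_needs_kick()` are hypothetical names chosen for this example, and the polling loop itself is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for AioWait: counts callers currently waiting. */
typedef struct {
    atomic_uint num_waiters;
} WaitState;

/* Called before a waiter starts evaluating its condition. */
static void wait_enter(WaitState *w)
{
    atomic_fetch_add(&w->num_waiters, 1);
}

/* Called once a waiter's condition has become false. */
static void wait_exit(WaitState *w)
{
    unsigned prev = atomic_fetch_sub(&w->num_waiters, 1);
    assert(prev > 0); /* every exit must pair with an earlier enter */
}

/* A worker would only need to kick the main loop while someone waits. */
static bool wait_needs_kick(WaitState *w)
{
    return atomic_load(&w->num_waiters) > 0;
}
```

With a boolean, the inner waiter would clear the flag on exit even though the outer waiter is still waiting; with the counter, the inner exit only drops the count from 2 to 1.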

[Qemu-block] [PATCH] nbd/server: Honor FUA request on NBD_CMD_TRIM

2018-03-07 Thread Eric Blake

The NBD spec states that since trim requests can affect disk contents,
then they should allow for FUA semantics just like writes for ensuring
the disk has settled before returning.  As bdrv_[co_]pdiscard() does
not (yet?) support a flags argument, we can't pass FUA down the block
layer stack, and must therefore emulate it with a flush at the NBD
layer.

Signed-off-by: Eric Blake 
---

Question for Paolo: does iSCSI support the notion of FUA on a
TRIM request (where we could better emulate a guest TRIM request
with FUA all the way through our stack to the NBD server), or is
FUA just for normal writes?  Likewise, are you familiar enough
with the kernel's NBD module to know if the kernel as an NBD client
would ever request FUA on a discard request?

Question for Kevin: should we update the block layer to have a
flags argument to bdrv_co_pdiscard (right now, the only valid
flag would be BDRV_REQ_FUA, and we'd probably need a
supported_discard_flags in parallel to supported_write_flags),
and implement qemu-io -c 'discard -f' for easily testing the use
of that flag?

Depending on answers to those questions, I may want to spin a
v2 patch that adds flag support throughout the block layer
discard implementation, rather than this patch which just does
it in NBD; but if nothing else, this is the shortest patch
possible to fix the (corner-case?) NBD spec non-compliance.

 nbd/server.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/nbd/server.c b/nbd/server.c
index 4990a5826e6..e098da819df 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1623,6 +1623,9 @@ static coroutine_fn void nbd_trip(void *opaque)
 case NBD_CMD_TRIM:
 ret = blk_co_pdiscard(exp->blk, request.from + exp->dev_offset,
   request.len);
+if (ret == 0 && request.flags & NBD_CMD_FLAG_FUA) {
+ret = blk_co_flush(exp->blk);
+}
 if (ret < 0) {
 error_setg_errno(&local_err, -ret, "discard failed");
 }
-- 
2.14.3
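
The "discard, then flush when FUA is requested" emulation described in this patch can be sketched outside QEMU as follows. This is an illustration only: `Backend`, `handle_trim()`, `CMD_FLAG_FUA` and the counting callbacks are hypothetical stand-ins, not the QEMU block-layer or NBD server API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for NBD_CMD_FLAG_FUA. */
#define CMD_FLAG_FUA (1u << 0)

/* Hypothetical backend; callbacks return 0 or a negative errno value. */
typedef struct Backend {
    int (*discard)(struct Backend *b, uint64_t offset, uint64_t len);
    int (*flush)(struct Backend *b);
    int discards; /* number of discard calls seen */
    int flushes;  /* number of flush calls seen */
} Backend;

/* Discard the range; if that worked and FUA was requested, flush too. */
static int handle_trim(Backend *b, unsigned flags,
                       uint64_t offset, uint64_t len)
{
    int ret;

    assert(b && b->discard && b->flush);
    ret = b->discard(b, offset, len);
    if (ret == 0 && (flags & CMD_FLAG_FUA)) {
        ret = b->flush(b);
    }
    return ret;
}

/* Counting callbacks, used only to observe what handle_trim() does. */
static int count_discard(Backend *b, uint64_t offset, uint64_t len)
{
    (void)offset;
    (void)len;
    b->discards++;
    return 0;
}

static int count_flush(Backend *b)
{
    b->flushes++;
    return 0;
}
```

A failed discard is returned as-is and the flush is skipped, matching the `ret == 0 &&` guard in the patch.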

Re: [Qemu-block] [Qemu-devel] [PATCH v4 00/37] x-blockdev-create for protocols and qcow2

2018-03-07 Thread no-reply

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180307185946.29366-1-kw...@redhat.com
Subject: [Qemu-devel] [PATCH v4 00/37] x-blockdev-create for protocols and qcow2

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   patchew/20180307185946.29366-1-kw...@redhat.com -> patchew/20180307185946.29366-1-kw...@redhat.com
 t [tag update]patchew/cover.1520352600.git.be...@igalia.com -> patchew/cover.1520352600.git.be...@igalia.com
Switched to a new branch 'test'
31a3009b18 qemu-iotests: Test ssh image creation over QMP
04d93a3d4f qemu-iotests: Test qcow2 over file image creation with QMP
4d44559176 block: Fail bdrv_truncate() with negative size
199d11005c file-posix: Fix no-op bdrv_truncate() with falloc preallocation
6d9de3491a ssh: Support .bdrv_co_create
38e4ea119c ssh: Pass BlockdevOptionsSsh to connect_to_ssh()
09a94b57cf ssh: QAPIfy host-key-check option
78fe031228 ssh: Use QAPI BlockdevOptionsSsh object
a43d55b0fe sheepdog: Support .bdrv_co_create
5c16d28edd sheepdog: QAPIfy "redundancy" create option
827ffe113c nfs: Support .bdrv_co_create
4c6f72aa3d nfs: Use QAPI options in nfs_client_open()
cb68550d50 rbd: Use qemu_rbd_connect() in qemu_rbd_do_create()
6534001b69 rbd: Assign s->snap/image_name in qemu_rbd_open()
016039e274 rbd: Support .bdrv_co_create
0fae0f1e6b rbd: Pass BlockdevOptionsRbd to qemu_rbd_connect()
0493c80f19 rbd: Remove non-schema options from runtime_opts
b5e3a19199 rbd: Factor out qemu_rbd_connect()
9f5e2db035 rbd: Fix use after free in qemu_rbd_set_keypairs() error path
a6fe11e442 gluster: Support .bdrv_co_create
8dd9c3 file-win32: Support .bdrv_co_create
561c7126e2 file-posix: Support .bdrv_co_create
61c550d7b9 block: x-blockdev-create QMP command
a349d435e7 block: Make bdrv_is_whitelisted() public
da8d4fde95 qcow2: Use visitor for options in qcow2_create()
0c6082fa17 qdict: Introduce qdict_rename_keys()
54130ce09d test-qemu-opts: Test qemu_opts_to_qdict_filtered()
4876bce8c0 test-qemu-opts: Test qemu_opts_append()
e7a13d4c34 util: Add qemu_opts_to_qdict_filtered()
200a661b86 qcow2: Handle full/falloc preallocation in qcow2_co_create()
b5d9f42cf2 qcow2: Use QCryptoBlockCreateOptions in qcow2_co_create()
070c5be70c qcow2: Use BlockdevRef in qcow2_co_create()
b908bbca7a qcow2: Pass BlockdevCreateOptions to qcow2_co_create()
4339d9e11b qcow2: Let qcow2_create() handle protocol layer
d4b04ac240 qcow2: Rename qcow2_co_create2() to qcow2_co_create()
ada0274302 block/qapi: Add qcow2 create options to schema
ab0ff60240 block/qapi: Introduce BlockdevCreateOptions

=== OUTPUT BEGIN ===
Checking PATCH 1/37: block/qapi: Introduce BlockdevCreateOptions...
Checking PATCH 2/37: block/qapi: Add qcow2 create options to schema...
Checking PATCH 3/37: qcow2: Rename qcow2_co_create2() to qcow2_co_create()...
Checking PATCH 4/37: qcow2: Let qcow2_create() handle protocol layer...
Checking PATCH 5/37: qcow2: Pass BlockdevCreateOptions to qcow2_co_create()...
Checking PATCH 6/37: qcow2: Use BlockdevRef in qcow2_co_create()...
Checking PATCH 7/37: qcow2: Use QCryptoBlockCreateOptions in qcow2_co_create()...
Checking PATCH 8/37: qcow2: Handle full/falloc preallocation in qcow2_co_create()...
Checking PATCH 9/37: util: Add qemu_opts_to_qdict_filtered()...
Checking PATCH 10/37: test-qemu-opts: Test qemu_opts_append()...
Checking PATCH 11/37: test-qemu-opts: Test qemu_opts_to_qdict_filtered()...
WARNING: line over 80 characters
#156: FILE: tests/test-qemu-opts.c:1015:
+g_test_add_func("/qemu-opts/to_qdict/filtered", test_opts_to_qdict_filtered);

WARNING: line over 80 characters
#157: FILE: tests/test-qemu-opts.c:1016:
+g_test_add_func("/qemu-opts/to_qdict/duplicates", test_opts_to_qdict_duplicates);

total: 0 errors, 2 warnings, 143 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 12/37: qdict: Introduce qdict_rename_keys()...
Checking PATCH 13/37: qcow2: Use visitor for options in qcow2_create()...
Checking PATCH 14/37: block: Make bdrv_is_whitelisted() public...
Checking PATCH 15/37: block: x-blockdev-create QMP command...
Checking PATCH 16/37: file-posix: Support .bdrv_co_create...
Checking PATCH 17/37: file-win32: Support .bdrv_co_create...
Checking PATCH 18/37:

Re: [Qemu-block] Limiting coroutine stack usage

2018-03-07 Thread Peter Lieven

Am 06.03.2018 um 12:51 schrieb Stefan Hajnoczi:
> On Tue, Feb 20, 2018 at 06:04:02PM +0100, Peter Lieven wrote:
>> I remember we discussed a long time ago limiting the stack usage of all
>> functions that are executed in a coroutine context to a very low value,
>> to be able to safely limit the coroutine stack size as well.
>>
>> I checked through all functions in block/, migration/ and nbd/ and there
>> are only very few larger or unbounded stack allocations, which can easily
>> be fixed.
>>
>> Now my question: Is there an easy way to add a cflag like
>> -Wstack-usage=2048 to all objects in a given directory only?
>> I tried to add a limit to the whole project, but fixing this will be a
>> larger task.
> 2KB is fine for QEMU code but actual coroutine stack sizes will have to
> be at least 8KB, I guess, in order for third-party libraries to work
> (e.g. curl, rbd).  PATH_MAX is 4KB on Linux.
>
> Nested event loops in QEMU code can also result in deep call stacks.
> This happens when aio_poll() invokes an fd handler or BH that also
> invokes aio_poll().

The plan was to limit the stack usage only as a compiler option. I would leave
the coroutine stack size at 1MB for now until we have a way to identify the
worst case usage.

Peter

Re: [Qemu-block] block migration and MAX_IN_FLIGHT_IO

2018-03-07 Thread Peter Lieven

Am 07.03.2018 um 10:47 schrieb Stefan Hajnoczi:
> On Wed, Mar 7, 2018 at 7:55 AM, Peter Lieven  wrote:
>> Am 06.03.2018 um 17:35 schrieb Peter Lieven:
>>> Am 06.03.2018 um 17:07 schrieb Stefan Hajnoczi:
>>>> On Mon, Mar 05, 2018 at 02:52:16PM +, Dr. David Alan Gilbert wrote:
>>>>> * Peter Lieven (p...@kamp.de) wrote:
>>>>>> Am 05.03.2018 um 12:45 schrieb Stefan Hajnoczi:
>>>>>>> On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote:
>>>>>>>> I stumbled across the MAX_INFLIGHT_IO field that was introduced in
>>>>>>>> 2015 and was curious what the reason was to choose 512MB as
>>>>>>>> readahead. The reason I ask is that I found that the source VM gets
>>>>>>>> very unresponsive I/O-wise while the initial 512MB are read, and
>>>>>>>> furthermore seems to stay unresponsive if we choose a high
>>>>>>>> migration speed and have a fast storage on the destination VM.
>>>>>>>>
>>>>>>>> In our environment I modified this value to 16MB which seems to
>>>>>>>> work much smoother. I wonder if we should make this a user
>>>>>>>> configurable value or define a different rate limit for the block
>>>>>>>> transfer in bulk stage at least?
>>>>>>> I don't know if benchmarks were run when choosing the value.  From the
>>>>>>> commit description it sounds like the main purpose was to limit the
>>>>>>> amount of memory that can be consumed.
>>>>>>>
>>>>>>> 16 MB also fulfills that criteria :), but why is the source VM more
>>>>>>> responsive with a lower value?
>>>>>>>
>>>>>>> Perhaps the issue is queue depth on the storage device - the block
>>>>>>> migration code enqueues up to 512 MB worth of reads, and guest I/O has
>>>>>>> to wait?
>>>>>> That is my guess. Especially if the destination storage is faster we
>>>>>> basically always have 512 I/Os in flight on the source storage.
>>>>>>
>>>>>> Does anyone mind if we reduce that value to 16MB or do we need a
>>>>>> better mechanism?
>>>>> We've got migration-parameters these days; you could connect it to one
>>>>> of those fairly easily I think.
>>>>> Try: grep -i 'cpu[-_]throttle[-_]initial'  for an example of one that's
>>>>> already there.
>>>>> Then you can set it to whatever you like.
>>>> It would be nice to solve the performance problem without adding a
>>>> tuneable.
>>>>
>>>> On the other hand, QEMU has no idea what the queue depth of the device
>>>> is.  Therefore it cannot prioritize guest I/O over block migration I/O.
>>>>
>>>> 512 parallel requests is much too high.  Most parallel I/O benchmarking
>>>> is done at 32-64 queue depth.
>>>>
>>>> I think that 16 parallel requests is a reasonable maximum number for a
>>>> background job.
>>>>
>>>> We need to be clear though that the purpose of this change is unrelated
>>>> to the original 512 MB memory footprint goal.  It just happens to touch
>>>> the same constant but the goal is now to submit at most 16 I/O requests
>>>> in parallel to avoid monopolizing the I/O device.
>>> I think we should really look at this. The variables that control whether
>>> we stay in the while loop are incremented and decremented at the
>>> following places:
>>>
>>> mig_save_device_dirty:
>>> mig_save_device_bulk:
>>> block_mig_state.submitted++;
>>>
>>> blk_mig_read_cb:
>>> block_mig_state.submitted--;
>>> block_mig_state.read_done++;
>>>
>>> flush_blks:
>>> block_mig_state.read_done--;
>>>
>>> The condition of the while loop is:
>>> (block_mig_state.submitted +
>>>  block_mig_state.read_done) * BLOCK_SIZE <
>>>  qemu_file_get_rate_limit(f) &&
>>> (block_mig_state.submitted +
>>>  block_mig_state.read_done) <
>>>  MAX_INFLIGHT_IO
>>>
>>> First, I wonder if we ever reach the rate limit, because we put the read
>>> buffers onto f AFTER we exit the while loop?
>>>
>>> And even if we reach the limit we constantly maintain 512 I/Os in
>>> parallel because we immediately decrement read_done when we put the
>>> buffers to f in flush_blks. In the next iteration of the while loop we
>>> then read again until we have 512 in-flight I/Os.
>>>
>>> And shouldn't we have a time limit to limit the time we stay in the while
>>> loop? I think we artificially delay sending data to f?
>> Thinking about it for a while I would propose the following:
>>
>> a) rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS
>> b) add MAX_PARALLEL_IO with a value of 16
>> c) compare qemu_file_get_rate_limit only with block_mig_state.read_done
>>
>> This would result in the following condition for the while loop:
>>
>> (block_mig_state.read_done * BLOCK_SIZE < qemu_file_get_rate_limit(f) &&
>>  (block_mig_state.submitted + block_mig_state.read_done) < MAX_IO_BUFFERS &&
>>  block_mig_state.submitted < MAX_PARALLEL_IO)
>>
>> Does that sound like a plan?
> That sounds good to me.

I will prepare patches for this.

Peter
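
The loop condition proposed in this thread can be written as a standalone predicate to make the three separate limits visible. This is a sketch under assumptions, not the eventual patch: `BlkMigCounters` and `may_submit_more()` are hypothetical names, `BLOCK_SIZE` is assumed to be 1 MiB (consistent with 512 requests corresponding to 512 MB above), and the rate limit is passed in as a plain byte count instead of being read from the migration stream.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE      (1 << 20) /* assumed: 1 MiB per migration block */
#define MAX_IO_BUFFERS  512       /* proposed new name for MAX_INFLIGHT_IO */
#define MAX_PARALLEL_IO 16        /* proposed cap on reads in flight */

/* Hypothetical subset of the block migration state. */
typedef struct {
    int submitted; /* reads issued but not yet completed */
    int read_done; /* reads completed but not yet sent */
} BlkMigCounters;

/* Returns true if the bulk/dirty loop may issue another read. */
static bool may_submit_more(const BlkMigCounters *s, int64_t rate_limit)
{
    return (int64_t)s->read_done * BLOCK_SIZE < rate_limit && /* c) */
           s->submitted + s->read_done < MAX_IO_BUFFERS &&    /* a) */
           s->submitted < MAX_PARALLEL_IO;                    /* b) */
}
```

Only completed reads count against the rate limit, the sum of both counters bounds memory use, and the number of outstanding reads is capped independently at 16.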

Re: [Qemu-block] [PATCH v4 05/37] qcow2: Pass BlockdevCreateOptions to qcow2_co_create()

2018-03-07 Thread Eric Blake


On 03/07/2018 12:59 PM, Kevin Wolf wrote:

All of the simple options are now passed to qcow2_co_create() in a
BlockdevCreateOptions object. Still missing: node-name and the
encryption options.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
  block/qcow2.c | 189 ++
  1 file changed, 151 insertions(+), 38 deletions(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-block] [PATCH v4 33/37] ssh: Support .bdrv_co_create

2018-03-07 Thread Max Reitz

On 2018-03-07 19:59, Kevin Wolf wrote:
> This adds the .bdrv_co_create driver callback to ssh, which enables
> image creation over QMP.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-core.json | 16 +-
>  block/ssh.c  | 83 ++--
>  2 files changed, 63 insertions(+), 36 deletions(-)

Reviewed-by: Max Reitz 




Re: [Qemu-block] [PATCH v4 03/37] qcow2: Rename qcow2_co_create2() to qcow2_co_create()

2018-03-07 Thread Eric Blake


On 03/07/2018 12:59 PM, Kevin Wolf wrote:

The functions originally known as qcow2_create() and qcow2_create2()
are now called qcow2_co_create_opts() and qcow2_co_create(), which
matches the names of the BlockDriver callbacks that they will implement
at the end of this patch series.

Signed-off-by: Kevin Wolf 
---
  block/qcow2.c | 16 
  1 file changed, 8 insertions(+), 8 deletions(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-block] [PATCH v4 03/37] qcow2: Rename qcow2_co_create2() to qcow2_co_create()

2018-03-07 Thread Max Reitz

On 2018-03-07 19:59, Kevin Wolf wrote:
> The functions originally known as qcow2_create() and qcow2_create2()
> are now called qcow2_co_create_opts() and qcow2_co_create(), which
> matches the names of the BlockDriver callbacks that they will implement
> at the end of this patch series.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/qcow2.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Max Reitz 




[Qemu-block] [PATCH v4 35/37] block: Fail bdrv_truncate() with negative size

2018-03-07 Thread Kevin Wolf

Most callers have their own checks, but something like this should also
be checked centrally. As it happens, x-blockdev-create can pass negative
image sizes to format drivers (because there is no QAPI type that would
reject negative numbers) and triggers the check added by this patch.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block.c b/block.c
index 00f94241fc..75a9fd49de 100644
--- a/block.c
+++ b/block.c
@@ -3719,6 +3719,11 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, PreallocMode prealloc,
 error_setg(errp, "No medium inserted");
 return -ENOMEDIUM;
 }
+if (offset < 0) {
+error_setg(errp, "Image size cannot be negative");
+return -EINVAL;
+}
+
 if (!drv->bdrv_truncate) {
 if (bs->file && drv->is_filter) {
 return bdrv_truncate(bs->file, offset, prealloc, errp);
-- 
2.13.6

[Qemu-block] [PATCH v4 34/37] file-posix: Fix no-op bdrv_truncate() with falloc preallocation

2018-03-07 Thread Kevin Wolf

If bdrv_truncate() is called, but the requested size is the same as
before, don't call posix_fallocate(), which returns -EINVAL for length
zero and would therefore make bdrv_truncate() fail.

The problem can be triggered by creating a zero-sized raw image with
'falloc' preallocation mode.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/file-posix.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index fbc21a9921..d7fb772c14 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1686,11 +1686,15 @@ static int raw_regular_truncate(int fd, int64_t offset, PreallocMode prealloc,
  * file systems that do not support fallocate(), trying to check if a
  * block is allocated before allocating it, so don't do that here.
  */
-result = -posix_fallocate(fd, current_length, offset - current_length);
-if (result != 0) {
-/* posix_fallocate() doesn't set errno. */
-error_setg_errno(errp, -result,
- "Could not preallocate new data");
+if (offset != current_length) {
+result = -posix_fallocate(fd, current_length, offset - current_length);
+if (result != 0) {
+/* posix_fallocate() doesn't set errno. */
+error_setg_errno(errp, -result,
+ "Could not preallocate new data");
+}
+} else {
+result = 0;
 }
 goto out;
 #endif
-- 
2.13.6
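
The guard added by this patch can be sketched as a small standalone helper. `preallocate_growth()` is a hypothetical name used only for this illustration; the real code lives inside raw_regular_truncate() and reports failures through an Error object instead of a return value alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Preallocate the range between current_length and offset in fd.
 * Returns 0 on success or a negative errno value.
 */
static int preallocate_growth(int fd, int64_t current_length, int64_t offset)
{
    if (offset == current_length) {
        /* Nothing to allocate; a zero length would be rejected. */
        return 0;
    }

    /* posix_fallocate() returns the error number and does not set errno. */
    return -posix_fallocate(fd, (off_t)current_length,
                            (off_t)(offset - current_length));
}
```

With equal sizes the helper returns before touching the file descriptor, which is the case that used to fail for a zero-sized image created with 'falloc' preallocation.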

[Qemu-block] [PATCH v4 28/37] sheepdog: QAPIfy "redundancy" create option

2018-03-07 Thread Kevin Wolf

The "redundancy" option for Sheepdog image creation is currently a
string that can encode one or two integers depending on its format,
which at the same time implicitly selects a mode.

This patch turns it into a QAPI union and converts the string into such
a QAPI object before interpreting the values.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json | 45 +
 block/sheepdog.c | 94 +---
 2 files changed, 112 insertions(+), 27 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 39e53c7791..e590ab6c71 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3467,6 +3467,51 @@
 '*cluster-size' :   'size' } }
 
 ##
+# @SheepdogRedundancyType:
+#
+# @full           Create a fully replicated vdi with x copies
+# @erasure-coded  Create an erasure coded vdi with x data strips and
+#                 y parity strips
+#
+# Since: 2.12
+##
+{ 'enum': 'SheepdogRedundancyType',
+  'data': [ 'full', 'erasure-coded' ] }
+
+##
+# @SheepdogRedundancyFull:
+#
+# @copies   Number of copies to use (between 1 and 31)
+#
+# Since: 2.12
+##
+{ 'struct': 'SheepdogRedundancyFull',
+  'data': { 'copies': 'int' }}
+
+##
+# @SheepdogRedundancyErasureCoded:
+#
+# @data-strips    Number of data strips to use (one of {2,4,8,16})
+# @parity-strips  Number of parity strips to use (between 1 and 15)
+#
+# Since: 2.12
+##
+{ 'struct': 'SheepdogRedundancyErasureCoded',
+  'data': { 'data-strips': 'int',
+'parity-strips': 'int' }}
+
+##
+# @SheepdogRedundancy:
+#
+# Since: 2.12
+##
+{ 'union': 'SheepdogRedundancy',
+  'base': { 'type': 'SheepdogRedundancyType' },
+  'discriminator': 'type',
+  'data': { 'full': 'SheepdogRedundancyFull',
+'erasure-coded': 'SheepdogRedundancyErasureCoded' } }
+
+##
 # @BlockdevCreateNotSupported:
 #
 # This is used for all drivers that don't support creating images.
diff --git a/block/sheepdog.c b/block/sheepdog.c
index d8c10b7cac..3966cd229a 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1882,6 +1882,48 @@ out_with_err_set:
 return ret;
 }
 
+static int parse_redundancy(BDRVSheepdogState *s, SheepdogRedundancy *opt)
+{
+struct SheepdogInode *inode = &s->inode;
+
+switch (opt->type) {
+case SHEEPDOG_REDUNDANCY_TYPE_FULL:
+if (opt->u.full.copies > SD_MAX_COPIES || opt->u.full.copies < 1) {
+return -EINVAL;
+}
+inode->copy_policy = 0;
+inode->nr_copies = opt->u.full.copies;
+return 0;
+
+case SHEEPDOG_REDUNDANCY_TYPE_ERASURE_CODED:
+{
+int64_t copy = opt->u.erasure_coded.data_strips;
+int64_t parity = opt->u.erasure_coded.parity_strips;
+
+if (copy != 2 && copy != 4 && copy != 8 && copy != 16) {
+return -EINVAL;
+}
+
+if (parity >= SD_EC_MAX_STRIP || parity < 1) {
+return -EINVAL;
+}
+
+/*
+ * 4 bits for parity and 4 bits for data.
+ * We have to compress upper data bits because it can't represent 16
+ */
+inode->copy_policy = ((copy / 2) << 4) + parity;
+inode->nr_copies = copy + parity;
+return 0;
+}
+
+default:
+g_assert_not_reached();
+}
+
+return -EINVAL;
+}
+
 /*
  * Sheepdog support two kinds of redundancy, full replication and erasure
  * coding.
@@ -1892,12 +1934,13 @@ out_with_err_set:
  * # create a erasure coded vdi with x data strips and y parity strips
  * -o redundancy=x:y (x must be one of {2,4,8,16} and 1 <= y < SD_EC_MAX_STRIP)
  */
-static int parse_redundancy(BDRVSheepdogState *s, const char *opt)
+static int parse_redundancy_str(BDRVSheepdogState *s, const char *opt)
 {
-struct SheepdogInode *inode = &s->inode;
+struct SheepdogRedundancy redundancy;
 const char *n1, *n2;
 long copy, parity;
 char p[10];
+int ret;
 
 pstrcpy(p, sizeof(p), opt);
 n1 = strtok(p, ":");
@@ -1907,35 +1950,32 @@ static int parse_redundancy(BDRVSheepdogState *s, const char *opt)
 return -EINVAL;
 }
 
-copy = strtol(n1, NULL, 10);
-/* FIXME fix error checking by switching to qemu_strtol() */
-if (copy > SD_MAX_COPIES || copy < 1) {
-return -EINVAL;
-}
-if (!n2) {
-inode->copy_policy = 0;
-inode->nr_copies = copy;
-return 0;
+ret = qemu_strtol(n1, NULL, 10, &copy);
+if (ret < 0) {
+return ret;
 }
 
-if (copy != 2 && copy != 4 && copy != 8 && copy != 16) {
-return -EINVAL;
-}
+if (!n2) {
+redundancy = (SheepdogRedundancy) {
+.type   = SHEEPDOG_REDUNDANCY_TYPE_FULL,
+.u.full.copies  = copy,
+};
+} else {
+ret = qemu_strtol(n2, NULL, 10, &parity);
+if (ret < 0) {
+return ret;
+}
 
-parity = strtol(n2, NULL, 10);
-/* FIXME fix error checking by

[Qemu-block] [PATCH v4 27/37] nfs: Support .bdrv_co_create

2018-03-07 Thread Kevin Wolf

This adds the .bdrv_co_create driver callback to nfs, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json | 16 ++-
 block/nfs.c  | 76 +---
 2 files changed, 75 insertions(+), 17 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index d4351877fc..39e53c7791 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3393,6 +3393,20 @@
 '*preallocation':   'PreallocMode' } }
 
 ##
+# @BlockdevCreateOptionsNfs:
+#
+# Driver specific image creation options for NFS.
+#
+# @location Where to store the new image file
+# @size Size of the virtual disk in bytes
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsNfs',
+  'data': { 'location': 'BlockdevOptionsNfs',
+'size': 'size' } }
+
+##
 # @BlockdevQcow2Version:
 #
 # @v2:  The original QCOW2 format as introduced in qemu 0.10 (version 2)
@@ -3491,7 +3505,7 @@
   'iscsi':  'BlockdevCreateNotSupported',
   'luks':   'BlockdevCreateNotSupported',
   'nbd':'BlockdevCreateNotSupported',
-  'nfs':'BlockdevCreateNotSupported',
+  'nfs':'BlockdevCreateOptionsNfs',
   'null-aio':   'BlockdevCreateNotSupported',
   'null-co':'BlockdevCreateNotSupported',
   'nvme':   'BlockdevCreateNotSupported',
diff --git a/block/nfs.c b/block/nfs.c
index e402d643fe..2577df4b26 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -551,33 +551,45 @@ out:
 return ret;
 }
 
-static int64_t nfs_client_open_qdict(NFSClient *client, QDict *options,
- int flags, int open_flags, Error **errp)
+static BlockdevOptionsNfs *nfs_options_qdict_to_qapi(QDict *options,
+ Error **errp)
 {
 BlockdevOptionsNfs *opts = NULL;
 QObject *crumpled = NULL;
 Visitor *v;
 Error *local_err = NULL;
-int ret;
 
 crumpled = qdict_crumple(options, errp);
 if (crumpled == NULL) {
-return -EINVAL;
+return NULL;
 }
 
 v = qobject_input_visitor_new_keyval(crumpled);
 visit_type_BlockdevOptionsNfs(v, NULL, &opts, &local_err);
 visit_free(v);
+qobject_decref(crumpled);
 
 if (local_err) {
-error_propagate(errp, local_err);
+return NULL;
+}
+
+return opts;
+}
+
+static int64_t nfs_client_open_qdict(NFSClient *client, QDict *options,
+ int flags, int open_flags, Error **errp)
+{
+BlockdevOptionsNfs *opts;
+int ret;
+
+opts = nfs_options_qdict_to_qapi(options, errp);
+if (opts == NULL) {
 ret = -EINVAL;
 goto fail;
 }
 
 ret = nfs_client_open(client, opts, flags, open_flags, errp);
 fail:
-qobject_decref(crumpled);
 qapi_free_BlockdevOptionsNfs(opts);
 return ret;
 }
@@ -614,18 +626,43 @@ static QemuOptsList nfs_create_opts = {
 }
 };
 
-static int coroutine_fn nfs_file_co_create_opts(const char *url, QemuOpts *opts,
-Error **errp)
+static int nfs_file_co_create(BlockdevCreateOptions *options, Error **errp)
 {
-int64_t ret, total_size;
+BlockdevCreateOptionsNfs *opts = &options->u.nfs;
 NFSClient *client = g_new0(NFSClient, 1);
-QDict *options = NULL;
+int ret;
+
+assert(options->driver == BLOCKDEV_DRIVER_NFS);
 
 client->aio_context = qemu_get_aio_context();
 
+ret = nfs_client_open(client, opts->location, O_CREAT, 0, errp);
+if (ret < 0) {
+goto out;
+}
+ret = nfs_ftruncate(client->context, client->fh, opts->size);
+nfs_client_close(client);
+
+out:
+g_free(client);
+return ret;
+}
+
+static int coroutine_fn nfs_file_co_create_opts(const char *url, QemuOpts *opts,
+Error **errp)
+{
+BlockdevCreateOptions *create_options;
+BlockdevCreateOptionsNfs *nfs_opts;
+QDict *options;
+int ret;
+
+create_options = g_new0(BlockdevCreateOptions, 1);
+create_options->driver = BLOCKDEV_DRIVER_NFS;
+nfs_opts = &create_options->u.nfs;
+
 /* Read out options */
-total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-  BDRV_SECTOR_SIZE);
+nfs_opts->size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
 
 options = qdict_new();
 ret = nfs_parse_uri(url, options, errp);
@@ -633,15 +670,21 @@ static int coroutine_fn nfs_file_co_create_opts(const char *url, QemuOpts *opts,
 goto out;
 }
 
-ret = nfs_client_open_qdict(client, options, O_CREAT, 0, errp);
+nfs_opts->location = nfs_options_qdict_to_qapi(options, errp);
+if (nfs_opts->location == NULL) {
+ret = -EINVAL;
+goto out;
+}
+
+ret =

Re: [Qemu-block] [PATCH v4 22/37] rbd: Pass BlockdevOptionsRbd to qemu_rbd_connect()

2018-03-07 Thread Max Reitz

On 2018-03-07 19:59, Kevin Wolf wrote:
> With the conversion to a QAPI options object, the function is now
> prepared to be used in a .bdrv_co_create implementation.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/rbd.c | 115 
> +---
>  1 file changed, 55 insertions(+), 60 deletions(-)

Reviewed-by: Max Reitz 




[Qemu-block] [PATCH v4 25/37] rbd: Use qemu_rbd_connect() in qemu_rbd_do_create()

2018-03-07 Thread Kevin Wolf

This is almost exactly the same code. The differences are that
qemu_rbd_connect() supports BlockdevOptionsRbd.server and that the cache
mode is set explicitly.

Supporting 'server' is a welcome new feature for image creation.
Caching is disabled by default, so leave it that way.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/rbd.c | 54 ++
 1 file changed, 10 insertions(+), 44 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 2ac7ffca42..294ed07ac4 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -103,6 +103,11 @@ typedef struct BDRVRBDState {
 char *snap;
 } BDRVRBDState;
 
+static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
+BlockdevOptionsRbd *opts, bool cache,
+const char *keypairs, const char *secretid,
+Error **errp);
+
 static char *qemu_rbd_next_tok(char *src, char delim, char **p)
 {
 char *end;
@@ -351,12 +356,6 @@ static int qemu_rbd_do_create(BlockdevCreateOptions *options,
 return -EINVAL;
 }
 
-/* TODO Remove the limitation */
-if (opts->location->has_server) {
-error_setg(errp, "Can't specify server for image creation");
-return -EINVAL;
-}
-
 if (opts->has_cluster_size) {
 int64_t objsize = opts->cluster_size;
 if ((objsize - 1) & objsize) {/* not a power of 2? */
@@ -370,54 +369,21 @@ static int qemu_rbd_do_create(BlockdevCreateOptions *options,
 obj_order = ctz32(objsize);
 }
 
-ret = rados_create(&cluster, opts->location->user);
+ret = qemu_rbd_connect(&cluster, &io_ctx, opts->location, false, keypairs,
+   password_secret, errp);
 if (ret < 0) {
-error_setg_errno(errp, -ret, "error initializing");
 return ret;
 }
 
-/* try default location when conf=NULL, but ignore failure */
-ret = rados_conf_read_file(cluster, opts->location->conf);
-if (opts->location->conf && ret < 0) {
-error_setg_errno(errp, -ret, "error reading conf file %s",
- opts->location->conf);
-ret = -EIO;
-goto shutdown;
-}
-
-ret = qemu_rbd_set_keypairs(cluster, keypairs, errp);
-if (ret < 0) {
-ret = -EIO;
-goto shutdown;
-}
-
-if (qemu_rbd_set_auth(cluster, password_secret, errp) < 0) {
-ret = -EIO;
-goto shutdown;
-}
-
-ret = rados_connect(cluster);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "error connecting");
-goto shutdown;
-}
-
-ret = rados_ioctx_create(cluster, opts->location->pool, &io_ctx);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "error opening pool %s",
- opts->location->pool);
-goto shutdown;
-}
-
 ret = rbd_create(io_ctx, opts->location->image, opts->size, &obj_order);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "error rbd create");
+goto out;
 }
 
-rados_ioctx_destroy(io_ctx);
-
 ret = 0;
-shutdown:
+out:
+rados_ioctx_destroy(io_ctx);
 rados_shutdown(cluster);
 return ret;
 }
-- 
2.13.6

Re: [Qemu-block] [PATCH v2 5/7] qcow2: Check snapshot L1 table in qcow2_snapshot_goto()

2018-03-07 Thread Eric Blake


On 03/06/2018 10:14 AM, Alberto Garcia wrote:

This function copies a snapshot's L1 table into the active one without
validating it first.

We now have a function to take care of this, so let's use it.

Signed-off-by: Alberto Garcia 
Cc: Eric Blake 
---
  block/qcow2-snapshot.c | 9 +
  tests/qemu-iotests/080 | 2 ++
  tests/qemu-iotests/080.out | 4 
  3 files changed, 15 insertions(+)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Qemu-block] [PATCH v4 37/37] qemu-iotests: Test ssh image creation over QMP

2018-03-07 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/207 | 261 +
 tests/qemu-iotests/207.out |  75 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 337 insertions(+)
 create mode 100755 tests/qemu-iotests/207
 create mode 100644 tests/qemu-iotests/207.out

diff --git a/tests/qemu-iotests/207 b/tests/qemu-iotests/207
new file mode 100755
index 00..f5c77852d1
--- /dev/null
+++ b/tests/qemu-iotests/207
@@ -0,0 +1,261 @@
+#!/bin/bash
+#
+# Test ssh image creation
+#
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=kw...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto ssh
+_supported_os Linux
+
+function do_run_qemu()
+{
+echo Testing: "$@"
+$QEMU -nographic -qmp stdio -serial none "$@"
+echo
+}
+
+function run_qemu()
+{
+do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qmp \
+  | _filter_qemu | _filter_imgfmt \
+  | _filter_actual_image_size
+}
+
+echo
+echo "=== Successful image creation (defaults) ==="
+echo
+
+run_qemu

[Qemu-block] [PATCH v4 36/37] qemu-iotests: Test qcow2 over file image creation with QMP

2018-03-07 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/206 | 436 +
 tests/qemu-iotests/206.out | 209 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 646 insertions(+)
 create mode 100755 tests/qemu-iotests/206
 create mode 100644 tests/qemu-iotests/206.out

diff --git a/tests/qemu-iotests/206 b/tests/qemu-iotests/206
new file mode 100755
index 00..0a18b2b19a
--- /dev/null
+++ b/tests/qemu-iotests/206
@@ -0,0 +1,436 @@
+#!/bin/bash
+#
+# Test qcow2 and file image creation
+#
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=kw...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+function do_run_qemu()
+{
+echo Testing: "$@"
+$QEMU -nographic -qmp stdio -serial none "$@"
+echo
+}
+
+function run_qemu()
+{
+do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qmp \
+  | _filter_qemu | _filter_imgfmt \
+  | _filter_actual_image_size
+}
+
+echo
+echo "=== Successful image creation (defaults) ==="
+echo
+
+size=$((128 * 1024 * 1024))
+
+run_qemu <

[Qemu-block] [PATCH v4 29/37] sheepdog: Support .bdrv_co_create

2018-03-07 Thread Kevin Wolf

This adds the .bdrv_co_create driver callback to sheepdog, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json |  24 -
 block/sheepdog.c | 243 +++
 2 files changed, 192 insertions(+), 75 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index e590ab6c71..fd21fc 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3512,6 +3512,28 @@
 'erasure-coded': 'SheepdogRedundancyErasureCoded' } }
 
 ##
+# @BlockdevCreateOptionsSheepdog:
+#
+# Driver specific image creation options for Sheepdog.
+#
+# @location Where to store the new image file
+# @size Size of the virtual disk in bytes
+# @backing-file File name of a base image
+# @preallocationPreallocation mode (allowed values: off, full)
+# @redundancy   Redundancy of the image
+# @object-size  Object size of the image
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsSheepdog',
+  'data': { 'location': 'BlockdevOptionsSheepdog',
+'size': 'size',
+'*backing-file':'str',
+'*preallocation':   'PreallocMode',
+'*redundancy':  'SheepdogRedundancy',
+'*object-size': 'size' } }
+
+##
 # @BlockdevCreateNotSupported:
 #
 # This is used for all drivers that don't support creating images.
@@ -3562,7 +3584,7 @@
   'raw':'BlockdevCreateNotSupported',
   'rbd':'BlockdevCreateOptionsRbd',
   'replication':'BlockdevCreateNotSupported',
-  'sheepdog':   'BlockdevCreateNotSupported',
+  'sheepdog':   'BlockdevCreateOptionsSheepdog',
   'ssh':'BlockdevCreateNotSupported',
   'throttle':   'BlockdevCreateNotSupported',
   'vdi':'BlockdevCreateNotSupported',
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 3966cd229a..8680b2926f 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -15,8 +15,10 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qapi/qapi-visit-block-core.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qobject-input-visitor.h"
+#include "qapi/qobject-output-visitor.h"
 #include "qemu/uri.h"
 #include "qemu/error-report.h"
 #include "qemu/option.h"
@@ -533,23 +535,6 @@ static void sd_aio_setup(SheepdogAIOCB *acb, BDRVSheepdogState *s,
 qemu_co_mutex_unlock(&s->queue_lock);
 }
 
-static SocketAddress *sd_socket_address(const char *path,
-const char *host, const char *port)
-{
-SocketAddress *addr = g_new0(SocketAddress, 1);
-
-if (path) {
-addr->type = SOCKET_ADDRESS_TYPE_UNIX;
-addr->u.q_unix.path = g_strdup(path);
-} else {
-addr->type = SOCKET_ADDRESS_TYPE_INET;
-addr->u.inet.host = g_strdup(host ?: SD_DEFAULT_ADDR);
-addr->u.inet.port = g_strdup(port ?: stringify(SD_DEFAULT_PORT));
-}
-
-return addr;
-}
-
 static SocketAddress *sd_server_config(QDict *options, Error **errp)
 {
 QDict *server = NULL;
@@ -1882,6 +1867,44 @@ out_with_err_set:
 return ret;
 }
 
+static int sd_create_prealloc(BlockdevOptionsSheepdog *location, int64_t size,
+  Error **errp)
+{
+BlockDriverState *bs;
+Visitor *v;
+QObject *obj = NULL;
+QDict *qdict;
+Error *local_err = NULL;
+int ret;
+
+v = qobject_output_visitor_new(&obj);
+visit_type_BlockdevOptionsSheepdog(v, NULL, &location, &local_err);
+visit_free(v);
+
+if (local_err) {
+error_propagate(errp, local_err);
+qobject_decref(obj);
+return -EINVAL;
+}
+
+qdict = qobject_to_qdict(obj);
+qdict_flatten(qdict);
+
+qdict_put_str(qdict, "driver", "sheepdog");
+
+bs = bdrv_open(NULL, NULL, qdict, BDRV_O_PROTOCOL | BDRV_O_RDWR, errp);
+if (bs == NULL) {
+ret = -EIO;
+goto fail;
+}
+
+ret = sd_prealloc(bs, 0, size, errp);
+fail:
+bdrv_unref(bs);
+QDECREF(qdict);
+return ret;
+}
+
 static int parse_redundancy(BDRVSheepdogState *s, SheepdogRedundancy *opt)
 {
 struct SheepdogInode *inode = &s->inode;
@@ -1934,9 +1957,9 @@ static int parse_redundancy(BDRVSheepdogState *s, SheepdogRedundancy *opt)
  * # create a erasure coded vdi with x data strips and y parity strips
  * -o redundancy=x:y (x must be one of {2,4,8,16} and 1 <= y < SD_EC_MAX_STRIP)
  */
-static int parse_redundancy_str(BDRVSheepdogState *s, const char *opt)
+static SheepdogRedundancy *parse_redundancy_str(const char *opt)
 {
-struct SheepdogRedundancy redundancy;
+SheepdogRedundancy *redundancy;
 const char *n1, *n2;
 long copy, parity;
 char p[10];
@@ -1947,26 +1970,27 @@ static int parse_redundancy_str(BDRVSheepdogState *s, const char *opt)
 n2 = strtok(NULL, ":");
 
 if (!n1) {
-return -EINVAL;
+return NULL;

[Qemu-block] [PATCH v4 17/37] file-win32: Support .bdrv_co_create

2018-03-07 Thread Kevin Wolf

This adds the .bdrv_co_create driver callback to file-win32, which
enables image creation over QMP.
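
The conversion pattern used here, where the legacy QemuOpts entry point only
builds a typed options object and then delegates to the new callback, can be
sketched in isolation. This is a minimal standalone example under stated
assumptions: CreateOptions, create_image, create_image_legacy and the
512-byte SECTOR_SIZE are hypothetical stand-ins, not QEMU's types or
functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512  /* assumed sector size for this sketch */
#define ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* Hypothetical stand-in for the typed creation options. */
typedef struct CreateOptions {
    const char *filename;
    int64_t size;
    bool has_preallocation;
    bool has_nocow;
} CreateOptions;

/* New-style entry point: takes the typed object, rejects unsupported options. */
static int create_image(const CreateOptions *options)
{
    if (options->has_preallocation || options->has_nocow) {
        return -EINVAL;
    }
    /* A real driver would create and size the file here. */
    return 0;
}

/* Legacy entry point: fill in the typed object, then delegate. */
static int create_image_legacy(const char *filename, int64_t requested_size,
                               CreateOptions *out)
{
    *out = (CreateOptions) {
        .filename           = filename,
        .size               = ROUND_UP(requested_size, SECTOR_SIZE),
        .has_preallocation  = false,
        .has_nocow          = false,
    };
    return create_image(out);
}
```

Keeping the size rounding in the legacy wrapper means the typed entry point
uses exactly the size it is given, which is what a QMP caller would expect.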

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/file-win32.c | 47 ++-
 1 file changed, 38 insertions(+), 9 deletions(-)

diff --git a/block/file-win32.c b/block/file-win32.c
index 4a430d45f1..2e2f746bb1 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -553,30 +553,59 @@ static int64_t raw_get_allocated_file_size(BlockDriverState *bs)
 return st.st_size;
 }
 
-static int coroutine_fn raw_co_create_opts(const char *filename, QemuOpts *opts,
-   Error **errp)
+static int raw_co_create(BlockdevCreateOptions *options, Error **errp)
 {
+BlockdevCreateOptionsFile *file_opts;
 int fd;
-int64_t total_size = 0;
 
-strstart(filename, "file:", &filename);
+assert(options->driver == BLOCKDEV_DRIVER_FILE);
+file_opts = &options->u.file;
 
-/* Read out options */
-total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-  BDRV_SECTOR_SIZE);
+if (file_opts->has_preallocation) {
+error_setg(errp, "Preallocation is not supported on Windows");
+return -EINVAL;
+}
+if (file_opts->has_nocow) {
+error_setg(errp, "nocow is not supported on Windows");
+return -EINVAL;
+}
 
-fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
+fd = qemu_open(file_opts->filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
0644);
 if (fd < 0) {
 error_setg_errno(errp, errno, "Could not create file");
 return -EIO;
 }
 set_sparse(fd);
-ftruncate(fd, total_size);
+ftruncate(fd, file_opts->size);
 qemu_close(fd);
+
 return 0;
 }
 
+static int coroutine_fn raw_co_create_opts(const char *filename, QemuOpts *opts,
+   Error **errp)
+{
+BlockdevCreateOptions options;
+int64_t total_size = 0;
+
+strstart(filename, "file:", &filename);
+
+/* Read out options */
+total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
+
+options = (BlockdevCreateOptions) {
+.driver = BLOCKDEV_DRIVER_FILE,
+.u.file = {
+.filename   = (char *) filename,
+.size   = total_size,
+.has_preallocation  = false,
+.has_nocow  = false,
+},
+};
+return raw_co_create(&options, errp);
+}
 
 static QemuOptsList raw_create_opts = {
 .name = "raw-create-opts",
-- 
2.13.6

[Qemu-block] [PATCH v4 24/37] rbd: Assign s->snap/image_name in qemu_rbd_open()

2018-03-07 Thread Kevin Wolf

Now that the options are already available in qemu_rbd_open() and not
only parsed in qemu_rbd_connect(), we can assign s->snap and
s->image_name there instead of passing the fields by reference to
qemu_rbd_connect().
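
The ownership change can be pictured with a small standalone sketch. It is
not the rbd driver: Options, State, connect_cluster, open_image and
dup_or_null are hypothetical names chosen for illustration. The connect step
no longer touches the caller's strings, so its error path has nothing to
free; the caller copies them only once the connection has succeeded.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the parsed options and the driver state. */
typedef struct { const char *image; const char *snapshot; } Options;
typedef struct { char *image_name; char *snap; int connected; } State;

/* NULL-tolerant string copy; returns NULL for NULL input or on allocation failure. */
static char *dup_or_null(const char *s)
{
    char *copy;

    if (s == NULL) {
        return NULL;
    }
    copy = malloc(strlen(s) + 1);
    if (copy != NULL) {
        strcpy(copy, s);
    }
    return copy;
}

/* Connect step: only establishes the connection, owns no strings. */
static int connect_cluster(State *s, const Options *opts)
{
    if (opts->image == NULL) {
        return -1;
    }
    s->connected = 1;
    return 0;
}

/* Open step: copies the names after a successful connect. */
static int open_image(State *s, const Options *opts)
{
    int r = connect_cluster(s, opts);
    if (r < 0) {
        return r;   /* nothing to free: no strings have been copied yet */
    }
    s->snap = dup_or_null(opts->snapshot);
    s->image_name = dup_or_null(opts->image);
    return 0;
}
```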

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/rbd.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 1cd526bcea..2ac7ffca42 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -571,7 +571,6 @@ static char *qemu_rbd_mon_host(BlockdevOptionsRbd *opts, Error **errp)
 }
 
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
-char **s_snap, char **s_image_name,
 BlockdevOptionsRbd *opts, bool cache,
 const char *keypairs, const char *secretid,
 Error **errp)
@@ -593,9 +592,6 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 goto failed_opts;
 }
 
-*s_snap = g_strdup(opts->snapshot);
-*s_image_name = g_strdup(opts->image);
-
 /* try default location when conf=NULL, but ignore failure */
 r = rados_conf_read_file(*cluster, opts->conf);
 if (opts->has_conf && r < 0) {
@@ -649,8 +645,6 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 
 failed_shutdown:
 rados_shutdown(*cluster);
-g_free(*s_snap);
-g_free(*s_image_name);
 failed_opts:
 g_free(mon_host);
 return r;
@@ -711,13 +705,15 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 goto out;
 }
 
-r = qemu_rbd_connect(&s->cluster, &s->io_ctx, &s->snap, &s->image_name,
- opts, !(flags & BDRV_O_NOCACHE), keypairs, secretid,
- errp);
+r = qemu_rbd_connect(&s->cluster, &s->io_ctx, opts,
+ !(flags & BDRV_O_NOCACHE), keypairs, secretid, errp);
 if (r < 0) {
 goto out;
 }
 
+s->snap = g_strdup(opts->snapshot);
+s->image_name = g_strdup(opts->image);
+
 /* rbd_open is always r/w */
 r = rbd_open(s->io_ctx, s->image_name, &s->image, s->snap);
 if (r < 0) {
-- 
2.13.6

[Qemu-block] [PATCH v4 18/37] gluster: Support .bdrv_co_create

2018-03-07 Thread Kevin Wolf

This adds the .bdrv_co_create driver callback to gluster, which enables
image creation over QMP.
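
Part of this patch splits option parsing out of the init function, so that
parsing reports failure as a negative errno value while the pointer-returning
init function keeps reporting failure as NULL with errno set. A minimal
standalone sketch of those two conventions follows; Conn, parse_target and
init_conn are hypothetical names for this example, not gluster or QEMU
functions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical connection handle for this sketch. */
typedef struct Conn { const char *volume; } Conn;

/* Parsing step: returns 0 on success or a negative errno value. */
static int parse_target(const char *spec, Conn *conn)
{
    if (spec == NULL || strchr(spec, '/') == NULL) {
        return -EINVAL;
    }
    conn->volume = spec;
    return 0;
}

/* Init step for pointer-returning callers: failure is NULL plus errno. */
static Conn *init_conn(const char *spec, Conn *storage)
{
    int ret = parse_target(spec, storage);
    if (ret < 0) {
        errno = -ret;   /* translate the negative return code into errno */
        return NULL;
    }
    return storage;
}
```

With the parsing step separated like this, a caller that already holds a
parsed options object can skip it and go straight to the connection setup.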

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 qapi/block-core.json |  18 ++-
 block/gluster.c  | 135 ++-
 2 files changed, 108 insertions(+), 45 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 41955b097f..9170fbf6e6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3377,6 +3377,22 @@
 '*nocow':   'bool' } }
 
 ##
+# @BlockdevCreateOptionsGluster:
+#
+# Driver specific image creation options for gluster.
+#
+# @location Where to store the new image file
+# @size Size of the virtual disk in bytes
+# @preallocationPreallocation mode for the new image (default: off)
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsGluster',
+  'data': { 'location': 'BlockdevOptionsGluster',
+'size': 'size',
+'*preallocation':   'PreallocMode' } }
+
+##
 # @BlockdevQcow2Version:
 #
 # @v2:  The original QCOW2 format as introduced in qemu 0.10 (version 2)
@@ -3450,7 +3466,7 @@
   'file':   'BlockdevCreateOptionsFile',
   'ftp':'BlockdevCreateNotSupported',
   'ftps':   'BlockdevCreateNotSupported',
-  'gluster':'BlockdevCreateNotSupported',
+  'gluster':'BlockdevCreateOptionsGluster',
   'host_cdrom': 'BlockdevCreateNotSupported',
   'host_device':'BlockdevCreateNotSupported',
   'http':   'BlockdevCreateNotSupported',
diff --git a/block/gluster.c b/block/gluster.c
index 79b4cfdf74..63d3c37d4c 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -655,9 +655,11 @@ out:
 return -errno;
 }
 
-static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
-  const char *filename,
-  QDict *options, Error **errp)
+/* Converts options given in @filename and the @options QDict into the QAPI
+ * object @gconf. */
+static int qemu_gluster_parse(BlockdevOptionsGluster *gconf,
+  const char *filename,
+  QDict *options, Error **errp)
 {
 int ret;
 if (filename) {
@@ -668,8 +670,7 @@ static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
 "[host[:port]]volume/path[?socket=...]"
 "[,file.debug=N]"
 "[,file.logfile=/path/filename.log]\n");
-errno = -ret;
-return NULL;
+return ret;
 }
 } else {
 ret = qemu_gluster_parse_json(gconf, options, errp);
@@ -685,10 +686,23 @@ static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
  "file.server.1.transport=unix,"
  "file.server.1.socket=/var/run/glusterd.socket ..."
  "\n");
-errno = -ret;
-return NULL;
+return ret;
 }
+}
 
+return 0;
+}
+
+static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
+  const char *filename,
+  QDict *options, Error **errp)
+{
+int ret;
+
+ret = qemu_gluster_parse(gconf, filename, options, errp);
+if (ret < 0) {
+errno = -ret;
+return NULL;
 }
 
 return qemu_gluster_glfs_init(gconf, errp);
@@ -1021,20 +1035,72 @@ static int qemu_gluster_do_truncate(struct glfs_fd *fd, int64_t offset,
 return 0;
 }
 
+static int qemu_gluster_co_create(BlockdevCreateOptions *options,
+  Error **errp)
+{
+BlockdevCreateOptionsGluster *opts = &options->u.gluster;
+struct glfs *glfs;
+struct glfs_fd *fd = NULL;
+int ret = 0;
+
+assert(options->driver == BLOCKDEV_DRIVER_GLUSTER);
+
+glfs = qemu_gluster_glfs_init(opts->location, errp);
+if (!glfs) {
+ret = -errno;
+goto out;
+}
+
+fd = glfs_creat(glfs, opts->location->path,
+O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, S_IRUSR | S_IWUSR);
+if (!fd) {
+ret = -errno;
+goto out;
+}
+
+ret = qemu_gluster_do_truncate(fd, opts->size, opts->preallocation, errp);
+
+out:
+if (fd) {
+if (glfs_close(fd) != 0 && ret == 0) {
+ret = -errno;
+}
+}
+glfs_clear_preopened(glfs);
+return ret;
+}
+
 static int coroutine_fn qemu_gluster_co_create_opts(const char *filename,
 QemuOpts *opts,
 Error **errp)
 {
+BlockdevCreateOptions *options;
+BlockdevCreateOptionsGluster *gopts;
 BlockdevOptionsGluster *gconf;
-struct glfs

[Qemu-block] [PATCH v4 31/37] ssh: QAPIfy host-key-check option

2018-03-07 Thread Kevin Wolf

This makes the host-key-check option available in blockdev-add.
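
To show how the legacy string form relates to the structured option, here is
a standalone sketch that maps the old host_key_check values handled in the
removed code ("no", "yes", "md5:...", "sha1:...") onto a mode plus optional
hash. The type and function names (HostKeyCheck, parse_legacy_host_key_check)
are assumptions for this example and are not the generated QAPI types or the
patch's actual conversion code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the three check modes and two hash types. */
typedef enum { HKC_NONE, HKC_HASH, HKC_KNOWN_HOSTS } HkcMode;
typedef enum { HKC_MD5, HKC_SHA1 } HkcHashType;

typedef struct {
    HkcMode mode;
    HkcHashType hash_type;  /* meaningful only when mode == HKC_HASH */
    const char *hash;       /* points into the input string, or NULL */
} HostKeyCheck;

/*
 * Translate a legacy setting into the structured form.
 * A NULL string selects the default (known_hosts).
 * Returns 0 on success, -1 for an unknown setting.
 */
static int parse_legacy_host_key_check(const char *s, HostKeyCheck *out)
{
    out->hash_type = HKC_MD5;
    out->hash = NULL;

    if (s == NULL || strcmp(s, "yes") == 0) {
        out->mode = HKC_KNOWN_HOSTS;
    } else if (strcmp(s, "no") == 0) {
        out->mode = HKC_NONE;
    } else if (strncmp(s, "md5:", 4) == 0) {
        out->mode = HKC_HASH;
        out->hash_type = HKC_MD5;
        out->hash = s + 4;
    } else if (strncmp(s, "sha1:", 5) == 0) {
        out->mode = HKC_HASH;
        out->hash_type = HKC_SHA1;
        out->hash = s + 5;
    } else {
        return -1;
    }
    return 0;
}
```

Once the setting is a tagged structure, the checking code can switch on the
mode instead of comparing strings, which is the shape the new
check_host_key() takes.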

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json | 63 +++--
 block/ssh.c  | 88 +---
 2 files changed, 117 insertions(+), 34 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index fd21fc..4814bb7db7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2553,6 +2553,63 @@
 '*encrypt': 'BlockdevQcow2Encryption' } }
 
 ##
+# @SshHostKeyCheckMode:
+#
+# @none Don't check the host key at all
+# @hash Compare the host key with a given hash
+# @known_hosts  Check the host key against the known_hosts file
+#
+# Since: 2.12
+##
+{ 'enum': 'SshHostKeyCheckMode',
+  'data': [ 'none', 'hash', 'known_hosts' ] }
+
+##
+# @SshHostKeyCheckHashType:
+#
+# @md5  The given hash is an md5 hash
+# @sha1 The given hash is an sha1 hash
+#
+# Since: 2.12
+##
+{ 'enum': 'SshHostKeyCheckHashType',
+  'data': [ 'md5', 'sha1' ] }
+
+##
+# @SshHostKeyHash:
+#
+# @type The hash algorithm used for the hash
+# @hash The expected hash value
+#
+# Since: 2.12
+##
+{ 'struct': 'SshHostKeyHash',
+  'data': { 'type': 'SshHostKeyCheckHashType',
+'hash': 'str' }}
+
+##
+# @SshHostKeyDummy:
+#
+# For those union branches that don't need additional fields.
+#
+# Since: 2.12
+##
+{ 'struct': 'SshHostKeyDummy',
+  'data': {} }
+
+##
+# @SshHostKeyCheck:
+#
+# Since: 2.12
+##
+{ 'union': 'SshHostKeyCheck',
+  'base': { 'mode': 'SshHostKeyCheckMode' },
+  'discriminator': 'mode',
+  'data': { 'none': 'SshHostKeyDummy',
+'hash': 'SshHostKeyHash',
+'known_hosts': 'SshHostKeyDummy' } }
+
+##
 # @BlockdevOptionsSsh:
 #
 # @server:  host address
@@ -2562,14 +2619,16 @@
 # @user:user as which to connect, defaults to current
 #   local user name
 #
-# TODO: Expose the host_key_check option in QMP
+# @host-key-check:  Defines how and what to check the host key against
+#   (default: known_hosts)
 #
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsSsh',
   'data': { 'server': 'InetSocketAddress',
 'path': 'str',
-'*user': 'str' } }
+'*user': 'str',
+'*host-key-check': 'SshHostKeyCheck' } }
 
 
 ##
diff --git a/block/ssh.c b/block/ssh.c
index 8b646c0ede..30cdf9a99f 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -431,31 +431,35 @@ check_host_key_hash(BDRVSSHState *s, const char *hash,
 }
 
 static int check_host_key(BDRVSSHState *s, const char *host, int port,
-  const char *host_key_check, Error **errp)
+  SshHostKeyCheck *hkc, Error **errp)
 {
-/* host_key_check=no */
-if (strcmp(host_key_check, "no") == 0) {
-return 0;
-}
+SshHostKeyCheckMode mode;
 
-/* host_key_check=md5:xx:yy:zz:... */
-if (strncmp(host_key_check, "md5:", 4) == 0) {
-return check_host_key_hash(s, &host_key_check[4],
-   LIBSSH2_HOSTKEY_HASH_MD5, 16, errp);
-}
-
-/* host_key_check=sha1:xx:yy:zz:... */
-if (strncmp(host_key_check, "sha1:", 5) == 0) {
-return check_host_key_hash(s, &host_key_check[5],
-   LIBSSH2_HOSTKEY_HASH_SHA1, 20, errp);
+if (hkc) {
+mode = hkc->mode;
+} else {
+mode = SSH_HOST_KEY_CHECK_MODE_KNOWN_HOSTS;
 }
 
-/* host_key_check=yes */
-if (strcmp(host_key_check, "yes") == 0) {
+switch (mode) {
+case SSH_HOST_KEY_CHECK_MODE_NONE:
+return 0;
+case SSH_HOST_KEY_CHECK_MODE_HASH:
+if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_MD5) {
+return check_host_key_hash(s, hkc->u.hash.hash,
+   LIBSSH2_HOSTKEY_HASH_MD5, 16, errp);
+} else if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_SHA1) {
+return check_host_key_hash(s, hkc->u.hash.hash,
+   LIBSSH2_HOSTKEY_HASH_SHA1, 20, errp);
+}
+g_assert_not_reached();
+break;
+case SSH_HOST_KEY_CHECK_MODE_KNOWN_HOSTS:
 return check_host_key_knownhosts(s, host, port, errp);
+default:
+g_assert_not_reached();
 }
 
-error_setg(errp, "unknown host_key_check setting (%s)", host_key_check);
 return -EINVAL;
 }
 
@@ -544,16 +548,22 @@ static QemuOptsList ssh_runtime_opts = {
 .type = QEMU_OPT_NUMBER,
 .help = "Port to connect to",
 },
+{
+.name = "host_key_check",
+.type = QEMU_OPT_STRING,
+.help = "Defines how and what to check the host key against",
+},
 { /* end of list */ }
 },
 };
 
-static bool ssh_process_legacy_socket_options(QDict *output_opts,
-

[Qemu-block] [PATCH v4 30/37] ssh: Use QAPI BlockdevOptionsSsh object

2018-03-07 Thread Kevin Wolf

Create a BlockdevOptionsSsh object in connect_to_ssh() and take the
options from there. 'host_key_check' is still processed separately
because it's not in the schema yet.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/ssh.c | 137 +++-
 1 file changed, 62 insertions(+), 75 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index ff9929497d..8b646c0ede 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -35,6 +35,7 @@
 #include "qemu/sockets.h"
 #include "qemu/uri.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qapi/qapi-visit-block-core.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
 #include "qapi/qobject-input-visitor.h"
@@ -543,21 +544,6 @@ static QemuOptsList ssh_runtime_opts = {
 .type = QEMU_OPT_NUMBER,
 .help = "Port to connect to",
 },
-{
-.name = "path",
-.type = QEMU_OPT_STRING,
-.help = "Path of the image on the host",
-},
-{
-.name = "user",
-.type = QEMU_OPT_STRING,
-.help = "User as which to connect",
-},
-{
-.name = "host_key_check",
-.type = QEMU_OPT_STRING,
-.help = "Defines how and what to check the host key against",
-},
 { /* end of list */ }
 },
 };
@@ -582,23 +568,31 @@ static bool ssh_process_legacy_socket_options(QDict *output_opts,
 return true;
 }
 
-static InetSocketAddress *ssh_config(QDict *options, Error **errp)
+static BlockdevOptionsSsh *ssh_parse_options(QDict *options, Error **errp)
 {
-InetSocketAddress *inet = NULL;
-QDict *addr = NULL;
-QObject *crumpled_addr = NULL;
-Visitor *iv = NULL;
-Error *local_error = NULL;
-
-qdict_extract_subqdict(options, &addr, "server.");
-if (!qdict_size(addr)) {
-error_setg(errp, "SSH server address missing");
-goto out;
+BlockdevOptionsSsh *result = NULL;
+QemuOpts *opts = NULL;
+Error *local_err = NULL;
+QObject *crumpled;
+const QDictEntry *e;
+Visitor *v;
+
+/* Translate legacy options */
+opts = qemu_opts_create(&ssh_runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
 }
 
-crumpled_addr = qdict_crumple(addr, errp);
-if (!crumpled_addr) {
-goto out;
+if (!ssh_process_legacy_socket_options(options, opts, errp)) {
+goto fail;
+}
+
+/* Create the QAPI object */
+crumpled = qdict_crumple(options, errp);
+if (crumpled == NULL) {
+goto fail;
 }
 
 /*
@@ -609,51 +603,50 @@ static InetSocketAddress *ssh_config(QDict *options, Error **errp)
  * but when they come from -drive, they're all QString.  The
  * visitor expects the former.
  */
-iv = qobject_input_visitor_new(crumpled_addr);
-visit_type_InetSocketAddress(iv, NULL, &inet, &local_error);
-if (local_error) {
-error_propagate(errp, local_error);
-goto out;
+v = qobject_input_visitor_new(crumpled);
+visit_type_BlockdevOptionsSsh(v, NULL, &result, &local_err);
+visit_free(v);
+qobject_decref(crumpled);
+
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
 }
 
-out:
-QDECREF(addr);
-qobject_decref(crumpled_addr);
-visit_free(iv);
-return inet;
+/* Remove the processed options from the QDict (the visitor processes
+ * _all_ options in the QDict) */
+while ((e = qdict_first(options))) {
+qdict_del(options, e->key);
+}
+
+fail:
+qemu_opts_del(opts);
+return result;
 }
 
 static int connect_to_ssh(BDRVSSHState *s, QDict *options,
   int ssh_flags, int creat_mode, Error **errp)
 {
+BlockdevOptionsSsh *opts;
 int r, ret;
-QemuOpts *opts = NULL;
-Error *local_err = NULL;
-const char *user, *path, *host_key_check;
+const char *user, *host_key_check;
 long port = 0;
 
-opts = qemu_opts_create(&ssh_runtime_opts, NULL, 0, &error_abort);
-qemu_opts_absorb_qdict(opts, options, &local_err);
-if (local_err) {
-ret = -EINVAL;
-error_propagate(errp, local_err);
-goto err;
-}
-
-if (!ssh_process_legacy_socket_options(options, opts, errp)) {
-ret = -EINVAL;
-goto err;
+host_key_check = qdict_get_try_str(options, "host_key_check");
+if (!host_key_check) {
+host_key_check = "yes";
+} else {
+qdict_del(options, "host_key_check");
 }
 
-path = qemu_opt_get(opts, "path");
-if (!path) {
-ret = -EINVAL;
-error_setg(errp, "No path was specified");
-goto err;
+opts = ssh_parse_options(options, errp);
+if (opts == NULL) {
+return -EINVAL;
 }
 
-user = qemu_opt_get(opts, "user");
-if (!user) {
+if (opts->has_user) {
+user = opts->user;
+

[Qemu-block] [PATCH v4 33/37] ssh: Support .bdrv_co_create

2018-03-07 Thread Kevin Wolf

This adds the .bdrv_co_create driver callback to ssh, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json | 16 +-
 block/ssh.c  | 83 ++--
 2 files changed, 63 insertions(+), 36 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 4814bb7db7..524d51567a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3593,6 +3593,20 @@
 '*object-size': 'size' } }
 
 ##
+# @BlockdevCreateOptionsSsh:
+#
+# Driver specific image creation options for SSH.
+#
+# @location Where to store the new image file
+# @size Size of the virtual disk in bytes
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsSsh',
+  'data': { 'location': 'BlockdevOptionsSsh',
+'size': 'size' } }
+
+##
 # @BlockdevCreateNotSupported:
 #
 # This is used for all drivers that don't support creating images.
@@ -3644,7 +3658,7 @@
   'rbd':'BlockdevCreateOptionsRbd',
   'replication':'BlockdevCreateNotSupported',
   'sheepdog':   'BlockdevCreateOptionsSheepdog',
-  'ssh':'BlockdevCreateNotSupported',
+  'ssh':'BlockdevCreateOptionsSsh',
   'throttle':   'BlockdevCreateNotSupported',
   'vdi':'BlockdevCreateNotSupported',
   'vhdx':   'BlockdevCreateNotSupported',
diff --git a/block/ssh.c b/block/ssh.c
index 80f59055cc..ab3acf0c22 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -854,59 +854,71 @@ static QemuOptsList ssh_create_opts = {
 }
 };
 
+static int ssh_co_create(BlockdevCreateOptions *options, Error **errp)
+{
+BlockdevCreateOptionsSsh *opts = &options->u.ssh;
+BDRVSSHState s;
+int ret;
+
+assert(options->driver == BLOCKDEV_DRIVER_SSH);
+
+ssh_state_init(&s);
+
+ret = connect_to_ssh(&s, opts->location,
+ LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE|
+ LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC,
+ 0644, errp);
+if (ret < 0) {
+goto fail;
+}
+
+if (opts->size > 0) {
+ret = ssh_grow_file(&s, opts->size, errp);
+if (ret < 0) {
+goto fail;
+}
+}
+
+ret = 0;
+fail:
+ssh_state_free(&s);
+return ret;
+}
+
 static int coroutine_fn ssh_co_create_opts(const char *filename, QemuOpts *opts,
Error **errp)
 {
-int r, ret;
-int64_t total_size = 0;
+BlockdevCreateOptions *create_options;
+BlockdevCreateOptionsSsh *ssh_opts;
+int ret;
 QDict *uri_options = NULL;
-BlockdevOptionsSsh *ssh_opts = NULL;
-BDRVSSHState s;
 
-ssh_state_init(&s);
+create_options = g_new0(BlockdevCreateOptions, 1);
+create_options->driver = BLOCKDEV_DRIVER_SSH;
+ssh_opts = &create_options->u.ssh;
 
 /* Get desired file size. */
-total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-  BDRV_SECTOR_SIZE);
-DPRINTF("total_size=%" PRIi64, total_size);
+ssh_opts->size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
+DPRINTF("total_size=%" PRIi64, ssh_opts->size);
 
 uri_options = qdict_new();
-r = parse_uri(filename, uri_options, errp);
-if (r < 0) {
-ret = r;
+ret = parse_uri(filename, uri_options, errp);
+if (ret < 0) {
 goto out;
 }
 
-ssh_opts = ssh_parse_options(uri_options, errp);
-if (ssh_opts == NULL) {
+ssh_opts->location = ssh_parse_options(uri_options, errp);
+if (ssh_opts->location == NULL) {
 ret = -EINVAL;
 goto out;
 }
 
-r = connect_to_ssh(&s, ssh_opts,
-   LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE|
-   LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC,
-   0644, errp);
-if (r < 0) {
-ret = r;
-goto out;
-}
-
-if (total_size > 0) {
-ret = ssh_grow_file(&s, total_size, errp);
-if (ret < 0) {
-goto out;
-}
-}
-
-ret = 0;
+ret = ssh_co_create(create_options, errp);
 
  out:
-ssh_state_free(&s);
-if (uri_options != NULL) {
-QDECREF(uri_options);
-}
-qapi_free_BlockdevOptionsSsh(ssh_opts);
+QDECREF(uri_options);
+qapi_free_BlockdevCreateOptions(create_options);
 return ret;
 }
 
@@ -1268,6 +1280,7 @@ static BlockDriver bdrv_ssh = {
 .instance_size= sizeof(BDRVSSHState),
 .bdrv_parse_filename  = ssh_parse_filename,
 .bdrv_file_open   = ssh_file_open,
+.bdrv_co_create   = ssh_co_create,
 .bdrv_co_create_opts  = ssh_co_create_opts,
 .bdrv_close   = ssh_close,
 .bdrv_has_zero_init   = ssh_has_zero_init,
-- 
2.13.6
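
The delegation in this patch (the legacy QemuOpts entry point fills in a typed options object and hands it to the new typed callback) can be sketched in isolation. This is only an illustration under stated assumptions: Location, CreateOptions, parse_location(), typed_create() and legacy_create() are hypothetical stand-ins, not QEMU's types or functions, and no SSH connection is made.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the QAPI-generated types. */
typedef enum { DRIVER_FILE, DRIVER_SSH } Driver;
typedef struct { char *host; char *path; } Location;
typedef struct {
    Driver driver;
    Location *location;   /* owned by the options object */
    uint64_t size;
} CreateOptions;

#define SECTOR_SIZE 512u

static uint64_t last_created_size;   /* records what typed_create() saw */

static char *dup_range(const char *start, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy) {
        memcpy(copy, start, len);
        copy[len] = '\0';
    }
    return copy;
}

/* Accepts only "ssh://host/path"; returns NULL for anything else. */
static Location *parse_location(const char *filename)
{
    static const char prefix[] = "ssh://";
    size_t plen = sizeof(prefix) - 1;
    const char *host, *slash;
    Location *loc;

    if (strncmp(filename, prefix, plen) != 0) {
        return NULL;
    }
    host = filename + plen;
    slash = strchr(host, '/');
    if (slash == NULL || slash == host) {
        return NULL;
    }
    loc = calloc(1, sizeof(*loc));
    if (loc) {
        loc->host = dup_range(host, (size_t)(slash - host));
        loc->path = dup_range(slash, strlen(slash));
    }
    return loc;
}

static void free_options(CreateOptions *options)
{
    if (options == NULL) {
        return;
    }
    if (options->location) {
        free(options->location->host);
        free(options->location->path);
        free(options->location);
    }
    free(options);
}

/* Typed entry point: the only place that would really create the image. */
static int typed_create(CreateOptions *options)
{
    assert(options->driver == DRIVER_SSH);
    if (!options->location || !options->location->host ||
        !options->location->path) {
        return -EINVAL;
    }
    last_created_size = options->size;
    return 0;
}

/* Legacy entry point: translate, delegate, then free the whole object. */
static int legacy_create(const char *filename, uint64_t requested_size)
{
    CreateOptions *options = calloc(1, sizeof(*options));
    int ret;

    if (options == NULL) {
        return -ENOMEM;
    }
    options->driver = DRIVER_SSH;
    /* Round the size up to a whole sector, as the wrapper above does. */
    options->size = (requested_size + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
    options->location = parse_location(filename);
    if (options->location == NULL) {
        ret = -EINVAL;
        goto out;
    }
    ret = typed_create(options);
out:
    free_options(options);
    return ret;
}
```

Because the location is stored inside the options object before the first failure check, a single free of the outer object on the exit path covers both success and error cases, which is why the patch can drop the separate qapi_free_BlockdevOptionsSsh() call.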

[Qemu-block] [PATCH v4 32/37] ssh: Pass BlockdevOptionsSsh to connect_to_ssh()

2018-03-07 Thread Kevin Wolf

Move the parsing of the QDict options up to the callers, in preparation
for the .bdrv_co_create implementation that directly gets a QAPI type.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/ssh.c | 34 +-
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index 30cdf9a99f..80f59055cc 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -656,19 +656,13 @@ fail:
 return result;
 }
 
-static int connect_to_ssh(BDRVSSHState *s, QDict *options,
+static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
   int ssh_flags, int creat_mode, Error **errp)
 {
-BlockdevOptionsSsh *opts;
 int r, ret;
 const char *user;
 long port = 0;
 
-opts = ssh_parse_options(options, errp);
-if (opts == NULL) {
-return -EINVAL;
-}
-
 if (opts->has_user) {
 user = opts->user;
 } else {
@@ -748,8 +742,6 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
 goto err;
 }
 
-qapi_free_BlockdevOptionsSsh(opts);
-
 r = libssh2_sftp_fstat(s->sftp_handle, &s->attrs);
 if (r < 0) {
 sftp_error_setg(errp, s, "failed to read file attributes");
@@ -775,8 +767,6 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
 }
 s->session = NULL;
 
-qapi_free_BlockdevOptionsSsh(opts);
-
 return ret;
 }
 
@@ -784,6 +774,7 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
  Error **errp)
 {
 BDRVSSHState *s = bs->opaque;
+BlockdevOptionsSsh *opts;
 int ret;
 int ssh_flags;
 
@@ -794,8 +785,13 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
 ssh_flags |= LIBSSH2_FXF_WRITE;
 }
 
+opts = ssh_parse_options(options, errp);
+if (opts == NULL) {
+return -EINVAL;
+}
+
 /* Start up SSH. */
-ret = connect_to_ssh(s, options, ssh_flags, 0, errp);
+ret = connect_to_ssh(s, opts, ssh_flags, 0, errp);
 if (ret < 0) {
 goto err;
 }
@@ -803,6 +799,8 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
 /* Go non-blocking. */
 libssh2_session_set_blocking(s->session, 0);
 
+qapi_free_BlockdevOptionsSsh(opts);
+
 return 0;
 
  err:
@@ -811,6 +809,8 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
 }
 s->sock = -1;
 
+qapi_free_BlockdevOptionsSsh(opts);
+
 return ret;
 }
 
@@ -860,6 +860,7 @@ static int coroutine_fn ssh_co_create_opts(const char *filename, QemuOpts *opts,
 int r, ret;
 int64_t total_size = 0;
 QDict *uri_options = NULL;
+BlockdevOptionsSsh *ssh_opts = NULL;
 BDRVSSHState s;
 
 ssh_state_init();
@@ -876,7 +877,13 @@ static int coroutine_fn ssh_co_create_opts(const char *filename, QemuOpts *opts,
 goto out;
 }
 
-r = connect_to_ssh(&s, uri_options,
+ssh_opts = ssh_parse_options(uri_options, errp);
+if (ssh_opts == NULL) {
+ret = -EINVAL;
+goto out;
+}
+
+r = connect_to_ssh(&s, ssh_opts,
LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE|
LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC,
0644, errp);
@@ -899,6 +906,7 @@ static int coroutine_fn ssh_co_create_opts(const char *filename, QemuOpts *opts,
 if (uri_options != NULL) {
 QDECREF(uri_options);
 }
+qapi_free_BlockdevOptionsSsh(ssh_opts);
 return ret;
 }
 
-- 
2.13.6
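
The ownership change made here (the caller parses the options, the connect helper only borrows them, and the caller frees them on every exit path) can be shown with a small stand-alone sketch. ConnOptions, parse_options(), do_connect() and open_connection() are hypothetical names used for illustration; they are not part of block/ssh.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for BlockdevOptionsSsh. */
typedef struct {
    char *host;
    int port;
} ConnOptions;

static int live_options;   /* number of ConnOptions objects currently allocated */

static ConnOptions *parse_options(const char *host, int port)
{
    ConnOptions *opts;
    size_t len;

    if (host == NULL || *host == '\0') {
        return NULL;
    }
    opts = calloc(1, sizeof(*opts));
    if (opts == NULL) {
        return NULL;
    }
    len = strlen(host) + 1;
    opts->host = malloc(len);
    if (opts->host == NULL) {
        free(opts);
        return NULL;
    }
    memcpy(opts->host, host, len);
    opts->port = port;
    live_options++;
    return opts;
}

static void free_options(ConnOptions *opts)
{
    if (opts == NULL) {
        return;
    }
    free(opts->host);
    free(opts);
    live_options--;
}

/* Borrows opts: reads the fields but never frees the object. */
static int do_connect(const ConnOptions *opts)
{
    if (opts->port <= 0 || opts->port > 65535) {
        return -EINVAL;
    }
    return 0;
}

/* The caller owns opts, so it releases them whether connecting worked or not. */
static int open_connection(const char *host, int port)
{
    ConnOptions *opts = parse_options(host, port);
    int ret;

    if (opts == NULL) {
        return -EINVAL;
    }
    ret = do_connect(opts);
    free_options(opts);
    return ret;
}
```

The sketch has a single exit after do_connect(), so one free suffices; ssh_file_open() has separate success and err paths and therefore frees in both.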

[Qemu-block] [PATCH v4 26/37] nfs: Use QAPI options in nfs_client_open()

2018-03-07 Thread Kevin Wolf

Using the QAPI visitor to turn all options into QAPI BlockdevOptionsNfs
simplifies the code a lot. It will also be useful for implementing the
QAPI based .bdrv_co_create callback.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/nfs.c | 176 ++--
 1 file changed, 53 insertions(+), 123 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index 7433d25856..e402d643fe 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -367,49 +367,6 @@ static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
 return task.ret;
 }
 
-static QemuOptsList runtime_opts = {
-.name = "nfs",
-.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
-.desc = {
-{
-.name = "path",
-.type = QEMU_OPT_STRING,
-.help = "Path of the image on the host",
-},
-{
-.name = "user",
-.type = QEMU_OPT_NUMBER,
-.help = "UID value to use when talking to the server",
-},
-{
-.name = "group",
-.type = QEMU_OPT_NUMBER,
-.help = "GID value to use when talking to the server",
-},
-{
-.name = "tcp-syn-count",
-.type = QEMU_OPT_NUMBER,
-.help = "Number of SYNs to send during the session establish",
-},
-{
-.name = "readahead-size",
-.type = QEMU_OPT_NUMBER,
-.help = "Set the readahead size in bytes",
-},
-{
-.name = "page-cache-size",
-.type = QEMU_OPT_NUMBER,
-.help = "Set the pagecache size in bytes",
-},
-{
-.name = "debug",
-.type = QEMU_OPT_NUMBER,
-.help = "Set the NFS debug level (max 2)",
-},
-{ /* end of list */ }
-},
-};
-
 static void nfs_detach_aio_context(BlockDriverState *bs)
 {
 NFSClient *client = bs->opaque;
@@ -452,71 +409,16 @@ static void nfs_file_close(BlockDriverState *bs)
 nfs_client_close(client);
 }
 
-static NFSServer *nfs_config(QDict *options, Error **errp)
-{
-NFSServer *server = NULL;
-QDict *addr = NULL;
-QObject *crumpled_addr = NULL;
-Visitor *iv = NULL;
-Error *local_error = NULL;
-
-qdict_extract_subqdict(options, &addr, "server.");
-if (!qdict_size(addr)) {
-error_setg(errp, "NFS server address missing");
-goto out;
-}
-
-crumpled_addr = qdict_crumple(addr, errp);
-if (!crumpled_addr) {
-goto out;
-}
-
-/*
- * Caution: this works only because all scalar members of
- * NFSServer are QString in @crumpled_addr.  The visitor expects
- * @crumpled_addr to be typed according to the QAPI schema.  It
- * is when @options come from -blockdev or blockdev_add.  But when
- * they come from -drive, they're all QString.
- */
-iv = qobject_input_visitor_new(crumpled_addr);
-visit_type_NFSServer(iv, NULL, &server, &local_error);
-if (local_error) {
-error_propagate(errp, local_error);
-goto out;
-}
-
-out:
-QDECREF(addr);
-qobject_decref(crumpled_addr);
-visit_free(iv);
-return server;
-}
-
-
-static int64_t nfs_client_open(NFSClient *client, QDict *options,
+static int64_t nfs_client_open(NFSClient *client, BlockdevOptionsNfs *opts,
int flags, int open_flags, Error **errp)
 {
 int64_t ret = -EINVAL;
-QemuOpts *opts = NULL;
-Error *local_err = NULL;
 struct stat st;
 char *file = NULL, *strp = NULL;
 
 qemu_mutex_init(&client->mutex);
-opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
-qemu_opts_absorb_qdict(opts, options, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-ret = -EINVAL;
-goto fail;
-}
 
-client->path = g_strdup(qemu_opt_get(opts, "path"));
-if (!client->path) {
-ret = -EINVAL;
-error_setg(errp, "No path was specified");
-goto fail;
-}
+client->path = g_strdup(opts->path);
 
 strp = strrchr(client->path, '/');
 if (strp == NULL) {
@@ -526,12 +428,10 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
 file = g_strdup(strp);
 *strp = 0;
 
-/* Pop the config into our state object, Exit if invalid */
-client->server = nfs_config(options, errp);
-if (!client->server) {
-ret = -EINVAL;
-goto fail;
-}
+/* Steal the NFSServer object from opts; set the original pointer to NULL
+ * to avoid use after free and double free. */
+client->server = opts->server;
+opts->server = NULL;
 
 client->context = nfs_init_context();
 if (client->context == NULL) {
@@ -539,29 +439,29 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
 goto fail;
 }
 
-if (qemu_opt_get(opts, "user")) {
-client->uid = qemu_opt_get_number(opts, "user", 0);
+if (opts->has_user) {
+

[Qemu-block] [PATCH v4 20/37] rbd: Factor out qemu_rbd_connect()

2018-03-07 Thread Kevin Wolf

The code to establish an RBD connection is duplicated between open and
create. In order to be able to share the code, factor out the code from
qemu_rbd_open() as a first step.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/rbd.c | 100 
 1 file changed, 60 insertions(+), 40 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index c1025c8493..99fcc7ecdf 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -546,32 +546,17 @@ out:
 return rados_str;
 }
 
-static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
- Error **errp)
+static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
+char **s_snap, char **s_image_name,
+QDict *options, bool cache, Error **errp)
 {
-BDRVRBDState *s = bs->opaque;
-const char *pool, *snap, *conf, *user, *image_name, *keypairs;
-const char *secretid, *filename;
 QemuOpts *opts;
-Error *local_err = NULL;
 char *mon_host = NULL;
+const char *pool, *snap, *conf, *user, *image_name, *keypairs;
+const char *secretid;
+Error *local_err = NULL;
 int r;
 
-/* If we are given a filename, parse the filename, with precedence given to
- * filename encoded options */
-filename = qdict_get_try_str(options, "filename");
-if (filename) {
-warn_report("'filename' option specified. "
-"This is an unsupported option, and may be deprecated "
-"in the future");
-qemu_rbd_parse_filename(filename, options, &local_err);
-if (local_err) {
-r = -EINVAL;
-error_propagate(errp, local_err);
-goto exit;
-}
-}
-
 opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
 qemu_opts_absorb_qdict(opts, options, &local_err);
 if (local_err) {
@@ -602,35 +587,35 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 goto failed_opts;
 }
 
-r = rados_create(&s->cluster, user);
+r = rados_create(cluster, user);
 if (r < 0) {
 error_setg_errno(errp, -r, "error initializing");
 goto failed_opts;
 }
 
-s->snap = g_strdup(snap);
-s->image_name = g_strdup(image_name);
+*s_snap = g_strdup(snap);
+*s_image_name = g_strdup(image_name);
 
 /* try default location when conf=NULL, but ignore failure */
-r = rados_conf_read_file(s->cluster, conf);
+r = rados_conf_read_file(*cluster, conf);
 if (conf && r < 0) {
 error_setg_errno(errp, -r, "error reading conf file %s", conf);
 goto failed_shutdown;
 }
 
-r = qemu_rbd_set_keypairs(s->cluster, keypairs, errp);
+r = qemu_rbd_set_keypairs(*cluster, keypairs, errp);
 if (r < 0) {
 goto failed_shutdown;
 }
 
 if (mon_host) {
-r = rados_conf_set(s->cluster, "mon_host", mon_host);
+r = rados_conf_set(*cluster, "mon_host", mon_host);
 if (r < 0) {
 goto failed_shutdown;
 }
 }
 
-if (qemu_rbd_set_auth(s->cluster, secretid, errp) < 0) {
+if (qemu_rbd_set_auth(*cluster, secretid, errp) < 0) {
 r = -EIO;
 goto failed_shutdown;
 }
@@ -642,24 +627,65 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
  * librbd defaults to no caching. If write through caching cannot
  * be set up, fall back to no caching.
  */
-if (flags & BDRV_O_NOCACHE) {
-rados_conf_set(s->cluster, "rbd_cache", "false");
+if (cache) {
+rados_conf_set(*cluster, "rbd_cache", "true");
 } else {
-rados_conf_set(s->cluster, "rbd_cache", "true");
+rados_conf_set(*cluster, "rbd_cache", "false");
 }
 
-r = rados_connect(s->cluster);
+r = rados_connect(*cluster);
 if (r < 0) {
 error_setg_errno(errp, -r, "error connecting");
 goto failed_shutdown;
 }
 
-r = rados_ioctx_create(s->cluster, pool, &s->io_ctx);
+r = rados_ioctx_create(*cluster, pool, io_ctx);
 if (r < 0) {
 error_setg_errno(errp, -r, "error opening pool %s", pool);
 goto failed_shutdown;
 }
 
+qemu_opts_del(opts);
+return 0;
+
+failed_shutdown:
+rados_shutdown(*cluster);
+g_free(*s_snap);
+g_free(*s_image_name);
+failed_opts:
+qemu_opts_del(opts);
+g_free(mon_host);
+return r;
+}
+
+static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
+ Error **errp)
+{
+BDRVRBDState *s = bs->opaque;
+Error *local_err = NULL;
+const char *filename;
+int r;
+
+/* If we are given a filename, parse the filename, with precedence given to
+ * filename encoded options */
+filename = qdict_get_try_str(options, "filename");
+if (filename) {
+warn_report("'filename' option specified. "
+"This is an unsupported option, and may

[Qemu-block] [PATCH v4 16/37] file-posix: Support .bdrv_co_create

2018-03-07 Thread Kevin Wolf

This adds the .bdrv_co_create driver callback to file, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 qapi/block-core.json | 20 -
 block/file-posix.c   | 79 +---
 2 files changed, 75 insertions(+), 24 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 88d7a8678d..41955b097f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3359,6 +3359,24 @@
 { 'command': 'blockdev-del', 'data': { 'node-name': 'str' } }
 
 ##
+# @BlockdevCreateOptionsFile:
+#
+# Driver specific image creation options for file.
+#
+# @filename Filename for the new image file
+# @size Size of the virtual disk in bytes
+# @preallocation Preallocation mode for the new image (default: off)
+# @nocow Turn off copy-on-write (valid only on btrfs; default: off)
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsFile',
+  'data': { 'filename': 'str',
+'size': 'size',
+'*preallocation':   'PreallocMode',
+'*nocow':   'bool' } }
+
+##
 # @BlockdevQcow2Version:
 #
 # @v2:  The original QCOW2 format as introduced in qemu 0.10 (version 2)
@@ -3429,7 +3447,7 @@
   'bochs':  'BlockdevCreateNotSupported',
   'cloop':  'BlockdevCreateNotSupported',
   'dmg':'BlockdevCreateNotSupported',
-  'file':   'BlockdevCreateNotSupported',
+  'file':   'BlockdevCreateOptionsFile',
   'ftp':'BlockdevCreateNotSupported',
   'ftps':   'BlockdevCreateNotSupported',
   'gluster':'BlockdevCreateNotSupported',
diff --git a/block/file-posix.c b/block/file-posix.c
index 7f2cc63c60..fbc21a9921 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1982,34 +1982,25 @@ static int64_t raw_get_allocated_file_size(BlockDriverState *bs)
 return (int64_t)st.st_blocks * 512;
 }
 
-static int coroutine_fn raw_co_create_opts(const char *filename, QemuOpts *opts,
-   Error **errp)
+static int raw_co_create(BlockdevCreateOptions *options, Error **errp)
 {
+BlockdevCreateOptionsFile *file_opts;
 int fd;
 int result = 0;
-int64_t total_size = 0;
-bool nocow = false;
-PreallocMode prealloc;
-char *buf = NULL;
-Error *local_err = NULL;
 
-strstart(filename, "file:", &filename);
+/* Validate options and set default values */
+assert(options->driver == BLOCKDEV_DRIVER_FILE);
+file_opts = &options->u.file;
 
-/* Read out options */
-total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-  BDRV_SECTOR_SIZE);
-nocow = qemu_opt_get_bool(opts, BLOCK_OPT_NOCOW, false);
-buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
-prealloc = qapi_enum_parse(&PreallocMode_lookup, buf,
-   PREALLOC_MODE_OFF, &local_err);
-g_free(buf);
-if (local_err) {
-error_propagate(errp, local_err);
-result = -EINVAL;
-goto out;
+if (!file_opts->has_nocow) {
+file_opts->nocow = false;
+}
+if (!file_opts->has_preallocation) {
+file_opts->preallocation = PREALLOC_MODE_OFF;
 }
 
-fd = qemu_open(filename, O_RDWR | O_CREAT | O_TRUNC | O_BINARY,
+/* Create file */
+fd = qemu_open(file_opts->filename, O_RDWR | O_CREAT | O_TRUNC | O_BINARY,
0644);
 if (fd < 0) {
 result = -errno;
@@ -2017,7 +2008,7 @@ static int coroutine_fn raw_co_create_opts(const char *filename, QemuOpts *opts,
 goto out;
 }
 
-if (nocow) {
+if (file_opts->nocow) {
 #ifdef __linux__
 /* Set NOCOW flag to solve performance issue on fs like btrfs.
  * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value
@@ -2032,7 +2023,8 @@ static int coroutine_fn raw_co_create_opts(const char *filename, QemuOpts *opts,
 #endif
 }
 
-result = raw_regular_truncate(fd, total_size, prealloc, errp);
+result = raw_regular_truncate(fd, file_opts->size, file_opts->preallocation,
+  errp);
 if (result < 0) {
 goto out_close;
 }
@@ -2046,6 +2038,46 @@ out:
 return result;
 }
 
+static int coroutine_fn raw_co_create_opts(const char *filename, QemuOpts *opts,
+   Error **errp)
+{
+BlockdevCreateOptions options;
+int64_t total_size = 0;
+bool nocow = false;
+PreallocMode prealloc;
+char *buf = NULL;
+Error *local_err = NULL;
+
+/* Skip file: protocol prefix */
+strstart(filename, "file:", &filename);
+
+/* Read out options */
+total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+  BDRV_SECTOR_SIZE);
+nocow = qemu_opt_get_bool(opts, BLOCK_OPT_NOCOW, false);
+buf =

[Qemu-block] [PATCH v4 23/37] rbd: Support .bdrv_co_create

2018-03-07 Thread Kevin Wolf

This adds the .bdrv_co_create driver callback to rbd, which enables
image creation over QMP.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json |  19 ++-
 block/rbd.c  | 150 ++-
 2 files changed, 118 insertions(+), 51 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9170fbf6e6..d4351877fc 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3436,6 +3436,23 @@
 '*refcount-bits':   'int' } }
 
 ##
+# @BlockdevCreateOptionsRbd:
+#
+# Driver specific image creation options for rbd/Ceph.
+#
+# @location Where to store the new image file. This location cannot
+#   point to a snapshot.
+# @size Size of the virtual disk in bytes
+# @cluster-size RBD object size
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsRbd',
+  'data': { 'location': 'BlockdevOptionsRbd',
+'size': 'size',
+'*cluster-size' :   'size' } }
+
+##
 # @BlockdevCreateNotSupported:
 #
 # This is used for all drivers that don't support creating images.
@@ -3484,7 +3501,7 @@
   'qed':'BlockdevCreateNotSupported',
   'quorum': 'BlockdevCreateNotSupported',
   'raw':'BlockdevCreateNotSupported',
-  'rbd':'BlockdevCreateNotSupported',
+  'rbd':'BlockdevCreateOptionsRbd',
   'replication':'BlockdevCreateNotSupported',
   'sheepdog':   'BlockdevCreateNotSupported',
   'ssh':'BlockdevCreateNotSupported',
diff --git a/block/rbd.c b/block/rbd.c
index 999fea105f..1cd526bcea 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -332,71 +332,55 @@ static QemuOptsList runtime_opts = {
 },
 };
 
-static int coroutine_fn qemu_rbd_co_create_opts(const char *filename,
-QemuOpts *opts,
-Error **errp)
+/* FIXME Deprecate and remove keypairs or make it available in QMP.
+ * password_secret should eventually be configurable in opts->location. Support
+ * for it in .bdrv_open will make it work here as well. */
+static int qemu_rbd_do_create(BlockdevCreateOptions *options,
+  const char *keypairs, const char *password_secret,
+  Error **errp)
 {
-Error *local_err = NULL;
-int64_t bytes = 0;
-int64_t objsize;
-int obj_order = 0;
-const char *pool, *image_name, *conf, *user, *keypairs;
-const char *secretid;
+BlockdevCreateOptionsRbd *opts = &options->u.rbd;
 rados_t cluster;
 rados_ioctx_t io_ctx;
-QDict *options = NULL;
-int ret = 0;
+int obj_order = 0;
+int ret;
+
+assert(options->driver == BLOCKDEV_DRIVER_RBD);
+if (opts->location->has_snapshot) {
+error_setg(errp, "Can't use snapshot name for image creation");
+return -EINVAL;
+}
 
-secretid = qemu_opt_get(opts, "password-secret");
+/* TODO Remove the limitation */
+if (opts->location->has_server) {
+error_setg(errp, "Can't specify server for image creation");
+return -EINVAL;
+}
 
-/* Read out options */
-bytes = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
- BDRV_SECTOR_SIZE);
-objsize = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE, 0);
-if (objsize) {
+if (opts->has_cluster_size) {
+int64_t objsize = opts->cluster_size;
 if ((objsize - 1) & objsize) {/* not a power of 2? */
 error_setg(errp, "obj size needs to be power of 2");
-ret = -EINVAL;
-goto exit;
+return -EINVAL;
 }
 if (objsize < 4096) {
 error_setg(errp, "obj size too small");
-ret = -EINVAL;
-goto exit;
+return -EINVAL;
 }
 obj_order = ctz32(objsize);
 }
 
-options = qdict_new();
-qemu_rbd_parse_filename(filename, options, &local_err);
-if (local_err) {
-ret = -EINVAL;
-error_propagate(errp, local_err);
-goto exit;
-}
-
-/*
- * Caution: while qdict_get_try_str() is fine, getting non-string
- * types would require more care.  When @options come from -blockdev
- * or blockdev_add, its members are typed according to the QAPI
- * schema, but when they come from -drive, they're all QString.
- */
-pool   = qdict_get_try_str(options, "pool");
-conf   = qdict_get_try_str(options, "conf");
-user   = qdict_get_try_str(options, "user");
-image_name = qdict_get_try_str(options, "image");
-keypairs   = qdict_get_try_str(options, "=keyvalue-pairs");
-
-ret = rados_create(&cluster, user);
+ret = rados_create(&cluster, opts->location->user);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "error initializing");
-goto exit;
+return ret;
 }
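
The object-size validation in the hunk above (power of two, at least 4096 bytes, then converted to an order) can be tried outside QEMU with the sketch below. It is an assumption-laden illustration: object_order() is a hypothetical helper, and a portable shift loop stands in for QEMU's ctz32().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the checks in the hunk above.
 * On success, stores log2(objsize) in *order and returns 0.
 */
static int object_order(uint64_t objsize, int *order)
{
    int n = 0;

    /* A power of two has exactly one bit set, so (x - 1) & x is zero. */
    if (objsize == 0 || ((objsize - 1) & objsize)) {
        return -EINVAL;
    }
    if (objsize < 4096) {
        return -EINVAL;
    }
    /* Count trailing zero bits; for a power of two this is its log2. */
    while ((objsize & 1) == 0) {
        objsize >>= 1;
        n++;
    }
    *order = n;
    return 0;
}
```

For instance, a 4 MiB object size (4194304 bytes) yields order 22, and 4096 yields order 12.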

[Qemu-block] [PATCH v4 15/37] block: x-blockdev-create QMP command

2018-03-07 Thread Kevin Wolf

This adds a synchronous x-blockdev-create QMP command that can create
qcow2 images on a given node name.

We don't want to block while creating an image, so this is not the final
interface in all aspects, but BlockdevCreateOptionsQcow2 and
.bdrv_co_create() are what they actually might look like in the end. In
any case, this should be good enough to test whether we interpret
BlockdevCreateOptions as we should.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 qapi/block-core.json  | 12 
 include/block/block_int.h |  5 +++-
 block/create.c| 76 +++
 block/qcow2.c |  1 +
 block/Makefile.objs   |  2 +-
 5 files changed, 94 insertions(+), 2 deletions(-)
 create mode 100644 block/create.c

diff --git a/qapi/block-core.json b/qapi/block-core.json
index dfea7b0102..88d7a8678d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3464,6 +3464,18 @@
   } }
 
 ##
+# @x-blockdev-create:
+#
+# Create an image format on a given node.
+# TODO Replace with something asynchronous (block job?)
+#
+# Since: 2.12
+##
+{ 'command': 'x-blockdev-create',
+  'data': 'BlockdevCreateOptions',
+  'boxed': true }
+
+##
 # @blockdev-open-tray:
 #
 # Opens a block device's tray. If there is a block driver state tree inserted as
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a84cc04d55..27e17addba 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -129,8 +129,11 @@ struct BlockDriver {
 int (*bdrv_file_open)(BlockDriverState *bs, QDict *options, int flags,
   Error **errp);
 void (*bdrv_close)(BlockDriverState *bs);
-int coroutine_fn (*bdrv_co_create_opts)(const char *filename, QemuOpts *opts,
+int coroutine_fn (*bdrv_co_create)(BlockdevCreateOptions *opts,
Error **errp);
+int coroutine_fn (*bdrv_co_create_opts)(const char *filename,
+QemuOpts *opts,
+Error **errp);
 int (*bdrv_make_empty)(BlockDriverState *bs);
 
 void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
diff --git a/block/create.c b/block/create.c
new file mode 100644
index 00..8bd8a03719
--- /dev/null
+++ b/block/create.c
@@ -0,0 +1,76 @@
+/*
+ * Block layer code related to image creation
+ *
+ * Copyright (c) 2018 Kevin Wolf 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "block/block_int.h"
+#include "qapi/qapi-commands-block-core.h"
+#include "qapi/error.h"
+
+typedef struct BlockdevCreateCo {
+BlockDriver *drv;
+BlockdevCreateOptions *opts;
+int ret;
+Error **errp;
+} BlockdevCreateCo;
+
+static void coroutine_fn bdrv_co_create_co_entry(void *opaque)
+{
+BlockdevCreateCo *cco = opaque;
+cco->ret = cco->drv->bdrv_co_create(cco->opts, cco->errp);
+}
+
+void qmp_x_blockdev_create(BlockdevCreateOptions *options, Error **errp)
+{
+const char *fmt = BlockdevDriver_str(options->driver);
+BlockDriver *drv = bdrv_find_format(fmt);
+Coroutine *co;
+BlockdevCreateCo cco;
+
+/* If the driver is in the schema, we know that it exists. But it may not
+ * be whitelisted. */
+assert(drv);
+if (bdrv_uses_whitelist() && !bdrv_is_whitelisted(drv, false)) {
+error_setg(errp, "Driver is not whitelisted");
+return;
+}
+
+/* Call callback if it exists */
+if (!drv->bdrv_co_create) {
+error_setg(errp, "Driver does not support blockdev-create");
+return;
+}
+
+cco = (BlockdevCreateCo) {
+.drv = drv,
+.opts = options,
+.ret = -EINPROGRESS,
+.errp = errp,
+};
+
+co = qemu_coroutine_create(bdrv_co_create_co_entry, &cco);
+

[Qemu-block] [PATCH v4 22/37] rbd: Pass BlockdevOptionsRbd to qemu_rbd_connect()

2018-03-07 Thread Kevin Wolf

With the conversion to a QAPI options object, the function is now
prepared to be used in a .bdrv_co_create implementation.

Signed-off-by: Kevin Wolf 
---
 block/rbd.c | 115 +---
 1 file changed, 55 insertions(+), 60 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index a979107f65..999fea105f 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -24,6 +24,8 @@
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qlist.h"
+#include "qapi/qobject-input-visitor.h"
+#include "qapi/qapi-visit-block-core.h"
 
 /*
  * When specifying the image filename use:
@@ -484,98 +486,71 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 qemu_aio_unref(acb);
 }
 
-static char *qemu_rbd_mon_host(QDict *options, Error **errp)
+static char *qemu_rbd_mon_host(BlockdevOptionsRbd *opts, Error **errp)
 {
-const char **vals = g_new(const char *, qdict_size(options) + 1);
-char keybuf[32];
+const char **vals;
 const char *host, *port;
 char *rados_str;
-int i;
-
-for (i = 0;; i++) {
-sprintf(keybuf, "server.%d.host", i);
-host = qdict_get_try_str(options, keybuf);
-qdict_del(options, keybuf);
-sprintf(keybuf, "server.%d.port", i);
-port = qdict_get_try_str(options, keybuf);
-qdict_del(options, keybuf);
-if (!host && !port) {
-break;
-}
-if (!host) {
-error_setg(errp, "Parameter server.%d.host is missing", i);
-rados_str = NULL;
-goto out;
-}
+InetSocketAddressBaseList *p;
+int i, cnt;
+
+if (!opts->has_server) {
+return NULL;
+}
+
+for (cnt = 0, p = opts->server; p; p = p->next) {
+cnt++;
+}
+
+vals = g_new(const char *, cnt + 1);
+
+for (i = 0, p = opts->server; p; p = p->next, i++) {
+host = p->value->host;
+port = p->value->port;
 
 if (strchr(host, ':')) {
-vals[i] = port ? g_strdup_printf("[%s]:%s", host, port)
-: g_strdup_printf("[%s]", host);
+vals[i] = g_strdup_printf("[%s]:%s", host, port);
 } else {
-vals[i] = port ? g_strdup_printf("%s:%s", host, port)
-: g_strdup(host);
+vals[i] = g_strdup_printf("%s:%s", host, port);
 }
 }
 vals[i] = NULL;
 
 rados_str = i ? g_strjoinv(";", (char **)vals) : NULL;
-out:
 g_strfreev((char **)vals);
 return rados_str;
 }
 
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 char **s_snap, char **s_image_name,
-QDict *options, bool cache,
+BlockdevOptionsRbd *opts, bool cache,
 const char *keypairs, const char *secretid,
 Error **errp)
 {
-QemuOpts *opts;
 char *mon_host = NULL;
-const char *pool, *snap, *conf, *user, *image_name;
 Error *local_err = NULL;
 int r;
 
-opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
-qemu_opts_absorb_qdict(opts, options, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-r = -EINVAL;
-goto failed_opts;
-}
-
-mon_host = qemu_rbd_mon_host(options, &local_err);
+mon_host = qemu_rbd_mon_host(opts, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 r = -EINVAL;
 goto failed_opts;
 }
 
-pool   = qemu_opt_get(opts, "pool");
-conf   = qemu_opt_get(opts, "conf");
-snap   = qemu_opt_get(opts, "snapshot");
-user   = qemu_opt_get(opts, "user");
-image_name = qemu_opt_get(opts, "image");
-
-if (!pool || !image_name) {
-error_setg(errp, "Parameters 'pool' and 'image' are required");
-r = -EINVAL;
-goto failed_opts;
-}
-
-r = rados_create(cluster, user);
+r = rados_create(cluster, opts->user);
 if (r < 0) {
 error_setg_errno(errp, -r, "error initializing");
 goto failed_opts;
 }
 
-*s_snap = g_strdup(snap);
-*s_image_name = g_strdup(image_name);
+*s_snap = g_strdup(opts->snapshot);
+*s_image_name = g_strdup(opts->image);
 
 /* try default location when conf=NULL, but ignore failure */
-r = rados_conf_read_file(*cluster, conf);
-if (conf && r < 0) {
-error_setg_errno(errp, -r, "error reading conf file %s", conf);
+r = rados_conf_read_file(*cluster, opts->conf);
+if (opts->has_conf && r < 0) {
+error_setg_errno(errp, -r, "error reading conf file %s", opts->conf);
 goto failed_shutdown;
 }
 
@@ -615,13 +590,12 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 goto failed_shutdown;
 }
 
-r = rados_ioctx_create(*cluster, pool, io_ctx);
+r = rados_ioctx_create(*cluster, opts->pool, io_ctx);
 if (r < 0) {
-error_setg_errno(errp, -r, "error

[Qemu-block] [PATCH v4 19/37] rbd: Fix use after free in qemu_rbd_set_keypairs() error path

2018-03-07 Thread Kevin Wolf

If we want to include the invalid option name in the error message, we
can't free the string earlier than that.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/rbd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/rbd.c b/block/rbd.c
index c1275c1ec9..c1025c8493 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -268,13 +268,14 @@ static int qemu_rbd_set_keypairs(rados_t cluster, const char *keypairs_json,
 key = qstring_get_str(name);
 
 ret = rados_conf_set(cluster, key, qstring_get_str(value));
-QDECREF(name);
 QDECREF(value);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "invalid conf option %s", key);
+QDECREF(name);
 ret = -EINVAL;
 break;
 }
+QDECREF(name);
 }
 
 QDECREF(keypairs);
-- 
2.13.6
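
The ordering issue fixed by this patch can be reproduced in miniature. The sketch below is not QEMU code: `RefStr`, `refstr_unref()` and `conf_set()` are hypothetical stand-ins for QString, QDECREF() and rados_conf_set(). It only illustrates that a pointer borrowed from a reference-counted object has to be used for the last time before the reference is dropped, on the error path as well as on the success path.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QString: a reference-counted string. */
typedef struct RefStr {
    int refcnt;
    char *str;
} RefStr;

static RefStr *refstr_new(const char *s)
{
    RefStr *r = malloc(sizeof(*r));
    size_t len = strlen(s) + 1;

    assert(r != NULL);
    r->refcnt = 1;
    r->str = malloc(len);
    assert(r->str != NULL);
    memcpy(r->str, s, len);
    return r;
}

/* Stand-in for QDECREF(): frees the object with the last reference. */
static void refstr_unref(RefStr *r)
{
    if (r && --r->refcnt == 0) {
        free(r->str);
        free(r);
    }
}

/* Hypothetical setter: rejects every key that does not start with "rbd_". */
static int conf_set(const char *key)
{
    return strncmp(key, "rbd_", 4) == 0 ? 0 : -1;
}

/*
 * Apply one option and format an error message on failure. 'key' borrows
 * the buffer owned by 'name', so 'name' is released only after the last
 * use of 'key' on every path.
 */
static int apply_option(RefStr *name, char *err, size_t errlen)
{
    const char *key = name->str;    /* borrowed, not owned */
    int ret = conf_set(key);

    if (ret < 0) {
        snprintf(err, errlen, "invalid conf option %s", key);
        refstr_unref(name);         /* safe: message already formatted */
        return -1;
    }
    refstr_unref(name);
    return 0;
}
```

Moving the `refstr_unref()` call above the `snprintf()` would recreate the bug: the message would then be formatted from freed memory.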

[Qemu-block] [PATCH v4 13/37] qcow2: Use visitor for options in qcow2_create()

2018-03-07 Thread Kevin Wolf

Instead of manually creating the BlockdevCreateOptions object, use a
visitor to parse the given options into the QAPI object.

This involves translation from the old command line syntax to the syntax
mandated by the QAPI schema. Option names are still checked against
qcow2_create_opts, so only the old option names are allowed on the
command line, even if they are translated in qcow2_create().

In contrast, new option values are optionally recognised besides the old
values: 'compat' accepts 'v2'/'v3' as an alias for '0.10'/'1.1', and
'encrypt.format' accepts 'qcow' as an alias for 'aes' now.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2.c  | 218 -
 tests/qemu-iotests/049.out |   8 +-
 tests/qemu-iotests/112.out |   4 +-
 3 files changed, 84 insertions(+), 146 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 933c612754..37b0e36c1e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -37,7 +37,8 @@
 #include "qemu/option_int.h"
 #include "qemu/cutils.h"
 #include "qemu/bswap.h"
-#include "qapi/opts-visitor.h"
+#include "qapi/qobject-input-visitor.h"
+#include "qapi/qapi-visit-block-core.h"
 #include "block/crypto.h"
 
 /*
@@ -2449,37 +2450,6 @@ static int qcow2_crypt_method_from_format(const char *encryptfmt)
 }
 }
 
-static QCryptoBlockCreateOptions *
-qcow2_parse_encryption(const char *encryptfmt, QemuOpts *opts, Error **errp)
-{
-QCryptoBlockCreateOptions *cryptoopts = NULL;
-QDict *options, *encryptopts;
-int fmt;
-
-options = qemu_opts_to_qdict(opts, NULL);
-qdict_extract_subqdict(options, &encryptopts, "encrypt.");
-QDECREF(options);
-
-fmt = qcow2_crypt_method_from_format(encryptfmt);
-
-switch (fmt) {
-case QCOW_CRYPT_LUKS:
-cryptoopts = block_crypto_create_opts_init(
-Q_CRYPTO_BLOCK_FORMAT_LUKS, encryptopts, errp);
-break;
-case QCOW_CRYPT_AES:
-cryptoopts = block_crypto_create_opts_init(
-Q_CRYPTO_BLOCK_FORMAT_QCOW, encryptopts, errp);
-break;
-default:
-error_setg(errp, "Unknown encryption format '%s'", encryptfmt);
-break;
-}
-
-QDECREF(encryptopts);
-return cryptoopts;
-}
-
 static int qcow2_set_up_encryption(BlockDriverState *bs,
QCryptoBlockCreateOptions *cryptoopts,
Error **errp)
@@ -2874,7 +2844,7 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
 }
 if (version < 3 && qcow2_opts->lazy_refcounts) {
 error_setg(errp, "Lazy refcounts only supported with compatibility "
-   "level 1.1 and above (use compat=1.1 or greater)");
+   "level 1.1 and above (use version=v3 or greater)");
 ret = -EINVAL;
 goto out;
 }
@@ -2892,7 +2862,7 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
 }
 if (version < 3 && qcow2_opts->refcount_bits != 16) {
 error_setg(errp, "Different refcount widths than 16 bits require "
-   "compatibility level 1.1 or above (use compat=1.1 or "
+   "compatibility level 1.1 or above (use version=v3 or "
"greater)");
 ret = -EINVAL;
 goto out;
@@ -3080,144 +3050,112 @@ out:
 static int coroutine_fn qcow2_co_create_opts(const char *filename, QemuOpts *opts,
  Error **errp)
 {
-BlockdevCreateOptions create_options;
-char *backing_file = NULL;
-char *backing_fmt = NULL;
-BlockdevDriver backing_drv;
-char *buf = NULL;
-uint64_t size = 0;
-int flags = 0;
-size_t cluster_size = DEFAULT_CLUSTER_SIZE;
-PreallocMode prealloc;
-int version;
-uint64_t refcount_bits;
-char *encryptfmt = NULL;
-QCryptoBlockCreateOptions *cryptoopts = NULL;
+BlockdevCreateOptions *create_options = NULL;
+QDict *qdict = NULL;
+QObject *qobj;
+Visitor *v;
 BlockDriverState *bs = NULL;
 Error *local_err = NULL;
+const char *val;
 int ret;
 
-/* Read out options */
-size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
-BDRV_SECTOR_SIZE);
-backing_file = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
-backing_fmt = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FMT);
-backing_drv = qapi_enum_parse(&BlockdevDriver_lookup, backing_fmt,
-  0, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+/* Only the keyval visitor supports the dotted syntax needed for
+ * encryption, so go through a QDict before getting a QAPI type. Ignore
+ * options meant for the protocol layer so that the visitor doesn't
+ * complain. */
+qdict = qemu_opts_to_qdict_filtered(opts, NULL, bdrv_qcow2.create_opts,
+true);
+
+/*

[Qemu-block] [PATCH v4 14/37] block: Make bdrv_is_whitelisted() public

2018-03-07 Thread Kevin Wolf

We'll use a separate source file for image creation, and we need to
check there whether the requested driver is whitelisted.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 include/block/block.h | 1 +
 block.c   | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/block/block.h b/include/block/block.h
index 7805187b30..cdec3639a3 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -226,6 +226,7 @@ char *bdrv_perm_names(uint64_t perm);
 void bdrv_init(void);
 void bdrv_init_with_whitelist(void);
 bool bdrv_uses_whitelist(void);
+int bdrv_is_whitelisted(BlockDriver *drv, bool read_only);
 BlockDriver *bdrv_find_protocol(const char *filename,
 bool allow_protocol_prefix,
 Error **errp);
diff --git a/block.c b/block.c
index 4fc65f7621..00f94241fc 100644
--- a/block.c
+++ b/block.c
@@ -370,7 +370,7 @@ BlockDriver *bdrv_find_format(const char *format_name)
 return bdrv_do_find_format(format_name);
 }
 
-static int bdrv_is_whitelisted(BlockDriver *drv, bool read_only)
+int bdrv_is_whitelisted(BlockDriver *drv, bool read_only)
 {
 static const char *whitelist_rw[] = {
 CONFIG_BDRV_RW_WHITELIST
-- 
2.13.6

[Qemu-block] [PATCH v4 21/37] rbd: Remove non-schema options from runtime_opts

2018-03-07 Thread Kevin Wolf

Instead of the QemuOpts in qemu_rbd_connect(), we want to use QAPI
objects. As a preparation, fetch those options directly from the QDict
that .bdrv_open() supports in the rbd driver and that are not in the
schema.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/rbd.c | 55 ---
 1 file changed, 24 insertions(+), 31 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 99fcc7ecdf..a979107f65 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -326,28 +326,6 @@ static QemuOptsList runtime_opts = {
 /*
  * server.* extracted manually, see qemu_rbd_mon_host()
  */
-{
-.name = "password-secret",
-.type = QEMU_OPT_STRING,
-.help = "ID of secret providing the password",
-},
-
-/*
- * Keys for qemu_rbd_parse_filename(), not in the QAPI schema
- */
-{
-/*
- * HACK: name starts with '=' so that qemu_opts_parse()
- * can't set it
- */
-.name = "=keyvalue-pairs",
-.type = QEMU_OPT_STRING,
-.help = "Legacy rados key/value option parameters",
-},
-{
-.name = "filename",
-.type = QEMU_OPT_STRING,
-},
 { /* end of list */ }
 },
 };
@@ -548,12 +526,13 @@ out:
 
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 char **s_snap, char **s_image_name,
-QDict *options, bool cache, Error **errp)
+QDict *options, bool cache,
+const char *keypairs, const char *secretid,
+Error **errp)
 {
 QemuOpts *opts;
 char *mon_host = NULL;
-const char *pool, *snap, *conf, *user, *image_name, *keypairs;
-const char *secretid;
+const char *pool, *snap, *conf, *user, *image_name;
 Error *local_err = NULL;
 int r;
 
@@ -572,14 +551,11 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 goto failed_opts;
 }
 
-secretid = qemu_opt_get(opts, "password-secret");
-
 pool   = qemu_opt_get(opts, "pool");
 conf   = qemu_opt_get(opts, "conf");
 snap   = qemu_opt_get(opts, "snapshot");
 user   = qemu_opt_get(opts, "user");
 image_name = qemu_opt_get(opts, "image");
-keypairs   = qemu_opt_get(opts, "=keyvalue-pairs");
 
 if (!pool || !image_name) {
 error_setg(errp, "Parameters 'pool' and 'image' are required");
@@ -664,6 +640,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 BDRVRBDState *s = bs->opaque;
 Error *local_err = NULL;
 const char *filename;
+char *keypairs, *secretid;
 int r;
 
 /* If we are given a filename, parse the filename, with precedence given to
@@ -674,16 +651,28 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 "This is an unsupported option, and may be deprecated "
 "in the future");
 qemu_rbd_parse_filename(filename, options, &local_err);
+qdict_del(options, "filename");
 if (local_err) {
 error_propagate(errp, local_err);
 return -EINVAL;
 }
 }
 
+keypairs = g_strdup(qdict_get_try_str(options, "=keyvalue-pairs"));
+if (keypairs) {
+qdict_del(options, "=keyvalue-pairs");
+}
+
+secretid = g_strdup(qdict_get_try_str(options, "password-secret"));
+if (secretid) {
+qdict_del(options, "password-secret");
+}
+
 r = qemu_rbd_connect(&s->cluster, &s->io_ctx, &s->snap, &s->image_name,
- options, !(flags & BDRV_O_NOCACHE), errp);
+ options, !(flags & BDRV_O_NOCACHE), keypairs, secretid,
+ errp);
 if (r < 0) {
-return r;
+goto out;
 }
 
 /* rbd_open is always r/w */
@@ -710,13 +699,17 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
 }
 }
 
-return 0;
+r = 0;
+goto out;
 
 failed_open:
 rados_ioctx_destroy(s->io_ctx);
 g_free(s->snap);
 g_free(s->image_name);
 rados_shutdown(s->cluster);
+out:
+g_free(keypairs);
+g_free(secretid);
 return r;
 }
 
-- 
2.13.6
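
The control flow this patch introduces in qemu_rbd_open() (copy the non-schema keys out of the options, delete them from the dictionary, and route every return through one cleanup label) can be sketched outside QEMU. Everything below is hypothetical: `Option`, `take_option()` and `open_image()` are illustration-only names, and a flat array stands in for the QDict.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat option store standing in for a QDict. */
typedef struct Option {
    const char *key;    /* NULL marks a deleted slot */
    const char *value;
} Option;

/* Copy the value stored under 'key' and delete the entry; NULL if absent. */
static char *take_option(Option *opts, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (opts[i].key && strcmp(opts[i].key, key) == 0) {
            size_t len = strlen(opts[i].value) + 1;
            char *copy = malloc(len);

            assert(copy != NULL);
            memcpy(copy, opts[i].value, len);
            opts[i].key = NULL;
            return copy;
        }
    }
    return NULL;
}

/* Number of entries that have not been deleted. */
static size_t count_options(const Option *opts, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (opts[i].key) {
            count++;
        }
    }
    return count;
}

/*
 * Hypothetical open routine. 'connect_result' stands in for the return
 * value of the real connect call. Both owned copies are freed at 'out',
 * whichever path is taken.
 */
static int open_image(Option *opts, size_t n, int connect_result)
{
    char *keypairs = take_option(opts, n, "=keyvalue-pairs");
    char *secretid = take_option(opts, n, "password-secret");
    int r;

    r = connect_result;
    if (r < 0) {
        goto out;
    }
    r = 0;
out:
    free(keypairs);     /* free(NULL) is a no-op */
    free(secretid);
    return r;
}
```

With a single exit label, a later early return cannot forget one of the two frees, which is presumably why the patch turns `return r;` and `return 0;` into jumps to `out`.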

[Qemu-block] [PATCH v4 06/37] qcow2: Use BlockdevRef in qcow2_co_create()

2018-03-07 Thread Kevin Wolf

Instead of passing a separate BlockDriverState* into qcow2_co_create(),
make use of the BlockdevRef that is included in BlockdevCreateOptions.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 include/block/block.h |  1 +
 block.c   | 47 +++
 block/qcow2.c | 39 +--
 3 files changed, 73 insertions(+), 14 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 8b6db952a2..7805187b30 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -246,6 +246,7 @@ BdrvChild *bdrv_open_child(const char *filename,
BlockDriverState* parent,
const BdrvChildRole *child_role,
bool allow_none, Error **errp);
+BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp);
 void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
  Error **errp);
 int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
diff --git a/block.c b/block.c
index 8f1c43d037..4fc65f7621 100644
--- a/block.c
+++ b/block.c
@@ -34,6 +34,8 @@
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qstring.h"
+#include "qapi/qobject-output-visitor.h"
+#include "qapi/qapi-visit-block-core.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/sysemu.h"
 #include "qemu/notify.h"
@@ -2406,6 +2408,51 @@ BdrvChild *bdrv_open_child(const char *filename,
 return c;
 }
 
+/* TODO Future callers may need to specify parent/child_role in order for
+ * option inheritance to work. Existing callers use it for the root node. */
+BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp)
+{
+BlockDriverState *bs = NULL;
+Error *local_err = NULL;
+QObject *obj = NULL;
+QDict *qdict = NULL;
+const char *reference = NULL;
+Visitor *v = NULL;
+
+if (ref->type == QTYPE_QSTRING) {
+reference = ref->u.reference;
+} else {
+BlockdevOptions *options = &ref->u.definition;
+assert(ref->type == QTYPE_QDICT);
+
+v = qobject_output_visitor_new(&obj);
+visit_type_BlockdevOptions(v, NULL, &options, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
+}
+visit_complete(v, &obj);
+
+qdict = qobject_to_qdict(obj);
+qdict_flatten(qdict);
+
+/* bdrv_open_inherit() defaults to the values in bdrv_flags (for
+ * compatibility with other callers) rather than what we want as the
+ * real defaults. Apply the defaults here instead. */
+qdict_set_default_str(qdict, BDRV_OPT_CACHE_DIRECT, "off");
+qdict_set_default_str(qdict, BDRV_OPT_CACHE_NO_FLUSH, "off");
+qdict_set_default_str(qdict, BDRV_OPT_READ_ONLY, "off");
+}
+
+bs = bdrv_open_inherit(NULL, reference, qdict, 0, NULL, NULL, errp);
+obj = NULL;
+
+fail:
+qobject_decref(obj);
+visit_free(v);
+return bs;
+}
+
 static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
int flags,
QDict *snapshot_options,
diff --git a/block/qcow2.c b/block/qcow2.c
index 7679c28f57..b7df2d5cab 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2768,8 +2768,8 @@ static uint64_t qcow2_opt_get_refcount_bits_del(QemuOpts *opts, int version,
 }
 
 static int coroutine_fn
-qcow2_co_create(BlockDriverState *bs, BlockdevCreateOptions *create_options,
-QemuOpts *opts, const char *encryptfmt, Error **errp)
+qcow2_co_create(BlockdevCreateOptions *create_options, QemuOpts *opts,
+const char *encryptfmt, Error **errp)
 {
 BlockdevCreateOptionsQcow2 *qcow2_opts;
 QDict *options;
@@ -2786,7 +2786,8 @@ qcow2_co_create(BlockDriverState *bs, BlockdevCreateOptions *create_options,
  * 2 GB for 64k clusters, and we don't want to have a 2 GB initial file
  * size for any qcow2 image.
  */
-BlockBackend *blk;
+BlockBackend *blk = NULL;
+BlockDriverState *bs = NULL;
 QCowHeader *header;
 size_t cluster_size;
 int version;
@@ -2795,10 +2796,15 @@ qcow2_co_create(BlockDriverState *bs, BlockdevCreateOptions *create_options,
 Error *local_err = NULL;
 int ret;
 
-/* Validate options and set default values */
 assert(create_options->driver == BLOCKDEV_DRIVER_QCOW2);
 qcow2_opts = &create_options->u.qcow2;
 
+bs = bdrv_open_blockdev_ref(qcow2_opts->file, errp);
+if (bs == NULL) {
+return -EIO;
+}
+
+/* Validate options and set default values */
 if (!QEMU_IS_ALIGNED(qcow2_opts->size, BDRV_SECTOR_SIZE)) {
 error_setg(errp, "Image size must be a multiple of 512 bytes");
 ret = -EINVAL;
@@ -2827,7 +2833,8 @@

[Qemu-block] [PATCH v4 10/37] test-qemu-opts: Test qemu_opts_append()

2018-03-07 Thread Kevin Wolf

Basic test for merging two QemuOptsLists.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 tests/test-qemu-opts.c | 128 +
 1 file changed, 128 insertions(+)

diff --git a/tests/test-qemu-opts.c b/tests/test-qemu-opts.c
index 5d5a3daa7b..6c3183390b 100644
--- a/tests/test-qemu-opts.c
+++ b/tests/test-qemu-opts.c
@@ -23,6 +23,8 @@ static QemuOptsList opts_list_01 = {
 {
 .name = "str1",
 .type = QEMU_OPT_STRING,
+.help = "Help texts are preserved in qemu_opts_append",
+.def_value_str = "default",
 },{
 .name = "str2",
 .type = QEMU_OPT_STRING,
@@ -32,6 +34,7 @@ static QemuOptsList opts_list_01 = {
 },{
 .name = "number1",
 .type = QEMU_OPT_NUMBER,
+.help = "Having help texts only for some options is okay",
 },{
 .name = "number2",
 .type = QEMU_OPT_NUMBER,
@@ -743,6 +746,129 @@ static void test_opts_parse_size(void)
 qemu_opts_reset(&opts_list_02);
 }
 
+static void append_verify_list_01(QemuOptDesc *desc, bool with_overlapping)
+{
+int i = 0;
+
+if (with_overlapping) {
+g_assert_cmpstr(desc[i].name, ==, "str1");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_STRING);
+g_assert_cmpstr(desc[i].help, ==,
+"Help texts are preserved in qemu_opts_append");
+g_assert_cmpstr(desc[i].def_value_str, ==, "default");
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "str2");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_STRING);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+}
+
+g_assert_cmpstr(desc[i].name, ==, "str3");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_STRING);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "number1");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_NUMBER);
+g_assert_cmpstr(desc[i].help, ==,
+"Having help texts only for some options is okay");
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "number2");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_NUMBER);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, NULL);
+}
+
+static void append_verify_list_02(QemuOptDesc *desc)
+{
+int i = 0;
+
+g_assert_cmpstr(desc[i].name, ==, "str1");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_STRING);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "str2");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_STRING);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "bool1");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_BOOL);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "bool2");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_BOOL);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "size1");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_SIZE);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "size2");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_SIZE);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+i++;
+
+g_assert_cmpstr(desc[i].name, ==, "size3");
+g_assert_cmpint(desc[i].type, ==, QEMU_OPT_SIZE);
+g_assert_cmpstr(desc[i].help, ==, NULL);
+g_assert_cmpstr(desc[i].def_value_str, ==, NULL);
+}
+
+static void test_opts_append_to_null(void)
+{
+QemuOptsList *merged;
+
+merged = qemu_opts_append(NULL, &opts_list_01);
+g_assert(merged != &opts_list_01);
+
+g_assert_cmpstr(merged->name, ==, NULL);
+g_assert_cmpstr(merged->implied_opt_name, ==, NULL);
+g_assert_false(merged->merge_lists);
+
+append_verify_list_01(merged->desc, true);
+
+qemu_opts_free(merged);
+}
+
+static void test_opts_append(void)
+{
+QemuOptsList *first, *merged;
+
+first = qemu_opts_append(NULL, &opts_list_02);
+merged = qemu_opts_append(first, &opts_list_01);
+g_assert(first != &opts_list_02);
+g_assert(merged != &opts_list_01);
+
+g_assert_cmpstr(merged->name, ==, NULL);
+g_assert_cmpstr(merged->implied_opt_name, ==, NULL);
+g_assert_false(merged->merge_lists);
+
+

[Qemu-block] [PATCH v4 11/37] test-qemu-opts: Test qemu_opts_to_qdict_filtered()

2018-03-07 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 tests/test-qemu-opts.c | 125 +
 1 file changed, 125 insertions(+)

diff --git a/tests/test-qemu-opts.c b/tests/test-qemu-opts.c
index 6c3183390b..2c422abcd4 100644
--- a/tests/test-qemu-opts.c
+++ b/tests/test-qemu-opts.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
 #include "qemu/option.h"
+#include "qemu/option_int.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
@@ -868,6 +869,127 @@ static void test_opts_append(void)
 qemu_opts_free(merged);
 }
 
+static void test_opts_to_qdict_basic(void)
+{
+QemuOpts *opts;
+QDict *dict;
+
+opts = qemu_opts_parse(&opts_list_01, "str1=foo,str2=,str3=bar,number1=42",
+   false, &error_abort);
+g_assert(opts != NULL);
+
+dict = qemu_opts_to_qdict(opts, NULL);
+g_assert(dict != NULL);
+
+g_assert_cmpstr(qdict_get_str(dict, "str1"), ==, "foo");
+g_assert_cmpstr(qdict_get_str(dict, "str2"), ==, "");
+g_assert_cmpstr(qdict_get_str(dict, "str3"), ==, "bar");
+g_assert_cmpstr(qdict_get_str(dict, "number1"), ==, "42");
+g_assert_false(qdict_haskey(dict, "number2"));
+
+QDECREF(dict);
+qemu_opts_del(opts);
+}
+
+static void test_opts_to_qdict_filtered(void)
+{
+QemuOptsList *first, *merged;
+QemuOpts *opts;
+QDict *dict;
+
+first = qemu_opts_append(NULL, &opts_list_02);
+merged = qemu_opts_append(first, &opts_list_01);
+
+opts = qemu_opts_parse(merged,
+   "str1=foo,str2=,str3=bar,bool1=off,number1=42",
+   false, &error_abort);
+g_assert(opts != NULL);
+
+/* Convert to QDict without deleting from opts */
+dict = qemu_opts_to_qdict_filtered(opts, NULL, &opts_list_01, false);
+g_assert(dict != NULL);
+g_assert_cmpstr(qdict_get_str(dict, "str1"), ==, "foo");
+g_assert_cmpstr(qdict_get_str(dict, "str2"), ==, "");
+g_assert_cmpstr(qdict_get_str(dict, "str3"), ==, "bar");
+g_assert_cmpstr(qdict_get_str(dict, "number1"), ==, "42");
+g_assert_false(qdict_haskey(dict, "number2"));
+g_assert_false(qdict_haskey(dict, "bool1"));
+QDECREF(dict);
+
+dict = qemu_opts_to_qdict_filtered(opts, NULL, &opts_list_02, false);
+g_assert(dict != NULL);
+g_assert_cmpstr(qdict_get_str(dict, "str1"), ==, "foo");
+g_assert_cmpstr(qdict_get_str(dict, "str2"), ==, "");
+g_assert_cmpstr(qdict_get_str(dict, "bool1"), ==, "off");
+g_assert_false(qdict_haskey(dict, "str3"));
+g_assert_false(qdict_haskey(dict, "number1"));
+g_assert_false(qdict_haskey(dict, "number2"));
+QDECREF(dict);
+
+/* Now delete converted options from opts */
+dict = qemu_opts_to_qdict_filtered(opts, NULL, &opts_list_01, true);
+g_assert(dict != NULL);
+g_assert_cmpstr(qdict_get_str(dict, "str1"), ==, "foo");
+g_assert_cmpstr(qdict_get_str(dict, "str2"), ==, "");
+g_assert_cmpstr(qdict_get_str(dict, "str3"), ==, "bar");
+g_assert_cmpstr(qdict_get_str(dict, "number1"), ==, "42");
+g_assert_false(qdict_haskey(dict, "number2"));
+g_assert_false(qdict_haskey(dict, "bool1"));
+QDECREF(dict);
+
+dict = qemu_opts_to_qdict_filtered(opts, NULL, &opts_list_02, true);
+g_assert(dict != NULL);
+g_assert_cmpstr(qdict_get_str(dict, "bool1"), ==, "off");
+g_assert_false(qdict_haskey(dict, "str1"));
+g_assert_false(qdict_haskey(dict, "str2"));
+g_assert_false(qdict_haskey(dict, "str3"));
+g_assert_false(qdict_haskey(dict, "number1"));
+g_assert_false(qdict_haskey(dict, "number2"));
+QDECREF(dict);
+
+g_assert_true(QTAILQ_EMPTY(&opts->head));
+
+qemu_opts_del(opts);
+qemu_opts_free(merged);
+}
+
+static void test_opts_to_qdict_duplicates(void)
+{
+QemuOpts *opts;
+QemuOpt *opt;
+QDict *dict;
+
+opts = qemu_opts_parse(&opts_list_03, "foo=a,foo=b", false, &error_abort);
+g_assert(opts != NULL);
+
+/* Verify that opts has two options with the same name */
+opt = QTAILQ_FIRST(&opts->head);
+g_assert_cmpstr(opt->name, ==, "foo");
+g_assert_cmpstr(opt->str , ==, "a");
+
+opt = QTAILQ_NEXT(opt, next);
+g_assert_cmpstr(opt->name, ==, "foo");
+g_assert_cmpstr(opt->str , ==, "b");
+
+opt = QTAILQ_NEXT(opt, next);
+g_assert(opt == NULL);
+
+/* In the conversion to QDict, the last one wins */
+dict = qemu_opts_to_qdict(opts, NULL);
+g_assert(dict != NULL);
+g_assert_cmpstr(qdict_get_str(dict, "foo"), ==, "b");
+QDECREF(dict);
+
+/* The last one still wins if entries are deleted, and both are deleted */
+dict = qemu_opts_to_qdict_filtered(opts, NULL, NULL, true);
+g_assert(dict != NULL);
+g_assert_cmpstr(qdict_get_str(dict, "foo"), ==, "b");
+QDECREF(dict);
+
+g_assert_true(QTAILQ_EMPTY(&opts->head));
+
+qemu_opts_del(opts);
+}
 
 int main(int argc, char *argv[])
 {
@@ -889,6

[Qemu-block] [PATCH v4 09/37] util: Add qemu_opts_to_qdict_filtered()

2018-03-07 Thread Kevin Wolf

This allows, given a QemuOpts for a QemuOptsList that was merged from
multiple QemuOptsList, to only consider those options that exist in one
specific list. Block drivers need this to separate format-layer create
options from protocol-level options.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 include/qemu/option.h |  2 ++
 util/qemu-option.c| 42 +-
 2 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/qemu/option.h b/include/qemu/option.h
index b127fb6db6..306fdb5f7a 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -124,6 +124,8 @@ void qemu_opts_set_defaults(QemuOptsList *list, const char *params,
 int permit_abbrev);
 QemuOpts *qemu_opts_from_qdict(QemuOptsList *list, const QDict *qdict,
Error **errp);
+QDict *qemu_opts_to_qdict_filtered(QemuOpts *opts, QDict *qdict,
+   QemuOptsList *list, bool del);
 QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict);
 void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp);
 
diff --git a/util/qemu-option.c b/util/qemu-option.c
index a401e936da..2b412eff5e 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -1007,14 +1007,23 @@ void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp)
 }
 
 /*
- * Convert from QemuOpts to QDict.
- * The QDict values are of type QString.
+ * Convert from QemuOpts to QDict. The QDict values are of type QString.
+ *
+ * If @list is given, only add those options to the QDict that are contained in
+ * the list. If @del is true, any options added to the QDict are removed from
+ * the QemuOpts, otherwise they remain there.
+ *
+ * If two options in @opts have the same name, they are processed in order
+ * so that the last one wins (consistent with the reverse iteration in
+ * qemu_opt_find()), but all of them are deleted if @del is true.
+ *
  * TODO We'll want to use types appropriate for opt->desc->type, but
  * this is enough for now.
  */
-QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict)
+QDict *qemu_opts_to_qdict_filtered(QemuOpts *opts, QDict *qdict,
+   QemuOptsList *list, bool del)
 {
-QemuOpt *opt;
+QemuOpt *opt, *next;
 
 if (!qdict) {
 qdict = qdict_new();
@@ -1022,12 +1031,35 @@ QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict)
 if (opts->id) {
 qdict_put_str(qdict, "id", opts->id);
 }
-QTAILQ_FOREACH(opt, &opts->head, next) {
+QTAILQ_FOREACH_SAFE(opt, &opts->head, next, next) {
+if (list) {
+QemuOptDesc *desc;
+bool found = false;
+for (desc = list->desc; desc->name; desc++) {
+if (!strcmp(desc->name, opt->name)) {
+found = true;
+break;
+}
+}
+if (!found) {
+continue;
+}
+}
 qdict_put_str(qdict, opt->name, opt->str);
+if (del) {
+qemu_opt_del(opt);
+}
 }
 return qdict;
 }
 
+/* Copy all options in a QemuOpts to the given QDict. See
+ * qemu_opts_to_qdict_filtered() for details. */
+QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict)
+{
+return qemu_opts_to_qdict_filtered(opts, qdict, NULL, false);
+}
+
 /* Validate parsed opts against descriptions where no
  * descriptions were provided in the QemuOptsList.
  */
-- 
2.13.6
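
The semantics described in the new comment (filter by list, last duplicate wins, every converted entry is deleted when @del is set) can be checked with a small standalone model. The sketch below uses plain arrays instead of QemuOpts and QDict. `Opt`, `Pair`, `dict_put()` and `opts_to_dict_filtered()` are hypothetical names for illustration and are not part of the QEMU API. The caller is assumed to provide a `dict` array with room for at least `nopts` entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: one parsed option and one dictionary entry. */
typedef struct Opt {
    const char *name;
    const char *value;
    bool deleted;
} Opt;

typedef struct Pair {
    const char *name;
    const char *value;
} Pair;

/* True if 'name' occurs in the NULL-terminated 'list'. */
static bool in_list(const char *const *list, const char *name)
{
    for (; *list; list++) {
        if (strcmp(*list, name) == 0) {
            return true;
        }
    }
    return false;
}

/* Insert or overwrite, so a later duplicate replaces an earlier one. */
static void dict_put(Pair *dict, size_t *n, const char *name,
                     const char *value)
{
    for (size_t i = 0; i < *n; i++) {
        if (strcmp(dict[i].name, name) == 0) {
            dict[i].value = value;
            return;
        }
    }
    dict[*n].name = name;
    dict[*n].value = value;
    (*n)++;
}

/*
 * Copy the options named in 'list' (all options if 'list' is NULL) into
 * 'dict' in order. If 'del' is true, every copied option is marked
 * deleted, including earlier duplicates that were overwritten.
 * Returns the number of dictionary entries.
 */
static size_t opts_to_dict_filtered(Opt *opts, size_t nopts,
                                    const char *const *list, bool del,
                                    Pair *dict)
{
    size_t n = 0;

    for (size_t i = 0; i < nopts; i++) {
        if (opts[i].deleted) {
            continue;
        }
        if (list && !in_list(list, opts[i].name)) {
            continue;
        }
        dict_put(dict, &n, opts[i].name, opts[i].value);
        if (del) {
            opts[i].deleted = true;
        }
    }
    return n;
}
```

Calling it once per layer with `del` set leaves only the options that no layer claimed, which matches the way the test in patch 11 ends with an empty option list.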

[Qemu-block] [PATCH v4 12/37] qdict: Introduce qdict_rename_keys()

2018-03-07 Thread Kevin Wolf

A few block drivers will need to rename .bdrv_create options for their
QAPIfication, so let's have a helper function for that.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 include/qapi/qmp/qdict.h |   6 +++
 qobject/qdict.c  |  34 +
 tests/check-qdict.c  | 129 +++
 3 files changed, 169 insertions(+)

diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
index ff6f7842c3..7c6d844549 100644
--- a/include/qapi/qmp/qdict.h
+++ b/include/qapi/qmp/qdict.h
@@ -81,4 +81,10 @@ QObject *qdict_crumple(const QDict *src, Error **errp);
 
 void qdict_join(QDict *dest, QDict *src, bool overwrite);
 
+typedef struct QDictRenames {
+const char *from;
+const char *to;
+} QDictRenames;
+bool qdict_rename_keys(QDict *qdict, const QDictRenames *renames, Error **errp);
+
 #endif /* QDICT_H */
diff --git a/qobject/qdict.c b/qobject/qdict.c
index 23df84f9cd..229b8c840b 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -1072,3 +1072,37 @@ void qdict_join(QDict *dest, QDict *src, bool overwrite)
 entry = next;
 }
 }
+
+/**
+ * qdict_rename_keys(): Rename keys in qdict according to the replacements
+ * specified in the array renames. The array must be terminated by an entry
+ * with from = NULL.
+ *
+ * The renames are performed individually in the order of the array, so entries
+ * may be renamed multiple times and may or may not conflict depending on the
+ * order of the renames array.
+ *
+ * Returns true for success, false in error cases.
+ */
+bool qdict_rename_keys(QDict *qdict, const QDictRenames *renames, Error **errp)
+{
+QObject *qobj;
+
+while (renames->from) {
+if (qdict_haskey(qdict, renames->from)) {
+if (qdict_haskey(qdict, renames->to)) {
+error_setg(errp, "'%s' and its alias '%s' can't be used at the "
+   "same time", renames->to, renames->from);
+return false;
+}
+
+qobj = qdict_get(qdict, renames->from);
+qobject_incref(qobj);
+qdict_put_obj(qdict, renames->to, qobj);
+qdict_del(qdict, renames->from);
+}
+
+renames++;
+}
+return true;
+}
diff --git a/tests/check-qdict.c b/tests/check-qdict.c
index ec628f3453..a3faea8bfc 100644
--- a/tests/check-qdict.c
+++ b/tests/check-qdict.c
@@ -665,6 +665,133 @@ static void qdict_crumple_test_empty(void)
 QDECREF(dst);
 }
 
+static int qdict_count_entries(QDict *dict)
+{
+const QDictEntry *e;
+int count = 0;
+
+for (e = qdict_first(dict); e; e = qdict_next(dict, e)) {
+count++;
+}
+
+return count;
+}
+
+static void qdict_rename_keys_test(void)
+{
+QDict *dict = qdict_new();
+QDict *copy;
+QDictRenames *renames;
+Error *local_err = NULL;
+
+qdict_put_str(dict, "abc", "foo");
+qdict_put_str(dict, "abcdef", "bar");
+qdict_put_int(dict, "number", 42);
+qdict_put_bool(dict, "flag", true);
+qdict_put_null(dict, "nothing");
+
+/* Empty rename list */
+renames = (QDictRenames[]) {
+{ NULL, "this can be anything" }
+};
+copy = qdict_clone_shallow(dict);
+qdict_rename_keys(copy, renames, &error_abort);
+
+g_assert_cmpstr(qdict_get_str(copy, "abc"), ==, "foo");
+g_assert_cmpstr(qdict_get_str(copy, "abcdef"), ==, "bar");
+g_assert_cmpint(qdict_get_int(copy, "number"), ==, 42);
+g_assert_cmpint(qdict_get_bool(copy, "flag"), ==, true);
+g_assert(qobject_type(qdict_get(copy, "nothing")) == QTYPE_QNULL);
+g_assert_cmpint(qdict_count_entries(copy), ==, 5);
+
+QDECREF(copy);
+
+/* Simple rename of all entries */
+renames = (QDictRenames[]) {
+{ "abc","str1" },
+{ "abcdef", "str2" },
+{ "number", "int" },
+{ "flag",   "bool" },
+{ "nothing","null" },
+{ NULL , NULL }
+};
+copy = qdict_clone_shallow(dict);
+qdict_rename_keys(copy, renames, &error_abort);
+
+g_assert(!qdict_haskey(copy, "abc"));
+g_assert(!qdict_haskey(copy, "abcdef"));
+g_assert(!qdict_haskey(copy, "number"));
+g_assert(!qdict_haskey(copy, "flag"));
+g_assert(!qdict_haskey(copy, "nothing"));
+
+g_assert_cmpstr(qdict_get_str(copy, "str1"), ==, "foo");
+g_assert_cmpstr(qdict_get_str(copy, "str2"), ==, "bar");
+g_assert_cmpint(qdict_get_int(copy, "int"), ==, 42);
+g_assert_cmpint(qdict_get_bool(copy, "bool"), ==, true);
+g_assert(qobject_type(qdict_get(copy, "null")) == QTYPE_QNULL);
+g_assert_cmpint(qdict_count_entries(copy), ==, 5);
+
+QDECREF(copy);
+
+/* Renames are processed top to bottom */
+renames = (QDictRenames[]) {
+{ "abc","tmp" },
+{ "abcdef", "abc" },
+{ "number", "abcdef" },
+{ "flag",   "number" },
+{ "nothing","flag" },
+{ "tmp",

[Qemu-block] [PATCH v4 01/37] block/qapi: Introduce BlockdevCreateOptions

2018-03-07 Thread Kevin Wolf

This creates a BlockdevCreateOptions union type that will contain all of
the options for image creation. We'll start out with an empty struct
type BlockdevCreateNotSupported for all drivers.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json | 62 
 1 file changed, 62 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 00475f08d4..bb2db662f7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3359,6 +3359,68 @@
 { 'command': 'blockdev-del', 'data': { 'node-name': 'str' } }
 
 ##
+# @BlockdevCreateNotSupported:
+#
+# This is used for all drivers that don't support creating images.
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateNotSupported', 'data': {}}
+
+##
+# @BlockdevCreateOptions:
+#
+# Options for creating an image format on a given node.
+#
+# @driver   block driver to create the image format
+#
+# Since: 2.12
+##
+{ 'union': 'BlockdevCreateOptions',
+  'base': {
+  'driver': 'BlockdevDriver' },
+  'discriminator': 'driver',
+  'data': {
+  'blkdebug':   'BlockdevCreateNotSupported',
+  'blkverify':  'BlockdevCreateNotSupported',
+  'bochs':  'BlockdevCreateNotSupported',
+  'cloop':  'BlockdevCreateNotSupported',
+  'dmg':'BlockdevCreateNotSupported',
+  'file':   'BlockdevCreateNotSupported',
+  'ftp':'BlockdevCreateNotSupported',
+  'ftps':   'BlockdevCreateNotSupported',
+  'gluster':'BlockdevCreateNotSupported',
+  'host_cdrom': 'BlockdevCreateNotSupported',
+  'host_device':'BlockdevCreateNotSupported',
+  'http':   'BlockdevCreateNotSupported',
+  'https':  'BlockdevCreateNotSupported',
+  'iscsi':  'BlockdevCreateNotSupported',
+  'luks':   'BlockdevCreateNotSupported',
+  'nbd':'BlockdevCreateNotSupported',
+  'nfs':'BlockdevCreateNotSupported',
+  'null-aio':   'BlockdevCreateNotSupported',
+  'null-co':'BlockdevCreateNotSupported',
+  'nvme':   'BlockdevCreateNotSupported',
+  'parallels':  'BlockdevCreateNotSupported',
+  'qcow2':  'BlockdevCreateNotSupported',
+  'qcow':   'BlockdevCreateNotSupported',
+  'qed':'BlockdevCreateNotSupported',
+  'quorum': 'BlockdevCreateNotSupported',
+  'raw':'BlockdevCreateNotSupported',
+  'rbd':'BlockdevCreateNotSupported',
+  'replication':'BlockdevCreateNotSupported',
+  'sheepdog':   'BlockdevCreateNotSupported',
+  'ssh':'BlockdevCreateNotSupported',
+  'throttle':   'BlockdevCreateNotSupported',
+  'vdi':'BlockdevCreateNotSupported',
+  'vhdx':   'BlockdevCreateNotSupported',
+  'vmdk':   'BlockdevCreateNotSupported',
+  'vpc':'BlockdevCreateNotSupported',
+  'vvfat':  'BlockdevCreateNotSupported',
+  'vxhs':   'BlockdevCreateNotSupported'
+  } }
+
+##
 # @blockdev-open-tray:
 #
 # Opens a block device's tray. If there is a block driver state tree inserted as
-- 
2.13.6
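
For readers who have not worked with QAPI flat unions, the shape of BlockdevCreateOptions can be approximated in plain C as a discriminator field plus a union with one branch per driver. The following is only a minimal sketch with made-up Sketch* names and three drivers; it is not the code that the QAPI generator emits. At this point in the series every branch is the empty "not supported" struct, so the helper returns false for all drivers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the BlockdevDriver enum. */
typedef enum {
    SKETCH_DRIVER_FILE,
    SKETCH_DRIVER_QCOW2,
    SKETCH_DRIVER_RAW,
} SketchDriver;

/* Placeholder branch type; C needs at least one member in a struct. */
typedef struct {
    char unused;
} SketchCreateNotSupported;

/* 'driver' is the discriminator: it tells which member of 'u' is valid. */
typedef struct {
    SketchDriver driver;
    union {
        SketchCreateNotSupported file;
        SketchCreateNotSupported qcow2;
        SketchCreateNotSupported raw;
    } u;
} SketchCreateOptions;

/* Returns true if the selected branch carries real creation options. */
bool sketch_create_supported(const SketchCreateOptions *opts)
{
    assert(opts != NULL);

    switch (opts->driver) {
    case SKETCH_DRIVER_FILE:
    case SKETCH_DRIVER_QCOW2:
    case SKETCH_DRIVER_RAW:
        /* Every driver still maps to the empty struct in this patch. */
        return false;
    }
    return false;
}
```

Later patches would then swap individual branches for driver-specific structs, and a helper like this would start returning true for them.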

[Qemu-block] [PATCH v4 05/37] qcow2: Pass BlockdevCreateOptions to qcow2_co_create()

2018-03-07 Thread Kevin Wolf

All of the simple options are now passed to qcow2_co_create() in a
BlockdevCreateOptions object. Still missing: node-name and the
encryption options.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 189 ++
 1 file changed, 151 insertions(+), 38 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 7a11874d22..7679c28f57 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2700,19 +2700,26 @@ static int64_t qcow2_calc_prealloc_size(int64_t 
total_size,
 return meta_size + aligned_total_size;
 }
 
-static size_t qcow2_opt_get_cluster_size_del(QemuOpts *opts, Error **errp)
+static bool validate_cluster_size(size_t cluster_size, Error **errp)
 {
-size_t cluster_size;
-int cluster_bits;
-
-cluster_size = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
- DEFAULT_CLUSTER_SIZE);
-cluster_bits = ctz32(cluster_size);
+int cluster_bits = ctz32(cluster_size);
 if (cluster_bits < MIN_CLUSTER_BITS || cluster_bits > MAX_CLUSTER_BITS ||
 (1 << cluster_bits) != cluster_size)
 {
 error_setg(errp, "Cluster size must be a power of two between %d and "
"%dk", 1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
+return false;
+}
+return true;
+}
+
+static size_t qcow2_opt_get_cluster_size_del(QemuOpts *opts, Error **errp)
+{
+size_t cluster_size;
+
+cluster_size = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
+ DEFAULT_CLUSTER_SIZE);
+if (!validate_cluster_size(cluster_size, errp)) {
 return 0;
 }
 return cluster_size;
@@ -2761,12 +2768,10 @@ static uint64_t 
qcow2_opt_get_refcount_bits_del(QemuOpts *opts, int version,
 }
 
 static int coroutine_fn
-qcow2_co_create(BlockDriverState *bs, int64_t total_size,
-const char *backing_file, const char *backing_format,
-int flags, size_t cluster_size, PreallocMode prealloc,
-QemuOpts *opts, int version, int refcount_order,
-const char *encryptfmt, Error **errp)
+qcow2_co_create(BlockDriverState *bs, BlockdevCreateOptions *create_options,
+QemuOpts *opts, const char *encryptfmt, Error **errp)
 {
+BlockdevCreateOptionsQcow2 *qcow2_opts;
 QDict *options;
 
 /*
@@ -2783,10 +2788,92 @@ qcow2_co_create(BlockDriverState *bs, int64_t 
total_size,
  */
 BlockBackend *blk;
 QCowHeader *header;
+size_t cluster_size;
+int version;
+int refcount_order;
 uint64_t* refcount_table;
 Error *local_err = NULL;
 int ret;
 
+/* Validate options and set default values */
+assert(create_options->driver == BLOCKDEV_DRIVER_QCOW2);
+qcow2_opts = &create_options->u.qcow2;
+
+if (!QEMU_IS_ALIGNED(qcow2_opts->size, BDRV_SECTOR_SIZE)) {
+error_setg(errp, "Image size must be a multiple of 512 bytes");
+ret = -EINVAL;
+goto out;
+}
+
+if (qcow2_opts->has_version) {
+switch (qcow2_opts->version) {
+case BLOCKDEV_QCOW2_VERSION_V2:
+version = 2;
+break;
+case BLOCKDEV_QCOW2_VERSION_V3:
+version = 3;
+break;
+default:
+g_assert_not_reached();
+}
+} else {
+version = 3;
+}
+
+if (qcow2_opts->has_cluster_size) {
+cluster_size = qcow2_opts->cluster_size;
+} else {
+cluster_size = DEFAULT_CLUSTER_SIZE;
+}
+
+if (!validate_cluster_size(cluster_size, errp)) {
+return -EINVAL;
+}
+
+if (!qcow2_opts->has_preallocation) {
+qcow2_opts->preallocation = PREALLOC_MODE_OFF;
+}
+if (qcow2_opts->has_backing_file &&
+qcow2_opts->preallocation != PREALLOC_MODE_OFF)
+{
+error_setg(errp, "Backing file and preallocation cannot be used at "
+   "the same time");
+return -EINVAL;
+}
+if (qcow2_opts->has_backing_fmt && !qcow2_opts->has_backing_file) {
+error_setg(errp, "Backing format cannot be used without backing file");
+return -EINVAL;
+}
+
+if (!qcow2_opts->has_lazy_refcounts) {
+qcow2_opts->lazy_refcounts = false;
+}
+if (version < 3 && qcow2_opts->lazy_refcounts) {
+error_setg(errp, "Lazy refcounts only supported with compatibility "
+   "level 1.1 and above (use compat=1.1 or greater)");
+return -EINVAL;
+}
+
+if (!qcow2_opts->has_refcount_bits) {
+qcow2_opts->refcount_bits = 16;
+}
+if (qcow2_opts->refcount_bits > 64 ||
+!is_power_of_2(qcow2_opts->refcount_bits))
+{
+error_setg(errp, "Refcount width must be a power of two and may not "
+   "exceed 64 bits");
+return -EINVAL;
+}
+if (version < 3 && qcow2_opts->refcount_bits != 16) {
+error_setg(errp, "Different
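
The two numeric checks in the hunk above (cluster size and refcount width) can be tried out in isolation. This is a standalone sketch, not the QEMU code: the sketch_* names are made up, and the limits of 9 and 21 bits (512 bytes to 2 MiB) are what I believe MIN_CLUSTER_BITS and MAX_CLUSTER_BITS to be, so treat them as assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limits: 512 bytes (1 << 9) up to 2 MiB (1 << 21). */
#define SKETCH_MIN_CLUSTER_BITS 9
#define SKETCH_MAX_CLUSTER_BITS 21

/* Count trailing zero bits; returns 32 for an input of 0. */
int sketch_ctz32(uint32_t v)
{
    int n = 0;

    if (v == 0) {
        return 32;
    }
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* A cluster size is valid if it is a power of two within the limits. */
bool sketch_cluster_size_ok(size_t cluster_size)
{
    int bits = sketch_ctz32((uint32_t)cluster_size);

    /* The range check comes first so the shift below is always in range. */
    return bits >= SKETCH_MIN_CLUSTER_BITS &&
           bits <= SKETCH_MAX_CLUSTER_BITS &&
           ((size_t)1 << bits) == cluster_size;
}

/* A refcount width is valid if it is a power of two no larger than 64. */
bool sketch_refcount_bits_ok(uint64_t bits)
{
    return bits != 0 && bits <= 64 && (bits & (bits - 1)) == 0;
}
```

Comparing (1 << bits) against the original value is what rejects sizes such as 3 * 65536, whose lowest set bit is in range but which are not powers of two.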

[Qemu-block] [PATCH v4 08/37] qcow2: Handle full/falloc preallocation in qcow2_co_create()

2018-03-07 Thread Kevin Wolf

Once qcow2_co_create() can be called directly on an already existing
node, we must provide the 'full' and 'falloc' preallocation modes
outside of creating the image on the protocol layer. Fortunately, we
have preallocated truncate now which can provide this functionality.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index e1821eb3c8..933c612754 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2908,6 +2908,25 @@ qcow2_co_create(BlockdevCreateOptions *create_options, 
Error **errp)
 }
 blk_set_allow_write_beyond_eof(blk, true);
 
+/* Clear the protocol layer and preallocate it if necessary */
+ret = blk_truncate(blk, 0, PREALLOC_MODE_OFF, errp);
+if (ret < 0) {
+goto out;
+}
+
+if (qcow2_opts->preallocation == PREALLOC_MODE_FULL ||
+qcow2_opts->preallocation == PREALLOC_MODE_FALLOC)
+{
+int64_t prealloc_size =
+qcow2_calc_prealloc_size(qcow2_opts->size, cluster_size,
+ refcount_order);
+
+ret = blk_truncate(blk, prealloc_size, qcow2_opts->preallocation, errp);
+if (ret < 0) {
+goto out;
+}
+}
+
 /* Write the header */
 QEMU_BUILD_BUG_ON((1 << MIN_CLUSTER_BITS) < sizeof(*header));
 header = g_malloc0(cluster_size);
@@ -3145,15 +3164,6 @@ static int coroutine_fn qcow2_co_create_opts(const char 
*filename, QemuOpts *opt
 
 
 /* Create and open the file (protocol layer) */
-if (prealloc == PREALLOC_MODE_FULL || prealloc == PREALLOC_MODE_FALLOC) {
-int refcount_order = ctz32(refcount_bits);
-int64_t prealloc_size =
-qcow2_calc_prealloc_size(size, cluster_size, refcount_order);
-qemu_opt_set_number(opts, BLOCK_OPT_SIZE, prealloc_size, &error_abort);
-qemu_opt_set(opts, BLOCK_OPT_PREALLOC, PreallocMode_str(prealloc),
- &error_abort);
-}
-
 ret = bdrv_create_file(filename, opts, errp);
 if (ret < 0) {
 goto finish;
-- 
2.13.6

[Qemu-block] [PATCH v4 02/37] block/qapi: Add qcow2 create options to schema

2018-03-07 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json | 45 -
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index bb2db662f7..dfea7b0102 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3359,6 +3359,49 @@
 { 'command': 'blockdev-del', 'data': { 'node-name': 'str' } }
 
 ##
+# @BlockdevQcow2Version:
+#
+# @v2:  The original QCOW2 format as introduced in qemu 0.10 (version 2)
+# @v3:  The extended QCOW2 format as introduced in qemu 1.1 (version 3)
+#
+# Since: 2.12
+##
+{ 'enum': 'BlockdevQcow2Version',
+  'data': [ 'v2', 'v3' ] }
+
+
+##
+# @BlockdevCreateOptionsQcow2:
+#
+# Driver specific image creation options for qcow2.
+#
+# @file Node to create the image format on
+# @size Size of the virtual disk in bytes
+# @version  Compatibility level (default: v3)
+# @backing-file File name of the backing file if a backing file
+#   should be used
+# @backing-fmt  Name of the block driver to use for the backing file
+# @encrypt  Encryption options if the image should be encrypted
+# @cluster-size qcow2 cluster size in bytes (default: 65536)
+# @preallocationPreallocation mode for the new image (default: off)
+# @lazy-refcounts   True if refcounts may be updated lazily (default: off)
+# @refcount-bitsWidth of reference counts in bits (default: 16)
+#
+# Since: 2.12
+##
+{ 'struct': 'BlockdevCreateOptionsQcow2',
+  'data': { 'file': 'BlockdevRef',
+'size': 'size',
+'*version': 'BlockdevQcow2Version',
+'*backing-file':'str',
+'*backing-fmt': 'BlockdevDriver',
+'*encrypt': 'QCryptoBlockCreateOptions',
+'*cluster-size':'size',
+'*preallocation':   'PreallocMode',
+'*lazy-refcounts':  'bool',
+'*refcount-bits':   'int' } }
+
+##
 # @BlockdevCreateNotSupported:
 #
 # This is used for all drivers that don't support creating images.
@@ -3402,7 +3445,7 @@
   'null-co':'BlockdevCreateNotSupported',
   'nvme':   'BlockdevCreateNotSupported',
   'parallels':  'BlockdevCreateNotSupported',
-  'qcow2':  'BlockdevCreateNotSupported',
+  'qcow2':  'BlockdevCreateOptionsQcow2',
   'qcow':   'BlockdevCreateNotSupported',
   'qed':'BlockdevCreateNotSupported',
   'quorum': 'BlockdevCreateNotSupported',
-- 
2.13.6

[Qemu-block] [PATCH v4 07/37] qcow2: Use QCryptoBlockCreateOptions in qcow2_co_create()

2018-03-07 Thread Kevin Wolf

Instead of passing the encryption format name and the QemuOpts down, use
the QCryptoBlockCreateOptions contained in BlockdevCreateOptions.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 62 +++
 1 file changed, 45 insertions(+), 17 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index b7df2d5cab..e1821eb3c8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2449,13 +2449,10 @@ static int qcow2_crypt_method_from_format(const char 
*encryptfmt)
 }
 }
 
-static int qcow2_set_up_encryption(BlockDriverState *bs, const char 
*encryptfmt,
-   QemuOpts *opts, Error **errp)
+static QCryptoBlockCreateOptions *
+qcow2_parse_encryption(const char *encryptfmt, QemuOpts *opts, Error **errp)
 {
-BDRVQcow2State *s = bs->opaque;
 QCryptoBlockCreateOptions *cryptoopts = NULL;
-QCryptoBlock *crypto = NULL;
-int ret = -EINVAL;
 QDict *options, *encryptopts;
 int fmt;
 
@@ -2478,10 +2475,31 @@ static int qcow2_set_up_encryption(BlockDriverState 
*bs, const char *encryptfmt,
 error_setg(errp, "Unknown encryption format '%s'", encryptfmt);
 break;
 }
-if (!cryptoopts) {
-ret = -EINVAL;
-goto out;
+
+QDECREF(encryptopts);
+return cryptoopts;
+}
+
+static int qcow2_set_up_encryption(BlockDriverState *bs,
+   QCryptoBlockCreateOptions *cryptoopts,
+   Error **errp)
+{
+BDRVQcow2State *s = bs->opaque;
+QCryptoBlock *crypto = NULL;
+int fmt, ret;
+
+switch (cryptoopts->format) {
+case Q_CRYPTO_BLOCK_FORMAT_LUKS:
+fmt = QCOW_CRYPT_LUKS;
+break;
+case Q_CRYPTO_BLOCK_FORMAT_QCOW:
+fmt = QCOW_CRYPT_AES;
+break;
+default:
+error_setg(errp, "Crypto format not supported in qcow2");
+return -EINVAL;
 }
+
 s->crypt_method_header = fmt;
 
 crypto = qcrypto_block_create(cryptoopts, "encrypt.",
@@ -2489,8 +2507,7 @@ static int qcow2_set_up_encryption(BlockDriverState *bs, 
const char *encryptfmt,
   qcow2_crypto_hdr_write_func,
   bs, errp);
 if (!crypto) {
-ret = -EINVAL;
-goto out;
+return -EINVAL;
 }
 
 ret = qcow2_update_header(bs);
@@ -2499,10 +2516,9 @@ static int qcow2_set_up_encryption(BlockDriverState *bs, 
const char *encryptfmt,
 goto out;
 }
 
+ret = 0;
  out:
-QDECREF(encryptopts);
 qcrypto_block_free(crypto);
-qapi_free_QCryptoBlockCreateOptions(cryptoopts);
 return ret;
 }
 
@@ -2768,8 +2784,7 @@ static uint64_t qcow2_opt_get_refcount_bits_del(QemuOpts 
*opts, int version,
 }
 
 static int coroutine_fn
-qcow2_co_create(BlockdevCreateOptions *create_options, QemuOpts *opts,
-const char *encryptfmt, Error **errp)
+qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
 {
 BlockdevCreateOptionsQcow2 *qcow2_opts;
 QDict *options;
@@ -2999,8 +3014,8 @@ qcow2_co_create(BlockdevCreateOptions *create_options, 
QemuOpts *opts,
 }
 
 /* Want encryption? There you go. */
-if (encryptfmt) {
-ret = qcow2_set_up_encryption(blk_bs(blk), encryptfmt, opts, errp);
+if (qcow2_opts->has_encrypt) {
+ret = qcow2_set_up_encryption(blk_bs(blk), qcow2_opts->encrypt, errp);
 if (ret < 0) {
 goto out;
 }
@@ -3058,6 +3073,7 @@ static int coroutine_fn qcow2_co_create_opts(const char 
*filename, QemuOpts *opt
 int version;
 uint64_t refcount_bits;
 char *encryptfmt = NULL;
+QCryptoBlockCreateOptions *cryptoopts = NULL;
 BlockDriverState *bs = NULL;
 Error *local_err = NULL;
 int ret;
@@ -3074,6 +3090,7 @@ static int coroutine_fn qcow2_co_create_opts(const char 
*filename, QemuOpts *opt
 ret = -EINVAL;
 goto finish;
 }
+
 encryptfmt = qemu_opt_get_del(opts, BLOCK_OPT_ENCRYPT_FORMAT);
 if (encryptfmt) {
 if (qemu_opt_get(opts, BLOCK_OPT_ENCRYPT)) {
@@ -3085,6 +3102,14 @@ static int coroutine_fn qcow2_co_create_opts(const char 
*filename, QemuOpts *opt
 } else if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ENCRYPT, false)) {
 encryptfmt = g_strdup("aes");
 }
+if (encryptfmt) {
+cryptoopts = qcow2_parse_encryption(encryptfmt, opts, errp);
+if (cryptoopts == NULL) {
+ret = -EINVAL;
+goto finish;
+}
+}
+
 cluster_size = qcow2_opt_get_cluster_size_del(opts, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
@@ -3158,6 +3183,8 @@ static int coroutine_fn qcow2_co_create_opts(const char 
*filename, QemuOpts *opt
 .backing_file   = backing_file,
 .has_backing_fmt= (backing_fmt != NULL),
 .backing_fmt= backing_drv,
+
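
The new switch in qcow2_set_up_encryption() boils down to mapping a crypto block format to the encryption method stored in the qcow2 header. Here is that mapping as a standalone sketch. The enum names are made up; the header values 1 (AES) and 2 (LUKS) are taken from my reading of the qcow2 format and should be treated as assumptions here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the crypto block format enum. */
typedef enum {
    SKETCH_CRYPTO_FORMAT_QCOW,
    SKETCH_CRYPTO_FORMAT_LUKS,
    SKETCH_CRYPTO_FORMAT_OTHER,
} SketchCryptoFormat;

/* Assumed header values for the encryption method field. */
enum {
    SKETCH_QCOW_CRYPT_NONE = 0,
    SKETCH_QCOW_CRYPT_AES  = 1,
    SKETCH_QCOW_CRYPT_LUKS = 2,
};

/* Returns 0 and stores the header value, or -EINVAL if unsupported. */
int sketch_crypt_method(SketchCryptoFormat format, int *method)
{
    assert(method != NULL);

    switch (format) {
    case SKETCH_CRYPTO_FORMAT_LUKS:
        *method = SKETCH_QCOW_CRYPT_LUKS;
        return 0;
    case SKETCH_CRYPTO_FORMAT_QCOW:
        *method = SKETCH_QCOW_CRYPT_AES;
        return 0;
    default:
        /* Leave *method untouched on error. */
        return -EINVAL;
    }
}
```

Returning early on the unsupported case mirrors how the patch replaces the old 'goto out' with a plain 'return -EINVAL', since nothing has been allocated yet at that point.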

[Qemu-block] [PATCH v4 00/37] x-blockdev-create for protocols and qcow2

2018-03-07 Thread Kevin Wolf

This series implements a minimal QMP command that allows creating an
image file on the protocol level or an image format on a given block
node.

Eventually, the interface is going to change to some kind of an async
command (possibly a (non-)block job), but that will require more work on
the job infrastructure first, so let's first QAPIfy image creation in
the block drivers. In this series, I'm going for a synchronous command
that is prefixed with x- for now.

This series converts qcow2 and all protocol drivers that allow an actual
image creation. This means that drivers which only check if the already
existing storage is good enough are not converted (e.g. host_device,
iscsi). The old behaviour was useful because 'qemu-img create' wants to
create both protocol and format layer, but with the separation in QMP,
you can just leave out the protocol layer creation when the device
already exists.

Please note that for some of the protocol drivers (gluster, rbd and
sheepdog) I don't have a test setup ready. For those, I only tested
with a fake server address to check that the options are parsed correctly
up to this point and an appropriate error is returned without crashing.

If you are a maintainer of one of these protocols and you are
interested in keeping image creation working for your protocol, you
probably want to test this series on a real setup and give me some
feedback. If you don't, I'll just merge the patches and hope that they
won't break anything.


v4:
- Rebased on top of a few conflicting series that have hit master
  meanwhile
- qcow2: Renamed qcow2_create2() to qcow2_co_create() while resolving
  the conflict from Paolo's series that renamed it to qcow2_co_create2()
- rbd: Further simplified qemu_rbd_mon_host() [Max]


git-backport-diff compared to v3:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/37:[----] [--] 'block/qapi: Introduce BlockdevCreateOptions'
002/37:[----] [--] 'block/qapi: Add qcow2 create options to schema'
003/37:[down] 'qcow2: Rename qcow2_co_create2() to qcow2_co_create()'
-> this one is actually new
004/37:[0012] [FC] 'qcow2: Let qcow2_create() handle protocol layer'
005/37:[down] 'qcow2: Pass BlockdevCreateOptions to qcow2_co_create()'
006/37:[down] 'qcow2: Use BlockdevRef in qcow2_co_create()'
007/37:[down] 'qcow2: Use QCryptoBlockCreateOptions in qcow2_co_create()'
008/37:[down] 'qcow2: Handle full/falloc preallocation in qcow2_co_create()'
-> just rebase with subject line changes
009/37:[----] [--] 'util: Add qemu_opts_to_qdict_filtered()'
010/37:[----] [--] 'test-qemu-opts: Test qemu_opts_append()'
011/37:[----] [--] 'test-qemu-opts: Test qemu_opts_to_qdict_filtered()'
012/37:[----] [--] 'qdict: Introduce qdict_rename_keys()'
013/37:[0005] [FC] 'qcow2: Use visitor for options in qcow2_create()'
014/37:[----] [--] 'block: Make bdrv_is_whitelisted() public'
015/37:[0011] [FC] 'block: x-blockdev-create QMP command'
016/37:[0006] [FC] 'file-posix: Support .bdrv_co_create'
017/37:[0006] [FC] 'file-win32: Support .bdrv_co_create'
018/37:[0018] [FC] 'gluster: Support .bdrv_co_create'
019/37:[----] [--] 'rbd: Fix use after free in qemu_rbd_set_keypairs() error path'
020/37:[----] [--] 'rbd: Factor out qemu_rbd_connect()'
021/37:[----] [--] 'rbd: Remove non-schema options from runtime_opts'
022/37:[0008] [FC] 'rbd: Pass BlockdevOptionsRbd to qemu_rbd_connect()'
023/37:[0022] [FC] 'rbd: Support .bdrv_co_create'
024/37:[----] [--] 'rbd: Assign s->snap/image_name in qemu_rbd_open()'
025/37:[----] [--] 'rbd: Use qemu_rbd_connect() in qemu_rbd_do_create()'
026/37:[----] [-C] 'nfs: Use QAPI options in nfs_client_open()'
027/37:[0006] [FC] 'nfs: Support .bdrv_co_create'
028/37:[----] [-C] 'sheepdog: QAPIfy "redundancy" create option'
029/37:[0009] [FC] 'sheepdog: Support .bdrv_co_create'
030/37:[0001] [FC] 'ssh: Use QAPI BlockdevOptionsSsh object'
031/37:[----] [--] 'ssh: QAPIfy host-key-check option'
032/37:[----] [-C] 'ssh: Pass BlockdevOptionsSsh to connect_to_ssh()'
033/37:[0035] [FC] 'ssh: Support .bdrv_co_create'
034/37:[----] [--] 'file-posix: Fix no-op bdrv_truncate() with falloc preallocation'
035/37:[----] [--] 'block: Fail bdrv_truncate() with negative size'
036/37:[----] [--] 'qemu-iotests: Test qcow2 over file image creation with QMP'
037/37:[----] [--] 'qemu-iotests: Test ssh image creation over QMP'


Kevin Wolf (37):
  block/qapi: Introduce BlockdevCreateOptions
  block/qapi: Add qcow2 create options to schema
  qcow2: Rename qcow2_co_create2() to qcow2_co_create()
  qcow2: Let qcow2_create() handle protocol layer
  qcow2: Pass BlockdevCreateOptions to qcow2_co_create()
  qcow2: Use BlockdevRef in qcow2_co_create()
  qcow2: Use QCryptoBlockCreateOptions in qcow2_co_create()
  qcow2: Handle full/falloc preallocation in qcow2_co_create()
  util: Add

[Qemu-block] [PATCH v4 04/37] qcow2: Let qcow2_create() handle protocol layer

2018-03-07 Thread Kevin Wolf

Currently, qcow2_create() only parses the QemuOpts and then calls
qcow2_co_create() for the actual image creation, which includes both the
creation of the actual file on the file system and writing a valid empty
qcow2 image into that file.

The plan is that qcow2_co_create() becomes the function that implements
the functionality for a future 'blockdev-create' QMP command, which only
creates the qcow2 layer on an already opened file node.

This is a first step towards that goal: Let's move out anything that
deals with the protocol layer from qcow2_co_create() into
qcow2_create().  This means that qcow2_co_create() doesn't need a file
name any more.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 64 +++
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index b7354a27a1..7a11874d22 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2761,7 +2761,7 @@ static uint64_t qcow2_opt_get_refcount_bits_del(QemuOpts 
*opts, int version,
 }
 
 static int coroutine_fn
-qcow2_co_create(const char *filename, int64_t total_size,
+qcow2_co_create(BlockDriverState *bs, int64_t total_size,
 const char *backing_file, const char *backing_format,
 int flags, size_t cluster_size, PreallocMode prealloc,
 QemuOpts *opts, int version, int refcount_order,
@@ -2787,28 +2787,11 @@ qcow2_co_create(const char *filename, int64_t 
total_size,
 Error *local_err = NULL;
 int ret;
 
-if (prealloc == PREALLOC_MODE_FULL || prealloc == PREALLOC_MODE_FALLOC) {
-int64_t prealloc_size =
-qcow2_calc_prealloc_size(total_size, cluster_size, refcount_order);
-qemu_opt_set_number(opts, BLOCK_OPT_SIZE, prealloc_size, &error_abort);
-qemu_opt_set(opts, BLOCK_OPT_PREALLOC, PreallocMode_str(prealloc),
- &error_abort);
-}
-
-ret = bdrv_create_file(filename, opts, &local_err);
+blk = blk_new(BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
+ret = blk_insert_bs(blk, bs, errp);
 if (ret < 0) {
-error_propagate(errp, local_err);
-return ret;
-}
-
-blk = blk_new_open(filename, NULL, NULL,
-   BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL,
-   &local_err);
-if (blk == NULL) {
-error_propagate(errp, local_err);
-return -EIO;
+goto out;
 }
-
 blk_set_allow_write_beyond_eof(blk, true);
 
 /* Write the header */
@@ -2863,7 +2846,8 @@ qcow2_co_create(const char *filename, int64_t total_size,
  */
 options = qdict_new();
 qdict_put_str(options, "driver", "qcow2");
-blk = blk_new_open(filename, NULL, options,
+qdict_put_str(options, "file", bs->node_name);
+blk = blk_new_open(NULL, NULL, options,
BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_NO_FLUSH,
&local_err);
 if (blk == NULL) {
@@ -2935,7 +2919,8 @@ qcow2_co_create(const char *filename, int64_t total_size,
  */
 options = qdict_new();
 qdict_put_str(options, "driver", "qcow2");
-blk = blk_new_open(filename, NULL, options,
+qdict_put_str(options, "file", bs->node_name);
+blk = blk_new_open(NULL, NULL, options,
BDRV_O_RDWR | BDRV_O_NO_BACKING | BDRV_O_NO_IO,
&local_err);
 if (blk == NULL) {
@@ -2966,6 +2951,7 @@ static int coroutine_fn qcow2_co_create_opts(const char 
*filename, QemuOpts *opt
 uint64_t refcount_bits;
 int refcount_order;
 char *encryptfmt = NULL;
+BlockDriverState *bs = NULL;
 Error *local_err = NULL;
 int ret;
 
@@ -3034,12 +3020,38 @@ static int coroutine_fn qcow2_co_create_opts(const char 
*filename, QemuOpts *opt
 
 refcount_order = ctz32(refcount_bits);
 
-ret = qcow2_co_create(filename, size, backing_file, backing_fmt, flags,
+/* Create and open the file (protocol layer) */
+if (prealloc == PREALLOC_MODE_FULL || prealloc == PREALLOC_MODE_FALLOC) {
+int64_t prealloc_size =
+qcow2_calc_prealloc_size(size, cluster_size, refcount_order);
+qemu_opt_set_number(opts, BLOCK_OPT_SIZE, prealloc_size, &error_abort);
+qemu_opt_set(opts, BLOCK_OPT_PREALLOC, PreallocMode_str(prealloc),
+ &error_abort);
+}
+
+ret = bdrv_create_file(filename, opts, errp);
+if (ret < 0) {
+goto finish;
+}
+
+bs = bdrv_open(filename, NULL, NULL,
+   BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL, errp);
+if (bs == NULL) {
+ret = -EIO;
+goto finish;
+}
+
+/* Create the qcow2 image (format layer) */
+ret = qcow2_co_create(bs, size, backing_file, backing_fmt, flags,
   cluster_size, prealloc, opts, version, refcount_order,
-  encryptfmt, &local_err);
-error_propagate(errp, local_err);
+
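
The split described in the commit message (the *_opts entry point deals with the protocol layer, the inner function only writes the format onto an already existing node) can be modelled without any file I/O. In this sketch SketchNode is an in-memory stand-in for a block node and all function names are hypothetical; only the 4-byte qcow2 magic and a big-endian version field are written, whereas a real header has many more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* In-memory stand-in for a protocol-level node. */
typedef struct {
    int created;              /* set once the protocol layer has run */
    size_t len;
    unsigned char data[16];
} SketchNode;

/* Protocol layer: bring an empty node into existence. */
int sketch_protocol_create(SketchNode *node)
{
    assert(node != NULL);
    memset(node, 0, sizeof(*node));
    node->created = 1;
    return 0;
}

/* Format layer: write a minimal header onto an existing node. */
int sketch_format_create(SketchNode *node, uint32_t version)
{
    static const unsigned char magic[4] = { 'Q', 'F', 'I', 0xfb };

    assert(node != NULL);
    if (!node->created) {
        return -1;            /* no protocol layer to write to */
    }
    memcpy(node->data, magic, sizeof(magic));
    node->data[4] = (unsigned char)(version >> 24);
    node->data[5] = (unsigned char)(version >> 16);
    node->data[6] = (unsigned char)(version >> 8);
    node->data[7] = (unsigned char)version;
    node->len = 8;
    return 0;
}

/* Legacy-style entry point: protocol layer first, then format layer. */
int sketch_create_opts(SketchNode *node, uint32_t version)
{
    int ret;

    ret = sketch_protocol_create(node);
    if (ret < 0) {
        goto finish;
    }
    ret = sketch_format_create(node, version);

finish:
    return ret;
}
```

A caller that already has a node, which is the case the future QMP command is aimed at, would call only the format-layer function.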

[Qemu-block] [PATCH v4 03/37] qcow2: Rename qcow2_co_create2() to qcow2_co_create()

2018-03-07 Thread Kevin Wolf

The functions originally known as qcow2_create() and qcow2_create2()
are now called qcow2_co_create_opts() and qcow2_co_create(), which
matches the names of the BlockDriver callbacks that they will implement
at the end of this patch series.

Signed-off-by: Kevin Wolf 
---
 block/qcow2.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 369e374a9b..b7354a27a1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2761,11 +2761,11 @@ static uint64_t 
qcow2_opt_get_refcount_bits_del(QemuOpts *opts, int version,
 }
 
 static int coroutine_fn
-qcow2_co_create2(const char *filename, int64_t total_size,
- const char *backing_file, const char *backing_format,
- int flags, size_t cluster_size, PreallocMode prealloc,
- QemuOpts *opts, int version, int refcount_order,
- const char *encryptfmt, Error **errp)
+qcow2_co_create(const char *filename, int64_t total_size,
+const char *backing_file, const char *backing_format,
+int flags, size_t cluster_size, PreallocMode prealloc,
+QemuOpts *opts, int version, int refcount_order,
+const char *encryptfmt, Error **errp)
 {
 QDict *options;
 
@@ -3034,9 +3034,9 @@ static int coroutine_fn qcow2_co_create_opts(const char 
*filename, QemuOpts *opt
 
 refcount_order = ctz32(refcount_bits);
 
-ret = qcow2_co_create2(filename, size, backing_file, backing_fmt, flags,
-   cluster_size, prealloc, opts, version, refcount_order,
-   encryptfmt, &local_err);
+ret = qcow2_co_create(filename, size, backing_file, backing_fmt, flags,
+  cluster_size, prealloc, opts, version, refcount_order,
+  encryptfmt, &local_err);
 error_propagate(errp, local_err);
 
 finish:
-- 
2.13.6

Re: [Qemu-block] [PATCH v2 3/7] qcow2: Check L1 table parameters in qcow2_expand_zero_clusters()

2018-03-07 Thread Eric Blake


On 03/06/2018 10:14 AM, Alberto Garcia wrote:

This function iterates over all snapshots of a qcow2 file in order to
expand all zero clusters, but it does not validate the snapshots' L1
tables first.

We now have a function to take care of this, so let's use it.

We can also take the opportunity to replace the sector-based
bdrv_read() with bdrv_pread().

Signed-off-by: Alberto Garcia 
Cc: Eric Blake 
---
  block/qcow2-cluster.c  | 24 +---
  tests/qemu-iotests/080 |  2 ++
  tests/qemu-iotests/080.out |  4 
  3 files changed, 23 insertions(+), 7 deletions(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-block] [PATCH] qemu-iotests: fix 203 migration completion race

2018-03-07 Thread Max Reitz

On 2018-03-06 17:18, Stefan Hajnoczi wrote:
> On Mon, Mar 05, 2018 at 05:04:52PM +0100, Max Reitz wrote:
>> On 2018-03-05 16:59, Stefan Hajnoczi wrote:
>>> There is a race between the test's 'query-migrate' QMP command after the
>>> QMP 'STOP' event and completing the migration:
>>>
>>> The test case invokes 'query-migrate' upon receiving 'STOP'.  At this
>>> point the migration thread may still be in the process of completing.
>>> Therefore 'query-migrate' can return 'status': 'active' for a brief
>>> window of time instead of 'status': 'completed'.  This results in
>>> qemu-iotests 203 hanging.
>>>
>>> Solve the race by enabling the 'events' migration capability, which
>>> causes QEMU to emit migration-specific QMP events that do not suffer
>>> from this race condition.  Wait for the QMP 'MIGRATION' event with
>>> 'status': 'completed'.
>>>
>>> Reported-by: Max Reitz 
>>> Signed-off-by: Stefan Hajnoczi 
>>> ---
>>>  tests/qemu-iotests/203 | 15 +++
>>>  tests/qemu-iotests/203.out |  5 +
>>>  2 files changed, 16 insertions(+), 4 deletions(-)
>>
>> So much for "the ppoll() dungeon"...
> 
> It was still a pain to debug :).
> 
> I put a ring buffer into the QMP monitor input/output code.

Oh, wow.

>  Then it was
> possible to figure out the issue via GDB on a hung QEMU:
> 
>   (gdb) p current_run_state
>   RUN_STATE_POSTMIGRATE
>   (gdb) p current_migration->status
>   MIGRATION_STATUS_COMPLETED
>   (gdb) p monitor_out_ring
>   ...'STOP' event...
>   (gdb) p monitor_in_ring
>   ...query-migrate...  <-- okay, the test checked if migration finished
> 
> Then looking at the code:
> 
>   static void migration_completion(MigrationState *s)
>   {
>   ...
>   if (s->state == MIGRATION_STATUS_ACTIVE) {
>   qemu_mutex_lock_iothread();
>   s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>   qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>   s->vm_was_running = runstate_is_running();
>   ret = global_state_store();
> 
>   if (!ret) {
>   bool inactivate = !migrate_colo_enabled();
> 
>   v The stop event comes from here
>   ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> ...
>   }
>   qemu_mutex_unlock_iothread(); <--- oh, no!
>   ...
>   if (!migrate_colo_enabled()) {
>   migrate_set_state(&s->state, current_active_state,
> MIGRATION_STATUS_COMPLETED); <-- too late!
>   }
> 
>   return;

OK...  I guess the answer to this just is "the stop event doesn't mean
anything, use the migration events instead" (i.e. what your patch does).

Thanks a lot, applied to my block branch:

https://github.com/XanClic/qemu/commits/block

Max
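
The ordering problem discussed above can be replayed deterministically. In the sketch below the "monitor" callback is invoked at exactly the two points of interest: when STOP is emitted (status not yet updated) and when the migration event is emitted (status already updated). In real QEMU the first query merely races with the status update; the sketch pins it to the worst case. All Sketch* names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    SKETCH_STATUS_ACTIVE,
    SKETCH_STATUS_COMPLETED,
} SketchStatus;

typedef enum {
    SKETCH_EVENT_STOP,
    SKETCH_EVENT_MIGRATION_COMPLETED,
} SketchEvent;

typedef struct {
    SketchStatus status;
} SketchMigration;

typedef void (*SketchListener)(SketchEvent ev, const SketchMigration *m,
                               void *opaque);

/* STOP goes out first; the status flips to completed only afterwards. */
void sketch_migration_completion(SketchMigration *m, SketchListener cb,
                                 void *opaque)
{
    assert(m != NULL && cb != NULL);

    cb(SKETCH_EVENT_STOP, m, opaque);
    m->status = SKETCH_STATUS_COMPLETED;
    cb(SKETCH_EVENT_MIGRATION_COMPLETED, m, opaque);
}

/* What a test would see if it queried the status on each event. */
typedef struct {
    SketchStatus seen_on_stop;
    SketchStatus seen_on_migration_event;
} SketchObservation;

void sketch_observe(SketchEvent ev, const SketchMigration *m, void *opaque)
{
    SketchObservation *o = opaque;

    if (ev == SKETCH_EVENT_STOP) {
        o->seen_on_stop = m->status;
    } else {
        o->seen_on_migration_event = m->status;
    }
}
```

A test keyed on STOP can therefore observe 'active', while one keyed on the migration event always observes 'completed'.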




Re: [Qemu-block] [PATCH v4] iotests: Tweak 030 in order to trigger a race condition with parallel jobs

2018-03-07 Thread Max Reitz

On 2018-03-06 14:01, Alberto Garcia wrote:
> This patch tweaks TestParallelOps in iotest 030 so it allocates data
> in smaller regions (256KB/512KB instead of 512KB/1MB) and the
> block-stream job in test_stream_commit() only needs to copy data that
> is at the very end of the image.
> 
> This way when the block-stream job is awakened it will finish right
> away without any chance of being stopped by block_job_sleep_ns(). This
> triggers the bug that was fixed by 3d5d319e1221082974711af1d09d82f and
> 1a63a907507fbbcfaee3f622907ec24 and is therefore a more useful test
> case for parallel block jobs.
> 
> After this patch the aforementioned bug can also be reproduced with the
> test_stream_parallel() test case.
> 
> Since with this change the stream job in test_stream_commit() finishes
> early, this patch introduces a similar test case where both jobs are
> slowed down so they can actually run in parallel.
> 
> Signed-off-by: Alberto Garcia 
> Cc: John Snow 
> ---
> 
> v4: Mention that commit 1a63a907507fbbcfaee3f622907ec24 also
> contributes to solve the original bug (both commits need to be
> reverted in order to reproduce this bug reliably).
> 
> Rewrite the loop that writes data into the images to make it more
> readable.

Thanks!  Applied to my block tree:

https://github.com/XanClic/qemu/commits/block

(Still took me a couple of attempts to get it to fail with both commits
reverted, though...)

Max




Re: [Qemu-block] [PATCH 0/2] block/ssh: Implement .bdrv_refresh_filename()

2018-03-07 Thread John Snow

On 03/07/2018 05:04 AM, Kevin Wolf wrote:
> Am 06.03.2018 um 22:51 hat John Snow geschrieben:
>> On 02/05/2018 03:22 PM, Max Reitz wrote:
>>> This series implements .bdrv_refresh_filename() for the ssh block
>>> driver, along with an appropriate .bdrv_dirname() so we don't chop off
>>> query strings for backing files with relative filenames.
>>>
>>> This series depends on my “block: Fix some filename generation issues”
>>> series and on Pino's “ssh: switch from libssh2 to libssh” patch.
>>>
>>> Based-on: 20180205151835.20812-1-mre...@redhat.com
>>> Based-on: 20180118164439.2120-1-ptosc...@redhat.com
>>>
>>>
>>> Max Reitz (2):
>>>   block/ssh: Implement .bdrv_refresh_filename()
>>>   block/ssh: Implement .bdrv_dirname()
>>>
>>>  block/ssh.c | 72 
>>> +++--
>>>  1 file changed, 65 insertions(+), 7 deletions(-)
>>
>> Did this one rot on the vine?
>>
>> > 1 month old.
> 
> The Based-on tags are the problem, in particular the first one. But yes,
> we could possibly do more to review the dependencies...
> 
> Kevin
> 

Yeah, sorry, I'm not pulling my review weight right now. I do try to
keep my block-devel inbox a rough "todo," though, so for patches that
get to that one month mark with no conclusion I like to ping them before
I archive them.

It's something I'd like to see patchew do, actually:

"Here's a list of what's on the list that has no reviews or NACKs, and
needs some love"

coupled with a 30 day "Hey, nobody looked at this" ping to the list
before it NACKs a set for being too old.
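A rough sketch of what such a report could look like, in Python. This is not patchew's API: the `Series` record, its fields, and the 30-day default are all made up for illustration, and a real tool would have to derive the reply count from the list archive.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Series:
    # Hypothetical record; patchew's real data model is not assumed here.
    subject: str
    posted: date      # date the series hit the list
    replies: int      # review replies seen so far (Reviewed-by, NACK, comments)


def needs_ping(series, today, max_age=timedelta(days=30)):
    """Return unreviewed series that are at least max_age old, oldest first."""
    stale = [s for s in series
             if s.replies == 0 and today - s.posted >= max_age]
    return sorted(stale, key=lambda s: s.posted)
```

With the dates from this thread (posted 2018-02-05, pinged 2018-03-07) the ssh series would just reach the 30-day mark.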

I hope nobody reads these >1 month pings as me wondering why nobody ELSE
has reviewed it, which is not my intent...

--js

Re: [Qemu-block] [Qemu-devel] [RFC] qemu-img: Drop BLK_ZERO from convert

2018-03-07 Thread Max Reitz

On 2018-03-07 18:05, Kevin Wolf wrote:
> Am 07.03.2018 um 16:57 hat Max Reitz geschrieben:
>> On 2018-03-06 18:37, Kevin Wolf wrote:
>>> Am 06.03.2018 um 14:47 hat Stefan Hajnoczi geschrieben:
>>>> On Wed, Feb 28, 2018 at 09:11:32PM +0100, Max Reitz wrote:
>>>>> So...  It's more extreme than I had hoped, that's for sure.  What I
>>>>> conclude from this is:
>>>>>
>>>>> (1) This patch is generally good for nearly fully allocated images.  In
>>>>> the worst case (on well-behaving filesystems with well-optimized image
>>>>> formats) it changes nothing.  In the best case, conversion time is
>>>>> reduced drastically.
>>>
>>> This makes sense. Asking the kernel whether a block is zero only helps
>>> the performance if the result is yes occasionally, otherwise it's just
>>> wasted work.
>>>
>>> Maybe we could try to guess the ratio by comparing the number of
>>> allocated blocks in the image file and the virtual disk size or
>>> something? Then we could only call lseek() when we can actually expect
>>> an improvement from it.
>>
>> Sounds like "qemu should not contain policy" to me.  If the user expects
>> the image to be fully allocated, they might as well use -S 0.
> 
> Optimising special cases isn't really what is meant when we're talking
> about having policy in qemu. The result doesn't change, but the
> performance does potentially. And as long as you access the whole image
> like in qemu-img convert, qemu can make a pretty good estimation because
> it knows both physical and virtual image sizes, so why bother the
> management layer with it?

Oh, I thought we should measure how long an lseek() takes and then
decide based on that.  Hm, well, yes, comparing the allocation size
would be worth thinking about.  But then it gets tricky with internal
snapshots and such...

> The thing that could be considered policy is the threshold where you
> switch from one method to the other.
> 
>>>>> (2) For sparse raw images, this is absolutely devastating.  Reading them
>>>>> now takes more than (ext4) or nearly (xfs) twice as much time as reading
>>>>> a fully allocated image.  So much for "if a filesystem driver has any
>>>>> sense".
>>>
>>> Are you sure that only the filesystem is the problem? Checking for every
>>> single byte of an image whether it is zero has to cost some performance.
>>
>> Well, yes, but "read data location from FS metadata" + "realize it's a
>> hole" + memset() + "repe scasb" shouldn't take twice as much time as
>> "read data location from FS metadata" + "read data from SSD".
>>
>> I expected the "realize it's a hole" part to fall out for free, so this
>> would mean that memset() + repe scasb take much longer than reading data
>> from the SSD -- and that's just pretty much impossible.
> 
> Not sure where you get the SSD part from. The scenarios you're comparing
> are these:
> 
> 1. Query holes with lseek() and then memset() in qemu's emulation of
>    bdrv_co_pwrite_zeroes() for drivers that don't implement it. (Which
>    is true for your null-co, but not representative for the real-world
>    use cases with an actual file-posix protocol layer. Benchmarking with
>    an extended null-co that has write_zero support would probably be
>    better.)

That's before this patch for sparsely allocated images.

> 2. Unconditionally call pread() and let the kernel do a memset(), at the
>    cost of having to scan the buffer afterwards because qemu doesn't
>    know yet that it contains zeros.

That's after this patch for sparsely allocated images.

What I was wondering about was solely post-patch behavior, namely
sparsely vs. nearly fully allocated images.

The thing was that converting a sparsely allocated image from an SSD
took twice as much time as converting a fully allocated image.  Reading
data comes in for the fully allocated case.

The thing was that I forgot to drop the caches (and I really do want to
drop those, because they only help for my 2 GB test case, but in the
real world with 300+ GB images, they won't do much).  So, yes, I guess
what I compared was an in-memory metadata lookup + memset() +
O(n) buffer_is_zero() vs. in-memory metadata lookup + memcpy() +
O(1) buffer_is_zero().

Still leaves something to be explained (because I'd expect memset() to
be twice as fast as memcpy()), but at least it isn't completely weird.
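To illustrate the O(n) vs. O(1) point, here is a deliberately simplified zero check in Python. It is only a model of the cost difference, not qemu's buffer_is_zero() (which is considerably more elaborate); the function name and return shape are made up for this sketch.

```python
def is_zero(buf):
    """Return (all_zero, bytes_examined) for a bytes-like buffer."""
    for i, b in enumerate(buf):
        if b != 0:
            # Data buffer: we can stop at the first non-zero byte.
            return False, i + 1
    # Zero buffer: every single byte had to be looked at.
    return True, len(buf)
```

A buffer read from a hole is all zeroes, so the scan walks the whole thing; a buffer holding real data usually fails the check within the first few bytes.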

> Neither case involves disk accesses if the filesystem metadata is
> cached. You're comparing a memset+scan to just a memset (and the more
> realistic case should be comparing memset+scan to nothing).
> 
> (BTW, buffer_is_zero() does complicated stuff, but 'repe scasb' isn't
> among it.)

I know, that was just a simplification.

>>> The fully allocated image doesn't suffer from this because (a) it only
>>> has to check the first byte in each block and (b) the cost was already
>>> there before this patch.
>>>
>>> In fact, if null-co supported .bdrv_co_pwrite_zeroes, I think you would
>>> get even worse results for your patch because then the pre-patch version
>>> doesn't even have to

Re: [Qemu-block] [Qemu-devel] [RFC] qemu-img: Drop BLK_ZERO from convert

2018-03-07 Thread Kevin Wolf

Am 07.03.2018 um 16:57 hat Max Reitz geschrieben:
> On 2018-03-06 18:37, Kevin Wolf wrote:
> > Am 06.03.2018 um 14:47 hat Stefan Hajnoczi geschrieben:
> >> On Wed, Feb 28, 2018 at 09:11:32PM +0100, Max Reitz wrote:
> >>> So...  It's more extreme than I had hoped, that's for sure.  What I
> >>> conclude from this is:
> >>>
> >>> (1) This patch is generally good for nearly fully allocated images.  In
> >>> the worst case (on well-behaving filesystems with well-optimized image
> >>> formats) it changes nothing.  In the best case, conversion time is
> >>> reduced drastically.
> > 
> > This makes sense. Asking the kernel whether a block is zero only helps
> > the performance if the result is yes occasionally, otherwise it's just
> > wasted work.
> > 
> > Maybe we could try to guess the ratio by comparing the number of
> > allocated blocks in the image file and the virtual disk size or
> > something? Then we could only call lseek() when we can actually expect
> > an improvement from it.
> 
> Sounds like "qemu should not contain policy" to me.  If the user expects
> the image to be fully allocated, they might as well use -S 0.

Optimising special cases isn't really what is meant when we're talking
about having policy in qemu. The result doesn't change, but the
performance does potentially. And as long as you access the whole image
like in qemu-img convert, qemu can make a pretty good estimation because
it knows both physical and virtual image sizes, so why bother the
management layer with it?

The thing that could be considered policy is the threshold where you
switch from one method to the other.
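A minimal sketch of that estimate in Python, assuming a raw image on a local file. The helper names and the 0.9 threshold are hypothetical (the threshold being exactly the policy knob in question), and nothing here is taken from qemu's code.

```python
import os


def allocated_bytes_of(path):
    # st_blocks counts 512-byte units on Linux; not available on every platform.
    return os.stat(path).st_blocks * 512


def allocation_ratio(allocated_bytes, virtual_size):
    """Fraction of the virtual disk that is backed by allocated blocks."""
    if virtual_size <= 0:
        return 1.0
    return min(allocated_bytes / virtual_size, 1.0)


def should_query_holes(allocated_bytes, virtual_size, threshold=0.9):
    # Only bother with lseek(SEEK_DATA/SEEK_HOLE) when a meaningful share of
    # the image is expected to be unallocated. The threshold is an assumption.
    return allocation_ratio(allocated_bytes, virtual_size) < threshold
```

For the benchmark images in this thread (2 GB, 64 kB written or left out every 32 MB) the "sparse" case has about 4 MB allocated and would take the lseek() path, while the "full" case would skip it.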

> >>> (2) For sparse raw images, this is absolutely devastating.  Reading them
> >>> now takes more than (ext4) or nearly (xfs) twice as much time as reading
> >>> a fully allocated image.  So much for "if a filesystem driver has any
> >>> sense".
> > 
> > Are you sure that only the filesystem is the problem? Checking for every
> > single byte of an image whether it is zero has to cost some performance.
> 
> Well, yes, but "read data location from FS metadata" + "realize it's a
> hole" + memset() + "repe scasb" shouldn't take twice as much time as
> "read data location from FS metadata" + "read data from SSD".
> 
> I expected the "realize it's a hole" part to fall out for free, so this
> would mean that memset() + repe scasb take much longer than reading data
> from the SSD -- and that's just pretty much impossible.

Not sure where you get the SSD part from. The scenarios you're comparing
are these:

1. Query holes with lseek() and then memset() in qemu's emulation of
   bdrv_co_pwrite_zeroes() for drivers that don't implement it. (Which
   is true for your null-co, but not representative for the real-world
   use cases with an actual file-posix protocol layer. Benchmarking with
   an extended null-co that has write_zero support would probably be
   better.)

2. Unconditionally call pread() and let the kernel do a memset(), at the
   cost of having to scan the buffer afterwards because qemu doesn't
   know yet that it contains zeros.

Neither case involves disk accesses if the filesystem metadata is
cached. You're comparing a memset+scan to just a memset (and the more
realistic case should be comparing memset+scan to nothing).

(BTW, buffer_is_zero() does complicated stuff, but 'repe scasb' isn't
among it.)

> > The fully allocated image doesn't suffer from this because (a) it only
> > has to check the first byte in each block and (b) the cost was already
> > there before this patch.
> > 
> > In fact, if null-co supported .bdrv_co_pwrite_zeroes, I think you would
> > get even worse results for your patch because then the pre-patch version
> > doesn't even have to do the memset().
> > 
> >>> (2a) It might be worth noting that on xfs, reading the sparse file took
> >>> longer even before this patch...
> >>>
> >>> (3) qcow2 is different: It benefits from this patch on tmpfs and xfs
> >>> (note that reading a sparse qcow2 file took longer than reading a full
> >>> qcow2 file before this patch!), but it gets pretty much destroyed on
> >>> ext4, too.
> > 
> > I suppose an empty qcow2 with metadata preallocation behaves roughly
> > like sparse raw?
> 
> Yep, more on that below.
> 
> > As long as the qcow2 metadata reflects the allocation status in the
> > image file (which it probably usually does, except with preallocation),
> > it makes sense that qcow2 performs better with just relying on its
> > metadata. Calling an lseek() that just returns the same result is a
> > wasted effort then.
> > 
> >>> (4) As for sparse vmdk images...  Reading them takes longer, but it's
> >>> still faster than reading full vmdk images, so that's not horrible.
> > 
> > Hm, why is that? Shouldn't vmdk metadata reflect the allocation status
> > in the image file just as well as qcow2 metadata?
> > 
> > But actually, the absolute numbers are much lower than both raw and
> > qcow2, which is a bit surprising. Is there a bug somewhere in vmdk or
> > are we

Re: [Qemu-block] [Qemu-devel] [RFC] qemu-img: Drop BLK_ZERO from convert

2018-03-07 Thread Max Reitz

On 2018-03-07 17:33, Paolo Bonzini wrote:
> On 07/03/2018 16:57, Max Reitz wrote:
>>>>> (2) For sparse raw images, this is absolutely devastating.  Reading them
>>>>> now takes more than (ext4) or nearly (xfs) twice as much time as reading
>>>>> a fully allocated image.  So much for "if a filesystem driver has any
>>>>> sense".
>>> Are you sure that only the filesystem is the problem? Checking for every
>>> single byte of an image whether it is zero has to cost some performance.
>> Well, yes, but "read data location from FS metadata" + "realize it's a
>> hole" + memset() + "repe scasb" shouldn't take twice as much time as
>> "read data location from FS metadata" + "read data from SSD".
>>
>> I expected the "realize it's a hole" part to fall out for free, so this
>> would mean that memset() + repe scasb take much longer than reading data
>> from the SSD -- and that's just pretty much impossible.
>>
> 
> This makes a lot of sense, but just to double-check, what does profiling
> say?

Oops, right.  I forgot that I forgot to drop the caches in that first
benchmark...
(http://lists.nongnu.org/archive/html/qemu-block/2018-02/msg01166.html)

I don't have a full test run for this RFC version, but with the modified
series I hinted at in
http://lists.nongnu.org/archive/html/qemu-block/2018-03/msg00244.html I get:
- 0.6 s for a sparse qcow2
- 1.3 s for a preallocated qcow2 (basically like a sparse raw file in
this RFC here)
- 4.0 s for a fully allocated qcow2

So that makes more sense.

Max




Re: [Qemu-block] [PATCH 2/2] iotests: add 208 nbd-server + blockdev-snapshot-sync test case

2018-03-07 Thread Max Reitz

On 2018-03-07 17:43, Stefano Panella wrote:
> 
> 
> On Wed, Mar 7, 2018 at 4:16 PM, Stefan Hajnoczi wrote:
>>
>> On Wed, Mar 7, 2018 at 1:57 PM, Stefano Panella wrote:
>> > On Wed, Mar 7, 2018 at 10:55 AM, Stefan Hajnoczi wrote:
>> >>
>> >> On Tue, Mar 6, 2018 at 11:25 PM, Stefano Panella wrote:
>> >> > I have applied this patch and when I run the following qmp commands
>> >> > I do not see the crash anymore but there is still something wrong
>> >> > because only /root/a is opened from qemu. It looks like
>> >> > nbd-server-stop is also getting rid of the nodes added with
>> >> > blockdev-snapshot-sync, therefore it is then not possible to do
>> >> > blockdev-del on /root/d because node-name is not found
>> >>
>> >> Nodes are reference counted.  If nothing holds a refcount then the
>> >> node is freed.
>> > Thanks, that explains the behaviour
>> >>
>> >> The blockdev-add command holds a reference to the node.  The node will
>> >> stay alive until blockdev-del, which releases that reference.
>> >>
>> >> blockdev-snapshot-sync does not hold a reference.  Therefore snapshot
>> >> nodes are freed once nothing is using them anymore.  When the snapshot
>> >> node is created, the users of the parent node are updated to point to
>> >> the snapshot node instead.  This is why the NBD server switches to the
>> >> snapshot mode after blockdev-snapshot-sync.
>> >>
>> >> This is why the snapshot nodes disappear after the NBD server is
>> >> stopped while /root/a stays alive.
>> >>
>> >> I'm not sure if the current blockdev-snapshot-sync behavior is useful.
>> >> Perhaps the presence of the "snapshot-node-name" argument should cause
>> >> the snapshot node to be treated as monitor-owned, just like
>> >> blockdev-add.  This would introduce leaks for existing QMP clients
>> >> though, so it may be necessary to add yet another argument for this
>> >> behavior.
>> > that would be nice, I mean to add an extra parameter so it is added
>> > to the monitor
>> >>
>> >> Anyway, I hope this explains the current behavior.  I don't see a
>> >> problem with it, but it's something the API users need to be aware of.
>> >>
>> > Yes, I was not aware of that behaviour, the problem is that many
>> > examples refer to having a device associated with the blockdev-add'd
>> > node therefore we do not see this problem.
>> >> If it is a problem for your use case, please explain what you are
>> >> trying to do.
>> >>
>> > It is not strictly a problem for my usecase but it would be nice to
>> > have the extra param to blockdev-snapshot-sync. That would also fix
>> > the problem of running multiple snap-sync after blockdev-add but
>> > before there is any user.
>>
>> Max Reitz mentioned that the 'blockdev-snapshot' command is preferred
>> over 'blockdev-snapshot-sync'.  'blockdev-snapshot-sync' is a legacy
>> command that implicitly creates the snapshot node.
>>
>> The difference is that 'blockdev-snapshot' requires that the user
>> first creates the snapshot file (e.g. using qemu-img create), then
>> uses 'blockdev-add' to add the snapshot node, and finally uses
>> 'blockdev-snapshot' to install the snapshot node.
>>
>> When 'blockdev-snapshot' is used, the user must delete snapshot nodes
>> using 'blockdev-del' since they were created using 'blockdev-add'.
>>
> That is very useful info, I was not aware that blockdev-snapshot-sync
> was not recommended.

Yeah, well...  Someone (O:-)) needs to go over all the block QMP
commands and see which are good and which should be deprecated at some
point.  I don't think we have a central list of everything yet...

> I will try to run some examples with blockdev-snapshot.
> In case I want to achieve
> A <- B
> and I do:
> blockdev_add A
> create external snapshot with qemu-img B with A as backing image
> blockdev_add B
> blockdev_snapshot B -> A
> 
> What do I need to do to delete A and B?
> is it fine to just call blockdev_del B ?
> or should I call blockdev_del A as well?

You need to call both.  The basic idea is that you have to pair every
blockdev-add with a blockdev-del.

(You have to delete B first, though, because you cannot delete a node
while it is in use (and A is in use by B as long as B exists).)

Don't forget the '"backing": null' parameter for the blockdev-add B
command, or B will already have A opened as its backing image (which is
not good, you don't want qemu to open the same image twice).

(Or maybe blockdev-add B will not even work without '"backing": null'
because qemu figures out that you are trying to open the same image (A)
twice and prevents that.)
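The whole sequence can be sketched as a list of QMP messages, built here in Python. The node names, file paths and the qcow2/file driver choice are assumptions for illustration; the overlay file is expected to have been created beforehand with qemu-img create, and sending the messages over a QMP socket is left out.

```python
import json


def snapshot_commands(base_node, base_file, overlay_node, overlay_file):
    """Build QMP messages for an A <- B chain and its teardown."""
    setup = [
        # blockdev-add A: the monitor now owns a reference to A.
        {"execute": "blockdev-add",
         "arguments": {"driver": "qcow2", "node-name": base_node,
                       "file": {"driver": "file", "filename": base_file}}},
        # blockdev-add B with "backing": null so that A is not opened twice.
        {"execute": "blockdev-add",
         "arguments": {"driver": "qcow2", "node-name": overlay_node,
                       "backing": None,
                       "file": {"driver": "file", "filename": overlay_file}}},
        # Install B as the overlay of A.
        {"execute": "blockdev-snapshot",
         "arguments": {"node": base_node, "overlay": overlay_node}},
    ]
    teardown = [
        # One blockdev-del per blockdev-add; B first, because A is in use by B.
        {"execute": "blockdev-del", "arguments": {"node-name": overlay_node}},
        {"execute": "blockdev-del", "arguments": {"node-name": base_node}},
    ]
    return setup, teardown


if __name__ == "__main__":
    setup, teardown = snapshot_commands("A", "/root/a", "B", "/root/b")
    for message in setup + teardown:
        print(json.dumps(message))
```

Python's None is serialized as JSON null, which is what the "backing" key needs.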

Max




Re: [Qemu-block] [PATCH 2/2] iotests: add 208 nbd-server + blockdev-snapshot-sync test case

2018-03-07 Thread Stefano Panella

On Wed, Mar 7, 2018 at 4:16 PM, Stefan Hajnoczi wrote:
>
> On Wed, Mar 7, 2018 at 1:57 PM, Stefano Panella wrote:
> > On Wed, Mar 7, 2018 at 10:55 AM, Stefan Hajnoczi wrote:
> >>
> >> On Tue, Mar 6, 2018 at 11:25 PM, Stefano Panella wrote:
> >> > I have applied this patch and when I run the following qmp commands
> >> > I do not see the crash anymore but there is still something wrong
> >> > because only /root/a is opened from qemu. It looks like
> >> > nbd-server-stop is also getting rid of the nodes added with
> >> > blockdev-snapshot-sync, therefore it is then not possible to do
> >> > blockdev-del on /root/d because node-name is not found
> >>
> >> Nodes are reference counted.  If nothing holds a refcount then the
> >> node is freed.
> > Thanks, that explains the behaviour
> >>
> >> The blockdev-add command holds a reference to the node.  The node will
> >> stay alive until blockdev-del, which releases that reference.
> >>
> >> blockdev-snapshot-sync does not hold a reference.  Therefore snapshot
> >> nodes are freed once nothing is using them anymore.  When the snapshot
> >> node is created, the users of the parent node are updated to point to
> >> the snapshot node instead.  This is why the NBD server switches to the
> >> snapshot mode after blockdev-snapshot-sync.
> >>
> >> This is why the snapshot nodes disappear after the NBD server is
> >> stopped while /root/a stays alive.
> >>
> >> I'm not sure if the current blockdev-snapshot-sync behavior is useful.
> >> Perhaps the presence of the "snapshot-node-name" argument should cause
> >> the snapshot node to be treated as monitor-owned, just like
> >> blockdev-add.  This would introduce leaks for existing QMP clients
> >> though, so it may be necessary to add yet another argument for this
> >> behavior.
> > that would be nice, I mean to add an extra parameter so it is added to
> > the monitor
> >>
> >> Anyway, I hope this explains the current behavior.  I don't see a
> >> problem with it, but it's something the API users need to be aware of.
> >>
> > Yes, I was not aware of that behaviour, the problem is that many
> > examples refer to having a device associated with the blockdev-add'd
> > node therefore we do not see this problem.
> >> If it is a problem for your use case, please explain what you are
> >> trying to do.
> >>
> > It is not strictly a problem for my usecase but it would be nice to
> > have the extra param to blockdev-snapshot-sync. That would also fix the
> > problem of running multiple snap-sync after blockdev-add but before
> > there is any user.
>
> Max Reitz mentioned that the 'blockdev-snapshot' command is preferred
> over 'blockdev-snapshot-sync'.  'blockdev-snapshot-sync' is a legacy
> command that implicitly creates the snapshot node.
>
> The difference is that 'blockdev-snapshot' requires that the user
> first creates the snapshot file (e.g. using qemu-img create), then
> uses 'blockdev-add' to add the snapshot node, and finally uses
> 'blockdev-snapshot' to install the snapshot node.
>
> When 'blockdev-snapshot' is used, the user must delete snapshot nodes
> using 'blockdev-del' since they were created using 'blockdev-add'.
>
That is very useful info, I was not aware that blockdev-snapshot-sync
was not recommended. I will try to run some examples with blockdev-snapshot.
In case I want to achieve
A <- B
and I do:
blockdev_add A
create external snapshot with qemu-img B with A as backing image
blockdev_add B
blockdev_snapshot B -> A

What do I need to do to delete A and B?
is it fine to just call blockdev_del B ?
or should I call blockdev_del A as well?

Thanks again for your help!!!

> Stefan

Re: [Qemu-block] [PATCH 2/2] iotests: add 208 nbd-server + blockdev-snapshot-sync test case

2018-03-07 Thread Max Reitz

On 2018-03-07 11:55, Stefan Hajnoczi wrote:
> On Tue, Mar 6, 2018 at 11:25 PM, Stefano Panella wrote:
>> I have applied this patch and when I run the following qmp commands I do
>> not see the crash anymore but there is still something wrong because only
>> /root/a is opened from qemu. It looks like nbd-server-stop is also getting
>> rid of the nodes added with blockdev-snapshot-sync, therefore it is then not
>> possible to do blockdev-del on /root/d because node-name is not found
> 
> Nodes are reference counted.  If nothing holds a refcount then the
> node is freed.
> 
> The blockdev-add command holds a reference to the node.  The node will
> stay alive until blockdev-del, which releases that reference.
> 
> blockdev-snapshot-sync does not hold a reference.

I think that's a bug.  When you specify a node name for the new node, it
should get a reference.

> Therefore snapshot
> nodes are freed once nothing is using them anymore.  When the snapshot
> node is created, the users of the parent node are updated to point to
> the snapshot node instead.  This is why the NBD server switches to the
> snapshot mode after blockdev-snapshot-sync.
> 
> This is why the snapshot nodes disappear after the NBD server is
> stopped while /root/a stays alive.
> 
> I'm not sure if the current blockdev-snapshot-sync behavior is useful.
> Perhaps the presence of the "snapshot-node-name" argument should cause
> the snapshot node to be treated as monitor-owned, just like
> blockdev-add.  This would introduce leaks for existing QMP clients
> though, so it may be necessary to add yet another argument for this
> behavior.

That's true.

> Anyway, I hope this explains the current behavior.  I don't see a
> problem with it, but it's something the API users need to be aware of.

Hm, OK.

As an explanation: blockdev-snapshot-sync is from before we had node
names and blockdev-add.  You'd just create something that needs the
block layer (like a guest device or an NBD server) and then you'd open
the BDS chain you want to go with it (mostly by just specifying the
filename of the top image, and maybe its format).

Then you'd use blockdev-snapshot-sync to just create an overlay during
runtime, and since there weren't any node names it was clear that it
would go away if you deleted whatever was using the chain (like the NBD
server).

Then we introduced node names, and blockdev-snapshot-sync gained the
ability to give one to the overlay -- basically as an afterthought.  I
think we didn't really have a fleshed-out concept of monitor references
back then...  So we forgot to give the overlay an additional reference
in such a case (because we didn't know better).
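The ownership rules described in this thread can be modelled with a small toy graph in Python. This is purely illustrative and not how qemu tracks references internally; the class, its method names and the "overlay:" holder tag are invented for the sketch.

```python
class BlockGraph:
    """Toy model of node ownership; not qemu's data structures."""

    def __init__(self):
        self.holders = {}   # node name -> set of reference holders
        self.backing = {}   # overlay name -> backing node name

    def blockdev_add(self, name):
        # blockdev-add: the monitor itself holds a reference.
        self.holders[name] = {"monitor"}

    def attach_user(self, name, user):
        # E.g. an NBD server or a guest device starts using the node.
        self.holders[name].add(user)

    def snapshot_sync(self, parent, overlay):
        # blockdev-snapshot-sync: users of the parent move to the overlay;
        # the overlay gets no monitor reference of its own.
        users = {h for h in self.holders[parent]
                 if h != "monitor" and not h.startswith("overlay:")}
        self.holders[overlay] = users
        self.holders[parent] = (self.holders[parent] - users) | {"overlay:" + overlay}
        self.backing[overlay] = parent

    def detach_user(self, user):
        for held_by in self.holders.values():
            held_by.discard(user)
        self._free_unreferenced()

    def _free_unreferenced(self):
        # Freeing an overlay drops its reference on the backing node,
        # which may in turn become unreferenced.
        freed = True
        while freed:
            freed = False
            for name in [n for n, h in self.holders.items() if not h]:
                del self.holders[name]
                parent = self.backing.pop(name, None)
                if parent in self.holders:
                    self.holders[parent].discard("overlay:" + name)
                freed = True

    def alive(self):
        return sorted(self.holders)
```

Replaying the reported scenario (blockdev-add a, export it over NBD, snapshot to d, stop the NBD server) leaves only a alive in this model, matching what was observed.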

As you pointed out in your other reply, blockdev-snapshot is the "pure"
blockdev command that should ideally be used.  It gives you much more
control over the overlay (because you have to do the blockdev-add
yourself), and it doesn't have this reference issue.

Max


Re: [Qemu-block] [Qemu-devel] [RFC] qemu-img: Drop BLK_ZERO from convert

2018-03-07 Thread Paolo Bonzini

On 07/03/2018 16:57, Max Reitz wrote:
>>>> (2) For sparse raw images, this is absolutely devastating.  Reading them
>>>> now takes more than (ext4) or nearly (xfs) twice as much time as reading
>>>> a fully allocated image.  So much for "if a filesystem driver has any
>>>> sense".
>> Are you sure that only the filesystem is the problem? Checking for every
>> single byte of an image whether it is zero has to cost some performance.
> Well, yes, but "read data location from FS metadata" + "realize it's a
> hole" + memset() + "repe scasb" shouldn't take twice as much time as
> "read data location from FS metadata" + "read data from SSD".
> 
> I expected the "realize it's a hole" part to fall out for free, so this
> would mean that memset() + repe scasb take much longer than reading data
> from the SSD -- and that's just pretty much impossible.
> 

This makes a lot of sense, but just to double-check, what does profiling
say?

Paolo

Re: [Qemu-block] [PATCH 2/2] iotests: add 208 nbd-server + blockdev-snapshot-sync test case

2018-03-07 Thread Stefan Hajnoczi

On Wed, Mar 7, 2018 at 1:57 PM, Stefano Panella wrote:
> On Wed, Mar 7, 2018 at 10:55 AM, Stefan Hajnoczi wrote:
>>
>> On Tue, Mar 6, 2018 at 11:25 PM, Stefano Panella wrote:
>> > I have applied this patch and when I run the following qmp commands
>> > I do not see the crash anymore but there is still something wrong
>> > because only /root/a is opened from qemu. It looks like
>> > nbd-server-stop is also getting rid of the nodes added with
>> > blockdev-snapshot-sync, therefore it is then not possible to do
>> > blockdev-del on /root/d because node-name is not found
>>
>> Nodes are reference counted.  If nothing holds a refcount then the
>> node is freed.
> Thanks, that explains the behaviour
>>
>> The blockdev-add command holds a reference to the node.  The node will
>> stay alive until blockdev-del, which releases that reference.
>>
>> blockdev-snapshot-sync does not hold a reference.  Therefore snapshot
>> nodes are freed once nothing is using them anymore.  When the snapshot
>> node is created, the users of the parent node are updated to point to
>> the snapshot node instead.  This is why the NBD server switches to the
>> snapshot mode after blockdev-snapshot-sync.
>>
>> This is why the snapshot nodes disappear after the NBD server is
>> stopped while /root/a stays alive.
>>
>> I'm not sure if the current blockdev-snapshot-sync behavior is useful.
>> Perhaps the presence of the "snapshot-node-name" argument should cause
>> the snapshot node to be treated as monitor-owned, just like
>> blockdev-add.  This would introduce leaks for existing QMP clients
>> though, so it may be necessary to add yet another argument for this
>> behavior.
> that would be nice, I mean to add an extra parameter so it is added to the
> monitor
>>
>> Anyway, I hope this explains the current behavior.  I don't see a
>> problem with it, but it's something the API users need to be aware of.
>>
> Yes, I was not aware of that behaviour, the problem is that many examples
> refer to having a device associated with the blockdev-add'd node therefore
> we do not see this problem.
>> If it is a problem for your use case, please explain what you are trying
>> to do.
>>
> It is not strictly a problem for my usecase but it would be nice to have
> the extra param to blockdev-snapshot-sync. That would also fix the problem
> of running multiple snap-sync after blockdev-add but before there is any
> user.

Max Reitz mentioned that the 'blockdev-snapshot' command is preferred
over 'blockdev-snapshot-sync'.  'blockdev-snapshot-sync' is a legacy
command that implicitly creates the snapshot node.

The difference is that 'blockdev-snapshot' requires that the user
first creates the snapshot file (e.g. using qemu-img create), then
uses 'blockdev-add' to add the snapshot node, and finally uses
'blockdev-snapshot' to install the snapshot node.

When 'blockdev-snapshot' is used, the user must delete snapshot nodes
using 'blockdev-del' since they were created using 'blockdev-add'.

Stefan

Re: [Qemu-block] [Qemu-devel] [RFC] qemu-img: Drop BLK_ZERO from convert

2018-03-07 Thread Max Reitz

On 2018-03-06 18:37, Kevin Wolf wrote:
> Am 06.03.2018 um 14:47 hat Stefan Hajnoczi geschrieben:
>> On Wed, Feb 28, 2018 at 09:11:32PM +0100, Max Reitz wrote:
>>> On 2018-02-28 19:08, Max Reitz wrote:
>>>> On 2018-02-27 17:17, Stefan Hajnoczi wrote:
>>>>> On Mon, Feb 26, 2018 at 06:03:13PM +0100, Max Reitz wrote:
>>>>>> There are filesystems (among which is tmpfs) that have a hard time
>>>>>> reporting allocation status.  That is definitely a bug in them.
>>>>>>
>>>>>> However, there is no good reason why qemu-img convert should query the
>>>>>> allocation status in the first place.  It does zero detection by itself
>>>>>> anyway, so we can detect unallocated areas ourselves.
>>>>>>
>>>>>> Furthermore, if a filesystem driver has any sense, reading unallocated
>>>>>> data should take just as much time as lseek(SEEK_DATA) + memset().  So
>>>>>> the only overhead we introduce by dropping the manual lseek() call is a
>>>>>> memset() in the driver and a buffer_is_zero() in qemu-img, both of which
>>>>>> should be relatively quick.
>>>>>
>>>>> This makes sense.  Which file systems did you test this patch on?
>>>>
>>>> On tmpfs and xfs, so far.
>>>>
>>>>> XFS, ext4, and tmpfs would be a good minimal test set to prove the
>>>>> patch.  Perhaps with two input files:
>>>>> 1. A file that is mostly filled with data.
>>>>> 2. A file that is only sparsely populated with data.
>>>>
>>>> And probably with vmdk, which (by default) forbids querying any areas
>>>> larger than 64 kB.
>>>>
>>>>> The time taken should be comparable with the time before this patch.
>>>>
>>>> Yep, I'll do some benchmarks.
>>>
>>> And the results are in.  I've created 2 GB images on various filesystems
>>> in various formats, then I've either written 64 kB every 32 MB to them
>>> ("sparse"), or left out 64 kB every 32 MB ("full").  Then I've converted
>>> them to null-co:// and took the (real) time through "time". (Script is
>>> attached.)
>>>
>>> I've attached the raw results before and after this patch.  Usually, I
>>> did six runs for each case and dropped the most extreme outlier --
>>> except for full vmdk images, where I've only done one run for each case
>>> because creating these images can take a very long time.
>>>
>>> Here are the differences from before to after:
>>>
>>> sparse raw on tmpfs:    + 19 % (436 ms to 520 ms)
>>> sparse qcow2 on tmpfs:  - 31 % (435 ms to 301 ms)
>>> sparse vmdk on tmpfs:   + 37 % (214 ms to 294 ms)
>>>
>>> sparse raw on xfs:      + 69 % (452 ms to 762 ms)
>>> sparse qcow2 on xfs:    - 34 % (462 ms to 304 ms)
>>> sparse vmdk on xfs:     + 42 % (210 ms to 298 ms)
>>>
>>> sparse raw on ext4:     +360 % (144 ms to 655 ms)
>>> sparse qcow2 on ext4:   +120 % (147 ms to 330 ms)
>>> sparse vmdk on ext4:    + 16 % (253 ms to 293 ms)
>>>
>>>
>>> full raw on tmpfs:      -  9 % (437 ms to 398 ms)
>>> full qcow2 on tmpfs:    - 75 % (1.63 s to 403 ms)
>>> full vmdk on tmpfs:     -100 % (10 min to 767 ms)
>>>
>>> full raw on xfs:        -  1 % (407 ms to 404 ms, insignificant)
>>> full qcow2 on xfs:      -  1 % (410 ms to 404 ms, insignificant)
>>> full vmdk on xfs:       - 33 % (1.05 s to 695 ms)
>>>
>>> full raw on ext4:       -  2 % (308 ms to 301 ms, insignificant)
>>> full qcow2 on ext4:     +  2 % (307 ms to 312 ms, insignificant)
>>> full vmdk on ext4:      - 74 % (3.53 s to 839 ms)
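For reference, the relative changes in the table are plain before/after ratios. A small Python helper (the name is made up) shows the arithmetic; the percentages in the mail were presumably computed from unrounded timings, so recomputing them from the rounded millisecond values shown can be off by a few points for some rows.

```python
def pct_change(before_ms, after_ms):
    """Relative change from before to after, in percent (positive = slower)."""
    return (after_ms - before_ms) / before_ms * 100.0


if __name__ == "__main__":
    # Two rows taken from the table above.
    print(round(pct_change(436, 520)))   # sparse raw on tmpfs
    print(round(pct_change(435, 301)))   # sparse qcow2 on tmpfs
```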
>>>
>>>
>>> So...  It's more extreme than I had hoped, that's for sure.  What I
>>> conclude from this is:
>>>
>>> (1) This patch is generally good for nearly fully allocated images.  In
>>> the worst case (on well-behaving filesystems with well-optimized image
>>> formats) it changes nothing.  In the best case, conversion time is
>>> reduced drastically.
> 
> This makes sense. Asking the kernel whether a block is zero only helps
> the performance if the result is yes occasionally, otherwise it's just
> wasted work.
> 
> Maybe we could try to guess the ratio by comparing the number of
> allocated blocks in the image file and the virtual disk size or
> something? Then we could only call lseek() when we can actually expect
> an improvement from it.

Sounds like "qemu should not contain policy" to me.  If the user expects
the image to be fully allocated, they might as well use -S 0.
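
For reference, the ratio heuristic suggested in the quoted text could look
roughly like the sketch below. It is only an illustration, not anything from
the patch: the function names and the threshold parameter are made up, and it
assumes st_blocks is counted in 512-byte units (true on Linux, not guaranteed
by POSIX).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* Fraction of the apparent file size that is backed by allocated blocks.
 * An empty file is treated as fully allocated so that nothing is probed. */
double allocated_fraction(uint64_t allocated_bytes, uint64_t virtual_size)
{
    if (virtual_size == 0) {
        return 1.0;
    }
    double f = (double)allocated_bytes / (double)virtual_size;
    return f > 1.0 ? 1.0 : f;   /* metadata overhead can push this above 1 */
}

/* Hypothetical policy: only bother with lseek(SEEK_DATA/SEEK_HOLE) probing
 * when the file looks sparse enough.  Assumes 512-byte st_blocks units. */
bool worth_probing_for_holes(const struct stat *st, double threshold)
{
    assert(threshold >= 0.0 && threshold <= 1.0);
    uint64_t allocated = (uint64_t)st->st_blocks * 512;
    return allocated_fraction(allocated, (uint64_t)st->st_size) < threshold;
}
```

A caller would fill the struct with stat() or fstat() on the image file and
pick a threshold; which threshold is right is exactly the policy question
raised above.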

>>> (2) For sparse raw images, this is absolutely devastating.  Reading them
>>> now takes more than (ext4) or nearly (xfs) twice as much time as reading
>>> a fully allocated image.  So much for "if a filesystem driver has any
>>> sense".
> 
> Are you sure that only the filesystem is the problem? Checking for every
> single byte of an image whether it is zero has to cost some performance.

Well, yes, but "read data location from FS metadata" + "realize it's a
hole" + memset() + "repe scasb" shouldn't take twice as much time as
"read data location from FS metadata" + "read data from SSD".

I expected the "realize it's a hole" part to fall out for free, so this
would mean that memset() + repe scasb take much longer than reading data from
the SSD -- and

Re: [Qemu-block] [PATCH v3 0/4] vl: introduce vm_shutdown()

2018-03-07 Thread Paolo Bonzini

On 07/03/2018 15:42, Stefan Hajnoczi wrote:
> v3:
>  * Rebase on qemu.git/master after AIO_WAIT_WHILE() was merged [Fam]
> v2:
>  * Tackle the .ioeventfd_stop() vs vq handler race by removing the ioeventfd
>    from a BH in the IOThread [Fam]
> 
> There are several race conditions in virtio-blk/virtio-scsi dataplane code.
> This patch series addresses them, see the commit description for details on the
> individual cases.

Acked-by: Paolo Bonzini 

Thanks!

Paolo

Re: [Qemu-block] [PATCH] hw: Do not include "sysemu/block-backend.h" if it is not necessary

2018-03-07 Thread Paolo Bonzini

On 15/02/2018 09:55, Thomas Huth wrote:
> After reviewing a patch from Philippe that removes block-backend.h
> from hw/lm32/milkymist.c, I noticed that this header is included
> unnecessarily in a lot of other files, too. Remove those unneeded
> includes to speed up the compilation process a little bit.
> 
> Signed-off-by: Thomas Huth 

Michael seems busy, so I'll pick this up.

Paolo

> ---
>  hw/arm/highbank.c  | 1 -
>  hw/arm/msf2-soc.c  | 1 -
>  hw/arm/realview.c  | 1 -
>  hw/arm/tosa.c  | 1 -
>  hw/i386/pc.c   | 2 --
>  hw/i386/pc_piix.c  | 1 -
>  hw/ide/ahci-allwinner.c| 1 -
>  hw/ide/cmd646.c| 1 -
>  hw/ide/ich.c   | 1 -
>  hw/ide/isa.c   | 1 -
>  hw/ide/microdrive.c| 1 -
>  hw/ide/mmio.c  | 1 -
>  hw/mips/mips_fulong2e.c| 1 -
>  hw/mips/mips_jazz.c| 1 -
>  hw/ppc/mac_newworld.c  | 1 -
>  hw/ppc/mac_oldworld.c  | 1 -
>  hw/ppc/prep.c  | 1 -
>  hw/scsi/mptendian.c| 1 -
>  hw/sd/core.c   | 1 -
>  hw/sparc/sun4m.c   | 1 -
>  hw/tricore/tricore_testboard.c | 2 --
>  21 files changed, 23 deletions(-)
> 
> diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
> index 287392b..1742cf6 100644
> --- a/hw/arm/highbank.c
> +++ b/hw/arm/highbank.c
> @@ -27,7 +27,6 @@
>  #include "sysemu/kvm.h"
>  #include "sysemu/sysemu.h"
>  #include "hw/boards.h"
> -#include "sysemu/block-backend.h"
>  #include "exec/address-spaces.h"
>  #include "qemu/error-report.h"
>  #include "hw/char/pl011.h"
> diff --git a/hw/arm/msf2-soc.c b/hw/arm/msf2-soc.c
> index a8ec2cd..f68df56 100644
> --- a/hw/arm/msf2-soc.c
> +++ b/hw/arm/msf2-soc.c
> @@ -29,7 +29,6 @@
>  #include "exec/address-spaces.h"
>  #include "hw/char/serial.h"
>  #include "hw/boards.h"
> -#include "sysemu/block-backend.h"
>  #include "qemu/cutils.h"
>  #include "hw/arm/msf2-soc.h"
>  #include "hw/misc/unimp.h"
> diff --git a/hw/arm/realview.c b/hw/arm/realview.c
> index 87cd1e5..2139a62 100644
> --- a/hw/arm/realview.c
> +++ b/hw/arm/realview.c
> @@ -20,7 +20,6 @@
>  #include "sysemu/sysemu.h"
>  #include "hw/boards.h"
>  #include "hw/i2c/i2c.h"
> -#include "sysemu/block-backend.h"
>  #include "exec/address-spaces.h"
>  #include "qemu/error-report.h"
>  #include "hw/char/pl011.h"
> diff --git a/hw/arm/tosa.c b/hw/arm/tosa.c
> index a55b1a3..7a925fa 100644
> --- a/hw/arm/tosa.c
> +++ b/hw/arm/tosa.c
> @@ -22,7 +22,6 @@
>  #include "hw/boards.h"
>  #include "hw/i2c/i2c.h"
>  #include "hw/ssi/ssi.h"
> -#include "sysemu/block-backend.h"
>  #include "hw/sysbus.h"
>  #include "exec/address-spaces.h"
>  #include "sysemu/sysemu.h"
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 55e69d6..7670b45 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -50,8 +50,6 @@
>  #include "sysemu/qtest.h"
>  #include "kvm_i386.h"
>  #include "hw/xen/xen.h"
> -#include "sysemu/block-backend.h"
> -#include "hw/block/block.h"
>  #include "ui/qemu-spice.h"
>  #include "exec/memory.h"
>  #include "exec/address-spaces.h"
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 456dc9e..527c922 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -40,7 +40,6 @@
>  #include "sysemu/sysemu.h"
>  #include "hw/sysbus.h"
>  #include "sysemu/arch_init.h"
> -#include "sysemu/block-backend.h"
>  #include "hw/i2c/smbus.h"
>  #include "hw/xen/xen.h"
>  #include "exec/memory.h"
> diff --git a/hw/ide/ahci-allwinner.c b/hw/ide/ahci-allwinner.c
> index c3f1604..5397483 100644
> --- a/hw/ide/ahci-allwinner.c
> +++ b/hw/ide/ahci-allwinner.c
> @@ -18,7 +18,6 @@
>  #include "qemu/osdep.h"
>  #include "hw/hw.h"
>  #include "qemu/error-report.h"
> -#include "sysemu/block-backend.h"
>  #include "sysemu/dma.h"
>  #include "hw/ide/internal.h"
>  #include "hw/ide/ahci_internal.h"
> diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
> index 65aff51..6bb92d7 100644
> --- a/hw/ide/cmd646.c
> +++ b/hw/ide/cmd646.c
> @@ -26,7 +26,6 @@
>  #include "hw/hw.h"
>  #include "hw/pci/pci.h"
>  #include "hw/isa/isa.h"
> -#include "sysemu/block-backend.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/dma.h"
>  
> diff --git a/hw/ide/ich.c b/hw/ide/ich.c
> index c01b24e..134478e 100644
> --- a/hw/ide/ich.c
> +++ b/hw/ide/ich.c
> @@ -65,7 +65,6 @@
>  #include "hw/pci/msi.h"
>  #include "hw/pci/pci.h"
>  #include "hw/isa/isa.h"
> -#include "sysemu/block-backend.h"
>  #include "sysemu/dma.h"
>  #include "hw/ide/pci.h"
>  #include "hw/ide/ahci_internal.h"
> diff --git a/hw/ide/isa.c b/hw/ide/isa.c
> index 9fb24fc..028bd61 100644
> --- a/hw/ide/isa.c
> +++ b/hw/ide/isa.c
> @@ -25,7 +25,6 @@
>  #include "qemu/osdep.h"
>  #include "hw/hw.h"
>  #include "hw/isa/isa.h"
> -#include "sysemu/block-backend.h"
>  #include "sysemu/dma.h"
>  
>  #include "hw/ide/internal.h"
> diff --git a/hw/ide/microdrive.c b/hw/ide/microdrive.c
> index 58e4f52..34bb98d 100644
> ---

[Qemu-block] [PATCH v3 2/4] virtio-blk: fix race between .ioeventfd_stop() and vq handler

2018-03-07 Thread Stefan Hajnoczi

If the main loop thread invokes .ioeventfd_stop() just as the vq handler
function begins in the IOThread then the handler may lose the race for
the AioContext lock.  By the time the vq handler is able to acquire the
AioContext lock the ioeventfd has already been removed and the handler
isn't supposed to run anymore!

Use the new aio_wait_bh_oneshot() function to perform ioeventfd removal
from within the IOThread.  This way no races with the vq handler are
possible.

Signed-off-by: Stefan Hajnoczi 
---
 hw/block/dataplane/virtio-blk.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 2cb990997e..d3bb09bc4e 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -229,6 +229,22 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 return -ENOSYS;
 }
 
+/* Stop notifications for new requests from guest.
+ *
+ * Context: BH in IOThread
+ */
+static void virtio_blk_data_plane_stop_bh(void *opaque)
+{
+VirtIOBlockDataPlane *s = opaque;
+unsigned i;
+
+for (i = 0; i < s->conf->num_queues; i++) {
+VirtQueue *vq = virtio_get_queue(s->vdev, i);
+
+virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, NULL);
+}
+}
+
 /* Context: QEMU global mutex held */
 void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 {
@@ -253,13 +269,7 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 trace_virtio_blk_data_plane_stop(s);
 
 aio_context_acquire(s->ctx);
-
-/* Stop notifications for new requests from guest */
-for (i = 0; i < nvqs; i++) {
-VirtQueue *vq = virtio_get_queue(s->vdev, i);
-
-virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, NULL);
-}
+aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
 
 /* Drain and switch bs back to the QEMU main loop */
 blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
-- 
2.14.3

[Qemu-block] [PATCH v3 1/4] block: add aio_wait_bh_oneshot()

2018-03-07 Thread Stefan Hajnoczi

Sometimes it's necessary for the main loop thread to run a BH in an
IOThread and wait for its completion.  This primitive is useful during
startup/shutdown to synchronize and avoid race conditions.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio-wait.h | 13 +
 util/aio-wait.c  | 31 +++
 2 files changed, 44 insertions(+)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index a48c744fa8..f7a3972200 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -113,4 +113,17 @@ typedef struct {
  */
 void aio_wait_kick(AioWait *wait);
 
+/**
+ * aio_wait_bh_oneshot:
+ * @ctx: the aio context
+ * @cb: the BH callback function
+ * @opaque: user data for the BH callback function
+ *
+ * Run a BH in @ctx and wait for it to complete.
+ *
+ * Must be called from the main loop thread with @ctx acquired exactly once.
+ * Note that main loop event processing may occur.
+ */
+void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
+
 #endif /* QEMU_AIO_WAIT */
diff --git a/util/aio-wait.c b/util/aio-wait.c
index a487cdb852..975afddf4c 100644
--- a/util/aio-wait.c
+++ b/util/aio-wait.c
@@ -38,3 +38,34 @@ void aio_wait_kick(AioWait *wait)
 aio_bh_schedule_oneshot(qemu_get_aio_context(), dummy_bh_cb, NULL);
 }
 }
+
+typedef struct {
+AioWait wait;
+bool done;
+QEMUBHFunc *cb;
+void *opaque;
+} AioWaitBHData;
+
+/* Context: BH in IOThread */
+static void aio_wait_bh(void *opaque)
+{
+AioWaitBHData *data = opaque;
+
+data->cb(data->opaque);
+
+data->done = true;
+aio_wait_kick(&data->wait);
+}
+
+void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
+{
+AioWaitBHData data = {
+.cb = cb,
+.opaque = opaque,
+};
+
+assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+
+aio_bh_schedule_oneshot(ctx, aio_wait_bh, &data);
+AIO_WAIT_WHILE(&data.wait, ctx, !data.done);
+}
-- 
2.14.3
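
The "run a callback in another thread and wait for it" pattern that
aio_wait_bh_oneshot() implements can be sketched outside QEMU with plain
pthreads. This is only an analogy under loud assumptions: it spawns a
short-lived thread instead of scheduling into an existing IOThread, it blocks
on a condition variable instead of polling the main loop, and all names
(wait_bh_oneshot, WaitBHData, increment_cb) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef void BHFunc(void *opaque);

/* Lives on the caller's stack for the duration of the call, in the same
 * spirit as AioWaitBHData in the patch. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
    BHFunc *cb;
    void *opaque;
} WaitBHData;

/* Runs in the worker thread: invoke the callback, then wake the waiter. */
static void *wait_bh_worker(void *arg)
{
    WaitBHData *data = arg;

    data->cb(data->opaque);

    pthread_mutex_lock(&data->lock);
    data->done = true;
    pthread_cond_signal(&data->cond);
    pthread_mutex_unlock(&data->lock);
    return NULL;
}

/* Run cb(opaque) in another thread and return only after it completed. */
void wait_bh_oneshot(BHFunc *cb, void *opaque)
{
    WaitBHData data = { .done = false, .cb = cb, .opaque = opaque };
    pthread_t thread;
    int rc;

    pthread_mutex_init(&data.lock, NULL);
    pthread_cond_init(&data.cond, NULL);

    rc = pthread_create(&thread, NULL, wait_bh_worker, &data);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&data.lock);
    while (!data.done) {
        pthread_cond_wait(&data.cond, &data.lock);
    }
    pthread_mutex_unlock(&data.lock);

    pthread_join(thread, NULL);
    pthread_cond_destroy(&data.cond);
    pthread_mutex_destroy(&data.lock);
}

/* Example callback: bump an int counter passed as opaque. */
void increment_cb(void *opaque)
{
    (*(int *)opaque)++;
}
```

Because the caller does not return until the callback has finished, the
callback's work (here the increment) is complete and visible by the time
wait_bh_oneshot() returns, which is the property the later patches in the
series rely on.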

[Qemu-block] [PATCH v3 4/4] vl: introduce vm_shutdown()

2018-03-07 Thread Stefan Hajnoczi

Commit 00d09fdbbae5f7864ce754913efc84c12fdf9f1a ("vl: pause vcpus before
stopping iothreads") and commit dce8921b2baaf95974af8176406881872067adfa
("iothread: Stop threads before main() quits") tried to work around the
fact that emulation was still active during termination by stopping
iothreads.  They suffer from race conditions:
1. virtio_scsi_handle_cmd_vq() racing with iothread_stop_all() hits the
   virtio_scsi_ctx_check() assertion failure because the BDS AioContext
   has been modified by iothread_stop_all().
2. Guest vq kick racing with main loop termination leaves a readable
   ioeventfd that is handled by the next aio_poll() when external
   clients are enabled again, resulting in unwanted emulation activity.

This patch obsoletes those commits by fully disabling emulation activity
when vcpus are stopped.

Use the new vm_shutdown() function instead of pause_all_vcpus() so that
vm change state handlers are invoked too.  Virtio devices will now stop
their ioeventfds, preventing further emulation activity after vm_stop().

Note that vm_stop(RUN_STATE_SHUTDOWN) cannot be used because it emits a
QMP STOP event that may affect existing clients.

It is no longer necessary to call replay_disable_events() directly since
vm_shutdown() does so already.

Drop iothread_stop_all() since it is no longer used.

Cc: Fam Zheng 
Cc: Kevin Wolf 
Signed-off-by: Stefan Hajnoczi 
---
 include/sysemu/iothread.h |  1 -
 include/sysemu/sysemu.h   |  1 +
 cpus.c| 16 +---
 iothread.c| 31 ---
 vl.c  | 13 +++--
 5 files changed, 17 insertions(+), 45 deletions(-)

diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h
index 799614ffd2..8a7ac2c528 100644
--- a/include/sysemu/iothread.h
+++ b/include/sysemu/iothread.h
@@ -45,7 +45,6 @@ typedef struct {
 char *iothread_get_id(IOThread *iothread);
 IOThread *iothread_by_id(const char *id);
 AioContext *iothread_get_aio_context(IOThread *iothread);
-void iothread_stop_all(void);
 GMainContext *iothread_get_g_main_context(IOThread *iothread);
 
 /*
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index d24ad09f37..356bfdc1c1 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -56,6 +56,7 @@ void vm_start(void);
 int vm_prepare_start(void);
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
+int vm_shutdown(void);
 
 typedef enum WakeupReason {
 /* Always keep QEMU_WAKEUP_REASON_NONE = 0 */
diff --git a/cpus.c b/cpus.c
index 9bcff7d63c..d8fe90eafe 100644
--- a/cpus.c
+++ b/cpus.c
@@ -993,7 +993,7 @@ void cpu_synchronize_all_pre_loadvm(void)
 }
 }
 
-static int do_vm_stop(RunState state)
+static int do_vm_stop(RunState state, bool send_stop)
 {
 int ret = 0;
 
@@ -1002,7 +1002,9 @@ static int do_vm_stop(RunState state)
 pause_all_vcpus();
 runstate_set(state);
 vm_state_notify(0, state);
-qapi_event_send_stop(&error_abort);
+if (send_stop) {
+qapi_event_send_stop(&error_abort);
+}
 }
 
 bdrv_drain_all();
@@ -1012,6 +1014,14 @@ static int do_vm_stop(RunState state)
 return ret;
 }
 
+/* Special vm_stop() variant for terminating the process.  Historically clients
+ * did not expect a QMP STOP event and so we need to retain compatibility.
+ */
+int vm_shutdown(void)
+{
+return do_vm_stop(RUN_STATE_SHUTDOWN, false);
+}
+
 static bool cpu_can_run(CPUState *cpu)
 {
 if (cpu->stop) {
@@ -1994,7 +2004,7 @@ int vm_stop(RunState state)
 return 0;
 }
 
-return do_vm_stop(state);
+return do_vm_stop(state, true);
 }
 
 /**
diff --git a/iothread.c b/iothread.c
index 2ec5a3bffe..1b3463cb00 100644
--- a/iothread.c
+++ b/iothread.c
@@ -101,18 +101,6 @@ void iothread_stop(IOThread *iothread)
 qemu_thread_join(&iothread->thread);
 }
 
-static int iothread_stop_iter(Object *object, void *opaque)
-{
-IOThread *iothread;
-
-iothread = (IOThread *)object_dynamic_cast(object, TYPE_IOTHREAD);
-if (!iothread) {
-return 0;
-}
-iothread_stop(iothread);
-return 0;
-}
-
 static void iothread_instance_init(Object *obj)
 {
 IOThread *iothread = IOTHREAD(obj);
@@ -333,25 +321,6 @@ IOThreadInfoList *qmp_query_iothreads(Error **errp)
 return head;
 }
 
-void iothread_stop_all(void)
-{
-Object *container = object_get_objects_root();
-BlockDriverState *bs;
-BdrvNextIterator it;
-
-for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
-AioContext *ctx = bdrv_get_aio_context(bs);
-if (ctx == qemu_get_aio_context()) {
-continue;
-}
-aio_context_acquire(ctx);
-bdrv_set_aio_context(bs, qemu_get_aio_context());
-aio_context_release(ctx);
-}
-
-object_child_foreach(container, iothread_stop_iter, NULL);
-}
-
 static gpointer iothread_g_main_context_init(gpointer opaque)
 {
 AioContext *ctx;
diff

[Qemu-block] [PATCH v3 0/4] vl: introduce vm_shutdown()

2018-03-07 Thread Stefan Hajnoczi

v3:
 * Rebase on qemu.git/master after AIO_WAIT_WHILE() was merged [Fam]
v2:
 * Tackle the .ioeventfd_stop() vs vq handler race by removing the ioeventfd
   from a BH in the IOThread [Fam]

There are several race conditions in virtio-blk/virtio-scsi dataplane code.
This patch series addresses them, see the commit description for details on the
individual cases.

Stefan Hajnoczi (4):
  block: add aio_wait_bh_oneshot()
  virtio-blk: fix race between .ioeventfd_stop() and vq handler
  virtio-scsi: fix race between .ioeventfd_stop() and vq handler
  vl: introduce vm_shutdown()

 include/block/aio-wait.h| 13 +
 include/sysemu/iothread.h   |  1 -
 include/sysemu/sysemu.h |  1 +
 cpus.c  | 16 +---
 hw/block/dataplane/virtio-blk.c | 24 +---
 hw/scsi/virtio-scsi-dataplane.c |  9 +
 iothread.c  | 31 ---
 util/aio-wait.c | 31 +++
 vl.c| 13 +++--
 9 files changed, 83 insertions(+), 56 deletions(-)

-- 
2.14.3

[Qemu-block] [PATCH v3 3/4] virtio-scsi: fix race between .ioeventfd_stop() and vq handler

2018-03-07 Thread Stefan Hajnoczi

If the main loop thread invokes .ioeventfd_stop() just as the vq handler
function begins in the IOThread then the handler may lose the race for
the AioContext lock.  By the time the vq handler is able to acquire the
AioContext lock the ioeventfd has already been removed and the handler
isn't supposed to run anymore!

Use the new aio_wait_bh_oneshot() function to perform ioeventfd removal
from within the IOThread.  This way no races with the vq handler are
possible.

Signed-off-by: Stefan Hajnoczi 
---
 hw/scsi/virtio-scsi-dataplane.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 1c33322ba6..912e5005d8 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -107,9 +107,10 @@ static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
 return 0;
 }
 
-/* assumes s->ctx held */
-static void virtio_scsi_clear_aio(VirtIOSCSI *s)
+/* Context: BH in IOThread */
+static void virtio_scsi_dataplane_stop_bh(void *opaque)
 {
+VirtIOSCSI *s = opaque;
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
 int i;
 
@@ -171,7 +172,7 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 return 0;
 
 fail_vrings:
-virtio_scsi_clear_aio(s);
+aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
 aio_context_release(s->ctx);
 for (i = 0; i < vs->conf.num_queues + 2; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
@@ -207,7 +208,7 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
 s->dataplane_stopping = true;
 
 aio_context_acquire(s->ctx);
-virtio_scsi_clear_aio(s);
+aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
 aio_context_release(s->ctx);
 
 blk_drain_all(); /* ensure there are no in-flight requests */
-- 
2.14.3

Re: [Qemu-block] [PATCH 2/2] iotests: add 208 nbd-server + blockdev-snapshot-sync test case

2018-03-07 Thread Stefano Panella

On Wed, Mar 7, 2018 at 10:55 AM, Stefan Hajnoczi  wrote:
>
> On Tue, Mar 6, 2018 at 11:25 PM, Stefano Panella  wrote:
> > I have applied this patch and when I run the following qmp commands I do
> > not see the crash anymore but there is still something wrong because only
> > /root/a is opened from qemu. It looks like nbd-server-stop is also getting
> > rid of the nodes added with blockdev-snapshot-sync, therefore it is then not
> > possible to do blockdev-del on /root/d because node-name is not found
>
> Nodes are reference counted.  If nothing holds a refcount then the
> node is freed.
Thanks, that explains the behaviour
>
> The blockdev-add command holds a reference to the node.  The node will
> stay alive until blockdev-del, which releases that reference.
>
> blockdev-snapshot-sync does not hold a reference.  Therefore snapshot
> nodes are freed once nothing is using them anymore.  When the snapshot
> node is created, the users of the parent node are updated to point to
> the snapshot node instead.  This is why the NBD server switches to the
> snapshot node after blockdev-snapshot-sync.
>
> This is why the snapshot nodes disappear after the NBD server is
> stopped while /root/a stays alive.
>
> I'm not sure if the current blockdev-snapshot-sync behavior is useful.
> Perhaps the presence of the "snapshot-node-name" argument should cause
> the snapshot node to be treated as monitor-owned, just like
> blockdev-add.  This would introduce leaks for existing QMP clients
> though, so it may be necessary to add yet another argument for this
> behavior.
That would be nice, I mean adding an extra parameter so that the node is added
to the monitor.
>
> Anyway, I hope this explains the current behavior.  I don't see a
> problem with it, but it's something the API users need to be aware of.
>
Yes, I was not aware of that behaviour. The problem is that many examples refer
to having a device associated with the blockdev-add'd node, therefore we do not
see this problem.
> If it is a problem for your use case, please explain what you are trying to do.
>
It is not strictly a problem for my use case but it would be nice to have the
extra param to blockdev-snapshot-sync. That would also fix the problem of
running multiple snap-sync after blockdev-add but before there is any user.
> Stefan
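
The lifetime rules described in this thread can be modelled with a small
reference-counting sketch. It is a toy model only, not QEMU's block layer:
the Node type, the helper names and the live_nodes counter are invented for
illustration, and the "monitor" and "export" references are just calls made
by the caller.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    const char *name;
    int refcnt;
    struct Node *backing;   /* a node holds one reference on its backing node */
} Node;

int live_nodes;             /* number of nodes not yet freed, for the demo */

void node_ref(Node *n)
{
    n->refcnt++;
}

/* Drop one reference; free the node (and release its backing reference)
 * once nobody holds it anymore. */
void node_unref(Node *n)
{
    if (!n) {
        return;
    }
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        Node *backing = n->backing;
        live_nodes--;
        free(n);
        node_unref(backing);
    }
}

/* Create a node; the caller owns the initial reference. */
Node *node_new(const char *name, Node *backing)
{
    Node *n = calloc(1, sizeof(*n));
    assert(n);
    n->name = name;
    n->refcnt = 1;
    if (backing) {
        node_ref(backing);
        n->backing = backing;
    }
    live_nodes++;
    return n;
}
```

In this model, a node created via blockdev-add keeps its initial reference
until blockdev-del, while a snapshot node's only reference belongs to the
export that was switched over to it. Dropping the export's reference therefore
frees the snapshot node but leaves the blockdev-add'd node alive, matching the
behaviour explained above.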

[Qemu-block] [PATCH v2] block: make BDRV_POLL_WHILE() re-entrancy safe

2018-03-07 Thread Stefan Hajnoczi

Nested BDRV_POLL_WHILE() calls can occur.  Currently
assert(!wait_->wakeup) fails in AIO_WAIT_WHILE() when this happens.

This patch converts the bool wait_->need_kick flag to an unsigned
wait_->num_waiters counter.

Nesting works correctly because outer AIO_WAIT_WHILE() callers evaluate
the condition again after the inner caller completes (invoking the inner
caller counts as aio_poll() progress).

Reported-by: "fuweiwei (C)" 
Cc: Paolo Bonzini 
Signed-off-by: Stefan Hajnoczi 
---
v2:
 * Rebase onto qemu.git/master now that AIO_WAIT_WHILE() has landed
   [Kevin]

 include/block/aio-wait.h | 61 
 util/aio-wait.c  |  2 +-
 2 files changed, 31 insertions(+), 32 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index a48c744fa8..74cde07bef 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -50,8 +50,8 @@
  *   }
  */
 typedef struct {
-/* Is the main loop waiting for a kick?  Accessed with atomic ops. */
-bool need_kick;
+/* Number of waiting AIO_WAIT_WHILE() callers. Accessed with atomic ops. */
+unsigned num_waiters;
 } AioWait;
 
 /**
@@ -71,35 +71,34 @@ typedef struct {
  * wait on conditions between two IOThreads since that could lead to deadlock,
  * go via the main loop instead.
  */
-#define AIO_WAIT_WHILE(wait, ctx, cond) ({  \
-bool waited_ = false;   \
-bool busy_ = true;  \
-AioWait *wait_ = (wait);\
-AioContext *ctx_ = (ctx);   \
-if (in_aio_context_home_thread(ctx_)) { \
-while ((cond) || busy_) {   \
-busy_ = aio_poll(ctx_, (cond)); \
-waited_ |= !!(cond) | busy_;\
-}   \
-} else {\
-assert(qemu_get_current_aio_context() ==\
-   qemu_get_aio_context()); \
-assert(!wait_->need_kick);  \
-/* Set wait_->need_kick before evaluating cond.  */ \
-atomic_mb_set(&wait_->need_kick, true); \
-while (busy_) { \
-if ((cond)) {   \
-waited_ = busy_ = true; \
-aio_context_release(ctx_);  \
-aio_poll(qemu_get_aio_context(), true); \
-aio_context_acquire(ctx_);  \
-} else {\
-busy_ = aio_poll(ctx_, false);  \
-waited_ |= busy_;   \
-}   \
-}   \
-atomic_set(&wait_->need_kick, false);   \
-}   \
+#define AIO_WAIT_WHILE(wait, ctx, cond) ({ \
+bool waited_ = false;  \
+bool busy_ = true; \
+AioWait *wait_ = (wait);   \
+AioContext *ctx_ = (ctx);  \
+if (in_aio_context_home_thread(ctx_)) {\
+while ((cond) || busy_) {  \
+busy_ = aio_poll(ctx_, (cond));\
+waited_ |= !!(cond) | busy_;   \
+}  \
+} else {   \
+assert(qemu_get_current_aio_context() ==   \
+   qemu_get_aio_context());\
+/* Increment wait_->num_waiters before evaluating cond. */ \
+atomic_inc(&wait_->num_waiters);   \
+while (busy_) {\
+if ((cond)) {  \
+waited_ = busy_ = true;\
+aio_context_release(ctx_); \
+aio_poll(qemu_get_aio_context(), true);\
+aio_context_acquire(ctx_); \
+} else {   \
+busy_ = aio_poll(ctx_, false); \
+waited_ |= busy_;  \
+}  \
+}

[Qemu-block] [PATCH] virtio-blk: dataplane: Don't batch notifications if EVENT_IDX is present

2018-03-07 Thread Sergio Lopez

Commit 5b2ffbe4d99843fd8305c573a100047a8c962327 ("virtio-blk: dataplane:
notify guest as a batch") deferred guest notification to a BH in order to
batch notifications, with the purpose of avoiding flooding the guest with
interrupts.

This optimization came with a cost. The average latency perceived in the
guest is increased by a few microseconds, and when multiple IO
operations finish at the same time, the guest won't be notified until
all completions from each operation have been run. In contrast,
virtio-scsi issues the notification at the end of each completion.

On the other hand, nowadays we have the EVENT_IDX feature that allows a
better coordination between QEMU and the Guest OS to avoid sending
unnecessary interrupts.

With this change, virtio-blk/dataplane only batches notifications if the
EVENT_IDX feature is not present.

Some numbers obtained with fio (ioengine=sync, iodepth=1, direct=1):
 - Test specs:
   * fio-3.4 (ioengine=sync, iodepth=1, direct=1)
   * qemu master
   * virtio-blk with a dedicated iothread (default poll-max-ns)
   * backend: null_blk nr_devices=1 irqmode=2 completion_nsec=28
   * 8 vCPUs pinned to isolated physical cores
   * Emulator and iothread also pinned to separate isolated cores
   * variance between runs < 1%

 - Not patched
   * numjobs=1:  lat_avg=327.32  irqs=29998
   * numjobs=4:  lat_avg=337.89  irqs=29073
   * numjobs=8:  lat_avg=342.98  irqs=28643

 - Patched:
   * numjobs=1:  lat_avg=323.92  irqs=30262
   * numjobs=4:  lat_avg=332.65  irqs=29520
   * numjobs=8:  lat_avg=335.54  irqs=29323

Signed-off-by: Sergio Lopez 
---
 hw/block/dataplane/virtio-blk.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 2cb990997e..c46253a924 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -34,6 +34,7 @@ struct VirtIOBlockDataPlane {
 VirtIODevice *vdev;
 QEMUBH *bh; /* bh for guest notification */
 unsigned long *batch_notify_vqs;
+bool batch_notifications;
 
 /* Note that these EventNotifiers are assigned by value.  This is
  * fine as long as you do not call event_notifier_cleanup on them
@@ -47,8 +48,12 @@ struct VirtIOBlockDataPlane {
 /* Raise an interrupt to signal guest, if necessary */
 void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, VirtQueue *vq)
 {
-set_bit(virtio_get_queue_index(vq), s->batch_notify_vqs);
-qemu_bh_schedule(s->bh);
+if (s->batch_notifications) {
+set_bit(virtio_get_queue_index(vq), s->batch_notify_vqs);
+qemu_bh_schedule(s->bh);
+} else {
+virtio_notify_irqfd(s->vdev, vq);
+}
 }
 
 static void notify_guest_bh(void *opaque)
@@ -177,6 +182,12 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 
 s->starting = true;
 
+if (!virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
+s->batch_notifications = true;
+} else {
+s->batch_notifications = false;
+}
+
 /* Set up guest notifier (irq) */
 r = k->set_guest_notifiers(qbus->parent, nvqs, true);
 if (r != 0) {
-- 
2.14.3
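
The decision made by this patch, batch only when EVENT_IDX is absent, can be
illustrated with a stand-alone sketch. It is a simplified model with made-up
names (Notifier, notifier_complete, notifier_flush): the flush function stands
in for the notification BH, and the direct path raises an interrupt
unconditionally, whereas a real EVENT_IDX-aware device would first check
whether the guest wants one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_QUEUES 64

typedef struct {
    bool batch_notifications;  /* true when EVENT_IDX was not negotiated */
    uint64_t pending;          /* one bit per queue awaiting notification */
    unsigned irqs_sent;        /* interrupts raised so far, for the demo */
} Notifier;

void notifier_init(Notifier *n, bool has_event_idx)
{
    n->batch_notifications = !has_event_idx;
    n->pending = 0;
    n->irqs_sent = 0;
}

/* Called once per completed request on the given queue. */
void notifier_complete(Notifier *n, unsigned queue)
{
    assert(queue < MAX_QUEUES);
    if (n->batch_notifications) {
        /* Defer: remember the queue, notify later from the flush step. */
        n->pending |= UINT64_C(1) << queue;
    } else {
        /* Notify right away, once per completion. */
        n->irqs_sent++;
    }
}

/* Stand-in for the BH: raise one interrupt per queue with pending work. */
void notifier_flush(Notifier *n)
{
    while (n->pending) {
        n->pending &= n->pending - 1;   /* clear the lowest set bit */
        n->irqs_sent++;
    }
}
```

With batching, several completions on the same queue collapse into a single
interrupt at flush time, at the price of delaying the first one; without
batching, each completion is signalled immediately, which is the latency
trade-off the commit message describes.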

Re: [Qemu-block] [PATCH] block: make BDRV_POLL_WHILE() re-entrancy safe

2018-03-07 Thread Stefan Hajnoczi

On Tue, Mar 06, 2018 at 12:08:44PM +0100, Kevin Wolf wrote:
> Am 06.03.2018 um 11:53 hat Stefan Hajnoczi geschrieben:
> > Nested BDRV_POLL_WHILE() calls can occur.  Currently
> > assert(!bs_->wakeup) will fail when this happens.
> > 
> > This patch converts bs->wakeup from bool to a counter.
> > 
> > Nesting works correctly because outer BDRV_POLL_WHILE() callers evaluate
> > the condition again after the inner caller completes (invoking the inner
> > caller counts as aio_poll() progress).
> > 
> > Reported-by: "fuweiwei (C)" 
> > Cc: Paolo Bonzini 
> > Signed-off-by: Stefan Hajnoczi 
> 
> Doesn't this conflict with your own AIO_WAIT_WHILE() patch?

Yes, I wanted this patch to be easy for Weiwei to test without
dependencies.

AIO_WAIT_WHILE() has just hit qemu.git/master, so I'll rebase and send a
v2.

Stefan
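
The bool-to-counter conversion discussed in this thread can be sketched with
C11 atomics. This is an illustration rather than the QEMU macro itself: the
Wait type and the wait_begin()/wait_end()/wait_needs_kick() helpers are
hypothetical names, and the sketch leaves out the polling loop entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_uint num_waiters;   /* number of callers currently waiting */
} Wait;

/* Register as a waiter before evaluating the wait condition. */
void wait_begin(Wait *w)
{
    atomic_fetch_add(&w->num_waiters, 1);
}

/* Unregister once the wait condition no longer holds. */
void wait_end(Wait *w)
{
    unsigned prev = atomic_fetch_sub(&w->num_waiters, 1);
    assert(prev > 0);
    (void)prev;
}

/* The kicking side only needs to wake the main loop if someone waits. */
bool wait_needs_kick(Wait *w)
{
    return atomic_load(&w->num_waiters) > 0;
}
```

With a plain bool, an inner waiter finishing would clear the flag while the
outer waiter is still polling, so later kicks would be skipped. With a
counter, the inner wait_end() only drops the count from 2 to 1 and the outer
waiter still gets kicked.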


[Qemu-block] [PULL 5/6] qio: non-default context for async conn

2018-03-07 Thread Daniel P . Berrangé

From: Peter Xu 

We have worked on qio_task_run_in_thread() already.  Further, let
all the qio channel APIs use that context.

Signed-off-by: Peter Xu 
Signed-off-by: Daniel P. Berrangé 
---
 chardev/char-socket.c  |  4 ++--
 include/io/channel-socket.h| 15 ---
 io/channel-socket.c| 15 +--
 migration/socket.c |  3 ++-
 tests/test-io-channel-socket.c |  4 ++--
 5 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 22f65971a1..b0d11387f3 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -867,7 +867,7 @@ static gboolean socket_reconnect_timeout(gpointer opaque)
 tcp_chr_set_client_ioc_name(chr, sioc);
 qio_channel_socket_connect_async(sioc, s->addr,
  qemu_chr_socket_connected,
- chr, NULL);
+ chr, NULL, NULL);
 
 return false;
 }
@@ -951,7 +951,7 @@ static void qmp_chardev_open_socket(Chardev *chr,
 tcp_chr_set_client_ioc_name(chr, sioc);
 qio_channel_socket_connect_async(sioc, s->addr,
  qemu_chr_socket_connected,
- chr, NULL);
+ chr, NULL, NULL);
 } else {
 if (s->is_listen) {
 char *name;
diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index 53801f6042..d7134d2cd6 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -101,6 +101,8 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
  * @callback: the function to invoke on completion
  * @opaque: user data to pass to @callback
  * @destroy: the function to free @opaque
+ * @context: the context to run the async task. If %NULL, the default
+ *   context will be used.
  *
  * Attempt to connect to the address @addr. This method
  * will run in the background so the caller will regain
@@ -113,7 +115,8 @@ void qio_channel_socket_connect_async(QIOChannelSocket *ioc,
   SocketAddress *addr,
   QIOTaskFunc callback,
   gpointer opaque,
-  GDestroyNotify destroy);
+  GDestroyNotify destroy,
+  GMainContext *context);
 
 
 /**
@@ -138,6 +141,8 @@ int qio_channel_socket_listen_sync(QIOChannelSocket *ioc,
  * @callback: the function to invoke on completion
  * @opaque: user data to pass to @callback
  * @destroy: the function to free @opaque
+ * @context: the context to run the async task. If %NULL, the default
+ *   context will be used.
  *
  * Attempt to listen to the address @addr. This method
  * will run in the background so the caller will regain
@@ -150,7 +155,8 @@ void qio_channel_socket_listen_async(QIOChannelSocket *ioc,
  SocketAddress *addr,
  QIOTaskFunc callback,
  gpointer opaque,
- GDestroyNotify destroy);
+ GDestroyNotify destroy,
+ GMainContext *context);
 
 
 /**
@@ -179,6 +185,8 @@ int qio_channel_socket_dgram_sync(QIOChannelSocket *ioc,
  * @callback: the function to invoke on completion
  * @opaque: user data to pass to @callback
  * @destroy: the function to free @opaque
+ * @context: the context to run the async task. If %NULL, the default
+ *   context will be used.
  *
  * Attempt to initialize a datagram socket bound to
  * @localAddr and communicating with peer @remoteAddr.
@@ -194,7 +202,8 @@ void qio_channel_socket_dgram_async(QIOChannelSocket *ioc,
 SocketAddress *remoteAddr,
 QIOTaskFunc callback,
 gpointer opaque,
-GDestroyNotify destroy);
+GDestroyNotify destroy,
+GMainContext *context);
 
 
 /**
diff --git a/io/channel-socket.c b/io/channel-socket.c
index b4d914b767..57cfb4d3a6 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -174,7 +174,8 @@ void qio_channel_socket_connect_async(QIOChannelSocket *ioc,
   SocketAddress *addr,
   QIOTaskFunc callback,
   gpointer opaque,
-  GDestroyNotify destroy)
+  GDestroyNotify destroy,
+  GMainContext *context)
 {
 QIOTask *task = qio_task_new(
 OBJECT(ioc), callback,

[Qemu-block] [PULL 6/6] qio: non-default context for TLS handshake

2018-03-07 Thread Daniel P . Berrangé

From: Peter Xu 

A new parameter "context" is added to qio_channel_tls_handshake() to
allow the TLS handshake to be run on a non-default context.  Still, no
functional change.

Signed-off-by: Peter Xu 
Signed-off-by: Daniel P. Berrangé 
---
 chardev/char-socket.c   |  1 +
 include/io/channel-tls.h|  5 -
 io/channel-tls.c| 45 ++---
 migration/tls.c |  2 ++
 nbd/client.c|  1 +
 nbd/server.c|  1 +
 tests/test-io-channel-tls.c |  2 ++
 ui/vnc-auth-vencrypt.c  |  1 +
 ui/vnc-ws.c |  1 +
 9 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index b0d11387f3..58e11c6f4c 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -703,6 +703,7 @@ static void tcp_chr_tls_init(Chardev *chr)
 qio_channel_tls_handshake(tioc,
   tcp_chr_tls_handshake,
   chr,
+  NULL,
   NULL);
 }
 
diff --git a/include/io/channel-tls.h b/include/io/channel-tls.h
index d157eb10e8..87fcaf9146 100644
--- a/include/io/channel-tls.h
+++ b/include/io/channel-tls.h
@@ -116,6 +116,8 @@ qio_channel_tls_new_client(QIOChannel *master,
  * @func: the callback to invoke when completed
  * @opaque: opaque data to pass to @func
  * @destroy: optional callback to free @opaque
+ * @context: the context that TLS handshake will run with. If %NULL,
+ *   the default context will be used
  *
  * Perform the TLS session handshake. This method
  * will return immediately and the handshake will
@@ -126,7 +128,8 @@ qio_channel_tls_new_client(QIOChannel *master,
 void qio_channel_tls_handshake(QIOChannelTLS *ioc,
QIOTaskFunc func,
gpointer opaque,
-   GDestroyNotify destroy);
+   GDestroyNotify destroy,
+   GMainContext *context);
 
 /**
  * qio_channel_tls_get_session:
diff --git a/io/channel-tls.c b/io/channel-tls.c
index 6182702dab..9628e6fa47 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -140,13 +140,19 @@ qio_channel_tls_new_client(QIOChannel *master,
 return NULL;
 }
 
+struct QIOChannelTLSData {
+QIOTask *task;
+GMainContext *context;
+};
+typedef struct QIOChannelTLSData QIOChannelTLSData;
 
 static gboolean qio_channel_tls_handshake_io(QIOChannel *ioc,
  GIOCondition condition,
  gpointer user_data);
 
 static void qio_channel_tls_handshake_task(QIOChannelTLS *ioc,
-   QIOTask *task)
+   QIOTask *task,
+   GMainContext *context)
 {
 Error *err = NULL;
 QCryptoTLSSessionHandshakeStatus status;
@@ -171,6 +177,15 @@ static void qio_channel_tls_handshake_task(QIOChannelTLS *ioc,
 qio_task_complete(task);
 } else {
 GIOCondition condition;
+QIOChannelTLSData *data = g_new0(typeof(*data), 1);
+
+data->task = task;
+data->context = context;
+
+if (context) {
+g_main_context_ref(context);
+}
+
 if (status == QCRYPTO_TLS_HANDSHAKE_SENDING) {
 condition = G_IO_OUT;
 } else {
@@ -178,11 +193,12 @@ static void qio_channel_tls_handshake_task(QIOChannelTLS *ioc,
 }
 
 trace_qio_channel_tls_handshake_pending(ioc, status);
-qio_channel_add_watch(ioc->master,
-  condition,
-  qio_channel_tls_handshake_io,
-  task,
-  NULL);
+qio_channel_add_watch_full(ioc->master,
+   condition,
+   qio_channel_tls_handshake_io,
+   data,
+   NULL,
+   context);
 }
 }
 
@@ -191,12 +207,18 @@ static gboolean qio_channel_tls_handshake_io(QIOChannel *ioc,
  GIOCondition condition,
  gpointer user_data)
 {
-QIOTask *task = user_data;
+QIOChannelTLSData *data = user_data;
+QIOTask *task = data->task;
+GMainContext *context = data->context;
 QIOChannelTLS *tioc = QIO_CHANNEL_TLS(
 qio_task_get_source(task));
 
-qio_channel_tls_handshake_task(
-   tioc, task);
+g_free(data);
+qio_channel_tls_handshake_task(tioc, task, context);
+
+if (context) {
+g_main_context_unref(context);
+}
 
 return FALSE;
 }
@@ -204,7 +226,8 @@ static gboolean qio_channel_tls_handshake_io(QIOChannel *ioc,
 void

[Qemu-block] [PULL 0/6] Qio next patches

2018-03-07 Thread Daniel P . Berrangé

The following changes since commit f2bb2d14c2958f3f5aef456bd2cdb1ff99f4a562:

  Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging (2018-03-05 16:41:20 +)

are available in the Git repository at:

  https://github.com/berrange/qemu tags/qio-next-pull-request

for you to fetch changes up to 1939ccdaa61ce6a1f57d83277b3d41d3a9ad3c58:

  qio: non-default context for TLS handshake (2018-03-06 10:19:07 +)





Peter Xu (6):
  qio: rename qio_task_thread_result
  qio: introduce qio_channel_add_watch_{full|source}
  qio: store gsources for net listeners
  qio: non-default context for threaded qtask
  qio: non-default context for async conn
  qio: non-default context for TLS handshake

 chardev/char-socket.c  |  5 ++--
 include/io/channel-socket.h| 15 ---
 include/io/channel-tls.h   |  5 +++-
 include/io/channel.h   | 44 
 include/io/net-listener.h  | 22 ++--
 include/io/task.h  |  7 +++--
 io/channel-socket.c| 18 -
 io/channel-tls.c   | 45 
 io/channel.c   | 40 -
 io/dns-resolver.c  |  3 ++-
 io/net-listener.c  | 58 ++
 io/task.c  | 22 +---
 migration/socket.c |  3 ++-
 migration/tls.c|  2 ++
 nbd/client.c   |  1 +
 nbd/server.c   |  1 +
 tests/test-io-channel-socket.c |  4 +--
 tests/test-io-channel-tls.c|  2 ++
 tests/test-io-task.c   |  2 ++
 ui/vnc-auth-vencrypt.c |  1 +
 ui/vnc-ws.c|  1 +
 21 files changed, 239 insertions(+), 62 deletions(-)

-- 
2.14.3

[Qemu-block] [PULL 4/6] qio: non-default context for threaded qtask

2018-03-07 Thread Daniel P . Berrangé

From: Peter Xu 

qio_task_run_in_thread() allows the main thread to run blocking operations
in the background. However, it assumes that it is always working with the
default context. This patch allows the threaded QIO task framework to run
with a non-default gcontext.

There is no functional change so far, so the QIOTasks still always run
on the main context.

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
Signed-off-by: Daniel P. Berrangé 
---
 include/io/task.h|  7 +--
 io/channel-socket.c  |  9 ++---
 io/dns-resolver.c|  3 ++-
 io/task.c| 20 ++--
 tests/test-io-task.c |  2 ++
 5 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/include/io/task.h b/include/io/task.h
index 6021f51336..9e09b95b2e 100644
--- a/include/io/task.h
+++ b/include/io/task.h
@@ -227,15 +227,18 @@ QIOTask *qio_task_new(Object *source,
  * @worker: the function to invoke in a thread
  * @opaque: opaque data to pass to @worker
  * @destroy: function to free @opaque
+ * @context: the context to run the complete hook. If %NULL, the
+ *   default context will be used.
  *
  * Run a task in a background thread. When @worker
  * returns it will call qio_task_complete() in
- * the main event thread context.
+ * the event thread context that was provided.
  */
 void qio_task_run_in_thread(QIOTask *task,
 QIOTaskWorker worker,
 gpointer opaque,
-GDestroyNotify destroy);
+GDestroyNotify destroy,
+GMainContext *context);
 
 /**
  * qio_task_complete:
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 8359b6683a..b4d914b767 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -188,7 +188,8 @@ void qio_channel_socket_connect_async(QIOChannelSocket *ioc,
 qio_task_run_in_thread(task,
qio_channel_socket_connect_worker,
addrCopy,
-   (GDestroyNotify)qapi_free_SocketAddress);
+   (GDestroyNotify)qapi_free_SocketAddress,
+   NULL);
 }
 
 
@@ -246,7 +247,8 @@ void qio_channel_socket_listen_async(QIOChannelSocket *ioc,
 qio_task_run_in_thread(task,
qio_channel_socket_listen_worker,
addrCopy,
-   (GDestroyNotify)qapi_free_SocketAddress);
+   (GDestroyNotify)qapi_free_SocketAddress,
+   NULL);
 }
 
 
@@ -322,7 +324,8 @@ void qio_channel_socket_dgram_async(QIOChannelSocket *ioc,
 qio_task_run_in_thread(task,
qio_channel_socket_dgram_worker,
data,
-   qio_channel_socket_dgram_worker_free);
+   qio_channel_socket_dgram_worker_free,
+   NULL);
 }
 
 
diff --git a/io/dns-resolver.c b/io/dns-resolver.c
index 8c924071c4..187f725665 100644
--- a/io/dns-resolver.c
+++ b/io/dns-resolver.c
@@ -234,7 +234,8 @@ void qio_dns_resolver_lookup_async(QIODNSResolver *resolver,
 qio_task_run_in_thread(task,
qio_dns_resolver_lookup_worker,
data,
-   qio_dns_resolver_lookup_data_free);
+   qio_dns_resolver_lookup_data_free,
+   NULL);
 }
 
 
diff --git a/io/task.c b/io/task.c
index 1a0a1c7185..2886a2c1bc 100644
--- a/io/task.c
+++ b/io/task.c
@@ -77,6 +77,7 @@ struct QIOTaskThreadData {
 QIOTaskWorker worker;
 gpointer opaque;
 GDestroyNotify destroy;
+GMainContext *context;
 };
 
 
@@ -91,6 +92,10 @@ static gboolean qio_task_thread_result(gpointer opaque)
 data->destroy(data->opaque);
 }
 
+if (data->context) {
+g_main_context_unref(data->context);
+}
+
 g_free(data);
 
 return FALSE;
@@ -100,6 +105,7 @@ static gboolean qio_task_thread_result(gpointer opaque)
 static gpointer qio_task_thread_worker(gpointer opaque)
 {
 struct QIOTaskThreadData *data = opaque;
+GSource *idle;
 
 trace_qio_task_thread_run(data->task);
 data->worker(data->task, data->opaque);
@@ -110,7 +116,11 @@ static gpointer qio_task_thread_worker(gpointer opaque)
  * the worker results
  */
 trace_qio_task_thread_exit(data->task);
-g_idle_add(qio_task_thread_result, data);
+
+idle = g_idle_source_new();
+g_source_set_callback(idle, qio_task_thread_result, data, NULL);
+g_source_attach(idle, data->context);
+
 return NULL;
 }
 
@@ -118,15 +128,21 @@ static gpointer qio_task_thread_worker(gpointer opaque)
 void qio_task_run_in_thread(QIOTask *task,
 QIOTaskWorker worker,
 gpointer opaque,
-

[Qemu-block] [PULL 3/6] qio: store gsources for net listeners

2018-03-07 Thread Daniel P . Berrangé

From: Peter Xu 

Originally we were storing the GSource tag IDs.  That will not be enough
if we are going to support a non-default gcontext for QIO code.  Switch to
storing GSource pointers, with no functional change.  For now we still
always pass in NULL, which means the default gcontext.

Signed-off-by: Peter Xu 
Signed-off-by: Daniel P. Berrangé 
---
 include/io/net-listener.h | 22 --
 io/net-listener.c | 58 +--
 2 files changed, 56 insertions(+), 24 deletions(-)

diff --git a/include/io/net-listener.h b/include/io/net-listener.h
index 56d6da7a76..8081ac58a2 100644
--- a/include/io/net-listener.h
+++ b/include/io/net-listener.h
@@ -53,7 +53,7 @@ struct QIONetListener {
 
 char *name;
 QIOChannelSocket **sioc;
-gulong *io_tag;
+GSource **io_source;
 size_t nsioc;
 
 bool connected;
@@ -120,17 +120,35 @@ void qio_net_listener_add(QIONetListener *listener,
   QIOChannelSocket *sioc);
 
 /**
- * qio_net_listener_set_client_func:
+ * qio_net_listener_set_client_func_full:
  * @listener: the network listener object
  * @func: the callback function
  * @data: opaque data to pass to @func
  * @notify: callback to free @data
+ * @context: the context that the sources will be bound to.  If %NULL,
+ *   the default context will be used.
  *
  * Register @func to be invoked whenever a new client
  * connects to the listener. @func will be invoked
  * passing in the QIOChannelSocket instance for the
  * client.
  */
+void qio_net_listener_set_client_func_full(QIONetListener *listener,
+   QIONetListenerClientFunc func,
+   gpointer data,
+   GDestroyNotify notify,
+   GMainContext *context);
+
+/**
+ * qio_net_listener_set_client_func:
+ * @listener: the network listener object
+ * @func: the callback function
+ * @data: opaque data to pass to @func
+ * @notify: callback to free @data
+ *
+ * Wrapper of qio_net_listener_set_client_func_full(), only that the
+ * sources will always be bound to default main context.
+ */
 void qio_net_listener_set_client_func(QIONetListener *listener,
   QIONetListenerClientFunc func,
   gpointer data,
diff --git a/io/net-listener.c b/io/net-listener.c
index de38dfae99..555e8acaa4 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -118,29 +118,32 @@ void qio_net_listener_add(QIONetListener *listener,
 
 listener->sioc = g_renew(QIOChannelSocket *, listener->sioc,
  listener->nsioc + 1);
-listener->io_tag = g_renew(gulong, listener->io_tag, listener->nsioc + 1);
+listener->io_source = g_renew(typeof(listener->io_source[0]),
+  listener->io_source,
+  listener->nsioc + 1);
 listener->sioc[listener->nsioc] = sioc;
-listener->io_tag[listener->nsioc] = 0;
+listener->io_source[listener->nsioc] = NULL;
 
 object_ref(OBJECT(sioc));
 listener->connected = true;
 
 if (listener->io_func != NULL) {
 object_ref(OBJECT(listener));
-listener->io_tag[listener->nsioc] = qio_channel_add_watch(
+listener->io_source[listener->nsioc] = qio_channel_add_watch_source(
 QIO_CHANNEL(listener->sioc[listener->nsioc]), G_IO_IN,
 qio_net_listener_channel_func,
-listener, (GDestroyNotify)object_unref);
+listener, (GDestroyNotify)object_unref, NULL);
 }
 
 listener->nsioc++;
 }
 
 
-void qio_net_listener_set_client_func(QIONetListener *listener,
-  QIONetListenerClientFunc func,
-  gpointer data,
-  GDestroyNotify notify)
+void qio_net_listener_set_client_func_full(QIONetListener *listener,
+   QIONetListenerClientFunc func,
+   gpointer data,
+   GDestroyNotify notify,
+   GMainContext *context)
 {
 size_t i;
 
@@ -152,23 +155,32 @@ void qio_net_listener_set_client_func(QIONetListener *listener,
 listener->io_notify = notify;
 
 for (i = 0; i < listener->nsioc; i++) {
-if (listener->io_tag[i]) {
-g_source_remove(listener->io_tag[i]);
-listener->io_tag[i] = 0;
+if (listener->io_source[i]) {
+g_source_destroy(listener->io_source[i]);
+g_source_unref(listener->io_source[i]);
+listener->io_source[i] = NULL;
 }
 }
 
 if (listener->io_func != NULL) {
 for (i = 0; i < listener->nsioc; i++) {
 object_ref(OBJECT(listener));
-

[Qemu-block] [PULL 1/6] qio: rename qio_task_thread_result

2018-03-07 Thread Daniel P . Berrangé

From: Peter Xu 

It is strange that it was called gio_task_thread_result.  Rename it to
follow the naming rule of the file.

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
Signed-off-by: Daniel P. Berrangé 
---
 io/task.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/io/task.c b/io/task.c
index 3ce556017c..1a0a1c7185 100644
--- a/io/task.c
+++ b/io/task.c
@@ -80,7 +80,7 @@ struct QIOTaskThreadData {
 };
 
 
-static gboolean gio_task_thread_result(gpointer opaque)
+static gboolean qio_task_thread_result(gpointer opaque)
 {
 struct QIOTaskThreadData *data = opaque;
 
@@ -110,7 +110,7 @@ static gpointer qio_task_thread_worker(gpointer opaque)
  * the worker results
  */
 trace_qio_task_thread_exit(data->task);
-g_idle_add(gio_task_thread_result, data);
+g_idle_add(qio_task_thread_result, data);
 return NULL;
 }
 
-- 
2.14.3

[Qemu-block] [PULL 2/6] qio: introduce qio_channel_add_watch_{full|source}

2018-03-07 Thread Daniel P . Berrangé

From: Peter Xu 

First, introduce an internal qio_channel_add_watch_full(), which
extends qio_channel_add_watch() so that a context can be specified.

Then add a new API wrapper qio_channel_add_watch_source() that returns a
GSource pointer rather than a tag ID.

Note that the _source() call keeps a reference to the GSource, so
callers need to unref it explicitly when finished using the GSource.

Signed-off-by: Peter Xu 
Signed-off-by: Daniel P. Berrangé 
---
 include/io/channel.h | 44 
 io/channel.c | 40 ++--
 2 files changed, 78 insertions(+), 6 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 3995e243a3..e8cdadb0b0 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -648,6 +648,50 @@ guint qio_channel_add_watch(QIOChannel *ioc,
 gpointer user_data,
 GDestroyNotify notify);
 
+/**
+ * qio_channel_add_watch_full:
+ * @ioc: the channel object
+ * @condition: the I/O condition to monitor
+ * @func: callback to invoke when the source becomes ready
+ * @user_data: opaque data to pass to @func
+ * @notify: callback to free @user_data
+ * @context: the context to run the watch source
+ *
+ * Similar as qio_channel_add_watch(), but allows to specify context
+ * to run the watch source.
+ *
+ * Returns: the source ID
+ */
+guint qio_channel_add_watch_full(QIOChannel *ioc,
+ GIOCondition condition,
+ QIOChannelFunc func,
+ gpointer user_data,
+ GDestroyNotify notify,
+ GMainContext *context);
+
+/**
+ * qio_channel_add_watch_source:
+ * @ioc: the channel object
+ * @condition: the I/O condition to monitor
+ * @func: callback to invoke when the source becomes ready
+ * @user_data: opaque data to pass to @func
+ * @notify: callback to free @user_data
+ * @context: gcontext to bind the source to
+ *
+ * Similar as qio_channel_add_watch(), but allows to specify context
+ * to run the watch source, meanwhile return the GSource object
+ * instead of tag ID, with the GSource referenced already.
+ *
+ * Note: callers is responsible to unref the source when not needed.
+ *
+ * Returns: the source pointer
+ */
+GSource *qio_channel_add_watch_source(QIOChannel *ioc,
+  GIOCondition condition,
+  QIOChannelFunc func,
+  gpointer user_data,
+  GDestroyNotify notify,
+  GMainContext *context);
 
 /**
  * qio_channel_attach_aio_context:
diff --git a/io/channel.c b/io/channel.c
index ec4b86de7c..8dd0684f5d 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -299,11 +299,12 @@ void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
 klass->io_set_aio_fd_handler(ioc, ctx, io_read, io_write, opaque);
 }
 
-guint qio_channel_add_watch(QIOChannel *ioc,
-GIOCondition condition,
-QIOChannelFunc func,
-gpointer user_data,
-GDestroyNotify notify)
+guint qio_channel_add_watch_full(QIOChannel *ioc,
+ GIOCondition condition,
+ QIOChannelFunc func,
+ gpointer user_data,
+ GDestroyNotify notify,
+ GMainContext *context)
 {
 GSource *source;
 guint id;
@@ -312,12 +313,39 @@ guint qio_channel_add_watch(QIOChannel *ioc,
 
 g_source_set_callback(source, (GSourceFunc)func, user_data, notify);
 
-id = g_source_attach(source, NULL);
+id = g_source_attach(source, context);
 g_source_unref(source);
 
 return id;
 }
 
+guint qio_channel_add_watch(QIOChannel *ioc,
+GIOCondition condition,
+QIOChannelFunc func,
+gpointer user_data,
+GDestroyNotify notify)
+{
+return qio_channel_add_watch_full(ioc, condition, func,
+  user_data, notify, NULL);
+}
+
+GSource *qio_channel_add_watch_source(QIOChannel *ioc,
+  GIOCondition condition,
+  QIOChannelFunc func,
+  gpointer user_data,
+  GDestroyNotify notify,
+  GMainContext *context)
+{
+GSource *source;
+guint id;
+
+id = qio_channel_add_watch_full(ioc, condition, func,
+user_data, notify, context);
+source = g_main_context_find_source_by_id(context, id);
+

Re: [Qemu-block] [PATCH v1 1/1] iotests: bypass s390x for case 200

2018-03-07 Thread QingFeng Hao




On 2018/3/6 15:56, Christian Borntraeger wrote:

Nack. This will be fixed by

s390/ipl: only print boot menu error if -boot menu=on was specified

You are right. After I applied that patch, the test case passes.
Please ignore this patch. Thanks


On 03/06/2018 08:54 AM, QingFeng Hao wrote:

In s390x, the case 200 failed as:
  === Starting QEMU VM ===

+QEMU_PROG: boot menu is not supported for this device type.
  {"return": {}}

  === Sending stream/cancel, checking for SIGSEGV only ===
Failures: 200
Failed 1 of 1 tests

It was caused by the command which isn't supported by s390x now:
qemu-system-s390x -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 -object 
iothread,id=iothread0 -device 
virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0,iothread=iothread0 -drive 
file=.../scratch/test.img,media=disk,if=none,cache=writeback,id=drive_sysdisk,format=qcow2
 -device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0 
-nographic

Signed-off-by: QingFeng Hao 
---
  tests/qemu-iotests/200 | 4 
  1 file changed, 4 insertions(+)

diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
index ddbdedc476..7e53bd7774 100755
--- a/tests/qemu-iotests/200
+++ b/tests/qemu-iotests/200
@@ -45,6 +45,10 @@ _supported_fmt qcow2 qed
  _supported_proto file
  _supported_os Linux

+if [ "$QEMU_DEFAULT_MACHINE" != "pc" ]; then
+_notrun "Requires a PC machine"
+fi
+
  BACKING_IMG="${TEST_DIR}/backing.img"
  TEST_IMG="${TEST_DIR}/test.img"



--
Regards
QingFeng Hao

Re: [Qemu-block] [PATCH 2/2] iotests: add 208 nbd-server + blockdev-snapshot-sync test case

2018-03-07 Thread Stefan Hajnoczi

On Tue, Mar 6, 2018 at 11:25 PM, Stefano Panella  wrote:
> I have applied this patch and when I run the following qmp commands I do
> not see the crash anymore, but there is still something wrong because only
> /root/a is opened from qemu. It looks like nbd-server-stop is also getting
> rid of the nodes added with blockdev-snapshot-sync, therefore it is then not
> possible to do blockdev-del on /root/d because node-name is not found

Nodes are reference counted.  If nothing holds a refcount then the
node is freed.

The blockdev-add command holds a reference to the node.  The node will
stay alive until blockdev-del, which releases that reference.

blockdev-snapshot-sync does not hold a reference.  Therefore snapshot
nodes are freed once nothing is using them anymore.  When the snapshot
node is created, the users of the parent node are updated to point to
the snapshot node instead.  This is why the NBD server switches to the
snapshot node after blockdev-snapshot-sync.

This is why the snapshot nodes disappear after the NBD server is
stopped while /root/a stays alive.
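
To make the lifetime rules above concrete, here is a minimal standalone sketch in C. It is a toy model, not QEMU code: struct toy_node and the toy_* helpers are hypothetical names invented for illustration, and the real block layer tracks parents and children in far more detail. The sketch only assumes what is described above: blockdev-add keeps a monitor reference, blockdev-snapshot-sync does not, and users of the parent node are re-pointed to the snapshot node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block node; this is NOT QEMU's BlockDriverState. */
struct toy_node {
    const char *name;
    int refcnt;
    bool freed;
    struct toy_node *backing; /* a snapshot node references its backing node */
};

void toy_ref(struct toy_node *n)
{
    assert(!n->freed);
    n->refcnt++;
}

void toy_unref(struct toy_node *n)
{
    assert(!n->freed && n->refcnt > 0);
    if (--n->refcnt == 0) {
        n->freed = true;
        if (n->backing) {
            /* Freeing an overlay drops its reference to the backing node. */
            toy_unref(n->backing);
        }
    }
}

/* Models blockdev-add: the monitor takes a reference and keeps it. */
void toy_blockdev_add(struct toy_node *n)
{
    toy_ref(n);
}

/*
 * Models blockdev-snapshot-sync: the user of the base node is re-pointed
 * to the overlay. The monitor takes no reference on the overlay.
 */
void toy_snapshot_sync(struct toy_node **user, struct toy_node *overlay)
{
    struct toy_node *base = *user;

    overlay->backing = base;
    toy_ref(base);     /* overlay -> backing reference */
    toy_ref(overlay);  /* the user now holds the overlay */
    *user = overlay;
    toy_unref(base);   /* the user drops its direct reference to the base */
}
```

In this model, a node added with toy_blockdev_add() and then exported ends up with two references. After toy_snapshot_sync() the export holds only the overlay, so dropping the export's reference (the nbd-server-stop step) frees the overlay while the base node keeps its monitor reference.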

I'm not sure if the current blockdev-snapshot-sync behavior is useful.
Perhaps the presence of the "snapshot-node-name" argument should cause
the snapshot node to be treated as monitor-owned, just like
blockdev-add.  This would introduce leaks for existing QMP clients
though, so it may be necessary to add yet another argument for this
behavior.

Anyway, I hope this explains the current behavior.  I don't see a
problem with it, but it's something the API users need to be aware of.

If it is a problem for your use case, please explain what you are trying to do.

Stefan

Re: [Qemu-block] [PATCH 0/2] block/ssh: Implement .bdrv_refresh_filename()

2018-03-07 Thread Kevin Wolf

On 06.03.2018 at 22:51, John Snow wrote:
> On 02/05/2018 03:22 PM, Max Reitz wrote:
> > This series implements .bdrv_refresh_filename() for the ssh block
> > driver, along with an appropriate .bdrv_dirname() so we don't chop off
> > query strings for backing files with relative filenames.
> > 
> > This series depends on my “block: Fix some filename generation issues”
> > series and on Pino's “ssh: switch from libssh2 to libssh” patch.
> > 
> > Based-on: 20180205151835.20812-1-mre...@redhat.com
> > Based-on: 20180118164439.2120-1-ptosc...@redhat.com
> > 
> > 
> > Max Reitz (2):
> >   block/ssh: Implement .bdrv_refresh_filename()
> >   block/ssh: Implement .bdrv_dirname()
> > 
> >  block/ssh.c | 72 +++--
> >  1 file changed, 65 insertions(+), 7 deletions(-)
> 
> Did this one rot on the vine?
> 
> >1 month old.

The Based-on tags are the problem, in particular the first one. But yes,
we could possibly do more to review the dependencies...

Kevin

Re: [Qemu-block] [Qemu-devel] [PATCH] hw: Do not include "sysemu/block-backend.h" if it is not necessary

2018-03-07 Thread Thomas Huth

On 15.02.2018 09:55, Thomas Huth wrote:
> After reviewing a patch from Philippe that removes block-backend.h
> from hw/lm32/milkymist.c, I noticed that this header is included
> unnecessarily in a lot of other files, too. Remove those unneeded
> includes to speed up the compilation process a little bit.
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/arm/highbank.c  | 1 -
>  hw/arm/msf2-soc.c  | 1 -
>  hw/arm/realview.c  | 1 -
>  hw/arm/tosa.c  | 1 -
>  hw/i386/pc.c   | 2 --
>  hw/i386/pc_piix.c  | 1 -
>  hw/ide/ahci-allwinner.c| 1 -
>  hw/ide/cmd646.c| 1 -
>  hw/ide/ich.c   | 1 -
>  hw/ide/isa.c   | 1 -
>  hw/ide/microdrive.c| 1 -
>  hw/ide/mmio.c  | 1 -
>  hw/mips/mips_fulong2e.c| 1 -
>  hw/mips/mips_jazz.c| 1 -
>  hw/ppc/mac_newworld.c  | 1 -
>  hw/ppc/mac_oldworld.c  | 1 -
>  hw/ppc/prep.c  | 1 -
>  hw/scsi/mptendian.c| 1 -
>  hw/sd/core.c   | 1 -
>  hw/sparc/sun4m.c   | 1 -
>  hw/tricore/tricore_testboard.c | 2 --
>  21 files changed, 23 deletions(-)
> 
> diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
> index 287392b..1742cf6 100644
> --- a/hw/arm/highbank.c
> +++ b/hw/arm/highbank.c
> @@ -27,7 +27,6 @@
>  #include "sysemu/kvm.h"
>  #include "sysemu/sysemu.h"
>  #include "hw/boards.h"
> -#include "sysemu/block-backend.h"
>  #include "exec/address-spaces.h"
>  #include "qemu/error-report.h"
>  #include "hw/char/pl011.h"
> diff --git a/hw/arm/msf2-soc.c b/hw/arm/msf2-soc.c
> index a8ec2cd..f68df56 100644
> --- a/hw/arm/msf2-soc.c
> +++ b/hw/arm/msf2-soc.c
> @@ -29,7 +29,6 @@
>  #include "exec/address-spaces.h"
>  #include "hw/char/serial.h"
>  #include "hw/boards.h"
> -#include "sysemu/block-backend.h"
>  #include "qemu/cutils.h"
>  #include "hw/arm/msf2-soc.h"
>  #include "hw/misc/unimp.h"
> diff --git a/hw/arm/realview.c b/hw/arm/realview.c
> index 87cd1e5..2139a62 100644
> --- a/hw/arm/realview.c
> +++ b/hw/arm/realview.c
> @@ -20,7 +20,6 @@
>  #include "sysemu/sysemu.h"
>  #include "hw/boards.h"
>  #include "hw/i2c/i2c.h"
> -#include "sysemu/block-backend.h"
>  #include "exec/address-spaces.h"
>  #include "qemu/error-report.h"
>  #include "hw/char/pl011.h"
> diff --git a/hw/arm/tosa.c b/hw/arm/tosa.c
> index a55b1a3..7a925fa 100644
> --- a/hw/arm/tosa.c
> +++ b/hw/arm/tosa.c
> @@ -22,7 +22,6 @@
>  #include "hw/boards.h"
>  #include "hw/i2c/i2c.h"
>  #include "hw/ssi/ssi.h"
> -#include "sysemu/block-backend.h"
>  #include "hw/sysbus.h"
>  #include "exec/address-spaces.h"
>  #include "sysemu/sysemu.h"
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 55e69d6..7670b45 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -50,8 +50,6 @@
>  #include "sysemu/qtest.h"
>  #include "kvm_i386.h"
>  #include "hw/xen/xen.h"
> -#include "sysemu/block-backend.h"
> -#include "hw/block/block.h"
>  #include "ui/qemu-spice.h"
>  #include "exec/memory.h"
>  #include "exec/address-spaces.h"
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 456dc9e..527c922 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -40,7 +40,6 @@
>  #include "sysemu/sysemu.h"
>  #include "hw/sysbus.h"
>  #include "sysemu/arch_init.h"
> -#include "sysemu/block-backend.h"
>  #include "hw/i2c/smbus.h"
>  #include "hw/xen/xen.h"
>  #include "exec/memory.h"
> diff --git a/hw/ide/ahci-allwinner.c b/hw/ide/ahci-allwinner.c
> index c3f1604..5397483 100644
> --- a/hw/ide/ahci-allwinner.c
> +++ b/hw/ide/ahci-allwinner.c
> @@ -18,7 +18,6 @@
>  #include "qemu/osdep.h"
>  #include "hw/hw.h"
>  #include "qemu/error-report.h"
> -#include "sysemu/block-backend.h"
>  #include "sysemu/dma.h"
>  #include "hw/ide/internal.h"
>  #include "hw/ide/ahci_internal.h"
> diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
> index 65aff51..6bb92d7 100644
> --- a/hw/ide/cmd646.c
> +++ b/hw/ide/cmd646.c
> @@ -26,7 +26,6 @@
>  #include "hw/hw.h"
>  #include "hw/pci/pci.h"
>  #include "hw/isa/isa.h"
> -#include "sysemu/block-backend.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/dma.h"
>  
> diff --git a/hw/ide/ich.c b/hw/ide/ich.c
> index c01b24e..134478e 100644
> --- a/hw/ide/ich.c
> +++ b/hw/ide/ich.c
> @@ -65,7 +65,6 @@
>  #include "hw/pci/msi.h"
>  #include "hw/pci/pci.h"
>  #include "hw/isa/isa.h"
> -#include "sysemu/block-backend.h"
>  #include "sysemu/dma.h"
>  #include "hw/ide/pci.h"
>  #include "hw/ide/ahci_internal.h"
> diff --git a/hw/ide/isa.c b/hw/ide/isa.c
> index 9fb24fc..028bd61 100644
> --- a/hw/ide/isa.c
> +++ b/hw/ide/isa.c
> @@ -25,7 +25,6 @@
>  #include "qemu/osdep.h"
>  #include "hw/hw.h"
>  #include "hw/isa/isa.h"
> -#include "sysemu/block-backend.h"
>  #include "sysemu/dma.h"
>  
>  #include "hw/ide/internal.h"
> diff --git a/hw/ide/microdrive.c b/hw/ide/microdrive.c
> index 58e4f52..34bb98d 100644
> --- a/hw/ide/microdrive.c
> +++ b/hw/ide/microdrive.c
>

Re: [Qemu-block] block migration and MAX_IN_FLIGHT_IO

2018-03-07 Thread Stefan Hajnoczi

On Wed, Mar 7, 2018 at 7:55 AM, Peter Lieven  wrote:
> On 06.03.2018 at 17:35, Peter Lieven wrote:
>> On 06.03.2018 at 17:07, Stefan Hajnoczi wrote:
>>> On Mon, Mar 05, 2018 at 02:52:16PM +, Dr. David Alan Gilbert wrote:
>>>> * Peter Lieven (p...@kamp.de) wrote:
>>>>> On 05.03.2018 at 12:45, Stefan Hajnoczi wrote:
>>>>>> On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote:
>>>>>>> I stumbled across the MAX_INFLIGHT_IO field that was introduced in 2015
>>>>>>> and was curious what was the reason
>>>>>>> to choose 512MB as readahead? The question is that I found that the
>>>>>>> source VM gets very unresponsive I/O wise
>>>>>>> while the initial 512MB are read and furthermore seems to stay
>>>>>>> unresponsive if we choose a high migration speed
>>>>>>> and have a fast storage on the destination VM.
>>>>>>>
>>>>>>> In our environment I modified this value to 16MB which seems to work
>>>>>>> much smoother. I wonder if we should make
>>>>>>> this a user configurable value or define a different rate limit for the
>>>>>>> block transfer in bulk stage at least?
>>>>>> I don't know if benchmarks were run when choosing the value.  From the
>>>>>> commit description it sounds like the main purpose was to limit the
>>>>>> amount of memory that can be consumed.
>>>>>>
>>>>>> 16 MB also fulfills that criterion :), but why is the source VM more
>>>>>> responsive with a lower value?
>>>>>>
>>>>>> Perhaps the issue is queue depth on the storage device - the block
>>>>>> migration code enqueues up to 512 MB worth of reads, and guest I/O has
>>>>>> to wait?
>>>>> That is my guess. Especially if the destination storage is faster we
>>>>> basically always have
>>>>> 512 I/Os in flight on the source storage.
>>>>>
>>>>> Does anyone mind if we reduce that value to 16MB or do we need a better
>>>>> mechanism?
>>>> We've got migration-parameters these days; you could connect it to one
>>>> of those fairly easily I think.
>>>> Try: grep -i 'cpu[-_]throttle[-_]initial'  for an example of one that's
>>>> already there.
>>>> Then you can set it to whatever you like.
>>> It would be nice to solve the performance problem without adding a
>>> tuneable.
>>>
>>> On the other hand, QEMU has no idea what the queue depth of the device
>>> is.  Therefore it cannot prioritize guest I/O over block migration I/O.
>>>
>>> 512 parallel requests is much too high.  Most parallel I/O benchmarking
>>> is done at 32-64 queue depth.
>>>
>>> I think that 16 parallel requests is a reasonable maximum number for a
>>> background job.
>>>
>>> We need to be clear though that the purpose of this change is unrelated
>>> to the original 512 MB memory footprint goal.  It just happens to touch
>>> the same constant but the goal is now to submit at most 16 I/O requests
>>> in parallel to avoid monopolizing the I/O device.
>> I think we should really look at this. The variables that control if we stay 
>> in the while loop or not are incremented and decremented
>> at the following places:
>>
>> mig_save_device_dirty:
>> mig_save_device_bulk:
>> block_mig_state.submitted++;
>>
>> blk_mig_read_cb:
>> block_mig_state.submitted--;
>> block_mig_state.read_done++;
>>
>> flush_blks:
>> block_mig_state.read_done--;
>>
>> The condition of the while loop is:
>> (block_mig_state.submitted +
>> block_mig_state.read_done) * BLOCK_SIZE <
>>qemu_file_get_rate_limit(f) &&
>>(block_mig_state.submitted +
>> block_mig_state.read_done) <
>>MAX_INFLIGHT_IO)
>>
>> At first I wonder if we ever reach the rate-limit because we put the read 
>> buffers onto f AFTER we exit the while loop?
>>
>> And even if we reach the limit we constantly maintain 512 I/Os in parallel 
>> because we immediately decrement read_done
>> when we put the buffers to f in flush_blks. In the next iteration of the 
>> while loop we then read again until we have 512 in-flight I/Os.
>>
>> And shouldn't we have a time limit to limit the time we stay in the while 
>> loop? I think we artificially delay sending data to f?
>
> Thinking about it for a while I would propose the following:
>
> a) rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS
> b) add MAX_PARALLEL_IO with a value of 16
> c) compare qemu_file_get_rate_limit only with block_mig_state.read_done
>
> This would yield the following condition for the while loop:
>
> (block_mig_state.read_done * BLOCK_SIZE < qemu_file_get_rate_limit(f) &&
>  (block_mig_state.submitted + block_mig_state.read_done) < MAX_IO_BUFFERS &&
>  block_mig_state.submitted < MAX_PARALLEL_IO)
>
> Does that sound like a plan?

That sounds good to me.
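
For reference, the proposed loop condition can be written as a standalone predicate. This is only a sketch under stated assumptions: the SKETCH_* constants and struct sketch_mig_state are hypothetical names and values chosen for illustration (only the limit of 16 parallel requests comes from the proposal), and the rate limit is passed in as a plain number instead of calling qemu_file_get_rate_limit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values; assumptions, not copied from migration/block.c. */
#define SKETCH_BLOCK_SIZE      (1 << 20) /* bytes per migrated chunk */
#define SKETCH_MAX_IO_BUFFERS  512       /* cap on buffers held in memory */
#define SKETCH_MAX_PARALLEL_IO 16        /* cap on reads in flight */

struct sketch_mig_state {
    int64_t submitted; /* reads issued but not yet completed */
    int64_t read_done; /* reads completed but not yet flushed to the stream */
};

/* Returns true while the loop may submit another read request. */
bool sketch_may_submit(const struct sketch_mig_state *s,
                       int64_t rate_limit_bytes)
{
    assert(s->submitted >= 0 && s->read_done >= 0);

    /* (c) only completed reads count against the rate limit */
    return s->read_done * SKETCH_BLOCK_SIZE < rate_limit_bytes &&
           /* (a) total buffers held: in flight plus waiting to be sent */
           s->submitted + s->read_done < SKETCH_MAX_IO_BUFFERS &&
           /* (b) queue depth on the source storage */
           s->submitted < SKETCH_MAX_PARALLEL_IO;
}
```

Splitting the single limit into three terms keeps the memory bound (buffers) separate from the queue-depth bound (parallel requests), which are the two goals discussed in this thread.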

Stefan

[Qemu-block] block migration and dirty bitmap reset

2018-03-07 Thread Peter Lieven

Hi,

while looking at the code I wonder if the blk_aio_preadv and the 
bdrv_reset_dirty_bitmap order must
be swapped in mig_save_device_bulk:

qemu_mutex_lock_iothread();
aio_context_acquire(blk_get_aio_context(bmds->blk));
blk->aiocb = blk_aio_preadv(bb, cur_sector * BDRV_SECTOR_SIZE, &blk->qiov,
0, blk_mig_read_cb, blk);

bdrv_reset_dirty_bitmap(bmds->dirty_bitmap, cur_sector * BDRV_SECTOR_SIZE,
nr_sectors * BDRV_SECTOR_SIZE);
aio_context_release(blk_get_aio_context(bmds->blk));
qemu_mutex_unlock_iothread();

In mig_save_device_dirty we first reset the dirty bitmap and then read, which
sounds like a better idea. Maybe it doesn't matter while we hold the aioctx
and the iothread lock...
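
To illustrate the concern, here is a toy model in C of the two orderings for a single block. It is a sketch only: struct toy_disk and the toy_* functions are hypothetical, the "racing" guest write is injected by hand between the two steps, and whether such a write can actually interleave in QEMU while both locks are held is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* One block of a toy disk with its dirty bit; not a QEMU structure. */
struct toy_disk {
    int data;   /* current contents of the block */
    bool dirty; /* set whenever the guest writes the block */
};

void toy_guest_write(struct toy_disk *d, int value)
{
    d->data = value;
    d->dirty = true;
}

/*
 * Read first, reset afterwards. A write that lands in between is copied
 * neither now (stale data was read) nor later (its dirty bit is cleared).
 */
int toy_read_then_reset(struct toy_disk *d, int racing_value)
{
    int copied = d->data;             /* read */
    toy_guest_write(d, racing_value); /* guest write between the steps */
    d->dirty = false;                 /* reset discards the new dirty bit */
    return copied;
}

/*
 * Reset first, read afterwards. A write that lands in between sets the
 * dirty bit again, so the block is picked up by a later pass.
 */
int toy_reset_then_read(struct toy_disk *d, int racing_value)
{
    d->dirty = false;                 /* reset */
    toy_guest_write(d, racing_value); /* guest write between the steps */
    return d->data;                   /* read */
}
```

In the first ordering the model ends with new data on disk, old data copied, and a clean dirty bit, which is a lost update. In the second ordering the worst case is that the block is sent twice.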

Peter
