Re: [Qemu-block] [PATCH 16/21] block/commit: refactor commit to use job callbacks

2018-08-15 Thread Kevin Wolf
Am 15.08.2018 um 23:17 hat John Snow geschrieben:
> On 08/09/2018 11:12 AM, Kevin Wolf wrote:
> > Am 07.08.2018 um 06:33 hat John Snow geschrieben:
> >> Use the component callbacks; prepare, commit, abort, and clean.
> >>
> >> NB: prepare is only called when the job has not yet failed;
> >> and abort can be called after prepare.
> >>
> >> complete -> prepare -> abort -> clean
> >> complete -> abort -> clean
> > 
> > I always found this confusing. After converting all jobs to use the
> > infrastructure, do you think that this design is helpful? You seem to be
> > calling *_common function from commit and abort, with an almost empty
> > prepare, which looks like a hint that maybe we should reorganise things.
> > 
> 
> After rewriting things a bit, I think it would be nicer to call prepare
> regardless of circumstances. The callbacks will be nicer for it.
> 
> When I wrote it the first time, the thought process was something like:
> 
> "Well, we won't need to prepare anything if we've already failed, we
> just need to speed along to abort."
> 
> I wasn't considering so carefully the common cleanup case that needs to
> occur post-finalization which appears to actually happen in a good
> number of jobs.

Maybe let's see how things turn out when we actually make some more jobs
transactionable. For now, it seems that the *_common function can go
away at least for commit; and we didn't even try to properly split the
completion of the other two jobs.

> >> Signed-off-by: John Snow 
> >> ---
> >>  block/commit.c | 94 
> >> +-
> >>  1 file changed, 61 insertions(+), 33 deletions(-)
> >>
> >> diff --git a/block/commit.c b/block/commit.c
> >> index 1a4a56795f..6673a0544e 100644
> >> --- a/block/commit.c
> >> +++ b/block/commit.c
> >> @@ -35,7 +35,9 @@ typedef struct CommitBlockJob {
> >>  BlockJob common;
> >>  BlockDriverState *commit_top_bs;
> >>  BlockBackend *top;
> >> +BlockDriverState *top_bs;
> >>  BlockBackend *base;
> >> +BlockDriverState *base_bs;
> >>  BlockdevOnError on_error;
> >>  int base_flags;
> >>  char *backing_file_str;
> > 
> > More state. :-(
> > 
> > Can we at least move the new fields to the end of the struct with a
> > comment that they are only valid after .exit()?
> > 
> 
> Sure ... admittedly this is just a crutch because I was not precisely
> sure exactly when these might change out from underneath me. If there
> are some clever ways to avoid needing the state, feel free to suggest them.

I did, in the reply to my own mail. Everything that would need the state
can actually live in .abort, so it can be local.

> >> +}
> >> +
> >> +static void commit_commit(Job *job)
> >> +{
> >> +commit_exit_common(job);
> >> +}
> > 
> > (I think I've answered this question now, by effectively proposing to do
> > away with .commit, but I'll leave it there anyway.)
> > 
> > Is there any reason for the split between .prepare and .commit as it is
> > or is this mostly arbitrary? Probably the latter because commit isn't
> > even transactionable?
> > 
> 
> Just historical, yeah -- we only had commit and abort for a while, and
> prepare didn't join the party until we wanted finalize and it became
> apparent we needed a way to check the return code and still be able to
> fail the transaction in time to be able to do a final commit or still
> abort very, very late in the process.
> 
> Probably there's no real meaningful reason that prepare and commit need
> to be separate callbacks.
> 
> It is possible that:
> 
> prepare --> [abort] --> clean
> 
> would be entirely sufficient for all current cases.

I think jobs that actually support transactions (i.e. backup currently)
do in fact need commit. The other ones might not.

Kevin



Re: [Qemu-block] [PATCH 04/21] block/commit: utilize job_exit shim

2018-08-15 Thread Kevin Wolf
Am 15.08.2018 um 22:52 hat John Snow geschrieben:
> On 08/08/2018 12:29 PM, Kevin Wolf wrote:
> > Am 07.08.2018 um 06:33 hat John Snow geschrieben:
> >> Change the manual deferment to commit_complete into the implicit
> >> callback to job_exit.
> >>
> >> Signed-off-by: John Snow 
> > 
> > There is one tricky thing in this patch that the commit message could be
> > a bit more explicit about, which is moving job_completed() to a later
> > point.
> > 
> > This is the code that happens between the old call of job_completed()
> > and the new one:
> > 
> > /* If bdrv_drop_intermediate() didn't already do that, remove the commit
> >  * filter driver from the backing chain. Do this as the final step so 
> > that
> >  * the 'consistent read' permission can be granted.  */
> > if (remove_commit_top_bs) {
> > bdrv_child_try_set_perm(commit_top_bs->backing, 0, BLK_PERM_ALL,
> > &error_abort);
> > bdrv_replace_node(commit_top_bs, backing_bs(commit_top_bs),
> >   &error_abort);
> > }
> > 
> > bdrv_unref(commit_top_bs);
> > bdrv_unref(top);
> > 
> > As the comment states, bdrv_replace_node() requires that the permission
> > restrictions that the commit job made are already lifted. The most
> > important part is done by the explicit block_job_remove_all_bdrv() call
> > right before this hunk. It still leaves bjob->blk around, which could
> > have implications, but luckily we didn't take any permissions for that
> > one:
> > 
> > s = block_job_create(job_id, &commit_job_driver, NULL, bs, 0, 
> > BLK_PERM_ALL,
> >  speed, JOB_DEFAULT, NULL, NULL, errp);
> > 
> > So I think we got everything out of the way and bdrv_replace_node() can
> > do what it wants to do.
> > 
> > Kevin
> > 
> 
> I suppose it will be up to the author of a job to be aware of any
> permissions they pick up at creation time that might have an effect on
> the cleanup they wish to do during completion time.

I'm really only talking about the commit job, the specific conversion
in this patch and the terse commit message.

Kevin



Re: [Qemu-block] [Qemu-devel] [RFC PATCH] rbd: Don't convert keypairs to JSON and back

2018-08-15 Thread Markus Armbruster
Max Reitz  writes:

> On 2018-08-15 10:12, Markus Armbruster wrote:
>> Max Reitz  writes:
>
> [...]
>
>>> To me personally the issue is that if you can specify a plain filename,
>>> bdrv_refresh_filename() should give you that plain filename back.  So
>>> rbd's implementation of that is lacking.  Well, it just doesn't exist.
>> 
>> I'm not even sure I understand what you're talking about.
>
> We have this bdrv_refresh_filename() thing which can do this:
>
> $ qemu-img info \
> "json:{'driver':'raw',
>'file':{'driver':'nbd','host':'localhost'}}"
> image: nbd://localhost:10809
> [...]
>
> So it can reconstruct a plain filename even if you specify it as options
> instead of just using a plain filename.
>
>
> Now here's my fault: I thought it might be necessary for a driver to
> implement that function (which rbd doesn't) so that you'd get a nice
> filename back (instead of just json:{} garbled things).   But you don't.
>  For protocol drivers, you'll just get the initial filename back.  (So
> my comment was just wrong.)
>
> So what I was thinking about was some case where you specified a normal
> plain filename and qemu would give you back json:{}.  (If rbd
> implemented bdrv_refresh_filename(), that wouldn't happen, because it
> would reconstruct a nice normal filename.)  It turns out, I don't think
> that can happen so easily.  You'll just get your filename back.
>
> Because here's what I'm thinking: If someone uses an option that is
> undocumented and starts with =, well, too bad.  If someone uses a normal
> filename, but gets back a json:{} filename...  Then they are free to use
> that anywhere, and their use of "=" is legitimized.
>
>
> Now that issue kind of reappears when you open an RBD volume, and then
> e.g. take a blockdev-snapshot.  Then your overlay has an overridden
> backing file (one that may be different from what its image header says)
> and its filename may well become a json:{} one (to arrange for the
> overridden backing file options).  Of course, if you opened the RBD
> volume with a filename with some of the options warranting
> =keyvalue-pairs, then your json:{} filename will contain those options
> under =keyvalue-pairs.
>
> So...  I'm not quite sure what I want to say?  I think there are edge
> cases where the user may not have put any weird option into qemu, but
> they do get a json:{} filename with =keyvalue-pairs out of it.  And I
> think users are free to use json:{} filenames qemu spews at them, and we
> can't blame them for it.

Makes sense.

More reason to deprecate key-value pairs in pseudo-filenames.

The only alternative would be to also provide them in QAPI
BlockdevOptionsRbd.  I find that as distasteful as ever, but if the
block maintainers decide we need it, I'll hold my nose.

 If so, and we are comfortable changing the output the way this patch does
 (technically altering ABI anyway), we might as well go all the way and
 filter it out completely.  That would be preferable to cleaning up the json
 output of the internal key/value pairs, IMO.
>>>
>>> Well, this filtering at least is done by my "Fix some filename
>>> generation issues" series.
>> 
>> Likewise.
>
> The series overhauls quite a bit of the bdrv_refresh_filename()
> infrastructure.  That function is also responsible for generating
> json:{} filenames.
>
> One thing it introduces is a BlockDriver field where a driver can
> specify which of the runtime options are actually important.  The rest
> is omitted from the generated json:{} filename.
>
> I may have taken the liberty not to include =keyvalue-pairs in RBD's
> "strong runtime options" list.

I see.

Permit me to digress a bit.

I understand one application for generating a json: filename is for
putting it into an image file header (say as a COW's backing image).

Having image file headers point to other images is not as simple as it
may look at first glance.  The basic case of image format plus plain
filename (not containing '/') is straightforward enough.  But if I make
the filename absolute (with a leading '/'), the image becomes less easy
to move to another machine.

Similarly, certain Ceph configuration bits may make no sense on another
machine, and putting them into an image file header may not be a good
idea.

Rather vague, I'm afraid.



Re: [Qemu-block] [Qemu-devel] [RFC PATCH] rbd: Don't convert keypairs to JSON and back

2018-08-15 Thread Markus Armbruster
Jeff Cody  writes:

> On Wed, Aug 15, 2018 at 10:12:12AM +0200, Markus Armbruster wrote:
>> Max Reitz  writes:
>> 
>> > On 2018-07-28 06:32, Jeff Cody wrote:
>> >> On Wed, Jul 25, 2018 at 05:01:44PM +0100, Daniel P. Berrangé wrote:
>> >>> On Wed, Jul 25, 2018 at 10:56:48AM -0500, Eric Blake wrote:
>>  On 07/25/2018 10:10 AM, Markus Armbruster wrote:
>> > qemu_rbd_parse_filename() builds a keypairs QList, converts it to 
>> > JSON, and
>> > stores the resulting QString in a QDict.
>> >
>> > qemu_rbd_co_create_opts() and qemu_rbd_open() get the QString from the
>> > QDict, pass it to qemu_rbd_set_keypairs(), which converts it back into
>> > a QList.
>> >
>> > Drop both conversions, store the QList instead.
>> >
>> > This affects output of qemu-img info.  Before this patch:
>> >
>> >  $ qemu-img info 
>> > rbd:rbd/testimg.raw:mon_host=192.168.15.180:rbd_cache=true:conf=/tmp/ceph.conf
>> >  [junk printed by Ceph library code...]
>> >  image: json:{"driver": "raw", "file": {"pool": "rbd", "image": 
>> > "testimg.raw", "conf": "/tmp/ceph.conf", "driver": "rbd", 
>> > "=keyvalue-pairs": "[\"mon_host\", \"192.168.15.180\", \"rbd_cache\", 
>> > \"true\"]"}}
>> >  [more output, not interesting here]
>> >
>> > After this patch, we get
>> >
>> >  image: json:{"driver": "raw", "file": {"pool": "rbd", "image": 
>> > "testimg.raw", "conf": "/tmp/ceph.conf", "driver": "rbd", 
>> > "=keyvalue-pairs": ["mon_host", "192.168.15.180", "rbd_cache", 
>> > "true"]}}
>> >
>> > The value of member "=keyvalue-pairs" changes from a string containing
>> > a JSON array to that JSON array.  That's an improvement of sorts.  
>> > However:
>> >
>> > * Should "=keyvalue-pairs" even be visible here?  It's supposed to be
>> >purely internal...
>> 
>>  I'd argue that since it is supposed to be internal (as evidenced by the
>>  leading '=' that does not name a normal variable), changing it doesn't 
>>  hurt
>>  stability. But yes, it would be nicer if we could filter it entirely so 
>>  that
>>  it does not appear in json: output, if it doesn't truly affect the 
>>  contents
>>  that the guest would see.
>> >>>
>> >>> If it appears in the json: output, then that means it could get written
>> >>> into qcow2 headers as a backing file name, which would make it ABI
>> >>> sensitive. This makes it even more important to filter it if it is 
>> >>> supposed
>> >>> to be internal only, with no ABI guarantee.
>> >>>
>> >> 
>> >> It's been present for a couple releases (counting 3.0); is it safe to
>> >> assume that, although it could be present in the qcow2 headers, that it 
>> >> will
>> >> not break anything by altering it or removing it?
>> >
>> > Did =keyvalue-pairs even work in json:{} filename?  If so, it will
>> > continue to work even after filtering it.  If not, then filtering it
>> > won't break existing files because they didn't work before either.
>> 
>> I'm afraid it does work:
>> 
>> $ gdb --args upstream-qemu -nodefaults -S -display vnc=:0 -readconfig 
>> test.cfg 'json:{"driver": "raw", "file": {"pool": "rbd", "image": 
>> "testimg.raw", "conf": "/tmp/ceph.conf", "driver": "rbd", "=keyvalue-pairs": 
>> "[\"mon_host\", \"192.168.15.180\", \"rbd_cache\", \"true\"]"}}'
>> GNU gdb (GDB) Fedora 8.1-25.fc28
>> [...]
>> (gdb) b qemu_rbd_open 
>> Breakpoint 1 at 0x845f83: file /work/armbru/qemu/block/rbd.c, line 660.
>> (gdb) r
>> Starting program: /home/armbru/bin/upstream-qemu -nodefaults -S -display 
>> vnc=:0 -readconfig test.cfg json:\{\"driver\":\ \"raw\",\ \"file\":\ 
>> \{\"pool\":\ \"rbd\",\ \"image\":\ \"testimg.raw\",\ \"conf\":\ 
>> \"/tmp/ceph.conf\",\ \"driver\":\ \"rbd\",\ \"=keyvalue-pairs\":\ 
>> \"\[\\\"mon_host\\\",\ \\\"192.168.15.180\\\",\ \\\"rbd_cache\\\",\ 
>> \\\"true\\\"\]\"\}\}
>> [...]
>> Thread 1 "upstream-qemu" hit Breakpoint 1, qemu_rbd_open 
>> (bs=0x56a5a970, 
>> options=0x56a5ec40, flags=24578, errp=0x7fffd370)
>> at /work/armbru/qemu/block/rbd.c:660
>> 660  {
>> [...]
>> (gdb) n
>> 661  BDRVRBDState *s = bs->opaque;
>> (gdb) 
>> 662  BlockdevOptionsRbd *opts = NULL;
>> (gdb) 
>> 665  Error *local_err = NULL;
>> (gdb) 
>> 669  keypairs = g_strdup(qdict_get_try_str(options, 
>> "=keyvalue-pairs"));
>> (gdb) 
>> 670  if (keypairs) {
>> (gdb) p keypairs 
>> $1 = 0x569e54c0 "[\"mon_host\", \"192.168.15.180\", \"rbd_cache\", 
>> \"true\"]"
>> 
>> It really, really, really should not work.
>> 
>> It doesn't work with anything that relies on QAPI to represent
>> configuration (such as QMP's blockdev-add), because BlockdevOptionsRbd
>> doesn't have it.
>> 
>> It works with -drive only with a pseudo-filename (more on that below),
>> even though -drive uses Qe

Re: [Qemu-block] [Qemu-devel] [PATCH for-3.1 v10 00/31] block: Fix some filename generation issues

2018-08-15 Thread Markus Armbruster
Max Reitz  writes:

> On 2018-08-15 05:43, no-re...@patchew.org wrote:
>> Hi,
>> 
>> This series seems to have some coding style problems. See output below for
>> more information:
>
> [...]
>
>> === OUTPUT BEGIN ===
>> Checking PATCH 1/31: block: Use bdrv_refresh_filename() to pull...
>> Checking PATCH 2/31: block: Use children list in bdrv_refresh_filename...
>> Checking PATCH 3/31: block: Skip implicit nodes for filename info...
>> Checking PATCH 4/31: block: Add BDS.auto_backing_file...
>> Checking PATCH 5/31: block: Respect backing bs in bdrv_refresh_filename...
>> ERROR: return is not a function, parentheses are not required
>> #46: FILE: block.c:5192:
>> +return (bs->auto_backing_file[0] != '\0');
>> 
>> total: 1 errors, 0 warnings, 136 lines checked
>
> Sure, but I'd hate you personally if you omitted them.

@@
expression E;
@@
-return (E);
+return E;

You're welcome!



Re: [Qemu-block] [PATCH 2/2] qemu-img: Add dd seek= option

2018-08-15 Thread Max Reitz
On 2018-08-16 04:57, Eric Blake wrote:
> On 08/15/2018 09:49 PM, Max Reitz wrote:
> 
 In my opinion, we do not want feature parity with dd.  What we do want
 is feature parity with convert.
>>>
>>> Well, convert is lacking a way to specify a subset of one file to move
>>> to a (possibly different) subset of the other.  I'm fine if we want to
>>> enhance convert to do the things that right now require a dd-alike
>>> interface (namely, limiting the copying to less than the full file, and
>>> choosing the offset at which to start [before this patch] or write to
>>> [with this patch]).
>>
>> Yes, I would want that.
>>
>>> If convert were more powerful, I'd be fine dropping 'qemu-img dd' after
>>> a proper deprecation period.
>>
>> Technically it has those features already, with the raw block driver's
>> offset and size parameters.
> 
> Perhaps so, but it will be an interesting exercise in rewriting the
> shorthand nbd://host:port/export into the proper longhand driver syntax.

Don't dare me! :-)

>>> Because of performance: qemu-nbd + Linux nbd device + real dd is one
>>> more layer of data copying (each write() from dd goes to kernel, then is
>>> sent to qemu-nbd in userspace as a socket message before being sent back
>>> to the kernel to actually write() to the final destination) compared to
>>> just doing it all in one process (write() lands in the final destination
>>> with no further user space bouncing).  And because the additional steps
>>> to set it up are awkward (see my other email where I rant about losing
>>> the better part of today to realizing that 'dd ...; qemu-nbd -d
>>> /dev/nbd1' loses data if you omit conv=fdatasync).
>>
>> I can see the sync problems, but is the performance really that much
>> worse?
> 
> When you don't have sparse file support, reading or writing large blocks
> of zeroes really is worse over /dev/nbd* than over a server/client pair
> that know how to do it efficiently.  But for non-sparse data, I don't
> know if a benchmark would be able to consistently note a difference
> (might be a fun benchmark for someone to try, but not high on my current
> to-do list).

Hm.  Yeah.  Well, for me, it remains that it would be better to have a
good way of exposing image contents to all of the rest of the system and
all of the nice tools there already are instead of re-implementing them
in qemu.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 2/2] qemu-img: Add dd seek= option

2018-08-15 Thread Eric Blake

On 08/15/2018 09:49 PM, Max Reitz wrote:


In my opinion, we do not want feature parity with dd.  What we do want
is feature parity with convert.


Well, convert is lacking a way to specify a subset of one file to move
to a (possibly different) subset of the other.  I'm fine if we want to
enhance convert to do the things that right now require a dd-alike
interface (namely, limiting the copying to less than the full file, and
choosing the offset at which to start [before this patch] or write to
[with this patch]).


Yes, I would want that.


If convert were more powerful, I'd be fine dropping 'qemu-img dd' after
a proper deprecation period.


Technically it has those features already, with the raw block driver's
offset and size parameters.


Perhaps so, but it will be an interesting exercise in rewriting the 
shorthand nbd://host:port/export into the proper longhand driver syntax.





Because of performance: qemu-nbd + Linux nbd device + real dd is one
more layer of data copying (each write() from dd goes to kernel, then is
sent to qemu-nbd in userspace as a socket message before being sent back
to the kernel to actually write() to the final destination) compared to
just doing it all in one process (write() lands in the final destination
with no further user space bouncing).  And because the additional steps
to set it up are awkward (see my other email where I rant about losing
the better part of today to realizing that 'dd ...; qemu-nbd -d
/dev/nbd1' loses data if you omit conv=fdatasync).


I can see the sync problems, but is the performance really that much worse?


When you don't have sparse file support, reading or writing large blocks 
of zeroes really is worse over /dev/nbd* than over a server/client pair 
that know how to do it efficiently.  But for non-sparse data, I don't 
know if a benchmark would be able to consistently note a difference 
(might be a fun benchmark for someone to try, but not high on my current 
to-do list).


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 2/2] qemu-img: Add dd seek= option

2018-08-15 Thread Max Reitz
On 2018-08-16 04:39, Eric Blake wrote:
> On 08/15/2018 09:20 PM, Max Reitz wrote:
>> On 2018-08-15 04:56, Eric Blake wrote:
>>> For feature parity with dd, we want to be able to specify
>>> the offset within the output file, just as we can specify
>>> the offset for the input (in particular, this makes copying
>>> a subset range of guest-visible bytes from one file to
>>> another much easier).
>>
>> In my opinion, we do not want feature parity with dd.  What we do want
>> is feature parity with convert.
> 
> Well, convert is lacking a way to specify a subset of one file to move
> to a (possibly different) subset of the other.  I'm fine if we want to
> enhance convert to do the things that right now require a dd-alike
> interface (namely, limiting the copying to less than the full file, and
> choosing the offset at which to start [before this patch] or write to
> [with this patch]).

Yes, I would want that.

> If convert were more powerful, I'd be fine dropping 'qemu-img dd' after
> a proper deprecation period.

Technically it has those features already, with the raw block driver's
offset and size parameters.

>>> The code style for 'qemu-img dd' was pretty hard to read;
>>> unfortunately this patch focuses only on adding the new
>>> feature in the existing style rather than trying to improve
>>> the overall flow, other than switching octal constants to
>>> hex.  Oh well.
>>
>> No, the real issue is that dd is still not implemented just as a
>> frontend to convert.  Which it should be.  I'm not sure dd was a very
>> good idea from the start, and now it should ideally be a frontend to
>> convert.
>>
>> (My full opinion on the matter: dd has a horrible interface.  I don't
>> quite see why we replicated that inside qemu-img.  Also, if you want to
>> use dd, why not use qemu-nbd + Linux nbd device + real dd?)
> 
> Because of performance: qemu-nbd + Linux nbd device + real dd is one
> more layer of data copying (each write() from dd goes to kernel, then is
> sent to qemu-nbd in userspace as a socket message before being sent back
> to the kernel to actually write() to the final destination) compared to
> just doing it all in one process (write() lands in the final destination
> with no further user space bouncing).  And because the additional steps
> to set it up are awkward (see my other email where I rant about losing
> the better part of today to realizing that 'dd ...; qemu-nbd -d
> /dev/nbd1' loses data if you omit conv=fdatasync).

I can see the sync problems, but is the performance really that much worse?

>> ((That gave me a good idea.  Actually, it's probably not such a good
>> idea, but I guess I'll do it in my spare time anyway.  A qemu-img fuse
>> might be nice which represents an image as a raw image at some mount
>> point.  Benefits over qemu-nbd: (1) You don't need root, (2) you don't
>> need to type modprobe nbd.))
> 
> So the kernel->userspace translation would be happening via the FUSE
> interface instead of the NBD interface.  Data still bounces around just
> as much, but it might be a fun project.  Does fuse behave well when
> serving exactly one file at the mountpoint, rather than the more typical
> file system rooted in a directory?  NBD at least has the benefit of
> claiming to be a block device all along, rather than complicating the
> user interface with POSIX file system rules (which you'll be bending,
> because you are serving exactly one file instead of a system).

Well, but I can just pretend my FUSE file is a block device, no?

Also, I just discovered something really interesting: FUSE allows you to
specify a single file as a mountpoint.

And you know what?  You can open the original file before you replace it
by the FUSE "filesystem".

So my fun interface is going to looks like this:

$ qemu-img fuse foo.qcow2

And then your foo.qcow2 is a raw image until the next "fusermount -u
foo.qcow2"!  Isn't that fun?

>>> Also, switch the test to use an offset of 0 instead of 1,
>>> to test skip= and seek= on their own; as it is, this is
>>> effectively quadrupling the test runtime, which starts
>>> to make this test borderline on whether it should still
>>> belong to './check -g quick'.  And I didn't bother to
>>> reindent the test shell code for the new nested loop.
>>
>> In my opinion, it should no longer belong to quick.  It takes 8 s on my
>> tmpfs.  My border is somewhere around 2 or 3; and I haven't yet decided
>> whether that's on tmpfs or SSD.
> 
> I took 4 iterations pre-patch, to 8 iterations after patch 1, to 32
> iterations with this patch; my observed times went from 1s to 2s to 7s
> on SSD ext4. Yeah, for v2, I'll drop it from quick.

Thanks!

>>> @@ -4574,7 +4592,14 @@ static int img_dd(int argc, char **argv)
>>>   size = dd.count * in.bsz;
>>>   }
>>>
>>> -    qemu_opt_set_number(opts, BLOCK_OPT_SIZE, size, &error_abort);
>>> +    if (dd.flags & C_SEEK && out.offset * out.bsz > INT64_MAX - size) {
>>
>> What about overflows in out.offset * out.bsz?
> 
> I've had 

Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] qemu-img: Add dd seek= option

2018-08-15 Thread Eric Blake

On 08/15/2018 09:39 PM, Eric Blake wrote:


(My full opinion on the matter: dd has a horrible interface.  I don't
quite see why we replicated that inside qemu-img.  Also, if you want to
use dd, why not use qemu-nbd + Linux nbd device + real dd?)


Because of performance: qemu-nbd + Linux nbd device + real dd is one 
more layer of data copying (each write() from dd goes to kernel, then is 
sent to qemu-nbd in userspace as a socket message before being sent back 
to the kernel to actually write() to the final destination) compared to 
just doing it all in one process (write() lands in the final destination 
with no further user space bouncing).  And because the additional steps 
to set it up are awkward (see my other email where I rant about losing 
the better part of today to realizing that 'dd ...; qemu-nbd -d 
/dev/nbd1' loses data if you omit conv=fdatasync).


Oh, and because the kernel NBD module still doesn't know how to do 
sparse writes or expose the location of holes. In the kernel source, see 
drivers/block/nbd.c and weep at how far it lags behind qemu-nbd features 
that at least 'qemu-img convert' can take advantage of, but aren't 
present via /dev/nbd*


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 2/2] qemu-img: Add dd seek= option

2018-08-15 Thread Eric Blake

On 08/15/2018 09:20 PM, Max Reitz wrote:

On 2018-08-15 04:56, Eric Blake wrote:

For feature parity with dd, we want to be able to specify
the offset within the output file, just as we can specify
the offset for the input (in particular, this makes copying
a subset range of guest-visible bytes from one file to
another much easier).


In my opinion, we do not want feature parity with dd.  What we do want
is feature parity with convert.


Well, convert is lacking a way to specify a subset of one file to move 
to a (possibly different) subset of the other.  I'm fine if we want to 
enhance convert to do the things that right now require a dd-alike 
interface (namely, limiting the copying to less than the full file, and 
choosing the offset at which to start [before this patch] or write to 
[with this patch]).


If convert were more powerful, I'd be fine dropping 'qemu-img dd' after 
a proper deprecation period.





The code style for 'qemu-img dd' was pretty hard to read;
unfortunately this patch focuses only on adding the new
feature in the existing style rather than trying to improve
the overall flow, other than switching octal constants to
hex.  Oh well.


No, the real issue is that dd is still not implemented just as a
frontend to convert.  Which it should be.  I'm not sure dd was a very
good idea from the start, and now it should ideally be a frontend to
convert.

(My full opinion on the matter: dd has a horrible interface.  I don't
quite see why we replicated that inside qemu-img.  Also, if you want to
use dd, why not use qemu-nbd + Linux nbd device + real dd?)


Because of performance: qemu-nbd + Linux nbd device + real dd is one 
more layer of data copying (each write() from dd goes to kernel, then is 
sent to qemu-nbd in userspace as a socket message before being sent back 
to the kernel to actually write() to the final destination) compared to 
just doing it all in one process (write() lands in the final destination 
with no further user space bouncing).  And because the additional steps 
to set it up are awkward (see my other email where I rant about losing 
the better part of today to realizing that 'dd ...; qemu-nbd -d 
/dev/nbd1' loses data if you omit conv=fdatasync).




((That gave me a good idea.  Actually, it's probably not such a good
idea, but I guess I'll do it in my spare time anyway.  A qemu-img fuse
might be nice which represents an image as a raw image at some mount
point.  Benefits over qemu-nbd: (1) You don't need root, (2) you don't
need to type modprobe nbd.))


So the kernel->userspace translation would be happening via the FUSE 
interface instead of the NBD interface.  Data still bounces around just 
as much, but it might be a fun project.  Does fuse behave well when 
serving exactly one file at the mountpoint, rather than the more typical 
file system rooted in a directory?  NBD at least has the benefit of 
claiming to be a block device all along, rather than complicating the 
user interface with POSIX file system rules (which you'll be bending, 
because you are serving exactly one file instead of a system).





Also, switch the test to use an offset of 0 instead of 1,
to test skip= and seek= on their own; as it is, this is
effectively quadrupling the test runtime, which starts
to make this test borderline on whether it should still
belong to './check -g quick'.  And I didn't bother to
reindent the test shell code for the new nested loop.


In my opinion, it should no longer belong to quick.  It takes 8 s on my
tmpfs.  My border is somewhere around 2 or 3; and I haven't yet decided
whether that's on tmpfs or SSD.


I took 4 iterations pre-patch, to 8 iterations after patch 1, to 32 
iterations with this patch; my observed times went from 1s to 2s to 7s 
on SSD ext4. Yeah, for v2, I'll drop it from quick.



@@ -4574,7 +4592,14 @@ static int img_dd(int argc, char **argv)
  size = dd.count * in.bsz;
  }

-qemu_opt_set_number(opts, BLOCK_OPT_SIZE, size, &error_abort);
+if (dd.flags & C_SEEK && out.offset * out.bsz > INT64_MAX - size) {


What about overflows in out.offset * out.bsz?


I've had enough of my eyes bleeding on all the code repeatedly scaling 
things. For v2, I'm strongly considering a cleanup patch that reads all 
input, then scales all values into bytes, and THEN performs any 
additional math in a single unit, just so the additions become easier to 
reason about.





+error_report("Seek too large for '%s'", out.filename);
+ret = -1;
+goto out;


Real dd doesn't seem to error out (it just reports an error).  I don't
know whether that makes any difference, though.


But where does the data get written if you can't actually seek that far 
into the file?




The test looks good to me.


Other than my creative indentation levels ;)



Max


+}
+seek = out.offset * out.bsz;
+
+qemu_opt_set_number(opts, BLOCK_OPT_SIZE, size + seek, &error_abort);
  end = size + in_pos;

  ret = bdrv_create(drv, out.filenam

Re: [Qemu-block] [PATCH 2/2] qemu-img: Add dd seek= option

2018-08-15 Thread Max Reitz
On 2018-08-15 04:56, Eric Blake wrote:
> For feature parity with dd, we want to be able to specify
> the offset within the output file, just as we can specify
> the offset for the input (in particular, this makes copying
> a subset range of guest-visible bytes from one file to
> another much easier).

In my opinion, we do not want feature parity with dd.  What we do want
is feature parity with convert.

> The code style for 'qemu-img dd' was pretty hard to read;
> unfortunately this patch focuses only on adding the new
> feature in the existing style rather than trying to improve
> the overall flow, other than switching octal constants to
> hex.  Oh well.

No, the real issue is that dd is still not implemented just as a
frontend to convert.  Which it should be.  I'm not sure dd was a very
good idea from the start, and now it should ideally be a frontend to
convert.

(My full opinion on the matter: dd has a horrible interface.  I don't
quite see why we replicated that inside qemu-img.  Also, if you want to
use dd, why not use qemu-nbd + Linux nbd device + real dd?)

((That gave me a good idea.  Actually, it's probably not such a good
idea, but I guess I'll do it in my spare time anyway.  A qemu-img fuse
might be nice which represents an image as a raw image at some mount
point.  Benefits over qemu-nbd: (1) You don't need root, (2) you don't
need to type modprobe nbd.))

> Also, switch the test to use an offset of 0 instead of 1,
> to test skip= and seek= on their own; as it is, this is
> effectively quadrupling the test runtime, which starts
> to make this test borderline on whether it should still
> belong to './check -g quick'.  And I didn't bother to
> reindent the test shell code for the new nested loop.

In my opinion, it should no longer belong to quick.  It takes 8 s on my
tmpfs.  My border is somewhere around 2 or 3; and I haven't yet decided
whether that's on tmpfs or SSD.

> Signed-off-by: Eric Blake 
> ---
>  qemu-img.c |  41 --
>  tests/qemu-iotests/160 |  12 +-
>  tests/qemu-iotests/160.out | 304 
> +++--
>  3 files changed, 336 insertions(+), 21 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index d72f0f0ec94..ee01a18f331 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c

[...]

> @@ -4574,7 +4592,14 @@ static int img_dd(int argc, char **argv)
>  size = dd.count * in.bsz;
>  }
> 
> -qemu_opt_set_number(opts, BLOCK_OPT_SIZE, size, &error_abort);
> +if (dd.flags & C_SEEK && out.offset * out.bsz > INT64_MAX - size) {

What about overflows in out.offset * out.bsz?

> +error_report("Seek too large for '%s'", out.filename);
> +ret = -1;
> +goto out;

Real dd doesn't seem to error out (it just reports an error).  I don't
know whether that makes any difference, though.

The test looks good to me.

Max

> +}
> +seek = out.offset * out.bsz;
> +
> +qemu_opt_set_number(opts, BLOCK_OPT_SIZE, size + seek, &error_abort);
>  end = size + in_pos;
> 
>  ret = bdrv_create(drv, out.filename, opts, &local_err);



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 1/2] qemu-img: Fix dd with skip= and count=

2018-08-15 Thread Max Reitz
On 2018-08-16 04:17, Eric Blake wrote:
> On 08/15/2018 09:03 PM, Max Reitz wrote:
> 
>>> @@ -4559,19 +4559,23 @@ static int img_dd(int argc, char **argv)
>>>   goto out;
>>>   }
>>>
>>> +    /* Overflow means the specified offset is beyond input image's
>>> size */
>>> +    if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
>>> +  size < in.bsz * in.offset)) {
>>> +    size = 0;
>>> +    error_report("%s: cannot skip to specified offset",
>>> in.filename);
>>
>> in_pos should be initialized as well (to "size", I suppose), or my gcc
>> will continue to complain. :-)
>>
> 
> Serves me right for compiling with -g instead of -O2 (gcc only does that
> warning on optimized builds, for some weird reason).  Will fix in v2.
> 
> 
>>> +++ b/tests/qemu-iotests/160
>>> @@ -44,6 +44,7 @@ _supported_os Linux
>>>   TEST_SKIP_BLOCKS="1 2 30 30K"
>>>
>>>   for skip in $TEST_SKIP_BLOCKS; do
>>> +  for count in '' 'count=1 '; do
>>
>> Ah, so this is why we indent everything by four spaces!  So you can
>> squeeze in three more block headers without having to re-indent
>> everything.  I finally see. O:-)
> 
> I'm seriously thinking of reindenting things in this and the next patch,
> rather than my initial quick-and-dirty "squeeze it in".  'git diff -w'
> is not that hard to use, after all.
> 
>>
>> (Not sure why you put a space after the 'count=1', though, but OK)
> 
> For this line:
> 
> +    echo "== Converting the image with dd with ${count}skip=$skip =="
> 
> so that when $count is empty, the .out file doesn't end up with a double
> space.  Okay, I do have some sense of output aesthetics, even if my
> re-indentation skills are lacking ;=)

Ah, right, I forgot you'd get the double space.  That's true.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 1/2] qemu-img: Fix dd with skip= and count=

2018-08-15 Thread Eric Blake

On 08/15/2018 09:03 PM, Max Reitz wrote:


@@ -4559,19 +4559,23 @@ static int img_dd(int argc, char **argv)
  goto out;
  }

+/* Overflow means the specified offset is beyond input image's size */
+if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
+  size < in.bsz * in.offset)) {
+size = 0;
+error_report("%s: cannot skip to specified offset", in.filename);


in_pos should be initialized as well (to "size", I suppose), or my gcc
will continue to complain. :-)



Serves me right for compiling with -g instead of -O2 (gcc only does that 
warning on optimized builds, for some weird reason).  Will fix in v2.




+++ b/tests/qemu-iotests/160
@@ -44,6 +44,7 @@ _supported_os Linux
  TEST_SKIP_BLOCKS="1 2 30 30K"

  for skip in $TEST_SKIP_BLOCKS; do
+  for count in '' 'count=1 '; do


Ah, so this is why we indent everything by four spaces!  So you can
squeeze in three more block headers without having to re-indent
everything.  I finally see. O:-)


I'm seriously thinking of reindenting things in this and the next patch, 
rather than my initial quick-and-dirty "squeeze it in".  'git diff -w' 
is not that hard to use, after all.




(Not sure why you put a space after the 'count=1', though, but OK)


For this line:

+echo "== Converting the image with dd with ${count}skip=$skip =="

so that when $count is empty, the .out file doesn't end up with a double 
space.  Okay, I do have some sense of output aesthetics, even if my 
re-indentation skills are lacking ;=)


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH 0/2] Improve qemu-img dd

2018-08-15 Thread Eric Blake

On 08/14/2018 09:56 PM, Eric Blake wrote:

I was trying to test NBD fleecing by copying subsets of one
file to another, and had the idea to use:

$ export NBD drive to be fleeced on port 10809
$ qemu-img create -f qcow2 copy $size
$ qemu-nbd -f qcow2 -p 10810 copy
$ qemu-img dd -f raw -O raw if=nbd://localhost:10809 of=nbd://localhost:10810 \
 skip=$offset seek=$offset count=$((len/cluster)) bs=$cluster

except that seek= wasn't implemented. And in implementing that,
I learned that skip= is broken when combined with count=.


On the bright side, 'qemu-img dd' appears to properly flush images when 
closing them, even though it does not have a conv= parameter.




[In the meantime, I had to use:

$ export NBD drive to be fleeced on port 10809
$ modprobe nbd
$ qemu-nbd -c /dev/nbd0 -f raw nbd://localhost:10809
$ qemu-nbd -c /dev/nbd1 -f qcow2 copy
$ dd if=/dev/nbd0 of/dev/nbd1 \


Oops, left out one = on that line.


 skip=$offset seek=$offset count=$((len/cluster)) bs=$cluster


And this needs to be skip=$((offset/cluster)) seek=$((offset/cluster)), 
unless iflag=skip_bytes oflag=seek_bytes is also in use.


What's more, it's essential to use conv=fdatasync when scripting this, 
otherwise, the dd process can end while the data is still in kernel 
buffers, and if 'qemu-nbd -d /dev/nbd1' follows too closely, those 
buffers are lost instead of flushed.


I lost the better part of a day figuring out why things worked when I 
did it by hand but not when I scripted it, until finally figuring out 
that the final flush (or a long sleep before disconnect) is mandatory to 
avoid data loss.  And it did not help that 'qemu-nbd -c' forks parallel 
processes in order to both serve an image over NBD and to connect the 
kernel as client to that private socket, but in such as way that 
--trace=nbd_\* no longer works due to the fd dance performed in setting 
things up, so it's much harder to see what requests the kernel is 
making. Also useful for anyone else trying to debug setups like this: 
'strace -D -o file -f qemu-nbd ...' is essential (without -D, the 
parallel processes don't get set up correctly; and -f is needed to trace 
through the forks).


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



[Qemu-block] [PATCH 1/2] qemu-img: Fix dd with skip= and count=

2018-08-15 Thread Max Reitz
On 2018-08-15 04:56, Eric Blake wrote:
> When both skip= and count= are active, qemu-img dd was not copying
> enough data. It didn't help that the code made the same check for
> dd.flags & C_SKIP in two separate places. Compute 'size' as the
> amount of bytes to be read, and 'end' as the offset to end at,
> rather than trying to cram both meanings into a single variable
> (which only worked as long as we had at most one of those two
> limiting factors to worry about, but not both).
> 
> Enhance the test to cover more combinations, and expose the problem.
> 
> Signed-off-by: Eric Blake 
> CC: qemu-sta...@nongnu.org
> ---
>  qemu-img.c | 39 -
>  tests/qemu-iotests/160 |  9 ++---
>  tests/qemu-iotests/160.out | 48 
> ++
>  3 files changed, 70 insertions(+), 26 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 1acddf693c6..d72f0f0ec94 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c

[...]

> @@ -4559,19 +4559,23 @@ static int img_dd(int argc, char **argv)
>  goto out;
>  }
> 
> +/* Overflow means the specified offset is beyond input image's size */
> +if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
> +  size < in.bsz * in.offset)) {
> +size = 0;
> +error_report("%s: cannot skip to specified offset", in.filename);

in_pos should be initialized as well (to "size", I suppose), or my gcc
will continue to complain. :-)

The rest looks good to me.

> +} else {
> +size -= in.offset * in.bsz;
> +in_pos = in.offset * in.bsz;
> +}
> +

[...]

> diff --git a/tests/qemu-iotests/160 b/tests/qemu-iotests/160
> index 5c910e5bfc1..48380a3aafc 100755
> --- a/tests/qemu-iotests/160
> +++ b/tests/qemu-iotests/160
> @@ -44,6 +44,7 @@ _supported_os Linux
>  TEST_SKIP_BLOCKS="1 2 30 30K"
> 
>  for skip in $TEST_SKIP_BLOCKS; do
> +  for count in '' 'count=1 '; do

Ah, so this is why we indent everything by four spaces!  So you can
squeeze in three more block headers without having to re-indent
everything.  I finally see. O:-)

(Not sure why you put a space after the 'count=1', though, but OK)

Max

>  echo
>  echo "== Creating image =="
> 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 0/7] qcow2: async handling of fragmented io

2018-08-15 Thread Max Reitz
On 2018-08-07 19:43, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Here is an asynchronous scheme for handling fragmented qcow2
> reads and writes. Both qcow2 read and write functions loops through
> sequential portions of data. The series aim it to parallelize these
> loops iterations.
> 
> It improves performance for fragmented qcow2 images, I've tested it
> as follows:
> 
> I have four 4G qcow2 images (with default 64k block size) on my ssd disk:
> t-seq.qcow2 - sequentially written qcow2 image
> t-reverse.qcow2 - filled by writing 64k portions from end to the start
> t-rand.qcow2 - filled by writing 64k portions (aligned) in random order
> t-part-rand.qcow2 - filled by shuffling order of 64k writes in 1m clusters
> (see source code of image generation in the end for details)
> 
> and the test (sequential io by 1mb chunks):
> 
> test write:
> for t in /ssd/t-*; \
> do sync; echo 1 > /proc/sys/vm/drop_caches; echo ===  $t  ===; \
> ./qemu-img bench -c 4096 -d 1 -f qcow2 -n -s 1m -t none -w $t; \
> done
> 
> test read (same, just drop -w parameter):
> for t in /ssd/t-*; \
> do sync; echo 1 > /proc/sys/vm/drop_caches; echo ===  $t  ===; \
> ./qemu-img bench -c 4096 -d 1 -f qcow2 -n -s 1m -t none $t; \
> done
> 
> short info about parameters:
>   -w - do writes (otherwise do reads)
>   -c - count of blocks
>   -s - block size
>   -t none - disable cache
>   -n - native aio
>   -d 1 - don't use parallel requests provided by qemu-img bench itself

Hm, actually, why not?  And how does a guest behave?

If parallel requests on an SSD perform better, wouldn't a guest issue
parallel requests to the virtual device and thus to qcow2 anyway?

(I suppose the global qcow2 lock could be an issue here, but then your
benchmark should work even without -d 1.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH for-3.1 v10 00/31] block: Fix some filename generation issues

2018-08-15 Thread Max Reitz
On 2018-08-15 05:43, no-re...@patchew.org wrote:
> Hi,
> 
> This series seems to have some coding style problems. See output below for
> more information:

[...]

> === OUTPUT BEGIN ===
> Checking PATCH 1/31: block: Use bdrv_refresh_filename() to pull...
> Checking PATCH 2/31: block: Use children list in bdrv_refresh_filename...
> Checking PATCH 3/31: block: Skip implicit nodes for filename info...
> Checking PATCH 4/31: block: Add BDS.auto_backing_file...
> Checking PATCH 5/31: block: Respect backing bs in bdrv_refresh_filename...
> ERROR: return is not a function, parentheses are not required
> #46: FILE: block.c:5192:
> +return (bs->auto_backing_file[0] != '\0');
> 
> total: 1 errors, 0 warnings, 136 lines checked

Sure, but I'd hate you personally if you omitted them.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [RFC PATCH] rbd: Don't convert keypairs to JSON and back

2018-08-15 Thread Max Reitz
On 2018-08-15 10:12, Markus Armbruster wrote:
> Max Reitz  writes:

[...]

>> To me personally the issue is that if you can specify a plain filename,
>> bdrv_refresh_filename() should give you that plain filename back.  So
>> rbd's implementation of that is lacking.  Well, it just doesn't exist.
> 
> I'm not even sure I understand what you're talking about.

We have this bdrv_refresh_filename() thing which can do this:

$ qemu-img info \
"json:{'driver':'raw',
   'file':{'driver':'nbd','host':'localhost'}}"
image: nbd://localhost:10809
[...]

So it can reconstruct a plain filename even if you specify it as options
instead of just using a plain filename.


Now here's my fault: I thought it might be necessary for a driver to
implement that function (which rbd doesn't) so that you'd get a nice
filename back (instead of just json:{} garbled things).   But you don't.
 For protocol drivers, you'll just get the initial filename back.  (So
my comment was just wrong.)

So what I was thinking about was some case where you specified a normal
plain filename and qemu would give you back json:{}.  (If rbd
implemented bdrv_refresh_filename(), that wouldn't happen, because it
would reconstruct a nice normal filename.)  It turns out, I don't think
that can happen so easily.  You'll just get your filename back.

Because here's what I'm thinking: If someone uses an option that is
undocumented and starts with =, well, too bad.  If someone uses a normal
filename, but gets back a json:{} filename...  Then they are free to use
that anywhere, and their use of "=" is legitimized.


Now that issue kind of reappears when you open an RBD volume, and then
e.g. take a blockdev-snapshot.  Then your overlay has an overridden
backing file (one that may be different from what its image header says)
and its filename may well become a json:{} one (to arrange for the
overridden backing file options).  Of course, if you opened the RBD
volume with a filename with some of the options warranting
=keyvalue-pairs, then your json:{} filename will contain those options
under =keyvalue-pairs.

So...  I'm not quite sure what I want to say?  I think there are edge
cases where the user may not have put any weird option into qemu, but
they do get a json:{} filename with =keyvalue-pairs out of it.  And I
think users are free to use json:{} filenames qemu spews at them, and we
can't blame them for it.

>>> If so, and we are comfortable changing the output the way this patch does
>>> (technically altering ABI anyway), we might as well go all the way and
>>> filter it out completely.  That would be preferable to cleaning up the json
>>> output of the internal key/value pairs, IMO.
>>
>> Well, this filtering at least is done by my "Fix some filename
>> generation issues" series.
> 
> Likewise.

The series overhauls quite a bit of the bdrv_refresh_filename()
infrastructure.  That function is also responsible for generating
json:{} filenames.

One thing it introduces is a BlockDriver field where a driver can
specify which of the runtime options are actually important.  The rest
is omitted from the generated json:{} filename.

I may have taken the liberty not to include =keyvalue-pairs in RBD's
"strong runtime options" list.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH] block: for jobs, do not clear user_paused until after the resume

2018-08-15 Thread Jeff Cody
On Wed, Aug 15, 2018 at 05:23:43PM -0400, John Snow wrote:
> 
> 
> On 08/15/2018 11:59 AM, Jeff Cody wrote:
> > The function job_cancel_async() will always cause an assert for blockjob
> > user resume.  We set job->user_paused to false, and then call
> > job->driver->user_resume().  In the case of blockjobs, this is the
> > block_job_user_resume() function.
> > 
> > In that function, we assert that job.user_paused is set to true.
> > Unfortunately, right before calling this function, it has explicitly
> > been set to false.
> > 
> > The fix is pretty simple: set job->user_paused to false only after the
> > job user_resume() function has been called.
> > 
> > Signed-off-by: Jeff Cody 
> > ---
> >  job.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/job.c b/job.c
> > index fa671b431a..e36ebaafd8 100644
> > --- a/job.c
> > +++ b/job.c
> > @@ -732,10 +732,10 @@ static void job_cancel_async(Job *job, bool force)
> >  {
> >  if (job->user_paused) {
> >  /* Do not call job_enter here, the caller will handle it.  */
> > -job->user_paused = false;
> >  if (job->driver->user_resume) {
> >  job->driver->user_resume(job);
> >  }
> > +job->user_paused = false;
> >  assert(job->pause_count > 0);
> >  job->pause_count--;
> >  }
> > 
> 
> Looks right to me; are you going to make good on your threat of a v2
> with a unit test?
>

Yep!

> Reviewed-by: John Snow 

Thanks

-Jeff



Re: [Qemu-block] [Qemu-devel] [PATCH] block: for jobs, do not clear user_paused until after the resume

2018-08-15 Thread Jeff Cody
On Wed, Aug 15, 2018 at 04:25:16PM -0500, Eric Blake wrote:
> On 08/15/2018 10:59 AM, Jeff Cody wrote:
> >The function job_cancel_async() will always cause an assert for blockjob
> >user resume.  We set job->user_paused to false, and then call
> >job->driver->user_resume().  In the case of blockjobs, this is the
> >block_job_user_resume() function.
> >
> >In that function, we assert that job.user_paused is set to true.
> >Unfortunately, right before calling this function, it has explicitly
> >been set to false.
> >
> >The fix is pretty simple: set job->user_paused to false only after the
> >job user_resume() function has been called.
> >
> >Signed-off-by: Jeff Cody 
> >---
> >  job.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> 
> Does this need to CC qemu-stable?
> 

Good point, yes.  I'm going to do a v2 with an iotest, and I'll CC
qemu-stable on that one.

> Reviewed-by: Eric Blake 
> 

Thanks


-Jeff



Re: [Qemu-block] [Qemu-devel] [PATCH] block: for jobs, do not clear user_paused until after the resume

2018-08-15 Thread Eric Blake

On 08/15/2018 10:59 AM, Jeff Cody wrote:

The function job_cancel_async() will always cause an assert for blockjob
user resume.  We set job->user_paused to false, and then call
job->driver->user_resume().  In the case of blockjobs, this is the
block_job_user_resume() function.

In that function, we assert that job.user_paused is set to true.
Unfortunately, right before calling this function, it has explicitly
been set to false.

The fix is pretty simple: set job->user_paused to false only after the
job user_resume() function has been called.

Signed-off-by: Jeff Cody 
---
  job.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



Does this need to CC qemu-stable?

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH] block: for jobs, do not clear user_paused until after the resume

2018-08-15 Thread John Snow



On 08/15/2018 11:59 AM, Jeff Cody wrote:
> The function job_cancel_async() will always cause an assert for blockjob
> user resume.  We set job->user_paused to false, and then call
> job->driver->user_resume().  In the case of blockjobs, this is the
> block_job_user_resume() function.
> 
> In that function, we assert that job.user_paused is set to true.
> Unfortunately, right before calling this function, it has explicitly
> been set to false.
> 
> The fix is pretty simple: set job->user_paused to false only after the
> job user_resume() function has been called.
> 
> Signed-off-by: Jeff Cody 
> ---
>  job.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/job.c b/job.c
> index fa671b431a..e36ebaafd8 100644
> --- a/job.c
> +++ b/job.c
> @@ -732,10 +732,10 @@ static void job_cancel_async(Job *job, bool force)
>  {
>  if (job->user_paused) {
>  /* Do not call job_enter here, the caller will handle it.  */
> -job->user_paused = false;
>  if (job->driver->user_resume) {
>  job->driver->user_resume(job);
>  }
> +job->user_paused = false;
>  assert(job->pause_count > 0);
>  job->pause_count--;
>  }
> 

Looks right to me; are you going to make good on your threat of a v2
with a unit test?

Reviewed-by: John Snow 



Re: [Qemu-block] [PATCH 16/21] block/commit: refactor commit to use job callbacks

2018-08-15 Thread John Snow



On 08/09/2018 11:12 AM, Kevin Wolf wrote:
> Am 07.08.2018 um 06:33 hat John Snow geschrieben:
>> Use the component callbacks; prepare, commit, abort, and clean.
>>
>> NB: prepare is only called when the job has not yet failed;
>> and abort can be called after prepare.
>>
>> complete -> prepare -> abort -> clean
>> complete -> abort -> clean
> 
> I always found this confusing. After converting all jobs to use the
> infrastructure, do you think that this design is helpful? You seem to be
> calling *_common function from commit and abort, with an almost empty
> prepare, which looks like a hint that maybe we should reorganise things.
> 

After rewriting things a bit, I think it would be nicer to call prepare
regardless of circumstances. The callbacks will be nicer for it.

When I wrote it the first time, the thought process was something like:

"Well, we won't need to prepare anything if we've already failed, we
just need to speed along to abort."

I wasn't considering so carefully the common cleanup case that needs to
occur post-finalization which appears to actually happen in a good
number of jobs.

>> Signed-off-by: John Snow 
>> ---
>>  block/commit.c | 94 
>> +-
>>  1 file changed, 61 insertions(+), 33 deletions(-)
>>
>> diff --git a/block/commit.c b/block/commit.c
>> index 1a4a56795f..6673a0544e 100644
>> --- a/block/commit.c
>> +++ b/block/commit.c
>> @@ -35,7 +35,9 @@ typedef struct CommitBlockJob {
>>  BlockJob common;
>>  BlockDriverState *commit_top_bs;
>>  BlockBackend *top;
>> +BlockDriverState *top_bs;
>>  BlockBackend *base;
>> +BlockDriverState *base_bs;
>>  BlockdevOnError on_error;
>>  int base_flags;
>>  char *backing_file_str;
> 
> More state. :-(
> 
> Can we at least move the new fields to the end of the struct with a
> comment that they are only valid after .exit()?
> 

Sure ... admittedly this is just a crutch because I was not precisely
sure exactly when these might change out from underneath me. If there
are some clever ways to avoid needing the state, feel free to suggest them.

>> @@ -71,37 +73,37 @@ static int coroutine_fn commit_populate(BlockBackend 
>> *bs, BlockBackend *base,
>>  static void commit_exit(Job *job)
>>  {
>>  CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
>> -BlockJob *bjob = &s->common;
>> -BlockDriverState *top = blk_bs(s->top);
>> -BlockDriverState *base = blk_bs(s->base);
>> -BlockDriverState *commit_top_bs = s->commit_top_bs;
>> -int ret = job->ret;
>> -bool remove_commit_top_bs = false;
>> +
>> +s->top_bs = blk_bs(s->top);
>> +s->base_bs = blk_bs(s->base);
>>  
>>  /* Make sure commit_top_bs and top stay around until 
>> bdrv_replace_node() */
>> -bdrv_ref(top);
>> -bdrv_ref(commit_top_bs);
>> +bdrv_ref(s->top_bs);
>> +bdrv_ref(s->commit_top_bs);
>> +}
>>  
>> -/* Remove base node parent that still uses BLK_PERM_WRITE/RESIZE before
>> - * the normal backing chain can be restored. */
>> -blk_unref(s->base);
>> +static int commit_prepare(Job *job)
>> +{
>> +CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
>>  
>> -if (!job_is_cancelled(job) && ret == 0) {
>> -/* success */
>> -ret = bdrv_drop_intermediate(s->commit_top_bs, base,
>> - s->backing_file_str);
>> -} else {
>> -/* XXX Can (or should) we somehow keep 'consistent read' blocked 
>> even
>> - * after the failed/cancelled commit job is gone? If we already 
>> wrote
>> - * something to base, the intermediate images aren't valid any 
>> more. */
>> -remove_commit_top_bs = true;
>> -}
>> + /* Remove base node parent that still uses BLK_PERM_WRITE/RESIZE before
>> +  * the normal backing chain can be restored. */
>> + blk_unref(s->base);
>> + s->base = NULL;
>> +
>> + return bdrv_drop_intermediate(s->commit_top_bs, s->base_bs,
>> +   s->backing_file_str);
> 
> The indentation is off here (which is weird because the additional space
> seems to be the only change to most of the lines).
> 
>> +}
>> +
>> +static void commit_exit_common(Job *job)
>> +{
>> +CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
>>  
>>  /* restore base open flags here if appropriate (e.g., change the base 
>> back
>>   * to r/o). These reopens do not need to be atomic, since we won't abort
>>   * even on failure here */
>> -if (s->base_flags != bdrv_get_flags(base)) {
>> -bdrv_reopen(base, s->base_flags, NULL);
>> +if (s->base_flags != bdrv_get_flags(s->base_bs)) {
>> +bdrv_reopen(s->base_bs, s->base_flags, NULL);
>>  }
>>  g_free(s->backing_file_str);
>>  blk_unref(s->top);
> 
> Feels like general cleanup, so intuitively, I'd expect this in .clean.
> The only thing that could make this impossible is the ordering.
> 
> bdr

Re: [Qemu-block] [PATCH 04/21] block/commit: utilize job_exit shim

2018-08-15 Thread John Snow



On 08/08/2018 12:29 PM, Kevin Wolf wrote:
> Am 07.08.2018 um 06:33 hat John Snow geschrieben:
>> Change the manual deferment to commit_complete into the implicit
>> callback to job_exit.
>>
>> Signed-off-by: John Snow 
> 
> There is one tricky thing in this patch that the commit message could be
> a bit more explicit about, which is moving job_completed() to a later
> point.
> 
> This is the code that happens between the old call of job_completed()
> and the new one:
> 
> /* If bdrv_drop_intermediate() didn't already do that, remove the commit
>  * filter driver from the backing chain. Do this as the final step so that
>  * the 'consistent read' permission can be granted.  */
> if (remove_commit_top_bs) {
> bdrv_child_try_set_perm(commit_top_bs->backing, 0, BLK_PERM_ALL,
> &error_abort);
> bdrv_replace_node(commit_top_bs, backing_bs(commit_top_bs),
>   &error_abort);
> }
> 
> bdrv_unref(commit_top_bs);
> bdrv_unref(top);
> 
> As the comment states, bdrv_replace_node() requires that the permission
> restrictions that the commit job made are already lifted. The most
> important part is done by the explicit block_job_remove_all_bdrv() call
> right before this hunk. It still leaves bjob->blk around, which could
> have implications, but luckily we didn't take any permissions for that
> one:
> 
> s = block_job_create(job_id, &commit_job_driver, NULL, bs, 0, 
> BLK_PERM_ALL,
>  speed, JOB_DEFAULT, NULL, NULL, errp);
> 
> So I think we got everything out of the way and bdrv_replace_node() can
> do what it wants to do.
> 
> Kevin
> 

I suppose it will be up to the author of a job to be aware of any
permissions they pick up at creation time that might have an effect on
the cleanup they wish to do during completion time.

--js



[Qemu-block] [PATCH] block: for jobs, do not clear user_paused until after the resume

2018-08-15 Thread Jeff Cody
The function job_cancel_async() will always cause an assert for blockjob
user resume.  We set job->user_paused to false, and then call
job->driver->user_resume().  In the case of blockjobs, this is the
block_job_user_resume() function.

In that function, we assert that job.user_paused is set to true.
Unfortunately, right before calling this function, it has explicitly
been set to false.

The fix is pretty simple: set job->user_paused to false only after the
job user_resume() function has been called.

Signed-off-by: Jeff Cody 
---
 job.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/job.c b/job.c
index fa671b431a..e36ebaafd8 100644
--- a/job.c
+++ b/job.c
@@ -732,10 +732,10 @@ static void job_cancel_async(Job *job, bool force)
 {
 if (job->user_paused) {
 /* Do not call job_enter here, the caller will handle it.  */
-job->user_paused = false;
 if (job->driver->user_resume) {
 job->driver->user_resume(job);
 }
+job->user_paused = false;
 assert(job->pause_count > 0);
 job->pause_count--;
 }
-- 
2.17.1




Re: [Qemu-block] [PATCH 00/21] jobs: defer graph changes until finalize

2018-08-15 Thread Peter Krempa
On Wed, Aug 15, 2018 at 17:00:29 +0200, Kevin Wolf wrote:
> Am 15.08.2018 um 16:44 hat Peter Krempa geschrieben:
> > On Tue, Aug 07, 2018 at 00:33:28 -0400, John Snow wrote:
> > > This series forces all jobs to use the "finalize" semantics that were
> > > introduced previously, but only exposed via the backup jobs.
> > > This series looks huge, but it's mostly mechanical changes broken out
> > > into a series of much smaller commits so that the changes are easier
> > > to follow at each step.
> > 
> > Please note that due to the semantics of the commands 'block-job-cancel'
> > and 'block-job-complete' at least in case of the drive/blockdev-mirror
> > job are technically blockjobs of it's own since they may fail or finish
> > in an unbounded amount of time and the actual return value on the
> > monitor does not reflect the result of the operation itself.
> > 
> > It would be very helpful if qemu treas them as such.
> 
> 'block-job-cancel' and 'block-job-complete' return immediately. They
> just move the given job into the next phase and return. That job may
> then still take some time until it finally completes (successfully or
> with an error), but that's not part of 'block-job-cancel/complete' any
> more.

Ah I see. That means that we just need to modify the semantics/language
used in libvirt for this.

Thanks for the clarification.


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH 00/21] jobs: defer graph changes until finalize

2018-08-15 Thread Kevin Wolf
Am 15.08.2018 um 16:44 hat Peter Krempa geschrieben:
> On Tue, Aug 07, 2018 at 00:33:28 -0400, John Snow wrote:
> > This series forces all jobs to use the "finalize" semantics that were
> > introduced previously, but only exposed via the backup jobs.
> > This series looks huge, but it's mostly mechanical changes broken out
> > into a series of much smaller commits so that the changes are easier
> > to follow at each step.
> 
> Please note that due to the semantics of the commands 'block-job-cancel'
> and 'block-job-complete' at least in case of the drive/blockdev-mirror
> job are technically blockjobs of it's own since they may fail or finish
> in an unbounded amount of time and the actual return value on the
> monitor does not reflect the result of the operation itself.
> 
> It would be very helpful if qemu treas them as such.

'block-job-cancel' and 'block-job-complete' return immediately. They
just move the given job into the next phase and return. That job may
then still take some time until it finally completes (successfully or
with an error), but that's not part of 'block-job-cancel/complete' any
more.

Kevin


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH 00/21] jobs: defer graph changes until finalize

2018-08-15 Thread Peter Krempa
On Tue, Aug 07, 2018 at 00:33:28 -0400, John Snow wrote:
> This series forces all jobs to use the "finalize" semantics that were
> introduced previously, but only exposed via the backup jobs.
> This series looks huge, but it's mostly mechanical changes broken out
> into a series of much smaller commits so that the changes are easier
> to follow at each step.

Please note that due to the semantics of the commands 'block-job-cancel'
and 'block-job-complete' at least in case of the drive/blockdev-mirror
job are technically blockjobs of it's own since they may fail or finish
in an unbounded amount of time and the actual return value on the
monitor does not reflect the result of the operation itself.

It would be very helpful if qemu treas them as such.


signature.asc
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [RFC PATCH] rbd: Don't convert keypairs to JSON and back

2018-08-15 Thread Jeff Cody
On Wed, Aug 15, 2018 at 10:12:12AM +0200, Markus Armbruster wrote:
> Max Reitz  writes:
> 
> > On 2018-07-28 06:32, Jeff Cody wrote:
> >> On Wed, Jul 25, 2018 at 05:01:44PM +0100, Daniel P. Berrangé wrote:
> >>> On Wed, Jul 25, 2018 at 10:56:48AM -0500, Eric Blake wrote:
>  On 07/25/2018 10:10 AM, Markus Armbruster wrote:
> > qemu_rbd_parse_filename() builds a keypairs QList, converts it to JSON, 
> > and
> > stores the resulting QString in a QDict.
> >
> > qemu_rbd_co_create_opts() and qemu_rbd_open() get the QString from the
> > QDict, pass it to qemu_rbd_set_keypairs(), which converts it back into
> > a QList.
> >
> > Drop both conversions, store the QList instead.
> >
> > This affects output of qemu-img info.  Before this patch:
> >
> >  $ qemu-img info 
> > rbd:rbd/testimg.raw:mon_host=192.168.15.180:rbd_cache=true:conf=/tmp/ceph.conf
> >  [junk printed by Ceph library code...]
> >  image: json:{"driver": "raw", "file": {"pool": "rbd", "image": 
> > "testimg.raw", "conf": "/tmp/ceph.conf", "driver": "rbd", 
> > "=keyvalue-pairs": "[\"mon_host\", \"192.168.15.180\", \"rbd_cache\", 
> > \"true\"]"}}
> >  [more output, not interesting here]
> >
> > After this patch, we get
> >
> >  image: json:{"driver": "raw", "file": {"pool": "rbd", "image": 
> > "testimg.raw", "conf": "/tmp/ceph.conf", "driver": "rbd", 
> > "=keyvalue-pairs": ["mon_host", "192.168.15.180", "rbd_cache", "true"]}}
> >
> > The value of member "=keyvalue-pairs" changes from a string containing
> > a JSON array to that JSON array.  That's an improvement of sorts.  
> > However:
> >
> > * Should "=keyvalue-pairs" even be visible here?  It's supposed to be
> >purely internal...
> 
>  I'd argue that since it is supposed to be internal (as evidenced by the
>  leading '=' that does not name a normal variable), changing it doesn't 
>  hurt
>  stability. But yes, it would be nicer if we could filter it entirely so 
>  that
>  it does not appear in json: output, if it doesn't truly affect the 
>  contents
>  that the guest would see.
> >>>
> >>> If it appears in the json: output, then that means it could get written
> >>> into qcow2 headers as a backing file name, which would make it ABI
> >>> sensitive. This makes it even more important to filter it if it is 
> >>> supposed
> >>> to be internal only, with no ABI guarantee.
> >>>
> >> 
> >> It's been present for a couple releases (counting 3.0); is it safe to
> >> assume that, although it could be present in the qcow2 headers, that it 
> >> will
> >> not break anything by altering it or removing it?
> >
> > Did =keyvalue-pairs even work in json:{} filename?  If so, it will
> > continue to work even after filtering it.  If not, then filtering it
> > won't break existing files because they didn't work before either.
> 
> I'm afraid it does work:
> 
> $ gdb --args upstream-qemu -nodefaults -S -display vnc=:0 -readconfig 
> test.cfg 'json:{"driver": "raw", "file": {"pool": "rbd", "image": 
> "testimg.raw", "conf": "/tmp/ceph.conf", "driver": "rbd", "=keyvalue-pairs": 
> "[\"mon_host\", \"192.168.15.180\", \"rbd_cache\", \"true\"]"}}'
> GNU gdb (GDB) Fedora 8.1-25.fc28
> [...]
> (gdb) b qemu_rbd_open 
> Breakpoint 1 at 0x845f83: file /work/armbru/qemu/block/rbd.c, line 660.
> (gdb) r
> Starting program: /home/armbru/bin/upstream-qemu -nodefaults -S -display 
> vnc=:0 -readconfig test.cfg json:\{\"driver\":\ \"raw\",\ \"file\":\ 
> \{\"pool\":\ \"rbd\",\ \"image\":\ \"testimg.raw\",\ \"conf\":\ 
> \"/tmp/ceph.conf\",\ \"driver\":\ \"rbd\",\ \"=keyvalue-pairs\":\ 
> \"\[\\\"mon_host\\\",\ \\\"192.168.15.180\\\",\ \\\"rbd_cache\\\",\ 
> \\\"true\\\"\]\"\}\}
> [...]
> Thread 1 "upstream-qemu" hit Breakpoint 1, qemu_rbd_open 
> (bs=0x56a5a970, 
> options=0x56a5ec40, flags=24578, errp=0x7fffd370)
> at /work/armbru/qemu/block/rbd.c:660
> 660   {
> [...]
> (gdb) n
> 661   BDRVRBDState *s = bs->opaque;
> (gdb) 
> 662   BlockdevOptionsRbd *opts = NULL;
> (gdb) 
> 665   Error *local_err = NULL;
> (gdb) 
> 669   keypairs = g_strdup(qdict_get_try_str(options, 
> "=keyvalue-pairs"));
> (gdb) 
> 670   if (keypairs) {
> (gdb) p keypairs 
> $1 = 0x569e54c0 "[\"mon_host\", \"192.168.15.180\", \"rbd_cache\", 
> \"true\"]"
> 
> It really, really, really should not work.
> 
> It doesn't work with anything that relies on QAPI to represent
> configuration (such as QMP's blockdev-add), because BlockdevOptionsRbd
> doesn't have it.
> 
> It works with -drive only with a pseudo-filename (more on that below),
> even though -drive uses QemuOpts and QDict rather than QAPI, because the
> (carefully chosen) name "=keyvalue-pairs" is impossible to use with
> QemuOpts.
> 
> Howe

Re: [Qemu-block] [Libguestfs] [PATCH] v2v: Add --print-estimate option to print source size estimate.

2018-08-15 Thread Eric Blake

On 08/15/2018 07:53 AM, Kevin Wolf wrote:


But if I run ‘qemu-img convert ... -O raw output.raw’ then output.raw
will be a sparse file, and that's the file size I'd expect measure to
give us for "required size" (of course "fully allocated size" would be
the virtual size, and that is correct).

It does look as if the block/raw-format.c:raw_measure function is wrong.


No, it is correct. Its output is what the file size will be. For raw
images, that is the same as the virtual disk size.

Not all blocks in the file will be actually allocated, but the file size
is what 'ls -l' prints, not what 'du' prints (for regular files).

It becomes even clearer when you create LVs as the target. If you have a
10 GB image in which only the last 1 MB actually contains data, you
still need a 10 GB volume. You can't create a 1 MB volume and then store
data at an offset 10 GB - 1 MB, that would be way after the end of the
block device.


In any case we can use the qcow2 estimate for our purposes as it will
be near enough what we need (a rough estimate of the size of the copy).


I don't know what the exact purpose is, and it feels a bit hacky, but it
might be good enough.

I suppose what you really want is that 'qemu-img measure' provides
another number for the space taken by allocated blocks. (Probably
excluding metadata for non-raw formats? Might depend on the actual
purpose.)


Adding a third metric to 'qemu-img measure' makes more sense than 
changing the meaning of either of the two existing metrics.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Libguestfs] [PATCH] v2v: Add --print-estimate option to print source size estimate.

2018-08-15 Thread Richard W.M. Jones
On Wed, Aug 15, 2018 at 02:53:05PM +0200, Kevin Wolf wrote:
> [ Cc: qemu-block ]
> 
> Am 15.08.2018 um 12:55 hat Richard W.M. Jones geschrieben:
> > (Adding Stefan who implemented the subcommand)
> > 
> > On Wed, Aug 15, 2018 at 12:44:44PM +0200, Kevin Wolf wrote:
> > > Am 15.08.2018 um 12:26 hat Richard W.M. Jones geschrieben:
> > > > On Tue, Aug 14, 2018 at 09:31:06PM +0300, Nir Soffer wrote:
> > > > > On Tue, Aug 14, 2018 at 8:29 PM Richard W.M. Jones 
> > > > > wrote:
> > > > > 
> > > > > > This option prints the estimated size of the data that will be 
> > > > > > copied
> > > > > > from the source disk.
> > > > > >
> > > > > > For interest, the test prints:
> > > > > >
> > > > > > 3747840 ../test-data/phony-guests/windows.img
> > > > > > Estimate: 3710976
> > > > > >
> > > > > 
> > > > > Why not use qemu-img measure on the overlay?
> > > > 
> > > > Yes this would be better, but oddly it doesn't work properly when
> > > > the output format is set to 'raw':
> > > > 
> > > >   /usr/bin/qemu-img 'measure' '-f' 'qcow2' '-O' 'raw' '--output=human' 
> > > > '/home/rjones/d/libguestfs/tmp/v2vovl62b7c2.qcow2'
> > > >   required size: 6442450944
> > > >   fully allocated size: 6442450944
> > > > 
> > > > However it's OK if the output format is set to 'qcow2':
> > > > 
> > > >   /usr/bin/qemu-img 'measure' '-f' 'qcow2' '-O' 'qcow2' 
> > > > '--output=human' '/home/rjones/d/libguestfs/tmp/v2vovla53d7c.qcow2'
> > > >   required size: 1047986176
> > > >   fully allocated size: 6443696128
> > > > 
> > > > I guess it ignores sparseness of raw images, but I can't see a way to
> > > > specify that on the command line.
> > > > 
> > > > OTOH the qcow2 figure is probably a good enough guess for our purposes
> > > > (ie. estimating how much data will be copied).
> > > 
> > > 'qemu-img measure' calculates the resulting file size, not the number of
> > > used blocks. I think this is because its main purpose is to create block
> > > devices (like LVs) of the right size, so sparseness on a filesystem
> > > level doesn't buy you anything.
> > 
> > But if I run ‘qemu-img convert ... -O raw output.raw’ then output.raw
> > will be a sparse file, and that's the file size I'd expect measure to
> > give us for "required size" (of course "fully allocated size" would be
> > the virtual size, and that is correct).
> > 
> > It does look as if the block/raw-format.c:raw_measure function is wrong.
> 
> No, it is correct. Its output is what the file size will be. For raw
> images, that is the same as the virtual disk size.
> 
> Not all blocks in the file will be actually allocated, but the file size
> is what 'ls -l' prints, not what 'du' prints (for regular files).
> 
> It becomes even clearer when you create LVs as the target. If you have a
> 10 GB image in which only the last 1 MB actually contains data, you
> still need a 10 GB volume. You can't create a 1 MB volume and then store
> data at an offset 10 GB - 1 MB, that would be way after the end of the
> block device.
> 
> > In any case we can use the qcow2 estimate for our purposes as it will
> > be near enough what we need (a rough estimate of the size of the copy).
> 
> I don't know what the exact purpose is, and it feels a bit hacky, but it
> might be good enough.

The original goal is to try to get an estimate for how many bytes a
subsequent ‘qemu-img convert’ will actually copy.  The estimate does
not need to be very precise: it is used so that we can make a
prediction of how long the copy will take.  The prediction is used for
planning purposes, such as whether the copy will fit into a planned
downtime window and how long the overall process (which involves
multiple hundreds of qemu-img convert copies) will take.

I think the qcow2 number is near enough for these purposes.

> I suppose what you really want is that 'qemu-img measure' provides
> another number for the space taken by allocated blocks. (Probably
> excluding metadata for non-raw formats? Might depend on the actual
> purpose.)
> 
> Kevin

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v



[Qemu-block] [PULL 18/21] block: Simplify bdrv_reopen_abort()

2018-08-15 Thread Kevin Wolf
From: Alberto Garcia 

If a bdrv_reopen_multiple() call fails, then the explicit_options
QDict has to be deleted for every entry in the reopen queue. This must
happen regardless of whether that entry's bdrv_reopen_prepare() call
succeeded or not.

This patch simplifies the cleanup code a bit.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index a8fbab396d..e82cfa6fe2 100644
--- a/block.c
+++ b/block.c
@@ -3050,9 +3050,10 @@ int bdrv_reopen_multiple(AioContext *ctx, 
BlockReopenQueue *bs_queue, Error **er
 
 cleanup:
 QSIMPLEQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
-if (ret && bs_entry->prepared) {
-bdrv_reopen_abort(&bs_entry->state);
-} else if (ret) {
+if (ret) {
+if (bs_entry->prepared) {
+bdrv_reopen_abort(&bs_entry->state);
+}
 qobject_unref(bs_entry->state.explicit_options);
 }
 qobject_unref(bs_entry->state.options);
@@ -3341,8 +3342,6 @@ void bdrv_reopen_abort(BDRVReopenState *reopen_state)
 drv->bdrv_reopen_abort(reopen_state);
 }
 
-qobject_unref(reopen_state->explicit_options);
-
 bdrv_abort_perm_update(reopen_state->bs);
 }
 
-- 
2.13.6




[Qemu-block] [PULL 17/21] block: Remove children options from bs->{options, explicit_options}

2018-08-15 Thread Kevin Wolf
From: Alberto Garcia 

When bdrv_open_inherit() opens a BlockDriverState the options QDict
can contain options for some of its children, passed in the form of
child-name.option=value

So while each child is opened with that subset of options, those same
options remain stored in the parent BDS, leaving (at least) two copies
of each one of them ("child-name.option=value" in the parent and
"option=value" in the child).

Having the children options stored in the parent is unnecessary and it
can easily lead to an inconsistent state:

  $ qemu-img create -f qcow2 hd0.qcow2 10M
  $ qemu-img create -f qcow2 -b hd0.qcow2 hd1.qcow2
  $ qemu-img create -f qcow2 -b hd1.qcow2 hd2.qcow2

  $ $QEMU -drive file=hd2.qcow2,node-name=hd2,backing.node-name=hd1

This opens a chain of images hd0 <- hd1 <- hd2. Now let's remove hd1
using block_stream:

  (qemu) block_stream hd2 0 hd0.qcow2

After this hd2 contains backing.node-name=hd1, which is no longer
correct because hd1 doesn't exist anymore.

This patch removes all children options from the parent dictionaries
at the end of bdrv_open_inherit() and bdrv_reopen_queue_child().

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/block.c b/block.c
index 9694018a68..a8fbab396d 100644
--- a/block.c
+++ b/block.c
@@ -2584,6 +2584,7 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 BlockBackend *file = NULL;
 BlockDriverState *bs;
 BlockDriver *drv = NULL;
+BdrvChild *child;
 const char *drvname;
 const char *backing;
 Error *local_err = NULL;
@@ -2767,6 +2768,15 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 }
 }
 
+/* Remove all children options from bs->options and bs->explicit_options */
+QLIST_FOREACH(child, &bs->children, next) {
+char *child_key_dot;
+child_key_dot = g_strdup_printf("%s.", child->name);
+qdict_extract_subqdict(bs->explicit_options, NULL, child_key_dot);
+qdict_extract_subqdict(bs->options, NULL, child_key_dot);
+g_free(child_key_dot);
+}
+
 bdrv_refresh_filename(bs);
 
 /* Check if any unknown options were used */
@@ -2976,6 +2986,7 @@ static BlockReopenQueue 
*bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
 }
 
 child_key_dot = g_strdup_printf("%s.", child->name);
+qdict_extract_subqdict(explicit_options, NULL, child_key_dot);
 qdict_extract_subqdict(options, &new_child_options, child_key_dot);
 g_free(child_key_dot);
 
-- 
2.13.6




[Qemu-block] [PULL 16/21] qdict: Make qdict_extract_subqdict() accept dst = NULL

2018-08-15 Thread Kevin Wolf
From: Alberto Garcia 

This function extracts all options from a QDict starting with a
certain prefix and puts them in a new QDict.

We'll have a couple of cases where we simply want to discard those
options instead of copying them, and that's what this patch does.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 qobject/block-qdict.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/qobject/block-qdict.c b/qobject/block-qdict.c
index 80c653013f..42054cc274 100644
--- a/qobject/block-qdict.c
+++ b/qobject/block-qdict.c
@@ -158,20 +158,25 @@ void qdict_flatten(QDict *qdict)
 qdict_flatten_qdict(qdict, qdict, NULL);
 }
 
-/* extract all the src QDict entries starting by start into dst */
+/* extract all the src QDict entries starting by start into dst.
+ * If dst is NULL then the entries are simply removed from src. */
 void qdict_extract_subqdict(QDict *src, QDict **dst, const char *start)
 
 {
 const QDictEntry *entry, *next;
 const char *p;
 
-*dst = qdict_new();
+if (dst) {
+*dst = qdict_new();
+}
 entry = qdict_first(src);
 
 while (entry != NULL) {
 next = qdict_next(src, entry);
 if (strstart(entry->key, start, &p)) {
-qdict_put_obj(*dst, p, qobject_ref(entry->value));
+if (dst) {
+qdict_put_obj(*dst, p, qobject_ref(entry->value));
+}
 qdict_del(src, entry->key);
 }
 entry = next;
-- 
2.13.6




[Qemu-block] [PULL 10/21] block: Remove dead deprecation warning code

2018-08-15 Thread Kevin Wolf
This reinstates commit 6266e900b8083945cb766b45c124fb3c42932cb3,
which was temporarily reverted for the 3.0 release so that libvirt gets
some extra time to update their command lines.

We removed all options from the 'deprecated' array, so the code is dead
and can be removed as well.

Signed-off-by: Kevin Wolf 
Reviewed-by: Markus Armbruster 
---
 blockdev.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 37eb40670b..72f5347df5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -775,8 +775,6 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type)
 const char *filename;
 Error *local_err = NULL;
 int i;
-const char *deprecated[] = {
-};
 
 /* Change legacy command line options into QMP ones */
 static const struct {
@@ -853,16 +851,6 @@ DriveInfo *drive_new(QemuOpts *all_opts, 
BlockInterfaceType block_default_type)
 goto fail;
 }
 
-/* Other deprecated options */
-if (!qtest_enabled()) {
-for (i = 0; i < ARRAY_SIZE(deprecated); i++) {
-if (qemu_opt_get(legacy_opts, deprecated[i]) != NULL) {
-error_report("'%s' is deprecated, please use the corresponding 
"
- "option of '-device' instead", deprecated[i]);
-}
-}
-}
-
 /* Media type */
 value = qemu_opt_get(legacy_opts, "media");
 if (value) {
-- 
2.13.6




[Qemu-block] [PULL 14/21] block: make .bdrv_close optional

2018-08-15 Thread Kevin Wolf
From: Vladimir Sementsov-Ogievskiy 

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Kevin Wolf 
---
 block.c  | 4 +++-
 block/snapshot.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 39f373e035..9694018a68 100644
--- a/block.c
+++ b/block.c
@@ -3349,7 +3349,9 @@ static void bdrv_close(BlockDriverState *bs)
 bdrv_drain(bs); /* in case flush left pending I/O */
 
 if (bs->drv) {
-bs->drv->bdrv_close(bs);
+if (bs->drv->bdrv_close) {
+bs->drv->bdrv_close(bs);
+}
 bs->drv = NULL;
 }
 
diff --git a/block/snapshot.c b/block/snapshot.c
index f9903bc94e..3218a542df 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -218,7 +218,9 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 qobject_unref(file_options);
 qdict_put_str(options, "file", bdrv_get_node_name(file));
 
-drv->bdrv_close(bs);
+if (drv->bdrv_close) {
+drv->bdrv_close(bs);
+}
 bdrv_unref_child(bs, bs->file);
 bs->file = NULL;
 
-- 
2.13.6




[Qemu-block] [PULL 08/21] block: Remove deprecated -drive option addr

2018-08-15 Thread Kevin Wolf
This reinstates commit eae3bd1eb7c6b105d30ec06008b3bc3dfc5f45bb,
which was temporarily reverted for the 3.0 release so that libvirt gets
some extra time to update their command lines.

The -drive option addr was deprecated in QEMU 2.10. It's time to remove
it.

Signed-off-by: Kevin Wolf 
Reviewed-by: Markus Armbruster 
Reviewed-by: Jeff Cody 
---
 include/sysemu/blockdev.h |  1 -
 blockdev.c| 17 +
 device-hotplug.c  |  4 
 qemu-deprecated.texi  |  5 -
 qemu-options.hx   |  5 +
 5 files changed, 2 insertions(+), 30 deletions(-)

diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index 37ea39719e..c0ae3700ec 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -28,7 +28,6 @@ typedef enum {
 } BlockInterfaceType;
 
 struct DriveInfo {
-const char *devaddr;
 BlockInterfaceType type;
 int bus;
 int unit;
diff --git a/blockdev.c b/blockdev.c
index c23587b075..6c530769fd 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -731,10 +731,6 @@ QemuOptsList qemu_legacy_drive_opts = {
 .type = QEMU_OPT_STRING,
 .help = "interface (ide, scsi, sd, mtd, floppy, pflash, virtio)",
 },{
-.name = "addr",
-.type = QEMU_OPT_STRING,
-.help = "pci address (virtio only)",
-},{
 .name = "serial",
 .type = QEMU_OPT_STRING,
 .help = "disk serial number",
@@ -777,7 +773,6 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type)
 DriveMediaType media = MEDIA_DISK;
 BlockInterfaceType type;
 int max_devs, bus_id, unit_id, index;
-const char *devaddr;
 const char *werror, *rerror;
 bool read_only = false;
 bool copy_on_read;
@@ -786,7 +781,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type)
 Error *local_err = NULL;
 int i;
 const char *deprecated[] = {
-"serial", "addr"
+"serial"
 };
 
 /* Change legacy command line options into QMP ones */
@@ -976,12 +971,6 @@ DriveInfo *drive_new(QemuOpts *all_opts, 
BlockInterfaceType block_default_type)
 }
 
 /* Add virtio block device */
-devaddr = qemu_opt_get(legacy_opts, "addr");
-if (devaddr && type != IF_VIRTIO) {
-error_report("addr is not supported by this bus type");
-goto fail;
-}
-
 if (type == IF_VIRTIO) {
 QemuOpts *devopts;
 devopts = qemu_opts_create(qemu_find_opts("device"), NULL, 0,
@@ -993,9 +982,6 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type)
 }
 qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
  &error_abort);
-if (devaddr) {
-qemu_opt_set(devopts, "addr", devaddr, &error_abort);
-}
 }
 
 filename = qemu_opt_get(legacy_opts, "file");
@@ -1040,7 +1026,6 @@ DriveInfo *drive_new(QemuOpts *all_opts, 
BlockInterfaceType block_default_type)
 dinfo->type = type;
 dinfo->bus = bus_id;
 dinfo->unit = unit_id;
-dinfo->devaddr = devaddr;
 dinfo->serial = g_strdup(serial);
 
 blk_set_legacy_dinfo(blk, dinfo);
diff --git a/device-hotplug.c b/device-hotplug.c
index 23fd6656f1..cd427e2c76 100644
--- a/device-hotplug.c
+++ b/device-hotplug.c
@@ -69,10 +69,6 @@ void hmp_drive_add(Monitor *mon, const QDict *qdict)
 if (!dinfo) {
 goto err;
 }
-if (dinfo->devaddr) {
-monitor_printf(mon, "Parameter addr not supported\n");
-goto err;
-}
 
 switch (dinfo->type) {
 case IF_NONE:
diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
index f90436da56..6d7aaab2cd 100644
--- a/qemu-deprecated.texi
+++ b/qemu-deprecated.texi
@@ -99,11 +99,6 @@ provided per NIC.
 The drive serial argument is replaced by the the serial argument
 that can be specified with the ``-device'' parameter.
 
-@subsection -drive addr=... (since 2.10.0)
-
-The drive addr argument is replaced by the the addr argument
-that can be specified with the ``-device'' parameter.
-
 @subsection -usbdevice (since 2.10.0)
 
 The ``-usbdevice DEV'' argument is now a synonym for setting
diff --git a/qemu-options.hx b/qemu-options.hx
index b5155c925a..a42917f41c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -805,7 +805,7 @@ ETEXI
 DEF("drive", HAS_ARG, QEMU_OPTION_drive,
 "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
 "   
[,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
-"   
[,snapshot=on|off][,serial=s][,addr=A][,rerror=ignore|stop|report]\n"
+"   [,snapshot=on|off][,serial=s][,rerror=ignore|stop|report]\n"
 "   
[,werror=ignore|stop|report|enospc][,id=name][,aio=threads|native]\n"
 "   [,readonly=on|off][,copy-on-read=on|off]\n"
 "   [,discard=ignore|unmap][,detect-zeroes=on|off|unmap]\n"
@@ -883,9 +883,6 @@ an untrusted format header.
 This option specifie

[Qemu-block] [PULL 13/21] qemu-img: fix regression copying secrets during convert

2018-08-15 Thread Kevin Wolf
From: Daniel P. Berrangé 

When the convert command is creating an output file that needs
secrets, we need to ensure those secrets are passed to both the
blk_new_open and bdrv_create API calls.

This is done by qemu-img extracting all opts matching the name
suffix "key-secret". Unfortunately the code doing this was run after the
call to bdrv_create(), which meant the QemuOpts it was extracting
secrets from was now empty.

Previously this worked by luks as a bug meant the "key-secret"
parameters were not purged from the QemuOpts. This bug was fixed in

  commit b76b4f604521e59f857d6177bc55f6f2e41fd392
  Author: Kevin Wolf 
  Date:   Thu Jan 11 16:18:08 2018 +0100

qcow2: Use visitor for options in qcow2_create()

Exposing the latent bug in qemu-img. This fix simply moves the copying
of secrets to before the bdrv_create() call.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Daniel P. Berrangé 
Signed-off-by: Kevin Wolf 
---
 qemu-img.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 1acddf693c..b12f4cd19b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -345,21 +345,6 @@ static int img_add_key_secrets(void *opaque,
 return 0;
 }
 
-static BlockBackend *img_open_new_file(const char *filename,
-   QemuOpts *create_opts,
-   const char *fmt, int flags,
-   bool writethrough, bool quiet,
-   bool force_share)
-{
-QDict *options = NULL;
-
-options = qdict_new();
-qemu_opt_foreach(create_opts, img_add_key_secrets, options, &error_abort);
-
-return img_open_file(filename, options, fmt, flags, writethrough, quiet,
- force_share);
-}
-
 
 static BlockBackend *img_open(bool image_opts,
   const char *filename,
@@ -2018,6 +2003,7 @@ static int img_convert(int argc, char **argv)
 BlockDriverState *out_bs;
 QemuOpts *opts = NULL, *sn_opts = NULL;
 QemuOptsList *create_opts = NULL;
+QDict *open_opts = NULL;
 char *options = NULL;
 Error *local_err = NULL;
 bool writethrough, src_writethrough, quiet = false, image_opts = false,
@@ -2362,6 +2348,16 @@ static int img_convert(int argc, char **argv)
 }
 }
 
+/*
+ * The later open call will need any decryption secrets, and
+ * bdrv_create() will purge "opts", so extract them now before
+ * they are lost.
+ */
+if (!skip_create) {
+open_opts = qdict_new();
+qemu_opt_foreach(opts, img_add_key_secrets, open_opts, &error_abort);
+}
+
 if (!skip_create) {
 /* Create the new image */
 ret = bdrv_create(drv, out_filename, opts, &local_err);
@@ -2388,8 +2384,9 @@ static int img_convert(int argc, char **argv)
  * That has to wait for bdrv_create to be improved
  * to allow filenames in option syntax
  */
-s.target = img_open_new_file(out_filename, opts, out_fmt,
- flags, writethrough, quiet, false);
+s.target = img_open_file(out_filename, open_opts, out_fmt,
+ flags, writethrough, quiet, false);
+open_opts = NULL; /* blk_new_open will have freed it */
 }
 if (!s.target) {
 ret = -1;
@@ -2464,6 +2461,7 @@ out:
 qemu_opts_del(opts);
 qemu_opts_free(create_opts);
 qemu_opts_del(sn_opts);
+qobject_unref(open_opts);
 blk_unref(s.target);
 if (s.src) {
 for (bs_i = 0; bs_i < s.src_num; bs_i++) {
-- 
2.13.6




[Qemu-block] [PULL 19/21] block: Update bs->options if bdrv_reopen() succeeds

2018-08-15 Thread Kevin Wolf
From: Alberto Garcia 

If bdrv_reopen() succeeds then bs->explicit_options is updated with
the new values, but bs->options never changes.

Here's an example:

   { "execute": "blockdev-add",
 "arguments": {
   "driver": "qcow2",
   "node-name": "hd0",
   "overlap-check": "all",
   "file": {
 "driver": "file",
 "filename": "hd0.qcow2"
   }
 }
   }

After this, both bs->options and bs->explicit_options contain
"overlap-check": "all".

Now let's change that using qemu-io's reopen command:

   (qemu) qemu-io hd0 "reopen -o overlap-check=none"

After this, bs->explicit_options contains the new value but
bs->options still keeps the old one.

This patch updates bs->options after a BDS has been successfully
reopened.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index e82cfa6fe2..8c95f9893a 100644
--- a/block.c
+++ b/block.c
@@ -3055,8 +3055,8 @@ cleanup:
 bdrv_reopen_abort(&bs_entry->state);
 }
 qobject_unref(bs_entry->state.explicit_options);
+qobject_unref(bs_entry->state.options);
 }
-qobject_unref(bs_entry->state.options);
 g_free(bs_entry);
 }
 g_free(bs_queue);
@@ -3156,6 +3156,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 Error *local_err = NULL;
 BlockDriver *drv;
 QemuOpts *opts;
+QDict *orig_reopen_opts;
 const char *value;
 bool read_only;
 
@@ -3163,6 +3164,11 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 assert(reopen_state->bs->drv != NULL);
 drv = reopen_state->bs->drv;
 
+/* This function and each driver's bdrv_reopen_prepare() remove
+ * entries from reopen_state->options as they are processed, so
+ * we need to make a copy of the original QDict. */
+orig_reopen_opts = qdict_clone_shallow(reopen_state->options);
+
 /* Process generic block layer options */
 opts = qemu_opts_create(&bdrv_runtime_opts, NULL, 0, &error_abort);
 qemu_opts_absorb_qdict(opts, reopen_state->options, &local_err);
@@ -3269,8 +3275,13 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 
 ret = 0;
 
+/* Restore the original reopen_state->options QDict */
+qobject_unref(reopen_state->options);
+reopen_state->options = qobject_ref(orig_reopen_opts);
+
 error:
 qemu_opts_del(opts);
+qobject_unref(orig_reopen_opts);
 return ret;
 }
 
@@ -3300,8 +3311,10 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
 
 /* set BDS specific flags now */
 qobject_unref(bs->explicit_options);
+qobject_unref(bs->options);
 
 bs->explicit_options   = reopen_state->explicit_options;
+bs->options= reopen_state->options;
 bs->open_flags = reopen_state->flags;
 bs->read_only = !(reopen_state->flags & BDRV_O_RDWR);
 
-- 
2.13.6




[Qemu-block] [PULL 20/21] block: Simplify append_open_options()

2018-08-15 Thread Kevin Wolf
From: Alberto Garcia 

This function returns a BDS's driver-specific options, excluding also
those from its children. Since we have just removed all children
options from bs->options there's no need to do this last step.

We allow references to children, though ("backing": "node0"), so those
we still have to remove.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index 8c95f9893a..6161dbe3eb 100644
--- a/block.c
+++ b/block.c
@@ -5150,16 +5150,13 @@ static bool append_open_options(QDict *d, 
BlockDriverState *bs)
 QemuOptDesc *desc;
 BdrvChild *child;
 bool found_any = false;
-const char *p;
 
 for (entry = qdict_first(bs->options); entry;
  entry = qdict_next(bs->options, entry))
 {
-/* Exclude options for children */
+/* Exclude node-name references to children */
 QLIST_FOREACH(child, &bs->children, next) {
-if (strstart(qdict_entry_key(entry), child->name, &p)
-&& (!*p || *p == '.'))
-{
+if (!strcmp(entry->key, child->name)) {
 break;
 }
 }
-- 
2.13.6




[Qemu-block] [PULL 15/21] block: drop empty .bdrv_close handlers

2018-08-15 Thread Kevin Wolf
From: Vladimir Sementsov-Ogievskiy 

.bdrv_close handler is optional after previous commit, no needs to keep
empty functions more.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Kevin Wolf 
---
 block/blkreplay.c| 5 -
 block/commit.c   | 5 -
 block/copy-on-read.c | 6 --
 block/mirror.c   | 5 -
 block/null.c | 6 --
 block/raw-format.c   | 5 -
 6 files changed, 32 deletions(-)

diff --git a/block/blkreplay.c b/block/blkreplay.c
index 766150ade6..b5d9efdeca 100755
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -43,10 +43,6 @@ fail:
 return ret;
 }
 
-static void blkreplay_close(BlockDriverState *bs)
-{
-}
-
 static int64_t blkreplay_getlength(BlockDriverState *bs)
 {
 return bdrv_getlength(bs->file->bs);
@@ -135,7 +131,6 @@ static BlockDriver bdrv_blkreplay = {
 .instance_size  = 0,
 
 .bdrv_open  = blkreplay_open,
-.bdrv_close = blkreplay_close,
 .bdrv_child_perm= bdrv_filter_default_perms,
 .bdrv_getlength = blkreplay_getlength,
 
diff --git a/block/commit.c b/block/commit.c
index e1814d9693..eb414579bd 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -239,10 +239,6 @@ static void 
bdrv_commit_top_refresh_filename(BlockDriverState *bs, QDict *opts)
 bs->backing->bs->filename);
 }
 
-static void bdrv_commit_top_close(BlockDriverState *bs)
-{
-}
-
 static void bdrv_commit_top_child_perm(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
@@ -260,7 +256,6 @@ static BlockDriver bdrv_commit_top = {
 .bdrv_co_preadv = bdrv_commit_top_preadv,
 .bdrv_co_block_status   = bdrv_co_block_status_from_backing,
 .bdrv_refresh_filename  = bdrv_commit_top_refresh_filename,
-.bdrv_close = bdrv_commit_top_close,
 .bdrv_child_perm= bdrv_commit_top_child_perm,
 };
 
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index a19164f9eb..64dcc424b5 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -45,11 +45,6 @@ static int cor_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 
 
-static void cor_close(BlockDriverState *bs)
-{
-}
-
-
 #define PERM_PASSTHROUGH (BLK_PERM_CONSISTENT_READ \
   | BLK_PERM_WRITE \
   | BLK_PERM_RESIZE)
@@ -143,7 +138,6 @@ BlockDriver bdrv_copy_on_read = {
 .format_name= "copy-on-read",
 
 .bdrv_open  = cor_open,
-.bdrv_close = cor_close,
 .bdrv_child_perm= cor_child_perm,
 
 .bdrv_getlength = cor_getlength,
diff --git a/block/mirror.c b/block/mirror.c
index dd5ca02b09..6cc10df5c9 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1426,10 +1426,6 @@ static void 
bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
 bs->backing->bs->filename);
 }
 
-static void bdrv_mirror_top_close(BlockDriverState *bs)
-{
-}
-
 static void bdrv_mirror_top_child_perm(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
@@ -1456,7 +1452,6 @@ static BlockDriver bdrv_mirror_top = {
 .bdrv_co_flush  = bdrv_mirror_top_flush,
 .bdrv_co_block_status   = bdrv_co_block_status_from_backing,
 .bdrv_refresh_filename  = bdrv_mirror_top_refresh_filename,
-.bdrv_close = bdrv_mirror_top_close,
 .bdrv_child_perm= bdrv_mirror_top_child_perm,
 };
 
diff --git a/block/null.c b/block/null.c
index 5d610fdfba..d442d3e901 100644
--- a/block/null.c
+++ b/block/null.c
@@ -97,10 +97,6 @@ static int null_file_open(BlockDriverState *bs, QDict 
*options, int flags,
 return ret;
 }
 
-static void null_close(BlockDriverState *bs)
-{
-}
-
 static int64_t null_getlength(BlockDriverState *bs)
 {
 BDRVNullState *s = bs->opaque;
@@ -263,7 +259,6 @@ static BlockDriver bdrv_null_co = {
 
 .bdrv_file_open = null_file_open,
 .bdrv_parse_filename= null_co_parse_filename,
-.bdrv_close = null_close,
 .bdrv_getlength = null_getlength,
 
 .bdrv_co_preadv = null_co_preadv,
@@ -283,7 +278,6 @@ static BlockDriver bdrv_null_aio = {
 
 .bdrv_file_open = null_file_open,
 .bdrv_parse_filename= null_aio_parse_filename,
-.bdrv_close = null_close,
 .bdrv_getlength = null_getlength,
 
 .bdrv_aio_preadv= null_aio_preadv,
diff --git a/block/raw-format.c b/block/raw-format.c
index 2fd69cdb08..6f6dc99b2c 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -459,10 +459,6 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 return 0;
 }
 
-static void raw_close(BlockDr

[Qemu-block] [PULL 12/21] mirror: Fail gracefully for source == target

2018-08-15 Thread Kevin Wolf
blockdev-mirror with the same node for source and target segfaults
today: A node is in its own backing chain, so mirror_start_job() decides
that this is an active commit. When adding the intermediate nodes with
block_job_add_bdrv(), it starts the iteration through the subchain with
the backing file of source, though, so it never reaches target and
instead runs into NULL at the base.

While we could fix that by starting with source itself, there is no
point in allowing mirroring a node into itself and I wouldn't be
surprised if this caused more problems later.

So just check for this scenario and error out.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
---
 block/mirror.c | 5 +
 tests/qemu-iotests/041 | 6 ++
 tests/qemu-iotests/041.out | 4 ++--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b48c3f8cf5..dd5ca02b09 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1499,6 +1499,11 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 buf_size = DEFAULT_MIRROR_BUF_SIZE;
 }
 
+if (bs == target) {
+error_setg(errp, "Can't mirror node into itself");
+return;
+}
+
 /* In the case of active commit, add dummy driver to provide consistent
  * reads on the top, while disabling it in the intermediate nodes, and make
  * the backing chain writable. */
diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index c20ac7da87..9336ab6ff5 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -234,6 +234,12 @@ class TestSingleBlockdev(TestSingleDrive):
 result = self.vm.qmp("blockdev-add", **args)
 self.assert_qmp(result, 'return', {})
 
+def test_mirror_to_self(self):
+result = self.vm.qmp(self.qmp_cmd, job_id='job0',
+ device=self.qmp_target, sync='full',
+ target=self.qmp_target)
+self.assert_qmp(result, 'error/class', 'GenericError')
+
 test_large_cluster = None
 test_image_not_found = None
 test_small_buffer2 = None
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index c28b392b87..e071d0b261 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-.
+
 --
-Ran 85 tests
+Ran 88 tests
 
 OK
-- 
2.13.6




[Qemu-block] [PULL 11/21] qapi/block: Document restrictions for node names

2018-08-15 Thread Kevin Wolf
blockdev-add fails if an invalid node name is given, so we should
document what a valid node name even is.

Reported-by: Cong Li 
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Cong Li 
Reviewed-by: Alberto Garcia 
---
 qapi/block-core.json | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5b9084a394..f20efc97f7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3572,6 +3572,9 @@
 # @driver:block driver name
 # @node-name: the node name of the new node (Since 2.0).
 # This option is required on the top level of blockdev-add.
+# Valid node names start with an alphabetic character and may
+# contain only alphanumeric characters, '-', '.' and '_'. Their
+# maximum length is 31 characters.
 # @discard:   discard-related options (default: ignore)
 # @cache: cache-related options
 # @read-only: whether the block device should be read-only (default: 
false).
-- 
2.13.6




[Qemu-block] [PULL 21/21] qapi: block: Remove mentions of error types which were removed

2018-08-15 Thread Kevin Wolf
From: Peter Krempa 

Most of the various error classes were removed prior to the 1.2 release.
Remove mentions of the error classes which did not make it.

Signed-off-by: Peter Krempa 
Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index f20efc97f7..4c7a37afdc 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1499,11 +1499,8 @@
 #autogenerated. (Since: 2.9)
 #
 # Returns: Nothing on success
-#  If commit or stream is already active on this device, DeviceInUse
 #  If @device does not exist, DeviceNotFound
-#  If image commit is not supported by this device, NotSupported
-#  If @base or @top is invalid, a generic error is returned
-#  If @speed is invalid, InvalidParameter
+#  Any other error returns a GenericError.
 #
 # Since: 1.3
 #
-- 
2.13.6




[Qemu-block] [PULL 09/21] block: Remove deprecated -drive option serial

2018-08-15 Thread Kevin Wolf
This reinstates commit b0083267444a5e0f28391f6c2831a539f878d424,
which was temporarily reverted for the 3.0 release so that libvirt gets
some extra time to update their command lines.

The -drive option serial was deprecated in QEMU 2.10. It's time to
remove it.

Tests need to be updated to set the serial number with -global instead
of using the -drive option.

Signed-off-by: Kevin Wolf 
Reviewed-by: Markus Armbruster 
Reviewed-by: Jeff Cody 
---
 include/hw/block/block.h  |  1 -
 include/sysemu/blockdev.h |  1 -
 block/block-backend.c |  1 -
 blockdev.c| 10 --
 hw/block/block.c  | 13 -
 hw/block/nvme.c   |  1 -
 hw/block/virtio-blk.c |  1 -
 hw/ide/qdev.c |  1 -
 hw/scsi/scsi-disk.c   |  1 -
 hw/usb/dev-storage.c  |  1 -
 tests/ahci-test.c |  6 +++---
 tests/ide-test.c  |  8 
 qemu-deprecated.texi  |  5 -
 qemu-options.hx   |  6 +-
 14 files changed, 8 insertions(+), 48 deletions(-)

diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index d4f4dfffab..e9f9e2223f 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -72,7 +72,6 @@ static inline unsigned int get_physical_block_exp(BlockConf 
*conf)
 
 /* Configuration helpers */
 
-void blkconf_serial(BlockConf *conf, char **serial);
 bool blkconf_geometry(BlockConf *conf, int *trans,
   unsigned cyls_max, unsigned heads_max, unsigned secs_max,
   Error **errp);
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index c0ae3700ec..24954b94e0 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -35,7 +35,6 @@ struct DriveInfo {
 bool is_default;/* Added by default_drive() ?  */
 int media_cd;
 QemuOpts *opts;
-char *serial;
 QTAILQ_ENTRY(DriveInfo) next;
 };
 
diff --git a/block/block-backend.c b/block/block-backend.c
index f2f75a977d..fa120630be 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -419,7 +419,6 @@ static void drive_info_del(DriveInfo *dinfo)
 return;
 }
 qemu_opts_del(dinfo->opts);
-g_free(dinfo->serial);
 g_free(dinfo);
 }
 
diff --git a/blockdev.c b/blockdev.c
index 6c530769fd..37eb40670b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -731,10 +731,6 @@ QemuOptsList qemu_legacy_drive_opts = {
 .type = QEMU_OPT_STRING,
 .help = "interface (ide, scsi, sd, mtd, floppy, pflash, virtio)",
 },{
-.name = "serial",
-.type = QEMU_OPT_STRING,
-.help = "disk serial number",
-},{
 .name = "file",
 .type = QEMU_OPT_STRING,
 .help = "file name",
@@ -776,12 +772,10 @@ DriveInfo *drive_new(QemuOpts *all_opts, 
BlockInterfaceType block_default_type)
 const char *werror, *rerror;
 bool read_only = false;
 bool copy_on_read;
-const char *serial;
 const char *filename;
 Error *local_err = NULL;
 int i;
 const char *deprecated[] = {
-"serial"
 };
 
 /* Change legacy command line options into QMP ones */
@@ -949,9 +943,6 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type)
 goto fail;
 }
 
-/* Serial number */
-serial = qemu_opt_get(legacy_opts, "serial");
-
 /* no id supplied -> create one */
 if (qemu_opts_id(all_opts) == NULL) {
 char *new_id;
@@ -1026,7 +1017,6 @@ DriveInfo *drive_new(QemuOpts *all_opts, 
BlockInterfaceType block_default_type)
 dinfo->type = type;
 dinfo->bus = bus_id;
 dinfo->unit = unit_id;
-dinfo->serial = g_strdup(serial);
 
 blk_set_legacy_dinfo(blk, dinfo);
 
diff --git a/hw/block/block.c b/hw/block/block.c
index b6c80ab0b7..cf0eb826f1 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -15,19 +15,6 @@
 #include "qapi/qapi-types-block.h"
 #include "qemu/error-report.h"
 
-void blkconf_serial(BlockConf *conf, char **serial)
-{
-DriveInfo *dinfo;
-
-if (!*serial) {
-/* try to fall back to value set with legacy -drive serial=... */
-dinfo = blk_legacy_dinfo(conf->blk);
-if (dinfo) {
-*serial = g_strdup(dinfo->serial);
-}
-}
-}
-
 void blkconf_blocksizes(BlockConf *conf)
 {
 BlockBackend *blk = conf->blk;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 5e508ab1b3..fc7dacb816 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1217,7 +1217,6 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 return;
 }
 
-blkconf_serial(&n->conf, &n->serial);
 if (!n->serial) {
 error_setg(errp, "serial property not set");
 return;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 50b5c869e3..225fe44b7a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -935,7 +935,6 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
-blkc

[Qemu-block] [PULL 07/21] block: Remove deprecated -drive geometry options

2018-08-15 Thread Kevin Wolf
This reinstates commit a7aff6dd10b16b67e8b142d0c94c5d92c3fe88f6,
which was temporarily reverted for the 3.0 release so that libvirt gets
some extra time to update their command lines.

The -drive options cyls, heads, secs and trans were deprecated in
QEMU 2.10. It's time to remove them.

hd-geo-test tested both the old version with geometry options in -drive
and the new one with -device. Therefore the code using -drive doesn't
have to be replaced there, we just need to remove the -drive test cases.
This in turn allows some simplification of the code.

Signed-off-by: Kevin Wolf 
Reviewed-by: Markus Armbruster 
---
 include/sysemu/blockdev.h |  1 -
 blockdev.c| 75 +--
 hw/block/block.c  | 14 -
 tests/hd-geo-test.c   | 37 +--
 hmp-commands.hx   |  1 -
 qemu-deprecated.texi  |  5 
 qemu-options.hx   |  7 +
 7 files changed, 9 insertions(+), 131 deletions(-)

diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index ac22f2ae1f..37ea39719e 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -35,7 +35,6 @@ struct DriveInfo {
 int auto_del;   /* see blockdev_mark_auto_del() */
 bool is_default;/* Added by default_drive() ?  */
 int media_cd;
-int cyls, heads, secs, trans;
 QemuOpts *opts;
 char *serial;
 QTAILQ_ENTRY(DriveInfo) next;
diff --git a/blockdev.c b/blockdev.c
index dcf8c8d2ab..c23587b075 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -731,22 +731,6 @@ QemuOptsList qemu_legacy_drive_opts = {
 .type = QEMU_OPT_STRING,
 .help = "interface (ide, scsi, sd, mtd, floppy, pflash, virtio)",
 },{
-.name = "cyls",
-.type = QEMU_OPT_NUMBER,
-.help = "number of cylinders (ide disk geometry)",
-},{
-.name = "heads",
-.type = QEMU_OPT_NUMBER,
-.help = "number of heads (ide disk geometry)",
-},{
-.name = "secs",
-.type = QEMU_OPT_NUMBER,
-.help = "number of sectors (ide disk geometry)",
-},{
-.name = "trans",
-.type = QEMU_OPT_STRING,
-.help = "chs translation (auto, lba, none)",
-},{
 .name = "addr",
 .type = QEMU_OPT_STRING,
 .help = "pci address (virtio only)",
@@ -792,7 +776,6 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type)
 QemuOpts *legacy_opts;
 DriveMediaType media = MEDIA_DISK;
 BlockInterfaceType type;
-int cyls, heads, secs, translation;
 int max_devs, bus_id, unit_id, index;
 const char *devaddr;
 const char *werror, *rerror;
@@ -803,7 +786,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type)
 Error *local_err = NULL;
 int i;
 const char *deprecated[] = {
-"serial", "trans", "secs", "heads", "cyls", "addr"
+"serial", "addr"
 };
 
 /* Change legacy command line options into QMP ones */
@@ -932,57 +915,6 @@ DriveInfo *drive_new(QemuOpts *all_opts, 
BlockInterfaceType block_default_type)
 type = block_default_type;
 }
 
-/* Geometry */
-cyls  = qemu_opt_get_number(legacy_opts, "cyls", 0);
-heads = qemu_opt_get_number(legacy_opts, "heads", 0);
-secs  = qemu_opt_get_number(legacy_opts, "secs", 0);
-
-if (cyls || heads || secs) {
-if (cyls < 1) {
-error_report("invalid physical cyls number");
-goto fail;
-}
-if (heads < 1) {
-error_report("invalid physical heads number");
-goto fail;
-}
-if (secs < 1) {
-error_report("invalid physical secs number");
-goto fail;
-}
-}
-
-translation = BIOS_ATA_TRANSLATION_AUTO;
-value = qemu_opt_get(legacy_opts, "trans");
-if (value != NULL) {
-if (!cyls) {
-error_report("'%s' trans must be used with cyls, heads and secs",
- value);
-goto fail;
-}
-if (!strcmp(value, "none")) {
-translation = BIOS_ATA_TRANSLATION_NONE;
-} else if (!strcmp(value, "lba")) {
-translation = BIOS_ATA_TRANSLATION_LBA;
-} else if (!strcmp(value, "large")) {
-translation = BIOS_ATA_TRANSLATION_LARGE;
-} else if (!strcmp(value, "rechs")) {
-translation = BIOS_ATA_TRANSLATION_RECHS;
-} else if (!strcmp(value, "auto")) {
-translation = BIOS_ATA_TRANSLATION_AUTO;
-} else {
-error_report("'%s' invalid translation type", value);
-goto fail;
-}
-}
-
-if (media == MEDIA_CDROM) {
-if (cyls || secs || heads) {
-error_report("CHS can't be set with media=cdrom");
-goto fail;
-}
-}
-
 /* Device address specified by bus

[Qemu-block] [PULL 02/21] qemu-iotests: Test removing a throttle group member with a pending timer

2018-08-15 Thread Kevin Wolf
From: Alberto Garcia 

A throttle group can have several members, and each one of them can
have several pending requests in the queue.

The requests are processed in a round-robin fashion, so the algorithm
decides the drive that is going to run the next request and sets a
timer in it. Once the timer fires and the throttled request is run
then the next drive from the group is selected and a new timer is set.

If the user tried to remove a drive from a group and that drive had a
timer set then the code was not taking care of setting up a new timer
in one of the remaining members of the group, freezing their I/O.

This problem was fixed in 6fccbb475bc6effc313ee9481726a1748b6dae57,
and this patch adds a new test case that reproduces this exact
scenario.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/093 | 52 ++
 tests/qemu-iotests/093.out |  4 ++--
 2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/093 b/tests/qemu-iotests/093
index 68e344f8c1..b26cd34e32 100755
--- a/tests/qemu-iotests/093
+++ b/tests/qemu-iotests/093
@@ -208,6 +208,58 @@ class ThrottleTestCase(iotests.QMPTestCase):
 limits[tk] = rate
 self.do_test_throttle(ndrives, 5, limits)
 
+# Test that removing a drive from a throttle group should not
+# affect the remaining members of the group.
+# https://bugzilla.redhat.com/show_bug.cgi?id=1535914
+def test_remove_group_member(self):
+# Create a throttle group with two drives
+# and set a 4 KB/s read limit.
+params = {"bps": 0,
+  "bps_rd": 4096,
+  "bps_wr": 0,
+  "iops": 0,
+  "iops_rd": 0,
+  "iops_wr": 0 }
+self.configure_throttle(2, params)
+
+# Read 4KB from drive0. This is performed immediately.
+self.vm.hmp_qemu_io("drive0", "aio_read 0 4096")
+
+# Read 4KB again. The I/O limit has been exceeded so this
+# request is throttled and a timer is set to wake it up.
+self.vm.hmp_qemu_io("drive0", "aio_read 0 4096")
+
+# Read from drive1. We're still over the I/O limit so this
+# request is also throttled. There's no timer set in drive1
+# because there's already one in drive0. Once the timer in
+# drive0 fires and its throttled request is processed then the
+# next request in the queue will be scheduled: this one.
+self.vm.hmp_qemu_io("drive1", "aio_read 0 4096")
+
+# At this point only the first 4KB have been read from drive0.
+# The other requests are throttled.
+self.assertEqual(self.blockstats('drive0')[0], 4096)
+self.assertEqual(self.blockstats('drive1')[0], 0)
+
+# Remove drive0 from the throttle group and disable its I/O limits.
+# drive1 remains in the group with a throttled request.
+params['bps_rd'] = 0
+params['device'] = 'drive0'
+result = self.vm.qmp("block_set_io_throttle", conv_keys=False, 
**params)
+self.assert_qmp(result, 'return', {})
+
+# Removing the I/O limits from drive0 drains its pending request.
+# The read request in drive1 is still throttled.
+self.assertEqual(self.blockstats('drive0')[0], 8192)
+self.assertEqual(self.blockstats('drive1')[0], 0)
+
+# Advance the clock 5 seconds. This completes the request in drive1
+self.vm.qtest("clock_step %d" % (5 * nsec_per_sec))
+
+# Now all requests have been processed.
+self.assertEqual(self.blockstats('drive0')[0], 8192)
+self.assertEqual(self.blockstats('drive1')[0], 4096)
+
 class ThrottleTestCoroutine(ThrottleTestCase):
 test_img = "null-co://"
 
diff --git a/tests/qemu-iotests/093.out b/tests/qemu-iotests/093.out
index 594c16f49f..36376bed87 100644
--- a/tests/qemu-iotests/093.out
+++ b/tests/qemu-iotests/093.out
@@ -1,5 +1,5 @@
-
+..
 --
-Ran 8 tests
+Ran 10 tests
 
 OK
-- 
2.13.6




[Qemu-block] [PULL 06/21] luks: Allow share-rw=on

2018-08-15 Thread Kevin Wolf
From: Fam Zheng 

Format drivers such as qcow2 don't allow sharing the same image between
two QEMU instances in order to prevent image corruptions, because of
metadata cache. LUKS driver don't modify metadata except for when
creating image, so it is safe to relax the permission. This makes
share-rw=on property work on virtual devices.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Fam Zheng 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Kevin Wolf 
---
 block/crypto.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index 146d81c90a..33ee01bebd 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -627,7 +627,9 @@ BlockDriver bdrv_crypto_luks = {
 .bdrv_probe = block_crypto_probe_luks,
 .bdrv_open  = block_crypto_open_luks,
 .bdrv_close = block_crypto_close,
-.bdrv_child_perm= bdrv_format_default_perms,
+/* This driver doesn't modify LUKS metadata except when creating image.
+ * Allow share-rw=on as a special case. */
+.bdrv_child_perm= bdrv_filter_default_perms,
 .bdrv_co_create = block_crypto_co_create_luks,
 .bdrv_co_create_opts = block_crypto_co_create_opts_luks,
 .bdrv_co_truncate   = block_crypto_co_truncate,
-- 
2.13.6




[Qemu-block] [PULL 05/21] throttle-groups: Don't allow timers without throttled requests

2018-08-15 Thread Kevin Wolf
From: Alberto Garcia 

Commit 6fccbb475bc6effc313ee9481726a1748b6dae57 fixed a bug caused by
QEMU attempting to remove a throttle group member with no pending
requests but an active timer set. This was the result of a previous
bdrv_drained_begin() call processing the throttled requests but
leaving the timer untouched.

Although the commit does solve the problem, the situation shouldn't
happen in the first place. If we try to drain a throttle group member
which has a timer set, we should cancel the timer instead of ignoring
it.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block/throttle-groups.c | 32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index d46c56b31e..5d8213a443 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -36,6 +36,7 @@
 
 static void throttle_group_obj_init(Object *obj);
 static void throttle_group_obj_complete(UserCreatable *obj, Error **errp);
+static void timer_cb(ThrottleGroupMember *tgm, bool is_write);
 
 /* The ThrottleGroup structure (with its ThrottleState) is shared
  * among different ThrottleGroupMembers and it's independent from
@@ -424,15 +425,31 @@ static void 
throttle_group_restart_queue(ThrottleGroupMember *tgm, bool is_write
 rd->tgm = tgm;
 rd->is_write = is_write;
 
+/* This function is called when a timer is fired or when
+ * throttle_group_restart_tgm() is called. Either way, there can
+ * be no timer pending on this tgm at this point */
+assert(!timer_pending(tgm->throttle_timers.timers[is_write]));
+
 co = qemu_coroutine_create(throttle_group_restart_queue_entry, rd);
 aio_co_enter(tgm->aio_context, co);
 }
 
 void throttle_group_restart_tgm(ThrottleGroupMember *tgm)
 {
+int i;
+
 if (tgm->throttle_state) {
-throttle_group_restart_queue(tgm, 0);
-throttle_group_restart_queue(tgm, 1);
+for (i = 0; i < 2; i++) {
+QEMUTimer *t = tgm->throttle_timers.timers[i];
+if (timer_pending(t)) {
+/* If there's a pending timer on this tgm, fire it now */
+timer_del(t);
+timer_cb(tgm, i);
+} else {
+/* Else run the next request from the queue manually */
+throttle_group_restart_queue(tgm, i);
+}
+}
 }
 }
 
@@ -567,16 +584,11 @@ void throttle_group_unregister_tgm(ThrottleGroupMember 
*tgm)
 return;
 }
 
-assert(tgm->pending_reqs[0] == 0 && tgm->pending_reqs[1] == 0);
-assert(qemu_co_queue_empty(&tgm->throttled_reqs[0]));
-assert(qemu_co_queue_empty(&tgm->throttled_reqs[1]));
-
 qemu_mutex_lock(&tg->lock);
 for (i = 0; i < 2; i++) {
-if (timer_pending(tgm->throttle_timers.timers[i])) {
-tg->any_timer_armed[i] = false;
-schedule_next_request(tgm, i);
-}
+assert(tgm->pending_reqs[i] == 0);
+assert(qemu_co_queue_empty(&tgm->throttled_reqs[i]));
+assert(!timer_pending(tgm->throttle_timers.timers[i]));
 if (tg->tokens[i] == tgm) {
 token = throttle_group_next_tgm(tgm);
 /* Take care of the case where this is the last tgm in the group */
-- 
2.13.6




[Qemu-block] [PULL 03/21] throttle-groups: Skip the round-robin if a member is being drained

2018-08-15 Thread Kevin Wolf
From: Alberto Garcia 

In the throttling code after an I/O request has been completed the
next one is selected from a different member using a round-robin
algorithm. This ensures that all members get a chance to finish their
pending I/O requests.

However, if a group member has its I/O limits disabled (because it's
being drained) then we should always give it priority in order to have
all its pending requests finished as soon as possible.

If we don't do this we could have a member in the process of being
drained waiting for the throttled requests of other members, for which
the I/O limits still apply.

This can have additional consequences: if we're running in qtest mode
(with QEMU_CLOCK_VIRTUAL) then timers can only fire if we advance the
clock manually, so attempting to drain a block device can hang QEMU in
the BDRV_POLL_WHILE() loop at the end of bdrv_do_drained_begin().

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block/throttle-groups.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index e297b04e17..d46c56b31e 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -221,6 +221,15 @@ static ThrottleGroupMember 
*next_throttle_token(ThrottleGroupMember *tgm,
 ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
 ThrottleGroupMember *token, *start;
 
+/* If this member has its I/O limits disabled then it means that
+ * it's being drained. Skip the round-robin search and return tgm
+ * immediately if it has pending requests. Otherwise we could be
+ * forcing it to wait for other member's throttled requests. */
+if (tgm_has_pending_reqs(tgm, is_write) &&
+atomic_read(&tgm->io_limits_disabled)) {
+return tgm;
+}
+
 start = token = tg->tokens[is_write];
 
 /* get next bs round in round robin style */
-- 
2.13.6




[Qemu-block] [PULL 01/21] block/qapi: Fix memory leak in qmp_query_blockstats()

2018-08-15 Thread Kevin Wolf
For BlockBackends that are skipped in query-blockstats, we would leak
info since commit 567dcb31. Allocate info only later to avoid the memory
leak.

Fixes: CID 1394727
Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Reviewed-by: Alberto Garcia 
---
 block/qapi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/qapi.c b/block/qapi.c
index 339727f0f4..c66f949db8 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -594,7 +594,7 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 }
 } else {
 for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) {
-BlockStatsList *info = g_malloc0(sizeof(*info));
+BlockStatsList *info;
 AioContext *ctx = blk_get_aio_context(blk);
 BlockStats *s;
 char *qdev;
@@ -619,6 +619,7 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 bdrv_query_blk_stats(s->stats, blk);
 aio_context_release(ctx);
 
+info = g_malloc0(sizeof(*info));
 info->value = s;
 *p_next = info;
 p_next = &info->next;
-- 
2.13.6




[Qemu-block] [PULL 00/21] Block layer patches

2018-08-15 Thread Kevin Wolf
The following changes since commit 38441756b70eec5807b5f60dad11a93a91199866:

  Update version for v3.0.0 release (2018-08-14 16:38:43 +0100)

are available in the git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to b5fc2d306664c0c1c6c5cf8e164ffa7b8892283e:

  qapi: block: Remove mentions of error types which were removed (2018-08-15 
12:50:39 +0200)


Block layer patches:

- Remove deprecated -drive options for geometry/serial/addr
- luks: Allow shared writers if the parents allow them (share-rw=on)
- qemu-img: Fix error when trying to convert to encrypted target image
- mirror: Fail gracefully for source == target
- I/O throttling: Fix behaviour during drain (always ignore the limits)
- bdrv_reopen() related fixes for bs->options/explicit_options content
- Documentation improvements


Alberto Garcia (9):
  qemu-iotests: Test removing a throttle group member with a pending timer
  throttle-groups: Skip the round-robin if a member is being drained
  qemu-iotests: Update 093 to improve the draining test
  throttle-groups: Don't allow timers without throttled requests
  qdict: Make qdict_extract_subqdict() accept dst = NULL
  block: Remove children options from bs->{options,explicit_options}
  block: Simplify bdrv_reopen_abort()
  block: Update bs->options if bdrv_reopen() succeeds
  block: Simplify append_open_options()

Daniel P. Berrangé (1):
  qemu-img: fix regression copying secrets during convert

Fam Zheng (1):
  luks: Allow share-rw=on

Kevin Wolf (7):
  block/qapi: Fix memory leak in qmp_query_blockstats()
  block: Remove deprecated -drive geometry options
  block: Remove deprecated -drive option addr
  block: Remove deprecated -drive option serial
  block: Remove dead deprecation warning code
  qapi/block: Document restrictions for node names
  mirror: Fail gracefully for source == target

Peter Krempa (1):
  qapi: block: Remove mentions of error types which were removed

Vladimir Sementsov-Ogievskiy (2):
  block: make .bdrv_close optional
  block: drop empty .bdrv_close handlers

 qapi/block-core.json   |   8 ++--
 include/hw/block/block.h   |   1 -
 include/sysemu/blockdev.h  |   3 --
 block.c|  46 ++-
 block/blkreplay.c  |   5 ---
 block/block-backend.c  |   1 -
 block/commit.c |   5 ---
 block/copy-on-read.c   |   6 ---
 block/crypto.c |   4 +-
 block/mirror.c |  10 ++---
 block/null.c   |   6 ---
 block/qapi.c   |   3 +-
 block/raw-format.c |   5 ---
 block/snapshot.c   |   4 +-
 block/throttle-groups.c|  41 -
 blockdev.c | 110 -
 device-hotplug.c   |   4 --
 hw/block/block.c   |  27 ---
 hw/block/nvme.c|   1 -
 hw/block/virtio-blk.c  |   1 -
 hw/ide/qdev.c  |   1 -
 hw/scsi/scsi-disk.c|   1 -
 hw/usb/dev-storage.c   |   1 -
 qemu-img.c |  32 +++--
 qobject/block-qdict.c  |  11 +++--
 tests/ahci-test.c  |   6 +--
 tests/hd-geo-test.c|  37 +++
 tests/ide-test.c   |   8 ++--
 hmp-commands.hx|   1 -
 qemu-deprecated.texi   |  15 ---
 qemu-options.hx|  14 +-
 tests/qemu-iotests/041 |   6 +++
 tests/qemu-iotests/041.out |   4 +-
 tests/qemu-iotests/093 |  55 +++
 tests/qemu-iotests/093.out |   4 +-
 35 files changed, 185 insertions(+), 302 deletions(-)



[Qemu-block] [PULL 04/21] qemu-iotests: Update 093 to improve the draining test

2018-08-15 Thread Kevin Wolf
From: Alberto Garcia 

The previous patch fixes a problem in which draining a block device
with more than one throttled request can make it wait first for the
completion of requests in other members of the same group.

This patch updates test_remove_group_member() in iotest 093 to
reproduce that scenario. This updated test would hang QEMU without the
fix from the previous patch.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/093 | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/tests/qemu-iotests/093 b/tests/qemu-iotests/093
index b26cd34e32..9d1971a56c 100755
--- a/tests/qemu-iotests/093
+++ b/tests/qemu-iotests/093
@@ -225,15 +225,18 @@ class ThrottleTestCase(iotests.QMPTestCase):
 # Read 4KB from drive0. This is performed immediately.
 self.vm.hmp_qemu_io("drive0", "aio_read 0 4096")
 
-# Read 4KB again. The I/O limit has been exceeded so this
+# Read 2KB. The I/O limit has been exceeded so this
 # request is throttled and a timer is set to wake it up.
-self.vm.hmp_qemu_io("drive0", "aio_read 0 4096")
+self.vm.hmp_qemu_io("drive0", "aio_read 0 2048")
+
+# Read 2KB again. We're still over the I/O limit so this is
+# request is also throttled, but no new timer is set since
+# there's already one.
+self.vm.hmp_qemu_io("drive0", "aio_read 0 2048")
 
-# Read from drive1. We're still over the I/O limit so this
-# request is also throttled. There's no timer set in drive1
-# because there's already one in drive0. Once the timer in
-# drive0 fires and its throttled request is processed then the
-# next request in the queue will be scheduled: this one.
+# Read from drive1. This request is also throttled, and no
+# timer is set in drive1 because there's already one in
+# drive0.
 self.vm.hmp_qemu_io("drive1", "aio_read 0 4096")
 
 # At this point only the first 4KB have been read from drive0.
@@ -248,7 +251,7 @@ class ThrottleTestCase(iotests.QMPTestCase):
 result = self.vm.qmp("block_set_io_throttle", conv_keys=False, 
**params)
 self.assert_qmp(result, 'return', {})
 
-# Removing the I/O limits from drive0 drains its pending request.
+# Removing the I/O limits from drive0 drains its two pending requests.
 # The read request in drive1 is still throttled.
 self.assertEqual(self.blockstats('drive0')[0], 8192)
 self.assertEqual(self.blockstats('drive1')[0], 0)
-- 
2.13.6




Re: [Qemu-block] [Libguestfs] [PATCH] v2v: Add --print-estimate option to print source size estimate.

2018-08-15 Thread Kevin Wolf
[ Cc: qemu-block ]

Am 15.08.2018 um 12:55 hat Richard W.M. Jones geschrieben:
> (Adding Stefan who implemented the subcommand)
> 
> On Wed, Aug 15, 2018 at 12:44:44PM +0200, Kevin Wolf wrote:
> > Am 15.08.2018 um 12:26 hat Richard W.M. Jones geschrieben:
> > > On Tue, Aug 14, 2018 at 09:31:06PM +0300, Nir Soffer wrote:
> > > > On Tue, Aug 14, 2018 at 8:29 PM Richard W.M. Jones 
> > > > wrote:
> > > > 
> > > > > This option prints the estimated size of the data that will be copied
> > > > > from the source disk.
> > > > >
> > > > > For interest, the test prints:
> > > > >
> > > > > 3747840 ../test-data/phony-guests/windows.img
> > > > > Estimate: 3710976
> > > > >
> > > > 
> > > > Why not use qemu-img measure on the overlay?
> > > 
> > > Yes this would be better, but oddly it doesn't work properly when
> > > the output format is set to 'raw':
> > > 
> > >   /usr/bin/qemu-img 'measure' '-f' 'qcow2' '-O' 'raw' '--output=human' 
> > > '/home/rjones/d/libguestfs/tmp/v2vovl62b7c2.qcow2'
> > >   required size: 6442450944
> > >   fully allocated size: 6442450944
> > > 
> > > However it's OK if the output format is set to 'qcow2':
> > > 
> > >   /usr/bin/qemu-img 'measure' '-f' 'qcow2' '-O' 'qcow2' '--output=human' 
> > > '/home/rjones/d/libguestfs/tmp/v2vovla53d7c.qcow2'
> > >   required size: 1047986176
> > >   fully allocated size: 6443696128
> > > 
> > > I guess it ignores sparseness of raw images, but I can't see a way to
> > > specify that on the command line.
> > > 
> > > OTOH the qcow2 figure is probably a good enough guess for our purposes
> > > (ie. estimating how much data will be copied).
> > 
> > 'qemu-img measure' calculates the resulting file size, not the number of
> > used blocks. I think this is because its main purpose is to create block
> > devices (like LVs) of the right size, so sparseness on a filesystem
> > level doesn't buy you anything.
> 
> But if I run ‘qemu-img convert ... -O raw output.raw’ then output.raw
> will be a sparse file, and that's the file size I'd expect measure to
> give us for "required size" (of course "fully allocated size" would be
> the virtual size, and that is correct).
> 
> It does look as if the block/raw-format.c:raw_measure function is wrong.

No, it is correct. Its output is what the file size will be. For raw
images, that is the same as the virtual disk size.

Not all blocks in the file will be actually allocated, but the file size
is what 'ls -l' prints, not what 'du' prints (for regular files).

It becomes even clearer when you create LVs as the target. If you have a
10 GB image in which only the last 1 MB actually contains data, you
still need a 10 GB volume. You can't create a 1 MB volume and then store
data at an offset 10 GB - 1 MB, that would be way after the end of the
block device.

> In any case we can use the qcow2 estimate for our purposes as it will
> be near enough what we need (a rough estimate of the size of the copy).

I don't know what the exact purpose is, and it feels a bit hacky, but it
might be good enough.

I suppose what you really want is that 'qemu-img measure' provides
another number for the space taken by allocated blocks. (Probably
excluding metadata for non-raw formats? Might depend on the actual
purpose.)

Kevin



Re: [Qemu-block] [Libguestfs] [PATCH] v2v: Add --print-estimate option to print source size estimate.

2018-08-15 Thread Nir Soffer
On Wed, Aug 15, 2018 at 1:55 PM Richard W.M. Jones 
wrote:

> (Adding Stefan who implemented the subcommand)
>
> On Wed, Aug 15, 2018 at 12:44:44PM +0200, Kevin Wolf wrote:
> > Am 15.08.2018 um 12:26 hat Richard W.M. Jones geschrieben:
> > > On Tue, Aug 14, 2018 at 09:31:06PM +0300, Nir Soffer wrote:
> > > > On Tue, Aug 14, 2018 at 8:29 PM Richard W.M. Jones <
> rjo...@redhat.com>
> > > > wrote:
> > > >
> > > > > This option prints the estimated size of the data that will be
> copied
> > > > > from the source disk.
> > > > >
> > > > > For interest, the test prints:
> > > > >
> > > > > 3747840 ../test-data/phony-guests/windows.img
> > > > > Estimate: 3710976
> > > > >
> > > >
> > > > Why not use qemu-img measure on the overlay?
> > >
> > > Yes this would be better, but oddly it doesn't work properly when
> > > the output format is set to 'raw':
> > >
> > >   /usr/bin/qemu-img 'measure' '-f' 'qcow2' '-O' 'raw' '--output=human'
> '/home/rjones/d/libguestfs/tmp/v2vovl62b7c2.qcow2'
> > >   required size: 6442450944
> > >   fully allocated size: 6442450944
> > >
> > > However it's OK if the output format is set to 'qcow2':
> > >
> > >   /usr/bin/qemu-img 'measure' '-f' 'qcow2' '-O' 'qcow2'
> '--output=human' '/home/rjones/d/libguestfs/tmp/v2vovla53d7c.qcow2'
> > >   required size: 1047986176
> > >   fully allocated size: 6443696128
> > >
> > > I guess it ignores sparseness of raw images, but I can't see a way to
> > > specify that on the command line.
> > >
> > > OTOH the qcow2 figure is probably a good enough guess for our purposes
> > > (ie. estimating how much data will be copied).
> >
> > 'qemu-img measure' calculates the resulting file size, not the number of
> > used blocks. I think this is because its main purpose is to create block
> > devices (like LVs) of the right size, so sparseness on a filesystem
> > level doesn't buy you anything.
>
> But if I run ‘qemu-img convert ... -O raw output.raw’ then output.raw
> will be a sparse file, and that's the file size I'd expect measure to
> give us for "required size" (of course "fully allocated size" would be
> the virtual size, and that is correct).
>
> It does look as if the block/raw-format.c:raw_measure function is wrong.
>

Yes, it looks like the "required-size" value for raw output format is not
very helpful. We could report there the number of bytes that would be
allocated after the image is copied.


>
> In any case we can use the qcow2 estimate for our purposes as it will
> be near enough what we need (a rough estimate of the size of the copy).
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat
> http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-p2v converts physical machines to virtual machines.  Boot with a
> live CD or over the network (PXE) and turn machines into KVM guests.
> http://libguestfs.org/virt-v2v
>


Re: [Qemu-block] [Qemu-devel] [PATCH 00/33] Qtest driver framework

2018-08-15 Thread Markus Armbruster
Emanuele Giuseppe Esposito  writes:

> Qgraph API for the qtest driver framework
>
> This series of patches introduce a different qtest driver
> organization, viewing machines, drivers and tests as node in a
> graph, each having one or multiple edges relations.
>
> The idea is to have a framework where each test asks for a specific
> driver, and the framework takes care of allocating the proper devices
> required and passing the correct command line arguments to QEMU.
>
> A node can be of four types:
> - MACHINE:   for example "arm/raspi2"
> - DRIVER:for example "generic-sdhci"
> - INTERFACE: for example "sdhci" (interface for all "-sdhci" drivers)
> - TEST:  for example "sdhci-test", consumes an interface and tests
>  the functions provided
>
> An edge relation between two nodes (drivers or machines) X and Y can be:
> - X CONSUMES Y: Y can be plugged into X
> - X PRODUCES Y: X provides the interface Y
> - X CONTAINS Y: Y is part of X component
>
> Basic framework steps are the following:
> - All nodes and edges are created in their respective machine/driver/test 
> files
> - The framework starts QEMU and asks for a list of available devices
>   and machines
> - The framework walks the graph starting from the available machines and
>   performs a Depth First Search for tests
> - Once a test is found, the path is walked again and all drivers are
>   allocated accordingly and the final interface is passed to the test
> - The test is executed
> - Unused objects are cleaned and the path discovery is continued
>
> Depending on the QEMU binary used, only some drivers/machines will be 
> available
> and only test that are reached by them will be executed.
>
> This work is being done as Google Summer of Code 2018 project for QEMU,
> my mentors are Paolo Bonzini and Laurent Vivier.
> Additional infos on the project can be found at:
> https://wiki.qemu.org/Features/qtest_driver_framework

You've likely answered this question somewhere already, but it should be
answered right here: what does this framework buy us?

I figure the main benefit is running tests in all possible contexts,
with automated context setup.  Instead, you feed the framework
declarations of how things are related (the graph stuff described
above).

Taking a step back: your cover letter describes a solution at a high
level.  That's good.  But it should *first* describe the problem you're
trying to solve.

> v3:
> - Minor fixes regarding memory leaks and naming
>
> Signed-off-by: Emanuele Giuseppe Esposito 



Re: [Qemu-block] [PATCH] qapi: block: Remove mentions of error types which were removed

2018-08-15 Thread Kevin Wolf
Am 15.08.2018 um 12:26 hat Peter Krempa geschrieben:
> Most of the various error classes were removed prior to the 1.2 release.
> Remove mentions of the error classes which did not make it.
> 
> Signed-off-by: Peter Krempa 

Thanks, applied to the block branch.

Kevin



[Qemu-block] [PATCH] qapi: block: Remove mentions of error types which were removed

2018-08-15 Thread Peter Krempa
Most of the various error classes were removed prior to the 1.2 release.
Remove mentions of the error classes which did not make it.

Signed-off-by: Peter Krempa 
---
 qapi/block-core.json | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5b9084a394..c5407f801a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1499,11 +1499,8 @@
 #autogenerated. (Since: 2.9)
 #
 # Returns: Nothing on success
-#  If commit or stream is already active on this device, DeviceInUse
 #  If @device does not exist, DeviceNotFound
-#  If image commit is not supported by this device, NotSupported
-#  If @base or @top is invalid, a generic error is returned
-#  If @speed is invalid, InvalidParameter
+#  Any other error returns a GenericError.
 #
 # Since: 1.3
 #
-- 
2.17.1




Re: [Qemu-block] [PATCH] doc: Permit BLOCK_STATUS reply to extend beyond request

2018-08-15 Thread Vladimir Sementsov-Ogievskiy

ping.

Finally, Qemu 3.0 released, and it is incompatible with current NBD 
protocol. Please, commit this patch.


13.08.2018 15:02, Vladimir Sementsov-Ogievskiy wrote:

ping

Qemu 3.0 is near to release, and it is not correspond to current 
version of spec, it corresponds to this patch. Can we push it?


30.05.2018 03:35, Eric Blake wrote:

When the NBD_CMD_BLOCK_STATUS extension was first discussed, the
idea of having the client's length be a hint was proposed, where
the server could reply beyond the client's request in order to
allow for fewer transactions when querying the entire disk. The
portion beyond the client's original request can only occur in
the final extent for a given context, and only if the additional
length matches the type given for the last byte actually requested
by the client.

In the meantime, qemu 2.12 was released as a first client
implementation of NBD_CMD_BLOCK_STATUS, which always sends the
NBD_CMD_FLAG_REQ_ONE flag, and which disconnects from the server
if the server's length exceeds the client request.  This was
relaxed for subsequent qemu, but it means that we have to be
explicit that a server should not send extra length except when
the client is not limiting its request to exactly one extent.

Signed-off-by: Eric Blake 
---

I brought this topic up a few days back, but realized I never
actually posted a patch to go along with it.

  doc/proto.md | 25 ++---
  1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 4b73e0b..bd5f61e 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -1709,8 +1709,8 @@ MUST initiate a hard disconnect.

    *length* MUST be 4 + (a positive integer multiple of 8). This reply
    represents a series of consecutive block descriptors where the sum
-  of the length fields within the descriptors MUST not be greater than
-  the length of the original request. This chunk type MUST appear
+  of the length fields within the descriptors are subject to further
+  constraints documented below. This chunk type MUST appear
    exactly once per metadata ID in a structured reply.

    The payload starts with:
@@ -1725,7 +1725,14 @@ MUST initiate a hard disconnect.
    32 bits, status flags

    If the client used the `NBD_CMD_FLAG_REQ_ONE` flag in the request,
-  then every reply chunk MUST NOT contain more than one descriptor.
+  then every reply chunk MUST NOT contain more than one descriptor,
+  and the descriptor MUST NOT exceed the *length* of the request.
+  Otherwise, when the server returns N extents for a given context,
+  the sum of the *length* fields of the first N-1 extents MUST be
+  smaller than the overall *length* of the client's request, but the
+  final extent MAY exceed the requested length if the server has that
+  information anyway as a side effect of reporting the status of the
+  requested region.

    Even if the client did not use the `NBD_CMD_FLAG_REQ_ONE` flag in
    its request, the server MAY return fewer descriptors in the reply
@@ -2037,10 +2044,14 @@ The following request types exist:

  The list of block status descriptors within the
  `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
-    of the file starting from specified *offset*, and the sum of the
-    *length* fields of each descriptor MUST not be greater than the
-    overall *length* of the request. This means that the server MAY
-    return less data than required. However the server MUST return at
+    of the file starting from specified *offset*.  If the client used
+    the `NBD_CMD_FLAG_REQ_ONE` flag, each chunk contains exactly one
+    descriptor where the *length* of the descriptor MUST NOT be greater
+    than the *length* of the request; otherwise, a chunk MAY contain
+    multiple descriptors, and the final descriptor MAY extend beyond
+    the original requested size if the server can determine a larger
+    length without additional effort.  On the other hand, the server 
MAY

+    return less data than requested. However the server MUST return at
  least one status descriptor (and since each status descriptor has
  a non-zero length, a client can always make progress on a
  successful return).  The server SHOULD use different *status*






--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [RFC PATCH] rbd: Don't convert keypairs to JSON and back

2018-08-15 Thread Markus Armbruster
Max Reitz  writes:

> On 2018-07-28 06:32, Jeff Cody wrote:
>> On Wed, Jul 25, 2018 at 05:01:44PM +0100, Daniel P. Berrangé wrote:
>>> On Wed, Jul 25, 2018 at 10:56:48AM -0500, Eric Blake wrote:
 On 07/25/2018 10:10 AM, Markus Armbruster wrote:
> qemu_rbd_parse_filename() builds a keypairs QList, converts it to JSON, 
> and
> stores the resulting QString in a QDict.
>
> qemu_rbd_co_create_opts() and qemu_rbd_open() get the QString from the
> QDict, pass it to qemu_rbd_set_keypairs(), which converts it back into
> a QList.
>
> Drop both conversions, store the QList instead.
>
> This affects output of qemu-img info.  Before this patch:
>
>  $ qemu-img info 
> rbd:rbd/testimg.raw:mon_host=192.168.15.180:rbd_cache=true:conf=/tmp/ceph.conf
>  [junk printed by Ceph library code...]
>  image: json:{"driver": "raw", "file": {"pool": "rbd", "image": 
> "testimg.raw", "conf": "/tmp/ceph.conf", "driver": "rbd", 
> "=keyvalue-pairs": "[\"mon_host\", \"192.168.15.180\", \"rbd_cache\", 
> \"true\"]"}}
>  [more output, not interesting here]
>
> After this patch, we get
>
>  image: json:{"driver": "raw", "file": {"pool": "rbd", "image": 
> "testimg.raw", "conf": "/tmp/ceph.conf", "driver": "rbd", 
> "=keyvalue-pairs": ["mon_host", "192.168.15.180", "rbd_cache", "true"]}}
>
> The value of member "=keyvalue-pairs" changes from a string containing
> a JSON array to that JSON array.  That's an improvement of sorts.  
> However:
>
> * Should "=keyvalue-pairs" even be visible here?  It's supposed to be
>purely internal...

 I'd argue that since it is supposed to be internal (as evidenced by the
 leading '=' that does not name a normal variable), changing it doesn't hurt
 stability. But yes, it would be nicer if we could filter it entirely so 
 that
 it does not appear in json: output, if it doesn't truly affect the contents
 that the guest would see.
>>>
>>> If it appears in the json: output, then that means it could get written
>>> into qcow2 headers as a backing file name, which would make it ABI
>>> sensitive. This makes it even more important to filter it if it is supposed
>>> to be internal only, with no ABI guarantee.
>>>
>> 
>> It's been present for a couple releases (counting 3.0); is it safe to
>> assume that, although it could be present in the qcow2 headers, that it will
>> not break anything by altering it or removing it?
>
> Did =keyvalue-pairs even work in json:{} filename?  If so, it will
> continue to work even after filtering it.  If not, then filtering it
> won't break existing files because they didn't work before either.

I'm afraid it does work:

$ gdb --args upstream-qemu -nodefaults -S -display vnc=:0 -readconfig 
test.cfg 'json:{"driver": "raw", "file": {"pool": "rbd", "image": 
"testimg.raw", "conf": "/tmp/ceph.conf", "driver": "rbd", "=keyvalue-pairs": 
"[\"mon_host\", \"192.168.15.180\", \"rbd_cache\", \"true\"]"}}'
GNU gdb (GDB) Fedora 8.1-25.fc28
[...]
(gdb) b qemu_rbd_open 
Breakpoint 1 at 0x845f83: file /work/armbru/qemu/block/rbd.c, line 660.
(gdb) r
Starting program: /home/armbru/bin/upstream-qemu -nodefaults -S -display 
vnc=:0 -readconfig test.cfg json:\{\"driver\":\ \"raw\",\ \"file\":\ 
\{\"pool\":\ \"rbd\",\ \"image\":\ \"testimg.raw\",\ \"conf\":\ 
\"/tmp/ceph.conf\",\ \"driver\":\ \"rbd\",\ \"=keyvalue-pairs\":\ 
\"\[\\\"mon_host\\\",\ \\\"192.168.15.180\\\",\ \\\"rbd_cache\\\",\ 
\\\"true\\\"\]\"\}\}
[...]
Thread 1 "upstream-qemu" hit Breakpoint 1, qemu_rbd_open 
(bs=0x56a5a970, 
options=0x56a5ec40, flags=24578, errp=0x7fffd370)
at /work/armbru/qemu/block/rbd.c:660
660 {
[...]
(gdb) n
661 BDRVRBDState *s = bs->opaque;
(gdb) 
662 BlockdevOptionsRbd *opts = NULL;
(gdb) 
665 Error *local_err = NULL;
(gdb) 
669 keypairs = g_strdup(qdict_get_try_str(options, "=keyvalue-pairs"));
(gdb) 
670 if (keypairs) {
(gdb) p keypairs 
$1 = 0x569e54c0 "[\"mon_host\", \"192.168.15.180\", \"rbd_cache\", 
\"true\"]"

It really, really, really should not work.

It doesn't work with anything that relies on QAPI to represent
configuration (such as QMP's blockdev-add), because BlockdevOptionsRbd
doesn't have it.

It works with -drive only with a pseudo-filename (more on that below),
even though -drive uses QemuOpts and QDict rather than QAPI, because the
(carefully chosen) name "=keyvalue-pairs" is impossible to use with
QemuOpts.

However, we missed the json:... backdoor :(

Block device configuration has become way too baroque.  I can't keep
enough of it in my mind at the same time to change it safely.  I believe
none of us can.

> To me personally the issue is that if you can specify a plain filename,
> bdrv_refresh_filename() should give you that plain filename back.  So
> rbd's