Re: [Qemu-block] [Qemu-devel] Why is there no qom_get in hmp.c?

2018-04-18 Thread QingFeng Hao


在 2018/4/18 22:04, Dr. David Alan Gilbert 写道:
> * QingFeng Hao (ha...@linux.vnet.ibm.com) wrote:
>> Hi all,
>> I did some investigation and found that "virsh qemu-monitor-command" 
>> supports qom-get,
>> but qemu hmp doesn't. However, in hmp.c there are qom_list and qom_set. It 
>> confused me
>> and my question is: why is this? And how can I get a property's value in 
>> hmp? e.g.
>> qemu-system-* -nodefaults -machine accel=qtest -no-shutdown -nographic 
>> -monitor stdio -serial none -hda /root/t.qcow2
>> "info qtree" can only get a few properties.
> 
> I did try that in 2016 (see my series from about September); but it got
> bogged down in trying to fix output visitors; it's possible the visitor
> code has been fixed since then though.
> 
Thanks! I used your patch with a bit fix of compilation to test, which can work 
and
print the json format string for structure property. The prior discussions seem
no any eventual conclusion with some tricks.

> The 'Show values and description when using "qom-list"' patch
> that Ricardo Perez Blanco posted would do something similar.
Yes, I saw the patch. thanks
> 
> Dave
> 
>> Thanks a lot!
>> -- 
>> Regards
>> QingFeng Hao
>>
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 

-- 
Regards
QingFeng Hao




Re: [Qemu-block] [Qemu-devel] [PATCH v5 00/21] blockjobs: add explicit job management

2018-04-18 Thread Markus Armbruster
John Snow  writes:

> On 04/18/2018 03:25 AM, Markus Armbruster wrote:
>> John Snow  writes:
>> 
>>> On 04/17/2018 09:44 AM, Markus Armbruster wrote:
 John Snow  writes:

> This series seeks to address two distinct but closely related issues
> concerning the job management API.
>
> (1) For jobs that complete when a monitor is not attached and receiving
> events or notifications, there's no way to discern the job's final
> return code. Jobs must remain in the query list until dismissed
> for reliable management.
>
> (2) Jobs that change the block graph structure at an indeterminate point
> after the job starts compete with the management layer that relies
> on that graph structure to issue meaningful commands.
>
> This structure should change only at the behest of the management
> API, and not asynchronously at unknown points in time. Before a job
> issues such changes, it must rely on explicit and synchronous
> confirmation from the management API.
>
> These changes are implemented by formalizing a State Transition Machine
> for the BlockJob subsystem.
>
> Job States:
>
> UNDEFINED   Default state. Internal state only.
> CREATED Job has been created
> RUNNING Job has been started and is running
> PAUSED  Job is not ready and has been paused
> READY   Job is ready and is running
> STANDBY Job is ready and is paused
>
> WAITING Job is waiting on peers in transaction
> PENDING Job is waiting on ACK from QMP
> ABORTINGJob is aborting or has been cancelled
> CONCLUDED   Job has finished and has a retcode available
> NULLJob is being dismantled. Internal state only.
>
> Job Verbs:
>
>>>
>>> Backporting your quote up here:
>>>
 For each job verb and job state: what's the new job state?

>>>
>>> That's not always 1:1, though I tried to address it in the commit messages.
>> 
>> Let me rephrase my question then.  For each job verb and job state: what
>> are the possible new job states?  If there's more than one, what's the
>> condition for each?
>> 
>
> Is my answer below not sufficient? Maybe you're asking "Can you write
> this up in a formal document" instead, or did I miss explaining something?

Your answer was fine.  I blame my one-pass reply writing.

Nevertheless:

>> I appreciate commit messages explaining that, but having complete state
>> machine documentation in one place (a comment or in docs/) would be
>> nice, wouldn't it?

Pretty-please?

> CANCEL  Instructs a running job to terminate with error,
> (Except when that job is READY, which produces no error.)
>>>
>>> CANCEL will take a job to either NULL... (this is the early abort
>>> pathway, prior to the job being fully realized.)
>>>
>>> ...or to ABORTING (from CREATED once it has fully realized the job, or
>>> from RUNNING, READY, WAITING, or PENDING.)
>>>
> PAUSE   Request a job to pause.
>>>
>>> issued to RUNNING or READY, transitions to PAUSED or STANDBY respectively.
>>>
> RESUME  Request a job to resume from a pause.
>>>
>>> issued to PAUSED or STANDBY, transitions to RUNNING or READY respectively.
>>>
> SET-SPEED   Change the speed limiting parameter of a job.
>>>
>>> No run state change.
>>>
> COMPLETEAsk a READY job to finish and exit.
>
>>>
>>> Issued to a READY job, transitions to WAITING.
>>>
> FINALIZEAsk a PENDING job to perform its graph finalization.
>>>
>>> Issued to a PENDING job, transitions to CONCLUDED.
>>>
> DISMISS Finish cleaning up an empty job.

>>>
>>> Issued to a CONCLUDED job, transitions to NULL.
>>>
>>>
> And here's my stab at a diagram:
>
>  +-+
>  |UNDEFINED|
>  +--+--+
> |
>  +--v+
>+-+CREATED+-+
>| +--++ |
>||  |
>| +--++ +--+|
>+-+RUNNING<->PAUSED||
>| +--+-+--+ +--+|
>|| ||
>|| +--+ |
>||| |
>| +--v--+   +---+ | |
>+-+READY<--->STANDBY| | |
>| +--+--+   +---+ | |
>||| |
>| +--v+   | |
>+-+WAITING<---+ |
>| +--++ |
>||  |
>| 

Re: [Qemu-block] [Qemu-devel] [PATCH v5 00/21] blockjobs: add explicit job management

2018-04-18 Thread John Snow


On 04/18/2018 03:25 AM, Markus Armbruster wrote:
> John Snow  writes:
> 
>> On 04/17/2018 09:44 AM, Markus Armbruster wrote:
>>> John Snow  writes:
>>>
 This series seeks to address two distinct but closely related issues
 concerning the job management API.

 (1) For jobs that complete when a monitor is not attached and receiving
 events or notifications, there's no way to discern the job's final
 return code. Jobs must remain in the query list until dismissed
 for reliable management.

 (2) Jobs that change the block graph structure at an indeterminate point
 after the job starts compete with the management layer that relies
 on that graph structure to issue meaningful commands.

 This structure should change only at the behest of the management
 API, and not asynchronously at unknown points in time. Before a job
 issues such changes, it must rely on explicit and synchronous
 confirmation from the management API.

 These changes are implemented by formalizing a State Transition Machine
 for the BlockJob subsystem.

 Job States:

 UNDEFINED   Default state. Internal state only.
 CREATED Job has been created
 RUNNING Job has been started and is running
 PAUSED  Job is not ready and has been paused
 READY   Job is ready and is running
 STANDBY Job is ready and is paused

 WAITING Job is waiting on peers in transaction
 PENDING Job is waiting on ACK from QMP
 ABORTINGJob is aborting or has been cancelled
 CONCLUDED   Job has finished and has a retcode available
 NULLJob is being dismantled. Internal state only.

 Job Verbs:

>>
>> Backporting your quote up here:
>>
>>> For each job verb and job state: what's the new job state?
>>>
>>
>> That's not always 1:1, though I tried to address it in the commit messages.
> 
> Let me rephrase my question then.  For each job verb and job state: what
> are the possible new job states?  If there's more than one, what's the
> condition for each?
> 

Is my answer below not sufficient? Maybe you're asking "Can you write
this up in a formal document" instead, or did I miss explaining something?

> I appreciate commit messages explaining that, but having complete state
> machine documentation in one place (a comment or in docs/) would be
> nice, wouldn't it?
> 
 CANCEL  Instructs a running job to terminate with error,
 (Except when that job is READY, which produces no error.)
>>
>> CANCEL will take a job to either NULL... (this is the early abort
>> pathway, prior to the job being fully realized.)
>>
>> ...or to ABORTING (from CREATED once it has fully realized the job, or
>> from RUNNING, READY, WAITING, or PENDING.)
>>
 PAUSE   Request a job to pause.
>>
>> issued to RUNNING or READY, transitions to PAUSED or STANDBY respectively.
>>
 RESUME  Request a job to resume from a pause.
>>
>> issued to PAUSED or STANDBY, transitions to RUNNING or READY respectively.
>>
 SET-SPEED   Change the speed limiting parameter of a job.
>>
>> No run state change.
>>
 COMPLETEAsk a READY job to finish and exit.

>>
>> Issued to a READY job, transitions to WAITING.
>>
 FINALIZEAsk a PENDING job to perform its graph finalization.
>>
>> Issued to a PENDING job, transitions to CONCLUDED.
>>
 DISMISS Finish cleaning up an empty job.
>>>
>>
>> Issued to a CONCLUDED job, transitions to NULL.
>>
>>
 And here's my stab at a diagram:

  +-+
  |UNDEFINED|
  +--+--+
 |
  +--v+
+-+CREATED+-+
| +--++ |
||  |
| +--++ +--+|
+-+RUNNING<->PAUSED||
| +--+-+--+ +--+|
|| ||
|| +--+ |
||| |
| +--v--+   +---+ | |
+-+READY<--->STANDBY| | |
| +--+--+   +---+ | |
||| |
| +--v+   | |
+-+WAITING<---+ |
| +--++ |
||  |
| +--v+ |
+-+PENDING| |
| +--++ |
||  |
 +--v-+   +--v--+   |
 

Re: [Qemu-block] [Qemu-devel] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-18 Thread Daniel P . Berrangé
On Wed, Apr 18, 2018 at 06:52:08PM +0200, Kevin Wolf wrote:
> Am 18.04.2018 um 18:34 hat Daniel P. Berrangé geschrieben:
> > On Wed, Apr 18, 2018 at 06:28:23PM +0200, Kevin Wolf wrote:
> > > Am 18.04.2018 um 17:06 hat Markus Armbruster geschrieben:
> > 
> > > > Note that users can still configure authentication methods with a
> > > > configuration file.  They probably do that anyway if they use Ceph
> > > > outside QEMU as well.
> > > 
> > > This solution that we originally intended to offer was dismissed by
> > > libvirt as unpractical: libvirt allows the user to specify both a config
> > > file and a key, and if it wanted to use a config file to pass the key,
> > > it would have to create a merged config file and keep it sync with the
> > > user config file at all times. Understandable that they want to avoid
> > > this.
> > 
> > Even if the config file does have auth info setup, we can't assume that
> > the QEMU VMs are supposed to use the same auth info. In fact to properly
> > protect against compromised QEMU, ideally every QEMU would use a completely
> > separate RBD user+password, so that compromised QEMU can't then access
> > RBD disks belonging to a different user.
> > 
> > So from libvirt POV we want to pretend the config file does not exist at
> > all and explicitly pass everything that is needed via normal per-disk
> > setup for blockdev.
> 
> From the rbd driver:
> 
>  * The "conf" option specifies a Ceph configuration file to read.  If
>  * it is not specified, we will read from the default Ceph locations
>  * (e.g., /etc/ceph/ceph.conf).  To avoid reading _any_ configuration
>  * file, specify conf=/dev/null.
> 
> So what we actually expected libvirt to do is to create a config file
> for each rbd image and pass that to qemu. However, libvirt allows the
> user to specify their own config file and passes that, and therefore
> doesn't want to create its own config file. If the user doesn't specify
> a config file, libvirt should probably indeed use /dev/null at least.

Yeah this is a mess - I wish we had never allowed users to pass a config
file, and had used /dev/null all the time. Unfortunately changing either
of these aspects would cause backcompat problems for existing deployments
now :-( So we just have to accept that the global config file is always
in present, but none the less libvirt should try to specify things as
fully as possible.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [Qemu-devel] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-18 Thread Kevin Wolf
Am 18.04.2018 um 18:34 hat Daniel P. Berrangé geschrieben:
> On Wed, Apr 18, 2018 at 06:28:23PM +0200, Kevin Wolf wrote:
> > Am 18.04.2018 um 17:06 hat Markus Armbruster geschrieben:
> 
> > > Note that users can still configure authentication methods with a
> > > configuration file.  They probably do that anyway if they use Ceph
> > > outside QEMU as well.
> > 
> > This solution that we originally intended to offer was dismissed by
> > libvirt as unpractical: libvirt allows the user to specify both a config
> > file and a key, and if it wanted to use a config file to pass the key,
> > it would have to create a merged config file and keep it sync with the
> > user config file at all times. Understandable that they want to avoid
> > this.
> 
> Even if the config file does have auth info setup, we can't assume that
> the QEMU VMs are supposed to use the same auth info. In fact to properly
> protect against compromised QEMU, ideally every QEMU would use a completely
> separate RBD user+password, so that compromised QEMU can't then access
> RBD disks belonging to a different user.
> 
> So from libvirt POV we want to pretend the config file does not exist at
> all and explicitly pass everything that is needed via normal per-disk
> setup for blockdev.

>From the rbd driver:

 * The "conf" option specifies a Ceph configuration file to read.  If
 * it is not specified, we will read from the default Ceph locations
 * (e.g., /etc/ceph/ceph.conf).  To avoid reading _any_ configuration
 * file, specify conf=/dev/null.

So what we actually expected libvirt to do is to create a config file
for each rbd image and pass that to qemu. However, libvirt allows the
user to specify their own config file and passes that, and therefore
doesn't want to create its own config file. If the user doesn't specify
a config file, libvirt should probably indeed use /dev/null at least.

Kevin



Re: [Qemu-block] [PATCH] nbd/server: introduce NBD_CMD_CACHE

2018-04-18 Thread Eric Blake
On 04/13/2018 09:31 AM, Vladimir Sementsov-Ogievskiy wrote:
> Handle nbd CACHE command. Just do read, without sending read data back.
> Cache mechanism should be done by exported node driver chain.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/nbd.h |  3 ++-
>  nbd/server.c| 10 ++
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index fcdcd54502..b4793d0a29 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -135,6 +135,7 @@ typedef struct NBDExtent {
>  #define NBD_FLAG_SEND_TRIM (1 << 5) /* Send TRIM (discard) */
>  #define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6) /* Send WRITE_ZEROES */
>  #define NBD_FLAG_SEND_DF   (1 << 7) /* Send DF (Do not Fragment) */
> +#define NBD_FLAG_SEND_CACHE(1 << 8) /* Send CACHE (prefetch) */

Hmm, this flag is not documented in the upstream NBD protocol yet; are
we sure it matches the xNBD implementation?  We'll want at least a
documentation patch proposed for the NBD list before taking this.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-18 Thread Daniel P . Berrangé
On Wed, Apr 18, 2018 at 06:28:23PM +0200, Kevin Wolf wrote:
> Am 18.04.2018 um 17:06 hat Markus Armbruster geschrieben:

> > Note that users can still configure authentication methods with a
> > configuration file.  They probably do that anyway if they use Ceph
> > outside QEMU as well.
> 
> This solution that we originally intended to offer was dismissed by
> libvirt as unpractical: libvirt allows the user to specify both a config
> file and a key, and if it wanted to use a config file to pass the key,
> it would have to create a merged config file and keep it sync with the
> user config file at all times. Understandable that they want to avoid
> this.

Even if the config file does have auth info setup, we can't assume that
the QEMU VMs are supposed to use the same auth info. In fact to properly
protect against compromised QEMU, ideally every QEMU would use a completely
separate RBD user+password, so that compromised QEMU can't then access
RBD disks belonging to a different user.

So from libvirt POV we want to pretend the config file does not exist at
all and explicitly pass everything that is needed via normal per-disk
setup for blockdev.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [Qemu-devel] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-18 Thread Kevin Wolf
Am 18.04.2018 um 17:06 hat Markus Armbruster geschrieben:
> Kevin Wolf  writes:
> 
> > The legacy command line syntax supports a "password-secret" option that
> > allows to pass an authentication key to Ceph. This was not supported in
> > QMP so far.
> 
> We tried during development of v2.9.0 (commit 0a55679b4a5), but reverted
> before the release:
> 
> commit 46fcc161284ac0e743b99251debc297d5236
> Author: Markus Armbruster 
> Date:   Tue Mar 28 10:56:06 2017 +0200
> 
> rbd: Revert -blockdev and -drive parameter auth-supported
> 
> This reverts half of commit 0a55679.  We're having second thoughts on
> the QAPI schema (and thus the external interface), and haven't reached
> consensus, yet.  Issues include:
> 
> * The implementation uses deprecated rados_conf_set() key
>   "auth_supported".  No biggie.
> 
> * The implementation makes -drive silently ignore invalid parameters
>   "auth" and "auth-supported.*.X" where X isn't "auth".  Fixable (in
>   fact I'm going to fix similar bugs around parameter server), so
>   again no biggie.
> 
> * BlockdevOptionsRbd member @password-secret applies only to
>   authentication method cephx.  Should it be a variant member of
>   RbdAuthMethod?
> 
> * BlockdevOptionsRbd member @user could apply to both methods cephx
>   and none, but I'm not sure it's actually used with none.  If it
>   isn't, should it be a variant member of RbdAuthMethod?
> 
> * The client offers a *set* of authentication methods, not a list.
>   Should the methods be optional members of BlockdevOptionsRbd instead
>   of members of list @auth-supported?  The latter begs the question
>   what multiple entries for the same method mean.  Trivial question
>   now that RbdAuthMethod contains nothing but @type, but less so when
>   RbdAuthMethod acquires other members, such the ones discussed above.
> 
> * How BlockdevOptionsRbd member @auth-supported interacts with
>   settings from a configuration file specified with @conf is
>   undocumented.  I suspect it's untested, too.
> 
> Let's avoid painting ourselves into a corner now, and revert the
> feature for 2.9.

We tried to address all of these points in the schema of this RFC. Maybe
things become easier if we compromise on some.

> Note that users can still configure authentication methods with a
> configuration file.  They probably do that anyway if they use Ceph
> outside QEMU as well.

This solution that we originally intended to offer was dismissed by
libvirt as unpractical: libvirt allows the user to specify both a config
file and a key, and if it wanted to use a config file to pass the key,
it would have to create a merged config file and keep it sync with the
user config file at all times. Understandable that they want to avoid
this.

> Further note that this doesn't affect use of key "auth-supported" in
> -drive file=rbd:...:key=value.
> 
> qemu_rbd_array_opts()'s parameter @type now must be RBD_MON_HOST,
> which is silly.  This will be cleaned up shortly.
> 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Max Reitz 
> Reviewed-by: Eric Blake 
> Reviewed-by: Jeff Cody 
> Message-id: 1490691368-32099-9-git-send-email-arm...@redhat.com
> Signed-off-by: Jeff Cody 
> 
> > This patch introduces authentication options in the QAPI schema, makes
> > them do the corresponding rados_conf_set() calls and adds compatibility
> > code that translates the old "password-secret" option both for opening
> 
> Suggest 'the old "password-secret" command line option parameter'.
> 
> > and creating images to the new set of options.
> >
> > Note that the old option didn't allow to explicitly specify the set of
> 
> Likewise, 'the old option parameter'.
> 
> > allowed authentication schemes. The compatibility code assumes that if
> > "password-secret" is given, only the cephx scheme is allowed. If it's
> > missing, both none and cephx are allowed because the configuration file
> > could still provide a key.
> 
> This is a bit terse for readers who aren't familiar with the way things
> work now (or have since forgotten pretty much everything about it, like
> me).
> 
> Perhaps spelling out how the compatibility code translates the old
> option parameter to BlockdevOptionsRbd would help.
> 
> > Signed-off-by: Kevin Wolf 
> > ---
> >
> > This doesn't actually work correctly yet because the way that options
> > are passed through the block layer (QAPI -> QemuOpts -> QDict). Before
> > we fix or hack around this, let's make sure this is the schema that we
> > want.
> 
> Makes sense.
> 
> > The two known problems are:
> >
> > 1. QDict *options in qemu_rbd_open() may contain options either in their
> >proper QObject types (if 

Re: [Qemu-block] [RFC] Intermediate block mirroring

2018-04-18 Thread Alberto Garcia
On Mon 16 Apr 2018 05:15:21 PM CEST, Max Reitz wrote:
> On 2018-04-16 16:59, Alberto Garcia wrote:
>> On Fri 13 Apr 2018 04:23:07 PM CEST, Max Reitz wrote:
>>> On 2018-04-12 19:07, Alberto Garcia wrote:
 Hello,

 I mentioned this some time ago, but I'd like to retake it now: I'm
 checking how to copy arbitrary nodes on a backing chain, so if I have
 e.g.

[A] <- [B] <- [C] <- [D]

 I'd like to end up with

[A] <- [E] <- [C] <- [D]

 where [E] is a copy of [B]. The most obvious use case is to move [B]
 to a different storage backend.

 At the moment we can already copy [B] into [E] (outside QEMU) and then
 do

change-backing-file device=[D] image-node-name=[C] backing-file=[E]

 However this only changes the image on disk and the bs->backing_file
 string in memory. QEMU keeps using the [B] BlockDriverState, and we
 need to make it switch to [E].

 One possible way to do this would be to modify blockdev-mirror so the
 source image can be located anywhere on a chain. Currently there's
 bs->backing_blocker preventing that, plus qmp_blockdev_mirror()
 refuses to take any non-root source node. Other than permission
 changes I don't think the algorithm itself needs any additional
 modification, although I suppose we'd like to change the backing file
 as part of the same job, and that would require API changes.
>>>
>>> Another way would be to write blockdev-copy. :-)
>> 
>> Something like blockdev-backup but without copying data from backing
>> files? blockdev-mirror already allows configuring that using the
>> MirrorSyncMode parameter.
>
> No, a block job that unites blockdev-backup, blockdev-mirror, and
> block-commit.

Yeah, I was reading again the notes from December. This would then unify
all those existing commands.

Is there any other plan or roadmap for this command, or was just an idea
that was discussed as nice to have? Also, does that mean that the old
commands should not be changed now? I'm asking because there's a big
difference between updating blockdev-mirror to allow non-root source
nodes and creating this new command that would support all features of
the old ones :-)

> I think we wanted to exclude streaming, because that works
> differently.  But, well, streaming would be just installing a COR
> filter node and doing a blockdev-copy to null-co://, right? :-)

Yeah I suppose, but I don't think you can make the filter step part of
the blockdev-copy API.

> I just now remembered a bitmap issue, but that probably shouldn't stop
> you.  There is an issue with bitmaps on nodes with shared-writing
> children.
>
> Imagine you have some qcow2 node or whatever, and you have a guest
> running on it.  Then you attach a filter to it (which allows shared
> writing) and put a mirror job on top of the filter.  Now whenever the
> guest writes to the qcow2 node, the bitmap the mirror job installed on
> the filter won't reflect that.  So that's an issue, but I think that
> case isn't forbidden by the current permission system, so we're
> already screwed there.
>
> So on the contrary, I'd suspect that running mirror inside of some
> backing chain or whatever may be less problematic than running it on
> top, actually...

Yeah I don't think there's a problem there.

 One other way is to have a more generic replace-node command which
 would call bdrv_replace_node(), but I don't know if we want to
 expose that and I don't have any other use case for it at the
 moment.
>>>
>>> I think we do want to expose that.  As far as I'm informed, for
>>> graph manipulation we want a command that can replace nodes
>>> (including replacing nothing by a new node or replacing an existing
>>> node by nothing, if the parent supports that).
>> 
>> Was there any discussion about this in the mailing list? (proposed
>> name for this function, features, etc.)
>
> Well, there is x-blockdev-change.  But we probably want to expose
> bdrv_reopen() in the long run.

Exposing bdrv_reopen() itself shouldn't be too much work (it takes just
two node names, or am I missing something?).

There's still the question of how to update the backing file string (on
the overlay). stream and commit do it automatically, but if we do
bdrv_reopen() or the aforementioned modification to blockdev-mirror we'd
need to do it explicitly with change-backing-file, or extend the API to
allow for that.

Berto



Re: [Qemu-block] [Qemu-devel] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-18 Thread Markus Armbruster
Kevin Wolf  writes:

> The legacy command line syntax supports a "password-secret" option that
> allows to pass an authentication key to Ceph. This was not supported in
> QMP so far.

We tried during development of v2.9.0 (commit 0a55679b4a5), but reverted
before the release:

commit 46fcc161284ac0e743b99251debc297d5236
Author: Markus Armbruster 
Date:   Tue Mar 28 10:56:06 2017 +0200

rbd: Revert -blockdev and -drive parameter auth-supported

This reverts half of commit 0a55679.  We're having second thoughts on
the QAPI schema (and thus the external interface), and haven't reached
consensus, yet.  Issues include:

* The implementation uses deprecated rados_conf_set() key
  "auth_supported".  No biggie.

* The implementation makes -drive silently ignore invalid parameters
  "auth" and "auth-supported.*.X" where X isn't "auth".  Fixable (in
  fact I'm going to fix similar bugs around parameter server), so
  again no biggie.

* BlockdevOptionsRbd member @password-secret applies only to
  authentication method cephx.  Should it be a variant member of
  RbdAuthMethod?

* BlockdevOptionsRbd member @user could apply to both methods cephx
  and none, but I'm not sure it's actually used with none.  If it
  isn't, should it be a variant member of RbdAuthMethod?

* The client offers a *set* of authentication methods, not a list.
  Should the methods be optional members of BlockdevOptionsRbd instead
  of members of list @auth-supported?  The latter begs the question
  what multiple entries for the same method mean.  Trivial question
  now that RbdAuthMethod contains nothing but @type, but less so when
  RbdAuthMethod acquires other members, such the ones discussed above.

* How BlockdevOptionsRbd member @auth-supported interacts with
  settings from a configuration file specified with @conf is
  undocumented.  I suspect it's untested, too.

Let's avoid painting ourselves into a corner now, and revert the
feature for 2.9.

Note that users can still configure authentication methods with a
configuration file.  They probably do that anyway if they use Ceph
outside QEMU as well.

Further note that this doesn't affect use of key "auth-supported" in
-drive file=rbd:...:key=value.

qemu_rbd_array_opts()'s parameter @type now must be RBD_MON_HOST,
which is silly.  This will be cleaned up shortly.

Signed-off-by: Markus Armbruster 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Jeff Cody 
Message-id: 1490691368-32099-9-git-send-email-arm...@redhat.com
Signed-off-by: Jeff Cody 

> This patch introduces authentication options in the QAPI schema, makes
> them do the corresponding rados_conf_set() calls and adds compatibility
> code that translates the old "password-secret" option both for opening

Suggest 'the old "password-secret" command line option parameter'.

> and creating images to the new set of options.
>
> Note that the old option didn't allow to explicitly specify the set of

Likewise, 'the old option parameter'.

> allowed authentication schemes. The compatibility code assumes that if
> "password-secret" is given, only the cephx scheme is allowed. If it's
> missing, both none and cephx are allowed because the configuration file
> could still provide a key.

This is a bit terse for readers who aren't familiar with the way things
work now (or have since forgotten pretty much everything about it, like
me).

Perhaps spelling out how the compatibility code translates the old
option parameter to BlockdevOptionsRbd would help.

> Signed-off-by: Kevin Wolf 
> ---
>
> This doesn't actually work correctly yet because the way that options
> are passed through the block layer (QAPI -> QemuOpts -> QDict). Before
> we fix or hack around this, let's make sure this is the schema that we
> want.

Makes sense.

> The two known problems are:
>
> 1. QDict *options in qemu_rbd_open() may contain options either in their
>proper QObject types (if passed from blockdev-add) or as strings
>(-drive). Both forms can be mixed in the same options QDict.

Remind me, how can such a mix happen?

>rbd uses the keyval visitor to convert the options into a QAPI
>object. This means that it requires all options to be strings. This
>patch, however, introduces a bool property, so the visitor breaks
>when it gets its input from blockdev-add.
>
>Options to hack around the problem:
>
>a. Do an extra conversion step or two and go through QemuOpts like
>   some other block drivers. When I offered something like this to
>   Markus a while ago in a similar case, he rejected the idea.

Going through QemuOpts just to get rid of strings is a bit like going

Re: [Qemu-block] [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC

2018-04-18 Thread Eric Blake
On 04/18/2018 09:25 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Do your code with
>>>
>>>  /* Found an extent, and we're inside it.  */
>>>  *next = f.fe.fe_logical + f.fe.fe_length;
>>>  if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
>>>  return BDRV_BLOCK_DATA|BDRV_BLOCK_ZERO;
>>>  } else {
>>>  return BDRV_BLOCK_DATA;
>>>  }
>>>
>>> provide safe block_status based on FIEMAP without FLAG_SYNC?
>> No, in fact we found data corruption with FIEMAP.
> 
> How to reproduce it? I've tried your code, looks like it shows all
> "data" regions even if I didn't call "sync".
> 

There's no easy way to reproduce unsafe data races reliably; but FIEMAP
without sync is such an unsafe data race (most of the time, you will get
the answer you expect, but under the right conditions, FIEMAP may report
the area as unallocated even though you have already called write(); if
you treat that unallocated region as BDRV_BLOCK_ZERO, rather than
read()ing it, you have corrupted data).  That's because FIEMAP only
reports what the disk has allocated, but file systems can have delayed
allocations where contents in the kernel cache are NOT yet flushed to
disk unless you use sync; but using sync kills performance.

If you want examples of FIEMAP corrupting data, look at the coreutils
archive from several years ago, where FIEMAP without sync caused
corruptions during cp. A quick search found at least this example:
https://lists.gnu.org/archive/html/bug-coreutils/2011-04/msg00023.html

For more details, see qemu commits c4875e5b and 38c4d0a, and discussion
at https://lists.gnu.org/archive/html/qemu-devel/2014-09/msg04921.html

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)

2018-04-18 Thread Vladimir Sementsov-Ogievskiy

15.02.2018 19:42, Paolo Bonzini wrote:

On 15/02/2018 17:40, Vladimir Sementsov-Ogievskiy wrote:

Hi all.

Two years later, is there are any news on the topic?

I can't understand the following thing:

  - FIEMAP without FLAG_SYNC is unsafe
  - FIEMAP with FLAG_SYNC is safe but slow
  - so, we've dropped FIEMAP and use only lseek. So, it means that lseek
is safe _and_ fast (at least, faster than FIEMAP with FLAG_SYNC)...

but how is it possible, if lseek and fiemap share same code in the kernel?

Do they?  Maybe not for all file systems.


Do your code with

     /* Found an extent, and we're inside it.  */
     *next = f.fe.fe_logical + f.fe.fe_length;
     if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
     return BDRV_BLOCK_DATA|BDRV_BLOCK_ZERO;
     } else {
     return BDRV_BLOCK_DATA;
     }

provide safe block_status based on FIEMAP without FLAG_SYNC?

No, in fact we found data corruption with FIEMAP.


How to reproduce it? I've tried your code, looks like it shows all 
"data" regions even if I didn't call "sync".




Paolo


--
may help: link to discussion start:
http://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04641.html

22.07.2016 13:41, Paolo Bonzini wrote:

i.e. the use of fiemap to duplicate the exact layout of a file
from userspace is only posisble if you can /guarantee/ the source
file has not changed in any way during the copy operation at the
pointin time you finalise the destination data copy.

We don't do exactly that, exactly because it's messy when you have
concurrent accesses (which shouldn't be done but you never know).

Which means you *cannot make the assumption it won't happen*.
FIEMAP is not guaranteed to tell you exactly where the data in the
file is that you need to copy is and that nothing you can do from
userspace changes that. I can't say it any clearer than that.

You've said it very clearly.  But I'm not saying "fix the damn
FIEMAP", I'm
saying "this is what we need, lseek doesn't provide it, FIEMAP comes
close but really doesn't".  If the solution is to fix FIEMAP, if it's
possible at all, that'd be great.  But any other solution is okay.

Do you at least agree that it's possible to use the kind of information
in struct fiemap_extent (extent start, length and flags) in a way that is
not racy, or at least not any different from SEEK_DATA and SEEK_HOLE's
raciness?  I'm not saying that you'd get that information from FIEMAP.
It's just the kind of information that I'd like to get.

(BTW, the documentation of FIEMAP is horrible.  It does not say at all
that FIEMAP_FLAG_SYNC is needed to return extents that match what
SEEK_HOLE/SEEK_DATA would return.  The obvious reading is that
FIEMAP_FLAG_SYNC would avoid returning FIEMAP_EXTENT_DELALLOC extents,
and that in turn would not be a problem if you don't need fe_physical.
Perhaps it would help if fiemap.txt said started with *why* would one
use FIEMAP, not just what it does).


When doing a copy, we use(d to use) FIEMAP the same way as you'd use
lseek,
querying one extent at a time.  If you proceed this way, all of these
can cause the same races:

- pread(ofs=10MB, len=10MB) returns all zeroes, so the 10MB..20MB is
not copied

- pread(ofs=10MB, len=10MB) returns non-zero data, so the 10MB..20MB is
copied

- lseek(SEEK_DATA, 10MB) returns 20MB, so the 10MB..20MB area is not
copied

- lseek(SEEK_HOLE, 10MB) returns 20MB, so the 10MB..20MB area is
copied

- ioctl(FIEMAP at 10MB) returns an extent starting at 20MB, so
the 10MB..20MB area is not copied

No, FIEMAP is not guaranteed to behave like this. what is returned
is filesystem dependent. Fielsystems that don't support holes will
return data extents. Filesystems that support compression might
return a compressed data extent rather than a hole. Encrypted files
might not expose holes at all, so people can't easily find known
plain text regions in the encrypted data. Filesystems could report
holes as deduplicated data, etc.  What do you do when FIEMAP returns
"OFFLINE" to indicate that the data is located elsewhere and will
need to be retrieved by the HSM operating on top of the filesystem
before layout can be determined?

lseek(SEEK_DATA) might also say you're not on a hole because the file
is compressed/encrypted/deduplicated/offline/whatnot.  So lseek is
also filesystem dependent, isn't it?  It also doesn't work on block
devices, so it's really file descriptor dependent.  That's not news.

Of course read, lseek and FIEMAP will not return exactly the same
information.  The point is that they're subject to exactly the same
races, and that struct fiemap_extent *can* be parsed conservatively.
The code I attached to the previous message does that, if it finds any
extent kind other than an unwritten extent it just treats it as data.

Based on this it would be nice to understand the reason why FIEMAP needs
FIEMAP_FLAG_SYNC to return meaningful values (the meaningful values
are there, they're just what lseek or read use), or to have a more
powerful function than just 

Re: [Qemu-block] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-18 Thread Kevin Wolf
Am 18.04.2018 um 16:16 hat Eric Blake geschrieben:
> On 04/18/2018 08:50 AM, Kevin Wolf wrote:
> 
> >>> @@ -3195,6 +3215,8 @@
> >>>  '*conf': 'str',
> >>>  '*snapshot': 'str',
> >>>  '*user': 'str',
> >>> +'*auth-none': 'bool',
> >>> +'*auth-cephx': 'RbdAuthCephx',
> >>>  '*server': ['InetSocketAddressBase'] } }
> >>
> >> Would it be better to have this be a flat union with 'auth' with enum
> >> values 'none', 'cephx', 'both' as a discriminator that determines which
> >> additional fields can be present?  Or does that require that we first
> >> fix the QAPI generator to allow nesting a flat union within another flat
> >> union (probably doable, just no one has needed it before now)?  Is it
> >> also time to improve the QAPI generator to allow a default value to the
> >> discriminator field, rather than requiring the field to be present?
> > 
> > Both options can be enabled at the same time, so that the client
> > connects to a server no matter whether it does 'cephx' authentication or
> > only 'none. This is even the default for rbd driver (in the existing
> > command line interface, but I think we need to stay compatible with it).
> > With a union you would have to explicitly choose one or the other, but
> > could never accept both.
> > 
> > The other option we were considering was a list of authentication
> > options, which would be easier to implement, but isn't really an
> > accurate representation of what we really accept. There is no way we
> > could meaningfully implement something like this:
> > 
> > 'auth': [ { 'type': 'cephx', 'key-secret': 'foo' },
> >   { 'type': 'cephx', 'key-secret': 'bar' } ]
> > 
> > Because Ceph only allows us to enable the 'cephx' authentication method
> > and to set a single key for it.
> 
> How does it look as a choice between:
> 
> {'enum':'CephxAuth', 'data': ['none', 'cephx', 'both' ]}
> 
> where both 'cephx' and 'both' support the optional 'key-secret'
> parameter, but 'none' does not?

Doesn't really look extensible for the case that Ceph adds a third mode.
At least I don't think we want to have an enum and associated union
branches for all possible combinations?

Kevin


signature.asc
Description: PGP signature


Re: [Qemu-block] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-18 Thread Eric Blake
On 04/18/2018 08:50 AM, Kevin Wolf wrote:

>>> @@ -3195,6 +3215,8 @@
>>>  '*conf': 'str',
>>>  '*snapshot': 'str',
>>>  '*user': 'str',
>>> +'*auth-none': 'bool',
>>> +'*auth-cephx': 'RbdAuthCephx',
>>>  '*server': ['InetSocketAddressBase'] } }
>>
>> Would it be better to have this be a flat union with 'auth' with enum
>> values 'none', 'cephx', 'both' as a discriminator that determines which
>> additional fields can be present?  Or does that require that we first
>> fix the QAPI generator to allow nesting a flat union within another flat
>> union (probably doable, just no one has needed it before now)?  Is it
>> also time to improve the QAPI generator to allow a default value to the
>> discriminator field, rather than requiring the field to be present?
> 
> Both options can be enabled at the same time, so that the client
> connects to a server no matter whether it does 'cephx' authentication or
> only 'none. This is even the default for rbd driver (in the existing
> command line interface, but I think we need to stay compatible with it).
> With a union you would have to explicitly choose one or the other, but
> could never accept both.
> 
> The other option we were considering was a list of authentication
> options, which would be easier to implement, but isn't really an
> accurate representation of what we really accept. There is no way we
> could meaningfully implement something like this:
> 
> 'auth': [ { 'type': 'cephx', 'key-secret': 'foo' },
>   { 'type': 'cephx', 'key-secret': 'bar' } ]
> 
> Because Ceph only allows us to enable the 'cephx' authentication method
> and to set a single key for it.

How does it look as a choice between:

{'enum':'CephxAuth', 'data': ['none', 'cephx', 'both' ]}

where both 'cephx' and 'both' support the optional 'key-secret'
parameter, but 'none' does not?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 2/3] nbd/server: implement dirty bitmap export

2018-04-18 Thread Eric Blake
On 04/13/2018 01:14 PM, Vladimir Sementsov-Ogievskiy wrote:
> Handle new NBD meta namespace: "qemu", and corresponding queries:
> "qemu:dirty-bitmap:".
> 
> With new metadata context negotiated, BLOCK_STATUS query will reply
> with dirty-bitmap data, converted to extents. New public function
> nbd_export_bitmap selects bitmap to export. For now, only one bitmap
> may be exported.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/nbd.h |   6 ++
>  nbd/server.c| 234 
> 
>  2 files changed, 224 insertions(+), 16 deletions(-)
> 

>  
> +/* nbd_meta_bitmap_query
> + *
> + * Handle query to 'qemu-dirty-bitmap' namespace.

Stale comment

> + * 'len' is the amount of text remaining to be read from the current name, 
> after
> + * the 'qemu-dirty-bitmap:' portion has been stripped.

and again

> + *
> + * Return -errno on I/O error, 0 if option was completely handled by
> + * sending a reply about inconsistent lengths, or 1 on success. */
> +static int nbd_meta_bitmap_query(NBDClient *client, NBDExportMetaContexts 
> *meta,
> + uint32_t len, Error **errp)
> +{
> +if (!client->exp->export_bitmap) {
> +return nbd_opt_skip(client, len, errp);
> +}
> +
> +return nbd_meta_single_query(
> +client, client->exp->export_bitmap_context + strlen("qemu:"), 
> len,
> +>dirty_bitmap, errp);

Wait, so client->exp->export_bitmap_context is stored WITH the "qemu:"
prefix as part of the name?

During _LIST, you are allowing a query of "qemu:" to list all
qemu-specific contexts (namely, qemu:dirty-bitmap:); but
should we also make it work that if you do "qemu:dirty-bitmap:", you get
a list of just dirty-bitmap contexts even if "qemu:" has added other
contexts in the meantime?

> @@ -1800,6 +1851,94 @@ static int nbd_co_send_block_status(NBDClient *client, 
> uint64_t handle,
>  return nbd_co_send_extents(client, handle, , 1, context_id, errp);
>  }
>  
> +/* Set several extents, describing region of given @length with given @flags.
> + * Do not set more than @nb_extents, return number of set extents.
> + */
> +static unsigned add_extents(NBDExtent *extents, unsigned nb_extents,
> +uint64_t length, uint32_t flags)
> +{
> +unsigned i = 0;
> +uint32_t max_extent = QEMU_ALIGN_DOWN(INT32_MAX, 512);
> +/* TODO: should be somehow in agreement with blocksize of OPT_GO,
> + * and this should be documented in NBD proto. */

nbd.git commit 218c446 documented that descriptor lengths MUST be an
integer multiple of minimum block size; as for maximum, I see no reason
why BLOCK_STATUS can't report larger extents than what NBD_CMD_READ can
do.  Do we need further doc tweaks to nbd.git?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] Why is there no qom_get in hmp.c?

2018-04-18 Thread Dr. David Alan Gilbert
* QingFeng Hao (ha...@linux.vnet.ibm.com) wrote:
> Hi all,
> I did some investigation and found that "virsh qemu-monitor-command" supports 
> qom-get,
> but qemu hmp doesn't. However, in hmp.c there are qom_list and qom_set. It 
> confused me
> and my question is: why is this? And how can I get a property's value in hmp? 
> e.g.
> qemu-system-* -nodefaults -machine accel=qtest -no-shutdown -nographic 
> -monitor stdio -serial none -hda /root/t.qcow2
> "info qtree" can only get a few properties.

I did try that in 2016 (see my series from about September); but it got
bogged down in trying to fix output visitors; it's possible the visitor
code has been fixed since then though.

The 'Show values and description when using "qom-list"' patch
that Ricardo Perez Blanco posted would do something similar.

Dave

> Thanks a lot!
> -- 
> Regards
> QingFeng Hao
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-block] Restoring bitmaps after failed/cancelled migration

2018-04-18 Thread Vladimir Sementsov-Ogievskiy

Hi all.

We now have the following problem:

If dirty-bitmaps migration capability is enabled, persistance flag is 
dropped for all migrated bitmaps, to prevent their storing to the 
storage on inactivate. It works ok, persistence itself is migrated 
through the migration channel. But on source, bitmaps becomes not 
persistent, so if we, for some reasons, want to continue using source 
vm, we'll lose bitmaps on stop/start.


It's simple to fix it: just make bitmaps persistent again on invalidate 
[1].. But I have some questions.


1. What are possible cases? I think about the following:

a. migration cancel or fail, then invalidate
b. migration success, then qmp cont => invalidate
c. migration success, then stop/start (there was no invalidate, so [1] 
will not work)



2. Is it safe at all, saving bitmaps after inactivate, even without 
persistence?


Inactive disk implies, that it may be changed by somebody other, isn't 
it? Is it possible, that target will change the disk, and then we return 
control to the source? In this case bitmaps will be invalid. So, should 
not we drop all the bitmaps on inactivate?


--
Best regards,
Vladimir




Re: [Qemu-block] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-18 Thread Kevin Wolf
Am 18.04.2018 um 15:26 hat Eric Blake geschrieben:
> On 04/05/2018 12:06 PM, Kevin Wolf wrote:
> > The legacy command line syntax supports a "password-secret" option that
> > allows to pass an authentication key to Ceph. This was not supported in
> > QMP so far.
> > 
> > This patch introduces authentication options in the QAPI schema, makes
> > them do the corresponding rados_conf_set() calls and adds compatibility
> > code that translates the old "password-secret" option both for opening
> > and creating images to the new set of options.
> > 
> > Note that the old option didn't allow to explicitly specify the set of
> > allowed authentication schemes. The compatibility code assumes that if
> > "password-secret" is given, only the cephx scheme is allowed. If it's
> > missing, both none and cephx are allowed because the configuration file
> > could still provide a key.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> 
> > Any thoughts on the proposed QAPI schema or the two implementation
> > problems are welcome.
> > 
> 
> >  qapi/block-core.json |  22 +++
> >  block/rbd.c  | 102 
> > ++-
> >  2 files changed, 99 insertions(+), 25 deletions(-)
> > 
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index c50517bff3..d5ce588add 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -3170,6 +3170,19 @@
> >  
> >  
> >  ##
> > +# @RbdAuthCephx:
> > +#
> > +# @key-secret: ID of a QCryptoSecret object providing a key for 
> > cephx
> > +#  authentication. If not specified, a key from the
> > +#  specified configuration file, or the system default
> > +#  configuration is used, if present.
> > +#
> > +# Since: 2.13
> > +##
> > +{ 'struct': 'RbdAuthCephx',
> > +  'data': { '*key-secret': 'str' } }
> > +
> > +##
> >  # @BlockdevOptionsRbd:
> >  #
> >  # @pool:   Ceph pool name.
> > @@ -3184,6 +3197,13 @@
> >  #
> >  # @user:   Ceph id name.
> >  #
> > +# @auth-none:  true if connecting to a server without 
> > authentication
> > +#  should be allowed (default: false; since 2.13)
> > +#
> > +# @auth-cephx: Configuration for cephx authentication if 
> > specified. If
> > +#  not specified, cephx authentication is not allowed.
> > +#  (since 2.13)
> > +#
> >  # @server: Monitor host address and port.  This maps
> >  #  to the "mon_host" Ceph option.
> >  #
> > @@ -3195,6 +3215,8 @@
> >  '*conf': 'str',
> >  '*snapshot': 'str',
> >  '*user': 'str',
> > +'*auth-none': 'bool',
> > +'*auth-cephx': 'RbdAuthCephx',
> >  '*server': ['InetSocketAddressBase'] } }
> 
> Would it be better to have this be a flat union with 'auth' with enum
> values 'none', 'cephx', 'both' as a discriminator that determines which
> additional fields can be present?  Or does that require that we first
> fix the QAPI generator to allow nesting a flat union within another flat
> union (probably doable, just no one has needed it before now)?  Is it
> also time to improve the QAPI generator to allow a default value to the
> discriminator field, rather than requiring the field to be present?

Both options can be enabled at the same time, so that the client
connects to a server no matter whether it does 'cephx' authentication or
only 'none. This is even the default for rbd driver (in the existing
command line interface, but I think we need to stay compatible with it).
With a union you would have to explicitly choose one or the other, but
could never accept both.

The other option we were considering was a list of authentication
options, which would be easier to implement, but isn't really an
accurate representation of what we really accept. There is no way we
could meaningfully implement something like this:

'auth': [ { 'type': 'cephx', 'key-secret': 'foo' },
  { 'type': 'cephx', 'key-secret': 'bar' } ]

Because Ceph only allows us to enable the 'cephx' authentication method
and to set a single key for it.

Kevin


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v4 2/5] qcow2: Document some maximum size constraints

2018-04-18 Thread Eric Blake
On 04/13/2018 12:08 PM, Max Reitz wrote:

 With 512 byte clusters and 64 bit refcount entries I still get 8 PB,
 way over what's limited by the L1/L2 tables (128 GB).
>>>
>>> Do I need to make any modifications to the sentence, then?
>>
>> I guess what surprised me the first time that I read it was that it
>> suggests that this has to be taken into account when calculating the
>> physical limits of an image, while in practice it can be ignored.
>>
>> You could say something like 
>>
>>   Although the larger the cluster size, the larger the offsets that can
>>   be covered by the refcount table, in practice these limits cannot be
>>   reached because they are larger than the ones imposed by other data
>>   structures.
> 
> Are there any updates here?  I guess I personally would just drop the
> whole paragraph, because I think it really doesn't matter...

Yeah, I need to post a v5 of this series now that 2.13 is nearly open.

> 
> Also note that the maximum file size of ext4 is 16 PB (for 4 kB blocks).
>  OK, it's bigger for XFS, but that still gives some perspective.
> 
> Also, long before anyone is going to complain about the specification
> failing to mention that limit, they are going to complain that qemu
> refuses to open their image (because of its limit on the reftable size).
> 
> Max
> 
>> although I'm sure that you can come up with a better wording than mine :)
>>
>> Berto
>>
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-18 Thread Kevin Wolf
Am 18.04.2018 um 15:21 hat Eric Blake geschrieben:
> On 04/06/2018 03:04 AM, Kevin Wolf wrote:
> > Am 05.04.2018 um 19:06 hat Kevin Wolf geschrieben:
> >> The legacy command line syntax supports a "password-secret" option that
> >> allows to pass an authentication key to Ceph. This was not supported in
> >> QMP so far.
> >>
> >> This patch introduces authentication options in the QAPI schema, makes
> >> them do the corresponding rados_conf_set() calls and adds compatibility
> >> code that translates the old "password-secret" option both for opening
> >> and creating images to the new set of options.
> >>
> >> Note that the old option didn't allow to explicitly specify the set of
> >> allowed authentication schemes. The compatibility code assumes that if
> >> "password-secret" is given, only the cephx scheme is allowed. If it's
> >> missing, both none and cephx are allowed because the configuration file
> >> could still provide a key.
> > 
> > There is another problem here that suggests that maybe this is not the
> > right QAPI schema after all: The defaults needed for command line
> > compatibility and those promised in the QAPI schema are conflicting.
> > 
> > The required command line behaviour is as described above:
> > 
> > * password-secret given: only cephx
> > * no options given: cephx, none
> > 
> > The desired QMP default behaviour is:
> > 
> > * auth-cephx given: allow cephx
> > * auth-none given: allow none
> > * both given: allow both
> > * no options given: error
> > 
> > In .bdrv_open() there is no way to distinguish the "no options given" of
> > the command line from that of QMP. The current implementation allows
> > everything if no options are given, i.e. it keeps existing command lines
> > working, but it doesn't correctly implement the behaviour described in
> > the QAPI schema.
> 
> Can we use a QAPI alternate with an explicit JSON null for the command
> line 'no options given' case?

The fundamental problem is that we can only have a single default value
for both command line and QMP. I don't think an alternate would change
anything there.

Both the commands line and the QMP 'no options given' cases would end up
being represented by a missing key in the options QDict. null would only
be used if explicitly given in blockdev-add (I don't think you can
specify null on the command line at all).

> > 
> > I don't think changing the description of the QAPI schema would be a
> > good idea, it would be a rather surprising interface.
> > 
> >> Signed-off-by: Kevin Wolf 
> >> ---
> >>
> >> This doesn't actually work correctly yet because the way that options
> >> are passed through the block layer (QAPI -> QemuOpts -> QDict). Before
> >> we fix or hack around this, let's make sure this is the schema that we
> >> want.
> 
> Can we skip the intermediate QemuOpts?  If we can go straight from
> command line to QDict using just QAPI, won't that give us what we really
> need?

In fact, I think that's what -blockdev already does. The one that I had
in mind is -drive, which adds the additional QemuOpts step, but we don't
have to support -drive with a new syntax.

However, the problem is with the representation in the QDict, so
skipping QemuOpts doesn't help.

> >>
> >> The two known problems are:
> >>
> >> 1. QDict *options in qemu_rbd_open() may contain options either in their
> >>proper QObject types (if passed from blockdev-add) or as strings
> >>(-drive). Both forms can be mixed in the same options QDict.
> >>
> >>rbd uses the keyval visitor to convert the options into a QAPI
> >>object. This means that it requires all options to be strings. This
> >>patch, however, introduces a bool property, so the visitor breaks
> >>when it gets its input from blockdev-add.
> >>
> >>Options to hack around the problem:
> >>
> >>a. Do an extra conversion step or two and go through QemuOpts like
> >>   some other block drivers. When I offered something like this to
> >>   Markus a while ago in a similar case, he rejected the idea.
> >>
> >>b. Introduce a qdict_stringify_entries() that just does what its name
> >>   says. It would be called before the running keyval visitor so that
> >>   only strings will be present in the QDict.
> >>
> >>c. Do a local one-off hack that checks if the entry with the key
> >>   "auth-none" is a QBool, and if so, overwrite it with a string. The
> >>   problem will reappear with the next non-string option.
> >>
> >>(d. Get rid of the QDict detour and work only with QAPI objects
> >>everywhere. Support rbd authentication only in QEMU 4.0.)
> 
> Oh, even one step further than just avoiding QemuOpts, but using REAL
> types everywhere in the block layer!  It might be a nice cleanup, but it
> would probably take a lot of effort in the meantime to get to that point?

Yes, I would like to get there, but it's definitely a long term goal
(that's why I wrote "QEMU 4.0"). It definitely means getting rid 

Re: [Qemu-block] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-18 Thread Eric Blake
On 04/05/2018 12:06 PM, Kevin Wolf wrote:
> The legacy command line syntax supports a "password-secret" option that
> allows to pass an authentication key to Ceph. This was not supported in
> QMP so far.
> 
> This patch introduces authentication options in the QAPI schema, makes
> them do the corresponding rados_conf_set() calls and adds compatibility
> code that translates the old "password-secret" option both for opening
> and creating images to the new set of options.
> 
> Note that the old option didn't allow to explicitly specify the set of
> allowed authentication schemes. The compatibility code assumes that if
> "password-secret" is given, only the cephx scheme is allowed. If it's
> missing, both none and cephx are allowed because the configuration file
> could still provide a key.
> 
> Signed-off-by: Kevin Wolf 
> ---

> Any thoughts on the proposed QAPI schema or the two implementation
> problems are welcome.
> 

>  qapi/block-core.json |  22 +++
>  block/rbd.c  | 102 
> ++-
>  2 files changed, 99 insertions(+), 25 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index c50517bff3..d5ce588add 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3170,6 +3170,19 @@
>  
>  
>  ##
> +# @RbdAuthCephx:
> +#
> +# @key-secret: ID of a QCryptoSecret object providing a key for cephx
> +#  authentication. If not specified, a key from the
> +#  specified configuration file, or the system default
> +#  configuration is used, if present.
> +#
> +# Since: 2.13
> +##
> +{ 'struct': 'RbdAuthCephx',
> +  'data': { '*key-secret': 'str' } }
> +
> +##
>  # @BlockdevOptionsRbd:
>  #
>  # @pool:   Ceph pool name.
> @@ -3184,6 +3197,13 @@
>  #
>  # @user:   Ceph id name.
>  #
> +# @auth-none:  true if connecting to a server without authentication
> +#  should be allowed (default: false; since 2.13)
> +#
> +# @auth-cephx: Configuration for cephx authentication if specified. 
> If
> +#  not specified, cephx authentication is not allowed.
> +#  (since 2.13)
> +#
>  # @server: Monitor host address and port.  This maps
>  #  to the "mon_host" Ceph option.
>  #
> @@ -3195,6 +3215,8 @@
>  '*conf': 'str',
>  '*snapshot': 'str',
>  '*user': 'str',
> +'*auth-none': 'bool',
> +'*auth-cephx': 'RbdAuthCephx',
>  '*server': ['InetSocketAddressBase'] } }

Would it be better to have this be a flat union with 'auth' with enum
values 'none', 'cephx', 'both' as a discriminator that determines which
additional fields can be present?  Or does that require that we first
fix the QAPI generator to allow nesting a flat union within another flat
union (probably doable, just no one has needed it before now)?  Is it
also time to improve the QAPI generator to allow a default value to the
discriminator field, rather than requiring the field to be present?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-18 Thread Eric Blake
On 04/06/2018 03:04 AM, Kevin Wolf wrote:
> Am 05.04.2018 um 19:06 hat Kevin Wolf geschrieben:
>> The legacy command line syntax supports a "password-secret" option that
>> allows to pass an authentication key to Ceph. This was not supported in
>> QMP so far.
>>
>> This patch introduces authentication options in the QAPI schema, makes
>> them do the corresponding rados_conf_set() calls and adds compatibility
>> code that translates the old "password-secret" option both for opening
>> and creating images to the new set of options.
>>
>> Note that the old option didn't allow to explicitly specify the set of
>> allowed authentication schemes. The compatibility code assumes that if
>> "password-secret" is given, only the cephx scheme is allowed. If it's
>> missing, both none and cephx are allowed because the configuration file
>> could still provide a key.
> 
> There is another problem here that suggests that maybe this is not the
> right QAPI schema after all: The defaults needed for command line
> compatibility and those promised in the QAPI schema are conflicting.
> 
> The required command line behaviour is as described above:
> 
> * password-secret given: only cephx
> * no options given: cephx, none
> 
> The desired QMP default behaviour is:
> 
> * auth-cephx given: allow cephx
> * auth-none given: allow none
> * both given: allow both
> * no options given: error
> 
> In .bdrv_open() there is no way to distinguish the "no options given" of
> the command line from that of QMP. The current implementation allows
> everything if no options are given, i.e. it keeps existing command lines
> working, but it doesn't correctly implement the behaviour described in
> the QAPI schema.

Can we use a QAPI alternate with an explicit JSON null for the command
line 'no options given' case?

> 
> I don't think changing the description of the QAPI schema would be a
> good idea, it would be a rather surprising interface.
> 
>> Signed-off-by: Kevin Wolf 
>> ---
>>
>> This doesn't actually work correctly yet because the way that options
>> are passed through the block layer (QAPI -> QemuOpts -> QDict). Before
>> we fix or hack around this, let's make sure this is the schema that we
>> want.

Can we skip the intermediate QemuOpts?  If we can go straight from
command line to QDict using just QAPI, won't that give us what we really
need?

>>
>> The two known problems are:
>>
>> 1. QDict *options in qemu_rbd_open() may contain options either in their
>>proper QObject types (if passed from blockdev-add) or as strings
>>(-drive). Both forms can be mixed in the same options QDict.
>>
>>rbd uses the keyval visitor to convert the options into a QAPI
>>object. This means that it requires all options to be strings. This
>>patch, however, introduces a bool property, so the visitor breaks
>>when it gets its input from blockdev-add.
>>
>>Options to hack around the problem:
>>
>>a. Do an extra conversion step or two and go through QemuOpts like
>>   some other block drivers. When I offered something like this to
>>   Markus a while ago in a similar case, he rejected the idea.
>>
>>b. Introduce a qdict_stringify_entries() that just does what its name
>>   says. It would be called before the running keyval visitor so that
>>   only strings will be present in the QDict.
>>
>>c. Do a local one-off hack that checks if the entry with the key
>>   "auth-none" is a QBool, and if so, overwrite it with a string. The
>>   problem will reappear with the next non-string option.
>>
>>(d. Get rid of the QDict detour and work only with QAPI objects
>>everywhere. Support rbd authentication only in QEMU 4.0.)

Oh, even one step further than just avoiding QemuOpts, but using REAL
types everywhere in the block layer!  It might be a nice cleanup, but it
would probably take a lot of effort in the meantime to get to that point?

>>
>> 2. The proposed schema allows 'auth-cephx': {} as a valid option with
>>the meaning that the cephx authentication scheme is enabled, but no
>>key is given (e.g. it is taken from the config file).
>>
>>However, an empty dict cannot currently be represented by flattened
>>QDicts. We need to find a way to enable this. I think this will be
>>externally visible because it directly translates into the dotted
>>syntax of -blockdev, so we may want to be careful.

Can we just require users to give -blockdev with the JSON format (rather
than dotted format) when they need to use that particular feature on the
command line?

>>
>> Any thoughts on the proposed QAPI schema or the two implementation
>> problems are welcome.
> 
> Kevin
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 0/2] Give the refcount cache the minimum possible size by default

2018-04-18 Thread Max Reitz
On 2018-04-17 14:37, Alberto Garcia wrote:
> Hi,
> 
> we talked about this the other day, so here are the patches to change
> the default cache sizes in qcow2.
> 
> Without this patch:
> 
>  * refcount-cache-size = l2-cache-size / 4
> 
> unless otherwise specified by the user. This is wasteful, the refcount
> cache is accessed sequentially during normal I/O, so there's no point
> in caching more tables. I measured the effect on the refcount cache
> size when populating an empty qcow2 image using random writes, and
> there's no difference between having the minimum or the maximum
> sizes(*).
> 
> With this patch:
> 
>  * refcount-cache-size is always 4 clusters by default (the minimum)
> 
>  * If "cache-size" is set then l2-cache-size is set to the maximum if
>possible (disk_size * 8 / cluster_size) and the remainder is
>assigned to the refcount cache.
> 
> Regards,
> 
> Berto
> 
> (*) there is, actually: having a very large cache can even make the
> I/O slightly slower, because the larger the cache the longer it
> takes longer to find a cached entry. I only noticed this under
> tmpfs anyway.
> 
> Changes:
> v3:
> - Mention that if you use internal snapshots you may want to increase
>   the cache size [Max]
> 
> v2: https://lists.gnu.org/archive/html/qemu-block/2018-03/msg00822.html
> - s/overriden/overridden/ (in both patches)
> 
> v1: https://lists.gnu.org/archive/html/qemu-block/2018-03/msg00709.html
> - Initial release
> 
> Alberto Garcia (2):
>   qcow2: Give the refcount cache the minimum possible size by default
>   docs: Document the new default sizes of the qcow2 caches
> 
>  block/qcow2.c  | 31 +++
>  block/qcow2.h  |  4 
>  docs/qcow2-cache.txt   | 33 -
>  tests/qemu-iotests/137.out |  2 +-
>  4 files changed, 36 insertions(+), 34 deletions(-)

Thanks, applied to my block-next branch:

https://github.com/XanClic/qemu/commits/block-next

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck

2018-04-18 Thread David Lee
On Wed, Apr 18, 2018 at 4:44 PM, Fam Zheng  wrote:
>
> qemu-img hangs because the convert_iteration_sectors loop cannot make any
> progress when it reaches the end of the base image. It is a bug (implicitly?)
> fixed by Eric Blake (Cc'ed) 's BDRV_BLOCK_EOF patches on upstream, backporting
> them to the above downstream version fixes the problem for me:
>
> commit c61e684e44272f2acb2bef34cf2aa234582a73a9
> Author: Eric Blake 
>
> block: Exploit BDRV_BLOCK_EOF for larger zero blocks
>
> commit fb0d8654ffc3ea1494067327fc4c4da5d0872724
> Author: Eric Blake 
>
> block: Add BDRV_BLOCK_EOF to bdrv_get_block_status()
>
> Fam

Fam,

Thanks for the info.

-- 
Thanks,
Li Qun



Re: [Qemu-block] [PATCH 3/3] qemu-iotests: Test new qemu-nbd --nolist option

2018-04-18 Thread Vladimir Sementsov-Ogievskiy

17.04.2018 22:56, Eric Blake wrote:

On 04/13/2018 02:26 PM, Nir Soffer wrote:

Add new test module for tesing the --nolist option.

Signed-off-by: Nir Soffer 
---
+iotests.log('Check that listing exports is allowed by default')
+disk, nbd_sock = iotests.file_path('disk1', 'nbd-sock1')
+iotests.qemu_img_create('-f', iotests.imgfmt, disk, '1m')
+iotests.qemu_nbd('-k', nbd_sock, '-f', iotests.imgfmt, '-x', 'export', disk)
+out = iotests.run('nbd-client', '-l', '--unix', nbd_sock)

Should we really be relying on the third-party nbd-client to be
installed?  Would it not be better to teach our own NBD client to learn
how to do interesting things over NBD?  Your use case of listing the
output of NBD_OPT_LIST is one, but another that readily comes to mind is
listing the possibilities of NBD_OPT_LIST_META_CONTEXT that just went
into 2.12.  Maybe making 'qemu-img info' give more details when
connecting to an NBD server, compared to what is normally needed just
for connecting to an export on that server?

Additionally, once we merge in Vladimir's work to expose persistent
dirty bitmaps via NBD_OPT_SET_META_CONTEXT/NBD_CMD_BLOCK_STATUS, it
would be nice to have an in-tree tool for reading out the context
information of an NBD export, perhaps extending what 'qemu-img map' can
already do (Vladimir already mentioned that he only implemented the
server side, and left the client side for an out-of-tree solution [1],
although I'm wondering if that is still the wisest course of action).

[1]


Client part for base:allocation is done, and qemu-img map shows it, 
client part for dirty bitmaps export - isn't.






+
+assert 'export' in out.splitlines(), 'Export not in %r' % out
+
+iotests.log('Check that listing exports is forbidden with --nolist')
+disk, nbd_sock = iotests.file_path('disk2', 'nbd-sock2')
+iotests.qemu_img_create('-f', iotests.imgfmt, disk, '1m')
+iotests.qemu_nbd('-k', nbd_sock, '-f', iotests.imgfmt, '-x', 'secret',
+ '--nolist', disk)
+
+# nbd-client fails when listing is not allowed, but lets not depend on 3rd

s/lets/let's/




--
Best regards,
Vladimir




Re: [Qemu-block] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-18 Thread Kevin Wolf
Am 06.04.2018 um 10:04 hat Kevin Wolf geschrieben:
> Am 05.04.2018 um 19:06 hat Kevin Wolf geschrieben:
> > The legacy command line syntax supports a "password-secret" option that
> > allows to pass an authentication key to Ceph. This was not supported in
> > QMP so far.
> > 
> > This patch introduces authentication options in the QAPI schema, makes
> > them do the corresponding rados_conf_set() calls and adds compatibility
> > code that translates the old "password-secret" option both for opening
> > and creating images to the new set of options.
> > 
> > Note that the old option didn't allow to explicitly specify the set of
> > allowed authentication schemes. The compatibility code assumes that if
> > "password-secret" is given, only the cephx scheme is allowed. If it's
> > missing, both none and cephx are allowed because the configuration file
> > could still provide a key.
> 
> There is another problem here that suggests that maybe this is not the
> right QAPI schema after all: The defaults needed for command line
> compatibility and those promised in the QAPI schema are conflicting.
> 
> The required command line behaviour is as described above:
> 
> * password-secret given: only cephx
> * no options given: cephx, none
> 
> The desired QMP default behaviour is:
> 
> * auth-cephx given: allow cephx
> * auth-none given: allow none
> * both given: allow both
> * no options given: error
> 
> In .bdrv_open() there is no way to distinguish the "no options given" of
> the command line from that of QMP. The current implementation allows
> everything if no options are given, i.e. it keeps existing command lines
> working, but it doesn't correctly implement the behaviour described in
> the QAPI schema.
> 
> I don't think changing the description of the QAPI schema would be a
> good idea, it would be a rather surprising interface.
> 
> > Signed-off-by: Kevin Wolf 
> > ---
> > 
> > This doesn't actually work correctly yet because the way that options
> > are passed through the block layer (QAPI -> QemuOpts -> QDict). Before
> > we fix or hack around this, let's make sure this is the schema that we
> > want.
> > 
> > The two known problems are:
> > 
> > 1. QDict *options in qemu_rbd_open() may contain options either in their
> >proper QObject types (if passed from blockdev-add) or as strings
> >(-drive). Both forms can be mixed in the same options QDict.
> > 
> >rbd uses the keyval visitor to convert the options into a QAPI
> >object. This means that it requires all options to be strings. This
> >patch, however, introduces a bool property, so the visitor breaks
> >when it gets its input from blockdev-add.
> > 
> >Options to hack around the problem:
> > 
> >a. Do an extra conversion step or two and go through QemuOpts like
> >   some other block drivers. When I offered something like this to
> >   Markus a while ago in a similar case, he rejected the idea.
> > 
> >b. Introduce a qdict_stringify_entries() that just does what its name
> >   says. It would be called before the running keyval visitor so that
> >   only strings will be present in the QDict.
> > 
> >c. Do a local one-off hack that checks if the entry with the key
> >   "auth-none" is a QBool, and if so, overwrite it with a string. The
> >   problem will reappear with the next non-string option.
> > 
> >(d. Get rid of the QDict detour and work only with QAPI objects
> >everywhere. Support rbd authentication only in QEMU 4.0.)
> > 
> > 2. The proposed schema allows 'auth-cephx': {} as a valid option with
> >the meaning that the cephx authentication scheme is enabled, but no
> >key is given (e.g. it is taken from the config file).
> > 
> >However, an empty dict cannot currently be represented by flattened
> >QDicts. We need to find a way to enable this. I think this will be
> >externally visible because it directly translates into the dotted
> >syntax of -blockdev, so we may want to be careful.
> > 
> > Any thoughts on the proposed QAPI schema or the two implementation
> > problems are welcome.

Ping?

If nobody has an opinion, we might as well just revert the revert and
bring the legacy interface 1:1 to QMP.

Kevin



Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck

2018-04-18 Thread Fam Zheng
On Wed, 04/18 15:58, Fam Zheng wrote:
> On Wed, 04/18 15:42, David Lee wrote:
> > On Thu, Apr 12, 2018 at 11:57 PM, David Lee  wrote:
> > >>>
> > >>> We tested qemu-kvm-ev-2.9.0-16.el7_4.14.1 - where from the source RPM we
> > >>> verified it does contain ef6dada8b44e1e7c4bec5c1115903af9af415b50
> > >>>
> > >>> But the issue still exists.  The convert got stuck if one of the old
> > >>> active overlay
> > >>> had been 'vol-resize'd  with qemu monitor command to a larger size.  
> > >>> This looks
> > >>> like a prerequisite but not sufficient condition to trigger this 
> > >>> badness.
> > >>
> > >> So it is a separate issue. Did you try upstream master as well?
> > >>
> > >> Fam
> > >
> > > Not yet.
> > 
> > Stefan & FAM,
> > 
> > Here are the steps to reproduce this issue reliably:
> > 
> > # qemu-img create -f qcow2 test.qcow2 100m
> > ... omitted
> > # qemu-img create -F qcow2 -f qcow2 -b test.qcow2 overlay.qcow2
> > ... omitted
> > # qemu-img resize overlay.qcow2 +20m
> > Image resized.
> > # qemu-img create -F qcow2 -f qcow2 -b overlay.qcow2 overlay2.qcow2
> > ... omitted
> > # qemu-img convert overlay2.qcow2 -f qcow2 -O qcow2 combined.qcow2
> > [hang]
> > 
> > 
> > # qemu-img --version
> > qemu-img version 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.14.1)
> 
> Thanks, I can reproduce this but not on master. I will take a look.

qemu-img hangs because the convert_iteration_sectors loop cannot make any
progress when it reaches the end of the base image. It is a bug (implicitly?)
fixed by Eric Blake (Cc'ed) 's BDRV_BLOCK_EOF patches on upstream, backporting
them to the above downstream version fixes the problem for me:

commit c61e684e44272f2acb2bef34cf2aa234582a73a9
Author: Eric Blake 

block: Exploit BDRV_BLOCK_EOF for larger zero blocks

commit fb0d8654ffc3ea1494067327fc4c4da5d0872724
Author: Eric Blake 

block: Add BDRV_BLOCK_EOF to bdrv_get_block_status()

Fam



Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck

2018-04-18 Thread Fam Zheng
On Wed, 04/18 15:42, David Lee wrote:
> On Thu, Apr 12, 2018 at 11:57 PM, David Lee  wrote:
> >>>
> >>> We tested qemu-kvm-ev-2.9.0-16.el7_4.14.1 - where from the source RPM we
> >>> verified it does contain ef6dada8b44e1e7c4bec5c1115903af9af415b50
> >>>
> >>> But the issue still exists.  The convert got stuck if one of the old
> >>> active overlay
> >>> had been 'vol-resize'd  with qemu monitor command to a larger size.  This 
> >>> looks
> >>> like a prerequisite but not sufficient condition to trigger this badness.
> >>
> >> So it is a separate issue. Did you try upstream master as well?
> >>
> >> Fam
> >
> > Not yet.
> 
> Stefan & FAM,
> 
> Here are the steps to reproduce this issue reliably:
> 
> # qemu-img create -f qcow2 test.qcow2 100m
> ... omitted
> # qemu-img create -F qcow2 -f qcow2 -b test.qcow2 overlay.qcow2
> ... omitted
> # qemu-img resize overlay.qcow2 +20m
> Image resized.
> # qemu-img create -F qcow2 -f qcow2 -b overlay.qcow2 overlay2.qcow2
> ... omitted
> # qemu-img convert overlay2.qcow2 -f qcow2 -O qcow2 combined.qcow2
> [hang]
> 
> 
> # qemu-img --version
> qemu-img version 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.14.1)

Thanks, I can reproduce this but not on master. I will take a look.

Fam



Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck

2018-04-18 Thread David Lee
On Thu, Apr 12, 2018 at 11:57 PM, David Lee  wrote:
>>>
>>> We tested qemu-kvm-ev-2.9.0-16.el7_4.14.1 - where from the source RPM we
>>> verified it does contain ef6dada8b44e1e7c4bec5c1115903af9af415b50
>>>
>>> But the issue still exists.  The convert got stuck if one of the old
>>> active overlay
>>> had been 'vol-resize'd  with qemu monitor command to a larger size.  This 
>>> looks
>>> like a prerequisite but not sufficient condition to trigger this badness.
>>
>> So it is a separate issue. Did you try upstream master as well?
>>
>> Fam
>
> Not yet.

Stefan & FAM,

Here are the steps to reproduce this issue reliably:

# qemu-img create -f qcow2 test.qcow2 100m
... omitted
# qemu-img create -F qcow2 -f qcow2 -b test.qcow2 overlay.qcow2
... omitted
# qemu-img resize overlay.qcow2 +20m
Image resized.
# qemu-img create -F qcow2 -f qcow2 -b overlay.qcow2 overlay2.qcow2
... omitted
# qemu-img convert overlay2.qcow2 -f qcow2 -O qcow2 combined.qcow2
[hang]


# qemu-img --version
qemu-img version 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.14.1)

-- 
Thanks,
Li Qun



Re: [Qemu-block] [Qemu-devel] [PATCH v5 00/21] blockjobs: add explicit job management

2018-04-18 Thread Markus Armbruster
John Snow  writes:

> On 04/17/2018 09:44 AM, Markus Armbruster wrote:
>> John Snow  writes:
>> 
>>> This series seeks to address two distinct but closely related issues
>>> concerning the job management API.
>>>
>>> (1) For jobs that complete when a monitor is not attached and receiving
>>> events or notifications, there's no way to discern the job's final
>>> return code. Jobs must remain in the query list until dismissed
>>> for reliable management.
>>>
>>> (2) Jobs that change the block graph structure at an indeterminate point
>>> after the job starts compete with the management layer that relies
>>> on that graph structure to issue meaningful commands.
>>>
>>> This structure should change only at the behest of the management
>>> API, and not asynchronously at unknown points in time. Before a job
>>> issues such changes, it must rely on explicit and synchronous
>>> confirmation from the management API.
>>>
>>> These changes are implemented by formalizing a State Transition Machine
>>> for the BlockJob subsystem.
>>>
>>> Job States:
>>>
>>> UNDEFINED   Default state. Internal state only.
>>> CREATED Job has been created
>>> RUNNING Job has been started and is running
>>> PAUSED  Job is not ready and has been paused
>>> READY   Job is ready and is running
>>> STANDBY Job is ready and is paused
>>>
>>> WAITING Job is waiting on peers in transaction
>>> PENDING Job is waiting on ACK from QMP
>>> ABORTINGJob is aborting or has been cancelled
>>> CONCLUDED   Job has finished and has a retcode available
>>> NULLJob is being dismantled. Internal state only.
>>>
>>> Job Verbs:
>>>
>
> Backporting your quote up here:
>
>> For each job verb and job state: what's the new job state?
>>
>
> That's not always 1:1, though I tried to address it in the commit messages.

Let me rephrase my question then.  For each job verb and job state: what
are the possible new job states?  If there's more than one, what's the
condition for each?

I appreciate commit messages explaining that, but having complete state
machine documentation in one place (a comment or in docs/) would be
nice, wouldn't it?

>>> CANCEL  Instructs a running job to terminate with error,
>>> (Except when that job is READY, which produces no error.)
>
> CANCEL will take a job to either NULL... (this is the early abort
> pathway, prior to the job being fully realized.)
>
> ...or to ABORTING (from CREATED once it has fully realized the job, or
> from RUNNING, READY, WAITING, or PENDING.)
>
>>> PAUSE   Request a job to pause.
>
> issued to RUNNING or READY, transitions to PAUSED or STANDBY respectively.
>
>>> RESUME  Request a job to resume from a pause.
>
> issued to PAUSED or STANDBY, transitions to RUNNING or READY respectively.
>
>>> SET-SPEED   Change the speed limiting parameter of a job.
>
> No run state change.
>
>>> COMPLETEAsk a READY job to finish and exit.
>>>
>
> Issued to a READY job, transitions to WAITING.
>
>>> FINALIZEAsk a PENDING job to perform its graph finalization.
>
> Issued to a PENDING job, transitions to CONCLUDED.
>
>>> DISMISS Finish cleaning up an empty job.
>> 
>
> Issued to a CONCLUDED job, transitions to NULL.
>
>
>>> And here's my stab at a diagram:
>>>
>>>  +-+
>>>  |UNDEFINED|
>>>  +--+--+
>>> |
>>>  +--v+
>>>+-+CREATED+-+
>>>| +--++ |
>>>||  |
>>>| +--++ +--+|
>>>+-+RUNNING<->PAUSED||
>>>| +--+-+--+ +--+|
>>>|| ||
>>>|| +--+ |
>>>||| |
>>>| +--v--+   +---+ | |
>>>+-+READY<--->STANDBY| | |
>>>| +--+--+   +---+ | |
>>>||| |
>>>| +--v+   | |
>>>+-+WAITING<---+ |
>>>| +--++ |
>>>||  |
>>>| +--v+ |
>>>+-+PENDING| |
>>>| +--++ |
>>>||  |
>>> +--v-+   +--v--+   |
>>> |ABORTING+--->CONCLUDED|   |
>>> ++   +--+--+   |
>>> |  |
>>>  +--v-+|
>>>  |NULL<+
>>>  ++
>> 
>> Is this diagram missing a few arrowheads?  E.g. on the edge between
>>