Re: [PATCH v6 02/14] qcrypto/luks: implement encryption key management

2020-05-17 Thread Maxim Levitsky
On Thu, 2020-05-14 at 13:56 +0200, Max Reitz wrote:
> On 10.05.20 15:40, Maxim Levitsky wrote:
> > Next few patches will expose that functionality to the user.
> > 
> > Signed-off-by: Maxim Levitsky 
> > Reviewed-by: Daniel P. Berrangé 
> > ---
> >  crypto/block-luks.c | 395 +++-
> >  qapi/crypto.json|  61 ++-
> >  2 files changed, 452 insertions(+), 4 deletions(-)
> > 
> > diff --git a/crypto/block-luks.c b/crypto/block-luks.c
> > index 4861db810c..967d382882 100644
> > --- a/crypto/block-luks.c
> > +++ b/crypto/block-luks.c
> 
> [...]
> 
> > @@ -1069,6 +1076,119 @@ qcrypto_block_luks_find_key(QCryptoBlock *block,
> >  return -1;
> >  }
> >  
> > +/*
> > + * Returns true if a slot i is marked as active
> > + * (contains encrypted copy of the master key)
> > + */
> > +static bool
> > +qcrypto_block_luks_slot_active(const QCryptoBlockLUKS *luks,
> > +   unsigned int slot_idx)
> > +{
> > +uint32_t val = luks->header.key_slots[slot_idx].active;
> > +return val ==  QCRYPTO_BLOCK_LUKS_KEY_SLOT_ENABLED;
> 
> One space too many after the ==.
Fixed, thanks!
> 
> [...]
> 
> > +/*
> > + * Erases an keyslot given its index
> > + * Returns:
> > + *0 if the keyslot was erased successfully
> > + *   -1 if a error occurred while erasing the keyslot
> > + *
> > + */
> > +static int
> > +qcrypto_block_luks_erase_key(QCryptoBlock *block,
> > + unsigned int slot_idx,
> > + QCryptoBlockWriteFunc writefunc,
> > + void *opaque,
> > + Error **errp)
> > +{
> > +QCryptoBlockLUKS *luks = block->opaque;
> > +QCryptoBlockLUKSKeySlot *slot = &luks->header.key_slots[slot_idx];
> > +g_autofree uint8_t *garbagesplitkey = NULL;
> > +size_t splitkeylen = luks->header.master_key_len * slot->stripes;
> 
> This accesses header.key_slots[slot_idx] before...
> 
> > +size_t i;
> > +Error *local_err = NULL;
> > +int ret;
> > +
> > +assert(slot_idx < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS);
> 
> ...we assert that slot_idx is in bounds.
Fixed, thanks as well!
There are few more places where I didn't place this assert,
and I added it just in case.

> 
> I suppose that’s kind of fine, because assertions aren’t meant to fire
> either, but this basically makes the assertion useless.
> 
> > +assert(splitkeylen > 0);
> > +garbagesplitkey = g_new0(uint8_t, splitkeylen);
> > +
> > +/* Reset the key slot header */
> > +memset(slot->salt, 0, QCRYPTO_BLOCK_LUKS_SALT_LEN);
> > +slot->iterations = 0;
> > +slot->active = QCRYPTO_BLOCK_LUKS_KEY_SLOT_DISABLED;
> > +
> > +ret = qcrypto_block_luks_store_header(block,  writefunc,
> 
> Superfluous space after the comma.
True, fixed.

> 
> > +  opaque, &local_err);
> > +
> > +if (ret < 0) {
> > +error_propagate(errp, local_err);
> > +}
> 
> error_propagate() is a no-op with local_err == NULL, so you could do
> without checking @ret (just calling error_propagate() unconditionally).
> 
> (But who cares, we need to set @ret anyway, so we might as well check it.)
Yep.
> 
> [...]
> 
> > +static int
> > +qcrypto_block_luks_amend_add_keyslot(QCryptoBlock *block,
> > + QCryptoBlockReadFunc readfunc,
> > + QCryptoBlockWriteFunc writefunc,
> > + void *opaque,
> > + QCryptoBlockAmendOptionsLUKS 
> > *opts_luks,
> > + bool force,
> > + Error **errp)
> > +{
> > +QCryptoBlockLUKS *luks = block->opaque;
> > +uint64_t iter_time = opts_luks->has_iter_time ?
> > + opts_luks->iter_time :
> > + QCRYPTO_BLOCK_LUKS_DEFAULT_ITER_TIME_MS;
> > +int keyslot;
> > +g_autofree char *old_password = NULL;
> > +g_autofree char *new_password = NULL;
> > +g_autofree uint8_t *master_key = NULL;
> 
> (I assume we don’t really care about purging secrets from memory after use)
Yep, I tried once fixing this but I decided to just leave it as is,
and later fix in independent patch series, which would need to touch
much that luks (we for example keep the qcrypto secrets in memory as well).


> 
> [...]
> 
> > +static int
> > +qcrypto_block_luks_amend_erase_keyslots(QCryptoBlock *block,
> > +QCryptoBlockReadFunc readfunc,
> > +QCryptoBlockWriteFunc writefunc,
> > +void *opaque,
> > +QCryptoBlockAmendOptionsLUKS 
> > *opts_luks,
> > +bool force,
> > +Error **errp)
> > +{
> 
> [...]
> 
> > +for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_

Re: [PATCH v6 03/14] block/amend: add 'force' option

2020-05-17 Thread Maxim Levitsky
On Thu, 2020-05-14 at 14:18 +0200, Max Reitz wrote:
> On 10.05.20 15:40, Maxim Levitsky wrote:
> > 'force' option will be used for some unsafe amend operations.
> > 
> > This includes things like erasing last keyslot in luks based formats
> > (which destroys the data, unless the master key is backed up
> > by external means), but that _might_ be desired result.
> > 
> > Signed-off-by: Maxim Levitsky 
> > Reviewed-by: Daniel P. Berrangé 
> > ---
> >  block.c   | 4 +++-
> >  block/qcow2.c | 1 +
> >  docs/tools/qemu-img.rst   | 5 -
> >  include/block/block.h | 1 +
> >  include/block/block_int.h | 1 +
> >  qemu-img-cmds.hx  | 4 ++--
> >  qemu-img.c| 8 +++-
> >  7 files changed, 19 insertions(+), 5 deletions(-)
> 
> [...]
> 
> > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> > index 0080f83a76..fc2dca6649 100644
> > --- a/docs/tools/qemu-img.rst
> > +++ b/docs/tools/qemu-img.rst
> > @@ -249,11 +249,14 @@ Command description:
> >  
> >  .. program:: qemu-img-commands
> >  
> > -.. option:: amend [--object OBJECTDEF] [--image-opts] [-p] [-q] [-f FMT] 
> > [-t CACHE] -o OPTIONS FILENAME
> > +.. option:: amend [--object OBJECTDEF] [--image-opts] [-p] [-q] [-f FMT] 
> > [-t CACHE] [--force] -o OPTIONS FILENAME
> >  
> >Amends the image format specific *OPTIONS* for the image file
> >*FILENAME*. Not all file formats support this operation.
> >  
> > +  --force allows some unsafe operations. Currently for -f luks, it allows 
> > to
> > +  erase last encryption key, and to overwrite an active encryption key.
> 
> *erase the last encryption key
Fixed, thanks!
> 
> With that fixed:
> 
> Reviewed-by: Max Reitz 
> 

Thanks for the review,
Best regards,
Maxim Levitsky




Re: [PATCH v6 04/14] block/amend: separate amend and create options for qemu-img

2020-05-17 Thread Maxim Levitsky
On Fri, 2020-05-15 at 12:24 -0500, Eric Blake wrote:
> On 5/15/20 1:22 AM, Max Reitz wrote:
> 
> > > > 
> > > > > +QCOW_COMMON_OPTIONS,
> > > > > +{ /* end of list */ }
> > > 
> > > ...the intended usage is to use the macro name followed by a comma, so
> > > including a trailing comma in the macro itself would lead to a syntax
> > > error.
> > 
> > But why is that the indended usage?  Is there something in our coding
> > style that forbids macros that don’t allow a separator to be placed
> > after them?
> 
> If we have more than one such macro, it is easier to write and indent 
> (especially when using your editor's ability to decipher enough syntax 
> to suggest how to indent):
> 
> myarray = {
>COMMON_ELEMENTS,
>MORE_ELEMENTS,
>{ /* end of list */ }
> };
> 
> than it is:
> 
> myarray = {
>COMMON_ELEMENTS
>MORE_ELEMENTS
>{ /* end of list */ }
> };
> 
> which in turn implies that it is better to NOT stick a trailing comma in 
> the macro itself.  Similarly, for macros intended to replace statements, 
> we tend to avoid the trailing ; in the macro itself, because it is 
> easier to read:
> 
> {
>code;
>MACRO();
>more code;
> }
> 
> than it is:
> 
> {
>code;
>MACRO()
>more code;
> }
> 

100% agree with that.


Here something a bit off-topic, but something that I find a bit amusing and is 
somewhat related to
hiding punctuation in macros:

I once wasted about hour of my life trying to understand why kernel ABI macro I 
added for a backport
didn't work as intended.
(This was a macro we are supposed to wrap each new struct field in it to
inform to the ABI checker
that it is OK).

I was almost ready to poke my eyes out as I were comparing what I wrote to what 
is present in
few more places that use that macro, till I finally understood that the macro 
expects you to
not stick ';' after it. It does compile fine with an extra ';', since it is 
just an empty statement,
but this was tripping some regular expression in the ABI checker script or 
something.
(It didn't give me any meaningful error).

Back to topic, I'll rebase this patch, as I always do prior to sending
a new patch series.


Best regards,
Maxim Levitsky





Re: [PATCH v6 04/14] block/amend: separate amend and create options for qemu-img

2020-05-17 Thread Maxim Levitsky
On Thu, 2020-05-14 at 14:28 +0200, Max Reitz wrote:
> On 10.05.20 15:40, Maxim Levitsky wrote:
> > Some options are only useful for creation
> > (or hard to be amended, like cluster size for qcow2), while some other
> > options are only useful for amend, like upcoming keyslot management
> > options for luks
> > 
> > Since currently only qcow2 supports amend, move all its options
> > to a common macro and then include it in each action option list.
> > 
> > In future it might be useful to remove some options which are
> > not supported anyway from amend list, which currently
> > cause an error message if amended.
> > 
> > Signed-off-by: Maxim Levitsky 
> > Reviewed-by: Daniel P. Berrangé 
> > ---
> >  block/qcow2.c | 160 +-
> >  include/block/block_int.h |   4 +
> >  qemu-img.c|  18 ++---
> >  3 files changed, 100 insertions(+), 82 deletions(-)
> > 
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index 79fbad9d76..fc494c7591 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -5520,83 +5520,96 @@ void qcow2_signal_corruption(BlockDriverState *bs, 
> > bool fatal, int64_t offset,
> >  s->signaled_corruption = true;
> >  }
> >  
> > +#define QCOW_COMMON_OPTIONS \
> > +{   \
> > +.name = BLOCK_OPT_SIZE, \
> > +.type = QEMU_OPT_SIZE,  \
> > +.help = "Virtual disk size" \
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_COMPAT_LEVEL, \
> > +.type = QEMU_OPT_STRING,\
> > +.help = "Compatibility level (v2 [0.10] or v3 [1.1])"   \
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_BACKING_FILE, \
> > +.type = QEMU_OPT_STRING,\
> > +.help = "File name of a base image" \
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_BACKING_FMT,  \
> > +.type = QEMU_OPT_STRING,\
> > +.help = "Image format of the base image"\
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_DATA_FILE,\
> > +.type = QEMU_OPT_STRING,\
> > +.help = "File name of an external data file"\
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_DATA_FILE_RAW,\
> > +.type = QEMU_OPT_BOOL,  \
> > +.help = "The external data file must stay valid "   \
> > +"as a raw image"\
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_ENCRYPT,  \
> > +.type = QEMU_OPT_BOOL,  \
> > +.help = "Encrypt the image with format 'aes'. (Deprecated " \
> > +"in favor of " BLOCK_OPT_ENCRYPT_FORMAT "=aes)",\
> > +},  \
> > +{   \
> > +.name = BLOCK_OPT_ENCRYPT_FORMAT,   \
> > +.type = QEMU_OPT_STRING,\
> > +.help = "Encrypt the image, format choices: 'aes', 'luks'", \
> > +},  \
> > +BLOCK_CRYPTO_OPT_DEF_KEY_SECRET("encrypt.", \
> > +"ID of secret providing qcow AES key or LUKS passphrase"),  \
> > +BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_ALG("encrypt."),   \
> > +BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_MODE("encrypt."),  \
> > +BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_ALG("encrypt."),\
> > +BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_HASH_ALG("encrypt."),   \
> > +BLOCK_CRYPTO_OPT_DEF_LUKS_HASH_ALG("encrypt."), \
> > +BLOCK_CRYPTO_OP

Re: [PATCH v6 05/14] block/amend: refactor qcow2 amend options

2020-05-17 Thread Maxim Levitsky
On Thu, 2020-05-14 at 15:36 +0200, Max Reitz wrote:
> On 10.05.20 15:40, Maxim Levitsky wrote:
> > Some qcow2 create options can't be used for amend.
> > Remove them from the qcow2 create options and add generic logic to detect
> > such options in qemu-img
> > 
> > Signed-off-by: Maxim Levitsky 
> > Reviewed-by: Daniel P. Berrangé 
> > ---
> >  block/qcow2.c  | 108 ++---
> >  qemu-img.c |  18 +++-
> >  tests/qemu-iotests/049.out | 102 ++--
> >  tests/qemu-iotests/061.out |  12 ++-
> >  tests/qemu-iotests/079.out |  18 ++--
> >  tests/qemu-iotests/082.out | 149 
> >  tests/qemu-iotests/085.out |  38 
> >  tests/qemu-iotests/087.out |   6 +-
> >  tests/qemu-iotests/115.out |   2 +-
> >  tests/qemu-iotests/121.out |   4 +-
> >  tests/qemu-iotests/125.out | 192 ++---
> >  tests/qemu-iotests/134.out |   2 +-
> >  tests/qemu-iotests/144.out |   4 +-
> >  tests/qemu-iotests/158.out |   4 +-
> >  tests/qemu-iotests/182.out |   2 +-
> >  tests/qemu-iotests/185.out |   8 +-
> >  tests/qemu-iotests/188.out |   2 +-
> >  tests/qemu-iotests/189.out |   4 +-
> >  tests/qemu-iotests/198.out |   4 +-
> >  tests/qemu-iotests/243.out |  16 ++--
> >  tests/qemu-iotests/250.out |   2 +-
> >  tests/qemu-iotests/255.out |   8 +-
> >  tests/qemu-iotests/259.out |   2 +-
> >  tests/qemu-iotests/263.out |   4 +-
> >  tests/qemu-iotests/280.out |   2 +-
> 
> These reference output hunks need some rebasing due to the new
> compression_type option.
Done. I so hope to get it merged so I won't need to rebase it again :-)

> 
> >  25 files changed, 284 insertions(+), 429 deletions(-)
> > 
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index fc494c7591..db86500839 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> 
> [...]
> 
> > @@ -5552,37 +5506,6 @@ void qcow2_signal_corruption(BlockDriverState *bs, 
> > bool fatal, int64_t offset,
> >  .help = "The external data file must stay valid "   \
> >  "as a raw image"\
> >  },  \
> > -{   \
> > -.name = BLOCK_OPT_ENCRYPT,  \
> > -.type = QEMU_OPT_BOOL,  \
> > -.help = "Encrypt the image with format 'aes'. (Deprecated " \
> > -"in favor of " BLOCK_OPT_ENCRYPT_FORMAT "=aes)",\
> > -},  \
> > -{   \
> > -.name = BLOCK_OPT_ENCRYPT_FORMAT,   \
> > -.type = QEMU_OPT_STRING,\
> > -.help = "Encrypt the image, format choices: 'aes', 'luks'", \
> > -},  \
> > -BLOCK_CRYPTO_OPT_DEF_KEY_SECRET("encrypt.", \
> > -"ID of secret providing qcow AES key or LUKS passphrase"),  \
> > -BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_ALG("encrypt."),   \
> > -BLOCK_CRYPTO_OPT_DEF_LUKS_CIPHER_MODE("encrypt."),  \
> > -BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_ALG("encrypt."),\
> > -BLOCK_CRYPTO_OPT_DEF_LUKS_IVGEN_HASH_ALG("encrypt."),   \
> > -BLOCK_CRYPTO_OPT_DEF_LUKS_HASH_ALG("encrypt."), \
> > -BLOCK_CRYPTO_OPT_DEF_LUKS_ITER_TIME("encrypt."),\
> > -{   \
> > -.name = BLOCK_OPT_CLUSTER_SIZE, \
> > -.type = QEMU_OPT_SIZE,  \
> > -.help = "qcow2 cluster size",   \
> > -.def_value_str = stringify(DEFAULT_CLUSTER_SIZE)\
> > -},  \
> > -{   \
> > -.name = BLOCK_OPT_PREALLOC, \
> > -.type = QEMU_OPT_STRING,\
> > -.help = "Preallocation mode (allowed values: off, " \
> > -"metadata, falloc, full)"   \
> > -},  \
> >  {   \
> >  .name = BLOCK_OPT_LAZY_REFCOUNTS,   \
> >  .type = QEMU_OPT_BOOL,  \
> > @@ -5600,6 +5523,37 @@ static QemuOptsList qcow2_create_opts = {
> >  .name = "qcow2-create-opts",
> >  .head = QTAILQ_HEAD_INITIALIZER(qcow2_create_opts.head),
> >  .desc = {
> > +{ 

Re: [PATCH v6 07/14] block/crypto: implement the encryption key management

2020-05-17 Thread Maxim Levitsky
On Thu, 2020-05-14 at 16:32 +0200, Max Reitz wrote:
> On 14.05.20 16:14, Daniel P. Berrangé wrote:
> > On Thu, May 14, 2020 at 04:09:59PM +0200, Max Reitz wrote:
> > > On 10.05.20 15:40, Maxim Levitsky wrote:
> > > > This implements the encryption key management using the generic code in
> > > > qcrypto layer and exposes it to the user via qemu-img
> > > > 
> > > > This code adds another 'write_func' because the initialization
> > > > write_func works directly on the underlying file, and amend
> > > > works on instance of luks device.
> > > > 
> > > > This commit also adds a 'hack/workaround' I and Kevin Wolf (thanks)
> > > > made to make the driver both support write sharing (to avoid breaking 
> > > > the users),
> > > > and be safe against concurrent  metadata update (the keyslots)
> > > > 
> > > > Eventually the write sharing for luks driver will be deprecated
> > > > and removed together with this hack.
> > > > 
> > > > The hack is that we ask (as a format driver) for 
> > > > BLK_PERM_CONSISTENT_READ
> > > > and then when we want to update the keys, we unshare that permission.
> > > > So if someone else has the image open, even readonly, encryption
> > > > key update will fail gracefully.
> > > > 
> > > > Also thanks to Daniel Berrange for the idea of
> > > > unsharing read, rather that write permission which allows
> > > > to avoid cases when the other user had opened the image read-only.
> > > > 
> > > > Signed-off-by: Maxim Levitsky 
> > > > Reviewed-by: Daniel P. Berrangé 
> > > > ---
> > > >  block/crypto.c | 127 +++--
> > > >  block/crypto.h |  34 +
> > > >  2 files changed, 158 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/block/crypto.c b/block/crypto.c
> > > > index 2e16b62bdc..b14cb0ff06 100644
> > > > --- a/block/crypto.c
> > > > +++ b/block/crypto.c
> > > 
> > > [...]
> > > 
> > > > +static void
> > > > +block_crypto_child_perms(BlockDriverState *bs, BdrvChild *c,
> > > > + const BdrvChildRole *role,
> > > > + BlockReopenQueue *reopen_queue,
> > > > + uint64_t perm, uint64_t shared,
> > > > + uint64_t *nperm, uint64_t *nshared)
> > > > +{
> > > > +
> > > > +BlockCrypto *crypto = bs->opaque;
> > > > +
> > > > +bdrv_filter_default_perms(bs, c, role, reopen_queue,
> > > > +perm, shared, nperm, nshared);
> > > > +/*
> > > > + * Ask for consistent read permission so that if
> > > > + * someone else tries to open this image with this permission
> > > > + * neither will be able to edit encryption keys, since
> > > > + * we will unshare that permission while trying to
> > > > + * update the encryption keys
> > > > + */
> > > > +if (!(bs->open_flags & BDRV_O_NO_IO)) {
> > > > +*nperm |= BLK_PERM_CONSISTENT_READ;
> > > > +}
> > > 
> > > I’m not sure this is important, because this really means we won’t do
> > > I/O.  Its only relevant use in this case is for qemu-img info.  Do we
> > > really care if someone edits the key slots while qemu-img info is
> > > processing?
> > 
> > FWIW, OpenStack runs  qemu-img info in a periodic background job, so
> > it can be concurrent with anything else they are running.
> 
> That might actually be a problem then, because this may cause sporadic
> failure when trying to change (amend) keyslots; while qemu-img info
> holds the CONSISTENT_READ permission, the amend process can’t unshare
> it.  That might lead to hard-to-track-down bugs.
> 
> > Having said
> > that due to previous QEMU bugs, they unconditonally pass the arg to
> > qemu-img to explicitly disable locking
> 
> Well, then it doesn’t matter in this case.  But still something to
> consider, probably.
> 
> Max
> 
So I understand correctly that I should leave the patch as is?

Thanks for the review!

Best regards,
Maxim Levitsky




Re: [PATCH v6 08/14] block/qcow2: extend qemu-img amend interface with crypto options

2020-05-17 Thread Maxim Levitsky
On Thu, 2020-05-14 at 16:30 +0200, Max Reitz wrote:
> On 10.05.20 15:40, Maxim Levitsky wrote:
> > Now that we have all the infrastructure in place,
> > wire it in the qcow2 driver and expose this to the user.
> > 
> > Signed-off-by: Maxim Levitsky 
> > Reviewed-by: Daniel P. Berrangé 
> > ---
> >  block/qcow2.c  | 72 +-
> >  tests/qemu-iotests/082.out | 45 
> 
> Again, some rebasing required because of compression_type.
Actually for this particular test, the output didn't change,
because the compression_type is not amendable (at least yet).


> 
> >  2 files changed, 108 insertions(+), 9 deletions(-)
> > 
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index db86500839..4bb6e3fc8f 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> 
> [...]
> 
> > @@ -4744,17 +4757,11 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts 
> > *opts, BlockDriverState *in_bs,
> >  g_free(optstr);
> >  
> >  if (has_luks) {
> > +
> 
> Why?
No reason. Fixed!

> 
> With this line dropped, and 082.out fixed up:
> 
> Reviewed-by: Max Reitz 
> 
Thank you very much!
Best regards,
Maxim Levitsky







Re: [PATCH v6 09/14] iotests: filter few more luks specific create options

2020-05-17 Thread Maxim Levitsky
On Thu, 2020-05-14 at 16:49 +0200, Max Reitz wrote:
> On 10.05.20 15:40, Maxim Levitsky wrote:
> > This allows more tests to be able to have same output on both qcow2 luks 
> > encrypted images
> > and raw luks images
> > 
> > Signed-off-by: Maxim Levitsky 
> > Reviewed-by: Daniel P. Berrangé 
> > ---
> >  tests/qemu-iotests/087.out   |  6 ++---
> >  tests/qemu-iotests/134.out   |  2 +-
> >  tests/qemu-iotests/158.out   |  4 +--
> >  tests/qemu-iotests/188.out   |  2 +-
> >  tests/qemu-iotests/189.out   |  4 +--
> >  tests/qemu-iotests/198.out   |  4 +--
> >  tests/qemu-iotests/263.out   |  4 +--
> >  tests/qemu-iotests/274.out   | 46 
> >  tests/qemu-iotests/284.out   |  6 ++---
> >  tests/qemu-iotests/common.filter |  6 +++--
> >  10 files changed, 43 insertions(+), 41 deletions(-)
> 
> [...]
> 
> > diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
> > index 9d6fdeb1f7..59de176b99 100644
> > --- a/tests/qemu-iotests/274.out
> > +++ b/tests/qemu-iotests/274.out
> > @@ -1,9 +1,9 @@
> >  == Commit tests ==
> > -Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=2097152 cluster_size=65536 
> > lazy_refcounts=off refcount_bits=16
> > +Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 size=2097152 
> > lazy_refcounts=off refcount_bits=16
> >  
> > -Formatting 'TEST_DIR/PID-mid', fmt=qcow2 size=1048576 
> > backing_file=TEST_DIR/PID-base cluster_size=65536 lazy_refcounts=off 
> > refcount_bits=16
> > +Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 size=1048576 
> > backing_file=TEST_DIR/PID-base lazy_refcounts=off refcount_bits=16
> >  
> > -Formatting 'TEST_DIR/PID-top', fmt=qcow2 size=2097152 
> > backing_file=TEST_DIR/PID-mid cluster_size=65536 lazy_refcounts=off 
> > refcount_bits=16
> > +Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 size=2097152 
> > backing_file=TEST_DIR/PID-mid lazy_refcounts=off refcount_bits=16
> 
> @size and @cluster_size swapping positions doesn’t look right for this
> patch.  I think all hunks for 274.out should be in patch 5.
Fixed

> 
> Max
> 

Best regards,
Maxim Levitsky





Re: [PATCH v2 5/5] vhost: add device started check in migration set log

2020-05-17 Thread Jason Wang



On 2020/5/16 上午12:54, Dima Stepanov wrote:

On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:

On 2020/5/13 下午5:47, Dima Stepanov wrote:

 case CHR_EVENT_CLOSED:
 /* a close event may happen during a read/write, but vhost
  * code assumes the vhost_dev remains setup, so delay the
  * stop & clear to idle.
  * FIXME: better handle failure in vhost code, remove bh
  */
 if (s->watch) {
 AioContext *ctx = qemu_get_current_aio_context();

 g_source_remove(s->watch);
 s->watch = 0;
 qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
  NULL, NULL, false);

 aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
 }
 break;

I think it's time we dropped the FIXME and moved the handling to common
code. Jason? Marc-André?

I agree. Just to confirm, do you prefer bh or doing changes like what is
done in this series? It looks to me bh can have more easier codes.

Could it be a good idea just to make disconnect in the char device but
postphone clean up in the vhost-user-blk (or any other vhost-user
device) itself? So we are moving the postphone logic and decision from
the char device to vhost-user device. One of the idea i have is as
follows:
   - Put ourself in the INITIALIZATION state
   - Start these vhost-user "handshake" commands
   - If we got a disconnect error, perform disconnect, but don't clean up
 device (it will be clean up on the roll back). I can be done by
 checking the state in vhost_user_..._disconnect routine or smth like it


Any issue you saw just using the aio bh as Michael posted above.

Then we don't need to deal with the silent vhost_dev_stop() and we will have
codes that is much more easier to understand.

I've implemented this solution inside
hw/block/vhost-user-blk.c:vhost_user_blk_event() in the similar way by
using the s->connected field. Looks good and more correct fix ). I have
two questions here before i'll rework the fixes:
1. Is it okay to make the similar fix inside vhost_user_blk_event() or
we are looking for more generic vhost-user solution? What do you think?



I think I agree with Michael, it's better to have a generic vhost-user 
solution. But if it turns out to be not easy, we can start from fixing 
vhost-user-blk.




2. For migration we require an additional information that for the
vhost-user device it isn't an error, because i'm trigerring the
following assert error:
   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults 
-no-user-config -M q35,sata=false'.
   Program terminated with signal SIGABRT, Aborted.
   #0  0x7fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]

   (gdb) bt
   #0  0x7fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
   #1  0x7fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
   #2  0x5648ea376ee6 in vhost_log_global_start
   (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
   #3  0x5648ea2dde7e in memory_global_dirty_log_start ()
   at ./memory.c:2611
   #4  0x5648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
   at ./migration/ram.c:2305
   #5  0x5648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 )
   at ./migration/ram.c:2323
   #6  0x5648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
   opaque=0x5648eb1f0f20 )
   at ./migration/ram.c:2436
   #7  0x5648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
   migration/savevm.c:1176
   #8  0x5648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
   migration/migration.c:3416
   #9  0x5648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
   util/qemu-thread-posix.c:519
   #10 0x7fb56eac56ba in start_thread () from
   /lib/x86_64-linux-gnu/libpthread.so.0
   #11 0x7fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
   (gdb) frame 2
   #2  0x5648ea376ee6 in vhost_log_global_start
  (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
   857 abort();
   (gdb) list
   852 {
   853 int r;
   854
   855 r = vhost_migration_log(listener, true);
   856 if (r < 0) {
   857 abort();
   858 }
   859 }
   860
   861 static void vhost_log_global_stop(MemoryListener *listener)
Since bh postphone the clean up, we can't use the ->started field.
Do we have any mechanism to get the device type/state in the common
vhost_migration_log() routine? So for example for the vhost-user/disconnect
device we will be able to return 0. Or should we implement it and introduce
it in this patch set?



This requires more thought, I will reply in Feng's mail.

Thanks




Thanks, Dima.


Thank



   - vhost-user command returns error back to the _start() routine
   - Rollback in one place in the start() routine, by calling this
 postphoned clean

Re: [PATCH v2 5/5] vhost: add device started check in migration set log

2020-05-17 Thread Jason Wang



On 2020/5/16 上午11:20, Li Feng wrote:

Hi, Dima.
This abort is what I have mentioned in my previous email.
I have triggered this crash without any fix a week ago.
And I have written a test patch to let vhost_log_global_start return
int and propagate the error to up layer.
However, my change is a little large, because the origin callback
return void, and don't do some rollback.
After test, the migration could migrate to dst successfully, and fio
is still running perfectly, but the src vm is still stuck here, no
crash.

Is it right to return this error to the up layer?



That could be a solution or we may ask David for more suggestion.

Another thing that might be useful is to block re connection during 
migration.


Thanks




Thanks,
Feng Li

Dima Stepanov  于2020年5月16日周六 上午12:55写道:

On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:

On 2020/5/13 下午5:47, Dima Stepanov wrote:

 case CHR_EVENT_CLOSED:
 /* a close event may happen during a read/write, but vhost
  * code assumes the vhost_dev remains setup, so delay the
  * stop & clear to idle.
  * FIXME: better handle failure in vhost code, remove bh
  */
 if (s->watch) {
 AioContext *ctx = qemu_get_current_aio_context();

 g_source_remove(s->watch);
 s->watch = 0;
 qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
  NULL, NULL, false);

 aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
 }
 break;

I think it's time we dropped the FIXME and moved the handling to common
code. Jason? Marc-André?

I agree. Just to confirm, do you prefer bh or doing changes like what is
done in this series? It looks to me bh can have more easier codes.

Could it be a good idea just to make disconnect in the char device but
postphone clean up in the vhost-user-blk (or any other vhost-user
device) itself? So we are moving the postphone logic and decision from
the char device to vhost-user device. One of the idea i have is as
follows:
   - Put ourself in the INITIALIZATION state
   - Start these vhost-user "handshake" commands
   - If we got a disconnect error, perform disconnect, but don't clean up
 device (it will be clean up on the roll back). I can be done by
 checking the state in vhost_user_..._disconnect routine or smth like it


Any issue you saw just using the aio bh as Michael posted above.

Then we don't need to deal with the silent vhost_dev_stop() and we will have
codes that is much more easier to understand.

I've implemented this solution inside
hw/block/vhost-user-blk.c:vhost_user_blk_event() in the similar way by
using the s->connected field. Looks good and more correct fix ). I have
two questions here before i'll rework the fixes:
1. Is it okay to make the similar fix inside vhost_user_blk_event() or
we are looking for more generic vhost-user solution? What do you think?
2. For migration we require an additional information that for the
vhost-user device it isn't an error, because i'm trigerring the
following assert error:
   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults 
-no-user-config -M q35,sata=false'.
   Program terminated with signal SIGABRT, Aborted.
   #0  0x7fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]

   (gdb) bt
   #0  0x7fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
   #1  0x7fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
   #2  0x5648ea376ee6 in vhost_log_global_start
   (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
   #3  0x5648ea2dde7e in memory_global_dirty_log_start ()
   at ./memory.c:2611
   #4  0x5648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
   at ./migration/ram.c:2305
   #5  0x5648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 )
   at ./migration/ram.c:2323
   #6  0x5648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
   opaque=0x5648eb1f0f20 )
   at ./migration/ram.c:2436
   #7  0x5648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
   migration/savevm.c:1176
   #8  0x5648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
   migration/migration.c:3416
   #9  0x5648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
   util/qemu-thread-posix.c:519
   #10 0x7fb56eac56ba in start_thread () from
   /lib/x86_64-linux-gnu/libpthread.so.0
   #11 0x7fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
   (gdb) frame 2
   #2  0x5648ea376ee6 in vhost_log_global_start
  (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
   857 abort();
   (gdb) list
   852 {
   853 int r;
   854
   855 r = vhost_migration_log(listener, true);
   856 if (r < 0) {
   857 abort();
   858 }
   859 }
   860
   861 static void vhost_log_global_stop(Memo

Re: [RFC v3 4/6] qmp: add QMP command x-debug-virtio-queue-status

2020-05-17 Thread Jason Wang



On 2020/5/15 下午11:16, Laurent Vivier wrote:

On 08/05/2020 04:57, Jason Wang wrote:

On 2020/5/7 下午7:49, Laurent Vivier wrote:

This new command shows internal status of a VirtQueue.
(vrings and indexes).

Signed-off-by: Laurent Vivier 


It looks to me that packed virtqueue is not supported. It's better to
add them in the future.

I agree, it's why the series still remains an "RFC".

...

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 59bf6ef651a6..57552bf05014 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3877,6 +3877,41 @@ static VirtIODevice *virtio_device_find(const
char *path)
   return NULL;
   }
   +VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,
+ uint16_t queue,
Error **errp)
+{
+    VirtIODevice *vdev;
+    VirtQueueStatus *status;
+
+    vdev = virtio_device_find(path);
+    if (vdev == NULL) {
+    error_setg(errp, "Path %s is not a VirtIO device", path);
+    return NULL;
+    }
+
+    if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev,
queue)) {
+    error_setg(errp, "Invalid virtqueue number %d", queue);
+    return NULL;
+    }
+
+    status = g_new0(VirtQueueStatus, 1);
+    status->queue_index = vdev->vq[queue].queue_index;
+    status->inuse = vdev->vq[queue].inuse;
+    status->vring_num = vdev->vq[queue].vring.num;
+    status->vring_num_default = vdev->vq[queue].vring.num_default;
+    status->vring_align = vdev->vq[queue].vring.align;
+    status->vring_desc = vdev->vq[queue].vring.desc;
+    status->vring_avail = vdev->vq[queue].vring.avail;
+    status->vring_used = vdev->vq[queue].vring.used;
+    status->last_avail_idx = vdev->vq[queue].last_avail_idx;


This might not be correct when vhost is used.

We may consider to sync last_avail_idx from vhost backends here?

Yes, but I don't know how to do that. Where can I find the information?



It could be synced through vhost ops vhost_get_vring_base(), see 
vhost_virtqueue_stop().


Thanks




Thanks,
Laurent





Re: [PATCH v6 07/20] hw/block/nvme: fix pin-based interrupt behavior

2020-05-17 Thread Klaus Jensen
On May 14 06:45, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> First, since the device only supports MSI-X or pin-based interrupt, if
> MSI-X is not enabled, it should not accept interrupt vectors different
> from 0 when creating completion queues.
> 
> Secondly, the irq_status NvmeCtrl member is meant to be compared to the
> INTMS register, so it should only be 32 bits wide. And it is really only
> useful when used with multi-message MSI.
> 
> Third, since we do not force a 1-to-1 correspondence between cqid and
> interrupt vector, the irq_status register should not have bits set
> according to cqid, but according to the associated interrupt vector.
> 
> Fix these issues, but keep irq_status available so we can easily support
> multi-message MSI down the line.
> 
> Fixes: 5e9aa92eb1a5 ("hw/block: Fix pin-based interrupt behaviour of NVMe")
> Cc: "Michael S. Tsirkin" 
> Cc: Marcel Apfelbaum 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c | 12 
>  hw/block/nvme.h |  2 +-
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 

Gentle ping on this and [PATCH v6 08/20].


Thanks,
Klaus