Re: [PATCH] qapi/block: fix nbd-server-add spec

2019-12-20 Thread Eric Blake

On 12/19/19 10:14 AM, Nir Soffer wrote:


1. Using the disk name as a bitmap name is bad behavior, as they are completely
different concepts. Especially keeping in mind that the user already knows the
disk name anyway, so there is no reason to write this export name inside the
metadata context of this export.


The different concept is expressed by the "qemu:dirty-bitmap:" prefix.
"qemu:dirty-bitmap:export-name" means the dirty bitmap for this export.


Why do you think so? Did you read the NBD specification?


Yes - the name of the bitmap does not have any meaning.
But for nbd_server_add we allow only a single bitmap per export.


Just because qemu is currently limited to only exposing one bitmap at 
the moment does not mean that a future version can't expose multiple 
bitmaps. It may very well be that we have reason to expose both 
"qemu:dirty-bitmap:timeA" and "qemu:dirty-bitmap:timeB" on the same 
export, for exposing two bitmaps at once.  To get to that point, we'd 
have to refactor the QAPI command to allow attaching more than one 
bitmap at the time of creating the NBD export, but it's not impossible.
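
For illustration, a client prepared for that hypothetical future could request
both contexts over one connection.  A minimal libnbd sketch, assuming the
"timeA"/"timeB" names from above and a made-up socket path:

    import nbd

    def extent(metacontext, offset, entries, err):
        # Each reply is tagged with the context it answers, so one
        # connection can carry several bitmaps at once.
        print(metacontext, offset, entries)

    h = nbd.NBD()
    h.set_export_name("export-A")
    h.add_meta_context("qemu:dirty-bitmap:timeA")
    h.add_meta_context("qemu:dirty-bitmap:timeB")
    h.connect_unix("/tmp/nbd.sock")
    h.block_status(65536, 0, extent)   # status for the first 64 KiB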





A metadata context is always owned by some export.


Of course.


Do you mean that there will be metadata contexts

qemu:dirty-bitmap:export-A
qemu:dirty-bitmap:export-B

both defined for export-A?


It does not make sense, but it is valid.


If an image has multiple bitmaps, exposing all of those as separate 
contexts at the same time for a single export can indeed make sense.





2. It's not directly documented. You assume that NAME == @name. I understand
that it may be assumed. But it's not documented.


But NAME is likely to be understood as the name argument, and unlikely to be the
bitmap name.


Yes, likely. But it's still a bad specification, which should be fixed.


If we cannot change the current behavior since it will break current users,
I agree fixing the spec to describe the current behavior is a good idea.


We need the doc fix. Whether we also want an additional fix adding an 
optional parameter allowing user control over the export name is also 
under debate (the fact that the old x-nbd-server-add-bitmap supported it 
shows that it may be useful, but it is not minimal, and as I pointed out 
at the time of removing x-, libvirt can always control what name is 
exposed by creating a temporary bitmap and merging from other bitmaps 
into the temporary).
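
For readers following along, that workaround is expressible with today's QMP
commands.  A rough iotests-style sketch (the node, bitmap, and checkpoint
names are invented for the example):

    # Create a temporary bitmap under the name clients should see ...
    vm.qmp('block-dirty-bitmap-add', node='drive0', name='visible-name')
    # ... merge the real per-checkpoint bitmaps into it ...
    vm.qmp('block-dirty-bitmap-merge', node='drive0', target='visible-name',
           bitmaps=['checkpoint-1', 'checkpoint-2'])
    # ... and export it; the context is then "qemu:dirty-bitmap:visible-name".
    vm.qmp('nbd-server-add', device='drive0', name='sda',
           bitmap='visible-name')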


We also have to think about a future of parallel backup jobs: libvirt 
can create a single temporary bitmap to expose whatever name it wants 
under one job, but if libvirt wants to expose the SAME user-visible name 
to two parallel jobs, it cannot create two bitmaps with the same name, 
so having a way to set the user-visible name of an arbitrary bitmap when 
producing the NBD export makes sense on that front.






3. It has never worked the way you describe. So if we change the behavior, we'll
break existing users.


Do we have existing users? Isn't this a new feature in 4.2?


No, it has been there since 4.0.


As long as altering the exported name is controlled by a new optional 
parameter, it does not hurt older 4.0 clients that do not use the new 
parameter.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] qapi/block: fix nbd-server-add spec

2019-12-20 Thread Eric Blake

On 12/19/19 9:08 AM, Nir Soffer wrote:


Let's just fix the qapi spec now.


But qapi documents a better behavior for users. We should fix the code instead
to match the docs.


1. Using the disk name as a bitmap name is bad behavior, as they are completely
different concepts. Especially keeping in mind that the user already knows the
disk name anyway, so there is no reason to write this export name inside the
metadata context of this export.


The different concept is expressed by the "qemu:dirty-bitmap:" prefix.
"qemu:dirty-bitmap:export-name" means the dirty bitmap for this export.


2. It's not directly documented. You assume that NAME == @name. I understand
that it may be assumed. But it's not documented.


But NAME is likely to be understood as the name argument, and unlikely to be the
bitmap name.


That's a misunderstanding due to poor docs.  The bitmap name has always 
been what was exposed, ever since we promoted things to stable by 
getting rid of x-.





3. It has never worked the way you describe. So if we change the behavior, we'll
break existing users.


Do we have existing users? Isn't this a new feature in 4.2?


No, the feature stems back to 4.0, when we got rid of x-.  There are 
other reasons that dirty bitmaps aren't really usable for incremental 
backups without qemu 4.2, but qemu 4.0 was the first time we exposed a 
stable interface for a bitmap over an NBD export, and that release used 
the bitmap name (and not the export name), so at this point, a code 
change would break expectations of any 4.0 client using bitmaps for 
other reasons.  Libvirt currently has absolute control over the bitmap 
name (my initial code in libvirt created a temporary bitmap with my 
desired name, then merged the contents from the permanent bitmaps 
corresponding to the actual libvirt Checkpoint objects into the 
temporary, so that it could then call nbd-server-add with the temporary
bitmap name).  But, as you point out...




Before we had experimental x-block-dirty-bitmap APIs, which were not stable, so
users could not depend on them.


The unstable x-block-dirty-bitmap APIs _did_ have a way to export a 
user-controlled name SEPARATE from the bitmap name.  At the time I was 
removing the x- prefix, I asked if anyone had a strong use case for 
keeping that functionality.  No one spoke up in favor of keeping it 
(Nikolay mentioned using the old interface, but not being stumped by its 
removal), so we nuked it at the time.  We can always add it back (now 
that it sounds like you have a use case where it may be more 
compelling), but it was easier to stabilize less and add more later as 
needed, than to stabilize too much and regret that we had to support the 
flexibility that no one needed.

https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg02373.html
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg01970.html

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: qemu implementation details leaks to NBD client

2019-12-20 Thread Eric Blake

On 12/19/19 5:36 AM, Nir Soffer wrote:

When connecting to the qemu NBD server during incremental backup, the client needs to
register this meta context:

 "qemu:dirty-bitmap:backup-sda"

To get dirty bitmap chunks in NBD_CMD_BLOCK_STATUS for export "sda".

This comes from libvirt code creating a backup node named "backup-sda"
for drive "sda",
and creating a temporary dirty bitmap with the same name, which is reasonable.


The name shown here is the bitmap name; libvirt can create any temporary 
bitmap name if that is easier to use.  Also, while connecting your NBD 
client, you can use NBD_OPT_LIST_META_CONTEXT on the query 
"qemu:dirty-bitmap:" to learn which dirty-bitmap name(s) are exported 
(for now, qemu exports at most one, but in the future, it could export 
more than one).
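
As an aside, that query is easy to issue from libnbd.  A minimal sketch,
assuming a libnbd new enough to expose option-mode listing and a made-up
socket path:

    import nbd

    h = nbd.NBD()
    h.set_export_name("sda")
    h.add_meta_context("qemu:dirty-bitmap:")   # prefix query
    h.set_opt_mode(True)                       # stop after the handshake
    h.connect_unix("/tmp/nbd.sock")
    h.opt_list_meta_context(lambda name: print("server offers:", name))
    h.opt_abort()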




All clients using this API need to have code like:

 # Libvirt uses this name for the dirty bitmap.
 dirty_bitmap = "backup-" + export_name

Duplicating the magic libvirt code:

 if (incremental) {
 dd->incrementalBitmap = g_strdup_printf("backup-%s", dd->domdisk->dst);

We have only one client now[1] so this is not a huge issue, but it is
also easy to fix.

I think what we would like to have instead is:

 "qemu:dirty-bitmap:sda"


The libvirt API for backups is not frozen until the 6.0 release, so we 
could have libvirt just blindly export a bitmap by this name.  The 
problem is that this name is not unique - for now, while we only support 
one backup job at a time, it doesn't matter what name is in use. But in 
the future, when we have multiple backup jobs, we have to start worrying 
about whether parallel jobs share a single NBD server (in which case the 
export names matter, but for a given export, the bitmap names available 
for that export are not constrained by other exports) or multiple NBD 
servers (one per backup job).  But we ALSO have to worry about 
presenting a sane bitmap name over any given export regardless of what 
bitmap is being exported: if we have two parallel jobs from different 
points in time, we REQUIRE that two different bitmaps be in use between 
the two jobs, even if we WANT the NBD client to see the name 
'qemu:dirty-bitmap:sda' for both jobs.  So our current default of naming 
the export name after the bitmap name is not future-friendly.
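
To make the collision concrete, a sketch with invented names: two parallel
jobs on the same node need two distinct bitmaps, so with today's naming rule
their context names necessarily differ as well:

    # Bitmap names must be unique per node, so parallel jobs cannot both
    # use a bitmap literally named "sda" ...
    vm.qmp('block-dirty-bitmap-add', node='drive0', name='backup-sda-job1')
    vm.qmp('block-dirty-bitmap-add', node='drive0', name='backup-sda-job2')
    # ... and because the context name follows the bitmap name, clients see
    # "qemu:dirty-bitmap:backup-sda-job1" and "...-job2" rather than a
    # common "qemu:dirty-bitmap:sda".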




So clients connecting to the NBD server with exportname=sda would find the
dirty bitmap for this export at the expected name, without the need to
depend on the internal node name.

It looks like a bug in qemu, since:

# @name: Export name. If unspecified, the @device parameter is used as the
#        export name. (Since 2.12)

If export name is "sda"...

#
# @writable: Whether clients should be able to write to the device via the
# NBD connection (default false).

# @bitmap: Also export the dirty bitmap reachable from @device, so the
#  NBD client can use NBD_OPT_SET_META_CONTEXT with
#  "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0)

This API expects that the user can access it at:

 "qemu:dirty-bitmap:sda"


As mentioned in the other thread, this is a doc bug, stemming from when 
we removed the x- prefix. I'll reply more there.
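
The current behavior is easy to observe from a client.  A small libnbd
sketch, with the names from your example and a made-up socket path:

    import nbd

    h = nbd.NBD()
    h.set_export_name("sda")
    # Ask for both spellings; only what the server offers is negotiated.
    h.add_meta_context("qemu:dirty-bitmap:sda")
    h.add_meta_context("qemu:dirty-bitmap:backup-sda")
    h.connect_unix("/tmp/nbd.sock")
    print(h.can_meta_context("qemu:dirty-bitmap:sda"))         # False
    print(h.can_meta_context("qemu:dirty-bitmap:backup-sda"))  # True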


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 2/2] iotests: Fix IMGOPTSSYNTAX for nbd

2019-12-20 Thread Eric Blake

On 12/18/19 4:48 AM, Max Reitz wrote:

There is no $SOCKDIR, only $SOCK_DIR.

Fixes: f3923a72f199b2c63747a7032db74730546f55c6
Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/common.rc | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)


Whoops.  Thanks for the fix.

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE

2019-12-20 Thread Eric Blake

On 12/19/19 8:38 AM, Max Reitz wrote:

fuse-export-add allows mounting block graph nodes via FUSE on some
existing regular file.  That file should then appear like a raw disk
image, and accesses to it result in accesses to the exported BDS.

Right now, we only set up the mount point and tear all mount points down
in bdrv_close_all().  We do not implement any access functions, so
accessing the mount point only results in errors.  This will be
addressed by a followup patch.

The set of exported nodes is kept in a hash table so we can later add a
fuse-export-remove that allows unmounting.


Before I review this, a quick question:

How does this compare to the recently added nbdfuse?
https://www.redhat.com/archives/libguestfs/2019-October/msg00080.html

Or put another way, maybe we get the same effect by combining qemu-nbd 
with nbdfuse, but this new utility would cut out a middleman for more 
efficiency, right?




+++ b/block/fuse.c
@@ -0,0 +1,260 @@
+/*
+ * Present a block device as a raw image through FUSE
+ *
+ * Copyright (c) 2019 Max Reitz 



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 00/18] block: Allow exporting BDSs via FUSE

2019-12-20 Thread Eric Blake

On 12/20/19 6:50 AM, Kevin Wolf wrote:

Am 20.12.2019 um 11:30 hat Max Reitz geschrieben:

I placed it into block/ because that just seemed like the least bad
place to me (apart from creating a new top-level directory like nbd has)
– and also because we already have quite a few non-driver files in
block/ (io.c, the jobs (where some got drivers only rather recently),
accounting.c, ...).


We could consider block/exports/ and eventually also move the NBD server
there.


We already had another thread considering the motion of qemu-nbd.c to 
tools/, and I don't mind moving top-level nbd/ into block/exports/ if 
that makes things easier to reason about.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE

2019-12-20 Thread Eric Blake

On 12/20/19 7:25 AM, Markus Armbruster wrote:



I suppose moving a field between a union base and all variants does
still result in different introspection even though the accepted inputs
are the same.


Correct.  A common member (whether it's local or from the base) is in
SchemaInfoObject.members[].  Moving it to all the variants moves it to
the variant types' .members[].


   Is this kind of movement still allowed unconditionally or
should we be more careful with something like this?


QMP's backward compatibility promise does not include "introspection
value won't change".  Still, such changes can conceivably confuse
clients.  Care is advisable.  But it's not a hard "no".


And libvirt already correctly handles movements like this (so there are 
existing clients aware of the potential confusion).
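
For what it's worth, the client-side logic can stay small.  A sketch of a
lookup over query-qmp-schema output that tolerates a member moving between a
type and its variants (helper name invented):

    def accepts_member(schema, type_name, member):
        """True if @type_name accepts @member, whether the member lives in
        the type itself or in one of its variants.  @schema is the list
        returned by query-qmp-schema."""
        by_name = {ent['name']: ent for ent in schema}
        ent = by_name[type_name]
        if any(m['name'] == member for m in ent.get('members', [])):
            return True
        return any(accepts_member(schema, v['type'], member)
                   for v in ent.get('variants', []))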


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: Can we retire Python 2 now?

2019-12-20 Thread Juan Quintela
Markus Armbruster  wrote:
> Python 2 EOL is only a few days away[*].  We made configure bitch about
> it in commit e5abf59eae "Deprecate Python 2 support", 2019-07-01.  Any
> objections to retiring it now, i.e. in 5.0?
>
> Cc'ing everyone who appears to be maintaining something that looks like
> a Python script.
>
> [*] https://pythonclock.org/

I am pretty sure that I am not a python maintainer at all.

But anyway, Python 3 is only at 3.7.
Python 3.0 debuted in 2008, so ...

Acked-by: Juan Quintela 
Reviewed-by: Juan Quintela 

And anything else that you can think of that endorses the change.

Later, Juan.




Re: Can we retire Python 2 now?

2019-12-20 Thread Eduardo Habkost
On Fri, Dec 20, 2019 at 05:29:30PM +0100, Markus Armbruster wrote:
> Python 2 EOL is only a few days away[*].  We made configure bitch about
> it in commit e5abf59eae "Deprecate Python 2 support", 2019-07-01.  Any
> objections to retiring it now, i.e. in 5.0?

Thanks for the reminder!

I'll be honest: even if somebody in this list objects to dropping
Python 2 support, I'm not willing to be maintainer of a Python 2
codebase in 2020.  The only reason for not doing it in 4.1 was
the tests/vm/netbsd breakage we took very long to debug and fix.

I have just submitted this pull request:

  Subject: [PULL 0/2] Require Python >= 3.5 to build QEMU
  
https://lore.kernel.org/qemu-devel/20191220165141.2207058-1-ehabk...@redhat.com/

> 
> Cc'ing everyone who appears to be maintaining something that looks like
> a Python script.
> 
> [*] https://pythonclock.org/

-- 
Eduardo




Can we retire Python 2 now?

2019-12-20 Thread Markus Armbruster
Python 2 EOL is only a few days away[*].  We made configure bitch about
it in commit e5abf59eae "Deprecate Python 2 support", 2019-07-01.  Any
objections to retiring it now, i.e. in 5.0?

Cc'ing everyone who appears to be maintaining something that looks like
a Python script.

[*] https://pythonclock.org/




Re: [PATCH v12 0/3] qcow2: advanced compression options

2019-12-20 Thread Max Reitz
On 02.12.19 13:15, Andrey Shinkevich wrote:
> The compression filter driver is introduced as suggested by Max.
> A sample usage of the filter can be found in the test #214.
> Now, multiple clusters can be written compressed.
> It is useful for the backup job.
> 
> v12:
>   01: Missed to change the driver interface .bdrv_co_block_status
>   from bdrv_co_block_status_from_backing to
>   bdrv_co_block_status_from_file (noticed by Vladimir).
> 
> Andrey Shinkevich (3):
>   block: introduce compress filter driver
>   qcow2: Allow writing compressed data of multiple clusters
>   tests/qemu-iotests: add case to write compressed data of multiple
> clusters
> 
>  block/Makefile.objs|   1 +
>  block/filter-compress.c| 168 +
>  block/qcow2.c  | 102 +++
>  qapi/block-core.json   |  10 +--
>  tests/qemu-iotests/214 |  43 
>  tests/qemu-iotests/214.out |  14 
>  6 files changed, 307 insertions(+), 31 deletions(-)
>  create mode 100644 block/filter-compress.c

Thanks, fixed patch 1 and applied to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Max
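
For anyone wanting to try the filter, usage might look roughly like this once
merged (an iotests-style sketch; node names and the image path are invented,
and test 214 in the series is the authoritative example):

    vm.qmp('blockdev-add',
           driver='compress', node_name='compressed0',
           file={'driver': 'qcow2', 'node-name': 'qcow2-0',
                 'file': {'driver': 'file', 'filename': 'test.qcow2'}})
    # Writes going through 'compressed0' reach the qcow2 node as compressed
    # clusters (unallocated clusters only, per the commit message).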





Re: [PATCH v12 1/3] block: introduce compress filter driver

2019-12-20 Thread Andrey Shinkevich



On 20/12/2019 17:52, Max Reitz wrote:
> On 02.12.19 13:15, Andrey Shinkevich wrote:
>> Allow writing all the data compressed through the filter driver.
>> The written data will be aligned to the cluster size.
>> Based on the current QEMU implementation, that data can be written to
>> unallocated clusters only. May be used for a backup job.
>>
>> Suggested-by: Max Reitz 
>> Signed-off-by: Andrey Shinkevich 
>> Reviewed-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   block/Makefile.objs |   1 +
>>   block/filter-compress.c | 168 +
>>   qapi/block-core.json|  10 +--
>>   3 files changed, 175 insertions(+), 4 deletions(-)
>>   create mode 100644 block/filter-compress.c
> 
> [...]
> 
>> diff --git a/block/filter-compress.c b/block/filter-compress.c
>> new file mode 100644
>> index 000..4d756ea
>> --- /dev/null
>> +++ b/block/filter-compress.c
>> @@ -0,0 +1,168 @@
> 
> [...]
> 
>> +static int compress_open(BlockDriverState *bs, QDict *options, int flags,
>> + Error **errp)
>> +{
>> +bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, false,
>> +   errp);
>> +if (!bs->file) {
>> +return -EINVAL;
>> +}
>> +
>> +if (!bs->file->bs->drv || !block_driver_can_compress(bs->file->bs->drv)) {
>> +error_setg(errp,
>> +   "Compression is not supported for underlying format: %s",
>> +   bdrv_get_format_name(bs->file->bs));
> 
> bdrv_get_format_name() returns NULL if bs->file->bs->drv is NULL.  I’m
> sure g_strdup_vprintf() handles %s with a NULL string gracefully in
> practice, but I can’t find that specified anywhere.  So even though I’m
> well aware I’m being a bit stupid about a minor edge case, I’m hesitant
> to accept this patch as-is.
> 
> Obviously the solution can be as simple as bdrv_get_format_name(...) ?:
> "(no format)".
> 
> Well, actually, I can be a bit less stupid about it and just propose
> merging that change in myself.  Would that be OK for you?

Yes, please.
Thank you, Max.

Andrey

> 
> (The rest looks good to me.)
> 
> Max
> 

-- 
With the best regards,
Andrey Shinkevich




Re: [PATCH v12 1/3] block: introduce compress filter driver

2019-12-20 Thread Max Reitz
On 02.12.19 13:15, Andrey Shinkevich wrote:
> Allow writing all the data compressed through the filter driver.
> The written data will be aligned to the cluster size.
> Based on the current QEMU implementation, that data can be written to
> unallocated clusters only. May be used for a backup job.
> 
> Suggested-by: Max Reitz 
> Signed-off-by: Andrey Shinkevich 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/Makefile.objs |   1 +
>  block/filter-compress.c | 168 +
>  qapi/block-core.json|  10 +--
>  3 files changed, 175 insertions(+), 4 deletions(-)
>  create mode 100644 block/filter-compress.c

[...]

> diff --git a/block/filter-compress.c b/block/filter-compress.c
> new file mode 100644
> index 000..4d756ea
> --- /dev/null
> +++ b/block/filter-compress.c
> @@ -0,0 +1,168 @@

[...]

> +static int compress_open(BlockDriverState *bs, QDict *options, int flags,
> + Error **errp)
> +{
> +bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, false,
> +   errp);
> +if (!bs->file) {
> +return -EINVAL;
> +}
> +
> +if (!bs->file->bs->drv || !block_driver_can_compress(bs->file->bs->drv)) {
> +error_setg(errp,
> +   "Compression is not supported for underlying format: %s",
> +   bdrv_get_format_name(bs->file->bs));

bdrv_get_format_name() returns NULL if bs->file->bs->drv is NULL.  I’m
sure g_strdup_vprintf() handles %s with a NULL string gracefully in
practice, but I can’t find that specified anywhere.  So even though I’m
well aware I’m being a bit stupid about a minor edge case, I’m hesitant
to accept this patch as-is.

Obviously the solution can be as simple as bdrv_get_format_name(...) ?:
"(no format)".

Well, actually, I can be a bit less stupid about it and just propose
merging that change in myself.  Would that be OK for you?

(The rest looks good to me.)

Max





Re: [PATCH v2 00/14] chardev: Use QEMUChrEvent enum in IOEventHandler typedef

2019-12-20 Thread Marc-André Lureau
On Wed, Dec 18, 2019 at 9:20 PM Philippe Mathieu-Daudé
 wrote:
>
> Hi,
>
> After this chat on #qemu IRC:
> 13:20  so what is the difference between a IOReadHandler and 
> IOEventHandler?
> 13:25  stsquad: one is in-band and the other out-of-band?
> 13:26  f4bug: ahh yes it seems so - connect/disconnect etc...
> 13:27  see QEMUChrEvent for IOEventHandler
>
> I thought it might be a good opportunity to make the IOEventHandler
> typedef meaning more obvious, by using the QEMUChrEvent enum.
>
> To be able to build I had to explicit all enums ignored in the
> switch(event) statement by these frontends.
>
> Then I used a coccinelle spatch to change the various IOEventHandler.
> I don't think the last patch can be split, but suggestions are welcome!
>
> Regards,
>
> Phil.
>
> v2:
> - do blindly ignore all events using a 'default' case.
>
> Philippe Mathieu-Daudé (14):
>   hw/ipmi: Remove unnecessary declarations
>   hw/ipmi: Explicit we ignore some QEMUChrEvent in IOEventHandler
>   hw/char/terminal3270: Explicit ignored QEMUChrEvent in IOEventHandler
>   hw/usb/dev-serial: Explicit we ignore few QEMUChrEvent in IOEventHandler
>   hw/usb/redirect: Explicit we ignore few QEMUChrEvent in IOEventHandler
>   ccid-card-passthru: Explicit we ignore QEMUChrEvent in IOEventHandler
>   vhost-user-crypto: Explicit we ignore some QEMUChrEvent in IOEventHandler
>   vhost-user-net: Explicit we ignore few QEMUChrEvent in IOEventHandler
>   vhost-user-blk: Explicit we ignore few QEMUChrEvent in IOEventHandler
>   virtio-console: Explicit we ignore some QEMUChrEvent in IOEventHandler
>   monitor/qmp: Explicit we ignore few QEMUChrEvent in IOEventHandler
>   monitor/hmp: Explicit we ignore a QEMUChrEvent in IOEventHandler
>   chardev/char: Explicit we ignore some QEMUChrEvent in IOEventHandler
>   chardev: Use QEMUChrEvent enum in IOEventHandler typedef

Reviewed-by: Marc-André Lureau 

(I guess Paolo will take the series for next PR?)

>
>  include/chardev/char-fe.h   |  2 +-
>  include/chardev/char-mux.h  |  2 +-
>  include/chardev/char.h  |  4 ++--
>  backends/cryptodev-vhost-user.c |  7 ++-
>  chardev/char-mux.c  |  8 
>  chardev/char.c  |  9 +++--
>  gdbstub.c   |  2 +-
>  hw/arm/pxa2xx.c |  2 +-
>  hw/arm/strongarm.c  |  2 +-
>  hw/block/vhost-user-blk.c   |  7 ++-
>  hw/char/cadence_uart.c  |  2 +-
>  hw/char/digic-uart.c|  2 +-
>  hw/char/escc.c  |  2 +-
>  hw/char/etraxfs_ser.c   |  2 +-
>  hw/char/exynos4210_uart.c   |  2 +-
>  hw/char/grlib_apbuart.c |  2 +-
>  hw/char/imx_serial.c|  2 +-
>  hw/char/ipoctal232.c|  2 +-
>  hw/char/lm32_juart.c|  2 +-
>  hw/char/lm32_uart.c |  2 +-
>  hw/char/mcf_uart.c  |  2 +-
>  hw/char/milkymist-uart.c|  2 +-
>  hw/char/nrf51_uart.c|  2 +-
>  hw/char/pl011.c |  2 +-
>  hw/char/serial.c|  2 +-
>  hw/char/sh_serial.c |  2 +-
>  hw/char/terminal3270.c  |  7 ++-
>  hw/char/virtio-console.c|  7 ++-
>  hw/char/xilinx_uartlite.c   |  2 +-
>  hw/ipmi/ipmi_bmc_extern.c   | 12 +++-
>  hw/mips/boston.c|  2 +-
>  hw/mips/mips_malta.c|  2 +-
>  hw/riscv/riscv_htif.c   |  2 +-
>  hw/riscv/sifive_uart.c  |  2 +-
>  hw/usb/ccid-card-passthru.c |  7 ++-
>  hw/usb/dev-serial.c |  6 +-
>  hw/usb/redirect.c   |  7 ++-
>  monitor/hmp.c   |  6 +-
>  monitor/qmp.c   |  7 ++-
>  net/filter-mirror.c |  2 +-
>  net/vhost-user.c|  9 +++--
>  qtest.c |  2 +-
>  tests/test-char.c   |  6 +++---
>  tests/vhost-user-test.c |  2 +-
>  44 files changed, 111 insertions(+), 56 deletions(-)
>
> Cc: "Gonglei (Arei)" 
> Cc: "Marc-André Lureau" 
> Cc: Paolo Bonzini 
> Cc: "Alex Bennée" 
> Cc: "Philippe Mathieu-Daudé" 
> Cc: Andrzej Zaborowski 
> Cc: Peter Maydell 
> Cc: "Michael S. Tsirkin" 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: "Edgar E. Iglesias" 
> Cc: Alistair Francis 
> Cc: Antony Pavlov 
> Cc: Igor Mitsyanko 
> Cc: Fabien Chouteau 
> Cc: KONRAD Frederic 
> Cc: Peter Chubb 
> Cc: Alberto Garcia 
> Cc: Michael Walle 
> Cc: Thomas Huth 
> Cc: Joel Stanley 
> Cc: Cornelia Huck 
> Cc: Halil Pasic 
> Cc: Christian Borntraeger 
> Cc: Laurent Vivier 
> Cc: Amit Shah 
> Cc: Corey Minyard 
> Cc: Paul Burton 
> Cc: Aleksandar Rikalo 
> Cc: Aurelien Jarno 
> Cc: Aleksandar Markovic 
> Cc: Palmer Dabbelt 
> Cc: Sagar Karandikar 
> Cc: Bastian Koppelmann 
> Cc: Gerd Hoffmann 
> Cc: Samuel Thibault 
> Cc: "Dr. David Alan Gilbert" 
> Cc: Markus Armbruster 
> Cc: Zhang Chen 
> Cc: Li Zhijian 
> Cc: Jason Wang 
> Cc: qemu-...@nongnu.org
> Cc: qemu-block@nongnu.org
> Cc: qemu-s3...@nongnu.org
> Cc: 

Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE

2019-12-20 Thread Markus Armbruster
Kevin Wolf  writes:

> Am 20.12.2019 um 13:48 hat Markus Armbruster geschrieben:
>> Kevin Wolf  writes:
>> 
>> > Am 20.12.2019 um 11:48 hat Max Reitz geschrieben:
>> >> So if we kept writable and growable in the common base, then the schema
>> >> would give no information about what exports actually support them.
>> >> 
>> >> On one hand, I don’t know whether it’s important to have this
>> >> information in a static form, or whether it’s sufficient to learn at
>> >> runtime.
>> >> 
>> >> On the other, I don’t know whether it’s important to have those fields
>> >> in the base or not.  Would it make a difference on the wire?
>> >
>> > Not for the command itself, so I think we're free to change it later. It
>> > might make a difference for introspection, though, not sure. Markus?
>> 
>> QAPI schema introspection is designed to hide the difference between
>> local members and base members.  You can move members to or from a base
>> type freely without affecting introspection.  Even if that creates or
>> deletes the base type.
>
> Good, that's helpful. So I can split the nbd-server-add argument type
> into a base that is reused as a union branch and the rest without
> potentially breaking anything.
>
> I suppose moving a field between a union base and all variants does
> still result in different introspection even though the accepted inputs
> are the same.

Correct.  A common member (whether it's local or from the base) is in
SchemaInfoObject.members[].  Moving it to all the variants moves it to
the variant types' .members[].

>   Is this kind of movement still allowed unconditionally or
> should we be more careful with something like this?

QMP's backward compatibility promise does not include "introspection
value won't change".  Still, such changes can conceivably confuse
clients.  Care is advisable.  But it's not a hard "no".




Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE

2019-12-20 Thread Kevin Wolf
Am 20.12.2019 um 13:49 hat Markus Armbruster geschrieben:
> Kevin Wolf  writes:
> 
> > Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
> >> fuse-export-add allows mounting block graph nodes via FUSE on some
> >> existing regular file.  That file should then appears like a raw disk
> >> image, and accesses to it result in accesses to the exported BDS.
> >> 
> >> Right now, we only set up the mount point and tear all mount points down
> >> in bdrv_close_all().  We do not implement any access functions, so
> >> accessing the mount point only results in errors.  This will be
> >> addressed by a followup patch.
> >> 
> >> The set of exported nodes is kept in a hash table so we can later add a
> >> fuse-export-remove that allows unmounting.
> >> 
> >> Signed-off-by: Max Reitz 
> >
> >> diff --git a/qapi/block.json b/qapi/block.json
> >> index 145c268bb6..03f8d1b537 100644
> >> --- a/qapi/block.json
> >> +++ b/qapi/block.json
> >> @@ -317,6 +317,29 @@
> >>  ##
> >>  { 'command': 'nbd-server-stop' }
> >>  
> >> +##
> >> +# @fuse-export-add:
> >> +#
> >> +# Exports a block graph node on some (file) mountpoint as a raw image.
> >> +#
> >> +# @node-name: Node to be exported
> >> +#
> >> +# @mountpoint: Path on which to export the block device via FUSE.
> >> +#  This must point to an existing regular file.
> >> +#
> >> +# @writable: Whether clients should be able to write to the block
> >> +#device via the FUSE export. (default: false)
> >> +#
> >> +# Since: 5.0
> >> +##
> >> +{ 'command': 'fuse-export-add',
> >> +  'data': {
> >> +  'node-name': 'str',
> >> +  'mountpoint': 'str',
> >> +  '*writable': 'bool'
> >> +  },
> >> +  'if': 'defined(CONFIG_FUSE)' }
> >
> > Can this use a BlockExport union from the start like I'm introducing in
> > the storage daemon series, together with a generic block-export-add?
> >
> > It also looks like node-name and writable should be part of the common
> > base of BlockExport. Unfortunately this would mean that I can't use the
> > same BlockExportNbd for the existing nbd-server-add command any more. I
> > guess I could somehow get a shared base type for both, though.
> >
> > Markus, any thoughts on these QAPI interfaces?
> 
> Context?  How far back should I read?

Basically just the hunk quoted above and the QAPI portion of the
following storage daemon patch:

[RFC PATCH 08/18] qemu-storage-daemon: Add --export option

Maybe the cover letter, too, if you need a more high-level introduction.

Kevin




Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE

2019-12-20 Thread Kevin Wolf
Am 20.12.2019 um 13:48 hat Markus Armbruster geschrieben:
> Kevin Wolf  writes:
> 
> > Am 20.12.2019 um 11:48 hat Max Reitz geschrieben:
> >> So if we kept writable and growable in the common base, then the schema
> >> would give no information about what exports actually support them.
> >> 
> >> On one hand, I don’t know whether it’s important to have this
> >> information in a static form, or whether it’s sufficient to learn at
> >> runtime.
> >> 
> >> On the other, I don’t know whether it’s important to have those fields
> >> in the base or not.  Would it make a difference on the wire?
> >
> > Not for the command itself, so I think we're free to change it later. It
> > might make a difference for introspection, though, not sure. Markus?
> 
> QAPI schema introspection is designed to hide the difference between
> local members and base members.  You can move members to or from a base
> type freely without affecting introspection.  Even if that creates or
> deletes the base type.

Good, that's helpful. So I can split the nbd-server-add argument type
into a base that is reused as a union branch and the rest without
potentially breaking anything.

I suppose moving a field between a union base and all variants does
still result in different introspection even though the accepted inputs
are the same. Is this kind of movement still allowed unconditionally or
should we be more careful with something like this?

Kevin




Re: [PATCH 00/18] block: Allow exporting BDSs via FUSE

2019-12-20 Thread Kevin Wolf
Am 20.12.2019 um 11:30 hat Max Reitz geschrieben:
> I placed it into block/ because that just seemed like the least bad
> place to me (apart from creating a new top-level directory like nbd has)
> – and also because we already have quite a few non-driver files in
> block/ (io.c, the jobs (where some got drivers only rather recently),
> accounting.c, ...).

We could consider block/exports/ and eventually also move the NBD server
there.

Kevin




Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE

2019-12-20 Thread Markus Armbruster
Kevin Wolf  writes:

> Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
>> fuse-export-add allows mounting block graph nodes via FUSE on some
>> existing regular file.  That file should then appear like a raw disk
>> image, and accesses to it result in accesses to the exported BDS.
>> 
>> Right now, we only set up the mount point and tear all mount points down
>> in bdrv_close_all().  We do not implement any access functions, so
>> accessing the mount point only results in errors.  This will be
>> addressed by a followup patch.
>> 
>> The set of exported nodes is kept in a hash table so we can later add a
>> fuse-export-remove that allows unmounting.
>> 
>> Signed-off-by: Max Reitz 
>
>> diff --git a/qapi/block.json b/qapi/block.json
>> index 145c268bb6..03f8d1b537 100644
>> --- a/qapi/block.json
>> +++ b/qapi/block.json
>> @@ -317,6 +317,29 @@
>>  ##
>>  { 'command': 'nbd-server-stop' }
>>  
>> +##
>> +# @fuse-export-add:
>> +#
>> +# Exports a block graph node on some (file) mountpoint as a raw image.
>> +#
>> +# @node-name: Node to be exported
>> +#
>> +# @mountpoint: Path on which to export the block device via FUSE.
>> +#  This must point to an existing regular file.
>> +#
>> +# @writable: Whether clients should be able to write to the block
>> +#device via the FUSE export. (default: false)
>> +#
>> +# Since: 5.0
>> +##
>> +{ 'command': 'fuse-export-add',
>> +  'data': {
>> +  'node-name': 'str',
>> +  'mountpoint': 'str',
>> +  '*writable': 'bool'
>> +  },
>> +  'if': 'defined(CONFIG_FUSE)' }
>
> Can this use a BlockExport union from the start like I'm introducing in
> the storage daemon series, together with a generic block-export-add?
>
> It also looks like node-name and writable should be part of the common
> base of BlockExport. Unfortunately this would mean that I can't use the
> same BlockExportNbd for the existing nbd-server-add command any more. I
> guess I could somehow get a shared base type for both, though.
>
> Markus, any thoughts on these QAPI interfaces?

Context?  How far back should I read?




Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE

2019-12-20 Thread Markus Armbruster
Kevin Wolf  writes:

> Am 20.12.2019 um 11:48 hat Max Reitz geschrieben:
>> On 20.12.19 11:26, Kevin Wolf wrote:
>> > Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
>> >> fuse-export-add allows mounting block graph nodes via FUSE on some
>> >> existing regular file.  That file should then appear like a raw disk
>> >> image, and accesses to it result in accesses to the exported BDS.
>> >>
>> >> Right now, we only set up the mount point and tear all mount points down
>> >> in bdrv_close_all().  We do not implement any access functions, so
>> >> accessing the mount point only results in errors.  This will be
>> >> addressed by a followup patch.
>> >>
>> >> The set of exported nodes is kept in a hash table so we can later add a
>> >> fuse-export-remove that allows unmounting.
>> >>
>> >> Signed-off-by: Max Reitz 
>> > 
>> >> diff --git a/qapi/block.json b/qapi/block.json
>> >> index 145c268bb6..03f8d1b537 100644
>> >> --- a/qapi/block.json
>> >> +++ b/qapi/block.json
>> >> @@ -317,6 +317,29 @@
>> >>  ##
>> >>  { 'command': 'nbd-server-stop' }
>> >>  
>> >> +##
>> >> +# @fuse-export-add:
>> >> +#
>> >> +# Exports a block graph node on some (file) mountpoint as a raw image.
>> >> +#
>> >> +# @node-name: Node to be exported
>> >> +#
>> >> +# @mountpoint: Path on which to export the block device via FUSE.
>> >> +#  This must point to an existing regular file.
>> >> +#
>> >> +# @writable: Whether clients should be able to write to the block
>> >> +#device via the FUSE export. (default: false)
>> >> +#
>> >> +# Since: 5.0
>> >> +##
>> >> +{ 'command': 'fuse-export-add',
>> >> +  'data': {
>> >> +  'node-name': 'str',
>> >> +  'mountpoint': 'str',
>> >> +  '*writable': 'bool'
>> >> +  },
>> >> +  'if': 'defined(CONFIG_FUSE)' }
>> > 
>> > Can this use a BlockExport union from the start like I'm introducing in
>> > the storage daemon series, together with a generic block-export-add?
>> 
>> Hm, you mean still adding a FuseExport structure that would be part of
>> BlockExport and then dropping fuse-export-add in favor of a
>> block-export-add that we want anyway?
>
> Yes.
>
>> > It also looks like node-name and writable should be part of the common
>> > base of BlockExport.
>> 
>> node-name definitely, I’m not so sure about writable.  Or, to be more
>> precise, I think that if we want writable to be in the base, we also
>> want growable to be there: Both are primarily options for the
>> BlockBackend that the exports use.
>> 
>> But both of course also need to be supported by the export
>> implementation.  nbd can make its BB growable all it wants, but that
>> doesn’t make it work.
>
> Right. Pragmatically, I think exports are very likely to support writable,
> but probably rather unlikely to support growable. So I do think there
> would be a point for making writable part of the common base, but not
> growable.
>
>> So if we kept writable and growable in the common base, then the schema
>> would give no information about what exports actually support them.
>> 
>> On one hand, I don’t know whether it’s important to have this
>> information in a static form, or whether it’s sufficient to learn at
>> runtime.
>> 
>> On the other, I don’t know whether it’s important to have those fields
>> in the base or not.  Would it make a difference on the wire?
>
> Not for the command itself, so I think we're free to change it later. It
> might make a difference for introspection, though, not sure. Markus?

QAPI schema introspection is designed to hide the difference between
local members and base members.  You can move members to or from a base
type freely without affecting introspection.  Even if that creates or
deletes the base type.

Example: DriveBackup

{ 'struct': 'DriveBackup',
  'base': 'BackupCommon',
  'data': { 'target': 'str',
'*format': 'str',
'*mode': 'NewImageMode' } }

where BackupCommon is

{ 'struct': 'BackupCommon',
  'data': { '*job-id': 'str', 'device': 'str',
'sync': 'MirrorSyncMode', '*speed': 'int',
'*bitmap': 'str', '*bitmap-mode': 'BitmapSyncMode',
'*compress': 'bool',
'*on-source-error': 'BlockdevOnError',
'*on-target-error': 'BlockdevOnError',
'*auto-finalize': 'bool', '*auto-dismiss': 'bool',
'*filter-node-name': 'str' } }

query-qmp-schema describes DriveBackup like this:

{"name": "30",
 "members": [
 {"name": "job-id", "default": null, "type": "str"},
 {"name": "device", "type": "str"},
 {"name": "sync", "type": "235"},
 {"name": "speed", "default": null, "type": "int"},
 {"name": "bitmap", "default": null, "type": "str"},
 {"name": "bitmap-mode", "default": null, "type": "236"},
 {"name": "compress", "default": null, "type": "bool"},
 {"name": "on-source-error", "default": null, "type": "237"},
 {"name": "on-target-error", 

Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE

2019-12-20 Thread Max Reitz
On 20.12.19 12:24, Kevin Wolf wrote:
> Am 20.12.2019 um 11:48 hat Max Reitz geschrieben:
>> On 20.12.19 11:26, Kevin Wolf wrote:
>>> Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
 fuse-export-add allows mounting block graph nodes via FUSE on some
 existing regular file.  That file should then appear like a raw disk
 image, and accesses to it result in accesses to the exported BDS.

 Right now, we only set up the mount point and tear all mount points down
 in bdrv_close_all().  We do not implement any access functions, so
 accessing the mount point only results in errors.  This will be
 addressed by a followup patch.

 The set of exported nodes is kept in a hash table so we can later add a
 fuse-export-remove that allows unmounting.

 Signed-off-by: Max Reitz 
>>>
 diff --git a/qapi/block.json b/qapi/block.json
 index 145c268bb6..03f8d1b537 100644
 --- a/qapi/block.json
 +++ b/qapi/block.json
 @@ -317,6 +317,29 @@
  ##
  { 'command': 'nbd-server-stop' }
  
 +##
 +# @fuse-export-add:
 +#
 +# Exports a block graph node on some (file) mountpoint as a raw image.
 +#
 +# @node-name: Node to be exported
 +#
 +# @mountpoint: Path on which to export the block device via FUSE.
 +#  This must point to an existing regular file.
 +#
 +# @writable: Whether clients should be able to write to the block
 +#device via the FUSE export. (default: false)
 +#
 +# Since: 5.0
 +##
 +{ 'command': 'fuse-export-add',
 +  'data': {
 +  'node-name': 'str',
 +  'mountpoint': 'str',
 +  '*writable': 'bool'
 +  },
 +  'if': 'defined(CONFIG_FUSE)' }
>>>
>>> Can this use a BlockExport union from the start like I'm introducing in
>>> the storage daemon series, together with a generic block-export-add?
>>
>> Hm, you mean still adding a FuseExport structure that would be part of
>> BlockExport and then dropping fuse-export-add in favor of a
>> block-export-add that we want anyway?
> 
> Yes.
> 
>>> It also looks like node-name and writable should be part of the common
>>> base of BlockExport.
>>
>> node-name definitely, I’m not so sure about writable.  Or, to be more
>> precise, I think that if we want writable to be in the base, we also
>> want growable to be there: Both are primarily options for the
>> BlockBackend that the exports use.
>>
>> But both of course also need to be supported by the export
>> implementation.  nbd can make its BB growable all it wants, but that
>> doesn’t make it work.
> 
> Right. Pragmatically, I think exports are very likely to support writable,
> but probably rather unlikely to support growable. So I do think there
> would be a point for making writable part of the common base, but not
> growable.

True.

But there’s nothing that inherently binds it to FUSE, so I think both
from an implementation’s POV and from a user’s POV, it looks just as
generic as “writable”.  But that’s theory.  I agree that in practice, it
won’t be as generic.

(I realize this doesn’t help much in finding out what we should do.)

>> So if we kept writable and growable in the common base, then the schema
>> would give no information about what exports actually support them.
>>
>> On one hand, I don’t know whether it’s important to have this
>> information in a static form, or whether it’s sufficient to learn at
>> runtime.
>>
>> On the other, I don’t know whether it’s important to have those fields
>> in the base or not.  Would it make a difference on the wire?
> 
> Not for the command itself, so I think we're free to change it later. It
> might make a difference for introspection, though, not sure. Markus?

Yes, I asked because I’m wondering whether it would be more cumbersome
to users if we didn’t keep it in the base structure.

The duplication depends on how we want to design the command.  Should
the export implementations receive a ready-to-use BB?  Or just a
node-name?  In the latter case, we wouldn’t get rid of duplicated code
by having writable/growable in the base.  For the former, it could, but
then again, just taking the WRITE permission and making the BB growable
isn’t that bad to duplicate.

Something to consider is that of course the current NBD code wants a
node-name and not a BB.  So if we decided to generally give export
implementations a BB, then the initial implementation of
qmp_block_export_add() would look a bit freaky: It would first branch
off to qmp_nbd_server_add(), and then open the BB for the “common” case,
but this common case only exists for FUSE (right now).

OTOH, right now we’re free to decide whether we open the BB in
qmp_block_export_add() or fuse.c, and so we might as well just do it in
the former.  If we later find out that this was a stupid idea, we can
always move it into fuse.c.


Now I don’t quite know where I’m trying to get with this.

I suppose it means that we should start with 

Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops

2019-12-20 Thread Max Reitz
On 20.12.19 12:55, Vladimir Sementsov-Ogievskiy wrote:
> 20.12.2019 14:39, Max Reitz wrote:
>> On 13.12.19 12:18, Vladimir Sementsov-Ogievskiy wrote:
>>> 09.12.2019 17:43, Max Reitz wrote:
 On 02.12.19 13:12, Vladimir Sementsov-Ogievskiy wrote:
> 11.11.2019 19:02, Max Reitz wrote:
>> While bdrv_replace_node() will not follow through with it, a specific
>> @replaces asks the mirror job to create a loop.
>>
>> For example, say both the source and the target share a child where the
>> source is a filter; by letting @replaces point to the common child, you
>> ask for a loop.
>>
>> Or if you use @replaces in drive-mirror with sync=none and
>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>> to a child of the source, and sync=none makes the source the backing
>> file of the target after the job).
>>
>> bdrv_replace_node() will not create those loops, but by doing so, it
>> ignores the user-requested configuration, which is not ideally either.
>> (In the first example above, the target's child will remain what it was,
>> which may still be reasonable.  But in the second example, the target
>> will just not become a child of the source, which is precisely what was
>> requested with @replaces.)
>>
>> So prevent such configurations, both before the job, and before it
>> actually completes.
>>
>> Signed-off-by: Max Reitz 
>> ---
>> block.c   | 30 
>> block/mirror.c| 19 +++-
> blockdev.c| 48 ++-
>> include/block/block_int.h |  3 +++
>> 4 files changed, 98 insertions(+), 2 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 0159f8e510..e3922a0474 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6259,6 +6259,36 @@ out:
>> return to_replace_bs;
>> }
>> 
>> +/*
>> + * Return true iff @child is a (recursive) child of @parent, with at
>> + * least @min_level edges between them.
>> + *
>> + * (If @min_level == 0, return true if @child == @parent.  For
>> + * @min_level == 1, @child needs to be at least a real child; for
>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>> + */
>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>> +  int min_level)
>> +{
>> +BdrvChild *c;
>> +
>> +if (child == parent && min_level <= 0) {
>> +return true;
>> +}
>> +
>> +if (!parent) {
>> +return false;
>> +}
>> +
>> +QLIST_FOREACH(c, &parent->children, next) {
>> +if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>> +return true;
>> +}
>> +}
>> +
>> +return false;
>> +}
>> +
>> /**
>>  * Iterates through the list of runtime option keys that are said to
>>  * be "strong" for a BDS.  An option is called "strong" if it changes
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 68a4404666..b258c7e98b 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>  * there.
>>  */
>> if (bdrv_recurse_can_replace(src, to_replace)) {
>> -bdrv_replace_node(to_replace, target_bs, &local_err);
>> +/*
>> + * It is OK for @to_replace to be an immediate child of
>> + * @target_bs, because that is what happens with
>> + * drive-mirror sync=none mode=absolute-paths: target_bs's
>> + * backing file will be the source node, which is also
>> + * to_replace (by default).
>> + * bdrv_replace_node() handles this case by not letting
>> + * target_bs->backing point to itself, but to the source
>> + * still.
>> + */
>> +if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
>> +bdrv_replace_node(to_replace, target_bs, &local_err);
>> +} else {
>> +error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>> +   "because the former is now a child of the latter, "
>> +   "and doing so would thus create a loop",
>> +   to_replace->node_name, target_bs->node_name);
>> +}
>
> you may swap if and else branch, dropping "!" mark..

 Yes, but I just personally prefer to have the error case in the else 
 branch.

>> } else {
>> error_setg(&local_err, "Can no longer replace '%s' by '%s', "

Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops

2019-12-20 Thread Vladimir Sementsov-Ogievskiy
20.12.2019 14:39, Max Reitz wrote:
> On 13.12.19 12:18, Vladimir Sementsov-Ogievskiy wrote:
>> 09.12.2019 17:43, Max Reitz wrote:
>>> On 02.12.19 13:12, Vladimir Sementsov-Ogievskiy wrote:
 11.11.2019 19:02, Max Reitz wrote:
> While bdrv_replace_node() will not follow through with it, a specific
> @replaces asks the mirror job to create a loop.
>
> For example, say both the source and the target share a child where the
> source is a filter; by letting @replaces point to the common child, you
> ask for a loop.
>
> Or if you use @replaces in drive-mirror with sync=none and
> mode=absolute-paths, you generally ask for a loop (@replaces must point
> to a child of the source, and sync=none makes the source the backing
> file of the target after the job).
>
> bdrv_replace_node() will not create those loops, but by doing so, it
> ignores the user-requested configuration, which is not ideally either.
> (In the first example above, the target's child will remain what it was,
> which may still be reasonable.  But in the second example, the target
> will just not become a child of the source, which is precisely what was
> requested with @replaces.)
>
> So prevent such configurations, both before the job, and before it
> actually completes.
>
> Signed-off-by: Max Reitz 
> ---
> block.c   | 30 
> block/mirror.c| 19 +++-
> blockdev.c| 48 ++-
> include/block/block_int.h |  3 +++
> 4 files changed, 98 insertions(+), 2 deletions(-)
>
> diff --git a/block.c b/block.c
> index 0159f8e510..e3922a0474 100644
> --- a/block.c
> +++ b/block.c
> @@ -6259,6 +6259,36 @@ out:
> return to_replace_bs;
> }
> 
> +/*
> + * Return true iff @child is a (recursive) child of @parent, with at
> + * least @min_level edges between them.
> + *
> + * (If @min_level == 0, return true if @child == @parent.  For
> + * @min_level == 1, @child needs to be at least a real child; for
> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
> + */
> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
> +  int min_level)
> +{
> +BdrvChild *c;
> +
> +if (child == parent && min_level <= 0) {
> +return true;
> +}
> +
> +if (!parent) {
> +return false;
> +}
> +
> +QLIST_FOREACH(c, &parent->children, next) {
> +if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
> +return true;
> +}
> +}
> +
> +return false;
> +}
> +
> /**
>  * Iterates through the list of runtime option keys that are said to
>  * be "strong" for a BDS.  An option is called "strong" if it changes
> diff --git a/block/mirror.c b/block/mirror.c
> index 68a4404666..b258c7e98b 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>  * there.
>  */
> if (bdrv_recurse_can_replace(src, to_replace)) {
> -bdrv_replace_node(to_replace, target_bs, &local_err);
> +/*
> + * It is OK for @to_replace to be an immediate child of
> + * @target_bs, because that is what happens with
> + * drive-mirror sync=none mode=absolute-paths: target_bs's
> + * backing file will be the source node, which is also
> + * to_replace (by default).
> + * bdrv_replace_node() handles this case by not letting
> + * target_bs->backing point to itself, but to the source
> + * still.
> + */
> +if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
> +bdrv_replace_node(to_replace, target_bs, &local_err);
> +} else {
> +error_setg(&local_err, "Can no longer replace '%s' by '%s', "
> +   "because the former is now a child of the latter, "
> +   "and doing so would thus create a loop",
> +   to_replace->node_name, target_bs->node_name);
> +}

 you may swap if and else branch, dropping "!" mark..
>>>
>>> Yes, but I just personally prefer to have the error case in the else branch.
>>>
> } else {
> error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>"because it can no longer be guaranteed that doing so "
> diff --git a/blockdev.c b/blockdev.c
> index 9dc2238bf3..d29f147f72 100644
> --- a/blockdev.c
> +++ 

Re: [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path()

2019-12-20 Thread Max Reitz
On 13.12.19 12:27, Vladimir Sementsov-Ogievskiy wrote:
> 11.11.2019 19:02, Max Reitz wrote:
>> Signed-off-by: Max Reitz 
>> ---
>>   tests/qemu-iotests/iotests.py | 59 +++
>>   1 file changed, 59 insertions(+)
>>
>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>> index d34305ce69..3e03320ce3 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -681,6 +681,65 @@ class VM(qtest.QEMUQtestMachine):
>>   
>>   return fields.items() <= ret.items()
>>   
>> +"""
>> +Check whether the node under the given path in the block graph is
>> +@expected_node.
>> +
>> +@root is the node name of the node where the @path is rooted.
>> +
>> +@path is a string that consists of child names separated by
>> +slashes.  It must begin with a slash.
>> +
>> +Examples for @root + @path:
>> +  - root="qcow2-node", path="/backing/file"
>> +  - root="quorum-node", path="/children.2/file"
>> +
>> +Hypothetically, @path could be empty, in which case it would point
>> +to @root.  However, in practice this case is not useful and hence
>> +not allowed.
>> +
>> +@expected_node may be None.
>> +
>> +@graph may be None or the result of an x-debug-query-block-graph
>> +call that has already been performed.
>> +"""
>> +def assert_block_path(self, root, path, expected_node, graph=None):
>> +if graph is None:
>> +graph = self.qmp('x-debug-query-block-graph')['return']
>> +
>> +iter_path = iter(path.split('/'))
>> +
>> +# Must start with a /
>> +assert next(iter_path) == ''
>> +
>> +node = next((node for node in graph['nodes'] if node['name'] == root),
>> +None)
>> +
>> +for path_node in iter_path:
> 
> I'd rename path_node to child or edge, to not interfere with block nodes here.

Sure.  Or maybe child_name.

Max
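
A usage sketch, with invented node names, matching the docstring's examples:

    # The file child of qcow2-node's backing node is 'base-file', and the
    # backing chain ends at that backing node.
    vm.assert_block_path('qcow2-node', '/backing/file', 'base-file')
    vm.assert_block_path('qcow2-node', '/backing/backing', None)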





Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops

2019-12-20 Thread Max Reitz
On 13.12.19 12:18, Vladimir Sementsov-Ogievskiy wrote:
> 09.12.2019 17:43, Max Reitz wrote:
>> On 02.12.19 13:12, Vladimir Sementsov-Ogievskiy wrote:
>>> 11.11.2019 19:02, Max Reitz wrote:
 While bdrv_replace_node() will not follow through with it, a specific
 @replaces asks the mirror job to create a loop.

 For example, say both the source and the target share a child where the
 source is a filter; by letting @replaces point to the common child, you
 ask for a loop.

 Or if you use @replaces in drive-mirror with sync=none and
 mode=absolute-paths, you generally ask for a loop (@replaces must point
 to a child of the source, and sync=none makes the source the backing
 file of the target after the job).

 bdrv_replace_node() will not create those loops, but by doing so, it
 ignores the user-requested configuration, which is not ideally either.
 (In the first example above, the target's child will remain what it was,
 which may still be reasonable.  But in the second example, the target
 will just not become a child of the source, which is precisely what was
 requested with @replaces.)

 So prevent such configurations, both before the job, and before it
 actually completes.

 Signed-off-by: Max Reitz 
 ---
block.c   | 30 
block/mirror.c| 19 +++-
blockdev.c| 48 ++-
include/block/block_int.h |  3 +++
4 files changed, 98 insertions(+), 2 deletions(-)

 diff --git a/block.c b/block.c
 index 0159f8e510..e3922a0474 100644
 --- a/block.c
 +++ b/block.c
 @@ -6259,6 +6259,36 @@ out:
return to_replace_bs;
}

 +/*
 + * Return true iff @child is a (recursive) child of @parent, with at
 + * least @min_level edges between them.
 + *
 + * (If @min_level == 0, return true if @child == @parent.  For
 + * @min_level == 1, @child needs to be at least a real child; for
 + * @min_level == 2, it needs to be at least a grand-child; and so on.)
 + */
 +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
 +  int min_level)
 +{
 +BdrvChild *c;
 +
 +if (child == parent && min_level <= 0) {
 +return true;
 +}
 +
 +if (!parent) {
 +return false;
 +}
 +
 +QLIST_FOREACH(c, &parent->children, next) {
 +if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
 +return true;
 +}
 +}
 +
 +return false;
 +}
 +
/**
 * Iterates through the list of runtime option keys that are said to
 * be "strong" for a BDS.  An option is called "strong" if it changes
 diff --git a/block/mirror.c b/block/mirror.c
 index 68a4404666..b258c7e98b 100644
 --- a/block/mirror.c
 +++ b/block/mirror.c
 @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
 * there.
 */
if (bdrv_recurse_can_replace(src, to_replace)) {
 -bdrv_replace_node(to_replace, target_bs, &local_err);
 +/*
 + * It is OK for @to_replace to be an immediate child of
 + * @target_bs, because that is what happens with
 + * drive-mirror sync=none mode=absolute-paths: target_bs's
 + * backing file will be the source node, which is also
 + * to_replace (by default).
 + * bdrv_replace_node() handles this case by not letting
 + * target_bs->backing point to itself, but to the source
 + * still.
 + */
 +if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
 +bdrv_replace_node(to_replace, target_bs, &local_err);
 +} else {
 +error_setg(&local_err, "Can no longer replace '%s' by '%s', "
 +   "because the former is now a child of the latter, "
 +   "and doing so would thus create a loop",
 +   to_replace->node_name, target_bs->node_name);
 +}
>>>
>>> you may swap if and else branch, dropping "!" mark..
>>
>> Yes, but I just personally prefer to have the error case in the else branch.
>>
} else {
error_setg(&local_err, "Can no longer replace '%s' by '%s', "
   "because it can no longer be guaranteed that doing so "
 diff --git a/blockdev.c b/blockdev.c
 index 9dc2238bf3..d29f147f72 100644
 --- a/blockdev.c
 +++ b/blockdev.c
 @@ -3824,7 +3824,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
}

if 
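As an aside, the check added in block.c above is just a depth-bounded
reachability test over the node graph. A minimal standalone sketch of the
same logic (assuming a hypothetical Node type with an array of children,
instead of QEMU's real BdrvChild list):

    /* Depth-bounded reachability, modelled on bdrv_is_child_of() above. */
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct Node {
        struct Node *children[8];
        size_t num_children;
    } Node;

    static bool is_child_of(const Node *child, const Node *parent, int min_level)
    {
        if (child == parent && min_level <= 0) {
            return true;    /* found it at least min_level edges down */
        }
        if (!parent) {
            return false;
        }
        for (size_t i = 0; i < parent->num_children; i++) {
            /* every traversed edge reduces the remaining required distance */
            if (is_child_of(child, parent->children[i], min_level - 1)) {
                return true;
            }
        }
        return false;
    }

With min_level == 2, as used in mirror_exit_common() above, an immediate
child does not count as a loop, but a grand-child or anything deeper does.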

Re: [PATCH] block: nbd: Fix dirty bitmap context name

2019-12-20 Thread Vladimir Sementsov-Ogievskiy
20.12.2019 13:40, Peter Krempa wrote:
> On Fri, Dec 20, 2019 at 10:06:17 +, Vladimir Sementsov-Ogievskiy wrote:
>> 20.12.2019 12:56, Peter Krempa wrote:
>>> On Fri, Dec 20, 2019 at 09:39:17 +, Vladimir Sementsov-Ogievskiy wrote:
 19.12.2019 18:55, Nir Soffer wrote:
> On Thu, Dec 19, 2019 at 5:17 PM Vladimir Sementsov-Ogievskiy
>  wrote:
>>
>> 19.12.2019 17:59, Nir Soffer wrote:
>>> On Thu, Dec 19, 2019 at 4:04 PM Kevin Wolf  wrote:

 On 19.12.2019 at 14:41, Vladimir Sementsov-Ogievskiy wrote:
> Ahh, I see, it's documented as
>
> +# @bitmap: Also export the dirty bitmap reachable from @device, so the
> +#  NBD client can use NBD_OPT_SET_META_CONTEXT with
> +#  "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0)
>
> and it is logical to assume that the export name (which is the @name
> argument) is
> mentioned. But we never mentioned it. This is just documented after the
> removed experimental command x-nbd-server-add-bitmap,
>
> look at
>
> commit 7dc570b3806e5b0a4c9219061556ed5a4a0de80c
> Author: Eric Blake 
> Date:   Fri Jan 11 13:47:18 2019 -0600
>
>  nbd: Remove x-nbd-server-add-bitmap
>
> ...
>
> -# @bitmap-export-name: How the bitmap will be seen by nbd clients
> -#  (default @bitmap)
> -#
> -# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of
> -# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) 
> to access
> -# the exposed bitmap.
>
>
> So, this "NAME" is saved and now looks incorrect. What should be 
> fixed, is Qapi
> documentation.

 Hm, I don't know these interfaces very well, but from your explanation
 it
 looks to me as if having a bitmap name made sense with
 x-nbd-server-add-bitmap because it could be called more than once for
 exporting multiple bitmaps.

 But now, we have only nbd-server-add, which takes a single bitmap name.
 As we don't have to distinguish multiple bitmaps any more, wouldn't it
 make more sense to use "qemu:dirty-bitmap" without any other
 information? Both export name and bitmap name are already identified by
 the connection.
>>>
>>> We can use an empty string (like the default export name), so the bitmap
>>> would be exposed as:
>>>
>>> "qemu:dirty-bitmap:"
>>>
>>> This would solve the issue for users, and keep the API extensible.
>>
>> As I already said, we can not. If we really want such a thing, use another
>> name, like qemu:default-dirty-bitmap...
>>
>> Or call the bitmap export "default", to produce
>>  "qemu:dirty-bitmap:default"
>>
>> But don't change default behavior of nbd-server-add
>>
>>>
 But if we have to have something there, using the bitmap name (which 
 may
 or may not be the same as used in the image file) makes a little more
 sense because it makes the interface extensible for the case that we
 ever want to re-introduce an nbd-server-add-bitmap.
>>>
>>> But using the bitmap name means the user of the NBD server needs to know
>>> this name.
>>
>> Why not? What is your case exactly? The user knows what kind of bitmap you
>> are exporting, so the user is in some relation with the exporting process
>> anyway. Why shouldn't it know the name?
>
> Because the user configuring qemu (libvirt) is not the same user
> accessing qemu NBD
> server (ovirt, or some backup application).
>
>> This name may be defined in your exporting protocol... What are you
>> exporting?
>
> We export an HTTP API, allowing getting dirty extents and reading data:
> https://www.ovirt.org/develop/release-management/features/storage/incremental-backup.html#map-request
> (this document is outdated, but it gives the general idea)
>
> Or provide the NBD URL directly to user (future).
>
>> Note also that the client may use NBD_OPT_LIST_META_CONTEXT with the query
>> "qemu:dirty-bitmap:", to get a list of all exported bitmaps.
>
> This is another option; I did not try to use this yet. We can use the
> single exported bitmap and fail if we get more than one. This is probably
> better than changing the entire stack to support a bitmap name.
>
>>> One option is that libvirt would publish the name of the bitmap in the
>>> xml describing
>>> the backup, and oVirt will have to propagate this name to the actual
>>> program accessing
>>> the NBD server, which may be a user program in the case when we expose 
>>> the NBD
>>> URL to users (planned for 
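For reference, the flow under discussion looks like this in plain QMP (a
sketch; the socket path, device name, export name and bitmap name are all
invented):

    { "execute": "nbd-server-start",
      "arguments": { "addr": { "type": "unix",
                               "data": { "path": "/tmp/nbd.sock" } } } }

    { "execute": "nbd-server-add",
      "arguments": { "device": "drive0", "name": "export-A",
                     "bitmap": "bitmap0" } }

With current qemu, a client must then request the meta context
"qemu:dirty-bitmap:bitmap0" (the bitmap name, not the export name
"export-A"), which is exactly the mismatch with the QAPI documentation that
this thread is about.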

Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE

2019-12-20 Thread Kevin Wolf
On 20.12.2019 at 11:48, Max Reitz wrote:
> On 20.12.19 11:26, Kevin Wolf wrote:
> > On 19.12.2019 at 15:38, Max Reitz wrote:
> >> fuse-export-add allows mounting block graph nodes via FUSE on some
> >> existing regular file.  That file should then appear like a raw disk
> >> image, and accesses to it result in accesses to the exported BDS.
> >>
> >> Right now, we only set up the mount point and tear all mount points down
> >> in bdrv_close_all().  We do not implement any access functions, so
> >> accessing the mount point only results in errors.  This will be
> >> addressed by a followup patch.
> >>
> >> The set of exported nodes is kept in a hash table so we can later add a
> >> fuse-export-remove that allows unmounting.
> >>
> >> Signed-off-by: Max Reitz 
> > 
> >> diff --git a/qapi/block.json b/qapi/block.json
> >> index 145c268bb6..03f8d1b537 100644
> >> --- a/qapi/block.json
> >> +++ b/qapi/block.json
> >> @@ -317,6 +317,29 @@
> >>  ##
> >>  { 'command': 'nbd-server-stop' }
> >>  
> >> +##
> >> +# @fuse-export-add:
> >> +#
> >> +# Exports a block graph node on some (file) mountpoint as a raw image.
> >> +#
> >> +# @node-name: Node to be exported
> >> +#
> >> +# @mountpoint: Path on which to export the block device via FUSE.
> >> +#  This must point to an existing regular file.
> >> +#
> >> +# @writable: Whether clients should be able to write to the block
> >> +#device via the FUSE export. (default: false)
> >> +#
> >> +# Since: 5.0
> >> +##
> >> +{ 'command': 'fuse-export-add',
> >> +  'data': {
> >> +  'node-name': 'str',
> >> +  'mountpoint': 'str',
> >> +  '*writable': 'bool'
> >> +  },
> >> +  'if': 'defined(CONFIG_FUSE)' }
> > 
> > Can this use a BlockExport union from the start like I'm introducing in
> > the storage daemon series, together with a generic block-export-add?
> 
> Hm, you mean still adding a FuseExport structure that would be part of
> BlockExport and then dropping fuse-export-add in favor of a
> block-export-add that we want anyway?

Yes.

> > It also looks like node-name and writable should be part of the common
> > base of BlockExport.
> 
> node-name definitely, I’m not so sure about writable.  Or, to be more
> precise, I think that if we want writable to be in the base, we also
> want growable to be there: Both are primarily options for the
> BlockBackend that the exports use.
> 
> But both of course also need to be supported by the export
> implementation.  nbd can make its BB growable all it wants, but that
> doesn’t make it work.

Right. Pragmatically, I think exports are very likely to support writable,
but probably rather unlikely to support growable. So I do think there
would be a point for making writable part of the common base, but not
growable.

> So if we kept writable and growable in the common base, then the schema
> would give no information about what exports actually support them.
> 
> On one hand, I don’t know whether it’s important to have this
> information in a static form, or whether it’s sufficient to learn at
> runtime.
> 
> On the other, I don’t know whether it’s important to have those fields
> in the base or not.  Would it make a difference on the wire?

Not for the command itself, so I think we're free to change it later. It
might make a difference for introspection, though, not sure. Markus?

Having it in the base might allow us to remove some duplication in the
code. Probably not much, though, so not too important.

> > Unfortunately this would mean that I can't use the
> > same BlockExportNbd for the existing nbd-server-add command any more. I
> > guess I could somehow get a shared base type for both, though.
> 
> Hm.  This sounds like you want to make it your problem.  Can I take that
> to mean that you want to implement block-export-add and I can wait with
> v2 until that’s done? :-)

The NBD integration, yes. I already added the BlockExport type to my
patches, too, but I expect you would beat me to it. I'm not currently
planning to write a block-export-add because it doesn't add anything new
for the storage daemon, so FuseExport and the command are your part.
The type currently only exists for --export.

Kevin
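For illustration, a shared base along those lines might look as follows in
QAPI (a sketch only; every type and member name here is an assumption, not
an existing schema):

    { 'enum': 'BlockExportType',
      'data': [ 'nbd', 'fuse' ] }

    { 'union': 'BlockExport',
      'base': { 'type': 'BlockExportType',
                'node-name': 'str',
                '*writable': 'bool' },
      'discriminator': 'type',
      'data': { 'nbd': 'BlockExportOptionsNbd',
                'fuse': 'BlockExportOptionsFuse' } }

    { 'command': 'block-export-add',
      'data': 'BlockExport', 'boxed': true }

The open question from the discussion above is whether 'writable' (and
possibly 'growable') belongs in the base even though not every export type
can honor it, or whether each branch should declare it separately.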





Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE

2019-12-20 Thread Max Reitz
On 20.12.19 11:26, Kevin Wolf wrote:
> On 19.12.2019 at 15:38, Max Reitz wrote:
>> fuse-export-add allows mounting block graph nodes via FUSE on some
>> existing regular file.  That file should then appear like a raw disk
>> image, and accesses to it result in accesses to the exported BDS.
>>
>> Right now, we only set up the mount point and tear all mount points down
>> in bdrv_close_all().  We do not implement any access functions, so
>> accessing the mount point only results in errors.  This will be
>> addressed by a followup patch.
>>
>> The set of exported nodes is kept in a hash table so we can later add a
>> fuse-export-remove that allows unmounting.
>>
>> Signed-off-by: Max Reitz 
> 
>> diff --git a/qapi/block.json b/qapi/block.json
>> index 145c268bb6..03f8d1b537 100644
>> --- a/qapi/block.json
>> +++ b/qapi/block.json
>> @@ -317,6 +317,29 @@
>>  ##
>>  { 'command': 'nbd-server-stop' }
>>  
>> +##
>> +# @fuse-export-add:
>> +#
>> +# Exports a block graph node on some (file) mountpoint as a raw image.
>> +#
>> +# @node-name: Node to be exported
>> +#
>> +# @mountpoint: Path on which to export the block device via FUSE.
>> +#  This must point to an existing regular file.
>> +#
>> +# @writable: Whether clients should be able to write to the block
>> +#device via the FUSE export. (default: false)
>> +#
>> +# Since: 5.0
>> +##
>> +{ 'command': 'fuse-export-add',
>> +  'data': {
>> +  'node-name': 'str',
>> +  'mountpoint': 'str',
>> +  '*writable': 'bool'
>> +  },
>> +  'if': 'defined(CONFIG_FUSE)' }
> 
> Can this use a BlockExport union from the start like I'm introducing in
> the storage daemon series, together with a generic block-export-add?

Hm, you mean still adding a FuseExport structure that would be part of
BlockExport and then dropping fuse-export-add in favor of a
block-export-add that we want anyway?

> It also looks like node-name and writable should be part of the common
> base of BlockExport.

node-name definitely, I’m not so sure about writable.  Or, to be more
precise, I think that if we want writable to be in the base, we also
want growable to be there: Both are primarily options for the
BlockBackend that the exports use.

But both of course also need to be supported by the export
implementation.  nbd can make its BB growable all it wants, but that
doesn’t make it work.

So if we kept writable and growable in the common base, then the schema
would give no information about what exports actually support them.

On one hand, I don’t know whether it’s important to have this
information in a static form, or whether it’s sufficient to learn at
runtime.

On the other, I don’t know whether it’s important to have those fields
in the base or not.  Would it make a difference on the wire?

> Unfortunately this would mean that I can't use the
> same BlockExportNbd for the existing nbd-server-add command any more. I
> guess I could somehow get a shared base type for both, though.

Hm.  This sounds like you want to make it your problem.  Can I take that
to mean that you want to implement block-export-add and I can wait with
v2 until that’s done? :-)

Max

> Markus, any thoughts on these QAPI interfaces?
> 
> Kevin





Re: [PATCH] block: nbd: Fix dirty bitmap context name

2019-12-20 Thread Peter Krempa
On Fri, Dec 20, 2019 at 10:06:17 +, Vladimir Sementsov-Ogievskiy wrote:
> 20.12.2019 12:56, Peter Krempa wrote:
> > On Fri, Dec 20, 2019 at 09:39:17 +, Vladimir Sementsov-Ogievskiy wrote:
> >> 19.12.2019 18:55, Nir Soffer wrote:
> >>> On Thu, Dec 19, 2019 at 5:17 PM Vladimir Sementsov-Ogievskiy
> >>>  wrote:
> 
>  19.12.2019 17:59, Nir Soffer wrote:
> > On Thu, Dec 19, 2019 at 4:04 PM Kevin Wolf  wrote:
> >>
> >> On 19.12.2019 at 14:41, Vladimir Sementsov-Ogievskiy wrote:
> >>> Ahh, I see, it's documented as
> >>>
> >>> +# @bitmap: Also export the dirty bitmap reachable from @device, so the
> >>> +#  NBD client can use NBD_OPT_SET_META_CONTEXT with
> >>> +#  "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0)
> >>>
> >>> and it is logical to assume that the export name (which is the @name
> >>> argument) is
> >>> mentioned. But we never mentioned it. This is just documented after the
> >>> removed experimental command x-nbd-server-add-bitmap,
> >>>
> >>> look at
> >>>
> >>> commit 7dc570b3806e5b0a4c9219061556ed5a4a0de80c
> >>> Author: Eric Blake 
> >>> Date:   Fri Jan 11 13:47:18 2019 -0600
> >>>
> >>> nbd: Remove x-nbd-server-add-bitmap
> >>>
> >>> ...
> >>>
> >>> -# @bitmap-export-name: How the bitmap will be seen by nbd clients
> >>> -#  (default @bitmap)
> >>> -#
> >>> -# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of
> >>> -# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) 
> >>> to access
> >>> -# the exposed bitmap.
> >>>
> >>>
> >>> So, this "NAME" is saved and now looks incorrect. What should be 
> >>> fixed, is Qapi
> >>> documentation.
> >>
> >> Hm, I don't know these interfaces very well, but from your explanation
> >> it
> >> looks to me as if having a bitmap name made sense with
> >> x-nbd-server-add-bitmap because it could be called more than once for
> >> exporting multiple bitmaps.
> >>
> >> But now, we have only nbd-server-add, which takes a single bitmap name.
> >> As we don't have to distinguish multiple bitmaps any more, wouldn't it
> >> make more sense to use "qemu:dirty-bitmap" without any other
> >> information? Both export name and bitmap name are already identified by
> >> the connection.
> >
> > We can use an empty string (like the default export name), so the bitmap
> > would be exposed as:
> >
> >"qemu:dirty-bitmap:"
> >
> > This would solve the issue for users, and keep the API extensible.
> 
>  As I already said, we can not. If we really want such a thing, use another
>  name, like qemu:default-dirty-bitmap...
> 
>  Or call the bitmap export "default", to produce
>  "qemu:dirty-bitmap:default"
> 
>  But don't change default behavior of nbd-server-add
> 
> >
> >> But if we have to have something there, using the bitmap name (which 
> >> may
> >> or may not be the same as used in the image file) makes a little more
> >> sense because it makes the interface extensible for the case that we
> >> ever want to re-introduce an nbd-server-add-bitmap.
> >
> > But using the bitmap name means the user of the NBD server needs to know
> > this name.
> 
>  Why not? What is your case exactly? The user knows what kind of bitmap you
>  are exporting, so the user is in some relation with the exporting process
>  anyway. Why shouldn't it know the name?
> >>>
> >>> Because the user configuring qemu (libvirt) is not the same user
> >>> accessing qemu NBD
> >>> server (ovirt, or some backup application).
> >>>
>  This name may be defined in your exporting protocol... What are you
>  exporting?
> >>>
> >>> We export an HTTP API, allowing getting dirty extents and reading data:
> >>> https://www.ovirt.org/develop/release-management/features/storage/incremental-backup.html#map-request
> >>> (this document is outdated, but it gives the general idea)
> >>>
> >>> Or provide the NBD URL directly to user (future).
> >>>
>  Note also that the client may use NBD_OPT_LIST_META_CONTEXT with the query
>  "qemu:dirty-bitmap:", to get a list of all exported bitmaps.
> >>>
> >>> This is another option; I did not try to use this yet. We can use the
> >>> single
> >>> exported bitmap and fail if we get more than one. This is probably better
> >>> than changing the entire stack to support a bitmap name.
> >>>
> > One option is that libvirt would publish the name of the bitmap in the
> > xml describing
> > the backup, and oVirt will have to propagate this name to the actual
> > program accessing
> > the NBD server, which may be a user program in the case when we expose 
> > the NBD
> > URL to users (planned for future version).
> >
> > Another option 

Re: [PATCH v5 4/4] blockdev: honor bdrv_try_set_aio_context() context requirements

2019-12-20 Thread Sergio Lopez

Sergio Lopez  writes:

> Sergio Lopez  writes:
>
>> Kevin Wolf  writes:
>>
>>> On 13.12.2019 at 21:59, Eric Blake wrote:
 On 12/9/19 10:06 AM, Kevin Wolf wrote:
 > On 28.11.2019 at 11:41, Sergio Lopez wrote:
 > > bdrv_try_set_aio_context() requires that the old context is held, and
 > > the new context is not held. Fix all the occurrences where it's not
 > > done this way.
 > > 
 > > Suggested-by: Max Reitz 
 > > Signed-off-by: Sergio Lopez 
 > > ---
 
 > Or in fact, I think you need to hold the AioContext of a bs to
 > bdrv_unref() it, so maybe 'goto out' is right, but you need to unref
 > target_bs while you still hold old_context.
 
 I suspect https://bugzilla.redhat.com/show_bug.cgi?id=1779036 is also a
 symptom of this.  The v5 patch did not fix this simple test case:
>>>
>>> Speaking of a test case... I think this series should probably add
>>> something to iotests.
>>
>> Okay, I'll try to add one.
>
> So I've been working on this, but it turns out that the issue isn't
> reproducible with 'accel=qtest'. I guess that's because the devices
> using the nodes aren't really active in this situation.
>
> Is it allowed to use 'accel=kvm:tcg' in iotests? I see test 235 does
> that, but I'm not sure if that's just an exception.

Please ignore this, it was another issue.

Thanks,
Sergio.






Re: [PATCH 00/18] block: Allow exporting BDSs via FUSE

2019-12-20 Thread Max Reitz
On 20.12.19 11:08, Stefan Hajnoczi wrote:
> On Thu, Dec 19, 2019 at 03:38:00PM +0100, Max Reitz wrote:
>> Preamble: This series is based on a combination of my (current) block
>> branch and “iotests: Minor fixes”.  I’ve pushed it here:
>>
>>   https://git.xanclic.moe/XanClic/qemu fuse-exports-v1
>>
>> (The base is on fuse-exports-v1-base.)
>>
>>
>> Hi,
>>
>> Ever since I found out that you can mount FUSE filesystems on regular
>> files (not just directories), I had the idea of adding FUSE block
>> exports to qemu where you can export block nodes as raw images.  The
>> best thing is that you’d be able to mount an image on itself, so
>> whatever format it may be in, qemu lets it appear as a raw image (and
>> you can then use regular tools like dd on it).
>>
>> I started with some concept of a qemu-blkfuse daemon (similar to
>> qemu-nbd), but never sent patches, for two reasons: (1) Performance was
>> not good, (2) it didn’t seem right, for some reason.
>>
>> Now Kevin is proposing a storage daemon for multiple export types like
>> NBD, and he also mentioned FUSE (while knowing of my previous attempts).
>> Now it does seem right to add FUSE exports to qemu, but only in the form
>> of some module with a proper QAPI/QMP binding.
>>
>> Performance is still quite bad, but who cares.  We can always improve
>> it, if the need arises.
>>
>>
>> This series does the following:
>>
>> First, add the FUSE export module (block/fuse.c) that implements the
>> basic file access functions.  (Note that you need libfuse 3.8.0 or later
>> for SEEK_HOLE/SEEK_DATA.)
>>
>> Second, it allows using FUSE exports as a protocol in the iotests and
>> makes many iotests work with it.  (The file node is exported by a
>> background qemu instance to $SOCK_DIR.)
>> Note that I only ran raw and qcow2 on it; I’m sure other formats
>> currently have some failing tests.
>>
>> This gives us a lot of coverage for, well, not free (it does take ten
>> patches), but for cheap; but there are still some more specialized
>> things we want to test, so third and last, this series adds an iotest
>> dedicated to FUSE exports.
>>
>>
>> Final rather important notice: I didn’t really run the iotests with this
>> yet.  I wanted to, but they appear rather broken on current master,
>> actually.  I’m not yet sure whether that’s because something in my setup
>> broke in the last two weeks, or because there’s quite something broken
>> in master (it does look like there are a couple things broken in master
>> currently).
>>
>>
>> Max Reitz (18):
>>   configure: Detect libfuse
>>   fuse: Allow exporting BDSs via FUSE
>>   fuse: Implement standard FUSE operations
>>   fuse: Add fuse-export-remove
>>   fuse: Allow growable exports
>>   fuse: (Partially) implement fallocate()
>>   fuse: Implement hole detection through lseek
>>   iotests: Do not needlessly filter _make_test_img
>>   iotests: Do not pipe _make_test_img
>>   iotests: Use convert -n in some cases
>>   iotests: Avoid renaming images
>>   iotests: Derive image names from $TEST_IMG
>>   iotests/091: Use _cleanup_qemu instad of "wait"
>>   iotests: Restrict some Python tests to file
>>   iotests: Let _make_test_img guess $TEST_IMG_FILE
>>   iotests: Allow testing FUSE exports
>>   iotests: Enable fuse for many tests
>>   iotests/281: Add test for FUSE exports
>>
>>  block.c  |   4 +
>>  block/Makefile.objs  |   3 +
>>  block/fuse.c | 668 +++
>>  configure|  68 
>>  include/block/fuse.h |  24 ++
>>  qapi/block.json  |  42 ++
>>  tests/qemu-iotests/013   |   9 +-
>>  tests/qemu-iotests/013.out   |   3 +-
>>  tests/qemu-iotests/018   |   5 +-
>>  tests/qemu-iotests/018.out   |   1 +
>>  tests/qemu-iotests/020   |   2 +-
>>  tests/qemu-iotests/025   |   2 +-
>>  tests/qemu-iotests/026   |   2 +-
>>  tests/qemu-iotests/028   |  16 +-
>>  tests/qemu-iotests/028.out   |   3 +
>>  tests/qemu-iotests/031   |   2 +-
>>  tests/qemu-iotests/034   |   2 +-
>>  tests/qemu-iotests/036   |   2 +-
>>  tests/qemu-iotests/037   |   2 +-
>>  tests/qemu-iotests/038   |   2 +-
>>  tests/qemu-iotests/039   |   2 +-
>>  tests/qemu-iotests/046   |   7 +-
>>  tests/qemu-iotests/046.out   |   2 +-
>>  tests/qemu-iotests/050   |   2 +-
>>  tests/qemu-iotests/054   |   2 +-
>>  tests/qemu-iotests/060   |   2 +-
>>  tests/qemu-iotests/071   |  21 +-
>>  tests/qemu-iotests/072   |   5 +-
>>  tests/qemu-iotests/072.out   |   1 +
>>  tests/qemu-iotests/079   |   2 +-
>>  tests/qemu-iotests/080   |   2 +-
>>  tests/qemu-iotests/089   |   5 +-
>>  tests/qemu-iotests/089.out   |   1 +
>>  tests/qemu-iotests/090   |   2 +-
>>  tests/qemu-iotests/091   |   5 +-
>>  tests/qemu-iotests/095   |   2 +-

Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE

2019-12-20 Thread Kevin Wolf
On 19.12.2019 at 15:38, Max Reitz wrote:
> fuse-export-add allows mounting block graph nodes via FUSE on some
> existing regular file.  That file should then appear like a raw disk
> image, and accesses to it result in accesses to the exported BDS.
> 
> Right now, we only set up the mount point and tear all mount points down
> in bdrv_close_all().  We do not implement any access functions, so
> accessing the mount point only results in errors.  This will be
> addressed by a followup patch.
> 
> The set of exported nodes is kept in a hash table so we can later add a
> fuse-export-remove that allows unmounting.
> 
> Signed-off-by: Max Reitz 

> diff --git a/qapi/block.json b/qapi/block.json
> index 145c268bb6..03f8d1b537 100644
> --- a/qapi/block.json
> +++ b/qapi/block.json
> @@ -317,6 +317,29 @@
>  ##
>  { 'command': 'nbd-server-stop' }
>  
> +##
> +# @fuse-export-add:
> +#
> +# Exports a block graph node on some (file) mountpoint as a raw image.
> +#
> +# @node-name: Node to be exported
> +#
> +# @mountpoint: Path on which to export the block device via FUSE.
> +#  This must point to an existing regular file.
> +#
> +# @writable: Whether clients should be able to write to the block
> +#device via the FUSE export. (default: false)
> +#
> +# Since: 5.0
> +##
> +{ 'command': 'fuse-export-add',
> +  'data': {
> +  'node-name': 'str',
> +  'mountpoint': 'str',
> +  '*writable': 'bool'
> +  },
> +  'if': 'defined(CONFIG_FUSE)' }

Can this use a BlockExport union from the start like I'm introducing in
the storage daemon series, together with a generic block-export-add?

It also looks like node-name and writable should be part of the common
base of BlockExport. Unfortunately this would mean that I can't use the
same BlockExportNbd for the existing nbd-server-add command any more. I
guess I could somehow get a shared base type for both, though.

Markus, any thoughts on these QAPI interfaces?

Kevin
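For reference, exercising the proposed command over QMP would look roughly
like this (a sketch; the node name and path are invented, and per the schema
the mountpoint must be an existing regular file):

    # touch /mnt/exported.raw

    { "execute": "fuse-export-add",
      "arguments": { "node-name": "fmt0",
                     "mountpoint": "/mnt/exported.raw",
                     "writable": false } }

Afterwards, reads of /mnt/exported.raw go through the exported node, so
ordinary tools such as dd see the raw contents regardless of the image
format.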




[PULL 3/3] virtio-blk: fix out-of-bounds access to bitmap in notify_guest_bh

2019-12-20 Thread Stefan Hajnoczi
From: Li Hangjing 

When the number of a virtio-blk device's virtqueues is larger than
BITS_PER_LONG, an out-of-bounds access to bitmap[] will occur.

Fixes: e21737ab15 ("virtio-blk: multiqueue batch notify")
Cc: qemu-sta...@nongnu.org
Cc: Stefan Hajnoczi 
Signed-off-by: Li Hangjing 
Reviewed-by: Xie Yongji 
Reviewed-by: Chai Wen 
Message-id: 20191216023050.48620-1-lihangj...@baidu.com
Message-Id: <20191216023050.48620-1-lihangj...@baidu.com>
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/dataplane/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 119906a5fe..1b52e8159c 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -67,7 +67,7 @@ static void notify_guest_bh(void *opaque)
 memset(s->batch_notify_vqs, 0, sizeof(bitmap));
 
 for (j = 0; j < nvqs; j += BITS_PER_LONG) {
-unsigned long bits = bitmap[j];
+unsigned long bits = bitmap[j / BITS_PER_LONG];
 
 while (bits != 0) {
 unsigned i = j + ctzl(bits);
-- 
2.23.0
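The fix is easiest to see in a standalone sketch of the same walk
(assumptions: two bitmap words, and __builtin_ctzl() standing in for QEMU's
ctzl()):

    #include <stdio.h>
    #include <limits.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

    int main(void)
    {
        unsigned long bitmap[2] = { 0x5, 0x1 };  /* vqs 0, 2 and 64 set */
        unsigned nvqs = 2 * BITS_PER_LONG;

        for (unsigned j = 0; j < nvqs; j += BITS_PER_LONG) {
            /* j / BITS_PER_LONG selects the word; the old bitmap[j] would
             * index word 64 (far out of bounds) on the second iteration. */
            unsigned long bits = bitmap[j / BITS_PER_LONG];
            while (bits != 0) {
                unsigned i = j + __builtin_ctzl(bits);
                printf("virtqueue %u needs a notify\n", i);
                bits &= bits - 1;                /* clear the lowest set bit */
            }
        }
        return 0;
    }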




[PULL 2/3] docs: fix rst syntax errors in unbuilt docs

2019-12-20 Thread Stefan Hajnoczi
The .rst files outside docs/{devel,interop,specs} aren't built yet and
therefore a few syntax errors have slipped through.  Fix them.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Daniel P. Berrangé 
Message-Id: <2019094411.427174-1-stefa...@redhat.com>
Signed-off-by: Stefan Hajnoczi 
---
 docs/arm-cpu-features.rst|  6 +++---
 docs/virtio-net-failover.rst |  4 ++--
 docs/virtio-pmem.rst | 19 ++-
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
index 1b367e22e1..9b537a75e6 100644
--- a/docs/arm-cpu-features.rst
+++ b/docs/arm-cpu-features.rst
@@ -41,9 +41,9 @@ CPU type is possible with the `query-cpu-model-expansion` QMP command.
 Below are some examples where `scripts/qmp/qmp-shell` (see the top comment
 block in the script for usage) is used to issue the QMP commands.
 
-(1) Determine which CPU features are available for the `max` CPU type
-(Note, we started QEMU with qemu-system-aarch64, so `max` is
- implementing the ARMv8-A reference manual in this case)::
+1. Determine which CPU features are available for the `max` CPU type
+   (Note, we started QEMU with qemu-system-aarch64, so `max` is
+   implementing the ARMv8-A reference manual in this case)::
 
   (QEMU) query-cpu-model-expansion type=full model={"name":"max"}
   { "return": {
diff --git a/docs/virtio-net-failover.rst b/docs/virtio-net-failover.rst
index 22f64c7bc8..6002dc5d96 100644
--- a/docs/virtio-net-failover.rst
+++ b/docs/virtio-net-failover.rst
@@ -1,6 +1,6 @@
-
+==
 QEMU virtio-net standby (net_failover)
-
+==
 
 This document explains the setup and usage of virtio-net standby feature which
 is used to create a net_failover pair of devices.
diff --git a/docs/virtio-pmem.rst b/docs/virtio-pmem.rst
index e77881b26f..4bf5d00443 100644
--- a/docs/virtio-pmem.rst
+++ b/docs/virtio-pmem.rst
@@ -27,17 +27,18 @@ virtio pmem usage
 -
 
   A virtio pmem device backed by a memory-backend-file can be created on
-  the QEMU command line as in the following example:
+  the QEMU command line as in the following example::
 
-  -object memory-backend-file,id=mem1,share,mem-path=./virtio_pmem.img,size=4G
-  -device virtio-pmem-pci,memdev=mem1,id=nv1
+-object memory-backend-file,id=mem1,share,mem-path=./virtio_pmem.img,size=4G
+-device virtio-pmem-pci,memdev=mem1,id=nv1
 
-   where:
-   - "object memory-backend-file,id=mem1,share,mem-path=, size="
- creates a backend file with the specified size.
+  where:
 
-   - "device virtio-pmem-pci,id=nvdimm1,memdev=mem1" creates a virtio pmem
- pci device whose storage is provided by above memory backend device.
+  - "object memory-backend-file,id=mem1,share,mem-path=, size="
+creates a backend file with the specified size.
+
+  - "device virtio-pmem-pci,id=nvdimm1,memdev=mem1" creates a virtio pmem
+pci device whose storage is provided by above memory backend device.
 
   Multiple virtio pmem devices can be created if multiple pairs of "-object"
   and "-device" are provided.
@@ -50,7 +51,7 @@ memory backing has to be added via 'object_add'; afterwards, the virtio
 pmem device can be added via 'device_add'.
 
 For example, the following commands add another 4GB virtio pmem device to
-the guest:
+the guest::
 
 (qemu) object_add memory-backend-file,id=mem2,share=on,mem-path=virtio_pmem2.img,size=4G
  (qemu) device_add virtio-pmem-pci,id=virtio_pmem2,memdev=mem2
-- 
2.23.0




[PULL 1/3] virtio-blk: deprecate SCSI passthrough

2019-12-20 Thread Stefan Hajnoczi
The Linux virtio_blk.ko guest driver is removing legacy SCSI passthrough
support.  Deprecate this feature in QEMU too.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Thomas Huth 
Message-id: 20191213144626.1208237-1-stefa...@redhat.com
Message-Id: <20191213144626.1208237-1-stefa...@redhat.com>
Signed-off-by: Stefan Hajnoczi 
---
 qemu-deprecated.texi | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
index 62680f7bd5..259cb9ce9e 100644
--- a/qemu-deprecated.texi
+++ b/qemu-deprecated.texi
@@ -278,6 +278,17 @@ spec you can use the ``-cpu rv64gcsu,priv_spec=v1.9.1`` command line argument.
 
 @section Device options
 
+@subsection Emulated device options
+
+@subsubsection -device virtio-blk,scsi=on|off (since 5.0.0)
+
+The virtio-blk SCSI passthrough feature is a legacy VIRTIO feature.  VIRTIO 1.0
+and later do not support it because the virtio-scsi device was introduced for
+full SCSI support.  Use virtio-scsi instead when SCSI passthrough is required.
+
+Note this also applies to ``-device virtio-blk-pci,scsi=on|off'', which is an
+alias.
+
 @subsection Block device options
 
 @subsubsection "backing": "" (since 2.12.0)
-- 
2.23.0
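As a rough illustration of the migration path (a sketch; device and file
names are invented, and whether scsi-block or scsi-generic is appropriate
depends on the host device):

    # Deprecated: legacy SCSI passthrough via virtio-blk
    qemu-system-x86_64 ... \
        -drive if=none,id=d0,file=/dev/sdb,format=raw \
        -device virtio-blk-pci,drive=d0,scsi=on

    # Preferred: a virtio-scsi controller with a passthrough device
    qemu-system-x86_64 ... \
        -drive if=none,id=d0,file=/dev/sdb,format=raw \
        -device virtio-scsi-pci,id=scsi0 \
        -device scsi-block,bus=scsi0.0,drive=d0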




[PULL 0/3] Block patches

2019-12-20 Thread Stefan Hajnoczi
The following changes since commit aceeaa69d28e6f08a24395d0aa6915b687d0a681:

  Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2019-12-17'
  into staging (2019-12-17 15:55:20 +)

are available in the Git repository at:

  https://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 725fe5d10dbd4259b1853b7d253cef83a3c0d22a:

  virtio-blk: fix out-of-bounds access to bitmap in notify_guest_bh (2019-12-19 16:20:25 +)


Pull request



Li Hangjing (1):
  virtio-blk: fix out-of-bounds access to bitmap in notify_guest_bh

Stefan Hajnoczi (2):
  virtio-blk: deprecate SCSI passthrough
  docs: fix rst syntax errors in unbuilt docs

 docs/arm-cpu-features.rst   |  6 +++---
 docs/virtio-net-failover.rst|  4 ++--
 docs/virtio-pmem.rst| 19 ++-
 hw/block/dataplane/virtio-blk.c |  2 +-
 qemu-deprecated.texi| 11 +++
 5 files changed, 27 insertions(+), 15 deletions(-)

-- 
2.23.0




Re: [PATCH v2] iotests.py: Let wait_migration wait even more

2019-12-20 Thread Max Reitz
On 20.12.19 11:07, Kevin Wolf wrote:
> On 20.12.2019 at 10:52, Max Reitz wrote:
>> On 20.12.19 10:32, Kevin Wolf wrote:
>>> On 19.12.2019 at 19:36, Max Reitz wrote:
 The "migration completed" event may be sent (on the source, to be
 specific) before the migration is actually completed, so the VM runstate
 will still be "finish-migrate" instead of "postmigrate".  So ask the
 users of VM.wait_migration() to specify the final runstate they desire
 and then poll the VM until it has reached that state.  (This should be
 over very quickly, so busy polling is fine.)

 Without this patch, I see intermittent failures in the new iotest 280
 under high system load.  I have not yet seen such failures with other
 iotests that use VM.wait_migration() and query-status afterwards, but
 maybe they just occur even more rarely, or it is because they also wait
 on the destination VM to be running.

 Signed-off-by: Max Reitz 
>>>
 diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
 index 13fd8b5cd2..0b62c42851 100644
 --- a/tests/qemu-iotests/iotests.py
 +++ b/tests/qemu-iotests/iotests.py
 @@ -668,12 +668,16 @@ class VM(qtest.QEMUQtestMachine):
  }
  ]))
  
 -def wait_migration(self):
 +def wait_migration(self, expect_runstate):
  while True:
  event = self.event_wait('MIGRATION')
  log(event, filters=[filter_qmp_event])
  if event['data']['status'] == 'completed':
  break
 +# The event may occur in finish-migrate, so wait for the expected
 +# post-migration runstate
>>>
>>> That's a bit too specific now that you have expect_runstate.
>>
>> Can you be more specific? :-)
>>
>> If you mean the fact of mentioning “post-migration runstate”, I simply
>> meant that as “the runstate after the migration”.  The specific runstate
>> on the source VM is called “postmigrate”.
> 
> Oh, yes, "postmigrate" is what I had in mind.
> 
>> I wouldn’t mind changing it to “after-migration runstate” or something
>> similar, if that’s what you mean.
> 
> "runstate after migration"?

Sure.

(Now you've got me wondering what permutations are permissible.  “migration
after runstate” isn’t.  “runstate migration after” is just garbage.  So
probably only RAM and AMR.)

Max





Re: [PATCH 00/18] block: Allow exporting BDSs via FUSE

2019-12-20 Thread Stefan Hajnoczi
On Thu, Dec 19, 2019 at 03:38:00PM +0100, Max Reitz wrote:
> Preamble: This series is based on a combination of my (current) block
> branch and “iotests: Minor fixes”.  I’ve pushed it here:
> 
>   https://git.xanclic.moe/XanClic/qemu fuse-exports-v1
> 
> (The base is on fuse-exports-v1-base.)
> 
> 
> Hi,
> 
> Ever since I found out that you can mount FUSE filesystems on regular
> files (not just directories), I had the idea of adding FUSE block
> exports to qemu where you can export block nodes as raw images.  The
> best thing is that you’d be able to mount an image on itself, so
> whatever format it may be in, qemu lets it appear as a raw image (and
> you can then use regular tools like dd on it).
> 
> I started with some concept of a qemu-blkfuse daemon (similar to
> qemu-nbd), but never sent patches, for two reasons: (1) Performance was
> not good, (2) it didn’t seem right, for some reason.
> 
> Now Kevin is proposing a storage daemon for multiple export types like
> NBD, and he also mentioned FUSE (while knowing of my previous attempts).
> Now it does seem right to add FUSE exports to qemu, but only in the form
> of some module with a proper QAPI/QMP binding.
> 
> Performance is still quite bad, but who cares.  We can always improve
> it, if the need arises.
> 
> 
> This series does the following:
> 
> First, add the FUSE export module (block/fuse.c) that implements the
> basic file access functions.  (Note that you need libfuse 3.8.0 or later
> for SEEK_HOLE/SEEK_DATA.)
> 
> Second, it allows using FUSE exports as a protocol in the iotests and
> makes many iotests work with it.  (The file node is exported by a
> background qemu instance to $SOCK_DIR.)
> Note that I only ran raw and qcow2 on it; I’m sure other formats
> currently have some failing tests.
> 
> This gives us a lot of coverage for, well, not free (it does take ten
> patches), but for cheap; but there are still some more specialized
> things we want to test, so third and last, this series adds an iotest
> dedicated to FUSE exports.
> 
> 
> Final rather important notice: I didn’t really run the iotests with this
> yet.  I wanted to, but they appear rather broken on current master,
> actually.  I’m not yet sure whether that’s because something in my setup
> broke in the last two weeks, or because there’s quite something broken
> in master (it does look like there are a couple things broken in master
> currently).
> 
> 
> Max Reitz (18):
>   configure: Detect libfuse
>   fuse: Allow exporting BDSs via FUSE
>   fuse: Implement standard FUSE operations
>   fuse: Add fuse-export-remove
>   fuse: Allow growable exports
>   fuse: (Partially) implement fallocate()
>   fuse: Implement hole detection through lseek
>   iotests: Do not needlessly filter _make_test_img
>   iotests: Do not pipe _make_test_img
>   iotests: Use convert -n in some cases
>   iotests: Avoid renaming images
>   iotests: Derive image names from $TEST_IMG
>   iotests/091: Use _cleanup_qemu instad of "wait"
>   iotests: Restrict some Python tests to file
>   iotests: Let _make_test_img guess $TEST_IMG_FILE
>   iotests: Allow testing FUSE exports
>   iotests: Enable fuse for many tests
>   iotests/281: Add test for FUSE exports
> 
>  block.c  |   4 +
>  block/Makefile.objs  |   3 +
>  block/fuse.c | 668 +++
>  configure|  68 
>  include/block/fuse.h |  24 ++
>  qapi/block.json  |  42 ++
>  tests/qemu-iotests/013   |   9 +-
>  tests/qemu-iotests/013.out   |   3 +-
>  tests/qemu-iotests/018   |   5 +-
>  tests/qemu-iotests/018.out   |   1 +
>  tests/qemu-iotests/020   |   2 +-
>  tests/qemu-iotests/025   |   2 +-
>  tests/qemu-iotests/026   |   2 +-
>  tests/qemu-iotests/028   |  16 +-
>  tests/qemu-iotests/028.out   |   3 +
>  tests/qemu-iotests/031   |   2 +-
>  tests/qemu-iotests/034   |   2 +-
>  tests/qemu-iotests/036   |   2 +-
>  tests/qemu-iotests/037   |   2 +-
>  tests/qemu-iotests/038   |   2 +-
>  tests/qemu-iotests/039   |   2 +-
>  tests/qemu-iotests/046   |   7 +-
>  tests/qemu-iotests/046.out   |   2 +-
>  tests/qemu-iotests/050   |   2 +-
>  tests/qemu-iotests/054   |   2 +-
>  tests/qemu-iotests/060   |   2 +-
>  tests/qemu-iotests/071   |  21 +-
>  tests/qemu-iotests/072   |   5 +-
>  tests/qemu-iotests/072.out   |   1 +
>  tests/qemu-iotests/079   |   2 +-
>  tests/qemu-iotests/080   |   2 +-
>  tests/qemu-iotests/089   |   5 +-
>  tests/qemu-iotests/089.out   |   1 +
>  tests/qemu-iotests/090   |   2 +-
>  tests/qemu-iotests/091   |   5 +-
>  tests/qemu-iotests/095   |   2 +-
>  tests/qemu-iotests/097   |   2 +-
>  tests/qemu-iotests/098   |   2 +-
>  tests/qemu-iotests/102   |   2 +-
>  

Re: [PATCH v2] iotests.py: Let wait_migration wait even more

2019-12-20 Thread Kevin Wolf
On 20.12.2019 at 10:52, Max Reitz wrote:
> On 20.12.19 10:32, Kevin Wolf wrote:
> > On 19.12.2019 at 19:36, Max Reitz wrote:
> >> The "migration completed" event may be sent (on the source, to be
> >> specific) before the migration is actually completed, so the VM runstate
> >> will still be "finish-migrate" instead of "postmigrate".  So ask the
> >> users of VM.wait_migration() to specify the final runstate they desire
> >> and then poll the VM until it has reached that state.  (This should be
> >> over very quickly, so busy polling is fine.)
> >>
> >> Without this patch, I see intermittent failures in the new iotest 280
> >> under high system load.  I have not yet seen such failures with other
> >> iotests that use VM.wait_migration() and query-status afterwards, but
> >> maybe they just occur even more rarely, or it is because they also wait
> >> on the destination VM to be running.
> >>
> >> Signed-off-by: Max Reitz 
> > 
> >> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> >> index 13fd8b5cd2..0b62c42851 100644
> >> --- a/tests/qemu-iotests/iotests.py
> >> +++ b/tests/qemu-iotests/iotests.py
> >> @@ -668,12 +668,16 @@ class VM(qtest.QEMUQtestMachine):
> >>  }
> >>  ]))
> >>  
> >> -def wait_migration(self):
> >> +def wait_migration(self, expect_runstate):
> >>  while True:
> >>  event = self.event_wait('MIGRATION')
> >>  log(event, filters=[filter_qmp_event])
> >>  if event['data']['status'] == 'completed':
> >>  break
> >> +# The event may occur in finish-migrate, so wait for the expected
> >> +# post-migration runstate
> > 
> > That's a bit too specific now that you have expect_runstate.
> 
> Can you be more specific? :-)
> 
> If you mean the fact of mentioning “post-migration runstate”, I simply
> meant that as “the runstate after the migration”.  The specific runstate
> on the source VM is called “postmigrate”.

Oh, yes, "postmigrate" is what I had in mind.

> I wouldn’t mind changing it to “after-migration runstate” or something
> similar, if that’s what you mean.

"runstate after migration"?

Kevin




Re: [PATCH] block: nbd: Fix dirty bitmap context name

2019-12-20 Thread Vladimir Sementsov-Ogievskiy
20.12.2019 12:56, Peter Krempa wrote:
> On Fri, Dec 20, 2019 at 09:39:17 +, Vladimir Sementsov-Ogievskiy wrote:
>> 19.12.2019 18:55, Nir Soffer wrote:
>>> On Thu, Dec 19, 2019 at 5:17 PM Vladimir Sementsov-Ogievskiy
>>>  wrote:

 19.12.2019 17:59, Nir Soffer wrote:
> On Thu, Dec 19, 2019 at 4:04 PM Kevin Wolf  wrote:
>>
>> On 19.12.2019 at 14:41, Vladimir Sementsov-Ogievskiy wrote:
>>> Ahh, I see, it's documented as
>>>
>>> +# @bitmap: Also export the dirty bitmap reachable from @device, so the
>>> +#  NBD client can use NBD_OPT_SET_META_CONTEXT with
>>> +#  "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0)
>>>
>>> and it is logical to assume that the export name (which is the @name argument)
>>> is
>>> mentioned. But we never mentioned it. This is just documented after the
>>> removed experimental command x-nbd-server-add-bitmap,
>>>
>>> look at
>>>
>>> commit 7dc570b3806e5b0a4c9219061556ed5a4a0de80c
>>> Author: Eric Blake 
>>> Date:   Fri Jan 11 13:47:18 2019 -0600
>>>
>>> nbd: Remove x-nbd-server-add-bitmap
>>>
>>> ...
>>>
>>> -# @bitmap-export-name: How the bitmap will be seen by nbd clients
>>> -#  (default @bitmap)
>>> -#
>>> -# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of
>>> -# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) to 
>>> access
>>> -# the exposed bitmap.
>>>
>>>
>>> So, this "NAME" is saved and now looks incorrect. What should be fixed, 
>>> is Qapi
>>> documentation.
>>
>> Hm, I don't know these interfaces very well, but from your explanation it
>> looks to me as if having a bitmap name made sense with
>> x-nbd-server-add-bitmap because it could be called more than once for
>> exporting multiple bitmaps.
>>
>> But now, we have only nbd-server-add, which takes a single bitmap name.
>> As we don't have to distinguish multiple bitmaps any more, wouldn't it
>> make more sense to use "qemu:dirty-bitmap" without any other
>> information? Both export name and bitmap name are already identified by
>> the connection.
>
> We can use an empty string (like the default export name), so the bitmap
> would be exposed as:
>
>"qemu:dirty-bitmap:"
>
> This would solve the issue for users, and keep the API extensible.

 As I already said, we can not. If we really want such a thing, use another
 name, like qemu:default-dirty-bitmap...

 Or call the bitmap export "default", to produce
 "qemu:dirty-bitmap:default"

 But don't change default behavior of nbd-server-add

>
>> But if we have to have something there, using the bitmap name (which may
>> or may not be the same as used in the image file) makes a little more
>> sense because it makes the interface extensible for the case that we
>> ever want to re-introduce an nbd-server-add-bitmap.
>
> But using the bitmap name means the user of the NBD server needs to know
> this name.

 Why not? What is your case exactly? The user knows what kind of bitmap you
 are exporting, so the user is in some relation with the exporting process
 anyway. Why shouldn't it know the name?
>>>
>>> Because the user configuring qemu (libvirt) is not the same user
>>> accessing qemu NBD
>>> server (ovirt, or some backup application).
>>>
 This name may be defined in your exporting protocol... What are you
 exporting?
>>>
>>> We export an HTTP API, allowing getting dirty extents and reading data:
>>> https://www.ovirt.org/develop/release-management/features/storage/incremental-backup.html#map-request
>>> (this document is outdated, but it gives the general idea)
>>>
>>> Or provide the NBD URL directly to user (future).
>>>
 Note also that the client may use NBD_OPT_LIST_META_CONTEXT with the query
 "qemu:dirty-bitmap:", to get a list of all exported bitmaps.
>>>
>>> This is another option; I did not try to use this yet. We can use the single
>>> exported bitmap and fail if we get more than one. This is probably better
>>> than changing the entire stack to support a bitmap name.
>>>
> One option is that libvirt would publish the name of the bitmap in the
> xml describing
> the backup, and oVirt will have to propagate this name to the actual
> program accessing
> the NBD server, which may be a user program in the case when we expose 
> the NBD
> URL to users (planned for future version).
>
> Another option is that the user will control this name, and libvirt
> will use the name specified
> by the user. This means oVirt will have to provide API to set this
> name and pass it to libvirt.
>
> Both cases require a lot of effort which does not help anyone in the
> task of getting dirty
> extents from an image - which is our current goal. 

Re: [PATCH] block: nbd: Fix dirty bitmap context name

2019-12-20 Thread Peter Krempa
On Fri, Dec 20, 2019 at 09:39:17 +, Vladimir Sementsov-Ogievskiy wrote:
> 19.12.2019 18:55, Nir Soffer wrote:
> > On Thu, Dec 19, 2019 at 5:17 PM Vladimir Sementsov-Ogievskiy
> >  wrote:
> >>
> >> 19.12.2019 17:59, Nir Soffer wrote:
> >>> On Thu, Dec 19, 2019 at 4:04 PM Kevin Wolf  wrote:
> 
>  On 19.12.2019 at 14:41, Vladimir Sementsov-Ogievskiy wrote:
> > Ahh, I see, it's documented as
> >
> > +# @bitmap: Also export the dirty bitmap reachable from @device, so the
> > +#  NBD client can use NBD_OPT_SET_META_CONTEXT with
> > +#  "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0)
> >
> > and it is logical to assume that the export name (which is the @name argument)
> > is
> > mentioned. But we never mentioned it. This is just documented after the
> > removed experimental command x-nbd-server-add-bitmap,
> >
> > look at
> >
> > commit 7dc570b3806e5b0a4c9219061556ed5a4a0de80c
> > Author: Eric Blake 
> > Date:   Fri Jan 11 13:47:18 2019 -0600
> >
> >nbd: Remove x-nbd-server-add-bitmap
> >
> > ...
> >
> > -# @bitmap-export-name: How the bitmap will be seen by nbd clients
> > -#  (default @bitmap)
> > -#
> > -# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of
> > -# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) to 
> > access
> > -# the exposed bitmap.
> >
> >
> > So, this "NAME" is saved and now looks incorrect. What should be fixed, 
> > is Qapi
> > documentation.
> 
>  Hm, I don't know these interfaces very well, but from your explanation it
>  looks to me as if having a bitmap name made sense with
>  x-nbd-server-add-bitmap because it could be called more than once for
>  exporting multiple bitmaps.
> 
>  But now, we have only nbd-server-add, which takes a single bitmap name.
>  As we don't have to distinguish multiple bitmaps any more, wouldn't it
>  make more sense to use "qemu:dirty-bitmap" without any other
>  information? Both export name and bitmap name are already identified by
>  the connection.
> >>>
> >>> We can use an empty string (like the default export name), so the bitmap
> >>> would be exposed as:
> >>>
> >>>   "qemu:dirty-bitmap:"
> >>>
> >>> This would solve the issue for users, and keep the API extensible.
> >>
> >> As I already said, we can not. If we really want such a thing, use another
> >> name, like qemu:default-dirty-bitmap...
> >>
> >> Or call the bitmap export "default", to produce
> >>    "qemu:dirty-bitmap:default"
> >>
> >> But don't change default behavior of nbd-server-add
> >>
> >>>
>  But if we have to have something there, using the bitmap name (which may
>  or may not be the same as used in the image file) makes a little more
>  sense because it makes the interface extensible for the case that we
>  ever want to re-introduce an nbd-server-add-bitmap.
> >>>
> >>> But using the bitmap name means the user of the NBD server needs to know
> >>> this name.
> >>
> >> Why not? What is your case exactly? The user knows what kind of bitmap you
> >> are exporting, so the user is in some relation with the exporting process
> >> anyway. Why shouldn't it know the name?
> > 
> > Because the user configuring qemu (libvirt) is not the same user
> > accessing qemu NBD
> > server (ovirt, or some backup application).
> > 
> >> This name may be defined in your exporting protocol... What are you
> >> exporting?
> > 
> > We export an HTTP API, allowing getting dirty extents and reading data:
> > https://www.ovirt.org/develop/release-management/features/storage/incremental-backup.html#map-request
> > (this document is outdated, but it gives the general idea)
> > 
> > Or provide the NBD URL directly to user (future).
> > 
> >> Note also that the client may use NBD_OPT_LIST_META_CONTEXT with the query
> >> "qemu:dirty-bitmap:", to get a list of all exported bitmaps.
> > 
> > This is another option; I did not try to use this yet. We can use the single
> > exported bitmap and fail if we get more than one. This is probably better
> > than changing the entire stack to support a bitmap name.
> > 
> >>> One option is that libvirt would publish the name of the bitmap in the
> >>> xml describing
> >>> the backup, and oVirt will have to propagate this name to the actual
> >>> program accessing
> >>> the NBD server, which may be a user program in the case when we expose 
> >>> the NBD
> >>> URL to users (planned for future version).
> >>>
> >>> Another option is that the user will control this name, and libvirt
> >>> will use the name specified
> >>> by the user. This means oVirt will have to provide API to set this
> >>> name and pass it to libvirt.
> >>>
> >>> Both cases require a lot of effort which does not help anyone in the
> >>> task of getting dirty
> >>> extents from an image - which is our current goal. We need to have
> >>> good defaults 

Re: [PATCH v2] iotests.py: Let wait_migration wait even more

2019-12-20 Thread Max Reitz
On 20.12.19 10:32, Kevin Wolf wrote:
> On 19.12.2019 at 19:36, Max Reitz wrote:
>> The "migration completed" event may be sent (on the source, to be
>> specific) before the migration is actually completed, so the VM runstate
>> will still be "finish-migrate" instead of "postmigrate".  So ask the
>> users of VM.wait_migration() to specify the final runstate they desire
>> and then poll the VM until it has reached that state.  (This should be
>> over very quickly, so busy polling is fine.)
>>
>> Without this patch, I see intermittent failures in the new iotest 280
>> under high system load.  I have not yet seen such failures with other
>> iotests that use VM.wait_migration() and query-status afterwards, but
>> maybe they just occur even more rarely, or it is because they also wait
>> on the destination VM to be running.
>>
>> Signed-off-by: Max Reitz 
> 
>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>> index 13fd8b5cd2..0b62c42851 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -668,12 +668,16 @@ class VM(qtest.QEMUQtestMachine):
>>  }
>>  ]))
>>  
>> -def wait_migration(self):
>> +def wait_migration(self, expect_runstate):
>>  while True:
>>  event = self.event_wait('MIGRATION')
>>  log(event, filters=[filter_qmp_event])
>>  if event['data']['status'] == 'completed':
>>  break
>> +# The event may occur in finish-migrate, so wait for the expected
>> +# post-migration runstate
> 
> That's a bit too specific now that you have expect_runstate.

Can you be more specific? :-)

If you mean the fact of mentioning “post-migration runstate”, I simply
meant that as “the runstate after the migration”.  The specific runstate
on the source VM is called “postmigrate”.

I wouldn’t mind changing it to “after-migration runstate” or something
similar, if that’s what you mean.

Max

>> +while self.qmp('query-status')['return']['status'] != expect_runstate:
>> +pass
>>  
>>  def node_info(self, node_name):
>>  nodes = self.qmp('query-named-block-nodes')
> 
> Tested-by: Kevin Wolf 
> 






Re: [PATCH] block: nbd: Fix dirty bitmap context name

2019-12-20 Thread Vladimir Sementsov-Ogievskiy
19.12.2019 18:55, Nir Soffer wrote:
> On Thu, Dec 19, 2019 at 5:17 PM Vladimir Sementsov-Ogievskiy
>  wrote:
>>
>> 19.12.2019 17:59, Nir Soffer wrote:
>>> On Thu, Dec 19, 2019 at 4:04 PM Kevin Wolf  wrote:

 On 19.12.2019 at 14:41, Vladimir Sementsov-Ogievskiy wrote:
> Ahh, I see, it's documented as
>
> +# @bitmap: Also export the dirty bitmap reachable from @device, so the
> +#  NBD client can use NBD_OPT_SET_META_CONTEXT with
> +#  "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0)
>
> and it is logical to assume that the export name (which is the @name argument) is
> mentioned. But we never mentioned it. This is just documented after the
> removed experimental command x-nbd-server-add-bitmap,
>
> look at
>
> commit 7dc570b3806e5b0a4c9219061556ed5a4a0de80c
> Author: Eric Blake 
> Date:   Fri Jan 11 13:47:18 2019 -0600
>
>nbd: Remove x-nbd-server-add-bitmap
>
> ...
>
> -# @bitmap-export-name: How the bitmap will be seen by nbd clients
> -#  (default @bitmap)
> -#
> -# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of
> -# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) to 
> access
> -# the exposed bitmap.
>
>
> So, this "NAME" is saved and now looks incorrect. What should be fixed, 
> is Qapi
> documentation.

 Hm, I don't know these interfaces very well, but from your explanation it
 looks to me as if having a bitmap name made sense with
 x-nbd-server-add-bitmap because it could be called more than once for
 exporting multiple bitmaps.

 But now, we have only nbd-server-add, which takes a single bitmap name.
 As we don't have to distinguish multiple bitmaps any more, wouldn't it
 make more sense to use "qemu:dirty-bitmap" without any other
 information? Both export name and bitmap name are already identified by
 the connection.
>>>
>>> We can use an empty string (like the default export name), so the bitmap
>>> would be exposed as:
>>>
>>>   "qemu:dirty-bitmap:"
>>>
>>> This would solve the issue for users, and keep the API extensible.
>>
>> As I already said, we can not. If we really want such a thing, use another
>> name, like qemu:default-dirty-bitmap...
>>
>> Or call the bitmap export "default", to produce
>>    "qemu:dirty-bitmap:default"
>>
>> But don't change default behavior of nbd-server-add
>>
>>>
 But if we have to have something there, using the bitmap name (which may
 or may not be the same as used in the image file) makes a little more
 sense because it makes the interface extensible for the case that we
 ever want to re-introduce an nbd-server-add-bitmap.
>>>
>>> But using the bitmap name means the user of the NBD server needs to know
>>> this name.
>>
>> Why not? What is your case exactly? The user knows what kind of bitmap you
>> are exporting, so the user is in some relation with the exporting process
>> anyway. Why shouldn't it know the name?
> 
> Because the user configuring qemu (libvirt) is not the same user
> accessing qemu NBD
> server (ovirt, or some backup application).
> 
>> This name may be defined in your exporting protocol... What are you exporting?
> 
> We export an HTTP API, allowing getting dirty extents and reading data:
> https://www.ovirt.org/develop/release-management/features/storage/incremental-backup.html#map-request
> (this document is outdated, but it gives the general idea)
> 
> Or provide the NBD URL directly to user (future).
> 
>> Note also, that client may use NBD_OPT_LIST_META_CONTEXT with query
>> "qemu:dirty-bitmap:", to get list of all exported bitmaps.
> 
> This is another option; I did not try it yet. We can use the single
> exported bitmap and fail if we get more than one. This is probably better
> than changing the entire stack to support the bitmap name.
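
A sketch of that client-side fallback; list_dirty_bitmap_contexts() below is
a hypothetical helper, not an existing QEMU or libnbd API, standing in for an
NBD_OPT_LIST_META_CONTEXT request with the single query "qemu:dirty-bitmap:":

    def list_dirty_bitmap_contexts(export_url):
        # Hypothetical: send NBD_OPT_LIST_META_CONTEXT with the query
        # "qemu:dirty-bitmap:" and return the context names the server
        # offers, e.g. ["qemu:dirty-bitmap:bitmap0"].
        raise NotImplementedError

    def single_dirty_bitmap_context(export_url):
        # Use the one exported bitmap; fail loudly if the server offers
        # zero or several, as described above.
        contexts = list_dirty_bitmap_contexts(export_url)
        if len(contexts) != 1:
            raise RuntimeError('expected exactly one dirty bitmap '
                               'context, got %r' % (contexts,))
        return contexts[0]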
> 
>>> One option is that libvirt would publish the name of the bitmap in the
>>> xml describing
>>> the backup, and oVirt will have to propagate this name to the actual
>>> program accessing
>>> the NBD server, which may be a user program in the case when we expose the 
>>> NBD
>>> URL to users (planned for future version).
>>>
>>> Another option is that the user will control this name, and libvirt
>>> will use the name specified
>>> by the user. This means oVirt will have to provide API to set this
>>> name and pass it to libvirt.
>>>
>>> Both cases require a lot of effort which does not help anyone with the
>>> task of getting dirty extents from an image, which is our current goal.
>>> We need good defaults that save unneeded effort across the entire stack.
>>
>> So, you are implementing some protocol and need to export only one bitmap
>> for your specific scenario. Why not just give it a constant name, like
>> ovirt-bitmap, or something like that?
> 
> But we don't use qemu directly. We use libvirt, and libvirt does not expose

Re: [PATCH v2] iotests.py: Let wait_migration wait even more

2019-12-20 Thread Kevin Wolf
On 19.12.2019 at 19:36, Max Reitz wrote:
> The "migration completed" event may be sent (on the source, to be
> specific) before the migration is actually completed, so the VM runstate
> will still be "finish-migrate" instead of "postmigrate".  So ask the
> users of VM.wait_migration() to specify the final runstate they desire
> and then poll the VM until it has reached that state.  (This should be
> over very quickly, so busy polling is fine.)
> 
> Without this patch, I see intermittent failures in the new iotest 280
> under high system load.  I have not yet seen such failures with other
> iotests that use VM.wait_migration() and query-status afterwards, but
> maybe they just occur even more rarely, or it is because they also wait
> on the destination VM to be running.
> 
> Signed-off-by: Max Reitz 

> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 13fd8b5cd2..0b62c42851 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -668,12 +668,16 @@ class VM(qtest.QEMUQtestMachine):
>  }
>  ]))
>  
> -    def wait_migration(self):
> +    def wait_migration(self, expect_runstate):
>          while True:
>              event = self.event_wait('MIGRATION')
>              log(event, filters=[filter_qmp_event])
>              if event['data']['status'] == 'completed':
>                  break
> +        # The event may occur in finish-migrate, so wait for the expected
> +        # post-migration runstate

That's a bit too specific now that you have expect_runstate.

> +        while self.qmp('query-status')['return']['status'] != expect_runstate:
> +            pass
>  
>      def node_info(self, node_name):
>          nodes = self.qmp('query-named-block-nodes')

Tested-by: Kevin Wolf 
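
For reference, a caller of the new signature looks roughly like this sketch,
where migration_uri stands for whatever migration channel the test uses, and
'postmigrate' / 'running' are the runstates query-status reports on the
source and destination once migration has finished:

    source_vm.qmp('migrate', uri=migration_uri)
    source_vm.wait_migration('postmigrate')  # source settles in postmigrate
    dest_vm.wait_migration('running')        # destination resumes running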




Re: [PATCH] qapi/block: fix nbd-server-add spec

2019-12-20 Thread Vladimir Sementsov-Ogievskiy
19.12.2019 19:14, Nir Soffer wrote:
> On Thu, Dec 19, 2019 at 5:25 PM Vladimir Sementsov-Ogievskiy
>  wrote:
>>
>> 19.12.2019 18:08, Nir Soffer wrote:
>>> On Thu, Dec 19, 2019 at 5:00 PM Vladimir Sementsov-Ogievskiy
>>>  wrote:

 19.12.2019 17:42, Nir Soffer wrote:
> On Thu, Dec 19, 2019 at 4:34 PM Vladimir Sementsov-Ogievskiy
>  wrote:
>>
>> "NAME" here may be interpreted like it should match @name, which is
>> export name. But it was never mentioned in such way. Make it obvious,
>> that actual "" (see docs/interop/nbd.txt)
>> will match @bitmap parameter.
>
> But this is wrong, dirty-bitmap-export-name does not mean the actual bitmap
> name but the name exposed to the NBD client, which can be anything.

 Yes. What is wrong? It can be anything. Currently, by default, it is the
 bitmap name. It is poorly documented (okay, even confusingly documented),
 but it has been this way since 4.0. And existing users obviously know how
 it works (otherwise, they couldn't use the feature).

 So, I think it's OK to fix the spec to directly describe the
 implementation, which has been this way since the feature was introduced.

>
>> Fixes: 5fcbeb06812685a2
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>
>> Hi all.
>>
>> This patch follows discussion on Nir's patch
>> [PATCH] block: nbd: Fix dirty bitmap context name
>> ( https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg04309.html )
>>
>> Let's just fix qapi spec now.
>
> But the QAPI doc describes a better behavior for users. We should fix the
> code instead to match the docs.

 1. Using the disk name as a bitmap name is bad behavior, as they are
 completely different concepts. Especially keeping in mind that the user
 already knows the disk name anyway, there is no reason to write this export
 name inside the metadata context of this export.
>>>
>>> The different concept is expressed by the "qemu:dirty-bitmap:" prefix.
>>> "qemu:dirty-bitmap:export-name" means the dirty bitmap for this export.
>>
>> Why do you think so? Did you read the NBD specification?
> 
> Yes - the name of the bitmap does not have any meaning.

It does not have any meaning in your scenario. But in some other scenario,
when we need to export several bitmaps, the name will be used.

> But for nbd_server_add we allow only single bitmap for export.

Yes, QEMU just doesn't use the full power of the NBD spec. But that may change.

> 
>> Metadata context is always owned by some export.
> 
> Of course.
> 
>> Do you mean that there will be metadata contexts
>>
>> qemu:dirty-bitmap:export-A
>> qemu:dirty-bitmap:export-B
>>
>> both defined for export-A?
> 
> It does not make sense, but it is valid.

It conflicts with the fact that both contexts are owned by export-A. The idea
was that metadata is bound to the export. We should not see metadata of export
B in metadata of export A.

Moreover, consider that these two exports may have different sizes. Then how
would NBD_CMD_BLOCK_STATUS work?

We don't have global NBD server metadata, as you seem to assume. We only
have per-export metadata, which is bound (at least by size) to the export.
So it's theoretically possible to export the bitmap of another NBD export
(if it has the same size), but it breaks the concept.

> 
 2. It's not directly documented. You assume that NAME == @name. I
 understand that it may be assumed. But it's not documented.
>>>
>>> But NAME is likely to be understood as the name argument, and unlikely
>>> to be the bitmap name.
>>
>> Yes likely. But it's still bad specification, which should be fixed.
> 
> If we cannot change the current behavior since it will break current users,
> I agree fixing the spec to describe the current behavior is a good idea.
> 
> 
> 
> 
>>>
 3. It never worked the way you describe. So if we change the behavior,
 we'll break existing users.
>>>
>>> Do we have existing users? Isn't this a new feature in 4.2?
>>
>> No, it's since 4.0
>>
>>>
>>> Before, we had experimental x-block-dirty-bitmap APIs, which are not
>>> stable, so users could not depend on them.
>>>
> With this we still have the issue of leaking the internal bitmap name to
> users who do not control the name, and do not care about it.
>
>> qapi/block.json | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/qapi/block.json b/qapi/block.json
>> index 145c268bb6..8042ef78f0 100644
>> --- a/qapi/block.json
>> +++ b/qapi/block.json
>> @@ -255,7 +255,8 @@
>>
>> # @bitmap: Also export the dirty bitmap reachable from @device, so the
>> #  NBD client can use NBD_OPT_SET_META_CONTEXT with
>> #  NBD client can use NBD_OPT_SET_META_CONTEXT with
>> -#  "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0)
>> +#  "qemu:dirty-bitmap:BITMAP" to inspect the bitmap (BITMAP here
>> +#  matches @bitmap 

Re: [PATCH] backup-top: Begin drain earlier

2019-12-20 Thread Vladimir Sementsov-Ogievskiy
19.12.2019 21:26, Max Reitz wrote:
> When dropping backup-top, we need to drain the node before freeing the
> BlockCopyState.  Otherwise, requests may still be in flight and then the
> assertion in shres_destroy() will fail.
> 
> (This becomes visible in intermittent failures of iotest 056.)
> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Max Reitz 

Good catch

Reviewed-by: Vladimir Sementsov-Ogievskiy 

> ---
>   block/backup-top.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/backup-top.c b/block/backup-top.c
> index 7cdb1f8eba..818d3f26b4 100644
> --- a/block/backup-top.c
> +++ b/block/backup-top.c
> @@ -257,12 +257,12 @@ void bdrv_backup_top_drop(BlockDriverState *bs)
>      BDRVBackupTopState *s = bs->opaque;
>      AioContext *aio_context = bdrv_get_aio_context(bs);
>  
> -    block_copy_state_free(s->bcs);
> -
>      aio_context_acquire(aio_context);
>  
>      bdrv_drained_begin(bs);
>  
> +    block_copy_state_free(s->bcs);
> +
>      s->active = false;
>      bdrv_child_refresh_perms(bs, bs->backing, &error_abort);
>      bdrv_replace_node(bs, backing_bs(bs), &error_abort);
> 


-- 
Best regards,
Vladimir