Re: [PATCH] qapi/block: fix nbd-server-add spec
On 12/19/19 10:14 AM, Nir Soffer wrote:
>>> 1. Using disk name as a bitmap name is a bad behavior, as they are
>>> completely different concepts. Especially keeping in mind that the user
>>> already knows the disk name anyway and there is no reason to write this
>>> export name inside the metadata context of this export.
>> The different concept is expressed by the "qemu:dirty-bitmap:" prefix.
>> "qemu:dirty-bitmap:export-name" means the dirty bitmap for this export.
>>> Why do you think so? Did you read the NBD specification?
>> Yes - the name of the bitmap does not have any meaning. But for
>> nbd_server_add we allow only a single bitmap per export.

Just because qemu is currently limited to only exposing one bitmap at the
moment does not mean that a future version can't expose multiple bitmaps.
It may very well be that we have reason to expose both
"qemu:dirty-bitmap:timeA" and "qemu:dirty-bitmap:timeB" on the same
export, for exposing two bitmaps at once. To get to that point, we'd have
to refactor the QAPI command to allow attaching more than one bitmap at
the time of creating the NBD export, but it's not impossible.

>>> Metadata context is always owned by some export.
>> Of course. Do you mean that there will be metadata contexts
>> qemu:dirty-bitmap:export-A and qemu:dirty-bitmap:export-B both defined
>> for export-A? It does not make sense, but it is valid.

If an image has multiple bitmaps, exposing all of those as separate
contexts at the same time for a single export can indeed make sense.

>>> 2. It's not directly documented. You assume that NAME == @name. I
>>> understand that it may be assumed. But it's not documented.
>> But NAME is likely to be understood as the name argument, and unlikely
>> to be the bitmap name.
>>> Yes, likely. But it's still a bad specification, which should be fixed.
>> If we cannot change the current behavior since it will break current
>> users, I agree fixing the spec to describe the current behavior is a
>> good idea.

We need the doc fix.
Whether we also want an additional fix adding an optional parameter
allowing user control over the export name is also under debate (the fact
that the old x-nbd-server-add-bitmap supported it shows that it may be
useful, but it is not minimal, and as I pointed out at the time of
removing x-, libvirt can always control what name is exposed by creating
a temporary bitmap and merging from other bitmaps into the temporary).
We also have to think about a future of parallel backup jobs: libvirt can
create a single temporary bitmap to expose whatever name it wants under
one job, but if libvirt wants to expose the SAME user-visible name to two
parallel jobs, it cannot create two bitmaps with the same name, so having
a way to set the user-visible name of an arbitrary bitmap when producing
the NBD export makes sense on that front.

>>> 3. It's never worked like you write. So if we change the behavior,
>>> we'll break existing users.
>> Do we have existing users? Isn't this a new feature in 4.2?
>>> No, it's been there since 4.0.

As long as altering the exported name is controlled by a new optional
parameter, it does not hurt older 4.0 clients that do not use the new
parameter.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
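The parallel-jobs point can be made concrete with a small sketch. This is a toy Python model, not QEMU code; all names here (the `Image`/`Export` classes, the bitmap names, the "sda" export name) are hypothetical illustrations of the constraint being discussed:

```python
# Toy model: bitmap names must be unique within an image, but two
# parallel NBD exports could still want to present the SAME
# user-visible meta-context name to their respective clients.

class Image:
    def __init__(self):
        self.bitmaps = {}  # bitmap name -> set of dirty clusters

    def add_bitmap(self, name):
        if name in self.bitmaps:  # duplicate bitmap names are rejected
            raise ValueError(f"bitmap {name!r} already exists")
        self.bitmaps[name] = set()


class Export:
    """One NBD export mapping a user-visible context name to one bitmap."""

    def __init__(self, image, bitmap, visible_name=None):
        # Current qemu behavior: the exposed name is the bitmap name.
        # The optional parameter under discussion would override it.
        self.context = "qemu:dirty-bitmap:" + (visible_name or bitmap)
        self.bitmap = image.bitmaps[bitmap]


img = Image()
img.add_bitmap("job1-tmp")
img.add_bitmap("job2-tmp")  # two parallel jobs need two distinct bitmaps

# With a user-controlled export name, both jobs can still present "sda":
e1 = Export(img, "job1-tmp", visible_name="sda")
e2 = Export(img, "job2-tmp", visible_name="sda")
assert e1.context == e2.context == "qemu:dirty-bitmap:sda"
```

Without the override, the two jobs would necessarily expose two different context names, leaking the internal bitmap naming to the client.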
Re: [PATCH] qapi/block: fix nbd-server-add spec
On 12/19/19 9:08 AM, Nir Soffer wrote:
>>> Let's just fix the qapi spec now.
>> But qapi documents a better behavior for users. We should fix the code
>> instead to match the docs.
>>> 1. Using disk name as a bitmap name is a bad behavior, as they are
>>> completely different concepts. Especially keeping in mind that the user
>>> already knows the disk name anyway and there is no reason to write this
>>> export name inside the metadata context of this export.
>> The different concept is expressed by the "qemu:dirty-bitmap:" prefix.
>> "qemu:dirty-bitmap:export-name" means the dirty bitmap for this export.
>>> 2. It's not directly documented. You assume that NAME == @name. I
>>> understand that it may be assumed. But it's not documented.
>> But NAME is likely to be understood as the name argument, and unlikely
>> to be the bitmap name.

That's a misunderstanding due to poor docs. The bitmap name has always
been what was exposed, ever since we promoted things to stable by getting
rid of x-.

>>> 3. It's never worked like you write. So if we change the behavior,
>>> we'll break existing users.
>> Do we have existing users? Isn't this a new feature in 4.2?

No, the feature stems back to 4.0, when we got rid of x-. There are other
reasons that dirty bitmaps aren't really usable for incremental backups
without qemu 4.2, but qemu 4.0 was the first time we exposed a stable
interface for a bitmap over an NBD export, and that release used the
bitmap name (and not the export name), so at this point, a code change
would break expectations of any 4.0 client using bitmaps for other
reasons. Libvirt currently has absolute control over the bitmap name (my
initial code in libvirt created a temporary bitmap with my desired name,
then merged the contents from the permanent bitmaps corresponding to the
actual libvirt Checkpoint objects into the temporary, so that it could
then call nbd-export-add with the temporary bitmap name). But, as you
point out...

>> Before we had the experimental x-block-dirty-bitmap APIs, which are not
>> stable, so users could not depend on them.
The unstable x-block-dirty-bitmap APIs _did_ have a way to export a
user-controlled name SEPARATE from the bitmap name. At the time I was
removing the x- prefix, I asked if anyone had a strong use case for
keeping that functionality. No one spoke up in favor of keeping it
(Nikolay mentioned using the old interface, but not being stumped by its
removal), so we nuked it at the time. We can always add it back (now that
it sounds like you have a use case where it may be more compelling), but
it was easier to stabilize less and add more later as needed, than to
stabilize too much and regret that we had to support the flexibility that
no one needed.

https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg02373.html
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg01970.html
Re: qemu implementation details leaks to NBD client
On 12/19/19 5:36 AM, Nir Soffer wrote:
> When connecting to the qemu NBD server during incremental backup, the
> client needs to register this meta context:
>
> "qemu:dirty-bitmap:backup-sda"
>
> to get dirty bitmap chunks in NBD_CMD_BLOCK_STATUS for export "sda".
> This comes from libvirt code creating a backup node named "backup-sda"
> for drive "sda", and creating a temporary dirty bitmap with the same
> name, which is reasonable.

The name shown here is the bitmap name; libvirt can create any temporary
bitmap name if that is easier to use. Also, while connecting your NBD
client, you can use NBD_OPT_LIST_META_CONTEXT on the query
"qemu:dirty-bitmap:" to learn which dirty-bitmap name(s) are exported
(for now, qemu exports at most one, but in the future, it could export
more than one).

> All clients using this API need to have code like:
>
> # Libvirt uses this name for the dirty bitmap.
> dirty_bitmap = "backup-" + export_name
>
> duplicating the magic libvirt code:
>
> if (incremental) {
>     dd->incrementalBitmap = g_strdup_printf("backup-%s",
>                                             dd->domdisk->dst);
>
> We have only one client now[1] so this is not a huge issue, but it is
> also easy to fix. I think what we would like to have instead is:
>
> "qemu:dirty-bitmap:sda"

The libvirt API for backups is not frozen until the 6.0 release, so we
could have libvirt just blindly export a bitmap by this name. The problem
is that this name is not unique - for now, while we only support one
backup job at a time, it doesn't matter what name is in use. But in the
future, when we have multiple backup jobs, we have to start worrying
about whether parallel jobs share a single NBD server (in which case the
export names matter, but for a given export, the bitmap names available
for that export are not constrained by other exports) or multiple NBD
servers (one per backup job).
But we ALSO have to worry about presenting a sane bitmap name over any
given export regardless of what bitmap is being exported: if we have two
parallel jobs from different points in time, we REQUIRE that two
different bitmaps be in use between the two jobs, even if we WANT the NBD
client to see the name 'qemu:dirty-bitmap:sda' for both jobs. So our
current default of naming the exported context after the bitmap name is
not future-friendly.

> So clients connecting to the NBD server with exportname=sda would find
> the dirty bitmap for this export at the expected name, without the need
> to depend on the internal node name.
>
> It looks like a bug in qemu, since:
>
> # @name: Export name. If unspecified, the @device parameter is used as the
> #        export name. (Since 2.12)
>
> If the export name is "sda"...
>
> # @writable: Whether clients should be able to write to the device via the
> #            NBD connection (default false).
> # @bitmap: Also export the dirty bitmap reachable from @device, so the
> #          NBD client can use NBD_OPT_SET_META_CONTEXT with
> #          "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0)
>
> this API expects that the user can access it at:
>
> "qemu:dirty-bitmap:sda"

As mentioned in the other thread, this is a doc bug, stemming from when
we removed the x- prefix. I'll reply more there.
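The discovery approach suggested above can be sketched briefly. This is an illustrative Python fragment, not a real NBD client: the server's reply to NBD_OPT_LIST_META_CONTEXT is mocked as a plain list, where a real client would issue the option over the connection (for example via a library such as libnbd):

```python
# Instead of hardcoding libvirt's "backup-" prefix, ask the server
# which contexts match the "qemu:dirty-bitmap:" query and use whatever
# comes back.  The reply list below is a mocked stand-in for the
# server's NBD_OPT_LIST_META_CONTEXT response.

PREFIX = "qemu:dirty-bitmap:"

def find_dirty_bitmap_contexts(server_contexts):
    """Return the dirty-bitmap contexts a server advertises.

    Current qemu advertises at most one per export, but the protocol
    allows several, so return them all.
    """
    return [c for c in server_contexts if c.startswith(PREFIX)]

# Mocked reply for the "sda" export:
reply = ["base:allocation", "qemu:dirty-bitmap:backup-sda"]

contexts = find_dirty_bitmap_contexts(reply)
assert contexts == ["qemu:dirty-bitmap:backup-sda"]
# The bitmap name follows the prefix; no guessing of "backup-" needed:
assert contexts[0][len(PREFIX):] == "backup-sda"
```

A client written this way keeps working no matter what temporary bitmap name libvirt chooses.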
Re: [PATCH 2/2] iotests: Fix IMGOPTSSYNTAX for nbd
On 12/18/19 4:48 AM, Max Reitz wrote:
> There is no $SOCKDIR, only $SOCK_DIR.
>
> Fixes: f3923a72f199b2c63747a7032db74730546f55c6
> Signed-off-by: Max Reitz
> ---
>  tests/qemu-iotests/common.rc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Whoops. Thanks for the fix.

Reviewed-by: Eric Blake
Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE
On 12/19/19 8:38 AM, Max Reitz wrote:
> fuse-export-add allows mounting block graph nodes via FUSE on some
> existing regular file. That file should then appear like a raw disk
> image, and accesses to it result in accesses to the exported BDS.
>
> Right now, we only set up the mount point and tear all mount points down
> in bdrv_close_all(). We do not implement any access functions, so
> accessing the mount point only results in errors. This will be
> addressed by a followup patch.
>
> The set of exported nodes is kept in a hash table so we can later add a
> fuse-export-remove that allows unmounting.

Before I review this, a quick question: How does this compare to the
recently added nbdfuse?
https://www.redhat.com/archives/libguestfs/2019-October/msg00080.html

Or put another way, maybe we get the same effect by combining qemu-nbd
with nbdfuse, but this new utility would cut out a middleman for more
efficiency, right?

> +++ b/block/fuse.c
> @@ -0,0 +1,260 @@
> +/*
> + * Present a block device as a raw image through FUSE
> + *
> + * Copyright (c) 2019 Max Reitz
Re: [PATCH 00/18] block: Allow exporting BDSs via FUSE
On 12/20/19 6:50 AM, Kevin Wolf wrote:
> Am 20.12.2019 um 11:30 hat Max Reitz geschrieben:
>> I placed it into block/ because that just seemed like the least bad
>> place to me (apart from creating a new top-level directory like nbd
>> has) – and also because we already have quite a few non-driver files in
>> block/ (io.c, the jobs (where some got drivers only rather recently),
>> accounting.c, ...).
>
> We could consider block/exports/ and eventually also move the NBD server
> there.

We already had another thread considering the motion of qemu-nbd.c to
tools/, and I don't mind moving top-level nbd/ into block/exports/ if
that makes things easier to reason about.
Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE
On 12/20/19 7:25 AM, Markus Armbruster wrote:
>> I suppose moving a field between a union base and all variants does
>> still result in different introspection even though the accepted inputs
>> are the same.
>
> Correct. A common member (whether it's local or from the base) is in
> SchemaInfoObject.members[]. Moving it to all the variants moves it to
> the variant types' .members[].
>
>> Is this kind of movement still allowed unconditionally or should we be
>> more careful with something like this?
>
> QMP's backward compatibility promise does not include "introspection
> value won't change". Still, such changes can conceivably confuse
> clients. Care is advisable. But it's not a hard "no".

And libvirt already correctly handles movements like this (so there are
existing clients aware of the potential confusion).
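The point about introspection changing while accepted inputs stay the same can be illustrated with a small sketch. This is not QEMU's actual introspection code, and the schemas below are invented for the example; it only models the "flatten base members together with variant members" rule a careful client should follow:

```python
# Moving a member from a union's base type into every variant changes
# what raw introspection reports per type, but not which inputs are
# accepted, because a client should consider base + variant members
# together when deciding what it may send.

def accepted_members(schema, variant):
    """Members accepted for a given variant: base ones plus variant ones."""
    return set(schema["base"]) | set(schema["variants"][variant])

# "writable" lives in the base:
before = {
    "base": {"type", "node-name", "writable"},
    "variants": {"nbd": {"name"}, "fuse": {"mountpoint"}},
}
# "writable" moved into each variant:
after = {
    "base": {"type", "node-name"},
    "variants": {"nbd": {"name", "writable"},
                 "fuse": {"mountpoint", "writable"}},
}

for v in ("nbd", "fuse"):
    assert accepted_members(before, v) == accepted_members(after, v)

# But the raw per-type member lists did change:
assert before["base"] != after["base"]
```

A client that compares raw member lists type-by-type would see a difference; one that flattens as above would not.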
Re: Can we retire Python 2 now?
Markus Armbruster wrote:
> Python 2 EOL is only a few days away[*]. We made configure bitch about
> it in commit e5abf59eae "Deprecate Python 2 support", 2019-07-01. Any
> objections to retiring it now, i.e. in 5.0?
>
> Cc'ing everyone who appears to be maintaining something that looks like
> a Python script.
>
> [*] https://pythonclock.org/

I am pretty sure that I am not a python maintainer at all. But anyway,
python3 is only at python3.7, and python3.0 debuted in 2008, so ...

Acked-by: Juan Quintela
Reviewed-by: Juan Quintela

And anything else that you can think of that endorses the change.

Later, Juan.
Re: Can we retire Python 2 now?
On Fri, Dec 20, 2019 at 05:29:30PM +0100, Markus Armbruster wrote:
> Python 2 EOL is only a few days away[*]. We made configure bitch about
> it in commit e5abf59eae "Deprecate Python 2 support", 2019-07-01. Any
> objections to retiring it now, i.e. in 5.0?

Thanks for the reminder! I'll be honest: even if somebody on this list
objects to dropping Python 2 support, I'm not willing to be the
maintainer of a Python 2 codebase in 2020. The only reason for not doing
it in 4.1 was the tests/vm/netbsd breakage we took very long to debug
and fix.

I have just submitted this pull request:

Subject: [PULL 0/2] Require Python >= 3.5 to build QEMU
https://lore.kernel.org/qemu-devel/20191220165141.2207058-1-ehabk...@redhat.com/

> Cc'ing everyone who appears to be maintaining something that looks like
> a Python script.
>
> [*] https://pythonclock.org/

-- 
Eduardo
Can we retire Python 2 now?
Python 2 EOL is only a few days away[*]. We made configure bitch about it in commit e5abf59eae "Deprecate Python 2 support", 2019-07-01. Any objections to retiring it now, i.e. in 5.0? Cc'ing everyone who appears to be maintaining something that looks like a Python script. [*] https://pythonclock.org/
Re: [PATCH v12 0/3] qcow2: advanced compression options
On 02.12.19 13:15, Andrey Shinkevich wrote:
> The compression filter driver is introduced as suggested by Max.
> A sample usage of the filter can be found in the test #214.
> Now, multiple clusters can be written compressed.
> It is useful for the backup job.
>
> v12:
>   01: Missed to change the driver interface .bdrv_co_block_status
>       from _status_from_backing to _status_from_file (noticed by
>       Vladimir).
>
> Andrey Shinkevich (3):
>   block: introduce compress filter driver
>   qcow2: Allow writing compressed data of multiple clusters
>   tests/qemu-iotests: add case to write compressed data of multiple
>     clusters
>
>  block/Makefile.objs        |   1 +
>  block/filter-compress.c    | 168 +
>  block/qcow2.c              | 102 +++
>  qapi/block-core.json       |  10 +--
>  tests/qemu-iotests/214     |  43 
>  tests/qemu-iotests/214.out |  14 
>  6 files changed, 307 insertions(+), 31 deletions(-)
>  create mode 100644 block/filter-compress.c

Thanks, fixed patch 1 and applied to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Max
Re: [PATCH v12 1/3] block: introduce compress filter driver
On 20/12/2019 17:52, Max Reitz wrote:
> On 02.12.19 13:15, Andrey Shinkevich wrote:
>> Allow writing all the data compressed through the filter driver.
>> The written data will be aligned by the cluster size.
>> Based on the current QEMU implementation, that data can be written to
>> unallocated clusters only. May be used for a backup job.
>>
>> Suggested-by: Max Reitz
>> Signed-off-by: Andrey Shinkevich
>> Reviewed-by: Vladimir Sementsov-Ogievskiy
>> ---
>>  block/Makefile.objs     |   1 +
>>  block/filter-compress.c | 168 
>>  qapi/block-core.json    |  10 +--
>>  3 files changed, 175 insertions(+), 4 deletions(-)
>>  create mode 100644 block/filter-compress.c
>
> [...]
>
>> diff --git a/block/filter-compress.c b/block/filter-compress.c
>> new file mode 100644
>> index 000..4d756ea
>> --- /dev/null
>> +++ b/block/filter-compress.c
>> @@ -0,0 +1,168 @@
>
> [...]
>
>> +static int compress_open(BlockDriverState *bs, QDict *options, int flags,
>> +                         Error **errp)
>> +{
>> +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, false,
>> +                               errp);
>> +    if (!bs->file) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (!bs->file->bs->drv || !block_driver_can_compress(bs->file->bs->drv)) {
>> +        error_setg(errp,
>> +                   "Compression is not supported for underlying format: %s",
>> +                   bdrv_get_format_name(bs->file->bs));
>
> bdrv_get_format_name() returns NULL if bs->file->bs->drv is NULL. I’m
> sure g_strdup_vprintf() handles %s with a NULL string gracefully in
> practice, but I can’t find that specified anywhere. So even though I’m
> well aware I’m being a bit stupid about a minor edge case, I’m hesitant
> to accept this patch as-is.
>
> Obviously the solution can be as simple as bdrv_get_format_name(...) ?:
> "(no format)".
>
> Well, actually, I can be a bit less stupid about it and just propose
> merging that change in myself. Would that be OK for you?

Yes, please. Thank you, Max.

Andrey

> (The rest looks good to me.)
>
> Max

-- 
With the best regards,
Andrey Shinkevich
Re: [PATCH v12 1/3] block: introduce compress filter driver
On 02.12.19 13:15, Andrey Shinkevich wrote:
> Allow writing all the data compressed through the filter driver.
> The written data will be aligned by the cluster size.
> Based on the current QEMU implementation, that data can be written to
> unallocated clusters only. May be used for a backup job.
>
> Suggested-by: Max Reitz
> Signed-off-by: Andrey Shinkevich
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> ---
>  block/Makefile.objs     |   1 +
>  block/filter-compress.c | 168 
>  qapi/block-core.json    |  10 +--
>  3 files changed, 175 insertions(+), 4 deletions(-)
>  create mode 100644 block/filter-compress.c

[...]

> diff --git a/block/filter-compress.c b/block/filter-compress.c
> new file mode 100644
> index 000..4d756ea
> --- /dev/null
> +++ b/block/filter-compress.c
> @@ -0,0 +1,168 @@

[...]

> +static int compress_open(BlockDriverState *bs, QDict *options, int flags,
> +                         Error **errp)
> +{
> +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, false,
> +                               errp);
> +    if (!bs->file) {
> +        return -EINVAL;
> +    }
> +
> +    if (!bs->file->bs->drv || !block_driver_can_compress(bs->file->bs->drv)) {
> +        error_setg(errp,
> +                   "Compression is not supported for underlying format: %s",
> +                   bdrv_get_format_name(bs->file->bs));

bdrv_get_format_name() returns NULL if bs->file->bs->drv is NULL. I’m
sure g_strdup_vprintf() handles %s with a NULL string gracefully in
practice, but I can’t find that specified anywhere. So even though I’m
well aware I’m being a bit stupid about a minor edge case, I’m hesitant
to accept this patch as-is.

Obviously the solution can be as simple as bdrv_get_format_name(...) ?:
"(no format)".

Well, actually, I can be a bit less stupid about it and just propose
merging that change in myself. Would that be OK for you?

(The rest looks good to me.)

Max
Re: [PATCH v2 00/14] chardev: Use QEMUChrEvent enum in IOEventHandler typedef
On Wed, Dec 18, 2019 at 9:20 PM Philippe Mathieu-Daudé wrote:
>
> Hi,
>
> After this chat on #qemu IRC:
> 13:20 so what is the difference between a IOReadHandler and
>       IOEventHandler?
> 13:25 stsquad: one is in-band and the other out-of-band?
> 13:26 f4bug: ahh yes it seems so - connect/disconnect etc...
> 13:27 see QEMUChrEvent for IOEventHandler
>
> I thought it might be a good opportunity to make the IOEventHandler
> typedef meaning more obvious, by using the QEMUChrEvent enum.
>
> To be able to build I had to explicit all enums ignored in the
> switch(event) statement by these frontends.
>
> Then I used a coccinelle spatch to change the various IOEventHandler.
> I don't think the last patch can be split, but suggestions are welcome!
>
> Regards,
>
> Phil.
>
> v2:
> - do blindly ignore all events using a 'default' case.
>
> Philippe Mathieu-Daudé (14):
>   hw/ipmi: Remove unnecessary declarations
>   hw/ipmi: Explicit we ignore some QEMUChrEvent in IOEventHandler
>   hw/char/terminal3270: Explicit ignored QEMUChrEvent in IOEventHandler
>   hw/usb/dev-serial: Explicit we ignore few QEMUChrEvent in IOEventHandler
>   hw/usb/redirect: Explicit we ignore few QEMUChrEvent in IOEventHandler
>   ccid-card-passthru: Explicit we ignore QEMUChrEvent in IOEventHandler
>   vhost-user-crypto: Explicit we ignore some QEMUChrEvent in IOEventHandler
>   vhost-user-net: Explicit we ignore few QEMUChrEvent in IOEventHandler
>   vhost-user-blk: Explicit we ignore few QEMUChrEvent in IOEventHandler
>   virtio-console: Explicit we ignore some QEMUChrEvent in IOEventHandler
>   monitor/qmp: Explicit we ignore few QEMUChrEvent in IOEventHandler
>   monitor/hmp: Explicit we ignore a QEMUChrEvent in IOEventHandler
>   chardev/char: Explicit we ignore some QEMUChrEvent in IOEventHandler
>   chardev: Use QEMUChrEvent enum in IOEventHandler typedef

Reviewed-by: Marc-André Lureau

(I guess Paolo will take the series for next PR?)
>  include/chardev/char-fe.h       | 2 +-
>  include/chardev/char-mux.h      | 2 +-
>  include/chardev/char.h          | 4 ++--
>  backends/cryptodev-vhost-user.c | 7 ++-
>  chardev/char-mux.c              | 8 
>  chardev/char.c                  | 9 +++--
>  gdbstub.c                       | 2 +-
>  hw/arm/pxa2xx.c                 | 2 +-
>  hw/arm/strongarm.c              | 2 +-
>  hw/block/vhost-user-blk.c       | 7 ++-
>  hw/char/cadence_uart.c          | 2 +-
>  hw/char/digic-uart.c            | 2 +-
>  hw/char/escc.c                  | 2 +-
>  hw/char/etraxfs_ser.c           | 2 +-
>  hw/char/exynos4210_uart.c       | 2 +-
>  hw/char/grlib_apbuart.c         | 2 +-
>  hw/char/imx_serial.c            | 2 +-
>  hw/char/ipoctal232.c            | 2 +-
>  hw/char/lm32_juart.c            | 2 +-
>  hw/char/lm32_uart.c             | 2 +-
>  hw/char/mcf_uart.c              | 2 +-
>  hw/char/milkymist-uart.c        | 2 +-
>  hw/char/nrf51_uart.c            | 2 +-
>  hw/char/pl011.c                 | 2 +-
>  hw/char/serial.c                | 2 +-
>  hw/char/sh_serial.c             | 2 +-
>  hw/char/terminal3270.c          | 7 ++-
>  hw/char/virtio-console.c        | 7 ++-
>  hw/char/xilinx_uartlite.c       | 2 +-
>  hw/ipmi/ipmi_bmc_extern.c       | 12 +++-
>  hw/mips/boston.c                | 2 +-
>  hw/mips/mips_malta.c            | 2 +-
>  hw/riscv/riscv_htif.c           | 2 +-
>  hw/riscv/sifive_uart.c          | 2 +-
>  hw/usb/ccid-card-passthru.c     | 7 ++-
>  hw/usb/dev-serial.c             | 6 +-
>  hw/usb/redirect.c               | 7 ++-
>  monitor/hmp.c                   | 6 +-
>  monitor/qmp.c                   | 7 ++-
>  net/filter-mirror.c             | 2 +-
>  net/vhost-user.c                | 9 +++--
>  qtest.c                         | 2 +-
>  tests/test-char.c               | 6 +++---
>  tests/vhost-user-test.c         | 2 +-
>  44 files changed, 111 insertions(+), 56 deletions(-)
>
> Cc: "Gonglei (Arei)"
> Cc: "Marc-André Lureau"
> Cc: Paolo Bonzini
> Cc: "Alex Bennée"
> Cc: "Philippe Mathieu-Daudé"
> Cc: Andrzej Zaborowski
> Cc: Peter Maydell
> Cc: "Michael S. Tsirkin"
> Cc: Kevin Wolf
> Cc: Max Reitz
> Cc: "Edgar E. Iglesias"
> Cc: Alistair Francis
> Cc: Antony Pavlov
> Cc: Igor Mitsyanko
> Cc: Fabien Chouteau
> Cc: KONRAD Frederic
> Cc: Peter Chubb
> Cc: Alberto Garcia
> Cc: Michael Walle
> Cc: Thomas Huth
> Cc: Joel Stanley
> Cc: Cornelia Huck
> Cc: Halil Pasic
> Cc: Christian Borntraeger
> Cc: Laurent Vivier
> Cc: Amit Shah
> Cc: Corey Minyard
> Cc: Paul Burton
> Cc: Aleksandar Rikalo
> Cc: Aurelien Jarno
> Cc: Aleksandar Markovic
> Cc: Palmer Dabbelt
> Cc: Sagar Karandikar
> Cc: Bastian Koppelmann
> Cc: Gerd Hoffmann
> Cc: Samuel Thibault
> Cc: "Dr. David Alan Gilbert"
> Cc: Markus Armbruster
> Cc: Zhang Chen
> Cc: Li Zhijian
> Cc: Jason Wang
> Cc: qemu-...@nongnu.org
> Cc: qemu-block@nongnu.org
> Cc: qemu-s3...@nongnu.org
> Cc: qemu-ri.
Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE
Kevin Wolf writes:

> Am 20.12.2019 um 13:48 hat Markus Armbruster geschrieben:
>> Kevin Wolf writes:
>>
>> > Am 20.12.2019 um 11:48 hat Max Reitz geschrieben:
>> >> So if we kept writable and growable in the common base, then the schema
>> >> would give no information about what exports actually support them.
>> >>
>> >> On one hand, I don’t know whether it’s important to have this
>> >> information in a static form, or whether it’s sufficient to learn at
>> >> runtime.
>> >>
>> >> On the other, I don’t know whether it’s important to have those fields
>> >> in the base or not. Would it make a difference on the wire?
>> >
>> > Not for the command itself, so I think we're free to change it later. It
>> > might make a difference for introspection, though, not sure. Markus?
>>
>> QAPI schema introspection is designed to hide the difference between
>> local members and base members. You can move members to or from a base
>> type freely without affecting introspection. Even if that creates or
>> deletes the base type.
>
> Good, that's helpful. So I can split the nbd-server-add argument type
> into a base that is reused as a union branch and the rest without
> potentially breaking anything.
>
> I suppose moving a field between a union base and all variants does
> still result in different introspection even though the accepted inputs
> are the same.

Correct. A common member (whether it's local or from the base) is in
SchemaInfoObject.members[]. Moving it to all the variants moves it to
the variant types' .members[].

> Is this kind of movement still allowed unconditionally or
> should we be more careful with something like this?

QMP's backward compatibility promise does not include "introspection
value won't change". Still, such changes can conceivably confuse
clients. Care is advisable. But it's not a hard "no".
Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE
Am 20.12.2019 um 13:49 hat Markus Armbruster geschrieben:
> Kevin Wolf writes:
>
> > Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
> >> fuse-export-add allows mounting block graph nodes via FUSE on some
> >> existing regular file. That file should then appear like a raw disk
> >> image, and accesses to it result in accesses to the exported BDS.
> >>
> >> Right now, we only set up the mount point and tear all mount points down
> >> in bdrv_close_all(). We do not implement any access functions, so
> >> accessing the mount point only results in errors. This will be
> >> addressed by a followup patch.
> >>
> >> The set of exported nodes is kept in a hash table so we can later add a
> >> fuse-export-remove that allows unmounting.
> >>
> >> Signed-off-by: Max Reitz
> >
> >> diff --git a/qapi/block.json b/qapi/block.json
> >> index 145c268bb6..03f8d1b537 100644
> >> --- a/qapi/block.json
> >> +++ b/qapi/block.json
> >> @@ -317,6 +317,29 @@
> >>  ##
> >>  { 'command': 'nbd-server-stop' }
> >>
> >> +##
> >> +# @fuse-export-add:
> >> +#
> >> +# Exports a block graph node on some (file) mountpoint as a raw image.
> >> +#
> >> +# @node-name: Node to be exported
> >> +#
> >> +# @mountpoint: Path on which to export the block device via FUSE.
> >> +#              This must point to an existing regular file.
> >> +#
> >> +# @writable: Whether clients should be able to write to the block
> >> +#            device via the FUSE export. (default: false)
> >> +#
> >> +# Since: 5.0
> >> +##
> >> +{ 'command': 'fuse-export-add',
> >> +  'data': {
> >> +      'node-name': 'str',
> >> +      'mountpoint': 'str',
> >> +      '*writable': 'bool'
> >> +  },
> >> +  'if': 'defined(CONFIG_FUSE)' }
> >
> > Can this use a BlockExport union from the start like I'm introducing in
> > the storage daemon series, together with a generic block-export-add?
> >
> > It also looks like node-name and writable should be part of the common
> > base of BlockExport. Unfortunately this would mean that I can't use the
> > same BlockExportNbd for the existing nbd-server-add command any more. I
> > guess I could somehow get a shared base type for both, though.
> >
> > Markus, any thoughts on these QAPI interfaces?
>
> Context? How far back should I read?

Basically just the hunk quoted above and the QAPI portion of the
following storage daemon patch:

[RFC PATCH 08/18] qemu-storage-daemon: Add --export option

Maybe the cover letter, too, if you need a more high-level introduction.

Kevin
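The union idea can be sketched in miniature. This is a hypothetical Python model, not the real QAPI machinery or the storage-daemon series' actual schema; the member names per variant (`name`, `bitmap`, `mountpoint`) are assumptions for illustration:

```python
# A generic block-export-add takes a discriminated union: the base
# carries members every export shares (type, node-name, optionally
# writable), each variant adds its own options.

BASE = {"type", "node-name"}          # required common members
OPTIONAL_BASE = {"writable"}          # optional common members
VARIANTS = {
    "nbd": {"name", "bitmap"},        # assumed NBD-specific options
    "fuse": {"mountpoint"},           # assumed FUSE-specific options
}

def validate_export(args):
    """Check a flat block-export-add argument dict against the union."""
    etype = args.get("type")
    if etype not in VARIANTS:
        raise ValueError("unknown export type: %r" % (etype,))
    allowed = BASE | OPTIONAL_BASE | VARIANTS[etype]
    missing = BASE - args.keys()
    if missing:
        raise ValueError("missing members: " + ", ".join(sorted(missing)))
    extra = args.keys() - allowed
    if extra:
        raise ValueError("unexpected members: " + ", ".join(sorted(extra)))
    return True

# One command, two export types:
assert validate_export({"type": "fuse", "node-name": "disk0",
                        "mountpoint": "/mnt/img", "writable": False})
assert validate_export({"type": "nbd", "node-name": "disk0",
                        "name": "sda", "bitmap": "bitmap0"})
```

Whether `writable` sits in the base or in each variant does not change which argument dicts pass this check, which is the introspection question discussed above.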
Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE
Am 20.12.2019 um 13:48 hat Markus Armbruster geschrieben:
> Kevin Wolf writes:
>
> > Am 20.12.2019 um 11:48 hat Max Reitz geschrieben:
> >> So if we kept writable and growable in the common base, then the schema
> >> would give no information about what exports actually support them.
> >>
> >> On one hand, I don’t know whether it’s important to have this
> >> information in a static form, or whether it’s sufficient to learn at
> >> runtime.
> >>
> >> On the other, I don’t know whether it’s important to have those fields
> >> in the base or not. Would it make a difference on the wire?
> >
> > Not for the command itself, so I think we're free to change it later. It
> > might make a difference for introspection, though, not sure. Markus?
>
> QAPI schema introspection is designed to hide the difference between
> local members and base members. You can move members to or from a base
> type freely without affecting introspection. Even if that creates or
> deletes the base type.

Good, that's helpful. So I can split the nbd-server-add argument type
into a base that is reused as a union branch and the rest without
potentially breaking anything.

I suppose moving a field between a union base and all variants does
still result in different introspection even though the accepted inputs
are the same. Is this kind of movement still allowed unconditionally or
should we be more careful with something like this?

Kevin
Re: [PATCH 00/18] block: Allow exporting BDSs via FUSE
Am 20.12.2019 um 11:30 hat Max Reitz geschrieben:
> I placed it into block/ because that just seemed like the least bad
> place to me (apart from creating a new top-level directory like nbd has)
> – and also because we already have quite a few non-driver files in
> block/ (io.c, the jobs (where some got drivers only rather recently),
> accounting.c, ...).

We could consider block/exports/ and eventually also move the NBD server
there.

Kevin
Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE
Kevin Wolf writes:

> Am 19.12.2019 um 15:38 hat Max Reitz geschrieben:
>> fuse-export-add allows mounting block graph nodes via FUSE on some
>> existing regular file. That file should then appear like a raw disk
>> image, and accesses to it result in accesses to the exported BDS.
>>
>> Right now, we only set up the mount point and tear all mount points down
>> in bdrv_close_all(). We do not implement any access functions, so
>> accessing the mount point only results in errors. This will be
>> addressed by a followup patch.
>>
>> The set of exported nodes is kept in a hash table so we can later add a
>> fuse-export-remove that allows unmounting.
>>
>> Signed-off-by: Max Reitz
>
>> diff --git a/qapi/block.json b/qapi/block.json
>> index 145c268bb6..03f8d1b537 100644
>> --- a/qapi/block.json
>> +++ b/qapi/block.json
>> @@ -317,6 +317,29 @@
>>  ##
>>  { 'command': 'nbd-server-stop' }
>>
>> +##
>> +# @fuse-export-add:
>> +#
>> +# Exports a block graph node on some (file) mountpoint as a raw image.
>> +#
>> +# @node-name: Node to be exported
>> +#
>> +# @mountpoint: Path on which to export the block device via FUSE.
>> +#              This must point to an existing regular file.
>> +#
>> +# @writable: Whether clients should be able to write to the block
>> +#            device via the FUSE export. (default: false)
>> +#
>> +# Since: 5.0
>> +##
>> +{ 'command': 'fuse-export-add',
>> +  'data': {
>> +      'node-name': 'str',
>> +      'mountpoint': 'str',
>> +      '*writable': 'bool'
>> +  },
>> +  'if': 'defined(CONFIG_FUSE)' }
>
> Can this use a BlockExport union from the start like I'm introducing in
> the storage daemon series, together with a generic block-export-add?
>
> It also looks like node-name and writable should be part of the common
> base of BlockExport. Unfortunately this would mean that I can't use the
> same BlockExportNbd for the existing nbd-server-add command any more. I
> guess I could somehow get a shared base type for both, though.
>
> Markus, any thoughts on these QAPI interfaces?

Context? How far back should I read?
Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE
Kevin Wolf writes: > Am 20.12.2019 um 11:48 hat Max Reitz geschrieben: >> On 20.12.19 11:26, Kevin Wolf wrote: >> > Am 19.12.2019 um 15:38 hat Max Reitz geschrieben: >> >> fuse-export-add allows mounting block graph nodes via FUSE on some >> >> existing regular file. That file should then appears like a raw disk >> >> image, and accesses to it result in accesses to the exported BDS. >> >> >> >> Right now, we only set up the mount point and tear all mount points down >> >> in bdrv_close_all(). We do not implement any access functions, so >> >> accessing the mount point only results in errors. This will be >> >> addressed by a followup patch. >> >> >> >> The set of exported nodes is kept in a hash table so we can later add a >> >> fuse-export-remove that allows unmounting. >> >> >> >> Signed-off-by: Max Reitz >> > >> >> diff --git a/qapi/block.json b/qapi/block.json >> >> index 145c268bb6..03f8d1b537 100644 >> >> --- a/qapi/block.json >> >> +++ b/qapi/block.json >> >> @@ -317,6 +317,29 @@ >> >> ## >> >> { 'command': 'nbd-server-stop' } >> >> >> >> +## >> >> +# @fuse-export-add: >> >> +# >> >> +# Exports a block graph node on some (file) mountpoint as a raw image. >> >> +# >> >> +# @node-name: Node to be exported >> >> +# >> >> +# @mountpoint: Path on which to export the block device via FUSE. >> >> +# This must point to an existing regular file. >> >> +# >> >> +# @writable: Whether clients should be able to write to the block >> >> +#device via the FUSE export. (default: false) >> >> +# >> >> +# Since: 5.0 >> >> +## >> >> +{ 'command': 'fuse-export-add', >> >> + 'data': { >> >> + 'node-name': 'str', >> >> + 'mountpoint': 'str', >> >> + '*writable': 'bool' >> >> + }, >> >> + 'if': 'defined(CONFIG_FUSE)' } >> > >> > Can this use a BlockExport union from the start like I'm introducing in >> > the storage daemon series, together with a generic block-export-add? 
>> >> Hm, you mean still adding a FuseExport structure that would be part of >> BlockExport and then dropping fuse-export-add in favor of a >> block-export-add that we want anyway? > > Yes. > >> > It also looks like node-name and writable should be part of the common >> > base of BlockExport. >> >> node-name definitely, I’m not so sure about writable. Or, to be more >> precise, I think that if we want writable to be in the base, we also >> want growable to be there: Both are primarily options for the >> BlockBackend that the exports use. >> >> But both of course also need to be supported by the export >> implementation. nbd can make its BB growable all it wants, but that >> doesn’t make it work. > > Right. Pragmatically, I think exports are very like to support writable, > but probably rather unlikely to support growable. So I do think there > would be a point for making writable part of the common base, but not > growable. > >> So if we kept writable and growable in the common base, then the schema >> would give no information about what exports actually support them. >> >> On one hand, I don’t know whether it’s important to have this >> information in a static form, or whether it’s sufficient to learn at >> runtime. >> >> On the other, I don’t know whether it’s important to have those fields >> in the base or not. Would it make a difference on the wire? > > Not for the command itself, so I think we're free to change it later. It > might make a difference for introspection, though, not sure. Markus? QAPI schema introspection is designed to hide the difference between local members and base members. You can move members to or from a base type freely without affecting introspection. Even if that creates or deletes the base type. 
Example: DriveBackup { 'struct': 'DriveBackup', 'base': 'BackupCommon', 'data': { 'target': 'str', '*format': 'str', '*mode': 'NewImageMode' } } where BackupCommon is { 'struct': 'BackupCommon', 'data': { '*job-id': 'str', 'device': 'str', 'sync': 'MirrorSyncMode', '*speed': 'int', '*bitmap': 'str', '*bitmap-mode': 'BitmapSyncMode', '*compress': 'bool', '*on-source-error': 'BlockdevOnError', '*on-target-error': 'BlockdevOnError', '*auto-finalize': 'bool', '*auto-dismiss': 'bool', '*filter-node-name': 'str' } } query-qmp-schema describes DriveBackup like this: {"name": "30", "members": [ {"name": "job-id", "default": null, "type": "str"}, {"name": "device", "type": "str"}, {"name": "sync", "type": "235"}, {"name": "speed", "default": null, "type": "int"}, {"name": "bitmap", "default": null, "type": "str"}, {"name": "bitmap-mode", "default": null, "type": "236"}, {"name": "compress", "default": null, "type": "bool"}, {"name": "on-source-error", "default": null, "type": "237"}, {"name": "on-target-error", "de
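The flattening Markus describes can be modeled in a few lines of Python. This is a toy model of the introspection output only, not the real scripts/qapi machinery; the schema fragments below are hypothetical variants of the BlockExport design under discussion:

```python
# Toy model of QAPI introspection: members of a 'base' struct and the
# struct's own members are reported as one flat list, so moving a member
# between a type and its base is invisible to clients.
# (Illustrative only; this is not the real scripts/qapi generator.)

def introspect(schema, type_name):
    entity = schema[type_name]
    members = []
    if 'base' in entity:
        members.extend(introspect(schema, entity['base']))  # base members first
    for name, typ in entity.get('data', {}).items():
        members.append({'name': name.lstrip('*'),
                        'type': typ,
                        'optional': name.startswith('*')})
    return members

# Variant A: shared members live in a common base ...
schema_a = {
    'BlockExportBase': {'data': {'node-name': 'str', '*writable': 'bool'}},
    'FuseExport': {'base': 'BlockExportBase',
                   'data': {'mountpoint': 'str'}},
}
# ... Variant B: the same members declared locally, with no base type.
schema_b = {
    'FuseExport': {'data': {'node-name': 'str', '*writable': 'bool',
                            'mountpoint': 'str'}},
}

# Both variants introspect identically:
print(introspect(schema_a, 'FuseExport') == introspect(schema_b, 'FuseExport'))
# -> True
```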
Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE
On 20.12.19 12:24, Kevin Wolf wrote: > Am 20.12.2019 um 11:48 hat Max Reitz geschrieben: >> On 20.12.19 11:26, Kevin Wolf wrote: >>> Am 19.12.2019 um 15:38 hat Max Reitz geschrieben: fuse-export-add allows mounting block graph nodes via FUSE on some existing regular file. That file should then appears like a raw disk image, and accesses to it result in accesses to the exported BDS. Right now, we only set up the mount point and tear all mount points down in bdrv_close_all(). We do not implement any access functions, so accessing the mount point only results in errors. This will be addressed by a followup patch. The set of exported nodes is kept in a hash table so we can later add a fuse-export-remove that allows unmounting. Signed-off-by: Max Reitz >>> diff --git a/qapi/block.json b/qapi/block.json index 145c268bb6..03f8d1b537 100644 --- a/qapi/block.json +++ b/qapi/block.json @@ -317,6 +317,29 @@ ## { 'command': 'nbd-server-stop' } +## +# @fuse-export-add: +# +# Exports a block graph node on some (file) mountpoint as a raw image. +# +# @node-name: Node to be exported +# +# @mountpoint: Path on which to export the block device via FUSE. +# This must point to an existing regular file. +# +# @writable: Whether clients should be able to write to the block +#device via the FUSE export. (default: false) +# +# Since: 5.0 +## +{ 'command': 'fuse-export-add', + 'data': { + 'node-name': 'str', + 'mountpoint': 'str', + '*writable': 'bool' + }, + 'if': 'defined(CONFIG_FUSE)' } >>> >>> Can this use a BlockExport union from the start like I'm introducing in >>> the storage daemon series, together with a generic block-export-add? >> >> Hm, you mean still adding a FuseExport structure that would be part of >> BlockExport and then dropping fuse-export-add in favor of a >> block-export-add that we want anyway? > > Yes. > >>> It also looks like node-name and writable should be part of the common >>> base of BlockExport. >> >> node-name definitely, I’m not so sure about writable. 
Or, to be more >> precise, I think that if we want writable to be in the base, we also >> want growable to be there: Both are primarily options for the >> BlockBackend that the exports use. >> >> But both of course also need to be supported by the export >> implementation. nbd can make its BB growable all it wants, but that >> doesn’t make it work. > > Right. Pragmatically, I think exports are very like to support writable, > but probably rather unlikely to support growable. So I do think there > would be a point for making writable part of the common base, but not > growable. True. But there’s nothing that inherently binds it to FUSE, so I think both from an implementation’s POV and from a user’s POV, it looks just as generic as “writable”. But that’s theory. I agree that in practice, it won’t be as generic. (I realize this doesn’t help much in finding out what we should do.) >> So if we kept writable and growable in the common base, then the schema >> would give no information about what exports actually support them. >> >> On one hand, I don’t know whether it’s important to have this >> information in a static form, or whether it’s sufficient to learn at >> runtime. >> >> On the other, I don’t know whether it’s important to have those fields >> in the base or not. Would it make a difference on the wire? > > Not for the command itself, so I think we're free to change it later. It > might make a difference for introspection, though, not sure. Markus? Yes, I asked because I’m wondering whether it would be more cumbersome to users if we didn’t keep it in the base structure. The duplication depends on how we want to design the command. Should the export implementations receive a ready-to-use BB? Or just a node-name? In the latter case, we wouldn’t get rid of duplicated code by having writable/growable in the base. For the former, it could, but then again, just taking the WRITE permission and making the BB growable isn’t that bad to duplicate. 
Something to consider is that of course the current NBD code wants a node-name and not a BB. So if we decided to generally give export implementations a BB, then the initial implementation of qmp_block_export_add() would look a bit freaky: It would first branch off to qmp_nbd_server_add(), and then open the BB for the “common” case, but this common case only exists for FUSE (right now). OTOH, right now we’re free to decide whether we open the BB in qmp_block_export_add() or fuse.c, and so we might as well just do it in the former. If we later find out that this was a stupid idea, we can always move it into fuse.c. Now I don’t quite know where I’m trying to get with this. I suppose it means that we should start with qmp_block_ex
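The dispatch shape being weighed here, a generic entry point that branches off to the legacy NBD path and opens the BlockBackend only for the "common" case, can be sketched as follows. All names are hypothetical stand-ins for the C code, not real QEMU functions:

```python
# Hypothetical sketch of the qmp_block_export_add() dispatch discussed
# above: NBD keeps its legacy path (it wants a node name, not a ready
# backend), while the generic code opens the backend for everyone else.

def block_export_add(export, open_backend, nbd_server_add, fuse_export_add):
    if export['type'] == 'nbd':
        # Legacy branch: reuse the existing nbd-server-add machinery.
        return nbd_server_add(export)
    # "Common" case (currently only FUSE): open the BlockBackend here,
    # honouring the shared 'writable' option.
    bb = open_backend(export['node-name'],
                      writable=export.get('writable', False))
    if export['type'] == 'fuse':
        return fuse_export_add(bb, export['mountpoint'])
    raise ValueError('unknown export type: %s' % export['type'])
```

If opening the backend in the generic code later turns out to be wrong, only the FUSE branch needs to move, which mirrors the "we can always move it into fuse.c" point above.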
Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
On 20.12.19 12:55, Vladimir Sementsov-Ogievskiy wrote: > 20.12.2019 14:39, Max Reitz wrote: >> On 13.12.19 12:18, Vladimir Sementsov-Ogievskiy wrote: >>> 09.12.2019 17:43, Max Reitz wrote: On 02.12.19 13:12, Vladimir Sementsov-Ogievskiy wrote: > 11.11.2019 19:02, Max Reitz wrote: >> While bdrv_replace_node() will not follow through with it, a specific >> @replaces asks the mirror job to create a loop. >> >> For example, say both the source and the target share a child where the >> source is a filter; by letting @replaces point to the common child, you >> ask for a loop. >> >> Or if you use @replaces in drive-mirror with sync=none and >> mode=absolute-paths, you generally ask for a loop (@replaces must point >> to a child of the source, and sync=none makes the source the backing >> file of the target after the job). >> >> bdrv_replace_node() will not create those loops, but by doing so, it >> ignores the user-requested configuration, which is not ideally either. >> (In the first example above, the target's child will remain what it was, >> which may still be reasonable. But in the second example, the target >> will just not become a child of the source, which is precisely what was >> requested with @replaces.) >> >> So prevent such configurations, both before the job, and before it >> actually completes. >> >> Signed-off-by: Max Reitz >> --- >> block.c | 30 >> block/mirror.c| 19 +++- >> blockdev.c| 48 >> ++- >> include/block/block_int.h | 3 +++ >> 4 files changed, 98 insertions(+), 2 deletions(-) >> >> diff --git a/block.c b/block.c >> index 0159f8e510..e3922a0474 100644 >> --- a/block.c >> +++ b/block.c >> @@ -6259,6 +6259,36 @@ out: >> return to_replace_bs; >> } >> >> +/* >> + * Return true iff @child is a (recursive) child of @parent, with at >> + * least @min_level edges between them. >> + * >> + * (If @min_level == 0, return true if @child == @parent. 
For >> + * @min_level == 1, @child needs to be at least a real child; for >> + * @min_level == 2, it needs to be at least a grand-child; and so on.) >> + */ >> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent, >> + int min_level) >> +{ >> +BdrvChild *c; >> + >> +if (child == parent && min_level <= 0) { >> +return true; >> +} >> + >> +if (!parent) { >> +return false; >> +} >> + >> +QLIST_FOREACH(c, &parent->children, next) { >> +if (bdrv_is_child_of(child, c->bs, min_level - 1)) { >> +return true; >> +} >> +} >> + >> +return false; >> +} >> + >> /** >> * Iterates through the list of runtime option keys that are said to >> * be "strong" for a BDS. An option is called "strong" if it changes >> diff --git a/block/mirror.c b/block/mirror.c >> index 68a4404666..b258c7e98b 100644 >> --- a/block/mirror.c >> +++ b/block/mirror.c >> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job) >> * there. >> */ >> if (bdrv_recurse_can_replace(src, to_replace)) { >> -bdrv_replace_node(to_replace, target_bs, &local_err); >> +/* >> + * It is OK for @to_replace to be an immediate child of >> + * @target_bs, because that is what happens with >> + * drive-mirror sync=none mode=absolute-paths: target_bs's >> + * backing file will be the source node, which is also >> + * to_replace (by default). >> + * bdrv_replace_node() handles this case by not letting >> + * target_bs->backing point to itself, but to the source >> + * still. >> + */ >> +if (!bdrv_is_child_of(to_replace, target_bs, 2)) { >> +bdrv_replace_node(to_replace, target_bs, &local_err); >> +} else { >> +error_setg(&local_err, "Can no longer replace '%s' by >> '%s', " >> + "because the former is now a child of the >> latter, " >> + "and doing so would thus create a loop", >> + to_replace->node_name, target_bs->node_name); >> +} > > you may swap if and else branch, dropping "!" mark.. Yes, but I just personally prefer to have the error case in the else branch. 
>> } else { >> error_setg(&local_err, "Can no longer replace '%s' by >> '%s', "
Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
20.12.2019 14:39, Max Reitz wrote: > On 13.12.19 12:18, Vladimir Sementsov-Ogievskiy wrote: >> 09.12.2019 17:43, Max Reitz wrote: >>> On 02.12.19 13:12, Vladimir Sementsov-Ogievskiy wrote: 11.11.2019 19:02, Max Reitz wrote: > While bdrv_replace_node() will not follow through with it, a specific > @replaces asks the mirror job to create a loop. > > For example, say both the source and the target share a child where the > source is a filter; by letting @replaces point to the common child, you > ask for a loop. > > Or if you use @replaces in drive-mirror with sync=none and > mode=absolute-paths, you generally ask for a loop (@replaces must point > to a child of the source, and sync=none makes the source the backing > file of the target after the job). > > bdrv_replace_node() will not create those loops, but by doing so, it > ignores the user-requested configuration, which is not ideally either. > (In the first example above, the target's child will remain what it was, > which may still be reasonable. But in the second example, the target > will just not become a child of the source, which is precisely what was > requested with @replaces.) > > So prevent such configurations, both before the job, and before it > actually completes. > > Signed-off-by: Max Reitz > --- > block.c | 30 > block/mirror.c| 19 +++- > blockdev.c| 48 ++- > include/block/block_int.h | 3 +++ > 4 files changed, 98 insertions(+), 2 deletions(-) > > diff --git a/block.c b/block.c > index 0159f8e510..e3922a0474 100644 > --- a/block.c > +++ b/block.c > @@ -6259,6 +6259,36 @@ out: > return to_replace_bs; > } > > +/* > + * Return true iff @child is a (recursive) child of @parent, with at > + * least @min_level edges between them. > + * > + * (If @min_level == 0, return true if @child == @parent. For > + * @min_level == 1, @child needs to be at least a real child; for > + * @min_level == 2, it needs to be at least a grand-child; and so on.) 
> + */ > +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent, > + int min_level) > +{ > +BdrvChild *c; > + > +if (child == parent && min_level <= 0) { > +return true; > +} > + > +if (!parent) { > +return false; > +} > + > +QLIST_FOREACH(c, &parent->children, next) { > +if (bdrv_is_child_of(child, c->bs, min_level - 1)) { > +return true; > +} > +} > + > +return false; > +} > + > /** > * Iterates through the list of runtime option keys that are said to > * be "strong" for a BDS. An option is called "strong" if it changes > diff --git a/block/mirror.c b/block/mirror.c > index 68a4404666..b258c7e98b 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job) > * there. > */ > if (bdrv_recurse_can_replace(src, to_replace)) { > -bdrv_replace_node(to_replace, target_bs, &local_err); > +/* > + * It is OK for @to_replace to be an immediate child of > + * @target_bs, because that is what happens with > + * drive-mirror sync=none mode=absolute-paths: target_bs's > + * backing file will be the source node, which is also > + * to_replace (by default). > + * bdrv_replace_node() handles this case by not letting > + * target_bs->backing point to itself, but to the source > + * still. > + */ > +if (!bdrv_is_child_of(to_replace, target_bs, 2)) { > +bdrv_replace_node(to_replace, target_bs, &local_err); > +} else { > +error_setg(&local_err, "Can no longer replace '%s' by > '%s', " > + "because the former is now a child of the > latter, " > + "and doing so would thus create a loop", > + to_replace->node_name, target_bs->node_name); > +} you may swap if and else branch, dropping "!" mark.. >>> >>> Yes, but I just personally prefer to have the error case in the else branch. >>> > } else { > error_setg(&local_err, "Can no longer replace '%s' by > '%s', " >"because it can no longer be guaranteed that > doing so " > diff --git a/blockdev.c b/blockdev.c > index 9dc2238bf3..d29f147f72 100644 >>
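For readers following the @min_level semantics, the recursion in the quoted C helper can be restated as a small Python model (the node graph is passed in explicitly as a dict; this is a sketch of the logic, not QEMU code):

```python
# Python model of the bdrv_is_child_of() helper quoted above:
# min_level == 0 also matches child == parent, min_level == 1 requires
# a real child, min_level == 2 a grand-child, and so on.  'children'
# maps a node to the list of its direct children.

def is_child_of(child, parent, min_level, children):
    if parent is None:
        return False
    if child == parent and min_level <= 0:
        return True
    return any(is_child_of(child, c, min_level - 1, children)
               for c in children.get(parent, ()))

# mirror example: target -> source -> base (backing chain as child edges)
graph = {'target': ['source'], 'source': ['base']}
print(is_child_of('source', 'target', 2, graph))  # False: only one edge away
```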
Re: [PATCH for-5.0 v2 18/23] iotests: Add VM.assert_block_path()
On 13.12.19 12:27, Vladimir Sementsov-Ogievskiy wrote: > 11.11.2019 19:02, Max Reitz wrote: >> Signed-off-by: Max Reitz >> --- >> tests/qemu-iotests/iotests.py | 59 +++ >> 1 file changed, 59 insertions(+) >> >> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py >> index d34305ce69..3e03320ce3 100644 >> --- a/tests/qemu-iotests/iotests.py >> +++ b/tests/qemu-iotests/iotests.py >> @@ -681,6 +681,65 @@ class VM(qtest.QEMUQtestMachine): >> >> return fields.items() <= ret.items() >> >> +""" >> +Check whether the node under the given path in the block graph is >> +@expected_node. >> + >> +@root is the node name of the node where the @path is rooted. >> + >> +@path is a string that consists of child names separated by >> +slashes. It must begin with a slash. >> + >> +Examples for @root + @path: >> + - root="qcow2-node", path="/backing/file" >> + - root="quorum-node", path="/children.2/file" >> + >> +Hypothetically, @path could be empty, in which case it would point >> +to @root. However, in practice this case is not useful and hence >> +not allowed. >> + >> +@expected_node may be None. >> + >> +@graph may be None or the result of an x-debug-query-block-graph >> +call that has already been performed. >> +""" >> +def assert_block_path(self, root, path, expected_node, graph=None): >> +if graph is None: >> +graph = self.qmp('x-debug-query-block-graph')['return'] >> + >> +iter_path = iter(path.split('/')) >> + >> +# Must start with a / >> +assert next(iter_path) == '' >> + >> +node = next((node for node in graph['nodes'] if node['name'] == >> root), >> +None) >> + >> +for path_node in iter_path: > > I'd rename path_node to child or edge, to not interfere with block nodes here. Sure. Or maybe child_name. Max
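To make the intent of the traversal concrete (the patch hunk is cut off above), here is a simplified standalone model of the walk: follow one named child edge per path component. The flat dict below is a hypothetical stand-in for the node/edge lists that x-debug-query-block-graph actually returns:

```python
# Simplified model of assert_block_path()'s walk: "/backing/file" means
# "follow the 'backing' edge from the root, then the 'file' edge".
# The graph dict is an illustrative simplification, not the real
# x-debug-query-block-graph output format.

def resolve_block_path(graph, root, path):
    parts = path.split('/')
    assert parts[0] == '', 'path must begin with a slash'
    node = root
    for child_name in parts[1:]:
        if node is None:
            return None          # ran off the end of the graph
        node = graph.get(node, {}).get(child_name)
    return node

graph = {
    'qcow2-node': {'backing': 'mid-node', 'file': 'top-file'},
    'mid-node': {'file': 'base-file'},
}
print(resolve_block_path(graph, 'qcow2-node', '/backing/file'))  # base-file
```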
Re: [PATCH for-5.0 v2 15/23] mirror: Prevent loops
On 13.12.19 12:18, Vladimir Sementsov-Ogievskiy wrote: > 09.12.2019 17:43, Max Reitz wrote: >> On 02.12.19 13:12, Vladimir Sementsov-Ogievskiy wrote: >>> 11.11.2019 19:02, Max Reitz wrote: While bdrv_replace_node() will not follow through with it, a specific @replaces asks the mirror job to create a loop. For example, say both the source and the target share a child where the source is a filter; by letting @replaces point to the common child, you ask for a loop. Or if you use @replaces in drive-mirror with sync=none and mode=absolute-paths, you generally ask for a loop (@replaces must point to a child of the source, and sync=none makes the source the backing file of the target after the job). bdrv_replace_node() will not create those loops, but by doing so, it ignores the user-requested configuration, which is not ideally either. (In the first example above, the target's child will remain what it was, which may still be reasonable. But in the second example, the target will just not become a child of the source, which is precisely what was requested with @replaces.) So prevent such configurations, both before the job, and before it actually completes. Signed-off-by: Max Reitz --- block.c | 30 block/mirror.c| 19 +++- blockdev.c| 48 ++- include/block/block_int.h | 3 +++ 4 files changed, 98 insertions(+), 2 deletions(-) diff --git a/block.c b/block.c index 0159f8e510..e3922a0474 100644 --- a/block.c +++ b/block.c @@ -6259,6 +6259,36 @@ out: return to_replace_bs; } +/* + * Return true iff @child is a (recursive) child of @parent, with at + * least @min_level edges between them. + * + * (If @min_level == 0, return true if @child == @parent. For + * @min_level == 1, @child needs to be at least a real child; for + * @min_level == 2, it needs to be at least a grand-child; and so on.) 
+ */ +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent, + int min_level) +{ +BdrvChild *c; + +if (child == parent && min_level <= 0) { +return true; +} + +if (!parent) { +return false; +} + +QLIST_FOREACH(c, &parent->children, next) { +if (bdrv_is_child_of(child, c->bs, min_level - 1)) { +return true; +} +} + +return false; +} + /** * Iterates through the list of runtime option keys that are said to * be "strong" for a BDS. An option is called "strong" if it changes diff --git a/block/mirror.c b/block/mirror.c index 68a4404666..b258c7e98b 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job) * there. */ if (bdrv_recurse_can_replace(src, to_replace)) { -bdrv_replace_node(to_replace, target_bs, &local_err); +/* + * It is OK for @to_replace to be an immediate child of + * @target_bs, because that is what happens with + * drive-mirror sync=none mode=absolute-paths: target_bs's + * backing file will be the source node, which is also + * to_replace (by default). + * bdrv_replace_node() handles this case by not letting + * target_bs->backing point to itself, but to the source + * still. + */ +if (!bdrv_is_child_of(to_replace, target_bs, 2)) { +bdrv_replace_node(to_replace, target_bs, &local_err); +} else { +error_setg(&local_err, "Can no longer replace '%s' by '%s', " + "because the former is now a child of the latter, " + "and doing so would thus create a loop", + to_replace->node_name, target_bs->node_name); +} >>> >>> you may swap if and else branch, dropping "!" mark.. >> >> Yes, but I just personally prefer to have the error case in the else branch. >> } else { error_setg(&local_err, "Can no longer replace '%s' by '%s', " "because it can no longer be guaranteed that doing so " diff --git a/blockdev.c b/blockdev.c index 9dc2238bf3..d29f147f72 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3824,7 +3824,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs, } >>>
Re: [PATCH] block: nbd: Fix dirty bitmap context name
20.12.2019 13:40, Peter Krempa wrote: > On Fri, Dec 20, 2019 at 10:06:17 +, Vladimir Sementsov-Ogievskiy wrote: >> 20.12.2019 12:56, Peter Krempa wrote: >>> On Fri, Dec 20, 2019 at 09:39:17 +, Vladimir Sementsov-Ogievskiy wrote: 19.12.2019 18:55, Nir Soffer wrote: > On Thu, Dec 19, 2019 at 5:17 PM Vladimir Sementsov-Ogievskiy > wrote: >> >> 19.12.2019 17:59, Nir Soffer wrote: >>> On Thu, Dec 19, 2019 at 4:04 PM Kevin Wolf wrote: Am 19.12.2019 um 14:41 hat Vladimir Sementsov-Ogievskiy geschrieben: > Ahh, I see, it's documented as > > +# @bitmap: Also export the dirty bitmap reachable from @device, so > the > +# NBD client can use NBD_OPT_SET_META_CONTEXT with > +# "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since > 4.0) > > and it is logical to assume that export name (which is @name > argument) is > mentioned. But we never mentioned it. This is just documented after > removed experimenatl command x-nbd-server-add-bitmap, > > look at > > commit 7dc570b3806e5b0a4c9219061556ed5a4a0de80c > Author: Eric Blake > Date: Fri Jan 11 13:47:18 2019 -0600 > > nbd: Remove x-nbd-server-add-bitmap > > ... > > -# @bitmap-export-name: How the bitmap will be seen by nbd clients > -# (default @bitmap) > -# > -# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of > -# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) > to access > -# the exposed bitmap. > > > So, this "NAME" is saved and now looks incorrect. What should be > fixed, is Qapi > documentation. Hm, I don't know these interfaces very well, but from you explanation it looks to me as if having a bitmap name made sense with x-nbd-server-add-bitmap because it could be called more than once for exporting multiple bitmaps. But now, we have only nbd-server-add, which takes a single bitmap name. As we don't have to distinguish multiple bitmaps any more, wouldn't it make more sense to use "qemu:dirty-bitmap" without any other information? 
Both export name and bitmap name are already identified by the connection. >>> >>> We can use empty string (like the default export name), so the bitmap >>> would be exposed as: >>> >>> "qemu:dirty-bitmap:" >>> >>> This would solve the issue for users, and keep the API extensible. >> >> As I already said, we can not. If we really wont such thing, use another >> name, >> likq qemu:default-dirty-bitmap.. >> >> Or call bitmap export "default", to produce >> "qemu:dirty-bitmaps:default" >> >> But don't change default behavior of nbd-server-add >> >>> But if we have to have something there, using the bitmap name (which may or may not be the same as used in the image file) makes a little more sense because it makes the interface extensible for the case that we ever want to re-introduce an nbd-server-add-bitmap. >>> >>> But using the bitmap name means user of the NBD server need to know >>> this name. >> >> Why not? What is your case exactly? User knows, what kind of bitmap you >> are >> exporting, so user is in some relation with exporting process anyway. Why >> shouldn't it know the name? > > Because the user configuring qemu (libvirt) is not the same user > accessing qemu NBD > server (ovirt, or some backup application). > >> This name may be defined in you exporting protocol.. What are you >> exporting? > > We export HTTP API, allowing getting dirty extents and reading data: > https://www.ovirt.org/develop/release-management/features/storage/incremental-backup.html#map-request > (this document is outdated, but it gives the general idea) > > Or provide the NBD URL directly to user (future). > >> Note also, that client may use NBD_OPT_LIST_META_CONTEXT with query >> "qemu:dirty-bitmap:", to get list of all exported bitmaps. > > This is another option, I did not try to use this yet. We can use the > single > exported bitmap and fail if we get more than one. This is probably better > than changing the entire stack to support bitmap name. 
> >>> One option is that libvirt would publish the name of the bitmap in the >>> xml describing >>> the backup, and oVirt will have to propagate this name to the actual >>> program accessing >>> the NBD server, which may be a user program in the case when we expose >>> the NBD >>> URL to users (planned for futu
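The fallback described above (use the single exported bitmap and fail if there is more than one) is easy to sketch on the client side. The context names below are made up for illustration; a real client would obtain the list via NBD_OPT_LIST_META_CONTEXT with the "qemu:dirty-bitmap:" query:

```python
# Client-side sketch of the policy discussed above: among the metadata
# contexts a server advertises, accept exactly one "qemu:dirty-bitmap:"
# context and refuse to guess otherwise.

PREFIX = 'qemu:dirty-bitmap:'

def pick_dirty_bitmap_context(contexts):
    bitmaps = [c for c in contexts if c.startswith(PREFIX)]
    if len(bitmaps) != 1:
        raise RuntimeError('expected exactly one dirty bitmap context, '
                           'got %d' % len(bitmaps))
    return bitmaps[0]

print(pick_dirty_bitmap_context(['base:allocation',
                                 'qemu:dirty-bitmap:backup-sda']))
# -> qemu:dirty-bitmap:backup-sda
```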
Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE
Am 20.12.2019 um 11:48 hat Max Reitz geschrieben: > On 20.12.19 11:26, Kevin Wolf wrote: > > Am 19.12.2019 um 15:38 hat Max Reitz geschrieben: > >> fuse-export-add allows mounting block graph nodes via FUSE on some > >> existing regular file. That file should then appears like a raw disk > >> image, and accesses to it result in accesses to the exported BDS. > >> > >> Right now, we only set up the mount point and tear all mount points down > >> in bdrv_close_all(). We do not implement any access functions, so > >> accessing the mount point only results in errors. This will be > >> addressed by a followup patch. > >> > >> The set of exported nodes is kept in a hash table so we can later add a > >> fuse-export-remove that allows unmounting. > >> > >> Signed-off-by: Max Reitz > > > >> diff --git a/qapi/block.json b/qapi/block.json > >> index 145c268bb6..03f8d1b537 100644 > >> --- a/qapi/block.json > >> +++ b/qapi/block.json > >> @@ -317,6 +317,29 @@ > >> ## > >> { 'command': 'nbd-server-stop' } > >> > >> +## > >> +# @fuse-export-add: > >> +# > >> +# Exports a block graph node on some (file) mountpoint as a raw image. > >> +# > >> +# @node-name: Node to be exported > >> +# > >> +# @mountpoint: Path on which to export the block device via FUSE. > >> +# This must point to an existing regular file. > >> +# > >> +# @writable: Whether clients should be able to write to the block > >> +#device via the FUSE export. (default: false) > >> +# > >> +# Since: 5.0 > >> +## > >> +{ 'command': 'fuse-export-add', > >> + 'data': { > >> + 'node-name': 'str', > >> + 'mountpoint': 'str', > >> + '*writable': 'bool' > >> + }, > >> + 'if': 'defined(CONFIG_FUSE)' } > > > > Can this use a BlockExport union from the start like I'm introducing in > > the storage daemon series, together with a generic block-export-add? > > Hm, you mean still adding a FuseExport structure that would be part of > BlockExport and then dropping fuse-export-add in favor of a > block-export-add that we want anyway? Yes. 
> > It also looks like node-name and writable should be part of the common > > base of BlockExport. > > node-name definitely, I’m not so sure about writable. Or, to be more > precise, I think that if we want writable to be in the base, we also > want growable to be there: Both are primarily options for the > BlockBackend that the exports use. > > But both of course also need to be supported by the export > implementation. nbd can make its BB growable all it wants, but that > doesn’t make it work. Right. Pragmatically, I think exports are very like to support writable, but probably rather unlikely to support growable. So I do think there would be a point for making writable part of the common base, but not growable. > So if we kept writable and growable in the common base, then the schema > would give no information about what exports actually support them. > > On one hand, I don’t know whether it’s important to have this > information in a static form, or whether it’s sufficient to learn at > runtime. > > On the other, I don’t know whether it’s important to have those fields > in the base or not. Would it make a difference on the wire? Not for the command itself, so I think we're free to change it later. It might make a difference for introspection, though, not sure. Markus? Having it in the base might allow us to remove some duplication in the code. Probably not much, though, so not too important. > > Unfortunately this would mean that I can't use the > > same BlockExportNbd for the existing nbd-server-add command any more. I > > guess I could somehow get a shared base type for both, though. > > Hm. This sounds like you want to make it your problem. Can I take that > to mean that you want to implement block-export-add and I can wait with > v2 until that’s done? :-) The NBD integration, yes. I already added the BlockExport type to my patches, too, but I expect you would beat me to it. 
I'm not currently planning to write a block-export-add because it doesn't add anything new for the storage daemon, so FuseExport and the command are your part. The type currently only exists for --export. Kevin
Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE
On 20.12.19 11:26, Kevin Wolf wrote: > Am 19.12.2019 um 15:38 hat Max Reitz geschrieben: >> fuse-export-add allows mounting block graph nodes via FUSE on some >> existing regular file. That file should then appears like a raw disk >> image, and accesses to it result in accesses to the exported BDS. >> >> Right now, we only set up the mount point and tear all mount points down >> in bdrv_close_all(). We do not implement any access functions, so >> accessing the mount point only results in errors. This will be >> addressed by a followup patch. >> >> The set of exported nodes is kept in a hash table so we can later add a >> fuse-export-remove that allows unmounting. >> >> Signed-off-by: Max Reitz > >> diff --git a/qapi/block.json b/qapi/block.json >> index 145c268bb6..03f8d1b537 100644 >> --- a/qapi/block.json >> +++ b/qapi/block.json >> @@ -317,6 +317,29 @@ >> ## >> { 'command': 'nbd-server-stop' } >> >> +## >> +# @fuse-export-add: >> +# >> +# Exports a block graph node on some (file) mountpoint as a raw image. >> +# >> +# @node-name: Node to be exported >> +# >> +# @mountpoint: Path on which to export the block device via FUSE. >> +# This must point to an existing regular file. >> +# >> +# @writable: Whether clients should be able to write to the block >> +#device via the FUSE export. (default: false) >> +# >> +# Since: 5.0 >> +## >> +{ 'command': 'fuse-export-add', >> + 'data': { >> + 'node-name': 'str', >> + 'mountpoint': 'str', >> + '*writable': 'bool' >> + }, >> + 'if': 'defined(CONFIG_FUSE)' } > > Can this use a BlockExport union from the start like I'm introducing in > the storage daemon series, together with a generic block-export-add? Hm, you mean still adding a FuseExport structure that would be part of BlockExport and then dropping fuse-export-add in favor of a block-export-add that we want anyway? > It also looks like node-name and writable should be part of the common > base of BlockExport. node-name definitely, I’m not so sure about writable. 
Or, to be more precise, I think that if we want writable to be in the base, we also want growable to be there: Both are primarily options for the BlockBackend that the exports use. But both of course also need to be supported by the export implementation. nbd can make its BB growable all it wants, but that doesn’t make it work. So if we kept writable and growable in the common base, then the schema would give no information about what exports actually support them. On one hand, I don’t know whether it’s important to have this information in a static form, or whether it’s sufficient to learn at runtime. On the other, I don’t know whether it’s important to have those fields in the base or not. Would it make a difference on the wire? > Unfortunately this would mean that I can't use the > same BlockExportNbd for the existing nbd-server-add command any more. I > guess I could somehow get a shared base type for both, though. Hm. This sounds like you want to make it your problem. Can I take that to mean that you want to implement block-export-add and I can wait with v2 until that’s done? :-) Max > Markus, any thoughts on these QAPI interfaces? > > Kevin
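For readers following along, the command under discussion would be issued over QMP roughly as follows. This is a minimal sketch, assuming the schema from the quoted qapi/block.json hunk; the node name and mountpoint are made up, and whether the command survives review in this form or gets folded into a generic block-export-add is exactly what the thread is debating:

```python
import json

def fuse_export_add(node_name, mountpoint, writable=False):
    """Build the QMP command per the proposed schema.

    '*writable' is optional in the schema and defaults to false, so it
    is only included when set.
    """
    args = {"node-name": node_name, "mountpoint": mountpoint}
    if writable:
        args["writable"] = True
    return {"execute": "fuse-export-add", "arguments": args}

# Hypothetical export: node "fmt0" exposed as a raw image on the
# existing regular file /tmp/mount.raw.
print(json.dumps(fuse_export_add("fmt0", "/tmp/mount.raw", writable=True)))
```

A real caller would send this dict over the QMP socket; the point here is just the shape of the arguments.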
Re: [PATCH] block: nbd: Fix dirty bitmap context name
On Fri, Dec 20, 2019 at 10:06:17 +, Vladimir Sementsov-Ogievskiy wrote: > 20.12.2019 12:56, Peter Krempa wrote: > > On Fri, Dec 20, 2019 at 09:39:17 +, Vladimir Sementsov-Ogievskiy wrote: > >> 19.12.2019 18:55, Nir Soffer wrote: > >>> On Thu, Dec 19, 2019 at 5:17 PM Vladimir Sementsov-Ogievskiy > >>> wrote: > > 19.12.2019 17:59, Nir Soffer wrote: > > On Thu, Dec 19, 2019 at 4:04 PM Kevin Wolf wrote: > >> > >> Am 19.12.2019 um 14:41 hat Vladimir Sementsov-Ogievskiy geschrieben: > >>> Ahh, I see, it's documented as > >>> > >>> +# @bitmap: Also export the dirty bitmap reachable from @device, so > >>> the > >>> +# NBD client can use NBD_OPT_SET_META_CONTEXT with > >>> +# "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since > >>> 4.0) > >>> > >>> and it is logical to assume that export name (which is @name > >>> argument) is > >>> mentioned. But we never mentioned it. This is just documented after > >>> removed experimental command x-nbd-server-add-bitmap, > >>> > >>> look at > >>> > >>> commit 7dc570b3806e5b0a4c9219061556ed5a4a0de80c > >>> Author: Eric Blake > >>> Date: Fri Jan 11 13:47:18 2019 -0600 > >>> > >>> nbd: Remove x-nbd-server-add-bitmap > >>> > >>> ... > >>> > >>> -# @bitmap-export-name: How the bitmap will be seen by nbd clients > >>> -# (default @bitmap) > >>> -# > >>> -# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of > >>> -# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) > >>> to access > >>> -# the exposed bitmap. > >>> > >>> > >>> So, this "NAME" is saved and now looks incorrect. What should be > >>> fixed, is Qapi > >>> documentation. > >> > >> Hm, I don't know these interfaces very well, but from your explanation > >> it > >> looks to me as if having a bitmap name made sense with > >> x-nbd-server-add-bitmap because it could be called more than once for > >> exporting multiple bitmaps. > >> > >> But now, we have only nbd-server-add, which takes a single bitmap name.
> >> As we don't have to distinguish multiple bitmaps any more, wouldn't it > >> make more sense to use "qemu:dirty-bitmap" without any other > >> information? Both export name and bitmap name are already identified by > >> the connection. > > > > We can use empty string (like the default export name), so the bitmap > > would be exposed as: > > > >"qemu:dirty-bitmap:" > > > > This would solve the issue for users, and keep the API extensible. > > As I already said, we can not. If we really want such a thing, use another > name, > like qemu:default-dirty-bitmap.. > > Or call bitmap export "default", to produce > "qemu:dirty-bitmap:default" > > But don't change default behavior of nbd-server-add > > > > >> But if we have to have something there, using the bitmap name (which > >> may > >> or may not be the same as used in the image file) makes a little more > >> sense because it makes the interface extensible for the case that we > >> ever want to re-introduce an nbd-server-add-bitmap. > > > > But using the bitmap name means user of the NBD server need to know > > this name. > > Why not? What is your case exactly? User knows, what kind of bitmap you > are > exporting, so user is in some relation with exporting process anyway. Why > shouldn't it know the name? > >>> > >>> Because the user configuring qemu (libvirt) is not the same user > >>> accessing qemu NBD > >>> server (ovirt, or some backup application). > >>> > This name may be defined in your exporting protocol.. What are you > exporting? > >>> > >>> We export HTTP API, allowing getting dirty extents and reading data: > >>> https://www.ovirt.org/develop/release-management/features/storage/incremental-backup.html#map-request > >>> (this document is outdated, but it gives the general idea) > >>> > >>> Or provide the NBD URL directly to user (future). > >>> > Note also, that client may use NBD_OPT_LIST_META_CONTEXT with query > "qemu:dirty-bitmap:", to get list of all exported bitmaps.
> >>> > >>> This is another option, I did not try to use this yet. We can use the > >>> single > >>> exported bitmap and fail if we get more than one. This is probably better > >>> than changing the entire stack to support bitmap name. > >>> > > One option is that libvirt would publish the name of the bitmap in the > > xml describing > > the backup, and oVirt will have to propagate this name to the actual > > program accessing > > the NBD server, which may be a user program in the case when we expose > > the NBD > > URL to users (planned for future version). > > > > Another option
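To make the client-side view concrete: however the naming debate is resolved, a client discovers exported bitmaps by listing metadata contexts. The following is a hedged sketch of only the bookkeeping around that listing; the context names are hypothetical stand-ins for what a real NBD_OPT_LIST_META_CONTEXT reply to the wildcard query "qemu:dirty-bitmap:" would contain:

```python
PREFIX = "qemu:dirty-bitmap:"

def dirty_bitmap_names(contexts):
    """Pick out the dirty-bitmap contexts from a listing reply and
    strip the qemu:dirty-bitmap: prefix to recover the NAME part."""
    return [c[len(PREFIX):] for c in contexts if c.startswith(PREFIX)]

def single_bitmap(contexts):
    """The fallback suggested above: use the single exported bitmap,
    and fail if the server exposes more than one (or none)."""
    names = dirty_bitmap_names(contexts)
    if len(names) != 1:
        raise ValueError(f"expected exactly one dirty bitmap, got {names}")
    return names[0]

# Hypothetical listing reply for an export with one bitmap:
print(single_bitmap(["base:allocation", "qemu:dirty-bitmap:bitmap0"]))
```

This way the consumer does not need to be told the bitmap name out of band, which is the crux of the disagreement above.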
Re: [PATCH v5 4/4] blockdev: honor bdrv_try_set_aio_context() context requirements
Sergio Lopez writes: > Sergio Lopez writes: > >> Kevin Wolf writes: >> >>> Am 13.12.2019 um 21:59 hat Eric Blake geschrieben: On 12/9/19 10:06 AM, Kevin Wolf wrote: > Am 28.11.2019 um 11:41 hat Sergio Lopez geschrieben: > > bdrv_try_set_aio_context() requires that the old context is held, and > > the new context is not held. Fix all the occurrences where it's not > > done this way. > > > > Suggested-by: Max Reitz > > Signed-off-by: Sergio Lopez > > --- > Or in fact, I think you need to hold the AioContext of a bs to > bdrv_unref() it, so maybe 'goto out' is right, but you need to unref > target_bs while you still hold old_context. I suspect https://bugzilla.redhat.com/show_bug.cgi?id=1779036 is also a symptom of this. The v5 patch did not fix this simple test case: >>> >>> Speaking of a test case... I think this series should probably add >>> something to iotests. >> >> Okay, I'll try to add one. > > So I've been working on this, but it turns out that the issue isn't > reproducible with 'accel=qtest'. I guess that's because the devices > using the nodes aren't really active in this situation. > > Is it allowed to use 'accel=kvm:tcg' in iotests? I see test 235 does > that, but I'm not sure if that's just an exception. Please ignore this, it was another issue. Thanks, Sergio.
Re: [PATCH 00/18] block: Allow exporting BDSs via FUSE
On 20.12.19 11:08, Stefan Hajnoczi wrote: > On Thu, Dec 19, 2019 at 03:38:00PM +0100, Max Reitz wrote: >> Preamble: This series is based on a combination of my (current) block >> branch and “iotests: Minor fixes”. I’ve pushed it here: >> >> https://git.xanclic.moe/XanClic/qemu fuse-exports-v1 >> >> (The base is on fuse-exports-v1-base.) >> >> >> Hi, >> >> Ever since I found out that you can mount FUSE filesystems on regular >> files (not just directories), I had the idea of adding FUSE block >> exports to qemu where you can export block nodes as raw images. The >> best thing is that you’d be able to mount an image on itself, so >> whatever format it may be in, qemu lets it appear as a raw image (and >> you can then use regular tools like dd on it). >> >> I started with some concept of a qemu-blkfuse daemon (similar to >> qemu-nbd), but never sent patches, for two reasons: (1) Performance was >> not good, (2) it didn’t seem right, for some reason. >> >> Now Kevin is proposing a storage daemon for multiple export types like >> NBD, and he also mentioned FUSE (while knowing of my previous attempts). >> Now it does seem right to add FUSE exports to qemu, but only in the form >> of some module with a proper QAPI/QMP binding. >> >> Performance is still quite bad, but who cares. We can always improve >> it, if the need arises. >> >> >> This series does the following: >> >> First, add the FUSE export module (block/fuse.c) that implements the >> basic file access functions. (Note that you need libfuse 3.8.0 or later >> for SEEK_HOLE/SEEK_DATA.) >> >> Second, it allows using FUSE exports as a protocol in the iotests and >> makes many iotests work with it. (The file node is exported by a >> background qemu instance to $SOCK_DIR.) >> Note that I only ran raw and qcow2 on it; I’m sure other formats >> currently have some failing tests. 
>> >> This gives us a lot of coverage for, well, not free (it does take ten >> patches), but for cheap; but there are still some more specialized >> things we want to test, so third and last, this series adds an iotest >> dedicated to FUSE exports. >> >> >> Final rather important notice: I didn’t really run the iotests with this >> yet. I wanted to, but they appear rather broken on current master, >> actually. I’m not yet sure whether that’s because something in my setup >> broke in the last two weeks, or because there’s quite something broken >> in master (it does look like there are a couple things broken in master >> currently). >> >> >> Max Reitz (18): >> configure: Detect libfuse >> fuse: Allow exporting BDSs via FUSE >> fuse: Implement standard FUSE operations >> fuse: Add fuse-export-remove >> fuse: Allow growable exports >> fuse: (Partially) implement fallocate() >> fuse: Implement hole detection through lseek >> iotests: Do not needlessly filter _make_test_img >> iotests: Do not pipe _make_test_img >> iotests: Use convert -n in some cases >> iotests: Avoid renaming images >> iotests: Derive image names from $TEST_IMG >> iotests/091: Use _cleanup_qemu instad of "wait" >> iotests: Restrict some Python tests to file >> iotests: Let _make_test_img guess $TEST_IMG_FILE >> iotests: Allow testing FUSE exports >> iotests: Enable fuse for many tests >> iotests/281: Add test for FUSE exports >> >> block.c | 4 + >> block/Makefile.objs | 3 + >> block/fuse.c | 668 +++ >> configure| 68 >> include/block/fuse.h | 24 ++ >> qapi/block.json | 42 ++ >> tests/qemu-iotests/013 | 9 +- >> tests/qemu-iotests/013.out | 3 +- >> tests/qemu-iotests/018 | 5 +- >> tests/qemu-iotests/018.out | 1 + >> tests/qemu-iotests/020 | 2 +- >> tests/qemu-iotests/025 | 2 +- >> tests/qemu-iotests/026 | 2 +- >> tests/qemu-iotests/028 | 16 +- >> tests/qemu-iotests/028.out | 3 + >> tests/qemu-iotests/031 | 2 +- >> tests/qemu-iotests/034 | 2 +- >> tests/qemu-iotests/036 | 2 +- >> tests/qemu-iotests/037 | 
2 +- >> tests/qemu-iotests/038 | 2 +- >> tests/qemu-iotests/039 | 2 +- >> tests/qemu-iotests/046 | 7 +- >> tests/qemu-iotests/046.out | 2 +- >> tests/qemu-iotests/050 | 2 +- >> tests/qemu-iotests/054 | 2 +- >> tests/qemu-iotests/060 | 2 +- >> tests/qemu-iotests/071 | 21 +- >> tests/qemu-iotests/072 | 5 +- >> tests/qemu-iotests/072.out | 1 + >> tests/qemu-iotests/079 | 2 +- >> tests/qemu-iotests/080 | 2 +- >> tests/qemu-iotests/089 | 5 +- >> tests/qemu-iotests/089.out | 1 + >> tests/qemu-iotests/090 | 2 +- >> tests/qemu-iotests/091 | 5 +- >> tests/qemu-iotests/095 | 2 +- >
Re: [PATCH 02/18] fuse: Allow exporting BDSs via FUSE
Am 19.12.2019 um 15:38 hat Max Reitz geschrieben: > fuse-export-add allows mounting block graph nodes via FUSE on some > existing regular file. That file should then appear like a raw disk > image, and accesses to it result in accesses to the exported BDS. > > Right now, we only set up the mount point and tear all mount points down > in bdrv_close_all(). We do not implement any access functions, so > accessing the mount point only results in errors. This will be > addressed by a followup patch. > > The set of exported nodes is kept in a hash table so we can later add a > fuse-export-remove that allows unmounting. > > Signed-off-by: Max Reitz > diff --git a/qapi/block.json b/qapi/block.json > index 145c268bb6..03f8d1b537 100644 > --- a/qapi/block.json > +++ b/qapi/block.json > @@ -317,6 +317,29 @@ > ## > { 'command': 'nbd-server-stop' } > > +## > +# @fuse-export-add: > +# > +# Exports a block graph node on some (file) mountpoint as a raw image. > +# > +# @node-name: Node to be exported > +# > +# @mountpoint: Path on which to export the block device via FUSE. > +# This must point to an existing regular file. > +# > +# @writable: Whether clients should be able to write to the block > +#device via the FUSE export. (default: false) > +# > +# Since: 5.0 > +## > +{ 'command': 'fuse-export-add', > + 'data': { > + 'node-name': 'str', > + 'mountpoint': 'str', > + '*writable': 'bool' > + }, > + 'if': 'defined(CONFIG_FUSE)' } Can this use a BlockExport union from the start like I'm introducing in the storage daemon series, together with a generic block-export-add? It also looks like node-name and writable should be part of the common base of BlockExport. Unfortunately this would mean that I can't use the same BlockExportNbd for the existing nbd-server-add command any more. I guess I could somehow get a shared base type for both, though. Markus, any thoughts on these QAPI interfaces? Kevin
[PULL 3/3] virtio-blk: fix out-of-bounds access to bitmap in notify_guest_bh
From: Li Hangjing When the number of a virtio-blk device's virtqueues is larger than BITS_PER_LONG, an out-of-bounds access to bitmap[] occurs. Fixes: e21737ab15 ("virtio-blk: multiqueue batch notify") Cc: qemu-sta...@nongnu.org Cc: Stefan Hajnoczi Signed-off-by: Li Hangjing Reviewed-by: Xie Yongji Reviewed-by: Chai Wen Message-id: 20191216023050.48620-1-lihangj...@baidu.com Message-Id: <20191216023050.48620-1-lihangj...@baidu.com> Signed-off-by: Stefan Hajnoczi --- hw/block/dataplane/virtio-blk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index 119906a5fe..1b52e8159c 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -67,7 +67,7 @@ static void notify_guest_bh(void *opaque) memset(s->batch_notify_vqs, 0, sizeof(bitmap)); for (j = 0; j < nvqs; j += BITS_PER_LONG) { -unsigned long bits = bitmap[j]; +unsigned long bits = bitmap[j / BITS_PER_LONG]; while (bits != 0) { unsigned i = j + ctzl(bits); -- 2.23.0
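The arithmetic of the fix can be modelled outside C. A sketch of the fixed loop (BITS_PER_LONG hard-coded to 64 for illustration; `(bits & -bits).bit_length() - 1` plays the role of ctzl(), and the notify/clear side effects of the real code are omitted):

```python
BITS_PER_LONG = 64

def notify_targets(bitmap, nvqs):
    """Model of the fixed loop in notify_guest_bh(): walk a multi-word
    bitmap and return the indices of set bits.

    The bug was indexing bitmap[j] with the *bit* offset j: once nvqs
    exceeds BITS_PER_LONG, j takes the values 64, 128, ... and runs off
    the end of the array.  The fix indexes by word, bitmap[j / 64].
    """
    out = []
    for j in range(0, nvqs, BITS_PER_LONG):
        bits = bitmap[j // BITS_PER_LONG]   # was: bitmap[j] (out of bounds)
        while bits:
            i = j + (bits & -bits).bit_length() - 1  # ctzl() equivalent
            out.append(i)
            bits &= bits - 1                # clear the lowest set bit
    return out

# Virtqueues 1 and 70 pending, spread across two 64-bit words:
words = [1 << 1, 1 << (70 - 64)]
print(notify_targets(words, 128))   # [1, 70]
```

With the old `bitmap[j]` indexing, the second iteration (j = 64) would read far past the two-element array, which is exactly the out-of-bounds access the patch describes.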
[PULL 2/3] docs: fix rst syntax errors in unbuilt docs
The .rst files outside docs/{devel,interop,specs} aren't built yet and therefore a few syntax errors have slipped through. Fix them. Signed-off-by: Stefan Hajnoczi Reviewed-by: Daniel P. Berrangé Message-Id: <2019094411.427174-1-stefa...@redhat.com> Signed-off-by: Stefan Hajnoczi --- docs/arm-cpu-features.rst| 6 +++--- docs/virtio-net-failover.rst | 4 ++-- docs/virtio-pmem.rst | 19 ++- 3 files changed, 15 insertions(+), 14 deletions(-) diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst index 1b367e22e1..9b537a75e6 100644 --- a/docs/arm-cpu-features.rst +++ b/docs/arm-cpu-features.rst @@ -41,9 +41,9 @@ CPU type is possible with the `query-cpu-model-expansion` QMP command. Below are some examples where `scripts/qmp/qmp-shell` (see the top comment block in the script for usage) is used to issue the QMP commands. -(1) Determine which CPU features are available for the `max` CPU type -(Note, we started QEMU with qemu-system-aarch64, so `max` is - implementing the ARMv8-A reference manual in this case):: +1. Determine which CPU features are available for the `max` CPU type + (Note, we started QEMU with qemu-system-aarch64, so `max` is + implementing the ARMv8-A reference manual in this case):: (QEMU) query-cpu-model-expansion type=full model={"name":"max"} { "return": { diff --git a/docs/virtio-net-failover.rst b/docs/virtio-net-failover.rst index 22f64c7bc8..6002dc5d96 100644 --- a/docs/virtio-net-failover.rst +++ b/docs/virtio-net-failover.rst @@ -1,6 +1,6 @@ - +== QEMU virtio-net standby (net_failover) - +== This document explains the setup and usage of virtio-net standby feature which is used to create a net_failover pair of devices. 
diff --git a/docs/virtio-pmem.rst b/docs/virtio-pmem.rst index e77881b26f..4bf5d00443 100644 --- a/docs/virtio-pmem.rst +++ b/docs/virtio-pmem.rst @@ -27,17 +27,18 @@ virtio pmem usage - A virtio pmem device backed by a memory-backend-file can be created on - the QEMU command line as in the following example: + the QEMU command line as in the following example:: - -object memory-backend-file,id=mem1,share,mem-path=./virtio_pmem.img,size=4G - -device virtio-pmem-pci,memdev=mem1,id=nv1 +-object memory-backend-file,id=mem1,share,mem-path=./virtio_pmem.img,size=4G +-device virtio-pmem-pci,memdev=mem1,id=nv1 - where: - - "object memory-backend-file,id=mem1,share,mem-path=, size=" - creates a backend file with the specified size. + where: - - "device virtio-pmem-pci,id=nvdimm1,memdev=mem1" creates a virtio pmem - pci device whose storage is provided by above memory backend device. + - "object memory-backend-file,id=mem1,share,mem-path=, size=" +creates a backend file with the specified size. + + - "device virtio-pmem-pci,id=nvdimm1,memdev=mem1" creates a virtio pmem +pci device whose storage is provided by above memory backend device. Multiple virtio pmem devices can be created if multiple pairs of "-object" and "-device" are provided. @@ -50,7 +51,7 @@ memory backing has to be added via 'object_add'; afterwards, the virtio pmem device can be added via 'device_add'. For example, the following commands add another 4GB virtio pmem device to -the guest: +the guest:: (qemu) object_add memory-backend-file,id=mem2,share=on,mem-path=virtio_pmem2.img,size=4G (qemu) device_add virtio-pmem-pci,id=virtio_pmem2,memdev=mem2 -- 2.23.0
[PULL 1/3] virtio-blk: deprecate SCSI passthrough
The Linux virtio_blk.ko guest driver is removing legacy SCSI passthrough support. Deprecate this feature in QEMU too. Signed-off-by: Stefan Hajnoczi Reviewed-by: Paolo Bonzini Reviewed-by: Christoph Hellwig Reviewed-by: Thomas Huth Message-id: 20191213144626.1208237-1-stefa...@redhat.com Message-Id: <20191213144626.1208237-1-stefa...@redhat.com> Signed-off-by: Stefan Hajnoczi --- qemu-deprecated.texi | 11 +++ 1 file changed, 11 insertions(+) diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 62680f7bd5..259cb9ce9e 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -278,6 +278,17 @@ spec you can use the ``-cpu rv64gcsu,priv_spec=v1.9.1`` command line argument. @section Device options +@subsection Emulated device options + +@subsubsection -device virtio-blk,scsi=on|off (since 5.0.0) + +The virtio-blk SCSI passthrough feature is a legacy VIRTIO feature. VIRTIO 1.0 +and later do not support it because the virtio-scsi device was introduced for +full SCSI support. Use virtio-scsi instead when SCSI passthrough is required. + +Note this also applies to ``-device virtio-blk-pci,scsi=on|off'', which is an +alias. + @subsection Block device options @subsubsection "backing": "" (since 2.12.0) -- 2.23.0
[PULL 0/3] Block patches
The following changes since commit aceeaa69d28e6f08a24395d0aa6915b687d0a681: Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2019-12-17' into staging (2019-12-17 15:55:20 +) are available in the Git repository at: https://github.com/stefanha/qemu.git tags/block-pull-request for you to fetch changes up to 725fe5d10dbd4259b1853b7d253cef83a3c0d22a: virtio-blk: fix out-of-bounds access to bitmap in notify_guest_bh (2019-12-19 16:20:25 +) Pull request Li Hangjing (1): virtio-blk: fix out-of-bounds access to bitmap in notify_guest_bh Stefan Hajnoczi (2): virtio-blk: deprecate SCSI passthrough docs: fix rst syntax errors in unbuilt docs docs/arm-cpu-features.rst | 6 +++--- docs/virtio-net-failover.rst| 4 ++-- docs/virtio-pmem.rst| 19 ++- hw/block/dataplane/virtio-blk.c | 2 +- qemu-deprecated.texi| 11 +++ 5 files changed, 27 insertions(+), 15 deletions(-) -- 2.23.0
Re: [PATCH v2] iotests.py: Let wait_migration wait even more
On 20.12.19 11:07, Kevin Wolf wrote: > Am 20.12.2019 um 10:52 hat Max Reitz geschrieben: >> On 20.12.19 10:32, Kevin Wolf wrote: >>> Am 19.12.2019 um 19:36 hat Max Reitz geschrieben: The "migration completed" event may be sent (on the source, to be specific) before the migration is actually completed, so the VM runstate will still be "finish-migrate" instead of "postmigrate". So ask the users of VM.wait_migration() to specify the final runstate they desire and then poll the VM until it has reached that state. (This should be over very quickly, so busy polling is fine.) Without this patch, I see intermittent failures in the new iotest 280 under high system load. I have not yet seen such failures with other iotests that use VM.wait_migration() and query-status afterwards, but maybe they just occur even more rarely, or it is because they also wait on the destination VM to be running. Signed-off-by: Max Reitz >>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 13fd8b5cd2..0b62c42851 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -668,12 +668,16 @@ class VM(qtest.QEMUQtestMachine): } ])) -def wait_migration(self): +def wait_migration(self, expect_runstate): while True: event = self.event_wait('MIGRATION') log(event, filters=[filter_qmp_event]) if event['data']['status'] == 'completed': break +# The event may occur in finish-migrate, so wait for the expected +# post-migration runstate >>> >>> That's a bit too specific now that you have expect_runstate. >> >> Can you be more specific? :-) >> >> If you mean the fact of mentioning “post-migration runstate”, I simply >> meant that as “the runstate after the migration”. The specific runstate >> on the source VM is called “postmigrate”. > > Oh, yes, "postmigrate" is what I had in mind. > >> I wouldn’t mind changing it to “after-migration runstate” or something >> similar, if that’s what you mean. > > "runstate after migration"? Sure. 
(Now you got me to wonder what permutations are permissible. “migration after runstate” isn’t. “runstate migration after” is just garbage. So probably on RAM and AMR.) Max
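The polling idea under discussion, reduced to a generic sketch. The function and parameter names here are mine, not the iotests.py API; `query_status` stands in for issuing QMP query-status. The point is that the MIGRATION "completed" event can arrive while the source is still in "finish-migrate", so the caller names the runstate it wants and we busy-poll (briefly) until it appears:

```python
import time

def wait_for_runstate(query_status, expect, timeout=10.0, interval=0.01):
    """Poll query_status() until it returns the expected runstate.

    Busy polling is fine here because the transition (e.g.
    finish-migrate -> postmigrate) completes almost immediately;
    the timeout only guards against a genuinely stuck VM.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = query_status()
        if status == expect:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in {status!r}, wanted {expect!r}")
        time.sleep(interval)

# Simulated source VM that needs a couple of polls to settle:
states = iter(["finish-migrate", "finish-migrate", "postmigrate"])
print(wait_for_runstate(lambda: next(states), "postmigrate"))
```

In the actual patch the expected state is passed by each test ("postmigrate" on the source, "running" on the destination), which is why the parameter exists at all.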
Re: [PATCH 00/18] block: Allow exporting BDSs via FUSE
On Thu, Dec 19, 2019 at 03:38:00PM +0100, Max Reitz wrote: > Preamble: This series is based on a combination of my (current) block > branch and “iotests: Minor fixes”. I’ve pushed it here: > > https://git.xanclic.moe/XanClic/qemu fuse-exports-v1 > > (The base is on fuse-exports-v1-base.) > > > Hi, > > Ever since I found out that you can mount FUSE filesystems on regular > files (not just directories), I had the idea of adding FUSE block > exports to qemu where you can export block nodes as raw images. The > best thing is that you’d be able to mount an image on itself, so > whatever format it may be in, qemu lets it appear as a raw image (and > you can then use regular tools like dd on it). > > I started with some concept of a qemu-blkfuse daemon (similar to > qemu-nbd), but never sent patches, for two reasons: (1) Performance was > not good, (2) it didn’t seem right, for some reason. > > Now Kevin is proposing a storage daemon for multiple export types like > NBD, and he also mentioned FUSE (while knowing of my previous attempts). > Now it does seem right to add FUSE exports to qemu, but only in the form > of some module with a proper QAPI/QMP binding. > > Performance is still quite bad, but who cares. We can always improve > it, if the need arises. > > > This series does the following: > > First, add the FUSE export module (block/fuse.c) that implements the > basic file access functions. (Note that you need libfuse 3.8.0 or later > for SEEK_HOLE/SEEK_DATA.) > > Second, it allows using FUSE exports as a protocol in the iotests and > makes many iotests work with it. (The file node is exported by a > background qemu instance to $SOCK_DIR.) > Note that I only ran raw and qcow2 on it; I’m sure other formats > currently have some failing tests. 
> > This gives us a lot of coverage for, well, not free (it does take ten > patches), but for cheap; but there are still some more specialized > things we want to test, so third and last, this series adds an iotest > dedicated to FUSE exports. > > > Final rather important notice: I didn’t really run the iotests with this > yet. I wanted to, but they appear rather broken on current master, > actually. I’m not yet sure whether that’s because something in my setup > broke in the last two weeks, or because there’s quite something broken > in master (it does look like there are a couple things broken in master > currently). > > > Max Reitz (18): > configure: Detect libfuse > fuse: Allow exporting BDSs via FUSE > fuse: Implement standard FUSE operations > fuse: Add fuse-export-remove > fuse: Allow growable exports > fuse: (Partially) implement fallocate() > fuse: Implement hole detection through lseek > iotests: Do not needlessly filter _make_test_img > iotests: Do not pipe _make_test_img > iotests: Use convert -n in some cases > iotests: Avoid renaming images > iotests: Derive image names from $TEST_IMG > iotests/091: Use _cleanup_qemu instad of "wait" > iotests: Restrict some Python tests to file > iotests: Let _make_test_img guess $TEST_IMG_FILE > iotests: Allow testing FUSE exports > iotests: Enable fuse for many tests > iotests/281: Add test for FUSE exports > > block.c | 4 + > block/Makefile.objs | 3 + > block/fuse.c | 668 +++ > configure| 68 > include/block/fuse.h | 24 ++ > qapi/block.json | 42 ++ > tests/qemu-iotests/013 | 9 +- > tests/qemu-iotests/013.out | 3 +- > tests/qemu-iotests/018 | 5 +- > tests/qemu-iotests/018.out | 1 + > tests/qemu-iotests/020 | 2 +- > tests/qemu-iotests/025 | 2 +- > tests/qemu-iotests/026 | 2 +- > tests/qemu-iotests/028 | 16 +- > tests/qemu-iotests/028.out | 3 + > tests/qemu-iotests/031 | 2 +- > tests/qemu-iotests/034 | 2 +- > tests/qemu-iotests/036 | 2 +- > tests/qemu-iotests/037 | 2 +- > tests/qemu-iotests/038 | 2 +- > 
tests/qemu-iotests/039 | 2 +- > tests/qemu-iotests/046 | 7 +- > tests/qemu-iotests/046.out | 2 +- > tests/qemu-iotests/050 | 2 +- > tests/qemu-iotests/054 | 2 +- > tests/qemu-iotests/060 | 2 +- > tests/qemu-iotests/071 | 21 +- > tests/qemu-iotests/072 | 5 +- > tests/qemu-iotests/072.out | 1 + > tests/qemu-iotests/079 | 2 +- > tests/qemu-iotests/080 | 2 +- > tests/qemu-iotests/089 | 5 +- > tests/qemu-iotests/089.out | 1 + > tests/qemu-iotests/090 | 2 +- > tests/qemu-iotests/091 | 5 +- > tests/qemu-iotests/095 | 2 +- > tests/qemu-iotests/097 | 2 +- > tests/qemu-iotests/098 | 2 +- > tests/qemu-iotests/102 | 2 +- > t
Re: [PATCH v2] iotests.py: Let wait_migration wait even more
Am 20.12.2019 um 10:52 hat Max Reitz geschrieben: > On 20.12.19 10:32, Kevin Wolf wrote: > > Am 19.12.2019 um 19:36 hat Max Reitz geschrieben: > >> The "migration completed" event may be sent (on the source, to be > >> specific) before the migration is actually completed, so the VM runstate > >> will still be "finish-migrate" instead of "postmigrate". So ask the > >> users of VM.wait_migration() to specify the final runstate they desire > >> and then poll the VM until it has reached that state. (This should be > >> over very quickly, so busy polling is fine.) > >> > >> Without this patch, I see intermittent failures in the new iotest 280 > >> under high system load. I have not yet seen such failures with other > >> iotests that use VM.wait_migration() and query-status afterwards, but > >> maybe they just occur even more rarely, or it is because they also wait > >> on the destination VM to be running. > >> > >> Signed-off-by: Max Reitz > > > >> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py > >> index 13fd8b5cd2..0b62c42851 100644 > >> --- a/tests/qemu-iotests/iotests.py > >> +++ b/tests/qemu-iotests/iotests.py > >> @@ -668,12 +668,16 @@ class VM(qtest.QEMUQtestMachine): > >> } > >> ])) > >> > >> -def wait_migration(self): > >> +def wait_migration(self, expect_runstate): > >> while True: > >> event = self.event_wait('MIGRATION') > >> log(event, filters=[filter_qmp_event]) > >> if event['data']['status'] == 'completed': > >> break > >> +# The event may occur in finish-migrate, so wait for the expected > >> +# post-migration runstate > > > > That's a bit too specific now that you have expect_runstate. > > Can you be more specific? :-) > > If you mean the fact of mentioning “post-migration runstate”, I simply > meant that as “the runstate after the migration”. The specific runstate > on the source VM is called “postmigrate”. Oh, yes, "postmigrate" is what I had in mind. 
> I wouldn’t mind changing it to “after-migration runstate” or something > similar, if that’s what you mean. "runstate after migration"? Kevin
Re: [PATCH] block: nbd: Fix dirty bitmap context name
20.12.2019 12:56, Peter Krempa wrote: > On Fri, Dec 20, 2019 at 09:39:17 +, Vladimir Sementsov-Ogievskiy wrote: >> 19.12.2019 18:55, Nir Soffer wrote: >>> On Thu, Dec 19, 2019 at 5:17 PM Vladimir Sementsov-Ogievskiy >>> wrote: 19.12.2019 17:59, Nir Soffer wrote: > On Thu, Dec 19, 2019 at 4:04 PM Kevin Wolf wrote: >> >> Am 19.12.2019 um 14:41 hat Vladimir Sementsov-Ogievskiy geschrieben: >>> Ahh, I see, it's documented as >>> >>> +# @bitmap: Also export the dirty bitmap reachable from @device, so the >>> +# NBD client can use NBD_OPT_SET_META_CONTEXT with >>> +# "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0) >>> >>> and it is logical to assume that export name (which is @name argument) >>> is >>> mentioned. But we never mentioned it. This is just documented after >>> removed experimental command x-nbd-server-add-bitmap, >>> >>> look at >>> >>> commit 7dc570b3806e5b0a4c9219061556ed5a4a0de80c >>> Author: Eric Blake >>> Date: Fri Jan 11 13:47:18 2019 -0600 >>> >>> nbd: Remove x-nbd-server-add-bitmap >>> >>> ... >>> >>> -# @bitmap-export-name: How the bitmap will be seen by nbd clients >>> -# (default @bitmap) >>> -# >>> -# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of >>> -# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) to >>> access >>> -# the exposed bitmap. >>> >>> >>> So, this "NAME" is saved and now looks incorrect. What should be fixed, >>> is Qapi >>> documentation. >> >> Hm, I don't know these interfaces very well, but from your explanation it >> looks to me as if having a bitmap name made sense with >> x-nbd-server-add-bitmap because it could be called more than once for >> exporting multiple bitmaps. >> >> But now, we have only nbd-server-add, which takes a single bitmap name. >> As we don't have to distinguish multiple bitmaps any more, wouldn't it >> make more sense to use "qemu:dirty-bitmap" without any other >> information? Both export name and bitmap name are already identified by >> the connection.
> > We can use empty string (like the default export name), so the bitmap > would be exposed as: > >"qemu:dirty-bitmap:" > > This would solve the issue for users, and keep the API extensible. As I already said, we can not. If we really want such a thing, use another name, like qemu:default-dirty-bitmap.. Or call bitmap export "default", to produce "qemu:dirty-bitmap:default" But don't change default behavior of nbd-server-add > >> But if we have to have something there, using the bitmap name (which may >> or may not be the same as used in the image file) makes a little more >> sense because it makes the interface extensible for the case that we >> ever want to re-introduce an nbd-server-add-bitmap. > > But using the bitmap name means user of the NBD server need to know this > name. Why not? What is your case exactly? User knows, what kind of bitmap you are exporting, so user is in some relation with exporting process anyway. Why shouldn't it know the name? >>> >>> Because the user configuring qemu (libvirt) is not the same user >>> accessing qemu NBD >>> server (ovirt, or some backup application). >>> This name may be defined in your exporting protocol.. What are you exporting? >>> >>> We export HTTP API, allowing getting dirty extents and reading data: >>> https://www.ovirt.org/develop/release-management/features/storage/incremental-backup.html#map-request >>> (this document is outdated, but it gives the general idea) >>> >>> Or provide the NBD URL directly to user (future). >>> Note also, that client may use NBD_OPT_LIST_META_CONTEXT with query "qemu:dirty-bitmap:", to get list of all exported bitmaps. >>> >>> This is another option, I did not try to use this yet. We can use the single >>> exported bitmap and fail if we get more than one. This is probably better >>> than changing the entire stack to support bitmap name.
>>> > One option is that libvirt would publish the name of the bitmap in the > xml describing > the backup, and oVirt will have to propagate this name to the actual > program accessing > the NBD server, which may be a user program in the case when we expose > the NBD > URL to users (planned for a future version). > > Another option is that the user will control this name, and libvirt > will use the name specified > by the user. This means oVirt will have to provide an API to set this > name and pass it to libvirt. > > Both cases require a lot of effort which does not help anyone in the > task of getting dirty > extents from an image - which is our current goal.
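Nir's fallback above — ask the server for everything under the "qemu:dirty-bitmap:" prefix and use the single exported bitmap, failing if there is more than one — can be sketched as plain string handling. This is only a sketch: `pick_dirty_bitmap` is a hypothetical helper, and the context list stands in for a real NBD_OPT_LIST_META_CONTEXT reply.

```python
# Sketch of the client-side fallback discussed above: filter the contexts
# a server reports (as NBD_OPT_LIST_META_CONTEXT with the prefix query
# "qemu:dirty-bitmap:" would return them) and accept exactly one bitmap.

DIRTY_BITMAP_PREFIX = "qemu:dirty-bitmap:"

def pick_dirty_bitmap(contexts):
    # Keep only dirty-bitmap contexts, stripping the common prefix.
    bitmaps = [c[len(DIRTY_BITMAP_PREFIX):]
               for c in contexts if c.startswith(DIRTY_BITMAP_PREFIX)]
    if len(bitmaps) != 1:
        raise RuntimeError(
            "expected exactly one dirty bitmap, got %d" % len(bitmaps))
    return bitmaps[0]

# Illustrative server reply: base allocation plus one dirty bitmap.
print(pick_dirty_bitmap(["base:allocation", "qemu:dirty-bitmap:bitmap0"]))
# bitmap0
```

With this approach the HTTP-API side never needs to learn the bitmap name out of band; it only has to reject exports that expose more than one bitmap.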
Re: [PATCH] block: nbd: Fix dirty bitmap context name
On Fri, Dec 20, 2019 at 09:39:17 +, Vladimir Sementsov-Ogievskiy wrote: > 19.12.2019 18:55, Nir Soffer wrote: > > On Thu, Dec 19, 2019 at 5:17 PM Vladimir Sementsov-Ogievskiy > > wrote: > >> > >> 19.12.2019 17:59, Nir Soffer wrote: > >>> On Thu, Dec 19, 2019 at 4:04 PM Kevin Wolf wrote: > > Am 19.12.2019 um 14:41 hat Vladimir Sementsov-Ogievskiy geschrieben: > > Ahh, I see, it's documented as > > > > +# @bitmap: Also export the dirty bitmap reachable from @device, so the > > +# NBD client can use NBD_OPT_SET_META_CONTEXT with > > +# "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0) > > > > and it is logical to assume that the export name (which is the @name argument) > > is > > mentioned. But we never mentioned it. This is just documented after > > the removed experimental command x-nbd-server-add-bitmap, > > > > look at > > > > commit 7dc570b3806e5b0a4c9219061556ed5a4a0de80c > > Author: Eric Blake > > Date: Fri Jan 11 13:47:18 2019 -0600 > > > > nbd: Remove x-nbd-server-add-bitmap > > > > ... > > > > -# @bitmap-export-name: How the bitmap will be seen by nbd clients > > -# (default @bitmap) > > -# > > -# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of > > -# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) to > > access > > -# the exposed bitmap. > > > > > > So, this "NAME" is saved and now looks incorrect. What should be fixed, > > is Qapi > > documentation. > > Hm, I don't know these interfaces very well, but from your explanation it > looks to me as if having a bitmap name made sense with > x-nbd-server-add-bitmap because it could be called more than once for > exporting multiple bitmaps. > > But now, we have only nbd-server-add, which takes a single bitmap name. > As we don't have to distinguish multiple bitmaps any more, wouldn't it > make more sense to use "qemu:dirty-bitmap" without any other > information? Both export name and bitmap name are already identified by > the connection.
> >>> > >>> We can use empty string (like the default export name), so the bitmap > >>> would be exposed as: > >>> > >>> "qemu:dirty-bitmap:" > >>> > >>> This would solve the issue for users, and keep the API extensible. > >> > >> As I already said, we can not. If we really want such a thing, use another > >> name, > >> like qemu:default-dirty-bitmap. > >> > >> Or call bitmap export "default", to produce > >> "qemu:dirty-bitmap:default" > >> > >> But don't change the default behavior of nbd-server-add > >> > >>> > But if we have to have something there, using the bitmap name (which may > or may not be the same as used in the image file) makes a little more > sense because it makes the interface extensible for the case that we > ever want to re-introduce an nbd-server-add-bitmap. > >>> > >>> But using the bitmap name means the user of the NBD server needs to know this > >>> name. > >> > >> Why not? What is your case exactly? The user knows what kind of bitmap you are > >> exporting, so the user is in some relation with the exporting process anyway. Why > >> shouldn't it know the name? > > > > Because the user configuring qemu (libvirt) is not the same user > > accessing the qemu NBD > > server (ovirt, or some backup application). > > > >> This name may be defined in your exporting protocol. What are you > >> exporting? > > > > We export an HTTP API, allowing getting dirty extents and reading data: > > https://www.ovirt.org/develop/release-management/features/storage/incremental-backup.html#map-request > > (this document is outdated, but it gives the general idea) > > > > Or provide the NBD URL directly to the user (future). > > > >> Note also, that a client may use NBD_OPT_LIST_META_CONTEXT with query > >> "qemu:dirty-bitmap:", to get a list of all exported bitmaps. > > > > This is another option, I did not try to use this yet. We can use the single > > exported bitmap and fail if we get more than one. This is probably better > > than changing the entire stack to support the bitmap name.
> > > >>> One option is that libvirt would publish the name of the bitmap in the > >>> xml describing > >>> the backup, and oVirt will have to propagate this name to the actual > >>> program accessing > >>> the NBD server, which may be a user program in the case when we expose > >>> the NBD > >>> URL to users (planned for a future version). > >>> > >>> Another option is that the user will control this name, and libvirt > >>> will use the name specified > >>> by the user. This means oVirt will have to provide an API to set this > >>> name and pass it to libvirt. > >>> > >>> Both cases require a lot of effort which does not help anyone in the > >>> task of getting dirty > >>> extents from an image - which is our current goal. We need to have > >>> good defaults that save unneeded effort in the entire stack.
Re: [PATCH v2] iotests.py: Let wait_migration wait even more
On 20.12.19 10:32, Kevin Wolf wrote: > Am 19.12.2019 um 19:36 hat Max Reitz geschrieben: >> The "migration completed" event may be sent (on the source, to be >> specific) before the migration is actually completed, so the VM runstate >> will still be "finish-migrate" instead of "postmigrate". So ask the >> users of VM.wait_migration() to specify the final runstate they desire >> and then poll the VM until it has reached that state. (This should be >> over very quickly, so busy polling is fine.) >> >> Without this patch, I see intermittent failures in the new iotest 280 >> under high system load. I have not yet seen such failures with other >> iotests that use VM.wait_migration() and query-status afterwards, but >> maybe they just occur even more rarely, or it is because they also wait >> on the destination VM to be running. >> >> Signed-off-by: Max Reitz > >> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py >> index 13fd8b5cd2..0b62c42851 100644 >> --- a/tests/qemu-iotests/iotests.py >> +++ b/tests/qemu-iotests/iotests.py >> @@ -668,12 +668,16 @@ class VM(qtest.QEMUQtestMachine): >> } >> ])) >> >> -def wait_migration(self): >> +def wait_migration(self, expect_runstate): >> while True: >> event = self.event_wait('MIGRATION') >> log(event, filters=[filter_qmp_event]) >> if event['data']['status'] == 'completed': >> break >> +# The event may occur in finish-migrate, so wait for the expected >> +# post-migration runstate > > That's a bit too specific now that you have expect_runstate. Can you be more specific? :-) If you mean the fact of mentioning “post-migration runstate”, I simply meant that as “the runstate after the migration”. The specific runstate on the source VM is called “postmigrate”. I wouldn’t mind changing it to “after-migration runstate” or something similar, if that’s what you mean. 
Max >> +while self.qmp('query-status')['return']['status'] != >> expect_runstate: >> +pass >> >> def node_info(self, node_name): >> nodes = self.qmp('query-named-block-nodes') > > Tested-by: Kevin Wolf
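The event-then-poll pattern from Max's patch can be sketched outside the iotests harness. This is only an illustration of the logic under discussion: `FakeVM` is a hypothetical stand-in for the real QEMU test machine, not qemu code, and `wait_migration` here returns the number of extra polls purely so the behavior is observable.

```python
# Sketch of the wait_migration fix: first wait for the MIGRATION
# 'completed' event, then busy-poll query-status until the VM reports
# the runstate the caller expects, since the event may arrive while the
# source VM is still in 'finish-migrate'.

class FakeVM:
    """Hypothetical stand-in for qtest.QEMUQtestMachine."""
    def __init__(self, events, runstates):
        self._events = iter(events)        # queued QMP events
        self._runstates = iter(runstates)  # successive query-status answers

    def event_wait(self, name):
        return next(self._events)

    def qmp(self, cmd):
        return {'return': {'status': next(self._runstates)}}

def wait_migration(vm, expect_runstate):
    # Wait for the migration to report completion...
    while True:
        event = vm.event_wait('MIGRATION')
        if event['data']['status'] == 'completed':
            break
    # ...then poll until the runstate catches up with the event.
    polls = 0
    while vm.qmp('query-status')['return']['status'] != expect_runstate:
        polls += 1
    return polls

vm = FakeVM(
    events=[{'data': {'status': 'active'}},
            {'data': {'status': 'completed'}}],
    runstates=['finish-migrate', 'finish-migrate', 'postmigrate'])
print(wait_migration(vm, 'postmigrate'))  # 2 (polled twice before the state settled)
```

Under low load the polling loop exits immediately; the busy loop is only a safety net for the window between the event and the runstate transition.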
Re: [PATCH] block: nbd: Fix dirty bitmap context name
19.12.2019 18:55, Nir Soffer wrote: > On Thu, Dec 19, 2019 at 5:17 PM Vladimir Sementsov-Ogievskiy > wrote: >> >> 19.12.2019 17:59, Nir Soffer wrote: >>> On Thu, Dec 19, 2019 at 4:04 PM Kevin Wolf wrote: Am 19.12.2019 um 14:41 hat Vladimir Sementsov-Ogievskiy geschrieben: > Ahh, I see, it's documented as > > +# @bitmap: Also export the dirty bitmap reachable from @device, so the > +# NBD client can use NBD_OPT_SET_META_CONTEXT with > +# "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0) > > and it is logical to assume that the export name (which is the @name argument) is > mentioned. But we never mentioned it. This is just documented after > the removed experimental command x-nbd-server-add-bitmap, > > look at > > commit 7dc570b3806e5b0a4c9219061556ed5a4a0de80c > Author: Eric Blake > Date: Fri Jan 11 13:47:18 2019 -0600 > > nbd: Remove x-nbd-server-add-bitmap > > ... > > -# @bitmap-export-name: How the bitmap will be seen by nbd clients > -# (default @bitmap) > -# > -# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of > -# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) to > access > -# the exposed bitmap. > > > So, this "NAME" is saved and now looks incorrect. What should be fixed, > is Qapi > documentation. Hm, I don't know these interfaces very well, but from your explanation it looks to me as if having a bitmap name made sense with x-nbd-server-add-bitmap because it could be called more than once for exporting multiple bitmaps. But now, we have only nbd-server-add, which takes a single bitmap name. As we don't have to distinguish multiple bitmaps any more, wouldn't it make more sense to use "qemu:dirty-bitmap" without any other information? Both export name and bitmap name are already identified by the connection. >>> >>> We can use empty string (like the default export name), so the bitmap >>> would be exposed as: >>> >>> "qemu:dirty-bitmap:" >>> >>> This would solve the issue for users, and keep the API extensible.
>> >> As I already said, we can not. If we really want such a thing, use another >> name, >> like qemu:default-dirty-bitmap. >> >> Or call bitmap export "default", to produce >> "qemu:dirty-bitmap:default" >> >> But don't change the default behavior of nbd-server-add >> >>> But if we have to have something there, using the bitmap name (which may or may not be the same as used in the image file) makes a little more sense because it makes the interface extensible for the case that we ever want to re-introduce an nbd-server-add-bitmap. >>> >>> But using the bitmap name means the user of the NBD server needs to know this >>> name. >> >> Why not? What is your case exactly? The user knows what kind of bitmap you are >> exporting, so the user is in some relation with the exporting process anyway. Why >> shouldn't it know the name? > > Because the user configuring qemu (libvirt) is not the same user > accessing the qemu NBD > server (ovirt, or some backup application). > >> This name may be defined in your exporting protocol. What are you exporting? > > We export an HTTP API, allowing getting dirty extents and reading data: > https://www.ovirt.org/develop/release-management/features/storage/incremental-backup.html#map-request > (this document is outdated, but it gives the general idea) > > Or provide the NBD URL directly to the user (future). > >> Note also, that a client may use NBD_OPT_LIST_META_CONTEXT with query >> "qemu:dirty-bitmap:", to get a list of all exported bitmaps. > > This is another option, I did not try to use this yet. We can use the single > exported bitmap and fail if we get more than one. This is probably better > than changing the entire stack to support the bitmap name. > >>> One option is that libvirt would publish the name of the bitmap in the >>> xml describing >>> the backup, and oVirt will have to propagate this name to the actual >>> program accessing >>> the NBD server, which may be a user program in the case when we expose the >>> NBD >>> URL to users (planned for a future version).
>>> >>> Another option is that the user will control this name, and libvirt >>> will use the name specified >>> by the user. This means oVirt will have to provide an API to set this >>> name and pass it to libvirt. >>> >>> Both cases require a lot of effort which does not help anyone in the >>> task of getting dirty >>> extents from an image - which is our current goal. We need to have >>> good defaults that >>> save unneeded effort in the entire stack. >> >> So, you are implementing some protocol, and need to export only one bitmap for >> your specified scenario. Why not just give a constant name? Like >> ovirt-bitmap, >> or something like this? > > But we don't use qemu directly. We use libvirt, and libvirt does not expose
Re: [PATCH v2] iotests.py: Let wait_migration wait even more
Am 19.12.2019 um 19:36 hat Max Reitz geschrieben: > The "migration completed" event may be sent (on the source, to be > specific) before the migration is actually completed, so the VM runstate > will still be "finish-migrate" instead of "postmigrate". So ask the > users of VM.wait_migration() to specify the final runstate they desire > and then poll the VM until it has reached that state. (This should be > over very quickly, so busy polling is fine.) > > Without this patch, I see intermittent failures in the new iotest 280 > under high system load. I have not yet seen such failures with other > iotests that use VM.wait_migration() and query-status afterwards, but > maybe they just occur even more rarely, or it is because they also wait > on the destination VM to be running. > > Signed-off-by: Max Reitz > diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py > index 13fd8b5cd2..0b62c42851 100644 > --- a/tests/qemu-iotests/iotests.py > +++ b/tests/qemu-iotests/iotests.py > @@ -668,12 +668,16 @@ class VM(qtest.QEMUQtestMachine): > } > ])) > > -def wait_migration(self): > +def wait_migration(self, expect_runstate): > while True: > event = self.event_wait('MIGRATION') > log(event, filters=[filter_qmp_event]) > if event['data']['status'] == 'completed': > break > +# The event may occur in finish-migrate, so wait for the expected > +# post-migration runstate That's a bit too specific now that you have expect_runstate. > +while self.qmp('query-status')['return']['status'] != > expect_runstate: > +pass > > def node_info(self, node_name): > nodes = self.qmp('query-named-block-nodes') Tested-by: Kevin Wolf
Re: [PATCH] qapi/block: fix nbd-server-add spec
19.12.2019 19:14, Nir Soffer wrote: > On Thu, Dec 19, 2019 at 5:25 PM Vladimir Sementsov-Ogievskiy > wrote: >> >> 19.12.2019 18:08, Nir Soffer wrote: >>> On Thu, Dec 19, 2019 at 5:00 PM Vladimir Sementsov-Ogievskiy >>> wrote: 19.12.2019 17:42, Nir Soffer wrote: > On Thu, Dec 19, 2019 at 4:34 PM Vladimir Sementsov-Ogievskiy > wrote: >> >> "NAME" here may be interpreted like it should match @name, which is >> export name. But it was never mentioned in such way. Make it obvious, >> that actual "" (see docs/interop/nbd.txt) >> will match @bitmap parameter. > > But this is wrong, dirty-bitmap-export-name does not mean the actual > bitmap > name but the name exposed to the NBD client, which can be anything. Yes. What is wrong? It can be anything. Currently by default it is the bitmap name. It's poorly documented (okay, even confusingly documented), but it was so since 4.0. And existing users obviously know how it works (otherwise, they can't use the feature). So, I think it's OK to fix the spec to directly show the implementation that has been here since the feature was introduced. > >> Fixes: 5fcbeb06812685a2 >> Signed-off-by: Vladimir Sementsov-Ogievskiy >> --- >> >> Hi all. >> >> This patch follows discussion on Nir's patch >> [PATCH] block: nbd: Fix dirty bitmap context name >> ( >> https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg04309.html ) >> >> Let's just fix qapi spec now. > > But qapi documents a better behavior for users. We should fix the code > instead > to match the docs. 1. Using disk name as a bitmap name is a bad behavior, as they are completely different concepts. Especially keeping in mind that the user already knows the disk name anyway and there is no reason to write this export name inside the metadata context of this export. >>> >>> The different concept is expressed by the "qemu:dirty-bitmap:" prefix. >>> "qemu:dirty-bitmap:export-name" means the dirty bitmap for this export. >> >> Why do you think so? Did you read the NBD specification?
> > Yes - the name of the bitmap does not have any meaning. It does not have any meaning in your scenario. But in some other, when we need to export several bitmaps, the name will be used. > But for nbd_server_add we allow only a single bitmap for export. Yes, Qemu just doesn't use the whole power of the NBD spec. But it may change. > >> Metadata context is always owned by some export. > > Of course. > >> Do you mean that there will be metadata contexts >> >> qemu:dirty-bitmap:export-A >> qemu:dirty-bitmap:export-B >> >> both defined for export-A? > > It does not make sense, but it is valid. It conflicts with the fact that both contexts are owned by export-A. The idea was that metadata is bound to the export. We should not see metadata of export B in metadata of export A. Moreover, consider that these two exports may have different sizes. Then how will NBD_CMD_BLOCK_STATUS work? We don't have global nbd server metadata, like you consider it. We only have per-export metadata, which is bound (at least by size) to the export. So it's theoretically possible to export a bitmap of another nbd-export (if it has the same size), but it breaks the concept. > 2. It's not directly documented. You assume that NAME == @name. I understand that it may be assumed.. But it's not documented. >>> >>> But NAME is likely to be understood as the name argument, and unlikely to >>> be the >>> bitmap name. >> >> Yes likely. But it's still a bad specification, which should be fixed. > > If we cannot change the current behavior since it will break current users, > I agree fixing the spec to describe the current behavior is a good idea. > > > > >>> 3. It's never worked like you write. So if we change the behavior, we'll break existing users. >>> >>> Do we have existing users? isn't this new feature in 4.2? >> >> No, it's since 4.0 >> >>> >>> Before we had experimental x-block-dirty-bitmap APIs, which are not stable, so >>> users >>> could not depend on them.
>>> > With this we still have the issue of leaking the internal bitmap name to > users who do not > control the name, and do not care about it. > >> qapi/block.json | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/qapi/block.json b/qapi/block.json >> index 145c268bb6..8042ef78f0 100644 >> --- a/qapi/block.json >> +++ b/qapi/block.json >> @@ -255,7 +255,8 @@ >> >> # @bitmap: Also export the dirty bitmap reachable from @device, so >> the >> # NBD client can use NBD_OPT_SET_META_CONTEXT with >> -# "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0) >> +# "qemu:dirty-bitmap:BITMAP" to inspect the bitmap (BITMAP here >> +# matches @bitmap parameter
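The behavior this doc fix pins down — the exported context name is built from the @bitmap argument of nbd-server-add, not from the @name export name — can be illustrated with a small sketch. The helper and the QMP argument values here are hypothetical examples, not qemu code.

```python
# Illustration of the behavior described in the doc fix: since 4.0,
# nbd-server-add derives the metadata context from the @bitmap argument,
# not from the @name export name. The argument values are made up.

def dirty_bitmap_context(nbd_server_add_args):
    # Current qemu behavior: the context suffix is the bitmap name.
    return "qemu:dirty-bitmap:" + nbd_server_add_args["bitmap"]

args = {"device": "drive0", "name": "export-A", "bitmap": "bitmap0"}
print(dirty_bitmap_context(args))
# qemu:dirty-bitmap:bitmap0  (not ...:export-A)
```

An optional parameter for controlling the exported name, as discussed above, would only change the suffix for clients that opt in; older clients relying on the bitmap-name default would be unaffected.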
Re: [PATCH] backup-top: Begin drain earlier
19.12.2019 21:26, Max Reitz wrote: > When dropping backup-top, we need to drain the node before freeing the > BlockCopyState. Otherwise, requests may still be in flight and then the > assertion in shres_destroy() will fail. > > (This becomes visible in intermittent failure of 056.) > > Cc: qemu-sta...@nongnu.org > Signed-off-by: Max Reitz Good catch! Reviewed-by: Vladimir Sementsov-Ogievskiy > --- > block/backup-top.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/block/backup-top.c b/block/backup-top.c > index 7cdb1f8eba..818d3f26b4 100644 > --- a/block/backup-top.c > +++ b/block/backup-top.c > @@ -257,12 +257,12 @@ void bdrv_backup_top_drop(BlockDriverState *bs) > BDRVBackupTopState *s = bs->opaque; > AioContext *aio_context = bdrv_get_aio_context(bs); > > -block_copy_state_free(s->bcs); > - > aio_context_acquire(aio_context); > > bdrv_drained_begin(bs); > > +block_copy_state_free(s->bcs); > + > s->active = false; > bdrv_child_refresh_perms(bs, bs->backing, &error_abort); > bdrv_replace_node(bs, backing_bs(bs), &error_abort); > -- Best regards, Vladimir