Re: [PATCH v2 0/2] qdev-monitor: avoid QemuOpts in QMP device_add
On 12/08/2024 19:15, Stefan Hajnoczi wrote: On Fri, Aug 02, 2024 at 10:10:43AM +0200, Markus Armbruster wrote: Can we additionally cut out the QemuOpts middleman in usbback_portid_add()? qdict = qdict_new(); qdict_put_str(qdict, "driver", "usb-host"); tmp = g_strdup_printf("%s.0", usbif->xendev.qdev.id); qdict_put_str(qdict, "bus", tmp); g_free(tmp); tmp = g_strdup_printf("%s-%u", usbif->xendev.qdev.id, port); qdict_put_str(qdict, "id", tmp); g_free(tmp); qdict_put_int(qdict, "port", port); qdict_put_int(qdict, "hostbus", atoi(busid)); qdict_put_str(qdict, "hostport", portname); opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, &error_abort); usbif->ports[port - 1].dev = USB_DEVICE(qdev_device_add(opts, &local_err)); Trying this is up to you! Paul or Anthony: Do you know how to run usbback_portid_add() for testing? I would like to make sure that suggested the code change works and don't have experience running the Xen code in QEMU. Sorry, PV USB is not something I'm familiar with. https://wiki.xenproject.org/wiki/Xen_USB_Passthrough suggests that `xl usbdev-attach` might be the way to test... but you'd need a system with Xen installed and suitably configured guest, so not trivial to set up.
Re: [PATCH] MAINTAINERS: add Edgar as Xen maintainer
On 10/07/2024 22:28, Stefano Stabellini wrote: Add Edgar as Xen subsystem maintainer in QEMU. Edgar has been a QEMU maintainer for years, and has already made key changes to one of the most difficult areas of the Xen subsystem (the mapcache). Edgar volunteered helping us maintain the Xen subsystem in QEMU and we are very happy to welcome him to the team. His knowledge and expertise with QEMU internals will be of great help. Signed-off-by: Stefano Stabellini Reviewed-by: Paul Durrant diff --git a/MAINTAINERS b/MAINTAINERS index 6725913c8b..63e11095a2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -536,6 +536,7 @@ X86 Xen CPUs M: Stefano Stabellini M: Anthony PERARD M: Paul Durrant +M: Edgar E. Iglesias L: xen-de...@lists.xenproject.org S: Supported F: */xen*
Re: [PATCH v3 3/3] ui+display: rename is_buffer_shared() -> surface_is_allocated()
On 05/06/2024 14:14, Gerd Hoffmann wrote: Boolean return value is reversed, to align with QEMU_ALLOCATED_FLAG, so all callers must be adapted. Also rename share_surface variable in vga_draw_graphic() to reduce confusion. No functional change. Suggested-by: Marc-André Lureau Signed-off-by: Gerd Hoffmann --- include/ui/surface.h| 4 ++-- hw/display/qxl-render.c | 2 +- hw/display/vga.c| 20 ++-- hw/display/xenfb.c | 5 +++-- ui/console.c| 3 ++- 5 files changed, 18 insertions(+), 16 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 7/7] hw/xen: Register framebuffer backend via xen_backend_init()
On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote: Align the framebuffer backend with the other legacy ones, register it via xen_backend_init() when '-vga xenfb' is used. It is safe because MODULE_INIT_XEN_BACKEND is called in xen_bus_realize(), long after CLI processing initialized the vga_interface_type variable. Signed-off-by: Philippe Mathieu-Daudé --- include/hw/xen/xen-legacy-backend.h | 3 --- hw/display/xenfb.c | 9 +++-- hw/xenpv/xen_machine_pv.c | 2 -- 3 files changed, 7 insertions(+), 7 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 6/7] hw/xen: register legacy backends via xen_backend_init
On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote: From: Paolo Bonzini It is okay to register legacy backends in the middle of xen_bus_init(). All that the registration does is record the existence of the backend in xenstore. This makes it possible to remove them from the build without introducing undefined symbols in xen_be_init(). It also removes the need for the backend_register callback, whose only purpose is to avoid registering nonfunctional backends. Signed-off-by: Paolo Bonzini Message-ID: <20240509170044.190795-8-pbonz...@redhat.com> Signed-off-by: Philippe Mathieu-Daudé --- include/hw/xen/xen-legacy-backend.h | 11 ++- include/hw/xen/xen_pvdev.h | 1 - hw/9pfs/xen-9p-backend.c| 8 +++- hw/display/xenfb.c | 8 +++- hw/usb/xen-usb.c| 14 -- hw/xen/xen-legacy-backend.c | 16 6 files changed, 20 insertions(+), 38 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 5/7] hw/xen: initialize legacy backends from xen_bus_init()
On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote: From: Paolo Bonzini Prepare for moving the calls to xen_be_register() under the control of xen_bus_init(), using the normal xen_backend_init() method that is used by the "modern" backends. This requires the xenstore global variable to be initialized, which is done by xen_be_init(). To ensure that everything is ready at the time the xen_backend_init() functions are called, remove the xen_be_init() function from all the boards and place it directly in xen_bus_init(). Signed-off-by: Paolo Bonzini Reviewed-by: Philippe Mathieu-Daudé Message-ID: <20240509170044.190795-7-pbonz...@redhat.com> Signed-off-by: Philippe Mathieu-Daudé --- hw/i386/pc.c | 1 - hw/xen/xen-bus.c | 4 hw/xen/xen-hvm-common.c | 2 -- hw/xenpv/xen_machine_pv.c | 5 + 4 files changed, 5 insertions(+), 7 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 4/7] hw/xen: Make XenDevOps structures const
On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote: Keep XenDevOps structures in .rodata. Signed-off-by: Philippe Mathieu-Daudé --- include/hw/xen/xen-legacy-backend.h | 8 hw/9pfs/xen-9p-backend.c| 2 +- hw/display/xenfb.c | 4 ++-- hw/usb/xen-usb.c| 4 ++-- 4 files changed, 9 insertions(+), 9 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 3/7] hw/xen: Constify xenstore_be::XenDevOps
On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote: XenDevOps @ops is not updated, mark it const. Signed-off-by: Philippe Mathieu-Daudé --- include/hw/xen/xen-legacy-backend.h | 2 +- hw/xen/xen-legacy-backend.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 2/7] hw/xen: Constify XenLegacyDevice::XenDevOps
On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote: XenDevOps @ops is not updated, mark it const. Signed-off-by: Philippe Mathieu-Daudé --- include/hw/xen/xen_pvdev.h | 2 +- hw/xen/xen-legacy-backend.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 1/7] hw/xen: Remove declarations left over in 'xen-legacy-backend.h'
On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote: 'xen_blkdev_ops' was removed in commit 19f87870ba ("xen: remove the legacy 'xen_disk' backend"), 'xen_netdev_ops' in commit 25967ff69f ("hw/xen: update Xen PV NIC to XenDevice model") and 'xen_console_ops' in commit 9b77374690 ("hw/xen: update Xen console to XenDevice model"). Remove them. Signed-off-by: Philippe Mathieu-Daudé --- include/hw/xen/xen-legacy-backend.h | 3 --- 1 file changed, 3 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] xen-hvm: Avoid livelock while handling buffered ioreqs
On 04/04/2024 15:08, Ross Lagerwall wrote: A malicious or buggy guest may generated buffered ioreqs faster than QEMU can process them in handle_buffered_iopage(). The result is a livelock - QEMU continuously processes ioreqs on the main thread without iterating through the main loop which prevents handling other events, processing timers, etc. Without QEMU handling other events, it often results in the guest becoming unsable and makes it difficult to stop the source of buffered ioreqs. To avoid this, if we process a full page of buffered ioreqs, stop and reschedule an immediate timer to continue processing them. This lets QEMU go back to the main loop and catch up. Signed-off-by: Ross Lagerwall --- hw/xen/xen-hvm-common.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] xen-hvm: Avoid livelock while handling buffered ioreqs
On 08/04/2024 14:00, Ross Lagerwall wrote: On Sat, Apr 6, 2024 at 11:58 AM Durrant, Paul wrote: On 04/04/2024 15:08, Ross Lagerwall wrote: A malicious or buggy guest may generated buffered ioreqs faster than QEMU can process them in handle_buffered_iopage(). The result is a livelock - QEMU continuously processes ioreqs on the main thread without iterating through the main loop which prevents handling other events, processing timers, etc. Without QEMU handling other events, it often results in the guest becoming unsable and makes it difficult to stop the source of buffered ioreqs. To avoid this, if we process a full page of buffered ioreqs, stop and reschedule an immediate timer to continue processing them. This lets QEMU go back to the main loop and catch up. Do PV backends potentially cause the same scheduling issue (if not using io threads)? From what I can tell: xen-block: It reads req_prod / req_cons once before entering the loop so it should be fine, I think. xen_console: Same as xen-block xen_nic: It reads req_prod / req_cons once before entering the loop. However, once the loop ends it checks for more requests and if there are more requests it restarts from the beginning. It seems like this could be susceptible to the same issue. (These PV backends generally aren't used by XenServer's system QEMU so I didn't spend too much time looking into it.) Thanks, Ok. Thanks for checking. Paul
Re: [PATCH v4 3/6] xen: decouple generic xen code from legacy backends codebase
On 02/12/2023 01:41, Volodymyr Babchuk wrote: In xen-all.c there are unneeded dependencies on xen-legacy-backend.c: - xen_init() uses xen_pv_printf() to report errors, but it does not provide a pointer to the struct XenLegacyDevice, so it is kind of useless, we can use standard error_report() instead. - xen-all.c has function xenstore_record_dm_state() which uses global variable "xenstore" defined and initialized in xen-legacy-backend.c It is used exactly once, so we can just open a new connection to the xenstore, update DM state and close connection back. Those two changes allows us to remove xen-legacy-backend.c at all, what should be done in the future anyways. But right now this patch moves us one step close to have QEMU build without legacy Xen backends. Signed-off-by: Volodymyr Babchuk --- In v4: - New in v4, previous was part of "xen: add option to disable legacy backends" - Do not move xenstore global variable from xen-legacy-backend.c, instead use a local variable. --- accel/xen/xen-all.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 05/12] block: remove AioContext locking
On 29/11/2023 19:55, Stefan Hajnoczi wrote: This is the big patch that removes aio_context_acquire()/aio_context_release() from the block layer and affected block layer users. There isn't a clean way to split this patch and the reviewers are likely the same group of people, so I decided to do it in one patch. Signed-off-by: Stefan Hajnoczi --- include/block/block-global-state.h | 9 +- include/block/block-io.h | 3 +- include/block/snapshot.h | 2 - block.c| 234 +- block/block-backend.c | 14 -- block/copy-before-write.c | 22 +-- block/export/export.c | 22 +-- block/io.c | 45 + block/mirror.c | 19 -- block/monitor/bitmap-qmp-cmds.c| 20 +- block/monitor/block-hmp-cmds.c | 29 --- block/qapi-sysemu.c| 27 +-- block/qapi.c | 18 +- block/raw-format.c | 5 - block/replication.c| 58 +- block/snapshot.c | 22 +-- block/write-threshold.c| 6 - blockdev.c | 307 + blockjob.c | 18 -- hw/block/dataplane/virtio-blk.c| 10 - hw/block/dataplane/xen-block.c | 17 +- hw/block/virtio-blk.c | 45 + hw/core/qdev-properties-system.c | 9 - job.c | 16 -- migration/block.c | 33 +--- migration/migration-hmp-cmds.c | 3 - migration/savevm.c | 22 --- net/colo-compare.c | 2 - qemu-img.c | 4 - qemu-io.c | 10 +- qemu-nbd.c | 2 - replay/replay-debugging.c | 4 - tests/unit/test-bdrv-drain.c | 51 + tests/unit/test-bdrv-graph-mod.c | 6 - tests/unit/test-block-iothread.c | 31 --- tests/unit/test-blockjob.c | 137 - tests/unit/test-replication.c | 11 -- util/async.c | 4 - util/vhost-user-server.c | 3 - scripts/block-coroutine-wrapper.py | 3 - tests/tsan/suppressions.tsan | 1 - 41 files changed, 102 insertions(+), 1202 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 2/6] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to QEMU_BQL_LOCK_GUARD
On 29/11/2023 21:26, Stefan Hajnoczi wrote: The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL) instead, it is already widely used and unambiguous. Signed-off-by: Stefan Hajnoczi --- include/qemu/main-loop.h | 20 ++-- hw/i386/kvm/xen_evtchn.c | 14 +++--- hw/i386/kvm/xen_gnttab.c | 2 +- hw/mips/mips_int.c| 2 +- hw/ppc/ppc.c | 2 +- target/i386/kvm/xen-emu.c | 2 +- target/ppc/excp_helper.c | 2 +- target/ppc/helper_regs.c | 2 +- target/riscv/cpu_helper.c | 4 ++-- 9 files changed, 25 insertions(+), 25 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()
m.c | 4 +- target/s390x/tcg/misc_helper.c | 118 +-- target/sparc/int32_helper.c | 2 +- target/sparc/int64_helper.c | 6 +- target/sparc/win_helper.c| 20 ++--- target/xtensa/exc_helper.c | 8 +- ui/spice-core.c | 4 +- util/async.c | 2 +- util/main-loop.c | 8 +- util/rcu.c | 14 ++-- audio/coreaudio.m| 4 +- memory_ldst.c.inc| 18 ++-- target/i386/hvf/README.md| 2 +- ui/cocoa.m | 50 ++-- 94 files changed, 502 insertions(+), 502 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 4/6] xen_pvdev: Do not assume Dom0 when creating a directory
On 23/11/2023 12:27, David Woodhouse wrote: On 23 November 2023 12:17:57 GMT, Volodymyr Babchuk wrote: Hi David, David Woodhouse writes: Which PV backends do you care about? We already have net, block and console converted. Well, this is all what we need, actually. Even console only will be sufficient, as we are using QEMU to provide virtio-pci backends, so both storage and networking should be provided by virtio. Are you proposing to just drop this patch at all? I believe we can live without it, yes. Yeah, I think you can drop anything that's only needed for the legacy backends. I'm tempted to make a separate config option to compile those out. I think that would be a good idea. The other legacy bacckend that we may need to care about is xenfb... not so much the framebuffer itself, but the mouse and keyboard aspects. The XENVKBD and XENHID drivers expose PV mouse and keyboard to Windows instances so it's be nice if we can avoid the backend withering away. Paul
Re: [PATCH v2 1/6] hw/xen: Set XenBackendInstance in the XenDevice before realizing it
On 22/11/2023 23:04, David Woodhouse wrote: On Wed, 2023-11-22 at 22:56 +, Volodymyr Babchuk wrote: Paul Durrant writes: On 21/11/2023 22:10, Volodymyr Babchuk wrote: From: David Woodhouse This allows a XenDevice implementation to know whether it was created by QEMU, or merely discovered in XenStore after the toolstack created it. This will allow us to create frontend/backend nodes only when we should, rather than unconditionally attempting to overwrite them from a driver domain which doesn't have privileges to do so. As an added benefit, it also means we no longer have to call the xen_backend_set_device() function from the device models immediately after calling qdev_realize_and_unref(). Even though we could make the argument that it's safe to do so, and the pointer to the unreffed device *will* actually still be valid, it still made my skin itch to look at it. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 3 +-- hw/char/xen_console.c | 2 +- hw/net/xen_nic.c | 2 +- hw/xen/xen-bus.c | 4 include/hw/xen/xen-backend.h | 2 -- include/hw/xen/xen-bus.h | 2 ++ 6 files changed, 9 insertions(+), 6 deletions(-) Actually, I think you should probably update xen_backend_try_device_destroy() in this patch too. It currently uses xen_backend_list_find() to check whether the specified XenDevice has an associated XenBackendInstance. Sure. Looks like it is the only user of xen_backend_list_find(), so we can get rid of it as well. I'll drop your R-b tag, because of those additional changes in the new version. I think it's fine to keep. He told me to do it :) I confirm that :-)
Re: [PATCH v2 4/6] xen_pvdev: Do not assume Dom0 when creating a directory
On 23/11/2023 00:07, Volodymyr Babchuk wrote: Hi, Volodymyr Babchuk writes: Hi Stefano, Stefano Stabellini writes: On Wed, 22 Nov 2023, David Woodhouse wrote: On Wed, 2023-11-22 at 15:09 -0800, Stefano Stabellini wrote: On Wed, 22 Nov 2023, David Woodhouse wrote: On Wed, 2023-11-22 at 14:29 -0800, Stefano Stabellini wrote: On Wed, 22 Nov 2023, Paul Durrant wrote: On 21/11/2023 22:10, Volodymyr Babchuk wrote: From: Oleksandr Tyshchenko Instead of forcing the owner to domid 0, use XS_PRESERVE_OWNER to inherit the owner of the directory. Ah... so that's why the previous patch is there. This is not the right way to fix it. The QEMU Xen support is *assuming* that QEMU is either running in, or emulating, dom0. In the emulation case this is probably fine, but the 'real Xen' case it should be using the correct domid for node creation. I guess this could either be supplied on the command line or discerned by reading the local domain 'domid' node. yes, it should be passed as command line option to QEMU I'm not sure I like the idea of a command line option for something which QEMU could discover for itself. That's fine too. I meant to say "yes, as far as I know the toolstack passes the domid to QEMU as a command line option today". The -xen-domid argument on the QEMU command line today is the *guest* domain ID, not the domain ID in which QEMU itself is running. Or were you thinking of something different? Ops, you are right and I understand your comment better now. The backend domid is not on the command line but it should be discoverable (on xenstore if I remember right). Yes, it is just "~/domid". I'll add a function that reads it. Just a quick question to QEMU folks: is it better to add a global variable where we will store own Domain ID or it will be okay to read domid from Xenstore every time we need it? If global variable variant is better, what is proffered place to define this variable? system/globals.c ? Actually... is it possible for QEMU just to use a relative path for the backend nodes? That way it won't need to know it's own domid, will it? Paul
Re: [PATCH v2 4/6] xen_pvdev: Do not assume Dom0 when creating a directory
On 21/11/2023 22:10, Volodymyr Babchuk wrote: From: Oleksandr Tyshchenko Instead of forcing the owner to domid 0, use XS_PRESERVE_OWNER to inherit the owner of the directory. Ah... so that's why the previous patch is there. This is not the right way to fix it. The QEMU Xen support is *assuming* that QEMU is either running in, or emulating, dom0. In the emulation case this is probably fine, but the 'real Xen' case it should be using the correct domid for node creation. I guess this could either be supplied on the command line or discerned by reading the local domain 'domid' node. Note that for other than Dom0 domain (non toolstack domain) the "driver_domain" property should be set in domain config file for the toolstack to create required directories in advance. Signed-off-by: Oleksandr Tyshchenko Signed-off-by: Volodymyr Babchuk --- hw/xen/xen_pvdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/xen/xen_pvdev.c b/hw/xen/xen_pvdev.c index c5ad71e8dc..42bdd4f6c8 100644 --- a/hw/xen/xen_pvdev.c +++ b/hw/xen/xen_pvdev.c @@ -60,7 +60,8 @@ void xen_config_cleanup(void) int xenstore_mkdir(char *path, int p) { -if (!qemu_xen_xs_create(xenstore, 0, 0, xen_domid, p, path)) { +if (!qemu_xen_xs_create(xenstore, 0, XS_PRESERVE_OWNER, +xen_domid, p, path)) { xen_pv_printf(NULL, 0, "xs_mkdir %s: failed\n", path); return -1; }
Re: [PATCH v2 3/6] xen: xenstore: add possibility to preserve owner
On 21/11/2023 22:10, Volodymyr Babchuk wrote: Add option to preserve owner when creating an entry in Xen Store. This may be needed in cases when Qemu is working as device model in a *may* be needed? I don't undertstand why this patch is necessary and the commit comment isn't really helping me. domain that is Domain-0, e.g. in driver domain. "owner" parameter for qemu_xen_xs_create() function can have special value XS_PRESERVE_OWNER, which will make specific implementation to get original owner of an entry and pass it back to set_permissions() call. Please note, that XenStore inherits permissions, so even if entry is newly created by, it already has the owner set to match owner of entry at previous level. Signed-off-by: Volodymyr Babchuk
Re: [PATCH v2 1/6] hw/xen: Set XenBackendInstance in the XenDevice before realizing it
On 21/11/2023 22:10, Volodymyr Babchuk wrote: From: David Woodhouse This allows a XenDevice implementation to know whether it was created by QEMU, or merely discovered in XenStore after the toolstack created it. This will allow us to create frontend/backend nodes only when we should, rather than unconditionally attempting to overwrite them from a driver domain which doesn't have privileges to do so. As an added benefit, it also means we no longer have to call the xen_backend_set_device() function from the device models immediately after calling qdev_realize_and_unref(). Even though we could make the argument that it's safe to do so, and the pointer to the unreffed device *will* actually still be valid, it still made my skin itch to look at it. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 3 +-- hw/char/xen_console.c| 2 +- hw/net/xen_nic.c | 2 +- hw/xen/xen-bus.c | 4 include/hw/xen/xen-backend.h | 2 -- include/hw/xen/xen-bus.h | 2 ++ 6 files changed, 9 insertions(+), 6 deletions(-) Actually, I think you should probably update xen_backend_try_device_destroy() in this patch too. It currently uses xen_backend_list_find() to check whether the specified XenDevice has an associated XenBackendInstance. Paul
Re: [PATCH v2 2/6] xen: backends: touch some XenStore nodes only if device...
On 21/11/2023 22:10, Volodymyr Babchuk wrote: was created by QEMU Xen PV devices in QEMU can be created in two ways: either by QEMU itself, if they were passed via command line, or by Xen toolstack. In the latter case, QEMU scans XenStore entries and configures devices accordingly. In the second case we don't want QEMU to write/delete front-end entries for two reasons: it might have no access to those entries if it is running in un-privileged domain and it is just incorrect to overwrite entries already provided by Xen toolstack, because toolstack manages those nodes. For example, it might read backend- or frontend- state to be sure that they are both disconnected and it is safe to destroy a domain. This patch checks presence of xendev->backend to check if Xen PV device is acting as a backend (i.e. it was configured by Xen Technally *all* XenDevice objects are backends. toolstack) to decide if it should touch frontend entries in XenStore. Also, when we need to remove XenStore entries during device teardown only if they weren't created by Xen toolstack. If they were created by toolstack, then it is toolstack's job to do proper clean-up.
Re: [PATCH v2 1/6] hw/xen: Set XenBackendInstance in the XenDevice before realizing it
On 21/11/2023 22:10, Volodymyr Babchuk wrote: From: David Woodhouse This allows a XenDevice implementation to know whether it was created by QEMU, or merely discovered in XenStore after the toolstack created it. This will allow us to create frontend/backend nodes only when we should, rather than unconditionally attempting to overwrite them from a driver domain which doesn't have privileges to do so. As an added benefit, it also means we no longer have to call the xen_backend_set_device() function from the device models immediately after calling qdev_realize_and_unref(). Even though we could make the argument that it's safe to do so, and the pointer to the unreffed device *will* actually still be valid, it still made my skin itch to look at it. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 3 +-- hw/char/xen_console.c| 2 +- hw/net/xen_nic.c | 2 +- hw/xen/xen-bus.c | 4 include/hw/xen/xen-backend.h | 2 -- include/hw/xen/xen-bus.h | 2 ++ 6 files changed, 9 insertions(+), 6 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 2/3] vl: disable default serial when xen-console is enabled
On 15/11/2023 17:24, David Woodhouse wrote: From: David Woodhouse If a Xen console is configured on the command line, do not add a default serial port. Signed-off-by: David Woodhouse --- system/vl.c | 1 + 1 file changed, 1 insertion(+) Reviewed-by: Paul Durrant
Re: [PATCH 3/3] hw/xen: clean up xen_block_find_free_vdev() to avoid Coverity false positive
On 15/11/2023 12:24, David Woodhouse wrote: From: David Woodhouse Coverity couldn't see that nr_existing was always going to be zero when qemu_xen_xs_directory() returned NULL in the ENOENT case (CID 1523906). Perhaps more to the point, neither could Peter at first glance. Improve the code to hopefully make it clearer to Coverity and human reviewers alike. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 24 +--- 1 file changed, 21 insertions(+), 3 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v1 3/7] xen: xenstore: add possibility to preserve owner
On 10/11/2023 15:42, Volodymyr Babchuk wrote: Add option to preserve owner when creating an entry in Xen Store. This may be needed in cases when Qemu is working as device model in a domain that is Domain-0, e.g. in driver domain. "owner" parameter for qemu_xen_xs_create() function can have special value XS_PRESERVE_OWNER, which will make specific implementation to get original owner of an entry and pass it back to set_permissions() call. If QEMU is running in a driver domain then it should know whether the domid of the domain it is running in and use that. Yes, it is hardcoded to 0 at the moment but surely a backend domid (which defaults to 0) could be passed on the command line? Paul Signed-off-by: Volodymyr Babchuk --- hw/i386/kvm/xen_xenstore.c | 18 ++ hw/xen/xen-operations.c | 12 include/hw/xen/xen_backend_ops.h | 2 ++ 3 files changed, 32 insertions(+) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index 660d0b72f9..7b894a9884 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -1572,6 +1572,24 @@ static bool xs_be_create(struct qemu_xs_handle *h, xs_transaction_t t, return false; } +if (owner == XS_PRESERVE_OWNER) { +GList *perms; +char letter; + +err = xs_impl_get_perms(h->impl, 0, t, path, &perms); +if (err) { +errno = err; +return false; +} + +if (sscanf(perms->data, "%c%u", &letter, &owner) != 2) { +errno = EFAULT; +g_list_free_full(perms, g_free); +return false; +} +g_list_free_full(perms, g_free); +} + perms_list = g_list_append(perms_list, xs_perm_as_string(XS_PERM_NONE, owner)); perms_list = g_list_append(perms_list, diff --git a/hw/xen/xen-operations.c b/hw/xen/xen-operations.c index e00983ec44..1df59b3c08 100644 --- a/hw/xen/xen-operations.c +++ b/hw/xen/xen-operations.c @@ -300,6 +300,18 @@ static bool libxenstore_create(struct qemu_xs_handle *h, xs_transaction_t t, return false; } +if (owner == XS_PRESERVE_OWNER) { +struct xs_permissions *tmp; +unsigned int num; + +tmp = xs_get_permissions(h->xsh, 0, path, &num); +if (tmp == NULL) { +return false; +} +perms_list[0].id = tmp[0].id; +free(tmp); +} + return xs_set_permissions(h->xsh, t, path, perms_list, ARRAY_SIZE(perms_list)); } diff --git a/include/hw/xen/xen_backend_ops.h b/include/hw/xen/xen_backend_ops.h index 90cca85f52..273e414559 100644 --- a/include/hw/xen/xen_backend_ops.h +++ b/include/hw/xen/xen_backend_ops.h @@ -266,6 +266,8 @@ typedef uint32_t xs_transaction_t; #define XS_PERM_READ 0x01 #define XS_PERM_WRITE 0x02 +#define XS_PRESERVE_OWNER0xFFFE + struct xenstore_backend_ops { struct qemu_xs_handle *(*open)(void); void (*close)(struct qemu_xs_handle *h);
Re: [PATCH v1 5/7] xen-bus: Set offline if backend's state is XenbusStateClosed
On 10/11/2023 15:42, Volodymyr Babchuk wrote: From: Oleksandr Tyshchenko Both state (XenbusStateClosed) and online (0) are expected by toolstack/xl devd to completely destroy the device. But "offline" is never being set by the backend resulting in timeout during domain destruction, garbage in Xestore and still running Qemu instance. Signed-off-by: Oleksandr Tyshchenko Signed-off-by: Volodymyr Babchuk --- hw/xen/xen-bus.c | 4 1 file changed, 4 insertions(+) diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index 75474d4b43..6e7ec3af64 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -519,6 +519,10 @@ static void xen_device_backend_changed(void *opaque, const char *path) xen_device_backend_set_state(xendev, XenbusStateClosed); } +if (xen_device_backend_get_state(xendev) == XenbusStateClosed) { +xen_device_backend_set_online(xendev, false); +} + /* * If a backend is still 'online' then we should leave it alone but, * if a backend is not 'online', then the device is a candidate I don't understand what you're trying to do here. Just a few lines up from this hunk there is: 506if (xen_device_backend_scanf(xendev, "online", "%u", &online) != 1) { 507online = 0; 508} 509 510xen_device_backend_set_online(xendev, !!online); Why is this not sufficient? What happens if the frontend decides to stop and start (e.g. for a driver update)? I'm guessing the backend will be destroyed... which is not very friendly. Paul
Re: [PATCH v1 1/7] xen-block: Do not write frontend nodes
On 10/11/2023 15:42, Volodymyr Babchuk wrote: From: Oleksandr Tyshchenko The PV backend running in other than Dom0 domain (non toolstack domain) is not allowed to write frontend nodes. The more, the backend does not need to do that at all, this is purely toolstack/xl devd business. I do not know for what reason the backend does that here, this is not really needed, probably it is just a leftover and all xen_device_frontend_printf() instances should go away completely. It is not a leftover and it is needed in the case that QEMU is instantiating the backend unilaterally... i.e. without the involvement of any Xen toolstack. Agreed that, if QEMU, is running in a deprivileged context, that is not an option. The correct way to determined this though is whether the device is being created via the QEMU command line or whether is being created because XenStore nodes written by a toolstack were discovered. In the latter case there should be a XenBackendInstance that corresponds to the XenDevice whereas in the former case there should not be. For example, see xen_backend_try_device_destroy() Paul Signed-off-by: Oleksandr Tyshchenko Signed-off-by: Volodymyr Babchuk --- hw/block/xen-block.c | 11 +++ hw/xen/xen-bus.c | 2 +- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c index a07cd7eb5d..dc4d477c22 100644 --- a/hw/block/xen-block.c +++ b/hw/block/xen-block.c @@ -221,6 +221,7 @@ static void xen_block_realize(XenDevice *xendev, Error **errp) XenBlockVdev *vdev = &blockdev->props.vdev; BlockConf *conf = &blockdev->props.conf; BlockBackend *blk = conf->blk; +XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev))); if (vdev->type == XEN_BLOCK_VDEV_TYPE_INVALID) { error_setg(errp, "vdev property not set"); @@ -280,10 +281,12 @@ static void xen_block_realize(XenDevice *xendev, Error **errp) xen_device_backend_printf(xendev, "info", "%u", blockdev->info); -xen_device_frontend_printf(xendev, "virtual-device", "%lu", - vdev->number); -xen_device_frontend_printf(xendev, "device-type", "%s", - blockdev->device_type); +if (xenbus->backend_id == 0) { +xen_device_frontend_printf(xendev, "virtual-device", "%lu", + vdev->number); +xen_device_frontend_printf(xendev, "device-type", "%s", + blockdev->device_type); +} xen_device_backend_printf(xendev, "sector-size", "%u", conf->logical_block_size); diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index ece8ec40cd..06d5192aca 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -1048,7 +1048,7 @@ static void xen_device_realize(DeviceState *dev, Error **errp) xen_device_backend_set_online(xendev, true); xen_device_backend_set_state(xendev, XenbusStateInitWait); -if (!xen_device_frontend_exists(xendev)) { +if (!xen_device_frontend_exists(xendev) && xenbus->backend_id == 0) { xen_device_frontend_printf(xendev, "backend", "%s", xendev->backend_path); xen_device_frontend_printf(xendev, "backend-id", "%u",
Re: [PATCH] hw/xen: clean up xen_block_find_free_vdev() to avoid Coverity false positive
On 09/11/2023 15:30, David Woodhouse wrote: From: David Woodhouse Coverity couldn't see that nr_existing was always going to be zero when qemu_xen_xs_directory() returned NULL in the ENOENT case (CID 1523906). Perhaps more to the point, neither could Peter at first glance. Improve the code to hopefully make it clearer to Coverity and human reviewers alike. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 24 +--- 1 file changed, 21 insertions(+), 3 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v4 17/17] docs: update Xen-on-KVM documentation
On 06/11/2023 14:35, David Woodhouse wrote: From: David Woodhouse Add notes about console and network support, and how to launch PV guests. Clean up the disk configuration examples now that that's simpler, and remove the comment about IDE unplug on q35/AHCI now that it's fixed. Update the -initrd option documentation to explain how to quote commas in module command lines, and reference it when documenting PV guests. Also update stale avocado test filename in MAINTAINERS. Signed-off-by: David Woodhouse --- MAINTAINERS | 2 +- docs/system/i386/xen.rst | 107 +-- qemu-options.hx | 14 +++-- 3 files changed, 91 insertions(+), 32 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v4 16/17] doc/sphinx/hxtool.py: add optional label argument to SRST directive
On 06/11/2023 14:35, David Woodhouse wrote: From: David Woodhouse We can't just embed labels directly into files like qemu-options.hx which are included from multiple top-level RST files, because Sphinx sees the labels as duplicate: https://github.com/sphinx-doc/sphinx/issues/9707 So add an 'emitrefs' option to the Sphinx hxtool-doc directive, which is set only in invocation.rst and not from the HTML rendition of the man page. Along with an argument to the SRST directive which causes a label of the form '.. _LABEL-reference-label:' to be emitted when the emitrefs option is set. Signed-off-by: David Woodhouse --- docs/sphinx/hxtool.py | 18 +- docs/system/invocation.rst | 1 + 2 files changed, 18 insertions(+), 1 deletion(-) Reviewed-by: Paul Durrant
Re: [PATCH v4 13/17] hw/i386/pc: support '-nic' for xen-net-device
On 06/11/2023 14:35, David Woodhouse wrote: From: David Woodhouse The default NIC creation seems a bit hackish to me. I don't understand why each platform has to call pci_nic_init_nofail() from a point in the code where it actually has a pointer to the PCI bus, and then we have the special cases for things like ne2k_isa. If qmp_device_add() can *find* the appropriate bus and instantiate the device on it, why can't we just do that from generic code for creating the default NICs too? But that isn't a yak I want to shave today. Add a xenbus field to the PCMachineState so that it can make its way from pc_basic_device_init() to pc_nic_init() and be handled as a special case like ne2k_isa is. Now we can launch emulated Xen guests with '-nic user'. Signed-off-by: David Woodhouse --- hw/i386/pc.c | 11 --- hw/i386/pc_piix.c| 2 +- hw/i386/pc_q35.c | 2 +- hw/xen/xen-bus.c | 4 +++- include/hw/i386/pc.h | 4 +++- include/hw/xen/xen-bus.h | 2 +- 6 files changed, 17 insertions(+), 8 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v4 06/17] hw/xen: automatically assign device index to block devices
On 06/11/2023 14:34, David Woodhouse wrote: From: David Woodhouse There's no need to force the user to assign a vdev. We can automatically assign one, starting at xvda and searching until we find the first disk name that's unused. This means we can now allow '-drive if=xen,file=xxx' to work without an explicit separate -driver argument, just like if=virtio. Rip out the legacy handling from the xenpv machine, which was scribbling over any disks configured by the toolstack, and didn't work with anything but raw images. Signed-off-by: David Woodhouse Acked-by: Kevin Wolf --- blockdev.c | 15 +++- hw/block/xen-block.c| 118 ++-- hw/xen/xen_devconfig.c | 28 --- hw/xenpv/xen_machine_pv.c | 9 --- include/hw/xen/xen-legacy-backend.h | 1 - 5 files changed, 125 insertions(+), 46 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] xen/pt: Emulate multifunction bit in header type
On 03/11/2023 17:26, Ross Lagerwall wrote: The intention of the code appears to have been to unconditionally set the multifunction bit but since the emulation mask is 0x00 it has no effect. Instead, emulate the bit and set it based on the multifunction property of the PCIDevice (which can be set using QAPI). This allows making passthrough devices appear as functions in a Xen guest. Signed-off-by: Ross Lagerwall Reviewed-by: Paul Durrant
Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode
On 25/10/2023 10:00, David Woodhouse wrote: On Wed, 2023-10-25 at 09:31 +0100, Paul Durrant wrote: On 24/10/2023 17:34, David Woodhouse wrote: On Tue, 2023-10-24 at 17:25 +0100, Paul Durrant wrote: On 24/10/2023 16:49, David Woodhouse wrote: On Tue, 2023-10-24 at 16:39 +0100, Paul Durrant wrote: On 24/10/2023 16:37, David Woodhouse wrote: On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote: On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse The primary console is special because the toolstack maps a page at a fixed GFN and also allocates the guest-side event channel. Add support for that in emulated mode, so that we can have a primary console. Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which supports literally nothing except a single-page mapping of the console page. This might as well have been a hack in the xen_console driver, but this way at least the special-casing is kept within the Xen emulation code, and it gives us a hook for a more complete implementation if/when we ever do need one. Why can't you map the console page via the grant table like the xenstore page? I suppose we could, but I didn't really want the generic xen-console device code having any more of a special case for 'Xen emulation' than it does already by having to call xen_primary_console_create(). But doesn't is save you the whole foreignmem thing? You can use the grant table for primary and secondary consoles. Yes. And I could leave the existing foreignmem thing just for the case of primary console under true Xen. It's probably not that awful a special case, in the end. Then again, I was surprised I didn't *already* have a foreignmem ops for the emulated case, and we're probably going to want to continue fleshing it out later, so I don't really mind adding it. True. We'll need it for some of the other more fun protocols like vkbd or fb. Still, I think it'd be nicer to align the xenstore and primary console code to look similar and punt the work until then :-) I don't think it ends up looking like xenstore either way, does it? Xenstore is special because it gets to use the original pointer to its own page. Not sure what you mean there? A guest can query the PFN for either xenstore or console using HVM params, or it can find them in its own grant table entries 0 or 1. The code in our xen_xenstore.c uses its *own* pointer (s->xs) to the MemoryRegion that it created (s->xenstore_page). It is its own backend, as well as doing the "magic" to create the guest-side mapping and event channel. The difference for the console code is that we actually have a *separation* between the standard backend code in xen_console.c, and the magic frontend parts for the emulated mode. I don't think I want to hack the xen_console code to explicitly call a xen_console_give_me_your_page() function. If not foreignmem, I think you were suggesting that we actually call the grant mapping code to get a pointer to the underlying page, right? I'm suggesting that the page be mapped in the same way that the xenstore backend does: 1462 /* 1463 * We don't actually access the guest's page through the grant, because 1464 * this isn't real Xen, and we can just use the page we gave it in the 1465 * first place. Map the grant anyway, mostly for cosmetic purposes so 1466 * it *looks* like it's in use in the guest-visible grant table. 1467 */ 1468 s->gt = qemu_xen_gnttab_open(); 1469 uint32_t xs_gntref = GNTTAB_RESERVED_XENSTORE; 1470 s->granted_xs = qemu_xen_gnttab_map_refs(s->gt, 1, xen_domid, &xs_gntref, 1471 PROT_READ | PROT_WRITE); It already *is*. But as with xen_xenstore.c, nothing ever *uses* the s->granted_xs pointer. It's just cosmetic to make the grant table look right. But that doesn't help the *backend* code. The backend doesn't even know the grant ref#, because the convention we inherited from Xen is that the `ring-ref` in XenStore for the primary console is actually the MFN, to be mapped as foreignmem. Of course, we *do* know the grant-ref for the primary console, as it's always GNTTAB_RESERVED_CONSOLE. So I suppose we could put a hack into the xen_console backend to map *that* in the case of primary console under emu? In fact that would probably do the right thing even under Xen if we could persuade Xen to make an ioemu primary console? That's exactly what I am getting at :-) I don't think we need care about the ring-ref in xenstore for the primary console. Paul I could kind of live with that... except that Xen has this ugly convention that the "ring-ref" frontend node for the primary console actually has the *MFN* not a grant ref. Which I don't understand since the toolstack *does* populate the grant table for it (just
Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode
On 24/10/2023 17:34, David Woodhouse wrote: On Tue, 2023-10-24 at 17:25 +0100, Paul Durrant wrote: On 24/10/2023 16:49, David Woodhouse wrote: On Tue, 2023-10-24 at 16:39 +0100, Paul Durrant wrote: On 24/10/2023 16:37, David Woodhouse wrote: On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote: On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse The primary console is special because the toolstack maps a page at a fixed GFN and also allocates the guest-side event channel. Add support for that in emulated mode, so that we can have a primary console. Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which supports literally nothing except a single-page mapping of the console page. This might as well have been a hack in the xen_console driver, but this way at least the special-casing is kept within the Xen emulation code, and it gives us a hook for a more complete implementation if/when we ever do need one. Why can't you map the console page via the grant table like the xenstore page? I suppose we could, but I didn't really want the generic xen-console device code having any more of a special case for 'Xen emulation' than it does already by having to call xen_primary_console_create(). But doesn't is save you the whole foreignmem thing? You can use the grant table for primary and secondary consoles. Yes. And I could leave the existing foreignmem thing just for the case of primary console under true Xen. It's probably not that awful a special case, in the end. Then again, I was surprised I didn't *already* have a foreignmem ops for the emulated case, and we're probably going to want to continue fleshing it out later, so I don't really mind adding it. True. We'll need it for some of the other more fun protocols like vkbd or fb. Still, I think it'd be nicer to align the xenstore and primary console code to look similar and punt the work until then :-) I don't think it ends up looking like xenstore either way, does it? Xenstore is special because it gets to use the original pointer to its own page. Not sure what you mean there? A guest can query the PFN for either xenstore or console using HVM params, or it can find them in its own grant table entries 0 or 1. I don't think I want to hack the xen_console code to explicitly call a xen_console_give_me_your_page() function. If not foreignmem, I think you were suggesting that we actually call the grant mapping code to get a pointer to the underlying page, right? I'm suggesting that the page be mapped in the same way that the xenstore backend does: 1462/* 1463 * We don't actually access the guest's page through the grant, because 1464 * this isn't real Xen, and we can just use the page we gave it in the 1465 * first place. Map the grant anyway, mostly for cosmetic purposes so 1466 * it *looks* like it's in use in the guest-visible grant table. 1467 */ 1468s->gt = qemu_xen_gnttab_open(); 1469uint32_t xs_gntref = GNTTAB_RESERVED_XENSTORE; 1470s->granted_xs = qemu_xen_gnttab_map_refs(s->gt, 1, xen_domid, &xs_gntref, 1471 PROT_READ | PROT_WRITE); I could kind of live with that... except that Xen has this ugly convention that the "ring-ref" frontend node for the primary console actually has the *MFN* not a grant ref. Which I don't understand since the toolstack *does* populate the grant table for it (just as it does for the xenstore page). But we'd have to add a special case exception to that special case, so that in the emu case it's an actual grant ref again. I think I prefer just having a stub of foreignmem, TBH. You're worried about the guest changing the page it uses for the primary console and putting a new one in xenstore? I'd be amazed if that even works on Xen unless the guest is careful to write it into GNTTAB_RESERVED_CONSOLE. (I didn't yet manage to get Xen to actually create a primary console of type iomem, FWIW) No, that doesn't entirely surprise me. Paul
Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode
On 24/10/2023 16:49, David Woodhouse wrote: On Tue, 2023-10-24 at 16:39 +0100, Paul Durrant wrote: On 24/10/2023 16:37, David Woodhouse wrote: On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote: On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse The primary console is special because the toolstack maps a page at a fixed GFN and also allocates the guest-side event channel. Add support for that in emulated mode, so that we can have a primary console. Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which supports literally nothing except a single-page mapping of the console page. This might as well have been a hack in the xen_console driver, but this way at least the special-casing is kept within the Xen emulation code, and it gives us a hook for a more complete implementation if/when we ever do need one. Why can't you map the console page via the grant table like the xenstore page? I suppose we could, but I didn't really want the generic xen-console device code having any more of a special case for 'Xen emulation' than it does already by having to call xen_primary_console_create(). But doesn't is save you the whole foreignmem thing? You can use the grant table for primary and secondary consoles. Yes. And I could leave the existing foreignmem thing just for the case of primary console under true Xen. It's probably not that awful a special case, in the end. Then again, I was surprised I didn't *already* have a foreignmem ops for the emulated case, and we're probably going to want to continue fleshing it out later, so I don't really mind adding it. True. We'll need it for some of the other more fun protocols like vkbd or fb. Still, I think it'd be nicer to align the xenstore and primary console code to look similar and punt the work until then :-) Paul
Re: [PATCH v2 16/24] hw/xen: handle soft reset for primary console
On 24/10/2023 16:48, David Woodhouse wrote: On Tue, 2023-10-24 at 16:44 +0100, Paul Durrant wrote: On 19/10/2023 16:40, David Woodhouse wrote: From: David Woodhouse On soft reset, the prinary console event channel needs to be rebound to the backend port (in the xen-console driver). We could put that into the xen-console driver itself, but it's slightly less ugly to keep it within the KVM/Xen code, by stashing the backend port# on event channel reset and then rebinding in the primary console reset when it has to recreate the guest port anyway. Does Xen re-bind the primary console on EVTCHNOP_reset? That's news to me. I go check. I spent an unhapp hour trying to work out how Xen actually does any of this :) In the short term I'm more interested in having soft reset work, than an explicit EVTCHNOP_reset. And I can't work out *how*, but we do seem to have console again after a kexec in real Xen. *Soft* reset may do it, but not the EVTCHNOP_reset hypercall itself, because there's a bunch of impenetrable toolstack magic involved the former. Perhaps you could just push the re-bind code up a layer into kvm_xen_soft_reset(). Paul
Re: [PATCH 2/4] hw/xen: update Xen PV NIC to XenDevice model
On 24/10/2023 16:17, David Woodhouse wrote: [snip] I don't think that's really a valid state for a network frontend. Linux netback just ignores it. Must we? I was thinking of making the ->frontend_changed() methods optional and allowing backends to just provide ->connect() and ->disconnect() methods instead if they wanted to. Because we have three identical ->frontend_changed() methods now... Now maybe... Not sure things will look so common when other backends are converted. I'd prefer to maintain fidelity with Linux xen-netback as it is generally considered to be the canonical implementation. Paul
Re: [PATCH v2 16/24] hw/xen: handle soft reset for primary console
On 19/10/2023 16:40, David Woodhouse wrote: From: David Woodhouse On soft reset, the prinary console event channel needs to be rebound to the backend port (in the xen-console driver). We could put that into the xen-console driver itself, but it's slightly less ugly to keep it within the KVM/Xen code, by stashing the backend port# on event channel reset and then rebinding in the primary console reset when it has to recreate the guest port anyway. Does Xen re-bind the primary console on EVTCHNOP_reset? That's news to me. I go check. Paul Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 9 + hw/i386/kvm/xen_primary_console.c | 29 - hw/i386/kvm/xen_primary_console.h | 1 + 3 files changed, 38 insertions(+), 1 deletion(-) diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c index d72dca6591..ce4da6d37a 100644 --- a/hw/i386/kvm/xen_evtchn.c +++ b/hw/i386/kvm/xen_evtchn.c @@ -40,6 +40,7 @@ #include "xen_evtchn.h" #include "xen_overlay.h" #include "xen_xenstore.h" +#include "xen_primary_console.h" #include "sysemu/kvm.h" #include "sysemu/kvm_xen.h" @@ -1098,6 +1099,7 @@ int xen_evtchn_soft_reset(void) { XenEvtchnState *s = xen_evtchn_singleton; bool flush_kvm_routes; +uint16_t con_port = xen_primary_console_get_port(); int i; if (!s) { @@ -1108,6 +1110,13 @@ int xen_evtchn_soft_reset(void) qemu_mutex_lock(&s->port_lock); +if (con_port) { +XenEvtchnPort *p = &s->port_table[con_port]; +if (p->type == EVTCHNSTAT_interdomain) { +xen_primary_console_set_be_port(p->u.interdomain.port); +} +} + for (i = 0; i < s->nr_ports; i++) { close_port(s, i, &flush_kvm_routes); } diff --git a/hw/i386/kvm/xen_primary_console.c b/hw/i386/kvm/xen_primary_console.c index 0aa1c16ad6..5e6e085ac7 100644 --- a/hw/i386/kvm/xen_primary_console.c +++ b/hw/i386/kvm/xen_primary_console.c @@ -112,6 +112,15 @@ uint16_t xen_primary_console_get_port(void) return s->guest_port; } +void xen_primary_console_set_be_port(uint16_t port) +{ +XenPrimaryConsoleState *s = xen_primary_console_singleton; +if (s) { +printf("be port set to %d\n", port); +s->be_port = port; +} +} + uint64_t xen_primary_console_get_pfn(void) { XenPrimaryConsoleState *s = xen_primary_console_singleton; @@ -142,6 +151,20 @@ static void alloc_guest_port(XenPrimaryConsoleState *s) } } +static void rebind_guest_port(XenPrimaryConsoleState *s) +{ +struct evtchn_bind_interdomain inter = { +.remote_dom = DOMID_QEMU, +.remote_port = s->be_port, +}; + +if (!xen_evtchn_bind_interdomain_op(&inter)) { +s->guest_port = inter.local_port; +} + +s->be_port = 0; +} + int xen_primary_console_reset(void) { XenPrimaryConsoleState *s = xen_primary_console_singleton; @@ -154,7 +177,11 @@ int xen_primary_console_reset(void) xen_overlay_do_map_page(&s->console_page, gpa); } -alloc_guest_port(s); +if (s->be_port) { +rebind_guest_port(s); +} else { +alloc_guest_port(s); +} trace_xen_primary_console_reset(s->guest_port); diff --git a/hw/i386/kvm/xen_primary_console.h b/hw/i386/kvm/xen_primary_console.h index dd4922f3f4..7e2989ea0d 100644 --- a/hw/i386/kvm/xen_primary_console.h +++ b/hw/i386/kvm/xen_primary_console.h @@ -16,6 +16,7 @@ void xen_primary_console_create(void); int xen_primary_console_reset(void); uint16_t xen_primary_console_get_port(void); +void xen_primary_console_set_be_port(uint16_t port); uint64_t xen_primary_console_get_pfn(void); void *xen_primary_console_get_map(void);
Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode
On 24/10/2023 16:37, David Woodhouse wrote: On Tue, 2023-10-24 at 15:20 +0100, Paul Durrant wrote: On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse The primary console is special because the toolstack maps a page at a fixed GFN and also allocates the guest-side event channel. Add support for that in emulated mode, so that we can have a primary console. Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which supports literally nothing except a single-page mapping of the console page. This might as well have been a hack in the xen_console driver, but this way at least the special-casing is kept within the Xen emulation code, and it gives us a hook for a more complete implementation if/when we ever do need one. Why can't you map the console page via the grant table like the xenstore page? I suppose we could, but I didn't really want the generic xen-console device code having any more of a special case for 'Xen emulation' than it does already by having to call xen_primary_console_create(). But doesn't is save you the whole foreignmem thing? You can use the grant table for primary and secondary consoles. Paul
Re: [PATCH v2 10/24] hw/xen: populate store frontend nodes with XenStore PFN/port
On 19/10/2023 16:40, David Woodhouse wrote: From: David Woodhouse This is kind of redundant since without being able to get these through some other method (HVMOP_get_param) the guest wouldn't be able to access XenStore in order to find them. But Xen populates them, and it does allow guests to *rebind* to the event channel port after a reset. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 11 +++ 1 file changed, 11 insertions(+) Reviewed-by: Paul Durrant
Re: [PATCH v2 05/24] hw/xen: fix XenStore watch delivery to guest
On 19/10/2023 16:40, David Woodhouse wrote: From: David Woodhouse When fire_watch_cb() found the response buffer empty, it would call deliver_watch() to generate the XS_WATCH_EVENT message in the response buffer and send an event channel notification to the guest… without actually *copying* the response buffer into the ring. So there was nothing for the guest to see. The pending response didn't actually get processed into the ring until the guest next triggered some activity from its side. Add the missing call to put_rsp(). It might have been slightly nicer to call xen_xenstore_event() here, which would *almost* have worked. Except for the fact that it calls xen_be_evtchn_pending() to check that it really does have an event pending (and clear the eventfd for next time). And under Xen it's defined that setting that fd to O_NONBLOCK isn't guaranteed to work, so the emu implementation follows suit. This fixes Xen device hot-unplug. Fixes: 0254c4d19df ("hw/xen: Add xenstore wire implementation and implementation stubs") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index 660d0b72f9..82a215058a 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -1357,10 +1357,12 @@ static void fire_watch_cb(void *opaque, const char *path, const char *token) } else { deliver_watch(s, path, token); /* - * If the message was queued because there was already ring activity, - * no need to wake the guest. But if not, we need to send the evtchn. + * Attempt to queue the message into the actual ring, and send + * the event channel notification if any bytes are copied. */ -xen_be_evtchn_notify(s->eh, s->be_port); +if (put_rsp(s) > 0) { +xen_be_evtchn_notify(s->eh, s->be_port); +} There's actually the potential for an assertion failure there; if the guest has specified an oversize token then deliver_watch() will not set rsp_pending... then put_rsp() will fail its assertion that the flag is true. Paul } }
Re: [PATCH v2 04/24] hw/xen: don't clear map_track[] in xen_gnttab_reset()
On 19/10/2023 16:40, David Woodhouse wrote: From: David Woodhouse The refcounts actually correspond to 'active_ref' structures stored in a GHashTable per "user" on the backend side (mostly, per XenDevice). If we zero map_track[] on reset, then when the backend drivers get torn down and release their mapping we hit the assert(s->map_track[ref] != 0) in gnt_unref(). So leave them in place. Each backend driver will disconnect and reconnect as the guest comes back up again and reconnects, and it all works out OK in the end as the old refs get dropped. Fixes: de26b2619789 ("hw/xen: Implement soft reset for emulated gnttab") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_gnttab.c | 2 -- 1 file changed, 2 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 03/24] hw/xen: select kernel mode for per-vCPU event channel upcall vector
On 19/10/2023 16:39, David Woodhouse wrote: From: David Woodhouse A guest which has configured the per-vCPU upcall vector may set the HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero. For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support for HVMOP_set_evtchn_upcall_vector") will just do this after setting the vector: /* Trick toolstack to think we are enlightened. */ if (!cpu) rc = xen_set_callback_via(1); That's explicitly setting the delivery to GSI#1, but it's supposed to be overridden by the per-vCPU vector setting. This mostly works in Qemu *except* for the logic to enable the in-kernel handling of event channels, which falsely determines that the kernel cannot accelerate GSI delivery in this case. Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has the vector set, and use that in xen_evtchn_set_callback_param() to enable the kernel acceleration features even when the param *appears* to be set to target a GSI. Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to *zero* the event channel delivery is disabled completely. (Which is what that bizarre guest behaviour is working round in the first place.) Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 6 ++ include/sysemu/kvm_xen.h | 1 + target/i386/kvm/xen-emu.c | 7 +++ 3 files changed, 14 insertions(+) Reviewed-by: Paul Durrant
Re: [PATCH 2/4] hw/xen: update Xen PV NIC to XenDevice model
On 17/10/2023 19:25, David Woodhouse wrote: From: David Woodhouse Signed-off-by: David Woodhouse --- hw/net/meson.build| 2 +- hw/net/trace-events | 9 + hw/net/xen_nic.c | 426 +- hw/xenpv/xen_machine_pv.c | 1 - 4 files changed, 342 insertions(+), 96 deletions(-) diff --git a/hw/net/meson.build b/hw/net/meson.build index 2632634df3..f64651c467 100644 --- a/hw/net/meson.build +++ b/hw/net/meson.build @@ -1,5 +1,5 @@ system_ss.add(when: 'CONFIG_DP8393X', if_true: files('dp8393x.c')) -system_ss.add(when: 'CONFIG_XEN', if_true: files('xen_nic.c')) +system_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xen_nic.c')) system_ss.add(when: 'CONFIG_NE2000_COMMON', if_true: files('ne2000.c')) # PCI network cards diff --git a/hw/net/trace-events b/hw/net/trace-events index 3abfd65e5b..ee56acfbce 100644 --- a/hw/net/trace-events +++ b/hw/net/trace-events @@ -482,3 +482,12 @@ dp8393x_receive_oversize(int size) "oversize packet, pkt_size is %d" dp8393x_receive_not_netcard(void) "packet not for netcard" dp8393x_receive_packet(int crba) "Receive packet at 0x%"PRIx32 dp8393x_receive_write_status(int crba) "Write status at 0x%"PRIx32 + +# xen_nic.c +xen_netdev_realize(int idx, const char *info) "idx %u info '%s'" +xen_netdev_unrealize(int idx) "idx %u" +xen_netdev_create(int idx) "idx %u" +xen_netdev_destroy(int idx) "idx %u" +xen_netdev_disconnect(int idx) "idx %u" +xen_netdev_connect(int idx, unsigned int tx, unsigned int rx, int port) "idx %u tx %u rx %u port %u" +xen_netdev_frontend_changed(const char *idx, int state) "idx %s state %d" diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c index 9bbf6599fc..84914c329c 100644 --- a/hw/net/xen_nic.c +++ b/hw/net/xen_nic.c @@ -20,6 +20,11 @@ */ #include "qemu/osdep.h" +#include "qemu/cutils.h" +#include "qemu/qemu-print.h" +#include "qapi/qmp/qdict.h" +#include "qapi/error.h" + #include #include #include @@ -27,18 +32,26 @@ #include "net/net.h" #include "net/checksum.h" #include "net/util.h" -#include "hw/xen/xen-legacy-backend.h" + +#include "hw/xen/xen-backend.h" +#include "hw/xen/xen-bus-helper.h" +#include "hw/qdev-properties.h" +#include "hw/qdev-properties-system.h" #include "hw/xen/interface/io/netif.h" +#include "hw/xen/interface/io/xs_wire.h" + +#include "trace.h" /* - */ struct XenNetDev { -struct XenLegacyDevice xendev; /* must be first */ -char *mac; +struct XenDevice xendev; /* must be first */ +XenEventChannel *event_channel; +int dev; int tx_work; -int tx_ring_ref; -int rx_ring_ref; +unsigned int tx_ring_ref; +unsigned int rx_ring_ref; struct netif_tx_sring *txs; struct netif_rx_sring *rxs; netif_tx_back_ring_t tx_ring; @@ -47,6 +60,13 @@ struct XenNetDev { NICState *nic; }; +typedef struct XenNetDev XenNetDev; + +#define TYPE_XEN_NET_DEVICE "xen-net-device" +OBJECT_DECLARE_SIMPLE_TYPE(XenNetDev, XEN_NET_DEVICE) + +#define xen_pv_printf(a, n, ...) qemu_printf(__VA_ARGS__) Why define this... + /* - */ static void net_tx_response(struct XenNetDev *netdev, netif_tx_request_t *txp, int8_t st) @@ -68,7 +88,8 @@ static void net_tx_response(struct XenNetDev *netdev, netif_tx_request_t *txp, i netdev->tx_ring.rsp_prod_pvt = ++i; RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netdev->tx_ring, notify); if (notify) { -xen_pv_send_notify(&netdev->xendev); +xen_device_notify_event_channel(XEN_DEVICE(netdev), +netdev->event_channel, NULL); } if (i == netdev->tx_ring.req_cons) { @@ -104,8 +125,9 @@ static void net_tx_error(struct XenNetDev *netdev, netif_tx_request_t *txp, RING #endif } -static void net_tx_packets(struct XenNetDev *netdev) +static bool net_tx_packets(struct XenNetDev *netdev) { +bool done_something = false; netif_tx_request_t txreq; RING_IDX rc, rp; void *page; @@ -122,6 +144,7 @@ static void net_tx_packets(struct XenNetDev *netdev) } memcpy(&txreq, RING_GET_REQUEST(&netdev->tx_ring, rc), sizeof(txreq)); netdev->tx_ring.req_cons = ++rc; +done_something = true; #if 1 /* should not happen in theory, we don't announce the * @@ -151,7 +174,7 @@ static void net_tx_packets(struct XenNetDev *netdev) continue; } -xen_pv_printf(&netdev->xendev, 3, +if (0) qemu_printf(//&netdev->xendev, 3, ... and then not use it here? Perhaps the 'if (0)' ugliness can go in the macro too. "tx packet ref %d, off %d, len %d, flags 0x%x%s%s%
Re: [PATCH 1/4] hw/xen: only remove peers of PCI NICs on unplug
On 17/10/2023 19:25, David Woodhouse wrote: From: David Woodhouse When the Xen guest asks to unplug *emulated* NICs, it's kind of unhelpful also to unplug the peer of the *Xen* PV NIC. Signed-off-by: David Woodhouse --- hw/i386/xen/xen_platform.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c index 17457ff3de..e2dd1b536a 100644 --- a/hw/i386/xen/xen_platform.c +++ b/hw/i386/xen/xen_platform.c @@ -140,9 +140,14 @@ static void unplug_nic(PCIBus *b, PCIDevice *d, void *o) /* Remove the peer of the NIC device. Normally, this would be a tap device. */ static void del_nic_peer(NICState *nic, void *opaque) { -NetClientState *nc; +NetClientState *nc = qemu_get_queue(nic); +ObjectClass *klass = module_object_class_by_name(nc->model); + +/* Only delete peers of PCI NICs that we're about to delete */ +if (!klass || !object_class_dynamic_cast(klass, TYPE_PCI_DEVICE)) { Would it not be better to test for object_class_dynamic_cast(klass, TYPE_XEN_DEVICE)? Paul +return; +} -nc = qemu_get_queue(nic); if (nc->peer) qemu_del_net_client(nc->peer); }
Re: [PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse The primary console is special because the toolstack maps a page at a fixed GFN and also allocates the guest-side event channel. Add support for that in emulated mode, so that we can have a primary console. Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which supports literally nothing except a single-page mapping of the console page. This might as well have been a hack in the xen_console driver, but this way at least the special-casing is kept within the Xen emulation code, and it gives us a hook for a more complete implementation if/when we ever do need one. Why can't you map the console page via the grant table like the xenstore page? Paul
Re: [PATCH 09/12] hw/xen: prevent duplicate device registrations
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse Ensure that we have a XenBackendInstance for every device regardless of whether it was "discovered" in XenStore or created directly in QEMU. This allows the backend_list to be a source of truth about whether a given backend exists, and allows us to reject duplicates. This also cleans up the fact that backend drivers were calling xen_backend_set_device() with a XenDevice immediately after calling qdev_realize_and_unref() on it, when it wasn't theirs to play with any more. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 1 - hw/char/xen_console.c| 2 +- hw/xen/xen-backend.c | 78 ++-- hw/xen/xen-bus.c | 8 include/hw/xen/xen-backend.h | 3 ++ 5 files changed, 69 insertions(+), 23 deletions(-) diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c index a07cd7eb5d..9262338535 100644 --- a/hw/block/xen-block.c +++ b/hw/block/xen-block.c @@ -975,7 +975,6 @@ static void xen_block_device_create(XenBackendInstance *backend, goto fail; } -xen_backend_set_device(backend, xendev); return; fail: diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c index bd20be116c..2825b8c511 100644 --- a/hw/char/xen_console.c +++ b/hw/char/xen_console.c @@ -468,7 +468,7 @@ static void xen_console_device_create(XenBackendInstance *backend, Chardev *cd = NULL; struct qemu_xs_handle *xsh = xenbus->xsh; -if (qemu_strtoul(name, NULL, 10, &number)) { +if (qemu_strtoul(name, NULL, 10, &number) || number >= INT_MAX) { error_setg(errp, "failed to parse name '%s'", name); goto fail; } I don't think this hunk belongs here, does it? Seems like it should be in patch 7. diff --git a/hw/xen/xen-backend.c b/hw/xen/xen-backend.c index b9bf70a9f5..dcb4329258 100644 --- a/hw/xen/xen-backend.c +++ b/hw/xen/xen-backend.c @@ -101,22 +101,28 @@ static XenBackendInstance *xen_backend_list_find(XenDevice *xendev) return NULL; } -bool xen_backend_exists(const char *type, const char *name) +static XenBackendInstance *xen_backend_lookup(const XenBackendImpl *impl, const char *name) This name is a little close to xen_backend_table_lookup()... perhaps that one should be renamed xen_backend_impl_lookup() for clarity. { -const XenBackendImpl *impl = xen_backend_table_lookup(type); XenBackendInstance *backend; -if (!impl) { -return false; -} - QLIST_FOREACH(backend, &backend_list, entry) { if (backend->impl == impl && !strcmp(backend->name, name)) { -return true; +return backend; } } -return false; +return NULL; +} + +bool xen_backend_exists(const char *type, const char *name) +{ +const XenBackendImpl *impl = xen_backend_table_lookup(type); + +if (!impl) { +return false; +} + +return !!xen_backend_lookup(impl, name); } static void xen_backend_list_remove(XenBackendInstance *backend) @@ -138,11 +144,10 @@ void xen_backend_device_create(XenBus *xenbus, const char *type, backend = g_new0(XenBackendInstance, 1); backend->xenbus = xenbus; backend->name = g_strdup(name); - -impl->create(backend, opts, errp); - backend->impl = impl; xen_backend_list_add(backend); + +impl->create(backend, opts, errp); } XenBus *xen_backend_get_bus(XenBackendInstance *backend) @@ -155,13 +160,6 @@ const char *xen_backend_get_name(XenBackendInstance *backend) return backend->name; } -void xen_backend_set_device(XenBackendInstance *backend, -XenDevice *xendev) -{ -g_assert(!backend->xendev); -backend->xendev = xendev; -} - The declaration also needs removing from xen-backend.h presumably. XenDevice *xen_backend_get_device(XenBackendInstance *backend) { return backend->xendev; @@ -178,9 +176,7 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, Error **errp) } impl = backend->impl; -if (backend->xendev) { -impl->destroy(backend, errp); -} +impl->destroy(backend, errp); xen_backend_list_remove(backend); g_free(backend->name); @@ -188,3 +184,43 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, Error **errp) return true; } + +bool xen_backend_device_realized(XenDevice *xendev, Error **errp) +{ +XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev); +const char *type = xendev_class->backend ? : object_get_typename(OBJECT(xendev)); +const XenBackendImpl *impl = xen_backend_table_lookup(type); +XenBackendInstance *backend; + +if (!impl) { +return false; +} + +backend = xen_backend_lookup(impl, xendev->name); +if (backend) { +if (backend->xendev && backend->xendev != xendev) { +error_setg(errp, "device %s/%s already exists", type, xend
Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass
On 24/10/2023 14:29, David Woodhouse wrote: On Tue, 2023-10-24 at 13:59 +0100, Paul Durrant wrote: On 24/10/2023 13:56, David Woodhouse wrote: On Tue, 2023-10-24 at 13:42 +0100, Paul Durrant wrote: --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, Error **errp) { ERRP_GUARD(); XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev))); + XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev); - xendev->frontend_path = xen_device_get_frontend_path(xendev); + if (xendev_class->get_frontend_path) { + xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp); + if (!xendev->frontend_path) { + return; I think you need to update errp here to note that you are failing to create the frontend. If xendev_class->get_frontend_path returned NULL it will have filled in errp. Ok, but a prepend to say that a lack of path there means we skip frontend creation seems reasonable? No, it *is* returning an error. Perhaps I can make it I understand it is returning an error. I thought the point of the cascading error handling was to prepend text at each (meaningful) layer such that the eventual message conveyed what failed and also what the consequences of that failure were. Paul if (!xendev->frontend_path) { /* * If the ->get_frontend_path() method returned NULL, it will * already have set *errp accordingly. Return the error. */ return /* false */; } As a general rule (I'll be doing a bombing run on xen-bus once I get my patch queue down into single digits) we should never check 'if (*errp)' to check if a function had an error. It should *also* return a success or failure indication, and we should cope with errp being NULL. I'm pretty sure someone told me the exact opposite a few years back. Then they were wrong :)
Re: [PATCH 08/12] hw/xen: do not repeatedly try to create a failing backend device
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse If xen_backend_device_create() fails to instantiate a device, the XenBus code will just keep trying over and over again each time the bus is re-enumerated, as long as the backend appears online and in XenbusStateInitialising. The only thing which prevents the XenBus code from recreating duplicates of devices which already exist, is the fact that xen_device_realize() sets the backend state to XenbusStateInitWait. If the attempt to create the device doesn't get *that* far, that's when it will keep getting retried. My first thought was to handle errors by setting the backend state to XenbusStateClosed, but that doesn't work for XenConsole which wants to *ignore* any device of type != "ioemu" completely. So, make xen_backend_device_create() *keep* the XenBackendInstance for a failed device, and provide a new xen_backend_exists() function to allow xen_bus_type_enumerate() to check whether one already exists before creating a new one. Signed-off-by: David Woodhouse --- hw/xen/xen-backend.c | 27 +-- hw/xen/xen-bus.c | 3 ++- include/hw/xen/xen-backend.h | 1 + 3 files changed, 24 insertions(+), 7 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 07/12] hw/xen: update Xen console to XenDevice model
xen_device_notify_event_channel(XEN_DEVICE(con), con->event_channel, NULL); } -static void xencons_send(struct XenConsole *con) +static bool xencons_send(XenConsole *con) { ssize_t len, size; @@ -159,174 +170,407 @@ static void xencons_send(struct XenConsole *con) if (len < 1) { if (!con->backlog) { con->backlog = 1; -xen_pv_printf(&con->xendev, 1, - "backlog piling up, nobody listening?\n"); } } else { buffer_advance(&con->buffer, len); if (con->backlog && len == size) { con->backlog = 0; -xen_pv_printf(&con->xendev, 1, "backlog is gone\n"); } } +return len > 0; } /* */ -static int store_con_info(struct XenConsole *con) +static bool con_event(void *_xendev) { -Chardev *cs = qemu_chr_fe_get_driver(&con->chr); -char *pts = NULL; -char *dom_path; -g_autoptr(GString) path = NULL; +XenConsole *con = XEN_CONSOLE_DEVICE(_xendev); +bool done_something; -/* Only continue if we're talking to a pty. */ -if (!CHARDEV_IS_PTY(cs)) { -return 0; -} -pts = cs->filename + 4; +done_something = buffer_append(con); -dom_path = qemu_xen_xs_get_domain_path(xenstore, xen_domid); -if (!dom_path) { -return 0; +if (con->buffer.size - con->buffer.consumed) { +done_something |= xencons_send(con); } +return done_something; +} -path = g_string_new(dom_path); -free(dom_path); +/* */ -if (con->xendev.dev) { -g_string_append_printf(path, "/device/console/%d", con->xendev.dev); -} else { -g_string_append(path, "/console"); +static void xen_console_disconnect(XenDevice *xendev, Error **errp) +{ +XenConsole *con = XEN_CONSOLE_DEVICE(xendev); + +qemu_chr_fe_set_handlers(&con->chr, NULL, NULL, NULL, NULL, + con, NULL, true); + nit: extraneous blank line by the looks of it. With that fixed... Reviewed-by: Paul Durrant
Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass
On 24/10/2023 13:56, David Woodhouse wrote: On Tue, 2023-10-24 at 13:42 +0100, Paul Durrant wrote: --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, Error **errp) { ERRP_GUARD(); XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev))); + XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev); - xendev->frontend_path = xen_device_get_frontend_path(xendev); + if (xendev_class->get_frontend_path) { + xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp); + if (!xendev->frontend_path) { + return; I think you need to update errp here to note that you are failing to create the frontend. If xendev_class->get_frontend_path returned NULL it will have filled in errp. Ok, but a prepend to say that a lack of path there means we skip frontend creation seems reasonable? As a general rule (I'll be doing a bombing run on xen-bus once I get my patch queue down into single digits) we should never check 'if (*errp)' to check if a function had an error. It should *also* return a success or failure indication, and we should cope with errp being NULL. I'm pretty sure someone told me the exact opposite a few years back. Paul
Re: [PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse The primary Xen console is special. The guest's side is set up for it by the toolstack automatically and not by the standard PV init sequence. Accordingly, its *frontend* doesn't appear in …/device/console/0 either; instead it appears under …/console in the guest's XenStore node. To allow the Xen console driver to override the frontend path for the primary console, add a method to the XenDeviceClass which can be used instead of the standard xen_device_get_frontend_path() Signed-off-by: David Woodhouse --- hw/xen/xen-bus.c | 10 +- include/hw/xen/xen-bus.h | 2 ++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index ece8ec40cd..cc524ed92c 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, Error **errp) { ERRP_GUARD(); XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev))); +XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev); -xendev->frontend_path = xen_device_get_frontend_path(xendev); +if (xendev_class->get_frontend_path) { +xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp); +if (!xendev->frontend_path) { +return; I think you need to update errp here to note that you are failing to create the frontend. Paul
Re: [PATCH 05/12] hw/xen: populate store frontend nodes with XenStore PFN/port
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse This is kind of redundant since without being able to get these through ome other method (HVMOP_get_param) the guest wouldn't be able to access ^ typo XenStore in order to find them. But Xen populates them, and it does allow guests to *rebind* to the event channel port after a reset. ... although this can also be done by querying the remote end of the port before reset. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 10 ++ 1 file changed, 10 insertions(+) Reviewed-by: Paul Durrant
Re: [PATCH 04/12] i386/xen: advertise XEN_HVM_CPUID_UPCALL_VECTOR in CPUID
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse This will allow Linux guests (since v6.0) to use the per-vCPU upcall vector delivered as MSI through the local APIC. Signed-off-by: David Woodhouse --- target/i386/kvm/kvm.c | 4 1 file changed, 4 insertions(+) Reviewed-by: Paul Durrant
Re: [PATCH 03/12] include: update Xen public headers to Xen 4.17.2 release
On 16/10/2023 16:19, David Woodhouse wrote: From: David Woodhouse ... in order to advertise the XEN_HVM_CPUID_UPCALL_VECTOR feature, which will come in a subsequent commit. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c| 2 +- include/hw/xen/interface/arch-arm.h | 37 +++--- include/hw/xen/interface/arch-x86/cpuid.h | 31 +--- .../hw/xen/interface/arch-x86/xen-x86_32.h| 19 +-- .../hw/xen/interface/arch-x86/xen-x86_64.h| 19 +-- include/hw/xen/interface/arch-x86/xen.h | 26 ++ include/hw/xen/interface/event_channel.h | 19 +-- include/hw/xen/interface/features.h | 19 +-- include/hw/xen/interface/grant_table.h| 19 +-- include/hw/xen/interface/hvm/hvm_op.h | 19 +-- include/hw/xen/interface/hvm/params.h | 19 +-- include/hw/xen/interface/io/blkif.h | 27 -- include/hw/xen/interface/io/console.h | 19 +-- include/hw/xen/interface/io/fbif.h| 19 +-- include/hw/xen/interface/io/kbdif.h | 19 +-- include/hw/xen/interface/io/netif.h | 25 +++--- include/hw/xen/interface/io/protocols.h | 19 +-- include/hw/xen/interface/io/ring.h| 49 ++- include/hw/xen/interface/io/usbif.h | 19 +-- include/hw/xen/interface/io/xenbus.h | 19 +-- include/hw/xen/interface/io/xs_wire.h | 36 ++ include/hw/xen/interface/memory.h | 30 +--- include/hw/xen/interface/physdev.h| 23 ++--- include/hw/xen/interface/sched.h | 19 +-- include/hw/xen/interface/trace.h | 19 +-- include/hw/xen/interface/vcpu.h | 19 +-- include/hw/xen/interface/version.h| 19 +-- include/hw/xen/interface/xen-compat.h | 19 +-- include/hw/xen/interface/xen.h| 19 +-- 29 files changed, 124 insertions(+), 523 deletions(-) Acked-by: Paul Durrant
Re: [PATCH 02/12] hw/xen: select kernel mode for per-vCPU event channel upcall vector
On 16/10/2023 16:18, David Woodhouse wrote: From: David Woodhouse A guest which has configured the per-vCPU upcall vector may set the HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero. For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support for HVMOP_set_evtchn_upcall_vector") will just do this after setting the vector: /* Trick toolstack to think we are enlightened. */ if (!cpu) rc = xen_set_callback_via(1); That's explicitly setting the delivery to GSI#, but it's supposed to be overridden by the per-vCPU vector setting. This mostly works in QEMU *except* for the logic to enable the in-kernel handling of event channels, which falsely determines that the kernel cannot accelerate GSI delivery in this case. Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has the vector set, and use that in xen_evtchn_set_callback_param() to enable the kernel acceleration features even when the param *appears* to be set to target a GSI. Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to *zero* the event channel delivery is disabled completely. (Which is what that bizarre guest behaviour is working round in the first place.) Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 6 ++ include/sysemu/kvm_xen.h | 1 + target/i386/kvm/xen-emu.c | 7 +++ 3 files changed, 14 insertions(+) diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c index 4df973022c..d72dca6591 100644 --- a/hw/i386/kvm/xen_evtchn.c +++ b/hw/i386/kvm/xen_evtchn.c @@ -490,6 +490,12 @@ int xen_evtchn_set_callback_param(uint64_t param) break; } +/* If the guest has set a per-vCPU callback vector, prefer that. */ +if (gsi && kvm_xen_has_vcpu_callback_vector()) { +in_kernel = kvm_xen_has_cap(EVTCHN_SEND); +gsi = 0; +} + So this deals with setting the callback via after setting the upcall vector. What happens if the guest then disables the upcall vector (by setting it to zero)? Xen checks 'v->arch.hvm.evtchn_upcall_vector != 0' for every event delivery. Paul
Re: [PATCH 01/12] i386/xen: fix per-vCPU upcall vector for Xen emulation
On 16/10/2023 16:18, David Woodhouse wrote: From: David Woodhouse The per-vCPU upcall vector support had two problems. Firstly it was using the wrong hypercall argument and would always return -EFAULT. And secondly it was using the wrong ioctl() to pass the vector to the kernel and thus the *kernel* would always return -EINVAL. Linux doesn't (yet) use this mode so it went without decent testing for a while. Fixes: 105b47fdf2d0 ("i386/xen: implement HVMOP_set_evtchn_upcall_vector") Signed-off-by: David Woodhouse --- target/i386/kvm/xen-emu.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) Reviewed-by: Paul Durrant
Re: [RFC PATCH v3 42/78] hw/i386: add fallthrough pseudo-keyword
On 13/10/2023 09:46, Emmanouil Pitsidianakis wrote: In preparation of raising -Wimplicit-fallthrough to 5, replace all fall-through comments with the fallthrough attribute pseudo-keyword. Signed-off-by: Emmanouil Pitsidianakis --- hw/i386/intel_iommu.c| 4 ++-- hw/i386/kvm/xen_evtchn.c | 2 +- Reviewed-by: Paul Durrant hw/i386/x86.c| 2 +- 3 files changed, 4 insertions(+), 4 deletions(-)
Re: [PATCH] i386/xen: Ignore VCPU_SSHOTTMR_future flag in set_singleshot_timer()
On 23/08/2023 12:58, David Woodhouse wrote: From: David Woodhouse Upstream Xen now ignores this flag¹, since the only guest kernel ever to use it was buggy. ¹ https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=19c6cbd909 Signed-off-by: David Woodhouse --- We do take an argument to emulate a specific Xen version, and *theoretically* we could ignore the VCPU_SSHOTTMR_future flag only if we're emulating Xen 4.17 or newer? But I don't think it's worth doing that (and QEMU won't be the only instance of Xen emulation or even real older Xen versions patched to ignore the flag). target/i386/kvm/xen-emu.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 1/4] block: rename blk_io_plug_call() API to defer_call()
On 17/08/2023 16:58, Stefan Hajnoczi wrote: Prepare to move the blk_io_plug_call() API out of the block layer so that other subsystems call use this deferred call mechanism. Rename it to defer_call() but leave the code in block/plug.c. The next commit will move the code out of the block layer. Suggested-by: Ilya Maximets Signed-off-by: Stefan Hajnoczi --- include/sysemu/block-backend-io.h | 6 +- block/blkio.c | 8 +-- block/io_uring.c | 4 +- block/linux-aio.c | 4 +- block/nvme.c | 4 +- block/plug.c | 109 +++--- hw/block/dataplane/xen-block.c| 10 +-- hw/block/virtio-blk.c | 4 +- hw/scsi/virtio-scsi.c | 6 +- 9 files changed, 76 insertions(+), 79 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] i386/xen: Don't advertise XENFEAT_supervisor_mode_kernel
On 08/08/2023 18:08, David Woodhouse wrote: From: David Woodhouse XENFEAT_supervisor_mode_kernel shouldn't be set for HVM guests. It confuses lscpu into thinking it's running in PVH mode. No non-cosmetic effects have been observed so far. Signed-off-by: David Woodhouse --- Only really cosmetic. Don't feel strongly about whether it makes 8.1. target/i386/kvm/xen-emu.c | 1 - 1 file changed, 1 deletion(-) Reviewed-by: Paul Durrant
Re: [RFC PATCH post-8.1] hw/xen: Clean up event channel 'type_val' handling to use union
in->remote_dom == xen_domid) { -type_val = 0; -} else { +if (interdomain->remote_dom != DOMID_QEMU && +interdomain->remote_dom != DOMID_SELF && +interdomain->remote_dom != xen_domid) { return -ESRCH; } @@ -1375,8 +1378,8 @@ int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain) qemu_mutex_lock(&s->port_lock); /* The newly allocated port starts out as unbound */ -ret = allocate_port(s, 0, EVTCHNSTAT_unbound, type_val, -&interdomain->local_port); +ret = allocate_port(s, 0, EVTCHNSTAT_unbound, 0, &interdomain->local_port); + if (ret) { goto out; } @@ -1401,7 +1404,8 @@ int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain) assign_kernel_eventfd(lp->type, xc->guest_port, xc->fd); } lp->type = EVTCHNSTAT_interdomain; -lp->type_val = PORT_INFO_TYPEVAL_REMOTE_QEMU | interdomain->remote_port; +lp->u.interdomain.to_qemu = 1; +lp->u.interdomain.port = interdomain->remote_port; ret = 0; } else { /* Loopback */ @@ -1415,13 +1419,13 @@ int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain) * the port that was just allocated for the local end. */ if (interdomain->local_port != interdomain->remote_port && -rp->type == EVTCHNSTAT_unbound && rp->type_val == 0) { +rp->type == EVTCHNSTAT_unbound && !rp->u.interdomain.to_qemu) { rp->type = EVTCHNSTAT_interdomain; -rp->type_val = interdomain->local_port; +rp->u.interdomain.port = interdomain->local_port; lp->type = EVTCHNSTAT_interdomain; -lp->type_val = interdomain->remote_port; +lp->u.interdomain.port = interdomain->remote_port; } else { ret = -EINVAL; } @@ -1440,7 +1444,6 @@ int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain) int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc) { XenEvtchnState *s = xen_evtchn_singleton; -uint16_t type_val; int ret; if (!s) { @@ -1451,18 +1454,19 @@ int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc) return -ESRCH; } -if (alloc->remote_dom == DOMID_QEMU) { -type_val = PORT_INFO_TYPEVAL_REMOTE_QEMU; -} else if (alloc->remote_dom == DOMID_SELF || - alloc->remote_dom == xen_domid) { -type_val = 0; -} else { +if (alloc->remote_dom != DOMID_QEMU && alloc->remote_dom != DOMID_SELF && +alloc->remote_dom != xen_domid) { Maybe vertically align the clauses here as in the 'if' a couple of hunks back? return -EPERM; } Since all the above are nits... Reviewed-by: Paul Durrant
Re: [PATCH v1] xen-platform: do full PCI reset during unplug of IDE devices
On 20/07/2023 08:29, Olaf Hering wrote: The IDE unplug function needs to reset the entire PCI device, to make sure all state is initialized to defaults. This is done by calling pci_device_reset, which resets not only the chip specific registers, but also all PCI state. This fixes "unplug" in a Xen HVM domU with the modular legacy xenlinux PV drivers. Commit ee358e919e38 ("hw/ide/piix: Convert reset handler to DeviceReset") changed the way how the the disks are unplugged. Prior this commit the PCI device remained unchanged. After this change, piix_ide_reset is exercised after the "unplug" command, which was not the case prior that commit. This function resets the command register. As a result the ata_piix driver inside the domU will see a disabled PCI device. The generic PCI code will reenable the PCI device. On the qemu side, this runs pci_default_write_config/pci_update_mappings. Here a changed address is returned by pci_bar_address, this is the address which was truncated in piix_ide_reset. In case of a Xen HVM domU, the address changes from 0xc120 to 0xc100. This truncation was a bug in piix_ide_reset, which was fixed in commit 230dfd9257 ("hw/ide/piix: properly initialize the BMIBA register"). If pci_xen_ide_unplug had used pci_device_reset, the PCI registers would have been properly reset, and commit ee358e919e38 would have not introduced a regression for this specific domU environment. While the unplug is supposed to hide the IDE disks, the changed BMIBA address broke the UHCI device. In case the domU has an USB tablet configured, to recive absolute pointer coordinates for the GUI, it will cause a hang during device discovery of the partly discovered USB hid device. Reading the USBSTS word size register will fail. The access ends up in the QEMU piix-bmdma device, instead of the expected uhci device. Here a byte size request is expected, and a value of ~0 is returned. As a result the UCHI driver sees an error state in the register, and turns off the UHCI controller. Signed-off-by: Olaf Hering --- hw/i386/xen/xen_platform.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) Reviewed-by: Paul Durrant
Re: i386/xen: prevent guest from binding loopback event channel to itself
On 26/07/2023 10:07, David Woodhouse wrote: On Wed, 2023-07-26 at 09:44 +0100, Paul Durrant wrote: On 25/07/2023 11:05, David Woodhouse wrote: From: David Woodhouse Fuzzing showed that a guest could bind an interdomain port to itself, by guessing the next port to be allocated and putting that as the 'remote' port number. By chance, that works because the newly-allocated port has type EVTCHNSTAT_unbound. It shouldn't. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) Reviewed-by: Paul Durrant Thanks. I'll change the title prefix to 'hw/xen' since it's in hw/ not target/i386. Yes, makes sense. Please can I have also have a review for https://lore.kernel.org/qemu-devel/20076888f6bdf06a65aafc5cf954260965d45b97.ca...@infradead.org/ Sorry I missed that. Done. Cheers, Paul
Re: [PATCH] hw/xen: Clarify (lack of) error handling in transaction_commit()
On 20/06/2023 18:58, David Woodhouse wrote: From: David Woodhouse Coverity was unhappy (CID 1508359) because we didn't check the return of init_walk_op() in transaction_commit(), despite doing so at every other call site. Strictly speaking, this is a false positive since it can never fail. It only fails for invalid user input (transaction ID or path), and both of those are hard-coded to known sane values in this invocation. But Coverity doesn't know that, and neither does the casual reader of the code. Returning an error here would be weird, since the transaction *is* committed by this point; all the walk_op is doing is firing watches on the newly-committed changed nodes. So make it a g_assert(!ret), since it really should never happen. Signed-off-by: David Woodhouse --- hw/i386/kvm/xenstore_impl.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) Reviewed-by: Paul Durrant
Re: i386/xen: prevent guest from binding loopback event channel to itself
On 25/07/2023 11:05, David Woodhouse wrote: From: David Woodhouse Fuzzing showed that a guest could bind an interdomain port to itself, by guessing the next port to be allocated and putting that as the 'remote' port number. By chance, that works because the newly-allocated port has type EVTCHNSTAT_unbound. It shouldn't. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] xen-block: Avoid leaks on new error path
On 04/07/2023 18:18, Anthony PERARD wrote: From: Anthony PERARD Commit 189829399070 ("xen-block: Use specific blockdev driver") introduced a new error path, without taking care of allocated resources. So only allocate the qdicts after the error check, and free both `filename` and `driver` when we are about to return and thus taking care of both success and error path. Coverity only spotted the leak of qdicts (*_layer variables). Reported-by: Peter Maydell Fixes: Coverity CID 1508722, 1398649 Fixes: 189829399070 ("xen-block: Use specific blockdev driver") Signed-off-by: Anthony PERARD --- hw/block/xen-block.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2] i386/xen: consistent locking around Xen singleshot timers
On 04/07/2023 16:51, David Woodhouse wrote: From: David Woodhouse Coverity points out (CID 1507534, 1507968) that we sometimes access env->xen_singleshot_timer_ns under the protection of env->xen_timers_lock and sometimes not. This isn't always an issue. There are two modes for the timers; if the kernel supports the EVTCHN_SEND capability then it handles all the timer hypercalls and delivery internally, and all we use the field for is to get/set the timer as part of the vCPU state via an ioctl(). If the kernel doesn't have that support, then we do all the emulation within qemu, and *those* are the code paths where we actually care about the locking. But it doesn't hurt to be a little bit more consistent and avoid having to explain *why* it's OK. Signed-off-by: David Woodhouse --- target/i386/kvm/xen-emu.c | 36 ++-- 1 file changed, 26 insertions(+), 10 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] i386/xen: consistent locking around Xen singleshot timers
On 22/05/2023 19:52, David Woodhouse wrote: From: David Woodhouse Coverity points out (CID 1507534) that we sometimes access env->xen_singleshot_timer_ns under the protection of env->xen_timers_lock (eg in xen_vcpu_singleshot_timer_event()) and sometimes not (the specific case Coverity complains about is in do_vcpu_soft_reset()). This isn't strictly an issue. There are two modes for the timers; if the kernel supports the EVTCHN_SEND capability then it handles all the timer hypercalls and delivery internally, and all we need to do is an ioctl to get/set the next timer as part of the vCPU state. If the kernel doesn't have that support, then we do all the emulation within qemu, and *those* are the code paths where we actually care about the locking. But it doesn't hurt to be a little bit more consistent and avoid having to explain *why* it's OK. Signed-off-by: David Woodhouse --- On Tue, 2023-05-09 at 15:55 +0100, Peter Maydell wrote: Hi; Coverity points out (CID 1507534) that we seem to sometimes access env->xen_singleshot_timer_ns under the protection of env->xen_timers_lock (eg in xen_vcpu_singleshot_timer_event()) and sometimes not (the specific case Coverity complains about is in do_vcpu_soft_reset()). Is this a false positive, or is there Thanks. As noted, I think it's harmless but it doesn't hurt to clean it up a bit. I've pushed this to my tree at https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv on top of the other fixes that didn't make 8.0. target/i386/kvm/xen-emu.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 RESEND] xen: Fix SEGV on domain disconnect
On 24/04/2023 14:17, Tim Smith wrote: On Mon, Apr 24, 2023 at 1:08 PM Mark Syms wrote: Copying in Tim who did the final phase of the changes. On Mon, 24 Apr 2023 at 11:32, Paul Durrant wrote: On 20/04/2023 12:02, mark.s...@citrix.com wrote: From: Mark Syms Ensure the PV ring is drained on disconnect. Also ensure all pending AIO is complete, otherwise AIO tries to complete into a mapping of the ring which has been torn down. Signed-off-by: Mark Syms --- CC: Stefano Stabellini CC: Anthony Perard CC: Paul Durrant CC: xen-de...@lists.xenproject.org v2: * Ensure all inflight requests are completed before teardown * RESEND to fix formatting --- hw/block/dataplane/xen-block.c | 31 +-- 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c index 734da42ea7..d9da4090bf 100644 --- a/hw/block/dataplane/xen-block.c +++ b/hw/block/dataplane/xen-block.c @@ -523,6 +523,10 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane) dataplane->more_work = 0; +if (dataplane->sring == 0) { +return done_something; +} + I think you could just return false here... Nothing is ever going to be done if there's no ring :-) rc = dataplane->rings.common.req_cons; rp = dataplane->rings.common.sring->req_prod; xen_rmb(); /* Ensure we see queued requests up to 'rp'. */ @@ -666,14 +670,35 @@ void xen_block_dataplane_destroy(XenBlockDataPlane *dataplane > void xen_block_dataplane_stop(XenBlockDataPlane *dataplane) { XenDevice *xendev; +XenBlockRequest *request, *next; if (!dataplane) { return; } +/* We're about to drain the ring. We can cancel the scheduling of any + * bottom half now */ +qemu_bh_cancel(dataplane->bh); + +/* Ensure we have drained the ring */ +aio_context_acquire(dataplane->ctx); +do { +xen_block_handle_requests(dataplane); +} while (dataplane->more_work); +aio_context_release(dataplane->ctx); + I don't think we want to be taking new requests, do we? If we're in this situation and the guest has put something on the ring, I think we should do our best with it. We cannot just rely on the guest to be well-behaved, because they're not :-( We're about to throw the ring away, so whatever is there would otherwise be lost. We only throw away our mapping. The memory belongs to the guest and it should ensure it does not submit requests after the state has left 'connected' This bit is here to try to handle guests which are less than diligent about their shutdown. We *should* always be past this fast enough when the disconnect()/connect() of XenbusStateConnected happens that all remains well (if not, we were in a worse situation before). What about a malicious guest that is piling requests into the ring. It could keep us in the loop forever, couldn't it? +/* Now ensure that all inflight requests are complete */ +while (!QLIST_EMPTY(&dataplane->inflight)) { +QLIST_FOREACH_SAFE(request, &dataplane->inflight, list, next) { +blk_aio_flush(request->dataplane->blk, xen_block_complete_aio, +request); +} +} + I think this could possibly be simplified by doing the drain after the call to blk_set_aio_context(), as long as we set dataplane->ctx to qemu_get_aio_context(). Alos, as long as more_work is not set then it should still be safe to cancel the bh before the drain AFAICT. I'm not sure what you mean by simpler? Possibly I'm not getting something. Sorry, I was referring to the need to do aio_context_acquire() calls but they are only around the disputed xen_block_handle_requests() call anyway, so there's no simplification in this bit. We have to make sure that any "aio_bh_schedule_oneshot_full()" which happens as a result of "blk_aio_flush()" has finished before any change of AIO context, because it tries to use the one which was current at the time of being called (I have the SEGVs to prove it :-)). Ok, I had assumed that the issue was the context being picked up inside the xen_block_complete_aio() call. Whether that happens before or after "blk_set_aio_context(qemu_get_aio_context())" doesn't seem to be a change in complexity to me. Motivation was to get as much as possible to happen in the way it "normally" would, so that future changes are less likely to regress, but as mentioned maybe I'm missing something. The BH needs to be prevented from firing ASAP, otherwise the disconnect()/connect() which happens when XenbusStateConnected can have the bh fire from what the guest does next right in the middle of juggling contexts for the disconnect() (I have the SEGVs from that too...). So if you drop the ring drain then this patch should still stop the SEGVs, right? Paul
Re: [PATCH v2 RESEND] xen: Fix SEGV on domain disconnect
On 20/04/2023 12:02, mark.s...@citrix.com wrote: From: Mark Syms Ensure the PV ring is drained on disconnect. Also ensure all pending AIO is complete, otherwise AIO tries to complete into a mapping of the ring which has been torn down. Signed-off-by: Mark Syms --- CC: Stefano Stabellini CC: Anthony Perard CC: Paul Durrant CC: xen-de...@lists.xenproject.org v2: * Ensure all inflight requests are completed before teardown * RESEND to fix formatting --- hw/block/dataplane/xen-block.c | 31 +-- 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c index 734da42ea7..d9da4090bf 100644 --- a/hw/block/dataplane/xen-block.c +++ b/hw/block/dataplane/xen-block.c @@ -523,6 +523,10 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane) dataplane->more_work = 0; +if (dataplane->sring == 0) { +return done_something; +} + I think you could just return false here... Nothing is ever going to be done if there's no ring :-) rc = dataplane->rings.common.req_cons; rp = dataplane->rings.common.sring->req_prod; xen_rmb(); /* Ensure we see queued requests up to 'rp'. */ @@ -666,14 +670,35 @@ void xen_block_dataplane_destroy(XenBlockDataPlane *dataplane > void xen_block_dataplane_stop(XenBlockDataPlane *dataplane) { XenDevice *xendev; +XenBlockRequest *request, *next; if (!dataplane) { return; } +/* We're about to drain the ring. We can cancel the scheduling of any + * bottom half now */ +qemu_bh_cancel(dataplane->bh); + +/* Ensure we have drained the ring */ +aio_context_acquire(dataplane->ctx); +do { +xen_block_handle_requests(dataplane); +} while (dataplane->more_work); +aio_context_release(dataplane->ctx); + I don't think we want to be taking new requests, do we? +/* Now ensure that all inflight requests are complete */ +while (!QLIST_EMPTY(&dataplane->inflight)) { +QLIST_FOREACH_SAFE(request, &dataplane->inflight, list, next) { +blk_aio_flush(request->dataplane->blk, xen_block_complete_aio, +request); +} +} + I think this could possibly be simplified by doing the drain after the call to blk_set_aio_context(), as long as we set dataplane->ctx to qemu_get_aio_context(). Alos, as long as more_work is not set then it should still be safe to cancel the bh before the drain AFAICT. Paul xendev = dataplane->xendev; aio_context_acquire(dataplane->ctx); + if (dataplane->event_channel) { /* Only reason for failure is a NULL channel */ xen_device_set_event_channel_context(xendev, dataplane->event_channel, @@ -684,12 +709,6 @@ void xen_block_dataplane_stop(XenBlockDataPlane *dataplane) blk_set_aio_context(dataplane->blk, qemu_get_aio_context(), &error_abort); aio_context_release(dataplane->ctx); -/* - * Now that the context has been moved onto the main thread, cancel - * further processing. - */ -qemu_bh_cancel(dataplane->bh); - if (dataplane->event_channel) { Error *local_err = NULL;
Re: [PATCH v2 08/16] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore
On 19/04/2023 18:28, Stefan Hajnoczi wrote: There is no need to suspend activity between aio_disable_external() and aio_enable_external(), which is mainly used for the block layer's drain operation. This is part of ongoing work to remove the aio_disable_external() API. Reviewed-by: David Woodhouse Signed-off-by: Stefan Hajnoczi --- hw/i386/kvm/xen_xenstore.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Paul Durrant
Re: [PATCH 5/5] hw/xen: Fix broken check for invalid state in xs_be_open()
On 12/04/2023 19:51, David Woodhouse wrote: From: David Woodhouse Coverity points out that if (!s && !s->impl) isn't really what we intended to do here. CID 1508131. Fixes: 032475127225 ("hw/xen: Add emulated implementation of XenStore operations") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Paul Durrant
Re: [PATCH 4/5] hw/xen: Fix double-free in xen_console store_con_info()
On 12/04/2023 19:51, David Woodhouse wrote: From: David Woodhouse Coverity spotted a double-free (CID 1508254); we g_string_free(path) and then for some reason immediately call free(path) too. We should just use g_autoptr() for it anyway, which simplifies the code a bit. Fixes: 7a8a749da7d3 ("hw/xen: Move xenstore_store_pv_console_info to xen_console.c") Signed-off-by: David Woodhouse --- hw/char/xen_console.c | 13 +++-- 1 file changed, 3 insertions(+), 10 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 3/5] xen: Drop support for Xen versions below 4.7.1
On 12/04/2023 19:51, David Woodhouse wrote: From: David Woodhouse In restructuring to allow for internal emulation of Xen functionality, I broke compatibility for Xen 4.6 and earlier. Fix this by explicitly removing support for anything older than 4.7.1, which is also ancient but it does still build, and the compatibility support for it is fairly unintrusive. Fixes: 15e283c5b684 ("hw/xen: Add foreignmem operations to allow redirection to internal emulation") Signed-off-by: David Woodhouse --- hw/xen/xen-operations.c | 57 +-- include/hw/xen/xen_native.h | 107 +--- meson.build | 5 +- scripts/xen-detect.c| 60 4 files changed, 3 insertions(+), 226 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH 2/5] hw/xen: Fix memory leak in libxenstore_open() for Xen
On 12/04/2023 19:50, David Woodhouse wrote: From: David Woodhouse There was a superfluous allocation of the XS handle, leading to it being leaked on both the error path and the success path (where it gets allocated again). Spotted by Coverity (CID 1508098). Fixes: ba2a92db1ff6 ("hw/xen: Add xenstore operations to allow redirection to internal emulation") Suggested-by: Peter Maydell Signed-off-by: David Woodhouse Reviewed-by: Peter Maydell Reviewed-by: Paul Durrant
Re: [PATCH] hw/xen: Simplify emulated Xen platform init
On 10/03/2023 10:54, David Woodhouse wrote: From: David Woodhouse I initially put the basic platform init (overlay pages, grant tables, event channels) into mc->kvm_type because that was the earliest place that could sensibly test for xen_mode==XEN_EMULATE. The intent was to do this early enough that we could then initialise the XenBus and other parts which would have depended on them, from a generic location for both Xen and KVM/Xen in the PC-specific code, as seen in https://lore.kernel.org/qemu-devel/20230116221919.1124201-16-dw...@infradead.org/ However, then the Xen on Arm patches came along, and *they* wanted to do the XenBus init from a 'generic' Xen-specific location instead: https://lore.kernel.org/qemu-devel/20230210222729.957168-4-sstabell...@kernel.org/ Since there's no generic location that covers all three, I conceded to do it for XEN_EMULATE mode in pc_basic_devices_init(). And now there's absolutely no point in having some of the platform init done from pc_machine_kvm_type(); we can move it all up to live in a single place in pc_basic_devices_init(). This has the added benefit that we can drop the separate xen_evtchn_connect_gsis() function completely, and pass just the system GSIs in directly to xen_evtchn_create(). While I'm at it, it does no harm to explicitly pass in the *number* of said GSIs, because it does make me twitch a bit to pass an array of impicit size. During the lifetime of the KVM/Xen patchset, that had already changed (albeit just cosmetically) from GSI_NUM_PINS to IOAPIC_NUM_PINS. And document a bit better that this is for the *output* GSI for raising CPU0's events when the per-CPU vector isn't available. The fact that we create a whole set of them and then only waggle the one we're told to, instead of having a single output and only *connecting* it to the GSI that it should be connected to, is still non-intuitive for me. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 40 hw/i386/kvm/xen_evtchn.h | 3 +-- hw/i386/pc.c | 13 - 3 files changed, 25 insertions(+), 31 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH] hw/xenpv: Initialize Xen backend operations
On 23/03/2023 10:57, David Woodhouse wrote: From: David Woodhouse As the Xen backend operations were abstracted out into a function table to allow for internally emulated Xen support, we missed the xen_init_pv() code path which also needs to install the operations for the true Xen libraries. Add the missing call to setup_xen_backend_ops(). Fixes: b6cacfea0b38 ("hw/xen: Add evtchn operations to allow redirection to internal emulation") Reported-by: Anthony PERARD Signed-off-by: David Woodhouse --- hw/xenpv/xen_machine_pv.c | 2 ++ 1 file changed, 2 insertions(+) Reviewed-by: Paul Durrant diff --git a/hw/xenpv/xen_machine_pv.c b/hw/xenpv/xen_machine_pv.c index 2e759d0619..17cda5ec13 100644 --- a/hw/xenpv/xen_machine_pv.c +++ b/hw/xenpv/xen_machine_pv.c @@ -35,6 +35,8 @@ static void xen_init_pv(MachineState *machine) DriveInfo *dinfo; int i; +setup_xen_backend_ops(); + /* Initialize backend core & drivers */ xen_be_init();
Re: [PATCH] accel/xen: Fix DM state change notification in dm_restrict mode
On 14/03/2023 08:35, David Woodhouse wrote: From: David Woodhouse When dm_restrict is set, QEMU isn't permitted to update the XenStore node to indicate its running status. Previously, the xs_write() call would fail but the failure was ignored. However, in refactoring to allow for emulated XenStore operations, a new call to xs_open() was added. That one didn't fail gracefully, causing a fatal error when running in dm_restrict mode. Partially revert the offending patch, removing the additional call to xs_open() because the global 'xenstore' variable is still available; it just needs to be used with qemu_xen_xs_write() now instead of directly with the xs_write() libxenstore function. Also make the whole thing conditional on !xen_domid_restrict. There's no point even registering the state change handler to attempt to update the XenStore node when we know it's destined to fail. Fixes: ba2a92db1ff6 ("hw/xen: Add xenstore operations to allow redirection to internal emulation") Reported-by: Jason Andryuk Co-developed-by: Jason Andryuk Not-Signed-off-by: Jason Andryuk Signed-off-by: David Woodhouse Will-be-Tested-by: Jason Andryuk --- accel/xen/xen-all.c | 27 ++- 1 file changed, 10 insertions(+), 17 deletions(-) Reviewed-by: Paul Durrant
Re: [PATCH v2 00/27] Enable PV backends with Xen/KVM emulation
On 07/03/2023 17:17, David Woodhouse wrote: Following on from the basic platform support which has already been merged, here's phase 2 which wires up the XenBus and PV back ends. It starts with a basic single-tenant internal implementation of a XenStore, with a copy-on-write tree, watches, transactions, quotas. Then we introduce operations tables for the grant table, event channel, foreignmen and xenstore operations so that in addition to using the Xen libraries for those, QEMU can use its internal emulated versions. A little bit of cleaning up of header files, and we can enable the build of xen-bus in the CONFIG_XEN_EMU build, and run a Xen guest with an actual PV disk... qemu-system-x86_64 -serial mon:stdio -M q35 -display none -m 1G -smp 2 \ -accel kvm,xen-version=0x4000e,kernel-irqchip=split \ -kernel bzImage -append "console=ttyS0 root=/dev/xvda1 selinux=0" \ -drive file=/var/lib/libvirt/images/fedora28.qcow2,if=none,id=disk \ -device xen-disk,drive=disk,vdev=xvda The main thing that isn't working here is migration. I've implemented it for the internal xenstore and the unit tests exercise it, but the existing PV back ends don't support it, perhaps partly because support for guest transparent live migration support isn't upstream in Xen yet. So the disk doesn't come back correctly after migration. I'm content with that for 8.0 though, and we just mark the emulated XenStore device as unmigratable to prevent users from trying. The other pre-existing constraint is that only the block back end has yet been ported to the "new" XenBus infrastructure, and is actually capable of creating its own backend nodes. Again, I can live with that for 8.0. Maybe this will motivate us to finally get round to converting the rest off XenLegacyBackend and killing it. We also don't have a simple way to perform grant mapping of multiple guest pages to contiguous addresses, as we can under real Xen. So we don't advertise max-ring-page-order for xen-disk in the emulated mode. Fixing that — if we actually want to — would probably require mapping RAM from an actual backing store object, so that it can be mapped again at a different location for the PV back end to see. v2: https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-2 • Full set of reviewed-by tags from Paul (and associated minor fixes). • Disable migration for emulated XenStore device. • Update docs and add MAINTAINERS entry. v1: https://lore.kernel.org/qemu-devel/20230302153435.1170111-1-dw...@infradead.org/ https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv-1 David Woodhouse (23): hw/xen: Add xenstore wire implementation and implementation stubs hw/xen: Add basic XenStore tree walk and write/read/directory support hw/xen: Implement XenStore watches hw/xen: Implement XenStore transactions hw/xen: Watches on XenStore transactions hw/xen: Implement core serialize/deserialize methods for xenstore_impl hw/xen: Add evtchn operations to allow redirection to internal emulation hw/xen: Add gnttab operations to allow redirection to internal emulation hw/xen: Pass grant ref to gnttab unmap operation hw/xen: Add foreignmem operations to allow redirection to internal emulation hw/xen: Move xenstore_store_pv_console_info to xen_console.c hw/xen: Use XEN_PAGE_SIZE in PV backend drivers hw/xen: Rename xen_common.h to xen_native.h hw/xen: Build PV backend drivers for CONFIG_XEN_BUS hw/xen: Only advertise ring-page-order for xen-block if gnttab supports it hw/xen: Hook up emulated implementation for event channel operations hw/xen: Add emulated implementation of grant table operations hw/xen: Add emulated implementation of XenStore operations hw/xen: Map guest XENSTORE_PFN grant in emulated Xenstore hw/xen: Implement soft reset for emulated gnttab i386/xen: Initialize Xen backends from pc_basic_device_init() for emulation MAINTAINERS: Add entry for Xen on KVM emulation docs: Update Xen-on-KVM documentation for PV disk support Paul Durrant (4): hw/xen: Implement XenStore permissions hw/xen: Create initial XenStore nodes hw/xen: Add xenstore operations to allow redirection to internal emulation hw/xen: Avoid crash when backend watch fires too early MAINTAINERS |9 + accel/xen/xen-all.c | 69 +- docs/system/i386/xen.rst | 30 +- hw/9pfs/meson.build |2 +- hw/9pfs/xen-9p-backend.c | 32 +- hw/block/dataplane/meson.build|2 +- hw/block/dataplane/xen-block.c| 12 +- hw/block/meson.build |2 +- hw/block/xen
Re: [PATCH v2 25/27] i386/xen: Initialize Xen backends from pc_basic_device_init() for emulation
On 07/03/2023 17:17, David Woodhouse wrote: From: David Woodhouse Now that all the work is done to enable the PV backends to work without actual Xen, instantiate the bus from pc_basic_device_init() for emulated mode. This allows us finally to launch an emulated Xen guest with PV disk. qemu-system-x86_64 -serial mon:stdio -M q35 -cpu host -display none \ -m 1G -smp 2 -accel kvm,xen-version=0x4000a,kernel-irqchip=split \ -kernel bzImage -append "console=ttyS0 root=/dev/xvda1" \ -drive file=/var/lib/libvirt/images/fedora28.qcow2,if=none,id=disk \ -device xen-disk,drive=disk,vdev=xvda If we use -M pc instead of q35, we can even add an IDE disk and boot a guest image normally through grub. But q35 gives us AHCI and that isn't unplugged by the Xen magic, so the guests ends up seeing "both" disks. Signed-off-by: David Woodhouse Reviewed-by: Paul Durrant --- hw/i386/pc.c | 7 +++ 1 file changed, 7 insertions(+) Also... Tested-by: Paul Durrant ... on real Xen (master branch, 4.18) with a Debian guest.
Re: [RFC PATCH v1 07/25] hw/xen: Implement core serialize/deserialize methods for xenstore_impl
On 07/03/2023 16:59, Paul Durrant wrote: On 07/03/2023 16:52, David Woodhouse wrote: On Tue, 2023-03-07 at 16:39 +, Paul Durrant wrote: On 07/03/2023 16:33, David Woodhouse wrote: On Thu, 2023-03-02 at 15:34 +, David Woodhouse wrote: From: David Woodhouse In fact I think we want to only serialize the contents of the domain's path in /local/domain/${domid} and leave the rest to be recreated? Will defer to Paul for that. Signed-off-by: David Woodhouse Paul, your Reviewed-by: on this one is conspicuous in its absence. I mentioned migration in the cover letter — this much is working fine, but it's the PV back ends that don't yet work. I'd quite like to merge the basic serialization/deserialization of XenStore itself, with the unit tests. The patch is basically ok, I just think the serialization should be limited to the guest nodes... filtering out those not owned by xen_domid would probably work for that. Yeah, so let's just do this (as part of this patch #7) for now: --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -235,6 +235,7 @@ static int xen_xenstore_post_load(void *opaque, int ver) static const VMStateDescription xen_xenstore_vmstate = { .name = "xen_xenstore", + .unmigratable = 1, /* The PV back ends don't migrate yet */ .version_id = 1, .minimum_version_id = 1, .needed = xen_xenstore_is_needed, It means we can't migrate guests even if they're only using fully emulated devices... but I think that's a reasonable limitation until we implement it fully. Ok. With that added... Revieweed-by: Paul Durrant Typoed, sorry... Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 07/25] hw/xen: Implement core serialize/deserialize methods for xenstore_impl
On 07/03/2023 16:52, David Woodhouse wrote: On Tue, 2023-03-07 at 16:39 +, Paul Durrant wrote: On 07/03/2023 16:33, David Woodhouse wrote: On Thu, 2023-03-02 at 15:34 +, David Woodhouse wrote: From: David Woodhouse In fact I think we want to only serialize the contents of the domain's path in /local/domain/${domid} and leave the rest to be recreated? Will defer to Paul for that. Signed-off-by: David Woodhouse Paul, your Reviewed-by: on this one is conspicuous in its absence. I mentioned migration in the cover letter — this much is working fine, but it's the PV back ends that don't yet work. I'd quite like to merge the basic serialization/deserialization of XenStore itself, with the unit tests. The patch is basically ok, I just think the serialization should be limited to the guest nodes... filtering out those not owned by xen_domid would probably work for that. Yeah, so let's just do this (as part of this patch #7) for now: --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -235,6 +235,7 @@ static int xen_xenstore_post_load(void *opaque, int ver) static const VMStateDescription xen_xenstore_vmstate = { .name = "xen_xenstore", +.unmigratable = 1, /* The PV back ends don't migrate yet */ .version_id = 1, .minimum_version_id = 1, .needed = xen_xenstore_is_needed, It means we can't migrate guests even if they're only using fully emulated devices... but I think that's a reasonable limitation until we implement it fully. Ok. With that added... Revieweed-by: Paul Durrant
Re: [RFC PATCH v1 07/25] hw/xen: Implement core serialize/deserialize methods for xenstore_impl
On 07/03/2023 16:39, Paul Durrant wrote: On 07/03/2023 16:33, David Woodhouse wrote: On Thu, 2023-03-02 at 15:34 +, David Woodhouse wrote: From: David Woodhouse In fact I think we want to only serialize the contents of the domain's path in /local/domain/${domid} and leave the rest to be recreated? Will defer to Paul for that. Signed-off-by: David Woodhouse Paul, your Reviewed-by: on this one is conspicuous in its absence. I mentioned migration in the cover letter — this much is working fine, but it's the PV back ends that don't yet work. I'd quite like to merge the basic serialization/deserialization of XenStore itself, with the unit tests. The patch is basically ok, I just think the serialization should be limited to the guest nodes... filtering out those not owned by xen_domid would probably work for that. Perhaps we can also set TYPE_XEN_DEVICE or TYPE_XEN_BLOCK_DEVICE to be unmigratable? Ideally I think we want TYPE_XEN_DEVICE to be unmigratable by default *unless* the specific device class (including net and other as we port them from XenLegacyDevice) says otherwise. Yes, that sounds like a good idea. Is there a way to do that? Not something I've looked into. I'll go look now. Maybe calling migrate_add_blocker() in the realize method is the right way to achieve this? Paul
Re: [RFC PATCH v1 07/25] hw/xen: Implement core serialize/deserialize methods for xenstore_impl
On 07/03/2023 16:33, David Woodhouse wrote: On Thu, 2023-03-02 at 15:34 +, David Woodhouse wrote: From: David Woodhouse In fact I think we want to only serialize the contents of the domain's path in /local/domain/${domid} and leave the rest to be recreated? Will defer to Paul for that. Signed-off-by: David Woodhouse Paul, your Reviewed-by: on this one is conspicuous in its absence. I mentioned migration in the cover letter — this much is working fine, but it's the PV back ends that don't yet work. I'd quite like to merge the basic serialization/deserialization of XenStore itself, with the unit tests. The patch is basically ok, I just think the serialization should be limited to the guest nodes... filtering out those not owned by xen_domid would probably work for that. Perhaps we can also set TYPE_XEN_DEVICE or TYPE_XEN_BLOCK_DEVICE to be unmigratable? Ideally I think we want TYPE_XEN_DEVICE to be unmigratable by default *unless* the specific device class (including net and other as we port them from XenLegacyDevice) says otherwise. Yes, that sounds like a good idea. Is there a way to do that? Not something I've looked into. I'll go look now. Paul
Re: [RFC PATCH v1 27/25] docs: Update Xen-on-KVM documentation for PV disk support
On 07/03/2023 16:22, David Woodhouse wrote: From: David Woodhouse Signed-off-by: David Woodhouse --- docs/system/i386/xen.rst | 30 +++--- 1 file changed, 23 insertions(+), 7 deletions(-) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 26/25] MAINTAINERS: Add entry for Xen on KVM emulation
On 07/03/2023 16:21, David Woodhouse wrote: From: David Woodhouse Signed-off-by: David Woodhouse --- MAINTAINERS | 9 + 1 file changed, 9 insertions(+) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 25/25] i386/xen: Initialize Xen backends from pc_basic_device_init() for emulation
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse Now that all the work is done to enable the PV backends to work without actual Xen, instantiate the bus from pc_basic_device_init() for emulated mode. This allows us finally to launch an emulated Xen guest with PV disk. qemu-system-x86_64 -serial mon:stdio -M q35 -cpu host -display none \ -m 1G -smp 2 -accel kvm,xen-version=0x4000a,kernel-irqchip=split \ -kernel bzImage -append "console=ttyS0 root=/dev/xvda1" \ -drive file=/var/lib/libvirt/images/fedora28.qcow2,if=none,id=disk \ -device xen-disk,drive=disk,vdev=xvda If we use -M pc instead of q35, we can even add an IDE disk and boot a guest image normally through grub. But q35 gives us AHCI and that isn't unplugged by the Xen magic, so the guests ends up seeing "both" disks. Signed-off-by: David Woodhouse --- hw/i386/pc.c | 7 +++ 1 file changed, 7 insertions(+) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 24/25] hw/xen: Implement soft reset for emulated gnttab
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse This is only part of it; we will also need to get the PV back end drivers to tear down their own mappings (or do it for them, but they kind of need to stop using the pointers too). Some more work on the actual PV back ends and xen-bus code is going to be needed to really make soft reset and migration fully functional, and this part is the basis for that. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_gnttab.c | 26 -- hw/i386/kvm/xen_gnttab.h | 1 + target/i386/kvm/xen-emu.c | 5 + 3 files changed, 30 insertions(+), 2 deletions(-) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 23/25] hw/xen: Map guest XENSTORE_PFN grant in emulated Xenstore
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 16 1 file changed, 16 insertions(+) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index 028f80499e..f9b7387024 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -21,6 +21,7 @@ #include "hw/sysbus.h" #include "hw/xen/xen.h" +#include "hw/xen/xen_backend_ops.h" #include "xen_overlay.h" #include "xen_evtchn.h" #include "xen_xenstore.h" @@ -34,6 +35,7 @@ #include "hw/xen/interface/io/xs_wire.h" #include "hw/xen/interface/event_channel.h" +#include "hw/xen/interface/grant_table.h" #define TYPE_XEN_XENSTORE "xen-xenstore" OBJECT_DECLARE_SIMPLE_TYPE(XenXenstoreState, XEN_XENSTORE) @@ -66,6 +68,9 @@ struct XenXenstoreState { uint8_t *impl_state; uint32_t impl_state_size; + +struct xengntdev_handle *gt; +void *granted_xs; }; struct XenXenstoreState *xen_xenstore_singleton; @@ -1452,6 +1457,17 @@ int xen_xenstore_reset(void) } s->be_port = err; +/* + * We don't actually access the guest's page through the grant, because + * this isn't real Xen, and we can just use the page we gave it in the + * first place. Map the grant anyway, mostly for cosmetic purposes so + * it *looks* like it's in use in the guest-visible grant table. Might be useful to stick this text in the commit comment too. Reviewed-by: Paul Durrant + */ +s->gt = qemu_xen_gnttab_open(); +uint32_t xs_gntref = GNTTAB_RESERVED_XENSTORE; +s->granted_xs = qemu_xen_gnttab_map_refs(s->gt, 1, xen_domid, &xs_gntref, + PROT_READ | PROT_WRITE); + return 0; }
Re: [RFC PATCH v1 22/25] hw/xen: Add emulated implementation of XenStore operations
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse Now that we have an internal implementation of XenStore, we can populate the xenstore_backend_ops to allow PV backends to talk to it. Watches can't be processed with immediate callbacks because that would call back into XenBus code recursively. Defer them to a QEMUBH to be run as appropriate from the main loop. We use a QEMUBH per XS handle, and it walks all the watches (there shouldn't be many per handle) to fire any which have pending events. We *could* have done it differently but this allows us to use the same struct watch_event as we have for the guest side, and keeps things relatively simple. Yes, it's more consistent with watch events on real Xen this way. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 273 - 1 file changed, 269 insertions(+), 4 deletions(-) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 21/25] hw/xen: Add emulated implementation of grant table operations
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse This is limited to mapping a single grant at a time, because under Xen the pages are mapped *contiguously* into qemu's address space, and that's very hard to do when those pages actually come from anonymous mappings in qemu in the first place. Eventually perhaps we can look at using shared mappings of actual objects for system RAM, and then we can make new mappings of the same backing store (be it deleted files, shmem, whatever). But for now let's stick to a page at a time. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_gnttab.c | 299 ++- 1 file changed, 296 insertions(+), 3 deletions(-) [snip] +static uint64_t gnt_ref(XenGnttabState *s, grant_ref_t ref, int prot) +{ +uint16_t mask = GTF_type_mask | GTF_sub_page; +grant_entry_v1_t gnt, *gnt_p; +int retries = 0; + +if (ref >= s->max_frames * ENTRIES_PER_FRAME_V1 || +s->map_track[ref] == UINT8_MAX) { +return INVALID_GPA; +} + +if (prot & PROT_WRITE) { +mask |= GTF_readonly; +} + +gnt_p = &s->entries.v1[ref]; + +/* + * The guest can legitimately be changing the GTF_readonly flag. Allow I'd call a guest playing with the ref after setting GTF_permit_access a buggy guest and not bother with the loop. + * that, but don't let a malicious guest cause a livelock. + */ +for (retries = 0; retries < 5; retries++) { +uint16_t new_flags; + +/* Read the entry before an atomic operation on its flags */ +gnt = *(volatile grant_entry_v1_t *)gnt_p; + +if ((gnt.flags & mask) != GTF_permit_access || +gnt.domid != DOMID_QEMU) { +return INVALID_GPA; +} + +new_flags = gnt.flags | GTF_reading; +if (prot & PROT_WRITE) { +new_flags |= GTF_writing; +} + +if (qatomic_cmpxchg(&gnt_p->flags, gnt.flags, new_flags) == gnt.flags) { Xen actually does a cmpxchg on both the flags and the domid. We probably ought to fail to set the flags if the guest is playing with the domid but since we're single-tenant it doesn't *really* matter... just a nice-to-have. So... Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 20/25] hw/xen: Hook up emulated implementation for event channel operations
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse We provided the backend-facing evtchn functions very early on as part of the core Xen platform support, since things like timers and xenstore need to use them. By what may or may not be an astonishing coincidence, those functions just *happen* all to have exactly the right function prototypes to slot into the evtchn_backend_ops table and be called by the PV backends. It is indeed an amazing coincidence :-) Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 15 +++ 1 file changed, 15 insertions(+) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 19/25] hw/xen: Only advertise ring-page-order for xen-block if gnttab supports it
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse Whem emulating Xen, multi-page grants are distinctly non-trivial and we have elected not to support them for the time being. Don't advertise them to the guest. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 18/25] hw/xen: Avoid crash when backend watch fires too early
On 02/03/2023 15:34, David Woodhouse wrote: From: Paul Durrant The xen-block code ends up calling aio_poll() through blkconf_geometry(), which means we see watch events during the indirect call to xendev_class->realize() in xen_device_realize(). Unfortunately this call is made before populating the initial frontend and backend device nodes in xenstore and hence xen_block_frontend_changed() (which is called from a watch event) fails to read the frontend's 'state' node, and hence believes the device is being torn down. This in-turn sets the backend state to XenbusStateClosed and causes the device to be deleted before it is fully set up, leading to the crash. By simply moving the call to xendev_class->realize() after the initial xenstore nodes are populated, this sorry state of affairs is avoided. Reported-by: David Woodhouse Signed-off-by: Paul Durrant Signed-off-by: David Woodhouse --- hw/xen/xen-bus.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 17/25] hw/xen: Build PV backend drivers for CONFIG_XEN_BUS
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse Now that we have the redirectable Xen backend operations we can build the PV backends even without the Xen libraries. Signed-off-by: David Woodhouse --- hw/9pfs/meson.build| 2 +- hw/block/dataplane/meson.build | 2 +- hw/block/meson.build | 2 +- hw/char/meson.build| 2 +- hw/display/meson.build | 2 +- hw/usb/meson.build | 2 +- hw/xen/meson.build | 5 - 7 files changed, 10 insertions(+), 7 deletions(-) Reviewed-by: Paul Durrant
Re: [RFC PATCH v1 16/25] hw/xen: Rename xen_common.h to xen_native.h
On 02/03/2023 15:34, David Woodhouse wrote: From: David Woodhouse This header is now only for native Xen code, not PV backends that may be used in Xen emulation. Since the toolstack libraries may depend on the specific version of Xen headers that they pull in (and will set the __XEN_TOOLS__ macro to enable internal definitions that they depend on), the rule is that xen_native.h (and thus the toolstack library headers) must be included *before* any of the headers in include/hw/xen/interface. Signed-off-by: David Woodhouse --- accel/xen/xen-all.c | 1 + hw/9pfs/xen-9p-backend.c | 1 + hw/block/dataplane/xen-block.c| 3 ++- hw/block/xen-block.c | 1 - hw/i386/pc_piix.c | 4 ++-- hw/i386/xen/xen-hvm.c | 11 +- hw/i386/xen/xen-mapcache.c| 2 +- hw/i386/xen/xen_platform.c| 7 +++--- hw/xen/trace-events | 2 +- hw/xen/xen-operations.c | 2 +- hw/xen/xen_pt.c | 2 +- hw/xen/xen_pt.h | 2 +- hw/xen/xen_pt_config_init.c | 2 +- hw/xen/xen_pt_msi.c | 4 ++-- include/hw/xen/xen.h | 22 --- include/hw/xen/{xen_common.h => xen_native.h} | 10 ++--- include/hw/xen/xen_pvdev.h| 3 ++- 17 files changed, 47 insertions(+), 32 deletions(-) rename include/hw/xen/{xen_common.h => xen_native.h} (98%) Reviewed-by: Paul Durrant