[PATCH v4 6/9] xen/arm/gic: Allow removing interrupt to running VMs
From: Vikram Garhwal

Currently, removing physical interrupts is only allowed at domain
destroy time. For use cases such as dynamic device tree overlay
removal, removing a physical IRQ from a running domain should be
allowed.

Move the above-mentioned domain dying check to vgic_connect_hw_irq().
As with routing an interrupt to a running domain, reject the operation
if the IRQ is active or pending in the guest. Do it for both the new
and old vGIC implementations.

Since vgic_connect_hw_irq() may now reject an invalid operation, move
the clearing of the _IRQ_INPROGRESS flag in gic_remove_irq_from_guest()
to after the successful execution of vgic_connect_hw_irq().

Signed-off-by: Vikram Garhwal
Signed-off-by: Stefano Stabellini
Signed-off-by: Henry Wang
---
v4:
- Split the original patch, only do the removing IRQ stuff in this patch.
- Move the clear of _IRQ_INPROGRESS flag in gic_remove_irq_from_guest()
  to after the successful execution of vgic_connect_hw_irq().
- Special case the d->is_dying check.
---
 xen/arch/arm/gic-vgic.c  | 27 ---
 xen/arch/arm/gic.c       |  9 +
 xen/arch/arm/vgic/vgic.c | 24
 3 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index b99e287224..56b6a3d5b0 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -439,6 +439,14 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     /* We are taking to rank lock to prevent parallel connections. */
     vgic_lock_rank(v_target, rank, flags);

+    /* Return with error if the IRQ is being migrated. */
+    if( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
+    {
+        vgic_unlock_rank(v_target, rank, flags);
+        return -EBUSY;
+    }
+
+    spin_lock(&v_target->arch.vgic.lock);
     if ( connect )
     {
@@ -456,12 +464,25 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     }
     else
     {
-        if ( desc && p->desc != desc )
-            ret = -EINVAL;
+        if ( d->is_dying )
+        {
+            if ( desc && p->desc != desc )
+                ret = -EINVAL;
+            else
+                p->desc = NULL;
+        }
         else
-            p->desc = NULL;
+        {
+            if ( (desc && p->desc != desc) ||
+                 test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
+                 test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
+                ret = -EINVAL;
+            else
+                p->desc = NULL;
+        }
     }

+    spin_unlock(&v_target->arch.vgic.lock);
     vgic_unlock_rank(v_target, rank, flags);

     return ret;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b3467a76ae..8633f14bdd 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -159,24 +159,17 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
     ASSERT(test_bit(_IRQ_GUEST, &desc->status));
     ASSERT(!is_lpi(virq));

-    /*
-     * Removing an interrupt while the domain is running may have
-     * undesirable effect on the vGIC emulation.
-     */
-    if ( !d->is_dying )
-        return -EBUSY;
-
     desc->handler->shutdown(desc);

     /* EOI the IRQ if it has not been done by the guest */
     if ( test_bit(_IRQ_INPROGRESS, &desc->status) )
         gic_hw_ops->deactivate_irq(desc);
-    clear_bit(_IRQ_INPROGRESS, &desc->status);

     ret = vgic_connect_hw_irq(d, NULL, virq, desc, false);
     if ( ret )
         return ret;

+    clear_bit(_IRQ_INPROGRESS, &desc->status);
     clear_bit(_IRQ_GUEST, &desc->status);
     desc->handler = &no_irq_type;

diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index 048e12c562..0c324b58f7 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -890,14 +890,30 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu,
     }
     else                        /* remove a mapped IRQ */
     {
-        if ( desc && irq->hwintid != desc->irq )
+        if ( d->is_dying )
         {
-            ret = -EINVAL;
+            if ( desc && irq->hwintid != desc->irq )
+            {
+                ret = -EINVAL;
+            }
+            else
+            {
+                irq->hw = false;
+                irq->hwintid = 0;
+            }
         }
         else
         {
-            irq->hw = false;
-            irq->hwintid = 0;
+            if ( (desc && irq->hwintid != desc->irq) ||
+                 irq->active || irq->pending_latch )
+            {
+                ret = -EINVAL;
+            }
+            else
+            {
+                irq->hw = false;
+                irq->hwintid = 0;
+            }
         }
     }

--
2.34.1
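
For illustration only (this is not part of the patch): the removal rule described in the commit message can be modelled as a small standalone C function. The function name and its boolean parameters are hypothetical; in Xen the corresponding state lives in the vGIC structures and is checked under the relevant locks. Here `desc_matches` is assumed to be true when no descriptor is supplied, or when the supplied one matches the IRQ currently connected.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the "remove" branch of vgic_connect_hw_irq():
 * returns 0 if the physical IRQ may be disconnected, -EINVAL otherwise.
 */
static int can_disconnect_hw_irq(bool is_dying, bool desc_matches,
                                 bool active, bool pending)
{
    /* A mismatching descriptor is rejected in every case. */
    if ( !desc_matches )
        return -EINVAL;

    /* A running domain must not have the IRQ active or pending. */
    if ( !is_dying && (active || pending) )
        return -EINVAL;

    return 0;
}
```

Under this reading, a dying domain skips the active/pending check so that teardown behaves as it did before the patch.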
[PATCH v4 8/9] tools: Introduce the "xl dt-overlay {attach,detach}" commands
With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach/detach devices from the provided DT overlay to domains.
Support this by introducing a new set of "xl dt-overlay" commands and
related documentation, i.e. "xl dt-overlay {attach,detach}". Slightly
rework the command option parsing logic.

Signed-off-by: Henry Wang
Reviewed-by: Jason Andryuk
---
v4:
- Add Jason's Reviewed-by tag.
v3:
- Introduce new API libxl_dt_overlay_domain() and co., instead of
  reusing existing API libxl_dt_overlay().
- Add in-code comments for the LIBXL_DT_OVERLAY_* macros.
- Use find_domain() to avoid getting domain_id from strtol().
v2:
- New patch.
---
 tools/include/libxl.h               | 10 +++
 tools/include/xenctrl.h             |  3 +++
 tools/libs/ctrl/xc_dt_overlay.c     | 31 +
 tools/libs/light/libxl_dt_overlay.c | 28 +++
 tools/xl/xl_cmdtable.c              |  4 +--
 tools/xl/xl_vmcontrol.c             | 42 -
 6 files changed, 104 insertions(+), 14 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 62cb07dea6..6cc6d6bf6a 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -2549,8 +2549,18 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, uint32_t domid,
 void libxl_device_pci_list_free(libxl_device_pci* list, int num);

 #if defined(__arm__) || defined(__aarch64__)
+/* Values should keep consistent with the op from XEN_SYSCTL_dt_overlay */
+#define LIBXL_DT_OVERLAY_ADD                   1
+#define LIBXL_DT_OVERLAY_REMOVE                2
 int libxl_dt_overlay(libxl_ctx *ctx, void *overlay,
                      uint32_t overlay_size, uint8_t overlay_op);
+
+/* Values should keep consistent with the op from XEN_DOMCTL_dt_overlay */
+#define LIBXL_DT_OVERLAY_DOMAIN_ATTACH         1
+#define LIBXL_DT_OVERLAY_DOMAIN_DETACH         2
+int libxl_dt_overlay_domain(libxl_ctx *ctx, uint32_t domain_id,
+                            void *overlay_dt, uint32_t overlay_dt_size,
+                            uint8_t overlay_op);
 #endif

 /*
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 4996855944..9ceca0cffc 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2657,6 +2657,9 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
 #if defined(__arm__) || defined(__aarch64__)
 int xc_dt_overlay(xc_interface *xch, void *overlay_fdt,
                   uint32_t overlay_fdt_size, uint8_t overlay_op);
+int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt,
+                         uint32_t overlay_fdt_size, uint8_t overlay_op,
+                         uint32_t domain_id);
 #endif

 /* Compat shims */
diff --git a/tools/libs/ctrl/xc_dt_overlay.c b/tools/libs/ctrl/xc_dt_overlay.c
index c2224c4d15..ea1da522d1 100644
--- a/tools/libs/ctrl/xc_dt_overlay.c
+++ b/tools/libs/ctrl/xc_dt_overlay.c
@@ -48,3 +48,34 @@ err:

     return err;
 }
+
+int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt,
+                         uint32_t overlay_fdt_size, uint8_t overlay_op,
+                         uint32_t domain_id)
+{
+    int err;
+    struct xen_domctl domctl = {
+        .cmd = XEN_DOMCTL_dt_overlay,
+        .domain = domain_id,
+        .u.dt_overlay = {
+            .overlay_op = overlay_op,
+            .overlay_fdt_size = overlay_fdt_size,
+        }
+    };
+
+    DECLARE_HYPERCALL_BOUNCE(overlay_fdt, overlay_fdt_size,
+                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+    if ( (err = xc_hypercall_bounce_pre(xch, overlay_fdt)) )
+        goto err;
+
+    set_xen_guest_handle(domctl.u.dt_overlay.overlay_fdt, overlay_fdt);
+
+    if ( (err = do_domctl(xch, &domctl)) != 0 )
+        PERROR("%s failed", __func__);
+
+err:
+    xc_hypercall_bounce_post(xch, overlay_fdt);
+
+    return err;
+}
diff --git a/tools/libs/light/libxl_dt_overlay.c b/tools/libs/light/libxl_dt_overlay.c
index a6c709a6dc..00503b76bd 100644
--- a/tools/libs/light/libxl_dt_overlay.c
+++ b/tools/libs/light/libxl_dt_overlay.c
@@ -69,3 +69,31 @@ out:
     return rc;
 }

+int libxl_dt_overlay_domain(libxl_ctx *ctx, uint32_t domain_id,
+                            void *overlay_dt, uint32_t overlay_dt_size,
+                            uint8_t overlay_op)
+{
+    int rc;
+    int r;
+    GC_INIT(ctx);
+
+    if (check_overlay_fdt(gc, overlay_dt, overlay_dt_size)) {
+        LOG(ERROR, "Overlay DTB check failed");
+        rc = ERROR_FAIL;
+        goto out;
+    } else {
+        LOG(DEBUG, "Overlay DTB check passed");
+        rc = 0;
+    }
+
+    r = xc_dt_overlay_domain(ctx->xch, overlay_dt, overlay_dt_size, overlay_op,
+                             domain_id);
+    if (r) {
+        LOG(ERROR, "%s: Attaching/Detaching overlay dtb failed.", __func__);
+        rc = ERROR_FAIL;
+    }
+
+out:
+    GC_FREE;
+    return rc;
+}
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 1
[PATCH v4 5/9] xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains
In order to support dynamic dtbo device assignment to a running VM,
the add/remove of the DT overlay and the attach/detach of the device
from the DT overlay should happen separately. Therefore, repurpose the
existing XEN_SYSCTL_dt_overlay to only add the DT overlay to the Xen
device tree, instead of assigning the device to the hardware domain at
the same time. Add the XEN_DOMCTL_dt_overlay with the operation
XEN_DOMCTL_DT_OVERLAY_ATTACH to do the device assignment to the domain.

The hypervisor first checks that the DT overlay passed from the
toolstack is valid. Then the device nodes are retrieved from the
overlay tracker based on the DT overlay. The attach of the device is
implemented by mapping the IRQ and IOMMU resources.

Signed-off-by: Henry Wang
Signed-off-by: Vikram Garhwal
---
v4:
- Split the original patch, only do the device attachment.
v3:
- Style fixes for arch-selection #ifdefs.
- Do not include public/domctl.h, only add a forward declaration of
  struct xen_domctl_dt_overlay.
- Extract the overlay track entry finding logic to a function, drop
  the unused variables.
- Use op code 1&2 for XEN_DOMCTL_DT_OVERLAY_{ATTACH,DETACH}.
v2:
- New patch.
---
 xen/arch/arm/domctl.c        |   3 +
 xen/common/dt-overlay.c      | 199 ++-
 xen/include/public/domctl.h  |  14 +++
 xen/include/public/sysctl.h  |  11 +-
 xen/include/xen/dt-overlay.h |   7 ++
 5 files changed, 176 insertions(+), 58 deletions(-)

diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index ad56efb0f5..12a12ee781 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2012, Citrix Systems
  */

+#include
 #include
 #include
 #include
@@ -176,6 +177,8 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,

         return rc;
     }
+    case XEN_DOMCTL_dt_overlay:
+        return dt_overlay_domctl(d, &domctl->u.dt_overlay);
     default:
         return subarch_do_domctl(domctl, d, u_domctl);
     }
diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 9cece79067..1087f9b502 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -356,6 +356,42 @@ static int overlay_get_nodes_info(const void *fdto, char **nodes_full_path)
     return 0;
 }

+/* This function should be called with the overlay_lock taken */
+static struct overlay_track *
+find_track_entry_from_tracker(const void *overlay_fdt,
+                              uint32_t overlay_fdt_size)
+{
+    struct overlay_track *entry, *temp;
+    bool found_entry = false;
+
+    ASSERT(spin_is_locked(&overlay_lock));
+
+    /*
+     * First check if dtbo is correct i.e. it should one of the dtbo which was
+     * used when dynamically adding the node.
+     * Limitation: Cases with same node names but different property are not
+     * supported currently. We are relying on user to provide the same dtbo
+     * as it was used when adding the nodes.
+     */
+    list_for_each_entry_safe( entry, temp, &overlay_tracker, entry )
+    {
+        if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 )
+        {
+            found_entry = true;
+            break;
+        }
+    }
+
+    if ( !found_entry )
+    {
+        printk(XENLOG_ERR "Cannot find any matching tracker with input dtbo."
+               " Operation is supported only for prior added dtbo.\n");
+        return NULL;
+    }
+
+    return entry;
+}
+
 /* Check if node itself can be removed and remove node from IOMMU. */
 static int remove_node_resources(struct dt_device_node *device_node)
 {
@@ -485,8 +521,7 @@ static long handle_remove_overlay_nodes(const void *overlay_fdt,
                                         uint32_t overlay_fdt_size)
 {
     int rc;
-    struct overlay_track *entry, *temp, *track;
-    bool found_entry = false;
+    struct overlay_track *entry;

     rc = check_overlay_fdt(overlay_fdt, overlay_fdt_size);
     if ( rc )
@@ -494,29 +529,10 @@ static long handle_remove_overlay_nodes(const void *overlay_fdt,

     spin_lock(&overlay_lock);

-    /*
-     * First check if dtbo is correct i.e. it should one of the dtbo which was
-     * used when dynamically adding the node.
-     * Limitation: Cases with same node names but different property are not
-     * supported currently. We are relying on user to provide the same dtbo
-     * as it was used when adding the nodes.
-     */
-    list_for_each_entry_safe( entry, temp, &overlay_tracker, entry )
-    {
-        if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 )
-        {
-            track = entry;
-            found_entry = true;
-            break;
-        }
-    }
-
-    if ( !found_entry )
+    entry = find_track_entry_from_tracker(overlay_fdt, overlay_fdt_size);
+    if ( entry == NULL )
     {
         rc = -EINVAL;
-
-        printk(XENLOG_ERR "Cannot find any matching tracker with input dtbo."
-               " Removing nodes is supported only for prior added dtbo.\n&
[PATCH v4 9/9] docs: Add device tree overlay documentation
From: Vikram Garhwal

Signed-off-by: Vikram Garhwal
Signed-off-by: Stefano Stabellini
Signed-off-by: Henry Wang
---
v4:
- No change.
v3:
- No change.
v2:
- Update the content based on the changes in this version.
---
 docs/misc/arm/overlay.txt | 99 +++
 1 file changed, 99 insertions(+)
 create mode 100644 docs/misc/arm/overlay.txt

diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt
new file mode 100644
index 00..811a6de369
--- /dev/null
+++ b/docs/misc/arm/overlay.txt
@@ -0,0 +1,99 @@
+# Device Tree Overlays support in Xen
+
+Xen now supports dynamic device assignment to running domains,
+i.e. adding/removing nodes (using .dtbo) to/from Xen device tree, and
+attaching/detaching them to/from a running domain with given $domid.
+
+Dynamic node assignment works in two steps:
+
+## Add/Remove device tree overlay to/from Xen device tree
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor applies/removes the dtbo to/from Xen device tree.
+
+## Attach/Detach device from the DT overlay to/from domain
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor attaches/detaches the device to/from the user-provided $domid
+   by mapping/unmapping node resources in the DT overlay.
+
+# Examples
+
+Here are a few examples on how to use it.
+
+## Dom0 device add
+
+For assigning a device tree overlay to Dom0, the user should first properly
+prepare the DT overlay. More information about device tree overlays can be
+found in [1]. Then, in Dom0, enter the following:
+
+    (dom0) xl dt-overlay add overlay.dtbo
+
+This will add the devices mentioned in overlay.dtbo to the Xen device tree.
+
+To assign the newly added device from the dtbo to Dom0:
+
+    (dom0) xl dt-overlay attach overlay.dtbo 0
+
+Next, if the user wants to add the same device tree overlay to dom0
+Linux, execute the following:
+
+    (dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+    (dom0) cat overlay.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be loaded using:
+
+    (dom0) modprobe module_name.ko
+
+## Dom0 device remove
+
+For removing the device from Dom0, first detach the device from Dom0:
+
+    (dom0) xl dt-overlay detach overlay.dtbo 0
+
+NOTE: The user is expected to unload any Linux kernel modules which
+might be accessing the devices in overlay.dtbo before detaching the device.
+Detaching devices without unloading the modules might result in a crash.
+
+Then remove the overlay from Xen device tree:
+
+    (dom0) xl dt-overlay remove overlay.dtbo
+
+## DomU device add/remove
+
+All the nodes in dtbo will be assigned to a domain; the user will need
+to prepare the dtb for the domU. For example, the `interrupt-parent` property
+of the DomU overlay should be changed to the Xen hardcoded value `0xfde8`.
+Below assumes the properly written DomU dtbo is `overlay_domu.dtbo`.
+
+The user will need to create the DomU with the below properties properly
+configured in the xl config file:
+- `iomem`
+- `passthrough` (if IOMMU is needed)
+
+User will also need to modprobe the relevant drivers.
+
+Example for domU device add:
+
+    (dom0) xl dt-overlay add overlay.dtbo            # If not executed before
+    (dom0) xl dt-overlay attach overlay.dtbo $domid
+    (dom0) xl console $domid                         # To access $domid console
+
+Next, if the user needs to modify/prepare the overlay.dtbo suitable for
+the domU:
+
+    (domU) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+    (domU) cat overlay_domu.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be probed:
+
+    (domU) modprobe module_name.ko
+
+Example for domU overlay remove:
+
+    (dom0) xl dt-overlay detach overlay.dtbo $domid
+    (dom0) xl dt-overlay remove overlay.dtbo
+
+[1] https://www.kernel.org/doc/Documentation/devicetree/overlay-notes.txt
--
2.34.1
[PATCH v4 1/9] tools/xl: Correct the help information and exit code of the dt-overlay command
Fix the name mismatch in the xl dt-overlay command, the command name
should be "dt-overlay" instead of "dt_overlay". Add the missing "," in
the cmdtable.

Fix the exit code of the dt-overlay command, use EXIT_FAILURE instead
of ERROR_FAIL.

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support")
Suggested-by: Anthony PERARD
Signed-off-by: Henry Wang
Reviewed-by: Jason Andryuk
---
v4:
- No change.
v3:
- Add Jason's Reviewed-by tag.
v2:
- New patch
---
 tools/xl/xl_cmdtable.c  | 2 +-
 tools/xl/xl_vmcontrol.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 62bdb2aeaa..1f3c6b5897 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -635,7 +635,7 @@ const struct cmd_spec cmd_table[] = {
     { "dt-overlay",
       &main_dt_overlay, 0, 1,
       "Add/Remove a device tree overlay",
-      "add/remove <.dtbo>"
+      "add/remove <.dtbo>",
       "-h print this help\n"
     },
 #endif
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 98f6bd2e76..02575d5d36 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1278,7 +1278,7 @@ int main_dt_overlay(int argc, char **argv)
     const int overlay_remove_op = 2;

     if (argc < 2) {
-        help("dt_overlay");
+        help("dt-overlay");
         return EXIT_FAILURE;
     }

@@ -1302,11 +1302,11 @@ int main_dt_overlay(int argc, char **argv)
             fprintf(stderr, "failed to read the overlay device tree file %s\n",
                     overlay_config_file);
             free(overlay_dtb);
-            return ERROR_FAIL;
+            return EXIT_FAILURE;
         }
     } else {
         fprintf(stderr, "overlay dtbo file not provided\n");
-        return ERROR_FAIL;
+        return EXIT_FAILURE;
     }

     rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
--
2.34.1
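
As a side note (not part of the patch), one plausible reason the exit code matters: libxl error codes are negative, and on POSIX systems only the low 8 bits of a process exit value reach the parent. The sketch below assumes ERROR_FAIL is -3, which is my recollection of libxl and is not stated in this patch; the helper name is hypothetical.

```c
#include <stdlib.h>

/* Assumed value mirroring libxl's ERROR_FAIL; not taken from the patch. */
#define ASSUMED_ERROR_FAIL (-3)

/*
 * Hypothetical helper: models what a POSIX parent observes as the exit
 * status, i.e. only the low 8 bits of the returned value.
 */
static unsigned int observed_exit_status(int returned)
{
    return (unsigned int)returned & 0xffu;
}
```

With these assumptions a command returning ERROR_FAIL would surface to the shell as status 253 rather than the conventional EXIT_FAILURE.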
[PATCH v4 3/9] tools/arm: Introduce the "nr_spis" xl config entry
Currently, the number of SPIs allocated to the domain is only
configurable for Dom0less DomUs. Xen domains are supposed to be
platform agnostic and therefore the number of SPIs for libxl guests
should not be based on the hardware.

Introduce a new xl config entry for Arm to provide a method for the
user to decide the number of SPIs. This would help to avoid bumping
the `config->arch.nr_spis` in libxl every time there is a new platform
with increased SPI numbers.

Update the doc and the golang bindings accordingly.

Signed-off-by: Henry Wang
Reviewed-by: Jason Andryuk
---
v4:
- Add Jason's Reviewed-by tag.
v3:
- Reword documentation to avoid ambiguity.
v2:
- New patch to replace the original patch in v1:
  "[PATCH 05/15] tools/libs/light: Increase nr_spi to 160"
---
 docs/man/xl.cfg.5.pod.in             | 14 ++
 tools/golang/xenlight/helpers.gen.go |  2 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/libs/light/libxl_arm.c         |  4 ++--
 tools/libs/light/libxl_types.idl     |  1 +
 tools/xl/xl_parse.c                  |  3 +++
 6 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 8f2b375ce9..416d582844 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -3072,6 +3072,20 @@ raised.

 =back

+=over 4
+
+=item B
+
+An optional 32-bit integer parameter specifying the number of SPIs (Shared
+Peripheral Interrupts) to allocate for the domain. If the value specified by
+the `nr_spis` parameter is smaller than the number of SPIs calculated by the
+toolstack based on the devices allocated for the domain, or the `nr_spis`
+parameter is not specified, the value calculated by the toolstack will be used
+for the domain. Otherwise, the value specified by the `nr_spis` parameter will
+be used.
+
+=back
+
 =head3 x86

 =over 4
diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
index b9cb5b33c7..fe5110474d 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1154,6 +1154,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.ArchArm.SveVl = SveType(xc.arch_arm.sve_vl)
+x.ArchArm.NrSpis = uint32(xc.arch_arm.nr_spis)
 if err := x.ArchX86.MsrRelaxed.fromC(&xc.arch_x86.msr_relaxed);err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
@@ -1670,6 +1671,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.arch_arm.sve_vl = C.libxl_sve_type(x.ArchArm.SveVl)
+xc.arch_arm.nr_spis = C.uint32_t(x.ArchArm.NrSpis)
 if err := x.ArchX86.MsrRelaxed.toC(&xc.arch_x86.msr_relaxed); err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go
index 5b293755d7..c9e45b306f 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -597,6 +597,7 @@ ArchArm struct {
 GicVersion GicVersion
 Vuart VuartType
 SveVl SveType
+NrSpis uint32
 }
 ArchX86 struct {
 MsrRelaxed Defbool
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 1cb89fa584..a4029e3ac8 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -181,8 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,

     LOG(DEBUG, "Configure the domain");

-    config->arch.nr_spis = nr_spis;
-    LOG(DEBUG, " - Allocate %u SPIs", nr_spis);
+    config->arch.nr_spis = max(nr_spis, d_config->b_info.arch_arm.nr_spis);
+    LOG(DEBUG, " - Allocate %u SPIs", config->arch.nr_spis);

     switch (d_config->b_info.arch_arm.gic_version) {
     case LIBXL_GIC_VERSION_DEFAULT:
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 79e9c656cc..4e65e6fda5 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -722,6 +722,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
                                ("vuart", libxl_vuart_type),
                                ("sve_vl", libxl_sve_type),
+                               ("nr_spis", uint32),
                               ])),
     ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool),
                               ])),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index c504ab3711..e3a4800f6e 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2935,6 +2935,9 @@ skip_usbdev:
         }
     }

+    if (!xlu_cfg_get_long (config, "nr_spis", &l, 0))
+        b_info->arch_arm.nr_spis = l;
+
     parse_vkb_list(config, d_config);

     d_config->virtios = NULL;
--
2.34.1
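
To make the selection rule from the man page text concrete, here is a minimal standalone sketch (not part of the patch). The function name is hypothetical, and it assumes an unset `nr_spis` config entry is represented as 0, so that the toolstack-calculated value wins in that case.

```c
#include <stdint.h>

/*
 * Hypothetical model of the max() used in
 * libxl__arch_domain_prepare_config(): the larger of the value the
 * toolstack calculated and the value from the xl config is used.
 * Assumption: an unset config entry reads as 0.
 */
static uint32_t effective_nr_spis(uint32_t toolstack_nr_spis,
                                  uint32_t configured_nr_spis)
{
    return toolstack_nr_spis > configured_nr_spis ? toolstack_nr_spis
                                                  : configured_nr_spis;
}
```

So a config value can only raise the SPI count above what the toolstack derives from the assigned devices, never lower it.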
[PATCH v4 7/9] xen/arm: Support device detachment from domains
Similar to the device attachment from DT overlay to domain, this
commit implements device detachment from a domain. The DOMCTL
XEN_DOMCTL_dt_overlay op is extended to have the operation
XEN_DOMCTL_DT_OVERLAY_DETACH. The detachment of the device is
implemented by unmapping the IRQ and IOMMU resources. Note that with
these changes, the device de-registration from the IOMMU driver should
only happen at the time when the DT overlay is removed from the Xen
device tree.

Signed-off-by: Henry Wang
Signed-off-by: Vikram Garhwal
---
v4:
- Split the original patch, only do device detachment from domain.
---
 xen/common/dt-overlay.c     | 243
 xen/include/public/domctl.h |   3 +-
 2 files changed, 194 insertions(+), 52 deletions(-)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1087f9b502..693b6e4777 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -392,24 +392,100 @@ find_track_entry_from_tracker(const void *overlay_fdt,
     return entry;
 }

+static int remove_irq(unsigned long s, unsigned long e, void *data)
+{
+    struct domain *d = data;
+    int rc = 0;
+
+    /*
+     * IRQ should always have access unless there are duplication of
+     * of irqs in device tree. There are few cases of xen device tree
+     * where there are duplicate interrupts for the same node.
+     */
+    if (!irq_access_permitted(d, s))
+        return 0;
+    /*
+     * TODO: We don't handle shared IRQs for now. So, it is assumed that
+     * the IRQs was not shared with another domain.
+     */
+    rc = irq_deny_access(d, s);
+    if ( rc )
+    {
+        printk(XENLOG_ERR "unable to revoke access for irq %ld\n", s);
+        return rc;
+    }
+
+    rc = release_guest_irq(d, s);
+    if ( rc )
+    {
+        printk(XENLOG_ERR "unable to release irq %ld\n", s);
+        return rc;
+    }
+
+    return rc;
+}
+
+static int remove_all_irqs(struct rangeset *irq_ranges, struct domain *d)
+{
+    return rangeset_report_ranges(irq_ranges, 0, ~0UL, remove_irq, d);
+}
+
+static int remove_iomem(unsigned long s, unsigned long e, void *data)
+{
+    struct domain *d = data;
+    int rc = 0;
+    p2m_type_t t;
+    mfn_t mfn;
+
+    mfn = p2m_lookup(d, _gfn(s), &t);
+    if ( mfn_x(mfn) == 0 || mfn_x(mfn) == ~0UL )
+        return -EINVAL;
+
+    rc = iomem_deny_access(d, s, e);
+    if ( rc )
+    {
+        printk(XENLOG_ERR "Unable to remove %pd access to %#lx - %#lx\n",
+               d, s, e);
+        return rc;
+    }
+
+    rc = unmap_mmio_regions(d, _gfn(s), e - s, _mfn(s));
+    if ( rc )
+        return rc;
+
+    return rc;
+}
+
+static int remove_all_iomems(struct rangeset *iomem_ranges, struct domain *d)
+{
+    return rangeset_report_ranges(iomem_ranges, 0, ~0UL, remove_iomem, d);
+}
+
 /* Check if node itself can be removed and remove node from IOMMU. */
-static int remove_node_resources(struct dt_device_node *device_node)
+static int remove_node_resources(struct dt_device_node *device_node,
+                                 struct domain *d)
 {
     int rc = 0;
     unsigned int len;
     domid_t domid;

-    domid = dt_device_used_by(device_node);
+    if ( !d )
+    {
+        domid = dt_device_used_by(device_node);

-    dt_dprintk("Checking if node %s is used by any domain\n",
-               device_node->full_name);
+        dt_dprintk("Checking if node %s is used by any domain\n",
+                   device_node->full_name);

-    /* Remove the node if only it's assigned to hardware domain or domain io. */
-    if ( domid != hardware_domain->domain_id && domid != DOMID_IO )
-    {
-        printk(XENLOG_ERR "Device %s is being used by domain %u. Removing nodes failed\n",
-               device_node->full_name, domid);
-        return -EINVAL;
+        /*
+         * We also check if device is assigned to DOMID_IO as when a domain
+         * is destroyed device is assigned to DOMID_IO.
+         */
+        if ( domid != DOMID_IO )
+        {
+            printk(XENLOG_ERR "Device %s is being assigned to %u. Device is assigned to %d\n",
+                   device_node->full_name, DOMID_IO, domid);
+            return -EINVAL;
+        }
     }

     /* Check if iommu property exists. */
@@ -417,9 +493,12 @@ static int remove_node_resources(struct dt_device_node *device_node)
     {
         if ( dt_device_is_protected(device_node) )
         {
-            rc = iommu_remove_dt_device(device_node);
-            if ( rc < 0 )
-                return rc;
+            if ( !list_empty(&device_node->domain_list) )
+            {
+                rc = iommu_deassign_dt_device(d, device_node);
+                if ( rc < 0 )
+                    return rc;
+            }
         }
     }

@@ -428,7 +507,8 @@ static int remove_node_resources(struct dt_device_node *device_node)

 /* Remove all descendants
[PATCH v4 4/9] xen/arm/gic: Allow adding interrupt to running VMs
Currently, adding physical interrupts is only allowed at domain
creation time. For use cases such as dynamic device tree overlay
addition, adding a physical IRQ to a running domain should be allowed.

Drop the above-mentioned domain creation check. Since this would
introduce interrupt state desynchronization issues when the interrupt
is active or pending in the guest, simply reject the operation in
these cases. Do it for both the new and old vGIC implementations.

Signed-off-by: Henry Wang
---
v4:
- Split the original patch, only do the adding IRQ stuff in this patch.
v3:
- Update in-code comments.
- Correct the if conditions.
- Add taking/releasing the vgic lock of the vcpu.
v2:
- Reject the case where the IRQ is active or pending in guest.
---
 xen/arch/arm/gic-vgic.c  | 9 +++--
 xen/arch/arm/gic.c       | 8
 xen/arch/arm/vgic/vgic.c | 7 +--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 56490dbc43..b99e287224 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -442,9 +442,14 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,

     if ( connect )
     {
-        /* The VIRQ should not be already enabled by the guest */
+        /*
+         * The VIRQ should not be already enabled by the guest nor
+         * active/pending in the guest.
+         */
         if ( !p->desc &&
-             !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+             !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
+             !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) &&
+             !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
             p->desc = desc;
         else
             ret = -EBUSY;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 44c40e86de..b3467a76ae 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -135,14 +135,6 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
     ASSERT(virq < vgic_num_irqs(d));
     ASSERT(!is_lpi(virq));

-    /*
-     * When routing an IRQ to guest, the virtual state is not synced
-     * back to the physical IRQ. To prevent get unsync, restrict the
-     * routing to when the Domain is been created.
-     */
-    if ( d->creation_finished )
-        return -EBUSY;
-
     ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);
     if ( ret )
         return ret;
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index b9463a5f27..048e12c562 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -876,8 +876,11 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu,

     if ( connect )                      /* assign a mapped IRQ */
     {
-        /* The VIRQ should not be already enabled by the guest */
-        if ( !irq->hw && !irq->enabled )
+        /*
+         * The VIRQ should not be already enabled by the guest nor
+         * active/pending in the guest
+         */
+        if ( !irq->hw && !irq->enabled && !irq->active && !irq->pending_latch )
         {
             irq->hw = true;
             irq->hwintid = desc->irq;
--
2.34.1
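
For illustration only (not part of the patch): the "connect" precondition in the new vGIC hunk can be modelled as a standalone C function. The struct and function names are hypothetical simplifications; the real code reads this state from the vGIC IRQ structure while holding the appropriate locks.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-IRQ virtual state. */
struct virq_state {
    bool hw;       /* already backed by a physical IRQ */
    bool enabled;  /* enabled by the guest */
    bool active;   /* active in the guest */
    bool pending;  /* pending in the guest */
};

/* Returns 0 if a physical IRQ may be connected, -EBUSY otherwise. */
static int can_connect_hw_irq(const struct virq_state *s)
{
    if ( s->hw || s->enabled || s->active || s->pending )
        return -EBUSY;

    return 0;
}
```

The point of the extra active/pending conditions is that, with the creation-time restriction gone, a running guest may already hold state for the virtual IRQ that the physical side knows nothing about.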
[PATCH v4 2/9] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
There are some use cases in which the dom0less domUs need to have
the XEN_DOMCTL_CDF_iommu set at the domain construction time. For
example, the dynamic dtbo feature allows the domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need to have a way to specify the domain will need the IOMMU
mapping at domain construction time.

Introduce a "passthrough" DT property for Dom0less DomUs following
the same entry as the xl.cfg. Currently only provide two options,
i.e. "enabled" and "disabled". Set the XEN_DOMCTL_CDF_iommu at domain
construction time based on the property.

Signed-off-by: Henry Wang
---
v4:
- No change.
v3:
- Use a separate variable to cache the condition from the "passthrough"
  flag separately to improve readability.
- Update the doc to explain the default condition more clearly.
v2:
- New patch to replace the original patch in v1:
  "[PATCH 03/15] xen/arm: Always enable IOMMU"
---
 docs/misc/arm/device-tree/booting.txt | 16 ++++++++++++++++
 xen/arch/arm/dom0less-build.c         | 11 +++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2..f1fd069c87 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -260,6 +260,22 @@ with the following properties:
     value specified by Xen command line parameter gnttab_max_maptrack_frames
     (or its default value if unspecified, i.e. 1024) is used.
 
+- passthrough
+
+    A string property specifying whether IOMMU mappings are enabled for the
+    domain and hence whether it will be enabled for passthrough hardware.
+    Possible property values are:
+
+    - "enabled"
+    IOMMU mappings are enabled for the domain. Note that this option is the
+    default if the user provides the device partial passthrough device tree
+    for the domain.
+
+    - "disabled"
+    IOMMU mappings are disabled for the domain and so hardware may not be
+    passed through. This option is the default if this property is missing
+    and the user does not provide the device partial device tree for the domain.
+
 Under the "xen,domain" compatible node, one or more sub-nodes are present
 for the DomU kernel and ramdisk.
 
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..5830a7051d 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -848,6 +848,8 @@ static int __init construct_domU(struct domain *d,
 void __init create_domUs(void)
 {
     struct dt_device_node *node;
+    const char *dom0less_iommu;
+    bool iommu = false;
     const struct dt_device_node *cpupool_node,
                                 *chosen = dt_find_node_by_path("/chosen");
 
@@ -895,8 +897,13 @@ void __init create_domUs(void)
             panic("Missing property 'cpus' for domain %s\n",
                   dt_node_name(node));
 
-        if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") &&
-             iommu_enabled )
+        if ( !dt_property_read_string(node, "passthrough", &dom0less_iommu) &&
+             !strcmp(dom0less_iommu, "enabled") )
+            iommu = true;
+
+        if ( iommu_enabled &&
+             (iommu || dt_find_compatible_node(node, NULL,
+                                               "multiboot,device-tree")) )
             d_cfg.flags |= XEN_DOMCTL_CDF_iommu;
 
         if ( !dt_property_read_u32(node, "nr_spis", &d_cfg.arch.nr_spis) )
-- 
2.34.1
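The flag decision made in create_domUs() above can be modelled outside Xen as a small pure function. This is only a sketch of the logic as the diff reads: SKETCH_CDF_IOMMU and sketch_domu_iommu_flags() are hypothetical names invented for illustration, the bit value is arbitrary, and the real code reads the property with dt_property_read_string() rather than taking a string argument.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for XEN_DOMCTL_CDF_iommu; the bit value is illustrative. */
#define SKETCH_CDF_IOMMU (1u << 0)

/*
 * passthrough:    value of the "passthrough" property, or NULL if absent.
 * has_partial_dt: true if a "multiboot,device-tree" sub-node is present.
 * iommu_enabled:  true if the platform IOMMU is enabled at all.
 */
unsigned int sketch_domu_iommu_flags(const char *passthrough,
                                     bool has_partial_dt,
                                     bool iommu_enabled)
{
    /* Only the exact string "enabled" requests IOMMU mappings. */
    bool iommu = passthrough != NULL && strcmp(passthrough, "enabled") == 0;

    /* Without a usable IOMMU the flag is never set. */
    if ( iommu_enabled && (iommu || has_partial_dt) )
        return SKETCH_CDF_IOMMU;

    return 0;
}
```

Note that, read this way, "disabled" does not override a partial passthrough device tree: sketch_domu_iommu_flags("disabled", true, true) still returns the flag, because the two conditions are OR-ed.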
[PATCH v4 0/9] Remaining patches for dynamic node programming using overlay dtbo
Hi all,

This is the remaining series for the fully functional "dynamic node
programming using overlay dtbo" feature. The first part [1] has
already been merged.

Quoting from the original series, the first part has already made
Xen aware of new device tree nodes, which means updating the dt_host
with overlay node information. In this series, the goal is to map IRQ
and IOMMU during runtime, where we will do the actual IOMMU and IRQ
mapping and unmapping to a running domain. Also, documentation of the
"dynamic node programming using overlay dtbo" feature is added.

During the discussion in v3, it was recommended that I split the
overlay device attach/detach to/from running domains into separate
patches [3]. But I decided to expose the xl user interfaces to users
only after device attach/detach is fully functional, so I didn't
split the toolstack patch (#8).

Patch 1 fixes an issue in the existing code that I noticed during my
local tests; please see the commit message for details.

Gitlab CI for this series can be found in [2].

[1] https://lore.kernel.org/xen-devel/20230906011631.30310-1-vikram.garh...@amd.com/
[2] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1301720278
[3] https://lore.kernel.org/xen-devel/e743d3d2-5884-4e55-8627-85985ba33...@amd.com/

Henry Wang (7):
  tools/xl: Correct the help information and exit code of the dt-overlay command
  xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
  tools/arm: Introduce the "nr_spis" xl config entry
  xen/arm/gic: Allow adding interrupt to running VMs
  xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains
  xen/arm: Support device detachment from domains
  tools: Introduce the "xl dt-overlay {attach,detach}" commands

Vikram Garhwal (2):
  xen/arm/gic: Allow removing interrupt to running VMs
  docs: Add device tree overlay documentation

 docs/man/xl.cfg.5.pod.in              |  14 +
 docs/misc/arm/device-tree/booting.txt |  16 +
 docs/misc/arm/overlay.txt             |  99 ++
 tools/golang/xenlight/helpers.gen.go  |   2 +
 tools/golang/xenlight/types.gen.go    |   1 +
 tools/include/libxl.h                 |  10 +
 tools/include/xenctrl.h               |   3 +
 tools/libs/ctrl/xc_dt_overlay.c       |  31 ++
 tools/libs/light/libxl_arm.c          |   4 +-
 tools/libs/light/libxl_dt_overlay.c   |  28 ++
 tools/libs/light/libxl_types.idl      |   1 +
 tools/xl/xl_cmdtable.c                |   4 +-
 tools/xl/xl_parse.c                   |   3 +
 tools/xl/xl_vmcontrol.c               |  48 ++-
 xen/arch/arm/dom0less-build.c         |  11 +-
 xen/arch/arm/domctl.c                 |   3 +
 xen/arch/arm/gic-vgic.c               |  36 ++-
 xen/arch/arm/gic.c                    |  17 +-
 xen/arch/arm/vgic/vgic.c              |  31 +-
 xen/common/dt-overlay.c               | 438 --
 xen/include/public/domctl.h           |  15 +
 xen/include/public/sysctl.h           |  11 +-
 xen/include/xen/dt-overlay.h          |   7 +
 23 files changed, 678 insertions(+), 155 deletions(-)
 create mode 100644 docs/misc/arm/overlay.txt

-- 
2.34.1
Re: [PATCH v3 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs
Hi Julien, Stefano,

On 5/22/2024 9:03 PM, Julien Grall wrote:
> Hi Henry,
>
> On 22/05/2024 02:22, Henry Wang wrote:
>>>>> Also, while looking at the locking, I noticed that we are not doing
>>>>> anything with GIC_IRQ_GUEST_MIGRATING. In gic_update_one_lr(), we
>>>>> seem to assume that if the flag is set, then p->desc cannot be NULL.
>>>>>
>>>>> Can we reach vgic_connect_hw_irq() with the flag set?
>>>> I think even from the perspective of making the code extra safe, we
>>>> should also check GIC_IRQ_GUEST_MIGRATING as the LR is allocated for
>>>> this case. I will also add the check of GIC_IRQ_GUEST_MIGRATING here.
>>> Yes. I think it might be easier to check for GIC_IRQ_GUEST_MIGRATING
>>> early and return error immediately in that case. Otherwise, we can
>>> continue and take spin_lock(&v_target->arch.vgic.lock) because no
>>> migration is in progress
>> Ok, this makes sense to me, I will add
>>
>>     if( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>>     {
>>         vgic_unlock_rank(v_target, rank, flags);
>>         return -EBUSY;
>>     }
>>
>> right after taking the vgic rank lock.

Summary of our yesterday's discussion on Matrix:
For the split of patch mentioned in...

> I think that would be ok. I have to admit, I am still a bit wary about
> allowing to remove interrupts when the domain is running.
>
> I am less concerned about the add part. Do you need the remove part
> now? If not, I would suggest to split in two so we can get the most of
> this series merged for 4.19 and continue to deal with the remove path
> in the background.

...here, I will do that in the next version.

> I will answer here to the other reply:
>
> > I don't think so, if I am not mistaken, no LR will be allocated with
> > other flags set.
>
> I wasn't necessarily thinking about the LR allocation. I was more
> thinking whether there are any flags that could still be set.
>
> IOW, will the vIRQ be like new once vgic_connect_hw_irq() is successful?
>
> Also, while looking at the flags, I noticed we clear _IRQ_INPROGRESS
> before vgic_connect_hw_irq(). Shouldn't we only clear *after*?

This is a good catch. With the logic of vgic_connect_hw_irq() extended
to reject the invalid cases, it is indeed safer to clear the
_IRQ_INPROGRESS after the successful vgic_connect_hw_irq(). I will move
it after.

> This brings to another question. You don't special case a dying
> domain. If the domain is crashing, wouldn't this mean it wouldn't be
> possible to destroy it?

Another good point, thanks. I will try to make a special case of the
dying domain.

Kind regards,
Henry

> Cheers,
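The checks agreed in this reply (reject early with -EBUSY while migrating, reject a visible or active IRQ on a running domain, and special-case a dying domain so it can still be destroyed) can be put together as a small decision function. This is a minimal model, not the Xen code: struct sketch_irq_state and sketch_can_remove_irq() are hypothetical names, the booleans stand in for the GIC_IRQ_GUEST_* status bits, and the real function also takes the rank and vGIC locks, which are omitted here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the pending_irq state relevant to removal. */
struct sketch_irq_state {
    bool migrating;    /* models GIC_IRQ_GUEST_MIGRATING */
    bool visible;      /* models GIC_IRQ_GUEST_VISIBLE */
    bool active;       /* models GIC_IRQ_GUEST_ACTIVE */
    bool desc_matches; /* caller's desc is the one currently connected */
};

/* Returns 0 if the physical IRQ may be disconnected, a negative errno if not. */
int sketch_can_remove_irq(const struct sketch_irq_state *s, bool domain_dying)
{
    /* A migrating IRQ still owns an LR, so bail out before anything else. */
    if ( s->migrating )
        return -EBUSY;

    /* Disconnecting somebody else's descriptor is always invalid. */
    if ( !s->desc_matches )
        return -EINVAL;

    /* A running guest must not lose an IRQ it can currently see or handle. */
    if ( !domain_dying && (s->visible || s->active) )
        return -EINVAL;

    /* A dying domain skips the state checks so that teardown can proceed. */
    return 0;
}
```

Under this model the caller would clear _IRQ_INPROGRESS only when the function returns 0, which is the ordering change discussed above.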
Re: [PATCH v3 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs
Hi Stefano, On 5/22/2024 9:16 AM, Stefano Stabellini wrote: On Wed, 22 May 2024, Henry Wang wrote: Hi Julien, On 5/21/2024 8:30 PM, Julien Grall wrote: Hi, On 21/05/2024 05:35, Henry Wang wrote: diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c index 56490dbc43..956c11ba13 100644 --- a/xen/arch/arm/gic-vgic.c +++ b/xen/arch/arm/gic-vgic.c @@ -439,24 +439,33 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq, /* We are taking to rank lock to prevent parallel connections. */ vgic_lock_rank(v_target, rank, flags); + spin_lock(_target->arch.vgic.lock); I know this is what Stefano suggested, but v_target would point to the current affinity whereas the interrupt may be pending/active on the "previous" vCPU. So it is a little unclear whether v_target is the correct lock. Do you have more pointer to show this is correct? No I think you are correct, we have discussed this in the initial version of this patch. Sorry. I followed the way from that discussion to note down the vcpu ID and retrieve here, below is the diff, would this make sense to you? diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c index 956c11ba13..134ed4e107 100644 --- a/xen/arch/arm/gic-vgic.c +++ b/xen/arch/arm/gic-vgic.c @@ -439,7 +439,7 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq, /* We are taking to rank lock to prevent parallel connections. 
*/ vgic_lock_rank(v_target, rank, flags); - spin_lock(_target->arch.vgic.lock); + spin_lock(>vcpu[p->spi_vcpu_id]->arch.vgic.lock); if ( connect ) { @@ -465,7 +465,7 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq, p->desc = NULL; } - spin_unlock(_target->arch.vgic.lock); + spin_unlock(>vcpu[p->spi_vcpu_id]->arch.vgic.lock); vgic_unlock_rank(v_target, rank, flags); return ret; diff --git a/xen/arch/arm/include/asm/vgic.h b/xen/arch/arm/include/asm/vgic.h index 79b73a0dbb..f4075d3e75 100644 --- a/xen/arch/arm/include/asm/vgic.h +++ b/xen/arch/arm/include/asm/vgic.h @@ -85,6 +85,7 @@ struct pending_irq uint8_t priority; uint8_t lpi_priority; /* Caches the priority if this is an LPI. */ uint8_t lpi_vcpu_id; /* The VCPU for an LPI. */ + uint8_t spi_vcpu_id; /* The VCPU for an SPI. */ /* inflight is used to append instances of pending_irq to * vgic.inflight_irqs */ struct list_head inflight; diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index c04fc4f83f..e852479f13 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -632,6 +632,7 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq, } list_add_tail(>inflight, >arch.vgic.inflight_irqs); out: + n->spi_vcpu_id = v->vcpu_id; spin_unlock_irqrestore(>arch.vgic.lock, flags); /* we have a new higher priority irq, inject it into the guest */ vcpu_kick(v); Also, while looking at the locking, I noticed that we are not doing anything with GIC_IRQ_GUEST_MIGRATING. In gic_update_one_lr(), we seem to assume that if the flag is set, then p->desc cannot be NULL. Can we reach vgic_connect_hw_irq() with the flag set? I think even from the perspective of making the code extra safe, we should also check GIC_IRQ_GUEST_MIGRATING as the LR is allocated for this case. I will also add the check of GIC_IRQ_GUEST_MIGRATING here. Yes. I think it might be easier to check for GIC_IRQ_GUEST_MIGRATING early and return error immediately in that case. 
Otherwise, we can continue and take spin_lock(&v_target->arch.vgic.lock) because no migration is in progress

Ok, this makes sense to me, I will add

    if( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
    {
        vgic_unlock_rank(v_target, rank, flags);
        return -EBUSY;
    }

right after taking the vgic rank lock.

Kind regards,
Henry
Re: [PATCH v3 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs
Hi Julien,

On 5/21/2024 8:30 PM, Julien Grall wrote:
> Hi,
>
> On 21/05/2024 05:35, Henry Wang wrote:
>> diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
>> index 56490dbc43..956c11ba13 100644
>> --- a/xen/arch/arm/gic-vgic.c
>> +++ b/xen/arch/arm/gic-vgic.c
>> @@ -439,24 +439,33 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
>>      /* We are taking to rank lock to prevent parallel connections. */
>>      vgic_lock_rank(v_target, rank, flags);
>> +    spin_lock(&v_target->arch.vgic.lock);
>
> I know this is what Stefano suggested, but v_target would point to the
> current affinity whereas the interrupt may be pending/active on the
> "previous" vCPU. So it is a little unclear whether v_target is the
> correct lock. Do you have more pointer to show this is correct?

No I think you are correct, we have discussed this in the initial
version of this patch. Sorry.

I followed the way from that discussion to note down the vcpu ID and
retrieve here, below is the diff, would this make sense to you?

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 956c11ba13..134ed4e107 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -439,7 +439,7 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     /* We are taking to rank lock to prevent parallel connections. */
     vgic_lock_rank(v_target, rank, flags);
-    spin_lock(&v_target->arch.vgic.lock);
+    spin_lock(&d->vcpu[p->spi_vcpu_id]->arch.vgic.lock);

     if ( connect )
     {
@@ -465,7 +465,7 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
             p->desc = NULL;
     }

-    spin_unlock(&v_target->arch.vgic.lock);
+    spin_unlock(&d->vcpu[p->spi_vcpu_id]->arch.vgic.lock);
     vgic_unlock_rank(v_target, rank, flags);

     return ret;
diff --git a/xen/arch/arm/include/asm/vgic.h b/xen/arch/arm/include/asm/vgic.h
index 79b73a0dbb..f4075d3e75 100644
--- a/xen/arch/arm/include/asm/vgic.h
+++ b/xen/arch/arm/include/asm/vgic.h
@@ -85,6 +85,7 @@ struct pending_irq
     uint8_t priority;
     uint8_t lpi_priority;       /* Caches the priority if this is an LPI. */
     uint8_t lpi_vcpu_id;        /* The VCPU for an LPI. */
+    uint8_t spi_vcpu_id;        /* The VCPU for an SPI. */
     /* inflight is used to append instances of pending_irq to
      * vgic.inflight_irqs */
     struct list_head inflight;
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index c04fc4f83f..e852479f13 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -632,6 +632,7 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     }
     list_add_tail(&n->inflight, &v->arch.vgic.inflight_irqs);
 out:
+    n->spi_vcpu_id = v->vcpu_id;
     spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
     /* we have a new higher priority irq, inject it into the guest */
     vcpu_kick(v);

> Also, while looking at the locking, I noticed that we are not doing
> anything with GIC_IRQ_GUEST_MIGRATING. In gic_update_one_lr(), we seem
> to assume that if the flag is set, then p->desc cannot be NULL.
>
> Can we reach vgic_connect_hw_irq() with the flag set?

I think even from the perspective of making the code extra safe, we
should also check GIC_IRQ_GUEST_MIGRATING as the LR is allocated for
this case. I will also add the check of GIC_IRQ_GUEST_MIGRATING here.

> What about the other flags? Is this going to be a concern if we don't
> reset them?

I don't think so, if I am not mistaken, no LR will be allocated with
other flags set.

Kind regards,
Henry

> Cheers,
Re: [PATCH v3] xen/arm: Set correct per-cpu cpu_core_mask
Hi Michal, On 5/21/2024 3:47 PM, Michal Orzel wrote: Hi Henry. On 3/21/2024 11:57 AM, Henry Wang wrote: In the common sysctl command XEN_SYSCTL_physinfo, the value of cores_per_socket is calculated based on the cpu_core_mask of CPU0. Currently on Arm this is a fixed value 1 (can be checked via xl info), which is not correct. This is because during the Arm CPU online process at boot time, setup_cpu_sibling_map() only sets the per-cpu cpu_core_mask for itself. cores_per_socket refers to the number of cores that belong to the same socket (NUMA node). Currently Xen on Arm does not support physical CPU hotplug and NUMA, also we assume there is no multithread. Therefore cores_per_socket means all possible CPUs detected from the device tree. Setting the per-cpu cpu_core_mask in setup_cpu_sibling_map() accordingly. Modify the in-code comment which seems to be outdated. Add a warning to users if Xen is running on processors with multithread support. Signed-off-by: Henry Wang Signed-off-by: Henry Wang Reviewed-by: Michal Orzel Thanks. /* ID of the PCPU we're running on */ DEFINE_PER_CPU(unsigned int, cpu_id); -/* XXX these seem awfully x86ish... */ +/* + * Although multithread is part of the Arm spec, there are not many + * processors support multithread and current Xen on Arm assumes there NIT: s/support/supporting Sorry, it should have been spotted locally before sending. Anyway, I will correct this in v4 with your Reviewed-by tag taken. Thanks for pointing this out. Kind regards, Henry __init smp_get_max_cpus(void) ~Michal
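The effect described in the quoted commit message can be illustrated with plain bitmasks. This is only a sketch under the stated assumptions of that commit (no NUMA, no physical CPU hotplug, one thread per core); the sketch_* names are hypothetical, a 64-bit integer stands in for Xen's cpumask type, and the CPU number is assumed to be below 64.

```c
#include <stdint.h>

/* Number of set bits in the mask; stands in for a cpumask weight helper. */
unsigned int sketch_mask_weight(uint64_t mask)
{
    unsigned int n = 0;

    for ( ; mask != 0; mask &= mask - 1 )
        n++; /* each iteration clears the lowest set bit */

    return n;
}

/* Old behaviour: a CPU's core mask contains only the CPU itself. */
uint64_t sketch_core_mask_before(unsigned int cpu)
{
    return UINT64_C(1) << cpu;
}

/* New behaviour: every possible CPU is treated as a core of the one socket. */
uint64_t sketch_core_mask_after(uint64_t possible_cpus)
{
    return possible_cpus;
}
```

With four possible CPUs (mask 0xF), the weight of CPU0's mask goes from 1 to 4, which is the change in cores_per_socket that the commit message says xl info would show.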
[PATCH v3 8/8] docs: Add device tree overlay documentation
From: Vikram Garhwal Signed-off-by: Vikram Garhwal Signed-off-by: Stefano Stabellini Signed-off-by: Henry Wang --- v3: - No change. v2: - Update the content based on the changes in this version. --- docs/misc/arm/overlay.txt | 99 +++ 1 file changed, 99 insertions(+) create mode 100644 docs/misc/arm/overlay.txt diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt new file mode 100644 index 00..811a6de369 --- /dev/null +++ b/docs/misc/arm/overlay.txt @@ -0,0 +1,99 @@ +# Device Tree Overlays support in Xen + +Xen now supports dynamic device assignment to running domains, +i.e. adding/removing nodes (using .dtbo) to/from Xen device tree, and +attaching/detaching them to/from a running domain with given $domid. + +Dynamic node assignment works in two steps: + +## Add/Remove device tree overlay to/from Xen device tree + +1. Xen tools check the dtbo given and parse all other user provided arguments +2. Xen tools pass the dtbo to Xen hypervisor via hypercall. +3. Xen hypervisor applies/removes the dtbo to/from Xen device tree. + +## Attach/Detach device from the DT overlay to/from domain + +1. Xen tools check the dtbo given and parse all other user provided arguments +2. Xen tools pass the dtbo to Xen hypervisor via hypercall. +3. Xen hypervisor attach/detach the device to/from the user-provided $domid by + mapping/unmapping node resources in the DT overlay. + +# Examples + +Here are a few examples on how to use it. + +## Dom0 device add + +For assigning a device tree overlay to Dom0, user should firstly properly +prepare the DT overlay. More information about device tree overlays can be +found in [1]. Then, in Dom0, enter the following: + +(dom0) xl dt-overlay add overlay.dtbo + +This will allocate the devices mentioned in overlay.dtbo to Xen device tree. 
+ +To assign the newly added device from the dtbo to Dom0: + +(dom0) xl dt-overlay attach overlay.dtbo 0 + +Next, if the user wants to add the same device tree overlay to dom0 +Linux, execute the following: + +(dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay +(dom0) cat overlay.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo + +Finally if needed, the relevant Linux kernel drive can be loaded using: + +(dom0) modprobe module_name.ko + +## Dom0 device remove + +For removing the device from Dom0, first detach the device from Dom0: + +(dom0) xl dt-overlay detach overlay.dtbo 0 + +NOTE: The user is expected to unload any Linux kernel modules which +might be accessing the devices in overlay.dtbo before detach the device. +Detaching devices without unloading the modules might result in a crash. + +Then remove the overlay from Xen device tree: + +(dom0) xl dt-overlay remove overlay.dtbo + +## DomU device add/remove + +All the nodes in dtbo will be assigned to a domain; the user will need +to prepare the dtb for the domU. For example, the `interrupt-parent` property +of the DomU overlay should be changed to the Xen hardcoded value `0xfde8`. +Below assumes the properly written DomU dtbo is `overlay_domu.dtbo`. + +User will need to create the DomU with below properties properly configured +in the xl config file: +- `iomem` +- `passthrough` (if IOMMU is needed) + +User will also need to modprobe the relevant drivers. 
+ +Example for domU device add: + +(dom0) xl dt-overlay add overlay.dtbo# If not executed before +(dom0) xl dt-overlay attach overlay.dtbo $domid +(dom0) xl console $domid # To access $domid console + +Next, if the user needs to modify/prepare the overlay.dtbo suitable for +the domU: + +(domU) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay +(domU) cat overlay_domu.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo + +Finally, if needed, the relevant Linux kernel drive can be probed: + +(domU) modprobe module_name.ko + +Example for domU overlay remove: + +(dom0) xl dt-overlay detach overlay.dtbo $domid +(dom0) xl dt-overlay remove overlay.dtbo + +[1] https://www.kernel.org/doc/Documentation/devicetree/overlay-notes.txt -- 2.34.1
[PATCH v3 0/8] Remaining patches for dynamic node programming using overlay dtbo
Hi all, This is the remaining series for the full functional "dynamic node programming using overlay dtbo" feature. The first part [1] has already been merged. Quoting from the original series, the first part has already made Xen aware of new device tree node which means updating the dt_host with overlay node information, and in this series, the goal is to map IRQ and IOMMU during runtime, where we will do the actual IOMMU and IRQ mapping and unmapping to a running domain. Also, documentation of the "dynamic node programming using overlay dtbo" feature is added. Patch 1 and 2 are fixes of the existing code which is noticed during my local tests, details please see the commit message. Gitlab CI for this series can be found in [2]. [1] https://lore.kernel.org/xen-devel/20230906011631.30310-1-vikram.garh...@amd.com/ [2] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1298425517 Henry Wang (6): xen/common/dt-overlay: Fix lock issue when add/remove the device tools/xl: Correct the help information and exit code of the dt-overlay command xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs tools/arm: Introduce the "nr_spis" xl config entry xen/arm: Add XEN_DOMCTL_dt_overlay DOMCTL and related operations tools: Introduce the "xl dt-overlay {attach,detach}" commands Vikram Garhwal (2): xen/arm/gic: Allow routing/removing interrupt to running VMs docs: Add device tree overlay documentation docs/man/xl.cfg.5.pod.in | 14 + docs/misc/arm/device-tree/booting.txt | 16 + docs/misc/arm/overlay.txt | 99 ++ tools/golang/xenlight/helpers.gen.go | 2 + tools/golang/xenlight/types.gen.go| 1 + tools/include/libxl.h | 10 + tools/include/xenctrl.h | 3 + tools/libs/ctrl/xc_dt_overlay.c | 31 ++ tools/libs/light/libxl_arm.c | 4 +- tools/libs/light/libxl_dt_overlay.c | 28 ++ tools/libs/light/libxl_types.idl | 1 + tools/xl/xl_cmdtable.c| 4 +- tools/xl/xl_parse.c | 3 + tools/xl/xl_vmcontrol.c | 48 ++- xen/arch/arm/dom0less-build.c | 11 +- xen/arch/arm/domctl.c | 3 + 
xen/arch/arm/gic-vgic.c | 15 +- xen/arch/arm/gic.c| 15 - xen/arch/arm/vgic/vgic.c | 10 +- xen/common/dt-overlay.c | 441 -- xen/include/public/domctl.h | 15 + xen/include/public/sysctl.h | 11 +- xen/include/xen/dt-overlay.h | 7 + 23 files changed, 644 insertions(+), 148 deletions(-) create mode 100644 docs/misc/arm/overlay.txt -- 2.34.1
[PATCH v3 3/8] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
There are some use cases in which the dom0less domUs need to have
the XEN_DOMCTL_CDF_iommu set at the domain construction time. For
example, the dynamic dtbo feature allows the domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need to have a way to specify the domain will need the IOMMU
mapping at domain construction time.

Introduce a "passthrough" DT property for Dom0less DomUs following
the same entry as the xl.cfg. Currently only provide two options,
i.e. "enabled" and "disabled". Set the XEN_DOMCTL_CDF_iommu at domain
construction time based on the property.

Signed-off-by: Henry Wang
---
v3:
- Use a separate variable to cache the condition from the "passthrough"
  flag separately to improve readability.
- Update the doc to explain the default condition more clearly.
v2:
- New patch to replace the original patch in v1:
  "[PATCH 03/15] xen/arm: Always enable IOMMU"
---
 docs/misc/arm/device-tree/booting.txt | 16 ++++++++++++++++
 xen/arch/arm/dom0less-build.c         | 11 +++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2..f1fd069c87 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -260,6 +260,22 @@ with the following properties:
     value specified by Xen command line parameter gnttab_max_maptrack_frames
     (or its default value if unspecified, i.e. 1024) is used.
 
+- passthrough
+
+    A string property specifying whether IOMMU mappings are enabled for the
+    domain and hence whether it will be enabled for passthrough hardware.
+    Possible property values are:
+
+    - "enabled"
+    IOMMU mappings are enabled for the domain. Note that this option is the
+    default if the user provides the device partial passthrough device tree
+    for the domain.
+
+    - "disabled"
+    IOMMU mappings are disabled for the domain and so hardware may not be
+    passed through. This option is the default if this property is missing
+    and the user does not provide the device partial device tree for the domain.
+
 Under the "xen,domain" compatible node, one or more sub-nodes are present
 for the DomU kernel and ramdisk.
 
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..5830a7051d 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -848,6 +848,8 @@ static int __init construct_domU(struct domain *d,
 void __init create_domUs(void)
 {
     struct dt_device_node *node;
+    const char *dom0less_iommu;
+    bool iommu = false;
     const struct dt_device_node *cpupool_node,
                                 *chosen = dt_find_node_by_path("/chosen");
 
@@ -895,8 +897,13 @@ void __init create_domUs(void)
             panic("Missing property 'cpus' for domain %s\n",
                   dt_node_name(node));
 
-        if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") &&
-             iommu_enabled )
+        if ( !dt_property_read_string(node, "passthrough", &dom0less_iommu) &&
+             !strcmp(dom0less_iommu, "enabled") )
+            iommu = true;
+
+        if ( iommu_enabled &&
+             (iommu || dt_find_compatible_node(node, NULL,
+                                               "multiboot,device-tree")) )
             d_cfg.flags |= XEN_DOMCTL_CDF_iommu;
 
         if ( !dt_property_read_u32(node, "nr_spis", &d_cfg.arch.nr_spis) )
-- 
2.34.1
[PATCH v3 1/8] xen/common/dt-overlay: Fix lock issue when add/remove the device
If CONFIG_DEBUG=y, the assertion below will be triggered:

(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) [ Xen-4.19-unstable arm64 debug=y Not tainted ]
[...]
(XEN) Xen call trace:
(XEN)    [<0a257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN)    [<0a2573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN)    [<0a20797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN)    [<0a207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN)    [<0a208460>] dt_overlay_sysctl+0x428/0xc68
(XEN)    [<0a2707f8>] arch_do_sysctl+0x1c/0x2c
(XEN)    [<0a230b40>] do_sysctl+0x96c/0x9ec
(XEN)    [<0a271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN)    [<0a273490>] do_trap_guest_sync+0x448/0x63c
(XEN)    [<0a25c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN)
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN)

This is because iommu_remove_dt_device() is called without taking the
dt_host_lock. dt_host_lock is meant to ensure that the DT node will not
disappear behind our back. So fix the issue by taking the lock as soon
as we get hold of overlay_node.

A similar issue will be observed when adding the dtbo:

(XEN) Assertion 'system_state < SYS_STATE_active || rw_is_locked(&dt_host_lock)' failed at xen-source/xen/drivers/passthrough/device_tree.c:192
(XEN) [ Xen-4.19-unstable arm64 debug=y Not tainted ]
[...]
(XEN) Xen call trace:
(XEN)    [<0a2594f4>] iommu_add_dt_device+0x7c/0x17c (PC)
(XEN)    [<0a259494>] iommu_add_dt_device+0x1c/0x17c (LR)
(XEN)    [<0a267db4>] handle_device+0x68/0x1e8
(XEN)    [<0a208ba8>] dt_overlay_sysctl+0x9d4/0xb84
(XEN)    [<0a27342c>] arch_do_sysctl+0x24/0x38
(XEN)    [<0a231ac8>] do_sysctl+0x9ac/0xa34
(XEN)    [<0a274b70>] traps.c#do_trap_hypercall+0x230/0x2dc
(XEN)    [<0a276330>] do_trap_guest_sync+0x478/0x688
(XEN)    [<0a25e480>] entry.o#guest_sync_slowpath+0xa8/0xd8

This is because the lock is released too early. So fix the issue by
releasing the lock after handle_device().

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Henry Wang
Reviewed-by: Julien Grall
---
v3:
- Add Julien's Reviewed-by tag.
v2:
- Take the lock as soon as getting hold of overlay_node. Also
  release the lock after handle_device() when adding dtbo.
v1.1:
- Move the unlock position before the check of rc.
---
 xen/common/dt-overlay.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1b197381f6..9cece79067 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -429,18 +429,24 @@ static int remove_nodes(const struct overlay_track *tracker)
         if ( overlay_node == NULL )
             return -EINVAL;
 
+        write_lock(&dt_host_lock);
+
         rc = remove_descendant_nodes_resources(overlay_node);
         if ( rc )
+        {
+            write_unlock(&dt_host_lock);
             return rc;
+        }
 
         rc = remove_node_resources(overlay_node);
         if ( rc )
+        {
+            write_unlock(&dt_host_lock);
             return rc;
+        }
 
         dt_dprintk("Removing node: %s\n", overlay_node->full_name);
 
-        write_lock(&dt_host_lock);
-
         rc = dt_overlay_remove_node(overlay_node);
         if ( rc )
         {
@@ -604,8 +610,6 @@ static long add_nodes(struct overlay_track *tr, char **nodes_full_path)
             return rc;
         }
 
-        write_unlock(&dt_host_lock);
-
         prev_node->allnext = next_node;
 
         overlay_node = dt_find_node_by_path(overlay_node->full_name);
@@ -619,6 +623,7 @@ static long add_nodes(struct overlay_track *tr, char **nodes_full_path)
         rc = handle_device(hardware_domain, overlay_node, p2m_mmio_direct_c,
                            tr->iomem_ranges,
                            tr->irq_ranges);
+        write_unlock(&dt_host_lock);
         if ( rc )
         {
             printk(XENLOG_ERR "Adding IRQ and IOMMU failed\n");
-- 
2.34.1
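The invariant behind this fix is that the lock is taken before the first resource is touched and released exactly once on every exit path. The sketch below models that with a counter instead of a real rwlock so the balance can be checked; all sketch_* names are hypothetical, the two rc arguments stand in for the return values of remove_descendant_nodes_resources() and remove_node_resources(), and the single exit label is an alternative to the explicit unlock before each return that the patch itself uses.

```c
/* Toy lock: depth is 0 when free, 1 when held. Not a real rwlock. */
struct sketch_lock {
    int depth;
};

void sketch_lock_take(struct sketch_lock *l)
{
    l->depth++;
}

void sketch_lock_release(struct sketch_lock *l)
{
    l->depth--;
}

/*
 * Models the removal path: both resource-removal steps run with the
 * lock held, and an error in either step still releases it.
 */
int sketch_remove_node(struct sketch_lock *l, int rc_descendants, int rc_node)
{
    int rc;

    /* Take the lock as soon as the node is known, before any removal. */
    sketch_lock_take(l);

    rc = rc_descendants; /* stand-in for removing descendant resources */
    if ( rc )
        goto out;

    rc = rc_node;        /* stand-in for removing the node's own resources */
    if ( rc )
        goto out;

    /* The node itself would be unlinked here, still under the lock. */

 out:
    sketch_lock_release(l);
    return rc;
}
```

Either style satisfies the assertion in the trace above; the single exit label only makes it harder to add a new error path that forgets the unlock.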
[PATCH v3 7/8] tools: Introduce the "xl dt-overlay {attach,detach}" commands
With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to attach/detach devices from the provided DT overlay to domains. Support this by introducing a new set of "xl dt-overlay" commands and related documentation, i.e. "xl dt-overlay {attach,detach}". Slightly rework the command option parsing logic. Signed-off-by: Henry Wang --- v3: - Introduce new API libxl_dt_overlay_domain() and co., instead of reusing existing API libxl_dt_overlay(). - Add in-code comments for the LIBXL_DT_OVERLAY_* macros. - Use find_domain() to avoid getting domain_id from strtol(). v2: - New patch. --- tools/include/libxl.h | 10 +++ tools/include/xenctrl.h | 3 +++ tools/libs/ctrl/xc_dt_overlay.c | 31 + tools/libs/light/libxl_dt_overlay.c | 28 +++ tools/xl/xl_cmdtable.c | 4 +-- tools/xl/xl_vmcontrol.c | 42 - 6 files changed, 104 insertions(+), 14 deletions(-) diff --git a/tools/include/libxl.h b/tools/include/libxl.h index 62cb07dea6..6cc6d6bf6a 100644 --- a/tools/include/libxl.h +++ b/tools/include/libxl.h @@ -2549,8 +2549,18 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, uint32_t domid, void libxl_device_pci_list_free(libxl_device_pci* list, int num); #if defined(__arm__) || defined(__aarch64__) +/* Values should keep consistent with the op from XEN_SYSCTL_dt_overlay */ +#define LIBXL_DT_OVERLAY_ADD 1 +#define LIBXL_DT_OVERLAY_REMOVE2 int libxl_dt_overlay(libxl_ctx *ctx, void *overlay, uint32_t overlay_size, uint8_t overlay_op); + +/* Values should keep consistent with the op from XEN_DOMCTL_dt_overlay */ +#define LIBXL_DT_OVERLAY_DOMAIN_ATTACH 1 +#define LIBXL_DT_OVERLAY_DOMAIN_DETACH 2 +int libxl_dt_overlay_domain(libxl_ctx *ctx, uint32_t domain_id, +void *overlay_dt, uint32_t overlay_dt_size, +uint8_t overlay_op); #endif /* diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 4996855944..9ceca0cffc 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -2657,6 +2657,9 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t domid, #if 
defined(__arm__) || defined(__aarch64__) int xc_dt_overlay(xc_interface *xch, void *overlay_fdt, uint32_t overlay_fdt_size, uint8_t overlay_op); +int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt, + uint32_t overlay_fdt_size, uint8_t overlay_op, + uint32_t domain_id); #endif /* Compat shims */ diff --git a/tools/libs/ctrl/xc_dt_overlay.c b/tools/libs/ctrl/xc_dt_overlay.c index c2224c4d15..ea1da522d1 100644 --- a/tools/libs/ctrl/xc_dt_overlay.c +++ b/tools/libs/ctrl/xc_dt_overlay.c @@ -48,3 +48,34 @@ err: return err; } + +int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt, + uint32_t overlay_fdt_size, uint8_t overlay_op, + uint32_t domain_id) +{ +int err; +struct xen_domctl domctl = { +.cmd = XEN_DOMCTL_dt_overlay, +.domain = domain_id, +.u.dt_overlay = { +.overlay_op = overlay_op, +.overlay_fdt_size = overlay_fdt_size, +} +}; + +DECLARE_HYPERCALL_BOUNCE(overlay_fdt, overlay_fdt_size, + XC_HYPERCALL_BUFFER_BOUNCE_IN); + +if ( (err = xc_hypercall_bounce_pre(xch, overlay_fdt)) ) +goto err; + +set_xen_guest_handle(domctl.u.dt_overlay.overlay_fdt, overlay_fdt); + +if ( (err = do_domctl(xch, )) != 0 ) +PERROR("%s failed", __func__); + +err: +xc_hypercall_bounce_post(xch, overlay_fdt); + +return err; +} diff --git a/tools/libs/light/libxl_dt_overlay.c b/tools/libs/light/libxl_dt_overlay.c index a6c709a6dc..00503b76bd 100644 --- a/tools/libs/light/libxl_dt_overlay.c +++ b/tools/libs/light/libxl_dt_overlay.c @@ -69,3 +69,31 @@ out: return rc; } +int libxl_dt_overlay_domain(libxl_ctx *ctx, uint32_t domain_id, +void *overlay_dt, uint32_t overlay_dt_size, +uint8_t overlay_op) +{ +int rc; +int r; +GC_INIT(ctx); + +if (check_overlay_fdt(gc, overlay_dt, overlay_dt_size)) { +LOG(ERROR, "Overlay DTB check failed"); +rc = ERROR_FAIL; +goto out; +} else { +LOG(DEBUG, "Overlay DTB check passed"); +rc = 0; +} + +r = xc_dt_overlay_domain(ctx->xch, overlay_dt, overlay_dt_size, overlay_op, + domain_id); +if (r) { +LOG(ERROR, "%s: Attaching/Detaching overlay dtb 
failed.", __func__); +rc = ERROR_FAIL; +} + +out: +GC_FREE; +return rc; +} diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c index 1f3c6b5897..37770b20e3 100644 --- a/tools/xl/xl_cmdtable.
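The subcommand handling described in this patch can be sketched as a small lookup. The op-code macros below are copied from the libxl.h hunk; the `overlay_cmd` struct and `parse_overlay_subcommand()` helper are hypothetical names used only for illustration and are not part of the patch.

```c
#include <string.h>

/* Op codes as defined in the libxl.h hunk of this patch. */
#define LIBXL_DT_OVERLAY_ADD            1
#define LIBXL_DT_OVERLAY_REMOVE         2
#define LIBXL_DT_OVERLAY_DOMAIN_ATTACH  1
#define LIBXL_DT_OVERLAY_DOMAIN_DETACH  2

/* Hypothetical result type: which op to issue and whether a domain is needed. */
struct overlay_cmd {
    int op;            /* op code passed down to libxl */
    int needs_domain;  /* 1 for attach/detach (domctl path), 0 for add/remove */
};

/* Hypothetical helper: map the xl subcommand string to an op code. */
static int parse_overlay_subcommand(const char *name, struct overlay_cmd *out)
{
    if (!strcmp(name, "add")) {
        out->op = LIBXL_DT_OVERLAY_ADD;
        out->needs_domain = 0;
    } else if (!strcmp(name, "remove")) {
        out->op = LIBXL_DT_OVERLAY_REMOVE;
        out->needs_domain = 0;
    } else if (!strcmp(name, "attach")) {
        out->op = LIBXL_DT_OVERLAY_DOMAIN_ATTACH;
        out->needs_domain = 1;
    } else if (!strcmp(name, "detach")) {
        out->op = LIBXL_DT_OVERLAY_DOMAIN_DETACH;
        out->needs_domain = 1;
    } else {
        return -1;     /* unknown subcommand: caller should print help */
    }
    return 0;
}
```

Because the sysctl and domctl op codes share the values 1 and 2, the `needs_domain` flag, not the op value, is what selects between `libxl_dt_overlay()` and `libxl_dt_overlay_domain()` in this sketch.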
[PATCH v3 6/8] xen/arm: Add XEN_DOMCTL_dt_overlay DOMCTL and related operations
In order to support the dynamic dtbo device assignment to a running VM, the add/remove of the DT overlay and the attach/detach of the device from the DT overlay should happen separately. Therefore, repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT overlay to Xen device tree, instead of assigning the device to the hardware domain at the same time. Add the XEN_DOMCTL_dt_overlay with operations XEN_DOMCTL_DT_OVERLAY_{ATTACH,DETACH} to do/undo the device assignment to the domain. The hypervisor firstly checks the DT overlay passed from the toolstack is valid. Then the device nodes are retrieved from the overlay tracker based on the DT overlay. The attach/detach of the device is implemented by map/unmap the IRQ and IOMMU resources. Note that with these changes, the device de-registration from the IOMMU driver should only happen at the time when the DT overlay is removed from the Xen device tree. Signed-off-by: Henry Wang Signed-off-by: Vikram Garhwal --- v3: - Style fixes for arch-selection #ifdefs. - Do not include public/domctl.h, only add a forward declaration of struct xen_domctl_dt_overlay. - Extract the overlay track entry finding logic to a function, drop the unused variables. - Use op code 1&2 for XEN_DOMCTL_DT_OVERLAY_{ATTACH,DETACH}. v2: - New patch. 
--- xen/arch/arm/domctl.c| 3 + xen/common/dt-overlay.c | 438 +++ xen/include/public/domctl.h | 15 ++ xen/include/public/sysctl.h | 11 +- xen/include/xen/dt-overlay.h | 7 + 5 files changed, 367 insertions(+), 107 deletions(-) diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index ad56efb0f5..12a12ee781 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -5,6 +5,7 @@ * Copyright (c) 2012, Citrix Systems */ +#include #include #include #include @@ -176,6 +177,8 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, return rc; } +case XEN_DOMCTL_dt_overlay: +return dt_overlay_domctl(d, >u.dt_overlay); default: return subarch_do_domctl(domctl, d, u_domctl); } diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c index 9cece79067..693b6e4777 100644 --- a/xen/common/dt-overlay.c +++ b/xen/common/dt-overlay.c @@ -356,24 +356,136 @@ static int overlay_get_nodes_info(const void *fdto, char **nodes_full_path) return 0; } +/* This function should be called with the overlay_lock taken */ +static struct overlay_track * +find_track_entry_from_tracker(const void *overlay_fdt, + uint32_t overlay_fdt_size) +{ +struct overlay_track *entry, *temp; +bool found_entry = false; + +ASSERT(spin_is_locked(_lock)); + +/* + * First check if dtbo is correct i.e. it should one of the dtbo which was + * used when dynamically adding the node. + * Limitation: Cases with same node names but different property are not + * supported currently. We are relying on user to provide the same dtbo + * as it was used when adding the nodes. + */ +list_for_each_entry_safe( entry, temp, _tracker, entry ) +{ +if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 ) +{ +found_entry = true; +break; +} +} + +if ( !found_entry ) +{ +printk(XENLOG_ERR "Cannot find any matching tracker with input dtbo." 
+ " Operation is supported only for prior added dtbo.\n"); +return NULL; +} + +return entry; +} + +static int remove_irq(unsigned long s, unsigned long e, void *data) +{ +struct domain *d = data; +int rc = 0; + +/* + * IRQ should always have access unless there are duplication of + * of irqs in device tree. There are few cases of xen device tree + * where there are duplicate interrupts for the same node. + */ +if (!irq_access_permitted(d, s)) +return 0; +/* + * TODO: We don't handle shared IRQs for now. So, it is assumed that + * the IRQs was not shared with another domain. + */ +rc = irq_deny_access(d, s); +if ( rc ) +{ +printk(XENLOG_ERR "unable to revoke access for irq %ld\n", s); +return rc; +} + +rc = release_guest_irq(d, s); +if ( rc ) +{ +printk(XENLOG_ERR "unable to release irq %ld\n", s); +return rc; +} + +return rc; +} + +static int remove_all_irqs(struct rangeset *irq_ranges, struct domain *d) +{ +return rangeset_report_ranges(irq_ranges, 0, ~0UL, remove_irq, d); +} + +static int remove_iomem(unsigned long s, unsigned long e, void *data) +{ +struct domain *d = data; +int rc = 0; +p2m_type_t t; +mfn_t mfn; + +mfn = p2m_lookup(d, _gfn(s), ); +if ( mfn_x(mfn) == 0 || mfn_x(mfn) == ~0UL ) +return -EINVAL; + +rc = iomem_deny_access(d, s, e); +if ( rc ) +{ +printk(XENLOG_ERR "Unable to remov
[PATCH v3 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs
From: Vikram Garhwal

Currently, routing/removing physical interrupts is only allowed at
domain creation/destroy time. For use cases such as dynamic device
tree overlay adding/removing, routing/removing a physical IRQ to/from
a running domain should be allowed.

Remove the above-mentioned domain creation/dying checks. Since this can
leave the interrupt state out of sync when the interrupt is active or
pending in the guest, simply reject the operation in those cases. Do it
for both new and old vGIC implementations.

Signed-off-by: Vikram Garhwal
Signed-off-by: Stefano Stabellini
Signed-off-by: Henry Wang
---
v3:
- Update in-code comments.
- Correct the if conditions.
- Add taking/releasing the vgic lock of the vcpu.
v2:
- Reject the case where the IRQ is active or pending in guest.
---
 xen/arch/arm/gic-vgic.c  | 15 ++++++++++++---
 xen/arch/arm/gic.c       | 15 ---------------
 xen/arch/arm/vgic/vgic.c | 10 +++++++---
 3 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 56490dbc43..956c11ba13 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -439,24 +439,33 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     /* We are taking to rank lock to prevent parallel connections. */
     vgic_lock_rank(v_target, rank, flags);
+    spin_lock(&v_target->arch.vgic.lock);
 
     if ( connect )
     {
-        /* The VIRQ should not be already enabled by the guest */
+        /*
+         * The VIRQ should not be already enabled by the guest nor
+         * active/pending in the guest.
+         */
         if ( !p->desc &&
-             !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+             !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
+             !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) &&
+             !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
             p->desc = desc;
         else
             ret = -EBUSY;
     }
     else
     {
-        if ( desc && p->desc != desc )
+        if ( (desc && p->desc != desc) ||
+             test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
+             test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
             ret = -EINVAL;
         else
             p->desc = NULL;
     }
 
+    spin_unlock(&v_target->arch.vgic.lock);
     vgic_unlock_rank(v_target, rank, flags);
 
     return ret;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 44c40e86de..3ebd89940a 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -135,14 +135,6 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
     ASSERT(virq < vgic_num_irqs(d));
     ASSERT(!is_lpi(virq));
 
-    /*
-     * When routing an IRQ to guest, the virtual state is not synced
-     * back to the physical IRQ. To prevent get unsync, restrict the
-     * routing to when the Domain is been created.
-     */
-    if ( d->creation_finished )
-        return -EBUSY;
-
     ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);
     if ( ret )
         return ret;
@@ -167,13 +159,6 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
     ASSERT(test_bit(_IRQ_GUEST, &desc->status));
     ASSERT(!is_lpi(virq));
 
-    /*
-     * Removing an interrupt while the domain is running may have
-     * undesirable effect on the vGIC emulation.
-     */
-    if ( !d->is_dying )
-        return -EBUSY;
-
     desc->handler->shutdown(desc);
 
     /* EOI the IRQ if it has not been done by the guest */
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index b9463a5f27..78554c11e2 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -876,8 +876,11 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu,
 
     if ( connect )                      /* assign a mapped IRQ */
     {
-        /* The VIRQ should not be already enabled by the guest */
-        if ( !irq->hw && !irq->enabled )
+        /*
+         * The VIRQ should not be already enabled by the guest nor
+         * active/pending in the guest
+         */
+        if ( !irq->hw && !irq->enabled && !irq->active && !irq->pending_latch )
         {
             irq->hw = true;
             irq->hwintid = desc->irq;
@@ -887,7 +890,8 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu,
     }
     else                                /* remove a mapped IRQ */
     {
-        if ( desc && irq->hwintid != desc->irq )
+        if ( (desc && irq->hwintid != desc->irq) ||
+             irq->active || irq->pending_latch )
         {
             ret = -EINVAL;
         }
-- 
2.34.1
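The connect-side rule this patch introduces for the old vGIC can be modelled in isolation. The sketch below is not Xen code: `struct virq_state` is a hypothetical stand-in for the pending-IRQ structure, with one boolean per status bit that the patch tests, and locking is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the per-vIRQ state inspected by the patch. */
struct virq_state {
    const void *desc;  /* physical IRQ descriptor, NULL when not connected */
    bool enabled;      /* models GIC_IRQ_GUEST_ENABLED */
    bool visible;      /* models GIC_IRQ_GUEST_VISIBLE */
    bool active;       /* models GIC_IRQ_GUEST_ACTIVE  */
};

/*
 * Connect a physical IRQ to a vIRQ of a (possibly running) domain.
 * Mirrors the patched condition: the vIRQ must be unconnected, not
 * enabled, and neither pending/visible nor active in the guest.
 */
static int connect_hw_irq(struct virq_state *p, const void *desc)
{
    if (p->desc || p->enabled || p->visible || p->active)
        return -EBUSY;

    p->desc = desc;
    return 0;
}
```

With the `d->creation_finished` check gone from `gic_route_irq_to_guest()`, this per-IRQ state test is what protects a running guest, which is why the patch also takes the vCPU's vgic lock around it.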
[PATCH v3 4/8] tools/arm: Introduce the "nr_spis" xl config entry
Currently, the number of SPIs allocated to the domain is only configurable for Dom0less DomUs. Xen domains are supposed to be platform agnostics and therefore the numbers of SPIs for libxl guests should not be based on the hardware. Introduce a new xl config entry for Arm to provide a method for user to decide the number of SPIs. This would help to avoid bumping the `config->arch.nr_spis` in libxl everytime there is a new platform with increased SPI numbers. Update the doc and the golang bindings accordingly. Signed-off-by: Henry Wang --- v3: - Reword documentation to avoid ambiguity. v2: - New patch to replace the original patch in v1: "[PATCH 05/15] tools/libs/light: Increase nr_spi to 160" --- docs/man/xl.cfg.5.pod.in | 14 ++ tools/golang/xenlight/helpers.gen.go | 2 ++ tools/golang/xenlight/types.gen.go | 1 + tools/libs/light/libxl_arm.c | 4 ++-- tools/libs/light/libxl_types.idl | 1 + tools/xl/xl_parse.c | 3 +++ 6 files changed, 23 insertions(+), 2 deletions(-) diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in index 8f2b375ce9..416d582844 100644 --- a/docs/man/xl.cfg.5.pod.in +++ b/docs/man/xl.cfg.5.pod.in @@ -3072,6 +3072,20 @@ raised. =back +=over 4 + +=item B + +An optional 32-bit integer parameter specifying the number of SPIs (Shared +Peripheral Interrupts) to allocate for the domain. If the value specified by +the `nr_spis` parameter is smaller than the number of SPIs calculated by the +toolstack based on the devices allocated for the domain, or the `nr_spis` +parameter is not specified, the value calculated by the toolstack will be used +for the domain. Otherwise, the value specified by the `nr_spis` parameter will +be used. 
+ +=back + =head3 x86 =over 4 diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go index b9cb5b33c7..fe5110474d 100644 --- a/tools/golang/xenlight/helpers.gen.go +++ b/tools/golang/xenlight/helpers.gen.go @@ -1154,6 +1154,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)} x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version) x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart) x.ArchArm.SveVl = SveType(xc.arch_arm.sve_vl) +x.ArchArm.NrSpis = uint32(xc.arch_arm.nr_spis) if err := x.ArchX86.MsrRelaxed.fromC(_x86.msr_relaxed);err != nil { return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err) } @@ -1670,6 +1671,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)} xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion) xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart) xc.arch_arm.sve_vl = C.libxl_sve_type(x.ArchArm.SveVl) +xc.arch_arm.nr_spis = C.uint32_t(x.ArchArm.NrSpis) if err := x.ArchX86.MsrRelaxed.toC(_x86.msr_relaxed); err != nil { return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err) } diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go index 5b293755d7..c9e45b306f 100644 --- a/tools/golang/xenlight/types.gen.go +++ b/tools/golang/xenlight/types.gen.go @@ -597,6 +597,7 @@ ArchArm struct { GicVersion GicVersion Vuart VuartType SveVl SveType +NrSpis uint32 } ArchX86 struct { MsrRelaxed Defbool diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c index 1cb89fa584..a4029e3ac8 100644 --- a/tools/libs/light/libxl_arm.c +++ b/tools/libs/light/libxl_arm.c @@ -181,8 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, LOG(DEBUG, "Configure the domain"); -config->arch.nr_spis = nr_spis; -LOG(DEBUG, " - Allocate %u SPIs", nr_spis); +config->arch.nr_spis = max(nr_spis, d_config->b_info.arch_arm.nr_spis); +LOG(DEBUG, " - Allocate %u SPIs", config->arch.nr_spis); switch (d_config->b_info.arch_arm.gic_version) { case 
LIBXL_GIC_VERSION_DEFAULT: diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl index 79e9c656cc..4e65e6fda5 100644 --- a/tools/libs/light/libxl_types.idl +++ b/tools/libs/light/libxl_types.idl @@ -722,6 +722,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("arch_arm", Struct(None, [("gic_version", libxl_gic_version), ("vuart", libxl_vuart_type), ("sve_vl", libxl_sve_type), + ("nr_spis", uint32), ])), ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool), ])), diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c index c504ab3711..e3a4800f6e 100644 --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -2935,6 +2935,9 @@ skip_usbdev: } } +if (!xlu_cfg_get_long (config, "nr_spis", , 0)) +b_info->arch_arm.nr_spis = l; + parse_vkb_list(config, d_config); d_config->virtios = NULL; -- 2.34.1
[PATCH v3 2/8] tools/xl: Correct the help information and exit code of the dt-overlay command
Fix the name mismatch in the xl dt-overlay command: the command name
should be "dt-overlay" instead of "dt_overlay". Add the missing "," in
the cmdtable.

Fix the exit code of the dt-overlay command: use EXIT_FAILURE instead
of ERROR_FAIL.

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support")
Suggested-by: Anthony PERARD
Signed-off-by: Henry Wang
Reviewed-by: Jason Andryuk
---
v3:
- Add Jason's Reviewed-by tag.
v2:
- New patch
---
 tools/xl/xl_cmdtable.c  | 2 +-
 tools/xl/xl_vmcontrol.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 62bdb2aeaa..1f3c6b5897 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -635,7 +635,7 @@ const struct cmd_spec cmd_table[] = {
     { "dt-overlay",
       &main_dt_overlay, 0, 1,
       "Add/Remove a device tree overlay",
-      "add/remove <.dtbo>"
+      "add/remove <.dtbo>",
       "-h print this help\n"
     },
 #endif
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 98f6bd2e76..02575d5d36 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1278,7 +1278,7 @@ int main_dt_overlay(int argc, char **argv)
     const int overlay_remove_op = 2;
 
     if (argc < 2) {
-        help("dt_overlay");
+        help("dt-overlay");
         return EXIT_FAILURE;
     }
 
@@ -1302,11 +1302,11 @@ int main_dt_overlay(int argc, char **argv)
             fprintf(stderr, "failed to read the overlay device tree file %s\n",
                     overlay_config_file);
             free(overlay_dtb);
-            return ERROR_FAIL;
+            return EXIT_FAILURE;
         }
     } else {
         fprintf(stderr, "overlay dtbo file not provided\n");
-        return ERROR_FAIL;
+        return EXIT_FAILURE;
     }
 
     rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
-- 
2.34.1
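One reason the exit-code change matters: libxl error codes are negative, while a process exit status is a small non-negative number. The sketch below assumes `ERROR_FAIL` is -3 (believed to be its value in libxl's error enum, but redefined locally here as an assumption) and assumes a POSIX system, where only the low 8 bits of the value returned from `main()` reach the parent. `status_seen_by_parent()` is a hypothetical helper.

```c
#include <stdlib.h>

/* Assumed value of libxl's ERROR_FAIL; defined locally for illustration. */
#define ERROR_FAIL (-3)

/*
 * Hypothetical helper: on POSIX, the parent only observes the low
 * 8 bits of the status a process exits with.
 */
static int status_seen_by_parent(int main_return)
{
    return main_return & 0xff;
}
```

Under these assumptions, returning `ERROR_FAIL` from the command would surface as status 253 in the calling shell, whereas `EXIT_FAILURE` gives the conventional failure status that the rest of xl uses.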
Re: [PATCH v2 7/8] tools: Introduce the "xl dt-overlay {attach,detach}" commands
Hi Jason,

On 5/21/2024 3:41 AM, Jason Andryuk wrote:
> On 2024-05-16 06:03, Henry Wang wrote:
>> +    domain_id = strtol(argv[optind+2], NULL, 10);
>
> domain_id = find_domain(argv[optind+2]);

Good point, thanks. I will use the find_domain() in the next version.

Kind regards,
Henry

> And you'll get name resolution, too.
>
> Thanks,
> Jason
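The difference between the two approaches can be shown with a self-contained sketch. This is not xl's `find_domain()`; `resolve_domain()` and the `demo_domains` table are hypothetical stand-ins that only mimic the idea of accepting either a numeric ID or a domain name.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name table standing in for a lookup against running domains. */
struct known_domain {
    const char *name;
    uint32_t domid;
};

static const struct known_domain demo_domains[] = {
    { "Domain-0", 0 },
    { "guest1",   3 },
};

/* Hypothetical resolver: accept a decimal domain ID or a known domain name. */
static int resolve_domain(const char *spec, uint32_t *domid)
{
    size_t i;

    if (spec == NULL || *spec == '\0')
        return -1;

    /* Purely numeric input is taken as a domain ID. */
    if (isdigit((unsigned char)spec[0])) {
        char *end;
        unsigned long v;

        errno = 0;
        v = strtoul(spec, &end, 10);
        if (errno == 0 && *end == '\0' && v <= UINT32_MAX) {
            *domid = (uint32_t)v;
            return 0;
        }
    }

    /* Otherwise fall back to a name lookup. */
    for (i = 0; i < sizeof(demo_domains) / sizeof(demo_domains[0]); i++) {
        if (strcmp(spec, demo_domains[i].name) == 0) {
            *domid = demo_domains[i].domid;
            return 0;
        }
    }

    return -1;   /* neither a valid ID nor a known name */
}
```

A bare `strtol(arg, NULL, 10)` returns 0 for a non-numeric argument such as `"guest1"`, which would silently select domain 0; a resolver of this shape reports the error instead.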
Re: [PATCH v2 4/8] tools/arm: Introduce the "nr_spis" xl config entry
Hi Jason,

On 5/21/2024 3:13 AM, Jason Andryuk wrote:
>> +
>> +=item B
>> +
>> +A 32-bit optional integer parameter specifying the number of SPIs (Shared
>
> I'd phrase it "An optional 32-bit integer"
>
>> +Peripheral Interrupts) to allocate for the domain. If the `nr_spis` parameter
>> +is missing, the max number of SPIs calculated by the toolstack based on the
>> +devices allocated for the domain will be used.
>
> This text says the maximum only applies if xl.cfg nr_spis is not set.
>
>> +
>> +=back
>> +
>>  =head3 x86
>>
>>  =over 4
>> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
>> index 1cb89fa584..a4029e3ac8 100644
>> --- a/tools/libs/light/libxl_arm.c
>> +++ b/tools/libs/light/libxl_arm.c
>> @@ -181,8 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>>      LOG(DEBUG, "Configure the domain");
>>
>> -    config->arch.nr_spis = nr_spis;
>> -    LOG(DEBUG, " - Allocate %u SPIs", nr_spis);
>> +    config->arch.nr_spis = max(nr_spis, d_config->b_info.arch_arm.nr_spis);
>> +    LOG(DEBUG, " - Allocate %u SPIs", config->arch.nr_spis);
>
> But this is always taking the max. Should it instead be:
>
> config->arch.nr_spis = d_config->b_info.arch_arm.nr_spis ?: nr_spis;
>
> However, I don't know if that makes sense for ARM. Does the hardware
> nr_spis need to be a minimum for a domain?
>
> Really, we just want the documentation to match the code.

Before you pointed this out, I didn't realize the ambiguity in the doc
about the "max". The "max" in the doc has a different meaning from the
"max()" in the code. I will drop the "max" in the doc and reword the doc
to "If the `nr_spis` parameter is missing, the number of SPIs calculated
by the toolstack based on the devices allocated for the domain will be
used.". Thanks for pointing it out.

Kind regards,
Henry

> Thanks,
> Jason
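The two policies discussed in this exchange differ only when the configured value is non-zero but smaller than what the toolstack computed. A minimal sketch, assuming an unset `nr_spis` reads back as 0; the function names are hypothetical and the code is not taken from libxl:

```c
#include <stdint.h>

/* Policy in the patch: never go below what the toolstack computed. */
static uint32_t nr_spis_max_policy(uint32_t toolstack, uint32_t configured)
{
    return configured > toolstack ? configured : toolstack;
}

/* Alternative raised in review: any non-zero configured value wins. */
static uint32_t nr_spis_override_policy(uint32_t toolstack, uint32_t configured)
{
    return configured ? configured : toolstack;
}
```

With `toolstack = 32` and `configured = 16`, the first policy yields 32 and the second yields 16; for an unset or larger configured value the two agree. The documentation rewording agreed above describes the first policy.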
Re: [PATCH v3 1/4] xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains
Hi Michal,

On 5/21/2024 12:09 AM, Michal Orzel wrote:
>>>> Thanks. I will take the tag if you are ok with above diff (for the
>>>> case if this series goes in later than Luca's).
>>> I would move this check to process_shm() right after "gbase =
>>> dt_read_paddr" setting. This would be the most natural placement for
>>> such a check.
>> That sounds good. Thanks! IIUC we only need to add the check for the
>> pbase != INVALID_PADDR case correct?
> Yes, but at the same time I wonder whether we should also return error
> if a user omits pbase for direct mapped domain.

I think this makes sense. So I will also add a check for the case where
users omit pbase in the device tree for a direct-mapped domain.

Kind regards,
Henry

> ~Michal
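The validation agreed in this thread could look roughly like the sketch below. It is a standalone model, not the code that went into `process_shm()`: `paddr_t` and `INVALID_PADDR` are local stand-ins for the Xen definitions, and `check_shm_direct_map()` is a hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the Xen types; assumptions for illustration only. */
typedef uint64_t paddr_t;
#define INVALID_PADDR (~(paddr_t)0)

/*
 * For a direct-mapped domain, the static shared memory bank must have
 * an explicit host address and it must equal the guest address.
 */
static int check_shm_direct_map(bool direct_mapped, paddr_t pbase, paddr_t gbase)
{
    if (!direct_mapped)
        return 0;           /* no restriction for other domains */

    if (pbase == INVALID_PADDR)
        return -EINVAL;     /* host address omitted in the device tree */

    if (pbase != gbase)
        return -EINVAL;     /* host and guest addresses differ */

    return 0;
}
```

Both rejected cases return the same error here; whether the real code distinguishes them in its log messages is not stated in the thread.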
Re: [PATCH v3 1/4] xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains
Hi Michal, On 5/20/2024 11:46 PM, Michal Orzel wrote: Hi Henry, On 20/05/2024 16:52, Henry Wang wrote: Hi Michal, Luca, On 5/20/2024 7:24 PM, Michal Orzel wrote: Hi Henry, +CC: Luca On 17/05/2024 05:21, Henry Wang wrote: To make things easier, add restriction that static shared memory should also be direct-mapped for direct-mapped domains. Check the host physical address to be matched with guest physical address when parsing the device tree. Document this restriction in the doc. I'm ok with this restriction. @Luca, do you have any use case preventing us from making this restriction? This patch clashes with Luca series so depending on which goes first, I agree that there will be some conflicts between the two series. To avoid back and forth, if Luca's series goes in first, would it be ok for you if I place the same check from this patch in handle_shared_mem_bank() like below? diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c index 9c3a83042d..2d23fa4917 100644 --- a/xen/arch/arm/static-shmem.c +++ b/xen/arch/arm/static-shmem.c @@ -219,6 +219,13 @@ static int __init handle_shared_mem_bank(struct domain *d, paddr_t gbase, pbase = shm_bank->start; psize = shm_bank->size; + if ( is_domain_direct_mapped(d) && (pbase != gbase) ) + { + printk("%pd: physical address 0x%"PRIpaddr" and guest address 0x%"PRIpaddr" are not direct-mapped.\n", + d, pbase, gbase); + return -EINVAL; + } + printk("%pd: SHMEM map from %s: mphys 0x%"PRIpaddr" -> gphys 0x%"PRIpaddr", size 0x%"PRIpaddr"\n", d, bank_from_heap ? "Xen heap" : "Host", pbase, gbase, psize); Acked-by: Michal Orzel Thanks. I will take the tag if you are ok with above diff (for the case if this series goes in later than Luca's). I would move this check to process_shm() right after "gbase = dt_read_paddr" setting. This would be the most natural placement for such a check. That sounds good. Thanks! IIUC we only need to add the check for the pbase != INVALID_PADDR case correct? Kind regards, Henry ~Michal
Re: [PATCH v3 1/4] xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains
Hi Michal, Luca, On 5/20/2024 7:24 PM, Michal Orzel wrote: Hi Henry, +CC: Luca On 17/05/2024 05:21, Henry Wang wrote: To make things easier, add restriction that static shared memory should also be direct-mapped for direct-mapped domains. Check the host physical address to be matched with guest physical address when parsing the device tree. Document this restriction in the doc. I'm ok with this restriction. @Luca, do you have any use case preventing us from making this restriction? This patch clashes with Luca series so depending on which goes first, I agree that there will be some conflicts between the two series. To avoid back and forth, if Luca's series goes in first, would it be ok for you if I place the same check from this patch in handle_shared_mem_bank() like below? diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c index 9c3a83042d..2d23fa4917 100644 --- a/xen/arch/arm/static-shmem.c +++ b/xen/arch/arm/static-shmem.c @@ -219,6 +219,13 @@ static int __init handle_shared_mem_bank(struct domain *d, paddr_t gbase, pbase = shm_bank->start; psize = shm_bank->size; + if ( is_domain_direct_mapped(d) && (pbase != gbase) ) + { + printk("%pd: physical address 0x%"PRIpaddr" and guest address 0x%"PRIpaddr" are not direct-mapped.\n", + d, pbase, gbase); + return -EINVAL; + } + printk("%pd: SHMEM map from %s: mphys 0x%"PRIpaddr" -> gphys 0x%"PRIpaddr", size 0x%"PRIpaddr"\n", d, bank_from_heap ? "Xen heap" : "Host", pbase, gbase, psize); Acked-by: Michal Orzel Thanks. I will take the tag if you are ok with above diff (for the case if this series goes in later than Luca's). } +if ( is_domain_direct_mapped(d) && (pbase != gbase) ) +{ +printk("%pd: physical address 0x%"PRIpaddr" and guest address 0x%"PRIpaddr" are not 1:1 direct-mapped.\n", NIT: 1:1 and direct-mapped means the same so no need to place them next to each other Ok. I will drop the "1:1" in the next version. Thanks. Kind regards, Henry ~Michal
[PATCH] tools/golang: Add missing golang bindings for vlan
Commit 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to
libxl_device_nic") introduced a new "vlan" string field in
libxl_device_nic, but the matching golang bindings are missing. Add
them in this patch.

Fixes: 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
Signed-off-by: Henry Wang
---
The code is automatically generated by:
```
./configure
make tools
```
---
 tools/golang/xenlight/helpers.gen.go | 3 +++
 tools/golang/xenlight/types.gen.go   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
index 78bdb08b15..b9cb5b33c7 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1963,6 +1963,7 @@ func (x *DeviceNic) fromC(xc *C.libxl_device_nic) error {
 x.BackendDomname = C.GoString(xc.backend_domname)
 x.Devid = Devid(xc.devid)
 x.Mtu = int(xc.mtu)
+x.Vlan = C.GoString(xc.vlan)
 x.Model = C.GoString(xc.model)
 if err := x.Mac.fromC(&xc.mac);err != nil {
 return fmt.Errorf("converting field Mac: %v", err)
@@ -2040,6 +2041,8 @@ if x.BackendDomname != "" {
 xc.backend_domname = C.CString(x.BackendDomname)}
 xc.devid = C.libxl_devid(x.Devid)
 xc.mtu = C.int(x.Mtu)
+if x.Vlan != "" {
+xc.vlan = C.CString(x.Vlan)}
 if x.Model != "" {
 xc.model = C.CString(x.Model)}
 if err := x.Mac.toC(&xc.mac); err != nil {
diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go
index ccfe18019e..5b293755d7 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -756,6 +756,7 @@ BackendDomid Domid
 BackendDomname string
 Devid Devid
 Mtu int
+Vlan string
 Model string
 Mac Mac
 Ip string
-- 
2.34.1
Re: [PATCH v3] xen/arm: Set correct per-cpu cpu_core_mask
Hi All,

Gentle ping since it has been a couple of months, any comments on this
updated patch? Thanks!

Kind regards,
Henry

On 3/21/2024 11:57 AM, Henry Wang wrote:
> In the common sysctl command XEN_SYSCTL_physinfo, the value of
> cores_per_socket is calculated based on the cpu_core_mask of CPU0.
> Currently on Arm this is a fixed value 1 (can be checked via xl info),
> which is not correct. This is because during the Arm CPU online
> process at boot time, setup_cpu_sibling_map() only sets the per-cpu
> cpu_core_mask for itself.
>
> cores_per_socket refers to the number of cores that belong to the same
> socket (NUMA node). Currently Xen on Arm does not support physical
> CPU hotplug and NUMA, also we assume there is no multithread. Therefore
> cores_per_socket means all possible CPUs detected from the device
> tree. Setting the per-cpu cpu_core_mask in setup_cpu_sibling_map()
> accordingly. Modify the in-code comment which seems to be outdated. Add
> a warning to users if Xen is running on processors with multithread
> support.
>
> Signed-off-by: Henry Wang
> Signed-off-by: Henry Wang
> ---
> v3:
> - Use cpumask_copy() to set cpu_core_mask and drop the unnecessary
>   cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu)).
> - In-code comment adjustments.
> - Add a warning for multithread.
> v2:
> - Do not do the multithread check.
> ---
>  xen/arch/arm/smpboot.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index a84e706d77..b6268be27a 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -66,7 +66,11 @@ static bool cpu_is_dead;
>  /* ID of the PCPU we're running on */
>  DEFINE_PER_CPU(unsigned int, cpu_id);
> -/* XXX these seem awfully x86ish... */
> +/*
> + * Although multithread is part of the Arm spec, there are not many
> + * processors support multithread and current Xen on Arm assumes there
> + * is no multithread.
> + */
>  /* representing HT siblings of each logical CPU */
>  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_mask);
>  /* representing HT and core siblings of each logical CPU */
> @@ -85,9 +89,13 @@ static int setup_cpu_sibling_map(int cpu)
>           !zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) )
>          return -ENOMEM;
>
> -    /* A CPU is a sibling with itself and is always on its own core. */
> +    /*
> +     * Currently we assume there is no multithread and NUMA, so
> +     * a CPU is a sibling with itself, and the all possible CPUs
> +     * are supposed to belong to the same socket (NUMA node).
> +     */
>      cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
> -    cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu));
> +    cpumask_copy(per_cpu(cpu_core_mask, cpu), &cpu_possible_map);
>
>      return 0;
>  }
> @@ -277,6 +285,10 @@ void __init smp_init_cpus(void)
>          warning_add("WARNING: HMP COMPUTING HAS BEEN ENABLED.\n"
>                      "It has implications on the security and stability of the system,\n"
>                      "unless the cpu affinity of all domains is specified.\n");
> +
> +    if ( system_cpuinfo.mpidr.mt == 1 )
> +        warning_add("WARNING: MULTITHREADING HAS BEEN DETECTED ON THE PROCESSOR.\n"
> +                    "It might impact the security of the system.\n");
>  }
>
>  unsigned int __init smp_get_max_cpus(void)
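The effect of the change on the reported value can be modelled with plain bitmasks. This sketch is not Xen code: CPU masks are reduced to a `uint64_t`, one thread per core is assumed (as the patch itself does), and all function names are hypothetical.

```c
#include <stdint.h>

/* Count the set bits in a mask, i.e. the number of CPUs it contains. */
static unsigned int mask_weight(uint64_t mask)
{
    unsigned int n = 0;

    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Before the patch: each CPU's core mask contains only itself. */
static uint64_t core_mask_before(unsigned int cpu, uint64_t possible)
{
    (void)possible;
    return UINT64_C(1) << cpu;
}

/* After the patch: each CPU's core mask is the set of possible CPUs. */
static uint64_t core_mask_after(unsigned int cpu, uint64_t possible)
{
    (void)cpu;
    return possible;
}

/* Modelled on the description above: derived from CPU0's core mask. */
static unsigned int cores_per_socket(uint64_t cpu0_core_mask)
{
    return mask_weight(cpu0_core_mask);
}
```

For a four-CPU system (`possible = 0xf`), the old scheme reports 1 core per socket and the new one reports 4, which matches the behaviour the commit message describes.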
Re: [PATCH v2 3/8] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
Hi Julien, On 5/20/2024 8:41 AM, Henry Wang wrote: Hi Julien, Thanks for spending time on the review! On 5/19/2024 6:17 PM, Julien Grall wrote: Hi Henry, On 16/05/2024 11:03, Henry Wang wrote: diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt index bbd955e9c2..61f9082553 100644 --- a/docs/misc/arm/device-tree/booting.txt +++ b/docs/misc/arm/device-tree/booting.txt @@ -260,6 +260,19 @@ with the following properties: value specified by Xen command line parameter gnttab_max_maptrack_frames (or its default value if unspecified, i.e. 1024) is used. +- passthrough + + A string property specifying whether IOMMU mappings are enabled for the + domain and hence whether it will be enabled for passthrough hardware. + Possible property values are: + + - "enabled" + IOMMU mappings are enabled for the domain. + + - "disabled" + IOMMU mappings are disabled for the domain and so hardware may not be + passed through. This option is the default if this property is missing. Looking at the code below, it seems like the default will depend on whether the partial device-tree is present. Did I misunderstand? I am not sure if I understand the "partial device tree" in above comment correctly. The "passthrough" property is supposed to be placed in the dom0less domU domain node exactly the same way as the other dom0less domU properties (such as "direct-map" etc.). This way we can control the XEN_DOMCTL_CDF_iommu is set or not for each dom0less domU separately. Oh I think I get your points, you meant the XEN_DOMCTL_CDF_iommu will still be set if the passthrough dt property is "disabled", but user provides a partial device tree. Yes you are correct. I will update the doc to explain a bit more details as below. Thanks for pointing it out. - "enabled" IOMMU mappings are enabled for the domain. Note that this option is the default if the user provides the device partial passthrough device tree for the domain. 
- "disabled" IOMMU mappings are disabled for the domain and so hardware may not be passed through. This option is the default if this property is missing and the user does not provide the device partial device tree for the domain. Kind regards, Henry
Re: [PATCH v2 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs
Hi Julien,

On 5/19/2024 7:08 PM, Julien Grall wrote:
> Hi,
>
> On 17/05/2024 07:03, Henry Wang wrote:
>> @@ -444,14 +444,18 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
>>      {
>>          /* The VIRQ should not be already enabled by the guest */
>
> This comment needs to be updated.

Yes, sorry. I will update this and the one in the new vGIC in v3.

>>          if ( !p->desc &&
>> -             !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
>> +             !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
>> +             !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) &&
>> +             !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
>>              p->desc = desc;
>>          else
>>              ret = -EBUSY;
>>      }
>>      else
>>      {
>> -        if ( desc && p->desc != desc )
>> +        if ( desc && p->desc != desc &&
>> +             (test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
>> +              test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status)) )
>
> This should be
>
> +        if ( (desc && p->desc != desc) ||
> +             test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
> +             test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
>
> Looking at gic_set_lr(), we first check p->desc, before setting
> IRQ_GUEST_VISIBLE. I can't find a common lock, so what would guarantee
> that p->desc is not going to be used or IRQ_GUEST_VISIBLE set
> afterwards?

I think the gic_set_lr() is supposed to be called with v->arch.vgic.lock
taken, at least the current two callers (gic_raise_guest_irq() and
gic_restore_pending_irqs()) are doing it this way. Would this address
your concern? Thanks.

Kind regards,
Henry
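The difference between the v2 condition and the one Julien suggests is easy to miss, so here is a standalone model of both. `struct virq_state` is a hypothetical stand-in for the pending-IRQ state, with booleans in place of the status bits; neither function is Xen code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the state tested on the remove path. */
struct virq_state {
    const void *desc;  /* currently connected physical IRQ descriptor */
    bool visible;      /* models GIC_IRQ_GUEST_VISIBLE */
    bool active;       /* models GIC_IRQ_GUEST_ACTIVE  */
};

/* v2 condition: rejects only a mismatching desc that is also in flight. */
static bool remove_rejected_v2(const struct virq_state *p, const void *desc)
{
    return desc && p->desc != desc && (p->visible || p->active);
}

/* Suggested condition: reject a mismatching desc OR any in-flight IRQ. */
static bool remove_rejected_fixed(const struct virq_state *p, const void *desc)
{
    return (desc && p->desc != desc) || p->visible || p->active;
}
```

In this model the v2 form lets two cases through that should fail: removing a matching IRQ while it is active or visible in the guest, and removing with a mismatching descriptor while the IRQ is idle.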
Re: [PATCH v2 3/8] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
Hi Julien, Thanks for spending time on the review! On 5/19/2024 6:17 PM, Julien Grall wrote: Hi Henry, On 16/05/2024 11:03, Henry Wang wrote: diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt index bbd955e9c2..61f9082553 100644 --- a/docs/misc/arm/device-tree/booting.txt +++ b/docs/misc/arm/device-tree/booting.txt @@ -260,6 +260,19 @@ with the following properties: value specified by Xen command line parameter gnttab_max_maptrack_frames (or its default value if unspecified, i.e. 1024) is used. +- passthrough + + A string property specifying whether IOMMU mappings are enabled for the + domain and hence whether it will be enabled for passthrough hardware. + Possible property values are: + + - "enabled" + IOMMU mappings are enabled for the domain. + + - "disabled" + IOMMU mappings are disabled for the domain and so hardware may not be + passed through. This option is the default if this property is missing. Looking at the code below, it seems like the default will depend on whether the partial device-tree is present. Did I misunderstand? I am not sure if I understand the "partial device tree" in above comment correctly. The "passthrough" property is supposed to be placed in the dom0less domU domain node exactly the same way as the other dom0less domU properties (such as "direct-map" etc.). This way we can control the XEN_DOMCTL_CDF_iommu is set or not for each dom0less domU separately. + Under the "xen,domain" compatible node, one or more sub-nodes are present for the DomU kernel and ramdisk. 
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c index 74f053c242..1396a102e1 100644 --- a/xen/arch/arm/dom0less-build.c +++ b/xen/arch/arm/dom0less-build.c @@ -848,6 +848,7 @@ static int __init construct_domU(struct domain *d, void __init create_domUs(void) { struct dt_device_node *node; + const char *dom0less_iommu; const struct dt_device_node *cpupool_node, *chosen = dt_find_node_by_path("/chosen"); @@ -895,8 +896,10 @@ void __init create_domUs(void) panic("Missing property 'cpus' for domain %s\n", dt_node_name(node)); - if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") && - iommu_enabled ) + if ( iommu_enabled && + ((!dt_property_read_string(node, "passthrough", _iommu) && + !strcmp(dom0less_iommu, "enabled")) || + dt_find_compatible_node(node, NULL, "multiboot,device-tree")) ) This condition is getting a little bit harder to read. Can we cache the "passthrough" flag separately? Yes sure. Will do this in v3. Also, shouldn't we throw a panic if passthrough = "enabled" but the IOMMU is enabled? I take the above "enabled" should be "disabled"? Actually we already have several checks to do that: Firstly, the above if condition checks the "iommu_enabled", so if IOMMU is disabled, the XEN_DOMCTL_CDF_iommu is never set. Also, in later on domain config sanitising process, i.e. domain_create() -> sanitise_domain_config(), there is also a check and panic to check if XEN_DOMCTL_CDF_iommu is somehow set but IOMMU is disabled. So I think these are sufficient for us. Did I understand your comment correctly? Kind regards, Henry d_cfg.flags |= XEN_DOMCTL_CDF_iommu; if ( !dt_property_read_u32(node, "nr_spis", _cfg.arch.nr_spis) ) Cheers,
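The review comment about readability can be illustrated with a small standalone model of the decision. This is only a sketch, not the Xen code: `domu_wants_iommu()` and its parameters are hypothetical names, and the device tree lookups are replaced by plain arguments. It caches the "passthrough" check in its own flag, as suggested, and keeps the behaviour discussed in this thread (a partial device tree still turns the flag on).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper modelling when XEN_DOMCTL_CDF_iommu would be set.
 * "passthrough" is the value of the DT property, or NULL if it is absent.
 * "has_partial_dt" stands in for finding a "multiboot,device-tree" node.
 */
bool domu_wants_iommu(bool iommu_enabled, const char *passthrough,
                      bool has_partial_dt)
{
    /* Cache the property check separately to keep the condition readable. */
    bool passthrough_enabled =
        passthrough != NULL && strcmp(passthrough, "enabled") == 0;

    /* With the IOMMU disabled the flag is never set. */
    if ( !iommu_enabled )
        return false;

    /* A partial device tree implies passthrough, whatever the property says. */
    return passthrough_enabled || has_partial_dt;
}
```

Under these assumptions, a domain with passthrough = "disabled" and a partial device tree still gets the flag, which is the default-value subtlety raised in the review.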
Re: [PATCH v2 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs
On 5/16/2024 6:03 PM, Henry Wang wrote: From: Vikram Garhwal Currently, routing/removing physical interrupts are only allowed at the domain creation/destroy time. For use cases such as dynamic device tree overlay adding/removing, the routing/removing of physical IRQ to running domains should be allowed. Removing the above-mentioned domain creation/dying check. Since this will introduce interrupt state unsync issues for cases when the interrupt is active or pending in the guest, therefore for these cases we simply reject the operation. Do it for both new and old vGIC implementations. Signed-off-by: Vikram Garhwal Signed-off-by: Stefano Stabellini Signed-off-by: Henry Wang --- v2: - Reject the case where the IRQ is active or pending in guest. --- xen/arch/arm/gic-vgic.c | 8 ++-- xen/arch/arm/gic.c | 15 --- xen/arch/arm/vgic/vgic.c | 5 +++-- 3 files changed, 9 insertions(+), 19 deletions(-) diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c index 56490dbc43..d1608415f8 100644 --- a/xen/arch/arm/gic-vgic.c +++ b/xen/arch/arm/gic-vgic.c @@ -444,14 +444,18 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq, { /* The VIRQ should not be already enabled by the guest */ if ( !p->desc && - !test_bit(GIC_IRQ_GUEST_ENABLED, >status) ) + !test_bit(GIC_IRQ_GUEST_ENABLED, >status) && + !test_bit(GIC_IRQ_GUEST_VISIBLE, >status) && + !test_bit(GIC_IRQ_GUEST_ACTIVE, >status) ) p->desc = desc; else ret = -EBUSY; } else { -if ( desc && p->desc != desc ) +if ( desc && p->desc != desc && + (test_bit(GIC_IRQ_GUEST_VISIBLE, >status) || + test_bit(GIC_IRQ_GUEST_ACTIVE, >status)) ) This should be +if ( (desc && p->desc != desc) || + test_bit(GIC_IRQ_GUEST_VISIBLE, >status) || + test_bit(GIC_IRQ_GUEST_ACTIVE, >status) ) @@ -887,7 +887,8 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu, } else/* remove a mapped IRQ */ { -if ( desc && irq->hwintid != desc->irq ) +if ( desc && irq->hwintid != desc->irq && + (irq->active || irq->pending_latch) 
) Same here, this should be +if ( (desc && irq->hwintid != desc->irq) || + irq->active || irq->pending_latch ) Kind regards, Henry
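The difference between the v2 condition and the corrected one can be shown with a minimal standalone model. This is a sketch, not Xen code: `struct virq_state` and both function names are hypothetical, and the three booleans stand in for the `desc` comparison and the VISIBLE/ACTIVE (or active/pending_latch) state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state consulted when removing an IRQ. */
struct virq_state {
    bool desc_mismatch; /* a desc was passed and it differs from the stored one */
    bool visible;       /* IRQ is pending/visible in the guest */
    bool active;        /* IRQ is active in the guest */
};

/* v2 condition: rejects only if the desc mismatches AND the IRQ is in flight. */
bool remove_rejected_v2(struct virq_state s)
{
    return s.desc_mismatch && (s.visible || s.active);
}

/* Corrected condition: rejects on a mismatch OR whenever the IRQ is in flight. */
bool remove_rejected_fixed(struct virq_state s)
{
    return s.desc_mismatch || s.visible || s.active;
}
```

With the v2 form, an in-flight IRQ with a matching descriptor would be removed, and a mismatching descriptor on an idle IRQ would no longer be rejected. The corrected form rejects both cases.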
[PATCH v3 3/4] tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE
Currently the GUEST_MAGIC_BASE in the init-dom0less application is hardcoded, which will lead to failures for 1:1 direct-mapped Dom0less DomUs. Since the guest magic region allocation from init-dom0less is for XenStore, and the XenStore page is now allocated from the hypervisor, instead of hardcoding the guest magic pages region, use xc_hvm_param_get() to get the XenStore page PFN. Rename alloc_xs_page() to get_xs_page() to reflect the changes. With this change, some existing code is not needed anymore, including: (1) The definition of the XenStore page offset. (2) Call to xc_domain_setmaxmem() and xc_clear_domain_page() as we don't need to set the max mem and clear the page anymore. (3) Foreign mapping of the XenStore page, setting of XenStore interface status and HVM_PARAM_STORE_PFN from init-dom0less, as they are set by the hypervisor. Take the opportunity to do some coding style improvements when possible. Reported-by: Alec Kwapis Signed-off-by: Henry Wang --- v3: - Only get the XenStore page. - Drop the unneeded code. v2: - Update HVMOP keys name. 
--- tools/helpers/init-dom0less.c | 58 +-- 1 file changed, 14 insertions(+), 44 deletions(-) diff --git a/tools/helpers/init-dom0less.c b/tools/helpers/init-dom0less.c index fee93459c4..2b51965fa7 100644 --- a/tools/helpers/init-dom0less.c +++ b/tools/helpers/init-dom0less.c @@ -16,30 +16,18 @@ #include "init-dom-json.h" -#define XENSTORE_PFN_OFFSET 1 #define STR_MAX_LENGTH 128 -static int alloc_xs_page(struct xc_interface_core *xch, - libxl_dominfo *info, - uint64_t *xenstore_pfn) +static int get_xs_page(struct xc_interface_core *xch, libxl_dominfo *info, + uint64_t *xenstore_pfn) { int rc; -const xen_pfn_t base = GUEST_MAGIC_BASE >> XC_PAGE_SHIFT; -xen_pfn_t p2m = (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET; -rc = xc_domain_setmaxmem(xch, info->domid, - info->max_memkb + (XC_PAGE_SIZE/1024)); -if (rc < 0) -return rc; - -rc = xc_domain_populate_physmap_exact(xch, info->domid, 1, 0, 0, ); -if (rc < 0) -return rc; - -*xenstore_pfn = base + XENSTORE_PFN_OFFSET; -rc = xc_clear_domain_page(xch, info->domid, *xenstore_pfn); -if (rc < 0) -return rc; +rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_STORE_PFN, xenstore_pfn); +if (rc < 0) { +printf("Failed to get HVM_PARAM_STORE_PFN\n"); +return 1; +} return 0; } @@ -100,6 +88,7 @@ static bool do_xs_write_vm(struct xs_handle *xsh, xs_transaction_t t, */ static int create_xenstore(struct xs_handle *xsh, libxl_dominfo *info, libxl_uuid uuid, + uint64_t xenstore_pfn, evtchn_port_t xenstore_port) { domid_t domid; @@ -145,8 +134,7 @@ static int create_xenstore(struct xs_handle *xsh, rc = snprintf(target_memkb_str, STR_MAX_LENGTH, "%"PRIu64, info->current_memkb); if (rc < 0 || rc >= STR_MAX_LENGTH) return rc; -rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%lld", - (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET); +rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%"PRIu64, xenstore_pfn); if (rc < 0 || rc >= STR_MAX_LENGTH) return rc; rc = snprintf(xenstore_port_str, STR_MAX_LENGTH, "%u", xenstore_port); @@ 
-230,7 +218,6 @@ static int init_domain(struct xs_handle *xsh, libxl_uuid uuid; uint64_t xenstore_evtchn, xenstore_pfn; int rc; -struct xenstore_domain_interface *intf; printf("Init dom0less domain: %u\n", info->domid); @@ -245,20 +232,11 @@ static int init_domain(struct xs_handle *xsh, if (!xenstore_evtchn) return 0; -/* Alloc xenstore page */ -if (alloc_xs_page(xch, info, _pfn) != 0) { -printf("Error on alloc magic pages\n"); -return 1; -} - -intf = xenforeignmemory_map(xfh, info->domid, PROT_READ | PROT_WRITE, 1, -_pfn, NULL); -if (!intf) { -printf("Error mapping xenstore page\n"); +/* Get xenstore page */ +if (get_xs_page(xch, info, _pfn) != 0) { +printf("Error on getting xenstore page\n"); return 1; } -intf->connection = XENSTORE_RECONNECT; -xenforeignmemory_unmap(xfh, intf, 1); rc = xc_dom_gnttab_seed(xch, info->domid, true, (xen_pfn_t)-1, xenstore_pfn, 0, 0); @@ -272,19 +250,11 @@ static int init_domain(struct xs_handle *xsh, if (rc) err(1, "gen_stub_json_config"); -/* Now everything is ready: set HVM_PARAM_STORE_PFN */ -rc = xc_hvm_param_set(xch, info->domid, HVM_PARAM_STORE_PFN
[PATCH v3 4/4] docs/features/dom0less: Update the late XenStore init protocol
With the new allocation strategy of Dom0less DomUs XenStore page, update the doc of the late XenStore init protocol accordingly. Signed-off-by: Henry Wang --- v3: - Wording change. v2: - New patch. --- docs/features/dom0less.pandoc | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/features/dom0less.pandoc b/docs/features/dom0less.pandoc index 725afa0558..8b178edee0 100644 --- a/docs/features/dom0less.pandoc +++ b/docs/features/dom0less.pandoc @@ -110,9 +110,10 @@ hotplug PV drivers to dom0less guests. E.g. xl network-attach domU. The implementation works as follows: - Xen allocates the xenstore event channel for each dom0less domU that has the "xen,enhanced" property, and sets HVM_PARAM_STORE_EVTCHN -- Xen does *not* allocate the xenstore page and sets HVM_PARAM_STORE_PFN - to ~0ULL (invalid) -- Dom0less domU kernels check that HVM_PARAM_STORE_PFN is set to invalid +- Xen allocates the xenstore page and sets HVM_PARAM_STORE_PFN as well + as the connection status to XENSTORE_RECONNECT. +- Dom0less domU kernels check that HVM_PARAM_STORE_PFN is set to + ~0ULL (invalid) or the connection status is *not* XENSTORE_CONNECTED. - Old kernels will continue without xenstore support (Note: some old buggy kernels might crash because they don't check the validity of HVM_PARAM_STORE_PFN before using it! 
Disable "xen,enhanced" in @@ -121,13 +122,14 @@ The implementation works as follows: channel (HVM_PARAM_STORE_EVTCHN) before continuing with the initialization - Once dom0 is booted, init-dom0less is executed: -- it allocates the xenstore shared page and sets HVM_PARAM_STORE_PFN +- it gets the xenstore shared page from HVM_PARAM_STORE_PFN - it calls xs_introduce_domain - Xenstored notices the new domain, initializes interfaces as usual, and sends an event channel notification to the domain using the xenstore event channel (HVM_PARAM_STORE_EVTCHN) - The Linux domU kernel receives the event channel notification, checks - HVM_PARAM_STORE_PFN again and continue with the initialization + HVM_PARAM_STORE_PFN and the connection status again and continue with + the initialization Limitations -- 2.34.1
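The guest-side check described in the updated protocol can be sketched as a small predicate. This is an illustrative model only: `xenstore_needs_late_init()` and `STORE_PFN_INVALID` are hypothetical names, and the two connection values are defined locally on the assumption that they match Xen's public io/xs_wire.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Connection states, assumed to match Xen's public io/xs_wire.h. */
#define XENSTORE_CONNECTED 0
#define XENSTORE_RECONNECT 1

/* Illustrative name for the invalid HVM_PARAM_STORE_PFN value (~0ULL). */
#define STORE_PFN_INVALID (~0ULL)

/*
 * Returns true when the domU kernel should wait for the event channel
 * notification before finishing XenStore initialization.
 */
bool xenstore_needs_late_init(uint64_t store_pfn, uint32_t connection)
{
    /* Old behaviour: no page has been allocated yet. */
    if ( store_pfn == STORE_PFN_INVALID )
        return true;

    /* New behaviour: page exists, but xenstored has not connected yet. */
    return connection != XENSTORE_CONNECTED;
}
```

The second branch is what lets a page allocated by Xen (with status XENSTORE_RECONNECT) still go through the late init path until xenstored has set up the interface.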
[PATCH v3 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor
There are use cases (for example using the PV driver) in Dom0less setup
that require Dom0less DomUs to start immediately with Dom0 but initialize
XenStore later, after Dom0 has booted successfully and called the
init-dom0less application.

An error message can be seen from the init-dom0less application on 1:1
direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

"Magic pages" is the term used in the toolstack for reserved pages that
give the VM access to virtual platform capabilities. Currently the magic
pages for Dom0less DomUs are populated by the init-dom0less app through
populate_physmap(), and populate_physmap() automatically assumes
gfn == mfn for 1:1 direct-mapped domains. This cannot be true for the
magic pages that are allocated later from the init-dom0less application
executed in Dom0. For domains using statically allocated memory that are
not 1:1 direct-mapped, a similar error "failed to retrieve a reserved
page" can be seen, as the reserved memory list is empty at that time.

For init-dom0less, the magic page region is only used for XenStore. To
solve the above issue, this commit allocates the XenStore page for
Dom0less DomUs at domain construction time. The PFN will be noted and
communicated to the init-dom0less application executed from Dom0. To keep
the XenStore late init protocol, set the connection status to
XENSTORE_RECONNECT.

Reported-by: Alec Kwapis
Suggested-by: Daniel P. Smith
Signed-off-by: Henry Wang
---
v3:
- Only allocate XenStore page. (Julien)
- Set HVM_PARAM_STORE_PFN and the XenStore connection status directly
  from hypervisor. (Stefano)
v2:
- Reword the commit msg to explain what is "magic page" and use generic
  terminology "hypervisor reserved pages" in commit msg. (Daniel)
- Also move the offset definition of magic pages. (Michal)
- Extract the magic page allocation logic to a function.
(Michal) --- xen/arch/arm/dom0less-build.c | 44 ++- 1 file changed, 43 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c index 74f053c242..95c4fd1a2d 100644 --- a/xen/arch/arm/dom0less-build.c +++ b/xen/arch/arm/dom0less-build.c @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0-only */ #include +#include #include #include #include @@ -10,6 +11,8 @@ #include #include +#include + #include #include #include @@ -739,6 +742,42 @@ static int __init alloc_xenstore_evtchn(struct domain *d) return 0; } +#define XENSTORE_PFN_OFFSET 1 +static int __init alloc_xenstore_page(struct domain *d) +{ +struct page_info *xenstore_pg; +struct xenstore_domain_interface *interface; +mfn_t mfn; +gfn_t gfn; +int rc; + +d->max_pages += 1; +xenstore_pg = alloc_domheap_page(d, 0); +if ( xenstore_pg == NULL ) +return -ENOMEM; + +mfn = page_to_mfn(xenstore_pg); +if ( !is_domain_direct_mapped(d) ) +gfn = gaddr_to_gfn(GUEST_MAGIC_BASE + + (XENSTORE_PFN_OFFSET << PAGE_SHIFT)); +else +gfn = gaddr_to_gfn(mfn_to_maddr(mfn)); + +rc = guest_physmap_add_page(d, gfn, mfn, 0); +if ( rc ) +{ +free_domheap_page(xenstore_pg); +return rc; +} + +d->arch.hvm.params[HVM_PARAM_STORE_PFN] = gfn_x(gfn); +interface = (struct xenstore_domain_interface *)map_domain_page(mfn); +interface->connection = XENSTORE_RECONNECT; +unmap_domain_page(interface); + +return 0; +} + static int __init construct_domU(struct domain *d, const struct dt_device_node *node) { @@ -839,7 +878,10 @@ static int __init construct_domU(struct domain *d, rc = alloc_xenstore_evtchn(d); if ( rc < 0 ) return rc; -d->arch.hvm.params[HVM_PARAM_STORE_PFN] = ~0ULL; + +rc = alloc_xenstore_page(d); +if ( rc < 0 ) +return rc; } return rc; -- 2.34.1
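The guest frame selection in alloc_xenstore_page() can be modelled in isolation. This is a sketch under stated assumptions, not the hypervisor code: `xenstore_gfn()` is a hypothetical helper, frame numbers are plain integers instead of gfn_t/mfn_t, and the GUEST_MAGIC_BASE value is assumed from the "mfn 0x39000" line in the error log quoted in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT          12
/* Assumed value; consistent with "mfn 0x39000" in the error log. */
#define GUEST_MAGIC_BASE    0x39000000ULL
#define XENSTORE_PFN_OFFSET 1ULL

/*
 * Hypothetical model: pick the guest frame number for the XenStore page.
 * Direct-mapped domains must use gfn == mfn; other domains use the fixed
 * slot inside the guest magic region.
 */
uint64_t xenstore_gfn(bool direct_mapped, uint64_t mfn)
{
    if ( direct_mapped )
        return mfn;

    return (GUEST_MAGIC_BASE + (XENSTORE_PFN_OFFSET << PAGE_SHIFT))
           >> PAGE_SHIFT;
}
```

Because the page is allocated first and the gfn is derived from the resulting mfn, the direct-mapped case no longer depends on a hardcoded address owning the right machine frame.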
[PATCH v3 1/4] xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains
Currently, users are allowed to map static shared memory in a non-direct-mapped way for direct-mapped domains. This can lead to clashing of guest memory spaces. Also, the current extended region finding logic only removes the host physical addresses of the static shared memory areas for direct-mapped domains, which may be inconsistent with the guest memory map if users map the static shared memory in a non-direct-mapped way. This will lead to incorrect extended region calculation results. To make things easier, add restriction that static shared memory should also be direct-mapped for direct-mapped domains. Check the host physical address to be matched with guest physical address when parsing the device tree. Document this restriction in the doc. Signed-off-by: Henry Wang --- v3: - New patch. --- docs/misc/arm/device-tree/booting.txt | 3 +++ xen/arch/arm/static-shmem.c | 6 ++ 2 files changed, 9 insertions(+) diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt index bbd955e9c2..c994e48391 100644 --- a/docs/misc/arm/device-tree/booting.txt +++ b/docs/misc/arm/device-tree/booting.txt @@ -591,6 +591,9 @@ communication. shared memory region in host physical address space, a size, and a guest physical address, as the target address of the mapping. e.g. xen,shared-mem = < [host physical address] [guest address] [size] > +Note that if a domain is direct-mapped, i.e. the Dom0 and the Dom0less +DomUs with `direct-map` device tree property, the static shared memory +should also be direct-mapped (host physical address == guest address). 
It shall also meet the following criteria: 1) If the SHM ID matches with an existing region, the address range of the diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c index 78881dd1d3..b26fb69874 100644 --- a/xen/arch/arm/static-shmem.c +++ b/xen/arch/arm/static-shmem.c @@ -235,6 +235,12 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo, d, psize); return -EINVAL; } +if ( is_domain_direct_mapped(d) && (pbase != gbase) ) +{ +printk("%pd: physical address 0x%"PRIpaddr" and guest address 0x%"PRIpaddr" are not 1:1 direct-mapped.\n", + d, pbase, gbase); +return -EINVAL; +} for ( i = 0; i < PFN_DOWN(psize); i++ ) if ( !mfn_valid(mfn_add(maddr_to_mfn(pbase), i)) ) -- 2.34.1
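The new restriction amounts to a single comparison per shared memory region. A minimal standalone sketch follows; `check_shm_mapping()` is a hypothetical name, and addresses are modelled as plain 64-bit integers rather than paddr_t.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the added check: for a direct-mapped domain the
 * host physical base and the guest base of a static shared memory region
 * must be identical.
 */
int check_shm_mapping(bool direct_mapped, uint64_t pbase, uint64_t gbase)
{
    if ( direct_mapped && pbase != gbase )
        return -EINVAL;

    return 0;
}
```

Domains that are not direct-mapped are unaffected and may still place the region at any guest address.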
[PATCH v3 0/4] Guest XenStore page allocation for 1:1 Dom0less domUs
Hi all,

This series is trying to fix the reported guest magic region allocation
issue for 1:1 Dom0less domUs. An error message can be seen from the
init-dom0less application on 1:1 direct-mapped Dom0less DomUs:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

This is because populate_physmap() automatically assumes gfn == mfn for
direct-mapped domains. This cannot be true for the magic pages that are
allocated later for 1:1 Dom0less DomUs from the init-dom0less helper
application executed in Dom0. For domains using statically allocated
memory that are not 1:1 direct-mapped, a similar error "failed to retrieve
a reserved page" can be seen, as the reserved memory list is empty at that
time.

In [1] I tried to fix this issue with the domctl approach, and the
discussions in [2] and [3] indicate that a domctl is not really necessary,
as we can simplify the issue to "allocate the Dom0less guest magic regions
at the Dom0less DomU build time and pass the region base PFN to
init-dom0less application". The later discussion [4] reached an agreement
that we only need to allocate one single page for XenStore, and set
HVM_PARAM_STORE_PFN from the hypervisor with some Linux XenStore late init
protocol improvements. Therefore, this series tries to fix the issue based
on all of these discussions.

The first patch adds a restriction that static shared memory on
direct-mapped DomUs should also be direct-mapped, as otherwise it will
clash [5]. Patch 2 allocates the XenStore page from Xen and sets the
initial connection status to XENSTORE_RECONNECT. Patch 3 updates the
init-dom0less application accordingly. Patch 4 is the doc change to
reflect the changes introduced by this series.

**NOTE**: This series should work with the Linux change [6].
[1] https://lore.kernel.org/xen-devel/20240409045357.236802-1-xin.wa...@amd.com/ [2] https://lore.kernel.org/xen-devel/c7857223-eab8-409a-b618-6ec70f616...@apertussolutions.com/ [3] https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2404251508470.3940@ubuntu-linux-20-04-desktop/ [4] https://lore.kernel.org/xen-devel/d33ea00d-890d-45cc-9583-64c953abd...@xen.org/ [5] https://lore.kernel.org/xen-devel/686ba256-f8bf-47e7-872f-d277bf7df...@xen.org/ [6] https://lore.kernel.org/xen-devel/20240517011516.1451087-1-xin.wa...@amd.com/ Henry Wang (4): xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE docs/features/dom0less: Update the late XenStore init protocol docs/features/dom0less.pandoc | 12 +++--- docs/misc/arm/device-tree/booting.txt | 3 ++ tools/helpers/init-dom0less.c | 58 +++ xen/arch/arm/dom0less-build.c | 44 +++- xen/arch/arm/static-shmem.c | 6 +++ 5 files changed, 73 insertions(+), 50 deletions(-) -- 2.34.1
Re: [PATCH v2 6/8] xen/arm: Add XEN_DOMCTL_dt_overlay DOMCTL and related operations
Hi Jan, As usual, thanks for the review! On 5/16/2024 8:31 PM, Jan Beulich wrote: On 16.05.2024 12:03, Henry Wang wrote: +/* + * First check if dtbo is correct i.e. it should one of the dtbo which was + * used when dynamically adding the node. + * Limitation: Cases with same node names but different property are not + * supported currently. We are relying on user to provide the same dtbo + * as it was used when adding the nodes. + */ +list_for_each_entry_safe( entry, temp, _tracker, entry ) +{ +if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 ) +{ +track = entry; Random question (not doing a full review of the DT code): What use is this (and the track variable itself)? It's never used further down afaics. Same for attach. I think you are correct, it is a copy paste of the existing code and the track variable is indeed useless. So in v3, I will simply drop it and mention this clean-up in commit message. Also I realized that the exact logic of finding the entry is duplicated third times, so I will also extract the logic to a function. --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -1190,6 +1190,17 @@ struct xen_domctl_vmtrace_op { typedef struct xen_domctl_vmtrace_op xen_domctl_vmtrace_op_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_vmtrace_op_t); +#if defined(__arm__) || defined (__aarch64__) Nit: Consistent use of blanks please (also again below). Good catch. Will fix it. +struct xen_domctl_dt_overlay { +XEN_GUEST_HANDLE_64(const_void) overlay_fdt; /* IN: overlay fdt. */ +uint32_t overlay_fdt_size; /* IN: Overlay dtb size. */ +#define XEN_DOMCTL_DT_OVERLAY_ATTACH3 +#define XEN_DOMCTL_DT_OVERLAY_DETACH4 While the numbers don't really matter much, picking 3 and 4 rather than, say, 1 and 2 still looks a little odd. 
Well although I agree with you it is indeed a bit odd, the problem of this is that, in current implementation I reused the libxl_dt_overlay() (with proper backward compatible) to deliver the sysctl and domctl depend on the op, and we have: #define LIBXL_DT_OVERLAY_ADD 1 #define LIBXL_DT_OVERLAY_REMOVE 2 #define LIBXL_DT_OVERLAY_ATTACH 3 #define LIBXL_DT_OVERLAY_DETACH 4 Then the op-number is passed from the toolstack to Xen, and checked in dt_overlay_domctl(). So with this implementation the attach/detach op number should be 3 and 4 since 1 and 2 have different meanings. But I realized that I can also implement a similar API, say libxl_dt_overlay_domain() and that way we can reuse 1 and 2 and there is not even need to provide backward compatible of libxl_dt_overlay(). So would you mind sharing your preference on which approach would you like more? Thanks! --- a/xen/include/xen/dt-overlay.h +++ b/xen/include/xen/dt-overlay.h @@ -14,6 +14,7 @@ #include #include #include +#include Why? All you need here ... @@ -42,12 +43,18 @@ struct xen_sysctl_dt_overlay; #ifdef CONFIG_OVERLAY_DTB long dt_overlay_sysctl(struct xen_sysctl_dt_overlay *op); +long dt_overlay_domctl(struct domain *d, struct xen_domctl_dt_overlay *op); ... is a forward declaration of struct xen_domctl_dt_overlay. Oh indeed. Will fix this. Thanks! Kind regards, Henry Jan
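The numbering constraint described above can be sketched as a dispatch on the op value. This is only an illustration of the current approach as described in the message: the four values are the ones quoted there, while `enum overlay_path` and `overlay_path_for_op()` are hypothetical names, and the mapping of add/remove to the sysctl and attach/detach to the domctl is an assumption drawn from this discussion.

```c
/* Op values as quoted in the message above. */
#define LIBXL_DT_OVERLAY_ADD    1
#define LIBXL_DT_OVERLAY_REMOVE 2
#define LIBXL_DT_OVERLAY_ATTACH 3
#define LIBXL_DT_OVERLAY_DETACH 4

/* Hypothetical result type: which hypercall path an op would take. */
enum overlay_path {
    OVERLAY_PATH_INVALID,
    OVERLAY_PATH_SYSCTL,
    OVERLAY_PATH_DOMCTL,
};

/* Hypothetical dispatcher for a single shared libxl entry point. */
enum overlay_path overlay_path_for_op(int op)
{
    switch ( op )
    {
    case LIBXL_DT_OVERLAY_ADD:
    case LIBXL_DT_OVERLAY_REMOVE:
        return OVERLAY_PATH_SYSCTL;

    case LIBXL_DT_OVERLAY_ATTACH:
    case LIBXL_DT_OVERLAY_DETACH:
        return OVERLAY_PATH_DOMCTL;

    default:
        return OVERLAY_PATH_INVALID;
    }
}
```

Sharing one entry point is what forces the domctl ops to start at 3. A separate domain-level API, as floated in the message, would let the domctl reuse 1 and 2.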
Re: [PATCH] drivers/xen: Improve the late XenStore init protocol
Hi Stefano, On 5/17/2024 8:52 AM, Stefano Stabellini wrote: On Thu, 16 May 2024, Henry Wang wrote: enum xenstore_init xen_store_domain_type; EXPORT_SYMBOL_GPL(xen_store_domain_type); @@ -751,9 +755,10 @@ static void xenbus_probe(void) { xenstored_ready = 1; -if (!xen_store_interface) { - xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT, - XEN_PAGE_SIZE, MEMREMAP_WB); + if (!xen_store_interface || XS_INTERFACE_READY) { + if (!xen_store_interface) These two nested if's don't make sense to me. If XS_INTERFACE_READY succeeds, it means that ((xen_store_interface != NULL) && (xen_store_interface->connection == XENSTORE_CONNECTED)). So it is not possible that xen_store_interface == NULL immediately after. Right? I think this is because we want to free the irq for the late init case, otherwise the init-dom0less will fail. For the xenstore PFN allocated case, the connection is already set to CONNECTED when we execute init-dom0less. But I agree with you, would below diff makes more sense to you? diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index 8aec0ed1d047..b8005b651a29 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -76,6 +76,8 @@ EXPORT_SYMBOL_GPL(xen_store_interface); ((xen_store_interface != NULL) && \ (xen_store_interface->connection == XENSTORE_CONNECTED)) +static bool xs_late_init = false; + enum xenstore_init xen_store_domain_type; EXPORT_SYMBOL_GPL(xen_store_domain_type); @@ -755,7 +757,7 @@ static void xenbus_probe(void) { xenstored_ready = 1; - if (!xen_store_interface || XS_INTERFACE_READY) { + if (xs_late_init) { if (!xen_store_interface) xen_store_interface = memremap(xen_store_gfn << I would just remove the outer 'if' and do this: if (!xen_store_interface) xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT, XEN_PAGE_SIZE, MEMREMAP_WB); /* * Now it is safe to free the IRQ used for xenstore late * initialization. 
No need to unbind: it is about to be * bound again from xb_init_comms. Note that calling * unbind_from_irqhandler now would result in xen_evtchn_close() * being called and the event channel not being enabled again * afterwards, resulting in missed event notifications. */ if (xs_init_irq > 0) free_irq(xs_init_irq, _waitq); I think this should work fine in all cases. Thanks. I followed your suggestion in v2. I am unsure if xs_init_irq==0 is possible valid value for xs_init_irq. If it is not, then we are fine. If 0 is a possible valid irq number, then we should initialize xs_init_irq to -1, and here check for xs_init_irq >= 0. Yeah the xs_init_irq==0 is a valid value. I followed your latter comment to init it to -1 and check it >=0. Kind regards, Henry
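The point about IRQ 0 can be made concrete with a tiny standalone model. This is a sketch, not the kernel code: `XS_INIT_IRQ_UNSET` and both function names are hypothetical, and the functions only model the guard around free_irq().

```c
#include <stdbool.h>

/* Hypothetical sentinel meaning "no late-init IRQ was ever bound". */
#define XS_INIT_IRQ_UNSET (-1)

/* Guard with a zero-initialized variable: IRQ 0 is treated as unset. */
bool should_free_irq_gt0(int xs_init_irq)
{
    return xs_init_irq > 0;
}

/* Guard with a -1 sentinel: IRQ 0 is treated as a valid, bound IRQ. */
bool should_free_irq_ge0(int xs_init_irq)
{
    return xs_init_irq >= 0;
}
```

If 0 is a valid IRQ number, as stated in the reply, only the second form frees it, and it needs the variable to start at -1 so that the unset case is still skipped.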
[PATCH v2] drivers/xen: Improve the late XenStore init protocol
Currently, the late XenStore init protocol is only triggered properly for the case that HVM_PARAM_STORE_PFN is ~0ULL (invalid). For the case that XenStore interface is allocated but not ready (the connection status is not XENSTORE_CONNECTED), Linux should also wait until the XenStore is set up properly. Introduce a macro to describe the XenStore interface is ready, use it in xenbus_probe_initcall() to select the code path of doing the late XenStore init protocol or not. Since now we have more than one condition for XenStore late init, rework the check in xenbus_probe() for the free_irq(). Take the opportunity to enhance the check of the allocated XenStore interface can be properly mapped, and return error early if the memremap() fails. Fixes: 5b3353949e89 ("xen: add support for initializing xenstore later as HVM domain") Signed-off-by: Henry Wang Signed-off-by: Michal Orzel --- v2: - Use -EINVAL for the memremap() check. (Stefano) - Add Fixes: tag. (Stefano) - Rework the condition for free_irq() in xenbus_probe(). 
(Stefano) --- drivers/xen/xenbus/xenbus_probe.c | 36 --- 1 file changed, 23 insertions(+), 13 deletions(-) diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index 3205e5d724c8..1a9ded0cddcb 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -65,13 +65,17 @@ #include "xenbus.h" -static int xs_init_irq; +static int xs_init_irq = -1; int xen_store_evtchn; EXPORT_SYMBOL_GPL(xen_store_evtchn); struct xenstore_domain_interface *xen_store_interface; EXPORT_SYMBOL_GPL(xen_store_interface); +#define XS_INTERFACE_READY \ + ((xen_store_interface != NULL) && \ +(xen_store_interface->connection == XENSTORE_CONNECTED)) + enum xenstore_init xen_store_domain_type; EXPORT_SYMBOL_GPL(xen_store_domain_type); @@ -751,19 +755,19 @@ static void xenbus_probe(void) { xenstored_ready = 1; - if (!xen_store_interface) { + if (!xen_store_interface) xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT, XEN_PAGE_SIZE, MEMREMAP_WB); - /* -* Now it is safe to free the IRQ used for xenstore late -* initialization. No need to unbind: it is about to be -* bound again from xb_init_comms. Note that calling -* unbind_from_irqhandler now would result in xen_evtchn_close() -* being called and the event channel not being enabled again -* afterwards, resulting in missed event notifications. -*/ + /* +* Now it is safe to free the IRQ used for xenstore late +* initialization. No need to unbind: it is about to be +* bound again from xb_init_comms. Note that calling +* unbind_from_irqhandler now would result in xen_evtchn_close() +* being called and the event channel not being enabled again +* afterwards, resulting in missed event notifications. 
+*/ + if (xs_init_irq >= 0) free_irq(xs_init_irq, _waitq); - } /* * In the HVM case, xenbus_init() deferred its call to @@ -822,7 +826,7 @@ static int __init xenbus_probe_initcall(void) if (xen_store_domain_type == XS_PV || (xen_store_domain_type == XS_HVM && !xs_hvm_defer_init_for_callback() && -xen_store_interface != NULL)) +XS_INTERFACE_READY)) xenbus_probe(); /* @@ -831,7 +835,7 @@ static int __init xenbus_probe_initcall(void) * started, then probe. It will be triggered when communication * starts happening, by waiting on xb_waitq. */ - if (xen_store_domain_type == XS_LOCAL || xen_store_interface == NULL) { + if (xen_store_domain_type == XS_LOCAL || !XS_INTERFACE_READY) { struct task_struct *probe_task; probe_task = kthread_run(xenbus_probe_thread, NULL, @@ -1014,6 +1018,12 @@ static int __init xenbus_init(void) xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT, XEN_PAGE_SIZE, MEMREMAP_WB); + if (!xen_store_interface) { + pr_err("%s: cannot map HVM_PARAM_STORE_PFN=%llx\n", + __func__, v); + err = -EINVAL; + goto out_error; + } if (xen_store_interface->connection != XENSTORE_CONNECTED) wait = true; } -- 2.34.1
[PATCH v2 3/8] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
There are some use cases in which the dom0less domUs need to have
XEN_DOMCTL_CDF_iommu set at domain construction time. For example, the
dynamic dtbo feature allows the domain to be assigned a device that is
behind the IOMMU at runtime. For these use cases, we need a way to specify
at domain construction time that the domain will need the IOMMU mapping.

Introduce a "passthrough" DT property for Dom0less DomUs, mirroring the
option of the same name in xl.cfg. Currently only two values are provided:
"enabled" and "disabled". Set XEN_DOMCTL_CDF_iommu at domain construction
time based on the property.

Signed-off-by: Henry Wang
---
v2:
- New patch to replace the original patch in v1:
  "[PATCH 03/15] xen/arm: Always enable IOMMU"
---
 docs/misc/arm/device-tree/booting.txt | 13 +
 xen/arch/arm/dom0less-build.c | 7 +--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2..61f9082553 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -260,6 +260,19 @@ with the following properties:
     value specified by Xen command line parameter gnttab_max_maptrack_frames
     (or its default value if unspecified, i.e. 1024) is used.
 
+- passthrough
+
+    A string property specifying whether IOMMU mappings are enabled for the
+    domain and hence whether it will be enabled for passthrough hardware.
+    Possible property values are:
+
+    - "enabled"
+    IOMMU mappings are enabled for the domain.
+
+    - "disabled"
+    IOMMU mappings are disabled for the domain and so hardware may not be
+    passed through. This option is the default if this property is missing.
+
 Under the "xen,domain" compatible node, one or more sub-nodes are present for the DomU kernel and ramdisk.
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c index 74f053c242..1396a102e1 100644 --- a/xen/arch/arm/dom0less-build.c +++ b/xen/arch/arm/dom0less-build.c @@ -848,6 +848,7 @@ static int __init construct_domU(struct domain *d, void __init create_domUs(void) { struct dt_device_node *node; +const char *dom0less_iommu; const struct dt_device_node *cpupool_node, *chosen = dt_find_node_by_path("/chosen"); @@ -895,8 +896,10 @@ void __init create_domUs(void) panic("Missing property 'cpus' for domain %s\n", dt_node_name(node)); -if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") && - iommu_enabled ) +if ( iommu_enabled && + ((!dt_property_read_string(node, "passthrough", _iommu) && + !strcmp(dom0less_iommu, "enabled")) || + dt_find_compatible_node(node, NULL, "multiboot,device-tree")) ) d_cfg.flags |= XEN_DOMCTL_CDF_iommu; if ( !dt_property_read_u32(node, "nr_spis", _cfg.arch.nr_spis) ) -- 2.34.1
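The condition added to create_domUs() can be read as a small decision function. The sketch below is a standalone model under stated assumptions: `demo_domu_wants_iommu` is a hypothetical helper, the property lookup is reduced to a string that is NULL when the property is absent, and `has_partial_dt` stands for the presence of a "multiboot,device-tree" sub-node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: should the IOMMU flag be set for this domU?
 * passthrough is the value of the "passthrough" property, or NULL if the
 * property is missing (which the documentation treats as "disabled").
 */
static bool demo_domu_wants_iommu(bool iommu_enabled,
                                  const char *passthrough,
                                  bool has_partial_dt)
{
    bool requested = passthrough != NULL &&
                     strcmp(passthrough, "enabled") == 0;

    /* Nothing can be enabled when the host IOMMU itself is off. */
    return iommu_enabled && (requested || has_partial_dt);
}
```

Note that, as in the hunk, a partial device tree still turns the flag on even when the property says "disabled"; the property only adds a second way to request it.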
[PATCH v2 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs
From: Vikram Garhwal

Currently, routing/removing physical interrupts is only allowed at
domain creation/destroy time. For use cases such as dynamic device tree
overlay adding/removing, routing/removing physical IRQs to/from running
domains should be allowed.

Remove the above-mentioned domain creation/dying checks. Since this
would introduce interrupt state unsync issues when the interrupt is
active or pending in the guest, simply reject the operation in these
cases. Do it for both the new and old vGIC implementations.

Signed-off-by: Vikram Garhwal
Signed-off-by: Stefano Stabellini
Signed-off-by: Henry Wang
---
v2:
- Reject the case where the IRQ is active or pending in guest.
---
 xen/arch/arm/gic-vgic.c  |  8 ++--
 xen/arch/arm/gic.c       | 15 ---
 xen/arch/arm/vgic/vgic.c |  5 +++--
 3 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 56490dbc43..d1608415f8 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -444,14 +444,18 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     {
         /* The VIRQ should not be already enabled by the guest */
         if ( !p->desc &&
-             !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+             !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
+             !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) &&
+             !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
             p->desc = desc;
         else
             ret = -EBUSY;
     }
     else
     {
-        if ( desc && p->desc != desc )
+        if ( desc && p->desc != desc &&
+             (test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
+              test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status)) )
             ret = -EINVAL;
         else
             p->desc = NULL;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 44c40e86de..3ebd89940a 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -135,14 +135,6 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
     ASSERT(virq < vgic_num_irqs(d));
     ASSERT(!is_lpi(virq));

-    /*
-     * When routing an IRQ to guest, the virtual state is not synced
-     * back to the physical IRQ. To prevent get unsync, restrict the
-     * routing to when the Domain is been created.
-     */
-    if ( d->creation_finished )
-        return -EBUSY;
-
     ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);
     if ( ret )
         return ret;
@@ -167,13 +159,6 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
     ASSERT(test_bit(_IRQ_GUEST, &desc->status));
     ASSERT(!is_lpi(virq));

-    /*
-     * Removing an interrupt while the domain is running may have
-     * undesirable effect on the vGIC emulation.
-     */
-    if ( !d->is_dying )
-        return -EBUSY;
-
     desc->handler->shutdown(desc);

     /* EOI the IRQ if it has not been done by the guest */
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index b9463a5f27..785ef2b192 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -877,7 +877,7 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu,
     if ( connect )                      /* assign a mapped IRQ */
     {
         /* The VIRQ should not be already enabled by the guest */
-        if ( !irq->hw && !irq->enabled )
+        if ( !irq->hw && !irq->enabled && !irq->active && !irq->pending_latch )
         {
             irq->hw = true;
             irq->hwintid = desc->irq;
@@ -887,7 +887,8 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu,
     }
     else                                /* remove a mapped IRQ */
     {
-        if ( desc && irq->hwintid != desc->irq )
+        if ( desc && irq->hwintid != desc->irq &&
+             (irq->active || irq->pending_latch) )
         {
             ret = -EINVAL;
         }
--
2.34.1
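The rule stated in the commit message (refuse to connect or disconnect a physical IRQ while the virtual IRQ is pending or active in the guest) can be sketched as a tiny state model. This is an illustration of the stated intent only, not a transcription of the hunks above, which combine the remove-side conditions differently; `struct demo_virq` and both functions are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a virtual IRQ's state. */
struct demo_virq {
    const void *desc;   /* connected physical IRQ descriptor, or NULL */
    bool enabled;       /* guest has enabled the vIRQ */
    bool pending;       /* visible to the guest but not yet active */
    bool active;        /* guest is currently handling it */
};

/* Connect: only a completely idle, unconnected vIRQ is accepted. */
static int demo_connect(struct demo_virq *p, const void *desc)
{
    if (p->desc != NULL || p->enabled || p->pending || p->active)
        return -EBUSY;

    p->desc = desc;
    return 0;
}

/* Disconnect: reject a mismatching descriptor or an in-flight interrupt. */
static int demo_disconnect(struct demo_virq *p, const void *desc)
{
    if ((desc != NULL && p->desc != desc) || p->pending || p->active)
        return -EINVAL;

    p->desc = NULL;
    return 0;
}
```

Rejecting instead of trying to migrate the in-flight state keeps the model simple: the caller can retry once the guest has finished handling the interrupt.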
[PATCH v2 7/8] tools: Introduce the "xl dt-overlay {attach,detach}" commands
With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to attach/detach devices from the provided DT overlay to domains. Support this by introducing a new set of "xl dt-overlay" commands and related documentation, i.e. "xl dt-overlay {attach,detach}". Slightly rework the command option parsing logic. Since the addition of these two commands modifies the existing libxl API libxl_dt_overlay(), also provide the backward compatible for it. Signed-off-by: Henry Wang --- v2: - New patch. --- tools/include/libxl.h | 15 - tools/include/xenctrl.h | 3 +++ tools/libs/ctrl/xc_dt_overlay.c | 31 +++ tools/libs/light/libxl_dt_overlay.c | 30 -- tools/xl/xl_cmdtable.c | 4 ++-- tools/xl/xl_vmcontrol.c | 33 +++-- 6 files changed, 96 insertions(+), 20 deletions(-) diff --git a/tools/include/libxl.h b/tools/include/libxl.h index 62cb07dea6..27aab4bcee 100644 --- a/tools/include/libxl.h +++ b/tools/include/libxl.h @@ -2549,8 +2549,21 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, uint32_t domid, void libxl_device_pci_list_free(libxl_device_pci* list, int num); #if defined(__arm__) || defined(__aarch64__) -int libxl_dt_overlay(libxl_ctx *ctx, void *overlay, +#define LIBXL_DT_OVERLAY_ADD 1 +#define LIBXL_DT_OVERLAY_REMOVE2 +#define LIBXL_DT_OVERLAY_ATTACH3 +#define LIBXL_DT_OVERLAY_DETACH4 + +int libxl_dt_overlay(libxl_ctx *ctx, uint32_t domain_id, void *overlay, uint32_t overlay_size, uint8_t overlay_op); +#if defined(LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x041900 +int libxl_dt_overlay_0x041800(libxl_ctx *ctx, void *overlay, + uint32_t overlay_size, uint8_t overlay_op); +{ +return libxl_dt_overlay(ctx, 0, overlay, overlay_size, overlay_op); +} +#define libxl_dt_overlay libxl_dt_overlay_0x041800 +#endif #endif /* diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 4996855944..9ceca0cffc 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -2657,6 +2657,9 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t domid, #if 
defined(__arm__) || defined(__aarch64__) int xc_dt_overlay(xc_interface *xch, void *overlay_fdt, uint32_t overlay_fdt_size, uint8_t overlay_op); +int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt, + uint32_t overlay_fdt_size, uint8_t overlay_op, + uint32_t domain_id); #endif /* Compat shims */ diff --git a/tools/libs/ctrl/xc_dt_overlay.c b/tools/libs/ctrl/xc_dt_overlay.c index c2224c4d15..ea1da522d1 100644 --- a/tools/libs/ctrl/xc_dt_overlay.c +++ b/tools/libs/ctrl/xc_dt_overlay.c @@ -48,3 +48,34 @@ err: return err; } + +int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt, + uint32_t overlay_fdt_size, uint8_t overlay_op, + uint32_t domain_id) +{ +int err; +struct xen_domctl domctl = { +.cmd = XEN_DOMCTL_dt_overlay, +.domain = domain_id, +.u.dt_overlay = { +.overlay_op = overlay_op, +.overlay_fdt_size = overlay_fdt_size, +} +}; + +DECLARE_HYPERCALL_BOUNCE(overlay_fdt, overlay_fdt_size, + XC_HYPERCALL_BUFFER_BOUNCE_IN); + +if ( (err = xc_hypercall_bounce_pre(xch, overlay_fdt)) ) +goto err; + +set_xen_guest_handle(domctl.u.dt_overlay.overlay_fdt, overlay_fdt); + +if ( (err = do_domctl(xch, )) != 0 ) +PERROR("%s failed", __func__); + +err: +xc_hypercall_bounce_post(xch, overlay_fdt); + +return err; +} diff --git a/tools/libs/light/libxl_dt_overlay.c b/tools/libs/light/libxl_dt_overlay.c index a6c709a6dc..9110b1efd2 100644 --- a/tools/libs/light/libxl_dt_overlay.c +++ b/tools/libs/light/libxl_dt_overlay.c @@ -41,8 +41,8 @@ static int check_overlay_fdt(libxl__gc *gc, void *fdt, size_t size) return 0; } -int libxl_dt_overlay(libxl_ctx *ctx, void *overlay_dt, uint32_t overlay_dt_size, - uint8_t overlay_op) +int libxl_dt_overlay(libxl_ctx *ctx, uint32_t domain_id, void *overlay_dt, + uint32_t overlay_dt_size, uint8_t overlay_op) { int rc; int r; @@ -57,11 +57,29 @@ int libxl_dt_overlay(libxl_ctx *ctx, void *overlay_dt, uint32_t overlay_dt_size, rc = 0; } -r = xc_dt_overlay(ctx->xch, overlay_dt, overlay_dt_size, overlay_op); - -if (r) { -LOG(ERROR, "%s: 
Adding/Removing overlay dtb failed.", __func__); +switch (overlay_op) +{ +case LIBXL_DT_OVERLAY_ADD: +case LIBXL_DT_OVERLAY_REMOVE: +r = xc_dt_overlay(ctx->xch, overlay_dt, overlay_dt_size, overlay_op); +if (r) { +LOG(ERROR, "%s: Adding/Removing
[PATCH v2 2/8] tools/xl: Correct the help information and exit code of the dt-overlay command
Fix the name mismatch in the xl dt-overlay command: the command name
should be "dt-overlay" instead of "dt_overlay". Add the missing "," in
the cmdtable.

Fix the exit code of the dt-overlay command: use EXIT_FAILURE instead
of ERROR_FAIL.

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support")
Suggested-by: Anthony PERARD
Signed-off-by: Henry Wang
---
v2:
- New patch
---
 tools/xl/xl_cmdtable.c  | 2 +-
 tools/xl/xl_vmcontrol.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 62bdb2aeaa..1f3c6b5897 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -635,7 +635,7 @@ const struct cmd_spec cmd_table[] = {
     { "dt-overlay",
       &main_dt_overlay, 0, 1,
       "Add/Remove a device tree overlay",
-      "add/remove <.dtbo>"
+      "add/remove <.dtbo>",
       "-h print this help\n"
     },
 #endif
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 98f6bd2e76..02575d5d36 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1278,7 +1278,7 @@ int main_dt_overlay(int argc, char **argv)
     const int overlay_remove_op = 2;

     if (argc < 2) {
-        help("dt_overlay");
+        help("dt-overlay");
         return EXIT_FAILURE;
     }

@@ -1302,11 +1302,11 @@ int main_dt_overlay(int argc, char **argv)
             fprintf(stderr, "failed to read the overlay device tree file %s\n",
                     overlay_config_file);
             free(overlay_dtb);
-            return ERROR_FAIL;
+            return EXIT_FAILURE;
         }
     } else {
         fprintf(stderr, "overlay dtbo file not provided\n");
-        return ERROR_FAIL;
+        return EXIT_FAILURE;
     }

     rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
--
2.34.1
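One way to see why a library error code makes a poor process exit code: on POSIX systems the parent only observes the low 8 bits of the status. The sketch below assumes, for illustration, that the library's failure code is the negative value -3; the `DEMO_`/`demo_` names are hypothetical and nothing here calls libxl.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed value of a negative library error code, for illustration only. */
enum { DEMO_ERROR_FAIL = -3 };

/* What a POSIX parent would observe: only the low 8 bits of the status. */
static int demo_visible_status(int returned)
{
    return returned & 0xff;
}

/* Map a library return code onto the two portable exit codes. */
static int demo_exit_code(int rc)
{
    return rc == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Under that assumption a command returning the raw library code would exit with status 253, whereas mapping through EXIT_FAILURE gives callers and scripts a conventional, predictable value.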
[PATCH v2 6/8] xen/arm: Add XEN_DOMCTL_dt_overlay DOMCTL and related operations
In order to support the dynamic dtbo device assignment to a running VM, the add/remove of the DT overlay and the attach/detach of the device from the DT overlay should happen separately. Therefore, repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT overlay to Xen device tree, instead of assigning the device to the hardware domain at the same time. Add the XEN_DOMCTL_dt_overlay with operations XEN_DOMCTL_DT_OVERLAY_{ATTACH,DETACH} to do/undo the device assignment to the domain. The hypervisor firstly checks the DT overlay passed from the toolstack is valid. Then the device nodes are retrieved from the overlay tracker based on the DT overlay. The attach/detach of the device is implemented by map/unmap the IRQ and IOMMU resources. Note that with these changes, the device de-registration from the IOMMU driver should only happen at the time when the DT overlay is removed from the Xen device tree. Signed-off-by: Henry Wang Signed-off-by: Vikram Garhwal --- v2: - New patch. --- xen/arch/arm/domctl.c| 3 + xen/common/dt-overlay.c | 415 --- xen/include/public/domctl.h | 15 ++ xen/include/public/sysctl.h | 7 +- xen/include/xen/dt-overlay.h | 7 + 5 files changed, 366 insertions(+), 81 deletions(-) diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index ad56efb0f5..12a12ee781 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -5,6 +5,7 @@ * Copyright (c) 2012, Citrix Systems */ +#include #include #include #include @@ -176,6 +177,8 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, return rc; } +case XEN_DOMCTL_dt_overlay: +return dt_overlay_domctl(d, >u.dt_overlay); default: return subarch_do_domctl(domctl, d, u_domctl); } diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c index 9cece79067..593e985949 100644 --- a/xen/common/dt-overlay.c +++ b/xen/common/dt-overlay.c @@ -356,24 +356,100 @@ static int overlay_get_nodes_info(const void *fdto, char **nodes_full_path) return 0; } +static int remove_irq(unsigned long 
s, unsigned long e, void *data) +{ +struct domain *d = data; +int rc = 0; + +/* + * IRQ should always have access unless there are duplication of + * of irqs in device tree. There are few cases of xen device tree + * where there are duplicate interrupts for the same node. + */ +if (!irq_access_permitted(d, s)) +return 0; +/* + * TODO: We don't handle shared IRQs for now. So, it is assumed that + * the IRQs was not shared with another domain. + */ +rc = irq_deny_access(d, s); +if ( rc ) +{ +printk(XENLOG_ERR "unable to revoke access for irq %ld\n", s); +return rc; +} + +rc = release_guest_irq(d, s); +if ( rc ) +{ +printk(XENLOG_ERR "unable to release irq %ld\n", s); +return rc; +} + +return rc; +} + +static int remove_all_irqs(struct rangeset *irq_ranges, struct domain *d) +{ +return rangeset_report_ranges(irq_ranges, 0, ~0UL, remove_irq, d); +} + +static int remove_iomem(unsigned long s, unsigned long e, void *data) +{ +struct domain *d = data; +int rc = 0; +p2m_type_t t; +mfn_t mfn; + +mfn = p2m_lookup(d, _gfn(s), ); +if ( mfn_x(mfn) == 0 || mfn_x(mfn) == ~0UL ) +return -EINVAL; + +rc = iomem_deny_access(d, s, e); +if ( rc ) +{ +printk(XENLOG_ERR "Unable to remove %pd access to %#lx - %#lx\n", + d, s, e); +return rc; +} + +rc = unmap_mmio_regions(d, _gfn(s), e - s, _mfn(s)); +if ( rc ) +return rc; + +return rc; +} + +static int remove_all_iomems(struct rangeset *iomem_ranges, struct domain *d) +{ +return rangeset_report_ranges(iomem_ranges, 0, ~0UL, remove_iomem, d); +} + /* Check if node itself can be removed and remove node from IOMMU. 
*/ -static int remove_node_resources(struct dt_device_node *device_node) +static int remove_node_resources(struct dt_device_node *device_node, + struct domain *d) { int rc = 0; unsigned int len; domid_t domid; -domid = dt_device_used_by(device_node); +if ( !d ) +{ +domid = dt_device_used_by(device_node); -dt_dprintk("Checking if node %s is used by any domain\n", - device_node->full_name); +dt_dprintk("Checking if node %s is used by any domain\n", + device_node->full_name); -/* Remove the node if only it's assigned to hardware domain or domain io. */ -if ( domid != hardware_domain->domain_id && domid != DOMID_IO ) -{ -printk(XENLOG_ERR "Device %s is being used by domain %u. Removing nodes failed\n", - device_node->full_name, domid); -return -EINVAL; +/* + * We also check if device is assigned t
[PATCH v2 8/8] docs: Add device tree overlay documentation
From: Vikram Garhwal

Signed-off-by: Vikram Garhwal
Signed-off-by: Stefano Stabellini
Signed-off-by: Henry Wang
---
v2:
- Update the content based on the changes in this version.
---
 docs/misc/arm/overlay.txt | 99 +++
 1 file changed, 99 insertions(+)
 create mode 100644 docs/misc/arm/overlay.txt

diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt
new file mode 100644
index 00..811a6de369
--- /dev/null
+++ b/docs/misc/arm/overlay.txt
@@ -0,0 +1,99 @@
+# Device Tree Overlays support in Xen
+
+Xen now supports dynamic device assignment to running domains,
+i.e. adding/removing nodes (using .dtbo) to/from Xen device tree, and
+attaching/detaching them to/from a running domain with given $domid.
+
+Dynamic node assignment works in two steps:
+
+## Add/Remove device tree overlay to/from Xen device tree
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor applies/removes the dtbo to/from Xen device tree.
+
+## Attach/Detach device from the DT overlay to/from domain
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor attach/detach the device to/from the user-provided $domid by
+   mapping/unmapping node resources in the DT overlay.
+
+# Examples
+
+Here are a few examples on how to use it.
+
+## Dom0 device add
+
+For assigning a device tree overlay to Dom0, user should firstly properly
+prepare the DT overlay. More information about device tree overlays can be
+found in [1]. Then, in Dom0, enter the following:
+
+    (dom0) xl dt-overlay add overlay.dtbo
+
+This will allocate the devices mentioned in overlay.dtbo to Xen device tree.
+
+To assign the newly added device from the dtbo to Dom0:
+
+    (dom0) xl dt-overlay attach overlay.dtbo 0
+
+Next, if the user wants to add the same device tree overlay to dom0
+Linux, execute the following:
+
+    (dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+    (dom0) cat overlay.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally if needed, the relevant Linux kernel drive can be loaded using:
+
+    (dom0) modprobe module_name.ko
+
+## Dom0 device remove
+
+For removing the device from Dom0, first detach the device from Dom0:
+
+    (dom0) xl dt-overlay detach overlay.dtbo 0
+
+NOTE: The user is expected to unload any Linux kernel modules which
+might be accessing the devices in overlay.dtbo before detach the device.
+Detaching devices without unloading the modules might result in a crash.
+
+Then remove the overlay from Xen device tree:
+
+    (dom0) xl dt-overlay remove overlay.dtbo
+
+## DomU device add/remove
+
+All the nodes in dtbo will be assigned to a domain; the user will need
+to prepare the dtb for the domU. For example, the `interrupt-parent` property
+of the DomU overlay should be changed to the Xen hardcoded value `0xfde8`.
+Below assumes the properly written DomU dtbo is `overlay_domu.dtbo`.
+
+User will need to create the DomU with below properties properly configured
+in the xl config file:
+- `iomem`
+- `passthrough` (if IOMMU is needed)
+
+User will also need to modprobe the relevant drivers.
+
+Example for domU device add:
+
+    (dom0) xl dt-overlay add overlay.dtbo            # If not executed before
+    (dom0) xl dt-overlay attach overlay.dtbo $domid
+    (dom0) xl console $domid                         # To access $domid console
+
+Next, if the user needs to modify/prepare the overlay.dtbo suitable for
+the domU:
+
+    (domU) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+    (domU) cat overlay_domu.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel drive can be probed:
+
+    (domU) modprobe module_name.ko
+
+Example for domU overlay remove:
+
+    (dom0) xl dt-overlay detach overlay.dtbo $domid
+    (dom0) xl dt-overlay remove overlay.dtbo
+
+[1] https://www.kernel.org/doc/Documentation/devicetree/overlay-notes.txt
--
2.34.1
[PATCH v2 4/8] tools/arm: Introduce the "nr_spis" xl config entry
Currently, the number of SPIs allocated to the domain is only configurable for Dom0less DomUs. Xen domains are supposed to be platform agnostics and therefore the numbers of SPIs for libxl guests should not be based on the hardware. Introduce a new xl config entry for Arm to provide a method for user to decide the number of SPIs. This would help to avoid bumping the `config->arch.nr_spis` in libxl everytime there is a new platform with increased SPI numbers. Update the doc and the golang bindings accordingly. Signed-off-by: Henry Wang --- v2: - New patch to replace the original patch in v1: "[PATCH 05/15] tools/libs/light: Increase nr_spi to 160" --- docs/man/xl.cfg.5.pod.in | 11 +++ tools/golang/xenlight/helpers.gen.go | 2 ++ tools/golang/xenlight/types.gen.go | 1 + tools/libs/light/libxl_arm.c | 4 ++-- tools/libs/light/libxl_types.idl | 1 + tools/xl/xl_parse.c | 3 +++ 6 files changed, 20 insertions(+), 2 deletions(-) diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in index 8f2b375ce9..6a2d86065e 100644 --- a/docs/man/xl.cfg.5.pod.in +++ b/docs/man/xl.cfg.5.pod.in @@ -3072,6 +3072,17 @@ raised. =back +=over 4 + +=item B + +A 32-bit optional integer parameter specifying the number of SPIs (Shared +Peripheral Interrupts) to allocate for the domain. If the `nr_spis` parameter +is missing, the max number of SPIs calculated by the toolstack based on the +devices allocated for the domain will be used. 
+ +=back + =head3 x86 =over 4 diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go index 78bdb08b15..757ccaf035 100644 --- a/tools/golang/xenlight/helpers.gen.go +++ b/tools/golang/xenlight/helpers.gen.go @@ -1154,6 +1154,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)} x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version) x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart) x.ArchArm.SveVl = SveType(xc.arch_arm.sve_vl) +x.ArchArm.NrSpis = uint32(xc.arch_arm.nr_spis) if err := x.ArchX86.MsrRelaxed.fromC(_x86.msr_relaxed);err != nil { return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err) } @@ -1670,6 +1671,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)} xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion) xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart) xc.arch_arm.sve_vl = C.libxl_sve_type(x.ArchArm.SveVl) +xc.arch_arm.nr_spis = C.uint32_t(x.ArchArm.NrSpis) if err := x.ArchX86.MsrRelaxed.toC(_x86.msr_relaxed); err != nil { return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err) } diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go index ccfe18019e..b7b4ba88af 100644 --- a/tools/golang/xenlight/types.gen.go +++ b/tools/golang/xenlight/types.gen.go @@ -597,6 +597,7 @@ ArchArm struct { GicVersion GicVersion Vuart VuartType SveVl SveType +NrSpis uint32 } ArchX86 struct { MsrRelaxed Defbool diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c index 1cb89fa584..a4029e3ac8 100644 --- a/tools/libs/light/libxl_arm.c +++ b/tools/libs/light/libxl_arm.c @@ -181,8 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, LOG(DEBUG, "Configure the domain"); -config->arch.nr_spis = nr_spis; -LOG(DEBUG, " - Allocate %u SPIs", nr_spis); +config->arch.nr_spis = max(nr_spis, d_config->b_info.arch_arm.nr_spis); +LOG(DEBUG, " - Allocate %u SPIs", config->arch.nr_spis); switch (d_config->b_info.arch_arm.gic_version) { case 
LIBXL_GIC_VERSION_DEFAULT: diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl index 470122e768..3f143f405d 100644 --- a/tools/libs/light/libxl_types.idl +++ b/tools/libs/light/libxl_types.idl @@ -722,6 +722,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("arch_arm", Struct(None, [("gic_version", libxl_gic_version), ("vuart", libxl_vuart_type), ("sve_vl", libxl_sve_type), + ("nr_spis", uint32), ])), ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool), ])), diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c index ab09d0288b..4aa99029b5 100644 --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -2933,6 +2933,9 @@ skip_usbdev: } } +if (!xlu_cfg_get_long (config, "nr_spis", , 0)) +b_info->arch_arm.nr_spis = l; + parse_vkb_list(config, d_config); d_config->virtios = NULL; -- 2.34.1
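The libxl_arm.c hunk in this patch takes the larger of the toolstack-computed SPI count and the user-supplied `nr_spis`. A minimal standalone sketch of that selection follows; `demo_effective_nr_spis` is a hypothetical name, and a configured value of 0 is assumed to mean "not set".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring max(nr_spis, configured): the user value
 * can only raise the count computed from the assigned devices.
 */
static uint32_t demo_effective_nr_spis(uint32_t computed, uint32_t configured)
{
    return configured > computed ? configured : computed;
}
```

One consequence of using a maximum is that a configured value smaller than what the assigned devices require is silently ignored rather than rejected.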
[PATCH v2 1/8] xen/common/dt-overlay: Fix lock issue when add/remove the device
If CONFIG_DEBUG=y, the assertion below will be triggered:

(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) [ Xen-4.19-unstable arm64 debug=y Not tainted ]
[...]
(XEN) Xen call trace:
(XEN)    [<0a257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN)    [<0a2573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN)    [<0a20797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN)    [<0a207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN)    [<0a208460>] dt_overlay_sysctl+0x428/0xc68
(XEN)    [<0a2707f8>] arch_do_sysctl+0x1c/0x2c
(XEN)    [<0a230b40>] do_sysctl+0x96c/0x9ec
(XEN)    [<0a271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN)    [<0a273490>] do_trap_guest_sync+0x448/0x63c
(XEN)    [<0a25c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN)
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN)

This is because iommu_remove_dt_device() is called without taking the
dt_host_lock. dt_host_lock is meant to ensure that the DT node will not
disappear behind our back. So fix the issue by taking the lock as soon
as we get hold of overlay_node.

A similar issue is observed when adding the dtbo:

(XEN) Assertion 'system_state < SYS_STATE_active || rw_is_locked(&dt_host_lock)' failed at xen-source/xen/drivers/passthrough/device_tree.c:192
(XEN) [ Xen-4.19-unstable arm64 debug=y Not tainted ]
[...]
(XEN) Xen call trace:
(XEN)    [<0a2594f4>] iommu_add_dt_device+0x7c/0x17c (PC)
(XEN)    [<0a259494>] iommu_add_dt_device+0x1c/0x17c (LR)
(XEN)    [<0a267db4>] handle_device+0x68/0x1e8
(XEN)    [<0a208ba8>] dt_overlay_sysctl+0x9d4/0xb84
(XEN)    [<0a27342c>] arch_do_sysctl+0x24/0x38
(XEN)    [<0a231ac8>] do_sysctl+0x9ac/0xa34
(XEN)    [<0a274b70>] traps.c#do_trap_hypercall+0x230/0x2dc
(XEN)    [<0a276330>] do_trap_guest_sync+0x478/0x688
(XEN)    [<0a25e480>] entry.o#guest_sync_slowpath+0xa8/0xd8

This is because the lock is released too early. So fix the issue by
releasing the lock after handle_device().

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Henry Wang
---
v2:
- Take the lock as soon as getting hold of overlay_node. Also
  release the lock after handle_device() when adding dtbo.
v1.1:
- Move the unlock position before the check of rc.
---
 xen/common/dt-overlay.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1b197381f6..9cece79067 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -429,18 +429,24 @@ static int remove_nodes(const struct overlay_track *tracker)
         if ( overlay_node == NULL )
             return -EINVAL;

+        write_lock(&dt_host_lock);
+
         rc = remove_descendant_nodes_resources(overlay_node);
         if ( rc )
+        {
+            write_unlock(&dt_host_lock);
             return rc;
+        }

         rc = remove_node_resources(overlay_node);
         if ( rc )
+        {
+            write_unlock(&dt_host_lock);
             return rc;
+        }

         dt_dprintk("Removing node: %s\n", overlay_node->full_name);

-        write_lock(&dt_host_lock);
-
         rc = dt_overlay_remove_node(overlay_node);
         if ( rc )
         {
@@ -604,8 +610,6 @@ static long add_nodes(struct overlay_track *tr, char **nodes_full_path)
             return rc;
         }

-        write_unlock(&dt_host_lock);
-
         prev_node->allnext = next_node;

         overlay_node = dt_find_node_by_path(overlay_node->full_name);
@@ -619,6 +623,7 @@ static long add_nodes(struct overlay_track *tr, char **nodes_full_path)
         rc = handle_device(hardware_domain, overlay_node, p2m_mmio_direct_c,
                            tr->iomem_ranges,
                            tr->irq_ranges);
+        write_unlock(&dt_host_lock);
         if ( rc )
         {
             printk(XENLOG_ERR "Adding IRQ and IOMMU failed\n");
--
2.34.1
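The locking rule behind this fix (hold the lock across every step that asserts it, and drop it on every exit path) can be shown with a toy lock. This is a sketch under stated assumptions: `struct demo_lock` is a hypothetical counter standing in for the rwlock, and the single `out:` label is a stylistic alternative to the per-branch unlocks the patch itself uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lock: just remembers whether it is held. */
struct demo_lock {
    int held;
};

static void demo_write_lock(struct demo_lock *l)
{
    assert(!l->held);
    l->held = 1;
}

static void demo_write_unlock(struct demo_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/* A step that, like the IOMMU helpers, insists on the lock being held. */
static int demo_step(const struct demo_lock *l, bool fail)
{
    assert(l->held);
    return fail ? -EINVAL : 0;
}

/* Take the lock before the first step; release it on all paths. */
static int demo_remove_node(struct demo_lock *l, bool fail_first,
                            bool fail_second)
{
    int rc;

    demo_write_lock(l);

    rc = demo_step(l, fail_first);
    if (rc)
        goto out;

    rc = demo_step(l, fail_second);

out:
    demo_write_unlock(l);
    return rc;
}
```

With a single exit label, adding a further step later cannot accidentally introduce an early return that leaves the lock held.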
[PATCH v2 0/8] Remaining patches for dynamic node programming using overlay dtbo
Hi all,

This is the remaining series for the fully functional "dynamic node
programming using overlay dtbo" feature. The first part [1] has already
been merged.

Quoting from the original series, the first part has already made Xen
aware of new device tree nodes, which means updating the dt_host with
overlay node information. In this series, the goal is to map IRQ and
IOMMU during runtime, where we will do the actual IOMMU and IRQ mapping
and unmapping to a running domain. Also, documentation of the "dynamic
node programming using overlay dtbo" feature is added.

Patches 1 and 2 fix issues in the existing code that I noticed during
my local tests; please see the commit messages for details.

Gitlab CI for this series can be found in [1].

[1] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1293126857

Henry Wang (6):
  xen/common/dt-overlay: Fix lock issue when add/remove the device
  tools/xl: Correct the help information and exit code of the dt-overlay command
  xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
  tools/arm: Introduce the "nr_spis" xl config entry
  xen/arm: Add XEN_DOMCTL_dt_overlay DOMCTL and related operations
  tools: Introduce the "xl dt-overlay {attach,detach}" commands

Vikram Garhwal (2):
  xen/arm/gic: Allow routing/removing interrupt to running VMs
  docs: Add device tree overlay documentation

 docs/man/xl.cfg.5.pod.in              |  11 +
 docs/misc/arm/device-tree/booting.txt |  13 +
 docs/misc/arm/overlay.txt             |  99 ++
 tools/golang/xenlight/helpers.gen.go  |   2 +
 tools/golang/xenlight/types.gen.go    |   1 +
 tools/include/libxl.h                 |  15 +-
 tools/include/xenctrl.h               |   3 +
 tools/libs/ctrl/xc_dt_overlay.c       |  31 ++
 tools/libs/light/libxl_arm.c          |   4 +-
 tools/libs/light/libxl_dt_overlay.c   |  30 +-
 tools/libs/light/libxl_types.idl      |   1 +
 tools/xl/xl_cmdtable.c                |   4 +-
 tools/xl/xl_parse.c                   |   3 +
 tools/xl/xl_vmcontrol.c               |  39 ++-
 xen/arch/arm/dom0less-build.c         |   7 +-
 xen/arch/arm/domctl.c                 |   3 +
 xen/arch/arm/gic-vgic.c               |   8 +-
 xen/arch/arm/gic.c                    |  15 -
 xen/arch/arm/vgic/vgic.c              |   5 +-
 xen/common/dt-overlay.c               | 418 +-
 xen/include/public/domctl.h           |  15 +
 xen/include/public/sysctl.h           |   7 +-
 xen/include/xen/dt-overlay.h          |   7 +
 23 files changed, 615 insertions(+), 126 deletions(-)
 create mode 100644 docs/misc/arm/overlay.txt

--
2.34.1
Re: [PATCH] drivers/xen: Improve the late XenStore init protocol
Hi Stefano,

On 5/16/2024 6:30 AM, Stefano Stabellini wrote:
On Wed, 15 May 2024, Henry Wang wrote:

Currently, the late XenStore init protocol is only triggered properly for the case that HVM_PARAM_STORE_PFN is ~0ULL (invalid). For the case that XenStore interface is allocated but not ready (the connection status is not XENSTORE_CONNECTED), Linux should also wait until the XenStore is set up properly.

Introduce a macro to describe the XenStore interface is ready, use it in xenbus_probe_initcall() and xenbus_probe() to select the code path of doing the late XenStore init protocol or not.

Take the opportunity to enhance the check of the allocated XenStore interface can be properly mapped, and return error early if the memremap() fails.

Signed-off-by: Henry Wang
Signed-off-by: Michal Orzel

Please add a Fixes: tag

Sure. Will do.

---
 drivers/xen/xenbus/xenbus_probe.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3205e5d724c8..8aec0ed1d047 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -72,6 +72,10 @@ EXPORT_SYMBOL_GPL(xen_store_evtchn);
 struct xenstore_domain_interface *xen_store_interface;
 EXPORT_SYMBOL_GPL(xen_store_interface);
+#define XS_INTERFACE_READY \
+        ((xen_store_interface != NULL) && \
+         (xen_store_interface->connection == XENSTORE_CONNECTED))
+
 enum xenstore_init xen_store_domain_type;
 EXPORT_SYMBOL_GPL(xen_store_domain_type);
@@ -751,9 +755,10 @@ static void xenbus_probe(void)
 {
         xenstored_ready = 1;
-        if (!xen_store_interface) {
-                xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT,
-                                               XEN_PAGE_SIZE, MEMREMAP_WB);
+        if (!xen_store_interface || XS_INTERFACE_READY) {
+                if (!xen_store_interface)

These two nested if's don't make sense to me. If XS_INTERFACE_READY succeeds, it means that ((xen_store_interface != NULL) && (xen_store_interface->connection == XENSTORE_CONNECTED)). So it is not possible that xen_store_interface == NULL immediately after. Right?

I think this is because we want to free the IRQ for the late init case, otherwise init-dom0less will fail. For the case where the xenstore PFN is already allocated, the connection is already set to CONNECTED when we execute init-dom0less. But I agree with you. Would the diff below make more sense to you?

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 8aec0ed1d047..b8005b651a29 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -76,6 +76,8 @@ EXPORT_SYMBOL_GPL(xen_store_interface);
         ((xen_store_interface != NULL) && \
          (xen_store_interface->connection == XENSTORE_CONNECTED))
+static bool xs_late_init = false;
+
 enum xenstore_init xen_store_domain_type;
 EXPORT_SYMBOL_GPL(xen_store_domain_type);
@@ -755,7 +757,7 @@ static void xenbus_probe(void)
 {
         xenstored_ready = 1;
-        if (!xen_store_interface || XS_INTERFACE_READY) {
+        if (xs_late_init) {
                 if (!xen_store_interface)
                         xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT,
                                                        XEN_PAGE_SIZE, MEMREMAP_WB);
@@ -937,6 +939,8 @@ static irqreturn_t xenbus_late_init(int irq, void *unused)
         int err;
         uint64_t v = 0;
+        xs_late_init = true;
+
         err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
         if (err || !v || !~v)
                 return IRQ_HANDLED;

+                        xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT,
+                                                       XEN_PAGE_SIZE, MEMREMAP_WB);
                 /*
                  * Now it is safe to free the IRQ used for xenstore late
                  * initialization. No need to unbind: it is about to be
@@ -822,7 +827,7 @@ static int __init xenbus_probe_initcall(void)
         if (xen_store_domain_type == XS_PV ||
             (xen_store_domain_type == XS_HVM &&
              !xs_hvm_defer_init_for_callback() &&
-             xen_store_interface != NULL))
+             XS_INTERFACE_READY))
                 xenbus_probe();
         /*
@@ -831,7 +836,7 @@ static int __init xenbus_probe_initcall(void)
          * started, then probe. It will be triggered when communication
          * starts happening, by waiting on xb_waitq.
          */
-        if (xen_store_domain_type == XS_LOCAL || xen_store_interface == NULL) {
+        if (xen_store_domain_type == XS_LOCAL || !XS_INTERFACE_READY) {
                 struct task_struct *probe_task;
                 probe_task = kthread_run(xenbus_probe_thread, NULL,
@@ -1014,6 +1019,12 @@ static int __init xenbus_init(void)
                         xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_S
Re: Proposal to Extend Feature Freeze Deadline
Hi Oleksii,

On 5/14/2024 11:43 PM, Andrew Cooper wrote:
On 14/05/2024 4:40 pm, Oleksii K. wrote:

Hello everyone,

We're observing fewer merged patches/series across several architectures for the current 4.19 release in comparison to previous release. For example:

1. For Arm, significant features like Cache Coloring and PCI Passthrough won't be fully merged. Thus, it would be beneficial to commit at least the following two patch series:
[1] https://lore.kernel.org/xen-devel/20240511005611.83125-1-xin.wa...@amd.com/
[2] https://lore.kernel.org/xen-devel/20240424033449.168398-1-xin.wa...@amd.com/

2. For RISC-V, having the following patch series [3], mostly reviewed with only one blocker [4], would be advantageous (As far as I know, Andrew is planning to update his patch series):
[3] https://lore.kernel.org/xen-devel/cover.1713347222.git.oleksii.kuroc...@gmail.com/
[4] https://patchew.org/Xen/20240313172716.2325427-1-andrew.coop...@citrix.com/

3. For PPC, it would be beneficial to have [5] merged:
[5] https://lore.kernel.org/xen-devel/cover.1712893887.git.sanasta...@raptorengineering.com/

Extending the feature freeze deadline by one week, until May 24th, would provide additional time for merges mentioned above. This, in turn, would create space for more features and fixes for x86 and other common elements. If we agree to extend the feature freeze deadline, please feel free to outline what you would like to see in the 4.19 release. This will make it easier to track our final goals and determine if they are realistically achievable.

I'd like to open the floor for discussion on this proposal. Does it make sense, and would it be useful?

Considering how many people are blocked on me, I'd welcome a little bit longer to get the outstanding series/fixes to land.

It would be great if we can extend the deadline for a week, thank you! I will try my best to make progress on the two above-mentioned Arm series.

Kind regards,
Henry
[PATCH] drivers/xen: Improve the late XenStore init protocol
Currently, the late XenStore init protocol is only triggered properly when HVM_PARAM_STORE_PFN is ~0ULL (invalid). When the XenStore interface is allocated but not ready (the connection status is not XENSTORE_CONNECTED), Linux should also wait until XenStore is set up properly.

Introduce a macro that describes whether the XenStore interface is ready, and use it in xenbus_probe_initcall() and xenbus_probe() to select whether to run the late XenStore init protocol.

Take the opportunity to also check that the allocated XenStore interface can be mapped properly, and return an error early if memremap() fails.

Signed-off-by: Henry Wang
Signed-off-by: Michal Orzel
---
 drivers/xen/xenbus/xenbus_probe.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3205e5d724c8..8aec0ed1d047 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -72,6 +72,10 @@ EXPORT_SYMBOL_GPL(xen_store_evtchn);
 struct xenstore_domain_interface *xen_store_interface;
 EXPORT_SYMBOL_GPL(xen_store_interface);
+#define XS_INTERFACE_READY \
+        ((xen_store_interface != NULL) && \
+         (xen_store_interface->connection == XENSTORE_CONNECTED))
+
 enum xenstore_init xen_store_domain_type;
 EXPORT_SYMBOL_GPL(xen_store_domain_type);
@@ -751,9 +755,10 @@ static void xenbus_probe(void)
 {
         xenstored_ready = 1;
-        if (!xen_store_interface) {
-                xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT,
-                                               XEN_PAGE_SIZE, MEMREMAP_WB);
+        if (!xen_store_interface || XS_INTERFACE_READY) {
+                if (!xen_store_interface)
+                        xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT,
+                                                       XEN_PAGE_SIZE, MEMREMAP_WB);
                 /*
                  * Now it is safe to free the IRQ used for xenstore late
                  * initialization. No need to unbind: it is about to be
@@ -822,7 +827,7 @@ static int __init xenbus_probe_initcall(void)
         if (xen_store_domain_type == XS_PV ||
             (xen_store_domain_type == XS_HVM &&
              !xs_hvm_defer_init_for_callback() &&
-             xen_store_interface != NULL))
+             XS_INTERFACE_READY))
                 xenbus_probe();
         /*
@@ -831,7 +836,7 @@ static int __init xenbus_probe_initcall(void)
          * started, then probe. It will be triggered when communication
          * starts happening, by waiting on xb_waitq.
          */
-        if (xen_store_domain_type == XS_LOCAL || xen_store_interface == NULL) {
+        if (xen_store_domain_type == XS_LOCAL || !XS_INTERFACE_READY) {
                 struct task_struct *probe_task;
                 probe_task = kthread_run(xenbus_probe_thread, NULL,
@@ -1014,6 +1019,12 @@ static int __init xenbus_init(void)
                         xen_store_interface =
                                 memremap(xen_store_gfn << XEN_PAGE_SHIFT,
                                          XEN_PAGE_SIZE, MEMREMAP_WB);
+                        if (!xen_store_interface) {
+                                pr_err("%s: cannot map HVM_PARAM_STORE_PFN=%llx\n",
+                                       __func__, v);
+                                err = -ENOMEM;
+                                goto out_error;
+                        }
                         if (xen_store_interface->connection != XENSTORE_CONNECTED)
                                 wait = true;
                 }
-- 
2.34.1
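
The readiness rule introduced by this patch can be modelled outside the kernel. The following is only a minimal sketch: struct xs_iface_model, the enum values and both function names are hypothetical stand-ins rather than kernel definitions, and the numeric values of the connection states are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the connection states (values illustrative). */
enum xs_conn_model { XS_MODEL_CONNECTED = 0, XS_MODEL_RECONNECT = 1 };

/* Hypothetical, reduced model of the shared XenStore interface page. */
struct xs_iface_model {
    enum xs_conn_model connection;
};

/* Mirrors the XS_INTERFACE_READY idea: mapped AND reported as connected. */
static bool xs_iface_ready(const struct xs_iface_model *iface)
{
    return iface != NULL && iface->connection == XS_MODEL_CONNECTED;
}

/*
 * Mirrors the second condition in xenbus_probe_initcall(): probing is
 * deferred whenever the interface is not ready, not only when it is
 * unmapped.
 */
static bool xs_defer_probe(const struct xs_iface_model *iface)
{
    return !xs_iface_ready(iface);
}
```

With this model, an unmapped interface (NULL) and a mapped but not yet connected interface both defer the probe. Only a mapped, connected interface probes immediately, which is the behaviour change the commit message describes.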
Re: [PATCH v2 1/4] xen/arm: Alloc hypervisor reserved pages as magic pages for Dom0less DomUs
Hi Julien,

On 5/11/2024 7:03 PM, Julien Grall wrote:
Hi Henry,

On 11/05/2024 01:56, Henry Wang wrote:
+static int __init alloc_magic_pages(struct domain *d)
+{
+    struct page_info *magic_pg;
+    mfn_t mfn;
+    gfn_t gfn;
+    int rc;
+
+    d->max_pages += NR_MAGIC_PAGES;
+    magic_pg = alloc_domheap_pages(d, get_order_from_pages(NR_MAGIC_PAGES), 0);
+    if ( magic_pg == NULL )
+        return -ENOMEM;
+
+    mfn = page_to_mfn(magic_pg);
+    if ( !is_domain_direct_mapped(d) )
+        gfn = gaddr_to_gfn(GUEST_MAGIC_BASE);
+    else
+        gfn = gaddr_to_gfn(mfn_to_maddr(mfn));

Summarizing the discussion we had on Matrix. Regions like the extend area and shared memory may not be direct mapped. So unfortunately, I think it is possible that the GFN could clash with one of those.

At least in the shared memory case, the user can provide the address. But as you use the domheap allocator, the address returned could easily change if you tweak your setup.

I am not entirely sure what's the best solution. We could ask the user to provide the information for reserved region. But it feels like we are exposing a bit too much to the user.

So possibly we would want to use the same approach as extended regions. Once we processed all the mappings, find some space for the hypervisor regions.

One thing I noticed when revisiting the extended region finding code on the hypervisor side: when the domain is direct-mapped and we look for extended regions for it, we use either find_unallocated_memory() or find_memory_holes(). Both functions appear to remove the shared memory regions using the paddr parsed from the device tree, which indicates an assumption that when a domain is direct-mapped, the shared memory is also direct-mapped. I might be wrong, but otherwise I don't think the extended region finding logic would carve out the correct shared memory gpaddr range for guests.

So I think we are missing the documentation (and the corresponding check when we parse the device tree) for the above assumption about static shared memory, i.e. when the domain is direct-mapped, the static shared memory should also be direct-mapped, and the user should make sure this is satisfied in the device tree, otherwise Xen should complain.

If we add this assumption and the related checking code, I think your concern about clashing with static shared memory can be addressed. Do you agree?

Kind regards,
Henry

Any other suggestions?

Cheers,
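
To make the proposed assumption concrete, here is a rough sketch of the check being discussed. Nothing in it is existing Xen code: struct shm_bank_model, shm_bank_allowed() and ranges_overlap() are hypothetical names, and the overlap helper ignores address wrap-around for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one static shared memory bank. */
struct shm_bank_model {
    uint64_t paddr; /* host physical start */
    uint64_t gaddr; /* guest physical start */
    uint64_t size;  /* length in bytes, assumed non-zero */
};

/* A direct-mapped domain would require gaddr == paddr for every bank. */
static bool shm_bank_allowed(bool domain_direct_mapped,
                             const struct shm_bank_model *bank)
{
    assert(bank->size != 0);
    return !domain_direct_mapped || bank->gaddr == bank->paddr;
}

/* Do the half-open ranges [a, a + a_size) and [b, b + b_size) overlap? */
static bool ranges_overlap(uint64_t a, uint64_t a_size,
                           uint64_t b, uint64_t b_size)
{
    return a < b + b_size && b < a + a_size;
}
```

A device tree parser could reject a bank for which shm_bank_allowed() returns false. A check such as ranges_overlap() is one way to detect the GFN clash raised above between the magic region and another guest region.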
Re: [PATCH v2 1/4] xen/arm: Alloc hypervisor reserved pages as magic pages for Dom0less DomUs
Hi Julien,

On 5/11/2024 4:46 PM, Julien Grall wrote:
Hi Henry,

On 11/05/2024 01:56, Henry Wang wrote:

There are use cases (for example using the PV driver) in Dom0less setup that require Dom0less DomUs start immediately with Dom0, but initialize XenStore later after Dom0's successful boot and call to the init-dom0less application.

An error message can be seen from the init-dom0less application on 1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

The "magic page" is a terminology used in the toolstack as reserved pages for the VM to have access to virtual platform capabilities. Currently the magic pages for Dom0less DomUs are populated by the init-dom0less app through populate_physmap(), and populate_physmap() automatically assumes gfn == mfn for 1:1 direct mapped domains. This cannot be true for the magic pages that are allocated later from the init-dom0less application executed in Dom0. For domain using statically allocated memory but not 1:1 direct-mapped, similar error "failed to retrieve a reserved page" can be seen as the reserved memory list is empty at that time.

To solve above issue, this commit allocates hypervisor reserved pages (currently used as the magic pages) for Arm Dom0less DomUs at the domain construction time. The base address/PFN of the region will be noted and communicated to the init-dom0less application in Dom0.

Reported-by: Alec Kwapis
Suggested-by: Daniel P. Smith
Signed-off-by: Henry Wang
---
v2:
- Reword the commit msg to explain what is "magic page" and use generic terminology "hypervisor reserved pages" in commit msg. (Daniel)
- Also move the offset definition of magic pages. (Michal)
- Extract the magic page allocation logic to a function. (Michal)
---
 tools/libs/guest/xg_dom_arm.c |  6 ------
 xen/arch/arm/dom0less-build.c | 32 ++++++++++++++++++++++++++++++++
 xen/include/public/arch-arm.h |  6 ++++++
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/tools/libs/guest/xg_dom_arm.c b/tools/libs/guest/xg_dom_arm.c
index 2fd8ee7ad4..8c579d7576 100644
--- a/tools/libs/guest/xg_dom_arm.c
+++ b/tools/libs/guest/xg_dom_arm.c
@@ -25,12 +25,6 @@
 #include "xg_private.h"
-#define NR_MAGIC_PAGES 4
-#define CONSOLE_PFN_OFFSET 0
-#define XENSTORE_PFN_OFFSET 1
-#define MEMACCESS_PFN_OFFSET 2
-#define VUART_PFN_OFFSET 3
-
 #define LPAE_SHIFT 9
 #define PFN_4K_SHIFT (0)
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..4b96ddd9ce 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -739,6 +739,34 @@ static int __init alloc_xenstore_evtchn(struct domain *d)
     return 0;
 }
+static int __init alloc_magic_pages(struct domain *d)
+{
+    struct page_info *magic_pg;
+    mfn_t mfn;
+    gfn_t gfn;
+    int rc;
+
+    d->max_pages += NR_MAGIC_PAGES;

Here you bump d->max_pages by NR_MAGIC_PAGES but...

+    magic_pg = alloc_domheap_pages(d, get_order_from_pages(NR_MAGIC_PAGES), 0);

... here you will allocate using a power-of-two. Which may end up to fail as there is nothing guaranteeing that NR_MAGIC_PAGES is suitably aligned.

For now NR_MAGIC_PAGES seems suitably aligned, so a BUILD_BUG_ON() would be ok.

Great catch! I will add BUILD_BUG_ON(NR_MAGIC_PAGES & (NR_MAGIC_PAGES - 1)); Thanks.

+    if ( magic_pg == NULL )
+        return -ENOMEM;
+
+    mfn = page_to_mfn(magic_pg);
+    if ( !is_domain_direct_mapped(d) )
+        gfn = gaddr_to_gfn(GUEST_MAGIC_BASE);
+    else
+        gfn = gaddr_to_gfn(mfn_to_maddr(mfn));

Allocating the magic pages contiguously is only necessary for direct mapped domain. For the other it might be preferable to allocate page by page. That said, NR_MAGIC_PAGES is not big enough. So it would be okay.

+
+    rc = guest_physmap_add_pages(d, gfn, mfn, NR_MAGIC_PAGES);
+    if ( rc )
+    {
+        free_domheap_pages(magic_pg, get_order_from_pages(NR_MAGIC_PAGES));
+        return rc;
+    }
+
+    return 0;
+}
+
 static int __init construct_domU(struct domain *d,
                                  const struct dt_device_node *node)
 {
@@ -840,6 +868,10 @@ static int __init construct_domU(struct domain *d,
         if ( rc < 0 )
             return rc;
         d->arch.hvm.params[HVM_PARAM_STORE_PFN] = ~0ULL;
+
+        rc = alloc_magic_pages(d);
+        if ( rc < 0 )
+            return rc;

This will only be allocated if xenstore is enabled. But I don't think some of the magic pages really require xenstore to work. In the future we may need some more fine-grained choice (see my comment in patch #2 as well).

Sorry, at the time of writing this reply I have not yet received the email with your comment on patch #2. I will reply to both together when I see it.

     }
     return rc;
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm
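
The power-of-two concern raised in this review can be illustrated with a small standalone sketch. The names NR_MAGIC_PAGES_MODEL, order_from_pages_model() and pages_for_order() are hypothetical; the order helper is a simplified model of "smallest order covering nr_pages", not Xen's get_order_from_pages(), and _Static_assert stands in for the proposed BUILD_BUG_ON().

```c
#include <assert.h>

#define NR_MAGIC_PAGES_MODEL 4UL

/* Compile-time check in the spirit of the proposed BUILD_BUG_ON(). */
_Static_assert(NR_MAGIC_PAGES_MODEL != 0 &&
               (NR_MAGIC_PAGES_MODEL & (NR_MAGIC_PAGES_MODEL - 1)) == 0,
               "magic page count must be a power of two");

/* Smallest order such that (1 << order) >= nr_pages. */
static unsigned int order_from_pages_model(unsigned long nr_pages)
{
    unsigned int order = 0;

    assert(nr_pages != 0);
    while ((1UL << order) < nr_pages)
        order++;
    return order;
}

/* Number of pages an order-based allocation actually hands out. */
static unsigned long pages_for_order(unsigned int order)
{
    return 1UL << order;
}
```

For 4 pages the order-based allocation returns exactly 4 pages. For a count such as 5 it would return 8, which is more than the amount added to d->max_pages, and that is the mismatch the compile-time check is meant to rule out.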
Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM
Hi Julien,

On 5/11/2024 4:22 PM, Julien Grall wrote:
Hi Henry,

On 11/05/2024 08:29, Henry Wang wrote:
+    /*
+     * Handle the LR where the physical interrupt is de-assigned from the
+     * guest before it was EOIed
+     */
+    struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq);

This will return a vCPU from the current affinity. This may not be where the interrupt was injected. From a brief look, I can't tell whether we have an easy way to know where the interrupt was injected (other than the pending_irq is in the list lr_queue/inflight)

I doubt if we need to handle more than this - I think if the pending_irq is not in the lr_queue/inflight list, it would not belong to the corner case we are talking about (?).

I didn't suggest we would need to handle the case where the pending_irq is not any of the queues. I was pointing out that I think we don't directly store the vCPU ID where we injected the IRQ. Instead, the pending_irq is just in list, so we will possibly need to store the vCPU ID for convenience.

Sorry for the misunderstanding. Yes, you are definitely correct. Also, thank you so much for the suggestion! Before seeing it, I was struggling to find the correct vCPU with "for_each_vcpu" and a comparison... but now I realize your suggestion is much cleaner :)

Kind regards,
Henry
Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM
Hi Julien,

On 5/10/2024 4:54 PM, Julien Grall wrote:
Hi,

On 09/05/2024 16:31, Henry Wang wrote:
On 5/9/2024 4:46 AM, Julien Grall wrote:
Hi Henry,

[...]

```
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index a775f886ed..d3f9cd2299 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -135,16 +135,6 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
     ASSERT(virq < vgic_num_irqs(d));
     ASSERT(!is_lpi(virq));
-    /*
-     * When routing an IRQ to guest, the virtual state is not synced
-     * back to the physical IRQ. To prevent get unsync, restrict the
-     * routing to when the Domain is been created.
-     */
-#ifndef CONFIG_OVERLAY_DTB
-    if ( d->creation_finished )
-        return -EBUSY;
-#endif
-
     ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);

This is checking if the interrupt is already enabled. Do we also need to check for active/pending?

Thank you for raising this! I assume you meant this?

@@ -444,7 +444,9 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     {
         /* The VIRQ should not be already enabled by the guest */
         if ( !p->desc &&
-             !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+             !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
+             !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) &&
+             !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
             p->desc = desc;
         else
             ret = -EBUSY;

I think adding the active/pending check at the time of routing the IRQ makes sense, so I will add it (for both the old and the new vGIC implementation).

     if ( ret )
         return ret;
@@ -169,20 +159,40 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
     ASSERT(test_bit(_IRQ_GUEST, &desc->status));
     ASSERT(!is_lpi(virq));
-    /*
-     * Removing an interrupt while the domain is running may have
-     * undesirable effect on the vGIC emulation.
-     */
-#ifndef CONFIG_OVERLAY_DTB
-    if ( !d->is_dying )
-        return -EBUSY;
-#endif
-
     desc->handler->shutdown(desc);
     /* EOI the IRQ if it has not been done by the guest */
     if ( test_bit(_IRQ_INPROGRESS, &desc->status) )
+    {

I assume this is just a PoC state, but I just want to point out that this will not work with the new vGIC (some of the functions doesn't exist there).

Thank you. Yes, for now we can discuss this based on the old vGIC implementation. After we reach a final conclusion I will make the changes for both the old and the new vGIC.

+        /*
+         * Handle the LR where the physical interrupt is de-assigned from the
+         * guest before it was EOIed
+         */
+        struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq);

This will return a vCPU from the current affinity. This may not be where the interrupt was injected. From a brief look, I can't tell whether we have an easy way to know where the interrupt was injected (other than the pending_irq is in the list lr_queue/inflight)

I doubt if we need to handle more than this - I think if the pending_irq is not in the lr_queue/inflight list, it would not belong to the corner case we are talking about (?).

+        }
+        spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
+
+        vgic_lock_rank(v_target, rank, flags);
+        vgic_disable_irqs(v_target, (~rank->ienable) & rank->ienable, rank->index);
+        vgic_unlock_rank(v_target, rank, flags);

Why do you need to call vgic_disable_irqs()?

I will drop this part.

Kind regards,
Henry
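
The routing condition agreed on in this thread (refuse to connect a hardware IRQ when the virtual IRQ is already connected, enabled, active or visible to the guest) can be written as a small pure function. This is only a sketch: the VIRQ_MODEL_* bits and virq_route_check() are hypothetical names, and a plain bitmask replaces the test_bit() calls on the real status field.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical status bits standing in for the GIC_IRQ_GUEST_* flags. */
enum {
    VIRQ_MODEL_ENABLED = 1u << 0,
    VIRQ_MODEL_ACTIVE  = 1u << 1,
    VIRQ_MODEL_VISIBLE = 1u << 2, /* visible to the guest */
};

/*
 * Returns 0 if a physical IRQ may be connected to the virtual IRQ,
 * -EBUSY otherwise.
 */
static int virq_route_check(bool already_connected, unsigned int status)
{
    const unsigned int busy =
        VIRQ_MODEL_ENABLED | VIRQ_MODEL_ACTIVE | VIRQ_MODEL_VISIBLE;

    if (already_connected || (status & busy))
        return -EBUSY;
    return 0;
}
```

Only an idle, unconnected virtual IRQ passes the check. Any one of the three status bits is enough to reject the request.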
[PATCH v2 0/4] Guest magic region allocation for 1:1 Dom0less domUs - Take two
Hi all,

This series tries to fix the reported guest magic region allocation issue for 1:1 Dom0less domUs. An error message can be seen from the init-dom0less application on 1:1 direct-mapped Dom0less DomUs:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

This is because populate_physmap() automatically assumes gfn == mfn for direct mapped domains. This cannot be true for the magic pages that are allocated later for 1:1 Dom0less DomUs from the init-dom0less helper application executed in Dom0. For domains using statically allocated memory but not 1:1 direct-mapped, a similar error, "failed to retrieve a reserved page", can be seen, as the reserved memory list is empty at that time.

In [1] I tried to fix this issue with the domctl approach, and the discussions in [2] and [3] indicate that a domctl is not really necessary, as we can simplify the issue to "allocate the Dom0less guest magic regions at the Dom0less DomU build time and pass the region base PFN to init-dom0less application". Therefore, the first patch in this series allocates magic pages for Dom0less DomUs, the second patch stores the allocated region base PFN in HVMOP params like HVM_PARAM_CALLBACK_IRQ, and the third patch uses the HVMOP to get the stored guest magic region base PFN to avoid hardcoding GUEST_MAGIC_BASE. The last patch updates the documentation.

The Gitlab CI results for this series can be found in [4].
[1] https://lore.kernel.org/xen-devel/20240409045357.236802-1-xin.wa...@amd.com/
[2] https://lore.kernel.org/xen-devel/c7857223-eab8-409a-b618-6ec70f616...@apertussolutions.com/
[3] https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2404251508470.3940@ubuntu-linux-20-04-desktop/
[4] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1285727622

Henry Wang (4):
  xen/arm: Alloc hypervisor reserved pages as magic pages for Dom0less DomUs
  xen/arm: Add new HVM_PARAM_HV_RSRV_{BASE_PFN,SIZE} keys in HVMOP
  tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE
  docs/features/dom0less: Update the late XenStore init protocol

 docs/features/dom0less.pandoc   |  8
 tools/helpers/init-dom0less.c   | 40
 tools/libs/guest/xg_dom_arm.c   |  6
 xen/arch/arm/dom0less-build.c   | 35
 xen/arch/arm/hvm.c              |  2
 xen/include/public/arch-arm.h   |  6
 xen/include/public/hvm/params.h | 11
 7 files changed, 75 insertions(+), 33 deletions(-)

-- 
2.34.1
[PATCH v2 4/4] docs/features/dom0less: Update the late XenStore init protocol
With the new allocation strategy of Dom0less DomUs magic page region, update the documentation of the late XenStore init protocol accordingly.

Signed-off-by: Henry Wang
---
v2:
- New patch.
---
 docs/features/dom0less.pandoc | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/features/dom0less.pandoc b/docs/features/dom0less.pandoc
index 725afa0558..137e6b618b 100644
--- a/docs/features/dom0less.pandoc
+++ b/docs/features/dom0less.pandoc
@@ -110,8 +110,9 @@ hotplug PV drivers to dom0less guests. E.g. xl network-attach domU.
 The implementation works as follows:
 - Xen allocates the xenstore event channel for each dom0less domU that
   has the "xen,enhanced" property, and sets HVM_PARAM_STORE_EVTCHN
-- Xen does *not* allocate the xenstore page and sets HVM_PARAM_STORE_PFN
-  to ~0ULL (invalid)
+- Xen allocates the hypervisor reserved pages region (the xenstore page
+  is part of it) and sets HVM_PARAM_HV_RSRV_{BASE_PFN,SIZE} accordingly.
+  Xen sets HVM_PARAM_STORE_PFN to ~0ULL (invalid).
 - Dom0less domU kernels check that HVM_PARAM_STORE_PFN is set to invalid
     - Old kernels will continue without xenstore support (Note: some old
       buggy kernels might crash because they don't check the validity of
@@ -121,7 +122,8 @@ The implementation works as follows:
       channel (HVM_PARAM_STORE_EVTCHN) before continuing with the
       initialization
 - Once dom0 is booted, init-dom0less is executed:
-    - it allocates the xenstore shared page and sets HVM_PARAM_STORE_PFN
+    - it gets the xenstore shared page from HVM_PARAM_HV_RSRV_BASE_PFN
+      and sets HVM_PARAM_STORE_PFN
     - it calls xs_introduce_domain
 - Xenstored notices the new domain, initializes interfaces as usual, and
   sends an event channel notification to the domain using the xenstore
-- 
2.34.1
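
The guest-side decision in this protocol, whether to keep waiting on the event channel or go on to map the page, depends on whether HVM_PARAM_STORE_PFN holds a usable value. Below is a minimal sketch of that decision, assuming that 0 means "never set" and that all-ones is the invalid marker described above. The enum and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of handling one xenstore event notification. */
enum guest_xs_action_model { XS_ACT_KEEP_WAITING, XS_ACT_MAP_AND_PROBE };

/* A PFN is usable if it is neither 0 nor the all-ones invalid marker. */
static bool store_pfn_usable(uint64_t v)
{
    return v != 0 && ~v != 0;
}

/* err is the result of reading the parameter; non-zero means failure. */
static enum guest_xs_action_model on_xenstore_event(int err, uint64_t store_pfn)
{
    if (err || !store_pfn_usable(store_pfn))
        return XS_ACT_KEEP_WAITING;
    return XS_ACT_MAP_AND_PROBE;
}
```

Until init-dom0less replaces ~0ULL with a real PFN, every notification leaves the guest waiting. After that, the next notification lets it map the page and continue the initialization.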
[PATCH v2 2/4] xen/arm: Add new HVM_PARAM_HV_RSRV_{BASE_PFN,SIZE} keys in HVMOP
For use cases such as Dom0less PV drivers, a mechanism to communicate Dom0less DomU's static data with the runtime control plane (Dom0) is needed. Since on Arm HVMOP is already the existing approach to address such use cases (for example the allocation of HVM_PARAM_CALLBACK_IRQ), add new HVMOP keys HVM_PARAM_HV_RSRV_{BASE_PFN,SIZE} for storing the hypervisor reserved pages region base PFN and size.

Currently, the hypervisor reserved pages region is used as the Arm Dom0less DomU guest magic pages region. Therefore protect the HVMOP keys with "#if defined(__arm__) || defined(__aarch64__)". The values will be set at Dom0less DomU construction time after Dom0less DomU's magic pages region has been allocated.

Reported-by: Alec Kwapis
Signed-off-by: Henry Wang
---
v2:
- Rename the HVMOP keys to HVM_PARAM_HV_RSRV_{BASE_PFN,SIZE}. (Daniel)
- Add comment on top of HVM_PARAM_HV_RSRV_{BASE_PFN,SIZE} to describe its usage. Protect them with #ifdef. (Daniel, Jan)
---
 xen/arch/arm/dom0less-build.c   |  3 +++
 xen/arch/arm/hvm.c              |  2 ++
 xen/include/public/hvm/params.h | 11 ++++++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 4b96ddd9ce..5bb53ebb47 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -764,6 +764,9 @@ static int __init alloc_magic_pages(struct domain *d)
         return rc;
     }
+    d->arch.hvm.params[HVM_PARAM_HV_RSRV_BASE_PFN] = gfn_x(gfn);
+    d->arch.hvm.params[HVM_PARAM_HV_RSRV_SIZE] = NR_MAGIC_PAGES;
+
     return 0;
 }
diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
index 0989309fea..949d804f8b 100644
--- a/xen/arch/arm/hvm.c
+++ b/xen/arch/arm/hvm.c
@@ -55,6 +55,8 @@ static int hvm_allow_get_param(const struct domain *d, unsigned int param)
     case HVM_PARAM_STORE_EVTCHN:
     case HVM_PARAM_CONSOLE_PFN:
     case HVM_PARAM_CONSOLE_EVTCHN:
+    case HVM_PARAM_HV_RSRV_BASE_PFN:
+    case HVM_PARAM_HV_RSRV_SIZE:
         return 0;
     /*
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index a22b4ed45d..337f5b0bf8 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -296,6 +296,15 @@
 #define XEN_HVM_MCA_CAP_LMCE (xen_mk_ullong(1) << 0)
 #define XEN_HVM_MCA_CAP_MASK XEN_HVM_MCA_CAP_LMCE
-#define HVM_NR_PARAMS 39
+/*
+ * Base PFN and number of pages of the hypervisor reserved pages region.
+ * Currently only used on Arm for Dom0less DomUs as guest magic pages region.
+ */
+#if defined(__arm__) || defined(__aarch64__)
+#define HVM_PARAM_HV_RSRV_BASE_PFN 39
+#define HVM_PARAM_HV_RSRV_SIZE 40
+#endif
+
+#define HVM_NR_PARAMS 41
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
-- 
2.34.1
[PATCH v2 3/4] tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE
Currently the GUEST_MAGIC_BASE in the init-dom0less application is hardcoded, which leads to failures for 1:1 direct-mapped Dom0less DomUs.

Since the guest magic region is now allocated by the hypervisor, stop hardcoding the guest magic pages region and use xc_hvm_param_get() to get the guest magic region PFN instead. The XenStore PFN can then be calculated from it. Also, we don't need to set the max mem anymore, so drop the call to xc_domain_setmaxmem(). Rename alloc_xs_page() to get_xs_page() to reflect the changes.

Take the opportunity to do some coding style improvements when possible.

Reported-by: Alec Kwapis
Signed-off-by: Henry Wang
---
v2:
- Update HVMOP keys name.
---
 tools/helpers/init-dom0less.c | 40
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/tools/helpers/init-dom0less.c b/tools/helpers/init-dom0less.c
index fee93459c4..04039a2a66 100644
--- a/tools/helpers/init-dom0less.c
+++ b/tools/helpers/init-dom0less.c
@@ -19,24 +19,20 @@
 #define XENSTORE_PFN_OFFSET 1
 #define STR_MAX_LENGTH 128
-static int alloc_xs_page(struct xc_interface_core *xch,
-                         libxl_dominfo *info,
-                         uint64_t *xenstore_pfn)
+static int get_xs_page(struct xc_interface_core *xch, libxl_dominfo *info,
+                       uint64_t *xenstore_pfn)
 {
     int rc;
-    const xen_pfn_t base = GUEST_MAGIC_BASE >> XC_PAGE_SHIFT;
-    xen_pfn_t p2m = (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET;
+    xen_pfn_t magic_base_pfn;
-    rc = xc_domain_setmaxmem(xch, info->domid,
-                             info->max_memkb + (XC_PAGE_SIZE/1024));
-    if (rc < 0)
-        return rc;
-
-    rc = xc_domain_populate_physmap_exact(xch, info->domid, 1, 0, 0, &p2m);
-    if (rc < 0)
-        return rc;
+    rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_HV_RSRV_BASE_PFN,
+                          &magic_base_pfn);
+    if (rc < 0) {
+        printf("Failed to get HVM_PARAM_HV_RSRV_BASE_PFN\n");
+        return 1;
+    }
-    *xenstore_pfn = base + XENSTORE_PFN_OFFSET;
+    *xenstore_pfn = magic_base_pfn + XENSTORE_PFN_OFFSET;
     rc = xc_clear_domain_page(xch, info->domid, *xenstore_pfn);
     if (rc < 0)
         return rc;
@@ -100,6 +96,7 @@ static bool do_xs_write_vm(struct xs_handle *xsh, xs_transaction_t t,
  */
 static int create_xenstore(struct xs_handle *xsh,
                            libxl_dominfo *info, libxl_uuid uuid,
+                           xen_pfn_t xenstore_pfn,
                            evtchn_port_t xenstore_port)
 {
     domid_t domid;
@@ -145,8 +142,7 @@ static int create_xenstore(struct xs_handle *xsh,
     rc = snprintf(target_memkb_str, STR_MAX_LENGTH, "%"PRIu64, info->current_memkb);
     if (rc < 0 || rc >= STR_MAX_LENGTH)
         return rc;
-    rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%lld",
-                  (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET);
+    rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%"PRIu_xen_pfn, xenstore_pfn);
     if (rc < 0 || rc >= STR_MAX_LENGTH)
         return rc;
     rc = snprintf(xenstore_port_str, STR_MAX_LENGTH, "%u", xenstore_port);
@@ -245,9 +241,9 @@ static int init_domain(struct xs_handle *xsh,
     if (!xenstore_evtchn)
         return 0;
-    /* Alloc xenstore page */
-    if (alloc_xs_page(xch, info, &xenstore_pfn) != 0) {
-        printf("Error on alloc magic pages\n");
+    /* Get xenstore page */
+    if (get_xs_page(xch, info, &xenstore_pfn) != 0) {
+        printf("Error on getting xenstore page\n");
         return 1;
     }
@@ -278,13 +274,11 @@ static int init_domain(struct xs_handle *xsh,
     if (rc < 0)
         return rc;
-    rc = create_xenstore(xsh, info, uuid, xenstore_evtchn);
+    rc = create_xenstore(xsh, info, uuid, xenstore_pfn, xenstore_evtchn);
     if (rc)
         err(1, "writing to xenstore");
-    rc = xs_introduce_domain(xsh, info->domid,
-            (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET,
-            xenstore_evtchn);
+    rc = xs_introduce_domain(xsh, info->domid, xenstore_pfn, xenstore_evtchn);
     if (!rc)
         err(1, "xs_introduce_domain");
     return 0;
-- 
2.34.1
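
The difference between the old hardcoded computation and the new one can be shown without libxc. In the sketch below, MAGIC_BASE_MODEL and PAGE_SHIFT_MODEL are assumptions: a 4 KiB page size and a base chosen so that the frame number matches the 0x39000 seen in the error message. Both function names are hypothetical.

```c
#include <stdint.h>

#define XENSTORE_PFN_OFFSET 1u
#define PAGE_SHIFT_MODEL    12            /* assumption: 4 KiB pages */
#define MAGIC_BASE_MODEL    0x39000000ULL /* assumption: fixed guest base */

/* Old behaviour: the PFN is derived from a compile-time constant. */
static uint64_t legacy_xenstore_pfn(void)
{
    return (MAGIC_BASE_MODEL >> PAGE_SHIFT_MODEL) + XENSTORE_PFN_OFFSET;
}

/* New behaviour: the PFN is derived from the base reported by Xen. */
static uint64_t xenstore_pfn_from_base(uint64_t magic_base_pfn)
{
    return magic_base_pfn + XENSTORE_PFN_OFFSET;
}
```

For a domain that is not direct-mapped the two agree, since Xen reports the fixed base. For a 1:1 direct-mapped domain the reported base is wherever the pages were allocated, and only the second function yields a frame that belongs to the domain.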
[PATCH v2 1/4] xen/arm: Alloc hypervisor reserved pages as magic pages for Dom0less DomUs
There are use cases in a Dom0less setup (for example using PV drivers) that require Dom0less DomUs to start immediately with Dom0, but to initialize XenStore later, after Dom0 has booted successfully and the init-dom0less application has been called.

An error message can be seen from the init-dom0less application on 1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

"Magic page" is a term used in the toolstack for reserved pages that give the VM access to virtual platform capabilities. Currently the magic pages for Dom0less DomUs are populated by the init-dom0less app through populate_physmap(), and populate_physmap() automatically assumes gfn == mfn for 1:1 direct mapped domains. This cannot be true for the magic pages that are allocated later from the init-dom0less application executed in Dom0. For domains using statically allocated memory but not 1:1 direct-mapped, a similar error, "failed to retrieve a reserved page", can be seen, as the reserved memory list is empty at that time.

To solve the above issue, this commit allocates hypervisor reserved pages (currently used as the magic pages) for Arm Dom0less DomUs at domain construction time. The base address/PFN of the region will be noted and communicated to the init-dom0less application in Dom0.

Reported-by: Alec Kwapis
Suggested-by: Daniel P. Smith
Signed-off-by: Henry Wang
---
v2:
- Reword the commit msg to explain what is "magic page" and use generic terminology "hypervisor reserved pages" in commit msg. (Daniel)
- Also move the offset definition of magic pages. (Michal)
- Extract the magic page allocation logic to a function. (Michal)
---
 tools/libs/guest/xg_dom_arm.c |  6 ------
 xen/arch/arm/dom0less-build.c | 32 ++++++++++++++++++++++++++++++++
 xen/include/public/arch-arm.h |  6 ++++++
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/tools/libs/guest/xg_dom_arm.c b/tools/libs/guest/xg_dom_arm.c
index 2fd8ee7ad4..8c579d7576 100644
--- a/tools/libs/guest/xg_dom_arm.c
+++ b/tools/libs/guest/xg_dom_arm.c
@@ -25,12 +25,6 @@
 #include "xg_private.h"
-#define NR_MAGIC_PAGES 4
-#define CONSOLE_PFN_OFFSET 0
-#define XENSTORE_PFN_OFFSET 1
-#define MEMACCESS_PFN_OFFSET 2
-#define VUART_PFN_OFFSET 3
-
 #define LPAE_SHIFT 9
 #define PFN_4K_SHIFT (0)
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..4b96ddd9ce 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -739,6 +739,34 @@ static int __init alloc_xenstore_evtchn(struct domain *d)
     return 0;
 }
+static int __init alloc_magic_pages(struct domain *d)
+{
+    struct page_info *magic_pg;
+    mfn_t mfn;
+    gfn_t gfn;
+    int rc;
+
+    d->max_pages += NR_MAGIC_PAGES;
+    magic_pg = alloc_domheap_pages(d, get_order_from_pages(NR_MAGIC_PAGES), 0);
+    if ( magic_pg == NULL )
+        return -ENOMEM;
+
+    mfn = page_to_mfn(magic_pg);
+    if ( !is_domain_direct_mapped(d) )
+        gfn = gaddr_to_gfn(GUEST_MAGIC_BASE);
+    else
+        gfn = gaddr_to_gfn(mfn_to_maddr(mfn));
+
+    rc = guest_physmap_add_pages(d, gfn, mfn, NR_MAGIC_PAGES);
+    if ( rc )
+    {
+        free_domheap_pages(magic_pg, get_order_from_pages(NR_MAGIC_PAGES));
+        return rc;
+    }
+
+    return 0;
+}
+
 static int __init construct_domU(struct domain *d,
                                  const struct dt_device_node *node)
 {
@@ -840,6 +868,10 @@ static int __init construct_domU(struct domain *d,
         if ( rc < 0 )
             return rc;
         d->arch.hvm.params[HVM_PARAM_STORE_PFN] = ~0ULL;
+
+        rc = alloc_magic_pages(d);
+        if ( rc < 0 )
+            return rc;
     }
     return rc;
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 289af81bd6..186520d01f 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -476,6 +476,12 @@ typedef uint64_t xen_callback_t;
 #define GUEST_MAGIC_BASE xen_mk_ullong(0x39000000)
 #define GUEST_MAGIC_SIZE xen_mk_ullong(0x01000000)
+#define NR_MAGIC_PAGES 4
+#define CONSOLE_PFN_OFFSET 0
+#define XENSTORE_PFN_OFFSET 1
+#define MEMACCESS_PFN_OFFSET 2
+#define VUART_PFN_OFFSET 3
+
 #define GUEST_RAM_BANKS 2
 /*
-- 
2.34.1
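
The GFN selection in alloc_magic_pages() and the page offsets moved into arch-arm.h can be summarised in a short model. This is a sketch under assumptions: frame numbers are plain integers instead of Xen's typed mfn_t/gfn_t, and enum magic_page_model, magic_region_base_gfn() and magic_page_gfn() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets of the individual pages inside the region, as in the patch. */
enum magic_page_model {
    MAGIC_CONSOLE   = 0,
    MAGIC_XENSTORE  = 1,
    MAGIC_MEMACCESS = 2,
    MAGIC_VUART     = 3,
    MAGIC_NR        = 4,
};

/*
 * Direct-mapped domains use gfn == mfn; all others use the fixed guest
 * base frame.
 */
static uint64_t magic_region_base_gfn(bool direct_mapped, uint64_t mfn,
                                      uint64_t fixed_base_gfn)
{
    return direct_mapped ? mfn : fixed_base_gfn;
}

/* Frame of one specific magic page inside the region. */
static uint64_t magic_page_gfn(uint64_t base_gfn, enum magic_page_model which)
{
    assert(which < MAGIC_NR);
    return base_gfn + which;
}
```

Because the region is contiguous, a single base frame is enough to locate each of the four pages, which is why only the base PFN needs to be communicated to the init-dom0less application.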
Re: [PATCH 1/3] xen/arm/dom0less-build: Alloc magic pages for Dom0less DomUs from hypervisor
Hi Michal,

Thanks very much for taking a look!

On 5/10/2024 3:37 PM, Michal Orzel wrote:
> Hi Henry,
>
> On 26/04/2024 05:14, Henry Wang wrote:
>> There are use cases (for example using the PV driver) in Dom0less setup that require Dom0less DomUs start immediately with Dom0, but initialize XenStore later after Dom0's successful boot and call to the init-dom0less application.
>>
>> An error message can be seen from the init-dom0less application on 1:1 direct-mapped domains:
>> ```
>> Allocating magic pages
>> memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
>> Error on alloc magic pages
>> ```
>>
>> This is because currently the magic pages for Dom0less DomUs are populated by the init-dom0less app through populate_physmap(), and populate_physmap() automatically assumes gfn == mfn for 1:1 direct mapped domains. This cannot be true for the magic pages that are allocated later from the init-dom0less application executed in Dom0. For domains using statically allocated memory but not 1:1 direct-mapped, a similar error "failed to retrieve a reserved page" can be seen as the reserved memory list is empty at that time.
>>
>> To solve the above issue, this commit allocates the magic pages for Dom0less DomUs at the domain construction time. The base address/PFN of the magic page region will be noted and communicated to the init-dom0less application in Dom0.
>>
>> Reported-by: Alec Kwapis
>> Suggested-by: Daniel P. Smith
>> Signed-off-by: Henry Wang
>> ---
>>  tools/libs/guest/xg_dom_arm.c |  1 -
>>  xen/arch/arm/dom0less-build.c | 22 ++
>>  xen/include/public/arch-arm.h |  1 +
>>  3 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libs/guest/xg_dom_arm.c b/tools/libs/guest/xg_dom_arm.c
>> index 2fd8ee7ad4..8cc7f27dbb 100644
>> --- a/tools/libs/guest/xg_dom_arm.c
>> +++ b/tools/libs/guest/xg_dom_arm.c
>> @@ -25,7 +25,6 @@
>>  #include "xg_private.h"
>>
>> -#define NR_MAGIC_PAGES 4
> Moving only this macro to arch-arm.h while leaving the offsets does not make much sense to me. I think they all should be moved. This would also allow init-dom0less.h not to re-define XENSTORE_PFN_OFFSET.

Sounds good. Will do in v2.

>>  #define CONSOLE_PFN_OFFSET 0
>>  #define XENSTORE_PFN_OFFSET 1
>>  #define MEMACCESS_PFN_OFFSET 2
>> diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
>> index fb63ec6fd1..40dc85c759 100644
>> --- a/xen/arch/arm/dom0less-build.c
>> +++ b/xen/arch/arm/dom0less-build.c
>> @@ -834,11 +834,33 @@ static int __init construct_domU(struct domain *d,
>>      if ( kinfo.dom0less_feature & DOM0LESS_XENSTORE )
>>      {
>> +        struct page_info *magic_pg;
>> +        mfn_t mfn;
>> +        gfn_t gfn;
>> +
>>          ASSERT(hardware_domain);
>>          rc = alloc_xenstore_evtchn(d);
>>          if ( rc < 0 )
>>              return rc;
>>          d->arch.hvm.params[HVM_PARAM_STORE_PFN] = ~0ULL;
>> +
>> +        d->max_pages += NR_MAGIC_PAGES;
>> +        magic_pg = alloc_domheap_pages(d, get_order_from_pages(NR_MAGIC_PAGES), 0);
> 80 char exceeded

Ooops, I am sorry. Will fix in v2.

>> +        if ( magic_pg == NULL )
>> +            return -ENOMEM;
>> +
>> +        mfn = page_to_mfn(magic_pg);
>> +        if ( !is_domain_direct_mapped(d) )
>> +            gfn = gaddr_to_gfn(GUEST_MAGIC_BASE);
>> +        else
>> +            gfn = gaddr_to_gfn(mfn_to_maddr(mfn));
>> +
>> +        rc = guest_physmap_add_pages(d, gfn, mfn, NR_MAGIC_PAGES);
>> +        if ( rc )
>> +        {
>> +            free_domheap_pages(magic_pg, get_order_from_pages(NR_MAGIC_PAGES));
>> +            return rc;
>> +        }
> Please create a function alloc_magic_pages to encapsulate the above block.

Sure. Will do.

Kind regards,
Henry
Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM
Hi Julien,

On 5/9/2024 4:46 AM, Julien Grall wrote:
> Hi Henry,
>
> [...]
>
>>>> we have 3 possible states which can be read from LR for this case: active, pending, pending and active.
>>>> - I don't think we can do anything about the active state, so we should return -EBUSY and reject the whole operation of removing the IRQ from running guest, and user can always retry this operation.
>>>
>>> This would mean a malicious/buggy guest would be able to prevent a device to be de-assigned. This is not a good idea in particular when the domain is dying.
>>>
>>> That said, I think you can handle this case. The LR has a bit to indicate whether the pIRQ needs to be EOIed. You can clear it and this would prevent the guest to touch the pIRQ. There might be other clean-up to do in the vGIC datastructure.
>>
>> I probably misunderstood this sentence, do you mean the EOI bit in the pINTID field? I think this bit is only available when the HW bit of LR is 0, but in our case the HW is supposed to be 1 (as indicated as your previous comment). Would you mind clarifying a bit more? Thanks!
>
> You are right, ICH_LR.HW will be 1 for physical IRQ routed to a guest. What I was trying to explain is this bit could be cleared (with ICH_LR.pINTID adjusted).

Thank you for all the discussions. Based on that, would the diff below make sense to you? I tested dynamic dtbo adding/removing with an ethernet device with this patch applied. Test steps are:

(1) Use xl dt-overlay to add the ethernet device to Xen device tree and assign it to dom0.
(2) Create a domU.
(3) Use xl dt-overlay to de-assign the device from dom0 and assign it to domU.
(4) Destroy the domU.

The ethernet device is functional in whichever domain it is attached to, and I don't see errors when I destroy domU. But honestly I think the case we talked about is quite unusual, so I am not sure if it was hit during my test.
```
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index a775f886ed..d3f9cd2299 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -135,16 +135,6 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
     ASSERT(virq < vgic_num_irqs(d));
     ASSERT(!is_lpi(virq));
 
-    /*
-     * When routing an IRQ to guest, the virtual state is not synced
-     * back to the physical IRQ. To prevent get unsync, restrict the
-     * routing to when the Domain is been created.
-     */
-#ifndef CONFIG_OVERLAY_DTB
-    if ( d->creation_finished )
-        return -EBUSY;
-#endif
-
     ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);
     if ( ret )
         return ret;
@@ -169,20 +159,40 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
     ASSERT(test_bit(_IRQ_GUEST, &desc->status));
     ASSERT(!is_lpi(virq));
 
-    /*
-     * Removing an interrupt while the domain is running may have
-     * undesirable effect on the vGIC emulation.
-     */
-#ifndef CONFIG_OVERLAY_DTB
-    if ( !d->is_dying )
-        return -EBUSY;
-#endif
-
     desc->handler->shutdown(desc);
 
     /* EOI the IRQ if it has not been done by the guest */
     if ( test_bit(_IRQ_INPROGRESS, &desc->status) )
+    {
+        /*
+         * Handle the LR where the physical interrupt is de-assigned from the
+         * guest before it was EOIed
+         */
+        struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq);
+        struct vgic_irq_rank *rank = vgic_rank_irq(v_target, virq);
+        struct pending_irq *p = irq_to_pending(v_target, virq);
+        unsigned long flags;
+
+        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
+        /* LR allocated for the IRQ */
+        if ( test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) &&
+             test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
+        {
+            gic_hw_ops->clear_lr(p->lr);
+            clear_bit(p->lr, &v_target->arch.lr_mask);
+
+            clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
+            clear_bit(GIC_IRQ_GUEST_ACTIVE, &p->status);
+            p->lr = GIC_INVALID_LR;
+        }
+        spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
+
+        vgic_lock_rank(v_target, rank, flags);
+        vgic_disable_irqs(v_target, (~rank->ienable) & rank->ienable, rank->index);
+        vgic_unlock_rank(v_target, rank, flags);
+
         gic_hw_ops->deactivate_irq(desc);
+    }
     clear_bit(_IRQ_INPROGRESS, &desc->status);
 
     ret = vgic_connect_hw_irq(d, NULL, virq, desc, false);
```

Kind regards,
Henry

> Cheers,
Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM
Hi Julien, On 5/8/2024 5:54 AM, Julien Grall wrote: Hi Henry, What if the DT overlay is unloaded and then reloaded? Wouldn't the same interrupt be re-used? As a more generic case, this could also be a new bitstream for the FPGA. But even if the interrupt is brand new every time for the DT overlay, you are effectively relaxing the check for every user (such as XEN_DOMCTL_bind_pt_irq). So the interrupt re-use case needs to be taken into account. I agree. I think IIUC, with your explanation here and below, could we simplify the problem to how to properly handle the removal of the IRQ from a running guest, if we always properly remove and clean up the information when remove the IRQ from the guest? In this way, the IRQ can always be viewed as a brand new one when we add it back. If we can make sure the virtual IRQ and physical IRQ is cleaned then yes. Then the only corner case that we need to take care of would be... Can you clarify whether you say the "only corner case" because you looked at the code? Or is it just because I mentioned only one? Well, I indeed checked the code and to my best knowledge the corner case that you pointed out would be the only one I can think of. Xen allows the guest to enable a vIRQ even if there is no pIRQ assigned. Thanksfully, it looks like the vgic_connect_hw_irq(), in both the current and new vGIC, will return an error if we are trying to route a pIRQ to an already enabled vIRQ. But we need to investigate all the possible scenarios to make sure that any inconsistencies between the physical state and virtual state (including the LRs) will not result to bigger problem. The one that comes to my mind is: The physical interrupt is de-assigned from the guest before it was EOIed. In this case, the interrupt will still be in the LR with the HW bit set. This would allow the guest to EOI the interrupt even if it is routed to someone else. It is unclear what would be the impact on the other guest. ...same as this case, i.e. 
test_bit(_IRQ_INPROGRESS, >status) || !test_bit(_IRQ_DISABLED, >status)) when we try to remove the IRQ from a running domain. We already call ->shutdown() which will disable the IRQ. So don't we only need to take care of _IRQ_INPROGRESS? Yes you are correct. we have 3 possible states which can be read from LR for this case : active, pending, pending and active. - I don't think we can do anything about the active state, so we should return -EBUSY and reject the whole operation of removing the IRQ from running guest, and user can always retry this operation. This would mean a malicious/buggy guest would be able to prevent a device to be de-assigned. This is not a good idea in particular when the domain is dying. That said, I think you can handle this case. The LR has a bit to indicate whether the pIRQ needs to be EOIed. You can clear it and this would prevent the guest to touch the pIRQ. There might be other clean-up to do in the vGIC datastructure. I probably misunderstood this sentence, do you mean the EOI bit in the pINTID field? I think this bit is only available when the HW bit of LR is 0, but in our case the HW is supposed to be 1 (as indicated as your previous comment). Would you mind clarifying a bit more? Thanks! Anyway, we don't have to handle removing an active IRQ when the domain is still running (although we do when the domain is destroying). But I think this would need to be solved before the feature is (security) supported. - For the pending (and active) case, Shouldn't the pending and active case handled the same way as the active case? Sorry, yes you are correct. Kind regards, Henry
Re: [PATCH 05/15] tools/libs/light: Increase nr_spi to 160
Hi Julien,

On 5/7/2024 10:35 PM, Julien Grall wrote:
> Hi,
>
> On 06/05/2024 06:17, Henry Wang wrote:
>> On 5/1/2024 9:58 PM, Anthony PERARD wrote:
>>> On Wed, Apr 24, 2024 at 11:34:39AM +0800, Henry Wang wrote:
>>>> Increase number of spi to 160 i.e. gic_number_lines() for Xilinx ZynqMP - 32. This was done to allocate and assign IRQs to a running domain.
>>>>
>>>> Signed-off-by: Vikram Garhwal
>>>> Signed-off-by: Stefano Stabellini
>>>> Signed-off-by: Henry Wang
>>>> ---
>>>>  tools/libs/light/libxl_arm.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
>>>> index dd5c9f4917..50dbd0f2a9 100644
>>>> --- a/tools/libs/light/libxl_arm.c
>>>> +++ b/tools/libs/light/libxl_arm.c
>>>> @@ -181,7 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>>>>      LOG(DEBUG, "Configure the domain");
>>>>
>>>> -    config->arch.nr_spis = nr_spis;
>>>> +    /* gic_number_lines() is 192 for Xilinx ZynqMP. min nr_spis = 192 - 32. */
>>>> +    config->arch.nr_spis = MAX(nr_spis, 160);
>>> Is there a way that Xen or libxl could find out what the minimum number of SPI needs to be?
>>
>> I am afraid currently there is none.
>>
>>> Are we going to have to increase that minimum number every time a new platform comes along?
>>>
>>> It doesn't appear that libxl is using that `nr_spis` value and it is probably just given to Xen. So my guess is that Xen could simply take care of the minimum value, gic_number_lines() seems to be a Xen function.
>>
>> Xen will take care of the value of nr_spis for dom0 in create_dom0()
>>
>>     dom0_cfg.arch.nr_spis = min(gic_number_lines(), (unsigned int) 992) - 32;
>>
>> and also for dom0less domUs in create_domUs().
>>
>> However, it looks like Xen will not take care of the minimum value for libxl guests, the value from config->arch.nr_spis in guest config file will be directly passed to the domain_vgic_init() function from arch_domain_create().
>>
>> I agree with you that we shouldn't just bump the number every time we have a new platform. Therefore, would it be a good idea to move the logic in this patch to arch_sanitise_domain_config()?
>
> Xen domains are supposed to be platform agnostic and therefore the number of SPIs should not be based on the HW.
>
> Furthermore, with your proposal we would end up allocating data structures for N SPIs when a domain may never need any SPIs (such as if passthrough is not in-use). This is more likely for domains created by the toolstack than from Xen directly.

Agreed on both comments.

> Instead, we should introduce a new XL configuration to let the user decide the number of SPIs. I would suggest to name "nr_spis" to match the DT bindings.

Sure, I will introduce a new xl config for this to replace this patch. Thank you for the suggestion.

Kind regards,
Henry

> Cheers,
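The arithmetic behind the numbers in this thread (192 lines giving 160 SPIs, and the cap of 992 used for dom0) can be written out as a small sketch. The function name `spis_from_gic_lines` and the macro names are hypothetical; only the constants 32, 160, 192 and 992 come from the messages above. The guard against fewer than 32 lines is an addition of this sketch and is not in the quoted Xen expression.

```c
/* INTIDs 0..31 are per-CPU (SGIs and PPIs), so SPIs start at INTID 32. */
#define NR_LOCAL_IRQS   32u
/* Cap quoted in the thread for dom0. */
#define MAX_GIC_LINES   992u

/*
 * Hypothetical helper mirroring
 *   min(gic_number_lines(), (unsigned int) 992) - 32
 * with an extra guard so the subtraction cannot wrap.
 */
static unsigned int spis_from_gic_lines(unsigned int gic_lines)
{
    unsigned int capped = gic_lines < MAX_GIC_LINES ? gic_lines : MAX_GIC_LINES;

    return capped > NR_LOCAL_IRQS ? capped - NR_LOCAL_IRQS : 0u;
}
```

With 192 lines, as reported for Xilinx ZynqMP, this gives the 160 hard-coded in the patch, which illustrates why the value is platform-specific and why a user-supplied nr_spis setting was preferred in review.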
[PATCH v2 0/2] Some fixes for the existing dynamic dtbo code
During the review process for the v1 of the dynamic dtbo series, some issues of the existing code were identified. Discussions of them can be found in [1] (for the first patch) and [2] (for the second patch).

Since the main part of the remaining dynamic dtbo series requires more rework, just send these fixes for now.

[1] https://lore.kernel.org/xen-devel/835099c8-6cf0-4f6d-899b-07388df89...@xen.org/
[2] https://lore.kernel.org/xen-devel/eaea1986-a27e-4d6c-932f-1d0a9918861f@perard/

Henry Wang (2):
  xen/common/dt-overlay: Fix missing lock when remove the device
  tools/xl: Correct the help information and exit code of the dt-overlay command

 tools/xl/xl_vmcontrol.c | 6 +++---
 xen/common/dt-overlay.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

-- 
2.34.1
[PATCH v2 2/2] tools/xl: Correct the help information and exit code of the dt-overlay command
Fix the name mismatch in the xl dt-overlay command, the command name should be "dt-overlay" instead of "dt_overlay". Fix the exit code of the dt-overlay command, use EXIT_FAILURE instead of ERROR_FAIL.

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support")
Suggested-by: Anthony PERARD
Signed-off-by: Henry Wang
---
v2:
- New patch
---
 tools/xl/xl_vmcontrol.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 98f6bd2e76..02575d5d36 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1278,7 +1278,7 @@ int main_dt_overlay(int argc, char **argv)
     const int overlay_remove_op = 2;
 
     if (argc < 2) {
-        help("dt_overlay");
+        help("dt-overlay");
         return EXIT_FAILURE;
     }
 
@@ -1302,11 +1302,11 @@ int main_dt_overlay(int argc, char **argv)
             fprintf(stderr, "failed to read the overlay device tree file %s\n",
                     overlay_config_file);
             free(overlay_dtb);
-            return ERROR_FAIL;
+            return EXIT_FAILURE;
         }
     } else {
         fprintf(stderr, "overlay dtbo file not provided\n");
-        return ERROR_FAIL;
+        return EXIT_FAILURE;
     }
 
     rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
-- 
2.34.1
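Why the exit-code change matters can be shown with a small sketch. It assumes that the value returned by the command handler ends up as the process exit status and that only the low 8 bits of that status reach the parent, as on POSIX systems. `ERROR_FAIL_ASSUMED` is a stand-in set to -3, which is assumed here to match libxl's ERROR_FAIL; `status_seen_by_parent` is a hypothetical helper.

```c
#include <stdlib.h>

/* Assumption: stand-in for libxl's ERROR_FAIL, a negative library error code. */
#define ERROR_FAIL_ASSUMED (-3)

/*
 * Hypothetical helper: the value a parent process would observe if
 * 'ret' were used as the exit status (low 8 bits only).
 */
static unsigned int status_seen_by_parent(int ret)
{
    /* Conversion to unsigned is well defined (modulo 2^N) for negative input. */
    return (unsigned int)ret & 0xffu;
}
```

Under these assumptions a negative library error code turns into an unrelated-looking status, whereas EXIT_FAILURE stays a small, conventional value that scripts can test for.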
[PATCH v2 1/2] xen/common/dt-overlay: Fix missing lock when remove the device
If CONFIG_DEBUG=y, below assertion will be triggered:

(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) [ Xen-4.19-unstable arm64 debug=y Not tainted ]
(XEN) CPU:0
(XEN) PC: 0a257418 iommu_remove_dt_device+0x8c/0xd4
(XEN) LR: 0a2573a0
(XEN) SP: 8000fff7fb30
(XEN) CPSR: 0249 MODE:64-bit EL2h (Hypervisor, handler)
[...]
(XEN) Xen call trace:
(XEN)    [<0a257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN)    [<0a2573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN)    [<0a20797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN)    [<0a207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN)    [<0a208460>] dt_overlay_sysctl+0x428/0xc68
(XEN)    [<0a2707f8>] arch_do_sysctl+0x1c/0x2c
(XEN)    [<0a230b40>] do_sysctl+0x96c/0x9ec
(XEN)    [<0a271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN)    [<0a273490>] do_trap_guest_sync+0x448/0x63c
(XEN)    [<0a25c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN)
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN)

This is because iommu_remove_dt_device() is called without taking the dt_host_lock. dt_host_lock is meant to ensure that the DT node will not disappear behind back. So fix the issue by taking the lock as soon as getting hold of overlay_node.

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Henry Wang
---
v2:
- Take the lock as soon as getting hold of overlay_node.
v1.1:
- Move the unlock position before the check of rc.
---
 xen/common/dt-overlay.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1b197381f6..25d15cbcb1 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -429,6 +429,8 @@ static int remove_nodes(const struct overlay_track *tracker)
         if ( overlay_node == NULL )
             return -EINVAL;
 
+        write_lock(&dt_host_lock);
+
         rc = remove_descendant_nodes_resources(overlay_node);
         if ( rc )
             return rc;
@@ -439,8 +441,6 @@ static int remove_nodes(const struct overlay_track *tracker)
 
         dt_dprintk("Removing node: %s\n", overlay_node->full_name);
 
-        write_lock(&dt_host_lock);
-
         rc = dt_overlay_remove_node(overlay_node);
         if ( rc )
         {
-- 
2.34.1
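The general pattern the fix relies on, taking the lock before the first callee that requires it and releasing it on every exit path, can be sketched outside Xen. Everything below is a toy model: `host_write_lock`, `host_write_unlock`, `release_resources`, `remove_node` and `struct node` are hypothetical names, and the "lock" is a plain flag standing in for dt_host_lock so the callee can check it, much like the rw_is_locked() assertion in the trace. It is not the code in dt-overlay.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a write lock: a flag the callee can inspect. */
static bool host_lock_held;

static void host_write_lock(void)   { host_lock_held = true; }
static void host_write_unlock(void) { host_lock_held = false; }

struct node {
    bool busy;      /* resources cannot be released right now */
    bool removed;   /* set once the node has been removed */
};

/* Callee that requires the lock, reporting an error instead of asserting. */
static int release_resources(const struct node *n)
{
    if (!host_lock_held)
        return -EPERM;

    return n->busy ? -EBUSY : 0;
}

/* Take the lock before the first callee that needs it; drop it on all paths. */
static int remove_node(struct node *n)
{
    int rc;

    if (n == NULL)
        return -EINVAL;   /* nothing locked yet, plain return is fine */

    host_write_lock();

    rc = release_resources(n);
    if (rc == 0)
        n->removed = true;

    host_write_unlock();  /* single unlock covers success and failure */

    return rc;
}
```

Funnelling both outcomes through one unlock is a design choice of this sketch; it keeps an early error from leaving the lock held.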
Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM
Hi Julien, On 5/1/2024 4:13 AM, Julien Grall wrote: Hi Henry, On 30/04/2024 04:50, Henry Wang wrote: On 4/25/2024 10:28 PM, Julien Grall wrote: Thanks for your feeedback. After checking the b8577547236f commit message I think I now understand your point. Do you have any suggestion about how can I properly add the support to route/remove the IRQ to running domains? Thanks. I spent some time going through the GIC/vGIC code and had some discussions with Stefano and Stewart during the last couple of days, let me see if I can describe the use case properly now to continue the discussion: We have some use cases that requires assigning devices to domains after domain boot time. For example, suppose there is an FPGA on the board which can simulate a device, and the bitstream for the FPGA is provided and programmed after domain boot. So we need a way to assign the device to the running domain. This series tries to implement this use case by using device tree overlay - users can firstly add the overlay to Xen dtb, assign the device in the overlay to a domain by the xl command, then apply the overlay to Linux. Thanks for the description! This helps to understand your goal :). Thank you very much for spending your time on discussing this and provide these valuable comments! I haven't really look at that code in quite a while. I think we need to make sure that the virtual and physical IRQ state matches at the time we do the routing. I am undecided on whether we want to simply prevent the action to happen or try to reset the state. There is also the question of what to do if the guest is enabling the vIRQ before it is routed. Sorry for bothering, would you mind elaborating a bit more about the two cases that you mentioned above? Commit b8577547236f ("xen/arm: Restrict when a physical IRQ can be routed/removed from/to a domain") only said there will be undesirable effects, so I am not sure if I understand the concerns raised above and the consequences of these two use cases. 
I will try to explain them below after I answer the rest. I am probably wrong, I think when we add the overlay, we are probably fine as the interrupt is not being used before. What if the DT overlay is unloaded and then reloaded? Wouldn't the same interrupt be re-used? As a more generic case, this could also be a new bitstream for the FPGA. But even if the interrupt is brand new every time for the DT overlay, you are effectively relaxing the check for every user (such as XEN_DOMCTL_bind_pt_irq). So the interrupt re-use case needs to be taken into account. I agree. I think IIUC, with your explanation here and below, could we simplify the problem to how to properly handle the removal of the IRQ from a running guest, if we always properly remove and clean up the information when remove the IRQ from the guest? In this way, the IRQ can always be viewed as a brand new one when we add it back. Then the only corner case that we need to take care of would be... Also since we only load the device driver after the IRQ is routed to the guest, This is what a well-behave guest will do. However, we need to think what will happen if a guest misbehaves. I am not concerned about a guest only impacting itself, I am more concerned about the case where the rest of the system is impacted. I am not sure the guest can enable the vIRQ before it is routed. Xen allows the guest to enable a vIRQ even if there is no pIRQ assigned. Thanksfully, it looks like the vgic_connect_hw_irq(), in both the current and new vGIC, will return an error if we are trying to route a pIRQ to an already enabled vIRQ. But we need to investigate all the possible scenarios to make sure that any inconsistencies between the physical state and virtual state (including the LRs) will not result to bigger problem. The one that comes to my mind is: The physical interrupt is de-assigned from the guest before it was EOIed. In this case, the interrupt will still be in the LR with the HW bit set. 
This would allow the guest to EOI the interrupt even if it is routed to someone else. It is unclear what would be the impact on the other guest. ...same as this case, i.e. test_bit(_IRQ_INPROGRESS, >status) || !test_bit(_IRQ_DISABLED, >status)) when we try to remove the IRQ from a running domain. we have 3 possible states which can be read from LR for this case : active, pending, pending and active. - I don't think we can do anything about the active state, so we should return -EBUSY and reject the whole operation of removing the IRQ from running guest, and user can always retry this operation. - For the pending (and active) case, can we clear the LR and point the LR for the pending_irq to invalid? Kind regards, Henry Cheers,
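The states discussed above can be laid out as a small decision table. This is only one reading of the proposal in this message, not an agreed design: an IRQ whose list register (LR) shows the active bit is rejected with a "busy" result while the domain is running, a pending-only IRQ is a candidate for clearing the LR, and a dying domain is never refused. The enum and function names are hypothetical; the two-bit state encoding follows the GIC list-register state field.

```c
#include <stdbool.h>

/* LR state field: bit 0 = pending, bit 1 = active. */
enum lr_state {
    LR_INVALID        = 0,
    LR_PENDING        = 1,
    LR_ACTIVE         = 2,
    LR_PENDING_ACTIVE = 3,
};

/* Hypothetical outcomes of a removal request. */
enum removal_action {
    REMOVE_NOW,                 /* nothing in flight */
    REMOVE_AFTER_CLEARING_LR,   /* drop the LR contents first */
    REMOVE_REJECT_BUSY,         /* caller may retry later */
};

static enum removal_action decide_removal(bool domain_dying, enum lr_state s)
{
    if (s == LR_INVALID)
        return REMOVE_NOW;

    /* A dying domain must not be able to block de-assignment. */
    if (domain_dying)
        return REMOVE_AFTER_CLEARING_LR;

    /* Active, or pending and active: treated alike in this sketch. */
    if (s & LR_ACTIVE)
        return REMOVE_REJECT_BUSY;

    /* Pending only. */
    return REMOVE_AFTER_CLEARING_LR;
}
```

Whether rejecting the active case is acceptable for a running domain is exactly what the thread leaves open, since a misbehaving guest could then hold on to a device.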
Re: [PATCH 08/15] tools: Add domain_id and expert mode for overlay operations
Hi Anthony, On 5/1/2024 10:46 PM, Anthony PERARD wrote: On Wed, Apr 24, 2024 at 11:34:42AM +0800, Henry Wang wrote: From: Vikram Garhwal Add domain_id and expert mode for overlay assignment. This enables dynamic programming of nodes during runtime. Take the opportunity to fix the name mismatch in the xl command, the command name should be "dt-overlay" instead of "dt_overlay". I don't like much these unrelated / opportunistic changes in a patch, I'd rather have a separate patch. And in this case, if it was on a separate patch, that separated patch could gain: Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support") and potentially backported. Ok. I can split this part to a separated commit. Signed-off-by: Vikram Garhwal Signed-off-by: Stefano Stabellini Signed-off-by: Henry Wang --- tools/include/libxl.h | 8 +-- tools/include/xenctrl.h | 5 +++-- tools/libs/ctrl/xc_dt_overlay.c | 7 -- tools/libs/light/libxl_dt_overlay.c | 17 +++ tools/xl/xl_vmcontrol.c | 34 ++--- 5 files changed, 58 insertions(+), 13 deletions(-) diff --git a/tools/include/libxl.h b/tools/include/libxl.h index 62cb07dea6..59a3e1b37c 100644 --- a/tools/include/libxl.h +++ b/tools/include/libxl.h @@ -2549,8 +2549,12 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, uint32_t domid, void libxl_device_pci_list_free(libxl_device_pci* list, int num); #if defined(__arm__) || defined(__aarch64__) -int libxl_dt_overlay(libxl_ctx *ctx, void *overlay, - uint32_t overlay_size, uint8_t overlay_op); +#define LIBXL_DT_OVERLAY_ADD 1 +#define LIBXL_DT_OVERLAY_REMOVE2 + +int libxl_dt_overlay(libxl_ctx *ctx, uint32_t domain_id, void *overlay, + uint32_t overlay_size, uint8_t overlay_op, bool auto_mode, + bool domain_mapping); Sorry, you cannot change the API of an existing libxl function without providing something backward compatible. 
We have already a few example of this changes in libxl.h, e.g.: fded24ea8315 ("libxl: Make libxl_set_vcpuonline async") So, providing a wrapper called libxl_dt_overlay_0x041800() which call the new function. Ok, I will add an wrapper. #endif /* diff --git a/tools/libs/light/libxl_dt_overlay.c b/tools/libs/light/libxl_dt_overlay.c index a6c709a6dc..cdb62b28cf 100644 --- a/tools/libs/light/libxl_dt_overlay.c +++ b/tools/libs/light/libxl_dt_overlay.c @@ -57,10 +58,18 @@ int libxl_dt_overlay(libxl_ctx *ctx, void *overlay_dt, uint32_t overlay_dt_size, rc = 0; } -r = xc_dt_overlay(ctx->xch, overlay_dt, overlay_dt_size, overlay_op); +/* Check if user entered a valid domain id. */ +rc = libxl_domain_info(CTX, NULL, domid); +if (rc == ERROR_DOMAIN_NOTFOUND) { Why do you check specifically for "domain not found", what about other error? I agree this is indeed very confusing...I will rewrite this part properly in the next version. +LOGD(ERROR, domid, "Non-existant domain."); +return ERROR_FAIL; Use `goto out`, and you can let the function return ERROR_DOMAIN_NOTFOUND if that the error, we can just propagate the `rc` from libxl_domain_info(). Sure, will do the suggested way. +} + +r = xc_dt_overlay(ctx->xch, domid, overlay_dt, overlay_dt_size, overlay_op, + domain_mapping); if (r) { -LOG(ERROR, "%s: Adding/Removing overlay dtb failed.", __func__); +LOG(ERROR, "domain%d: Adding/Removing overlay dtb failed.", domid); You could replace the macro by LOGD, instead of handwriting "domain%d". Great suggestion. I will use LOGD. 
rc = ERROR_FAIL; } diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c index 98f6bd2e76..9674383ec3 100644 --- a/tools/xl/xl_vmcontrol.c +++ b/tools/xl/xl_vmcontrol.c @@ -1270,21 +1270,48 @@ int main_dt_overlay(int argc, char **argv) { const char *overlay_ops = NULL; const char *overlay_config_file = NULL; +uint32_t domain_id = 0; void *overlay_dtb = NULL; int rc; +bool auto_mode = true; +bool domain_mapping = false; uint8_t op; int overlay_dtb_size = 0; const int overlay_add_op = 1; const int overlay_remove_op = 2; -if (argc < 2) { -help("dt_overlay"); +if (argc < 3) { +help("dt-overlay"); return EXIT_FAILURE; } +if (argc > 5) { +fprintf(stderr, "Too many arguments\n"); +return ERROR_FAIL; +} + overlay_ops = argv[1]; overlay_config_file = argv[2]; +if (!strcmp(argv[argc - 1], "-e")) +auto_mode = false; + +if (argc == 4 || !auto_mode) { +domain_id = find_domain(argv[argc-1]); +
Re: [PATCH 09/15] tools/libs/light: Modify dtbo to domU linux dtbo format
Hi Anthony, On 5/1/2024 11:09 PM, Anthony PERARD wrote: On Wed, Apr 24, 2024 at 11:34:43AM +0800, Henry Wang wrote: diff --git a/tools/libs/light/libxl_dt_overlay.c b/tools/libs/light/libxl_dt_overlay.c index cdb62b28cf..eaf11a0f9c 100644 --- a/tools/libs/light/libxl_dt_overlay.c +++ b/tools/libs/light/libxl_dt_overlay.c @@ -41,6 +42,69 @@ static int check_overlay_fdt(libxl__gc *gc, void *fdt, size_t size) return 0; } +static int modify_overlay_for_domU(libxl__gc *gc, void *overlay_dt_domU, + size_t size) +{ +int rc = 0; +int virtual_interrupt_parent = GUEST_PHANDLE_GIC; +const struct fdt_property *fdt_prop_node = NULL; +int overlay; +int prop_len = 0; +int subnode = 0; +int fragment; +const char *prop_name; +const char *target_path = "/"; + +fdt_for_each_subnode(fragment, overlay_dt_domU, 0) { +prop_name = fdt_getprop(overlay_dt_domU, fragment, "target-path", +_len); +if (prop_name == NULL) { +LOG(ERROR, "target-path property not found\n"); LOG* macros already takes care of adding \n, no need to add an extra one. Sure, I will remove the "\n". +rc = ERROR_FAIL; +goto err; +} + +/* Change target path for domU dtb. */ +rc = fdt_setprop_string(overlay_dt_domU, fragment, "target-path", fdt_setprop_string() isn't a libxl function, store the return value in a variable named `r` instead.' Thanks for spotting this. Will change it to `r`. +target_path); +if (rc) { +LOG(ERROR, "Setting interrupt parent property failed for %s\n", +prop_name); +goto err; +} + +overlay = fdt_subnode_offset(overlay_dt_domU, fragment, "__overlay__"); + +fdt_for_each_subnode(subnode, overlay_dt_domU, overlay) +{ +const char *node_name = fdt_get_name(overlay_dt_domU, subnode, + NULL); + +fdt_prop_node = fdt_getprop(overlay_dt_domU, subnode, +"interrupt-parent", _len); +if (fdt_prop_node == NULL) { +LOG(DETAIL, "%s property not found for %s. Skip to next node\n", +"interrupt-parent", node_name); Why do you have "interrupt-parent" in a separate argument? 
Do you meant to do something like const char *some_name = "interrupt-parent"; and use that in the 4 different places that this string is used? (Using a variable mean that we (or the compiler) can make sure that they are all spelled correctly. Great suggestion! I will do this way. +continue; +} + +rc = fdt_setprop_inplace_u32(overlay_dt_domU, subnode, + "interrupt-parent", + virtual_interrupt_parent); +if (rc) { +LOG(ERROR, "Setting interrupt parent property failed for %s\n", +"interrupt-parent"); +goto err; +} +} +} + +return 0; Missed indentation. Will correct it. + +err: +return rc; A few things, looks like `rc` is always going to be ERROR_FAIL here, unless you find an libxl_error code that better describe the error, so you could forgo the `rc` variable. Also, if you don't need to clean up anything in the function or have a generic error message, you could simply "return " instead of using the "goto" style. Sure, I will simply use return because I don't really think there is anything to be cleaned up. +} + int libxl_dt_overlay(libxl_ctx *ctx, uint32_t domid, void *overlay_dt, uint32_t overlay_dt_size, uint8_t overlay_op, bool auto_mode, bool domain_mapping) @@ -73,6 +137,15 @@ int libxl_dt_overlay(libxl_ctx *ctx, uint32_t domid, void *overlay_dt, rc = ERROR_FAIL; } +/* + * auto_mode doesn't apply to dom0 as dom0 can get the physical + * description of the hardware. + */ +if (domid && auto_mode) { +if (overlay_op == LIBXL_DT_OVERLAY_ADD) Shouldn't libxl complain if the operation is different? I will add corresponding error handling code here. Thanks! Kind regards, Henry +rc = modify_overlay_for_domU(gc, overlay_dt, overlay_dt_size); +} + out: GC_FREE; return rc; Thanks,
Re: [PATCH 07/15] xen/overlay: Enable device tree overlay assignment to running domains
Hi Julien, On 4/30/2024 5:47 PM, Julien Grall wrote: On 30/04/2024 05:00, Henry Wang wrote: Hi Julien, Hi Henry, On 4/30/2024 1:34 AM, Julien Grall wrote: On 29/04/2024 04:36, Henry Wang wrote: Hi Jan, Julien, Stefano, Hi Henry, On 4/24/2024 2:05 PM, Jan Beulich wrote: On 24.04.2024 05:34, Henry Wang wrote: --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -1197,7 +1197,9 @@ struct xen_sysctl_dt_overlay { #define XEN_SYSCTL_DT_OVERLAY_ADD 1 #define XEN_SYSCTL_DT_OVERLAY_REMOVE 2 uint8_t overlay_op; /* IN: Add or remove. */ - uint8_t pad[3]; /* IN: Must be zero. */ + bool domain_mapping; /* IN: True or False. */ + uint8_t pad[2]; /* IN: Must be zero. */ + uint32_t domain_id; }; If you merely re-purposed padding fields, all would be fine without bumping the interface version. Yet you don't, albeit for an unclear reason: Why uint32_t rather than domid_t? And on top of that - why a separate boolean when you could use e.g. DOMID_INVALID to indicate "no domain mapping"? I think both of your suggestions make great sense. I will follow the suggestions in v2. That said - anything taking a domain ID is certainly suspicious in a sysctl. Judging from the description you really mean this to be a domctl. Anything else will require extra justification. I also think a domctl is better. I had a look at the history of the already merged series; it looks like in the first version of merged part 1 [1], the hypercall was implemented as a domctl in the beginning but was later changed to a sysctl in v2. I think this made sense as the scope at that time was just to make Xen aware of the device tree node via the Xen device tree. However, this is now a problem for the current part where the scope (and the end goal) is extended to assign the added device to Linux Dom0/DomU via device tree overlays. 
I am not sure which way is better: should we repurpose the sysctl as a domctl or maybe add another domctl (I am worried about the duplication because basically we need the same sysctl functionality but now with a domid in it)? What do you think? I am not entirely sure this is a good idea to try to add the device in Xen and attach it to the guests at the same time. Imagine the following situation: 1) Add and attach devices 2) The domain is rebooted 3) Detach and remove devices After step 2, you technically have a new domain. You could have also a case where this is a completely different guest. So the flow would look a little bit weird (you create the DT overlay with domain A but remove with domain B). So, at the moment, it feels like the add/attach (resp. detach/remove) operations should happen separately. Thinking a bit more about it, there is another problem with the single hypercall approach. The MMIOs will be mapped 1:1 to the guest. These regions may clash with other parts of the layout for domains created by the toolstack and dom0less (if the 1:1 option has not been enabled). I guess for the add, it would be possible to specify the mapping in the Device-Tree. But that would not work for the removal (this may be a different domain). On a somewhat similar topic, the number of IRQs supported by the vGIC is fixed at boot. How would that work with this patch? Seeing your comment here I now realized patch #5 is to address this issue. But I think we need to have a complete rework of the original patch to make the feature portable. We can continue the discussion in patch 5. Can you clarify why you want to add devices to Xen and attach to a guest within a single hypercall? Sorry, I don't know if there are any specific thoughts on the design of using a single hypercall to do both add devices to Xen device tree and assign the device to the guest. 
In fact, seeing your above comments, I think separating these two functionalities into two xl commands using separate hypercalls would indeed be a better idea. Thank you for the suggestion! To make sure I understand correctly, would you mind confirming if the below actions for v2 make sense to you? Thanks! - Only use the XEN_SYSCTL_DT_OVERLAY_{ADD, REMOVE} sysctls to add/remove overlay to Xen device tree Note that this would attach the devices to dom0 first. Maybe this is why it was decided to merge the two operations? An option would be to allow the devices to be attached to no-one. - Introduce the xl dt-overlay attach command and respective domctls to do the device assignment for the overlay to domain. We already have domctls to route IRQs and map MMIOs. So do we actually need new domctls? No, I don't think so. Like you and Stefano said in the other thread, I think I need to split the command into different hypercalls instead of only one hypercall and reuse the existing domctls. Kind regards, Henry Cheers,
Re: [PATCH 05/15] tools/libs/light: Increase nr_spi to 160
Hi Anthony, (+Arm maintainers) On 5/1/2024 9:58 PM, Anthony PERARD wrote: On Wed, Apr 24, 2024 at 11:34:39AM +0800, Henry Wang wrote: Increase number of spi to 160 i.e. gic_number_lines() for Xilinx ZynqMP - 32. This was done to allocate and assign IRQs to a running domain. Signed-off-by: Vikram Garhwal Signed-off-by: Stefano Stabellini Signed-off-by: Henry Wang --- tools/libs/light/libxl_arm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c index dd5c9f4917..50dbd0f2a9 100644 --- a/tools/libs/light/libxl_arm.c +++ b/tools/libs/light/libxl_arm.c @@ -181,7 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, LOG(DEBUG, "Configure the domain"); -config->arch.nr_spis = nr_spis; +/* gic_number_lines() is 192 for Xilinx ZynqMP. min nr_spis = 192 - 32. */ +config->arch.nr_spis = MAX(nr_spis, 160); Is there a way that Xen or libxl could find out what the minimum number of SPIs needs to be? I am afraid currently there is none. Are we going to have to increase that minimum number every time a new platform comes along? It doesn't appear that libxl is using that `nr_spis` value and it is probably just given to Xen. So my guess is that Xen could simply take care of the minimum value; gic_number_lines() seems to be a Xen function. Xen will take care of the value of nr_spis for dom0 in create_dom0() dom0_cfg.arch.nr_spis = min(gic_number_lines(), (unsigned int) 992) - 32; and also for dom0less domUs in create_domUs(). However, it looks like Xen will not take care of the minimum value for libxl guests; the value from config->arch.nr_spis in the guest config file will be directly passed to the domain_vgic_init() function from arch_domain_create(). I agree with you that we shouldn't just bump the number every time we have a new platform. Therefore, would it be a good idea to move the logic in this patch to arch_sanitise_domain_config()? Kind regards, Henry Thanks,
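The arithmetic behind the 160 figure, and what a hypervisor-side floor might look like, can be sketched as follows. This is only an assumption-laden illustration of the idea floated above: the function names and `SKETCH_*` constants are invented here, and nothing in the thread says arch_sanitise_domain_config() would be written this way.

```c
#include <assert.h>

/* Assumed for this sketch: 32 per-CPU interrupt IDs precede the first SPI. */
#define SKETCH_NR_LOCAL_IRQS 32u
/* Cap taken from the quoted create_dom0() expression. */
#define SKETCH_MAX_GIC_LINES 992u

/* Number of SPIs implied by the physical GIC line count. */
static unsigned int sketch_spis_from_lines(unsigned int gic_lines)
{
    unsigned int lines =
        gic_lines < SKETCH_MAX_GIC_LINES ? gic_lines : SKETCH_MAX_GIC_LINES;

    assert(lines >= SKETCH_NR_LOCAL_IRQS);
    return lines - SKETCH_NR_LOCAL_IRQS;
}

/* Hypothetical sanitising step: never go below what the hardware implies. */
static unsigned int sketch_sanitise_nr_spis(unsigned int requested,
                                            unsigned int gic_lines)
{
    unsigned int floor = sketch_spis_from_lines(gic_lines);

    return requested > floor ? requested : floor;
}
```

With 192 lines, as quoted for Xilinx ZynqMP, the floor is 192 - 32 = 160, which matches the hard-coded value in the patch while avoiding a per-platform constant in libxl.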
Re: [PATCH 04/15] tools/libs/light: Always enable IOMMU
Hi Anthony, On 5/1/2024 9:47 PM, Anthony PERARD wrote: On Wed, Apr 24, 2024 at 11:34:38AM +0800, Henry Wang wrote: For overlay with iommu functionality to work with running VMs, we need to enable IOMMU when iomem is present for the domains. Signed-off-by: Vikram Garhwal Signed-off-by: Henry Wang --- tools/libs/light/libxl_arm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c index 1cb89fa584..dd5c9f4917 100644 --- a/tools/libs/light/libxl_arm.c +++ b/tools/libs/light/libxl_arm.c @@ -222,6 +222,12 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, config->arch.sve_vl = d_config->b_info.arch_arm.sve_vl / 128U; } +#ifdef LIBXL_HAVE_DT_OVERLAY libxl_arm.c is only built on Arm, so this should be defined, so no need to check. Ah sure, I just thought that in the future RISC-V/PPC may have the same, but you are correct. I will remove the check. +if (d_config->b_info.num_iomem) { +config->flags |= XEN_DOMCTL_CDF_iommu; Is this doing the same thing as the previous patch? I think so, yes, we need the IOMMU flag to be set if we want to assign a device from a DT node protected by IOMMU. Kind regards, Henry Thanks,
Re: [PATCH 07/15] xen/overlay: Enable device tree overlay assignment to running domains
Hi Stefano, Julien, On 5/3/2024 2:02 AM, Stefano Stabellini wrote: On Tue, 30 Apr 2024, Henry Wang wrote: Hi Julien, On 4/30/2024 1:34 AM, Julien Grall wrote: On 29/04/2024 04:36, Henry Wang wrote: Hi Jan, Julien, Stefano, Hi Henry, On 4/24/2024 2:05 PM, Jan Beulich wrote: On 24.04.2024 05:34, Henry Wang wrote: --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -1197,7 +1197,9 @@ struct xen_sysctl_dt_overlay { #define XEN_SYSCTL_DT_OVERLAY_ADD 1 #define XEN_SYSCTL_DT_OVERLAY_REMOVE 2 uint8_t overlay_op; /* IN: Add or remove. */ - uint8_t pad[3]; /* IN: Must be zero. */ + bool domain_mapping; /* IN: True or False. */ + uint8_t pad[2]; /* IN: Must be zero. */ + uint32_t domain_id; }; If you merely re-purposed padding fields, all would be fine without bumping the interface version. Yet you don't, albeit for an unclear reason: Why uint32_t rather than domid_t? And on top of that - why a separate boolean when you could use e.g. DOMID_INVALID to indicate "no domain mapping"? I think both of your suggestions make great sense. I will follow the suggestions in v2. That said - anything taking a domain ID is certainly suspicious in a sysctl. Judging from the description you really mean this to be a domctl. Anything else will require extra justification. I also think a domctl is better. I had a look at the history of the already merged series; it looks like in the first version of merged part 1 [1], the hypercall was implemented as a domctl in the beginning but was later changed to a sysctl in v2. I think this made sense as the scope at that time was just to make Xen aware of the device tree node via the Xen device tree. However, this is now a problem for the current part where the scope (and the end goal) is extended to assign the added device to Linux Dom0/DomU via device tree overlays. 
I am not sure which way is better: should we repurpose the sysctl as a domctl or maybe add another domctl (I am worried about the duplication because basically we need the same sysctl functionality but now with a domid in it)? What do you think? I am not entirely sure this is a good idea to try to add the device in Xen and attach it to the guests at the same time. Imagine the following situation: 1) Add and attach devices 2) The domain is rebooted 3) Detach and remove devices After step 2, you technically have a new domain. You could have also a case where this is a completely different guest. So the flow would look a little bit weird (you create the DT overlay with domain A but remove with domain B). So, at the moment, it feels like the add/attach (resp. detach/remove) operations should happen separately. Can you clarify why you want to add devices to Xen and attach to a guest within a single hypercall? Sorry, I don't know if there are any specific thoughts on the design of using a single hypercall to do both add devices to Xen device tree and assign the device to the guest. In fact, seeing your above comments, I think separating these two functionalities into two xl commands using separate hypercalls would indeed be a better idea. Thank you for the suggestion! To make sure I understand correctly, would you mind confirming if the below actions for v2 make sense to you? Thanks! - Only use the XEN_SYSCTL_DT_OVERLAY_{ADD, REMOVE} sysctls to add/remove overlay to Xen device tree - Introduce the xl dt-overlay attach command and respective domctls to do the device assignment for the overlay to domain. I think two hypercalls is OK. The original idea was to have a single xl command to do the operation for user convenience (even that is not a hard requirement) but that can result easily in two hypercalls. Ok, sounds good. I will break the command into two hypercalls and try to reuse the existing domctls for assign/remove IRQ/MMIO ranges. Kind regards, Henry
Re: [PATCH 1/3] xen/arm/dom0less-build: Alloc magic pages for Dom0less DomUs from hypervisor
Hi Daniel, On 4/30/2024 6:22 PM, Daniel P. Smith wrote: On 4/29/24 22:55, Henry Wang wrote: Hi Daniel, On 4/30/2024 8:27 AM, Daniel P. Smith wrote: On 4/25/24 23:14, Henry Wang wrote: There are use cases (for example using the PV driver) in Dom0less setup that require Dom0less DomUs start immediately with Dom0, but initialize XenStore later after Dom0's successful boot and call to the init-dom0less application. An error message can be seen from the init-dom0less application on 1:1 direct-mapped domains: ``` Allocating magic pages memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1 Error on alloc magic pages ``` This is because currently the magic pages for Dom0less DomUs are populated by the init-dom0less app through populate_physmap(), and populate_physmap() automatically assumes gfn == mfn for 1:1 direct mapped domains. This cannot be true for the magic pages that are allocated later from the init-dom0less application executed in Dom0. For domain using statically allocated memory but not 1:1 direct-mapped, similar error "failed to retrieve a reserved page" can be seen as the reserved memory list is empty at that time. To solve the above issue, this commit allocates the magic pages for Dom0less DomUs at the domain construction time. The base address/PFN of the magic page region will be noted and communicated to the init-dom0less application in Dom0. Might I suggest we not refer to these as magic pages? I would consider them as hypervisor reserved pages for the VM to have access to virtual platform capabilities. We may see this expand in the future for some unforeseen, new capability. I think magic page is a specific terminology to refer to these pages, see alloc_magic_pages() for both x86 and Arm. I will reword the last paragraph of the commit message to refer to them as "hypervisor reserved pages (currently used as magic pages on Arm)" if this sounds good to you. 
I would highlight that it is a term used in the toolstack; while it is probably not the best, there is no reason to change it there, but the hypervisor does not carry that terminology. IMHO we should not introduce it there and be explicit about why the pages are getting reserved. Thanks for the suggestion. I will rework the commit message. Kind regards, Henry v/r, dps
Re: [PATCH v1.1] xen/commom/dt-overlay: Fix missing lock when remove the device
Hi Julien, On 5/3/2024 9:04 PM, Julien Grall wrote: Hi Henry, On 26/04/2024 02:55, Henry Wang wrote: If CONFIG_DEBUG=y, below assertion will be triggered: (XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146 (XEN) [ Xen-4.19-unstable arm64 debug=y Not tainted ] (XEN) CPU: 0 (XEN) PC: 0a257418 iommu_remove_dt_device+0x8c/0xd4 (XEN) LR: 0a2573a0 (XEN) SP: 8000fff7fb30 (XEN) CPSR: 0249 MODE:64-bit EL2h (Hypervisor, handler) [...] (XEN) Xen call trace: (XEN) [<0a257418>] iommu_remove_dt_device+0x8c/0xd4 (PC) (XEN) [<0a2573a0>] iommu_remove_dt_device+0x14/0xd4 (LR) (XEN) [<0a20797c>] dt-overlay.c#remove_node_resources+0x8c/0x90 (XEN) [<0a207f14>] dt-overlay.c#remove_nodes+0x524/0x648 (XEN) [<0a208460>] dt_overlay_sysctl+0x428/0xc68 (XEN) [<0a2707f8>] arch_do_sysctl+0x1c/0x2c (XEN) [<0a230b40>] do_sysctl+0x96c/0x9ec (XEN) [<0a271e08>] traps.c#do_trap_hypercall+0x1e8/0x288 (XEN) [<0a273490>] do_trap_guest_sync+0x448/0x63c (XEN) [<0a25c480>] entry.o#guest_sync_slowpath+0xa8/0xd8 (XEN) (XEN) (XEN) (XEN) Panic on CPU 0: (XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146 (XEN) This is because iommu_remove_dt_device() is called without taking the dt_host_lock. Fix the issue by taking and releasing the lock properly. Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities") Signed-off-by: Henry Wang --- v1.1: - Move the unlock position before the check of rc. --- xen/common/dt-overlay.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c index 1b197381f6..ab8f43aea2 100644 --- a/xen/common/dt-overlay.c +++ b/xen/common/dt-overlay.c @@ -381,7 +381,9 @@ static int remove_node_resources(struct dt_device_node *device_node) { if ( dt_device_is_protected(device_node) ) { + write_lock(&dt_host_lock); Looking at the code, we are not modifying the device_node, so shouldn't this be a read_lock()? 
Hmm yes, however after seeing your comment... That said, even though either fixes your issue, I am not entirely convinced this is the correct position for the lock. From my understanding, dt_host_lock is meant to ensure that the DT node will not disappear behind your back. So in theory, shouldn't the lock be taken as soon as you get hold of device_node? ...here. I believe you have a point here, so I think I will just move the write_lock(&dt_host_lock) to as soon as we get overlay_node, i.e. on top of the call to remove_descendant_nodes_resources(). Therefore we can solve the assertion issue of this patch together. Kind regards, Henry Cheers,
Re: [PATCH 2/3] xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP
Hi Stefano, On 5/3/2024 2:08 AM, Stefano Stabellini wrote: On Fri, 26 Apr 2024, Henry Wang wrote: For use cases such as Dom0less PV drivers, a mechanism to communicate Dom0less DomU's static data with the runtime control plane (Dom0) is needed. Since on Arm HVMOP is already the existing approach to address such use cases (for example the allocation of HVM_PARAM_CALLBACK_IRQ), add a new HVMOP key HVM_PARAM_MAGIC_BASE_PFN for storing the magic page region base PFN. The value will be set at Dom0less DomU construction time after Dom0less DomU's magic page region has been allocated. To keep consistent, also set the value for HVM_PARAM_MAGIC_BASE_PFN for libxl guests in alloc_magic_pages(). Reported-by: Alec Kwapis Signed-off-by: Henry Wang --- tools/libs/guest/xg_dom_arm.c | 2 ++ xen/arch/arm/dom0less-build.c | 2 ++ xen/arch/arm/hvm.c | 1 + xen/include/public/hvm/params.h | 1 + 4 files changed, 6 insertions(+) diff --git a/tools/libs/guest/xg_dom_arm.c b/tools/libs/guest/xg_dom_arm.c index 8cc7f27dbb..3c08782d1d 100644 --- a/tools/libs/guest/xg_dom_arm.c +++ b/tools/libs/guest/xg_dom_arm.c @@ -74,6 +74,8 @@ static int alloc_magic_pages(struct xc_dom_image *dom) xc_clear_domain_page(dom->xch, dom->guest_domid, base + MEMACCESS_PFN_OFFSET); xc_clear_domain_page(dom->xch, dom->guest_domid, dom->vuart_gfn); +xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_MAGIC_BASE_PFN, +base); xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_CONSOLE_PFN, dom->console_pfn); xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_STORE_PFN, diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c index 40dc85c759..72187c167d 100644 --- a/xen/arch/arm/dom0less-build.c +++ b/xen/arch/arm/dom0less-build.c @@ -861,6 +861,8 @@ static int __init construct_domU(struct domain *d, free_domheap_pages(magic_pg, get_order_from_pages(NR_MAGIC_PAGES)); return rc; } + +d->arch.hvm.params[HVM_PARAM_MAGIC_BASE_PFN] = gfn_x(gfn); I apologize as I have not read the whole email 
thread in reply to this patch. Why do we need to introduce a new hvm param instead of just setting HVM_PARAM_CONSOLE_PFN and HVM_PARAM_STORE_PFN directly here? Yeah, this is a good question. I also thought about this but in the end didn't do that directly because I don't really want to break the current protocol between Linux, Xen and toolstack. In docs/features/dom0less.pandoc, section "PV Drivers", there is a communication protocol saying that Xen should keep the HVM_PARAM_STORE_PFN to ~0ULL until the toolstack sets the HVM_PARAM_STORE_PFN. I am open to changing the protocol (changes might be needed on the Linux side too); if it is ok to do that, I can set the HVM params here directly and change the doc accordingly. Kind regards, Henry
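The protocol constraint mentioned above can be made concrete with a tiny sketch. It only encodes what the reply states, namely that HVM_PARAM_STORE_PFN stays at ~0ULL until the toolstack sets it; the helper names are hypothetical and this is not the Linux or Xen implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sentinel from the quoted protocol: the store PFN is not usable yet. */
#define SKETCH_STORE_PFN_UNSET (~0ULL)

/* The sentinel has to cover the whole 64-bit parameter value. */
static_assert(sizeof(uint64_t) == sizeof(unsigned long long),
              "sketch assumes a 64-bit unsigned long long");

/* Hypothetical Xen-side helper: value a Dom0less DomU starts out with. */
static uint64_t sketch_initial_store_pfn(void)
{
    return SKETCH_STORE_PFN_UNSET;
}

/* Hypothetical guest-side helper: true once a real PFN was published. */
static bool sketch_xenstore_ready(uint64_t store_pfn)
{
    return store_pfn != SKETCH_STORE_PFN_UNSET;
}
```

Setting the parameter to a real PFN at domain construction time would make the second helper return true immediately, which is why doing so directly would change the existing protocol.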
Re: [PATCH 2/3] xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP
Hi Jan, On 4/30/2024 2:11 PM, Jan Beulich wrote: On 30.04.2024 04:51, Henry Wang wrote: On 4/30/2024 8:31 AM, Daniel P. Smith wrote: On 4/26/24 02:21, Jan Beulich wrote: On 26.04.2024 05:14, Henry Wang wrote: --- a/xen/include/public/hvm/params.h +++ b/xen/include/public/hvm/params.h @@ -76,6 +76,7 @@ */ #define HVM_PARAM_STORE_PFN 1 #define HVM_PARAM_STORE_EVTCHN 2 +#define HVM_PARAM_MAGIC_BASE_PFN 3 #define HVM_PARAM_IOREQ_PFN 5 Considering all adjacent values are used, it is overwhelmingly likely that 3 was once used, too. Such re-use needs to be done carefully. Since you need this for Arm only, that's likely okay, but doesn't go without (a) saying and (b) considering the possible future case of dom0less becoming arch-agnostic, or hyperlaunch wanting to extend the scope. Plus (c) imo this also needs at least a comment, maybe even an #ifdef, seeing how x86- focused most of the rest of this header is. I would recommend having two new params, Sounds good. I can do the suggestion in v2. #define HVM_PARAM_HV_RSRV_BASE_PVH 3 #define HVM_PARAM_HV_RSRV_SIZE 4 I think 4 is currently in use, so I think I will find another couple of numbers in the end for both of them. Instead of reusing 3 and 4. Right. There are ample gaps, but any use of values within a gap will need appropriate care. FTAOD using such a gap looks indeed preferable, to avoid further growing the (sparse) array. Alternatively, if we're firm on this never going to be used on x86, some clearly x86-specific indexes (e.g. 36 and 37) could be given non-x86 purpose. Sorry, I am a bit confused. I take Daniel's comment as to add two new params, which is currently only used for Arm, but eventually will be used for hyperlaunch on x86 (as the name indicated). So I think I will use the name that he suggested, but the number changed to 39 and 40. Kind regards, Henry Jan
Re: [PATCH 07/15] xen/overlay: Enable device tree overlay assignment to running domains
Hi Julien, On 4/30/2024 1:34 AM, Julien Grall wrote: On 29/04/2024 04:36, Henry Wang wrote: Hi Jan, Julien, Stefano, Hi Henry, On 4/24/2024 2:05 PM, Jan Beulich wrote: On 24.04.2024 05:34, Henry Wang wrote: --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -1197,7 +1197,9 @@ struct xen_sysctl_dt_overlay { #define XEN_SYSCTL_DT_OVERLAY_ADD 1 #define XEN_SYSCTL_DT_OVERLAY_REMOVE 2 uint8_t overlay_op; /* IN: Add or remove. */ - uint8_t pad[3]; /* IN: Must be zero. */ + bool domain_mapping; /* IN: True or False. */ + uint8_t pad[2]; /* IN: Must be zero. */ + uint32_t domain_id; }; If you merely re-purposed padding fields, all would be fine without bumping the interface version. Yet you don't, albeit for an unclear reason: Why uint32_t rather than domid_t? And on top of that - why a separate boolean when you could use e.g. DOMID_INVALID to indicate "no domain mapping"? I think both of your suggestions make great sense. I will follow the suggestions in v2. That said - anything taking a domain ID is certainly suspicious in a sysctl. Judging from the description you really mean this to be a domctl. Anything else will require extra justification. I also think a domctl is better. I had a look at the history of the already merged series; it looks like in the first version of merged part 1 [1], the hypercall was implemented as a domctl in the beginning but was later changed to a sysctl in v2. I think this made sense as the scope at that time was just to make Xen aware of the device tree node via the Xen device tree. However, this is now a problem for the current part where the scope (and the end goal) is extended to assign the added device to Linux Dom0/DomU via device tree overlays. I am not sure which way is better: should we repurpose the sysctl as a domctl or maybe add another domctl (I am worried about the duplication because basically we need the same sysctl functionality but now with a domid in it)? What do you think? 
I am not entirely sure this is a good idea to try to add the device in Xen and attach it to the guests at the same time. Imagine the following situation: 1) Add and attach devices 2) The domain is rebooted 3) Detach and remove devices After step 2, you technically have a new domain. You could have also a case where this is a completely different guest. So the flow would look a little bit weird (you create the DT overlay with domain A but remove with domain B). So, at the moment, it feels like the add/attach (resp. detach/remove) operations should happen separately. Can you clarify why you want to add devices to Xen and attach to a guest within a single hypercall? Sorry, I don't know if there are any specific thoughts on the design of using a single hypercall to do both add devices to Xen device tree and assign the device to the guest. In fact, seeing your above comments, I think separating these two functionalities into two xl commands using separate hypercalls would indeed be a better idea. Thank you for the suggestion! To make sure I understand correctly, would you mind confirming if the below actions for v2 make sense to you? Thanks! - Only use the XEN_SYSCTL_DT_OVERLAY_{ADD, REMOVE} sysctls to add/remove overlay to Xen device tree - Introduce the xl dt-overlay attach command and respective domctls to do the device assignment for the overlay to domain. Kind regards, Henry Cheers,
Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM
Hi Julien, Sorry for the late reply, On 4/25/2024 10:28 PM, Julien Grall wrote: Hi, On 25/04/2024 08:06, Henry Wang wrote: Hi Julien, On 4/24/2024 8:58 PM, Julien Grall wrote: Hi Henry, On 24/04/2024 04:34, Henry Wang wrote: From: Vikram Garhwal Enable interrupt assign/remove for running VMs in CONFIG_OVERLAY_DTB. Currently, irq_route and mapping is only allowed at the domain creation. Adding exception for CONFIG_OVERLAY_DTB. AFAICT, this is mostly reverting b8577547236f ("xen/arm: Restrict when a physical IRQ can be routed/removed from/to a domain"). Signed-off-by: Vikram Garhwal Signed-off-by: Stefano Stabellini Signed-off-by: Henry Wang --- xen/arch/arm/gic.c | 4 1 file changed, 4 insertions(+) diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 44c40e86de..a775f886ed 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -140,8 +140,10 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq, * back to the physical IRQ. To prevent get unsync, restrict the * routing to when the Domain is been created. */ The above comment explains why the check was added. But the commit message doesn't explain why this can be disregarded for your use-case. Looking at the history, I don't think you can simply remove the checks. Regardless of that... +#ifndef CONFIG_OVERLAY_DTB ... I am against such #ifdef. A distro may want to have OVERLAY_DTB enabled, yet the user will not use it. Instead, you want to remove the check once the code can properly handle routing an IRQ after the domain is created or ... if ( d->creation_finished ) return -EBUSY; +#endif ret = vgic_connect_hw_irq(d, NULL, virq, desc, true); if ( ret ) @@ -171,8 +173,10 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq, * Removing an interrupt while the domain is running may have * undesirable effect on the vGIC emulation. */ +#ifndef CONFIG_OVERLAY_DTB if ( !d->is_dying ) return -EBUSY; +#endif ... removed before the domain is destroyed. Thanks for your feedback. 
After checking the b8577547236f commit message I think I now understand your point. Do you have any suggestion about how I can properly add the support to route/remove the IRQ to running domains? Thanks. I spent some time going through the GIC/vGIC code and had some discussions with Stefano and Stewart during the last couple of days, let me see if I can describe the use case properly now to continue the discussion: We have some use cases that require assigning devices to domains after domain boot time. For example, suppose there is an FPGA on the board which can simulate a device, and the bitstream for the FPGA is provided and programmed after domain boot. So we need a way to assign the device to the running domain. This series tries to implement this use case by using device tree overlay - users can firstly add the overlay to Xen dtb, assign the device in the overlay to a domain by the xl command, then apply the overlay to Linux. I haven't really looked at that code in quite a while. I think we need to make sure that the virtual and physical IRQ state matches at the time we do the routing. I am undecided on whether we want to simply prevent the action from happening or try to reset the state. There is also the question of what to do if the guest is enabling the vIRQ before it is routed. Sorry for bothering, would you mind elaborating a bit more about the two cases that you mentioned above? Commit b8577547236f ("xen/arm: Restrict when a physical IRQ can be routed/removed from/to a domain") only said there will be undesirable effects, so I am not sure if I understand the concerns raised above and the consequences of these two use cases. I am probably wrong, but I think when we add the overlay, we are probably fine as the interrupt is not being used before. Also since we only load the device driver after the IRQ is routed to the guest, I am not sure the guest can enable the vIRQ before it is routed. 
Kind regards, Henry Overall, someone needs to spend some time reading the code and then make a proposal (this could be just documentation if we believe it is safe to do). Both the current vGIC and the new one may need an update. Cheers,
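One reading of the "simply prevent the action" option discussed above is to refuse the removal while the virtual interrupt is still pending or active in a running guest, which is roughly the direction the later v4 patch for this series takes in vgic_connect_hw_irq(). The sketch below is a simplified, standalone model of that decision only; the struct, its fields and the function name are invented for illustration and omit all locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the per-vIRQ state tracked by the vGIC. */
struct sketch_virq {
    const void *desc;   /* physical IRQ descriptor currently connected */
    bool visible;       /* pending or otherwise visible to the guest */
    bool active;        /* guest has acknowledged but not completed it */
};

/*
 * Hypothetical disconnect check. A dying domain may always drop the
 * connection; a running domain may only do so while the vIRQ is idle.
 */
static int sketch_disconnect(struct sketch_virq *p, const void *desc,
                             bool domain_dying)
{
    assert(p != NULL);

    /* The caller must name the descriptor that is actually connected. */
    if (desc && p->desc != desc)
        return -EINVAL;

    /* Virtual and physical state would diverge: reject the request. */
    if (!domain_dying && (p->visible || p->active))
        return -EINVAL;

    p->desc = NULL;
    return 0;
}
```

The alternative mentioned in the reply, resetting the state instead of rejecting, is not modelled here.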
Re: [PATCH 1/3] xen/arm/dom0less-build: Alloc magic pages for Dom0less DomUs from hypervisor
Hi Daniel, On 4/30/2024 8:27 AM, Daniel P. Smith wrote: On 4/25/24 23:14, Henry Wang wrote: There are use cases (for example using the PV driver) in Dom0less setup that require Dom0less DomUs start immediately with Dom0, but initialize XenStore later after Dom0's successful boot and call to the init-dom0less application. An error message can be seen from the init-dom0less application on 1:1 direct-mapped domains: ``` Allocating magic pages memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1 Error on alloc magic pages ``` This is because currently the magic pages for Dom0less DomUs are populated by the init-dom0less app through populate_physmap(), and populate_physmap() automatically assumes gfn == mfn for 1:1 direct mapped domains. This cannot be true for the magic pages that are allocated later from the init-dom0less application executed in Dom0. For domain using statically allocated memory but not 1:1 direct-mapped, similar error "failed to retrieve a reserved page" can be seen as the reserved memory list is empty at that time. To solve the above issue, this commit allocates the magic pages for Dom0less DomUs at the domain construction time. The base address/PFN of the magic page region will be noted and communicated to the init-dom0less application in Dom0. Might I suggest we not refer to these as magic pages? I would consider them as hypervisor reserved pages for the VM to have access to virtual platform capabilities. We may see this expand in the future for some unforeseen, new capability. I think magic page is a specific terminology to refer to these pages, see alloc_magic_pages() for both x86 and Arm. I will reword the last paragraph of the commit message to refer to them as "hypervisor reserved pages (currently used as magic pages on Arm)" if this sounds good to you. Kind regards, Henry
Re: [PATCH 2/3] xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP
Hi Daniel, On 4/30/2024 8:31 AM, Daniel P. Smith wrote: On 4/26/24 02:21, Jan Beulich wrote: On 26.04.2024 05:14, Henry Wang wrote: --- a/xen/include/public/hvm/params.h +++ b/xen/include/public/hvm/params.h @@ -76,6 +76,7 @@ */ #define HVM_PARAM_STORE_PFN 1 #define HVM_PARAM_STORE_EVTCHN 2 +#define HVM_PARAM_MAGIC_BASE_PFN 3 #define HVM_PARAM_IOREQ_PFN 5 Considering all adjacent values are used, it is overwhelmingly likely that 3 was once used, too. Such re-use needs to be done carefully. Since you need this for Arm only, that's likely okay, but doesn't go without (a) saying and (b) considering the possible future case of dom0less becoming arch-agnostic, or hyperlaunch wanting to extend the scope. Plus (c) imo this also needs at least a comment, maybe even an #ifdef, seeing how x86- focused most of the rest of this header is. I would recommend having two new params, Sounds good. I can do the suggestion in v2. #define HVM_PARAM_HV_RSRV_BASE_PVH 3 #define HVM_PARAM_HV_RSRV_SIZE 4 I think 4 is currently in use, so I think I will find another couple of numbers in the end for both of them. Instead of reusing 3 and 4. Kind regards, Henry This will communicate how many pages have been reserved and where those pages are located. v/r, dps
Re: [PATCH 07/15] xen/overlay: Enable device tree overlay assignment to running domains
Hi Jan, Julien, Stefano, On 4/24/2024 2:05 PM, Jan Beulich wrote: On 24.04.2024 05:34, Henry Wang wrote: --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -1197,7 +1197,9 @@ struct xen_sysctl_dt_overlay { #define XEN_SYSCTL_DT_OVERLAY_ADD 1 #define XEN_SYSCTL_DT_OVERLAY_REMOVE2 uint8_t overlay_op; /* IN: Add or remove. */ -uint8_t pad[3]; /* IN: Must be zero. */ +bool domain_mapping;/* IN: True of False. */ +uint8_t pad[2]; /* IN: Must be zero. */ +uint32_t domain_id; }; If you merely re-purposed padding fields, all would be fine without bumping the interface version. Yet you don't, albeit for an unclear reason: Why uint32_t rather than domid_t? And on top of that - why a separate boolean when you could use e.g. DOMID_INVALID to indicate "no domain mapping"? I think both of your suggestion make great sense. I will follow the suggestion in v2. That said - anything taking a domain ID is certainly suspicious in a sysctl. Judging from the description you really mean this to be a domctl. Anything else will require extra justification. I also think a domctl is better. I had a look at the history of the already merged series, it looks like in the first version of merged part 1 [1], the hypercall was implemented as the domctl in the beginning but later in v2 changed to sysctl. I think this makes sense as the scope of that time is just to make Xen aware of the device tree node via Xen device tree. However this is now a problem for the current part where the scope (and the end goal) is extended to assign the added device to Linux Dom0/DomU via device tree overlays. I am not sure which way is better, should we repurposing the sysctl to domctl or maybe add another domctl (I am worrying about the duplication because basically we need the same sysctl functionality but now with a domid in it)? What do you think? 
@Stefano: Since I am not 100% if I understand the whole story behind this feature, would you mind checking if I am providing correct information above and sharing your opinions on this? Thank you very much! [1] https://lore.kernel.org/xen-devel/13240b69-f7bb-6a64-b89c-b7c2cbb7e...@xen.org/ Kind regards, Henry Jan
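A minimal sketch of the layout Jan suggests above, assuming the domain ID is carved out of the existing padding and DOMID_INVALID stands for "no domain mapping". Only the tail of the request is modelled; the struct names are hypothetical, and the `domid_t` typedef and `DOMID_INVALID` value are local copies of the definitions in Xen's public headers so that the fragment compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t domid_t;                  /* local copy of Xen's typedef */
#define DOMID_INVALID ((domid_t)0x7FF4U)   /* local copy of Xen's value   */

/* Tail of the overlay request before the proposed change. */
struct overlay_tail_old {
    uint8_t overlay_op;   /* IN: Add or remove. */
    uint8_t pad[3];       /* IN: Must be zero.  */
};

/* Hypothetical tail: two padding bytes re-purposed for the domain ID. */
struct overlay_tail_new {
    uint8_t overlay_op;   /* IN: Add or remove.                   */
    uint8_t pad;          /* IN: Must be zero.                    */
    domid_t domain_id;    /* IN: DOMID_INVALID means "no domain". */
};

/* Re-purposing padding only works if the size stays the same. */
_Static_assert(sizeof(struct overlay_tail_old) ==
               sizeof(struct overlay_tail_new),
               "re-purposed padding must not change the layout size");

/* True when the request asks for the overlay to be mapped to a domain. */
static bool overlay_targets_domain(const struct overlay_tail_new *t)
{
    assert(t != NULL);
    return t->domain_id != DOMID_INVALID;
}
```

This removes the separate boolean: a caller that does not want a domain mapping passes DOMID_INVALID, and every other value is treated as a target domain.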
Re: [PATCH 2/3] xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP
Hi Jan,

On 4/26/2024 2:50 PM, Jan Beulich wrote:
> On 26.04.2024 08:30, Henry Wang wrote:
>> On 4/26/2024 2:21 PM, Jan Beulich wrote:
>>> On 26.04.2024 05:14, Henry Wang wrote:
>>>> --- a/xen/include/public/hvm/params.h
>>>> +++ b/xen/include/public/hvm/params.h
>>>> @@ -76,6 +76,7 @@
>>>>    */
>>>>   #define HVM_PARAM_STORE_PFN    1
>>>>   #define HVM_PARAM_STORE_EVTCHN 2
>>>> +#define HVM_PARAM_MAGIC_BASE_PFN 3
>>>>   #define HVM_PARAM_IOREQ_PFN    5
>>>
>>> Considering all adjacent values are used, it is overwhelmingly likely that
>>> 3 was once used, too. Such re-use needs to be done carefully. Since you
>>> need this for Arm only, that's likely okay, but doesn't go without (a)
>>> saying and (b) considering the possible future case of dom0less becoming
>>> arch-agnostic, or hyperlaunch wanting to extend the scope. Plus (c) imo
>>> this also needs at least a comment, maybe even an #ifdef, seeing how
>>> x86-focused most of the rest of this header is.
>>
>> Thanks for the feedback. These make sense. I think probably
>> dom0less/hyperlaunch will have similar use cases so the number 3 can be
>> reused at that time. Therefore, in v2, I will add more description in
>> commit message, a comment on top of this macro and protect it with
>> #ifdef. Hope this will address your concern. Thanks.
>
> FTAOD: If you foresee re-use by hyperlaunch, re-using a previously used
> number may need re-considering. Which isn't to say that number re-use is
> excluded here, but it would need at least figuring out (and then stating)
> what exactly the number was used for and until when.

I did a bit of searching and noticed that the number 3 used to be

#define HVM_PARAM_APIC_ENABLED 3

and it was removed 18 years ago in commit 6bc01e4efd50e1986a9391f75980d45691f42b74.

So I think we are likely to be OK if we reuse 3 on Arm with a proper #ifdef.

Kind regards,
Henry

> Jan
Re: [PATCH 2/3] xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP
Hi Jan,

On 4/26/2024 2:21 PM, Jan Beulich wrote:
> On 26.04.2024 05:14, Henry Wang wrote:
>> --- a/xen/include/public/hvm/params.h
>> +++ b/xen/include/public/hvm/params.h
>> @@ -76,6 +76,7 @@
>>    */
>>   #define HVM_PARAM_STORE_PFN    1
>>   #define HVM_PARAM_STORE_EVTCHN 2
>> +#define HVM_PARAM_MAGIC_BASE_PFN 3
>>   #define HVM_PARAM_IOREQ_PFN    5
>
> Considering all adjacent values are used, it is overwhelmingly likely that
> 3 was once used, too. Such re-use needs to be done carefully. Since you
> need this for Arm only, that's likely okay, but doesn't go without (a)
> saying and (b) considering the possible future case of dom0less becoming
> arch-agnostic, or hyperlaunch wanting to extend the scope. Plus (c) imo
> this also needs at least a comment, maybe even an #ifdef, seeing how
> x86-focused most of the rest of this header is.

Thanks for the feedback. These make sense. I think probably dom0less/hyperlaunch will have similar use cases so the number 3 can be reused at that time. Therefore, in v2, I will add more description in commit message, a comment on top of this macro and protect it with #ifdef. Hope this will address your concern. Thanks.

Kind regards,
Henry

> Jan
Re: [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 1:1 domUs
Hi Stefano, Daniel, On 4/26/2024 6:18 AM, Stefano Stabellini wrote: On Thu, 18 Apr 2024, Daniel P. Smith wrote: On 4/9/24 00:53, Henry Wang wrote: An error message can seen from the init-dom0less application on direct-mapped 1:1 domains: ``` Allocating magic pages memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1 Error on alloc magic pages ``` This is because populate_physmap() automatically assumes gfn == mfn for direct mapped domains. This cannot be true for the magic pages that are allocated later for 1:1 Dom0less DomUs from the init-dom0less helper application executed in Dom0. For domain using statically allocated memory but not 1:1 direct-mapped, similar error "failed to retrieve a reserved page" can be seen as the reserved memory list is empty at that time. This series tries to fix this issue using a DOMCTL-based approach, because for 1:1 direct-mapped domUs, we need to avoid the RAM regions and inform the toolstack about the region found by hypervisor for mapping the magic pages. Patch 1 introduced a new DOMCTL to get the guest memory map, currently only used for the magic page regions. Patch 2 generalized the extended region finding logic so that it can be reused for other use cases such as finding 1:1 domU magic regions. Patch 3 uses the same approach as finding the extended regions to find the guest magic page regions for direct-mapped DomUs. Patch 4 avoids hardcoding all base addresses of guest magic region in the init-dom0less application by consuming the newly introduced DOMCTL. Patch 5 is a simple patch to do some code duplication clean-up in xc. Hey Henry, To help provide some perspective, these issues are not experienced with hyperlaunch. This is because we understood early on that you cannot move a lightweight version of the toolstack into hypervisor init and not provide a mechanism to communicate what it did to the runtime control plane. 
We evaluated the possible mechanism, to include introducing a new hypercall op, and ultimately settled on using hypfs. The primary reason is this information is static data that, while informative later, is only necessary for the control plane to understand the state of the system. As a result, hyperlaunch is able to allocate any and all special pages required as part of domain construction and communicate their addresses to the control plane. As for XSM, hypfs is already protected and at this time we do not see any domain builder information needing to be restricted separately from the data already present in hypfs. I would like to make the suggestion that instead of continuing down this path, perhaps you might consider adopting the hyperlaunch usage of hypfs. Then adjust dom0less domain construction to allocate the special pages at construction time. The original hyperlaunch series includes a patch that provides the helper app for the xenstore announcement. And I can provide you with updated versions if that would be helpful. I also think that the new domctl is not needed and that the dom0less domain builder should allocate the magic pages. Yes this is indeed much better. Thanks Daniel for suggesting this. On ARM, we already allocate HVM_PARAM_CALLBACK_IRQ during dom0less domain build and set HVM_PARAM_STORE_PFN to ~0ULL. I think it would be only natural to extend that code to also allocate the magic pages and set HVM_PARAM_STORE_PFN (and others) correctly. If we do it that way it is simpler and consistent with the HVM_PARAM_CALLBACK_IRQ allocation, and we don't even need hypfs. Currently we do not enable hypfs in our safety certifiability configuration. It is indeed very important to consider the safety certification (which I completely missed). Therefore I've sent an updated version based on HVMOP [1]. In the future we can switch to hypfs if needed. [1] https://lore.kernel.org/xen-devel/20240426031455.579637-1-xin.wa...@amd.com/ Kind regards, Henry
[PATCH 2/3] xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP
For use cases such as Dom0less PV drivers, a mechanism to communicate Dom0less DomU's static data with the runtime control plane (Dom0) is needed. Since on Arm HVMOP is already the existing approach to address such use cases (for example the allocation of HVM_PARAM_CALLBACK_IRQ), add a new HVMOP key HVM_PARAM_MAGIC_BASE_PFN for storing the magic page region base PFN. The value will be set at Dom0less DomU construction time after Dom0less DomU's magic page region has been allocated. To keep consistent, also set the value for HVM_PARAM_MAGIC_BASE_PFN for libxl guests in alloc_magic_pages(). Reported-by: Alec Kwapis Signed-off-by: Henry Wang --- tools/libs/guest/xg_dom_arm.c | 2 ++ xen/arch/arm/dom0less-build.c | 2 ++ xen/arch/arm/hvm.c | 1 + xen/include/public/hvm/params.h | 1 + 4 files changed, 6 insertions(+) diff --git a/tools/libs/guest/xg_dom_arm.c b/tools/libs/guest/xg_dom_arm.c index 8cc7f27dbb..3c08782d1d 100644 --- a/tools/libs/guest/xg_dom_arm.c +++ b/tools/libs/guest/xg_dom_arm.c @@ -74,6 +74,8 @@ static int alloc_magic_pages(struct xc_dom_image *dom) xc_clear_domain_page(dom->xch, dom->guest_domid, base + MEMACCESS_PFN_OFFSET); xc_clear_domain_page(dom->xch, dom->guest_domid, dom->vuart_gfn); +xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_MAGIC_BASE_PFN, +base); xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_CONSOLE_PFN, dom->console_pfn); xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_STORE_PFN, diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c index 40dc85c759..72187c167d 100644 --- a/xen/arch/arm/dom0less-build.c +++ b/xen/arch/arm/dom0less-build.c @@ -861,6 +861,8 @@ static int __init construct_domU(struct domain *d, free_domheap_pages(magic_pg, get_order_from_pages(NR_MAGIC_PAGES)); return rc; } + +d->arch.hvm.params[HVM_PARAM_MAGIC_BASE_PFN] = gfn_x(gfn); } return rc; diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c index 0989309fea..fa6141e30c 100644 --- a/xen/arch/arm/hvm.c +++ 
b/xen/arch/arm/hvm.c
@@ -55,6 +55,7 @@ static int hvm_allow_get_param(const struct domain *d, unsigned int param)
     case HVM_PARAM_STORE_EVTCHN:
     case HVM_PARAM_CONSOLE_PFN:
     case HVM_PARAM_CONSOLE_EVTCHN:
+    case HVM_PARAM_MAGIC_BASE_PFN:
         return 0;

     /*
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index a22b4ed45d..c1720b33b9 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -76,6 +76,7 @@
  */
 #define HVM_PARAM_STORE_PFN    1
 #define HVM_PARAM_STORE_EVTCHN 2
+#define HVM_PARAM_MAGIC_BASE_PFN 3
 #define HVM_PARAM_IOREQ_PFN    5
-- 
2.34.1
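The mechanism in this patch, a per-domain parameter array written at construction time and read back through an allow-list, can be sketched in isolation as follows. This is a toy model, not the Xen implementation: the `toy_` names, the array size and the error codes are assumptions made for the example; only the index values 1, 2 and 3 follow the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy indices; 1..3 mirror the values in the patch, the rest is made up. */
enum {
    TOY_PARAM_STORE_PFN      = 1,
    TOY_PARAM_STORE_EVTCHN   = 2,
    TOY_PARAM_MAGIC_BASE_PFN = 3,
    TOY_NR_PARAMS            = 8
};

struct toy_domain {
    uint64_t params[TOY_NR_PARAMS];   /* filled in at construction time */
};

/* Allow-list: 0 means the parameter may be read, otherwise an error. */
static int toy_allow_get_param(unsigned int param)
{
    switch ( param )
    {
    case TOY_PARAM_STORE_PFN:
    case TOY_PARAM_STORE_EVTCHN:
    case TOY_PARAM_MAGIC_BASE_PFN:
        return 0;
    default:
        return -EPERM;
    }
}

/* Read one parameter, refusing out-of-range or non-listed indices. */
static int toy_get_param(const struct toy_domain *d, unsigned int param,
                         uint64_t *value)
{
    int rc;

    assert(d != NULL && value != NULL);

    if ( param >= TOY_NR_PARAMS )
        return -EINVAL;

    rc = toy_allow_get_param(param);
    if ( rc )
        return rc;

    *value = d->params[param];
    return 0;
}
```

The point of the allow-list is that adding a new index to the array is not enough: a reader in Dom0 only sees the value once the index is also added to the switch, which is what the one-line hunk in hvm.c does.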
[PATCH 0/3] Guest magic region allocation for 1:1 Dom0less domUs - Take two
Hi all,

This series tries to fix the reported guest magic region allocation issue for 1:1 Dom0less domUs. An error message can be seen from the init-dom0less application on 1:1 direct-mapped Dom0less DomUs:

```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

This is because populate_physmap() automatically assumes gfn == mfn for direct-mapped domains. This cannot be true for the magic pages that are allocated later for 1:1 Dom0less DomUs from the init-dom0less helper application executed in Dom0. For a domain using statically allocated memory but not 1:1 direct-mapped, a similar error "failed to retrieve a reserved page" can be seen, as the reserved memory list is empty at that time.

In [1] I tried to fix this issue with the domctl approach, and the discussions in [2] and [3] indicate that a domctl is not really necessary, as we can simplify the issue to "allocate the Dom0less guest magic regions at Dom0less DomU build time and pass the region base PFN to the init-dom0less application". Therefore, the first patch in this series allocates magic pages for Dom0less DomUs, the second patch stores the allocated region base PFN in the HVMOP params like HVM_PARAM_CALLBACK_IRQ, and the third patch uses the HVMOP to get the stored guest magic region base PFN to avoid hardcoding GUEST_MAGIC_BASE.

Gitlab CI for this series can be found in [4].
[1] https://lore.kernel.org/xen-devel/20240409045357.236802-1-xin.wa...@amd.com/ [2] https://lore.kernel.org/xen-devel/c7857223-eab8-409a-b618-6ec70f616...@apertussolutions.com/ [3] https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2404251508470.3940@ubuntu-linux-20-04-desktop/ [4] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1268643360 Henry Wang (3): xen/arm/dom0less-build: Alloc magic pages for Dom0less DomUs from hypervisor xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE tools/helpers/init-dom0less.c | 38 ++--- tools/libs/guest/xg_dom_arm.c | 3 ++- xen/arch/arm/dom0less-build.c | 24 + xen/arch/arm/hvm.c | 1 + xen/include/public/arch-arm.h | 1 + xen/include/public/hvm/params.h | 1 + 6 files changed, 45 insertions(+), 23 deletions(-) -- 2.34.1
[PATCH 3/3] tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE
Currently the GUEST_MAGIC_BASE in the init-dom0less application is hardcoded, which will lead to failures for 1:1 direct-mapped Dom0less DomUs.

Since the guest magic region is now allocated from the hypervisor, instead of hardcoding the guest magic pages region, use xc_hvm_param_get() to get the guest magic region PFN, and based on that the XenStore PFN can be calculated. Also, we don't need to set the max mem anymore, so drop the call to xc_domain_setmaxmem(). Rename the alloc_xs_page() to get_xs_page() to reflect the changes.

Take the opportunity to do some coding style improvements when possible.

Reported-by: Alec Kwapis
Signed-off-by: Henry Wang
---
 tools/helpers/init-dom0less.c | 38 +++
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/tools/helpers/init-dom0less.c b/tools/helpers/init-dom0less.c
index fee93459c4..7f6953a818 100644
--- a/tools/helpers/init-dom0less.c
+++ b/tools/helpers/init-dom0less.c
@@ -19,24 +19,20 @@
 #define XENSTORE_PFN_OFFSET 1
 #define STR_MAX_LENGTH 128

-static int alloc_xs_page(struct xc_interface_core *xch,
-                         libxl_dominfo *info,
-                         uint64_t *xenstore_pfn)
+static int get_xs_page(struct xc_interface_core *xch, libxl_dominfo *info,
+                       uint64_t *xenstore_pfn)
 {
     int rc;
-    const xen_pfn_t base = GUEST_MAGIC_BASE >> XC_PAGE_SHIFT;
-    xen_pfn_t p2m = (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET;
+    xen_pfn_t magic_base_pfn;

-    rc = xc_domain_setmaxmem(xch, info->domid,
-                             info->max_memkb + (XC_PAGE_SIZE/1024));
-    if (rc < 0)
-        return rc;
-
-    rc = xc_domain_populate_physmap_exact(xch, info->domid, 1, 0, 0, &p2m);
-    if (rc < 0)
-        return rc;
+    rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_MAGIC_BASE_PFN,
+                          &magic_base_pfn);
+    if (rc < 0) {
+        printf("Failed to get HVM_PARAM_MAGIC_BASE_PFN\n");
+        return 1;
+    }

-    *xenstore_pfn = base + XENSTORE_PFN_OFFSET;
+    *xenstore_pfn = magic_base_pfn + XENSTORE_PFN_OFFSET;
     rc = xc_clear_domain_page(xch, info->domid, *xenstore_pfn);
     if (rc < 0)
         return rc;
@@ -100,6 +96,7 @@ static bool do_xs_write_vm(struct xs_handle *xsh, xs_transaction_t t,
  */
 static int create_xenstore(struct xs_handle *xsh,
                            libxl_dominfo *info, libxl_uuid uuid,
+                           xen_pfn_t xenstore_pfn,
                            evtchn_port_t xenstore_port)
 {
     domid_t domid;
@@ -145,8 +142,7 @@ static int create_xenstore(struct xs_handle *xsh,
     rc = snprintf(target_memkb_str, STR_MAX_LENGTH, "%"PRIu64, info->current_memkb);
     if (rc < 0 || rc >= STR_MAX_LENGTH)
         return rc;
-    rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%lld",
-                  (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET);
+    rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%"PRIu_xen_pfn, xenstore_pfn);
     if (rc < 0 || rc >= STR_MAX_LENGTH)
         return rc;
     rc = snprintf(xenstore_port_str, STR_MAX_LENGTH, "%u", xenstore_port);
@@ -245,8 +241,8 @@ static int init_domain(struct xs_handle *xsh,
     if (!xenstore_evtchn)
         return 0;

-    /* Alloc xenstore page */
-    if (alloc_xs_page(xch, info, &xenstore_pfn) != 0) {
+    /* Get xenstore page */
+    if (get_xs_page(xch, info, &xenstore_pfn) != 0) {
         printf("Error on alloc magic pages\n");
         return 1;
     }
@@ -278,13 +274,11 @@ static int init_domain(struct xs_handle *xsh,
     if (rc < 0)
         return rc;

-    rc = create_xenstore(xsh, info, uuid, xenstore_evtchn);
+    rc = create_xenstore(xsh, info, uuid, xenstore_pfn, xenstore_evtchn);
     if (rc)
         err(1, "writing to xenstore");

-    rc = xs_introduce_domain(xsh, info->domid,
-            (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET,
-            xenstore_evtchn);
+    rc = xs_introduce_domain(xsh, info->domid, xenstore_pfn, xenstore_evtchn);
     if (!rc)
         err(1, "xs_introduce_domain");
     return 0;
-- 
2.34.1
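The core of the change, deriving the XenStore PFN from a base value obtained at run time rather than from a compile-time constant, can be sketched without libxenctrl as below. The callback type and the two example getters are hypothetical stand-ins for the real parameter lookup; only XENSTORE_PFN_OFFSET and its value come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XENSTORE_PFN_OFFSET 1

/* Hypothetical stand-in for the run-time lookup of the magic base PFN. */
typedef int (*param_get_fn)(void *ctx, uint64_t *value);

/*
 * Derive the XenStore PFN from whatever base the lookup reports.
 * A negative return from the lookup is passed straight back and
 * *xenstore_pfn is left untouched in that case.
 */
static int get_xs_pfn(param_get_fn get, void *ctx, uint64_t *xenstore_pfn)
{
    uint64_t magic_base_pfn;
    int rc;

    assert(get != NULL && xenstore_pfn != NULL);

    rc = get(ctx, &magic_base_pfn);
    if ( rc < 0 )
        return rc;

    *xenstore_pfn = magic_base_pfn + XENSTORE_PFN_OFFSET;
    return 0;
}

/* Example getter: reports the base stored behind 'ctx'. */
static int fixed_base_getter(void *ctx, uint64_t *value)
{
    *value = *(const uint64_t *)ctx;
    return 0;
}

/* Example getter: models a lookup that fails. */
static int failing_getter(void *ctx, uint64_t *value)
{
    (void)ctx;
    (void)value;
    return -1;
}
```

Because the base is now an input, the same helper works for a direct-mapped domain (where the base equals the machine frame chosen by the hypervisor) and for a domain using the default guest layout.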
[PATCH 1/3] xen/arm/dom0less-build: Alloc magic pages for Dom0less DomUs from hypervisor
There are use cases (for example using the PV driver) in a Dom0less setup that require Dom0less DomUs to start immediately with Dom0, but initialize XenStore later, after Dom0's successful boot and the call to the init-dom0less application.

An error message can be seen from the init-dom0less application on 1:1 direct-mapped domains:

```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

This is because currently the magic pages for Dom0less DomUs are populated by the init-dom0less app through populate_physmap(), and populate_physmap() automatically assumes gfn == mfn for 1:1 direct-mapped domains. This cannot be true for the magic pages that are allocated later from the init-dom0less application executed in Dom0. For a domain using statically allocated memory but not 1:1 direct-mapped, a similar error "failed to retrieve a reserved page" can be seen, as the reserved memory list is empty at that time.

To solve the above issue, this commit allocates the magic pages for Dom0less DomUs at domain construction time. The base address/PFN of the magic page region will be noted and communicated to the init-dom0less application in Dom0.

Reported-by: Alec Kwapis
Suggested-by: Daniel P.
Smith Signed-off-by: Henry Wang --- tools/libs/guest/xg_dom_arm.c | 1 - xen/arch/arm/dom0less-build.c | 22 ++ xen/include/public/arch-arm.h | 1 + 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/tools/libs/guest/xg_dom_arm.c b/tools/libs/guest/xg_dom_arm.c index 2fd8ee7ad4..8cc7f27dbb 100644 --- a/tools/libs/guest/xg_dom_arm.c +++ b/tools/libs/guest/xg_dom_arm.c @@ -25,7 +25,6 @@ #include "xg_private.h" -#define NR_MAGIC_PAGES 4 #define CONSOLE_PFN_OFFSET 0 #define XENSTORE_PFN_OFFSET 1 #define MEMACCESS_PFN_OFFSET 2 diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c index fb63ec6fd1..40dc85c759 100644 --- a/xen/arch/arm/dom0less-build.c +++ b/xen/arch/arm/dom0less-build.c @@ -834,11 +834,33 @@ static int __init construct_domU(struct domain *d, if ( kinfo.dom0less_feature & DOM0LESS_XENSTORE ) { +struct page_info *magic_pg; +mfn_t mfn; +gfn_t gfn; + ASSERT(hardware_domain); rc = alloc_xenstore_evtchn(d); if ( rc < 0 ) return rc; d->arch.hvm.params[HVM_PARAM_STORE_PFN] = ~0ULL; + +d->max_pages += NR_MAGIC_PAGES; +magic_pg = alloc_domheap_pages(d, get_order_from_pages(NR_MAGIC_PAGES), 0); +if ( magic_pg == NULL ) +return -ENOMEM; + +mfn = page_to_mfn(magic_pg); +if ( !is_domain_direct_mapped(d) ) +gfn = gaddr_to_gfn(GUEST_MAGIC_BASE); +else +gfn = gaddr_to_gfn(mfn_to_maddr(mfn)); + +rc = guest_physmap_add_pages(d, gfn, mfn, NR_MAGIC_PAGES); +if ( rc ) +{ +free_domheap_pages(magic_pg, get_order_from_pages(NR_MAGIC_PAGES)); +return rc; +} } return rc; diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h index e167e14f8d..f24e7bbe37 100644 --- a/xen/include/public/arch-arm.h +++ b/xen/include/public/arch-arm.h @@ -475,6 +475,7 @@ typedef uint64_t xen_callback_t; #define GUEST_MAGIC_BASE xen_mk_ullong(0x3900) #define GUEST_MAGIC_SIZE xen_mk_ullong(0x0100) +#define NR_MAGIC_PAGES 4 #define GUEST_RAM_BANKS 2 -- 2.34.1
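The placement rule in the hunk above reduces to a single decision, sketched here as a stand-alone function. The function name and the use of plain integers for frame numbers are assumptions made for the example; the default base is passed in as a parameter rather than hardcoded, so the sketch does not depend on the exact value of GUEST_MAGIC_BASE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Choose the guest frame number at which the magic region is mapped.
 *
 *  - direct-mapped domain: the guest frame must equal the machine frame
 *    that was just allocated (gfn == mfn);
 *  - otherwise: use the fixed base from the default guest memory layout.
 */
static uint64_t pick_magic_gfn(bool direct_mapped, uint64_t mfn,
                               uint64_t default_base_gfn)
{
    return direct_mapped ? mfn : default_base_gfn;
}
```

Doing this inside the hypervisor at construction time is what avoids the "doesn't belong to d1" failure: the frame is allocated first and the guest address follows from it, instead of a fixed guest address being requested later from Dom0.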
[PATCH v1.1] xen/common/dt-overlay: Fix missing lock when remove the device
If CONFIG_DEBUG=y, the assertion below is triggered:

(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) [ Xen-4.19-unstable arm64 debug=y Not tainted ]
(XEN) CPU: 0
(XEN) PC: 0a257418 iommu_remove_dt_device+0x8c/0xd4
(XEN) LR: 0a2573a0
(XEN) SP: 8000fff7fb30
(XEN) CPSR: 0249 MODE:64-bit EL2h (Hypervisor, handler)
[...]
(XEN) Xen call trace:
(XEN) [<0a257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN) [<0a2573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN) [<0a20797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN) [<0a207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN) [<0a208460>] dt_overlay_sysctl+0x428/0xc68
(XEN) [<0a2707f8>] arch_do_sysctl+0x1c/0x2c
(XEN) [<0a230b40>] do_sysctl+0x96c/0x9ec
(XEN) [<0a271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN) [<0a273490>] do_trap_guest_sync+0x448/0x63c
(XEN) [<0a25c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN)
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN)

This is because iommu_remove_dt_device() is called without taking the dt_host_lock. Fix the issue by taking and releasing the lock properly.

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Henry Wang
---
v1.1:
- Move the unlock position before the check of rc.
---
 xen/common/dt-overlay.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1b197381f6..ab8f43aea2 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -381,7 +381,9 @@ static int remove_node_resources(struct dt_device_node *device_node)
     {
         if ( dt_device_is_protected(device_node) )
         {
+            write_lock(&dt_host_lock);
             rc = iommu_remove_dt_device(device_node);
+            write_unlock(&dt_host_lock);
             if ( rc < 0 )
                 return rc;
         }
-- 
2.34.1
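The locking pattern of this fix, including the v1.1 detail of releasing the lock before looking at the return code, can be modelled with a toy lock as follows. Everything prefixed `toy_` is hypothetical and single-threaded; it only tracks whether the lock is held so that the callee's precondition can be asserted, in the same spirit as the rw_is_locked() assertion in the trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: records only whether a writer currently holds it. */
struct toy_rwlock {
    bool write_held;
};

static void toy_write_lock(struct toy_rwlock *l)
{
    assert(!l->write_held);
    l->write_held = true;
}

static void toy_write_unlock(struct toy_rwlock *l)
{
    assert(l->write_held);
    l->write_held = false;
}

static struct toy_rwlock toy_host_lock;

/* Callee with a locking precondition, like the assertion in the trace. */
static int toy_remove_device(int simulated_rc)
{
    assert(toy_host_lock.write_held);
    return simulated_rc;
}

/*
 * Caller: take the lock around the call and drop it before testing rc,
 * so that the early return on error cannot leave the lock held.
 */
static int toy_remove_node_resources(int simulated_rc)
{
    int rc;

    toy_write_lock(&toy_host_lock);
    rc = toy_remove_device(simulated_rc);
    toy_write_unlock(&toy_host_lock);

    if ( rc < 0 )
        return rc;

    return 0;
}
```

Had the unlock been placed after the `rc < 0` check, the error path would return with the lock still held, which is the situation the v1.1 change note addresses.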
Re: [PATCH 11/15] tools/helpers: Add get_overlay
Hi Stewart, On 4/26/2024 9:45 AM, Stewart Hildebrand wrote: On 4/24/24 20:43, Henry Wang wrote: Hi Jan, On 4/24/2024 2:08 PM, Jan Beulich wrote: On 24.04.2024 05:34, Henry Wang wrote: From: Vikram Garhwal This user level application copies the overlay dtbo shared by dom0 while doing overlay node assignment operation. It uses xenstore to communicate with dom0. More information on the protocol is writtien in docs/misc/overlay.txt file. Signed-off-by: Vikram Garhwal Signed-off-by: Stefano Stabellini Signed-off-by: Henry Wang --- tools/helpers/Makefile | 8 + tools/helpers/get_overlay.c | 393 2 files changed, 401 insertions(+) create mode 100644 tools/helpers/get_overlay.c As mentioned before on various occasions - new files preferably use dashes as separators in preference to underscores. You not doing so is particularly puzzling seeing ... --- a/tools/helpers/Makefile +++ b/tools/helpers/Makefile @@ -12,6 +12,7 @@ TARGETS += init-xenstore-domain endif ifeq ($(CONFIG_ARM),y) TARGETS += init-dom0less +TARGETS += get_overlay ... patch context here (demonstrating a whopping 3 dashes used in similar cases). I am not very sure why Vikram used "_" in the original patch. However I agree you are correct. Since I am currently doing the follow up of this series, I will use "-" in v2 as suggested. Thanks. Please also add tools/helpers/get-overlay to .gitignore Thanks for the reminder! Yes sure I will add it. Kind regards, Henry
Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM
Hi Julien, On 4/24/2024 8:58 PM, Julien Grall wrote: Hi Henry, On 24/04/2024 04:34, Henry Wang wrote: From: Vikram Garhwal Enable interrupt assign/remove for running VMs in CONFIG_OVERLAY_DTB. Currently, irq_route and mapping is only allowed at the domain creation. Adding exception for CONFIG_OVERLAY_DTB. AFAICT, this is mostly reverting b8577547236f ("xen/arm: Restrict when a physical IRQ can be routed/removed from/to a domain"). Signed-off-by: Vikram Garhwal Signed-off-by: Stefano Stabellini Signed-off-by: Henry Wang --- xen/arch/arm/gic.c | 4 1 file changed, 4 insertions(+) diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 44c40e86de..a775f886ed 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -140,8 +140,10 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq, * back to the physical IRQ. To prevent get unsync, restrict the * routing to when the Domain is been created. */ The above comment explains why the check was added. But the commit message doesn't explain why this can be disregarded for your use-case. Looking at the history, I don't think you can simply remove the checks. Regardless that... +#ifndef CONFIG_OVERLAY_DTB ... I am against such #ifdef. A distros may want to have OVERLAY_DTB enabled, yet the user will not use it. Instead, you want to remove the check once the code can properly handle routing an IRQ the domain is created or ... if ( d->creation_finished ) return -EBUSY; +#endif ret = vgic_connect_hw_irq(d, NULL, virq, desc, true); if ( ret ) @@ -171,8 +173,10 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq, * Removing an interrupt while the domain is running may have * undesirable effect on the vGIC emulation. */ +#ifndef CONFIG_OVERLAY_DTB if ( !d->is_dying ) return -EBUSY; +#endif ... removed before they domain is destroyed. Thanks for your feeedback. After checking the b8577547236f commit message I think I now understand your point. 
Do you have any suggestion about how can I properly add the support to route/remove the IRQ to running domains? Thanks. Kind regards, Henry desc->handler->shutdown(desc); Cheers,
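One possible shape for such support, loosely following the checks in the later "[PATCH v4 6/9] xen/arm/gic: Allow removing interrupt to running VMs" posting in this thread, is to keep the unconditional path for dying domains and reject removal from a running domain while the interrupt is still in flight. The sketch below only models that decision; the struct, its field names and the function are hypothetical and do not correspond to the vGIC data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state consulted before disconnecting. */
struct toy_virq_state {
    bool migrating;     /* vIRQ is being migrated between vCPUs      */
    bool visible;       /* vIRQ is currently visible to the guest    */
    bool active;        /* vIRQ is active in the guest               */
    bool desc_matches;  /* caller's descriptor is the one connected  */
};

/*
 * Decide whether a physical IRQ may be disconnected from a vIRQ.
 * Returns 0 if allowed, -EBUSY while migrating, -EINVAL otherwise.
 */
static int toy_can_disconnect(bool domain_dying,
                              const struct toy_virq_state *s)
{
    assert(s != NULL);

    if ( s->migrating )
        return -EBUSY;

    if ( !s->desc_matches )
        return -EINVAL;

    /* A running domain must not lose an IRQ that is still in flight. */
    if ( !domain_dying && (s->visible || s->active) )
        return -EINVAL;

    return 0;
}
```

This keeps the restriction Julien refers to for the cases where it matters (an interrupt the guest has not finished handling) without tying the behaviour to a build-time option.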
Re: [PATCH 14/15] add a domU script to fetch overlays and applying them to linux
Hi Jan, On 4/25/2024 2:46 PM, Jan Beulich wrote: On 25.04.2024 02:54, Henry Wang wrote: On 4/24/2024 2:16 PM, Jan Beulich wrote: On 24.04.2024 05:34, Henry Wang wrote: From: Vikram Garhwal Introduce a shell script that runs in the background and calls get_overlay to retrive overlays and add them (or remove them) to Linux device tree (running as a domU). Signed-off-by: Vikram Garhwal Signed-off-by: Stefano Stabellini Signed-off-by: Henry Wang --- tools/helpers/Makefile | 2 +- tools/helpers/get_overlay.sh | 81 2 files changed, 82 insertions(+), 1 deletion(-) create mode 100755 tools/helpers/get_overlay.sh Besides the same naming issue as in the earlier patch, the script also looks very Linux-ish. Yet ... I will fix the naming issue in v2. Would you mind elaborating a bit more about the "Linux-ish" concern? I guess this is because the original use case is on Linux, should I do anything about this? Well, the script won't work on other than Linux, will it? Therefore ... --- a/tools/helpers/Makefile +++ b/tools/helpers/Makefile @@ -58,7 +58,6 @@ init-dom0less: $(INIT_DOM0LESS_OBJS) get_overlay: $(SHARE_OVERLAY_OBJS) $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenvchan) $(LDLIBS_libxenstore) $(LDLIBS_libxenctrl) $(LDLIBS_libxengnttab) $(APPEND_LDFLAGS) - .PHONY: install install: all $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN) @@ -67,6 +66,7 @@ install: all .PHONY: uninstall uninstall: for i in $(TARGETS); do rm -f $(DESTDIR)$(LIBEXEC_BIN)/$$i; done + $(RM) $(DESTDIR)$(LIBEXEC_BIN)/get_overlay.sh .PHONY: clean clean: ... you touching only the uninstall target, it's not even clear to me how (and under what conditions) the script is going to make it into $(DESTDIR)$(LIBEXEC_BIN)/. Did you mean to add to $(TARGETS), perhaps, alongside the earlier added get-overlay binary? ... it first of needs to become clear under what conditions it is actually going to be installed. You are right, I think the get-overlay binary and this script should be installed if DTB overlay is supported. 
Checking the code, I found LIBXL_HAVE_DT_OVERLAY which can indicate if we have this feature supported in libxl. Do you think it is a good idea to use it to install these two files in Makefile? Thanks. Counter question: If it's not going to be installed, how are people going to make use of it? If the script is intended for manual use only, I think that would want saying in the description. Yet then I couldn't see why the uninstall goal would need modifying. Checking the code again, I feel like this is a mistake actually. I think this script should be installed together with the get-overlay application as the script actually calls get-overlay. The uninstall goal should remain untouched. I will fix this in v2. As to LIBXL_HAVE_DT_OVERLAY - that's not accessible from a Makefile, I guess? Yes. Kind regards, Henry Jan
Re: [PATCH 03/15] xen/arm: Always enable IOMMU
Hi Julien, On 4/24/2024 9:03 PM, Julien Grall wrote: Hi Henry, On 24/04/2024 04:34, Henry Wang wrote: From: Vikram Garhwal For overlay with iommu functionality to work with running VMs, we need to enable IOMMU by default for the domains. Signed-off-by: Vikram Garhwal Signed-off-by: Stefano Stabellini Signed-off-by: Henry Wang --- xen/arch/arm/dom0less-build.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c index fb63ec6fd1..2d1fd1e214 100644 --- a/xen/arch/arm/dom0less-build.c +++ b/xen/arch/arm/dom0less-build.c @@ -894,7 +894,8 @@ void __init create_domUs(void) panic("Missing property 'cpus' for domain %s\n", dt_node_name(node)); - if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") && + if ( (IS_ENABLED(CONFIG_OVERLAY_DTB) || Similar to the first patch, building Xen with the DTB overlay doesn't mean the user will want to use it (think of distros that may want to provide a generic Xen). Instead, we should introduce a new DT property "passthrough" that would indicate whether the IOMMU should be used. To be futureproof, I would match the values used by xl.cfg (see docs/man/xl.cfg.5.pod.in). That sounds good. I can introduce a new DT property as suggested. Thanks for the suggestion! Kind regards, Henry + dt_find_compatible_node(node, NULL, "multiboot,device-tree")) && iommu_enabled ) d_cfg.flags |= XEN_DOMCTL_CDF_iommu; Cheers,
Re: [PATCH 14/15] add a domU script to fetch overlays and applying them to linux
Hi Jan, On 4/24/2024 2:16 PM, Jan Beulich wrote: On 24.04.2024 05:34, Henry Wang wrote: From: Vikram Garhwal Introduce a shell script that runs in the background and calls get_overlay to retrive overlays and add them (or remove them) to Linux device tree (running as a domU). Signed-off-by: Vikram Garhwal Signed-off-by: Stefano Stabellini Signed-off-by: Henry Wang --- tools/helpers/Makefile | 2 +- tools/helpers/get_overlay.sh | 81 2 files changed, 82 insertions(+), 1 deletion(-) create mode 100755 tools/helpers/get_overlay.sh Besides the same naming issue as in the earlier patch, the script also looks very Linux-ish. Yet ... I will fix the naming issue in v2. Would you mind elaborating a bit more about the "Linux-ish" concern? I guess this is because the original use case is on Linux, should I do anything about this? --- a/tools/helpers/Makefile +++ b/tools/helpers/Makefile @@ -58,7 +58,6 @@ init-dom0less: $(INIT_DOM0LESS_OBJS) get_overlay: $(SHARE_OVERLAY_OBJS) $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenvchan) $(LDLIBS_libxenstore) $(LDLIBS_libxenctrl) $(LDLIBS_libxengnttab) $(APPEND_LDFLAGS) - .PHONY: install install: all $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN) @@ -67,6 +66,7 @@ install: all .PHONY: uninstall uninstall: for i in $(TARGETS); do rm -f $(DESTDIR)$(LIBEXEC_BIN)/$$i; done + $(RM) $(DESTDIR)$(LIBEXEC_BIN)/get_overlay.sh .PHONY: clean clean: ... you touching only the uninstall target, it's not even clear to me how (and under what conditions) the script is going to make it into $(DESTDIR)$(LIBEXEC_BIN)/. Did you mean to add to $(TARGETS), perhaps, alongside the earlier added get-overlay binary? You are right, I think the get-overlay binary and this script should be installed if DTB overlay is supported. Checking the code, I found LIBXL_HAVE_DT_OVERLAY which can indicate if we have this feature supported in libxl. Do you think it is a good idea to use it to install these two files in Makefile? Thanks. Kind regards, Henry Jan