[PATCH v4 6/9] xen/arm/gic: Allow removing interrupt to running VMs

2024-05-23 Thread Henry Wang
From: Vikram Garhwal 

Currently, removing physical interrupts is only allowed at
domain destroy time. For use cases such as dynamic device
tree overlay removal, removing a physical IRQ from a
running domain should be allowed.

Move the above-mentioned domain dying check to vgic_connect_hw_irq().
As with routing an interrupt to a running domain, reject the
operation if the IRQ is active or pending in the guest. Do this for
both the new and old vGIC implementations. Since vgic_connect_hw_irq()
may now reject an invalid operation, move the clearing of the
_IRQ_INPROGRESS flag in gic_remove_irq_from_guest() to after the
successful execution of vgic_connect_hw_irq().
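
As an illustration only, the rule and the flag ordering described above can be modelled in a few lines of standalone C. This is a minimal sketch, not the Xen code: struct virq_sketch, the SKETCH_* flag bits and both function names are hypothetical stand-ins for the real vGIC state, _IRQ_INPROGRESS/_IRQ_GUEST and the functions touched by this patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for _IRQ_INPROGRESS and _IRQ_GUEST. */
#define SKETCH_IRQ_INPROGRESS (1u << 0)
#define SKETCH_IRQ_GUEST      (1u << 1)

/* Hypothetical, simplified view of the guest-side state of one virtual IRQ. */
struct virq_sketch {
    bool active;
    bool pending;
};

/* Models the removal check that now lives in vgic_connect_hw_irq(). */
static int sketch_disconnect(const struct virq_sketch *irq, bool domain_dying)
{
    /* A running domain must not lose an IRQ that is active or pending. */
    if ( !domain_dying && (irq->active || irq->pending) )
        return -EINVAL;

    return 0;
}

/* Models the ordering in gic_remove_irq_from_guest(): flags are cleared last. */
static int sketch_remove_irq(unsigned int *desc_status,
                             const struct virq_sketch *irq, bool domain_dying)
{
    int ret = sketch_disconnect(irq, domain_dying);

    if ( ret )
        return ret; /* status flags are left untouched on rejection */

    *desc_status &= ~(SKETCH_IRQ_INPROGRESS | SKETCH_IRQ_GUEST);

    return 0;
}
```

In this model a rejected removal leaves the descriptor flags exactly as they were, which is the reason given above for moving the clear_bit() call.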

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
v4:
- Split the original patch, only do the removing IRQ stuff in this
  patch.
- Move the clear of _IRQ_INPROGRESS flag in gic_remove_irq_from_guest()
  to after the successful execution of vgic_connect_hw_irq().
- Special case the d->is_dying check.
---
 xen/arch/arm/gic-vgic.c  | 27 ---
 xen/arch/arm/gic.c   |  9 +
 xen/arch/arm/vgic/vgic.c | 24 
 3 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index b99e287224..56b6a3d5b0 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -439,6 +439,14 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
 
 /* We are taking to rank lock to prevent parallel connections. */
 vgic_lock_rank(v_target, rank, flags);
+/* Return with error if the IRQ is being migrated. */
+if( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
+{
+vgic_unlock_rank(v_target, rank, flags);
+return -EBUSY;
+}
+
+spin_lock(&v_target->arch.vgic.lock);
 
 if ( connect )
 {
@@ -456,12 +464,25 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, 
unsigned int virq,
 }
 else
 {
-if ( desc && p->desc != desc )
-ret = -EINVAL;
+if ( d->is_dying )
+{
+if ( desc && p->desc != desc )
+ret = -EINVAL;
+else
+p->desc = NULL;
+}
 else
-p->desc = NULL;
+{
+if ( (desc && p->desc != desc) ||
+ test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
+ test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
+ret = -EINVAL;
+else
+p->desc = NULL;
+}
 }
 
+spin_unlock(&v_target->arch.vgic.lock);
 vgic_unlock_rank(v_target, rank, flags);
 
 return ret;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b3467a76ae..8633f14bdd 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -159,24 +159,17 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
 ASSERT(test_bit(_IRQ_GUEST, &desc->status));
 ASSERT(!is_lpi(virq));
 
-/*
- * Removing an interrupt while the domain is running may have
- * undesirable effect on the vGIC emulation.
- */
-if ( !d->is_dying )
-return -EBUSY;
-
 desc->handler->shutdown(desc);
 
 /* EOI the IRQ if it has not been done by the guest */
 if ( test_bit(_IRQ_INPROGRESS, &desc->status) )
 gic_hw_ops->deactivate_irq(desc);
-clear_bit(_IRQ_INPROGRESS, &desc->status);
 
 ret = vgic_connect_hw_irq(d, NULL, virq, desc, false);
 if ( ret )
 return ret;
 
+clear_bit(_IRQ_INPROGRESS, &desc->status);
 clear_bit(_IRQ_GUEST, &desc->status);
 desc->handler = &no_irq_type;
 
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index 048e12c562..0c324b58f7 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -890,14 +890,30 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu,
 }
 else/* remove a mapped IRQ */
 {
-if ( desc && irq->hwintid != desc->irq )
+if ( d->is_dying )
 {
-ret = -EINVAL;
+if ( desc && irq->hwintid != desc->irq )
+{
+ret = -EINVAL;
+}
+else
+{
+irq->hw = false;
+irq->hwintid = 0;
+}
 }
 else
 {
-irq->hw = false;
-irq->hwintid = 0;
+if ( (desc && irq->hwintid != desc->irq) ||
+ irq->active || irq->pending_latch )
+{
+ret = -EINVAL;
+}
+else
+{
+irq->hw = false;
+irq->hwintid = 0;
+}
 }
 }
 
-- 
2.34.1




[PATCH v4 8/9] tools: Introduce the "xl dt-overlay {attach,detach}" commands

2024-05-23 Thread Henry Wang
With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach/detach devices from the provided DT overlay to domains.
Support this by introducing a new set of "xl dt-overlay" commands and
related documentation, i.e. "xl dt-overlay {attach,detach}". Slightly
rework the command option parsing logic.
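
To make the relationship between the four sub-commands and the op codes concrete, here is a small standalone sketch. It is not the xl parser from this patch (that hunk is truncated in this archive); struct overlay_request, parse_overlay_subcommand() and the SKETCH_* names are hypothetical, and only the numeric values are taken from the LIBXL_DT_OVERLAY_* macros added to libxl.h below.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values mirror the LIBXL_DT_OVERLAY_* macros introduced in libxl.h. */
#define SKETCH_OVERLAY_ADD    1
#define SKETCH_OVERLAY_REMOVE 2
#define SKETCH_DOMAIN_ATTACH  1
#define SKETCH_DOMAIN_DETACH  2

/* Hypothetical result type: which op to issue and whether a domain is needed. */
struct overlay_request {
    uint8_t op;
    bool per_domain; /* true: DOMCTL path, false: SYSCTL path */
};

/* Returns 0 and fills *req for a known sub-command, -1 otherwise. */
static int parse_overlay_subcommand(const char *name,
                                    struct overlay_request *req)
{
    static const struct {
        const char *name;
        uint8_t op;
        bool per_domain;
    } table[] = {
        { "add",    SKETCH_OVERLAY_ADD,    false },
        { "remove", SKETCH_OVERLAY_REMOVE, false },
        { "attach", SKETCH_DOMAIN_ATTACH,  true  },
        { "detach", SKETCH_DOMAIN_DETACH,  true  },
    };

    for ( size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++ )
    {
        if ( strcmp(name, table[i].name) == 0 )
        {
            req->op = table[i].op;
            req->per_domain = table[i].per_domain;
            return 0;
        }
    }

    return -1;
}
```

Because the add/remove and attach/detach op codes share the values 1 and 2, a caller in this model has to look at per_domain to pick between libxl_dt_overlay() and libxl_dt_overlay_domain().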

Signed-off-by: Henry Wang 
Reviewed-by: Jason Andryuk 
---
v4:
- Add Jason's Reviewed-by tag.
v3:
- Introduce new API libxl_dt_overlay_domain() and co., instead of
  reusing existing API libxl_dt_overlay().
- Add in-code comments for the LIBXL_DT_OVERLAY_* macros.
- Use find_domain() to avoid getting domain_id from strtol().
v2:
- New patch.
---
 tools/include/libxl.h   | 10 +++
 tools/include/xenctrl.h |  3 +++
 tools/libs/ctrl/xc_dt_overlay.c | 31 +
 tools/libs/light/libxl_dt_overlay.c | 28 +++
 tools/xl/xl_cmdtable.c  |  4 +--
 tools/xl/xl_vmcontrol.c | 42 -
 6 files changed, 104 insertions(+), 14 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 62cb07dea6..6cc6d6bf6a 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -2549,8 +2549,18 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, uint32_t domid,
 void libxl_device_pci_list_free(libxl_device_pci* list, int num);
 
 #if defined(__arm__) || defined(__aarch64__)
+/* Values should keep consistent with the op from XEN_SYSCTL_dt_overlay */
+#define LIBXL_DT_OVERLAY_ADD    1
+#define LIBXL_DT_OVERLAY_REMOVE 2
 int libxl_dt_overlay(libxl_ctx *ctx, void *overlay,
  uint32_t overlay_size, uint8_t overlay_op);
+
+/* Values should keep consistent with the op from XEN_DOMCTL_dt_overlay */
+#define LIBXL_DT_OVERLAY_DOMAIN_ATTACH 1
+#define LIBXL_DT_OVERLAY_DOMAIN_DETACH 2
+int libxl_dt_overlay_domain(libxl_ctx *ctx, uint32_t domain_id,
+void *overlay_dt, uint32_t overlay_dt_size,
+uint8_t overlay_op);
 #endif
 
 /*
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 4996855944..9ceca0cffc 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2657,6 +2657,9 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
 #if defined(__arm__) || defined(__aarch64__)
 int xc_dt_overlay(xc_interface *xch, void *overlay_fdt,
   uint32_t overlay_fdt_size, uint8_t overlay_op);
+int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt,
+ uint32_t overlay_fdt_size, uint8_t overlay_op,
+ uint32_t domain_id);
 #endif
 
 /* Compat shims */
diff --git a/tools/libs/ctrl/xc_dt_overlay.c b/tools/libs/ctrl/xc_dt_overlay.c
index c2224c4d15..ea1da522d1 100644
--- a/tools/libs/ctrl/xc_dt_overlay.c
+++ b/tools/libs/ctrl/xc_dt_overlay.c
@@ -48,3 +48,34 @@ err:
 
 return err;
 }
+
+int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt,
+ uint32_t overlay_fdt_size, uint8_t overlay_op,
+ uint32_t domain_id)
+{
+int err;
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_dt_overlay,
+.domain = domain_id,
+.u.dt_overlay = {
+.overlay_op = overlay_op,
+.overlay_fdt_size = overlay_fdt_size,
+}
+};
+
+DECLARE_HYPERCALL_BOUNCE(overlay_fdt, overlay_fdt_size,
+ XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( (err = xc_hypercall_bounce_pre(xch, overlay_fdt)) )
+goto err;
+
+set_xen_guest_handle(domctl.u.dt_overlay.overlay_fdt, overlay_fdt);
+
+if ( (err = do_domctl(xch, &domctl)) != 0 )
+PERROR("%s failed", __func__);
+
+err:
+xc_hypercall_bounce_post(xch, overlay_fdt);
+
+return err;
+}
diff --git a/tools/libs/light/libxl_dt_overlay.c b/tools/libs/light/libxl_dt_overlay.c
index a6c709a6dc..00503b76bd 100644
--- a/tools/libs/light/libxl_dt_overlay.c
+++ b/tools/libs/light/libxl_dt_overlay.c
@@ -69,3 +69,31 @@ out:
 return rc;
 }
 
+int libxl_dt_overlay_domain(libxl_ctx *ctx, uint32_t domain_id,
+void *overlay_dt, uint32_t overlay_dt_size,
+uint8_t overlay_op)
+{
+int rc;
+int r;
+GC_INIT(ctx);
+
+if (check_overlay_fdt(gc, overlay_dt, overlay_dt_size)) {
+LOG(ERROR, "Overlay DTB check failed");
+rc = ERROR_FAIL;
+goto out;
+} else {
+LOG(DEBUG, "Overlay DTB check passed");
+rc = 0;
+}
+
+r = xc_dt_overlay_domain(ctx->xch, overlay_dt, overlay_dt_size, overlay_op,
+ domain_id);
+if (r) {
+LOG(ERROR, "%s: Attaching/Detaching overlay dtb failed.", __func__);
+rc = ERROR_FAIL;
+}
+
+out:
+GC_FREE;
+return rc;
+}
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 1

[PATCH v4 5/9] xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains

2024-05-23 Thread Henry Wang
In order to support the dynamic dtbo device assignment to a running
VM, the add/remove of the DT overlay and the attach/detach of the
device from the DT overlay should happen separately. Therefore,
repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
overlay to the Xen device tree, instead of assigning the device to the
hardware domain at the same time. Add the XEN_DOMCTL_dt_overlay with
the operation XEN_DOMCTL_DT_OVERLAY_ATTACH to do the device assignment
to the domain.

The hypervisor first checks that the DT overlay passed from the
toolstack is valid. Then the device nodes are retrieved from the overlay
tracker based on the DT overlay. The device is attached by mapping its
IRQ and IOMMU resources.
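
The tracker lookup step can be pictured with the following standalone sketch. It is only a model: the real code walks a linked list under overlay_lock, whereas struct track_entry_sketch and find_entry_sketch() here are hypothetical and use a plain array. The size comparison before memcmp() is an extra guard added in this sketch and is not part of the hunk shown below.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, array-based stand-in for one overlay tracker entry. */
struct track_entry_sketch {
    const void *fdt;   /* dtbo bytes recorded when the overlay was added */
    uint32_t fdt_size; /* size of that dtbo in bytes */
};

/*
 * Return the entry whose recorded dtbo is byte-identical to the one
 * passed in, or NULL if the overlay was never added.
 */
static const struct track_entry_sketch *
find_entry_sketch(const struct track_entry_sketch *tracker, size_t nr_entries,
                  const void *fdt, uint32_t fdt_size)
{
    for ( size_t i = 0; i < nr_entries; i++ )
    {
        /* Assumption of this sketch: sizes must match before comparing. */
        if ( tracker[i].fdt_size == fdt_size &&
             memcmp(tracker[i].fdt, fdt, fdt_size) == 0 )
            return &tracker[i];
    }

    return NULL;
}
```

As in the patch, the match is on the raw dtbo contents, so the user has to pass the same dtbo that was used when the nodes were added.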

Signed-off-by: Henry Wang 
Signed-off-by: Vikram Garhwal 
---
v4:
- Split the original patch, only do the device attachment.
v3:
- Style fixes for arch-selection #ifdefs.
- Do not include public/domctl.h, only add a forward declaration of
  struct xen_domctl_dt_overlay.
- Extract the overlay track entry finding logic to a function, drop
  the unused variables.
- Use op code 1&2 for XEN_DOMCTL_DT_OVERLAY_{ATTACH,DETACH}.
v2:
- New patch.
---
 xen/arch/arm/domctl.c|   3 +
 xen/common/dt-overlay.c  | 199 ++-
 xen/include/public/domctl.h  |  14 +++
 xen/include/public/sysctl.h  |  11 +-
 xen/include/xen/dt-overlay.h |   7 ++
 5 files changed, 176 insertions(+), 58 deletions(-)

diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index ad56efb0f5..12a12ee781 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2012, Citrix Systems
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -176,6 +177,8 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
 
 return rc;
 }
+case XEN_DOMCTL_dt_overlay:
+return dt_overlay_domctl(d, &domctl->u.dt_overlay);
 default:
 return subarch_do_domctl(domctl, d, u_domctl);
 }
diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 9cece79067..1087f9b502 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -356,6 +356,42 @@ static int overlay_get_nodes_info(const void *fdto, char **nodes_full_path)
 return 0;
 }
 
+/* This function should be called with the overlay_lock taken */
+static struct overlay_track *
+find_track_entry_from_tracker(const void *overlay_fdt,
+  uint32_t overlay_fdt_size)
+{
+struct overlay_track *entry, *temp;
+bool found_entry = false;
+
+ASSERT(spin_is_locked(&overlay_lock));
+
+/*
+ * First check if dtbo is correct i.e. it should one of the dtbo which was
+ * used when dynamically adding the node.
+ * Limitation: Cases with same node names but different property are not
+ * supported currently. We are relying on user to provide the same dtbo
+ * as it was used when adding the nodes.
+ */
+list_for_each_entry_safe( entry, temp, &overlay_tracker, entry )
+{
+if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 )
+{
+found_entry = true;
+break;
+}
+}
+
+if ( !found_entry )
+{
+printk(XENLOG_ERR "Cannot find any matching tracker with input dtbo."
+   " Operation is supported only for prior added dtbo.\n");
+return NULL;
+}
+
+return entry;
+}
+
 /* Check if node itself can be removed and remove node from IOMMU. */
 static int remove_node_resources(struct dt_device_node *device_node)
 {
@@ -485,8 +521,7 @@ static long handle_remove_overlay_nodes(const void *overlay_fdt,
 uint32_t overlay_fdt_size)
 {
 int rc;
-struct overlay_track *entry, *temp, *track;
-bool found_entry = false;
+struct overlay_track *entry;
 
 rc = check_overlay_fdt(overlay_fdt, overlay_fdt_size);
 if ( rc )
@@ -494,29 +529,10 @@ static long handle_remove_overlay_nodes(const void *overlay_fdt,
 
 spin_lock(&overlay_lock);
 
-/*
- * First check if dtbo is correct i.e. it should one of the dtbo which was
- * used when dynamically adding the node.
- * Limitation: Cases with same node names but different property are not
- * supported currently. We are relying on user to provide the same dtbo
- * as it was used when adding the nodes.
- */
-list_for_each_entry_safe( entry, temp, &overlay_tracker, entry )
-{
-if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 )
-{
-track = entry;
-found_entry = true;
-break;
-}
-}
-
-if ( !found_entry )
+entry = find_track_entry_from_tracker(overlay_fdt, overlay_fdt_size);
+if ( entry == NULL )
 {
 rc = -EINVAL;
-
-printk(XENLOG_ERR "Cannot find any matching tracker with input dtbo."
-   " Removing nodes is supported only for prior added dtbo.\n&

[PATCH v4 9/9] docs: Add device tree overlay documentation

2024-05-23 Thread Henry Wang
From: Vikram Garhwal 

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
v4:
- No change.
v3:
- No change.
v2:
- Update the content based on the changes in this version.
---
 docs/misc/arm/overlay.txt | 99 +++
 1 file changed, 99 insertions(+)
 create mode 100644 docs/misc/arm/overlay.txt

diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt
new file mode 100644
index 00..811a6de369
--- /dev/null
+++ b/docs/misc/arm/overlay.txt
@@ -0,0 +1,99 @@
+# Device Tree Overlays support in Xen
+
+Xen now supports dynamic device assignment to running domains,
+i.e. adding/removing nodes (using .dtbo) to/from Xen device tree, and
+attaching/detaching them to/from a running domain with given $domid.
+
+Dynamic node assignment works in two steps:
+
+## Add/Remove device tree overlay to/from Xen device tree
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor applies/removes the dtbo to/from Xen device tree.
+
+## Attach/Detach device from the DT overlay to/from domain
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor attaches/detaches the device to/from the user-provided $domid by
+   mapping/unmapping node resources in the DT overlay.
+
+# Examples
+
+Here are a few examples on how to use it.
+
+## Dom0 device add
+
+For assigning a device tree overlay to Dom0, user should firstly properly
+prepare the DT overlay. More information about device tree overlays can be
+found in [1]. Then, in Dom0, enter the following:
+
+(dom0) xl dt-overlay add overlay.dtbo
+
+This will allocate the devices mentioned in overlay.dtbo to Xen device tree.
+
+To assign the newly added device from the dtbo to Dom0:
+
+(dom0) xl dt-overlay attach overlay.dtbo 0
+
+Next, if the user wants to add the same device tree overlay to dom0
+Linux, execute the following:
+
+(dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(dom0) cat overlay.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be loaded using:
+
+(dom0) modprobe module_name.ko
+
+## Dom0 device remove
+
+For removing the device from Dom0, first detach the device from Dom0:
+
+(dom0) xl dt-overlay detach overlay.dtbo 0
+
+NOTE: The user is expected to unload any Linux kernel modules which
+might be accessing the devices in overlay.dtbo before detaching the device.
+Detaching devices without unloading the modules might result in a crash.
+
+Then remove the overlay from Xen device tree:
+
+(dom0) xl dt-overlay remove overlay.dtbo
+
+## DomU device add/remove
+
+All the nodes in dtbo will be assigned to a domain; the user will need
+to prepare the dtb for the domU. For example, the `interrupt-parent` property
+of the DomU overlay should be changed to the Xen hardcoded value `0xfde8`.
+Below assumes the properly written DomU dtbo is `overlay_domu.dtbo`.
+
+User will need to create the DomU with below properties properly configured
+in the xl config file:
+- `iomem`
+- `passthrough` (if IOMMU is needed)
+
+User will also need to modprobe the relevant drivers.
+
+Example for domU device add:
+
+(dom0) xl dt-overlay add overlay.dtbo # If not executed before
+(dom0) xl dt-overlay attach overlay.dtbo $domid
+(dom0) xl console $domid # To access $domid console
+
+Next, if the user needs to modify/prepare the overlay.dtbo suitable for
+the domU:
+
+(domU) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(domU) cat overlay_domu.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be probed:
+
+(domU) modprobe module_name.ko
+
+Example for domU overlay remove:
+
+(dom0) xl dt-overlay detach overlay.dtbo $domid
+(dom0) xl dt-overlay remove overlay.dtbo
+
+[1] https://www.kernel.org/doc/Documentation/devicetree/overlay-notes.txt
-- 
2.34.1




[PATCH v4 1/9] tools/xl: Correct the help information and exit code of the dt-overlay command

2024-05-23 Thread Henry Wang
Fix the name mismatch in the xl dt-overlay command: the
command name should be "dt-overlay" instead of "dt_overlay".
Add the missing "," in the cmdtable.

Fix the exit code of the dt-overlay command: use EXIT_FAILURE
instead of ERROR_FAIL.

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support")
Suggested-by: Anthony PERARD 
Signed-off-by: Henry Wang 
Reviewed-by: Jason Andryuk 
---
v4:
- No change.
v3:
- Add Jason's Reviewed-by tag.
v2:
- New patch
---
 tools/xl/xl_cmdtable.c  | 2 +-
 tools/xl/xl_vmcontrol.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 62bdb2aeaa..1f3c6b5897 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -635,7 +635,7 @@ const struct cmd_spec cmd_table[] = {
 { "dt-overlay",
   &main_dt_overlay, 0, 1,
   "Add/Remove a device tree overlay",
-  "add/remove <.dtbo>"
+  "add/remove <.dtbo>",
   "-h print this help\n"
 },
 #endif
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 98f6bd2e76..02575d5d36 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1278,7 +1278,7 @@ int main_dt_overlay(int argc, char **argv)
 const int overlay_remove_op = 2;
 
 if (argc < 2) {
-help("dt_overlay");
+help("dt-overlay");
 return EXIT_FAILURE;
 }
 
@@ -1302,11 +1302,11 @@ int main_dt_overlay(int argc, char **argv)
 fprintf(stderr, "failed to read the overlay device tree file %s\n",
 overlay_config_file);
 free(overlay_dtb);
-return ERROR_FAIL;
+return EXIT_FAILURE;
 }
 } else {
 fprintf(stderr, "overlay dtbo file not provided\n");
-return ERROR_FAIL;
+return EXIT_FAILURE;
 }
 
 rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
-- 
2.34.1




[PATCH v4 3/9] tools/arm: Introduce the "nr_spis" xl config entry

2024-05-23 Thread Henry Wang
Currently, the number of SPIs allocated to the domain is only
configurable for Dom0less DomUs. Xen domains are supposed to be
platform agnostic and therefore the number of SPIs for libxl
guests should not be based on the hardware.

Introduce a new xl config entry for Arm to provide a method for
the user to decide the number of SPIs. This would help to avoid
bumping the `config->arch.nr_spis` in libxl every time there is a
new platform with increased SPI numbers.

Update the doc and the golang bindings accordingly.
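
The selection rule can be summarised by the standalone sketch below. effective_nr_spis() is a hypothetical helper, not a libxl function; it only restates the max() used in libxl_arm.c further down, and it assumes that an unspecified `nr_spis` reaches libxl as 0.

```c
#include <stdint.h>

/*
 * Pick the number of SPIs for the domain: the user value only wins when
 * it is larger than what the toolstack computed from the assigned devices.
 * Assumption: an unspecified nr_spis is seen here as 0.
 */
static uint32_t effective_nr_spis(uint32_t toolstack_nr_spis,
                                  uint32_t user_nr_spis)
{
    return user_nr_spis > toolstack_nr_spis ? user_nr_spis
                                            : toolstack_nr_spis;
}
```

In other words, the config entry can raise the SPI count above the computed value but never lower it.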

Signed-off-by: Henry Wang 
Reviewed-by: Jason Andryuk 
---
v4:
- Add Jason's Reviewed-by tag.
v3:
- Reword documentation to avoid ambiguity.
v2:
- New patch to replace the original patch in v1:
  "[PATCH 05/15] tools/libs/light: Increase nr_spi to 160"
---
 docs/man/xl.cfg.5.pod.in | 14 ++
 tools/golang/xenlight/helpers.gen.go |  2 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/libs/light/libxl_arm.c |  4 ++--
 tools/libs/light/libxl_types.idl |  1 +
 tools/xl/xl_parse.c  |  3 +++
 6 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 8f2b375ce9..416d582844 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -3072,6 +3072,20 @@ raised.
 
 =back
 
+=over 4
+
+=item B
+
+An optional 32-bit integer parameter specifying the number of SPIs (Shared
+Peripheral Interrupts) to allocate for the domain. If the value specified by
+the `nr_spis` parameter is smaller than the number of SPIs calculated by the
+toolstack based on the devices allocated for the domain, or the `nr_spis`
+parameter is not specified, the value calculated by the toolstack will be used
+for the domain. Otherwise, the value specified by the `nr_spis` parameter will
+be used.
+
+=back
+
 =head3 x86
 
 =over 4
diff --git a/tools/golang/xenlight/helpers.gen.go b/tools/golang/xenlight/helpers.gen.go
index b9cb5b33c7..fe5110474d 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1154,6 +1154,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.ArchArm.SveVl = SveType(xc.arch_arm.sve_vl)
+x.ArchArm.NrSpis = uint32(xc.arch_arm.nr_spis)
 if err := x.ArchX86.MsrRelaxed.fromC(&xc.arch_x86.msr_relaxed);err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
@@ -1670,6 +1671,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.arch_arm.sve_vl = C.libxl_sve_type(x.ArchArm.SveVl)
+xc.arch_arm.nr_spis = C.uint32_t(x.ArchArm.NrSpis)
 if err := x.ArchX86.MsrRelaxed.toC(&xc.arch_x86.msr_relaxed); err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
diff --git a/tools/golang/xenlight/types.gen.go b/tools/golang/xenlight/types.gen.go
index 5b293755d7..c9e45b306f 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -597,6 +597,7 @@ ArchArm struct {
 GicVersion GicVersion
 Vuart VuartType
 SveVl SveType
+NrSpis uint32
 }
 ArchX86 struct {
 MsrRelaxed Defbool
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 1cb89fa584..a4029e3ac8 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -181,8 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 
 LOG(DEBUG, "Configure the domain");
 
-config->arch.nr_spis = nr_spis;
-LOG(DEBUG, " - Allocate %u SPIs", nr_spis);
+config->arch.nr_spis = max(nr_spis, d_config->b_info.arch_arm.nr_spis);
+LOG(DEBUG, " - Allocate %u SPIs", config->arch.nr_spis);
 
 switch (d_config->b_info.arch_arm.gic_version) {
 case LIBXL_GIC_VERSION_DEFAULT:
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 79e9c656cc..4e65e6fda5 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -722,6 +722,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
 ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
("vuart", libxl_vuart_type),
("sve_vl", libxl_sve_type),
+   ("nr_spis", uint32),
   ])),
 ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool),
   ])),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index c504ab3711..e3a4800f6e 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2935,6 +2935,9 @@ skip_usbdev:
 }
 }
 
+if (!xlu_cfg_get_long (config, "nr_spis", &l, 0))
+b_info->arch_arm.nr_spis = l;
+
 parse_vkb_list(config, d_config);
 
 d_config->virtios = NULL;
-- 
2.34.1




[PATCH v4 7/9] xen/arm: Support device detachment from domains

2024-05-23 Thread Henry Wang
Similar to the device attachment from DT overlay to domain, this
commit implements the device detachment from domain. The DOMCTL
XEN_DOMCTL_dt_overlay op is extended to have the operation
XEN_DOMCTL_DT_OVERLAY_DETACH. The detachment of the device is
implemented by unmapping the IRQ and IOMMU resources. Note that with
these changes, the device de-registration from the IOMMU driver should
only happen at the time when the DT overlay is removed from the Xen
device tree.

Signed-off-by: Henry Wang 
Signed-off-by: Vikram Garhwal 
---
v4:
- Split the original patch, only do device detachment from domain.
---
 xen/common/dt-overlay.c | 243 
 xen/include/public/domctl.h |   3 +-
 2 files changed, 194 insertions(+), 52 deletions(-)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1087f9b502..693b6e4777 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -392,24 +392,100 @@ find_track_entry_from_tracker(const void *overlay_fdt,
 return entry;
 }
 
+static int remove_irq(unsigned long s, unsigned long e, void *data)
+{
+struct domain *d = data;
+int rc = 0;
+
+/*
+ * IRQ should always have access unless there are duplication of
+ * of irqs in device tree. There are few cases of xen device tree
+ * where there are duplicate interrupts for the same node.
+ */
+if (!irq_access_permitted(d, s))
+return 0;
+/*
+ * TODO: We don't handle shared IRQs for now. So, it is assumed that
+ * the IRQs was not shared with another domain.
+ */
+rc = irq_deny_access(d, s);
+if ( rc )
+{
+printk(XENLOG_ERR "unable to revoke access for irq %ld\n", s);
+return rc;
+}
+
+rc = release_guest_irq(d, s);
+if ( rc )
+{
+printk(XENLOG_ERR "unable to release irq %ld\n", s);
+return rc;
+}
+
+return rc;
+}
+
+static int remove_all_irqs(struct rangeset *irq_ranges, struct domain *d)
+{
+return rangeset_report_ranges(irq_ranges, 0, ~0UL, remove_irq, d);
+}
+
+static int remove_iomem(unsigned long s, unsigned long e, void *data)
+{
+struct domain *d = data;
+int rc = 0;
+p2m_type_t t;
+mfn_t mfn;
+
+mfn = p2m_lookup(d, _gfn(s), &t);
+if ( mfn_x(mfn) == 0 || mfn_x(mfn) == ~0UL )
+return -EINVAL;
+
+rc = iomem_deny_access(d, s, e);
+if ( rc )
+{
+printk(XENLOG_ERR "Unable to remove %pd access to %#lx - %#lx\n",
+   d, s, e);
+return rc;
+}
+
+rc = unmap_mmio_regions(d, _gfn(s), e - s, _mfn(s));
+if ( rc )
+return rc;
+
+return rc;
+}
+
+static int remove_all_iomems(struct rangeset *iomem_ranges, struct domain *d)
+{
+return rangeset_report_ranges(iomem_ranges, 0, ~0UL, remove_iomem, d);
+}
+
 /* Check if node itself can be removed and remove node from IOMMU. */
-static int remove_node_resources(struct dt_device_node *device_node)
+static int remove_node_resources(struct dt_device_node *device_node,
+ struct domain *d)
 {
 int rc = 0;
 unsigned int len;
 domid_t domid;
 
-domid = dt_device_used_by(device_node);
+if ( !d )
+{
+domid = dt_device_used_by(device_node);
 
-dt_dprintk("Checking if node %s is used by any domain\n",
-   device_node->full_name);
+dt_dprintk("Checking if node %s is used by any domain\n",
+   device_node->full_name);
 
-/* Remove the node if only it's assigned to hardware domain or domain io. */
-if ( domid != hardware_domain->domain_id && domid != DOMID_IO )
-{
-printk(XENLOG_ERR "Device %s is being used by domain %u. Removing nodes failed\n",
-   device_node->full_name, domid);
-return -EINVAL;
+/*
+ * We also check if device is assigned to DOMID_IO as when a domain
+ * is destroyed device is assigned to DOMID_IO.
+ */
+if ( domid != DOMID_IO )
+{
+printk(XENLOG_ERR "Device %s is being assigned to %u. Device is assigned to %d\n",
+   device_node->full_name, DOMID_IO, domid);
+return -EINVAL;
+}
 }
 
 /* Check if iommu property exists. */
@@ -417,9 +493,12 @@ static int remove_node_resources(struct dt_device_node *device_node)
 {
 if ( dt_device_is_protected(device_node) )
 {
-rc = iommu_remove_dt_device(device_node);
-if ( rc < 0 )
-return rc;
+if ( !list_empty(&device_node->domain_list) )
+{
+rc = iommu_deassign_dt_device(d, device_node);
+if ( rc < 0 )
+return rc;
+}
 }
 }
 
@@ -428,7 +507,8 @@ static int remove_node_resources(struct dt_device_node *device_node)
 
 /* Remove all descendants 

[PATCH v4 4/9] xen/arm/gic: Allow adding interrupt to running VMs

2024-05-23 Thread Henry Wang
Currently, adding physical interrupts is only allowed at
domain creation time. For use cases such as dynamic device
tree overlay addition, adding a physical IRQ to a
running domain should be allowed.

Drop the above-mentioned domain creation check. Since this
would let the virtual and physical interrupt state get out of sync
when the interrupt is active or pending in the guest, simply
reject the operation in these cases. Do this for both the new and old
vGIC implementations.
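
A minimal standalone model of the new connect-time check is sketched below. struct virq_model and try_connect_sketch() are hypothetical names; the real checks operate on struct pending_irq (old vGIC) and struct vgic_irq (new vGIC) under the relevant locks, as the hunks below show.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one virtual IRQ's state. */
struct virq_model {
    bool hw;      /* already backed by a physical IRQ */
    bool enabled; /* enabled by the guest */
    bool active;  /* active in the guest */
    bool pending; /* pending in the guest */
};

/* Connect a physical IRQ only if the virtual IRQ is completely idle. */
static int try_connect_sketch(struct virq_model *irq)
{
    if ( irq->hw || irq->enabled || irq->active || irq->pending )
        return -EBUSY;

    irq->hw = true;

    return 0;
}
```

With the creation-time restriction gone, this state check is the only guard in the model against routing an IRQ whose virtual state is already in use.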

Signed-off-by: Henry Wang 
---
v4:
- Split the original patch, only do the adding IRQ stuff in this
  patch.
v3:
- Update in-code comments.
- Correct the if conditions.
- Add taking/releasing the vgic lock of the vcpu.
v2:
- Reject the case where the IRQ is active or pending in guest.
---
 xen/arch/arm/gic-vgic.c  | 9 +++--
 xen/arch/arm/gic.c   | 8 
 xen/arch/arm/vgic/vgic.c | 7 +--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 56490dbc43..b99e287224 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -442,9 +442,14 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
 
 if ( connect )
 {
-/* The VIRQ should not be already enabled by the guest */
+/*
+ * The VIRQ should not be already enabled by the guest nor
+ * active/pending in the guest.
+ */
 if ( !p->desc &&
- !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+ !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
 p->desc = desc;
 else
 ret = -EBUSY;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 44c40e86de..b3467a76ae 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -135,14 +135,6 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
 ASSERT(virq < vgic_num_irqs(d));
 ASSERT(!is_lpi(virq));
 
-/*
- * When routing an IRQ to guest, the virtual state is not synced
- * back to the physical IRQ. To prevent get unsync, restrict the
- * routing to when the Domain is been created.
- */
-if ( d->creation_finished )
-return -EBUSY;
-
 ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);
 if ( ret )
 return ret;
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index b9463a5f27..048e12c562 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -876,8 +876,11 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu,
 
 if ( connect )  /* assign a mapped IRQ */
 {
-/* The VIRQ should not be already enabled by the guest */
-if ( !irq->hw && !irq->enabled )
+/*
+ * The VIRQ should not be already enabled by the guest nor
+ * active/pending in the guest
+ */
+if ( !irq->hw && !irq->enabled && !irq->active && !irq->pending_latch )
 {
 irq->hw = true;
 irq->hwintid = desc->irq;
-- 
2.34.1




[PATCH v4 2/9] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs

2024-05-23 Thread Henry Wang
There are some use cases in which the dom0less domUs need to have
the XEN_DOMCTL_CDF_iommu set at the domain construction time. For
example, the dynamic dtbo feature allows the domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need to have a way to specify that the domain will need the IOMMU
mapping at domain construction time.

Introduce a "passthrough" DT property for Dom0less DomUs following
the same entry as the xl.cfg. Currently only provide two options,
i.e. "enabled" and "disabled". Set the XEN_DOMCTL_CDF_iommu at domain
construction time based on the property.
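
The string handling can be illustrated with the standalone sketch below. passthrough_enabled_sketch() is a hypothetical helper that models only the comparison visible in the dom0less-build.c hunk; how the result is combined with the partial device tree and the host IOMMU state is not shown in this (truncated) archive copy and is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true only for the exact property value "enabled".
 * prop is NULL when the "passthrough" property is absent.
 */
static bool passthrough_enabled_sketch(const char *prop)
{
    return prop != NULL && strcmp(prop, "enabled") == 0;
}
```

Any other string, including "enable", is treated in this model the same way as a missing property.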

Signed-off-by: Henry Wang 
---
v4:
- No change.
v3:
- Use a separate variable to cache the condition from the "passthrough"
  flag separately to improve readability.
- Update the doc to explain the default condition more clearly.
v2:
- New patch to replace the original patch in v1:
  "[PATCH 03/15] xen/arm: Always enable IOMMU"
---
 docs/misc/arm/device-tree/booting.txt | 16 
 xen/arch/arm/dom0less-build.c | 11 +--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2..f1fd069c87 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -260,6 +260,22 @@ with the following properties:
 value specified by Xen command line parameter gnttab_max_maptrack_frames
 (or its default value if unspecified, i.e. 1024) is used.
 
+- passthrough
+
+A string property specifying whether IOMMU mappings are enabled for the
+domain and hence whether it will be enabled for passthrough hardware.
+Possible property values are:
+
+- "enabled"
+IOMMU mappings are enabled for the domain. Note that this option is the
+default if the user provides the device partial passthrough device tree
+for the domain.
+
+- "disabled"
+IOMMU mappings are disabled for the domain and so hardware may not be
+passed through. This option is the default if this property is missing
+and the user does not provide the device partial device tree for the domain.
+
 Under the "xen,domain" compatible node, one or more sub-nodes are present
 for the DomU kernel and ramdisk.
 
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..5830a7051d 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -848,6 +848,8 @@ static int __init construct_domU(struct domain *d,
 void __init create_domUs(void)
 {
 struct dt_device_node *node;
+const char *dom0less_iommu;
+bool iommu = false;
 const struct dt_device_node *cpupool_node,
 *chosen = dt_find_node_by_path("/chosen");
 
@@ -895,8 +897,13 @@ void __init create_domUs(void)
 panic("Missing property 'cpus' for domain %s\n",
   dt_node_name(node));
 
-if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") &&
- iommu_enabled )
+if ( !dt_property_read_string(node, "passthrough", &dom0less_iommu) &&
+ !strcmp(dom0less_iommu, "enabled") )
+iommu = true;
+
+if ( iommu_enabled &&
+ (iommu || dt_find_compatible_node(node, NULL,
+   "multiboot,device-tree")) )
 d_cfg.flags |= XEN_DOMCTL_CDF_iommu;
 
 if ( !dt_property_read_u32(node, "nr_spis", &d_cfg.arch.nr_spis) )
-- 
2.34.1




[PATCH v4 0/9] Remaining patches for dynamic node programming using overlay dtbo

2024-05-23 Thread Henry Wang
Hi all,

This is the remaining series for the fully functional "dynamic node
programming using overlay dtbo" feature. The first part [1] has
already been merged.

Quoting from the original series, the first part made Xen aware of
new device tree nodes, i.e. it updates dt_host with the overlay node
information. The goal of this series is to do the actual IOMMU and
IRQ mapping and unmapping for a running domain at runtime. Documentation
of the "dynamic node programming using overlay dtbo" feature is also
added.

During the discussion of v3, it was recommended that I split the
attach/detach of overlay devices to/from running domains into separate
patches [3]. However, I decided to expose the xl user interfaces to
users together, only once device attach/detach is fully functional, so
I didn't split the toolstack patch (#8).

Patch 1 fixes an issue in the existing code that I noticed during my
local tests; please see its commit message for details.

Gitlab CI for this series can be found in [2].

[1] https://lore.kernel.org/xen-devel/20230906011631.30310-1-vikram.garh...@amd.com/
[2] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1301720278
[3] https://lore.kernel.org/xen-devel/e743d3d2-5884-4e55-8627-85985ba33...@amd.com/

Henry Wang (7):
  tools/xl: Correct the help information and exit code of the dt-overlay
command
  xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
  tools/arm: Introduce the "nr_spis" xl config entry
  xen/arm/gic: Allow adding interrupt to running VMs
  xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains
  xen/arm: Support device detachment from domains
  tools: Introduce the "xl dt-overlay {attach,detach}" commands

Vikram Garhwal (2):
  xen/arm/gic: Allow removing interrupt to running VMs
  docs: Add device tree overlay documentation

 docs/man/xl.cfg.5.pod.in  |  14 +
 docs/misc/arm/device-tree/booting.txt |  16 +
 docs/misc/arm/overlay.txt |  99 ++
 tools/golang/xenlight/helpers.gen.go  |   2 +
 tools/golang/xenlight/types.gen.go|   1 +
 tools/include/libxl.h |  10 +
 tools/include/xenctrl.h   |   3 +
 tools/libs/ctrl/xc_dt_overlay.c   |  31 ++
 tools/libs/light/libxl_arm.c  |   4 +-
 tools/libs/light/libxl_dt_overlay.c   |  28 ++
 tools/libs/light/libxl_types.idl  |   1 +
 tools/xl/xl_cmdtable.c|   4 +-
 tools/xl/xl_parse.c   |   3 +
 tools/xl/xl_vmcontrol.c   |  48 ++-
 xen/arch/arm/dom0less-build.c |  11 +-
 xen/arch/arm/domctl.c |   3 +
 xen/arch/arm/gic-vgic.c   |  36 ++-
 xen/arch/arm/gic.c|  17 +-
 xen/arch/arm/vgic/vgic.c  |  31 +-
 xen/common/dt-overlay.c   | 438 --
 xen/include/public/domctl.h   |  15 +
 xen/include/public/sysctl.h   |  11 +-
 xen/include/xen/dt-overlay.h  |   7 +
 23 files changed, 678 insertions(+), 155 deletions(-)
 create mode 100644 docs/misc/arm/overlay.txt

-- 
2.34.1




Re: [PATCH v3 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs

2024-05-22 Thread Henry Wang

Hi Julien, Stefano,

On 5/22/2024 9:03 PM, Julien Grall wrote:

Hi Henry,

On 22/05/2024 02:22, Henry Wang wrote:
Also, while looking at the locking, I noticed that we are not doing anything
with GIC_IRQ_GUEST_MIGRATING. In gic_update_one_lr(), we seem to assume that
if the flag is set, then p->desc cannot be NULL.

Can we reach vgic_connect_hw_irq() with the flag set?

I think even from the perspective of making the code extra safe, we should
also check GIC_IRQ_GUEST_MIGRATING as the LR is allocated for this case. I
will also add the check of GIC_IRQ_GUEST_MIGRATING here.

Yes. I think it might be easier to check for GIC_IRQ_GUEST_MIGRATING
early and return error immediately in that case. Otherwise, we can
continue and take spin_lock(&v_target->arch.vgic.lock) because no
migration is in progress.


Ok, this makes sense to me, I will add

 if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
 {
 vgic_unlock_rank(v_target, rank, flags);
 return -EBUSY;
 }

right after taking the vgic rank lock.
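
To make the intended ordering explicit, below is a small stand-alone model,
not the actual Xen code. struct model_irq and model_connect() are
hypothetical; the boolean fields only stand in for the rank lock, the
per-vCPU vgic lock and the GIC_IRQ_GUEST_MIGRATING bit in p->status.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real locks and status bit. */
struct model_irq {
    bool rank_locked;   /* models vgic_lock_rank()/vgic_unlock_rank() */
    bool vcpu_locked;   /* models the per-vCPU arch.vgic.lock */
    bool migrating;     /* models GIC_IRQ_GUEST_MIGRATING */
    bool connected;     /* models the state changed under both locks */
};

int model_connect(struct model_irq *irq)
{
    irq->rank_locked = true;            /* take the rank lock first */

    if ( irq->migrating )
    {
        /* Bail out early; the vCPU lock was never taken. */
        irq->rank_locked = false;
        return -EBUSY;
    }

    irq->vcpu_locked = true;            /* safe: no migration in progress */
    irq->connected = true;
    irq->vcpu_locked = false;
    irq->rank_locked = false;

    return 0;
}
```

The point of the model is only that the error path releases exactly the one
lock it holds, and that the second lock is taken after the check.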


Summary of our yesterday's discussion on Matrix:
For the split of patch mentioned in...

I think that would be ok. I have to admit, I am still a bit wary about 
allowing to remove interrupts when the domain is running.


I am less concerned about the add part. Do you need the remove part 
now? If not, I would suggest to split in two so we can get the most of 
this series merged for 4.19 and continue to deal with the remove path 
in the background.


...here, I will do that in the next version.


I will answer here to the other reply:

> I don't think so, if I am not mistaken, no LR will be allocated with 
other flags set.


I wasn't necessarily thinking about the LR allocation. I was more 
thinking whether there are any flags that could still be set.


IOW, will the vIRQ be like new once vgic_connect_hw_irq() is successful?

Also, while looking at the flags, I noticed we clear _IRQ_INPROGRESS 
before vgic_connect_hw_irq(). Shouldn't we only clear *after*?


This is a good catch. With the logic of vgic_connect_hw_irq() extended 
to reject the invalid cases, it is indeed safer to clear 
_IRQ_INPROGRESS only after vgic_connect_hw_irq() has succeeded. I will 
move it after.
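
A minimal sketch of that ordering, as a stand-alone model rather than the
real gic_remove_irq_from_guest(): struct model_desc, model_disconnect() and
model_remove_irq() are hypothetical names, and guest_busy stands in for "the
IRQ is still active or pending in the guest".

```c
#include <errno.h>
#include <stdbool.h>

struct model_desc {
    bool inprogress;    /* models the _IRQ_INPROGRESS flag */
    bool guest_busy;    /* models an IRQ still active/pending in the guest */
};

/* Hypothetical stand-in for vgic_connect_hw_irq(..., false). */
int model_disconnect(const struct model_desc *desc)
{
    return desc->guest_busy ? -EBUSY : 0;
}

int model_remove_irq(struct model_desc *desc)
{
    int ret = model_disconnect(desc);

    if ( ret )
        return ret;             /* flag left untouched on failure */

    desc->inprogress = false;   /* cleared only after success */

    return 0;
}
```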


This brings to another question. You don't special case a dying 
domain. If the domain is crashing, wouldn't this mean it wouldn't be 
possible to destroy it?


Another good point, thanks. I will try to make a special case of the 
dying domain.


Kind regards,
Henry




Cheers,






Re: [PATCH v3 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs

2024-05-21 Thread Henry Wang

Hi Stefano,

On 5/22/2024 9:16 AM, Stefano Stabellini wrote:

On Wed, 22 May 2024, Henry Wang wrote:

Hi Julien,

On 5/21/2024 8:30 PM, Julien Grall wrote:

Hi,

On 21/05/2024 05:35, Henry Wang wrote:

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 56490dbc43..956c11ba13 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -439,24 +439,33 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     /* We are taking to rank lock to prevent parallel connections. */
     vgic_lock_rank(v_target, rank, flags);
+    spin_lock(&v_target->arch.vgic.lock);

I know this is what Stefano suggested, but v_target would point to the
current affinity whereas the interrupt may be pending/active on the
"previous" vCPU. So it is a little unclear whether v_target is the correct
lock. Do you have more pointer to show this is correct?

No I think you are correct, we have discussed this in the initial version of
this patch. Sorry.

I followed the way from that discussion to note down the vcpu ID and retrieve
here, below is the diff, would this make sense to you?

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 956c11ba13..134ed4e107 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -439,7 +439,7 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,

  /* We are taking to rank lock to prevent parallel connections. */
  vgic_lock_rank(v_target, rank, flags);
-    spin_lock(&v_target->arch.vgic.lock);
+    spin_lock(&d->vcpu[p->spi_vcpu_id]->arch.vgic.lock);

  if ( connect )
  {
@@ -465,7 +465,7 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
  p->desc = NULL;
  }

-    spin_unlock(&v_target->arch.vgic.lock);
+    spin_unlock(&d->vcpu[p->spi_vcpu_id]->arch.vgic.lock);
  vgic_unlock_rank(v_target, rank, flags);

  return ret;
diff --git a/xen/arch/arm/include/asm/vgic.h b/xen/arch/arm/include/asm/vgic.h
index 79b73a0dbb..f4075d3e75 100644
--- a/xen/arch/arm/include/asm/vgic.h
+++ b/xen/arch/arm/include/asm/vgic.h
@@ -85,6 +85,7 @@ struct pending_irq
  uint8_t priority;
  uint8_t lpi_priority;   /* Caches the priority if this is an LPI. */
  uint8_t lpi_vcpu_id;    /* The VCPU for an LPI. */
+    uint8_t spi_vcpu_id;    /* The VCPU for an SPI. */
  /* inflight is used to append instances of pending_irq to
   * vgic.inflight_irqs */
  struct list_head inflight;
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index c04fc4f83f..e852479f13 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -632,6 +632,7 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
  }
  list_add_tail(&n->inflight, &v->arch.vgic.inflight_irqs);
  out:
+    n->spi_vcpu_id = v->vcpu_id;
  spin_unlock_irqrestore(&v->arch.vgic.lock, flags);

  /* we have a new higher priority irq, inject it into the guest */
  vcpu_kick(v);



Also, while looking at the locking, I noticed that we are not doing anything
with GIC_IRQ_GUEST_MIGRATING. In gic_update_one_lr(), we seem to assume that
if the flag is set, then p->desc cannot be NULL.

Can we reach vgic_connect_hw_irq() with the flag set?

I think even from the perspective of making the code extra safe, we should
also check GIC_IRQ_GUEST_MIGRATING as the LR is allocated for this case. I
will also add the check of GIC_IRQ_GUEST_MIGRATING here.

Yes. I think it might be easier to check for GIC_IRQ_GUEST_MIGRATING
early and return error immediately in that case. Otherwise, we can
continue and take spin_lock(&v_target->arch.vgic.lock) because no
migration is in progress.


Ok, this makes sense to me, I will add

    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
    {
    vgic_unlock_rank(v_target, rank, flags);
    return -EBUSY;
    }

right after taking the vgic rank lock.

Kind regards,
Henry



Re: [PATCH v3 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs

2024-05-21 Thread Henry Wang

Hi Julien,

On 5/21/2024 8:30 PM, Julien Grall wrote:

Hi,

On 21/05/2024 05:35, Henry Wang wrote:

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 56490dbc43..956c11ba13 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -439,24 +439,33 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     /* We are taking to rank lock to prevent parallel connections. */
     vgic_lock_rank(v_target, rank, flags);
+    spin_lock(&v_target->arch.vgic.lock);


I know this is what Stefano suggested, but v_target would point to the 
current affinity whereas the interrupt may be pending/active on the 
"previous" vCPU. So it is a little unclear whether v_target is the 
correct lock. Do you have more pointer to show this is correct?


No I think you are correct, we have discussed this in the initial 
version of this patch. Sorry.


I followed the way from that discussion to note down the vcpu ID and 
retrieve here, below is the diff, would this make sense to you?


diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 956c11ba13..134ed4e107 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -439,7 +439,7 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,

 /* We are taking to rank lock to prevent parallel connections. */
 vgic_lock_rank(v_target, rank, flags);
-    spin_lock(&v_target->arch.vgic.lock);
+    spin_lock(&d->vcpu[p->spi_vcpu_id]->arch.vgic.lock);

 if ( connect )
 {
@@ -465,7 +465,7 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, unsigned int virq,
 p->desc = NULL;
 }

-    spin_unlock(&v_target->arch.vgic.lock);
+    spin_unlock(&d->vcpu[p->spi_vcpu_id]->arch.vgic.lock);
 vgic_unlock_rank(v_target, rank, flags);

 return ret;
diff --git a/xen/arch/arm/include/asm/vgic.h b/xen/arch/arm/include/asm/vgic.h
index 79b73a0dbb..f4075d3e75 100644
--- a/xen/arch/arm/include/asm/vgic.h
+++ b/xen/arch/arm/include/asm/vgic.h
@@ -85,6 +85,7 @@ struct pending_irq
 uint8_t priority;
 uint8_t lpi_priority;   /* Caches the priority if this is an LPI. */
 uint8_t lpi_vcpu_id;    /* The VCPU for an LPI. */
+    uint8_t spi_vcpu_id;    /* The VCPU for an SPI. */
 /* inflight is used to append instances of pending_irq to
  * vgic.inflight_irqs */
 struct list_head inflight;
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index c04fc4f83f..e852479f13 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -632,6 +632,7 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
 }
 list_add_tail(&n->inflight, &v->arch.vgic.inflight_irqs);
 out:
+    n->spi_vcpu_id = v->vcpu_id;
 spin_unlock_irqrestore(&v->arch.vgic.lock, flags);

 /* we have a new higher priority irq, inject it into the guest */
 vcpu_kick(v);


Also, while looking at the locking, I noticed that we are not doing 
anything with GIC_IRQ_GUEST_MIGRATING. In gic_update_one_lr(), we seem 
to assume that if the flag is set, then p->desc cannot be NULL.


Can we reach vgic_connect_hw_irq() with the flag set?


I think even from the perspective of making the code extra safe, we 
should also check GIC_IRQ_GUEST_MIGRATING as the LR is allocated for 
this case. I will also add the check of GIC_IRQ_GUEST_MIGRATING here.


What about the other flags? Is this going to be a concern if we don't 
reset them?


I don't think so, if I am not mistaken, no LR will be allocated with 
other flags set.


Kind regards,
Henry



Cheers,






Re: [PATCH v3] xen/arm: Set correct per-cpu cpu_core_mask

2024-05-21 Thread Henry Wang

Hi Michal,

On 5/21/2024 3:47 PM, Michal Orzel wrote:

Hi Henry.

On 3/21/2024 11:57 AM, Henry Wang wrote:

In the common sysctl command XEN_SYSCTL_physinfo, the value of
cores_per_socket is calculated based on the cpu_core_mask of CPU0.
Currently on Arm this is a fixed value 1 (can be checked via xl info),
which is not correct. This is because during the Arm CPU online
process at boot time, setup_cpu_sibling_map() only sets the per-cpu
cpu_core_mask for itself.

cores_per_socket refers to the number of cores that belong to the same
socket (NUMA node). Currently Xen on Arm does not support physical
CPU hotplug and NUMA, also we assume there is no multithread. Therefore
cores_per_socket means all possible CPUs detected from the device
tree. Setting the per-cpu cpu_core_mask in setup_cpu_sibling_map()
accordingly. Modify the in-code comment which seems to be outdated. Add
a warning to users if Xen is running on processors with multithread
support.
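
As a rough illustration of why the reported value changes, here is a
stand-alone model. It is not the Xen implementation: the 32-bit mask, the
MODEL_NR_CPUS limit and both function names are assumptions made for the
example. cores_per_socket is modelled as the number of bits set in CPU0's
core mask.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 8u

/* Bit n set in model_core_mask[cpu] means CPU n shares cpu's socket. */
uint32_t model_core_mask[MODEL_NR_CPUS];

/* Hypothetical counterpart of setup_cpu_sibling_map(). */
void model_setup_sibling_map(unsigned int cpu, uint32_t mask)
{
    model_core_mask[cpu] = mask;
}

/* Models how cores_per_socket is derived from CPU0's mask. */
unsigned int model_cores_per_socket(void)
{
    uint32_t m = model_core_mask[0];
    unsigned int n = 0;

    while ( m )
    {
        n += m & 1u;
        m >>= 1;
    }

    return n;
}
```

With the old behaviour each CPU only sets its own bit, so the count for CPU0
is 1; with the mask of all possible CPUs (e.g. 0xF for four CPUs) the count
is 4.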

Signed-off-by: Henry Wang 
Signed-off-by: Henry Wang 

Reviewed-by: Michal Orzel 


Thanks.


   /* ID of the PCPU we're running on */
   DEFINE_PER_CPU(unsigned int, cpu_id);
-/* XXX these seem awfully x86ish... */
+/*
+ * Although multithread is part of the Arm spec, there are not many
+ * processors support multithread and current Xen on Arm assumes there

NIT: s/support/supporting


Sorry, it should have been spotted locally before sending. Anyway, I 
will correct this in v4 with your Reviewed-by tag taken. Thanks for 
pointing this out.


Kind regards,
Henry


__init smp_get_max_cpus(void)

~Michal





[PATCH v3 8/8] docs: Add device tree overlay documentation

2024-05-20 Thread Henry Wang
From: Vikram Garhwal 

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
v3:
- No change.
v2:
- Update the content based on the changes in this version.
---
 docs/misc/arm/overlay.txt | 99 +++
 1 file changed, 99 insertions(+)
 create mode 100644 docs/misc/arm/overlay.txt

diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt
new file mode 100644
index 00..811a6de369
--- /dev/null
+++ b/docs/misc/arm/overlay.txt
@@ -0,0 +1,99 @@
+# Device Tree Overlays support in Xen
+
+Xen now supports dynamic device assignment to running domains,
+i.e. adding/removing nodes (using .dtbo) to/from Xen device tree, and
+attaching/detaching them to/from a running domain with given $domid.
+
+Dynamic node assignment works in two steps:
+
+## Add/Remove device tree overlay to/from Xen device tree
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor applies/removes the dtbo to/from Xen device tree.
+
+## Attach/Detach device from the DT overlay to/from domain
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor attaches/detaches the device to/from the user-provided $domid by
+   mapping/unmapping node resources in the DT overlay.
+
+# Examples
+
+Here are a few examples on how to use it.
+
+## Dom0 device add
+
+For assigning a device tree overlay to Dom0, user should firstly properly
+prepare the DT overlay. More information about device tree overlays can be
+found in [1]. Then, in Dom0, enter the following:
+
+(dom0) xl dt-overlay add overlay.dtbo
+
+This will allocate the devices mentioned in overlay.dtbo to Xen device tree.
+
+To assign the newly added device from the dtbo to Dom0:
+
+(dom0) xl dt-overlay attach overlay.dtbo 0
+
+Next, if the user wants to add the same device tree overlay to dom0
+Linux, execute the following:
+
+(dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(dom0) cat overlay.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be loaded using:
+
+(dom0) modprobe module_name.ko
+
+## Dom0 device remove
+
+For removing the device from Dom0, first detach the device from Dom0:
+
+(dom0) xl dt-overlay detach overlay.dtbo 0
+
+NOTE: The user is expected to unload any Linux kernel modules which
+might be accessing the devices in overlay.dtbo before detaching the device.
+Detaching devices without unloading the modules might result in a crash.
+
+Then remove the overlay from Xen device tree:
+
+(dom0) xl dt-overlay remove overlay.dtbo
+
+## DomU device add/remove
+
+All the nodes in dtbo will be assigned to a domain; the user will need
+to prepare the dtb for the domU. For example, the `interrupt-parent` property
+of the DomU overlay should be changed to the Xen hardcoded value `0xfde8`.
+Below assumes the properly written DomU dtbo is `overlay_domu.dtbo`.
+
+User will need to create the DomU with below properties properly configured
+in the xl config file:
+- `iomem`
+- `passthrough` (if IOMMU is needed)
+
+User will also need to modprobe the relevant drivers.
+
+Example for domU device add:
+
+(dom0) xl dt-overlay add overlay.dtbo            # If not executed before
+(dom0) xl dt-overlay attach overlay.dtbo $domid
+(dom0) xl console $domid                         # To access $domid console
+
+Next, if the user needs to modify/prepare the overlay.dtbo suitable for
+the domU:
+
+(domU) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(domU) cat overlay_domu.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be probed:
+
+(domU) modprobe module_name.ko
+
+Example for domU overlay remove:
+
+(dom0) xl dt-overlay detach overlay.dtbo $domid
+(dom0) xl dt-overlay remove overlay.dtbo
+
+[1] https://www.kernel.org/doc/Documentation/devicetree/overlay-notes.txt
-- 
2.34.1




[PATCH v3 0/8] Remaining patches for dynamic node programming using overlay dtbo

2024-05-20 Thread Henry Wang
Hi all,

This is the remaining series for the fully functional "dynamic node
programming using overlay dtbo" feature. The first part [1] has
already been merged.

Quoting from the original series, the first part made Xen aware of
new device tree nodes, i.e. it updates dt_host with the overlay node
information. The goal of this series is to do the actual IOMMU and
IRQ mapping and unmapping for a running domain at runtime. Documentation
of the "dynamic node programming using overlay dtbo" feature is also
added.

Patches 1 and 2 fix issues in the existing code that I noticed during
my local tests; please see their commit messages for details.

Gitlab CI for this series can be found in [2].

[1] https://lore.kernel.org/xen-devel/20230906011631.30310-1-vikram.garh...@amd.com/
[2] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1298425517

Henry Wang (6):
  xen/common/dt-overlay: Fix lock issue when add/remove the device
  tools/xl: Correct the help information and exit code of the dt-overlay
command
  xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
  tools/arm: Introduce the "nr_spis" xl config entry
  xen/arm: Add XEN_DOMCTL_dt_overlay DOMCTL and related operations
  tools: Introduce the "xl dt-overlay {attach,detach}" commands

Vikram Garhwal (2):
  xen/arm/gic: Allow routing/removing interrupt to running VMs
  docs: Add device tree overlay documentation

 docs/man/xl.cfg.5.pod.in  |  14 +
 docs/misc/arm/device-tree/booting.txt |  16 +
 docs/misc/arm/overlay.txt |  99 ++
 tools/golang/xenlight/helpers.gen.go  |   2 +
 tools/golang/xenlight/types.gen.go|   1 +
 tools/include/libxl.h |  10 +
 tools/include/xenctrl.h   |   3 +
 tools/libs/ctrl/xc_dt_overlay.c   |  31 ++
 tools/libs/light/libxl_arm.c  |   4 +-
 tools/libs/light/libxl_dt_overlay.c   |  28 ++
 tools/libs/light/libxl_types.idl  |   1 +
 tools/xl/xl_cmdtable.c|   4 +-
 tools/xl/xl_parse.c   |   3 +
 tools/xl/xl_vmcontrol.c   |  48 ++-
 xen/arch/arm/dom0less-build.c |  11 +-
 xen/arch/arm/domctl.c |   3 +
 xen/arch/arm/gic-vgic.c   |  15 +-
 xen/arch/arm/gic.c|  15 -
 xen/arch/arm/vgic/vgic.c  |  10 +-
 xen/common/dt-overlay.c   | 441 --
 xen/include/public/domctl.h   |  15 +
 xen/include/public/sysctl.h   |  11 +-
 xen/include/xen/dt-overlay.h  |   7 +
 23 files changed, 644 insertions(+), 148 deletions(-)
 create mode 100644 docs/misc/arm/overlay.txt

-- 
2.34.1




[PATCH v3 3/8] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs

2024-05-20 Thread Henry Wang
There are some use cases in which the dom0less domUs need to have
the XEN_DOMCTL_CDF_iommu set at the domain construction time. For
example, the dynamic dtbo feature allows the domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need to have a way to specify the domain will need the IOMMU
mapping at domain construction time.

Introduce a "passthrough" DT property for Dom0less DomUs following
the same entry as the xl.cfg. Currently only two options are provided,
i.e. "enabled" and "disabled". Set the XEN_DOMCTL_CDF_iommu at domain
construction time based on the property.

Signed-off-by: Henry Wang 
---
v3:
- Use a separate variable to cache the condition from the "passthrough"
  flag separately to improve readability.
- Update the doc to explain the default condition more clearly.
v2:
- New patch to replace the original patch in v1:
  "[PATCH 03/15] xen/arm: Always enable IOMMU"
---
 docs/misc/arm/device-tree/booting.txt | 16 
 xen/arch/arm/dom0less-build.c | 11 +--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2..f1fd069c87 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -260,6 +260,22 @@ with the following properties:
 value specified by Xen command line parameter gnttab_max_maptrack_frames
 (or its default value if unspecified, i.e. 1024) is used.
 
+- passthrough
+
+A string property specifying whether IOMMU mappings are enabled for the
+domain and hence whether it will be enabled for passthrough hardware.
+Possible property values are:
+
+- "enabled"
+IOMMU mappings are enabled for the domain. Note that this option is the
+default if the user provides the device partial passthrough device tree
+for the domain.
+
+- "disabled"
+IOMMU mappings are disabled for the domain and so hardware may not be
+passed through. This option is the default if this property is missing
+and the user does not provide the device partial device tree for the domain.
+
 Under the "xen,domain" compatible node, one or more sub-nodes are present
 for the DomU kernel and ramdisk.
 
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..5830a7051d 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -848,6 +848,8 @@ static int __init construct_domU(struct domain *d,
 void __init create_domUs(void)
 {
 struct dt_device_node *node;
+const char *dom0less_iommu;
+bool iommu = false;
 const struct dt_device_node *cpupool_node,
 *chosen = dt_find_node_by_path("/chosen");
 
@@ -895,8 +897,13 @@ void __init create_domUs(void)
 panic("Missing property 'cpus' for domain %s\n",
   dt_node_name(node));
 
-if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") &&
- iommu_enabled )
+if ( !dt_property_read_string(node, "passthrough", &dom0less_iommu) &&
+ !strcmp(dom0less_iommu, "enabled") )
+iommu = true;
+
+if ( iommu_enabled &&
+ (iommu || dt_find_compatible_node(node, NULL,
+   "multiboot,device-tree")) )
 d_cfg.flags |= XEN_DOMCTL_CDF_iommu;
 
 if ( !dt_property_read_u32(node, "nr_spis", &d_cfg.arch.nr_spis) )
-- 
2.34.1




[PATCH v3 1/8] xen/common/dt-overlay: Fix lock issue when add/remove the device

2024-05-20 Thread Henry Wang
If CONFIG_DEBUG=y, below assertion will be triggered:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) [ Xen-4.19-unstable  arm64  debug=y  Not tainted ]
[...]
(XEN) Xen call trace:
(XEN)[<0a257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN)[<0a2573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN)[<0a20797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN)[<0a207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN)[<0a208460>] dt_overlay_sysctl+0x428/0xc68
(XEN)[<0a2707f8>] arch_do_sysctl+0x1c/0x2c
(XEN)[<0a230b40>] do_sysctl+0x96c/0x9ec
(XEN)[<0a271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN)[<0a273490>] do_trap_guest_sync+0x448/0x63c
(XEN)[<0a25c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) 

This is because iommu_remove_dt_device() is called without taking the
dt_host_lock. dt_host_lock is meant to ensure that the DT node will not
disappear behind our back. So fix the issue by taking the lock as soon
as we get hold of overlay_node.

A similar issue is observed when adding the dtbo:
(XEN) Assertion 'system_state < SYS_STATE_active || rw_is_locked(&dt_host_lock)' failed at xen-source/xen/drivers/passthrough/device_tree.c:192
(XEN) [ Xen-4.19-unstable  arm64  debug=y  Not tainted ]
[...]
(XEN) Xen call trace:
(XEN)[<0a2594f4>] iommu_add_dt_device+0x7c/0x17c (PC)
(XEN)[<0a259494>] iommu_add_dt_device+0x1c/0x17c (LR)
(XEN)[<0a267db4>] handle_device+0x68/0x1e8
(XEN)[<0a208ba8>] dt_overlay_sysctl+0x9d4/0xb84
(XEN)[<0a27342c>] arch_do_sysctl+0x24/0x38
(XEN)[<0a231ac8>] do_sysctl+0x9ac/0xa34
(XEN)[<0a274b70>] traps.c#do_trap_hypercall+0x230/0x2dc
(XEN)[<0a276330>] do_trap_guest_sync+0x478/0x688
(XEN)[<0a25e480>] entry.o#guest_sync_slowpath+0xa8/0xd8

This is because the lock is released too early. So fix the issue by
releasing the lock after handle_device().
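
The remove-path half of the fix can be pictured with a small stand-alone
model. This is a sketch under stated assumptions, not the Xen code:
model_host_locked stands in for dt_host_lock being held for writing, and
model_iommu_op() returns false where the real code would hit the assertion.

```c
#include <stdbool.h>

/* Models "dt_host_lock is held for writing". */
bool model_host_locked;

/* Models an IOMMU helper that asserts the lock is held. */
bool model_iommu_op(void)
{
    return model_host_locked;
}

/* Old ordering: resources are removed before the lock is taken. */
bool model_remove_nodes_old(void)
{
    bool ok = model_iommu_op();     /* lock not held yet */

    model_host_locked = true;
    /* ... the node itself would be removed here ... */
    model_host_locked = false;

    return ok;
}

/* Fixed ordering: take the lock first, then remove resources. */
bool model_remove_nodes_fixed(void)
{
    bool ok;

    model_host_locked = true;
    ok = model_iommu_op();          /* lock held, check passes */
    /* ... the node itself would be removed here ... */
    model_host_locked = false;

    return ok;
}
```

The add path is the mirror image: the unlock moves from before the IOMMU
helper to after it.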

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Henry Wang 
Reviewed-by: Julien Grall 
---
v3:
- Add Julien's Reviewed-by tag.
v2:
- Take the lock as soon as getting hold of overlay_node. Also
  release the lock after handle_device() when adding dtbo.
v1.1:
- Move the unlock position before the check of rc.
---
 xen/common/dt-overlay.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1b197381f6..9cece79067 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -429,18 +429,24 @@ static int remove_nodes(const struct overlay_track *tracker)
 if ( overlay_node == NULL )
 return -EINVAL;
 
+write_lock(&dt_host_lock);
+
 rc = remove_descendant_nodes_resources(overlay_node);
 if ( rc )
+{
+write_unlock(&dt_host_lock);
 return rc;
+}
 
 rc = remove_node_resources(overlay_node);
 if ( rc )
+{
+write_unlock(&dt_host_lock);
 return rc;
+}
 
 dt_dprintk("Removing node: %s\n", overlay_node->full_name);
 
-write_lock(&dt_host_lock);
-
 rc = dt_overlay_remove_node(overlay_node);
 if ( rc )
 {
@@ -604,8 +610,6 @@ static long add_nodes(struct overlay_track *tr, char **nodes_full_path)
 return rc;
 }
 
-write_unlock(&dt_host_lock);
-
 prev_node->allnext = next_node;
 
 overlay_node = dt_find_node_by_path(overlay_node->full_name);
@@ -619,6 +623,7 @@ static long add_nodes(struct overlay_track *tr, char **nodes_full_path)
 rc = handle_device(hardware_domain, overlay_node, p2m_mmio_direct_c,
tr->iomem_ranges,
tr->irq_ranges);
+write_unlock(&dt_host_lock);
 if ( rc )
 {
 printk(XENLOG_ERR "Adding IRQ and IOMMU failed\n");
-- 
2.34.1




[PATCH v3 7/8] tools: Introduce the "xl dt-overlay {attach,detach}" commands

2024-05-20 Thread Henry Wang
With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach/detach devices from the provided DT overlay to domains.
Support this by introducing a new set of "xl dt-overlay" commands and
related documentation, i.e. "xl dt-overlay {attach,detach}". Slightly
rework the command option parsing logic.
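
For orientation, the four sub-commands fall into two groups: "add" and
"remove" act on the Xen device tree, while "attach" and "detach"
additionally need a domain. The sketch below is a stand-alone illustration
of that mapping, not the xl parsing code: struct model_op and
model_parse_op() are hypothetical, and only the four LIBXL_DT_OVERLAY_*
values are taken from the libxl.h hunk in this patch.

```c
#include <stdbool.h>
#include <string.h>

/* Values as defined by the libxl.h hunk in this patch. */
#define LIBXL_DT_OVERLAY_ADD            1
#define LIBXL_DT_OVERLAY_REMOVE         2
#define LIBXL_DT_OVERLAY_DOMAIN_ATTACH  1
#define LIBXL_DT_OVERLAY_DOMAIN_DETACH  2

/* Hypothetical result of parsing the sub-command name. */
struct model_op {
    bool valid;         /* false for an unknown sub-command */
    bool needs_domain;  /* true for attach/detach */
    int value;          /* operation value passed down to libxl */
};

struct model_op model_parse_op(const char *name)
{
    struct model_op op = { false, false, 0 };

    if ( !strcmp(name, "add") )
        op = (struct model_op){ true, false, LIBXL_DT_OVERLAY_ADD };
    else if ( !strcmp(name, "remove") )
        op = (struct model_op){ true, false, LIBXL_DT_OVERLAY_REMOVE };
    else if ( !strcmp(name, "attach") )
        op = (struct model_op){ true, true, LIBXL_DT_OVERLAY_DOMAIN_ATTACH };
    else if ( !strcmp(name, "detach") )
        op = (struct model_op){ true, true, LIBXL_DT_OVERLAY_DOMAIN_DETACH };

    return op;
}
```

Because the two groups reuse the numeric values 1 and 2, the model keeps the
"needs a domain" information separately from the operation value.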

Signed-off-by: Henry Wang 
---
v3:
- Introduce new API libxl_dt_overlay_domain() and co., instead of
  reusing existing API libxl_dt_overlay().
- Add in-code comments for the LIBXL_DT_OVERLAY_* macros.
- Use find_domain() to avoid getting domain_id from strtol().
v2:
- New patch.
---
 tools/include/libxl.h   | 10 +++
 tools/include/xenctrl.h |  3 +++
 tools/libs/ctrl/xc_dt_overlay.c | 31 +
 tools/libs/light/libxl_dt_overlay.c | 28 +++
 tools/xl/xl_cmdtable.c  |  4 +--
 tools/xl/xl_vmcontrol.c | 42 -
 6 files changed, 104 insertions(+), 14 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 62cb07dea6..6cc6d6bf6a 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -2549,8 +2549,18 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, uint32_t domid,
 void libxl_device_pci_list_free(libxl_device_pci* list, int num);
 
 #if defined(__arm__) || defined(__aarch64__)
+/* Values should keep consistent with the op from XEN_SYSCTL_dt_overlay */
+#define LIBXL_DT_OVERLAY_ADD   1
+#define LIBXL_DT_OVERLAY_REMOVE 2
 int libxl_dt_overlay(libxl_ctx *ctx, void *overlay,
  uint32_t overlay_size, uint8_t overlay_op);
+
+/* Values should keep consistent with the op from XEN_DOMCTL_dt_overlay */
+#define LIBXL_DT_OVERLAY_DOMAIN_ATTACH 1
+#define LIBXL_DT_OVERLAY_DOMAIN_DETACH 2
+int libxl_dt_overlay_domain(libxl_ctx *ctx, uint32_t domain_id,
+void *overlay_dt, uint32_t overlay_dt_size,
+uint8_t overlay_op);
 #endif
 
 /*
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 4996855944..9ceca0cffc 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2657,6 +2657,9 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t 
domid,
 #if defined(__arm__) || defined(__aarch64__)
 int xc_dt_overlay(xc_interface *xch, void *overlay_fdt,
   uint32_t overlay_fdt_size, uint8_t overlay_op);
+int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt,
+ uint32_t overlay_fdt_size, uint8_t overlay_op,
+ uint32_t domain_id);
 #endif
 
 /* Compat shims */
diff --git a/tools/libs/ctrl/xc_dt_overlay.c b/tools/libs/ctrl/xc_dt_overlay.c
index c2224c4d15..ea1da522d1 100644
--- a/tools/libs/ctrl/xc_dt_overlay.c
+++ b/tools/libs/ctrl/xc_dt_overlay.c
@@ -48,3 +48,34 @@ err:
 
 return err;
 }
+
+int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt,
+ uint32_t overlay_fdt_size, uint8_t overlay_op,
+ uint32_t domain_id)
+{
+int err;
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_dt_overlay,
+.domain = domain_id,
+.u.dt_overlay = {
+.overlay_op = overlay_op,
+.overlay_fdt_size = overlay_fdt_size,
+}
+};
+
+DECLARE_HYPERCALL_BOUNCE(overlay_fdt, overlay_fdt_size,
+ XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( (err = xc_hypercall_bounce_pre(xch, overlay_fdt)) )
+goto err;
+
+set_xen_guest_handle(domctl.u.dt_overlay.overlay_fdt, overlay_fdt);
+
+if ( (err = do_domctl(xch, &domctl)) != 0 )
+PERROR("%s failed", __func__);
+
+err:
+xc_hypercall_bounce_post(xch, overlay_fdt);
+
+return err;
+}
diff --git a/tools/libs/light/libxl_dt_overlay.c 
b/tools/libs/light/libxl_dt_overlay.c
index a6c709a6dc..00503b76bd 100644
--- a/tools/libs/light/libxl_dt_overlay.c
+++ b/tools/libs/light/libxl_dt_overlay.c
@@ -69,3 +69,31 @@ out:
 return rc;
 }
 
+int libxl_dt_overlay_domain(libxl_ctx *ctx, uint32_t domain_id,
+void *overlay_dt, uint32_t overlay_dt_size,
+uint8_t overlay_op)
+{
+int rc;
+int r;
+GC_INIT(ctx);
+
+if (check_overlay_fdt(gc, overlay_dt, overlay_dt_size)) {
+LOG(ERROR, "Overlay DTB check failed");
+rc = ERROR_FAIL;
+goto out;
+} else {
+LOG(DEBUG, "Overlay DTB check passed");
+rc = 0;
+}
+
+r = xc_dt_overlay_domain(ctx->xch, overlay_dt, overlay_dt_size, overlay_op,
+ domain_id);
+if (r) {
+LOG(ERROR, "%s: Attaching/Detaching overlay dtb failed.", __func__);
+rc = ERROR_FAIL;
+}
+
+out:
+GC_FREE;
+return rc;
+}
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 1f3c6b5897..37770b20e3 100644
--- a/tools/xl/xl_cmdtable.

[PATCH v3 6/8] xen/arm: Add XEN_DOMCTL_dt_overlay DOMCTL and related operations

2024-05-20 Thread Henry Wang
In order to support the dynamic dtbo device assignment to a running
VM, the add/remove of the DT overlay and the attach/detach of the
device from the DT overlay should happen separately. Therefore,
repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
overlay to the Xen device tree, instead of assigning the device to the
hardware domain at the same time. Add the XEN_DOMCTL_dt_overlay with
operations XEN_DOMCTL_DT_OVERLAY_{ATTACH,DETACH} to do/undo the
device assignment to the domain.

The hypervisor first checks that the DT overlay passed from the
toolstack is valid. Then the device nodes are retrieved from the overlay
tracker based on the DT overlay. The attach/detach of the device is
implemented by mapping/unmapping the IRQ and IOMMU resources. Note that
with these changes, the device de-registration from the IOMMU driver
should only happen at the time when the DT overlay is removed from the
Xen device tree.

Signed-off-by: Henry Wang 
Signed-off-by: Vikram Garhwal 
---
v3:
- Style fixes for arch-selection #ifdefs.
- Do not include public/domctl.h, only add a forward declaration of
  struct xen_domctl_dt_overlay.
- Extract the overlay track entry finding logic to a function, drop
  the unused variables.
- Use op code 1&2 for XEN_DOMCTL_DT_OVERLAY_{ATTACH,DETACH}.
v2:
- New patch.
---
 xen/arch/arm/domctl.c|   3 +
 xen/common/dt-overlay.c  | 438 +++
 xen/include/public/domctl.h  |  15 ++
 xen/include/public/sysctl.h  |  11 +-
 xen/include/xen/dt-overlay.h |   7 +
 5 files changed, 367 insertions(+), 107 deletions(-)

diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index ad56efb0f5..12a12ee781 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2012, Citrix Systems
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -176,6 +177,8 @@ long arch_do_domctl(struct xen_domctl *domctl, struct 
domain *d,
 
 return rc;
 }
+case XEN_DOMCTL_dt_overlay:
+return dt_overlay_domctl(d, &domctl->u.dt_overlay);
 default:
 return subarch_do_domctl(domctl, d, u_domctl);
 }
diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 9cece79067..693b6e4777 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -356,24 +356,136 @@ static int overlay_get_nodes_info(const void *fdto, char 
**nodes_full_path)
 return 0;
 }
 
+/* This function should be called with the overlay_lock taken */
+static struct overlay_track *
+find_track_entry_from_tracker(const void *overlay_fdt,
+  uint32_t overlay_fdt_size)
+{
+struct overlay_track *entry, *temp;
+bool found_entry = false;
+
+ASSERT(spin_is_locked(&overlay_lock));
+
+/*
+ * First check if dtbo is correct i.e. it should be one of the dtbo which
+ * was used when dynamically adding the node.
+ * Limitation: Cases with same node names but different property are not
+ * supported currently. We are relying on user to provide the same dtbo
+ * as it was used when adding the nodes.
+ */
+list_for_each_entry_safe( entry, temp, &overlay_tracker, entry )
+{
+if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 )
+{
+found_entry = true;
+break;
+}
+}
+
+if ( !found_entry )
+{
+printk(XENLOG_ERR "Cannot find any matching tracker with input dtbo."
+   " Operation is supported only for prior added dtbo.\n");
+return NULL;
+}
+
+return entry;
+}
+
+static int remove_irq(unsigned long s, unsigned long e, void *data)
+{
+struct domain *d = data;
+int rc = 0;
+
+/*
+ * IRQ should always have access unless there is duplication of
+ * irqs in device tree. There are a few cases of xen device tree
+ * where there are duplicate interrupts for the same node.
+ */
+if (!irq_access_permitted(d, s))
+return 0;
+/*
+ * TODO: We don't handle shared IRQs for now. So, it is assumed that
+ * the IRQs were not shared with another domain.
+ */
+rc = irq_deny_access(d, s);
+if ( rc )
+{
+printk(XENLOG_ERR "unable to revoke access for irq %ld\n", s);
+return rc;
+}
+
+rc = release_guest_irq(d, s);
+if ( rc )
+{
+printk(XENLOG_ERR "unable to release irq %ld\n", s);
+return rc;
+}
+
+return rc;
+}
+
+static int remove_all_irqs(struct rangeset *irq_ranges, struct domain *d)
+{
+return rangeset_report_ranges(irq_ranges, 0, ~0UL, remove_irq, d);
+}
+
+static int remove_iomem(unsigned long s, unsigned long e, void *data)
+{
+struct domain *d = data;
+int rc = 0;
+p2m_type_t t;
+mfn_t mfn;
+
+mfn = p2m_lookup(d, _gfn(s), &t);
+if ( mfn_x(mfn) == 0 || mfn_x(mfn) == ~0UL )
+return -EINVAL;
+
+rc = iomem_deny_access(d, s, e);
+if ( rc )
+{
+printk(XENLOG_ERR "Unable to remov

[PATCH v3 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs

2024-05-20 Thread Henry Wang
From: Vikram Garhwal 

Currently, routing/removing physical interrupts is only allowed at
the domain creation/destroy time. For use cases such as dynamic device
tree overlay adding/removing, the routing/removing of physical IRQs
to/from running domains should be allowed.

Remove the above-mentioned domain creation/dying checks. Since this
would introduce interrupt state unsync issues when the interrupt is
active or pending in the guest, simply reject the operation in these
cases. Do it for both new and old vGIC implementations.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
v3:
- Update in-code comments.
- Correct the if conditions.
- Add taking/releasing the vgic lock of the vcpu.
v2:
- Reject the case where the IRQ is active or pending in guest.
---
 xen/arch/arm/gic-vgic.c  | 15 ---
 xen/arch/arm/gic.c   | 15 ---
 xen/arch/arm/vgic/vgic.c | 10 +++---
 3 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 56490dbc43..956c11ba13 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -439,24 +439,33 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, 
unsigned int virq,
 
 /* We are taking to rank lock to prevent parallel connections. */
 vgic_lock_rank(v_target, rank, flags);
+spin_lock(&v_target->arch.vgic.lock);
 
 if ( connect )
 {
-/* The VIRQ should not be already enabled by the guest */
+/*
+ * The VIRQ should not be already enabled by the guest nor
+ * active/pending in the guest.
+ */
 if ( !p->desc &&
- !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+ !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
 p->desc = desc;
 else
 ret = -EBUSY;
 }
 else
 {
-if ( desc && p->desc != desc )
+if ( (desc && p->desc != desc) ||
+ test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
+ test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
 ret = -EINVAL;
 else
 p->desc = NULL;
 }
 
+spin_unlock(&v_target->arch.vgic.lock);
 vgic_unlock_rank(v_target, rank, flags);
 
 return ret;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 44c40e86de..3ebd89940a 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -135,14 +135,6 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int 
virq,
 ASSERT(virq < vgic_num_irqs(d));
 ASSERT(!is_lpi(virq));
 
-/*
- * When routing an IRQ to guest, the virtual state is not synced
- * back to the physical IRQ. To prevent get unsync, restrict the
- * routing to when the Domain is been created.
- */
-if ( d->creation_finished )
-return -EBUSY;
-
 ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);
 if ( ret )
 return ret;
@@ -167,13 +159,6 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned 
int virq,
 ASSERT(test_bit(_IRQ_GUEST, &desc->status));
 ASSERT(!is_lpi(virq));
 
-/*
- * Removing an interrupt while the domain is running may have
- * undesirable effect on the vGIC emulation.
- */
-if ( !d->is_dying )
-return -EBUSY;
-
 desc->handler->shutdown(desc);
 
 /* EOI the IRQ if it has not been done by the guest */
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index b9463a5f27..78554c11e2 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -876,8 +876,11 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu 
*vcpu,
 
 if ( connect )  /* assign a mapped IRQ */
 {
-/* The VIRQ should not be already enabled by the guest */
-if ( !irq->hw && !irq->enabled )
+/*
+ * The VIRQ should not be already enabled by the guest nor
+ * active/pending in the guest
+ */
+if ( !irq->hw && !irq->enabled && !irq->active && !irq->pending_latch )
 {
 irq->hw = true;
 irq->hwintid = desc->irq;
@@ -887,7 +890,8 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu,
 }
 else/* remove a mapped IRQ */
 {
-if ( desc && irq->hwintid != desc->irq )
+if ( (desc && irq->hwintid != desc->irq) ||
+ irq->active || irq->pending_latch )
 {
 ret = -EINVAL;
 }
-- 
2.34.1
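
The connect/remove checks in the patch above can be modelled in isolation.
The following is only a minimal sketch: `struct virq_model` and the two
helper names are hypothetical stand-ins, not Xen types, and the real code
additionally holds the rank lock and the vCPU's vgic lock around the check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-vIRQ state in the patch. */
struct virq_model {
    const void *desc;   /* physical IRQ descriptor, NULL if not connected */
    bool enabled;       /* enabled by the guest */
    bool pending;       /* models GUEST_VISIBLE / pending_latch */
    bool active;        /* models GUEST_ACTIVE / active */
};

/* Route a physical IRQ: refuse if already connected, enabled or in flight. */
int virq_connect(struct virq_model *p, const void *desc)
{
    if ( p->desc || p->enabled || p->pending || p->active )
        return -EBUSY;

    p->desc = desc;
    return 0;
}

/* Remove the routing: refuse on a descriptor mismatch or an in-flight IRQ. */
int virq_disconnect(struct virq_model *p, const void *desc)
{
    if ( (desc && p->desc != desc) || p->pending || p->active )
        return -EINVAL;

    p->desc = NULL;
    return 0;
}
```

In this model a caller that gets -EBUSY or -EINVAL leaves the state
untouched, which mirrors the "simply reject the operation" approach
described in the commit message.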




[PATCH v3 4/8] tools/arm: Introduce the "nr_spis" xl config entry

2024-05-20 Thread Henry Wang
Currently, the number of SPIs allocated to the domain is only
configurable for Dom0less DomUs. Xen domains are supposed to be
platform agnostic and therefore the number of SPIs for libxl
guests should not be based on the hardware.

Introduce a new xl config entry for Arm to provide a method for
the user to decide the number of SPIs. This would help to avoid
bumping the `config->arch.nr_spis` in libxl every time there is a
new platform with increased SPI numbers.

Update the doc and the golang bindings accordingly.

Signed-off-by: Henry Wang 
---
v3:
- Reword documentation to avoid ambiguity.
v2:
- New patch to replace the original patch in v1:
  "[PATCH 05/15] tools/libs/light: Increase nr_spi to 160"
---
 docs/man/xl.cfg.5.pod.in | 14 ++
 tools/golang/xenlight/helpers.gen.go |  2 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/libs/light/libxl_arm.c |  4 ++--
 tools/libs/light/libxl_types.idl |  1 +
 tools/xl/xl_parse.c  |  3 +++
 6 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 8f2b375ce9..416d582844 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -3072,6 +3072,20 @@ raised.
 
 =back
 
+=over 4
+
+=item B
+
+An optional 32-bit integer parameter specifying the number of SPIs (Shared
+Peripheral Interrupts) to allocate for the domain. If the value specified by
+the `nr_spis` parameter is smaller than the number of SPIs calculated by the
+toolstack based on the devices allocated for the domain, or the `nr_spis`
+parameter is not specified, the value calculated by the toolstack will be used
+for the domain. Otherwise, the value specified by the `nr_spis` parameter will
+be used.
+
+=back
+
 =head3 x86
 
 =over 4
diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index b9cb5b33c7..fe5110474d 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1154,6 +1154,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.ArchArm.SveVl = SveType(xc.arch_arm.sve_vl)
+x.ArchArm.NrSpis = uint32(xc.arch_arm.nr_spis)
 if err := x.ArchX86.MsrRelaxed.fromC(&xc.arch_x86.msr_relaxed);err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
@@ -1670,6 +1671,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.arch_arm.sve_vl = C.libxl_sve_type(x.ArchArm.SveVl)
+xc.arch_arm.nr_spis = C.uint32_t(x.ArchArm.NrSpis)
 if err := x.ArchX86.MsrRelaxed.toC(&xc.arch_x86.msr_relaxed); err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index 5b293755d7..c9e45b306f 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -597,6 +597,7 @@ ArchArm struct {
 GicVersion GicVersion
 Vuart VuartType
 SveVl SveType
+NrSpis uint32
 }
 ArchX86 struct {
 MsrRelaxed Defbool
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 1cb89fa584..a4029e3ac8 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -181,8 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 
 LOG(DEBUG, "Configure the domain");
 
-config->arch.nr_spis = nr_spis;
-LOG(DEBUG, " - Allocate %u SPIs", nr_spis);
+config->arch.nr_spis = max(nr_spis, d_config->b_info.arch_arm.nr_spis);
+LOG(DEBUG, " - Allocate %u SPIs", config->arch.nr_spis);
 
 switch (d_config->b_info.arch_arm.gic_version) {
 case LIBXL_GIC_VERSION_DEFAULT:
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 79e9c656cc..4e65e6fda5 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -722,6 +722,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
 ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
("vuart", libxl_vuart_type),
("sve_vl", libxl_sve_type),
+   ("nr_spis", uint32),
   ])),
 ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool),
   ])),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index c504ab3711..e3a4800f6e 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2935,6 +2935,9 @@ skip_usbdev:
 }
 }
 
+if (!xlu_cfg_get_long (config, "nr_spis", &l, 0))
+b_info->arch_arm.nr_spis = l;
+
 parse_vkb_list(config, d_config);
 
 d_config->virtios = NULL;
-- 
2.34.1




[PATCH v3 2/8] tools/xl: Correct the help information and exit code of the dt-overlay command

2024-05-20 Thread Henry Wang
Fix the name mismatch in the xl dt-overlay command: the
command name should be "dt-overlay" instead of "dt_overlay".
Add the missing "," in the cmdtable.

Fix the exit code of the dt-overlay command, use EXIT_FAILURE
instead of ERROR_FAIL.

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree 
overlay support")
Suggested-by: Anthony PERARD 
Signed-off-by: Henry Wang 
Reviewed-by: Jason Andryuk 
---
v3:
- Add Jason's Reviewed-by tag.
v2:
- New patch
---
 tools/xl/xl_cmdtable.c  | 2 +-
 tools/xl/xl_vmcontrol.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 62bdb2aeaa..1f3c6b5897 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -635,7 +635,7 @@ const struct cmd_spec cmd_table[] = {
 { "dt-overlay",
   &main_dt_overlay, 0, 1,
   "Add/Remove a device tree overlay",
-  "add/remove <.dtbo>"
+  "add/remove <.dtbo>",
   "-h print this help\n"
 },
 #endif
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 98f6bd2e76..02575d5d36 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1278,7 +1278,7 @@ int main_dt_overlay(int argc, char **argv)
 const int overlay_remove_op = 2;
 
 if (argc < 2) {
-help("dt_overlay");
+help("dt-overlay");
 return EXIT_FAILURE;
 }
 
@@ -1302,11 +1302,11 @@ int main_dt_overlay(int argc, char **argv)
 fprintf(stderr, "failed to read the overlay device tree file %s\n",
 overlay_config_file);
 free(overlay_dtb);
-return ERROR_FAIL;
+return EXIT_FAILURE;
 }
 } else {
 fprintf(stderr, "overlay dtbo file not provided\n");
-return ERROR_FAIL;
+return EXIT_FAILURE;
 }
 
 rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
-- 
2.34.1




Re: [PATCH v2 7/8] tools: Introduce the "xl dt-overlay {attach,detach}" commands

2024-05-20 Thread Henry Wang

Hi Jason,

On 5/21/2024 3:41 AM, Jason Andryuk wrote:

On 2024-05-16 06:03, Henry Wang wrote:

+    domain_id = strtol(argv[optind+2], NULL, 10);


domain_id = find_domain(argv[optind+2]);


Good point, thanks. I will use the find_domain() in the next version.

Kind regards,
Henry



And you'll get name resolution, too.

Thanks,
Jason





Re: [PATCH v2 4/8] tools/arm: Introduce the "nr_spis" xl config entry

2024-05-20 Thread Henry Wang

Hi Jason,

On 5/21/2024 3:13 AM, Jason Andryuk wrote:

+
+=item B
+
+A 32-bit optional integer parameter specifying the number of SPIs 
(Shared


I'd phrase it "An optional 32-bit integer"

+Peripheral Interrupts) to allocate for the domain. If the `nr_spis` 
parameter
+is missing, the max number of SPIs calculated by the toolstack based 
on the

+devices allocated for the domain will be used.


This text says the maximum only applies if xl.cfg nr_spis is not setup.


+
+=back
+
  =head3 x86
    =over 4



diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 1cb89fa584..a4029e3ac8 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -181,8 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
    LOG(DEBUG, "Configure the domain");
  -    config->arch.nr_spis = nr_spis;
-    LOG(DEBUG, " - Allocate %u SPIs", nr_spis);
+    config->arch.nr_spis = max(nr_spis, 
d_config->b_info.arch_arm.nr_spis);

+    LOG(DEBUG, " - Allocate %u SPIs", config->arch.nr_spis);


But this is always taking the max.  Should it instead be:

config->arch.nr_spis = d_config->b_info.arch_arm.nr_spis ?: nr_spis;

However, I don't know if that makes sense for ARM.  Does the hardware 
nr_spis need to be a minimum for a domain?


Really, we just want the documentation to match the code.


Before you pointed this out, I didn't realize the ambiguity in the doc 
about the "max". The "max" in the doc has a different meaning from 
the "max()" in the code. I will drop the "max" in the doc and reword 
the doc to "If the `nr_spis` parameter is missing, the number of SPIs 
calculated by the toolstack based on the devices allocated for the 
domain will be used.". Thanks for pointing it out.
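
The two selection policies discussed in this thread can be compared side by
side. This is a minimal sketch only: the function names are hypothetical,
and it assumes that an unset `nr_spis` entry reads as 0.

```c
#include <stdint.h>

/* Policy in the patch: never go below the toolstack-calculated value. */
uint32_t nr_spis_max(uint32_t calculated, uint32_t configured)
{
    return configured > calculated ? configured : calculated;
}

/* Alternative raised in review: a non-zero xl.cfg value always wins. */
uint32_t nr_spis_override(uint32_t calculated, uint32_t configured)
{
    return configured ? configured : calculated;
}
```

The policies only differ when the configured value is non-zero but smaller
than the calculated one: the first keeps the calculated value, the second
would shrink the allocation.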


Kind regards,
Henry



Thanks,
Jason





Re: [PATCH v3 1/4] xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains

2024-05-20 Thread Henry Wang

Hi Michal,

On 5/21/2024 12:09 AM, Michal Orzel wrote:

Thanks. I will take the tag if you are ok with above diff (for the case
if this series goes in later than Luca's).

I would move this check to process_shm() right after "gbase = dt_read_paddr" 
setting.
This would be the most natural placement for such a check.

That sounds good. Thanks! IIUC we only need to add the check for the
pbase != INVALID_PADDR case correct?

Yes, but at the same time I wonder whether we should also return error if a 
user omits pbase
for direct mapped domain.


I think this makes sense. So I will add also a check for the case if 
users omit pbase in the device tree for the direct mapped domain.
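
The check agreed on in this thread could look roughly like the sketch below.
It is an illustration under assumptions, not the code from the series:
`check_shm_direct_map` and `MODEL_INVALID_PADDR` are hypothetical names, and
addresses are modelled as plain 64-bit integers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Xen's INVALID_PADDR ("pbase omitted"). */
#define MODEL_INVALID_PADDR (~UINT64_C(0))

/*
 * For a direct-mapped domain, static shared memory must have a host
 * address, and that host address must equal the guest address.
 */
int check_shm_direct_map(bool direct_mapped, uint64_t pbase, uint64_t gbase)
{
    if ( !direct_mapped )
        return 0;

    /* pbase omitted in the device tree: rejected for direct-mapped domains. */
    if ( pbase == MODEL_INVALID_PADDR )
        return -EINVAL;

    /* Host and guest addresses differ: not direct-mapped. */
    if ( pbase != gbase )
        return -EINVAL;

    return 0;
}
```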


Kind regards,
Henry



~Michal





Re: [PATCH v3 1/4] xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains

2024-05-20 Thread Henry Wang

Hi Michal,

On 5/20/2024 11:46 PM, Michal Orzel wrote:

Hi Henry,

On 20/05/2024 16:52, Henry Wang wrote:

Hi Michal, Luca,

On 5/20/2024 7:24 PM, Michal Orzel wrote:

Hi Henry,

+CC: Luca

On 17/05/2024 05:21, Henry Wang wrote:

To make things easier, add restriction that static shared memory
should also be direct-mapped for direct-mapped domains. Check the
host physical address to be matched with guest physical address when
parsing the device tree. Document this restriction in the doc.

I'm ok with this restriction.

@Luca, do you have any use case preventing us from making this restriction?

This patch clashes with Luca series so depending on which goes first,

I agree that there will be some conflicts between the two series. To
avoid back and forth, if Luca's series goes in first, would it be ok for
you if I place the same check from this patch in
handle_shared_mem_bank() like below?

diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index 9c3a83042d..2d23fa4917 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -219,6 +219,13 @@ static int __init handle_shared_mem_bank(struct
domain *d, paddr_t gbase,
   pbase = shm_bank->start;
   psize = shm_bank->size;

+    if ( is_domain_direct_mapped(d) && (pbase != gbase) )
+    {
+    printk("%pd: physical address 0x%"PRIpaddr" and guest address
0x%"PRIpaddr" are not direct-mapped.\n",
+   d, pbase, gbase);
+    return -EINVAL;
+    }
+
   printk("%pd: SHMEM map from %s: mphys 0x%"PRIpaddr" -> gphys
0x%"PRIpaddr", size 0x%"PRIpaddr"\n",
      d, bank_from_heap ? "Xen heap" : "Host", pbase, gbase, psize);


Acked-by: Michal Orzel 

Thanks. I will take the tag if you are ok with above diff (for the case
if this series goes in later than Luca's).

I would move this check to process_shm() right after "gbase = dt_read_paddr" 
setting.
This would be the most natural placement for such a check.


That sounds good. Thanks! IIUC we only need to add the check for the 
pbase != INVALID_PADDR case correct?


Kind regards,
Henry


~Michal





Re: [PATCH v3 1/4] xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains

2024-05-20 Thread Henry Wang

Hi Michal, Luca,

On 5/20/2024 7:24 PM, Michal Orzel wrote:

Hi Henry,

+CC: Luca

On 17/05/2024 05:21, Henry Wang wrote:

To make things easier, add restriction that static shared memory
should also be direct-mapped for direct-mapped domains. Check the
host physical address to be matched with guest physical address when
parsing the device tree. Document this restriction in the doc.

I'm ok with this restriction.

@Luca, do you have any use case preventing us from making this restriction?

This patch clashes with Luca series so depending on which goes first,


I agree that there will be some conflicts between the two series. To 
avoid back and forth, if Luca's series goes in first, would it be ok for 
you if I place the same check from this patch in 
handle_shared_mem_bank() like below?


diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index 9c3a83042d..2d23fa4917 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -219,6 +219,13 @@ static int __init handle_shared_mem_bank(struct 
domain *d, paddr_t gbase,

 pbase = shm_bank->start;
 psize = shm_bank->size;

+    if ( is_domain_direct_mapped(d) && (pbase != gbase) )
+    {
+    printk("%pd: physical address 0x%"PRIpaddr" and guest address 
0x%"PRIpaddr" are not direct-mapped.\n",

+   d, pbase, gbase);
+    return -EINVAL;
+    }
+
 printk("%pd: SHMEM map from %s: mphys 0x%"PRIpaddr" -> gphys 
0x%"PRIpaddr", size 0x%"PRIpaddr"\n",

    d, bank_from_heap ? "Xen heap" : "Host", pbase, gbase, psize);


Acked-by: Michal Orzel 


Thanks. I will take the tag if you are ok with above diff (for the case 
if this series goes in later than Luca's).



  }
+if ( is_domain_direct_mapped(d) && (pbase != gbase) )
+{
+printk("%pd: physical address 0x%"PRIpaddr" and guest address 
0x%"PRIpaddr" are not 1:1 direct-mapped.\n",

NIT: 1:1 and direct-mapped means the same so no need to place them next to each 
other


Ok. I will drop the "1:1" in the next version. Thanks.

Kind regards,
Henry


~Michal





[PATCH] tools/golang: Add missing golang bindings for vlan

2024-05-20 Thread Henry Wang
It is noticed that commit:
3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
introduces a new "vlan" string field to libxl_device_nic. But the
golang bindings are missing. Add it in this patch.

Fixes: 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
Signed-off-by: Henry Wang 
---
The code is automatically generated by:
```
./configure
make tools
```
---
 tools/golang/xenlight/helpers.gen.go | 3 +++
 tools/golang/xenlight/types.gen.go   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index 78bdb08b15..b9cb5b33c7 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1963,6 +1963,7 @@ func (x *DeviceNic) fromC(xc *C.libxl_device_nic) error {
 x.BackendDomname = C.GoString(xc.backend_domname)
 x.Devid = Devid(xc.devid)
 x.Mtu = int(xc.mtu)
+x.Vlan = C.GoString(xc.vlan)
 x.Model = C.GoString(xc.model)
 if err := x.Mac.fromC(&xc.mac);err != nil {
 return fmt.Errorf("converting field Mac: %v", err)
@@ -2040,6 +2041,8 @@ if x.BackendDomname != "" {
 xc.backend_domname = C.CString(x.BackendDomname)}
 xc.devid = C.libxl_devid(x.Devid)
 xc.mtu = C.int(x.Mtu)
+if x.Vlan != "" {
+xc.vlan = C.CString(x.Vlan)}
 if x.Model != "" {
 xc.model = C.CString(x.Model)}
 if err := x.Mac.toC(); err != nil {
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index ccfe18019e..5b293755d7 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -756,6 +756,7 @@ BackendDomid Domid
 BackendDomname string
 Devid Devid
 Mtu int
+Vlan string
 Model string
 Mac Mac
 Ip string
-- 
2.34.1




Re: [PATCH v3] xen/arm: Set correct per-cpu cpu_core_mask

2024-05-19 Thread Henry Wang

Hi All,

Gentle ping since it has been a couple of months, any comments on this 
updated patch? Thanks!


Kind regards,
Henry

On 3/21/2024 11:57 AM, Henry Wang wrote:

In the common sysctl command XEN_SYSCTL_physinfo, the value of
cores_per_socket is calculated based on the cpu_core_mask of CPU0.
Currently on Arm this is a fixed value 1 (can be checked via xl info),
which is not correct. This is because during the Arm CPU online
process at boot time, setup_cpu_sibling_map() only sets the per-cpu
cpu_core_mask for itself.

cores_per_socket refers to the number of cores that belong to the same
socket (NUMA node). Currently Xen on Arm does not support physical
CPU hotplug or NUMA, and we also assume there is no multithreading.
Therefore cores_per_socket means all possible CPUs detected from the
device tree. Set the per-cpu cpu_core_mask in setup_cpu_sibling_map()
accordingly. Modify the in-code comment, which seems to be outdated. Add
a warning to users if Xen is running on processors with multithread
support.

Signed-off-by: Henry Wang 
Signed-off-by: Henry Wang 
---
v3:
- Use cpumask_copy() to set cpu_core_mask and drop the unnecessary
   cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu)).
- In-code comment adjustments.
- Add a warning for multithread.
v2:
- Do not do the multithread check.
---
  xen/arch/arm/smpboot.c | 18 +++---
  1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index a84e706d77..b6268be27a 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -66,7 +66,11 @@ static bool cpu_is_dead;
  
  /* ID of the PCPU we're running on */

  DEFINE_PER_CPU(unsigned int, cpu_id);
-/* XXX these seem awfully x86ish... */
+/*
+ * Although multithread is part of the Arm spec, there are not many
+ * processors support multithread and current Xen on Arm assumes there
+ * is no multithread.
+ */
  /* representing HT siblings of each logical CPU */
  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_mask);
  /* representing HT and core siblings of each logical CPU */
@@ -85,9 +89,13 @@ static int setup_cpu_sibling_map(int cpu)
   !zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) )
  return -ENOMEM;
  
-/* A CPU is a sibling with itself and is always on its own core. */

+/*
+ * Currently we assume there is no multithread and NUMA, so
+ * a CPU is a sibling with itself, and the all possible CPUs
+ * are supposed to belong to the same socket (NUMA node).
+ */
  cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
-cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu));
+cpumask_copy(per_cpu(cpu_core_mask, cpu), &cpu_possible_map);
  
  return 0;

  }
@@ -277,6 +285,10 @@ void __init smp_init_cpus(void)
  warning_add("WARNING: HMP COMPUTING HAS BEEN ENABLED.\n"
  "It has implications on the security and stability of the 
system,\n"
  "unless the cpu affinity of all domains is specified.\n");
+
+if ( system_cpuinfo.mpidr.mt == 1 )
+warning_add("WARNING: MULTITHREADING HAS BEEN DETECTED ON THE 
PROCESSOR.\n"
+"It might impact the security of the system.\n");
  }
  
  unsigned int __init smp_get_max_cpus(void)
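
The effect of the patch on cores_per_socket can be illustrated with a small
model. This is a sketch under assumptions: CPU masks are modelled as 64-bit
bitmaps and all function names are hypothetical; Xen itself uses cpumask_t
and its own helpers.

```c
#include <stdint.h>

/* Count the set bits of a mask: a stand-in for a cpumask weight helper. */
unsigned int mask_weight(uint64_t mask)
{
    unsigned int n = 0;

    for ( ; mask; mask &= mask - 1 )
        n++;

    return n;
}

/* Before the patch: a CPU's core mask contains only the CPU itself. */
uint64_t core_mask_before(unsigned int cpu, uint64_t possible)
{
    (void)possible;
    return UINT64_C(1) << cpu;
}

/* After the patch: the core mask is a copy of the possible-CPU map. */
uint64_t core_mask_after(unsigned int cpu, uint64_t possible)
{
    (void)cpu;
    return possible;
}
```

With four possible CPUs (map 0xF), the weight of CPU0's core mask is 1 in
the first model and 4 in the second, which corresponds to the fixed value 1
described in the commit message versus the corrected count.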





Re: [PATCH v2 3/8] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs

2024-05-19 Thread Henry Wang

Hi Julien,

On 5/20/2024 8:41 AM, Henry Wang wrote:

Hi Julien,

Thanks for spending time on the review!

On 5/19/2024 6:17 PM, Julien Grall wrote:

Hi Henry,

On 16/05/2024 11:03, Henry Wang wrote:
diff --git a/docs/misc/arm/device-tree/booting.txt 
b/docs/misc/arm/device-tree/booting.txt

index bbd955e9c2..61f9082553 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -260,6 +260,19 @@ with the following properties:
  value specified by Xen command line parameter 
gnttab_max_maptrack_frames

  (or its default value if unspecified, i.e. 1024) is used.
  +- passthrough
+
+    A string property specifying whether IOMMU mappings are enabled 
for the
+    domain and hence whether it will be enabled for passthrough 
hardware.

+    Possible property values are:
+
+    - "enabled"
+    IOMMU mappings are enabled for the domain.
+
+    - "disabled"
+    IOMMU mappings are disabled for the domain and so hardware may 
not be
+    passed through. This option is the default if this property is 
missing.


Looking at the code below, it seems like the default will depend on 
whether the partial device-tree is present. Did I misunderstand?


I am not sure I understand the "partial device tree" in the above
comment correctly. The "passthrough" property is supposed to be placed
in the dom0less domU domain node exactly the same way as the other
dom0less domU properties (such as "direct-map" etc.). This way we can
control whether XEN_DOMCTL_CDF_iommu is set for each dom0less domU
separately.


Oh, I think I get your point: XEN_DOMCTL_CDF_iommu will still be set
if the "passthrough" DT property is "disabled" but the user provides a
partial device tree. Yes, you are correct. I will update the doc to
explain this in more detail, as below. Thanks for pointing it out.


 - "enabled"
    IOMMU mappings are enabled for the domain. Note that this option is the
    default if the user provides a partial device tree for the domain.

 - "disabled"
    IOMMU mappings are disabled for the domain and so hardware may not be
    passed through. This option is the default if this property is missing
    and the user does not provide a partial device tree for the domain.
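
For illustration, the resulting default can be sketched in C as below. This is only a sketch of the rule described in the doc text: domu_wants_iommu() and its parameters are hypothetical names, not Xen functions, and the real code reads the property from the device tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a Xen function): decide whether the IOMMU flag
 * would be set for a dom0less domU.
 *   iommu_enabled  - the platform IOMMU is usable
 *   passthrough    - value of the "passthrough" property, NULL if missing
 *   has_partial_dt - the user provided a partial device tree for the domain
 */
static bool domu_wants_iommu(bool iommu_enabled, const char *passthrough,
                             bool has_partial_dt)
{
    /* Only the literal string "enabled" requests IOMMU mappings. */
    bool requested = passthrough != NULL &&
                     strcmp(passthrough, "enabled") == 0;

    /* A partial device tree enables the mappings regardless of the property. */
    return iommu_enabled && (requested || has_partial_dt);
}
```

With this rule, a missing property and no partial device tree gives "disabled", while "disabled" combined with a partial device tree still ends up with the IOMMU flag set.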


Kind regards,
Henry



Re: [PATCH v2 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs

2024-05-19 Thread Henry Wang

Hi Julien,

On 5/19/2024 7:08 PM, Julien Grall wrote:

Hi,

On 17/05/2024 07:03, Henry Wang wrote:
@@ -444,14 +444,18 @@ int vgic_connect_hw_irq(struct domain *d, 
struct vcpu *v, unsigned int virq,

  {
  /* The VIRQ should not be already enabled by the guest */


This comment needs to be updated.


Yes, sorry. I will update this and the one in the new vGIC in v3.


  if ( !p->desc &&
- !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+ !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
  p->desc = desc;
  else
  ret = -EBUSY;
  }
  else
  {
-    if ( desc && p->desc != desc )
+    if ( desc && p->desc != desc &&
+ (test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
+  test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status)) )


This should be

+    if ( (desc && p->desc != desc) ||
+ test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
+ test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
Looking at gic_set_lr(), we first check p->desc, before setting 
IRQ_GUEST_VISIBLE.


I can't find a common lock, so what would guarantee that p->desc is 
not going to be used or IRQ_GUEST_VISIBLE set afterwards?


I think gic_set_lr() is supposed to be called with v->arch.vgic.lock
held; at least the two current callers (gic_raise_guest_irq() and
gic_restore_pending_irqs()) do it this way. Would this address
your concern? Thanks.


Kind regards,
Henry



Re: [PATCH v2 3/8] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs

2024-05-19 Thread Henry Wang

Hi Julien,

Thanks for spending time on the review!

On 5/19/2024 6:17 PM, Julien Grall wrote:

Hi Henry,

On 16/05/2024 11:03, Henry Wang wrote:
diff --git a/docs/misc/arm/device-tree/booting.txt 
b/docs/misc/arm/device-tree/booting.txt

index bbd955e9c2..61f9082553 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -260,6 +260,19 @@ with the following properties:
  value specified by Xen command line parameter 
gnttab_max_maptrack_frames

  (or its default value if unspecified, i.e. 1024) is used.
  +- passthrough
+
+    A string property specifying whether IOMMU mappings are enabled 
for the
+    domain and hence whether it will be enabled for passthrough 
hardware.

+    Possible property values are:
+
+    - "enabled"
+    IOMMU mappings are enabled for the domain.
+
+    - "disabled"
+    IOMMU mappings are disabled for the domain and so hardware may 
not be
+    passed through. This option is the default if this property is 
missing.


Looking at the code below, it seems like the default will depend on 
whether the partial device-tree is present. Did I misunderstand?


I am not sure I understand the "partial device tree" in the above comment
correctly. The "passthrough" property is supposed to be placed in the
dom0less domU domain node exactly the same way as the other dom0less
domU properties (such as "direct-map" etc.). This way we can control
whether XEN_DOMCTL_CDF_iommu is set for each dom0less domU separately.



+
  Under the "xen,domain" compatible node, one or more sub-nodes are 
present

  for the DomU kernel and ramdisk.
  diff --git a/xen/arch/arm/dom0less-build.c 
b/xen/arch/arm/dom0less-build.c

index 74f053c242..1396a102e1 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -848,6 +848,7 @@ static int __init construct_domU(struct domain *d,
  void __init create_domUs(void)
  {
  struct dt_device_node *node;
+    const char *dom0less_iommu;
  const struct dt_device_node *cpupool_node,
  *chosen = 
dt_find_node_by_path("/chosen");

  @@ -895,8 +896,10 @@ void __init create_domUs(void)
  panic("Missing property 'cpus' for domain %s\n",
    dt_node_name(node));
  -    if ( dt_find_compatible_node(node, NULL, 
"multiboot,device-tree") &&

- iommu_enabled )
+    if ( iommu_enabled &&
+ ((!dt_property_read_string(node, "passthrough", 
&dom0less_iommu) &&

+   !strcmp(dom0less_iommu, "enabled")) ||
+  dt_find_compatible_node(node, NULL, 
"multiboot,device-tree")) )


This condition is getting a little bit harder to read. Can we cache 
the "passthrough" flag separately?


Yes sure. Will do this in v3.

Also, shouldn't we throw a panic if passthrough = "enabled" but the 
IOMMU is enabled?


I take it the above "enabled" should be "disabled"? Actually we already
have several checks for that. Firstly, the above if condition checks
"iommu_enabled", so if the IOMMU is disabled, XEN_DOMCTL_CDF_iommu
is never set. Also, later in the domain config sanitising process, i.e.
domain_create() -> sanitise_domain_config(), there is a check (and
panic) for the case where XEN_DOMCTL_CDF_iommu is somehow set but the
IOMMU is disabled. So I think these are sufficient for us. Did I
understand your comment correctly?


Kind regards,
Henry


  d_cfg.flags |= XEN_DOMCTL_CDF_iommu;
    if ( !dt_property_read_u32(node, "nr_spis", 
&d_cfg.arch.nr_spis) )


Cheers,






Re: [PATCH v2 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs

2024-05-17 Thread Henry Wang




On 5/16/2024 6:03 PM, Henry Wang wrote:

From: Vikram Garhwal 

Currently, routing/removing physical interrupts is only allowed at
domain creation/destroy time. For use cases such as dynamic device
tree overlay adding/removing, routing/removing physical IRQs to/from
running domains should be allowed.

Remove the above-mentioned domain creation/dying checks. Since this
can leave the interrupt state out of sync when the interrupt is
active or pending in the guest, simply reject the operation in these
cases. Do it for both new and old vGIC implementations.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
v2:
- Reject the case where the IRQ is active or pending in guest.
---
  xen/arch/arm/gic-vgic.c  |  8 ++--
  xen/arch/arm/gic.c   | 15 ---
  xen/arch/arm/vgic/vgic.c |  5 +++--
  3 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 56490dbc43..d1608415f8 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -444,14 +444,18 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, 
unsigned int virq,
  {
  /* The VIRQ should not be already enabled by the guest */
  if ( !p->desc &&
- !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+ !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
  p->desc = desc;
  else
  ret = -EBUSY;
  }
  else
  {
-if ( desc && p->desc != desc )
+if ( desc && p->desc != desc &&
+ (test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
+  test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status)) )


This should be

+if ( (desc && p->desc != desc) ||
+ test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
+ test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )


@@ -887,7 +887,8 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu,
  }
  else/* remove a mapped IRQ */
  {
-if ( desc && irq->hwintid != desc->irq )
+if ( desc && irq->hwintid != desc->irq &&
+ (irq->active || irq->pending_latch) )


Same here, this should be

+if ( (desc && irq->hwintid != desc->irq) ||
+ irq->active || irq->pending_latch )
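
To see why the `||` form is needed, the two conditions can be compared side by side. The sketch below is not Xen code: `desc_mismatch` stands in for `desc && p->desc != desc` (or the `hwintid` comparison in the new vGIC), `in_flight_a`/`in_flight_b` stand in for the visible/active (or active/pending_latch) state, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Condition as posted in v2: reject only on a mismatch AND an in-flight IRQ. */
static bool reject_remove_as_posted(bool desc_mismatch, bool in_flight_a,
                                    bool in_flight_b)
{
    return desc_mismatch && (in_flight_a || in_flight_b);
}

/* Corrected condition: reject on a mismatch OR whenever the IRQ is in flight. */
static bool reject_remove_corrected(bool desc_mismatch, bool in_flight_a,
                                    bool in_flight_b)
{
    return desc_mismatch || in_flight_a || in_flight_b;
}
```

The two differ in exactly the cases that matter: a matching descriptor with an active or pending IRQ is accepted by the posted condition but rejected by the corrected one, and a mismatching descriptor with an idle IRQ is likewise only rejected by the corrected one.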

Kind regards,
Henry




[PATCH v3 3/4] tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE

2024-05-16 Thread Henry Wang
Currently the GUEST_MAGIC_BASE in the init-dom0less application is
hardcoded, which will lead to failures for 1:1 direct-mapped Dom0less
DomUs.

Since the guest magic region allocation from init-dom0less is for
XenStore, and the XenStore page is now allocated from the hypervisor,
instead of hardcoding the guest magic pages region, use
xc_hvm_param_get() to get the XenStore page PFN. Rename alloc_xs_page()
to get_xs_page() to reflect the changes.

With this change, some existing code is not needed anymore, including:
(1) The definition of the XenStore page offset.
(2) Call to xc_domain_setmaxmem() and xc_clear_domain_page() as we
don't need to set the max mem and clear the page anymore.
(3) Foreign mapping of the XenStore page, setting of XenStore interface
status and HVM_PARAM_STORE_PFN from init-dom0less, as they are set
by the hypervisor.

Take the opportunity to do some coding style improvements when possible.

Reported-by: Alec Kwapis 
Signed-off-by: Henry Wang 
---
v3:
- Only get the XenStore page.
- Drop the unneeded code.
v2:
- Update HVMOP keys name.
---
 tools/helpers/init-dom0less.c | 58 +--
 1 file changed, 14 insertions(+), 44 deletions(-)

diff --git a/tools/helpers/init-dom0less.c b/tools/helpers/init-dom0less.c
index fee93459c4..2b51965fa7 100644
--- a/tools/helpers/init-dom0less.c
+++ b/tools/helpers/init-dom0less.c
@@ -16,30 +16,18 @@
 
 #include "init-dom-json.h"
 
-#define XENSTORE_PFN_OFFSET 1
 #define STR_MAX_LENGTH 128
 
-static int alloc_xs_page(struct xc_interface_core *xch,
- libxl_dominfo *info,
- uint64_t *xenstore_pfn)
+static int get_xs_page(struct xc_interface_core *xch, libxl_dominfo *info,
+   uint64_t *xenstore_pfn)
 {
 int rc;
-const xen_pfn_t base = GUEST_MAGIC_BASE >> XC_PAGE_SHIFT;
-xen_pfn_t p2m = (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET;
 
-rc = xc_domain_setmaxmem(xch, info->domid,
- info->max_memkb + (XC_PAGE_SIZE/1024));
-if (rc < 0)
-return rc;
-
-rc = xc_domain_populate_physmap_exact(xch, info->domid, 1, 0, 0, &p2m);
-if (rc < 0)
-return rc;
-
-*xenstore_pfn = base + XENSTORE_PFN_OFFSET;
-rc = xc_clear_domain_page(xch, info->domid, *xenstore_pfn);
-if (rc < 0)
-return rc;
+rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_STORE_PFN, xenstore_pfn);
+if (rc < 0) {
+printf("Failed to get HVM_PARAM_STORE_PFN\n");
+return 1;
+}
 
 return 0;
 }
@@ -100,6 +88,7 @@ static bool do_xs_write_vm(struct xs_handle *xsh, 
xs_transaction_t t,
  */
 static int create_xenstore(struct xs_handle *xsh,
libxl_dominfo *info, libxl_uuid uuid,
+   uint64_t xenstore_pfn,
evtchn_port_t xenstore_port)
 {
 domid_t domid;
@@ -145,8 +134,7 @@ static int create_xenstore(struct xs_handle *xsh,
 rc = snprintf(target_memkb_str, STR_MAX_LENGTH, "%"PRIu64, info->current_memkb);
 if (rc < 0 || rc >= STR_MAX_LENGTH)
 return rc;
-rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%lld",
-  (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET);
+rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%"PRIu64, xenstore_pfn);
 if (rc < 0 || rc >= STR_MAX_LENGTH)
 return rc;
 rc = snprintf(xenstore_port_str, STR_MAX_LENGTH, "%u", xenstore_port);
@@ -230,7 +218,6 @@ static int init_domain(struct xs_handle *xsh,
 libxl_uuid uuid;
 uint64_t xenstore_evtchn, xenstore_pfn;
 int rc;
-struct xenstore_domain_interface *intf;
 
 printf("Init dom0less domain: %u\n", info->domid);
 
@@ -245,20 +232,11 @@ static int init_domain(struct xs_handle *xsh,
 if (!xenstore_evtchn)
 return 0;
 
-/* Alloc xenstore page */
-if (alloc_xs_page(xch, info, &xenstore_pfn) != 0) {
-printf("Error on alloc magic pages\n");
-return 1;
-}
-
-intf = xenforeignmemory_map(xfh, info->domid, PROT_READ | PROT_WRITE, 1,
-&xenstore_pfn, NULL);
-if (!intf) {
-printf("Error mapping xenstore page\n");
+/* Get xenstore page */
+if (get_xs_page(xch, info, &xenstore_pfn) != 0) {
+printf("Error on getting xenstore page\n");
 return 1;
 }
-intf->connection = XENSTORE_RECONNECT;
-xenforeignmemory_unmap(xfh, intf, 1);
 
 rc = xc_dom_gnttab_seed(xch, info->domid, true,
 (xen_pfn_t)-1, xenstore_pfn, 0, 0);
@@ -272,19 +250,11 @@ static int init_domain(struct xs_handle *xsh,
 if (rc)
 err(1, "gen_stub_json_config");
 
-/* Now everything is ready: set HVM_PARAM_STORE_PFN */
-rc = xc_hvm_param_set(xch, info->domid, HVM_PARAM_STORE_PFN

[PATCH v3 4/4] docs/features/dom0less: Update the late XenStore init protocol

2024-05-16 Thread Henry Wang
With the new allocation strategy for the Dom0less DomU XenStore page,
update the doc of the late XenStore init protocol accordingly.

Signed-off-by: Henry Wang 
---
v3:
- Wording change.
v2:
- New patch.
---
 docs/features/dom0less.pandoc | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/features/dom0less.pandoc b/docs/features/dom0less.pandoc
index 725afa0558..8b178edee0 100644
--- a/docs/features/dom0less.pandoc
+++ b/docs/features/dom0less.pandoc
@@ -110,9 +110,10 @@ hotplug PV drivers to dom0less guests. E.g. xl 
network-attach domU.
 The implementation works as follows:
 - Xen allocates the xenstore event channel for each dom0less domU that
   has the "xen,enhanced" property, and sets HVM_PARAM_STORE_EVTCHN
-- Xen does *not* allocate the xenstore page and sets HVM_PARAM_STORE_PFN
-  to ~0ULL (invalid)
-- Dom0less domU kernels check that HVM_PARAM_STORE_PFN is set to invalid
+- Xen allocates the xenstore page and sets HVM_PARAM_STORE_PFN as well
+  as the connection status to XENSTORE_RECONNECT.
+- Dom0less domU kernels check that HVM_PARAM_STORE_PFN is set to
+  ~0ULL (invalid) or the connection status is *not* XENSTORE_CONNECTED.
 - Old kernels will continue without xenstore support (Note: some old
   buggy kernels might crash because they don't check the validity of
   HVM_PARAM_STORE_PFN before using it! Disable "xen,enhanced" in
@@ -121,13 +122,14 @@ The implementation works as follows:
   channel (HVM_PARAM_STORE_EVTCHN) before continuing with the
   initialization
 - Once dom0 is booted, init-dom0less is executed:
-- it allocates the xenstore shared page and sets HVM_PARAM_STORE_PFN
+- it gets the xenstore shared page from HVM_PARAM_STORE_PFN
 - it calls xs_introduce_domain
 - Xenstored notices the new domain, initializes interfaces as usual, and
   sends an event channel notification to the domain using the xenstore
   event channel (HVM_PARAM_STORE_EVTCHN)
 - The Linux domU kernel receives the event channel notification, checks
-  HVM_PARAM_STORE_PFN again and continue with the initialization
+  HVM_PARAM_STORE_PFN and the connection status again and continue with
+  the initialization
 
 
 Limitations
-- 
2.34.1
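
The kernel-side check in the updated protocol list can be sketched as follows. This is a minimal illustration, not the Linux implementation: needs_late_xenstore_init() is a hypothetical name, and the numeric values of the connection states are assumed to match the public xs_wire.h header.

```c
#include <stdbool.h>
#include <stdint.h>

/* HVM_PARAM_STORE_PFN value meaning "no page", as stated in the doc. */
#define INVALID_STORE_PFN  (~0ULL)

/* Assumed values, mirroring the public xs_wire.h header. */
#define XENSTORE_CONNECTED 0u
#define XENSTORE_RECONNECT 1u

/*
 * Hypothetical helper: the domU must wait for the late XenStore init if the
 * page is invalid or the interface is not yet in the connected state.
 */
static bool needs_late_xenstore_init(uint64_t store_pfn, uint32_t connection)
{
    return store_pfn == INVALID_STORE_PFN ||
           connection != XENSTORE_CONNECTED;
}
```

With the page now allocated by Xen and marked XENSTORE_RECONNECT, the second half of the condition is what keeps a new kernel waiting until init-dom0less has run.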




[PATCH v3 2/4] xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

2024-05-16 Thread Henry Wang
There are use cases (for example, using PV drivers) in a Dom0less
setup that require Dom0less DomUs to start immediately with Dom0 but
initialize XenStore later, after Dom0 has booted successfully and the
init-dom0less application has been called.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

"Magic pages" is the term used in the toolstack for pages reserved
so that the VM has access to virtual platform capabilities.
Currently the magic pages for Dom0less DomUs are populated by the
init-dom0less app through populate_physmap(), and populate_physmap()
automatically assumes gfn == mfn for 1:1 direct-mapped domains. This
cannot be true for the magic pages that are allocated later from the
init-dom0less application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error, "failed to
retrieve a reserved page", can be seen because the reserved memory list
is empty at that time.

Since init-dom0less uses the magic page region only for XenStore,
solve the above issue by allocating the XenStore page for Dom0less
DomUs at domain construction time. The PFN is recorded and
communicated to the init-dom0less application executed from Dom0.
To keep the XenStore late init protocol, set the connection status
to XENSTORE_RECONNECT.
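
As a rough illustration of where the page ends up in the guest address space, the frame-number choice can be sketched as below. The constants are assumptions for the sketch (the value used for GUEST_MAGIC_BASE is inferred from the mfn 0x39000 in the error message above), and xenstore_gfn() is a hypothetical helper, not a Xen function.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT          12
/* Assumed for illustration; the real definitions live in Xen's headers. */
#define GUEST_MAGIC_BASE    0x39000000ULL
#define XENSTORE_PFN_OFFSET 1ULL

/*
 * Hypothetical helper: pick the guest frame number for the XenStore page.
 * Direct-mapped domains must use gfn == mfn; all other domains use the
 * fixed slot inside the guest magic region.
 */
static uint64_t xenstore_gfn(bool direct_mapped, uint64_t mfn)
{
    if ( direct_mapped )
        return mfn;

    return (GUEST_MAGIC_BASE >> PAGE_SHIFT) + XENSTORE_PFN_OFFSET;
}
```

Under these assumptions a non-direct-mapped domain gets gfn 0x39001, while a direct-mapped domain gets whatever frame the hypervisor allocated.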

Reported-by: Alec Kwapis 
Suggested-by: Daniel P. Smith 
Signed-off-by: Henry Wang 
---
v3:
- Only allocate XenStore page. (Julien)
- Set HVM_PARAM_STORE_PFN and the XenStore connection status directly
  from hypervisor. (Stefano)
v2:
- Reword the commit msg to explain what is "magic page" and use generic
  terminology "hypervisor reserved pages" in commit msg. (Daniel)
- Also move the offset definition of magic pages. (Michal)
- Extract the magic page allocation logic to a function. (Michal)
---
 xen/arch/arm/dom0less-build.c | 44 ++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..95c4fd1a2d 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -10,6 +11,8 @@
 #include 
 #include 
 
+#include 
+
 #include 
 #include 
 #include 
@@ -739,6 +742,42 @@ static int __init alloc_xenstore_evtchn(struct domain *d)
 return 0;
 }
 
+#define XENSTORE_PFN_OFFSET 1
+static int __init alloc_xenstore_page(struct domain *d)
+{
+struct page_info *xenstore_pg;
+struct xenstore_domain_interface *interface;
+mfn_t mfn;
+gfn_t gfn;
+int rc;
+
+d->max_pages += 1;
+xenstore_pg = alloc_domheap_page(d, 0);
+if ( xenstore_pg == NULL )
+return -ENOMEM;
+
+mfn = page_to_mfn(xenstore_pg);
+if ( !is_domain_direct_mapped(d) )
+gfn = gaddr_to_gfn(GUEST_MAGIC_BASE +
+   (XENSTORE_PFN_OFFSET << PAGE_SHIFT));
+else
+gfn = gaddr_to_gfn(mfn_to_maddr(mfn));
+
+rc = guest_physmap_add_page(d, gfn, mfn, 0);
+if ( rc )
+{
+free_domheap_page(xenstore_pg);
+return rc;
+}
+
+d->arch.hvm.params[HVM_PARAM_STORE_PFN] = gfn_x(gfn);
+interface = (struct xenstore_domain_interface *)map_domain_page(mfn);
+interface->connection = XENSTORE_RECONNECT;
+unmap_domain_page(interface);
+
+return 0;
+}
+
 static int __init construct_domU(struct domain *d,
  const struct dt_device_node *node)
 {
@@ -839,7 +878,10 @@ static int __init construct_domU(struct domain *d,
 rc = alloc_xenstore_evtchn(d);
 if ( rc < 0 )
 return rc;
-d->arch.hvm.params[HVM_PARAM_STORE_PFN] = ~0ULL;
+
+rc = alloc_xenstore_page(d);
+if ( rc < 0 )
+return rc;
 }
 
 return rc;
-- 
2.34.1




[PATCH v3 1/4] xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains

2024-05-16 Thread Henry Wang
Currently, users are allowed to map static shared memory in a
non-direct-mapped way for direct-mapped domains. This can lead to
clashing of guest memory spaces. Also, the current extended region
finding logic only removes the host physical addresses of the
static shared memory areas for direct-mapped domains, which may be
inconsistent with the guest memory map if users map the static
shared memory in a non-direct-mapped way. This will lead to incorrect
extended region calculation results.

To make things easier, add the restriction that static shared memory
should also be direct-mapped for direct-mapped domains. Check that the
host physical address matches the guest physical address when
parsing the device tree. Document this restriction in the doc.

Signed-off-by: Henry Wang 
---
v3:
- New patch.
---
 docs/misc/arm/device-tree/booting.txt | 3 +++
 xen/arch/arm/static-shmem.c   | 6 ++
 2 files changed, 9 insertions(+)

diff --git a/docs/misc/arm/device-tree/booting.txt 
b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2..c994e48391 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -591,6 +591,9 @@ communication.
 shared memory region in host physical address space, a size, and a guest
 physical address, as the target address of the mapping.
 e.g. xen,shared-mem = < [host physical address] [guest address] [size] >
+Note that if a domain is direct-mapped, i.e. the Dom0 and the Dom0less
+DomUs with `direct-map` device tree property, the static shared memory
+should also be direct-mapped (host physical address == guest address).
 
 It shall also meet the following criteria:
 1) If the SHM ID matches with an existing region, the address range of the
diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index 78881dd1d3..b26fb69874 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -235,6 +235,12 @@ int __init process_shm(struct domain *d, struct 
kernel_info *kinfo,
d, psize);
 return -EINVAL;
 }
+if ( is_domain_direct_mapped(d) && (pbase != gbase) )
+{
+printk("%pd: physical address 0x%"PRIpaddr" and guest address 0x%"PRIpaddr" are not 1:1 direct-mapped.\n",
+   d, pbase, gbase);
+return -EINVAL;
+}
 
 for ( i = 0; i < PFN_DOWN(psize); i++ )
 if ( !mfn_valid(mfn_add(maddr_to_mfn(pbase), i)) )
-- 
2.34.1




[PATCH v3 0/4] Guest XenStore page allocation for 11 Dom0less domUs

2024-05-16 Thread Henry Wang
Hi all,

This series tries to fix the reported guest magic region allocation
issue for 1:1 Dom0less domUs. An error message can be seen from the
init-dom0less application on 1:1 direct-mapped Dom0less DomUs:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

This is because populate_physmap() automatically assumes gfn == mfn
for direct-mapped domains. This cannot be true for the magic pages
that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
helper application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error, "failed to
retrieve a reserved page", can be seen because the reserved memory list
is empty at that time.

In [1] I tried to fix this issue with a domctl approach, and the
discussions in [2] and [3] indicate that a domctl is not really
necessary, as we can simplify the issue to "allocate the Dom0less
guest magic regions at the Dom0less DomU build time and pass the
region base PFN to init-dom0less application". The later
discussion [4] reached an agreement that we only need to allocate
one single page for XenStore, and set HVM_PARAM_STORE_PFN from the
hypervisor with some Linux XenStore late init protocol improvements.
Therefore, this series tries to fix the issue based on all discussions.
The first patch adds a restriction that static shared memory on
direct-mapped DomUs should also be direct-mapped, as otherwise it will
clash [5]. Patch 2 allocates the XenStore page from Xen and sets the
initial connection status to XENSTORE_RECONNECT. Patch 3 updates the
init-dom0less application accordingly. Patch 4 is the doc
change to reflect the changes introduced by this series.

**NOTE**: This series should work with the Linux change [6].

[1] https://lore.kernel.org/xen-devel/20240409045357.236802-1-xin.wa...@amd.com/
[2] https://lore.kernel.org/xen-devel/c7857223-eab8-409a-b618-6ec70f616...@apertussolutions.com/
[3] https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2404251508470.3940@ubuntu-linux-20-04-desktop/
[4] https://lore.kernel.org/xen-devel/d33ea00d-890d-45cc-9583-64c953abd...@xen.org/
[5] https://lore.kernel.org/xen-devel/686ba256-f8bf-47e7-872f-d277bf7df...@xen.org/
[6] https://lore.kernel.org/xen-devel/20240517011516.1451087-1-xin.wa...@amd.com/

Henry Wang (4):
  xen/arm/static-shmem: Static-shmem should be direct-mapped for
direct-mapped domains
  xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor
  tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE
  docs/features/dom0less: Update the late XenStore init protocol

 docs/features/dom0less.pandoc | 12 +++---
 docs/misc/arm/device-tree/booting.txt |  3 ++
 tools/helpers/init-dom0less.c | 58 +++
 xen/arch/arm/dom0less-build.c | 44 +++-
 xen/arch/arm/static-shmem.c   |  6 +++
 5 files changed, 73 insertions(+), 50 deletions(-)

-- 
2.34.1




Re: [PATCH v2 6/8] xen/arm: Add XEN_DOMCTL_dt_overlay DOMCTL and related operations

2024-05-16 Thread Henry Wang

Hi Jan,

As usual, thanks for the review!

On 5/16/2024 8:31 PM, Jan Beulich wrote:

On 16.05.2024 12:03, Henry Wang wrote:

+/*
+ * First check if dtbo is correct i.e. it should be one of the dtbo which was
+ * used when dynamically adding the node.
+ * Limitation: Cases with same node names but different property are not
+ * supported currently. We are relying on user to provide the same dtbo
+ * as it was used when adding the nodes.
+ */
+list_for_each_entry_safe( entry, temp, &overlay_tracker, entry )
+{
+if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 )
+{
+track = entry;
Random question (not doing a full review of the DT code): What use is
this (and the track variable itself)? It's never used further down afaics.
Same for attach.


I think you are correct: it is a copy-paste of the existing code, and the
track variable is indeed useless. So in v3, I will simply drop it and
mention this clean-up in the commit message. I also realized that the exact
logic for finding the entry is duplicated three times, so I will extract
it into a function as well.
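
A minimal sketch of what such a helper could look like, assuming a simplified singly linked tracker. The overlay_entry structure and the find_overlay_entry() name are hypothetical; the real code walks the tracker with Xen's list macros. The explicit size comparison is an addition of this sketch and is not something the quoted code does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified tracker entry; Xen's real structure differs. */
struct overlay_entry {
    const void *overlay_fdt;      /* dtbo blob used when the nodes were added */
    size_t overlay_fdt_size;      /* size of that blob in bytes */
    struct overlay_entry *next;
};

/*
 * Return the tracked entry whose dtbo is byte-for-byte identical to the
 * one supplied by the caller, or NULL if no such entry exists.
 */
static struct overlay_entry *find_overlay_entry(struct overlay_entry *head,
                                                const void *fdt, size_t size)
{
    for ( struct overlay_entry *e = head; e != NULL; e = e->next )
    {
        /* Compare sizes first so memcmp() never reads past a shorter blob. */
        if ( e->overlay_fdt_size == size &&
             memcmp(e->overlay_fdt, fdt, size) == 0 )
            return e;
    }

    return NULL;
}
```

Each of the three call sites would then reduce to one call plus a NULL check.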



--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1190,6 +1190,17 @@ struct xen_domctl_vmtrace_op {
  typedef struct xen_domctl_vmtrace_op xen_domctl_vmtrace_op_t;
  DEFINE_XEN_GUEST_HANDLE(xen_domctl_vmtrace_op_t);
  
+#if defined(__arm__) || defined (__aarch64__)

Nit: Consistent use of blanks please (also again below).


Good catch. Will fix it.


+struct xen_domctl_dt_overlay {
+XEN_GUEST_HANDLE_64(const_void) overlay_fdt;  /* IN: overlay fdt. */
+uint32_t overlay_fdt_size;  /* IN: Overlay dtb size. */
+#define XEN_DOMCTL_DT_OVERLAY_ATTACH 3
+#define XEN_DOMCTL_DT_OVERLAY_DETACH 4

While the numbers don't really matter much, picking 3 and 4 rather than,
say, 1 and 2 still looks a little odd.


Although I agree it is indeed a bit odd, the reason is that in the
current implementation I reused libxl_dt_overlay() (keeping it backward
compatible) to issue either the sysctl or the domctl depending on the
op, and we have:

#define LIBXL_DT_OVERLAY_ADD   1
#define LIBXL_DT_OVERLAY_REMOVE    2
#define LIBXL_DT_OVERLAY_ATTACH    3
#define LIBXL_DT_OVERLAY_DETACH    4

Then the op-number is passed from the toolstack to Xen, and checked in 
dt_overlay_domctl(). So with this implementation the attach/detach op 
number should be 3 and 4 since 1 and 2 have different meanings.
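
The constraint can be sketched as a single dispatch on the shared op number. The enum and the overlay_path_for_op() helper are hypothetical and only illustrate why 1 and 2 are not available for attach/detach when one entry point serves both hypercalls.

```c
/* Op numbers as listed above for libxl_dt_overlay(). */
#define LIBXL_DT_OVERLAY_ADD    1
#define LIBXL_DT_OVERLAY_REMOVE 2
#define LIBXL_DT_OVERLAY_ATTACH 3
#define LIBXL_DT_OVERLAY_DETACH 4

/* Hypothetical classification of which hypercall an op is sent through. */
enum overlay_path {
    OVERLAY_VIA_SYSCTL,
    OVERLAY_VIA_DOMCTL,
    OVERLAY_INVALID,
};

/* Hypothetical helper: one op number space shared by both paths. */
static enum overlay_path overlay_path_for_op(unsigned int op)
{
    switch ( op )
    {
    case LIBXL_DT_OVERLAY_ADD:
    case LIBXL_DT_OVERLAY_REMOVE:
        return OVERLAY_VIA_SYSCTL;

    case LIBXL_DT_OVERLAY_ATTACH:
    case LIBXL_DT_OVERLAY_DETACH:
        return OVERLAY_VIA_DOMCTL;

    default:
        return OVERLAY_INVALID;
    }
}
```

A separate domain-level entry point would remove the need for the shared number space.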


But I realized that I could also implement a similar API, say
libxl_dt_overlay_domain(). That way we can reuse 1 and 2, and there is
not even a need to provide backward compatibility handling for
libxl_dt_overlay(). Would you mind sharing which approach you would
prefer? Thanks!



--- a/xen/include/xen/dt-overlay.h
+++ b/xen/include/xen/dt-overlay.h
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include 

Why? All you need here ...


@@ -42,12 +43,18 @@ struct xen_sysctl_dt_overlay;
  
  #ifdef CONFIG_OVERLAY_DTB

  long dt_overlay_sysctl(struct xen_sysctl_dt_overlay *op);
+long dt_overlay_domctl(struct domain *d, struct xen_domctl_dt_overlay *op);

... is a forward declaration of struct xen_domctl_dt_overlay.


Oh indeed. Will fix this. Thanks!

Kind regards,
Henry



Jan





Re: [PATCH] drivers/xen: Improve the late XenStore init protocol

2024-05-16 Thread Henry Wang

Hi Stefano,

On 5/17/2024 8:52 AM, Stefano Stabellini wrote:

On Thu, 16 May 2024, Henry Wang wrote:

   enum xenstore_init xen_store_domain_type;
   EXPORT_SYMBOL_GPL(xen_store_domain_type);
   @@ -751,9 +755,10 @@ static void xenbus_probe(void)
   {
xenstored_ready = 1;
   -if (!xen_store_interface) {
-   xen_store_interface = memremap(xen_store_gfn <<
XEN_PAGE_SHIFT,
-  XEN_PAGE_SIZE, MEMREMAP_WB);
+   if (!xen_store_interface || XS_INTERFACE_READY) {
+   if (!xen_store_interface)

These two nested if's don't make sense to me. If XS_INTERFACE_READY
succeeds, it means that  ((xen_store_interface != NULL) &&
(xen_store_interface->connection == XENSTORE_CONNECTED)).

So it is not possible that xen_store_interface == NULL immediately
after. Right?

I think this is because we want to free the irq for the late init case,
otherwise the init-dom0less will fail. For the xenstore PFN allocated case,
the connection is already set to CONNECTED when we execute init-dom0less. But
I agree with you. Would the diff below make more sense to you?

diff --git a/drivers/xen/xenbus/xenbus_probe.c
b/drivers/xen/xenbus/xenbus_probe.c
index 8aec0ed1d047..b8005b651a29 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -76,6 +76,8 @@ EXPORT_SYMBOL_GPL(xen_store_interface);
     ((xen_store_interface != NULL) && \
  (xen_store_interface->connection == XENSTORE_CONNECTED))

+static bool xs_late_init = false;
+
  enum xenstore_init xen_store_domain_type;
  EXPORT_SYMBOL_GPL(xen_store_domain_type);

@@ -755,7 +757,7 @@ static void xenbus_probe(void)
  {
     xenstored_ready = 1;

-   if (!xen_store_interface || XS_INTERFACE_READY) {
+   if (xs_late_init) {
     if (!xen_store_interface)
     xen_store_interface = memremap(xen_store_gfn <<


I would just remove the outer 'if' and do this:


if (!xen_store_interface)
xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT,
XEN_PAGE_SIZE, MEMREMAP_WB);
/*
 * Now it is safe to free the IRQ used for xenstore late
 * initialization. No need to unbind: it is about to be
 * bound again from xb_init_comms. Note that calling
 * unbind_from_irqhandler now would result in xen_evtchn_close()
 * being called and the event channel not being enabled again
 * afterwards, resulting in missed event notifications.
 */
if (xs_init_irq > 0)
free_irq(xs_init_irq, &xb_waitq);


I think this should work fine in all cases.


Thanks. I followed your suggestion in v2.


I am unsure whether xs_init_irq==0 is a possible valid value for
xs_init_irq. If it is not, then we are fine. If 0 is a possible valid IRQ
number, then we should initialize xs_init_irq to -1, and check for
xs_init_irq >= 0 here.


Yes, xs_init_irq==0 is a valid value. I followed your latter suggestion:
initialize it to -1 and check for >= 0.


Kind regards,
Henry
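
To illustrate the sentinel discussion above, here is a minimal sketch (not the driver code) of why -1 rather than 0 has to mark "no IRQ bound" once 0 is a valid IRQ number. All names below (IRQ_UNSET, late_init_irq, release_irq, release_late_init_irq) are hypothetical stand-ins, and resetting the variable after release is an assumption of the sketch, not something the patch does.

```c
#include <stdbool.h>

/* Hypothetical sentinel: no IRQ has been bound yet. */
#define IRQ_UNSET (-1)

/* Hypothetical stand-in for the driver's xs_init_irq variable. */
static int late_init_irq = IRQ_UNSET;

/* Counts releases so the behaviour can be observed. */
static int freed_count;

/* Hypothetical stand-in for free_irq(). */
static void release_irq(int irq)
{
    (void)irq;
    freed_count++;
}

/* Release the late-init IRQ only if one was actually bound. */
static bool release_late_init_irq(void)
{
    if (late_init_irq < 0)      /* IRQ 0 is valid, so test against < 0 */
        return false;

    release_irq(late_init_irq);
    late_init_irq = IRQ_UNSET;  /* assumption: make a second call a no-op */
    return true;
}
```

With a `> 0` check, an IRQ bound as number 0 would never be released, which is the case the review comment is guarding against.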



[PATCH v2] drivers/xen: Improve the late XenStore init protocol

2024-05-16 Thread Henry Wang
Currently, the late XenStore init protocol is only triggered properly
when HVM_PARAM_STORE_PFN is ~0ULL (invalid). When the XenStore
interface is allocated but not ready (the connection status is not
XENSTORE_CONNECTED), Linux should also wait until XenStore is set up
properly.

Introduce a macro that describes whether the XenStore interface is
ready, and use it in xenbus_probe_initcall() to select whether to take
the late XenStore init code path. Since there is now more than one
condition for XenStore late init, rework the check in xenbus_probe()
for the free_irq().

Take the opportunity to also check that the allocated XenStore
interface can be properly mapped, and return an error early if
memremap() fails.

Fixes: 5b3353949e89 ("xen: add support for initializing xenstore later as HVM domain")
Signed-off-by: Henry Wang 
Signed-off-by: Michal Orzel 
---
v2:
- Use -EINVAL for the memremap() check. (Stefano)
- Add Fixes: tag. (Stefano)
- Rework the condition for free_irq() in xenbus_probe(). (Stefano)
---
 drivers/xen/xenbus/xenbus_probe.c | 36 ---
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c 
b/drivers/xen/xenbus/xenbus_probe.c
index 3205e5d724c8..1a9ded0cddcb 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -65,13 +65,17 @@
 #include "xenbus.h"
 
 
-static int xs_init_irq;
+static int xs_init_irq = -1;
 int xen_store_evtchn;
 EXPORT_SYMBOL_GPL(xen_store_evtchn);
 
 struct xenstore_domain_interface *xen_store_interface;
 EXPORT_SYMBOL_GPL(xen_store_interface);
 
+#define XS_INTERFACE_READY \
+   ((xen_store_interface != NULL) && \
+(xen_store_interface->connection == XENSTORE_CONNECTED))
+
 enum xenstore_init xen_store_domain_type;
 EXPORT_SYMBOL_GPL(xen_store_domain_type);
 
@@ -751,19 +755,19 @@ static void xenbus_probe(void)
 {
xenstored_ready = 1;
 
-   if (!xen_store_interface) {
+   if (!xen_store_interface)
xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT,
   XEN_PAGE_SIZE, MEMREMAP_WB);
-   /*
-* Now it is safe to free the IRQ used for xenstore late
-* initialization. No need to unbind: it is about to be
-* bound again from xb_init_comms. Note that calling
-* unbind_from_irqhandler now would result in xen_evtchn_close()
-* being called and the event channel not being enabled again
-* afterwards, resulting in missed event notifications.
-*/
+   /*
+* Now it is safe to free the IRQ used for xenstore late
+* initialization. No need to unbind: it is about to be
+* bound again from xb_init_comms. Note that calling
+* unbind_from_irqhandler now would result in xen_evtchn_close()
+* being called and the event channel not being enabled again
+* afterwards, resulting in missed event notifications.
+*/
+   if (xs_init_irq >= 0)
free_irq(xs_init_irq, &xb_waitq);
-   }
 
/*
 * In the HVM case, xenbus_init() deferred its call to
@@ -822,7 +826,7 @@ static int __init xenbus_probe_initcall(void)
if (xen_store_domain_type == XS_PV ||
(xen_store_domain_type == XS_HVM &&
 !xs_hvm_defer_init_for_callback() &&
-xen_store_interface != NULL))
+XS_INTERFACE_READY))
xenbus_probe();
 
/*
@@ -831,7 +835,7 @@ static int __init xenbus_probe_initcall(void)
 * started, then probe.  It will be triggered when communication
 * starts happening, by waiting on xb_waitq.
 */
-   if (xen_store_domain_type == XS_LOCAL || xen_store_interface == NULL) {
+   if (xen_store_domain_type == XS_LOCAL || !XS_INTERFACE_READY) {
struct task_struct *probe_task;
 
probe_task = kthread_run(xenbus_probe_thread, NULL,
@@ -1014,6 +1018,12 @@ static int __init xenbus_init(void)
xen_store_interface =
memremap(xen_store_gfn << XEN_PAGE_SHIFT,
 XEN_PAGE_SIZE, MEMREMAP_WB);
+   if (!xen_store_interface) {
+   pr_err("%s: cannot map HVM_PARAM_STORE_PFN=%llx\n",
+  __func__, v);
+   err = -EINVAL;
+   goto out_error;
+   }
if (xen_store_interface->connection != XENSTORE_CONNECTED)
wait = true;
}
-- 
2.34.1
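
The readiness check introduced by this patch can be sketched outside the kernel as follows. This is a simplified model under stated assumptions: the types and names (store_iface, conn_state, choose_probe_path) are hypothetical, the enum values are illustrative, and the real code also considers the domain type and xs_hvm_defer_init_for_callback(), which are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection states; the values are illustrative only. */
enum conn_state { CONN_CONNECTED, CONN_RECONNECT };

/* Hypothetical stand-in for the shared XenStore interface page. */
struct store_iface {
    enum conn_state connection;
};

/* Mirrors the idea of XS_INTERFACE_READY: mapped and connected. */
static bool iface_ready(const struct store_iface *iface)
{
    return iface != NULL && iface->connection == CONN_CONNECTED;
}

enum probe_path { PROBE_NOW, PROBE_LATE };

/* Probe immediately only when the interface is ready; otherwise wait. */
static enum probe_path choose_probe_path(const struct store_iface *iface)
{
    return iface_ready(iface) ? PROBE_NOW : PROBE_LATE;
}
```

The point of the sketch is the second case: an interface that is mapped but not yet connected now takes the late path, where the old check on the pointer alone would have probed too early.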




[PATCH v2 3/8] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs

2024-05-16 Thread Henry Wang
There are some use cases in which the dom0less domUs need to have
the XEN_DOMCTL_CDF_iommu set at the domain construction time. For
example, the dynamic dtbo feature allows the domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need a way to specify that the domain will need the IOMMU
mapping at domain construction time.

Introduce a "passthrough" DT property for Dom0less DomUs, following
the same entry in xl.cfg. Currently only two options are provided,
i.e. "enabled" and "disabled". Set XEN_DOMCTL_CDF_iommu at domain
construction time based on the property.

Signed-off-by: Henry Wang 
---
v2:
- New patch to replace the original patch in v1:
  "[PATCH 03/15] xen/arm: Always enable IOMMU"
---
 docs/misc/arm/device-tree/booting.txt | 13 +
 xen/arch/arm/dom0less-build.c |  7 +--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt 
b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2..61f9082553 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -260,6 +260,19 @@ with the following properties:
 value specified by Xen command line parameter gnttab_max_maptrack_frames
 (or its default value if unspecified, i.e. 1024) is used.
 
+- passthrough
+
+A string property specifying whether IOMMU mappings are enabled for the
+domain and hence whether it will be enabled for passthrough hardware.
+Possible property values are:
+
+- "enabled"
+IOMMU mappings are enabled for the domain.
+
+- "disabled"
+IOMMU mappings are disabled for the domain and so hardware may not be
+passed through. This option is the default if this property is missing.
+
 Under the "xen,domain" compatible node, one or more sub-nodes are present
 for the DomU kernel and ramdisk.
 
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..1396a102e1 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -848,6 +848,7 @@ static int __init construct_domU(struct domain *d,
 void __init create_domUs(void)
 {
 struct dt_device_node *node;
+const char *dom0less_iommu;
 const struct dt_device_node *cpupool_node,
 *chosen = dt_find_node_by_path("/chosen");
 
@@ -895,8 +896,10 @@ void __init create_domUs(void)
 panic("Missing property 'cpus' for domain %s\n",
   dt_node_name(node));
 
-if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") &&
- iommu_enabled )
+if ( iommu_enabled &&
+ ((!dt_property_read_string(node, "passthrough", &dom0less_iommu) &&
+   !strcmp(dom0less_iommu, "enabled")) ||
+  dt_find_compatible_node(node, NULL, "multiboot,device-tree")) )
 d_cfg.flags |= XEN_DOMCTL_CDF_iommu;
 
 if ( !dt_property_read_u32(node, "nr_spis", &d_cfg.arch.nr_spis) )
-- 
2.34.1
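
The condition added in create_domUs() can be read as a small predicate. The sketch below restates it in standalone C; the function name wants_iommu and its parameters are hypothetical, and a NULL string stands in for a missing "passthrough" property.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Decide whether the IOMMU flag should be set for a dom0less domU.
 *  - iommu_enabled:  the IOMMU is usable on this host
 *  - passthrough:    value of the "passthrough" property, NULL if absent
 *  - has_partial_dt: a "multiboot,device-tree" node is present
 */
static bool wants_iommu(bool iommu_enabled, const char *passthrough,
                        bool has_partial_dt)
{
    bool requested = passthrough != NULL &&
                     strcmp(passthrough, "enabled") == 0;

    return iommu_enabled && (requested || has_partial_dt);
}
```

A missing property and the value "disabled" behave the same way here, which matches the documented default.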




[PATCH v2 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs

2024-05-16 Thread Henry Wang
From: Vikram Garhwal 

Currently, routing/removing physical interrupts is only allowed at
domain creation/destroy time. For use cases such as dynamic device
tree overlay adding/removing, routing/removing physical IRQs to/from
running domains should be allowed.

Remove the above-mentioned domain creation/dying checks. Since this
would let the interrupt state get out of sync when the interrupt is
active or pending in the guest, simply reject the operation in these
cases. Do it for both the new and old vGIC implementations.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
v2:
- Reject the case where the IRQ is active or pending in guest.
---
 xen/arch/arm/gic-vgic.c  |  8 ++--
 xen/arch/arm/gic.c   | 15 ---
 xen/arch/arm/vgic/vgic.c |  5 +++--
 3 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 56490dbc43..d1608415f8 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -444,14 +444,18 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, 
unsigned int virq,
 {
 /* The VIRQ should not be already enabled by the guest */
 if ( !p->desc &&
- !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+ !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
 p->desc = desc;
 else
 ret = -EBUSY;
 }
 else
 {
-if ( desc && p->desc != desc )
+if ( desc && p->desc != desc &&
+ (test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
+  test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status)) )
 ret = -EINVAL;
 else
 p->desc = NULL;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 44c40e86de..3ebd89940a 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -135,14 +135,6 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int 
virq,
 ASSERT(virq < vgic_num_irqs(d));
 ASSERT(!is_lpi(virq));
 
-/*
- * When routing an IRQ to guest, the virtual state is not synced
- * back to the physical IRQ. To prevent get unsync, restrict the
- * routing to when the Domain is been created.
- */
-if ( d->creation_finished )
-return -EBUSY;
-
 ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);
 if ( ret )
 return ret;
@@ -167,13 +159,6 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned 
int virq,
 ASSERT(test_bit(_IRQ_GUEST, &desc->status));
 ASSERT(!is_lpi(virq));
 
-/*
- * Removing an interrupt while the domain is running may have
- * undesirable effect on the vGIC emulation.
- */
-if ( !d->is_dying )
-return -EBUSY;
-
 desc->handler->shutdown(desc);
 
 /* EOI the IRQ if it has not been done by the guest */
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index b9463a5f27..785ef2b192 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -877,7 +877,7 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu,
 if ( connect )  /* assign a mapped IRQ */
 {
 /* The VIRQ should not be already enabled by the guest */
-if ( !irq->hw && !irq->enabled )
+if ( !irq->hw && !irq->enabled && !irq->active && !irq->pending_latch )
 {
 irq->hw = true;
 irq->hwintid = desc->irq;
@@ -887,7 +887,8 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *vcpu,
 }
 else/* remove a mapped IRQ */
 {
-if ( desc && irq->hwintid != desc->irq )
+if ( desc && irq->hwintid != desc->irq &&
+ (irq->active || irq->pending_latch) )
 {
 ret = -EINVAL;
 }
-- 
2.34.1
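
The behaviour described in the commit message (reject connect or disconnect while the interrupt is in flight) can be sketched as below. This is an illustration of the stated intent, not the patch's literal conditions: the struct virq_state and both functions are hypothetical, the checks are kept as separate steps, and returning -EBUSY for an in-flight removal is an assumption of the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one virtual interrupt's state. */
struct virq_state {
    const void *hw_desc;   /* physical IRQ descriptor, NULL if unconnected */
    bool enabled;
    bool pending;
    bool active;
};

/* Connect a physical IRQ; refuse if the virtual IRQ is already in use. */
static int connect_hw_irq(struct virq_state *v, const void *desc)
{
    if (v->hw_desc != NULL || v->enabled || v->pending || v->active)
        return -EBUSY;

    v->hw_desc = desc;
    return 0;
}

/* Disconnect a physical IRQ; refuse while it is pending or active. */
static int disconnect_hw_irq(struct virq_state *v, const void *desc)
{
    if (desc != NULL && v->hw_desc != desc)
        return -EINVAL;        /* caller named the wrong descriptor */

    if (v->pending || v->active)
        return -EBUSY;         /* assumption: in-flight state blocks removal */

    v->hw_desc = NULL;
    return 0;
}
```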




[PATCH v2 7/8] tools: Introduce the "xl dt-overlay {attach,detach}" commands

2024-05-16 Thread Henry Wang
With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach/detach devices from the provided DT overlay to domains.
Support this by introducing a new set of "xl dt-overlay" commands and
related documentation, i.e. "xl dt-overlay {attach,detach}". Slightly
rework the command option parsing logic.

Since the addition of these two commands modifies the existing libxl
API libxl_dt_overlay(), also provide backward compatibility for it.

Signed-off-by: Henry Wang 
---
v2:
- New patch.
---
 tools/include/libxl.h   | 15 -
 tools/include/xenctrl.h |  3 +++
 tools/libs/ctrl/xc_dt_overlay.c | 31 +++
 tools/libs/light/libxl_dt_overlay.c | 30 --
 tools/xl/xl_cmdtable.c  |  4 ++--
 tools/xl/xl_vmcontrol.c | 33 +++--
 6 files changed, 96 insertions(+), 20 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 62cb07dea6..27aab4bcee 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -2549,8 +2549,21 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, 
uint32_t domid,
 void libxl_device_pci_list_free(libxl_device_pci* list, int num);
 
 #if defined(__arm__) || defined(__aarch64__)
-int libxl_dt_overlay(libxl_ctx *ctx, void *overlay,
+#define LIBXL_DT_OVERLAY_ADD   1
+#define LIBXL_DT_OVERLAY_REMOVE2
+#define LIBXL_DT_OVERLAY_ATTACH3
+#define LIBXL_DT_OVERLAY_DETACH4
+
+int libxl_dt_overlay(libxl_ctx *ctx, uint32_t domain_id, void *overlay,
  uint32_t overlay_size, uint8_t overlay_op);
+#if defined(LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x041900
+static inline int libxl_dt_overlay_0x041800(libxl_ctx *ctx, void *overlay,
+  uint32_t overlay_size, uint8_t overlay_op)
+{
+return libxl_dt_overlay(ctx, 0, overlay, overlay_size, overlay_op);
+}
+#define libxl_dt_overlay libxl_dt_overlay_0x041800
+#endif
 #endif
 
 /*
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 4996855944..9ceca0cffc 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2657,6 +2657,9 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t 
domid,
 #if defined(__arm__) || defined(__aarch64__)
 int xc_dt_overlay(xc_interface *xch, void *overlay_fdt,
   uint32_t overlay_fdt_size, uint8_t overlay_op);
+int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt,
+ uint32_t overlay_fdt_size, uint8_t overlay_op,
+ uint32_t domain_id);
 #endif
 
 /* Compat shims */
diff --git a/tools/libs/ctrl/xc_dt_overlay.c b/tools/libs/ctrl/xc_dt_overlay.c
index c2224c4d15..ea1da522d1 100644
--- a/tools/libs/ctrl/xc_dt_overlay.c
+++ b/tools/libs/ctrl/xc_dt_overlay.c
@@ -48,3 +48,34 @@ err:
 
 return err;
 }
+
+int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt,
+ uint32_t overlay_fdt_size, uint8_t overlay_op,
+ uint32_t domain_id)
+{
+int err;
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_dt_overlay,
+.domain = domain_id,
+.u.dt_overlay = {
+.overlay_op = overlay_op,
+.overlay_fdt_size = overlay_fdt_size,
+}
+};
+
+DECLARE_HYPERCALL_BOUNCE(overlay_fdt, overlay_fdt_size,
+ XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( (err = xc_hypercall_bounce_pre(xch, overlay_fdt)) )
+goto err;
+
+set_xen_guest_handle(domctl.u.dt_overlay.overlay_fdt, overlay_fdt);
+
+if ( (err = do_domctl(xch, &domctl)) != 0 )
+PERROR("%s failed", __func__);
+
+err:
+xc_hypercall_bounce_post(xch, overlay_fdt);
+
+return err;
+}
diff --git a/tools/libs/light/libxl_dt_overlay.c 
b/tools/libs/light/libxl_dt_overlay.c
index a6c709a6dc..9110b1efd2 100644
--- a/tools/libs/light/libxl_dt_overlay.c
+++ b/tools/libs/light/libxl_dt_overlay.c
@@ -41,8 +41,8 @@ static int check_overlay_fdt(libxl__gc *gc, void *fdt, size_t 
size)
 return 0;
 }
 
-int libxl_dt_overlay(libxl_ctx *ctx, void *overlay_dt, uint32_t 
overlay_dt_size,
- uint8_t overlay_op)
+int libxl_dt_overlay(libxl_ctx *ctx, uint32_t domain_id, void *overlay_dt,
+ uint32_t overlay_dt_size, uint8_t overlay_op)
 {
 int rc;
 int r;
@@ -57,11 +57,29 @@ int libxl_dt_overlay(libxl_ctx *ctx, void *overlay_dt, 
uint32_t overlay_dt_size,
 rc = 0;
 }
 
-r = xc_dt_overlay(ctx->xch, overlay_dt, overlay_dt_size, overlay_op);
-
-if (r) {
-LOG(ERROR, "%s: Adding/Removing overlay dtb failed.", __func__);
+switch (overlay_op)
+{
+case LIBXL_DT_OVERLAY_ADD:
+case LIBXL_DT_OVERLAY_REMOVE:
+r = xc_dt_overlay(ctx->xch, overlay_dt, overlay_dt_size, overlay_op);
+if (r) {
+LOG(ERROR, "%s: Adding/Removing

[PATCH v2 2/8] tools/xl: Correct the help information and exit code of the dt-overlay command

2024-05-16 Thread Henry Wang
Fix the name mismatch in the xl dt-overlay command, the
command name should be "dt-overlay" instead of "dt_overlay".
Add the missing "," in the cmdtable.

Fix the exit code of the dt-overlay command, use EXIT_FAILURE
instead of ERROR_FAIL.

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support")
Suggested-by: Anthony PERARD 
Signed-off-by: Henry Wang 
---
v2:
- New patch
---
 tools/xl/xl_cmdtable.c  | 2 +-
 tools/xl/xl_vmcontrol.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 62bdb2aeaa..1f3c6b5897 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -635,7 +635,7 @@ const struct cmd_spec cmd_table[] = {
 { "dt-overlay",
  &main_dt_overlay, 0, 1,
   "Add/Remove a device tree overlay",
-  "add/remove <.dtbo>"
+  "add/remove <.dtbo>",
   "-h print this help\n"
 },
 #endif
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 98f6bd2e76..02575d5d36 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1278,7 +1278,7 @@ int main_dt_overlay(int argc, char **argv)
 const int overlay_remove_op = 2;
 
 if (argc < 2) {
-help("dt_overlay");
+help("dt-overlay");
 return EXIT_FAILURE;
 }
 
@@ -1302,11 +1302,11 @@ int main_dt_overlay(int argc, char **argv)
 fprintf(stderr, "failed to read the overlay device tree file %s\n",
 overlay_config_file);
 free(overlay_dtb);
-return ERROR_FAIL;
+return EXIT_FAILURE;
 }
 } else {
 fprintf(stderr, "overlay dtbo file not provided\n");
-return ERROR_FAIL;
+return EXIT_FAILURE;
 }
 
 rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
-- 
2.34.1
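
The exit code fix above rests on keeping two namespaces apart: library error codes, which are typically negative, and process exit statuses. A minimal sketch of the mapping, with a hypothetical error constant (SKETCH_ERROR_FAIL is not the real libxl definition):

```c
#include <stdlib.h>

/* Hypothetical library error code; negative like most library errors. */
#define SKETCH_ERROR_FAIL (-3)

/*
 * Map a library return code to a process exit status. Returning the
 * negative code directly from main() would be truncated by the OS
 * (on POSIX only the low 8 bits are reported), so the caller would
 * see an unrelated value instead of a plain failure.
 */
static int to_exit_status(int rc)
{
    return rc == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```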




[PATCH v2 6/8] xen/arm: Add XEN_DOMCTL_dt_overlay DOMCTL and related operations

2024-05-16 Thread Henry Wang
In order to support dynamic dtbo device assignment to a running
VM, the add/remove of the DT overlay and the attach/detach of the
device from the DT overlay should happen separately. Therefore,
repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
overlay to the Xen device tree, instead of assigning the device to the
hardware domain at the same time. Add XEN_DOMCTL_dt_overlay with
operations XEN_DOMCTL_DT_OVERLAY_{ATTACH,DETACH} to do/undo the
device assignment to the domain.

The hypervisor first checks that the DT overlay passed from the
toolstack is valid. Then the device nodes are retrieved from the overlay
tracker based on the DT overlay. The attach/detach of the device is
implemented by mapping/unmapping the IRQ and IOMMU resources. Note that
with these changes, the device de-registration from the IOMMU driver
should only happen when the DT overlay is removed from the Xen device
tree.

Signed-off-by: Henry Wang 
Signed-off-by: Vikram Garhwal 
---
v2:
- New patch.
---
 xen/arch/arm/domctl.c|   3 +
 xen/common/dt-overlay.c  | 415 ---
 xen/include/public/domctl.h  |  15 ++
 xen/include/public/sysctl.h  |   7 +-
 xen/include/xen/dt-overlay.h |   7 +
 5 files changed, 366 insertions(+), 81 deletions(-)

diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index ad56efb0f5..12a12ee781 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2012, Citrix Systems
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -176,6 +177,8 @@ long arch_do_domctl(struct xen_domctl *domctl, struct 
domain *d,
 
 return rc;
 }
+case XEN_DOMCTL_dt_overlay:
+return dt_overlay_domctl(d, &domctl->u.dt_overlay);
 default:
 return subarch_do_domctl(domctl, d, u_domctl);
 }
diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 9cece79067..593e985949 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -356,24 +356,100 @@ static int overlay_get_nodes_info(const void *fdto, char 
**nodes_full_path)
 return 0;
 }
 
+static int remove_irq(unsigned long s, unsigned long e, void *data)
+{
+struct domain *d = data;
+int rc = 0;
+
+/*
+ * IRQ should always have access unless there is duplication of
+ * IRQs in the device tree. There are a few cases in the Xen device tree
+ * where there are duplicate interrupts for the same node.
+ */
+if (!irq_access_permitted(d, s))
+return 0;
+/*
+ * TODO: We don't handle shared IRQs for now. So, it is assumed that
+ * the IRQs were not shared with another domain.
+ */
+rc = irq_deny_access(d, s);
+if ( rc )
+{
+printk(XENLOG_ERR "unable to revoke access for irq %ld\n", s);
+return rc;
+}
+
+rc = release_guest_irq(d, s);
+if ( rc )
+{
+printk(XENLOG_ERR "unable to release irq %ld\n", s);
+return rc;
+}
+
+return rc;
+}
+
+static int remove_all_irqs(struct rangeset *irq_ranges, struct domain *d)
+{
+return rangeset_report_ranges(irq_ranges, 0, ~0UL, remove_irq, d);
+}
+
+static int remove_iomem(unsigned long s, unsigned long e, void *data)
+{
+struct domain *d = data;
+int rc = 0;
+p2m_type_t t;
+mfn_t mfn;
+
+mfn = p2m_lookup(d, _gfn(s), &t);
+if ( mfn_x(mfn) == 0 || mfn_x(mfn) == ~0UL )
+return -EINVAL;
+
+rc = iomem_deny_access(d, s, e);
+if ( rc )
+{
+printk(XENLOG_ERR "Unable to remove %pd access to %#lx - %#lx\n",
+   d, s, e);
+return rc;
+}
+
+rc = unmap_mmio_regions(d, _gfn(s), e - s, _mfn(s));
+if ( rc )
+return rc;
+
+return rc;
+}
+
+static int remove_all_iomems(struct rangeset *iomem_ranges, struct domain *d)
+{
+return rangeset_report_ranges(iomem_ranges, 0, ~0UL, remove_iomem, d);
+}
+
 /* Check if node itself can be removed and remove node from IOMMU. */
-static int remove_node_resources(struct dt_device_node *device_node)
+static int remove_node_resources(struct dt_device_node *device_node,
+ struct domain *d)
 {
 int rc = 0;
 unsigned int len;
 domid_t domid;
 
-domid = dt_device_used_by(device_node);
+if ( !d )
+{
+domid = dt_device_used_by(device_node);
 
-dt_dprintk("Checking if node %s is used by any domain\n",
-   device_node->full_name);
+dt_dprintk("Checking if node %s is used by any domain\n",
+   device_node->full_name);
 
-/* Remove the node if only it's assigned to hardware domain or domain io. */
-if ( domid != hardware_domain->domain_id && domid != DOMID_IO )
-{
-printk(XENLOG_ERR "Device %s is being used by domain %u. Removing nodes failed\n",
-   device_node->full_name, domid);
-return -EINVAL;
+/*
+ * We also check if device is assigned t

[PATCH v2 8/8] docs: Add device tree overlay documentation

2024-05-16 Thread Henry Wang
From: Vikram Garhwal 

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
v2:
- Update the content based on the changes in this version.
---
 docs/misc/arm/overlay.txt | 99 +++
 1 file changed, 99 insertions(+)
 create mode 100644 docs/misc/arm/overlay.txt

diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt
new file mode 100644
index 00..811a6de369
--- /dev/null
+++ b/docs/misc/arm/overlay.txt
@@ -0,0 +1,99 @@
+# Device Tree Overlays support in Xen
+
+Xen now supports dynamic device assignment to running domains,
+i.e. adding/removing nodes (using .dtbo) to/from Xen device tree, and
+attaching/detaching them to/from a running domain with given $domid.
+
+Dynamic node assignment works in two steps:
+
+## Add/Remove device tree overlay to/from Xen device tree
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor applies/removes the dtbo to/from Xen device tree.
+
+## Attach/Detach device from the DT overlay to/from domain
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor attaches/detaches the device to/from the user-provided $domid by
+   mapping/unmapping node resources in the DT overlay.
+
+# Examples
+
+Here are a few examples on how to use it.
+
+## Dom0 device add
+
+To assign a device tree overlay to Dom0, the user should first properly
+prepare the DT overlay. More information about device tree overlays can be
+found in [1]. Then, in Dom0, enter the following:
+
+(dom0) xl dt-overlay add overlay.dtbo
+
+This will add the devices mentioned in overlay.dtbo to the Xen device tree.
+
+To assign the newly added device from the dtbo to Dom0:
+
+(dom0) xl dt-overlay attach overlay.dtbo 0
+
+Next, if the user wants to add the same device tree overlay to dom0
+Linux, execute the following:
+
+(dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(dom0) cat overlay.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be loaded using:
+
+(dom0) modprobe module_name.ko
+
+## Dom0 device remove
+
+For removing the device from Dom0, first detach the device from Dom0:
+
+(dom0) xl dt-overlay detach overlay.dtbo 0
+
+NOTE: The user is expected to unload any Linux kernel modules which
+might be accessing the devices in overlay.dtbo before detaching the device.
+Detaching devices without unloading the modules might result in a crash.
+
+Then remove the overlay from Xen device tree:
+
+(dom0) xl dt-overlay remove overlay.dtbo
+
+## DomU device add/remove
+
+All the nodes in dtbo will be assigned to a domain; the user will need
+to prepare the dtb for the domU. For example, the `interrupt-parent` property
+of the DomU overlay should be changed to the Xen hardcoded value `0xfde8`.
+Below assumes the properly written DomU dtbo is `overlay_domu.dtbo`.
+
+User will need to create the DomU with below properties properly configured
+in the xl config file:
+- `iomem`
+- `passthrough` (if IOMMU is needed)
+
+User will also need to modprobe the relevant drivers.
+
+Example for domU device add:
+
+(dom0) xl dt-overlay add overlay.dtbo        # If not executed before
+(dom0) xl dt-overlay attach overlay.dtbo $domid
+(dom0) xl console $domid # To access $domid console
+
+Next, if the user needs to modify/prepare the overlay.dtbo suitable for
+the domU:
+
+(domU) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(domU) cat overlay_domu.dtbo > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be probed:
+
+(domU) modprobe module_name.ko
+
+Example for domU overlay remove:
+
+(dom0) xl dt-overlay detach overlay.dtbo $domid
+(dom0) xl dt-overlay remove overlay.dtbo
+
+[1] https://www.kernel.org/doc/Documentation/devicetree/overlay-notes.txt
-- 
2.34.1
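
The two-step workflow documented above (add/remove against the Xen device tree, attach/detach against a domain) implies an ordering between the four operations. The sketch below models it as a small state machine for a single overlay and a single domain. The names and the specific error codes are assumptions made for illustration and do not come from the hypervisor code.

```c
#include <errno.h>

/* Hypothetical lifecycle of one overlay with respect to one domain. */
enum overlay_state { OVERLAY_ABSENT, OVERLAY_ADDED, OVERLAY_ATTACHED };

enum overlay_op {
    OVERLAY_OP_ADD,
    OVERLAY_OP_ATTACH,
    OVERLAY_OP_DETACH,
    OVERLAY_OP_REMOVE
};

/* Apply one operation; return 0 on success or a negative errno value. */
static int overlay_apply(enum overlay_state *state, enum overlay_op op)
{
    switch (op) {
    case OVERLAY_OP_ADD:
        if (*state != OVERLAY_ABSENT)
            return -EEXIST;          /* already in the device tree */
        *state = OVERLAY_ADDED;
        return 0;
    case OVERLAY_OP_ATTACH:
        if (*state != OVERLAY_ADDED)
            return -EINVAL;          /* must be added, and not yet attached */
        *state = OVERLAY_ATTACHED;
        return 0;
    case OVERLAY_OP_DETACH:
        if (*state != OVERLAY_ATTACHED)
            return -EINVAL;          /* nothing to detach */
        *state = OVERLAY_ADDED;
        return 0;
    case OVERLAY_OP_REMOVE:
        if (*state == OVERLAY_ATTACHED)
            return -EBUSY;           /* detach from the domain first */
        if (*state == OVERLAY_ABSENT)
            return -ENOENT;          /* nothing to remove */
        *state = OVERLAY_ABSENT;
        return 0;
    }
    return -EINVAL;
}
```

This matches the order used in the examples: add, attach, detach, remove.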




[PATCH v2 4/8] tools/arm: Introduce the "nr_spis" xl config entry

2024-05-16 Thread Henry Wang
Currently, the number of SPIs allocated to the domain is only
configurable for Dom0less DomUs. Xen domains are supposed to be
platform agnostic and therefore the number of SPIs for libxl
guests should not be based on the hardware.

Introduce a new xl config entry for Arm to provide a way for the
user to decide the number of SPIs. This helps to avoid
bumping `config->arch.nr_spis` in libxl every time there is a
new platform with an increased number of SPIs.

Update the doc and the golang bindings accordingly.

Signed-off-by: Henry Wang 
---
v2:
- New patch to replace the original patch in v1:
  "[PATCH 05/15] tools/libs/light: Increase nr_spi to 160"
---
 docs/man/xl.cfg.5.pod.in | 11 +++
 tools/golang/xenlight/helpers.gen.go |  2 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/libs/light/libxl_arm.c |  4 ++--
 tools/libs/light/libxl_types.idl |  1 +
 tools/xl/xl_parse.c  |  3 +++
 6 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 8f2b375ce9..6a2d86065e 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -3072,6 +3072,17 @@ raised.
 
 =back
 
+=over 4
+
+=item B
+
+A 32-bit optional integer parameter specifying the number of SPIs (Shared
+Peripheral Interrupts) to allocate for the domain. If the `nr_spis` parameter
+is missing, the max number of SPIs calculated by the toolstack based on the
+devices allocated for the domain will be used.
+
+=back
+
 =head3 x86
 
 =over 4
diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index 78bdb08b15..757ccaf035 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1154,6 +1154,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.ArchArm.SveVl = SveType(xc.arch_arm.sve_vl)
+x.ArchArm.NrSpis = uint32(xc.arch_arm.nr_spis)
if err := x.ArchX86.MsrRelaxed.fromC(&xc.arch_x86.msr_relaxed);err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
@@ -1670,6 +1671,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.arch_arm.sve_vl = C.libxl_sve_type(x.ArchArm.SveVl)
+xc.arch_arm.nr_spis = C.uint32_t(x.ArchArm.NrSpis)
if err := x.ArchX86.MsrRelaxed.toC(&xc.arch_x86.msr_relaxed); err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index ccfe18019e..b7b4ba88af 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -597,6 +597,7 @@ ArchArm struct {
 GicVersion GicVersion
 Vuart VuartType
 SveVl SveType
+NrSpis uint32
 }
 ArchX86 struct {
 MsrRelaxed Defbool
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 1cb89fa584..a4029e3ac8 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -181,8 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 
 LOG(DEBUG, "Configure the domain");
 
-config->arch.nr_spis = nr_spis;
-LOG(DEBUG, " - Allocate %u SPIs", nr_spis);
+config->arch.nr_spis = max(nr_spis, d_config->b_info.arch_arm.nr_spis);
+LOG(DEBUG, " - Allocate %u SPIs", config->arch.nr_spis);
 
 switch (d_config->b_info.arch_arm.gic_version) {
 case LIBXL_GIC_VERSION_DEFAULT:
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 470122e768..3f143f405d 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -722,6 +722,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
 ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
("vuart", libxl_vuart_type),
("sve_vl", libxl_sve_type),
+   ("nr_spis", uint32),
   ])),
 ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool),
   ])),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index ab09d0288b..4aa99029b5 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2933,6 +2933,9 @@ skip_usbdev:
 }
 }
 
+if (!xlu_cfg_get_long (config, "nr_spis", &l, 0))
+b_info->arch_arm.nr_spis = l;
+
 parse_vkb_list(config, d_config);
 
 d_config->virtios = NULL;
-- 
2.34.1
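
The change in libxl__arch_domain_prepare_config() boils down to taking the larger of two numbers: what the toolstack computed from the assigned devices and what the user asked for. A minimal sketch, assuming the user value is 0 when "nr_spis" is absent from the config (the function name effective_nr_spis is hypothetical):

```c
#include <stdint.h>

/*
 * Pick the number of SPIs for the domain.
 *  - computed:  value derived by the toolstack from assigned devices
 *  - requested: value from the config file, assumed 0 if not given
 */
static uint32_t effective_nr_spis(uint32_t computed, uint32_t requested)
{
    return requested > computed ? requested : computed;
}
```

One consequence of using a maximum is that the user can raise the count but not lower it below what the assigned devices need.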




[PATCH v2 1/8] xen/common/dt-overlay: Fix lock issue when add/remove the device

2024-05-16 Thread Henry Wang
If CONFIG_DEBUG=y, the assertion below will be triggered:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) [ Xen-4.19-unstable  arm64  debug=y  Not tainted ]
[...]
(XEN) Xen call trace:
(XEN)[<0a257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN)[<0a2573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN)[<0a20797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN)[<0a207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN)[<0a208460>] dt_overlay_sysctl+0x428/0xc68
(XEN)[<0a2707f8>] arch_do_sysctl+0x1c/0x2c
(XEN)[<0a230b40>] do_sysctl+0x96c/0x9ec
(XEN)[<0a271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN)[<0a273490>] do_trap_guest_sync+0x448/0x63c
(XEN)[<0a25c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) 

This is because iommu_remove_dt_device() is called without taking
dt_host_lock. dt_host_lock is meant to ensure that the DT node will not
disappear behind our back. Fix the issue by taking the lock as soon as
overlay_node is obtained.

A similar issue is observed when adding the dtbo:
(XEN) Assertion 'system_state < SYS_STATE_active || rw_is_locked(&dt_host_lock)' failed at xen-source/xen/drivers/passthrough/device_tree.c:192
(XEN) [ Xen-4.19-unstable  arm64  debug=y  Not tainted ]
[...]
(XEN) Xen call trace:
(XEN)[<0a2594f4>] iommu_add_dt_device+0x7c/0x17c (PC)
(XEN)[<0a259494>] iommu_add_dt_device+0x1c/0x17c (LR)
(XEN)[<0a267db4>] handle_device+0x68/0x1e8
(XEN)[<0a208ba8>] dt_overlay_sysctl+0x9d4/0xb84
(XEN)[<0a27342c>] arch_do_sysctl+0x24/0x38
(XEN)[<0a231ac8>] do_sysctl+0x9ac/0xa34
(XEN)[<0a274b70>] traps.c#do_trap_hypercall+0x230/0x2dc
(XEN)[<0a276330>] do_trap_guest_sync+0x478/0x688
(XEN)[<0a25e480>] entry.o#guest_sync_slowpath+0xa8/0xd8

This is because the lock is released too early. So fix the issue by
releasing the lock after handle_device().
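
The locking rule described above can be illustrated with a small
standalone sketch. This is not the Xen code: the toy_* names, the
boolean "lock" and the single-exit goto are assumptions made for
illustration (the patch itself unlocks explicitly in each error path).
It only shows that every helper asserting on the lock must run between
the lock and the unlock, and that every exit path must release it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for dt_host_lock; not Xen's rwlock implementation. */
struct toy_rwlock { bool write_locked; };

static struct toy_rwlock toy_dt_host_lock;

static void toy_write_lock(struct toy_rwlock *l)
{
    assert(!l->write_locked);
    l->write_locked = true;
}

static void toy_write_unlock(struct toy_rwlock *l)
{
    assert(l->write_locked);
    l->write_locked = false;
}

/* Models a helper that asserts the lock is held, like the IOMMU ones. */
static int toy_remove_node_resources(int fail)
{
    assert(toy_dt_host_lock.write_locked);
    return fail ? -EINVAL : 0;
}

/* Models unlinking the node itself; also needs the lock. */
static int toy_unlink_node(void)
{
    assert(toy_dt_host_lock.write_locked);
    return 0;
}

static int toy_remove_nodes(int fail_resources)
{
    int rc;

    /* Take the lock before touching anything that asserts on it. */
    toy_write_lock(&toy_dt_host_lock);

    rc = toy_remove_node_resources(fail_resources);
    if ( rc )
        goto out;

    rc = toy_unlink_node();

 out:
    /* Single exit: the lock is released on success and on error. */
    toy_write_unlock(&toy_dt_host_lock);
    return rc;
}
```

Calling toy_remove_nodes(1) returns -EINVAL and still leaves the toy
lock released, which is the property the error-path unlocks in the
patch are meant to provide.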

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Henry Wang 
---
v2:
- Take the lock as soon as getting hold of overlay_node. Also
  release the lock after handle_device() when adding dtbo.
v1.1:
- Move the unlock position before the check of rc.
---
 xen/common/dt-overlay.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1b197381f6..9cece79067 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -429,18 +429,24 @@ static int remove_nodes(const struct overlay_track *tracker)
 if ( overlay_node == NULL )
 return -EINVAL;
 
+write_lock(&dt_host_lock);
+
 rc = remove_descendant_nodes_resources(overlay_node);
 if ( rc )
+{
+write_unlock(&dt_host_lock);
 return rc;
+}
 
 rc = remove_node_resources(overlay_node);
 if ( rc )
+{
+write_unlock(&dt_host_lock);
 return rc;
+}
 
 dt_dprintk("Removing node: %s\n", overlay_node->full_name);
 
-write_lock(&dt_host_lock);
-
 rc = dt_overlay_remove_node(overlay_node);
 if ( rc )
 {
@@ -604,8 +610,6 @@ static long add_nodes(struct overlay_track *tr, char **nodes_full_path)
 return rc;
 }
 
-write_unlock(&dt_host_lock);
-
 prev_node->allnext = next_node;
 
 overlay_node = dt_find_node_by_path(overlay_node->full_name);
@@ -619,6 +623,7 @@ static long add_nodes(struct overlay_track *tr, char **nodes_full_path)
 rc = handle_device(hardware_domain, overlay_node, p2m_mmio_direct_c,
tr->iomem_ranges,
tr->irq_ranges);
+write_unlock(&dt_host_lock);
 if ( rc )
 {
 printk(XENLOG_ERR "Adding IRQ and IOMMU failed\n");
-- 
2.34.1




[PATCH v2 0/8] Remaining patches for dynamic node programming using overlay dtbo

2024-05-16 Thread Henry Wang
Hi all,

This is the remaining series for the fully functional "dynamic node
programming using overlay dtbo" feature. The first part [1] has
already been merged.

Quoting from the original series, the first part made Xen aware of
new device tree nodes, i.e. it updates dt_host with the overlay node
information. The goal of this series is to map IRQs and the IOMMU at
runtime, i.e. to do the actual IOMMU and IRQ mapping and unmapping for
a running domain. Documentation of the "dynamic node programming using
overlay dtbo" feature is also added.

Patches 1 and 2 fix issues in the existing code that I noticed during
my local tests; please see the commit messages for details.

Gitlab CI for this series can be found in [1].

[1] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1293126857

Henry Wang (6):
  xen/common/dt-overlay: Fix lock issue when add/remove the device
  tools/xl: Correct the help information and exit code of the dt-overlay
command
  xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
  tools/arm: Introduce the "nr_spis" xl config entry
  xen/arm: Add XEN_DOMCTL_dt_overlay DOMCTL and related operations
  tools: Introduce the "xl dt-overlay {attach,detach}" commands

Vikram Garhwal (2):
  xen/arm/gic: Allow routing/removing interrupt to running VMs
  docs: Add device tree overlay documentation

 docs/man/xl.cfg.5.pod.in  |  11 +
 docs/misc/arm/device-tree/booting.txt |  13 +
 docs/misc/arm/overlay.txt |  99 ++
 tools/golang/xenlight/helpers.gen.go  |   2 +
 tools/golang/xenlight/types.gen.go|   1 +
 tools/include/libxl.h |  15 +-
 tools/include/xenctrl.h   |   3 +
 tools/libs/ctrl/xc_dt_overlay.c   |  31 ++
 tools/libs/light/libxl_arm.c  |   4 +-
 tools/libs/light/libxl_dt_overlay.c   |  30 +-
 tools/libs/light/libxl_types.idl  |   1 +
 tools/xl/xl_cmdtable.c|   4 +-
 tools/xl/xl_parse.c   |   3 +
 tools/xl/xl_vmcontrol.c   |  39 ++-
 xen/arch/arm/dom0less-build.c |   7 +-
 xen/arch/arm/domctl.c |   3 +
 xen/arch/arm/gic-vgic.c   |   8 +-
 xen/arch/arm/gic.c|  15 -
 xen/arch/arm/vgic/vgic.c  |   5 +-
 xen/common/dt-overlay.c   | 418 +-
 xen/include/public/domctl.h   |  15 +
 xen/include/public/sysctl.h   |   7 +-
 xen/include/xen/dt-overlay.h  |   7 +
 23 files changed, 615 insertions(+), 126 deletions(-)
 create mode 100644 docs/misc/arm/overlay.txt

-- 
2.34.1




Re: [PATCH] drivers/xen: Improve the late XenStore init protocol

2024-05-15 Thread Henry Wang

Hi Stefano,

On 5/16/2024 6:30 AM, Stefano Stabellini wrote:

On Wed, 15 May 2024, Henry Wang wrote:

Currently, the late XenStore init protocol is only triggered properly
when HVM_PARAM_STORE_PFN is ~0ULL (invalid). When the XenStore
interface is allocated but not ready (the connection status is not
XENSTORE_CONNECTED), Linux should also wait until XenStore is set up
properly.

Introduce a macro that describes whether the XenStore interface is
ready, and use it in xenbus_probe_initcall() and xenbus_probe() to
decide whether to take the late XenStore init path.

Take the opportunity to also check that the allocated XenStore
interface can be mapped properly, and return an error early if
memremap() fails.

Signed-off-by: Henry Wang 
Signed-off-by: Michal Orzel 

Please add a Fixes: tag


Sure. Will do.


---
  drivers/xen/xenbus/xenbus_probe.c | 21 -
  1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c 
b/drivers/xen/xenbus/xenbus_probe.c
index 3205e5d724c8..8aec0ed1d047 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -72,6 +72,10 @@ EXPORT_SYMBOL_GPL(xen_store_evtchn);
  struct xenstore_domain_interface *xen_store_interface;
  EXPORT_SYMBOL_GPL(xen_store_interface);
  
+#define XS_INTERFACE_READY \

+   ((xen_store_interface != NULL) && \
+(xen_store_interface->connection == XENSTORE_CONNECTED))
+
  enum xenstore_init xen_store_domain_type;
  EXPORT_SYMBOL_GPL(xen_store_domain_type);
  
@@ -751,9 +755,10 @@ static void xenbus_probe(void)

  {
xenstored_ready = 1;
  
-	if (!xen_store_interface) {

-   xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT,
-  XEN_PAGE_SIZE, MEMREMAP_WB);
+   if (!xen_store_interface || XS_INTERFACE_READY) {
+   if (!xen_store_interface)

These two nested if's don't make sense to me. If XS_INTERFACE_READY
succeeds, it means that  ((xen_store_interface != NULL) &&
(xen_store_interface->connection == XENSTORE_CONNECTED)).

So it is not possible that xen_store_interface == NULL immediately
after. Right?


I think this is because we want to free the irq for the late init case, 
otherwise init-dom0less will fail. For the case where the xenstore PFN 
is allocated, the connection is already set to CONNECTED when we execute 
init-dom0less. But I agree with you. Would the diff below make more 
sense to you?


diff --git a/drivers/xen/xenbus/xenbus_probe.c 
b/drivers/xen/xenbus/xenbus_probe.c

index 8aec0ed1d047..b8005b651a29 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -76,6 +76,8 @@ EXPORT_SYMBOL_GPL(xen_store_interface);
    ((xen_store_interface != NULL) && \
 (xen_store_interface->connection == XENSTORE_CONNECTED))

+static bool xs_late_init = false;
+
 enum xenstore_init xen_store_domain_type;
 EXPORT_SYMBOL_GPL(xen_store_domain_type);

@@ -755,7 +757,7 @@ static void xenbus_probe(void)
 {
    xenstored_ready = 1;

-   if (!xen_store_interface || XS_INTERFACE_READY) {
+   if (xs_late_init) {
    if (!xen_store_interface)
    xen_store_interface = memremap(xen_store_gfn << 
XEN_PAGE_SHIFT,

XEN_PAGE_SIZE, MEMREMAP_WB);
@@ -937,6 +939,8 @@ static irqreturn_t xenbus_late_init(int irq, void 
*unused)

    int err;
    uint64_t v = 0;

+   xs_late_init = true;
+
    err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
    if (err || !v || !~v)
    return IRQ_HANDLED;


+   xen_store_interface = memremap(xen_store_gfn << 
XEN_PAGE_SHIFT,
+  XEN_PAGE_SIZE, 
MEMREMAP_WB);
/*
 * Now it is safe to free the IRQ used for xenstore late
 * initialization. No need to unbind: it is about to be
@@ -822,7 +827,7 @@ static int __init xenbus_probe_initcall(void)
if (xen_store_domain_type == XS_PV ||
(xen_store_domain_type == XS_HVM &&
 !xs_hvm_defer_init_for_callback() &&
-xen_store_interface != NULL))
+XS_INTERFACE_READY))
xenbus_probe();
  
  	/*

@@ -831,7 +836,7 @@ static int __init xenbus_probe_initcall(void)
 * started, then probe.  It will be triggered when communication
 * starts happening, by waiting on xb_waitq.
 */
-   if (xen_store_domain_type == XS_LOCAL || xen_store_interface == NULL) {
+   if (xen_store_domain_type == XS_LOCAL || !XS_INTERFACE_READY) {
struct task_struct *probe_task;
  
  		probe_task = kthread_run(xenbus_probe_thread, NULL,

@@ -1014,6 +1019,12 @@ static int __init xenbus_init(void)
xen_store_interface =
memremap(xen_store_gfn << XEN_PAGE_S

Re: Proposal to Extend Feature Freeze Deadline

2024-05-14 Thread Henry Wang

Hi Oleksii,

On 5/14/2024 11:43 PM, Andrew Cooper wrote:

On 14/05/2024 4:40 pm, Oleksii K. wrote:

Hello everyone,

We're observing fewer merged patches/series across several
architectures for the current 4.19 release in comparison to previous
release.

For example:
1. For Arm, significant features like Cache Coloring and PCI
Passthrough won't be fully merged. Thus, it would be beneficial to
commit at least the following two patch series:
[1]https://lore.kernel.org/xen-devel/20240511005611.83125-1-xin.wa...@amd.com/
   
[2]https://lore.kernel.org/xen-devel/20240424033449.168398-1-xin.wa...@amd.com/


2. For RISC-V, having the following patch series [3], mostly reviewed
with only one blocker [4], would be advantageous (As far as I know,
Andrew is planning to update his patch series):
[3]https://lore.kernel.org/xen-devel/cover.1713347222.git.oleksii.kuroc...@gmail.com/
[4]
https://patchew.org/Xen/20240313172716.2325427-1-andrew.coop...@citrix.com/

3. For PPC, it would be beneficial to have [5] merged:
[5]
https://lore.kernel.org/xen-devel/cover.1712893887.git.sanasta...@raptorengineering.com/

Extending the feature freeze deadline by one week, until May 24th,
would provide additional time for merges mentioned above. This, in
turn, would create space for more features and fixes for x86 and other
common elements. If we agree to extend the feature freeze deadline,
please feel free to outline what you would like to see in the 4.19
release. This will make it easier to track our final goals and
determine if they are realistically achievable.

I'd like to open the floor for discussion on this proposal. Does it
make sense, and would it be useful?

Considering how many people are blocked on me, I'd welcome a little bit
longer to get the outstanding series/fixes to land.


It would be great if we could extend the deadline by a week, thank you! I 
will try my best to make progress on the two above-mentioned Arm series.


Kind regards,
Henry



[PATCH] drivers/xen: Improve the late XenStore init protocol

2024-05-14 Thread Henry Wang
Currently, the late XenStore init protocol is only triggered properly
when HVM_PARAM_STORE_PFN is ~0ULL (invalid). When the XenStore
interface is allocated but not ready (the connection status is not
XENSTORE_CONNECTED), Linux should also wait until XenStore is set up
properly.

Introduce a macro that describes whether the XenStore interface is
ready, and use it in xenbus_probe_initcall() and xenbus_probe() to
decide whether to take the late XenStore init path.

Take the opportunity to also check that the allocated XenStore
interface can be mapped properly, and return an error early if
memremap() fails.
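
As a rough illustration of the readiness condition described above,
the following standalone sketch models the predicate outside the
kernel. The toy_* types and enum values are stand-ins introduced only
for this example; the real definitions are in the Xen public headers
and in xenbus_probe.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in connection states; names and values are illustrative only. */
enum toy_connection {
    TOY_XENSTORE_CONNECTED,
    TOY_XENSTORE_RECONNECT,
};

/* Stand-in for the shared XenStore interface page. */
struct toy_xenstore_interface {
    enum toy_connection connection;
};

/* Same shape as XS_INTERFACE_READY: mapped and marked connected. */
static bool toy_interface_ready(const struct toy_xenstore_interface *intf)
{
    return intf != NULL && intf->connection == TOY_XENSTORE_CONNECTED;
}

/*
 * True when the guest should wait for the late init notification
 * instead of probing right away.
 */
static bool toy_needs_late_init(const struct toy_xenstore_interface *intf)
{
    return !toy_interface_ready(intf);
}
```

Both "page not mapped yet" (NULL) and "page mapped but not connected"
end up on the waiting path, which is the behaviour the commit message
asks for.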

Signed-off-by: Henry Wang 
Signed-off-by: Michal Orzel 
---
 drivers/xen/xenbus/xenbus_probe.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c 
b/drivers/xen/xenbus/xenbus_probe.c
index 3205e5d724c8..8aec0ed1d047 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -72,6 +72,10 @@ EXPORT_SYMBOL_GPL(xen_store_evtchn);
 struct xenstore_domain_interface *xen_store_interface;
 EXPORT_SYMBOL_GPL(xen_store_interface);
 
+#define XS_INTERFACE_READY \
+   ((xen_store_interface != NULL) && \
+(xen_store_interface->connection == XENSTORE_CONNECTED))
+
 enum xenstore_init xen_store_domain_type;
 EXPORT_SYMBOL_GPL(xen_store_domain_type);
 
@@ -751,9 +755,10 @@ static void xenbus_probe(void)
 {
xenstored_ready = 1;
 
-   if (!xen_store_interface) {
-   xen_store_interface = memremap(xen_store_gfn << XEN_PAGE_SHIFT,
-  XEN_PAGE_SIZE, MEMREMAP_WB);
+   if (!xen_store_interface || XS_INTERFACE_READY) {
+   if (!xen_store_interface)
+   xen_store_interface = memremap(xen_store_gfn << 
XEN_PAGE_SHIFT,
+  XEN_PAGE_SIZE, 
MEMREMAP_WB);
/*
 * Now it is safe to free the IRQ used for xenstore late
 * initialization. No need to unbind: it is about to be
@@ -822,7 +827,7 @@ static int __init xenbus_probe_initcall(void)
if (xen_store_domain_type == XS_PV ||
(xen_store_domain_type == XS_HVM &&
 !xs_hvm_defer_init_for_callback() &&
-xen_store_interface != NULL))
+XS_INTERFACE_READY))
xenbus_probe();
 
/*
@@ -831,7 +836,7 @@ static int __init xenbus_probe_initcall(void)
 * started, then probe.  It will be triggered when communication
 * starts happening, by waiting on xb_waitq.
 */
-   if (xen_store_domain_type == XS_LOCAL || xen_store_interface == NULL) {
+   if (xen_store_domain_type == XS_LOCAL || !XS_INTERFACE_READY) {
struct task_struct *probe_task;
 
probe_task = kthread_run(xenbus_probe_thread, NULL,
@@ -1014,6 +1019,12 @@ static int __init xenbus_init(void)
xen_store_interface =
memremap(xen_store_gfn << XEN_PAGE_SHIFT,
 XEN_PAGE_SIZE, MEMREMAP_WB);
+   if (!xen_store_interface) {
+   pr_err("%s: cannot map 
HVM_PARAM_STORE_PFN=%llx\n",
+  __func__, v);
+   err = -ENOMEM;
+   goto out_error;
+   }
if (xen_store_interface->connection != 
XENSTORE_CONNECTED)
wait = true;
}
-- 
2.34.1




Re: [PATCH v2 1/4] xen/arm: Alloc hypervisor reserved pages as magic pages for Dom0less DomUs

2024-05-12 Thread Henry Wang

Hi Julien,

On 5/11/2024 7:03 PM, Julien Grall wrote:

Hi Henry,

On 11/05/2024 01:56, Henry Wang wrote:

  +static int __init alloc_magic_pages(struct domain *d)
+{
+    struct page_info *magic_pg;
+    mfn_t mfn;
+    gfn_t gfn;
+    int rc;
+
+    d->max_pages += NR_MAGIC_PAGES;
+    magic_pg = alloc_domheap_pages(d, 
get_order_from_pages(NR_MAGIC_PAGES), 0);

+    if ( magic_pg == NULL )
+    return -ENOMEM;
+
+    mfn = page_to_mfn(magic_pg);
+    if ( !is_domain_direct_mapped(d) )
+    gfn = gaddr_to_gfn(GUEST_MAGIC_BASE);
+    else
+    gfn = gaddr_to_gfn(mfn_to_maddr(mfn));


Summarizing the discussion we had on Matrix: regions like the extended 
regions and shared memory may not be direct-mapped. So unfortunately, I 
think it is possible that the GFN could clash with one of those.


At least in the shared memory case, the user can provide the address. 
But as you use the domheap allocator, the address returned could 
easily change if you tweak your setup.


I am not entirely sure what's the best solution. We could ask the user 
to provide the information for reserved region. But it feels like we 
are exposing a bit too much to the user.


So possibly we would want to use the same approach as extended 
regions. Once we processed all the mappings, find some space for the 
hypervisor regions.


One thing I noticed when revisiting the extended region finding code 
on the hypervisor side:
When the domain is direct-mapped, we find the extended regions for it 
with either find_unallocated_memory() or find_memory_holes(). Both 
functions seem to remove the shared memory regions using the paddr 
parsed from the device tree, which implies an assumption that when a 
domain is direct-mapped, the shared memory is direct-mapped as well. I 
might be wrong, but otherwise I don't think the extended region finding 
logic would carve out the correct shared memory gpaddr range for guests.


So I think we are missing the documentation (and the corresponding 
check when we parse the device tree) for the above assumption about 
static shared memory, i.e. when the domain is direct-mapped, the static 
shared memory should also be direct-mapped. The user should make sure 
this holds in the device tree, otherwise Xen should complain.


If we add this assumption and the related checking code, I think your 
concern about clashing with static shared memory can be addressed. Do 
you agree?


Kind regards,
Henry



Any other suggestions?

Cheers,






Re: [PATCH v2 1/4] xen/arm: Alloc hypervisor reserved pages as magic pages for Dom0less DomUs

2024-05-11 Thread Henry Wang

Hi Julien,

On 5/11/2024 4:46 PM, Julien Grall wrote:

Hi Henry,

On 11/05/2024 01:56, Henry Wang wrote:

There are use cases (for example using the PV driver) in Dom0less
setup that require Dom0less DomUs start immediately with Dom0, but
initialize XenStore later after Dom0's successful boot and call to
the init-dom0less application.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

"Magic page" is a term used in the toolstack for the reserved
pages that give the VM access to virtual platform capabilities.
Currently the magic pages for Dom0less DomUs are populated by the
init-dom0less app through populate_physmap(), and populate_physmap()
automatically assumes gfn == mfn for 1:1 direct mapped domains. This
cannot be true for the magic pages that are allocated later from the
init-dom0less application executed in Dom0. For domains using statically
allocated memory that are not 1:1 direct-mapped, a similar error, "failed
to retrieve a reserved page", can be seen as the reserved memory list is
empty at that time.

To solve above issue, this commit allocates hypervisor reserved pages
(currently used as the magic pages) for Arm Dom0less DomUs at the
domain construction time. The base address/PFN of the region will be
noted and communicated to the init-dom0less application in Dom0.

Reported-by: Alec Kwapis 
Suggested-by: Daniel P. Smith 
Signed-off-by: Henry Wang 
---
v2:
- Reword the commit msg to explain what is "magic page" and use generic
   terminology "hypervisor reserved pages" in commit msg. (Daniel)
- Also move the offset definition of magic pages. (Michal)
- Extract the magic page allocation logic to a function. (Michal)
---
  tools/libs/guest/xg_dom_arm.c |  6 --
  xen/arch/arm/dom0less-build.c | 32 
  xen/include/public/arch-arm.h |  6 ++
  3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/tools/libs/guest/xg_dom_arm.c 
b/tools/libs/guest/xg_dom_arm.c

index 2fd8ee7ad4..8c579d7576 100644
--- a/tools/libs/guest/xg_dom_arm.c
+++ b/tools/libs/guest/xg_dom_arm.c
@@ -25,12 +25,6 @@
    #include "xg_private.h"
  -#define NR_MAGIC_PAGES 4
-#define CONSOLE_PFN_OFFSET 0
-#define XENSTORE_PFN_OFFSET 1
-#define MEMACCESS_PFN_OFFSET 2
-#define VUART_PFN_OFFSET 3
-
  #define LPAE_SHIFT 9
    #define PFN_4K_SHIFT  (0)
diff --git a/xen/arch/arm/dom0less-build.c 
b/xen/arch/arm/dom0less-build.c

index 74f053c242..4b96ddd9ce 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -739,6 +739,34 @@ static int __init alloc_xenstore_evtchn(struct 
domain *d)

  return 0;
  }
  +static int __init alloc_magic_pages(struct domain *d)
+{
+    struct page_info *magic_pg;
+    mfn_t mfn;
+    gfn_t gfn;
+    int rc;
+
+    d->max_pages += NR_MAGIC_PAGES;


Here you bump d->max_pages by NR_MAGIC_PAGES but...

+    magic_pg = alloc_domheap_pages(d, 
get_order_from_pages(NR_MAGIC_PAGES), 0);


... here you will allocate using a power-of-two, which may end up 
failing as there is nothing guaranteeing that NR_MAGIC_PAGES is suitably 
aligned.


For now NR_MAGIC_PAGES seems suitably aligned, so a BUILD_BUG_ON() 
would be ok.


Great catch! I will add BUILD_BUG_ON(NR_MAGIC_PAGES & (NR_MAGIC_PAGES - 1));
Thanks.
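
To make the alignment concern concrete, here is a small standalone
sketch. toy_order_from_pages() is a simplified model written for this
example (it is not Xen's get_order_from_pages()), and the static
assertion stands in for the BUILD_BUG_ON() mentioned above.

```c
#include <assert.h>

#define NR_MAGIC_PAGES 4   /* value taken from the patch */

/* Compile-time stand-in for the proposed BUILD_BUG_ON(). */
_Static_assert((NR_MAGIC_PAGES & (NR_MAGIC_PAGES - 1)) == 0,
               "NR_MAGIC_PAGES must be a power of two");

/* Simplified model: smallest order such that 2^order >= nr_pages. */
static unsigned int toy_order_from_pages(unsigned long nr_pages)
{
    unsigned int order = 0;

    assert(nr_pages > 0);
    while ( (1UL << order) < nr_pages )
        order++;

    return order;
}

/* Number of pages an order-based allocator would actually hand out. */
static unsigned long toy_pages_allocated(unsigned long nr_pages)
{
    return 1UL << toy_order_from_pages(nr_pages);
}
```

With 4 pages the order-based allocation matches the max_pages bump
exactly. With a hypothetical 5 pages the allocator would ask for 8,
which is more than max_pages was raised by, hence the build-time check.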


+    if ( magic_pg == NULL )
+    return -ENOMEM;
+
+    mfn = page_to_mfn(magic_pg);
+    if ( !is_domain_direct_mapped(d) )
+    gfn = gaddr_to_gfn(GUEST_MAGIC_BASE);
+    else
+    gfn = gaddr_to_gfn(mfn_to_maddr(mfn));


Allocating the magic pages contiguously is only necessary for direct 
mapped domains. For the others it might be preferable to allocate page 
by page. That said, NR_MAGIC_PAGES is small, so this would be 
okay.



+
+    rc = guest_physmap_add_pages(d, gfn, mfn, NR_MAGIC_PAGES);
+    if ( rc )
+    {
+    free_domheap_pages(magic_pg, 
get_order_from_pages(NR_MAGIC_PAGES));

+    return rc;
+    }
+
+    return 0;
+}
+
  static int __init construct_domU(struct domain *d,
   const struct dt_device_node *node)
  {
@@ -840,6 +868,10 @@ static int __init construct_domU(struct domain *d,
  if ( rc < 0 )
  return rc;
  d->arch.hvm.params[HVM_PARAM_STORE_PFN] = ~0ULL;
+
+    rc = alloc_magic_pages(d);
+    if ( rc < 0 )
+    return rc;


This will only be allocated if xenstore is enabled. But I don't think 
some of the magic pages really require xenstore to work. In the future 
we may need a more fine-grained choice (see my comment in patch #2 
as well).


Sorry, but as of writing this reply I haven't received the email with 
your comment on patch #2. I will reply to both together once I see 
it.



  }
    return rc;
diff --git a/xen/include/public/arch-arm.h 
b/xen/include/public/arch-arm

Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM

2024-05-11 Thread Henry Wang

Hi Julien,

On 5/11/2024 4:22 PM, Julien Grall wrote:

Hi Henry,

On 11/05/2024 08:29, Henry Wang wrote:

+    /*
+ * Handle the LR where the physical interrupt is 
de-assigned from the

+ * guest before it was EOIed
+ */
+    struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], 
virq);


This will return a vCPU from the current affinity. This may not be 
where the interrupt was injected. From a brief look, I can't tell 
whether we have an easy way to know where the interrupt was injected 
(other than the pending_irq is in the list lr_queue/inflight)


I doubt if we need to handle more than this - I think if the 
pending_irq is not in the lr_queue/inflight list, it would not belong 
to the corner case we are talking about (?).


I didn't suggest we would need to handle the case where the 
pending_irq is not in any of the queues. I was pointing out that I think 
we don't directly store the vCPU ID where we injected the IRQ. 
Instead, the pending_irq is just in a list, so we will possibly need to 
store the vCPU ID for convenience.
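
A minimal sketch of that idea, under the assumption that one extra
field is recorded at injection time. The struct, the field name and
the helpers below are hypothetical and only model the bookkeeping;
they are not the existing vGIC data structures.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NO_VCPU UINT8_MAX

/* Hypothetical, trimmed-down pending_irq; field name is an assumption. */
struct toy_pending_irq {
    unsigned int irq;
    uint8_t injected_vcpu;   /* vCPU holding the IRQ, or TOY_NO_VCPU */
};

/* Record where the IRQ was injected. */
static void toy_inject(struct toy_pending_irq *p, uint8_t vcpu_id)
{
    assert(vcpu_id != TOY_NO_VCPU);
    p->injected_vcpu = vcpu_id;
}

/* Forget the vCPU once the IRQ has left the LR/queues. */
static void toy_retire(struct toy_pending_irq *p)
{
    p->injected_vcpu = TOY_NO_VCPU;
}

/* vCPU to clean up when removing the IRQ, or -1 if nothing is in flight. */
static int toy_removal_target(const struct toy_pending_irq *p)
{
    return p->injected_vcpu == TOY_NO_VCPU ? -1 : p->injected_vcpu;
}
```

With something like this, the removal path would not need to rely on
the current affinity or walk every vCPU to find the right lists.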


Sorry for the misunderstanding. Yeah, you are definitely correct. Also 
thank you so much for the suggestion! Before seeing it, I was 
struggling to find the correct vCPU by "for_each_vcpus" and 
comparison... but now I realize your suggestion is way more clever :)


Kind regards,
Henry



Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM

2024-05-11 Thread Henry Wang

Hi Julien,

On 5/10/2024 4:54 PM, Julien Grall wrote:

Hi,

On 09/05/2024 16:31, Henry Wang wrote:

On 5/9/2024 4:46 AM, Julien Grall wrote:

Hi Henry,
[...]
```
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index a775f886ed..d3f9cd2299 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -135,16 +135,6 @@ int gic_route_irq_to_guest(struct domain *d, 
unsigned int virq,

  ASSERT(virq < vgic_num_irqs(d));
  ASSERT(!is_lpi(virq));

-    /*
- * When routing an IRQ to guest, the virtual state is not synced
- * back to the physical IRQ. To prevent get unsync, restrict the
- * routing to when the Domain is been created.
- */
-#ifndef CONFIG_OVERLAY_DTB
-    if ( d->creation_finished )
-    return -EBUSY;
-#endif
-
  ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);


This is checking if the interrupt is already enabled. Do we also need 
to check for active/pending?


Thank you for raising this! I assume you meant this?
@@ -444,7 +444,9 @@ int vgic_connect_hw_irq(struct domain *d, struct 
vcpu *v, unsigned int virq,

 {
 /* The VIRQ should not be already enabled by the guest */
 if ( !p->desc &&
- !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+ !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
 p->desc = desc;
 else
 ret = -EBUSY;

I think adding the active/pending check at the time of routing 
the IRQ makes sense, so I will add it (for both the old and new vGIC 
implementations).
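
The combined condition can be modelled in isolation as below. The bit
positions and the toy_* names are assumptions made for this sketch;
the real GIC_IRQ_GUEST_* bits are defined in the Xen vGIC headers and
are tested with test_bit() on p->status.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical bit masks standing in for the GIC_IRQ_GUEST_* bits. */
enum {
    TOY_GUEST_VISIBLE = 1u << 0,
    TOY_GUEST_ACTIVE  = 1u << 1,
    TOY_GUEST_ENABLED = 1u << 2,
};

struct toy_pending_irq {
    unsigned int status;
    const void *desc;   /* non-NULL once a physical IRQ is connected */
};

/* Connect only if the vIRQ is unconnected and idle in the guest. */
static int toy_connect(struct toy_pending_irq *p, const void *desc)
{
    const unsigned int busy =
        TOY_GUEST_ENABLED | TOY_GUEST_ACTIVE | TOY_GUEST_VISIBLE;

    assert(desc != NULL);
    if ( p->desc != NULL || (p->status & busy) )
        return -EBUSY;

    p->desc = desc;
    return 0;
}
```

Any one of the three bits being set is enough to reject the request,
mirroring the chained test_bit() checks in the hunk above.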



  if ( ret )
  return ret;
@@ -169,20 +159,40 @@ int gic_remove_irq_from_guest(struct domain *d, 
unsigned int virq,

  ASSERT(test_bit(_IRQ_GUEST, &desc->status));
  ASSERT(!is_lpi(virq));

-    /*
- * Removing an interrupt while the domain is running may have
- * undesirable effect on the vGIC emulation.
- */
-#ifndef CONFIG_OVERLAY_DTB
-    if ( !d->is_dying )
-    return -EBUSY;
-#endif
-
  desc->handler->shutdown(desc);

  /* EOI the IRQ if it has not been done by the guest */
  if ( test_bit(_IRQ_INPROGRESS, &desc->status) )
+    {


I assume this is just a PoC state, but I just want to point out that 
this will not work with the new vGIC (some of the functions don't 
exist there).


Thank you. Yes, for now we can keep the discussion to the old vGIC 
implementation. After we reach a final conclusion I will make the 
changes for both the old and new vGIC.



+    /*
+ * Handle the LR where the physical interrupt is de-assigned 
from the

+ * guest before it was EOIed
+ */
+    struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq);


This will return a vCPU from the current affinity. This may not be 
where the interrupt was injected. From a brief look, I can't tell 
whether we have an easy way to know where the interrupt was injected 
(other than the pending_irq is in the list lr_queue/inflight)


I doubt if we need to handle more than this - I think if the pending_irq 
is not in the lr_queue/inflight list, it would not belong to the corner 
case we are talking about (?).



+    }
+ spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
+
+    vgic_lock_rank(v_target, rank, flags);
+    vgic_disable_irqs(v_target, (~rank->ienable) & 
rank->ienable, rank->index);

+    vgic_unlock_rank(v_target, rank, flags);


Why do you need to call vgic_disable_irqs()?


I will drop this part.

Kind regards,
Henry



[PATCH v2 0/4] Guest magic region allocation for 1:1 Dom0less domUs - Take two

2024-05-10 Thread Henry Wang
Hi all,

This series tries to fix the reported guest magic region allocation
issue for 1:1 Dom0less domUs. An error message can be seen from the
init-dom0less application on 1:1 direct-mapped Dom0less DomUs:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

This is because populate_physmap() automatically assumes gfn == mfn
for direct mapped domains. This cannot be true for the magic pages
that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
helper application executed in Dom0. For domains using statically
allocated memory that are not 1:1 direct-mapped, a similar error, "failed
to retrieve a reserved page", can be seen as the reserved memory list
is empty at that time.

In [1] I tried to fix this issue with a domctl approach, and the
discussions in [2] and [3] indicate that a domctl is not really
necessary, as we can simplify the issue to "allocate the Dom0less
guest magic regions at the Dom0less DomU build time and pass the
region base PFN to the init-dom0less application". Therefore, the first
patch in this series allocates magic pages for Dom0less DomUs,
the second patch stores the allocated region base PFN in HVMOP
params, similar to HVM_PARAM_CALLBACK_IRQ, and the third patch uses the
HVMOP to get the stored guest magic region base PFN to avoid hardcoding
GUEST_MAGIC_BASE. The last patch updates the documentation.
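
As a rough model of why the build-time allocation helps, the sketch
below shows the GFN choice made for the magic region. The toy_* names
are hypothetical, and the TOY_GUEST_MAGIC_BASE value is an assumption
picked to match the "mfn 0x39000" in the error message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT       12
/* Assumed value, chosen to match the "mfn 0x39000" in the log. */
#define TOY_GUEST_MAGIC_BASE UINT64_C(0x39000000)

/*
 * GFN at which the magic region is mapped: direct-mapped domains use
 * gfn == mfn of the pages actually allocated, other domains use the
 * fixed guest address.
 */
static uint64_t toy_magic_gfn(bool direct_mapped, uint64_t allocated_mfn)
{
    return direct_mapped ? allocated_mfn
                         : (TOY_GUEST_MAGIC_BASE >> TOY_PAGE_SHIFT);
}
```

Because the direct-mapped result depends on the allocation, it cannot
be hardcoded in init-dom0less and has to be communicated to Dom0.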

Gitlab CI for this series can be found in [4].

[1] https://lore.kernel.org/xen-devel/20240409045357.236802-1-xin.wa...@amd.com/
[2] 
https://lore.kernel.org/xen-devel/c7857223-eab8-409a-b618-6ec70f616...@apertussolutions.com/
[3] 
https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2404251508470.3940@ubuntu-linux-20-04-desktop/
[4] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1285727622

Henry Wang (4):
  xen/arm: Alloc hypervisor reserved pages as magic pages for Dom0less
DomUs
  xen/arm: Add new HVM_PARAM_HV_RSRV_{BASE_PFN,SIZE} keys in HVMOP
  tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE
  docs/features/dom0less: Update the late XenStore init protocol

 docs/features/dom0less.pandoc   |  8 ---
 tools/helpers/init-dom0less.c   | 40 ++---
 tools/libs/guest/xg_dom_arm.c   |  6 -
 xen/arch/arm/dom0less-build.c   | 35 +
 xen/arch/arm/hvm.c  |  2 ++
 xen/include/public/arch-arm.h   |  6 +
 xen/include/public/hvm/params.h | 11 -
 7 files changed, 75 insertions(+), 33 deletions(-)

-- 
2.34.1




[PATCH v2 4/4] docs/features/dom0less: Update the late XenStore init protocol

2024-05-10 Thread Henry Wang
With the new allocation strategy of Dom0less DomUs magic page
region, update the documentation of the late XenStore init
protocol accordingly.

Signed-off-by: Henry Wang 
---
v2:
- New patch.
---
 docs/features/dom0less.pandoc | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/features/dom0less.pandoc b/docs/features/dom0less.pandoc
index 725afa0558..137e6b618b 100644
--- a/docs/features/dom0less.pandoc
+++ b/docs/features/dom0less.pandoc
@@ -110,8 +110,9 @@ hotplug PV drivers to dom0less guests. E.g. xl 
network-attach domU.
 The implementation works as follows:
 - Xen allocates the xenstore event channel for each dom0less domU that
   has the "xen,enhanced" property, and sets HVM_PARAM_STORE_EVTCHN
-- Xen does *not* allocate the xenstore page and sets HVM_PARAM_STORE_PFN
-  to ~0ULL (invalid)
+- Xen allocates the hypervisor reserved pages region (the xenstore page
+  is part of it) and sets HVM_PARAM_HV_RSRV_{BASE_PFN,SIZE} accordingly.
+  Xen sets HVM_PARAM_STORE_PFN to ~0ULL (invalid).
 - Dom0less domU kernels check that HVM_PARAM_STORE_PFN is set to invalid
 - Old kernels will continue without xenstore support (Note: some old
   buggy kernels might crash because they don't check the validity of
@@ -121,7 +122,8 @@ The implementation works as follows:
   channel (HVM_PARAM_STORE_EVTCHN) before continuing with the
   initialization
 - Once dom0 is booted, init-dom0less is executed:
-- it allocates the xenstore shared page and sets HVM_PARAM_STORE_PFN
+- it gets the xenstore shared page from HVM_PARAM_HV_RSRV_BASE_PFN
+  and sets HVM_PARAM_STORE_PFN
 - it calls xs_introduce_domain
 - Xenstored notices the new domain, initializes interfaces as usual, and
   sends an event channel notification to the domain using the xenstore
-- 
2.34.1




[PATCH v2 2/4] xen/arm: Add new HVM_PARAM_HV_RSRV_{BASE_PFN,SIZE} keys in HVMOP

2024-05-10 Thread Henry Wang
For use cases such as Dom0less PV drivers, a mechanism to communicate
Dom0less DomU's static data with the runtime control plane (Dom0) is
needed. Since on Arm HVMOP is already the existing approach to address
such use cases (for example the allocation of HVM_PARAM_CALLBACK_IRQ),
add new HVMOP keys HVM_PARAM_HV_RSRV_{BASE_PFN,SIZE} for storing the
hypervisor reserved pages region base PFN and size.

Currently, the hypervisor reserved pages region is used as the Arm
Dom0less DomU guest magic pages region. Therefore protect the HVMOP
keys with "#if defined(__arm__) || defined(__aarch64__)". The values
will be set at Dom0less DomU construction time after Dom0less DomU's
magic pages region has been allocated.

Reported-by: Alec Kwapis 
Signed-off-by: Henry Wang 
---
v2:
- Rename the HVMOP keys to HVM_PARAM_HV_RSRV_{BASE_PFN,SIZE}. (Daniel)
- Add comment on top of HVM_PARAM_HV_RSRV_{BASE_PFN,SIZE} to describe
  its usage. Protect them with #ifdef. (Daniel, Jan)
---
 xen/arch/arm/dom0less-build.c   |  3 +++
 xen/arch/arm/hvm.c  |  2 ++
 xen/include/public/hvm/params.h | 11 ++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 4b96ddd9ce..5bb53ebb47 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -764,6 +764,9 @@ static int __init alloc_magic_pages(struct domain *d)
 return rc;
 }
 
+d->arch.hvm.params[HVM_PARAM_HV_RSRV_BASE_PFN] = gfn_x(gfn);
+d->arch.hvm.params[HVM_PARAM_HV_RSRV_SIZE] = NR_MAGIC_PAGES;
+
 return 0;
 }
 
diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
index 0989309fea..949d804f8b 100644
--- a/xen/arch/arm/hvm.c
+++ b/xen/arch/arm/hvm.c
@@ -55,6 +55,8 @@ static int hvm_allow_get_param(const struct domain *d, unsigned int param)
 case HVM_PARAM_STORE_EVTCHN:
 case HVM_PARAM_CONSOLE_PFN:
 case HVM_PARAM_CONSOLE_EVTCHN:
+case HVM_PARAM_HV_RSRV_BASE_PFN:
+case HVM_PARAM_HV_RSRV_SIZE:
 return 0;
 
 /*
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index a22b4ed45d..337f5b0bf8 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -296,6 +296,15 @@
 #define XEN_HVM_MCA_CAP_LMCE   (xen_mk_ullong(1) << 0)
 #define XEN_HVM_MCA_CAP_MASK   XEN_HVM_MCA_CAP_LMCE
 
-#define HVM_NR_PARAMS 39
+/*
+ * Base PFN and number of pages of the hypervisor reserved pages region.
+ * Currently only used on Arm for Dom0less DomUs as guest magic pages region.
+ */
+#if defined(__arm__) || defined(__aarch64__)
+#define HVM_PARAM_HV_RSRV_BASE_PFN 39
+#define HVM_PARAM_HV_RSRV_SIZE 40
+#endif
+
+#define HVM_NR_PARAMS 41
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
-- 
2.34.1




[PATCH v2 3/4] tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE

2024-05-10 Thread Henry Wang
Currently the GUEST_MAGIC_BASE in the init-dom0less application is
hardcoded, which will lead to failures for 1:1 direct-mapped Dom0less
DomUs.

Since the guest magic region is now allocated from the hypervisor,
instead of hardcoding the guest magic pages region, use
xc_hvm_param_get() to get the guest magic region PFN, and based on
that the XenStore PFN can be calculated. Also, we don't need to set
the max mem anymore, so drop the call to xc_domain_setmaxmem(). Rename
the alloc_xs_page() to get_xs_page() to reflect the changes.

Take the opportunity to do some coding style improvements when possible.

Reported-by: Alec Kwapis 
Signed-off-by: Henry Wang 
---
v2:
- Update HVMOP keys name.
---
 tools/helpers/init-dom0less.c | 40 +++
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/tools/helpers/init-dom0less.c b/tools/helpers/init-dom0less.c
index fee93459c4..04039a2a66 100644
--- a/tools/helpers/init-dom0less.c
+++ b/tools/helpers/init-dom0less.c
@@ -19,24 +19,20 @@
 #define XENSTORE_PFN_OFFSET 1
 #define STR_MAX_LENGTH 128
 
-static int alloc_xs_page(struct xc_interface_core *xch,
- libxl_dominfo *info,
- uint64_t *xenstore_pfn)
+static int get_xs_page(struct xc_interface_core *xch, libxl_dominfo *info,
+   uint64_t *xenstore_pfn)
 {
 int rc;
-const xen_pfn_t base = GUEST_MAGIC_BASE >> XC_PAGE_SHIFT;
-xen_pfn_t p2m = (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET;
+xen_pfn_t magic_base_pfn;
 
-rc = xc_domain_setmaxmem(xch, info->domid,
- info->max_memkb + (XC_PAGE_SIZE/1024));
-if (rc < 0)
-return rc;
-
-rc = xc_domain_populate_physmap_exact(xch, info->domid, 1, 0, 0, &p2m);
-if (rc < 0)
-return rc;
+rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_HV_RSRV_BASE_PFN,
+  &magic_base_pfn);
+if (rc < 0) {
+printf("Failed to get HVM_PARAM_HV_RSRV_BASE_PFN\n");
+return 1;
+}
 
-*xenstore_pfn = base + XENSTORE_PFN_OFFSET;
+*xenstore_pfn = magic_base_pfn + XENSTORE_PFN_OFFSET;
 rc = xc_clear_domain_page(xch, info->domid, *xenstore_pfn);
 if (rc < 0)
 return rc;
@@ -100,6 +96,7 @@ static bool do_xs_write_vm(struct xs_handle *xsh, xs_transaction_t t,
  */
 static int create_xenstore(struct xs_handle *xsh,
libxl_dominfo *info, libxl_uuid uuid,
+   xen_pfn_t xenstore_pfn,
evtchn_port_t xenstore_port)
 {
 domid_t domid;
@@ -145,8 +142,7 @@ static int create_xenstore(struct xs_handle *xsh,
 rc = snprintf(target_memkb_str, STR_MAX_LENGTH, "%"PRIu64, info->current_memkb);
 if (rc < 0 || rc >= STR_MAX_LENGTH)
 return rc;
-rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%lld",
-  (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET);
+rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%"PRIu_xen_pfn, xenstore_pfn);
 if (rc < 0 || rc >= STR_MAX_LENGTH)
 return rc;
 rc = snprintf(xenstore_port_str, STR_MAX_LENGTH, "%u", xenstore_port);
@@ -245,9 +241,9 @@ static int init_domain(struct xs_handle *xsh,
 if (!xenstore_evtchn)
 return 0;
 
-/* Alloc xenstore page */
-if (alloc_xs_page(xch, info, &xenstore_pfn) != 0) {
-printf("Error on alloc magic pages\n");
+/* Get xenstore page */
+if (get_xs_page(xch, info, &xenstore_pfn) != 0) {
+printf("Error on getting xenstore page\n");
 return 1;
 }
 
@@ -278,13 +274,11 @@ static int init_domain(struct xs_handle *xsh,
 if (rc < 0)
 return rc;
 
-rc = create_xenstore(xsh, info, uuid, xenstore_evtchn);
+rc = create_xenstore(xsh, info, uuid, xenstore_pfn, xenstore_evtchn);
 if (rc)
 err(1, "writing to xenstore");
 
-rc = xs_introduce_domain(xsh, info->domid,
-(GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET,
-xenstore_evtchn);
+rc = xs_introduce_domain(xsh, info->domid, xenstore_pfn, xenstore_evtchn);
 if (!rc)
 err(1, "xs_introduce_domain");
 return 0;
-- 
2.34.1
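
The address arithmetic that this patch relies on can be sketched in isolation. The snippet below is only an illustration, not the init-dom0less code: xs_pfn_from_rsrv_base() is a hypothetical helper, and its argument stands in for the value that xc_hvm_param_get() would return for HVM_PARAM_HV_RSRV_BASE_PFN. The two offsets mirror the definitions moved in patch 1/4.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets inside the reserved region, mirroring patch 1/4. */
#define NR_MAGIC_PAGES      4
#define XENSTORE_PFN_OFFSET 1

/*
 * Hypothetical helper: derive the XenStore page frame number from the
 * base PFN of the hypervisor reserved pages region.
 */
static uint64_t xs_pfn_from_rsrv_base(uint64_t rsrv_base_pfn)
{
    /* The XenStore page has to lie inside the reserved region. */
    assert(XENSTORE_PFN_OFFSET < NR_MAGIC_PAGES);

    return rsrv_base_pfn + XENSTORE_PFN_OFFSET;
}
```

Because the base is read at run time, the same helper works both for a region placed at the fixed guest magic base and for a 1:1 direct-mapped domain, where the base depends on which machine frames were allocated.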




[PATCH v2 1/4] xen/arm: Alloc hypervisor reserved pages as magic pages for Dom0less DomUs

2024-05-10 Thread Henry Wang
There are use cases (for example using the PV driver) in Dom0less
setups that require Dom0less DomUs to start immediately with Dom0, but
to initialize XenStore later, after Dom0 has booted successfully and
called the init-dom0less application.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

"Magic pages" is the term used in the toolstack for the reserved
pages that give the VM access to virtual platform capabilities.
Currently the magic pages for Dom0less DomUs are populated by the
init-dom0less app through populate_physmap(), and populate_physmap()
automatically assumes gfn == mfn for 1:1 direct-mapped domains. This
cannot be true for the magic pages that are allocated later from the
init-dom0less application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error, "failed to
retrieve a reserved page", can be seen as the reserved memory list is
empty at that time.

To solve the above issue, this commit allocates hypervisor reserved pages
(currently used as the magic pages) for Arm Dom0less DomUs at
domain construction time. The base address/PFN of the region will be
noted and communicated to the init-dom0less application in Dom0.

Reported-by: Alec Kwapis 
Suggested-by: Daniel P. Smith 
Signed-off-by: Henry Wang 
---
v2:
- Reword the commit msg to explain what is "magic page" and use generic
  terminology "hypervisor reserved pages" in commit msg. (Daniel)
- Also move the offset definition of magic pages. (Michal)
- Extract the magic page allocation logic to a function. (Michal)
---
 tools/libs/guest/xg_dom_arm.c |  6 --
 xen/arch/arm/dom0less-build.c | 32 
 xen/include/public/arch-arm.h |  6 ++
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/tools/libs/guest/xg_dom_arm.c b/tools/libs/guest/xg_dom_arm.c
index 2fd8ee7ad4..8c579d7576 100644
--- a/tools/libs/guest/xg_dom_arm.c
+++ b/tools/libs/guest/xg_dom_arm.c
@@ -25,12 +25,6 @@
 
 #include "xg_private.h"
 
-#define NR_MAGIC_PAGES 4
-#define CONSOLE_PFN_OFFSET 0
-#define XENSTORE_PFN_OFFSET 1
-#define MEMACCESS_PFN_OFFSET 2
-#define VUART_PFN_OFFSET 3
-
 #define LPAE_SHIFT 9
 
 #define PFN_4K_SHIFT  (0)
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..4b96ddd9ce 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -739,6 +739,34 @@ static int __init alloc_xenstore_evtchn(struct domain *d)
 return 0;
 }
 
+static int __init alloc_magic_pages(struct domain *d)
+{
+struct page_info *magic_pg;
+mfn_t mfn;
+gfn_t gfn;
+int rc;
+
+d->max_pages += NR_MAGIC_PAGES;
+magic_pg = alloc_domheap_pages(d, get_order_from_pages(NR_MAGIC_PAGES), 0);
+if ( magic_pg == NULL )
+return -ENOMEM;
+
+mfn = page_to_mfn(magic_pg);
+if ( !is_domain_direct_mapped(d) )
+gfn = gaddr_to_gfn(GUEST_MAGIC_BASE);
+else
+gfn = gaddr_to_gfn(mfn_to_maddr(mfn));
+
+rc = guest_physmap_add_pages(d, gfn, mfn, NR_MAGIC_PAGES);
+if ( rc )
+{
+free_domheap_pages(magic_pg, get_order_from_pages(NR_MAGIC_PAGES));
+return rc;
+}
+
+return 0;
+}
+
 static int __init construct_domU(struct domain *d,
  const struct dt_device_node *node)
 {
@@ -840,6 +868,10 @@ static int __init construct_domU(struct domain *d,
 if ( rc < 0 )
 return rc;
 d->arch.hvm.params[HVM_PARAM_STORE_PFN] = ~0ULL;
+
+rc = alloc_magic_pages(d);
+if ( rc < 0 )
+return rc;
 }
 
 return rc;
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 289af81bd6..186520d01f 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -476,6 +476,12 @@ typedef uint64_t xen_callback_t;
 #define GUEST_MAGIC_BASE  xen_mk_ullong(0x39000000)
 #define GUEST_MAGIC_SIZE  xen_mk_ullong(0x01000000)
 
+#define NR_MAGIC_PAGES 4
+#define CONSOLE_PFN_OFFSET 0
+#define XENSTORE_PFN_OFFSET 1
+#define MEMACCESS_PFN_OFFSET 2
+#define VUART_PFN_OFFSET 3
+
 #define GUEST_RAM_BANKS   2
 
 /*
-- 
2.34.1
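
The GFN choice made in alloc_magic_pages() can be restated as a small stand-alone function. This is a sketch under stated assumptions rather than the hypervisor code: magic_region_gfn() is a hypothetical name, 4 KiB pages are assumed, and the guest magic base is taken to be 0x39000000, which matches the "mfn 0x39000" in the error message quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT       12             /* assumption: 4 KiB pages */
#define GUEST_MAGIC_BASE 0x39000000ULL  /* assumption: value from arch-arm.h */

/*
 * Hypothetical stand-in for the GFN selection in alloc_magic_pages():
 * a direct-mapped domain reuses the machine frame number, every other
 * domain gets the fixed guest magic base.
 */
static uint64_t magic_region_gfn(bool direct_mapped, uint64_t mfn)
{
    /* The fixed base has to be page aligned for the shift to be exact. */
    assert((GUEST_MAGIC_BASE & ((1ULL << PAGE_SHIFT) - 1)) == 0);

    if ( direct_mapped )
        return mfn;                     /* gfn == mfn for 1:1 domains */

    return GUEST_MAGIC_BASE >> PAGE_SHIFT;
}
```

For a direct-mapped domain the result is only known once the pages have been allocated, which is why the region base has to be reported to Dom0 instead of being hardcoded in the toolstack.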




Re: [PATCH 1/3] xen/arm/dom0less-build: Alloc magic pages for Dom0less DomUs from hypervisor

2024-05-10 Thread Henry Wang

Hi Michal,

Thanks very much for taking a look!

On 5/10/2024 3:37 PM, Michal Orzel wrote:

Hi Henry,

On 26/04/2024 05:14, Henry Wang wrote:

There are use cases (for example using the PV driver) in Dom0less
setup that require Dom0less DomUs start immediately with Dom0, but
initialize XenStore later after Dom0's successful boot and call to
the init-dom0less application.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```
This is because currently the magic pages for Dom0less DomUs are
populated by the init-dom0less app through populate_physmap(), and
populate_physmap() automatically assumes gfn == mfn for 1:1 direct
mapped domains. This cannot be true for the magic pages that are
allocated later from the init-dom0less application executed in Dom0.
For domain using statically allocated memory but not 1:1 direct-mapped,
similar error "failed to retrieve a reserved page" can be seen as the
reserved memory list is empty at that time.

To solve above issue, this commit allocates the magic pages for
Dom0less DomUs at the domain construction time. The base address/PFN
of the magic page region will be noted and communicated to the
init-dom0less application in Dom0.

Reported-by: Alec Kwapis 
Suggested-by: Daniel P. Smith 
Signed-off-by: Henry Wang 
---
  tools/libs/guest/xg_dom_arm.c |  1 -
  xen/arch/arm/dom0less-build.c | 22 ++
  xen/include/public/arch-arm.h |  1 +
  3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/libs/guest/xg_dom_arm.c b/tools/libs/guest/xg_dom_arm.c
index 2fd8ee7ad4..8cc7f27dbb 100644
--- a/tools/libs/guest/xg_dom_arm.c
+++ b/tools/libs/guest/xg_dom_arm.c
@@ -25,7 +25,6 @@
  
  #include "xg_private.h"
  
-#define NR_MAGIC_PAGES 4

Moving only this macro to arch-arm.h while leaving the offsets does not make 
much sense to me.
I think they all should be moved. This would also allow init-dom0less.h not to 
re-define XENSTORE_PFN_OFFSET.


Sounds good. Will do in v2.


  #define CONSOLE_PFN_OFFSET 0
  #define XENSTORE_PFN_OFFSET 1
  #define MEMACCESS_PFN_OFFSET 2
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index fb63ec6fd1..40dc85c759 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -834,11 +834,33 @@ static int __init construct_domU(struct domain *d,
  
  if ( kinfo.dom0less_feature & DOM0LESS_XENSTORE )

  {
+struct page_info *magic_pg;
+mfn_t mfn;
+gfn_t gfn;
+
  ASSERT(hardware_domain);
  rc = alloc_xenstore_evtchn(d);
  if ( rc < 0 )
  return rc;
  d->arch.hvm.params[HVM_PARAM_STORE_PFN] = ~0ULL;
+
+d->max_pages += NR_MAGIC_PAGES;
+magic_pg = alloc_domheap_pages(d, get_order_from_pages(NR_MAGIC_PAGES), 0);

80 char exceeded


Ooops, I am sorry. Will fix in v2.


+if ( magic_pg == NULL )
+return -ENOMEM;
+
+mfn = page_to_mfn(magic_pg);
+if ( !is_domain_direct_mapped(d) )
+gfn = gaddr_to_gfn(GUEST_MAGIC_BASE);
+else
+gfn = gaddr_to_gfn(mfn_to_maddr(mfn));
+
+rc = guest_physmap_add_pages(d, gfn, mfn, NR_MAGIC_PAGES);
+if ( rc )
+{
+free_domheap_pages(magic_pg, get_order_from_pages(NR_MAGIC_PAGES));
+return rc;
+}

Please create a function alloc_magic_pages to encapsulate the above block.


Sure. Will do.

Kind regards,
Henry



Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM

2024-05-09 Thread Henry Wang

Hi Julien,

On 5/9/2024 4:46 AM, Julien Grall wrote:

Hi Henry,
[...]

we have 3 possible states which can be read from LR for this case : 
active, pending, pending and active.
- I don't think we can do anything about the active state, so we 
should return -EBUSY and reject the whole operation of removing the 
IRQ from running guest, and user can always retry this operation.


This would mean a malicious/buggy guest would be able to prevent a 
device to be de-assigned. This is not a good idea in particular when 
the domain is dying.


That said, I think you can handle this case. The LR has a bit to 
indicate whether the pIRQ needs to be EOIed. You can clear it and 
this would prevent the guest to touch the pIRQ. There might be other 
clean-up to do in the vGIC datastructure.


I probably misunderstood this sentence. Do you mean the EOI bit in
the pINTID field? I think this bit is only available when the HW bit
of the LR is 0, but in our case the HW bit is supposed to be 1 (as indicated
in your previous comment). Would you mind clarifying a bit more? Thanks!


You are right, ICH_LR.HW will be 1 for physical IRQ routed to a guest. 
What I was trying to explain is this bit could be cleared (with 
ICH_LR.pINTID adjusted).


Thank you for all the discussions. Based on that, would the diff below make
sense to you? I tested dynamic dtbo adding/removing with an
Ethernet device with this patch applied. The test steps are:
(1) Use xl dt-overlay to add the Ethernet device to the Xen device tree and
assign it to dom0.
(2) Create a domU.
(3) Use xl dt-overlay to de-assign the device from dom0 and assign it to
domU.
(4) Destroy the domU.

The Ethernet device is functional in each domain while it is
attached to that domain, and I don't see errors when I destroy the domU. But
honestly, I think the case we talked about is quite unusual, so I
am not sure whether it was hit during my test.


```
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index a775f886ed..d3f9cd2299 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -135,16 +135,6 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,

 ASSERT(virq < vgic_num_irqs(d));
 ASSERT(!is_lpi(virq));

-    /*
- * When routing an IRQ to guest, the virtual state is not synced
- * back to the physical IRQ. To prevent get unsync, restrict the
- * routing to when the Domain is been created.
- */
-#ifndef CONFIG_OVERLAY_DTB
-    if ( d->creation_finished )
-    return -EBUSY;
-#endif
-
 ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);
 if ( ret )
 return ret;
@@ -169,20 +159,40 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,

 ASSERT(test_bit(_IRQ_GUEST, &desc->status));
 ASSERT(!is_lpi(virq));

-    /*
- * Removing an interrupt while the domain is running may have
- * undesirable effect on the vGIC emulation.
- */
-#ifndef CONFIG_OVERLAY_DTB
-    if ( !d->is_dying )
-    return -EBUSY;
-#endif
-
 desc->handler->shutdown(desc);

 /* EOI the IRQ if it has not been done by the guest */
 if ( test_bit(_IRQ_INPROGRESS, &desc->status) )
+    {
+    /*
+ * Handle the LR where the physical interrupt is de-assigned from the
+ * guest before it was EOIed
+ */
+    struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq);
+    struct vgic_irq_rank *rank = vgic_rank_irq(v_target, virq);
+    struct pending_irq *p = irq_to_pending(v_target, virq);
+    unsigned long flags;
+
+    spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
+    /* LR allocated for the IRQ */
+    if ( test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) &&
+ test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
+    {
+    gic_hw_ops->clear_lr(p->lr);
+    clear_bit(p->lr, &v_target->arch.lr_mask);
+
+    clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
+    clear_bit(GIC_IRQ_GUEST_ACTIVE, &p->status);
+    p->lr = GIC_INVALID_LR;
+    }
+    spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
+
+    vgic_lock_rank(v_target, rank, flags);
+    vgic_disable_irqs(v_target, (~rank->ienable) & rank->ienable, rank->index);
+    vgic_unlock_rank(v_target, rank, flags);
+
 gic_hw_ops->deactivate_irq(desc);
+    }
 clear_bit(_IRQ_INPROGRESS, &desc->status);

 ret = vgic_connect_hw_irq(d, NULL, virq, desc, false);
```

Kind regards,
Henry



Cheers,






Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM

2024-05-08 Thread Henry Wang

Hi Julien,

On 5/8/2024 5:54 AM, Julien Grall wrote:

Hi Henry,
What if the DT overlay is unloaded and then reloaded? Wouldn't the 
same interrupt be re-used? As a more generic case, this could also 
be a new bitstream for the FPGA.


But even if the interrupt is brand new every time for the DT 
overlay, you are effectively relaxing the check for every user (such 
as XEN_DOMCTL_bind_pt_irq). So the interrupt re-use case needs to be 
taken into account.


I agree. I think IIUC, with your explanation here and below, could we 
simplify the problem to how to properly handle the removal of the IRQ 
from a running guest, if we always properly remove and clean up the 
information when removing the IRQ from the guest? In this way, the IRQ
can always be viewed as a brand new one when we add it back.


If we can make sure the virtual IRQ and physical IRQ is cleaned then yes.


Then the only corner case that we need to take care of would be...


Can you clarify whether you say the "only corner case" because you 
looked at the code? Or is it just because I mentioned only one?


Well, I did check the code, and to the best of my knowledge the corner case
that you pointed out is the only one I can think of.


Xen allows the guest to enable a vIRQ even if there is no pIRQ
assigned. Thankfully, it looks like vgic_connect_hw_irq(), in
both the current and new vGIC, will return an error if we are trying
to route a pIRQ to an already enabled vIRQ.


But we need to investigate all the possible scenarios to make sure
that any inconsistencies between the physical state and virtual
state (including the LRs) will not result in a bigger problem.


The one that comes to my mind is: The physical interrupt is 
de-assigned from the guest before it was EOIed. In this case, the 
interrupt will still be in the LR with the HW bit set. This would 
allow the guest to EOI the interrupt even if it is routed to someone 
else. It is unclear what would be the impact on the other guest.


...same as this case, i.e.
test_bit(_IRQ_INPROGRESS, &desc->status) || !test_bit(_IRQ_DISABLED,
&desc->status) when we try to remove the IRQ from a running domain.


We already call ->shutdown() which will disable the IRQ. So don't we 
only need to take care of _IRQ_INPROGRESS?


Yes you are correct.

we have 3 possible states which can be read from LR for this case : 
active, pending, pending and active.
- I don't think we can do anything about the active state, so we 
should return -EBUSY and reject the whole operation of removing the 
IRQ from running guest, and user can always retry this operation.


This would mean a malicious/buggy guest would be able to prevent a 
device to be de-assigned. This is not a good idea in particular when 
the domain is dying.


That said, I think you can handle this case. The LR has a bit to 
indicate whether the pIRQ needs to be EOIed. You can clear it and this 
would prevent the guest to touch the pIRQ. There might be other 
clean-up to do in the vGIC datastructure.


I probably misunderstood this sentence. Do you mean the EOI bit in the
pINTID field? I think this bit is only available when the HW bit of the LR
is 0, but in our case the HW bit is supposed to be 1 (as indicated in your
previous comment). Would you mind clarifying a bit more? Thanks!


Anyway, we don't have to handle removing an active IRQ when the domain 
is still running (although we do when the domain is destroying). But I 
think this would need to be solved before the feature is (security) 
supported.



- For the pending (and active) case,


Shouldn't the pending and active case handled the same way as the 
active case?


Sorry, yes you are correct.

Kind regards,
Henry
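
The LR states referred to in this thread can be made concrete with a small decoder. This is a sketch only, assuming the GICv3 ICH_LR<n>_EL2 layout (State in bits [63:62], HW in bit 61); lr_state() and lr_holds_hw_irq() are hypothetical helpers and are not part of Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICv3 ICH_LR<n>_EL2 bit positions. */
#define LR_STATE_SHIFT 62
#define LR_STATE_MASK  0x3ULL
#define LR_HW_BIT      (1ULL << 61)

enum lr_state {
    LR_INVALID        = 0, /* 0b00: the LR holds no interrupt */
    LR_PENDING        = 1, /* 0b01 */
    LR_ACTIVE         = 2, /* 0b10 */
    LR_PENDING_ACTIVE = 3, /* 0b11 */
};

/* Extract the two-bit State field from a raw LR value. */
static enum lr_state lr_state(uint64_t lr)
{
    return (enum lr_state)((lr >> LR_STATE_SHIFT) & LR_STATE_MASK);
}

/*
 * Hypothetical predicate: true when the LR still holds a hardware-backed
 * interrupt in any valid state, i.e. a guest deactivation would be
 * forwarded to the physical interrupt.
 */
static bool lr_holds_hw_irq(uint64_t lr)
{
    return (lr & LR_HW_BIT) != 0 && lr_state(lr) != LR_INVALID;
}
```

An LR for which lr_holds_hw_irq() is true at the time the pIRQ is de-assigned is the corner case discussed above: the guest could still complete an interrupt that has since been routed elsewhere.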



Re: [PATCH 05/15] tools/libs/light: Increase nr_spi to 160

2024-05-07 Thread Henry Wang

Hi Julien,

On 5/7/2024 10:35 PM, Julien Grall wrote:

Hi,

On 06/05/2024 06:17, Henry Wang wrote:

On 5/1/2024 9:58 PM, Anthony PERARD wrote:

On Wed, Apr 24, 2024 at 11:34:39AM +0800, Henry Wang wrote:
Increase number of spi to 160 i.e. gic_number_lines() for Xilinx 
ZynqMP - 32.

This was done to allocate and assign IRQs to a running domain.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
  tools/libs/light/libxl_arm.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c

index dd5c9f4917..50dbd0f2a9 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -181,7 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,

  LOG(DEBUG, "Configure the domain");
-    config->arch.nr_spis = nr_spis;
+    /* gic_number_lines() is 192 for Xilinx ZynqMP. min nr_spis = 192 - 32. */
+    config->arch.nr_spis = MAX(nr_spis, 160);

Is there a way that Xen or libxl could find out what the minimum
number of SPIs needs to be?


I am afraid currently there is none.


Are we going to have to increase that minimum
number every time a new platform comes along?

It doesn't appear that libxl is using that `nr_spis` value and it is
probably just given to Xen. So my guess is that Xen could simply take
care of the minimum value, gic_number_lines() seems to be a Xen
function.


Xen will take care of the value of nr_spis for dom0 in create_dom0():
dom0_cfg.arch.nr_spis = min(gic_number_lines(), (unsigned int) 992) - 32;

and also for dom0less domUs in create_domUs().

However, it looks like Xen will not take care of the minimum value
for libxl guests: the value of config->arch.nr_spis from the guest config
file is passed directly to the domain_vgic_init() function from
arch_domain_create().


I agree with you that we shouldn't just bump the number every time
we have a new platform. Therefore, would it be a good idea to
move the logic in this patch to arch_sanitise_domain_config()?


Xen domains are supposed to be platform agnostic and therefore the
number of SPIs should not be based on the HW.


Furthermore, with your proposal we would end up allocating data
structures for N SPIs when a domain may never need any SPIs (such as
if passthrough is not in use). This is more likely for domains created
by the toolstack than from Xen directly.


Agreed on both comments.

Instead, we should introduce a new XL configuration option to let the user
decide the number of SPIs. I would suggest naming it "nr_spis" to match
the DT bindings.


Sure, I will introduce a new xl config for this to replace this patch. 
Thank you for the suggestion.


Kind regards,
Henry



Cheers,
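
The arithmetic behind the value 160 and behind the create_dom0() expression quoted in this thread can be checked with a few lines of C. This is a sketch only: nr_spis_from_lines() is a hypothetical helper, the constant 32 is the number of per-CPU interrupt IDs (SGIs and PPIs) that precede the SPIs, and 992 is the cap taken from the quoted expression.

```c
#include <assert.h>

#define NR_LOCAL_IRQS 32u   /* SGIs + PPIs occupy interrupt IDs 0..31 */
#define MAX_GIC_LINES 992u  /* cap from the quoted create_dom0() expression */

/*
 * Hypothetical helper: number of SPIs for a GIC that reports gic_lines
 * interrupt lines, following min(gic_number_lines(), 992) - 32.
 */
static unsigned int nr_spis_from_lines(unsigned int gic_lines)
{
    unsigned int lines = gic_lines < MAX_GIC_LINES ? gic_lines : MAX_GIC_LINES;

    /* Fewer than 32 lines would leave no room for the per-CPU IDs. */
    assert(lines >= NR_LOCAL_IRQS);

    return lines - NR_LOCAL_IRQS;
}
```

With 192 lines, as stated for the Xilinx ZynqMP in the patch comment, this gives 160, which is the constant the patch hardcoded.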






[PATCH v2 0/2] Some fixes for the existing dynamic dtbo code

2024-05-07 Thread Henry Wang
During the review process for the v1 of the dynamic dtbo series, some
issues of the existing code were identified. Discussions of them can
be found in [1] (for the first patch) and [2] (for the second patch).

Since the main part of the remaining dynamic dtbo series requires more
rework, just send these fixes for now.

[1] https://lore.kernel.org/xen-devel/835099c8-6cf0-4f6d-899b-07388df89...@xen.org/
[2] https://lore.kernel.org/xen-devel/eaea1986-a27e-4d6c-932f-1d0a9918861f@perard/

Henry Wang (2):
  xen/common/dt-overlay: Fix missing lock when remove the device
  tools/xl: Correct the help information and exit code of the dt-overlay
command

 tools/xl/xl_vmcontrol.c | 6 +++---
 xen/common/dt-overlay.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

-- 
2.34.1




[PATCH v2 2/2] tools/xl: Correct the help information and exit code of the dt-overlay command

2024-05-07 Thread Henry Wang
Fix the name mismatch in the xl dt-overlay command: the
command name should be "dt-overlay" instead of "dt_overlay".

Fix the exit code of the dt-overlay command: use EXIT_FAILURE
instead of ERROR_FAIL.

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support")
Suggested-by: Anthony PERARD 
Signed-off-by: Henry Wang 
---
v2:
- New patch
---
 tools/xl/xl_vmcontrol.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 98f6bd2e76..02575d5d36 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1278,7 +1278,7 @@ int main_dt_overlay(int argc, char **argv)
 const int overlay_remove_op = 2;
 
 if (argc < 2) {
-help("dt_overlay");
+help("dt-overlay");
 return EXIT_FAILURE;
 }
 
@@ -1302,11 +1302,11 @@ int main_dt_overlay(int argc, char **argv)
 fprintf(stderr, "failed to read the overlay device tree file %s\n",
 overlay_config_file);
 free(overlay_dtb);
-return ERROR_FAIL;
+return EXIT_FAILURE;
 }
 } else {
 fprintf(stderr, "overlay dtbo file not provided\n");
-return ERROR_FAIL;
+return EXIT_FAILURE;
 }
 
 rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
-- 
2.34.1
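
Why the exit code matters can be shown with a short sketch. It assumes that the value returned by the sub-command ends up as the process exit status, that ERROR_FAIL is libxl's negative error code -3, and that the parent only sees the low 8 bits of the status, as on POSIX systems; visible_exit_status() is a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define ERROR_FAIL (-3)  /* assumption: libxl's generic failure code */

/*
 * Hypothetical helper: the exit status a POSIX parent observes, which
 * keeps only the low 8 bits of the value the process exited with.
 */
static int visible_exit_status(int returned_from_main)
{
    return returned_from_main & 0xff;
}
```

Under these assumptions a returned ERROR_FAIL shows up in the shell as 253, while EXIT_FAILURE is reported unchanged, so scripts checking for the conventional failure status behave as expected.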




[PATCH v2 1/2] xen/common/dt-overlay: Fix missing lock when remove the device

2024-05-07 Thread Henry Wang
If CONFIG_DEBUG=y, the assertion below will be triggered:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) [ Xen-4.19-unstable  arm64  debug=y  Not tainted ]
(XEN) CPU:0
(XEN) PC: 0a257418 iommu_remove_dt_device+0x8c/0xd4
(XEN) LR: 0a2573a0
(XEN) SP: 8000fff7fb30
(XEN) CPSR:   0249 MODE:64-bit EL2h (Hypervisor, handler)
[...]

(XEN) Xen call trace:
(XEN)[<0a257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN)[<0a2573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN)[<0a20797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN)[<0a207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN)[<0a208460>] dt_overlay_sysctl+0x428/0xc68
(XEN)[<0a2707f8>] arch_do_sysctl+0x1c/0x2c
(XEN)[<0a230b40>] do_sysctl+0x96c/0x9ec
(XEN)[<0a271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN)[<0a273490>] do_trap_guest_sync+0x448/0x63c
(XEN)[<0a25c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) 

This is because iommu_remove_dt_device() is called without taking the
dt_host_lock. dt_host_lock is meant to ensure that the DT node will not
disappear behind our back. So fix the issue by taking the lock as soon as
we get hold of overlay_node.

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Henry Wang 
---
v2:
- Take the lock as soon as getting hold of overlay_node.
v1.1:
- Move the unlock position before the check of rc.
---
 xen/common/dt-overlay.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1b197381f6..25d15cbcb1 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -429,6 +429,8 @@ static int remove_nodes(const struct overlay_track *tracker)
 if ( overlay_node == NULL )
 return -EINVAL;
 
+write_lock(&dt_host_lock);
+
 rc = remove_descendant_nodes_resources(overlay_node);
 if ( rc )
 return rc;
@@ -439,8 +441,6 @@ static int remove_nodes(const struct overlay_track *tracker)
 
 dt_dprintk("Removing node: %s\n", overlay_node->full_name);
 
-write_lock(&dt_host_lock);
-
 rc = dt_overlay_remove_node(overlay_node);
 if ( rc )
 {
-- 
2.34.1




Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM

2024-05-06 Thread Henry Wang

Hi Julien,

On 5/1/2024 4:13 AM, Julien Grall wrote:

Hi Henry,

On 30/04/2024 04:50, Henry Wang wrote:

On 4/25/2024 10:28 PM, Julien Grall wrote:
Thanks for your feedback. After checking the b8577547236f commit
message I think I now understand your point. Do you have any
suggestion about how I can properly add support for routing/removing
the IRQ to/from running domains? Thanks.


I spent some time going through the GIC/vGIC code and had some 
discussions with Stefano and Stewart during the last couple of days, 
let me see if I can describe the use case properly now to continue 
the discussion:


We have some use cases that require assigning devices to domains
after domain boot time. For example, suppose there is an FPGA on the
board which can simulate a device, and the bitstream for the FPGA is
provided and programmed after domain boot. So we need a way to assign
the device to the running domain. This series tries to implement this
use case by using device tree overlay: users can first add the
overlay to the Xen dtb, assign the device in the overlay to a domain with
the xl command, then apply the overlay to Linux.


Thanks for the description! This helps to understand your goal :).


Thank you very much for spending your time on discussing this and 
provide these valuable comments!




I haven't really looked at that code in quite a while. I think we need
to make sure that the virtual and physical IRQ state matches at the
time we do the routing.


I am undecided on whether we want to simply prevent the action to 
happen or try to reset the state.


There is also the question of what to do if the guest is enabling 
the vIRQ before it is routed.


Sorry for bothering, would you mind elaborating a bit more about the 
two cases that you mentioned above? Commit b8577547236f ("xen/arm: 
Restrict when a physical IRQ can be routed/removed from/to a domain") 
only said there will be undesirable effects, so I am not sure if I 
understand the concerns raised above and the consequences of these 
two use cases.


I will try to explain them below after I answer the rest.

I am probably wrong, I think when we add the overlay, we are probably 
fine as the interrupt is not being used before. 


What if the DT overlay is unloaded and then reloaded? Wouldn't the 
same interrupt be re-used? As a more generic case, this could also be 
a new bitstream for the FPGA.


But even if the interrupt is brand new every time for the DT overlay, 
you are effectively relaxing the check for every user (such as 
XEN_DOMCTL_bind_pt_irq). So the interrupt re-use case needs to be 
taken into account.


I agree. I think IIUC, with your explanation here and below, could we 
simplify the problem to how to properly handle the removal of the IRQ 
from a running guest, if we always properly remove and clean up the 
information when removing the IRQ from the guest? In this way, the IRQ can 
always be viewed as a brand new one when we add it back. Then the only 
corner case that we need to take care of would be...


Also since we only load the device driver after the IRQ is routed to 
the guest, 


This is what a well-behaved guest will do. However, we need to think 
what will happen if a guest misbehaves. I am not concerned about a 
guest only impacting itself, I am more concerned about the case where 
the rest of the system is impacted.



I am not sure the guest can enable the vIRQ before it is routed.


Xen allows the guest to enable a vIRQ even if there is no pIRQ 
assigned. Thankfully, it looks like the vgic_connect_hw_irq(), in 
both the current and new vGIC, will return an error if we are trying 
to route a pIRQ to an already enabled vIRQ.


But we need to investigate all the possible scenarios to make sure 
that any inconsistencies between the physical state and virtual state 
(including the LRs) will not result in a bigger problem.


The one that comes to my mind is: The physical interrupt is 
de-assigned from the guest before it was EOIed. In this case, the 
interrupt will still be in the LR with the HW bit set. This would 
allow the guest to EOI the interrupt even if it is routed to someone 
else. It is unclear what would be the impact on the other guest.


...same as this case, i.e.
test_bit(_IRQ_INPROGRESS, &desc->status) || !test_bit(_IRQ_DISABLED, 
&desc->status) when we try to remove the IRQ from a running domain.


We have 3 possible states which can be read from the LR for this case: 
active, pending, and pending and active.
- I don't think we can do anything about the active state, so we should 
return -EBUSY and reject the whole operation of removing the IRQ from a 
running guest; the user can always retry this operation.
- For the pending (and active) case, can we clear the LR and point the 
LR for the pending_irq to invalid?
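
To make the question above concrete, here is a minimal sketch of the
decision being proposed. It is only a model: `enum lr_state`,
`struct virq_model` and `try_remove_irq()` are hypothetical names, not Xen
code, and treating "pending and active" like "pending" is just one reading
of the second bullet, which is still an open question in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the states a list register (LR) can report. */
enum lr_state {
    LR_STATE_INVALID,
    LR_STATE_PENDING,
    LR_STATE_ACTIVE,
    LR_STATE_PENDING_ACTIVE,
};

/* Hypothetical stand-in for the per-vIRQ bookkeeping (not struct pending_irq). */
struct virq_model {
    enum lr_state state;
    bool lr_valid;          /* does the vIRQ still point at a valid LR? */
};

/*
 * Try to detach a physical IRQ from a running guest.
 * Returns 0 on success, or -EBUSY when the caller should retry later.
 */
static int try_remove_irq(struct virq_model *v)
{
    assert(v);

    /* Active only: the guest has not EOIed yet, so refuse the removal. */
    if ( v->state == LR_STATE_ACTIVE )
        return -EBUSY;

    /* Pending (and active): clear the LR and mark the link as invalid. */
    if ( v->state == LR_STATE_PENDING ||
         v->state == LR_STATE_PENDING_ACTIVE )
    {
        v->state = LR_STATE_INVALID;
        v->lr_valid = false;
    }

    return 0;
}
```

A caller that gets -EBUSY would simply repeat the removal request once the
guest has finished handling the interrupt.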


Kind regards,
Henry



Cheers,






Re: [PATCH 08/15] tools: Add domain_id and expert mode for overlay operations

2024-05-05 Thread Henry Wang

Hi Anthony,

On 5/1/2024 10:46 PM, Anthony PERARD wrote:

On Wed, Apr 24, 2024 at 11:34:42AM +0800, Henry Wang wrote:

From: Vikram Garhwal 

Add domain_id and expert mode for overlay assignment. This enables dynamic
programming of nodes during runtime.

Take the opportunity to fix the name mismatch in the xl command, the
command name should be "dt-overlay" instead of "dt_overlay".

I don't much like these unrelated / opportunistic changes in a patch,
I'd rather have a separate patch. And in this case, if it was in a
separate patch, that separate patch could gain: Fixes: 61765a07e3d8
("tools/xl: Add new xl command overlay for device tree overlay support")
and potentially be backported.


Ok. I can split this part into a separate commit.


Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
  tools/include/libxl.h   |  8 +--
  tools/include/xenctrl.h |  5 +++--
  tools/libs/ctrl/xc_dt_overlay.c |  7 --
  tools/libs/light/libxl_dt_overlay.c | 17 +++
  tools/xl/xl_vmcontrol.c | 34 ++---
  5 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 62cb07dea6..59a3e1b37c 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -2549,8 +2549,12 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, 
uint32_t domid,
  void libxl_device_pci_list_free(libxl_device_pci* list, int num);
  
  #if defined(__arm__) || defined(__aarch64__)

-int libxl_dt_overlay(libxl_ctx *ctx, void *overlay,
- uint32_t overlay_size, uint8_t overlay_op);
+#define LIBXL_DT_OVERLAY_ADD   1
+#define LIBXL_DT_OVERLAY_REMOVE 2
+
+int libxl_dt_overlay(libxl_ctx *ctx, uint32_t domain_id, void *overlay,
+ uint32_t overlay_size, uint8_t overlay_op, bool auto_mode,
+ bool domain_mapping);

Sorry, you cannot change the API of an existing libxl function without
providing something backward compatible. We already have a few examples
of such changes in libxl.h, e.g.: fded24ea8315 ("libxl: Make
libxl_set_vcpuonline async").
So, please provide a wrapper called libxl_dt_overlay_0x041800() which calls
the new function.


Ok, I will add a wrapper.
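
For reference, the wrapper pattern being asked for can be sketched as
below. This is not the libxl code: `demo_dt_overlay()` and
`demo_dt_overlay_0x041800()` are hypothetical stand-ins, and the defaults
the wrapper passes (domain 0, auto mode on, no domain mapping) are an
assumption taken from the initial values in main_dt_overlay() further down
in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of what the "new" entry point received. */
struct demo_call {
    uint32_t domid;
    uint8_t op;
    bool auto_mode;
    bool domain_mapping;
};

static struct demo_call last_call;

/* Stand-in for the new, extended function (not the real libxl function). */
static int demo_dt_overlay(uint32_t domid, uint8_t op,
                           bool auto_mode, bool domain_mapping)
{
    last_call = (struct demo_call){ domid, op, auto_mode, domain_mapping };
    return 0;
}

/*
 * Wrapper keeping the old argument list: callers built against the older
 * API keep compiling, and the wrapper fills in defaults for the new
 * parameters (assumed defaults, see the note above).
 */
static inline int demo_dt_overlay_0x041800(uint8_t op)
{
    return demo_dt_overlay(0, op, true, false);
}
```

In libxl.h such wrappers are normally selected at compile time based on the
LIBXL_API_VERSION the caller requests, so existing users do not need to
change their source.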


  #endif
  
  /*

diff --git a/tools/libs/light/libxl_dt_overlay.c 
b/tools/libs/light/libxl_dt_overlay.c
index a6c709a6dc..cdb62b28cf 100644
--- a/tools/libs/light/libxl_dt_overlay.c
+++ b/tools/libs/light/libxl_dt_overlay.c
@@ -57,10 +58,18 @@ int libxl_dt_overlay(libxl_ctx *ctx, void *overlay_dt, 
uint32_t overlay_dt_size,
  rc = 0;
  }
  
-r = xc_dt_overlay(ctx->xch, overlay_dt, overlay_dt_size, overlay_op);

+/* Check if user entered a valid domain id. */
+rc = libxl_domain_info(CTX, NULL, domid);
+if (rc == ERROR_DOMAIN_NOTFOUND) {

Why do you check specifically for "domain not found", what about other
errors?


I agree this is indeed very confusing...I will rewrite this part 
properly in the next version.



+LOGD(ERROR, domid, "Non-existant domain.");
+return ERROR_FAIL;

Use `goto out`, and you can let the function return
ERROR_DOMAIN_NOTFOUND if that is the error; we can just propagate the `rc`
from libxl_domain_info().


Sure, will do the suggested way.
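
A minimal sketch of the suggested shape, under stated assumptions:
`demo_domain_info()` and `demo_overlay_op()` are hypothetical stand-ins for
libxl_domain_info() and libxl_dt_overlay(), the DEMO_ERROR_* values are
made up for the example, and the counter merely stands in for the cleanup
done at the `out` label (GC_FREE in the real function).

```c
/* Made-up error codes for the sketch; not the libxl values. */
#define DEMO_ERROR_FAIL            (-1)
#define DEMO_ERROR_DOMAIN_NOTFOUND (-2)

/* Pretend that only domains 0 and 1 exist. */
static int demo_domain_info(unsigned int domid)
{
    return domid <= 1 ? 0 : DEMO_ERROR_DOMAIN_NOTFOUND;
}

static int demo_overlay_op(unsigned int domid, int *cleanups)
{
    int rc;

    rc = demo_domain_info(domid);
    if (rc)
        goto out;       /* propagate whatever error the lookup reported */

    /* ... the actual overlay operation would go here ... */

out:
    (*cleanups)++;      /* single exit path, so cleanup always runs */
    return rc;
}
```

With this shape every error from the lookup is handled, not only "domain
not found", and the caller sees the specific error code.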


+}
+
+r = xc_dt_overlay(ctx->xch, domid, overlay_dt, overlay_dt_size, overlay_op,
+  domain_mapping);
  
  if (r) {

-LOG(ERROR, "%s: Adding/Removing overlay dtb failed.", __func__);
+LOG(ERROR, "domain%d: Adding/Removing overlay dtb failed.", domid);

You could replace the macro by LOGD, instead of handwriting "domain%d".


Great suggestion. I will use LOGD.


  rc = ERROR_FAIL;
  }
  
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c

index 98f6bd2e76..9674383ec3 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1270,21 +1270,48 @@ int main_dt_overlay(int argc, char **argv)
  {
  const char *overlay_ops = NULL;
  const char *overlay_config_file = NULL;
+uint32_t domain_id = 0;
  void *overlay_dtb = NULL;
  int rc;
+bool auto_mode = true;
+bool domain_mapping = false;
  uint8_t op;
  int overlay_dtb_size = 0;
  const int overlay_add_op = 1;
  const int overlay_remove_op = 2;
  
-if (argc < 2) {

-help("dt_overlay");
+if (argc < 3) {
+help("dt-overlay");
  return EXIT_FAILURE;
  }
  
+if (argc > 5) {

+fprintf(stderr, "Too many arguments\n");
+return ERROR_FAIL;
+}
+
  overlay_ops = argv[1];
  overlay_config_file = argv[2];
  
+if (!strcmp(argv[argc - 1], "-e"))

+auto_mode = false;
+
+if (argc == 4 || !auto_mode) {
+domain_id = find_domain(argv[argc-1]);
+  

Re: [PATCH 09/15] tools/libs/light: Modify dtbo to domU linux dtbo format

2024-05-05 Thread Henry Wang

Hi Anthony,

On 5/1/2024 11:09 PM, Anthony PERARD wrote:

On Wed, Apr 24, 2024 at 11:34:43AM +0800, Henry Wang wrote:

diff --git a/tools/libs/light/libxl_dt_overlay.c 
b/tools/libs/light/libxl_dt_overlay.c
index cdb62b28cf..eaf11a0f9c 100644
--- a/tools/libs/light/libxl_dt_overlay.c
+++ b/tools/libs/light/libxl_dt_overlay.c
@@ -41,6 +42,69 @@ static int check_overlay_fdt(libxl__gc *gc, void *fdt, 
size_t size)
  return 0;
  }
  
+static int modify_overlay_for_domU(libxl__gc *gc, void *overlay_dt_domU,

+   size_t size)
+{
+int rc = 0;
+int virtual_interrupt_parent = GUEST_PHANDLE_GIC;
+const struct fdt_property *fdt_prop_node = NULL;
+int overlay;
+int prop_len = 0;
+int subnode = 0;
+int fragment;
+const char *prop_name;
+const char *target_path = "/";
+
+fdt_for_each_subnode(fragment, overlay_dt_domU, 0) {
+prop_name = fdt_getprop(overlay_dt_domU, fragment, "target-path",
+&prop_len);
+if (prop_name == NULL) {
+LOG(ERROR, "target-path property not found\n");

LOG* macros already take care of adding \n, so there is no need to add an
extra one.


Sure, I will remove the "\n".




+rc = ERROR_FAIL;
+goto err;
+}
+
+/* Change target path for domU dtb. */
+rc = fdt_setprop_string(overlay_dt_domU, fragment, "target-path",

fdt_setprop_string() isn't a libxl function, store the return value in a
variable named `r` instead.


Thanks for spotting this. Will change it to `r`.


+target_path);
+if (rc) {
+LOG(ERROR, "Setting interrupt parent property failed for %s\n",
+prop_name);
+goto err;
+}
+
+overlay = fdt_subnode_offset(overlay_dt_domU, fragment, "__overlay__");
+
+fdt_for_each_subnode(subnode, overlay_dt_domU, overlay)
+{
+const char *node_name = fdt_get_name(overlay_dt_domU, subnode,
+ NULL);
+
+fdt_prop_node = fdt_getprop(overlay_dt_domU, subnode,
+"interrupt-parent", &prop_len);
+if (fdt_prop_node == NULL) {
+LOG(DETAIL, "%s property not found for %s. Skip to next node\n",
+"interrupt-parent", node_name);

Why do you have "interrupt-parent" in a separate argument? Did you mean
to do something like
 const char *some_name = "interrupt-parent";
and use that in the 4 different places that this string is used? (Using
a variable means that we (or the compiler) can make sure that they are
all spelled correctly.)


Great suggestion! I will do it this way.


+continue;
+}
+
+rc = fdt_setprop_inplace_u32(overlay_dt_domU, subnode,
+ "interrupt-parent",
+ virtual_interrupt_parent);
+if (rc) {
+LOG(ERROR, "Setting interrupt parent property failed for %s\n",
+"interrupt-parent");
+goto err;
+}
+}
+}
+
+return 0;

Missed indentation.


Will correct it.


+
+err:
+return rc;

A few things: it looks like `rc` is always going to be ERROR_FAIL here,
unless you find a libxl error code that better describes the error, so
you could forgo the `rc` variable.

Also, if you don't need to clean up anything in the function or have a
generic error message, you could simply "return " instead of using the
"goto" style.


Sure, I will simply use return because I don't really think there is 
anything to be cleaned up.



+}
+
  int libxl_dt_overlay(libxl_ctx *ctx, uint32_t domid, void *overlay_dt,
   uint32_t overlay_dt_size, uint8_t overlay_op,
   bool auto_mode, bool domain_mapping)
@@ -73,6 +137,15 @@ int libxl_dt_overlay(libxl_ctx *ctx, uint32_t domid, void 
*overlay_dt,
  rc = ERROR_FAIL;
  }
  
+/*

+ * auto_mode doesn't apply to dom0 as dom0 can get the physical
+ * description of the hardware.
+ */
+if (domid && auto_mode) {
+if (overlay_op == LIBXL_DT_OVERLAY_ADD)

Shouldn't libxl complain if the operation is different?


I will add corresponding error handling code here. Thanks!

Kind regards,
Henry


+rc = modify_overlay_for_domU(gc, overlay_dt, overlay_dt_size);
+}
+
  out:
  GC_FREE;
  return rc;

Thanks,






Re: [PATCH 07/15] xen/overlay: Enable device tree overlay assignment to running domains

2024-05-05 Thread Henry Wang

Hi Julien,

On 4/30/2024 5:47 PM, Julien Grall wrote:

On 30/04/2024 05:00, Henry Wang wrote:

Hi Julien,


Hi Henry,


On 4/30/2024 1:34 AM, Julien Grall wrote:

On 29/04/2024 04:36, Henry Wang wrote:

Hi Jan, Julien, Stefano,


Hi Henry,


On 4/24/2024 2:05 PM, Jan Beulich wrote:

On 24.04.2024 05:34, Henry Wang wrote:

--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1197,7 +1197,9 @@ struct xen_sysctl_dt_overlay {
  #define XEN_SYSCTL_DT_OVERLAY_ADD   1
  #define XEN_SYSCTL_DT_OVERLAY_REMOVE    2
  uint8_t overlay_op; /* IN: Add or 
remove. */

-    uint8_t pad[3]; /* IN: Must be zero. */
+    bool domain_mapping;    /* IN: True of 
False. */

+    uint8_t pad[2]; /* IN: Must be zero. */
+    uint32_t domain_id;
  };

If you merely re-purposed padding fields, all would be fine without
bumping the interface version. Yet you don't, albeit for an unclear
reason: Why uint32_t rather than domid_t? And on top of that - why a
separate boolean when you could use e.g. DOMID_INVALID to indicate
"no domain mapping"?


I think both of your suggestions make great sense. I will follow the 
suggestions in v2.
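
To illustrate the two suggestions, here is a minimal sketch of a layout
that reuses the padding for a 16-bit domain ID and uses a sentinel instead
of a separate boolean. The struct names and the simplified field list are
hypothetical (the real struct also carries the overlay handle);
`demo_domid_t` mirrors Xen's 16-bit domid_t and 0x7FF4 is, to my
knowledge, the value of DOMID_INVALID in the public headers.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t demo_domid_t;          /* mirrors Xen's 16-bit domid_t */
#define DEMO_DOMID_INVALID 0x7FF4U      /* assumed to match DOMID_INVALID */

/* Simplified model of the current layout: op followed by 3 padding bytes. */
struct demo_dt_overlay_old {
    uint32_t overlay_fdt_size;
    uint8_t overlay_op;
    uint8_t pad[3];
};

/* Same size, with two of the padding bytes re-purposed for the domain ID. */
struct demo_dt_overlay_new {
    uint32_t overlay_fdt_size;
    uint8_t overlay_op;
    uint8_t pad;
    demo_domid_t domid;
};

/* Re-purposing padding must not change the size of the structure. */
_Static_assert(sizeof(struct demo_dt_overlay_old) ==
               sizeof(struct demo_dt_overlay_new),
               "layout size must stay the same");

/* The sentinel replaces the separate domain_mapping boolean. */
static bool demo_wants_domain_mapping(const struct demo_dt_overlay_new *o)
{
    return o->domid != DEMO_DOMID_INVALID;
}
```

Whether such a field belongs in a sysctl at all is a separate question,
which is what the domctl discussion below is about.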



That said - anything taking a domain ID is certainly suspicious in a
sysctl. Judging from the description you really mean this to be a
domctl. Anything else will require extra justification.


I also think a domctl is better. I had a look at the history of the 
already merged series, it looks like in the first version of merged 
part 1 [1], the hypercall was implemented as the domctl in the 
beginning but later in v2 changed to sysctl. I think this makes 
sense as the scope at that time was just to make Xen aware of the 
device tree node via Xen device tree.


However this is now a problem for the current part where the 
scope (and the end goal) is extended to assign the added device to 
Linux Dom0/DomU via device tree overlays. I am not sure which way 
is better, should we repurpose the sysctl to domctl or maybe add 
another domctl (I am worrying about the duplication because 
basically we need the same sysctl functionality but now with a 
domid in it)? What do you think?


I am not entirely sure this is a good idea to try to add the device 
in Xen and attach it to the guests at the same time. Imagine the 
following situation:


1) Add and attach devices
2) The domain is rebooted
3) Detach and remove devices

After step 2, you technically have a new domain. You could have also 
a case where this is a completely different guest. So the flow would 
look a little bit weird (you create the DT overlay with domain A but 
remove with domain B).


So, at the moment, it feels like the add/attach (resp detach/remove) 
operations should happen separately.


Thinking a bit more about it, there is another problem with the single 
hypercall approach. The MMIOs will be mapped 1:1 to the guest. These 
regions may clash with other parts of the layout for domains created by 
the toolstack and dom0less (if the 1:1 option has not been enabled).

I guess for the add, it would be possible to specify the mapping in 
the Device-Tree. But that would not work for the removal (this may be 
a different domain).


On a somewhat similar topic, the number of IRQs supported by the vGIC 
is fixed at boot. How would that work with this patch?


Seeing your comment here I now realized patch #5 is to address this 
issue. But I think we need to have a complete rework of the original 
patch to make the feature portable. We can continue the discussion in 
patch 5.




Can you clarify why you want to add devices to Xen and attach to a 
guest within a single hypercall?


Sorry I don't know if there are any specific thoughts on the design of 
using a single hypercall to do both add devices to Xen device tree 
and assign the device to the guest. In fact seeing your above 
comments, I think separating these two functionalities into two xl 
commands using separate hypercalls would indeed be a better idea. 
Thank you for the suggestion!


To make sure I understand correctly, would you mind confirming if 
below actions for v2 make sense to you? Thanks!
- Only use the XEN_SYSCTL_DT_OVERLAY_{ADD, REMOVE} sysctls to 
add/remove overlay to Xen device tree


Note that this would attach the devices to dom0 first. Maybe this is 
why it was decided to merge the two operations? An option would be to 
allow the devices to be attached to no-one.


- Introduce the xl dt-overlay attach  command and respective 
domctls to do the device assignment for the overlay to domain.


We already have domctls to route IRQs and map MMIOs. So do we actually 
need new domctls?


No I don't think so, like you and Stefano said in the other thread, I 
think I need to split the command to different hypercalls instead of 
only one hypercall and reuse the existing domctl.


Kind regards,
Henry



Cheers,






Re: [PATCH 05/15] tools/libs/light: Increase nr_spi to 160

2024-05-05 Thread Henry Wang

Hi Anthony,

(+Arm maintainers)

On 5/1/2024 9:58 PM, Anthony PERARD wrote:

On Wed, Apr 24, 2024 at 11:34:39AM +0800, Henry Wang wrote:

Increase number of spi to 160 i.e. gic_number_lines() for Xilinx ZynqMP - 32.
This was done to allocate and assign IRQs to a running domain.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
  tools/libs/light/libxl_arm.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index dd5c9f4917..50dbd0f2a9 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -181,7 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
  
  LOG(DEBUG, "Configure the domain");
  
-config->arch.nr_spis = nr_spis;

+/* gic_number_lines() is 192 for Xilinx ZynqMP. min nr_spis = 192 - 32. */
+config->arch.nr_spis = MAX(nr_spis, 160);

Is there a way that Xen or libxl could find out what the minimum
number of SPI needs to be?


I am afraid currently there is none.


Are we going to have to increase that minimum
number every time a new platform comes along?

It doesn't appear that libxl is using that `nr_spis` value and it is
probably just given to Xen. So my guess is that Xen could simply take
care of the minimum value, gic_number_lines() seems to be a Xen
function.


Xen will take care of the value of nr_spis for dom0 in create_dom0()
dom0_cfg.arch.nr_spis = min(gic_number_lines(), (unsigned int) 992) - 32;
and also for dom0less domUs in create_domUs().

However, it looks like Xen will not take care of the minimum value for 
libxl guests, the value from config->arch.nr_spis in guest config file 
will be directly passed to the domain_vgic_init() function from 
arch_domain_create().


I agree with you that we shouldn't just bump the number every time 
we have a new platform. Therefore, would it be a good idea to move the 
logic in this patch to arch_sanitise_domain_config()?
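
As a worked example of the arithmetic involved: the dom0 formula quoted
above gives min(192, 992) - 32 = 160 on ZynqMP, which is where the
hard-coded 160 in the patch comes from. The sketch below only models that
calculation; `demo_default_nr_spis()` and `demo_sanitise_nr_spis()` are
hypothetical helpers, not Xen functions, and raising a guest's requested
value to the host-derived one is merely one possible policy for
arch_sanitise_domain_config().

```c
#include <assert.h>

#define DEMO_NR_LOCAL_IRQS 32U  /* the 32 subtracted in the dom0 formula */
#define DEMO_MAX_LINES     992U /* cap used in the dom0 formula */

/* Mirrors: min(gic_number_lines(), 992) - 32. Assumes gic_lines >= 32. */
static unsigned int demo_default_nr_spis(unsigned int gic_lines)
{
    unsigned int lines;

    assert(gic_lines >= DEMO_NR_LOCAL_IRQS);
    lines = gic_lines < DEMO_MAX_LINES ? gic_lines : DEMO_MAX_LINES;

    return lines - DEMO_NR_LOCAL_IRQS;
}

/* One possible policy: never go below the value derived from the host GIC. */
static unsigned int demo_sanitise_nr_spis(unsigned int requested,
                                          unsigned int gic_lines)
{
    unsigned int floor = demo_default_nr_spis(gic_lines);

    return requested > floor ? requested : floor;
}
```

Deriving the floor from gic_number_lines() inside Xen would avoid encoding
a per-platform constant in libxl.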


Kind regards,
Henry



Thanks,






Re: [PATCH 04/15] tools/libs/light: Always enable IOMMU

2024-05-05 Thread Henry Wang

Hi Anthony,

On 5/1/2024 9:47 PM, Anthony PERARD wrote:

On Wed, Apr 24, 2024 at 11:34:38AM +0800, Henry Wang wrote:

For overlay with iommu functionality to work with running VMs, we need
to enable IOMMU when iomem is present for the domains.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Henry Wang 
---
  tools/libs/light/libxl_arm.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 1cb89fa584..dd5c9f4917 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -222,6 +222,12 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
  config->arch.sve_vl = d_config->b_info.arch_arm.sve_vl / 128U;
  }
  
+#ifdef LIBXL_HAVE_DT_OVERLAY

libxl_arm.c is only built on Arm, so this should be defined, so no need
to check.


Ah sure, I just thought that in the future RISC-V/PPC may have the same, 
but you are correct. I will remove the check.



+if (d_config->b_info.num_iomem) {
+config->flags |= XEN_DOMCTL_CDF_iommu;

Is this doing the same thing as the previous patch?


I think so, yes, we need the IOMMU flag to be set if we want to assign a 
device from a DT node protected by IOMMU.


Kind regards,
Henry



Thanks,






Re: [PATCH 07/15] xen/overlay: Enable device tree overlay assignment to running domains

2024-05-05 Thread Henry Wang

Hi Stefano, Julien,

On 5/3/2024 2:02 AM, Stefano Stabellini wrote:

On Tue, 30 Apr 2024, Henry Wang wrote:

Hi Julien,

On 4/30/2024 1:34 AM, Julien Grall wrote:

On 29/04/2024 04:36, Henry Wang wrote:

Hi Jan, Julien, Stefano,

Hi Henry,


On 4/24/2024 2:05 PM, Jan Beulich wrote:

On 24.04.2024 05:34, Henry Wang wrote:

--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1197,7 +1197,9 @@ struct xen_sysctl_dt_overlay {
   #define XEN_SYSCTL_DT_OVERLAY_ADD   1
   #define XEN_SYSCTL_DT_OVERLAY_REMOVE    2
   uint8_t overlay_op; /* IN: Add or remove. */
-    uint8_t pad[3]; /* IN: Must be zero. */
+    bool domain_mapping;    /* IN: True of False. */
+    uint8_t pad[2]; /* IN: Must be zero. */
+    uint32_t domain_id;
   };

If you merely re-purposed padding fields, all would be fine without
bumping the interface version. Yet you don't, albeit for an unclear
reason: Why uint32_t rather than domid_t? And on top of that - why a
separate boolean when you could use e.g. DOMID_INVALID to indicate
"no domain mapping"?

I think both of your suggestions make great sense. I will follow the
suggestions in v2.


That said - anything taking a domain ID is certainly suspicious in a
sysctl. Judging from the description you really mean this to be a
domctl. Anything else will require extra justification.

I also think a domctl is better. I had a look at the history of the
already merged series, it looks like in the first version of merged part 1
[1], the hypercall was implemented as the domctl in the beginning but
later in v2 changed to sysctl. I think this makes sense as the scope of
that time is just to make Xen aware of the device tree node via Xen device
tree.

However this is now a problem for the current part where the scope (and
the end goal) is extended to assign the added device to Linux Dom0/DomU
via device tree overlays. I am not sure which way is better, should we
repurposing the sysctl to domctl or maybe add another domctl (I am
worrying about the duplication because basically we need the same sysctl
functionality but now with a domid in it)? What do you think?

I am not entirely sure this is a good idea to try to add the device in Xen
and attach it to the guests at the same time. Imagine the following
situation:

1) Add and attach devices
2) The domain is rebooted
3) Detach and remove devices

After step 2, you technically have a new domain. You could have also a case
where this is a completely different guest. So the flow would look a little
bit weird (you create the DT overlay with domain A but remove with domain
B).

So, at the moment, it feels like the add/attach (resp detach/remove)
operations should happen separately.

Can you clarify why you want to add devices to Xen and attach to a guest
within a single hypercall?

Sorry I don't know if there is any specific thoughts on the design of using a
single hypercall to do both add devices to Xen device tree and assign the
device to the guest. In fact seeing your above comments, I think separating
these two functionality to two xl commands using separated hypercalls would
indeed be a better idea. Thank you for the suggestion!

To make sure I understand correctly, would you mind confirming if below
actions for v2 make sense to you? Thanks!
- Only use the XEN_SYSCTL_DT_OVERLAY_{ADD, REMOVE} sysctls to add/remove
overlay to Xen device tree
- Introduce the xl dt-overlay attach  command and respective domctls to
do the device assignment for the overlay to domain.

I think two hypercalls is OK. The original idea was to have a single xl
command to do the operation for user convenience (even that is not a
hard requirement) but that can result easily in two hypercalls.


Ok, sounds good. I will break the command to two hypercalls and try to 
reuse the existing domctls for assign/remove IRQ/MMIO ranges.


Kind regards,
Henry




Re: [PATCH 1/3] xen/arm/dom0less-build: Alloc magic pages for Dom0less DomUs from hypervisor

2024-05-05 Thread Henry Wang

Hi Daniel,

On 4/30/2024 6:22 PM, Daniel P. Smith wrote:

On 4/29/24 22:55, Henry Wang wrote:

Hi Daniel,

On 4/30/2024 8:27 AM, Daniel P. Smith wrote:

On 4/25/24 23:14, Henry Wang wrote:

There are use cases (for example using the PV driver) in Dom0less
setup that require Dom0less DomUs start immediately with Dom0, but
initialize XenStore later after Dom0's successful boot and call to
the init-dom0less application.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```
This is because currently the magic pages for Dom0less DomUs are
populated by the init-dom0less app through populate_physmap(), and
populate_physmap() automatically assumes gfn == mfn for 1:1 direct
mapped domains. This cannot be true for the magic pages that are
allocated later from the init-dom0less application executed in Dom0.
For domains using statically allocated memory but not 1:1 direct-mapped,
a similar error "failed to retrieve a reserved page" can be seen as the
reserved memory list is empty at that time.

To solve the above issue, this commit allocates the magic pages for
Dom0less DomUs at the domain construction time. The base address/PFN
of the magic page region will be noted and communicated to the
init-dom0less application in Dom0.


Might I suggest we not refer to these as magic pages? I would 
consider them as hypervisor reserved pages for the VM to have access 
to virtual platform capabilities. We may see this expand in the 
future for some unforeseen, new capability.


I think magic page is a specific terminology to refer to these pages, 
see alloc_magic_pages() for both x86 and Arm. I will reword the last 
paragraph of the commit message to refer them as "hypervisor reserved 
pages (currently used as magic pages on Arm)" if this sounds good to 
you.


I would highlight that it is a term used in the toolstack. While it is 
probably not the best, there is no reason to change it there, but the 
hypervisor does not carry that terminology. IMHO we should not 
introduce it there and should be explicit about why the pages are getting 
reserved.


Thanks for the suggestion. I will rework the commit message.

Kind regards,
Henry



v/r,
dps






Re: [PATCH v1.1] xen/commom/dt-overlay: Fix missing lock when remove the device

2024-05-05 Thread Henry Wang

Hi Julien,

On 5/3/2024 9:04 PM, Julien Grall wrote:

Hi Henry,

On 26/04/2024 02:55, Henry Wang wrote:

If CONFIG_DEBUG=y, below assertion will be triggered:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at 
drivers/passthrough/device_tree.c:146

(XEN) [ Xen-4.19-unstable  arm64  debug=y  Not tainted ]
(XEN) CPU:    0
(XEN) PC: 0a257418 iommu_remove_dt_device+0x8c/0xd4
(XEN) LR: 0a2573a0
(XEN) SP: 8000fff7fb30
(XEN) CPSR:   0249 MODE:64-bit EL2h (Hypervisor, handler)
[...]

(XEN) Xen call trace:
(XEN)    [<0a257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN)    [<0a2573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN)    [<0a20797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN)    [<0a207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN)    [<0a208460>] dt_overlay_sysctl+0x428/0xc68
(XEN)    [<0a2707f8>] arch_do_sysctl+0x1c/0x2c
(XEN)    [<0a230b40>] do_sysctl+0x96c/0x9ec
(XEN)    [<0a271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN)    [<0a273490>] do_trap_guest_sync+0x448/0x63c
(XEN)    [<0a25c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at 
drivers/passthrough/device_tree.c:146

(XEN) 

This is because iommu_remove_dt_device() is called without taking the
dt_host_lock. Fix the issue by taking and releasing the lock properly.

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal 
functionalities")

Signed-off-by: Henry Wang 
---
v1.1:
- Move the unlock position before the check of rc.
---
  xen/common/dt-overlay.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1b197381f6..ab8f43aea2 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -381,7 +381,9 @@ static int remove_node_resources(struct 
dt_device_node *device_node)

  {
  if ( dt_device_is_protected(device_node) )
  {
+    write_lock(&dt_host_lock);


Looking at the code, we are not modifying the device_node, so 
shouldn't this be a read_lock()?


Hmm yes, however after seeing your comment...



That said, even though either fixes your issue, I am not entirely 
convinced this is the correct position for the lock. From my 
understanding, dt_host_lock is meant to ensure that the DT node will 
not disappear behind your back. So in theory, shouldn't the lock be 
taken as soon as you get hold of device_node?


...here. I believe you made a point here so I think I will just move the 
write_lock(&dt_host_lock) to as soon as we get overlay_node, i.e. on top 
of the call to remove_descendant_nodes_resources(). Therefore we can 
solve the assertion issue of this patch together.
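
A toy model of the lock placement being discussed, assuming nothing about
the real code beyond what is said above: a plain flag stands in for
dt_host_lock (it is not a real rwlock), and all `demo_*` names are
hypothetical. The callee insists that the lock is held, like the ASSERT in
iommu_remove_dt_device(), and the caller takes the lock before it touches
the node at all.

```c
#include <stdbool.h>

static bool demo_host_lock_held;    /* toy stand-in for dt_host_lock */

static void demo_lock(void)   { demo_host_lock_held = true; }
static void demo_unlock(void) { demo_host_lock_held = false; }

/* Mirrors the assertion in the trace: only succeeds with the lock held. */
static int demo_iommu_remove(void)
{
    return demo_host_lock_held ? 0 : -1;
}

static int demo_remove_nodes(void)
{
    int rc;

    demo_lock();                /* taken as soon as the node is obtained */
    rc = demo_iommu_remove();   /* every user of the node runs locked */
    demo_unlock();              /* released before rc is examined */

    return rc;
}
```

Taking the lock at the outermost point where the node is first obtained
means callees do not each need their own lock/unlock pair.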


Kind regards,
Henry



Cheers,






Re: [PATCH 2/3] xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP

2024-05-05 Thread Henry Wang

Hi Stefano,

On 5/3/2024 2:08 AM, Stefano Stabellini wrote:

On Fri, 26 Apr 2024, Henry Wang wrote:

For use cases such as Dom0less PV drivers, a mechanism to communicate
Dom0less DomU's static data with the runtime control plane (Dom0) is
needed. Since on Arm HVMOP is already the existing approach to address
such use cases (for example the allocation of HVM_PARAM_CALLBACK_IRQ),
add a new HVMOP key HVM_PARAM_MAGIC_BASE_PFN for storing the magic
page region base PFN. The value will be set at Dom0less DomU
construction time after Dom0less DomU's magic page region has been
allocated.

To keep consistent, also set the value for HVM_PARAM_MAGIC_BASE_PFN
for libxl guests in alloc_magic_pages().

Reported-by: Alec Kwapis 
Signed-off-by: Henry Wang 
---
  tools/libs/guest/xg_dom_arm.c   | 2 ++
  xen/arch/arm/dom0less-build.c   | 2 ++
  xen/arch/arm/hvm.c  | 1 +
  xen/include/public/hvm/params.h | 1 +
  4 files changed, 6 insertions(+)

diff --git a/tools/libs/guest/xg_dom_arm.c b/tools/libs/guest/xg_dom_arm.c
index 8cc7f27dbb..3c08782d1d 100644
--- a/tools/libs/guest/xg_dom_arm.c
+++ b/tools/libs/guest/xg_dom_arm.c
@@ -74,6 +74,8 @@ static int alloc_magic_pages(struct xc_dom_image *dom)
  xc_clear_domain_page(dom->xch, dom->guest_domid, base + 
MEMACCESS_PFN_OFFSET);
  xc_clear_domain_page(dom->xch, dom->guest_domid, dom->vuart_gfn);
  
+xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_MAGIC_BASE_PFN,

+base);
  xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_CONSOLE_PFN,
  dom->console_pfn);
  xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_STORE_PFN,
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 40dc85c759..72187c167d 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -861,6 +861,8 @@ static int __init construct_domU(struct domain *d,
  free_domheap_pages(magic_pg, 
get_order_from_pages(NR_MAGIC_PAGES));
  return rc;
  }
+
+d->arch.hvm.params[HVM_PARAM_MAGIC_BASE_PFN] = gfn_x(gfn);

I apologize as I have not read the whole email thread in reply to this
patch.

Why do we need to introduce a new hvm param instead of just setting
HVM_PARAM_CONSOLE_PFN and HVM_PARAM_STORE_PFN directly here?



Yeah this is a good question, I also thought about this but in the end 
didn't do that directly because I don't really want to break the current 
protocol between Linux, Xen and toolstack.
In docs/features/dom0less.pandoc, section "PV Drivers", there is a 
communication protocol saying that Xen should keep the 
HVM_PARAM_STORE_PFN to ~0ULL until the toolstack sets the 
HVM_PARAM_STORE_PFN.
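
As a small illustration of that part of the protocol, a consumer of the
parameter could treat ~0ULL as "not set up yet". This is only a sketch of
the check described above: `demo_xenstore_ready()` and
`DEMO_STORE_PFN_UNSET` are hypothetical names, and the rest of the
protocol in docs/features/dom0less.pandoc is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sentinel kept in HVM_PARAM_STORE_PFN until the toolstack sets it. */
#define DEMO_STORE_PFN_UNSET (~0ULL)

/* True once the parameter holds a real PFN rather than the sentinel. */
static bool demo_xenstore_ready(uint64_t store_pfn)
{
    return store_pfn != DEMO_STORE_PFN_UNSET;
}
```

Setting HVM_PARAM_STORE_PFN at domain construction time would make such a
check pass before the toolstack has done its part, which is why the
protocol would have to change.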


I am open to changing the protocol (changes might be needed in the Linux 
side too), if it is ok to do that, I can set the HVM params here 
directly and change the doc accordingly.


Kind regards,
Henry





Re: [PATCH 2/3] xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP

2024-04-30 Thread Henry Wang

Hi Jan,

On 4/30/2024 2:11 PM, Jan Beulich wrote:

On 30.04.2024 04:51, Henry Wang wrote:

On 4/30/2024 8:31 AM, Daniel P. Smith wrote:

On 4/26/24 02:21, Jan Beulich wrote:

On 26.04.2024 05:14, Henry Wang wrote:

--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -76,6 +76,7 @@
    */
   #define HVM_PARAM_STORE_PFN    1
   #define HVM_PARAM_STORE_EVTCHN 2
+#define HVM_PARAM_MAGIC_BASE_PFN    3
     #define HVM_PARAM_IOREQ_PFN    5

Considering all adjacent values are used, it is overwhelmingly likely that
3 was once used, too. Such re-use needs to be done carefully. Since you
need this for Arm only, that's likely okay, but doesn't go without (a)
saying and (b) considering the possible future case of dom0less becoming
arch-agnostic, or hyperlaunch wanting to extend the scope. Plus (c) imo
this also needs at least a comment, maybe even an #ifdef, seeing how
x86-focused most of the rest of this header is.

I would recommend having two new params,

Sounds good. I can do the suggestion in v2.


#define HVM_PARAM_HV_RSRV_BASE_PVH 3
#define HVM_PARAM_HV_RSRV_SIZE 4

I think 4 is currently in use, so I think I will find another couple of
numbers for both of them in the end, instead of reusing 3 and 4.

Right. There are ample gaps, but any use of values within a gap will need
appropriate care. FTAOD using such a gap looks indeed preferable, to avoid
further growing the (sparse) array. Alternatively, if we're firm on this
never going to be used on x86, some clearly x86-specific indexes (e.g. 36
and 37) could be given non-x86 purpose.


Sorry, I am a bit confused. I take Daniel's comment as a suggestion to add 
two new params, which are currently only used for Arm, but eventually will 
be used for hyperlaunch on x86 (as the name indicates). So I think I will 
use the names that he suggested, but with the numbers changed to 39 and 40.


Kind regards,
Henry



Jan





Re: [PATCH 07/15] xen/overlay: Enable device tree overlay assignment to running domains

2024-04-29 Thread Henry Wang

Hi Julien,

On 4/30/2024 1:34 AM, Julien Grall wrote:

On 29/04/2024 04:36, Henry Wang wrote:

Hi Jan, Julien, Stefano,


Hi Henry,


On 4/24/2024 2:05 PM, Jan Beulich wrote:

On 24.04.2024 05:34, Henry Wang wrote:

--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1197,7 +1197,9 @@ struct xen_sysctl_dt_overlay {
  #define XEN_SYSCTL_DT_OVERLAY_ADD   1
  #define XEN_SYSCTL_DT_OVERLAY_REMOVE    2
  uint8_t overlay_op; /* IN: Add or remove. */
-    uint8_t pad[3]; /* IN: Must be zero. */
+    bool domain_mapping;    /* IN: True of False. */
+    uint8_t pad[2]; /* IN: Must be zero. */
+    uint32_t domain_id;
  };

If you merely re-purposed padding fields, all would be fine without
bumping the interface version. Yet you don't, albeit for an unclear
reason: Why uint32_t rather than domid_t? And on top of that - why a
separate boolean when you could use e.g. DOMID_INVALID to indicate
"no domain mapping"?
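
A minimal sketch of the layout suggested above, assuming the struct starts with a 64-bit guest handle and a 32-bit size field (both modelled here as plain integers, since the diff does not show them). `struct dt_overlay_old`, `struct dt_overlay_new` and `has_domain_mapping()` are hypothetical names; `domid_t` as a 16-bit type and `DOMID_INVALID` as 0x7FF4 follow Xen's public xen.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t domid_t;        /* same width as Xen's public domid_t */
#define DOMID_INVALID 0x7FF4U    /* value used by Xen's public xen.h */

/* Simplified stand-in for the current layout (leading fields assumed). */
struct dt_overlay_old {
    uint64_t overlay_fdt;        /* stand-in for the guest handle */
    uint32_t overlay_fdt_size;
    uint8_t  overlay_op;         /* add or remove */
    uint8_t  pad[3];             /* must be zero */
};

/* Hypothetical layout that only re-purposes two of the padding bytes. */
struct dt_overlay_new {
    uint64_t overlay_fdt;
    uint32_t overlay_fdt_size;
    uint8_t  overlay_op;
    uint8_t  pad;                /* must be zero */
    domid_t  domain_id;          /* DOMID_INVALID means "no domain mapping" */
};

/* Re-purposing padding must leave the overall size unchanged. */
static_assert(sizeof(struct dt_overlay_new) == sizeof(struct dt_overlay_old),
              "struct size must not change");

/* The separate boolean becomes a comparison against the sentinel. */
static bool has_domain_mapping(const struct dt_overlay_new *op)
{
    return op->domain_id != DOMID_INVALID;
}
```

One consequence of this layout is that a caller zeroing the former padding would be read as naming domain 0, so callers that want no mapping would have to set DOMID_INVALID explicitly.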


I think both of your suggestions make great sense. I will follow the 
suggestion in v2.



That said - anything taking a domain ID is certainly suspicious in a
sysctl. Judging from the description you really mean this to be a
domctl. Anything else will require extra justification.


I also think a domctl is better. I had a look at the history of the 
already merged series, and it looks like in the first version of the merged 
part 1 [1], the hypercall was implemented as a domctl in the 
beginning but was changed to a sysctl later in v2. I think this makes sense 
as the scope at that time was just to make Xen aware of the device 
tree node via the Xen device tree.


However this is now a problem for the current part where the 
scope (and the end goal) is extended to assign the added device to 
Linux Dom0/DomU via device tree overlays. I am not sure which way is 
better: should we repurpose the sysctl as a domctl or maybe add 
another domctl (I am worrying about the duplication because basically 
we need the same sysctl functionality but now with a domid in it)? 
What do you think?


I am not entirely sure it is a good idea to try to add the device in 
Xen and attach it to the guests at the same time. 
Imagine the following situation:


1) Add and attach devices
2) The domain is rebooted
3) Detach and remove devices

After step 2, you technically have a new domain. You could have also a 
case where this is a completely different guest. So the flow would 
look a little bit weird (you create the DT overlay with domain A but 
remove with domain B).


So, at the moment, it feels like the add/attach (resp. detach/remove) 
operations should happen separately.


Can you clarify why you want to add devices to Xen and attach to a 
guest within a single hypercall?


Sorry, I don't know if there were any specific thoughts behind the design of 
using a single hypercall to both add devices to the Xen device tree and 
assign the device to the guest. In fact, seeing your above comments, I 
think separating these two functionalities into two xl commands using 
separate hypercalls would indeed be a better idea. Thank you for the 
suggestion!


To make sure I understand correctly, would you mind confirming if the 
actions below for v2 make sense to you? Thanks!
- Only use the XEN_SYSCTL_DT_OVERLAY_{ADD, REMOVE} sysctls to add/remove 
overlay to Xen device tree
- Introduce the xl dt-overlay attach  command and respective 
domctls to do the device assignment for the overlay to domain.


Kind regards,
Henry



Cheers,






Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM

2024-04-29 Thread Henry Wang

Hi Julien,

Sorry for the late reply,

On 4/25/2024 10:28 PM, Julien Grall wrote:

Hi,

On 25/04/2024 08:06, Henry Wang wrote:

Hi Julien,

On 4/24/2024 8:58 PM, Julien Grall wrote:

Hi Henry,

On 24/04/2024 04:34, Henry Wang wrote:

From: Vikram Garhwal 

Enable interrupt assign/remove for running VMs in CONFIG_OVERLAY_DTB.

Currently, irq_route and mapping is only allowed at the domain creation. Adding
exception for CONFIG_OVERLAY_DTB.


AFAICT, this is mostly reverting b8577547236f ("xen/arm: Restrict 
when a physical IRQ can be routed/removed from/to a domain").




Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
  xen/arch/arm/gic.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 44c40e86de..a775f886ed 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -140,8 +140,10 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,

   * back to the physical IRQ. To prevent get unsync, restrict the
   * routing to when the Domain is been created.
   */


The above comment explains why the check was added. But the commit 
message doesn't explain why this can be disregarded for your use-case.


Looking at the history, I don't think you can simply remove the checks.

Regardless of that...


+#ifndef CONFIG_OVERLAY_DTB


... I am against such #ifdef. A distro may want to have OVERLAY_DTB 
enabled, yet the user will not use it.


Instead, you want to remove the check once the code can properly 
handle routing an IRQ after the domain is created or ...



  if ( d->creation_finished )
  return -EBUSY;
+#endif
    ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);
  if ( ret )
@@ -171,8 +173,10 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,

   * Removing an interrupt while the domain is running may have
   * undesirable effect on the vGIC emulation.
   */
+#ifndef CONFIG_OVERLAY_DTB
  if ( !d->is_dying )
  return -EBUSY;
+#endif


... removed before the domain is destroyed.


Thanks for your feedback. After checking the b8577547236f commit 
message I think I now understand your point. Do you have any 
suggestion about how I can properly add the support to route/remove 
the IRQ to running domains? Thanks.


I spent some time going through the GIC/vGIC code and had some 
discussions with Stefano and Stewart during the last couple of days, let 
me see if I can describe the use case properly now to continue the 
discussion:


We have some use cases that require assigning devices to domains after 
domain boot time. For example, suppose there is an FPGA on the board 
which can simulate a device, and the bitstream for the FPGA is provided 
and programmed after domain boot. So we need a way to assign the device 
to the running domain. This series tries to implement this use case by 
using device tree overlay - users can firstly add the overlay to Xen 
dtb, assign the device in the overlay to a domain by the xl command, 
then apply the overlay to Linux.


I haven't really looked at that code in quite a while. I think we need 
to make sure that the virtual and physical IRQ state matches at the 
time we do the routing.


I am undecided on whether we want to simply prevent the action from 
happening or try to reset the state.


There is also the question of what to do if the guest is enabling the 
vIRQ before it is routed.
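
A minimal sketch of the "prevent the action" option, under stated assumptions: `struct virq_state` and `can_route_to_running_domain()` are hypothetical and much simpler than the real vGIC bookkeeping. The idea is only that routing is refused with -EBUSY unless the vIRQ is idle, so the virtual and physical state cannot start out of sync.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a virtual IRQ's state. */
struct virq_state {
    bool enabled;   /* guest has already enabled the vIRQ */
    bool pending;   /* vIRQ is pending in the guest */
    bool active;    /* vIRQ is active in the guest */
};

/*
 * "Prevent" option: refuse to route a physical IRQ to a running domain
 * unless the vIRQ is completely idle. Returns 0 if routing may proceed,
 * -EBUSY otherwise.
 */
static int can_route_to_running_domain(const struct virq_state *s)
{
    assert(s != NULL);

    if ( s->enabled || s->pending || s->active )
        return -EBUSY;

    return 0;
}
```

The "reset the state" alternative would instead clear or resynchronise these flags before connecting the IRQ, which is not shown here.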


Sorry for bothering, would you mind elaborating a bit more about the two 
cases that you mentioned above? Commit b8577547236f ("xen/arm: Restrict 
when a physical IRQ can be routed/removed from/to a domain") only said 
there will be undesirable effects, so I am not sure if I understand the 
concerns raised above and the consequences of these two use cases. I may 
be wrong, but I think when we add the overlay, we are probably fine as 
the interrupt has not been used before. Also since we only load the 
device driver after the IRQ is routed to the guest, I am not sure the 
guest can enable the vIRQ before it is routed.


Kind regards,
Henry

Overall, someone needs to spend some time reading the code and then 
make a proposal (this could be just documentation if we believe it is 
safe to do). Both the current vGIC and the new one may need an update.


Cheers,






Re: [PATCH 1/3] xen/arm/dom0less-build: Alloc magic pages for Dom0less DomUs from hypervisor

2024-04-29 Thread Henry Wang

Hi Daniel,

On 4/30/2024 8:27 AM, Daniel P. Smith wrote:

On 4/25/24 23:14, Henry Wang wrote:

There are use cases (for example using the PV driver) in Dom0less
setup that require Dom0less DomUs to start immediately with Dom0, but
initialize XenStore later after Dom0's successful boot and call to
the init-dom0less application.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```
This is because currently the magic pages for Dom0less DomUs are
populated by the init-dom0less app through populate_physmap(), and
populate_physmap() automatically assumes gfn == mfn for 1:1 direct
mapped domains. This cannot be true for the magic pages that are
allocated later from the init-dom0less application executed in Dom0.
For domains using statically allocated memory but not 1:1 direct-mapped,
a similar error "failed to retrieve a reserved page" can be seen as the
reserved memory list is empty at that time.

To solve the above issue, this commit allocates the magic pages for
Dom0less DomUs at the domain construction time. The base address/PFN
of the magic page region will be noted and communicated to the
init-dom0less application in Dom0.


Might I suggest we not refer to these as magic pages? I would consider 
them as hypervisor reserved pages for the VM to have access to virtual 
platform capabilities. We may see this expand in the future for some 
unforeseen, new capability.


I think magic page is a specific term for these pages, 
see alloc_magic_pages() for both x86 and Arm. I will reword the last 
paragraph of the commit message to refer to them as "hypervisor reserved 
pages (currently used as magic pages on Arm)" if this sounds good to you.


Kind regards,
Henry





Re: [PATCH 2/3] xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP

2024-04-29 Thread Henry Wang

Hi Daniel,

On 4/30/2024 8:31 AM, Daniel P. Smith wrote:


On 4/26/24 02:21, Jan Beulich wrote:

On 26.04.2024 05:14, Henry Wang wrote:

--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -76,6 +76,7 @@
   */
  #define HVM_PARAM_STORE_PFN    1
  #define HVM_PARAM_STORE_EVTCHN 2
+#define HVM_PARAM_MAGIC_BASE_PFN    3
    #define HVM_PARAM_IOREQ_PFN    5


Considering all adjacent values are used, it is overwhelmingly likely that
3 was once used, too. Such re-use needs to be done carefully. Since you
need this for Arm only, that's likely okay, but doesn't go without (a)
saying and (b) considering the possible future case of dom0less becoming
arch-agnostic, or hyperlaunch wanting to extend the scope. Plus (c) imo
this also needs at least a comment, maybe even an #ifdef, seeing how x86-
focused most of the rest of this header is.


I would recommend having two new params,


Sounds good. I can do the suggestion in v2.



#define HVM_PARAM_HV_RSRV_BASE_PVH 3
#define HVM_PARAM_HV_RSRV_SIZE 4


I think 4 is currently in use, so I think I will find another couple of 
numbers for both of them in the end, instead of reusing 3 and 4.


Kind regards,
Henry



This will communicate how many pages have been reserved and where 
those pages are located.


v/r,
dps





Re: [PATCH 07/15] xen/overlay: Enable device tree overlay assignment to running domains

2024-04-28 Thread Henry Wang

Hi Jan, Julien, Stefano,

On 4/24/2024 2:05 PM, Jan Beulich wrote:

On 24.04.2024 05:34, Henry Wang wrote:

--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -1197,7 +1197,9 @@ struct xen_sysctl_dt_overlay {
  #define XEN_SYSCTL_DT_OVERLAY_ADD   1
  #define XEN_SYSCTL_DT_OVERLAY_REMOVE    2
  uint8_t overlay_op; /* IN: Add or remove. */
-    uint8_t pad[3]; /* IN: Must be zero. */
+    bool domain_mapping;    /* IN: True of False. */
+    uint8_t pad[2]; /* IN: Must be zero. */
+    uint32_t domain_id;
  };

If you merely re-purposed padding fields, all would be fine without
bumping the interface version. Yet you don't, albeit for an unclear
reason: Why uint32_t rather than domid_t? And on top of that - why a
separate boolean when you could use e.g. DOMID_INVALID to indicate
"no domain mapping"?


I think both of your suggestions make great sense. I will follow the 
suggestion in v2.



That said - anything taking a domain ID is certainly suspicious in a
sysctl. Judging from the description you really mean this to be a
domctl. Anything else will require extra justification.


I also think a domctl is better. I had a look at the history of the 
already merged series, and it looks like in the first version of the merged part 
1 [1], the hypercall was implemented as a domctl in the beginning but 
was changed to a sysctl later in v2. I think this makes sense as the scope at 
that time was just to make Xen aware of the device tree node via the Xen 
device tree.


However this is now a problem for the current part where the scope (and 
the end goal) is extended to assign the added device to Linux Dom0/DomU 
via device tree overlays. I am not sure which way is better, should we 
repurpose the sysctl as a domctl or maybe add another domctl (I am 
worrying about the duplication because basically we need the same sysctl 
functionality but now with a domid in it)? What do you think?


@Stefano: Since I am not 100% sure that I understand the whole story behind 
this feature, would you mind checking if I am providing correct 
information above and sharing your opinions on this? Thank you very much!


[1] 
https://lore.kernel.org/xen-devel/13240b69-f7bb-6a64-b89c-b7c2cbb7e...@xen.org/


Kind regards,
Henry


Jan





Re: [PATCH 2/3] xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP

2024-04-26 Thread Henry Wang

Hi Jan,

On 4/26/2024 2:50 PM, Jan Beulich wrote:

On 26.04.2024 08:30, Henry Wang wrote:

On 4/26/2024 2:21 PM, Jan Beulich wrote:

On 26.04.2024 05:14, Henry Wang wrote:

--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -76,6 +76,7 @@
*/
   #define HVM_PARAM_STORE_PFN    1
   #define HVM_PARAM_STORE_EVTCHN 2
+#define HVM_PARAM_MAGIC_BASE_PFN    3
   
   #define HVM_PARAM_IOREQ_PFN    5

Considering all adjacent values are used, it is overwhelmingly likely that
3 was once used, too. Such re-use needs to be done carefully. Since you
need this for Arm only, that's likely okay, but doesn't go without (a)
saying and (b) considering the possible future case of dom0less becoming
arch-agnostic, or hyperlaunch wanting to extend the scope. Plus (c) imo
this also needs at least a comment, maybe even an #ifdef, seeing how x86-
focused most of the rest of this header is.

Thanks for the feedback. These make sense. I think probably
dom0less/hyperlaunch will have similar use cases so the number 3 can be
reused at that time. Therefore, in v2, I will add more description in
commit message, a comment on top of this macro and protect it with
#ifdef. Hope this will address your concern. Thanks.

FTAOD: If you foresee re-use by hyperlaunch, re-using a previously used
number may need re-considering. Which isn't to say that number re-use is
excluded here, but it would need at least figuring out (and then stating)
what exactly the number was used for and until when.


I just did a bit of searching and noticed that the number 3 used to be
#define HVM_PARAM_APIC_ENABLED 3

and it was removed 18 years ago in commit: 
6bc01e4efd50e1986a9391f75980d45691f42b74


So I think we are likely to be ok if we reuse 3 on Arm with proper #ifdef.

Kind regards,
Henry


Jan





Re: [PATCH 2/3] xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP

2024-04-26 Thread Henry Wang

Hi Jan,

On 4/26/2024 2:21 PM, Jan Beulich wrote:

On 26.04.2024 05:14, Henry Wang wrote:

--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -76,6 +76,7 @@
   */
  #define HVM_PARAM_STORE_PFN    1
  #define HVM_PARAM_STORE_EVTCHN 2
+#define HVM_PARAM_MAGIC_BASE_PFN    3
  
  #define HVM_PARAM_IOREQ_PFN    5

Considering all adjacent values are used, it is overwhelmingly likely that
3 was once used, too. Such re-use needs to be done carefully. Since you
need this for Arm only, that's likely okay, but doesn't go without (a)
saying and (b) considering the possible future case of dom0less becoming
arch-agnostic, or hyperlaunch wanting to extend the scope. Plus (c) imo
this also needs at least a comment, maybe even an #ifdef, seeing how x86-
focused most of the rest of this header is.


Thanks for the feedback. These make sense. I think probably 
dom0less/hyperlaunch will have similar use cases so the number 3 can be 
reused at that time. Therefore, in v2, I will add more description in 
commit message, a comment on top of this macro and protect it with 
#ifdef. Hope this will address your concern. Thanks.


Kind regards,
Henry


Jan





Re: [PATCH v4 0/5] DOMCTL-based guest magic region allocation for 11 domUs

2024-04-25 Thread Henry Wang

Hi Stefano, Daniel,

On 4/26/2024 6:18 AM, Stefano Stabellini wrote:

On Thu, 18 Apr 2024, Daniel P. Smith wrote:

On 4/9/24 00:53, Henry Wang wrote:

An error message can be seen from the init-dom0less application on
direct-mapped 1:1 domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

This is because populate_physmap() automatically assumes gfn == mfn
for direct mapped domains. This cannot be true for the magic pages
that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
helper application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error "failed to
retrieve a reserved page" can be seen as the reserved memory list
is empty at that time.

This series tries to fix this issue using a DOMCTL-based approach,
because for 1:1 direct-mapped domUs, we need to avoid the RAM regions
and inform the toolstack about the region found by hypervisor for
mapping the magic pages. Patch 1 introduced a new DOMCTL to get the
guest memory map, currently only used for the magic page regions.
Patch 2 generalized the extended region finding logic so that it can
be reused for other use cases such as finding 1:1 domU magic regions.
Patch 3 uses the same approach as finding the extended regions to find
the guest magic page regions for direct-mapped DomUs. Patch 4 avoids
hardcoding all base addresses of guest magic region in the init-dom0less
application by consuming the newly introduced DOMCTL. Patch 5 is a
simple patch to do some code duplication clean-up in xc.

Hey Henry,

To help provide some perspective, these issues are not experienced with
hyperlaunch. This is because we understood early on that you cannot move a
lightweight version of the toolstack into hypervisor init and not provide a
mechanism to communicate what it did to the runtime control plane. We
evaluated the possible mechanisms, including introducing a new hypercall op,
and ultimately settled on using hypfs. The primary reason is this information
is static data that, while informative later, is only necessary for the
control plane to understand the state of the system. As a result, hyperlaunch
is able to allocate any and all special pages required as part of domain
construction and communicate their addresses to the control plane. As for XSM,
hypfs is already protected and at this time we do not see any domain builder
information needing to be restricted separately from the data already present
in hypfs.

I would like to make the suggestion that instead of continuing down this path,
perhaps you might consider adopting the hyperlaunch usage of hypfs. Then
adjust dom0less domain construction to allocate the special pages at
construction time. The original hyperlaunch series includes a patch that
provides the helper app for the xenstore announcement. And I can provide you
with updated versions if that would be helpful.

I also think that the new domctl is not needed and that the dom0less
domain builder should allocate the magic pages.


Yes this is indeed much better. Thanks Daniel for suggesting this.


On ARM, we already
allocate HVM_PARAM_CALLBACK_IRQ during dom0less domain build and set
HVM_PARAM_STORE_PFN to ~0ULL. I think it would be only natural to extend
that code to also allocate the magic pages and set HVM_PARAM_STORE_PFN
(and others) correctly. If we do it that way it is simpler and
consistent with the HVM_PARAM_CALLBACK_IRQ allocation, and we don't even
need hypfs. Currently we do not enable hypfs in our safety
certifiability configuration.


It is indeed very important to consider the safety certification (which 
I completely missed). Therefore I've sent an updated version based on 
HVMOP [1]. In the future we can switch to hypfs if needed.


[1] 
https://lore.kernel.org/xen-devel/20240426031455.579637-1-xin.wa...@amd.com/


Kind regards,
Henry




[PATCH 2/3] xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP

2024-04-25 Thread Henry Wang
For use cases such as Dom0less PV drivers, a mechanism to communicate
Dom0less DomU's static data with the runtime control plane (Dom0) is
needed. Since on Arm HVMOP is already the existing approach to address
such use cases (for example the allocation of HVM_PARAM_CALLBACK_IRQ),
add a new HVMOP key HVM_PARAM_MAGIC_BASE_PFN for storing the magic
page region base PFN. The value will be set at Dom0less DomU
construction time after Dom0less DomU's magic page region has been
allocated.

To keep consistent, also set the value for HVM_PARAM_MAGIC_BASE_PFN
for libxl guests in alloc_magic_pages().

Reported-by: Alec Kwapis 
Signed-off-by: Henry Wang 
---
 tools/libs/guest/xg_dom_arm.c   | 2 ++
 xen/arch/arm/dom0less-build.c   | 2 ++
 xen/arch/arm/hvm.c  | 1 +
 xen/include/public/hvm/params.h | 1 +
 4 files changed, 6 insertions(+)

diff --git a/tools/libs/guest/xg_dom_arm.c b/tools/libs/guest/xg_dom_arm.c
index 8cc7f27dbb..3c08782d1d 100644
--- a/tools/libs/guest/xg_dom_arm.c
+++ b/tools/libs/guest/xg_dom_arm.c
@@ -74,6 +74,8 @@ static int alloc_magic_pages(struct xc_dom_image *dom)
 xc_clear_domain_page(dom->xch, dom->guest_domid, base + MEMACCESS_PFN_OFFSET);
 xc_clear_domain_page(dom->xch, dom->guest_domid, dom->vuart_gfn);
 
+xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_MAGIC_BASE_PFN,
+base);
 xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_CONSOLE_PFN,
 dom->console_pfn);
 xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_STORE_PFN,
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 40dc85c759..72187c167d 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -861,6 +861,8 @@ static int __init construct_domU(struct domain *d,
 free_domheap_pages(magic_pg, get_order_from_pages(NR_MAGIC_PAGES));
 return rc;
 }
+
+d->arch.hvm.params[HVM_PARAM_MAGIC_BASE_PFN] = gfn_x(gfn);
 }
 
 return rc;
diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
index 0989309fea..fa6141e30c 100644
--- a/xen/arch/arm/hvm.c
+++ b/xen/arch/arm/hvm.c
@@ -55,6 +55,7 @@ static int hvm_allow_get_param(const struct domain *d, unsigned int param)
 case HVM_PARAM_STORE_EVTCHN:
 case HVM_PARAM_CONSOLE_PFN:
 case HVM_PARAM_CONSOLE_EVTCHN:
+case HVM_PARAM_MAGIC_BASE_PFN:
 return 0;
 
 /*
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index a22b4ed45d..c1720b33b9 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -76,6 +76,7 @@
  */
 #define HVM_PARAM_STORE_PFN    1
 #define HVM_PARAM_STORE_EVTCHN 2
+#define HVM_PARAM_MAGIC_BASE_PFN    3
 
 #define HVM_PARAM_IOREQ_PFN    5
 
-- 
2.34.1




[PATCH 0/3] Guest magic region allocation for 11 Dom0less domUs - Take two

2024-04-25 Thread Henry Wang
Hi all,

This series is trying to fix the reported guest magic region allocation
issue for 11 Dom0less domUs. An error message can be seen from the
init-dom0less application on 1:1 direct-mapped Dom0less DomUs:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

This is because populate_physmap() automatically assumes gfn == mfn
for direct mapped domains. This cannot be true for the magic pages
that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
helper application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error "failed to
retrieve a reserved page" can be seen as the reserved memory list
is empty at that time.

In [1] I've tried to fix this issue by the domctl approach, and
discussions in [2] and [3] indicate that a domctl is not really
necessary, as we can simplify the issue to "allocate the Dom0less
guest magic regions at the Dom0less DomU build time and pass the
region base PFN to init-dom0less application". Therefore, the first
patch in this series will allocate magic pages for Dom0less DomUs,
the second patch will store the allocated region base PFN to HVMOP
params like HVM_PARAM_CALLBACK_IRQ, and the third patch uses the
HVMOP to get the stored guest magic region base PFN to avoid hardcoding
GUEST_MAGIC_BASE.
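
A minimal sketch of the arithmetic behind this approach, assuming 4 KiB pages and a GUEST_MAGIC_BASE of 0x39000000 (an assumption that matches the mfn 0x39000 in the log above). `magic_base_gfn()` and `xenstore_pfn()` are hypothetical helper names, not functions from the series; NR_MAGIC_PAGES and XENSTORE_PFN_OFFSET use the values shown in the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT          12             /* assumed 4 KiB pages */
#define GUEST_MAGIC_BASE    0x39000000ULL  /* assumed, see lead-in */
#define NR_MAGIC_PAGES      4
#define XENSTORE_PFN_OFFSET 1

static_assert(XENSTORE_PFN_OFFSET < NR_MAGIC_PAGES,
              "XenStore page must lie inside the magic region");

/*
 * Hypervisor side: pick the guest frame where the magic region is mapped.
 * A 1:1 direct-mapped domain needs gfn == mfn; any other domain keeps the
 * fixed guest address.
 */
static uint64_t magic_base_gfn(bool direct_mapped, uint64_t mfn)
{
    return direct_mapped ? mfn : (GUEST_MAGIC_BASE >> PAGE_SHIFT);
}

/*
 * Toolstack side: once the base PFN has been read back from the HVM
 * param, the XenStore page is at a fixed offset from it.
 */
static uint64_t xenstore_pfn(uint64_t magic_base)
{
    return magic_base + XENSTORE_PFN_OFFSET;
}
```

With a hardcoded base, a direct-mapped domain whose magic pages were allocated at some other machine frame would compute the wrong XenStore PFN, which is what reading the base back from the hypervisor avoids.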

Gitlab CI for this series can be found in [4].

[1] https://lore.kernel.org/xen-devel/20240409045357.236802-1-xin.wa...@amd.com/
[2] 
https://lore.kernel.org/xen-devel/c7857223-eab8-409a-b618-6ec70f616...@apertussolutions.com/
[3] 
https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2404251508470.3940@ubuntu-linux-20-04-desktop/
[4] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1268643360

Henry Wang (3):
  xen/arm/dom0less-build: Alloc magic pages for Dom0less DomUs from
hypervisor
  xen/arm, tools: Add a new HVM_PARAM_MAGIC_BASE_PFN key in HVMOP
  tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE

 tools/helpers/init-dom0less.c   | 38 ++---
 tools/libs/guest/xg_dom_arm.c   |  3 ++-
 xen/arch/arm/dom0less-build.c   | 24 +
 xen/arch/arm/hvm.c  |  1 +
 xen/include/public/arch-arm.h   |  1 +
 xen/include/public/hvm/params.h |  1 +
 6 files changed, 45 insertions(+), 23 deletions(-)

-- 
2.34.1




[PATCH 3/3] tools/init-dom0less: Avoid hardcoding GUEST_MAGIC_BASE

2024-04-25 Thread Henry Wang
Currently the GUEST_MAGIC_BASE in the init-dom0less application is
hardcoded, which will lead to failures for 1:1 direct-mapped Dom0less
DomUs.

Since the guest magic region is now allocated from the hypervisor,
instead of hardcoding the guest magic pages region, use
xc_hvm_param_get() to get the guest magic region PFN, and based on
that the XenStore PFN can be calculated. Also, we don't need to set
the max mem anymore, so drop the call to xc_domain_setmaxmem(). Rename
the alloc_xs_page() to get_xs_page() to reflect the changes.

Take the opportunity to do some coding style improvements when possible.

Reported-by: Alec Kwapis 
Signed-off-by: Henry Wang 
---
 tools/helpers/init-dom0less.c | 38 +++
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/tools/helpers/init-dom0less.c b/tools/helpers/init-dom0less.c
index fee93459c4..7f6953a818 100644
--- a/tools/helpers/init-dom0less.c
+++ b/tools/helpers/init-dom0less.c
@@ -19,24 +19,20 @@
 #define XENSTORE_PFN_OFFSET 1
 #define STR_MAX_LENGTH 128
 
-static int alloc_xs_page(struct xc_interface_core *xch,
- libxl_dominfo *info,
- uint64_t *xenstore_pfn)
+static int get_xs_page(struct xc_interface_core *xch, libxl_dominfo *info,
+   uint64_t *xenstore_pfn)
 {
 int rc;
-const xen_pfn_t base = GUEST_MAGIC_BASE >> XC_PAGE_SHIFT;
-xen_pfn_t p2m = (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET;
+xen_pfn_t magic_base_pfn;
 
-rc = xc_domain_setmaxmem(xch, info->domid,
- info->max_memkb + (XC_PAGE_SIZE/1024));
-if (rc < 0)
-return rc;
-
-rc = xc_domain_populate_physmap_exact(xch, info->domid, 1, 0, 0, &p2m);
-if (rc < 0)
-return rc;
+rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_MAGIC_BASE_PFN,
+  &magic_base_pfn);
+if (rc < 0) {
+printf("Failed to get HVM_PARAM_MAGIC_BASE_PFN\n");
+return 1;
+}
 
-*xenstore_pfn = base + XENSTORE_PFN_OFFSET;
+*xenstore_pfn = magic_base_pfn + XENSTORE_PFN_OFFSET;
 rc = xc_clear_domain_page(xch, info->domid, *xenstore_pfn);
 if (rc < 0)
 return rc;
@@ -100,6 +96,7 @@ static bool do_xs_write_vm(struct xs_handle *xsh, xs_transaction_t t,
  */
 static int create_xenstore(struct xs_handle *xsh,
libxl_dominfo *info, libxl_uuid uuid,
+   xen_pfn_t xenstore_pfn,
evtchn_port_t xenstore_port)
 {
 domid_t domid;
@@ -145,8 +142,7 @@ static int create_xenstore(struct xs_handle *xsh,
 rc = snprintf(target_memkb_str, STR_MAX_LENGTH, "%"PRIu64, info->current_memkb);
 if (rc < 0 || rc >= STR_MAX_LENGTH)
 return rc;
-rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%lld",
-  (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET);
+rc = snprintf(ring_ref_str, STR_MAX_LENGTH, "%"PRIu_xen_pfn, xenstore_pfn);
 if (rc < 0 || rc >= STR_MAX_LENGTH)
 return rc;
 rc = snprintf(xenstore_port_str, STR_MAX_LENGTH, "%u", xenstore_port);
@@ -245,8 +241,8 @@ static int init_domain(struct xs_handle *xsh,
 if (!xenstore_evtchn)
 return 0;
 
-/* Alloc xenstore page */
-if (alloc_xs_page(xch, info, &xenstore_pfn) != 0) {
+/* Get xenstore page */
+if (get_xs_page(xch, info, &xenstore_pfn) != 0) {
 printf("Error on alloc magic pages\n");
 return 1;
 }
@@ -278,13 +274,11 @@ static int init_domain(struct xs_handle *xsh,
 if (rc < 0)
 return rc;
 
-rc = create_xenstore(xsh, info, uuid, xenstore_evtchn);
+rc = create_xenstore(xsh, info, uuid, xenstore_pfn, xenstore_evtchn);
 if (rc)
 err(1, "writing to xenstore");
 
-rc = xs_introduce_domain(xsh, info->domid,
-(GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET,
-xenstore_evtchn);
+rc = xs_introduce_domain(xsh, info->domid, xenstore_pfn, xenstore_evtchn);
 if (!rc)
 err(1, "xs_introduce_domain");
 return 0;
-- 
2.34.1




[PATCH 1/3] xen/arm/dom0less-build: Alloc magic pages for Dom0less DomUs from hypervisor

2024-04-25 Thread Henry Wang
There are use cases (for example using the PV driver) in Dom0less
setup that require Dom0less DomUs to start immediately with Dom0, but
initialize XenStore later after Dom0's successful boot and call to
the init-dom0less application.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```
This is because currently the magic pages for Dom0less DomUs are
populated by the init-dom0less app through populate_physmap(), and
populate_physmap() automatically assumes gfn == mfn for 1:1 direct
mapped domains. This cannot be true for the magic pages that are
allocated later from the init-dom0less application executed in Dom0.
For domains using statically allocated memory but not 1:1 direct-mapped,
a similar error "failed to retrieve a reserved page" can be seen as the
reserved memory list is empty at that time.

To solve the above issue, this commit allocates the magic pages for
Dom0less DomUs at the domain construction time. The base address/PFN
of the magic page region will be noted and communicated to the
init-dom0less application in Dom0.
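
The choice of guest frame number described above can be sketched in
isolation. This is only an illustration, not the hypervisor code: the
helper name `magic_region_gfn`, the `PAGE_SHIFT` value and the
`EXAMPLE_MAGIC_BASE` constant (inferred from the `mfn 0x39000` log line)
are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumed 4 KiB pages */

/*
 * Assumed value, chosen so that the base frame matches the
 * "mfn 0x39000" seen in the log; not taken from a header.
 */
#define EXAMPLE_MAGIC_BASE 0x39000000ULL

/*
 * Hypothetical helper: pick the guest frame at which the magic page
 * region is mapped, given the machine frame that was allocated.
 */
static uint64_t magic_region_gfn(bool direct_mapped, uint64_t mfn)
{
    /* A 1:1 direct-mapped domain must see gfn == mfn. */
    if (direct_mapped)
        return mfn;

    /* Otherwise the region sits at the fixed guest address. */
    return EXAMPLE_MAGIC_BASE >> PAGE_SHIFT;
}
```

Because the frame is chosen at construction time from an allocation that
already happened, the gfn == mfn assumption holds by construction for
direct-mapped domains instead of being checked after the fact.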

Reported-by: Alec Kwapis 
Suggested-by: Daniel P. Smith 
Signed-off-by: Henry Wang 
---
 tools/libs/guest/xg_dom_arm.c |  1 -
 xen/arch/arm/dom0less-build.c | 22 ++
 xen/include/public/arch-arm.h |  1 +
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/libs/guest/xg_dom_arm.c b/tools/libs/guest/xg_dom_arm.c
index 2fd8ee7ad4..8cc7f27dbb 100644
--- a/tools/libs/guest/xg_dom_arm.c
+++ b/tools/libs/guest/xg_dom_arm.c
@@ -25,7 +25,6 @@
 
 #include "xg_private.h"
 
-#define NR_MAGIC_PAGES 4
 #define CONSOLE_PFN_OFFSET 0
 #define XENSTORE_PFN_OFFSET 1
 #define MEMACCESS_PFN_OFFSET 2
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index fb63ec6fd1..40dc85c759 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -834,11 +834,33 @@ static int __init construct_domU(struct domain *d,
 
 if ( kinfo.dom0less_feature & DOM0LESS_XENSTORE )
 {
+struct page_info *magic_pg;
+mfn_t mfn;
+gfn_t gfn;
+
 ASSERT(hardware_domain);
 rc = alloc_xenstore_evtchn(d);
 if ( rc < 0 )
 return rc;
 d->arch.hvm.params[HVM_PARAM_STORE_PFN] = ~0ULL;
+
+d->max_pages += NR_MAGIC_PAGES;
+magic_pg = alloc_domheap_pages(d, get_order_from_pages(NR_MAGIC_PAGES), 0);
+if ( magic_pg == NULL )
+return -ENOMEM;
+
+mfn = page_to_mfn(magic_pg);
+if ( !is_domain_direct_mapped(d) )
+gfn = gaddr_to_gfn(GUEST_MAGIC_BASE);
+else
+gfn = gaddr_to_gfn(mfn_to_maddr(mfn));
+
+rc = guest_physmap_add_pages(d, gfn, mfn, NR_MAGIC_PAGES);
+if ( rc )
+{
+free_domheap_pages(magic_pg, get_order_from_pages(NR_MAGIC_PAGES));
+return rc;
+}
 }
 
 return rc;
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index e167e14f8d..f24e7bbe37 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -475,6 +475,7 @@ typedef uint64_t xen_callback_t;
 
 #define GUEST_MAGIC_BASE  xen_mk_ullong(0x39000000)
 #define GUEST_MAGIC_SIZE  xen_mk_ullong(0x01000000)
+#define NR_MAGIC_PAGES 4
 
 #define GUEST_RAM_BANKS   2
 
-- 
2.34.1




[PATCH v1.1] xen/commom/dt-overlay: Fix missing lock when remove the device

2024-04-25 Thread Henry Wang
If CONFIG_DEBUG=y, below assertion will be triggered:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) [ Xen-4.19-unstable  arm64  debug=y  Not tainted ]
(XEN) CPU:    0
(XEN) PC: 0a257418 iommu_remove_dt_device+0x8c/0xd4
(XEN) LR: 0a2573a0
(XEN) SP: 8000fff7fb30
(XEN) CPSR:   0249 MODE:64-bit EL2h (Hypervisor, handler)
[...]

(XEN) Xen call trace:
(XEN)    [<0a257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN)    [<0a2573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN)    [<0a20797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN)    [<0a207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN)    [<0a208460>] dt_overlay_sysctl+0x428/0xc68
(XEN)    [<0a2707f8>] arch_do_sysctl+0x1c/0x2c
(XEN)    [<0a230b40>] do_sysctl+0x96c/0x9ec
(XEN)    [<0a271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN)    [<0a273490>] do_trap_guest_sync+0x448/0x63c
(XEN)    [<0a25c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) 

This is because iommu_remove_dt_device() is called without taking the
dt_host_lock. Fix the issue by taking and releasing the lock properly.
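
The locking pattern described above can be sketched outside Xen as
follows. This is a toy model under stated assumptions: `toy_rwlock`,
`remove_device` and `remove_resources` are hypothetical stand-ins, and the
"lock" only records whether a writer holds it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a rwlock: tracks only whether a writer holds it. */
struct toy_rwlock {
    bool write_locked;
};

static struct toy_rwlock host_lock;

static void toy_write_lock(struct toy_rwlock *l)
{
    assert(!l->write_locked);
    l->write_locked = true;
}

static void toy_write_unlock(struct toy_rwlock *l)
{
    assert(l->write_locked);
    l->write_locked = false;
}

/* Callee that insists on the lock being held, like the failing assertion. */
static int remove_device(int id)
{
    assert(host_lock.write_locked);
    return id < 0 ? -1 : 0;
}

/* Caller: take the lock around the call, release it before checking rc. */
static int remove_resources(int id)
{
    int rc;

    toy_write_lock(&host_lock);
    rc = remove_device(id);
    toy_write_unlock(&host_lock);

    if (rc < 0)
        return rc;

    return 0;
}
```

Releasing the lock before the `rc` check means the early-return error path
cannot leave the lock held.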

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Henry Wang 
---
v1.1:
- Move the unlock position before the check of rc.
---
 xen/common/dt-overlay.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1b197381f6..ab8f43aea2 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -381,7 +381,9 @@ static int remove_node_resources(struct dt_device_node *device_node)
 {
 if ( dt_device_is_protected(device_node) )
 {
+write_lock(&dt_host_lock);
 rc = iommu_remove_dt_device(device_node);
+write_unlock(&dt_host_lock);
 if ( rc < 0 )
 return rc;
 }
-- 
2.34.1




Re: [PATCH 11/15] tools/helpers: Add get_overlay

2024-04-25 Thread Henry Wang

Hi Stewart,

On 4/26/2024 9:45 AM, Stewart Hildebrand wrote:

On 4/24/24 20:43, Henry Wang wrote:

Hi Jan,

On 4/24/2024 2:08 PM, Jan Beulich wrote:

On 24.04.2024 05:34, Henry Wang wrote:

From: Vikram Garhwal 

This user level application copies the overlay dtbo shared by dom0 while doing
overlay node assignment operation. It uses xenstore to communicate with dom0.
More information on the protocol is written in docs/misc/overlay.txt file.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
   tools/helpers/Makefile  |   8 +
   tools/helpers/get_overlay.c | 393 
   2 files changed, 401 insertions(+)
   create mode 100644 tools/helpers/get_overlay.c

As mentioned before on various occasions - new files preferably use dashes as
separators in preference to underscores. You not doing so is particularly
puzzling seeing ...


--- a/tools/helpers/Makefile
+++ b/tools/helpers/Makefile
@@ -12,6 +12,7 @@ TARGETS += init-xenstore-domain
   endif
   ifeq ($(CONFIG_ARM),y)
   TARGETS += init-dom0less
+TARGETS += get_overlay

... patch context here (demonstrating a whopping 3 dashes used in similar
cases).

I am not very sure why Vikram used "_" in the original patch. However I agree you are 
correct. Since I am currently doing the follow up of this series, I will use "-" in v2 as 
suggested. Thanks.

Please also add tools/helpers/get-overlay to .gitignore


Thanks for the reminder! Yes sure I will add it.

Kind regards,
Henry



Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM

2024-04-25 Thread Henry Wang

Hi Julien,

On 4/24/2024 8:58 PM, Julien Grall wrote:

Hi Henry,

On 24/04/2024 04:34, Henry Wang wrote:

From: Vikram Garhwal 

Enable interrupt assign/remove for running VMs in CONFIG_OVERLAY_DTB.

Currently, irq_route and mapping is only allowed at the domain creation. Adding
exception for CONFIG_OVERLAY_DTB.


AFAICT, this is mostly reverting b8577547236f ("xen/arm: Restrict when 
a physical IRQ can be routed/removed from/to a domain").




Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
  xen/arch/arm/gic.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 44c40e86de..a775f886ed 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -140,8 +140,10 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,

   * back to the physical IRQ. To prevent get unsync, restrict the
   * routing to when the Domain is been created.
   */


The above comment explains why the check was added. But the commit 
message doesn't explain why this can be disregarded for your use-case.


Looking at the history, I don't think you can simply remove the checks.

Regardless that...


+#ifndef CONFIG_OVERLAY_DTB


... I am against such #ifdef. A distro may want to have OVERLAY_DTB 
enabled, yet the user will not use it.


Instead, you want to remove the check once the code can properly 
handle routing an IRQ after the domain is created or ...



  if ( d->creation_finished )
  return -EBUSY;
+#endif
    ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);
  if ( ret )
@@ -171,8 +173,10 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,

   * Removing an interrupt while the domain is running may have
   * undesirable effect on the vGIC emulation.
   */
+#ifndef CONFIG_OVERLAY_DTB
  if ( !d->is_dying )
  return -EBUSY;
+#endif


... removed before the domain is destroyed.


Thanks for your feedback. After checking the b8577547236f commit 
message I think I now understand your point. Do you have any suggestions 
about how I can properly add support for routing/removing the IRQ to/from 
running domains? Thanks.


Kind regards,
Henry




desc->handler->shutdown(desc);


Cheers,






Re: [PATCH 14/15] add a domU script to fetch overlays and applying them to linux

2024-04-25 Thread Henry Wang

Hi Jan,

On 4/25/2024 2:46 PM, Jan Beulich wrote:

On 25.04.2024 02:54, Henry Wang wrote:

On 4/24/2024 2:16 PM, Jan Beulich wrote:

On 24.04.2024 05:34, Henry Wang wrote:

From: Vikram Garhwal 

Introduce a shell script that runs in the background and calls
get_overlay to retrieve overlays and add them (or remove them) to Linux
device tree (running as a domU).

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
   tools/helpers/Makefile   |  2 +-
   tools/helpers/get_overlay.sh | 81 
   2 files changed, 82 insertions(+), 1 deletion(-)
   create mode 100755 tools/helpers/get_overlay.sh

Besides the same naming issue as in the earlier patch, the script also
looks very Linux-ish. Yet ...

I will fix the naming issue in v2. Would you mind elaborating a bit more
about the "Linux-ish" concern? I guess this is because the original use
case is on Linux, should I do anything about this?

Well, the script won't work on other than Linux, will it? Therefore ...


--- a/tools/helpers/Makefile
+++ b/tools/helpers/Makefile
@@ -58,7 +58,6 @@ init-dom0less: $(INIT_DOM0LESS_OBJS)
   get_overlay: $(SHARE_OVERLAY_OBJS)
$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenvchan) $(LDLIBS_libxenstore) 
$(LDLIBS_libxenctrl) $(LDLIBS_libxengnttab) $(APPEND_LDFLAGS)
   
-

   .PHONY: install
   install: all
$(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
@@ -67,6 +66,7 @@ install: all
   .PHONY: uninstall
   uninstall:
for i in $(TARGETS); do rm -f $(DESTDIR)$(LIBEXEC_BIN)/$$i; done
+   $(RM) $(DESTDIR)$(LIBEXEC_BIN)/get_overlay.sh
   
   .PHONY: clean

   clean:

... you touching only the uninstall target, it's not even clear to me
how (and under what conditions) the script is going to make it into
$(DESTDIR)$(LIBEXEC_BIN)/. Did you mean to add to $(TARGETS), perhaps,
alongside the earlier added get-overlay binary?

... it first of all needs to become clear under what conditions it is actually
going to be installed.


You are right, I think the get-overlay binary and this script should be
installed if DTB overlay is supported. Checking the code, I found
LIBXL_HAVE_DT_OVERLAY which can indicate if we have this feature
supported in libxl. Do you think it is a good idea to use it to install
these two files in Makefile? Thanks.

Counter question: If it's not going to be installed, how are people going
to make use of it? If the script is intended for manual use only, I think
that would want saying in the description. Yet then I couldn't see why
the uninstall goal would need modifying.


Checking the code again, I feel like this is a mistake actually. I think 
this script should be installed together with the get-overlay 
application as the script actually calls get-overlay. The uninstall goal 
should remain untouched. I will fix this in v2.



As to LIBXL_HAVE_DT_OVERLAY - that's not accessible from a Makefile, I
guess?


Yes.

Kind regards,
Henry



Jan





Re: [PATCH 03/15] xen/arm: Always enable IOMMU

2024-04-24 Thread Henry Wang

Hi Julien,

On 4/24/2024 9:03 PM, Julien Grall wrote:

Hi Henry,

On 24/04/2024 04:34, Henry Wang wrote:

From: Vikram Garhwal 

For overlay with iommu functionality to work with running VMs, we need to enable
IOMMU by default for the domains.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
  xen/arch/arm/dom0less-build.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index fb63ec6fd1..2d1fd1e214 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -894,7 +894,8 @@ void __init create_domUs(void)
  panic("Missing property 'cpus' for domain %s\n",
    dt_node_name(node));
  -    if ( dt_find_compatible_node(node, NULL, 
"multiboot,device-tree") &&

+    if ( (IS_ENABLED(CONFIG_OVERLAY_DTB) ||


Similar to the first patch, building Xen with the DTB overlay doesn't 
mean the user will want to use it (think of distros that may want to 
provide a generic Xen).


Instead, we should introduce a new DT property "passthrough" that 
would indicate whether the IOMMU should be used.


To be futureproof, I would match the values used by xl.cfg (see 
docs/man/xl.cfg.5.pod.in).


That sounds good. I can introduce a new DT property as suggested. Thanks 
for the suggestion!
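
A rough sketch of how such a property could be interpreted is below. It is
not proposed Xen code: the helper name `passthrough_wants_iommu` is
hypothetical, and the accepted strings ("enabled" and "disabled") are an
assumption based on the suggestion to mirror the xl.cfg values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether a domain should get the IOMMU
 * from the string value of a "passthrough" property.
 * value == NULL models an absent property.
 * Returns 0 on success, -1 for an unrecognised value.
 */
static int passthrough_wants_iommu(const char *value, bool iommu_available,
                                   bool *use_iommu)
{
    /* Absent or explicitly disabled: do not request the IOMMU. */
    if (value == NULL || strcmp(value, "disabled") == 0) {
        *use_iommu = false;
        return 0;
    }

    /* Enabled: request it only if the platform has a usable IOMMU. */
    if (strcmp(value, "enabled") == 0) {
        *use_iommu = iommu_available;
        return 0;
    }

    /* Other values are left unhandled in this sketch. */
    return -1;
}
```

Keying the decision on a per-domain property rather than on the build
configuration keeps a generic Xen build from changing behaviour for users
who never load an overlay.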


Kind regards,
Henry




+ dt_find_compatible_node(node, NULL, "multiboot,device-tree")) &&
   iommu_enabled )
  d_cfg.flags |= XEN_DOMCTL_CDF_iommu;


Cheers,






Re: [PATCH 14/15] add a domU script to fetch overlays and applying them to linux

2024-04-24 Thread Henry Wang

Hi Jan,

On 4/24/2024 2:16 PM, Jan Beulich wrote:

On 24.04.2024 05:34, Henry Wang wrote:

From: Vikram Garhwal 

Introduce a shell script that runs in the background and calls
get_overlay to retrieve overlays and add them (or remove them) to Linux
device tree (running as a domU).

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
  tools/helpers/Makefile   |  2 +-
  tools/helpers/get_overlay.sh | 81 
  2 files changed, 82 insertions(+), 1 deletion(-)
  create mode 100755 tools/helpers/get_overlay.sh

Besides the same naming issue as in the earlier patch, the script also
looks very Linux-ish. Yet ...


I will fix the naming issue in v2. Would you mind elaborating a bit more 
about the "Linux-ish" concern? I guess this is because the original use 
case is on Linux, should I do anything about this?



--- a/tools/helpers/Makefile
+++ b/tools/helpers/Makefile
@@ -58,7 +58,6 @@ init-dom0less: $(INIT_DOM0LESS_OBJS)
  get_overlay: $(SHARE_OVERLAY_OBJS)
$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenvchan) $(LDLIBS_libxenstore) 
$(LDLIBS_libxenctrl) $(LDLIBS_libxengnttab) $(APPEND_LDFLAGS)
  
-

  .PHONY: install
  install: all
$(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
@@ -67,6 +66,7 @@ install: all
  .PHONY: uninstall
  uninstall:
for i in $(TARGETS); do rm -f $(DESTDIR)$(LIBEXEC_BIN)/$$i; done
+   $(RM) $(DESTDIR)$(LIBEXEC_BIN)/get_overlay.sh
  
  .PHONY: clean

  clean:

... you touching only the uninstall target, it's not even clear to me
how (and under what conditions) the script is going to make it into
$(DESTDIR)$(LIBEXEC_BIN)/. Did you mean to add to $(TARGETS), perhaps,
alongside the earlier added get-overlay binary?


You are right, I think the get-overlay binary and this script should be 
installed if DTB overlay is supported. Checking the code, I found 
LIBXL_HAVE_DT_OVERLAY which can indicate if we have this feature 
supported in libxl. Do you think it is a good idea to use it to install 
these two files in Makefile? Thanks.


Kind regards,
Henry



Jan




