[KERNEL PATCH v9 3/3] xen/privcmd: Add new syscall to get gsi from dev
On PVH dom0, when a device is passed through to a domU, QEMU and the xl tools use the gsi number to do the pirq mapping (see the QEMU code xen_pt_realize->xc_physdev_map_pirq and the xl code pci_add_dm_done->xc_physdev_map_pirq). In the current code, however, the gsi number is read from the file /sys/bus/pci/devices//irq. That is wrong: irq and gsi live in different number spaces and are not equal, so the pirq mapping fails. Linux currently gives userspace no way to get the gsi.

To fix this, record the gsi of each pcistub device when pcistub is initialized, and add a new syscall (ioctl) to privcmd so that userspace can get the gsi when it needs it.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
v8->v9 changes:
Changed the syscall name from "IOCTL_PRIVCMD_GSI_FROM_DEV" to "IOCTL_PRIVCMD_PCIDEV_GET_GSI", and renamed the related functions accordingly.
Changed the macro wrapping "pcistub_get_gsi_from_sbdf" from "CONFIG_XEN_ACPI" to "CONFIG_XEN_PCIDEV_BACKEND" to fix compile errors reported by the CI robot.
Changed the gsi field of struct privcmd_pcidev_get_gsi from int to u32.

v7->v8 changes:
In function privcmd_ioctl_gsi_from_dev, return -EINVAL when CONFIG_XEN_ACPI is not configured.
Used PCI_BUS_NUM, PCI_SLOT and PCI_FUNC instead of open coding.

v6->v7 changes:
Changed the implementation to add a new field "gsi" to struct pcistub_device and set it when pcistub initializes the device. Then, when userspace passes an sbdf to ask for the gsi, we can return that value.

v5->v6 changes:
Changed the implementation to add a new syscall to translate irq to gsi, instead of adding a new gsi sysfs node, because the PCI maintainer did not allow adding that sysfs node.

v3->v5 changes:
No.

v2->v3 changes:
Suggested by Roger: abandoned the previous implementation that added a new syscall to get gsi from irq, and changed to add a new sysfs node for gsi, so that userspace can get the gsi number from the sysfs node.
--- | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202406090826.whl6cb7r-...@intel.com/ --- | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202405171113.t431pc8o-...@intel.com/ --- drivers/xen/privcmd.c | 30 +++ drivers/xen/xen-pciback/pci_stub.c | 38 +++--- include/uapi/xen/privcmd.h | 7 ++ include/xen/acpi.h | 9 +++ 4 files changed, 81 insertions(+), 3 deletions(-) diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index 9563650dfbaf..1ed612d21543 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -46,6 +46,9 @@ #include #include #include +#ifdef CONFIG_XEN_ACPI +#include +#endif #include "privcmd.h" @@ -844,6 +847,29 @@ static long privcmd_ioctl_mmap_resource(struct file *file, return rc; } +static long privcmd_ioctl_pcidev_get_gsi(struct file *file, void __user *udata) +{ +#ifdef CONFIG_XEN_ACPI + int rc; + struct privcmd_pcidev_get_gsi kdata; + + if (copy_from_user(&kdata, udata, sizeof(kdata))) + return -EFAULT; + + rc = pcistub_get_gsi_from_sbdf(kdata.sbdf); + if (rc < 0) + return rc; + + kdata.gsi = rc; + if (copy_to_user(udata, &kdata, sizeof(kdata))) + return -EFAULT; + + return 0; +#else + return -EINVAL; +#endif +} + #ifdef CONFIG_XEN_PRIVCMD_EVENTFD /* Irqfd support */ static struct workqueue_struct *irqfd_cleanup_wq; @@ -1543,6 +1569,10 @@ static long privcmd_ioctl(struct file *file, ret = privcmd_ioctl_ioeventfd(file, udata); break; + case IOCTL_PRIVCMD_PCIDEV_GET_GSI: + ret = privcmd_ioctl_pcidev_get_gsi(file, udata); + break; + default: break; } diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 8ce27333f54b..2ea8e4075adc 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -56,6 +56,9 @@ struct pcistub_device { struct pci_dev *dev; struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */ +#ifdef CONFIG_XEN_ACPI + int gsi; +#endif }; /* Access to pcistub_devices & 
seized_devices lists and the initialize_devices @@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev) kref_init(&psdev->kref); spin_lock_init(&psdev->lock); +#ifdef CONFIG_XEN_ACPI + psdev->gsi = -1; +#endif return psdev; } @@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct xen_pcibk_device *pdev, return pci_dev; } +#ifdef CONFIG_XEN_PCIDEV_BACKEND +int pcistub_get_gsi_from_sbdf(unsigned int sbdf) +{ + struct pcistub_device *psdev; + int domain = (sbdf >> 16) & 0x; + int bus
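For readers following the sbdf parameter: the value passed in struct privcmd_pcidev_get_gsi can be thought of as a packed segment/bus/device/function. Below is a minimal userspace sketch, assuming the segment sits in the upper 16 bits, the bus in bits 15:8, and devfn in the low byte (slot in bits 7:3, function in bits 2:0). The helper names (sbdf_pack, sbdf_segment, ...) are hypothetical and are not part of this patch.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only.
 * Assumed layout: seg[31:16] bus[15:8] slot[7:3] func[2:0].
 */
static inline uint32_t sbdf_pack(uint16_t seg, uint8_t bus,
                                 uint8_t slot, uint8_t func)
{
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)(slot & 0x1f) << 3) | (uint32_t)(func & 0x07);
}

/* Segment (PCI domain): upper 16 bits. */
static inline uint16_t sbdf_segment(uint32_t sbdf)
{
    return (uint16_t)(sbdf >> 16);
}

/* Bus number: bits 15:8. */
static inline uint8_t sbdf_bus(uint32_t sbdf)
{
    return (uint8_t)((sbdf >> 8) & 0xff);
}

/* Slot (device) number: bits 7:3 of the low byte. */
static inline uint8_t sbdf_slot(uint32_t sbdf)
{
    return (uint8_t)((sbdf >> 3) & 0x1f);
}

/* Function number: bits 2:0 of the low byte. */
static inline uint8_t sbdf_func(uint32_t sbdf)
{
    return (uint8_t)(sbdf & 0x07);
}
```

Under that assumed layout, device 0000:03:00.0 would be passed as sbdf_pack(0, 3, 0, 0).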
[KERNEL PATCH v9 0/3] Support device passthrough when dom0 is PVH on Xen
Hi All,
This is the v9 series to support passthrough on Xen when dom0 is PVH.
The code it depends on has now been merged on the Xen side, so I am continuing to upstream this series. Although all patches of v8 got "Reviewed-by", a lot of time has passed and the code has changed, so I did not carry the "Reviewed-by" tags over. Please review them again.

v8->v9 changes:
* patch#1: The struct and the name of the hypercall changed on the Xen side, so I made the corresponding changes. No functional changes.
* patch#2: Moved the call to xen_acpi_get_gsi_info under the check "if (xen_initial_domain() && xen_pvh_domain())" to prevent it from being called in PV dom0.
* patch#3: Changed the syscall name from "IOCTL_PRIVCMD_GSI_FROM_DEV" to "IOCTL_PRIVCMD_PCIDEV_GET_GSI", and renamed the related functions accordingly. Changed the macro wrapping "pcistub_get_gsi_from_sbdf" from "CONFIG_XEN_ACPI" to "CONFIG_XEN_PCIDEV_BACKEND" to fix compile errors reported by the CI robot. Changed the gsi field of struct privcmd_pcidev_get_gsi from int to u32.

Best regards,
Jiqian Chen

v7->v8 changes:
* patch#1: This is patch#1 of v6; it was reverted from the staging branch due to the API changes on the Xen side. Added pci_device_state_reset_type_t to distinguish the reset types.
* patch#2: This is patch#1 of v7. Used CONFIG_XEN_ACPI instead of CONFIG_ACPI to wrap the code.
* patch#3: This is patch#2 of v7. In function privcmd_ioctl_gsi_from_dev, return -EINVAL when CONFIG_XEN_ACPI is not configured. Used PCI_BUS_NUM, PCI_SLOT and PCI_FUNC instead of open coding.

v6->v7 changes:
* The first patch of v6 was already merged into the linux_next branch.
* patch#1: This is patch#2 of v6. Moved the implementation of xen_acpi_get_gsi_info to drivers/xen/acpi.c, which makes it easier for the subsequent patch to obtain the gsi.
* patch#2: This is patch#3 of v6. Added a new field "gsi" to struct pcistub_device and set it when pcistub initializes the device. Then, when userspace passes an sbdf to ask for the gsi, we can return that gsi.

v5->v6 changes:
* patch#3: Changed to add a new syscall to translate irq to gsi, instead of adding a new gsi sysfs node.

v4->v5 changes:
* patch#1: Added Reviewed-by from Stefano.
* patch#2: Added Reviewed-by from Stefano.
* patch#3: No changes.

v3->v4 changes:
* patch#1: Changed the comment of PHYSDEVOP_pci_device_state_reset; used a new function pcistub_reset_device_state to wrap __pci_reset_function_locked and xen_reset_device_state, and called pcistub_reset_device_state in pci_stub.c.
* patch#2: Removed map_pirq from xen_pvh_passthrough_gsi.

v2->v3 changes:
* patch#1: Added a condition so that xen_reset_device_state is only done for non-PV domains in pcistub_init_device.
* patch#2: Abandoned the previous implementation that called unmask_irq; instead, set up the gsi and map the pirq for the passthrough device in pcistub_init_device.
* patch#3: Abandoned the previous implementation that added a new syscall to get gsi from irq; instead, added a new sysfs node for gsi so that userspace can get the gsi number from sysfs.

Below is the description from the v2 cover letter:
This series of patches is the v2 of the implementation of passthrough when dom0 is PVH on Xen. We sent v1 upstream before, but it had many problems and we received lots of suggestions. I will introduce all the issues that these patches try to fix and the differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a device, pci_stub will call "pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() -> pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword()". The pci config write triggers an I/O trap into bar_write() in Xen, but bar->enabled was set before, so the write is not allowed, and then, when QEMU configures the passthrough device in xen_pt_realize(), it gets invalid bar values.
Reason: we don't tell vPCI that the device has been reset, so the state cached in pdev->vpci is out of date and differs from the real device state.
Solution: the first kernel patch (xen/pci: Add xen_reset_device_state function) and the first Xen patch (xen/vpci: Clear all vpci status of device) add a new hypercall to reset the state stored in vPCI when the state of the real device has changed. Thanks to Roger for suggesting this approach for v2. It differs from v1 (https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/): v1 simply allowed domU to write the pci bar, which does not comply with the design principl
[KERNEL PATCH v9 1/3] xen/pci: Add a function to reset device for xen
When a device has been reset on the dom0 side, vpci on the Xen side is not notified, so the state cached in vpci becomes out of date with respect to the real device state.

To solve that problem, add a new function that clears all vpci device state when the device is reset on the dom0 side, and call it in pcistub_init_device. This is needed because using "pci-assignable-add" to assign a passthrough device in Xen resets the device; the vpci state then goes out of date and the device fails to restore its bar state.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
v8->v9 changes:
The struct and the name of the hypercall changed on the Xen side, so I made the corresponding changes and removed the Reviewed-by of Stefano. No functional changes.

v5->v8 changes:
No.

v4->v5 changes:
Added Reviewed-by of Stefano.

v3->v4 changes:
Changed the code comment of PHYSDEVOP_pci_device_state_reset.
Used a new function pcistub_reset_device_state to wrap __pci_reset_function_locked and xen_reset_device_state, and called pcistub_reset_device_state in pci_stub.c.

v2->v3 changes:
Added a condition so that xen_reset_device_state is only done for non-PV domains in pcistub_init_device.

v1->v2 changes:
New patch to add a new function to call the reset hypercall.
--- drivers/xen/pci.c | 13 + drivers/xen/xen-pciback/pci_stub.c | 18 +++--- include/xen/interface/physdev.h| 17 + include/xen/pci.h | 6 ++ 4 files changed, 51 insertions(+), 3 deletions(-) diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c index 72d4e3f193af..bb59524b8bbd 100644 --- a/drivers/xen/pci.c +++ b/drivers/xen/pci.c @@ -177,6 +177,19 @@ static int xen_remove_device(struct device *dev) return r; } +int xen_reset_device(const struct pci_dev *dev) +{ + struct pci_device_reset device = { + .dev.seg = pci_domain_nr(dev->bus), + .dev.bus = dev->bus->number, + .dev.devfn = dev->devfn, + .flags = PCI_DEVICE_RESET_FLR, + }; + + return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_reset, &device); +} +EXPORT_SYMBOL_GPL(xen_reset_device); + static int xen_pci_notifier(struct notifier_block *nb, unsigned long action, void *data) { diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 4faebbb84999..3e162c1753e2 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -89,6 +89,16 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev) return psdev; } +static int pcistub_reset_device_state(struct pci_dev *dev) +{ + __pci_reset_function_locked(dev); + + if (!xen_pv_domain()) + return xen_reset_device(dev); + else + return 0; +} + /* Don't call this directly as it's called by pcistub_device_put */ static void pcistub_device_release(struct kref *kref) { @@ -107,7 +117,7 @@ static void pcistub_device_release(struct kref *kref) /* Call the reset function which does not take lock as this * is called from "unbind" which takes a device_lock mutex. 
*/ - __pci_reset_function_locked(dev); + pcistub_reset_device_state(dev); if (dev_data && pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state)) dev_info(&dev->dev, "Could not reload PCI state\n"); @@ -284,7 +294,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev) * (so it's ready for the next domain) */ device_lock_assert(&dev->dev); - __pci_reset_function_locked(dev); + pcistub_reset_device_state(dev); dev_data = pci_get_drvdata(dev); ret = pci_load_saved_state(dev, dev_data->pci_saved_state); @@ -420,7 +430,9 @@ static int pcistub_init_device(struct pci_dev *dev) dev_err(&dev->dev, "Could not store PCI conf saved state!\n"); else { dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n"); - __pci_reset_function_locked(dev); + err = pcistub_reset_device_state(dev); + if (err) + goto config_release; pci_restore_state(dev); } /* Now disable the device (this also ensures some private device diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h index a237af867873..df74e65a884b 100644 --- a/include/xen/interface/physdev.h +++ b/include/xen/interface/physdev.h @@ -256,6 +256,13 @@ struct physdev_pci_device_add { */ #define PHYSDEVOP_prepare_msix 30 #define PHYSDEVOP_release_msix 31 +/* + * Notify the hypervisor that a PCI device has been reset, so that any + * internally cached state is
[KERNEL PATCH v9 2/3] xen/pvh: Setup gsi for passthrough device
In PVH dom0, the gsis don't get registered, but the gsi of a passthrough device must be configured before it can be mapped into a domU.

When assigning a device for passthrough, proactively set up the gsi of the device during that process.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
v8->v9 changes:
Moved the call to xen_acpi_get_gsi_info under the check "if (xen_initial_domain() && xen_pvh_domain())" to prevent it from being called in PV dom0.
Removed Reviewed-by of Stefano.

v7->v8 changes:
Used CONFIG_XEN_ACPI instead of CONFIG_ACPI to wrap the code.

v6->v7 changes:
Moved the implementation of xen_acpi_get_gsi_info to drivers/xen/acpi.c, which makes it easier for the subsequent patch to obtain the gsi.

v5->v6 changes:
No.

v4->v5 changes:
Added Reviewed-by of Stefano.

v3->v4 changes:
Removed map_pirq from xen_pvh_passthrough_gsi, since letting pvh call map_pirq there is not right.

v2->v3 changes:
Abandoned the previous implementation that called unmask_irq, and changed to do setup_gsi and map_pirq for the passthrough device in pcistub_init_device.
--- | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202406090859.kw3eeesv-...@intel.com/ --- | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202405172132.tazuvppo-...@intel.com/ --- arch/x86/xen/enlighten_pvh.c | 23 ++ drivers/acpi/pci_irq.c | 2 +- drivers/xen/acpi.c | 50 ++ drivers/xen/xen-pciback/pci_stub.c | 20 include/linux/acpi.h | 1 + include/xen/acpi.h | 18 +++ 6 files changed, 113 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 728a4366ca85..bf68c329fc01 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -4,6 +4,7 @@ #include #include +#include #include #include @@ -28,6 +29,28 @@ bool __ro_after_init xen_pvh; EXPORT_SYMBOL_GPL(xen_pvh); +#ifdef CONFIG_XEN_DOM0 +int xen_pvh_setup_gsi(int gsi, int trigger, int polarity) +{ + int ret; + struct physdev_setup_gsi setup_gsi; + + setup_gsi.gsi = gsi; + setup_gsi.triggering = (trigger == ACPI_EDGE_SENSITIVE ? 0 : 1); + setup_gsi.polarity = (polarity == ACPI_ACTIVE_HIGH ? 0 : 1); + + ret = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi); + if (ret == -EEXIST) { + xen_raw_printk("Already setup the GSI :%d\n", gsi); + ret = 0; + } else if (ret) + xen_raw_printk("Fail to setup GSI (%d)!\n", gsi); + + return ret; +} +EXPORT_SYMBOL_GPL(xen_pvh_setup_gsi); +#endif + /* * Reserve e820 UNUSABLE regions to inflate the memory balloon. 
* diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c index ff30ceca2203..630fe0a34bc6 100644 --- a/drivers/acpi/pci_irq.c +++ b/drivers/acpi/pci_irq.c @@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev, } #endif /* CONFIG_X86_IO_APIC */ -static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin) +struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin) { struct acpi_prt_entry *entry = NULL; struct pci_dev *bridge; diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c index 6893c79fd2a1..9e2096524fbc 100644 --- a/drivers/xen/acpi.c +++ b/drivers/xen/acpi.c @@ -30,6 +30,7 @@ * IN THE SOFTWARE. */ +#include #include #include #include @@ -75,3 +76,52 @@ int xen_acpi_notify_hypervisor_extended_sleep(u8 sleep_state, return xen_acpi_notify_hypervisor_state(sleep_state, val_a, val_b, true); } + +struct acpi_prt_entry { + struct acpi_pci_id id; + u8 pin; + acpi_handle link; + u32 index; +}; + +int xen_acpi_get_gsi_info(struct pci_dev *dev, + int *gsi_out, + int *trigger_out, + int *polarity_out) +{ + int gsi; + u8 pin; + struct acpi_prt_entry *entry; + int trigger = ACPI_LEVEL_SENSITIVE; + int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ? + ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW; + + if (!dev || !gsi_out || !trigger_out || !polarity_out) + return -EINVAL; + + pin = dev->pin; + if (!pin) + return -EINVAL; + + entry = acpi_pci_irq_lookup(dev, pin); + if (entry) { + if (entry->link) + gsi = acpi_pci_link_allocate_irq(entry->link, +entry->ind
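The xen_pvh_setup_gsi hunk above does two small things that are easy to miss in the flattened diff: it converts the ACPI trigger and polarity values into the 0/1 fields of the setup_gsi request, and it treats -EEXIST from the hypercall as success. The following is a standalone userspace sketch of just that logic. The SK_* constants are local stand-ins with assumed values, and the sk_* names are hypothetical; nothing here calls into Xen or the kernel.

```c
#include <errno.h>

/* Local stand-ins for the ACPI constants (assumed values, for the sketch only). */
enum { SK_LEVEL_SENSITIVE = 0, SK_EDGE_SENSITIVE = 1 };
enum { SK_ACTIVE_HIGH = 0, SK_ACTIVE_LOW = 1 };

/* Simplified mirror of the request filled in by xen_pvh_setup_gsi(). */
struct sk_setup_gsi {
    int gsi;
    unsigned char triggering; /* 0 = edge, 1 = level */
    unsigned char polarity;   /* 0 = active high, 1 = active low */
};

/* Same mapping as the hunk: edge -> 0, otherwise 1; high -> 0, otherwise 1. */
static struct sk_setup_gsi sk_fill_setup_gsi(int gsi, int trigger, int polarity)
{
    struct sk_setup_gsi s;

    s.gsi = gsi;
    s.triggering = (trigger == SK_EDGE_SENSITIVE) ? 0 : 1;
    s.polarity = (polarity == SK_ACTIVE_HIGH) ? 0 : 1;
    return s;
}

/* A GSI that is already set up is not an error for the caller. */
static int sk_normalize_setup_result(int ret)
{
    return (ret == -EEXIST) ? 0 : ret;
}
```

The -EEXIST handling means that assigning the same device repeatedly should not fail only because its GSI was registered on an earlier attempt.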
[RFC XEN PATCH v15 4/4] tools: Add new function to do PIRQ (un)map on PVH dom0
When dom0 is PVH and a device is passed through to a domU, xl uses the gsi number of the device to do a pirq mapping (see pci_add_dm_done->xc_physdev_map_pirq). However, the gsi number is read from the file /sys/bus/pci/devices//irq, which confuses irq with gsi: they are in different number spaces and are not equal, so the mapping fails. To solve this, get the real gsi and add a new function, xc_physdev_map_pirq_gsi, to get a free pirq for the gsi.

Note: the existing function xc_physdev_map_pirq is not used because it does not support allocating a free pirq; xc_physdev_map_pirq_gsi is added instead of changing it, to avoid affecting its callers.

Besides, PVH dom0 doesn't have the PIRQs flag and doesn't do PHYSDEVOP_map_pirq for each gsi, so the grant call path pci_add_dm_done->XEN_DOMCTL_irq_permission fails in domain_pirq_to_irq. The old hypercall XEN_DOMCTL_irq_permission also requires a pirq to be passed in, so it is not suitable for granting irq permission from a PVH dom0 that doesn't have PIRQs. To solve this, use the other hypercall, XEN_DOMCTL_gsi_permission, to grant permission for the irq (translated from the gsi) to the domU when dom0 has no PIRQs.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Chen Jiqian
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
---
v13->v15 changes:
Changed the initialization of "struct physdev_map_pirq map" in function xc_physdev_map_pirq_gsi to set the values directly in the definition.
Changed the code from "rc = libxl__arch_local_domain_has_pirq_notion(gc); if (!rc) {}" to "if (libxl__arch_local_domain_has_pirq_notion(gc) == false) {}".
Modified some log prints.

v12->v13 changes:
Deleted patch #6 of v12, and added function xc_physdev_map_pirq_gsi to map pirq for gsi.
For functions that generate libxl errors, changed the return value from -1 to ERROR_*.
Instead of declaring "ctx", use the macro "CTX".
Added the function libxl__arch_local_domain_has_pirq_notion to determine whether the domain where xl is running has the concept of pirq.
In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use map_pirq to obtain the pirq corresponding to the gsi.

v11->v12 changes:
Nothing.

v10->v11 changes:
New patch.
Modification of the tools part of patches #4 and #5 of v10: use privcmd_gsi_from_dev to get the gsi, and use XEN_DOMCTL_gsi_permission to grant the gsi.
Changed the hard-coded 0 to LIBXL_TOOLSTACK_DOMID.
Added libxl__arch_hvm_map_gsi to distinguish x86-related implementations.
Added a list pcidev_pirq_list to record the relationship between sbdf and pirq, which can be used to obtain the corresponding pirq when unmapping the PIRQ.
--- tools/include/xenctrl.h | 10 tools/libs/ctrl/xc_domain.c | 15 + tools/libs/ctrl/xc_physdev.c | 27 + tools/libs/light/libxl_arch.h | 6 ++ tools/libs/light/libxl_arm.c | 15 + tools/libs/light/libxl_pci.c | 110 -- tools/libs/light/libxl_x86.c | 72 ++ 7 files changed, 210 insertions(+), 45 deletions(-) diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 924f9a35f790..29617585c535 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1383,6 +1383,11 @@ int xc_domain_irq_permission(xc_interface *xch, uint32_t pirq, bool allow_access); +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + uint32_t flags); + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, @@ -1638,6 +1643,11 @@ int xc_physdev_map_pirq_msi(xc_interface *xch, int entry_nr, uint64_t table_base); +int xc_physdev_map_pirq_gsi(xc_interface *xch, +uint32_t domid, +int gsi, +int *pirq); + int xc_physdev_unmap_pirq(xc_interface *xch, uint32_t domid, int pirq); diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c index f2d9d14b4d9f..e3538ec0ba80 100644 --- a/tools/libs/ctrl/xc_domain.c +++ b/tools/libs/ctrl/xc_domain.c @@ -1394,6 +1394,21 @@ int
xc_domain_irq_permission(xc_interface *xch, return do_domctl(xch, &domctl); } +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + uint32_t flags) +{ +struct xen_domctl domctl = { +.cmd = XEN_DOMCTL_gsi_permission, +.domain = domid, +.u.gsi_permission.gsi = gsi, +
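The commit message of this patch describes a choice between two grant paths: the existing pirq-based XEN_DOMCTL_irq_permission, and the new gsi-based XEN_DOMCTL_gsi_permission for a toolstack domain without PIRQs. A minimal sketch of that decision follows; it is an illustration only, with hypothetical names (sk_grant_path, sk_choose_grant_path), and the boolean stands in for whatever libxl__arch_local_domain_has_pirq_notion() reports.

```c
#include <stdbool.h>

/* Which permission hypercall the toolstack would use (sketch only). */
enum sk_grant_path {
    SK_GRANT_BY_PIRQ, /* XEN_DOMCTL_irq_permission, needs a pirq */
    SK_GRANT_BY_GSI   /* XEN_DOMCTL_gsi_permission, takes the gsi */
};

/*
 * local_has_pirq: stand-in for the result of
 * libxl__arch_local_domain_has_pirq_notion() in the real code.
 */
static enum sk_grant_path sk_choose_grant_path(bool local_has_pirq)
{
    return local_has_pirq ? SK_GRANT_BY_PIRQ : SK_GRANT_BY_GSI;
}
```

On a PVH dom0 the argument would be false, so the gsi-based path is taken.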
[RFC XEN PATCH v15 3/4] tools: Add new function to get gsi from dev
On PVH dom0, when a device is passed through to a domU, QEMU and the xl tools use the gsi number to do the pirq mapping (see the QEMU code xen_pt_realize->xc_physdev_map_pirq and the xl code pci_add_dm_done->xc_physdev_map_pirq). In the current code, however, the gsi number is read from the file /sys/bus/pci/devices//irq. That is wrong: irq and gsi live in different number spaces and are not equal, so the pirq mapping fails. The current code also gives userspace no way to get the gsi.

To fix this, add a new function to get the gsi; the corresponding ioctl is implemented on the linux kernel side.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Chen Jiqian
Reviewed-by: Anthony PERARD
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
---
v13->v15 changes:
Added "Reviewed-by: Anthony PERARD ".

v12->v13 changes:
Renamed the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi to avoid confusion with the physdev namespace.
Moved the implementation of xc_pcidev_get_gsi into xc_linux.c.
Directly use xencall_fd(xch->xcall) in the function xc_pcidev_get_gsi instead of opening "privcmd".

v11->v12 changes:
Nothing.

v10->v11 changes:
Patch#4 of v10: directly open "/dev/xen/privcmd" in the function xc_physdev_gsi_from_dev instead of adding unnecessary functions to libxencall.
Changed the type of gsi in the structure privcmd_gsi_from_dev from int to u32.

v9->v10 changes:
Extracted the implementation of xc_physdev_gsi_from_dev into a new patch.
--- tools/include/xen-sys/Linux/privcmd.h | 7 +++ tools/include/xenctrl.h | 2 ++ tools/libs/ctrl/xc_freebsd.c | 6 ++ tools/libs/ctrl/xc_linux.c| 20 tools/libs/ctrl/xc_minios.c | 6 ++ tools/libs/ctrl/xc_netbsd.c | 6 ++ tools/libs/ctrl/xc_solaris.c | 6 ++ 7 files changed, 53 insertions(+) diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h index bc60e8fd55eb..607dfa2287bc 100644 --- a/tools/include/xen-sys/Linux/privcmd.h +++ b/tools/include/xen-sys/Linux/privcmd.h @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource { __u64 addr; } privcmd_mmap_resource_t; +typedef struct privcmd_pcidev_get_gsi { + __u32 sbdf; + __u32 gsi; +} privcmd_pcidev_get_gsi_t; + /* * @cmd: IOCTL_PRIVCMD_HYPERCALL * @arg: &privcmd_hypercall_t @@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource { _IOC(_IOC_NONE, 'P', 6, sizeof(domid_t)) #define IOCTL_PRIVCMD_MMAP_RESOURCE\ _IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t)) +#define IOCTL_PRIVCMD_PCIDEV_GET_GSI \ + _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_pcidev_get_gsi_t)) #define IOCTL_PRIVCMD_UNIMPLEMENTED\ _IOC(_IOC_NONE, 'P', 0xFF, 0) diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 2c4608c09ab0..924f9a35f790 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1642,6 +1642,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch, uint32_t domid, int pirq); +int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf); + /* * LOGGING AND ERROR REPORTING */ diff --git a/tools/libs/ctrl/xc_freebsd.c b/tools/libs/ctrl/xc_freebsd.c index 9dd48a3a08bb..9019fc663361 100644 --- a/tools/libs/ctrl/xc_freebsd.c +++ b/tools/libs/ctrl/xc_freebsd.c @@ -60,6 +60,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size) return ptr; } +int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf) +{ +errno = ENOSYS; +return -1; +} + /* * Local variables: * mode: C diff --git a/tools/libs/ctrl/xc_linux.c b/tools/libs/ctrl/xc_linux.c index c67c71c08be3..92591e49a1c8 
100644 --- a/tools/libs/ctrl/xc_linux.c +++ b/tools/libs/ctrl/xc_linux.c @@ -66,6 +66,26 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size) return ptr; } +int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf) +{ +int ret; +privcmd_pcidev_get_gsi_t dev_gsi = { +.sbdf = sbdf, +.gsi = 0, +}; + +ret = ioctl(xencall_fd(xch->xcall), +IOCTL_PRIVCMD_PCIDEV_GET_GSI, &dev_gsi); + +if (ret < 0) { +PERROR("Failed to get gsi from dev"); +} else { +ret = dev_gsi.gsi; +} + +return ret; +} + /* * Local variables: * mode: C diff --git a/tools/libs/ctrl/xc_minios.c b/tools/libs/ctrl/xc_minios.c index 3dea7a78a576..462af827b33c 100644 --- a/tools/libs/ctrl/xc_minios.c +++ b/tools/libs/ctrl/xc_minios.c @@ -47,6 +47,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size) return memalign(alignment, size); } +int xc_pcidev_get_gsi(xc_inter
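To make the request number in the privcmd.h hunk above easier to read, here is a small sketch that rebuilds the same _IOC() encoding in userspace and decomposes it again. It assumes a Linux build environment with <linux/ioctl.h> available; the sk_ prefixed type and macro are hypothetical mirrors of privcmd_pcidev_get_gsi_t and IOCTL_PRIVCMD_PCIDEV_GET_GSI, using stdint types instead of __u32.

```c
#include <stdint.h>
#include <linux/ioctl.h>

/* Userspace mirror of the structure in the patch (sketch only). */
typedef struct sk_pcidev_get_gsi {
    uint32_t sbdf; /* in: device to look up */
    uint32_t gsi;  /* out: gsi recorded by pcistub */
} sk_pcidev_get_gsi_t;

/* Same encoding as in the hunk: type 'P', number 10, size of the struct. */
#define SK_IOCTL_PCIDEV_GET_GSI \
    _IOC(_IOC_NONE, 'P', 10, sizeof(sk_pcidev_get_gsi_t))

/* Decompose the request number with the standard Linux helper macros. */
static inline int sk_ioctl_type(void) { return _IOC_TYPE(SK_IOCTL_PCIDEV_GET_GSI); }
static inline int sk_ioctl_nr(void)   { return _IOC_NR(SK_IOCTL_PCIDEV_GET_GSI); }
static inline int sk_ioctl_size(void) { return _IOC_SIZE(SK_IOCTL_PCIDEV_GET_GSI); }
```

Because both fields are 32 bits wide, the structure has no padding, and the size encoded in the request number is 8 bytes.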
[XEN PATCH v15 1/4] x86/hvm: allow {,un}map_pirq hypercalls unconditionally
The current hypercall interfaces to manage and assign interrupts to domains is mostly based in using pIRQs as handlers. Such pIRQ values are abstract domain-specific references to interrupts. Classic HVM domains can have access to {,un}map_pirq hypercalls if the domain is allowed to route physical interrupts over event channels. That's however a different interface, limited to only mapping interrupts to itself. PVH domains on the other hand never had access to the interface, as PVH domains are not allowed to route interrupts over event channels. In order to allow setting up PCI passthrough from a PVH domain it needs access to the {,un}map_pirq hypercalls so interrupts can be assigned a pIRQ handler that can then be used by further hypercalls to bind the interrupt to a domain. Note that the {,un}map_pirq hypercalls end up calling helpers that are already used against a PVH domain in order to setup interrupts for the hardware domain when running in PVH mode. physdev_map_pirq() will call allocate_and_map_{gsi,msi}_pirq() which is already used by the vIO-APIC or the vPCI code respectively. So the exposed code paths are not new when targeting a PVH domain, but rather previous callers are not hypercall but emulation based. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- v14->v15 changes: Change to use the commit message wrote by Roger. v13->v14 changes: Modified the commit message. v12->v13 changes: Removed the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and added a corresponding description in the commit message. v11->v12 changes: Avoid using return, set error code instead when (un)map is not allowed. v10->v11 changes: Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq from being executed when domU has no pirq, instead of just preventing self-mapping. And modify the description of the commit message accordingly. 
v9->v10 changes:
Indent the comments above PHYSDEVOP_map_pirq according to the code style.

v8->v9 changes:
Add a comment above PHYSDEVOP_map_pirq to describe why this hypercall is needed.
Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" to "d == current->domain".

v7->v8 changes:
Add the domid check (domid == DOMID_SELF) to prevent self map when the guest doesn't use pirq. That check was missed in the previous version.

v6->v7 changes:
Nothing.

v5->v6 changes:
Nothing.

v4->v5 changes:
Move the check of self map_pirq to physdev.c, change it to check whether the caller has the PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op.

v3->v4 changes:
Add a check to prevent PVH self map.

v2->v3 changes:
Due to changes in the implementation of the second patch on the kernel side (it now does setup_gsi and map_pirq when assigning a device for passthrough), add PHYSDEVOP_setup_gsi for PVH dom0, and we need to support self mapping.
--- xen/arch/x86/hvm/hypercall.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index f023f7879e24..81883c8d4f60 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { case PHYSDEVOP_map_pirq: case PHYSDEVOP_unmap_pirq: +break; + case PHYSDEVOP_eoi: case PHYSDEVOP_irq_status_query: case PHYSDEVOP_get_free_pirq: -- 2.34.1
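The two added lines in this patch split one group of case labels so that the {,un}map_pirq commands no longer fall through to the check applied to the remaining commands. The sketch below models that control flow in plain C. It is a simplification under stated assumptions: the command numbers are hypothetical, and both the "domain has pirq" condition and the -ENOSYS result for the second group are assumed, since the diff context does not show them.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command numbers; not the real PHYSDEVOP_* values. */
enum sk_physdev_cmd {
    SK_MAP_PIRQ,
    SK_UNMAP_PIRQ,
    SK_EOI,
    SK_IRQ_STATUS_QUERY,
    SK_GET_FREE_PIRQ,
    SK_OTHER
};

/* Returns 0 if the command may proceed, -ENOSYS otherwise. */
static int sk_hvm_physdev_gate(enum sk_physdev_cmd cmd, bool domain_has_pirq)
{
    switch (cmd) {
    case SK_MAP_PIRQ:
    case SK_UNMAP_PIRQ:
        /* After the patch: allowed unconditionally. */
        break;

    case SK_EOI:
    case SK_IRQ_STATUS_QUERY:
    case SK_GET_FREE_PIRQ:
        /* Assumed: still limited to domains that use pIRQs. */
        if (!domain_has_pirq)
            return -ENOSYS;
        break;

    default:
        /* Assumed: anything else is rejected in this simplified model. */
        return -ENOSYS;
    }

    return 0;
}
```

Without the inserted break, the first two labels would share the check of the second group, which is what previously kept a PVH domain away from the {,un}map_pirq hypercalls.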
[XEN PATCH v15 2/4] x86/irq: allow setting IRQ permissions from GSI instead of pIRQ
Some domains are not aware of the pIRQ abstraction layer that maps interrupt sources into Xen space interrupt numbers. pIRQ values are only exposed to domains that have the option to route physical interrupts over event channels.

This creates issues for PCI passthrough from a PVH domain, as some of the passthrough-related hypercalls use pIRQs as references to physical interrupts on the system. One such interface is XEN_DOMCTL_irq_permission, used to grant or revoke access to interrupts, which takes a pIRQ as the reference to the interrupt to be adjusted.

Since PVH doesn't manage interrupts in terms of pIRQs, introduce a new hypercall that allows setting interrupt permissions based on a GSI value rather than a pIRQ.

Note the GSI hypercall parameter is translated to an IRQ value (in case there are ACPI overrides) before doing the checks.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
CC: Daniel P . Smith
Remaining comment @Daniel P . Smith:
+ret = -EPERM;
+if ( !irq_access_permitted(currd, irq) ||
+ xsm_irq_permission(XSM_HOOK, d, irq, flags) )
+break;
Is it okay to issue the XSM check using the translated value (irq), not the one (gsi) that was originally passed into the hypercall?
---
v13->v15 changes:
Change to use the commit message written by Roger.
Change the code comment from "Check all bits are zero except lowest bit" to "Check only valid bits are set".
Change the final return statement of gsi_2_irq to "return irq ?: -EINVAL;" to preserve the error code from apic_pin_2_gsi_irq().

v12->v13 changes:
For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change its type from uint8_t to uint32_t, delete "pad", and add the XEN_DOMCTL_GSI_REVOKE and XEN_DOMCTL_GSI_GRANT macros.
Move "gsi > highest_gsi()" into function gsi_2_irq.
Change the parameter gsi of functions gsi_2_irq and mp_find_ioapic to unsigned int.
Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK".
Delete unnecessary goto statements and change to direct break.
Add a description to the commit message to explain how gsi is converted to irq.

v11->v12 changes:
Change nr_irqs_gsi to highest_gsi() to check the gsi boundary, which requires removing "__init" from the highest_gsi function.
Change the check of the irq boundary from <0 to <=0, and remove an unnecessary space.
Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get the lowest bit.

v10->v11 changes:
Extracted from patch#5 of v10 into a separate patch.
Add a non-zero check for the other bits of allow_access.
Delete the unnecessary check "if ( is_pv_domain(currd) || has_pirq(currd) )".
Change the error exit path identifier "out" to "gsi_permission_out".
Use ARRAY_SIZE() instead of open coding.

v9->v10 changes:
Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission.
Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, and used currd instead of current->domain.
In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original new code, and error handling for irq0 was added.
Deleted the extra spaces in the upper and lower lines of the struct xen_domctl_gsi_permission definition.

v8->v9 changes:
Change the commit message to describe in more detail why we need this new hypercall.
Add a comment above "if ( is_pv_domain(current->domain) || has_pirq(current->domain) )" to explain why we need this check.
Add gsi_2_irq to transform gsi to irq, instead of assuming gsi == irq.
Add explicit padding to struct xen_domctl_gsi_permission.

v5->v8 changes:
Nothing.

v4->v5 changes:
New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi.
--- xen/arch/x86/domctl.c | 29 + xen/arch/x86/include/asm/io_apic.h | 2 ++ xen/arch/x86/io_apic.c | 19 +++ xen/arch/x86/mpparse.c | 7 +++ xen/include/public/domctl.h| 10 ++ xen/xsm/flask/hooks.c | 1 + 6 files changed, 64 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index 68b5b46d1a83..939b1de0ee59 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -36,6 +36,7 @@ #include #include #include +#include static int update_domain_cpu_policy(struct domain *d, xen_domctl_cpu_policy_t *xdpc) @@ -237,6 +238,34 @@ long arch_do_domctl( break; } +case XEN_DOMCTL_gsi_permission: +{ +int irq; +unsigned int gsi = domctl->u.gsi_permission.gsi; +uint32_t flags = domctl->u.gsi_permission.flags; + +/* Check only valid bits are set */ +ret = -EINVAL; +if ( flags & ~XEN_DOMCTL_GSI_ACTION_MASK ) +break; + +ret = irq = gsi_2_
[XEN PATCH v15 0/4] Support device passthrough when dom0 is PVH on Xen
Hi All, This is the v15 series to support passthrough when dom0 is PVH. v14->v15 changes: Since patch#1 of v14 has been merged, the sequence numbers of the following patches are one lower than in v14. * patch#1: Change to use the commit message written by Roger. * patch#2: Change to use the commit message written by Roger. Change the code comment from "Check all bits are zero except lowest bit" to "Check only valid bits are set". Change the final return statement of gsi_2_irq to "return irq ?: -EINVAL;" to preserve the error code from apic_pin_2_gsi_irq(). * patch#3: Add "Reviewed-by: Anthony PERARD " * patch#4: Change the initialization of "struct physdev_map_pirq map" in function xc_physdev_map_pirq_gsi to set the values directly at the definition. Change code from "rc = libxl__arch_local_domain_has_pirq_notion(gc); if (!rc) {}" to "if (libxl__arch_local_domain_has_pirq_notion(gc) == false) {}" Modified some log print code. Best regards, Jiqian Chen v13->v14 changes: * patch#1: Removed the check ( !is_pci_passthrough_enabled() ). Added if ( dev_reset.flags & ~PCI_DEVICE_RESET_MASK ) to check if the other bits are zero. * patch#2: Modified the commit message. Since patch#3 of v13 has been merged, the sequence numbers of the following patches are one lower than in v13. * patch#3~5: No changes. v12->v13 changes: Due to major changes in the code, all the Reviewed-by tags received before have been removed. Please review them again. * patch#1: Delete all "state" words in new code, because they are not necessary. Delete unnecessary parameter reset_type of function vpci_reset_device, and changed this function to inline function. Add description to commit message to indicate that the classification of reset types is for possible different behaviors in the future. Rename reset_type of struct pci_device_reset to flags, and modified the value of macro definition of reset, let them occupy two lowest bits. 
Change the function vpci_reset_device to an inline function and delete the "ASSERT(rw_is_write_locked(&pdev->domain->pci_lock))"; because this exists in subsequent functions and it accesses domain and pci_lock, which will affect the compilation process. * patch#2: Remove the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and added a corresponding description in the commit message. * patch#3: Add more detailed descriptions into commit message not just callstack. * patch#4: For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change its type from uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and XEN_DOMCTL_GSI_GRANT macros. Move "gsi > highest_gsi()" into function gsi_2_irq. Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int type. Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK". Delete unnecessary goto statements and change to direct break. Add description in commit message to explain how gsi to irq is converted. * patch#5: Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi to avoid confusion with physdev namespace. Move the implementation of xc_pcidev_get_gsi into xc_linux.c. Directly use xencall_fd(xch->xcall) in the function xc_pcidev_get_gsi instead of opening "privcmd". * patch#6: Delete patch #6 of v12, and added function xc_physdev_map_pirq_gsi to map pirq for gsi. For functions that generate libxl error, changed the return value from -1 to ERROR_*. Instead of declaring "ctx", use the macro "CTX". Add the function libxl__arch_local_domain_has_pirq_notion to determine if there is a concept of pirq in the domain where xl is located. In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use map_pirq to obtain the pirq corresponding to gsi. v11->v12 changes: * patch#1: Change the title of this patch. Remove unnecessary notes, erroneous stamps, and #define. * patch#2: Avoid using return, set error code instead when (un)map is not allowed. 
Due to functional change in v11, remove the Reviewed-by of Stefano. * patch#3: Add more detailed descriptions into commit message not just callstack. patch#4 in v11: remove from this series and upstream individually. * patch#4: is patch#5 of v11, change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to remove "__init" of highest_gsi function. Change the check
[XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
When dom0 is PVH type and passthrough a device to HVM domU, Qemu code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done-> xc_physdev_map_pirq map a pirq for passthrough devices. In xc_physdev_map_pirq call stack, function hvm_physdev_op has a check has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ flag, so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in current codes. But it is fine to map interrupts through pirq to a HVM domain whose XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as a way to reference interrupts and it is just the way for the device model to identify which interrupt should be mapped to which domain, however has_pirq() is just to check if HVM domains route interrupts from devices(emulated or passthrough) through event channel, so, the has_pirq() check should not be applied to the PHYSDEVOP_map_pirq issued by dom0. So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. Then the interrupt of a passthrough device can be successfully mapped to pirq for domU. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- v13->v14 changes: Modified the commit message. v12->v13 changes: Removed the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and added a corresponding description in the commit message. v11->v12 changes: Avoid using return, set error code instead when (un)map is not allowed. v10->v11 changes: Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq from being executed when domU has no pirq, instead of just preventing self-mapping. And modify the description of the commit message accordingly. v9->v10 changes: Indent the comments above PHYSDEVOP_map_pirq according to the code style. v8->v9 changes: Add a comment above PHYSDEVOP_map_pirq to describe why need this hypercall. 
Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" to "d == current->domain". v7->v8 changes: Add the domid check(domid == DOMID_SELF) to prevent self map when guest doesn't use pirq. That check was missed in the previous version. v6->v7 changes: Nothing. v5->v6 changes: Nothing. v4->v5 changes: Move the check of self map_pirq to physdev.c, and change to check if the caller has PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op. v3->v4 changes: add check to prevent PVH self map. v2->v3 changes: Due to changes in the implementation of the second patch on kernel side(that it will do setup_gsi and map_pirq when assigning a device to passthrough), add PHYSDEVOP_setup_gsi for PVH dom0, and we need to support self mapping. --- xen/arch/x86/hvm/hypercall.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index f023f7879e24..81883c8d4f60 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { case PHYSDEVOP_map_pirq: case PHYSDEVOP_unmap_pirq: +break; + case PHYSDEVOP_eoi: case PHYSDEVOP_irq_status_query: case PHYSDEVOP_get_free_pirq: -- 2.34.1
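The effect of the two added lines can be pictured with a small standalone model of the dispatch. This is only a sketch under stated assumptions: the enum values and the toy_hvm_physdev_filter name are hypothetical, and the has_pirq() test on the remaining cases (with -ENOSYS as the failure code) is inferred from the commit message, since the hunk above does not show it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command numbers for the toy model; not Xen's real values. */
enum toy_physdev_cmd {
    TOY_PHYSDEVOP_map_pirq,
    TOY_PHYSDEVOP_unmap_pirq,
    TOY_PHYSDEVOP_eoi,
    TOY_PHYSDEVOP_irq_status_query,
    TOY_PHYSDEVOP_get_free_pirq,
    TOY_PHYSDEVOP_unknown,
};

/*
 * Returns 0 if the command may proceed to the common handler,
 * or -ENOSYS if it is filtered out for this caller.
 */
int toy_hvm_physdev_filter(enum toy_physdev_cmd cmd, bool caller_has_pirq)
{
    switch ( cmd )
    {
    case TOY_PHYSDEVOP_map_pirq:
    case TOY_PHYSDEVOP_unmap_pirq:
        /* With the patch applied: no has_pirq() check on the caller. */
        break;

    case TOY_PHYSDEVOP_eoi:
    case TOY_PHYSDEVOP_irq_status_query:
    case TOY_PHYSDEVOP_get_free_pirq:
        /* Assumed pre-existing check, still applied to these commands. */
        if ( !caller_has_pirq )
            return -ENOSYS;
        break;

    default:
        return -ENOSYS;
    }

    return 0;
}
```

In this model a PVH dom0 corresponds to caller_has_pirq == false: map and unmap pass the filter, while the other pIRQ related commands are still rejected.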
[RFC XEN PATCH v14 4/5] tools: Add new function to get gsi from dev
When passing through a device to domU, QEMU and xl tools use its gsi number to do pirq mapping, see QEMU code xen_pt_realize->xc_physdev_map_pirq, and xl code pci_add_dm_done->xc_physdev_map_pirq, but the gsi number is obtained from the file /sys/bus/pci/devices//irq, which is wrong, because irq is not equal to gsi, they are in different spaces, so pirq mapping fails. And in the current code, there is no method for userspace to get the gsi. For the above purpose, add a new function to get gsi, and the corresponding ioctl is implemented on linux kernel side. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Chen Jiqian --- RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged. https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/ --- v13->v14 changes: No. v12->v13 changes: Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi to avoid confusion with physdev namespace. Move the implementation of xc_pcidev_get_gsi into xc_linux.c. Directly use xencall_fd(xch->xcall) in the function xc_pcidev_get_gsi instead of opening "privcmd". v11->v12 changes: Nothing. v10->v11 changes: Patch#4 of v10, directly open "/dev/xen/privcmd" in the function xc_physdev_gsi_from_dev instead of adding unnecessary functions to libxencall. Change the type of gsi in the structure privcmd_gsi_from_dev from int to u32. v9->v10 changes: Extract the implementation of xc_physdev_gsi_from_dev to be a new patch. 
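The new function takes the device as a single 32-bit sbdf value. As a rough illustration of how a caller might build that value, the sketch below packs segment, bus, device and function into one word. The bit layout (segment in bits 31..16, bus in 15..8, device in 7..3, function in 2..0) is an assumption based on the usual PCI convention and is not spelled out in this patch; the toy_* helper names are hypothetical.

```c
#include <stdint.h>

/* Pack segment/bus/device/function into a 32-bit sbdf (assumed layout). */
uint32_t toy_make_sbdf(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t func)
{
    return ((uint32_t)seg << 16) |
           ((uint32_t)bus << 8) |
           ((uint32_t)(dev & 0x1f) << 3) |
           (uint32_t)(func & 0x7);
}

/* Decode helpers for the same assumed layout. */
uint16_t toy_sbdf_seg(uint32_t sbdf)  { return (uint16_t)(sbdf >> 16); }
uint8_t  toy_sbdf_bus(uint32_t sbdf)  { return (uint8_t)((sbdf >> 8) & 0xff); }
uint8_t  toy_sbdf_dev(uint32_t sbdf)  { return (uint8_t)((sbdf >> 3) & 0x1f); }
uint8_t  toy_sbdf_func(uint32_t sbdf) { return (uint8_t)(sbdf & 0x7); }
```

Under that assumption, device 0000:03:00.0 would be encoded as 0x300 and then handed to xc_pcidev_get_gsi() as its sbdf argument; a negative return from that function indicates failure.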
--- tools/include/xen-sys/Linux/privcmd.h | 7 +++ tools/include/xenctrl.h | 2 ++ tools/libs/ctrl/xc_freebsd.c | 6 ++ tools/libs/ctrl/xc_linux.c| 20 tools/libs/ctrl/xc_minios.c | 6 ++ tools/libs/ctrl/xc_netbsd.c | 6 ++ tools/libs/ctrl/xc_solaris.c | 6 ++ 7 files changed, 53 insertions(+) diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h index bc60e8fd55eb..607dfa2287bc 100644 --- a/tools/include/xen-sys/Linux/privcmd.h +++ b/tools/include/xen-sys/Linux/privcmd.h @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource { __u64 addr; } privcmd_mmap_resource_t; +typedef struct privcmd_pcidev_get_gsi { + __u32 sbdf; + __u32 gsi; +} privcmd_pcidev_get_gsi_t; + /* * @cmd: IOCTL_PRIVCMD_HYPERCALL * @arg: &privcmd_hypercall_t @@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource { _IOC(_IOC_NONE, 'P', 6, sizeof(domid_t)) #define IOCTL_PRIVCMD_MMAP_RESOURCE\ _IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t)) +#define IOCTL_PRIVCMD_PCIDEV_GET_GSI \ + _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_pcidev_get_gsi_t)) #define IOCTL_PRIVCMD_UNIMPLEMENTED\ _IOC(_IOC_NONE, 'P', 0xFF, 0) diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 2c4608c09ab0..924f9a35f790 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1642,6 +1642,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch, uint32_t domid, int pirq); +int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf); + /* * LOGGING AND ERROR REPORTING */ diff --git a/tools/libs/ctrl/xc_freebsd.c b/tools/libs/ctrl/xc_freebsd.c index 9dd48a3a08bb..9019fc663361 100644 --- a/tools/libs/ctrl/xc_freebsd.c +++ b/tools/libs/ctrl/xc_freebsd.c @@ -60,6 +60,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size) return ptr; } +int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf) +{ +errno = ENOSYS; +return -1; +} + /* * Local variables: * mode: C diff --git a/tools/libs/ctrl/xc_linux.c b/tools/libs/ctrl/xc_linux.c index c67c71c08be3..92591e49a1c8 
100644 --- a/tools/libs/ctrl/xc_linux.c +++ b/tools/libs/ctrl/xc_linux.c @@ -66,6 +66,26 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size) return ptr; } +int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf) +{ +int ret; +privcmd_pcidev_get_gsi_t dev_gsi = { +.sbdf = sbdf, +.gsi = 0, +}; + +ret = ioctl(xencall_fd(xch->xcall), +IOCTL_PRIVCMD_PCIDEV_GET_GSI, &dev_gsi); + +if (ret < 0) { +PERROR("Failed to get gsi from dev"); +} else { +ret = dev_gsi.gsi; +} + +return ret; +} + /* * Local variables: * mode: C diff --git a/tools/libs/ctrl/xc_minios.c b/tools/libs/ctrl/xc_minios.c index 3dea7a78a576..462af827b33c 100644 --- a/tools/libs/ctrl/xc_minios.c +++ b/tools/libs/ctrl/xc_minios.c @@ -47,6 +47,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size) return memalign(alignment, size); } +int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf) +{ +errno = ENOSYS; +return -1; +} + /* * Local variables: * mod
[XEN PATCH v14 1/5] xen/pci: Add hypercall to support reset of pcidev
When a device has been reset on dom0 side, the Xen hypervisor doesn't get notification, so the cached state in vpci is all out of date compare with the real device state. To solve that problem, add a new hypercall to support the reset of pcidev and clear the vpci state of device. So that once the state of device is reset on dom0 side, dom0 can call this hypercall to notify hypervisor. The behavior of different reset types may be different in the future, so divide them now so that they can be easily modified in the future without affecting the hypercall interface. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- v13->v14 changes: Removed the check ( !is_pci_passthrough_enabled() ). Added if ( dev_reset.flags & ~PCI_DEVICE_RESET_MASK ) to check if the other bits are zero. v12->v13 changes: Deleted all "state" words in new code, because it is not necessary. Deleted unnecessary parameter reset_type of function vpci_reset_device, and changed this function to inline function Added description to commit message to indicate that the classification of reset types is for possible different behaviors in the future Renamed reset_type of struct pci_device_reset to flags, and modified the value of macro definition of reset, let them occupy two lowest bits. Change the function vpci_reset_device to an inline function and delete the ASSERT(rw_is_write_locked(&pdev->domain->pci_lock)); because this call exists in subsequent functions and it accesses domain and pci_lock, which will affect the compilation process. v11->v12 changes: Change the title of this patch(Add hypercall to support reset of pcidev). Remove unnecessary notes, erroneous stamps, and #define. v10->v11 changes: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to the next line. Delete unnecessary local variables "struct physdev_pci_device *dev". Downgrade printk to dprintk. Moved struct pci_device_state_reset to the public header file. 
Delete enum pci_device_state_reset_type, and use macro definitions to represent different reset types. Delete pci_device_state_reset_method, and add switch cases in PHYSDEVOP_pci_device_state_reset to handle different reset functions. Add reset type as a function parameter for vpci_reset_device_state for possible future use. v9->v10 changes: Nothing. v8->v9 changes: Move pcidevs_unlock below write_lock, and remove "ASSERT(pcidevs_locked());" from vpci_reset_device_state; Add pci_device_state_reset_type to distinguish the reset types. v7->v8 changes: Nothing. v6->v7 changes: Nothing. v5->v6 changes: Rebase code and change old function vpci_remove_device, vpci_add_handlers to vpci_deassign_device, vpci_assign_device. v4->v5 changes: Add pci_lock wrap function vpci_reset_device_state. v3->v4 changes: Change the comment of PHYSDEVOP_pci_device_state_reset; Move printings behind pcidevs_unlock. v2->v3 changes: Move the content out of pci_reset_device_state and delete pci_reset_device_state; Add xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset; Add description for PHYSDEVOP_pci_device_state_reset; for patch 1 --- xen/arch/x86/hvm/hypercall.c | 1 + xen/drivers/pci/physdev.c| 52 xen/include/public/physdev.h | 17 xen/include/xen/vpci.h | 6 + 4 files changed, 76 insertions(+) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 44342e7e7fc3..f023f7879e24 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_pci_mmcfg_reserved: case PHYSDEVOP_pci_device_add: case PHYSDEVOP_pci_device_remove: +case PHYSDEVOP_pci_device_reset: case PHYSDEVOP_dbgp_op: if ( !is_hardware_domain(currd) ) return -ENOSYS; diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c index 42db3e6d133c..0161a85e1e9c 100644 --- a/xen/drivers/pci/physdev.c +++ b/xen/drivers/pci/physdev.c @@ -2,6 +2,7 @@ #include #include #include +#include 
#ifndef COMPAT typedef long ret_t; @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } +case PHYSDEVOP_pci_device_reset: +{ +struct pci_device_reset dev_reset; +struct pci_dev *pdev; +pci_sbdf_t sbdf; + +ret = -EFAULT; +if ( copy_from_guest(&dev_reset, arg, 1) != 0 ) +break; + +ret = -EINVAL; +if ( dev_reset.flags & ~PCI_DEVICE_RESET_MASK ) +break; + +sbdf = PCI_SBDF(dev_reset.dev.seg, +dev_reset.dev.bus, +dev_reset.dev.devfn); + +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf); +if ( ret ) +break; + +pcidevs_lock();
[RFC XEN PATCH v14 5/5] tools: Add new function to do PIRQ (un)map on PVH dom0
When dom0 is PVH and a device is passed through to domU, xl will use the gsi number of the device to do a pirq mapping, see pci_add_dm_done->xc_physdev_map_pirq, but the gsi number is obtained from the file /sys/bus/pci/devices//irq, which confuses irq and gsi: they are in different spaces and are not equal, so the mapping fails. To solve this issue, use xc_physdev_gsi_from_dev to get the real gsi and add a new function xc_physdev_map_pirq_gsi to get a free pirq for the gsi (the current function xc_physdev_map_pirq is not used because it doesn't support allocating a free pirq, and adding a new function avoids changing it and affecting its callers). Besides, PVH dom0 doesn't have the PIRQ flag, so it doesn't do PHYSDEVOP_map_pirq for each gsi. So the grant callstack pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function domain_pirq_to_irq. And the old hypercall XEN_DOMCTL_irq_permission requires passing in a pirq, so it is not suitable for a dom0 that doesn't have PIRQs to grant irq permission. To solve this issue, use the new hypercall XEN_DOMCTL_gsi_permission to grant the permission of irq (translated from gsi) to domU when dom0 has no PIRQs. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Chen Jiqian --- RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged. https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/ --- v13->v14 changes: No. v12->v13 changes: Deleted patch #6 of v12, and added function xc_physdev_map_pirq_gsi to map pirq for gsi. For functions that generate libxl error, changed the return value from -1 to ERROR_*. Instead of declaring "ctx", use the macro "CTX". Add the function libxl__arch_local_domain_has_pirq_notion to determine if there is a concept of pirq in the domain where xl is located. In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use map_pirq to obtain the pirq corresponding to gsi. v11->v12 changes: Nothing. 
v10->v11 changes: New patch Modification of the tools part of patches#4 and #5 of v10, use privcmd_gsi_from_dev to get gsi, and use XEN_DOMCTL_gsi_permission to grant gsi. Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID. Add libxl__arch_hvm_map_gsi to distinguish x86 related implementations. Add a list pcidev_pirq_list to record the relationship between sbdf and pirq, which can be used to obtain the corresponding pirq when unmap PIRQ. --- tools/include/xenctrl.h | 10 +++ tools/libs/ctrl/xc_domain.c | 15 + tools/libs/ctrl/xc_physdev.c | 27 tools/libs/light/libxl_arch.h | 6 ++ tools/libs/light/libxl_arm.c | 15 + tools/libs/light/libxl_pci.c | 112 -- tools/libs/light/libxl_x86.c | 72 ++ 7 files changed, 212 insertions(+), 45 deletions(-) diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 924f9a35f790..29617585c535 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1383,6 +1383,11 @@ int xc_domain_irq_permission(xc_interface *xch, uint32_t pirq, bool allow_access); +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + uint32_t flags); + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, @@ -1638,6 +1643,11 @@ int xc_physdev_map_pirq_msi(xc_interface *xch, int entry_nr, uint64_t table_base); +int xc_physdev_map_pirq_gsi(xc_interface *xch, +uint32_t domid, +int gsi, +int *pirq); + int xc_physdev_unmap_pirq(xc_interface *xch, uint32_t domid, int pirq); diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c index f2d9d14b4d9f..e3538ec0ba80 100644 --- a/tools/libs/ctrl/xc_domain.c +++ b/tools/libs/ctrl/xc_domain.c @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch, return do_domctl(xch, &domctl); } +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + uint32_t flags) +{ +struct xen_domctl domctl = { +.cmd = XEN_DOMCTL_gsi_permission, +.domain = domid, +.u.gsi_permission.gsi = gsi, 
+.u.gsi_permission.flags = flags, +}; + +return do_domctl(xch, &domctl); +} + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c index 460a8e779
[XEN PATCH v14 3/5] x86/domctl: Add hypercall to set the access of x86 gsi
Some types of domains, like PVH, don't have PIRQs and don't do PHYSDEVOP_map_pirq for each gsi. When passing through a device to a guest on a PVH dom0, callstack pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and irq on Xen side. What's more, the current hypercall XEN_DOMCTL_irq_permission requires passing in a pirq to set the access of irq, so it is not suitable for a dom0 that doesn't have PIRQs. So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/revoke the permission of irq (translated from x86 gsi) to domU when dom0 has no PIRQs. Regarding the translation from gsi to irq: if there are ACPI override entries, the translation is taken from them; otherwise gsi values are identity mapped to irq. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- CC: Daniel P . Smith Remaining unsolved comment @Daniel P . Smith: +ret = -EPERM; +if ( !irq_access_permitted(currd, irq) || + xsm_irq_permission(XSM_HOOK, d, irq, flags) ) +break; Is it okay to issue the XSM check using the translated value (irq), not the one (gsi) that was originally passed into the hypercall? --- v13->v14 changes: No. v12->v13 changes: For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change its type from uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and XEN_DOMCTL_GSI_GRANT macros. Move "gsi > highest_gsi()" into function gsi_2_irq. Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int type. Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK". Delete unnecessary goto statements and change to direct break. Add description in commit message to explain how gsi to irq is converted. v11->v12 changes: Change nr_irqs_gsi to highest_gsi() to check the gsi boundary, which requires removing "__init" from the highest_gsi function. Change the check of irq boundary from <0 to <=0, and remove unnecessary space. 
Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit. v10->v11 changes: Extracted from patch#5 of v10 into a separate patch. Add a check that the other bits of allow_access are zero. Delete unnecessary check "if ( is_pv_domain(currd) || has_pirq(currd) )". Change the error exit path identifier "out" to "gsi_permission_out". Use ARRAY_SIZE() instead of open coding. v9->v10 changes: Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission. Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, and used currd instead of current->domain. In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original new code, and error handling for irq0 was added. Deleted the extra spaces in the upper and lower lines of the struct xen_domctl_gsi_permission definition. v8->v9 changes: Change the commit message to describe more why we need this new hypercall. Add comment above "if ( is_pv_domain(current->domain) || has_pirq(current->domain) )" to explain why we need this check. Add gsi_2_irq to transform gsi to irq, instead of considering gsi == irq. Add explicit padding to struct xen_domctl_gsi_permission. v5->v8 changes: Nothing. v4->v5 changes: New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi. 
--- xen/arch/x86/domctl.c | 29 + xen/arch/x86/include/asm/io_apic.h | 2 ++ xen/arch/x86/io_apic.c | 21 + xen/arch/x86/mpparse.c | 7 +++ xen/include/public/domctl.h| 10 ++ xen/xsm/flask/hooks.c | 1 + 6 files changed, 66 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index 68b5b46d1a83..60b5578c47f8 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -36,6 +36,7 @@ #include #include #include +#include static int update_domain_cpu_policy(struct domain *d, xen_domctl_cpu_policy_t *xdpc) @@ -237,6 +238,34 @@ long arch_do_domctl( break; } +case XEN_DOMCTL_gsi_permission: +{ +int irq; +unsigned int gsi = domctl->u.gsi_permission.gsi; +uint32_t flags = domctl->u.gsi_permission.flags; + +/* Check all bits are zero except lowest bit */ +ret = -EINVAL; +if ( flags & ~XEN_DOMCTL_GSI_ACTION_MASK ) +break; + +ret = irq = gsi_2_irq(gsi); +if ( ret <= 0 ) +break; + +ret = -EPERM; +if ( !irq_access_permitted(currd, irq) || + xsm_irq_permission(XSM_HOOK, d, irq, flags) ) +break; + +if ( flags ) +ret = irq_permit_access(d, irq); +else +ret = irq_deny_access(d,
[XEN PATCH v14 0/5] Support device passthrough when dom0 is PVH on Xen
Hi All, This is the v14 series to support passthrough when dom0 is PVH. The expected merge order of this series is the first two patches in this series, then patches on kernel side, then the last three patches in this series. v13->v14 changes: * patch#1: Removed the check ( !is_pci_passthrough_enabled() ). Added if ( dev_reset.flags & ~PCI_DEVICE_RESET_MASK ) to check if the other bits are zero. * patch#2: Modified the commit message. Since patch#3 of v13 has been merged, the sequence numbers of the following patches are one lower than in v13. * patch#3~5: No changes. Best regards, Jiqian Chen v12->v13 changes: Due to major changes in the code, all the Reviewed-by tags received before have been removed. Please review them again. * patch#1: Delete all "state" words in new code, because they are not necessary. Delete unnecessary parameter reset_type of function vpci_reset_device, and changed this function to inline function. Add description to commit message to indicate that the classification of reset types is for possible different behaviors in the future. Rename reset_type of struct pci_device_reset to flags, and modified the value of macro definition of reset, let them occupy two lowest bits. Change the function vpci_reset_device to an inline function and delete the "ASSERT(rw_is_write_locked(&pdev->domain->pci_lock))"; because this exists in subsequent functions and it accesses domain and pci_lock, which will affect the compilation process. * patch#2: Remove the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and added a corresponding description in the commit message. * patch#3: Add more detailed descriptions into commit message not just callstack. * patch#4: For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change its type from uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and XEN_DOMCTL_GSI_GRANT macros. Move "gsi > highest_gsi()" into function gsi_2_irq. 
Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int type. Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK". Delete unnecessary goto statements and change to direct break. Add description in commit message to explain how gsi to irq is converted. * patch#5: Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi to avoid confusion with physdev namespace. Move the implementation of xc_pcidev_get_gsi into xc_linux.c. Directly use xencall_fd(xch->xcall) in the function xc_pcidev_get_gsi instead of opening "privcmd". * patch#6: Delete patch #6 of v12, and added function xc_physdev_map_pirq_gsi to map pirq for gsi. For functions that generate libxl error, changed the return value from -1 to ERROR_*. Instead of declaring "ctx", use the macro "CTX". Add the function libxl__arch_local_domain_has_pirq_notion to determine if there is a concept of pirq in the domain where xl is located. In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use map_pirq to obtain the pirq corresponding to gsi. v11->v12 changes: * patch#1: Change the title of this patch. Remove unnecessary notes, erroneous stamps, and #define. * patch#2: Avoid using return, set error code instead when (un)map is not allowed. Due to functional change in v11, remove the Reviewed-by of Stefano. * patch#3: Add more detailed descriptions into commit message not just callstack. patch#4 in v11: remove from this series and upstream individually. * patch#4: is patch#5 of v11, change nr_irqs_gsi to highest_gsi() to check the gsi boundary, which requires removing "__init" from the highest_gsi function. Change the check of irq boundary from <0 to <=0, and remove unnecessary space. Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit. * patch#5: Add explanation of whether the caller of xc_physdev_map_pirq is affected. v10->v11 changes: * patch#1: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to the next line. 
Delete unnecessary local variables "struct physdev_pci_device *dev". Downgrade printk to dprintk. Moved struct pci_device_state_reset to the public header file. Delete enum pci_device_state_reset_type, and use macro definitions to represent different reset types. Delete pci_device_state_reset_method, and add switch cases in PHYSDEVOP_pci_device_state_reset to handle different reset functions. Add reset type as a
[RFC XEN PATCH v13 6/6] tools: Add new function to do PIRQ (un)map on PVH dom0
When dom0 is PVH and a device is passed through to domU, xl will use the gsi number of the device to do a pirq mapping, see pci_add_dm_done->xc_physdev_map_pirq, but the gsi number is obtained from the file /sys/bus/pci/devices//irq, which confuses irq and gsi: they are in different spaces and are not equal, so the mapping fails. To solve this issue, use xc_physdev_gsi_from_dev to get the real gsi and add a new function xc_physdev_map_pirq_gsi to get a free pirq for the gsi (the current function xc_physdev_map_pirq is not used because it doesn't support allocating a free pirq, and adding a new function avoids changing it and affecting its callers). Besides, PVH dom0 doesn't have the PIRQ flag, so it doesn't do PHYSDEVOP_map_pirq for each gsi. So the grant callstack pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function domain_pirq_to_irq. And the old hypercall XEN_DOMCTL_irq_permission requires passing in a pirq, so it is not suitable for a dom0 that doesn't have PIRQs to grant irq permission. To solve this issue, use the new hypercall XEN_DOMCTL_gsi_permission to grant the permission of irq (translated from gsi) to domU when dom0 has no PIRQs. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Chen Jiqian --- RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged. https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/ --- v12->v13 changes: Deleted patch #6 of v12, and added function xc_physdev_map_pirq_gsi to map pirq for gsi. For functions that generate libxl error, changed the return value from -1 to ERROR_*. Instead of declaring "ctx", use the macro "CTX". Add the function libxl__arch_local_domain_has_pirq_notion to determine if there is a concept of pirq in the domain where xl is located. In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use map_pirq to obtain the pirq corresponding to gsi. v11->v12 changes: Nothing. 

v10->v11 changes:
New patch.
Modification of the tools part of patches #4 and #5 of v10, use privcmd_gsi_from_dev to get gsi, and use XEN_DOMCTL_gsi_permission to grant gsi.
Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID.
Add libxl__arch_hvm_map_gsi to distinguish x86 related implementations.
Add a list pcidev_pirq_list to record the relationship between sbdf and pirq, which can be used to obtain the corresponding pirq when unmapping a PIRQ.
---
 tools/include/xenctrl.h       | 10 +++
 tools/libs/ctrl/xc_domain.c   | 15 +
 tools/libs/ctrl/xc_physdev.c  | 27
 tools/libs/light/libxl_arch.h | 6 ++
 tools/libs/light/libxl_arm.c  | 15 +
 tools/libs/light/libxl_pci.c  | 112 --
 tools/libs/light/libxl_x86.c  | 72 ++
 7 files changed, 212 insertions(+), 45 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 82de6748f7a7..c798472995f7 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
                              uint32_t pirq,
                              bool allow_access);

+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             uint32_t flags);
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
@@ -1637,6 +1642,11 @@ int xc_physdev_map_pirq_msi(xc_interface *xch,
                             int entry_nr,
                             uint64_t table_base);

+int xc_physdev_map_pirq_gsi(xc_interface *xch,
+                            uint32_t domid,
+                            int gsi,
+                            int *pirq);
+
 int xc_physdev_unmap_pirq(xc_interface *xch,
                           uint32_t domid,
                           int pirq);
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..e3538ec0ba80 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
     return do_domctl(xch, &domctl);
 }

+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             uint32_t flags)
+{
+    struct xen_domctl domctl = {
+        .cmd = XEN_DOMCTL_gsi_permission,
+        .domain = domid,
+        .u.gsi_permission.gsi = gsi,
+        .u.gsi_permission.flags = flags,
+    };
+
+    return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779ce8..c752cd1f4410 100644
-
[RFC XEN PATCH v13 5/6] tools: Add new function to get gsi from dev
When passing through a device to a domU, QEMU and the xl tools use its gsi number to do pirq mapping (see QEMU code xen_pt_realize->xc_physdev_map_pirq and xl code pci_add_dm_done->xc_physdev_map_pirq). However, the gsi number is read from the file /sys/bus/pci/devices//irq, which is wrong: irq is not equal to gsi, they are in different number spaces, so pirq mapping fails. The current code also gives userspace no way to get the gsi.

To address this, add a new function to get the gsi; the corresponding ioctl is implemented on the linux kernel side.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Chen Jiqian
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
---
v12->v13 changes:
Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi to avoid confusion with physdev namespace.
Move the implementation of xc_pcidev_get_gsi into xc_linux.c.
Directly use xencall_fd(xch->xcall) in the function xc_pcidev_get_gsi instead of opening "privcmd".

v11->v12 changes:
Nothing.

v10->v11 changes:
Patch#4 of v10, directly open "/dev/xen/privcmd" in the function xc_physdev_gsi_from_dev instead of adding unnecessary functions to libxencall.
Change the type of gsi in the structure privcmd_gsi_from_dev from int to u32.

v9->v10 changes:
Extract the implementation of xc_physdev_gsi_from_dev to be a new patch.
---
 tools/include/xen-sys/Linux/privcmd.h | 7 +++
 tools/include/xenctrl.h               | 2 ++
 tools/libs/ctrl/xc_freebsd.c          | 6 ++
 tools/libs/ctrl/xc_linux.c            | 20
 tools/libs/ctrl/xc_minios.c           | 6 ++
 tools/libs/ctrl/xc_netbsd.c           | 6 ++
 tools/libs/ctrl/xc_solaris.c          | 6 ++
 7 files changed, 53 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..607dfa2287bc 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
     __u64 addr;
 } privcmd_mmap_resource_t;

+typedef struct privcmd_pcidev_get_gsi {
+    __u32 sbdf;
+    __u32 gsi;
+} privcmd_pcidev_get_gsi_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
     _IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE \
     _IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_PCIDEV_GET_GSI \
+    _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_pcidev_get_gsi_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED \
     _IOC(_IOC_NONE, 'P', 0xFF, 0)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..82de6748f7a7 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
                           uint32_t domid,
                           int pirq);

+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf);
+
 /*
  * LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/ctrl/xc_freebsd.c b/tools/libs/ctrl/xc_freebsd.c
index 9dd48a3a08bb..9019fc663361 100644
--- a/tools/libs/ctrl/xc_freebsd.c
+++ b/tools/libs/ctrl/xc_freebsd.c
@@ -60,6 +60,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return ptr;
 }

+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+    errno = ENOSYS;
+    return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/ctrl/xc_linux.c b/tools/libs/ctrl/xc_linux.c
index c67c71c08be3..92591e49a1c8 100644
--- a/tools/libs/ctrl/xc_linux.c
+++ b/tools/libs/ctrl/xc_linux.c
@@ -66,6 +66,26 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return ptr;
 }

+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+    int ret;
+    privcmd_pcidev_get_gsi_t dev_gsi = {
+        .sbdf = sbdf,
+        .gsi = 0,
+    };
+
+    ret = ioctl(xencall_fd(xch->xcall),
+                IOCTL_PRIVCMD_PCIDEV_GET_GSI, &dev_gsi);
+
+    if (ret < 0) {
+        PERROR("Failed to get gsi from dev");
+    } else {
+        ret = dev_gsi.gsi;
+    }
+
+    return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/ctrl/xc_minios.c b/tools/libs/ctrl/xc_minios.c
index 3dea7a78a576..462af827b33c 100644
--- a/tools/libs/ctrl/xc_minios.c
+++ b/tools/libs/ctrl/xc_minios.c
@@ -47,6 +47,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, size_t size)
     return memalign(alignment, size);
 }

+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+    errno = ENOSYS;
+    return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/lib
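
The sbdf argument of xc_pcidev_get_gsi is a single 32-bit value. As a minimal sketch, assuming the conventional layout (segment in bits 31..16, bus in bits 15..8, device in bits 7..3, function in bits 2..0), a caller could build and decode it as below. The names pci_sbdf_pack and pci_sbdf_* are hypothetical helpers for illustration; they are not part of libxc or of this patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack segment:bus:device.function into one 32-bit sbdf value.
 * Assumed layout: seg[31:16] bus[15:8] dev[7:3] func[2:0].
 * Hypothetical helper, not a libxc function.
 */
static inline uint32_t pci_sbdf_pack(uint16_t seg, uint8_t bus,
                                     uint8_t dev, uint8_t func)
{
    /* A PCI device number is 5 bits wide and a function number 3 bits. */
    assert(dev < 32 && func < 8);

    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)dev << 3) | (uint32_t)func;
}

/* Decoders mirroring the packing above. */
static inline uint16_t pci_sbdf_seg(uint32_t sbdf)  { return (uint16_t)(sbdf >> 16); }
static inline uint8_t  pci_sbdf_bus(uint32_t sbdf)  { return (sbdf >> 8) & 0xff; }
static inline uint8_t  pci_sbdf_dev(uint32_t sbdf)  { return (sbdf >> 3) & 0x1f; }
static inline uint8_t  pci_sbdf_func(uint32_t sbdf) { return sbdf & 0x7; }
```

For example, device 0000:03:00.0 would pack to 0x00000300, and that value is what would be passed as the sbdf parameter.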
[XEN PATCH v13 1/6] xen/pci: Add hypercall to support reset of pcidev
When a device has been reset on the dom0 side, the Xen hypervisor doesn't get a notification, so the cached state in vpci is out of date compared with the real device state.

To solve that problem, add a new hypercall to support the reset of pcidev and clear the vpci state of the device. Once the state of the device is reset on the dom0 side, dom0 can call this hypercall to notify the hypervisor.

The behavior of different reset types may differ in the future, so separate them now so that they can be modified later without affecting the hypercall interface.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
v12->v13 changes:
Deleted all "state" words in new code, because it is not necessary.
Deleted unnecessary parameter reset_type of function vpci_reset_device, and changed this function to inline function.
Added description to commit message to indicate that the classification of reset types is for possible different behaviors in the future.
Renamed reset_type of struct pci_device_reset to flags, and modified the value of macro definition of reset, let them occupy two lowest bits.
Change the function vpci_reset_device to an inline function and delete the ASSERT(rw_is_write_locked(&pdev->domain->pci_lock)); because this call exists in subsequent functions and it accesses domain and pci_lock, which will affect the compilation process.

v11->v12 changes:
Change the title of this patch (Add hypercall to support reset of pcidev).
Remove unnecessary notes, erroneous stamps, and #define.

v10->v11 changes:
Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to the next line.
Delete unnecessary local variables "struct physdev_pci_device *dev".
Downgrade printk to dprintk.
Moved struct pci_device_state_reset to the public header file.
Delete enum pci_device_state_reset_type, and use macro definitions to represent different reset types.
Delete pci_device_state_reset_method, and add switch cases in PHYSDEVOP_pci_device_state_reset to handle different reset functions.
Add reset type as a function parameter for vpci_reset_device_state for possible future use.

v9->v10 changes:
Nothing.

v8->v9 changes:
Move pcidevs_unlock below write_lock, and remove "ASSERT(pcidevs_locked());" from vpci_reset_device_state;
Add pci_device_state_reset_type to distinguish the reset types.

v7->v8 changes:
Nothing.

v6->v7 changes:
Nothing.

v5->v6 changes:
Rebase code and change old function vpci_remove_device, vpci_add_handlers to vpci_deassign_device, vpci_assign_device.

v4->v5 changes:
Add pci_lock wrap function vpci_reset_device_state.

v3->v4 changes:
Change the comment of PHYSDEVOP_pci_device_state_reset;
Move printings behind pcidevs_unlock.

v2->v3 changes:
Move the content out of pci_reset_device_state and delete pci_reset_device_state;
Add xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset;
Add description for PHYSDEVOP_pci_device_state_reset;
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 xen/drivers/pci/physdev.c    | 52
 xen/include/public/physdev.h | 17
 xen/include/xen/vpci.h       | 6 +
 4 files changed, 76 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index c1bd17571e47..68815b03eb25 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
+    case PHYSDEVOP_pci_device_reset:
     case PHYSDEVOP_dbgp_op:
         if ( !is_hardware_domain(currd) )
             return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..980ff1ba3d07 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include
 #include
 #include
+#include

 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }

+    case PHYSDEVOP_pci_device_reset:
+    {
+        struct pci_device_reset dev_reset;
+        struct pci_dev *pdev;
+        pci_sbdf_t sbdf;
+
+        ret = -EOPNOTSUPP;
+        if ( !is_pci_passthrough_enabled() )
+            break;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+            break;
+
+        sbdf = PCI_SBDF(dev_reset.dev.seg,
+                        dev_reset.dev.bus,
+                        dev_reset.dev.devfn);
+
+        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+        if ( ret )
+            break;
+
+        pcidevs_lock();
+        pdev = pci_get_pdev(NULL, sbdf);
+        if ( !pdev )
+        {
+            pcidevs_unlock();
+            ret = -ENODEV;
+            break;
+        }
+
+        write
[XEN PATCH v13 4/6] x86/domctl: Add hypercall to set the access of x86 gsi
Some types of domains, such as PVH, don't have PIRQs and don't do PHYSDEVOP_map_pirq for each gsi. When passing through a device to a guest based on PVH dom0, the call stack pci_add_dm_done->XEN_DOMCTL_irq_permission fails in function domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and irq on the Xen side. What's more, the current hypercall XEN_DOMCTL_irq_permission requires a pirq to be passed in to set the access of the irq, so it is not suitable for a dom0 that doesn't have PIRQs.

So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/revoke the permission of an irq (translated from an x86 gsi) to a domU when dom0 has no PIRQs.

Regarding the translation from gsi to irq: if there are ACPI override entries, the translation is taken from them; otherwise gsis are identity mapped to irqs.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
CC: Daniel P . Smith
Remaining comment @Daniel P . Smith:
+        ret = -EPERM;
+        if ( !irq_access_permitted(currd, irq) ||
+             xsm_irq_permission(XSM_HOOK, d, irq, flags) )
+            break;
Is it okay to issue the XSM check using the translated value (irq), not the one (gsi) that was originally passed into the hypercall?
---
v12->v13 changes:
For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change its type from uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and XEN_DOMCTL_GSI_GRANT macros.
Move "gsi > highest_gsi()" into function gsi_2_irq.
Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int type.
Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK".
Delete unnecessary goto statements and change to direct break.
Add description in commit message to explain how gsi to irq is converted.

v11->v12 changes:
Change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to remove "__init" of highest_gsi function.
Change the check of irq boundary from <0 to <=0, and remove unnecessary space.
Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.

v10->v11 changes:
Extracted from patch#5 of v10 into a separate patch.
Add non-zero judgment for other bits of allow_access.
Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )".
Change the error exit path identifier "out" to "gsi_permission_out".
Use ARRAY_SIZE() instead of open coding.

v9->v10 changes:
Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission.
Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, and used currd instead of current->domain.
In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original new code, and error handling for irq0 was added.
Deleted the extra spaces in the upper and lower lines of the struct xen_domctl_gsi_permission definition.

v8->v9 changes:
Change the commit message to describe in more detail why we need this new hypercall.
Add comment above "if ( is_pv_domain(current->domain) || has_pirq(current->domain) )" to explain why we need this check.
Add gsi_2_irq to transform gsi to irq, instead of considering gsi == irq.
Add explicit padding to struct xen_domctl_gsi_permission.

v5->v8 changes:
Nothing.

v4->v5 changes:
New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi.
---
 xen/arch/x86/domctl.c              | 29 +
 xen/arch/x86/include/asm/io_apic.h | 2 ++
 xen/arch/x86/io_apic.c             | 21 +
 xen/arch/x86/mpparse.c             | 7 +++
 xen/include/public/domctl.h        | 10 ++
 xen/xsm/flask/hooks.c              | 1 +
 6 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 68b5b46d1a83..60b5578c47f8 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include

 static int update_domain_cpu_policy(struct domain *d,
                                     xen_domctl_cpu_policy_t *xdpc)
@@ -237,6 +238,34 @@ long arch_do_domctl(
         break;
     }

+    case XEN_DOMCTL_gsi_permission:
+    {
+        int irq;
+        unsigned int gsi = domctl->u.gsi_permission.gsi;
+        uint32_t flags = domctl->u.gsi_permission.flags;
+
+        /* Check all bits are zero except lowest bit */
+        ret = -EINVAL;
+        if ( flags & ~XEN_DOMCTL_GSI_ACTION_MASK )
+            break;
+
+        ret = irq = gsi_2_irq(gsi);
+        if ( ret <= 0 )
+            break;
+
+        ret = -EPERM;
+        if ( !irq_access_permitted(currd, irq) ||
+             xsm_irq_permission(XSM_HOOK, d, irq, flags) )
+            break;
+
+        if ( flags )
+            ret = irq_permit_access(d, irq);
+        else
+            ret = irq_deny_access(d, irq);
+
+        break;
+    }
+
 cas
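
The commit message describes two checks that happen before any permission is changed: the flags word may only use its lowest bit, and the gsi is translated to an irq (via an override entry if one exists, otherwise identity mapped). The following is a standalone model of those two steps, not the hypervisor code. The macro values, the override table type, the function names and the -ERANGE return for an out-of-range gsi are all assumptions made for illustration; this excerpt names the macros and gsi_2_irq but does not show their definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; the excerpt names these macros but does not show them. */
#define MODEL_GSI_REVOKE      0u
#define MODEL_GSI_GRANT       1u
#define MODEL_GSI_ACTION_MASK 1u

/* Hypothetical stand-in for one ACPI interrupt override entry. */
struct model_gsi_override {
    unsigned int gsi;
    int irq;
};

/* Returns 1 when only the action bit is used, 0 otherwise. */
static int model_gsi_flags_valid(uint32_t flags)
{
    return (flags & ~MODEL_GSI_ACTION_MASK) == 0;
}

/*
 * Translate a gsi to an irq: use a matching override entry if present,
 * otherwise fall back to the identity mapping. Out-of-range gsis are
 * rejected with -ERANGE (an illustrative choice of error code).
 */
static int model_gsi_2_irq(unsigned int gsi, unsigned int highest_gsi,
                           const struct model_gsi_override *ovr, size_t n)
{
    size_t i;

    assert(ovr != NULL || n == 0);

    if (gsi > highest_gsi)
        return -ERANGE;

    for (i = 0; i < n; i++)
        if (ovr[i].gsi == gsi)
            return ovr[i].irq;

    return (int)gsi;
}
```

With a single override entry mapping gsi 20 to irq 5, gsi 20 translates to 5 while gsi 30 stays 30; a flags value of 2 is rejected because it sets a bit outside the action mask.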
[XEN PATCH v13 3/6] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
The gsi of a passthrough device must be configured before it can be mapped into an hvm domU. But when dom0 is PVH, the gsis may not get registered (see the clarification below). As a result, the apic, pin and irq info is not added to the irq_2_pin list and the handler of irq_desc is not set, so setting the ioapic affinity and vector fails when passing through a device.

To fix the above problem, on the Linux kernel side, new code will need to call PHYSDEVOP_setup_gsi for passthrough devices to register the gsi when dom0 is PVH.

So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for the above purpose.

Clarify two questions:

First, why does the gsi of devices belonging to PVH dom0 work?
Because when a driver is probed for a normal device, it uses the normal probe function of the pci device. In its call stack, it requests the irq and unmasks the corresponding ioapic pin of the gsi, which traps into xen and finally registers the gsi.
The call stack is (on linux kernel side) pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, which then traps into xen hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
So the gsi gets registered.

Second, why doesn't the gsi of a passthrough device work when dom0 is PVH?
Because when a device is assigned for passthrough, it uses the specific probe function of pciback. In its call stack, it doesn't install a fake irq handler because the ISR is not running, so mp_register_gsi on the Xen side is never called and the gsi is not registered.
The call stack is (on linux kernel side) pcistub_probe->pcistub_seize-> pcistub_init_device-> xen_pcibk_reset_device-> xen_pcibk_control_isr->isr_on==0.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0b7fc060b4e2..81883c8d4f60 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -82,6 +82,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ENOSYS;
         break;

+    case PHYSDEVOP_setup_gsi:
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
--
2.34.1
[XEN PATCH v13 2/6] x86/pvh: Allow (un)map_pirq when dom0 is PVH
When running Xen with a PVH dom0 and an hvm domU, hvm will map a pirq for a passthrough device by using its gsi (see qemu code xen_pt_realize->xc_physdev_map_pirq and libxl code pci_add_dm_done->xc_physdev_map_pirq). xc_physdev_map_pirq then calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not allowed, because currd is PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow PHYSDEVOP_unmap_pirq so that the device removal path can unmap the pirq. This way the interrupt of a passthrough device can be successfully mapped to a pirq for a domU with a notion of PIRQ when dom0 is PVH.

This exposes the functionality to a wider audience than is (presently) necessary (such as PVH domU), so it doesn't add any further restrictions. And there are already some scenarios in which domains without X86_EMU_USE_PIRQ use these functions.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
v12->v13 changes:
Removed the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and added a corresponding description in the commit message.

v11->v12 changes:
Avoid using return, set error code instead when (un)map is not allowed.

v10->v11 changes:
Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq from being executed when domU has no pirq, instead of just preventing self-mapping.
And modify the description of the commit message accordingly.

v9->v10 changes:
Indent the comments above PHYSDEVOP_map_pirq according to the code style.

v8->v9 changes:
Add a comment above PHYSDEVOP_map_pirq to describe why this hypercall is needed.
Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" to "d == current->domain".

v7->v8 changes:
Add the domid check (domid == DOMID_SELF) to prevent self map when the guest doesn't use pirq.
That check was missed in the previous version.

v6->v7 changes:
Nothing.

v5->v6 changes:
Nothing.

v4->v5 changes:
Move the check of self map_pirq to physdev.c, and change to check if the caller has PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op.

v3->v4 changes:
Add check to prevent PVH self map.

v2->v3 changes:
Due to changes in the implementation of the second patch on the kernel side (it will do setup_gsi and map_pirq when assigning a device to passthrough), add PHYSDEVOP_setup_gsi for PVH dom0, and we need to support self mapping.
---
 xen/arch/x86/hvm/hypercall.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 68815b03eb25..0b7fc060b4e2 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
--
2.34.1
[XEN PATCH v13 0/6] Support device passthrough when dom0 is PVH on Xen
Hi All,
This is the v13 series to support passthrough when dom0 is PVH.
The expected merge order is: the first three patches in this series, then the patches on the kernel side, then the last three patches in this series.

v12->v13 changes:
Due to major changes in the code, all the Reviewed-by tags received before have been removed. Please review them again.
* patch#1: Delete all "state" words in new code, because it is not necessary.
           Delete unnecessary parameter reset_type of function vpci_reset_device, and changed this function to inline function.
           Add description to commit message to indicate that the classification of reset types is for possible different behaviors in the future.
           Rename reset_type of struct pci_device_reset to flags, and modified the value of macro definition of reset, let them occupy two lowest bits.
           Change the function vpci_reset_device to an inline function and delete the "ASSERT(rw_is_write_locked(&pdev->domain->pci_lock))"; because this exists in subsequent functions and it accesses domain and pci_lock, which will affect the compilation process.
* patch#2: Remove the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and added a corresponding description in the commit message.
* patch#3: Add more detailed descriptions into commit message not just callstack.
* patch#4: For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change its type from uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and XEN_DOMCTL_GSI_GRANT macros.
           Move "gsi > highest_gsi()" into function gsi_2_irq.
           Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int type.
           Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK".
           Delete unnecessary goto statements and change to direct break.
           Add description in commit message to explain how gsi to irq is converted.
* patch#5: Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi to avoid confusion with physdev namespace.
           Move the implementation of xc_pcidev_get_gsi into xc_linux.c.
           Directly use xencall_fd(xch->xcall) in the function xc_pcidev_get_gsi instead of opening "privcmd".
* patch#6: Delete patch #6 of v12, and added function xc_physdev_map_pirq_gsi to map pirq for gsi.
           For functions that generate libxl error, changed the return value from -1 to ERROR_*.
           Instead of declaring "ctx", use the macro "CTX".
           Add the function libxl__arch_local_romain_has_pirq_notion to determine if there is a concept of pirq in the domain where xl is located.
           In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use map_pirq to obtain the pirq corresponding to gsi.

Best regards,
Jiqian Chen

v11->v12 changes:
* patch#1: Change the title of this patch.
           Remove unnecessary notes, erroneous stamps, and #define.
* patch#2: Avoid using return, set error code instead when (un)map is not allowed.
           Due to functional change in v11, remove the Reviewed-by of Stefano.
* patch#3: Add more detailed descriptions into commit message not just callstack.
patch#4 in v11: remove from this series and upstream individually.
* patch#4: is patch#5 of v11, change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to remove "__init" of highest_gsi function.
           Change the check of irq boundary from <0 to <=0, and remove unnecessary space.
           Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.
* patch#5: Add explanation of whether the caller of xc_physdev_map_pirq is affected.

v10->v11 changes:
* patch#1: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to the next line.
           Delete unnecessary local variables "struct physdev_pci_device *dev".
           Downgrade printk to dprintk.
           Moved struct pci_device_state_reset to the public header file.
           Delete enum pci_device_state_reset_type, and use macro definitions to represent different reset types.
           Delete pci_device_state_reset_method, and add switch cases in PHYSDEVOP_pci_device_state_reset to handle different reset functions.
           Add reset type as a function parameter for vpci_reset_device_state for possible future use.
* patch#2: Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq from being executed when domU has no pirq, instead of just preventing self-mapping; and modify the description of the commit message accordingly.
* patch#3: Modify the commit message to explain why the gs
[RFC XEN PATCH v12 7/7] tools: Add new function to do PIRQ (un)map on PVH dom0
When dom0 is PVH and a device is passed through to a domU, xl uses the gsi number of the device to do a pirq mapping (see pci_add_dm_done->xc_physdev_map_pirq). However, the gsi number is read from the file /sys/bus/pci/devices//irq, which confuses irq with gsi: they are in different number spaces and are not equal, so the mapping fails.

To solve this issue, use xc_physdev_gsi_from_dev to get the real gsi and then map the pirq.

Besides, PVH dom0 doesn't have the PIRQ flag, so it doesn't do PHYSDEVOP_map_pirq for each gsi. As a result, the grant call stack pci_add_dm_done->XEN_DOMCTL_irq_permission fails in function domain_pirq_to_irq. The old hypercall XEN_DOMCTL_irq_permission also requires a pirq to be passed in, so it is not suitable for a dom0 that has no PIRQs to grant irq permission.

To solve this issue, use the new hypercall XEN_DOMCTL_gsi_permission to grant permission for the irq (translated from the gsi) to the domU when dom0 has no PIRQs.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Chen Jiqian
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
This patch must be merged after the patch on linux kernel side
---
 tools/include/xenctrl.h       | 5 ++
 tools/libs/ctrl/xc_domain.c   | 15 +
 tools/libs/light/libxl_arch.h | 4 ++
 tools/libs/light/libxl_arm.c  | 10 +++
 tools/libs/light/libxl_pci.c  | 17 ++
 tools/libs/light/libxl_x86.c  | 111 ++
 6 files changed, 162 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 3720e22b399a..9ff5f1810cf8 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
                              uint32_t pirq,
                              bool allow_access);

+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             uint8_t access_flag);
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..4c89f07e4d6e 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
     return do_domctl(xch, &domctl);
 }

+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             uint8_t access_flag)
+{
+    struct xen_domctl domctl = {
+        .cmd = XEN_DOMCTL_gsi_permission,
+        .domain = domid,
+        .u.gsi_permission.gsi = gsi,
+        .u.gsi_permission.access_flag = access_flag,
+    };
+
+    return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_arch.h b/tools/libs/light/libxl_arch.h
index f88f11d6de1d..11b736067951 100644
--- a/tools/libs/light/libxl_arch.h
+++ b/tools/libs/light/libxl_arch.h
@@ -91,6 +91,10 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
                                       libxl_domain_config *dst,
                                       const libxl_domain_config *src);

+_hidden
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
+_hidden
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);

 #if defined(__i386__) || defined(__x86_64__)
 #define LAPIC_BASE_ADDRESS 0xfee0
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index a4029e3ac810..d869bbec769e 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -1774,6 +1774,16 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
 {
 }

+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+    return -1;
+}
+
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+    return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..3d25997921cc 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -17,6 +17,7 @@
 #include "libxl_osdeps.h" /* must come before any other headers */

 #include "libxl_internal.h"
+#include "libxl_arch.h"

 #define PCI_BDF "%04x:%02x:%02x.%01x"
 #define PCI_BDF_SHORT "%02x:%02x.%01x"
@@ -1478,6 +1479,16 @@ static void pci_add_dm_done(libxl__egc *egc,
     fclose(f);
     if (!pci_supp_legacy_irq())
         goto out_no_irq;
+
+    /*
+     * When dom0 is PVH and mapping a x86 gsi to
[RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev
When passing through a device to a domU, QEMU and the xl tools use its gsi number to do pirq mapping (see QEMU code xen_pt_realize->xc_physdev_map_pirq and xl code pci_add_dm_done->xc_physdev_map_pirq). However, the gsi number is read from the file /sys/bus/pci/devices//irq, which is wrong: irq is not equal to gsi, they are in different number spaces, so pirq mapping fails. The current code also gives userspace no way to get the gsi.

To address this, add a new function to get the gsi; the corresponding ioctl is implemented on the linux kernel side.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Chen Jiqian
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
This patch must be merged after the patch on linux kernel side

CC: Anthony PERARD
Remaining comment @Anthony PERARD:
Do I need to make "opening of /dev/xen/privcmd" a single function, then use it in this patch and other libraries?
--- tools/include/xen-sys/Linux/privcmd.h | 7 ++ tools/include/xenctrl.h | 2 ++ tools/libs/ctrl/xc_physdev.c | 35 +++ 3 files changed, 44 insertions(+) diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h index bc60e8fd55eb..4cf719102116 100644 --- a/tools/include/xen-sys/Linux/privcmd.h +++ b/tools/include/xen-sys/Linux/privcmd.h @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource { __u64 addr; } privcmd_mmap_resource_t; +typedef struct privcmd_gsi_from_pcidev { + __u32 sbdf; + __u32 gsi; +} privcmd_gsi_from_pcidev_t; + /* * @cmd: IOCTL_PRIVCMD_HYPERCALL * @arg: &privcmd_hypercall_t @@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource { _IOC(_IOC_NONE, 'P', 6, sizeof(domid_t)) #define IOCTL_PRIVCMD_MMAP_RESOURCE\ _IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t)) +#define IOCTL_PRIVCMD_GSI_FROM_PCIDEV \ + _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_pcidev_t)) #define IOCTL_PRIVCMD_UNIMPLEMENTED\ _IOC(_IOC_NONE, 'P', 0xFF, 0) diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 9ceca0cffc2f..3720e22b399a 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch, uint32_t domid, int pirq); +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf); + /* * LOGGING AND ERROR REPORTING */ diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c index e9fcd755fa62..54edb0f3c0dc 100644 --- a/tools/libs/ctrl/xc_physdev.c +++ b/tools/libs/ctrl/xc_physdev.c @@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch, return rc; } +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf) +{ +int rc = -1; + +#if defined(__linux__) +int fd; +privcmd_gsi_from_pcidev_t dev_gsi = { +.sbdf = sbdf, +.gsi = 0, +}; + +fd = open("/dev/xen/privcmd", O_RDWR); + +if (fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV)) { +/* Fallback to /proc/xen/privcmd */ +fd = open("/proc/xen/privcmd", 
O_RDWR); +} + +if (fd < 0) { +PERROR("Could not obtain handle on privileged command interface"); +return rc; +} + +rc = ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi); +close(fd); + +if (rc) { +PERROR("Failed to get gsi from dev"); +} else { +rc = dev_gsi.gsi; +} +#endif + +return rc; +} -- 2.34.1
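For readers of the patch above, a minimal sketch of how a caller might build the 32-bit sbdf argument before handing it to xc_physdev_gsi_from_pcidev(). The helper name make_sbdf is hypothetical and not part of the patch, and the bit layout (segment in bits 31..16, bus in 15..8, device in 7..3, function in 2..0) is an assumption based on the usual PCI SBDF encoding; the patch itself does not spell it out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: pack a PCI address into
 * the 32-bit sbdf value, assuming the conventional layout
 *   bits 31..16 segment, 15..8 bus, 7..3 device, 2..0 function.
 */
uint32_t make_sbdf(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t fn)
{
    /* A PCI device number has 5 bits and a function number has 3. */
    assert(dev < 32 && fn < 8);

    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)dev << 3) | (uint32_t)fn;
}
```

With that layout, device 0000:03:00.0 would be passed as 0x0300. Note that the function in the patch returns the gsi as a non-negative int on success, so a caller is expected to treat a non-zero ioctl result as failure.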
[XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH
When running Xen with a PVH dom0 and an HVM domU, a pirq is mapped for a passthrough device by using its gsi; see the QEMU code path xen_pt_realize->xc_physdev_map_pirq and the libxl code path pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not allowed, because currd is the PVH dom0 and PVH does not have the X86_EMU_USE_PIRQ flag, so the call fails at the has_pirq check. So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow PHYSDEVOP_unmap_pirq so that the device removal path can unmap the pirq. In addition, add a new check to prevent (un)map when the subject domain doesn't have a notion of PIRQ. With this, the interrupt of a passthrough device can be successfully mapped to a pirq for a domU with a notion of PIRQ when dom0 is PVH. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- xen/arch/x86/hvm/hypercall.c | 6 ++ xen/arch/x86/physdev.c | 12 ++-- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 0fab670a4871..03ada3c880bd 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) switch ( cmd ) { +/* +* Only being permitted for management of other domains. +* Further restrictions are enforced in do_physdev_op. 
+*/ case PHYSDEVOP_map_pirq: case PHYSDEVOP_unmap_pirq: +break; + case PHYSDEVOP_eoi: case PHYSDEVOP_irq_status_query: case PHYSDEVOP_get_free_pirq: diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c index d6dd622952a9..9f30a8c63a06 100644 --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) if ( !d ) break; -ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi); +/* Only mapping when the subject domain has a notion of PIRQ */ +if ( !is_hvm_domain(d) || has_pirq(d) ) +ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi); +else +ret = -EOPNOTSUPP; rcu_unlock_domain(d); @@ -346,7 +350,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) if ( !d ) break; -ret = physdev_unmap_pirq(d, unmap.pirq); +/* Only unmapping when the subject domain has a notion of PIRQ */ +if ( !is_hvm_domain(d) || has_pirq(d) ) +ret = physdev_unmap_pirq(d, unmap.pirq); +else +ret = -EOPNOTSUPP; rcu_unlock_domain(d); -- 2.34.1
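As a reading aid for the condition added to do_physdev_op above, here is a small standalone sketch of the same decision. The function name and the two boolean parameters are hypothetical stand-ins for is_hvm_domain(d) and has_pirq(d) on the subject domain; this is a model of the check, not Xen code.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check "!is_hvm_domain(d) || has_pirq(d)".
 * Returns 0 when the (un)map may proceed, -EOPNOTSUPP otherwise.
 */
int pirq_op_check(bool is_hvm, bool has_pirq)
{
    /* Non-HVM subject domains pass the condition unconditionally. */
    if (!is_hvm)
        return 0;

    /* HVM-type domains pass only if they have a notion of PIRQ. */
    return has_pirq ? 0 : -EOPNOTSUPP;
}
```

Under this model an HVM domU with PIRQs is accepted, while a PVH subject domain (HVM-type without PIRQs) gets -EOPNOTSUPP, which matches the intent stated in the commit message.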
[XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev
When a device has been reset on the dom0 side, the Xen hypervisor is not notified, so the cached state in vpci is out of date compared with the real device state. To solve that problem, add a new hypercall to support the reset of a pcidev and clear the vpci state of the device, so that once the state of the device is reset on the dom0 side, dom0 can call this hypercall to notify the hypervisor. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen Reviewed-by: Stewart Hildebrand Reviewed-by: Stefano Stabellini --- xen/arch/x86/hvm/hypercall.c | 1 + xen/drivers/pci/physdev.c| 52 xen/drivers/vpci/vpci.c | 10 +++ xen/include/public/physdev.h | 16 +++ xen/include/xen/vpci.h | 8 ++ 5 files changed, 87 insertions(+) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 7fb3136f0c7c..0fab670a4871 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_pci_mmcfg_reserved: case PHYSDEVOP_pci_device_add: case PHYSDEVOP_pci_device_remove: +case PHYSDEVOP_pci_device_state_reset: case PHYSDEVOP_dbgp_op: if ( !is_hardware_domain(currd) ) return -ENOSYS; diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c index 42db3e6d133c..c0f47945d955 100644 --- a/xen/drivers/pci/physdev.c +++ b/xen/drivers/pci/physdev.c @@ -2,6 +2,7 @@ #include #include #include +#include #ifndef COMPAT typedef long ret_t; @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } +case PHYSDEVOP_pci_device_state_reset: +{ +struct pci_device_state_reset dev_reset; +struct pci_dev *pdev; +pci_sbdf_t sbdf; + +ret = -EOPNOTSUPP; +if ( !is_pci_passthrough_enabled() ) +break; + +ret = -EFAULT; +if ( copy_from_guest(&dev_reset, arg, 1) != 0 ) +break; + +sbdf = PCI_SBDF(dev_reset.dev.seg, +dev_reset.dev.bus, +dev_reset.dev.devfn); + +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf); +if ( ret ) +break; + 
+pcidevs_lock(); +pdev = pci_get_pdev(NULL, sbdf); +if ( !pdev ) +{ +pcidevs_unlock(); +ret = -ENODEV; +break; +} + +write_lock(&pdev->domain->pci_lock); +pcidevs_unlock(); +switch ( dev_reset.reset_type ) +{ +case PCI_DEVICE_STATE_RESET_COLD: +case PCI_DEVICE_STATE_RESET_WARM: +case PCI_DEVICE_STATE_RESET_HOT: +case PCI_DEVICE_STATE_RESET_FLR: +ret = vpci_reset_device_state(pdev, dev_reset.reset_type); +break; + +default: +ret = -EOPNOTSUPP; +break; +} +write_unlock(&pdev->domain->pci_lock); + +break; +} + default: ret = -ENOSYS; break; diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index 1e6aa5d799b9..7e914d1eff9f 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev) return rc; } + +int vpci_reset_device_state(struct pci_dev *pdev, +uint32_t reset_type) +{ +ASSERT(rw_is_write_locked(&pdev->domain->pci_lock)); + +vpci_deassign_device(pdev); +return vpci_assign_device(pdev); +} + #endif /* __XEN__ */ static int vpci_register_cmp(const struct vpci_register *r1, diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h index f0c0d4727c0b..3cfde3fd2389 100644 --- a/xen/include/public/physdev.h +++ b/xen/include/public/physdev.h @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t); */ #define PHYSDEVOP_prepare_msix 30 #define PHYSDEVOP_release_msix 31 +/* + * Notify the hypervisor that a PCI device has been reset, so that any + * internally cached state is regenerated. Should be called after any + * device reset performed by the hardware domain. 
+ */ +#define PHYSDEVOP_pci_device_state_reset 32 + struct physdev_pci_device { /* IN */ uint16_t seg; @@ -305,6 +312,15 @@ struct physdev_pci_device { typedef struct physdev_pci_device physdev_pci_device_t; DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t); +struct pci_device_state_reset { +physdev_pci_device_t dev; +#define PCI_DEVICE_STATE_RESET_COLD 0 +#define PCI_DEVICE_STATE_RESET_WARM 1 +#define PCI_DEVICE_STATE_RESET_HOT 2 +#define PCI_DEVICE_STATE_RESET_FLR 3 +uint32_t reset_type; +}; + #define PHYSDEVOP_DBGP_RESET_PREPARE1 #define PHYSDEVOP_DBGP_RESET_DONE 2 diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h index da8d0f41e6f4..6be812dbc04a 100644 --- a/xen/i
[XEN PATCH v12 5/7] tools/libxc: Allow gsi be mapped into a free pirq
The PHYSDEVOP_map_pirq hypercall supports mapping a gsi into either a specific pirq or a free pirq, depending on the pirq parameter (>0 or <0). But the current xc_physdev_map_pirq uses index as the pirq when the pirq parameter is <0, which forces every case to be mapped to a specific pirq. That causes two problems: the caller can't get a free pirq value, and the call fails once the specific pirq has already been mapped to another gsi. So, change xc_physdev_map_pirq to allow a negative pirq to be passed in, and then get a free pirq. There are four callers of xc_physdev_map_pirq in the original code, so the effect on each is clarified below (only the pirq<0 case needs clarifying): First, pci_add_dm_done->xc_physdev_map_pirq passes irq as the pirq parameter; pirq<0 means irq<0, so it fails at the "index < 0" check in allocate_and_map_gsi_pirq and gets EINVAL, which is the same logic as the original code. Second, domcreate_launch_dm->libxl__arch_domain_map_irq-> xc_physdev_map_pirq: the passed pirq is always >=0, so there is no effect. Third, pyxc_physdev_map_pirq->xc_physdev_map_pirq: not sure, so add the check into pyxc_physdev_map_pirq to keep the same behavior. Fourth, xen_pt_realize->xc_physdev_map_pirq wants to allocate a pirq for a gsi, but the pirq does not need to have the same value as the gsi. After this patch it gets a free pirq, which also works. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- tools/libs/ctrl/xc_physdev.c | 2 +- tools/python/xen/lowlevel/xc/xc.c | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c index 460a8e779ce8..e9fcd755fa62 100644 --- a/tools/libs/ctrl/xc_physdev.c +++ b/tools/libs/ctrl/xc_physdev.c @@ -50,7 +50,7 @@ int xc_physdev_map_pirq(xc_interface *xch, map.domid = domid; map.type = MAP_PIRQ_TYPE_GSI; map.index = index; -map.pirq = *pirq < 0 ? 
index : *pirq; +map.pirq = *pirq; rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map)); diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c index 9feb12ae2b16..f8c9db7115ee 100644 --- a/tools/python/xen/lowlevel/xc/xc.c +++ b/tools/python/xen/lowlevel/xc/xc.c @@ -774,6 +774,8 @@ static PyObject *pyxc_physdev_map_pirq(PyObject *self, if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iii", kwd_list, &dom, &index, &pirq) ) return NULL; +if ( pirq < 0 ) +pirq = index; ret = xc_physdev_map_pirq(xc->xc_handle, dom, index, &pirq); if ( ret != 0 ) return pyxc_error_to_exception(xc->xc_handle); -- 2.34.1
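To make the behavioral change in the patch above concrete, here is a small sketch that isolates the one expression being changed. Both function names are hypothetical; they only model the value that ends up in map.pirq before and after the patch, and do not call into libxc.

```c
/*
 * Hypothetical model of the value placed in map.pirq by
 * xc_physdev_map_pirq() before the patch: a negative pirq is
 * silently replaced by the gsi index.
 */
int requested_pirq_before(int index, int pirq)
{
    return pirq < 0 ? index : pirq;
}

/*
 * After the patch the caller's value is passed through unchanged,
 * so a negative pirq reaches the hypervisor and asks for a free pirq.
 */
int requested_pirq_after(int index, int pirq)
{
    (void)index; /* index no longer influences the requested pirq */
    return pirq;
}
```

For a caller passing index 40 and pirq -1, the old code would request pirq 40 specifically, whereas the new code forwards -1. The two lines added to pyxc_physdev_map_pirq reproduce the old substitution on the Python binding side only.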
[XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
The gsi of a passthrough device must be configured before it can be mapped into an HVM domU. But when dom0 is PVH, the gsis may not get registered (see the clarification below). As a result, the apic, pin and irq information is not added to the irq_2_pin list and the handler of the irq_desc is not set, so setting the ioapic affinity and vector fails when a device is passed through. To fix this, new code on the Linux kernel side will need to call PHYSDEVOP_setup_gsi for passthrough devices to register the gsi when dom0 is PVH. So, add PHYSDEVOP_setup_gsi to hvm_physdev_op for that purpose. Two questions are worth clarifying. First, why does the gsi of a device that belongs to PVH dom0 work? Because when a driver is probed for a normal device, the normal PCI device probe function is used; in its call stack it requests the irq and unmasks the corresponding ioapic entry of the gsi, which traps into Xen and finally registers the gsi. The call stack is (on the Linux kernel side) pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, which then traps into Xen: hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So the gsi gets registered. Second, why doesn't the gsi of a passthrough device work when dom0 is PVH? Because when a device is assigned for passthrough, the pciback-specific probe function is used; in its call stack no fake irq handler is installed because the ISR is not running. So mp_register_gsi on the Xen side is never called and the gsi is not registered. The call stack is (on the Linux kernel side) pcistub_probe->pcistub_seize-> pcistub_init_device-> xen_pcibk_reset_device-> xen_pcibk_control_isr->isr_on==0.
Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- xen/arch/x86/hvm/hypercall.c | 1 + 1 file changed, 1 insertion(+) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 03ada3c880bd..cfe82d0f96ed 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) return -ENOSYS; break; +case PHYSDEVOP_setup_gsi: case PHYSDEVOP_pci_mmcfg_reserved: case PHYSDEVOP_pci_device_add: case PHYSDEVOP_pci_device_remove: -- 2.34.1
[XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi
Some types of domain, such as PVH, don't have PIRQs and don't do PHYSDEVOP_map_pirq for each gsi. When a device is passed through to a guest on a PVH dom0, the call path pci_add_dm_done->XEN_DOMCTL_irq_permission fails in domain_pirq_to_irq, because a PVH domain has no mapping between gsi, pirq and irq on the Xen side. What's more, the current hypercall XEN_DOMCTL_irq_permission requires a pirq to be passed in to set the access of an irq, which is not suitable for a dom0 that doesn't have PIRQs. So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny the permission of an irq (translated from an x86 gsi) to a domU when dom0 has no PIRQs. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- CC: Daniel P . Smith Remaining comment @Daniel P . Smith: +ret = -EPERM; +if ( !irq_access_permitted(currd, irq) || + xsm_irq_permission(XSM_HOOK, d, irq, access_flag) ) +goto gsi_permission_out; Is it okay to issue the XSM check using the translated value, not the one that was originally passed into the hypercall? 
--- xen/arch/x86/domctl.c | 32 ++ xen/arch/x86/include/asm/io_apic.h | 2 ++ xen/arch/x86/io_apic.c | 17 xen/arch/x86/mpparse.c | 5 ++--- xen/include/public/domctl.h| 9 + xen/xsm/flask/hooks.c | 1 + 6 files changed, 63 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index 9190e11faaa3..4e9e4c4cfed3 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -36,6 +36,7 @@ #include #include #include +#include static int update_domain_cpu_policy(struct domain *d, xen_domctl_cpu_policy_t *xdpc) @@ -237,6 +238,37 @@ long arch_do_domctl( break; } +case XEN_DOMCTL_gsi_permission: +{ +int irq; +unsigned int gsi = domctl->u.gsi_permission.gsi; +uint8_t access_flag = domctl->u.gsi_permission.access_flag; + +/* Check all bits and pads are zero except lowest bit */ +ret = -EINVAL; +if ( access_flag & ( ~XEN_DOMCTL_GSI_PERMISSION_MASK ) ) +goto gsi_permission_out; +for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i ) +if ( domctl->u.gsi_permission.pad[i] ) +goto gsi_permission_out; + +if ( gsi > highest_gsi() || (irq = gsi_2_irq(gsi)) <= 0 ) +goto gsi_permission_out; + +ret = -EPERM; +if ( !irq_access_permitted(currd, irq) || + xsm_irq_permission(XSM_HOOK, d, irq, access_flag) ) +goto gsi_permission_out; + +if ( access_flag ) +ret = irq_permit_access(d, irq); +else +ret = irq_deny_access(d, irq); + +gsi_permission_out: +break; +} + case XEN_DOMCTL_getpageframeinfo3: { unsigned int num = domctl->u.getpageframeinfo3.num; diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h index 78268ea8f666..7e86d8337758 100644 --- a/xen/arch/x86/include/asm/io_apic.h +++ b/xen/arch/x86/include/asm/io_apic.h @@ -213,5 +213,7 @@ unsigned highest_gsi(void); int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval); int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val); +int mp_find_ioapic(int gsi); +int gsi_2_irq(int gsi); #endif diff --git a/xen/arch/x86/io_apic.c 
b/xen/arch/x86/io_apic.c index d2a313c4ac72..5968c8055671 100644 --- a/xen/arch/x86/io_apic.c +++ b/xen/arch/x86/io_apic.c @@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin) return irq; } +int gsi_2_irq(int gsi) +{ +int ioapic, pin, irq; + +ioapic = mp_find_ioapic(gsi); +if ( ioapic < 0 ) +return -EINVAL; + +pin = gsi - io_apic_gsi_base(ioapic); + +irq = apic_pin_2_gsi_irq(ioapic, pin); +if ( irq <= 0 ) +return -EINVAL; + +return irq; +} + static inline int IO_APIC_irq_trigger(int irq) { int apic, idx, pin; diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c index d8ccab2449c6..7786a3337760 100644 --- a/xen/arch/x86/mpparse.c +++ b/xen/arch/x86/mpparse.c @@ -841,8 +841,7 @@ static struct mp_ioapic_routing { } mp_ioapic_routing[MAX_IO_APICS]; -static int mp_find_ioapic ( - int gsi) +int mp_find_ioapic(int gsi) { unsigned inti; @@ -914,7 +913,7 @@ void __init mp_register_ioapic ( return; } -unsigned __init highest_gsi(void) +unsigned highest_gsi(void) { unsigned x, res = 0; for (x = 0; x < nr_ioapics; x++) diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 2a49fe46ce25..877e35ab1376 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -464,6 +464,13 @@ struct xen_domctl_irq_permission { ui
[XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen
Hi All, This is v12 series to support passthrough when dom0 is PVH The expected merge order of this series is the first three patches in this series, then patches on kernel side, then the last four patches in this series. v11->v12 changes: * patch#1: Change the title of this patch. Remove unnecessary notes, erroneous stamps, and #define. * patch#2: Avoid using return, set error code instead when (un)map is not allowed. Due to functional change in v11, remove the Reviewed-by of Stefano. * patch#3: Add more detailed descriptions into commit message not just callstack. patch#4 in v11: remove from this series and upstream individually. * patch#4: is patch#5 of v11, change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to remove "__init" of highest_gsi function. Change the check of irq boundary from <0 to <=0, and remove unnecessary space. Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit. * patch#5: Add explanation of whether the caller of xc_physdev_map_pirq is affected. Best regards, Jiqian Chen v10->v11 changes: * patch#1: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to the next line. Delete unnecessary local variables "struct physdev_pci_device *dev". Downgrade printk to dprintk. Moved struct pci_device_state_reset to the public header file. Delete enum pci_device_state_reset_type, and use macro definitions to represent different reset types. Delete pci_device_state_reset_method, and add switch cases in PHYSDEVOP_pci_device_state_reset to handle different reset functions. Add reset type as a function parameter for vpci_reset_device_state for possible future use * patch#2: Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq from being executed when domU has no pirq, instead of just preventing self-mapping; and modify the description of the commit message accordingly. 
* patch#3: Modify the commit message to explain why the gsi of normal devices can work in PVH dom0 and why the passthrough device does not work in PVH dom0. * patch#4: New patch, modification of allocate_pirq function, return the allocated pirq when there is already an allocated pirq and the caller has no specific requirements for pirq, and make it successful. * patch#5: Modification on the hypervisor side proposed from patch#5 of v10. Add non-zero judgment for other bits of allow_access. Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )". Change the error exit path identifier "out" to "gsi_permission_out". Use ARRAY_SIZE() instead of open coding. * patch#6: New patch, modification of xc_physdev_map_pirq to support mapping gsi to an idle pirq. * patch#7: Patch#4 of v10, directly open "/dev/xen/privcmd" in the function xc_physdev_gsi_from_dev instead of adding unnecessary functions to libxencall. Change the type of gsi in the structure privcmd_gsi_from_dev from int to u32. * patch#8: Modification of the tools part of patches#4 and #5 of v10, use privcmd_gsi_from_dev to get gsi, and use XEN_DOMCTL_gsi_permission to grant gsi. Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID. Add libxl__arch_hvm_map_gsi to distinguish x86 related implementations. Add a list pcidev_pirq_list to record the relationship between sbdf and pirq, which can be used to obtain the corresponding pirq when unmapping the PIRQ. v9->v10 changes: * patch#2: Indent the comments above PHYSDEVOP_map_pirq according to the code style. * patch#3: Modified the description in the commit message, changing "it calls" to "it will need to call", indicating that there will be new code on the kernel side that will call PHYSDEVOP_setup_gsi. Also added an explanation of why the interrupt of a passthrough device does not work if the gsi is not registered. * patch#4: Added define for CONFIG_X86 in tools/libs/light/Makefile to isolate x86 code in libxl_pci.c. 
* patch#5: Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission. Deleted pci_device_set_gsi and called XEN_DOMCTL_gsi_permission directly in pci_add_dm_done. Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, and used currd instead of current->domain. In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original new code, and error handling for irq0 was added. Deleted the extra spaces in the upper and lower lines of the struct xen_domctl_gsi_permission definition. All patches have modified s
[PATCH for-4.19 v2] x86/physdev: Return pirq that irq was already mapped to
Fix a bug introduced by 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a pirq"). After that refactoring, when pirq<0 and current_pirq>0, meaning the caller wants to allocate a free pirq for an irq that already has a mapped pirq, allocate_pirq returns the negative pirq, so the call fails. The logic before the refactoring was different: allocate_pirq should return the current_pirq that the irq was already mapped to, so that the call succeeds. Fixes: 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a pirq") Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen Reviewed-by: Jan Beulich --- xen/arch/x86/irq.c | 1 + 1 file changed, 1 insertion(+) diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 017a94e31155..47477d88171b 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -2898,6 +2898,7 @@ static int allocate_pirq(struct domain *d, int index, int pirq, int irq, d->domain_id, index, pirq, current_pirq); if ( current_pirq < 0 ) return -EBUSY; +pirq = current_pirq; } else if ( type == MAP_PIRQ_TYPE_MULTI_MSI ) { -- 2.34.1
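The effect of the one-line fix above can be illustrated with a reduced model of the pirq<0 branch. This is only a sketch: choose_pirq and its next_free parameter are hypothetical, the MULTI_MSI case and the free-pirq allocator are collapsed into a single value supplied by the caller, and the path for a specific (non-negative) requested pirq is not modelled.

```c
#include <errno.h>

/*
 * Hypothetical reduced model of the "pirq < 0" branch of allocate_pirq().
 *   requested    : pirq passed by the caller (< 0 means "any free pirq")
 *   current_pirq : pirq the irq is already mapped to, 0 if none,
 *                  negative values are treated as busy, as in the diff
 *   next_free    : stand-in for whatever the free-pirq allocator returns
 */
int choose_pirq(int requested, int current_pirq, int next_free)
{
    if (requested < 0) {
        if (current_pirq) {
            if (current_pirq < 0)
                return -EBUSY;
            /* The fix: reuse the existing mapping instead of
             * returning the caller's negative value. */
            return current_pirq;
        }
        return next_free;
    }

    /* Specific-pirq requests are outside the scope of this sketch. */
    return requested;
}
```

In this model, a caller asking for any free pirq for an irq already mapped to pirq 7 gets 7 back, which is the pre-refactoring behavior the commit message describes.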
[PATCH for-4.19] x86/physdev: Return pirq that irq was already mapped to
Fix a bug introduced by 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a pirq"). After that refactoring, when pirq<0 and current_pirq>0, meaning the caller wants to allocate a free pirq for an irq that already has a mapped pirq, allocate_pirq returns the negative pirq, so the call fails. The logic before the refactoring was different: allocate_pirq should return the current_pirq that the irq was already mapped to, so that the call succeeds. Fixes: 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a pirq") Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- xen/arch/x86/irq.c | 1 + 1 file changed, 1 insertion(+) diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 9a611c79e024..1a827ccc8498 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -2897,6 +2897,7 @@ static int allocate_pirq(struct domain *d, int index, int pirq, int irq, d->domain_id, index, pirq, current_pirq); if ( current_pirq < 0 ) return -EBUSY; +pirq = current_pirq; } else if ( type == MAP_PIRQ_TYPE_MULTI_MSI ) { -- 2.34.1
[XEN PATCH v11 6/8] tools/libxc: Allow gsi be mapped into a free pirq
The PHYSDEVOP_map_pirq hypercall supports mapping a gsi into either a specific pirq or a free pirq, depending on the pirq parameter (>0 or <0). But the current xc_physdev_map_pirq uses index as the pirq when the pirq parameter is <0, which forces every case to be mapped to a specific pirq. That causes two problems: the caller can't get a free pirq value, and the call fails once the specific pirq has already been mapped to another gsi. So, change xc_physdev_map_pirq to allow a negative pirq to be passed in, and then get a free pirq. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- tools/libs/ctrl/xc_physdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c index 460a8e779ce8..e9fcd755fa62 100644 --- a/tools/libs/ctrl/xc_physdev.c +++ b/tools/libs/ctrl/xc_physdev.c @@ -50,7 +50,7 @@ int xc_physdev_map_pirq(xc_interface *xch, map.domid = domid; map.type = MAP_PIRQ_TYPE_GSI; map.index = index; -map.pirq = *pirq < 0 ? index : *pirq; +map.pirq = *pirq; rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map)); -- 2.34.1
[RFC XEN PATCH v11 8/8] tools: Add new function to do PIRQ (un)map on PVH dom0
When dom0 is PVH and a device is passed through to a domU, xl uses the gsi number of the device to do a pirq mapping (see pci_add_dm_done->xc_physdev_map_pirq), but the gsi number is read from the file /sys/bus/pci/devices//irq. That confuses irq with gsi: they are in different number spaces and are not equal, so the mapping fails. To solve this issue, use xc_physdev_gsi_from_dev to get the real gsi and then map the pirq. Besides, a PVH domain doesn't have the PIRQ flag and doesn't do PHYSDEVOP_map_pirq for each gsi, so the grant call path pci_add_dm_done->XEN_DOMCTL_irq_permission fails in domain_pirq_to_irq. The old hypercall XEN_DOMCTL_irq_permission also requires a pirq to be passed in, so it is not suitable for granting irq permission from a dom0 that doesn't have PIRQs. To solve this issue, use the new hypercall XEN_DOMCTL_gsi_permission to grant the permission of the irq (translated from the gsi) to the domU when dom0 has no PIRQs. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Chen Jiqian --- RFC: this needs to wait for the corresponding third patch on the Linux kernel side to be merged. 
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/ This patch must be merged after the patch on linux kernel side --- tools/include/xenctrl.h | 5 ++ tools/libs/ctrl/xc_domain.c | 15 + tools/libs/light/libxl_arch.h | 4 ++ tools/libs/light/libxl_arm.c | 10 +++ tools/libs/light/libxl_pci.c | 17 ++ tools/libs/light/libxl_x86.c | 111 ++ 6 files changed, 162 insertions(+) diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 3720e22b399a..33810385535e 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch, uint32_t pirq, bool allow_access); +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + bool allow_access); + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c index f2d9d14b4d9f..8540e84fda93 100644 --- a/tools/libs/ctrl/xc_domain.c +++ b/tools/libs/ctrl/xc_domain.c @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch, return do_domctl(xch, &domctl); } +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + bool allow_access) +{ +struct xen_domctl domctl = { +.cmd = XEN_DOMCTL_gsi_permission, +.domain = domid, +.u.gsi_permission.gsi = gsi, +.u.gsi_permission.allow_access = allow_access, +}; + +return do_domctl(xch, &domctl); +} + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/light/libxl_arch.h b/tools/libs/light/libxl_arch.h index f88f11d6de1d..11b736067951 100644 --- a/tools/libs/light/libxl_arch.h +++ b/tools/libs/light/libxl_arch.h @@ -91,6 +91,10 @@ void libxl__arch_update_domain_config(libxl__gc *gc, libxl_domain_config *dst, const libxl_domain_config *src); +_hidden +int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid); +_hidden +int 
libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid); #if defined(__i386__) || defined(__x86_64__) #define LAPIC_BASE_ADDRESS 0xfee0 diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c index a4029e3ac810..d869bbec769e 100644 --- a/tools/libs/light/libxl_arm.c +++ b/tools/libs/light/libxl_arm.c @@ -1774,6 +1774,16 @@ void libxl__arch_update_domain_config(libxl__gc *gc, { } +int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid) +{ +return -1; +} + +int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid) +{ +return -1; +} + /* * Local variables: * mode: C diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c index 96cb4da0794e..3d25997921cc 100644 --- a/tools/libs/light/libxl_pci.c +++ b/tools/libs/light/libxl_pci.c @@ -17,6 +17,7 @@ #include "libxl_osdeps.h" /* must come before any other headers */ #include "libxl_internal.h" +#include "libxl_arch.h" #define PCI_BDF"%04x:%02x:%02x.%01x" #define PCI_BDF_SHORT "%02x:%02x.%01x" @@ -1478,6 +1479,16 @@ static void pci_add_dm_done(libxl__egc *egc, fclose(f); if (!pci_supp_legacy_irq()) goto out_no_irq; + +/* + * When dom0 is PVH and mapping a x86 gsi to
[XEN PATCH v11 0/8] Support device passthrough when dom0 is PVH on Xen
Hi All, This is the v11 series to support passthrough when dom0 is PVH. v10->v11 changes: * patch#1: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to the next line. Delete unnecessary local variables "struct physdev_pci_device *dev". Downgrade printk to dprintk. Moved struct pci_device_state_reset to the public header file. Delete enum pci_device_state_reset_type, and use macro definitions to represent different reset types. Delete pci_device_state_reset_method, and add switch cases in PHYSDEVOP_pci_device_state_reset to handle different reset functions. Add reset type as a function parameter for vpci_reset_device_state for possible future use. * patch#2: Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq from being executed when domU has no pirq, instead of just preventing self-mapping; and modify the description of the commit message accordingly. * patch#3: Modify the commit message to explain why the gsi of normal devices can work in PVH dom0 and why the passthrough device does not work in PVH dom0. * patch#4: New patch, modification of allocate_pirq function, return the allocated pirq when there is already an allocated pirq and the caller has no specific requirements for pirq, and make it successful. * patch#5: Modification on the hypervisor side proposed from patch#5 of v10. Add non-zero judgment for other bits of allow_access. Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )". Change the error exit path identifier "out" to "gsi_permission_out". Use ARRAY_SIZE() instead of open coding. * patch#6: New patch, modification of xc_physdev_map_pirq to support mapping gsi to an idle pirq. * patch#7: Patch#4 of v10, directly open "/dev/xen/privcmd" in the function xc_physdev_gsi_from_dev instead of adding unnecessary functions to libxencall. Change the type of gsi in the structure privcmd_gsi_from_dev from int to u32. 
* patch#8: Modification of the tools part of patches#4 and #5 of v10, use privcmd_gsi_from_dev to get gsi, and use XEN_DOMCTL_gsi_permission to grant gsi. Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID. Add libxl__arch_hvm_map_gsi to distinguish x86 related implementations. Add a list pcidev_pirq_list to record the relationship between sbdf and pirq, which can be used to obtain the corresponding pirq when unmap PIRQ. Best regards, Jiqian Chen v9->v10 changes: * patch#2: Indent the comments above PHYSDEVOP_map_pirq according to the code style. * patch#3: Modified the description in the commit message, changing "it calls" to "it will need to call", indicating that there will be new codes on the kernel side that will call PHYSDEVOP_setup_gsi. Also added an explanation of why the interrupt of passthrough device does not work if gsi is not registered. * patch#4: Added define for CONFIG_X86 in tools/libs/light/Makefile to isolate x86 code in libxl_pci.c. * patch#5: Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission. Deleted pci_device_set_gsi and called XEN_DOMCTL_gsi_permission directly in pci_add_dm_done. Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, and used currd instead of current->domain. In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original new code, and error handling for irq0 was added. Deleted the extra spaces in the upper and lower lines of the struct xen_domctl_gsi_permission definition. All patches have modified signatures as follows: Signed-off-by: Jiqian Chen means I am the author. Signed-off-by: Huang Rui means Rui sent them to upstream firstly. Signed-off-by: Jiqian Chen means I take continue to upstream. v8->v9 changes: * patch#1: Move pcidevs_unlock below write_lock, and remove "ASSERT(pcidevs_locked());" from vpci_reset_device_state; Add pci_device_state_reset_type to distinguish the reset types. 
* patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why need this hypercall. Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" to "d == current->domian". * patch#3: Remove the check of PHYSDEVOP_setup_gsi, since there is same checke in below.Although their return values are different, this difference is acceptable for the sake of code consistency if ( !is_hardware_domain(currd) ) return -ENOSYS; break; * patch#5: Change
[XEN PATCH v11 3/8] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
The gsi of a passthrough device must be configured before it can be
mapped into an HVM domU.
But when dom0 is PVH, the gsis are not registered. As a result, the
apic, pin and irq information is not added to the irq_2_pin list and
the handler of irq_desc is not set, so setting the ioapic affinity and
vector fails when a device is passed through.

To fix this, new code on the Linux kernel side will need to call
PHYSDEVOP_setup_gsi for passthrough devices to register the gsi when
dom0 is PVH.

So, add PHYSDEVOP_setup_gsi to hvm_physdev_op for that purpose.

Two questions are worth clarifying.

First, why does the gsi of a device that belongs to PVH dom0 work?
Because when a driver is probed for a normal device, the Linux kernel
calls pci_device_probe-> request_threaded_irq-> irq_startup->
__unmask_ioapic-> io_apic_write, which traps into Xen:
hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept->
vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
That is how the gsi gets registered.

Second, why does the gsi of a passthrough device not work when dom0 is
PVH?
Because when a device is assigned for passthrough, pciback is used to
probe it, and it calls pcistub_probe->pcistub_seize->
pcistub_init_device-> xen_pcibk_reset_device->
xen_pcibk_control_isr->isr_on, but isr_on is not set, so the fake IRQ
handler is not installed and the gsi is never unmasked.
What's more, on the Xen side, vioapic_hwdom_map_gsi-> mp_register_gsi
is called only when the gsi is unmasked, so the gsi cannot work for a
passthrough device.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 03ada3c880bd..cfe82d0f96ed 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ENOSYS;
         break;
 
+    case PHYSDEVOP_setup_gsi:
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
--
2.34.1
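For readers who do not have hypercall.c open, the effect of adding the new case label can be sketched as follows. This is a simplified model, not the Xen code: the enum values, struct dom_model and physdev_gate are hypothetical names, and the default branch returning -ENOSYS is an assumption of the sketch. It only shows that PHYSDEVOP_setup_gsi joins the group of commands that are let through for the hardware domain and rejected for everyone else.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a few PHYSDEVOP_* command numbers. */
enum physdev_cmd {
    CMD_SETUP_GSI,
    CMD_PCI_DEVICE_ADD,
    CMD_PCI_DEVICE_REMOVE,
    CMD_UNKNOWN,
};

/* Simplified model of the calling domain (currd in the patch). */
struct dom_model {
    bool is_hardware_domain; /* true for dom0 */
};

/* Returns 0 if the command may be forwarded, or a negative errno value. */
int physdev_gate(const struct dom_model *currd, enum physdev_cmd cmd)
{
    switch (cmd) {
    case CMD_SETUP_GSI:          /* the label added by this patch */
    case CMD_PCI_DEVICE_ADD:
    case CMD_PCI_DEVICE_REMOVE:
        /* Same check as the existing group: hardware domain only. */
        if (!currd->is_hardware_domain)
            return -ENOSYS;
        break;

    default:
        /* Assumption of this sketch: anything else is not offered. */
        return -ENOSYS;
    }

    return 0;
}
```

With this model, a PVH dom0 gets 0 for CMD_SETUP_GSI while any other HVM-type caller gets -ENOSYS, which is the behaviour the commit message asks for.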
[XEN PATCH v11 2/8] x86/pvh: Allow (un)map_pirq when dom0 is PVH
If Xen runs with a PVH dom0 and an HVM domU, a pirq is mapped for a
passthrough device by using its gsi; see the QEMU code
xen_pt_realize->xc_physdev_map_pirq and the libxl code
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then calls
into Xen, but hvm_physdev_op does not allow PHYSDEVOP_map_pirq, because
currd is a PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, so the call
fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the device removal path can unmap the
pirq. In addition, add a new check to prevent (un)map when the subject
domain has no X86_EMU_USE_PIRQ flag.

With this, the interrupt of a passthrough device can be successfully
mapped to a pirq for a domU that has the X86_EMU_USE_PIRQ flag when
dom0 is PVH.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stefano Stabellini
---
 xen/arch/x86/hvm/hypercall.c |  6 ++
 xen/arch/x86/physdev.c       | 14 ++
 2 files changed, 20 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..03ada3c880bd 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd )
     {
+    /*
+     * Only being permitted for management of other domains.
+     * Further restrictions are enforced in do_physdev_op.
+     */
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index d6dd622952a9..a165f68225c1 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -323,6 +323,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( !d )
             break;
 
+        /* Prevent mapping when the subject domain has no X86_EMU_USE_PIRQ */
+        if ( is_hvm_domain(d) && !has_pirq(d) )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+
         ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
 
         rcu_unlock_domain(d);
@@ -346,6 +353,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( !d )
             break;
 
+        /* Prevent unmapping when the subject domain has no X86_EMU_USE_PIRQ */
+        if ( is_hvm_domain(d) && !has_pirq(d) )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+
         ret = physdev_unmap_pirq(d, unmap.pirq);
 
         rcu_unlock_domain(d);
--
2.34.1
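To make the new do_physdev_op check easier to reason about, here is a tiny stand-alone model of it. The function name and the boolean parameters are hypothetical; the real code evaluates is_hvm_domain(d) and has_pirq(d) on the subject domain d, and in Xen a PVH domain counts as an HVM-type domain for is_hvm_domain().

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Model of the check added before physdev_map_pirq/physdev_unmap_pirq.
 * subject_is_hvm:   what is_hvm_domain(d) would return (true for HVM and PVH)
 * subject_has_pirq: what has_pirq(d) would return (X86_EMU_USE_PIRQ set)
 * Returns 0 when the (un)map may continue, -EOPNOTSUPP otherwise.
 */
int pirq_subject_check(bool subject_is_hvm, bool subject_has_pirq)
{
    if (subject_is_hvm && !subject_has_pirq)
        return -EOPNOTSUPP;

    return 0;
}
```

The three interesting rows are then: a PV subject passes, an HVM subject with the PIRQ flag passes, and an HVM-type subject without it (for example a PVH domain) is refused, regardless of which domain issued the hypercall.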
[RFC XEN PATCH v11 7/8] tools: Add new function to get gsi from dev
When a device is passed through to a domU, QEMU and the xl tools use
its gsi number to do pirq mapping; see the QEMU code
xen_pt_realize->xc_physdev_map_pirq and the xl code
pci_add_dm_done->xc_physdev_map_pirq. But the gsi number is read from
the file /sys/bus/pci/devices//irq, which is wrong: an irq is not equal
to a gsi, they are in different number spaces, so the pirq mapping
fails.
And in the current code there is no way for userspace to get the gsi.

To solve this, add a new function to get the gsi; the corresponding
ioctl is implemented on the Linux kernel side.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Chen Jiqian
---
RFC: it needs to wait for the corresponding third patch on the Linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
This patch must be merged after the patch on the Linux kernel side.
---
 tools/include/xen-sys/Linux/privcmd.h |  7 ++
 tools/include/xenctrl.h               |  2 ++
 tools/libs/ctrl/xc_physdev.c          | 35 +++
 3 files changed, 44 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..4cf719102116 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
     __u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_pcidev {
+    __u32 sbdf;
+    __u32 gsi;
+} privcmd_gsi_from_pcidev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
     _IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE \
     _IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_PCIDEV \
+    _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_pcidev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED \
     _IOC(_IOC_NONE, 'P', 0xFF, 0)
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..3720e22b399a 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
                           uint32_t domid,
                           int pirq);
 
+int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf);
+
 /*
  * LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index e9fcd755fa62..54edb0f3c0dc 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
     return rc;
 }
 
+int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
+{
+    int rc = -1;
+
+#if defined(__linux__)
+    int fd;
+    privcmd_gsi_from_pcidev_t dev_gsi = {
+        .sbdf = sbdf,
+        .gsi = 0,
+    };
+
+    fd = open("/dev/xen/privcmd", O_RDWR);
+
+    if (fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV)) {
+        /* Fallback to /proc/xen/privcmd */
+        fd = open("/proc/xen/privcmd", O_RDWR);
+    }
+
+    if (fd < 0) {
+        PERROR("Could not obtain handle on privileged command interface");
+        return rc;
+    }
+
+    rc = ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi);
+    close(fd);
+
+    if (rc) {
+        PERROR("Failed to get gsi from dev");
+    } else {
+        rc = dev_gsi.gsi;
+    }
+#endif
+
+    return rc;
+}
--
2.34.1
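A note for callers of the new function: it takes the device as a single 32-bit sbdf value. The patch itself does not spell out the encoding, so the helper below is only a sketch under the assumption that the conventional layout is used (segment in bits 31..16, bus in bits 15..8, device in bits 7..3, function in bits 2..0). The name pci_make_sbdf is hypothetical and not part of libxenctrl.

```c
#include <stdint.h>

/*
 * Pack segment, bus, device and function into one 32-bit sbdf value.
 * Assumed layout (not stated in the patch):
 *   bits 31..16 segment, 15..8 bus, 7..3 device, 2..0 function.
 */
uint32_t pci_make_sbdf(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t fn)
{
    return ((uint32_t)seg << 16) |
           ((uint32_t)bus << 8) |
           ((uint32_t)(dev & 0x1f) << 3) |
           (uint32_t)(fn & 0x7);
}
```

Under that assumption, the device 0000:03:00.0 would be looked up with xc_physdev_gsi_from_pcidev(xch, pci_make_sbdf(0, 0x03, 0x00, 0)), and a negative return value would mean the gsi could not be obtained.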
[XEN PATCH v11 5/8] x86/domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
Some types of domain, like PVH, don't have PIRQs and do not do
PHYSDEVOP_map_pirq for each gsi. When a device is passed through to a
guest on a PVH dom0, the call stack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in the function
domain_pirq_to_irq, because a PVH domain has no mapping of gsi, pirq
and irq on the Xen side.
What's more, the current hypercall XEN_DOMCTL_irq_permission requires a
pirq to be passed in, which is not suitable for a dom0 that doesn't
have PIRQs.

So, add a new hypercall XEN_DOMCTL_gsi_permission to grant the
permission of an irq (translated from a gsi) to a domU when dom0 has no
PIRQs.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
 xen/arch/x86/domctl.c              | 33 ++
 xen/arch/x86/include/asm/io_apic.h |  2 ++
 xen/arch/x86/io_apic.c             | 17 +++
 xen/arch/x86/mpparse.c             |  3 +--
 xen/include/public/domctl.h        |  8
 xen/xsm/flask/hooks.c              |  1 +
 6 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9190e11faaa3..5f20febabbf2 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include
 
 static int update_domain_cpu_policy(struct domain *d,
                                     xen_domctl_cpu_policy_t *xdpc)
@@ -237,6 +238,38 @@ long arch_do_domctl(
         break;
     }
 
+    case XEN_DOMCTL_gsi_permission:
+    {
+        int irq;
+        uint8_t mask = 1;
+        unsigned int gsi = domctl->u.gsi_permission.gsi;
+        bool allow = domctl->u.gsi_permission.allow_access;
+
+        /* Check all bits and pads are zero except lowest bit */
+        ret = -EINVAL;
+        if ( domctl->u.gsi_permission.allow_access & ~mask )
+            goto gsi_permission_out;
+        for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
+            if ( domctl->u.gsi_permission.pad[i] )
+                goto gsi_permission_out;
+
+        if ( gsi >= nr_irqs_gsi || ( irq = gsi_2_irq(gsi) ) < 0 )
+            goto gsi_permission_out;
+
+        ret = -EPERM;
+        if ( !irq_access_permitted(currd, irq) ||
+             xsm_irq_permission(XSM_HOOK, d, irq, allow) )
+            goto gsi_permission_out;
+
+        if ( allow )
+            ret = irq_permit_access(d, irq);
+        else
+            ret = irq_deny_access(d, irq);
+
+    gsi_permission_out:
+        break;
+    }
+
     case XEN_DOMCTL_getpageframeinfo3:
     {
         unsigned int num = domctl->u.getpageframeinfo3.num;
diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
index 78268ea8f666..7e86d8337758 100644
--- a/xen/arch/x86/include/asm/io_apic.h
+++ b/xen/arch/x86/include/asm/io_apic.h
@@ -213,5 +213,7 @@ unsigned highest_gsi(void);
 
 int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
 int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
+int mp_find_ioapic(int gsi);
+int gsi_2_irq(int gsi);
 
 #endif
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index d73108558e09..d54283955a60 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
     return irq;
 }
 
+int gsi_2_irq(int gsi)
+{
+    int ioapic, pin, irq;
+
+    ioapic = mp_find_ioapic(gsi);
+    if ( ioapic < 0 )
+        return -EINVAL;
+
+    pin = gsi - io_apic_gsi_base(ioapic);
+
+    irq = apic_pin_2_gsi_irq(ioapic, pin);
+    if ( irq <= 0 )
+        return -EINVAL;
+
+    return irq;
+}
+
 static inline int IO_APIC_irq_trigger(int irq)
 {
     int apic, idx, pin;
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index d8ccab2449c6..c95da0de5770 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -841,8 +841,7 @@ static struct mp_ioapic_routing {
 } mp_ioapic_routing[MAX_IO_APICS];
 
 
-static int mp_find_ioapic (
-    int gsi)
+int mp_find_ioapic(int gsi)
 {
     unsigned int i;
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 2a49fe46ce25..f7ae8b19d27d 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -464,6 +464,12 @@ struct xen_domctl_irq_permission {
     uint8_t pad[3];
 };
 
+/* XEN_DOMCTL_gsi_permission */
+struct xen_domctl_gsi_permission {
+    uint32_t gsi;
+    uint8_t allow_access;    /* flag to specify enable/disable of x86 gsi access */
+    uint8_t pad[3];
+};
 
 /* XEN_DOMCTL_iomem_permission */
 struct xen_domctl_iomem_permission {
@@ -1306,6 +1312,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_get_paging_mempool_size       85
 #define XEN_DOMCTL_set_paging_mempool_size       86
 #define XEN_DOMCTL_dt_overlay                    87
+#define XEN_DOMCTL_gsi_permission                88
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
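On the "all bits and pads are zero except lowest bit" check: a stand-alone version may help when reviewing it. The sketch below uses a hypothetical struct gsi_permission_model that mirrors the layout of struct xen_domctl_gsi_permission, and a hypothetical function name. Note that testing the upper bits of the flag byte needs the bitwise complement of the mask; a logical negation of a non-zero mask evaluates to 0 and would turn the test into a no-op.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct xen_domctl_gsi_permission. */
struct gsi_permission_model {
    uint32_t gsi;
    uint8_t allow_access; /* only bit 0 carries meaning */
    uint8_t pad[3];       /* must be zero */
};

/* Returns true when only bit 0 of allow_access is used and pad is zero. */
bool gsi_permission_args_valid(const struct gsi_permission_model *p)
{
    const uint8_t mask = 1;
    size_t i;

    /* ~mask selects every bit except bit 0 (after integer promotion). */
    if (p->allow_access & ~mask)
        return false;

    for (i = 0; i < sizeof(p->pad) / sizeof(p->pad[0]); i++)
        if (p->pad[i])
            return false;

    return true;
}
```

Rejecting unknown bits and non-zero padding up front keeps those bits free to be given a meaning later without breaking callers that were already passing zeros.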
[XEN PATCH v11 4/8] x86/physdev: Return pirq that irq was already mapped to
allocate_pirq allocates a pirq for an irq. It supports allocating
either a free pirq (the pirq parameter is < 0) or a specific pirq (the
pirq parameter is > 0).

The current code has four use cases (current_pirq is the pirq that the
irq is already mapped to, if any).

First, pirq > 0 and current_pirq > 0: if pirq == current_pirq, the irq
is already mapped to the pirq expected by the caller, so it succeeds;
if pirq != current_pirq, the pirq expected by the caller has been
mapped to another irq, so it fails.

Second, pirq > 0 and current_pirq < 0: the pirq expected by the caller
has not been allocated to any irq, so it can be allocated to the
caller, and it succeeds.

Third, pirq < 0 and current_pirq < 0: the caller wants a free pirq for
the irq and the irq has no mapped pirq, so it succeeds.

Fourth, pirq < 0 and current_pirq > 0: the caller wants a free pirq for
the irq but the irq already has a mapped pirq; the function then
returns the negative pirq, so it fails.

The problem is in the fourth case. Since the irq already has a mapped
pirq (current_pirq) and the caller does not ask for a specific pirq,
current_pirq should be returned directly, indicating that the
allocation succeeded. That lets a caller succeed when it just wants a
free pirq but does not know whether the irq already has a mapped pirq.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
 xen/arch/x86/irq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 9a611c79e024..5ccca1646eb1 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2897,6 +2897,8 @@ static int allocate_pirq(struct domain *d, int index, int pirq, int irq,
                     d->domain_id, index, pirq, current_pirq);
             if ( current_pirq < 0 )
                 return -EBUSY;
+            else
+                return current_pirq;
         }
         else if ( type == MAP_PIRQ_TYPE_MULTI_MSI )
         {
--
2.34.1
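The decision described above can be written down as a small pure function, which makes the cases easy to check one by one. This is a simplified model and not the Xen function: choose_pirq and the next_free parameter are hypothetical, the sketch takes 0 to mean "the irq has no mapping yet", it keeps the -EBUSY return for a negative current_pirq that is visible in the hunk, and the -EEXIST value for a mismatching specific request is an assumption.

```c
#include <errno.h>

/*
 * requested: > 0 for a specific pirq, < 0 for "any free pirq"
 * current:   pirq the irq is already mapped to; 0 means none (sketch
 *            convention), < 0 is treated as busy like in the hunk
 * next_free: the pirq a free-pirq allocator would hand out (hypothetical)
 * Returns the pirq to use, or a negative errno value.
 */
int choose_pirq(int requested, int current, int next_free)
{
    if (requested < 0) {
        if (current < 0)
            return -EBUSY;
        if (current > 0)
            return current;   /* fourth case after the patch: reuse it */
        return next_free;     /* third case: nothing mapped yet */
    }

    if (current > 0 && requested != current)
        return -EEXIST;       /* first case, mismatch (assumed errno) */

    return requested;         /* first case match, or second case */
}
```

The behavioural change of the patch is the `current > 0` branch under `requested < 0`: a caller that asks for any free pirq now gets the existing mapping back instead of an error.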
[XEN PATCH v11 1/8] xen/vpci: Clear all vpci status of device
When a device has been reset on dom0 side, the vpci on Xen side won't get notification, so the cached state in vpci is all out of date compare with the real device state. To solve that problem, add a new hypercall to clear all vpci device state. When the state of device is reset on dom0 side, dom0 can call this hypercall to notify vpci. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen Reviewed-by: Stewart Hildebrand Reviewed-by: Stefano Stabellini --- xen/arch/x86/hvm/hypercall.c | 1 + xen/drivers/pci/physdev.c| 58 xen/drivers/vpci/vpci.c | 10 +++ xen/include/public/physdev.h | 20 + xen/include/xen/vpci.h | 8 + 5 files changed, 97 insertions(+) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 7fb3136f0c7c..0fab670a4871 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_pci_mmcfg_reserved: case PHYSDEVOP_pci_device_add: case PHYSDEVOP_pci_device_remove: +case PHYSDEVOP_pci_device_state_reset: case PHYSDEVOP_dbgp_op: if ( !is_hardware_domain(currd) ) return -ENOSYS; diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c index 42db3e6d133c..19a755d1c127 100644 --- a/xen/drivers/pci/physdev.c +++ b/xen/drivers/pci/physdev.c @@ -2,6 +2,7 @@ #include #include #include +#include #ifndef COMPAT typedef long ret_t; @@ -67,6 +68,63 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } +case PHYSDEVOP_pci_device_state_reset: +{ +struct pci_device_state_reset dev_reset; +struct pci_dev *pdev; +pci_sbdf_t sbdf; + +ret = -EOPNOTSUPP; +if ( !is_pci_passthrough_enabled() ) +break; + +ret = -EFAULT; +if ( copy_from_guest(&dev_reset, arg, 1) != 0 ) +break; + +sbdf = PCI_SBDF(dev_reset.dev.seg, +dev_reset.dev.bus, +dev_reset.dev.devfn); + +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf); +if ( ret ) +break; + +pcidevs_lock(); +pdev = pci_get_pdev(NULL, sbdf); +if ( 
!pdev ) +{ +pcidevs_unlock(); +ret = -ENODEV; +break; +} + +write_lock(&pdev->domain->pci_lock); +pcidevs_unlock(); +/* Implement FLR, other reset types may be implemented in future */ +switch ( dev_reset.reset_type ) +{ +case PCI_DEVICE_STATE_RESET_COLD: +case PCI_DEVICE_STATE_RESET_WARM: +case PCI_DEVICE_STATE_RESET_HOT: +case PCI_DEVICE_STATE_RESET_FLR: +{ +ret = vpci_reset_device_state(pdev, dev_reset.reset_type); +if ( ret ) +dprintk(XENLOG_ERR, +"%pp: failed to reset vPCI device state\n", &sbdf); +break; +} + +default: +ret = -EOPNOTSUPP; +break; +} +write_unlock(&pdev->domain->pci_lock); + +break; +} + default: ret = -ENOSYS; break; diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index 1e6aa5d799b9..7e914d1eff9f 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev) return rc; } + +int vpci_reset_device_state(struct pci_dev *pdev, +uint32_t reset_type) +{ +ASSERT(rw_is_write_locked(&pdev->domain->pci_lock)); + +vpci_deassign_device(pdev); +return vpci_assign_device(pdev); +} + #endif /* __XEN__ */ static int vpci_register_cmp(const struct vpci_register *r1, diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h index f0c0d4727c0b..ddbcdfb05248 100644 --- a/xen/include/public/physdev.h +++ b/xen/include/public/physdev.h @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t); */ #define PHYSDEVOP_prepare_msix 30 #define PHYSDEVOP_release_msix 31 +/* + * Notify the hypervisor that a PCI device has been reset, so that any + * internally cached state is regenerated. Should be called after any + * device reset performed by the hardware domain. 
+ */ +#define PHYSDEVOP_pci_device_state_reset 32 + struct physdev_pci_device { /* IN */ uint16_t seg; @@ -305,6 +312,19 @@ struct physdev_pci_device { typedef struct physdev_pci_device physdev_pci_device_t; DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t); +struct pci_device_state_reset { +physdev_pci_device_t dev; +#define _PCI_DEVICE_STATE_RESET_COLD 0 +#define PCI_DEVICE_STATE_RESET_COLD (1U<<_PCI_DEVICE_STATE_RESET_COLD) +#define _PCI_DEVICE_STATE_RESET_WARM 1 +#define PCI_DEVICE_ST
[XEN PATCH v10 4/5] tools: Add new function to get gsi from dev
In PVH dom0, it uses the linux local interrupt mechanism, when it allocs irq for a gsi, it is dynamic, and follow the principle of applying first, distributing first. And irq number is alloced from small to large, but the applying gsi number is not, may gsi 38 comes before gsi 28, that causes the irq number is not equal with the gsi number. And when passthrough a device, QEMU will use its gsi number to do pirq mapping, see xen_pt_realize->xc_physdev_map_pirq, but the gsi number is got from file /sys/bus/pci/devices//irq, so it will fail when mapping. And in current codes, there is no method to get gsi for userspace. For above purpose, add new function to get gsi. And call this function before xc_physdev_(un)map_pirq Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Chen Jiqian --- RFC: it needs review and needs to wait for the corresponding third patch on linux kernel side to be merged. --- tools/include/xen-sys/Linux/privcmd.h | 7 + tools/include/xencall.h | 2 ++ tools/include/xenctrl.h | 2 ++ tools/libs/call/core.c| 5 tools/libs/call/libxencall.map| 2 ++ tools/libs/call/linux.c | 15 +++ tools/libs/call/private.h | 9 +++ tools/libs/ctrl/xc_physdev.c | 4 +++ tools/libs/light/Makefile | 2 +- tools/libs/light/libxl_pci.c | 38 +++ 10 files changed, 85 insertions(+), 1 deletion(-) diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h index bc60e8fd55eb..977f1a058797 100644 --- a/tools/include/xen-sys/Linux/privcmd.h +++ b/tools/include/xen-sys/Linux/privcmd.h @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource { __u64 addr; } privcmd_mmap_resource_t; +typedef struct privcmd_gsi_from_dev { + __u32 sbdf; + int gsi; +} privcmd_gsi_from_dev_t; + /* * @cmd: IOCTL_PRIVCMD_HYPERCALL * @arg: &privcmd_hypercall_t @@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource { _IOC(_IOC_NONE, 'P', 6, sizeof(domid_t)) #define IOCTL_PRIVCMD_MMAP_RESOURCE\ _IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t)) +#define 
IOCTL_PRIVCMD_GSI_FROM_DEV \ + _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_dev_t)) #define IOCTL_PRIVCMD_UNIMPLEMENTED\ _IOC(_IOC_NONE, 'P', 0xFF, 0) diff --git a/tools/include/xencall.h b/tools/include/xencall.h index fc95ed0fe58e..750aab070323 100644 --- a/tools/include/xencall.h +++ b/tools/include/xencall.h @@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op, uint64_t arg1, uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5); +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf); + /* Variant(s) of the above, as needed, returning "long" instead of "int". */ long xencall2L(xencall_handle *xcall, unsigned int op, uint64_t arg1, uint64_t arg2); diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 9ceca0cffc2f..a0381f74d24b 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch, uint32_t domid, int pirq); +int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf); + /* * LOGGING AND ERROR REPORTING */ diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c index 02c4f8e1aefa..6dae50c9a6ba 100644 --- a/tools/libs/call/core.c +++ b/tools/libs/call/core.c @@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op, return osdep_hypercall(xcall, &call); } +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf) +{ +return osdep_oscall(xcall, sbdf); +} + /* * Local variables: * mode: C diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map index d18a3174e9dc..b92a0b5dc12c 100644 --- a/tools/libs/call/libxencall.map +++ b/tools/libs/call/libxencall.map @@ -10,6 +10,8 @@ VERS_1.0 { xencall4; xencall5; + xen_oscall_gsi_from_dev; + xencall_alloc_buffer; xencall_free_buffer; xencall_alloc_buffer_pages; diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c index 6d588e6bea8f..92c740e176f2 100644 --- a/tools/libs/call/linux.c +++ b/tools/libs/call/linux.c @@ -85,6 
+85,21 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall) return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall); } +int osdep_oscall(xencall_handle *xcall, unsigned int sbdf) +{ +privcmd_gsi_from_dev_t dev_gsi = { +.sbdf = sbdf, +.gsi = -1, +}; + +if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &
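As an aside on why the irq number read from sysfs cannot stand in for the gsi: the commit message says irqs are handed out dynamically, in request order and from small to large. A toy allocator shows the effect. Everything here is hypothetical (the struct, the function, the table size, and the starting irq number chosen by the caller); it models only the first-come-first-served behaviour described in the commit message, not the Linux irq code.

```c
#define MAX_GSI 64

/* Toy model of dynamic irq allocation for gsis. */
struct irq_allocator {
    int next_irq;            /* next irq number to hand out */
    int irq_of_gsi[MAX_GSI]; /* 0 means no irq allocated for this gsi yet */
};

/* Returns the irq bound to gsi, allocating one on first use; -1 if out of range. */
int irq_for_gsi(struct irq_allocator *a, int gsi)
{
    if (gsi < 0 || gsi >= MAX_GSI)
        return -1;

    if (a->irq_of_gsi[gsi] == 0)
        a->irq_of_gsi[gsi] = a->next_irq++;

    return a->irq_of_gsi[gsi];
}
```

If the allocator starts at irq 24 and gsi 38 is requested before gsi 28, gsi 38 ends up on irq 24 and gsi 28 on irq 25, so neither irq number equals its gsi and a lookup that treats them as the same value picks the wrong interrupt.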
[XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
Some type of domain don't have PIRQs, like PVH, it doesn't do PHYSDEVOP_map_pirq for each gsi. When passthrough a device to guest base on PVH dom0, callstack pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and irq on Xen side. What's more, current hypercall XEN_DOMCTL_irq_permission requires passing in pirq, it is not suitable for dom0 that doesn't have PIRQs. So, add a new hypercall XEN_DOMCTL_gsi_permission to grant the permission of irq(translate from gsi) to dumU when dom0 has no PIRQs. Signed-off-by: Jiqian Chen Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- RFC: it needs review and needs to wait for the corresponding third patch on linux kernel side to be merged. --- tools/include/xenctrl.h| 5 +++ tools/libs/ctrl/xc_domain.c| 15 +++ tools/libs/light/libxl_pci.c | 67 +++--- xen/arch/x86/domctl.c | 43 +++ xen/arch/x86/include/asm/io_apic.h | 2 + xen/arch/x86/io_apic.c | 17 xen/arch/x86/mpparse.c | 3 +- xen/include/public/domctl.h| 8 xen/xsm/flask/hooks.c | 1 + 9 files changed, 153 insertions(+), 8 deletions(-) diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index a0381f74d24b..f3feb6848e25 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch, uint32_t pirq, bool allow_access); +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + bool allow_access); + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c index f2d9d14b4d9f..8540e84fda93 100644 --- a/tools/libs/ctrl/xc_domain.c +++ b/tools/libs/ctrl/xc_domain.c @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch, return do_domctl(xch, &domctl); } +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + bool allow_access) +{ +struct xen_domctl domctl 
= { +.cmd = XEN_DOMCTL_gsi_permission, +.domain = domid, +.u.gsi_permission.gsi = gsi, +.u.gsi_permission.allow_access = allow_access, +}; + +return do_domctl(xch, &domctl); +} + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c index 376f91759ac6..f027f22c0028 100644 --- a/tools/libs/light/libxl_pci.c +++ b/tools/libs/light/libxl_pci.c @@ -1431,6 +1431,9 @@ static void pci_add_dm_done(libxl__egc *egc, uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED; uint32_t domainid = domid; bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid); +#ifdef CONFIG_X86 +xc_domaininfo_t info; +#endif /* Convenience aliases */ bool starting = pas->starting; @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc, rc = ERROR_FAIL; goto out; } -r = xc_domain_irq_permission(ctx->xch, domid, irq, 1); +#ifdef CONFIG_X86 +/* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */ +r = xc_domain_getinfo_single(ctx->xch, 0, &info); if (r < 0) { -LOGED(ERROR, domainid, - "xc_domain_irq_permission irq=%d (error=%d)", irq, r); +LOGED(ERROR, domainid, "getdomaininfo failed (error=%d)", errno); fclose(f); rc = ERROR_FAIL; goto out; } +if (info.flags & XEN_DOMINF_hvm_guest && +!(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ) && +gsi > 0) { +r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1); +if (r < 0) { +LOGED(ERROR, domainid, +"xc_domain_gsi_permission gsi=%d (error=%d)", gsi, errno); +fclose(f); +rc = ERROR_FAIL; +goto out; +} +} +else +#endif +{ +r = xc_domain_irq_permission(ctx->xch, domid, irq, 1); +if (r < 0) { +LOGED(ERROR, domainid, +"xc_domain_irq_permission irq=%d (error=%d)", irq, errno); +fclose(f); +rc = ERROR_FAIL; +goto out; +} +}
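The branch that picks between the two permission calls in pci_add_dm_done can be isolated into a predicate. The sketch below is only a model: the function name is hypothetical, and the two bit values are local stand-ins for XEN_DOMINF_hvm_guest and XEN_X86_EMU_USE_PIRQ chosen for illustration, not copied from the public headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins; the real values live in the Xen public headers. */
#define DOMINF_HVM_GUEST   (1u << 1)
#define X86_EMU_USE_PIRQ_F (1u << 9)

/*
 * Returns true when the gsi-based permission call should be used:
 * dom0 is an HVM-type domain (which includes PVH), it has no PIRQ
 * emulation flag, and a usable gsi was obtained.
 */
bool use_gsi_permission(uint32_t dom0_flags, uint32_t dom0_emu_flags, int gsi)
{
    return (dom0_flags & DOMINF_HVM_GUEST) &&
           !(dom0_emu_flags & X86_EMU_USE_PIRQ_F) &&
           gsi > 0;
}
```

When the predicate is false the code falls back to xc_domain_irq_permission, which keeps the existing behaviour for a PV dom0 and for kernels that cannot report a gsi.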
[XEN PATCH v10 0/5] Support device passthrough when dom0 is PVH on Xen
Hi All,

This is the v10 series to support passthrough when dom0 is PVH.

v9->v10 changes:
* patch#2: Indent the comments above PHYSDEVOP_map_pirq according to the code style.
* patch#3: Modified the description in the commit message, changing "it calls" to "it will need to call", indicating that there will be new code on the kernel side that calls PHYSDEVOP_setup_gsi. Also added an explanation of why the interrupt of a passthrough device does not work if the gsi is not registered.
* patch#4: Added a define for CONFIG_X86 in tools/libs/light/Makefile to isolate x86 code in libxl_pci.c.
* patch#5: Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission.
  Deleted pci_device_set_gsi and called XEN_DOMCTL_gsi_permission directly in pci_add_dm_done.
  Added a check that the padding field in XEN_DOMCTL_gsi_permission is all zeros, and used currd instead of current->domain.
  In the function gsi_2_irq, used apic_pin_2_gsi_irq instead of the original new code, and added error handling for irq0.
  Deleted the extra blank lines above and below the struct xen_domctl_gsi_permission definition.

All patches have modified signatures as follows:
Signed-off-by: Jiqian Chen means I am the author.
Signed-off-by: Huang Rui means Rui sent them upstream first.
Signed-off-by: Jiqian Chen means I am continuing the upstreaming.

Best regards,
Jiqian Chen

v8->v9 changes:
* patch#1: Move pcidevs_unlock below write_lock, and remove "ASSERT(pcidevs_locked());" from vpci_reset_device_state; add pci_device_state_reset_type to distinguish the reset types.
* patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why this hypercall is needed. Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" to "d == current->domain".
* patch#3: Remove the check for PHYSDEVOP_setup_gsi, since the same check already exists below. Although the return values differ, the difference is acceptable for the sake of code consistency:
      if ( !is_hardware_domain(currd) )
          return -ENOSYS;
      break;
* patch#5: Change the commit message to describe in more detail why this new hypercall is needed.
  Add a comment above "if ( is_pv_domain(current->domain) || has_pirq(current->domain) )" to explain why this check is needed.
  Add gsi_2_irq to translate a gsi to an irq, instead of assuming gsi == irq.
  Add explicit padding to struct xen_domctl_gsi_permission.

v7->v8 changes:
* patch#2: Add the domid check (domid == DOMID_SELF) to prevent self-mapping when the guest doesn't use pirq. That check was missing in the previous version.
* patch#4: Because the kernel implementation for obtaining the gsi changed, add a new function that gets the gsi from the sbdf of the pci device.
* patch#5: Remove the parameter "is_gsi". When a gsi exists, pci_add_dm_done uses a new function pci_device_set_gsi to do map_pirq and grant the permission, which makes the code logic more intuitive.

v6->v7 changes:
* patch#4: Because the kernel implementation for obtaining the gsi changed, add a new function that gets the gsi from the irq, instead of from the gsi sysfs node.
* patch#5: Fix the issue with variable usage, rc->r.

v5->v6 changes:
* patch#1: Add Reviewed-by Stefano and Stewart.
Rebase code and change old function vpci_remove_device, vpci_add_handlers to vpci_deassign_device, vpci_assign_device * patch#2: Add Reviewed-by Stefano * patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));" * patch#4: Fix some coding style issues below directory tools * patch#5: Modified some variable names and code logic to make code easier to be understood, which to use gsi by default and be compatible with older kernel versions to continue to use irq v4->v5 changes: * patch#1: add pci_lock wrap function vpci_reset_device_state * patch#2: move the check of self map_pirq to physdev.c, and change to check if the caller has PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op * patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd)); * patch#4: is the patch#5 in v4 because patch#5 in v5 has some dependency on it. And add the handling of errno and add the Reviewed-by Stefano * patch#5: is the patch#4 in v4. New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi v3->v4 changes: * patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move printings behind pcidevs_unlock * patch#2: add check to prevent PVH self map * patch#3: new patch, The implementation of adding PHYSDEVOP_setup_gs
[XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
When Xen runs with a PVH dom0 and an HVM domU, a pirq is mapped for a passthrough device by using its gsi, see the QEMU code xen_pt_realize->xc_physdev_map_pirq and the libxl code pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not allowed, because currd is the PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, so the call fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq. Add a new check to prevent self map when the subject domain has no PIRQ flag. That way a domU with the PIRQ flag can successfully map a pirq for passthrough devices even when dom0 has no PIRQ flag.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stefano Stabellini
---
 xen/arch/x86/hvm/hypercall.c |  6 ++
 xen/arch/x86/physdev.c       | 14 ++
 2 files changed, 20 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..03ada3c880bd 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd )
     {
+    /*
+     * Only being permitted for management of other domains.
+     * Further restrictions are enforced in do_physdev_op.
+     */
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index d6dd622952a9..f38cc22c872e 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -323,6 +323,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( !d )
             break;
 
+        /* Prevent self-map when currd has no X86_EMU_USE_PIRQ flag */
+        if ( is_hvm_domain(d) && !has_pirq(d) && d == currd )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+
         ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
 
         rcu_unlock_domain(d);
@@ -346,6 +353,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( !d )
             break;
 
+        /* Prevent self-unmap when currd has no X86_EMU_USE_PIRQ flag */
+        if ( is_hvm_domain(d) && !has_pirq(d) && d == currd )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+
         ret = physdev_unmap_pirq(d, unmap.pirq);
 
         rcu_unlock_domain(d);
-- 
2.34.1
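For readers following the series, the effect of the check added to do_physdev_op can be modelled outside the hypervisor. The following is only a sketch under stated assumptions: struct domain, is_hvm_domain() and has_pirq() are simplified stand-ins written for this example, not Xen's real definitions, and check_map_pirq() is a hypothetical helper name that does not exist in the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for Xen's struct domain (assumption, illustration only). */
struct domain {
    const char *name;
    bool hvm;       /* true for HVM and PVH domains, false for PV */
    bool use_pirq;  /* models the X86_EMU_USE_PIRQ emulation flag */
};

/* Stand-ins for the Xen predicates of the same name. */
static bool is_hvm_domain(const struct domain *d) { return d->hvm; }
static bool has_pirq(const struct domain *d) { return d->use_pirq; }

/*
 * Hypothetical helper mirroring the condition from the patch:
 * d is the subject domain of the (un)map request, currd the caller.
 * An HVM/PVH domain without the PIRQ flag may act on other domains,
 * but not on itself.
 */
static int check_map_pirq(const struct domain *d, const struct domain *currd)
{
    if ( is_hvm_domain(d) && !has_pirq(d) && d == currd )
        return -EOPNOTSUPP;

    return 0;
}
```

With this model, a PVH dom0 mapping a pirq for an HVM domU that has the PIRQ flag passes the check, while the same PVH dom0 trying to map a pirq for itself gets -EOPNOTSUPP.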
[XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
The gsi of a passthrough device must be configured before it can be mapped into an HVM domU. But when dom0 is PVH, the gsis don't get registered. As a result, the apic, pin and irq information is not added to the irq_2_pin list and the handler of the irq_desc is not set, so setting the ioapic affinity and vector fails when a device is passed through.

To fix this, new code on the Linux kernel side needs to call PHYSDEVOP_setup_gsi for passthrough devices to register the gsi when dom0 is PVH. So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for that purpose.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
The code that will call this hypercall on the Linux kernel side is at the following link:
https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 03ada3c880bd..cfe82d0f96ed 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ENOSYS;
         break;
 
+    case PHYSDEVOP_setup_gsi:
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
-- 
2.34.1
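As a rough illustration of what a kernel-side caller passes to PHYSDEVOP_setup_gsi, the sketch below encodes the trigger and polarity the same way the kernel patch of this thread does in xen_pvh_setup_gsi (edge maps to 0, anything else to 1; active high maps to 0, anything else to 1) and treats -EEXIST as success. Everything prefixed EX_ or ex_ is a stand-in invented for the example: the enum values are not the real ACPI constants, and struct ex_setup_gsi only mirrors the three fields used in the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Example-only trigger and polarity values (assumption, not ACPI's). */
enum ex_trigger { EX_EDGE_SENSITIVE, EX_LEVEL_SENSITIVE };
enum ex_polarity { EX_ACTIVE_HIGH, EX_ACTIVE_LOW };

/* Mirrors the gsi/triggering/polarity fields filled in by the caller. */
struct ex_setup_gsi {
    int gsi;
    uint8_t triggering; /* 0 = edge, 1 = level */
    uint8_t polarity;   /* 0 = active high, 1 = active low */
};

/* Hypothetical helper: build the hypercall argument for one gsi. */
static struct ex_setup_gsi ex_fill_setup_gsi(int gsi, enum ex_trigger trigger,
                                             enum ex_polarity polarity)
{
    struct ex_setup_gsi s = {
        .gsi = gsi,
        .triggering = (trigger == EX_EDGE_SENSITIVE) ? 0 : 1,
        .polarity = (polarity == EX_ACTIVE_HIGH) ? 0 : 1,
    };

    return s;
}

/* A gsi that is already set up is not an error for the caller. */
static int ex_normalize_setup_gsi_result(int ret)
{
    return ret == -EEXIST ? 0 : ret;
}
```

Folding -EEXIST into success lets the same gsi be set up more than once without the caller failing, which matters when several passthrough devices share a gsi.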
[XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device
When a device has been reset on the dom0 side, the vpci on the Xen side doesn't get notified, so the cached state in vpci is out of date compared with the real device state.

To solve that problem, add a new hypercall to clear all vpci device state. When the state of a device is reset on the dom0 side, dom0 can call this hypercall to notify vpci.

Signed-off-by: Jiqian Chen
Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stewart Hildebrand
Reviewed-by: Stefano Stabellini
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c    | 43
 xen/drivers/vpci/vpci.c      |  9
 xen/include/public/physdev.h |  7 ++
 xen/include/xen/pci.h        | 16 ++
 xen/include/xen/vpci.h       |  6 +
 6 files changed, 82 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
+    case PHYSDEVOP_pci_device_state_reset:
     case PHYSDEVOP_dbgp_op:
         if ( !is_hardware_domain(currd) )
             return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..1cce508a73b1 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,11 +2,17 @@
 #include
 #include
 #include
+#include
 
 #ifndef COMPAT
 typedef long ret_t;
 #endif
 
+static const struct pci_device_state_reset_method
+                    pci_device_state_reset_methods[] = {
+    [ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
+};
+
 ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     ret_t ret;
@@ -67,6 +73,43 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case PHYSDEVOP_pci_device_state_reset: {
+        struct pci_device_state_reset dev_reset;
+        struct physdev_pci_device *dev;
+        struct pci_dev *pdev;
+        pci_sbdf_t sbdf;
+
+        if ( !is_pci_passthrough_enabled() )
+            return -EOPNOTSUPP;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+            break;
+        dev = &dev_reset.dev;
+        sbdf = PCI_SBDF(dev->seg, dev->bus, dev->devfn);
+
+        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+        if ( ret )
+            break;
+
+        pcidevs_lock();
+        pdev = pci_get_pdev(NULL, sbdf);
+        if ( !pdev )
+        {
+            pcidevs_unlock();
+            ret = -ENODEV;
+            break;
+        }
+
+        write_lock(&pdev->domain->pci_lock);
+        pcidevs_unlock();
+        ret = pci_device_state_reset_methods[dev_reset.reset_type].reset_fn(pdev);
+        write_unlock(&pdev->domain->pci_lock);
+        if ( ret )
+            printk(XENLOG_ERR "%pp: failed to reset vPCI device state\n", &sbdf);
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..ff67c2550ccb 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
 
     return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+    vpci_deassign_device(pdev);
+    return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..a71da5892e5f 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix 30
 #define PHYSDEVOP_release_msix 31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated. Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
     /* IN */
     uint16_t seg;
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 63e49f0117e9..376981f9da98 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -156,6 +156,22 @@ struct pci_dev {
     struct vpci *vpci;
 };
 
+struct pci_device_state_reset_method {
+    int (*reset_fn)(struct pci_dev *pdev);
+};
+
+enum pci_device_state_reset_type {
+    DEVICE_RESET_FLR,
+    DEVICE_RESET_COLD,
+    DEVICE_RESET_WARM,
+    DEVICE_RESET_HOT,
+};
+
+struct pci_device_state_reset {
+    struct physdev_pci_device dev;
+    enum pci_device_state_reset_type reset_type;
+};
+
 #define for_each_pdev(d
[XEN PATCH v9 0/5] Support device passthrough when dom0 is PVH on Xen
Hi All,
This is the v9 series to support passthrough when dom0 is PVH.

v8->v9 changes:
* patch#1: Move pcidevs_unlock below write_lock, and remove "ASSERT(pcidevs_locked());" from vpci_reset_device_state; add pci_device_state_reset_type to distinguish the reset types.
* patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why this hypercall is needed. Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" to "d == current->domain".
* patch#3: Remove the check of PHYSDEVOP_setup_gsi, since the same check exists below.
* patch#5: Change the commit message to better describe why we need this new hypercall. Add a comment above "if ( is_pv_domain(current->domain) || has_pirq(current->domain) )" to explain why we need this check. Add gsi_2_irq to transform gsi to irq, instead of assuming gsi == irq. Add explicit padding to struct xen_domctl_gsi_permission.

Best regards,
Jiqian Chen

v7->v8 changes:
* patch#2: Add the domid check (domid == DOMID_SELF) to prevent self map when the guest doesn't use pirq. That check was missed in the previous version.
* patch#4: Due to changes in the implementation of obtaining gsi in the kernel, change to add a new function to get gsi by passing in the sbdf of the pci device.
* patch#5: Remove the parameter "is_gsi"; when a gsi exists, pci_add_dm_done uses a new function pci_device_set_gsi to do map_pirq and grant permission. That makes the code logic more intuitive.

v6->v7 changes:
* patch#4: Due to changes in the implementation of obtaining gsi in the kernel, change to add a new function to get gsi from irq, instead of gsi sysfs.
* patch#5: Fix the issue with variable usage, rc->r.

v5->v6 changes:
* patch#1: Add Reviewed-by Stefano and Stewart. Rebase code and change the old functions vpci_remove_device, vpci_add_handlers to vpci_deassign_device, vpci_assign_device
* patch#2: Add Reviewed-by Stefano
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));"
* patch#4: Fix some coding style issues under the tools directory
* patch#5: Modify some variable names and code logic to make the code easier to understand: use gsi by default, and stay compatible with older kernel versions by continuing to use irq

v4->v5 changes:
* patch#1: add pci_lock around the function vpci_reset_device_state
* patch#2: move the check of self map_pirq to physdev.c, and change it to check whether the caller has the PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op
* patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: is patch#5 in v4, because patch#5 in v5 has some dependency on it. Add the handling of errno and add the Reviewed-by from Stefano
* patch#5: is patch#4 in v4. New implementation that adds the new hypercall XEN_DOMCTL_gsi_permission to grant gsi

v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move printings behind pcidevs_unlock
* patch#2: add check to prevent PVH self map
* patch#3: new patch; the implementation of adding PHYSDEVOP_setup_gsi for PVH is treated as a separate patch
* patch#4: new patch to solve the map_pirq problem of PVH dom0; use gsi to grant irq permission in XEN_DOMCTL_irq_permission.
* patch#5: to be compatible with previous kernel versions, when there is no gsi sysfs, still use irq
v4 link: https://lore.kernel.org/xen-devel/20240105070920.350113-1-jiqian.c...@amd.com/T/#t

v2->v3 changes:
* patch#1: move the content out of pci_reset_device_state and delete pci_reset_device_state; add xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset; add description for PHYSDEVOP_pci_device_state_reset;
* patch#2: due to changes in the implementation of the second patch on the kernel side (it now does setup_gsi and map_pirq when assigning a device to passthrough), add PHYSDEVOP_setup_gsi for PVH dom0, and we need to support self mapping.
* patch#3: due to changes in the implementation of the second patch on the kernel side (it adds a new sysfs node for gsi instead of a new syscall), read the gsi number from the gsi sysfs node.
v3 link: https://lore.kernel.org/xen-devel/20231210164009.1551147-1-jiqian.c...@amd.com/T/#t
v2 link: https://lore.kernel.org/xen-devel/20231124104136.3263722-1-jiqian.c...@amd.com/T/#t

Below is the description of the v2 cover letter:
This series of patches is the v2 of the implementation of passthrough when dom0 is PVH on Xen. We sent the v1 upstream before, but the v1 had so many problems and we got lots of suggestions. I will introduce all the issues that these patches try to fix and the differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a pa
[XEN PATCH v9 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
When Xen runs with a PVH dom0 and an HVM domU, a pirq is mapped for a passthrough device by using its gsi, see the QEMU code xen_pt_realize->xc_physdev_map_pirq and the libxl code pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not allowed, because currd is the PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, so the call fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq. Add a new check to prevent self map when the subject domain has no PIRQ flag.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stefano Stabellini
---
 xen/arch/x86/hvm/hypercall.c |  6 ++
 xen/arch/x86/physdev.c       | 24
 2 files changed, 30 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..fa5d50a0dd22 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd )
     {
+    /*
+     * Only being permitted for management of other domains.
+     * Further restrictions are enforced in do_physdev_op.
+     */
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..61999882f836 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_map_pirq: {
         physdev_map_pirq_t map;
         struct msi_info msi;
+        struct domain *d;
 
         ret = -EFAULT;
         if ( copy_from_guest(&map, arg, 1) != 0 )
             break;
 
+        d = rcu_lock_domain_by_any_id(map.domid);
+        if ( d == NULL )
+            return -ESRCH;
+        /* Prevent self-map when domain has no X86_EMU_USE_PIRQ flag */
+        if ( is_hvm_domain(d) && !has_pirq(d) && d == current->domain )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+        rcu_unlock_domain(d);
+
         switch ( map.type )
         {
         case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_unmap_pirq: {
         struct physdev_unmap_pirq unmap;
+        struct domain *d;
 
         ret = -EFAULT;
         if ( copy_from_guest(&unmap, arg, 1) != 0 )
             break;
 
+        d = rcu_lock_domain_by_any_id(unmap.domid);
+        if ( d == NULL )
+            return -ESRCH;
+        /* Prevent self-unmap when domain has no X86_EMU_USE_PIRQ flag */
+        if ( is_hvm_domain(d) && !has_pirq(d) && d == current->domain )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+        rcu_unlock_domain(d);
+
         ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
         break;
     }
-- 
2.34.1
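The filtering that hvm_physdev_op applies once this patch is in place can be summarised with a small model. This is a sketch under assumptions, not the hypervisor code: the enum, struct and function names are made up for the example, only one representative op per group is modelled, and the -ENOSYS result for unlisted ops is assumed rather than shown in the diff.

```c
#include <errno.h>
#include <stdbool.h>

/* One representative op per group of the switch (example-only names). */
enum ex_op {
    EX_MAP_PIRQ,        /* stands for PHYSDEVOP_map_pirq */
    EX_UNMAP_PIRQ,      /* stands for PHYSDEVOP_unmap_pirq */
    EX_EOI,             /* stands for the group guarded by has_pirq */
    EX_PCI_DEVICE_ADD,  /* stands for the hardware-domain-only group */
    EX_OTHER,           /* anything not listed in the switch */
};

/* The two properties of the calling domain that the switch looks at. */
struct ex_caller {
    bool has_pirq;
    bool is_hardware_domain;
};

/* Returns 0 if the op may proceed to do_physdev_op, else -ENOSYS. */
static int ex_hvm_physdev_gate(enum ex_op op, const struct ex_caller *c)
{
    switch ( op )
    {
    case EX_MAP_PIRQ:
    case EX_UNMAP_PIRQ:
        /* Always forwarded; do_physdev_op enforces the self-map rule. */
        break;

    case EX_EOI:
        if ( !c->has_pirq )
            return -ENOSYS;
        break;

    case EX_PCI_DEVICE_ADD:
        if ( !c->is_hardware_domain )
            return -ENOSYS;
        break;

    default:
        return -ENOSYS;
    }

    return 0;
}
```

In this model a PVH dom0 (no PIRQ flag, hardware domain) gets past the gate for map and unmap requests but is still refused the ops that require the PIRQ flag.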
[RFC XEN PATCH v9 4/5] tools: Add new function to get gsi from dev
A PVH dom0 uses the Linux local interrupt mechanism: the irq for a gsi is allocated dynamically, on a first-come, first-served basis. Irq numbers are allocated in ascending order, but gsis are not requested in ascending order (gsi 38 may be requested before gsi 28), so the irq number is not equal to the gsi number.

When a device is passed through, QEMU uses its gsi number to do the pirq mapping, see xen_pt_realize->xc_physdev_map_pirq, but the gsi number is read from the file /sys/bus/pci/devices//irq, so the mapping fails. The current code gives userspace no way to get the gsi.

To that end, add a new function to get the gsi, and call this function before xc_physdev_(un)map_pirq.

Signed-off-by: Huang Rui
Signed-off-by: Chen Jiqian
---
RFC: it needs review and needs to wait for the corresponding third patch on linux kernel side to be merged.
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +++
 tools/include/xencall.h               |  2 ++
 tools/include/xenctrl.h               |  2 ++
 tools/libs/call/core.c                |  5 +
 tools/libs/call/libxencall.map        |  2 ++
 tools/libs/call/linux.c               | 15 +++
 tools/libs/call/private.h             |  9 +
 tools/libs/ctrl/xc_physdev.c          |  4
 tools/libs/light/libxl_pci.c          | 23 +++
 9 files changed, 69 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..977f1a058797 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
 	__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_dev {
+	__u32 sbdf;
+	int gsi;
+} privcmd_gsi_from_dev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
 	_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE	\
 	_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_DEV	\
+	_IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_dev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED	\
 	_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xencall.h b/tools/include/xencall.h
index fc95ed0fe58e..750aab070323 100644
--- a/tools/include/xencall.h
+++ b/tools/include/xencall.h
@@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
              uint64_t arg1, uint64_t arg2, uint64_t arg3,
              uint64_t arg4, uint64_t arg5);
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
+
 /* Variant(s) of the above, as needed, returning "long" instead of "int". */
 long xencall2L(xencall_handle *xcall, unsigned int op,
                uint64_t arg1, uint64_t arg2);
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..a0381f74d24b 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
                           uint32_t domid,
                           int pirq);
 
+int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf);
+
 /*
  * LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1aefa..6dae50c9a6ba 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
     return osdep_hypercall(xcall, &call);
 }
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf)
+{
+    return osdep_oscall(xcall, sbdf);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
index d18a3174e9dc..b92a0b5dc12c 100644
--- a/tools/libs/call/libxencall.map
+++ b/tools/libs/call/libxencall.map
@@ -10,6 +10,8 @@ VERS_1.0 {
 		xencall4;
 		xencall5;
 
+		xen_oscall_gsi_from_dev;
+
 		xencall_alloc_buffer;
 		xencall_free_buffer;
 		xencall_alloc_buffer_pages;
diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
index 6d588e6bea8f..92c740e176f2 100644
--- a/tools/libs/call/linux.c
+++ b/tools/libs/call/linux.c
@@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall)
     return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
 }
 
+int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
+{
+    privcmd_gsi_from_dev_t dev_gsi = {
+        .sbdf = sbdf,
+        .gsi = -1,
+    };
+
+    if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &dev_gsi)) {
+        PERROR("failed to get gsi from dev");
+        return -1;
+    }
+
+    return dev_gsi.gsi;
+}
+
 static void *alloc_pages_bufdev(xencall_handle *xcall, size_t
[RFC XEN PATCH v9 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
Some types of domain, like PVH, don't have PIRQs and do not do PHYSDEVOP_map_pirq for each gsi. When a device is passed through to a guest on a PVH dom0, the call stack pci_add_dm_done->XEN_DOMCTL_irq_permission fails at domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and irq on the Xen side.

What's more, the current hypercall XEN_DOMCTL_irq_permission requires passing in a pirq and grants access to the irq. That is not suitable for a dom0 that has no PIRQ flag, because passing through a device needs the gsi and needs to grant the corresponding irq to the guest. So, add a new hypercall to grant gsi permission when dom0 is not PV or dom0 does not have the PIRQ flag.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
RFC: it needs review and needs to wait for the corresponding third patch on linux kernel side to be merged.
---
 tools/include/xenctrl.h            |  5 +++
 tools/libs/ctrl/xc_domain.c        | 15 +++
 tools/libs/light/libxl_pci.c       | 72 +++---
 xen/arch/x86/domctl.c              | 38
 xen/arch/x86/include/asm/io_apic.h |  2 +
 xen/arch/x86/io_apic.c             | 21 +
 xen/arch/x86/mpparse.c             |  3 +-
 xen/include/public/domctl.h        | 10 +
 xen/xsm/flask/hooks.c              |  1 +
 9 files changed, 149 insertions(+), 18 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index a0381f74d24b..f3feb6848e25 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
                              uint32_t pirq,
                              bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
     return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             bool allow_access)
+{
+    struct xen_domctl domctl = {
+        .cmd = XEN_DOMCTL_gsi_permission,
+        .domain = domid,
+        .u.gsi_permission.gsi = gsi,
+        .u.gsi_permission.allow_access = allow_access,
+    };
+
+    return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 7e44d4c3ae2b..b8ec37d8d7e3 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1412,6 +1412,37 @@ static bool pci_supp_legacy_irq(void)
 #define PCI_SBDF(seg, bus, devfn) \
             ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
 
+static int pci_device_set_gsi(libxl_ctx *ctx,
+                              libxl_domid domid,
+                              libxl_device_pci *pci,
+                              bool map,
+                              int *gsi_back)
+{
+    int r, gsi, pirq;
+    uint32_t sbdf;
+
+    sbdf = PCI_SBDF(pci->domain, pci->bus, (PCI_DEVFN(pci->dev, pci->func)));
+    r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
+    *gsi_back = r;
+    if (r < 0)
+        return r;
+
+    gsi = r;
+    pirq = r;
+    if (map)
+        r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);
+    else
+        r = xc_physdev_unmap_pirq(ctx->xch, domid, pirq);
+    if (r)
+        return r;
+
+    r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map);
+    if (r && errno == EOPNOTSUPP)
+        r = xc_domain_irq_permission(ctx->xch, domid, pirq, map);
+
+    return r;
+}
+
 static void pci_add_dm_done(libxl__egc *egc,
                             pci_add_state *pas,
                             int rc)
@@ -1424,10 +1455,10 @@ static void pci_add_dm_done(libxl__egc *egc,
     unsigned long long start, end, flags, size;
     int irq, i;
     int r;
-    uint32_t sbdf;
     uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
     uint32_t domainid = domid;
     bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+    int gsi;
 
     /* Convenience aliases */
     bool starting = pas->starting;
@@ -1485,6 +1516,19 @@ static void pci_add_dm_done(libxl__egc *egc,
         fclose(f);
         if (!pci_supp_legacy_irq())
             goto out_no_irq;
+
+        r = pci_device_set_gsi(ctx, domid, pci, 1, &gsi);
+        if (gsi >= 0) {
+            if (r < 0) {
+                rc = ERROR_FAIL;
+                LOGED(
[XEN PATCH v9 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
On a PVH dom0, the gsis don't get registered, but the gsi of a passthrough device must be configured before it can be mapped into an HVM domU. The Linux kernel side calls PHYSDEVOP_setup_gsi for passthrough devices to register the gsi when dom0 is PVH. So, add PHYSDEVOP_setup_gsi for that purpose.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
The code that will call this hypercall on the Linux kernel side is at the following link:
https://lore.kernel.org/lkml/20240607075109.126277-3-jiqian.c...@amd.com/T/#u
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index fa5d50a0dd22..164f4eefa043 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ENOSYS;
         break;
 
+    case PHYSDEVOP_setup_gsi:
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
-- 
2.34.1
[XEN PATCH v9 1/5] xen/vpci: Clear all vpci status of device
When a device has been reset on the dom0 side, the vpci on the Xen side doesn't get notified, so the cached state in vpci is out of date compared with the real device state.

To solve that problem, add a new hypercall to clear all vpci device state. When the state of a device is reset on the dom0 side, dom0 can call this hypercall to notify vpci.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stewart Hildebrand
Reviewed-by: Stefano Stabellini
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c    | 43
 xen/drivers/vpci/vpci.c      |  9
 xen/include/public/physdev.h |  7 ++
 xen/include/xen/pci.h        | 16 ++
 xen/include/xen/vpci.h       |  6 +
 6 files changed, 82 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
+    case PHYSDEVOP_pci_device_state_reset:
     case PHYSDEVOP_dbgp_op:
         if ( !is_hardware_domain(currd) )
             return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..1cce508a73b1 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,11 +2,17 @@
 #include
 #include
 #include
+#include
 
 #ifndef COMPAT
 typedef long ret_t;
 #endif
 
+static const struct pci_device_state_reset_method
+                    pci_device_state_reset_methods[] = {
+    [ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
+};
+
 ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     ret_t ret;
@@ -67,6 +73,43 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case PHYSDEVOP_pci_device_state_reset: {
+        struct pci_device_state_reset dev_reset;
+        struct physdev_pci_device *dev;
+        struct pci_dev *pdev;
+        pci_sbdf_t sbdf;
+
+        if ( !is_pci_passthrough_enabled() )
+            return -EOPNOTSUPP;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+            break;
+        dev = &dev_reset.dev;
+        sbdf = PCI_SBDF(dev->seg, dev->bus, dev->devfn);
+
+        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+        if ( ret )
+            break;
+
+        pcidevs_lock();
+        pdev = pci_get_pdev(NULL, sbdf);
+        if ( !pdev )
+        {
+            pcidevs_unlock();
+            ret = -ENODEV;
+            break;
+        }
+
+        write_lock(&pdev->domain->pci_lock);
+        pcidevs_unlock();
+        ret = pci_device_state_reset_methods[dev_reset.reset_type].reset_fn(pdev);
+        write_unlock(&pdev->domain->pci_lock);
+        if ( ret )
+            printk(XENLOG_ERR "%pp: failed to reset vPCI device state\n", &sbdf);
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..ff67c2550ccb 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
 
     return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+    vpci_deassign_device(pdev);
+    return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..a71da5892e5f 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix 30
 #define PHYSDEVOP_release_msix 31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated. Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
     /* IN */
     uint16_t seg;
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 63e49f0117e9..376981f9da98 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -156,6 +156,22 @@ struct pci_dev {
     struct vpci *vpci;
 };
 
+struct pci_device_state_reset_method {
+    int (*reset_fn)(struct pci_dev *pdev);
+};
+
+enum pci_device_state_reset_type {
+    DEVICE_RESET_FLR,
+    DEVICE_RESET_COLD,
+    DEVICE_RESET_WARM,
+    DEVICE_RESET_HOT,
+};
+
+struct pci_device_state_reset {
+    struct physdev_pci_device dev;
+    enum pci_device_state_reset_type reset_type;
+};
+
 #define for_each_pdev(domain, pdev) \
     list_for_
[RFC KERNEL PATCH v8 2/3] xen/pvh: Setup gsi for passthrough device
In a PVH dom0, the gsis don't get registered, but the gsi of a passthrough device must be configured before it can be mapped into a domU.

When assigning a device to passthrough, proactively set up the gsi of the device during that process.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stefano Stabellini
---
RFC: it needs to wait for the corresponding third patch on xen side to be merged.
---
 arch/x86/xen/enlighten_pvh.c       | 23 ++
 drivers/acpi/pci_irq.c             |  2 +-
 drivers/xen/acpi.c                 | 50 ++
 drivers/xen/xen-pciback/pci_stub.c | 21 +
 include/linux/acpi.h               |  1 +
 include/xen/acpi.h                 | 10 ++
 6 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 27a2a02ef8fb..6caadf9c00ab 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -4,6 +4,7 @@
 #include
 #include
+#include
 #include
 #include
@@ -27,6 +28,28 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+#ifdef CONFIG_XEN_DOM0
+int xen_pvh_setup_gsi(int gsi, int trigger, int polarity)
+{
+	int ret;
+	struct physdev_setup_gsi setup_gsi;
+
+	setup_gsi.gsi = gsi;
+	setup_gsi.triggering = (trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+	setup_gsi.polarity = (polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+	ret = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+	if (ret == -EEXIST) {
+		xen_raw_printk("Already setup the GSI :%d\n", gsi);
+		ret = 0;
+	} else if (ret)
+		xen_raw_printk("Fail to setup GSI (%d)!\n", gsi);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_setup_gsi);
+#endif
+
 void __init xen_pvh_init(struct boot_params *boot_params)
 {
 	u32 msr;
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
 }
 #endif /* CONFIG_X86_IO_APIC */
 
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
 {
 	struct acpi_prt_entry *entry = NULL;
 	struct pci_dev *bridge;
diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c
index 6893c79fd2a1..9e2096524fbc 100644
--- a/drivers/xen/acpi.c
+++ b/drivers/xen/acpi.c
@@ -30,6 +30,7 @@
  * IN THE SOFTWARE.
  */
 
+#include
 #include
 #include
 #include
@@ -75,3 +76,52 @@ int xen_acpi_notify_hypervisor_extended_sleep(u8 sleep_state,
 	return xen_acpi_notify_hypervisor_state(sleep_state, val_a, val_b,
 						true);
 }
+
+struct acpi_prt_entry {
+	struct acpi_pci_id id;
+	u8 pin;
+	acpi_handle link;
+	u32 index;
+};
+
+int xen_acpi_get_gsi_info(struct pci_dev *dev,
+			  int *gsi_out,
+			  int *trigger_out,
+			  int *polarity_out)
+{
+	int gsi;
+	u8 pin;
+	struct acpi_prt_entry *entry;
+	int trigger = ACPI_LEVEL_SENSITIVE;
+	int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+				      ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+	if (!dev || !gsi_out || !trigger_out || !polarity_out)
+		return -EINVAL;
+
+	pin = dev->pin;
+	if (!pin)
+		return -EINVAL;
+
+	entry = acpi_pci_irq_lookup(dev, pin);
+	if (entry) {
+		if (entry->link)
+			gsi = acpi_pci_link_allocate_irq(entry->link,
+							 entry->index,
+							 &trigger, &polarity,
+							 NULL);
+		else
+			gsi = entry->index;
+	} else
+		gsi = -1;
+
+	if (gsi < 0)
+		return -EINVAL;
+
+	*gsi_out = gsi;
+	*trigger_out = trigger;
+	*polarity_out = polarity;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xen_acpi_get_gsi_info);
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 73062e531c34..6b22e45188f5 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -21,6 +21,9 @@
 #include
 #include
 #include
+#ifdef CONFIG_XEN_ACPI
+#include
+#endif
 #include
 #include
 #include "pciback.h"
@@ -367,6 +370,9 @@ static int pcistub_match(struct pci_dev *dev)
 static int pcistub_init_device(struct pci_dev *dev)
 {
 	struct xen_pcibk_dev_data *dev_data;
+#
[RFC KERNEL PATCH v8 3/3] xen/privcmd: Add new syscall to get gsi from dev
In PVH dom0, it uses the linux local interrupt mechanism, when it allocs irq for a gsi, it is dynamic, and follow the principle of applying first, distributing first. And the irq number is alloced from small to large, but the applying gsi number is not, may gsi 38 comes before gsi 28, it causes the irq number is not equal with the gsi number. And when passthrough a device, QEMU will use device's gsi number to do pirq mapping, but the gsi number is got from file /sys/bus/pci/devices//irq, irq!= gsi, so it will fail when mapping. And in current linux codes, there is no method to get gsi for userspace. For above purpose, record gsi of pcistub devices when init pcistub and add a new syscall into privcmd to let userspace can get gsi when they have a need. Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- RFC: it need review and need to wait for previous patch of this series to be merged. --- drivers/xen/privcmd.c | 28 ++ drivers/xen/xen-pciback/pci_stub.c | 38 +++--- include/uapi/xen/privcmd.h | 7 ++ include/xen/acpi.h | 9 +++ 4 files changed, 79 insertions(+), 3 deletions(-) diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index 67dfa4778864..5809b3168f25 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -45,6 +45,9 @@ #include #include #include +#ifdef CONFIG_XEN_ACPI +#include +#endif #include "privcmd.h" @@ -842,6 +845,27 @@ static long privcmd_ioctl_mmap_resource(struct file *file, return rc; } +static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata) +{ +#ifdef CONFIG_XEN_ACPI + struct privcmd_gsi_from_dev kdata; + + if (copy_from_user(&kdata, udata, sizeof(kdata))) + return -EFAULT; + + kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf); + if (kdata.gsi == -1) + return -EINVAL; + + if (copy_to_user(udata, &kdata, sizeof(kdata))) + return -EFAULT; + + return 0; +#else + return -EINVAL; +#endif +} + #ifdef CONFIG_XEN_PRIVCMD_EVENTFD /* Irqfd support */ static struct workqueue_struct *irqfd_cleanup_wq; @@ -1529,6 
+1553,10 @@ static long privcmd_ioctl(struct file *file, ret = privcmd_ioctl_ioeventfd(file, udata); break; + case IOCTL_PRIVCMD_GSI_FROM_DEV: + ret = privcmd_ioctl_gsi_from_dev(file, udata); + break; + default: break; } diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 6b22e45188f5..9d791d7a8098 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -56,6 +56,9 @@ struct pcistub_device { struct pci_dev *dev; struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */ +#ifdef CONFIG_XEN_ACPI + int gsi; +#endif }; /* Access to pcistub_devices & seized_devices lists and the initialize_devices @@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev) kref_init(&psdev->kref); spin_lock_init(&psdev->lock); +#ifdef CONFIG_XEN_ACPI + psdev->gsi = -1; +#endif return psdev; } @@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct xen_pcibk_device *pdev, return pci_dev; } +#ifdef CONFIG_XEN_ACPI +int pcistub_get_gsi_from_sbdf(unsigned int sbdf) +{ + struct pcistub_device *psdev; + int domain = (sbdf >> 16) & 0x; + int bus = PCI_BUS_NUM(sbdf); + int slot = PCI_SLOT(sbdf); + int func = PCI_FUNC(sbdf); + + psdev = pcistub_device_find(domain, bus, slot, func); + + if (!psdev) + return -1; + + return psdev->gsi; +} +EXPORT_SYMBOL_GPL(pcistub_get_gsi_from_sbdf); +#endif + struct pci_dev *pcistub_get_pci_dev_by_slot(struct xen_pcibk_device *pdev, int domain, int bus, int slot, int func) @@ -367,14 +392,20 @@ static int pcistub_match(struct pci_dev *dev) return found; } -static int pcistub_init_device(struct pci_dev *dev) +static int pcistub_init_device(struct pcistub_device *psdev) { struct xen_pcibk_dev_data *dev_data; + struct pci_dev *dev; #ifdef CONFIG_XEN_ACPI int gsi, trigger, polarity; #endif int err = 0; + if (!psdev) + return -EINVAL; + + dev = psdev->dev; + dev_dbg(&dev->dev, "initializing...\n"); /* The PCI backend is not 
intended to be a module (or to work with @@ -448,6 +479,7 @@ static int pcistub_init_device(struct pci_dev *dev) dev_err(&dev->dev, "Fail to get gsi info!\n"); goto config_release; } + psdev->gsi = gsi; if (xen_initi
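For readers following pcistub_get_gsi_from_sbdf() above, here is a minimal userspace sketch of how a 32-bit sbdf value can be packed and split again. It is not the kernel code: the EX_ macros are local stand-ins that mirror the Linux PCI_BUS_NUM/PCI_SLOT/PCI_FUNC helpers, struct ex_pci_addr and both functions are hypothetical names, and the 16-bit segment mask is an assumption (the mask in the quoted hunk is elided), chosen to match the uint16_t seg field of struct physdev_pci_device.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins mirroring the Linux PCI helpers (EX_ names are hypothetical). */
#define EX_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define EX_PCI_BUS_NUM(x)        (((x) >> 8) & 0xff)
#define EX_PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define EX_PCI_FUNC(devfn)       ((devfn) & 0x07)

struct ex_pci_addr {
    unsigned int seg, bus, slot, func;
};

/* Pack segment:bus:slot.func into one 32-bit sbdf value. */
static uint32_t ex_sbdf_pack(uint16_t seg, uint8_t bus, uint8_t slot, uint8_t func)
{
    assert(slot < 32 && func < 8);
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           (uint32_t)EX_PCI_DEVFN(slot, func);
}

/* Split an sbdf value back into its four fields. */
static struct ex_pci_addr ex_sbdf_unpack(uint32_t sbdf)
{
    struct ex_pci_addr a = {
        .seg  = (sbdf >> 16) & 0xffff, /* assumption: 16-bit segment */
        .bus  = EX_PCI_BUS_NUM(sbdf),
        .slot = EX_PCI_SLOT(sbdf),
        .func = EX_PCI_FUNC(sbdf),
    };
    return a;
}
```

With this layout, device 0000:03:00.1 packs to 0x0301, which is the value userspace would place in the sbdf field of the ioctl argument.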
[RFC KERNEL PATCH v8 0/2] Support device passthrough when dom0 is PVH on Xen
Hi All,

This is the v8 series to support passthrough on Xen when dom0 is PVH.

v7->v8 changes:
* patch#1: This is patch#1 of v6; it is included again because it was reverted from the staging branch due to the API changes on the Xen side. Add pci_device_state_reset_type_t to distinguish the reset types.
* patch#2: is patch#1 of v7. Use CONFIG_XEN_ACPI instead of CONFIG_ACPI to wrap the code.
* patch#3: is patch#2 of v7. In function privcmd_ioctl_gsi_from_dev, return -EINVAL when CONFIG_XEN_ACPI is not configured. Use PCI_BUS_NUM, PCI_SLOT and PCI_FUNC instead of open coding.

Best regards,
Jiqian Chen

v6->v7 changes:
* the first patch of v6 was already merged into branch linux_next.
* patch#1: is patch#2 of v6. Move the implementation of function xen_acpi_get_gsi_info to file drivers/xen/acpi.c; that makes it more convenient for the subsequent patch to obtain the gsi.
* patch#2: is patch#3 of v6. Add a new member "gsi" to struct pcistub_device and set it when pcistub initializes the device. Then, when userspace wants to get the gsi by passing an sbdf, we can return that gsi.

v5->v6 changes:
* patch#3: change to add a new syscall to translate irq to gsi, instead of adding a new gsi sysfs node.

v4->v5 changes:
* patch#1: Add Reviewed-by Stefano
* patch#2: Add Reviewed-by Stefano
* patch#3: No changes

v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; use a new function pcistub_reset_device_state to wrap __pci_reset_function_locked and xen_reset_device_state, and call pcistub_reset_device_state in pci_stub.c
* patch#2: remove map_pirq from xen_pvh_passthrough_gsi

v2->v3 changes:
* patch#1: add a condition so that xen_reset_device_state is only called for non-PV domains in pcistub_init_device.
* patch#2: Abandon the previous implementation that called unmask_irq. Instead, set up the gsi and map the pirq for the passthrough device in pcistub_init_device.
* patch#3: Abandon the previous implementation that added a new syscall to get the gsi from the irq. Instead, add a new sysfs node for gsi, so that userspace can get the gsi number from sysfs.
Below is the description from the v2 cover letter:

This series of patches is the v2 of the implementation of passthrough when dom0 is PVH on Xen. We sent v1 upstream before, but v1 had many problems and we received lots of suggestions. I will introduce all the issues that these patches try to fix and the differences between v1 and v2.

Issues we encountered:

1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a device, pci_stub will call pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() -> pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword(). The pci config write will trigger an io interrupt to bar_write() in Xen, but because bar->enabled was set before, the write is not allowed. Then, when QEMU configures the passthrough device in xen_pt_realize(), it gets invalid bar values.
Reason: we don't tell vPCI that the device has been reset, so the cached state in pdev->vpci is out of date and differs from the real device state.
Solution: to solve this problem, the first patch of the kernel series (xen/pci: Add xen_reset_device_state function) and the first patch of the Xen series (xen/vpci: Clear all vpci status of device) add a new hypercall to reset the state stored in vPCI when the state of the real device has changed. Thanks to Roger for the suggestion behind this v2. It differs from v1 (https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/): v1 simply allowed domU to write the pci bar, which does not comply with the design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using gsi. See xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will call into Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.
Reason: In hvm_physdev_op(), the variable "currd" is the PVH dom0, and PVH has no X86_EMU_USE_PIRQ flag, so the call fails at the has_pirq check.
Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0 (at present dom0 is PVH). The second patch of the Xen series (x86/pvh: Open PHYSDEVOP_map_pirq for PVH dom0) allows a PVH dom0 to do PHYSDEVOP_map_pirq. This v2 patch is better than v1, which simply removed the has_pirq check (xen https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.hu...@amd.com/).

3. the gsi of a passthrough device is not unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0
Problem:
3.1 callback function pci_add_dm_done() will be called when qemu configures a passthrough device for domU. This
[RFC KERNEL PATCH v8 1/3] xen/pci: Add xen_reset_device_function_state
When device on dom0 side has been reset, the vpci on Xen side won't get notification, so that the cached state in vpci is all out of date with the real device state. To solve that problem, add a new function to clear all vpci device state when device is reset on dom0 side. And call that function in pcistub_init_device. Because when using "pci-assignable-add" to assign a passthrough device in Xen, it will reset passthrough device and the vpci state will out of date, and then device will fail to restore bar state. Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen Reviewed-by: Stefano Stabellini --- RFC: it need to wait for the corresponding first patch on xen side to be merged. --- drivers/xen/pci.c | 25 + drivers/xen/xen-pciback/pci_stub.c | 18 +++--- include/xen/interface/physdev.h| 7 +++ include/xen/pci.h | 6 ++ 4 files changed, 53 insertions(+), 3 deletions(-) diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c index 72d4e3f193af..57093e395982 100644 --- a/drivers/xen/pci.c +++ b/drivers/xen/pci.c @@ -177,6 +177,31 @@ static int xen_remove_device(struct device *dev) return r; } +enum pci_device_state_reset_type { + DEVICE_RESET_FLR, + DEVICE_RESET_COLD, + DEVICE_RESET_WARM, + DEVICE_RESET_HOT, +}; + +struct pci_device_state_reset { + struct physdev_pci_device dev; + enum pci_device_state_reset_type reset_type; +}; + +int xen_reset_device_function_state(const struct pci_dev *dev) +{ + struct pci_device_state_reset device = { + .dev.seg = pci_domain_nr(dev->bus), + .dev.bus = dev->bus->number, + .dev.devfn = dev->devfn, + .reset_type = DEVICE_RESET_FLR, + }; + + return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device); +} +EXPORT_SYMBOL_GPL(xen_reset_device_function_state); + static int xen_pci_notifier(struct notifier_block *nb, unsigned long action, void *data) { diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index e34b623e4b41..73062e531c34 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ 
b/drivers/xen/xen-pciback/pci_stub.c @@ -89,6 +89,16 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev) return psdev; } +static int pcistub_reset_device_state(struct pci_dev *dev) +{ + __pci_reset_function_locked(dev); + + if (!xen_pv_domain()) + return xen_reset_device_function_state(dev); + else + return 0; +} + /* Don't call this directly as it's called by pcistub_device_put */ static void pcistub_device_release(struct kref *kref) { @@ -107,7 +117,7 @@ static void pcistub_device_release(struct kref *kref) /* Call the reset function which does not take lock as this * is called from "unbind" which takes a device_lock mutex. */ - __pci_reset_function_locked(dev); + pcistub_reset_device_state(dev); if (dev_data && pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state)) dev_info(&dev->dev, "Could not reload PCI state\n"); @@ -284,7 +294,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev) * (so it's ready for the next domain) */ device_lock_assert(&dev->dev); - __pci_reset_function_locked(dev); + pcistub_reset_device_state(dev); dev_data = pci_get_drvdata(dev); ret = pci_load_saved_state(dev, dev_data->pci_saved_state); @@ -420,7 +430,9 @@ static int pcistub_init_device(struct pci_dev *dev) dev_err(&dev->dev, "Could not store PCI conf saved state!\n"); else { dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n"); - __pci_reset_function_locked(dev); + err = pcistub_reset_device_state(dev); + if (err) + goto config_release; pci_restore_state(dev); } /* Now disable the device (this also ensures some private device diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h index a237af867873..b50646c993dd 100644 --- a/include/xen/interface/physdev.h +++ b/include/xen/interface/physdev.h @@ -256,6 +256,13 @@ struct physdev_pci_device_add { */ #define PHYSDEVOP_prepare_msix 30 #define PHYSDEVOP_release_msix 31 +/* + * Notify the hypervisor that a PCI device has been reset, so that any + * internally cached state is 
regenerated. Should be called after any + * device reset performed by the hardware domain. + */ +#define PHYSDEVOP_pci_device_state_reset 32 + struct physdev_pci_device { /* IN */ uint16_t seg; diff --git a/include/xen/pci.h b/include/xen/pci.h index b8337cf85fd1..7941809ab729
[RFC QEMU PATCH v7 1/1] xen/pci: get gsi for passthrough devices
On a PVH dom0, Linux uses its local interrupt mechanism: irq numbers are allocated dynamically, first come, first served, in ascending order. The gsi numbers being registered do not arrive in ascending order (gsi 38 may come before gsi 28), so the irq number is not equal to the gsi number. When passing through a device, qemu wants to use the gsi to map the pirq (xen_pt_realize->xc_physdev_map_pirq), but the current code reads the number from the file /sys/bus/pci/devices//irq, so the mapping fails. Get the gsi by using the new function provided by the Xen tools. Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- hw/xen/xen-host-pci-device.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c index 8c6e9a1716a2..2fe6a60434ba 100644 --- a/hw/xen/xen-host-pci-device.c +++ b/hw/xen/xen-host-pci-device.c @@ -10,6 +10,7 @@ #include "qapi/error.h" #include "qemu/cutils.h" #include "xen-host-pci-device.h" +#include "hw/xen/xen_native.h" #define XEN_HOST_PCI_MAX_EXT_CAP \ ((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4)) @@ -329,12 +330,17 @@ int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *d, uint32_t cap) return -1; } +#define PCI_SBDF(seg, bus, dev, func) \ +((((uint32_t)(seg)) << 16) | \ +(PCI_BUILD_BDF(bus, PCI_DEVFN(dev, func)))) + void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain, uint8_t bus, uint8_t dev, uint8_t func, Error **errp) { ERRP_GUARD(); unsigned int v; +uint32_t sdbf; d->config_fd = -1; d->domain = domain; @@ -364,11 +370,16 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain, } d->device_id = v; -xen_host_pci_get_dec_value(d, "irq", &v, errp); -if (*errp) { -goto error; +sdbf = PCI_SBDF(domain, bus, dev, func); +d->irq = xc_physdev_gsi_from_dev(xen_xc, sdbf); +/* fail to get gsi, fallback to irq */ +if (d->irq == -1) { +xen_host_pci_get_dec_value(d, "irq", &v, errp); +if 
(*errp) { +goto error; +} +d->irq = v; } -d->irq = v; xen_host_pci_get_hex_value(d, "class", &v, errp); if (*errp) { -- 2.34.1
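The control flow of the hunk above (ask for the gsi first, fall back to the sysfs irq only when that fails) can be sketched in isolation. This is an illustration only, not QEMU code: the callback types, ex_resolve_irq() and the three stubs are hypothetical, and the numbers the stubs return are made up for the example. Unlike the patch, which tests for exactly -1, the sketch treats any negative value as a failed lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup signatures standing in for the libxc call and the sysfs read. */
typedef int (*ex_gsi_lookup_fn)(uint32_t sbdf);                    /* gsi, or negative on failure */
typedef int (*ex_irq_lookup_fn)(uint32_t sbdf, unsigned int *irq); /* 0 on success */

/* Prefer the gsi; fall back to the irq lookup when no gsi is available. */
static int ex_resolve_irq(uint32_t sbdf, ex_gsi_lookup_fn gsi_lookup,
                          ex_irq_lookup_fn irq_lookup, unsigned int *out)
{
    int gsi;

    assert(gsi_lookup != NULL && irq_lookup != NULL && out != NULL);

    gsi = gsi_lookup(sbdf);
    if (gsi >= 0) {
        *out = (unsigned int)gsi;
        return 0;
    }
    return irq_lookup(sbdf, out);
}

/* Stubs with made-up values, only here to exercise both paths. */
static int ex_gsi_supported(uint32_t sbdf)   { (void)sbdf; return 28; }
static int ex_gsi_unsupported(uint32_t sbdf) { (void)sbdf; return -1; }
static int ex_irq_from_sysfs(uint32_t sbdf, unsigned int *irq)
{
    (void)sbdf;
    *irq = 40;
    return 0;
}
```

Keeping the fallback means a toolstack or kernel without the new interface behaves as before, which matches the intent of the "fallback to irq" comment in the patch.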
[RFC QEMU PATCH v7 0/1] Support device passthrough when dom0 is PVH on Xen
Hi All,

This is the v7 series to support passthrough on Xen when dom0 is PVH.

v6->v7 changes:
* Due to changes in the implementation of obtaining gsi in the kernel and Xen, change to use xc_physdev_gsi_from_dev, which requires passing in sbdf instead of irq.

Best regards,
Jiqian Chen

v5->v6 changes:
* Due to changes in the implementation of obtaining gsi in the kernel and Xen, change to use xc_physdev_gsi_from_irq instead of the gsi sysfs node.

v4->v5 changes:
* Add Reviewed-by Stefano

v3->v4 changes:
* Add gsi into struct XenHostPCIDevice and use the gsi number read from the gsi sysfs node if it exists; if there is no gsi sysfs node, still use irq.

v2->v3 changes:
* Due to changes in the implementation of the second patch on the kernel side (which adds a new sysfs node for gsi instead of a new syscall), read the gsi number from the gsi sysfs node.

Below is the description from the v2 cover letter:

This patch is the v2 of the implementation of passthrough when dom0 is PVH on Xen.

Issues we encountered:

1. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() to map a passthrough device's gsi to a pirq in function xen_pt_realize(), but the call fails.
Reason: According to the implementation of xc_physdev_map_pirq(), it needs a gsi instead of an irq, but qemu passes the irq to it and treats the irq as a gsi. The irq is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in function xen_host_pci_device_get(). But the gsi number is actually not equal to the irq. On PVH dom0, when an irq is allocated for a gsi in function acpi_register_gsi_ioapic(), the allocation is dynamic and first come, first served. If you debug the kernel code (see function __irq_alloc_descs), you will find that irq numbers are allocated in ascending order, but the gsi numbers being registered are not: gsi 38 may come before gsi 28, which causes gsi 38 to get a smaller irq number than gsi 28, and then gsi != irq.
Solution: we can record the relation between gsi and irq; then, when userspace (qemu) wants to use the gsi, we can do a translation.
The third patch of the kernel series (xen/privcmd: Add new syscall to get gsi from irq) records all the relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and provides a syscall for userspace to get the gsi from the irq. The third patch of the Xen series (tools: Add new function to get gsi from irq) adds a new function xc_physdev_gsi_from_irq() that calls the new syscall added on the kernel side. Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() will succeed.

This v2 on the qemu side is the same as v1 (qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/); it just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (1):
  xen/pci: get gsi for passthrough devices

 hw/xen/xen-host-pci-device.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

--
2.34.1
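The "gsi 38 before gsi 28" argument in the cover letter above can be reproduced with a toy allocator. This is only a model of the behaviour described there, not the kernel's __irq_alloc_descs(): struct ex_irq_table, both functions, the table size and the starting irq number used in the usage note are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define EX_MAX_GSI 64 /* arbitrary table size for the sketch */

/* Toy model: irq numbers are handed out in ascending order, one per new gsi. */
struct ex_irq_table {
    int next_irq;               /* next free irq number */
    int irq_of_gsi[EX_MAX_GSI]; /* -1 while the gsi has no irq */
};

static void ex_irq_table_init(struct ex_irq_table *t, int first_irq)
{
    assert(t != NULL);
    t->next_irq = first_irq;
    for (size_t i = 0; i < EX_MAX_GSI; i++)
        t->irq_of_gsi[i] = -1;
}

/* Register a gsi: first come, first served; repeated calls return the same irq. */
static int ex_register_gsi(struct ex_irq_table *t, int gsi)
{
    assert(t != NULL);
    if (gsi < 0 || gsi >= EX_MAX_GSI)
        return -1;
    if (t->irq_of_gsi[gsi] < 0)
        t->irq_of_gsi[gsi] = t->next_irq++;
    return t->irq_of_gsi[gsi];
}
```

If the table starts at irq 24 (an arbitrary choice) and gsi 38 is registered before gsi 28, gsi 38 receives irq 24 and gsi 28 receives irq 25. Neither irq equals its gsi, so passing the sysfs irq value to xc_physdev_map_pirq() as if it were a gsi refers to the wrong interrupt.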
[RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
Some type of domain don't have PIRQ, like PVH, when passthrough a device to guest on PVH dom0, callstack pci_add_dm_done->XEN_DOMCTL_irq_permission will failed at domain_pirq_to_irq. So, add a new hypercall to grant/revoke gsi permission when dom0 is not PV or dom0 has not PIRQ flag. Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- tools/include/xenctrl.h | 5 +++ tools/libs/ctrl/xc_domain.c | 15 tools/libs/light/libxl_pci.c | 72 xen/arch/x86/domctl.c| 31 xen/include/public/domctl.h | 9 + xen/xsm/flask/hooks.c| 1 + 6 files changed, 117 insertions(+), 16 deletions(-) diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 841db41ad7e4..c21a79d74be3 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch, uint32_t pirq, bool allow_access); +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + bool allow_access); + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c index f2d9d14b4d9f..8540e84fda93 100644 --- a/tools/libs/ctrl/xc_domain.c +++ b/tools/libs/ctrl/xc_domain.c @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch, return do_domctl(xch, &domctl); } +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + bool allow_access) +{ +struct xen_domctl domctl = { +.cmd = XEN_DOMCTL_gsi_permission, +.domain = domid, +.u.gsi_permission.gsi = gsi, +.u.gsi_permission.allow_access = allow_access, +}; + +return do_domctl(xch, &domctl); +} + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c index 7e44d4c3ae2b..1d1b81dd2844 100644 --- a/tools/libs/light/libxl_pci.c +++ b/tools/libs/light/libxl_pci.c @@ -1412,6 +1412,37 @@ static bool pci_supp_legacy_irq(void) #define PCI_SBDF(seg, 
bus, devfn) \ uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn))) +static int pci_device_set_gsi(libxl_ctx *ctx, + libxl_domid domid, + libxl_device_pci *pci, + bool map, + int *gsi_back) +{ +int r, gsi, pirq; +uint32_t sbdf; + +sbdf = PCI_SBDF(pci->domain, pci->bus, (PCI_DEVFN(pci->dev, pci->func))); +r = xc_physdev_gsi_from_dev(ctx->xch, sbdf); +*gsi_back = r; +if (r < 0) +return r; + +gsi = r; +pirq = r; +if (map) +r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq); +else +r = xc_physdev_unmap_pirq(ctx->xch, domid, pirq); +if (r) +return r; + +r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map); +if (r && errno == EOPNOTSUPP) +r = xc_domain_irq_permission(ctx->xch, domid, gsi, map); + +return r; +} + static void pci_add_dm_done(libxl__egc *egc, pci_add_state *pas, int rc) @@ -1424,10 +1455,10 @@ static void pci_add_dm_done(libxl__egc *egc, unsigned long long start, end, flags, size; int irq, i; int r; -uint32_t sbdf; uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED; uint32_t domainid = domid; bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid); +int gsi; /* Convenience aliases */ bool starting = pas->starting; @@ -1485,6 +1516,19 @@ static void pci_add_dm_done(libxl__egc *egc, fclose(f); if (!pci_supp_legacy_irq()) goto out_no_irq; + +r = pci_device_set_gsi(ctx, domid, pci, 1, &gsi); +if (gsi >= 0) { +if (r < 0) { +rc = ERROR_FAIL; +LOGED(ERROR, domainid, + "pci_device_set_gsi gsi=%d (error=%d)", gsi, errno); +goto out; +} else { +goto process_permissive; +} +} +/* if gsi < 0, keep using irq */ sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain, pci->bus, pci->dev, pci->func); f = fopen(sysfs_path, "r"); @@ -1493,13 +1537,6 @@ static void pci_add_dm_done(libxl__egc *egc, goto out_no_irq; } if ((fscanf(f, "%u", &irq) == 1) && irq) { -
[RFC XEN PATCH v8 4/5] tools: Add new function to get gsi from dev
In PVH dom0, it uses the linux local interrupt mechanism, when it allocs irq for a gsi, it is dynamic, and follow the principle of applying first, distributing first. And irq number is alloced from small to large, but the applying gsi number is not, may gsi 38 comes before gsi 28, that causes the irq number is not equal with the gsi number. And when passthrough a device, QEMU will use its gsi number to do pirq mapping, see xen_pt_realize->xc_physdev_map_pirq, but the gsi number is got from file /sys/bus/pci/devices//irq, so it will fail when mapping. And in current codes, there is no method to get gsi for userspace. For above purpose, add new function to get gsi. And call this function before xc_physdev_(un)map_pirq Signed-off-by: Huang Rui Signed-off-by: Chen Jiqian --- tools/include/xen-sys/Linux/privcmd.h | 7 +++ tools/include/xencall.h | 2 ++ tools/include/xenctrl.h | 2 ++ tools/libs/call/core.c| 5 + tools/libs/call/libxencall.map| 2 ++ tools/libs/call/linux.c | 15 +++ tools/libs/call/private.h | 9 + tools/libs/ctrl/xc_physdev.c | 4 tools/libs/light/libxl_pci.c | 23 +++ 9 files changed, 69 insertions(+) diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h index bc60e8fd55eb..977f1a058797 100644 --- a/tools/include/xen-sys/Linux/privcmd.h +++ b/tools/include/xen-sys/Linux/privcmd.h @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource { __u64 addr; } privcmd_mmap_resource_t; +typedef struct privcmd_gsi_from_dev { + __u32 sbdf; + int gsi; +} privcmd_gsi_from_dev_t; + /* * @cmd: IOCTL_PRIVCMD_HYPERCALL * @arg: &privcmd_hypercall_t @@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource { _IOC(_IOC_NONE, 'P', 6, sizeof(domid_t)) #define IOCTL_PRIVCMD_MMAP_RESOURCE\ _IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t)) +#define IOCTL_PRIVCMD_GSI_FROM_DEV \ + _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_dev_t)) #define IOCTL_PRIVCMD_UNIMPLEMENTED\ _IOC(_IOC_NONE, 'P', 0xFF, 0) diff --git a/tools/include/xencall.h 
b/tools/include/xencall.h index fc95ed0fe58e..750aab070323 100644 --- a/tools/include/xencall.h +++ b/tools/include/xencall.h @@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op, uint64_t arg1, uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5); +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf); + /* Variant(s) of the above, as needed, returning "long" instead of "int". */ long xencall2L(xencall_handle *xcall, unsigned int op, uint64_t arg1, uint64_t arg2); diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 499685594427..841db41ad7e4 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch, uint32_t domid, int pirq); +int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf); + /* * LOGGING AND ERROR REPORTING */ diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c index 02c4f8e1aefa..6dae50c9a6ba 100644 --- a/tools/libs/call/core.c +++ b/tools/libs/call/core.c @@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op, return osdep_hypercall(xcall, &call); } +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf) +{ +return osdep_oscall(xcall, sbdf); +} + /* * Local variables: * mode: C diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map index d18a3174e9dc..b92a0b5dc12c 100644 --- a/tools/libs/call/libxencall.map +++ b/tools/libs/call/libxencall.map @@ -10,6 +10,8 @@ VERS_1.0 { xencall4; xencall5; + xen_oscall_gsi_from_dev; + xencall_alloc_buffer; xencall_free_buffer; xencall_alloc_buffer_pages; diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c index 6d588e6bea8f..92c740e176f2 100644 --- a/tools/libs/call/linux.c +++ b/tools/libs/call/linux.c @@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall) return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall); } +int osdep_oscall(xencall_handle *xcall, 
unsigned int sbdf) +{ +privcmd_gsi_from_dev_t dev_gsi = { +.sbdf = sbdf, +.gsi = -1, +}; + +if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &dev_gsi)) { +PERROR("failed to get gsi from dev"); +return -1; +} + +return dev_gsi.gsi; +} + static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages) { void *p; diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h index 9c3aa432efe2..c
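As a rough standalone illustration of what osdep_oscall() above does, the sketch below issues the same ioctl directly. It assumes the request number and struct layout exactly as posted in this RFC revision (the head of the thread shows that v9 renames the ioctl and changes the gsi field type, so a merged kernel may differ). The ex_ names are hypothetical, and the privcmd node path is passed in by the caller rather than assumed.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Layout copied from the privcmd.h hunk quoted above (RFC v8); may not match a merged kernel. */
struct ex_privcmd_gsi_from_dev {
    uint32_t sbdf;
    int gsi;
};

#define EX_IOCTL_PRIVCMD_GSI_FROM_DEV \
    _IOC(_IOC_NONE, 'P', 10, sizeof(struct ex_privcmd_gsi_from_dev))

/* Returns the gsi for the device, or -1 if the node cannot be opened or the ioctl fails. */
static int ex_gsi_from_dev(const char *privcmd_path, uint32_t sbdf)
{
    struct ex_privcmd_gsi_from_dev req = { .sbdf = sbdf, .gsi = -1 };
    int fd, rc;

    assert(privcmd_path != NULL);

    fd = open(privcmd_path, O_RDWR);
    if (fd < 0)
        return -1;

    rc = ioctl(fd, EX_IOCTL_PRIVCMD_GSI_FROM_DEV, &req);
    close(fd);

    return rc ? -1 : req.gsi;
}
```

A caller on a dom0 would typically pass the privcmd device node (commonly /dev/xen/privcmd) together with a packed sbdf. The real library keeps the descriptor open in xencall_handle instead of reopening it on each call.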
[XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
When a device has been reset on dom0 side, the vpci on Xen side won't get notification, so the cached state in vpci is all out of date compare with the real device state. To solve that problem, add a new hypercall to clear all vpci device state. When the state of device is reset on dom0 side, dom0 can call this hypercall to notify vpci. Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen Reviewed-by: Stewart Hildebrand Reviewed-by: Stefano Stabellini --- xen/arch/x86/hvm/hypercall.c | 1 + xen/drivers/pci/physdev.c| 36 xen/drivers/vpci/vpci.c | 10 ++ xen/include/public/physdev.h | 7 +++ xen/include/xen/vpci.h | 6 ++ 5 files changed, 60 insertions(+) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 14679dd82971..56fbb69ab201 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_pci_mmcfg_reserved: case PHYSDEVOP_pci_device_add: case PHYSDEVOP_pci_device_remove: +case PHYSDEVOP_pci_device_state_reset: case PHYSDEVOP_dbgp_op: if ( !is_hardware_domain(currd) ) return -ENOSYS; diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c index 42db3e6d133c..73dc8f058b0e 100644 --- a/xen/drivers/pci/physdev.c +++ b/xen/drivers/pci/physdev.c @@ -2,6 +2,7 @@ #include #include #include +#include #ifndef COMPAT typedef long ret_t; @@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } +case PHYSDEVOP_pci_device_state_reset: { +struct physdev_pci_device dev; +struct pci_dev *pdev; +pci_sbdf_t sbdf; + +if ( !is_pci_passthrough_enabled() ) +return -EOPNOTSUPP; + +ret = -EFAULT; +if ( copy_from_guest(&dev, arg, 1) != 0 ) +break; +sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn); + +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf); +if ( ret ) +break; + +pcidevs_lock(); +pdev = pci_get_pdev(NULL, sbdf); +if ( !pdev ) +{ +pcidevs_unlock(); +ret = -ENODEV; +break; +} + 
+write_lock(&pdev->domain->pci_lock); +ret = vpci_reset_device_state(pdev); +write_unlock(&pdev->domain->pci_lock); +pcidevs_unlock(); +if ( ret ) +printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf); +break; +} + default: ret = -ENOSYS; break; diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index 97e115dc5798..424aec2d5c46 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -115,6 +115,16 @@ int vpci_assign_device(struct pci_dev *pdev) return rc; } + +int vpci_reset_device_state(struct pci_dev *pdev) +{ +ASSERT(pcidevs_locked()); +ASSERT(rw_is_write_locked(&pdev->domain->pci_lock)); + +vpci_deassign_device(pdev); +return vpci_assign_device(pdev); +} + #endif /* __XEN__ */ static int vpci_register_cmp(const struct vpci_register *r1, diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h index f0c0d4727c0b..f5bab1f29779 100644 --- a/xen/include/public/physdev.h +++ b/xen/include/public/physdev.h @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t); */ #define PHYSDEVOP_prepare_msix 30 #define PHYSDEVOP_release_msix 31 +/* + * Notify the hypervisor that a PCI device has been reset, so that any + * internally cached state is regenerated. Should be called after any + * device reset performed by the hardware domain. + */ +#define PHYSDEVOP_pci_device_state_reset 32 + struct physdev_pci_device { /* IN */ uint16_t seg; diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h index 6e4c972f35ed..93b1c1d72c05 100644 --- a/xen/include/xen/vpci.h +++ b/xen/include/xen/vpci.h @@ -30,6 +30,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev); /* Remove all handlers and free vpci related structures. */ void vpci_deassign_device(struct pci_dev *pdev); +int __must_check vpci_reset_device_state(struct pci_dev *pdev); /* Add/remove a register handler. 
*/ int __must_check vpci_add_register_mask(struct vpci *vpci, @@ -266,6 +267,11 @@ static inline int vpci_assign_device(struct pci_dev *pdev) static inline void vpci_deassign_device(struct pci_dev *pdev) { } +static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev) +{ +return 0; +} + static inline void vpci_dump_msi(void) { } static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, -- 2.34.1
[XEN PATCH v8 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
On a PVH dom0 the gsis do not get registered, but the gsi of a passthrough device must be configured before it can be mapped into an hvm domU. On the Linux kernel side, PHYSDEVOP_setup_gsi is called for passthrough devices to register the gsi when dom0 is PVH. So, allow PHYSDEVOP_setup_gsi in hvm_physdev_op for that purpose. Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- xen/arch/x86/hvm/hypercall.c | 5 + 1 file changed, 5 insertions(+) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index d49fb8b548a3..98e3c6b176ff 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_unmap_pirq: break; +case PHYSDEVOP_setup_gsi: +if ( !is_hardware_domain(currd) ) +return -EOPNOTSUPP; +break; + case PHYSDEVOP_eoi: case PHYSDEVOP_irq_status_query: case PHYSDEVOP_get_free_pirq: -- 2.34.1
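To summarise how the filter at the top of hvm_physdev_op() treats the operations touched by this series, here is a small model restricted to the cases visible in the quoted hunks. It is a sketch, not Xen code: the enum values and ex_hvm_physdev_filter() are hypothetical, operations not shown in the hunks are left out, and passing the filter only means the request is forwarded to the common handler, where further checks may still apply.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical identifiers for the sketch; these are not Xen's PHYSDEVOP_* values. */
enum ex_physdev_op {
    EX_OP_MAP_PIRQ,
    EX_OP_UNMAP_PIRQ,
    EX_OP_SETUP_GSI,
    EX_OP_PCI_DEVICE_STATE_RESET,
};

/* Returns 0 when the op passes the filter, or a negative errno as in the quoted hunks. */
static int ex_hvm_physdev_filter(enum ex_physdev_op op, bool is_hardware_domain)
{
    switch (op) {
    case EX_OP_MAP_PIRQ:
    case EX_OP_UNMAP_PIRQ:
        /* Patch 2/5: no longer gated at this point. */
        return 0;
    case EX_OP_SETUP_GSI:
        /* Patch 3/5: hardware domain only. */
        return is_hardware_domain ? 0 : -EOPNOTSUPP;
    case EX_OP_PCI_DEVICE_STATE_RESET:
        /* Patch 1/5: grouped with the other hardware-domain-only PCI ops. */
        return is_hardware_domain ? 0 : -ENOSYS;
    }
    assert(!"operation not modelled in this sketch");
    return -ENOSYS;
}
```

Note the two different error codes for a non-hardware domain: -EOPNOTSUPP for the new setup_gsi case, and -ENOSYS for the existing group that the state-reset op joins.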
[XEN PATCH v8 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
When Xen runs with a PVH dom0 and an HVM domU, a pirq is mapped for a passthrough device by using its gsi, see xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then calls into Xen, but hvm_physdev_op does not allow PHYSDEVOP_map_pirq, because currd is the PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, so the has_pirq check fails.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow PHYSDEVOP_unmap_pirq so that the pirq can be unmapped on the failure path. In addition, add a new check to prevent self map when the caller has no PIRQ flag.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stefano Stabellini
---
 xen/arch/x86/hvm/hypercall.c | 2 ++
 xen/arch/x86/physdev.c | 24
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 56fbb69ab201..d49fb8b548a3 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..1337f95171cd 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_map_pirq: {
         physdev_map_pirq_t map;
         struct msi_info msi;
+        struct domain *d;
 
         ret = -EFAULT;
         if ( copy_from_guest(&map, arg, 1) != 0 )
             break;
 
+        d = rcu_lock_domain_by_any_id(map.domid);
+        if ( d == NULL )
+            return -ESRCH;
+        /* If caller is the same HVM guest as current, check pirq flag */
+        if ( !is_pv_domain(d) && !has_pirq(d) && map.domid == DOMID_SELF )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+        rcu_unlock_domain(d);
+
         switch ( map.type )
         {
         case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_unmap_pirq: {
         struct physdev_unmap_pirq unmap;
+        struct domain *d;
 
         ret = -EFAULT;
         if ( copy_from_guest(&unmap, arg, 1) != 0 )
             break;
 
+        d = rcu_lock_domain_by_any_id(unmap.domid);
+        if ( d == NULL )
+            return -ESRCH;
+        /* If caller is the same HVM guest as current, check pirq flag */
+        if ( !is_pv_domain(d) && !has_pirq(d) && unmap.domid == DOMID_SELF )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+        rcu_unlock_domain(d);
+
         ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
         break;
     }
--
2.34.1
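As a reading aid, the self-map check added to PHYSDEVOP_map_pirq and PHYSDEVOP_unmap_pirq can be modelled outside Xen as a small pure function. This is only a sketch: struct domain_model and pirq_op_precheck are hypothetical names standing in for struct domain and the inline check in do_physdev_op, and the RCU locking is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of DOMID_SELF in Xen's public headers. */
#define DOMID_SELF 0x7FF0U

/* Hypothetical, simplified stand-in for Xen's struct domain. */
struct domain_model {
    bool is_pv;    /* models is_pv_domain(d) */
    bool has_pirq; /* models has_pirq(d) */
};

/*
 * Models the check from the patch: d is the domain looked up from domid,
 * or NULL if no such domain exists.  Returns 0 when the (un)map request
 * may proceed, or a negative errno value otherwise.
 */
static int pirq_op_precheck(const struct domain_model *d, uint16_t domid)
{
    if (d == NULL)
        return -ESRCH;

    /* A non-PV domain without the PIRQ flag must not map for itself. */
    if (!d->is_pv && !d->has_pirq && domid == DOMID_SELF)
        return -EOPNOTSUPP;

    return 0;
}
```

With this model, a PVH dom0 that maps a pirq on behalf of a domU passes the check, while the same request with DOMID_SELF from a domain that lacks the PIRQ flag is refused.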
[XEN PATCH v8 0/5] Support device passthrough when dom0 is PVH on Xen
Hi All,

This is the v8 series to support passthrough when dom0 is PVH.

v7->v8 changes:
* patch#2: Add the domid check (domid == DOMID_SELF) to prevent self map when the guest doesn't use pirq. That check was missing in the previous version.
* patch#4: Due to changes in how the kernel obtains the gsi, add a new function that gets the gsi by passing in the sbdf of the pci device.
* patch#5: Remove the parameter "is_gsi". When a gsi exists, pci_add_dm_done uses a new function, pci_device_set_gsi, to do map_pirq and grant permission. That makes the code logic more intuitive.

Best regards,
Jiqian Chen

v6->v7 changes:
* patch#4: Due to changes in how the kernel obtains the gsi, add a new function to get the gsi from the irq, instead of using the gsi sysfs.
* patch#5: Fix the issue with variable usage, rc->r.

v5->v6 changes:
* patch#1: Add Reviewed-by Stefano and Stewart. Rebase the code and rename the old functions vpci_remove_device and vpci_add_handlers to vpci_deassign_device and vpci_assign_device.
* patch#2: Add Reviewed-by Stefano.
* patch#3: Remove the unnecessary "ASSERT(!has_pirq(currd));".
* patch#4: Fix some coding style issues under the tools directory.
* patch#5: Modify some variable names and code logic to make the code easier to understand: use gsi by default, and stay compatible with older kernel versions by continuing to use irq.

v4->v5 changes:
* patch#1: Add a pci_lock wrapper around the function vpci_reset_device_state.
* patch#2: Move the check of self map_pirq to physdev.c, change it to check whether the caller has the PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op.
* patch#3: Return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));.
* patch#4: Is the patch#5 in v4, because patch#5 in v5 has some dependency on it. Add the handling of errno and add the Reviewed-by Stefano.
* patch#5: Is the patch#4 in v4. New implementation that adds a new hypercall, XEN_DOMCTL_gsi_permission, to grant gsi.

v3->v4 changes:
* patch#1: Change the comment of PHYSDEVOP_pci_device_state_reset; move the printing behind pcidevs_unlock.
* patch#2: Add a check to prevent PVH self map.
* patch#3: New patch. The implementation of adding PHYSDEVOP_setup_gsi for PVH is treated as a separate patch.
* patch#4: New patch to solve the map_pirq problem of PVH dom0. Use gsi to grant irq permission in XEN_DOMCTL_irq_permission.
* patch#5: To be compatible with previous kernel versions, still use irq when there is no gsi sysfs.

v4 link: https://lore.kernel.org/xen-devel/20240105070920.350113-1-jiqian.c...@amd.com/T/#t

v2->v3 changes:
* patch#1: Move the content out of pci_reset_device_state and delete pci_reset_device_state; add the xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset; add a description for PHYSDEVOP_pci_device_state_reset.
* patch#2: Due to changes in the implementation of the second patch on the kernel side (it now does setup_gsi and map_pirq when assigning a device to passthrough), add PHYSDEVOP_setup_gsi for PVH dom0; we also need to support self mapping.
* patch#3: Due to changes in the implementation of the second patch on the kernel side (it adds a new sysfs for gsi instead of a new syscall), read the gsi number from the sysfs of gsi.

v3 link: https://lore.kernel.org/xen-devel/20231210164009.1551147-1-jiqian.c...@amd.com/T/#t
v2 link: https://lore.kernel.org/xen-devel/20231124104136.3263722-1-jiqian.c...@amd.com/T/#t

Below is the description of the v2 cover letter:

This series of patches is the v2 of the implementation of passthrough when dom0 is PVH on Xen. We sent the v1 upstream before, but the v1 had many problems and we got lots of suggestions. I will introduce all the issues that these patches try to fix and the differences between v1 and v2.

Issues we encountered:

1. pci_stub failed to write bar for a passthrough device.

Problem: when we run "sudo xl pci-assignable-add " to assign a device, pci_stub will call "pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() -> pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword()". The pci config write traps into bar_write() in Xen, but because bar->enabled was already set, the write is not allowed, and then when QEMU configures the passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: we don't tell vPCI that the device has been reset, so the current cached state in pdev->vpci is all out of date and differs from the real device state.

Solution: to solve this problem, the first patch of kernel (xen/pci: Add xen_reset_device_state function) and the first patch of xen (xen/vpci: Clear all vpci status of device) add a new hypercall to reset the state st
[RFC KERNEL PATCH v7 0/2] Support device passthrough when dom0 is PVH on Xen
Hi All,

This is the v7 series to support passthrough on Xen when dom0 is PVH.

v6->v7 changes:
* The first patch of v6 was already merged into branch linux_next.
* patch#1: Is the patch#2 of v6. Move the implementation of function xen_acpi_get_gsi_info to file drivers/xen/acpi.c, which makes it more convenient for the subsequent patch to obtain the gsi.
* patch#2: Is the patch#3 of v6. Add a new parameter "gsi" to struct pcistub_device and set gsi when pcistub initializes the device. Then, when userspace wants to get the gsi by passing the sbdf, we can return that gsi.

Best regards,
Jiqian Chen

v5->v6 changes:
* patch#3: Change to add a new syscall to translate irq to gsi, instead of adding a new gsi sysfs.

v4->v5 changes:
* patch#1: Add Reviewed-by Stefano
* patch#2: Add Reviewed-by Stefano
* patch#3: No changes

v3->v4 changes:
* patch#1: Change the comment of PHYSDEVOP_pci_device_state_reset; use a new function pcistub_reset_device_state to wrap __pci_reset_function_locked and xen_reset_device_state, and call pcistub_reset_device_state in pci_stub.c.
* patch#2: Remove map_pirq from xen_pvh_passthrough_gsi.

v2->v3 changes:
* patch#1: Add a condition so that xen_reset_device_state is only done for non-PV domains in pcistub_init_device.
* patch#2: Abandon the previous implementation that called unmask_irq. Instead, set up the gsi and map the pirq for the passthrough device in pcistub_init_device.
* patch#3: Abandon the previous implementation that added a new syscall to get the gsi from the irq. Instead, add a new sysfs for gsi, so that userspace can get the gsi number from sysfs.

Below is the description of the v2 cover letter:

This series of patches is the v2 of the implementation of passthrough when dom0 is PVH on Xen. We sent the v1 upstream before, but the v1 had many problems and we got lots of suggestions. I will introduce all the issues that these patches try to fix and the differences between v1 and v2.

Issues we encountered:

1. pci_stub failed to write bar for a passthrough device.

Problem: when we run "sudo xl pci-assignable-add " to assign a device, pci_stub will call "pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() -> pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword()". The pci config write traps into bar_write() in Xen, but because bar->enabled was already set, the write is not allowed, and then when QEMU configures the passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: we don't tell vPCI that the device has been reset, so the current cached state in pdev->vpci is all out of date and differs from the real device state.

Solution: to solve this problem, the first patch of kernel (xen/pci: Add xen_reset_device_state function) and the first patch of xen (xen/vpci: Clear all vpci status of device) add a new hypercall to reset the state stored in vPCI when the state of the real device has changed. Thanks to Roger for the suggestion of this v2. It is different from v1 (https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/): v1 simply allowed domU to write the pci bar, which does not comply with the design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH

Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using gsi. See xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will call into Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: in hvm_physdev_op(), the variable "currd" is the PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0 (at present dom0 is PVH). The second patch of xen (x86/pvh: Open PHYSDEVOP_map_pirq for PVH dom0) allows PVH dom0 to do PHYSDEVOP_map_pirq. This v2 patch is better than v1; v1 simply removed the has_pirq check (xen https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.hu...@amd.com/).

3. the gsi of a passthrough device isn't unmasked
3.1 failed to check the permission of pirq
3.2 the gsi of the passthrough device was not registered in PVH dom0

Problem:
3.1 The callback function pci_add_dm_done() will be called when qemu configures a passthrough device for domU. This function will call xc_domain_irq_permission()-> pirq_access_permitted() to check if the gsi has corresponding mappings in dom0. But it didn't, so it failed. See XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is the PVH dom0 and the returned irq is 0.
3.2 It's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH dom0, because the devices of PVH are using MSI(-X) interrupts. However, the IO-APIC pin must be configured for it to be able to be mapped i
[RFC KERNEL PATCH v7 2/2] xen/privcmd: Add new syscall to get gsi from dev
In PVH dom0, it uses the linux local interrupt mechanism, when it allocs irq for a gsi, it is dynamic, and follow the principle of applying first, distributing first. And the irq number is alloced from small to large, but the applying gsi number is not, may gsi 38 comes before gsi 28, it causes the irq number is not equal with the gsi number. And when passthrough a device, QEMU will use device's gsi number to do pirq mapping, but the gsi number is got from file /sys/bus/pci/devices//irq, irq!= gsi, so it will fail when mapping. And in current linux codes, there is no method to get gsi for userspace. For above purpose, record gsi of pcistub devices when init pcistub and add a new syscall into privcmd to let userspace can get gsi when they have a need. Co-developed-by: Huang Rui Signed-off-by: Jiqian Chen --- drivers/xen/privcmd.c | 28 ++ drivers/xen/xen-pciback/pci_stub.c | 38 +++--- include/uapi/xen/privcmd.h | 7 ++ include/xen/acpi.h | 2 ++ 4 files changed, 72 insertions(+), 3 deletions(-) diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index 67dfa4778864..5953a03b5cb0 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -45,6 +45,9 @@ #include #include #include +#ifdef CONFIG_ACPI +#include +#endif #include "privcmd.h" @@ -842,6 +845,27 @@ static long privcmd_ioctl_mmap_resource(struct file *file, return rc; } +static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata) +{ + struct privcmd_gsi_from_dev kdata; + + if (copy_from_user(&kdata, udata, sizeof(kdata))) + return -EFAULT; + +#ifdef CONFIG_ACPI + kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf); + if (kdata.gsi == -1) + return -EINVAL; +#else + kdata.gsi = -1; +#endif + + if (copy_to_user(udata, &kdata, sizeof(kdata))) + return -EFAULT; + + return 0; +} + #ifdef CONFIG_XEN_PRIVCMD_EVENTFD /* Irqfd support */ static struct workqueue_struct *irqfd_cleanup_wq; @@ -1529,6 +1553,10 @@ static long privcmd_ioctl(struct file *file, ret = privcmd_ioctl_ioeventfd(file, 
udata); break; + case IOCTL_PRIVCMD_GSI_FROM_DEV: + ret = privcmd_ioctl_gsi_from_dev(file, udata); + break; + default: break; } diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 2b90d832d0a7..4b62b4d377a9 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -56,6 +56,9 @@ struct pcistub_device { struct pci_dev *dev; struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */ +#ifdef CONFIG_ACPI + int gsi; +#endif }; /* Access to pcistub_devices & seized_devices lists and the initialize_devices @@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev) kref_init(&psdev->kref); spin_lock_init(&psdev->lock); +#ifdef CONFIG_ACPI + psdev->gsi = -1; +#endif return psdev; } @@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct xen_pcibk_device *pdev, return pci_dev; } +#ifdef CONFIG_ACPI +int pcistub_get_gsi_from_sbdf(unsigned int sbdf) +{ + struct pcistub_device *psdev; + int domain = sbdf >> 16; + int bus = (sbdf >> 8) & 0xff; + int slot = (sbdf >> 3) & 0x1f; + int func = sbdf & 0x7; + + psdev = pcistub_device_find(domain, bus, slot, func); + + if (!psdev) + return -1; + + return psdev->gsi; +} +EXPORT_SYMBOL_GPL(pcistub_get_gsi_from_sbdf); +#endif + struct pci_dev *pcistub_get_pci_dev_by_slot(struct xen_pcibk_device *pdev, int domain, int bus, int slot, int func) @@ -367,14 +392,20 @@ static int pcistub_match(struct pci_dev *dev) return found; } -static int pcistub_init_device(struct pci_dev *dev) +static int pcistub_init_device(struct pcistub_device *psdev) { struct xen_pcibk_dev_data *dev_data; + struct pci_dev *dev; #ifdef CONFIG_ACPI int gsi, trigger, polarity; #endif int err = 0; + if (!psdev) + return -EINVAL; + + dev = psdev->dev; + dev_dbg(&dev->dev, "initializing...\n"); /* The PCI backend is not intended to be a module (or to work with @@ -448,6 +479,7 @@ static int pcistub_init_device(struct pci_dev *dev) 
dev_err(&dev->dev, "Fail to get gsi info!\n"); goto config_release; } + psdev->gsi = gsi; if (xen_initial_domain() && xen_pvh_domain()) { err = xen_pvh_setup_gsi(gsi, trigg
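The open-coded sbdf decoding in pcistub_get_gsi_from_sbdf above (domain in bits 31..16, bus in 15..8, slot in 7..3, function in 2..0) can be checked with a small round trip in plain C. This is only a sketch: struct sbdf_fields, sbdf_pack and sbdf_unpack are hypothetical helper names for illustration and are not part of the patch or of the kernel.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
struct sbdf_fields {
    unsigned int domain; /* PCI segment, 16 bits */
    unsigned int bus;    /* 8 bits */
    unsigned int slot;   /* device number, 5 bits */
    unsigned int func;   /* function number, 3 bits */
};

/* Pack the fields using the same layout the patch decodes. */
static uint32_t sbdf_pack(unsigned int domain, unsigned int bus,
                          unsigned int slot, unsigned int func)
{
    return ((uint32_t)(domain & 0xffff) << 16) |
           ((uint32_t)(bus & 0xff) << 8) |
           ((uint32_t)(slot & 0x1f) << 3) |
           (uint32_t)(func & 0x7);
}

/* Decode with the same shifts and masks as pcistub_get_gsi_from_sbdf. */
static struct sbdf_fields sbdf_unpack(uint32_t sbdf)
{
    struct sbdf_fields f = {
        .domain = sbdf >> 16,
        .bus    = (sbdf >> 8) & 0xff,
        .slot   = (sbdf >> 3) & 0x1f,
        .func   = sbdf & 0x7,
    };
    return f;
}
```

For example, device 0000:03:00.0 packs to 0x0300, which is the kind of value userspace would place in the sbdf field of the ioctl argument.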
[RFC KERNEL PATCH v7 1/2] xen/pvh: Setup gsi for passthrough device
In PVH dom0, the gsis don't get registered, but the gsi of a passthrough device must be configured for it to be able to be mapped into a domU. When assign a device to passthrough, proactively setup the gsi of the device during that process. Co-developed-by: Huang Rui Signed-off-by: Jiqian Chen Reviewed-by: Stefano Stabellini --- arch/x86/xen/enlighten_pvh.c | 21 + drivers/acpi/pci_irq.c | 2 +- drivers/xen/acpi.c | 50 ++ drivers/xen/xen-pciback/pci_stub.c | 21 + include/linux/acpi.h | 1 + include/xen/acpi.h | 10 ++ 6 files changed, 104 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 27a2a02ef8fb..711cdcbc6916 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -4,6 +4,7 @@ #include #include +#include #include #include @@ -27,6 +28,26 @@ bool __ro_after_init xen_pvh; EXPORT_SYMBOL_GPL(xen_pvh); +int xen_pvh_setup_gsi(int gsi, int trigger, int polarity) +{ + int ret; + struct physdev_setup_gsi setup_gsi; + + setup_gsi.gsi = gsi; + setup_gsi.triggering = (trigger == ACPI_EDGE_SENSITIVE ? 0 : 1); + setup_gsi.polarity = (polarity == ACPI_ACTIVE_HIGH ? 
0 : 1); + + ret = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi); + if (ret == -EEXIST) { + xen_raw_printk("Already setup the GSI :%d\n", gsi); + ret = 0; + } else if (ret) + xen_raw_printk("Fail to setup GSI (%d)!\n", gsi); + + return ret; +} +EXPORT_SYMBOL_GPL(xen_pvh_setup_gsi); + void __init xen_pvh_init(struct boot_params *boot_params) { u32 msr; diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c index ff30ceca2203..630fe0a34bc6 100644 --- a/drivers/acpi/pci_irq.c +++ b/drivers/acpi/pci_irq.c @@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev, } #endif /* CONFIG_X86_IO_APIC */ -static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin) +struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin) { struct acpi_prt_entry *entry = NULL; struct pci_dev *bridge; diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c index 6893c79fd2a1..9e2096524fbc 100644 --- a/drivers/xen/acpi.c +++ b/drivers/xen/acpi.c @@ -30,6 +30,7 @@ * IN THE SOFTWARE. */ +#include #include #include #include @@ -75,3 +76,52 @@ int xen_acpi_notify_hypervisor_extended_sleep(u8 sleep_state, return xen_acpi_notify_hypervisor_state(sleep_state, val_a, val_b, true); } + +struct acpi_prt_entry { + struct acpi_pci_id id; + u8 pin; + acpi_handle link; + u32 index; +}; + +int xen_acpi_get_gsi_info(struct pci_dev *dev, + int *gsi_out, + int *trigger_out, + int *polarity_out) +{ + int gsi; + u8 pin; + struct acpi_prt_entry *entry; + int trigger = ACPI_LEVEL_SENSITIVE; + int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ? 
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW; + + if (!dev || !gsi_out || !trigger_out || !polarity_out) + return -EINVAL; + + pin = dev->pin; + if (!pin) + return -EINVAL; + + entry = acpi_pci_irq_lookup(dev, pin); + if (entry) { + if (entry->link) + gsi = acpi_pci_link_allocate_irq(entry->link, +entry->index, +&trigger, &polarity, +NULL); + else + gsi = entry->index; + } else + gsi = -1; + + if (gsi < 0) + return -EINVAL; + + *gsi_out = gsi; + *trigger_out = trigger; + *polarity_out = polarity; + + return 0; +} +EXPORT_SYMBOL_GPL(xen_acpi_get_gsi_info); diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 46c40ec8a18e..2b90d832d0a7 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -21,6 +21,9 @@ #include #include #include +#ifdef CONFIG_ACPI +#include +#endif #include #include #include "pciback.h" @@ -367,6 +370,9 @@ static int pcistub_match(struct pci_dev *dev) static int pcistub_init_device(struct pci_dev *dev) { struct xen_pcibk_dev_data *dev_data; +#ifdef CONFIG_ACPI + int gsi, trigger, polarity; +#endif int err = 0; dev_dbg(&dev->dev, "in
[RFC QEMU PATCH v6 0/1] Support device passthrough when dom0 is PVH on Xen
Hi All,

This is the v6 series to support passthrough on Xen when dom0 is PVH.

v5->v6 changes:
* Due to changes in the implementation of obtaining the gsi in the kernel and Xen, change to use xc_physdev_gsi_from_irq instead of the gsi sysfs.

Best regards,
Jiqian Chen

v4->v5 changes:
* Add review by Stefano

v3->v4 changes:
* Add gsi into struct XenHostPCIDevice and use the gsi number read from the gsi sysfs if it exists; if there is no gsi sysfs, still use irq.

v2->v3 changes:
* Due to changes in the implementation of the second patch on the kernel side (it adds a new sysfs for gsi instead of a new syscall), read the gsi number from the sysfs of gsi.

Below is the description of the v2 cover letter:

This patch is the v2 of the implementation of passthrough when dom0 is PVH on Xen.

Issues we encountered:

1. failed to map pirq for gsi

Problem: qemu will call xc_physdev_map_pirq() to map a passthrough device's gsi to a pirq in function xen_pt_realize(), but this fails.

Reason: according to the implementation of xc_physdev_map_pirq(), it needs a gsi instead of an irq, but qemu passes the irq to it and treats the irq as the gsi; the value is read from file /sys/bus/pci/devices/:xx:xx.x/irq in function xen_host_pci_device_get(). But the gsi number is actually not equal to the irq. On PVH dom0, when an irq is allocated for a gsi in function acpi_register_gsi_ioapic(), the allocation is dynamic and follows a first come, first served principle. If you debug the kernel code (see function __irq_alloc_descs), you will find that irq numbers are allocated in increasing order, but the requesting gsi numbers are not ordered: gsi 38 may come before gsi 28, which causes gsi 38 to get a smaller irq number than gsi 28, and then gsi != irq.

Solution: we can record the relation between gsi and irq, so that when userspace (qemu) wants to use the gsi, we can do a translation. The third patch of kernel (xen/privcmd: Add new syscall to get gsi from irq) records all the relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and provides a syscall for userspace to get the gsi from the irq. The third patch of xen (tools: Add new function to get gsi from irq) adds a new function xc_physdev_gsi_from_irq() to call the new syscall added on the kernel side. Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() will succeed. This v2 on the qemu side is the same as the v1 (qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/): it just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (1):
  xen/pci: get gsi from irq for passthrough devices

 hw/xen/xen-host-pci-device.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--
2.34.1
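The irq != gsi reasoning in this cover letter can be reproduced with a toy allocator. The sketch below is not kernel code: struct irq_table, register_gsi, gsi_from_irq and the starting irq number 24 are all assumptions made for illustration. It hands out irq numbers in increasing order as gsis are registered and keeps the gsi to irq relation so that an irq can be translated back.

```c
#include <stddef.h>

#define MAX_GSI 64
/* Hypothetical first dynamically allocated irq number, chosen for illustration. */
#define FIRST_DYNAMIC_IRQ 24

/* Toy record of which irq each gsi received. */
struct irq_table {
    int next_irq;            /* next irq number to hand out */
    int irq_of_gsi[MAX_GSI]; /* -1 means the gsi is not registered */
};

static void irq_table_init(struct irq_table *t)
{
    t->next_irq = FIRST_DYNAMIC_IRQ;
    for (size_t i = 0; i < MAX_GSI; i++)
        t->irq_of_gsi[i] = -1;
}

/* First come, first served: the next free irq goes to whichever gsi asks first. */
static int register_gsi(struct irq_table *t, int gsi)
{
    if (gsi < 0 || gsi >= MAX_GSI)
        return -1;
    if (t->irq_of_gsi[gsi] == -1)
        t->irq_of_gsi[gsi] = t->next_irq++;
    return t->irq_of_gsi[gsi];
}

/* Translate an irq back to its gsi, or return -1 if the irq is unknown. */
static int gsi_from_irq(const struct irq_table *t, int irq)
{
    if (irq < 0)
        return -1;
    for (int gsi = 0; gsi < MAX_GSI; gsi++)
        if (t->irq_of_gsi[gsi] == irq)
            return gsi;
    return -1;
}
```

Registering gsi 38 before gsi 28 gives gsi 38 the smaller irq number, so passing the irq where a gsi is expected selects the wrong interrupt unless a translation like gsi_from_irq is applied first.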
[RFC QEMU PATCH v6 1/1] xen/pci: get gsi from irq for passthrough devices
PVH dom0 uses the Linux local interrupt mechanism. When it allocates an irq for a gsi, the allocation is dynamic and follows a first come, first served principle. The irq numbers are allocated in increasing order, but the requesting gsi numbers are not ordered: gsi 38 may come before gsi 28, so the irq number is not equal to the gsi number.

When passing through a device, qemu wants to use the gsi to map the pirq, see xen_pt_realize->xc_physdev_map_pirq, but in the current code the gsi number is read from file /sys/bus/pci/devices//irq, so the mapping fails.

Translate irq to gsi by using the new function provided by the Xen tools.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
 hw/xen/xen-host-pci-device.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716a2..5e9aa9679e3e 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -10,6 +10,7 @@
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "xen-host-pci-device.h"
+#include "hw/xen/xen_native.h"
 
 #define XEN_HOST_PCI_MAX_EXT_CAP \
     ((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4))
@@ -368,7 +369,11 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
     if (*errp) {
         goto error;
     }
-    d->irq = v;
+    d->irq = xc_physdev_gsi_from_irq(xen_xc, v);
+    /* if fail to get gsi, fallback to irq */
+    if (d->irq == -1) {
+        d->irq = v;
+    }
 
     xen_host_pci_get_hex_value(d, "class", &v, errp);
     if (*errp) {
--
2.34.1
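The fallback in the hunk above (use the translated gsi when the lookup succeeds, otherwise keep the irq read from sysfs) can be isolated as a pure function. This is a sketch only: gsi_lookup_fn, gsi_or_irq and the two stub lookups are hypothetical names, and the function pointer merely stands in for a call such as xc_physdev_gsi_from_irq().

```c
#include <stddef.h>

/* Hypothetical lookup signature: returns the gsi for an irq, or -1 on failure. */
typedef int (*gsi_lookup_fn)(int irq);

/*
 * Prefer the gsi reported by the lookup; if the lookup is unavailable or
 * fails with -1 (for example on an older kernel), fall back to the irq.
 */
static int gsi_or_irq(int irq, gsi_lookup_fn lookup)
{
    int gsi = lookup ? lookup(irq) : -1;

    return (gsi == -1) ? irq : gsi;
}

/* Stub modelling a kernel without the translation support. */
static int lookup_unsupported(int irq)
{
    (void)irq;
    return -1;
}

/* Stub modelling a kernel that knows irq 24 belongs to gsi 38. */
static int lookup_fixed(int irq)
{
    return irq == 24 ? 38 : -1;
}
```

Keeping the fallback means the same QEMU binary still works on kernels without the translation, where the old behaviour of passing the irq is preserved.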
[RFC XEN PATCH v7 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
Some type of domain don't have PIRQ, like PVH, when passthrough a device to guest on PVH dom0, callstack pci_add_dm_done->XEN_DOMCTL_irq_permission will failed at domain_pirq_to_irq. So, add a new hypercall to grant/revoke gsi permission when dom0 is not PV or dom0 has not PIRQ flag. Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- tools/include/xenctrl.h | 5 tools/libs/ctrl/xc_domain.c | 15 tools/libs/light/libxl_pci.c | 46 xen/arch/x86/domctl.c| 31 xen/include/public/domctl.h | 9 +++ xen/xsm/flask/hooks.c| 1 + 6 files changed, 97 insertions(+), 10 deletions(-) diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 2b9d55d2c6d7..adeaab93d0f7 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch, uint32_t pirq, bool allow_access); +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + bool allow_access); + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c index f2d9d14b4d9f..8540e84fda93 100644 --- a/tools/libs/ctrl/xc_domain.c +++ b/tools/libs/ctrl/xc_domain.c @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch, return do_domctl(xch, &domctl); } +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + bool allow_access) +{ +struct xen_domctl domctl = { +.cmd = XEN_DOMCTL_gsi_permission, +.domain = domid, +.u.gsi_permission.gsi = gsi, +.u.gsi_permission.allow_access = allow_access, +}; + +return do_domctl(xch, &domctl); +} + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c index d4313e196ebd..7e82f31ffc4f 100644 --- a/tools/libs/light/libxl_pci.c +++ b/tools/libs/light/libxl_pci.c @@ -1421,6 +1421,8 @@ static void pci_add_dm_done(libxl__egc *egc, uint32_t flag = 
XEN_DOMCTL_DEV_RDM_RELAXED; uint32_t domainid = domid; bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid); +int gsi; +bool is_gsi = false; /* Convenience aliases */ bool starting = pas->starting; @@ -1490,6 +1492,8 @@ static void pci_add_dm_done(libxl__egc *egc, r = xc_physdev_gsi_from_irq(ctx->xch, irq); if (r != -1) { irq = r; +gsi = r; +is_gsi = true; } r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq); if (r < 0) { @@ -1499,13 +1503,25 @@ static void pci_add_dm_done(libxl__egc *egc, rc = ERROR_FAIL; goto out; } -r = xc_domain_irq_permission(ctx->xch, domid, irq, 1); -if (r < 0) { -LOGED(ERROR, domainid, - "xc_domain_irq_permission irq=%d (error=%d)", irq, r); -fclose(f); -rc = ERROR_FAIL; -goto out; +if (is_gsi) { +r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1); +if (r < 0 && errno != -EOPNOTSUPP) { +LOGED(ERROR, domainid, + "xc_domain_gsi_permission gsi=%d (error=%d)", gsi, errno); +fclose(f); +rc = ERROR_FAIL; +goto out; +} +} +if (!is_gsi || errno == -EOPNOTSUPP) { +r = xc_domain_irq_permission(ctx->xch, domid, irq, 1); +if (r < 0) { +LOGED(ERROR, domainid, +"xc_domain_irq_permission irq=%d (error=%d)", irq, errno); +fclose(f); +rc = ERROR_FAIL; +goto out; +} } } fclose(f); @@ -2180,6 +2196,7 @@ static void pci_remove_detached(libxl__egc *egc, uint32_t domainid = prs->domid; bool isstubdom; int r; +bool is_gsi = false; /* Convenience aliases */ libxl_device_pci *const pci = &prs->pci; @@ -2249,6 +2266,7 @@ skip_bar: r = xc_physdev_gsi_from_irq(ctx->xch, irq); if (r != -1) { irq = r; +is_gsi = true; } rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq); if (rc < 0) { @@ -2260,9 +2278,17 @@ skip_bar: */ LOGED(ERROR, domid, "xc_physdev_unmap
[RFC XEN PATCH v7 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
On PVH dom0, the gsis don't get registered, but the gsi of a passthrough device must be configured before it can be mapped into an HVM domU. On the Linux kernel side, PHYSDEVOP_setup_gsi is called for passthrough devices to register the gsi when dom0 is PVH. So, add PHYSDEVOP_setup_gsi for that purpose.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
 xen/arch/x86/hvm/hypercall.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index d49fb8b548a3..98e3c6b176ff 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_unmap_pirq:
         break;
 
+    case PHYSDEVOP_setup_gsi:
+        if ( !is_hardware_domain(currd) )
+            return -EOPNOTSUPP;
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
--
2.34.1
[XEN PATCH v7 1/5] xen/vpci: Clear all vpci status of device
When a device has been reset on dom0 side, the vpci on Xen side won't get notification, so the cached state in vpci is all out of date compare with the real device state. To solve that problem, add a new hypercall to clear all vpci device state. When the state of device is reset on dom0 side, dom0 can call this hypercall to notify vpci. Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen Reviewed-by: Stewart Hildebrand Reviewed-by: Stefano Stabellini --- xen/arch/x86/hvm/hypercall.c | 1 + xen/drivers/pci/physdev.c| 36 xen/drivers/vpci/vpci.c | 10 ++ xen/include/public/physdev.h | 7 +++ xen/include/xen/vpci.h | 6 ++ 5 files changed, 60 insertions(+) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 14679dd82971..56fbb69ab201 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_pci_mmcfg_reserved: case PHYSDEVOP_pci_device_add: case PHYSDEVOP_pci_device_remove: +case PHYSDEVOP_pci_device_state_reset: case PHYSDEVOP_dbgp_op: if ( !is_hardware_domain(currd) ) return -ENOSYS; diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c index 42db3e6d133c..73dc8f058b0e 100644 --- a/xen/drivers/pci/physdev.c +++ b/xen/drivers/pci/physdev.c @@ -2,6 +2,7 @@ #include #include #include +#include #ifndef COMPAT typedef long ret_t; @@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } +case PHYSDEVOP_pci_device_state_reset: { +struct physdev_pci_device dev; +struct pci_dev *pdev; +pci_sbdf_t sbdf; + +if ( !is_pci_passthrough_enabled() ) +return -EOPNOTSUPP; + +ret = -EFAULT; +if ( copy_from_guest(&dev, arg, 1) != 0 ) +break; +sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn); + +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf); +if ( ret ) +break; + +pcidevs_lock(); +pdev = pci_get_pdev(NULL, sbdf); +if ( !pdev ) +{ +pcidevs_unlock(); +ret = -ENODEV; +break; +} + 
+write_lock(&pdev->domain->pci_lock); +ret = vpci_reset_device_state(pdev); +write_unlock(&pdev->domain->pci_lock); +pcidevs_unlock(); +if ( ret ) +printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf); +break; +} + default: ret = -ENOSYS; break; diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index 97e115dc5798..424aec2d5c46 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -115,6 +115,16 @@ int vpci_assign_device(struct pci_dev *pdev) return rc; } + +int vpci_reset_device_state(struct pci_dev *pdev) +{ +ASSERT(pcidevs_locked()); +ASSERT(rw_is_write_locked(&pdev->domain->pci_lock)); + +vpci_deassign_device(pdev); +return vpci_assign_device(pdev); +} + #endif /* __XEN__ */ static int vpci_register_cmp(const struct vpci_register *r1, diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h index f0c0d4727c0b..f5bab1f29779 100644 --- a/xen/include/public/physdev.h +++ b/xen/include/public/physdev.h @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t); */ #define PHYSDEVOP_prepare_msix 30 #define PHYSDEVOP_release_msix 31 +/* + * Notify the hypervisor that a PCI device has been reset, so that any + * internally cached state is regenerated. Should be called after any + * device reset performed by the hardware domain. + */ +#define PHYSDEVOP_pci_device_state_reset 32 + struct physdev_pci_device { /* IN */ uint16_t seg; diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h index e89c571890b2..ea64d94e818b 100644 --- a/xen/include/xen/vpci.h +++ b/xen/include/xen/vpci.h @@ -30,6 +30,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev); /* Remove all handlers and free vpci related structures. */ void vpci_deassign_device(struct pci_dev *pdev); +int __must_check vpci_reset_device_state(struct pci_dev *pdev); /* Add/remove a register handler. 
*/ int __must_check vpci_add_register_mask(struct vpci *vpci, @@ -266,6 +267,11 @@ static inline int vpci_assign_device(struct pci_dev *pdev) static inline void vpci_deassign_device(struct pci_dev *pdev) { } +static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev) +{ +return 0; +} + static inline void vpci_dump_msi(void) { } static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, -- 2.34.1
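The reset hypercall above boils down to "deassign, then assign again", so that the cached vPCI state is rebuilt from the hardware. A minimal sketch of why that matters follows; the `toy_*` types and functions are hypothetical stand-ins for illustration only, not Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real device: one BAR and an enable bit. */
struct toy_device {
    uint32_t bar;
    bool memory_enabled;
};

/* Hypothetical stand-in for the per-device state cached by vPCI. */
struct toy_vpci {
    bool assigned;
    uint32_t cached_bar;
    bool cached_enabled;
};

/* Analogue of vpci_assign_device(): snapshot the current hardware state. */
static int toy_vpci_assign(struct toy_vpci *v, const struct toy_device *d)
{
    assert(v != NULL && d != NULL);
    v->assigned = true;
    v->cached_bar = d->bar;
    v->cached_enabled = d->memory_enabled;
    return 0;
}

/* Analogue of vpci_deassign_device(): drop everything that was cached. */
static void toy_vpci_deassign(struct toy_vpci *v)
{
    memset(v, 0, sizeof(*v));
}

/* Analogue of vpci_reset_device_state(): deassign, then assign again. */
static int toy_vpci_reset_state(struct toy_vpci *v, const struct toy_device *d)
{
    toy_vpci_deassign(v);
    return toy_vpci_assign(v, d);
}

/* A reset done by dom0 clears the hardware; the cache is not touched. */
static void toy_device_reset(struct toy_device *d)
{
    d->bar = 0;
    d->memory_enabled = false;
}

/* True when the cached view matches the hardware. */
static bool toy_vpci_in_sync(const struct toy_vpci *v,
                             const struct toy_device *d)
{
    return v->assigned && v->cached_bar == d->bar &&
           v->cached_enabled == d->memory_enabled;
}
```

In this model the cache goes stale after `toy_device_reset()` and becomes consistent again only after `toy_vpci_reset_state()`, which is the role PHYSDEVOP_pci_device_state_reset plays for the real vPCI state.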
[RFC XEN PATCH v7 4/5] tools: Add new function to get gsi from irq
In PVH dom0, it uses the linux local interrupt mechanism, when it allocs irq for a gsi, it is dynamic, and follow the principle of applying first, distributing first. And irq number is alloced from small to large, but the applying gsi number is not, may gsi 38 comes before gsi 28, that causes the irq number is not equal with the gsi number. And when passthrough a device, QEMU will use its gsi number to do pirq mapping, see xen_pt_realize->xc_physdev_map_pirq, but the gsi number is got from file /sys/bus/pci/devices//irq, so it will fail when mapping. And in current codes, there is no method to translate irq to gsi for userspace. For above purpose, add new function to get that translation. And call this function before xc_physdev_(un)map_pirq Signed-off-by: Huang Rui Signed-off-by: Chen Jiqian --- tools/include/xencall.h| 2 ++ tools/include/xenctrl.h| 2 ++ tools/libs/call/core.c | 5 + tools/libs/call/libxencall.map | 2 ++ tools/libs/call/linux.c| 15 +++ tools/libs/call/private.h | 9 + tools/libs/ctrl/xc_physdev.c | 4 tools/libs/light/libxl_pci.c | 11 +++ 8 files changed, 50 insertions(+) diff --git a/tools/include/xencall.h b/tools/include/xencall.h index fc95ed0fe58e..962cb45e1f1b 100644 --- a/tools/include/xencall.h +++ b/tools/include/xencall.h @@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op, uint64_t arg1, uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5); +int xen_oscall_gsi_from_irq(xencall_handle *xcall, int irq); + /* Variant(s) of the above, as needed, returning "long" instead of "int". 
*/ long xencall2L(xencall_handle *xcall, unsigned int op, uint64_t arg1, uint64_t arg2); diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 2ef8b4e05422..2b9d55d2c6d7 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch, uint32_t domid, int pirq); +int xc_physdev_gsi_from_irq(xc_interface *xch, int irq); + /* * LOGGING AND ERROR REPORTING */ diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c index 02c4f8e1aefa..6f79f3babd19 100644 --- a/tools/libs/call/core.c +++ b/tools/libs/call/core.c @@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op, return osdep_hypercall(xcall, &call); } +int xen_oscall_gsi_from_irq(xencall_handle *xcall, int irq) +{ +return osdep_oscall(xcall, irq); +} + /* * Local variables: * mode: C diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map index d18a3174e9dc..6cde8eda05e2 100644 --- a/tools/libs/call/libxencall.map +++ b/tools/libs/call/libxencall.map @@ -10,6 +10,8 @@ VERS_1.0 { xencall4; xencall5; + xen_oscall_gsi_from_irq; + xencall_alloc_buffer; xencall_free_buffer; xencall_alloc_buffer_pages; diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c index 6d588e6bea8f..32b60c8b403e 100644 --- a/tools/libs/call/linux.c +++ b/tools/libs/call/linux.c @@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall) return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall); } +long osdep_oscall(xencall_handle *xcall, int irq) +{ +privcmd_gsi_from_irq_t gsi_irq = { +.irq = irq, +.gsi = -1, +}; + +if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_IRQ, &gsi_irq)) { +PERROR("failed to get gsi from irq"); +return -1; +} + +return gsi_irq.gsi; +} + static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages) { void *p; diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h index 9c3aa432efe2..2d86cfb1e099 100644 --- a/tools/libs/call/private.h 
+++ b/tools/libs/call/private.h @@ -57,6 +57,15 @@ int osdep_xencall_close(xencall_handle *xcall); long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall); +#if defined(__linux__) +long osdep_oscall(xencall_handle *xcall, int irq); +#else +static inline long osdep_oscall(xencall_handle *xcall, int irq) +{ +return -1; +} +#endif + void *osdep_alloc_pages(xencall_handle *xcall, size_t nr_pages); void osdep_free_pages(xencall_handle *xcall, void *p, size_t nr_pages); diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c index 460a8e779ce8..4d3b138ebd0e 100644 --- a/tools/libs/ctrl/xc_physdev.c +++ b/tools/libs/ctrl/xc_physdev.c @@ -111,3 +111,7 @@ int xc_physdev_unmap_pirq(xc_interface *xch, return rc; } +int xc_physdev_gsi_from_irq(xc_interface *xch, int irq) +{ +return xen_oscall_gsi_from_irq(xch->xcall, irq); +} diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c index 96cb4da0794e..d4313e196ebd 100644 --- a/tools/libs/light/libxl_pci.c +++ b/tools/libs/light
[RFC XEN PATCH v7 0/5] Support device passthrough when dom0 is PVH on Xen
Hi All,
This is the v7 series to support passthrough when dom0 is PVH.

v6->v7 changes:
* patch#4: Due to changes in how the kernel obtains the gsi, add a new function to get the gsi from the irq instead of reading a gsi sysfs node.
* patch#5: Fix the issue with variable usage, rc->r.

Best regards,
Jiqian Chen

v5->v6 changes:
* patch#1: Add Reviewed-by from Stefano and Stewart. Rebase the code and replace the old functions vpci_remove_device and vpci_add_handlers with vpci_deassign_device and vpci_assign_device.
* patch#2: Add Reviewed-by from Stefano.
* patch#3: Remove the unnecessary "ASSERT(!has_pirq(currd));".
* patch#4: Fix some coding style issues under the tools directory.
* patch#5: Rename some variables and rework the code logic to make it easier to understand: use the gsi by default, and stay compatible with older kernel versions by continuing to use the irq.

v4->v5 changes:
* patch#1: Take pci_lock around the call to vpci_reset_device_state.
* patch#2: Move the self map_pirq check to physdev.c, change it to check whether the caller has the PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op.
* patch#3: Return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));.
* patch#4: This was patch#5 in v4; it moved because patch#5 in v5 depends on it. Add errno handling and Reviewed-by from Stefano.
* patch#5: This was patch#4 in v4. New implementation that adds a new hypercall, XEN_DOMCTL_gsi_permission, to grant the gsi.

v3->v4 changes:
* patch#1: Change the comment of PHYSDEVOP_pci_device_state_reset; move the printing to after pcidevs_unlock.
* patch#2: Add a check to prevent PVH self mapping.
* patch#3: New patch. The implementation that adds PHYSDEVOP_setup_gsi for PVH is now a separate patch.
* patch#4: New patch to solve the map_pirq problem of PVH dom0. Use the gsi to grant irq permission in XEN_DOMCTL_irq_permission.
* patch#5: To be compatible with previous kernel versions, still use the irq when there is no gsi sysfs node.

v4 link:
https://lore.kernel.org/xen-devel/20240105070920.350113-1-jiqian.c...@amd.com/T/#t

v2->v3 changes:
* patch#1: Move the content out of pci_reset_device_state and delete pci_reset_device_state; add an xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset; add a description for PHYSDEVOP_pci_device_state_reset.
* patch#2: Due to changes in the second patch on the kernel side (it now does setup_gsi and map_pirq when assigning a device to passthrough), add PHYSDEVOP_setup_gsi for PVH dom0; we also need to support self mapping.
* patch#3: Due to changes in the second patch on the kernel side (it adds a new sysfs node for the gsi instead of a new syscall), read the gsi number from that sysfs node.

v3 link:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-jiqian.c...@amd.com/T/#t
v2 link:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-jiqian.c...@amd.com/T/#t

Below is the description from the v2 cover letter:

This series is the v2 of the implementation of passthrough when dom0 is PVH on Xen. We sent v1 upstream before, but v1 had many problems and we received a lot of suggestions. I will introduce all the issues these patches try to fix and the differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a device, pci_stub calls "pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() -> pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword()". The pci config write triggers an io interrupt to bar_write() in Xen, but bar->enabled was set before, so the write is not allowed. Then, when QEMU configures the passthrough device in xen_pt_realize(), it gets invalid bar values.
Reason: we don't tell vPCI that the device has been reset, so the state cached in pdev->vpci is out of date and differs from the real device state.
Solution: the first kernel patch (xen/pci: Add xen_reset_device_state function) and the first Xen patch (xen/vpci: Clear all vpci status of device) add a new hypercall to reset the state stored in vPCI when the state of the real device has changed. Thanks to Roger for suggesting this v2 approach. It differs from v1 (https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/), which simply allowed domU to write the pci bar and did not comply with the design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using the gsi. See xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq calls into Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.
Reason: In hvm_physdev_op(), the variable "currd" is PVH d
[XEN PATCH v7 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
If Xen runs with a PVH dom0 and an HVM domU, the HVM guest will map a pirq for a passthrough device by using the gsi, see xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not allowed: currd is the PVH dom0, PVH has no X86_EMU_USE_PIRQ flag, and so the call fails the has_pirq check.

So allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq. Add a new check to prevent self mapping when the caller has no PIRQ flag.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stefano Stabellini
---
 xen/arch/x86/hvm/hypercall.c |  2 ++
 xen/arch/x86/physdev.c       | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 56fbb69ab201..d49fb8b548a3 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..1367abc61e54 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_map_pirq: {
         physdev_map_pirq_t map;
         struct msi_info msi;
+        struct domain *d;
 
         ret = -EFAULT;
         if ( copy_from_guest(&map, arg, 1) != 0 )
             break;
 
+        d = rcu_lock_domain_by_any_id(map.domid);
+        if ( d == NULL )
+            return -ESRCH;
+        /* If it is an HVM guest, check if it has PIRQs */
+        if ( !is_pv_domain(d) && !has_pirq(d) )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+        rcu_unlock_domain(d);
+
         switch ( map.type )
         {
         case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_unmap_pirq: {
         struct physdev_unmap_pirq unmap;
+        struct domain *d;
 
         ret = -EFAULT;
         if ( copy_from_guest(&unmap, arg, 1) != 0 )
             break;
 
+        d = rcu_lock_domain_by_any_id(unmap.domid);
+        if ( d == NULL )
+            return -ESRCH;
+        /* If it is an HVM guest, check if it has PIRQs */
+        if ( !is_pv_domain(d) && !has_pirq(d) )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+        rcu_unlock_domain(d);
+
         ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
         break;
     }
-- 
2.34.1
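The check added to PHYSDEVOP_(un)map_pirq looks at the target domain, not at the caller. A minimal sketch of that decision follows, assuming a simplified model in which a domain is PV, HVM or PVH and carries one flag standing in for X86_EMU_USE_PIRQ; every `toy_*` name is hypothetical and not a Xen API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guest types; in Xen, PVH is an HVM-style domain. */
enum toy_guest_type { TOY_PV, TOY_HVM, TOY_PVH };

struct toy_domain {
    enum toy_guest_type type;
    bool emu_use_pirq;   /* stands in for the X86_EMU_USE_PIRQ flag */
};

static bool toy_is_pv_domain(const struct toy_domain *d)
{
    return d->type == TOY_PV;
}

/* Only non-PV domains with the emulation flag set route through pirqs. */
static bool toy_has_pirq(const struct toy_domain *d)
{
    return d->type != TOY_PV && d->emu_use_pirq;
}

/*
 * Mirrors the shape of the check in the patch:
 *  - unknown target domain            -> -ESRCH
 *  - HVM/PVH target without PIRQ flag -> -EOPNOTSUPP
 *  - anything else                    -> 0 (the mapping may proceed)
 */
static int toy_check_pirq_target(const struct toy_domain *target)
{
    if (target == NULL)
        return -ESRCH;

    if (!toy_is_pv_domain(target) && !toy_has_pirq(target))
        return -EOPNOTSUPP;

    return 0;
}
```

Under these assumptions a PVH dom0 mapping a pirq for an HVM domU with the PIRQ flag passes the check, while a PVH dom0 naming itself as the target is rejected with -EOPNOTSUPP.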
[RFC KERNEL PATCH v6 0/3] Support device passthrough when dom0 is PVH on Xen
Hi All,
This is the v6 series to support passthrough on Xen when dom0 is PVH.

v5->v6 changes:
* patch#3: Add a new syscall to translate irq to gsi, instead of adding a new gsi sysfs node.

Best regards,
Jiqian Chen

v4->v5 changes:
* patch#1: Add Reviewed-by from Stefano.
* patch#2: Add Reviewed-by from Stefano.
* patch#3: No changes.

v3->v4 changes:
* patch#1: Change the comment of PHYSDEVOP_pci_device_state_reset; use a new function pcistub_reset_device_state to wrap __pci_reset_function_locked and xen_reset_device_state, and call pcistub_reset_device_state in pci_stub.c.
* patch#2: Remove map_pirq from xen_pvh_passthrough_gsi.

v2->v3 changes:
* patch#1: Add a condition so that xen_reset_device_state is only done for non-PV domains in pcistub_init_device.
* patch#2: Abandon the previous implementation that called unmask_irq. Instead, set up the gsi and map the pirq for the passthrough device in pcistub_init_device.
* patch#3: Abandon the previous implementation that added a new syscall to get the gsi from the irq. Instead, add a new sysfs node for the gsi, so userspace can get the gsi number from sysfs.

Below is the description from the v2 cover letter:

This series is the v2 of the implementation of passthrough when dom0 is PVH on Xen. We sent v1 upstream before, but v1 had many problems and we received a lot of suggestions. I will introduce all the issues these patches try to fix and the differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a device, pci_stub calls "pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() -> pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword()". The pci config write triggers an io interrupt to bar_write() in Xen, but bar->enabled was set before, so the write is not allowed. Then, when QEMU configures the passthrough device in xen_pt_realize(), it gets invalid bar values.
Reason: we don't tell vPCI that the device has been reset, so the state cached in pdev->vpci is out of date and differs from the real device state.
Solution: the first kernel patch (xen/pci: Add xen_reset_device_state function) and the first Xen patch (xen/vpci: Clear all vpci status of device) add a new hypercall to reset the state stored in vPCI when the state of the real device has changed. Thanks to Roger for suggesting this v2 approach. It differs from v1 (https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/), which simply allowed domU to write the pci bar and did not comply with the design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using the gsi. See xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq calls into Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.
Reason: In hvm_physdev_op(), the variable "currd" is the PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.
Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0 (at present dom0 is PVH). The second Xen patch (x86/pvh: Open PHYSDEVOP_map_pirq for PVH dom0) allows PVH dom0 to do PHYSDEVOP_map_pirq. This v2 patch is better than v1, which simply removed the has_pirq check (xen https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.hu...@amd.com/).

3. the gsi of a passthrough device is not unmasked
3.1 failed to check the permission of pirq
3.2 the gsi of the passthrough device was not registered in PVH dom0
Problem:
3.1 The callback function pci_add_dm_done() is called when QEMU configures a passthrough device for domU. This function calls xc_domain_irq_permission()-> pirq_access_permitted() to check whether the gsi has a corresponding mapping in dom0. It does not, so the call fails.
See XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is the PVH dom0 and the returned irq is 0.
3.2 It is possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH dom0, because the devices of PVH are using MSI(-X) interrupts. However, the IO-APIC pin must be configured for it to be able to be mapped into a domU.
Reason: After searching the code, I found that "map_pirq" and "register_gsi" are done in vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi (aka the ioapic's pin) is unmasked in PVH dom0. So both problems come down to the gsi of a passthrough device not being unmasked.
Solution: to solve these problems, the second kernel patch (xen/pvh: Unmask irq for passthrough device in PVH dom0) calls unmask_irq() when we assign a device to be passthrough. So that passthrough devices can have the mapping of
[RFC KERNEL PATCH v6 2/3] xen/pvh: Setup gsi for passthrough device
In PVH dom0, the gsis don't get registered, but the gsi of a passthrough device must be configured for it to be able to be mapped into a domU. When assign a device to passthrough, proactively setup the gsi of the device during that process. Co-developed-by: Huang Rui Signed-off-by: Jiqian Chen Reviewed-by: Stefano Stabellini --- arch/x86/xen/enlighten_pvh.c | 92 ++ drivers/acpi/pci_irq.c | 2 +- drivers/xen/xen-pciback/pci_stub.c | 8 +++ include/linux/acpi.h | 1 + include/xen/acpi.h | 6 ++ 5 files changed, 108 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index c28f073c1df5..12be665b27d8 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -2,6 +2,7 @@ #include #include #include +#include #include @@ -26,6 +27,97 @@ bool __ro_after_init xen_pvh; EXPORT_SYMBOL_GPL(xen_pvh); +typedef struct gsi_info { + int gsi; + int trigger; + int polarity; +} gsi_info_t; + +struct acpi_prt_entry { + struct acpi_pci_id id; + u8 pin; + acpi_handle link; + u32 index; /* GSI, or link _CRS index */ +}; + +static int xen_pvh_get_gsi_info(struct pci_dev *dev, + gsi_info_t *gsi_info) +{ + int gsi; + u8 pin; + struct acpi_prt_entry *entry; + int trigger = ACPI_LEVEL_SENSITIVE; + int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ? 
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW; + + if (!dev || !gsi_info) + return -EINVAL; + + pin = dev->pin; + if (!pin) + return -EINVAL; + + entry = acpi_pci_irq_lookup(dev, pin); + if (entry) { + if (entry->link) + gsi = acpi_pci_link_allocate_irq(entry->link, +entry->index, +&trigger, &polarity, +NULL); + else + gsi = entry->index; + } else + gsi = -1; + + if (gsi < 0) + return -EINVAL; + + gsi_info->gsi = gsi; + gsi_info->trigger = trigger; + gsi_info->polarity = polarity; + + return 0; +} + +static int xen_pvh_setup_gsi(gsi_info_t *gsi_info) +{ + struct physdev_setup_gsi setup_gsi; + + if (!gsi_info) + return -EINVAL; + + setup_gsi.gsi = gsi_info->gsi; + setup_gsi.triggering = (gsi_info->trigger == ACPI_EDGE_SENSITIVE ? 0 : 1); + setup_gsi.polarity = (gsi_info->polarity == ACPI_ACTIVE_HIGH ? 0 : 1); + + return HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi); +} + +int xen_pvh_passthrough_gsi(struct pci_dev *dev) +{ + int ret; + gsi_info_t gsi_info; + + if (!dev) + return -EINVAL; + + ret = xen_pvh_get_gsi_info(dev, &gsi_info); + if (ret) { + xen_raw_printk("Fail to get gsi info!\n"); + return ret; + } + + ret = xen_pvh_setup_gsi(&gsi_info); + if (ret == -EEXIST) { + xen_raw_printk("Already setup the GSI :%d\n", gsi_info.gsi); + ret = 0; + } else if (ret) + xen_raw_printk("Fail to setup GSI (%d)!\n", gsi_info.gsi); + + return ret; +} +EXPORT_SYMBOL_GPL(xen_pvh_passthrough_gsi); + void __init xen_pvh_init(struct boot_params *boot_params) { u32 msr; diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c index ff30ceca2203..630fe0a34bc6 100644 --- a/drivers/acpi/pci_irq.c +++ b/drivers/acpi/pci_irq.c @@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev, } #endif /* CONFIG_X86_IO_APIC */ -static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin) +struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin) { struct acpi_prt_entry *entry = NULL; struct pci_dev *bridge; diff --git 
a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 46c40ec8a18e..22d4380d2b04 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -435,6 +436,13 @@ static int pcistub_init_device(struct pci_dev *dev) goto config_release; pci_restore_state(dev); } + + if (xen_initial_domain() && xen_pvh_domain()) { + err = xen_pvh_passthrough_gsi(dev); + if (err) + goto config_release; + } + /* Now
[RFC KERNEL PATCH v6 3/3] xen/privcmd: Add new syscall to get gsi from irq
In PVH dom0, it uses the linux local interrupt mechanism, when it allocs irq for a gsi, it is dynamic, and follow the principle of applying first, distributing first. And the irq number is alloced from small to large, but the applying gsi number is not, may gsi 38 comes before gsi 28, it causes the irq number is not equal with the gsi number. And when passthrough a device, QEMU will use device's gsi number to do pirq mapping, but the gsi number is got from file /sys/bus/pci/devices//irq, irq!= gsi, so it will fail when mapping. And in current linux codes, there is no method to translate irq to gsi for userspace. For above purpose, record the relationship of gsi and irq when PVH dom0 do acpi_register_gsi_ioapic for devices and adds a new syscall into privcmd to let userspace can get that translation when they have a need. Co-developed-by: Huang Rui Signed-off-by: Jiqian Chen --- arch/x86/include/asm/apic.h | 8 +++ arch/x86/include/asm/xen/pci.h | 5 arch/x86/kernel/acpi/boot.c | 2 +- arch/x86/pci/xen.c | 21 + drivers/xen/events/events_base.c | 39 drivers/xen/privcmd.c| 19 include/uapi/xen/privcmd.h | 7 ++ include/xen/events.h | 5 8 files changed, 105 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 9d159b771dc8..dd4139250895 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -169,6 +169,9 @@ extern bool apic_needs_pit(void); extern void apic_send_IPI_allbutself(unsigned int vector); +extern int acpi_register_gsi_ioapic(struct device *dev, u32 gsi, + int trigger, int polarity); + #else /* !CONFIG_X86_LOCAL_APIC */ static inline void lapic_shutdown(void) { } #define local_apic_timer_c2_ok 1 @@ -183,6 +186,11 @@ static inline void apic_intr_mode_init(void) { } static inline void lapic_assign_system_vectors(void) { } static inline void lapic_assign_legacy_vector(unsigned int i, bool r) { } static inline bool apic_needs_pit(void) { return true; } +static inline int 
acpi_register_gsi_ioapic(struct device *dev, u32 gsi, + int trigger, int polarity) +{ + return (int)gsi; +} #endif /* !CONFIG_X86_LOCAL_APIC */ #ifdef CONFIG_X86_X2APIC diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h index 9015b888edd6..aa8ded61fc2d 100644 --- a/arch/x86/include/asm/xen/pci.h +++ b/arch/x86/include/asm/xen/pci.h @@ -5,6 +5,7 @@ #if defined(CONFIG_PCI_XEN) extern int __init pci_xen_init(void); extern int __init pci_xen_hvm_init(void); +extern int __init pci_xen_pvh_init(void); #define pci_xen 1 #else #define pci_xen 0 @@ -13,6 +14,10 @@ static inline int pci_xen_hvm_init(void) { return -1; } +static inline int pci_xen_pvh_init(void) +{ + return -1; +} #endif #ifdef CONFIG_XEN_PV_DOM0 int __init pci_xen_initial_domain(void); diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 85a3ce2a3666..72c73458c083 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -749,7 +749,7 @@ static int acpi_register_gsi_pic(struct device *dev, u32 gsi, } #ifdef CONFIG_X86_LOCAL_APIC -static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi, +int acpi_register_gsi_ioapic(struct device *dev, u32 gsi, int trigger, int polarity) { int irq = gsi; diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c index 652cd53e77f6..f056ab5c0a06 100644 --- a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -114,6 +114,21 @@ static int acpi_register_gsi_xen_hvm(struct device *dev, u32 gsi, false /* no mapping of GSI to PIRQ */); } +static int acpi_register_gsi_xen_pvh(struct device *dev, u32 gsi, + int trigger, int polarity) +{ + int irq; + + irq = acpi_register_gsi_ioapic(dev, gsi, trigger, polarity); + if (irq < 0) + return irq; + + if (xen_pvh_add_gsi_irq_map(gsi, irq) == -EEXIST) + printk(KERN_INFO "Already map the GSI :%u and IRQ: %d\n", gsi, irq); + + return irq; +} + #ifdef CONFIG_XEN_PV_DOM0 static int xen_register_gsi(u32 gsi, int triggering, int polarity) { @@ -558,6 +573,12 @@ int __init 
pci_xen_hvm_init(void) return 0; } +int __init pci_xen_pvh_init(void) +{ + __acpi_register_gsi = acpi_register_gsi_xen_pvh; + return 0; +} + #ifdef CONFIG_XEN_PV_DOM0 int __init pci_xen_initial_domain(void) { diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c index 27553673e46b..80d4f7faac64 100644 --- a/drivers/xen/events/events_base.c +++ b/drivers/xen/events/events_base.c @@ -953,6 +953,43 @@ int xen_irq_from_gsi(unsigned gsi) } EXPORT_SYMBOL_GPL(xen_irq_from_gsi); +int xen_gsi_fro
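The commit message above argues that irq numbers are handed out in request order while gsi numbers are not, so the two diverge and a recorded gsi/irq map is needed for the reverse lookup. A minimal sketch of that argument follows; the `toy_*` names, the table size and the first dynamic irq number are assumptions chosen for illustration, not kernel code.

```c
#include <assert.h>

#define TOY_MAX_GSI           64
#define TOY_FIRST_DYNAMIC_IRQ 24   /* assumed start of the dynamic range */

/* Hypothetical record of which irq was handed out for each gsi. */
struct toy_irq_map {
    int next_irq;
    int gsi_to_irq[TOY_MAX_GSI];   /* -1 means "not registered yet" */
};

static void toy_irq_map_init(struct toy_irq_map *m)
{
    m->next_irq = TOY_FIRST_DYNAMIC_IRQ;
    for (int i = 0; i < TOY_MAX_GSI; i++)
        m->gsi_to_irq[i] = -1;
}

/*
 * First come, first served: the irq grows with every new request, no matter
 * which gsi asked for it. Registering the same gsi twice returns the same irq.
 */
static int toy_register_gsi(struct toy_irq_map *m, int gsi)
{
    if (gsi < 0 || gsi >= TOY_MAX_GSI)
        return -1;
    if (m->gsi_to_irq[gsi] < 0)
        m->gsi_to_irq[gsi] = m->next_irq++;
    return m->gsi_to_irq[gsi];
}

/* Reverse lookup, the translation userspace needs before map_pirq. */
static int toy_gsi_from_irq(const struct toy_irq_map *m, int irq)
{
    if (irq < 0)
        return -1;
    for (int gsi = 0; gsi < TOY_MAX_GSI; gsi++)
        if (m->gsi_to_irq[gsi] == irq)
            return gsi;
    return -1;
}
```

With gsi 38 registered before gsi 28, the model hands out irq 24 and irq 25 respectively, so reading the irq value and treating it as a gsi would pick the wrong interrupt line; only the recorded map recovers 38 and 28.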
[KERNEL PATCH v6 1/3] xen/pci: Add xen_reset_device_state function
When a device has been reset on the dom0 side, vPCI on the Xen side is not notified, so the state cached in vPCI is out of date with respect to the real device state. To solve that problem, add a new function that clears all vPCI device state when the device is reset on the dom0 side, and call it in pcistub_init_device. This is needed because using "pci-assignable-add" to assign a passthrough device in Xen resets the device, the vPCI state becomes out of date, and the device then fails to restore its bar state.

Co-developed-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stefano Stabellini
---
 drivers/xen/pci.c                  | 12 ++++++++++++
 drivers/xen/xen-pciback/pci_stub.c | 18 +++++++++++++++---
 include/xen/interface/physdev.h    |  7 +++++++
 include/xen/pci.h                  |  6 ++++++
 4 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 72d4e3f193af..e9b30bc09139 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -177,6 +177,18 @@ static int xen_remove_device(struct device *dev)
 	return r;
 }
 
+int xen_reset_device_state(const struct pci_dev *dev)
+{
+	struct physdev_pci_device device = {
+		.seg = pci_domain_nr(dev->bus),
+		.bus = dev->bus->number,
+		.devfn = dev->devfn
+	};
+
+	return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
+}
+EXPORT_SYMBOL_GPL(xen_reset_device_state);
+
 static int xen_pci_notifier(struct notifier_block *nb,
 			    unsigned long action, void *data)
 {
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e34b623e4b41..46c40ec8a18e 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -89,6 +89,16 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
 	return psdev;
 }
 
+static int pcistub_reset_device_state(struct pci_dev *dev)
+{
+	__pci_reset_function_locked(dev);
+
+	if (!xen_pv_domain())
+		return xen_reset_device_state(dev);
+	else
+		return 0;
+}
+
 /* Don't call this directly as it's called by pcistub_device_put */
 static void pcistub_device_release(struct kref *kref)
 {
@@ -107,7 +117,7 @@ static void pcistub_device_release(struct kref *kref)
 	/* Call the reset function which does not take lock as this
 	 * is called from "unbind" which takes a device_lock mutex.
 	 */
-	__pci_reset_function_locked(dev);
+	pcistub_reset_device_state(dev);
 	if (dev_data &&
 	    pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
 		dev_info(&dev->dev, "Could not reload PCI state\n");
@@ -284,7 +294,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 	 * (so it's ready for the next domain)
 	 */
 	device_lock_assert(&dev->dev);
-	__pci_reset_function_locked(dev);
+	pcistub_reset_device_state(dev);
 
 	dev_data = pci_get_drvdata(dev);
 	ret = pci_load_saved_state(dev, dev_data->pci_saved_state);
@@ -420,7 +430,9 @@ static int pcistub_init_device(struct pci_dev *dev)
 		dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
 	else {
 		dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
-		__pci_reset_function_locked(dev);
+		err = pcistub_reset_device_state(dev);
+		if (err)
+			goto config_release;
 		pci_restore_state(dev);
 	}
 	/* Now disable the device (this also ensures some private device
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index a237af867873..8609770e28f5 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -256,6 +256,13 @@ struct physdev_pci_device_add {
  */
 #define PHYSDEVOP_prepare_msix	30
 #define PHYSDEVOP_release_msix	31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated. Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset	32
+
 struct physdev_pci_device {
     /* IN */
     uint16_t seg;
diff --git a/include/xen/pci.h b/include/xen/pci.h
index b8337cf85fd1..b2e2e856efd6 100644
--- a/include/xen/pci.h
+++ b/include/xen/pci.h
@@ -4,10 +4,16 @@
 #define __XEN_PCI_H__
 
 #if defined(CONFIG_XEN_DOM0)
+int xen_reset_device_state(const struct pci_dev *dev);
 int xen_find_device_domain_owner(struct pci_dev *dev);
 int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
 int xen_unregister_device_domain_owner(struct pci_dev *dev);
 #else
+static inline int xen_reset_device_state(const struct pci_dev *dev)
+{
+	return -1;
+}
+
 static inline int xen_find_device_domain_owner(struct pci_dev *dev)
 {
 	return -1;
-- 
2.34.1
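xen_reset_device_state() identifies the device to Xen by segment, bus and devfn. A minimal sketch of how those three fields relate to the familiar seg:bus:slot.func notation follows, using the standard PCI encoding (slot in the upper five bits of devfn, function in the lower three); the `toy_*` names are hypothetical helpers, not kernel or Xen macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the three fields passed in physdev_pci_device. */
struct toy_pci_addr {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
};

/* devfn packs the 5-bit slot number above the 3-bit function number. */
static uint8_t toy_devfn(unsigned int slot, unsigned int func)
{
    return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}

static unsigned int toy_slot(uint8_t devfn)
{
    return (devfn >> 3) & 0x1f;
}

static unsigned int toy_func(uint8_t devfn)
{
    return devfn & 0x07;
}

/* 32-bit SBDF: segment in bits 31..16, bus in bits 15..8, devfn in 7..0. */
static uint32_t toy_sbdf(struct toy_pci_addr a)
{
    return ((uint32_t)a.seg << 16) | ((uint32_t)a.bus << 8) | a.devfn;
}
```

For example, the device 0000:03:00.1 has devfn 0x01 and, under this encoding, the packed value 0x00000301.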
[KERNEL PATCH v5 1/3] xen/pci: Add xen_reset_device_state function
When a device has been reset on the dom0 side, the vpci on the Xen side won't get a notification, so the cached state in vpci is out of date with respect to the real device state. To solve that problem, add a new function to clear all vpci device state when the device is reset on the dom0 side.

Also call that function in pcistub_init_device, because using "pci-assignable-add" to assign a passthrough device in Xen resets the passthrough device, the vpci state becomes out of date, and the device then fails to restore its BAR state.

Co-developed-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stefano Stabellini
---
 drivers/xen/pci.c | 12
 drivers/xen/xen-pciback/pci_stub.c | 18 +++---
 include/xen/interface/physdev.h| 7 +++
 include/xen/pci.h | 6 ++
 4 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 72d4e3f193af..e9b30bc09139 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -177,6 +177,18 @@ static int xen_remove_device(struct device *dev)
 	return r;
 }
 
+int xen_reset_device_state(const struct pci_dev *dev)
+{
+	struct physdev_pci_device device = {
+		.seg = pci_domain_nr(dev->bus),
+		.bus = dev->bus->number,
+		.devfn = dev->devfn
+	};
+
+	return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
+}
+EXPORT_SYMBOL_GPL(xen_reset_device_state);
+
 static int xen_pci_notifier(struct notifier_block *nb,
 			    unsigned long action, void *data)
 {
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e34b623e4b41..46c40ec8a18e 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -89,6 +89,16 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
 	return psdev;
 }
 
+static int pcistub_reset_device_state(struct pci_dev *dev)
+{
+	__pci_reset_function_locked(dev);
+
+	if (!xen_pv_domain())
+		return xen_reset_device_state(dev);
+	else
+		return 0;
+}
+
 /* Don't call this directly as it's called by pcistub_device_put */
 static void pcistub_device_release(struct kref *kref)
 {
@@ -107,7 +117,7 @@ static void pcistub_device_release(struct kref *kref)
 	/* Call the reset function which does not take lock as this
 	 * is called from "unbind" which takes a device_lock mutex.
 	 */
-	__pci_reset_function_locked(dev);
+	pcistub_reset_device_state(dev);
 	if (dev_data &&
 	    pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
 		dev_info(&dev->dev, "Could not reload PCI state\n");
@@ -284,7 +294,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 	 * (so it's ready for the next domain)
 	 */
 	device_lock_assert(&dev->dev);
-	__pci_reset_function_locked(dev);
+	pcistub_reset_device_state(dev);
 
 	dev_data = pci_get_drvdata(dev);
 	ret = pci_load_saved_state(dev, dev_data->pci_saved_state);
@@ -420,7 +430,9 @@ static int pcistub_init_device(struct pci_dev *dev)
 		dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
 	else {
 		dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
-		__pci_reset_function_locked(dev);
+		err = pcistub_reset_device_state(dev);
+		if (err)
+			goto config_release;
 		pci_restore_state(dev);
 	}
 	/* Now disable the device (this also ensures some private device
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index a237af867873..8609770e28f5 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -256,6 +256,13 @@ struct physdev_pci_device_add {
  */
 #define PHYSDEVOP_prepare_msix 30
 #define PHYSDEVOP_release_msix 31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated. Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 	/* IN */
 	uint16_t seg;
diff --git a/include/xen/pci.h b/include/xen/pci.h
index b8337cf85fd1..b2e2e856efd6 100644
--- a/include/xen/pci.h
+++ b/include/xen/pci.h
@@ -4,10 +4,16 @@
 #define __XEN_PCI_H__
 
 #if defined(CONFIG_XEN_DOM0)
+int xen_reset_device_state(const struct pci_dev *dev);
 int xen_find_device_domain_owner(struct pci_dev *dev);
 int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
 int xen_unregister_device_domain_owner(struct pci_dev *dev);
 #else
+static inline int xen_reset_device_state(const struct pci_dev *dev)
+{
+	return -1;
+}
+
 static inline int xen_find_device_domain_owner(struct pci_dev *dev)
 {
 	return -1;
-- 
2.34.1
[RFC KERNEL PATCH v5 3/3] PCI/sysfs: Add gsi sysfs for pci_dev
Some scenarios need a gsi sysfs node. For example, when Xen passes through a device to a domU, it uses the gsi to map a pirq, but currently userspace can't get the gsi number. So, add a gsi sysfs node for that and for other potential scenarios.

Co-developed-by: Huang Rui
Signed-off-by: Jiqian Chen
---
RFC: No feasible suggestions were obtained in the discussion of v4. Discussion is still needed on where/how to expose the gsi. Looking forward to more comments and suggestions from the PCI/ACPI maintainers.
---
 drivers/acpi/pci_irq.c | 1 +
 drivers/pci/pci-sysfs.c | 11 +++
 include/linux/pci.h | 2 ++
 3 files changed, 14 insertions(+)

diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index 630fe0a34bc6..739a58755df2 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
 		kfree(entry);
 		return 0;
 	}
+	dev->gsi = gsi;
 
 	rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
 	if (rc < 0) {
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 2321fdfefd7d..c51df88d079e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -71,6 +71,16 @@ static ssize_t irq_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(irq);
 
+static ssize_t gsi_show(struct device *dev,
+			struct device_attribute *attr,
+			char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	return sysfs_emit(buf, "%u\n", pdev->gsi);
+}
+static DEVICE_ATTR_RO(gsi);
+
 static ssize_t broken_parity_status_show(struct device *dev,
 					 struct device_attribute *attr,
 					 char *buf)
@@ -596,6 +606,7 @@ static struct attribute *pci_dev_attrs[] = {
 	&dev_attr_revision.attr,
 	&dev_attr_class.attr,
 	&dev_attr_irq.attr,
+	&dev_attr_gsi.attr,
 	&dev_attr_local_cpus.attr,
 	&dev_attr_local_cpulist.attr,
 	&dev_attr_modalias.attr,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7ab0d13672da..457043cfdfce 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -529,6 +529,8 @@ struct pci_dev {
 
 	/* These methods index pci_reset_fn_methods[] */
 	u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
+
+	unsigned int	gsi;
 };
 
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
-- 
2.34.1
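For illustration only, userspace could consume such a node roughly as follows. This is a minimal sketch, assuming the node holds a single unsigned decimal value (as the "%u\n" format in gsi_show suggests); the helper name read_uint_attr is hypothetical and is not part of this series.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one unsigned decimal value from an
 * attribute file such as the "gsi" node in a PCI device's sysfs
 * directory. Returns 0 on success, -1 if the file cannot be opened
 * or does not start with a number.
 */
int read_uint_attr(const char *path, unsigned int *value)
{
    FILE *f;
    int ok;

    assert(path != NULL && value != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    ok = (fscanf(f, "%u", value) == 1);
    fclose(f);

    return ok ? 0 : -1;
}
```

A caller would pass the full path of the device's gsi node and treat -1 as "node missing or unreadable".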
[RFC KERNEL PATCH v5 2/3] xen/pvh: Setup gsi for passthrough device
In PVH dom0, the gsis don't get registered, but the gsi of a passthrough device must be configured for it to be able to be mapped into a domU. When assign a device to passthrough, proactively setup the gsi of the device during that process. Co-developed-by: Huang Rui Signed-off-by: Jiqian Chen Reviewed-by: Stefano Stabellini --- RFC: This patch change function acpi_pci_irq_lookup from a static function to non-static, need ACPI Maintainer to give some comments. --- arch/x86/xen/enlighten_pvh.c | 92 ++ drivers/acpi/pci_irq.c | 2 +- drivers/xen/xen-pciback/pci_stub.c | 8 +++ include/linux/acpi.h | 1 + include/xen/acpi.h | 6 ++ 5 files changed, 108 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index c28f073c1df5..12be665b27d8 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -2,6 +2,7 @@ #include #include #include +#include #include @@ -26,6 +27,97 @@ bool __ro_after_init xen_pvh; EXPORT_SYMBOL_GPL(xen_pvh); +typedef struct gsi_info { + int gsi; + int trigger; + int polarity; +} gsi_info_t; + +struct acpi_prt_entry { + struct acpi_pci_id id; + u8 pin; + acpi_handle link; + u32 index; /* GSI, or link _CRS index */ +}; + +static int xen_pvh_get_gsi_info(struct pci_dev *dev, + gsi_info_t *gsi_info) +{ + int gsi; + u8 pin; + struct acpi_prt_entry *entry; + int trigger = ACPI_LEVEL_SENSITIVE; + int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ? 
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW; + + if (!dev || !gsi_info) + return -EINVAL; + + pin = dev->pin; + if (!pin) + return -EINVAL; + + entry = acpi_pci_irq_lookup(dev, pin); + if (entry) { + if (entry->link) + gsi = acpi_pci_link_allocate_irq(entry->link, +entry->index, +&trigger, &polarity, +NULL); + else + gsi = entry->index; + } else + gsi = -1; + + if (gsi < 0) + return -EINVAL; + + gsi_info->gsi = gsi; + gsi_info->trigger = trigger; + gsi_info->polarity = polarity; + + return 0; +} + +static int xen_pvh_setup_gsi(gsi_info_t *gsi_info) +{ + struct physdev_setup_gsi setup_gsi; + + if (!gsi_info) + return -EINVAL; + + setup_gsi.gsi = gsi_info->gsi; + setup_gsi.triggering = (gsi_info->trigger == ACPI_EDGE_SENSITIVE ? 0 : 1); + setup_gsi.polarity = (gsi_info->polarity == ACPI_ACTIVE_HIGH ? 0 : 1); + + return HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi); +} + +int xen_pvh_passthrough_gsi(struct pci_dev *dev) +{ + int ret; + gsi_info_t gsi_info; + + if (!dev) + return -EINVAL; + + ret = xen_pvh_get_gsi_info(dev, &gsi_info); + if (ret) { + xen_raw_printk("Fail to get gsi info!\n"); + return ret; + } + + ret = xen_pvh_setup_gsi(&gsi_info); + if (ret == -EEXIST) { + xen_raw_printk("Already setup the GSI :%d\n", gsi_info.gsi); + ret = 0; + } else if (ret) + xen_raw_printk("Fail to setup GSI (%d)!\n", gsi_info.gsi); + + return ret; +} +EXPORT_SYMBOL_GPL(xen_pvh_passthrough_gsi); + void __init xen_pvh_init(struct boot_params *boot_params) { u32 msr; diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c index ff30ceca2203..630fe0a34bc6 100644 --- a/drivers/acpi/pci_irq.c +++ b/drivers/acpi/pci_irq.c @@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev, } #endif /* CONFIG_X86_IO_APIC */ -static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin) +struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin) { struct acpi_prt_entry *entry = NULL; struct pci_dev *bridge; diff --git 
a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 46c40ec8a18e..22d4380d2b04 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -435,6 +436,13 @@ static int pcistub_init_device(struct pci_dev *dev) goto config_release; pci_restore_state(dev); } + + if (xen_initial_domain() && xen_pvh_domain()) { +
[RFC KERNEL PATCH v5 0/3] Support device passthrough when dom0 is PVH on Xen
se problems, the second kernel patch (xen/pvh: Unmask irq for passthrough device in PVH dom0) calls unmask_irq() when we assign a device for passthrough, so that passthrough devices get a gsi mapping on PVH dom0 and the gsi can be registered. This v2 patch differs from v1 (kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/ and xen https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.hu...@amd.com/): v1 performed "map_pirq" and "register_gsi" on all PCI devices on PVH dom0, which is unnecessary and may cause multiple registrations.

4. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in function xen_pt_realize() to map a passthrough device’s gsi to a pirq, but the call fails.
Reason: According to the implementation of xc_physdev_map_pirq(), it needs a gsi rather than an irq, but qemu passes it an irq and treats that irq as a gsi. The value is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in function xen_host_pci_device_get(). The gsi number is not actually equal to the irq. On PVH dom0, when an irq is allocated for a gsi in function acpi_register_gsi_ioapic(), the allocation is dynamic and first come, first served. If you debug the kernel code (see function __irq_alloc_descs), you will find that irq numbers are allocated in increasing order, but gsis are not requested in numerical order: gsi 38 may be requested before gsi 28, so gsi 38 gets a smaller irq number than gsi 28, and then gsi != irq.
Solution: We can record the relation between gsi and irq, so that when userspace (qemu) wants to use the gsi, we can do a translation. The third kernel patch (xen/privcmd: Add new syscall to get gsi from irq) records all the relations in acpi_register_gsi_xen_pvh() when dom0 initializes PCI devices, and provides a syscall for userspace to get the gsi from the irq. The third xen patch (tools: Add new function to get gsi from irq) adds a new function xc_physdev_gsi_from_irq() to call the new syscall added on the kernel side. Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() will succeed. This v2 patch is the same as v1 (kernel https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/ and xen https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/).

About the v2 patch of qemu: it just changes an included header file; the rest is similar to v1 (qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/) and just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (3):
  xen/pci: Add xen_reset_device_state function
  xen/pvh: Setup gsi for passthrough device
  PCI/sysfs: Add gsi sysfs for pci_dev

 arch/x86/xen/enlighten_pvh.c | 92 ++
 drivers/acpi/pci_irq.c | 3 +-
 drivers/pci/pci-sysfs.c| 11
 drivers/xen/pci.c | 12
 drivers/xen/xen-pciback/pci_stub.c | 26 -
 include/linux/acpi.h | 1 +
 include/linux/pci.h| 2 +
 include/xen/acpi.h | 6 ++
 include/xen/interface/physdev.h| 7 +++
 include/xen/pci.h | 6 ++
 10 files changed, 162 insertions(+), 4 deletions(-)

-- 
2.34.1
[RFC XEN PATCH v6 4/5] libxl: Use gsi instead of irq for mapping pirq
PVH dom0 uses the Linux local interrupt mechanism: when it allocates an irq for a gsi, the allocation is dynamic and first come, first served. Irq numbers are allocated in increasing order, but gsis are not requested in numerical order (gsi 38 may be requested before gsi 28), so the irq number is not equal to the gsi number.

When passing through a device, xl wants to use the gsi to map a pirq, see pci_add_dm_done->xc_physdev_map_pirq, but in the current code the gsi number is read from the file /sys/bus/pci/devices//irq, so the mapping fails.

So, use the real gsi number read from the gsi sysfs node.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stefano Stabellini
---
RFC: discussions ongoing on the Linux side where/how to expose the gsi
---
 tools/libs/light/libxl_pci.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..2cec83e0b734 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1478,8 +1478,14 @@ static void pci_add_dm_done(libxl__egc *egc,
     fclose(f);
     if (!pci_supp_legacy_irq())
         goto out_no_irq;
-    sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+    sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
                                 pci->bus, pci->dev, pci->func);
+    r = access(sysfs_path, F_OK);
+    if (r && errno == ENOENT) {
+        /* To compitable with old version of kernel, still need to use irq */
+        sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+                               pci->bus, pci->dev, pci->func);
+    }
     f = fopen(sysfs_path, "r");
     if (f == NULL) {
         LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
@@ -2229,9 +2235,15 @@ skip_bar:
     if (!pci_supp_legacy_irq())
         goto skip_legacy_irq;
 
-    sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+    sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
                            pci->bus, pci->dev, pci->func);
 
+    rc = access(sysfs_path, F_OK);
+    if (rc && errno == ENOENT) {
+        /* To compitable with old version of kernel, still need to use irq */
+        sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+                               pci->bus, pci->dev, pci->func);
+    }
     f = fopen(sysfs_path, "r");
     if (f == NULL) {
         LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
-- 
2.34.1
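The fallback used in both hunks (prefer the gsi node, use irq only when gsi does not exist) can be sketched in isolation. This is a standalone illustration, not libxl code: choose_irq_attr() and its return convention are hypothetical, and plain snprintf() stands in for GCSPRINTF.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: build "<dev_dir>/gsi" in buf; if that node does
 * not exist (ENOENT), fall back to "<dev_dir>/irq" as older kernels
 * only provide irq. Returns 1 when the gsi node was chosen, 0 when
 * falling back to irq, -1 if buf is too small.
 */
int choose_irq_attr(const char *dev_dir, char *buf, size_t len)
{
    int n;

    assert(dev_dir != NULL && buf != NULL);

    n = snprintf(buf, len, "%s/gsi", dev_dir);
    if (n < 0 || (size_t)n >= len)
        return -1;

    if (access(buf, F_OK) != 0 && errno == ENOENT) {
        /* No gsi node: keep using irq, as before the patch. */
        n = snprintf(buf, len, "%s/irq", dev_dir);
        if (n < 0 || (size_t)n >= len)
            return -1;
        return 0;
    }

    return 1;
}
```

Like the patch, the sketch only falls back on ENOENT; any other access() error leaves the gsi path in place, so the later fopen() reports the real failure.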
[RFC XEN PATCH v6 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
Some type of domain don't have PIRQ, like PVH, when passthrough a device to guest on PVH dom0, callstack pci_add_dm_done->XEN_DOMCTL_irq_permission will failed at domain_pirq_to_irq. So, add a new hypercall to grant/revoke gsi permission when dom0 is not PV or dom0 has not PIRQ flag. Signed-off-by: Huang Rui Signed-off-by: Jiqian Chen --- tools/include/xenctrl.h | 5 tools/libs/ctrl/xc_domain.c | 15 +++ tools/libs/light/libxl_pci.c | 52 +--- xen/arch/x86/domctl.c| 31 + xen/include/public/domctl.h | 9 +++ xen/xsm/flask/hooks.c| 1 + 6 files changed, 103 insertions(+), 10 deletions(-) diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 2ef8b4e05422..519c860a00d5 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch, uint32_t pirq, bool allow_access); +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + bool allow_access); + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c index f2d9d14b4d9f..8540e84fda93 100644 --- a/tools/libs/ctrl/xc_domain.c +++ b/tools/libs/ctrl/xc_domain.c @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch, return do_domctl(xch, &domctl); } +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + bool allow_access) +{ +struct xen_domctl domctl = { +.cmd = XEN_DOMCTL_gsi_permission, +.domain = domid, +.u.gsi_permission.gsi = gsi, +.u.gsi_permission.allow_access = allow_access, +}; + +return do_domctl(xch, &domctl); +} + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c index 2cec83e0b734..debf6ec6ddc7 100644 --- a/tools/libs/light/libxl_pci.c +++ b/tools/libs/light/libxl_pci.c @@ -1421,6 +1421,8 @@ static void pci_add_dm_done(libxl__egc *egc, 
uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED; uint32_t domainid = domid; bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid); +int gsi; +bool is_gsi = true; /* Convenience aliases */ bool starting = pas->starting; @@ -1485,6 +1487,7 @@ static void pci_add_dm_done(libxl__egc *egc, /* To compitable with old version of kernel, still need to use irq */ sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain, pci->bus, pci->dev, pci->func); +is_gsi = false; } f = fopen(sysfs_path, "r"); if (f == NULL) { @@ -1492,6 +1495,13 @@ static void pci_add_dm_done(libxl__egc *egc, goto out_no_irq; } if ((fscanf(f, "%u", &irq) == 1) && irq) { +/* + * If use gsi, save the value, because the value of irq + * will be changed by function xc_physdev_map_pirq + */ +if (is_gsi) { +gsi = irq; +} r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq); if (r < 0) { LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)", @@ -1500,13 +1510,25 @@ static void pci_add_dm_done(libxl__egc *egc, rc = ERROR_FAIL; goto out; } -r = xc_domain_irq_permission(ctx->xch, domid, irq, 1); -if (r < 0) { -LOGED(ERROR, domainid, - "xc_domain_irq_permission irq=%d (error=%d)", irq, r); -fclose(f); -rc = ERROR_FAIL; -goto out; +if (is_gsi) { +r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1); +if (r < 0 && r != -EOPNOTSUPP) { +LOGED(ERROR, domainid, + "xc_domain_gsi_permission gsi=%d (error=%d)", gsi, r); +fclose(f); +rc = ERROR_FAIL; +goto out; +} +} +if (!is_gsi || r == -EOPNOTSUPP) { +r = xc_domain_irq_permission(ctx->xch, domid, irq, 1); +if (r < 0) { +LOGED(ERROR, domainid, +"xc_domain_irq_permission irq=%d (error=%d)", irq, r); +fclose(f); +rc = ERROR_FAIL; +goto out; +} } } fclose(f); @@
[XEN PATCH v6 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
If Xen runs with a PVH dom0 and an HVM domU, a pirq is mapped for the HVM guest's passthrough device by using the gsi, see xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not allowed, because currd is the PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq. Also add a new check to prevent a self map when the caller has no PIRQ flag.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stefano Stabellini
---
 xen/arch/x86/hvm/hypercall.c | 2 ++
 xen/arch/x86/physdev.c | 24
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 6ad5b4d5f11f..493998b42ec5 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..1367abc61e54 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_map_pirq: {
         physdev_map_pirq_t map;
         struct msi_info msi;
+        struct domain *d;
 
         ret = -EFAULT;
         if ( copy_from_guest(&map, arg, 1) != 0 )
             break;
 
+        d = rcu_lock_domain_by_any_id(map.domid);
+        if ( d == NULL )
+            return -ESRCH;
+        /* If it is an HVM guest, check if it has PIRQs */
+        if ( !is_pv_domain(d) && !has_pirq(d) )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+        rcu_unlock_domain(d);
+
         switch ( map.type )
         {
         case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_unmap_pirq: {
         struct physdev_unmap_pirq unmap;
+        struct domain *d;
 
         ret = -EFAULT;
         if ( copy_from_guest(&unmap, arg, 1) != 0 )
             break;
 
+        d = rcu_lock_domain_by_any_id(unmap.domid);
+        if ( d == NULL )
+            return -ESRCH;
+        /* If it is an HVM guest, check if it has PIRQs */
+        if ( !is_pv_domain(d) && !has_pirq(d) )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+        rcu_unlock_domain(d);
+
         ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
         break;
     }
-- 
2.34.1
[RFC XEN PATCH v6 0/5] Support device passthrough when dom0 is PVH on Xen
om0 do PHYSDEVOP_map_pirq. This v2 patch is better than v1, which simply removed the has_pirq check (xen https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.hu...@amd.com/).

3. the gsi of a passthrough device is not unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of the passthrough device was not registered in PVH dom0
Problem:
3.1 The callback function pci_add_dm_done() is called when qemu configures a passthrough device for domU. It calls xc_domain_irq_permission()->pirq_access_permitted() to check whether the gsi has a corresponding mapping in dom0. It does not, so the call fails. See XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is PVH dom0 and the irq it returns is 0.
3.2 It's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH dom0, because the devices of PVH are using MSI(-X) interrupts. However, the IO-APIC pin must be configured for it to be able to be mapped into a domU.
Reason: After searching the code, I found that "map_pirq" and "register_gsi" are done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi (aka ioapic's pin) is unmasked in PVH dom0. So both problems come down to the gsi of a passthrough device not being unmasked.
Solution: To solve these problems, the second kernel patch (xen/pvh: Unmask irq for passthrough device in PVH dom0) calls unmask_irq() when we assign a device for passthrough, so that passthrough devices get a gsi mapping on PVH dom0 and the gsi can be registered. This v2 patch differs from v1 (kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/ and xen https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.hu...@amd.com/): v1 performed "map_pirq" and "register_gsi" on all PCI devices on PVH dom0, which is unnecessary and may cause multiple registrations.

4. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in function xen_pt_realize() to map a passthrough device’s gsi to a pirq, but the call fails.
Reason: According to the implementation of xc_physdev_map_pirq(), it needs a gsi rather than an irq, but qemu passes it an irq and treats that irq as a gsi. The value is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in function xen_host_pci_device_get(). The gsi number is not actually equal to the irq. On PVH dom0, when an irq is allocated for a gsi in function acpi_register_gsi_ioapic(), the allocation is dynamic and first come, first served. If you debug the kernel code (see function __irq_alloc_descs), you will find that irq numbers are allocated in increasing order, but gsis are not requested in numerical order: gsi 38 may be requested before gsi 28, so gsi 38 gets a smaller irq number than gsi 28, and then gsi != irq.
Solution: We can record the relation between gsi and irq, so that when userspace (qemu) wants to use the gsi, we can do a translation. The third kernel patch (xen/privcmd: Add new syscall to get gsi from irq) records all the relations in acpi_register_gsi_xen_pvh() when dom0 initializes PCI devices, and provides a syscall for userspace to get the gsi from the irq. The third xen patch (tools: Add new function to get gsi from irq) adds a new function xc_physdev_gsi_from_irq() to call the new syscall added on the kernel side. Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() will succeed. This v2 patch is the same as v1 (kernel https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/ and xen https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/).

About the v2 patch of qemu: it just changes an included header file; the rest is similar to v1 (qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/) and just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (5):
  xen/vpci: Clear all vpci status of device
  x86/pvh: Allow (un)map_pirq when dom0 is PVH
  x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  libxl: Use gsi instead of irq for mapping pirq
  domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

 tools/include/xenctrl.h | 5 +++
 tools/libs/ctrl/xc_domain.c | 15
 tools/libs/light/libxl_pci.c | 68 +---
 xen/arch/x86/domctl.c| 31
 xen/arch/x86/hvm/hypercall.c | 8 +
 xen/arch/x86/physdev.c | 24 +
 xen/drivers/pci/physdev.c| 36 +++
 xen/drivers/vpci/vpci.c | 10 ++
 xen/include/public/domctl.h | 9 +
 xen/include/public/physdev.h | 7
 xen/include/xen/vpci.h | 6
 xen/xsm/flask/hooks.c| 1 +
 12 files changed, 208 insertions(+), 12 deletions(-)

-- 
2.34.1
[RFC XEN PATCH v6 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
On PVH dom0, the gsis don't get registered, but the gsi of a passthrough device must be configured for it to be able to be mapped into an HVM domU. On the Linux kernel side, PHYSDEVOP_setup_gsi is called for passthrough devices to register the gsi when dom0 is PVH. So, add PHYSDEVOP_setup_gsi for that purpose.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
---
 xen/arch/x86/hvm/hypercall.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 493998b42ec5..7d4e41f66885 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_unmap_pirq:
         break;
 
+    case PHYSDEVOP_setup_gsi:
+        if ( !is_hardware_domain(currd) )
+            return -EOPNOTSUPP;
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
-- 
2.34.1
[XEN PATCH v6 1/5] xen/vpci: Clear all vpci status of device
When a device has been reset on the dom0 side, the vpci on the Xen side won't get a notification, so the cached state in vpci is out of date compared with the real device state. To solve that problem, add a new hypercall to clear all vpci device state. When the state of a device is reset on the dom0 side, dom0 can call this hypercall to notify vpci.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stewart Hildebrand
Reviewed-by: Stefano Stabellini
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 xen/drivers/pci/physdev.c| 36
 xen/drivers/vpci/vpci.c | 10 ++
 xen/include/public/physdev.h | 7 +++
 xen/include/xen/vpci.h | 6 ++
 5 files changed, 60 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index eeb73e1aa5d0..6ad5b4d5f11f 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
+    case PHYSDEVOP_pci_device_state_reset:
     case PHYSDEVOP_dbgp_op:
         if ( !is_hardware_domain(currd) )
             return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..73dc8f058b0e 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include
 #include
 #include
+#include
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case PHYSDEVOP_pci_device_state_reset: {
+        struct physdev_pci_device dev;
+        struct pci_dev *pdev;
+        pci_sbdf_t sbdf;
+
+        if ( !is_pci_passthrough_enabled() )
+            return -EOPNOTSUPP;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&dev, arg, 1) != 0 )
+            break;
+        sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
+
+        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+        if ( ret )
+            break;
+
+        pcidevs_lock();
+        pdev = pci_get_pdev(NULL, sbdf);
+        if ( !pdev )
+        {
+            pcidevs_unlock();
+            ret = -ENODEV;
+            break;
+        }
+
+        write_lock(&pdev->domain->pci_lock);
+        ret = vpci_reset_device_state(pdev);
+        write_unlock(&pdev->domain->pci_lock);
+        pcidevs_unlock();
+        if ( ret )
+            printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 260b72875ee1..310700c1e775 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -117,6 +117,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
     return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+    ASSERT(pcidevs_locked());
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+    vpci_deassign_device(pdev);
+    return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..f5bab1f29779 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix 30
 #define PHYSDEVOP_release_msix 31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated. Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
     /* IN */
     uint16_t seg;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index e89c571890b2..ea64d94e818b 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,6 +30,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -266,6 +267,11 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
 
 static inline void vpci_deassign_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+    return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1
[RFC QEMU PATCH v5 1/1] xen: Use gsi instead of irq for mapping pirq
On a PVH dom0, Linux uses its local interrupt mechanism, so the irq for a gsi is allocated dynamically on a first-come, first-served basis. Irq numbers are handed out in ascending order, but gsis are not requested in ascending order (gsi 38 may be requested before gsi 28), so the irq number is not necessarily equal to the gsi number.

When passing through a device, qemu wants to use the gsi to map the pirq (see xen_pt_realize->xc_physdev_map_pirq), but the current code reads the number from /sys/bus/pci/devices//irq, so the mapping fails.

Add gsi to XenHostPCIDevice and use the gsi number read from the gsi sysfs node if it exists.

Signed-off-by: Huang Rui
Signed-off-by: Jiqian Chen
Reviewed-by: Stefano Stabellini
---
RFC: discussions ongoing on the Linux side where/how to expose the gsi
---
 hw/xen/xen-host-pci-device.c | 7 +++
 hw/xen/xen-host-pci-device.h | 1 +
 hw/xen/xen_pt.c | 6 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716a2..5be3279aa25b 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -370,6 +370,13 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
     }
     d->irq = v;

+    xen_host_pci_get_dec_value(d, "gsi", &v, errp);
+    if (*errp) {
+        d->gsi = -1;
+    } else {
+        d->gsi = v;
+    }
+
     xen_host_pci_get_hex_value(d, "class", &v, errp);
     if (*errp) {
         goto error;
diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h
index 4d8d34ecb024..74c552bb5548 100644
--- a/hw/xen/xen-host-pci-device.h
+++ b/hw/xen/xen-host-pci-device.h
@@ -27,6 +27,7 @@ typedef struct XenHostPCIDevice {
     uint16_t device_id;
     uint32_t class_code;
     int irq;
+    int gsi;

     XenHostPCIIORegion io_regions[PCI_NUM_REGIONS - 1];
     XenHostPCIIORegion rom;
diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 3635d1b39f79..d34a7a8764ab 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -840,7 +840,11 @@ static void xen_pt_realize(PCIDevice *d, Error **errp)
         goto out;
     }

-    machine_irq = s->real_device.irq;
+    if (s->real_device.gsi < 0) {
+        machine_irq = s->real_device.irq;
+    } else {
+        machine_irq = s->real_device.gsi;
+    }
     if (machine_irq == 0) {
         XEN_PT_LOG(d, "machine irq is 0\n");
         cmd |= PCI_COMMAND_INTX_DISABLE;
--
2.34.1
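For readers skimming the xen_pt.c hunk, the selection rule it introduces can be isolated as a small sketch. The struct and function names here (host_dev_irq_info, select_machine_irq) are hypothetical and are not part of QEMU; only the "gsi < 0 means fall back to irq" rule comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two XenHostPCIDevice fields the patch reads. */
struct host_dev_irq_info {
    int irq; /* value read from the sysfs "irq" node */
    int gsi; /* value read from the sysfs "gsi" node, or -1 if it was absent */
};

/* Prefer the gsi when it was read successfully; otherwise fall back to irq. */
int select_machine_irq(const struct host_dev_irq_info *info)
{
    assert(info != NULL);
    return info->gsi < 0 ? info->irq : info->gsi;
}
```

On a kernel without the gsi node the result is simply the old behaviour (the irq value), which is why the change is backward compatible.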
[QEMU PATCH v5 0/1] Support device passthrough when dom0 is PVH on Xen
Hi All,

This is the v5 series to support passthrough on Xen when dom0 is PVH.

v4->v5 changes:
* Add review by Stefano

v3->v4 changes:
* Add gsi into struct XenHostPCIDevice and use the gsi number read from the gsi sysfs node if it exists; if there is no gsi sysfs node, still use irq.

v2->v3 changes:
* Due to changes in the implementation of the second patch on the kernel side (which adds a new sysfs node for gsi instead of a new syscall), read the gsi number from the gsi sysfs node.

Below is the description from the v2 cover letter:

This patch is the v2 of the implementation of passthrough when dom0 is PVH on Xen.

Issues we encountered:

1. failed to map pirq for gsi

Problem: qemu calls xc_physdev_map_pirq() in function xen_pt_realize() to map a passthrough device's gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a gsi rather than an irq, but qemu passes it an irq and treats that irq as a gsi. The value is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in function xen_host_pci_device_get(). The gsi number, however, is not necessarily equal to the irq number. On a PVH dom0, when an irq is allocated for a gsi in function acpi_register_gsi_ioapic(), the allocation is dynamic and first come, first served. If you debug the kernel code (see function __irq_alloc_descs), you will find that irq numbers are allocated in ascending order, but gsis are not requested in ascending order: gsi 38 may be requested before gsi 28, so gsi 38 gets a smaller irq number than gsi 28, and then gsi != irq.

Solution: we can record the relation between gsi and irq, and then translate when userspace (qemu) wants to use the gsi. The third kernel patch (xen/privcmd: Add new syscall to get gsi from irq) records all the relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and provides a syscall for userspace to get the gsi from the irq. The third xen patch (tools: Add new function to get gsi from irq) adds a new function xc_physdev_gsi_from_irq() that calls the new syscall added on the kernel side. Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() will succeed.

This v2 on the qemu side is the same as the v1 (qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/); it just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (1):
  xen: Use gsi instead of irq for mapping pirq

 hw/xen/xen-host-pci-device.c | 7 +++
 hw/xen/xen-host-pci-device.h | 1 +
 hw/xen/xen_pt.c | 6 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

--
2.34.1
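The allocation-order problem and the "record the relation, then translate" solution described in this cover letter can be illustrated with a toy model. This is only a sketch under stated assumptions: the table type, the function names, and the starting irq number 24 are all hypothetical and do not correspond to kernel code; the real kernel allocates irq descriptors through its own irq subsystem.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GSI_RECORDS 64 /* arbitrary capacity for this sketch */

/* Toy record of which irq was handed out for which gsi. */
struct gsi_irq_table {
    unsigned int gsi[MAX_GSI_RECORDS];
    unsigned int irq[MAX_GSI_RECORDS];
    size_t count;
    unsigned int next_irq; /* irqs are handed out in ascending order */
};

/* Allocate the next free irq for a gsi and remember the pair. */
int register_gsi(struct gsi_irq_table *t, unsigned int gsi)
{
    assert(t != NULL);
    if (t->count >= MAX_GSI_RECORDS)
        return -1;
    t->gsi[t->count] = gsi;
    t->irq[t->count] = t->next_irq;
    t->count++;
    return (int)t->next_irq++;
}

/* Translate an irq back to the gsi it was allocated for, or -1 if unknown. */
int gsi_from_irq(const struct gsi_irq_table *t, unsigned int irq)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->irq[i] == irq)
            return (int)t->gsi[i];
    }
    return -1;
}
```

If gsi 38 is registered before gsi 28 in a table whose next_irq starts at 24, gsi 38 receives irq 24 and gsi 28 receives irq 25, so neither irq equals its gsi and passing the irq where a gsi is expected maps the wrong interrupt.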
[RFC XEN PATCH v5 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
Some type of domain don't have PIRQ, like PVH, when passthrough a device to guest on PVH dom0, callstack pci_add_dm_done->XEN_DOMCTL_irq_permission will failed at domain_pirq_to_irq. So, add a new hypercall to grant/revoke gsi permission when dom0 is not PV or dom0 has not PIRQ flag. Co-developed-by: Huang Rui Signed-off-by: Jiqian Chen --- tools/include/xenctrl.h | 5 + tools/libs/ctrl/xc_domain.c | 15 +++ tools/libs/light/libxl_pci.c | 16 ++-- xen/arch/x86/domctl.c| 31 +++ xen/include/public/domctl.h | 9 + xen/xsm/flask/hooks.c| 1 + 6 files changed, 75 insertions(+), 2 deletions(-) diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 2ef8b4e05422..519c860a00d5 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch, uint32_t pirq, bool allow_access); +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + bool allow_access); + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c index f2d9d14b4d9f..448ba2c59ae1 100644 --- a/tools/libs/ctrl/xc_domain.c +++ b/tools/libs/ctrl/xc_domain.c @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch, return do_domctl(xch, &domctl); } +int xc_domain_gsi_permission(xc_interface *xch, + uint32_t domid, + uint32_t gsi, + bool allow_access) +{ +struct xen_domctl domctl = {}; + +domctl.cmd = XEN_DOMCTL_gsi_permission; +domctl.domain = domid; +domctl.u.gsi_permission.gsi = gsi; +domctl.u.gsi_permission.allow_access = allow_access; + +return do_domctl(xch, &domctl); +} + int xc_domain_iomem_permission(xc_interface *xch, uint32_t domid, unsigned long first_mfn, diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c index a1c6e82631e9..4136a860a048 100644 --- a/tools/libs/light/libxl_pci.c +++ b/tools/libs/light/libxl_pci.c @@ -1421,6 +1421,8 @@ static void 
pci_add_dm_done(libxl__egc *egc, uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED; uint32_t domainid = domid; bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid); +int gsi; +bool has_gsi = true; /* Convenience aliases */ bool starting = pas->starting; @@ -1482,6 +1484,7 @@ static void pci_add_dm_done(libxl__egc *egc, pci->bus, pci->dev, pci->func); if ( access(sysfs_path, F_OK) != 0 ) { +has_gsi = false; if ( errno == ENOENT ) sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain, pci->bus, pci->dev, pci->func); @@ -1497,6 +1500,7 @@ static void pci_add_dm_done(libxl__egc *egc, goto out_no_irq; } if ((fscanf(f, "%u", &irq) == 1) && irq) { +gsi = irq; r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq); if (r < 0) { LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)", @@ -1505,7 +1509,10 @@ static void pci_add_dm_done(libxl__egc *egc, rc = ERROR_FAIL; goto out; } -r = xc_domain_irq_permission(ctx->xch, domid, irq, 1); +if ( has_gsi ) +r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1); +if ( !has_gsi || r == -EOPNOTSUPP ) +r = xc_domain_irq_permission(ctx->xch, domid, irq, 1); if (r < 0) { LOGED(ERROR, domainid, "xc_domain_irq_permission irq=%d (error=%d)", irq, r); @@ -2185,6 +2192,7 @@ static void pci_remove_detached(libxl__egc *egc, FILE *f; uint32_t domainid = prs->domid; bool isstubdom; +bool has_gsi = true; /* Convenience aliases */ libxl_device_pci *const pci = &prs->pci; @@ -2244,6 +2252,7 @@ skip_bar: pci->bus, pci->dev, pci->func); if ( access(sysfs_path, F_OK) != 0 ) { +has_gsi = false; if ( errno == ENOENT ) sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain, pci->bus, pci->dev, pci->func); @@ -2270,7 +2279,10 @@ skip_bar: */ LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq); } -rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0); +i
[RFC XEN PATCH v5 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
On a PVH dom0, gsis don't get registered, but the gsi of a passthrough device must be configured for it to be mappable into an hvm domU.
On the Linux kernel side, PHYSDEVOP_setup_gsi is called for passthrough devices to register the gsi when dom0 is PVH.
So, add PHYSDEVOP_setup_gsi for that purpose.

Co-developed-by: Huang Rui
Signed-off-by: Jiqian Chen
---
 xen/arch/x86/hvm/hypercall.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 493998b42ec5..46f51ee459f6 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -76,6 +76,12 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_unmap_pirq:
         break;

+    case PHYSDEVOP_setup_gsi:
+        if ( !is_hardware_domain(currd) )
+            return -EOPNOTSUPP;
+        ASSERT(!has_pirq(currd));
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
--
2.34.1
[RFC XEN PATCH v5 4/5] libxl: Use gsi instead of irq for mapping pirq
In PVH dom0, it uses the linux local interrupt mechanism, when it allocs irq for a gsi, it is dynamic, and follow the principle of applying first, distributing first. And the irq number is alloced from small to large, but the applying gsi number is not, may gsi 38 comes before gsi 28, that causes the irq number is not equal with the gsi number. And when passthrough a device, xl wants to use gsi to map pirq, see pci_add_dm_done->xc_physdev_map_pirq, but the gsi number is got from file /sys/bus/pci/devices//irq in current code, so it will fail when mapping. So, use real gsi number read from gsi sysfs. Co-developed-by: Huang Rui Signed-off-by: Jiqian Chen Reviewed-by: Stefano Stabellini --- tools/libs/light/libxl_pci.c | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c index 96cb4da0794e..a1c6e82631e9 100644 --- a/tools/libs/light/libxl_pci.c +++ b/tools/libs/light/libxl_pci.c @@ -1478,8 +1478,19 @@ static void pci_add_dm_done(libxl__egc *egc, fclose(f); if (!pci_supp_legacy_irq()) goto out_no_irq; -sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain, +sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain, pci->bus, pci->dev, pci->func); + +if ( access(sysfs_path, F_OK) != 0 ) { +if ( errno == ENOENT ) +sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain, +pci->bus, pci->dev, pci->func); +else { +LOGED(ERROR, domainid, "Can't access %s", sysfs_path); +goto out_no_irq; +} +} + f = fopen(sysfs_path, "r"); if (f == NULL) { LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path); @@ -2229,9 +2240,19 @@ skip_bar: if (!pci_supp_legacy_irq()) goto skip_legacy_irq; -sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain, +sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain, pci->bus, pci->dev, pci->func); +if ( access(sysfs_path, F_OK) != 0 ) { +if ( errno == ENOENT ) +sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", 
pci->domain, +pci->bus, pci->dev, pci->func); +else { +LOGED(ERROR, domid, "Can't access %s", sysfs_path); +goto skip_legacy_irq; +} +} + f = fopen(sysfs_path, "r"); if (f == NULL) { LOGED(ERROR, domid, "Couldn't open %s", sysfs_path); -- 2.34.1
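The sysfs fallback that both hunks of this libxl patch add follows one pattern: try the gsi node, fall back to the irq node only when the gsi node is missing, and treat any other access() error as fatal. A minimal standalone sketch of that pattern is below; pick_sysfs_node is a hypothetical helper name, not a libxl function, and the real code builds the paths with GCSPRINTF and logs through LOGED.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Return gsi_path if it exists, irq_path if the gsi node is simply absent
 * (ENOENT), or NULL on any other access() failure.
 */
const char *pick_sysfs_node(const char *gsi_path, const char *irq_path)
{
    assert(gsi_path != NULL && irq_path != NULL);

    if (access(gsi_path, F_OK) == 0)
        return gsi_path;

    /* Only a missing node justifies the fallback; other errors are reported. */
    return errno == ENOENT ? irq_path : NULL;
}
```

Distinguishing ENOENT from other errors matters here: a permission or I/O error on the gsi node should not silently degrade to the irq value, which on a PVH dom0 may not equal the gsi.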
[RFC XEN PATCH v5 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for a passthrough device by using gsi, see xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq is not allowed because currd is PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, it will fail at has_pirq check. So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow PHYSDEVOP_unmap_pirq for the failed path to unmap pirq. And add a new check to prevent self map when caller has no PIRQ flag. Co-developed-by: Huang Rui Signed-off-by: Jiqian Chen --- xen/arch/x86/hvm/hypercall.c | 2 ++ xen/arch/x86/physdev.c | 22 ++ 2 files changed, 24 insertions(+) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 6ad5b4d5f11f..493998b42ec5 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { case PHYSDEVOP_map_pirq: case PHYSDEVOP_unmap_pirq: +break; + case PHYSDEVOP_eoi: case PHYSDEVOP_irq_status_query: case PHYSDEVOP_get_free_pirq: diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c index 47c4da0af7e1..7f2422c2a483 100644 --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ -303,11 +303,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_map_pirq: { physdev_map_pirq_t map; struct msi_info msi; +struct domain *d; ret = -EFAULT; if ( copy_from_guest(&map, arg, 1) != 0 ) break; +d = rcu_lock_domain_by_any_id(map.domid); +if ( d == NULL ) +return -ESRCH; +if ( !is_pv_domain(d) && !has_pirq(d) ) +{ +rcu_unlock_domain(d); +return -EOPNOTSUPP; +} +rcu_unlock_domain(d); + switch ( map.type ) { case MAP_PIRQ_TYPE_MSI_SEG: @@ -341,11 +352,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_unmap_pirq: { struct physdev_unmap_pirq unmap; +struct domain *d; ret = -EFAULT; if ( copy_from_guest(&unmap, arg, 1) != 0 
) break; +d = rcu_lock_domain_by_any_id(unmap.domid); +if ( d == NULL ) +return -ESRCH; +if ( !is_pv_domain(d) && !has_pirq(d) ) +{ +rcu_unlock_domain(d); +return -EOPNOTSUPP; +} +rcu_unlock_domain(d); + ret = physdev_unmap_pirq(unmap.domid, unmap.pirq); break; } -- 2.34.1
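The check added to both the map and unmap paths in physdev.c reduces to one predicate on the target domain. The sketch below models it outside of Xen; struct domain_caps and check_pirq_target are hypothetical names, and the two booleans stand in for the results of is_pv_domain(d) and has_pirq(d). A NULL pointer models the case where the domain lookup fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the two properties the new check looks at. */
struct domain_caps {
    bool is_pv;    /* stand-in for is_pv_domain(d) */
    bool has_pirq; /* stand-in for has_pirq(d) */
};

/*
 * Mirror of the new gate: the target must exist, and it must be either a PV
 * domain or an HVM domain with the PIRQ flag. Returns 0 when the (un)map
 * request may proceed, or a negative errno value otherwise.
 */
int check_pirq_target(const struct domain_caps *d)
{
    if (d == NULL)
        return -ESRCH;
    if (!d->is_pv && !d->has_pirq)
        return -EOPNOTSUPP;
    return 0;
}
```

Under this rule a PVH dom0 (not PV, no PIRQ flag) cannot map a pirq to itself, while it can still issue the hypercall on behalf of an hvm domU that has the PIRQ flag, which is the case the commit message describes.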
[RFC XEN PATCH v5 1/5] xen/vpci: Clear all vpci status of device
When a device has been reset on dom0 side, the vpci on Xen side won't get notification, so the cached state in vpci is all out of date compare with the real device state. To solve that problem, add a new hypercall to clear all vpci device state. When the state of device is reset on dom0 side, dom0 can call this hypercall to notify vpci. Co-developed-by: Huang Rui Signed-off-by: Jiqian Chen --- xen/arch/x86/hvm/hypercall.c | 1 + xen/drivers/pci/physdev.c| 36 xen/drivers/vpci/vpci.c | 10 ++ xen/include/public/physdev.h | 7 +++ xen/include/xen/vpci.h | 6 ++ 5 files changed, 60 insertions(+) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index eeb73e1aa5d0..6ad5b4d5f11f 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_pci_mmcfg_reserved: case PHYSDEVOP_pci_device_add: case PHYSDEVOP_pci_device_remove: +case PHYSDEVOP_pci_device_state_reset: case PHYSDEVOP_dbgp_op: if ( !is_hardware_domain(currd) ) return -ENOSYS; diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c index 42db3e6d133c..73dc8f058b0e 100644 --- a/xen/drivers/pci/physdev.c +++ b/xen/drivers/pci/physdev.c @@ -2,6 +2,7 @@ #include #include #include +#include #ifndef COMPAT typedef long ret_t; @@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) break; } +case PHYSDEVOP_pci_device_state_reset: { +struct physdev_pci_device dev; +struct pci_dev *pdev; +pci_sbdf_t sbdf; + +if ( !is_pci_passthrough_enabled() ) +return -EOPNOTSUPP; + +ret = -EFAULT; +if ( copy_from_guest(&dev, arg, 1) != 0 ) +break; +sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn); + +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf); +if ( ret ) +break; + +pcidevs_lock(); +pdev = pci_get_pdev(NULL, sbdf); +if ( !pdev ) +{ +pcidevs_unlock(); +ret = -ENODEV; +break; +} + +write_lock(&pdev->domain->pci_lock); +ret = vpci_reset_device_state(pdev); 
+write_unlock(&pdev->domain->pci_lock); +pcidevs_unlock(); +if ( ret ) +printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf); +break; +} + default: ret = -ENOSYS; break; diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index 72ef277c4f8e..c6df2c6a9561 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -107,6 +107,16 @@ int vpci_add_handlers(struct pci_dev *pdev) return rc; } + +int vpci_reset_device_state(struct pci_dev *pdev) +{ +ASSERT(pcidevs_locked()); +ASSERT(rw_is_write_locked(&pdev->domain->pci_lock)); + +vpci_remove_device(pdev); +return vpci_add_handlers(pdev); +} + #endif /* __XEN__ */ static int vpci_register_cmp(const struct vpci_register *r1, diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h index f0c0d4727c0b..f5bab1f29779 100644 --- a/xen/include/public/physdev.h +++ b/xen/include/public/physdev.h @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t); */ #define PHYSDEVOP_prepare_msix 30 #define PHYSDEVOP_release_msix 31 +/* + * Notify the hypervisor that a PCI device has been reset, so that any + * internally cached state is regenerated. Should be called after any + * device reset performed by the hardware domain. + */ +#define PHYSDEVOP_pci_device_state_reset 32 + struct physdev_pci_device { /* IN */ uint16_t seg; diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h index d20c301a3db3..6ec83ce9ae13 100644 --- a/xen/include/xen/vpci.h +++ b/xen/include/xen/vpci.h @@ -30,6 +30,7 @@ int __must_check vpci_add_handlers(struct pci_dev *pdev); /* Remove all handlers and free vpci related structures. */ void vpci_remove_device(struct pci_dev *pdev); +int __must_check vpci_reset_device_state(struct pci_dev *pdev); /* Add/remove a register handler. 
*/ int __must_check vpci_add_register_mask(struct vpci *vpci, @@ -262,6 +263,11 @@ static inline int vpci_add_handlers(struct pci_dev *pdev) static inline void vpci_remove_device(struct pci_dev *pdev) { } +static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev) +{ +return 0; +} + static inline void vpci_dump_msi(void) { } static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, -- 2.34.1
[RFC XEN PATCH v5 0/5] Support device passthrough when dom0 is PVH on Xen
) to check if the gsi has corresponding mappings in dom0. But it didn’t, so failed. See XEN_DOMCTL_irq_permission->pirq_access_permitted, "current" is PVH dom0 and it return irq is 0. 3.2 it's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH dom0, because the devices of PVH are using MSI(-X) interrupts. However, the IO-APIC pin must be configured for it to be able to be mapped into a domU. Reason: After searching codes, I find "map_pirq" and "register_gsi" will be done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi(aka ioapic's pin) is unmasked in PVH dom0. So the two problems can be concluded to that the gsi of a passthrough device doesn't be unmasked. Solution: to solve these problems, the second patch of kernel(xen/pvh: Unmask irq for passthrough device in PVH dom0) call the unmask_irq() when we assign a device to be passthrough. So that passthrough devices can have the mapping of gsi on PVH dom0 and gsi can be registered. This v2 patch is different from the v1( kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/, kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/ and xen https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.hu...@amd.com/), v1 performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0, which is unnecessary and may cause multiple registration. 4. failed to map pirq for gsi Problem: qemu will call xc_physdev_map_pirq() to map a passthrough device’s gsi to pirq in function xen_pt_realize(). But failed. Reason: According to the implement of xc_physdev_map_pirq(), it needs gsi instead of irq, but qemu pass irq to it and treat irq as gsi, it is got from file /sys/bus/pci/devices/:xx:xx.x/irq in function xen_host_pci_device_get(). But actually the gsi number is not equal with irq. 
On PVH dom0, when it allocates irq for a gsi in function acpi_register_gsi_ioapic(), allocation is dynamic, and follow the principle of applying first, distributing first. And if you debug the kernel codes(see function __irq_alloc_descs), you will find the irq number is allocated from small to large by order, but the applying gsi number is not, gsi 38 may come before gsi 28, that causes gsi 38 get a smaller irq number than gsi 28, and then gsi != irq. Solution: we can record the relation between gsi and irq, then when userspace(qemu) want to use gsi, we can do a translation. The third patch of kernel(xen/privcmd: Add new syscall to get gsi from irq) records all the relations in acpi_register_gsi_xen_pvh() when dom0 initialize pci devices, and provide a syscall for userspace to get the gsi from irq. The third patch of xen(tools: Add new function to get gsi from irq) add a new function xc_physdev_gsi_from_irq() to call the new syscall added on kernel side. And then userspace can use that function to get gsi. Then xc_physdev_map_pirq() will success. This v2 patch is the same as v1( kernel https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/ and xen https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/) About the v2 patch of qemu, just change an included head file, other are similar to the v1 ( qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/), just call xc_physdev_gsi_from_irq() to get gsi from irq. 
Jiqian Chen (5): xen/vpci: Clear all vpci status of device x86/pvh: Allow (un)map_pirq when dom0 is PVH x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 libxl: Use gsi instead of irq for mapping pirq domctl: Add XEN_DOMCTL_gsi_permission to grant gsi tools/include/xenctrl.h | 5 + tools/libs/ctrl/xc_domain.c | 15 + tools/libs/light/libxl_pci.c | 41 xen/arch/x86/domctl.c| 31 +++ xen/arch/x86/hvm/hypercall.c | 9 xen/arch/x86/physdev.c | 22 +++ xen/drivers/pci/physdev.c| 36 +++ xen/drivers/vpci/vpci.c | 10 + xen/include/public/domctl.h | 9 xen/include/public/physdev.h | 7 ++ xen/include/xen/vpci.h | 6 ++ xen/xsm/flask/hooks.c| 1 + 12 files changed, 188 insertions(+), 4 deletions(-) -- 2.34.1
[RFC QEMU PATCH v4 1/1] xen: Use gsi instead of irq for mapping pirq
In PVH dom0, it uses the linux local interrupt mechanism, when it allocs irq for a gsi, it is dynamic, and follow the principle of applying first, distributing first. And the irq number is alloced from small to large, but the applying gsi number is not, may gsi 38 comes before gsi 28, that causes the irq number is not equal with the gsi number. And when passthrough a device, qemu wants to use gsi to map pirq, xen_pt_realize->xc_physdev_map_pirq, but the gsi number is got from file /sys/bus/pci/devices//irq in current code, so it will fail when mapping. Add gsi into XenHostPCIDevice and use gsi number that read from gsi sysfs if it exists. Co-developed-by: Huang Rui Signed-off-by: Jiqian Chen --- hw/xen/xen-host-pci-device.c | 7 +++ hw/xen/xen-host-pci-device.h | 1 + hw/xen/xen_pt.c | 6 +- 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c index 8c6e9a1716a2..5be3279aa25b 100644 --- a/hw/xen/xen-host-pci-device.c +++ b/hw/xen/xen-host-pci-device.c @@ -370,6 +370,13 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain, } d->irq = v; +xen_host_pci_get_dec_value(d, "gsi", &v, errp); +if (*errp) { +d->gsi = -1; +} else { +d->gsi = v; +} + xen_host_pci_get_hex_value(d, "class", &v, errp); if (*errp) { goto error; diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h index 4d8d34ecb024..74c552bb5548 100644 --- a/hw/xen/xen-host-pci-device.h +++ b/hw/xen/xen-host-pci-device.h @@ -27,6 +27,7 @@ typedef struct XenHostPCIDevice { uint16_t device_id; uint32_t class_code; int irq; +int gsi; XenHostPCIIORegion io_regions[PCI_NUM_REGIONS - 1]; XenHostPCIIORegion rom; diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c index 36e6f93c372f..d448f3a17306 100644 --- a/hw/xen/xen_pt.c +++ b/hw/xen/xen_pt.c @@ -839,7 +839,11 @@ static void xen_pt_realize(PCIDevice *d, Error **errp) goto out; } -machine_irq = s->real_device.irq; +if (s->real_device.gsi < 0) { +machine_irq = 
s->real_device.irq; +} else { +machine_irq = s->real_device.gsi; +} if (machine_irq == 0) { XEN_PT_LOG(d, "machine irq is 0\n"); cmd |= PCI_COMMAND_INTX_DISABLE; -- 2.34.1
[RFC QEMU PATCH v4 0/1] Support device passthrough when dom0 is PVH on Xen
Hi All, This is v4 series to support passthrough on Xen when dom0 is PVH. v3->v4 changes: * Add gsi into struct XenHostPCIDevice and use gsi number that read from gsi sysfs if it exists, if there is no gsi sysfs, still use irq. v2->v3 changes: * du to changes in the implementation of the second patch on kernel side(that adds a new sysfs for gsi instead of a new syscall), so read gsi number from the sysfs of gsi. Below is the description of v2 cover letter: This patch is the v2 of the implementation of passthrough when dom0 is PVH on Xen. Issues we encountered: 1. failed to map pirq for gsi Problem: qemu will call xc_physdev_map_pirq() to map a passthrough device’s gsi to pirq in function xen_pt_realize(). But failed. Reason: According to the implement of xc_physdev_map_pirq(), it needs gsi instead of irq, but qemu pass irq to it and treat irq as gsi, it is got from file /sys/bus/pci/devices/:xx:xx.x/irq in function xen_host_pci_device_get(). But actually the gsi number is not equal with irq. On PVH dom0, when it allocates irq for a gsi in function acpi_register_gsi_ioapic(), allocation is dynamic, and follow the principle of applying first, distributing first. And if you debug the kernel codes(see function __irq_alloc_descs), you will find the irq number is allocated from small to large by order, but the applying gsi number is not, gsi 38 may come before gsi 28, that causes gsi 38 get a smaller irq number than gsi 28, and then gsi != irq. Solution: we can record the relation between gsi and irq, then when userspace(qemu) want to use gsi, we can do a translation. The third patch of kernel(xen/privcmd: Add new syscall to get gsi from irq) records all the relations in acpi_register_gsi_xen_pvh() when dom0 initialize pci devices, and provide a syscall for userspace to get the gsi from irq. The third patch of xen(tools: Add new function to get gsi from irq) add a new function xc_physdev_gsi_from_irq() to call the new syscall added on kernel side. 
And then userspace can use that function to get gsi. Then xc_physdev_map_pirq() will success. This v2 on qemu side is the same as the v1 ( qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/), just call xc_physdev_gsi_from_irq() to get gsi from irq. Jiqian Chen (1): xen: Use gsi instead of irq for mapping pirq hw/xen/xen-host-pci-device.c | 7 +++ hw/xen/xen-host-pci-device.h | 1 + hw/xen/xen_pt.c | 6 +- 3 files changed, 13 insertions(+), 1 deletion(-) -- 2.34.1