[KERNEL PATCH v9 3/3] xen/privcmd: Add new syscall to get gsi from dev

2024-09-12 Thread Jiqian Chen
On PVH dom0, when passing a device through to a domU, QEMU and the xl
tools use the gsi number to do pirq mapping (see QEMU code
xen_pt_realize->xc_physdev_map_pirq and xl code
pci_add_dm_done->xc_physdev_map_pirq). In the current code, the gsi
number is read from the file /sys/bus/pci/devices//irq, which is
wrong: irq is not equal to gsi, they are in different spaces, so
pirq mapping fails.
The current Linux code also has no method for userspace to get
the gsi.

To that end, record the gsi of pcistub devices when pcistub
initializes them, and add a new syscall to privcmd so that
userspace can get the gsi when needed.
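
As a rough illustration of the sbdf argument (not part of the patch): assuming
the usual packing of segment in bits 31:16, bus in bits 15:8 and devfn in bits
7:0, which is what the PCI_BUS_NUM, PCI_SLOT and PCI_FUNC helpers mentioned in
the changelog decode, the value could be built and split as sketched below.
The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the four components of a PCI address. */
struct sketch_sbdf {
    uint16_t domain;  /* PCI segment */
    uint8_t  bus;
    uint8_t  slot;    /* device number, 0..31 */
    uint8_t  func;    /* function number, 0..7 */
};

/* Pack segment/bus/slot/function into one 32-bit value (assumed layout). */
static uint32_t sketch_sbdf_pack(uint16_t domain, uint8_t bus,
                                 uint8_t slot, uint8_t func)
{
    assert(slot < 32 && func < 8);
    return ((uint32_t)domain << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)slot << 3) | (uint32_t)func;
}

/* Split a packed value back into its components. */
static struct sketch_sbdf sketch_sbdf_unpack(uint32_t sbdf)
{
    struct sketch_sbdf p;

    p.domain = (uint16_t)((sbdf >> 16) & 0xffff);
    p.bus    = (uint8_t)((sbdf >> 8) & 0xff);
    p.slot   = (uint8_t)((sbdf >> 3) & 0x1f);
    p.func   = (uint8_t)(sbdf & 0x07);
    return p;
}
```

Under that assumed layout, device 0000:03:00.0 packs to 0x0300.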

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
v8->v9 changes:
Changed the syscall name from "IOCTL_PRIVCMD_GSI_FROM_DEV" to 
"IOCTL_PRIVCMD_PCIDEV_GET_GSI". Also changed the names of the other functions.
Changed the macro wrapping "pcistub_get_gsi_from_sbdf" from "CONFIG_XEN_ACPI" 
to "CONFIG_XEN_PCIDEV_BACKEND" to fix compile errors reported by CI robot.
Changed the parameter gsi of struct privcmd_pcidev_get_gsi from int to u32.

v7->v8 changes:
In function privcmd_ioctl_gsi_from_dev, return -EINVAL when CONFIG_XEN_ACPI 
is not configured.
Used PCI_BUS_NUM PCI_SLOT PCI_FUNC instead of open coding.

v6->v7 changes:
Changed implementation to add a new parameter "gsi" to struct pcistub_device 
and set gsi when pcistub initializes the device. Then when userspace wants to 
get gsi and passes sbdf, we can return that gsi.

v5->v6 changes:
Changed implementation to add a new syscall to translate irq to gsi, instead 
of adding a new gsi sysfs node, because the pci Maintainer did not allow adding 
that sysfs node.

v3->v5 changes:
No.

v2->v3 changes:
Suggested by Roger: Abandoned previous implementations that added new syscall 
to get gsi from irq and changed to add a new sysfs node for gsi, then userspace 
can get gsi number from sysfs node.
---
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202406090826.whl6cb7r-...@intel.com/
---
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202405171113.t431pc8o-...@intel.com/
---
 drivers/xen/privcmd.c  | 30 +++
 drivers/xen/xen-pciback/pci_stub.c | 38 +++---
 include/uapi/xen/privcmd.h |  7 ++
 include/xen/acpi.h |  9 +++
 4 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 9563650dfbaf..1ed612d21543 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -46,6 +46,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_XEN_ACPI
+#include 
+#endif
 
 #include "privcmd.h"
 
@@ -844,6 +847,29 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
return rc;
 }
 
+static long privcmd_ioctl_pcidev_get_gsi(struct file *file, void __user *udata)
+{
+#ifdef CONFIG_XEN_ACPI
+   int rc;
+   struct privcmd_pcidev_get_gsi kdata;
+
+   if (copy_from_user(&kdata, udata, sizeof(kdata)))
+   return -EFAULT;
+
+   rc = pcistub_get_gsi_from_sbdf(kdata.sbdf);
+   if (rc < 0)
+   return rc;
+
+   kdata.gsi = rc;
+   if (copy_to_user(udata, &kdata, sizeof(kdata)))
+   return -EFAULT;
+
+   return 0;
+#else
+   return -EINVAL;
+#endif
+}
+
 #ifdef CONFIG_XEN_PRIVCMD_EVENTFD
 /* Irqfd support */
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -1543,6 +1569,10 @@ static long privcmd_ioctl(struct file *file,
ret = privcmd_ioctl_ioeventfd(file, udata);
break;
 
+   case IOCTL_PRIVCMD_PCIDEV_GET_GSI:
+   ret = privcmd_ioctl_pcidev_get_gsi(file, udata);
+   break;
+
default:
break;
}
diff --git a/drivers/xen/xen-pciback/pci_stub.c 
b/drivers/xen/xen-pciback/pci_stub.c
index 8ce27333f54b..2ea8e4075adc 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -56,6 +56,9 @@ struct pcistub_device {
 
struct pci_dev *dev;
struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */
+#ifdef CONFIG_XEN_ACPI
+   int gsi;
+#endif
 };
 
 /* Access to pcistub_devices & seized_devices lists and the initialize_devices
@@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct 
pci_dev *dev)
 
kref_init(&psdev->kref);
spin_lock_init(&psdev->lock);
+#ifdef CONFIG_XEN_ACPI
+   psdev->gsi = -1;
+#endif
 
return psdev;
 }
@@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct 
xen_pcibk_device *pdev,
return pci_dev;
 }
 
+#ifdef CONFIG_XEN_PCIDEV_BACKEND
+int pcistub_get_gsi_from_sbdf(unsigned int sbdf)
+{
+   struct pcistub_device *psdev;
+   int domain = (sbdf >> 16) & 0x;
+   int bus

[KERNEL PATCH v9 0/3] Support device passthrough when dom0 is PVH on Xen

2024-09-12 Thread Jiqian Chen
Hi All,
This is v9 series to support passthrough on Xen when dom0 is PVH.
Since the code this series depends on has been merged on the Xen side, I am 
continuing to upstream this series.
Although all patches of v8 got "Reviewed-by", too much time has passed and 
the code has changed, so I did not carry the "Reviewed-by" tags over. Please 
review them again.

v8->v9 changes:
* patch#1: The struct and name of the hypercall changed on the Xen side, so I
   made the corresponding changes. No functional changes.
* patch#2: Moved the call to xen_acpi_get_gsi_info under the check
   "if (xen_initial_domain() && xen_pvh_domain())" to prevent it from being
   called in PV dom0.
* patch#3: Changed the syscall name from "IOCTL_PRIVCMD_GSI_FROM_DEV" to
   "IOCTL_PRIVCMD_PCIDEV_GET_GSI". Also changed the names of the other
   functions.
   Changed the macro wrapping "pcistub_get_gsi_from_sbdf" from
   "CONFIG_XEN_ACPI" to "CONFIG_XEN_PCIDEV_BACKEND" to fix compile errors
   reported by the CI robot.
   Changed the parameter gsi of struct privcmd_pcidev_get_gsi from int to u32.


Best regards,
Jiqian Chen



v7->v8 change:
* patch#1: This is the patch#1 of v6, because it was reverted from the staging
   branch due to the API changes on the Xen side.
   Added pci_device_state_reset_type_t to distinguish the reset types.
* patch#2: is the patch#1 of v7. Used CONFIG_XEN_ACPI instead of CONFIG_ACPI to
   wrap the code.
* patch#3: is the patch#2 of v7. In function privcmd_ioctl_gsi_from_dev, return
   -EINVAL when CONFIG_XEN_ACPI is not configured.
   Used PCI_BUS_NUM PCI_SLOT PCI_FUNC instead of open coding.


v6->v7 change:
* the first patch of v6 was already merged into branch linux_next.
* patch#1: is the patch#2 of v6. Moved the implementation of function
   xen_acpi_get_gsi_info to file drivers/xen/acpi.c; that modification is more
   convenient for the subsequent patch to obtain gsi.
* patch#2: is the patch#3 of v6. Added a new parameter "gsi" to struct
   pcistub_device and set gsi when pcistub initializes the device. Then when
   userspace wants to get gsi by passing sbdf, we can return that gsi.


v5->v6 change:
* patch#3: changed to add a new syscall to translate irq to gsi, instead of
   adding a new gsi sysfs node.


v4->v5 changes:
* patch#1: Add Reviewed-by Stefano
* patch#2: Add Reviewed-by Stefano
* patch#3: No changes


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; use a new
   function pcistub_reset_device_state to wrap __pci_reset_function_locked and
   xen_reset_device_state, and call pcistub_reset_device_state in pci_stub.c
* patch#2: remove map_pirq from xen_pvh_passthrough_gsi


v2->v3 changes:
* patch#1: added a condition so that xen_reset_device_state is only done for
   non-PV domains in pcistub_init_device.
* patch#2: Abandoned the previous implementation that called unmask_irq. Now
   sets up the gsi and maps the pirq for the passthrough device in
   pcistub_init_device.
* patch#3: Abandoned the previous implementation that added a new syscall to
   get gsi from irq. Now adds a new sysfs node for gsi, so userspace can get
   the gsi number from sysfs.


Below is the description of v2 cover letter:
This series of patches is the v2 of the implementation of passthrough when 
dom0 is PVH on Xen.
We sent v1 upstream before, but it had many problems and we got lots of 
suggestions.
I will introduce all the issues that these patches try to fix and the 
differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a device, 
pci_stub calls "pcistub_init_device() -> pci_restore_state() -> 
pci_restore_config_space() -> pci_restore_config_space_range() -> 
pci_restore_config_dword() -> pci_write_config_dword()". The pci config write 
triggers an io interrupt to bar_write() in Xen, but because bar->enabled was 
set before, the write is not allowed. Then, when QEMU configures the 
passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: we don't tell vPCI that the device has been reset, so the current 
cached state in pdev->vpci is out of date and differs from the real device 
state.

Solution: to solve this problem, the first kernel patch (xen/pci: Add 
xen_reset_device_state function) and the first Xen patch (xen/vpci: Clear all 
vpci status of device) add a new hypercall to reset the state stored in vPCI 
when the state of the real device has changed.
Thank Roger for the suggestion of this v2, and it is different from v1
(https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/), 
v1 simply allow domU to
write pci bar, it does not comply with the design principl

[KERNEL PATCH v9 1/3] xen/pci: Add a function to reset device for xen

2024-09-12 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the Xen
side won't get a notification, so the cached state in vpci is out
of date compared with the real device state.
To solve that problem, add a new function to clear all vpci
device state when the device is reset on the dom0 side.

Also call that function in pcistub_init_device, because using
"pci-assignable-add" to assign a passthrough device in Xen resets
the device, the vpci state goes out of date, and the device then
fails to restore its bar state.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
v8->v9 changes:
The struct and name of the hypercall changed on the Xen side, so I made the 
corresponding changes and removed the Reviewed-by of Stefano. No functional 
changes.

v5->v8 changes:
No.

v4->v5 changes:
Added Reviewed-by of Stefano.

v3->v4 changes:
Changed the code comment of PHYSDEVOP_pci_device_state_reset.
Used a new function pcistub_reset_device_state to wrap 
__pci_reset_function_locked and xen_reset_device_state, and called 
pcistub_reset_device_state in pci_stub.c.

v2->v3 changes:
Added a condition so that xen_reset_device_state is only done for non-PV 
domains in pcistub_init_device.

v1->v2 changes:
New patch to add a new function to call reset hypercall.
---
 drivers/xen/pci.c  | 13 +
 drivers/xen/xen-pciback/pci_stub.c | 18 +++---
 include/xen/interface/physdev.h| 17 +
 include/xen/pci.h  |  6 ++
 4 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 72d4e3f193af..bb59524b8bbd 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -177,6 +177,19 @@ static int xen_remove_device(struct device *dev)
return r;
 }
 
+int xen_reset_device(const struct pci_dev *dev)
+{
+   struct pci_device_reset device = {
+   .dev.seg = pci_domain_nr(dev->bus),
+   .dev.bus = dev->bus->number,
+   .dev.devfn = dev->devfn,
+   .flags = PCI_DEVICE_RESET_FLR,
+   };
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_reset, &device);
+}
+EXPORT_SYMBOL_GPL(xen_reset_device);
+
 static int xen_pci_notifier(struct notifier_block *nb,
unsigned long action, void *data)
 {
diff --git a/drivers/xen/xen-pciback/pci_stub.c 
b/drivers/xen/xen-pciback/pci_stub.c
index 4faebbb84999..3e162c1753e2 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -89,6 +89,16 @@ static struct pcistub_device *pcistub_device_alloc(struct 
pci_dev *dev)
return psdev;
 }
 
+static int pcistub_reset_device_state(struct pci_dev *dev)
+{
+   __pci_reset_function_locked(dev);
+
+   if (!xen_pv_domain())
+   return xen_reset_device(dev);
+   else
+   return 0;
+}
+
 /* Don't call this directly as it's called by pcistub_device_put */
 static void pcistub_device_release(struct kref *kref)
 {
@@ -107,7 +117,7 @@ static void pcistub_device_release(struct kref *kref)
/* Call the reset function which does not take lock as this
 * is called from "unbind" which takes a device_lock mutex.
 */
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
if (dev_data &&
pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
dev_info(&dev->dev, "Could not reload PCI state\n");
@@ -284,7 +294,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 * (so it's ready for the next domain)
 */
device_lock_assert(&dev->dev);
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
 
dev_data = pci_get_drvdata(dev);
ret = pci_load_saved_state(dev, dev_data->pci_saved_state);
@@ -420,7 +430,9 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
else {
dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
-   __pci_reset_function_locked(dev);
+   err = pcistub_reset_device_state(dev);
+   if (err)
+   goto config_release;
pci_restore_state(dev);
}
/* Now disable the device (this also ensures some private device
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index a237af867873..df74e65a884b 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -256,6 +256,13 @@ struct physdev_pci_device_add {
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is 

[KERNEL PATCH v9 2/3] xen/pvh: Setup gsi for passthrough device

2024-09-12 Thread Jiqian Chen
In PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured for it to be able to be
mapped into a domU.

When assigning a device for passthrough, proactively set up the gsi
of the device during that process.
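
The translation done before the setup hypercall can be sketched in isolation
as follows. This is a standalone model, not the kernel code: the SKETCH_*
constants and the struct are local stand-ins for the ACPI definitions and for
struct physdev_setup_gsi, with values assumed here for illustration. The
mapping (edge -> 0, otherwise 1; active high -> 0, otherwise 1) and the
treatment of -EEXIST as success follow the xen_pvh_setup_gsi hunk in this
patch.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for the ACPI trigger/polarity constants (assumed values). */
enum { SKETCH_ACPI_LEVEL_SENSITIVE = 0, SKETCH_ACPI_EDGE_SENSITIVE = 1 };
enum { SKETCH_ACPI_ACTIVE_HIGH = 0, SKETCH_ACPI_ACTIVE_LOW = 1 };

/* Hypothetical mirror of the hypercall argument. */
struct sketch_setup_gsi {
    int     gsi;
    uint8_t triggering;  /* 0 = edge, 1 = level */
    uint8_t polarity;    /* 0 = active high, 1 = active low */
};

/* Translate ACPI-style trigger/polarity into the hypercall encoding. */
static struct sketch_setup_gsi sketch_fill_setup_gsi(int gsi, int trigger,
                                                     int polarity)
{
    struct sketch_setup_gsi s;

    s.gsi = gsi;
    s.triggering = (trigger == SKETCH_ACPI_EDGE_SENSITIVE) ? 0 : 1;
    s.polarity = (polarity == SKETCH_ACPI_ACTIVE_HIGH) ? 0 : 1;
    return s;
}

/* A GSI that is already set up is not an error for the caller. */
static int sketch_setup_result(int hypercall_ret)
{
    return hypercall_ret == -EEXIST ? 0 : hypercall_ret;
}
```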

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
v8->v9 changes:
Moved the call to xen_acpi_get_gsi_info under the check "if 
(xen_initial_domain() && xen_pvh_domain())" to prevent it from being called in 
PV dom0.
Removed Reviewed-by of Stefano.

v7->v8 changes:
Used CONFIG_XEN_ACPI instead of CONFIG_ACPI to wrap codes.

v6->v7 changes:
Moved the implementation of function xen_acpi_get_gsi_info to file 
drivers/xen/acpi.c, that modification is more convenient for the subsequent 
patch to obtain gsi.

v5->v6 changes:
No.

v4->v5 changes:
Added Reviewed-by of Stefano.

v3->v4 changes:
Removed map_pirq from xen_pvh_passthrough_gsi, since letting PVH call map_pirq 
here is not right.

v2->v3 changes:
Abandoned previous implementations that called unmask_irq, and changed to do 
setup_gsi and map_pirq for passthrough device in pcistub_init_device.
---
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202406090859.kw3eeesv-...@intel.com/
---
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202405172132.tazuvppo-...@intel.com/
---
 arch/x86/xen/enlighten_pvh.c   | 23 ++
 drivers/acpi/pci_irq.c |  2 +-
 drivers/xen/acpi.c | 50 ++
 drivers/xen/xen-pciback/pci_stub.c | 20 
 include/linux/acpi.h   |  1 +
 include/xen/acpi.h | 18 +++
 6 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 728a4366ca85..bf68c329fc01 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -4,6 +4,7 @@
 #include 
 
 #include 
+#include 
 
 #include 
 #include 
@@ -28,6 +29,28 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+#ifdef CONFIG_XEN_DOM0
+int xen_pvh_setup_gsi(int gsi, int trigger, int polarity)
+{
+   int ret;
+   struct physdev_setup_gsi setup_gsi;
+
+   setup_gsi.gsi = gsi;
+   setup_gsi.triggering = (trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+   setup_gsi.polarity = (polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+   ret = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+   if (ret == -EEXIST) {
+   xen_raw_printk("Already setup the GSI :%d\n", gsi);
+   ret = 0;
+   } else if (ret)
+   xen_raw_printk("Fail to setup GSI (%d)!\n", gsi);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_setup_gsi);
+#endif
+
 /*
  * Reserve e820 UNUSABLE regions to inflate the memory balloon.
  *
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
 }
 #endif /* CONFIG_X86_IO_APIC */
 
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
 {
struct acpi_prt_entry *entry = NULL;
struct pci_dev *bridge;
diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c
index 6893c79fd2a1..9e2096524fbc 100644
--- a/drivers/xen/acpi.c
+++ b/drivers/xen/acpi.c
@@ -30,6 +30,7 @@
  * IN THE SOFTWARE.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -75,3 +76,52 @@ int xen_acpi_notify_hypervisor_extended_sleep(u8 sleep_state,
return xen_acpi_notify_hypervisor_state(sleep_state, val_a,
val_b, true);
 }
+
+struct acpi_prt_entry {
+   struct acpi_pci_id  id;
+   u8  pin;
+   acpi_handle link;
+   u32 index;
+};
+
+int xen_acpi_get_gsi_info(struct pci_dev *dev,
+ int *gsi_out,
+ int *trigger_out,
+ int *polarity_out)
+{
+   int gsi;
+   u8 pin;
+   struct acpi_prt_entry *entry;
+   int trigger = ACPI_LEVEL_SENSITIVE;
+   int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+   if (!dev || !gsi_out || !trigger_out || !polarity_out)
+   return -EINVAL;
+
+   pin = dev->pin;
+   if (!pin)
+   return -EINVAL;
+
+   entry = acpi_pci_irq_lookup(dev, pin);
+   if (entry) {
+   if (entry->link)
+   gsi = acpi_pci_link_allocate_irq(entry->link,
+entry->ind

[RFC XEN PATCH v15 4/4] tools: Add new function to do PIRQ (un)map on PVH dom0

2024-09-11 Thread Jiqian Chen
When dom0 is PVH and a device is passed through to a domU, xl
uses the gsi number of the device to do a pirq mapping (see
pci_add_dm_done->xc_physdev_map_pirq), but the gsi number is
read from the file /sys/bus/pci/devices//irq. That confuses
irq and gsi: they are in different spaces and are not equal,
so the mapping fails.
To solve this issue, get the real gsi and add a new function
xc_physdev_map_pirq_gsi to get a free pirq for the gsi.
Note: the existing function xc_physdev_map_pirq is not used
because it doesn't support allocating a free pirq; adding
xc_physdev_map_pirq_gsi also avoids changing it and affecting
its callers.

Besides, PVH dom0 doesn't have the PIRQs flag, so it doesn't do
PHYSDEVOP_map_pirq for each gsi. The grant call path
pci_add_dm_done->XEN_DOMCTL_irq_permission therefore fails in
function domain_pirq_to_irq. The old hypercall
XEN_DOMCTL_irq_permission requires a pirq to be passed in, so it
is not suitable for granting irq permission from a PVH dom0 that
doesn't have PIRQs.
To solve this issue, use another hypercall,
XEN_DOMCTL_gsi_permission, to grant the permission of the irq
(translated from the gsi) to the domU when dom0 has no PIRQs.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to 
be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
---
v13->v15 changes:
Change the initialization way of "struct physdev_map_pirq map" in function 
xc_physdev_map_pirq_gsi to be definition and set value directly.
Change code from "rc = libxl__arch_local_domain_has_pirq_notion(gc); if (!rc) 
{}" to "if (libxl__arch_local_domain_has_pirq_notion(gc) == false) {}"
Modified some log prints codes.

v12->v13 changes:
Deleted patch #6 of v12, and added function xc_physdev_map_pirq_gsi to map pirq 
for gsi.
For functions that generate libxl error, changed the return value from -1 to 
ERROR_*.
Instead of declaring "ctx", use the macro "CTX".
Add the function libxl__arch_local_domain_has_pirq_notion to determine if 
there is a concept of pirq in the domain where xl is located.
In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use map_pirq to 
obtain the pirq corresponding to gsi.

v11->v12 changes:
Nothing.

v10->v11 changes:
New patch
Modification of the tools part of patches#4 and #5 of v10, use 
privcmd_gsi_from_dev to get gsi, and use XEN_DOMCTL_gsi_permission to grant gsi.
Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID.
Add libxl__arch_hvm_map_gsi to distinguish x86 related implementations.
Add a list pcidev_pirq_list to record the relationship between sbdf and pirq, 
which can be used to obtain the corresponding pirq when unmap PIRQ.
---
 tools/include/xenctrl.h   |  10 
 tools/libs/ctrl/xc_domain.c   |  15 +
 tools/libs/ctrl/xc_physdev.c  |  27 +
 tools/libs/light/libxl_arch.h |   6 ++
 tools/libs/light/libxl_arm.c  |  15 +
 tools/libs/light/libxl_pci.c  | 110 --
 tools/libs/light/libxl_x86.c  |  72 ++
 7 files changed, 210 insertions(+), 45 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 924f9a35f790..29617585c535 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1383,6 +1383,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ uint32_t flags);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
@@ -1638,6 +1643,11 @@ int xc_physdev_map_pirq_msi(xc_interface *xch,
 int entry_nr,
 uint64_t table_base);
 
+int xc_physdev_map_pirq_gsi(xc_interface *xch,
+uint32_t domid,
+int gsi,
+int *pirq);
+
 int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..e3538ec0ba80 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ uint32_t flags)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+

[RFC XEN PATCH v15 3/4] tools: Add new function to get gsi from dev

2024-09-11 Thread Jiqian Chen
On PVH dom0, when passing a device through to a domU, QEMU and the xl
tools use the gsi number to do pirq mapping (see QEMU code
xen_pt_realize->xc_physdev_map_pirq and xl code
pci_add_dm_done->xc_physdev_map_pirq). In the current code, the gsi
number is read from the file /sys/bus/pci/devices//irq, which is
wrong: irq is not equal to gsi, they are in different spaces, so
pirq mapping fails.

The current code also has no method for userspace to get the gsi.
To that end, add a new function to get the gsi; the corresponding
ioctl is implemented on the Linux kernel side.
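
For readers who want to see the shape of the call outside libxc, here is a
minimal userspace sketch, assuming a Linux host where <sys/ioctl.h> provides
_IOC and _IOC_NONE. The struct layout and the ioctl number ('P', 10) are
copied from the privcmd.h hunk in this patch; the sketch_* names and the
out-parameter interface are hypothetical and are not what xc_pcidev_get_gsi
does.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Mirror of privcmd_pcidev_get_gsi_t from this patch (two 32-bit fields). */
struct sketch_pcidev_get_gsi {
    uint32_t sbdf;  /* in: segment/bus/devfn of the device */
    uint32_t gsi;   /* out: filled in by the kernel on success */
};

/* Same encoding as IOCTL_PRIVCMD_PCIDEV_GET_GSI in the hunk below. */
#define SKETCH_IOCTL_PCIDEV_GET_GSI \
    _IOC(_IOC_NONE, 'P', 10, sizeof(struct sketch_pcidev_get_gsi))

/*
 * Ask the privcmd driver for the gsi of the device identified by sbdf.
 * privcmd_fd is assumed to be an open descriptor for the privcmd device.
 * Returns 0 and stores the gsi in *gsi_out on success; returns -1 with
 * errno set by ioctl() on failure, leaving *gsi_out untouched.
 */
static int sketch_pcidev_get_gsi(int privcmd_fd, uint32_t sbdf,
                                 uint32_t *gsi_out)
{
    struct sketch_pcidev_get_gsi req = { .sbdf = sbdf, .gsi = 0 };

    if (ioctl(privcmd_fd, SKETCH_IOCTL_PCIDEV_GET_GSI, &req) < 0)
        return -1;

    *gsi_out = req.gsi;
    return 0;
}
```

Returning the gsi through an out-parameter is only a choice made for this
sketch: it keeps the error code separate from the unsigned 32-bit gsi value.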

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
Reviewed-by: Anthony PERARD 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to 
be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
---
v13->v15 changes:
Add "Reviewed-by: Anthony PERARD "

v12->v13 changes:
Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi to avoid 
confusion with physdev namespace.
Move the implementation of xc_pcidev_get_gsi into xc_linux.c.
Directly use xencall_fd(xch->xcall) in the function xc_pcidev_get_gsi instead 
of opening "privcmd".

v11->v12 changes:
Nothing.

v10->v11 changes:
Patch#4 of v10, directly open "/dev/xen/privcmd" in the function 
xc_physdev_gsi_from_dev instead of adding unnecessary functions to libxencall.
Change the type of gsi in the structure privcmd_gsi_from_dev from int to u32.

v9->v10 changes:
Extract the implementation of xc_physdev_gsi_from_dev to be a new patch.
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/ctrl/xc_freebsd.c  |  6 ++
 tools/libs/ctrl/xc_linux.c| 20 
 tools/libs/ctrl/xc_minios.c   |  6 ++
 tools/libs/ctrl/xc_netbsd.c   |  6 ++
 tools/libs/ctrl/xc_solaris.c  |  6 ++
 7 files changed, 53 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h 
b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..607dfa2287bc 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_pcidev_get_gsi {
+   __u32 sbdf;
+   __u32 gsi;
+} privcmd_pcidev_get_gsi_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_PCIDEV_GET_GSI   \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_pcidev_get_gsi_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2c4608c09ab0..924f9a35f790 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1642,6 +1642,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/ctrl/xc_freebsd.c b/tools/libs/ctrl/xc_freebsd.c
index 9dd48a3a08bb..9019fc663361 100644
--- a/tools/libs/ctrl/xc_freebsd.c
+++ b/tools/libs/ctrl/xc_freebsd.c
@@ -60,6 +60,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, 
size_t size)
 return ptr;
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+errno = ENOSYS;
+return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/ctrl/xc_linux.c b/tools/libs/ctrl/xc_linux.c
index c67c71c08be3..92591e49a1c8 100644
--- a/tools/libs/ctrl/xc_linux.c
+++ b/tools/libs/ctrl/xc_linux.c
@@ -66,6 +66,26 @@ void *xc_memalign(xc_interface *xch, size_t alignment, 
size_t size)
 return ptr;
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+int ret;
+privcmd_pcidev_get_gsi_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = 0,
+};
+
+ret = ioctl(xencall_fd(xch->xcall),
+IOCTL_PRIVCMD_PCIDEV_GET_GSI, &dev_gsi);
+
+if (ret < 0) {
+PERROR("Failed to get gsi from dev");
+} else {
+ret = dev_gsi.gsi;
+}
+
+return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/ctrl/xc_minios.c b/tools/libs/ctrl/xc_minios.c
index 3dea7a78a576..462af827b33c 100644
--- a/tools/libs/ctrl/xc_minios.c
+++ b/tools/libs/ctrl/xc_minios.c
@@ -47,6 +47,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, 
size_t size)
 return memalign(alignment, size);
 }
 
+int xc_pcidev_get_gsi(xc_inter

[XEN PATCH v15 1/4] x86/hvm: allow {,un}map_pirq hypercalls unconditionally

2024-09-11 Thread Jiqian Chen
The current hypercall interfaces to manage and assign interrupts to
domains are mostly based on using pIRQs as handlers.  Such pIRQ values
are abstract domain-specific references to interrupts.

Classic HVM domains can have access to {,un}map_pirq hypercalls if the
domain is allowed to route physical interrupts over event channels.
That's however a different interface, limited to only mapping
interrupts to itself. PVH domains on the other hand never had access
to the interface, as PVH domains are not allowed to route interrupts
over event channels.

In order to allow setting up PCI passthrough from a PVH domain it
needs access to the {,un}map_pirq hypercalls so interrupts can be
assigned a pIRQ handler that can then be used by further hypercalls to
bind the interrupt to a domain.

Note that the {,un}map_pirq hypercalls end up calling helpers that are
already used against a PVH domain in order to setup interrupts for the
hardware domain when running in PVH mode.  physdev_map_pirq() will
call allocate_and_map_{gsi,msi}_pirq() which is already used by the
vIO-APIC or the vPCI code respectively.  So the exposed code paths are
not new when targeting a PVH domain, but rather previous callers are
not hypercall but emulation based.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
v14->v15 changes:
Change to use the commit message written by Roger.

v13->v14 changes:
Modified the commit message.

v12->v13 changes:
Removed the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and added a 
corresponding description in the commit message.

v11->v12 changes:
Avoid using return, set error code instead when (un)map is not allowed.

v10->v11 changes:
Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq 
from being executed when domU has no pirq, instead of just preventing 
self-mapping.
And modify the description of the commit message accordingly.

v9->v10 changes:
Indent the comments above PHYSDEVOP_map_pirq according to the code style.

v8->v9 changes:
Add a comment above PHYSDEVOP_map_pirq to describe why need this hypercall.
Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" 
to "d == current->domain".

v7->v8 changes:
Add the domid check(domid == DOMID_SELF) to prevent self map when guest doesn't 
use pirq.
That check was missed in the previous version.

v6->v7 changes:
Nothing.

v5->v6 changes:
Nothing.

v4->v5 changes:
Move the check of self map_pirq to physdev.c, and change to check if the caller 
has PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op.

v3->v4 changes:
add check to prevent PVH self map.

v2->v3 changes:
Due to changes in the implementation of the second patch on kernel side (it 
will do setup_gsi and map_pirq when assigning a device to passthrough), add 
PHYSDEVOP_setup_gsi for PVH dom0, and we need to support self mapping.
---
 xen/arch/x86/hvm/hypercall.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index f023f7879e24..81883c8d4f60 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[XEN PATCH v15 2/4] x86/irq: allow setting IRQ permissions from GSI instead of pIRQ

2024-09-11 Thread Jiqian Chen
Some domains are not aware of the pIRQ abstraction layer that maps
interrupt sources into Xen space interrupt numbers.  pIRQs values are
only exposed to domains that have the option to route physical
interrupts over event channels.

This creates issues for PCI-passthrough from a PVH domain, as some of
the passthrough related hypercalls use pIRQ as references to physical
interrupts on the system.  One of such interfaces is
XEN_DOMCTL_irq_permission, used to grant or revoke access to
interrupts, takes a pIRQ as the reference to the interrupt to be
adjusted.

Since PVH doesn't manage interrupts in terms of pIRQs, introduce a new
hypercall that allows setting interrupt permissions based on GSI value
rather than pIRQ.

Note the GSI hypercall parameter is translated to an IRQ value (in
case there are ACPI overrides) before doing the checks.
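
The new hypercall also takes a flags word; per the changelog in this patch it
carries XEN_DOMCTL_GSI_REVOKE or XEN_DOMCTL_GSI_GRANT under
XEN_DOMCTL_GSI_ACTION_MASK, and only valid bits may be set. A minimal sketch
of that kind of check follows. The numeric values and the sketch_* names are
assumptions made for illustration, not the definitions from the Xen headers.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_GSI_REVOKE       0u
#define SKETCH_GSI_GRANT        1u
#define SKETCH_GSI_ACTION_MASK  1u

/*
 * Check that only valid bits are set, then decode the action.
 * Returns 0 and stores 1 (grant) or 0 (revoke) in *grant_out,
 * or -EINVAL if any bit outside the action mask is set.
 */
static int sketch_parse_gsi_flags(uint32_t flags, int *grant_out)
{
    if (flags & ~SKETCH_GSI_ACTION_MASK)
        return -EINVAL;

    *grant_out = (flags & SKETCH_GSI_ACTION_MASK) == SKETCH_GSI_GRANT;
    return 0;
}
```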

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
CC: Daniel P . Smith 
Remaining comment @Daniel P . Smith:
+ret = -EPERM;
+if ( !irq_access_permitted(currd, irq) ||
+ xsm_irq_permission(XSM_HOOK, d, irq, flags) )
+break;
Is it okay to issue the XSM check using the translated value(irq),
not the one(gsi) that was originally passed into the hypercall?
---
v13->v15 changes:
Change to use the commit message written by Roger.
Change the code comment from "Check all bits are zero except lowest bit" to 
"Check only valid bits are set".
Change the final return statement of gsi_2_irq to "return irq ?: -EINVAL;" to 
preserve the error code from apic_pin_2_gsi_irq().

v12->v13 changes:
For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change 
its type from uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and 
XEN_DOMCTL_GSI_GRANT macros.
Move "gsi > highest_gsi()" into function gsi_2_irq.
Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int 
type.
Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK".
Delete unnecessary goto statements and change to direct break.
Add description in commit message to explain how gsi to irq is converted.

v11->v12 changes:
Change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to remove 
"__init" of highest_gsi function.
Change the check of irq boundary from <0 to <=0, and remove unnecessary space.
Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.

v10->v11 changes:
Extracted from patch#5 of v10 into a separate patch.
Add non-zero judgment for other bits of allow_access.
Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )".
Change the error exit path identifier "out" to "gsi_permission_out".
Use ARRAY_SIZE() instead of open coding.

v9->v10 changes:
Modified the commit message to further describe the purpose of adding 
XEN_DOMCTL_gsi_permission.
Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, 
and used currd instead of current->domain.
In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original 
new code, and error handling for irq0 was added.
Deleted the extra spaces in the upper and lower lines of the struct 
xen_domctl_gsi_permission definition.

v8->v9 changes:
Change the commit message to describe more why we need this new hypercall.
Add comment above "if ( is_pv_domain(current->domain) || 
has_pirq(current->domain) )" to explain why we need this check.
Add gsi_2_irq to transform gsi to irq, instead of considering gsi == irq.
Add explicit padding to struct xen_domctl_gsi_permission.

v5->v8 changes:
Nothing.

v4->v5 changes:
New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi.
---
 xen/arch/x86/domctl.c  | 29 +
 xen/arch/x86/include/asm/io_apic.h |  2 ++
 xen/arch/x86/io_apic.c | 19 +++
 xen/arch/x86/mpparse.c |  7 +++
 xen/include/public/domctl.h| 10 ++
 xen/xsm/flask/hooks.c  |  1 +
 6 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 68b5b46d1a83..939b1de0ee59 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int update_domain_cpu_policy(struct domain *d,
 xen_domctl_cpu_policy_t *xdpc)
@@ -237,6 +238,34 @@ long arch_do_domctl(
 break;
 }
 
+case XEN_DOMCTL_gsi_permission:
+{
+int irq;
+unsigned int gsi = domctl->u.gsi_permission.gsi;
+uint32_t flags = domctl->u.gsi_permission.flags;
+
+/* Check only valid bits are set */
+ret = -EINVAL;
+if ( flags & ~XEN_DOMCTL_GSI_ACTION_MASK )
+break;
+
+ret = irq = gsi_2_

[XEN PATCH v15 0/4] Support device passthrough when dom0 is PVH on Xen

2024-09-11 Thread Jiqian Chen
Hi All,
This is v15 series to support passthrough when dom0 is PVH

v14->v15 changes:
Because patch#1 of v14 has been merged, the sequence numbers of the following 
patches are one lower than in v14.

* patch#1: Change to use the commit message written by Roger.
* patch#2: Change to use the commit message written by Roger.
   Change the code comment from "Check all bits are zero except lowest bit"
   to "Check only valid bits are set".
   Change the final return statement of gsi_2_irq to
   "return irq ?: -EINVAL;" to preserve the error code from
   apic_pin_2_gsi_irq().
* patch#3: Add "Reviewed-by: Anthony PERARD "
* patch#4: Change the initialization of "struct physdev_map_pirq map" in
   function xc_physdev_map_pirq_gsi to set the values directly in the
   definition.
   Change code from "rc = libxl__arch_local_domain_has_pirq_notion(gc);
   if (!rc) {}" to
   "if (libxl__arch_local_domain_has_pirq_notion(gc) == false) {}"
   Modified some log print code.

Best regards,
Jiqian Chen



v13->v14 changes:
* patch#1: Removed the check ( !is_pci_passthrough_enabled() ).
   Added if ( dev_reset.flags & ~PCI_DEVICE_RESET_MASK ) to check if 
the other bits are zero.
* patch#2: Modified the commit message.
   
Because patch#3 of v13 has been merged, the sequence numbers of the following 
patches are one lower than in v13.
   
* patch#3~5: No changes.


v12->v13 changes:
Due to major changes in the codes, all the Reviewed-by received before have 
been removed.
Please review them again.
* patch#1: Delete all "state" words in new code, because it is not necessary.
   Delete unnecessary parameter reset_type of function 
vpci_reset_device, and changed this
   function to inline function.
   Add description to commit message to indicate that the 
classification of reset types is
   for possible different behaviors in the future.
   Rename reset_type of struct pci_device_reset to flags, and modified 
the value of macro
   definition of reset, let them occupy two lowest bits.
   Change the function vpci_reset_device to an inline function and 
delete the
   "ASSERT(rw_is_write_locked(&pdev->domain->pci_lock))"; because this 
exists in subsequent
   functions and it accesses domain and pci_lock, which will affect the 
compilation process.
* patch#2: Remove the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and 
added a corresponding
   description in the commit message.
* patch#3: Add more detailed descriptions into commit message not just 
callstack.
* patch#4: For struct xen_domctl_gsi_permission, rename "access_flag" to 
"flags", change its type from
   uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and 
XEN_DOMCTL_GSI_GRANT macros.
   Move "gsi > highest_gsi()" into function gsi_2_irq.
   Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to 
unsigned int type.
   Delete unnecessary spaces and brackets around 
"~XEN_DOMCTL_GSI_ACTION_MASK".
   Delete unnecessary goto statements and change to direct break.
   Add description in commit message to explain how gsi to irq is 
converted.
* patch#5: Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi 
to avoid confusion with
   the physdev namespace.
   Move the implementation of xc_pcidev_get_gsi into xc_linux.c.
   Directly use xencall_fd(xch->xcall) in the function 
xc_pcidev_get_gsi instead of opening
   "privcmd".
* patch#6: Delete patch #6 of v12, and added function xc_physdev_map_pirq_gsi 
to map pirq for gsi.
   For functions that generate libxl error, changed the return value 
from -1 to ERROR_*.
   Instead of declaring "ctx", use the macro "CTX".
   Add the function libxl__arch_local_domain_has_pirq_notion to 
determine if there is a concept
   of pirq in the domain where xl is located.
   In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use 
map_pirq to obtain the pirq
   corresponding to gsi.


v11->v12 changes:
* patch#1: Change the title of this patch.
   Remove unnecessary notes, erroneous stamps, and #define.
* patch#2: Avoid using return, set error code instead when (un)map is not 
allowed.
   Due to functional change in v11, remove the Reviewed-by of Stefano.
* patch#3: Add more detailed descriptions into commit message not just 
callstack.

patch#4 in v11: remove from this series and upstream individually.

* patch#4: is patch#5 of v11, change nr_irqs_gsi to highest_gsi() to check gsi 
boundary, then need to
   remove "__init" of highest_gsi function.
   Change the check 

[XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-09-03 Thread Jiqian Chen
When dom0 is PVH and a device is passed through to an HVM domU, the QEMU
code xen_pt_realize->xc_physdev_map_pirq and the libxl code
pci_add_dm_done->xc_physdev_map_pirq map a pirq for the passthrough device.
In the xc_physdev_map_pirq call stack, function hvm_physdev_op checks
has_pirq(currd), but currd is the PVH dom0, which has no X86_EMU_USE_PIRQ
flag, so the check fails: PHYSDEVOP_map_pirq is not allowed for a PVH dom0
in the current code.

However, it is fine to map interrupts through pirq to an HVM domain whose
XENFEAT_hvm_pirqs is not enabled. The pirq field is only a way to reference
interrupts, letting the device model identify which interrupt should be
mapped to which domain, whereas has_pirq() only checks whether an HVM domain
routes interrupts from devices (emulated or passthrough) through event
channels. The has_pirq() check should therefore not be applied to a
PHYSDEVOP_map_pirq issued by dom0.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the device removal path can unmap the pirq.
The interrupt of a passthrough device can then be successfully mapped to a
pirq for domU.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
v13->v14 changes:
Modified the commit message.

v12->v13 changes:
Removed the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and added a 
corresponding description in the commit message.

v11->v12 changes:
Avoid using return, set error code instead when (un)map is not allowed.

v10->v11 changes:
Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq 
from being executed when domU has no pirq, instead of just preventing 
self-mapping.
And modify the description of the commit message accordingly.

v9->v10 changes:
Indent the comments above PHYSDEVOP_map_pirq according to the code style.

v8->v9 changes:
Add a comment above PHYSDEVOP_map_pirq to describe why this hypercall is needed.
Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" 
to "d == current->domain".

v7->v8 changes:
Add the domid check(domid == DOMID_SELF) to prevent self map when guest doesn't 
use pirq.
That check was missed in the previous version.

v6->v7 changes:
Nothing.

v5->v6 changes:
Nothing.

v4->v5 changes:
Move the check of self map_pirq to physdev.c, and change to check if the caller 
has PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op.

v3->v4 changes:
add check to prevent PVH self map.

v2->v3 changes:
Due to changes in the implementation of the second patch on the kernel side 
(it now does setup_gsi and map_pirq when assigning a device for passthrough), 
add PHYSDEVOP_setup_gsi for PVH dom0; self-mapping also needs to be supported.
---
 xen/arch/x86/hvm/hypercall.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index f023f7879e24..81883c8d4f60 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[RFC XEN PATCH v14 4/5] tools: Add new function to get gsi from dev

2024-09-03 Thread Jiqian Chen
When passing through a device to domU, QEMU and the xl tools use its gsi
number to do pirq mapping (see the QEMU code
xen_pt_realize->xc_physdev_map_pirq and the xl code
pci_add_dm_done->xc_physdev_map_pirq). However, the gsi number is read
from the file /sys/bus/pci/devices//irq, which is wrong: irq is not
equal to gsi, as they are in different number spaces, so pirq mapping
fails.

The current code gives userspace no method to get the gsi.
To that end, add a new function to get the gsi; the
corresponding ioctl is implemented on the Linux kernel side.
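
As a minimal sketch of how a caller might build the sbdf argument for the new
function: the helper name make_sbdf is hypothetical and not part of this
patch, and the bit layout (segment in bits 31:16, bus in 15:8, slot in 7:3,
function in 2:0) is an assumption based on the conventional PCI encoding, not
something this message states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): pack a PCI address into a
 * 32-bit sbdf value, ASSUMING the conventional layout
 *   segment[31:16] | bus[15:8] | slot[7:3] | function[2:0].
 */
static inline uint32_t make_sbdf(uint16_t seg, uint8_t bus,
                                 uint8_t slot, uint8_t func)
{
    /* A PCI slot number is 5 bits wide and a function number 3 bits. */
    assert(slot < 32 && func < 8);

    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)slot << 3) | func;
}
```

Under these assumptions, a caller would pass make_sbdf(seg, bus, slot, func)
as the second argument of xc_pcidev_get_gsi and treat a negative return value
as an error.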

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to 
be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
---
v13->v14 changes:
No.

v12->v13 changes:
Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi to avoid 
confusion with the physdev namespace.
Move the implementation of xc_pcidev_get_gsi into xc_linux.c.
Directly use xencall_fd(xch->xcall) in the function xc_pcidev_get_gsi instead 
of opening "privcmd".

v11->v12 changes:
Nothing.

v10->v11 changes:
Patch#4 of v10, directly open "/dev/xen/privcmd" in the function 
xc_physdev_gsi_from_dev instead of adding unnecessary functions to libxencall.
Change the type of gsi in the structure privcmd_gsi_from_dev from int to u32.

v9->v10 changes:
Extract the implementation of xc_physdev_gsi_from_dev to be a new patch.
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/ctrl/xc_freebsd.c  |  6 ++
 tools/libs/ctrl/xc_linux.c| 20 
 tools/libs/ctrl/xc_minios.c   |  6 ++
 tools/libs/ctrl/xc_netbsd.c   |  6 ++
 tools/libs/ctrl/xc_solaris.c  |  6 ++
 7 files changed, 53 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h 
b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..607dfa2287bc 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_pcidev_get_gsi {
+   __u32 sbdf;
+   __u32 gsi;
+} privcmd_pcidev_get_gsi_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_PCIDEV_GET_GSI   \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_pcidev_get_gsi_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2c4608c09ab0..924f9a35f790 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1642,6 +1642,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/ctrl/xc_freebsd.c b/tools/libs/ctrl/xc_freebsd.c
index 9dd48a3a08bb..9019fc663361 100644
--- a/tools/libs/ctrl/xc_freebsd.c
+++ b/tools/libs/ctrl/xc_freebsd.c
@@ -60,6 +60,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, 
size_t size)
 return ptr;
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+errno = ENOSYS;
+return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/ctrl/xc_linux.c b/tools/libs/ctrl/xc_linux.c
index c67c71c08be3..92591e49a1c8 100644
--- a/tools/libs/ctrl/xc_linux.c
+++ b/tools/libs/ctrl/xc_linux.c
@@ -66,6 +66,26 @@ void *xc_memalign(xc_interface *xch, size_t alignment, 
size_t size)
 return ptr;
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+int ret;
+privcmd_pcidev_get_gsi_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = 0,
+};
+
+ret = ioctl(xencall_fd(xch->xcall),
+IOCTL_PRIVCMD_PCIDEV_GET_GSI, &dev_gsi);
+
+if (ret < 0) {
+PERROR("Failed to get gsi from dev");
+} else {
+ret = dev_gsi.gsi;
+}
+
+return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/ctrl/xc_minios.c b/tools/libs/ctrl/xc_minios.c
index 3dea7a78a576..462af827b33c 100644
--- a/tools/libs/ctrl/xc_minios.c
+++ b/tools/libs/ctrl/xc_minios.c
@@ -47,6 +47,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, 
size_t size)
 return memalign(alignment, size);
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+errno = ENOSYS;
+return -1;
+}
+
 /*
  * Local variables:
  * mod

[XEN PATCH v14 1/5] xen/pci: Add hypercall to support reset of pcidev

2024-09-03 Thread Jiqian Chen
When a device has been reset on the dom0 side, the Xen hypervisor
is not notified, so the cached state in vpci is out of date
compared with the real device state.

To solve that problem, add a new hypercall to support the reset
of a pcidev and clear the vpci state of the device. Once the
state of the device is reset on the dom0 side, dom0 can call this
hypercall to notify the hypervisor.

The behavior of different reset types may differ in the
future, so distinguish them now so that they can be easily modified
later without affecting the hypercall interface.
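
The flags validation this interface relies on can be sketched as follows.
This is a simplified model, not the hypervisor code: RESET_TYPE_MASK and
check_reset_flags are hypothetical names, and the mask value 0x3 is an
assumption derived only from the changelog note that the reset types occupy
the two lowest bits.

```c
#include <errno.h>
#include <stdint.h>

/* ASSUMED value: the changelog only says reset types use the two low bits. */
#define RESET_TYPE_MASK 0x3u

/*
 * Hypothetical helper: reject any flags value that sets bits outside the
 * reset-type field, so those bits stay available for future extensions.
 */
static int check_reset_flags(uint32_t flags)
{
    return (flags & ~RESET_TYPE_MASK) ? -EINVAL : 0;
}
```

Rejecting unknown bits up front is what allows new reset types or flags to be
added later without changing the hypercall interface.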

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
v13->v14 changes:
Removed the check ( !is_pci_passthrough_enabled() ).
Added if ( dev_reset.flags & ~PCI_DEVICE_RESET_MASK ) to check if the other 
bits are zero.

v12->v13 changes:
Deleted all "state" words in new code, because it is not necessary.
Deleted unnecessary parameter reset_type of function vpci_reset_device, and 
changed this function to inline function
Added description to commit message to indicate that the classification of 
reset types is for possible different behaviors in the future
Renamed reset_type of struct pci_device_reset to flags, and modified the value 
of macro definition of reset, let them occupy two lowest bits.
Change the function vpci_reset_device to an inline function and delete the 
ASSERT(rw_is_write_locked(&pdev->domain->pci_lock)); because this call exists 
in subsequent functions and it accesses domain and pci_lock, which will affect 
the compilation process.

v11->v12 changes:
Change the title of this patch(Add hypercall to support reset of pcidev).
Remove unnecessary notes, erroneous stamps, and #define.

v10->v11 changes:
Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to the next 
line.
Delete unnecessary local variables "struct physdev_pci_device *dev".
Downgrade printk to dprintk.
Moved struct pci_device_state_reset to the public header file.
Delete enum pci_device_state_reset_type, and use macro definitions to represent 
different reset types.
Delete pci_device_state_reset_method, and add switch cases in 
PHYSDEVOP_pci_device_state_reset to handle different reset functions.
Add reset type as a function parameter for vpci_reset_device_state for possible 
future use.

v9->v10 changes:
Nothing.

v8->v9 changes:
Move pcidevs_unlock below write_lock, and remove "ASSERT(pcidevs_locked());" 
from vpci_reset_device_state;
Add pci_device_state_reset_type to distinguish the reset types.

v7->v8 changes:
Nothing.

v6->v7 changes:
Nothing.

v5->v6 changes:
Rebase code and change old function vpci_remove_device, vpci_add_handlers to 
vpci_deassign_device, vpci_assign_device.

v4->v5 changes:
Add pci_lock wrap function vpci_reset_device_state.

v3->v4 changes:
Change the comment of PHYSDEVOP_pci_device_state_reset;
Move printings behind pcidevs_unlock.

v2->v3 changes:
Move the content out of pci_reset_device_state and delete 
pci_reset_device_state;
Add xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset;
Add description for PHYSDEVOP_pci_device_state_reset;

for patch 1
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 52 
 xen/include/public/physdev.h | 17 
 xen/include/xen/vpci.h   |  6 +
 4 files changed, 76 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 44342e7e7fc3..f023f7879e24 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..0161a85e1e9c 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_reset:
+{
+struct pci_device_reset dev_reset;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+break;
+
+ret = -EINVAL;
+if ( dev_reset.flags & ~PCI_DEVICE_RESET_MASK )
+break;
+
+sbdf = PCI_SBDF(dev_reset.dev.seg,
+dev_reset.dev.bus,
+dev_reset.dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();

[RFC XEN PATCH v14 5/5] tools: Add new function to do PIRQ (un)map on PVH dom0

2024-09-03 Thread Jiqian Chen
When dom0 is PVH and a device is passed through to domU, xl
uses the gsi number of the device to do a pirq mapping (see
pci_add_dm_done->xc_physdev_map_pirq). However, the gsi number is
read from the file /sys/bus/pci/devices//irq, which confuses
irq and gsi: they are in different number spaces and are not
equal, so the mapping fails.
To solve this issue, use xc_physdev_gsi_from_dev to get the
real gsi and add a new function xc_physdev_map_pirq_gsi to get
a free pirq for the gsi. The existing function
xc_physdev_map_pirq is not reused because it does not support
allocating a free pirq, and changing it would affect its
callers.

Besides, PVH dom0 doesn't have the PIRQ flag and doesn't do
PHYSDEVOP_map_pirq for each gsi, so the grant call stack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in function
domain_pirq_to_irq. The old hypercall XEN_DOMCTL_irq_permission
also requires a pirq to be passed in, so it is not suitable for
a dom0 without PIRQs to grant irq permission.
To solve this issue, use the new hypercall
XEN_DOMCTL_gsi_permission to grant the permission of the irq
(translated from the gsi) to domU when dom0 has no PIRQs.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to 
be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
---
v13->v14 changes:
No.

v12->v13 changes:
Deleted patch #6 of v12, and added function xc_physdev_map_pirq_gsi to map pirq 
for gsi.
For functions that generate libxl error, changed the return value from -1 to 
ERROR_*.
Instead of declaring "ctx", use the macro "CTX".
Add the function libxl__arch_local_domain_has_pirq_notion to determine if 
there is a concept of pirq in the domain where xl is located.
In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use map_pirq to 
obtain the pirq corresponding to gsi.

v11->v12 changes:
Nothing.

v10->v11 changes:
New patch
Modification of the tools part of patches#4 and #5 of v10, use 
privcmd_gsi_from_dev to get gsi, and use XEN_DOMCTL_gsi_permission to grant gsi.
Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID.
Add libxl__arch_hvm_map_gsi to distinguish x86 related implementations.
Add a list pcidev_pirq_list to record the relationship between sbdf and pirq, 
which can be used to obtain the corresponding pirq when unmap PIRQ.
---
 tools/include/xenctrl.h   |  10 +++
 tools/libs/ctrl/xc_domain.c   |  15 +
 tools/libs/ctrl/xc_physdev.c  |  27 
 tools/libs/light/libxl_arch.h |   6 ++
 tools/libs/light/libxl_arm.c  |  15 +
 tools/libs/light/libxl_pci.c  | 112 --
 tools/libs/light/libxl_x86.c  |  72 ++
 7 files changed, 212 insertions(+), 45 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 924f9a35f790..29617585c535 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1383,6 +1383,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ uint32_t flags);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
@@ -1638,6 +1643,11 @@ int xc_physdev_map_pirq_msi(xc_interface *xch,
 int entry_nr,
 uint64_t table_base);
 
+int xc_physdev_map_pirq_gsi(xc_interface *xch,
+uint32_t domid,
+int gsi,
+int *pirq);
+
 int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..e3538ec0ba80 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ uint32_t flags)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.flags = flags,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779

[XEN PATCH v14 3/5] x86/domctl: Add hypercall to set the access of x86 gsi

2024-09-03 Thread Jiqian Chen
Some types of domain, such as PVH, don't have PIRQs and don't do
PHYSDEVOP_map_pirq for each gsi. When passing through a device
to a guest on a PVH dom0, the call stack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in function
domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
irq on the Xen side.
What's more, the current hypercall XEN_DOMCTL_irq_permission requires
a pirq to be passed in to set the access of an irq, so it is not
suitable for a dom0 that doesn't have PIRQs.

So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/revoke
the permission of an irq (translated from an x86 gsi) to domU when
dom0 has no PIRQs.

Regarding the translation from gsi to irq: if there are ACPI
override entries, the translation is taken from them; otherwise the
gsi is identity mapped to the irq.
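
The translation rule above can be illustrated with a simplified, standalone
model. It is not the hypervisor's gsi_2_irq: struct irq_override,
toy_gsi_to_irq and the explicit override table are hypothetical, and the
rejection of irq 0 follows the changelog notes below (the "<= 0" check)
rather than any code shown here.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one ACPI interrupt source override entry. */
struct irq_override {
    unsigned int irq;   /* source IRQ */
    unsigned int gsi;   /* GSI the source IRQ is routed to */
};

/*
 * Toy translation: use the override entry if one exists for this gsi,
 * otherwise identity-map gsi to irq. Out-of-range gsi values and a
 * resulting irq of 0 are reported as -EINVAL.
 */
static int toy_gsi_to_irq(const struct irq_override *ovr, size_t n,
                          unsigned int gsi, unsigned int highest_gsi)
{
    unsigned int irq = gsi;   /* default: identity mapping */
    size_t i;

    if (gsi > highest_gsi)
        return -EINVAL;

    for (i = 0; i < n; i++) {
        if (ovr[i].gsi == gsi) {
            irq = ovr[i].irq;
            break;
        }
    }

    return irq ? (int)irq : -EINVAL;
}
```

With a table such as { {0, 2}, {5, 20} }, gsi 20 would translate to irq 5,
gsi 16 would map to itself, and gsi 2 would be rejected because it resolves
to irq 0.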

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
CC: Daniel P . Smith 
Remaining unsolved comment @Daniel P . Smith:
+ret = -EPERM;
+if ( !irq_access_permitted(currd, irq) ||
+ xsm_irq_permission(XSM_HOOK, d, irq, flags) )
+break;
Is it okay to issue the XSM check using the translated value(irq),
not the one(gsi) that was originally passed into the hypercall?
---
v13->v14 changes:
No.

v12->v13 changes:
For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change 
its type from uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and 
XEN_DOMCTL_GSI_GRANT macros.
Move "gsi > highest_gsi()" into function gsi_2_irq.
Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int 
type.
Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK".
Delete unnecessary goto statements and change to direct break.
Add description in commit message to explain how gsi to irq is converted.

v11->v12 changes:
Change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to remove 
"__init" of highest_gsi function.
Change the check of irq boundary from <0 to <=0, and remove unnecessary space.
Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.

v10->v11 changes:
Extracted from patch#5 of v10 into a separate patch.
Add non-zero judgment for other bits of allow_access.
Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )".
Change the error exit path identifier "out" to "gsi_permission_out".
Use ARRAY_SIZE() instead of open coding.

v9->v10 changes:
Modified the commit message to further describe the purpose of adding 
XEN_DOMCTL_gsi_permission.
Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, 
and used currd instead of current->domain.
In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original 
new code, and error handling for irq0 was added.
Deleted the extra spaces in the upper and lower lines of the struct 
xen_domctl_gsi_permission definition.

v8->v9 changes:
Change the commit message to describe more why we need this new hypercall.
Add comment above "if ( is_pv_domain(current->domain) || 
has_pirq(current->domain) )" to explain why we need this check.
Add gsi_2_irq to transform gsi to irq, instead of considering gsi == irq.
Add explicit padding to struct xen_domctl_gsi_permission.

v5->v8 changes:
Nothing.

v4->v5 changes:
New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi.
---
 xen/arch/x86/domctl.c  | 29 +
 xen/arch/x86/include/asm/io_apic.h |  2 ++
 xen/arch/x86/io_apic.c | 21 +
 xen/arch/x86/mpparse.c |  7 +++
 xen/include/public/domctl.h| 10 ++
 xen/xsm/flask/hooks.c  |  1 +
 6 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 68b5b46d1a83..60b5578c47f8 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int update_domain_cpu_policy(struct domain *d,
 xen_domctl_cpu_policy_t *xdpc)
@@ -237,6 +238,34 @@ long arch_do_domctl(
 break;
 }
 
+case XEN_DOMCTL_gsi_permission:
+{
+int irq;
+unsigned int gsi = domctl->u.gsi_permission.gsi;
+uint32_t flags = domctl->u.gsi_permission.flags;
+
+/* Check all bits are zero except lowest bit */
+ret = -EINVAL;
+if ( flags & ~XEN_DOMCTL_GSI_ACTION_MASK )
+break;
+
+ret = irq = gsi_2_irq(gsi);
+if ( ret <= 0 )
+break;
+
+ret = -EPERM;
+if ( !irq_access_permitted(currd, irq) ||
+ xsm_irq_permission(XSM_HOOK, d, irq, flags) )
+break;
+
+if ( flags )
+ret = irq_permit_access(d, irq);
+else
+ret = irq_deny_access(d, 

[XEN PATCH v14 0/5] Support device passthrough when dom0 is PVH on Xen

2024-09-03 Thread Jiqian Chen
Hi All,
This is v14 series to support passthrough when dom0 is PVH
The expected merge order of this series is the first two patches in this 
series, then patches on
kernel side, then the last three patches in this series.

v13->v14 changes:
* patch#1: Removed the check ( !is_pci_passthrough_enabled() ).
   Added if ( dev_reset.flags & ~PCI_DEVICE_RESET_MASK ) to check if 
the other bits are zero.

* patch#2: Modified the commit message.
   
Because patch#3 of v13 has been merged, the sequence numbers of the following 
patches are one lower than in v13.
   
* patch#3~5: No changes.

Best regards,
Jiqian Chen


v12->v13 changes:
Due to major changes in the codes, all the Reviewed-by received before have 
been removed.
Please review them again.
* patch#1: Delete all "state" words in new code, because it is not necessary.
   Delete unnecessary parameter reset_type of function 
vpci_reset_device, and changed this
   function to inline function.
   Add description to commit message to indicate that the 
classification of reset types is
   for possible different behaviors in the future.
   Rename reset_type of struct pci_device_reset to flags, and modified 
the value of macro
   definition of reset, let them occupy two lowest bits.
   Change the function vpci_reset_device to an inline function and 
delete the
   "ASSERT(rw_is_write_locked(&pdev->domain->pci_lock))"; because this 
exists in subsequent
   functions and it accesses domain and pci_lock, which will affect the 
compilation process.
* patch#2: Remove the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and 
added a corresponding
   description in the commit message.
* patch#3: Add more detailed descriptions into commit message not just 
callstack.
* patch#4: For struct xen_domctl_gsi_permission, rename "access_flag" to 
"flags", change its type from
   uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and 
XEN_DOMCTL_GSI_GRANT macros.
   Move "gsi > highest_gsi()" into function gsi_2_irq.
   Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to 
unsigned int type.
   Delete unnecessary spaces and brackets around 
"~XEN_DOMCTL_GSI_ACTION_MASK".
   Delete unnecessary goto statements and change to direct break.
   Add description in commit message to explain how gsi to irq is 
converted.
* patch#5: Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi 
to avoid confusion with
   the physdev namespace.
   Move the implementation of xc_pcidev_get_gsi into xc_linux.c.
   Directly use xencall_fd(xch->xcall) in the function 
xc_pcidev_get_gsi instead of opening
   "privcmd".
* patch#6: Delete patch #6 of v12, and added function xc_physdev_map_pirq_gsi 
to map pirq for gsi.
   For functions that generate libxl error, changed the return value 
from -1 to ERROR_*.
   Instead of declaring "ctx", use the macro "CTX".
   Add the function libxl__arch_local_domain_has_pirq_notion to 
determine if there is a concept
   of pirq in the domain where xl is located.
   In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use 
map_pirq to obtain the pirq
   corresponding to gsi.


v11->v12 changes:
* patch#1: Change the title of this patch.
   Remove unnecessary notes, erroneous stamps, and #define.
* patch#2: Avoid using return, set error code instead when (un)map is not 
allowed.
   Due to functional change in v11, remove the Reviewed-by of Stefano.
* patch#3: Add more detailed descriptions into commit message not just 
callstack.

patch#4 in v11: remove from this series and upstream individually.

* patch#4: is patch#5 of v11, change nr_irqs_gsi to highest_gsi() to check gsi 
boundary, then need to
   remove "__init" of highest_gsi function.
   Change the check of irq boundary from <0 to <=0, and remove 
unnecessary space.
   Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.
* patch#5: Add explanation of whether the caller of xc_physdev_map_pirq is 
affected.


v10->v11 changes:
* patch#1: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to 
the next line.
   Delete unnecessary local variables "struct physdev_pci_device *dev".
   Downgrade printk to dprintk.
   Moved struct pci_device_state_reset to the public header file.
   Delete enum pci_device_state_reset_type, and use macro definitions 
to represent different
   reset types.
   Delete pci_device_state_reset_method, and add switch cases in 
PHYSDEVOP_pci_device_state_reset
   to handle different reset functions.
   Add reset type as a

[RFC XEN PATCH v13 6/6] tools: Add new function to do PIRQ (un)map on PVH dom0

2024-08-16 Thread Jiqian Chen
When dom0 is PVH and a device is passed through to domU, xl uses
the gsi number of the device to do a pirq mapping, see
pci_add_dm_done->xc_physdev_map_pirq. But the gsi number is
read from the file /sys/bus/pci/devices//irq, which confuses
irq with gsi: they are in different spaces and are not equal,
so the mapping fails.
To solve this issue, use xc_physdev_gsi_from_dev to get the
real gsi and add a new function xc_physdev_map_pirq_gsi to get
a free pirq for the gsi. The current function
xc_physdev_map_pirq is not reused because it doesn't support
allocating a free pirq, and changing it would affect its
callers.

Besides, PVH dom0 doesn't have the PIRQ flag, so it doesn't do
PHYSDEVOP_map_pirq for each gsi. As a result, the grant callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails at function
domain_pirq_to_irq. The old hypercall XEN_DOMCTL_irq_permission
also requires passing in a pirq, so it is not suitable for a dom0
that doesn't have PIRQs to grant irq permission.
To solve this issue, use the new hypercall
XEN_DOMCTL_gsi_permission to grant the permission of the irq
(translated from gsi) to domU when dom0 has no PIRQs.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to 
be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
---
v12->v13 changes:
Deleted patch #6 of v12, and added function xc_physdev_map_pirq_gsi to map pirq 
for gsi.
For functions that generate libxl error, changed the return value from -1 to 
ERROR_*.
Instead of declaring "ctx", use the macro "CTX".
Add the function libxl__arch_local_domain_has_pirq_notion to determine if 
there is a concept of pirq in the domain where xl is located.
In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use map_pirq to 
obtain the pirq corresponding to gsi.

v11->v12 changes:
Nothing.

v10->v11 changes:
New patch
Modification of the tools part of patches#4 and #5 of v10, use 
privcmd_gsi_from_dev to get gsi, and use XEN_DOMCTL_gsi_permission to grant gsi.
Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID.
Add libxl__arch_hvm_map_gsi to distinguish x86 related implementations.
Add a list pcidev_pirq_list to record the relationship between sbdf and pirq, 
which can be used to obtain the corresponding pirq when unmap PIRQ.
---
 tools/include/xenctrl.h   |  10 +++
 tools/libs/ctrl/xc_domain.c   |  15 +
 tools/libs/ctrl/xc_physdev.c  |  27 
 tools/libs/light/libxl_arch.h |   6 ++
 tools/libs/light/libxl_arm.c  |  15 +
 tools/libs/light/libxl_pci.c  | 112 --
 tools/libs/light/libxl_x86.c  |  72 ++
 7 files changed, 212 insertions(+), 45 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 82de6748f7a7..c798472995f7 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ uint32_t flags);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
@@ -1637,6 +1642,11 @@ int xc_physdev_map_pirq_msi(xc_interface *xch,
 int entry_nr,
 uint64_t table_base);
 
+int xc_physdev_map_pirq_gsi(xc_interface *xch,
+uint32_t domid,
+int gsi,
+int *pirq);
+
 int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..e3538ec0ba80 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ uint32_t flags)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.flags = flags,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779ce8..c752cd1f4410 100644
-

[RFC XEN PATCH v13 5/6] tools: Add new function to get gsi from dev

2024-08-16 Thread Jiqian Chen
When passing through a device to domU, QEMU and xl tools use its gsi
number to do pirq mapping, see QEMU code
xen_pt_realize->xc_physdev_map_pirq, and xl code
pci_add_dm_done->xc_physdev_map_pirq. But the gsi number is read
from the file /sys/bus/pci/devices//irq, which is wrong, because
irq is not equal to gsi: they are in different spaces, so pirq
mapping fails.

The current code also gives userspace no way to get the gsi.
So add a new function to get the gsi; the corresponding ioctl
is implemented on the Linux kernel side.
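
For illustration only (not part of this patch): a minimal sketch of how a
caller might build the 32-bit sbdf value that the new function takes. The
helper names below are hypothetical, and the bit layout (segment in bits
31..16, bus in 15..8, device in 7..3, function in 2..0) is an assumption
based on the usual PCI segment/bus/devfn encoding.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: pack segment, bus, device and
 * function into one 32-bit value. The layout is an assumption:
 * segment in bits 31..16, bus in 15..8, device in 7..3, function in 2..0.
 */
static inline uint32_t sketch_pack_sbdf(uint16_t seg, uint8_t bus,
                                        uint8_t dev, uint8_t func)
{
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)(dev & 0x1f) << 3) | (uint32_t)(func & 0x7);
}

/* Unpack helpers mirroring the packing above. */
static inline uint16_t sketch_sbdf_seg(uint32_t sbdf)
{
    return (uint16_t)(sbdf >> 16);
}

static inline uint8_t sketch_sbdf_bus(uint32_t sbdf)
{
    return (uint8_t)((sbdf >> 8) & 0xff);
}

static inline uint8_t sketch_sbdf_dev(uint32_t sbdf)
{
    return (uint8_t)((sbdf >> 3) & 0x1f);
}

static inline uint8_t sketch_sbdf_func(uint32_t sbdf)
{
    return (uint8_t)(sbdf & 0x7);
}
```

The packed value would then be passed as the sbdf argument of
xc_pcidev_get_gsi(xch, sbdf), which returns a negative value on failure and
the gsi otherwise.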

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to 
be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
---
v12->v13 changes:
Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi to avoid 
confusion with physdev namespace.
Move the implementation of xc_pcidev_get_gsi into xc_linux.c.
Directly use xencall_fd(xch->xcall) in the function xc_pcidev_get_gsi instead 
of opening "privcmd".

v11->v12 changes:
Nothing.

v10->v11 changes:
Patch#4 of v10, directly open "/dev/xen/privcmd" in the function 
xc_physdev_gsi_from_dev instead of adding unnecessary functions to libxencall.
Change the type of gsi in the structure privcmd_gsi_from_dev from int to u32.

v9->v10 changes:
Extract the implementation of xc_physdev_gsi_from_dev to be a new patch.
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/ctrl/xc_freebsd.c  |  6 ++
 tools/libs/ctrl/xc_linux.c| 20 
 tools/libs/ctrl/xc_minios.c   |  6 ++
 tools/libs/ctrl/xc_netbsd.c   |  6 ++
 tools/libs/ctrl/xc_solaris.c  |  6 ++
 7 files changed, 53 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h 
b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..607dfa2287bc 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_pcidev_get_gsi {
+   __u32 sbdf;
+   __u32 gsi;
+} privcmd_pcidev_get_gsi_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_PCIDEV_GET_GSI   \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_pcidev_get_gsi_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..82de6748f7a7 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/ctrl/xc_freebsd.c b/tools/libs/ctrl/xc_freebsd.c
index 9dd48a3a08bb..9019fc663361 100644
--- a/tools/libs/ctrl/xc_freebsd.c
+++ b/tools/libs/ctrl/xc_freebsd.c
@@ -60,6 +60,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, 
size_t size)
 return ptr;
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+errno = ENOSYS;
+return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/ctrl/xc_linux.c b/tools/libs/ctrl/xc_linux.c
index c67c71c08be3..92591e49a1c8 100644
--- a/tools/libs/ctrl/xc_linux.c
+++ b/tools/libs/ctrl/xc_linux.c
@@ -66,6 +66,26 @@ void *xc_memalign(xc_interface *xch, size_t alignment, 
size_t size)
 return ptr;
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+int ret;
+privcmd_pcidev_get_gsi_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = 0,
+};
+
+ret = ioctl(xencall_fd(xch->xcall),
+IOCTL_PRIVCMD_PCIDEV_GET_GSI, &dev_gsi);
+
+if (ret < 0) {
+PERROR("Failed to get gsi from dev");
+} else {
+ret = dev_gsi.gsi;
+}
+
+return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/ctrl/xc_minios.c b/tools/libs/ctrl/xc_minios.c
index 3dea7a78a576..462af827b33c 100644
--- a/tools/libs/ctrl/xc_minios.c
+++ b/tools/libs/ctrl/xc_minios.c
@@ -47,6 +47,12 @@ void *xc_memalign(xc_interface *xch, size_t alignment, 
size_t size)
 return memalign(alignment, size);
 }
 
+int xc_pcidev_get_gsi(xc_interface *xch, uint32_t sbdf)
+{
+errno = ENOSYS;
+return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/lib

[XEN PATCH v13 1/6] xen/pci: Add hypercall to support reset of pcidev

2024-08-16 Thread Jiqian Chen
When a device has been reset on the dom0 side, the Xen hypervisor
doesn't get a notification, so the cached state in vpci is
out of date compared with the real device state.

To solve that problem, add a new hypercall to support the reset
of a pcidev and clear the vpci state of the device, so that once
the state of the device is reset on the dom0 side, dom0 can call
this hypercall to notify the hypervisor.

The behavior of different reset types may be different in the
future, so divide them now so that they can be easily modified
in the future without affecting the hypercall interface.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
v12->v13 changes:
Deleted all "state" words in new code, because it is not necessary.
Deleted unnecessary parameter reset_type of function vpci_reset_device, and 
changed this function to inline function
Added description to commit message to indicate that the classification of 
reset types is for possible different behaviors in the future
Renamed reset_type of struct pci_device_reset to flags, and modified the value 
of macro definition of reset, let them occupy two lowest bits.
Change the function vpci_reset_device to an inline function and delete the 
ASSERT(rw_is_write_locked(&pdev->domain->pci_lock)); because this call exists 
in subsequent functions and it accesses domain and pci_lock, which will affect 
the compilation process.

v11->v12 changes:
Change the title of this patch(Add hypercall to support reset of pcidev).
Remove unnecessary notes, erroneous stamps, and #define.

v10->v11 changes:
Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to the next 
line.
Delete unnecessary local variables "struct physdev_pci_device *dev".
Downgrade printk to dprintk.
Moved struct pci_device_state_reset to the public header file.
Delete enum pci_device_state_reset_type, and use macro definitions to represent 
different reset types.
Delete pci_device_state_reset_method, and add switch cases in 
PHYSDEVOP_pci_device_state_reset to handle different reset functions.
Add reset type as a function parameter for vpci_reset_device_state for possible 
future use.

v9->v10 changes:
Nothing.

v8->v9 changes:
Move pcidevs_unlock below write_lock, and remove "ASSERT(pcidevs_locked());" 
from vpci_reset_device_state;
Add pci_device_state_reset_type to distinguish the reset types.

v7->v8 changes:
Nothing.

v6->v7 changes:
Nothing.

v5->v6 changes:
Rebase code and change old function vpci_remove_device, vpci_add_handlers to 
vpci_deassign_device, vpci_assign_device.

v4->v5 changes:
Add pci_lock wrap function vpci_reset_device_state.

v3->v4 changes:
Change the comment of PHYSDEVOP_pci_device_state_reset;
Move printings behind pcidevs_unlock.

v2->v3 changes:
Move the content out of pci_reset_device_state and delete 
pci_reset_device_state;
Add xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset;
Add description for PHYSDEVOP_pci_device_state_reset;
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 52 
 xen/include/public/physdev.h | 17 
 xen/include/xen/vpci.h   |  6 +
 4 files changed, 76 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index c1bd17571e47..68815b03eb25 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..980ff1ba3d07 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_reset:
+{
+struct pci_device_reset dev_reset;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+ret = -EOPNOTSUPP;
+if ( !is_pci_passthrough_enabled() )
+break;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+break;
+
+sbdf = PCI_SBDF(dev_reset.dev.seg,
+dev_reset.dev.bus,
+dev_reset.dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write

[XEN PATCH v13 4/6] x86/domctl: Add hypercall to set the access of x86 gsi

2024-08-16 Thread Jiqian Chen
Some types of domains, like PVH, don't have PIRQs and don't do
PHYSDEVOP_map_pirq for each gsi. When passing through a device
to a guest based on PVH dom0, the callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails at function
domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
irq on the Xen side.
What's more, the current hypercall XEN_DOMCTL_irq_permission requires
passing in a pirq to set the access of an irq, so it is not suitable
for a dom0 that doesn't have PIRQs.

So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/revoke
the permission of an irq (translated from an x86 gsi) to domU when
dom0 has no PIRQs.

Regarding the translation from gsi to irq: if there is an ACPI
override entry for the gsi, the translation is taken from it;
otherwise the gsi is identity mapped to the irq.
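
For illustration only (not the code added by this patch): a minimal sketch of
the translation rule described above. The struct, the table contents and the
function name are hypothetical stand-ins; the real gsi_2_irq in Xen works on
the IO-APIC data structures instead of a flat table.

```c
#include <stddef.h>

/* Hypothetical stand-in for one ACPI interrupt source override entry. */
struct sketch_gsi_override {
    unsigned int gsi;   /* global system interrupt number */
    int irq;            /* irq that this gsi is remapped to */
};

/* Hypothetical example table: gsi 20 is overridden to irq 5. */
static const struct sketch_gsi_override sketch_overrides[] = {
    { 20, 5 },
};

/*
 * Translate a gsi to an irq: use a matching override entry if one exists,
 * otherwise fall back to the identity mapping. Returns -1 when the gsi is
 * above highest_gsi, the sketch's stand-in for a bounds check.
 */
static int sketch_gsi_to_irq(unsigned int gsi, unsigned int highest_gsi)
{
    size_t i;
    size_t n = sizeof(sketch_overrides) / sizeof(sketch_overrides[0]);

    if (gsi > highest_gsi)
        return -1;

    for (i = 0; i < n; i++)
        if (sketch_overrides[i].gsi == gsi)
            return sketch_overrides[i].irq;

    return (int)gsi;
}
```

In the patch itself, the domctl handler treats any result <= 0 from the
translation as a failure before it checks permissions.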

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
CC: Daniel P . Smith 
Remaining comment @Daniel P . Smith:
+ret = -EPERM;
+if ( !irq_access_permitted(currd, irq) ||
+ xsm_irq_permission(XSM_HOOK, d, irq, flags) )
+break;
Is it okay to issue the XSM check using the translated value(irq),
not the one(gsi) that was originally passed into the hypercall?
---
v12->v13 changes:
For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change 
its type from uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and 
XEN_DOMCTL_GSI_GRANT macros.
Move "gsi > highest_gsi()" into function gsi_2_irq.
Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int 
type.
Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK".
Delete unnecessary goto statements and change to direct break.
Add description in commit message to explain how gsi to irq is converted.

v11->v12 changes:
Change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to remove 
"__init" of highest_gsi function.
Change the check of irq boundary from <0 to <=0, and remove unnecessary space.
Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.

v10->v11 changes:
Extracted from patch#5 of v10 into a separate patch.
Add non-zero judgment for other bits of allow_access.
Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )".
Change the error exit path identifier "out" to "gsi_permission_out".
Use ARRAY_SIZE() instead of open coding.

v9->v10 changes:
Modified the commit message to further describe the purpose of adding 
XEN_DOMCTL_gsi_permission.
Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, 
and used currd instead of current->domain.
In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original 
new code, and error handling for irq0 was added.
Deleted the extra spaces in the upper and lower lines of the struct 
xen_domctl_gsi_permission definition.

v8->v9 changes:
Change the commit message to describe more why we need this new hypercall.
Add comment above "if ( is_pv_domain(current->domain) || 
has_pirq(current->domain) )" to explain why we need this check.
Add gsi_2_irq to transform gsi to irq, instead of considering gsi == irq.
Add explicit padding to struct xen_domctl_gsi_permission.

v5->v8 changes:
Nothing.

v4->v5 changes:
New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi.
---
 xen/arch/x86/domctl.c  | 29 +
 xen/arch/x86/include/asm/io_apic.h |  2 ++
 xen/arch/x86/io_apic.c | 21 +
 xen/arch/x86/mpparse.c |  7 +++
 xen/include/public/domctl.h| 10 ++
 xen/xsm/flask/hooks.c  |  1 +
 6 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 68b5b46d1a83..60b5578c47f8 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int update_domain_cpu_policy(struct domain *d,
 xen_domctl_cpu_policy_t *xdpc)
@@ -237,6 +238,34 @@ long arch_do_domctl(
 break;
 }
 
+case XEN_DOMCTL_gsi_permission:
+{
+int irq;
+unsigned int gsi = domctl->u.gsi_permission.gsi;
+uint32_t flags = domctl->u.gsi_permission.flags;
+
+/* Check all bits are zero except lowest bit */
+ret = -EINVAL;
+if ( flags & ~XEN_DOMCTL_GSI_ACTION_MASK )
+break;
+
+ret = irq = gsi_2_irq(gsi);
+if ( ret <= 0 )
+break;
+
+ret = -EPERM;
+if ( !irq_access_permitted(currd, irq) ||
+ xsm_irq_permission(XSM_HOOK, d, irq, flags) )
+break;
+
+if ( flags )
+ret = irq_permit_access(d, irq);
+else
+ret = irq_deny_access(d, irq);
+
+break;
+}
+
 cas

[XEN PATCH v13 3/6] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-08-16 Thread Jiqian Chen
The gsi of a passthrough device must be configured for it to be
able to be mapped into an hvm domU.
But when dom0 is PVH, the gsis may not get registered (see the
clarification below). In that case the apic, pin and irq info is
not added to the irq_2_pin list and the handler of irq_desc is not
set, so setting the ioapic affinity and vector fails when passing
through a device.

To fix the above problem, new code on the Linux kernel side will
need to call PHYSDEVOP_setup_gsi for passthrough devices to
register the gsi when dom0 is PVH.

So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for that
purpose.

Two questions need clarifying:
First, why does the gsi of a device belonging to PVH dom0 work?
Because when a driver is probed for a normal device, it uses the
normal probe function of a pci device. In its callstack, it requests
the irq and unmasks the corresponding ioapic pin of the gsi, which
traps into Xen and finally registers the gsi.
The callstack is (on the Linux kernel side) pci_device_probe->
request_threaded_irq-> irq_startup-> __unmask_ioapic->
io_apic_write, then it traps into Xen: hvmemul_do_io->
hvm_io_intercept-> hvm_process_io_intercept->
vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
So the gsi gets registered.

Second, why doesn't the gsi of a passthrough device work when dom0
is PVH?
Because when a device is assigned for passthrough, it uses the
specific probe function of pciback. In its callstack, it doesn't
install a fake irq handler because the ISR is not running. So
mp_register_gsi on the Xen side is never called, and the gsi is not
registered.
The callstack is (on the Linux kernel side) pcistub_probe->pcistub_seize->
pcistub_init_device-> xen_pcibk_reset_device->
xen_pcibk_control_isr->isr_on==0.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0b7fc060b4e2..81883c8d4f60 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -82,6 +82,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return -ENOSYS;
 break;
 
+case PHYSDEVOP_setup_gsi:
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
-- 
2.34.1




[XEN PATCH v13 2/6] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-08-16 Thread Jiqian Chen
When running Xen with PVH dom0 and an hvm domU, hvm will map a pirq
for a passthrough device by using its gsi, see qemu code
xen_pt_realize->xc_physdev_map_pirq and libxl code
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq
is not allowed, because currd is PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
PHYSDEVOP_unmap_pirq for the device removal path to unmap the pirq,
so that the interrupt of a passthrough device can be successfully
mapped to a pirq for a domU with a notion of PIRQ when dom0 is PVH.

This exposes the functionality to a wider audience than (presently)
necessary (like PVH domU), as no further restrictions are added.
And there already are some scenarios for domains without
X86_EMU_USE_PIRQ to use these functions.
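
For illustration only (not the hypervisor code): a minimal sketch of the
filtering behaviour this change aims for. The enum values and the function
name are hypothetical stand-ins for the PHYSDEVOP_* commands and for the
switch in hvm_physdev_op; has_pirq models the X86_EMU_USE_PIRQ check.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a few PHYSDEVOP_* command numbers. */
enum sketch_cmd {
    SKETCH_MAP_PIRQ,
    SKETCH_UNMAP_PIRQ,
    SKETCH_EOI,
    SKETCH_GET_FREE_PIRQ,
};

/*
 * Returns 0 if the command may proceed to the common handler, or -ENOSYS
 * if it is filtered out for a domain without the PIRQ flag.
 */
static int sketch_filter(enum sketch_cmd cmd, bool has_pirq)
{
    switch (cmd) {
    case SKETCH_MAP_PIRQ:
    case SKETCH_UNMAP_PIRQ:
        /* After the change: allowed regardless of the PIRQ flag. */
        break;

    case SKETCH_EOI:
    case SKETCH_GET_FREE_PIRQ:
        /* Still gated on the PIRQ flag. */
        if (!has_pirq)
            return -ENOSYS;
        break;
    }

    return 0;
}
```

With this shape, a PVH dom0 (has_pirq == false) can issue the (un)map
commands on behalf of a domU, while the other pirq-related commands keep
failing with -ENOSYS.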

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
v12->v13 changes:
Removed the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and added a 
corresponding description in the commit message.

v11->v12 changes:
Avoid using return, set error code instead when (un)map is not allowed.

v10->v11 changes:
Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq 
from being executed when domU has no pirq, instead of just preventing 
self-mapping.
And modify the description of the commit message accordingly.

v9->v10 changes:
Indent the comments above PHYSDEVOP_map_pirq according to the code style.

v8->v9 changes:
Add a comment above PHYSDEVOP_map_pirq to describe why need this hypercall.
Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" 
to "d == current->domain".

v7->v8 changes:
Add the domid check(domid == DOMID_SELF) to prevent self map when guest doesn't 
use pirq.
That check was missed in the previous version.

v6->v7 changes:
Nothing.

v5->v6 changes:
Nothing.

v4->v5 changes:
Move the check of self map_pirq to physdev.c, and change to check if the caller 
has PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op.

v3->v4 changes:
add check to prevent PVH self map.

v2->v3 changes:
Due to changes in the implementation of the second patch on the kernel side (it 
will do setup_gsi and map_pirq when assigning a device to passthrough), add 
PHYSDEVOP_setup_gsi for PVH dom0, and we need to support self mapping.
---
 xen/arch/x86/hvm/hypercall.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 68815b03eb25..0b7fc060b4e2 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[XEN PATCH v13 0/6] Support device passthrough when dom0 is PVH on Xen

2024-08-16 Thread Jiqian Chen
Hi All,
This is the v13 series to support passthrough when dom0 is PVH.
The expected merge order is: the first three patches in this 
series, then the patches on the
kernel side, then the last three patches in this series.

v12->v13 changes:
Due to major changes in the code, all the Reviewed-by tags received before have 
been removed.
Please review the patches again.
* patch#1: Delete all "state" words in new code, because it is not necessary.
   Delete unnecessary parameter reset_type of function 
vpci_reset_device, and changed this
   function to inline function.
   Add description to commit message to indicate that the 
classification of reset types is
   for possible different behaviors in the future.
   Rename reset_type of struct pci_device_reset to flags, and modified 
the value of macro
   definition of reset, let them occupy two lowest bits.
   Change the function vpci_reset_device to an inline function and 
delete the
   "ASSERT(rw_is_write_locked(&pdev->domain->pci_lock))"; because this 
exists in subsequent
   functions and it accesses domain and pci_lock, which will affect the 
compilation process.
* patch#2: Remove the PHYSDEVOP_(un)map_pirq restriction check for pvh domU and 
added a corresponding
   description in the commit message.
* patch#3: Add more detailed descriptions into commit message not just 
callstack.
* patch#4: For struct xen_domctl_gsi_permission, rename "access_flag" to 
"flags", change its type from
   uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE and 
XEN_DOMCTL_GSI_GRANT macros.
   Move "gsi > highest_gsi()" into function gsi_2_irq.
   Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to 
unsigned int type.
   Delete unnecessary spaces and brackets around 
"~XEN_DOMCTL_GSI_ACTION_MASK".
   Delete unnecessary goto statements and change to direct break.
   Add description in commit message to explain how gsi to irq is 
converted.
* patch#5: Rename the function xc_physdev_gsi_from_pcidev to xc_pcidev_get_gsi 
to avoid confusion with
   physdev namespace.
   Move the implementation of xc_pcidev_get_gsi into xc_linux.c.
   Directly use xencall_fd(xch->xcall) in the function 
xc_pcidev_get_gsi instead of opening
   "privcmd".
* patch#6: Delete patch #6 of v12, and added function xc_physdev_map_pirq_gsi 
to map pirq for gsi.
   For functions that generate libxl error, changed the return value 
from -1 to ERROR_*.
   Instead of declaring "ctx", use the macro "CTX".
   Add the function libxl__arch_local_domain_has_pirq_notion to 
determine if there is a concept
   of pirq in the domain where xl is located.
   In the function libxl__arch_hvm_unmap_gsi, before unmap_pirq, use 
map_pirq to obtain the pirq
   corresponding to gsi.

Best regards,
Jiqian Chen



v11->v12 changes:
* patch#1: Change the title of this patch.
   Remove unnecessary notes, erroneous stamps, and #define.
* patch#2: Avoid using return, set error code instead when (un)map is not 
allowed.
   Due to functional change in v11, remove the Reviewed-by of Stefano.
* patch#3: Add more detailed descriptions into commit message not just 
callstack.

patch#4 in v11: remove from this series and upstream individually.

* patch#4: is patch#5 of v11, change nr_irqs_gsi to highest_gsi() to check gsi 
boundary, then need to
   remove "__init" of highest_gsi function.
   Change the check of irq boundary from <0 to <=0, and remove 
unnecessary space.
   Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.
* patch#5: Add explanation of whether the caller of xc_physdev_map_pirq is 
affected.


v10->v11 changes:
* patch#1: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to 
the next line.
   Delete unnecessary local variables "struct physdev_pci_device *dev".
   Downgrade printk to dprintk.
   Moved struct pci_device_state_reset to the public header file.
   Delete enum pci_device_state_reset_type, and use macro definitions 
to represent different
   reset types.
   Delete pci_device_state_reset_method, and add switch cases in 
PHYSDEVOP_pci_device_state_reset
   to handle different reset functions.
   Add reset type as a function parameter for vpci_reset_device_state 
for possible future use
* patch#2: Delete the judgment of "d==currd", so that we can prevent 
physdev_(un)map_pirq from being
   executed when domU has no pirq, instead of just preventing 
self-mapping; and modify the
   description of the commit message accordingly.
* patch#3: Modify the commit message to explain why the gs

[RFC XEN PATCH v12 7/7] tools: Add new function to do PIRQ (un)map on PVH dom0

2024-07-08 Thread Jiqian Chen
When dom0 is PVH and a device is passed through to domU, xl uses
the gsi number of the device to do a pirq mapping, see
pci_add_dm_done->xc_physdev_map_pirq. But the gsi number is
read from the file /sys/bus/pci/devices//irq, which confuses
irq with gsi: they are in different spaces and are not equal,
so the mapping fails.
To solve this issue, use xc_physdev_gsi_from_dev to get the
real gsi and then map the pirq.

Besides, PVH dom0 doesn't have the PIRQ flag, so it doesn't do
PHYSDEVOP_map_pirq for each gsi. As a result, the grant callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails at function
domain_pirq_to_irq. The old hypercall XEN_DOMCTL_irq_permission
also requires passing in a pirq, so it is not suitable for a dom0
that doesn't have PIRQs to grant irq permission.
To solve this issue, use the new hypercall
XEN_DOMCTL_gsi_permission to grant the permission of the irq
(translated from gsi) to domU when dom0 has no PIRQs.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to 
be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
This patch must be merged after the patch on linux kernel side
---
 tools/include/xenctrl.h   |   5 ++
 tools/libs/ctrl/xc_domain.c   |  15 +
 tools/libs/light/libxl_arch.h |   4 ++
 tools/libs/light/libxl_arm.c  |  10 +++
 tools/libs/light/libxl_pci.c  |  17 ++
 tools/libs/light/libxl_x86.c  | 111 ++
 6 files changed, 162 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 3720e22b399a..9ff5f1810cf8 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ uint8_t access_flag);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..4c89f07e4d6e 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ uint8_t access_flag)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.access_flag = access_flag,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_arch.h b/tools/libs/light/libxl_arch.h
index f88f11d6de1d..11b736067951 100644
--- a/tools/libs/light/libxl_arch.h
+++ b/tools/libs/light/libxl_arch.h
@@ -91,6 +91,10 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
   libxl_domain_config *dst,
   const libxl_domain_config *src);
 
+_hidden
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
+_hidden
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
 #if defined(__i386__) || defined(__x86_64__)
 
 #define LAPIC_BASE_ADDRESS  0xfee0
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index a4029e3ac810..d869bbec769e 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -1774,6 +1774,16 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
 {
 }
 
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+return -1;
+}
+
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..3d25997921cc 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -17,6 +17,7 @@
 #include "libxl_osdeps.h" /* must come before any other headers */
 
 #include "libxl_internal.h"
+#include "libxl_arch.h"
 
 #define PCI_BDF"%04x:%02x:%02x.%01x"
 #define PCI_BDF_SHORT  "%02x:%02x.%01x"
@@ -1478,6 +1479,16 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
+
+/*
+ * When dom0 is PVH and mapping a x86 gsi to 

[RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev

2024-07-08 Thread Jiqian Chen
When passing through a device to domU, QEMU and the xl tools use its
gsi number to do the pirq mapping, see QEMU code
xen_pt_realize->xc_physdev_map_pirq, and xl code
pci_add_dm_done->xc_physdev_map_pirq. But the gsi number is read
from the file /sys/bus/pci/devices//irq, which is wrong, because
irq is not equal to gsi: they are in different spaces, so the pirq
mapping fails.

The current code also has no method for userspace to get the gsi.
For the above purpose, add a new function to get the gsi; the
corresponding ioctl is implemented on the linux kernel side.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
This patch must be merged after the patch on linux kernel side

CC: Anthony PERARD 
Remaining comment @Anthony PERARD:
Do I need to make "opening of /dev/xen/privcmd" a single function, then use it in this
patch and other libraries?
---
 tools/include/xen-sys/Linux/privcmd.h |  7 ++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/ctrl/xc_physdev.c  | 35 +++
 3 files changed, 44 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..4cf719102116 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_pcidev {
+   __u32 sbdf;
+   __u32 gsi;
+} privcmd_gsi_from_pcidev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_PCIDEV  \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_pcidev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..3720e22b399a 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index e9fcd755fa62..54edb0f3c0dc 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
 return rc;
 }
 
+int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
+{
+int rc = -1;
+
+#if defined(__linux__)
+int fd;
+privcmd_gsi_from_pcidev_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = 0,
+};
+
+fd = open("/dev/xen/privcmd", O_RDWR);
+
+if (fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV)) {
+/* Fallback to /proc/xen/privcmd */
+fd = open("/proc/xen/privcmd", O_RDWR);
+}
+
+if (fd < 0) {
+PERROR("Could not obtain handle on privileged command interface");
+return rc;
+}
+
+rc = ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi);
+close(fd);
+
+if (rc) {
+PERROR("Failed to get gsi from dev");
+} else {
+rc = dev_gsi.gsi;
+}
+#endif
+
+return rc;
+}
-- 
2.34.1
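
The sbdf argument of xc_physdev_gsi_from_pcidev above is a single 32-bit value. A minimal sketch of how a caller might pack it, assuming the conventional layout (segment in bits 31..16, bus in bits 15..8, device in bits 7..3, function in bits 2..0). The helper name pack_sbdf is hypothetical and is not part of this patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: pack segment, bus, device
 * and function into one 32-bit sbdf. The bit layout is an assumption
 * (the conventional PCI one); check it against the kernel side before use.
 */
static uint32_t pack_sbdf(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t fn)
{
    return ((uint32_t)seg << 16) |          /* segment: bits 31..16 */
           ((uint32_t)bus << 8) |           /* bus:     bits 15..8  */
           ((uint32_t)(dev & 0x1f) << 3) |  /* device:  bits 7..3   */
           (uint32_t)(fn & 0x7);            /* function: bits 2..0  */
}
```

With this version of the patch a caller would then pass the packed value to xc_physdev_gsi_from_pcidev(xch, sbdf) and treat a negative return value as failure.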




[XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-07-08 Thread Jiqian Chen
When running Xen with a PVH dom0 and an hvm domU, a pirq is mapped
for a passthrough device by using its gsi, see qemu code
xen_pt_realize->xc_physdev_map_pirq and libxl code
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not
allowed, because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the device removal path can unmap the
pirq. Also add a new check to prevent (un)map when the subject
domain doesn't have a notion of PIRQ.

This way the interrupt of a passthrough device can be
successfully mapped to a pirq for a domU with a notion of PIRQ
when dom0 is PVH.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c |  6 ++
 xen/arch/x86/physdev.c   | 12 ++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..03ada3c880bd 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd )
 {
+/*
+* Only being permitted for management of other domains.
+* Further restrictions are enforced in do_physdev_op.
+*/
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index d6dd622952a9..9f30a8c63a06 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( !d )
 break;
 
-ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
+/* Only mapping when the subject domain has a notion of PIRQ */
+if ( !is_hvm_domain(d) || has_pirq(d) )
+ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
+else
+ret = -EOPNOTSUPP;
 
 rcu_unlock_domain(d);
 
@@ -346,7 +350,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( !d )
 break;
 
-ret = physdev_unmap_pirq(d, unmap.pirq);
+/* Only unmapping when the subject domain has a notion of PIRQ */
+if ( !is_hvm_domain(d) || has_pirq(d) )
+ret = physdev_unmap_pirq(d, unmap.pirq);
+else
+ret = -EOPNOTSUPP;
 
 rcu_unlock_domain(d);
 
-- 
2.34.1
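
The check added to do_physdev_op above can be read as a small truth table. The sketch below models it with a plain struct. The struct and function names are hypothetical, and the assumption that is_hvm_domain() is also true for PVH domains is stated here rather than taken from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for the two domain properties the check looks at. */
struct dom_model {
    bool is_hvm;    /* models is_hvm_domain(d); assumed true for HVM and PVH */
    bool has_pirq;  /* models has_pirq(d), i.e. the X86_EMU_USE_PIRQ flag */
};

/* Returns 0 where the patch goes on to (un)map, -EOPNOTSUPP otherwise. */
static int pirq_op_allowed(const struct dom_model *d)
{
    if (!d->is_hvm || d->has_pirq)
        return 0;
    return -EOPNOTSUPP;
}
```

Under these assumptions a PV domain and an HVM domain with PIRQs pass the check, while a subject domain of HVM type without PIRQs is rejected.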




[XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev

2024-07-08 Thread Jiqian Chen
When a device has been reset on the dom0 side, the Xen hypervisor
doesn't get a notification, so the cached state in vpci is
out of date compared with the real device state.

To solve that problem, add a new hypercall to support the reset
of pcidev and clear the vpci state of the device, so that once the
state of the device is reset on the dom0 side, dom0 can call this
hypercall to notify the hypervisor.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 52 
 xen/drivers/vpci/vpci.c  | 10 +++
 xen/include/public/physdev.h | 16 +++
 xen/include/xen/vpci.h   |  8 ++
 5 files changed, 87 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..c0f47945d955 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset:
+{
+struct pci_device_state_reset dev_reset;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+ret = -EOPNOTSUPP;
+if ( !is_pci_passthrough_enabled() )
+break;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+break;
+
+sbdf = PCI_SBDF(dev_reset.dev.seg,
+dev_reset.dev.bus,
+dev_reset.dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+switch ( dev_reset.reset_type )
+{
+case PCI_DEVICE_STATE_RESET_COLD:
+case PCI_DEVICE_STATE_RESET_WARM:
+case PCI_DEVICE_STATE_RESET_HOT:
+case PCI_DEVICE_STATE_RESET_FLR:
+ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
+break;
+
+default:
+ret = -EOPNOTSUPP;
+break;
+}
+write_unlock(&pdev->domain->pci_lock);
+
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..7e914d1eff9f 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev,
+uint32_t reset_type)
+{
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..3cfde3fd2389 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
@@ -305,6 +312,15 @@ struct physdev_pci_device {
 typedef struct physdev_pci_device physdev_pci_device_t;
 DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
 
+struct pci_device_state_reset {
+physdev_pci_device_t dev;
+#define PCI_DEVICE_STATE_RESET_COLD 0
+#define PCI_DEVICE_STATE_RESET_WARM 1
+#define PCI_DEVICE_STATE_RESET_HOT  2
+#define PCI_DEVICE_STATE_RESET_FLR  3
+uint32_t reset_type;
+};
+
 #define PHYSDEVOP_DBGP_RESET_PREPARE1
 #define PHYSDEVOP_DBGP_RESET_DONE   2
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index da8d0f41e6f4..6be812dbc04a 100644
--- a/xen/i

[XEN PATCH v12 5/7] tools/libxc: Allow gsi be mapped into a free pirq

2024-07-08 Thread Jiqian Chen
Hypercall PHYSDEVOP_map_pirq supports mapping a gsi into either a
specific pirq or a free pirq, depending on the parameter pirq
(>0 or <0). But the current xc_physdev_map_pirq sets *pirq=index
when the parameter pirq is <0, which forces every case to be mapped
to a specific pirq. That has two problems: the caller can't get a
free pirq value, and the call fails once the specific pirq is
already mapped to another gsi.

So, change xc_physdev_map_pirq to allow a negative parameter to be
passed in, so that a free pirq is returned.

There are four callers of xc_physdev_map_pirq in the original code, so
clarify the effect on each below (only the pirq<0 case needs clarifying):

First, pci_add_dm_done->xc_physdev_map_pirq: it passes irq as the pirq
parameter, so pirq<0 means irq<0, which fails the "index < 0" check
in allocate_and_map_gsi_pirq with EINVAL; the logic is the same as in
the original code.

Second, domcreate_launch_dm->libxl__arch_domain_map_irq->
xc_physdev_map_pirq: the passed pirq is always >=0, so no effect.

Third, pyxc_physdev_map_pirq->xc_physdev_map_pirq: not sure, so add
the check logic into pyxc_physdev_map_pirq to keep the same behavior.

Fourth, xen_pt_realize->xc_physdev_map_pirq: it wants to allocate a
pirq for the gsi, but it doesn't need a pirq whose value equals
the value of the gsi. After this patch it gets a free pirq, which
also works.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/libs/ctrl/xc_physdev.c  | 2 +-
 tools/python/xen/lowlevel/xc/xc.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779ce8..e9fcd755fa62 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -50,7 +50,7 @@ int xc_physdev_map_pirq(xc_interface *xch,
 map.domid = domid;
 map.type = MAP_PIRQ_TYPE_GSI;
 map.index = index;
-map.pirq = *pirq < 0 ? index : *pirq;
+map.pirq = *pirq;
 
 rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));
 
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 9feb12ae2b16..f8c9db7115ee 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -774,6 +774,8 @@ static PyObject *pyxc_physdev_map_pirq(PyObject *self,
 if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iii", kwd_list,
   &dom, &index, &pirq) )
 return NULL;
+if ( pirq < 0 )
+pirq = index;
 ret = xc_physdev_map_pirq(xc->xc_handle, dom, index, &pirq);
 if ( ret != 0 )
   return pyxc_error_to_exception(xc->xc_handle);
-- 
2.34.1
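
The one-line change to xc_physdev_map_pirq above can be illustrated by comparing the value placed in map.pirq before and after the patch. The two function names below are hypothetical and only mirror the expressions in the diff:

```c
#include <assert.h>

/* Before the patch: a negative request was silently replaced by the gsi. */
static int requested_pirq_old(int index, int pirq)
{
    return pirq < 0 ? index : pirq;
}

/* After the patch: the caller's value is passed through unchanged. */
static int requested_pirq_new(int index, int pirq)
{
    (void)index;  /* the gsi no longer influences the requested pirq */
    return pirq;
}
```

With index 5 and pirq -1 the old expression requests pirq 5, while the new one forwards -1, which per the commit message asks the hypervisor for a free pirq. The python binding keeps the old behavior by doing the substitution itself.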




[XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-07-08 Thread Jiqian Chen
The gsi of a passthrough device must be configured for it to be
able to be mapped into a hvm domU.
But when dom0 is PVH, the gsis may not get registered (see the
clarification below). As a result the apic, pin and irq info is not
added to the irq_2_pin list and the handler of irq_desc is not set,
so when a device is passed through, setting the ioapic affinity and
vector fails.

To fix the above problem, new code on the Linux kernel side will
need to call PHYSDEVOP_setup_gsi for passthrough devices to
register the gsi when dom0 is PVH.

So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for the above
purpose.

Clarify two questions:
First, why does the gsi of devices belonging to PVH dom0 work?
Because when a driver is probed for a normal device, it uses the normal
probe function of the pci device; in its callstack it requests the irq
and unmasks the corresponding ioapic pin of the gsi, which traps into
xen and finally registers the gsi.
The callstack is (on linux kernel side) pci_device_probe->
request_threaded_irq-> irq_startup-> __unmask_ioapic->
io_apic_write, then trap into xen hvmemul_do_io->
hvm_io_intercept-> hvm_process_io_intercept->
vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
So the gsi gets registered.

Second, why doesn't the gsi of a passthrough device work when dom0
is PVH?
Because when a device is assigned for passthrough, it uses the specific
probe function of pciback; in its callstack it doesn't install a
fake irq handler because the ISR is not running. So
mp_register_gsi on the Xen side is never called, and the gsi is not
registered.
The callstack is (on linux kernel side) pcistub_probe->pcistub_seize->
pcistub_init_device-> xen_pcibk_reset_device->
xen_pcibk_control_isr->isr_on==0.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 03ada3c880bd..cfe82d0f96ed 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return -ENOSYS;
 break;
 
+case PHYSDEVOP_setup_gsi:
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
-- 
2.34.1




[XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi

2024-07-08 Thread Jiqian Chen
Some types of domains don't have PIRQs, like PVH, which doesn't do
PHYSDEVOP_map_pirq for each gsi. When passing through a device
to a guest on a PVH dom0, the callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in function
domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
irq on the Xen side.
What's more, the current hypercall XEN_DOMCTL_irq_permission requires
a pirq to be passed in to set the access of the irq, so it is not
suitable for a dom0 that doesn't have PIRQs.

So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
the permission of the irq (translated from the x86 gsi) to domU when
dom0 has no PIRQs.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
CC: Daniel P . Smith 
Remaining comment @Daniel P . Smith:
+ret = -EPERM;
+if ( !irq_access_permitted(currd, irq) ||
+ xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
+goto gsi_permission_out;
Is it okay to issue the XSM check using the translated value, 
not the one that was originally passed into the hypercall?
---
 xen/arch/x86/domctl.c  | 32 ++
 xen/arch/x86/include/asm/io_apic.h |  2 ++
 xen/arch/x86/io_apic.c | 17 
 xen/arch/x86/mpparse.c |  5 ++---
 xen/include/public/domctl.h|  9 +
 xen/xsm/flask/hooks.c  |  1 +
 6 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9190e11faaa3..4e9e4c4cfed3 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int update_domain_cpu_policy(struct domain *d,
 xen_domctl_cpu_policy_t *xdpc)
@@ -237,6 +238,37 @@ long arch_do_domctl(
 break;
 }
 
+case XEN_DOMCTL_gsi_permission:
+{
+int irq;
+unsigned int gsi = domctl->u.gsi_permission.gsi;
+uint8_t access_flag = domctl->u.gsi_permission.access_flag;
+
+/* Check all bits and pads are zero except lowest bit */
+ret = -EINVAL;
+if ( access_flag & ( ~XEN_DOMCTL_GSI_PERMISSION_MASK ) )
+goto gsi_permission_out;
+for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
+if ( domctl->u.gsi_permission.pad[i] )
+goto gsi_permission_out;
+
+if ( gsi > highest_gsi() || (irq = gsi_2_irq(gsi)) <= 0 )
+goto gsi_permission_out;
+
+ret = -EPERM;
+if ( !irq_access_permitted(currd, irq) ||
+ xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
+goto gsi_permission_out;
+
+if ( access_flag )
+ret = irq_permit_access(d, irq);
+else
+ret = irq_deny_access(d, irq);
+
+gsi_permission_out:
+break;
+}
+
 case XEN_DOMCTL_getpageframeinfo3:
 {
 unsigned int num = domctl->u.getpageframeinfo3.num;
diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
index 78268ea8f666..7e86d8337758 100644
--- a/xen/arch/x86/include/asm/io_apic.h
+++ b/xen/arch/x86/include/asm/io_apic.h
@@ -213,5 +213,7 @@ unsigned highest_gsi(void);
 
 int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
 int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
+int mp_find_ioapic(int gsi);
+int gsi_2_irq(int gsi);
 
 #endif
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index d2a313c4ac72..5968c8055671 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
 return irq;
 }
 
+int gsi_2_irq(int gsi)
+{
+int ioapic, pin, irq;
+
+ioapic = mp_find_ioapic(gsi);
+if ( ioapic < 0 )
+return -EINVAL;
+
+pin = gsi - io_apic_gsi_base(ioapic);
+
+irq = apic_pin_2_gsi_irq(ioapic, pin);
+if ( irq <= 0 )
+return -EINVAL;
+
+return irq;
+}
+
 static inline int IO_APIC_irq_trigger(int irq)
 {
 int apic, idx, pin;
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index d8ccab2449c6..7786a3337760 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -841,8 +841,7 @@ static struct mp_ioapic_routing {
 } mp_ioapic_routing[MAX_IO_APICS];
 
 
-static int mp_find_ioapic (
-   int gsi)
+int mp_find_ioapic(int gsi)
 {
unsigned inti;
 
@@ -914,7 +913,7 @@ void __init mp_register_ioapic (
return;
 }
 
-unsigned __init highest_gsi(void)
+unsigned highest_gsi(void)
 {
unsigned x, res = 0;
for (x = 0; x < nr_ioapics; x++)
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 2a49fe46ce25..877e35ab1376 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -464,6 +464,13 @@ struct xen_domctl_irq_permission {
 ui

[XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen

2024-07-08 Thread Jiqian Chen
Hi All,
This is the v12 series to support passthrough when dom0 is PVH.
The expected merge order is: the first three patches in this series, then the
patches on the kernel side, then the last four patches in this series.
v11->v12 changes:
* patch#1: Change the title of this patch.
   Remove unnecessary notes, erroneous stamps, and #define.
* patch#2: Avoid using return, set error code instead when (un)map is not allowed.
   Due to functional change in v11, remove the Reviewed-by of Stefano.
* patch#3: Add more detailed descriptions into commit message not just callstack.

patch#4 in v11: remove from this series and upstream individually.

* patch#4: is patch#5 of v11, change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to
   remove "__init" of highest_gsi function.
   Change the check of irq boundary from <0 to <=0, and remove unnecessary space.
   Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.
* patch#5: Add explanation of whether the caller of xc_physdev_map_pirq is affected.
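
The XEN_DOMCTL_GSI_PERMISSION_MASK check listed for patch#4 above can be sketched in isolation. The helper name check_access_flag is hypothetical; the mask value 1 is the one given in this changelog:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define XEN_DOMCTL_GSI_PERMISSION_MASK 1  /* value as stated in the changelog */

/* Hypothetical helper: 0 if only the lowest bit is used, -EINVAL otherwise. */
static int check_access_flag(uint8_t access_flag)
{
    /* Any bit outside the mask means the caller set a reserved bit. */
    if (access_flag & ~XEN_DOMCTL_GSI_PERMISSION_MASK)
        return -EINVAL;
    return 0;
}
```

Only the values 0 (deny) and 1 (grant) pass this check; every other value of the 8-bit flag is rejected, which keeps the remaining bits free for future use.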


Best regards,
Jiqian Chen



v10->v11 changes:
* patch#1: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to the next line.
   Delete unnecessary local variables "struct physdev_pci_device *dev".
   Downgrade printk to dprintk.
   Moved struct pci_device_state_reset to the public header file.
   Delete enum pci_device_state_reset_type, and use macro definitions to represent different
   reset types.
   Delete pci_device_state_reset_method, and add switch cases in PHYSDEVOP_pci_device_state_reset
   to handle different reset functions.
   Add reset type as a function parameter for vpci_reset_device_state for possible future use
* patch#2: Delete the judgment of "d==currd", so that we can prevent physdev_(un)map_pirq from being
   executed when domU has no pirq, instead of just preventing self-mapping; and modify the
   description of the commit message accordingly.
* patch#3: Modify the commit message to explain why the gsi of normal devices can work in PVH dom0 and why
   the passthrough device does not work in PVH dom0.
* patch#4: New patch, modification of allocate_pirq function, return the allocated pirq when there is
   already an allocated pirq and the caller has no specific requirements for pirq, and make it
   successful.
* patch#5: Modification on the hypervisor side proposed from patch#5 of v10.
   Add non-zero judgment for other bits of allow_access.
   Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )".
   Change the error exit path identifier "out" to "gsi_permission_out".
   Use ARRAY_SIZE() instead of open-coded.
* patch#6: New patch, modification of xc_physdev_map_pirq to support mapping gsi to an idle pirq.
* patch#7: Patch#4 of v10, directly open "/dev/xen/privcmd" in the function xc_physdev_gsi_from_dev
   instead of adding unnecessary functions to libxencall.
   Change the type of gsi in the structure privcmd_gsi_from_dev from int to u32.
* patch#8: Modification of the tools part of patches#4 and #5 of v10, use privcmd_gsi_from_dev to get
   gsi, and use XEN_DOMCTL_gsi_permission to grant gsi.
   Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID.
   Add libxl__arch_hvm_map_gsi to distinguish x86 related implementations.
   Add a list pcidev_pirq_list to record the relationship between sbdf and pirq, which can be
   used to obtain the corresponding pirq when unmap PIRQ.


v9->v10 changes:
* patch#2: Indent the comments above PHYSDEVOP_map_pirq according to the code style.
* patch#3: Modified the description in the commit message, changing "it calls" to "it will need to call",
   indicating that there will be new codes on the kernel side that will call PHYSDEVOP_setup_gsi.
   Also added an explanation of why the interrupt of passthrough device does not work if gsi is not
   registered.
* patch#4: Added define for CONFIG_X86 in tools/libs/light/Makefile to isolate x86 code in libxl_pci.c.
* patch#5: Modified the commit message to further describe the purpose of adding XEN_DOMCTL_gsi_permission.
   Deleted pci_device_set_gsi and called XEN_DOMCTL_gsi_permission directly in pci_add_dm_done.
   Added a check for all zeros in the padding field in XEN_DOMCTL_gsi_permission, and used currd
   instead of current->domain.
   In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the original new code, and
   error handling for irq0 was added.
   Deleted the extra spaces in the upper and lower lines of the struct xen_domctl_gsi_permission
   definition.
All patches have modified s

[PATCH for-4.19 v2] x86/physdev: Return pirq that irq was already mapped to

2024-07-08 Thread Jiqian Chen
Fix bug introduced by 0762e2502f1f ("x86/physdev: factor out the code to
allocate and map a pirq"). After that re-factoring, when pirq<0 and
current_pirq>0, meaning the caller wants to allocate a free pirq for irq but
irq already has a mapped pirq, it returns the negative pirq, so the call
fails. However, the logic before that re-factoring was different: it returned
the current_pirq that irq was already mapped to and made the call succeed.

Fixes: 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a pirq")

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Jan Beulich 
---
 xen/arch/x86/irq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 017a94e31155..47477d88171b 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2898,6 +2898,7 @@ static int allocate_pirq(struct domain *d, int index, int pirq, int irq,
 d->domain_id, index, pirq, current_pirq);
 if ( current_pirq < 0 )
 return -EBUSY;
+pirq = current_pirq;
 }
 else if ( type == MAP_PIRQ_TYPE_MULTI_MSI )
 {
-- 
2.34.1
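
The effect of the added line can be seen in a simplified model of the pirq<0 branch of allocate_pirq. This is a sketch, not the Xen function: resolve_pirq and its free_pirq parameter are hypothetical, and the enclosing "if ( current_pirq )" condition is assumed from the surrounding code rather than shown in the diff:

```c
#include <assert.h>
#include <errno.h>

/*
 * Simplified model of the pirq < 0 branch only.
 * current_pirq: 0 = irq has no pirq yet, >0 = pirq it is mapped to,
 *               <0 = treated as busy.
 * free_pirq:    hypothetical stand-in for the result of a free-pirq lookup.
 */
static int resolve_pirq(int pirq, int current_pirq, int free_pirq)
{
    if (pirq < 0) {
        if (current_pirq) {
            if (current_pirq < 0)
                return -EBUSY;
            pirq = current_pirq;  /* the line this patch adds */
        } else {
            pirq = free_pirq;
        }
    }
    return pirq;
}
```

Without the added assignment, the first case in the commit message (pirq<0, current_pirq>0) would fall through and return the caller's negative pirq, which the caller sees as an error.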




[PATCH for-4.19] x86/physdev: Return pirq that irq was already mapped to

2024-07-07 Thread Jiqian Chen
Fix bug introduced by 0762e2502f1f ("x86/physdev: factor out the code to
allocate and map a pirq"). After that re-factoring, when pirq<0 and
current_pirq>0, meaning the caller wants to allocate a free pirq for irq but
irq already has a mapped pirq, it returns the negative pirq, so the call
fails. However, the logic before that re-factoring was different: it returned
the current_pirq that irq was already mapped to and made the call succeed.

Fixes: 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a pirq")

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/irq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 9a611c79e024..1a827ccc8498 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2897,6 +2897,7 @@ static int allocate_pirq(struct domain *d, int index, int pirq, int irq,
 d->domain_id, index, pirq, current_pirq);
 if ( current_pirq < 0 )
 return -EBUSY;
+pirq = current_pirq;
 }
 else if ( type == MAP_PIRQ_TYPE_MULTI_MSI )
 {
-- 
2.34.1




[XEN PATCH v11 6/8] tools/libxc: Allow gsi be mapped into a free pirq

2024-06-30 Thread Jiqian Chen
Hypercall PHYSDEVOP_map_pirq supports mapping a gsi into either a
specific pirq or a free pirq, depending on the parameter pirq
(>0 or <0). But the current xc_physdev_map_pirq sets *pirq=index
when the parameter pirq is <0, which forces every case to be mapped
to a specific pirq. That has two problems: the caller can't get a
free pirq value, and the call fails once the specific pirq is
already mapped to another gsi.

So, change xc_physdev_map_pirq to allow a negative parameter to be
passed in, so that a free pirq is returned.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/libs/ctrl/xc_physdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779ce8..e9fcd755fa62 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -50,7 +50,7 @@ int xc_physdev_map_pirq(xc_interface *xch,
 map.domid = domid;
 map.type = MAP_PIRQ_TYPE_GSI;
 map.index = index;
-map.pirq = *pirq < 0 ? index : *pirq;
+map.pirq = *pirq;
 
 rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));
 
-- 
2.34.1




[RFC XEN PATCH v11 8/8] tools: Add new function to do PIRQ (un)map on PVH dom0

2024-06-30 Thread Jiqian Chen
When dom0 is PVH and a device is passed through to a domU, xl
uses the gsi number of the device to do a pirq mapping, see
pci_add_dm_done->xc_physdev_map_pirq. But the gsi number is
read from the file /sys/bus/pci/devices//irq, which confuses
irq with gsi: they are in different spaces and are not equal,
so the mapping fails.
To solve this issue, use xc_physdev_gsi_from_dev to get the
real gsi and then map the pirq.

Besides, a PVH dom0 doesn't have the PIRQ flag, so it doesn't do
PHYSDEVOP_map_pirq for each gsi. As a result the grant callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in function
domain_pirq_to_irq. The old hypercall XEN_DOMCTL_irq_permission
also requires a pirq to be passed in, so it is not suitable for
a dom0 that doesn't have PIRQs to grant irq permission.
To solve this issue, use the new hypercall
XEN_DOMCTL_gsi_permission to grant the permission of the irq
(translated from the gsi) to domU when dom0 has no PIRQs.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to 
be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
This patch must be merged after the patch on linux kernel side
---
 tools/include/xenctrl.h   |   5 ++
 tools/libs/ctrl/xc_domain.c   |  15 +
 tools/libs/light/libxl_arch.h |   4 ++
 tools/libs/light/libxl_arm.c  |  10 +++
 tools/libs/light/libxl_pci.c  |  17 ++
 tools/libs/light/libxl_x86.c  | 111 ++
 6 files changed, 162 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 3720e22b399a..33810385535e 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.allow_access = allow_access,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_arch.h b/tools/libs/light/libxl_arch.h
index f88f11d6de1d..11b736067951 100644
--- a/tools/libs/light/libxl_arch.h
+++ b/tools/libs/light/libxl_arch.h
@@ -91,6 +91,10 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
   libxl_domain_config *dst,
   const libxl_domain_config *src);
 
+_hidden
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
+_hidden
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
 #if defined(__i386__) || defined(__x86_64__)
 
 #define LAPIC_BASE_ADDRESS  0xfee0
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index a4029e3ac810..d869bbec769e 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -1774,6 +1774,16 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
 {
 }
 
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+return -1;
+}
+
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..3d25997921cc 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -17,6 +17,7 @@
 #include "libxl_osdeps.h" /* must come before any other headers */
 
 #include "libxl_internal.h"
+#include "libxl_arch.h"
 
 #define PCI_BDF"%04x:%02x:%02x.%01x"
 #define PCI_BDF_SHORT  "%02x:%02x.%01x"
@@ -1478,6 +1479,16 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
+
+/*
+ * When dom0 is PVH and mapping a x86 gsi to 

[XEN PATCH v11 0/8] Support device passthrough when dom0 is PVH on Xen

2024-06-30 Thread Jiqian Chen
Hi All,
This is the v11 series to support passthrough when dom0 is PVH.
v10->v11 changes:
* patch#1: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to 
the next line.
   Delete unnecessary local variables "struct physdev_pci_device *dev".
   Downgrade printk to dprintk.
   Moved struct pci_device_state_reset to the public header file.
   Delete enum pci_device_state_reset_type, and use macro definitions 
to represent different
   reset types.
   Delete pci_device_state_reset_method, and add switch cases in 
PHYSDEVOP_pci_device_state_reset
   to handle different reset functions.
   Add reset type as a function parameter for vpci_reset_device_state 
for possible future use
* patch#2: Delete the judgment of "d==currd", so that we can prevent 
physdev_(un)map_pirq from being
   executed when domU has no pirq, instead of just preventing 
self-mapping; and modify the
   description of the commit message accordingly.
* patch#3: Modify the commit message to explain why the gsi of normal devices 
can work in PVH dom0 and why
   the passthrough device does not work in PVH dom0.
* patch#4: New patch, modification of allocate_pirq function, return the 
allocated pirq when there is
   already an allocated pirq and the caller has no specific 
requirements for pirq, and make it
   successful.
* patch#5: Modification on the hypervisor side proposed from patch#5 of v10.
   Add non-zero judgment for other bits of allow_access.
   Delete unnecessary judgment "if ( is_pv_domain(currd) || 
has_pirq(currd) )".
   Change the error exit path identifier "out" to "gsi_permission_out".
   Use ARRAY_SIZE() instead of open coding.
* patch#6: New patch, modification of xc_physdev_map_pirq to support mapping 
gsi to an idle pirq.
* patch#7: Patch#4 of v10, directly open "/dev/xen/privcmd" in the function 
xc_physdev_gsi_from_dev
   instead of adding unnecessary functions to libxencall.
   Change the type of gsi in the structure privcmd_gsi_from_dev from 
int to u32.
* patch#8: Modification of the tools part of patches#4 and #5 of v10, use 
privcmd_gsi_from_dev to get
   gsi, and use XEN_DOMCTL_gsi_permission to grant gsi.
   Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID.
   Add libxl__arch_hvm_map_gsi to distinguish x86 related 
implementations.
   Add a list pcidev_pirq_list to record the relationship between sbdf 
and pirq, which can be
   used to obtain the corresponding pirq when unmap PIRQ.


Best regards,
Jiqian Chen



v9->v10 changes:
* patch#2: Indent the comments above PHYSDEVOP_map_pirq according to the code 
style.
* patch#3: Modified the description in the commit message, changing "it calls" 
to "it will need to call",
   indicating that there will be new codes on the kernel side that will 
call PHYSDEVOP_setup_gsi.
   Also added an explanation of why the interrupt of passthrough device 
does not work if gsi is not
   registered.
* patch#4: Added define for CONFIG_X86 in tools/libs/light/Makefile to isolate 
x86 code in libxl_pci.c.
* patch#5: Modified the commit message to further describe the purpose of 
adding XEN_DOMCTL_gsi_permission.
   Deleted pci_device_set_gsi and called XEN_DOMCTL_gsi_permission 
directly in pci_add_dm_done.
   Added a check for all zeros in the padding field in 
XEN_DOMCTL_gsi_permission, and used currd
   instead of current->domain.
   In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of 
the original new code, and
   error handling for irq0 was added.
   Deleted the extra spaces in the upper and lower lines of the struct 
xen_domctl_gsi_permission
   definition.
All patches have modified signatures as follows:
Signed-off-by: Jiqian Chen  means I am the author.
Signed-off-by: Huang Rui  means Rui first sent them upstream.
Signed-off-by: Jiqian Chen  means I am continuing the upstreaming work.


v8->v9 changes:
* patch#1: Move pcidevs_unlock below write_lock, and remove 
"ASSERT(pcidevs_locked());"
   from vpci_reset_device_state;
   Add pci_device_state_reset_type to distinguish the reset types.
* patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why this 
hypercall is needed.
   Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == 
DOMID_SELF" to
   "d == current->domain".
* patch#3: Remove the check of PHYSDEVOP_setup_gsi, since the same check 
exists below. Although their return
   values are different, this difference is acceptable for the sake of 
code consistency
   if ( !is_hardware_domain(currd) )
   return -ENOSYS;
   break;
* patch#5: Change

[XEN PATCH v11 3/8] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-30 Thread Jiqian Chen
The gsi of a passthrough device must be configured before it can be
mapped into an hvm domU.
But when dom0 is PVH, the gsis don't get registered. As a result, the
apic, pin and irq info is not added to the irq_2_pin list and the
handler of irq_desc is not set, so setting the ioapic affinity and
vector fails when a device is passed through.

To fix this, new code on the Linux kernel side will need to call
PHYSDEVOP_setup_gsi for passthrough devices to register the gsi
when dom0 is PVH.

So, add PHYSDEVOP_setup_gsi to hvm_physdev_op for that purpose.

To clarify two questions:
First, why does the gsi of a device that belongs to PVH dom0 work?
Because when a driver is probed for a normal device, the Linux
kernel calls pci_device_probe-> request_threaded_irq->
irq_startup-> __unmask_ioapic-> io_apic_write, which traps into Xen:
hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept->
vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
That is how the gsi gets registered.

Second, why doesn't the gsi of a passthrough device work when dom0
is PVH?
Because when a device is assigned for passthrough, pciback probes
it and calls pcistub_probe->pcistub_seize->
pcistub_init_device-> xen_pcibk_reset_device->
xen_pcibk_control_isr->isr_on. But isr_on is not set, so the
fake IRQ handler is not installed and the gsi is never unmasked.
On the Xen side, vioapic_hwdom_map_gsi-> mp_register_gsi is called
only when the gsi is unmasked, so the gsi is never registered for
a passthrough device.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 03ada3c880bd..cfe82d0f96ed 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return -ENOSYS;
 break;
 
+case PHYSDEVOP_setup_gsi:
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
-- 
2.34.1




[XEN PATCH v11 2/8] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-06-30 Thread Jiqian Chen
When running Xen with a PVH dom0 and an hvm domU, a pirq is mapped
for a passthrough device by using its gsi, see qemu code
xen_pt_realize->xc_physdev_map_pirq and libxl code
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
is not allowed because currd is PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so the device removal path can unmap the pirq.
Also add a new check to prevent (un)map when the subject domain
has no X86_EMU_USE_PIRQ flag.

With that, the interrupt of a passthrough device can be
successfully mapped to a pirq for a domU with the X86_EMU_USE_PIRQ
flag when dom0 is PVH.
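
The new subject-domain check can be modelled in isolation as follows. This is
a toy sketch, not hypervisor code: struct toy_domain and toy_pirq_map_allowed
are hypothetical stand-ins for struct domain, is_hvm_domain() and has_pirq().

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the two domain properties the check looks at. */
struct toy_domain {
    bool is_hvm;    /* HVM-type domain (in Xen, PVH guests count as such) */
    bool has_pirq;  /* X86_EMU_USE_PIRQ is set for this domain */
};

/*
 * Returns 0 if (un)mapping may proceed for the subject domain and
 * -EOPNOTSUPP otherwise, following the shape of
 *   if ( is_hvm_domain(d) && !has_pirq(d) ) return -EOPNOTSUPP;
 */
static int toy_pirq_map_allowed(const struct toy_domain *subject)
{
    if (subject->is_hvm && !subject->has_pirq)
        return -EOPNOTSUPP;
    return 0;
}
```

Note that the check looks at the subject domain (the domU being managed), not
at the caller, so a PVH dom0 can still map a pirq for an hvm domU that has
the flag.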

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  6 ++
 xen/arch/x86/physdev.c   | 14 ++
 2 files changed, 20 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..03ada3c880bd 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd )
 {
+/*
+* Only being permitted for management of other domains.
+* Further restrictions are enforced in do_physdev_op.
+*/
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index d6dd622952a9..a165f68225c1 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -323,6 +323,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( !d )
 break;
 
+/* Prevent mapping when the subject domain has no X86_EMU_USE_PIRQ */
+if ( is_hvm_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+
 ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
 
 rcu_unlock_domain(d);
@@ -346,6 +353,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( !d )
 break;
 
+/* Prevent unmapping when the subject domain has no X86_EMU_USE_PIRQ */
+if ( is_hvm_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+
 ret = physdev_unmap_pirq(d, unmap.pirq);
 
 rcu_unlock_domain(d);
-- 
2.34.1




[RFC XEN PATCH v11 7/8] tools: Add new function to get gsi from dev

2024-06-30 Thread Jiqian Chen
When a device is passed through to domU, QEMU and the xl tools use
its gsi number to do pirq mapping, see QEMU code
xen_pt_realize->xc_physdev_map_pirq and xl code
pci_add_dm_done->xc_physdev_map_pirq. But the gsi number is read
from the file /sys/bus/pci/devices//irq, which is wrong: irq is
not equal to gsi, they are in different number spaces, so pirq
mapping fails.

The current code also has no way for userspace to get the gsi.
So add a new function to get the gsi; the corresponding ioctl
is implemented on the Linux kernel side.
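
The new function takes the device as a single 32-bit sbdf value. As a rough
sketch of how a caller might build that value, assuming the common layout of
segment in bits 31:16, bus in bits 15:8, device in bits 7:3 and function in
bits 2:0 (toy_pack_sbdf is a hypothetical helper, not an existing libxl or
libxenctrl function):

```c
#include <stdint.h>

/*
 * Pack segment/bus/device/function into one 32-bit value.
 * Assumed layout: seg[31:16] bus[15:8] dev[7:3] func[2:0].
 */
static uint32_t toy_pack_sbdf(uint16_t seg, uint8_t bus,
                              uint8_t dev, uint8_t func)
{
    return ((uint32_t)seg << 16) |
           ((uint32_t)bus << 8) |
           ((uint32_t)(dev & 0x1f) << 3) |
           (uint32_t)(func & 0x7);
}
```

For example, device 0000:03:00.0 would be passed as 0x0300 under this layout.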

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to 
be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
This patch must be merged after the patch on linux kernel side
---
 tools/include/xen-sys/Linux/privcmd.h |  7 ++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/ctrl/xc_physdev.c  | 35 +++
 3 files changed, 44 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..4cf719102116 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_pcidev {
+   __u32 sbdf;
+   __u32 gsi;
+} privcmd_gsi_from_pcidev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_PCIDEV  \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_pcidev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..3720e22b399a 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index e9fcd755fa62..54edb0f3c0dc 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
 return rc;
 }
 
+int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
+{
+int rc = -1;
+
+#if defined(__linux__)
+int fd;
+privcmd_gsi_from_pcidev_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = 0,
+};
+
+fd = open("/dev/xen/privcmd", O_RDWR);
+
+if (fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV)) {
+/* Fallback to /proc/xen/privcmd */
+fd = open("/proc/xen/privcmd", O_RDWR);
+}
+
+if (fd < 0) {
+PERROR("Could not obtain handle on privileged command interface");
+return rc;
+}
+
+rc = ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi);
+close(fd);
+
+if (rc) {
+PERROR("Failed to get gsi from dev");
+} else {
+rc = dev_gsi.gsi;
+}
+#endif
+
+return rc;
+}
-- 
2.34.1




[XEN PATCH v11 5/8] x86/domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-30 Thread Jiqian Chen
Some domain types, like PVH, don't have PIRQs and don't do
PHYSDEVOP_map_pirq for each gsi. When a device is passed through
to a guest on a PVH dom0, the call path
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in
domain_pirq_to_irq, because PVH has no mapping between gsi, pirq and
irq on the Xen side.
What's more, the current hypercall XEN_DOMCTL_irq_permission requires
a pirq to be passed in, so it is not suitable for a dom0 that
doesn't have PIRQs.

So, add a new hypercall XEN_DOMCTL_gsi_permission to grant
permission for the irq (translated from the gsi) to domU when dom0
has no PIRQs.
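
The handler for the new domctl first validates its input: only the lowest bit
of allow_access may be set and the padding must be zero. A standalone sketch
of that validation is shown here; struct toy_gsi_permission and
toy_gsi_permission_valid are hypothetical names that only mirror the shape of
struct xen_domctl_gsi_permission. The mask has to be inverted with bitwise
complement (~); logical negation (!) of 1 is 0 and would reject nothing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as struct xen_domctl_gsi_permission in this patch. */
struct toy_gsi_permission {
    uint32_t gsi;
    uint8_t allow_access;
    uint8_t pad[3];
};

static bool toy_gsi_permission_valid(const struct toy_gsi_permission *p)
{
    const uint8_t mask = 1;
    size_t i;

    /* ~mask selects bits 1..7 of allow_access; any of them set is invalid. */
    if (p->allow_access & ~mask)
        return false;

    /* All padding bytes must be zero. */
    for (i = 0; i < sizeof(p->pad) / sizeof(p->pad[0]); i++)
        if (p->pad[i])
            return false;

    return true;
}
```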

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/domctl.c  | 33 ++
 xen/arch/x86/include/asm/io_apic.h |  2 ++
 xen/arch/x86/io_apic.c | 17 +++
 xen/arch/x86/mpparse.c |  3 +--
 xen/include/public/domctl.h|  8 
 xen/xsm/flask/hooks.c  |  1 +
 6 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9190e11faaa3..5f20febabbf2 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int update_domain_cpu_policy(struct domain *d,
 xen_domctl_cpu_policy_t *xdpc)
@@ -237,6 +238,38 @@ long arch_do_domctl(
 break;
 }
 
+case XEN_DOMCTL_gsi_permission:
+{
+int irq;
+uint8_t mask = 1;
+unsigned int gsi = domctl->u.gsi_permission.gsi;
+bool allow = domctl->u.gsi_permission.allow_access;
+
+/* Check all bits and pads are zero except lowest bit */
+ret = -EINVAL;
+if ( domctl->u.gsi_permission.allow_access & ~mask )
+goto gsi_permission_out;
+for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
+if ( domctl->u.gsi_permission.pad[i] )
+goto gsi_permission_out;
+
+if ( gsi >= nr_irqs_gsi || ( irq = gsi_2_irq(gsi) ) < 0 )
+goto gsi_permission_out;
+
+ret = -EPERM;
+if ( !irq_access_permitted(currd, irq) ||
+ xsm_irq_permission(XSM_HOOK, d, irq, allow) )
+goto gsi_permission_out;
+
+if ( allow )
+ret = irq_permit_access(d, irq);
+else
+ret = irq_deny_access(d, irq);
+
+gsi_permission_out:
+break;
+}
+
 case XEN_DOMCTL_getpageframeinfo3:
 {
 unsigned int num = domctl->u.getpageframeinfo3.num;
diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
index 78268ea8f666..7e86d8337758 100644
--- a/xen/arch/x86/include/asm/io_apic.h
+++ b/xen/arch/x86/include/asm/io_apic.h
@@ -213,5 +213,7 @@ unsigned highest_gsi(void);
 
 int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
 int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
+int mp_find_ioapic(int gsi);
+int gsi_2_irq(int gsi);
 
 #endif
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index d73108558e09..d54283955a60 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
 return irq;
 }
 
+int gsi_2_irq(int gsi)
+{
+int ioapic, pin, irq;
+
+ioapic = mp_find_ioapic(gsi);
+if ( ioapic < 0 )
+return -EINVAL;
+
+pin = gsi - io_apic_gsi_base(ioapic);
+
+irq = apic_pin_2_gsi_irq(ioapic, pin);
+if ( irq <= 0 )
+return -EINVAL;
+
+return irq;
+}
+
 static inline int IO_APIC_irq_trigger(int irq)
 {
 int apic, idx, pin;
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index d8ccab2449c6..c95da0de5770 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -841,8 +841,7 @@ static struct mp_ioapic_routing {
 } mp_ioapic_routing[MAX_IO_APICS];
 
 
-static int mp_find_ioapic (
-   int gsi)
+int mp_find_ioapic(int gsi)
 {
 unsigned int i;
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 2a49fe46ce25..f7ae8b19d27d 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -464,6 +464,12 @@ struct xen_domctl_irq_permission {
 uint8_t pad[3];
 };
 
+/* XEN_DOMCTL_gsi_permission */
+struct xen_domctl_gsi_permission {
+uint32_t gsi;
+uint8_t allow_access; /* flag to specify enable/disable of x86 gsi access */
+uint8_t pad[3];
+};
 
 /* XEN_DOMCTL_iomem_permission */
 struct xen_domctl_iomem_permission {
@@ -1306,6 +1312,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_get_paging_mempool_size   85
 #define XEN_DOMCTL_set_paging_mempool_size   86
 #define XEN_DOMCTL_dt_overlay87
+#define XEN_DOMCTL_gsi_permission88
 #define XEN_DOMCTL_gdbsx_guestmemio1000
 

[XEN PATCH v11 4/8] x86/physdev: Return pirq that irq was already mapped to

2024-06-30 Thread Jiqian Chen
allocate_pirq allocates a pirq for an irq. It can allocate either a
free pirq (pirq parameter is < 0) or a specific pirq (pirq
parameter is > 0).

The current code has four use cases.

First, pirq>0 and current_pirq>0 (current_pirq>0 means the irq
already has a mapped pirq). If pirq==current_pirq, the irq is already
mapped to the pirq expected by the caller, and it succeeds. If
pirq!=current_pirq, the pirq expected by the caller has been
mapped to another irq, and it fails.

Second, pirq>0 and current_pirq<0: the pirq expected by the
caller has not been allocated to any irq, so it can be allocated to
the caller, and it succeeds.

Third, pirq<0 and current_pirq<0: the caller wants to allocate a
free pirq for the irq and the irq has no mapped pirq, so it succeeds.

Fourth, pirq<0 and current_pirq>0: the caller wants to allocate
a free pirq for the irq but the irq already has a mapped pirq, so the
negative pirq is returned and it fails.

The problem is the fourth case. The irq already has a mapped pirq
(current_pirq) and the caller doesn't ask for a specific pirq, so
current_pirq should be returned directly, indicating that the
allocation succeeded. That lets a caller succeed when it just wants
a free pirq but doesn't know whether the irq already has a mapped
pirq.
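
The four cases can be summarised with a toy model. This is only a sketch that
follows the conventions used in this commit message (a value > 0 is a real
pirq, a value < 0 means "none" or "any"); the real allocate_pirq has more
parameters and MSI handling. toy_allocate_pirq, the free_pirq argument and
the -EEXIST error code are assumptions made for illustration.

```c
#include <errno.h>

/*
 * pirq:         pirq requested by the caller (< 0 means "any free pirq")
 * current_pirq: pirq the irq is already mapped to (< 0 means "none")
 * free_pirq:    what a free-pirq allocator would hand out (hypothetical)
 * patched:      non-zero to model the behaviour after this patch
 */
static int toy_allocate_pirq(int pirq, int current_pirq, int free_pirq,
                             int patched)
{
    if (pirq > 0) {
        if (current_pirq > 0 && pirq != current_pirq)
            return -EEXIST;      /* first case, mismatch: fails */
        return pirq;             /* first case, match, or second case */
    }

    if (current_pirq < 0)
        return free_pirq;        /* third case: hand out a free pirq */

    /* Fourth case: irq already mapped, caller asked for any pirq. */
    return patched ? current_pirq : pirq;
}
```

In the fourth case the unpatched model returns the caller's negative pirq,
which the caller treats as a failure, while the patched model returns the
existing mapping.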

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/irq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 9a611c79e024..5ccca1646eb1 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2897,6 +2897,8 @@ static int allocate_pirq(struct domain *d, int index, int pirq, int irq,
 d->domain_id, index, pirq, current_pirq);
 if ( current_pirq < 0 )
 return -EBUSY;
+else
+return current_pirq;
 }
 else if ( type == MAP_PIRQ_TYPE_MULTI_MSI )
 {
-- 
2.34.1




[XEN PATCH v11 1/8] xen/vpci: Clear all vpci status of device

2024-06-30 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the Xen
side doesn't get notified, so the cached state in vpci is
out of date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0 side,
dom0 can call this hypercall to notify vpci.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 58 
 xen/drivers/vpci/vpci.c  | 10 +++
 xen/include/public/physdev.h | 20 +
 xen/include/xen/vpci.h   |  8 +
 5 files changed, 97 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..19a755d1c127 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,63 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset:
+{
+struct pci_device_state_reset dev_reset;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+ret = -EOPNOTSUPP;
+if ( !is_pci_passthrough_enabled() )
+break;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+break;
+
+sbdf = PCI_SBDF(dev_reset.dev.seg,
+dev_reset.dev.bus,
+dev_reset.dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+/* Implement FLR, other reset types may be implemented in future */
+switch ( dev_reset.reset_type )
+{
+case PCI_DEVICE_STATE_RESET_COLD:
+case PCI_DEVICE_STATE_RESET_WARM:
+case PCI_DEVICE_STATE_RESET_HOT:
+case PCI_DEVICE_STATE_RESET_FLR:
+{
+ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
+if ( ret )
+dprintk(XENLOG_ERR,
+"%pp: failed to reset vPCI device state\n", &sbdf);
+break;
+}
+
+default:
+ret = -EOPNOTSUPP;
+break;
+}
+write_unlock(&pdev->domain->pci_lock);
+
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..7e914d1eff9f 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev,
+uint32_t reset_type)
+{
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..ddbcdfb05248 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
@@ -305,6 +312,19 @@ struct physdev_pci_device {
 typedef struct physdev_pci_device physdev_pci_device_t;
 DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
 
+struct pci_device_state_reset {
+physdev_pci_device_t dev;
+#define _PCI_DEVICE_STATE_RESET_COLD 0
+#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
+#define _PCI_DEVICE_STATE_RESET_WARM 1
+#define PCI_DEVICE_ST

[XEN PATCH v10 4/5] tools: Add new function to get gsi from dev

2024-06-17 Thread Jiqian Chen
PVH dom0 uses the Linux local interrupt mechanism. When it
allocates an irq for a gsi, the allocation is dynamic and
first come, first served. irq numbers are allocated from small
to large, but gsis are not requested in numeric order: gsi 38
may come before gsi 28, so the irq number is not equal to the
gsi number.
When a device is passed through, QEMU uses its gsi number
to do pirq mapping, see xen_pt_realize->xc_physdev_map_pirq,
but the gsi number is read from the file
/sys/bus/pci/devices//irq, so the mapping fails.
The current code also has no way for userspace to get the gsi.

So add a new function to get the gsi, and call this
function before xc_physdev_(un)map_pirq.
Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs review and needs to wait for the corresponding third patch on 
linux kernel side to be merged.
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +
 tools/include/xencall.h   |  2 ++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/call/core.c|  5 
 tools/libs/call/libxencall.map|  2 ++
 tools/libs/call/linux.c   | 15 +++
 tools/libs/call/private.h |  9 +++
 tools/libs/ctrl/xc_physdev.c  |  4 +++
 tools/libs/light/Makefile |  2 +-
 tools/libs/light/libxl_pci.c  | 38 +++
 10 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..977f1a058797 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_dev {
+   __u32 sbdf;
+   int gsi;
+} privcmd_gsi_from_dev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_DEV \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_dev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xencall.h b/tools/include/xencall.h
index fc95ed0fe58e..750aab070323 100644
--- a/tools/include/xencall.h
+++ b/tools/include/xencall.h
@@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
  uint64_t arg1, uint64_t arg2, uint64_t arg3,
  uint64_t arg4, uint64_t arg5);
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
+
 /* Variant(s) of the above, as needed, returning "long" instead of "int". */
 long xencall2L(xencall_handle *xcall, unsigned int op,
uint64_t arg1, uint64_t arg2);
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..a0381f74d24b 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1aefa..6dae50c9a6ba 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
 return osdep_hypercall(xcall, &call);
 }
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf)
+{
+return osdep_oscall(xcall, sbdf);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
index d18a3174e9dc..b92a0b5dc12c 100644
--- a/tools/libs/call/libxencall.map
+++ b/tools/libs/call/libxencall.map
@@ -10,6 +10,8 @@ VERS_1.0 {
xencall4;
xencall5;
 
+   xen_oscall_gsi_from_dev;
+
xencall_alloc_buffer;
xencall_free_buffer;
xencall_alloc_buffer_pages;
diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
index 6d588e6bea8f..92c740e176f2 100644
--- a/tools/libs/call/linux.c
+++ b/tools/libs/call/linux.c
@@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall)
 return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
 }
 
+int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
+{
+privcmd_gsi_from_dev_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = -1,
+};
+
+if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &

[XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-17 Thread Jiqian Chen
Some domain types, like PVH, don't have PIRQs and don't do
PHYSDEVOP_map_pirq for each gsi. When a device is passed through
to a guest on a PVH dom0, the call path
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in
domain_pirq_to_irq, because PVH has no mapping between gsi, pirq and
irq on the Xen side.
What's more, the current hypercall XEN_DOMCTL_irq_permission requires
a pirq to be passed in, so it is not suitable for a dom0 that
doesn't have PIRQs.

So, add a new hypercall XEN_DOMCTL_gsi_permission to grant
permission for the irq (translated from the gsi) to domU when dom0
has no PIRQs.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
RFC: it needs review and needs to wait for the corresponding third patch on 
linux kernel side to be merged.
---
 tools/include/xenctrl.h|  5 +++
 tools/libs/ctrl/xc_domain.c| 15 +++
 tools/libs/light/libxl_pci.c   | 67 +++---
 xen/arch/x86/domctl.c  | 43 +++
 xen/arch/x86/include/asm/io_apic.h |  2 +
 xen/arch/x86/io_apic.c | 17 
 xen/arch/x86/mpparse.c |  3 +-
 xen/include/public/domctl.h|  8 
 xen/xsm/flask/hooks.c  |  1 +
 9 files changed, 153 insertions(+), 8 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index a0381f74d24b..f3feb6848e25 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.allow_access = allow_access,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 376f91759ac6..f027f22c0028 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1431,6 +1431,9 @@ static void pci_add_dm_done(libxl__egc *egc,
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+#ifdef CONFIG_X86
+xc_domaininfo_t info;
+#endif
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
 rc = ERROR_FAIL;
 goto out;
 }
-r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+#ifdef CONFIG_X86
+/* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
+r = xc_domain_getinfo_single(ctx->xch, 0, &info);
 if (r < 0) {
-LOGED(ERROR, domainid,
-  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
+LOGED(ERROR, domainid, "getdomaininfo failed (error=%d)", errno);
 fclose(f);
 rc = ERROR_FAIL;
 goto out;
 }
+if (info.flags & XEN_DOMINF_hvm_guest &&
+!(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ) &&
+gsi > 0) {
+r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
+if (r < 0) {
+LOGED(ERROR, domainid,
+"xc_domain_gsi_permission gsi=%d (error=%d)", gsi, errno);
+fclose(f);
+rc = ERROR_FAIL;
+goto out;
+}
+}
+else
+#endif
+{
+r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+if (r < 0) {
+LOGED(ERROR, domainid,
+"xc_domain_irq_permission irq=%d (error=%d)", irq, errno);
+fclose(f);
+rc = ERROR_FAIL;
+goto out;
+}
+}
 

[XEN PATCH v10 0/5] Support device passthrough when dom0 is PVH on Xen

2024-06-17 Thread Jiqian Chen
Hi All,
This is the v10 series to support passthrough when dom0 is PVH.
v9->v10 changes:
* patch#2: Indent the comments above PHYSDEVOP_map_pirq according to the
           code style.
* patch#3: Modified the description in the commit message, changing "it
           calls" to "it will need to call", indicating that there will be
           new code on the kernel side that calls PHYSDEVOP_setup_gsi.
           Also added an explanation of why the interrupt of a passthrough
           device does not work if the gsi is not registered.
* patch#4: Added a define for CONFIG_X86 in tools/libs/light/Makefile to
           isolate the x86 code in libxl_pci.c.
* patch#5: Modified the commit message to further describe the purpose of
           adding XEN_DOMCTL_gsi_permission.
           Deleted pci_device_set_gsi and called XEN_DOMCTL_gsi_permission
           directly in pci_add_dm_done.
           Added a check that the padding field in
           XEN_DOMCTL_gsi_permission is all zeros, and used currd instead
           of current->domain.
           In the function gsi_2_irq, used apic_pin_2_gsi_irq instead of
           the original new code, and added error handling for irq0.
           Deleted the extra spaces in the lines above and below the
           struct xen_domctl_gsi_permission definition.
All patches have modified signatures as follows:
Signed-off-by: Jiqian Chen  means I am the author.
Signed-off-by: Huang Rui  means Rui sent them upstream first.
Signed-off-by: Jiqian Chen  means I am continuing the upstreaming.


Best regards,
Jiqian Chen



v8->v9 changes:
* patch#1: Move pcidevs_unlock below write_lock, and remove
           "ASSERT(pcidevs_locked());" from vpci_reset_device_state.
           Add pci_device_state_reset_type to distinguish the reset types.
* patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why this
           hypercall is needed.
           Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and
           "map.domid == DOMID_SELF" to "d == current->domain".
* patch#3: Remove the check of PHYSDEVOP_setup_gsi, since the same check
           exists below. Although their return values are different, this
           difference is acceptable for the sake of code consistency:
               if ( !is_hardware_domain(currd) )
                   return -ENOSYS;
               break;
* patch#5: Change the commit message to better describe why we need this
           new hypercall.
           Add a comment above "if ( is_pv_domain(current->domain) ||
           has_pirq(current->domain) )" to explain why we need this check.
           Add gsi_2_irq to transform gsi to irq, instead of assuming
           gsi == irq.
           Add explicit padding to struct xen_domctl_gsi_permission.


v7->v8 changes:
* patch#2: Add the domid check (domid == DOMID_SELF) to prevent self map
           when the guest doesn't use pirq. That check was missed in the
           previous version.
* patch#4: Due to changes in how the gsi is obtained in the kernel, change
           to add a new function that gets the gsi by passing in the sbdf
           of the pci device.
* patch#5: Remove the parameter "is_gsi". When a gsi exists,
           pci_add_dm_done uses a new function pci_device_set_gsi to do
           map_pirq and grant permission. That makes the code logic more
           intuitive.


v6->v7 changes:
* patch#4: Due to changes in how the gsi is obtained in the kernel, change
           to add a new function that gets the gsi from the irq, instead
           of from the gsi sysfs node.
* patch#5: Fix the issue with variable usage, rc->r.


v5->v6 changes:
* patch#1: Add Reviewed-by from Stefano and Stewart. Rebase the code and
           change the old functions vpci_remove_device and
           vpci_add_handlers to vpci_deassign_device and
           vpci_assign_device.
* patch#2: Add Reviewed-by from Stefano.
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));".
* patch#4: Fix some coding style issues under the tools directory.
* patch#5: Modified some variable names and code logic to make the code
           easier to understand: use gsi by default, and stay compatible
           with older kernel versions by continuing to use irq.


v4->v5 changes:
* patch#1: Wrap the function vpci_reset_device_state with pci_lock.
* patch#2: Move the check of self map_pirq to physdev.c, change it to
           check whether the caller has the PIRQ flag, and just break for
           PHYSDEVOP_(un)map_pirq in hvm_physdev_op.
* patch#3: Return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: Is patch#5 in v4, because patch#5 in v5 has some dependency on
           it. Add the handling of errno and add the Reviewed-by from
           Stefano.
* patch#5: Is patch#4 in v4. New implementation that adds the new
           hypercall XEN_DOMCTL_gsi_permission to grant gsi.

v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move 
printings behind pcidevs_unlock
* patch#2: add check to prevent PVH self map
* patch#3: new patch, The implementation of adding PHYSDEVOP_setup_gs

[XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-06-17 Thread Jiqian Chen
When running Xen with a PVH dom0 and an hvm domU, a pirq is mapped
for a passthrough device of the hvm guest by using its gsi; see the
qemu code xen_pt_realize->xc_physdev_map_pirq and the libxl code
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not
allowed, because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq.
Also add a new check to prevent self map when the subject domain
has no PIRQ flag.

That way a domU with the PIRQ flag can successfully map a pirq for
passthrough devices even when dom0 has no PIRQ flag.
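
The self-map guard described above can be sketched in isolation as follows.
This is only a model of the condition added to do_physdev_op: struct
sketch_domain and check_map_pirq are hypothetical names, and the two booleans
stand in for is_hvm_domain() and has_pirq().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced view of a domain for this sketch. */
struct sketch_domain {
    bool is_hvm;   /* stands in for is_hvm_domain(d) */
    bool has_pirq; /* stands in for has_pirq(d) */
};

/*
 * Model of the guard: an HVM/PVH domain without the PIRQ flag must not
 * (un)map a pirq for itself, but it may do so on behalf of another domain.
 * Returns 0 when the request may proceed, -EOPNOTSUPP when it is refused.
 */
int check_map_pirq(const struct sketch_domain *subject,
                   const struct sketch_domain *caller)
{
    if (subject->is_hvm && !subject->has_pirq && subject == caller)
        return -EOPNOTSUPP;
    return 0;
}
```

With this check a PVH dom0 is refused when it targets itself, while the same
dom0 targeting an hvm domU passes through to the real mapping code.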

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  6 ++
 xen/arch/x86/physdev.c   | 14 ++
 2 files changed, 20 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..03ada3c880bd 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd )
 {
+/*
+* Only being permitted for management of other domains.
+* Further restrictions are enforced in do_physdev_op.
+*/
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index d6dd622952a9..f38cc22c872e 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -323,6 +323,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( !d )
 break;
 
+/* Prevent self-map when currd has no X86_EMU_USE_PIRQ flag */
+if ( is_hvm_domain(d) && !has_pirq(d) && d == currd )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+
 ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
 
 rcu_unlock_domain(d);
@@ -346,6 +353,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( !d )
 break;
 
+/* Prevent self-unmap when currd has no X86_EMU_USE_PIRQ flag */
+if ( is_hvm_domain(d) && !has_pirq(d) && d == currd )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+
 ret = physdev_unmap_pirq(d, unmap.pirq);
 
 rcu_unlock_domain(d);
-- 
2.34.1




[XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-17 Thread Jiqian Chen
The gsi of a passthrough device must be configured before it can be
mapped into an hvm domU.
But when dom0 is PVH, the gsis don't get registered. As a result,
the apic, pin and irq info is not added to the irq_2_pin list and
the handler of the irq_desc is not set, so setting the ioapic
affinity and vector fails when a device is passed through.

To fix the above problem, new code on the Linux kernel side will
need to call PHYSDEVOP_setup_gsi for passthrough devices to
register the gsi when dom0 is PVH.

So, add PHYSDEVOP_setup_gsi to hvm_physdev_op for the above
purpose.
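
To show where the new case sits, here is a reduced model of the filtering
done in hvm_physdev_op. It is a sketch under assumptions, not the Xen source:
the enum values and hvm_physdev_filter are hypothetical, only a few
operations are listed, and the -ENOSYS default is assumed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of physdev operations for this sketch. */
enum sketch_physdev_op {
    OP_MAP_PIRQ,
    OP_UNMAP_PIRQ,
    OP_EOI,
    OP_SETUP_GSI,
    OP_PCI_DEVICE_ADD,
    OP_OTHER,
};

/*
 * Returns 0 when the operation is passed on to the common handler,
 * -ENOSYS when it is filtered out for this caller.
 */
int hvm_physdev_filter(enum sketch_physdev_op op,
                       bool caller_has_pirq, bool caller_is_hwdom)
{
    switch (op) {
    case OP_MAP_PIRQ:
    case OP_UNMAP_PIRQ:
        return 0; /* further restrictions are applied later */

    case OP_EOI:
        return caller_has_pirq ? 0 : -ENOSYS;

    case OP_SETUP_GSI: /* the case added by this patch */
    case OP_PCI_DEVICE_ADD:
        return caller_is_hwdom ? 0 : -ENOSYS;

    default:
        return -ENOSYS; /* assumed default for unlisted operations */
    }
}
```

Grouping the new case with the PCI device operations means it is only
reachable from the hardware domain, independent of the PIRQ flag.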

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
The code link that will call this hypercall on linux kernel side is as follows:
https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 03ada3c880bd..cfe82d0f96ed 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return -ENOSYS;
 break;
 
+case PHYSDEVOP_setup_gsi:
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
-- 
2.34.1




[XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device

2024-06-17 Thread Jiqian Chen
When a device has been reset on the dom0 side, vpci on the Xen
side isn't notified, so the state cached in vpci is out of date
compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0 side,
dom0 can call this hypercall to notify vpci.
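
The hypercall dispatches on a reset type through a table of handlers. The
following standalone sketch models that table lookup. All names in it are
hypothetical, and the range and NULL checks before the call are an addition
of this sketch for illustration, not something taken from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the reset types, plus a count for sizing. */
enum sketch_reset_type {
    RESET_FLR,
    RESET_COLD,
    RESET_WARM,
    RESET_HOT,
    RESET_TYPE_COUNT,
};

/* Hypothetical device holding some cached virtual state. */
struct sketch_dev {
    int cached_state;
};

typedef int (*reset_fn_t)(struct sketch_dev *dev);

/* Drop the cached state so that it is regenerated from the device. */
static int sketch_reset_state(struct sketch_dev *dev)
{
    dev->cached_state = 0;
    return 0;
}

/* Only FLR has a handler here; the other slots stay NULL. */
static const reset_fn_t reset_methods[RESET_TYPE_COUNT] = {
    [RESET_FLR] = sketch_reset_state,
};

/* Look up the handler for the requested type and run it if present. */
int reset_device_state(struct sketch_dev *dev, unsigned int type)
{
    if (type >= RESET_TYPE_COUNT || reset_methods[type] == NULL)
        return -EOPNOTSUPP;
    return reset_methods[type](dev);
}
```

A table like this keeps the hypercall handler unchanged when handlers for
further reset types are added later.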

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 43 
 xen/drivers/vpci/vpci.c  |  9 
 xen/include/public/physdev.h |  7 ++
 xen/include/xen/pci.h| 16 ++
 xen/include/xen/vpci.h   |  6 +
 6 files changed, 82 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..1cce508a73b1 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,11 +2,17 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
 #endif
 
+static const struct pci_device_state_reset_method
+pci_device_state_reset_methods[] = {
+[ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
+};
+
 ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 ret_t ret;
@@ -67,6 +73,43 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct pci_device_state_reset dev_reset;
+struct physdev_pci_device *dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+break;
+dev = &dev_reset.dev;
+sbdf = PCI_SBDF(dev->seg, dev->bus, dev->devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+ret = pci_device_state_reset_methods[dev_reset.reset_type].reset_fn(pdev);
+write_unlock(&pdev->domain->pci_lock);
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset vPCI device state\n", &sbdf);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..ff67c2550ccb 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..a71da5892e5f 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 63e49f0117e9..376981f9da98 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -156,6 +156,22 @@ struct pci_dev {
 struct vpci *vpci;
 };
 
+struct pci_device_state_reset_method {
+int (*reset_fn)(struct pci_dev *pdev);
+};
+
+enum pci_device_state_reset_type {
+DEVICE_RESET_FLR,
+DEVICE_RESET_COLD,
+DEVICE_RESET_WARM,
+DEVICE_RESET_HOT,
+};
+
+struct pci_device_state_reset {
+struct physdev_pci_device dev;
+enum pci_device_state_reset_type reset_type;
+};
+
 #define for_each_pdev(d

[XEN PATCH v9 0/5] Support device passthrough when dom0 is PVH on Xen

2024-06-07 Thread Jiqian Chen
Hi All,
This is the v9 series to support passthrough when dom0 is PVH.
v8->v9 changes:
* patch#1: Move pcidevs_unlock below write_lock, and remove
           "ASSERT(pcidevs_locked());" from vpci_reset_device_state.
           Add pci_device_state_reset_type to distinguish the reset types.
* patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why this
           hypercall is needed.
           Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and
           "map.domid == DOMID_SELF" to "d == current->domain".
* patch#3: Remove the check of PHYSDEVOP_setup_gsi, since the same check
           exists below.
* patch#5: Change the commit message to better describe why we need this
           new hypercall.
           Add a comment above "if ( is_pv_domain(current->domain) ||
           has_pirq(current->domain) )" to explain why we need this check.
           Add gsi_2_irq to transform gsi to irq, instead of assuming
           gsi == irq.
           Add explicit padding to struct xen_domctl_gsi_permission.


Best regards,
Jiqian Chen



v7->v8 changes:
* patch#2: Add the domid check (domid == DOMID_SELF) to prevent self map
           when the guest doesn't use pirq. That check was missed in the
           previous version.
* patch#4: Due to changes in how the gsi is obtained in the kernel, change
           to add a new function that gets the gsi by passing in the sbdf
           of the pci device.
* patch#5: Remove the parameter "is_gsi". When a gsi exists,
           pci_add_dm_done uses a new function pci_device_set_gsi to do
           map_pirq and grant permission. That makes the code logic more
           intuitive.


v6->v7 changes:
* patch#4: Due to changes in how the gsi is obtained in the kernel, change
           to add a new function that gets the gsi from the irq, instead
           of from the gsi sysfs node.
* patch#5: Fix the issue with variable usage, rc->r.


v5->v6 changes:
* patch#1: Add Reviewed-by from Stefano and Stewart. Rebase the code and
           change the old functions vpci_remove_device and
           vpci_add_handlers to vpci_deassign_device and
           vpci_assign_device.
* patch#2: Add Reviewed-by from Stefano.
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));".
* patch#4: Fix some coding style issues under the tools directory.
* patch#5: Modified some variable names and code logic to make the code
           easier to understand: use gsi by default, and stay compatible
           with older kernel versions by continuing to use irq.


v4->v5 changes:
* patch#1: Wrap the function vpci_reset_device_state with pci_lock.
* patch#2: Move the check of self map_pirq to physdev.c, change it to
           check whether the caller has the PIRQ flag, and just break for
           PHYSDEVOP_(un)map_pirq in hvm_physdev_op.
* patch#3: Return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: Is patch#5 in v4, because patch#5 in v5 has some dependency on
           it. Add the handling of errno and add the Reviewed-by from
           Stefano.
* patch#5: Is patch#4 in v4. New implementation that adds the new
           hypercall XEN_DOMCTL_gsi_permission to grant gsi.


v3->v4 changes:
* patch#1: Change the comment of PHYSDEVOP_pci_device_state_reset; move
           the printing to after pcidevs_unlock.
* patch#2: Add a check to prevent PVH self map.
* patch#3: New patch. The implementation of adding PHYSDEVOP_setup_gsi
           for PVH is treated as a separate patch.
* patch#4: New patch to solve the map_pirq problem of PVH dom0. Use gsi
           to grant irq permission in XEN_DOMCTL_irq_permission.
* patch#5: To be compatible with previous kernel versions, still use irq
           when there is no gsi sysfs node.
v4 link:
https://lore.kernel.org/xen-devel/20240105070920.350113-1-jiqian.c...@amd.com/T/#t

v2->v3 changes:
* patch#1: Move the content out of pci_reset_device_state and delete
           pci_reset_device_state; add an xsm_resource_setup_pci check
           for PHYSDEVOP_pci_device_state_reset; add a description for
           PHYSDEVOP_pci_device_state_reset.
* patch#2: Due to changes in the implementation of the second patch on
           the kernel side (it now does setup_gsi and map_pirq when
           assigning a device to passthrough), add PHYSDEVOP_setup_gsi
           for PVH dom0; we also need to support self mapping.
* patch#3: Due to changes in the implementation of the second patch on
           the kernel side (it adds a new sysfs node for gsi instead of a
           new syscall), read the gsi number from the gsi sysfs node.
v3 link:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-jiqian.c...@amd.com/T/#t

v2 link:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-jiqian.c...@amd.com/T/#t
Below is the description of v2 cover letter:
This series of patches is the v2 of the implementation of passthrough
when dom0 is PVH on Xen.
We sent v1 upstream before, but v1 had many problems and we received
lots of suggestions.
I will introduce all the issues that these patches try to fix and the
differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a pa

[XEN PATCH v9 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-06-07 Thread Jiqian Chen
When running Xen with a PVH dom0 and an hvm domU, a pirq is mapped
for a passthrough device of the hvm guest by using its gsi; see the
qemu code xen_pt_realize->xc_physdev_map_pirq and the libxl code
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not
allowed, because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq.
Also add a new check to prevent self map when the subject domain
has no PIRQ flag.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  6 ++
 xen/arch/x86/physdev.c   | 24 
 2 files changed, 30 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..fa5d50a0dd22 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd )
 {
+/*
+ * Only being permitted for management of other domains.
+ * Further restrictions are enforced in do_physdev_op.
+ */
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..61999882f836 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_map_pirq: {
 physdev_map_pirq_t map;
 struct msi_info msi;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&map, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(map.domid);
+if ( d == NULL )
+return -ESRCH;
+/* Prevent self-map when domain has no X86_EMU_USE_PIRQ flag */
+if ( is_hvm_domain(d) && !has_pirq(d) && d == current->domain )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 switch ( map.type )
 {
 case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 case PHYSDEVOP_unmap_pirq: {
 struct physdev_unmap_pirq unmap;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&unmap, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(unmap.domid);
+if ( d == NULL )
+return -ESRCH;
+/* Prevent self-unmap when domain has no X86_EMU_USE_PIRQ flag */
+if ( is_hvm_domain(d) && !has_pirq(d) && d == current->domain )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
 break;
 }
-- 
2.34.1




[RFC XEN PATCH v9 4/5] tools: Add new function to get gsi from dev

2024-06-07 Thread Jiqian Chen
A PVH dom0 uses the Linux local interrupt mechanism: irqs are
allocated for gsis dynamically, on a first-come, first-served
basis. Irq numbers are handed out in increasing order, but the
gsis requesting them do not arrive in order (gsi 38 may come
before gsi 28), so the irq number is not equal to the gsi number.
When passing through a device, QEMU uses its gsi number to do
the pirq mapping, see xen_pt_realize->xc_physdev_map_pirq,
but that number is read from the file
/sys/bus/pci/devices//irq, so the mapping fails.
And in the current code there is no way for userspace to get
the gsi.
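
The mismatch between irq and gsi numbers can be reproduced with a toy
allocator. This is a deliberately simplified model and not the Linux irq
allocation code: struct irq_allocator, irq_allocator_init and irq_for_gsi
are hypothetical, as is the starting irq number chosen by the caller.

```c
#define SKETCH_MAX_GSI 64

/* Toy allocator: hands out irq numbers in increasing order. */
struct irq_allocator {
    int next_irq;                   /* next free irq number */
    int gsi_to_irq[SKETCH_MAX_GSI]; /* -1 while a gsi has no irq yet */
};

void irq_allocator_init(struct irq_allocator *a, int first_irq)
{
    a->next_irq = first_irq;
    for (int i = 0; i < SKETCH_MAX_GSI; i++)
        a->gsi_to_irq[i] = -1;
}

/*
 * Return the irq bound to a gsi, allocating one on first use.
 * Allocation order follows request order, not gsi order.
 */
int irq_for_gsi(struct irq_allocator *a, int gsi)
{
    if (gsi < 0 || gsi >= SKETCH_MAX_GSI)
        return -1;
    if (a->gsi_to_irq[gsi] < 0)
        a->gsi_to_irq[gsi] = a->next_irq++;
    return a->gsi_to_irq[gsi];
}
```

If gsi 38 is requested before gsi 28 and allocation starts at irq 24, gsi 38
gets irq 24 and gsi 28 gets irq 25, so neither irq number matches its gsi.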

For the above purpose, add a new function to get the gsi, and call
it before xc_physdev_(un)map_pirq.
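
The new function identifies the device by its sbdf (segment, bus, device,
function). As a reminder of how that value is packed, here is a small
standalone sketch. The helper names are hypothetical; the layout follows the
usual PCI encoding of a 16-bit segment, 8-bit bus, 5-bit device and 3-bit
function.

```c
#include <stdint.h>

/* Pack device (slot) and function into an 8-bit devfn. */
static inline uint32_t sketch_devfn(uint32_t slot, uint32_t func)
{
    return ((slot & 0x1f) << 3) | (func & 0x07);
}

/* Pack segment, bus and devfn into a 32-bit sbdf. */
static inline uint32_t sketch_sbdf(uint32_t seg, uint32_t bus, uint32_t devfn)
{
    return ((seg & 0xffff) << 16) | ((bus & 0xff) << 8) | (devfn & 0xff);
}

/* Accessors for the individual fields of a packed sbdf. */
static inline uint32_t sketch_sbdf_seg(uint32_t sbdf)  { return sbdf >> 16; }
static inline uint32_t sketch_sbdf_bus(uint32_t sbdf)  { return (sbdf >> 8) & 0xff; }
static inline uint32_t sketch_sbdf_slot(uint32_t sbdf) { return (sbdf >> 3) & 0x1f; }
static inline uint32_t sketch_sbdf_func(uint32_t sbdf) { return sbdf & 0x07; }
```

For example, device 0000:03:00.0 packs to 0x0300 with this layout.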

Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs review and needs to wait for the corresponding third patch on 
linux kernel side to be merged.
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +++
 tools/include/xencall.h   |  2 ++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/call/core.c|  5 +
 tools/libs/call/libxencall.map|  2 ++
 tools/libs/call/linux.c   | 15 +++
 tools/libs/call/private.h |  9 +
 tools/libs/ctrl/xc_physdev.c  |  4 
 tools/libs/light/libxl_pci.c  | 23 +++
 9 files changed, 69 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..977f1a058797 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_dev {
+   __u32 sbdf;
+   int gsi;
+} privcmd_gsi_from_dev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_DEV \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_dev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xencall.h b/tools/include/xencall.h
index fc95ed0fe58e..750aab070323 100644
--- a/tools/include/xencall.h
+++ b/tools/include/xencall.h
@@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
  uint64_t arg1, uint64_t arg2, uint64_t arg3,
  uint64_t arg4, uint64_t arg5);
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
+
 /* Variant(s) of the above, as needed, returning "long" instead of "int". */
 long xencall2L(xencall_handle *xcall, unsigned int op,
uint64_t arg1, uint64_t arg2);
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..a0381f74d24b 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1aefa..6dae50c9a6ba 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
 return osdep_hypercall(xcall, &call);
 }
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf)
+{
+return osdep_oscall(xcall, sbdf);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
index d18a3174e9dc..b92a0b5dc12c 100644
--- a/tools/libs/call/libxencall.map
+++ b/tools/libs/call/libxencall.map
@@ -10,6 +10,8 @@ VERS_1.0 {
xencall4;
xencall5;
 
+   xen_oscall_gsi_from_dev;
+
xencall_alloc_buffer;
xencall_free_buffer;
xencall_alloc_buffer_pages;
diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
index 6d588e6bea8f..92c740e176f2 100644
--- a/tools/libs/call/linux.c
+++ b/tools/libs/call/linux.c
@@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall)
 return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
 }
 
+int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
+{
+privcmd_gsi_from_dev_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = -1,
+};
+
+if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &dev_gsi)) {
+PERROR("failed to get gsi from dev");
+return -1;
+}
+
+return dev_gsi.gsi;
+}
+
 static void *alloc_pages_bufdev(xencall_handle *xcall, size_t 

[RFC XEN PATCH v9 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-07 Thread Jiqian Chen
Some types of domain, such as PVH, don't have PIRQs and don't do
PHYSDEVOP_map_pirq for each gsi. When passing through a device
to a guest on a PVH dom0, the call stack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails at
domain_pirq_to_irq, because PVH has no mapping of gsi, pirq
and irq on the Xen side.

What's more, the current hypercall XEN_DOMCTL_irq_permission requires
a pirq to be passed in and grants access to the irq. That is not
suitable for a dom0 that has no PIRQ flag, because passing through a
device needs the gsi, and needs the corresponding irq granted to the
guest. So, add a new hypercall to grant gsi permission when dom0 is
not PV or dom0 has no PIRQ flag.
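
In this version the toolstack tries the gsi-based call first and falls back
to the irq-based one when the hypervisor reports EOPNOTSUPP. The sketch below
models only that fallback. The grant_fn type, grant_with_fallback and the
stub_* functions are hypothetical and stand in for xc_domain_gsi_permission
and xc_domain_irq_permission.

```c
#include <errno.h>

/* Hypothetical signature shared by both permission calls in this sketch. */
typedef int (*grant_fn)(unsigned int id);

/*
 * Try the gsi-based grant first; fall back to the pirq-based grant only
 * when the first call fails with EOPNOTSUPP.
 */
int grant_with_fallback(grant_fn by_gsi, unsigned int gsi,
                        grant_fn by_irq, unsigned int pirq)
{
    int r = by_gsi(gsi);

    if (r && errno == EOPNOTSUPP)
        r = by_irq(pirq);
    return r;
}

/* Stub: behaves like a hypervisor without the new domctl. */
int stub_unsupported(unsigned int id)
{
    (void)id;
    errno = EOPNOTSUPP;
    return -1;
}

/* Stub: behaves like a call refused for another reason. */
int stub_denied(unsigned int id)
{
    (void)id;
    errno = EPERM;
    return -1;
}

/* Stub: behaves like a successful call. */
int stub_ok(unsigned int id)
{
    (void)id;
    return 0;
}
```

Only the EOPNOTSUPP case triggers the fallback; any other failure of the
gsi-based call is reported to the caller unchanged.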

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
RFC: it needs review and needs to wait for the corresponding third patch on 
linux kernel side to be merged.
---
 tools/include/xenctrl.h|  5 +++
 tools/libs/ctrl/xc_domain.c| 15 +++
 tools/libs/light/libxl_pci.c   | 72 +++---
 xen/arch/x86/domctl.c  | 38 
 xen/arch/x86/include/asm/io_apic.h |  2 +
 xen/arch/x86/io_apic.c | 21 +
 xen/arch/x86/mpparse.c |  3 +-
 xen/include/public/domctl.h| 10 +
 xen/xsm/flask/hooks.c  |  1 +
 9 files changed, 149 insertions(+), 18 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index a0381f74d24b..f3feb6848e25 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.allow_access = allow_access,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 7e44d4c3ae2b..b8ec37d8d7e3 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1412,6 +1412,37 @@ static bool pci_supp_legacy_irq(void)
 #define PCI_SBDF(seg, bus, devfn) \
 ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
 
+static int pci_device_set_gsi(libxl_ctx *ctx,
+  libxl_domid domid,
+  libxl_device_pci *pci,
+  bool map,
+  int *gsi_back)
+{
+int r, gsi, pirq;
+uint32_t sbdf;
+
+sbdf = PCI_SBDF(pci->domain, pci->bus, (PCI_DEVFN(pci->dev, pci->func)));
+r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
+*gsi_back = r;
+if (r < 0)
+return r;
+
+gsi = r;
+pirq = r;
+if (map)
+r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);
+else
+r = xc_physdev_unmap_pirq(ctx->xch, domid, pirq);
+if (r)
+return r;
+
+r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map);
+if (r && errno == EOPNOTSUPP)
+r = xc_domain_irq_permission(ctx->xch, domid, pirq, map);
+
+return r;
+}
+
 static void pci_add_dm_done(libxl__egc *egc,
 pci_add_state *pas,
 int rc)
@@ -1424,10 +1455,10 @@ static void pci_add_dm_done(libxl__egc *egc,
 unsigned long long start, end, flags, size;
 int irq, i;
 int r;
-uint32_t sbdf;
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+int gsi;
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1485,6 +1516,19 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
+
+r = pci_device_set_gsi(ctx, domid, pci, 1, &gsi);
+if (gsi >= 0) {
+if (r < 0) {
+rc = ERROR_FAIL;
+LOGED(

[XEN PATCH v9 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-07 Thread Jiqian Chen
On PVH dom0, the gsis don't get registered, but
the gsi of a passthrough device must be configured before it
can be mapped into an HVM domU.
On the Linux kernel side, PHYSDEVOP_setup_gsi is called for
passthrough devices to register the gsi when dom0 is PVH.
So, add PHYSDEVOP_setup_gsi for this purpose.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
The Linux kernel code that calls this hypercall can be found at:
https://lore.kernel.org/lkml/20240607075109.126277-3-jiqian.c...@amd.com/T/#u
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index fa5d50a0dd22..164f4eefa043 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return -ENOSYS;
 break;
 
+case PHYSDEVOP_setup_gsi:
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
-- 
2.34.1




[XEN PATCH v9 1/5] xen/vpci: Clear all vpci status of device

2024-06-07 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the Xen
side won't get a notification, so the cached state in vpci is
out of date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0 side,
dom0 can call this hypercall to notify vpci.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 43 
 xen/drivers/vpci/vpci.c  |  9 
 xen/include/public/physdev.h |  7 ++
 xen/include/xen/pci.h| 16 ++
 xen/include/xen/vpci.h   |  6 +
 6 files changed, 82 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..1cce508a73b1 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,11 +2,17 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
 #endif
 
+static const struct pci_device_state_reset_method
+pci_device_state_reset_methods[] = {
+[ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
+};
+
 ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 ret_t ret;
@@ -67,6 +73,43 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct pci_device_state_reset dev_reset;
+struct physdev_pci_device *dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+break;
+dev = &dev_reset.dev;
+sbdf = PCI_SBDF(dev->seg, dev->bus, dev->devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+ret = pci_device_state_reset_methods[dev_reset.reset_type].reset_fn(pdev);
+write_unlock(&pdev->domain->pci_lock);
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset vPCI device state\n", &sbdf);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..ff67c2550ccb 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..a71da5892e5f 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 63e49f0117e9..376981f9da98 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -156,6 +156,22 @@ struct pci_dev {
 struct vpci *vpci;
 };
 
+struct pci_device_state_reset_method {
+int (*reset_fn)(struct pci_dev *pdev);
+};
+
+enum pci_device_state_reset_type {
+DEVICE_RESET_FLR,
+DEVICE_RESET_COLD,
+DEVICE_RESET_WARM,
+DEVICE_RESET_HOT,
+};
+
+struct pci_device_state_reset {
+struct physdev_pci_device dev;
+enum pci_device_state_reset_type reset_type;
+};
+
 #define for_each_pdev(domain, pdev) \
 list_for_

[RFC KERNEL PATCH v8 2/3] xen/pvh: Setup gsi for passthrough device

2024-06-07 Thread Jiqian Chen
In PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured before it can be
mapped into a domU.

When assigning a device for passthrough, proactively set up the
gsi of the device during that process.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
RFC: it needs to wait for the corresponding third patch on xen side to be merged.
---
 arch/x86/xen/enlighten_pvh.c   | 23 ++
 drivers/acpi/pci_irq.c |  2 +-
 drivers/xen/acpi.c | 50 ++
 drivers/xen/xen-pciback/pci_stub.c | 21 +
 include/linux/acpi.h   |  1 +
 include/xen/acpi.h | 10 ++
 6 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 27a2a02ef8fb..6caadf9c00ab 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -4,6 +4,7 @@
 #include 
 
 #include 
+#include 
 
 #include 
 #include 
@@ -27,6 +28,28 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+#ifdef CONFIG_XEN_DOM0
+int xen_pvh_setup_gsi(int gsi, int trigger, int polarity)
+{
+   int ret;
+   struct physdev_setup_gsi setup_gsi;
+
+   setup_gsi.gsi = gsi;
+   setup_gsi.triggering = (trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+   setup_gsi.polarity = (polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+   ret = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+   if (ret == -EEXIST) {
+   xen_raw_printk("Already setup the GSI :%d\n", gsi);
+   ret = 0;
+   } else if (ret)
+   xen_raw_printk("Fail to setup GSI (%d)!\n", gsi);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_setup_gsi);
+#endif
+
 void __init xen_pvh_init(struct boot_params *boot_params)
 {
u32 msr;
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
 }
 #endif /* CONFIG_X86_IO_APIC */
 
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
 {
struct acpi_prt_entry *entry = NULL;
struct pci_dev *bridge;
diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c
index 6893c79fd2a1..9e2096524fbc 100644
--- a/drivers/xen/acpi.c
+++ b/drivers/xen/acpi.c
@@ -30,6 +30,7 @@
  * IN THE SOFTWARE.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -75,3 +76,52 @@ int xen_acpi_notify_hypervisor_extended_sleep(u8 sleep_state,
return xen_acpi_notify_hypervisor_state(sleep_state, val_a,
val_b, true);
 }
+
+struct acpi_prt_entry {
+   struct acpi_pci_id  id;
+   u8  pin;
+   acpi_handle link;
+   u32 index;
+};
+
+int xen_acpi_get_gsi_info(struct pci_dev *dev,
+ int *gsi_out,
+ int *trigger_out,
+ int *polarity_out)
+{
+   int gsi;
+   u8 pin;
+   struct acpi_prt_entry *entry;
+   int trigger = ACPI_LEVEL_SENSITIVE;
+   int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+   if (!dev || !gsi_out || !trigger_out || !polarity_out)
+   return -EINVAL;
+
+   pin = dev->pin;
+   if (!pin)
+   return -EINVAL;
+
+   entry = acpi_pci_irq_lookup(dev, pin);
+   if (entry) {
+   if (entry->link)
+   gsi = acpi_pci_link_allocate_irq(entry->link,
+entry->index,
+&trigger, &polarity,
+NULL);
+   else
+   gsi = entry->index;
+   } else
+   gsi = -1;
+
+   if (gsi < 0)
+   return -EINVAL;
+
+   *gsi_out = gsi;
+   *trigger_out = trigger;
+   *polarity_out = polarity;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(xen_acpi_get_gsi_info);
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 73062e531c34..6b22e45188f5 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -21,6 +21,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_XEN_ACPI
+#include 
+#endif
 #include 
 #include 
 #include "pciback.h"
@@ -367,6 +370,9 @@ static int pcistub_match(struct pci_dev *dev)
 static int pcistub_init_device(struct pci_dev *dev)
 {
struct xen_pcibk_dev_data *dev_data;
+#

[RFC KERNEL PATCH v8 3/3] xen/privcmd: Add new syscall to get gsi from dev

2024-06-07 Thread Jiqian Chen
In PVH dom0, the linux local interrupt mechanism is used.
When it allocates an irq for a gsi, the allocation is dynamic
and follows the first-come, first-served principle. The irq
numbers are allocated from small to large, but gsis are not
requested in order: gsi 38 may come before gsi 28, so the
irq number is not equal to the gsi number.
When passing through a device, QEMU uses the device's gsi
number to do the pirq mapping, but the gsi number is read from
the file /sys/bus/pci/devices//irq. Since irq != gsi, the
mapping fails.
Current linux code provides no method for userspace to get
the gsi.

To address this, record the gsi of pcistub devices when
initializing pcistub, and add a new syscall to privcmd so that
userspace can get the gsi when needed.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
RFC: it needs review and needs to wait for the previous patch of this series
to be merged.
---
 drivers/xen/privcmd.c  | 28 ++
 drivers/xen/xen-pciback/pci_stub.c | 38 +++---
 include/uapi/xen/privcmd.h |  7 ++
 include/xen/acpi.h |  9 +++
 4 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 67dfa4778864..5809b3168f25 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -45,6 +45,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_XEN_ACPI
+#include 
+#endif
 
 #include "privcmd.h"
 
@@ -842,6 +845,27 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
return rc;
 }
 
+static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata)
+{
+#ifdef CONFIG_XEN_ACPI
+   struct privcmd_gsi_from_dev kdata;
+
+   if (copy_from_user(&kdata, udata, sizeof(kdata)))
+   return -EFAULT;
+
+   kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf);
+   if (kdata.gsi == -1)
+   return -EINVAL;
+
+   if (copy_to_user(udata, &kdata, sizeof(kdata)))
+   return -EFAULT;
+
+   return 0;
+#else
+   return -EINVAL;
+#endif
+}
+
 #ifdef CONFIG_XEN_PRIVCMD_EVENTFD
 /* Irqfd support */
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -1529,6 +1553,10 @@ static long privcmd_ioctl(struct file *file,
ret = privcmd_ioctl_ioeventfd(file, udata);
break;
 
+   case IOCTL_PRIVCMD_GSI_FROM_DEV:
+   ret = privcmd_ioctl_gsi_from_dev(file, udata);
+   break;
+
default:
break;
}
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 6b22e45188f5..9d791d7a8098 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -56,6 +56,9 @@ struct pcistub_device {
 
struct pci_dev *dev;
struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */
+#ifdef CONFIG_XEN_ACPI
+   int gsi;
+#endif
 };
 
 /* Access to pcistub_devices & seized_devices lists and the initialize_devices
@@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
 
kref_init(&psdev->kref);
spin_lock_init(&psdev->lock);
+#ifdef CONFIG_XEN_ACPI
+   psdev->gsi = -1;
+#endif
 
return psdev;
 }
@@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct xen_pcibk_device *pdev,
return pci_dev;
 }
 
+#ifdef CONFIG_XEN_ACPI
+int pcistub_get_gsi_from_sbdf(unsigned int sbdf)
+{
+   struct pcistub_device *psdev;
+   int domain = (sbdf >> 16) & 0xffff;
+   int bus = PCI_BUS_NUM(sbdf);
+   int slot = PCI_SLOT(sbdf);
+   int func = PCI_FUNC(sbdf);
+
+   psdev = pcistub_device_find(domain, bus, slot, func);
+
+   if (!psdev)
+   return -1;
+
+   return psdev->gsi;
+}
+EXPORT_SYMBOL_GPL(pcistub_get_gsi_from_sbdf);
+#endif
+
 struct pci_dev *pcistub_get_pci_dev_by_slot(struct xen_pcibk_device *pdev,
int domain, int bus,
int slot, int func)
@@ -367,14 +392,20 @@ static int pcistub_match(struct pci_dev *dev)
return found;
 }
 
-static int pcistub_init_device(struct pci_dev *dev)
+static int pcistub_init_device(struct pcistub_device *psdev)
 {
struct xen_pcibk_dev_data *dev_data;
+   struct pci_dev *dev;
 #ifdef CONFIG_XEN_ACPI
int gsi, trigger, polarity;
 #endif
int err = 0;
 
+   if (!psdev)
+   return -EINVAL;
+
+   dev = psdev->dev;
+
dev_dbg(&dev->dev, "initializing...\n");
 
/* The PCI backend is not intended to be a module (or to work with
@@ -448,6 +479,7 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_err(&dev->dev, "Fail to get gsi info!\n");
goto config_release;
}
+   psdev->gsi = gsi;
 
if (xen_initi

[RFC KERNEL PATCH v8 0/2] Support device passthrough when dom0 is PVH on Xen

2024-06-07 Thread Jiqian Chen
Hi All,
This is v8 series to support passthrough on Xen when dom0 is PVH.
v7->v8 change:
* patch#1: This is patch#1 of v6 again, because it was reverted from the
   staging branch due to the API changes on Xen side.
   Add pci_device_state_reset_type_t to distinguish the reset types.
* patch#2: is patch#1 of v7. Use CONFIG_XEN_ACPI instead of CONFIG_ACPI to
   wrap the code.
* patch#3: is patch#2 of v7. In function privcmd_ioctl_gsi_from_dev, return
   -EINVAL when CONFIG_XEN_ACPI is not configured.
   Use PCI_BUS_NUM, PCI_SLOT and PCI_FUNC instead of open coding.


Best regards,
Jiqian Chen



v6->v7 change:
* the first patch of v6 was already merged into branch linux_next.
* patch#1: is patch#2 of v6. Move the implementation of function
   xen_acpi_get_gsi_info to file drivers/xen/acpi.c; that modification is
   more convenient for the subsequent patch to obtain gsi.
* patch#2: is patch#3 of v6. Add a new parameter "gsi" to struct
   pcistub_device and set gsi when pcistub initializes the device. Then when
   userspace wants to get gsi by passing sbdf, we can return that gsi.


v5->v6 change:
* patch#3: change to add a new syscall to translate irq to gsi, instead of
   adding a new gsi sysfs.


v4->v5 changes:
* patch#1: Add Reviewed-by Stefano
* patch#2: Add Reviewed-by Stefano
* patch#3: No changes


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; use a new
   function pcistub_reset_device_state to wrap __pci_reset_function_locked
   and xen_reset_device_state, and call pcistub_reset_device_state in
   pci_stub.c
* patch#2: remove map_pirq from xen_pvh_passthrough_gsi


v2->v3 changes:
* patch#1: add a condition to limit xen_reset_device_state to non-PV domains
   in pcistub_init_device.
* patch#2: Abandon the previous implementation that called unmask_irq.
   Instead, set up the gsi and map the pirq for the passthrough device in
   pcistub_init_device.
* patch#3: Abandon the previous implementation that added a new syscall to
   get gsi from irq. Instead, add a new sysfs for gsi, so that userspace can
   get the gsi number from sysfs.


Below is the description of v2 cover letter:
This series of patches is the v2 of the implementation of passthrough when
dom0 is PVH on Xen.
We sent the v1 upstream before, but the v1 had many problems and we got
lots of suggestions.
I will introduce all the issues that these patches try to fix and the
differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a device,
pci_stub will call pcistub_init_device() -> pci_restore_state() ->
pci_restore_config_space() -> pci_restore_config_space_range() ->
pci_restore_config_dword() -> pci_write_config_dword(). The pci config write
will trigger an io interrupt to bar_write() in Xen, but bar->enabled was set
before, so the write is not allowed, and then when QEMU configures the
passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: we don't tell vPCI that the device has been reset, so the cached
state in pdev->vpci is out of date and differs from the real device state.

Solution: the first kernel patch (xen/pci: Add xen_reset_device_state
function) and the first Xen patch (xen/vpci: Clear all vpci status of device)
add a new hypercall to reset the state stored in vPCI when the state of the
real device has changed.
Thanks to Roger for suggesting this approach for v2. It differs from v1
(https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/),
which simply allowed domU to write the pci bar; that does not comply with the
design principles of vPCI.
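
The stale-cache problem and the effect of the reset notification can be
illustrated with a toy model. This is only a sketch under stated assumptions:
toy_device, toy_vpci and all toy_* functions are hypothetical names, not Xen
or Linux APIs. The only idea taken from the series is that resetting the
cached state amounts to a deassign followed by an assign.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real hardware state of one BAR. */
struct toy_device {
    uint32_t bar;
    bool mem_enabled;
};

/* Hypothetical stand-in for the state the hypervisor caches. */
struct toy_vpci {
    uint32_t cached_bar;
    bool cached_enabled;
};

/* Populate the cache from the device, as an "assign" step would. */
void toy_vpci_assign(struct toy_vpci *v, const struct toy_device *d)
{
    assert(v && d);
    v->cached_bar = d->bar;
    v->cached_enabled = d->mem_enabled;
}

/* Drop everything that was cached, as a "deassign" step would. */
void toy_vpci_deassign(struct toy_vpci *v)
{
    memset(v, 0, sizeof(*v));
}

/* A device reset done by dom0: the hypervisor-side cache is not touched. */
void toy_device_reset(struct toy_device *d)
{
    d->bar = 0;
    d->mem_enabled = false;
}

/* True when the cache no longer matches the device. */
bool toy_vpci_is_stale(const struct toy_vpci *v, const struct toy_device *d)
{
    return v->cached_bar != d->bar || v->cached_enabled != d->mem_enabled;
}

/* What the notification boils down to: rebuild the cache from scratch. */
void toy_vpci_reset_state(struct toy_vpci *v, const struct toy_device *d)
{
    toy_vpci_deassign(v);
    toy_vpci_assign(v, d);
}
```

In this model, calling toy_device_reset() alone leaves toy_vpci_is_stale()
true; calling toy_vpci_reset_state() afterwards brings the cache back in
sync with the device.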

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by
using gsi. See xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will call into
Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: in hvm_physdev_op(), the variable "currd" is PVH dom0, and PVH has no
X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0
(at present dom0 is PVH). The second Xen patch (x86/pvh: Open
PHYSDEVOP_map_pirq for PVH dom0) allows PVH dom0 to do PHYSDEVOP_map_pirq.
This v2 patch is better than v1, which simply removed the has_pirq check
(xen
https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.hu...@amd.com/).
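
The gating described above can be sketched as a small decision function.
This is an illustration only, not the code of hvm_physdev_op(): toy_domain,
toy_physdev_cmd and toy_physdev_gate are hypothetical names, and the exact
shape of the check in the Xen patch is an assumption here. The sketch only
encodes the two statements from the text: ops that need pirqs fail the
has_pirq check on PVH, and map_pirq is additionally allowed when the caller
is dom0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the calling domain. */
struct toy_domain {
    bool has_pirq;           /* models the X86_EMU_USE_PIRQ flag */
    bool is_hardware_domain; /* true for dom0 */
};

/* Hypothetical op codes; only map_pirq gets the dom0 exception here. */
enum toy_physdev_cmd {
    TOY_MAP_PIRQ,
    TOY_OTHER_PIRQ_OP,
};

/* Returns 0 if the op may proceed, -ENOSYS otherwise. */
int toy_physdev_gate(const struct toy_domain *currd, enum toy_physdev_cmd cmd)
{
    assert(currd);

    /* Assumed v2 behaviour: dom0 may map a pirq even without the flag. */
    if (cmd == TOY_MAP_PIRQ && currd->is_hardware_domain)
        return 0;

    /* Everything else still has to pass the has_pirq check. */
    return currd->has_pirq ? 0 : -ENOSYS;
}
```

With this gate, a PVH dom0 (no pirq flag, hardware domain) passes for
TOY_MAP_PIRQ, while a PVH domU is still rejected with -ENOSYS.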

3. the gsi of a passthrough device is not unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 callback function pci_add_dm_done() will be called when qemu configures a
passthrough device for domU.
This 

[RFC KERNEL PATCH v8 1/3] xen/pci: Add xen_reset_device_function_state

2024-06-07 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the Xen
side won't get a notification, so the cached state in vpci is
out of date with respect to the real device state.
To solve that problem, add a new function to clear all vpci
device state when a device is reset on the dom0 side.

Call that function in pcistub_init_device, because when
"pci-assignable-add" is used to assign a passthrough device in
Xen, it resets the passthrough device, the vpci state becomes
out of date, and the device then fails to restore its bar state.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
RFC: it needs to wait for the corresponding first patch on xen side to be merged.
---
 drivers/xen/pci.c  | 25 +
 drivers/xen/xen-pciback/pci_stub.c | 18 +++---
 include/xen/interface/physdev.h|  7 +++
 include/xen/pci.h  |  6 ++
 4 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 72d4e3f193af..57093e395982 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -177,6 +177,31 @@ static int xen_remove_device(struct device *dev)
return r;
 }
 
+enum pci_device_state_reset_type {
+   DEVICE_RESET_FLR,
+   DEVICE_RESET_COLD,
+   DEVICE_RESET_WARM,
+   DEVICE_RESET_HOT,
+};
+
+struct pci_device_state_reset {
+   struct physdev_pci_device dev;
+   enum pci_device_state_reset_type reset_type;
+};
+
+int xen_reset_device_function_state(const struct pci_dev *dev)
+{
+   struct pci_device_state_reset device = {
+   .dev.seg = pci_domain_nr(dev->bus),
+   .dev.bus = dev->bus->number,
+   .dev.devfn = dev->devfn,
+   .reset_type = DEVICE_RESET_FLR,
+   };
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
+}
+EXPORT_SYMBOL_GPL(xen_reset_device_function_state);
+
 static int xen_pci_notifier(struct notifier_block *nb,
unsigned long action, void *data)
 {
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e34b623e4b41..73062e531c34 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -89,6 +89,16 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
return psdev;
 }
 
+static int pcistub_reset_device_state(struct pci_dev *dev)
+{
+   __pci_reset_function_locked(dev);
+
+   if (!xen_pv_domain())
+   return xen_reset_device_function_state(dev);
+   else
+   return 0;
+}
+
 /* Don't call this directly as it's called by pcistub_device_put */
 static void pcistub_device_release(struct kref *kref)
 {
@@ -107,7 +117,7 @@ static void pcistub_device_release(struct kref *kref)
/* Call the reset function which does not take lock as this
 * is called from "unbind" which takes a device_lock mutex.
 */
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
if (dev_data &&
pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
dev_info(&dev->dev, "Could not reload PCI state\n");
@@ -284,7 +294,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 * (so it's ready for the next domain)
 */
device_lock_assert(&dev->dev);
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
 
dev_data = pci_get_drvdata(dev);
ret = pci_load_saved_state(dev, dev_data->pci_saved_state);
@@ -420,7 +430,9 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
else {
dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
-   __pci_reset_function_locked(dev);
+   err = pcistub_reset_device_state(dev);
+   if (err)
+   goto config_release;
pci_restore_state(dev);
}
/* Now disable the device (this also ensures some private device
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index a237af867873..b50646c993dd 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -256,6 +256,13 @@ struct physdev_pci_device_add {
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/include/xen/pci.h b/include/xen/pci.h
index b8337cf85fd1..7941809ab729 

[RFC QEMU PATCH v7 1/1] xen/pci: get gsi for passthrough devices

2024-05-16 Thread Jiqian Chen
In PVH dom0, the linux local interrupt mechanism is used.
When it allocates an irq for a gsi, the allocation is dynamic
and follows the first-come, first-served principle. The irq
numbers are allocated from small to large, but gsis are not
requested in order: gsi 38 may come before gsi 28, so the
irq number is not equal to the gsi number.
When passing through a device, qemu wants to use the
gsi to map the pirq, xen_pt_realize->xc_physdev_map_pirq, but
the gsi number is read from the file
/sys/bus/pci/devices//irq in the current code, so the
mapping fails.

Get the gsi by using the new function supported by Xen tools.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 hw/xen/xen-host-pci-device.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716a2..2fe6a60434ba 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -10,6 +10,7 @@
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "xen-host-pci-device.h"
+#include "hw/xen/xen_native.h"
 
 #define XEN_HOST_PCI_MAX_EXT_CAP \
 ((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4))
@@ -329,12 +330,17 @@ int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *d, uint32_t cap)
 return -1;
 }
 
+#define PCI_SBDF(seg, bus, dev, func) \
+((((uint32_t)(seg)) << 16) | \
+(PCI_BUILD_BDF(bus, PCI_DEVFN(dev, func))))
+
 void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
  uint8_t bus, uint8_t dev, uint8_t func,
  Error **errp)
 {
 ERRP_GUARD();
 unsigned int v;
+uint32_t sdbf;
 
 d->config_fd = -1;
 d->domain = domain;
@@ -364,11 +370,16 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
 }
 d->device_id = v;
 
-xen_host_pci_get_dec_value(d, "irq", &v, errp);
-if (*errp) {
-goto error;
+sdbf = PCI_SBDF(domain, bus, dev, func);
+d->irq = xc_physdev_gsi_from_dev(xen_xc, sdbf);
+/* fail to get gsi, fallback to irq */
+if (d->irq == -1) {
+xen_host_pci_get_dec_value(d, "irq", &v, errp);
+if (*errp) {
+goto error;
+}
+d->irq = v;
 }
-d->irq = v;
 
 xen_host_pci_get_hex_value(d, "class", &v, errp);
 if (*errp) {
-- 
2.34.1




[RFC QEMU PATCH v7 0/1] Support device passthrough when dom0 is PVH on Xen

2024-05-16 Thread Jiqian Chen
Hi All,
This is v7 series to support passthrough on Xen when dom0 is PVH.
v6->v7 changes:
* Due to changes in the implementation of obtaining gsi in the kernel and Xen,
  change to use xc_physdev_gsi_from_dev, which requires passing in sbdf
  instead of irq.
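
For reference, the sbdf value passed to such a call packs segment, bus,
device and function into one 32-bit word: segment in bits 31..16, bus in
bits 15..8, device in bits 7..3 and function in bits 2..0. The helpers below
are a minimal sketch of that layout; the toy_* names are hypothetical and
are not part of QEMU, libxc or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded PCI address; hypothetical helper type for this sketch. */
struct toy_pci_addr {
    uint16_t seg;
    uint8_t bus;
    uint8_t dev;  /* 5 bits: 0..31 */
    uint8_t func; /* 3 bits: 0..7 */
};

/* Pack segment/bus/device/function into a 32-bit sbdf. */
uint32_t toy_sbdf(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t func)
{
    assert(dev < 32 && func < 8);
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)dev << 3) | func;
}

/* Split an sbdf back into its fields. */
struct toy_pci_addr toy_sbdf_decode(uint32_t sbdf)
{
    struct toy_pci_addr a = {
        .seg  = (uint16_t)(sbdf >> 16),
        .bus  = (uint8_t)((sbdf >> 8) & 0xff),
        .dev  = (uint8_t)((sbdf >> 3) & 0x1f),
        .func = (uint8_t)(sbdf & 0x07),
    };
    return a;
}
```

For example, device 0000:03:00.0 packs to 0x00000300, and decoding that
value yields segment 0, bus 3, device 0, function 0 again.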


Best regards,
Jiqian Chen



v5->v6 changes:
* Due to changes in the implementation of obtaining gsi in the kernel and Xen,
  change to use xc_physdev_gsi_from_irq instead of the gsi sysfs.


v4->v5 changes:
* Add review by Stefano


v3->v4 changes:
* Add gsi into struct XenHostPCIDevice and use the gsi number read from the
  gsi sysfs if it exists; if there is no gsi sysfs, still use irq.


v2->v3 changes:
* Due to changes in the implementation of the second patch on the kernel side
  (which adds a new sysfs for gsi instead of a new syscall), read the gsi
  number from the sysfs of gsi.


Below is the description of v2 cover letter:
This patch is the v2 of the implementation of passthrough when dom0 is PVH on
Xen.
Issues we encountered:
1. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() to map a passthrough device's gsi
to a pirq in function xen_pt_realize(), but the call fails.

Reason: according to the implementation of xc_physdev_map_pirq(), it needs a
gsi instead of an irq, but qemu passes an irq to it and treats the irq as a
gsi. The irq is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in
function xen_host_pci_device_get(), but the gsi number is actually not equal
to the irq. On PVH dom0, when an irq is allocated for a gsi in function
acpi_register_gsi_ioapic(), the allocation is dynamic and follows the
first-come, first-served principle. If you debug the kernel code (see
function __irq_alloc_descs), you will find that irq numbers are allocated in
order from small to large, but gsis are not requested in order: gsi 38 may
come before gsi 28, so gsi 38 gets a smaller irq number than gsi 28, and
then gsi != irq.

Solution: we can record the relation between gsi and irq; then, when
userspace (qemu) wants to use the gsi, we can do a translation. The third
kernel patch (xen/privcmd: Add new syscall to get gsi from irq) records all
the relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci
devices, and provides a syscall for userspace to get the gsi from the irq.
The third Xen patch (tools: Add new function to get gsi from irq) adds a new
function xc_physdev_gsi_from_irq() to call the new syscall added on the
kernel side.
Userspace can then use that function to get the gsi, and
xc_physdev_map_pirq() will succeed.
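
The allocation order and the translation described above can be sketched
with a toy model. This is not kernel code: toy_irq_map and the toy_*
functions are hypothetical, and TOY_FIRST_IRQ is an arbitrary starting value
chosen for illustration. The model hands out irq numbers in ascending order
as gsis are registered and keeps the gsi/irq relation so that an irq can be
translated back.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_GSI   64
#define TOY_FIRST_IRQ 24 /* hypothetical first dynamically allocated irq */

/* Records which irq was handed out for each gsi (-1: not registered). */
struct toy_irq_map {
    int irq_of_gsi[TOY_MAX_GSI];
    int next_irq;
};

void toy_irq_map_init(struct toy_irq_map *m)
{
    for (size_t i = 0; i < TOY_MAX_GSI; i++)
        m->irq_of_gsi[i] = -1;
    m->next_irq = TOY_FIRST_IRQ;
}

/*
 * Register a gsi: the first request gets the lowest free irq, the next
 * request the one after it, regardless of the gsi number itself.
 */
int toy_register_gsi(struct toy_irq_map *m, int gsi)
{
    assert(gsi >= 0 && gsi < TOY_MAX_GSI);
    if (m->irq_of_gsi[gsi] < 0)
        m->irq_of_gsi[gsi] = m->next_irq++;
    return m->irq_of_gsi[gsi];
}

/* Translate an irq back to its gsi using the recorded relation. */
int toy_gsi_from_irq(const struct toy_irq_map *m, int irq)
{
    for (int gsi = 0; gsi < TOY_MAX_GSI; gsi++)
        if (m->irq_of_gsi[gsi] == irq)
            return gsi;
    return -1; /* unknown irq */
}
```

Registering gsi 38 before gsi 28 gives gsi 38 the smaller irq, so reading
the irq and treating it as a gsi is wrong, while toy_gsi_from_irq() recovers
the right value from the recorded relation.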

This v2 on the qemu side is the same as the v1
(qemu
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/),
it just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (1):
  xen/pci: get gsi for passthrough devices

 hw/xen/xen-host-pci-device.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

-- 
2.34.1




[RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-05-16 Thread Jiqian Chen
Some types of domain, like PVH, don't have PIRQs. When
passing through a device to a guest on PVH dom0, the call
pci_add_dm_done->XEN_DOMCTL_irq_permission fails
at domain_pirq_to_irq.

So, add a new hypercall to grant/revoke gsi permission
when dom0 is not PV or dom0 does not have the PIRQ flag.
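
The way a caller can combine the new gsi-based call with the existing
irq-based one is sketched below: try the gsi call first and fall back only
when it reports EOPNOTSUPP. The sketch is self-contained and uses stubs
instead of libxc; toy_perm_fn, toy_set_permission and the toy_*_perm_*
stubs are hypothetical names, and which identifier is handed to the
fallback is a choice of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical signature shared by both permission calls in this sketch. */
typedef int (*toy_perm_fn)(uint32_t domid, uint32_t id, bool allow);

int toy_irq_perm_calls; /* counts how often the legacy path was taken */

/* Stub: a hypervisor that implements the gsi-based call. */
int toy_gsi_perm_ok(uint32_t domid, uint32_t gsi, bool allow)
{
    (void)domid; (void)gsi; (void)allow;
    return 0;
}

/* Stub: an older hypervisor that does not know the gsi-based call. */
int toy_gsi_perm_unsupported(uint32_t domid, uint32_t gsi, bool allow)
{
    (void)domid; (void)gsi; (void)allow;
    errno = EOPNOTSUPP;
    return -1;
}

/* Stub: the gsi-based call exists but denies the request. */
int toy_gsi_perm_denied(uint32_t domid, uint32_t gsi, bool allow)
{
    (void)domid; (void)gsi; (void)allow;
    errno = EPERM;
    return -1;
}

/* Stub: the legacy irq-based permission call. */
int toy_irq_perm_ok(uint32_t domid, uint32_t pirq, bool allow)
{
    (void)domid; (void)pirq; (void)allow;
    toy_irq_perm_calls++;
    return 0;
}

int toy_set_permission(toy_perm_fn by_gsi, toy_perm_fn by_irq,
                       uint32_t domid, uint32_t gsi, uint32_t pirq,
                       bool allow)
{
    int r;

    assert(by_gsi && by_irq);

    r = by_gsi(domid, gsi, allow);
    /* Fall back only when the new call is unsupported, not on other errors. */
    if (r && errno == EOPNOTSUPP)
        r = by_irq(domid, pirq, allow);

    return r;
}
```

Restricting the fallback to EOPNOTSUPP keeps a genuine refusal, such as
EPERM, from being silently retried through the legacy path.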

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/include/xenctrl.h  |  5 +++
 tools/libs/ctrl/xc_domain.c  | 15 
 tools/libs/light/libxl_pci.c | 72 
 xen/arch/x86/domctl.c| 31 
 xen/include/public/domctl.h  |  9 +
 xen/xsm/flask/hooks.c|  1 +
 6 files changed, 117 insertions(+), 16 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 841db41ad7e4..c21a79d74be3 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.allow_access = allow_access,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 7e44d4c3ae2b..1d1b81dd2844 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1412,6 +1412,37 @@ static bool pci_supp_legacy_irq(void)
 #define PCI_SBDF(seg, bus, devfn) \
 ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
 
+static int pci_device_set_gsi(libxl_ctx *ctx,
+  libxl_domid domid,
+  libxl_device_pci *pci,
+  bool map,
+  int *gsi_back)
+{
+int r, gsi, pirq;
+uint32_t sbdf;
+
+sbdf = PCI_SBDF(pci->domain, pci->bus, (PCI_DEVFN(pci->dev, pci->func)));
+r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
+*gsi_back = r;
+if (r < 0)
+return r;
+
+gsi = r;
+pirq = r;
+if (map)
+r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);
+else
+r = xc_physdev_unmap_pirq(ctx->xch, domid, pirq);
+if (r)
+return r;
+
+r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map);
+if (r && errno == EOPNOTSUPP)
+r = xc_domain_irq_permission(ctx->xch, domid, gsi, map);
+
+return r;
+}
+
 static void pci_add_dm_done(libxl__egc *egc,
 pci_add_state *pas,
 int rc)
@@ -1424,10 +1455,10 @@ static void pci_add_dm_done(libxl__egc *egc,
 unsigned long long start, end, flags, size;
 int irq, i;
 int r;
-uint32_t sbdf;
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+int gsi;
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1485,6 +1516,19 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
+
+r = pci_device_set_gsi(ctx, domid, pci, 1, &gsi);
+if (gsi >= 0) {
+if (r < 0) {
+rc = ERROR_FAIL;
+LOGED(ERROR, domainid,
+  "pci_device_set_gsi gsi=%d (error=%d)", gsi, errno);
+goto out;
+} else {
+goto process_permissive;
+}
+}
+/* if gsi < 0, keep using irq */
 sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
 pci->bus, pci->dev, pci->func);
 f = fopen(sysfs_path, "r");
@@ -1493,13 +1537,6 @@ static void pci_add_dm_done(libxl__egc *egc,
 goto out_no_irq;
 }
 if ((fscanf(f, "%u", &irq) == 1) && irq) {
-

[RFC XEN PATCH v8 4/5] tools: Add new function to get gsi from dev

2024-05-16 Thread Jiqian Chen
In PVH dom0, the kernel uses the native Linux interrupt
mechanism, so irq numbers are allocated dynamically on a
first-come, first-served basis. irq numbers are handed out
in ascending order, but gsis are not registered in
ascending order (gsi 38 may come before gsi 28), so the
irq number is generally not equal to the gsi number.
When passing through a device, QEMU uses the gsi number
to do pirq mapping, see xen_pt_realize->xc_physdev_map_pirq,
but that number is currently read from the file
/sys/bus/pci/devices//irq, which holds the irq, so the
mapping fails.
The current code gives userspace no way to get the gsi.

To fix this, add a new function to get the gsi, and call it
before xc_physdev_(un)map_pirq.
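
A minimal sketch of why the irq and gsi numbers diverge, assuming a toy
allocator that hands out irqs in ascending order on first request. The
names irq_table, irq_table_init and irq_table_register_gsi are
hypothetical and are not kernel APIs; the starting irq number is also
only an illustration.

```c
#include <stddef.h>

#define MAX_GSI 64

/* Toy allocator: irq numbers are handed out in ascending order,
 * first come, first served. This is an illustration only. */
struct irq_table {
    int next_irq;            /* next free irq number */
    int irq_of_gsi[MAX_GSI]; /* -1 while the gsi has no irq yet */
};

void irq_table_init(struct irq_table *t, int first_irq)
{
    t->next_irq = first_irq;
    for (size_t i = 0; i < MAX_GSI; i++)
        t->irq_of_gsi[i] = -1;
}

/* Return the irq bound to gsi, allocating one on the first request.
 * Returns -1 when gsi is out of range. */
int irq_table_register_gsi(struct irq_table *t, int gsi)
{
    if (gsi < 0 || gsi >= MAX_GSI)
        return -1;
    if (t->irq_of_gsi[gsi] < 0)
        t->irq_of_gsi[gsi] = t->next_irq++;
    return t->irq_of_gsi[gsi];
}
```

With the table initialised at irq 24, registering gsi 38 first and gsi 28
second binds them to irq 24 and irq 25, so reading the irq back and
treating it as a gsi gives the wrong number for both devices.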

Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +++
 tools/include/xencall.h   |  2 ++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/call/core.c|  5 +
 tools/libs/call/libxencall.map|  2 ++
 tools/libs/call/linux.c   | 15 +++
 tools/libs/call/private.h |  9 +
 tools/libs/ctrl/xc_physdev.c  |  4 
 tools/libs/light/libxl_pci.c  | 23 +++
 9 files changed, 69 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h 
b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..977f1a058797 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_dev {
+   __u32 sbdf;
+   int gsi;
+} privcmd_gsi_from_dev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_DEV \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_dev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xencall.h b/tools/include/xencall.h
index fc95ed0fe58e..750aab070323 100644
--- a/tools/include/xencall.h
+++ b/tools/include/xencall.h
@@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
  uint64_t arg1, uint64_t arg2, uint64_t arg3,
  uint64_t arg4, uint64_t arg5);
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
+
 /* Variant(s) of the above, as needed, returning "long" instead of "int". */
 long xencall2L(xencall_handle *xcall, unsigned int op,
uint64_t arg1, uint64_t arg2);
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 499685594427..841db41ad7e4 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1aefa..6dae50c9a6ba 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
 return osdep_hypercall(xcall, &call);
 }
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf)
+{
+return osdep_oscall(xcall, sbdf);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
index d18a3174e9dc..b92a0b5dc12c 100644
--- a/tools/libs/call/libxencall.map
+++ b/tools/libs/call/libxencall.map
@@ -10,6 +10,8 @@ VERS_1.0 {
xencall4;
xencall5;
 
+   xen_oscall_gsi_from_dev;
+
xencall_alloc_buffer;
xencall_free_buffer;
xencall_alloc_buffer_pages;
diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
index 6d588e6bea8f..92c740e176f2 100644
--- a/tools/libs/call/linux.c
+++ b/tools/libs/call/linux.c
@@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, 
privcmd_hypercall_t *hypercall)
 return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
 }
 
+int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
+{
+privcmd_gsi_from_dev_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = -1,
+};
+
+if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &dev_gsi)) {
+PERROR("failed to get gsi from dev");
+return -1;
+}
+
+return dev_gsi.gsi;
+}
+
 static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages)
 {
 void *p;
diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h
index 9c3aa432efe2..c

[XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device

2024-05-16 Thread Jiqian Chen
When a device has been reset on the dom0 side, vpci on the
Xen side is not notified, so the state cached in vpci is
out of date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0
side, dom0 can call this hypercall to notify vpci.
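
The staleness problem and the reset can be sketched with a toy model,
assuming the cache is just a snapshot of two registers. All toy_* names
are hypothetical; the only thing taken from the patch is the shape of the
reset, which is a deassign followed by a fresh assign.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the registers of a real device. */
struct toy_hw {
    uint16_t command;
    uint32_t bar0;
};

/* Hypothetical stand-in for the per-device state that vPCI caches. */
struct toy_vpci {
    bool assigned;
    uint16_t command;
    uint32_t bar0;
};

/* Drop everything that was cached for the device. */
void toy_vpci_deassign(struct toy_vpci *v)
{
    v->assigned = false;
    v->command = 0;
    v->bar0 = 0;
}

/* Rebuild the cache from the current hardware state. */
int toy_vpci_assign(struct toy_vpci *v, const struct toy_hw *hw)
{
    v->command = hw->command;
    v->bar0 = hw->bar0;
    v->assigned = true;
    return 0;
}

/* Same shape as the reset in this patch: deassign, then assign again. */
int toy_vpci_reset_state(struct toy_vpci *v, const struct toy_hw *hw)
{
    toy_vpci_deassign(v);
    return toy_vpci_assign(v, hw);
}
```

If the hardware registers change after toy_vpci_assign() and nobody calls
toy_vpci_reset_state(), the cache keeps the old values, which is the
situation the new hypercall is meant to avoid.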

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 36 
 xen/drivers/vpci/vpci.c  | 10 ++
 xen/include/public/physdev.h |  7 +++
 xen/include/xen/vpci.h   |  6 ++
 5 files changed, 60 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 14679dd82971..56fbb69ab201 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..73dc8f058b0e 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct physdev_pci_device dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev, arg, 1) != 0 )
+break;
+sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+ret = vpci_reset_device_state(pdev);
+write_unlock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 97e115dc5798..424aec2d5c46 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -115,6 +115,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(pcidevs_locked());
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..f5bab1f29779 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 6e4c972f35ed..93b1c1d72c05 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,6 +30,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -266,6 +267,11 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
 
 static inline void vpci_deassign_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1




[XEN PATCH v8 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-05-16 Thread Jiqian Chen
On PVH dom0, the gsis don't get registered, but
the gsi of a passthrough device must be configured for it to
be able to be mapped into an HVM domU.
On the Linux kernel side, PHYSDEVOP_setup_gsi is called for
passthrough devices to register the gsi when dom0 is PVH.
So, allow PHYSDEVOP_setup_gsi for that purpose.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index d49fb8b548a3..98e3c6b176ff 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 case PHYSDEVOP_unmap_pirq:
 break;
 
+case PHYSDEVOP_setup_gsi:
+if ( !is_hardware_domain(currd) )
+return -EOPNOTSUPP;
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[XEN PATCH v8 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-05-16 Thread Jiqian Chen
When running Xen with a PVH dom0 and an HVM domU, a pirq is
mapped for a passthrough device by using its gsi, see
xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq
then calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq
is not allowed because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the
pirq. Add a new check to prevent self mapping when the caller
has no PIRQ flag.
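
The new check can be sketched as a standalone predicate. This is a model
under stated assumptions, not the hypervisor code: struct toy_domain and
toy_check_pirq_request are hypothetical names, and TOY_DOMID_SELF stands
in for DOMID_SELF.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for DOMID_SELF; the name and its use here are illustrative. */
#define TOY_DOMID_SELF 0x7FF0U

/* Hypothetical summary of the two properties the check looks at. */
struct toy_domain {
    bool is_pv;    /* models is_pv_domain(d) */
    bool has_pirq; /* models has_pirq(d), i.e. X86_EMU_USE_PIRQ */
};

/* Return 0 when the (un)map request may proceed, -EOPNOTSUPP when a
 * non-PV domain without the PIRQ flag tries to map for itself. */
int toy_check_pirq_request(const struct toy_domain *target,
                           unsigned int domid)
{
    if (!target->is_pv && !target->has_pirq && domid == TOY_DOMID_SELF)
        return -EOPNOTSUPP;
    return 0;
}
```

A PVH dom0 mapping a pirq for another domain passes the check because the
domid is not the self id, while the same dom0 asking to map for itself is
refused.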

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  2 ++
 xen/arch/x86/physdev.c   | 24 
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 56fbb69ab201..d49fb8b548a3 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..1337f95171cd 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 case PHYSDEVOP_map_pirq: {
 physdev_map_pirq_t map;
 struct msi_info msi;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&map, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(map.domid);
+if ( d == NULL )
+return -ESRCH;
+/* If caller is the same HVM guest as current, check pirq flag */
+if ( !is_pv_domain(d) && !has_pirq(d) && map.domid == DOMID_SELF )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 switch ( map.type )
 {
 case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 
 case PHYSDEVOP_unmap_pirq: {
 struct physdev_unmap_pirq unmap;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&unmap, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(unmap.domid);
+if ( d == NULL )
+return -ESRCH;
+/* If caller is the same HVM guest as current, check pirq flag */
+if ( !is_pv_domain(d) && !has_pirq(d) && unmap.domid == DOMID_SELF )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
 break;
 }
-- 
2.34.1




[XEN PATCH v8 0/5] Support device passthrough when dom0 is PVH on Xen

2024-05-16 Thread Jiqian Chen
Hi All,
This is the v8 series to support passthrough when dom0 is PVH.
v7->v8 changes:
* patch#2: Add the domid check (domid == DOMID_SELF) to prevent self map when 
the guest doesn't use pirq.
   That check was missed in the previous version.
* patch#4: Due to changes in how the kernel obtains the gsi, 
add a new function
   to get the gsi by passing in the sbdf of the pci device.
* patch#5: Remove the parameter "is_gsi". When a gsi exists, 
pci_add_dm_done uses a new function
   pci_device_set_gsi to do map_pirq and grant permission. That makes 
the code logic more intuitive.


Best regards,
Jiqian Chen



v6->v7 changes:
* patch#4: Due to changes in the implementation of obtaining gsi in the kernel. 
Change to add a new function
   to get gsi from irq, instead of gsi sysfs.
* patch#5: Fix the issue with variable usage, rc->r.


v5->v6 changes:
* patch#1: Add Reviewed-by Stefano and Stewart. Rebase code and change old 
function vpci_remove_device,
   vpci_add_handlers to vpci_deassign_device, vpci_assign_device
* patch#2: Add Reviewed-by Stefano
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));"
* patch#4: Fix some coding style issues under the tools directory
* patch#5: Modified some variable names and code logic to make the code easier to 
understand: use
   gsi by default, and stay compatible with older kernel versions by 
continuing to use irq


v4->v5 changes:
* patch#1: add pci_lock wrap function vpci_reset_device_state
* patch#2: move the check of self map_pirq to physdev.c, and change to check if 
the caller has PIRQ flag, and
   just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op
* patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: is the patch#5 in v4 because patch#5 in v5 has some dependency on 
it. And add the handling of errno
   and add the Reviewed-by Stefano
* patch#5: is the patch#4 in v4. New implementation to add new hypercall 
XEN_DOMCTL_gsi_permission to grant gsi


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move 
printings behind pcidevs_unlock
* patch#2: add check to prevent PVH self map
* patch#3: new patch, The implementation of adding PHYSDEVOP_setup_gsi for PVH 
is treated as a separate patch
* patch#4: new patch to solve the map_pirq problem of PVH dom0. use gsi to 
grant irq permission in
   XEN_DOMCTL_irq_permission.
* patch#5: to be compatible with previous kernel versions, when there is no gsi 
sysfs, still use irq
v4 link:
https://lore.kernel.org/xen-devel/20240105070920.350113-1-jiqian.c...@amd.com/T/#t

v2->v3 changes:
* patch#1: move the content out of pci_reset_device_state and delete 
pci_reset_device_state; add
   xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset; 
add description for
   PHYSDEVOP_pci_device_state_reset;
* patch#2: Due to changes in the implementation of the second patch on the kernel 
side (it now does setup_gsi and
   map_pirq when assigning a device to passthrough), add 
PHYSDEVOP_setup_gsi for PVH dom0, and we need
   to support self mapping.
* patch#3: Due to changes in the implementation of the second patch on the kernel 
side (it adds a new sysfs for gsi
   instead of a new syscall), read the gsi number from the sysfs of gsi.
v3 link:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-jiqian.c...@amd.com/T/#t

v2 link:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-jiqian.c...@amd.com/T/#t
Below is the description of v2 cover letter:
This series of patches are the v2 of the implementation of passthrough when 
dom0 is PVH on Xen.
We sent the v1 to upstream before, but the v1 had so many problems and we got 
lots of suggestions.
I will introduce all issues that these patches try to fix and the differences 
between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a 
device, pci_stub will call
"pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() ->
pci_restore_config_space_range() -> pci_restore_config_dword() -> 
pci_write_config_dword()". The pci config
write will trap to bar_write() in Xen, but because
bar->enabled was set before, the write is not allowed, and then when 
QEMU configures the
passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: we don't tell vPCI that the device has been reset, 
so the current cached state in
pdev->vpci is out of date and differs from the real device state.

Solution: to solve this problem, the first patch of kernel(xen/pci: Add 
xen_reset_device_state
function) and the first patch of xen(xen/vpci: Clear all vpci status of device) 
add a new hypercall to reset the
state st

[RFC KERNEL PATCH v7 0/2] Support device passthrough when dom0 is PVH on Xen

2024-05-14 Thread Jiqian Chen
Hi All,
This is v7 series to support passthrough on Xen when dom0 is PVH.
v6->v7 change:
* the first patch of v6 was already merged into branch linux_next.
* patch#1: is the patch#2 of v6. move the implementation of function 
xen_acpi_get_gsi_info to
   file drivers/xen/acpi.c, that modification is more convenient for 
the subsequent
   patch to obtain gsi.
* patch#2: is the patch#3 of v6. add a new parameter "gsi" to struct 
pcistub_device and set
   gsi when pcistub initialize device. Then when userspace wants to get 
gsi by passing
   sbdf, we can return that gsi.


Best regards,
Jiqian Chen




v5->v6 change:
* patch#3: change to add a new syscall to translate irq to gsi, instead adding 
a new gsi sysfs.


v4->v5 changes:
* patch#1: Add Reviewed-by Stefano
* patch#2: Add Reviewed-by Stefano
* patch#3: No changes


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; use a new 
function
   pcistub_reset_device_state to wrap __pci_reset_function_locked and 
xen_reset_device_state,
   and call pcistub_reset_device_state in pci_stub.c
* patch#2: remove map_pirq from xen_pvh_passthrough_gsi


v2->v3 changes:
* patch#1: add condition to limit do xen_reset_device_state for no-pv domain in 
pcistub_init_device.
* patch#2: Abandoning previous implementations that call unmask_irq. To setup 
gsi and map pirq for
   passthrough device in pcistub_init_device.
* patch#3: Abandoning previous implementations that adds new syscall to get gsi 
from irq. To add a new
   sysfs for gsi, then userspace can get gsi number from sysfs.


Below is the description of v2 cover letter:
This series of patches are the v2 of the implementation of passthrough when 
dom0 is PVH on Xen.
We sent the v1 to upstream before, but the v1 had so many problems and we got 
lots of suggestions.
I will introduce all issues that these patches try to fix and the differences 
between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a 
device, pci_stub will
call "pcistub_init_device() -> pci_restore_state() -> 
pci_restore_config_space() ->
pci_restore_config_space_range() -> pci_restore_config_dword() -> 
pci_write_config_dword()". The pci
config write will trap to bar_write() in Xen, but because 
bar->enabled was set before,
the write is not allowed, and then when QEMU configures the passthrough 
device in xen_pt_realize(),
it gets invalid bar values.

Reason: we don't tell vPCI that the device has been reset, 
so the current cached state
in pdev->vpci is out of date and differs from the real device state.

Solution: to solve this problem, the first patch of kernel(xen/pci: Add 
xen_reset_device_state
function) and the first patch of xen(xen/vpci: Clear all vpci status of device) 
add a new hypercall to
reset the state stored in vPCI when the state of the real device has changed.
Thanks to Roger for suggesting this v2 approach. It differs from v1
(https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/), 
which simply allowed domU to
write the pci bar and did not comply with the design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using 
gsi. See
xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. 
Then xc_physdev_map_pirq
will call into Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is PVH dom0 and PVH has no 
X86_EMU_USE_PIRQ flag, it
will fail at has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0 
(at present dom0 is PVH).
The second patch of xen(x86/pvh: Open PHYSDEVOP_map_pirq for PVH dom0) allows 
PVH dom0 to do
PHYSDEVOP_map_pirq. This v2 patch is better than v1, which simply removed the 
has_pirq check
(xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.hu...@amd.com/).

3. the gsi of a passthrough device is not unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 callback function pci_add_dm_done() will be called when qemu configures a 
passthrough device for domU.
This function will call xc_domain_irq_permission()-> pirq_access_permitted() to 
check if the gsi has
corresponding mappings in dom0. But it doesn't, so the check fails. See
XEN_DOMCTL_irq_permission->pirq_access_permitted, "current" is PVH dom0 and the 
returned irq is 0.
3.2 it's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH 
dom0, because the
devices of PVH are using MSI(-X) interrupts. However, the IO-APIC pin must be 
configured for it to be
able to be mapped i

[RFC KERNEL PATCH v7 2/2] xen/privcmd: Add new syscall to get gsi from dev

2024-05-14 Thread Jiqian Chen
In PVH dom0, the kernel uses the native Linux interrupt
mechanism, so irqs are allocated dynamically on a
first-come, first-served basis. irq numbers are handed out
in ascending order, but gsis are not registered in
ascending order (gsi 38 may come before gsi 28), so the
irq number is not equal to the gsi number.
When passing through a device, QEMU uses the device's gsi
number to do pirq mapping, but that number is read from the
file /sys/bus/pci/devices//irq, and irq != gsi, so the
mapping fails.
The current Linux code gives userspace no way to get the
gsi.

To fix this, record the gsi of pcistub devices when
initializing pcistub, and add a new syscall to privcmd so
that userspace can get the gsi when needed.
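
Userspace identifies the device by its sbdf, so it helps to see how that
value is packed and unpacked. The sketch below uses the same bit layout
as the shifts in pcistub_get_gsi_from_sbdf in this patch (segment in bits
16 and up, bus in bits 8-15, slot in bits 3-7, function in bits 0-2). The
struct pci_addr type and the sbdf_pack/sbdf_unpack helpers are
hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the four parts of a PCI address. */
struct pci_addr {
    uint16_t segment;
    uint8_t bus;
    uint8_t slot; /* 0..31 */
    uint8_t func; /* 0..7 */
};

/* Pack segment:bus:slot.func into one 32-bit sbdf value. */
uint32_t sbdf_pack(struct pci_addr a)
{
    return ((uint32_t)a.segment << 16) |
           ((uint32_t)a.bus << 8) |
           ((uint32_t)(a.slot & 0x1f) << 3) |
           (uint32_t)(a.func & 0x7);
}

/* Split an sbdf value back into its parts. */
struct pci_addr sbdf_unpack(uint32_t sbdf)
{
    struct pci_addr a = {
        .segment = (uint16_t)(sbdf >> 16),
        .bus = (uint8_t)((sbdf >> 8) & 0xff),
        .slot = (uint8_t)((sbdf >> 3) & 0x1f),
        .func = (uint8_t)(sbdf & 0x7),
    };
    return a;
}
```

For example, device 0000:03:00.1 packs to 0x0301, and unpacking that
value gives back segment 0, bus 3, slot 0, function 1.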

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 drivers/xen/privcmd.c  | 28 ++
 drivers/xen/xen-pciback/pci_stub.c | 38 +++---
 include/uapi/xen/privcmd.h |  7 ++
 include/xen/acpi.h |  2 ++
 4 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 67dfa4778864..5953a03b5cb0 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -45,6 +45,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_ACPI
+#include 
+#endif
 
 #include "privcmd.h"
 
@@ -842,6 +845,27 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
return rc;
 }
 
+static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata)
+{
+   struct privcmd_gsi_from_dev kdata;
+
+   if (copy_from_user(&kdata, udata, sizeof(kdata)))
+   return -EFAULT;
+
+#ifdef CONFIG_ACPI
+   kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf);
+   if (kdata.gsi == -1)
+   return -EINVAL;
+#else
+   kdata.gsi = -1;
+#endif
+
+   if (copy_to_user(udata, &kdata, sizeof(kdata)))
+   return -EFAULT;
+
+   return 0;
+}
+
 #ifdef CONFIG_XEN_PRIVCMD_EVENTFD
 /* Irqfd support */
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -1529,6 +1553,10 @@ static long privcmd_ioctl(struct file *file,
ret = privcmd_ioctl_ioeventfd(file, udata);
break;
 
+   case IOCTL_PRIVCMD_GSI_FROM_DEV:
+   ret = privcmd_ioctl_gsi_from_dev(file, udata);
+   break;
+
default:
break;
}
diff --git a/drivers/xen/xen-pciback/pci_stub.c 
b/drivers/xen/xen-pciback/pci_stub.c
index 2b90d832d0a7..4b62b4d377a9 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -56,6 +56,9 @@ struct pcistub_device {
 
struct pci_dev *dev;
struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */
+#ifdef CONFIG_ACPI
+   int gsi;
+#endif
 };
 
 /* Access to pcistub_devices & seized_devices lists and the initialize_devices
@@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct 
pci_dev *dev)
 
kref_init(&psdev->kref);
spin_lock_init(&psdev->lock);
+#ifdef CONFIG_ACPI
+   psdev->gsi = -1;
+#endif
 
return psdev;
 }
@@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct 
xen_pcibk_device *pdev,
return pci_dev;
 }
 
+#ifdef CONFIG_ACPI
+int pcistub_get_gsi_from_sbdf(unsigned int sbdf)
+{
+   struct pcistub_device *psdev;
+   int domain = sbdf >> 16;
+   int bus = (sbdf >> 8) & 0xff;
+   int slot = (sbdf >> 3) & 0x1f;
+   int func = sbdf & 0x7;
+
+   psdev = pcistub_device_find(domain, bus, slot, func);
+
+   if (!psdev)
+   return -1;
+
+   return psdev->gsi;
+}
+EXPORT_SYMBOL_GPL(pcistub_get_gsi_from_sbdf);
+#endif
+
 struct pci_dev *pcistub_get_pci_dev_by_slot(struct xen_pcibk_device *pdev,
int domain, int bus,
int slot, int func)
@@ -367,14 +392,20 @@ static int pcistub_match(struct pci_dev *dev)
return found;
 }
 
-static int pcistub_init_device(struct pci_dev *dev)
+static int pcistub_init_device(struct pcistub_device *psdev)
 {
struct xen_pcibk_dev_data *dev_data;
+   struct pci_dev *dev;
 #ifdef CONFIG_ACPI
int gsi, trigger, polarity;
 #endif
int err = 0;
 
+   if (!psdev)
+   return -EINVAL;
+
+   dev = psdev->dev;
+
dev_dbg(&dev->dev, "initializing...\n");
 
/* The PCI backend is not intended to be a module (or to work with
@@ -448,6 +479,7 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_err(&dev->dev, "Fail to get gsi info!\n");
goto config_release;
}
+   psdev->gsi = gsi;
 
if (xen_initial_domain() && xen_pvh_domain()) {
err = xen_pvh_setup_gsi(gsi, trigg

[RFC KERNEL PATCH v7 1/2] xen/pvh: Setup gsi for passthrough device

2024-05-14 Thread Jiqian Chen
In PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured for it to be able to be
mapped into a domU.

When assigning a device for passthrough, proactively set up the
gsi of the device during that process.
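
The setup call in this patch translates the ACPI trigger and polarity
into two small flags before issuing PHYSDEVOP_setup_gsi. A minimal sketch
of that translation follows, assuming local enums in place of the real
ACPI constants. Every toy_* name is hypothetical, and the struct only
mirrors the three fields the patch fills in.

```c
/* Hypothetical mirrors of the ACPI constants; only the names matter. */
enum toy_acpi_trigger { TOY_ACPI_LEVEL_SENSITIVE, TOY_ACPI_EDGE_SENSITIVE };
enum toy_acpi_polarity { TOY_ACPI_ACTIVE_HIGH, TOY_ACPI_ACTIVE_LOW };

/* Hypothetical mirror of the fields set before the hypercall. */
struct toy_setup_gsi {
    int gsi;
    unsigned char triggering; /* 0 = edge, 1 = level */
    unsigned char polarity;   /* 0 = active high, 1 = active low */
};

/* Same mapping as in xen_pvh_setup_gsi: edge -> 0, anything else -> 1;
 * active high -> 0, anything else -> 1. */
struct toy_setup_gsi toy_encode_setup_gsi(int gsi,
                                          enum toy_acpi_trigger trigger,
                                          enum toy_acpi_polarity polarity)
{
    struct toy_setup_gsi s = {
        .gsi = gsi,
        .triggering = (trigger == TOY_ACPI_EDGE_SENSITIVE) ? 0 : 1,
        .polarity = (polarity == TOY_ACPI_ACTIVE_HIGH) ? 0 : 1,
    };
    return s;
}
```

A level-triggered, active-low gsi therefore ends up with both flags set
to 1, which is the default combination xen_acpi_get_gsi_info starts from
on x86.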

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 arch/x86/xen/enlighten_pvh.c   | 21 +
 drivers/acpi/pci_irq.c |  2 +-
 drivers/xen/acpi.c | 50 ++
 drivers/xen/xen-pciback/pci_stub.c | 21 +
 include/linux/acpi.h   |  1 +
 include/xen/acpi.h | 10 ++
 6 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 27a2a02ef8fb..711cdcbc6916 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -4,6 +4,7 @@
 #include 
 
 #include 
+#include 
 
 #include 
 #include 
@@ -27,6 +28,26 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+int xen_pvh_setup_gsi(int gsi, int trigger, int polarity)
+{
+   int ret;
+   struct physdev_setup_gsi setup_gsi;
+
+   setup_gsi.gsi = gsi;
+   setup_gsi.triggering = (trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+   setup_gsi.polarity = (polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+   ret = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+   if (ret == -EEXIST) {
+   xen_raw_printk("Already setup the GSI :%d\n", gsi);
+   ret = 0;
+   } else if (ret)
+   xen_raw_printk("Fail to setup GSI (%d)!\n", gsi);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_setup_gsi);
+
 void __init xen_pvh_init(struct boot_params *boot_params)
 {
u32 msr;
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
 }
 #endif /* CONFIG_X86_IO_APIC */
 
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
 {
struct acpi_prt_entry *entry = NULL;
struct pci_dev *bridge;
diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c
index 6893c79fd2a1..9e2096524fbc 100644
--- a/drivers/xen/acpi.c
+++ b/drivers/xen/acpi.c
@@ -30,6 +30,7 @@
  * IN THE SOFTWARE.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -75,3 +76,52 @@ int xen_acpi_notify_hypervisor_extended_sleep(u8 sleep_state,
return xen_acpi_notify_hypervisor_state(sleep_state, val_a,
val_b, true);
 }
+
+struct acpi_prt_entry {
+   struct acpi_pci_id  id;
+   u8  pin;
+   acpi_handle link;
+   u32 index;
+};
+
+int xen_acpi_get_gsi_info(struct pci_dev *dev,
+ int *gsi_out,
+ int *trigger_out,
+ int *polarity_out)
+{
+   int gsi;
+   u8 pin;
+   struct acpi_prt_entry *entry;
+   int trigger = ACPI_LEVEL_SENSITIVE;
+   int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+   if (!dev || !gsi_out || !trigger_out || !polarity_out)
+   return -EINVAL;
+
+   pin = dev->pin;
+   if (!pin)
+   return -EINVAL;
+
+   entry = acpi_pci_irq_lookup(dev, pin);
+   if (entry) {
+   if (entry->link)
+   gsi = acpi_pci_link_allocate_irq(entry->link,
+entry->index,
+&trigger, &polarity,
+NULL);
+   else
+   gsi = entry->index;
+   } else
+   gsi = -1;
+
+   if (gsi < 0)
+   return -EINVAL;
+
+   *gsi_out = gsi;
+   *trigger_out = trigger;
+   *polarity_out = polarity;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(xen_acpi_get_gsi_info);
diff --git a/drivers/xen/xen-pciback/pci_stub.c 
b/drivers/xen/xen-pciback/pci_stub.c
index 46c40ec8a18e..2b90d832d0a7 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -21,6 +21,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_ACPI
+#include 
+#endif
 #include 
 #include 
 #include "pciback.h"
@@ -367,6 +370,9 @@ static int pcistub_match(struct pci_dev *dev)
 static int pcistub_init_device(struct pci_dev *dev)
 {
struct xen_pcibk_dev_data *dev_data;
+#ifdef CONFIG_ACPI
+   int gsi, trigger, polarity;
+#endif
int err = 0;
 
dev_dbg(&dev->dev, "in

[RFC QEMU PATCH v6 0/1] Support device passthrough when dom0 is PVH on Xen

2024-04-18 Thread Jiqian Chen
Hi All,
This is v6 series to support passthrough on Xen when dom0 is PVH.
v5->v6 changes:
* Because the implementation of obtaining the gsi changed in the kernel and
Xen, use xc_physdev_gsi_from_irq instead of the gsi sysfs node.

Best regards,
Jiqian Chen


v4->v5 changes:
* Add review by Stefano


v3->v4 changes:
* Add gsi to struct XenHostPCIDevice and use the gsi number read from the gsi
  sysfs node if it exists; if there is no gsi sysfs node, still use irq.


v2->v3 changes:
* Due to changes in the implementation of the second patch on the kernel side
  (which adds a new sysfs node for gsi instead of a new syscall), read the gsi
  number from the gsi sysfs node.


Below is the description of v2 cover letter:
This patch is the v2 of the implementation of passthrough when dom0 is PVH on 
Xen.
Issues we encountered:
1. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in function xen_pt_realize() to map
a passthrough device’s gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a 
gsi rather than an irq, but qemu passes an irq and treats it as a gsi. The 
value is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in function 
xen_host_pci_device_get(). The gsi number is not actually equal to the irq. On 
PVH dom0, when an irq is allocated for a gsi in function 
acpi_register_gsi_ioapic(), the allocation is dynamic and first come, first 
served. If you debug the kernel code (see function __irq_alloc_descs), you will 
find that irq numbers are allocated in ascending order, but gsis are not 
registered in ascending order: gsi 38 may be registered before gsi 28, so gsi 
38 gets a smaller irq number than gsi 28, and then gsi != irq.
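
The allocation behaviour described in the Reason above can be illustrated with
a small toy model. This is only a sketch under stated assumptions: struct
irq_allocator, irq_allocator_init(), register_gsi() and the starting irq
number 16 are hypothetical names and values chosen for illustration, not the
kernel's real data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GSI 64
#define FIRST_DYNAMIC_IRQ 16   /* assumption: arbitrary first free irq number */

/* Toy allocator: irq numbers are handed out in ascending order. */
struct irq_allocator {
    int next_irq;              /* next free irq number */
    int irq_of_gsi[MAX_GSI];   /* 0 means "gsi not registered yet" */
};

static void irq_allocator_init(struct irq_allocator *a)
{
    a->next_irq = FIRST_DYNAMIC_IRQ;
    for (size_t i = 0; i < MAX_GSI; i++)
        a->irq_of_gsi[i] = 0;
}

/*
 * Register a gsi. The first gsi to ask gets the smallest free irq,
 * regardless of how large the gsi number itself is. Registering the
 * same gsi again returns the irq it already has.
 */
static int register_gsi(struct irq_allocator *a, int gsi)
{
    assert(gsi >= 0 && gsi < MAX_GSI);
    if (a->irq_of_gsi[gsi] == 0)
        a->irq_of_gsi[gsi] = a->next_irq++;
    return a->irq_of_gsi[gsi];
}
```

In this model, registering gsi 38 and then gsi 28 gives gsi 38 the smaller irq
number, and neither irq equals its gsi, which is the mismatch that makes the
value read from sysfs unusable as a gsi.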

Solution: we can record the relation between gsi and irq, so that when 
userspace (qemu) wants to use a gsi, we can do a translation. The third kernel 
patch (xen/privcmd: Add new syscall to get gsi from irq) records all the 
relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and 
provides a syscall for userspace to get the gsi from the irq. The third xen 
patch (tools: Add new function to get gsi from irq) adds a new function 
xc_physdev_gsi_from_irq() to call the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() 
will succeed.
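
A minimal sketch of the record-and-translate idea in the Solution above,
including the "fall back to irq when no gsi is known" behaviour that the qemu
patch in this series uses. All names here (struct gsi_irq_map,
record_relation(), gsi_from_irq(), gsi_or_irq()) are hypothetical and only
model the idea; they are not the kernel or libxc implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RELATIONS 32

/* Toy table of (irq, gsi) pairs recorded when devices are initialized. */
struct gsi_irq_map {
    size_t count;
    struct { int irq; int gsi; } rel[MAX_RELATIONS];
};

/* Record one relation; returns 0 on success, -1 if the table is full. */
static int record_relation(struct gsi_irq_map *m, int irq, int gsi)
{
    assert(m != NULL);
    if (m->count >= MAX_RELATIONS)
        return -1;
    m->rel[m->count].irq = irq;
    m->rel[m->count].gsi = gsi;
    m->count++;
    return 0;
}

/* Translate irq to gsi; returns -1 when no relation was recorded. */
static int gsi_from_irq(const struct gsi_irq_map *m, int irq)
{
    assert(m != NULL);
    for (size_t i = 0; i < m->count; i++)
        if (m->rel[i].irq == irq)
            return m->rel[i].gsi;
    return -1;
}

/* Caller-side policy: use the gsi if known, otherwise keep using the irq. */
static int gsi_or_irq(const struct gsi_irq_map *m, int irq)
{
    int gsi = gsi_from_irq(m, irq);
    return gsi == -1 ? irq : gsi;
}
```

The fallback keeps older setups working: when the translation is unavailable,
the caller behaves exactly as it did before and passes the irq through.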

This v2 on the qemu side is the same as the v1 
(qemu 
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/), 
except that it just calls
xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (1):
  xen/pci: get gsi from irq for passthrough devices

 hw/xen/xen-host-pci-device.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.34.1




[RFC QEMU PATCH v6 1/1] xen/pci: get gsi from irq for passthrough devices

2024-04-18 Thread Jiqian Chen
In PVH dom0, the Linux local interrupt mechanism is used.
When an irq is allocated for a gsi, the allocation is dynamic
and first come, first served. Irq numbers are allocated in
ascending order, but gsis are not registered in ascending
order: gsi 38 may be registered before gsi 28, so the irq
number is not equal to the gsi number. When passing through
a device, qemu wants to use the gsi to map the pirq
(xen_pt_realize->xc_physdev_map_pirq), but in the current
code the gsi number is read from the file
/sys/bus/pci/devices//irq, so the mapping fails.

Translate the irq to a gsi by using the new function provided
by the Xen tools.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 hw/xen/xen-host-pci-device.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716a2..5e9aa9679e3e 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -10,6 +10,7 @@
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "xen-host-pci-device.h"
+#include "hw/xen/xen_native.h"
 
 #define XEN_HOST_PCI_MAX_EXT_CAP \
 ((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4))
@@ -368,7 +369,11 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t 
domain,
 if (*errp) {
 goto error;
 }
-d->irq = v;
+d->irq = xc_physdev_gsi_from_irq(xen_xc, v);
+/* if fail to get gsi, fallback to irq */
+if (d->irq == -1) {
+d->irq = v;
+}
 
 xen_host_pci_get_hex_value(d, "class", &v, errp);
 if (*errp) {
-- 
2.34.1




[RFC XEN PATCH v7 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-04-18 Thread Jiqian Chen
Some domain types, such as PVH, don't have PIRQs. When
passing a device through to a guest on PVH dom0, the call stack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails
at domain_pirq_to_irq.

So, add a new hypercall to grant/revoke gsi permission
when dom0 is not PV or dom0 does not have the PIRQ flag.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/include/xenctrl.h  |  5 
 tools/libs/ctrl/xc_domain.c  | 15 
 tools/libs/light/libxl_pci.c | 46 
 xen/arch/x86/domctl.c| 31 
 xen/include/public/domctl.h  |  9 +++
 xen/xsm/flask/hooks.c|  1 +
 6 files changed, 97 insertions(+), 10 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2b9d55d2c6d7..adeaab93d0f7 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.allow_access = allow_access,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index d4313e196ebd..7e82f31ffc4f 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1421,6 +1421,8 @@ static void pci_add_dm_done(libxl__egc *egc,
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+int gsi;
+bool is_gsi = false;
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1490,6 +1492,8 @@ static void pci_add_dm_done(libxl__egc *egc,
 r = xc_physdev_gsi_from_irq(ctx->xch, irq);
 if (r != -1) {
 irq = r;
+gsi = r;
+is_gsi = true;
 }
 r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
 if (r < 0) {
@@ -1499,13 +1503,25 @@ static void pci_add_dm_done(libxl__egc *egc,
 rc = ERROR_FAIL;
 goto out;
 }
-r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
-if (r < 0) {
-LOGED(ERROR, domainid,
-  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
-fclose(f);
-rc = ERROR_FAIL;
-goto out;
+if (is_gsi) {
+r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
+if (r < 0 && errno != EOPNOTSUPP) {
+LOGED(ERROR, domainid,
+  "xc_domain_gsi_permission gsi=%d (error=%d)", gsi, errno);
+fclose(f);
+rc = ERROR_FAIL;
+goto out;
+}
+}
+if (!is_gsi || (r < 0 && errno == EOPNOTSUPP)) {
+r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+if (r < 0) {
+LOGED(ERROR, domainid,
+"xc_domain_irq_permission irq=%d (error=%d)", irq, errno);
+fclose(f);
+rc = ERROR_FAIL;
+goto out;
+}
 }
 }
 fclose(f);
@@ -2180,6 +2196,7 @@ static void pci_remove_detached(libxl__egc *egc,
 uint32_t domainid = prs->domid;
 bool isstubdom;
 int r;
+bool is_gsi = false;
 
 /* Convenience aliases */
 libxl_device_pci *const pci = &prs->pci;
@@ -2249,6 +2266,7 @@ skip_bar:
 r = xc_physdev_gsi_from_irq(ctx->xch, irq);
 if (r != -1) {
 irq = r;
+is_gsi = true;
 }
 rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
 if (rc < 0) {
@@ -2260,9 +2278,17 @@ skip_bar:
  */
 LOGED(ERROR, domid, "xc_physdev_unmap

[RFC XEN PATCH v7 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-04-18 Thread Jiqian Chen
On PVH dom0, the gsis don't get registered, but
the gsi of a passthrough device must be configured for it to
be able to be mapped into an hvm domU.
On the Linux kernel side, PHYSDEVOP_setup_gsi is called for
passthrough devices to register the gsi when dom0 is PVH.
So, add PHYSDEVOP_setup_gsi for the above purpose.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index d49fb8b548a3..98e3c6b176ff 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 case PHYSDEVOP_unmap_pirq:
 break;
 
+case PHYSDEVOP_setup_gsi:
+if ( !is_hardware_domain(currd) )
+return -EOPNOTSUPP;
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[XEN PATCH v7 1/5] xen/vpci: Clear all vpci status of device

2024-04-18 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the
Xen side won't get a notification, so the cached state in vpci
is out of date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0
side, dom0 can call this hypercall to notify vpci.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 36 
 xen/drivers/vpci/vpci.c  | 10 ++
 xen/include/public/physdev.h |  7 +++
 xen/include/xen/vpci.h   |  6 ++
 5 files changed, 60 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 14679dd82971..56fbb69ab201 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..73dc8f058b0e 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct physdev_pci_device dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev, arg, 1) != 0 )
+break;
+sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+ret = vpci_reset_device_state(pdev);
+write_unlock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", 
&sbdf);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 97e115dc5798..424aec2d5c46 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -115,6 +115,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(pcidevs_locked());
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..f5bab1f29779 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index e89c571890b2..ea64d94e818b 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,6 +30,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -266,6 +267,11 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
 
 static inline void vpci_deassign_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1




[RFC XEN PATCH v7 4/5] tools: Add new function to get gsi from irq

2024-04-18 Thread Jiqian Chen
In PVH dom0, the Linux local interrupt mechanism is used.
When an irq is allocated for a gsi, the allocation is dynamic
and first come, first served. Irq numbers are allocated in
ascending order, but gsis are not registered in ascending
order: gsi 38 may be registered before gsi 28, so the irq
number is not equal to the gsi number.
When passing through a device, QEMU uses its gsi number
to do pirq mapping, see xen_pt_realize->xc_physdev_map_pirq,
but the gsi number is read from the file
/sys/bus/pci/devices//irq, so the mapping fails.
In the current code, there is no way for userspace to
translate an irq to a gsi.

To that end, add a new function that performs the translation,
and call it before xc_physdev_(un)map_pirq.

Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
 tools/include/xencall.h|  2 ++
 tools/include/xenctrl.h|  2 ++
 tools/libs/call/core.c |  5 +
 tools/libs/call/libxencall.map |  2 ++
 tools/libs/call/linux.c| 15 +++
 tools/libs/call/private.h  |  9 +
 tools/libs/ctrl/xc_physdev.c   |  4 
 tools/libs/light/libxl_pci.c   | 11 +++
 8 files changed, 50 insertions(+)

diff --git a/tools/include/xencall.h b/tools/include/xencall.h
index fc95ed0fe58e..962cb45e1f1b 100644
--- a/tools/include/xencall.h
+++ b/tools/include/xencall.h
@@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
  uint64_t arg1, uint64_t arg2, uint64_t arg3,
  uint64_t arg4, uint64_t arg5);
 
+int xen_oscall_gsi_from_irq(xencall_handle *xcall, int irq);
+
 /* Variant(s) of the above, as needed, returning "long" instead of "int". */
 long xencall2L(xencall_handle *xcall, unsigned int op,
uint64_t arg1, uint64_t arg2);
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2ef8b4e05422..2b9d55d2c6d7 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_gsi_from_irq(xc_interface *xch, int irq);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1aefa..6f79f3babd19 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
 return osdep_hypercall(xcall, &call);
 }
 
+int xen_oscall_gsi_from_irq(xencall_handle *xcall, int irq)
+{
+return osdep_oscall(xcall, irq);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
index d18a3174e9dc..6cde8eda05e2 100644
--- a/tools/libs/call/libxencall.map
+++ b/tools/libs/call/libxencall.map
@@ -10,6 +10,8 @@ VERS_1.0 {
xencall4;
xencall5;
 
+   xen_oscall_gsi_from_irq;
+
xencall_alloc_buffer;
xencall_free_buffer;
xencall_alloc_buffer_pages;
diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
index 6d588e6bea8f..32b60c8b403e 100644
--- a/tools/libs/call/linux.c
+++ b/tools/libs/call/linux.c
@@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, 
privcmd_hypercall_t *hypercall)
 return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
 }
 
+long osdep_oscall(xencall_handle *xcall, int irq)
+{
+privcmd_gsi_from_irq_t gsi_irq = {
+.irq = irq,
+.gsi = -1,
+};
+
+if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_IRQ, &gsi_irq)) {
+PERROR("failed to get gsi from irq");
+return -1;
+}
+
+return gsi_irq.gsi;
+}
+
 static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages)
 {
 void *p;
diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h
index 9c3aa432efe2..2d86cfb1e099 100644
--- a/tools/libs/call/private.h
+++ b/tools/libs/call/private.h
@@ -57,6 +57,15 @@ int osdep_xencall_close(xencall_handle *xcall);
 
 long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall);
 
+#if defined(__linux__)
+long osdep_oscall(xencall_handle *xcall, int irq);
+#else
+static inline long osdep_oscall(xencall_handle *xcall, int irq)
+{
+return -1;
+}
+#endif
+
 void *osdep_alloc_pages(xencall_handle *xcall, size_t nr_pages);
 void osdep_free_pages(xencall_handle *xcall, void *p, size_t nr_pages);
 
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779ce8..4d3b138ebd0e 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -111,3 +111,7 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
 return rc;
 }
 
+int xc_physdev_gsi_from_irq(xc_interface *xch, int irq)
+{
+return xen_oscall_gsi_from_irq(xch->xcall, irq);
+}
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..d4313e196ebd 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light

[RFC XEN PATCH v7 0/5] Support device passthrough when dom0 is PVH on Xen

2024-04-18 Thread Jiqian Chen
Hi All,
This is v7 series to support passthrough when dom0 is PVH
v6->v7 changes:
* patch#4: Due to changes in the implementation of obtaining the gsi in the 
kernel, add a new function to get the gsi from the irq, instead of using the 
gsi sysfs node.
* patch#5: Fix the issue with variable usage, rc->r.

Best regards,
Jiqian Chen


v5->v6 changes:
* patch#1: Add Reviewed-by Stefano and Stewart. Rebase code and change old 
function vpci_remove_device, vpci_add_handlers to vpci_deassign_device, 
vpci_assign_device
* patch#2: Add Reviewed-by Stefano
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));"
* patch#4: Fix some coding style issues below directory tools
* patch#5: Modified some variable names and code logic to make the code easier 
to understand: use gsi by default, and stay compatible with older kernel 
versions by continuing to use irq


v4->v5 changes:
* patch#1: add pci_lock wrap function vpci_reset_device_state
* patch#2: move the check of self map_pirq to physdev.c, and change to check if 
the caller has PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in 
hvm_physdev_op
* patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: is the patch#5 in v4 because patch#5 in v5 has some dependency on 
it. And add the handling of errno and add the Reviewed-by Stefano
* patch#5: is the patch#4 in v4. New implementation to add new hypercall 
XEN_DOMCTL_gsi_permission to grant gsi


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move 
printings behind pcidevs_unlock
* patch#2: add check to prevent PVH self map
* patch#3: new patch, The implementation of adding PHYSDEVOP_setup_gsi for PVH 
is treated as a separate patch
* patch#4: new patch to solve the map_pirq problem of PVH dom0. use gsi to 
grant irq permission in XEN_DOMCTL_irq_permission.
* patch#5: to be compatible with previous kernel versions, when there is no gsi 
sysfs, still use irq
v4 link:
https://lore.kernel.org/xen-devel/20240105070920.350113-1-jiqian.c...@amd.com/T/#t

v2->v3 changes:
* patch#1: move the content out of pci_reset_device_state and delete 
pci_reset_device_state; add xsm_resource_setup_pci check for 
PHYSDEVOP_pci_device_state_reset; add description for 
PHYSDEVOP_pci_device_state_reset;
* patch#2: due to changes in the implementation of the second patch on the 
kernel side (which does setup_gsi and map_pirq when assigning a device for 
passthrough), add PHYSDEVOP_setup_gsi for PVH dom0, and we need to support self 
mapping.
* patch#3: due to changes in the implementation of the second patch on the 
kernel side (which adds a new sysfs node for gsi instead of a new syscall), 
read the gsi number from the gsi sysfs node.
v3 link:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-jiqian.c...@amd.com/T/#t

v2 link:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-jiqian.c...@amd.com/T/#t
Below is the description of v2 cover letter:
This series of patches are the v2 of the implementation of passthrough when 
dom0 is PVH on Xen.
We sent the v1 to upstream before, but the v1 had so many problems and we got 
lots of suggestions.
I will introduce all issues that these patches try to fix and the differences 
between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run “sudo xl pci-assignable-add ” to assign a 
device, pci_stub will call “pcistub_init_device() -> pci_restore_state() 
-> pci_restore_config_space() ->
pci_restore_config_space_range() -> pci_restore_config_dword() -> 
pci_write_config_dword()”. The pci config write will trigger an io 
interrupt to bar_write() in Xen, but because bar->enabled was set before, the 
write is not allowed, and then when Qemu configures the passthrough device in 
xen_pt_realize(), it gets invalid bar values.

Reason: the reason is that we don't tell vPCI that the device has been reset, 
so the current cached state in pdev->vpci is all out of date and is different 
from the real device state.

Solution: to solve this problem, the first patch of kernel (xen/pci: Add 
xen_reset_device_state function) and the first patch of xen (xen/vpci: Clear 
all vpci status of device) add a new hypercall to reset the state stored in 
vPCI when the state of the real device has changed.
Thanks to Roger for suggesting this v2 approach. It differs from v1 
(https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/), 
which simply allowed domU to write the pci bar and did not comply with the 
design principles of vPCI.
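
The stale-cache problem and its fix can be pictured with a toy model. This is
a hedged sketch only: struct toy_device, struct toy_vpci and the toy_*
functions are hypothetical and do not reflect Xen's real structures. The only
part taken from the series is the shape of the reset, which (as in
vpci_reset_device_state in patch 1/5) drops the cached state and rebuilds it
from the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: all names are hypothetical, not Xen's real structures. */
struct toy_device {          /* state of the real hardware */
    uint32_t bar;
    bool mem_enabled;
};

struct toy_vpci {            /* state cached on the hypervisor side */
    uint32_t bar;
    bool enabled;
};

/* Build the cache from the current hardware state. */
static void toy_vpci_assign(struct toy_vpci *v, const struct toy_device *d)
{
    assert(v != NULL && d != NULL);
    v->bar = d->bar;
    v->enabled = d->mem_enabled;
}

/* Drop everything that was cached. */
static void toy_vpci_deassign(struct toy_vpci *v)
{
    assert(v != NULL);
    v->bar = 0;
    v->enabled = false;
}

/* A reset done on the dom0 side: the cache is not told about it. */
static void toy_device_reset(struct toy_device *d)
{
    assert(d != NULL);
    d->bar = 0;
    d->mem_enabled = false;
}

static bool toy_vpci_is_stale(const struct toy_vpci *v,
                              const struct toy_device *d)
{
    return v->bar != d->bar || v->enabled != d->mem_enabled;
}

/* The notification: throw the cache away and rebuild it. */
static void toy_vpci_reset_device_state(struct toy_vpci *v,
                                        const struct toy_device *d)
{
    toy_vpci_deassign(v);
    toy_vpci_assign(v, d);
}
```

After toy_device_reset() the cache still claims the BAR is enabled, which
corresponds to the rejected write described in the Problem; calling
toy_vpci_reset_device_state() brings the two views back in sync.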

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using 
gsi. See xen_pt_realize->xc_physdev_map_pirq and 
pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will call into 
Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is PVH d

[XEN PATCH v7 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-04-18 Thread Jiqian Chen
If Xen runs with PVH dom0 and an hvm domU, hvm will map a pirq
for a passthrough device by using gsi, see
xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
is not allowed because currd is PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the
pirq. Also add a new check to prevent self map when the caller
has no PIRQ flag.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  2 ++
 xen/arch/x86/physdev.c   | 24 
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 56fbb69ab201..d49fb8b548a3 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..1367abc61e54 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 case PHYSDEVOP_map_pirq: {
 physdev_map_pirq_t map;
 struct msi_info msi;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&map, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(map.domid);
+if ( d == NULL )
+return -ESRCH;
+/* If it is an HVM guest, check if it has PIRQs */
+if ( !is_pv_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 switch ( map.type )
 {
 case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 
 case PHYSDEVOP_unmap_pirq: {
 struct physdev_unmap_pirq unmap;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&unmap, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(unmap.domid);
+if ( d == NULL )
+return -ESRCH;
+/* If it is an HVM guest, check if it has PIRQs */
+if ( !is_pv_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
 break;
 }
-- 
2.34.1




[RFC KERNEL PATCH v6 0/3] Support device passthrough when dom0 is PVH on Xen

2024-04-18 Thread Jiqian Chen
Hi All,
This is v6 series to support passthrough on Xen when dom0 is PVH.
v5->v6 change:
* patch#3: change to add a new syscall to translate irq to gsi, instead of 
adding a new gsi sysfs node.


Best regards,
Jiqian Chen


v4->v5 changes:
* patch#1: Add Reviewed-by Stefano
* patch#2: Add Reviewed-by Stefano
* patch#3: No changes


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; use a new 
function pcistub_reset_device_state to wrap __pci_reset_function_locked and 
xen_reset_device_state, and call pcistub_reset_device_state in pci_stub.c
* patch#2: remove map_pirq from xen_pvh_passthrough_gsi


v2->v3 changes:
* patch#1: add a condition so that xen_reset_device_state is only done for 
non-pv domains in pcistub_init_device.
* patch#2: Abandon the previous implementation that called unmask_irq. Instead, 
set up the gsi and map the pirq for the passthrough device in 
pcistub_init_device.
* patch#3: Abandon the previous implementation that added a new syscall to get 
gsi from irq. Instead, add a new sysfs node for gsi, so userspace can get the 
gsi number from sysfs.


Below is the description of v2 cover letter:
This series of patches are the v2 of the implementation of passthrough when 
dom0 is PVH on Xen.
We sent the v1 to upstream before, but the v1 had so many problems and we got 
lots of suggestions.
I will introduce all issues that these patches try to fix and the differences 
between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run “sudo xl pci-assignable-add ” to assign a 
device, pci_stub will call “pcistub_init_device() -> pci_restore_state() 
-> pci_restore_config_space() ->
pci_restore_config_space_range() -> pci_restore_config_dword() -> 
pci_write_config_dword()”. The pci config write will trigger an io 
interrupt to bar_write() in Xen, but because bar->enabled was set before, the 
write is not allowed, and then when Qemu configures the passthrough device in 
xen_pt_realize(), it gets invalid bar values.

Reason: the reason is that we don't tell vPCI that the device has been reset, 
so the current cached state in pdev->vpci is all out of date and is different 
from the real device state.

Solution: to solve this problem, the first patch of kernel (xen/pci: Add 
xen_reset_device_state function) and the first patch of xen (xen/vpci: Clear 
all vpci status of device) add a new hypercall to reset the state stored in 
vPCI when the state of the real device has changed.
Thanks to Roger for suggesting this v2 approach. It differs from v1 
(https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/), 
which simply allowed domU to write the pci bar and did not comply with the 
design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using 
gsi. See xen_pt_realize->xc_physdev_map_pirq and 
pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will call into 
Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is PVH dom0 and PVH has no 
X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0 
(at present dom0 is PVH). The second patch of xen (x86/pvh: Open 
PHYSDEVOP_map_pirq for PVH dom0) allows PVH dom0 to do PHYSDEVOP_map_pirq. This 
v2 patch is better than v1, which simply removed the has_pirq check (xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.hu...@amd.com/).
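
The difference between checking the caller and checking the target domain can
be sketched as follows. This is a simplified model, not Xen code: struct
toy_domain and both functions are hypothetical, and the second function only
mirrors the shape of the check that patch 2/5 of the Xen v7 series adds in
do_physdev_op (reject when the target is not PV and has no PIRQ flag).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy domain descriptor; PVH is modelled as "not PV, no PIRQ flag". */
struct toy_domain {
    bool is_pv;
    bool has_pirq;
};

/*
 * Check on the caller, as described in the Reason above: a PVH dom0
 * caller has no PIRQ flag, so it is rejected even when it maps a pirq
 * on behalf of an HVM domU.
 */
static bool toy_caller_check_allows(const struct toy_domain *caller)
{
    assert(caller != NULL);
    return caller->has_pirq;
}

/*
 * Check on the target domain instead: mapping is refused only when the
 * domain that would receive the pirq cannot use PIRQs. This also blocks
 * a PVH domain from mapping a pirq to itself.
 */
static int toy_target_check(const struct toy_domain *target)
{
    assert(target != NULL);
    if (!target->is_pv && !target->has_pirq)
        return -EOPNOTSUPP;
    return 0;
}
```

With a PVH dom0 and an HVM domU that has the PIRQ flag, the caller-based check
refuses the request, while the target-based check accepts the mapping for the
domU and still refuses a self map by dom0.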

3. the gsi of a passthrough device is not unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 callback function pci_add_dm_done() will be called when qemu configures a 
passthrough device for domU.
This function calls xc_domain_irq_permission()-> pirq_access_permitted() to 
check whether the gsi has a corresponding mapping in dom0. It does not, so the 
call fails. See XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is 
PVH dom0 and the returned irq is 0.
3.2 it's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH 
dom0, because the devices of PVH are using MSI(-X) interrupts. However, the 
IO-APIC pin must be configured for it to be able to be mapped into a domU.

Reason: After searching the code, I found that "map_pirq" and "register_gsi" 
are done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi 
(aka ioapic's pin) is unmasked in PVH dom0.
So both problems come down to the gsi of a passthrough device not being 
unmasked.

Solution: to solve these problems, the second patch of kernel(xen/pvh: Unmask 
irq for passthrough device in PVH dom0) call the unmask_irq() when we assign a 
device to be passthrough. So that passthrough devices can have the mapping of 

[RFC KERNEL PATCH v6 2/3] xen/pvh: Setup gsi for passthrough device

2024-04-18 Thread Jiqian Chen
In PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured for it to be able to be
mapped into a domU.

When assigning a device for passthrough, proactively set up the
gsi of the device during that process.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 arch/x86/xen/enlighten_pvh.c   | 92 ++
 drivers/acpi/pci_irq.c |  2 +-
 drivers/xen/xen-pciback/pci_stub.c |  8 +++
 include/linux/acpi.h   |  1 +
 include/xen/acpi.h |  6 ++
 5 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index c28f073c1df5..12be665b27d8 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -26,6 +27,97 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+typedef struct gsi_info {
+   int gsi;
+   int trigger;
+   int polarity;
+} gsi_info_t;
+
+struct acpi_prt_entry {
+   struct acpi_pci_id  id;
+   u8  pin;
+   acpi_handle link;
+   u32 index;  /* GSI, or link _CRS index */
+};
+
+static int xen_pvh_get_gsi_info(struct pci_dev *dev,
+   gsi_info_t *gsi_info)
+{
+   int gsi;
+   u8 pin;
+   struct acpi_prt_entry *entry;
+   int trigger = ACPI_LEVEL_SENSITIVE;
+   int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+   if (!dev || !gsi_info)
+   return -EINVAL;
+
+   pin = dev->pin;
+   if (!pin)
+   return -EINVAL;
+
+   entry = acpi_pci_irq_lookup(dev, pin);
+   if (entry) {
+   if (entry->link)
+   gsi = acpi_pci_link_allocate_irq(entry->link,
+entry->index,
+&trigger, &polarity,
+NULL);
+   else
+   gsi = entry->index;
+   } else
+   gsi = -1;
+
+   if (gsi < 0)
+   return -EINVAL;
+
+   gsi_info->gsi = gsi;
+   gsi_info->trigger = trigger;
+   gsi_info->polarity = polarity;
+
+   return 0;
+}
+
+static int xen_pvh_setup_gsi(gsi_info_t *gsi_info)
+{
+   struct physdev_setup_gsi setup_gsi;
+
+   if (!gsi_info)
+   return -EINVAL;
+
+   setup_gsi.gsi = gsi_info->gsi;
+   setup_gsi.triggering = (gsi_info->trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+   setup_gsi.polarity = (gsi_info->polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+}
+
+int xen_pvh_passthrough_gsi(struct pci_dev *dev)
+{
+   int ret;
+   gsi_info_t gsi_info;
+
+   if (!dev)
+   return -EINVAL;
+
+   ret = xen_pvh_get_gsi_info(dev, &gsi_info);
+   if (ret) {
+   xen_raw_printk("Fail to get gsi info!\n");
+   return ret;
+   }
+
+   ret = xen_pvh_setup_gsi(&gsi_info);
+   if (ret == -EEXIST) {
+   xen_raw_printk("Already setup the GSI :%d\n", gsi_info.gsi);
+   ret = 0;
+   } else if (ret)
+   xen_raw_printk("Fail to setup GSI (%d)!\n", gsi_info.gsi);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_passthrough_gsi);
+
 void __init xen_pvh_init(struct boot_params *boot_params)
 {
u32 msr;
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
 }
 #endif /* CONFIG_X86_IO_APIC */
 
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
 {
struct acpi_prt_entry *entry = NULL;
struct pci_dev *bridge;
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 46c40ec8a18e..22d4380d2b04 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -435,6 +436,13 @@ static int pcistub_init_device(struct pci_dev *dev)
goto config_release;
pci_restore_state(dev);
}
+
+   if (xen_initial_domain() && xen_pvh_domain()) {
+   err = xen_pvh_passthrough_gsi(dev);
+   if (err)
+   goto config_release;
+   }
+
/* Now 

[RFC KERNEL PATCH v6 3/3] xen/privcmd: Add new syscall to get gsi from irq

2024-04-18 Thread Jiqian Chen
PVH dom0 uses the Linux local interrupt mechanism. When it
allocates an irq for a gsi, the allocation is dynamic and
first come, first served. irq numbers are allocated from
small to large, but gsis are not requested in numeric order
(gsi 38 may be requested before gsi 28), so the irq number
is not equal to the gsi number.
When passing through a device, QEMU uses the device's gsi
number to do pirq mapping, but it reads that number from the
file /sys/bus/pci/devices//irq. Since irq != gsi, the
mapping fails.
Current Linux code provides no method for userspace to
translate an irq to a gsi.

To solve this, record the relationship between gsi and irq
when PVH dom0 does acpi_register_gsi_ioapic for devices, and
add a new syscall to privcmd so that userspace can get
that translation when needed.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 arch/x86/include/asm/apic.h  |  8 +++
 arch/x86/include/asm/xen/pci.h   |  5 
 arch/x86/kernel/acpi/boot.c  |  2 +-
 arch/x86/pci/xen.c   | 21 +
 drivers/xen/events/events_base.c | 39 
 drivers/xen/privcmd.c| 19 
 include/uapi/xen/privcmd.h   |  7 ++
 include/xen/events.h |  5 
 8 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 9d159b771dc8..dd4139250895 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -169,6 +169,9 @@ extern bool apic_needs_pit(void);
 
 extern void apic_send_IPI_allbutself(unsigned int vector);
 
+extern int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
+   int trigger, int polarity);
+
 #else /* !CONFIG_X86_LOCAL_APIC */
 static inline void lapic_shutdown(void) { }
 #define local_apic_timer_c2_ok 1
@@ -183,6 +186,11 @@ static inline void apic_intr_mode_init(void) { }
 static inline void lapic_assign_system_vectors(void) { }
 static inline void lapic_assign_legacy_vector(unsigned int i, bool r) { }
 static inline bool apic_needs_pit(void) { return true; }
+static inline int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
+   int trigger, int polarity)
+{
+   return (int)gsi;
+}
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
 #ifdef CONFIG_X86_X2APIC
diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
index 9015b888edd6..aa8ded61fc2d 100644
--- a/arch/x86/include/asm/xen/pci.h
+++ b/arch/x86/include/asm/xen/pci.h
@@ -5,6 +5,7 @@
 #if defined(CONFIG_PCI_XEN)
 extern int __init pci_xen_init(void);
 extern int __init pci_xen_hvm_init(void);
+extern int __init pci_xen_pvh_init(void);
 #define pci_xen 1
 #else
 #define pci_xen 0
@@ -13,6 +14,10 @@ static inline int pci_xen_hvm_init(void)
 {
return -1;
 }
+static inline int pci_xen_pvh_init(void)
+{
+   return -1;
+}
 #endif
 #ifdef CONFIG_XEN_PV_DOM0
 int __init pci_xen_initial_domain(void);
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 85a3ce2a3666..72c73458c083 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -749,7 +749,7 @@ static int acpi_register_gsi_pic(struct device *dev, u32 gsi,
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
-static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
+int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
int trigger, int polarity)
 {
int irq = gsi;
diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 652cd53e77f6..f056ab5c0a06 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -114,6 +114,21 @@ static int acpi_register_gsi_xen_hvm(struct device *dev, u32 gsi,
 false /* no mapping of GSI to PIRQ */);
 }
 
+static int acpi_register_gsi_xen_pvh(struct device *dev, u32 gsi,
+   int trigger, int polarity)
+{
+   int irq;
+
+   irq = acpi_register_gsi_ioapic(dev, gsi, trigger, polarity);
+   if (irq < 0)
+   return irq;
+
+   if (xen_pvh_add_gsi_irq_map(gsi, irq) == -EEXIST)
+   printk(KERN_INFO "Already map the GSI :%u and IRQ: %d\n", gsi, irq);
+
+   return irq;
+}
+
 #ifdef CONFIG_XEN_PV_DOM0
 static int xen_register_gsi(u32 gsi, int triggering, int polarity)
 {
@@ -558,6 +573,12 @@ int __init pci_xen_hvm_init(void)
return 0;
 }
 
+int __init pci_xen_pvh_init(void)
+{
+   __acpi_register_gsi = acpi_register_gsi_xen_pvh;
+   return 0;
+}
+
 #ifdef CONFIG_XEN_PV_DOM0
 int __init pci_xen_initial_domain(void)
 {
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 27553673e46b..80d4f7faac64 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -953,6 +953,43 @@ int xen_irq_from_gsi(unsigned gsi)
 }
 EXPORT_SYMBOL_GPL(xen_irq_from_gsi);
 
+int xen_gsi_fro

[KERNEL PATCH v6 1/3] xen/pci: Add xen_reset_device_state function

2024-04-18 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the
Xen side is not notified, so the cached state in vpci is out
of date with respect to the real device state.
To solve that problem, add a new function to clear all vpci
device state when a device is reset on the dom0 side.

Call that function in pcistub_init_device, because using
"pci-assignable-add" to assign a passthrough device in Xen
resets the device; the vpci state is then out of date and
the device fails to restore its BAR state.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 drivers/xen/pci.c  | 12 
 drivers/xen/xen-pciback/pci_stub.c | 18 +++---
 include/xen/interface/physdev.h|  7 +++
 include/xen/pci.h  |  6 ++
 4 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 72d4e3f193af..e9b30bc09139 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -177,6 +177,18 @@ static int xen_remove_device(struct device *dev)
return r;
 }
 
+int xen_reset_device_state(const struct pci_dev *dev)
+{
+   struct physdev_pci_device device = {
+   .seg = pci_domain_nr(dev->bus),
+   .bus = dev->bus->number,
+   .devfn = dev->devfn
+   };
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
+}
+EXPORT_SYMBOL_GPL(xen_reset_device_state);
+
 static int xen_pci_notifier(struct notifier_block *nb,
unsigned long action, void *data)
 {
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e34b623e4b41..46c40ec8a18e 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -89,6 +89,16 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
return psdev;
 }
 
+static int pcistub_reset_device_state(struct pci_dev *dev)
+{
+   __pci_reset_function_locked(dev);
+
+   if (!xen_pv_domain())
+   return xen_reset_device_state(dev);
+   else
+   return 0;
+}
+
 /* Don't call this directly as it's called by pcistub_device_put */
 static void pcistub_device_release(struct kref *kref)
 {
@@ -107,7 +117,7 @@ static void pcistub_device_release(struct kref *kref)
/* Call the reset function which does not take lock as this
 * is called from "unbind" which takes a device_lock mutex.
 */
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
if (dev_data &&
pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
dev_info(&dev->dev, "Could not reload PCI state\n");
@@ -284,7 +294,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 * (so it's ready for the next domain)
 */
device_lock_assert(&dev->dev);
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
 
dev_data = pci_get_drvdata(dev);
ret = pci_load_saved_state(dev, dev_data->pci_saved_state);
@@ -420,7 +430,9 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
else {
dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
-   __pci_reset_function_locked(dev);
+   err = pcistub_reset_device_state(dev);
+   if (err)
+   goto config_release;
pci_restore_state(dev);
}
/* Now disable the device (this also ensures some private device
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index a237af867873..8609770e28f5 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -256,6 +256,13 @@ struct physdev_pci_device_add {
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/include/xen/pci.h b/include/xen/pci.h
index b8337cf85fd1..b2e2e856efd6 100644
--- a/include/xen/pci.h
+++ b/include/xen/pci.h
@@ -4,10 +4,16 @@
 #define __XEN_PCI_H__
 
 #if defined(CONFIG_XEN_DOM0)
+int xen_reset_device_state(const struct pci_dev *dev);
 int xen_find_device_domain_owner(struct pci_dev *dev);
 int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
 int xen_unregister_device_domain_owner(struct pci_dev *dev);
 #else
+static inline int xen_reset_device_state(const struct pci_dev *dev)
+{
+   return -1;
+}
+
 static inline int xen_find_device_domain_owner(struct pci_dev *dev)
 {
return -1;
-- 
2.34.1




[KERNEL PATCH v5 1/3] xen/pci: Add xen_reset_device_state function

2024-03-27 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the
Xen side is not notified, so the cached state in vpci is out
of date with respect to the real device state.
To solve that problem, add a new function to clear all vpci
device state when a device is reset on the dom0 side.

Call that function in pcistub_init_device, because using
"pci-assignable-add" to assign a passthrough device in Xen
resets the device; the vpci state is then out of date and
the device fails to restore its BAR state.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 drivers/xen/pci.c  | 12 
 drivers/xen/xen-pciback/pci_stub.c | 18 +++---
 include/xen/interface/physdev.h|  7 +++
 include/xen/pci.h  |  6 ++
 4 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 72d4e3f193af..e9b30bc09139 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -177,6 +177,18 @@ static int xen_remove_device(struct device *dev)
return r;
 }
 
+int xen_reset_device_state(const struct pci_dev *dev)
+{
+   struct physdev_pci_device device = {
+   .seg = pci_domain_nr(dev->bus),
+   .bus = dev->bus->number,
+   .devfn = dev->devfn
+   };
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
+}
+EXPORT_SYMBOL_GPL(xen_reset_device_state);
+
 static int xen_pci_notifier(struct notifier_block *nb,
unsigned long action, void *data)
 {
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e34b623e4b41..46c40ec8a18e 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -89,6 +89,16 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
return psdev;
 }
 
+static int pcistub_reset_device_state(struct pci_dev *dev)
+{
+   __pci_reset_function_locked(dev);
+
+   if (!xen_pv_domain())
+   return xen_reset_device_state(dev);
+   else
+   return 0;
+}
+
 /* Don't call this directly as it's called by pcistub_device_put */
 static void pcistub_device_release(struct kref *kref)
 {
@@ -107,7 +117,7 @@ static void pcistub_device_release(struct kref *kref)
/* Call the reset function which does not take lock as this
 * is called from "unbind" which takes a device_lock mutex.
 */
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
if (dev_data &&
pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
dev_info(&dev->dev, "Could not reload PCI state\n");
@@ -284,7 +294,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 * (so it's ready for the next domain)
 */
device_lock_assert(&dev->dev);
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
 
dev_data = pci_get_drvdata(dev);
ret = pci_load_saved_state(dev, dev_data->pci_saved_state);
@@ -420,7 +430,9 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
else {
dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
-   __pci_reset_function_locked(dev);
+   err = pcistub_reset_device_state(dev);
+   if (err)
+   goto config_release;
pci_restore_state(dev);
}
/* Now disable the device (this also ensures some private device
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index a237af867873..8609770e28f5 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -256,6 +256,13 @@ struct physdev_pci_device_add {
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/include/xen/pci.h b/include/xen/pci.h
index b8337cf85fd1..b2e2e856efd6 100644
--- a/include/xen/pci.h
+++ b/include/xen/pci.h
@@ -4,10 +4,16 @@
 #define __XEN_PCI_H__
 
 #if defined(CONFIG_XEN_DOM0)
+int xen_reset_device_state(const struct pci_dev *dev);
 int xen_find_device_domain_owner(struct pci_dev *dev);
 int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
 int xen_unregister_device_domain_owner(struct pci_dev *dev);
 #else
+static inline int xen_reset_device_state(const struct pci_dev *dev)
+{
+   return -1;
+}
+
 static inline int xen_find_device_domain_owner(struct pci_dev *dev)
 {
return -1;
-- 
2.34.1




[RFC KERNEL PATCH v5 3/3] PCI/sysfs: Add gsi sysfs for pci_dev

2024-03-27 Thread Jiqian Chen
Some scenarios need a gsi sysfs node.
For example, when Xen passes through a device to domU, it
uses the gsi to map a pirq, but currently userspace cannot get
the gsi number.
So, add a gsi sysfs node for that and for other potential scenarios.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
RFC: No feasible suggestions were obtained in the discussion of v4.
Discussion is still needed on where/how to expose the gsi.
Looking forward to more comments and suggestions from the PCI/ACPI maintainers.

---
 drivers/acpi/pci_irq.c  |  1 +
 drivers/pci/pci-sysfs.c | 11 +++
 include/linux/pci.h |  2 ++
 3 files changed, 14 insertions(+)

diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index 630fe0a34bc6..739a58755df2 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
kfree(entry);
return 0;
}
+   dev->gsi = gsi;
 
rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
if (rc < 0) {
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 2321fdfefd7d..c51df88d079e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -71,6 +71,16 @@ static ssize_t irq_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(irq);
 
+static ssize_t gsi_show(struct device *dev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   return sysfs_emit(buf, "%u\n", pdev->gsi);
+}
+static DEVICE_ATTR_RO(gsi);
+
 static ssize_t broken_parity_status_show(struct device *dev,
 struct device_attribute *attr,
 char *buf)
@@ -596,6 +606,7 @@ static struct attribute *pci_dev_attrs[] = {
&dev_attr_revision.attr,
&dev_attr_class.attr,
&dev_attr_irq.attr,
+   &dev_attr_gsi.attr,
&dev_attr_local_cpus.attr,
&dev_attr_local_cpulist.attr,
&dev_attr_modalias.attr,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7ab0d13672da..457043cfdfce 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -529,6 +529,8 @@ struct pci_dev {
 
/* These methods index pci_reset_fn_methods[] */
u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
+
+   unsigned int gsi;
 };
 
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
-- 
2.34.1




[RFC KERNEL PATCH v5 2/3] xen/pvh: Setup gsi for passthrough device

2024-03-27 Thread Jiqian Chen
In PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured for it to be able to be
mapped into a domU.

When assigning a device for passthrough, proactively set up the gsi
of the device during that process.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
RFC: This patch changes acpi_pci_irq_lookup from a static function to a 
non-static one; comments from the ACPI maintainer are needed.

---
 arch/x86/xen/enlighten_pvh.c   | 92 ++
 drivers/acpi/pci_irq.c |  2 +-
 drivers/xen/xen-pciback/pci_stub.c |  8 +++
 include/linux/acpi.h   |  1 +
 include/xen/acpi.h |  6 ++
 5 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index c28f073c1df5..12be665b27d8 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -26,6 +27,97 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+typedef struct gsi_info {
+   int gsi;
+   int trigger;
+   int polarity;
+} gsi_info_t;
+
+struct acpi_prt_entry {
+   struct acpi_pci_id  id;
+   u8  pin;
+   acpi_handle link;
+   u32 index;  /* GSI, or link _CRS index */
+};
+
+static int xen_pvh_get_gsi_info(struct pci_dev *dev,
+   gsi_info_t *gsi_info)
+{
+   int gsi;
+   u8 pin;
+   struct acpi_prt_entry *entry;
+   int trigger = ACPI_LEVEL_SENSITIVE;
+   int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+   if (!dev || !gsi_info)
+   return -EINVAL;
+
+   pin = dev->pin;
+   if (!pin)
+   return -EINVAL;
+
+   entry = acpi_pci_irq_lookup(dev, pin);
+   if (entry) {
+   if (entry->link)
+   gsi = acpi_pci_link_allocate_irq(entry->link,
+entry->index,
+&trigger, &polarity,
+NULL);
+   else
+   gsi = entry->index;
+   } else
+   gsi = -1;
+
+   if (gsi < 0)
+   return -EINVAL;
+
+   gsi_info->gsi = gsi;
+   gsi_info->trigger = trigger;
+   gsi_info->polarity = polarity;
+
+   return 0;
+}
+
+static int xen_pvh_setup_gsi(gsi_info_t *gsi_info)
+{
+   struct physdev_setup_gsi setup_gsi;
+
+   if (!gsi_info)
+   return -EINVAL;
+
+   setup_gsi.gsi = gsi_info->gsi;
+   setup_gsi.triggering = (gsi_info->trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+   setup_gsi.polarity = (gsi_info->polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+}
+
+int xen_pvh_passthrough_gsi(struct pci_dev *dev)
+{
+   int ret;
+   gsi_info_t gsi_info;
+
+   if (!dev)
+   return -EINVAL;
+
+   ret = xen_pvh_get_gsi_info(dev, &gsi_info);
+   if (ret) {
+   xen_raw_printk("Fail to get gsi info!\n");
+   return ret;
+   }
+
+   ret = xen_pvh_setup_gsi(&gsi_info);
+   if (ret == -EEXIST) {
+   xen_raw_printk("Already setup the GSI :%d\n", gsi_info.gsi);
+   ret = 0;
+   } else if (ret)
+   xen_raw_printk("Fail to setup GSI (%d)!\n", gsi_info.gsi);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_passthrough_gsi);
+
 void __init xen_pvh_init(struct boot_params *boot_params)
 {
u32 msr;
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
 }
 #endif /* CONFIG_X86_IO_APIC */
 
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
 {
struct acpi_prt_entry *entry = NULL;
struct pci_dev *bridge;
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 46c40ec8a18e..22d4380d2b04 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -435,6 +436,13 @@ static int pcistub_init_device(struct pci_dev *dev)
goto config_release;
pci_restore_state(dev);
}
+
+   if (xen_initial_domain() && xen_pvh_domain()) {
+ 

[RFC KERNEL PATCH v5 0/3] Support device passthrough when dom0 is PVH on Xen

2024-03-27 Thread Jiqian Chen
se problems, the second kernel patch (xen/pvh: Unmask 
irq for passthrough device in PVH dom0) calls unmask_irq() when we assign a 
device to be passthrough, so that passthrough devices can have the mapping of 
gsi on PVH dom0 and the gsi can be registered. This v2 patch is different from 
v1 (kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.hu...@amd.com/): 
v1 performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0, 
which is unnecessary and may cause multiple registrations.

4. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in xen_pt_realize() to map a 
passthrough device’s gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a 
gsi rather than an irq, but qemu passes an irq and treats it as a gsi. The irq 
is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in 
xen_host_pci_device_get(). But the gsi number is not actually equal to the irq. 
On PVH dom0, when an irq is allocated for a gsi in acpi_register_gsi_ioapic(), 
the allocation is dynamic and first come, first served. If you debug the kernel 
code (see __irq_alloc_descs), you will find that irq numbers are allocated from 
small to large, but gsis are not requested in that order: gsi 38 may be 
requested before gsi 28, so gsi 38 gets a smaller irq number than gsi 28, and 
then gsi != irq.
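
The ordering effect described above can be sketched in a few lines of C. This is only an illustration: struct irq_allocator, alloc_irq_for_gsi() and the starting irq number 24 are hypothetical stand-ins, not the kernel's __irq_alloc_descs implementation.

```c
#include <stdio.h>

/* Hypothetical stand-in for a dynamic irq allocator that hands out
 * irq numbers in ascending order, first come, first served. */
struct irq_allocator {
    int next_irq;
};

/* Return the next free irq. The gsi value plays no part in the choice,
 * which is why the resulting irq need not equal the gsi. */
static int alloc_irq_for_gsi(struct irq_allocator *a, unsigned int gsi)
{
    (void)gsi;
    return a->next_irq++;
}

/* Print the irq each gsi would receive for a given request order. */
static void show_allocation(const unsigned int *gsis, int n)
{
    struct irq_allocator a = { .next_irq = 24 }; /* arbitrary start value */
    int i;

    for (i = 0; i < n; i++)
        printf("gsi %u -> irq %d\n", gsis[i], alloc_irq_for_gsi(&a, gsis[i]));
}
```

Calling show_allocation() with the request order {38, 28} gives gsi 38 the smaller irq, so reading the irq from sysfs and treating it as a gsi yields the wrong number.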

Solution: we can record the relation between gsi and irq; then, when 
userspace (qemu) wants to use the gsi, we can do a translation. The third 
kernel patch (xen/privcmd: Add new syscall to get gsi from irq) records all the 
relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and 
provides a syscall for userspace to get the gsi from the irq. The third xen 
patch (tools: Add new function to get gsi from irq) adds a new function 
xc_physdev_gsi_from_irq() to call the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() 
will succeed. This v2 patch is the same as v1 (kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/)
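
A minimal sketch of the "record, then translate" idea, assuming a small fixed-size table. The names gsi_irq_map, gsi_irq_map_add() and gsi_irq_map_gsi_from_irq() and the capacity are hypothetical, chosen for illustration; the patch itself keeps this state in the Xen events code.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_GSI_IRQ_ENTRIES 64 /* illustrative capacity, not from the patch */

struct gsi_irq_entry {
    unsigned int gsi;
    int irq;
};

struct gsi_irq_map {
    struct gsi_irq_entry entries[MAX_GSI_IRQ_ENTRIES];
    size_t count;
};

/* Record one gsi -> irq pair.
 * Returns 0 on success, -EEXIST if the gsi is already recorded,
 * -ENOSPC if the table is full. */
static int gsi_irq_map_add(struct gsi_irq_map *map, unsigned int gsi, int irq)
{
    size_t i;

    for (i = 0; i < map->count; i++)
        if (map->entries[i].gsi == gsi)
            return -EEXIST;
    if (map->count == MAX_GSI_IRQ_ENTRIES)
        return -ENOSPC;
    map->entries[map->count].gsi = gsi;
    map->entries[map->count].irq = irq;
    map->count++;
    return 0;
}

/* Translate an irq back to the gsi it was allocated for.
 * Returns the gsi, or -ENOENT if the irq was never recorded. */
static int gsi_irq_map_gsi_from_irq(const struct gsi_irq_map *map, int irq)
{
    size_t i;

    for (i = 0; i < map->count; i++)
        if (map->entries[i].irq == irq)
            return (int)map->entries[i].gsi;
    return -ENOENT;
}
```

A linear scan is enough here because the number of gsis is small; the -EEXIST return mirrors how the kernel patch treats a repeated registration as harmless.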

The v2 qemu patch only changes an included header file; otherwise it is 
similar to v1 (qemu 
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/): 
it just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.


Jiqian Chen (3):
  xen/pci: Add xen_reset_device_state function
  xen/pvh: Setup gsi for passthrough device
  PCI/sysfs: Add gsi sysfs for pci_dev

 arch/x86/xen/enlighten_pvh.c   | 92 ++
 drivers/acpi/pci_irq.c |  3 +-
 drivers/pci/pci-sysfs.c| 11 
 drivers/xen/pci.c  | 12 
 drivers/xen/xen-pciback/pci_stub.c | 26 -
 include/linux/acpi.h   |  1 +
 include/linux/pci.h|  2 +
 include/xen/acpi.h |  6 ++
 include/xen/interface/physdev.h|  7 +++
 include/xen/pci.h  |  6 ++
 10 files changed, 162 insertions(+), 4 deletions(-)

-- 
2.34.1




[RFC XEN PATCH v6 4/5] libxl: Use gsi instead of irq for mapping pirq

2024-03-27 Thread Jiqian Chen
PVH dom0 uses the Linux local interrupt mechanism. When it
allocates an irq for a gsi, the allocation is dynamic and
first come, first served. irq numbers are allocated from
small to large, but gsis are not requested in numeric order
(gsi 38 may be requested before gsi 28), so the irq number
is not equal to the gsi number. When passing through a
device, xl wants to use the gsi to map a pirq, see
pci_add_dm_done->xc_physdev_map_pirq, but in the current
code the number is read from the file
/sys/bus/pci/devices//irq, so the mapping fails.

So, use the real gsi number read from the gsi sysfs node.
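
The lookup order this patch introduces can be sketched as a standalone C helper. This is an illustration under assumptions: read_gsi_or_irq() is a hypothetical name, and the device directory is passed in by the caller rather than built with libxl's GCSPRINTF.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Read the interrupt number for a PCI device directory, preferring the
 * "gsi" node and falling back to "irq" when "gsi" does not exist (older
 * kernels). On success, *is_gsi is 1 if the value came from "gsi" and 0
 * if it came from "irq". Returns 0 on success, -1 on failure. */
static int read_gsi_or_irq(const char *dev_dir, unsigned int *value, int *is_gsi)
{
    char path[512];
    FILE *f;
    int ok;

    snprintf(path, sizeof(path), "%s/gsi", dev_dir);
    *is_gsi = 1;
    if (access(path, F_OK) != 0 && errno == ENOENT) {
        /* No gsi node: fall back to the legacy irq node. */
        snprintf(path, sizeof(path), "%s/irq", dev_dir);
        *is_gsi = 0;
    }

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%u", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Reporting which node was used lets the caller decide later whether the value can be treated as a gsi or only as an irq.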

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 

---
RFC: discussions ongoing on the Linux side where/how to expose the gsi

---
 tools/libs/light/libxl_pci.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..2cec83e0b734 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1478,8 +1478,14 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
-sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
 pci->bus, pci->dev, pci->func);
+r = access(sysfs_path, F_OK);
+if (r && errno == ENOENT) {
+/* To be compatible with old kernel versions, still need to use irq */
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+   pci->bus, pci->dev, pci->func);
+}
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
 LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
@@ -2229,9 +2235,15 @@ skip_bar:
 if (!pci_supp_legacy_irq())
 goto skip_legacy_irq;
 
-sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
pci->bus, pci->dev, pci->func);
 
+rc = access(sysfs_path, F_OK);
+if (rc && errno == ENOENT) {
+/* To be compatible with old kernel versions, still need to use irq */
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+   pci->bus, pci->dev, pci->func);
+}
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
 LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
-- 
2.34.1




[RFC XEN PATCH v6 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-03-27 Thread Jiqian Chen
Some types of domain, like PVH, don't have PIRQs. When
passing through a device to a guest on PVH dom0, the call stack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails
at domain_pirq_to_irq.

So, add a new hypercall to grant/revoke gsi permission
when dom0 is not PV or dom0 does not have the PIRQ flag.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/include/xenctrl.h  |  5 
 tools/libs/ctrl/xc_domain.c  | 15 +++
 tools/libs/light/libxl_pci.c | 52 +---
 xen/arch/x86/domctl.c| 31 +
 xen/include/public/domctl.h  |  9 +++
 xen/xsm/flask/hooks.c|  1 +
 6 files changed, 103 insertions(+), 10 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2ef8b4e05422..519c860a00d5 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.allow_access = allow_access,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 2cec83e0b734..debf6ec6ddc7 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1421,6 +1421,8 @@ static void pci_add_dm_done(libxl__egc *egc,
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+int gsi;
+bool is_gsi = true;
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1485,6 +1487,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 /* To compitable with old version of kernel, still need to use irq */
 sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
pci->bus, pci->dev, pci->func);
+is_gsi = false;
 }
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
@@ -1492,6 +1495,13 @@ static void pci_add_dm_done(libxl__egc *egc,
 goto out_no_irq;
 }
 if ((fscanf(f, "%u", &irq) == 1) && irq) {
+/*
+ * If use gsi, save the value, because the value of irq
+ * will be changed by function xc_physdev_map_pirq
+ */
+if (is_gsi) {
+gsi = irq;
+}
 r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
 if (r < 0) {
 LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
@@ -1500,13 +1510,25 @@ static void pci_add_dm_done(libxl__egc *egc,
 rc = ERROR_FAIL;
 goto out;
 }
-r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
-if (r < 0) {
-LOGED(ERROR, domainid,
-  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
-fclose(f);
-rc = ERROR_FAIL;
-goto out;
+if (is_gsi) {
+r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
+if (r < 0 && r != -EOPNOTSUPP) {
+LOGED(ERROR, domainid,
+  "xc_domain_gsi_permission gsi=%d (error=%d)", gsi, r);
+fclose(f);
+rc = ERROR_FAIL;
+goto out;
+}
+}
+if (!is_gsi || r == -EOPNOTSUPP) {
+r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+if (r < 0) {
+LOGED(ERROR, domainid,
+"xc_domain_irq_permission irq=%d (error=%d)", irq, r);
+fclose(f);
+rc = ERROR_FAIL;
+goto out;
+}
 }
 }
 fclose(f);
@@ 

[XEN PATCH v6 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-03-27 Thread Jiqian Chen
When Xen runs with a PVH dom0 and an HVM domU, a pirq is mapped
for a passthrough device by using its gsi, see
xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq
is not allowed because currd is PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq.
Also add a new check to prevent a self map when the caller has no
PIRQ flag.
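
As a rough illustration of the new target check (not the Xen code itself),
the rule added to do_physdev_op can be modelled in plain C. The types
domain_model and guest_type and the function check_pirq_target are
hypothetical stand-ins for struct domain, is_pv_domain() and has_pirq();
only the decision logic is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: PVH is represented as a non-PV guest without PIRQs. */
enum guest_type { GUEST_PV, GUEST_HVM };

struct domain_model {
    enum guest_type type;
    bool has_pirq;          /* stands in for the X86_EMU_USE_PIRQ flag */
};

/*
 * Returns 0 if a pirq may be (un)mapped for the target domain and
 * -EOPNOTSUPP if the target is a non-PV domain without PIRQs.
 */
static int check_pirq_target(const struct domain_model *target)
{
    assert(target != NULL);

    if (target->type != GUEST_PV && !target->has_pirq)
        return -EOPNOTSUPP;

    return 0;
}
```

In this model a PVH dom0 that tries to map a pirq for itself is rejected,
while the same request targeting an HVM domU that has the PIRQ flag passes.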

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  2 ++
 xen/arch/x86/physdev.c   | 24 
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 6ad5b4d5f11f..493998b42ec5 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..1367abc61e54 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_map_pirq: {
 physdev_map_pirq_t map;
 struct msi_info msi;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&map, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(map.domid);
+if ( d == NULL )
+return -ESRCH;
+/* If it is an HVM guest, check if it has PIRQs */
+if ( !is_pv_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 switch ( map.type )
 {
 case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 case PHYSDEVOP_unmap_pirq: {
 struct physdev_unmap_pirq unmap;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&unmap, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(unmap.domid);
+if ( d == NULL )
+return -ESRCH;
+/* If it is an HVM guest, check if it has PIRQs */
+if ( !is_pv_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
 break;
 }
-- 
2.34.1




[RFC XEN PATCH v6 0/5] Support device passthrough when dom0 is PVH on Xen

2024-03-27 Thread Jiqian Chen
om0 do PHYSDEVOP_map_pirq. This v2 
patch is better than v1, which simply removed the has_pirq check (xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.hu...@amd.com/).

3. the gsi of a passthrough device is never unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 The callback function pci_add_dm_done() is called when qemu configures a
passthrough device for domU.
This function calls xc_domain_irq_permission()->pirq_access_permitted() to
check whether the gsi has a corresponding mapping in dom0. It does not, so the
check fails. See XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is
PVH dom0 and the returned irq is 0.
3.2 It is possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH
dom0, because the devices of PVH use MSI(-X) interrupts. However, the
IO-APIC pin must be configured for it to be able to be mapped into a domU.

Reason: After searching the code, I found that "map_pirq" and "register_gsi" are
done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi (aka
ioapic's pin) is unmasked in PVH dom0.
So both problems come down to the gsi of a passthrough device never being
unmasked.

Solution: to solve these problems, the second patch of kernel (xen/pvh: Unmask
irq for passthrough device in PVH dom0) calls unmask_irq() when we assign a
device to be passed through, so that passthrough devices get the gsi mapping
on PVH dom0 and the gsi can be registered. This v2 patch is different from the
v1 (kernel
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/,
kernel
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/
and xen
https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.hu...@amd.com/):
v1 performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0,
which is unnecessary and may cause multiple registration.

4. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in function xen_pt_realize() to map a
passthrough device's gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a gsi
instead of an irq, but qemu passes an irq to it and treats the irq as a gsi; the
value is read from file /sys/bus/pci/devices/:xx:xx.x/irq in function
xen_host_pci_device_get(). But the gsi number is actually not equal to the irq.
On PVH dom0, when an irq is allocated for a gsi in function
acpi_register_gsi_ioapic(), the allocation is dynamic and first come, first
served. If you debug the kernel code (see function __irq_alloc_descs), you
will find that irq numbers are allocated in ascending order, but gsi numbers are
not requested in ascending order: gsi 38 may come before gsi 28, so gsi 38 gets
a smaller irq number than gsi 28, and then gsi != irq.
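
The ordering problem described above can be reproduced with a toy allocator.
This is only a sketch under stated assumptions: irq_allocator,
irq_allocator_init and register_gsi are hypothetical names, the starting irq
number used in the note below is made up, and the real kernel path
(acpi_register_gsi_ioapic, __irq_alloc_descs) is far more involved.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GSI 64

/* Toy allocator: hands out irq numbers in ascending order, one per request. */
struct irq_allocator {
    int next_irq;               /* next free irq number */
    int gsi_to_irq[MAX_GSI];    /* 0 means "gsi not registered yet" */
};

static void irq_allocator_init(struct irq_allocator *a, int first_irq)
{
    assert(a != NULL && first_irq > 0);

    a->next_irq = first_irq;
    for (size_t i = 0; i < MAX_GSI; i++)
        a->gsi_to_irq[i] = 0;
}

/*
 * Register a gsi and return the irq assigned to it. The irq depends only on
 * the order of registration, not on the gsi value. Registering the same gsi
 * again returns the irq it already has.
 */
static int register_gsi(struct irq_allocator *a, int gsi)
{
    assert(a != NULL && gsi >= 0 && gsi < MAX_GSI);

    if (a->gsi_to_irq[gsi] == 0)
        a->gsi_to_irq[gsi] = a->next_irq++;

    return a->gsi_to_irq[gsi];
}
```

Initialising the allocator with first irq 24 and registering gsi 38 before
gsi 28 gives irq 24 to gsi 38 and irq 25 to gsi 28, so neither irq equals its
gsi.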

Solution: we can record the relation between gsi and irq, so that when
userspace (qemu) wants to use a gsi, we can do a translation. The third patch of
kernel (xen/privcmd: Add new syscall to get gsi from irq) records all the
relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and
provides a syscall for userspace to get the gsi from an irq. The third patch of
xen (tools: Add new function to get gsi from irq) adds a new function
xc_physdev_gsi_from_irq() to call the new syscall added on kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq()
will succeed. This v2 patch is the same as v1 (kernel
https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/
and xen
https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/).
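
A minimal sketch of the record-and-translate idea, assuming a small
fixed-size table. The names gsi_irq_table, table_record and
table_gsi_from_irq are hypothetical; this is not the kernel or libxc
implementation, only an illustration of the lookup direction (irq in, gsi
out).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAIRS 32

struct gsi_irq_pair {
    int gsi;
    int irq;
};

struct gsi_irq_table {
    struct gsi_irq_pair pairs[MAX_PAIRS];
    size_t count;
};

/* Record one gsi/irq relation; returns 0 on success, -1 if the table is full. */
static int table_record(struct gsi_irq_table *t, int gsi, int irq)
{
    assert(t != NULL);

    if (t->count >= MAX_PAIRS)
        return -1;

    t->pairs[t->count].gsi = gsi;
    t->pairs[t->count].irq = irq;
    t->count++;
    return 0;
}

/* Translate an irq back to its gsi; returns -1 if the irq was never recorded. */
static int table_gsi_from_irq(const struct gsi_irq_table *t, int irq)
{
    assert(t != NULL);

    for (size_t i = 0; i < t->count; i++)
        if (t->pairs[i].irq == irq)
            return t->pairs[i].gsi;

    return -1;
}
```

A linear scan is enough for a sketch like this because only a handful of
relations exist per host; a real implementation would also need locking and
removal, which are left out here.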

About the v2 patch of qemu: it only changes an included header file; the rest is
similar to the v1 (qemu
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/):
it just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.


Jiqian Chen (5):
  xen/vpci: Clear all vpci status of device
  x86/pvh: Allow (un)map_pirq when dom0 is PVH
  x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  libxl: Use gsi instead of irq for mapping pirq
  domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

 tools/include/xenctrl.h  |  5 +++
 tools/libs/ctrl/xc_domain.c  | 15 
 tools/libs/light/libxl_pci.c | 68 +---
 xen/arch/x86/domctl.c| 31 
 xen/arch/x86/hvm/hypercall.c |  8 +
 xen/arch/x86/physdev.c   | 24 +
 xen/drivers/pci/physdev.c| 36 +++
 xen/drivers/vpci/vpci.c  | 10 ++
 xen/include/public/domctl.h  |  9 +
 xen/include/public/physdev.h |  7 
 xen/include/xen/vpci.h   |  6 
 xen/xsm/flask/hooks.c|  1 +
 12 files changed, 208 insertions(+), 12 deletions(-)

-- 
2.34.1




[RFC XEN PATCH v6 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-03-27 Thread Jiqian Chen
On PVH dom0, the gsis don't get registered, but
the gsi of a passthrough device must be configured for it to
be able to be mapped into an HVM domU.
On the Linux kernel side, PHYSDEVOP_setup_gsi is called for
passthrough devices to register the gsi when dom0 is PVH.
So, add PHYSDEVOP_setup_gsi for the above purpose.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 493998b42ec5..7d4e41f66885 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_unmap_pirq:
 break;
 
+case PHYSDEVOP_setup_gsi:
+if ( !is_hardware_domain(currd) )
+return -EOPNOTSUPP;
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[XEN PATCH v6 1/5] xen/vpci: Clear all vpci status of device

2024-03-27 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the Xen
side won't get a notification, so the cached state in vpci is
all out of date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0 side,
dom0 can call this hypercall to notify vpci.
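
The stale-cache problem and the effect of the reset can be sketched with a
toy model. device_model, vpci_model and the vpci_model_* functions are
hypothetical; the only thing borrowed from the patch is the shape of the
reset, which drops the cached state and then rebuilds it from the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "real" device with a single register. */
struct device_model {
    uint16_t command;
};

/* Hypothetical cached view of that device, as a hypervisor might keep it. */
struct vpci_model {
    uint16_t cached_command;
    bool assigned;
};

/* Build the cached state from the current device state. */
static void vpci_model_assign(struct vpci_model *v,
                              const struct device_model *dev)
{
    v->cached_command = dev->command;
    v->assigned = true;
}

/* Drop all cached state. */
static void vpci_model_deassign(struct vpci_model *v)
{
    v->cached_command = 0;
    v->assigned = false;
}

/* Reset = deassign followed by assign, so the cache matches the device again. */
static void vpci_model_reset_state(struct vpci_model *v,
                                   const struct device_model *dev)
{
    assert(v != NULL && dev != NULL);

    vpci_model_deassign(v);
    vpci_model_assign(v, dev);
}
```

If the device register changes behind the model's back (as a reset done by
dom0 would), cached_command keeps the old value until
vpci_model_reset_state() is called.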

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 36 
 xen/drivers/vpci/vpci.c  | 10 ++
 xen/include/public/physdev.h |  7 +++
 xen/include/xen/vpci.h   |  6 ++
 5 files changed, 60 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index eeb73e1aa5d0..6ad5b4d5f11f 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..73dc8f058b0e 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct physdev_pci_device dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev, arg, 1) != 0 )
+break;
+sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+ret = vpci_reset_device_state(pdev);
+write_unlock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 260b72875ee1..310700c1e775 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -117,6 +117,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(pcidevs_locked());
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..f5bab1f29779 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index e89c571890b2..ea64d94e818b 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,6 +30,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -266,6 +267,11 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
 
 static inline void vpci_deassign_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1




[RFC QEMU PATCH v5 1/1] xen: Use gsi instead of irq for mapping pirq

2024-03-27 Thread Jiqian Chen
PVH dom0 uses the linux local interrupt mechanism. When it
allocates an irq for a gsi, the allocation is dynamic and
first come, first served. The irq number is allocated in
ascending order, but gsi numbers are not requested in
ascending order: gsi 38 may come before gsi 28, so the irq
number is not equal to the gsi number. When passing through
a device, qemu wants to use the gsi to map a pirq, see
xen_pt_realize->xc_physdev_map_pirq, but the gsi number is
read from file /sys/bus/pci/devices//irq in current code, so
the mapping fails.

Add gsi into XenHostPCIDevice and use the gsi number
read from gsi sysfs if it exists.
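
The selection rule can be sketched outside of QEMU as follows. This is an
illustration only: host_dev_model, host_dev_set_gsi and pick_machine_irq are
hypothetical names, and passing NULL for the attribute text is an assumed way
to model a kernel that has no gsi sysfs node.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the irq/gsi fields of a host PCI device. */
struct host_dev_model {
    int irq;    /* value read from the "irq" attribute */
    int gsi;    /* value read from the "gsi" attribute, -1 if unavailable */
};

/*
 * Store the gsi parsed from the attribute text. A NULL text (attribute
 * missing) or an unparsable value leaves gsi at -1.
 */
static void host_dev_set_gsi(struct host_dev_model *d, const char *text)
{
    char *end;
    long v;

    assert(d != NULL);

    d->gsi = -1;
    if (text == NULL)
        return;

    v = strtol(text, &end, 10);
    if (end != text && v >= 0 && v <= INT_MAX)
        d->gsi = (int)v;
}

/* Prefer the gsi when it is available, otherwise fall back to the irq. */
static int pick_machine_irq(const struct host_dev_model *d)
{
    assert(d != NULL);

    return d->gsi < 0 ? d->irq : d->gsi;
}
```

Using -1 as the "no gsi" marker keeps the fallback to irq working on kernels
that do not expose the gsi attribute.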

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 

---
RFC: discussions ongoing on the Linux side where/how to expose the gsi

---
 hw/xen/xen-host-pci-device.c | 7 +++
 hw/xen/xen-host-pci-device.h | 1 +
 hw/xen/xen_pt.c  | 6 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716a2..5be3279aa25b 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -370,6 +370,13 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
 }
 d->irq = v;
 
+xen_host_pci_get_dec_value(d, "gsi", &v, errp);
+if (*errp) {
+d->gsi = -1;
+} else {
+d->gsi = v;
+}
+
 xen_host_pci_get_hex_value(d, "class", &v, errp);
 if (*errp) {
 goto error;
diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h
index 4d8d34ecb024..74c552bb5548 100644
--- a/hw/xen/xen-host-pci-device.h
+++ b/hw/xen/xen-host-pci-device.h
@@ -27,6 +27,7 @@ typedef struct XenHostPCIDevice {
 uint16_t device_id;
 uint32_t class_code;
 int irq;
+int gsi;
 
 XenHostPCIIORegion io_regions[PCI_NUM_REGIONS - 1];
 XenHostPCIIORegion rom;
diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 3635d1b39f79..d34a7a8764ab 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -840,7 +840,11 @@ static void xen_pt_realize(PCIDevice *d, Error **errp)
 goto out;
 }
 
-machine_irq = s->real_device.irq;
+if (s->real_device.gsi < 0) {
+machine_irq = s->real_device.irq;
+} else {
+machine_irq = s->real_device.gsi;
+}
 if (machine_irq == 0) {
 XEN_PT_LOG(d, "machine irq is 0\n");
 cmd |= PCI_COMMAND_INTX_DISABLE;
-- 
2.34.1




[QEMU PATCH v5 0/1] Support device passthrough when dom0 is PVH on Xen

2024-03-27 Thread Jiqian Chen
Hi All,
This is v5 series to support passthrough on Xen when dom0 is PVH.
v4->v5 changes:
* Add review by Stefano

v3->v4 changes:
* Add gsi into struct XenHostPCIDevice and use the gsi number read from gsi
  sysfs if it exists; if there is no gsi sysfs, still use irq.

v2->v3 changes:
* Due to changes in the implementation of the second patch on kernel side (it
  adds a new sysfs for gsi instead of a new syscall), read the gsi number from
  the sysfs of gsi.
Below is the description of v2 cover letter:
This patch is the v2 of the implementation of passthrough when dom0 is PVH on
Xen.
Issues we encountered:
1. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in function xen_pt_realize() to map a
passthrough device's gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a gsi
instead of an irq, but qemu passes an irq to it and treats the irq as a gsi; the
value is read from file /sys/bus/pci/devices/:xx:xx.x/irq in function
xen_host_pci_device_get(). But the gsi number is actually not equal to the irq.
On PVH dom0, when an irq is allocated for a gsi in function
acpi_register_gsi_ioapic(), the allocation is dynamic and first come, first
served. If you debug the kernel code (see function __irq_alloc_descs), you
will find that irq numbers are allocated in ascending order, but gsi numbers are
not requested in ascending order: gsi 38 may come before gsi 28, so gsi 38 gets
a smaller irq number than gsi 28, and then gsi != irq.

Solution: we can record the relation between gsi and irq, so that when
userspace (qemu) wants to use a gsi, we can do a translation. The third patch of
kernel (xen/privcmd: Add new syscall to get gsi from irq) records all the
relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and
provides a syscall for userspace to get the gsi from an irq. The third patch of
xen (tools: Add new function to get gsi from irq) adds a new function
xc_physdev_gsi_from_irq() to call the new syscall added on kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq()
will succeed.

This v2 on qemu side is the same as the v1
(qemu
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/):
it just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.


Jiqian Chen (1):
  xen: Use gsi instead of irq for mapping pirq

 hw/xen/xen-host-pci-device.c | 7 +++
 hw/xen/xen-host-pci-device.h | 1 +
 hw/xen/xen_pt.c  | 6 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

-- 
2.34.1




[RFC XEN PATCH v5 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-01-11 Thread Jiqian Chen
Some types of domain, like PVH, don't have PIRQ. When
passing through a device to a guest on PVH dom0, the callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission will fail
at domain_pirq_to_irq.

So, add a new hypercall to grant/revoke gsi permission
when dom0 is not PV or dom0 does not have the PIRQ flag.
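
The fallback order used on the toolstack side can be sketched as below. This
is a simplified model, not libxl code: grant_irq_access, perm_fn and the
three stub backends are hypothetical, and the stubs return negative errno
values directly, which is an assumption made to keep the sketch short.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature shared by the gsi and irq permission backends. */
typedef int (*perm_fn)(unsigned int number, bool allow);

/*
 * Try the gsi permission call first when a gsi is known. Fall back to the
 * irq permission call when no gsi is known or the gsi call reports
 * -EOPNOTSUPP. Any other error from the gsi call is returned as is.
 */
static int grant_irq_access(bool have_gsi, unsigned int gsi, unsigned int pirq,
                            perm_fn gsi_permission, perm_fn irq_permission)
{
    int r = -EOPNOTSUPP;

    assert(gsi_permission != NULL && irq_permission != NULL);

    if (have_gsi)
        r = gsi_permission(gsi, true);
    if (!have_gsi || r == -EOPNOTSUPP)
        r = irq_permission(pirq, true);

    return r;
}

/* Stand-in backends, used only to exercise the fallback. */
static int last_irq_granted = -1;

static int gsi_perm_ok(unsigned int gsi, bool allow)
{
    (void)gsi;
    (void)allow;
    return 0;
}

static int gsi_perm_unsupported(unsigned int gsi, bool allow)
{
    (void)gsi;
    (void)allow;
    return -EOPNOTSUPP;
}

static int irq_perm_ok(unsigned int pirq, bool allow)
{
    (void)allow;
    last_irq_granted = (int)pirq;
    return 0;
}
```

With gsi_perm_ok the irq backend is never reached; with
gsi_perm_unsupported, or when no gsi is known, the irq backend is called with
the pirq.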

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/include/xenctrl.h  |  5 +
 tools/libs/ctrl/xc_domain.c  | 15 +++
 tools/libs/light/libxl_pci.c | 16 ++--
 xen/arch/x86/domctl.c| 31 +++
 xen/include/public/domctl.h  |  9 +
 xen/xsm/flask/hooks.c|  1 +
 6 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2ef8b4e05422..519c860a00d5 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..448ba2c59ae1 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {};
+
+domctl.cmd = XEN_DOMCTL_gsi_permission;
+domctl.domain = domid;
+domctl.u.gsi_permission.gsi = gsi;
+domctl.u.gsi_permission.allow_access = allow_access;
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index a1c6e82631e9..4136a860a048 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1421,6 +1421,8 @@ static void pci_add_dm_done(libxl__egc *egc,
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+int gsi;
+bool has_gsi = true;
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1482,6 +1484,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 pci->bus, pci->dev, pci->func);
 
 if ( access(sysfs_path, F_OK) != 0 ) {
+has_gsi = false;
 if ( errno == ENOENT )
 sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
 pci->bus, pci->dev, pci->func);
@@ -1497,6 +1500,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 goto out_no_irq;
 }
 if ((fscanf(f, "%u", &irq) == 1) && irq) {
+gsi = irq;
 r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
 if (r < 0) {
 LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
@@ -1505,7 +1509,10 @@ static void pci_add_dm_done(libxl__egc *egc,
 rc = ERROR_FAIL;
 goto out;
 }
-r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+if ( has_gsi )
+r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
+if ( !has_gsi || r == -EOPNOTSUPP )
+r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
 if (r < 0) {
 LOGED(ERROR, domainid,
   "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
@@ -2185,6 +2192,7 @@ static void pci_remove_detached(libxl__egc *egc,
 FILE *f;
 uint32_t domainid = prs->domid;
 bool isstubdom;
+bool has_gsi = true;
 
 /* Convenience aliases */
 libxl_device_pci *const pci = &prs->pci;
@@ -2244,6 +2252,7 @@ skip_bar:
pci->bus, pci->dev, pci->func);
 
 if ( access(sysfs_path, F_OK) != 0 ) {
+has_gsi = false;
 if ( errno == ENOENT )
 sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
 pci->bus, pci->dev, pci->func);
@@ -2270,7 +2279,10 @@ skip_bar:
  */
 LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
 }
-rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
+i

[RFC XEN PATCH v5 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-01-11 Thread Jiqian Chen
On PVH dom0, the gsis don't get registered, but
the gsi of a passthrough device must be configured for it to
be able to be mapped into an HVM domU.
On the Linux kernel side, PHYSDEVOP_setup_gsi is called for
passthrough devices to register the gsi when dom0 is PVH.
So, add PHYSDEVOP_setup_gsi for the above purpose.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 493998b42ec5..46f51ee459f6 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -76,6 +76,12 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_unmap_pirq:
 break;
 
+case PHYSDEVOP_setup_gsi:
+if ( !is_hardware_domain(currd) )
+return -EOPNOTSUPP;
+ASSERT(!has_pirq(currd));
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[RFC XEN PATCH v5 4/5] libxl: Use gsi instead of irq for mapping pirq

2024-01-11 Thread Jiqian Chen
PVH dom0 uses the linux local interrupt mechanism. When it
allocates an irq for a gsi, the allocation is dynamic and
first come, first served. The irq number is allocated in
ascending order, but gsi numbers are not requested in
ascending order: gsi 38 may come before gsi 28, so the irq
number is not equal to the gsi number. When passing through
a device, xl wants to use the gsi to map a pirq, see
pci_add_dm_done->xc_physdev_map_pirq, but the gsi number is
read from file /sys/bus/pci/devices//irq in current code, so
the mapping fails.

So, use the real gsi number read from gsi sysfs.
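
The choice between the two sysfs files can be sketched as a small helper.
choose_irq_source is a hypothetical function written for this illustration;
it mirrors the access()/ENOENT test used in the diff but is not part of
libxl, and the paths are supplied by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Pick the sysfs file to read the interrupt number from.
 * Returns gsi_path if it exists, irq_path if gsi_path is missing (ENOENT),
 * and NULL on any other access() error. *is_gsi reports which one was chosen.
 */
static const char *choose_irq_source(const char *gsi_path,
                                     const char *irq_path,
                                     bool *is_gsi)
{
    assert(gsi_path != NULL && irq_path != NULL && is_gsi != NULL);

    if (access(gsi_path, F_OK) == 0) {
        *is_gsi = true;
        return gsi_path;
    }

    *is_gsi = false;
    if (errno == ENOENT)
        return irq_path;    /* older kernel: no gsi attribute */

    return NULL;            /* other errors are reported, not guessed around */
}
```

Only ENOENT triggers the fallback, so a permission problem on the gsi file is
surfaced to the caller instead of silently using the irq value.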

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 tools/libs/light/libxl_pci.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..a1c6e82631e9 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1478,8 +1478,19 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
-sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
 pci->bus, pci->dev, pci->func);
+
+if ( access(sysfs_path, F_OK) != 0 ) {
+if ( errno == ENOENT )
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+pci->bus, pci->dev, pci->func);
+else {
+LOGED(ERROR, domainid, "Can't access %s", sysfs_path);
+goto out_no_irq;
+}
+}
+
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
 LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
@@ -2229,9 +2240,19 @@ skip_bar:
 if (!pci_supp_legacy_irq())
 goto skip_legacy_irq;
 
-sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
pci->bus, pci->dev, pci->func);
 
+if ( access(sysfs_path, F_OK) != 0 ) {
+if ( errno == ENOENT )
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+pci->bus, pci->dev, pci->func);
+else {
+LOGED(ERROR, domid, "Can't access %s", sysfs_path);
+goto skip_legacy_irq;
+}
+}
+
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
 LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
-- 
2.34.1




[RFC XEN PATCH v5 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-01-11 Thread Jiqian Chen
When Xen runs with a PVH dom0 and an HVM domU, a pirq is mapped
for a passthrough device by using its gsi, see
xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq
is not allowed because currd is PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq.
Also add a new check to prevent a self map when the caller has no
PIRQ flag.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c |  2 ++
 xen/arch/x86/physdev.c   | 22 ++
 2 files changed, 24 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 6ad5b4d5f11f..493998b42ec5 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 47c4da0af7e1..7f2422c2a483 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -303,11 +303,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_map_pirq: {
 physdev_map_pirq_t map;
 struct msi_info msi;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&map, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(map.domid);
+if ( d == NULL )
+return -ESRCH;
+if ( !is_pv_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 switch ( map.type )
 {
 case MAP_PIRQ_TYPE_MSI_SEG:
@@ -341,11 +352,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 case PHYSDEVOP_unmap_pirq: {
 struct physdev_unmap_pirq unmap;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&unmap, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(unmap.domid);
+if ( d == NULL )
+return -ESRCH;
+if ( !is_pv_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
 break;
 }
-- 
2.34.1




[RFC XEN PATCH v5 1/5] xen/vpci: Clear all vpci status of device

2024-01-11 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the Xen
side won't get a notification, so the cached state in vpci is
all out of date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0 side,
dom0 can call this hypercall to notify vpci.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 36 
 xen/drivers/vpci/vpci.c  | 10 ++
 xen/include/public/physdev.h |  7 +++
 xen/include/xen/vpci.h   |  6 ++
 5 files changed, 60 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index eeb73e1aa5d0..6ad5b4d5f11f 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..73dc8f058b0e 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct physdev_pci_device dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev, arg, 1) != 0 )
+break;
+sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+ret = vpci_reset_device_state(pdev);
+write_unlock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 72ef277c4f8e..c6df2c6a9561 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -107,6 +107,16 @@ int vpci_add_handlers(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(pcidevs_locked());
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_remove_device(pdev);
+return vpci_add_handlers(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..f5bab1f29779 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index d20c301a3db3..6ec83ce9ae13 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,6 +30,7 @@ int __must_check vpci_add_handlers(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_remove_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -262,6 +263,11 @@ static inline int vpci_add_handlers(struct pci_dev *pdev)
 
 static inline void vpci_remove_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1
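
The reset path in the patch above amounts to "drop the cached state, then
rebuild it from the hardware". A minimal sketch of that idea follows. It is a
simplified model, not the Xen implementation: struct model_dev and the
model_* helpers are hypothetical names, and the locking that the real
vpci_reset_device_state() asserts is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the per-device state that is cached. */
struct model_dev {
    bool handlers_installed;
    unsigned int cached_command; /* stands in for state cached from config space */
};

/* Drop every piece of cached state (plays the role of vpci_remove_device()). */
static void model_remove(struct model_dev *d)
{
    d->handlers_installed = false;
    d->cached_command = 0;
}

/* Rebuild the cached state from the current hardware value
 * (plays the role of vpci_add_handlers()). Always succeeds in this model. */
static int model_add_handlers(struct model_dev *d, unsigned int hw_command)
{
    d->cached_command = hw_command;
    d->handlers_installed = true;
    return 0;
}

/* Reset = remove, then re-add, so stale cached state cannot survive. */
static int model_reset_state(struct model_dev *d, unsigned int hw_command)
{
    assert(d != NULL);
    model_remove(d);
    return model_add_handlers(d, hw_command);
}
```

In this model, a device that cached a command value before the reset ends up
with whatever value the hardware reports afterwards.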




[RFC XEN PATCH v5 0/5] Support device passthrough when dom0 is PVH on Xen

2024-01-11 Thread Jiqian Chen
) to 
check whether the gsi has a corresponding mapping in dom0. It did not, so the 
call failed. See XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is 
PVH dom0, and the returned irq is 0.
3.2 It is possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH 
dom0, because the devices of PVH dom0 use MSI(-X) interrupts. However, the 
IO-APIC pin must be configured before it can be mapped into a domU.

Reason: After searching the code, I found that "map_pirq" and "register_gsi" 
are done in vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi (aka 
the IO-APIC pin) is unmasked in PVH dom0.
So both problems come down to the gsi of a passthrough device never being 
unmasked.

Solution: to solve these problems, the second kernel patch (xen/pvh: Unmask 
irq for passthrough device in PVH dom0) calls unmask_irq() when we assign a 
device for passthrough, so that passthrough devices get a gsi mapping on PVH 
dom0 and the gsi can be registered. This v2 patch differs from v1 (kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/,
kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.hu...@amd.com/): 
v1 performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0, 
which is unnecessary and may cause multiple registrations.

4. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in xen_pt_realize() to map a 
passthrough device’s gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a 
gsi rather than an irq, but qemu passes it the irq and treats that as the gsi. 
The irq is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in function 
xen_host_pci_device_get(). The gsi number, however, is not equal to the irq. 
When PVH dom0 allocates an irq for a gsi in acpi_register_gsi_ioapic(), the 
allocation is dynamic and first come, first served. If you debug the kernel 
code (see function __irq_alloc_descs), you will find that irq numbers are 
allocated in ascending order, but gsi numbers are not requested in ascending 
order: gsi 38 may be requested before gsi 28, so gsi 38 gets a smaller irq 
number than gsi 28, and therefore gsi != irq.
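
The effect described above can be reproduced with a small model. This is only
a sketch: struct irq_allocator, alloc_irq() and register_gsis() are
hypothetical names, not kernel code, and the starting irq number is an
arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dynamic irq allocator: every request gets
 * the lowest irq number that has not been handed out yet. */
struct irq_allocator {
    int next_irq;
};

/* One recorded gsi together with the irq it was given. */
struct gsi_irq {
    int gsi;
    int irq;
};

static int alloc_irq(struct irq_allocator *a)
{
    assert(a != NULL);
    return a->next_irq++;
}

/* Hand out irqs in request order: map[i].gsi is the i-th gsi requested,
 * and map[i].irq receives the irq allocated for it. */
static void register_gsis(struct irq_allocator *a, struct gsi_irq *map,
                          size_t n)
{
    for (size_t i = 0; i < n; i++)
        map[i].irq = alloc_irq(a);
}
```

If gsi 38 is requested before gsi 28, the first entry receives the smaller
irq, so neither irq matches its gsi and the ordering is inverted.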

Solution: we can record the relation between gsi and irq, and translate when 
userspace (qemu) wants the gsi. The third kernel patch (xen/privcmd: Add new 
syscall to get gsi from irq) records all the relations in 
acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and provides a 
syscall for userspace to get the gsi from an irq. The third xen patch (tools: 
Add new function to get gsi from irq) adds a new function, 
xc_physdev_gsi_from_irq(), that calls the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() 
will succeed. This v2 patch is the same as v1 (kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/).

The v2 qemu patch only changes an included header file; the rest is similar to 
v1 (qemu 
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/) 
and just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (5):
  xen/vpci: Clear all vpci status of device
  x86/pvh: Allow (un)map_pirq when dom0 is PVH
  x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  libxl: Use gsi instead of irq for mapping pirq
  domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

 tools/include/xenctrl.h      |  5 +
 tools/libs/ctrl/xc_domain.c  | 15 +
 tools/libs/light/libxl_pci.c | 41 
 xen/arch/x86/domctl.c        | 31 +++
 xen/arch/x86/hvm/hypercall.c |  9 
 xen/arch/x86/physdev.c       | 22 +++
 xen/drivers/pci/physdev.c    | 36 +++
 xen/drivers/vpci/vpci.c      | 10 +
 xen/include/public/domctl.h  |  9 
 xen/include/public/physdev.h |  7 ++
 xen/include/xen/vpci.h       |  6 ++
 xen/xsm/flask/hooks.c        |  1 +
 12 files changed, 188 insertions(+), 4 deletions(-)

-- 
2.34.1




[RFC QEMU PATCH v4 1/1] xen: Use gsi instead of irq for mapping pirq

2024-01-04 Thread Jiqian Chen
PVH dom0 uses the Linux local interrupt mechanism. When it
allocates an irq for a gsi, the allocation is dynamic and
first come, first served. irq numbers are allocated in
ascending order, but gsi numbers are not requested in
ascending order: gsi 38 may be requested before gsi 28, so
the irq number is not equal to the gsi number. When passing
through a device, qemu wants to use the gsi to map the pirq
(xen_pt_realize->xc_physdev_map_pirq), but the current code
gets the number from the file
/sys/bus/pci/devices//irq, so the mapping fails.

Add gsi to XenHostPCIDevice and use the gsi number read
from the gsi sysfs node if it exists.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 hw/xen/xen-host-pci-device.c | 7 +++
 hw/xen/xen-host-pci-device.h | 1 +
 hw/xen/xen_pt.c              | 6 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716a2..5be3279aa25b 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -370,6 +370,13 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
 }
 d->irq = v;
 
+xen_host_pci_get_dec_value(d, "gsi", &v, errp);
+if (*errp) {
+d->gsi = -1;
+} else {
+d->gsi = v;
+}
+
 xen_host_pci_get_hex_value(d, "class", &v, errp);
 if (*errp) {
 goto error;
diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h
index 4d8d34ecb024..74c552bb5548 100644
--- a/hw/xen/xen-host-pci-device.h
+++ b/hw/xen/xen-host-pci-device.h
@@ -27,6 +27,7 @@ typedef struct XenHostPCIDevice {
 uint16_t device_id;
 uint32_t class_code;
 int irq;
+int gsi;
 
 XenHostPCIIORegion io_regions[PCI_NUM_REGIONS - 1];
 XenHostPCIIORegion rom;
diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 36e6f93c372f..d448f3a17306 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -839,7 +839,11 @@ static void xen_pt_realize(PCIDevice *d, Error **errp)
 goto out;
 }
 
-machine_irq = s->real_device.irq;
+if (s->real_device.gsi < 0) {
+machine_irq = s->real_device.irq;
+} else {
+machine_irq = s->real_device.gsi;
+}
 if (machine_irq == 0) {
 XEN_PT_LOG(d, "machine irq is 0\n");
 cmd |= PCI_COMMAND_INTX_DISABLE;
-- 
2.34.1




[RFC QEMU PATCH v4 0/1] Support device passthrough when dom0 is PVH on Xen

2024-01-04 Thread Jiqian Chen
Hi All,
This is the v4 series to support passthrough on Xen when dom0 is PVH.
v3->v4 changes:
* Add gsi to struct XenHostPCIDevice and use the gsi number read from the gsi 
sysfs node if it exists; if there is no gsi sysfs node, still use irq.
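
The fallback described in this change can be sketched as follows. This is an
illustration under stated assumptions, not the QEMU code: parse_gsi_text()
and pick_machine_irq() are hypothetical helpers, and the sysfs attribute is
assumed to hold a decimal number.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse the decimal text of a sysfs-style attribute. Returns -1 when the
 * attribute is missing (NULL) or does not hold a valid non-negative number,
 * mirroring the "gsi = -1" marker used by the patch. */
static int parse_gsi_text(const char *text)
{
    char *end;
    long v;

    if (text == NULL)
        return -1;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || errno != 0 || v < 0 || v > INT_MAX)
        return -1;

    return (int)v;
}

/* Prefer the gsi when one is available, otherwise fall back to the irq. */
static int pick_machine_irq(int gsi, int irq)
{
    assert(irq >= 0);
    return gsi < 0 ? irq : gsi;
}
```

With an attribute reading "38" and an irq of 24, the gsi 38 is chosen; with
no attribute at all, the irq 24 is used as before.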

v2->v3 changes:
* Due to changes in the implementation of the second patch on the kernel side 
(which adds a new sysfs node for gsi instead of a new syscall), read the gsi 
number from the gsi sysfs node.

Below is the description from the v2 cover letter:
This patch is v2 of the implementation of passthrough when dom0 is PVH on 
Xen.
Issues we encountered:
1. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in xen_pt_realize() to map a 
passthrough device’s gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a 
gsi rather than an irq, but qemu passes it the irq and treats that as the gsi. 
The irq is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in function 
xen_host_pci_device_get(). The gsi number, however, is not equal to the irq. 
When PVH dom0 allocates an irq for a gsi in acpi_register_gsi_ioapic(), the 
allocation is dynamic and first come, first served. If you debug the kernel 
code (see function __irq_alloc_descs), you will find that irq numbers are 
allocated in ascending order, but gsi numbers are not requested in ascending 
order: gsi 38 may be requested before gsi 28, so gsi 38 gets a smaller irq 
number than gsi 28, and therefore gsi != irq.

Solution: we can record the relation between gsi and irq, and translate when 
userspace (qemu) wants the gsi. The third kernel patch (xen/privcmd: Add new 
syscall to get gsi from irq) records all the relations in 
acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and provides a 
syscall for userspace to get the gsi from an irq. The third xen patch (tools: 
Add new function to get gsi from irq) adds a new function, 
xc_physdev_gsi_from_irq(), that calls the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() 
will succeed.
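
The record-and-translate approach can be sketched as a small lookup table.
This is a minimal illustration, not the kernel implementation: struct
gsi_irq_table, table_record(), table_gsi_from_irq() and the fixed capacity
are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GSI_IRQ 64 /* assumed capacity, chosen only for this sketch */

/* Table of gsi/irq pairs recorded at registration time. */
struct gsi_irq_table {
    struct {
        int gsi;
        int irq;
    } entries[MAX_GSI_IRQ];
    size_t count;
};

/* Record one gsi/irq pair. Returns 0 on success, -1 if the table is full. */
static int table_record(struct gsi_irq_table *t, int gsi, int irq)
{
    assert(t != NULL);
    if (t->count >= MAX_GSI_IRQ)
        return -1;
    t->entries[t->count].gsi = gsi;
    t->entries[t->count].irq = irq;
    t->count++;
    return 0;
}

/* Translate an irq back to its gsi. Returns -1 if the irq was never
 * recorded. */
static int table_gsi_from_irq(const struct gsi_irq_table *t, int irq)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].irq == irq)
            return t->entries[i].gsi;
    return -1;
}
```

A linear scan is enough here because the number of gsi entries is small; a
caller that gets -1 back knows the irq has no recorded gsi.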

This v2 on the qemu side is the same as v1 (qemu 
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/): 
it just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (1):
  xen: Use gsi instead of irq for mapping pirq

 hw/xen/xen-host-pci-device.c | 7 +++
 hw/xen/xen-host-pci-device.h | 1 +
 hw/xen/xen_pt.c              | 6 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

-- 
2.34.1



