RE: [PATCH v2 2/4] iommu: Introduce device fault data
> From: Jacob Pan [mailto:jacob.jun@linux.intel.com]
> Sent: Thursday, June 6, 2019 1:38 AM
>
> On Wed, 5 Jun 2019 08:51:45 +
> "Tian, Kevin" wrote:
>
> > > From: Jacob Pan
> > > Sent: Tuesday, June 4, 2019 6:09 AM
> > >
> > > On Mon, 3 Jun 2019 15:57:47 +0100
> > > Jean-Philippe Brucker wrote:
> > >
> > > > +/**
> > > > + * struct iommu_fault_page_request - Page Request data
> > > > + * @flags: encodes whether the corresponding fields are valid and
> > > > + *         whether this is the last page in group
> > > > + *         (IOMMU_FAULT_PAGE_REQUEST_* values)
> > > > + * @pasid: Process Address Space ID
> > > > + * @grpid: Page Request Group Index
> > > > + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> > > > + * @addr: page address
> > > > + * @private_data: device-specific private information
> > > > + */
> > > > +struct iommu_fault_page_request {
> > > > +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
> > > > +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
> > > > +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
> > > > +	__u32	flags;
> > > > +	__u32	pasid;
> > > > +	__u32	grpid;
> > > > +	__u32	perm;
> > > > +	__u64	addr;
> > > > +	__u64	private_data[2];
> > > > +};
> > > > +
> > >
> > > Just a thought, for non-identity G-H PASID management. We could
> > > pass on guest PASID in PRQ to save a lookup in QEMU. In this case,
> > > QEMU allocate a GPASID for vIOMMU then a host PASID for pIOMMU.
> > > QEMU has a G->H lookup. When PRQ comes in to the pIOMMU with
> > > HPASID, IOMMU driver can retrieve GPASID from the bind data then
> > > report to the guest via VFIO. In this case QEMU does not need to
> > > do a H->G PASID lookup.
> > >
> > > Should we add a gpasid field here? or we can add a flag and field
> > > later, up to you.
> > >
> >
> > Can private_data serve this purpose? It's better not introducing
> > gpasid awareness within host IOMMU driver. It is just a user-level
> > data associated with a PASID when binding happens. Kernel doesn't
> > care the actual meaning, simply record it and then return back to
> > user space later upon device fault. Qemu interprets the meaning as
> > gpasid in its own context. otherwise usages may use it for other
> > purpose.
> >
> private_data was intended for device PRQ with private data, part of the
> VT-d PRQ descriptor. For vSVA, we can withhold private_data in the host
> then respond back when page response from the guest matches pending PRQ
> with the data withheld. But for in-kernel PRQ reporting, private data
> still might be passed on to any driver who wants to process the PRQ. So
> we can't re-purpose it.

Sure. I just used it as one example of how the field could be extended.

> But for in-kernel VDCM driver, it needs a lookup from guest PASID to
> host PASID. I thought you wanted to have IOMMU driver provide such
> service since the knowledge of H-G pasid can be established during
> bind_gpasid time. In that sense, we _do_ have gpasid awareness.
>
Yes, it makes sense. My original point was that the IOMMU driver itself
doesn't need to know the actual meaning of this field (so it could be
reused for a different purpose than gpasid), but you are right that an
in-kernel mdev driver needs to do the G-H translation anyway, so
explicitly defining the field looks reasonable.

Thanks
Kevin
Re: [PATCH v2 2/4] iommu: Introduce device fault data
On Wed, 5 Jun 2019 12:24:09 +0100
Jean-Philippe Brucker wrote:

> On 05/06/2019 09:51, Tian, Kevin wrote:
> >> From: Jacob Pan
> >> Sent: Tuesday, June 4, 2019 6:09 AM
> >>
> >> On Mon, 3 Jun 2019 15:57:47 +0100
> >> Jean-Philippe Brucker wrote:
> >>
> >>> +/**
> >>> + * struct iommu_fault_page_request - Page Request data
> >>> + * @flags: encodes whether the corresponding fields are valid and
> >>> + *         whether this is the last page in group
> >>> + *         (IOMMU_FAULT_PAGE_REQUEST_* values)
> >>> + * @pasid: Process Address Space ID
> >>> + * @grpid: Page Request Group Index
> >>> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> >>> + * @addr: page address
> >>> + * @private_data: device-specific private information
> >>> + */
> >>> +struct iommu_fault_page_request {
> >>> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
> >>> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
> >>> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
> >>> +	__u32	flags;
> >>> +	__u32	pasid;
> >>> +	__u32	grpid;
> >>> +	__u32	perm;
> >>> +	__u64	addr;
> >>> +	__u64	private_data[2];
> >>> +};
> >>> +
> >>
> >> Just a thought, for non-identity G-H PASID management. We could
> >> pass on guest PASID in PRQ to save a lookup in QEMU. In this case,
> >> QEMU allocate a GPASID for vIOMMU then a host PASID for pIOMMU.
> >> QEMU has a G->H lookup. When PRQ comes in to the pIOMMU with
> >> HPASID, IOMMU driver can retrieve GPASID from the bind data then
> >> report to the guest via VFIO. In this case QEMU does not need to
> >> do a H->G PASID lookup.
> >>
> >> Should we add a gpasid field here? or we can add a flag and field
> >> later, up to you.
> >>
> >
> > Can private_data serve this purpose?
>
> Isn't private_data already used for VT-d's Private Data field?
>
Yes, as part of the PRQ. Please see my explanation in the previous
email.

> > It's better not introducing
> > gpasid awareness within host IOMMU driver. It is just a user-level
> > data associated with a PASID when binding happens. Kernel doesn't
> > care the actual meaning, simply record it and then return back to
> > user space later upon device fault. Qemu interprets the meaning as
> > gpasid in its own context. otherwise usages may use it for other
> > purpose.
>
> Regarding a gpasid field I don't mind either way, but extending the
> iommu_fault structure later won't be completely straightforward so we
> could add some padding now.
>
> Userspace negotiate the iommu_fault struct format with VFIO, before
> allocating a circular buffer of N fault structures ().
> So adding new fields requires introducing a new ABI version and a
> struct iommu_fault_v2. That may be OK for disruptive changes, but
> just adding a new field indicated by a flag shouldn't have to be that
> complicated.
>
> How about setting the iommu_fault structure to 128 bytes?
>
> struct iommu_fault {
> 	__u32	type;
> 	__u32	padding;
> 	union {
> 		struct iommu_fault_unrecoverable event;
> 		struct iommu_fault_page_request prm;
> 		__u8 padding2[120];
> 	};
> };
>
> Given that @prm is currently 40 bytes and @event 32 bytes, the padding
> allows either of them to grow 10 new 64-bit fields (or 20 new 32-bit
> fields, which is still representable with new flags) before we have to
> upgrade the ABI version.
>
> A 4kB and a 64kB queue can hold respectively:
>
> * 85 and 1365 records when iommu_fault is 48 bytes (current format).
> * 64 and 1024 records when iommu_fault is 64 bytes (but allows to grow
>   only 2 new 64-bit fields).
> * 32 and 512 records when iommu_fault is 128 bytes.
>
> In comparison,
> * the SMMU event queue can hold 128 and 2048 events respectively at
>   those sizes (and is allowed to grow up to 524k entries)
> * the SMMU PRI queue can hold 256 and 4096 PR.
>
> But the SMMU queues have to be physically contiguous, whereas our
> fault queues are in userspace memory which is less expensive. So
> 128-byte records might be reasonable. What do you think?
>
I think that though 128 bytes is large enough for any future extension,
64B might be good enough, and it is a cache line. A PCI page request
message is only 16B :)

VT-d currently uses one 4K page for the PRQ, which holds 128 PRQ
descriptors. This can grow to 16K entries per the spec. That is per
IOMMU, whereas the user fault queue here is per device. So we do have
to be frugal about it, since it will have to support mdevs at per-PASID
granularity at some point. I have to look into Eric's patchset on how
he handles queue full in the producer.

If we go with a 128B iommu_fault and a 4KB queue (32 entries as in your
table), the VT-d PRQ size of 128 entries can potentially cause queue
full. We have to handle this VFIO queue full differently than the IOMMU
queue full, in that we only need to discard PRQs for one device
(whereas IOMMU queue full has to clear out all of them).

Anyway, I think 64B should be enough but 128B is fine too. We have to de
Re: [PATCH v2 2/4] iommu: Introduce device fault data
On Wed, 5 Jun 2019 08:51:45 +
"Tian, Kevin" wrote:

> > From: Jacob Pan
> > Sent: Tuesday, June 4, 2019 6:09 AM
> >
> > On Mon, 3 Jun 2019 15:57:47 +0100
> > Jean-Philippe Brucker wrote:
> >
> > > +/**
> > > + * struct iommu_fault_page_request - Page Request data
> > > + * @flags: encodes whether the corresponding fields are valid and
> > > + *         whether this is the last page in group
> > > + *         (IOMMU_FAULT_PAGE_REQUEST_* values)
> > > + * @pasid: Process Address Space ID
> > > + * @grpid: Page Request Group Index
> > > + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> > > + * @addr: page address
> > > + * @private_data: device-specific private information
> > > + */
> > > +struct iommu_fault_page_request {
> > > +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
> > > +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
> > > +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
> > > +	__u32	flags;
> > > +	__u32	pasid;
> > > +	__u32	grpid;
> > > +	__u32	perm;
> > > +	__u64	addr;
> > > +	__u64	private_data[2];
> > > +};
> > > +
> >
> > Just a thought, for non-identity G-H PASID management. We could
> > pass on guest PASID in PRQ to save a lookup in QEMU. In this case,
> > QEMU allocate a GPASID for vIOMMU then a host PASID for pIOMMU.
> > QEMU has a G->H lookup. When PRQ comes in to the pIOMMU with
> > HPASID, IOMMU driver can retrieve GPASID from the bind data then
> > report to the guest via VFIO. In this case QEMU does not need to
> > do a H->G PASID lookup.
> >
> > Should we add a gpasid field here? or we can add a flag and field
> > later, up to you.
> >
>
> Can private_data serve this purpose? It's better not introducing
> gpasid awareness within host IOMMU driver. It is just a user-level
> data associated with a PASID when binding happens. Kernel doesn't
> care the actual meaning, simply record it and then return back to
> user space later upon device fault. Qemu interprets the meaning as
> gpasid in its own context. otherwise usages may use it for other
> purpose.
>
private_data was intended for device PRQ with private data, part of the
VT-d PRQ descriptor. For vSVA, we can withhold private_data in the host
and then respond back when a page response from the guest matches the
pending PRQ with the data withheld. But for in-kernel PRQ reporting,
private data still might be passed on to any driver that wants to
process the PRQ. So we can't re-purpose it.

But for an in-kernel VDCM driver, it needs a lookup from guest PASID to
host PASID. I thought you wanted to have the IOMMU driver provide such
a service, since knowledge of the H-G pasid pair can be established at
bind_gpasid time. In that sense, we _do_ have gpasid awareness.

> Thanks
> Kevin

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 2/4] iommu: Introduce device fault data
On 05/06/2019 09:51, Tian, Kevin wrote:
>> From: Jacob Pan
>> Sent: Tuesday, June 4, 2019 6:09 AM
>>
>> On Mon, 3 Jun 2019 15:57:47 +0100
>> Jean-Philippe Brucker wrote:
>>
>>> +/**
>>> + * struct iommu_fault_page_request - Page Request data
>>> + * @flags: encodes whether the corresponding fields are valid and
>>> + *         whether this is the last page in group
>>> + *         (IOMMU_FAULT_PAGE_REQUEST_* values)
>>> + * @pasid: Process Address Space ID
>>> + * @grpid: Page Request Group Index
>>> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
>>> + * @addr: page address
>>> + * @private_data: device-specific private information
>>> + */
>>> +struct iommu_fault_page_request {
>>> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
>>> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
>>> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
>>> +	__u32	flags;
>>> +	__u32	pasid;
>>> +	__u32	grpid;
>>> +	__u32	perm;
>>> +	__u64	addr;
>>> +	__u64	private_data[2];
>>> +};
>>> +
>>
>> Just a thought, for non-identity G-H PASID management. We could pass
>> on guest PASID in PRQ to save a lookup in QEMU. In this case, QEMU
>> allocate a GPASID for vIOMMU then a host PASID for pIOMMU. QEMU has
>> a G->H lookup. When PRQ comes in to the pIOMMU with HPASID, IOMMU
>> driver can retrieve GPASID from the bind data then report to the
>> guest via VFIO. In this case QEMU does not need to do a H->G PASID
>> lookup.
>>
>> Should we add a gpasid field here? or we can add a flag and field
>> later, up to you.
>>
>
> Can private_data serve this purpose?

Isn't private_data already used for VT-d's Private Data field?

> It's better not introducing
> gpasid awareness within host IOMMU driver. It is just a user-level
> data associated with a PASID when binding happens. Kernel doesn't
> care the actual meaning, simply record it and then return back to user
> space later upon device fault. Qemu interprets the meaning as gpasid
> in its own context. otherwise usages may use it for other purpose.

Regarding a gpasid field I don't mind either way, but extending the
iommu_fault structure later won't be completely straightforward, so we
could add some padding now.

Userspace negotiates the iommu_fault struct format with VFIO, before
allocating a circular buffer of N fault structures
(https://lore.kernel.org/lkml/20190526161004.25232-26-eric.au...@redhat.com/).
So adding new fields requires introducing a new ABI version and a
struct iommu_fault_v2. That may be OK for disruptive changes, but just
adding a new field indicated by a flag shouldn't have to be that
complicated.

How about setting the iommu_fault structure to 128 bytes?

struct iommu_fault {
	__u32	type;
	__u32	padding;
	union {
		struct iommu_fault_unrecoverable event;
		struct iommu_fault_page_request prm;
		__u8 padding2[120];
	};
};

Given that @prm is currently 40 bytes and @event 32 bytes, the padding
allows either of them to grow 10 new 64-bit fields (or 20 new 32-bit
fields, which is still representable with new flags) before we have to
upgrade the ABI version.

A 4kB and a 64kB queue can hold respectively:

* 85 and 1365 records when iommu_fault is 48 bytes (current format).
* 64 and 1024 records when iommu_fault is 64 bytes (but allows to grow
  only 2 new 64-bit fields).
* 32 and 512 records when iommu_fault is 128 bytes.

In comparison,
* the SMMU event queue can hold 128 and 2048 events respectively at
  those sizes (and is allowed to grow up to 524k entries)
* the SMMU PRI queue can hold 256 and 4096 page requests.

But the SMMU queues have to be physically contiguous, whereas our fault
queues are in userspace memory, which is less expensive. So 128-byte
records might be reasonable. What do you think?

The iommu_fault_response (patch 4/4) is a bit easier to extend because
it's userspace->kernel, and userspace can just declare the size it's
using. I did add a version field in case we run out of flags or want to
change the whole thing, but I think I was being overly cautious and it
might just be a waste of space.

Thanks,
Jean
RE: [PATCH v2 2/4] iommu: Introduce device fault data
> From: Jacob Pan
> Sent: Tuesday, June 4, 2019 6:09 AM
>
> On Mon, 3 Jun 2019 15:57:47 +0100
> Jean-Philippe Brucker wrote:
>
> > +/**
> > + * struct iommu_fault_page_request - Page Request data
> > + * @flags: encodes whether the corresponding fields are valid and
> > + *         whether this is the last page in group
> > + *         (IOMMU_FAULT_PAGE_REQUEST_* values)
> > + * @pasid: Process Address Space ID
> > + * @grpid: Page Request Group Index
> > + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> > + * @addr: page address
> > + * @private_data: device-specific private information
> > + */
> > +struct iommu_fault_page_request {
> > +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
> > +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
> > +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
> > +	__u32	flags;
> > +	__u32	pasid;
> > +	__u32	grpid;
> > +	__u32	perm;
> > +	__u64	addr;
> > +	__u64	private_data[2];
> > +};
> > +
>
> Just a thought, for non-identity G-H PASID management. We could pass
> on guest PASID in PRQ to save a lookup in QEMU. In this case, QEMU
> allocate a GPASID for vIOMMU then a host PASID for pIOMMU. QEMU has a
> G->H lookup. When PRQ comes in to the pIOMMU with HPASID, IOMMU
> driver can retrieve GPASID from the bind data then report to the
> guest via VFIO. In this case QEMU does not need to do a H->G PASID
> lookup.
>
> Should we add a gpasid field here? or we can add a flag and field
> later, up to you.
>
Can private_data serve this purpose? It's better not to introduce
gpasid awareness within the host IOMMU driver. It is just user-level
data associated with a PASID when binding happens. The kernel doesn't
care about the actual meaning; it simply records it and then returns it
to user space later upon device fault. QEMU interprets the meaning as
gpasid in its own context. Otherwise, other usages may use it for a
different purpose.

Thanks
Kevin
Re: [PATCH v2 2/4] iommu: Introduce device fault data
On Mon, 3 Jun 2019 15:57:47 +0100
Jean-Philippe Brucker wrote:

> +/**
> + * struct iommu_fault_page_request - Page Request data
> + * @flags: encodes whether the corresponding fields are valid and
> + *         whether this is the last page in group
> + *         (IOMMU_FAULT_PAGE_REQUEST_* values)
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> + * @addr: page address
> + * @private_data: device-specific private information
> + */
> +struct iommu_fault_page_request {
> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
> +	__u32	flags;
> +	__u32	pasid;
> +	__u32	grpid;
> +	__u32	perm;
> +	__u64	addr;
> +	__u64	private_data[2];
> +};
> +

Just a thought, for non-identity G-H PASID management: we could pass on
the guest PASID in the PRQ to save a lookup in QEMU. In this case, QEMU
allocates a GPASID for the vIOMMU and a host PASID for the pIOMMU, so
QEMU has a G->H lookup. When a PRQ comes in to the pIOMMU with an
HPASID, the IOMMU driver can retrieve the GPASID from the bind data and
report it to the guest via VFIO. In this case QEMU does not need to do
an H->G PASID lookup.

Should we add a gpasid field here? Or we can add a flag and field
later, up to you.

Thanks,
Jacob
[PATCH v2 2/4] iommu: Introduce device fault data
From: Jacob Pan

Device faults detected by IOMMU can be reported outside the IOMMU
subsystem for further processing. This patch introduces a generic
device fault data structure.

The fault can be either an unrecoverable fault or a page request, also
referred to as a recoverable fault. We only care about non internal
faults that are likely to be reported to an external subsystem.

Signed-off-by: Jacob Pan
Signed-off-by: Jean-Philippe Brucker
Signed-off-by: Liu, Yi L
Signed-off-by: Ashok Raj
Signed-off-by: Eric Auger
---
 include/linux/iommu.h      |  39 +++
 include/uapi/linux/iommu.h | 118 ++++++++
 2 files changed, 157 insertions(+)
 create mode 100644 include/uapi/linux/iommu.h

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a815cf6f6f47..2b05056d5fa7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
@@ -49,6 +50,7 @@ struct device;
 struct iommu_domain;
 struct notifier_block;
 struct iommu_sva;
+struct iommu_fault_event;
 
 /* iommu fault flags */
 #define IOMMU_FAULT_READ	0x0
@@ -58,6 +60,7 @@ typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
 typedef int (*iommu_mm_exit_handler_t)(struct device *dev, struct iommu_sva *,
 				       void *);
+typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault *, void *);
 
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
@@ -301,6 +304,41 @@ struct iommu_device {
 	struct device *dev;
 };
 
+/**
+ * struct iommu_fault_event - Generic fault event
+ *
+ * Can represent recoverable faults such as a page requests or
+ * unrecoverable faults such as DMA or IRQ remapping faults.
+ *
+ * @fault: fault descriptor
+ */
+struct iommu_fault_event {
+	struct iommu_fault fault;
+};
+
+/**
+ * struct iommu_fault_param - per-device IOMMU fault data
+ * @handler: Callback function to handle IOMMU faults at device level
+ * @data: handler private data
+ */
+struct iommu_fault_param {
+	iommu_dev_fault_handler_t handler;
+	void *data;
+};
+
+/**
+ * struct iommu_param - collection of per-device IOMMU data
+ *
+ * @fault_param: IOMMU detected device fault reporting data
+ *
+ * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
+ *	struct iommu_group	*iommu_group;
+ *	struct iommu_fwspec	*iommu_fwspec;
+ */
+struct iommu_param {
+	struct iommu_fault_param *fault_param;
+};
+
 int iommu_device_register(struct iommu_device *iommu);
 void iommu_device_unregister(struct iommu_device *iommu);
 int iommu_device_sysfs_add(struct iommu_device *iommu,
@@ -504,6 +542,7 @@ struct iommu_ops {};
 struct iommu_group {};
 struct iommu_fwspec {};
 struct iommu_device {};
+struct iommu_fault_param {};
 
 static inline bool iommu_present(struct bus_type *bus)
 {
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
new file mode 100644
index ..796402174d6c
--- /dev/null
+++ b/include/uapi/linux/iommu.h
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * IOMMU user API definitions
+ */
+
+#ifndef _UAPI_IOMMU_H
+#define _UAPI_IOMMU_H
+
+#include
+
+#define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
+#define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
+#define IOMMU_FAULT_PERM_EXEC	(1 << 2) /* exec */
+#define IOMMU_FAULT_PERM_PRIV	(1 << 3) /* privileged */
+
+/* Generic fault types, can be expanded IRQ remapping fault */
+enum iommu_fault_type {
+	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
+	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
+};
+
+enum iommu_fault_reason {
+	IOMMU_FAULT_REASON_UNKNOWN = 0,
+
+	/* Could not access the PASID table (fetch caused external abort) */
+	IOMMU_FAULT_REASON_PASID_FETCH,
+
+	/* PASID entry is invalid or has configuration errors */
+	IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+
+	/*
+	 * PASID is out of range (e.g. exceeds the maximum PASID
+	 * supported by the IOMMU) or disabled.
+	 */
+	IOMMU_FAULT_REASON_PASID_INVALID,
+
+	/*
+	 * An external abort occurred fetching (or updating) a translation
+	 * table descriptor
+	 */
+	IOMMU_FAULT_REASON_WALK_EABT,
+
+	/*
+	 * Could not access the page table entry (Bad address),
+	 * actual translation fault
+	 */
+	IOMMU_FAULT_REASON_PTE_FETCH,
+
+	/* Protection flag check failed */
+	IOMMU_FAULT_REASON_PERMISSION,
+
+	/* access flag check failed */
+	IOMMU_FAULT_REASON_ACCESS,
+
+	/* Output address of a translation stage caused Address Size fault */
+	IOMMU_FAULT_REASON_OOR_ADDRESS,
+};
+
+/**
+ * struct iommu_fault_unrecoverabl