Re: [PATCH 10/18] iommu/vt-d: Add custom allocator for IOASID
On 19/04/2019 05:29, Jacob Pan wrote:
> If it is OK with you, I will squash my changes into your ioasid patch
> and address the review comments in the v2 of this set, OK?
> i.e.
> [PATCH 02/18] ioasid: Add custom IOASID allocator
> [PATCH 03/18] ioasid: Convert ioasid_idr to XArray

That's fine by me, although the "base" and "custom" patches are already
relatively large, so it might make sense to keep them separate.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 10/18] iommu/vt-d: Add custom allocator for IOASID
On Thu, 18 Apr 2019 16:36:02 +0100
Jean-Philippe Brucker wrote:

> On 16/04/2019 00:10, Jacob Pan wrote:
> [...]
> >> > +		/*
> >> > +		 * Register a custom ASID allocator if we are running
> >> > +		 * in a guest, the purpose is to have a system wide PASID
> >> > +		 * namespace among all PASID users.
> >> > +		 * Note that only one vIOMMU in each guest is supported.
> >>
> >> Why one vIOMMU per guest? This would prevent guests with multiple
> >> PCI domains aiui.
> >>
> > This is mainly for simplicity reasons. These are all virtual BDFs
> > anyway. As long as guest BDF can be mapped to a host BDF, it should
> > be sufficient, am I missing anything?
> >
> > From PASID allocation perspective, it is not tied to any PCI device
> > until bind call. We only need to track PASID ownership per guest.
> >
> > virtio-IOMMU spec does support multiple PCI domains but I am not
> > sure if that applies to all assigned devices, whether all assigned
> > devices are under the same domain. Perhaps Jean can help to clarify
> > how PASID allocation API looks like on virtio IOMMU.
>
> [Ugh, this is much longer than I hoped. In short I don't think
> multiple vIOMMUs is a problem, because the host uses the same
> allocator for all of them.]
>
I agreed, it is not an issue as far as PASID allocation is concerned.

> Yes there can be a single virtio-iommu instance for multiple PCI
> domains, or multiple instances each managing assigned devices. It's
> up to the hypervisor to decide on the topology.
>
> For Linux and QEMU I was assuming that choosing the vIOMMU used for
> PASID allocation isn't a big deal, since in the end they all use the
> same allocator in the host. It gets complicated when some vIOMMUs can
> be removed at runtime (unload the virtio-iommu module that was
> providing the PASID allocator, and then you can't allocate PASIDs for
> the VT-d instance anymore), so maybe limiting to one type of vIOMMU
> (don't mix VT-d and virtio-iommu in the same VM) is more reasonable.
I think you can deal with the hot removal of a vIOMMU by keeping
multiple allocators in a list. i.e. when the second vIOMMU registers an
allocator, instead of returning -EBUSY, we just keep it in a
back-pocket list. If the first vIOMMU is removed, the second one can be
popped out into action (and vice versa). Then we always have an
allocator.

> It's a bit more delicate from the virtio-iommu perspective. The
> interface is portable and I can't tie it down to the choices we're
> making for Linux and KVM. Having a system-wide PASID space is what we
> picked for Linux but the PCIe architecture allows for each device to
> have their own PASID space, and I suppose some hypervisors and guests
> might prefer implementing it that way.
>
> My plan for the moment is to implement global PASID allocation using
> one feature bit and two new requests, but leave space for a
> per-device PASID allocation, introduced with another feature bit if
> necessary. If it ever gets added, I expect the per-device allocation
> to be done during the bind request rather than with a separate
> PASID_ALLOC request.
>
> So currently I have a new feature bit and two commands:
>
> #define VIRTIO_IOMMU_F_PASID_ALLOC
> #define VIRTIO_IOMMU_T_ALLOC_PASID
> #define VIRTIO_IOMMU_T_FREE_PASID
>
> struct virtio_iommu_req_alloc_pasid {
> 	struct virtio_iommu_req_head	head;
> 	u32				reserved;
>
> 	/* Device-writeable */
> 	le32				pasid;
> 	struct virtio_iommu_req_tail	tail;
> };
>
> struct virtio_iommu_req_free_pasid {
> 	struct virtio_iommu_req_head	head;
> 	u32				reserved;
> 	le32				pasid;
>
> 	/* Device-writeable */
> 	struct virtio_iommu_req_tail	tail;
> };
>
> If the feature bit is offered it must be used, and the guest can only
> use PASIDs allocated via VIRTIO_IOMMU_T_ALLOC_PASID in its bind
> requests.
>
> The PASID space is global at the host scope.
> If multiple virtio-iommu devices in the VM offer the feature bit,
> then using either of their command queues to issue a
> VIRTIO_IOMMU_T_ALLOC_PASID and VIRTIO_IOMMU_T_FREE_PASID is
> equivalent. Another possibility is to require that only one of the
> virtio-iommu instances per VM offers the feature bit. I do prefer
> this option, but there is the vIOMMU removal problem mentioned above
> - which, with the first option, could be solved by keeping a list of
> PASID allocator functions rather than a single one.
>
> I'm considering adding a max_pasid field to
> virtio_iommu_req_alloc_pasid. If VIRTIO_IOMMU_T_ALLOC_PASID returns a
> random 20-bit value then a lot of space might be needed for storing
> PASID contexts (is that a real concern though? For internal data it
> can use a binary tree, and the guest is not in charge of hardware
> PASID tables here). If the guest is short on memory then it could
> benefit from a smaller number of PASID bits. That could either be
> globally configurable in the virtio-iommu config space, or using a
> max_pasid field in the VIRTIO_IOMMU_T_ALLOC_PASID request.
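Jacob's "back pocket" idea above — keep every registered custom allocator on a list, with the head entry active, and promote the next one when the active allocator is unregistered — could be sketched roughly as below. All names and the list handling are illustrative, not the actual ioasid API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy allocator descriptor; the real struct ioasid_allocator carries
 * alloc/free callbacks and private data. */
struct ioasid_allocator {
	unsigned int (*alloc)(unsigned int min, unsigned int max, void *data);
	void *pdata;
	struct ioasid_allocator *next;	/* simple singly-linked list */
};

static struct ioasid_allocator *allocator_list;

/* New allocators go to the back of the list instead of failing with
 * -EBUSY; the first registered allocator stays active. */
static void ioasid_register_allocator(struct ioasid_allocator *a)
{
	struct ioasid_allocator **p = &allocator_list;

	while (*p)
		p = &(*p)->next;
	a->next = NULL;
	*p = a;
}

/* Removing the active (head) allocator pops the next one into action. */
static void ioasid_unregister_allocator(struct ioasid_allocator *a)
{
	struct ioasid_allocator **p;

	for (p = &allocator_list; *p; p = &(*p)->next) {
		if (*p == a) {
			*p = a->next;
			return;
		}
	}
}

/* Head of the list is the allocator in use; NULL means fall back to
 * the default (XArray/IDR-based) allocator. */
static struct ioasid_allocator *ioasid_active_allocator(void)
{
	return allocator_list;
}
```

With this shape, a second vIOMMU registering its allocator is harmless, and hot removal of the first vIOMMU never leaves the system without an allocator while another is registered.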
Re: [PATCH 10/18] iommu/vt-d: Add custom allocator for IOASID
On 16/04/2019 00:10, Jacob Pan wrote:
[...]
>> > +		/*
>> > +		 * Register a custom ASID allocator if we are running
>> > +		 * in a guest, the purpose is to have a system wide PASID
>> > +		 * namespace among all PASID users.
>> > +		 * Note that only one vIOMMU in each guest is supported.
>>
>> Why one vIOMMU per guest? This would prevent guests with multiple PCI
>> domains aiui.
>>
> This is mainly for simplicity reasons. These are all virtual BDFs
> anyway. As long as guest BDF can be mapped to a host BDF, it should be
> sufficient, am I missing anything?
>
> From PASID allocation perspective, it is not tied to any PCI device
> until bind call. We only need to track PASID ownership per guest.
>
> virtio-IOMMU spec does support multiple PCI domains but I am not sure
> if that applies to all assigned devices, whether all assigned devices
> are under the same domain. Perhaps Jean can help to clarify how PASID
> allocation API looks like on virtio IOMMU.

[Ugh, this is much longer than I hoped. In short I don't think multiple
vIOMMUs is a problem, because the host uses the same allocator for all
of them.]

Yes there can be a single virtio-iommu instance for multiple PCI
domains, or multiple instances each managing assigned devices. It's up
to the hypervisor to decide on the topology.

For Linux and QEMU I was assuming that choosing the vIOMMU used for
PASID allocation isn't a big deal, since in the end they all use the
same allocator in the host. It gets complicated when some vIOMMUs can
be removed at runtime (unload the virtio-iommu module that was
providing the PASID allocator, and then you can't allocate PASIDs for
the VT-d instance anymore), so maybe limiting to one type of vIOMMU
(don't mix VT-d and virtio-iommu in the same VM) is more reasonable.

It's a bit more delicate from the virtio-iommu perspective. The
interface is portable and I can't tie it down to the choices we're
making for Linux and KVM.
Having a system-wide PASID space is what we picked for Linux but the
PCIe architecture allows for each device to have their own PASID space,
and I suppose some hypervisors and guests might prefer implementing it
that way.

My plan for the moment is to implement global PASID allocation using
one feature bit and two new requests, but leave space for a per-device
PASID allocation, introduced with another feature bit if necessary. If
it ever gets added, I expect the per-device allocation to be done
during the bind request rather than with a separate PASID_ALLOC
request.

So currently I have a new feature bit and two commands:

#define VIRTIO_IOMMU_F_PASID_ALLOC
#define VIRTIO_IOMMU_T_ALLOC_PASID
#define VIRTIO_IOMMU_T_FREE_PASID

struct virtio_iommu_req_alloc_pasid {
	struct virtio_iommu_req_head	head;
	u32				reserved;

	/* Device-writeable */
	le32				pasid;
	struct virtio_iommu_req_tail	tail;
};

struct virtio_iommu_req_free_pasid {
	struct virtio_iommu_req_head	head;
	u32				reserved;
	le32				pasid;

	/* Device-writeable */
	struct virtio_iommu_req_tail	tail;
};

If the feature bit is offered it must be used, and the guest can only
use PASIDs allocated via VIRTIO_IOMMU_T_ALLOC_PASID in its bind
requests.

The PASID space is global at the host scope. If multiple virtio-iommu
devices in the VM offer the feature bit, then using either of their
command queues to issue a VIRTIO_IOMMU_T_ALLOC_PASID and
VIRTIO_IOMMU_T_FREE_PASID is equivalent. Another possibility is to
require that only one of the virtio-iommu instances per VM offers the
feature bit. I do prefer this option, but there is the vIOMMU removal
problem mentioned above - which, with the first option, could be solved
by keeping a list of PASID allocator functions rather than a single
one.

I'm considering adding a max_pasid field to
virtio_iommu_req_alloc_pasid. If VIRTIO_IOMMU_T_ALLOC_PASID returns a
random 20-bit value then a lot of space might be needed for storing
PASID contexts (is that a real concern though?
For internal data it can use a binary tree, and the guest is not in
charge of hardware PASID tables here). If the guest is short on memory
then it could benefit from a smaller number of PASID bits. That could
either be globally configurable in the virtio-iommu config space, or
using a max_pasid field in the VIRTIO_IOMMU_T_ALLOC_PASID request. The
latter allows supporting devices with less than 20 PASID bits, though
we're hoping that no one will implement that.

Thanks,
Jean
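For illustration, the draft requests above can be rendered as compilable C. The head/tail layouts and the command value below are assumptions for the sketch — the email only defines the middle of the structs — and the "device" is faked in-process:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t le32;	/* little-endian on-the-wire field */

/* Assumed layouts: the real virtio-iommu head/tail have more fields. */
struct virtio_iommu_req_head {
	uint8_t type;
	uint8_t reserved[3];
};

struct virtio_iommu_req_tail {
	uint8_t status;		/* 0 = success */
	uint8_t reserved[3];
};

#define VIRTIO_IOMMU_T_ALLOC_PASID 0x40	/* placeholder value */

/* Layout from the email: device writes back pasid and tail. */
struct virtio_iommu_req_alloc_pasid {
	struct virtio_iommu_req_head head;
	uint32_t reserved;

	/* Device-writeable */
	le32 pasid;
	struct virtio_iommu_req_tail tail;
};

/* Stand-in for the device end of the command queue: allocates a PASID
 * from the host-global space and reports success. */
static void fake_device_handle(struct virtio_iommu_req_alloc_pasid *req)
{
	if (req->head.type == VIRTIO_IOMMU_T_ALLOC_PASID) {
		req->pasid = 42;	/* host-chosen 20-bit value */
		req->tail.status = 0;
	}
}
```

The guest fills in the head, queues the buffer, and reads the device-written `pasid` once the tail status indicates success — which is also why the device-writeable region is grouped at the end of each request.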
Re: [PATCH 10/18] iommu/vt-d: Add custom allocator for IOASID
On Mon, 15 Apr 2019 14:37:11 -0600
Alex Williamson wrote:

> On Mon, 8 Apr 2019 16:59:25 -0700
> Jacob Pan wrote:
>
> > When the VT-d driver runs in the guest, PASID allocation must be
> > performed via the virtual command interface. This patch registers a
> > custom IOASID allocator which takes precedence over the default
> > IDR-based allocator. The resulting IOASID allocation will always
> > come from the host. This ensures that the PASID namespace is
> > system-wide.
> >
> > Signed-off-by: Lu Baolu
> > Signed-off-by: Liu, Yi L
> > Signed-off-by: Jacob Pan
> > ---
> >  drivers/iommu/intel-iommu.c | 50 ++++++++++++++++++++++++++++++++++
> >  include/linux/intel-iommu.h |  1 +
> >  2 files changed, 51 insertions(+)
> >
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 28cb713..a38d774 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -4820,6 +4820,42 @@ static int __init platform_optin_force_iommu(void)
> >  	return 1;
> >  }
> >
> > +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> > +{
> > +	struct intel_iommu *iommu = data;
> > +	ioasid_t ioasid;
> > +
> > +	if (vcmd_alloc_pasid(iommu, &ioasid))
> > +		return INVALID_IOASID;
> > +	return ioasid;
>
> How does this honor min/max?
>
Sorry I missed this in my previous reply. The VT-d virtual command
interface always allocates PASIDs from the full 20-bit range. I think a
range check is missing here. Thanks for pointing this out.

> [...]
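The missing range check Jacob acknowledges could look roughly like the sketch below. This is a hedged illustration, not the eventual fix: `vcmd_alloc_pasid()` is stubbed with a counter here, since the real call issues a virtual command to the host, and the free-on-failure step is only noted in a comment:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int ioasid_t;
#define INVALID_IOASID ((ioasid_t)-1)

/* Stub for the virtual command: the host picks any 20-bit PASID and
 * cannot be constrained by the guest's [min, max]. */
static int vcmd_alloc_pasid_stub(ioasid_t *pasid)
{
	static ioasid_t next = 1;

	*pasid = next++;
	return 0;
}

static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
{
	ioasid_t ioasid;

	(void)data;	/* struct intel_iommu * in the real driver */
	if (vcmd_alloc_pasid_stub(&ioasid))
		return INVALID_IOASID;

	/* Honor the caller's range after the fact: the host already
	 * chose the value, so an out-of-range PASID must be released
	 * (vcmd_free_pasid() in the real driver) and the allocation
	 * reported as failed. */
	if (ioasid < min || ioasid > max)
		return INVALID_IOASID;

	return ioasid;
}
```

Because the virtual command interface cannot be told the range up front, post-allocation checking (or repeated retry) is about the only option available to the guest-side allocator.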
Re: [PATCH 10/18] iommu/vt-d: Add custom allocator for IOASID
On Mon, 15 Apr 2019 14:37:11 -0600
Alex Williamson wrote:

> On Mon, 8 Apr 2019 16:59:25 -0700
> Jacob Pan wrote:
>
> > When the VT-d driver runs in the guest, PASID allocation must be
> > performed via the virtual command interface. This patch registers a
> > custom IOASID allocator which takes precedence over the default
> > IDR-based allocator. The resulting IOASID allocation will always
> > come from the host. This ensures that the PASID namespace is
> > system-wide.
> >
> > Signed-off-by: Lu Baolu
> > Signed-off-by: Liu, Yi L
> > Signed-off-by: Jacob Pan
> > ---
> >  drivers/iommu/intel-iommu.c | 50 ++++++++++++++++++++++++++++++++++
> >  include/linux/intel-iommu.h |  1 +
> >  2 files changed, 51 insertions(+)
> >
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 28cb713..a38d774 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -4820,6 +4820,42 @@ static int __init platform_optin_force_iommu(void)
> >  	return 1;
> >  }
> >
> > +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> > +{
> > +	struct intel_iommu *iommu = data;
> > +	ioasid_t ioasid;
> > +
> > +	if (vcmd_alloc_pasid(iommu, &ioasid))
> > +		return INVALID_IOASID;
> > +	return ioasid;
>
> How does this honor min/max?
>
> > +}
> > +
> > +static int intel_ioasid_free(ioasid_t ioasid, void *data)
> > +{
> > +	struct iommu_pasid_alloc_info *svm;
> > +	struct intel_iommu *iommu = data;
> > +
> > +	if (!iommu || !cap_caching_mode(iommu->cap))
> > +		return -EINVAL;
> > +	/*
> > +	 * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
> > +	 * We can only free the PASID when all the devices are unbound.
> > +	 */
> > +	svm = ioasid_find(NULL, ioasid, NULL);
> > +	if (!svm) {
> > +		pr_warn("Freeing unbound IOASID %d\n", ioasid);
> > +		return -EBUSY;
> > +	}
> > +	vcmd_free_pasid(iommu, ioasid);
> > +
> > +	return 0;
> > +}
> > +
> > +static struct ioasid_allocator intel_iommu_ioasid_allocator = {
> > +	.alloc = intel_ioasid_alloc,
> > +	.free = intel_ioasid_free,
> > +};
> > +
> >  int __init intel_iommu_init(void)
> >  {
> >  	int ret = -ENODEV;
> > @@ -4921,6 +4957,20 @@ int __init intel_iommu_init(void)
> >  			       "%s", iommu->name);
> >  		iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
> >  		iommu_device_register(&iommu->iommu);
> > +		if (cap_caching_mode(iommu->cap) && sm_supported(iommu)) {
> > +			/*
> > +			 * Register a custom ASID allocator if we are running
> > +			 * in a guest, the purpose is to have a system wide
> > +			 * PASID namespace among all PASID users.
> > +			 * Note that only one vIOMMU in each guest is supported.
>
> Why one vIOMMU per guest? This would prevent guests with multiple PCI
> domains aiui.
>
This is mainly for simplicity reasons. These are all virtual BDFs
anyway. As long as guest BDF can be mapped to a host BDF, it should be
sufficient, am I missing anything?

From PASID allocation perspective, it is not tied to any PCI device
until bind call. We only need to track PASID ownership per guest.

virtio-IOMMU spec does support multiple PCI domains but I am not sure
if that applies to all assigned devices, whether all assigned devices
are under the same domain. Perhaps Jean can help to clarify how PASID
allocation API looks like on virtio IOMMU.
> > +			 */
> > +			intel_iommu_ioasid_allocator.pdata = (void *)iommu;
> > +			ret = ioasid_set_allocator(&intel_iommu_ioasid_allocator);
> > +			if (ret == -EBUSY) {
> > +				pr_info("Custom IOASID allocator already registered\n");
> > +				break;
> > +			}
> > +		}
> >  	}
> >
> >  	bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index b29c85c..bc09d80 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -31,6 +31,7 @@
> >  #include
> >  #include
> >  #include
> > +#include <linux/ioasid.h>
> >
> >  #include
> >  #include
Re: [PATCH 10/18] iommu/vt-d: Add custom allocator for IOASID
On Mon, 8 Apr 2019 16:59:25 -0700
Jacob Pan wrote:

> When the VT-d driver runs in the guest, PASID allocation must be
> performed via the virtual command interface. This patch registers a
> custom IOASID allocator which takes precedence over the default
> IDR-based allocator. The resulting IOASID allocation will always come
> from the host. This ensures that the PASID namespace is system-wide.
>
> Signed-off-by: Lu Baolu
> Signed-off-by: Liu, Yi L
> Signed-off-by: Jacob Pan
> ---
>  drivers/iommu/intel-iommu.c | 50 ++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h |  1 +
>  2 files changed, 51 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 28cb713..a38d774 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -4820,6 +4820,42 @@ static int __init platform_optin_force_iommu(void)
>  	return 1;
>  }
>
> +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> +{
> +	struct intel_iommu *iommu = data;
> +	ioasid_t ioasid;
> +
> +	if (vcmd_alloc_pasid(iommu, &ioasid))
> +		return INVALID_IOASID;
> +	return ioasid;

How does this honor min/max?

> +}
> +
> +static int intel_ioasid_free(ioasid_t ioasid, void *data)
> +{
> +	struct iommu_pasid_alloc_info *svm;
> +	struct intel_iommu *iommu = data;
> +
> +	if (!iommu || !cap_caching_mode(iommu->cap))
> +		return -EINVAL;
> +	/*
> +	 * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
> +	 * We can only free the PASID when all the devices are unbound.
> +	 */
> +	svm = ioasid_find(NULL, ioasid, NULL);
> +	if (!svm) {
> +		pr_warn("Freeing unbound IOASID %d\n", ioasid);
> +		return -EBUSY;
> +	}
> +	vcmd_free_pasid(iommu, ioasid);
> +
> +	return 0;
> +}
> +
> +static struct ioasid_allocator intel_iommu_ioasid_allocator = {
> +	.alloc = intel_ioasid_alloc,
> +	.free = intel_ioasid_free,
> +};
> +
>  int __init intel_iommu_init(void)
>  {
>  	int ret = -ENODEV;
> @@ -4921,6 +4957,20 @@ int __init intel_iommu_init(void)
>  		       "%s", iommu->name);
>  		iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
>  		iommu_device_register(&iommu->iommu);
> +		if (cap_caching_mode(iommu->cap) && sm_supported(iommu)) {
> +			/*
> +			 * Register a custom ASID allocator if we are running
> +			 * in a guest, the purpose is to have a system wide
> +			 * PASID namespace among all PASID users.
> +			 * Note that only one vIOMMU in each guest is supported.

Why one vIOMMU per guest? This would prevent guests with multiple PCI
domains aiui.

> +			 */
> +			intel_iommu_ioasid_allocator.pdata = (void *)iommu;
> +			ret = ioasid_set_allocator(&intel_iommu_ioasid_allocator);
> +			if (ret == -EBUSY) {
> +				pr_info("Custom IOASID allocator already registered\n");
> +				break;
> +			}
> +		}
>  	}
>
>  	bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index b29c85c..bc09d80 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -31,6 +31,7 @@
>  #include
>  #include
>  #include
> +#include <linux/ioasid.h>
>
>  #include
>  #include
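The `ioasid_find()` guard in `intel_ioasid_free()` above — look the IOASID up before issuing the free command, and refuse to free one that was never bound — can be illustrated with a toy lookup table standing in for the real ioasid store. All names here are stand-ins, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IOASID 64
#define TOY_EBUSY 16

/* Toy table: entry holds the owner's private data, NULL = not bound. */
static void *ioasid_table[MAX_IOASID];

/* Stand-in for ioasid_find(): returns the private data, or NULL. */
static void *toy_ioasid_find(unsigned int ioasid)
{
	return ioasid < MAX_IOASID ? ioasid_table[ioasid] : NULL;
}

static int toy_ioasid_free(unsigned int ioasid)
{
	/* Refuse to free an IOASID with no private data: either it was
	 * never allocated or devices are already unbound.  Ownership
	 * checks proper belong to the upper layer (e.g. VFIO). */
	if (!toy_ioasid_find(ioasid))
		return -TOY_EBUSY;	/* -EBUSY, as in the patch */

	ioasid_table[ioasid] = NULL;	/* vcmd_free_pasid() in the driver */
	return 0;
}
```

The guard keeps a buggy or malicious caller from releasing a host PASID out from under a device that is still bound to it; a double free simply fails with -EBUSY.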