Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support
On 2022/5/24 21:44, Jason Gunthorpe wrote:

+{
+        struct iommu_sva_domain *sva_domain;
+        struct iommu_domain *domain;
+
+        if (!bus->iommu_ops || !bus->iommu_ops->sva_domain_ops)
+                return ERR_PTR(-ENODEV);
+
+        sva_domain = kzalloc(sizeof(*sva_domain), GFP_KERNEL);
+        if (!sva_domain)
+                return ERR_PTR(-ENOMEM);
+
+        mmgrab(mm);
+        sva_domain->mm = mm;
+
+        domain = &sva_domain->domain;
+        domain->type = IOMMU_DOMAIN_SVA;
+        domain->ops = bus->iommu_ops->sva_domain_ops;
+
+        return domain;
+}
+
+void iommu_sva_free_domain(struct iommu_domain *domain)
+{
+        struct iommu_sva_domain *sva_domain = to_sva_domain(domain);
+
+        mmdrop(sva_domain->mm);
+        kfree(sva_domain);
+}

No callback to the driver?

Should do this in the next version. This version added an SVA-specific
iommu_domain_ops pointer in iommu_ops, which is not the right way to go.

+int iommu_sva_set_domain(struct iommu_domain *domain, struct device *dev,
+                         ioasid_t pasid)
+{

Why does this function exist? Just call iommu_set_device_pasid().

Yes, agreed.

+int iommu_set_device_pasid(struct iommu_domain *domain, struct device *dev,
+                           ioasid_t pasid)
+{

Here you can continue to use attach/detach language, as at this API level
we expect strict pairing.

Sure.

+void iommu_block_device_pasid(struct iommu_domain *domain, struct device *dev,
+                              ioasid_t pasid)
+{
+        struct iommu_group *group = iommu_group_get(dev);
+
+        mutex_lock(&group->mutex);
+        domain->ops->block_dev_pasid(domain, dev, pasid);
+        xa_erase(&group->pasid_array, pasid);
+        mutex_unlock(&group->mutex);

Should be the blocking domain.

As we discussed, we should change the above to the blocking domain once
blocking domain support lands in at least the Intel and arm-smmu-v3
drivers. I have started the work for the Intel driver.

Best regards,
baolu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support
Hi Jason,

On 2022/5/24 21:44, Jason Gunthorpe wrote:

diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 106506143896..210c376f6043 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -69,3 +69,51 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid)
         return ioasid_find(&iommu_sva_pasid, pasid, __mmget_not_zero);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_find);
+
+/*
+ * IOMMU SVA driver-oriented interfaces
+ */
+struct iommu_domain *
+iommu_sva_alloc_domain(struct bus_type *bus, struct mm_struct *mm)

This should return the proper type

Can you please elaborate a bit on "return the proper type"? Did you mean
return iommu_sva_domain instead? That's a wrapper of iommu_domain and is
only for iommu internal usage.

Best regards,
baolu
Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support
On 2022/5/24 17:39, Tian, Kevin wrote:

From: Lu Baolu
Sent: Thursday, May 19, 2022 3:21 PM

The iommu_sva_domain represents a hardware pagetable that the IOMMU
hardware could use for SVA translation. This adds some infrastructure to
support SVA domain in the iommu common layer. It includes:

- Add a new struct iommu_sva_domain and new IOMMU_DOMAIN_SVA domain type.
- Add a new domain ops pointer in iommu_ops. The IOMMU drivers that
  support SVA should provide the callbacks.
- Add helpers to allocate and free an SVA domain.
- Add helpers to set an SVA domain to a device and the reverse operation.

Some buses, like PCI, route packets without considering the PASID value.
Thus a DMA target address with PASID might be treated as P2P if the
address falls into the MMIO BAR of other devices in the group. To make
things simple, the attach/detach interfaces only apply to devices
belonging to singleton groups, and the singleton is immutable in the
fabric, i.e. not affected by hotplug.

The iommu_set/block_device_pasid() can be used for other purposes, such
as kernel DMA with PASID, mediated devices, etc. Hence, it is put in
iommu.c.

usually we have 'set/clear' pair or 'allow/block'. Having 'set' paired
with 'block' doesn't read very clearly.

Yes. Let's still use the attach/detach semantics.

+static bool device_group_immutable_singleton(struct device *dev)
+{
+        struct iommu_group *group = iommu_group_get(dev);

what about passing group as the parameter since the caller will get the
group again right after calling this function? In that case the function
could be renamed as:

        iommu_group_immutable_singleton()

or be shorter:

        iommu_group_fixed_singleton()

Fair enough. I will tune it as below:

+static bool iommu_group_immutable_singleton(struct iommu_group *group)
+{
+        int count;
+
+        mutex_lock(&group->mutex);
+        count = iommu_group_device_count(group);
+        mutex_unlock(&group->mutex);
+
+        if (count != 1)
+                return false;
+
+        /*
+         * The PCI device could be considered to be fully isolated if all
+         * devices on the path from the device to the host-PCI bridge are
+         * protected from peer-to-peer DMA by ACS.
+         */
+        if (dev_is_pci(dev))
+                return pci_acs_path_enabled(to_pci_dev(dev), NULL,
+                                            REQ_ACS_FLAGS);
+
+        /*
+         * Otherwise, the device came from DT/ACPI; assume it is static
+         * and then singleton can be known from the device count in the
+         * group.
+         */
+        return true;
+}

+        int count;
+
+        if (!group)
+                return false;
+
+        mutex_lock(&group->mutex);
+        count = iommu_group_device_count(group);
+        mutex_unlock(&group->mutex);
+        iommu_group_put(group);
+
+        if (count != 1)
+                return false;

For non-PCI devices the above doesn't check anything against immutable.
Please add some comment to explain why doing so is correct.

Yes, as the above code shows.

+
+        /*
+         * The PCI device could be considered to be fully isolated if all
+         * devices on the path from the device to the host-PCI bridge are
+         * protected from peer-to-peer DMA by ACS.
+         */
+        if (dev_is_pci(dev))
+                return pci_acs_path_enabled(to_pci_dev(dev), NULL,
+                                            REQ_ACS_FLAGS);
+
+        return true;
+}
+

Best regards,
baolu
Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support
On 2022/5/25 08:44, Tian, Kevin wrote:

From: Jason Gunthorpe
Sent: Tuesday, May 24, 2022 9:39 PM

On Tue, May 24, 2022 at 09:39:52AM +0000, Tian, Kevin wrote:

From: Lu Baolu
Sent: Thursday, May 19, 2022 3:21 PM

The iommu_sva_domain represents a hardware pagetable that the IOMMU
hardware could use for SVA translation. This adds some infrastructure to
support SVA domain in the iommu common layer. It includes:

- Add a new struct iommu_sva_domain and new IOMMU_DOMAIN_SVA domain type.
- Add a new domain ops pointer in iommu_ops. The IOMMU drivers that
  support SVA should provide the callbacks.
- Add helpers to allocate and free an SVA domain.
- Add helpers to set an SVA domain to a device and the reverse operation.

Some buses, like PCI, route packets without considering the PASID value.
Thus a DMA target address with PASID might be treated as P2P if the
address falls into the MMIO BAR of other devices in the group. To make
things simple, the attach/detach interfaces only apply to devices
belonging to singleton groups, and the singleton is immutable in the
fabric, i.e. not affected by hotplug.

The iommu_set/block_device_pasid() can be used for other purposes, such
as kernel DMA with PASID, mediated devices, etc. Hence, it is put in
iommu.c.

usually we have 'set/clear' pair or 'allow/block'. Having 'set' paired
with 'block' doesn't read very clearly.

I thought we agreed we'd use the blocking domain for this? Why did it
go back to an op?

Probably it's based on the following discussion:

https://lore.kernel.org/all/c8492b29-bc27-ae12-d5c4-9fbbc797e...@linux.intel.com/

--
FWIW from my point of view I'm happy with having a .detach_dev_pasid op
meaning implicitly-blocked access for now.

If this is the path then let's not call it attach/detach please.
'set_dev_pasid' and 'set_dev_blocking_pasid' are clearer names.

Yes. Learning from the above discussion, we are about to implement
set_dev_pasid and the blocking domain in parallel. We will convert all
the callback names to set_dev and set_dev_pasid after blocking domain
support is merged.
--

It looks like Baolu chose this path and plans to use the blocking
domain later.

Yes. I have already started to implement the blocking domain in the
Intel driver. With it as an example, we can extend it to other possible
IOMMU drivers.

Best regards,
baolu
Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support
On 2022/5/24 17:44, Tian, Kevin wrote:

From: Baolu Lu
Sent: Monday, May 23, 2022 3:13 PM

@@ -254,6 +259,7 @@ struct iommu_ops {
         int (*def_domain_type)(struct device *dev);
         const struct iommu_domain_ops *default_domain_ops;
+        const struct iommu_domain_ops *sva_domain_ops;

Per Joerg's comment in another thread,

https://lore.kernel.org/linux-iommu/yodvj7ervpidw...@8bytes.org/

adding a sva_domain_ops here is not the right way to go. If no
objection, I will make the sva domain go through the generic
domain_alloc/free() callbacks in the next version.

suppose it's just back to what v1-v6 did, which all registered the ops
in the domain_alloc() callback?

Yes.

Best regards,
baolu
Re: [PATCH v7 01/10] iommu: Add pasids field in struct iommu_device
On 2022/5/25 10:03, Baolu Lu wrote:

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 4de960834a1b..1c3cf267934d 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1126,6 +1126,10 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
         raw_spin_lock_init(&iommu->register_lock);
+        /* Supports full 20-bit PASID in scalable mode. */
+        if (ecap_pasid(iommu->ecap))
+                iommu->iommu.pasids = 1UL << 20;
+

supported pasid bits is reported by ecap_pss(). I don't think we should
assume 20 bits here.

Yes. I overlooked this. Thanks for the reminder.

Another thing I need to improve is that scalable mode could be disabled.
This field should be 0 in that case. I will change the above to:

+        /*
+         * A value of N in the PSS field of the ECAP register indicates
+         * hardware supports PASID field of N+1 bits.
+         */
+        if (pasid_supported(iommu))
+                iommu->iommu.max_pasids = 2UL << ecap_pss(iommu->ecap);

Best regards,
baolu
Re: [PATCH v7 01/10] iommu: Add pasids field in struct iommu_device
Hi Kevin,

Thank you for reviewing my patches.

On 2022/5/24 17:24, Tian, Kevin wrote:

From: Lu Baolu
Sent: Thursday, May 19, 2022 3:21 PM

Use this field to keep the number of supported PASIDs that an IOMMU
hardware is able to support. This is a generic attribute of an IOMMU
and lifting it into the per-IOMMU device structure makes it possible
to allocate a PASID for a device without calls into the IOMMU drivers.
Any iommu driver which supports PASID related features should set this
field before enabling them on the devices.

Signed-off-by: Lu Baolu
Reviewed-by: Jean-Philippe Brucker
---
 include/linux/iommu.h                       | 2 ++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
 drivers/iommu/intel/dmar.c                  | 4
 3 files changed, 7 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e1afe169549..da423e87f248 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -318,12 +318,14 @@ struct iommu_domain_ops {
  * @list: Used by the iommu-core to keep a list of registered iommus
  * @ops: iommu-ops for talking to this iommu
  * @dev: struct device for sysfs handling
+ * @pasids: number of supported PASIDs
  */
 struct iommu_device {
         struct list_head list;
         const struct iommu_ops *ops;
         struct fwnode_handle *fwnode;
         struct device *dev;
+        u32 pasids;

max_pasid or nr_pasids?

max_pasid looks better.

 };

 /**
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 88817a3376ef..6e2cd082c670 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3546,6 +3546,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
         /* SID/SSID sizes */
         smmu->ssid_bits = FIELD_GET(IDR1_SSIDSIZE, reg);
         smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);
+        smmu->iommu.pasids = smmu->ssid_bits;
         /*
          * If the SMMU supports fewer bits than would fill a single L2 stream
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 4de960834a1b..1c3cf267934d 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1126,6 +1126,10 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
         raw_spin_lock_init(&iommu->register_lock);
+        /* Supports full 20-bit PASID in scalable mode. */
+        if (ecap_pasid(iommu->ecap))
+                iommu->iommu.pasids = 1UL << 20;
+

supported pasid bits is reported by ecap_pss(). I don't think we should
assume 20 bits here.

Yes. I overlooked this. Thanks for the reminder.

Another thing I need to improve is that scalable mode could be disabled.
This field should be 0 in that case.

         /*
          * This is only for hotplug; at boot time intel_iommu_enabled won't
          * be set yet. When intel_iommu_init() runs, it registers the units
--
2.25.1

Best regards,
baolu
Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add
Hi Joerg,

On 2022/5/21 08:21, Yian Chen wrote:

Notifier calling chain uses priority to determine the execution order of
the notifiers or listeners registered to the chain. PCI bus device hot
add utilizes the notification mechanism.

The current code sets low priority (INT_MIN) to Intel
dmar_pci_bus_notifier and postpones DMAR decoding after adding the new
device into the IOMMU. The result is that the struct device pointer
cannot be found in the DRHD search for the new device's DMAR/IOMMU.
Subsequently, the device is put under the "catch-all" IOMMU instead of
the correct one. This could cause a system hang when a device TLB
invalidation is sent to the wrong IOMMU. Invalidation timeout errors and
hard lockups have been observed, and data inconsistency/crash may occur
as well.

This patch fixes the issue by setting a positive priority (1) for
dmar_pci_bus_notifier, while the priority of the IOMMU bus notifier uses
the default value (0). Therefore DMAR decoding will happen in advance of
the DRHD search for a new device, so the correct IOMMU is found.

Following is a 2-step example that triggers the bug by simulating PCI
device hot add behavior on an Intel Sapphire Rapids server.

echo 1 > /sys/bus/pci/devices/:6a:01.0/remove
echo 1 > /sys/bus/pci/rescan

Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
Cc: sta...@vger.kernel.org # v3.15+
Reported-by: Zhang, Bernice
Signed-off-by: Jacob Pan
Signed-off-by: Yian Chen
---
This is a quick fix for the bug reported. Intel internally evaluated
another, redesigned solution that eliminates the dmar PCI bus notifier
to simplify the workflow of PCI hotplug and improve its runtime
efficiency. Considering that the fix could apply to downstream kernels,
and that the complexity of a PCI hotplug workflow change may
significantly increase the engineering effort to downstream the patch,
the choice is to submit this simple patch to help the deployment of this
bug fix.

Yian has been working on using the IOMMU bus notifier to solve this
problem. It turns out that due to the following facts, we need to
refactor the IOMMU core and Intel DMAR code:

- Interrupt remapping also requires Intel DMAR code. Therefore, when the
  IOMMU is not enabled, the PCI bus notifier in DMAR is still required.
- The IOMMU PCI bus notifier calls .probe_device(), which lacks the
  information about hot-add versus static boot.

Considering that the problem described here is serious, because users
can easily damage the system by writing sysfs files on some platforms,
we need a quick fix for both upstream and stable kernels. The
refactoring code will be discussed in a separate series.

How do you like it? If you agree, I can queue it in my next pull request
for fixes.

Best regards,
baolu

---
 drivers/iommu/intel/dmar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 4de960834a1b..497c5bd95caf 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -383,7 +383,7 @@ static int dmar_pci_bus_notifier(struct notifier_block *nb,
 static struct notifier_block dmar_pci_bus_nb = {
         .notifier_call = dmar_pci_bus_notifier,
-        .priority = INT_MIN,
+        .priority = 1,
 };

 static struct dmar_drhd_unit *
Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support
On 2022/5/19 15:20, Lu Baolu wrote:

The iommu_sva_domain represents a hardware pagetable that the IOMMU
hardware could use for SVA translation. This adds some infrastructure to
support SVA domain in the iommu common layer. It includes:

- Add a new struct iommu_sva_domain and new IOMMU_DOMAIN_SVA domain type.
- Add a new domain ops pointer in iommu_ops. The IOMMU drivers that
  support SVA should provide the callbacks.
- Add helpers to allocate and free an SVA domain.
- Add helpers to set an SVA domain to a device and the reverse operation.

Some buses, like PCI, route packets without considering the PASID value.
Thus a DMA target address with PASID might be treated as P2P if the
address falls into the MMIO BAR of other devices in the group. To make
things simple, the attach/detach interfaces only apply to devices
belonging to singleton groups, and the singleton is immutable in the
fabric, i.e. not affected by hotplug.

The iommu_set/block_device_pasid() can be used for other purposes, such
as kernel DMA with PASID, mediated devices, etc. Hence, it is put in
iommu.c.

Suggested-by: Jean-Philippe Brucker
Suggested-by: Jason Gunthorpe
Signed-off-by: Lu Baolu
---
 include/linux/iommu.h         | 51 +
 drivers/iommu/iommu-sva-lib.h | 15
 drivers/iommu/iommu-sva-lib.c | 48 +++
 drivers/iommu/iommu.c         | 71 +++
 4 files changed, 185 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0c358b7c583b..e8cf82d46ce1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -64,6 +64,9 @@ struct iommu_domain_geometry {
 #define __IOMMU_DOMAIN_PT       (1U << 2)  /* Domain is identity mapped */
 #define __IOMMU_DOMAIN_DMA_FQ   (1U << 3)  /* DMA-API uses flush queue */
+#define __IOMMU_DOMAIN_SHARED   (1U << 4)  /* Page table shared from CPU */
+#define __IOMMU_DOMAIN_HOST_VA  (1U << 5)  /* Host CPU virtual address */
+
 /*
  * This are the possible domain-types
  *
@@ -86,6 +89,8 @@ struct iommu_domain_geometry {
 #define IOMMU_DOMAIN_DMA_FQ     (__IOMMU_DOMAIN_PAGING |       \
                                  __IOMMU_DOMAIN_DMA_API |      \
                                  __IOMMU_DOMAIN_DMA_FQ)
+#define IOMMU_DOMAIN_SVA        (__IOMMU_DOMAIN_SHARED |       \
+                                 __IOMMU_DOMAIN_HOST_VA)

 struct iommu_domain {
         unsigned type;
@@ -254,6 +259,7 @@ struct iommu_ops {
         int (*def_domain_type)(struct device *dev);
         const struct iommu_domain_ops *default_domain_ops;
+        const struct iommu_domain_ops *sva_domain_ops;

Per Joerg's comment in another thread,

https://lore.kernel.org/linux-iommu/yodvj7ervpidw...@8bytes.org/

adding a sva_domain_ops here is not the right way to go. If no
objection, I will make the sva domain go through the generic
domain_alloc/free() callbacks in the next version.

Best regards,
baolu
Re: [PATCH v7 06/10] iommu/sva: Refactoring iommu_sva_bind/unbind_device()
On 2022/5/20 19:28, Jean-Philippe Brucker wrote:

On Fri, May 20, 2022 at 02:38:12PM +0800, Baolu Lu wrote:

On 2022/5/20 00:39, Jean-Philippe Brucker wrote:

+struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
+{
+        struct iommu_sva_domain *sva_domain;
+        struct iommu_domain *domain;
+        ioasid_t max_pasid = 0;
+        int ret = -EINVAL;
+
+        /* Allocate mm->pasid if necessary. */
+        if (!dev->iommu->iommu_dev->pasids)
+                return ERR_PTR(-EOPNOTSUPP);
+
+        if (dev_is_pci(dev)) {
+                max_pasid = pci_max_pasids(to_pci_dev(dev));
+                if (max_pasid < 0)
+                        return ERR_PTR(max_pasid);
+        } else {
+                ret = device_property_read_u32(dev, "pasid-num-bits",
+                                               &max_pasid);
+                if (ret)
+                        return ERR_PTR(ret);
+                max_pasid = (1UL << max_pasid);
+        }

The IOMMU driver needs this PASID width information earlier, when
creating the PASID table (in .probe_device(), .attach_dev()). Since
we're moving it to the IOMMU core to avoid code duplication, it should
be done earlier and stored in dev->iommu

Yes, really. How about the below changes?

From f1382579e8a15ca49acdf758d38fd36451ea174d Mon Sep 17 00:00:00 2001
From: Lu Baolu
Date: Mon, 28 Feb 2022 15:01:35 +0800
Subject: [PATCH 1/1] iommu: Add pasids field in struct dev_iommu

Use this field to save the number of PASIDs that a device is able to
consume. It is a generic attribute of a device and lifting it into the
per-device dev_iommu struct could help to avoid the boilerplate code
in various IOMMU drivers.

Signed-off-by: Lu Baolu
---
 drivers/iommu/iommu.c | 15 +++
 include/linux/iommu.h | 2 ++
 2 files changed, 17 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e49c5a5b8cc1..6b731171d42f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -194,6 +195,8 @@ EXPORT_SYMBOL_GPL(iommu_device_unregister);
 static struct dev_iommu *dev_iommu_get(struct device *dev)
 {
         struct dev_iommu *param = dev->iommu;
+        u32 max_pasids = 0;
+        int ret;

         if (param)
                 return param;
@@ -202,6 +205,18 @@ static struct dev_iommu *dev_iommu_get(struct device *dev)
         if (!param)
                 return NULL;

+        if (dev_is_pci(dev)) {
+                ret = pci_max_pasids(to_pci_dev(dev));
+                if (ret > 0)
+                        max_pasids = ret;
+        } else {
+                ret = device_property_read_u32(dev, "pasid-num-bits",
+                                               &max_pasids);
+                if (!ret)
+                        max_pasids = (1UL << max_pasids);
+        }
+        param->pasids = max_pasids;
+

we could also do a min() with the IOMMU PASID size here

         mutex_init(&param->lock);
         dev->iommu = param;
         return param;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 45f274b2640d..d4296136ba75 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -371,6 +371,7 @@ struct iommu_fault_param {
  * @fwspec:     IOMMU fwspec data
  * @iommu_dev:  IOMMU device this device is linked to
  * @priv:       IOMMU Driver private data
+ * @pasids:     number of supported PASIDs

'max_pasids' to stay consistent?

Both done. How about the below changes?

From 008c73b9c0ad51a4a70a18d60361a76c28a63342 Mon Sep 17 00:00:00 2001
From: Lu Baolu
Date: Mon, 28 Feb 2022 15:01:35 +0800
Subject: [PATCH 1/1] iommu: Add max_pasids field in struct dev_iommu

Use this field to save the number of PASIDs that a device is able to
consume. It is a generic attribute of a device and lifting it into the
per-device dev_iommu struct could help to avoid the boilerplate code
in various IOMMU drivers.

Signed-off-by: Lu Baolu
---
 drivers/iommu/iommu.c | 26 ++
 include/linux/iommu.h | 2 ++
 2 files changed, 28 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 847ad47a2dfd..365d0f2b7f55 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -218,6 +219,30 @@ static void dev_iommu_free(struct device *dev)
         kfree(param);
 }

+static u32 dev_iommu_get_max_pasids(struct device *dev)
+{
+        u32 max_pasids = dev->iommu->iommu_dev->max_pasids;
+        u32 num_bits;
+        int ret;
+
+        if (!max_pasids)
+                return 0;
+
+        if (dev_is_pci(dev)) {
+                ret = pci_max_pasids(to_pci_dev(dev));
+                if (ret < 0)
+                        return 0;
+
+                return min_t(u32, max_pasids, ret);
+        }
+
+        ret = device_property_read
Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops
On 2022/5/20 16:45, Joerg Roedel wrote:

On Mon, May 16, 2022 at 09:57:56AM +0800, Lu Baolu wrote:

         const struct iommu_domain_ops *default_domain_ops;
+        const struct iommu_domain_ops *blocking_domain_ops;

I don't understand why extra domain-ops are needed for this. I think it
would be more straight-forward to implement allocation of
IOMMU_DOMAIN_BLOCKED domains in each driver and let the details be
handled in the set_dev() call-back. The IOMMU driver can make sure DMA
is blocked for a device when it encounters an IOMMU_DOMAIN_BLOCKED
domain. IOMMUs that have no explicit way to block DMA could just use an
unmanaged domain with an empty page-table.

Yes, this is how it will go.

Best regards,
baolu
Re: [PATCH v7 06/10] iommu/sva: Refactoring iommu_sva_bind/unbind_device()
On 2022/5/20 00:39, Jean-Philippe Brucker wrote:

+struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
+{
+        struct iommu_sva_domain *sva_domain;
+        struct iommu_domain *domain;
+        ioasid_t max_pasid = 0;
+        int ret = -EINVAL;
+
+        /* Allocate mm->pasid if necessary. */
+        if (!dev->iommu->iommu_dev->pasids)
+                return ERR_PTR(-EOPNOTSUPP);
+
+        if (dev_is_pci(dev)) {
+                max_pasid = pci_max_pasids(to_pci_dev(dev));
+                if (max_pasid < 0)
+                        return ERR_PTR(max_pasid);
+        } else {
+                ret = device_property_read_u32(dev, "pasid-num-bits",
+                                               &max_pasid);
+                if (ret)
+                        return ERR_PTR(ret);
+                max_pasid = (1UL << max_pasid);
+        }

The IOMMU driver needs this PASID width information earlier, when
creating the PASID table (in .probe_device(), .attach_dev()). Since
we're moving it to the IOMMU core to avoid code duplication, it should
be done earlier and stored in dev->iommu

Yes, really. How about the below changes?

From f1382579e8a15ca49acdf758d38fd36451ea174d Mon Sep 17 00:00:00 2001
From: Lu Baolu
Date: Mon, 28 Feb 2022 15:01:35 +0800
Subject: [PATCH 1/1] iommu: Add pasids field in struct dev_iommu

Use this field to save the number of PASIDs that a device is able to
consume. It is a generic attribute of a device and lifting it into the
per-device dev_iommu struct could help to avoid the boilerplate code
in various IOMMU drivers.

Signed-off-by: Lu Baolu
---
 drivers/iommu/iommu.c | 15 +++
 include/linux/iommu.h | 2 ++
 2 files changed, 17 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e49c5a5b8cc1..6b731171d42f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -194,6 +195,8 @@ EXPORT_SYMBOL_GPL(iommu_device_unregister);
 static struct dev_iommu *dev_iommu_get(struct device *dev)
 {
         struct dev_iommu *param = dev->iommu;
+        u32 max_pasids = 0;
+        int ret;

         if (param)
                 return param;
@@ -202,6 +205,18 @@ static struct dev_iommu *dev_iommu_get(struct device *dev)
         if (!param)
                 return NULL;

+        if (dev_is_pci(dev)) {
+                ret = pci_max_pasids(to_pci_dev(dev));
+                if (ret > 0)
+                        max_pasids = ret;
+        } else {
+                ret = device_property_read_u32(dev, "pasid-num-bits",
+                                               &max_pasids);
+                if (!ret)
+                        max_pasids = (1UL << max_pasids);
+        }
+        param->pasids = max_pasids;
+
         mutex_init(&param->lock);
         dev->iommu = param;
         return param;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 45f274b2640d..d4296136ba75 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -371,6 +371,7 @@ struct iommu_fault_param {
  * @fwspec:     IOMMU fwspec data
  * @iommu_dev:  IOMMU device this device is linked to
  * @priv:       IOMMU Driver private data
+ * @pasids:     number of supported PASIDs
  *
  * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
  *      struct iommu_group      *iommu_group;
@@ -382,6 +383,7 @@ struct dev_iommu {
         struct iommu_fwspec     *fwspec;
         struct iommu_device     *iommu_dev;
         void                    *priv;
+        u32                     pasids;
 };

 int iommu_device_register(struct iommu_device *iommu,
--
2.25.1

Best regards,
baolu
Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support
On 2022/5/20 00:33, Jean-Philippe Brucker wrote:

diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 8909ea1094e3..1be21e6b93ec 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -7,6 +7,7 @@
 #include
 #include
+#include

 int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
 struct mm_struct *iommu_sva_find(ioasid_t pasid);
@@ -16,6 +17,20 @@ struct device;
 struct iommu_fault;
 struct iopf_queue;

+struct iommu_sva_domain {
+        struct iommu_domain     domain;
+        struct mm_struct        *mm;
+};
+
+#define to_sva_domain(d) container_of_safe(d, struct iommu_sva_domain, domain)

Is there a reason to use the 'safe' version of container_of()? Callers
of to_sva_domain() don't check the return value before dereferencing it,
so they would break anyway if someone passes an error pointer as domain.
I think it matters because there is no other user of container_of_safe()
in the kernel (the only user, lustre, went away in 2018), so someone
will want to remove it.

Fair enough. I wondered why there's no user in the tree. Thanks for the
explanation. I will replace it with container_of().

Apart from that,

Reviewed-by: Jean-Philippe Brucker

Thank you!

Best regards,
baolu
Re: [PATCH v7 01/10] iommu: Add pasids field in struct iommu_device
Hi Jean,

On 2022/5/19 18:37, Jean-Philippe Brucker wrote:

On Thu, May 19, 2022 at 03:20:38PM +0800, Lu Baolu wrote:

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 88817a3376ef..6e2cd082c670 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3546,6 +3546,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
         /* SID/SSID sizes */
         smmu->ssid_bits = FIELD_GET(IDR1_SSIDSIZE, reg);
         smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);
+        smmu->iommu.pasids = smmu->ssid_bits;

This should be 1UL << smmu->ssid_bits

Done. Thank you for the reminder.

Best regards,
baolu
Re: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API
On 2022/5/19 02:21, Jacob Pan wrote:

DMA requests tagged with PASID can target individual IOMMU domains.
Introduce a domain-wide PASID for DMA API; it will be used on the same
mapping as legacy DMA without PASID. Let it be IOVA or PA in case of
identity domain.

Signed-off-by: Jacob Pan
---
 include/linux/iommu.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9405034e3013..36ad007084cc 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -106,6 +106,8 @@ struct iommu_domain {
         enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
                                                       void *data);
         void *fault_data;
+        ioasid_t dma_pasid;     /* Used for DMA requests with PASID */

This looks more suitable for iommu_dma_cookie?

+        atomic_t dma_pasid_users;
 };

 static inline bool iommu_is_dma_domain(struct iommu_domain *domain)

Best regards,
baolu
Re: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain
On 2022/5/19 02:21, Jacob Pan wrote: IOMMU group maintains a PASID array which stores the associated IOMMU domains. This patch introduces a helper function to do domain to PASID look up. It will be used by TLB flush and device-PASID attach verification. Do you really need this? The IOMMU driver has maintained a device tracking list for each domain. It has been used for cache invalidation when unmap() is called against dma domain. Best regards, baolu Signed-off-by: Jacob Pan --- drivers/iommu/iommu.c | 22 ++ include/linux/iommu.h | 6 +- 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 00d0262a1fe9..22f44833db64 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3199,3 +3199,25 @@ struct iommu_domain *iommu_get_domain_for_iopf(struct device *dev, return domain; } + +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct iommu_domain *domain) +{ + struct iommu_domain *tdomain; + struct iommu_group *group; + unsigned long index; + ioasid_t pasid = INVALID_IOASID; + + group = iommu_group_get(dev); + if (!group) + return pasid; + + xa_for_each(&group->pasid_array, index, tdomain) { + if (domain == tdomain) { + pasid = index; + break; + } + } + iommu_group_put(group); + + return pasid; +} diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 36ad007084cc..c0440a4be699 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -694,7 +694,7 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid); struct iommu_domain * iommu_get_domain_for_iopf(struct device *dev, ioasid_t pasid); - +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct iommu_domain *domain); #else /* CONFIG_IOMMU_API */ struct iommu_ops {}; @@ -1070,6 +1070,10 @@ iommu_get_domain_for_iopf(struct device *dev, ioasid_t pasid) { return NULL; } +static ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct iommu_domain *domain) +{ + return INVALID_IOASID; +} #endif /* CONFIG_IOMMU_API */ #ifdef CONFIG_IOMMU_SVA ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
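The reverse lookup under review above — walk the group's PASID-to-domain table and return the index whose entry matches the given domain — can be sketched as a small user-space mock. Everything below (`fake_group`, the fixed-size array standing in for the group's xarray, the `FAKE_` names) is illustrative only, not kernel API:

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_INVALID_IOASID ((unsigned int)-1)
#define FAKE_MAX_PASID 8

struct fake_domain { int id; };

struct fake_group {
	/* index == pasid; stand-in for the group's xarray of domains */
	struct fake_domain *pasid_array[FAKE_MAX_PASID];
};

/* Linear scan playing the role of xa_for_each() over the pasid array. */
static unsigned int fake_get_pasid_from_domain(struct fake_group *group,
					       struct fake_domain *domain)
{
	for (unsigned int pasid = 0; pasid < FAKE_MAX_PASID; pasid++) {
		if (group->pasid_array[pasid] == domain)
			return pasid;
	}
	return FAKE_INVALID_IOASID;
}
```

As the review comment notes, the real question is whether callers need this at all, since drivers already track devices per domain; the sketch only shows the data-flow of the proposed helper.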
Re: [PATCH 6/7] x86/boot/tboot: Move tboot_force_iommu() to Intel IOMMU
On 2022/5/17 19:13, Jason Gunthorpe wrote: On Tue, May 17, 2022 at 10:05:43AM +0800, Baolu Lu wrote: Hi Jason, On 2022/5/17 02:06, Jason Gunthorpe wrote: +static __init int tboot_force_iommu(void) +{ + if (!tboot_enabled()) + return 0; + + if (no_iommu || dmar_disabled) + pr_warn("Forcing Intel-IOMMU to enabled\n"); Unrelated, but when we are in the special secure IOMMU modes, do we force ATS off? Specifically does the IOMMU reject TLPs that are marked as translated? Good question. From the IOMMU point of view, I don't see a point in forcing ATS off, but trusted boot involves lots of other things that I am not familiar with. Anybody else could help to answer? ATS is inherently not secure: if a rogue device can issue a TLP with the translated bit set, then it has unlimited access to host memory. Agreed. The current logic is that the platform lets the OS know such devices through firmware (ACPI/DT) and the OS sets the untrusted flag in their device structures. The IOMMU subsystem will disable ATS on devices with the untrusted flag set. There is some discussion about allowing supervisor users to set the trust policy through the sysfs ABI, but I don't think this has happened in the upstream kernel. Many of these trusted IOMMU scenarios rely on the idea that a rogue device cannot DMA to arbitrary system memory. I am not sure whether tboot has the same requirement. Best regards, baolu
Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops
On 2022/5/17 21:13, Jason Gunthorpe wrote: On Tue, May 17, 2022 at 01:43:03PM +0100, Robin Murphy wrote: FWIW from my point of view I'm happy with having a .detach_dev_pasid op meaning implicitly-blocked access for now. If this is the path then let's not call it attach/detach please. 'set_dev_pasid' and 'set_dev_blocking_pasid' are clearer names. Sure. And with the blocking domain implemented, the set_dev_blocking_pasid could be deprecated. On SMMUv3, PASIDs don't mix with our current notion of IOMMU_DOMAIN_IDENTITY (nor the potential one for IOMMU_DOMAIN_BLOCKED), so giving PASIDs functional symmetry with devices would need significantly more work anyway. There is no extra work in the driver, beyond SMMU having to implement a blocking domain. The blocking domain's set_dev_pasid op simply is whatever set_dev_blocking_pasid would have done on the unmanaged domain. Identity doesn't come into this; identity domains should have a NULL set_dev_pasid op if the driver can't support using it on a PASID. IMHO blocking_domain->ops->set_dev_pasid() is just a more logical name than domain->ops->set_dev_blocking_pasid() - especially since VFIO would like drivers to implement blocking domain anyhow. Perhaps implementing blocking domains for intel and smmuv3 iommu drivers seems to be a more practical start. Best regards, baolu
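The naming scheme being argued for above — one `set_dev_pasid` op per domain, with "detach" expressed as setting the blocking domain — can be mocked outside the kernel. All struct names and layouts below are hypothetical simplifications, not the real iommu_ops:

```c
#include <assert.h>
#include <stddef.h>

struct mock_dev;
struct mock_domain;

struct mock_domain_ops {
	void (*set_dev_pasid)(const struct mock_domain *domain,
			      struct mock_dev *dev, int pasid);
};

struct mock_domain {
	const struct mock_domain_ops *ops;
};

struct mock_dev {
	const struct mock_domain *cur[4];	/* current domain per pasid */
};

/* A paging/SVA-style domain installs itself for the pasid. */
static void paging_set_dev_pasid(const struct mock_domain *domain,
				 struct mock_dev *dev, int pasid)
{
	dev->cur[pasid] = domain;
}

/* The blocking domain's op does what set_dev_blocking_pasid would have:
 * it removes the translation for the pasid. */
static void blocking_set_dev_pasid(const struct mock_domain *domain,
				   struct mock_dev *dev, int pasid)
{
	(void)domain;
	dev->cur[pasid] = NULL;
}

static const struct mock_domain_ops paging_ops = { paging_set_dev_pasid };
static const struct mock_domain_ops blocking_ops = { blocking_set_dev_pasid };

static const struct mock_domain paging_domain = { &paging_ops };
static const struct mock_domain blocking_domain = { &blocking_ops };
```

The point of the shape: the caller only ever "sets" a domain on a {device, pasid}; which behavior results depends on whose ops table the target domain carries.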
Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops
Hi Jason, On 2022/5/16 21:57, Jason Gunthorpe wrote: On Mon, May 16, 2022 at 12:22:08PM +0100, Robin Murphy wrote: On 2022-05-16 02:57, Lu Baolu wrote: Each IOMMU driver must provide a blocking domain ops. If the hardware supports detaching domain from device, setting blocking domain equals detaching the existing domain from the device. Otherwise, an UNMANAGED domain without any mapping will be used instead. Unfortunately that's backwards - most of the implementations of .detach_dev are disabling translation entirely, meaning the device ends up effectively in passthrough rather than blocked. Ideally we'd convert the detach_dev of every driver into either a blocking or identity domain. The trick is knowing which is which.. I am still a bit puzzled about how the blocking_domain should be used when it is extended to support ->set_dev_pasid. If it's a blocking domain, the IOMMU driver knows that setting the blocking domain to device pasid means detaching the existing one. But if it's an identity domain, how could the IOMMU driver choose between: - setting the input domain to the pasid on device; or, - detaching the existing domain. I have thought about the following solutions: - Checking the domain types and dispatching them to different operations. - Using different blocking domains for different types of domains. But both look rough. Guessing going down the list: apple dart - blocking, detach_dev calls apple_dart_hw_disable_dma() same as IOMMU_DOMAIN_BLOCKED [I wonder if this driver is wrong in other ways though because I don't see a remove_streams in attach_dev] exynos - this seems to disable the 'sysmmu' so I'm guessing this is identity iommu-vmsa - Comment says 'disable mmu translation' so I'm guessing this is identity mkt_v1 - Code looks similar to mkt, which is probably identity. rkt - No idea sprd - No idea sun50i - This driver confusingly treats identity the same as unmanaged, seems wrong, no idea.
amd - Not sure, clear_dte_entry() seems to set translation on but points the PTE to 0 ? Based on the spec table 8 I would have expected TV to be clear which would be blocking. Maybe a bug?? arm smmu qcomm - not sure intel - blocking These don't support default domains, so detach_dev should return back to DMA API ownership, which is either identity or something weird: fsl_pamu - identity due to the PPC use of dma direct msm mkt omap s390 - platform DMA ops tegra-gart - Usually something called a GART would be 0 length once disabled, guessing blocking? tegra-smmu So, the approach here should be to go driver by driver and convert detach_dev to either identity, blocking or just delete it entirely, excluding the above 7 that don't support default domains. And get acks from the driver owners. Agreed. There seems to be a long way to go. I am wondering if we could decouple this refactoring from my new SVA API work? We can easily switch .detach_dev_pasid to using blocking domain later. Best regards, baolu
Re: [PATCH 6/7] x86/boot/tboot: Move tboot_force_iommu() to Intel IOMMU
Hi Jason, On 2022/5/17 02:06, Jason Gunthorpe wrote: +static __init int tboot_force_iommu(void) +{ + if (!tboot_enabled()) + return 0; + + if (no_iommu || dmar_disabled) + pr_warn("Forcing Intel-IOMMU to enabled\n"); Unrelated, but when we are in the special secure IOMMU modes, do we force ATS off? Specifically does the IOMMU reject TLPs that are marked as translated? Good question. From the IOMMU point of view, I don't see a point in forcing ATS off, but trusted boot involves lots of other things that I am not familiar with. Anybody else could help to answer? Best regards, baolu
Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops
Hi Robin, On 2022/5/16 19:22, Robin Murphy wrote: On 2022-05-16 02:57, Lu Baolu wrote: Each IOMMU driver must provide a blocking domain ops. If the hardware supports detaching domain from device, setting blocking domain equals detaching the existing domain from the device. Otherwise, an UNMANAGED domain without any mapping will be used instead. Unfortunately that's backwards - most of the implementations of .detach_dev are disabling translation entirely, meaning the device ends up effectively in passthrough rather than blocked. Conversely, at least arm-smmu and arm-smmu-v3 could implement IOMMU_DOMAIN_BLOCKED properly with fault-type S2CRs and STEs respectively, it just needs a bit of wiring up. Thank you for letting me know this. This means that we need to add an additional UNMANAGED domain for each iommu group, although it is not used most of the time. If most IOMMU drivers could implement real dumb blocking domains, this burden may be reduced. Best regards, baolu
Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops
On 2022/5/12 19:51, Jason Gunthorpe wrote: On Thu, May 12, 2022 at 08:00:59AM +0100, Jean-Philippe Brucker wrote: It is not "missing" it is just renamed to blocking_domain->ops->set_dev_pasid() The implementation of that function would be identical to detach_dev_pasid. attach(dev, pasid, sva_domain) detach(dev, pasid, sva_domain) versus set_dev_pasid(dev, pasid, sva_domain) set_dev_pasid(dev, pasid, blocking) we lose the information of the domain previously attached, and the SMMU driver has to retrieve it to find the ASID corresponding to the mm. It would be easy to have the old domain be an input as well - the core code knows it. Thanks a lot for all suggestions. I have posted a follow-up series for this: https://lore.kernel.org/linux-iommu/20220516015759.2952771-1-baolu...@linux.intel.com/ Let's discuss this there. Best regards, baolu
Re: [PATCH] iommu/vt-d: Try info->iommu in device_to_iommu()
On 2022/5/13 08:32, Nicolin Chen wrote: Local boot test and VFIO sanity test show that info->iommu can be used in device_to_iommu() as a fast path. So this patch adds it. Signed-off-by: Nicolin Chen --- drivers/iommu/intel/iommu.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 2990f80c5e08..412fca5ab9cd 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -777,6 +777,7 @@ static bool iommu_is_dummy(struct intel_iommu *iommu, struct device *dev) struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn) { struct dmar_drhd_unit *drhd = NULL; + struct device_domain_info *info; struct pci_dev *pdev = NULL; struct intel_iommu *iommu; struct device *tmp; @@ -786,6 +787,10 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn) if (!dev) return NULL; + info = dev_iommu_priv_get(dev); + if (info) + return info->iommu; device_to_iommu() also returns the device source id (@bus, @devfn). Perhaps we can keep device_to_iommu() only for probe_device(), where the per-device info data is not initialized yet. After probe_device(), iommu and sid are retrieved through other helpers by looking up the device info directly? Best regards, baolu + if (dev_is_pci(dev)) { struct pci_dev *pf_pdev;
Re: [PATCH v2] iommu/vt-d: Make DMAR_UNITS_SUPPORTED a config setting
On 2022/5/13 07:12, Steve Wahl wrote: On Thu, May 12, 2022 at 10:13:09AM -0500, Steve Wahl wrote: To support up to 64 sockets with 10 DMAR units each (640), make the value of DMAR_UNITS_SUPPORTED adjustable by a config variable, CONFIG_DMAR_UNITS_SUPPORTED, and make its default 1024 when MAXSMP is set. If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ remapping doesn't support X2APIC mode x2apic disabled"; and the system fails to boot properly. Signed-off-by: Steve Wahl I've received a report from the kernel test robot that this patch causes an error (shown below) when CONFIG_IOMMU_SUPPORT is not set. In my opinion, this is because include/linux/dmar.h and include/linux/intel-iommu.h are being #included when they are not really being used. I've tried placing the contents of intel-iommu.h within an #ifdef CONFIG_INTEL_IOMMU, and that fixes the problem. Two questions: A) Is this the desired approach to the fix? Most of include/linux/intel-iommu.h is private to the Intel IOMMU driver. It should be put in a header like drivers/iommu/intel/iommu.h. Eventually, we should remove include/linux/intel-iommu.h and have device drivers interact with the iommu subsystem through the IOMMU kAPIs. Best regards, baolu B) Should it be a separate patch, or added onto this patch as a v3? Error message: -- In file included from include/linux/intel-iommu.h:21, from arch/x86/kvm/x86.c:44: include/linux/dmar.h:21:33: error: 'CONFIG_DMAR_UNITS_SUPPORTED' undeclared here (not in a function); did you mean 'DMAR_UNITS_SUPPORTED'?
21 | #define DMAR_UNITS_SUPPORTED CONFIG_DMAR_UNITS_SUPPORTED | ^~~ include/linux/intel-iommu.h:531:35: note: in expansion of macro 'DMAR_UNITS_SUPPORTED' 531 | unsigned int iommu_refcnt[DMAR_UNITS_SUPPORTED]; | ^~~~ vim +21 include/linux/dmar.h 20 > 21 #define DMAR_UNITS_SUPPORTED CONFIG_DMAR_UNITS_SUPPORTED 22 Initial stab at fixing it: -- diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 2f9891cb3d00..916fd7b5bcb5 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -10,6 +10,8 @@ #ifndef _INTEL_IOMMU_H_ #define _INTEL_IOMMU_H_ +#ifdef CONFIG_INTEL_IOMMU + #include #include #include @@ -831,4 +833,6 @@ static inline const char *decode_prq_descriptor(char *str, size_t size, return str; } +#endif /* CONFIG_INTEL_IOMMU */ + #endif Thanks. --> Steve Wahl
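A Kconfig entry implementing what the commit message describes might look like the fragment below. The symbol name, the MAXSMP default of 1024, and the previous value of 128 come from the mail itself; the prompt text, range, and dependencies are guesses for illustration:

```kconfig
config DMAR_UNITS_SUPPORTED
	int "Number of DMA Remapping Units supported"
	depends on DMAR_TABLE
	range 1 1024
	default 1024 if MAXSMP
	default 128
	help
	  Maximum number of DMA Remapping Units (DMARs) the kernel can
	  handle. Large multi-socket systems may expose more DMARs than
	  the historical limit of 128 (MAX_IO_APICS).
```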
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/12 20:03, Jason Gunthorpe wrote: On Thu, May 12, 2022 at 07:59:41PM +0800, Baolu Lu wrote: On 2022/5/12 19:48, Jason Gunthorpe wrote: On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote: On 2022/5/12 13:01, Tian, Kevin wrote: From: Baolu Lu Sent: Thursday, May 12, 2022 11:03 AM On 2022/5/11 22:53, Jason Gunthorpe wrote: Also, given the current arrangement it might make sense to have a struct iommu_domain_sva given that no driver is wrappering this in something else. Fair enough. How about below wrapper? +struct iommu_sva_domain { + /* +* Common iommu domain header,*must* be put at the top +* of the structure. +*/ + struct iommu_domain domain; + struct mm_struct *mm; + struct iommu_sva bond; +} The refcount is wrapped in bond. I'm still not sure that bond is necessary "bond" is the sva handle that the device drivers get through calling iommu_sva_bind(). 'bond' was required before because we didn't have a domain to wrap the page table at that time. Now we have a domain and it is 1:1 associated to bond. Probably make sense now by just returning the domain as the sva handle instead? It also includes the device information that the domain has been attached. So the sva_unbind() looks like this: /** * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device * @handle: the handle returned by iommu_sva_bind_device() * * Put reference to a bond between device and address space. The device should * not be issuing any more transaction for this PASID. All outstanding page * requests for this PASID must have been flushed to the IOMMU. */ void iommu_sva_unbind_device(struct iommu_sva *handle) It's fine to replace the iommu_sva with iommu_sva_domain for sva handle, if we can include the device in the unbind() interface. Why would we have a special unbind for SVA? It's about SVA kAPI for device drivers. 
The existing kAPIs include: struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata); void iommu_sva_unbind_device(struct iommu_sva *handle); u32 iommu_sva_get_pasid(struct iommu_sva *handle); This is not what we agreed the API should be. We agreed: iommu_sva_domain_alloc() iommu_attach_device_pasid() iommu_detach_device_pasid() Again, SVA should not be different from normal domain stuff. Yes, agreed. I am trying to achieve this in two steps. The first step focuses on the internal iommu implementation and keeps the driver kAPI untouched. Then, the second step focuses on the driver APIs. Best regards, baolu
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/12 19:51, Jason Gunthorpe wrote: On Thu, May 12, 2022 at 11:02:39AM +0800, Baolu Lu wrote: + mutex_lock(&group->mutex); + domain = xa_load(&group->pasid_array, pasid); + if (domain && domain->type != type) + domain = NULL; + mutex_unlock(&group->mutex); + iommu_group_put(group); + + return domain; This is bad locking, group->pasid_array values cannot be taken outside the lock. It's not iommu core, but SVA (or other feature components) that manage the life cycle of a domain. The iommu core only provides a place to store the domain pointer. The feature components are free to fetch their domain pointers from iommu core as long as they are sure that the domain is alive during use. I'm not convinced. I'm sorry, I may not have explained it clearly. :-) This helper is safe for uses inside the IOMMU subsystem. We could trust ourselves that nobody will abuse this helper to obtain domains belonging to others as the pasid has been allocated for SVA code. No other code should be able to setup another type of domain for this pasid. The SVA code has its own lock mechanism to avoid use after free. Please correct me if I missed anything. :-) By the way, I can see some similar helpers in current IOMMU core. For example, struct iommu_domain *iommu_get_domain_for_dev(struct device *dev) { struct iommu_domain *domain; struct iommu_group *group; group = iommu_group_get(dev); if (!group) return NULL; domain = group->domain; iommu_group_put(group); return domain; } EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev); Best regards, baolu
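The locking rule Jason is pointing at — a pointer loaded from group->pasid_array is only stable while the group mutex is held, so a caller that wants to use it after unlocking must take its own reference under that lock — can be sketched with a user-space mock. None of these names are kernel API, and the "mutex" is a plain flag standing in for the real one:

```c
#include <assert.h>
#include <stddef.h>

struct demo_domain { int refs; };

struct demo_group {
	int mutex_held;	/* stand-in for the group mutex */
	struct demo_domain *pasid_array[4];
};

static void demo_lock(struct demo_group *g)   { g->mutex_held = 1; }
static void demo_unlock(struct demo_group *g) { g->mutex_held = 0; }

/*
 * Return the domain for @pasid with an elevated refcount, or NULL.
 * The reference is taken while the array entry is stable (lock held),
 * so the caller may keep using the domain after the lock is dropped.
 */
static struct demo_domain *demo_get_domain(struct demo_group *group, int pasid)
{
	struct demo_domain *domain;

	demo_lock(group);
	domain = group->pasid_array[pasid];
	if (domain)
		domain->refs++;
	demo_unlock(group);
	return domain;
}

static void demo_put_domain(struct demo_domain *domain)
{
	domain->refs--;
}
```

Returning the bare pointer without the refcount (as the quoted helper does) is exactly the pattern the review flags as "bad locking" unless some outer mechanism already keeps the domain alive.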
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/12 19:48, Jason Gunthorpe wrote: On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote: On 2022/5/12 13:01, Tian, Kevin wrote: From: Baolu Lu Sent: Thursday, May 12, 2022 11:03 AM On 2022/5/11 22:53, Jason Gunthorpe wrote: Also, given the current arrangement it might make sense to have a struct iommu_domain_sva given that no driver is wrappering this in something else. Fair enough. How about below wrapper? +struct iommu_sva_domain { + /* +* Common iommu domain header,*must* be put at the top +* of the structure. +*/ + struct iommu_domain domain; + struct mm_struct *mm; + struct iommu_sva bond; +} The refcount is wrapped in bond. I'm still not sure that bond is necessary "bond" is the sva handle that the device drivers get through calling iommu_sva_bind(). 'bond' was required before because we didn't have a domain to wrap the page table at that time. Now we have a domain and it is 1:1 associated to bond. Probably make sense now by just returning the domain as the sva handle instead? It also includes the device information that the domain has been attached. So the sva_unbind() looks like this: /** * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device * @handle: the handle returned by iommu_sva_bind_device() * * Put reference to a bond between device and address space. The device should * not be issuing any more transaction for this PASID. All outstanding page * requests for this PASID must have been flushed to the IOMMU. */ void iommu_sva_unbind_device(struct iommu_sva *handle) It's fine to replace the iommu_sva with iommu_sva_domain for sva handle, if we can include the device in the unbind() interface. Why would we have a special unbind for SVA? It's about SVA kAPI for device drivers. 
The existing kAPIs include: struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata); void iommu_sva_unbind_device(struct iommu_sva *handle); u32 iommu_sva_get_pasid(struct iommu_sva *handle); SVA should not be different from normal domains; it should use the normal detach flow too. Best regards, baolu
Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid
Hi Jason, On 2022/5/12 01:00, Jason Gunthorpe wrote: Consolidate pasid programming into dev_set_pasid() then called by both intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right? I was only suggesting that really dev_attach_pasid() op is misnamed, it should be called set_dev_pasid() and act like a set, not a paired attach/detach - same as the non-PASID ops. So, "set_dev_pasid(domain, device, pasid)" is equivalent to dev_attach_pasid() and "set_dev_pasid(NULL, device, pasid)" is equivalent to dev_detach_pasid()? Do I understand it right? Best regards, baolu
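The "set, not attach/detach" semantics discussed above can be mocked as a single operation on a per-device PASID table, where passing NULL plays the role of detach. This is an illustrative sketch, not kernel code; the struct and function names are invented:

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_NR_PASIDS 8

struct mock_pasid_table {
	const void *entries[MOCK_NR_PASIDS];	/* pasid -> domain */
};

/*
 * One idempotent "set" replaces the attach/detach pair: any existing
 * translation for @pasid is replaced by @domain; NULL means detached.
 */
static int mock_set_dev_pasid(struct mock_pasid_table *t, int pasid,
			      const void *domain)
{
	if (pasid < 0 || pasid >= MOCK_NR_PASIDS)
		return -1;
	t->entries[pasid] = domain;
	return 0;
}
```

Note the thread later converges on setting a *blocking domain*, rather than literal NULL, as the canonical detach; the table-replacement shape is the same either way.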
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/12 13:44, Tian, Kevin wrote: From: Baolu Lu Sent: Thursday, May 12, 2022 1:17 PM On 2022/5/12 13:01, Tian, Kevin wrote: From: Baolu Lu Sent: Thursday, May 12, 2022 11:03 AM On 2022/5/11 22:53, Jason Gunthorpe wrote: Also, given the current arrangement it might make sense to have a struct iommu_domain_sva given that no driver is wrappering this in something else. Fair enough. How about below wrapper? +struct iommu_sva_domain { + /* +* Common iommu domain header,*must* be put at the top +* of the structure. +*/ + struct iommu_domain domain; + struct mm_struct *mm; + struct iommu_sva bond; +} The refcount is wrapped in bond. I'm still not sure that bond is necessary "bond" is the sva handle that the device drivers get through calling iommu_sva_bind(). 'bond' was required before because we didn't have a domain to wrap the page table at that time. Now we have a domain and it is 1:1 associated to bond. Probably make sense now by just returning the domain as the sva handle instead? It also includes the device information that the domain has been attached. So the sva_unbind() looks like this: /** * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device * @handle: the handle returned by iommu_sva_bind_device() * * Put reference to a bond between device and address space. The device should * not be issuing any more transaction for this PASID. All outstanding page * requests for this PASID must have been flushed to the IOMMU. */ void iommu_sva_unbind_device(struct iommu_sva *handle) It's fine to replace the iommu_sva with iommu_sva_domain for sva handle, if we can include the device in the unbind() interface. can we just have unbind(domain, device)? Yes. With this, we can remove bond. This could be done in below phase 2. Anyway, I'd expect to achieve all these in two steps: - sva and iopf refactoring, only iommu internal changes; - sva interface refactoring, only interface changes. Does above work? 
Best regards, baolu
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/12 13:01, Tian, Kevin wrote: From: Baolu Lu Sent: Thursday, May 12, 2022 11:03 AM On 2022/5/11 22:53, Jason Gunthorpe wrote: Also, given the current arrangement it might make sense to have a struct iommu_domain_sva given that no driver is wrappering this in something else. Fair enough. How about below wrapper? +struct iommu_sva_domain { + /* +* Common iommu domain header, *must* be put at the top +* of the structure. +*/ + struct iommu_domain domain; + struct mm_struct *mm; + struct iommu_sva bond; +} The refcount is wrapped in bond. I'm still not sure that bond is necessary "bond" is the sva handle that the device drivers get through calling iommu_sva_bind(). 'bond' was required before because we didn't have a domain to wrap the page table at that time. Now we have a domain and it is 1:1 associated to bond. Probably makes sense now to just return the domain as the sva handle instead? It also includes the device information that the domain has been attached. So the sva_unbind() looks like this: /** * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device * @handle: the handle returned by iommu_sva_bind_device() * * Put reference to a bond between device and address space. The device should * not be issuing any more transaction for this PASID. All outstanding page * requests for this PASID must have been flushed to the IOMMU. */ void iommu_sva_unbind_device(struct iommu_sva *handle) It's fine to replace the iommu_sva with iommu_sva_domain for sva handle, if we can include the device in the unbind() interface. Anyway, I'd expect to achieve all these in two steps: - sva and iopf refactoring, only iommu internal changes; - sva interface refactoring, only interface changes. Does the above work? Best regards, baolu
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/11 22:53, Jason Gunthorpe wrote: Assuming we leave room for multi-device groups this logic should just be group = iommu_group_get(dev); if (!group) return -ENODEV; mutex_lock(&group->mutex); domain = xa_load(&group->pasid_array, mm->pasid); if (!domain || domain->type != IOMMU_DOMAIN_SVA || domain->mm != mm) domain = iommu_sva_alloc_domain(dev, mm); ? Agreed. As a helper in iommu core, how about making it more generic like below? IDK, are there more users of this? AFAIK SVA is the only place that will be auto-sharing? The generic thing is that components, like SVA, want to fetch the attached domain from the iommu core. + mutex_lock(&group->mutex); + domain = xa_load(&group->pasid_array, pasid); + if (domain && domain->type != type) + domain = NULL; + mutex_unlock(&group->mutex); + iommu_group_put(group); + + return domain; This is bad locking, group->pasid_array values cannot be taken outside the lock. It's not iommu core, but SVA (or other feature components) that manage the life cycle of a domain. The iommu core only provides a place to store the domain pointer. The feature components are free to fetch their domain pointers from iommu core as long as they are sure that the domain is alive during use. And stick the refcount in the sva_domain Also, given the current arrangement it might make sense to have a struct iommu_domain_sva given that no driver is wrapping this in something else. Fair enough. How about below wrapper? +struct iommu_sva_domain { + /* +* Common iommu domain header, *must* be put at the top +* of the structure. +*/ + struct iommu_domain domain; + struct mm_struct *mm; + struct iommu_sva bond; +} The refcount is wrapped in bond. I'm still not sure that bond is necessary "bond" is the sva handle that the device drivers get through calling iommu_sva_bind(). But yes, something like that Best regards, baolu
Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid
On 2022/5/12 01:25, Jacob Pan wrote: Hi Jason, On Wed, 11 May 2022 14:00:25 -0300, Jason Gunthorpe wrote: On Wed, May 11, 2022 at 10:02:16AM -0700, Jacob Pan wrote: If not global, perhaps we could have a list of pasids (e.g. xarray) attached to the device_domain_info. The TLB flush logic would just go through the list w/o caring what the PASIDs are for. Does it make sense to you? Sort of, but we shouldn't duplicate xarrays - the group already has this xarray - need to find some way to allow access to it from the driver. I am not following, here are the PASIDs for devTLB flush which is per device. Why group? Because group is where the core code stores it. I see, with singleton group. I guess I can let dma-iommu code call iommu_attach_dma_pasid { iommu_attach_device_pasid(); Then the PASID will be stored in the group xa. The flush code can retrieve PASIDs from device_domain_info.device -> group -> pasid_array. Thanks for pointing it out, I missed the new pasid_array. We could retrieve PASIDs from the device PASID table but xa would be more efficient. Are you suggesting the dma-iommu API should be called iommu_set_dma_pasid instead of iommu_attach_dma_pasid? No that API is Ok - the driver ops API should be 'set' not attach/detach Sounds good, this operation has little in common with domain_ops.dev_attach_pasid() used by SVA domain. So I will add a new domain_ops.dev_set_pasid() What? No, their should only be one operation, 'dev_set_pasid' and it is exactly the same as the SVA operation. It configures things so that any existing translation on the PASID is removed and the PASID translates according to the given domain. SVA given domain or UNMANAGED given domain doesn't matter to the higher level code. The driver should implement per-domain ops as required to get the different behaviors. 
Perhaps some code to clarify, we have sva_domain_ops.dev_attach_pasid() = intel_svm_attach_dev_pasid; default_domain_ops.dev_attach_pasid() = intel_iommu_attach_dev_pasid; Yes, keep that structure Consolidate pasid programming into dev_set_pasid() then called by both intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right? I was only suggesting that really dev_attach_pasid() op is misnamed, it should be called set_dev_pasid() and act like a set, not a paired attach/detach - same as the non-PASID ops. Got it. Perhaps another patch to rename, Baolu? Yes. I can rename it in my sva series if others are also happy with this naming. Best regards, baolu
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/10 23:23, Jason Gunthorpe wrote: On Tue, May 10, 2022 at 02:17:34PM +0800, Lu Baolu wrote: +/** + * iommu_sva_bind_device() - Bind a process address space to a device + * @dev: the device + * @mm: the mm to bind, caller must hold a reference to mm_users + * @drvdata: opaque data pointer to pass to bind callback + * + * Create a bond between device and address space, allowing the device to access + * the mm using the returned PASID. If a bond already exists between @device and + * @mm, it is returned and an additional reference is taken. Caller must call + * iommu_sva_unbind_device() to release each reference. + * + * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to + * initialize the required SVA features. + * + * On error, returns an ERR_PTR value. + */ +struct iommu_sva * +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) +{ + int ret = -EINVAL; + struct iommu_sva *handle; + struct iommu_domain *domain; + + /* +* TODO: Remove the drvdata parameter after kernel PASID support is +* enabled for the idxd driver. +*/ + if (drvdata) + return ERR_PTR(-EOPNOTSUPP); Why is this being left behind? Clean up the callers too please. Okay, let me try to. + /* Allocate mm->pasid if necessary. */ + ret = iommu_sva_alloc_pasid(mm, 1, (1U << dev->iommu->pasid_bits) - 1); + if (ret) + return ERR_PTR(ret); + + mutex_lock(&iommu_sva_lock); + /* Search for an existing bond. */ + handle = xa_load(&dev->iommu->sva_bonds, mm->pasid); + if (handle) { + refcount_inc(&handle->users); + goto out_success; + } How can there be an existing bond? dev->iommu is per-device The device_group_immutable_singleton() insists on a single device group Basically 'sva_bonds' is the same thing as the group->pasid_array. Yes, really.
Assuming we leave room for multi-device groups this logic should just be group = iommu_group_get(dev); if (!group) return -ENODEV; mutex_lock(&group->mutex); domain = xa_load(&group->pasid_array, mm->pasid); if (!domain || domain->type != IOMMU_DOMAIN_SVA || domain->mm != mm) domain = iommu_sva_alloc_domain(dev, mm); ? Agreed. As a helper in iommu core, how about making it more generic like below? +struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev, + ioasid_t pasid, + unsigned int type) +{ + struct iommu_domain *domain; + struct iommu_group *group; + + if (!pasid_valid(pasid)) + return NULL; + + group = iommu_group_get(dev); + if (!group) + return NULL; + + mutex_lock(&group->mutex); + domain = xa_load(&group->pasid_array, pasid); + if (domain && domain->type != type) + domain = NULL; + mutex_unlock(&group->mutex); + iommu_group_put(group); + + return domain; +} And stick the refcount in the sva_domain Also, given the current arrangement it might make sense to have a struct iommu_domain_sva given that no driver is wrapping this in something else. Fair enough. How about below wrapper? +struct iommu_sva_domain { + /* +* Common iommu domain header, *must* be put at the top +* of the structure. +*/ + struct iommu_domain domain; + struct mm_struct *mm; + struct iommu_sva bond; +} The refcount is wrapped in bond. Best regards, baolu
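The iommu_sva_domain wrapper proposed in this thread relies on the embedded iommu_domain being the *first* member, so that the wrapper can be recovered from a plain domain pointer with a container_of()-style cast. A minimal user-space sketch, with trimmed, hypothetical struct contents (the real to_sva_domain() lives in the patch series, not here):

```c
#include <assert.h>
#include <stddef.h>

struct fake_iommu_domain { int type; };

struct fake_sva_domain {
	struct fake_iommu_domain domain;	/* must stay the first member */
	void *mm;				/* stand-in for struct mm_struct * */
};

/* Same arithmetic as the kernel's container_of(): subtract the member
 * offset to get back to the enclosing structure. */
#define fake_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct fake_sva_domain *to_sva_domain(struct fake_iommu_domain *d)
{
	return fake_container_of(d, struct fake_sva_domain, domain);
}
```

Because `domain` is at offset zero, the cast is effectively a no-op, but writing it via container_of keeps the code correct even if the member ever moves.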
Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops
On 2022/5/10 22:02, Jason Gunthorpe wrote: On Tue, May 10, 2022 at 02:17:29PM +0800, Lu Baolu wrote: This adds a pair of common domain ops for this purpose and adds helpers to attach/detach a domain to/from a {device, PASID}. I wonder if this should not have a detach op - after discussing with Robin we can see that detach_dev is not used in updated drivers. Instead attach_dev acts as 'set_domain'. So, it would be more symmetrical if attaching a blocking_domain to the PASID was the way to 'detach'. This could be made straightforward by following the sketch I showed to have a static, global blocking_domain and providing a pointer to it in struct iommu_ops. Then 'detach pasid' is: iommu_ops->blocking_domain->ops->attach_dev_pasid(domain, dev, pasid); And we move away from the notion of 'detach' and in the direction that everything continuously has a domain set. PASID would logically default to blocking_domain, though we wouldn't track this anywhere. I am not sure whether we still need to keep the blocking domain concept when we are entering the new PASID world. Please allow me to wait and listen to more opinions. Best regards, baolu
Re: [PATCH v6 02/12] iommu: Add pasid_bits field in struct dev_iommu
On 2022/5/10 22:34, Jason Gunthorpe wrote: On Tue, May 10, 2022 at 02:17:28PM +0800, Lu Baolu wrote: int iommu_device_register(struct iommu_device *iommu, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 627a3ed5ee8f..afc63fce6107 100644 +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2681,6 +2681,8 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev) smmu->features & ARM_SMMU_FEAT_STALL_FORCE) master->stall_enabled = true; + dev->iommu->pasid_bits = master->ssid_bits; return &smmu->iommu; err_free_master: diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 2990f80c5e08..99643f897f26 100644 +++ b/drivers/iommu/intel/iommu.c @@ -4624,8 +4624,11 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev) if (pasid_supported(iommu)) { int features = pci_pasid_features(pdev); - if (features >= 0) + if (features >= 0) { info->pasid_supported = features | 1; + dev->iommu->pasid_bits = + fls(pci_max_pasids(pdev)) - 1; + } It is not very nice that both the iommu drivers have to duplicate the code to read the pasid capability out of the PCI device. IMHO it would make more sense for the iommu layer to report the capability of its own HW block only, and for the core code to figure out the master's limitation using a bus-specific approach. Fair enough. The iommu hardware capability could be reported in /** * struct iommu_device - IOMMU core representation of one IOMMU hardware * instance * @list: Used by the iommu-core to keep a list of registered iommus * @ops: iommu-ops for talking to this iommu * @dev: struct device for sysfs handling */ struct iommu_device { struct list_head list; const struct iommu_ops *ops; struct fwnode_handle *fwnode; struct device *dev; }; I haven't checked ARM code yet, but it works for x86 as far as I can see.
It is also unfortunate that the enable/disable pasid is inside the iommu driver as well - ideally the PCI driver itself would do this when it knows it wants to use PASIDs. The ordering interaction with ATS makes this look quite annoying though. :( I'm also not convinced individual IOMMU drivers should be forcing ATS on, there are performance and functional implications here. Using ATS or not is possibly best left as an administrator policy controlled by the core code. Again we seem to have some mess. Agreed with you. This has already been in my task list. I will start to solve it after the iommufd tasks. Best regards, baolu
Re: [PATCH v4 2/4] iommu/vt-d: Check domain force_snooping against attached devices
On 2022/5/10 08:51, Tian, Kevin wrote: From: Lu Baolu Sent: Sunday, May 8, 2022 8:35 PM As domain->force_snooping only impacts the devices attached with the domain, there's no need to check against all IOMMU units. On the other hand, force_snooping could be set on a domain no matter whether it has been attached or not, and once set it is an immutable flag. If no device attached, the operation always succeeds. Then this empty domain can be only attached to a device of which the IOMMU supports snoop control. Signed-off-by: Lu Baolu Reviewed-by: Kevin Tian Thank you, Kevin. I will queue this series for v5.19. Best regards, baolu --- include/linux/intel-iommu.h | 1 + drivers/iommu/intel/pasid.h | 2 ++ drivers/iommu/intel/iommu.c | 53 ++--- drivers/iommu/intel/pasid.c | 42 + 4 files changed, 95 insertions(+), 3 deletions(-) diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 72e5d7900e71..4f29139bbfc3 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -540,6 +540,7 @@ struct dmar_domain { u8 has_iotlb_device: 1; u8 iommu_coherency: 1; /* indicate coherency of iommu access */ u8 force_snooping : 1; /* Create IOPTEs with snoop control */ + u8 set_pte_snp:1; struct list_head devices; /* all devices' list */ struct iova_domain iovad; /* iova's that belong to this domain */ diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h index ab4408c824a5..583ea67fc783 100644 --- a/drivers/iommu/intel/pasid.h +++ b/drivers/iommu/intel/pasid.h @@ -123,4 +123,6 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu, bool fault_ignore); int vcmd_alloc_pasid(struct intel_iommu *iommu, u32 *pasid); void vcmd_free_pasid(struct intel_iommu *iommu, u32 pasid); +void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu, + struct device *dev, u32 pasid); #endif /* __INTEL_PASID_H */ diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index b4802f4055a0..048ebfbd5fcb 100644 --- 
a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -2459,7 +2459,7 @@ static int domain_setup_first_level(struct intel_iommu *iommu, if (level == 5) flags |= PASID_FLAG_FL5LP; - if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED) + if (domain->force_snooping) flags |= PASID_FLAG_PAGE_SNOOP; return intel_pasid_setup_first_level(iommu, dev, (pgd_t *)pgd, pasid, @ -,7 +,7 @@ static int intel_iommu_map(struct iommu_domain *domain, prot |= DMA_PTE_READ; if (iommu_prot & IOMMU_WRITE) prot |= DMA_PTE_WRITE; - if (dmar_domain->force_snooping) + if (dmar_domain->set_pte_snp) prot |= DMA_PTE_SNP; max_addr = iova + size; @@ -4567,13 +4567,60 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain, return phys; } +static bool domain_support_force_snooping(struct dmar_domain *domain) +{ + struct device_domain_info *info; + bool support = true; + + assert_spin_locked(&device_domain_lock); + list_for_each_entry(info, &domain->devices, link) { + if (!ecap_sc_support(info->iommu->ecap)) { + support = false; + break; + } + } + + return support; +} + +static void domain_set_force_snooping(struct dmar_domain *domain) +{ + struct device_domain_info *info; + + assert_spin_locked(&device_domain_lock); + + /* +* Second level page table supports per-PTE snoop control. The +* iommu_map() interface will handle this by setting SNP bit.
+*/ + if (!domain_use_first_level(domain)) { + domain->set_pte_snp = true; + return; + } + + list_for_each_entry(info, &domain->devices, link) + intel_pasid_setup_page_snoop_control(info->iommu, info->dev, +PASID_RID2PASID); +} + static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain) { struct dmar_domain *dmar_domain = to_dmar_domain(domain); + unsigned long flags; - if (!domain_update_iommu_snooping(NULL)) + if (dmar_domain->force_snooping) + return true; + + spin_lock_irqsave(&device_domain_lock, flags); + if (!domain_support_force_snooping(dmar_domain)) { + spin_unlock_irqrestore(&device_domain_lock, flags); return false; + } + + domain_set_force_snooping(dmar_domain); dmar_domain->force_snooping = true; + spin_unlock_irqrestore(&device_domain_lock, flags); + return true; } diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c index
Re: [PATCH v5 04/12] iommu/sva: Basic data structures for SVA
On 2022/5/7 16:32, Baolu Lu wrote: Hi Jean, On 2022/5/5 14:42, Baolu Lu wrote: On 2022/5/4 02:09, Jean-Philippe Brucker wrote: On Mon, May 02, 2022 at 09:48:34AM +0800, Lu Baolu wrote: Use below data structures for SVA implementation in the IOMMU core: - struct iommu_sva_ioas Represent the I/O address space shared with an application CPU address space. This structure has a 1:1 relationship with an mm_struct. It grabs a "mm->mm_count" refcount during creation and drops it on release. Do we actually need this structure? At the moment it only keeps track of bonds, which we can move to struct dev_iommu. Replacing it by a mm pointer in struct iommu_domain simplifies the driver and seems to work Fair enough. +struct iommu_sva_ioas { + struct mm_struct *mm; + ioasid_t pasid; + + /* Counter of domains attached to this ioas. */ + refcount_t users; + + /* All bindings are linked here. */ + struct list_head bonds; +}; By moving @mm to struct iommu_domain and @bonds to struct dev_iommu, the code looks simpler. The mm, sva domain and per-device dev_iommu are guaranteed to be valid during bind() and unbind(). Will head this direction in the next version. I'm trying to implement this idea in real code. It seems that we need additional fields in struct iommu_domain to track which devices the mm was bound to. It doesn't simplify the code much. Any thoughts? Sorry, Jean. This has been discussed. We don't need to share sva domain among devices at this stage. It's not a big issue to sva domain as it's a dumb domain which has no support for map()/unmap() and the cache manipulation. I will still head this direction. Sorry for the noise. Best regards, baolu
Re: [PATCH v5 04/12] iommu/sva: Basic data structures for SVA
Hi Jean, On 2022/5/5 14:42, Baolu Lu wrote: On 2022/5/4 02:09, Jean-Philippe Brucker wrote: On Mon, May 02, 2022 at 09:48:34AM +0800, Lu Baolu wrote: Use below data structures for SVA implementation in the IOMMU core: - struct iommu_sva_ioas Represent the I/O address space shared with an application CPU address space. This structure has a 1:1 relationship with an mm_struct. It grabs a "mm->mm_count" refcount during creation and drops it on release. Do we actually need this structure? At the moment it only keeps track of bonds, which we can move to struct dev_iommu. Replacing it by a mm pointer in struct iommu_domain simplifies the driver and seems to work Fair enough. +struct iommu_sva_ioas { + struct mm_struct *mm; + ioasid_t pasid; + + /* Counter of domains attached to this ioas. */ + refcount_t users; + + /* All bindings are linked here. */ + struct list_head bonds; +}; By moving @mm to struct iommu_domain and @bonds to struct dev_iommu, the code looks simpler. The mm, sva domain and per-device dev_iommu are guaranteed to be valid during bind() and unbind(). Will head this direction in the next version. I'm trying to implement this idea in real code. It seems that we need additional fields in struct iommu_domain to track which devices the mm was bound to. It doesn't simplify the code much. Any thoughts? Best regards, baolu
Re: [PATCH v2] iommu: iommu_group_claim_dma_owner() must always assign a domain
On 2022/5/6 03:27, Jason Gunthorpe wrote: On Thu, May 05, 2022 at 07:56:59PM +0100, Robin Murphy wrote: Ack to that, there are certainly further improvements to consider once we've got known-working code into a released kernel, but let's not get ahead of ourselves just now. Yes please (I'm pretty sure we could get away with a single blocking domain per IOMMU instance, rather than per group, but I deliberately saved that one for later - right now one per group to match default domains is simpler to reason about and less churny in the context of the current ownership patches) I noticed this too.. But I thought the driver can do a better job of this. There is no reason it has to allocate memory to return a IOMMU_DOMAIN_BLOCKED domain - this struct could be a globally allocated singleton for the entire driver and that would be even better memory savings. For instance, here is a sketch for the Intel driver based on Baolu's remark that intel_iommu_detach_device() establishes a blocking behavior already on detach_dev (Baolu if you like it can you make a proper patch?): Yes, I will. The same scheme could also be applied to identity/sva/block domains. Perhaps make it after v5.19. Best regards, baolu
Re: [PATCH v3 2/4] iommu/vt-d: Check domain force_snooping against attached devices
On 2022/5/6 14:10, Tian, Kevin wrote: From: Lu Baolu Sent: Friday, May 6, 2022 1:27 PM + +/* + * Set the page snoop control for a pasid entry which has been set up. + */ +void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu, + struct device *dev, u32 pasid) +{ + struct pasid_entry *pte; + u16 did; + + spin_lock(&iommu->lock); + pte = intel_pasid_get_entry(dev, pasid); + if (WARN_ON(!pte || !pasid_pte_is_present(pte))) { + spin_unlock(&iommu->lock); + return; + } + + pasid_set_pgsnp(pte); + did = pasid_get_domain_id(pte); + spin_unlock(&iommu->lock); + + pasid_flush_caches(iommu, pte, pasid, did); +} The comment of pasid_flush_caches() says: /* * This function flushes cache for a newly setup pasid table entry. * Caller of it should not modify the in-use pasid table entries. */ Can you check whether that comment still holds? I am sorry that I overlooked this. Very appreciated for pointing this out! How about below additional changes? diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c index da47406011d3..68f7e8cfa848 100644 --- a/drivers/iommu/intel/pasid.c +++ b/drivers/iommu/intel/pasid.c @@ -782,5 +782,24 @@ void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu, did = pasid_get_domain_id(pte); spin_unlock(&iommu->lock); - pasid_flush_caches(iommu, pte, pasid, did); + if (!ecap_coherent(iommu->ecap)) + clflush_cache_range(pte, sizeof(*pte)); + + /* +* VT-d spec 3.4 table23 states guides for cache invalidation: +* +* - PASID-selective-within-Domain PASID-cache invalidation +* - PASID-selective PASID-based IOTLB invalidation +* - If (pasid is RID_PASID) +*- Global Device-TLB invalidation to affected functions +* Else +*- PASID-based Device-TLB invalidation (with S=1 and +* Addr[63:12]=0x7FFFFFFF_FFFFF) to affected functions +*/ + pasid_cache_invalidation_with_pasid(iommu, did, pasid); + qi_flush_piotlb(iommu, did, pasid, 0, -1, 0); + + /* Device IOTLB doesn't need to be flushed in caching mode.
*/ + if (!cap_caching_mode(iommu->cap)) + devtlb_invalidation_with_pasid(iommu, dev, pasid); } Best regards, baolu
Re: [PATCH] iommu/vt-d: Increase DMAR_UNITS_SUPPORTED
On 2022/5/6 03:46, Steve Wahl wrote: Increase DMAR_UNITS_SUPPORTED to support 64 sockets with 10 DMAR units each, for a total of 640. If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ remapping doesn't support X2APIC mode x2apic disabled"; and the system fails to boot. Signed-off-by: Steve Wahl Reviewed-by: Mike Travis --- Note that we could not find a reason for connecting DMAR_UNITS_SUPPORTED to MAX_IO_APICS as was done previously. Perhaps it seemed like the two would continue to match on earlier processors. There doesn't appear to be kernel code that assumes that the value of one is related to the other. +Kevin This maximum value was introduced by below commit. And I don't see any hardware/software restrictions that we can't enlarge it after ten years. commit 1b198bb04ad72669d4bd6575fc9945ed595bfee0 Author: Mike Travis Date: Mon Mar 5 15:05:16 2012 -0800 x86/iommu/intel: Increase the number of iommus supported to MAX_IO_APICS The number of IOMMUs supported should be the same as the number of IO APICS. This limit comes into play when the IOMMUs are identity mapped, thus the number of possible IOMMUs in the "static identity" (si) domain should be this same number. [...] include/linux/dmar.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/dmar.h b/include/linux/dmar.h index 45e903d84733..9d4867b8f42e 100644 --- a/include/linux/dmar.h +++ b/include/linux/dmar.h @@ -19,7 +19,7 @@ struct acpi_dmar_header; #ifdef CONFIG_X86 -# define DMAR_UNITS_SUPPORTED MAX_IO_APICS +# define DMAR_UNITS_SUPPORTED 640 #else # define DMAR_UNITS_SUPPORTED 64 #endif Best regards, baolu
Re: [PATCH v5 10/12] iommu: Prepare IOMMU domain for IOPF
On 2022/5/5 21:38, Jean-Philippe Brucker wrote: Hi Baolu, On Thu, May 05, 2022 at 04:31:38PM +0800, Baolu Lu wrote: On 2022/5/4 02:20, Jean-Philippe Brucker wrote: diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 7cae631c1baa..33449523afbe 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3174,3 +3174,24 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, iommu_group_put(group); } + +struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev, + ioasid_t pasid) +{ + struct iommu_domain *domain; + struct iommu_group *group; + + if (!pasid_valid(pasid)) + return NULL; + + group = iommu_group_get(dev); + if (!group) + return NULL; + + mutex_lock(&group->mutex); Unfortunately this still causes the deadlock when unbind() flushes the IOPF queue while holding the group mutex. Sorry, I didn't get your point here. Do you mean unbind() could hold group mutex before calling this helper? The group mutex is only available in iommu.c. The unbind() has no means to hold this lock. Or did I miss anything? I wasn't clear, it's iommu_detach_device_pasid() that holds the group->mutex: iommu_sva_unbind_device() | iommu_detach_device_pasid() | mutex_lock(&group->mutex) | domain->ops->detach_dev_pasid() | iopf_handle_group() iopf_queue_flush_dev() | iommu_get_domain_for_dev_pasid() ... wait for IOPF work | mutex_lock(&group->mutex) | ... deadlock Ah! Yes. Thank you for the clarification. Thanks, Jean Best regards, baolu If we make this function private to IOPF, then we can get rid of this mutex_lock(). It's OK because: * xarray protects its internal state with RCU, so we can call xa_load() outside the lock. * The domain obtained from xa_load is finalized. Its content is valid because xarray stores the domain using rcu_assign_pointer(), which has a release memory barrier, which pairs with data dependencies in IOPF (domain->sva_ioas etc). We'll need to be careful about this when allowing other users to install a fault handler.
Should be fine as long as the handler and data are installed before the domain is added to pasid_array. * We know the domain is valid the whole time IOPF is using it, because unbind() waits for pending faults. We just need a comment explaining the last point, something like: /* * Safe to fetch outside the group mutex because: * - xarray protects its internal state with RCU * - the domain obtained is either NULL or fully formed * - the IOPF work is the only caller and is flushed before the * domain is freed. */ Agreed. The mutex is needed only when domain could possibly be freed before unbind(). In that case, we need this mutex and get a reference from the domain. As we have dropped the domain user reference, this lock is unnecessary. Thanks, Jean + domain = xa_load(&group->pasid_array, pasid); + mutex_unlock(&group->mutex); + iommu_group_put(group); + + return domain; +} Best regards, baolu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/5/3 15:49, Jean-Philippe Brucker wrote: On Sat, Apr 30, 2022 at 03:33:17PM +0800, Baolu Lu wrote: Jean, another quick question about the iommu_sva_bind_device() /** * iommu_sva_bind_device() - Bind a process address space to a device * @dev: the device * @mm: the mm to bind, caller must hold a reference to it * @drvdata: opaque data pointer to pass to bind callback This interface requires the caller to take a reference to mm. Which reference should it take, mm->mm_count or mm->mm_users? It's better to make it explicit in this comment. Agreed, it's mm_users as required by mmu_notifier_register() Thanks! I will add this in my refactoring patch. Best regards, baolu
Re: [PATCH v2 4/4] iommu/vt-d: Remove hard coding PGSNP bit in PASID entries
On 2022/5/5 16:46, Tian, Kevin wrote: From: Lu Baolu Sent: Thursday, May 5, 2022 9:07 AM As enforce_cache_coherency has been introduced into the iommu_domain_ops, the kernel component which owns the iommu domain is able to opt-in its requirement for force snooping support. The iommu driver has no need to hard code the page snoop control bit in the PASID table entries anymore. Signed-off-by: Lu Baolu Reviewed-by: Kevin Tian , with one nit: --- drivers/iommu/intel/pasid.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c index 41a0e3b02c79..0abfa7fc7fb0 100644 --- a/drivers/iommu/intel/pasid.c +++ b/drivers/iommu/intel/pasid.c @@ -710,9 +710,6 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu, pasid_set_fault_enable(pte); pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap)); Probably in a separate patch but above should really be renamed to pasid_set_page_walk_snoop(). Yeah! Need a cleanup here. Above name is confusing. - if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED) - pasid_set_pgsnp(pte); - /* * Since it is a second level only translation setup, we should * set SRE bit as well (addresses are expected to be GPAs). -- 2.25.1 Best regards, baolu
Re: [PATCH v2 2/4] iommu/vt-d: Check domain force_snooping against attached devices
On 2022/5/5 16:43, Tian, Kevin wrote: From: Lu Baolu Sent: Thursday, May 5, 2022 9:07 AM As domain->force_snooping only impacts the devices attached with the domain, there's no need to check against all IOMMU units. At the same time, for a brand new domain (hasn't been attached to any device), the force_snooping field could be set, but the attach_dev callback will return failure if it wants to attach to a device which IOMMU has no snoop control capability. The description about brand new domain is not very clear. I think the point here is that force_snooping could be set on a domain no matter whether it has been attached or not and once set it is an immutable flag. If no device attached the operation always succeeds then this empty domain can be only attached to a device of which the IOMMU supports snoop control. Exactly. Will update this description. static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain) { struct dmar_domain *dmar_domain = to_dmar_domain(domain); - if (!domain_update_iommu_snooping(NULL)) + if (dmar_domain->force_snooping) + return true; + + if (!domain_support_force_snooping(dmar_domain)) return false; + Who guarantees that domain->devices won't change between above condition check and following set operation? Good catch. Should lift the lock up here. + domain_set_force_snooping(dmar_domain); dmar_domain->force_snooping = true; + return true; } Thanks Kevin Best regards, baolu
Re: [PATCH v5 10/12] iommu: Prepare IOMMU domain for IOPF
On 2022/5/4 02:20, Jean-Philippe Brucker wrote: diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 7cae631c1baa..33449523afbe 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3174,3 +3174,24 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, iommu_group_put(group); } + +struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev, + ioasid_t pasid) +{ + struct iommu_domain *domain; + struct iommu_group *group; + + if (!pasid_valid(pasid)) + return NULL; + + group = iommu_group_get(dev); + if (!group) + return NULL; + + mutex_lock(&group->mutex); Unfortunately this still causes the deadlock when unbind() flushes the IOPF queue while holding the group mutex. Sorry, I didn't get your point here. Do you mean unbind() could hold group mutex before calling this helper? The group mutex is only available in iommu.c. The unbind() has no means to hold this lock. Or did I miss anything? Best regards, baolu If we make this function private to IOPF, then we can get rid of this mutex_lock(). It's OK because: * xarray protects its internal state with RCU, so we can call xa_load() outside the lock. * The domain obtained from xa_load is finalized. Its content is valid because xarray stores the domain using rcu_assign_pointer(), which has a release memory barrier, which pairs with data dependencies in IOPF (domain->sva_ioas etc). We'll need to be careful about this when allowing other users to install a fault handler. Should be fine as long as the handler and data are installed before the domain is added to pasid_array. * We know the domain is valid the whole time IOPF is using it, because unbind() waits for pending faults. We just need a comment explaining the last point, something like: /* * Safe to fetch outside the group mutex because: * - xarray protects its internal state with RCU * - the domain obtained is either NULL or fully formed * - the IOPF work is the only caller and is flushed before the * domain is freed.
*/ Thanks, Jean + domain = xa_load(&group->pasid_array, pasid); + mutex_unlock(&group->mutex); + iommu_group_put(group); + + return domain; +}
Re: [PATCH v5 07/12] arm-smmu-v3/sva: Add SVA domain support
On 2022/5/4 02:12, Jean-Philippe Brucker wrote: On Mon, May 02, 2022 at 09:48:37AM +0800, Lu Baolu wrote: Add support for SVA domain allocation and provide an SVA-specific iommu_domain_ops. Signed-off-by: Lu Baolu --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 14 +++ .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 42 +++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 21 ++ 3 files changed, 77 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index cd48590ada30..7631c00fdcbd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -759,6 +759,10 @@ struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void arm_smmu_sva_unbind(struct iommu_sva *handle); u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle); void arm_smmu_sva_notifier_synchronize(void); +int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t id); +void arm_smmu_sva_detach_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t id); #else /* CONFIG_ARM_SMMU_V3_SVA */ static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu) { @@ -804,5 +808,15 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle) } static inline void arm_smmu_sva_notifier_synchronize(void) {} + +static inline int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t id) +{ + return -ENODEV; +} + +static inline void arm_smmu_sva_detach_dev_pasid(struct iommu_domain *domain, +struct device *dev, +ioasid_t id) {} #endif /* CONFIG_ARM_SMMU_V3_SVA */ #endif /* _ARM_SMMU_V3_H */ diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index c623dae1e115..3b843cd3ed67 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -541,3 +541,45 @@ void 
arm_smmu_sva_notifier_synchronize(void) */ mmu_notifier_synchronize(); } + +int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t id) +{ + int ret = 0; + struct iommu_sva *handle; + struct mm_struct *mm = iommu_sva_domain_mm(domain); + + if (domain->type != IOMMU_DOMAIN_SVA || !mm) We wouldn't get that far with a non-SVA domain since iommu_sva_domain_mm() would dereference a NULL pointer. Could you move it after the domain->type check, and maybe add a WARN_ON()? It could help catch issues in future API changes. Sure. I will make it like this, int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t id) { int ret = 0; struct mm_struct *mm; struct iommu_sva *handle; if (domain->type != IOMMU_DOMAIN_SVA) return -EINVAL; mm = iommu_sva_domain_mm(domain); if (WARN_ON(!mm)) return -ENODEV; ... ... + return -EINVAL; + + mutex_lock(&sva_lock); + handle = __arm_smmu_sva_bind(dev, mm); + if (IS_ERR(handle)) + ret = PTR_ERR(handle); + mutex_unlock(&sva_lock); + + return ret; +} + +void arm_smmu_sva_detach_dev_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t id) +{ + struct arm_smmu_bond *bond = NULL, *t; + struct mm_struct *mm = iommu_sva_domain_mm(domain); + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + + mutex_lock(&sva_lock); + list_for_each_entry(t, &master->bonds, list) { + if (t->mm == mm) { + bond = t; + break; + } + } + + if (!WARN_ON(!bond) && refcount_dec_and_test(&bond->refs)) { + list_del(&bond->list); + arm_smmu_mmu_notifier_put(bond->smmu_mn); + kfree(bond); + } + mutex_unlock(&sva_lock); +} diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index afc63fce6107..bd80de0bad98 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1995,10 +1995,31 @@ static bool arm_smmu_capable(enum iommu_cap cap) } }
+ +static const struct iommu_domain_ops arm_smmu_sva_domain_ops = { + .attach_dev_pasid = arm_smmu_sva_attach_dev_pasid, + .detach_dev_pasid = arm_smmu_sva_detach_dev_pasid, + .free =
Re: [PATCH v5 04/12] iommu/sva: Basic data structures for SVA
On 2022/5/4 02:09, Jean-Philippe Brucker wrote: On Mon, May 02, 2022 at 09:48:34AM +0800, Lu Baolu wrote: Use below data structures for SVA implementation in the IOMMU core: - struct iommu_sva_ioas Represent the I/O address space shared with an application CPU address space. This structure has a 1:1 relationship with an mm_struct. It grabs a "mm->mm_count" refcount during creation and drop it on release. Do we actually need this structure? At the moment it only keeps track of bonds, which we can move to struct dev_iommu. Replacing it by a mm pointer in struct iommu_domain simplifies the driver and seems to work Fair enough. +struct iommu_sva_ioas { + struct mm_struct *mm; + ioasid_t pasid; + + /* Counter of domains attached to this ioas. */ + refcount_t users; + + /* All bindings are linked here. */ + struct list_head bonds; +}; By moving @mm to struct iommu_domain and @bonds to struct dev_iommu, the code looks simpler. The mm, sva domain and per-device dev_iommu are guaranteed to be valid during bind() and unbind(). Will head this direction in the next version. Thanks, Jean Best regards, baolu - struct iommu_domain (IOMMU_DOMAIN_SVA type) Represent a hardware pagetable that the IOMMU hardware could use for SVA translation. Multiple iommu domains could be bound with an SVA ioas and each grabs a refcount from ioas in order to make sure ioas could only be freed after all domains have been unbound. - struct iommu_sva Represent a bond relationship between an SVA ioas and an iommu domain. If a bond already exists, it's reused and a reference is taken. 
Signed-off-by: Lu Baolu --- include/linux/iommu.h | 14 +- drivers/iommu/iommu-sva-lib.h | 1 + drivers/iommu/iommu-sva-lib.c | 18 ++ 3 files changed, 32 insertions(+), 1 deletion(-) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index ab36244d4e94..f582f434c513 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -42,6 +42,7 @@ struct notifier_block; struct iommu_sva; struct iommu_fault_event; struct iommu_dma_cookie; +struct iommu_sva_ioas; /* iommu fault flags */ #define IOMMU_FAULT_READ 0x0 @@ -64,6 +65,9 @@ struct iommu_domain_geometry { #define __IOMMU_DOMAIN_PT (1U << 2) /* Domain is identity mapped */ #define __IOMMU_DOMAIN_DMA_FQ (1U << 3) /* DMA-API uses flush queue */ +#define __IOMMU_DOMAIN_SHARED (1U << 4) /* Page table shared from CPU */ +#define __IOMMU_DOMAIN_HOST_VA (1U << 5) /* Host CPU virtual address */ + /* * This are the possible domain-types * @@ -86,6 +90,8 @@ struct iommu_domain_geometry { #define IOMMU_DOMAIN_DMA_FQ (__IOMMU_DOMAIN_PAGING |\ __IOMMU_DOMAIN_DMA_API | \ __IOMMU_DOMAIN_DMA_FQ) +#define IOMMU_DOMAIN_SVA (__IOMMU_DOMAIN_SHARED |\ +__IOMMU_DOMAIN_HOST_VA) struct iommu_domain { unsigned type; @@ -95,6 +101,7 @@ struct iommu_domain { void *handler_token; struct iommu_domain_geometry geometry; struct iommu_dma_cookie *iova_cookie; + struct iommu_sva_ioas *sva_ioas; }; static inline bool iommu_is_dma_domain(struct iommu_domain *domain) @@ -628,7 +635,12 @@ struct iommu_fwspec { * struct iommu_sva - handle to a device-mm bond */ struct iommu_sva { - struct device *dev; + struct device *dev; + struct iommu_sva_ioas *sva_ioas; + struct iommu_domain *domain; + /* Link to sva ioas's bonds list */ + struct list_head node; + refcount_t users; }; int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode, diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h index 8909ea1094e3..9c5e108e2c8a 100644 --- a/drivers/iommu/iommu-sva-lib.h +++ b/drivers/iommu/iommu-sva-lib.h @@ -10,6
+10,7 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max); struct mm_struct *iommu_sva_find(ioasid_t pasid); +struct mm_struct *iommu_sva_domain_mm(struct iommu_domain *domain); /* I/O Page fault */ struct device; diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c index 106506143896..d524a402be3b 100644 --- a/drivers/iommu/iommu-sva-lib.c +++ b/drivers/iommu/iommu-sva-lib.c @@ -3,6 +3,8 @@ * Helpers for IOMMU drivers implementing SVA */ #include +#include +#include #include #include "iommu-sva-lib.h" @@ -10,6 +12,22 @@ static DEFINE_MUTEX(iommu_sva_lock); static DECLARE_IOASID_SET(iommu_sva_pasid); +struct iommu_sva_ioas { + struct mm_struct *mm; + ioasid_t pasid; + + /* Counter of domains attached to this ioas. */ + refcount_t users; + + /* All bindings are linked here. */ + struct list_head bonds; +}; +
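The lifetime rule Jean and Baolu settle on — grab "mm->mm_count" when the ioas (or, after the rework, the SVA domain) is created, and drop it on release, so the address space metadata can outlive the task but never outlive its last user — can be sketched with a toy refcount in plain C. The `toy_*` names below are illustrative stand-ins, not the kernel's mmgrab()/mmdrop() helpers:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the mm_count-style lifetime rule: every bond grabs a
 * reference at bind() and drops it at unbind(); the structure may only
 * be torn down once the count returns to zero. */
struct toy_mm {
    int mm_count;
    bool freed;
};

static void toy_mmgrab(struct toy_mm *mm)
{
    mm->mm_count++;
}

static void toy_mmdrop(struct toy_mm *mm)
{
    if (--mm->mm_count == 0)
        mm->freed = true;   /* the __mmdrop() point */
}
```

The asserts below walk the scenario from the thread: the owner can exit while a bond is still live, and teardown only happens at the final drop.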
Re: [PATCH v5 03/12] iommu: Add attach/detach_dev_pasid domain ops
On 2022/5/4 02:07, Jean-Philippe Brucker wrote: On Mon, May 02, 2022 at 09:48:33AM +0800, Lu Baolu wrote: Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds a pair of common domain ops for this purpose and adds helpers to attach/detach a domain to/from a {device, PASID}. Some buses, like PCI, route packets without considering the PASID value. Thus a DMA target address with PASID might be treated as P2P if the address falls into the MMIO BAR of other devices in the group. To make things simple, these interfaces only apply to devices belonging to the singleton groups, and the singleton is immutable in fabric i.e. not affected by hotplug. Signed-off-by: Lu Baolu Reviewed-by: Jean-Philippe Brucker just a nit below --- include/linux/iommu.h | 21 drivers/iommu/iommu.c | 76 +++ 2 files changed, 97 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index b8ffaf2cb1d0..ab36244d4e94 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -263,6 +263,8 @@ struct iommu_ops { * struct iommu_domain_ops - domain specific operations * @attach_dev: attach an iommu domain to a device * @detach_dev: detach an iommu domain from a device + * @attach_dev_pasid: attach an iommu domain to a pasid of device + * @detach_dev_pasid: detach an iommu domain from a pasid of device * @map: map a physically contiguous memory region to an iommu domain * @map_pages: map a physically contiguous set of pages of the same size to * an iommu domain. 
@@ -283,6 +285,10 @@ struct iommu_ops { struct iommu_domain_ops { int (*attach_dev)(struct iommu_domain *domain, struct device *dev); void (*detach_dev)(struct iommu_domain *domain, struct device *dev); + int (*attach_dev_pasid)(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid); + void (*detach_dev_pasid)(struct iommu_domain *domain, +struct device *dev, ioasid_t pasid); int (*map)(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot, gfp_t gfp); @@ -678,6 +684,10 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); +int iommu_attach_device_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid); +void iommu_detach_device_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid); #else /* CONFIG_IOMMU_API */ struct iommu_ops {}; @@ -1051,6 +1061,17 @@ static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group) { return false; } + +static inline int iommu_attach_device_pasid(struct iommu_domain *domain, + struct device *dev, ioasid_t pasid) +{ + return -ENODEV; +} + +static inline void iommu_detach_device_pasid(struct iommu_domain *domain, +struct device *dev, ioasid_t pasid) +{ +} #endif /* CONFIG_IOMMU_API */ /** diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 29906bc16371..89c9d19ddb28 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -38,6 +38,7 @@ struct iommu_group { struct kobject kobj; struct kobject *devices_kobj; struct list_head devices; + struct xarray pasid_array; struct mutex mutex; void *iommu_data; void (*iommu_data_release)(void *iommu_data); @@ -630,6 +631,7 @@ struct iommu_group *iommu_group_alloc(void) mutex_init(&group->mutex); INIT_LIST_HEAD(&group->devices); INIT_LIST_HEAD(&group->entry); + xa_init(&group->pasid_array); ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL); if (ret < 0) { @@
-3190,3 +3192,77 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group) return user; } EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed); + +/* + * Use standard PCI bus topology and isolation features to check immutable + * singleton. Otherwise, assume the bus is static and then singleton can + * know from the device count in the group. + */ The comment doesn't really add anything that can't be directly understood from the code. Yes. It's fine to remove it. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 02/12] iommu: Add pasid_bits field in struct dev_iommu
On 2022/5/4 02:02, Jean-Philippe Brucker wrote: On Mon, May 02, 2022 at 09:48:32AM +0800, Lu Baolu wrote: Use this field to save the pasid/ssid bits that a device is able to support with its IOMMU hardware. It is a generic attribute of a device and lifting it into the per-device dev_iommu struct makes it possible to allocate a PASID for a device without calls into the IOMMU drivers. Any iommu driver which supports PASID related features should set this field before the features are enabled on the devices. For initialization of this field in the VT-d driver, the info->pasid_supported is only set for PCI devices. So the status is that non-PCI SVA hasn't been supported yet. Setting this field only for PCI devices has no functional change. Signed-off-by: Lu Baolu Reviewed-by: Jean-Philippe Brucker Thank you! I really appreciate your review of my patches. Best regards, baolu
Re: [PATCH] iommu: iommu_group_claim_dma_owner() must always assign a domain
On 2022/5/4 22:38, Jason Gunthorpe wrote: @@ -3180,7 +3181,9 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner) ret = -EPERM; goto unlock_out; } else { - if (group->domain && group->domain != group->default_domain) { + if (group->domain && + group->domain != group->default_domain && + group->domain != group->blocking_domain) { Why do we need this hunk? This is just trying to check if there is some conflict with some other domain attach, group->domain can never be blocking_domain here. This is *not* needed. Also verified with real hardware. Sorry for the noise. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/5] iommu/vt-d: Set SNP bit only in second-level page table entries
On 2022/5/4 21:31, Jason Gunthorpe wrote: On Wed, May 04, 2022 at 03:25:50PM +0800, Baolu Lu wrote: Hi Jason, On 2022/5/2 21:05, Jason Gunthorpe wrote: On Sun, May 01, 2022 at 07:24:31PM +0800, Lu Baolu wrote: The SNP bit is only valid for second-level PTEs. Setting this bit in the first-level PTEs has no functional impact because the Intel IOMMU always ignores the same bit in first-level PTEs. Anyway, let's check the page table type before setting SNP bit in PTEs to make the code more readable. Shouldn't this be tested before setting force_snooping and not during every map? The check is in the following patch. This just makes sure that SNP is only set in second-level page table entries. I think you should add a 2nd flag to indicate 'set SNP bit in PTEs' and take care of computing that flag in the enforce_no_snoop function Yours looks better. Will add it in the next version. Jason Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu: iommu_group_claim_dma_owner() must always assign a domain
Hi Jason, On 2022/5/4 08:11, Jason Gunthorpe wrote: Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_attach_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_attach_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai Signed-off-by: Robin Murphy Signed-off-by: Jason Gunthorpe --- drivers/iommu/iommu.c | 112 +++--- 1 file changed, 82 insertions(+), 30 deletions(-) This is based on Robins draft here: https://lore.kernel.org/linux-iommu/18831161-473f-e04f-4a81-1c7062ad1...@arm.com/ With some rework. I re-organized the call chains instead of introducing iommu_group_user_attached(), fixed a recursive locking for iommu_group_get_purgatory(), and made a proper commit message. 
Still only compile tested, so RFCish. Nicolin/Lu? What do you think, can you check it? Thank you for the patch. With below additional changes, this patch works on my Intel test machine. diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 513da82f2ed1..7c415e9b6906 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2063,7 +2063,8 @@ static int __iommu_attach_group(struct iommu_domain *domain, { int ret; - if (group->domain && group->domain != group->default_domain) + if (group->domain && group->domain != group->default_domain && + group->domain != group->blocking_domain) return -EBUSY; ret = __iommu_group_for_each_dev(group, domain, @@ -2125,7 +2126,7 @@ static int __iommu_group_attach_domain(struct iommu_group *group, * Note that this is called in error unwind paths, attaching to a * domain that has already been attached cannot fail. */ - ret = __iommu_group_for_each_dev(group, group->default_domain, + ret = __iommu_group_for_each_dev(group, new_domain, iommu_group_do_attach_device); if (ret) return ret; @@ -3180,7 +3181,9 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner) ret = -EPERM; goto unlock_out; } else { - if (group->domain && group->domain != group->default_domain) { + if (group->domain && + group->domain != group->default_domain && + group->domain != group->blocking_domain) { ret = -EBUSY; goto unlock_out; Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
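The invariant this patch establishes — once a group is claimed for user ownership it is always attached to *some* domain, falling back to a DMA-blocking domain rather than to NULL or to the default domain — can be modeled as a tiny state machine in userspace C. All `toy_*` names below are hypothetical; they only mirror the ownership logic, not the real driver plumbing:

```c
#include <assert.h>

/* Toy domains standing in for the group's default, blocking, and
 * caller-supplied domains. */
enum which_domain { DOM_DEFAULT, DOM_BLOCKING, DOM_USER };

struct toy_group {
    enum which_domain domain;
    int owned;
};

static void toy_claim(struct toy_group *g)
{
    g->owned = 1;
    g->domain = DOM_BLOCKING;   /* immediately isolate DMA */
}

static void toy_attach_user(struct toy_group *g)
{
    g->domain = DOM_USER;
}

/* While owned, detach falls back to the blocking domain -- never to
 * NULL and never to the default domain. */
static void toy_detach(struct toy_group *g)
{
    g->domain = g->owned ? DOM_BLOCKING : DOM_DEFAULT;
}

static void toy_release(struct toy_group *g)
{
    g->owned = 0;
    g->domain = DOM_DEFAULT;
}
```

This is exactly the property that fixes the SMMUv3 oops: after iommu_detach_group() the group holds the blocking domain, so freeing the caller's domain leaves no stale pointer behind.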
Re: [PATCH] iommu: iommu_group_claim_dma_owner() must always assign a domain
Hi Jason, On 2022/5/4 08:11, Jason Gunthorpe wrote: +static int __iommu_group_attach_domain(struct iommu_group *group, + struct iommu_domain *new_domain) { int ret; + if (group->domain == new_domain) + return 0; + /* -* If the group has been claimed already, do not re-attach the default -* domain. +* A NULL domain means to call the detach_dev() op. New drivers should +* use a IOMMU_DOMAIN_IDENTITY domain instead of a NULL default_domain +* and detatch_dev(). */ - if (!group->default_domain || group->owner) { - __iommu_group_for_each_dev(group, domain, + if (!new_domain) { + WARN_ON(!group->domain->ops->detach_dev); + __iommu_group_for_each_dev(group, group->domain, iommu_group_do_detach_device); group->domain = NULL; - return; + return 0; } - if (group->domain == group->default_domain) - return; - - /* Detach by re-attaching to the default domain */ + /* +* New drivers do not implement detach_dev, so changing the domain is +* done by calling attach on the new domain. Drivers should implement +* this so that DMA is always translated by either the new, old, or a +* blocking domain. DMA should never become untranslated. +* +* Note that this is called in error unwind paths, attaching to a +* domain that has already been attached cannot fail. +*/ ret = __iommu_group_for_each_dev(group, group->default_domain, ^^ I suppose this should be @new_domain, right? iommu_group_do_attach_device); - if (ret != 0) - WARN_ON(1); + if (ret) + return ret; + group->domain = new_domain; + return 0; +} Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 5/5] iommu/vt-d: Remove hard coding PGSNP bit in PASID entries
On 2022/5/2 21:19, Jason Gunthorpe wrote: On Sun, May 01, 2022 at 07:24:34PM +0800, Lu Baolu wrote: As enforce_cache_coherency has been introduced into the iommu_domain_ops, the kernel component which owns the iommu domain is able to opt-in its requirement for force snooping support. The iommu driver has no need to hard code the page snoop control bit in the PASID table entries anymore. Signed-off-by: Lu Baolu --- drivers/iommu/intel/pasid.h | 1 - drivers/iommu/intel/iommu.c | 3 --- drivers/iommu/intel/pasid.c | 6 -- 3 files changed, 10 deletions(-) It seems fine, but as in the other email where do we do pasid_set_pgsnp() for a new device attach on an already no-snopp domain? Yes. I will take care of this in the next version. Jason Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 4/5] iommu/vt-d: Remove domain_update_iommu_snooping()
On 2022/5/3 05:36, Jacob Pan wrote: Hi BaoLu, On Sun, 1 May 2022 19:24:33 +0800, Lu Baolu wrote: The IOMMU force snooping capability is not required to be consistent among all the IOMMUs anymore. Remove force snooping capability check in the IOMMU hot-add path and domain_update_iommu_snooping() becomes a dead code now. Signed-off-by: Lu Baolu --- drivers/iommu/intel/iommu.c | 34 +- 1 file changed, 1 insertion(+), 33 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 3c1c228f9031..d5808495eb64 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -533,33 +533,6 @@ static void domain_update_iommu_coherency(struct dmar_domain *domain) rcu_read_unlock(); } -static bool domain_update_iommu_snooping(struct intel_iommu *skip) -{ - struct dmar_drhd_unit *drhd; - struct intel_iommu *iommu; - bool ret = true; - - rcu_read_lock(); - for_each_active_iommu(iommu, drhd) { - if (iommu != skip) { - /* -* If the hardware is operating in the scalable mode, -* the snooping control is always supported since we -* always set PASID-table-entry.PGSNP bit if the domain -* is managed outside (UNMANAGED). -*/ - if (!sm_supported(iommu) && - !ecap_sc_support(iommu->ecap)) { - ret = false; - break; - } - } - } - rcu_read_unlock(); - - return ret; -} - static int domain_update_iommu_superpage(struct dmar_domain *domain, struct intel_iommu *skip) { @@ -3593,12 +3566,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru) iommu->name); return -ENXIO; } - if (!ecap_sc_support(iommu->ecap) && - domain_update_iommu_snooping(iommu)) { - pr_warn("%s: Doesn't support snooping.\n", - iommu->name); - return -ENXIO; - } + Maybe I missed earlier patches, so this bit can also be deleted? 
struct dmar_domain { u8 iommu_snooping: 1; /* indicate snooping control feature */ It has been cleaned up by below commit: 71cfafda9c9b vfio: Move the Intel no-snoop control off of IOMMU_CACHE sp = domain_update_iommu_superpage(NULL, iommu) - 1; if (sp >= 0 && !(cap_super_page_val(iommu->cap) & (1 << sp))) { pr_warn("%s: Doesn't support large page.\n", Thanks, Jacob Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/5] iommu/vt-d: Check domain force_snooping against attached devices
On 2022/5/3 05:31, Jacob Pan wrote: Hi BaoLu, Hi Jacob, On Sun, 1 May 2022 19:24:32 +0800, Lu Baolu wrote: As domain->force_snooping only impacts the devices attached with the domain, there's no need to check against all IOMMU units. At the same time, for a brand new domain (hasn't been attached to any device), the force_snooping field could be set, but the attach_dev callback will return failure if it wants to attach to a device which IOMMU has no snoop control capability. Signed-off-by: Lu Baolu --- drivers/iommu/intel/pasid.h | 2 ++ drivers/iommu/intel/iommu.c | 50 - drivers/iommu/intel/pasid.c | 18 + 3 files changed, 69 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h index ab4408c824a5..583ea67fc783 100644 --- a/drivers/iommu/intel/pasid.h +++ b/drivers/iommu/intel/pasid.h @@ -123,4 +123,6 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu, bool fault_ignore); int vcmd_alloc_pasid(struct intel_iommu *iommu, u32 *pasid); void vcmd_free_pasid(struct intel_iommu *iommu, u32 pasid); +void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu, + struct device *dev, u32 pasid); #endif /* __INTEL_PASID_H */ diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 98050943d863..3c1c228f9031 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4554,13 +4554,61 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain, return phys; } +static bool domain_support_force_snooping(struct dmar_domain *domain) +{ + struct device_domain_info *info; + unsigned long flags; + bool support = true; + + spin_lock_irqsave(&device_domain_lock, flags); + if (list_empty(&domain->devices)) + goto out; + + list_for_each_entry(info, &domain->devices, link) { + if (!ecap_sc_support(info->iommu->ecap)) { + support = false; + break; + } + } why not just check the flag dmar_domain->force_snooping? devices wouldn't be able to attach if !ecap_sc, right?
I should check "dmar_domain->force_snooping" first. If this is the first time that this flag is about to be set, then check the capabilities. +out: + spin_unlock_irqrestore(&device_domain_lock, flags); + return support; +} + +static void domain_set_force_snooping(struct dmar_domain *domain) +{ + struct device_domain_info *info; + unsigned long flags; + + /* +* Second level page table supports per-PTE snoop control. The +* iommu_map() interface will handle this by setting SNP bit. +*/ + if (!domain_use_first_level(domain)) + return; + + spin_lock_irqsave(&device_domain_lock, flags); + if (list_empty(&domain->devices)) + goto out_unlock; + + list_for_each_entry(info, &domain->devices, link) + intel_pasid_setup_page_snoop_control(info->iommu, info->dev, +PASID_RID2PASID); + I guess other DMA API PASIDs need to have the sc bit set as well. I will keep this in mind for my DMA API PASID patch. Kernel DMA doesn't need to set the PGSNP bit. The x86 arch is always DMA coherent. The force snooping is only needed when the device is controlled by user space, but the VMM is optimized not to support the virtualization of the wbinvd instruction. +out_unlock: + spin_unlock_irqrestore(&device_domain_lock, flags); +} + static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain) { struct dmar_domain *dmar_domain = to_dmar_domain(domain); - if (!domain_update_iommu_snooping(NULL)) + if (!domain_support_force_snooping(dmar_domain)) return false; + + domain_set_force_snooping(dmar_domain); dmar_domain->force_snooping = true; + nit: spurious change I expect a blank line before return in the end. return true; } diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c index f8d215d85695..815c744e6a34 100644 --- a/drivers/iommu/intel/pasid.c +++ b/drivers/iommu/intel/pasid.c @@ -762,3 +762,21 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu, return 0; } + +/* + * Set the page snoop control for a pasid entry which has been set up.
+ */ +void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu, + struct device *dev, u32 pasid) +{ + struct pasid_entry *pte; + u16 did; + + pte = intel_pasid_get_entry(dev, pasid); + if (WARN_ON(!pte || !pasid_pte_is_present(pte))) + return; + + pasid_set_pgsnp(pte); + did = pasid_get_domain_id(pte); + pasid_flush_caches(iommu, pte, pasid, did); +} Thanks, Jacob Best regards, baolu
Re: [PATCH 3/5] iommu/vt-d: Check domain force_snooping against attached devices
On 2022/5/2 21:17, Jason Gunthorpe wrote: On Sun, May 01, 2022 at 07:24:32PM +0800, Lu Baolu wrote: +static bool domain_support_force_snooping(struct dmar_domain *domain) +{ + struct device_domain_info *info; + unsigned long flags; + bool support = true; + + spin_lock_irqsave(&device_domain_lock, flags); + if (list_empty(&domain->devices)) + goto out; Why? list_for_each_entry will just do nothing.. Yes. I will remove the above two lines. + list_for_each_entry(info, &domain->devices, link) { + if (!ecap_sc_support(info->iommu->ecap)) { + support = false; + break; + } + } +out: + spin_unlock_irqrestore(&device_domain_lock, flags); + return support; +} + +static void domain_set_force_snooping(struct dmar_domain *domain) +{ + struct device_domain_info *info; + unsigned long flags; + + /* +* Second level page table supports per-PTE snoop control. The +* iommu_map() interface will handle this by setting SNP bit. +*/ + if (!domain_use_first_level(domain)) + return; + + spin_lock_irqsave(&device_domain_lock, flags); + if (list_empty(&domain->devices)) + goto out_unlock; + + list_for_each_entry(info, &domain->devices, link) + intel_pasid_setup_page_snoop_control(info->iommu, info->dev, +PASID_RID2PASID); + +out_unlock: + spin_unlock_irqrestore(&device_domain_lock, flags); +} + static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain) { struct dmar_domain *dmar_domain = to_dmar_domain(domain); - if (!domain_update_iommu_snooping(NULL)) + if (!domain_support_force_snooping(dmar_domain)) return false; Maybe exit early if force_snooping = true? Yes, I should check "force_snooping = true" and return directly if force_snooping has already been set. As you pointed out below, the new domain_attach should take care of this flag as well. Thanks!
+ domain_set_force_snooping(dmar_domain); dmar_domain->force_snooping = true; + return true; } diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c index f8d215d85695..815c744e6a34 100644 +++ b/drivers/iommu/intel/pasid.c @@ -762,3 +762,21 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu, return 0; } + +/* + * Set the page snoop control for a pasid entry which has been set up. + */ So the 'first level' is only used with pasid? Yes. A fake pasid (RID2PASID in spec) is used for legacy transactions (those w/o pasid). +void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu, + struct device *dev, u32 pasid) +{ + struct pasid_entry *pte; + u16 did; + + pte = intel_pasid_get_entry(dev, pasid); + if (WARN_ON(!pte || !pasid_pte_is_present(pte))) + return; + + pasid_set_pgsnp(pte); Doesn't this need to be done in other places too, like when a new attach is made? Patch 5 removed it, but should that be made if domain->force_snooping? Yes. I missed this. Will take care of this in the next version. Jason Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
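The review outcome above — return early when force_snooping is already set, and only set the flag after verifying that every attached device's IOMMU supports snoop control — can be sketched in plain C. The `toy_*` types below are hypothetical; the real code walks `device_domain_info` entries under `device_domain_lock` and checks `ecap_sc_support()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a domain's attached-device list. */
struct toy_dev {
    bool sc_capable;          /* stands in for ecap_sc_support() */
    struct toy_dev *next;
};

struct toy_domain {
    struct toy_dev *devices;  /* singly linked list of attached devs */
    bool force_snooping;
};

/* Return early if already enforced; otherwise enable only when every
 * attached device supports snoop control. */
static bool toy_enforce_cache_coherency(struct toy_domain *d)
{
    struct toy_dev *dev;

    if (d->force_snooping)
        return true;          /* the early-exit Jason asked for */
    for (dev = d->devices; dev; dev = dev->next)
        if (!dev->sc_capable)
            return false;     /* one incapable device vetoes it */
    d->force_snooping = true;
    return true;
}
```

Note how the flag acts as a latch: once set, later calls succeed without rescanning, and (in the real driver) later attaches must honor it instead.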
Re: [PATCH 2/5] iommu/vt-d: Set SNP bit only in second-level page table entries
Hi Jason, On 2022/5/2 21:05, Jason Gunthorpe wrote: On Sun, May 01, 2022 at 07:24:31PM +0800, Lu Baolu wrote: The SNP bit is only valid for second-level PTEs. Setting this bit in the first-level PTEs has no functional impact because the Intel IOMMU always ignores the same bit in first-level PTEs. Anyway, let's check the page table type before setting SNP bit in PTEs to make the code more readable. Shouldn't this be tested before setting force_snooping and not during every map? The check is in the following patch. This just makes sure that SNP is only set in second-level page table entries. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/30 06:19, Fenghua Yu wrote: Hi, Jean and Baolu, On Fri, Apr 29, 2022 at 03:34:36PM +0100, Jean-Philippe Brucker wrote: On Fri, Apr 29, 2022 at 06:51:17AM -0700, Fenghua Yu wrote: Hi, Baolu, On Fri, Apr 29, 2022 at 03:53:57PM +0800, Baolu Lu wrote: On 2022/4/28 16:39, Jean-Philippe Brucker wrote: The address space is what the OOM killer is after. That gets refcounted with mmget()/mmput()/mm->mm_users. The OOM killer is satiated by the page freeing done in __mmput()->exit_mmap(). Also, all the VMAs should be gone after exit_mmap(). So, even if vma->vm_file was holding a reference to a device driver, that reference should be gone by the time __mmdrop() is actually freeing the PASID. I agree with all that. The concern was about tearing down the PASID in the IOMMU and device from the release() MMU notifier, which would happen in exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And since the IOMMU drivers need to hold mm->mm_count anyway between bind() and unbind(), I think Fenghua's fix works. But I didn't find mmgrab()/mmdrop() get called in both arm and intel IOMMU drivers. $ git grep mmgrab drivers/iommu/ [no output] Do we need to add these in a separated fix patch, or I missed anything here? On both ARM and X86, sva_bind() calls mmu_notifier_register()->mmgrab() and sva_unbind() calls mmu_notifier_unregister()/mmu_notifier_put()->mmdrop(). Yes, although for Arm I realized the mmu_notifier grab wasn't sufficient so I sent a separate fix that should go in 5.18 as well https://lore.kernel.org/linux-iommu/20220426130444.300556-1-jean-phili...@linaro.org/ The Arm driver still touches the arch mm context after mmu_notifier_put(). I don't think X86 has that problem. I think so too. On X86, the mm is not used after mmu_notifier_unregister(). In case of supervisor mode SVM (i.e. svm is bound to init_mm), the code is right too because init_mm and its PASID cannot be dropped and mmu_notifier_register()/mmu_notifier_unregister() are not called. 
So I think no extra fix patch is needed on X86. Thanks, Fenghua and Jean. It's clear to me now. Jean, another quick question about the iommu_sva_bind_device() /** * iommu_sva_bind_device() - Bind a process address space to a device * @dev: the device * @mm: the mm to bind, caller must hold a reference to it * @drvdata: opaque data pointer to pass to bind callback This interface requires the caller to take a reference to mm. Which reference should it take, mm->mm_count or mm->mm_users? It's better to make it explicit in this comment. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH RFC 18/19] iommu/intel: Access/Dirty bit support for SL domains
On 2022/4/29 05:09, Joao Martins wrote: --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -5089,6 +5089,113 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain, } } +static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain, + bool enable) +{ + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + struct device_domain_info *info; + unsigned long flags; + int ret = -EINVAL; if (domain_use_first_level(dmar_domain)) return -EOPNOTSUPP; + + spin_lock_irqsave(&device_domain_lock, flags); + if (list_empty(&dmar_domain->devices)) { + spin_unlock_irqrestore(&device_domain_lock, flags); + return ret; + } I agreed with Kevin's suggestion in his reply. + + list_for_each_entry(info, &dmar_domain->devices, link) { + if (!info->dev || (info->domain != dmar_domain)) + continue; This check is redundant. + + /* Dirty tracking is second-stage level SM only */ + if ((info->domain && domain_use_first_level(info->domain)) || + !ecap_slads(info->iommu->ecap) || + !sm_supported(info->iommu) || !intel_iommu_sm) { + ret = -EOPNOTSUPP; + continue; Perhaps break and return -EOPNOTSUPP directly here? We are not able to support a mixed mode, right? + } + + ret = intel_pasid_setup_dirty_tracking(info->iommu, info->domain, +info->dev, PASID_RID2PASID, +enable); + if (ret) + break; + } + spin_unlock_irqrestore(&device_domain_lock, flags); + + /* +* We need to flush the context TLB and IOTLB with any cached translations +* to force the incoming DMA requests to have their IOTLB entries tagged +* with A/D bits +*/ + intel_flush_iotlb_all(domain); + return ret; +} Best regards, baolu
Re: [PATCH RFC 04/19] iommu: Add an unmap API that returns dirtied IOPTEs
On 2022/4/29 05:09, Joao Martins wrote: Today, the dirty state is lost and the page wouldn't be migrated to the destination, potentially leading the guest into error. Add an unmap API that reads the dirty bit and sets it in the user passed bitmap. This unmap iommu API tackles a potentially racy update to the dirty bit *when* doing DMA on an iova that is being unmapped at the same time. The new unmap_read_dirty/unmap_pages_read_dirty does not replace unmap_pages, but rather is only used when explicitly called with a dirty bitmap passed in. It could be said that the guest is buggy, and rather than a special unmap path tackling the theoretical race ... it would suffice to fetch the dirty bits (with GET_DIRTY_IOVA) and then unmap the IOVA. I am not sure whether this API could solve the race. size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) { struct iommu_iotlb_gather iotlb_gather; size_t ret; iommu_iotlb_gather_init(&iotlb_gather); ret = __iommu_unmap(domain, iova, size, &iotlb_gather); iommu_iotlb_sync(domain, &iotlb_gather); return ret; } The PTEs are cleared before the iotlb invalidation. What if a DMA write happens after the PTE clearing and before the iotlb invalidation, while the PTE happens to be cached? Best regards, baolu
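Baolu's concern is that a plain clear-then-read sequence leaves a window in which a dirty bit set by an in-flight DMA write is lost. At the page-table level, the unmap_read_dirty idea narrows that window by harvesting the dirty bit with the very operation that clears the PTE (e.g. an atomic exchange); the cached-IOTLB window Baolu points out is a separate hardware-visibility issue that only the invalidation can close. A userspace C11 sketch of the atomic harvest — the PTE layout below is a toy, not the VT-d format:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PTE_PRESENT (1u << 0)
#define PTE_DIRTY   (1u << 1)

/* Toy PTE: clearing the entry and reading its dirty bit happen as one
 * atomic step, so a device write that raced with the unmap (and
 * reached the PTE before the clear) is still reported to the caller's
 * bitmap rather than silently dropped. */
static int toy_unmap_read_dirty(_Atomic uint32_t *pte)
{
    uint32_t old = atomic_exchange(pte, 0);   /* clear + harvest */
    return (old & PTE_DIRTY) != 0;
}
```

With a non-atomic "clear, then read the old value" pair, a hardware A/D update landing between the two steps could be lost; the exchange makes the two observations one.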
Re: [PATCH RFC 03/19] iommufd: Dirty tracking data support
On 2022/4/29 05:09, Joao Martins wrote:

Add an IO pagetable API iopt_read_and_clear_dirty_data() that performs the reading of dirty IOPTEs for a given IOVA range and then copies back to userspace from each area-internal bitmap. Underneath it uses the IOMMU equivalent API which will read the dirty bits, as well as atomically clearing the IOPTE dirty bit and flushing the IOTLB at the end. The dirty bitmaps pass an iotlb_gather to allow batching the dirty-bit updates.

Most of the complexity, though, is in the handling of the user bitmaps to avoid copies back and forth. The bitmap user addresses need to be iterated through, pinned and then the pages passed into the iommu core. The amount of bitmap data passed at a time for a read_and_clear_dirty() is 1 page worth of pinned base page pointers. That equates to 16M bits, or rather 64G of data that can be returned as 'dirtied'. The IOTLB is flushed at the end of the whole scanned IOVA range, to defer as much as possible the potential DMA performance penalty.
Signed-off-by: Joao Martins
---
 drivers/iommu/iommufd/io_pagetable.c    | 169 ++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  44 ++++++
 2 files changed, 213 insertions(+)

diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index f4609ef369e0..835b5040fce9 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include

 #include "io_pagetable.h"

@@ -347,6 +348,174 @@ int iopt_set_dirty_tracking(struct io_pagetable *iopt,
 	return ret;
 }

+int iommufd_dirty_iter_init(struct iommufd_dirty_iter *iter,
+			    struct iommufd_dirty_data *bitmap)
+{
+	struct iommu_dirty_bitmap *dirty = &iter->dirty;
+	unsigned long bitmap_len;
+
+	bitmap_len = dirty_bitmap_bytes(bitmap->length >> dirty->pgshift);
+
+	import_single_range(WRITE, bitmap->data, bitmap_len,
+			    &iter->bitmap_iov, &iter->bitmap_iter);
+	iter->iova = bitmap->iova;
+
+	/* Can record up to 64G at a time */
+	dirty->pages = (struct page **) __get_free_page(GFP_KERNEL);
+
+	return !dirty->pages ? -ENOMEM : 0;
+}
+
+void iommufd_dirty_iter_free(struct iommufd_dirty_iter *iter)
+{
+	struct iommu_dirty_bitmap *dirty = &iter->dirty;
+
+	if (dirty->pages) {
+		free_page((unsigned long) dirty->pages);
+		dirty->pages = NULL;
+	}
+}
+
+bool iommufd_dirty_iter_done(struct iommufd_dirty_iter *iter)
+{
+	return iov_iter_count(&iter->bitmap_iter) > 0;
+}
+
+static inline unsigned long iommufd_dirty_iter_bytes(struct iommufd_dirty_iter *iter)
+{
+	unsigned long left = iter->bitmap_iter.count - iter->bitmap_iter.iov_offset;
+
+	left = min_t(unsigned long, left, (iter->dirty.npages << PAGE_SHIFT));
+
+	return left;
+}
+
+unsigned long iommufd_dirty_iova_length(struct iommufd_dirty_iter *iter)
+{
+	unsigned long left = iommufd_dirty_iter_bytes(iter);
+
+	return ((BITS_PER_BYTE * left) << iter->dirty.pgshift);
+}
+
+unsigned long iommufd_dirty_iova(struct iommufd_dirty_iter *iter)
+{
+	unsigned long skip = iter->bitmap_iter.iov_offset;
+
+	return iter->iova + ((BITS_PER_BYTE * skip) << iter->dirty.pgshift);
+}
+
+void iommufd_dirty_iter_advance(struct iommufd_dirty_iter *iter)
+{
+	iov_iter_advance(&iter->bitmap_iter, iommufd_dirty_iter_bytes(iter));
+}
+
+void iommufd_dirty_iter_put(struct iommufd_dirty_iter *iter)
+{
+	struct iommu_dirty_bitmap *dirty = &iter->dirty;
+
+	if (dirty->npages)
+		unpin_user_pages(dirty->pages, dirty->npages);
+}
+
+int iommufd_dirty_iter_get(struct iommufd_dirty_iter *iter)
+{
+	struct iommu_dirty_bitmap *dirty = &iter->dirty;
+	unsigned long npages;
+	unsigned long ret;
+	void *addr;
+
+	addr = iter->bitmap_iov.iov_base + iter->bitmap_iter.iov_offset;
+	npages = iov_iter_npages(&iter->bitmap_iter,
+				 PAGE_SIZE / sizeof(struct page *));
+
+	ret = pin_user_pages_fast((unsigned long) addr, npages,
+				  FOLL_WRITE, dirty->pages);
+	if (ret <= 0)
+		return -EINVAL;
+
+	dirty->npages = ret;
+	dirty->iova = iommufd_dirty_iova(iter);
+	dirty->start_offset = offset_in_page(addr);
+	return 0;
+}
+
+static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
+				      struct iommufd_dirty_data *bitmap)

This looks more like a helper in the iommu core. How about iommufd_read_clear_domain_dirty()?

+{
+	const struct iommu_domain_ops *ops = domain->ops;
+	struct iommu_iotlb_gather gather;
+	struct iommufd_dirty_iter iter;
+	int ret = 0;
+
+	if (!ops || !ops->read_and_clear_dirty)
+		return -EOPNOTSUPP;
+
+	iommu_dirty_bitmap_init(&iter.dirty, bitmap->iova,
+
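For orientation, the calling sequence these iterator helpers appear to be designed for is: init, then while not done { get (pin bitmap pages), read_and_clear over the covered IOVA span, put (unpin), advance }. The toy model below captures just the chunked-walk arithmetic in plain user-space C; all names are hypothetical and the pinning and IOMMU calls are reduced to comments:

```c
#include <stddef.h>

/* Toy model of the iterator: walks a bitmap in fixed-size chunks. */
struct dirty_iter {
	size_t total;	/* bytes of bitmap to process */
	size_t chunk;	/* max bytes handled per iteration */
	size_t done;	/* bytes processed so far */
};

static void iter_init(struct dirty_iter *it, size_t total, size_t chunk)
{
	it->total = total;
	it->chunk = chunk;
	it->done = 0;
}

static int iter_done(const struct dirty_iter *it)
{
	return it->done >= it->total;
}

static size_t iter_bytes(const struct dirty_iter *it)
{
	size_t left = it->total - it->done;

	return left < it->chunk ? left : it->chunk;
}

static void iter_advance(struct dirty_iter *it)
{
	it->done += iter_bytes(it);	/* mirrors iov_iter_advance() */
}

/* The calling sequence the patch appears to intend; returns iterations. */
static int walk_bitmap(size_t total, size_t chunk)
{
	struct dirty_iter it;
	int rounds = 0;

	iter_init(&it, total, chunk);
	while (!iter_done(&it)) {
		/* get: pin up to one page worth of bitmap page pointers */
		/* ... ops->read_and_clear_dirty() over iter_bytes() ... */
		/* put: unpin the bitmap pages */
		iter_advance(&it);
		rounds++;
	}
	return rounds;
}
```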
Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable
On 2022/4/29 05:09, Joao Martins wrote:

+int iopt_set_dirty_tracking(struct io_pagetable *iopt,
+			    struct iommu_domain *domain, bool enable)
+{
+	struct iommu_domain *dom;
+	unsigned long index;
+	int ret = -EOPNOTSUPP;
+
+	down_write(&iopt->iova_rwsem);
+	if (!domain) {
+		down_write(&iopt->domains_rwsem);
+		xa_for_each(&iopt->domains, index, dom) {
+			ret = iommu_set_dirty_tracking(dom, iopt, enable);
+			if (ret < 0)
+				break;

Do you need to roll back to the original state before returning failure? Some of the domains will already have had dirty bit tracking enabled.

+		}
+		up_write(&iopt->domains_rwsem);
+	} else {
+		ret = iommu_set_dirty_tracking(domain, iopt, enable);
+	}
+
+	up_write(&iopt->iova_rwsem);
+	return ret;
+}

Best regards,
baolu
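A user-space sketch of the roll-back shape being suggested here (toy types, not the iommufd code): on a mid-loop failure, walk back over the domains that were already switched and restore their previous state:

```c
#include <errno.h>
#include <stdbool.h>

/* Toy domain: a flag records whether dirty tracking is on; one may fail. */
struct toy_domain {
	bool tracking;
	bool broken;
};

static int set_tracking(struct toy_domain *d, bool enable)
{
	if (enable && d->broken)
		return -EOPNOTSUPP;
	d->tracking = enable;
	return 0;
}

/*
 * Enable on every domain; on failure, walk back over the domains that
 * were already switched and restore them, so the caller never observes
 * a half-enabled set.
 */
static int set_tracking_all(struct toy_domain *doms, int n, bool enable)
{
	int i, ret = 0;

	for (i = 0; i < n; i++) {
		ret = set_tracking(&doms[i], enable);
		if (ret < 0)
			break;
	}

	if (ret < 0) {
		while (--i >= 0)	/* roll back the ones already flipped */
			set_tracking(&doms[i], !enable);
	}

	return ret;
}
```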
Re: [PATCH RFC 01/19] iommu: Add iommu_domain ops for dirty tracking
Hi Joao,

Thanks for doing this.

On 2022/4/29 05:09, Joao Martins wrote:

Add to iommu domain operations a set of callbacks to perform dirty tracking, particularly to start and stop tracking and finally to test and clear the dirty data. Drivers are expected to dynamically change their hw protection domain bits to toggle the tracking and flush some form of control state structure that stands in the IOVA translation path.

For reading and clearing dirty data, in all IOMMUs a transition from any of the PTE access bits (Access, Dirty) implies flushing the IOTLB to invalidate any stale data in the IOTLB as to whether or not the IOMMU should update the said PTEs. The iommu core APIs introduce a new structure for storing the dirties, albeit vendor IOMMUs implementing .read_and_clear_dirty() just use iommu_dirty_bitmap_record() to set the memory storing dirties. The underlying tracking/iteration of user bitmap memory is instead done by iommufd which takes care of initializing the dirty bitmap *prior* to passing to the IOMMU domain op.

So far for currently/to-be-supported IOMMUs with dirty tracking support this particularly because the tracking is part of first stage tables and part of address translation. Below it is mentioned how hardware deals with the hardware protection domain control bits, to justify the added iommu core APIs. The vendor IOMMU implementation will also explain in more detail the dirty bit usage/clearing in the IOPTEs.

* x86 AMD: The same thing for AMD, particularly the Device Table respectively, followed by flushing the Device IOTLB. On AMD[1], section "2.2.1 Updating Shared Tables", e.g.: Each table can also have its contents cached by the IOMMU or peripheral IOTLBs. Therefore, after updating a table entry that can be cached, system software must send the IOMMU an appropriate invalidate command. Information in the peripheral IOTLBs must also be invalidated. There's no mention of particular bits that are cached or not, but fetching a dev entry is part of address translation as also depicted, so invalidate the device table to make sure the next translations fetch a DTE entry with the HD bits set.

* x86 Intel (rev3.0+): Likewise[2] set the SSADE bit in the scalable-mode second-stage table entry to enable Access/Dirty bits in the second-stage page table. See the manual, particularly "6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations": When modifying root-entries, scalable-mode root-entries, context-entries, or scalable-mode context entries: Software must serially invalidate the context-cache, PASID-cache (if applicable), and the IOTLB. The serialization is required since hardware may utilize information from the context-caches (e.g., Domain-ID) to tag new entries inserted to the PASID-cache and IOTLB for processing in-flight requests. Section 6.5 describes the invalidation operations. Table 23, "Guidance to Software for Invalidations" (in "6.5.3.3 Guidance to Software for Invalidations"), explicitly mentions the SSADE transition from 0 to 1 in a scalable-mode PASID-table entry with a PGTT value of Second-stage or Nested.

* ARM SMMUv3.2: SMMUv3.2 needs to toggle the dirty bit descriptor over the CD (or S2CD) for toggling and flush/invalidate the IOMMU dev IOTLB. Reference[0]: SMMU spec, "5.4.1 CD notes": The following CD fields are permitted to be cached as part of a translation or TLB entry, and alteration requires invalidation of any TLB entry that might have cached these fields, in addition to CD structure cache invalidation: ... HA, HD ...

The ARM SMMUv3 case, though, is a tad different from its x86 counterparts. Rather than changing *only* the IOMMU domain device entry to enable dirty tracking (and having a dedicated bit for dirtiness in the IOPTE), ARM instead uses a dirty-bit modifier which is separately enabled, and changes the *existing* meaning of access bits (for ro/rw), to the point that marking the access bit read-only but with the dirty-bit modifier enabled doesn't trigger a perm io page fault. In practice this means that changing the iommu context isn't enough and in fact mostly useless IIUC (and can be always enabled). Dirtying is only really enabled when the DBM pte bit is enabled (with the CD.HD bit as a prereq). To capture this h/w construct an iommu core API is added which enables dirty tracking on an IOVA range rather than a device/context entry. iommufd picks one or the other, and the IOMMUFD core will favour the device-context op followed by the IOVA-range alternative.

Instead of specification words, I'd like to read more about why the callbacks are needed and how they should be implemented and consumed.

[0] https://developer.arm.com/documentation/ihi0070/latest
[1] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
[2] https://cdrdv2.intel.com/v1/dl/getContent/671081

Signed-off-by: Joao Martins
---
 drivers/iommu/iommu.c      | 28
 include/linux/io-pgtable.h |  6 +
 include/linux/iommu.h      | 52
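The iommu_dirty_bitmap_record() idea described above (drivers just mark IOVA ranges dirty; the core owns the user bitmap) boils down to mapping an IOVA range onto bit positions, one bit per IOVA page. A hedged user-space model of that arithmetic follows; the layout and names are illustrative, not the patch's exact structures:

```c
#include <stddef.h>
#include <stdint.h>

#define PGSHIFT 12	/* 4K IOVA pages, one bit each */

/* Toy dirty bitmap: bit 0 represents the page at base_iova. */
struct dirty_bitmap {
	uint64_t base_iova;
	unsigned long *bits;
	size_t nbits;
};

/* Mark [iova, iova + length) dirty in the bitmap. */
static void dirty_bitmap_record(struct dirty_bitmap *d,
				uint64_t iova, uint64_t length)
{
	uint64_t start = (iova - d->base_iova) >> PGSHIFT;
	uint64_t npages = length >> PGSHIFT;

	for (uint64_t i = 0; i < npages && start + i < d->nbits; i++) {
		size_t idx = start + i;

		d->bits[idx / (8 * sizeof(unsigned long))] |=
			1UL << (idx % (8 * sizeof(unsigned long)));
	}
}

static int test_bit_at(const struct dirty_bitmap *d, size_t idx)
{
	return !!(d->bits[idx / (8 * sizeof(unsigned long))] &
		  (1UL << (idx % (8 * sizeof(unsigned long)))));
}
```

A driver's .read_and_clear_dirty() would call the record helper for each IOPTE it finds dirty, leaving all user-memory handling to the core.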
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/28 16:39, Jean-Philippe Brucker wrote:

On Tue, Apr 26, 2022 at 04:31:57PM -0700, Dave Hansen wrote:
On 4/26/22 09:48, Jean-Philippe Brucker wrote:
On Tue, Apr 26, 2022 at 08:27:00AM -0700, Dave Hansen wrote:
On 4/25/22 09:40, Jean-Philippe Brucker wrote:

The problem is that we'd have to request the device driver to stop DMA before we can destroy the context and free the PASID. We did consider doing this in the release() MMU notifier, but there were concerns about blocking mmput() for too long (for example https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa79...@amd.com/ though I think there was a more recent discussion). We also need to drain the PRI and fault queues to get rid of all references to that PASID.

Is the concern truly about blocking mmput() itself? Or, is it about releasing the resources associated with the mm?

The latter I think, this one was about releasing pages as fast as possible if the process is picked by the OOM killer.

We're tying the PASID to the life of the mm itself, not the mm's address space. That means the PASID should be tied to mmgrab()/mmdrop()/mm->mm_count. The address space is what the OOM killer is after. That gets refcounted with mmget()/mmput()/mm->mm_users. The OOM killer is satiated by the page freeing done in __mmput()->exit_mmap(). Also, all the VMAs should be gone after exit_mmap(). So, even if vma->vm_file was holding a reference to a device driver, that reference should be gone by the time __mmdrop() is actually freeing the PASID.

I agree with all that. The concern was about tearing down the PASID in the IOMMU and device from the release() MMU notifier, which would happen in exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And since the IOMMU drivers need to hold mm->mm_count anyway between bind() and unbind(), I think Fenghua's fix works.

But I didn't find mmgrab()/mmdrop() being called in either the arm or intel IOMMU drivers.

$ git grep mmgrab drivers/iommu/
[no output]

Do we need to add these in a separate fix patch, or did I miss anything here?

Best regards,
baolu
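The fix under discussion reduces to refcount pairing on mm_count: bind() takes a reference with mmgrab() that unbind() releases with mmdrop(), so the mm_struct (and the PASID tied to it) outlives exit_mmap(). A toy user-space model of just that pairing, using stand-in types rather than the kernel API:

```c
#include <stddef.h>

/* Toy mm: the two counters mirror the real mm_struct's refcounts. */
struct toy_mm {
	int mm_count;	/* pins the struct itself (mmgrab/mmdrop) */
	int mm_users;	/* pins the address space (mmget/mmput) */
};

static void toy_mmgrab(struct toy_mm *mm) { mm->mm_count++; }
static void toy_mmdrop(struct toy_mm *mm) { mm->mm_count--; }

struct toy_bond {
	struct toy_mm *mm;
};

/* bind: PASID setup must pin the mm for the lifetime of the bond. */
static void sva_bind(struct toy_bond *b, struct toy_mm *mm)
{
	toy_mmgrab(mm);
	b->mm = mm;
}

/* unbind: teardown drops the pin; the PASID is freed no later than here. */
static void sva_unbind(struct toy_bond *b)
{
	toy_mmdrop(b->mm);
	b->mm = NULL;
}
```

Because only mm_count is held, the OOM killer can still reclaim the address space through mm_users/exit_mmap() while a bond exists.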
Re: [PATCH v2 03/14] iommu: Move bus setup to IOMMU device registration
Hi Robin,

On 2022/4/28 21:18, Robin Murphy wrote:

Move the bus setup to iommu_device_register(). This should allow bus_iommu_probe() to be correctly replayed for multiple IOMMU instances, and leaves bus_set_iommu() as a glorified no-op to be cleaned up next.

I re-fetched the latest patches on https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus and rolled back the head to "iommu: Cleanup bus_set_iommu". The test machine still hangs during boot.

I went through the code. It seems that the .probe_device callback of the Intel IOMMU driver can't handle the probe replay well. It always assumes that the device has never been probed.

Best regards,
baolu

At this point we can also handle cleanup better than just rolling back the most-recently-touched bus upon failure - which may release devices owned by other already-registered instances, and still leave devices on other buses with dangling pointers to the failed instance. Now it's easy to clean up the exact footprint of a given instance, no more, no less.

Tested-by: Marek Szyprowski
Signed-off-by: Robin Murphy
---
 drivers/iommu/iommu.c | 51 +++
 1 file changed, 27 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6c4621afc8cf..c89af4dc54c2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -175,6 +175,14 @@ static int __init iommu_subsys_init(void)
 }
 subsys_initcall(iommu_subsys_init);

+static int remove_iommu_group(struct device *dev, void *data)
+{
+	if (dev->iommu && dev->iommu->iommu_dev == data)
+		iommu_release_device(dev);
+
+	return 0;
+}
+
 /**
  * iommu_device_register() - Register an IOMMU hardware instance
  * @iommu: IOMMU handle for the instance
@@ -197,12 +205,29 @@ int iommu_device_register(struct iommu_device *iommu,
 	spin_lock(&iommu_device_lock);
 	list_add_tail(&iommu->list, &iommu_device_list);
 	spin_unlock(&iommu_device_lock);
+
+	for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) {
+		struct bus_type *bus = iommu_buses[i];
+		int err;
+
+		WARN_ON(bus->iommu_ops && bus->iommu_ops != ops);
+		bus->iommu_ops = ops;
+		err = bus_iommu_probe(bus);
+		if (err) {
+			iommu_device_unregister(iommu);
+			return err;
+		}
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_device_register);

 void iommu_device_unregister(struct iommu_device *iommu)
 {
+	for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++)
+		bus_for_each_dev(iommu_buses[i], NULL, iommu, remove_iommu_group);
+
 	spin_lock(&iommu_device_lock);
 	list_del(&iommu->list);
 	spin_unlock(&iommu_device_lock);
@@ -1655,13 +1680,6 @@ static int probe_iommu_group(struct device *dev, void *data)
 	return ret;
 }

-static int remove_iommu_group(struct device *dev, void *data)
-{
-	iommu_release_device(dev);
-
-	return 0;
-}
-
 static int iommu_bus_notifier(struct notifier_block *nb,
			       unsigned long action, void *data)
 {
@@ -1884,27 +1902,12 @@ static int iommu_bus_init(struct bus_type *bus)
  */
 int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops)
 {
-	int err;
-
-	if (ops == NULL) {
-		bus->iommu_ops = NULL;
-		return 0;
-	}
-
-	if (bus->iommu_ops != NULL)
+	if (bus->iommu_ops && ops && bus->iommu_ops != ops)
 		return -EBUSY;

 	bus->iommu_ops = ops;
-	/* Do IOMMU specific setup for this bus-type */
-	err = bus_iommu_probe(bus);
-	if (err) {
-		/* Clean up */
-		bus_for_each_dev(bus, NULL, NULL, remove_iommu_group);
-		bus->iommu_ops = NULL;
-	}
-
-	return err;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(bus_set_iommu);
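Given the hang, one likely requirement is that .probe_device become idempotent under replay: a second bus_iommu_probe() pass over an already-probed device should be a no-op rather than a re-initialization. A minimal user-space sketch of that guard shape, with all names invented for illustration:

```c
#include <stddef.h>

/* Toy per-device IOMMU instance state. */
struct toy_iommu_dev {
	int init_count;		/* counts one-time initializations */
};

struct toy_device {
	struct toy_iommu_dev *iommu;	/* set on first successful probe */
};

static struct toy_iommu_dev instance;

/* An idempotent probe: replaying it must not re-initialize the device. */
static struct toy_iommu_dev *probe_device(struct toy_device *dev)
{
	if (dev->iommu)		/* already probed by an earlier replay */
		return dev->iommu;

	instance.init_count++;	/* one-time per-device setup */
	dev->iommu = &instance;
	return dev->iommu;
}
```

A driver whose probe path assumes "never probed before" (as the Intel driver apparently does here) would double-initialize on the replay instead of taking the early-return path.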
Re: [PATCH v4 11/12] iommu: Per-domain I/O page fault handling
On 2022/4/28 22:57, Jean-Philippe Brucker wrote:

On Thu, Apr 21, 2022 at 01:21:20PM +0800, Lu Baolu wrote:

 static void iopf_handle_group(struct work_struct *work)
 {
 	struct iopf_group *group;
@@ -134,12 +78,23 @@ static void iopf_handle_group(struct work_struct *work)
 	group = container_of(work, struct iopf_group, work);

 	list_for_each_entry_safe(iopf, next, &group->faults, list) {
+		struct iommu_domain *domain;
+
+		domain = iommu_get_domain_for_dev_pasid_async(group->dev,
+							      iopf->fault.prm.pasid);

Reading the PCIe spec again (v6.0 10.4.1.1 PASID Usage), all faults within the group have the same PASID so we could move the domain fetch out of the loop. It does deviate from the old behavior, though, so we could change it later.

Perhaps we can add a pasid member to the struct iopf_group and do a sanity check when a new iopf is added to the group? Here, we just fetch the domain with group->pasid.

Best regards,
baolu
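The suggestion above - give struct iopf_group a pasid member, sanity-check each fault as it is queued, and then fetch the domain once per group - could look roughly like this. These are toy user-space types; the real iopf structures differ:

```c
#include <errno.h>

#define MAX_FAULTS 8
#define INVALID_PASID ((unsigned int)-1)

struct toy_fault {
	unsigned int pasid;
};

/* The suggested shape: the group carries the PASID all its faults share. */
struct toy_iopf_group {
	unsigned int pasid;
	struct toy_fault faults[MAX_FAULTS];
	int nr_faults;
};

static void group_init(struct toy_iopf_group *g)
{
	g->pasid = INVALID_PASID;
	g->nr_faults = 0;
}

/* Sanity-check each queued fault against the group's PASID. */
static int group_add_fault(struct toy_iopf_group *g, struct toy_fault f)
{
	if (g->nr_faults == 0)
		g->pasid = f.pasid;
	else if (f.pasid != g->pasid)
		return -EINVAL;	/* PCIe: all faults in a group share a PASID */

	if (g->nr_faults >= MAX_FAULTS)
		return -ENOSPC;
	g->faults[g->nr_faults++] = f;
	return 0;
}
```

The fault handler can then resolve the domain once from g->pasid instead of once per fault in the loop.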
Re: [PATCH v4 03/12] iommu: Add attach/detach_dev_pasid domain ops
On 2022/4/28 22:53, Jean-Philippe Brucker wrote:

On Thu, Apr 21, 2022 at 01:21:12PM +0800, Lu Baolu wrote:

Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but are not limited to):

- SVA (Shared Virtual Address)
- kernel DMA with PASID
- hardware-assisted mediated device

This adds a pair of common domain ops for this purpose and adds helpers to attach/detach a domain to/from a {device, PASID}. Some buses, like PCI, route packets without considering the PASID value. Thus a DMA target address with PASID might be treated as P2P if the address falls into the MMIO BAR of other devices in the group. To make things simple, these interfaces only apply to devices belonging to singleton groups, and the singleton is immutable in fabric, i.e. not affected by hotplug.

Signed-off-by: Lu Baolu

[...]

+/*
+ * Use standard PCI bus topology, isolation features, and DMA
+ * alias quirks to check the immutable singleton attribute. If
+ * the device came from DT, assume it is static and then
+ * singleton can be known from the device count in the group.
+ */
+static bool device_group_immutable_singleton(struct device *dev)
+{
+	struct iommu_group *group = iommu_group_get(dev);
+	int count;
+
+	if (!group)
+		return false;
+
+	mutex_lock(&group->mutex);
+	count = iommu_group_device_count(group);
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+
+	if (count != 1)
+		return false;
+
+	if (dev_is_pci(dev)) {
+		struct pci_dev *pdev = to_pci_dev(dev);
+
+		/*
+		 * The device could be considered to be fully isolated if
+		 * all devices on the path from the device to the host-PCI
+		 * bridge are protected from peer-to-peer DMA by ACS.
+		 */
+		if (!pci_acs_path_enabled(pdev, NULL, REQ_ACS_FLAGS))
+			return false;
+
+		/* Filter out devices which have any alias device. */
+		if (pci_for_each_dma_alias(pdev, has_pci_alias, pdev))
+			return false;

Aren't aliases already added to the group by pci_device_group()? If so it's part of the count check above.

You are right. pci_device_group() has already covered pci aliases.

+
+		return true;
+	}
+
+	/*
+	 * If the device came from DT, assume it is static and then
+	 * singleton can be known from the device count in the group.
+	 */
+	return is_of_node(dev_fwnode(dev));

I don't think DT is relevant here because a platform device enumerated through ACPI will also have its own group. It should be safe to stick to what the IOMMU drivers declare with their device_group() callback. Except for PCI those groups should be immutable so we can return true here.

Fair enough. My code is too restrictive. The group singleton is immutable as long as the fabric is static or ACS (or similar) technology is implemented. Currently we only need to care about PCI as far as I can see.

Thanks,
Jean

Best regards,
baolu
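The agreed simplification can be summarized as: trust the driver's device_group() result for non-PCI devices (their singleton groups are treated as immutable), and for PCI additionally require an ACS-protected path to the host bridge. As a toy decision table in user-space C, with pci_acs_path_enabled() reduced to a boolean stand-in:

```c
#include <stdbool.h>

/* Toy device: just the facts the singleton check cares about. */
struct toy_device {
	bool is_pci;
	bool acs_path_enabled;	/* stand-in for pci_acs_path_enabled() */
	int group_device_count;
};

/*
 * Simplified immutable-singleton test: a non-PCI singleton group is
 * trusted as-is; a PCI device additionally needs an ACS-protected path.
 * DMA aliases need no separate check because pci_device_group() already
 * folds aliases into the group, so they show up in the device count.
 */
static bool group_immutable_singleton(const struct toy_device *dev)
{
	if (dev->group_device_count != 1)
		return false;

	if (dev->is_pci)
		return dev->acs_path_enabled;

	return true;
}
```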
Re: [PATCH v4 10/12] iommu: Prepare IOMMU domain for IOPF
Hi Jean,

On 2022/4/28 22:47, Jean-Philippe Brucker wrote:

Hi Baolu,

On Thu, Apr 21, 2022 at 01:21:19PM +0800, Lu Baolu wrote:

+/*
+ * Get the attached domain for asynchronous usage, for example the I/O
+ * page fault handling framework. The caller gets a reference counter
+ * of the domain automatically on a successful return and should put
+ * it with iommu_domain_put() after usage.
+ */
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid_async(struct device *dev, ioasid_t pasid)
+{
+	struct iommu_domain *domain;
+	struct iommu_group *group;
+
+	if (!pasid_valid(pasid))
+		return NULL;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return NULL;
+
+	mutex_lock(&group->mutex);

There is a possible deadlock between unbind() and the fault handler:

 unbind()                         iopf_handle_group()
  mutex_lock(&group->mutex)
  iommu_detach_device_pasid()
   iopf_queue_flush_dev()          iommu_get_domain_for_dev_pasid_async()
    ... waits for IOPF work         mutex_lock(&group->mutex)

Yes, really.

I was wrong in my previous review: we do have a guarantee that the SVA domain does not go away during IOPF handling, because unbind() waits for pending faults with iopf_queue_flush_dev() before freeing the domain (or for Arm stall, knows that there are no pending faults). So we can just get rid of domain->async_users and the group->mutex in IOPF, I think?

Agreed with you. The Intel code does the same thing in its unbind(). Thus, the sva domain's life cycle has already been synchronized with the IOPF handling; there's no need for domain->async_users. I will drop it in the next version. Thank you!

Best regards,
baolu
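The lifetime argument that resolves the deadlock can be modeled as drain-before-free: the fault path takes no lock and no reference, and unbind() is the side that waits for all pending fault work before freeing the domain. A toy sequential model of that ordering invariant (the real code involves a workqueue; here the "work items completing" are simulated inline):

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy SVA domain with a count of in-flight page-fault work items. */
struct toy_domain {
	int pending_faults;
	bool freed;
};

/* The fault path "borrows" the domain without locks or refcounts... */
static struct toy_domain *fault_begin(struct toy_domain *d)
{
	d->pending_faults++;
	return d->freed ? NULL : d;
}

static void fault_end(struct toy_domain *d)
{
	d->pending_faults--;
}

/* ...because unbind() drains all pending fault work before freeing. */
static void sva_unbind(struct toy_domain *d)
{
	/* iopf_queue_flush_dev(): wait until no fault work references d */
	while (d->pending_faults > 0)
		fault_end(d);	/* stand-in for each work item finishing */

	d->freed = true;	/* only now is freeing the domain safe */
}
```

This is why dropping both group->mutex and the async refcount from the fault path is sound: the flush in unbind() alone upholds the invariant.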