Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-24 Thread Baolu Lu

On 2022/5/24 21:44, Jason Gunthorpe wrote:

+struct iommu_domain *
+iommu_sva_alloc_domain(struct bus_type *bus, struct mm_struct *mm)
+{
+   struct iommu_sva_domain *sva_domain;
+   struct iommu_domain *domain;
+
+   if (!bus->iommu_ops || !bus->iommu_ops->sva_domain_ops)
+   return ERR_PTR(-ENODEV);
+
+   sva_domain = kzalloc(sizeof(*sva_domain), GFP_KERNEL);
+   if (!sva_domain)
+   return ERR_PTR(-ENOMEM);
+
+   mmgrab(mm);
+   sva_domain->mm = mm;
+
+   domain = &sva_domain->domain;
+   domain->type = IOMMU_DOMAIN_SVA;
+   domain->ops = bus->iommu_ops->sva_domain_ops;
+
+   return domain;
+}
+
+void iommu_sva_free_domain(struct iommu_domain *domain)
+{
+   struct iommu_sva_domain *sva_domain = to_sva_domain(domain);
+
+   mmdrop(sva_domain->mm);
+   kfree(sva_domain);
+}

No callback to the driver?


I will do this in the next version. This version added an SVA-specific
iommu_domain_ops pointer in iommu_ops, which is not the right way to go.
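A minimal sketch of that direction (illustrative only: it assumes the
driver implements the generic domain_alloc() callback for the
IOMMU_DOMAIN_SVA type, and that dev_iommu_ops() resolves the device's
iommu_ops; the final API may differ):

struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
					    struct mm_struct *mm)
{
	const struct iommu_ops *ops = dev_iommu_ops(dev);
	struct iommu_domain *domain;

	/* The driver allocates the domain and sets domain->ops itself. */
	domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
	if (!domain)
		return NULL;

	/* Hold a reference to the mm for the lifetime of the domain. */
	mmgrab(mm);
	to_sva_domain(domain)->mm = mm;

	return domain;
}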




+int iommu_sva_set_domain(struct iommu_domain *domain, struct device *dev,
+			 ioasid_t pasid)
+{

Why does this function exist? Just call iommu_set_device_pasid()


Yes, agreed.




+int iommu_set_device_pasid(struct iommu_domain *domain, struct device *dev,
+  ioasid_t pasid)
+{

Here you can continue to use attach/detach language, as at this API
level we expect strict pairing.


Sure.
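For reference, the strictly paired interfaces would then presumably keep
the familiar shape (a sketch, not the final signatures):

int iommu_attach_device_pasid(struct iommu_domain *domain,
			      struct device *dev, ioasid_t pasid);
void iommu_detach_device_pasid(struct iommu_domain *domain,
			       struct device *dev, ioasid_t pasid);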





+void iommu_block_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid)
+{
+   struct iommu_group *group = iommu_group_get(dev);
+
+   mutex_lock(&group->mutex);
+   domain->ops->block_dev_pasid(domain, dev, pasid);
+   xa_erase(&group->pasid_array, pasid);
+   mutex_unlock(&group->mutex);

Should be the blocking domain.


As we discussed, we should change the above to use the blocking domain
once blocking domains are supported by at least the Intel and
arm-smmu-v3 drivers. I have started that work for the Intel driver.
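A rough sketch of the blocking-domain variant (assuming a per-group
blocking domain becomes available; this illustrates the agreed
direction, not the final implementation):

void iommu_detach_device_pasid(struct iommu_domain *domain,
			       struct device *dev, ioasid_t pasid)
{
	struct iommu_group *group = iommu_group_get(dev);

	mutex_lock(&group->mutex);
	/* Stop DMA on this PASID by attaching the blocking domain. */
	group->blocking_domain->ops->set_dev_pasid(group->blocking_domain,
						   dev, pasid);
	xa_erase(&group->pasid_array, pasid);
	mutex_unlock(&group->mutex);
	iommu_group_put(group);
}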

Best regards,
baolu



Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-24 Thread Baolu Lu

Hi Jason,

On 2022/5/24 21:44, Jason Gunthorpe wrote:

diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 106506143896..210c376f6043 100644
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -69,3 +69,51 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid)
return ioasid_find(&iommu_sva_pasid, pasid, __mmget_not_zero);
  }
  EXPORT_SYMBOL_GPL(iommu_sva_find);
+
+/*
+ * IOMMU SVA driver-oriented interfaces
+ */
+struct iommu_domain *
+iommu_sva_alloc_domain(struct bus_type *bus, struct mm_struct *mm)

This should return the proper type



Can you please elaborate a bit on "return the proper type"? Did you mean
returning iommu_sva_domain instead? That is a wrapper around
iommu_domain and is only for iommu-internal usage.

Best regards,
baolu


Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-24 Thread Baolu Lu

On 2022/5/24 17:39, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Thursday, May 19, 2022 3:21 PM

The iommu_sva_domain represents a hardware pagetable that the IOMMU
hardware could use for SVA translation. This adds some infrastructure
to support SVA domain in the iommu common layer. It includes:

- Add a new struct iommu_sva_domain and new IOMMU_DOMAIN_SVA
domain
   type.
- Add a new domain ops pointer in iommu_ops. The IOMMU drivers that
   support SVA should provide the callbacks.
- Add helpers to allocate and free an SVA domain.
- Add helpers to set an SVA domain to a device and the reverse
   operation.

Some buses, like PCI, route packets without considering the PASID value.
Thus a DMA target address with PASID might be treated as P2P if the
address falls into the MMIO BAR of other devices in the group. To make
things simple, the attach/detach interfaces only apply to devices
belonging to the singleton groups, and the singleton is immutable in
fabric i.e. not affected by hotplug.

The iommu_set/block_device_pasid() can be used for other purposes,
such as kernel DMA with pasid, mediation device, etc. Hence, it is put
in the iommu.c.


usually we have 'set/clear' pair or 'allow/block'. Having 'set' paired
with 'block' doesn't read very clearly.


Yes. Let's still use the attach/detach semantics.




+static bool device_group_immutable_singleton(struct device *dev)
+{
+   struct iommu_group *group = iommu_group_get(dev);


what about passing group as the parameter since the caller will
get the group again right after calling this function? In that case
the function could be renamed as:

iommu_group_immutable_singleton()

or be shorter:

iommu_group_fixed_singleton()


Fair enough. I will tune it as below:

+static bool iommu_group_immutable_singleton(struct iommu_group *group,
+   struct device *dev)
+{
+   int count;
+
+   mutex_lock(&group->mutex);
+   count = iommu_group_device_count(group);
+   mutex_unlock(&group->mutex);
+
+   if (count != 1)
+   return false;
+
+   /*
+* The PCI device could be considered to be fully isolated if all
+* devices on the path from the device to the host-PCI bridge are
+* protected from peer-to-peer DMA by ACS.
+*/
+   if (dev_is_pci(dev))
+   return pci_acs_path_enabled(to_pci_dev(dev), NULL,
+   REQ_ACS_FLAGS);
+
+   /*
+* Otherwise, the device came from DT/ACPI; assume it is static, so
+* the singleton state can be inferred from the device count in the
+* group.
+*/
+   return true;
+}





+   int count;
+
+   if (!group)
+   return false;
+
+   mutex_lock(&group->mutex);
+   count = iommu_group_device_count(group);
+   mutex_unlock(&group->mutex);
+   iommu_group_put(group);
+
+   if (count != 1)
+   return false;


For non-PCI devices, the above doesn't check anything about
immutability. Please add a comment to explain why doing so is correct.


Yes, as above code shows.




+
+   /*
+* The PCI device could be considered to be fully isolated if all
+* devices on the path from the device to the host-PCI bridge are
+* protected from peer-to-peer DMA by ACS.
+*/
+   if (dev_is_pci(dev))
+   return pci_acs_path_enabled(to_pci_dev(dev), NULL,
+   REQ_ACS_FLAGS);
+
+   return true;
+}
+


Best regards,
baolu


Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-24 Thread Baolu Lu

On 2022/5/25 08:44, Tian, Kevin wrote:

From: Jason Gunthorpe 
Sent: Tuesday, May 24, 2022 9:39 PM

On Tue, May 24, 2022 at 09:39:52AM +, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Thursday, May 19, 2022 3:21 PM

The iommu_sva_domain represents a hardware pagetable that the

IOMMU

hardware could use for SVA translation. This adds some infrastructure
to support SVA domain in the iommu common layer. It includes:

- Add a new struct iommu_sva_domain and new IOMMU_DOMAIN_SVA
domain
   type.
- Add a new domain ops pointer in iommu_ops. The IOMMU drivers that
   support SVA should provide the callbacks.
- Add helpers to allocate and free an SVA domain.
- Add helpers to set an SVA domain to a device and the reverse
   operation.

Some buses, like PCI, route packets without considering the PASID value.
Thus a DMA target address with PASID might be treated as P2P if the
address falls into the MMIO BAR of other devices in the group. To make
things simple, the attach/detach interfaces only apply to devices
belonging to the singleton groups, and the singleton is immutable in
fabric i.e. not affected by hotplug.

The iommu_set/block_device_pasid() can be used for other purposes,
such as kernel DMA with pasid, mediation device, etc. Hence, it is put
in the iommu.c.


usually we have 'set/clear' pair or 'allow/block'. Having 'set' paired
with 'block' doesn't read very clearly.


I thought we agreed we'd use the blocking domain for this? Why did it
go back to an op?



Probably it's based on following discussion:

https://lore.kernel.org/all/c8492b29-bc27-ae12-d5c4-9fbbc797e...@linux.intel.com/

--

FWIW from my point of view I'm happy with having a .detach_dev_pasid op
meaning implicitly-blocked access for now.


If this is the path then lets not call it attach/detach
please. 'set_dev_pasid' and 'set_dev_blocking_pasid' are clearer
names.


Yes. Following the above discussion, we are going to implement
set_dev_pasid and the blocking domain in parallel. We will convert all
the callback names to set_dev and set_dev_pasid after blocking domain
support is merged.


--

Looks Baolu chooses this path and plans to use the blocking domain
later.


Yes. I have already started to implement the blocking domain in the
Intel driver. With it as an example, we can extend support to other
IOMMU drivers.

Best regards,
baolu


Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-24 Thread Baolu Lu

On 2022/5/24 17:44, Tian, Kevin wrote:

From: Baolu Lu 
Sent: Monday, May 23, 2022 3:13 PM

@@ -254,6 +259,7 @@ struct iommu_ops {
int (*def_domain_type)(struct device *dev);

const struct iommu_domain_ops *default_domain_ops;
+   const struct iommu_domain_ops *sva_domain_ops;


Per Joerg's comment in anther thread,

https://lore.kernel.org/linux-iommu/yodvj7ervpidw...@8bytes.org/

adding a sva_domain_ops here is not the right way to go.

If no objection, I will make the sva domain go through the
generic domain_alloc/free() callbacks in the next version.



I suppose it's just back to what v1-v6 did, which registered the ops
in the domain_alloc() callback?


Yes.

Best regards,
baolu


Re: [PATCH v7 01/10] iommu: Add pasids field in struct iommu_device

2022-05-24 Thread Baolu Lu

On 2022/5/25 10:03, Baolu Lu wrote:

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 4de960834a1b..1c3cf267934d 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1126,6 +1126,10 @@ static int alloc_iommu(struct dmar_drhd_unit
*drhd)

  raw_spin_lock_init(>register_lock);

+    /* Supports full 20-bit PASID in scalable mode. */
+    if (ecap_pasid(iommu->ecap))
+    iommu->iommu.pasids = 1UL << 20;
+


The supported PASID width is reported by ecap_pss(). I don't think we
should assume 20 bits here.


Yes, I overlooked this. Thanks for the reminder.

Another thing I need to improve is that scalable mode could be disabled.
This field should be 0 in that case.


I will change above to:

+	/*
+	 * A value of N in the PSS field of the eCap register indicates
+	 * hardware supports a PASID field of N+1 bits.
+	 */
+	if (pasid_supported(iommu))
+		iommu->iommu.max_pasids = 2UL << ecap_pss(iommu->ecap);
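(As a sanity check of the formula: ecap_pss() == 19 means 20-bit PASIDs,
and 2UL << 19 == 1UL << 20, matching the 20-bit architectural maximum
that the earlier version hard-coded.)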

Best regards,
baolu

Re: [PATCH v7 01/10] iommu: Add pasids field in struct iommu_device

2022-05-24 Thread Baolu Lu

Hi Kevin,

Thank you for reviewing my patches.

On 2022/5/24 17:24, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Thursday, May 19, 2022 3:21 PM

Use this field to keep the number of PASIDs that an IOMMU hardware
implementation is able to support. This is a generic attribute of an
IOMMU, and lifting it into the per-IOMMU device structure makes it
possible to allocate a PASID for a device without calls into the IOMMU
drivers. Any iommu driver which supports PASID-related features should
set this field before enabling them on the devices.

Signed-off-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 
---
  include/linux/iommu.h   | 2 ++
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
  drivers/iommu/intel/dmar.c  | 4 
  3 files changed, 7 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e1afe169549..da423e87f248 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -318,12 +318,14 @@ struct iommu_domain_ops {
   * @list: Used by the iommu-core to keep a list of registered iommus
   * @ops: iommu-ops for talking to this iommu
   * @dev: struct device for sysfs handling
+ * @pasids: number of supported PASIDs
   */
  struct iommu_device {
struct list_head list;
const struct iommu_ops *ops;
struct fwnode_handle *fwnode;
struct device *dev;
+   u32 pasids;


max_pasid or nr_pasids?


max_pasid looks better.




  };

  /**
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 88817a3376ef..6e2cd082c670 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3546,6 +3546,7 @@ static int arm_smmu_device_hw_probe(struct
arm_smmu_device *smmu)
/* SID/SSID sizes */
smmu->ssid_bits = FIELD_GET(IDR1_SSIDSIZE, reg);
smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);
+   smmu->iommu.pasids = smmu->ssid_bits;

/*
 * If the SMMU supports fewer bits than would fill a single L2 stream
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 4de960834a1b..1c3cf267934d 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1126,6 +1126,10 @@ static int alloc_iommu(struct dmar_drhd_unit
*drhd)

raw_spin_lock_init(>register_lock);

+   /* Supports full 20-bit PASID in scalable mode. */
+   if (ecap_pasid(iommu->ecap))
+   iommu->iommu.pasids = 1UL << 20;
+


The supported PASID width is reported by ecap_pss(). I don't think we
should assume 20 bits here.


Yes, I overlooked this. Thanks for the reminder.

Another thing I need to improve is that scalable mode could be disabled.
This field should be 0 in that case.




/*
 * This is only for hotplug; at boot time intel_iommu_enabled won't
 * be set yet. When intel_iommu_init() runs, it registers the units
--
2.25.1




Best regards,
baolu


Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-05-24 Thread Baolu Lu

Hi Joerg,

On 2022/5/21 08:21, Yian Chen wrote:

Notifier calling chain uses priority to determine the execution
order of the notifiers or listeners registered to the chain.
PCI bus device hot add utilizes the notification mechanism.

The current code sets a low priority (INT_MIN) for the Intel
dmar_pci_bus_notifier and postpones DMAR decoding until after the
new device has been added to the IOMMU. The result is that the
struct device pointer cannot be found in the DRHD search for the
new device's DMAR/IOMMU. Subsequently, the device is put under the
"catch-all" IOMMU instead of the correct one. This can cause a
system hang when a device TLB invalidation is sent to the wrong
IOMMU. Invalidation timeout errors and hard lockups have been
observed, and data inconsistency/crashes may occur as well.

This patch fixes the issue by setting a positive priority (1) for
dmar_pci_bus_notifier while the IOMMU bus notifier keeps the
default priority (0), so that DMAR decoding happens ahead of the
DRHD search for a new device and the correct IOMMU is found.

Following is a 2-step example that triggers the bug by simulating
PCI device hot add behavior in Intel Sapphire Rapids server.

echo 1 > /sys/bus/pci/devices/:6a:01.0/remove
echo 1 > /sys/bus/pci/rescan

Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
Cc: sta...@vger.kernel.org # v3.15+
Reported-by: Zhang, Bernice 
Signed-off-by: Jacob Pan 
Signed-off-by: Yian Chen 
---
This is a quick fix for the reported bug. Intel internally evaluated
another, redesigned solution that eliminates the DMAR PCI bus notifier
to simplify the PCI hotplug workflow and improve its runtime efficiency.

However, since the fix needs to apply to downstream kernels, and the
complexity of a PCI hotplug workflow change would significantly increase
the engineering effort of backporting it, the choice is to submit this
simple patch to ease deployment of the bug fix.


Yian has been working on using the IOMMU bus notifier to solve this
problem. It turns out that, due to the following facts, we would need to
refactor the IOMMU core and the Intel DMAR code:

- Interrupt remapping also requires the Intel DMAR code. Therefore, even
  when the IOMMU is not enabled, the PCI bus notifier in DMAR is still
  required.
- The IOMMU PCI bus notifier calls .probe_device(), which lacks the
  information about whether the device is hot-added or present at boot.

Considering that the problem described here is serious, because users
can easily break the system by writing sysfs files on some platforms, we
need a quick fix for both upstream and stable kernels. The refactoring
will be discussed in a separate series.

How do you like it? If you agree, I can queue it in my next pull request
for fixes.

Best regards,
baolu


---

  drivers/iommu/intel/dmar.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 4de960834a1b..497c5bd95caf 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -383,7 +383,7 @@ static int dmar_pci_bus_notifier(struct notifier_block *nb,
  
  static struct notifier_block dmar_pci_bus_nb = {

.notifier_call = dmar_pci_bus_notifier,
-   .priority = INT_MIN,
+   .priority = 1,
  };
  
  static struct dmar_drhd_unit *




Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-23 Thread Baolu Lu

On 2022/5/19 15:20, Lu Baolu wrote:

The iommu_sva_domain represents a hardware pagetable that the IOMMU
hardware could use for SVA translation. This adds some infrastructure
to support SVA domain in the iommu common layer. It includes:

- Add a new struct iommu_sva_domain and new IOMMU_DOMAIN_SVA domain
   type.
- Add a new domain ops pointer in iommu_ops. The IOMMU drivers that
   support SVA should provide the callbacks.
- Add helpers to allocate and free an SVA domain.
- Add helpers to set an SVA domain to a device and the reverse
   operation.

Some buses, like PCI, route packets without considering the PASID value.
Thus a DMA target address with PASID might be treated as P2P if the
address falls into the MMIO BAR of other devices in the group. To make
things simple, the attach/detach interfaces only apply to devices
belonging to the singleton groups, and the singleton is immutable in
fabric i.e. not affected by hotplug.

The iommu_set/block_device_pasid() can be used for other purposes,
such as kernel DMA with pasid, mediation device, etc. Hence, it is put
in the iommu.c.

Suggested-by: Jean-Philippe Brucker
Suggested-by: Jason Gunthorpe
Signed-off-by: Lu Baolu
---
  include/linux/iommu.h | 51 +
  drivers/iommu/iommu-sva-lib.h | 15 
  drivers/iommu/iommu-sva-lib.c | 48 +++
  drivers/iommu/iommu.c | 71 +++
  4 files changed, 185 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0c358b7c583b..e8cf82d46ce1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -64,6 +64,9 @@ struct iommu_domain_geometry {
  #define __IOMMU_DOMAIN_PT (1U << 2)  /* Domain is identity mapped   */
  #define __IOMMU_DOMAIN_DMA_FQ (1U << 3)  /* DMA-API uses flush queue*/
  
+#define __IOMMU_DOMAIN_SHARED	(1U << 4)  /* Page table shared from CPU  */

+#define __IOMMU_DOMAIN_HOST_VA (1U << 5)  /* Host CPU virtual address */
+
  /*
   * This are the possible domain-types
   *
@@ -86,6 +89,8 @@ struct iommu_domain_geometry {
  #define IOMMU_DOMAIN_DMA_FQ   (__IOMMU_DOMAIN_PAGING |\
 __IOMMU_DOMAIN_DMA_API |   \
 __IOMMU_DOMAIN_DMA_FQ)
+#define IOMMU_DOMAIN_SVA   (__IOMMU_DOMAIN_SHARED |\
+__IOMMU_DOMAIN_HOST_VA)
  
  struct iommu_domain {

unsigned type;
@@ -254,6 +259,7 @@ struct iommu_ops {
int (*def_domain_type)(struct device *dev);
  
  	const struct iommu_domain_ops *default_domain_ops;

+   const struct iommu_domain_ops *sva_domain_ops;


Per Joerg's comment in anther thread,

https://lore.kernel.org/linux-iommu/yodvj7ervpidw...@8bytes.org/

adding a sva_domain_ops here is not the right way to go.

If no objection, I will make the sva domain go through the
generic domain_alloc/free() callbacks in the next version.

Best regards,
baolu


Re: [PATCH v7 06/10] iommu/sva: Refactoring iommu_sva_bind/unbind_device()

2022-05-22 Thread Baolu Lu

On 2022/5/20 19:28, Jean-Philippe Brucker wrote:

On Fri, May 20, 2022 at 02:38:12PM +0800, Baolu Lu wrote:

On 2022/5/20 00:39, Jean-Philippe Brucker wrote:

+struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct 
*mm)
+{
+   struct iommu_sva_domain *sva_domain;
+   struct iommu_domain *domain;
+   ioasid_t max_pasid = 0;
+   int ret = -EINVAL;
+
+   /* Allocate mm->pasid if necessary. */
+   if (!dev->iommu->iommu_dev->pasids)
+   return ERR_PTR(-EOPNOTSUPP);
+
+   if (dev_is_pci(dev)) {
+   max_pasid = pci_max_pasids(to_pci_dev(dev));
+   if (max_pasid < 0)
+   return ERR_PTR(max_pasid);
+   } else {
+   ret = device_property_read_u32(dev, "pasid-num-bits",
+  &max_pasid);
+   if (ret)
+   return ERR_PTR(ret);
+   max_pasid = (1UL << max_pasid);
+   }

The IOMMU driver needs this PASID width information earlier, when creating
the PASID table (in .probe_device(), .attach_dev()). Since we're moving it
to the IOMMU core to avoid code duplication, it should be done earlier and
stored in dev->iommu

Yes, indeed. How about the changes below?

 From f1382579e8a15ca49acdf758d38fd36451ea174d Mon Sep 17 00:00:00 2001
From: Lu Baolu
Date: Mon, 28 Feb 2022 15:01:35 +0800
Subject: [PATCH 1/1] iommu: Add pasids field in struct dev_iommu

Use this field to save the number of PASIDs that a device is able to
consume. It is a generic attribute of a device and lifting it into the
per-device dev_iommu struct could help to avoid the boilerplate code
in various IOMMU drivers.

Signed-off-by: Lu Baolu
---
  drivers/iommu/iommu.c | 15 +++
  include/linux/iommu.h |  2 ++
  2 files changed, 17 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e49c5a5b8cc1..6b731171d42f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -20,6 +20,7 @@
  #include 
  #include 
  #include 
+#include <linux/pci.h>
  #include 
  #include 
  #include 
@@ -194,6 +195,8 @@ EXPORT_SYMBOL_GPL(iommu_device_unregister);
  static struct dev_iommu *dev_iommu_get(struct device *dev)
  {
struct dev_iommu *param = dev->iommu;
+   u32 max_pasids = 0;
+   int ret;

if (param)
return param;
@@ -202,6 +205,18 @@ static struct dev_iommu *dev_iommu_get(struct device
*dev)
if (!param)
return NULL;

+   if (dev_is_pci(dev)) {
+   ret = pci_max_pasids(to_pci_dev(dev));
+   if (ret > 0)
+   max_pasids = ret;
+   } else {
+   ret = device_property_read_u32(dev, "pasid-num-bits",
+  &max_pasids);
+   if (!ret)
+   max_pasids = (1UL << max_pasids);
+   }
+   param->pasids = max_pasids;
+

we could also do a min() with the IOMMU PASID size here


mutex_init(&param->lock);
dev->iommu = param;
return param;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 45f274b2640d..d4296136ba75 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -371,6 +371,7 @@ struct iommu_fault_param {
   * @fwspec:IOMMU fwspec data
   * @iommu_dev: IOMMU device this device is linked to
   * @priv:  IOMMU Driver private data
+ * @pasids: number of supported PASIDs

'max_pasids' to stay consistent?


Both done. How about the changes below?

From 008c73b9c0ad51a4a70a18d60361a76c28a63342 Mon Sep 17 00:00:00 2001
From: Lu Baolu 
Date: Mon, 28 Feb 2022 15:01:35 +0800
Subject: [PATCH 1/1] iommu: Add max_pasids field in struct dev_iommu

Use this field to save the number of PASIDs that a device is able to
consume. It is a generic attribute of a device and lifting it into the
per-device dev_iommu struct could help to avoid the boilerplate code
in various IOMMU drivers.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/iommu.c | 26 ++
 include/linux/iommu.h |  2 ++
 2 files changed, 28 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 847ad47a2dfd..365d0f2b7f55 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include <linux/pci.h>
 #include 
 #include 
 #include 
@@ -218,6 +219,30 @@ static void dev_iommu_free(struct device *dev)
kfree(param);
 }

+static u32 dev_iommu_get_max_pasids(struct device *dev)
+{
+   u32 max_pasids = dev->iommu->iommu_dev->max_pasids;
+   u32 num_bits;
+   int ret;
+
+   if (!max_pasids)
+   return 0;
+
+   if (dev_is_pci(dev)) {
+   ret = pci_max_pasids(to_pci_dev(dev));
+   if (ret < 0)
+   return 0;
+
+   return min_t(u32, max_pasids, ret);
+   }
+
+   ret = device_property_read_u32(dev, "pasid-num-bits", &num_bits);
+   if (ret)
+   return 0;
+
+   return min_t(u32, max_pasids, 1UL << num_bits);
+}

Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops

2022-05-20 Thread Baolu Lu

On 2022/5/20 16:45, Joerg Roedel wrote:

On Mon, May 16, 2022 at 09:57:56AM +0800, Lu Baolu wrote:

const struct iommu_domain_ops *default_domain_ops;
+   const struct iommu_domain_ops *blocking_domain_ops;


I don't understand why extra domain-ops are needed for this. I think it
would be more straight-forward to implement allocation of
IOMMU_DOMAIN_BLOCKED domains in each driver and let the details be
handled in the set_dev() call-back. The IOMMU driver can make sure DMA
is blocked for a device when it encounters a IOMMU_DOMAIN_BLOCKED
domain.

For IOMMUs that have no explicit way to block DMA could just use an
unmanaged domain with an empty page-table.


Yes, this is the direction it will go.
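A minimal sketch of that approach (the mydrv_* names are hypothetical
placeholders, not any real driver):

static struct iommu_domain *mydrv_domain_alloc(unsigned int type)
{
	struct mydrv_domain *dom;

	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_BLOCKED)
		return NULL;

	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
	if (!dom)
		return NULL;

	/*
	 * A blocked domain carries no mappings; an empty page table
	 * serves the same purpose on hardware that cannot block DMA
	 * explicitly.
	 */
	return &dom->domain;
}

static int mydrv_attach_dev(struct iommu_domain *domain, struct device *dev)
{
	if (domain->type == IOMMU_DOMAIN_BLOCKED)
		return mydrv_block_dma(dev);		/* hypothetical helper */

	return mydrv_install_pgtable(domain, dev);	/* hypothetical helper */
}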

Best regards,
baolu



Re: [PATCH v7 06/10] iommu/sva: Refactoring iommu_sva_bind/unbind_device()

2022-05-20 Thread Baolu Lu

On 2022/5/20 00:39, Jean-Philippe Brucker wrote:

+struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct 
*mm)
+{
+   struct iommu_sva_domain *sva_domain;
+   struct iommu_domain *domain;
+   ioasid_t max_pasid = 0;
+   int ret = -EINVAL;
+
+   /* Allocate mm->pasid if necessary. */
+   if (!dev->iommu->iommu_dev->pasids)
+   return ERR_PTR(-EOPNOTSUPP);
+
+   if (dev_is_pci(dev)) {
+   max_pasid = pci_max_pasids(to_pci_dev(dev));
+   if (max_pasid < 0)
+   return ERR_PTR(max_pasid);
+   } else {
+   ret = device_property_read_u32(dev, "pasid-num-bits",
+  &max_pasid);
+   if (ret)
+   return ERR_PTR(ret);
+   max_pasid = (1UL << max_pasid);
+   }

The IOMMU driver needs this PASID width information earlier, when creating
the PASID table (in .probe_device(), .attach_dev()). Since we're moving it
to the IOMMU core to avoid code duplication, it should be done earlier and
stored in dev->iommu


Yes, indeed. How about the changes below?

From f1382579e8a15ca49acdf758d38fd36451ea174d Mon Sep 17 00:00:00 2001
From: Lu Baolu 
Date: Mon, 28 Feb 2022 15:01:35 +0800
Subject: [PATCH 1/1] iommu: Add pasids field in struct dev_iommu

Use this field to save the number of PASIDs that a device is able to
consume. It is a generic attribute of a device and lifting it into the
per-device dev_iommu struct could help to avoid the boilerplate code
in various IOMMU drivers.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/iommu.c | 15 +++
 include/linux/iommu.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e49c5a5b8cc1..6b731171d42f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include <linux/pci.h>
 #include 
 #include 
 #include 
@@ -194,6 +195,8 @@ EXPORT_SYMBOL_GPL(iommu_device_unregister);
 static struct dev_iommu *dev_iommu_get(struct device *dev)
 {
struct dev_iommu *param = dev->iommu;
+   u32 max_pasids = 0;
+   int ret;

if (param)
return param;
@@ -202,6 +205,18 @@ static struct dev_iommu *dev_iommu_get(struct 
device *dev)

if (!param)
return NULL;

+   if (dev_is_pci(dev)) {
+   ret = pci_max_pasids(to_pci_dev(dev));
+   if (ret > 0)
+   max_pasids = ret;
+   } else {
+   ret = device_property_read_u32(dev, "pasid-num-bits",
+  &max_pasids);
+   if (!ret)
+   max_pasids = (1UL << max_pasids);
+   }
+   param->pasids = max_pasids;
+
mutex_init(&param->lock);
dev->iommu = param;
return param;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 45f274b2640d..d4296136ba75 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -371,6 +371,7 @@ struct iommu_fault_param {
  * @fwspec: IOMMU fwspec data
  * @iommu_dev:  IOMMU device this device is linked to
  * @priv:   IOMMU Driver private data
+ * @pasids: number of supported PASIDs
  *
  * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
  * struct iommu_group  *iommu_group;
@@ -382,6 +383,7 @@ struct dev_iommu {
struct iommu_fwspec *fwspec;
struct iommu_device *iommu_dev;
void*priv;
+   u32 pasids;
 };

 int iommu_device_register(struct iommu_device *iommu,
--
2.25.1

Best regards,
baolu


Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-19 Thread Baolu Lu

On 2022/5/20 00:33, Jean-Philippe Brucker wrote:

diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 8909ea1094e3..1be21e6b93ec 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -7,6 +7,7 @@
  
  #include 

  #include 
+#include 
  
  int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);

  struct mm_struct *iommu_sva_find(ioasid_t pasid);
@@ -16,6 +17,20 @@ struct device;
  struct iommu_fault;
  struct iopf_queue;
  
+struct iommu_sva_domain {

+   struct iommu_domain domain;
+   struct mm_struct *mm;
+};
+
+#define to_sva_domain(d) container_of_safe(d, struct iommu_sva_domain, domain)

Is there a reason to use the 'safe' version of container_of()?  Callers of
to_sva_domain() don't check the return value before dereferencing it so
they would break anyway if someone passes an error pointer as domain.  I
think it matters because there is no other user of container_of_safe() in
the kernel (the only user, lustre, went away in 2018) so someone will want
to remove it.


Fair enough. I wondered why there's no user in the tree. Thanks for the
explanation. I will replace it with container_of().
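The updated helper would then presumably read:

#define to_sva_domain(d) container_of(d, struct iommu_sva_domain, domain)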



Apart from that

Reviewed-by: Jean-Philippe Brucker



Thank you!

Best regards,
baolu


Re: [PATCH v7 01/10] iommu: Add pasids field in struct iommu_device

2022-05-19 Thread Baolu Lu

Hi Jean,

On 2022/5/19 18:37, Jean-Philippe Brucker wrote:

On Thu, May 19, 2022 at 03:20:38PM +0800, Lu Baolu wrote:

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 88817a3376ef..6e2cd082c670 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3546,6 +3546,7 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
/* SID/SSID sizes */
smmu->ssid_bits = FIELD_GET(IDR1_SSIDSIZE, reg);
smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg);
+   smmu->iommu.pasids = smmu->ssid_bits;

This should be 1UL << smmu->ssid_bits


Done. Thank you for the reminder.
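i.e., the assignment presumably becomes:

	smmu->iommu.pasids = 1UL << smmu->ssid_bits;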

Best regards,
baolu


Re: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-19 Thread Baolu Lu

On 2022/5/19 02:21, Jacob Pan wrote:

DMA requests tagged with PASID can target individual IOMMU domains.
Introduce a domain-wide PASID for DMA API, it will be used on the same
mapping as legacy DMA without PASID. Let it be IOVA or PA in case of
identity domain.

Signed-off-by: Jacob Pan 
---
  include/linux/iommu.h | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9405034e3013..36ad007084cc 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -106,6 +106,8 @@ struct iommu_domain {
enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
  void *data);
void *fault_data;
+   ioasid_t dma_pasid; /* Used for DMA requests with PASID */


This looks more suitable for iommu_dma_cookie?


+   atomic_t dma_pasid_users;
  };
  
  static inline bool iommu_is_dma_domain(struct iommu_domain *domain)


Best regards,
baolu


Re: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain

2022-05-19 Thread Baolu Lu

On 2022/5/19 02:21, Jacob Pan wrote:

IOMMU group maintains a PASID array which stores the associated IOMMU
domains. This patch introduces a helper function to do domain to PASID
look up. It will be used by TLB flush and device-PASID attach verification.


Do you really need this?

The IOMMU driver has maintained a device tracking list for each domain.
It has been used for cache invalidation when unmap() is called against
dma domain.

Best regards,
baolu



Signed-off-by: Jacob Pan 
---
  drivers/iommu/iommu.c | 22 ++
  include/linux/iommu.h |  6 +-
  2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 00d0262a1fe9..22f44833db64 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3199,3 +3199,25 @@ struct iommu_domain *iommu_get_domain_for_iopf(struct 
device *dev,
  
  	return domain;

  }
+
+ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct iommu_domain 
*domain)
+{
+   struct iommu_domain *tdomain;
+   struct iommu_group *group;
+   unsigned long index;
+   ioasid_t pasid = INVALID_IOASID;
+
+   group = iommu_group_get(dev);
+   if (!group)
+   return pasid;
+
+   xa_for_each(&group->pasid_array, index, tdomain) {
+   if (domain == tdomain) {
+   pasid = index;
+   break;
+   }
+   }
+   iommu_group_put(group);
+
+   return pasid;
+}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 36ad007084cc..c0440a4be699 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -694,7 +694,7 @@ void iommu_detach_device_pasid(struct iommu_domain *domain,
   struct device *dev, ioasid_t pasid);
  struct iommu_domain *
  iommu_get_domain_for_iopf(struct device *dev, ioasid_t pasid);
-
+ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct iommu_domain 
*domain);
  #else /* CONFIG_IOMMU_API */
  
  struct iommu_ops {};

@@ -1070,6 +1070,10 @@ iommu_get_domain_for_iopf(struct device *dev, ioasid_t 
pasid)
  {
return NULL;
  }
+static ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct 
iommu_domain *domain)
+{
+   return INVALID_IOASID;
+}
  #endif /* CONFIG_IOMMU_API */
  
  #ifdef CONFIG_IOMMU_SVA




Re: [PATCH 6/7] x86/boot/tboot: Move tboot_force_iommu() to Intel IOMMU

2022-05-18 Thread Baolu Lu

On 2022/5/17 19:13, Jason Gunthorpe wrote:

On Tue, May 17, 2022 at 10:05:43AM +0800, Baolu Lu wrote:

Hi Jason,

On 2022/5/17 02:06, Jason Gunthorpe wrote:

+static __init int tboot_force_iommu(void)
+{
+   if (!tboot_enabled())
+   return 0;
+
+   if (no_iommu || dmar_disabled)
+   pr_warn("Forcing Intel-IOMMU to enabled\n");

Unrelated, but when we are in the special secure IOMMU modes, do we
force ATS off? Specifically does the IOMMU reject TLPs that are marked
as translated?


Good question. From the IOMMU point of view, I don't see a reason to
force ATS off, but trusted boot involves lots of other things that I am
not familiar with. Could anybody else help to answer?


ATS is inherently not secure: if a rogue device can issue a TLP with
the translated bit set, then it has unlimited access to host memory.


Agreed. The current logic is that the platform lets the OS know about
such devices through firmware (ACPI/DT), and the OS sets the untrusted
flag in their device structures. The IOMMU subsystem will disable ATS on
devices with the untrusted flag set.

There is some discussion about allowing the supervisor users to set the
trust policy through the sysfs ABI, but I don't think this has happened
in upstream kernel.


Many of these trusted IOMMU scenarios rely on the idea that a rogue
device cannot DMA to arbitrary system memory.


I am not sure whether tboot has the same requirement.

Best regards,
baolu


Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops

2022-05-18 Thread Baolu Lu

On 2022/5/17 21:13, Jason Gunthorpe wrote:

On Tue, May 17, 2022 at 01:43:03PM +0100, Robin Murphy wrote:


FWIW from my point of view I'm happy with having a .detach_dev_pasid op
meaning implicitly-blocked access for now.


If this is the path then lets not call it attach/detach
please. 'set_dev_pasid' and 'set_dev_blocking_pasid' are clearer
names.


Sure. And with the blocking domain implemented, the
set_dev_blocking_pasid could be deprecated.




On SMMUv3, PASIDs don't mix with our current notion of
IOMMU_DOMAIN_IDENTITY (nor the potential one for
IOMMU_DOMAIN_BLOCKED), so giving PASIDs functional symmetry with
devices would need significantly more work anyway.


There is no extra work in the driver, beyond SMMU having to implement
a blocking domain. The blocking domain's set_dev_pasid op simply is
whatever set_dev_blocking_pasid would have done on the unmanaged
domain.

identity doesn't come into this, identity domains should have a NULL
set_dev_pasid op if the driver can't support using it on a PASID.

IMHO blocking_domain->ops->set_dev_pasid() is just a more logical name
than domain->ops->set_dev_blocking_pasid() - especially since VFIO
would like drivers to implement blocking domain anyhow.


Perhaps implementing blocking domains for intel and smmuv3 iommu drivers
seem to be a more practical start.

Best regards,
baolu


Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops

2022-05-16 Thread Baolu Lu

Hi Jason,

On 2022/5/16 21:57, Jason Gunthorpe wrote:

On Mon, May 16, 2022 at 12:22:08PM +0100, Robin Murphy wrote:

On 2022-05-16 02:57, Lu Baolu wrote:

Each IOMMU driver must provide blocking domain ops. If the hardware
supports detaching a domain from a device, setting the blocking domain
equals detaching the existing domain from the device. Otherwise, an
UNMANAGED domain without any mapping will be used instead.

Unfortunately that's backwards - most of the implementations of .detach_dev
are disabling translation entirely, meaning the device ends up effectively
in passthrough rather than blocked.

Ideally we'd convert the detach_dev of every driver into either
a blocking or identity domain. The trick is knowing which is which..


I am still a bit puzzled about how the blocking_domain should be used 
when it is extended to support ->set_dev_pasid.


If it's a blocking domain, the IOMMU driver knows that setting the
blocking domain to device pasid means detaching the existing one.

But if it's an identity domain, how could the IOMMU driver choose
between:

 - setting the input domain to the pasid on device; or,
 - detaching the existing domain.

I've ever thought about below solutions:

- Checking the domain types and dispatching them to different
  operations.
- Using different blocking domains for different types of domains.

But both look rough.



Guessing going down the list:
  apple dart - blocking, detach_dev calls apple_dart_hw_disable_dma() same as
   IOMMU_DOMAIN_BLOCKED
  [I wonder if this driver is wrong in other ways though because
    I don't see a remove_streams in attach_dev]
  exynos - this seems to disable the 'sysmmu' so I'm guessing this is
   identity
  iommu-vmsa - Comment says 'disable mmu translation' so I'm guessing
   this is identity
  mkt_v1 - Code looks similar to mkt, which is probably identity.
  rkt - No idea
  sprd - No idea
  sun50i - This driver confusingly treats identity the same as
   unmanaged, seems wrong, no idea.
  amd - Not sure, clear_dte_entry() seems to set translation on but points
the PTE to 0 ? Based on the spec table 8 I would have expected
TV to be clear which would be blocking. Maybe a bug??
  arm smmu qcomm - not sure
  intel - blocking

These don't support default domains, so detach_dev should return
back to DMA API ownership, which is either identity or something weird:
  fsl_pamu - identity due to the PPC use of dma direct
  msm
  mkt
  omap
  s390 - platform DMA ops
  tegra-gart - Usually something called a GART would be 0 length once
   disabled, guessing blocking?
  tegra-smmu

So, the approach here should be to go driver by driver and convert
detach_dev to either identity, blocking or just delete it entirely,
excluding the above 7 that don't support default domains. And get acks
from the driver owners.



Agreed. There seems to be a long way to go. I am wondering if we could
decouple this refactoring from my new SVA API work? We can easily switch
.detach_dev_pasid to using a blocking domain later.

Best regards,
baolu


Re: [PATCH 6/7] x86/boot/tboot: Move tboot_force_iommu() to Intel IOMMU

2022-05-16 Thread Baolu Lu

Hi Jason,

On 2022/5/17 02:06, Jason Gunthorpe wrote:

+static __init int tboot_force_iommu(void)
+{
+   if (!tboot_enabled())
+   return 0;
+
+   if (no_iommu || dmar_disabled)
+   pr_warn("Forcing Intel-IOMMU to enabled\n");

Unrelated, but when we are in the special secure IOMMU modes, do we
force ATS off? Specifically does the IOMMU reject TLPs that are marked
as translated?


Good question. From the IOMMU point of view, I don't see a reason to
force ATS off, but trusted boot involves lots of other things that I am
not familiar with. Could anybody else help to answer?

Best regards,
baolu


Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops

2022-05-16 Thread Baolu Lu

Hi Robin,

On 2022/5/16 19:22, Robin Murphy wrote:

On 2022-05-16 02:57, Lu Baolu wrote:

Each IOMMU driver must provide blocking domain ops. If the hardware
supports detaching a domain from a device, setting the blocking domain
equals detaching the existing domain from the device. Otherwise, an
UNMANAGED domain without any mapping will be used instead.


Unfortunately that's backwards - most of the implementations of 
.detach_dev are disabling translation entirely, meaning the device ends 
up effectively in passthrough rather than blocked. Conversely, at least 
arm-smmu and arm-smmu-v3 could implement IOMMU_DOMAIN_BLOCKED properly 
with fault-type S2CRs and STEs respectively, it just needs a bit of 
wiring up.


Thank you for letting me know this.

This means that we need to add an additional UNMANAGED domain for each
iommu group, although it is not used most of the time. If most IOMMU
drivers could implement real dumb blocking domains, this burden may be
reduced.

Best regards,
baolu


Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops

2022-05-15 Thread Baolu Lu

On 2022/5/12 19:51, Jason Gunthorpe wrote:

On Thu, May 12, 2022 at 08:00:59AM +0100, Jean-Philippe Brucker wrote:


It is not "missing", it is just renamed to blocking_domain->ops->set_dev_pasid()

The implementation of that function would be identical to
detach_dev_pasid.


   attach(dev, pasid, sva_domain)
   detach(dev, pasid, sva_domain)

versus

   set_dev_pasid(dev, pasid, sva_domain)
   set_dev_pasid(dev, pasid, blocking)

we lose the information about the domain previously attached, and the SMMU
driver has to retrieve it to find the ASID corresponding to the mm.


It would be easy to have the old domain be an input as well - the core
code knows it.


Thanks a lot for all suggestions. I have posted a follow-up series for
this:

https://lore.kernel.org/linux-iommu/20220516015759.2952771-1-baolu...@linux.intel.com/

Let's discuss this there.

Best regards,
baolu


Re: [PATCH] iommu/vt-d: Try info->iommu in device_to_iommu()

2022-05-12 Thread Baolu Lu

On 2022/5/13 08:32, Nicolin Chen wrote:

Local boot test and VFIO sanity test show that info->iommu can be
used in device_to_iommu() as a fast path. So this patch adds it.

Signed-off-by: Nicolin Chen 
---
  drivers/iommu/intel/iommu.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2990f80c5e08..412fca5ab9cd 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -777,6 +777,7 @@ static bool iommu_is_dummy(struct intel_iommu *iommu, 
struct device *dev)
  struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
  {
struct dmar_drhd_unit *drhd = NULL;
+   struct device_domain_info *info;
struct pci_dev *pdev = NULL;
struct intel_iommu *iommu;
struct device *tmp;
@@ -786,6 +787,10 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 
*bus, u8 *devfn)
if (!dev)
return NULL;
  
+   info = dev_iommu_priv_get(dev);
+   if (info)
+   return info->iommu;


device_to_iommu() also returns the device source ID (@bus, @devfn).

Perhaps we could reserve device_to_iommu() for probe_device(), where the
per-device info data is not initialized yet. After probe_device(), the
iommu and the source ID could be retrieved through other helpers by
looking up the device info directly?

Best regards,
baolu


+
if (dev_is_pci(dev)) {
struct pci_dev *pf_pdev;
  




Re: [PATCH v2] iommu/vt-d: Make DMAR_UNITS_SUPPORTED a config setting

2022-05-12 Thread Baolu Lu

On 2022/5/13 07:12, Steve Wahl wrote:

On Thu, May 12, 2022 at 10:13:09AM -0500, Steve Wahl wrote:

To support up to 64 sockets with 10 DMAR units each (640), make the
value of DMAR_UNITS_SUPPORTED adjustable via a config variable,
CONFIG_DMAR_UNITS_SUPPORTED, and make its default 1024 when MAXSMP is
set.

If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set
to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to
allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ
remapping doesn't support X2APIC mode x2apic disabled"; and the system
fails to boot properly.

Signed-off-by: Steve Wahl 


I've received a report from the kernel test robot that
this patch causes an error (shown below) when
CONFIG_IOMMU_SUPPORT is not set.

In my opinion, this is because include/linux/dmar.h and
include/linux/intel-iommu.h are being #included when they are not really
being used.

I've tried placing the contents of intel-iommu.h within an #ifdef
CONFIG_INTEL_IOMMU, and that fixes the problem.

Two questions:

A) Is this the desired approach to to the fix?


Most of include/linux/intel-iommu.h is private to the Intel IOMMU
driver. Those parts should be put in a header like
drivers/iommu/intel/iommu.h. Eventually, we should remove
include/linux/intel-iommu.h so that device drivers interact with the
iommu subsystem only through the IOMMU kAPIs.

Best regards,
baolu



B) Should it be a separate patch, or added onto this patch as a v3?

Error message:  --

In file included from include/linux/intel-iommu.h:21,
 from arch/x86/kvm/x86.c:44:

include/linux/dmar.h:21:33: error: 'CONFIG_DMAR_UNITS_SUPPORTED' undeclared 
here (not in a function); did you mean 'DMAR_UNITS_SUPPORTED'?

   21 | #define DMAR_UNITS_SUPPORTED    CONFIG_DMAR_UNITS_SUPPORTED
  | ^~~
include/linux/intel-iommu.h:531:35: note: in expansion of macro 
'DMAR_UNITS_SUPPORTED'
  531 | unsigned int iommu_refcnt[DMAR_UNITS_SUPPORTED];
  |   ^~~~


vim +21 include/linux/dmar.h

 20
    > 21  #define DMAR_UNITS_SUPPORTED    CONFIG_DMAR_UNITS_SUPPORTED
 22

Initial stab at fixing it: --

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 2f9891cb3d00..916fd7b5bcb5 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -10,6 +10,8 @@
  #ifndef _INTEL_IOMMU_H_
  #define _INTEL_IOMMU_H_
  
+#ifdef CONFIG_INTEL_IOMMU

+
  #include 
  #include 
  #include 
@@ -831,4 +833,6 @@ static inline const char *decode_prq_descriptor(char *str, 
size_t size,
return str;
  }
  
+#endif /* CONFIG_INTEL_IOMMU */

+
  #endif


Thanks.

--> Steve Wahl






Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-12 Thread Baolu Lu

On 2022/5/12 20:03, Jason Gunthorpe wrote:

On Thu, May 12, 2022 at 07:59:41PM +0800, Baolu Lu wrote:

On 2022/5/12 19:48, Jason Gunthorpe wrote:

On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote:

On 2022/5/12 13:01, Tian, Kevin wrote:

From: Baolu Lu 
Sent: Thursday, May 12, 2022 11:03 AM

On 2022/5/11 22:53, Jason Gunthorpe wrote:

Also, given the current arrangement it might make sense to have a
struct iommu_domain_sva given that no driver is wrappering this in
something else.

Fair enough. How about below wrapper?

+struct iommu_sva_domain {
+   /*
+* Common iommu domain header, *must* be put at the top
+* of the structure.
+*/
+   struct iommu_domain domain;
+   struct mm_struct *mm;
+   struct iommu_sva bond;
+}

The refcount is wrapped in bond.

I'm still not sure that bond is necessary


"bond" is the sva handle that the device drivers get through calling
iommu_sva_bind().



'bond' was required before because we didn't have a domain to wrap
the page table at that time.

Now we have a domain and it is 1:1 associated to bond. Probably
make sense now by just returning the domain as the sva handle
instead?


It also includes the information about the device that the domain has
been attached to. So the sva_unbind() looks like this:

/**
   * iommu_sva_unbind_device() - Remove a bond created with
iommu_sva_bind_device
   * @handle: the handle returned by iommu_sva_bind_device()
   *
   * Put reference to a bond between device and address space. The device
should
   * not be issuing any more transaction for this PASID. All outstanding page
   * requests for this PASID must have been flushed to the IOMMU.
   */
void iommu_sva_unbind_device(struct iommu_sva *handle)

It's fine to replace the iommu_sva with iommu_sva_domain for sva handle,
if we can include the device in the unbind() interface.


Why would we have a special unbind for SVA?


It's about SVA kAPI for device drivers. The existing kAPIs include:

struct iommu_sva *iommu_sva_bind_device(struct device *dev,
 struct mm_struct *mm,
 void *drvdata);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);


This is not what we agreed the API should be. We agreed:

  iommu_sva_domain_alloc()
  iommu_attach_device_pasid()
  iommu_detach_device_pasid()

Again, SVA should not be different from normal domain stuff.


Yes, agreed.

I am trying to achieve this in two steps. The first step focuses on the
internal iommu implementation and keeps the driver kAPI untouched. The
second step then focuses on the driver APIs.
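For illustration, a device driver would then follow the normal domain
flow (a sketch; the exact signatures were still under discussion at this
point):

	struct iommu_domain *domain;
	int ret;

	domain = iommu_sva_domain_alloc(dev, current->mm);
	if (!domain)
		return -ENOMEM;

	ret = iommu_attach_device_pasid(domain, dev, pasid);
	if (ret)
		goto out_free_domain;

	/* ... the device issues PASID-tagged DMA against current->mm ... */

	iommu_detach_device_pasid(domain, dev, pasid);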

Best regards,
baolu


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-12 Thread Baolu Lu

On 2022/5/12 19:51, Jason Gunthorpe wrote:

On Thu, May 12, 2022 at 11:02:39AM +0800, Baolu Lu wrote:

+   mutex_lock(&group->mutex);
+   domain = xa_load(&group->pasid_array, pasid);
+   if (domain && domain->type != type)
+   domain = NULL;
+   mutex_unlock(&group->mutex);
+   iommu_group_put(group);
+
+   return domain;

This is bad locking, group->pasid_array values cannot be taken outside
the lock.


It's not iommu core, but SVA (or other feature components) that manage
the life cycle of a domain. The iommu core only provides a place to
store the domain pointer. The feature components are free to fetch their
domain pointers from iommu core as long as they are sure that the domain
is alive during use.


I'm not convinced.


I'm sorry, I may not have explained it clearly. :-)

This helper is safe for use inside the IOMMU subsystem. We can trust
ourselves that nobody will abuse this helper to obtain domains belonging
to others, as the pasid has been allocated for the SVA code. No other
code should be able to set up another type of domain for this pasid. The
SVA code has its own locking to avoid use-after-free.

Please correct me if I missed anything. :-) By the way, I can see some
similar helpers in current IOMMU core. For example,

struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
{
struct iommu_domain *domain;
struct iommu_group *group;

group = iommu_group_get(dev);
if (!group)
return NULL;

domain = group->domain;

iommu_group_put(group);

return domain;
}
EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);

Best regards,
baolu


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-12 Thread Baolu Lu

On 2022/5/12 19:48, Jason Gunthorpe wrote:

On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote:

On 2022/5/12 13:01, Tian, Kevin wrote:

From: Baolu Lu 
Sent: Thursday, May 12, 2022 11:03 AM

On 2022/5/11 22:53, Jason Gunthorpe wrote:

Also, given the current arrangement it might make sense to have a
struct iommu_domain_sva given that no driver is wrappering this in
something else.

Fair enough. How about below wrapper?

+struct iommu_sva_domain {
+   /*
+* Common iommu domain header, *must* be put at the top
+* of the structure.
+*/
+   struct iommu_domain domain;
+   struct mm_struct *mm;
+   struct iommu_sva bond;
+}

The refcount is wrapped in bond.

I'm still not sure that bond is necessary


"bond" is the sva handle that the device drivers get through calling
iommu_sva_bind().



'bond' was required before because we didn't have a domain to wrap
the page table at that time.

Now we have a domain and it is 1:1 associated to bond. Probably
make sense now by just returning the domain as the sva handle
instead?


It also includes the information about the device that the domain has
been attached to. So the sva_unbind() looks like this:

/**
  * iommu_sva_unbind_device() - Remove a bond created with
iommu_sva_bind_device
  * @handle: the handle returned by iommu_sva_bind_device()
  *
  * Put reference to a bond between device and address space. The device
should
  * not be issuing any more transaction for this PASID. All outstanding page
  * requests for this PASID must have been flushed to the IOMMU.
  */
void iommu_sva_unbind_device(struct iommu_sva *handle)

It's fine to replace the iommu_sva with iommu_sva_domain for sva handle,
if we can include the device in the unbind() interface.


Why would we have a special unbind for SVA?


It's about SVA kAPI for device drivers. The existing kAPIs include:

struct iommu_sva *iommu_sva_bind_device(struct device *dev,
struct mm_struct *mm,
void *drvdata);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);


SVA should not be different from normal domains; it should use the
normal detach flow too.


Best regards,
baolu


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-12 Thread Baolu Lu

Hi Jason,

On 2022/5/12 01:00, Jason Gunthorpe wrote:

Consolidate PASID programming into dev_set_pasid(), which is then called
by both intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(),
right?

I was only suggesting that really dev_attach_pasid() op is misnamed,
it should be called set_dev_pasid() and act like a set, not a paired
attach/detach - same as the non-PASID ops.


So,

  "set_dev_pasid(domain, device, pasid)" equals to dev_attach_pasid()

and

  "set_dev_pasid(NULL, device, pasid)" equals to dev_detach_pasid()?

do I understand it right?

Best regards,
baolu


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-12 Thread Baolu Lu

On 2022/5/12 13:44, Tian, Kevin wrote:

From: Baolu Lu 
Sent: Thursday, May 12, 2022 1:17 PM

On 2022/5/12 13:01, Tian, Kevin wrote:

From: Baolu Lu 
Sent: Thursday, May 12, 2022 11:03 AM

On 2022/5/11 22:53, Jason Gunthorpe wrote:

Also, given the current arrangement it might make sense to have a
struct iommu_domain_sva given that no driver is wrappering this in
something else.

Fair enough. How about below wrapper?

+struct iommu_sva_domain {
+	/*
+	 * Common iommu domain header, *must* be put at the top
+	 * of the structure.
+	 */
+	struct iommu_domain domain;
+	struct mm_struct *mm;
+	struct iommu_sva bond;
+}

The refcount is wrapped in bond.

I'm still not sure that bond is necessary


"bond" is the sva handle that the device drivers get through calling
iommu_sva_bind().



'bond' was required before because we didn't have a domain to wrap
the page table at that time.

Now we have a domain and it is 1:1 associated to bond. Probably
make sense now by just returning the domain as the sva handle
instead?


It also includes the device information that the domain has been
attached. So the sva_unbind() looks like this:

/**
 * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
 * @handle: the handle returned by iommu_sva_bind_device()
 *
 * Put reference to a bond between device and address space. The device should
 * not be issuing any more transaction for this PASID. All outstanding page
 * requests for this PASID must have been flushed to the IOMMU.
 */
void iommu_sva_unbind_device(struct iommu_sva *handle)

It's fine to replace the iommu_sva with iommu_sva_domain for sva handle,
if we can include the device in the unbind() interface.


can we just have unbind(domain, device)?


Yes. With this, we can remove bond.

This could be done in phase 2 below.





Anyway, I'd expect to achieve all these in two steps:

- sva and iopf refactoring, only iommu internal changes;
- sva interface refactoring, only interface changes.

Does the above work?

Best regards,
baolu


Best regards,
baolu


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-11 Thread Baolu Lu

On 2022/5/12 13:01, Tian, Kevin wrote:

From: Baolu Lu 
Sent: Thursday, May 12, 2022 11:03 AM

On 2022/5/11 22:53, Jason Gunthorpe wrote:

Also, given the current arrangement it might make sense to have a
struct iommu_domain_sva given that no driver is wrappering this in
something else.

Fair enough. How about below wrapper?

+struct iommu_sva_domain {
+	/*
+	 * Common iommu domain header, *must* be put at the top
+	 * of the structure.
+	 */
+	struct iommu_domain domain;
+	struct mm_struct *mm;
+	struct iommu_sva bond;
+}

The refcount is wrapped in bond.

I'm still not sure that bond is necessary


"bond" is the sva handle that the device drivers get through calling
iommu_sva_bind().



'bond' was required before because we didn't have a domain to wrap
the page table at that time.

Now we have a domain and it is 1:1 associated to bond. Probably
make sense now by just returning the domain as the sva handle
instead?


It also includes the device information that the domain has been
attached. So the sva_unbind() looks like this:

/**
 * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
 * @handle: the handle returned by iommu_sva_bind_device()
 *
 * Put reference to a bond between device and address space. The device should
 * not be issuing any more transaction for this PASID. All outstanding page
 * requests for this PASID must have been flushed to the IOMMU.
 */
void iommu_sva_unbind_device(struct iommu_sva *handle)

It's fine to replace the iommu_sva with iommu_sva_domain for sva handle,
if we can include the device in the unbind() interface.

Anyway, I'd expect to achieve all these in two steps:

- sva and iopf refactoring, only iommu internal changes;
- sva interface refactoring, only interface changes.

Does the above work?

Best regards,
baolu


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-11 Thread Baolu Lu

On 2022/5/11 22:53, Jason Gunthorpe wrote:

Assuming we leave room for multi-device groups this logic should just
be

group = iommu_group_get(dev);
if (!group)
return -ENODEV;

mutex_lock(&group->mutex);
domain = xa_load(&group->pasid_array, mm->pasid);
if (!domain || domain->type != IOMMU_DOMAIN_SVA || domain->mm != mm)
	domain = iommu_sva_alloc_domain(dev, mm);

?

Agreed. As a helper in iommu core, how about making it more generic like
below?

IDK, is there more users of this? AFAIK SVA is the only place that
will be auto-sharing?


The generic thing is that components, like SVA, want to fetch the
attached domain from the iommu core.




+	mutex_lock(&group->mutex);
+	domain = xa_load(&group->pasid_array, pasid);
+	if (domain && domain->type != type)
+		domain = NULL;
+	mutex_unlock(&group->mutex);
+   iommu_group_put(group);
+
+   return domain;

This is bad locking, group->pasid_array values cannot be taken outside
the lock.


It's not the iommu core, but SVA (or other feature components) that
manages the life cycle of a domain. The iommu core only provides a place
to store the domain pointer. The feature components are free to fetch
their domain pointers from the iommu core as long as they are sure that
the domain is alive during use.




And stick the refcount in the sva_domain

Also, given the current arrangement it might make sense to have a
struct iommu_domain_sva given that no driver is wrappering this in
something else.

Fair enough. How about below wrapper?

+struct iommu_sva_domain {
+	/*
+	 * Common iommu domain header, *must* be put at the top
+	 * of the structure.
+	 */
+	struct iommu_domain domain;
+	struct mm_struct *mm;
+	struct iommu_sva bond;
+}

The refcount is wrapped in bond.

I'm still not sure that bond is necessary


"bond" is the sva handle that the device drivers get through calling
iommu_sva_bind().



But yes, something like that


Best regards,
baolu


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Baolu Lu

On 2022/5/12 01:25, Jacob Pan wrote:

Hi Jason,

On Wed, 11 May 2022 14:00:25 -0300, Jason Gunthorpe  wrote:


On Wed, May 11, 2022 at 10:02:16AM -0700, Jacob Pan wrote:

If not global, perhaps we could have a list of pasids (e.g. xarray)
attached to the device_domain_info. The TLB flush logic would just
go through the list w/o caring what the PASIDs are for. Does it
make sense to you?


Sort of, but we shouldn't duplicate xarrays - the group already has
this xarray - need to find some way to allow access to it from the
driver.
   

I am not following,  here are the PASIDs for devTLB flush which is per
device. Why group?


Because group is where the core code stores it.

I see, with singleton group. I guess I can let the dma-iommu code call
iommu_attach_device_pasid() from iommu_attach_dma_pasid(). Then the PASID
will be stored in the group xa, and the flush code can retrieve PASIDs via
device_domain_info.dev -> group -> pasid_array.
Thanks for pointing it out, I missed the new pasid_array.
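
For the flush path I have something like the sketch below in mind. It
assumes the core grows a way for the driver to walk the group's
pasid_array (group->mutex is private to iommu.c today), and
flush_dev_tlb() is a hypothetical stand-in for the real device-TLB
invalidation helper:

static void example_flush_pasids(struct device_domain_info *info)
{
	struct iommu_group *group = iommu_group_get(info->dev);
	struct iommu_domain *domain;
	unsigned long pasid;

	if (!group)
		return;

	mutex_lock(&group->mutex);
	xa_for_each(&group->pasid_array, pasid, domain)
		flush_dev_tlb(info, pasid);
	mutex_unlock(&group->mutex);
	iommu_group_put(group);
}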



We could retrieve PASIDs from the device PASID table but xa would be
more efficient.
   

Are you suggesting the dma-iommu API should be called
iommu_set_dma_pasid instead of iommu_attach_dma_pasid?


No that API is Ok - the driver ops API should be 'set' not
attach/detach

Sounds good, this operation has little in common with
domain_ops.dev_attach_pasid() used by SVA domain. So I will add a
new domain_ops.dev_set_pasid()


What? No, their should only be one operation, 'dev_set_pasid' and it
is exactly the same as the SVA operation. It configures things so that
any existing translation on the PASID is removed and the PASID
translates according to the given domain.

SVA given domain or UNMANAGED given domain doesn't matter to the
higher level code. The driver should implement per-domain ops as
required to get the different behaviors.

Perhaps some code to clarify, we have
sva_domain_ops.dev_attach_pasid() = intel_svm_attach_dev_pasid;
default_domain_ops.dev_attach_pasid() = intel_iommu_attach_dev_pasid;


Yes, keep that structure
  

Consolidate pasid programming into dev_set_pasid() then called by both
intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right?
  


I was only suggesting that really dev_attach_pasid() op is misnamed,
it should be called set_dev_pasid() and act like a set, not a paired
attach/detach - same as the non-PASID ops.


Got it. Perhaps another patch to rename, Baolu?


Yes. I can rename it in my sva series if others are also happy with this
naming.

Best regards,
baolu


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-11 Thread Baolu Lu

On 2022/5/10 23:23, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 02:17:34PM +0800, Lu Baolu wrote:


+/**
+ * iommu_sva_bind_device() - Bind a process address space to a device
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to mm_users
+ * @drvdata: opaque data pointer to pass to bind callback
+ *
+ * Create a bond between device and address space, allowing the device to access
+ * the mm using the returned PASID. If a bond already exists between @device and
+ * @mm, it is returned and an additional reference is taken. Caller must call
+ * iommu_sva_unbind_device() to release each reference.
+ *
+ * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
+ * initialize the required SVA features.
+ *
+ * On error, returns an ERR_PTR value.
+ */
+struct iommu_sva *
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+{
+   int ret = -EINVAL;
+   struct iommu_sva *handle;
+   struct iommu_domain *domain;
+
+   /*
+* TODO: Remove the drvdata parameter after kernel PASID support is
+* enabled for the idxd driver.
+*/
+   if (drvdata)
+   return ERR_PTR(-EOPNOTSUPP);


Why is this being left behind? Clean up the callers too please.


Okay, let me try to.




+   /* Allocate mm->pasid if necessary. */
+   ret = iommu_sva_alloc_pasid(mm, 1, (1U << dev->iommu->pasid_bits) - 1);
+   if (ret)
+   return ERR_PTR(ret);
+
+   mutex_lock(_sva_lock);
+   /* Search for an existing bond. */
+	handle = xa_load(&dev->iommu->sva_bonds, mm->pasid);
+	if (handle) {
+		refcount_inc(&handle->users);
+		goto out_success;
+	}


How can there be an existing bond?

dev->iommu is per-device

The device_group_immutable_singleton() insists on a single device
group

Basically 'sva_bonds' is the same thing as the group->pasid_array.


Yes, really.



Assuming we leave room for multi-device groups this logic should just
be

group = iommu_group_get(dev);
if (!group)
return -ENODEV;

mutex_lock(&group->mutex);
domain = xa_load(&group->pasid_array, mm->pasid);
if (!domain || domain->type != IOMMU_DOMAIN_SVA || domain->mm != mm)
domain = iommu_sva_alloc_domain(dev, mm);

?


Agreed. As a helper in iommu core, how about making it more generic like
below?

+struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
+						    ioasid_t pasid,
+						    unsigned int type)
+{
+	struct iommu_domain *domain;
+	struct iommu_group *group;
+
+	if (!pasid_valid(pasid))
+		return NULL;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return NULL;
+
+	mutex_lock(&group->mutex);
+	domain = xa_load(&group->pasid_array, pasid);
+	if (domain && domain->type != type)
+		domain = NULL;
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+
+	return domain;
+}



And stick the refcount in the sva_domain

Also, given the current arrangement it might make sense to have a
struct iommu_domain_sva given that no driver is wrappering this in
something else.


Fair enough. How about below wrapper?

+struct iommu_sva_domain {
+	/*
+	 * Common iommu domain header, *must* be put at the top
+	 * of the structure.
+	 */
+	struct iommu_domain domain;
+	struct mm_struct *mm;
+	struct iommu_sva bond;
+}

The refcount is wrapped in bond.

Best regards,
baolu


Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops

2022-05-10 Thread Baolu Lu

On 2022/5/10 22:02, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 02:17:29PM +0800, Lu Baolu wrote:


This adds a pair of common domain ops for this purpose and adds helpers
to attach/detach a domain to/from a {device, PASID}.


I wonder if this should not have a detach op - after discussing with
Robin we can see that detach_dev is not used in updated
drivers. Instead attach_dev acts as 'set_domain'

So, it would be more symmetrical if attaching a blocking_domain to the
PASID was the way to 'detach'.

This could be made straightforward by following the sketch I showed to
have a static, global blocking_domain and providing a pointer to it in
struct iommu_ops

Then 'detach pasid' is:

iommu_ops->blocking_domain->ops->attach_dev_pasid(domain, dev, pasid);

And we move away from the notion of 'detach' and in the direction that
everything continuously has a domain set. PASID would logically
default to blocking_domain, though we wouldn't track this anywhere.


I am not sure whether we still need to keep the blocking domain concept
when we are entering the new PASID world. Please allow me to wait and
listen to more opinions.
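
For reference, if I read the suggestion correctly, the driver side would
be roughly the sketch below (untested; the blocking_domain pointer in
struct iommu_ops is the proposed addition and the example_* names are
hypothetical):

static int example_block_dev_pasid(struct iommu_domain *domain,
				   struct device *dev, ioasid_t pasid)
{
	/* Tear down the PASID entry so that DMA on this PASID is blocked */
	example_clear_entry(dev, pasid);
	return 0;
}

static const struct iommu_domain_ops example_blocking_ops = {
	.attach_dev_pasid = example_block_dev_pasid,
};

static struct iommu_domain example_blocking_domain = {
	.type = IOMMU_DOMAIN_BLOCKED,
	.ops  = &example_blocking_ops,
};

/* in the driver's iommu_ops: .blocking_domain = &example_blocking_domain */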

Best regards,
baolu



Re: [PATCH v6 02/12] iommu: Add pasid_bits field in struct dev_iommu

2022-05-10 Thread Baolu Lu

On 2022/5/10 22:34, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 02:17:28PM +0800, Lu Baolu wrote:


  int iommu_device_register(struct iommu_device *iommu,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 627a3ed5ee8f..afc63fce6107 100644
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2681,6 +2681,8 @@ static struct iommu_device *arm_smmu_probe_device(struct 
device *dev)
smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
master->stall_enabled = true;
  
+	dev->iommu->pasid_bits = master->ssid_bits;

	return &smmu->iommu;
  
  err_free_master:

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2990f80c5e08..99643f897f26 100644
+++ b/drivers/iommu/intel/iommu.c
@@ -4624,8 +4624,11 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)
if (pasid_supported(iommu)) {
int features = pci_pasid_features(pdev);
  
-		if (features >= 0)
+		if (features >= 0) {
 			info->pasid_supported = features | 1;
+			dev->iommu->pasid_bits =
+				fls(pci_max_pasids(pdev)) - 1;
+		}


It is not very nice that both the iommu drivers have to duplicate the
code to read the pasid capability out of the PCI device.

IMHO it would make more sense for the iommu layer to report the
capability of its own HW block only, and for the core code to figure
out the master's limitation using a bus-specific approach.


Fair enough. The iommu hardware capability could be reported in

/**
 * struct iommu_device - IOMMU core representation of one IOMMU hardware
 *   instance
 * @list: Used by the iommu-core to keep a list of registered iommus
 * @ops: iommu-ops for talking to this iommu
 * @dev: struct device for sysfs handling
 */
struct iommu_device {
struct list_head list;
const struct iommu_ops *ops;
struct fwnode_handle *fwnode;
struct device *dev;
};

I haven't checked ARM code yet, but it works for x86 as far as I can
see.
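
For example, the core could combine the two limits like the sketch below
(untested; the max_pasids field in struct iommu_device is the proposed
addition, not current code):

static void example_set_pasid_bits(struct device *dev,
				   struct iommu_device *iommu)
{
	u32 bits = iommu->max_pasids;	/* IOMMU hardware block limit */

	if (dev_is_pci(dev)) {
		int num = pci_max_pasids(to_pci_dev(dev));

		/* Combine with the bus-specific limit of the master */
		bits = num > 0 ? min_t(u32, bits, fls(num) - 1) : 0;
	}

	dev->iommu->pasid_bits = bits;
}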



It is also unfortunate that the enable/disable pasid is inside the
iommu driver as well - ideally the PCI driver itself would do this
when it knows it wants to use PASIDs.

The ordering interaction with ATS makes this look quite annoying
though. :(

I'm also not convinced individual IOMMU drivers should be forcing ATS
on, there are performance and functional implications here. Using ATS
or not is possibly best left as an administrator policy controlled by
the core code. Again we seem to have some mess.


Agreed with you. This has already been in my task list. I will start to
solve it after the iommufd tasks.

Best regards,
baolu


Re: [PATCH v4 2/4] iommu/vt-d: Check domain force_snooping against attached devices

2022-05-09 Thread Baolu Lu

On 2022/5/10 08:51, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Sunday, May 8, 2022 8:35 PM

As domain->force_snooping only impacts the devices attached with the
domain, there's no need to check against all IOMMU units. On the other
hand, force_snooping could be set on a domain no matter whether it has
been attached or not, and once set it is an immutable flag. If no
device is attached, the operation always succeeds. Then this empty
domain can only be attached to a device whose IOMMU supports snoop
control.

Signed-off-by: Lu Baolu 


Reviewed-by: Kevin Tian 


Thank you, Kevin. I will queue this series for v5.19.

Best regards,
baolu




---
  include/linux/intel-iommu.h |  1 +
  drivers/iommu/intel/pasid.h |  2 ++
  drivers/iommu/intel/iommu.c | 53
++---
  drivers/iommu/intel/pasid.c | 42 +
  4 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 72e5d7900e71..4f29139bbfc3 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -540,6 +540,7 @@ struct dmar_domain {
 	u8 has_iotlb_device: 1;
 	u8 iommu_coherency: 1;		/* indicate coherency of iommu access */
 	u8 force_snooping : 1;		/* Create IOPTEs with snoop control */
+	u8 set_pte_snp:1;

 	struct list_head devices;	/* all devices' list */
 	struct iova_domain iovad;	/* iova's that belong to this domain */
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index ab4408c824a5..583ea67fc783 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -123,4 +123,6 @@ void intel_pasid_tear_down_entry(struct
intel_iommu *iommu,
 bool fault_ignore);
  int vcmd_alloc_pasid(struct intel_iommu *iommu, u32 *pasid);
  void vcmd_free_pasid(struct intel_iommu *iommu, u32 pasid);
+void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
+ struct device *dev, u32 pasid);
  #endif /* __INTEL_PASID_H */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b4802f4055a0..048ebfbd5fcb 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2459,7 +2459,7 @@ static int domain_setup_first_level(struct
intel_iommu *iommu,
if (level == 5)
flags |= PASID_FLAG_FL5LP;

-   if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
+   if (domain->force_snooping)
flags |= PASID_FLAG_PAGE_SNOOP;

return intel_pasid_setup_first_level(iommu, dev, (pgd_t *)pgd, pasid,
@@ -,7 +,7 @@ static int intel_iommu_map(struct iommu_domain
*domain,
prot |= DMA_PTE_READ;
if (iommu_prot & IOMMU_WRITE)
prot |= DMA_PTE_WRITE;
-   if (dmar_domain->force_snooping)
+   if (dmar_domain->set_pte_snp)
prot |= DMA_PTE_SNP;

max_addr = iova + size;
@@ -4567,13 +4567,60 @@ static phys_addr_t
intel_iommu_iova_to_phys(struct iommu_domain *domain,
return phys;
  }

+static bool domain_support_force_snooping(struct dmar_domain *domain)
+{
+	struct device_domain_info *info;
+	bool support = true;
+
+	assert_spin_locked(&device_domain_lock);
+	list_for_each_entry(info, &domain->devices, link) {
+		if (!ecap_sc_support(info->iommu->ecap)) {
+			support = false;
+			break;
+		}
+	}
+
+	return support;
+}
+
+static void domain_set_force_snooping(struct dmar_domain *domain)
+{
+	struct device_domain_info *info;
+
+	assert_spin_locked(&device_domain_lock);
+
+	/*
+	 * Second level page table supports per-PTE snoop control. The
+	 * iommu_map() interface will handle this by setting SNP bit.
+	 */
+	if (!domain_use_first_level(domain)) {
+		domain->set_pte_snp = true;
+		return;
+	}
+
+	list_for_each_entry(info, &domain->devices, link)
+		intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
+						     PASID_RID2PASID);
+}
+
 static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	unsigned long flags;

-	if (!domain_update_iommu_snooping(NULL))
+	if (dmar_domain->force_snooping)
+		return true;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	if (!domain_support_force_snooping(dmar_domain)) {
+		spin_unlock_irqrestore(&device_domain_lock, flags);
 		return false;
+	}
+
+	domain_set_force_snooping(dmar_domain);
 	dmar_domain->force_snooping = true;
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+
 	return true;
 }

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 

Re: [PATCH v5 04/12] iommu/sva: Basic data structures for SVA

2022-05-07 Thread Baolu Lu

On 2022/5/7 16:32, Baolu Lu wrote:

Hi Jean,

On 2022/5/5 14:42, Baolu Lu wrote:

On 2022/5/4 02:09, Jean-Philippe Brucker wrote:

On Mon, May 02, 2022 at 09:48:34AM +0800, Lu Baolu wrote:

Use below data structures for SVA implementation in the IOMMU core:

- struct iommu_sva_ioas
   Represent the I/O address space shared with an application CPU address
   space. This structure has a 1:1 relationship with an mm_struct. It
   grabs a "mm->mm_count" refcount during creation and drops it on release.


Do we actually need this structure?  At the moment it only keeps track of
bonds, which we can move to struct dev_iommu. Replacing it by a mm pointer
in struct iommu_domain simplifies the driver and seems to work


Fair enough.

+struct iommu_sva_ioas {
+    struct mm_struct *mm;
+    ioasid_t pasid;
+
+    /* Counter of domains attached to this ioas. */
+    refcount_t users;
+
+    /* All bindings are linked here. */
+    struct list_head bonds;
+};

By moving @mm to struct iommu_domain and @bonds to struct dev_iommu, the
code looks simpler. The mm, sva domain and per-device dev_iommu are
guaranteed to be valid during bind() and unbind().

Will head this direction in the next version.


I'm trying to implement this idea in real code. It seems that we need
additional fields in struct iommu_domain to track which devices the mm
was bound to. It doesn't simplify the code much. Any thoughts?


Sorry, Jean. This has been discussed. We don't need to share an sva
domain among devices at this stage. It's not a big issue for the sva
domain, as it's a dumb domain with no support for map()/unmap() or cache
manipulation.

I will still head this direction. Sorry for the noise.

Best regards,
baolu


Re: [PATCH v5 04/12] iommu/sva: Basic data structures for SVA

2022-05-07 Thread Baolu Lu

Hi Jean,

On 2022/5/5 14:42, Baolu Lu wrote:

On 2022/5/4 02:09, Jean-Philippe Brucker wrote:

On Mon, May 02, 2022 at 09:48:34AM +0800, Lu Baolu wrote:

Use below data structures for SVA implementation in the IOMMU core:

- struct iommu_sva_ioas
   Represent the I/O address space shared with an application CPU address
   space. This structure has a 1:1 relationship with an mm_struct. It
   grabs a "mm->mm_count" refcount during creation and drops it on release.


Do we actually need this structure?  At the moment it only keeps track of
bonds, which we can move to struct dev_iommu. Replacing it by a mm pointer
in struct iommu_domain simplifies the driver and seems to work


Fair enough.

+struct iommu_sva_ioas {
+    struct mm_struct *mm;
+    ioasid_t pasid;
+
+    /* Counter of domains attached to this ioas. */
+    refcount_t users;
+
+    /* All bindings are linked here. */
+    struct list_head bonds;
+};

By moving @mm to struct iommu_domain and @bonds to struct dev_iommu, the
code looks simpler. The mm, sva domain and per-device dev_iommu are
guaranteed to be valid during bind() and unbind().

Will head this direction in the next version.


I'm trying to implement this idea in real code. It seems that we need
additional fields in struct iommu_domain to track which devices the mm
was bound to. It doesn't simplify the code much. Any thoughts?

Best regards,
baolu

Re: [PATCH v2] iommu: iommu_group_claim_dma_owner() must always assign a domain

2022-05-06 Thread Baolu Lu

On 2022/5/6 03:27, Jason Gunthorpe wrote:

On Thu, May 05, 2022 at 07:56:59PM +0100, Robin Murphy wrote:


Ack to that, there are certainly further improvements to consider once we've
got known-working code into a released kernel, but let's not get ahead of
ourselves just now.

Yes please
  

(I'm pretty sure we could get away with a single blocking domain per IOMMU
instance, rather than per group, but I deliberately saved that one for later
- right now one per group to match default domains is simpler to reason
about and less churny in the context of the current ownership patches)

I noticed this too..

But I thought the driver can do a better job of this. There is no
reason it has to allocate memory to return a IOMMU_DOMAIN_BLOCKED
domain - this struct could be a globally allocated singleton for the
entire driver and that would be even better memory savings.

For instance, here is a sketch for the Intel driver based on Baolu's
remark that intel_iommu_detach_device() establishes a blocking
behavior already on detach_dev (Baolu if you like it can you make a
proper patch?):


Yes, I will.

The same scheme could also be applied to identity/sva/block domains.
Perhaps make it after v5.19.
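
Something along the lines of the sketch below, if I read the remark
correctly (untested; whether dmar_remove_one_dev_info() is the right
teardown helper here needs double checking):

static int intel_iommu_blocked_attach_dev(struct iommu_domain *domain,
					  struct device *dev)
{
	/* Detaching already establishes blocking behavior for DMA */
	dmar_remove_one_dev_info(dev);
	return 0;
}

static const struct iommu_domain_ops intel_blocked_domain_ops = {
	.attach_dev = intel_iommu_blocked_attach_dev,
};

/* One static instance for the whole driver; no per-group allocation */
static struct iommu_domain intel_blocked_domain = {
	.type = IOMMU_DOMAIN_BLOCKED,
	.ops  = &intel_blocked_domain_ops,
};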

Best regards,
baolu


Re: [PATCH v3 2/4] iommu/vt-d: Check domain force_snooping against attached devices

2022-05-06 Thread Baolu Lu

On 2022/5/6 14:10, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Friday, May 6, 2022 1:27 PM
+
+/*
+ * Set the page snoop control for a pasid entry which has been set up.
+ */
+void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
+ struct device *dev, u32 pasid)
+{
+   struct pasid_entry *pte;
+   u16 did;
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (WARN_ON(!pte || !pasid_pte_is_present(pte))) {
+		spin_unlock(&iommu->lock);
+		return;
+	}
+
+	pasid_set_pgsnp(pte);
+	did = pasid_get_domain_id(pte);
+	spin_unlock(&iommu->lock);
+
+	pasid_flush_caches(iommu, pte, pasid, did);
+}


The comment of pasid_flush_caches() says:

/*
  * This function flushes cache for a newly setup pasid table entry.
  * Caller of it should not modify the in-use pasid table entries.
  */

Can you check whether that comment still hold?


I am sorry that I overlooked this. Very appreciated for pointing this
out!

How about below additional changes?

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index da47406011d3..68f7e8cfa848 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -782,5 +782,24 @@ void intel_pasid_setup_page_snoop_control(struct 
intel_iommu *iommu,

did = pasid_get_domain_id(pte);
	spin_unlock(&iommu->lock);

-   pasid_flush_caches(iommu, pte, pasid, did);
+   if (!ecap_coherent(iommu->ecap))
+   clflush_cache_range(pte, sizeof(*pte));
+
+	/*
+	 * VT-d spec 3.4, table 23, guides for cache invalidation:
+	 *
+	 * - PASID-selective-within-Domain PASID-cache invalidation
+	 * - PASID-selective PASID-based IOTLB invalidation
+	 * - If (pasid is RID_PASID)
+	 *    - Global Device-TLB invalidation to affected functions
+	 *   Else
+	 *    - PASID-based Device-TLB invalidation (with S=1 and
+	 *      Addr[63:12]=0x7FFFFFFFFFFFF) to affected functions
+	 */
+   pasid_cache_invalidation_with_pasid(iommu, did, pasid);
+   qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);
+
+   /* Device IOTLB doesn't need to be flushed in caching mode. */
+   if (!cap_caching_mode(iommu->cap))
+   devtlb_invalidation_with_pasid(iommu, dev, pasid);
 }

Best regards,
baolu


Re: [PATCH] iommu/vt-d: Increase DMAR_UNITS_SUPPORTED

2022-05-05 Thread Baolu Lu

On 2022/5/6 03:46, Steve Wahl wrote:

Increase DMAR_UNITS_SUPPORTED to support 64 sockets with 10 DMAR units
each, for a total of 640.

If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set
to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to
allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ
remapping doesn't support X2APIC mode x2apic disabled"; and the system
fails to boot.

Signed-off-by: Steve Wahl 
Reviewed-by: Mike Travis 
---

Note that we could not find a reason for connecting
DMAR_UNITS_SUPPORTED to MAX_IO_APICS as was done previously.  Perhaps
it seemed like the two would continue to match on earlier processors.
There doesn't appear to be kernel code that assumes that the value of
one is related to the other.


+Kevin

This maximum value was introduced by the commit below, and I don't see
any hardware/software restriction that prevents enlarging it after ten
years.

commit 1b198bb04ad72669d4bd6575fc9945ed595bfee0
Author: Mike Travis 
Date:   Mon Mar 5 15:05:16 2012 -0800

    x86/iommu/intel: Increase the number of iommus supported to MAX_IO_APICS


The number of IOMMUs supported should be the same as the number
of IO APICS.  This limit comes into play when the IOMMUs are
identity mapped, thus the number of possible IOMMUs in the
"static identity" (si) domain should be this same number.
[...]



  include/linux/dmar.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 45e903d84733..9d4867b8f42e 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -19,7 +19,7 @@
  struct acpi_dmar_header;
  
  #ifdef	CONFIG_X86

-# define	DMAR_UNITS_SUPPORTED	MAX_IO_APICS
+# define	DMAR_UNITS_SUPPORTED	640
 #else
 # define	DMAR_UNITS_SUPPORTED	64
  #endif


Best regards,
baolu


Re: [PATCH v5 10/12] iommu: Prepare IOMMU domain for IOPF

2022-05-05 Thread Baolu Lu

On 2022/5/5 21:38, Jean-Philippe Brucker wrote:

Hi Baolu,

On Thu, May 05, 2022 at 04:31:38PM +0800, Baolu Lu wrote:

On 2022/5/4 02:20, Jean-Philippe Brucker wrote:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7cae631c1baa..33449523afbe 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3174,3 +3174,24 @@ void iommu_detach_device_pasid(struct iommu_domain *domain,
 	iommu_group_put(group);
 }
+
+struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
+   ioasid_t pasid)
+{
+   struct iommu_domain *domain;
+   struct iommu_group *group;
+
+   if (!pasid_valid(pasid))
+   return NULL;
+
+   group = iommu_group_get(dev);
+   if (!group)
+   return NULL;
+
+	mutex_lock(&group->mutex);

Unfortunately this still causes the deadlock when unbind() flushes the
IOPF queue while holding the group mutex.


Sorry, I didn't get your point here.

Do you mean unbind() could hold the group mutex before calling this helper?
The group mutex is only available in iommu.c, so unbind() has no means
to hold this lock. Or did I miss anything?


I wasn't clear, it's iommu_detach_device_pasid() that holds the
group->mutex:

  iommu_sva_unbind_device()          |
   iommu_detach_device_pasid()       |
    mutex_lock(&group->mutex)        |
    domain->ops->detach_dev_pasid()  | iopf_handle_group()
     iopf_queue_flush_dev()          |  iommu_get_domain_for_dev_pasid()
      ... wait for IOPF work         |   mutex_lock(&group->mutex)
                                     |    ... deadlock


Ah! Yes. Thank you for the clarification.



Thanks,
Jean



Best regards,
baolu



If we make this function private to IOPF, then we can get rid of this
mutex_lock(). It's OK because:

* xarray protects its internal state with RCU, so we can call
xa_load() outside the lock.

* The domain obtained from xa_load is finalized. Its content is valid
because xarray stores the domain using rcu_assign_pointer(), which has a
release memory barrier, which pairs with data dependencies in IOPF
(domain->sva_ioas etc).

We'll need to be careful about this when allowing other users to install
a fault handler. Should be fine as long as the handler and data are
installed before the domain is added to pasid_array.

* We know the domain is valid the whole time IOPF is using it, because
unbind() waits for pending faults.

We just need a comment explaining the last point, something like:

	/*
	 * Safe to fetch outside the group mutex because:
	 * - xarray protects its internal state with RCU
	 * - the domain obtained is either NULL or fully formed
	 * - the IOPF work is the only caller and is flushed before the
	 *   domain is freed.
	 */


Agreed. The mutex is needed only when the domain could possibly be freed
before unbind(). In that case, we would need this mutex and would have to
take a reference on the domain. As we have dropped the per-domain user
reference, this lock is unnecessary.
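
So the IOPF-private helper could be reduced to the sketch below:

/*
 * Sketch of an IOPF-private fetch without taking group->mutex. Safe
 * because the domain is fully formed before it is published in the
 * xarray, and the IOPF work is flushed before the domain is freed.
 */
static struct iommu_domain *
iopf_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
{
	struct iommu_domain *domain = NULL;
	struct iommu_group *group;

	group = iommu_group_get(dev);
	if (group) {
		/* xa_load() is RCU-safe outside the group mutex */
		domain = xa_load(&group->pasid_array, pasid);
		iommu_group_put(group);
	}

	return domain;
}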



Thanks,
Jean


+	domain = xa_load(&group->pasid_array, pasid);
+	mutex_unlock(&group->mutex);
+   iommu_group_put(group);
+
+   return domain;
+}




Best regards,
baolu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-05-05 Thread Baolu Lu

On 2022/5/3 15:49, Jean-Philippe Brucker wrote:

On Sat, Apr 30, 2022 at 03:33:17PM +0800, Baolu Lu wrote:

Jean, another quick question about the iommu_sva_bind_device()

/**
  * iommu_sva_bind_device() - Bind a process address space to a device
  * @dev: the device
  * @mm: the mm to bind, caller must hold a reference to it
  * @drvdata: opaque data pointer to pass to bind callback

This interface requires the caller to take a reference to mm. Which
reference should it take, mm->mm_count or mm->mm_users? It's better to
make it explicit in this comment.


Agreed, it's mm_users as required by mmu_notifier_register()


Thanks! I will add this in my refactoring patch.
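
That is, callers would follow a pattern like the sketch below
(illustrative only):

	/* Pin mm_users across the lifetime of the bond */
	if (!mmget_not_zero(mm))
		return -ESRCH;

	handle = iommu_sva_bind_device(dev, mm, NULL);
	if (IS_ERR(handle)) {
		mmput(mm);
		return PTR_ERR(handle);
	}

	/* ... issue DMA with iommu_sva_get_pasid(handle) ... */

	iommu_sva_unbind_device(handle);
	mmput(mm);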

Best regards,
baolu



Re: [PATCH v2 4/4] iommu/vt-d: Remove hard coding PGSNP bit in PASID entries

2022-05-05 Thread Baolu Lu

On 2022/5/5 16:46, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Thursday, May 5, 2022 9:07 AM

As enforce_cache_coherency has been introduced into the
iommu_domain_ops,
the kernel component which owns the iommu domain is able to opt-in its
requirement for force snooping support. The iommu driver has no need to
hard code the page snoop control bit in the PASID table entries anymore.

Signed-off-by: Lu Baolu 


Reviewed-by: Kevin Tian , with one nit:


---
  drivers/iommu/intel/pasid.c | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 41a0e3b02c79..0abfa7fc7fb0 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -710,9 +710,6 @@ int intel_pasid_setup_second_level(struct
intel_iommu *iommu,
pasid_set_fault_enable(pte);
pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));


Probably in a separate patch but above should really be renamed
to pasid_set_page_walk_snoop().


Yeah! Need a cleanup here. Above name is confusing.





-   if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
-   pasid_set_pgsnp(pte);
-
/*
 * Since it is a second level only translation setup, we should
 * set SRE bit as well (addresses are expected to be GPAs).
--
2.25.1




Best regards,
baolu


Re: [PATCH v2 2/4] iommu/vt-d: Check domain force_snooping against attached devices

2022-05-05 Thread Baolu Lu

On 2022/5/5 16:43, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Thursday, May 5, 2022 9:07 AM

As domain->force_snooping only impacts the devices attached with the
domain, there's no need to check against all IOMMU units. At the same
time, for a brand new domain (one that hasn't been attached to any
device), the force_snooping field could be set, but the attach_dev
callback will return failure if it tries to attach to a device whose
IOMMU has no snoop control capability.


The description of the brand new domain case is not very clear. I think
the point here is that force_snooping could be set on a domain no matter
whether it has been attached or not, and once set it is an immutable
flag. If no device is attached, the operation always succeeds; then this
empty domain can only be attached to a device whose IOMMU supports snoop
control.


Exactly. Will update this description.




  static bool intel_iommu_enforce_cache_coherency(struct iommu_domain
*domain)
  {
struct dmar_domain *dmar_domain = to_dmar_domain(domain);

-   if (!domain_update_iommu_snooping(NULL))
+   if (dmar_domain->force_snooping)
+   return true;
+
+   if (!domain_support_force_snooping(dmar_domain))
return false;
+


Who guarantees that domain->devices won't change between the
above condition check and the following set operation?


Good catch. Should lift the lock up here.
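
Something like below, matching the v4 change (the lock now covers both
the check and the set):

	spin_lock_irqsave(&device_domain_lock, flags);
	if (!domain_support_force_snooping(dmar_domain)) {
		spin_unlock_irqrestore(&device_domain_lock, flags);
		return false;
	}

	domain_set_force_snooping(dmar_domain);
	dmar_domain->force_snooping = true;
	spin_unlock_irqrestore(&device_domain_lock, flags);

	return true;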




+   domain_set_force_snooping(dmar_domain);
dmar_domain->force_snooping = true;
+
return true;
  }



Thanks
Kevin


Best regards,
baolu


Re: [PATCH v5 10/12] iommu: Prepare IOMMU domain for IOPF

2022-05-05 Thread Baolu Lu

On 2022/5/4 02:20, Jean-Philippe Brucker wrote:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7cae631c1baa..33449523afbe 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3174,3 +3174,24 @@ void iommu_detach_device_pasid(struct iommu_domain *domain,
 	iommu_group_put(group);
 }
+
+struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
+   ioasid_t pasid)
+{
+   struct iommu_domain *domain;
+   struct iommu_group *group;
+
+   if (!pasid_valid(pasid))
+   return NULL;
+
+   group = iommu_group_get(dev);
+   if (!group)
+   return NULL;
+
+	mutex_lock(&group->mutex);

Unfortunately this still causes the deadlock when unbind() flushes the
IOPF queue while holding the group mutex.


Sorry, I didn't get your point here.

Do you mean unbind() could hold the group mutex before calling this helper?
The group mutex is only available in iommu.c, so unbind() has no means
to hold this lock. Or did I miss anything?

Best regards,
baolu



If we make this function private to IOPF, then we can get rid of this
mutex_lock(). It's OK because:

* xarray protects its internal state with RCU, so we can call
   xa_load() outside the lock.

* The domain obtained from xa_load is finalized. Its content is valid
   because xarray stores the domain using rcu_assign_pointer(), which has a
   release memory barrier, which pairs with data dependencies in IOPF
   (domain->sva_ioas etc).

   We'll need to be careful about this when allowing other users to install
   a fault handler. Should be fine as long as the handler and data are
   installed before the domain is added to pasid_array.

* We know the domain is valid the whole time IOPF is using it, because
   unbind() waits for pending faults.

We just need a comment explaining the last point, something like:

	/*
	 * Safe to fetch outside the group mutex because:
	 * - xarray protects its internal state with RCU
	 * - the domain obtained is either NULL or fully formed
	 * - the IOPF work is the only caller and is flushed before the
	 *   domain is freed.
	 */

Thanks,
Jean


+	domain = xa_load(&group->pasid_array, pasid);
+	mutex_unlock(&group->mutex);
+   iommu_group_put(group);
+
+   return domain;
+}




Re: [PATCH v5 07/12] arm-smmu-v3/sva: Add SVA domain support

2022-05-05 Thread Baolu Lu

On 2022/5/4 02:12, Jean-Philippe Brucker wrote:

On Mon, May 02, 2022 at 09:48:37AM +0800, Lu Baolu wrote:

Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.

Signed-off-by: Lu Baolu 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 14 +++
  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 42 +++
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 21 ++
  3 files changed, 77 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..7631c00fdcbd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -759,6 +759,10 @@ struct iommu_sva *arm_smmu_sva_bind(struct device *dev, 
struct mm_struct *mm,
  void arm_smmu_sva_unbind(struct iommu_sva *handle);
  u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
  void arm_smmu_sva_notifier_synchronize(void);
+int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id);
+void arm_smmu_sva_detach_dev_pasid(struct iommu_domain *domain,
+  struct device *dev, ioasid_t id);
  #else /* CONFIG_ARM_SMMU_V3_SVA */
  static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
  {
@@ -804,5 +808,15 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva 
*handle)
  }
  
  static inline void arm_smmu_sva_notifier_synchronize(void) {}

+
+static inline int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain,
+   struct device *dev, ioasid_t id)
+{
+   return -ENODEV;
+}
+
+static inline void arm_smmu_sva_detach_dev_pasid(struct iommu_domain *domain,
+struct device *dev,
+ioasid_t id) {}
  #endif /* CONFIG_ARM_SMMU_V3_SVA */
  #endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index c623dae1e115..3b843cd3ed67 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -541,3 +541,45 @@ void arm_smmu_sva_notifier_synchronize(void)
 */
mmu_notifier_synchronize();
  }
+
+int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id)
+{
+   int ret = 0;
+   struct iommu_sva *handle;
+   struct mm_struct *mm = iommu_sva_domain_mm(domain);
+
+   if (domain->type != IOMMU_DOMAIN_SVA || !mm)


We wouldn't get that far with a non-SVA domain since iommu_sva_domain_mm()
would dereference a NULL pointer. Could you move it after the domain->type
check, and maybe add a WARN_ON()?  It could help catch issues in future
API changes.


Sure. I will make it like this,

int arm_smmu_sva_attach_dev_pasid(struct iommu_domain *domain,
				  struct device *dev, ioasid_t id)
{
	int ret = 0;
	struct mm_struct *mm;
	struct iommu_sva *handle;

	if (domain->type != IOMMU_DOMAIN_SVA)
		return -EINVAL;

	mm = iommu_sva_domain_mm(domain);
	if (WARN_ON(!mm))
		return -ENODEV;
	... ...




+   return -EINVAL;
+
+	mutex_lock(&sva_lock);
+	handle = __arm_smmu_sva_bind(dev, mm);
+	if (IS_ERR(handle))
+		ret = PTR_ERR(handle);
+	mutex_unlock(&sva_lock);
+
+   return ret;
+}
+
+void arm_smmu_sva_detach_dev_pasid(struct iommu_domain *domain,
+  struct device *dev, ioasid_t id)
+{
+	struct arm_smmu_bond *bond = NULL, *t;
+	struct mm_struct *mm = iommu_sva_domain_mm(domain);
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+	mutex_lock(&sva_lock);
+	list_for_each_entry(t, &master->bonds, list) {
+		if (t->mm == mm) {
+			bond = t;
+			break;
+		}
+	}
+
+	if (!WARN_ON(!bond) && refcount_dec_and_test(&bond->refs)) {
+		list_del(&bond->list);
+		arm_smmu_mmu_notifier_put(bond->smmu_mn);
+		kfree(bond);
+	}
+	mutex_unlock(&sva_lock);
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index afc63fce6107..bd80de0bad98 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1995,10 +1995,31 @@ static bool arm_smmu_capable(enum iommu_cap cap)
}
  }
  
+static void arm_smmu_sva_domain_free(struct iommu_domain *domain)

+{
+   kfree(domain);
+}
+
+static const struct iommu_domain_ops arm_smmu_sva_domain_ops = {
+	.attach_dev_pasid	= arm_smmu_sva_attach_dev_pasid,
+	.detach_dev_pasid	= arm_smmu_sva_detach_dev_pasid,
+	.free			= arm_smmu_sva_domain_free,

Re: [PATCH v5 04/12] iommu/sva: Basic data structures for SVA

2022-05-05 Thread Baolu Lu

On 2022/5/4 02:09, Jean-Philippe Brucker wrote:

On Mon, May 02, 2022 at 09:48:34AM +0800, Lu Baolu wrote:

Use below data structures for SVA implementation in the IOMMU core:

- struct iommu_sva_ioas
   Represent the I/O address space shared with an application CPU address
   space. This structure has a 1:1 relationship with an mm_struct. It
   grabs a "mm->mm_count" refcount during creation and drops it on release.


Do we actually need this structure?  At the moment it only keeps track of
bonds, which we can move to struct dev_iommu. Replacing it by a mm pointer
in struct iommu_domain simplifies the driver and seems to work


Fair enough.

+struct iommu_sva_ioas {
+   struct mm_struct *mm;
+   ioasid_t pasid;
+
+   /* Counter of domains attached to this ioas. */
+   refcount_t users;
+
+   /* All bindings are linked here. */
+   struct list_head bonds;
+};

By moving @mm to struct iommu_domain and @bonds to struct dev_iommu, the
code looks simpler. The mm, sva domain and per-device dev_iommu are
guaranteed to be valid during bind() and unbind().

Will head this direction in the next version.



Thanks,
Jean


Best regards,
baolu





- struct iommu_domain (IOMMU_DOMAIN_SVA type)
   Represent a hardware pagetable that the IOMMU hardware could use for
   SVA translation. Multiple iommu domains could be bound with an SVA ioas
   and each grabs a refcount from ioas in order to make sure ioas could
   only be freed after all domains have been unbound.

- struct iommu_sva
   Represent a bond relationship between an SVA ioas and an iommu domain.
   If a bond already exists, it's reused and a reference is taken.

Signed-off-by: Lu Baolu 
---
  include/linux/iommu.h | 14 +-
  drivers/iommu/iommu-sva-lib.h |  1 +
  drivers/iommu/iommu-sva-lib.c | 18 ++
  3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ab36244d4e94..f582f434c513 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -42,6 +42,7 @@ struct notifier_block;
  struct iommu_sva;
  struct iommu_fault_event;
  struct iommu_dma_cookie;
+struct iommu_sva_ioas;
  
  /* iommu fault flags */

  #define IOMMU_FAULT_READ  0x0
@@ -64,6 +65,9 @@ struct iommu_domain_geometry {
  #define __IOMMU_DOMAIN_PT (1U << 2)  /* Domain is identity mapped   */
  #define __IOMMU_DOMAIN_DMA_FQ (1U << 3)  /* DMA-API uses flush queue*/
  
+#define __IOMMU_DOMAIN_SHARED	(1U << 4)  /* Page table shared from CPU  */

+#define __IOMMU_DOMAIN_HOST_VA (1U << 5)  /* Host CPU virtual address */
+
  /*
   * This are the possible domain-types
   *
@@ -86,6 +90,8 @@ struct iommu_domain_geometry {
  #define IOMMU_DOMAIN_DMA_FQ   (__IOMMU_DOMAIN_PAGING |\
 __IOMMU_DOMAIN_DMA_API |   \
 __IOMMU_DOMAIN_DMA_FQ)
+#define IOMMU_DOMAIN_SVA   (__IOMMU_DOMAIN_SHARED |\
+__IOMMU_DOMAIN_HOST_VA)
  
  struct iommu_domain {

unsigned type;
@@ -95,6 +101,7 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
+   struct iommu_sva_ioas *sva_ioas;
  };
  
  static inline bool iommu_is_dma_domain(struct iommu_domain *domain)

@@ -628,7 +635,12 @@ struct iommu_fwspec {
   * struct iommu_sva - handle to a device-mm bond
   */
  struct iommu_sva {
-   struct device   *dev;
+   struct device   *dev;
+   struct iommu_sva_ioas   *sva_ioas;
+   struct iommu_domain *domain;
+   /* Link to sva ioas's bonds list */
+   struct list_headnode;
+   refcount_t  users;
  };
  
  int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,

diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 8909ea1094e3..9c5e108e2c8a 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -10,6 +10,7 @@
  
  int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);

  struct mm_struct *iommu_sva_find(ioasid_t pasid);
+struct mm_struct *iommu_sva_domain_mm(struct iommu_domain *domain);
  
  /* I/O Page fault */

  struct device;
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 106506143896..d524a402be3b 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -3,6 +3,8 @@
   * Helpers for IOMMU drivers implementing SVA
   */
  #include 
+#include 
+#include 
  #include 
  
  #include "iommu-sva-lib.h"

@@ -10,6 +12,22 @@
  static DEFINE_MUTEX(iommu_sva_lock);
  static DECLARE_IOASID_SET(iommu_sva_pasid);
  
+struct iommu_sva_ioas {

+   struct mm_struct *mm;
+   ioasid_t pasid;
+
+   /* Counter of domains attached to this ioas. */
+   refcount_t users;
+
+   /* All bindings are linked here. */
+   struct list_head bonds;
+};
+

Re: [PATCH v5 03/12] iommu: Add attach/detach_dev_pasid domain ops

2022-05-05 Thread Baolu Lu

On 2022/5/4 02:07, Jean-Philippe Brucker wrote:

On Mon, May 02, 2022 at 09:48:33AM +0800, Lu Baolu wrote:

Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but are not limited to):

  - SVA (Shared Virtual Address)
  - kernel DMA with PASID
  - hardware-assist mediated device

This adds a pair of common domain ops for this purpose and adds helpers
to attach/detach a domain to/from a {device, PASID}. Some buses, like
PCI, route packets without considering the PASID value. Thus a DMA target
address with PASID might be treated as P2P if the address falls into the
MMIO BAR of other devices in the group. To make things simple, these
interfaces only apply to devices belonging to the singleton groups, and
the singleton is immutable in fabric i.e. not affected by hotplug.

Signed-off-by: Lu Baolu 


Reviewed-by: Jean-Philippe Brucker 

just a nit below


---
  include/linux/iommu.h | 21 
  drivers/iommu/iommu.c | 76 +++
  2 files changed, 97 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b8ffaf2cb1d0..ab36244d4e94 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -263,6 +263,8 @@ struct iommu_ops {
   * struct iommu_domain_ops - domain specific operations
   * @attach_dev: attach an iommu domain to a device
   * @detach_dev: detach an iommu domain from a device
+ * @attach_dev_pasid: attach an iommu domain to a pasid of device
+ * @detach_dev_pasid: detach an iommu domain from a pasid of device
   * @map: map a physically contiguous memory region to an iommu domain
   * @map_pages: map a physically contiguous set of pages of the same size to
   * an iommu domain.
@@ -283,6 +285,10 @@ struct iommu_ops {
  struct iommu_domain_ops {
int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+   int (*attach_dev_pasid)(struct iommu_domain *domain,
+   struct device *dev, ioasid_t pasid);
+   void (*detach_dev_pasid)(struct iommu_domain *domain,
+struct device *dev, ioasid_t pasid);
  
  	int (*map)(struct iommu_domain *domain, unsigned long iova,

   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
@@ -678,6 +684,10 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, 
void *owner);
  void iommu_group_release_dma_owner(struct iommu_group *group);
  bool iommu_group_dma_owner_claimed(struct iommu_group *group);
  
+int iommu_attach_device_pasid(struct iommu_domain *domain,

+ struct device *dev, ioasid_t pasid);
+void iommu_detach_device_pasid(struct iommu_domain *domain,
+  struct device *dev, ioasid_t pasid);
  #else /* CONFIG_IOMMU_API */
  
  struct iommu_ops {};

@@ -1051,6 +1061,17 @@ static inline bool iommu_group_dma_owner_claimed(struct 
iommu_group *group)
  {
return false;
  }
+
+static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
+   struct device *dev, ioasid_t pasid)
+{
+   return -ENODEV;
+}
+
+static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
+struct device *dev, ioasid_t pasid)
+{
+}
  #endif /* CONFIG_IOMMU_API */
  
  /**

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 29906bc16371..89c9d19ddb28 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -38,6 +38,7 @@ struct iommu_group {
struct kobject kobj;
struct kobject *devices_kobj;
struct list_head devices;
+   struct xarray pasid_array;
struct mutex mutex;
void *iommu_data;
void (*iommu_data_release)(void *iommu_data);
@@ -630,6 +631,7 @@ struct iommu_group *iommu_group_alloc(void)
 	mutex_init(&group->mutex);
 	INIT_LIST_HEAD(&group->devices);
 	INIT_LIST_HEAD(&group->entry);
+	xa_init(&group->pasid_array);

 	ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL);

if (ret < 0) {
@@ -3190,3 +3192,77 @@ bool iommu_group_dma_owner_claimed(struct iommu_group 
*group)
return user;
  }
  EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
+
+/*
+ * Use standard PCI bus topology and isolation features to check immutable
+ * singleton. Otherwise, assume the bus is static and then singleton can
+ * know from the device count in the group.
+ */


The comment doesn't really add anything that can't be directly understood
from the code.


Yes. It's fine to remove it.

Best regards,
baolu


Re: [PATCH v5 02/12] iommu: Add pasid_bits field in struct dev_iommu

2022-05-05 Thread Baolu Lu

On 2022/5/4 02:02, Jean-Philippe Brucker wrote:

On Mon, May 02, 2022 at 09:48:32AM +0800, Lu Baolu wrote:

Use this field to save the pasid/ssid bits that a device is able to
support with its IOMMU hardware. It is a generic attribute of a device
and lifting it into the per-device dev_iommu struct makes it possible
to allocate a PASID for device without calls into the IOMMU drivers.
Any iommu driver which supports PASID related features should set this
field before features are enabled on the devices.

For initialization of this field in the VT-d driver, the
info->pasid_supported is only set for PCI devices. So the status is
that non-PCI SVA hasn't been supported yet. Setting this field only for
PCI devices has no functional change.

Signed-off-by: Lu Baolu

Reviewed-by: Jean-Philippe Brucker


Thank you! Very appreciated for reviewing my patches.

Best regards,
baolu



Re: [PATCH] iommu: iommu_group_claim_dma_owner() must always assign a domain

2022-05-04 Thread Baolu Lu

On 2022/5/4 22:38, Jason Gunthorpe wrote:

@@ -3180,7 +3181,9 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
 		ret = -EPERM;
 		goto unlock_out;
 	} else {
-		if (group->domain && group->domain != group->default_domain) {
+		if (group->domain &&
+		    group->domain != group->default_domain &&
+		    group->domain != group->blocking_domain) {

Why do we need this hunk? This is just trying to check if there is
some conflict with some other domain attach, group->domain can never
be blocking_domain here.

This is *not* needed. Also verified with real hardware.

Sorry for the noise.

Best regards,
baolu


Re: [PATCH 2/5] iommu/vt-d: Set SNP bit only in second-level page table entries

2022-05-04 Thread Baolu Lu

On 2022/5/4 21:31, Jason Gunthorpe wrote:

On Wed, May 04, 2022 at 03:25:50PM +0800, Baolu Lu wrote:

Hi Jason,

On 2022/5/2 21:05, Jason Gunthorpe wrote:

On Sun, May 01, 2022 at 07:24:31PM +0800, Lu Baolu wrote:

The SNP bit is only valid for second-level PTEs. Setting this bit in the
first-level PTEs has no functional impact because the Intel IOMMU always
ignores the same bit in first-level PTEs. Anyway, let's check the page
table type before setting SNP bit in PTEs to make the code more readable.

Shouldn't this be tested before setting force_snooping and not during
every map?


The check is in the following patch. This just makes sure that SNP is
only set in second-level page table entries.


I think you should add a 2nd flag to indicate 'set SNP bit in PTEs'
and take care of computing that flag in the enforce_no_snoop function


Yours looks better. Will add it in the next version.


Jason


Best regards,
baolu



Re: [PATCH] iommu: iommu_group_claim_dma_owner() must always assign a domain

2022-05-04 Thread Baolu Lu

Hi Jason,

On 2022/5/4 08:11, Jason Gunthorpe wrote:

Once the group enters 'owned' mode it can never be assigned back to the
default_domain or to a NULL domain. It must always be actively assigned to
a current domain. If the caller hasn't provided a domain then the core
must provide an explicit DMA blocking domain that has no DMA map.

Lazily create a group-global blocking DMA domain when
iommu_group_claim_dma_owner is first called and immediately assign the
group to it. This ensures that DMA is immediately fully isolated on all
IOMMU drivers.

If the user attaches/detaches while owned then detach will set the group
back to the blocking domain.

Slightly reorganize the call chains so that
__iommu_group_attach_core_domain() is the function that removes any caller
configured domain and sets the domains back a core owned domain with an
appropriate lifetime.

__iommu_group_attach_domain() is the worker function that can change the
domain assigned to a group to any target domain, including NULL.

Add comments clarifying how the NULL vs detach_dev vs default_domain works
based on Robin's remarks.

This fixes an oops with VFIO and SMMUv3 because VFIO will call
iommu_detach_group() and then immediately iommu_domain_free(), but
SMMUv3 has no way to know that the domain it is holding a pointer to
has been freed. Now the iommu_detach_group() will assign the blocking
domain and SMMUv3 will no longer hold a stale domain reference.

Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces")
Reported-by: Qian Cai 
Signed-off-by: Robin Murphy 
Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/iommu.c | 112 +++---
  1 file changed, 82 insertions(+), 30 deletions(-)

This is based on Robins draft here:

https://lore.kernel.org/linux-iommu/18831161-473f-e04f-4a81-1c7062ad1...@arm.com/

With some rework. I re-organized the call chains instead of introducing
iommu_group_user_attached(), fixed a recursive locking for
iommu_group_get_purgatory(), and made a proper commit message.

Still only compile tested, so RFCish.

Nicolin/Lu? What do you think, can you check it?


Thank you for the patch.

With the additional changes below, this patch works on my Intel test
machine.

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 513da82f2ed1..7c415e9b6906 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2063,7 +2063,8 @@ static int __iommu_attach_group(struct iommu_domain *domain,
 {
 	int ret;
 
-	if (group->domain && group->domain != group->default_domain)
+	if (group->domain && group->domain != group->default_domain &&
+	    group->domain != group->blocking_domain)
 		return -EBUSY;
 
 	ret = __iommu_group_for_each_dev(group, domain,
@@ -2125,7 +2126,7 @@ static int __iommu_group_attach_domain(struct iommu_group *group,
 	 * Note that this is called in error unwind paths, attaching to a
 	 * domain that has already been attached cannot fail.
 	 */
-	ret = __iommu_group_for_each_dev(group, group->default_domain,
+	ret = __iommu_group_for_each_dev(group, new_domain,
 					 iommu_group_do_attach_device);
 	if (ret)
 		return ret;
@@ -3180,7 +3181,9 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
 		ret = -EPERM;
 		goto unlock_out;
 	} else {
-		if (group->domain && group->domain != group->default_domain) {
+		if (group->domain &&
+		    group->domain != group->default_domain &&
+		    group->domain != group->blocking_domain) {
 			ret = -EBUSY;
 			goto unlock_out;

Best regards,
baolu


Re: [PATCH] iommu: iommu_group_claim_dma_owner() must always assign a domain

2022-05-04 Thread Baolu Lu

Hi Jason,

On 2022/5/4 08:11, Jason Gunthorpe wrote:

+static int __iommu_group_attach_domain(struct iommu_group *group,
+				       struct iommu_domain *new_domain)
 {
 	int ret;
 
+	if (group->domain == new_domain)
+		return 0;
+
 	/*
-	 * If the group has been claimed already, do not re-attach the default
-	 * domain.
+	 * A NULL domain means to call the detach_dev() op. New drivers should
+	 * use a IOMMU_DOMAIN_IDENTITY domain instead of a NULL default_domain
+	 * and detach_dev().
 	 */
-	if (!group->default_domain || group->owner) {
-		__iommu_group_for_each_dev(group, domain,
+	if (!new_domain) {
+		WARN_ON(!group->domain->ops->detach_dev);
+		__iommu_group_for_each_dev(group, group->domain,
 					   iommu_group_do_detach_device);
 		group->domain = NULL;
-		return;
+		return 0;
 	}
 
-	if (group->domain == group->default_domain)
-		return;
-
-	/* Detach by re-attaching to the default domain */
+	/*
+	 * New drivers do not implement detach_dev, so changing the domain is
+	 * done by calling attach on the new domain. Drivers should implement
+	 * this so that DMA is always translated by either the new, old, or a
+	 * blocking domain. DMA should never become untranslated.
+	 *
+	 * Note that this is called in error unwind paths, attaching to a
+	 * domain that has already been attached cannot fail.
+	 */
 	ret = __iommu_group_for_each_dev(group, group->default_domain,
 	                                               ^^

I suppose this should be @new_domain, right?

 					 iommu_group_do_attach_device);
-	if (ret != 0)
-		WARN_ON(1);
+	if (ret)
+		return ret;
+	group->domain = new_domain;
+	return 0;
+}


Best regards,
baolu


Re: [PATCH 5/5] iommu/vt-d: Remove hard coding PGSNP bit in PASID entries

2022-05-04 Thread Baolu Lu

On 2022/5/2 21:19, Jason Gunthorpe wrote:

On Sun, May 01, 2022 at 07:24:34PM +0800, Lu Baolu wrote:

As enforce_cache_coherency has been introduced into the iommu_domain_ops,
the kernel component which owns the iommu domain is able to opt-in its
requirement for force snooping support. The iommu driver has no need to
hard code the page snoop control bit in the PASID table entries anymore.

Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel/pasid.h | 1 -
  drivers/iommu/intel/iommu.c | 3 ---
  drivers/iommu/intel/pasid.c | 6 --
  3 files changed, 10 deletions(-)


It seems fine, but, as in the other email, where do we do
pasid_set_pgsnp() for a new device attach on an already no-snoop domain?


Yes. I will take care of this in the next version.


Jason


Best regards,
baolu


Re: [PATCH 4/5] iommu/vt-d: Remove domain_update_iommu_snooping()

2022-05-04 Thread Baolu Lu

On 2022/5/3 05:36, Jacob Pan wrote:

Hi BaoLu,

On Sun, 1 May 2022 19:24:33 +0800, Lu Baolu 
wrote:


The IOMMU force snooping capability is not required to be consistent
among all the IOMMUs anymore. Remove force snooping capability check
in the IOMMU hot-add path and domain_update_iommu_snooping() becomes
a dead code now.

Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel/iommu.c | 34 +-
  1 file changed, 1 insertion(+), 33 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 3c1c228f9031..d5808495eb64 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -533,33 +533,6 @@ static void domain_update_iommu_coherency(struct dmar_domain *domain)
 	rcu_read_unlock();
 }
 
-static bool domain_update_iommu_snooping(struct intel_iommu *skip)
-{
-	struct dmar_drhd_unit *drhd;
-	struct intel_iommu *iommu;
-	bool ret = true;
-
-	rcu_read_lock();
-	for_each_active_iommu(iommu, drhd) {
-		if (iommu != skip) {
-			/*
-			 * If the hardware is operating in the scalable mode,
-			 * the snooping control is always supported since we
-			 * always set PASID-table-entry.PGSNP bit if the domain
-			 * is managed outside (UNMANAGED).
-			 */
-			if (!sm_supported(iommu) &&
-			    !ecap_sc_support(iommu->ecap)) {
-				ret = false;
-				break;
-			}
-		}
-	}
-	rcu_read_unlock();
-
-	return ret;
-}
-
 static int domain_update_iommu_superpage(struct dmar_domain *domain,
 					 struct intel_iommu *skip)
 {
@@ -3593,12 +3566,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
 			iommu->name);
 		return -ENXIO;
 	}
-	if (!ecap_sc_support(iommu->ecap) &&
-	    domain_update_iommu_snooping(iommu)) {
-		pr_warn("%s: Doesn't support snooping.\n",
-			iommu->name);
-		return -ENXIO;
-	}
+

Maybe I missed earlier patches, so this bit can also be deleted?

struct dmar_domain {
	u8 iommu_snooping: 1;	/* indicate snooping control feature */


It has been cleaned up by the commit below:

71cfafda9c9b ("vfio: Move the Intel no-snoop control off of IOMMU_CACHE")




sp = domain_update_iommu_superpage(NULL, iommu) - 1;
if (sp >= 0 && !(cap_super_page_val(iommu->cap) & (1 << sp))) {
pr_warn("%s: Doesn't support large page.\n",



Thanks,

Jacob


Best regards,
baolu


Re: [PATCH 3/5] iommu/vt-d: Check domain force_snooping against attached devices

2022-05-04 Thread Baolu Lu

On 2022/5/3 05:31, Jacob Pan wrote:

Hi BaoLu,


Hi Jacob,



On Sun, 1 May 2022 19:24:32 +0800, Lu Baolu 
wrote:


As domain->force_snooping only impacts the devices attached to the
domain, there's no need to check against all IOMMU units. At the same
time, for a brand new domain (one that hasn't been attached to any
device), the force_snooping field could be set, but the attach_dev
callback will return failure if it tries to attach to a device whose
IOMMU has no snoop control capability.

Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel/pasid.h |  2 ++
  drivers/iommu/intel/iommu.c | 50 -
  drivers/iommu/intel/pasid.c | 18 +
  3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index ab4408c824a5..583ea67fc783 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -123,4 +123,6 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 bool fault_ignore);
 int vcmd_alloc_pasid(struct intel_iommu *iommu, u32 *pasid);
 void vcmd_free_pasid(struct intel_iommu *iommu, u32 pasid);
+void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
+					  struct device *dev, u32 pasid);
 #endif /* __INTEL_PASID_H */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 98050943d863..3c1c228f9031 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4554,13 +4554,61 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
 	return phys;
 }
 
+static bool domain_support_force_snooping(struct dmar_domain *domain)
+{
+	struct device_domain_info *info;
+	unsigned long flags;
+	bool support = true;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	if (list_empty(&domain->devices))
+		goto out;
+
+	list_for_each_entry(info, &domain->devices, link) {
+		if (!ecap_sc_support(info->iommu->ecap)) {
+			support = false;
+			break;
+		}
+	}

Why not just check the dmar_domain->force_snooping flag? Devices
wouldn't be able to attach if !ecap_sc, right?


I should check "dmar_domain->force_snooping" first. Only if this is the
first time the flag is about to be set do we need to check the capabilities.




+out:
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+	return support;
+}
+
+static void domain_set_force_snooping(struct dmar_domain *domain)
+{
+	struct device_domain_info *info;
+	unsigned long flags;
+
+	/*
+	 * Second level page table supports per-PTE snoop control. The
+	 * iommu_map() interface will handle this by setting SNP bit.
+	 */
+	if (!domain_use_first_level(domain))
+		return;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	if (list_empty(&domain->devices))
+		goto out_unlock;
+
+	list_for_each_entry(info, &domain->devices, link)
+		intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
+						     PASID_RID2PASID);
+

I guess other DMA API PASIDs need to have sc bit set as well. I will keep
this in mind for my DMA API PASID patch.


Kernel DMA doesn't need to set the PGSNP bit. The x86 arch is always
DMA coherent. Force snooping is only needed when the device is
controlled by user space and the VMM is optimized not to support
virtualization of the wbinvd instruction.




+out_unlock:
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+}
+
 static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 
-	if (!domain_update_iommu_snooping(NULL))
+	if (!domain_support_force_snooping(dmar_domain))
 		return false;
+
+	domain_set_force_snooping(dmar_domain);
 	dmar_domain->force_snooping = true;
+

nit: spurious change


I prefer to have a blank line before the final return.

 	return true;
 }
 
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index f8d215d85695..815c744e6a34 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -762,3 +762,21 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 
 	return 0;
 }
+
+/*
+ * Set the page snoop control for a pasid entry which has been set up.
+ */
+void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
+					  struct device *dev, u32 pasid)
+{
+	struct pasid_entry *pte;
+	u16 did;
+
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (WARN_ON(!pte || !pasid_pte_is_present(pte)))
+		return;
+
+	pasid_set_pgsnp(pte);
+	did = pasid_get_domain_id(pte);
+	pasid_flush_caches(iommu, pte, pasid, did);
+}



Thanks,

Jacob


Best regards,
baolu

Re: [PATCH 3/5] iommu/vt-d: Check domain force_snooping against attached devices

2022-05-04 Thread Baolu Lu

On 2022/5/2 21:17, Jason Gunthorpe wrote:

On Sun, May 01, 2022 at 07:24:32PM +0800, Lu Baolu wrote:

+static bool domain_support_force_snooping(struct dmar_domain *domain)
+{
+   struct device_domain_info *info;
+   unsigned long flags;
+   bool support = true;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	if (list_empty(&domain->devices))
+   goto out;


Why? list_for_each_entry will just do nothing..


Yes. I will remove above two lines.




+	list_for_each_entry(info, &domain->devices, link) {
+   if (!ecap_sc_support(info->iommu->ecap)) {
+   support = false;
+   break;
+   }
+   }
+out:
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+   return support;
+}
+
+static void domain_set_force_snooping(struct dmar_domain *domain)
+{
+   struct device_domain_info *info;
+   unsigned long flags;
+
+   /*
+* Second level page table supports per-PTE snoop control. The
+* iommu_map() interface will handle this by setting SNP bit.
+*/
+   if (!domain_use_first_level(domain))
+   return;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	if (list_empty(&domain->devices))
+		goto out_unlock;
+
+	list_for_each_entry(info, &domain->devices, link)
+		intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
+						     PASID_RID2PASID);
+
+out_unlock:
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+}
+
 static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 
-	if (!domain_update_iommu_snooping(NULL))
+	if (!domain_support_force_snooping(dmar_domain))
 		return false;


Maybe exit early if force_snooping = true?


Yes, we should check force_snooping and return directly if it has
already been set. As you pointed out below, the new domain attach path
should take care of this flag as well. Thanks!
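
In other words, something along these lines (a sketch of the early
exit, reusing the helpers from this patch):

static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
{
	struct dmar_domain *dmar_domain = to_dmar_domain(domain);

	if (dmar_domain->force_snooping)
		return true;

	if (!domain_support_force_snooping(dmar_domain))
		return false;

	domain_set_force_snooping(dmar_domain);
	dmar_domain->force_snooping = true;

	return true;
}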




+	domain_set_force_snooping(dmar_domain);
 	dmar_domain->force_snooping = true;
+
 	return true;
 }
 
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index f8d215d85695..815c744e6a34 100644
+++ b/drivers/iommu/intel/pasid.c
@@ -762,3 +762,21 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 
 	return 0;
 }
+
+/*
+ * Set the page snoop control for a pasid entry which has been set up.
+ */


So the 'first level' is only used with pasid?


Yes. A fake PASID (RID2PASID in the spec) is used for legacy
transactions (those without a PASID).




+void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
+					  struct device *dev, u32 pasid)
+{
+	struct pasid_entry *pte;
+	u16 did;
+
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (WARN_ON(!pte || !pasid_pte_is_present(pte)))
+		return;
+
+	pasid_set_pgsnp(pte);


Doesn't this need to be done in other places too, like when a new attach
is made? Patch 5 removed it, but should it be done when
domain->force_snooping is set?


Yes. I missed this. Will take care of this in the next version.
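
Presumably something along these lines in the attach path (a sketch
only, not the actual next version):

/* Hypothetical: when a device is attached to a domain that already has
 * force_snooping set, program PGSNP in its PASID table entry as well. */
static int domain_add_dev_info(struct dmar_domain *domain, struct device *dev)
{
	struct device_domain_info *info = dev_iommu_priv_get(dev);

	/* ... existing PASID entry setup ... */

	if (domain->force_snooping && domain_use_first_level(domain))
		intel_pasid_setup_page_snoop_control(info->iommu, dev,
						     PASID_RID2PASID);
	return 0;
}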



Jason


Best regards,
baolu


Re: [PATCH 2/5] iommu/vt-d: Set SNP bit only in second-level page table entries

2022-05-04 Thread Baolu Lu

Hi Jason,

On 2022/5/2 21:05, Jason Gunthorpe wrote:

On Sun, May 01, 2022 at 07:24:31PM +0800, Lu Baolu wrote:

The SNP bit is only valid for second-level PTEs. Setting this bit in the
first-level PTEs has no functional impact because the Intel IOMMU always
ignores the same bit in first-level PTEs. Anyway, let's check the page
table type before setting SNP bit in PTEs to make the code more readable.

Shouldn't this be tested before setting force_snooping and not during
every map?


The check is in the following patch. This just makes sure that SNP is
only set in second-level page table entries.

Best regards,
baolu



Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-30 Thread Baolu Lu

On 2022/4/30 06:19, Fenghua Yu wrote:

Hi, Jean and Baolu,

On Fri, Apr 29, 2022 at 03:34:36PM +0100, Jean-Philippe Brucker wrote:

On Fri, Apr 29, 2022 at 06:51:17AM -0700, Fenghua Yu wrote:

Hi, Baolu,

On Fri, Apr 29, 2022 at 03:53:57PM +0800, Baolu Lu wrote:

On 2022/4/28 16:39, Jean-Philippe Brucker wrote:

The address space is what the OOM killer is after.  That gets refcounted
with mmget()/mmput()/mm->mm_users.  The OOM killer is satiated by the
page freeing done in __mmput()->exit_mmap().

Also, all the VMAs should be gone after exit_mmap().  So, even if
vma->vm_file was holding a reference to a device driver, that reference
should be gone by the time __mmdrop() is actually freeing the PASID.


I agree with all that. The concern was about tearing down the PASID in the
IOMMU and device from the release() MMU notifier, which would happen in
exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And
since the IOMMU drivers need to hold mm->mm_count anyway between bind()
and unbind(), I think Fenghua's fix works.


But I didn't find mmgrab()/mmdrop() being called in either the arm or
intel IOMMU drivers.

$ git grep mmgrab drivers/iommu/
[no output]

Do we need to add these in a separate fix patch, or did I miss anything
here?


On both ARM and X86, sva_bind() calls mmu_notifier_register()->mmgrab() and
sva_unbind() calls mmu_notifier_unregister()/mmu_notifier_put()->mmdrop().


Yes, although for Arm I realized the mmu_notifier grab wasn't sufficient
so I sent a separate fix that should go in 5.18 as well
https://lore.kernel.org/linux-iommu/20220426130444.300556-1-jean-phili...@linaro.org/
The Arm driver still touches the arch mm context after mmu_notifier_put().
I don't think X86 has that problem.


I think so too. On X86, the mm is not used after mmu_notifier_unregister().

In case of supervisor mode SVM (i.e. svm is bound to init_mm), the code
is right too because init_mm and its PASID cannot be dropped and
mmu_notifier_register()/mmu_notifier_unregister() are not called.

So I think no extra fix patch is needed on X86.


Thanks, Fenghua and Jean. It's clear to me now.

Jean, another quick question about iommu_sva_bind_device():

/**
 * iommu_sva_bind_device() - Bind a process address space to a device
 * @dev: the device
 * @mm: the mm to bind, caller must hold a reference to it
 * @drvdata: opaque data pointer to pass to bind callback

This interface requires the caller to take a reference to mm. Which
reference should it take, mm->mm_count or mm->mm_users? It's better to
make it explicit in this comment.
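
For example, the kerneldoc could be made explicit along these lines
(the wording is only a suggestion):

/**
 * iommu_sva_bind_device() - Bind a process address space to a device
 * @dev: the device
 * @mm: the mm to bind; the caller must hold a reference to mm->mm_users
 *      (e.g. via mmget()) for the duration of this call, while the IOMMU
 *      driver itself pins mm->mm_count through mmu_notifier_register()
 *      until unbind
 * @drvdata: opaque data pointer to pass to bind callback
 */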

Best regards,
baolu


Re: [PATCH RFC 18/19] iommu/intel: Access/Dirty bit support for SL domains

2022-04-30 Thread Baolu Lu

On 2022/4/29 05:09, Joao Martins wrote:

--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5089,6 +5089,113 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
 	}
 }
 
  
+static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
+					  bool enable)
+{
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   struct device_domain_info *info;
+   unsigned long flags;
+   int ret = -EINVAL;


if (domain_use_first_level(dmar_domain))
return -EOPNOTSUPP;


+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	if (list_empty(&dmar_domain->devices)) {
+		spin_unlock_irqrestore(&device_domain_lock, flags);
+		return ret;
+	}


I agree with Kevin's suggestion in his reply.


+
+	list_for_each_entry(info, &dmar_domain->devices, link) {
+		if (!info->dev || (info->domain != dmar_domain))
+			continue;


This check is redundant.


+
+   /* Dirty tracking is second-stage level SM only */
+   if ((info->domain && domain_use_first_level(info->domain)) ||
+   !ecap_slads(info->iommu->ecap) ||
+   !sm_supported(info->iommu) || !intel_iommu_sm) {
+   ret = -EOPNOTSUPP;
+   continue;


Perhaps break and return -EOPNOTSUPP directly here? We are not able to
support a mixed mode, right?


+		}
+
+		ret = intel_pasid_setup_dirty_tracking(info->iommu, info->domain,
+						       info->dev, PASID_RID2PASID,
+						       enable);
+		if (ret)
+			break;
+	}
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+
+
+	/*
+	 * We need to flush the context TLB and IOTLB with any cached
+	 * translations to force the incoming DMA requests to have their
+	 * IOTLB entries tagged with A/D bits.
+	 */
+   intel_flush_iotlb_all(domain);
+   return ret;
+}
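
Pulling these review comments together, the function might end up
looking something like this (a sketch, not a tested patch):

static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
					  bool enable)
{
	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
	struct device_domain_info *info;
	unsigned long flags;
	int ret = 0;

	if (domain_use_first_level(dmar_domain))
		return -EOPNOTSUPP;

	spin_lock_irqsave(&device_domain_lock, flags);
	list_for_each_entry(info, &dmar_domain->devices, link) {
		/* Dirty tracking is second-stage scalable mode only */
		if (!sm_supported(info->iommu) ||
		    !ecap_slads(info->iommu->ecap)) {
			ret = -EOPNOTSUPP;
			break;
		}

		ret = intel_pasid_setup_dirty_tracking(info->iommu, dmar_domain,
						       info->dev, PASID_RID2PASID,
						       enable);
		if (ret)
			break;
	}
	spin_unlock_irqrestore(&device_domain_lock, flags);

	intel_flush_iotlb_all(domain);
	return ret;
}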


Best regards,
baolu


Re: [PATCH RFC 04/19] iommu: Add an unmap API that returns dirtied IOPTEs

2022-04-29 Thread Baolu Lu

On 2022/4/29 05:09, Joao Martins wrote:

Today, the dirty state is lost and the page wouldn't be migrated to
the destination, potentially leading the guest into error.

Add an unmap API that reads the dirty bit and sets it in the
user-passed bitmap. This unmap iommu API tackles a potentially
racy update to the dirty bit *when* doing DMA on an iova that is
being unmapped at the same time.

The new unmap_read_dirty/unmap_pages_read_dirty ops do not replace
the plain unmap ops; they are only used when explicitly called with
dirty bitmap data passed in.

It could be said that the guest is buggy, and that rather than a special
unmap path tackling the theoretical race ... it would suffice to fetch
the dirty bits (with GET_DIRTY_IOVA) and then unmap the IOVA.


I am not sure whether this API could solve the race.

size_t iommu_unmap(struct iommu_domain *domain,
		   unsigned long iova, size_t size)
{
	struct iommu_iotlb_gather iotlb_gather;
	size_t ret;

	iommu_iotlb_gather_init(&iotlb_gather);
	ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
	iommu_iotlb_sync(domain, &iotlb_gather);

	return ret;
}

The PTEs are cleared before iotlb invalidation. What if a DMA write
happens after PTE clearing and before the iotlb invalidation with the
PTE happening to be cached?
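
To spell out the window (an illustration of the above, not new code):

  CPU (iommu_unmap)                     Device
  -----------------                     ------
  __iommu_unmap(): PTE cleared,
  dirty bit harvested
                                        DMA write hits a still-cached
                                        IOTLB entry; the write goes
                                        through, but there is no PTE
                                        left to record the dirty state
  iommu_iotlb_sync(): entry invalidated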

Best regards,
baolu


Re: [PATCH RFC 03/19] iommufd: Dirty tracking data support

2022-04-29 Thread Baolu Lu

On 2022/4/29 05:09, Joao Martins wrote:

Add an IO pagetable API iopt_read_and_clear_dirty_data() that
performs the reading of dirty IOPTEs for a given IOVA range and
then copying back to userspace from each area-internal bitmap.

Underneath it uses the IOMMU equivalent API which will read the
dirty bits, as well as atomically clearing the IOPTE dirty bit
and flushing the IOTLB at the end. The dirty bitmaps pass an
iotlb_gather to allow batching the dirty-bit updates.

Most of the complexity, though, is in the handling of the user
bitmaps, to avoid copies back and forth. The bitmap user addresses
need to be iterated through, pinned, and the pages then passed into
the iommu core. The amount of bitmap data passed at a time for a
read_and_clear_dirty() is one page worth of pinned base page
pointers. That equates to 16M bits, or rather 64G of data that
can be returned as 'dirtied'. The IOTLB is flushed at the end of
the whole scanned IOVA range, to defer the potential DMA performance
penalty as much as possible.

Signed-off-by: Joao Martins 
---
  drivers/iommu/iommufd/io_pagetable.c| 169 
  drivers/iommu/iommufd/iommufd_private.h |  44 ++
  2 files changed, 213 insertions(+)

diff --git a/drivers/iommu/iommufd/io_pagetable.c 
b/drivers/iommu/iommufd/io_pagetable.c
index f4609ef369e0..835b5040fce9 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "io_pagetable.h"
  
@@ -347,6 +348,174 @@ int iopt_set_dirty_tracking(struct io_pagetable *iopt,
 
 	return ret;
 }
 
+int iommufd_dirty_iter_init(struct iommufd_dirty_iter *iter,
+			    struct iommufd_dirty_data *bitmap)
+{
+	struct iommu_dirty_bitmap *dirty = &iter->dirty;
+	unsigned long bitmap_len;
+
+	bitmap_len = dirty_bitmap_bytes(bitmap->length >> dirty->pgshift);
+
+	import_single_range(WRITE, bitmap->data, bitmap_len,
+			    &iter->bitmap_iov, &iter->bitmap_iter);
+	iter->iova = bitmap->iova;
+
+	/* Can record up to 64G at a time */
+	dirty->pages = (struct page **) __get_free_page(GFP_KERNEL);
+
+	return !dirty->pages ? -ENOMEM : 0;
+}
+
+void iommufd_dirty_iter_free(struct iommufd_dirty_iter *iter)
+{
+	struct iommu_dirty_bitmap *dirty = &iter->dirty;
+
+	if (dirty->pages) {
+		free_page((unsigned long) dirty->pages);
+		dirty->pages = NULL;
+	}
+}
+
+bool iommufd_dirty_iter_done(struct iommufd_dirty_iter *iter)
+{
+	return iov_iter_count(&iter->bitmap_iter) > 0;
+}
+
+static inline unsigned long iommufd_dirty_iter_bytes(struct iommufd_dirty_iter *iter)
+{
+	unsigned long left = iter->bitmap_iter.count - iter->bitmap_iter.iov_offset;
+
+	left = min_t(unsigned long, left, (iter->dirty.npages << PAGE_SHIFT));
+
+	return left;
+}
+
+unsigned long iommufd_dirty_iova_length(struct iommufd_dirty_iter *iter)
+{
+	unsigned long left = iommufd_dirty_iter_bytes(iter);
+
+	return ((BITS_PER_BYTE * left) << iter->dirty.pgshift);
+}
+
+unsigned long iommufd_dirty_iova(struct iommufd_dirty_iter *iter)
+{
+	unsigned long skip = iter->bitmap_iter.iov_offset;
+
+	return iter->iova + ((BITS_PER_BYTE * skip) << iter->dirty.pgshift);
+}
+
+void iommufd_dirty_iter_advance(struct iommufd_dirty_iter *iter)
+{
+	iov_iter_advance(&iter->bitmap_iter, iommufd_dirty_iter_bytes(iter));
+}
+
+void iommufd_dirty_iter_put(struct iommufd_dirty_iter *iter)
+{
+	struct iommu_dirty_bitmap *dirty = &iter->dirty;
+
+	if (dirty->npages)
+		unpin_user_pages(dirty->pages, dirty->npages);
+}
+
+int iommufd_dirty_iter_get(struct iommufd_dirty_iter *iter)
+{
+	struct iommu_dirty_bitmap *dirty = &iter->dirty;
+	unsigned long npages;
+	unsigned long ret;
+	void *addr;
+
+	addr = iter->bitmap_iov.iov_base + iter->bitmap_iter.iov_offset;
+	npages = iov_iter_npages(&iter->bitmap_iter,
+				 PAGE_SIZE / sizeof(struct page *));
+
+	ret = pin_user_pages_fast((unsigned long) addr, npages,
+				  FOLL_WRITE, dirty->pages);
+	if (ret <= 0)
+		return -EINVAL;
+
+	dirty->npages = ret;
+	dirty->iova = iommufd_dirty_iova(iter);
+	dirty->start_offset = offset_in_page(addr);
+	return 0;
+}
+
+static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
+				      struct iommufd_dirty_data *bitmap)


This looks more like a helper in the iommu core. How about
iommufd_read_clear_domain_dirty()?


+{
+	const struct iommu_domain_ops *ops = domain->ops;
+	struct iommu_iotlb_gather gather;
+	struct iommufd_dirty_iter iter;
+	int ret = 0;
+
+	if (!ops || !ops->read_and_clear_dirty)
+		return -EOPNOTSUPP;
+
+	iommu_dirty_bitmap_init(&iter.dirty, bitmap->iova,

Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable

2022-04-29 Thread Baolu Lu

On 2022/4/29 05:09, Joao Martins wrote:

+int iopt_set_dirty_tracking(struct io_pagetable *iopt,
+			    struct iommu_domain *domain, bool enable)
+{
+	struct iommu_domain *dom;
+	unsigned long index;
+	int ret = -EOPNOTSUPP;
+
+	down_write(&iopt->iova_rwsem);
+	if (!domain) {
+		down_write(&iopt->domains_rwsem);
+		xa_for_each(&iopt->domains, index, dom) {
+			ret = iommu_set_dirty_tracking(dom, iopt, enable);
+			if (ret < 0)
+				break;


Do you need to roll back to the original state before returning failure?
Some domains will already have had dirty-bit tracking enabled.


+		}
+		up_write(&iopt->domains_rwsem);
+	} else {
+		ret = iommu_set_dirty_tracking(domain, iopt, enable);
+	}
+
+	up_write(&iopt->iova_rwsem);
+	return ret;
+}
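
A rollback could look something like this (a sketch against the quoted
code; the unwind variable is new):

	struct iommu_domain *dom;
	unsigned long index, unwind;
	int ret = 0;

	down_write(&iopt->domains_rwsem);
	xa_for_each(&iopt->domains, index, dom) {
		ret = iommu_set_dirty_tracking(dom, iopt, enable);
		if (ret < 0) {
			/* Restore the domains switched before the failure. */
			xa_for_each(&iopt->domains, unwind, dom) {
				if (unwind >= index)
					break;
				iommu_set_dirty_tracking(dom, iopt, !enable);
			}
			break;
		}
	}
	up_write(&iopt->domains_rwsem);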


Best regards,
baolu


Re: [PATCH RFC 01/19] iommu: Add iommu_domain ops for dirty tracking

2022-04-29 Thread Baolu Lu

Hi Joao,

Thanks for doing this.

On 2022/4/29 05:09, Joao Martins wrote:

Add to the iommu domain operations a set of callbacks to perform dirty
tracking, particularly to start and stop tracking, and finally to test
and clear the dirty data.

Drivers are expected to dynamically change their hw protection domain
bits to toggle the tracking, and to flush some form of control state
structure that stands in the IOVA translation path.

For reading and clearing dirty data: in all IOMMUs, a transition of
any of the PTE access bits (Access, Dirty) implies flushing the IOTLB
to invalidate any stale IOTLB data about whether or not the IOMMU
should update said PTEs. The iommu core APIs introduce a new structure
for storing the dirties; vendor IOMMUs implementing
.read_and_clear_dirty() just use iommu_dirty_bitmap_record() to fill
the memory storing the dirties. The underlying tracking/iteration of
the user bitmap memory is instead done by iommufd, which takes care of
initializing the dirty bitmap *prior* to passing it to the IOMMU
domain op.

So far this holds for the currently/to-be-supported IOMMUs with dirty
tracking support, particularly because the tracking is part of the
first-stage tables and part of address translation. Below is a
description of how the hardware deals with the protection domain
control bits, to justify the added iommu core APIs; the vendor IOMMU
implementations will also explain the dirty bit usage/clearing in the
IOPTEs in more detail.

* x86 AMD:

The same applies to AMD, particularly to the Device Table, followed by
flushing the Device IOTLB. On AMD[1], section "2.2.1 Updating Shared
Tables" says, e.g.:

"Each table can also have its contents cached by the IOMMU or
peripheral IOTLBs. Therefore, after updating a table entry that can be
cached, system software must send the IOMMU an appropriate invalidate
command. Information in the peripheral IOTLBs must also be
invalidated."

There's no mention of particular bits that are cached or
not but fetching a dev entry is part of address translation
as also depicted, so invalidate the device table to make
sure the next translations fetch a DTE entry with the HD bits set.

* x86 Intel (rev3.0+):

Likewise[2] set the SSADE bit in the scalable-entry second stage table
to enable Access/Dirty bits in the second stage page table. See manual,
particularly on "6.2.3.1 Scalable-Mode PASID-Table Entry Programming
Considerations"


"When modifying root-entries, scalable-mode root-entries,
context-entries, or scalable-mode context entries:

Software must serially invalidate the context-cache, PASID-cache (if
applicable), and the IOTLB. The serialization is required since
hardware may utilize information from the context-caches (e.g.,
Domain-ID) to tag new entries inserted to the PASID-cache and IOTLB
for processing in-flight requests. Section 6.5 describes the
invalidation operations."

Also, Table 23 ("Guidance to Software for Invalidations") in section
"6.5.3.3 Guidance to Software for Invalidations" explicitly mentions:

"SSADE transition from 0 to 1 in a scalable-mode PASID-table entry
with PGTT value of Second-stage or Nested"

* ARM SMMUV3.2:

SMMUv3.2 needs to toggle the dirty bit descriptor
over the CD (or S2CD) for toggling and flush/invalidate
the IOMMU dev IOTLB.

Reference[0]: SMMU spec, "5.4.1 CD notes",


"The following CD fields are permitted to be cached as part of a
translation or TLB entry, and alteration requires invalidation of any
TLB entry that might have cached these fields, in addition to CD
structure cache invalidation:

...
HA, HD
...

Although the Arm SMMUv3 case is a tad different from its x86
counterparts. Rather than changing *only* the IOMMU domain device
entry to enable dirty tracking (and having a dedicated bit for
dirtiness in the IOPTE), Arm instead uses a dirty-bit modifier which
is separately enabled and changes the *existing* meaning of the access
bits (for ro/rw), to the point that marking an access bit read-only
with the dirty-bit modifier enabled doesn't trigger a permission I/O
page fault.

In practice this means that changing the iommu context isn't enough
and is in fact mostly useless IIUC (and can always be enabled).
Dirtying is only really enabled when the DBM pte bit is enabled (with
the CD.HD bit as a prereq).

To capture this h/w construct, an iommu core API is added which enables
dirty tracking on an IOVA range rather than on a device/context entry.
iommufd picks one or the other; the IOMMUFD core will favour the
device-context op, falling back to the IOVA-range alternative.


Instead of specification words, I'd like to read more about why the
callbacks are needed and how they should be implemented and consumed.
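
For reference, the shape of the additions as far as it can be read from
this description (signatures paraphrased, not authoritative):

struct iommu_domain_ops {
	/* ... */
	int (*set_dirty_tracking)(struct iommu_domain *domain, bool enable);
	int (*read_and_clear_dirty)(struct iommu_domain *domain,
				    unsigned long iova, size_t size,
				    struct iommu_dirty_bitmap *dirty);
};

/* Drivers record dirtied IOVAs through the core helper: */
iommu_dirty_bitmap_record(dirty, iova, size);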



[0] https://developer.arm.com/documentation/ihi0070/latest
[1] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
[2] https://cdrdv2.intel.com/v1/dl/getContent/671081

Signed-off-by: Joao Martins 
---
  drivers/iommu/iommu.c  | 28 
  include/linux/io-pgtable.h |  6 +
  include/linux/iommu.h  | 52 

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-29 Thread Baolu Lu

On 2022/4/28 16:39, Jean-Philippe Brucker wrote:

On Tue, Apr 26, 2022 at 04:31:57PM -0700, Dave Hansen wrote:

On 4/26/22 09:48, Jean-Philippe Brucker wrote:

On Tue, Apr 26, 2022 at 08:27:00AM -0700, Dave Hansen wrote:

On 4/25/22 09:40, Jean-Philippe Brucker wrote:

The problem is that we'd have to request the device driver to stop DMA
before we can destroy the context and free the PASID. We did consider
doing this in the release() MMU notifier, but there were concerns about
blocking mmput() for too long (for example
https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa79...@amd.com/
though I think there was a more recent discussion). We also need to drain
the PRI and fault queues to get rid of all references to that PASID.

Is the concern truly about blocking mmput() itself?  Or, is it about
releasing the resources associated with the mm?

The latter I think, this one was about releasing pages as fast as possible
if the process is picked by the OOM killer.


We're tying the PASID to the life of the mm itself, not the mm's address
space.  That means the PASID should be tied to
mmgrab()/mmdrop()/mm->mm_count.

The address space is what the OOM killer is after.  That gets refcounted
with mmget()/mmput()/mm->mm_users.  The OOM killer is satiated by the
page freeing done in __mmput()->exit_mmap().

Also, all the VMAs should be gone after exit_mmap().  So, even if
vma->vm_file was holding a reference to a device driver, that reference
should be gone by the time __mmdrop() is actually freeing the PASID.


I agree with all that. The concern was about tearing down the PASID in the
IOMMU and device from the release() MMU notifier, which would happen in
exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And
since the IOMMU drivers need to hold mm->mm_count anyway between bind()
and unbind(), I think Fenghua's fix works.


But I didn't find mmgrab()/mmdrop() being called in either the arm or
intel IOMMU drivers.

$ git grep mmgrab drivers/iommu/
[no output]

Do we need to add these in a separate fix patch, or did I miss anything
here?

Best regards,
baolu


Re: [PATCH v2 03/14] iommu: Move bus setup to IOMMU device registration

2022-04-29 Thread Baolu Lu

Hi Robin,

On 2022/4/28 21:18, Robin Murphy wrote:

Move the bus setup to iommu_device_register(). This should allow
bus_iommu_probe() to be correctly replayed for multiple IOMMU instances,
and leaves bus_set_iommu() as a glorified no-op to be cleaned up next.


I re-fetched the latest patches on

https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/bus

and rolled back the head to "iommu: Cleanup bus_set_iommu".

The test machine still hangs during boot.

I went through the code. It seems that .probe_device in the Intel IOMMU
driver can't handle the probe replay well; it always assumes that the
device has never been probed.
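
A sketch of the kind of guard the replay would need (hypothetical, just
to illustrate the problem):

static struct iommu_device *intel_iommu_probe_device(struct device *dev)
{
	struct device_domain_info *info;

	/* Already probed by an earlier bus_iommu_probe() pass? */
	info = dev_iommu_priv_get(dev);
	if (info)
		return &info->iommu->iommu;

	/* ... first-time probe path ... */
}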

Best regards,
baolu



At this point we can also handle cleanup better than just rolling back
the most-recently-touched bus upon failure - which may release devices
owned by other already-registered instances, and still leave devices on
other buses with dangling pointers to the failed instance. Now it's easy
to clean up the exact footprint of a given instance, no more, no less.

Tested-by: Marek Szyprowski 
Signed-off-by: Robin Murphy 
---
  drivers/iommu/iommu.c | 51 +++
  1 file changed, 27 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6c4621afc8cf..c89af4dc54c2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -175,6 +175,14 @@ static int __init iommu_subsys_init(void)
 }
 subsys_initcall(iommu_subsys_init);
 
+static int remove_iommu_group(struct device *dev, void *data)
+{
+	if (dev->iommu && dev->iommu->iommu_dev == data)
+		iommu_release_device(dev);
+
+	return 0;
+}
+
 /**
  * iommu_device_register() - Register an IOMMU hardware instance
  * @iommu: IOMMU handle for the instance
@@ -197,12 +205,29 @@ int iommu_device_register(struct iommu_device *iommu,
 	spin_lock(&iommu_device_lock);
 	list_add_tail(&iommu->list, &iommu_device_list);
 	spin_unlock(&iommu_device_lock);
+
+	for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) {
+		struct bus_type *bus = iommu_buses[i];
+		int err;
+
+		WARN_ON(bus->iommu_ops && bus->iommu_ops != ops);
+		bus->iommu_ops = ops;
+		err = bus_iommu_probe(bus);
+		if (err) {
+			iommu_device_unregister(iommu);
+			return err;
+		}
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_device_register);
 
 void iommu_device_unregister(struct iommu_device *iommu)
 {
+	for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++)
+		bus_for_each_dev(iommu_buses[i], NULL, iommu,
+				 remove_iommu_group);
+
 	spin_lock(&iommu_device_lock);
 	list_del(&iommu->list);
 	spin_unlock(&iommu_device_lock);
@@ -1655,13 +1680,6 @@ static int probe_iommu_group(struct device *dev, void *data)
 	return ret;
 }
 
-static int remove_iommu_group(struct device *dev, void *data)
-{
-	iommu_release_device(dev);
-
-	return 0;
-}
-
 static int iommu_bus_notifier(struct notifier_block *nb,
 			      unsigned long action, void *data)
 {
@@ -1884,27 +1902,12 @@ static int iommu_bus_init(struct bus_type *bus)
  */
 int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops)
 {
-	int err;
-
-	if (ops == NULL) {
-		bus->iommu_ops = NULL;
-		return 0;
-	}
-
-	if (bus->iommu_ops != NULL)
+	if (bus->iommu_ops && ops && bus->iommu_ops != ops)
 		return -EBUSY;
 
 	bus->iommu_ops = ops;
 
-	/* Do IOMMU specific setup for this bus-type */
-	err = bus_iommu_probe(bus);
-	if (err) {
-		/* Clean up */
-		bus_for_each_dev(bus, NULL, NULL, remove_iommu_group);
-		bus->iommu_ops = NULL;
-	}
-
-	return err;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(bus_set_iommu);
  




Re: [PATCH v4 11/12] iommu: Per-domain I/O page fault handling

2022-04-29 Thread Baolu Lu

On 2022/4/28 22:57, Jean-Philippe Brucker wrote:

On Thu, Apr 21, 2022 at 01:21:20PM +0800, Lu Baolu wrote:

 static void iopf_handle_group(struct work_struct *work)
 {
 	struct iopf_group *group;
@@ -134,12 +78,23 @@ static void iopf_handle_group(struct work_struct *work)
 	group = container_of(work, struct iopf_group, work);
 
 	list_for_each_entry_safe(iopf, next, &group->faults, list) {
+		struct iommu_domain *domain;
+
+		domain = iommu_get_domain_for_dev_pasid_async(group->dev,
+				iopf->fault.prm.pasid);

Reading the PCIe spec again (v6.0 10.4.1.1 PASID Usage), all faults within
the group have the same PASID so we could move the domain fetch out of the
loop. It does deviate from the old behavior, though, so we could change
it later.


Perhaps we can add a pasid member to struct iopf_group and do a
sanity check when a new iopf is added to the group? Then we can just
fetch the domain with group->pasid here.
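
A minimal sketch of that idea (the pasid member is the proposed
addition):

struct iopf_group {
	struct iopf_fault	last_fault;
	struct list_head	faults;
	struct work_struct	work;
	struct device		*dev;
	ioasid_t		pasid;	/* shared by all faults in the group */
};

/* When adding a partial fault to an existing group (sketch): */
if (WARN_ON(iopf->fault.prm.pasid != group->pasid))
	return -EINVAL;	/* or respond with IOMMU_PAGE_RESP_INVALID */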

Best regards,
baolu


Re: [PATCH v4 03/12] iommu: Add attach/detach_dev_pasid domain ops

2022-04-29 Thread Baolu Lu

On 2022/4/28 22:53, Jean-Philippe Brucker wrote:

On Thu, Apr 21, 2022 at 01:21:12PM +0800, Lu Baolu wrote:

Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but are not limited to):

  - SVA (Shared Virtual Address)
  - kernel DMA with PASID
  - hardware-assist mediated device

This adds a pair of common domain ops for this purpose and adds helpers
to attach/detach a domain to/from a {device, PASID}. Some buses, like
PCI, route packets without considering the PASID value. Thus a DMA target
address with PASID might be treated as P2P if the address falls into the
MMIO BAR of other devices in the group. To make things simple, these
interfaces only apply to devices belonging to the singleton groups, and
the singleton is immutable in fabric i.e. not affected by hotplug.

Signed-off-by: Lu Baolu 

[...]

+/*
+ * Use standard PCI bus topology, isolation features, and DMA
+ * alias quirks to check the immutable singleton attribute. If
+ * the device came from DT, assume it is static and then
+ * singleton can know from the device count in the group.
+ */
+static bool device_group_immutable_singleton(struct device *dev)
+{
+   struct iommu_group *group = iommu_group_get(dev);
+   int count;
+
+   if (!group)
+   return false;
+
+	mutex_lock(&group->mutex);
+	count = iommu_group_device_count(group);
+	mutex_unlock(&group->mutex);
+   iommu_group_put(group);
+
+   if (count != 1)
+   return false;
+
+   if (dev_is_pci(dev)) {
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   /*
+* The device could be considered to be fully isolated if
+* all devices on the path from the device to the host-PCI
+* bridge are protected from peer-to-peer DMA by ACS.
+*/
+   if (!pci_acs_path_enabled(pdev, NULL, REQ_ACS_FLAGS))
+   return false;
+
+	/* Filter out devices which have any alias device. */
+   if (pci_for_each_dma_alias(pdev, has_pci_alias, pdev))
+   return false;


Aren't aliases already added to the group by pci_device_group()?  If so
it's part of the count check above


You are right. pci_device_group() has already covered pci aliases.




+
+   return true;
+   }
+
+   /*
+* If the device came from DT, assume it is static and then
+* singleton can know from the device count in the group.
+*/
+   return is_of_node(dev_fwnode(dev));


I don't think DT is relevant here because a platform device enumerated
through ACPI will also have its own group. It should be safe to stick to
what the IOMMU drivers declare with their device_group() callback. Except
for PCI those groups should be immutable so we can return true here.


Fair enough. My code is too restrictive. The group singleton is immutable
as long as the fabric is static, or ACS (or similar) technology is
implemented. Currently we only need to care about PCI as far as I can
see.


Thanks,
Jean



Best regards,
baolu


Re: [PATCH v4 10/12] iommu: Prepare IOMMU domain for IOPF

2022-04-29 Thread Baolu Lu

Hi Jean,

On 2022/4/28 22:47, Jean-Philippe Brucker wrote:

Hi Baolu,

On Thu, Apr 21, 2022 at 01:21:19PM +0800, Lu Baolu wrote:

+/*
+ * Get the attached domain for asynchronous usage, for example the I/O
+ * page fault handling framework. The caller gets a reference to the
+ * domain automatically on a successful return and should put
+ * it with iommu_domain_put() after usage.
+ */
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid_async(struct device *dev, ioasid_t pasid)
+{
+   struct iommu_domain *domain;
+   struct iommu_group *group;
+
+   if (!pasid_valid(pasid))
+   return NULL;
+
+   group = iommu_group_get(dev);
+   if (!group)
+   return NULL;
+
+	mutex_lock(&group->mutex);


There is a possible deadlock between unbind() and the fault handler:

  unbind()                             iopf_handle_group()
   mutex_lock(&group->mutex)
   iommu_detach_device_pasid()
    iopf_queue_flush_dev()              iommu_get_domain_for_dev_pasid_async()
     ... waits for IOPF work             mutex_lock(&group->mutex)



Yes, really.


I was wrong in my previous review: we do have a guarantee that the SVA
domain does not go away during IOPF handling, because unbind() waits for
pending faults with iopf_queue_flush_dev() before freeing the domain (or
for Arm stall, knows that there are no pending faults). So we can just get
rid of domain->async_users and the group->mutex in IOPF, I think?


Agreed with you. The Intel code does the same thing in its unbind().

Thus, the sva domain's life cycle is already synchronized with the IOPF
handling, and there's no need for domain->async_users.

I will drop it in the next version. Thank you!
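
The lookup then no longer needs the group mutex or reference counting,
e.g. (a sketch under that assumption):

struct iommu_domain *
iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
{
	struct iommu_domain *domain;
	struct iommu_group *group;

	group = iommu_group_get(dev);
	if (!group)
		return NULL;

	/* Safe without a reference: unbind() flushes the IOPF queue
	 * before freeing the domain. */
	domain = xa_load(&group->pasid_array, pasid);
	iommu_group_put(group);

	return domain;
}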

Best regards,
baolu

