RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-07 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, April 29, 2021 4:46 AM
> 
> > > I think the name IOASID is fine for the uAPI, the kernel version can
> > > be called ioasid_id or something.
> >
> > ioasid is already an id and then ioasid_id just adds confusion. Another
> > point is that ioasid is currently used to represent both PCI PASID and
> > ARM substream ID in the kernel. It implies that if we want to separate
> > ioasid and pasid in the uAPI the 'pasid' also needs to be replaced with
> > another general term usable for substream ID. Are we making the
> > terms too confusing here?
> 
> This is why I also am not so sure about exposing the PASID in the API
> because it is ultimately a HW specific item.
> 
> As I said to David, one avenue is to have some generic uAPI that is
> very general and keep all this deeply detailed stuff, that really only
> matters for qemu, as part of a more HW specific vIOMMU driver
> interface.
> 

OK, so the general uAPI will not expose hw_id and will just provide everything
generic for managing I/O page tables (map/unmap, nesting, etc.) through
IOASID, and then a specific uAPI is provided to handle platform-specific
requirements (hw_id, iova windows, etc.).

Thanks
Kevin


Re: [RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework

2021-05-07 Thread Lu Baolu

Hi Keqian,

On 5/7/21 6:21 PM, Keqian Zhu wrote:

Some types of IOMMU are capable of tracking DMA dirty log, such as
ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
dirty log tracking framework in the IOMMU base layer.

Four new essential interfaces are added, and we maintain the status
of dirty log tracking in iommu_domain.
1. iommu_support_dirty_log: Check whether domain supports dirty log tracking
2. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
3. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
4. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap

Note: Don't call these interfaces concurrently with other ops that
access the underlying page table.
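
As an illustration only (not part of this patch), a caller such as a
device-assignment framework might drive the four interfaces roughly as
below; the prot value and the clear-interface arguments are assumptions
based on the descriptions above:

static int example_track_dirty(struct iommu_domain *domain,
                               unsigned long iova, size_t size,
                               unsigned long *bitmap)
{
        int ret;

        if (!iommu_support_dirty_log(domain))
                return -EOPNOTSUPP;

        /* start tracking; prot describes the tracked mapping (assumed) */
        ret = iommu_switch_dirty_log(domain, true, iova, size,
                                     IOMMU_READ | IOMMU_WRITE);
        if (ret)
                return ret;

        /* ... let DMA run, then harvest the log into a page-granule bitmap ... */
        ret = iommu_sync_dirty_log(domain, iova, size, bitmap, iova, PAGE_SHIFT);
        if (!ret)
                /* assumed to take the same arguments as the sync interface */
                ret = iommu_clear_dirty_log(domain, iova, size, bitmap, iova,
                                            PAGE_SHIFT);

        /* stop tracking; block mappings are recovered by the driver */
        return ret ? ret : iommu_switch_dirty_log(domain, false, iova, size, 0);
}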

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
  drivers/iommu/iommu.c| 201 +++
  include/linux/iommu.h|  63 +++
  include/trace/events/iommu.h |  63 +++
  3 files changed, 327 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df5..0d15620d1e90 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1940,6 +1940,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
bus_type *bus,
domain->type = type;
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
+   mutex_init(&domain->switch_log_lock);
  
  	return domain;

  }
@@ -2703,6 +2704,206 @@ int iommu_set_pgtable_quirks(struct iommu_domain 
*domain,
  }
  EXPORT_SYMBOL_GPL(iommu_set_pgtable_quirks);
  
+bool iommu_support_dirty_log(struct iommu_domain *domain)

+{
+   const struct iommu_ops *ops = domain->ops;
+
+   return ops->support_dirty_log && ops->support_dirty_log(domain);
+}
+EXPORT_SYMBOL_GPL(iommu_support_dirty_log);


I suppose this interface is to ask the vendor IOMMU driver to check
whether each device/iommu in the domain supports dirty bit tracking.
But what will happen if new devices with different tracking capability
are added afterward?

To make things simple, is it possible to support this tracking only when
all underlying IOMMUs support dirty bit tracking?

Or, the more crazy idea is that we don't need to check this capability
at all. If dirty bit tracking is not supported by hardware, just mark
all pages dirty?
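
If that fallback were chosen, it could be as simple as marking the whole
range dirty in the caller-supplied bitmap; a minimal sketch, assuming the
same bitmap convention as iommu_sync_dirty_log() above:

static void example_mark_all_dirty(unsigned long *bitmap, unsigned long iova,
                                   size_t size, unsigned long base_iova,
                                   unsigned long pgshift)
{
        unsigned long start = (iova - base_iova) >> pgshift;
        unsigned long nbits = size >> pgshift;

        /* hardware cannot track dirty state: report every page as dirty */
        bitmap_set(bitmap, start, nbits);
}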


+
+int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
+  unsigned long iova, size_t size, int prot)
+{
+   const struct iommu_ops *ops = domain->ops;
+   unsigned long orig_iova = iova;
+   unsigned int min_pagesz;
+   size_t orig_size = size;
+   bool flush = false;
+   int ret = 0;
+
+   if (unlikely(!ops->switch_dirty_log))
+   return -ENODEV;
+
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+   if (!IS_ALIGNED(iova | size, min_pagesz)) {
+   pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+  iova, size, min_pagesz);
+   return -EINVAL;
+   }
+
+   mutex_lock(&domain->switch_log_lock);
+   if (enable && domain->dirty_log_tracking) {
+   ret = -EBUSY;
+   goto out;
+   } else if (!enable && !domain->dirty_log_tracking) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   pr_debug("switch_dirty_log %s for: iova 0x%lx size 0x%zx\n",
+enable ? "enable" : "disable", iova, size);
+
+   while (size) {
+   size_t pgsize = iommu_pgsize(domain, iova, size);
+
+   flush = true;
+   ret = ops->switch_dirty_log(domain, enable, iova, pgsize, prot);


A per-minimal-page callback is expensive. How about using (pagesize,
count), so that all pages with the same page size can be handled in a
single indirect call? I remember commenting on this during the last review,
but I don't mind doing it again.
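
For reference, a minimal sketch of that suggestion, reusing the locals of
the loop quoted above and assuming a hypothetical (pagesize, count) driver
callback:

        while (size) {
                size_t pgsize = iommu_pgsize(domain, iova, size);
                size_t len = pgsize;

                /* extend the run while the next chunk resolves to the same size */
                while (len < size &&
                       iommu_pgsize(domain, iova + len, size - len) == pgsize)
                        len += pgsize;

                flush = true;
                /* hypothetical (pagesize, count) callback: one call per run */
                ret = ops->switch_dirty_log(domain, enable, iova, pgsize,
                                            len / pgsize, prot);
                if (ret)
                        break;

                iova += len;
                size -= len;
        }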

Best regards,
baolu


+   if (ret)
+   break;
+
+   pr_debug("switch_dirty_log handled: iova 0x%lx size 0x%zx\n",
+iova, pgsize);
+
+   iova += pgsize;
+   size -= pgsize;
+   }
+
+   if (flush)
+   iommu_flush_iotlb_all(domain);
+
+   if (!ret) {
+   domain->dirty_log_tracking = enable;
+   trace_switch_dirty_log(orig_iova, orig_size, enable);
+   }
+out:
+   mutex_unlock(&domain->switch_log_lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_switch_dirty_log);
+
+int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
+size_t size, unsigned long *bitmap,
+unsigned long base_iova, unsigned long bitmap_pgshift)
+{
+   const struct iommu_ops *ops = domain->ops;
+   unsigned long orig_iova = iova;
+   unsigned int min_pagesz;
+   size_t orig_size = size;
+   int ret = 0;
+
+   if 

[PATCH -next] iommu/virtio: Add missing MODULE_DEVICE_TABLE

2021-05-07 Thread Bixuan Cui
This patch adds the missing MODULE_DEVICE_TABLE definition, which generates
the correct modalias for automatic loading of this driver when it is built
as an external module.

Reported-by: Hulk Robot 
Signed-off-by: Bixuan Cui 
---
 drivers/iommu/virtio-iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 7c02481a81b4..c6e5ee4d9cef 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1136,6 +1136,7 @@ static struct virtio_device_id id_table[] = {
{ VIRTIO_ID_IOMMU, VIRTIO_DEV_ANY_ID },
{ 0 },
 };
+MODULE_DEVICE_TABLE(virtio, id_table);
 
 static struct virtio_driver virtio_iommu_drv = {
.driver.name= KBUILD_MODNAME,



Re: [PATCH 1/1] iommu/of: Fix request and enable ACS for of_iommu_configure

2021-05-07 Thread Xingang Wang

Hi Bjorn,

On 2021/5/8 5:14, Bjorn Helgaas wrote:

On Fri, May 07, 2021 at 12:49:53PM +, Wang Xingang wrote:

From: Xingang Wang 

When requesting ACS for a PCI device in of_iommu_configure, the pci device
has already been scanned and added with 'pci_acs_enable=0'. So the
pci_request_acs() in the current procedure does not work for enabling ACS.
Besides, the ACS should be enabled only if there's an IOMMU in the system.
So this fixes the call of pci_request_acs() and calls pci_enable_acs() to
make sure ACS is enabled for the pci_device.


For consistency:

   s/of_iommu_configure/of_iommu_configure()/
   s/pci device/PCI device/
   s/pci_device/PCI device/

But I'm confused about what problem this fixes.  On x86, I think we
*do* set pci_acs_enable=1 in this path:

   start_kernel
 mm_init
   mem_init
 pci_iommu_alloc
   p->detect()
 detect_intel_iommu   # IOMMU_INIT_POST(detect_intel_iommu)
   pci_request_acs
 pci_acs_enable = 1

before enumerating any PCI devices.

But you mentioned pci_host_common_probe(), which I think is mostly
used on non-x86 architectures, and I'm guessing those arches detect
the IOMMU differently.



I am testing on an arm architecture and found that ACS is enabled
properly when booting with ACPI. With ACPI enabled, the x86 IOMMU is checked
when parsing DMAR and the arm SMMU is checked when parsing IORT.

But when booting with devicetree, the IOMMU is checked in
of_iommu_configure, and the pci_request_acs() call is too late at that point.



So my question is, can we figure out how to detect IOMMUs the same way
across all arches?



Thanks.
It would be better if there were a way to detect IOMMUs the same way across
all arches, for both ACPI and DT.



Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
configuring IOMMU linkage")
Signed-off-by: Xingang Wang 
---
  drivers/iommu/of_iommu.c | 10 +-
  drivers/pci/pci.c|  2 +-
  include/linux/pci.h  |  1 +
  3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index a9d2df001149..dc621861ae72 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device 
*dev,
.np = master_np,
};
  
-		pci_request_acs();

err = pci_for_each_dma_alias(to_pci_dev(dev),
				 of_pci_iommu_init, &info);
} else {
@@ -222,6 +221,15 @@ const struct iommu_ops *of_iommu_configure(struct device 
*dev,
/* The fwspec pointer changed, read it again */
fwspec = dev_iommu_fwspec_get(dev);
ops= fwspec->ops;
+
+   /*
+* If we found an IOMMU and the device is pci,
+* make sure we enable ACS.


s/pci/PCI/ for consistency.


+*/
+   if (dev_is_pci(dev)) {
+   pci_request_acs();
+   pci_enable_acs(to_pci_dev(dev));
+   }
}
/*
 * If we have reason to believe the IOMMU driver missed the initial
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b717680377a9..4e4f98ee2870 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -926,7 +926,7 @@ static void pci_std_enable_acs(struct pci_dev *dev)
   * pci_enable_acs - enable ACS if hardware support it
   * @dev: the PCI device
   */
-static void pci_enable_acs(struct pci_dev *dev)
+void pci_enable_acs(struct pci_dev *dev)
  {
if (!pci_acs_enable)
goto disable_acs_redir;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c20211e59a57..e6a8bfbc9c98 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2223,6 +2223,7 @@ static inline struct pci_dev *pcie_find_root_port(struct 
pci_dev *dev)
  }
  
  void pci_request_acs(void);

+void pci_enable_acs(struct pci_dev *dev);
  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
  bool pci_acs_path_enabled(struct pci_dev *start,
  struct pci_dev *end, u16 acs_flags);
--
2.19.1





Thanks

Xingang



Regression when booting 5.15 as dom0 on arm64 (WAS: Re: [linux-linus test] 161829: regressions - FAIL)

2021-05-07 Thread Julien Grall

Hi all,

On 07/05/2021 22:00, osstest service owner wrote:

flight 161829 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/161829/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:


[...]


  test-arm64-arm64-examine  8 reboot   fail REGR. vs. 152332
  test-arm64-arm64-libvirt-xsm  8 xen-boot fail REGR. vs. 152332
  test-arm64-arm64-xl-credit2   8 xen-boot fail REGR. vs. 152332
  test-arm64-arm64-xl-credit1   8 xen-boot fail REGR. vs. 152332
  test-arm64-arm64-xl   8 xen-boot fail REGR. vs. 152332
  test-arm64-arm64-xl-xsm   8 xen-boot fail REGR. vs. 152332
  test-amd64-amd64-amd64-pvgrub 20 guest-stop  fail REGR. vs. 152332
  test-amd64-amd64-i386-pvgrub 20 guest-stop   fail REGR. vs. 152332
  test-arm64-arm64-xl-thunderx  8 xen-boot fail REGR. vs. 152332
  test-amd64-amd64-xl-qcow213 guest-start  fail REGR. vs. 152332
  test-amd64-amd64-libvirt-vhd 13 guest-start  fail REGR. vs. 152332
  test-amd64-amd64-examine  4 memdisk-try-append   fail REGR. vs. 152332
  test-arm64-arm64-xl-seattle   8 xen-boot fail REGR. vs. 152332
  test-armhf-armhf-xl-vhd  13 guest-start  fail REGR. vs. 152332


Osstest reported dom0 boot failures on all the arm64 platforms we have.
The stack trace is similar everywhere:


[   18.101077] Unable to handle kernel NULL pointer dereference at 
virtual address 0008

[   18.101441] Mem abort info:
[   18.101625]   ESR = 0x9604
[   18.101839]   EC = 0x25: DABT (current EL), IL = 32 bits
[   18.102111]   SET = 0, FnV = 0
[   18.102327]   EA = 0, S1PTW = 0
[   18.102544] Data abort info:
[   18.102747]   ISV = 0, ISS = 0x0004
[   18.102968]   CM = 0, WnR = 0
[   18.103183] [0008] user address but active_mm is swapper
[   18.103476] Internal error: Oops: 9604 [#1] SMP
[   18.103689] Modules linked in:
[   18.103881] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc3+ #126
[   18.104172] Hardware name: Foundation-v8A (DT)
[   18.104376] pstate: 6005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[   18.104653] pc : xen_swiotlb_dma_supported+0x30/0xc8
[   18.104893] lr : dma_supported+0x38/0x68
[   18.105118] sp : 80001295bac0
[   18.105289] x29: 80001295bac0 x28: 8000114f44c0
[   18.105600] x27: 0007 x26: 8000113a1000
[   18.105906] x25:  x24: 800011d2e910
[   18.106213] x23: 800011d4d000 x22: 12fad810
[   18.106525] x21:  x20: 
[   18.106837] x19: 12fad810 x18: ffeb
[   18.107146] x17:  x16: 493f1445
[   18.107450] x15: 80001132d000 x14: 1c131000
[   18.107759] x13: 498d0616 x12: 8000129f7000
[   18.108068] x11: 12c08710 x10: 800011a91000
[   18.108380] x9 : 3000 x8 : 1000
[   18.108722] x7 : 800011a90a88 x6 : 800010a7275c
[   18.109031] x5 :  x4 : 0001
[   18.109331] x3 : 2cd8f9dc91b3df00 x2 : 8000109c7578
[   18.109640] x1 :  x0 : 
[   18.109940] Call trace:
[   18.110079]  xen_swiotlb_dma_supported+0x30/0xc8
[   18.110319]  dma_supported+0x38/0x68
[   18.110543]  dma_set_mask+0x30/0x58
[   18.110765]  virtio_mmio_probe+0x1c8/0x238
[   18.110979]  platform_probe+0x6c/0x108
[   18.88]  really_probe+0xfc/0x3c8
[   18.111413]  driver_probe_device+0x68/0xe8
[   18.111647]  device_driver_attach+0x74/0x98
[   18.111883]  __driver_attach+0x98/0xe0
[   18.112111]  bus_for_each_dev+0x84/0xd8
[   18.112334]  driver_attach+0x30/0x40
[   18.112557]  bus_add_driver+0x168/0x228
[   18.112784]  driver_register+0x64/0x110
[   18.113016]  __platform_driver_register+0x34/0x40
[   18.113257]  virtio_mmio_init+0x20/0x28
[   18.113480]  do_one_initcall+0x90/0x470
[   18.113694]  kernel_init_freeable+0x3ec/0x440
[   18.113950]  kernel_init+0x1c/0x138
[   18.114158]  ret_from_fork+0x10/0x18
[   18.114409] Code: f000f641 f000a200 f944e821 f9410400 (f9400434)
[   18.114718] ---[ end trace d39406719f25d228 ]---
[   18.115044] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x000b

[   18.115339] SMP: stopping secondary CPUs
[   18.115584] Kernel Offset: disabled
[   18.115743] CPU features: 0x0024,61802000
[   18.115954] Memory Limit: none
[   18.116173] ---[ end Kernel panic - not syncing: Attempted to kill 
init! exitcode=0x000b ]---


I have bisected manually and pinpointed the following commit:

commit 2726bf3ff2520dba61fafc90a055640f7ad54e05 (refs/bisect/bad)
Author: Florian Fainelli 
Date:   Mon Mar 22 18:53:49 2021 -0700

swiotlb: Make SWIOTLB_NO_FORCE perform no allocation

When SWIOTLB_NO_FORCE is used, there should really be no allocations of
default_nslabs to occur since we 
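
For orientation, a hedged guess at the failure mode, illustrative only
(struct name and field layout are assumed): if SWIOTLB_NO_FORCE now skips
allocating the default pool, a helper that dereferences that pool
unconditionally faults at a small offset, which matches the NULL
dereference at 0x8 in the trace above.

struct swiotlb_pool_example {           /* assumed layout, for illustration */
        unsigned long start;            /* offset 0x0 */
        unsigned long end;              /* offset 0x8 <- the faulting offset */
};

static int example_dma_supported(struct swiotlb_pool_example *pool, u64 mask)
{
        if (!pool)                      /* the guard that would avoid the oops */
                return 0;
        return pool->end - 1 <= mask;   /* faults here when pool is NULL */
}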

Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-07 Thread Jacob Pan
Hi Jason,

On Fri, 7 May 2021 16:28:10 -0300, Jason Gunthorpe  wrote:

> > The unanswered question is how do we plumb from vIOMMU without a custom
> > allocator to get a system wide PASID?   
> 
> PASID allocation is part of the iommu driver, it really shouldn't be
> global.
> 
In the current code, the pluggable custom allocator *is* part of the iommu
vendor driver. If it decides the allocation is global then it should be
suitable for the platform, since there will never be a VT-d IOMMU on another
vendor's platform.

It is true that the default allocator is global, which suits the current
needs. I am just wondering if we are solving a problem that does not exist yet.

> When the architecture code goes to allocate a single PASID for the
> mm_struct it should flag that allocation request with a 'must work for
> all RIDs flag' and the iommu driver should take care of it. That might
> mean the iommu driver consults a global static xarray, or maybe it
> does a hypercall, but it should be done through that API, not a
> side-car global singleton.
Why do we need to flag the allocation every time if on a platform *every*
PASID can potentially be global? At the time of allocation, we don't know
if the PASID will be used for a shared (ENQCMD) or a dedicated workqueue.

Thanks,

Jacob


Re: [PATCH 1/1] iommu/of: Fix request and enable ACS for of_iommu_configure

2021-05-07 Thread Bjorn Helgaas
On Fri, May 07, 2021 at 12:49:53PM +, Wang Xingang wrote:
> From: Xingang Wang 
> 
> When requesting ACS for a PCI device in of_iommu_configure, the pci device
> has already been scanned and added with 'pci_acs_enable=0'. So the
> pci_request_acs() in the current procedure does not work for enabling ACS.
> Besides, the ACS should be enabled only if there's an IOMMU in the system.
> So this fixes the call of pci_request_acs() and calls pci_enable_acs() to
> make sure ACS is enabled for the pci_device.

For consistency:

  s/of_iommu_configure/of_iommu_configure()/
  s/pci device/PCI device/
  s/pci_device/PCI device/

But I'm confused about what problem this fixes.  On x86, I think we
*do* set pci_acs_enable=1 in this path:

  start_kernel
mm_init
  mem_init
pci_iommu_alloc
  p->detect()
detect_intel_iommu   # IOMMU_INIT_POST(detect_intel_iommu)
  pci_request_acs
pci_acs_enable = 1

before enumerating any PCI devices.

But you mentioned pci_host_common_probe(), which I think is mostly
used on non-x86 architectures, and I'm guessing those arches detect
the IOMMU differently.

So my question is, can we figure out how to detect IOMMUs the same way
across all arches?

> Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
> configuring IOMMU linkage")
> Signed-off-by: Xingang Wang 
> ---
>  drivers/iommu/of_iommu.c | 10 +-
>  drivers/pci/pci.c|  2 +-
>  include/linux/pci.h  |  1 +
>  3 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index a9d2df001149..dc621861ae72 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device 
> *dev,
>   .np = master_np,
>   };
>  
> - pci_request_acs();
>   err = pci_for_each_dma_alias(to_pci_dev(dev),
> 				     of_pci_iommu_init, &info);
>   } else {
> @@ -222,6 +221,15 @@ const struct iommu_ops *of_iommu_configure(struct device 
> *dev,
>   /* The fwspec pointer changed, read it again */
>   fwspec = dev_iommu_fwspec_get(dev);
>   ops= fwspec->ops;
> +
> + /*
> +  * If we found an IOMMU and the device is pci,
> +  * make sure we enable ACS.

s/pci/PCI/ for consistency.

> +  */
> + if (dev_is_pci(dev)) {
> + pci_request_acs();
> + pci_enable_acs(to_pci_dev(dev));
> + }
>   }
>   /*
>* If we have reason to believe the IOMMU driver missed the initial
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b717680377a9..4e4f98ee2870 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -926,7 +926,7 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   * pci_enable_acs - enable ACS if hardware support it
>   * @dev: the PCI device
>   */
> -static void pci_enable_acs(struct pci_dev *dev)
> +void pci_enable_acs(struct pci_dev *dev)
>  {
>   if (!pci_acs_enable)
>   goto disable_acs_redir;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c20211e59a57..e6a8bfbc9c98 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -2223,6 +2223,7 @@ static inline struct pci_dev 
> *pcie_find_root_port(struct pci_dev *dev)
>  }
>  
>  void pci_request_acs(void);
> +void pci_enable_acs(struct pci_dev *dev);
>  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
>  bool pci_acs_path_enabled(struct pci_dev *start,
> struct pci_dev *end, u16 acs_flags);
> -- 
> 2.19.1
> 


Re: [PATCH v4 1/6] dt-bindings: iommu: rockchip: Convert IOMMU to DT schema

2021-05-07 Thread Rob Herring
On Fri, 07 May 2021 11:02:27 +0200, Benjamin Gaignard wrote:
> Convert Rockchip IOMMU to DT schema
> 
> Signed-off-by: Benjamin Gaignard 
> ---
> version 4:
>  - Add descriptions for reg items
>  - Add description for interrupts items
>  - Remove useless interrupt-names properties
> 
> version 2:
>  - Change maintainer
>  - Change reg maxItems
>  - Change interrupt maxItems
>  .../bindings/iommu/rockchip,iommu.txt | 38 -
>  .../bindings/iommu/rockchip,iommu.yaml| 80 +++
>  2 files changed, 80 insertions(+), 38 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
>  create mode 100644 
> Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
> 

Reviewed-by: Rob Herring 


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-07 Thread Jason Gunthorpe
On Fri, May 07, 2021 at 12:23:25PM -0700, Raj, Ashok wrote:
> Hi Jason
> 
> - Removed lizefan's email due to bounces... 
> 
> On Fri, May 07, 2021 at 03:20:50PM -0300, Jason Gunthorpe wrote:
> > On Fri, May 07, 2021 at 11:14:58AM -0700, Raj, Ashok wrote:
> > > On Fri, May 07, 2021 at 02:20:51PM -0300, Jason Gunthorpe wrote:
> > > > On Thu, May 06, 2021 at 09:32:40AM -0700, Raj, Ashok wrote:
> > > > 
> > > > > For platforms that support ENQCMD, it is required to mandate PASIDs 
> > > > > are
> > > > > global across the entire system. Maybe its better to call them gPASID 
> > > > > for
> > > > > guest and hPASID for host. Short reason being gPASID->hPASID is a 
> > > > > guest
> > > > > wide mapping for ENQCMD and not a per-RID based mapping. (We covered 
> > > > > that
> > > > > in earlier responses)
> > > > 
> > > > I don't think it is actually ENQCMD that forces this, ENQCMD can use a
> > > > per-RID PASID in the translation table as well.
> > > 
> > > When using ENQCMD the PASID that needs to be sent on the wire is picked
> > > from an MSR setup by kernel. This is context switched along with the
> > > process. So each process has only 1 PASID that can go out when using
> > > ENQCMD. ENQCMD takes one mmio address specific to the accelerator and a
> > > source for the descriptor.
> > 
> > Oh. I forgot this also globally locked the PASID to a single
> > MSR. Sigh. That makes the whole mechanism useless for anything except
> > whole process SVA.
> 
> Is there another kind of SVA? Our mapping from that each process requires a
> single mm, and PASID for SVM was a direct map from that. 

There are lots of potential applications for something like ENQCMD
that are not whole process SVA. Linking it to a single PASID basically
nukes any other use of it unfortunately.

> > I would have to ask for a PASID that has the property it needs. You
> > are saying the property is even bigger than "usable on a group of
> > RIDs" but is actually "global for every RID and IOMMU in the system so
> > it can go into a MSR". Gross, but fine, ask for that explicitly when
> > allocating the PASID.
> 
> If one process has a single mm, is that also gross? :-) So a single process
> having a PASID is just an identifier for IOMMU. It just seems like what a
> mm is for a process == PASID for SVM-IOMMU support.
> 
> The unanswered question is how do we plumb from vIOMMU without a custom
> allocator to get a system wide PASID? 

PASID allocation is part of the iommu driver, it really shouldn't be
global.

When the architecture code goes to allocate a single PASID for the
mm_struct it should flag that allocation request with a 'must work for
all RIDs flag' and the iommu driver should take care of it. That might
mean the iommu driver consults a global static xarray, or maybe it
does a hypercall, but it should be done through that API, not a side-car
global singleton.
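
A sketch of what such a flagged request could look like; every name below
is made up purely to illustrate the idea and none of them exist in the
kernel:

#define EXAMPLE_PASID_FLAG_SYSTEM_WIDE	BIT(0)	/* must work for every RID/IOMMU */

ioasid_t example_driver_alloc_global_pasid(struct mm_struct *mm);
ioasid_t example_driver_alloc_local_pasid(struct mm_struct *mm);

static ioasid_t example_alloc_pasid_for_mm(struct mm_struct *mm,
                                           unsigned int flags)
{
        /*
         * The IOMMU driver decides how to satisfy the request: a global
         * static xarray, a hypercall, or a plain per-RID allocation when
         * the flag is absent.
         */
        if (flags & EXAMPLE_PASID_FLAG_SYSTEM_WIDE)
                return example_driver_alloc_global_pasid(mm);

        return example_driver_alloc_local_pasid(mm);
}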

Jason


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-07 Thread Raj, Ashok
Hi Jason

- Removed lizefan's email due to bounces... 

On Fri, May 07, 2021 at 03:20:50PM -0300, Jason Gunthorpe wrote:
> On Fri, May 07, 2021 at 11:14:58AM -0700, Raj, Ashok wrote:
> > On Fri, May 07, 2021 at 02:20:51PM -0300, Jason Gunthorpe wrote:
> > > On Thu, May 06, 2021 at 09:32:40AM -0700, Raj, Ashok wrote:
> > > 
> > > > For platforms that support ENQCMD, it is required to mandate PASIDs are
> > > > global across the entire system. Maybe its better to call them gPASID 
> > > > for
> > > > guest and hPASID for host. Short reason being gPASID->hPASID is a guest
> > > > wide mapping for ENQCMD and not a per-RID based mapping. (We covered 
> > > > that
> > > > in earlier responses)
> > > 
> > > I don't think it is actually ENQCMD that forces this, ENQCMD can use a
> > > per-RID PASID in the translation table as well.
> > 
> > When using ENQCMD the PASID that needs to be sent on the wire is picked
> > from an MSR setup by kernel. This is context switched along with the
> > process. So each process has only 1 PASID that can go out when using
> > ENQCMD. ENQCMD takes one mmio address specific to the accelerator and a
> > source for the descriptor.
> 
> Oh. I forgot this also globally locked the PASID to a single
> MSR. Sigh. That makes the whole mechanism useless for anything except
> whole process SVA.

Is there another kind of SVA? Our mapping from that each process requires a
single mm, and PASID for SVM was a direct map from that. 

> 
> It also makes it a general kernel problem and not just related to the
> vIOMMU scenario.
> 
> > > I think at the uAPI level the callpaths that require allocating a
> > > PASID from a group of RIDs should be explicit in their intention and
> > > not implicitly rely on a certain allocator behavior.
> > 
> > The difficult part I see is, when one application establishes a path
> > to one acclerator, we have no knowledge if its going to connect to a
> > second, third or such. I don't see how this can work reasonably
> > well. What if PASIDx is allocated for one, but the second RID its
> > trying to attach already has this PASID allocated?
> 
> You mean like some kind of vIOMMU hot plug?

Not vIOMMU hot plug, but an application opens accel1 and does a bind to
allocate a PASID. What I meant was that the kernel has no information whether
this needs to be a per-RID PASID or a global PASID. Keeping this global avoids
the other problems, or more complex mechanisms to say "Reserve this PASID on
all accelerators", which seems pretty complicated to implement.

Now, are we losing anything by keeping the PASIDs global?

As we discussed, there is no security issue, since the PASID table that hosts
these PASIDs for SVM is still per-RID.  For e.g.

app establishes connection to accl1, allocates PASID-X
   RID for accel1 now has PASID-X and the process mm plumbed
later the app also connects with accl2, now PASID-X is plumbed in for the RID
of accel2.


> 
> > > If you want to get a PASID that can be used with every RID on in your
> > > /dev/ioasid then ask for that exactly.
> > 
> > Correct, but how does guest through vIOMMU driver communicate that intent 
> > so uAPI
> > plumbing can do this? I mean architecturally via IOMMU interfaces? 
> 
> I would have to ask for a PASID that has the property it needs. You
> are saying the property is even bigger than "usable on a group of
> RIDs" but is actually "global for every RID and IOMMU in the system so
> it can go into a MSR". Gross, but fine, ask for that explicitly when
> allocating the PASID.

If one process has a single mm, is that also gross? :-) So a single process
having a PASID is just an identifier for IOMMU. It just seems like what a
mm is for a process == PASID for SVM-IOMMU support.

The unanswered question is how do we plumb from vIOMMU without a custom
allocator to get a system wide PASID? 

The way it works today is: if we have a custom allocator registered, that's
the mechanism for getting PASIDs allocated. For the Intel vIOMMU it happens
to be a globally unique allocation. If a particular vIOMMU doesn't require
this, it does not have a vIOMMU interface, and those naturally get a
guest-local PASID name space. (I'm not sure if that's how the allocator works
today, but I guess it's extensible to accomplish a RID-local PASID if that's
exactly what is required.)

Cheers,
Ashok


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-07 Thread Jason Gunthorpe
On Fri, May 07, 2021 at 11:14:58AM -0700, Raj, Ashok wrote:
> On Fri, May 07, 2021 at 02:20:51PM -0300, Jason Gunthorpe wrote:
> > On Thu, May 06, 2021 at 09:32:40AM -0700, Raj, Ashok wrote:
> > 
> > > For platforms that support ENQCMD, it is required to mandate PASIDs are
> > > global across the entire system. Maybe its better to call them gPASID for
> > > guest and hPASID for host. Short reason being gPASID->hPASID is a guest
> > > wide mapping for ENQCMD and not a per-RID based mapping. (We covered that
> > > in earlier responses)
> > 
> > I don't think it is actually ENQCMD that forces this, ENQCMD can use a
> > per-RID PASID in the translation table as well.
> 
> When using ENQCMD the PASID that needs to be sent on the wire is picked
> from an MSR setup by kernel. This is context switched along with the
> process. So each process has only 1 PASID that can go out when using
> ENQCMD. ENQCMD takes one mmio address specific to the accelerator and a
> source for the descriptor.

Oh. I forgot this also globally locked the PASID to a single
MSR. Sigh. That makes the whole mechanism useless for anything except
whole process SVA.

It also makes it a general kernel problem and not just related to the
vIOMMU scenario.

> > I think at the uAPI level the callpaths that require allocating a
> > PASID from a group of RIDs should be explicit in their intention and
> > not implicitly rely on a certain allocator behavior.
> 
> The difficult part I see is, when one application establishes a path
> to one accelerator, we have no knowledge if it's going to connect to a
> second, third or such. I don't see how this can work reasonably
> well. What if PASIDx is allocated for one, but the second RID it's
> trying to attach already has this PASID allocated?

You mean like some kind of vIOMMU hot plug?

> > If you want to get a PASID that can be used with every RID on in your
> > /dev/ioasid then ask for that exactly.
> 
> Correct, but how does guest through vIOMMU driver communicate that intent so 
> uAPI
> plumbing can do this? I mean architecturally via IOMMU interfaces? 

I would have to ask for a PASID that has the property it needs. You
are saying the property is even bigger than "usable on a group of
RIDs" but is actually "global for every RID and IOMMU in the system so
it can go into a MSR". Gross, but fine, ask for that explicitly when
allocating the PASID.

Jason


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-07 Thread Raj, Ashok
On Fri, May 07, 2021 at 02:20:51PM -0300, Jason Gunthorpe wrote:
> On Thu, May 06, 2021 at 09:32:40AM -0700, Raj, Ashok wrote:
> 
> > For platforms that support ENQCMD, it is required to mandate PASIDs are
> > global across the entire system. Maybe its better to call them gPASID for
> > guest and hPASID for host. Short reason being gPASID->hPASID is a guest
> > wide mapping for ENQCMD and not a per-RID based mapping. (We covered that
> > in earlier responses)
> 
> I don't think it is actually ENQCMD that forces this, ENQCMD can use a
> per-RID PASID in the translation table as well.

When using ENQCMD the PASID that needs to be sent on the wire is picked
from an MSR setup by kernel. This is context switched along with the
process. So each process has only 1 PASID that can go out when using
ENQCMD. ENQCMD takes one mmio address specific to the accelerator and a
source for the descriptor. 
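
A rough illustration of why that yields exactly one PASID per process (MSR
names from arch/x86; the logic is simplified and only for orientation):

/*
 * The kernel programs the task's PASID into the IA32_PASID MSR on context
 * switch; ENQCMD then stamps that single value into every descriptor the
 * task submits, no matter how many accelerators it has opened.
 */
static void example_load_task_pasid(u32 pasid)
{
        wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
}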

When one application connects to more than one accelerator, then since the
PASID is MSR-based and filled in by the CPU instruction automatically, both
accelerators are required to use the same PASID.

Did you refer to this implementation, or something else?
> 
> You get forced here only based on the design of the vIOMMU
> communication channel. If the guest can demand any RID is attached to
> a specific guest determined PASID then the hypervisor must accommodate
> that.

True, but when we have a guest using vSVM and enabling vENQCMD, the
requirement is the same inside the guest.
> 
> > > Which would be a different behavior than something like Intel's top
> > > level IOASID that doesn't claim all the PASIDs.
> > 
> > isn't this simple, if we can say ioasid allocator can provide 
> > 
> > - system wide PASID
> > - RID local PASID
> > 
> > Based on platform capabilities that require such differentiation?
> 
> I think at the uAPI level the callpaths that require allocating a
> PASID from a group of RIDs should be explicit in their intention and
> not implicitly rely on a certain allocator behavior.

The difficult part I see is, when one application establishes a path to one
accelerator, we have no knowledge if it's going to connect to a second, third
or such. I don't see how this can work reasonably well. What if PASIDx is
allocated for one, but the second RID it's trying to attach already has this
PASID allocated?

> 
> If you want to get a PASID that can be used with every RID on in your
> /dev/ioasid then ask for that exactly.

Correct, but how does the guest, through the vIOMMU driver, communicate that
intent so the uAPI plumbing can do this? I mean architecturally, via IOMMU
interfaces?


Cheers,
Ashok


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-07 Thread Jason Gunthorpe
On Thu, May 06, 2021 at 09:32:40AM -0700, Raj, Ashok wrote:

> For platforms that support ENQCMD, it is required to mandate PASIDs are
> global across the entire system. Maybe its better to call them gPASID for
> guest and hPASID for host. Short reason being gPASID->hPASID is a guest
> wide mapping for ENQCMD and not a per-RID based mapping. (We covered that
> in earlier responses)

I don't think it is actually ENQCMD that forces this, ENQCMD can use a
per-RID PASID in the translation table as well.

You get forced here only based on the design of the vIOMMU
communication channel. If the guest can demand any RID is attached to
a specific guest determined PASID then the hypervisor must accommodate
that.

> > Which would be a different behavior than something like Intel's top
> > level IOASID that doesn't claim all the PASIDs.
> 
> isn't this simple, if we can say ioasid allocator can provide 
> 
> - system wide PASID
> - RID local PASID
> 
> Based on platform capabilities that require such differentiation?

I think at the uAPI level the callpaths that require allocating a
PASID from a group of RIDs should be explicit in their intention and
not implicitly rely on a certain allocator behavior.

If you want to get a PASID that can be used with every RID on in your
/dev/ioasid then ask for that exactly.

It makes the uAPI much more understandable to be explicit.

Jason


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-07 Thread Jason Gunthorpe
On Fri, May 07, 2021 at 11:06:14AM -0600, Alex Williamson wrote:

> We had tossed around an idea of a super-container with vfio, it's maybe
> something we'd want to incorporate into this design.  For instance, if
> memory could be pre-registered with a super container, which would
> handle the locked memory accounting for that memory, then
> sub-containers could all handle the IOMMU context of their sets of
> devices relative to that common memory pool.

This is where I suggested to David to use nesting of IOASIDs.

Without HW support for nesting a SW nest is really just re-using the
memory registration information stored in the parent when constructing
the children

Jason


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-07 Thread Alex Williamson
On Fri, 7 May 2021 07:36:49 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson 
> > Sent: Wednesday, April 28, 2021 11:06 PM
> > 
> > On Wed, 28 Apr 2021 06:34:11 +
> > "Tian, Kevin"  wrote:
> > >
> > > Can you or Alex elaborate where the complexity and performance problem
> > > locate in VFIO map/unmap? We'd like to understand more detail and see
> > > how to avoid it in the new interface.
> > 
> > 
> > The map/unmap interface is really only good for long lived mappings,
> > the overhead is too high for things like vIOMMU use cases or any case
> > where the mapping is intended to be dynamic.  Userspace drivers must
> > make use of a long lived buffer mapping in order to achieve performance.  
> 
> This is not a limitation of VFIO map/unmap. It's the limitation of any
> map/unmap semantics since the fact of long-lived vs. short-lived is 
> imposed by userspace. Nested translation is the only viable optimization
> allowing 2nd-level to be a long-lived mapping even w/ vIOMMU. From 
> this angle I'm not sure how a new map/unmap implementation could 
> address this perf limitation alone.

Sure, we don't need to try to tackle every problem at once, a map/unmap
interface compatible with what we have is a good place to start and
nested translation may provide the high performance option.  That's not
to say that we couldn't, in the future, extend the map/unmap with memory
pre-registration like done in the spapr IOMMU to see how that could
reduce latency.

> > The mapping and unmapping granularity has been a problem as well,
> > type1v1 allowed arbitrary unmaps to bisect the original mapping, with
> > the massive caveat that the caller relies on the return value of the
> > unmap to determine what was actually unmapped because the IOMMU use
> > of
> > superpages is transparent to the caller.  This led to type1v2 that
> > simply restricts the user to avoid ever bisecting mappings.  That still
> > leaves us with problems for things like virtio-mem support where we
> > need to create initial mappings with a granularity that allows us to
> > later remove entries, which can prevent effective use of IOMMU
> > superpages.  
> 
> We could start with semantics similar to type1v2. 
> 
> btw why does virtio-mem require a smaller granularity? Can we split
> superpages on the fly when removal actually happens (just similar
> to page split in VM live migration for efficient dirty page tracking)?

The IOMMU API doesn't currently support those semantics.  If the IOMMU
used a superpage, then a superpage gets unmapped, it doesn't get
atomically broken down into smaller pages.  Therefore virtio-mem
proposes a fixed mapping granularity to allow for that same unmapping
granularity.

> and isn't it another problem imposed by userspace? How could a new
> map/unmap implementation mitigate this problem if the userspace 
> insists on a smaller granularity for initial mappings?

Currently if userspace wants to guarantee unmap granularity, they need
to make the same restriction themselves on the mapping granularity.
For instance, userspace cannot currently map a 1GB IOVA range while
guaranteeing 2MB unmap granularity of that range with a single ioctl.
Instead userspace would need to make 512 separate 2MB mapping calls.
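
For concreteness, a userspace sketch of those calls (addresses and flags are
placeholders): mapping 1GB of IOVA in 2MB pieces takes 512
VFIO_IOMMU_MAP_DMA ioctls, but each 2MB piece can later be unmapped on its
own.

#include <sys/ioctl.h>
#include <linux/vfio.h>

static int map_1g_in_2m_chunks(int container_fd, __u64 iova, __u64 vaddr)
{
        const __u64 chunk = 2ULL << 20;                 /* 2MB */
        __u64 off;

        for (off = 0; off < (1ULL << 30); off += chunk) {
                struct vfio_iommu_type1_dma_map map = {
                        .argsz = sizeof(map),
                        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                        .vaddr = vaddr + off,
                        .iova  = iova + off,
                        .size  = chunk,
                };

                if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map))
                        return -1;                      /* 512 calls in total */
        }
        return 0;
}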

> > Locked page accounting has been another constant issue.  We perform
> > locked page accounting at the container level, where each container
> > accounts independently.  A user may require multiple containers, the
> > containers may pin the same physical memory, but be accounted against
> > the user once per container.  
> 
> for /dev/ioasid there is still an open question whether a process is allowed
> to open /dev/ioasid once or multiple times. If there is only one ioasid_fd
> per process, the accounting can be made accurate. Otherwise the same
> problem still exists, as each ioasid_fd is akin to the container, and then
> we need to find a better solution.

We had tossed around an idea of a super-container with vfio, it's maybe
something we'd want to incorporate into this design.  For instance, if
memory could be pre-registered with a super container, which would
handle the locked memory accounting for that memory, then
sub-containers could all handle the IOMMU context of their sets of
devices relative to that common memory pool.
 
> > Those are the main ones I can think of.  It is nice to have a simple
> > map/unmap interface, I'd hope that a new /dev/ioasid interface wouldn't
> > raise the barrier to entry too high, but the user needs to have the
> > ability to have more control of their mappings and locked page
> > accounting should probably be offloaded somewhere.  Thanks,
> >   
> 
> Based on your feedback I feel it's probably reasonable to start with
> type1v2 semantics for the new interface. Locked accounting could
> also start with the same VFIO restriction and then improve it
> incrementally, if a cleaner way is intrusive (if not affecting uAPI).
> But I didn't get the suggestion on "more control of their 

Re: [PATCH v3 4/4] iommu: rockchip: Add support iommu v2

2021-05-07 Thread Robin Murphy

On 2021-05-04 09:41, Benjamin Gaignard wrote:

From: Simon Xue 

RK356x SoC got new IOMMU hardware block (version 2).
Add a compatible to distinguish it from the first version.

Signed-off-by: Simon Xue 
[Benjamin]
- port driver from kernel 4.19 to 5.12
- change function prototypes.
- squash all fixes in this commit.
Signed-off-by: Benjamin Gaignard 
---
version 3:
  - Change compatible name to use SoC name

  drivers/iommu/rockchip-iommu.c | 422 +++--
  1 file changed, 406 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index e5d86b7177de..32dcf0aaec18 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -19,6 +19,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -81,6 +82,30 @@
*/
  #define RK_IOMMU_PGSIZE_BITMAP 0x007ff000
  
+#define DT_LO_MASK 0xfffff000

+#define DT_HI_MASK GENMASK_ULL(39, 32)


Mixing GENMASK and raw constants seems a bit inconsistent.


+#define DT_SHIFT   28
+
+#define DTE_BASE_HI_MASK GENMASK(11, 4)
+
+#define PAGE_DESC_LO_MASK   0xfffff000
+#define PAGE_DESC_HI1_LOWER 32
+#define PAGE_DESC_HI1_UPPER 35
+#define PAGE_DESC_HI2_LOWER 36
+#define PAGE_DESC_HI2_UPPER 39
+#define PAGE_DESC_HI_MASK1  GENMASK_ULL(PAGE_DESC_HI1_UPPER, 
PAGE_DESC_HI1_LOWER)
+#define PAGE_DESC_HI_MASK2  GENMASK_ULL(PAGE_DESC_HI2_UPPER, 
PAGE_DESC_HI2_LOWER)


It might just be me, but that ends up as just an unreadable mess. I can 
see the overall intent, but either way it's hardly pretty. Maybe 
consider trying the bitfield.h helpers so you wouldn't have to keep 
track of explicit shifts at all?
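
A rough sketch of what that could look like for the v2 DTE unpacking; the
field names are invented for the example and the layout is taken from the
comment in the patch:

#include <linux/bitfield.h>

#define EXAMPLE_DTE_V2_ADDR_LO	GENMASK(31, 12)		/* PA bits 31:12 */
#define EXAMPLE_DTE_V2_ADDR_HI1	GENMASK(11, 8)		/* PA bits 35:32 */
#define EXAMPLE_DTE_V2_ADDR_HI2	GENMASK(7, 4)		/* PA bits 39:36 */

static inline phys_addr_t example_dte_pt_address_v2(u32 dte)
{
        u64 pa = (u64)FIELD_GET(EXAMPLE_DTE_V2_ADDR_LO, dte) << 12;

        pa |= (u64)FIELD_GET(EXAMPLE_DTE_V2_ADDR_HI1, dte) << 32;
        pa |= (u64)FIELD_GET(EXAMPLE_DTE_V2_ADDR_HI2, dte) << 36;

        return pa;
}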



+#define DTE_HI1_LOWER 8
+#define DTE_HI1_UPPER 11
+#define DTE_HI2_LOWER 4
+#define DTE_HI2_UPPER 7
+#define DTE_HI_MASK1  GENMASK(DTE_HI1_UPPER, DTE_HI1_LOWER)
+#define DTE_HI_MASK2  GENMASK(DTE_HI2_UPPER, DTE_HI2_LOWER)
+
+#define PAGE_DESC_HI_SHIFT1 (PAGE_DESC_HI1_LOWER - DTE_HI1_LOWER)
+#define PAGE_DESC_HI_SHIFT2 (PAGE_DESC_HI2_LOWER - DTE_HI2_LOWER)


What makes it even worse is the contrast between these and "#define 
DT_SHIFT 28" above, as equal parts of the same patch :(



+
  struct rk_iommu_domain {
struct list_head iommus;
u32 *dt; /* page directory table */
@@ -91,6 +116,10 @@ struct rk_iommu_domain {
struct iommu_domain domain;
  };
  
+struct rockchip_iommu_data {

+   u32 version;
+};
+
  /* list of clocks required by IOMMU */
  static const char * const rk_iommu_clocks[] = {
"aclk", "iface",
@@ -108,6 +137,7 @@ struct rk_iommu {
struct list_head node; /* entry in rk_iommu_domain.iommus */
struct iommu_domain *domain; /* domain to which iommu is attached */
struct iommu_group *group;
+   u32 version;
  };
  
  struct rk_iommudata {

@@ -174,11 +204,32 @@ static struct rk_iommu_domain *to_rk_domain(struct 
iommu_domain *dom)
  #define RK_DTE_PT_ADDRESS_MASK 0xfffff000
  #define RK_DTE_PT_VALID   BIT(0)
  
+/*

+ * In v2:
+ * 31:12 - PT address bit 31:0
+ * 11: 8 - PT address bit 35:32
+ *  7: 4 - PT address bit 39:36
+ *  3: 1 - Reserved
+ * 0 - 1 if PT @ PT address is valid
+ */
+#define RK_DTE_PT_ADDRESS_MASK_V2 0xfffffff0
+
  static inline phys_addr_t rk_dte_pt_address(u32 dte)
  {
return (phys_addr_t)dte & RK_DTE_PT_ADDRESS_MASK;
  }
  
+static inline phys_addr_t rk_dte_pt_address_v2(u32 dte)

+{
+   u64 dte_v2 = dte;
+
+   dte_v2 = ((dte_v2 & DTE_HI_MASK2) << PAGE_DESC_HI_SHIFT2) |
+((dte_v2 & DTE_HI_MASK1) << PAGE_DESC_HI_SHIFT1) |
+(dte_v2 & PAGE_DESC_LO_MASK);
+
+   return (phys_addr_t)dte_v2;
+}


But on current (v1) systems, the respective bits 11:4 of the DTE/PTE and 
40:32 of the address should always be 0, so you could in principle just 
use the packing/unpacking logic all the time, right? That's what we did 
with 52-bit support in io-pgtable-arm, for instance.



+
  static inline bool rk_dte_is_pt_valid(u32 dte)
  {
return dte & RK_DTE_PT_VALID;
@@ -189,6 +240,15 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
return (pt_dma & RK_DTE_PT_ADDRESS_MASK) | RK_DTE_PT_VALID;
  }
  
+static inline u32 rk_mk_dte_v2(dma_addr_t pt_dma)

+{
+   pt_dma = (pt_dma & PAGE_DESC_LO_MASK) |
+((pt_dma & PAGE_DESC_HI_MASK1) >> PAGE_DESC_HI_SHIFT1) |
+(pt_dma & PAGE_DESC_HI_MASK2) >> PAGE_DESC_HI_SHIFT2;
+
+   return (pt_dma & RK_DTE_PT_ADDRESS_MASK_V2) | RK_DTE_PT_VALID;
+}
+
  /*
   * Each PTE has a Page address, some flags and a valid bit:
   * +-+---+---+-+
@@ -215,11 +275,37 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
  #define RK_PTE_PAGE_READABLE  BIT(1)
  #define RK_PTE_PAGE_VALID BIT(0)
  
+/*

+ * In v2:
+ * 31:12 - Page address bit 31:0
+ *  11:9 - Page address bit 34:32
+ *   8:4 - Page address bit 39:35
+ * 3 - Security
+ * 2 - Readable
+ * 1 - Writable
+ * 0 - 1 if Page @ Page 

[PATCH 1/1] iommu/of: Fix request and enable ACS for of_iommu_configure

2021-05-07 Thread Wang Xingang
From: Xingang Wang 

When requesting ACS for a PCI device in of_iommu_configure, the pci device
has already been scanned and added with 'pci_acs_enable=0'. So the
pci_request_acs() in the current procedure does not work for enabling ACS.
Besides, the ACS should be enabled only if there's an IOMMU in the system.
So this fixes the call of pci_request_acs() and calls pci_enable_acs() to
make sure ACS is enabled for the pci_device.

Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
configuring IOMMU linkage")
Signed-off-by: Xingang Wang 
---
 drivers/iommu/of_iommu.c | 10 +-
 drivers/pci/pci.c|  2 +-
 include/linux/pci.h  |  1 +
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index a9d2df001149..dc621861ae72 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device 
*dev,
.np = master_np,
};
 
-   pci_request_acs();
err = pci_for_each_dma_alias(to_pci_dev(dev),
				 of_pci_iommu_init, &info);
} else {
@@ -222,6 +221,15 @@ const struct iommu_ops *of_iommu_configure(struct device 
*dev,
/* The fwspec pointer changed, read it again */
fwspec = dev_iommu_fwspec_get(dev);
ops= fwspec->ops;
+
+   /*
+* If we found an IOMMU and the device is pci,
+* make sure we enable ACS.
+*/
+   if (dev_is_pci(dev)) {
+   pci_request_acs();
+   pci_enable_acs(to_pci_dev(dev));
+   }
}
/*
 * If we have reason to believe the IOMMU driver missed the initial
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b717680377a9..4e4f98ee2870 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -926,7 +926,7 @@ static void pci_std_enable_acs(struct pci_dev *dev)
  * pci_enable_acs - enable ACS if hardware support it
  * @dev: the PCI device
  */
-static void pci_enable_acs(struct pci_dev *dev)
+void pci_enable_acs(struct pci_dev *dev)
 {
if (!pci_acs_enable)
goto disable_acs_redir;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c20211e59a57..e6a8bfbc9c98 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2223,6 +2223,7 @@ static inline struct pci_dev *pcie_find_root_port(struct 
pci_dev *dev)
 }
 
 void pci_request_acs(void);
+void pci_enable_acs(struct pci_dev *dev);
 bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
 bool pci_acs_path_enabled(struct pci_dev *start,
  struct pci_dev *end, u16 acs_flags);
-- 
2.19.1



[PATCH 0/1] iommu/of: Fix request and enable ACS for of_iommu_configure

2021-05-07 Thread Wang Xingang
From: Xingang Wang 

When requesting ACS in of_iommu_configure, the pci_acs_init procedure has
already been called. The pci device probe procedure is like the following:
pci_host_common_probe
pci_device_add
pci_acs_init
of_iommu_configure
pci_request_acs

The pci_request_acs() does not work because the pci_acs_init and
pci_enable_acs procedures have already finished, so the ACS is not
enabled as expected.  Besides, the ACS should be enabled only if an IOMMU is
detected and the device is pci.

So this fixes 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core
when configuring IOMMU linkage"), adding pci_enable_acs() and an IOMMU check
to make sure ACS is enabled for the pci_device.

Xingang Wang (1):
  iommu/of: Fix request and enable ACS for of_iommu_configure

 drivers/iommu/of_iommu.c | 10 +-
 drivers/pci/pci.c|  2 +-
 include/linux/pci.h  |  1 +
 3 files changed, 11 insertions(+), 2 deletions(-)

-- 
2.19.1



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-07 Thread Jason Gunthorpe
On Fri, May 07, 2021 at 07:36:49AM +, Tian, Kevin wrote:

> for /dev/ioasid there is still an open question whether a process is allowed
> to open /dev/ioasid once or multiple times. If there is only one ioasid_fd
> per process, the accounting can be made accurate. Otherwise the same
> problem still exists, as each ioasid_fd is akin to the container, and then
> we need to find a better solution.

You can't really do tricks like 'FD once per process' in linux.

The locked page accounting problem is much bigger than vfio and I
don't really know of any solution..

Jason


[RFC PATCH v4 09/13] iommu/arm-smmu-v3: Add feature detection for BBML

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

This detects the BBML feature and, if the SMMU supports it, transfers the
BBMLx quirk to io-pgtable.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  6 ++
 2 files changed, 25 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c42e59655fd0..3a2dc3177180 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2051,6 +2051,11 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
if (smmu->features & ARM_SMMU_FEAT_HD)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
 
+   if (smmu->features & ARM_SMMU_FEAT_BBML1)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_BBML1;
+   else if (smmu->features & ARM_SMMU_FEAT_BBML2)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_BBML2;
+
	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops)
return -ENOMEM;
@@ -3419,6 +3424,20 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
 
/* IDR3 */
reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
+   switch (FIELD_GET(IDR3_BBML, reg)) {
+   case IDR3_BBML0:
+   break;
+   case IDR3_BBML1:
+   smmu->features |= ARM_SMMU_FEAT_BBML1;
+   break;
+   case IDR3_BBML2:
+   smmu->features |= ARM_SMMU_FEAT_BBML2;
+   break;
+   default:
+   dev_err(smmu->dev, "unknown/unsupported BBM behavior level\n");
+   return -ENXIO;
+   }
+
if (FIELD_GET(IDR3_RIL, reg))
smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 3edcd31b046e..e3b6bdd292c9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -54,6 +54,10 @@
 #define IDR1_SIDSIZE   GENMASK(5, 0)
 
 #define ARM_SMMU_IDR3  0xc
+#define IDR3_BBML  GENMASK(12, 11)
+#define IDR3_BBML0 0
+#define IDR3_BBML1 1
+#define IDR3_BBML2 2
 #define IDR3_RIL   (1 << 10)
 
 #define ARM_SMMU_IDR5  0x14
@@ -613,6 +617,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_E2H  (1 << 18)
 #define ARM_SMMU_FEAT_HA   (1 << 19)
 #define ARM_SMMU_FEAT_HD   (1 << 20)
+#define ARM_SMMU_FEAT_BBML1(1 << 21)
+#define ARM_SMMU_FEAT_BBML2(1 << 22)
u32 features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
-- 
2.19.1



[RFC PATCH v4 11/13] iommu/arm-smmu-v3: Realize sync_dirty_log iommu ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

This realizes sync_dirty_log iommu ops based on sync_dirty_log
io-pgtable ops.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 30 +
 1 file changed, 30 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6de81d6ab652..3d3c0f8e2446 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2721,6 +2721,35 @@ static int arm_smmu_switch_dirty_log(struct iommu_domain 
*domain, bool enable,
return 0;
 }
 
+static int arm_smmu_sync_dirty_log(struct iommu_domain *domain,
+  unsigned long iova, size_t size,
+  unsigned long *bitmap,
+  unsigned long base_iova,
+  unsigned long bitmap_pgshift)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+   return -EINVAL;
+
+   if (!ops || !ops->sync_dirty_log) {
+   pr_err("io-pgtable don't realize sync dirty log\n");
+   return -ENODEV;
+   }
+
+   /*
+* Flush iotlb to ensure all inflight transactions are completed.
+* See doc IHI0070Da 3.13.4 "HTTU behavior summary".
+*/
+   arm_smmu_flush_iotlb_all(domain);
+   return ops->sync_dirty_log(ops, iova, size, bitmap, base_iova,
+  bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2820,6 +2849,7 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.enable_nesting = arm_smmu_enable_nesting,
.switch_dirty_log   = arm_smmu_switch_dirty_log,
+   .sync_dirty_log = arm_smmu_sync_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
-- 
2.19.1



[RFC PATCH v4 13/13] iommu/arm-smmu-v3: Realize support_dirty_log iommu ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

We have implemented the interfaces required to support iommu
dirty log tracking. The last step is reporting this feature to the
upper user, so the user can apply a higher-level policy based on it.
For arm smmuv3, it is equivalent to ARM_SMMU_FEAT_HD.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9b4739247dbb..59d11f084199 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2684,6 +2684,13 @@ static int arm_smmu_merge_page(struct iommu_domain 
*domain, unsigned long iova,
return ret;
 }
 
+static bool arm_smmu_support_dirty_log(struct iommu_domain *domain)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+   return !!(smmu_domain->smmu->features & ARM_SMMU_FEAT_HD);
+}
+
 static int arm_smmu_switch_dirty_log(struct iommu_domain *domain, bool enable,
 unsigned long iova, size_t size, int prot)
 {
@@ -2872,6 +2879,7 @@ static struct iommu_ops arm_smmu_ops = {
.release_device = arm_smmu_release_device,
.device_group   = arm_smmu_device_group,
.enable_nesting = arm_smmu_enable_nesting,
+   .support_dirty_log  = arm_smmu_support_dirty_log,
.switch_dirty_log   = arm_smmu_switch_dirty_log,
.sync_dirty_log = arm_smmu_sync_dirty_log,
.clear_dirty_log= arm_smmu_clear_dirty_log,
-- 
2.19.1



[RFC PATCH v4 10/13] iommu/arm-smmu-v3: Realize switch_dirty_log iommu ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

This implements switch_dirty_log. To track dirtiness at a finer
granule, it invokes arm_smmu_split_block() when dirty log tracking
starts, and invokes arm_smmu_merge_page() to recover block mappings
when dirty log tracking stops.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 142 
 drivers/iommu/iommu.c   |   5 +-
 include/linux/iommu.h   |   2 +
 3 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3a2dc3177180..6de81d6ab652 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2580,6 +2580,147 @@ static int arm_smmu_enable_nesting(struct iommu_domain 
*domain)
return ret;
 }
 
+static int arm_smmu_split_block(struct iommu_domain *domain,
+   unsigned long iova, size_t size)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   size_t handled_size;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2, can't split 
block\n");
+   return -ENODEV;
+   }
+   if (!ops || !ops->split_block) {
+   pr_err("io-pgtable don't realize split block\n");
+   return -ENODEV;
+   }
+
+   handled_size = ops->split_block(ops, iova, size);
+   if (handled_size != size) {
+   pr_err("split block failed\n");
+   return -EFAULT;
+   }
+
+   return 0;
+}
+
+static int __arm_smmu_merge_page(struct iommu_domain *domain,
+unsigned long iova, phys_addr_t paddr,
+size_t size, int prot)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   size_t handled_size;
+
+   if (!ops || !ops->merge_page) {
+   pr_err("io-pgtable don't realize merge page\n");
+   return -ENODEV;
+   }
+
+   while (size) {
+   size_t pgsize = iommu_pgsize(domain, iova | paddr, size);
+
+   handled_size = ops->merge_page(ops, iova, paddr, pgsize, prot);
+   if (handled_size != pgsize) {
+   pr_err("merge page failed\n");
+   return -EFAULT;
+   }
+
+   pr_debug("merge handled: iova 0x%lx pa %pa size 0x%zx\n",
+iova, , pgsize);
+
+   iova += pgsize;
+   paddr += pgsize;
+   size -= pgsize;
+   }
+
+   return 0;
+}
+
+static int arm_smmu_merge_page(struct iommu_domain *domain, unsigned long iova,
+  size_t size, int prot)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   phys_addr_t phys;
+   dma_addr_t p, i;
+   size_t cont_size;
+   int ret = 0;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2, can't merge page\n");
+   return -ENODEV;
+   }
+
+   if (!ops || !ops->iova_to_phys)
+   return -ENODEV;
+
+   while (size) {
+   phys = ops->iova_to_phys(ops, iova);
+   cont_size = PAGE_SIZE;
+   p = phys + cont_size;
+   i = iova + cont_size;
+
+   while (cont_size < size && p == ops->iova_to_phys(ops, i)) {
+   p += PAGE_SIZE;
+   i += PAGE_SIZE;
+   cont_size += PAGE_SIZE;
+   }
+
+   if (cont_size != PAGE_SIZE) {
+   ret = __arm_smmu_merge_page(domain, iova, phys,
+   cont_size, prot);
+   if (ret)
+   break;
+   }
+
+   iova += cont_size;
+   size -= cont_size;
+   }
+
+   return ret;
+}
+
+static int arm_smmu_switch_dirty_log(struct iommu_domain *domain, bool enable,
+unsigned long iova, size_t size, int prot)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+   return -EINVAL;
+
+   if (enable) {
+   /*
+* For SMMU, the hardware dirty management is always enabled if
+* 

[RFC PATCH v4 07/13] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update

2021-05-07 Thread Keqian Zhu
From: Jean-Philippe Brucker 

If the SMMU supports it and the kernel was built with HTTU support,
enable hardware update of access and dirty flags. This is essential for
shared page tables, to reduce the number of access faults on the fault
queue. Normal DMA with io-pgtables doesn't currently use the access or
dirty flags.

We can enable HTTU even if CPUs don't support it, because the kernel
always checks for HW dirty bit and updates the PTE flags atomically.

Signed-off-by: Jean-Philippe Brucker 
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  2 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 41 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  8 
 3 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251cab61f3..ae075e675892 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -121,10 +121,12 @@ static struct arm_smmu_ctx_desc 
*arm_smmu_alloc_shared_cd(struct mm_struct *mm)
if (err)
goto out_free_asid;
 
+   /* HA and HD will be filtered out later if not supported by the SMMU */
tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - vabits_actual) |
  FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) |
  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) |
  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) |
+ CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
 
switch (PAGE_SIZE) {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 54b2f27b81d4..4ac59a89bc76 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1010,10 +1010,17 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain, int ssid,
 * this substream's traffic
 */
} else { /* (1) and (2) */
+   u64 tcr = cd->tcr;
+
cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
cdptr[2] = 0;
cdptr[3] = cpu_to_le64(cd->mair);
 
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   tcr &= ~CTXDESC_CD_0_TCR_HD;
+   if (!(smmu->features & ARM_SMMU_FEAT_HA))
+   tcr &= ~CTXDESC_CD_0_TCR_HA;
+
/*
 * STE is live, and the SMMU might read dwords of this CD in any
 * order. Ensure that it observes valid values before reading
@@ -1021,7 +1028,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain, int ssid,
 */
arm_smmu_sync_cd(smmu_domain, ssid, true);
 
-   val = cd->tcr |
+   val = tcr |
 #ifdef __BIG_ENDIAN
CTXDESC_CD_0_ENDI |
 #endif
@@ -3242,6 +3249,28 @@ static int arm_smmu_device_reset(struct arm_smmu_device 
*smmu, bool bypass)
return 0;
 }
 
+static void arm_smmu_get_httu(struct arm_smmu_device *smmu, u32 reg)
+{
+   u32 fw_features = smmu->features & (ARM_SMMU_FEAT_HA | 
ARM_SMMU_FEAT_HD);
+   u32 features = 0;
+
+   switch (FIELD_GET(IDR0_HTTU, reg)) {
+   case IDR0_HTTU_ACCESS_DIRTY:
+   features |= ARM_SMMU_FEAT_HD;
+   fallthrough;
+   case IDR0_HTTU_ACCESS:
+   features |= ARM_SMMU_FEAT_HA;
+   }
+
+   if (smmu->dev->of_node)
+   smmu->features |= features;
+   else if (features != fw_features)
+   /* ACPI IORT sets the HTTU bits */
+   dev_warn(smmu->dev,
+"IDR0.HTTU overridden by FW configuration (0x%x)\n",
+fw_features);
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
u32 reg;
@@ -3302,6 +3331,8 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
smmu->features |= ARM_SMMU_FEAT_E2H;
}
 
+   arm_smmu_get_httu(smmu, reg);
+
/*
 * The coherency feature as set by FW is used in preference to the ID
 * register, but warn on mismatch.
@@ -3487,6 +3518,14 @@ static int arm_smmu_device_acpi_probe(struct 
platform_device *pdev,
if (iort_smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE)
smmu->features |= ARM_SMMU_FEAT_COHERENCY;
 
+   switch (FIELD_GET(ACPI_IORT_SMMU_V3_HTTU_OVERRIDE, iort_smmu->flags)) {
+   case IDR0_HTTU_ACCESS_DIRTY:
+   smmu->features |= ARM_SMMU_FEAT_HD;
+   fallthrough;
+   case IDR0_HTTU_ACCESS:
+   smmu->features |= ARM_SMMU_FEAT_HA;
+   }
+
return 0;
 }
 #else
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 

[RFC PATCH v4 08/13] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

As nested mode is not upstreamed yet, we only aim to support dirty
log tracking for stage 1 with io-pgtable mapping (i.e. SVA mapping is
not supported). If HTTU is supported, we enable the HA/HD bits in the
SMMU CD and pass the ARM_HD quirk to io-pgtable.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4ac59a89bc76..c42e59655fd0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1942,6 +1942,7 @@ static int arm_smmu_domain_finalise_s1(struct 
arm_smmu_domain *smmu_domain,
  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) |
  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
+ CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
cfg->cd.mair= pgtbl_cfg->arm_lpae_s1_cfg.mair;
 
@@ -2047,6 +2048,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
 
if (!iommu_get_dma_strict(domain))
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+   if (smmu->features & ARM_SMMU_FEAT_HD)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
 
pgtbl_ops = alloc_io_pgtable_ops(fmt, _cfg, smmu_domain);
if (!pgtbl_ops)
-- 
2.19.1



[RFC PATCH v4 06/13] iommu/io-pgtable-arm: Add and realize clear_dirty_log ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

After the dirty log is retrieved, the user should clear it to re-enable
dirty log tracking for the dirtied pages. This clears the dirty state
(as we only set the DBM bit for stage 1 mappings, that means setting the
AP[2] bit) of the leaf TTDs specified by the user-provided bitmap.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/io-pgtable-arm.c | 93 ++
 include/linux/io-pgtable.h |  4 ++
 2 files changed, 97 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 155d440099ab..2b41b9d0faa3 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -965,6 +965,98 @@ static int arm_lpae_sync_dirty_log(struct io_pgtable_ops 
*ops,
 bitmap, base_iova, bitmap_pgshift);
 }
 
+static int __arm_lpae_clear_dirty_log(struct arm_lpae_io_pgtable *data,
+ unsigned long iova, size_t size,
+ int lvl, arm_lpae_iopte *ptep,
+ unsigned long *bitmap,
+ unsigned long base_iova,
+ unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = >iop;
+   unsigned long offset;
+   size_t base, next_size;
+   int nbits, ret, i;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* Ensure all corresponding bits are set */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   for (i = offset; i < offset + nbits; i++) {
+   if (!test_bit(i, bitmap))
+   return 0;
+   }
+
+   /* Race does not exist */
+   pte |= ARM_LPAE_PTE_AP_RDONLY;
+   __arm_lpae_set_pte(ptep, pte, >cfg);
+   return 0;
+   }
+   /* Current level is table, traverse next level */
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopte_deref(pte, data);
+   for (base = 0; base < size; base += next_size) {
+   ret = __arm_lpae_clear_dirty_log(data, iova + base,
+   next_size, lvl + 1, ptep, bitmap,
+   base_iova, bitmap_pgshift);
+   if (ret)
+   return ret;
+   }
+   return 0;
+   } else if (iopte_leaf(pte, lvl, iop->fmt)) {
+   /* Though the size is too small, it is already clean */
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   return -EINVAL;
+   }
+
+   /* Keep on walkin */
+   ptep = iopte_deref(pte, data);
+   return __arm_lpae_clear_dirty_log(data, iova, size, lvl + 1, ptep,
+   bitmap, base_iova, bitmap_pgshift);
+}
+
+static int arm_lpae_clear_dirty_log(struct io_pgtable_ops *ops,
+   unsigned long iova, size_t size,
+   unsigned long *bitmap,
+   unsigned long base_iova,
+   unsigned long bitmap_pgshift)
+{
+   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   struct io_pgtable_cfg *cfg = >iop.cfg;
+   arm_lpae_iopte *ptep = data->pgd;
+   int lvl = data->start_level;
+   long iaext = (s64)iova >> cfg->ias;
+
+   if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+   return -EINVAL;
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   iaext = ~iaext;
+   if (WARN_ON(iaext))
+   return -EINVAL;
+
+   if (data->iop.fmt != ARM_64_LPAE_S1 &&
+   data->iop.fmt != ARM_32_LPAE_S1)
+   return -EINVAL;
+
+   return __arm_lpae_clear_dirty_log(data, iova, size, lvl, ptep,
+   bitmap, base_iova, bitmap_pgshift);
+}
+
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
unsigned long granule, page_sizes;
@@ -1046,6 +1138,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
.split_block= arm_lpae_split_block,
.merge_page = arm_lpae_merge_page,
.sync_dirty_log = arm_lpae_sync_dirty_log,
+   

[RFC PATCH v4 12/13] iommu/arm-smmu-v3: Realize clear_dirty_log iommu ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

This implements the clear_dirty_log iommu op on top of the clear_dirty_log
io-pgtable op.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3d3c0f8e2446..9b4739247dbb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2750,6 +2750,30 @@ static int arm_smmu_sync_dirty_log(struct iommu_domain 
*domain,
   bitmap_pgshift);
 }
 
+static int arm_smmu_clear_dirty_log(struct iommu_domain *domain,
+   unsigned long iova, size_t size,
+   unsigned long *bitmap,
+   unsigned long base_iova,
+   unsigned long bitmap_pgshift)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+   return -EINVAL;
+
+   if (!ops || !ops->clear_dirty_log) {
+   pr_err("io-pgtable don't realize clear dirty log\n");
+   return -ENODEV;
+   }
+
+   return ops->clear_dirty_log(ops, iova, size, bitmap, base_iova,
+   bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2850,6 +2874,7 @@ static struct iommu_ops arm_smmu_ops = {
.enable_nesting = arm_smmu_enable_nesting,
.switch_dirty_log   = arm_smmu_switch_dirty_log,
.sync_dirty_log = arm_smmu_sync_dirty_log,
+   .clear_dirty_log= arm_smmu_clear_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
-- 
2.19.1



[RFC PATCH v4 04/13] iommu/io-pgtable-arm: Add and realize merge_page ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

If block (large page) mappings were split when dirty log tracking
started, they need to be recovered when tracking stops, for better DMA
performance.

This recovers block mappings and unmaps the spans of page mappings. The
BBML1 or BBML2 feature is required.

Merging pages is designed to be used only by dirty log tracking, which
does not run concurrently with other pgtable ops that access the
underlying page table, so no race condition exists.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/io-pgtable-arm.c | 78 ++
 include/linux/io-pgtable.h |  2 +
 2 files changed, 80 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 664a9548b199..b9f6e3370032 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -800,6 +800,83 @@ static size_t arm_lpae_split_block(struct io_pgtable_ops 
*ops,
return __arm_lpae_split_block(data, iova, size, lvl, ptep);
 }
 
+static size_t __arm_lpae_merge_page(struct arm_lpae_io_pgtable *data,
+   unsigned long iova, phys_addr_t paddr,
+   size_t size, int lvl, arm_lpae_iopte *ptep,
+   arm_lpae_iopte prot)
+{
+   arm_lpae_iopte pte, *tablep;
+   struct io_pgtable *iop = >iop;
+   struct io_pgtable_cfg *cfg = >iop.cfg;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return 0;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt))
+   return size;
+
+   /* Race does not exist */
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_BBML1) {
+   prot |= ARM_LPAE_PTE_NT;
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   io_pgtable_tlb_flush_walk(iop, iova, size,
+ ARM_LPAE_GRANULE(data));
+
+   prot &= ~(ARM_LPAE_PTE_NT);
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   } else {
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   }
+
+   tablep = iopte_deref(pte, data);
+   __arm_lpae_free_pgtable(data, lvl + 1, tablep);
+   return size;
+   } else if (iopte_leaf(pte, lvl, iop->fmt)) {
+   /* The size is too small, already merged */
+   return size;
+   }
+
+   /* Keep on walkin */
+   ptep = iopte_deref(pte, data);
+   return __arm_lpae_merge_page(data, iova, paddr, size, lvl + 1, ptep, 
prot);
+}
+
+static size_t arm_lpae_merge_page(struct io_pgtable_ops *ops, unsigned long 
iova,
+ phys_addr_t paddr, size_t size, int 
iommu_prot)
+{
+   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   struct io_pgtable_cfg *cfg = >iop.cfg;
+   arm_lpae_iopte *ptep = data->pgd;
+   int lvl = data->start_level;
+   arm_lpae_iopte prot;
+   long iaext = (s64)iova >> cfg->ias;
+
+   if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+   return 0;
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   iaext = ~iaext;
+   if (WARN_ON(iaext || paddr >> cfg->oas))
+   return 0;
+
+   /* If no access, then nothing to do */
+   if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
+   return size;
+
+   /* If it is smallest granule, then nothing to do */
+   if (size == ARM_LPAE_BLOCK_SIZE(ARM_LPAE_MAX_LEVELS - 1, data))
+   return size;
+
+   prot = arm_lpae_prot_to_pte(data, iommu_prot);
+   return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot);
+}
+
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
unsigned long granule, page_sizes;
@@ -879,6 +956,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
.unmap  = arm_lpae_unmap,
.iova_to_phys   = arm_lpae_iova_to_phys,
.split_block= arm_lpae_split_block,
+   .merge_page = arm_lpae_merge_page,
};
 
return data;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index eba6c6ccbe49..e77576d946a2 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -169,6 +169,8 @@ struct io_pgtable_ops {
unsigned long iova);
size_t (*split_block)(struct io_pgtable_ops *ops, unsigned long iova,
  size_t size);
+   size_t (*merge_page)(struct io_pgtable_ops *ops, unsigned long iova,
+phys_addr_t phys, size_t size, int prot);
 };
 
 /**
-- 

[RFC PATCH v4 05/13] iommu/io-pgtable-arm: Add and realize sync_dirty_log ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

During dirty log tracking, the user retrieves the dirty log from the
iommu if it supports hardware dirty logging. Scan the leaf TTDs and
treat a TTD as dirty if it is writable. As we only set the DBM bit for
stage 1 mappings, this means checking that AP[2] is clear.
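
A rough sketch of how a caller might consume the bitmap this op fills in;
the helper below is hypothetical and only the iommu_sync_dirty_log()
signature comes from this series:

#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <linux/iommu.h>

/*
 * Hypothetical consumer: sync the dirty log of [iova, iova + size) into a
 * local bitmap, then walk the set bits. One bit covers one page of
 * (1 << pgshift) bytes, relative to base_iova (here base_iova == iova).
 */
static int report_dirty_pages(struct iommu_domain *domain, unsigned long iova,
			      size_t size, unsigned long pgshift)
{
	unsigned long nbits = size >> pgshift;
	unsigned long *bitmap;
	unsigned long bit;
	int ret;

	bitmap = bitmap_zalloc(nbits, GFP_KERNEL);
	if (!bitmap)
		return -ENOMEM;

	ret = iommu_sync_dirty_log(domain, iova, size, bitmap, iova, pgshift);
	if (!ret)
		for_each_set_bit(bit, bitmap, nbits)
			pr_info("dirty iova 0x%lx\n", iova + (bit << pgshift));

	bitmap_free(bitmap);
	return ret;
}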

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/io-pgtable-arm.c | 89 ++
 include/linux/io-pgtable.h |  4 ++
 2 files changed, 93 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index b9f6e3370032..155d440099ab 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -877,6 +877,94 @@ static size_t arm_lpae_merge_page(struct io_pgtable_ops 
*ops, unsigned long iova
return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot);
 }
 
+static int __arm_lpae_sync_dirty_log(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size,
+int lvl, arm_lpae_iopte *ptep,
+unsigned long *bitmap,
+unsigned long base_iova,
+unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = >iop;
+   size_t base, next_size;
+   unsigned long offset;
+   int nbits, ret;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* It is writable, set the bitmap */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   bitmap_set(bitmap, offset, nbits);
+   return 0;
+   }
+   /* Current level is table, traverse next level */
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopte_deref(pte, data);
+   for (base = 0; base < size; base += next_size) {
+   ret = __arm_lpae_sync_dirty_log(data, iova + base,
+   next_size, lvl + 1, ptep, bitmap,
+   base_iova, bitmap_pgshift);
+   if (ret)
+   return ret;
+   }
+   return 0;
+   } else if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* Though the size is too small, also set bitmap */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   bitmap_set(bitmap, offset, nbits);
+   return 0;
+   }
+
+   /* Keep on walkin */
+   ptep = iopte_deref(pte, data);
+   return __arm_lpae_sync_dirty_log(data, iova, size, lvl + 1, ptep,
+   bitmap, base_iova, bitmap_pgshift);
+}
+
+static int arm_lpae_sync_dirty_log(struct io_pgtable_ops *ops,
+  unsigned long iova, size_t size,
+  unsigned long *bitmap,
+  unsigned long base_iova,
+  unsigned long bitmap_pgshift)
+{
+   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   struct io_pgtable_cfg *cfg = >iop.cfg;
+   arm_lpae_iopte *ptep = data->pgd;
+   int lvl = data->start_level;
+   long iaext = (s64)iova >> cfg->ias;
+
+   if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+   return -EINVAL;
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   iaext = ~iaext;
+   if (WARN_ON(iaext))
+   return -EINVAL;
+
+   if (data->iop.fmt != ARM_64_LPAE_S1 &&
+   data->iop.fmt != ARM_32_LPAE_S1)
+   return -EINVAL;
+
+   return __arm_lpae_sync_dirty_log(data, iova, size, lvl, ptep,
+bitmap, base_iova, bitmap_pgshift);
+}
+
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
unsigned long granule, page_sizes;
@@ -957,6 +1045,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
.iova_to_phys   = arm_lpae_iova_to_phys,
.split_block= arm_lpae_split_block,
.merge_page = arm_lpae_merge_page,
+   .sync_dirty_log = arm_lpae_sync_dirty_log,
};
 
return data;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index e77576d946a2..329fa99d9d96 100644
--- 

[RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework

2021-05-07 Thread Keqian Zhu
Some types of IOMMU are capable of tracking DMA dirty log, such as
ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
dirty log tracking framework in the IOMMU base layer.

Four new essential interfaces are added, and we maintain the status
of dirty log tracking in iommu_domain.
1. iommu_support_dirty_log: Check whether domain supports dirty log tracking
2. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
3. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
4. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap

Note: Don't concurrently call these interfaces with other ops that
access underlying page table.
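
A minimal usage sketch of these interfaces, assuming a single caller that
serializes everything as required by the note above; the surrounding helper
is hypothetical, and the clear step is assumed to take the same arguments
as the sync step:

#include <linux/iommu.h>

/*
 * Hypothetical caller, e.g. a dirty-page scan pass during live migration.
 * All calls are serialized by the caller, as required above.
 */
static int scan_dirty_pages(struct iommu_domain *domain, unsigned long iova,
			    size_t size, unsigned long *bitmap,
			    unsigned long pgshift)
{
	int ret;

	if (!iommu_support_dirty_log(domain))
		return -ENODEV;		/* fall back to software tracking */

	/* Start tracking: split blocks, arm DBM/write-protect on the range */
	ret = iommu_switch_dirty_log(domain, true, iova, size,
				     IOMMU_READ | IOMMU_WRITE);
	if (ret)
		return ret;

	/* ... DMA runs for a while ... */

	/* Pull the dirty state into 'bitmap', then re-arm the dirtied pages */
	ret = iommu_sync_dirty_log(domain, iova, size, bitmap, iova, pgshift);
	if (!ret)
		ret = iommu_clear_dirty_log(domain, iova, size, bitmap, iova,
					    pgshift);

	/* Stop tracking and recover block mappings */
	iommu_switch_dirty_log(domain, false, iova, size,
			       IOMMU_READ | IOMMU_WRITE);
	return ret;
}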

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/iommu.c| 201 +++
 include/linux/iommu.h|  63 +++
 include/trace/events/iommu.h |  63 +++
 3 files changed, 327 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df5..0d15620d1e90 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1940,6 +1940,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
bus_type *bus,
domain->type = type;
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
+   mutex_init(>switch_log_lock);
 
return domain;
 }
@@ -2703,6 +2704,206 @@ int iommu_set_pgtable_quirks(struct iommu_domain 
*domain,
 }
 EXPORT_SYMBOL_GPL(iommu_set_pgtable_quirks);
 
+bool iommu_support_dirty_log(struct iommu_domain *domain)
+{
+   const struct iommu_ops *ops = domain->ops;
+
+   return ops->support_dirty_log && ops->support_dirty_log(domain);
+}
+EXPORT_SYMBOL_GPL(iommu_support_dirty_log);
+
+int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
+  unsigned long iova, size_t size, int prot)
+{
+   const struct iommu_ops *ops = domain->ops;
+   unsigned long orig_iova = iova;
+   unsigned int min_pagesz;
+   size_t orig_size = size;
+   bool flush = false;
+   int ret = 0;
+
+   if (unlikely(!ops->switch_dirty_log))
+   return -ENODEV;
+
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+   if (!IS_ALIGNED(iova | size, min_pagesz)) {
+   pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+  iova, size, min_pagesz);
+   return -EINVAL;
+   }
+
+   mutex_lock(>switch_log_lock);
+   if (enable && domain->dirty_log_tracking) {
+   ret = -EBUSY;
+   goto out;
+   } else if (!enable && !domain->dirty_log_tracking) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   pr_debug("switch_dirty_log %s for: iova 0x%lx size 0x%zx\n",
+enable ? "enable" : "disable", iova, size);
+
+   while (size) {
+   size_t pgsize = iommu_pgsize(domain, iova, size);
+
+   flush = true;
+   ret = ops->switch_dirty_log(domain, enable, iova, pgsize, prot);
+   if (ret)
+   break;
+
+   pr_debug("switch_dirty_log handled: iova 0x%lx size 0x%zx\n",
+iova, pgsize);
+
+   iova += pgsize;
+   size -= pgsize;
+   }
+
+   if (flush)
+   iommu_flush_iotlb_all(domain);
+
+   if (!ret) {
+   domain->dirty_log_tracking = enable;
+   trace_switch_dirty_log(orig_iova, orig_size, enable);
+   }
+out:
+   mutex_unlock(>switch_log_lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_switch_dirty_log);
+
+int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
+size_t size, unsigned long *bitmap,
+unsigned long base_iova, unsigned long bitmap_pgshift)
+{
+   const struct iommu_ops *ops = domain->ops;
+   unsigned long orig_iova = iova;
+   unsigned int min_pagesz;
+   size_t orig_size = size;
+   int ret = 0;
+
+   if (unlikely(!ops->sync_dirty_log))
+   return -ENODEV;
+
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+   if (!IS_ALIGNED(iova | size, min_pagesz)) {
+   pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+  iova, size, min_pagesz);
+   return -EINVAL;
+   }
+
+   mutex_lock(>switch_log_lock);
+   if (!domain->dirty_log_tracking) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   pr_debug("sync_dirty_log for: iova 0x%lx size 0x%zx\n", iova, size);
+
+   while (size) {
+   size_t pgsize = iommu_pgsize(domain, iova, size);
+
+   ret = ops->sync_dirty_log(domain, iova, pgsize,
+ bitmap, base_iova, bitmap_pgshift);
+   if (ret)
+   break;
+
+ 

[RFC PATCH v4 00/13] iommu/smmuv3: Implement hardware dirty log tracking

2021-05-07 Thread Keqian Zhu
Hi Robin, Will and everyone,

I think this series is relatively mature now, please give your valuable
suggestions, thanks!


This patch series is split from the series[1] that contains both the IOMMU part
and the VFIO part. The VFIO part will be sent out in another series.

[1] 
https://lore.kernel.org/linux-iommu/20210310090614.26668-1-zhukeqi...@huawei.com/

changelog:

v4:
 - Modify the framework as suggested by Baolu, thanks!
 - Add trace for iommu ops.
 - Extract io-pgtable part.

v3:
 - Merge start_dirty_log and stop_dirty_log into switch_dirty_log. (Yi Sun)
 - Maintain the dirty log status in iommu_domain.
 - Update commit message to make patch easier to review.

v2:
 - Address all comments of RFC version, thanks for all of you ;-)
 - Add a bugfix that start dirty log for newly added dma ranges and domain.



Hi everyone,

This patch series introduces a framework for iommu dirty log tracking, which
smmuv3 then implements. This new feature can be used for VFIO DMA dirty
tracking.

Intention:

Some types of IOMMU are capable of tracking DMA dirty log, such as
ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
dirty log tracking framework in the IOMMU base layer.

Three new essential interfaces are added, and we maintain the status
of dirty log tracking in iommu_domain.
1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap

About SMMU HTTU:

HTTU (Hardware Translation Table Update) is a feature of ARM SMMUv3: it can
update the access flag and/or the dirty state of a TTD (Translation Table
Descriptor) by hardware. With HTTU, a stage 1 TTD is classified into 3 types:

                      DBM bit   AP[2] (readonly bit)
1. writable_clean        1              1
2. writable_dirty        1              0
3. readonly              0              1

If HTTU_HD (manage dirty state) is enabled, the SMMU can change a TTD from
writable_clean to writable_dirty. Software can then scan the TTDs to sync the
dirty state into a dirty bitmap. With this feature, we can track the dirty log
of DMA continuously and precisely.
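
As a minimal sketch of decoding the three states above from a stage 1 leaf
TTD (the bit positions follow the DBM/AP[2] bits used by io-pgtable-arm; the
enum, macros and helper are illustrative, not part of the series, and assume
the mapping was created with the ARM_HD quirk so writable entries carry DBM):

#include <linux/types.h>

#define PTE_DBM		(1ULL << 51)	/* Dirty Bit Modifier */
#define PTE_AP_RDONLY	(1ULL << 7)	/* AP[2]: read-only when set */

enum ttd_state { TTD_READONLY, TTD_WRITABLE_CLEAN, TTD_WRITABLE_DIRTY };

static enum ttd_state ttd_state(u64 pte)
{
	if (!(pte & PTE_DBM))
		return TTD_READONLY;
	/* With HTTU_HD, hardware clears AP[2] on the first write */
	return (pte & PTE_AP_RDONLY) ? TTD_WRITABLE_CLEAN : TTD_WRITABLE_DIRTY;
}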

About this series:

Patch 1-3:  Introduce the dirty log tracking framework in the IOMMU base layer,
            and two common interfaces that can be used by many types of iommu.

Patch 4-6:  Add feature detection for smmu HTTU and enable HTTU for smmu stage 1
            mapping. Also add feature detection for smmu BBML. We need to split
            block mappings when dirty log tracking starts and merge page mappings
            when it stops, which requires a break-before-make procedure. But that
            may cause problems while the TTD is live: the I/O streams might not
            tolerate translation faults. So BBML should be used (a short sketch
            of both update sequences follows this list).

Patch 7-12: We implement these interfaces for arm smmuv3.
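
For reference, a minimal sketch contrasting classic break-before-make with
the nT-bit sequence that BBML1 permits when a table entry is replaced by a
block entry; the bit definition and helper name are illustrative, and only
io_pgtable_tlb_flush_walk() is the real io-pgtable callback:

#include <linux/io-pgtable.h>
#include <linux/types.h>

#define PTE_NT	(1ULL << 16)	/* block descriptor nT bit */

/* Replace a live table entry with a block entry covering the same range. */
static void install_block(struct io_pgtable *iop, u64 *ptep, u64 blk_pte,
			  unsigned long iova, size_t size, bool has_bbml1)
{
	if (!has_bbml1) {
		/* Break-before-make: the entry is invalid across the TLBI,
		 * so in-flight DMA to this range may take translation faults.
		 */
		WRITE_ONCE(*ptep, 0);
		io_pgtable_tlb_flush_walk(iop, iova, size, size);
		WRITE_ONCE(*ptep, blk_pte);
	} else {
		/* BBML1: publish the new block with nT set so translations
		 * keep succeeding, invalidate the old walk, then clear nT.
		 */
		WRITE_ONCE(*ptep, blk_pte | PTE_NT);
		io_pgtable_tlb_flush_walk(iop, iova, size, size);
		WRITE_ONCE(*ptep, blk_pte & ~PTE_NT);
	}
}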

Thanks,
Keqian

Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3: Add support for Hardware Translation Table Update

Keqian Zhu (1):
  iommu: Introduce dirty log tracking framework

Kunkun Jiang (11):
  iommu/io-pgtable-arm: Add quirk ARM_HD and ARM_BBMLx
  iommu/io-pgtable-arm: Add and realize split_block ops
  iommu/io-pgtable-arm: Add and realize merge_page ops
  iommu/io-pgtable-arm: Add and realize sync_dirty_log ops
  iommu/io-pgtable-arm: Add and realize clear_dirty_log ops
  iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  iommu/arm-smmu-v3: Add feature detection for BBML
  iommu/arm-smmu-v3: Realize switch_dirty_log iommu ops
  iommu/arm-smmu-v3: Realize sync_dirty_log iommu ops
  iommu/arm-smmu-v3: Realize clear_dirty_log iommu ops
  iommu/arm-smmu-v3: Realize support_dirty_log iommu ops

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 268 +++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  14 +
 drivers/iommu/io-pgtable-arm.c| 389 +-
 drivers/iommu/iommu.c | 206 +-
 include/linux/io-pgtable.h|  23 ++
 include/linux/iommu.h |  65 +++
 include/trace/events/iommu.h  |  63 +++
 8 files changed, 1026 insertions(+), 4 deletions(-)

-- 
2.19.1


[RFC PATCH v4 02/13] iommu/io-pgtable-arm: Add quirk ARM_HD and ARM_BBMLx

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

These features are essential to support dirty log tracking for
SMMU with io-pgtable mapping.

The dirty state information is encoded using the access permission
bits AP[2] (stage 1) or S2AP[1] (stage 2) in conjunction with the
DBM (Dirty Bit Modifier) bit, where DBM means writable and AP[2]/
S2AP[1] means dirty.

When ARM_HD is set, we set the DBM bit for stage 1 mappings. As SMMU
nested mode is not upstreamed for now, we only aim to support dirty
log tracking for stage 1 with io-pgtable mapping (i.e. SVA is not
supported).

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/io-pgtable-arm.c |  7 ++-
 include/linux/io-pgtable.h | 11 +++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..94d790b8ed27 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -72,6 +72,7 @@
 
 #define ARM_LPAE_PTE_NSTABLE   (((arm_lpae_iopte)1) << 63)
 #define ARM_LPAE_PTE_XN(((arm_lpae_iopte)3) << 53)
+#define ARM_LPAE_PTE_DBM   (((arm_lpae_iopte)1) << 51)
 #define ARM_LPAE_PTE_AF(((arm_lpae_iopte)1) << 10)
 #define ARM_LPAE_PTE_SH_NS (((arm_lpae_iopte)0) << 8)
 #define ARM_LPAE_PTE_SH_OS (((arm_lpae_iopte)2) << 8)
@@ -81,7 +82,7 @@
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK  (((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
-#define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)6) << 52)
+#define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)13) << 51)
 #define ARM_LPAE_PTE_ATTR_MASK (ARM_LPAE_PTE_ATTR_LO_MASK |\
 ARM_LPAE_PTE_ATTR_HI_MASK)
 /* Software bit for solving coherency races */
@@ -379,6 +380,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, 
unsigned long iova,
 static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
   int prot)
 {
+   struct io_pgtable_cfg *cfg = >iop.cfg;
arm_lpae_iopte pte;
 
if (data->iop.fmt == ARM_64_LPAE_S1 ||
@@ -386,6 +388,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
pte = ARM_LPAE_PTE_nG;
if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
pte |= ARM_LPAE_PTE_AP_RDONLY;
+   else if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_HD)
+   pte |= ARM_LPAE_PTE_DBM;
+
if (!(prot & IOMMU_PRIV))
pte |= ARM_LPAE_PTE_AP_UNPRIV;
} else {
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 4d40dfa75b55..92274705b772 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -82,6 +82,14 @@ struct io_pgtable_cfg {
 *
 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
 *  attributes set in the TCR for a non-coherent page-table walker.
+*
+* IO_PGTABLE_QUIRK_ARM_HD: Support hardware management of dirty status.
+*
+* IO_PGTABLE_QUIRK_ARM_BBML1: ARM SMMU supports BBM Level 1 behavior
+*  when changing block size.
+*
+* IO_PGTABLE_QUIRK_ARM_BBML2: ARM SMMU supports BBM Level 2 behavior
+*  when changing block size.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -89,6 +97,9 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_NON_STRICT BIT(4)
#define IO_PGTABLE_QUIRK_ARM_TTBR1  BIT(5)
#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
+   #define IO_PGTABLE_QUIRK_ARM_HD BIT(7)
+   #define IO_PGTABLE_QUIRK_ARM_BBML1  BIT(8)
+   #define IO_PGTABLE_QUIRK_ARM_BBML2  BIT(9)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;
-- 
2.19.1



[RFC PATCH v4 03/13] iommu/io-pgtable-arm: Add and realize split_block ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

Block (large page) mapping is not a proper granule for dirty log tracking.
Take an extreme example: if DMA writes one byte, under a 1G mapping the
dirty amount reported is 1G, but under a 4K mapping the dirty amount is
just 4K.

This splits a block descriptor into a span of page descriptors. The BBML1
or BBML2 feature is required.

Splitting blocks is designed to be used only by dirty log tracking, which
does not run concurrently with other pgtable ops that access the
underlying page table, so no race condition exists.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/io-pgtable-arm.c | 122 +
 include/linux/io-pgtable.h |   2 +
 2 files changed, 124 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 94d790b8ed27..664a9548b199 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -79,6 +79,8 @@
 #define ARM_LPAE_PTE_SH_IS (((arm_lpae_iopte)3) << 8)
 #define ARM_LPAE_PTE_NS(((arm_lpae_iopte)1) << 5)
 #define ARM_LPAE_PTE_VALID (((arm_lpae_iopte)1) << 0)
+/* Block descriptor bits */
+#define ARM_LPAE_PTE_NT(((arm_lpae_iopte)1) << 16)
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK  (((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
@@ -679,6 +681,125 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
io_pgtable_ops *ops,
return iopte_to_paddr(pte, data) | iova;
 }
 
+static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size, int lvl,
+arm_lpae_iopte *ptep);
+
+static size_t arm_lpae_do_split_blk(struct arm_lpae_io_pgtable *data,
+   unsigned long iova, size_t size,
+   arm_lpae_iopte blk_pte, int lvl,
+   arm_lpae_iopte *ptep)
+{
+   struct io_pgtable_cfg *cfg = >iop.cfg;
+   arm_lpae_iopte pte, *tablep;
+   phys_addr_t blk_paddr;
+   size_t tablesz = ARM_LPAE_GRANULE(data);
+   size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+   int i;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
+   if (!tablep)
+   return 0;
+
+   blk_paddr = iopte_to_paddr(blk_pte, data);
+   pte = iopte_prot(blk_pte);
+   for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz)
+   __arm_lpae_init_pte(data, blk_paddr, pte, lvl, [i]);
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_BBML1) {
+   /* Race does not exist */
+   blk_pte |= ARM_LPAE_PTE_NT;
+   __arm_lpae_set_pte(ptep, blk_pte, cfg);
+   io_pgtable_tlb_flush_walk(>iop, iova, size, size);
+   }
+   /* Race does not exist */
+   pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);
+
+   /* Have splited it into page? */
+   if (lvl == (ARM_LPAE_MAX_LEVELS - 1))
+   return size;
+
+   /* Go back to lvl - 1 */
+   ptep -= ARM_LPAE_LVL_IDX(iova, lvl - 1, data);
+   return __arm_lpae_split_block(data, iova, size, lvl - 1, ptep);
+}
+
+static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size, int lvl,
+arm_lpae_iopte *ptep)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = >iop;
+   size_t base, next_size, total_size;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return 0;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (lvl == (ARM_LPAE_MAX_LEVELS - 1) ||
+   (pte & ARM_LPAE_PTE_AP_RDONLY))
+   return size;
+
+   /* We find a writable block, split it. */
+   return arm_lpae_do_split_blk(data, iova, size, pte,
+   lvl + 1, ptep);
+   } else {
+   /* If it is the last table level, then nothing to do */
+   if (lvl == (ARM_LPAE_MAX_LEVELS - 2))
+   return size;
+
+   total_size = 0;
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopte_deref(pte, data);
+   for (base = 0; base < size; base += next_size)
+   total_size += __arm_lpae_split_block(data,
+   iova + base, next_size, lvl + 1,
+   

Re: [PATCH v3 08/10] iommu/arm-smmu-v3: Reserve any RMR regions associated with a dev

2021-05-07 Thread Robin Murphy

On 2021-04-20 09:27, Shameer Kolothum wrote:

Reserve any RMR regions associated with a device so that there is
a unity mapping for them in the SMMU.

Signed-off-by: Shameer Kolothum 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 +
  1 file changed, 29 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 14e9c7034c04..8bacedf7bb34 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2531,6 +2531,34 @@ static int arm_smmu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
  }
  
+static bool arm_smmu_dev_has_rmr(struct arm_smmu_master *master,

+struct iommu_rmr *e)
+{
+   int i;
+
+   for (i = 0; i < master->num_sids; i++) {
+   if (e->sid == master->sids[i])
+   return true;
+   }
+
+   return false;
+}
+
+static void arm_smmu_rmr_get_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+   struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+   struct arm_smmu_device *smmu = master->smmu;
+   struct iommu_rmr *rmr;
+
+   list_for_each_entry(rmr, >rmr_list, list) {
+   if (!arm_smmu_dev_has_rmr(master, rmr))
+   continue;
+
+   iommu_dma_get_rmr_resv_regions(dev, rmr, head);
+   }
+}
+


TBH I wouldn't have thought we need a driver-specific hook for this, or 
is it too painful to correlate fwspec->iommu_fwnode back to the relevant 
IORT node generically?


Robin.


  static void arm_smmu_get_resv_regions(struct device *dev,
  struct list_head *head)
  {
@@ -2545,6 +2573,7 @@ static void arm_smmu_get_resv_regions(struct device *dev,
list_add_tail(>list, head);
  
  	iommu_dma_get_resv_regions(dev, head);

+   arm_smmu_rmr_get_resv_regions(dev, head);
  }
  
  static bool arm_smmu_dev_has_feature(struct device *dev,




Re: [PATCH v3 09/10] iommu/arm-smmu: Get associated RMR info and install bypass SMR

2021-05-07 Thread Robin Murphy



On 2021-05-06 16:17, Steven Price wrote:

On 20/04/2021 09:27, Shameer Kolothum wrote:

From: Jon Nettleton 

Check if there is any RMR info associated with the devices behind
the SMMU and, if so, install bypass SMRs for them. This keeps any
ongoing traffic associated with these devices alive when we
enable/reset the SMMU during probe().

Signed-off-by: Jon Nettleton 
Signed-off-by: Shameer Kolothum 
---
  drivers/iommu/arm/arm-smmu/arm-smmu.c | 42 +++
  drivers/iommu/arm/arm-smmu/arm-smmu.h |  2 ++
  2 files changed, 44 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c

index d8c6bfde6a61..4d2f91626d87 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2102,6 +2102,43 @@ err_reset_platform_ops: __maybe_unused;
  return err;
  }
+static void arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device 
*smmu)

+{
+    struct iommu_rmr *e;
+    int i, cnt = 0;
+    u32 smr;
+
+    for (i = 0; i < smmu->num_mapping_groups; i++) {
+    smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
+    if (!FIELD_GET(ARM_SMMU_SMR_VALID, smr))
+    continue;
+
+    list_for_each_entry(e, >rmr_list, list) {
+    if (FIELD_GET(ARM_SMMU_SMR_ID, smr) != e->sid)
+    continue;
+
+    smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
+    smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
+    smmu->smrs[i].valid = true;
+
+    smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
+    smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
+    smmu->s2crs[i].cbndx = 0xff;
+
+    cnt++;
+    }
+    }


If I understand this correctly - this is looking at the current
(hardware) configuration of the SMMU and attempting to preserve any
bypass SMRs. However from what I can tell it suffers from the following
two problems:

  (a) Only the ID of the SMR is being checked, not the MASK. So if the
firmware has setup an SMR matching a number of streams this will break.

  (b) The SMMU might not be enabled at all (CLIENTPD==1) or bypass
enabled for unmatched streams (USFCFG==0).


Yes, trying to infer anything from the current SMMU hardware state is 
bogus - consider what you might find left over after a kexec, for 
instance. The *only* way to detect the presence and applicability of 
RMRs is to look at the actual RMR nodes in the IORT.


Ignore what we let the Qualcomm ACPI bootloader hack do - that whole 
implementation is "special".


Robin.


Certainly in my test setup case (b) applies and so this doesn't work.
Perhaps something like the below would work better? (It works in the
case of the SMMU not enabled - I've not tested case (a)).

Steve

8<
static void arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device *smmu)
{
 struct iommu_rmr *e;
 int i, cnt = 0;
 u32 smr;
 u32 reg;

 reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sCR0);

 if ((reg & ARM_SMMU_sCR0_USFCFG) && !(reg & ARM_SMMU_sCR0_CLIENTPD)) {
     /*
  * SMMU is already enabled and disallowing bypass, so preserve
  * the existing SMRs
  */
     for (i = 0; i < smmu->num_mapping_groups; i++) {
     smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
     if (!FIELD_GET(ARM_SMMU_SMR_VALID, smr))
     continue;
     smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
     smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
     smmu->smrs[i].valid = true;
     }
 }

 list_for_each_entry(e, >rmr_list, list) {
     u32 sid = e->sid;

     i = arm_smmu_find_sme(smmu, sid, ~0);
     if (i < 0)
     continue;
     if (smmu->s2crs[i].count == 0) {
     smmu->smrs[i].id = sid;
     smmu->smrs[i].mask = ~0;
     smmu->smrs[i].valid = true;
     }
     smmu->s2crs[i].count++;
     smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
     smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
     smmu->s2crs[i].cbndx = 0xff;

     cnt++;
 }

 if ((reg & ARM_SMMU_sCR0_USFCFG) && !(reg & ARM_SMMU_sCR0_CLIENTPD)) {
     /* Remove the valid bit for unused SMRs */
     for (i = 0; i < smmu->num_mapping_groups; i++) {
     if (smmu->s2crs[i].count == 0)
     smmu->smrs[i].valid = false;
     }
 }

 dev_notice(smmu->dev, "\tpreserved %d boot mapping%s\n", cnt,
    cnt == 1 ? "" : "s");
}


[PATCH v4 6/6] iommu: rockchip: Add support iommu v2

2021-05-07 Thread Benjamin Gaignard
From: Simon Xue 

The RK356x SoC has a new IOMMU hardware block (version 2).
Add a compatible string to distinguish it from the first version.

Signed-off-by: Simon Xue 
[Benjamin]
- port driver from kernel 4.19 to 5.12
- change functions prototype.
- squash all fixes in this commit.
Signed-off-by: Benjamin Gaignard 
---
 drivers/iommu/rockchip-iommu.c | 422 +++--
 1 file changed, 406 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index e5d86b7177de..32dcf0aaec18 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -81,6 +82,30 @@
   */
 #define RK_IOMMU_PGSIZE_BITMAP 0x007ff000
 
+#define DT_LO_MASK 0xf000
+#define DT_HI_MASK GENMASK_ULL(39, 32)
+#define DT_SHIFT   28
+
+#define DTE_BASE_HI_MASK GENMASK(11, 4)
+
+#define PAGE_DESC_LO_MASK   0xf000
+#define PAGE_DESC_HI1_LOWER 32
+#define PAGE_DESC_HI1_UPPER 35
+#define PAGE_DESC_HI2_LOWER 36
+#define PAGE_DESC_HI2_UPPER 39
+#define PAGE_DESC_HI_MASK1  GENMASK_ULL(PAGE_DESC_HI1_UPPER, 
PAGE_DESC_HI1_LOWER)
+#define PAGE_DESC_HI_MASK2  GENMASK_ULL(PAGE_DESC_HI2_UPPER, 
PAGE_DESC_HI2_LOWER)
+
+#define DTE_HI1_LOWER 8
+#define DTE_HI1_UPPER 11
+#define DTE_HI2_LOWER 4
+#define DTE_HI2_UPPER 7
+#define DTE_HI_MASK1  GENMASK(DTE_HI1_UPPER, DTE_HI1_LOWER)
+#define DTE_HI_MASK2  GENMASK(DTE_HI2_UPPER, DTE_HI2_LOWER)
+
+#define PAGE_DESC_HI_SHIFT1 (PAGE_DESC_HI1_LOWER - DTE_HI1_LOWER)
+#define PAGE_DESC_HI_SHIFT2 (PAGE_DESC_HI2_LOWER - DTE_HI2_LOWER)
+
 struct rk_iommu_domain {
struct list_head iommus;
u32 *dt; /* page directory table */
@@ -91,6 +116,10 @@ struct rk_iommu_domain {
struct iommu_domain domain;
 };
 
+struct rockchip_iommu_data {
+   u32 version;
+};
+
 /* list of clocks required by IOMMU */
 static const char * const rk_iommu_clocks[] = {
"aclk", "iface",
@@ -108,6 +137,7 @@ struct rk_iommu {
struct list_head node; /* entry in rk_iommu_domain.iommus */
struct iommu_domain *domain; /* domain to which iommu is attached */
struct iommu_group *group;
+   u32 version;
 };
 
 struct rk_iommudata {
@@ -174,11 +204,32 @@ static struct rk_iommu_domain *to_rk_domain(struct 
iommu_domain *dom)
 #define RK_DTE_PT_ADDRESS_MASK0xf000
 #define RK_DTE_PT_VALID   BIT(0)
 
+/*
+ * In v2:
+ * 31:12 - PT address bit 31:0
+ * 11: 8 - PT address bit 35:32
+ *  7: 4 - PT address bit 39:36
+ *  3: 1 - Reserved
+ * 0 - 1 if PT @ PT address is valid
+ */
+#define RK_DTE_PT_ADDRESS_MASK_V2 0xfff0
+
 static inline phys_addr_t rk_dte_pt_address(u32 dte)
 {
return (phys_addr_t)dte & RK_DTE_PT_ADDRESS_MASK;
 }
 
+static inline phys_addr_t rk_dte_pt_address_v2(u32 dte)
+{
+   u64 dte_v2 = dte;
+
+   dte_v2 = ((dte_v2 & DTE_HI_MASK2) << PAGE_DESC_HI_SHIFT2) |
+((dte_v2 & DTE_HI_MASK1) << PAGE_DESC_HI_SHIFT1) |
+(dte_v2 & PAGE_DESC_LO_MASK);
+
+   return (phys_addr_t)dte_v2;
+}
+
 static inline bool rk_dte_is_pt_valid(u32 dte)
 {
return dte & RK_DTE_PT_VALID;
@@ -189,6 +240,15 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
return (pt_dma & RK_DTE_PT_ADDRESS_MASK) | RK_DTE_PT_VALID;
 }
 
+static inline u32 rk_mk_dte_v2(dma_addr_t pt_dma)
+{
+   pt_dma = (pt_dma & PAGE_DESC_LO_MASK) |
+((pt_dma & PAGE_DESC_HI_MASK1) >> PAGE_DESC_HI_SHIFT1) |
+(pt_dma & PAGE_DESC_HI_MASK2) >> PAGE_DESC_HI_SHIFT2;
+
+   return (pt_dma & RK_DTE_PT_ADDRESS_MASK_V2) | RK_DTE_PT_VALID;
+}
+
 /*
  * Each PTE has a Page address, some flags and a valid bit:
  * +-+---+---+-+
@@ -215,11 +275,37 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
 #define RK_PTE_PAGE_READABLE  BIT(1)
 #define RK_PTE_PAGE_VALID BIT(0)
 
+/*
+ * In v2:
+ * 31:12 - Page address bit 31:0
+ *  11:9 - Page address bit 34:32
+ *   8:4 - Page address bit 39:35
+ * 3 - Security
+ * 2 - Readable
+ * 1 - Writable
+ * 0 - 1 if Page @ Page address is valid
+ */
+#define RK_PTE_PAGE_ADDRESS_MASK_V2  0xfff0
+#define RK_PTE_PAGE_FLAGS_MASK_V20x000e
+#define RK_PTE_PAGE_READABLE_V2  BIT(2)
+#define RK_PTE_PAGE_WRITABLE_V2  BIT(1)
+
 static inline phys_addr_t rk_pte_page_address(u32 pte)
 {
return (phys_addr_t)pte & RK_PTE_PAGE_ADDRESS_MASK;
 }
 
+static inline phys_addr_t rk_pte_page_address_v2(u32 pte)
+{
+   u64 pte_v2 = pte;
+
+   pte_v2 = ((pte_v2 & DTE_HI_MASK2) << PAGE_DESC_HI_SHIFT2) |
+((pte_v2 & DTE_HI_MASK1) << PAGE_DESC_HI_SHIFT1) |
+(pte_v2 & PAGE_DESC_LO_MASK);
+
+   return (phys_addr_t)pte_v2;
+}
+
 static inline bool rk_pte_is_page_valid(u32 pte)
 {
return pte & RK_PTE_PAGE_VALID;
@@ -229,12 +315,26 @@ static inline bool rk_pte_is_page_valid(u32 pte)
 static u32 rk_mk_pte(phys_addr_t page, int prot)

[PATCH v4 3/6] ARM: dts: rockchip: rk322x: Fix IOMMU nodes properties

2021-05-07 Thread Benjamin Gaignard
Add '#' to the iommu-cells properties.
Remove the useless interrupt-names properties.

Signed-off-by: Benjamin Gaignard 
---
version 4:
 - Remove interrupt-names properties from IOMMU nodes

 arch/arm/boot/dts/rk322x.dtsi | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/arm/boot/dts/rk322x.dtsi b/arch/arm/boot/dts/rk322x.dtsi
index a4dd50aaf3fc..7b4143ddc95c 100644
--- a/arch/arm/boot/dts/rk322x.dtsi
+++ b/arch/arm/boot/dts/rk322x.dtsi
@@ -561,10 +561,9 @@ vpu_mmu: iommu@20020800 {
compatible = "rockchip,iommu";
reg = <0x20020800 0x100>;
interrupts = ;
-   interrupt-names = "vpu_mmu";
clocks = < ACLK_VPU>, < HCLK_VPU>;
clock-names = "aclk", "iface";
-   iommu-cells = <0>;
+   #iommu-cells = <0>;
status = "disabled";
};
 
@@ -572,10 +571,9 @@ vdec_mmu: iommu@20030480 {
compatible = "rockchip,iommu";
reg = <0x20030480 0x40>, <0x200304c0 0x40>;
interrupts = ;
-   interrupt-names = "vdec_mmu";
clocks = < ACLK_RKVDEC>, < HCLK_RKVDEC>;
clock-names = "aclk", "iface";
-   iommu-cells = <0>;
+   #iommu-cells = <0>;
status = "disabled";
};
 
@@ -605,7 +603,6 @@ vop_mmu: iommu@20053f00 {
compatible = "rockchip,iommu";
reg = <0x20053f00 0x100>;
interrupts = ;
-   interrupt-names = "vop_mmu";
clocks = < ACLK_VOP>, < HCLK_VOP>;
clock-names = "aclk", "iface";
#iommu-cells = <0>;
@@ -626,10 +623,9 @@ iep_mmu: iommu@20070800 {
compatible = "rockchip,iommu";
reg = <0x20070800 0x100>;
interrupts = ;
-   interrupt-names = "iep_mmu";
clocks = < ACLK_IEP>, < HCLK_IEP>;
clock-names = "aclk", "iface";
-   iommu-cells = <0>;
+   #iommu-cells = <0>;
status = "disabled";
};
 
-- 
2.25.1



[PATCH v4 5/6] ARM64: dts: rockchip: rk3036: Remove useless interrupt-names properties

2021-05-07 Thread Benjamin Gaignard
Remove useless interrupt-names properties for IOMMU nodes

Signed-off-by: Benjamin Gaignard 
---
 arch/arm64/boot/dts/rockchip/px30.dtsi | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/boot/dts/rockchip/px30.dtsi 
b/arch/arm64/boot/dts/rockchip/px30.dtsi
index c45b0cfcae09..47d0c29fd8d0 100644
--- a/arch/arm64/boot/dts/rockchip/px30.dtsi
+++ b/arch/arm64/boot/dts/rockchip/px30.dtsi
@@ -1068,7 +1068,6 @@ vopb_mmu: iommu@ff460f00 {
compatible = "rockchip,iommu";
reg = <0x0 0xff460f00 0x0 0x100>;
interrupts = ;
-   interrupt-names = "vopb_mmu";
clocks = < ACLK_VOPB>, < HCLK_VOPB>;
clock-names = "aclk", "iface";
power-domains = < PX30_PD_VO>;
@@ -1109,7 +1108,6 @@ vopl_mmu: iommu@ff470f00 {
compatible = "rockchip,iommu";
reg = <0x0 0xff470f00 0x0 0x100>;
interrupts = ;
-   interrupt-names = "vopl_mmu";
clocks = < ACLK_VOPL>, < HCLK_VOPL>;
clock-names = "aclk", "iface";
power-domains = < PX30_PD_VO>;
-- 
2.25.1



[PATCH v4 4/6] ARM: dts: rockchip: rk3036: Remove useless interrupt-names on IOMMU node

2021-05-07 Thread Benjamin Gaignard
Remove useless interrupt-names property for IOMMU node

Signed-off-by: Benjamin Gaignard 
---
 arch/arm/boot/dts/rk3036.dtsi | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm/boot/dts/rk3036.dtsi b/arch/arm/boot/dts/rk3036.dtsi
index 47a787a12e55..16753dc50bf3 100644
--- a/arch/arm/boot/dts/rk3036.dtsi
+++ b/arch/arm/boot/dts/rk3036.dtsi
@@ -140,7 +140,6 @@ vop_mmu: iommu@10118300 {
compatible = "rockchip,iommu";
reg = <0x10118300 0x100>;
interrupts = ;
-   interrupt-names = "vop_mmu";
clocks = < ACLK_LCDC>, < HCLK_LCDC>;
clock-names = "aclk", "iface";
#iommu-cells = <0>;
-- 
2.25.1



[PATCH v4 1/6] dt-bindings: iommu: rockchip: Convert IOMMU to DT schema

2021-05-07 Thread Benjamin Gaignard
Convert Rockchip IOMMU to DT schema

Signed-off-by: Benjamin Gaignard 
---
version 4:
 - Add descriptions for reg items
 - Add description for interrupts items
 - Remove useless interrupt-names properties

version 2:
 - Change maintainer
 - Change reg maxItems
 - Change interrupt maxItems
 .../bindings/iommu/rockchip,iommu.txt | 38 -
 .../bindings/iommu/rockchip,iommu.yaml| 80 +++
 2 files changed, 80 insertions(+), 38 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
 create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml

diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
deleted file mode 100644
index 6ecefea1c6f9..
--- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-Rockchip IOMMU
-==
-
-A Rockchip DRM iommu translates io virtual addresses to physical addresses for
-its master device.  Each slave device is bound to a single master device, and
-shares its clocks, power domain and irq.
-
-Required properties:
-- compatible  : Should be "rockchip,iommu"
-- reg : Address space for the configuration registers
-- interrupts  : Interrupt specifier for the IOMMU instance
-- interrupt-names : Interrupt name for the IOMMU instance
-- #iommu-cells: Should be <0>.  This indicates the iommu is a
-"single-master" device, and needs no additional information
-to associate with its master device.  See:
-Documentation/devicetree/bindings/iommu/iommu.txt
-- clocks  : A list of clocks required for the IOMMU to be accessible by
-the host CPU.
-- clock-names : Should contain the following:
-   "iface" - Main peripheral bus clock (PCLK/HCL) (required)
-   "aclk"  - AXI bus clock (required)
-
-Optional properties:
-- rockchip,disable-mmu-reset : Don't use the mmu reset operation.
-  Some mmu instances may produce unexpected results
-  when the reset operation is used.
-
-Example:
-
-   vopl_mmu: iommu@ff940300 {
-   compatible = "rockchip,iommu";
-   reg = <0xff940300 0x100>;
-   interrupts = ;
-   interrupt-names = "vopl_mmu";
-   clocks = < ACLK_VOP1>, < HCLK_VOP1>;
-   clock-names = "aclk", "iface";
-   #iommu-cells = <0>;
-   };
diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
new file mode 100644
index ..099fc2578b54
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
@@ -0,0 +1,80 @@
+# SPDX-License-Identifier: GPL-2.0-only
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/rockchip,iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Rockchip IOMMU
+
+maintainers:
+  - Heiko Stuebner 
+
+description: |+
+  A Rockchip DRM iommu translates io virtual addresses to physical addresses for
+  its master device. Each slave device is bound to a single master device and
+  shares its clocks, power domain and irq.
+
+  For information on assigning IOMMU controller to its peripheral devices,
+  see generic IOMMU bindings.
+
+properties:
+  compatible:
+    const: rockchip,iommu
+
+  reg:
+    items:
+      - description: configuration registers for MMU instance 0
+      - description: configuration registers for MMU instance 1
+    minItems: 1
+    maxItems: 2
+
+  interrupts:
+    items:
+      - description: interruption for MMU instance 0
+      - description: interruption for MMU instance 1
+    minItems: 1
+    maxItems: 2
+
+  clocks:
+    items:
+      - description: Core clock
+      - description: Interface clock
+
+  clock-names:
+    items:
+      - const: aclk
+      - const: iface
+
+  "#iommu-cells":
+    const: 0
+
+  rockchip,disable-mmu-reset:
+    $ref: /schemas/types.yaml#/definitions/flag
+    description: |
+      Do not use the mmu reset operation.
+      Some mmu instances may produce unexpected results
+      when the reset operation is used.
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - clock-names
+  - "#iommu-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+    #include 
+    #include 
+
+    vopl_mmu: iommu@ff940300 {
+      compatible = "rockchip,iommu";
+      reg = <0xff940300 0x100>;
+      interrupts = ;
+      clocks = < ACLK_VOP1>, < HCLK_VOP1>;
+      clock-names = "aclk", "iface";
+      #iommu-cells = <0>;
+    };
-- 
2.25.1



[PATCH v4 2/6] dt-bindings: iommu: rockchip: Add compatible for v2

2021-05-07 Thread Benjamin Gaignard
Add compatible for the second version of the IOMMU hardware block.
The RK356x IOMMU can also be linked to a power domain.

Signed-off-by: Benjamin Gaignard 
Reviewed-by: Rob Herring 
---
version 3:
 - Rename compatible with SoC name

version 2:
 - Add power-domains property

 .../devicetree/bindings/iommu/rockchip,iommu.yaml  | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
index 099fc2578b54..d2e28a9e3545 100644
--- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
@@ -19,7 +19,9 @@ description: |+
 
 properties:
   compatible:
-    const: rockchip,iommu
+    enum:
+      - rockchip,iommu
+      - rockchip,rk3568-iommu
 
   reg:
 items:
@@ -48,6 +50,9 @@ properties:
   "#iommu-cells":
 const: 0
 
+  power-domains:
+    maxItems: 1
+
   rockchip,disable-mmu-reset:
 $ref: /schemas/types.yaml#/definitions/flag
 description: |
-- 
2.25.1



[PATCH v4 0/6] Add IOMMU driver for rk356x

2021-05-07 Thread Benjamin Gaignard
This series adds the IOMMU driver for the rk356x SoC.
Since a new compatible is needed to distinguish this second version of
the IOMMU hardware block from the first one, it is an opportunity to
convert the binding to DT schema.

version 4:
 - Add description for reg items
 - Remove useless interrupt-names properties
 - Add description for interrupts items
 - Remove interrupt-names properties from DTS files

version 3:
 - Rename compatible with soc prefix
 - Rebase on v5.12 tag

version 2:
 - Fix iommu-cells typo in rk322x.dtsi
 - Change maintainer
 - Change reg maxItems
 - Add power-domains property
 
Benjamin Gaignard (5):
  dt-bindings: iommu: rockchip: Convert IOMMU to DT schema
  dt-bindings: iommu: rockchip: Add compatible for v2
  ARM: dts: rockchip: rk322x: Fix IOMMU nodes properties
  ARM: dts: rockchip: rk3036: Remove useless interrupt-names on IOMMU
node
  ARM64: dts: rockchip: rk3036: Remove useless interrupt-names
properties

Simon Xue (1):
  iommu: rockchip: Add support iommu v2

 .../bindings/iommu/rockchip,iommu.txt |  38 --
 .../bindings/iommu/rockchip,iommu.yaml|  85 
 arch/arm/boot/dts/rk3036.dtsi |   1 -
 arch/arm/boot/dts/rk322x.dtsi |  10 +-
 arch/arm64/boot/dts/rockchip/px30.dtsi|   2 -
 drivers/iommu/rockchip-iommu.c| 422 +-
 6 files changed, 494 insertions(+), 64 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
 create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml

-- 
2.25.1



RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-07 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, May 5, 2021 1:13 AM
> 
> On Wed, Apr 28, 2021 at 06:58:19AM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Wednesday, April 28, 2021 1:12 AM
> > >
> > [...]
> > > > As Alex says, if this line fails because of the group restrictions,
> > > > that's not great because it's not very obvious what's gone wrong.
> > >
> > > Okay, that is fair, but let's solve that problem directly. For
> > > instance netlink has been going in the direction of adding a "extack"
> > > from the kernel which is a descriptive error string. If the failing
> > > ioctl returned the string:
> > >
> > >   "cannot join this device to the IOASID because device XXX in the
> > >same group #10 is in use"
> > >
> > > Would you agree it is now obvious what has gone wrong? In fact would
> > > you agree this is a lot better user experience than what applications
> > > do today even though they have the group FD?
> > >
> >
> > Currently all the discussions are around implicit vs. explicit uAPI 
> > semantics
> > on the group restriction. However if we look beyond group the implicit
> > semantics might be inevitable when dealing with incompatible iommu
> > domains. An existing example of iommu incompatibility is IOMMU_CACHE.
> 
> I still think we need to get rid of these incompatibilities
> somehow. Having multiple HW-incompatible IOASIDs in the same platform
> is just bad all around.
> 
> When modeling in userspace IOMMU_CACHE sounds like it is a property of
> each individual IOASID, not an attribute that requires a new domain.

Sure. The iommu domain is a kernel-internal concept; userspace should
focus everything on the IOASID.

> 
> People that want to create cache bypass IOASIDs should just ask for
> that directly.
> 

Yes, in an earlier discussion we agreed on a scheme where the ioasid
module returns an error to userspace when it detects an incompatibility
while binding a device to an ioasid; userspace should then create a new
IOASID for this device. This has to be done 'explicitly'.
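
To make that explicit flow concrete, a rough userspace sketch could look
like below (the ioctl names, numbers and the struct are hypothetical
placeholders for illustration only, not the proposed uAPI; error handling
is simplified and a real flow would check for a dedicated "incompatible"
errno):

    #include <errno.h>
    #include <linux/ioctl.h>
    #include <linux/types.h>
    #include <sys/ioctl.h>

    /* hypothetical, for illustration only */
    struct ioasid_bind_dev {
            __u32 ioasid;
            __s32 device_fd;
    };
    #define IOASID_ALLOC    _IO('x', 0)
    #define IOASID_BIND_DEV _IOW('x', 1, struct ioasid_bind_dev)

    static int bind_or_alloc(int ioasid_fd, int device_fd, __u32 *ioasid)
    {
            struct ioasid_bind_dev args = {
                    .ioasid = *ioasid,
                    .device_fd = device_fd,
            };
            int id;

            if (!ioctl(ioasid_fd, IOASID_BIND_DEV, &args))
                    return 0;       /* compatible: the IOASID is shared */

            /* kernel reported incompatibility: create a new IOASID */
            id = ioctl(ioasid_fd, IOASID_ALLOC);
            if (id < 0)
                    return -errno;

            *ioasid = id;
            args.ioasid = id;
            return ioctl(ioasid_fd, IOASID_BIND_DEV, &args) ? -errno : 0;
    }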

When I used it as an example of 'implicit semantics', the point was that
the kernel won't create another group-like object to contain devices with
compatible attributes and 'explicitly' manage it in the uAPI the way
group_fd is managed. If we rely on userspace anyway to have more
intelligence about those hardware restrictions, there is little sense in
explicitly handling only group_fd in the uAPI.

Thanks
Kevin


RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-05-07 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Wednesday, April 28, 2021 11:06 PM
> 
> On Wed, 28 Apr 2021 06:34:11 +
> "Tian, Kevin"  wrote:
> 
> > > From: Jason Gunthorpe 
> > > Sent: Monday, April 26, 2021 8:38 PM
> > >
> > [...]
> > > > Want to hear your opinion for one open here. There is no doubt that
> > > > an ioasid represents a HW page table when the table is constructed by
> > > > userspace and then linked to the IOMMU through the bind/unbind
> > > > API. But I'm not very sure about whether an ioasid should represent
> > > > the exact pgtable or the mapping metadata when the underlying
> > > > pgtable is indirectly constructed through map/unmap API. VFIO does
> > > > the latter way, which is why it allows multiple incompatible domains
> > > > in a single container which all share the same mapping metadata.
> > >
> > > I think VFIO's map/unmap is way too complex and we know it has bad
> > > performance problems.
> >
> > Can you or Alex elaborate on where the complexity and performance
> > problems are in VFIO map/unmap? We'd like to understand more detail
> > and see how to avoid them in the new interface.
> 
> 
> The map/unmap interface is really only good for long-lived mappings;
> the overhead is too high for things like vIOMMU use cases or any case
> where the mapping is intended to be dynamic.  Userspace drivers must
> make use of a long-lived buffer mapping in order to achieve performance.

This is not a limitation of VFIO map/unmap specifically. It's a
limitation of any map/unmap semantics, since whether a mapping is
long-lived or short-lived is imposed by userspace. Nested translation is
the only viable optimization that allows the 2nd-level to be a long-lived
mapping even with a vIOMMU. From this angle I'm not sure how a new
map/unmap implementation alone could address this performance limitation.

> 
> The mapping and unmapping granularity has been a problem as well,
> type1v1 allowed arbitrary unmaps to bisect the original mapping, with
> the massive caveat that the caller relies on the return value of the
> unmap to determine what was actually unmapped because the IOMMU use of
> superpages is transparent to the caller.  This led to type1v2 that
> simply restricts the user to avoid ever bisecting mappings.  That still
> leaves us with problems for things like virtio-mem support where we
> need to create initial mappings with a granularity that allows us to
> later remove entries, which can prevent effective use of IOMMU
> superpages.

We could start with semantics similar to type1v2.
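
For reference, the bisection restriction Alex describes can be sketched
with today's VFIO type1 ioctls (illustration only, untested; assumes buf
is a page-aligned 2MB buffer already registered with the container):

    #include <linux/vfio.h>
    #include <stdint.h>
    #include <sys/ioctl.h>

    static int demo(int container_fd, void *buf)
    {
            struct vfio_iommu_type1_dma_map map = {
                    .argsz = sizeof(map),
                    .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                    .vaddr = (uintptr_t)buf,
                    .iova  = 0x100000,
                    .size  = 2 * 1024 * 1024,       /* one 2MB mapping */
            };
            struct vfio_iommu_type1_dma_unmap unmap = {
                    .argsz = sizeof(unmap),
                    .iova  = 0x100000,
                    .size  = 1 * 1024 * 1024,       /* half of that mapping */
            };

            if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map))
                    return -1;

            /*
             * With type1v2 this unmap is rejected because it would bisect
             * the 2MB mapping; the caller must unmap the same ranges it
             * originally mapped.
             */
            return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
    }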

btw why does virtio-mem require a smaller granularity? Can we split
superpages on the fly when removal actually happens (similar to page
splitting in VM live migration for efficient dirty page tracking)?

and isn't it another problem imposed by userspace? How could a new
map/unmap implementation mitigate this problem if userspace insists on
a smaller granularity for the initial mappings?

> 
> Locked page accounting has been another constant issue.  We perform
> locked page accounting at the container level, where each container
> accounts independently.  A user may require multiple containers, the
> containers may pin the same physical memory, but be accounted against
> the user once per container.

for /dev/ioasid there is still an open question whether a process is
allowed to open /dev/ioasid once or multiple times. If there is only one
ioasid_fd per process, the accounting can be made accurate. Otherwise the
same problem still exists, as each ioasid_fd is akin to the container, and
we need to find a better solution.
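
As a rough illustration of what could make it accurate (hypothetical
sketch only, nothing like this exists today): the ioasid_fd could charge
pinned pages against the opener's mm rather than against the fd, e.g.:

    #include <linux/mm.h>
    #include <linux/sched/mm.h>

    /* hypothetical per-fd context */
    struct ioasid_fd_ctx {
            struct mm_struct *mm;   /* opener's mm, grabbed at open() */
    };

    static int ioasid_charge_pinned(struct ioasid_fd_ctx *ctx,
                                    unsigned long npages)
    {
            /*
             * Charge the process' mm so that multiple ioasid_fds of the
             * same process hit one counter/limit.  This alone still
             * double-counts a page pinned through two fds; fully accurate
             * accounting also needs the pin tracking itself to be shared
             * across fds.
             */
            return account_locked_vm(ctx->mm, npages, true);
    }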

> 
> Those are the main ones I can think of.  It is nice to have a simple
> map/unmap interface, I'd hope that a new /dev/ioasid interface wouldn't
> raise the barrier to entry too high, but the user needs to have the
> ability to have more control of their mappings and locked page
> accounting should probably be offloaded somewhere.  Thanks,
> 

Based on your feedback I feel it's probably reasonable to start with
type1v2 semantics for the new interface. Locked page accounting could
also start with the same restriction VFIO has today and then be improved
incrementally, if a cleaner way turns out to be too intrusive (as long as
the uAPI is not affected).
But I didn't get the suggestion on "more control of their mappings".
Can you elaborate?

Thanks
Kevin