Re: [PATCH v2 1/2] uacce: Remove mm_exit() op

2020-04-23 Thread Zhangfei Gao



On 2020/4/23 8:53 PM, Jean-Philippe Brucker wrote:

The mm_exit() op will be removed from the SVA API. When a process dies
and its mm goes away, the IOMMU driver won't notify device drivers
anymore. Drivers should expect to handle a lot more aborted DMA. On the
upside, it does greatly simplify the queue management.

The uacce_mm struct, which tracks all queues bound to an mm, was only
used by the mm_exit() callback. Remove it.

Signed-off-by: Jean-Philippe Brucker 

Thanks Jean
Acked-by: Zhangfei Gao 


---
v1->v2: clear q->handle after unbind


Thanks

Re: iommu_iova slab eats too much memory

2020-04-23 Thread Bin
Hello? anyone there?

Bin  wrote on Thu, Apr 23, 2020 at 5:14 PM:

> Forgot to mention: I've already disabled slab merging, so this is what
> it is.
>
> Bin  wrote on Thu, Apr 23, 2020 at 5:11 PM:
>
>> [...]
>

[PATCH] dma-contiguous: fix comment for dma_release_from_contiguous

2020-04-23 Thread Peter Collingbourne via iommu
Commit 90ae409f9eb3 ("dma-direct: fix zone selection
after an unaddressable CMA allocation") changed the logic in
dma_release_from_contiguous to remove the normal pages fallback path,
but did not update the comment. Fix that.

Signed-off-by: Peter Collingbourne 
---
 kernel/dma/contiguous.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 8bc6f2d670f9..15bc5026c485 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -222,8 +222,8 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages,
  * @gfp:   Allocation flags.
  *
  * This function allocates contiguous memory buffer for specified device. It
- * first tries to use device specific contiguous memory area if available or
- * the default global one, then tries a fallback allocation of normal pages.
+ * tries to use device specific contiguous memory area if available, or the
+ * default global one.
  *
  * Note that it byapss one-page size of allocations from the global area as
  * the addresses within one page are always contiguous, so there is no need
-- 
2.26.2.303.gf8c07b1a785-goog



Re: [PATCHv2] iommu/arm-smmu: Make remove callback message more informative

2020-04-23 Thread Doug Anderson
Hi,

On Thu, Apr 23, 2020 at 2:55 AM Sai Prakash Ranjan
 wrote:
>
> Currently on reboot/shutdown, the following messages are
> displayed on the console as error messages before the
> system reboots/shuts down, as part of the remove callback.
>
> On SC7180:
>
>   arm-smmu 1500.iommu: removing device with active domains!
>   arm-smmu 504.iommu: removing device with active domains!
>
> Make this error message more informative and less scary.
>
> Reported-by: Douglas Anderson 
> Suggested-by: Robin Murphy 
> Signed-off-by: Sai Prakash Ranjan 
> ---
>  drivers/iommu/arm-smmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Douglas Anderson 


Re: iommu_iova slab eats too much memory

2020-04-23 Thread Bin
Forgot to mention: I've already disabled slab merging, so this is what
it is.

Bin  wrote on Thu, Apr 23, 2020 at 5:11 PM:

> [...]

iommu_iova slab eats too much memory

2020-04-23 Thread Bin
Hey, guys:

I'm running a batch of CoreOS boxes; the lsb-release is:

```
# cat /etc/lsb-release
DISTRIB_ID="Container Linux by CoreOS"
DISTRIB_RELEASE=2303.3.0
DISTRIB_CODENAME="Rhyolite"
DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
```

```
# uname -a
Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019
x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
```
Recently, I found my VMs constantly being killed due to OOM, and after
digging into the problem, I finally realized that the kernel is leaking
memory.

Here's my slabinfo:

 Active / Total Objects (% used): 83818306 / 84191607 (99.6%)
 Active / Total Slabs (% used)  : 1336293 / 1336293 (100.0%)
 Active / Total Caches (% used) : 152 / 217 (70.0%)
 Active / Total Size (% used)   : 5828768.08K / 5996848.72K (97.2%)
 Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME

80253888 80253888 100%    0.06K 1253967       64   5015868K iommu_iova
  489472   489123  99%    0.03K    3824      128     15296K kmalloc-32
  297444   271112  91%    0.19K    7082       42     56656K dentry
  254400   252784  99%    0.06K    3975       64     15900K anon_vma_chain
  222528    39255  17%    0.50K    6954       32    111264K kmalloc-512
  202482   201814  99%    0.19K    4821       42     38568K vm_area_struct
  200192   200192 100%    0.01K     391      512      1564K kmalloc-8
  170528   169359  99%    0.25K    5329       32     42632K filp
  158144   153508  97%    0.06K    2471       64      9884K kmalloc-64
  149914   149365  99%    0.09K    3259       46     13036K anon_vma
  146640   143123  97%    0.10K    3760       39     15040K buffer_head
  130368    32791  25%    0.09K    3104       42     12416K kmalloc-96
  129752   129752 100%    0.07K    2317       56      9268K Acpi-Operand
  105468   105106  99%    0.04K    1034      102      4136K selinux_inode_security
   73080    73080 100%    0.13K    2436       30      9744K kernfs_node_cache
   72360    70261  97%    0.59K    1340       54     42880K inode_cache
   71040    71040 100%    0.12K    2220       32      8880K eventpoll_epi
   68096    59262  87%    0.02K     266      256      1064K kmalloc-16
   53652    53652 100%    0.04K     526      102      2104K pde_opener
   50496    31654  62%    2.00K    3156       16    100992K kmalloc-2048
   46242    46242 100%    0.19K    1101       42      8808K cred_jar
   44496    43013  96%    0.66K     927       48     29664K proc_inode_cache
   44352    44352 100%    0.06K     693       64      2772K task_delay_info
   43516    43471  99%    0.69K     946       46     30272K sock_inode_cache
   37856    27626  72%    1.00K    1183       32     37856K kmalloc-1024
   36736    36736 100%    0.07K     656       56      2624K eventpoll_pwq
   34076    31282  91%    0.57K    1217       28     19472K radix_tree_node
   33660    30528  90%    1.05K    1122       30     35904K ext4_inode_cache
   32760    30959  94%    0.19K     780       42      6240K kmalloc-192
   32028    32028 100%    0.04K     314      102      1256K ext4_extent_status
   30048    30048 100%    0.25K     939       32      7512K skbuff_head_cache
   28736    28736 100%    0.06K     449       64      1796K fs_cache
   24702    24702 100%    0.69K     537       46     17184K files_cache
   23808    23808 100%    0.66K     496       48     15872K ovl_inode
   23104    22945  99%    0.12K     722       32      2888K kmalloc-128
   22724    21307  93%    0.69K     494       46     15808K shmem_inode_cache
   21472    21472 100%    0.12K     671       32      2684K seq_file
   19904    19904 100%    1.00K     622       32     19904K UNIX
   17340    17340 100%    1.06K     578       30     18496K mm_struct
   15980    15980 100%    0.02K      94      170       376K avtab_node
   14070    14070 100%    1.06K     469       30     15008K signal_cache
   13248    13248 100%    0.12K     414       32      1656K pid
   12128    11777  97%    0.25K     379       32      3032K kmalloc-256
   11008    11008 100%    0.02K      43      256       172K selinux_file_security
   10812    10812 100%    0.04K     106      102       424K Acpi-Namespace

This information shows that 'iommu_iova' is the top memory consumer.
To optimize the network performance of OpenStack virtual machines,
I enabled the VT-d feature in the BIOS and the SR-IOV feature of the Intel
82599 10G NIC. I'm assuming this is the root cause of this issue.

Is there anything I can do to fix it?

[PATCH v2 1/2] uacce: Remove mm_exit() op

2020-04-23 Thread Jean-Philippe Brucker
The mm_exit() op will be removed from the SVA API. When a process dies
and its mm goes away, the IOMMU driver won't notify device drivers
anymore. Drivers should expect to handle a lot more aborted DMA. On the
upside, it does greatly simplify the queue management.

The uacce_mm struct, which tracks all queues bound to an mm, was only
used by the mm_exit() callback. Remove it.

Signed-off-by: Jean-Philippe Brucker 
---
v1->v2: clear q->handle after unbind
---
 include/linux/uacce.h  |  34 ++--
 drivers/misc/uacce/uacce.c | 172 +
 2 files changed, 51 insertions(+), 155 deletions(-)

diff --git a/include/linux/uacce.h b/include/linux/uacce.h
index 0e215e6d0534a..454c2f6672d79 100644
--- a/include/linux/uacce.h
+++ b/include/linux/uacce.h
@@ -68,19 +68,21 @@ enum uacce_q_state {
  * @uacce: pointer to uacce
  * @priv: private pointer
  * @wait: wait queue head
- * @list: index into uacce_mm
- * @uacce_mm: the corresponding mm
+ * @list: index into uacce queues list
  * @qfrs: pointer of qfr regions
  * @state: queue state machine
+ * @pasid: pasid associated to the mm
+ * @handle: iommu_sva handle returned by iommu_sva_bind_device()
  */
 struct uacce_queue {
struct uacce_device *uacce;
void *priv;
wait_queue_head_t wait;
struct list_head list;
-   struct uacce_mm *uacce_mm;
struct uacce_qfile_region *qfrs[UACCE_MAX_REGION];
enum uacce_q_state state;
+   int pasid;
+   struct iommu_sva *handle;
 };
 
 /**
@@ -96,8 +98,8 @@ struct uacce_queue {
  * @cdev: cdev of the uacce
  * @dev: dev of the uacce
  * @priv: private pointer of the uacce
- * @mm_list: list head of uacce_mm->list
- * @mm_lock: lock for mm_list
+ * @queues: list of queues
+ * @queues_lock: lock for queues list
  * @inode: core vfs
  */
 struct uacce_device {
@@ -112,27 +114,9 @@ struct uacce_device {
struct cdev *cdev;
struct device dev;
void *priv;
-   struct list_head mm_list;
-   struct mutex mm_lock;
-   struct inode *inode;
-};
-
-/**
- * struct uacce_mm - keep track of queues bound to a process
- * @list: index into uacce_device
- * @queues: list of queues
- * @mm: the mm struct
- * @lock: protects the list of queues
- * @pasid: pasid of the uacce_mm
- * @handle: iommu_sva handle return from iommu_sva_bind_device
- */
-struct uacce_mm {
-   struct list_head list;
struct list_head queues;
-   struct mm_struct *mm;
-   struct mutex lock;
-   int pasid;
-   struct iommu_sva *handle;
+   struct mutex queues_lock;
+   struct inode *inode;
 };
 
 #if IS_ENABLED(CONFIG_UACCE)
diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index d39307f060bd6..107028e77ca37 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -90,109 +90,39 @@ static long uacce_fops_compat_ioctl(struct file *filep,
 }
 #endif
 
-static int uacce_sva_exit(struct device *dev, struct iommu_sva *handle,
- void *data)
+static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
 {
-   struct uacce_mm *uacce_mm = data;
-   struct uacce_queue *q;
-
-   /*
-* No new queue can be added concurrently because no caller can have a
-* reference to this mm. But there may be concurrent calls to
-* uacce_mm_put(), so we need the lock.
-*/
-   mutex_lock(&uacce_mm->lock);
-   list_for_each_entry(q, &uacce_mm->queues, list)
-   uacce_put_queue(q);
-   uacce_mm->mm = NULL;
-   mutex_unlock(&uacce_mm->lock);
+   int pasid;
+   struct iommu_sva *handle;
 
-   return 0;
-}
-
-static struct iommu_sva_ops uacce_sva_ops = {
-   .mm_exit = uacce_sva_exit,
-};
-
-static struct uacce_mm *uacce_mm_get(struct uacce_device *uacce,
-struct uacce_queue *q,
-struct mm_struct *mm)
-{
-   struct uacce_mm *uacce_mm = NULL;
-   struct iommu_sva *handle = NULL;
-   int ret;
-
-   lockdep_assert_held(&uacce->mm_lock);
-
-   list_for_each_entry(uacce_mm, &uacce->mm_list, list) {
-   if (uacce_mm->mm == mm) {
-   mutex_lock(&uacce_mm->lock);
-   list_add(&q->list, &uacce_mm->queues);
-   mutex_unlock(&uacce_mm->lock);
-   return uacce_mm;
-   }
-   }
-
-   uacce_mm = kzalloc(sizeof(*uacce_mm), GFP_KERNEL);
-   if (!uacce_mm)
-   return NULL;
+   if (!(uacce->flags & UACCE_DEV_SVA))
+   return 0;
 
-   if (uacce->flags & UACCE_DEV_SVA) {
-   /*
-* Safe to pass an incomplete uacce_mm, since mm_exit cannot
-* fire while we hold a reference to the mm.
-*/
-   handle = iommu_sva_bind_device(uacce->parent, mm, uacce_mm);
-   if (IS_ERR(handle))
-   goto err_free;
+   handle = 
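
[The remainder of this hunk is cut off in the digest. As an illustration only, here is a minimal sketch of the per-queue bind/unbind flow implied by the hunks above (the new q->pasid and q->handle fields, and the v2 note about clearing q->handle after unbind). The function bodies, the use of current->mm, and the error handling are assumptions, not the applied patch; the iommu_sva_*() calls are the existing SVA API, and the declarations come from the linux/uacce.h and linux/iommu.h changes shown earlier.]

```c
/* Editor's sketch, not the applied patch. */
static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
{
        int pasid;
        struct iommu_sva *handle;

        if (!(uacce->flags & UACCE_DEV_SVA))
                return 0;

        /* Bind the calling process's mm; assumed, as in the old uacce_mm_get() path. */
        handle = iommu_sva_bind_device(uacce->parent, current->mm, NULL);
        if (IS_ERR(handle))
                return PTR_ERR(handle);

        pasid = iommu_sva_get_pasid(handle);
        if (pasid == IOMMU_PASID_INVALID) {
                iommu_sva_unbind_device(handle);
                return -EINVAL;
        }

        q->handle = handle;
        q->pasid = pasid;
        return 0;
}

static void uacce_unbind_queue(struct uacce_queue *q)
{
        if (!q->handle)
                return;
        iommu_sva_unbind_device(q->handle);
        q->handle = NULL;       /* v2 change: clear the handle after unbind */
}
```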

[PATCH v2 2/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-23 Thread Jean-Philippe Brucker
After binding a device to an mm, device drivers currently need to
register a mm_exit handler. This function is called when the mm exits,
to gracefully stop DMA targeting the address space and flush page faults
to the IOMMU.

This is deemed too complex for the MMU release() notifier, which may be
triggered by any mmput() invocation, from about 120 callsites [1]. The
upcoming SVA module has an example of such complexity: the I/O Page
Fault handler would need to call mmput_async() instead of mmput() after
handling an IOPF, to avoid triggering the release() notifier which would
in turn drain the IOPF queue and lock up.

Another concern is the DMA stop function taking too long, up to several
minutes [2]. For some mmput() callers this may disturb other users. For
example, if the OOM killer picks the mm bound to a device as the victim
and that mm's memory is locked, a slow release() might cause it to
choose additional innocent victims to kill.

To simplify the MMU release notifier, don't forward the notification to
device drivers. Since they don't stop DMA on mm exit anymore, the PASID
lifetime is extended:

(1) The device driver calls bind(). A PASID is allocated.

  Here any DMA fault is handled by mm, and on error we don't print
  anything to dmesg. Userspace can easily trigger errors by issuing DMA
  on unmapped buffers.

(2) exit_mmap(), for example the process took a SIGKILL. This step
doesn't happen during normal operations. Remove the pgd from the
PASID table, since the page tables are about to be freed. Invalidate
the IOTLBs.

  Here the device may still perform DMA on the address space. Incoming
  transactions are aborted but faults aren't printed out. ATS
  Translation Requests return Successful Translation Completions with
  R=W=0. PRI Page Requests return with Invalid Request.

(3) The device driver stops DMA, possibly following release of a fd, and
calls unbind(). PASID table is cleared, IOTLB invalidated if
necessary. The page fault queues are drained, and the PASID is
freed.

  If DMA for that PASID is still running here, something went seriously
  wrong and errors should be reported.

For now remove iommu_sva_ops entirely. We might need to re-introduce
them at some point, for example to notify device drivers of unhandled
IOPF.

[1] https://lore.kernel.org/linux-iommu/20200306174239.gm31...@ziepe.ca/
[2] https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa79...@amd.com/
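
[To make the new contract concrete, a minimal sketch of the driver-side flow under this model, keyed to steps (1)-(3) above. The iommu_sva_*() calls are the API this series keeps; struct my_dev and the my_dev_*() helpers are hypothetical placeholders for driver-specific code, not part of this patch.]

```c
/* Editor's sketch of a driver using SVA without an mm_exit() handler. */
#include <linux/err.h>
#include <linux/iommu.h>

struct my_dev {
        struct device *parent;          /* physical device behind the IOMMU */
        struct iommu_sva *sva;
        int pasid;
};

static void my_dev_stop_dma(struct my_dev *mdev)
{
        /* Driver-specific quiesce: stop queues, wait for outstanding DMA. */
}

static int my_dev_open(struct my_dev *mdev, struct mm_struct *mm)
{
        /* (1) bind: allocate a PASID for this mm. No iommu_sva_set_ops(). */
        mdev->sva = iommu_sva_bind_device(mdev->parent, mm, NULL);
        if (IS_ERR(mdev->sva))
                return PTR_ERR(mdev->sva);

        mdev->pasid = iommu_sva_get_pasid(mdev->sva);
        return 0;
}

static void my_dev_release(struct my_dev *mdev)
{
        /*
         * (3) stop DMA first (e.g. from the fd release path), then unbind.
         * If the mm already exited, step (2) only detached the page tables,
         * and any DMA issued since then was aborted without dmesg noise.
         */
        my_dev_stop_dma(mdev);
        iommu_sva_unbind_device(mdev->sva);     /* drains faults, frees the PASID */
}
```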

Signed-off-by: Jean-Philippe Brucker 
---
v1->v2: tried to clarify the lifetime of the PASID
---
 include/linux/iommu.h | 30 --
 drivers/iommu/iommu.c | 11 ---
 2 files changed, 41 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7ef8b0bda6951..bd330d6343b78 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -53,8 +53,6 @@ struct iommu_fault_event;
 
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
struct device *, unsigned long, int, void *);
-typedef int (*iommu_mm_exit_handler_t)(struct device *dev, struct iommu_sva *,
-  void *);
 typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault *, void *);
 
 struct iommu_domain_geometry {
@@ -171,25 +169,6 @@ enum iommu_dev_features {
 
 #define IOMMU_PASID_INVALID(-1U)
 
-/**
- * struct iommu_sva_ops - device driver callbacks for an SVA context
- *
- * @mm_exit: called when the mm is about to be torn down by exit_mmap. After
- *   @mm_exit returns, the device must not issue any more transaction
- *   with the PASID given as argument.
- *
- *   The @mm_exit handler is allowed to sleep. Be careful about the
- *   locks taken in @mm_exit, because they might lead to deadlocks if
- *   they are also held when dropping references to the mm. Consider the
- *   following call chain:
- *   mutex_lock(A); mmput(mm) -> exit_mm() -> @mm_exit() -> mutex_lock(A)
- *   Using mmput_async() prevents this scenario.
- *
- */
-struct iommu_sva_ops {
-   iommu_mm_exit_handler_t mm_exit;
-};
-
 #ifdef CONFIG_IOMMU_API
 
 /**
@@ -605,7 +584,6 @@ struct iommu_fwspec {
  */
 struct iommu_sva {
struct device   *dev;
-   const struct iommu_sva_ops  *ops;
 };
 
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
@@ -653,8 +631,6 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
struct mm_struct *mm,
void *drvdata);
 void iommu_sva_unbind_device(struct iommu_sva *handle);
-int iommu_sva_set_ops(struct iommu_sva *handle,
- const struct iommu_sva_ops *ops);
 int iommu_sva_get_pasid(struct iommu_sva *handle);
 
 #else /* CONFIG_IOMMU_API */
@@ -1058,12 +1034,6 @@ static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
 {
 }
 
-static inline int 

[PATCH v2 0/2] iommu: Remove iommu_sva_ops::mm_exit()

2020-04-23 Thread Jean-Philippe Brucker
The IOMMU SVA API currently requires device drivers to implement an
mm_exit() callback, which stops device jobs that do DMA. This function
is called in the release() MMU notifier, when an address space that is
shared with a device exits.

It has been noted several times during discussions about SVA that
cancelling DMA jobs can be slow and complex, and doing it in the
release() notifier might cause synchronization issues. Device drivers
must in any case call unbind() to remove their bond, after stopping DMA
from a more favorable context (release of a file descriptor).

Patch 1 removes the mm_exit() callback from the uacce module, and patch
2 removes it from the IOMMU API. Since v1 [1] I fixed the uacce unbind
issue reported by Zhangfei and added details in the commit message of patch 2.

Jean-Philippe Brucker (2):
  uacce: Remove mm_exit() op
  iommu: Remove iommu_sva_ops::mm_exit()

 include/linux/iommu.h  |  30 ---
 include/linux/uacce.h  |  34 ++--
 drivers/iommu/iommu.c  |  11 ---
 drivers/misc/uacce/uacce.c | 172 +
 4 files changed, 51 insertions(+), 196 deletions(-)

-- 
2.26.0



Re: [EXT] Re: [RFC PATCH 1/4] bus: fsl-mc: add custom .dma_configure implementation

2020-04-23 Thread Lorenzo Pieralisi
On Thu, Apr 23, 2020 at 09:56:54AM +, Makarand Pawagi wrote:

[...]

> > > At a first glance, looks like this could very well fix the ACPI
> > >   * are we going to rely on DT and ACPI generic layers even if these
> > >   * are we going to rely in DT and ACPI generic layers even if these
> > > devices are not published / enumerated through DT or ACPI tables?
> > 
> > Assuming you mean the DPRC devices rather than the MC itself, then yes; in 
> > that
> > sense it's exactly the same as how we treat dynamically-discovered PCI 
> > devices.
> > 
> > >   * the firmware manages and provides discrete streamids for the
> > > devices it exposes so there's no translation involved. There's no
> > > requestor_id / input_id involved but it seems that we would still
> > > do some kind of translation relying for this on the DT/ACPI functions.
> > 
> > Wrong - last time I looked, what that firmware actually manages are
> > *ICIDs* for the devices, not SMMU Stream IDs or GIC Device IDs; what DT/ACPI
> > specifies is a translation from ICID to Stream ID/Device ID. The ICID is 
> > very much
> > the requester/input ID for that translation. Yes, in practice the 
> > "translation" is
> > effectively always a trivial identity mapping, but conceptually it most 
> > definitely
> > exists. Yes, the subtlety is incredibly easy to overlook because it's 
> > basically
> > drawing a distinction between one end of some wires vs. the other end, but 
> > it
> > matters.
> > 
> > (and of course "trivial 1:1 translation" isn't even true in the case of SMMU
> > Stream ID values, since IIRC they are really composed of 5 different 
> > inputs, only
> > one of which is (part of) the incoming ICID)
> > 
> > >   * MC firmware has its own stream_id (e.g. on some chips 0x4000,
> > > others 0xf00, so outside the range of stream_ids used for the mc devices)
> > > while for the devices on this bus, MC allocates stream_ids from a
> > > range (e.g. 0x17 - 0x3f). Is it possible to describe this in the IORT 
> > > table?
> > 
> > If it represents a unique ICID allocated to the MC itself, then sure, it 
> > simply goes
> > through the mapping like anything else. Just like a PCI host bridge owns
> > requester ID 0:0.0 and thus whatever Stream ID/Device ID that might map to.
> > 
> > If (for the sake of argument, because AIUI everything is an ICID in this 
> > particular
> > case) it's some hard-wired thing that exists in Stream ID/Device ID space 
> > only,
> > then it's a little trickier, but still in scope. In DT we have a lovely 
> > distinction
> > between between "originating from the node" and "translated through the
> > node", e.g. "msi-parent" vs.
> > "msi-map"; IORT is not quite as clear-cut, but there are at least a few 
> > options. If
> > the valid input ID space is smaller than 32 bits, then the "Named Component 
> > as
> > bridge" binding could simply define special out-of-range values to 
> > represent IDs
> > originating from the bridge itself, such that the NC driver knows what to 
> > do and
> > from IORT's point of view everything is just a normal mapping. Alternatively
> > there's already the example of SMMUv3 where we can have a mix of the normal
> > mappings from Stream ID to Device ID for the upstream masters plus a single
> > mapping for the SMMU's own Device ID - admittedly that depends on the
> > additional SMMUv3-specific Device ID Mapping Index property, but if 
> > necessary
> > it might be workable to have a de-facto interface for NCs that only 
> > considers
> > single mappings when configuring the NC itself, and only considers normal
> > mappings when configuring its children. Or maybe define a new mapping flag 
> > or
> > NC property if there's a real need to specify such a situation 
> > unambiguously at
> > the IORT level.
> > 
> > >   * Regarding the of_map_rid() use you mentioned, I was planning to
> > > decouple the mc bus from the DT layer by dropping the use of
> > > of_map_rid(), see patch 4.
> > > I briefly glanced over the iort code and spotted this static function:
> > > iort_iommu_xlate(). Wouldn't it also help, of course after making it 
> > > public?
> > 
> > I won't speak for Lorenzo or claim we've agreed on an approach at all (not 
> > least
> > because in all honesty we haven't really discussed it beyond these various 
> > email
> > threads), but FWIW my vision is that ultimately the DT/ACPI code would 
> > expose
> > a *_dma_configure() interface that takes an optional input ID, or (perhaps 
> > more
> > likely) an explicit pair of interfaces for "configure this regular device" 
> > and
> > "configure this device based on this 'host' device and this ID", and it 
> > becomes
> > the bus code's responsibility to pass the right device(s) and deal with 
> > multiple IDs
> > (i.e. for starters all the PCI alias stuff goes back to the PCI code where 
> > it really
> > should be, rather than having multiple copies of magic PCI awareness deep
> > down in DT/ACPI code).
> > 
> > Robin.
>  
> Hi Lorenzo, Robin,


RE: [EXT] Re: [RFC PATCH 1/4] bus: fsl-mc: add custom .dma_configure implementation

2020-04-23 Thread Makarand Pawagi



> -Original Message-
> From: Robin Murphy 
> Sent: Wednesday, April 15, 2020 11:02 PM
> To: Laurentiu Tudor ; Lorenzo Pieralisi
> 
> Cc: linux-ker...@vger.kernel.org; iommu@lists.linux-foundation.org; linux-arm-
> ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> ard.biesheu...@linaro.org; Ioana Ciornei ; Diana
> Madalina Craciun (OSS) ; m...@kernel.org;
> j...@solid-run.com; Pankaj Bansal ; Makarand
> Pawagi ; Calvin Johnson
> ; Varun Sethi ; Cristi Sovaiala
> ; stuart.yo...@arm.com; jeremy.lin...@arm.com;
> j...@8bytes.org; t...@linutronix.de; ja...@lakedaemon.net
> Subject: [EXT] Re: [RFC PATCH 1/4] bus: fsl-mc: add custom .dma_configure
> implementation
> 
> Caution: EXT Email
> 
> On 2020-04-15 4:44 pm, Laurentiu Tudor wrote:
> >
> >
> > On 4/14/2020 5:32 PM, Lorenzo Pieralisi wrote:
> >> On Wed, Mar 25, 2020 at 06:48:55PM +0200, Laurentiu Tudor wrote:
> >>> Hi Lorenzo,
> >>>
> >>> On 3/25/2020 2:51 PM, Lorenzo Pieralisi wrote:
>  On Thu, Feb 27, 2020 at 12:05:39PM +0200, laurentiu.tu...@nxp.com
> wrote:
> > From: Laurentiu Tudor 
> >
> > The devices on this bus are not discovered by way of device tree
> > but by queries to the firmware. It makes little sense to trick the
> > generic of layer into thinking that these devices are of related
> > so that we can get our dma configuration. Instead of doing that,
> > add our custom dma configuration implementation.
> >
> > Signed-off-by: Laurentiu Tudor 
> > ---
> >   drivers/bus/fsl-mc/fsl-mc-bus.c | 31
> ++-
> >   1 file changed, 30 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c
> > b/drivers/bus/fsl-mc/fsl-mc-bus.c index 36eb25f82c8e..eafaa0e0b906
> > 100644
> > --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> > +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> > @@ -132,11 +132,40 @@ static int fsl_mc_bus_uevent(struct device
> *dev, struct kobj_uevent_env *env)
> >   static int fsl_mc_dma_configure(struct device *dev)
> >   {
> >   struct device *dma_dev = dev;
> > + struct iommu_fwspec *fwspec;
> > + const struct iommu_ops *iommu_ops;
> > + struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev);
> > + int ret;
> > + u32 icid;
> >
> >   while (dev_is_fsl_mc(dma_dev))
> >   dma_dev = dma_dev->parent;
> >
> > - return of_dma_configure(dev, dma_dev->of_node, 0);
> > + fwspec = dev_iommu_fwspec_get(dma_dev);
> > + if (!fwspec)
> > + return -ENODEV;
> > + iommu_ops = iommu_ops_from_fwnode(fwspec->iommu_fwnode);
> > + if (!iommu_ops)
> > + return -ENODEV;
> > +
> > + ret = iommu_fwspec_init(dev, fwspec->iommu_fwnode, iommu_ops);
> > + if (ret)
> > + return ret;
> > +
> > + icid = mc_dev->icid;
> > + ret = iommu_fwspec_add_ids(dev, &icid, 1);
> 
>  I see. So with this patch we would use the MC named component only
>  to retrieve the iommu_ops
> >>>
> >>> Right. I'd also add that the implementation tries to follow the
> >>> existing standard .dma_configure implementations, e.g.
> >>> of_dma_configure + of_iommu_configure. I'd also note that similarly
> >>> to the ACPI case, this MC FW device is probed as a platform device
> >>> in the DT scenario, binding here [1].
> >>> A similar approach is used for the retrieval of the msi irq domain,
> >>> see following patch.
> >>>
>  - the streamid are injected directly here bypassing OF/IORT bindings
> translations altogether.
> >>>
> >>> Actually I've submitted a v2 [2] that calls into .of_xlate() to
> >>> allow the smmu driver to do some processing on the raw streamid
> >>> coming from the firmware. I have not yet tested this with ACPI but
> >>> expect it to work, however, it's debatable how valid is this
> >>> approach in the context of ACPI.
> >>
> >> Actually, what I think you need is of_map_rid() (and an IORT
> >> equivalent, that I am going to write - generalizing iort_msi_map_rid()).
> >>
> >> Would that be enough to enable IORT "normal" mappings in the MC bus
> >> named components ?
> >>
> >
> > At a first glance, looks like this could very well fix the ACPI
> > scenario, but I have some unclarities on the approach:
> >   * are we going to rely on DT and ACPI generic layers even if these
> > devices are not published / enumerated through DT or ACPI tables?
> 
> Assuming you mean the DPRC devices rather than the MC itself, then yes; in 
> that
> sense it's exactly the same as how we treat dynamically-discovered PCI 
> devices.
> 
> >   * the firmware manages and provides discrete streamids for the
> > devices it exposes so there's no translation involved. There's no
> > requestor_id / input_id involved but it seems that we would still
> > do some kind of translation relying for this on the DT/ACPI functions.
> 
> Wrong - last time I looked, what that firmware actually manages are
> *ICIDs* for the devices, not SMMU 

Re: [PATCH 04/29] staging: media: ipu3: use vmap instead of reimplementing it

2020-04-23 Thread Sakari Ailus
On Tue, Apr 14, 2020 at 03:13:23PM +0200, Christoph Hellwig wrote:
> Just use vmap instead of messing with vmalloc internals.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Peter Zijlstra (Intel) 

Thanks!

Acked-by: Sakari Ailus 

-- 
Sakari Ailus


[PATCHv2] iommu/arm-smmu: Make remove callback message more informative

2020-04-23 Thread Sai Prakash Ranjan
Currently on reboot/shutdown, the following messages are
displayed on the console as error messages before the
system reboots/shuts down, as part of the remove callback.

On SC7180:

  arm-smmu 1500.iommu: removing device with active domains!
  arm-smmu 504.iommu: removing device with active domains!

Make this error message more informative and less scary.

Reported-by: Douglas Anderson 
Suggested-by: Robin Murphy 
Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/arm-smmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e622f4e33379..8ea634876e6c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2244,7 +2244,7 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
return -ENODEV;
 
if (!bitmap_empty(smmu->context_map, ARM_SMMU_MAX_CBS))
-   dev_err(&pdev->dev, "removing device with active domains!\n");
+   dev_info(&pdev->dev, "disabling translation\n");
 
arm_smmu_bus_init(NULL);
iommu_device_unregister(&smmu->iommu);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] iommu/arm-smmu: Demote error messages to debug in shutdown callback

2020-04-23 Thread Sai Prakash Ranjan

On 2020-04-23 14:58, Robin Murphy wrote:
> On 2020-04-23 9:17 am, Sai Prakash Ranjan wrote:
> [...]
>>> Any update on the status here?  If I'm reading the conversation above,
>>> Robin said: "we'll *always* see the warning because there's no way to
>>> tear down the default DMA domains, and even if all devices *have* been
>>> nicely quiesced there's no way to tell".  Did I understand that
>>> properly?  If so, it seems like it's fully expected to see this
>>> message on every reboot and it doesn't necessarily signify anything
>>> bad.
>>
>> Understanding is the same, waiting for Will and Robin to check if it's OK
>> to make the message more friendly.
>
> The way I see it, we essentially just want *something* visible that
> will correlate with any misbehaviour that *might* result from turning
> off a possibly-live context. How about simply "disabling translation",
> at dev_warn or dev_info level?

Sounds good, I'll go with "disabling translation" with dev_info.

Thanks,
Sai
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

Re: [PATCH] iommu/arm-smmu: Demote error messages to debug in shutdown callback

2020-04-23 Thread Robin Murphy

On 2020-04-23 9:17 am, Sai Prakash Ranjan wrote:
> [...]
>> Any update on the status here?  If I'm reading the conversation above,
>> Robin said: "we'll *always* see the warning because there's no way to
>> tear down the default DMA domains, and even if all devices *have* been
>> nicely quiesced there's no way to tell".  Did I understand that
>> properly?  If so, it seems like it's fully expected to see this
>> message on every reboot and it doesn't necessarily signify anything
>> bad.
>
> Understanding is the same, waiting for Will and Robin to check if it's OK
> to make the message more friendly.

The way I see it, we essentially just want *something* visible that will
correlate with any misbehaviour that *might* result from turning off a
possibly-live context. How about simply "disabling translation", at
dev_warn or dev_info level?


Robin.

Re: [PATCH] iommu/arm-smmu: Demote error messages to debug in shutdown callback

2020-04-23 Thread Sai Prakash Ranjan

Hi,

On 2020-04-23 01:19, Doug Anderson wrote:

Hi,

On Tue, Mar 31, 2020 at 12:53 AM Sai Prakash Ranjan
 wrote:


Hi Will,

On 2020-03-31 13:14, Will Deacon wrote:
> On Tue, Mar 31, 2020 at 01:06:11PM +0530, Sai Prakash Ranjan wrote:
>> On 2020-03-30 23:54, Doug Anderson wrote:
>> > On Sat, Mar 28, 2020 at 12:35 AM Sai Prakash Ranjan
>> >  wrote:
>> > >
>> > > > Of course the fact that in practice we'll *always* see the warning
>> > > > because there's no way to tear down the default DMA domains, and even
>> > > > if all devices *have* been nicely quiesced there's no way to tell, is
>> > > > certainly less than ideal. Like I say, it's not entirely clear-cut
>> > > > either way...
>> > > >
>> > >
>> > > Thanks for these examples, good to know these scenarios in case we
>> > > come
>> > > across these.
>> > > However, if we see these error/warning messages appear everytime then
>> > > what will be
>> > > the credibility of these messages? We will just ignore these messages
>> > > when
>> > > these issues you mention actually appears because we see them
>> > > everytime
>> > > on
>> > > reboot or shutdown.
>> >
>> > I would agree that if these messages are expected to be seen every
>> > time, there's no way to fix them, and they're not indicative of any
>> > problem then something should be done.  Seeing something printed at
>> > "dev_error" level with an exclamation point (!) at the end makes me
>> > feel like this is something that needs immediate action on my part.
>> >
>> > If we really can't do better but feel that the messages need to be
>> > there, at least make them dev_info and less scary like:
>> >
>> >   arm-smmu 1500.iommu: turning off; DMA should be quiesced before
>> > now
>> >
>> > ...that would still give you a hint in the logs that if you saw a DMA
>> > transaction after the message that it was a bug but also wouldn't
>> > sound scary to someone who wasn't seeing any other problems.
>> >
>>
>> We can do this if Robin is OK?
>
> It would be nice if you could figure out which domains are still live
> when
> the SMMU is being shut down in your case and verify that it *is* infact
> benign before we start making the message more friendly. As Robin said
> earlier, rogue DMA is a real nightmare to debug.
>

I could see this error message for all the clients of apps_smmu.
I checked manually enabling bypass and removing iommus dt property
for each client of apps_smmu.


Any update on the status here?  If I'm reading the conversation above,
Robin said: "we'll *always* see the warning because there's no way to
tear down the default DMA domains, and even if all devices *have* been
nicely quiesced there's no way to tell".  Did I understand that
properly?  If so, it seems like it's fully expected to see this
message on every reboot and it doesn't necessarily signify anything
bad.



Understanding is the same, waiting for Will and Robin to check if it's OK
to make the message more friendly.

-Sai
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member

of Code Aurora Forum, hosted by The Linux Foundation


RE: [PATCH v12 2/8] iommu/vt-d: Use a helper function to skip agaw for SL

2020-04-23 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Wednesday, April 22, 2020 2:53 AM
> 
> An Intel iommu domain uses a 5-level page table by default. If the
> iommu that the domain tries to attach to supports fewer page levels,
> the top-level page tables should be skipped. Add a helper to do
> this so that it can be used in other places.
> 
> Signed-off-by: Jacob Pan 
> Reviewed-by: Eric Auger 
> ---
>  drivers/iommu/intel-pasid.c | 33 +++--
>  1 file changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 22b30f10b396..d9cea3011b58 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -500,6 +500,25 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
>  }
> 
>  /*
> + * Skip top levels of page tables for iommu which has less agaw
> + * than default. Unnecessary for PT mode.
> + */
> +static inline int iommu_skip_agaw(struct dmar_domain *domain,
> +   struct intel_iommu *iommu,
> +   struct dma_pte **pgd)
> +{
> + int agaw;
> +
> + for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
> + *pgd = phys_to_virt(dma_pte_addr(*pgd));
> + if (!dma_pte_present(*pgd))
> + return -EINVAL;
> + }
> +
> + return agaw;
> +}
> +
> +/*
>   * Set up the scalable mode pasid entry for second only translation type.
>   */
>  int intel_pasid_setup_second_level(struct intel_iommu *iommu,
> @@ -522,17 +541,11 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>   return -EINVAL;
>   }
> 
> - /*
> -  * Skip top levels of page tables for iommu which has less agaw
> -  * than default. Unnecessary for PT mode.
> -  */
>   pgd = domain->pgd;
> - for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
> - pgd = phys_to_virt(dma_pte_addr(pgd));
> - if (!dma_pte_present(pgd)) {
> - dev_err(dev, "Invalid domain page table\n");
> - return -EINVAL;
> - }
> + agaw = iommu_skip_agaw(domain, iommu, &pgd);
> + if (agaw < 0) {
> + dev_err(dev, "Invalid domain page table\n");
> + return -EINVAL;
>   }
> 
>   pgd_val = virt_to_phys(pgd);
> --
> 2.7.4

Reviewed-by: Kevin Tian 