Re: [RFC PATCH V4 01/12] x86/HV: Initialize shared memory boundary in the Isolation VM.

2021-07-08 Thread Olaf Hering
On Wed, Jul 07, Tianyu Lan wrote:

> +++ b/include/asm-generic/mshyperv.h
> @@ -34,8 +34,18 @@ struct ms_hyperv_info {

>   void  __percpu **ghcb_base;

It would be cool if the cover letter states which commit id this series is 
based on.

Thanks,
Olaf


[PATCH] iommu/amd: Enable swiotlb if any device supports iommu v2 and uses identity mapping

2021-07-08 Thread Kai-Heng Feng
We are seeing a kernel panic in the rtw88 probe routine because swiotlb
isn't set up:
[  252.036773] rtw_8821ce :06:00.0: enabling device ( -> 0003)
[  252.037084] Kernel panic - not syncing: Can not allocate SWIOTLB buffer 
earlier and can't now provide you with the DMA bounce buffer
[  252.037146] CPU: 7 PID: 1174 Comm: modprobe Not tainted 5.13.0+ #39
[  252.037175] Hardware name: HP HP ProDesk 405 G6 Small Form Factor PC/8835, 
BIOS S05 Ver. 02.04.00 06/03/2021
[  252.037218] Call Trace:
[  252.037231]  dump_stack_lvl+0x4a/0x5f
[  252.037251]  dump_stack+0x10/0x12
[  252.037267]  panic+0x101/0x2e3
[  252.037284]  swiotlb_tbl_map_single.cold+0xc/0x73
[  252.037305]  ? __mod_lruvec_page_state+0x95/0xb0
[  252.037329]  ? kmalloc_large_node+0x8c/0xb0
[  252.037348]  ? __netdev_alloc_skb+0x44/0x160
[  252.037370]  swiotlb_map+0x61/0x240
[  252.037387]  ? __alloc_skb+0xed/0x1e0
[  252.037404]  dma_map_page_attrs+0x12c/0x1f0
[  252.037422]  ? __netdev_alloc_skb+0x44/0x160
[  252.037443]  rtw_pci_probe+0x30f/0x872 [rtw88_pci]
[  252.037467]  local_pci_probe+0x48/0x80
[  252.037487]  pci_device_probe+0x105/0x1c0
[  252.037506]  really_probe+0x1fe/0x3f0
[  252.037524]  __driver_probe_device+0x109/0x180
[  252.037545]  driver_probe_device+0x23/0x90
[  252.037564]  __driver_attach+0xac/0x1b0
[  252.037582]  ? __device_attach_driver+0xe0/0xe0
[  252.037602]  bus_for_each_dev+0x7e/0xc0
[  252.037620]  driver_attach+0x1e/0x20
[  252.037637]  bus_add_driver+0x135/0x1f0
[  252.037654]  driver_register+0x95/0xf0
[  252.037672]  ? 0xc0fa
[  252.037687]  __pci_register_driver+0x68/0x70
[  252.037707]  rtw_8821ce_driver_init+0x23/0x1000 [rtw88_8821ce]
[  252.037734]  do_one_initcall+0x48/0x1d0
[  252.037752]  ? __cond_resched+0x1a/0x50
[  252.037771]  ? kmem_cache_alloc_trace+0x29d/0x3c0
[  252.037792]  do_init_module+0x62/0x280
[  252.037810]  load_module+0x2577/0x27c0
[  252.037862]  __do_sys_finit_module+0xbf/0x120
[  252.037877]  __x64_sys_finit_module+0x1a/0x20
[  252.037893]  do_syscall_64+0x3b/0xc0
[  252.037907]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  252.037925] RIP: 0033:0x7ff5a2f9408d
[  252.037938] Code: 27 0d 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d ab dd 0c 00 f7 d8 64 89 01 48
[  252.037993] RSP: 002b:7fffaa89dce8 EFLAGS: 0246 ORIG_RAX: 
0139
[  252.038017] RAX: ffda RBX: 55fd4f881080 RCX: 7ff5a2f9408d
[  252.038039] RDX:  RSI: 55fd4f63ec02 RDI: 0009
[  252.038063] RBP: 0004 R08:  R09: 55fd4f8885b0
[  252.038085] R10: 0009 R11: 0246 R12: 55fd4f63ec02
[  252.038107] R13: 55fd4f881120 R14:  R15: 55fd4f88e350
[  252.038293] Kernel Offset: 0x3060 from 0x8100 (relocation 
range: 0x8000-0xbfff)

Because the Realtek WiFi (PCI 06:00.0) is in the same IOMMU group as AMD
graphics (PCI 01:00.0),
[1.326166] pci :01:00.0: Adding to iommu group 0
...
[1.326268] pci :06:00.0: Adding to iommu group 0

And the AMD graphics supports iommu v2, so the group uses identity
mapping based on the query from amd_iommu_def_domain_type().

However, the Realtek WiFi doesn't support 64-bit DMA, so we need to
enable swiotlb, which was disabled by amd_iommu_init_dma_ops(), to
bounce its DMA buffers.

Cc: Robin Murphy 
Signed-off-by: Kai-Heng Feng 
---
 drivers/iommu/amd/iommu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..7c5111ed5c97 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -344,6 +344,9 @@ static int iommu_init_device(struct device *dev)
 
iommu = amd_iommu_rlookup_table[dev_data->devid];
dev_data->iommu_v2 = iommu->is_iommu_v2;
+
+   if (dev_data->iommu_v2)
+   swiotlb = 1;
}
 
dev_iommu_priv_set(dev, dev_data);
-- 
2.31.1



Re: [PATCH v2 0/3] iommu: Enable non-strict DMA on QCom SD/MMC

2021-07-08 Thread Joerg Roedel
On Wed, Jul 07, 2021 at 01:00:13PM -0700, Doug Anderson wrote:
> a) Nothing is inherently broken with my current approach.
> 
> b) My current approach doesn't make anybody terribly upset even if
> nobody is totally in love with it.

Well, no, sorry :)

I don't think it is a good idea to allow drivers to opt-out of the
strict-setting. This is a platform or user decision, and the driver
should accept whatever it gets.

So the real question is still why strict is the default setting and how
to change that. Or document for the users that want performance how to
change the setting, so that they can decide.

Regards,

Joerg



Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE

2021-07-08 Thread Stefan Hajnoczi
On Thu, Jul 08, 2021 at 12:17:56PM +0800, Jason Wang wrote:
> 
> 在 2021/7/7 下午11:54, Stefan Hajnoczi 写道:
> > On Wed, Jul 07, 2021 at 05:24:08PM +0800, Jason Wang wrote:
> > > 在 2021/7/7 下午4:55, Stefan Hajnoczi 写道:
> > > > On Wed, Jul 07, 2021 at 11:43:28AM +0800, Jason Wang wrote:
> > > > > 在 2021/7/7 上午1:11, Stefan Hajnoczi 写道:
> > > > > > On Tue, Jul 06, 2021 at 09:08:26PM +0800, Jason Wang wrote:
> > > > > > > On Tue, Jul 6, 2021 at 6:15 PM Stefan Hajnoczi 
> > > > > > >  wrote:
> > > > > > > > On Tue, Jul 06, 2021 at 10:34:33AM +0800, Jason Wang wrote:
> > > > > > > > > 在 2021/7/5 下午8:49, Stefan Hajnoczi 写道:
> > > > > > > > > > On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote:
> > > > > > > > > > > 在 2021/7/4 下午5:49, Yongji Xie 写道:
> > > > > > > > > > > > > > OK, I get you now. Since the VIRTIO specification 
> > > > > > > > > > > > > > says "Device
> > > > > > > > > > > > > > configuration space is generally used for 
> > > > > > > > > > > > > > rarely-changing or
> > > > > > > > > > > > > > initialization-time parameters". I assume the 
> > > > > > > > > > > > > > VDUSE_DEV_SET_CONFIG
> > > > > > > > > > > > > > ioctl should not be called frequently.
> > > > > > > > > > > > > The spec uses MUST and other terms to define the 
> > > > > > > > > > > > > precise requirements.
> > > > > > > > > > > > > Here the language (especially the word "generally") 
> > > > > > > > > > > > > is weaker and means
> > > > > > > > > > > > > there may be exceptions.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Another type of access that doesn't work with the 
> > > > > > > > > > > > > VDUSE_DEV_SET_CONFIG
> > > > > > > > > > > > > approach is reads that have side-effects. For 
> > > > > > > > > > > > > example, imagine a field
> > > > > > > > > > > > > containing an error code if the device encounters a 
> > > > > > > > > > > > > problem unrelated to
> > > > > > > > > > > > > a specific virtqueue request. Reading from this field 
> > > > > > > > > > > > > resets the error
> > > > > > > > > > > > > code to 0, saving the driver an extra configuration 
> > > > > > > > > > > > > space write access
> > > > > > > > > > > > > and possibly race conditions. It isn't possible to 
> > > > > > > > > > > > > implement those
> > > > > > > > > > > > > semantics using VDUSE_DEV_SET_CONFIG. It's another 
> > > > > > > > > > > > > corner case, but it
> > > > > > > > > > > > > makes me think that the interface does not allow full 
> > > > > > > > > > > > > VIRTIO semantics.
> > > > > > > > > > > Note that though you're correct, my understanding is that 
> > > > > > > > > > > config space is
> > > > > > > > > > > not suitable for this kind of error propagating. And it 
> > > > > > > > > > > would be very hard
> > > > > > > > > > > to implement such kind of semantic in some transports.  
> > > > > > > > > > > Virtqueue should be
> > > > > > > > > > > much better. As Yong Ji quoted, the config space is used 
> > > > > > > > > > > for
> > > > > > > > > > > "rarely-changing or initialization-time parameters".
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > > Agreed. I will use VDUSE_DEV_GET_CONFIG in the next 
> > > > > > > > > > > > version. And to
> > > > > > > > > > > > handle the message failure, I'm going to add a return 
> > > > > > > > > > > > value to
> > > > > > > > > > > > virtio_config_ops.get() and virtio_cread_* API so that 
> > > > > > > > > > > > the error can
> > > > > > > > > > > > be propagated to the virtio device driver. Then the 
> > > > > > > > > > > > virtio-blk device
> > > > > > > > > > > > driver can be modified to handle that.
> > > > > > > > > > > > 
> > > > > > > > > > > > Jason and Stefan, what do you think of this way?
> > > > > > > > > > Why does VDUSE_DEV_GET_CONFIG need to support an error 
> > > > > > > > > > return value?
> > > > > > > > > > 
> > > > > > > > > > The VIRTIO spec provides no way for the device to report 
> > > > > > > > > > errors from
> > > > > > > > > > config space accesses.
> > > > > > > > > > 
> > > > > > > > > > The QEMU virtio-pci implementation returns -1 from invalid
> > > > > > > > > > virtio_config_read*() and silently discards 
> > > > > > > > > > virtio_config_write*()
> > > > > > > > > > accesses.
> > > > > > > > > > 
> > > > > > > > > > VDUSE can take the same approach with
> > > > > > > > > > VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG.
> > > > > > > > > > 
> > > > > > > > > > > I'd like to stick to the current assumption that 
> > > > > > > > > > > get_config won't fail.
> > > > > > > > > > > That is to say,
> > > > > > > > > > > 
> > > > > > > > > > > 1) maintain a config in the kernel, make sure the config 
> > > > > > > > > > > space read can
> > > > > > > > > > > always succeed
> > > > > > > > > > > 2) introduce an ioctl for the vduse userspace to update 
> > > > > > > > > > > the config space.
> > > > > > > > > > > 3) we can synchronize with the vduse userspace during 
> > > > > > > > > > > set_config
> > > > > > > > > > > 
> > > > > > >

Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE

2021-07-08 Thread Stefan Hajnoczi
On Wed, Jul 07, 2021 at 05:09:13PM +0800, Yongji Xie wrote:
> On Tue, Jul 6, 2021 at 6:22 PM Stefan Hajnoczi  wrote:
> >
> > On Tue, Jul 06, 2021 at 11:04:18AM +0800, Yongji Xie wrote:
> > > On Mon, Jul 5, 2021 at 8:50 PM Stefan Hajnoczi  
> > > wrote:
> > > >
> > > > On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang wrote:
> > > > >
> > > > > 在 2021/7/4 下午5:49, Yongji Xie 写道:
> > > > > > > > OK, I get you now. Since the VIRTIO specification says "Device
> > > > > > > > configuration space is generally used for rarely-changing or
> > > > > > > > initialization-time parameters". I assume the 
> > > > > > > > VDUSE_DEV_SET_CONFIG
> > > > > > > > ioctl should not be called frequently.
> > > > > > > The spec uses MUST and other terms to define the precise 
> > > > > > > requirements.
> > > > > > > Here the language (especially the word "generally") is weaker and 
> > > > > > > means
> > > > > > > there may be exceptions.
> > > > > > >
> > > > > > > Another type of access that doesn't work with the 
> > > > > > > VDUSE_DEV_SET_CONFIG
> > > > > > > approach is reads that have side-effects. For example, imagine a 
> > > > > > > field
> > > > > > > containing an error code if the device encounters a problem 
> > > > > > > unrelated to
> > > > > > > a specific virtqueue request. Reading from this field resets the 
> > > > > > > error
> > > > > > > code to 0, saving the driver an extra configuration space write 
> > > > > > > access
> > > > > > > and possibly race conditions. It isn't possible to implement those
> > > > > > > semantics using VDUSE_DEV_SET_CONFIG. It's another corner case, 
> > > > > > > but it
> > > > > > > makes me think that the interface does not allow full VIRTIO 
> > > > > > > semantics.
> > > > >
> > > > >
> > > > > Note that though you're correct, my understanding is that config 
> > > > > space is
> > > > > not suitable for this kind of error propagating. And it would be very 
> > > > > hard
> > > > > to implement such kind of semantic in some transports.  Virtqueue 
> > > > > should be
> > > > > much better. As Yong Ji quoted, the config space is used for
> > > > > "rarely-changing or initialization-time parameters".
> > > > >
> > > > >
> > > > > > Agreed. I will use VDUSE_DEV_GET_CONFIG in the next version. And to
> > > > > > handle the message failure, I'm going to add a return value to
> > > > > > virtio_config_ops.get() and virtio_cread_* API so that the error can
> > > > > > be propagated to the virtio device driver. Then the virtio-blk 
> > > > > > device
> > > > > > driver can be modified to handle that.
> > > > > >
> > > > > > Jason and Stefan, what do you think of this way?
> > > >
> > > > Why does VDUSE_DEV_GET_CONFIG need to support an error return value?
> > > >
> > >
> > > We add a timeout and return error in case userspace never replies to
> > > the message.
> > >
> > > > The VIRTIO spec provides no way for the device to report errors from
> > > > config space accesses.
> > > >
> > > > The QEMU virtio-pci implementation returns -1 from invalid
> > > > virtio_config_read*() and silently discards virtio_config_write*()
> > > > accesses.
> > > >
> > > > VDUSE can take the same approach with
> > > > VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG.
> > > >
> > >
> > > I noticed that virtio_config_read*() only returns -1 when we access an
> > > invalid field. But in the VDUSE case, VDUSE_DEV_GET_CONFIG might fail
> > > when we access a valid field. Not sure if it's ok to silently ignore
> > > this kind of error.
> >
> > That's a good point but it's a general VIRTIO issue. Any device
> > implementation (QEMU userspace, hardware vDPA, etc) can fail, so the
> > VIRTIO specification needs to provide a way for the driver to detect
> > this.
> >
> > If userspace violates the contract then VDUSE needs to mark the device
> > broken. QEMU's device emulation does something similar with the
> > vdev->broken flag.
> >
> > The VIRTIO Device Status field DEVICE_NEEDS_RESET bit can be set by
> > vDPA/VDUSE to indicate that the device is not operational and must be
> > reset.
> >
> 
> It might be a solution. But DEVICE_NEEDS_RESET is not implemented
> currently. So I'm thinking whether it's OK to add a check of the
> DEVICE_NEEDS_RESET status bit in the probe function of the virtio device
> driver (e.g. the virtio-blk driver). Then VDUSE can make use of it to fail
> device initialization when configuration space access fails.

Okay.

Stefan
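
For illustration, the probe-time check Yongji describes might look roughly
like this in a virtio driver (a sketch only: VIRTIO_CONFIG_S_NEEDS_RESET is
the existing status bit, while the helper itself and where it would be
called from are assumptions):

/* Sketch: fail probe early if the device already signals NEEDS_RESET. */
static int virtblk_check_needs_reset(struct virtio_device *vdev)
{
	u8 status = vdev->config->get_status(vdev);

	if (status & VIRTIO_CONFIG_S_NEEDS_RESET) {
		dev_err(&vdev->dev, "device needs reset, failing probe\n");
		return -EIO;
	}

	return 0;
}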



Re: [PATCH 1/2] dma-iommu: fix swiotlb SKIP_CPU_SYNC and arch sync

2021-07-08 Thread Joerg Roedel
Adding Robin.

On Fri, Jul 02, 2021 at 02:37:41PM +0900, David Stevens wrote:
> From: David Stevens 
> 
> Make map_swiotlb and unmap_swiotlb only for mapping, and consistently
> use sync_single_for and sync_sg_for functions for swiotlb sync and arch
> sync. This ensures that the same code path is responsible for syncing
> regardless of whether or not SKIP_CPU_SYNC is set. In the process, fix
> various places where the original physical address and swiotlb tlb_addr
> are mixed up:
>   - Make sync_sg functions call sync_single functions for untrusted
> devices, so they use tlb_addr when checking is_swiotlb_buffer and
> when doing arch sync if necessary.
>   - Use tlb_addr for arch sync in map_page if necessary.
>   - In map_sg, map before syncing so that arch sync can target the
> bounce buffer if necessary.
>   - Pass SKIP_CPU_SYNC to swiotlb map and unmap to avoid double syncing
> the swiotlb. This had previously only happened in the unmap_page
> case, but is now necessary for all swiotlb cases.
> 
> Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
> Signed-off-by: David Stevens 
> ---
>  drivers/iommu/dma-iommu.c | 82 ---
>  1 file changed, 51 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7bcdd1205535..24d1042cd052 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -505,7 +505,8 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
> dma_addr_t dma_addr,
>   __iommu_dma_unmap(dev, dma_addr, size);
>  
>   if (unlikely(is_swiotlb_buffer(phys)))
> - swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> + swiotlb_tbl_unmap_single(dev, phys, size, dir,
> +  attrs | DMA_ATTR_SKIP_CPU_SYNC);
>  }
>  
>  static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
> @@ -536,7 +537,8 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
> phys_addr_t phys,
>  
>  static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t 
> phys,
>   size_t org_size, dma_addr_t dma_mask, bool coherent,
> - enum dma_data_direction dir, unsigned long attrs)
> + enum dma_data_direction dir, unsigned long attrs,
> + phys_addr_t *adj_phys)
>  {
>   int prot = dma_info_to_prot(dir, coherent, attrs);
>   struct iommu_domain *domain = iommu_get_dma_domain(dev);
> @@ -555,7 +557,8 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
> *dev, phys_addr_t phys,
>   iova_offset(iovad, phys | org_size)) {
>   aligned_size = iova_align(iovad, org_size);
>   phys = swiotlb_tbl_map_single(dev, phys, org_size,
> -   aligned_size, dir, attrs);
> +   aligned_size, dir,
> +   attrs | DMA_ATTR_SKIP_CPU_SYNC);
>  
>   if (phys == DMA_MAPPING_ERROR)
>   return DMA_MAPPING_ERROR;
> @@ -573,6 +576,8 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
> *dev, phys_addr_t phys,
>  
>   memset(padding_start, 0, padding_size);
>   }
> + if (adj_phys)
> + *adj_phys = phys;
>  
>   iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
>   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
> @@ -785,15 +790,17 @@ static void iommu_dma_sync_single_for_cpu(struct device 
> *dev,
>   swiotlb_sync_single_for_cpu(dev, phys, size, dir);
>  }
>  
> -static void iommu_dma_sync_single_for_device(struct device *dev,
> - dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
> +static void __iommu_dma_sync_single_for_device(struct device *dev,
> + dma_addr_t dma_handle, size_t size,
> + enum dma_data_direction dir, phys_addr_t phys)
>  {
> - phys_addr_t phys;
> -
>   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
>   return;
>  
> - phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> + if (phys == 0)
> + phys = iommu_iova_to_phys(iommu_get_dma_domain(dev),
> +   dma_handle);
> +
>   if (is_swiotlb_buffer(phys))
>   swiotlb_sync_single_for_device(dev, phys, size, dir);
>  
> @@ -801,6 +808,12 @@ static void iommu_dma_sync_single_for_device(struct 
> device *dev,
>   arch_sync_dma_for_device(phys, size, dir);
>  }
>  
> +static void iommu_dma_sync_single_for_device(struct device *dev,
> + dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
> +{
> + __iommu_dma_sync_single_for_device(dev, dma_handle, size, dir, 0);
> +}
> +
>  static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>   struct scatterlist *sgl, int nelems,
>   enum dma_data_direction 

Re: [PATCH 1/9] dt-bindings: memory: mediatek: Add mt8195 smi binding

2021-07-08 Thread Krzysztof Kozlowski
On 16/06/2021 13:43, Yong Wu wrote:
> This patch adds mt8195 smi supporting in the bindings.
> 
> In mt8195, there are two smi-common HW, one is for vdo(video output),
> the other is for vpp(video processing pipe). They connects with different
> smi-larbs, then some setting(bus_sel) is different. Differentiate them
> with the compatible string.
> 
> Something like this:
> 
> IOMMU(VDO)  IOMMU(VPP)
>|   |
>   SMI_COMMON_VDO  SMI_COMMON_VPP
>   --- 
>   |  |   ...  |  | ...
> larb0 larb2  ...larb1 larb3...
> 
> Signed-off-by: Yong Wu 
> ---
>  .../bindings/memory-controllers/mediatek,smi-common.yaml| 6 +-
>  .../bindings/memory-controllers/mediatek,smi-larb.yaml  | 3 +++
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 

I cannot find it on devicetree list, it seems you did not Cc it. Please
use scripts/get_maintainer.pl.


Best regards,
Krzysztof


Re: [PATCH] iommu/amd: Enable swiotlb if any device supports iommu v2 and uses identity mapping

2021-07-08 Thread Joerg Roedel
On Thu, Jul 08, 2021 at 03:42:32PM +0800, Kai-Heng Feng wrote:
> @@ -344,6 +344,9 @@ static int iommu_init_device(struct device *dev)
>  
>   iommu = amd_iommu_rlookup_table[dev_data->devid];
>   dev_data->iommu_v2 = iommu->is_iommu_v2;
> +
> + if (dev_data->iommu_v2)
> + swiotlb = 1;

This looks like the big hammer, as it will affect all other systems
where the AMD GPUs are in their own group.

What is needed here is an explicit check whether a non-iommu-v2 device
is direct-mapped because it shares a group with the GPU, and only enable
swiotlb in this case.
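
For illustration, such a check might look roughly like the sketch below
(untested; group_has_iommu_v2_device() is a made-up helper, and where exactly
it would be hooked in is left open):

/*
 * Sketch: only enable swiotlb for a device that is not iommu-v2 capable
 * itself but ends up identity-mapped because its group contains an
 * iommu-v2 device (e.g. the GPU).
 */
static void amd_iommu_check_swiotlb(struct device *dev,
				    struct iommu_dev_data *dev_data)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

	if (dev_data->iommu_v2)
		return;

	if (domain && domain->type == IOMMU_DOMAIN_IDENTITY &&
	    group_has_iommu_v2_device(dev))	/* hypothetical helper */
		swiotlb = 1;
}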

Thanks,

Joerg


Re: [PATCH 0/4] Add dynamic iommu backed bounce buffers

2021-07-08 Thread Joerg Roedel
Adding Robin too.

On Wed, Jul 07, 2021 at 04:55:01PM +0900, David Stevens wrote:
> Add support for per-domain dynamic pools of iommu bounce buffers to the 
> dma-iommu API. This allows iommu mappings to be reused while still
> maintaining strict iommu protection. Allocating buffers dynamically
> instead of using swiotlb carveouts makes per-domain pools more amenable
> on systems with large numbers of devices or where devices are unknown.
> 
> When enabled, all non-direct streaming mappings below a configurable
> size will go through bounce buffers. Note that this means drivers which
> don't properly use the DMA API (e.g. i915) cannot use an iommu when this
> feature is enabled. However, all drivers which work with swiotlb=force
> should work.
> 
> Bounce buffers serve as an optimization in situations where interactions
> with the iommu are very costly. For example, virtio-iommu operations in
> a guest on a linux host require a vmexit, involvement the VMM, and a
> VFIO syscall. For relatively small DMA operations, memcpy can be
> significantly faster.
> 
> As a performance comparison, on a device with an i5-10210U, I ran fio
> with a VFIO passthrough NVMe drive with '--direct=1 --rw=read
> --ioengine=libaio --iodepth=64' and block sizes 4k, 16k, 64k, and
> 128k. Test throughput increased by 2.8x, 4.7x, 3.6x, and 3.6x. Time
> spent in iommu_dma_unmap_(page|sg) per GB processed decreased by 97%,
> 94%, 90%, and 87%. Time spent in iommu_dma_map_(page|sg) decreased
> by >99%, as bounce buffers don't require syncing here in the read case.
> Running with multiple jobs doesn't serve as a useful performance
> comparison because virtio-iommu and vfio_iommu_type1 both have big
> locks that significantly limit mulithreaded DMA performance.
> 
> This patch set is based on v5.13-rc7 plus the patches at [1].
> 
> David Stevens (4):
>   dma-iommu: add kalloc gfp flag to alloc helper
>   dma-iommu: replace device arguments
>   dma-iommu: expose a few helper functions to module
>   dma-iommu: Add iommu bounce buffers to dma-iommu api
> 
>  drivers/iommu/Kconfig  |  10 +
>  drivers/iommu/Makefile |   1 +
>  drivers/iommu/dma-iommu.c  | 119 --
>  drivers/iommu/io-buffer-pool.c | 656 +
>  drivers/iommu/io-buffer-pool.h |  91 +
>  include/linux/dma-iommu.h  |  12 +
>  6 files changed, 861 insertions(+), 28 deletions(-)
>  create mode 100644 drivers/iommu/io-buffer-pool.c
>  create mode 100644 drivers/iommu/io-buffer-pool.h
> 
> -- 
> 2.32.0.93.g670b81a890-goog


Re: [PATCH 3/9] memory: mtk-smi: Use clk_bulk instead of the clk ops

2021-07-08 Thread Krzysztof Kozlowski
On 16/06/2021 13:43, Yong Wu wrote:
> smi have many clocks: apb/smi/gals.
> This patch use clk_bulk interface instead of the orginal one to simply
> the code.
> 
> gals is optional clk(some larbs may don't have gals). use clk_bulk_optional
> instead. and then remove the has_gals flag.
> 
> Also remove clk fail logs since bulk interface already output fail log.
> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/memory/mtk-smi.c | 124 +++
>  1 file changed, 34 insertions(+), 90 deletions(-)
> 
> diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
> index c5fb51f73b34..bcd2bf130655 100644
> --- a/drivers/memory/mtk-smi.c
> +++ b/drivers/memory/mtk-smi.c
> @@ -60,9 +60,18 @@ enum mtk_smi_gen {
>   MTK_SMI_GEN2
>  };
>  
> +#define MTK_SMI_CLK_NR_MAX   4
> +
> +static const char * const mtk_smi_common_clocks[] = {
> + "apb", "smi", "gals0", "gals1", /* glas is optional */

Typo here - glas.

> +};
> +
> +static const char * const mtk_smi_larb_clocks[] = {
> + "apb",  "smi", "gals"
> +};
> +
>  struct mtk_smi_common_plat {
>   enum mtk_smi_gen gen;
> - bool has_gals;
>   u32  bus_sel; /* Balance some larbs to enter mmu0 or mmu1 */
>  };
>  
> @@ -70,13 +79,12 @@ struct mtk_smi_larb_gen {
>   int port_in_larb[MTK_LARB_NR_MAX + 1];
>   void (*config_port)(struct device *dev);
>   unsigned intlarb_direct_to_common_mask;
> - boolhas_gals;
>  };
>  
>  struct mtk_smi {
>   struct device   *dev;
> - struct clk  *clk_apb, *clk_smi;
> - struct clk  *clk_gals0, *clk_gals1;
> + unsigned intclk_num;
> + struct clk_bulk_dataclks[MTK_SMI_CLK_NR_MAX];
>   struct clk  *clk_async; /*only needed by mt2701*/
>   union {
>   void __iomem*smi_ao_base; /* only for gen1 */
> @@ -95,45 +103,6 @@ struct mtk_smi_larb { /* larb: local arbiter */
>   unsigned char   *bank;
>  };
>  
> -static int mtk_smi_clk_enable(const struct mtk_smi *smi)
> -{
> - int ret;
> -
> - ret = clk_prepare_enable(smi->clk_apb);
> - if (ret)
> - return ret;
> -
> - ret = clk_prepare_enable(smi->clk_smi);
> - if (ret)
> - goto err_disable_apb;
> -
> - ret = clk_prepare_enable(smi->clk_gals0);
> - if (ret)
> - goto err_disable_smi;
> -
> - ret = clk_prepare_enable(smi->clk_gals1);
> - if (ret)
> - goto err_disable_gals0;
> -
> - return 0;
> -
> -err_disable_gals0:
> - clk_disable_unprepare(smi->clk_gals0);
> -err_disable_smi:
> - clk_disable_unprepare(smi->clk_smi);
> -err_disable_apb:
> - clk_disable_unprepare(smi->clk_apb);
> - return ret;
> -}
> -
> -static void mtk_smi_clk_disable(const struct mtk_smi *smi)
> -{
> - clk_disable_unprepare(smi->clk_gals1);
> - clk_disable_unprepare(smi->clk_gals0);
> - clk_disable_unprepare(smi->clk_smi);
> - clk_disable_unprepare(smi->clk_apb);
> -}
> -
>  int mtk_smi_larb_get(struct device *larbdev)
>  {
>   int ret = pm_runtime_resume_and_get(larbdev);
> @@ -270,7 +239,6 @@ static const struct mtk_smi_larb_gen mtk_smi_larb_mt6779 
> = {
>  };
>  
>  static const struct mtk_smi_larb_gen mtk_smi_larb_mt8183 = {
> - .has_gals   = true,
>   .config_port= mtk_smi_larb_config_port_gen2_general,
>   .larb_direct_to_common_mask = BIT(2) | BIT(3) | BIT(7),
> /* IPU0 | IPU1 | CCU */
> @@ -320,6 +288,7 @@ static int mtk_smi_larb_probe(struct platform_device 
> *pdev)
>   struct device_node *smi_node;
>   struct platform_device *smi_pdev;
>   struct device_link *link;
> + int i, ret;
>  
>   larb = devm_kzalloc(dev, sizeof(*larb), GFP_KERNEL);
>   if (!larb)
> @@ -331,22 +300,14 @@ static int mtk_smi_larb_probe(struct platform_device 
> *pdev)
>   if (IS_ERR(larb->base))
>   return PTR_ERR(larb->base);
>  
> - larb->smi.clk_apb = devm_clk_get(dev, "apb");
> - if (IS_ERR(larb->smi.clk_apb))
> - return PTR_ERR(larb->smi.clk_apb);
> -
> - larb->smi.clk_smi = devm_clk_get(dev, "smi");
> - if (IS_ERR(larb->smi.clk_smi))
> - return PTR_ERR(larb->smi.clk_smi);
> -
> - if (larb->larb_gen->has_gals) {
> - /* The larbs may still haven't gals even if the SoC support.*/
> - larb->smi.clk_gals0 = devm_clk_get(dev, "gals");
> - if (PTR_ERR(larb->smi.clk_gals0) == -ENOENT)
> - larb->smi.clk_gals0 = NULL;
> - else if (IS_ERR(larb->smi.clk_gals0))
> - return PTR_ERR(larb->smi.clk_gals0);
> - }
> + larb->smi.clk_num = ARRAY_SIZE(mtk_smi_larb_clocks);
> + for (i = 0; i < larb->smi.clk_num; i++)
> + larb->smi.clks[i].i

Re: [PATCH 4/9] memory: mtk-smi: Rename smi_gen to smi_type

2021-07-08 Thread Krzysztof Kozlowski
On 16/06/2021 13:43, Yong Wu wrote:
> This is a preparing patch for adding smi sub common.

Don't write "This patch". Use simple imperative:
"Prepare for adding smi sub common."

https://elixir.bootlin.com/linux/v5.13/source/Documentation/process/submitting-patches.rst#L89
 
> About the previou smi_gen, we have gen1/gen2 that stand for the generation
> number for HW. I plan to add a new type(sub_common), then the "gen" is not
> prober. this patch only change it to "type", No functional change.

Same.

> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/memory/mtk-smi.c | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 

Best regards,
Krzysztof


Re: [PATCH v14 0/6] iommu: Enhance IOMMU default DMA mode build options

2021-07-08 Thread j...@8bytes.org
Hi John,

On Fri, Jun 25, 2021 at 05:41:09PM +0100, John Garry wrote:
> We think that this series is ready to go.
> 
> There would be a build conflict with the following:
> https://lore.kernel.org/linux-iommu/20210616100500.174507-1-na...@vmware.com/
> 
> So please let us know where you stand on it, so that could be resolved.
> 
> Robin and Baolu have kindly reviewed all the patches, apart from the AMD
> one.

The AMD one also looks good to me, please re-send after the merge window
closes and I will take care of it then. Note that I usually start
merging new stuff after -rc3 is out.

Regards,

Joerg


Re: [PATCH v2] iommu: rockchip: Fix physical address decoding

2021-07-08 Thread Joerg Roedel
Hi Benjamin,

On Mon, Jul 05, 2021 at 01:40:24PM +0200, Benjamin Gaignard wrote:
> Gentle ping on this patch :-)

Please fix the subject to match the IOMMU tree conventions. This would
be:

iommu/rockchip: Fix physical address decoding

Re-send the patch after the merge window is closed.

Thanks,

Joerg


Re: [PATCH 6/9] memory: mtk-smi: Add smi sub common support

2021-07-08 Thread Krzysztof Kozlowski
On 16/06/2021 13:43, Yong Wu wrote:
> This patch adds smi-sub-common support. some larbs may connect with the
> smi-sub-common, then connect with smi-common.

Please start sentences with capital letter. This (similarly to "This
patch") appears in multiple patches.

> 
> Before we create device link between smi-larb with smi-common, If we have
> sub-common, we should use device link the smi-larb and smi-sub-common,
> then use device link between the smi-sub-common with smi-common. This is
> for enabling clock/power automatically.
> Move the device link code to a new interface for reusing.
> 
> there is no SW extra setting for smi-sub-common.
> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/memory/mtk-smi.c | 78 ++--
>  1 file changed, 52 insertions(+), 26 deletions(-)
> 


Best regards,
Krzysztof


Re: [PATCH] iommu: qcom: Revert "iommu/arm: Cleanup resources in case of probe error path"

2021-07-08 Thread Joerg Roedel
On Mon, Jul 05, 2021 at 08:56:57AM +0200, Marek Szyprowski wrote:
> The QCOM IOMMU driver calls bus_set_iommu() for every IOMMU device controller,
> which fails for the second and later IOMMU devices. This is intended and
> must not be fatal to the driver registration process. Also, the cleanup
> path should take care of the runtime PM state, which is missing in the
> current patch. Revert the relevant changes to the QCOM IOMMU driver until
> a proper fix is prepared.
> 
> This partially reverts commit 249c9dc6aa0db74a0f7908efd04acf774e19b155.
> 
> Fixes: 249c9dc6aa0d ("iommu/arm: Cleanup resources in case of probe error 
> path")
> Suggested-by: Will Deacon 
> Signed-off-by: Marek Szyprowski 
> ---
>  drivers/iommu/arm/arm-smmu/qcom_iommu.c | 13 ++---
>  1 file changed, 2 insertions(+), 11 deletions(-)

Applied to iommu/fixes, thanks.


Re: [PATCH] iommu/amd: Enable swiotlb if any device supports iommu v2 and uses identity mapping

2021-07-08 Thread Robin Murphy

On 2021-07-08 10:28, Joerg Roedel wrote:

On Thu, Jul 08, 2021 at 03:42:32PM +0800, Kai-Heng Feng wrote:

@@ -344,6 +344,9 @@ static int iommu_init_device(struct device *dev)
  
  		iommu = amd_iommu_rlookup_table[dev_data->devid];

dev_data->iommu_v2 = iommu->is_iommu_v2;
+
+   if (dev_data->iommu_v2)
+   swiotlb = 1;


This looks like the big hammer, as it will affect all other systems
where the AMD GPUs are in their own group.

What is needed here is an explicit check whether a non-iommu-v2 device
is direct-mapped because it shares a group with the GPU, and only enable
swiotlb in this case.


Right, it's basically about whether any DMA-limited device might at any 
time end up in an IOMMU_DOMAIN_IDENTITY domain. And given the 
possibility of device hotplug and the user being silly with the sysfs 
interface, I don't think we can categorically determine that at boot time.


Also note that Intel systems are likely to be similarly affected (in 
fact intel-iommu doesn't even have the iommu_default_passthrough() check 
so it's probably even easier to blow up).


Robin.


Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE

2021-07-08 Thread Yongji Xie
On Thu, Jul 8, 2021 at 5:06 PM Stefan Hajnoczi  wrote:
>
> On Thu, Jul 08, 2021 at 12:17:56PM +0800, Jason Wang wrote:
> >
> > 在 2021/7/7 下午11:54, Stefan Hajnoczi 写道:
> > > On Wed, Jul 07, 2021 at 05:24:08PM +0800, Jason Wang wrote:
> > > > 在 2021/7/7 下午4:55, Stefan Hajnoczi 写道:
> > > > > On Wed, Jul 07, 2021 at 11:43:28AM +0800, Jason Wang wrote:
> > > > > > 在 2021/7/7 上午1:11, Stefan Hajnoczi 写道:
> > > > > > > On Tue, Jul 06, 2021 at 09:08:26PM +0800, Jason Wang wrote:
> > > > > > > > On Tue, Jul 6, 2021 at 6:15 PM Stefan Hajnoczi 
> > > > > > > >  wrote:
> > > > > > > > > On Tue, Jul 06, 2021 at 10:34:33AM +0800, Jason Wang wrote:
> > > > > > > > > > 在 2021/7/5 下午8:49, Stefan Hajnoczi 写道:
> > > > > > > > > > > On Mon, Jul 05, 2021 at 11:36:15AM +0800, Jason Wang 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > 在 2021/7/4 下午5:49, Yongji Xie 写道:
> > > > > > > > > > > > > > > OK, I get you now. Since the VIRTIO specification 
> > > > > > > > > > > > > > > says "Device
> > > > > > > > > > > > > > > configuration space is generally used for 
> > > > > > > > > > > > > > > rarely-changing or
> > > > > > > > > > > > > > > initialization-time parameters". I assume the 
> > > > > > > > > > > > > > > VDUSE_DEV_SET_CONFIG
> > > > > > > > > > > > > > > ioctl should not be called frequently.
> > > > > > > > > > > > > > The spec uses MUST and other terms to define the 
> > > > > > > > > > > > > > precise requirements.
> > > > > > > > > > > > > > Here the language (especially the word "generally") 
> > > > > > > > > > > > > > is weaker and means
> > > > > > > > > > > > > > there may be exceptions.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Another type of access that doesn't work with the 
> > > > > > > > > > > > > > VDUSE_DEV_SET_CONFIG
> > > > > > > > > > > > > > approach is reads that have side-effects. For 
> > > > > > > > > > > > > > example, imagine a field
> > > > > > > > > > > > > > containing an error code if the device encounters a 
> > > > > > > > > > > > > > problem unrelated to
> > > > > > > > > > > > > > a specific virtqueue request. Reading from this 
> > > > > > > > > > > > > > field resets the error
> > > > > > > > > > > > > > code to 0, saving the driver an extra configuration 
> > > > > > > > > > > > > > space write access
> > > > > > > > > > > > > > and possibly race conditions. It isn't possible to 
> > > > > > > > > > > > > > implement those
> > > > > > > > > > > > > > semantics using VDUSE_DEV_SET_CONFIG. It's another 
> > > > > > > > > > > > > > corner case, but it
> > > > > > > > > > > > > > makes me think that the interface does not allow 
> > > > > > > > > > > > > > full VIRTIO semantics.
> > > > > > > > > > > > Note that though you're correct, my understanding is 
> > > > > > > > > > > > that config space is
> > > > > > > > > > > > not suitable for this kind of error propagating. And it 
> > > > > > > > > > > > would be very hard
> > > > > > > > > > > > to implement such kind of semantic in some transports.  
> > > > > > > > > > > > Virtqueue should be
> > > > > > > > > > > > much better. As Yong Ji quoted, the config space is 
> > > > > > > > > > > > used for
> > > > > > > > > > > > "rarely-changing or initialization-time parameters".
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Agreed. I will use VDUSE_DEV_GET_CONFIG in the next 
> > > > > > > > > > > > > version. And to
> > > > > > > > > > > > > handle the message failure, I'm going to add a return 
> > > > > > > > > > > > > value to
> > > > > > > > > > > > > virtio_config_ops.get() and virtio_cread_* API so 
> > > > > > > > > > > > > that the error can
> > > > > > > > > > > > > be propagated to the virtio device driver. Then the 
> > > > > > > > > > > > > virtio-blk device
> > > > > > > > > > > > > driver can be modified to handle that.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jason and Stefan, what do you think of this way?
> > > > > > > > > > > Why does VDUSE_DEV_GET_CONFIG need to support an error 
> > > > > > > > > > > return value?
> > > > > > > > > > >
> > > > > > > > > > > The VIRTIO spec provides no way for the device to report 
> > > > > > > > > > > errors from
> > > > > > > > > > > config space accesses.
> > > > > > > > > > >
> > > > > > > > > > > The QEMU virtio-pci implementation returns -1 from invalid
> > > > > > > > > > > virtio_config_read*() and silently discards 
> > > > > > > > > > > virtio_config_write*()
> > > > > > > > > > > accesses.
> > > > > > > > > > >
> > > > > > > > > > > VDUSE can take the same approach with
> > > > > > > > > > > VDUSE_DEV_GET_CONFIG/VDUSE_DEV_SET_CONFIG.
> > > > > > > > > > >
> > > > > > > > > > > > I'd like to stick to the current assumption that 
> > > > > > > > > > > > get_config won't fail.
> > > > > > > > > > > > That is to say,
> > > > > > > > > > > >
> > > > > > > > > > > > 1) maintain a config in the kernel, make sure the 
> > > > > > > > > > > > config space read can
> > > > > > > > > > > > always succeed
> > > > > > > > > > >

Re: [PATCH 0/4] Add dynamic iommu backed bounce buffers

2021-07-08 Thread Lu Baolu

Hi David,

I like this idea. Thanks for proposing this.

On 2021/7/7 15:55, David Stevens wrote:

Add support for per-domain dynamic pools of iommu bounce buffers to the
dma-iommu API. This allows iommu mappings to be reused while still
maintaining strict iommu protection. Allocating buffers dynamically
instead of using swiotlb carveouts makes per-domain pools more amenable
on systems with large numbers of devices or where devices are unknown.


Have you ever considered leveraging the per-device swiotlb memory pool
added by the series below?

https://lore.kernel.org/linux-iommu/20210625123004.GA3170@willie-the-truck/



When enabled, all non-direct streaming mappings below a configurable
size will go through bounce buffers. Note that this means drivers which
don't properly use the DMA API (e.g. i915) cannot use an iommu when this
feature is enabled. However, all drivers which work with swiotlb=force
should work.


If so, why not make it more scalable by adding a callback into vendor
iommu drivers? The vendor iommu drivers have enough information to tell
whether the bounce buffer is feasible for a specific domain.
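
As a rough sketch of that idea (nothing below exists today: the
bounce_buffer_preferred op, the threshold, and the helper name are all made
up for illustration):

/* Ask the vendor IOMMU driver whether bouncing this mapping is worthwhile. */
static bool dev_should_use_bounce_pool(struct device *dev, size_t size)
{
	const struct iommu_ops *ops = dev->bus->iommu_ops;
	struct iommu_domain *domain = iommu_get_dma_domain(dev);

	if (size > IO_BUFFER_POOL_MAX_SIZE)	/* made-up size threshold */
		return false;

	return ops && ops->bounce_buffer_preferred &&
	       ops->bounce_buffer_preferred(domain);
}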



Bounce buffers serve as an optimization in situations where interactions
with the iommu are very costly. For example, virtio-iommu operations in


The simulated IOMMU does the same thing.

It's also an optimization for bare metal in cases where the strict mode
of cache invalidation is used. CPU moving data is faster than IOMMU
cache invalidation if the buffer is small.

Best regards,
baolu


Re: [Resend RFC PATCH V4 13/13] x86/HV: Not set memory decrypted/encrypted during kexec alloc/free page in IVM

2021-07-08 Thread Tianyu Lan

Hi Dave:
 Thanks for your review.

On 7/8/2021 12:14 AM, Dave Hansen wrote:

On 7/7/21 8:46 AM, Tianyu Lan wrote:

@@ -598,7 +599,7 @@ void arch_kexec_unprotect_crashkres(void)
   */
  int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
  {
-   if (sev_active())
+   if (sev_active() || hv_is_isolation_supported())
return 0;
  
  	/*

@@ -611,7 +612,7 @@ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int 
pages, gfp_t gfp)
  
  void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)

  {
-   if (sev_active())
+   if (sev_active() || hv_is_isolation_supported())
return;


You might want to take a look through the "protected guest" patches.  I
think this series is touching a few of the same locations that TDX and
recent SEV work touch.

https://lore.kernel.org/lkml/20210618225755.662725-5-sathyanarayanan.kuppusw...@linux.intel.com/


Thanks for the reminder. You are right. There will be a generic API to check
the "protected guest" type.



Re: [PATCH] iommu/amd: Enable swiotlb if any device supports iommu v2 and uses identity mapping

2021-07-08 Thread Kai-Heng Feng
On Thu, Jul 8, 2021 at 6:18 PM Robin Murphy  wrote:
>
> On 2021-07-08 10:28, Joerg Roedel wrote:
> > On Thu, Jul 08, 2021 at 03:42:32PM +0800, Kai-Heng Feng wrote:
> >> @@ -344,6 +344,9 @@ static int iommu_init_device(struct device *dev)
> >>
> >>  iommu = amd_iommu_rlookup_table[dev_data->devid];
> >>  dev_data->iommu_v2 = iommu->is_iommu_v2;
> >> +
> >> +if (dev_data->iommu_v2)
> >> +swiotlb = 1;
> >
> > This looks like the big hammer, as it will affect all other systems
> > where the AMD GPUs are in their own group.
> >
> > What is needed here is an explicit check whether a non-iommu-v2 device
> > is direct-mapped because it shares a group with the GPU, and only enable
> > swiotlb in this case.
>
> Right, it's basically about whether any DMA-limited device might at any
> time end up in an IOMMU_DOMAIN_IDENTITY domain. And given the
> possibility of device hotplug and the user being silly with the sysfs
> interface, I don't think we can categorically determine that at boot time.
>
> Also note that Intel systems are likely to be similarly affected (in
> > fact intel-iommu doesn't even have the iommu_default_passthrough() check
> so it's probably even easier to blow up).

swiotlb is enabled by pci_swiotlb_detect_4gb() and intel-iommu doesn't
disable it.

I wonder if we can take the same approach in amd-iommu?

Kai-Heng

>
> Robin.


Re: [RFC PATCH V4 01/12] x86/HV: Initialize shared memory boundary in the Isolation VM.

2021-07-08 Thread Tianyu Lan

Hi Olaf:

On 7/8/2021 3:34 PM, Olaf Hering wrote:

On Wed, Jul 07, Tianyu Lan wrote:


+++ b/include/asm-generic/mshyperv.h
@@ -34,8 +34,18 @@ struct ms_hyperv_info {



void  __percpu **ghcb_base;


It would be cool if the cover letter states which commit id this series is 
based on.


Thanks for the reminder. I will add this in a later version.
This patchset is rebased on the Hyper-V next branch plus the swiotlb
"Restricted DMA" patches from Claire Chang:


https://lore.kernel.org/lkml/20210624155526.2775863-1-tien...@chromium.org/

Re: [PATCH v2 0/3] iommu: Enable non-strict DMA on QCom SD/MMC

2021-07-08 Thread Doug Anderson
Hi,

On Thu, Jul 8, 2021 at 1:09 AM Joerg Roedel  wrote:
>
> On Wed, Jul 07, 2021 at 01:00:13PM -0700, Doug Anderson wrote:
> > a) Nothing is inherently broken with my current approach.
> >
> > b) My current approach doesn't make anybody terribly upset even if
> > nobody is totally in love with it.
>
> Well, no, sorry :)
>
> I don't think it is a good idea to allow drivers to opt-out of the
> strict-setting. This is a platform or user decision, and the driver
> should accept whatever it gets.

Sure, I agree with you there. The driver shouldn't ever be able to
override and make things less strict than the user or platform wants.
It feels like that can be accomplished. See below.


> So the real question is still why strict is the default setting and how
> to change that.

I guess there are two strategies if we agree that there's a benefit to
running some devices in strict and others in non-strict:

* opt-in to strict: default is non-strict and we have to explicitly
list what we want to be strict.

* opt-out of strict: default is strict and we have to explicitly list
what we want to be non-strict.

I guess the question is: do we allow both strategies or only one of
them? I think you are suggesting that the kernel should support
"opt-in" to strict and that that matches the status quo with PCI on
x86. I'm pushing for some type of "opt-out" of strict support. I have
heard from security folks that they'd prefer "opt-out" of strict as
well. If we're willing to accept more complex config options we could
support both choosable by KConfig. How it'd all work in my mind:

Command line:

* iommu.strict=0 - suggest non-strict by default
* iommu.strict=1 - force strict for all drivers
* iommu.strict not specified - no opinion

Kconfig:

* IOMMU_DEFAULT_LAZY - suggest non-strict by default; drivers can
opt-in to strict
* IOMMU_DEFAULT_STRICT - force strict for all drivers
* IOMMU_DEFAULT_LOOSE_STRICT - allow explicit suggestions for laziness
but default to strict if no votes.

Drivers:
* suggest lazy - suggest non-strict
* force strict - force strict
* no vote


How the above work together:

* if _any_ of the three things wants strict then it's strict.

* if _all_ of the three things want lazy then it's lazy.

* If the KConfig is "loose strict" and the command line is set to
"lazy" then it's equivalent to the KConfig saying "lazy". In other
words drivers could still "opt-in" to strict but otherwise we'd be
lazy.

* The only way for a driver's "suggest lazy" vote to have any effect
at all is if "iommu.strict" wasn't specified on the command line _and_
if the KConfig was "loose strict". This is effectively the "opt-out"
of strict.
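
(Pseudo-code just to make those rules concrete; none of these names exist
anywhere, and IOMMU_DEFAULT_LOOSE_STRICT corresponds to "no vote" here:)

enum vote { NO_VOTE, WANT_LAZY, WANT_STRICT };

/* Illustrative only: restates the rules above. */
static bool use_strict_invalidation(enum vote cmdline, enum vote kconfig,
				    enum vote driver)
{
	/* any strict vote forces strict */
	if (cmdline == WANT_STRICT || kconfig == WANT_STRICT ||
	    driver == WANT_STRICT)
		return true;

	/* command line or Kconfig preferring lazy makes lazy the default */
	if (cmdline == WANT_LAZY || kconfig == WANT_LAZY)
		return false;

	/* "loose strict": a driver's lazy suggestion decides, else strict */
	return driver != WANT_LAZY;
}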


If you think the strategy I describe above is garbage then would you
be OK if I re-worked my patchset to at least allow non-PCI drivers to
"opt-in" to strict? Effectively I'd change patch #3 to list all of the
peripherals on my SoC _except_ the USB and SD/MMC and request that
they all be strict. If other people expressed their preference for the
"opt-out" of strict strategy would that change your mind?


> Or document for the users that want performance how to
> change the setting, so that they can decide.

Pushing this to the users can make sense for a Linux distribution but
probably less sense for an embedded platform. So I'm happy to make
some way for a user to override this (like via kernel command line),
but I also strongly believe there should be a default that users don't
have to futz with that we think is correct.

-Doug


Re: [PATCH] iommu/amd: Enable swiotlb if any device supports iommu v2 and uses identity mapping

2021-07-08 Thread Robin Murphy

On 2021-07-08 14:57, Kai-Heng Feng wrote:

On Thu, Jul 8, 2021 at 6:18 PM Robin Murphy  wrote:


On 2021-07-08 10:28, Joerg Roedel wrote:

On Thu, Jul 08, 2021 at 03:42:32PM +0800, Kai-Heng Feng wrote:

@@ -344,6 +344,9 @@ static int iommu_init_device(struct device *dev)

  iommu = amd_iommu_rlookup_table[dev_data->devid];
  dev_data->iommu_v2 = iommu->is_iommu_v2;
+
+if (dev_data->iommu_v2)
+swiotlb = 1;


This looks like the big hammer, as it will affect all other systems
where the AMD GPUs are in their own group.

What is needed here is an explicit check whether a non-iommu-v2 device
is direct-mapped because it shares a group with the GPU, and only enable
swiotlb in this case.


Right, it's basically about whether any DMA-limited device might at any
time end up in an IOMMU_DOMAIN_IDENTITY domain. And given the
possibility of device hotplug and the user being silly with the sysfs
interface, I don't think we can categorically determine that at boot time.

Also note that Intel systems are likely to be similarly affected (in
fact intel-iommu doesn't even have the iommu_default_passthrough() check 
so it's probably even easier to blow up).


swiotlb is enabled by pci_swiotlb_detect_4gb() and intel-iommu doesn't
disable it.


Oh, right... I did say I found this dance hard to follow. Clearly I 
shouldn't have trusted what I thought I remembered from looking at it 
yesterday :)


Also not helped by the fact that it sets iommu_detected which *does* 
disable SWIOTLB, but only on IA-64.



I wonder if we can take the same approach in amd-iommu?


Certainly if there's a precedent for leaving SWIOTLB enabled even if it 
*might* be redundant, that seems like the easiest option (it's what we 
do on arm64 too, but then we have system topologies where some devices 
may not be behind IOMMUs even when others are). More fun would be to try 
to bring it up at the first sign of IOMMU_DOMAIN_IDENTITY if it was 
disabled previously, but I don't have the highest hope of that being 
practical.


Robin.


Re: [PATCH 1/2] dma-iommu: fix swiotlb SKIP_CPU_SYNC and arch sync

2021-07-08 Thread Robin Murphy

On 2021-07-08 10:17, Joerg Roedel wrote:

Adding Robin.

On Fri, Jul 02, 2021 at 02:37:41PM +0900, David Stevens wrote:

From: David Stevens 

Make map_swiotlb and unmap_swiotlb only for mapping, and consistently
use sync_single_for and sync_sg_for functions for swiotlb sync and arch
sync. This ensures that the same code path is responsible for syncing
regardless of whether or not SKIP_CPU_SYNC is set. In the process, fix
various places where the original physical address and swiotlb tlb_addr
are mixed up:
   - Make sync_sg functions call sync_single functions for untrusted
 devices, so they use tlb_addr when checking is_swiotlb_buffer and
 when doing arch sync if necessary.
   - Use tlb_addr for arch sync in map_page if necessary.
   - In map_sg, map before syncing so that arch sync can target the
 bounce buffer if necessary.
   - Pass SKIP_CPU_SYNC to swiotlb map and unmap to avoid double syncing
 the swiotlb. This had previously only happened in the unmap_page
 case, but is now necessary for all swiotlb cases.

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")


Hmm, there's a lot going on here, and it's not really clear to me what's 
what, especially WRT the comment about addresses being mixed up. Is 
there an actual correctness bug anywhere, i.e. an address which *should* 
be synced *isn't*? If so, please fix that minimally in a standalone 
patch without all the additional complication. If it's purely an 
efficiency improvement - i.e. the bounce page getting synced twice 
(which shouldn't have *too* much impact) or the original copy geting 
redundantly synced *as well* as the bounce page, then please clarify 
whether you've measured significant impact in practice or if this is 
just an observation from code inspection.


TBH I don't find the overall justification in the commit message 
particularly convincing - to a less generous reading it kind of comes 
across as "move things around for the sake of it". Yes one can argue for 
keeping all the cache maintenance in one place, but one can equally 
argue for the keeping all the low-level practicalities of bouncing under 
SWIOTLB's umbrella such that it doesn't clutter up the IOMMU-focused 
code. The point of utilising the SWIOTLB machinery in the first place 
was to share and reuse as much of its "normal" flow as possible to keep 
things simple (see the thread at [1] for where the history began).


Now, what is true is that the integration into iommu-dma is a bit rough 
and bolted-on in places. IIRC it was done expediently to un-block 
further development on the Intel driver, and it was always the plan to 
come back and improve it later (like only bouncing the unaligned start 
and end of larger buffers such that whole pages in the middle can still 
be mapped in-place). As such I'm very happy to see that someone has the 
time and interest to improve things, but some of this patch looks like 
adding more mess to work around the existing mess, rather than more 
fundamentally cleaning it up. For example, rather than further cementing 
the bodge of having two separate map_sg implementations, I think the 
bouncing could be streamlined into iommu-dma's own flow as part of its 
scatterlist transformation (AFAICS temporarily swizzling sg_page so that 
iommu_map_sg() maps the bounced copy would be no worse than what it's 
already doing).
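
(As a very rough illustration of that swizzling idea, not a claim about how
it should actually be structured: bounce_page_for() is a made-up helper, the
original pages would still need restoring for unmap, and there is no error
handling:)

static size_t iommu_map_sg_bounced(struct iommu_domain *domain,
				   unsigned long iova, struct scatterlist *sg,
				   unsigned int nents, int prot)
{
	struct scatterlist *s;
	unsigned int i;

	for_each_sg(sg, s, nents, i) {
		struct page *bounce = bounce_page_for(s);	/* hypothetical */

		/* point the entry at the bounced copy before mapping */
		if (bounce)
			sg_assign_page(s, bounce);
	}

	return iommu_map_sg(domain, iova, sg, nents, prot);
}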


Thanks,
Robin.

[1] 
https://lore.kernel.org/linux-iommu/20190312163845.ga13...@infradead.org/



Signed-off-by: David Stevens 
---
  drivers/iommu/dma-iommu.c | 82 ---
  1 file changed, 51 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..24d1042cd052 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -505,7 +505,8 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
dma_addr_t dma_addr,
__iommu_dma_unmap(dev, dma_addr, size);
  
  	if (unlikely(is_swiotlb_buffer(phys)))

-   swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+   swiotlb_tbl_unmap_single(dev, phys, size, dir,
+attrs | DMA_ATTR_SKIP_CPU_SYNC);
  }
  
  static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,

@@ -536,7 +537,8 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
  
  static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,

size_t org_size, dma_addr_t dma_mask, bool coherent,
-   enum dma_data_direction dir, unsigned long attrs)
+   enum dma_data_direction dir, unsigned long attrs,
+   phys_addr_t *adj_phys)
  {
int prot = dma_info_to_prot(dir, coherent, attrs);
struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -555,7 +557,8 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
*dev, phys_addr_t phys,
iova_offset(iovad, phys | org_size)

[PATCH v2] iommu/amd: Keep swiotlb enabled to ensure devices with 32bit DMA still work

2021-07-08 Thread Kai-Heng Feng
We are seeing a kernel panic in the rtw88 probe routine because swiotlb
isn't set up:
[  252.036773] rtw_8821ce :06:00.0: enabling device ( -> 0003)
[  252.037084] Kernel panic - not syncing: Can not allocate SWIOTLB buffer 
earlier and can't now provide you with the DMA bounce buffer
[  252.037146] CPU: 7 PID: 1174 Comm: modprobe Not tainted 5.13.0+ #39
[  252.037175] Hardware name: HP HP ProDesk 405 G6 Small Form Factor PC/8835, 
BIOS S05 Ver. 02.04.00 06/03/2021
[  252.037218] Call Trace:
[  252.037231]  dump_stack_lvl+0x4a/0x5f
[  252.037251]  dump_stack+0x10/0x12
[  252.037267]  panic+0x101/0x2e3
[  252.037284]  swiotlb_tbl_map_single.cold+0xc/0x73
[  252.037305]  ? __mod_lruvec_page_state+0x95/0xb0
[  252.037329]  ? kmalloc_large_node+0x8c/0xb0
[  252.037348]  ? __netdev_alloc_skb+0x44/0x160
[  252.037370]  swiotlb_map+0x61/0x240
[  252.037387]  ? __alloc_skb+0xed/0x1e0
[  252.037404]  dma_map_page_attrs+0x12c/0x1f0
[  252.037422]  ? __netdev_alloc_skb+0x44/0x160
[  252.037443]  rtw_pci_probe+0x30f/0x872 [rtw88_pci]
[  252.037467]  local_pci_probe+0x48/0x80
[  252.037487]  pci_device_probe+0x105/0x1c0
[  252.037506]  really_probe+0x1fe/0x3f0
[  252.037524]  __driver_probe_device+0x109/0x180
[  252.037545]  driver_probe_device+0x23/0x90
[  252.037564]  __driver_attach+0xac/0x1b0
[  252.037582]  ? __device_attach_driver+0xe0/0xe0
[  252.037602]  bus_for_each_dev+0x7e/0xc0
[  252.037620]  driver_attach+0x1e/0x20
[  252.037637]  bus_add_driver+0x135/0x1f0
[  252.037654]  driver_register+0x95/0xf0
[  252.037672]  ? 0xc0fa
[  252.037687]  __pci_register_driver+0x68/0x70
[  252.037707]  rtw_8821ce_driver_init+0x23/0x1000 [rtw88_8821ce]
[  252.037734]  do_one_initcall+0x48/0x1d0
[  252.037752]  ? __cond_resched+0x1a/0x50
[  252.037771]  ? kmem_cache_alloc_trace+0x29d/0x3c0
[  252.037792]  do_init_module+0x62/0x280
[  252.037810]  load_module+0x2577/0x27c0
[  252.037862]  __do_sys_finit_module+0xbf/0x120
[  252.037877]  __x64_sys_finit_module+0x1a/0x20
[  252.037893]  do_syscall_64+0x3b/0xc0
[  252.037907]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  252.037925] RIP: 0033:0x7ff5a2f9408d
[  252.037938] Code: 27 0d 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d ab dd 0c 00 f7 d8 64 89 01 48
[  252.037993] RSP: 002b:7fffaa89dce8 EFLAGS: 0246 ORIG_RAX: 
0139
[  252.038017] RAX: ffda RBX: 55fd4f881080 RCX: 7ff5a2f9408d
[  252.038039] RDX:  RSI: 55fd4f63ec02 RDI: 0009
[  252.038063] RBP: 0004 R08:  R09: 55fd4f8885b0
[  252.038085] R10: 0009 R11: 0246 R12: 55fd4f63ec02
[  252.038107] R13: 55fd4f881120 R14:  R15: 55fd4f88e350
[  252.038293] Kernel Offset: 0x3060 from 0x8100 (relocation 
range: 0x8000-0xbfff)

Because the Realtek WiFi (PCI 06:00.0) is in the same IOMMU group as AMD
graphics (PCI 01:00.0),
[1.326166] pci :01:00.0: Adding to iommu group 0
...
[1.326268] pci :06:00.0: Adding to iommu group 0

And the AMD graphics device supports iommu_v2, so the group uses identity
mapping based on the query from amd_iommu_def_domain_type().

However, the Realtek WiFi only supports 32bit DMA, so we need to
make sure swiotlb is enabled.
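
For reference, a 32-bit-only device typically ends up doing something
like the following in its probe path (illustrative only, not the exact
rtw88 code), which is why any buffer above 4GB must be bounced:

	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
	if (ret)
		return ret;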

The swiotlb is enabled by pci_swiotlb_detect_4gb() to support legacy
devices, but it gets disabled later by amd_iommu_init_dma_ops(). Keep
swiotlb enabled to resolve the issue.

Cc: Robin Murphy 
Signed-off-by: Kai-Heng Feng 
---
v2:
 - Keep swiotlb enabled if it's already set.
 - Some wording change.

 drivers/iommu/amd/iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..a893a6b6aeba 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1774,7 +1774,8 @@ void amd_iommu_domain_update(struct protection_domain 
*domain)
 
 static void __init amd_iommu_init_dma_ops(void)
 {
-   swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
+   if (!swiotlb)
+   swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
 
if (amd_iommu_unmap_flush)
pr_info("IO/TLB flush on unmap enabled\n");
-- 
2.31.1



Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-08 Thread Will Deacon
On Tue, Jul 06, 2021 at 12:14:16PM -0700, Nathan Chancellor wrote:
> On 7/6/2021 10:06 AM, Will Deacon wrote:
> > On Tue, Jul 06, 2021 at 04:39:11PM +0100, Robin Murphy wrote:
> > > On 2021-07-06 15:05, Christoph Hellwig wrote:
> > > > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > > > > FWIW I was pondering the question of whether to do something along 
> > > > > those
> > > > > lines or just scrap the default assignment entirely, so since I 
> > > > > hadn't got
> > > > > round to saying that I've gone ahead and hacked up the alternative
> > > > > (similarly untested) for comparison :)
> > > > > 
> > > > > TBH I'm still not sure which one I prefer...
> > > > 
> > > > Claire did implement something like your suggestion originally, but
> > > > I don't really like it as it doesn't scale for adding multiple global
> > > > pools, e.g. for the 64-bit addressable one for the various encrypted
> > > > secure guest schemes.
> > > 
> > > Ah yes, that had slipped my mind, and it's a fair point indeed. Since 
> > > we're
> > > not concerned with a minimal fix for backports anyway I'm more than happy 
> > > to
> > > focus on Will's approach. Another thing is that that looks to take us a
> > > quiet step closer to the possibility of dynamically resizing a SWIOTLB 
> > > pool,
> > > which is something that some of the hypervisor protection schemes looking 
> > > to
> > > build on top of this series may want to explore at some point.
> > 
> > Ok, I'll split that nasty diff I posted up into a reviewable series and we
> > can take it from there.
> 
> For what it's worth, I attempted to boot Will's diff on top of Konrad's
> devel/for-linus-5.14 and it did not work; in fact, I got no output on my
> monitor, period, even with earlyprintk=, and I do not think this machine has
> a serial console.

Looking back at the diff, I completely messed up swiotlb_exit() by mixing up
physical and virtual addresses.

> Robin's fix does work, it survived ten reboots with no issues getting to X
> and I do not see the KASAN and slub debug messages anymore but I understand
> that this is not the preferred solution it seems (although Konrad did want
> to know if it works).
> 
> I am happy to test any further patches or follow ups as needed, just keep me
> on CC.

Cheers. Since this isn't 5.14 material any more, I'll CC you on a series
next week.

Will


Re: [PATCH 0/4] Add dynamic iommu backed bounce buffers

2021-07-08 Thread Robin Murphy

On 2021-07-08 10:29, Joerg Roedel wrote:

Adding Robin too.

On Wed, Jul 07, 2021 at 04:55:01PM +0900, David Stevens wrote:

Add support for per-domain dynamic pools of iommu bounce buffers to the
dma-iommu API. This allows iommu mappings to be reused while still
maintaining strict iommu protection. Allocating buffers dynamically
instead of using swiotlb carveouts makes per-domain pools more amenable
on systems with large numbers of devices or where devices are unknown.


But isn't that just as true for the currently-supported case? All you 
need is a large enough Thunderbolt enclosure and you could suddenly plug 
in a dozen untrusted GPUs all wanting to map hundreds of megabytes of 
memory. If there's a real concern worth addressing, surely it's worth 
addressing properly for everyone.



When enabled, all non-direct streaming mappings below a configurable
size will go through bounce buffers. Note that this means drivers which
don't properly use the DMA API (e.g. i915) cannot use an iommu when this
feature is enabled. However, all drivers which work with swiotlb=force
should work.

Bounce buffers serve as an optimization in situations where interactions
with the iommu are very costly. For example, virtio-iommu operations in
a guest on a Linux host require a vmexit, involvement of the VMM, and a
VFIO syscall. For relatively small DMA operations, memcpy can be
significantly faster.


Yup, back when the bounce-buffering stuff first came up I know 
networking folks were interested in terms of latency for small packets - 
virtualised IOMMUs are indeed another interesting case I hadn't thought 
of. It's definitely been on the radar as another use-case we'd like to 
accommodate with the bounce-buffering scheme. However, that's the thing: 
bouncing is bouncing and, however you look at it, it still overlaps so 
much with the untrusted case - there's no reason that couldn't use 
pre-mapped bounce buffers too, for instance - that the only necessary 
difference is really the policy decision of when to bounce. iommu-dma 
has already grown complicated enough, and having *three* different ways 
of doing things internally just seems bonkers and untenable. Pre-map the 
bounce buffers? Absolutely. Dynamically grow them on demand? Yes please! 
Do it all as a special thing in its own NIH module and leave the 
existing mess to rot? Sorry, but no.


Thanks,
Robin.


As a performance comparison, on a device with an i5-10210U, I ran fio
with a VFIO passthrough NVMe drive with '--direct=1 --rw=read
--ioengine=libaio --iodepth=64' and block sizes 4k, 16k, 64k, and
128k. Test throughput increased by 2.8x, 4.7x, 3.6x, and 3.6x. Time
spent in iommu_dma_unmap_(page|sg) per GB processed decreased by 97%,
94%, 90%, and 87%. Time spent in iommu_dma_map_(page|sg) decreased
by >99%, as bounce buffers don't require syncing here in the read case.
Running with multiple jobs doesn't serve as a useful performance
comparison because virtio-iommu and vfio_iommu_type1 both have big
locks that significantly limit multithreaded DMA performance.

This patch set is based on v5.13-rc7 plus the patches at [1].

David Stevens (4):
   dma-iommu: add kalloc gfp flag to alloc helper
   dma-iommu: replace device arguments
   dma-iommu: expose a few helper functions to module
   dma-iommu: Add iommu bounce buffers to dma-iommu api

  drivers/iommu/Kconfig  |  10 +
  drivers/iommu/Makefile |   1 +
  drivers/iommu/dma-iommu.c  | 119 --
  drivers/iommu/io-buffer-pool.c | 656 +
  drivers/iommu/io-buffer-pool.h |  91 +
  include/linux/dma-iommu.h  |  12 +
  6 files changed, 861 insertions(+), 28 deletions(-)
  create mode 100644 drivers/iommu/io-buffer-pool.c
  create mode 100644 drivers/iommu/io-buffer-pool.h

--
2.32.0.93.g670b81a890-goog



Re: [PATCH 1/4] dma-iommu: add kalloc gfp flag to alloc helper

2021-07-08 Thread Robin Murphy

On 2021-07-07 08:55, David Stevens wrote:

From: David Stevens 

Add gfp flag for kalloc calls within __iommu_dma_alloc_pages, so the
function can be called from atomic contexts.


Why bother? If you need GFP_ATOMIC for allocating the pages array, then 
you surely need it for allocating the pages themselves as well. It's hardly 
rocket science to infer one from the other.
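
i.e. something like this (untested, just to illustrate the point) ought
to do, rather than threading a second gfp_t through the call chain:

	gfp_t kalloc_gfp = gfpflags_allow_blocking(gfp) ? GFP_KERNEL : GFP_ATOMIC;

	pages = kvzalloc(count * sizeof(*pages), kalloc_gfp);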


Robin.


Signed-off-by: David Stevens 
---
  drivers/iommu/dma-iommu.c | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 614f0dd86b08..00993b56c977 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -593,7 +593,8 @@ static void __iommu_dma_free_pages(struct page **pages, int 
count)
  }
  
  static struct page **__iommu_dma_alloc_pages(struct device *dev,

-   unsigned int count, unsigned long order_mask, gfp_t gfp)
+   unsigned int count, unsigned long order_mask,
+   gfp_t page_gfp, gfp_t kalloc_gfp)
  {
struct page **pages;
unsigned int i = 0, nid = dev_to_node(dev);
@@ -602,15 +603,15 @@ static struct page **__iommu_dma_alloc_pages(struct 
device *dev,
if (!order_mask)
return NULL;
  
-	pages = kvzalloc(count * sizeof(*pages), GFP_KERNEL);

+   pages = kvzalloc(count * sizeof(*pages), kalloc_gfp);
if (!pages)
return NULL;
  
  	/* IOMMU can map any pages, so himem can also be used here */

-   gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
+   page_gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
  
  	/* It makes no sense to muck about with huge pages */

-   gfp &= ~__GFP_COMP;
+   page_gfp &= ~__GFP_COMP;
  
  	while (count) {

struct page *page = NULL;
@@ -624,7 +625,7 @@ static struct page **__iommu_dma_alloc_pages(struct device 
*dev,
for (order_mask &= (2U << __fls(count)) - 1;
 order_mask; order_mask &= ~order_size) {
unsigned int order = __fls(order_mask);
-   gfp_t alloc_flags = gfp;
+   gfp_t alloc_flags = page_gfp;
  
  			order_size = 1U << order;

if (order_mask > order_size)
@@ -680,7 +681,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct 
device *dev,
  
  	count = PAGE_ALIGN(size) >> PAGE_SHIFT;

pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT,
-   gfp);
+   gfp, GFP_KERNEL);
if (!pages)
return NULL;
  




Re: [PATCH 2/2] dma-iommu: Check CONFIG_SWIOTLB more broadly

2021-07-08 Thread Robin Murphy

On 2021-07-02 06:37, David Stevens wrote:

From: David Stevens 

Add check for CONFIG_SWIOTLB to dev_is_untrusted, so that swiotlb
related code can be removed more aggressively.


Seems logical, and I think the new name is secretly the best part since 
it clarifies the intent of 90% of the callers. However...



Signed-off-by: David Stevens 
---
  drivers/iommu/dma-iommu.c | 26 +-
  1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 24d1042cd052..614f0dd86b08 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -310,9 +310,10 @@ static void iommu_dma_flush_iotlb_all(struct iova_domain 
*iovad)
domain->ops->flush_iotlb_all(domain);
  }
  
-static bool dev_is_untrusted(struct device *dev)

+static bool dev_use_swiotlb(struct device *dev)
  {
-   return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
+   return IS_ENABLED(CONFIG_SWIOTLB) &&
+  dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
  }
  
  /**

@@ -368,7 +369,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
  
  	init_iova_domain(iovad, 1UL << order, base_pfn);
  
-	if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&

+   if (!cookie->fq_domain && (!dev || !dev_use_swiotlb(dev)) &&


...this one is unrelated to SWIOTLB. Even when we can't use bouncing to 
fully mitigate untrusted devices, it still makes sense to impose strict 
invalidation on them. Maybe we can keep dev_is_untrusted() and define 
dev_use_swiotlb() in terms of it?
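
i.e. keep both helpers, with something like:

	static bool dev_use_swiotlb(struct device *dev)
	{
		return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev);
	}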


Robin.


domain->ops->flush_iotlb_all && !iommu_get_dma_strict(domain)) {
if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
  iommu_dma_entry_dtor))
@@ -553,8 +554,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
*dev, phys_addr_t phys,
 * If both the physical buffer start address and size are
 * page aligned, we don't need to use a bounce page.
 */
-   if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) &&
-   iova_offset(iovad, phys | org_size)) {
+   if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | org_size)) {
aligned_size = iova_align(iovad, org_size);
phys = swiotlb_tbl_map_single(dev, phys, org_size,
  aligned_size, dir,
@@ -779,7 +779,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
*dev,
  {
phys_addr_t phys;
  
-	if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))

+   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
return;
  
  	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);

@@ -794,7 +794,7 @@ static void __iommu_dma_sync_single_for_device(struct 
device *dev,
dma_addr_t dma_handle, size_t size,
enum dma_data_direction dir, phys_addr_t phys)
  {
-   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
+   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
return;
  
  	if (phys == 0)

@@ -821,10 +821,10 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
struct scatterlist *sg;
int i;
  
-	if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))

+   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
return;
  
-	if (dev_is_untrusted(dev))

+   if (dev_use_swiotlb(dev))
for_each_sg(sgl, sg, nelems, i)
iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
  sg->length, dir);
@@ -840,10 +840,10 @@ static void iommu_dma_sync_sg_for_device(struct device 
*dev,
struct scatterlist *sg;
int i;
  
-	if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))

+   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
return;
  
-	if (dev_is_untrusted(dev))

+   if (dev_use_swiotlb(dev))
for_each_sg(sgl, sg, nelems, i)
__iommu_dma_sync_single_for_device(dev,
   sg_dma_address(sg),
@@ -1010,7 +1010,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
iommu_deferred_attach(dev, domain))
return 0;
  
-	if (dev_is_untrusted(dev)) {

+   if (dev_use_swiotlb(dev)) {
early_mapped = iommu_dma_map_sg_swiotlb(dev, sg, nents,
dir, attrs);
if (!early_mapped)
@@ -1092,7 +1092,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct 
scatterlist *sg,
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
  
-	if (dev_is_untrusted(dev)) {

+   if (dev_use_swiotlb(dev)) {
iommu_dma_unmap_sg_swiotlb(dev, s

[PATCH v2 0/4] Fixes for dma-iommu swiotlb bounce buffers

2021-07-08 Thread David Stevens
From: David Stevens 

This patch set includes two fixes for bugs caused by mixing up the
original buffer's physical address and bounce buffer's physical address.
It also includes a performance fix that avoids an extra copy, as well as
a general cleanup fix.

The issues were found via code inspection, so I don't have any specific
use cases where things were not working, or any performance numbers.

v1 -> v2:
 - Split fixes into dedicated patches
 - Less invasive changes to fix arch_sync when mapping
 - Leave dev_is_untrusted check for strict iommu

David Stevens (4):
  dma-iommu: fix sync_sg with swiotlb
  dma-iommu: fix arch_sync_dma for map with swiotlb
  dma-iommu: pass SKIP_CPU_SYNC to swiotlb unmap
  dma-iommu: Check CONFIG_SWIOTLB more broadly

 drivers/iommu/dma-iommu.c | 63 ---
 1 file changed, 33 insertions(+), 30 deletions(-)

-- 
2.32.0.93.g670b81a890-goog



[PATCH v2 1/4] dma-iommu: fix sync_sg with swiotlb

2021-07-08 Thread David Stevens
From: David Stevens 

The is_swiotlb_buffer function takes the physical address of the swiotlb
buffer, not the physical address of the original buffer. The sglist
contains the physical addresses of the original buffer, so for the
sync_sg functions to work properly when a bounce buffer might have been
used, we need to use iommu_iova_to_phys to look up the physical address.
This is what sync_single does, so call that function on each sglist
segment.

The previous code mostly worked because swiotlb does the transfer on map
and unmap. However, any callers which use DMA_ATTR_SKIP_CPU_SYNC with
sglists or which call sync_sg would not have had anything copied to the
bounce buffer.
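
For example, a caller following this (purely illustrative) pattern
relies entirely on the sync calls to reach the bounce buffer:

	/* Map once, skipping the implicit copy on map/unmap. */
	count = dma_map_sg_attrs(dev, sgl, nents, DMA_TO_DEVICE,
				 DMA_ATTR_SKIP_CPU_SYNC);

	/* ... CPU writes to the original buffer ... */

	/* Without this fix, nothing would reach the bounce buffer here. */
	dma_sync_sg_for_device(dev, sgl, nents, DMA_TO_DEVICE);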

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Signed-off-by: David Stevens 
---
 drivers/iommu/dma-iommu.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..eac65302439e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -811,14 +811,14 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
return;
 
-   for_each_sg(sgl, sg, nelems, i) {
-   if (!dev_is_dma_coherent(dev))
-   arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
-
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (dev_is_untrusted(dev))
+   for_each_sg(sgl, sg, nelems, i)
+   iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
+ sg->length, dir);
+   else
+   for_each_sg(sgl, sg, nelems, i)
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
-   }
 }
 
 static void iommu_dma_sync_sg_for_device(struct device *dev,
@@ -831,14 +831,14 @@ static void iommu_dma_sync_sg_for_device(struct device 
*dev,
if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
return;
 
-   for_each_sg(sgl, sg, nelems, i) {
-   if (is_swiotlb_buffer(sg_phys(sg)))
-   swiotlb_sync_single_for_device(dev, sg_phys(sg),
-  sg->length, dir);
-
-   if (!dev_is_dma_coherent(dev))
+   if (dev_is_untrusted(dev))
+   for_each_sg(sgl, sg, nelems, i)
+   iommu_dma_sync_single_for_device(dev,
+sg_dma_address(sg),
+sg->length, dir);
+   else
+   for_each_sg(sgl, sg, nelems, i)
arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
-   }
 }
 
 static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
-- 
2.32.0.93.g670b81a890-goog



[PATCH v2 2/4] dma-iommu: fix arch_sync_dma for map with swiotlb

2021-07-08 Thread David Stevens
From: David Stevens 

When calling arch_sync_dma, we need to pass it the memory that's
actually being used for dma. When using swiotlb bounce buffers, this is
the bounce buffer. Move arch_sync_dma into the __iommu_dma_map_swiotlb
helper, so it can use the bounce buffer address if necessary. This also
means it is no longer necessary to call iommu_dma_sync_sg_for_device in
iommu_dma_map_sg for untrusted devices.

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Signed-off-by: David Stevens 
---
 drivers/iommu/dma-iommu.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index eac65302439e..e79e274d2dc5 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -574,6 +574,9 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
*dev, phys_addr_t phys,
memset(padding_start, 0, padding_size);
}
 
+   if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+   arch_sync_dma_for_device(phys, org_size, dir);
+
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
@@ -847,14 +850,9 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, 
struct page *page,
 {
phys_addr_t phys = page_to_phys(page) + offset;
bool coherent = dev_is_dma_coherent(dev);
-   dma_addr_t dma_handle;
 
-   dma_handle = __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev),
+   return __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev),
coherent, dir, attrs);
-   if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   dma_handle != DMA_MAPPING_ERROR)
-   arch_sync_dma_for_device(phys, size, dir);
-   return dma_handle;
 }
 
 static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
@@ -997,12 +995,12 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
iommu_deferred_attach(dev, domain))
return 0;
 
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-   iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
-
if (dev_is_untrusted(dev))
return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
 
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+   iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
+
/*
 * Work out how much IOVA space we need, and align the segments to
 * IOVA granules for the IOMMU driver to handle. With some clever
-- 
2.32.0.93.g670b81a890-goog



[PATCH v2 3/4] dma-iommu: pass SKIP_CPU_SYNC to swiotlb unmap

2021-07-08 Thread David Stevens
From: David Stevens 

If SKIP_CPU_SYNC isn't already set, then iommu_dma_unmap_(page|sg) has
already called iommu_dma_sync_(single|sg)_for_cpu, so there is no need
to copy from the bounce buffer again.

Signed-off-by: David Stevens 
---
 drivers/iommu/dma-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index e79e274d2dc5..0a9a9a343e64 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -505,7 +505,8 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
dma_addr_t dma_addr,
__iommu_dma_unmap(dev, dma_addr, size);
 
if (unlikely(is_swiotlb_buffer(phys)))
-   swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+   swiotlb_tbl_unmap_single(dev, phys, size, dir,
+attrs | DMA_ATTR_SKIP_CPU_SYNC);
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-- 
2.32.0.93.g670b81a890-goog



[PATCH v2 4/4] dma-iommu: Check CONFIG_SWIOTLB more broadly

2021-07-08 Thread David Stevens
From: David Stevens 

Introduce a new dev_use_swiotlb function to guard swiotlb code, instead
of overloading dev_is_untrusted. This allows CONFIG_SWIOTLB to be
checked more broadly, so the swiotlb related code can be removed more
aggressively.

Signed-off-by: David Stevens 
---
 drivers/iommu/dma-iommu.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0a9a9a343e64..d8a0764c69aa 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -315,6 +315,11 @@ static bool dev_is_untrusted(struct device *dev)
return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
 }
 
+static bool dev_use_swiotlb(struct device *dev)
+{
+   return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev);
+}
+
 /**
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
@@ -552,8 +557,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
*dev, phys_addr_t phys,
 * If both the physical buffer start address and size are
 * page aligned, we don't need to use a bounce page.
 */
-   if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) &&
-   iova_offset(iovad, phys | org_size)) {
+   if (dev_use_swiotlb(dev) && iova_offset(iovad, phys | org_size)) {
aligned_size = iova_align(iovad, org_size);
phys = swiotlb_tbl_map_single(dev, phys, org_size,
  aligned_size, dir, attrs);
@@ -778,7 +782,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
*dev,
 {
phys_addr_t phys;
 
-   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
+   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
@@ -794,7 +798,7 @@ static void iommu_dma_sync_single_for_device(struct device 
*dev,
 {
phys_addr_t phys;
 
-   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
+   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
@@ -812,10 +816,10 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
struct scatterlist *sg;
int i;
 
-   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
+   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
return;
 
-   if (dev_is_untrusted(dev))
+   if (dev_use_swiotlb(dev))
for_each_sg(sgl, sg, nelems, i)
iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
  sg->length, dir);
@@ -832,10 +836,10 @@ static void iommu_dma_sync_sg_for_device(struct device 
*dev,
struct scatterlist *sg;
int i;
 
-   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
+   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev))
return;
 
-   if (dev_is_untrusted(dev))
+   if (dev_use_swiotlb(dev))
for_each_sg(sgl, sg, nelems, i)
iommu_dma_sync_single_for_device(dev,
 sg_dma_address(sg),
@@ -996,7 +1000,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
iommu_deferred_attach(dev, domain))
return 0;
 
-   if (dev_is_untrusted(dev))
+   if (dev_use_swiotlb(dev))
return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
@@ -1071,7 +1075,7 @@ static void iommu_dma_unmap_sg(struct device *dev, struct 
scatterlist *sg,
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
 
-   if (dev_is_untrusted(dev)) {
+   if (dev_use_swiotlb(dev)) {
iommu_dma_unmap_sg_swiotlb(dev, sg, nents, dir, attrs);
return;
}
-- 
2.32.0.93.g670b81a890-goog



Re: [PATCH 0/4] Add dynamic iommu backed bounce buffers

2021-07-08 Thread David Stevens
On Thu, Jul 8, 2021 at 10:38 PM Lu Baolu  wrote:
>
> Hi David,
>
> I like this idea. Thanks for proposing this.
>
> On 2021/7/7 15:55, David Stevens wrote:
> > Add support for per-domain dynamic pools of iommu bounce buffers to the
> > dma-iommu API. This allows iommu mappings to be reused while still
> > maintaining strict iommu protection. Allocating buffers dynamically
> > instead of using swiotlb carveouts makes per-domain pools more amenable
> > on systems with large numbers of devices or where devices are unknown.
>
> Have you ever considered leveraging the per-device swiotlb memory pool
> added by below series?
>
> https://lore.kernel.org/linux-iommu/20210625123004.GA3170@willie-the-truck/

I'm not sure if that's a good fit. The swiotlb pools are allocated
during device initialization, so they require setting aside the
worst-case amount of memory. That's okay if you only use it with a
small number of devices where you know in advance approximately how
much memory they use. However, it doesn't work as well if you want to
use it with a large number of devices, or with unknown (i.e.
hotplugged) devices.

> >
> > When enabled, all non-direct streaming mappings below a configurable
> > size will go through bounce buffers. Note that this means drivers which
> > don't properly use the DMA API (e.g. i915) cannot use an iommu when this
> > feature is enabled. However, all drivers which work with swiotlb=force
> > should work.
>
> If so, why not make it more scalable by adding a callback into vendor
> iommu drivers? The vendor iommu drivers have enough information to tell
> whether the bounce buffer is feasible for a specific domain.

I'm not very familiar with the specifics of VT-d or restrictions with
the graphics hardware, but at least on the surface it looks like a
limitation of the i915 driver's implementation. The driver uses the
DMA_ATTR_SKIP_CPU_SYNC flag, but never calls the dma_sync functions,
since things are coherent on x86 hardware. However, bounce buffers
violate the driver's assumption that there's no need to sync the CPU
and device domain. I doubt there's an inherent limitation of the
hardware here, it's just how the driver is implemented. Given that, I
don't know if it's something the iommu driver needs to handle.

One potential way this could be addressed would be to add explicit
support to the DMA API for long-lived streaming mappings. Drivers can
get that behavior today via DMA_ATTR_SKIP_CPU_SYNC and dma_sync.
However, the DMA API doesn't really have enough information to treat
ephemeral and long-lived mappings differently. With a new DMA_ATTR
flag for long-lived streaming mappings, the DMA API could skip bounce
buffers. That flag could also be used as a performance optimization in
the various dma-buf implementations, since they seem to mostly fall
into the long-lived streaming category (the handful I checked do call
dma_sync, so there isn't a correctness issue).
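
Purely as an illustration (DMA_ATTR_LONG_LIVED is a hypothetical flag
that does not exist today), a driver could then opt out of bouncing
with something like:

	/* Hypothetical flag: mark this streaming mapping as long-lived so
	 * that iommu-dma can map the buffer in place instead of bouncing. */
	dma_addr = dma_map_page_attrs(dev, page, 0, size, DMA_BIDIRECTIONAL,
				      DMA_ATTR_SKIP_CPU_SYNC |
				      DMA_ATTR_LONG_LIVED);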

-David

> >
> > Bounce buffers serve as an optimization in situations where interactions
> > with the iommu are very costly. For example, virtio-iommu operations in
>
> The simulated IOMMU does the same thing.
>
> It's also an optimization for bare metal in cases where the strict mode
> of cache invalidation is used. CPU moving data is faster than IOMMU
> cache invalidation if the buffer is small.
>
> Best regards,
> baolu