Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Vivek Gautam




On 6/3/2019 11:50 AM, Tomasz Figa wrote:

On Mon, Jun 3, 2019 at 4:40 AM Rob Clark  wrote:

On Fri, May 10, 2019 at 7:35 AM Rob Clark  wrote:

On Tue, Dec 4, 2018 at 2:29 PM Rob Herring  wrote:

On Sat, Dec 1, 2018 at 10:54 AM Rob Clark  wrote:

This solves a problem we see with drm/msm, caused by getting
iommu_dma_ops while we attach our own domain and manage it directly at
the iommu API level:

   [0038] user address but active_mm is swapper
   Internal error: Oops: 9605 [#1] PREEMPT SMP
   Modules linked in:
   CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW 4.19.3 #90
   Hardware name: xxx (DT)
   Workqueue: events deferred_probe_work_func
   pstate: 80c9 (Nzcv daif +PAN +UAO)
   pc : iommu_dma_map_sg+0x7c/0x2c8
   lr : iommu_dma_map_sg+0x40/0x2c8
   sp : ff80095eb4f0
   x29: ff80095eb4f0 x28: 
   x27: ffc0f9431578 x26: 
   x25:  x24: 0003
   x23: 0001 x22: ffc0fa9ac010
   x21:  x20: ffc0fab40980
   x19: ffc0fab40980 x18: 0003
   x17: 01c4 x16: 0007
   x15: 000e x14: 
   x13:  x12: 0028
   x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
   x9 :  x8 : ffc0fab409a0
   x7 :  x6 : 0002
   x5 : 0001 x4 : 
   x3 : 0001 x2 : 0002
   x1 : ffc0f9431578 x0 : 
   Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb)
   Call trace:
iommu_dma_map_sg+0x7c/0x2c8
__iommu_map_sg_attrs+0x70/0x84
get_pages+0x170/0x1e8
msm_gem_get_iova+0x8c/0x128
_msm_gem_kernel_new+0x6c/0xc8
msm_gem_kernel_new+0x4c/0x58
dsi_tx_buf_alloc_6g+0x4c/0x8c
msm_dsi_host_modeset_init+0xc8/0x108
msm_dsi_modeset_init+0x54/0x18c
_dpu_kms_drm_obj_init+0x430/0x474
dpu_kms_hw_init+0x5f8/0x6b4
msm_drm_bind+0x360/0x6c8
try_to_bring_up_master.part.7+0x28/0x70
component_master_add_with_match+0xe8/0x124
msm_pdev_probe+0x294/0x2b4
platform_drv_probe+0x58/0xa4
really_probe+0x150/0x294
driver_probe_device+0xac/0xe8
__device_attach_driver+0xa4/0xb4
bus_for_each_drv+0x98/0xc8
__device_attach+0xac/0x12c
device_initial_probe+0x24/0x30
bus_probe_device+0x38/0x98
deferred_probe_work_func+0x78/0xa4
process_one_work+0x24c/0x3dc
worker_thread+0x280/0x360
kthread+0x134/0x13c
ret_from_fork+0x10/0x18
   Code: d284 91000725 6b17039f 5400048a (f9401f40)
   ---[ end trace f22dda57f3648e2c ]---
   Kernel panic - not syncing: Fatal exception
   SMP: stopping secondary CPUs
   Kernel Offset: disabled
   CPU features: 0x0,22802a18
   Memory Limit: none

The problem is that when drm/msm does its own iommu_attach_device(),
now the domain returned by iommu_get_domain_for_dev() is drm/msm's
domain, and it doesn't have domain->iova_cookie.

We kind of avoided this problem prior to sdm845/dpu because the iommu
was attached to the mdp node in dt, which is a child of the toplevel
mdss node (which corresponds to the dev passed in dma_map_sg()).  But
with sdm845, now the iommu is attached at the mdss level so we hit the
iommu_dma_ops in dma_map_sg().

But auto allocating/attaching a domain before the driver is probed was
already a blocking problem for enabling per-context pagetables for the
GPU.  This problem is also now solved with this patch.

Fixes: 97890ba9289c ("dma-mapping: detect and configure IOMMU in of_dma_configure")
Tested-by: Douglas Anderson 
Signed-off-by: Rob Clark 
---
This is an alternative/replacement for [1].  What it lacks in elegance
it makes up for in practicality ;-)

[1] https://patchwork.freedesktop.org/patch/264930/

  drivers/of/device.c | 22 ++
  1 file changed, 22 insertions(+)

diff --git a/drivers/of/device.c b/drivers/of/device.c
index 5957cd4fa262..15ffee00fb22 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -72,6 +72,14 @@ int of_device_add(struct platform_device *ofdev)
 return device_add(&ofdev->dev);
  }

+static const struct of_device_id iommu_blacklist[] = {
+   { .compatible = "qcom,mdp4" },
+   { .compatible = "qcom,mdss" },
+   { .compatible = "qcom,sdm845-mdss" },
+   { .compatible = "qcom,adreno" },
+   {}
+};

Not completely clear whether this is still needed or not, but this
really won't scale. Why can't the driver for these devices override
whatever has been set up by default?


fwiw, at the moment it is not needed, but it will become needed again
to implement per-context pagetables (although I suppose for this we
only need to blacklist qcom,adreno and not also the display nodes).

So, another case I've come across, on the display side.. I'm working
on handling the case where bootloader enables display (and takes iommu
out of reset).. as soon as DMA domain gets attached we get iommu
faults, because the bootloader ...

Re: [PATCH -next] intel-iommu: fix a variable set but not used

2019-06-03 Thread Joerg Roedel
On Fri, May 31, 2019 at 04:16:02PM -0400, Qian Cai wrote:
> The commit "iommu/vt-d: Delegate the dma domain to upper layer" left an
> unused variable,
> 
> drivers/iommu/intel-iommu.c: In function 'disable_dmar_iommu':
> drivers/iommu/intel-iommu.c:1652:23: warning: variable 'domain' set but
> not used [-Wunused-but-set-variable]
> 
> Signed-off-by: Qian Cai 
> ---
>  drivers/iommu/intel-iommu.c | 4 
>  1 file changed, 4 deletions(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 0/6] iommu/ipmmu-vmsa: Suspend/resume support and assorted cleanups

2019-06-03 Thread Joerg Roedel
On Mon, May 27, 2019 at 01:52:47PM +0200, Geert Uytterhoeven wrote:
> Geert Uytterhoeven (6):
>   iommu/ipmmu-vmsa: Link IOMMUs and devices in sysfs
>   iommu/ipmmu-vmsa: Prepare to handle 40-bit error addresses
>   iommu/ipmmu-vmsa: Make IPMMU_CTX_MAX unsigned
>   iommu/ipmmu-vmsa: Move num_utlbs to SoC-specific features
>   iommu/ipmmu-vmsa: Extract hardware context initialization
>   iommu/ipmmu-vmsa: Add suspend/resume support

Applied, thanks Geert.


Re: [PATCH] iommu/dma: Fix condition check in iommu_dma_unmap_sg

2019-06-03 Thread Joerg Roedel
On Wed, May 29, 2019 at 01:15:32AM -0700, Nathan Chancellor wrote:
> Fixes: 06d60728ff5c ("iommu/dma: move the arm64 wrappers to common code")
> Link: https://github.com/ClangBuiltLinux/linux/issues/497
> Signed-off-by: Nathan Chancellor 
> ---
>  drivers/iommu/dma-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.


Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Rob Clark
On Mon, Jun 3, 2019 at 12:57 AM Vivek Gautam wrote:
>
> [snip: full quote of the patch description and oops trace — identical to the first message in this thread; the archived message is cut off here]

Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Rob Clark
On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa wrote:
>
> [snip: full quote of the patch description and oops trace — identical to the first message in this thread; the archived message is cut off here]

Re: [PATCH v3 0/4] iommu/amd: Convert the AMD iommu driver to the dma-iommu api

2019-06-03 Thread Joerg Roedel
Hi Tom,

On Mon, May 06, 2019 at 07:52:02PM +0100, Tom Murphy wrote:
> Convert the AMD iommu driver to the dma-iommu api. Remove the iova
> handling and reserve region code from the AMD iommu driver.

Thank you for your work on this! I appreciate that much, but I am not
sure we are ready to make that move for the AMD and Intel IOMMU drivers
yet.

My main concern right now is that these changes will add a per-page
table lock into the fast-path for dma-mapping operations. There has been
much work in the past to remove all locking from these code-paths and
make it scalable on x86.

The dma-ops implementations in the x86 IOMMU drivers have the benefit
that they can call their page-table manipulation functions directly and
without locks, because they can make the necessary assumptions. The
IOMMU-API mapping/unmapping path can't make these assumptions because it
is also used for non-DMA-API use-cases.

So before we can move the AMD and Intel drivers to the generic DMA-API
implementation we need to solve this problem to not introduce new
scalability regressions.

Regards,

Joerg



Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Vivek Gautam
On Mon, Jun 3, 2019 at 4:14 PM Rob Clark wrote:
>
> [snip: full quote of the patch description and oops trace — identical to the first message in this thread; the archived message is cut off here]

Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Robin Murphy

On 03/06/2019 11:47, Rob Clark wrote:

[snip: full quote of the patch description, oops trace, blacklist hunk, and follow-up discussion — identical to the first message in this thread; the archived message is cut off before Robin's reply]

Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Christoph Hellwig
If you (and a few other actors in the thread) want people to actually
read what you wrote, please follow proper mailing list etiquette.  I've
given up on reading all the recent mails after scrolling through two
pages of full quotes.


Re: [PATCH v3 0/4] iommu/amd: Convert the AMD iommu driver to the dma-iommu api

2019-06-03 Thread Tom Murphy via iommu
On Mon, Jun 3, 2019 at 11:52 AM Joerg Roedel  wrote:
>
> Hi Tom,
>
> On Mon, May 06, 2019 at 07:52:02PM +0100, Tom Murphy wrote:
> > Convert the AMD iommu driver to the dma-iommu api. Remove the iova
> > handling and reserve region code from the AMD iommu driver.
>
> Thank you for your work on this! I appreciate that much, but I am not
> sure we are ready to make that move for the AMD and Intel IOMMU drivers
> yet.
>
> My main concern right now is that these changes will add a per-page
> table lock into the fast-path for dma-mapping operations. There has been
> much work in the past to remove all locking from these code-paths and
> make it scalable on x86.

Where is the locking introduced? Intel doesn't use a lock in its
iommu_map function:
https://github.com/torvalds/linux/blob/f2c7c76c5d0a443053e94adb9f0918fa2fb85c3a/drivers/iommu/intel-iommu.c#L5302
because it cleverly uses cmpxchg64 to avoid using locks:
https://github.com/torvalds/linux/blob/f2c7c76c5d0a443053e94adb9f0918fa2fb85c3a/drivers/iommu/intel-iommu.c#L900
And the locking in AMD's iommu_map function can be removed (and I have
removed it in my patch set) because it does the same thing as Intel:
https://github.com/torvalds/linux/blob/f2c7c76c5d0a443053e94adb9f0918fa2fb85c3a/drivers/iommu/amd_iommu.c#L1486

Is there something I'm missing?

>
> The dma-ops implementations in the x86 IOMMU drivers have the benefit
> that they can call their page-table manipulation functions directly and
> without locks, because they can make the necessary assumptions. The
> IOMMU-API mapping/unmapping path can't make these assumptions because it
> is also used for non-DMA-API use-cases.
>
> So before we can move the AMD and Intel drivers to the generic DMA-API
> implementation we need to solve this problem to not introduce new
> scalability regressions.
>
> Regards,
>
> Joerg
>


Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Vivek Gautam
On Mon, Jun 3, 2019 at 4:47 PM Christoph Hellwig  wrote:
>
> If you (and a few other actors in the thread) want people to actually
> read what you wrote, please follow proper mailing list etiquette.  I've
> given up on reading all the recent mails after scrolling through two
> pages of full quotes.

Apologies for not cutting down the quoted text. I will be more careful
next time onwards.

Regards
Vivek

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v8 3/7] s390/pci: add support for IOMMU default DMA mode build options

2019-06-03 Thread Sebastian Ott


On Thu, 30 May 2019, Zhen Lei wrote:
> The default DMA mode on s390 is LAZY; this patch allows it to be set to
> STRICT at build time. It can still be overridden by a boot option.
> 
> There is no functional change.
> 
> Signed-off-by: Zhen Lei 

Acked-by: Sebastian Ott 



[PATCH v3] iommu/arm-smmu: Avoid constant zero in TLBI writes

2019-06-03 Thread Marc Gonzalez
From: Robin Murphy 

Apparently, some Qualcomm arm64 platforms which appear to expose their
SMMU global register space are still, in fact, using a hypervisor to
mediate it by trapping and emulating register accesses. Sadly, some
deployed versions of said trapping code have bugs wherein they go
horribly wrong for stores using r31 (i.e. XZR/WZR) as the source
register.

While this can be mitigated for GCC today by tweaking the constraints
for the implementation of writel_relaxed(), to avoid any potential
arms race with future compilers more aggressively optimising register
allocation, the simple way is to just remove all the problematic
constant zeros. For the write-only TLB operations, the actual value is
irrelevant anyway and any old nearby variable will provide a suitable
GPR to encode. The one point at which we really do need a zero to clear
a context bank happens before any of the TLB maintenance where crashes
have been reported, so is apparently not a problem... :/

Reported-by: AngeloGioacchino Del Regno 
Tested-by: Marc Gonzalez 
Signed-off-by: Robin Murphy 
Signed-off-by: Marc Gonzalez 
---
Changes from v2:
- Define and use QCOM_DUMMY_VAL for the 3 problematic mmio writes
- Drop previous Reviewed-by and Tested-by tags (rationale: we are now writing a 
different value)

Boot log:
REMAP: PA=0168 VA=ff80111b SIZE=1
arm-smmu 168.arm,smmu: probing hardware configuration...
arm-smmu 168.arm,smmu: SMMUv2 with:
arm-smmu 168.arm,smmu:  stage 1 translation
arm-smmu 168.arm,smmu:  address translation ops
arm-smmu 168.arm,smmu:  non-coherent table walk
arm-smmu 168.arm,smmu:  (IDR0.CTTW overridden by FW configuration)
arm-smmu 168.arm,smmu:  stream matching with 16 register groups
arm-smmu 168.arm,smmu:  6 context banks (0 stage-2 only)
arm-smmu 168.arm,smmu:  Supported page sizes: 0x63315000
arm-smmu 168.arm,smmu:  Stage-1: 36-bit VA -> 36-bit IPA
[SMMU + 48] = 
[SMMU + 000c00] = 0002
[SMMU + 000800] = 
[SMMU + 000c04] = 0002
[SMMU + 000804] = 
[SMMU + 000c08] = 0002
[SMMU + 000808] = 
[SMMU + 000c0c] = 0002
[SMMU + 00080c] = 
[SMMU + 000c10] = 0002
[SMMU + 000810] = 
[SMMU + 000c14] = 0002
[SMMU + 000814] = 
[SMMU + 000c18] = 0002
[SMMU + 000818] = 
[SMMU + 000c1c] = 0002
[SMMU + 00081c] = 
[SMMU + 000c20] = 0002
[SMMU + 000820] = 
[SMMU + 000c24] = 0002
[SMMU + 000824] = 
[SMMU + 000c28] = 0002
[SMMU + 000828] = 
[SMMU + 000c2c] = 0002
[SMMU + 00082c] = 
[SMMU + 000c30] = 0002
[SMMU + 000830] = 
[SMMU + 000c34] = 0002
[SMMU + 000834] = 
[SMMU + 000c38] = 0002
[SMMU + 000838] = 
[SMMU + 000c3c] = 0002
[SMMU + 00083c] = 
[SMMU + 008000] = 
[SMMU + 008058] = c1fe
[SMMU + 009000] = 
[SMMU + 009058] = c1fe
[SMMU + 00a000] = 
[SMMU + 00a058] = c1fe
[SMMU + 00b000] = 
[SMMU + 00b058] = c1fe
[SMMU + 00c000] = 
[SMMU + 00c058] = c1fe
[SMMU + 00d000] = 
[SMMU + 00d058] = c1fe
[SMMU + 6c] = 
[SMMU + 68] = 
[SMMU + 70] = 
[SMMU + 00] = 00201e36
[SMMU + 000800] = 1fff
[SMMU + 000800] = 1fff
[SMMU + 001800] = 0001
[SMMU + 001000] = 0001f300
[SMMU + 008010] = 00038011
[SMMU + 008030] = 0080351c
[SMMU + 008020] = 785d5000
[SMMU + 008028] = 
[SMMU + 008038] = 0004ff44
[SMMU + 00803c] = 
[SMMU + 008000] = 1067
[SMMU + 000c00] = 
atl1c :01:00.0: Adding to iommu group 0
[SMMU + 000c00] = 
[SMMU + 000800] = 80001480
---
 drivers/iommu/arm-smmu.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 930c07635956..9435e4a7759f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -59,6 +59,15 @@
 
 #include "arm-smmu-regs.h"
 
+/*
+ * Apparently, some Qualcomm arm64 platforms which appear to expose their SMMU
+ * global register space are still, in fact, using a hypervisor to mediate it
+ * by trapping and emulating register accesses. Sadly, some deployed versions
+ * of said trapping code have bugs wherein they go horribly wrong for stores
+ * using r31 (i.e. XZR/WZR) as the source register.
+ */
+#define QCOM_DUMMY_VAL -1
+
 #define ARM_MMU500_ACTLR_CPRE  (1 

[PATCH] iommu: replace single-char identifiers in macros

2019-06-03 Thread Qian Cai
There are a few macros in the IOMMU code whose single-character identifiers
make the code hard to read and debug. Replace them with meaningful names.

Suggested-by: Andrew Morton 
Signed-off-by: Qian Cai 
---
 include/linux/dmar.h | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index f8af1d770520..eb634912f475 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -104,12 +104,14 @@ static inline bool dmar_rcu_check(void)
 
 #define	dmar_rcu_dereference(p)	rcu_dereference_check((p), dmar_rcu_check())
 
-#define	for_each_dev_scope(a, c, p, d)  \
-   for ((p) = 0; ((d) = (p) < (c) ? dmar_rcu_dereference((a)[(p)].dev) : \
-   NULL, (p) < (c)); (p)++)
-
-#define	for_each_active_dev_scope(a, c, p, d)   \
-   for_each_dev_scope((a), (c), (p), (d))  if (!(d)) { continue; } else
+#define for_each_dev_scope(devs, cnt, i, tmp)  \
+   for ((i) = 0; ((tmp) = (i) < (cnt) ?\
+   dmar_rcu_dereference((devs)[(i)].dev) : NULL, (i) < (cnt)); \
+   (i)++)
+
+#define for_each_active_dev_scope(devs, cnt, i, tmp)   \
+   for_each_dev_scope((devs), (cnt), (i), (tmp))   \
+   if (!(tmp)) { continue; } else
 
 extern int dmar_table_init(void);
 extern int dmar_dev_scope_init(void);
-- 
1.8.3.1



Re: [PATCH] iommu: replace single-char identifiers in macros

2019-06-03 Thread Robin Murphy

On 03/06/2019 13:59, Qian Cai wrote:

There are a few macros in IOMMU have single-char identifiers make the
code hard to read and debug. Replace them with meaningful names.

Suggested-by: Andrew Morton 
Signed-off-by: Qian Cai 
---
  include/linux/dmar.h | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index f8af1d770520..eb634912f475 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -104,12 +104,14 @@ static inline bool dmar_rcu_check(void)
  
  #define	dmar_rcu_dereference(p)	rcu_dereference_check((p), dmar_rcu_check())
  
-#define	for_each_dev_scope(a, c, p, d)	\

-   for ((p) = 0; ((d) = (p) < (c) ? dmar_rcu_dereference((a)[(p)].dev) : \
-   NULL, (p) < (c)); (p)++)
-
-#definefor_each_active_dev_scope(a, c, p, d)   \
-   for_each_dev_scope((a), (c), (p), (d))  if (!(d)) { continue; } else
+#define for_each_dev_scope(devs, cnt, i, tmp)  \
+   for ((i) = 0; ((tmp) = (i) < (cnt) ? \


Given that "tmp" actually appears to be some sort of device cursor, I'm 
not sure that that naming really achieves the stated goal of clarity :/


Robin.


+   dmar_rcu_dereference((devs)[(i)].dev) : NULL, (i) < (cnt)); \
+   (i)++)
+
+#define for_each_active_dev_scope(devs, cnt, i, tmp)   \
+   for_each_dev_scope((devs), (cnt), (i), (tmp))   \
+   if (!(tmp)) { continue; } else
  
  extern int dmar_table_init(void);

  extern int dmar_dev_scope_init(void);



Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Rob Clark
On Mon, Jun 3, 2019 at 4:14 AM Robin Murphy  wrote:
>
> On 03/06/2019 11:47, Rob Clark wrote:
> > On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa  wrote:
> >>
> >> On Mon, Jun 3, 2019 at 4:40 AM Rob Clark  wrote:
> >>>
> >>> So, another case I've come across, on the display side.. I'm working
> >>> on handling the case where bootloader enables display (and takes iommu
> >>> out of reset).. as soon as DMA domain gets attached we get iommu
> >>> faults, because bootloader has already configured display for scanout.
> >>> Unfortunately this all happens before actual driver is probed and has
> >>> a chance to intervene.
> >>>
> >>> It's rather unfortunate that we tried to be clever rather than just
> >>> making drivers call some function to opt-in to the hookup of dma iommu
> >>> ops :-(
> >>
> >> I think it still works for the 90% of cases and if 10% needs some
> >> explicit work in the drivers, that's better than requiring 100% of the
> >> drivers to do things manually.
>
> Right, it's not about "being clever", it's about not adding opt-in code
> to the hundreds and hundreds and hundreds of drivers which *might* ever
> find themselves used on a system where they would need the IOMMU's help
> for DMA.

Well, I mean, at one point we didn't do the automatic iommu hookup, we
could have just stuck with that and added a helper so drivers could
request the hookup.  Things wouldn't have been any more broken than
before, and when people bring up systems where iommu is required, they
could have added the necessary dma_iommu_configure() call.  But that
is water under the bridge now.

> >> Adding Marek who had the same problem on Exynos.
> >
> > I do wonder how many drivers need to iommu_map in their ->probe()?
> > I'm thinking moving the auto-hookup to after a successful probe(),
> > with some function a driver could call if they need mapping in probe,
> > might be a way to eventually get rid of the blacklist.  But I've no
> > idea how to find the subset of drivers that would be broken without a
> > dma_setup_iommu_stuff() call in their probe.
>
> Wouldn't help much. That particular issue is nothing to do with DMA ops
> really, it's about IOMMU initialisation. On something like SMMUv3 there
> is literally no way to turn the thing on without blocking unknown
> traffic - it *has* to have stream table entries programmed before it can
> even allow stuff to bypass.

fwiw, on these sdm850 laptops (and I think sdm845 boards like mtp and
db845c) the SMMU (v2) is taken out of bypass by the bootloader.  Bjorn
has some patches for arm-smmu to read-back the state at boot.

(Although older devices were booting with display enabled but SMMU in bypass.)

> The answer is either to either pay attention to the "Quiesce all DMA
> capable devices" part of the boot protocol (which has been there since
> pretty much forever), or to come up with some robust way of
> communicating "live" boot-time mappings to IOMMU drivers so that they
> can program themselves appropriately during probe.

Unfortunately display lit up by bootloader is basically ubiquitous.
Every single android phone does it.  All of the windows-arm laptops do
it.  Basically 99.9% of things that have a display do it.  It's a
tough problem to solve involving clks, gdsc, regulators, etc, in
addition to the display driver.. and sadly now smmu.  And devices
where we can make changes to and update the firmware are rather rare.
So there is really no option to ignore this problem.

I guess if we had some early-quirks mechanism like x86 does, we could
mash the display off early in boot.  That would be an easy solution.
Although I'd prefer a proper solution so that android phones aren't
carrying around enormous stacks of hack patches to achieve a smooth
flicker-free boot.

I suppose arm-smmu could realize that the context bank is already
taken out of bypass..  although I'm not entirely sure if we can assume
that the CPU would be able to read back the pagetable setup by the
bootloader.  Maybe Vivek has an idea about that?

BR,
-R
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: replace single-char identifiers in macros

2019-06-03 Thread Qian Cai
On Mon, 2019-06-03 at 14:07 +0100, Robin Murphy wrote:
> On 03/06/2019 13:59, Qian Cai wrote:
> > There are a few macros in IOMMU have single-char identifiers make the
> > code hard to read and debug. Replace them with meaningful names.
> > 
> > Suggested-by: Andrew Morton 
> > Signed-off-by: Qian Cai 
> > ---
> >   include/linux/dmar.h | 14 --
> >   1 file changed, 8 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/linux/dmar.h b/include/linux/dmar.h
> > index f8af1d770520..eb634912f475 100644
> > --- a/include/linux/dmar.h
> > +++ b/include/linux/dmar.h
> > @@ -104,12 +104,14 @@ static inline bool dmar_rcu_check(void)
> >   
> >   #define   dmar_rcu_dereference(p) rcu_dereference_check((p),
> > dmar_rcu_check())
> >   
> > -#definefor_each_dev_scope(a, c, p, d)  \
> > -   for ((p) = 0; ((d) = (p) < (c) ? dmar_rcu_dereference((a)[(p)].dev)
> > : \
> > -   NULL, (p) < (c)); (p)++)
> > -
> > -#definefor_each_active_dev_scope(a, c, p, d)   \
> > -   for_each_dev_scope((a), (c), (p), (d))  if (!(d)) { continue;
> > } else
> > +#define for_each_dev_scope(devs, cnt, i, tmp)  
> > \
> > +   for ((i) = 0; ((tmp) = (i) < (cnt) ?
> > \
> 
> Given that "tmp" actually appears to be some sort of device cursor, I'm 
> not sure that that naming really achieves the stated goal of clarity :/

"tmp" is used in the callers everywhere though, although I suppose something
like "tmp_dev" can be used if you prefer.

Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Thierry Reding
On Mon, Jun 03, 2019 at 12:14:27PM +0100, Robin Murphy wrote:
> On 03/06/2019 11:47, Rob Clark wrote:
> > On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa  wrote:
> > > 
> > > On Mon, Jun 3, 2019 at 4:40 AM Rob Clark  wrote:
> > > > 
> > > > On Fri, May 10, 2019 at 7:35 AM Rob Clark  wrote:
> > > > > 
> > > > > On Tue, Dec 4, 2018 at 2:29 PM Rob Herring  wrote:
> > > > > > 
> > > > > > On Sat, Dec 1, 2018 at 10:54 AM Rob Clark  
> > > > > > wrote:
> > > > > > > 
> > > > > > > This solves a problem we see with drm/msm, caused by getting
> > > > > > > iommu_dma_ops while we attach our own domain and manage it 
> > > > > > > directly at
> > > > > > > the iommu API level:
> > > > > > > 
> > > > > > >[0038] user address but active_mm is swapper
> > > > > > >Internal error: Oops: 9605 [#1] PREEMPT SMP
> > > > > > >Modules linked in:
> > > > > > >CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW 
> > > > > > > 4.19.3 #90
> > > > > > >Hardware name: xxx (DT)
> > > > > > >Workqueue: events deferred_probe_work_func
> > > > > > >pstate: 80c9 (Nzcv daif +PAN +UAO)
> > > > > > >pc : iommu_dma_map_sg+0x7c/0x2c8
> > > > > > >lr : iommu_dma_map_sg+0x40/0x2c8
> > > > > > >sp : ff80095eb4f0
> > > > > > >x29: ff80095eb4f0 x28: 
> > > > > > >x27: ffc0f9431578 x26: 
> > > > > > >x25:  x24: 0003
> > > > > > >x23: 0001 x22: ffc0fa9ac010
> > > > > > >x21:  x20: ffc0fab40980
> > > > > > >x19: ffc0fab40980 x18: 0003
> > > > > > >x17: 01c4 x16: 0007
> > > > > > >x15: 000e x14: 
> > > > > > >x13:  x12: 0028
> > > > > > >x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> > > > > > >x9 :  x8 : ffc0fab409a0
> > > > > > >x7 :  x6 : 0002
> > > > > > >x5 : 0001 x4 : 
> > > > > > >x3 : 0001 x2 : 0002
> > > > > > >x1 : ffc0f9431578 x0 : 
> > > > > > >Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb)
> > > > > > >Call trace:
> > > > > > > iommu_dma_map_sg+0x7c/0x2c8
> > > > > > > __iommu_map_sg_attrs+0x70/0x84
> > > > > > > get_pages+0x170/0x1e8
> > > > > > > msm_gem_get_iova+0x8c/0x128
> > > > > > > _msm_gem_kernel_new+0x6c/0xc8
> > > > > > > msm_gem_kernel_new+0x4c/0x58
> > > > > > > dsi_tx_buf_alloc_6g+0x4c/0x8c
> > > > > > > msm_dsi_host_modeset_init+0xc8/0x108
> > > > > > > msm_dsi_modeset_init+0x54/0x18c
> > > > > > > _dpu_kms_drm_obj_init+0x430/0x474
> > > > > > > dpu_kms_hw_init+0x5f8/0x6b4
> > > > > > > msm_drm_bind+0x360/0x6c8
> > > > > > > try_to_bring_up_master.part.7+0x28/0x70
> > > > > > > component_master_add_with_match+0xe8/0x124
> > > > > > > msm_pdev_probe+0x294/0x2b4
> > > > > > > platform_drv_probe+0x58/0xa4
> > > > > > > really_probe+0x150/0x294
> > > > > > > driver_probe_device+0xac/0xe8
> > > > > > > __device_attach_driver+0xa4/0xb4
> > > > > > > bus_for_each_drv+0x98/0xc8
> > > > > > > __device_attach+0xac/0x12c
> > > > > > > device_initial_probe+0x24/0x30
> > > > > > > bus_probe_device+0x38/0x98
> > > > > > > deferred_probe_work_func+0x78/0xa4
> > > > > > > process_one_work+0x24c/0x3dc
> > > > > > > worker_thread+0x280/0x360
> > > > > > > kthread+0x134/0x13c
> > > > > > > ret_from_fork+0x10/0x18
> > > > > > >Code: d284 91000725 6b17039f 5400048a (f9401f40)
> > > > > > >---[ end trace f22dda57f3648e2c ]---
> > > > > > >Kernel panic - not syncing: Fatal exception
> > > > > > >SMP: stopping secondary CPUs
> > > > > > >Kernel Offset: disabled
> > > > > > >CPU features: 0x0,22802a18
> > > > > > >Memory Limit: none
> > > > > > > 
> > > > > > > The problem is that when drm/msm does it's own 
> > > > > > > iommu_attach_device(),
> > > > > > > now the domain returned by iommu_get_domain_for_dev() is drm/msm's
> > > > > > > domain, and it doesn't have domain->iova_cookie.
> > > > > > > 
> > > > > > > We kind of avoided this problem prior to sdm845/dpu because the 
> > > > > > > iommu
> > > > > > > was attached to the mdp node in dt, which is a child of the 
> > > > > > > toplevel
> > > > > > > mdss node (which corresponds to the dev passed in dma_map_sg()).  
> > > > > > > But
> > > > > > > with sdm845, now the iommu is attached at the mdss level so we 
> > > > > > > hit the
> > > > > > > iommu_dma_ops in dma_map_sg().
> > > > > > > 
> > > > > > > But auto allocating/attaching a domain before the driver is 
> > > > > > > probed was
> > > > > > > already a blocking problem for enabling per-context pagetables 
> > > > > > > for the
> > > > > > > GPU.  This problem is also now solved with this patch.
> > > > > > > 
> > > > > > > Fi

Re: [PATCH] iommu: replace single-char identifiers in macros

2019-06-03 Thread Robin Murphy

On 03/06/2019 14:29, Qian Cai wrote:

On Mon, 2019-06-03 at 14:07 +0100, Robin Murphy wrote:

On 03/06/2019 13:59, Qian Cai wrote:

There are a few macros in IOMMU have single-char identifiers make the
code hard to read and debug. Replace them with meaningful names.

Suggested-by: Andrew Morton 
Signed-off-by: Qian Cai 
---
   include/linux/dmar.h | 14 --
   1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index f8af1d770520..eb634912f475 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -104,12 +104,14 @@ static inline bool dmar_rcu_check(void)
   
   #define	dmar_rcu_dereference(p)	rcu_dereference_check((p),

dmar_rcu_check())
   
-#define	for_each_dev_scope(a, c, p, d)	\

-   for ((p) = 0; ((d) = (p) < (c) ? dmar_rcu_dereference((a)[(p)].dev)
: \
-   NULL, (p) < (c)); (p)++)
-
-#definefor_each_active_dev_scope(a, c, p, d)   \
-   for_each_dev_scope((a), (c), (p), (d))  if (!(d)) { continue;
} else
+#define for_each_dev_scope(devs, cnt, i, tmp)  
\
+   for ((i) = 0; ((tmp) = (i) < (cnt) ? 
\


Given that "tmp" actually appears to be some sort of device cursor, I'm
not sure that that naming really achieves the stated goal of clarity :/


"tmp" is used in the callers everywhere though, although I suppose something
like "tmp_dev" can be used if you prefer.


I don't have any preference, I'm just questioning the assertion in the 
commit message - as a reader not intimately familiar with this code, 
"tmp" is honestly no more meaningful than "d" was.


Robin.

Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Thierry Reding
On Mon, Jun 03, 2019 at 06:20:57AM -0700, Rob Clark wrote:
> On Mon, Jun 3, 2019 at 4:14 AM Robin Murphy  wrote:
> >
> > On 03/06/2019 11:47, Rob Clark wrote:
> > > On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa  wrote:
> > >>
> > >> On Mon, Jun 3, 2019 at 4:40 AM Rob Clark  wrote:
> > >>>
> > >>> So, another case I've come across, on the display side.. I'm working
> > >>> on handling the case where bootloader enables display (and takes iommu
> > >>> out of reset).. as soon as DMA domain gets attached we get iommu
> > >>> faults, because bootloader has already configured display for scanout.
> > >>> Unfortunately this all happens before actual driver is probed and has
> > >>> a chance to intervene.
> > >>>
> > >>> It's rather unfortunate that we tried to be clever rather than just
> > >>> making drivers call some function to opt-in to the hookup of dma iommu
> > >>> ops :-(
> > >>
> > >> I think it still works for the 90% of cases and if 10% needs some
> > >> explicit work in the drivers, that's better than requiring 100% of the
> > >> drivers to do things manually.
> >
> > Right, it's not about "being clever", it's about not adding opt-in code
> > to the hundreds and hundreds and hundreds of drivers which *might* ever
> > find themselves used on a system where they would need the IOMMU's help
> > for DMA.
> 
> Well, I mean, at one point we didn't do the automatic iommu hookup, we
> could have just stuck with that and added a helper so drivers could
> request the hookup.  Things wouldn't have been any more broken than
> before, and when people bring up systems where iommu is required, they
> could have added the necessary dma_iommu_configure() call.  But that
> is water under the bridge now.
> 
> > >> Adding Marek who had the same problem on Exynos.
> > >
> > > I do wonder how many drivers need to iommu_map in their ->probe()?
> > > I'm thinking moving the auto-hookup to after a successful probe(),
> > > with some function a driver could call if they need mapping in probe,
> > > might be a way to eventually get rid of the blacklist.  But I've no
> > > idea how to find the subset of drivers that would be broken without a
> > > dma_setup_iommu_stuff() call in their probe.
> >
> > Wouldn't help much. That particular issue is nothing to do with DMA ops
> > really, it's about IOMMU initialisation. On something like SMMUv3 there
> > is literally no way to turn the thing on without blocking unknown
> > traffic - it *has* to have stream table entries programmed before it can
> > even allow stuff to bypass.
> 
> fwiw, on these sdm850 laptops (and I think sdm845 boards like mtp and
> db845c) the SMMU (v2) is taken out of bypass by the bootloader.  Bjorn
> has some patches for arm-smmu to read-back the state at boot.
> 
> (Although older devices were booting with display enabled but SMMU in bypass.)
> 
> > The answer is either to either pay attention to the "Quiesce all DMA
> > capable devices" part of the boot protocol (which has been there since
> > pretty much forever), or to come up with some robust way of
> > communicating "live" boot-time mappings to IOMMU drivers so that they
> > can program themselves appropriately during probe.
> 
> Unfortunately display lit up by bootloader is basically ubiquitous.
> Every single android phone does it.  All of the windows-arm laptops do
> it.  Basically 99.9% of things that have a display do it.  It's a
> tough problem to solve involving clks, gdsc, regulators, etc, in
> addition to the display driver.. and sadly now smmu.  And devices
> where we can make changes to and update the firmware are rather rare.
> So there is really no option to ignore this problem.

I think this is going to require at least some degree of cooperation
from the bootloader. See my other thread on that. Unfortunately I think
this is an area where everyone has kind of been doing their own thing
even if standard bindings for this have been around for quite a while
(actually 5 years by now). I suspect that most bootloaders that run
today are not that old, but as always downstream doesn't follow closely
where upstream is guiding.

> I guess if we had some early-quirks mechanism like x86 does, we could
> mash the display off early in boot.  That would be an easy solution.
> Although I'd prefer a proper solution so that android phones aren't
> carrying around enormous stacks of hack patches to achieve a smooth
> flicker-free boot.

The proper solution, I think, is for bootloader and kernel to work
together. Unfortunately that means we'll just have to bite the bullet
and get things fixed across the stack. I think this is just the latest
manifestation of the catch-up that upstream has been playing. Only now
that we're starting to enable all of these features upstream are we
running into interoperability issues.

If upstream had been further along we would've caught these issues way
ahead of time and could've influenced the designs of bootloader much
earlier. Now, unless we get all the vendors to go back and mo

The dma-mapping for-next tree has been rebased

2019-06-03 Thread Christoph Hellwig
I want to pull in the cache maintenance fix for the dma-iommu
conversion.  Hope it didn't cause any inconvenience.


[PATCH -next] iommu/intel: silence a variable set but not used

2019-06-03 Thread Qian Cai
The commit "iommu/vt-d: Probe DMA-capable ACPI name space devices"
introduced a compilation warning: the "iommu" variable in
for_each_active_iommu() is assigned from each element, i.e.
"drhd->iommu", but is never read in the loop body.

drivers/iommu/intel-iommu.c: In function 'probe_acpi_namespace_devices':
drivers/iommu/intel-iommu.c:4639:22: warning: variable 'iommu' set but
not used [-Wunused-but-set-variable]
  struct intel_iommu *iommu;

Silence the warning the same way as commit d3ed71e5cc50
("drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not used").

Signed-off-by: Qian Cai 
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b431cc6f6ba4..2897354a341a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4636,7 +4636,8 @@ static int __init platform_optin_force_iommu(void)
 static int __init probe_acpi_namespace_devices(void)
 {
struct dmar_drhd_unit *drhd;
-   struct intel_iommu *iommu;
+   /* To avoid a -Wunused-but-set-variable warning. */
+   struct intel_iommu *iommu __maybe_unused;
struct device *dev;
int i, ret = 0;
 
-- 
1.8.3.1



Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Rob Clark
On Mon, Jun 3, 2019 at 6:54 AM Thierry Reding  wrote:
>
> On Mon, Jun 03, 2019 at 06:20:57AM -0700, Rob Clark wrote:
> > On Mon, Jun 3, 2019 at 4:14 AM Robin Murphy  wrote:
> > >
> > > On 03/06/2019 11:47, Rob Clark wrote:
> > > > On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa  wrote:
> > > >>
> > > >> On Mon, Jun 3, 2019 at 4:40 AM Rob Clark  wrote:
> > > >>>
> > > >>> So, another case I've come across, on the display side.. I'm working
> > > >>> on handling the case where bootloader enables display (and takes iommu
> > > >>> out of reset).. as soon as DMA domain gets attached we get iommu
> > > >>> faults, because bootloader has already configured display for scanout.
> > > >>> Unfortunately this all happens before actual driver is probed and has
> > > >>> a chance to intervene.
> > > >>>
> > > >>> It's rather unfortunate that we tried to be clever rather than just
> > > >>> making drivers call some function to opt-in to the hookup of dma iommu
> > > >>> ops :-(
> > > >>
> > > >> I think it still works for the 90% of cases and if 10% needs some
> > > >> explicit work in the drivers, that's better than requiring 100% of the
> > > >> drivers to do things manually.
> > >
> > > Right, it's not about "being clever", it's about not adding opt-in code
> > > to the hundreds and hundreds and hundreds of drivers which *might* ever
> > > find themselves used on a system where they would need the IOMMU's help
> > > for DMA.
> >
> > Well, I mean, at one point we didn't do the automatic iommu hookup, we
> > could have just stuck with that and added a helper so drivers could
> > request the hookup.  Things wouldn't have been any more broken than
> > before, and when people bring up systems where iommu is required, they
> > could have added the necessary dma_iommu_configure() call.  But that
> > is water under the bridge now.
> >
> > > >> Adding Marek who had the same problem on Exynos.
> > > >
> > > > I do wonder how many drivers need to iommu_map in their ->probe()?
> > > > I'm thinking moving the auto-hookup to after a successful probe(),
> > > > with some function a driver could call if they need mapping in probe,
> > > > might be a way to eventually get rid of the blacklist.  But I've no
> > > > idea how to find the subset of drivers that would be broken without a
> > > > dma_setup_iommu_stuff() call in their probe.
> > >
> > > Wouldn't help much. That particular issue is nothing to do with DMA ops
> > > really, it's about IOMMU initialisation. On something like SMMUv3 there
> > > is literally no way to turn the thing on without blocking unknown
> > > traffic - it *has* to have stream table entries programmed before it can
> > > even allow stuff to bypass.
> >
> > fwiw, on these sdm850 laptops (and I think sdm845 boards like mtp and
> > db845c) the SMMU (v2) is taken out of bypass by the bootloader.  Bjorn
> > has some patches for arm-smmu to read-back the state at boot.
> >
> > (Although older devices were booting with display enabled but SMMU in 
> > bypass.)
> >
> > > The answer is either to either pay attention to the "Quiesce all DMA
> > > capable devices" part of the boot protocol (which has been there since
> > > pretty much forever), or to come up with some robust way of
> > > communicating "live" boot-time mappings to IOMMU drivers so that they
> > > can program themselves appropriately during probe.
> >
> > Unfortunately display lit up by bootloader is basically ubiquitous.
> > Every single android phone does it.  All of the windows-arm laptops do
> > it.  Basically 99.9% of things that have a display do it.  It's a
> > tough problem to solve involving clks, gdsc, regulators, etc, in
> > addition to the display driver.. and sadly now smmu.  And devices
> > where we can make changes to and update the firmware are rather rare.
> > So there is really no option to ignore this problem.
>
> I think this is going to require at least some degree of cooperation
> from the bootloader. See my other thread on that. Unfortunately I think
> this is an area where everyone has kind of been doing their own thing
> even if standard bindings for this have been around for quite a while
> (actually 5 years by now). I suspect that most bootloaders that run
> today are not that old, but as always downstream doesn't follow closely
> where upstream is guiding.
>
> > I guess if we had some early-quirks mechanism like x86 does, we could
> > mash the display off early in boot.  That would be an easy solution.
> > Although I'd prefer a proper solution so that android phones aren't
> > carrying around enormous stacks of hack patches to achieve a smooth
> > flicker-free boot.
>
> The proper solution, I think, is for bootloader and kernel to work
> together. Unfortunately that means we'll just have to bite the bullet
> and get things fixed across the stack. I think this is just the latest
> manifestation of the catch-up that upstream has been playing. Only now
> that we're starting to enable all of these features upstream are we
> running 

Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Thierry Reding
On Mon, Jun 03, 2019 at 07:20:14AM -0700, Rob Clark wrote:
> On Mon, Jun 3, 2019 at 6:54 AM Thierry Reding  
> wrote:
> >
> > On Mon, Jun 03, 2019 at 06:20:57AM -0700, Rob Clark wrote:
> > > On Mon, Jun 3, 2019 at 4:14 AM Robin Murphy  wrote:
> > > >
> > > > On 03/06/2019 11:47, Rob Clark wrote:
> > > > > On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa  
> > > > > wrote:
> > > > >>
> > > > >> On Mon, Jun 3, 2019 at 4:40 AM Rob Clark  wrote:
> > > > >>>
> > > > >>> So, another case I've come across, on the display side.. I'm working
> > > > >>> on handling the case where bootloader enables display (and takes 
> > > > >>> iommu
> > > > >>> out of reset).. as soon as DMA domain gets attached we get iommu
> > > > >>> faults, because bootloader has already configured display for 
> > > > >>> scanout.
> > > > >>> Unfortunately this all happens before actual driver is probed and 
> > > > >>> has
> > > > >>> a chance to intervene.
> > > > >>>
> > > > >>> It's rather unfortunate that we tried to be clever rather than just
> > > > >>> making drivers call some function to opt-in to the hookup of dma 
> > > > >>> iommu
> > > > >>> ops :-(
> > > > >>
> > > > >> I think it still works for the 90% of cases and if 10% needs some
> > > > >> explicit work in the drivers, that's better than requiring 100% of 
> > > > >> the
> > > > >> drivers to do things manually.
> > > >
> > > > Right, it's not about "being clever", it's about not adding opt-in code
> > > > to the hundreds and hundreds and hundreds of drivers which *might* ever
> > > > find themselves used on a system where they would need the IOMMU's help
> > > > for DMA.
> > >
> > > Well, I mean, at one point we didn't do the automatic iommu hookup, we
> > > could have just stuck with that and added a helper so drivers could
> > > request the hookup.  Things wouldn't have been any more broken than
> > > before, and when people bring up systems where iommu is required, they
> > > could have added the necessary dma_iommu_configure() call.  But that
> > > is water under the bridge now.
> > >
> > > > >> Adding Marek who had the same problem on Exynos.
> > > > >
> > > > > I do wonder how many drivers need to iommu_map in their ->probe()?
> > > > > I'm thinking moving the auto-hookup to after a successful probe(),
> > > > > with some function a driver could call if they need mapping in probe,
> > > > > might be a way to eventually get rid of the blacklist.  But I've no
> > > > > idea how to find the subset of drivers that would be broken without a
> > > > > dma_setup_iommu_stuff() call in their probe.
> > > >
> > > > Wouldn't help much. That particular issue is nothing to do with DMA ops
> > > > really, it's about IOMMU initialisation. On something like SMMUv3 there
> > > > is literally no way to turn the thing on without blocking unknown
> > > > traffic - it *has* to have stream table entries programmed before it can
> > > > even allow stuff to bypass.
> > >
> > > fwiw, on these sdm850 laptops (and I think sdm845 boards like mtp and
> > > db845c) the SMMU (v2) is taken out of bypass by the bootloader.  Bjorn
> > > has some patches for arm-smmu to read-back the state at boot.
> > >
> > > (Although older devices were booting with display enabled but SMMU in 
> > > bypass.)
> > >
> > > > The answer is either to either pay attention to the "Quiesce all DMA
> > > > capable devices" part of the boot protocol (which has been there since
> > > > pretty much forever), or to come up with some robust way of
> > > > communicating "live" boot-time mappings to IOMMU drivers so that they
> > > > can program themselves appropriately during probe.
> > >
> > > Unfortunately display lit up by bootloader is basically ubiquitous.
> > > Every single android phone does it.  All of the windows-arm laptops do
> > > it.  Basically 99.9% of things that have a display do it.  It's a
> > > tough problem to solve involving clks, gdsc, regulators, etc, in
> > > addition to the display driver.. and sadly now smmu.  And devices
> > > where we can make changes to and update the firmware are rather rare.
> > > So there is really no option to ignore this problem.
> >
> > I think this is going to require at least some degree of cooperation
> > from the bootloader. See my other thread on that. Unfortunately I think
> > this is an area where everyone has kind of been doing their own thing
> > even if standard bindings for this have been around for quite a while
> > (actually 5 years by now). I suspect that most bootloaders that run
> > today are not that old, but as always downstream doesn't follow closely
> > where upstream is guiding.
> >
> > > I guess if we had some early-quirks mechanism like x86 does, we could
> > > mash the display off early in boot.  That would be an easy solution.
> > > Although I'd prefer a proper solution so that android phones aren't
> > > carrying around enormous stacks of hack patches to achieve a smooth
> > > flicker-free boot.
> >
> > The proper solution, I think, is for bootloader and 

Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Jordan Crouse
> It shouldn't be a problem to hook something else up to the IOMMU
> subsystem. Hopefully it's something that people are going to standardize
> on.
> 
> > 3) The automatic attach of DMA domain is also causing a different
> >problem for us on the GPU side, preventing us from supporting per-
> >context pagetables (since we end up with a disagreement about
> >which context bank is used between arm-smmu and the firmware).
> 
> I'm not sure I understand this issue. Is the context bank hard-coded in
> the firmware somehow? Or is it possible to rewrite which one is going to
> be used at runtime? Do you switch out the actual page tables rather than
> the IOMMU domains for context switching?
 
We have a rather long history on this but the tl;dr is that the GPU microcode
switches the pagetables by rewriting TTBR0 on the fly (since this is
arm-smmu-v2 we have no better option) and yes, unfortunately it is hard coded
to use context bank 0. [1] is the current patchset to support all this,
including my own take on avoiding the dma-domain (all the cool kids have one).

Jordan

[1] https://patchwork.freedesktop.org/series/57441/

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/4] iommu: Add device fault reporting API

2019-06-03 Thread Jean-Philippe Brucker
Allow device drivers and VFIO to get notified of IOMMU translation
faults, and handle recoverable faults (PCI PRI). Several series require
this API (Intel VT-d and Arm SMMUv3 nested support, as well as the
generic host SVA implementation).

Changes since v1 [1]:
* Allocate iommu_param earlier, in iommu_probe_device().
* Pass struct iommu_fault to fault handlers, instead of the
  iommu_fault_event wrapper.
* Removed unused iommu_fault_event::iommu_private.
* Removed unnecessary iommu_page_response::addr.
* Added iommu_page_response::version, which would allow introducing a
  new incompatible iommu_page_response structure (as opposed to just
  adding a flag + field).

[1] [PATCH 0/4] iommu: Add device fault reporting API

https://lore.kernel.org/lkml/20190523180613.55049-1-jean-philippe.bruc...@arm.com/

Jacob Pan (3):
  driver core: Add per device iommu param
  iommu: Introduce device fault data
  iommu: Introduce device fault report API

Jean-Philippe Brucker (1):
  iommu: Add recoverable fault reporting

 drivers/iommu/iommu.c  | 236 -
 include/linux/device.h |   3 +
 include/linux/iommu.h  |  87 ++
 include/uapi/linux/iommu.h | 153 
 4 files changed, 476 insertions(+), 3 deletions(-)
 create mode 100644 include/uapi/linux/iommu.h

-- 
2.21.0



[PATCH v2 2/4] iommu: Introduce device fault data

2019-06-03 Thread Jean-Philippe Brucker
From: Jacob Pan 

Device faults detected by IOMMU can be reported outside the IOMMU
subsystem for further processing. This patch introduces
a generic device fault data structure.

The fault can be either an unrecoverable fault or a page request,
also referred to as a recoverable fault.

We only care about non-internal faults that are likely to be reported
to an external subsystem.

Signed-off-by: Jacob Pan 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Liu, Yi L 
Signed-off-by: Ashok Raj 
Signed-off-by: Eric Auger 
---
 include/linux/iommu.h  |  39 
 include/uapi/linux/iommu.h | 118 +
 2 files changed, 157 insertions(+)
 create mode 100644 include/uapi/linux/iommu.h

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a815cf6f6f47..2b05056d5fa7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define IOMMU_READ (1 << 0)
 #define IOMMU_WRITE(1 << 1)
@@ -49,6 +50,7 @@ struct device;
 struct iommu_domain;
 struct notifier_block;
 struct iommu_sva;
+struct iommu_fault_event;
 
 /* iommu fault flags */
 #define IOMMU_FAULT_READ   0x0
@@ -58,6 +60,7 @@ typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
struct device *, unsigned long, int, void *);
 typedef int (*iommu_mm_exit_handler_t)(struct device *dev, struct iommu_sva *,
   void *);
+typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault *, void *);
 
 struct iommu_domain_geometry {
dma_addr_t aperture_start; /* First address that can be mapped*/
@@ -301,6 +304,41 @@ struct iommu_device {
struct device *dev;
 };
 
+/**
+ * struct iommu_fault_event - Generic fault event
+ *
+ * Can represent recoverable faults such as a page requests or
+ * unrecoverable faults such as DMA or IRQ remapping faults.
+ *
+ * @fault: fault descriptor
+ */
+struct iommu_fault_event {
+   struct iommu_fault fault;
+};
+
+/**
+ * struct iommu_fault_param - per-device IOMMU fault data
+ * @handler: Callback function to handle IOMMU faults at device level
+ * @data: handler private data
+ */
+struct iommu_fault_param {
+   iommu_dev_fault_handler_t handler;
+   void *data;
+};
+
+/**
+ * struct iommu_param - collection of per-device IOMMU data
+ *
+ * @fault_param: IOMMU detected device fault reporting data
+ *
+ * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
+ * struct iommu_group  *iommu_group;
+ * struct iommu_fwspec *iommu_fwspec;
+ */
+struct iommu_param {
+   struct iommu_fault_param *fault_param;
+};
+
 int  iommu_device_register(struct iommu_device *iommu);
 void iommu_device_unregister(struct iommu_device *iommu);
 int  iommu_device_sysfs_add(struct iommu_device *iommu,
@@ -504,6 +542,7 @@ struct iommu_ops {};
 struct iommu_group {};
 struct iommu_fwspec {};
 struct iommu_device {};
+struct iommu_fault_param {};
 
 static inline bool iommu_present(struct bus_type *bus)
 {
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
new file mode 100644
index ..796402174d6c
--- /dev/null
+++ b/include/uapi/linux/iommu.h
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * IOMMU user API definitions
+ */
+
+#ifndef _UAPI_IOMMU_H
+#define _UAPI_IOMMU_H
+
+#include 
+
+#define IOMMU_FAULT_PERM_READ  (1 << 0) /* read */
+#define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */
+#define IOMMU_FAULT_PERM_EXEC  (1 << 2) /* exec */
+#define IOMMU_FAULT_PERM_PRIV  (1 << 3) /* privileged */
+
+/* Generic fault types, can be expanded, e.g. for IRQ remapping faults */
+enum iommu_fault_type {
+   IOMMU_FAULT_DMA_UNRECOV = 1,/* unrecoverable fault */
+   IOMMU_FAULT_PAGE_REQ,   /* page request fault */
+};
+
+enum iommu_fault_reason {
+   IOMMU_FAULT_REASON_UNKNOWN = 0,
+
+   /* Could not access the PASID table (fetch caused external abort) */
+   IOMMU_FAULT_REASON_PASID_FETCH,
+
+   /* PASID entry is invalid or has configuration errors */
+   IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+
+   /*
+* PASID is out of range (e.g. exceeds the maximum PASID
+* supported by the IOMMU) or disabled.
+*/
+   IOMMU_FAULT_REASON_PASID_INVALID,
+
+   /*
+* An external abort occurred fetching (or updating) a translation
+* table descriptor
+*/
+   IOMMU_FAULT_REASON_WALK_EABT,
+
+   /*
+* Could not access the page table entry (Bad address),
+* actual translation fault
+*/
+   IOMMU_FAULT_REASON_PTE_FETCH,
+
+   /* Protection flag check failed */
+   IOMMU_FAULT_REASON_PERMISSION,
+
+   /* access flag check failed */
+   IOMMU_FAULT_REASON_ACCESS,
+
+   /* Output address of a translation stage caused Address Size fault */
+   IOMMU_FAULT_REASON_OOR_ADDRESS,
+};
+
+/**
+ * struct iommu_fault_unrecoverabl

[PATCH v2 3/4] iommu: Introduce device fault report API

2019-06-03 Thread Jean-Philippe Brucker
From: Jacob Pan 

Traditionally, device specific faults are detected and handled within
their own device drivers. When IOMMU is enabled, faults such as DMA
related transactions are detected by IOMMU. There is no generic
reporting mechanism to report faults back to the in-kernel device
driver or the guest OS in case of assigned devices.

This patch introduces a registration API for device specific fault
handlers. This differs from the existing iommu_set_fault_handler/
report_iommu_fault infrastructures in several ways:
- it allows reporting more sophisticated fault events (both
  unrecoverable faults and page request faults) due to the nature
  of the iommu_fault struct
- it is device specific and not domain specific.

The current iommu_report_device_fault() implementation only handles
the "shoot and forget" unrecoverable fault case. Handling of page
request faults or stalled faults will come later.

Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Eric Auger 
---
 drivers/iommu/iommu.c | 146 +-
 include/linux/iommu.h |  29 +
 2 files changed, 172 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 67ee6623f9b2..8037a3f07f07 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -107,15 +107,43 @@ void iommu_device_unregister(struct iommu_device *iommu)
spin_unlock(&iommu_device_lock);
 }
 
+static struct iommu_param *iommu_get_dev_param(struct device *dev)
+{
+   struct iommu_param *param = dev->iommu_param;
+
+   if (param)
+   return param;
+
+   param = kzalloc(sizeof(*param), GFP_KERNEL);
+   if (!param)
+   return NULL;
+
+   mutex_init(&param->lock);
+   dev->iommu_param = param;
+   return param;
+}
+
+static void iommu_free_dev_param(struct device *dev)
+{
+   kfree(dev->iommu_param);
+   dev->iommu_param = NULL;
+}
+
 int iommu_probe_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
-   int ret = -EINVAL;
+   int ret;
 
WARN_ON(dev->iommu_group);
+   if (!ops)
+   return -EINVAL;
 
-   if (ops)
-   ret = ops->add_device(dev);
+   if (!iommu_get_dev_param(dev))
+   return -ENOMEM;
+
+   ret = ops->add_device(dev);
+   if (ret)
+   iommu_free_dev_param(dev);
 
return ret;
 }
@@ -126,6 +154,8 @@ void iommu_release_device(struct device *dev)
 
if (dev->iommu_group)
ops->remove_device(dev);
+
+   iommu_free_dev_param(dev);
 }
 
 static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
@@ -854,6 +884,116 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
 }
 EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
 
+/**
+ * iommu_register_device_fault_handler() - Register a device fault handler
+ * @dev: the device
+ * @handler: the fault handler
+ * @data: private data passed as argument to the handler
+ *
+ * When an IOMMU fault event is received, this handler gets called with the
+ * fault event and data as argument. The handler should return 0 on success.
+ *
+ * Return 0 if the fault handler was installed successfully, or an error.
+ */
+int iommu_register_device_fault_handler(struct device *dev,
+   iommu_dev_fault_handler_t handler,
+   void *data)
+{
+   struct iommu_param *param = dev->iommu_param;
+   int ret = 0;
+
+   if (!param)
+   return -EINVAL;
+
+   mutex_lock(&param->lock);
+   /* Only allow one fault handler registered for each device */
+   if (param->fault_param) {
+   ret = -EBUSY;
+   goto done_unlock;
+   }
+
+   get_device(dev);
+   param->fault_param = kzalloc(sizeof(*param->fault_param), GFP_KERNEL);
+   if (!param->fault_param) {
+   put_device(dev);
+   ret = -ENOMEM;
+   goto done_unlock;
+   }
+   param->fault_param->handler = handler;
+   param->fault_param->data = data;
+
+done_unlock:
+   mutex_unlock(&param->lock);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
+
+/**
+ * iommu_unregister_device_fault_handler() - Unregister the device fault handler
+ * @dev: the device
+ *
+ * Remove the device fault handler installed with
+ * iommu_register_device_fault_handler().
+ *
+ * Return 0 on success, or an error.
+ */
+int iommu_unregister_device_fault_handler(struct device *dev)
+{
+   struct iommu_param *param = dev->iommu_param;
+   int ret = 0;
+
+   if (!param)
+   return -EINVAL;
+
+   mutex_lock(&param->lock);
+
+   if (!param->fault_param)
+   goto unlock;
+
+   kfree(param->fault_param);
+   param->fault_param = NULL;
+   put_device(dev);
+unlock:
+   mutex_unlock(&param->lock);
+
+   re

[PATCH v2 4/4] iommu: Add recoverable fault reporting

2019-06-03 Thread Jean-Philippe Brucker
Some IOMMU hardware features, for example PCI PRI and Arm SMMU Stall,
enable recoverable I/O page faults. Allow IOMMU drivers to report PRI Page
Requests and Stall events through the new fault reporting API. The
consumer of the fault can be either an I/O page fault handler in the host,
or a guest OS.

Once handled, the fault must be completed by sending a page response back
to the IOMMU. Add an iommu_page_response() function to complete a page
fault.

There are two ways to extend the userspace API:
* Add a field to iommu_page_response and a flag to
  iommu_page_response::flags describing the validity of this field.
* Introduce a new iommu_page_response_X structure with a different version
  number. The kernel must then support both versions.

Signed-off-by: Jacob Pan 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/iommu.c  | 94 +-
 include/linux/iommu.h  | 19 
 include/uapi/linux/iommu.h | 35 ++
 3 files changed, 146 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8037a3f07f07..956a80364efd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -891,7 +891,14 @@ EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
  * @data: private data passed as argument to the handler
  *
  * When an IOMMU fault event is received, this handler gets called with the
- * fault event and data as argument. The handler should return 0 on success.
+ * fault event and data as argument. The handler should return 0 on success. If
+ * the fault is recoverable (IOMMU_FAULT_PAGE_REQ), the consumer should also
+ * complete the fault by calling iommu_page_response() with one of the following
+ * response code:
+ * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
+ * - IOMMU_PAGE_RESP_INVALID: terminate the fault
+ * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
+ *   page faults if possible.
  *
  * Return 0 if the fault handler was installed successfully, or an error.
  */
@@ -921,6 +928,8 @@ int iommu_register_device_fault_handler(struct device *dev,
}
param->fault_param->handler = handler;
param->fault_param->data = data;
+   mutex_init(&param->fault_param->lock);
+   INIT_LIST_HEAD(&param->fault_param->faults);
 
 done_unlock:
mutex_unlock(&param->lock);
@@ -951,6 +960,12 @@ int iommu_unregister_device_fault_handler(struct device *dev)
if (!param->fault_param)
goto unlock;
 
+   /* we cannot unregister handler if there are pending faults */
+   if (!list_empty(&param->fault_param->faults)) {
+   ret = -EBUSY;
+   goto unlock;
+   }
+
kfree(param->fault_param);
param->fault_param = NULL;
put_device(dev);
@@ -967,13 +982,15 @@ EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
  * @evt: fault event data
  *
  * Called by IOMMU drivers when a fault is detected, typically in a threaded IRQ
- * handler.
+ * handler. When this function fails and the fault is recoverable, it is the
+ * caller's responsibility to complete the fault.
  *
  * Return 0 on success, or an error.
  */
 int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 {
struct iommu_param *param = dev->iommu_param;
+   struct iommu_fault_event *evt_pending = NULL;
struct iommu_fault_param *fparam;
int ret = 0;
 
@@ -987,13 +1004,86 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
ret = -EINVAL;
goto done_unlock;
}
+
+   if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
+   (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
+   evt_pending = kmemdup(evt, sizeof(struct iommu_fault_event),
+ GFP_KERNEL);
+   if (!evt_pending) {
+   ret = -ENOMEM;
+   goto done_unlock;
+   }
+   mutex_lock(&fparam->lock);
+   list_add_tail(&evt_pending->list, &fparam->faults);
+   mutex_unlock(&fparam->lock);
+   }
+
ret = fparam->handler(&evt->fault, fparam->data);
+   if (ret && evt_pending) {
+   mutex_lock(&fparam->lock);
+   list_del(&evt_pending->list);
+   mutex_unlock(&fparam->lock);
+   kfree(evt_pending);
+   }
 done_unlock:
mutex_unlock(&param->lock);
return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_report_device_fault);
 
+int iommu_page_response(struct device *dev,
+   struct iommu_page_response *msg)
+{
+   bool pasid_valid;
+   int ret = -EINVAL;
+   struct iommu_fault_event *evt;
+   struct iommu_fault_page_request *prm;
+   struct iommu_param *param = dev->iommu_param;
+   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+   if (!domain || !domain->ops->page_response)
+   re

[PATCH v2 1/4] driver core: Add per device iommu param

2019-06-03 Thread Jean-Philippe Brucker
From: Jacob Pan 

DMA faults can be detected by the IOMMU at device level. Adding a
pointer to struct device allows the IOMMU subsystem to report relevant
faults back to the device driver for further handling.
For directly assigned devices (or user-space drivers), the guest OS is
responsible for handling and responding to per-device IOMMU faults.
Therefore we need a fault reporting mechanism to propagate faults
beyond the IOMMU subsystem.

There are two other IOMMU data pointers under struct device today; here
we introduce iommu_param as a parent pointer so that all per-device
IOMMU data can be consolidated under it. The idea was suggested by Greg
KH and Joerg. The name iommu_param was chosen since iommu_data is
already in use.

Suggested-by: Greg Kroah-Hartman 
Reviewed-by: Greg Kroah-Hartman 
Signed-off-by: Jacob Pan 
Link: https://lkml.org/lkml/2017/10/6/81
---
 include/linux/device.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index e85264fb6616..f0a975abd6e9 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -42,6 +42,7 @@ struct iommu_ops;
 struct iommu_group;
 struct iommu_fwspec;
 struct dev_pin_info;
+struct iommu_param;
 
 struct bus_attribute {
struct attributeattr;
@@ -959,6 +960,7 @@ struct dev_links_info {
  * device (i.e. the bus driver that discovered the device).
  * @iommu_group: IOMMU group the device belongs to.
  * @iommu_fwspec: IOMMU-specific properties supplied by firmware.
+ * @iommu_param: Per device generic IOMMU runtime data
  *
  * @offline_disabled: If set, the device is permanently online.
  * @offline:   Set after successful invocation of bus type's .offline().
@@ -1052,6 +1054,7 @@ struct device {
void(*release)(struct device *dev);
struct iommu_group  *iommu_group;
struct iommu_fwspec *iommu_fwspec;
+   struct iommu_param  *iommu_param;
 
booloffline_disabled:1;
booloffline:1;
-- 
2.21.0



Re: [PATCH v3 0/6] Prerequisites for NXP LS104xA SMMU enablement

2019-06-03 Thread Andreas Färber
On 31.05.19 at 19:32, Laurentiu Tudor wrote:
>> -Original Message-
>> From: Andreas Färber 
>> Sent: Friday, May 31, 2019 8:04 PM
>>
>> Hello Laurentiu,
>>
>> On 31.05.19 at 18:46, Laurentiu Tudor wrote:
 -Original Message-
 From: Andreas Färber 
 Sent: Friday, May 31, 2019 7:15 PM

 Hi Laurentiu,

 On 30.05.19 at 16:19, laurentiu.tu...@nxp.com wrote:
> This patch series contains several fixes in preparation for SMMU
> support on NXP LS1043A and LS1046A chips. Once these get picked up,
> I'll submit the actual SMMU enablement patches consisting in the
> required device tree changes.

 Have you thought through what will happen if this patch ordering is not
 preserved? In particular, a user installing a future U-Boot update with
 the DTB bits but booting a stable kernel without this patch series -
 wouldn't that regress dpaa then for our customers?

>>>
>>> These are fixes for issues that popped up after enabling SMMU.
>>> I do not expect them to break anything.
>>
>> That was not my question! You're missing my point: All your patches are
>> lacking a Fixes header in their commit message, for backporting them, to
>> avoid _your DT patches_ breaking the driver on stable branches!
> 
> It does appear that I'm missing your point. For sure, the DT updates solely will
> break the kernel without these fixes but I'm not sure I understand how this
> could happen.

In short, customers rarely run the master branch. Kindly have your
colleagues explain stable branches to you in detail.

With Fixes header I was referring to the syntax explained here:
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes
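
For concreteness, such a trailer sits in the commit message footer next to the Signed-off-by lines; the hash and subject below are purely illustrative:

```
Fixes: 1234567890ab ("dpaa_eth: subject of the commit being fixed")
Cc: stable@vger.kernel.org
```

With these tags in place, the fix can be picked up for the affected stable branches once the DT changes ship.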

> My plan was to share the kernel dts patches sometime after this series
> makes it through.

That's fine. What I'm warning you is that seemingly your DT patches,
once in one of your LSDK U-Boot releases, will cause a regression for
distros like our SLES 15 SP1 unless these prereq kernel patches get
applied on the respective stable branches. Which will not happen
automatically unless you as patch author take the appropriate action
before they get merged.

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

Re: [PATCH v6 0/6] Allwinner H6 Mali GPU support

2019-06-03 Thread Clément Péron
Hi Maxime, Joerg,

On Wed, 22 May 2019 at 21:27, Rob Herring  wrote:
>
> On Tue, May 21, 2019 at 11:11 AM Clément Péron  wrote:
> >
> > Hi,
> >
> > The Allwinner H6 has a Mali-T720 MP2 which should be supported by
> > the new panfrost driver. This series fixes two issues and introduces the
> > dt-bindings, but a simple benchmark shows that it's still NOT WORKING.
> >
> > I'm pushing it in case someone wants to continue the work.
> >
> > This has been tested with Mesa3D 19.1.0-RC2 and a GPU bitness patch[1].
> >
> > One patch is from Icenowy Zheng where I changed the order as required
> > by Rob Herring[2].
> >
> > Thanks,
> > Clement
> >
> > [1] https://gitlab.freedesktop.org/kszaq/mesa/tree/panfrost_64_32
> > [2] https://patchwork.kernel.org/patch/10699829/
> >
> >
> > [  345.204813] panfrost 180.gpu: mmu irq status=1
> > [  345.209617] panfrost 180.gpu: Unhandled Page fault in AS0 at VA
> > 0x02400400
> > [  345.209617] Reason: TODO
> > [  345.209617] raw fault status: 0x82C1
> > [  345.209617] decoded fault status: SLAVE FAULT
> > [  345.209617] exception type 0xC1: TRANSLATION_FAULT_LEVEL1
> > [  345.209617] access type 0x2: READ
> > [  345.209617] source id 0x8000
> > [  345.729957] panfrost 180.gpu: gpu sched timeout, js=0,
> > status=0x8, head=0x2400400, tail=0x2400400, sched_job=9e204de9
> > [  346.055876] panfrost 180.gpu: mmu irq status=1
> > [  346.060680] panfrost 180.gpu: Unhandled Page fault in AS0 at VA
> > 0x02C00A00
> > [  346.060680] Reason: TODO
> > [  346.060680] raw fault status: 0x810002C1
> > [  346.060680] decoded fault status: SLAVE FAULT
> > [  346.060680] exception type 0xC1: TRANSLATION_FAULT_LEVEL1
> > [  346.060680] access type 0x2: READ
> > [  346.060680] source id 0x8100
> > [  346.561955] panfrost 180.gpu: gpu sched timeout, js=1,
> > status=0x8, head=0x2c00a00, tail=0x2c00a00, sched_job=b55a9a85
> > [  346.573913] panfrost 180.gpu: mmu irq status=1
> > [  346.578707] panfrost 180.gpu: Unhandled Page fault in AS0 at VA
> > 0x02C00B80
> >
> > Change in v5:
> >  - Remove fix indent
> >
> > Changes in v4:
> >  - Add bus_clock probe
> >  - Fix sanity check in io-pgtable
> >  - Add vramp-delay
> >  - Merge all boards into one patch
> >  - Remove upstreamed Neil A. patch
> >
> > Change in v3 (Thanks to Maxime Ripard):
> >  - Reauthor Icenowy for her patch
> >
> > Changes in v2 (Thanks to Maxime Ripard):
> >  - Drop GPU OPP Table
> >  - Add clocks and clock-names in required
> >
> > Clément Péron (5):
> >   drm: panfrost: add optional bus_clock
> >   iommu: io-pgtable: fix sanity check for non 48-bit mali iommu
> >   dt-bindings: gpu: mali-midgard: Add H6 mali gpu compatible
> >   arm64: dts: allwinner: Add ARM Mali GPU node for H6
> >   arm64: dts: allwinner: Add mali GPU supply for H6 boards
> >
> > Icenowy Zheng (1):
> >   dt-bindings: gpu: add bus clock for Mali Midgard GPUs
>
> I've applied patches 1 and 3 to drm-misc. I was going to do patch 4
> too, but it doesn't apply.
>
> Patch 2 can go in via the iommu tree and the rest via the allwinner tree.

Is this OK for you to pick up this series?

Thanks,
Clément

>
> Rob

Re: [PATCH 07/26] iommu/dma: move the arm64 wrappers to common code

2019-06-03 Thread Jon Hunter


On 29/04/2019 13:56, Robin Murphy wrote:
> On 22/04/2019 18:59, Christoph Hellwig wrote:
>> There is nothing really arm64 specific in the iommu_dma_ops
>> implementation, so move it to dma-iommu.c and keep a lot of symbols
>> self-contained.  Note the implementation does depend on the
>> DMA_DIRECT_REMAP infrastructure for now, so we'll have to make the
>> DMA_IOMMU support depend on it, but this will be relaxed soon.
> 
> Nothing looks objectionable, and boot testing with this much of the
> series merged has my coherent and non-coherent IOMMU-backed devices
> appearing to still work OK, so:
> 
> Acked-by: Robin Murphy 

Since next-20190529 one of our tests for MMC has started failing, where
the symptom is that the data written to the MMC does not match the
source. Bisecting points to this commit. Unfortunately, I am not able
to cleanly revert it on top of -next, but wanted to report this in case
you have any ideas.

Cheers
Jon

-- 
nvpublic


Re: [PATCH v2 0/4] iommu: Add device fault reporting API

2019-06-03 Thread Jacob Pan
On Mon,  3 Jun 2019 15:57:45 +0100
Jean-Philippe Brucker  wrote:

> Allow device drivers and VFIO to get notified on IOMMU translation
> fault, and handle recoverable faults (PCI PRI). Several series require
> this API (Intel VT-d and Arm SMMUv3 nested support, as well as the
> generic host SVA implementation).
> 
> Changes since v1 [1]:
> * Allocate iommu_param earlier, in iommu_probe_device().
> * Pass struct iommu_fault to fault handlers, instead of the
>   iommu_fault_event wrapper.
> * Removed unused iommu_fault_event::iommu_private.
> * Removed unnecessary iommu_page_response::addr.
> * Added iommu_page_response::version, which would allow to introduce a
>   new incompatible iommu_page_response structure (as opposed to just
>   adding a flag + field).
> 
> [1] [PATCH 0/4] iommu: Add device fault reporting API
> 
> https://lore.kernel.org/lkml/20190523180613.55049-1-jean-philippe.bruc...@arm.com/
> 
> Jacob Pan (3):
>   driver core: Add per device iommu param
>   iommu: Introduce device fault data
>   iommu: Introduce device fault report API
> 
> Jean-Philippe Brucker (1):
>   iommu: Add recoverable fault reporting
> 
This interface meets the needs of VT-d; just one more comment on 2/4.
Do you want to add a Co-developed-by tag for yourself on the three
patches from me?

Thanks,

Jacob

>  drivers/iommu/iommu.c  | 236
> - include/linux/device.h |
> 3 + include/linux/iommu.h  |  87 ++
>  include/uapi/linux/iommu.h | 153 
>  4 files changed, 476 insertions(+), 3 deletions(-)
>  create mode 100644 include/uapi/linux/iommu.h
> 



Re: [PATCH v2 2/4] iommu: Introduce device fault data

2019-06-03 Thread Jacob Pan
On Mon,  3 Jun 2019 15:57:47 +0100
Jean-Philippe Brucker  wrote:

> +/**
> + * struct iommu_fault_page_request - Page Request data
> + * @flags: encodes whether the corresponding fields are valid and
> whether this
> + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_*
> values)
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> + * @addr: page address
> + * @private_data: device-specific private information
> + */
> +struct iommu_fault_page_request {
> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0)
> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE   (1 << 1)
> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA   (1 << 2)
> + __u32   flags;
> + __u32   pasid;
> + __u32   grpid;
> + __u32   perm;
> + __u64   addr;
> + __u64   private_data[2];
> +};
> +

Just a thought on non-identity G-H PASID management: we could pass on
the guest PASID in the PRQ to save a lookup in QEMU. In this case, QEMU
allocates a GPASID for the vIOMMU and then a host PASID for the pIOMMU,
so QEMU has a G->H lookup. When a PRQ comes in to the pIOMMU with an
HPASID, the IOMMU driver can retrieve the GPASID from the bind data and
report it to the guest via VFIO. That way QEMU does not need to do an
H->G PASID lookup.

Should we add a gpasid field here? or we can add a flag and field
later, up to you.

Thanks,

Jacob


Re: [PATCH v8 28/29] vfio-pci: Add VFIO_PCI_DMA_FAULT_IRQ_INDEX

2019-06-03 Thread Alex Williamson
On Sun, 26 May 2019 18:10:03 +0200
Eric Auger  wrote:

> Add a new VFIO_PCI_DMA_FAULT_IRQ_INDEX index. This allows to
> set/unset an eventfd that will be triggered when DMA translation
> faults are detected at physical level when the nested mode is used.
> 
> Signed-off-by: Eric Auger 
> ---
>  drivers/vfio/pci/vfio_pci.c   |  3 +++
>  drivers/vfio/pci/vfio_pci_intrs.c | 19 +++
>  include/uapi/linux/vfio.h |  1 +
>  3 files changed, 23 insertions(+)


Note that I suggested to Intel folks trying to add a GVT-g page
flipping eventfd to convert to device specific interrupts the same way
we added device specific regions:

https://patchwork.kernel.org/patch/10962337/

I'd probably suggest the same here so we can optionally expose it when
supported.  Thanks,

Alex

> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index a9c8af2a774a..65a1e6814f5c 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -746,6 +746,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device 
> *vdev, int irq_type)
>   return 1;
>   } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
>   return 1;
> + } else if (irq_type == VFIO_PCI_DMA_FAULT_IRQ_INDEX) {
> + return 1;
>   }
>  
>   return 0;
> @@ -1082,6 +1084,7 @@ static long vfio_pci_ioctl(void *device_data,
>   switch (info.index) {
>   case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
>   case VFIO_PCI_REQ_IRQ_INDEX:
> + case VFIO_PCI_DMA_FAULT_IRQ_INDEX:
>   break;
>   case VFIO_PCI_ERR_IRQ_INDEX:
>   if (pci_is_pcie(vdev->pdev))
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
> b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1c46045b0e7f..28a96117daf3 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -622,6 +622,18 @@ static int vfio_pci_set_req_trigger(struct 
> vfio_pci_device *vdev,
>  count, flags, data);
>  }
>  
> +static int vfio_pci_set_dma_fault_trigger(struct vfio_pci_device *vdev,
> +   unsigned index, unsigned start,
> +   unsigned count, uint32_t flags,
> +   void *data)
> +{
> + if (index != VFIO_PCI_DMA_FAULT_IRQ_INDEX || start != 0 || count > 1)
> + return -EINVAL;
> +
> + return vfio_pci_set_ctx_trigger_single(&vdev->dma_fault_trigger,
> +count, flags, data);
> +}
> +
>  int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
>   unsigned index, unsigned start, unsigned count,
>   void *data)
> @@ -671,6 +683,13 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device 
> *vdev, uint32_t flags,
>   break;
>   }
>   break;
> + case VFIO_PCI_DMA_FAULT_IRQ_INDEX:
> + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> + case VFIO_IRQ_SET_ACTION_TRIGGER:
> + func = vfio_pci_set_dma_fault_trigger;
> + break;
> + }
> + break;
>   }
>  
>   if (!func)
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 13e041b84d48..66b6b08c4a38 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -559,6 +559,7 @@ enum {
>   VFIO_PCI_MSIX_IRQ_INDEX,
>   VFIO_PCI_ERR_IRQ_INDEX,
>   VFIO_PCI_REQ_IRQ_INDEX,
> + VFIO_PCI_DMA_FAULT_IRQ_INDEX,
>   VFIO_PCI_NUM_IRQS
>  };
>  



Re: [PATCH v8 04/29] iommu: Add recoverable fault reporting

2019-06-03 Thread Alex Williamson
On Sun, 26 May 2019 18:09:39 +0200
Eric Auger  wrote:

> From: Jean-Philippe Brucker 
> 
> Some IOMMU hardware features, for example PCI's PRI and Arm SMMU's Stall,
> enable recoverable I/O page faults. Allow IOMMU drivers to report PRI Page
> Requests and Stall events through the new fault reporting API. The
> consumer of the fault can be either an I/O page fault handler in the host,
> or a guest OS.
> 
> Once handled, the fault must be completed by sending a page response back
> to the IOMMU. Add an iommu_page_response() function to complete a page
> fault.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/iommu.c | 77 ++-
>  include/linux/iommu.h | 51 
>  2 files changed, 127 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 795518445a3a..13b301cfb10f 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -869,7 +869,14 @@ EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>   * @data: private data passed as argument to the handler
>   *
>   * When an IOMMU fault event is received, this handler gets called with the
> - * fault event and data as argument.
> + * fault event and data as argument. The handler should return 0 on success. 
> If
> + * the fault is recoverable (IOMMU_FAULT_PAGE_REQ), the handler should also
> + * complete the fault by calling iommu_page_response() with one of the 
> following
> + * response code:
> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> + *   page faults if possible.
>   *
>   * Return 0 if the fault handler was installed successfully, or an error.
>   */
> @@ -904,6 +911,8 @@ int iommu_register_device_fault_handler(struct device 
> *dev,
>   }
>   param->fault_param->handler = handler;
>   param->fault_param->data = data;
> + mutex_init(&param->fault_param->lock);
> + INIT_LIST_HEAD(&param->fault_param->faults);
>  
>  done_unlock:
>   mutex_unlock(&param->lock);
> @@ -934,6 +943,12 @@ int iommu_unregister_device_fault_handler(struct device 
> *dev)
>   if (!param->fault_param)
>   goto unlock;
>  
> + /* we cannot unregister handler if there are pending faults */
> + if (!list_empty(&param->fault_param->faults)) {
> + ret = -EBUSY;
> + goto unlock;
> + }

Why?  Attempting to unregister a fault handler suggests the handler
doesn't care about outstanding faults.  Can't we go ahead and dispatch
them as failed?  Otherwise we need to be careful that we don't
introduce an environment where the registered fault handler is blocked
trying to shutdown and release the device due to a flood of errors.
Thanks,

Alex

> +
>   kfree(param->fault_param);
>   param->fault_param = NULL;
>   put_device(dev);
> @@ -958,6 +973,7 @@ EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
>  int iommu_report_device_fault(struct device *dev, struct iommu_fault_event 
> *evt)
>  {
>   struct iommu_param *param = dev->iommu_param;
> + struct iommu_fault_event *evt_pending;
>   struct iommu_fault_param *fparam;
>   int ret = 0;
>  
> @@ -972,6 +988,20 @@ int iommu_report_device_fault(struct device *dev, struct 
> iommu_fault_event *evt)
>   ret = -EINVAL;
>   goto done_unlock;
>   }
> +
> + if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
> + (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
> + evt_pending = kmemdup(evt, sizeof(struct iommu_fault_event),
> +   GFP_KERNEL);
> + if (!evt_pending) {
> + ret = -ENOMEM;
> + goto done_unlock;
> + }
> + mutex_lock(&fparam->lock);
> + list_add_tail(&evt_pending->list, &fparam->faults);
> + mutex_unlock(&fparam->lock);
> + }
> +
>   ret = fparam->handler(evt, fparam->data);
>  done_unlock:
>   mutex_unlock(&param->lock);
> @@ -1513,6 +1543,51 @@ int iommu_attach_device(struct iommu_domain *domain, 
> struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>  
> +int iommu_page_response(struct device *dev,
> + struct page_response_msg *msg)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = -EINVAL;
> + struct iommu_fault_event *evt;
> + struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> + if (!domain || !domain->ops->page_response)
> + return -ENODEV;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param || !param->fault_param)
> + return -EINVAL;
> +
> + /* Only send response if there is a fault report pending */
> + mutex_lock(&param->fault_par

Re: [PATCH v8 05/29] iommu: Add a timeout parameter for PRQ response

2019-06-03 Thread Alex Williamson
On Sun, 26 May 2019 18:09:40 +0200
Eric Auger  wrote:

> From: Jacob Pan 
> 
> When an IO page request is processed outside IOMMU subsystem, response
> can be delayed or lost. Add a tunable setup parameter such that user can
> choose the timeout for IOMMU to track pending page requests.
> 
> This timeout mechanism is a basic safety net which can be implemented in
> conjunction with credit based or device level page response exception
> handling.
> 
> Signed-off-by: Jacob Pan 
> ---
>  .../admin-guide/kernel-parameters.txt |  8 +
>  drivers/iommu/iommu.c | 29 +++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 138f6664b2e2..b43f0893d252 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1813,6 +1813,14 @@
>   1 - Bypass the IOMMU for DMA.
>   unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
>  
> + iommu.prq_timeout=
> + Timeout in seconds to wait for page response
> + of a pending page request.
> + Format: 
> + Default: 10
> + 0 - no timeout tracking
> + 1 to 100 - allowed range
> +
>   io7=[HW] IO7 for Marvel based alpha systems
>   See comment before marvel_specify_io7 in
>   arch/alpha/kernel/core_marvel.c.
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 13b301cfb10f..64e87d56f471 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -45,6 +45,19 @@ static unsigned int iommu_def_domain_type = 
> IOMMU_DOMAIN_DMA;
>  #endif
>  static bool iommu_dma_strict __read_mostly = true;
>  
> +/*
> + * Timeout to wait for page response of a pending page request. This is
> + * intended as a basic safety net in case a pending page request is not
> + * responded for an exceptionally long time. Device may also implement
> + * its own protection mechanism against this exception.
> + * Units are in jiffies with a range between 1 - 100 seconds equivalent.
> + * Default to 10 seconds.
> + * Setting 0 means no timeout tracking.
> + */
> +#define IOMMU_PAGE_RESPONSE_MAX_TIMEOUT (HZ * 100)
> +#define IOMMU_PAGE_RESPONSE_DEF_TIMEOUT (HZ * 10)
> +static unsigned long prq_timeout = IOMMU_PAGE_RESPONSE_DEF_TIMEOUT;
> +
>  struct iommu_group {
>   struct kobject kobj;
>   struct kobject *devices_kobj;
> @@ -157,6 +170,22 @@ static int __init iommu_dma_setup(char *str)
>  }
>  early_param("iommu.strict", iommu_dma_setup);
>  
> +static int __init iommu_set_prq_timeout(char *str)
> +{
> + unsigned long timeout;
> +
> + if (!str)
> + return -EINVAL;
> + timeout = simple_strtoul(str, NULL, 0);
> + timeout = timeout * HZ;
> + if (timeout > IOMMU_PAGE_RESPONSE_MAX_TIMEOUT)
> + return -EINVAL;
> + prq_timeout = timeout;
> +
> + return 0;
> +}
> +early_param("iommu.prq_timeout", iommu_set_prq_timeout);
> +
>  static ssize_t iommu_group_attr_show(struct kobject *kobj,
>struct attribute *__attr, char *buf)
>  {

It doesn't seem to make much sense to include this patch without also
including "iommu: handle page response timeout".  Was that one lost?
Dropped?  Lives elsewhere?  Thanks,

Alex


Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-03 Thread Alex Williamson
On Sun, 26 May 2019 18:10:01 +0200
Eric Auger  wrote:

> This patch registers a fault handler which records faults in
> a circular buffer and then signals an eventfd. This buffer is
> exposed within the fault region.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v3 -> v4:
> - move iommu_unregister_device_fault_handler to vfio_pci_release
> ---
>  drivers/vfio/pci/vfio_pci.c | 49 +
>  drivers/vfio/pci/vfio_pci_private.h |  1 +
>  2 files changed, 50 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index f75f61127277..52094ba8 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "vfio_pci_private.h"
>  
> @@ -296,6 +297,46 @@ static const struct vfio_pci_regops 
> vfio_pci_fault_prod_regops = {
>   .add_capability = vfio_pci_fault_prod_add_capability,
>  };
>  
> +int vfio_pci_iommu_dev_fault_handler(struct iommu_fault_event *evt, void 
> *data)
> +{
> + struct vfio_pci_device *vdev = (struct vfio_pci_device *) data;
> + struct vfio_region_fault_prod *prod_region =
> + (struct vfio_region_fault_prod *)vdev->fault_pages;
> + struct vfio_region_fault_cons *cons_region =
> + (struct vfio_region_fault_cons *)(vdev->fault_pages + 2 * 
> PAGE_SIZE);
> + struct iommu_fault *new =
> + (struct iommu_fault *)(vdev->fault_pages + prod_region->offset +
> + prod_region->prod * prod_region->entry_size);
> + int prod, cons, size;
> +
> + mutex_lock(&vdev->fault_queue_lock);
> +
> + if (!vdev->fault_abi)
> + goto unlock;
> +
> + prod = prod_region->prod;
> + cons = cons_region->cons;
> + size = prod_region->nb_entries;
> +
> + if (CIRC_SPACE(prod, cons, size) < 1)
> + goto unlock;
> +
> + *new = evt->fault;
> + prod = (prod + 1) % size;
> + prod_region->prod = prod;
> + mutex_unlock(&vdev->fault_queue_lock);
> +
> + mutex_lock(&vdev->igate);
> + if (vdev->dma_fault_trigger)
> + eventfd_signal(vdev->dma_fault_trigger, 1);
> + mutex_unlock(&vdev->igate);
> + return 0;
> +
> +unlock:
> + mutex_unlock(&vdev->fault_queue_lock);
> + return -EINVAL;
> +}
> +
>  static int vfio_pci_init_fault_region(struct vfio_pci_device *vdev)
>  {
>   struct vfio_region_fault_prod *header;
> @@ -328,6 +369,13 @@ static int vfio_pci_init_fault_region(struct 
> vfio_pci_device *vdev)
>   header = (struct vfio_region_fault_prod *)vdev->fault_pages;
>   header->version = -1;
>   header->offset = PAGE_SIZE;
> +
> + ret = iommu_register_device_fault_handler(&vdev->pdev->dev,
> + vfio_pci_iommu_dev_fault_handler,
> + vdev);
> + if (ret)
> + goto out;
> +
>   return 0;
>  out:
>   kfree(vdev->fault_pages);
> @@ -570,6 +618,7 @@ static void vfio_pci_release(void *device_data)
>   if (!(--vdev->refcnt)) {
>   vfio_spapr_pci_eeh_release(vdev->pdev);
>   vfio_pci_disable(vdev);
> + iommu_unregister_device_fault_handler(&vdev->pdev->dev);


But this can fail if there are pending faults which leaves a device
reference and then the system is broken :(

>   }
>  
>   mutex_unlock(&vdev->reflck->lock);
> diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> b/drivers/vfio/pci/vfio_pci_private.h
> index 8e0a55682d3f..a9276926f008 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -122,6 +122,7 @@ struct vfio_pci_device {
>   int ioeventfds_nr;
>   struct eventfd_ctx  *err_trigger;
>   struct eventfd_ctx  *req_trigger;
> + struct eventfd_ctx  *dma_fault_trigger;
>   struct mutex    fault_queue_lock;
>   int fault_abi;
>   struct list_head    dummy_resources_list;



Re: [PATCH v8 22/29] vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE

2019-06-03 Thread Alex Williamson
On Sun, 26 May 2019 18:09:57 +0200
Eric Auger  wrote:

> From: "Liu, Yi L" 
> 
> This patch adds VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE ioctl
> which aims to pass/withdraw the virtual iommu guest configuration
> to/from the VFIO driver down to the iommu subsystem.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Eric Auger 
> 
> ---
> v6 -> v7:
> - add a comment related to VFIO_IOMMU_DETACH_PASID_TABLE
> 
> v3 -> v4:
> - restore ATTACH/DETACH
> - add unwind on failure
> 
> v2 -> v3:
> - s/BIND_PASID_TABLE/SET_PASID_TABLE
> 
> v1 -> v2:
> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
> - remove the struct device arg
> ---
>  drivers/vfio/vfio_iommu_type1.c | 53 +
>  include/uapi/linux/vfio.h   | 22 ++
>  2 files changed, 75 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 3ddc375e7063..b2d609d6fe83 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1758,6 +1758,43 @@ static int vfio_domains_have_iommu_cache(struct 
> vfio_iommu *iommu)
>   return ret;
>  }
>  
> +static void
> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
> +{
> + struct vfio_domain *d;
> +
> + mutex_lock(&iommu->lock);
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + iommu_detach_pasid_table(d->domain);
> + }
> + mutex_unlock(&iommu->lock);
> +}
> +
> +static int
> +vfio_attach_pasid_table(struct vfio_iommu *iommu,
> + struct vfio_iommu_type1_attach_pasid_table *ustruct)
> +{
> + struct vfio_domain *d;
> + int ret = 0;
> +
> + mutex_lock(&iommu->lock);
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + ret = iommu_attach_pasid_table(d->domain, &ustruct->config);
> + if (ret)
> + goto unwind;
> + }
> + goto unlock;
> +unwind:
> + list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
> + iommu_detach_pasid_table(d->domain);
> + }
> +unlock:
> + mutex_unlock(&iommu->lock);
> + return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  unsigned int cmd, unsigned long arg)
>  {
> @@ -1828,6 +1865,22 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>   return copy_to_user((void __user *)arg, &unmap, minsz) ?
>   -EFAULT : 0;
> + } else if (cmd == VFIO_IOMMU_ATTACH_PASID_TABLE) {
> + struct vfio_iommu_type1_attach_pasid_table ustruct;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_attach_pasid_table,
> + config);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)
> + return -EINVAL;
> +
> + return vfio_attach_pasid_table(iommu, &ustruct);
> + } else if (cmd == VFIO_IOMMU_DETACH_PASID_TABLE) {
> + vfio_detach_pasid_table(iommu);
> + return 0;
>   }
>  
>   return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 8f10748dac79..4316dd8cb5b5 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -14,6 +14,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  #define VFIO_API_VERSION 0
>  
> @@ -763,6 +764,27 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE    _IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE   _IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/**
> + * VFIO_IOMMU_ATTACH_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 22,
> + *   struct vfio_iommu_type1_attach_pasid_table)
> + *
> + * Passes the PASID table to the host. Calling ATTACH_PASID_TABLE
> + * while a table is already installed is allowed: it replaces the old
> + * table. DETACH does a comprehensive tear down of the nested mode.
> + */
> +struct vfio_iommu_type1_attach_pasid_table {
> + __u32   argsz;
> + __u32   flags;
> + struct iommu_pasid_table_config config;
> +};
> +#define VFIO_IOMMU_ATTACH_PASID_TABLE    _IO(VFIO_TYPE, VFIO_BASE + 22)
> +
> +/**
> + * VFIO_IOMMU_DETACH_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 23)
> + * Detaches the PASID table
> + */
> +#define VFIO_IOMMU_DETACH_PASID_TABLE    _IO(VFIO_TYPE, VFIO_BASE + 23)
> +
>  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
>  
>  /*

I'm tempted to suggest a "SET" rather than ATTACH/DETACH interface so
this could be done in one ioctl and make use of the flags provided.


Re: [PATCH v8 25/29] vfio-pci: Add a new VFIO_REGION_TYPE_NESTED region type

2019-06-03 Thread Alex Williamson
On Sun, 26 May 2019 18:10:00 +0200
Eric Auger  wrote:

> This patch adds two new regions aiming to handle nested mode
> translation faults.
> 
> The first region (two host kernel pages) is read-only from the
> user-space perspective. The first page contains a header
> that provides information about the circular buffer located in the
> second page. The circular buffer is put in a separate page so
> that it can be mmapped later.
> 
> The max user API version supported by the kernel is returned
> through a dedicated fault region capability.
> 
> The prod header contains
> - the user API version in use (potentially inferior to the one
>   returned in the capability),
> - the offset of the queue within the region,
> - the producer index relative to the start of the queue
> - the max number of fault records,
> - the size of each record.
> 
> The second region is write-only from the user perspective. It
> contains the version of the requested fault ABI and the consumer
> index that is updated by userspace each time the latter has
> consumed fault records.
> 
> The natural order of operation for the userspace is:
> - retrieve the highest supported fault ABI version
> - set the requested fault ABI version in the consumer region
> 
> Until the ABI version is set by the userspace, the kernel
> cannot return a comprehensive set of information inside the
> prod header (entry size and number of entries in the fault queue).

It's not clear to me why two regions are required for this.  If the
first page is not mmap capable, why does it need to be read-only?  If
it were not read-only couldn't the fields of the second region also fit
within this first page?  If you wanted to deal with an mmap capable
writeable region, it could just be yet a 3rd page in the first region.

> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v4 -> v5
> - check cons is not null in vfio_pci_check_cons_fault
> 
> v3 -> v4:
> - use 2 separate regions, respectively in read and write modes
> - add the version capability
> ---
>  drivers/vfio/pci/vfio_pci.c | 105 
>  drivers/vfio/pci/vfio_pci_private.h |  17 +
>  drivers/vfio/pci/vfio_pci_rdwr.c|  73 +++
>  include/uapi/linux/vfio.h   |  42 +++
>  4 files changed, 237 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index cab71da46f4a..f75f61127277 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -261,6 +261,106 @@ int vfio_pci_set_power_state(struct vfio_pci_device 
> *vdev, pci_power_t state)
>   return ret;
>  }
>  
> +void vfio_pci_fault_release(struct vfio_pci_device *vdev,
> + struct vfio_pci_region *region)
> +{
> +}
> +
> +static const struct vfio_pci_fault_abi fault_abi_versions[] = {
> + [0] = {
> + .entry_size = sizeof(struct iommu_fault),
> + },
> +};
> +
> +#define NR_FAULT_ABIS ARRAY_SIZE(fault_abi_versions)

This looks like it's leading to some dangerous complicated code to
support multiple user selected ABIs.  How many ABIs do we plan to
support?  The region capability also exposes a type, sub-type, and
version.  How much of this could be exposed that way?  ie. if we need
to support multiple versions, expose multiple regions.

> +
> +static int vfio_pci_fault_prod_add_capability(struct vfio_pci_device *vdev,
> + struct vfio_pci_region *region, struct vfio_info_cap *caps)
> +{
> + struct vfio_region_info_cap_fault cap = {
> + .header.id = VFIO_REGION_INFO_CAP_PRODUCER_FAULT,
> + .header.version = 1,
> + .version = NR_FAULT_ABIS,
> + };
> + return vfio_info_add_capability(caps, &cap.header, sizeof(cap));
> +}
> +
> +static const struct vfio_pci_regops vfio_pci_fault_cons_regops = {
> + .rw = vfio_pci_fault_cons_rw,
> + .release= vfio_pci_fault_release,
> +};
> +
> +static const struct vfio_pci_regops vfio_pci_fault_prod_regops = {
> + .rw = vfio_pci_fault_prod_rw,
> + .release= vfio_pci_fault_release,
> + .add_capability = vfio_pci_fault_prod_add_capability,
> +};
> +
> +static int vfio_pci_init_fault_region(struct vfio_pci_device *vdev)
> +{
> + struct vfio_region_fault_prod *header;
> + int ret;
> +
> + mutex_init(&vdev->fault_queue_lock);
> +
> + vdev->fault_pages = kzalloc(3 * PAGE_SIZE, GFP_KERNEL);
> + if (!vdev->fault_pages)
> + return -ENOMEM;
> +
> + ret = vfio_pci_register_dev_region(vdev,
> + VFIO_REGION_TYPE_NESTED,
> + VFIO_REGION_SUBTYPE_NESTED_FAULT_PROD,
> + &vfio_pci_fault_prod_regops, 2 * PAGE_SIZE,
> + VFIO_REGION_INFO_FLAG_READ, vdev->fault_pages);

If mmap isn't supported yet, why are we pushing the queue out to the
2nd page?  We're just wasting space.  vfio_region_fault_prod.offset
allows us to relocate it when/if it is mmap capable.

> + if (ret)
> +  

Re: [PATCH v8 24/29] vfio: VFIO_IOMMU_BIND/UNBIND_MSI

2019-06-03 Thread Alex Williamson
On Sun, 26 May 2019 18:09:59 +0200
Eric Auger  wrote:

> This patch adds the VFIO_IOMMU_BIND/UNBIND_MSI ioctl which aim
> to pass/withdraw the guest MSI binding to/from the host.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> v6 -> v7:
> - removed the dev arg
> 
> v3 -> v4:
> - add UNBIND
> - unwind on BIND error
> 
> v2 -> v3:
> - adapt to new proto of bind_guest_msi
> - directly use vfio_iommu_for_each_dev
> 
> v1 -> v2:
> - s/vfio_iommu_type1_guest_msi_binding/vfio_iommu_type1_bind_guest_msi
> ---
>  drivers/vfio/vfio_iommu_type1.c | 64 +
>  include/uapi/linux/vfio.h   | 29 +++
>  2 files changed, 93 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 6fda4fbc9bfa..18142cb078a3 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1832,6 +1832,42 @@ static int vfio_cache_inv_fn(struct device *dev, void 
> *data)
>   return iommu_cache_invalidate(dc->domain, dev, &ustruct->info);
>  }
>  
> +static int
> +vfio_bind_msi(struct vfio_iommu *iommu,
> +   dma_addr_t giova, phys_addr_t gpa, size_t size)
> +{
> + struct vfio_domain *d;
> + int ret = 0;
> +
> + mutex_lock(&iommu->lock);
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + ret = iommu_bind_guest_msi(d->domain, giova, gpa, size);
> + if (ret)
> + goto unwind;
> + }
> + goto unlock;
> +unwind:
> + list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
> + iommu_unbind_guest_msi(d->domain, giova);
> + }
> +unlock:
> + mutex_unlock(&iommu->lock);
> + return ret;
> +}
> +
> +static void
> +vfio_unbind_msi(struct vfio_iommu *iommu, dma_addr_t giova)
> +{
> + struct vfio_domain *d;
> +
> + mutex_lock(&iommu->lock);
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + iommu_unbind_guest_msi(d->domain, giova);
> + }
> + mutex_unlock(&iommu->lock);
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  unsigned int cmd, unsigned long arg)
>  {
> @@ -1936,6 +1972,34 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   &ustruct);
>   mutex_unlock(&iommu->lock);
>   return ret;
> + } else if (cmd == VFIO_IOMMU_BIND_MSI) {
> + struct vfio_iommu_type1_bind_msi ustruct;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_bind_msi,
> + size);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)
> + return -EINVAL;
> +
> + return vfio_bind_msi(iommu, ustruct.iova, ustruct.gpa,
> +  ustruct.size);
> + } else if (cmd == VFIO_IOMMU_UNBIND_MSI) {
> + struct vfio_iommu_type1_unbind_msi ustruct;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_unbind_msi,
> + iova);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)
> + return -EINVAL;
> +
> + vfio_unbind_msi(iommu, ustruct.iova);
> + return 0;
>   }
>  
>   return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 055aa9b9745a..2774a1ab37ae 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -798,6 +798,35 @@ struct vfio_iommu_type1_cache_invalidate {
>  };
>  #define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 24)
>  
> +/**
> + * VFIO_IOMMU_BIND_MSI - _IOWR(VFIO_TYPE, VFIO_BASE + 25,
> + *   struct vfio_iommu_type1_bind_msi)
> + *
> + * Pass a stage 1 MSI doorbell mapping to the host so that the
> + * latter can build a nested stage 2 mapping
> + */
> +struct vfio_iommu_type1_bind_msi {
> + __u32   argsz;
> + __u32   flags;
> + __u64   iova;
> + __u64   gpa;
> + __u64   size;
> +};
> +#define VFIO_IOMMU_BIND_MSI  _IO(VFIO_TYPE, VFIO_BASE + 25)
> +
> +/**
> + * VFIO_IOMMU_UNBIND_MSI - _IOWR(VFIO_TYPE, VFIO_BASE + 26,
> + *   struct vfio_iommu_type1_unbind_msi)
> + *
> + * Unregister an MSI mapping
> + */
> +struct vfio_iommu_type1_unbind_msi {
> + __u32   argsz;
> + __u32   flags;
> + __u64   iova;
> +};
> +#define VFIO_IOMMU_UNBIND_MSI  _IO(VFIO_TYPE, VFIO_BASE + 26)
> +
>  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
>  
>  /*

And another pair of ioctls.  Maybe think about how we can reduce the
ioctl bloat of this series.  I don't want to impose an awkward
interface for the sake of fewer ioctls, but I also don't want us
casually bur

[PATCH] iommu/dma: Apply dma_{alloc,free}_contiguous functions

2019-06-03 Thread Nicolin Chen
This patch replaces dma_{alloc,release}_from_contiguous() with
dma_{alloc,free}_contiguous() to simplify those function calls.

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/dma-iommu.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0cd49c2d3770..cc3d39dbbe1a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -951,8 +951,8 @@ static void __iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr)
 
if (pages)
__iommu_dma_free_pages(pages, count);
-   if (page && !dma_release_from_contiguous(dev, page, count))
-   __free_pages(page, get_order(alloc_size));
+   if (page)
+   dma_free_contiguous(dev, page, alloc_size);
 }
 
 static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
@@ -970,12 +970,7 @@ static void *iommu_dma_alloc_pages(struct device *dev, 
size_t size,
struct page *page = NULL;
void *cpu_addr;
 
-   if (gfpflags_allow_blocking(gfp))
-   page = dma_alloc_from_contiguous(dev, alloc_size >> PAGE_SHIFT,
-get_order(alloc_size),
-gfp & __GFP_NOWARN);
-   if (!page)
-   page = alloc_pages(gfp, get_order(alloc_size));
+   page = dma_alloc_contiguous(dev, alloc_size, gfp);
if (!page)
return NULL;
 
@@ -997,8 +992,7 @@ static void *iommu_dma_alloc_pages(struct device *dev, 
size_t size,
memset(cpu_addr, 0, alloc_size);
return cpu_addr;
 out_free_pages:
-   if (!dma_release_from_contiguous(dev, page, alloc_size >> PAGE_SHIFT))
-   __free_pages(page, get_order(alloc_size));
+   dma_free_contiguous(dev, page, alloc_size);
return NULL;
 }
 
-- 
2.17.1



Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-06-03 Thread Thiago Jung Bauermann



Michael S. Tsirkin  writes:

> On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
>> I rephrased it in terms of address translation. What do you think of
>> this version? The flag name is slightly different too:
>>
>>
>> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
>> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
>> with the exception that address translation is guaranteed to be
>> unnecessary when accessing memory addresses supplied to the device
>> by the driver. Which is to say, the device will always use physical
>> addresses matching addresses used by the driver (typically meaning
>> physical addresses used by the CPU) and not translated further. This
>> flag should be set by the guest if offered, but to allow for
>> backward-compatibility device implementations allow for it to be
>> left unset by the guest. It is an error to set both this flag and
>> VIRTIO_F_ACCESS_PLATFORM.
>
>
> OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unprivileged
> drivers. This is why devices fail when it's not negotiated.

Just to clarify, what do you mean by unprivileged drivers? Is it drivers
implemented in guest userspace such as with VFIO? Or unprivileged in
some other sense such as needing to use bounce buffers for some reason?

> This confuses me.
> If driver is unprivileged then what happens with this flag?
> It can supply any address it wants. Will that corrupt kernel
> memory?

Not needing address translation doesn't necessarily mean that there's no
IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's
always an IOMMU present. And we also support VFIO drivers. The VFIO API
for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls
to program the IOMMU.

For our use case, we don't need address translation because we set up an
identity mapping in the IOMMU so that the device can use guest physical
addresses.

If the guest kernel is concerned that an unprivileged driver could
jeopardize its integrity it should not negotiate this feature flag.
Perhaps there should be a note about this in the flag definition? This
concern is platform-dependant though. I don't believe it's an issue in
pseries.

--
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-06-03 Thread Michael S. Tsirkin
On Mon, Jun 03, 2019 at 10:13:59PM -0300, Thiago Jung Bauermann wrote:
> 
> 
> Michael S. Tsirkin  writes:
> 
> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >> I rephrased it in terms of address translation. What do you think of
> >> this version? The flag name is slightly different too:
> >>
> >>
> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
> >> with the exception that address translation is guaranteed to be
> >> unnecessary when accessing memory addresses supplied to the device
> >> by the driver. Which is to say, the device will always use physical
> >> addresses matching addresses used by the driver (typically meaning
> >> physical addresses used by the CPU) and not translated further. This
> >> flag should be set by the guest if offered, but to allow for
> >> backward-compatibility device implementations allow for it to be
> >> left unset by the guest. It is an error to set both this flag and
> >> VIRTIO_F_ACCESS_PLATFORM.
> >
> >
> > OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unprivileged
> > drivers. This is why devices fail when it's not negotiated.
> 
> Just to clarify, what do you mean by unprivileged drivers? Is it drivers
> implemented in guest userspace such as with VFIO? Or unprivileged in
> some other sense such as needing to use bounce buffers for some reason?

I had drivers in guest userspace in mind.

> > This confuses me.
> > If driver is unprivileged then what happens with this flag?
> > It can supply any address it wants. Will that corrupt kernel
> > memory?
> 
> Not needing address translation doesn't necessarily mean that there's no
> IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's
> always an IOMMU present. And we also support VFIO drivers. The VFIO API
> for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls
> to program the IOMMU.
> 
> For our use case, we don't need address translation because we set up an
> identity mapping in the IOMMU so that the device can use guest physical
> addresses.

And can it access any guest physical address?

> If the guest kernel is concerned that an unprivileged driver could
> jeopardize its integrity it should not negotiate this feature flag.

Unfortunately flag negotiation is done through config space
and so can be overwritten by the driver.

> Perhaps there should be a note about this in the flag definition? This
> concern is platform-dependent though. I don't believe it's an issue in
> pseries.

Again ACCESS_PLATFORM has a pretty open definition. It does actually
say it's all up to the platform.

Specifically, how will VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION be
implemented portably? virtio has no portable way to know
whether the DMA API bypasses translation.



> --
> Thiago Jung Bauermann
> IBM Linux Technology Center


Re: [PATCH 07/26] iommu/dma: move the arm64 wrappers to common code

2019-06-03 Thread Christoph Hellwig
On Mon, Jun 03, 2019 at 08:47:57PM +0100, Jon Hunter wrote:
> Since next-20190529 one of our tests for MMC has started failing, where
> the symptom is that the data written to the MMC does not match the
> source. Bisecting this is pointing to this commit. Unfortunately, I am
> not able to cleanly revert this on top of -next, but wanted to report
> this in case you have any ideas.

Does this fix your problem?

https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=generic-dma-ops&id=1b961423158caaae49d3900b7c9c37477bbfa9b3


cleanup vmap usage in the dma-mapping layer

2019-06-03 Thread Christoph Hellwig
Hi all,

the common DMA remapping code uses the vmalloc/vmap code to create
page table entries for DMA mappings.  This series lifts the currently
arm-specific VM_* flag for that into common code, exposes it to
userspace in procfs to better understand the mappings, and cleans
up a couple of helpers in this area.


[PATCH 1/3] vmalloc: lift the arm flag for coherent mappings to common code

2019-06-03 Thread Christoph Hellwig
The arm architecture had a VM_ARM_DMA_CONSISTENT flag to mark DMA
coherent remapping for a while.  Lift this flag to common code so
that we can use it generically.  We also check it in the only place
VM_USERMAP is directly checked so that we can entirely replace that
flag as well (although I'm not even sure why we'd want to allow
remapping DMA mappings, but I'd rather not change behavior).

Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c | 22 +++---
 arch/arm/mm/mm.h  |  3 ---
 include/linux/vmalloc.h   |  2 ++
 mm/vmalloc.c  |  5 -
 4 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0a75058c11f3..e197b028e341 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -360,19 +360,13 @@ static void *
 __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot,
const void *caller)
 {
-   /*
-* DMA allocation can be mapped to user space, so lets
-* set VM_USERMAP flags too.
-*/
-   return dma_common_contiguous_remap(page, size,
-   VM_ARM_DMA_CONSISTENT | VM_USERMAP,
+   return dma_common_contiguous_remap(page, size, VM_DMA_COHERENT,
prot, caller);
 }
 
 static void __dma_free_remap(void *cpu_addr, size_t size)
 {
-   dma_common_free_remap(cpu_addr, size,
-   VM_ARM_DMA_CONSISTENT | VM_USERMAP);
+   dma_common_free_remap(cpu_addr, size, VM_DMA_COHERENT);
 }
 
 #define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K
@@ -1387,8 +1381,8 @@ static void *
 __iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot,
const void *caller)
 {
-   return dma_common_pages_remap(pages, size,
-   VM_ARM_DMA_CONSISTENT | VM_USERMAP, prot, caller);
+   return dma_common_pages_remap(pages, size, VM_DMA_COHERENT, prot,
+   caller);
 }
 
 /*
@@ -1472,7 +1466,7 @@ static struct page **__iommu_get_pages(void *cpu_addr, unsigned long attrs)
return cpu_addr;
 
area = find_vm_area(cpu_addr);
-   if (area && (area->flags & VM_ARM_DMA_CONSISTENT))
+   if (area && (area->flags & VM_DMA_COHERENT))
return area->pages;
return NULL;
 }
@@ -1630,10 +1624,8 @@ void __arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
return;
}
 
-   if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0) {
-   dma_common_free_remap(cpu_addr, size,
-   VM_ARM_DMA_CONSISTENT | VM_USERMAP);
-   }
+   if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
+   dma_common_free_remap(cpu_addr, size, VM_DMA_COHERENT);
 
__iommu_remove_mapping(dev, handle, size);
__iommu_free_buffer(dev, pages, size, attrs);
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index 6b045c6653ea..6ec48188ef9f 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -68,9 +68,6 @@ extern void __flush_dcache_page(struct address_space *mapping, struct page *page
 #define VM_ARM_MTYPE(mt)   ((mt) << 20)
 #define VM_ARM_MTYPE_MASK  (0x1f << 20)
 
-/* consistent regions used by dma_alloc_attrs() */
-#define VM_ARM_DMA_CONSISTENT  0x20000000
-
 
 struct static_vm {
struct vm_struct vm;
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 51e131245379..500fa4fb06f0 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -18,6 +18,7 @@ struct notifier_block;	/* in notifier.h */
 #define VM_ALLOC		0x00000002	/* vmalloc() */
 #define VM_MAP			0x00000004	/* vmap()ed pages */
 #define VM_USERMAP		0x00000008	/* suitable for remap_vmalloc_range */
+#define VM_DMA_COHERENT		0x00000010	/* dma_alloc_coherent */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040	/* don't add guard page */
 #define VM_KASAN		0x00000080	/* has allocated kasan shadow memory */
@@ -26,6 +27,7 @@ struct notifier_block;	/* in notifier.h */
  * vfree_atomic().
  */
 #define VM_FLUSH_RESET_PERMS	0x00000100	/* Reset direct map and flush TLB on unmap */
+
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 233af6936c93..c4b5784bccc1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2948,7 +2948,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
if (!area)
return -EINVAL;
 
-   if (!(area->flags & VM_USERMAP))
+   if (!(area->flags & (VM_USERMAP | VM_DMA_COHERENT)))
return -EINVAL;
 
if (kaddr + size > area->addr + get_vm_area_size(area))
@@ -3438,6 +3438,9 @@ static int s_show(struct seq_file *m, void *p)
 

[PATCH 3/3] dma-mapping: introduce a dma_common_find_pages helper

2019-06-03 Thread Christoph Hellwig
A helper to find the backing page array based on a virtual address.
This also ensures we do the same vm_flags check everywhere instead
of slightly different or missing ones in a few places.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c   |  7 +--
 drivers/iommu/dma-iommu.c   | 15 +++
 include/linux/dma-mapping.h |  1 +
 kernel/dma/remap.c  | 13 +++--
 4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 647fd25d2aba..7620d4f55e92 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1434,18 +1434,13 @@ static struct page **__atomic_get_pages(void *addr)
 
 static struct page **__iommu_get_pages(void *cpu_addr, unsigned long attrs)
 {
-   struct vm_struct *area;
-
if (__in_atomic_pool(cpu_addr, PAGE_SIZE))
return __atomic_get_pages(cpu_addr);
 
if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
return cpu_addr;
 
-   area = find_vm_area(cpu_addr);
-   if (area && (area->flags & VM_DMA_COHERENT))
-   return area->pages;
-   return NULL;
+   return dma_common_find_pages(cpu_addr);
 }
 
 static void *__iommu_alloc_simple(struct device *dev, size_t size, gfp_t gfp,
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index cea561897086..002d3bb6254a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -543,15 +543,6 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
return pages;
 }
 
-static struct page **__iommu_dma_get_pages(void *cpu_addr)
-{
-   struct vm_struct *area = find_vm_area(cpu_addr);
-
-   if (!area || !area->pages)
-   return NULL;
-   return area->pages;
-}
-
 /**
  * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space
  * @dev: Device to allocate memory for. Must be a real device
@@ -940,7 +931,7 @@ static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
 * If it the address is remapped, then it's either non-coherent
 * or highmem CMA, or an iommu_dma_alloc_remap() construction.
 */
-   pages = __iommu_dma_get_pages(cpu_addr);
+   pages = dma_common_find_pages(cpu_addr);
if (!pages)
page = vmalloc_to_page(cpu_addr);
dma_common_free_remap(cpu_addr, alloc_size);
@@ -1050,7 +1041,7 @@ static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
return -ENXIO;
 
if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
-   struct page **pages = __iommu_dma_get_pages(cpu_addr);
+   struct page **pages = dma_common_find_pages(cpu_addr);
 
if (pages)
return __iommu_dma_mmap(pages, size, vma);
@@ -1072,7 +1063,7 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
int ret;
 
if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
-   struct page **pages = __iommu_dma_get_pages(cpu_addr);
+   struct page **pages = dma_common_find_pages(cpu_addr);
 
if (pages) {
return sg_alloc_table_from_pages(sgt, pages,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index ac320b7cacfd..cb07d1388d66 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -615,6 +615,7 @@ extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs);
 
+struct page **dma_common_find_pages(void *cpu_addr);
 void *dma_common_contiguous_remap(struct page *page, size_t size,
pgprot_t prot, const void *caller);
 
diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
index 51958d21c810..52cdca386de0 100644
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -11,6 +11,15 @@
 #include 
 #include 
 
+struct page **dma_common_find_pages(void *cpu_addr)
+{
+   struct vm_struct *area = find_vm_area(cpu_addr);
+
+   if (!area || area->flags != VM_DMA_COHERENT)
+   return NULL;
+   return area->pages;
+}
+
 static struct vm_struct *__dma_common_pages_remap(struct page **pages,
size_t size, pgprot_t prot, const void *caller)
 {
@@ -78,9 +87,9 @@ void *dma_common_contiguous_remap(struct page *page, size_t size,
  */
 void dma_common_free_remap(void *cpu_addr, size_t size)
 {
-   struct vm_struct *area = find_vm_area(cpu_addr);
+   struct page **pages = dma_common_find_pages(cpu_addr);
 
-   if (!area || area->flags != VM_DMA_COHERENT) {
+   if (!pages) {
WARN(1, "trying to free invalid coherent area: %p\n", cpu_addr);
return;
}
-- 
2.20.1


[PATCH 2/3] dma-mapping: always use VM_DMA_COHERENT for generic DMA remap

2019-06-03 Thread Christoph Hellwig
Currently the generic dma remap allocator gets a vm_flags passed by
the caller that is a little confusing.  We just introduced a generic
vmalloc-level flag to identify the dma coherent allocations, so use
that everywhere and remove the now pointless argument.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c| 37 +++-
 arch/xtensa/kernel/pci-dma.c |  4 ++--
 drivers/iommu/dma-iommu.c|  6 +++---
 include/linux/dma-mapping.h  |  6 ++
 kernel/dma/remap.c   | 25 +++-
 5 files changed, 25 insertions(+), 53 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e197b028e341..647fd25d2aba 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -356,19 +356,6 @@ static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
 pgprot_t prot, struct page **ret_page,
 const void *caller, bool want_vaddr);
 
-static void *
-__dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot,
-   const void *caller)
-{
-   return dma_common_contiguous_remap(page, size, VM_DMA_COHERENT,
-   prot, caller);
-}
-
-static void __dma_free_remap(void *cpu_addr, size_t size)
-{
-   dma_common_free_remap(cpu_addr, size, VM_DMA_COHERENT);
-}
-
 #define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K
 static struct gen_pool *atomic_pool __ro_after_init;
 
@@ -525,7 +512,7 @@ static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
if (!want_vaddr)
goto out;
 
-   ptr = __dma_alloc_remap(page, size, gfp, prot, caller);
+   ptr = dma_common_contiguous_remap(page, size, gfp, prot, caller);
if (!ptr) {
__dma_free_buffer(page, size);
return NULL;
@@ -592,7 +579,8 @@ static void *__alloc_from_contiguous(struct device *dev, size_t size,
goto out;
 
if (PageHighMem(page)) {
-   ptr = __dma_alloc_remap(page, size, GFP_KERNEL, prot, caller);
+   ptr = dma_common_contiguous_remap(page, size, GFP_KERNEL, prot,
+   caller);
if (!ptr) {
dma_release_from_contiguous(dev, page, count);
return NULL;
@@ -612,7 +600,7 @@ static void __free_from_contiguous(struct device *dev, struct page *page,
 {
if (want_vaddr) {
if (PageHighMem(page))
-   __dma_free_remap(cpu_addr, size);
+   dma_common_free_remap(cpu_addr, size);
else
__dma_remap(page, size, PAGE_KERNEL);
}
@@ -704,7 +692,7 @@ static void *remap_allocator_alloc(struct arm_dma_alloc_args *args,
 static void remap_allocator_free(struct arm_dma_free_args *args)
 {
if (args->want_vaddr)
-   __dma_free_remap(args->cpu_addr, args->size);
+   dma_common_free_remap(args->cpu_addr, args->size);
 
__dma_free_buffer(args->page, args->size);
 }
@@ -1374,17 +1362,6 @@ static int __iommu_free_buffer(struct device *dev, struct page **pages,
return 0;
 }
 
-/*
- * Create a CPU mapping for a specified pages
- */
-static void *
-__iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot,
-   const void *caller)
-{
-   return dma_common_pages_remap(pages, size, VM_DMA_COHERENT, prot,
-   caller);
-}
-
 /*
  * Create a mapping in device IO address space for specified pages
  */
@@ -1541,7 +1518,7 @@ static void *__arm_iommu_alloc_attrs(struct device *dev, size_t size,
if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
return pages;
 
-   addr = __iommu_alloc_remap(pages, size, gfp, prot,
+   addr = dma_common_pages_remap(pages, size, gfp, prot,
   __builtin_return_address(0));
if (!addr)
goto err_mapping;
@@ -1625,7 +1602,7 @@ void __arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
}
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
-   dma_common_free_remap(cpu_addr, size, VM_DMA_COHERENT);
+   dma_common_free_remap(cpu_addr, size);
 
__iommu_remove_mapping(dev, handle, size);
__iommu_free_buffer(dev, pages, size, attrs);
diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index 9171bff76fc4..81c1def6b5a2 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -175,7 +175,7 @@ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
if (PageHighMem(page)) {
void *p;
 
-   p = dma_common_contiguous_remap(page, size, VM_MAP,
+   p = dma_common_contiguous_remap(page, size,
pgprot_noncached(PAGE_KERNEL),