Re: [RFC PATCH v2 07/11] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support

2021-03-12 Thread Ira Weiny
On Thu, Mar 11, 2021 at 04:31:37PM -0700, Logan Gunthorpe wrote:
 
> +int dma_pci_p2pdma_supported(struct device *dev)
   ^^^
  bool?

> +{
> + const struct dma_map_ops *ops = get_dma_ops(dev);
> +
> + return !ops || ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED;

Is this logic correct?  I would have expected:

return (ops && ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED);

Ira
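
For reference, a minimal sketch contrasting the two checks (assuming, as
elsewhere in this series, that a NULL ops means the dma-direct path is
used, which gains P2PDMA support in a later patch):

	int dma_pci_p2pdma_supported(struct device *dev)
	{
		const struct dma_map_ops *ops = get_dma_ops(dev);

		/* Patch as posted: NULL ops (dma-direct) counts as supported. */
		return !ops || (ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED);

		/*
		 * Ira's reading would instead be:
		 *	return ops && (ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED);
		 * which would reject the dma-direct case.
		 */
	}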


Re: [RFC PATCH v2 04/11] PCI/P2PDMA: Introduce pci_p2pdma_should_map_bus() and pci_p2pdma_bus_offset()

2021-03-12 Thread Ira Weiny
On Thu, Mar 11, 2021 at 04:31:34PM -0700, Logan Gunthorpe wrote:
> Introduce pci_p2pdma_should_map_bus() which is meant to be called by
^
pci_p2pdma_dma_map_type() ???

FWIW I find this name confusingly similar to pci_p2pdma_map_type(), and looking at the
implementation I'm not clear why pci_p2pdma_dma_map_type() needs to exist?

Ira

[snip]

> +
> +/**
> + * pci_p2pdma_dma_map_type - determine if a DMA mapping should use the
> + *   bus address, be mapped normally or fail
> + * @dev: device doing the DMA request
> + * @pgmap: dev_pagemap structure for the mapping
> + *
> + * Returns:
> + *    1 - if the page should be mapped with a bus address,
> + *    0 - if the page should be mapped normally through an IOMMU mapping or
> + *        physical address; or
> + *   -1 - if the device should not map the pages and an error should be
> + *        returned
> + */
> +int pci_p2pdma_dma_map_type(struct device *dev, struct dev_pagemap *pgmap)
> +{
> + struct pci_p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(pgmap);
> + struct pci_dev *client;
> +
> + if (!dev_is_pci(dev))
> + return -1;
> +
> + client = to_pci_dev(dev);
> +
> + switch (pci_p2pdma_map_type(p2p_pgmap->provider, client)) {
> + case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> + return 0;
> + case PCI_P2PDMA_MAP_BUS_ADDR:
> + return 1;
> + default:
> + return -1;
> + }
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_dma_map_type);

I guess the main point here is to export this to the DMA layer?

Ira
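
A hypothetical sketch of how a DMA-mapping implementation might consume
these exports (loosely mirroring the dma-iommu changes later in this
thread; function name and structure invented for illustration only):

	static int example_map_sg(struct device *dev, struct scatterlist *sgl,
				  int nents, enum dma_data_direction dir)
	{
		struct scatterlist *s;
		int i;

		for_each_sg(sgl, s, nents, i) {
			if (!is_pci_p2pdma_page(sg_page(s)))
				continue;	/* mapped normally further down */

			switch (pci_p2pdma_dma_map_type(dev, sg_page(s)->pgmap)) {
			case 1:	/* use the PCI bus address directly */
				sg_dma_address(s) = sg_phys(s) -
					pci_p2pdma_bus_offset(sg_page(s));
				sg_dma_len(s) = s->length;
				break;
			case 0:	/* map through the IOMMU/direct path as usual */
				break;
			default:	/* this device cannot reach the pages */
				return -EREMOTEIO;
			}
		}
		/* ... normal mapping of the non-P2P segments ... */
		return nents;
	}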


Re: [RFC PATCH v2 04/11] PCI/P2PDMA: Introduce pci_p2pdma_should_map_bus() and pci_p2pdma_bus_offset()

2021-03-12 Thread Ira Weiny
On Thu, Mar 11, 2021 at 04:31:34PM -0700, Logan Gunthorpe wrote:
> Introduce pci_p2pdma_should_map_bus() which is meant to be called by
> DMA map functions to determine how to map a given p2pdma page.
> 
> pci_p2pdma_bus_offset() is also added to allow callers to get the bus
> offset if they need to map the bus address.
> 
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/pci/p2pdma.c   | 50 ++
>  include/linux/pci-p2pdma.h | 11 +
>  2 files changed, 61 insertions(+)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 7f6836732bce..66d16b7eb668 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -912,6 +912,56 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, 
> struct scatterlist *sg,
>  }
>  EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
>  
> +/**
> + * pci_p2pdma_bus_offset - returns the bus offset for a given page
> + * @page: page to get the offset for
> + *
> + * Must be passed a PCI p2pdma page.
> + */
> +u64 pci_p2pdma_bus_offset(struct page *page)
> +{
> + struct pci_p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(page->pgmap);
> +
> + WARN_ON(!is_pci_p2pdma_page(page));

Shouldn't this check be before the to_p2p_pgmap() call?  And I've been told not
to introduce WARN_ON's.  Should this be?

if (!is_pci_p2pdma_page(page))
return -1;

???

Ira
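
A minimal sketch of the reordering suggested above (the exact error value
for a u64 return is left open here; bus_offset is assumed to live in
struct pci_p2pdma_pagemap, as in the existing p2pdma code):

	u64 pci_p2pdma_bus_offset(struct page *page)
	{
		struct pci_p2pdma_pagemap *p2p_pgmap;

		if (!is_pci_p2pdma_page(page))
			return 0;	/* or an agreed-upon error sentinel */

		p2p_pgmap = to_p2p_pgmap(page->pgmap);
		return p2p_pgmap->bus_offset;
	}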


[PATCH v4 6/6] media: uvcvideo: Use dma_alloc_noncontiguous API

2021-03-12 Thread Ricardo Ribalda
On architectures where there is no coherent caching, such as ARM, use the
dma_alloc_noncontiguous API and handle the cache flushing manually using
dma_sync_sgtable().

If the architecture has a coherent cache, the API falls back to
dma_alloc_pages, so we can remove the coherent-caching code path from the
driver, making it simpler.
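
A minimal sketch of the allocate/vmap/sync pattern the driver moves to
(not the actual driver code; error handling trimmed, names hypothetical):

	#include <linux/dma-mapping.h>
	#include <linux/highmem.h>

	static void *example_alloc_stream_buffer(struct device *dma_dev,
						 size_t size,
						 enum dma_data_direction dir,
						 struct sg_table **sgt)
	{
		void *vaddr;

		*sgt = dma_alloc_noncontiguous(dma_dev, size, dir, GFP_KERNEL, 0);
		if (!*sgt)
			return NULL;

		vaddr = dma_vmap_noncontiguous(dma_dev, size, *sgt);
		if (!vaddr)
			dma_free_noncontiguous(dma_dev, size, *sgt, dir);
		return vaddr;
	}

	/*
	 * Ownership then moves with dma_sync_sgtable_for_device() before an
	 * URB is submitted, and dma_sync_sgtable_for_cpu() plus
	 * invalidate_kernel_vmap_range() before the CPU parses the data,
	 * as in the hunks below.
	 */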

With this patch on the affected architectures we can measure up to 20x
performance improvement in uvc_video_copy_data_work().

Eg: aarch64 with an external usb camera

NON_CONTIGUOUS
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 67034480 : duration 33303
FPS: 29.99
URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS)
decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
raw decode speed: 9.931 Gbits/s
raw URB handling speed: 1.025 Gbits/s
throughput: 16.102 Mbits/s
URB decode CPU usage 0.162600 %

COHERENT
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 54683536 : duration 33302
FPS: 29.99
URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS)
header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS)
decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max (uS)
raw decode speed: 365.470 Mbits/s
raw URB handling speed: 295.986 Mbits/s
throughput: 13.136 Mbits/s
URB decode CPU usage 3.594500 %

In non-affected architectures we see no significant impact.

Eg: x86 with an external usb camera

NON_CONTIGUOUS
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 70179056 : duration 33301
FPS: 29.99
URB: 288901/4897 uS/qty: 58.995 avg 26.022 std 4.319 min 253.853 max (uS)
header: 54792/4897 uS/qty: 11.189 avg 6.218 std 0.620 min 61.750 max (uS)
latency: 236602/4897 uS/qty: 48.315 avg 24.244 std 1.764 min 240.924 max (uS)
decode: 52298/4897 uS/qty: 10.679 avg 8.299 std 1.638 min 108.861 max (uS)
raw decode speed: 10.796 Gbits/s
raw URB handling speed: 1.949 Gbits/s
throughput: 16.859 Mbits/s
URB decode CPU usage 0.157000 %

COHERENT
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 71818320 : duration 33301
FPS: 29.99
URB: 321021/5000 uS/qty: 64.204 avg 23.001 std 10.430 min 268.837 max (uS)
header: 54308/5000 uS/qty: 10.861 avg 5.104 std 0.778 min 54.736 max (uS)
latency: 268799/5000 uS/qty: 53.759 avg 21.827 std 6.095 min 255.153 max (uS)
decode: 5/5000 uS/qty: 10.444 avg 7.137 std 1.874 min 71.103 max (uS)
raw decode speed: 11.048 Gbits/s
raw URB handling speed: 1.789 Gbits/s
throughput: 17.253 Mbits/s
URB decode CPU usage 0.156800 %

Signed-off-by: Ricardo Ribalda 
Reviewed-by: Laurent Pinchart 
Reviewed-by: Tomasz Figa 
Signed-off-by: Christoph Hellwig 
---

Changelog from v3 (Thanks Laurent!):

- Rename stream_dir and stream_to_dmadev to avoid collisions
- Improve commit message

 drivers/media/usb/uvc/uvc_video.c | 94 +++
 drivers/media/usb/uvc/uvcvideo.h  |  5 +-
 2 files changed, 73 insertions(+), 26 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_video.c 
b/drivers/media/usb/uvc/uvc_video.c
index f2f565281e63..cdd8eb500bb7 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -6,11 +6,14 @@
  *  Laurent Pinchart (laurent.pinch...@ideasonboard.com)
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1096,6 +1099,29 @@ static int uvc_video_decode_start(struct uvc_streaming 
*stream,
return data[0];
 }
 
+static inline enum dma_data_direction uvc_stream_dir(
+   struct uvc_streaming *stream)
+{
+   if (stream->type == V4L2_BUF_TYPE_VIDEO_CAPTURE)
+   return DMA_FROM_DEVICE;
+   else
+   return DMA_TO_DEVICE;
+}
+
+static inline struct device *uvc_stream_to_dmadev(struct uvc_streaming *stream)
+{
+   return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
+}
+
+static int uvc_submit_urb(struct uvc_urb *uvc_urb, gfp_t mem_flags)
+{
+   /* Sync DMA. */
+   dma_sync_sgtable_for_device(uvc_stream_to_dmadev(uvc_urb->stream),
+   uvc_urb->sgt,
+   uvc_stream_dir(uvc_urb->stream));
+   return usb_submit_urb(uvc_urb->urb, GFP_KERNEL);
+}
+
 /*
  * uvc_video_decode_data_work: Asynchronous 

Re: [PATCH v3 6/6] media: uvcvideo: Use dma_alloc_noncontiguous API

2021-03-12 Thread Laurent Pinchart
Hi Ricardo,

Thank you for the patch.

On Fri, Mar 12, 2021 at 11:25:39PM +0100, Ricardo Ribalda wrote:
> On architectures where there is no coherent caching such as ARM use the
> dma_alloc_noncontiguous API and handle manually the cache flushing using
> dma_sync_sgtable().

Maybe updating this based on the comment I've just sent in the v2 thread
?

> With this patch on the affected architectures we can measure up to 20x
> performance improvement in uvc_video_copy_data_work().
> 
> Eg: aarch64 with an external usb camera
> 
> NON_CONTIGUOUS
> frames:  999
> packets: 999
> empty:   0 (0 %)
> errors:  0
> invalid: 0
> pts: 0 early, 0 initial, 999 ok
> scr: 0 count ok, 0 diff ok
> sof: 2048 <= sof <= 0, freq 0.000 kHz
> bytes 67034480 : duration 33303
> FPS: 29.99
> URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
> header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
> latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS)
> decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
> raw decode speed: 9.931 Gbits/s
> raw URB handling speed: 1.025 Gbits/s
> throughput: 16.102 Mbits/s
> URB decode CPU usage 0.162600 %
> 
> COHERENT
> frames:  999
> packets: 999
> empty:   0 (0 %)
> errors:  0
> invalid: 0
> pts: 0 early, 0 initial, 999 ok
> scr: 0 count ok, 0 diff ok
> sof: 2048 <= sof <= 0, freq 0.000 kHz
> bytes 54683536 : duration 33302
> FPS: 29.99
> URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS)
> header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
> latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS)
> decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max 
> (uS)
> raw decode speed: 365.470 Mbits/s
> raw URB handling speed: 295.986 Mbits/s
> throughput: 13.136 Mbits/s
> URB decode CPU usage 3.594500 %
> 
> In non-affected architectures we see no significant impact.
> 
> Eg: x86 with an external usb camera
> 
> NON_CONTIGUOUS
> frames:  999
> packets: 999
> empty:   0 (0 %)
> errors:  0
> invalid: 0
> pts: 0 early, 0 initial, 999 ok
> scr: 0 count ok, 0 diff ok
> sof: 2048 <= sof <= 0, freq 0.000 kHz
> bytes 70179056 : duration 33301
> FPS: 29.99
> URB: 288901/4897 uS/qty: 58.995 avg 26.022 std 4.319 min 253.853 max (uS)
> header: 54792/4897 uS/qty: 11.189 avg 6.218 std 0.620 min 61.750 max (uS)
> latency: 236602/4897 uS/qty: 48.315 avg 24.244 std 1.764 min 240.924 max (uS)
> decode: 52298/4897 uS/qty: 10.679 avg 8.299 std 1.638 min 108.861 max (uS)
> raw decode speed: 10.796 Gbits/s
> raw URB handling speed: 1.949 Gbits/s
> throughput: 16.859 Mbits/s
> URB decode CPU usage 0.157000 %
> 
> COHERENT
> frames:  999
> packets: 999
> empty:   0 (0 %)
> errors:  0
> invalid: 0
> pts: 0 early, 0 initial, 999 ok
> scr: 0 count ok, 0 diff ok
> sof: 2048 <= sof <= 0, freq 0.000 kHz
> bytes 71818320 : duration 33301
> FPS: 29.99
> URB: 321021/5000 uS/qty: 64.204 avg 23.001 std 10.430 min 268.837 max (uS)
> header: 54308/5000 uS/qty: 10.861 avg 5.104 std 0.778 min 54.736 max (uS)
> latency: 268799/5000 uS/qty: 53.759 avg 21.827 std 6.095 min 255.153 max (uS)
> decode: 5/5000 uS/qty: 10.444 avg 7.137 std 1.874 min 71.103 max (uS)
> raw decode speed: 11.048 Gbits/s
> raw URB handling speed: 1.789 Gbits/s
> throughput: 17.253 Mbits/s
> URB decode CPU usage 0.156800 %
> 
> Signed-off-by: Ricardo Ribalda 
> Reviewed-by: Laurent Pinchart 
> Reviewed-by: Tomasz Figa 
> Signed-off-by: Christoph Hellwig 
> ---
> 
> Changelog from v2 (Thanks Laurent!):
> 
> - Replace uvc_urb_dma_sync with uvc_submit_urb
> - Add x86 stats in commit message
> 
>  drivers/media/usb/uvc/uvc_video.c | 92 ++-
>  drivers/media/usb/uvc/uvcvideo.h  |  5 +-
>  2 files changed, 71 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/media/usb/uvc/uvc_video.c 
> b/drivers/media/usb/uvc/uvc_video.c
> index f2f565281e63..37ee39412b83 100644
> --- a/drivers/media/usb/uvc/uvc_video.c
> +++ b/drivers/media/usb/uvc/uvc_video.c
> @@ -6,11 +6,14 @@
>   *  Laurent Pinchart (laurent.pinch...@ideasonboard.com)
>   */
>  
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1096,6 +1099,28 @@ static int uvc_video_decode_start(struct uvc_streaming 
> *stream,
>   return data[0];
>  }
>  
> +static inline enum dma_data_direction stream_dir(struct uvc_streaming 
> *stream)

I hadn't noticed this before, but could you name this function
uvc_stream_dir() (and uvc_stream_to_dmadev() below) to avoid potential
namespace clashes ?

> +{
> + if (stream->type == V4L2_BUF_TYPE_VIDEO_CAPTURE)
> + return DMA_FROM_DEVICE;
> + else
> + return DMA_TO_DEVICE;
> +}
> +
> +static inline struct device *stream_to_dmadev(struct uvc_streaming *stream)
> +{
> + return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
> +}

Re: [PATCH v2 6/6] media: uvcvideo: Use dma_alloc_noncontiguous API

2021-03-12 Thread Ricardo Ribalda Delgado
Hi Laurent

Thanks a lot for the review

On Fri, Mar 12, 2021 at 10:19 PM Laurent Pinchart
 wrote:
>
> Hi Ricardo,
>
> Thank you for the patch.
>
> On Fri, Mar 12, 2021 at 01:57:09PM +0100, Ricardo Ribalda wrote:
> > On architectures where there is no coherent caching such as ARM use the
> > dma_alloc_noncontiguous API and handle manually the cache flushing using
> > dma_sync_sgtable().
>
> You're actually switching to dma_alloc_noncontiguous() unconditionally,
> not only on those architectures :-) Do I assume correctly that
> dma_alloc_noncontiguous() will do the right thing on x86 too ?

Yes, dma_alloc_noncontiguous() does all the magic. It falls back to
dma_alloc_pages. Christoph has done a great job.

I tried it on my notebook, and although the images are anything but
pretty, it is not the API's fault. It is because the barbers are closed
due to the pandemic ;).

>
> > With this patch on the affected architectures we can measure up to 20x
> > performance improvement in uvc_video_copy_data_work().
>
> Have you measured performances on x86 to ensure there's no regression ?

Yes in the early stages. I am adding the latest x86 measurements in
the commit message in v3 to make it more clear.

>
> > Eg: aarch64 with an external usb camera
> >
> > NON_CONTIGUOUS
> > frames:  999
> > packets: 999
> > empty:   0 (0 %)
> > errors:  0
> > invalid: 0
> > pts: 0 early, 0 initial, 999 ok
> > scr: 0 count ok, 0 diff ok
> > sof: 2048 <= sof <= 0, freq 0.000 kHz
> > bytes 67034480 : duration 33303
> > FPS: 29.99
> > URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
> > header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
> > latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max 
> > (uS)
> > decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
> > raw decode speed: 9.931 Gbits/s
> > raw URB handling speed: 1.025 Gbits/s
> > throughput: 16.102 Mbits/s
> > URB decode CPU usage 0.162600 %
> >
> > COHERENT
> > frames:  999
> > packets: 999
> > empty:   0 (0 %)
> > errors:  0
> > invalid: 0
> > pts: 0 early, 0 initial, 999 ok
> > scr: 0 count ok, 0 diff ok
> > sof: 2048 <= sof <= 0, freq 0.000 kHz
> > bytes 54683536 : duration 33302
> > FPS: 29.99
> > URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max 
> > (uS)
> > header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
> > latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max 
> > (uS)
> > decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max 
> > (uS)
> > raw decode speed: 365.470 Mbits/s
> > raw URB handling speed: 295.986 Mbits/s
> > throughput: 13.136 Mbits/s
> > URB decode CPU usage 3.594500 %
> >
> > Signed-off-by: Ricardo Ribalda 
> > Reviewed-by: Tomasz Figa 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >
> > Changelog from v2: (Thanks Laurent)
> >
> > - Fix typos
> > - Use the right dma direction if not capturing
> > - Clear sgt during free
> >
> >  drivers/media/usb/uvc/uvc_video.c | 92 +++
> >  drivers/media/usb/uvc/uvcvideo.h  |  5 +-
> >  2 files changed, 74 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/media/usb/uvc/uvc_video.c 
> > b/drivers/media/usb/uvc/uvc_video.c
> > index f2f565281e63..8e60f81e2257 100644
> > --- a/drivers/media/usb/uvc/uvc_video.c
> > +++ b/drivers/media/usb/uvc/uvc_video.c
> > @@ -6,11 +6,14 @@
> >   *  Laurent Pinchart (laurent.pinch...@ideasonboard.com)
> >   */
> >
> > +#include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -1096,6 +1099,34 @@ static int uvc_video_decode_start(struct 
> > uvc_streaming *stream,
> >   return data[0];
> >  }
> >
> > +static inline enum dma_data_direction stream_dir(struct uvc_streaming 
> > *stream)
> > +{
> > + if (stream->type == V4L2_BUF_TYPE_VIDEO_CAPTURE)
> > + return DMA_FROM_DEVICE;
> > + else
> > + return DMA_TO_DEVICE;
> > +}
> > +
> > +static inline struct device *stream_to_dmadev(struct uvc_streaming *stream)
> > +{
> > + return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
> > +}
> > +
> > +static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
>
> Maybe nitpicking a little bit, but wouldn't the code be clearer if you
> created uvc_urb_dma_sync_for_cpu() and uvc_urb_dma_sync_for_device() ?
> When reading
>
> uvc_urb_dma_sync(uvc_urb, true);
>
> I have to constantly look up the definition of the function to figure
> out what boolean value corresponds to what direction.
>
> Given that uvc_urb_dma_sync(..., true) is always called right before
> submitting the URB, we could even create a uvc_submit_urb() function
> that groups the dma_sync_sgtable_for_device() and usb_submit_urb()
> calls, and do without uvc_urb_dma_sync_for_device(). Up to you on this
> one.

I like it! Thanks for the idea.

>
> 

Re: [PATCH v2 6/6] media: uvcvideo: Use dma_alloc_noncontiguous API

2021-03-12 Thread Laurent Pinchart
Hi Ricardo,

On Fri, Mar 12, 2021 at 11:15:46PM +0100, Ricardo Ribalda Delgado wrote:
> On Fri, Mar 12, 2021 at 10:19 PM Laurent Pinchart wrote:
> > On Fri, Mar 12, 2021 at 01:57:09PM +0100, Ricardo Ribalda wrote:
> > > On architectures where there is no coherent caching such as ARM use the
> > > dma_alloc_noncontiguous API and handle manually the cache flushing using
> > > dma_sync_sgtable().
> >
> > You're actually switching to dma_alloc_noncontiguous() unconditionally,
> > not only on those architectures :-) Do I assume correctly that
> > dma_alloc_noncontiguous() will do the right thing on x86 too ?
> 
> Yes dma_alloc_noncontiguous does all the magic. It falls back to
> alloc_dma_pages. Christoph has done a great job.

It's really nice work, yes.

Maybe the commit message could be reworded accordingly then, to explain
that we switch to dma_alloc_noncontiguous() unconditionally as it handles
the differences behind the scenes?

> I tried it in my notebook and although the images are nothing but
> pretty it is not the APIs fault. It is because the barbers are closed
> due to the pandemic ;).

:-)

> > > With this patch on the affected architectures we can measure up to 20x
> > > performance improvement in uvc_video_copy_data_work().
> >
> > Have you measured performances on x86 to ensure there's no regression ?
> 
> Yes in the early stages. I am adding the latest x86 measurements in
> the commit message in v3 to make it more clear.
> 
> > > Eg: aarch64 with an external usb camera
> > >
> > > NON_CONTIGUOUS
> > > frames:  999
> > > packets: 999
> > > empty:   0 (0 %)
> > > errors:  0
> > > invalid: 0
> > > pts: 0 early, 0 initial, 999 ok
> > > scr: 0 count ok, 0 diff ok
> > > sof: 2048 <= sof <= 0, freq 0.000 kHz
> > > bytes 67034480 : duration 33303
> > > FPS: 29.99
> > > URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max 
> > > (uS)
> > > header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max 
> > > (uS)
> > > latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max 
> > > (uS)
> > > decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
> > > raw decode speed: 9.931 Gbits/s
> > > raw URB handling speed: 1.025 Gbits/s
> > > throughput: 16.102 Mbits/s
> > > URB decode CPU usage 0.162600 %
> > >
> > > COHERENT
> > > frames:  999
> > > packets: 999
> > > empty:   0 (0 %)
> > > errors:  0
> > > invalid: 0
> > > pts: 0 early, 0 initial, 999 ok
> > > scr: 0 count ok, 0 diff ok
> > > sof: 2048 <= sof <= 0, freq 0.000 kHz
> > > bytes 54683536 : duration 33302
> > > FPS: 29.99
> > > URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max 
> > > (uS)
> > > header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max 
> > > (uS)
> > > latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max 
> > > (uS)
> > > decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 
> > > max (uS)
> > > raw decode speed: 365.470 Mbits/s
> > > raw URB handling speed: 295.986 Mbits/s
> > > throughput: 13.136 Mbits/s
> > > URB decode CPU usage 3.594500 %
> > >
> > > Signed-off-by: Ricardo Ribalda 
> > > Reviewed-by: Tomasz Figa 
> > > Signed-off-by: Christoph Hellwig 
> > > ---
> > >
> > > Changelog from v2: (Thanks Laurent)
> > >
> > > - Fix typos
> > > - Use the right dma direction if not capturing
> > > - Clear sgt during free
> > >
> > >  drivers/media/usb/uvc/uvc_video.c | 92 +++
> > >  drivers/media/usb/uvc/uvcvideo.h  |  5 +-
> > >  2 files changed, 74 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/drivers/media/usb/uvc/uvc_video.c 
> > > b/drivers/media/usb/uvc/uvc_video.c
> > > index f2f565281e63..8e60f81e2257 100644
> > > --- a/drivers/media/usb/uvc/uvc_video.c
> > > +++ b/drivers/media/usb/uvc/uvc_video.c
> > > @@ -6,11 +6,14 @@
> > >   *  Laurent Pinchart (laurent.pinch...@ideasonboard.com)
> > >   */
> > >
> > > +#include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -1096,6 +1099,34 @@ static int uvc_video_decode_start(struct 
> > > uvc_streaming *stream,
> > >   return data[0];
> > >  }
> > >
> > > +static inline enum dma_data_direction stream_dir(struct uvc_streaming 
> > > *stream)
> > > +{
> > > + if (stream->type == V4L2_BUF_TYPE_VIDEO_CAPTURE)
> > > + return DMA_FROM_DEVICE;
> > > + else
> > > + return DMA_TO_DEVICE;
> > > +}
> > > +
> > > +static inline struct device *stream_to_dmadev(struct uvc_streaming 
> > > *stream)
> > > +{
> > > + return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
> > > +}
> > > +
> > > +static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
> >
> > Maybe nitpicking a little bit, but wouldn't the code be clearer if you
> > created uvc_urb_dma_sync_for_cpu() and uvc_urb_dma_sync_for_device() ?
> > When reading
> >
> >   

[PATCH v3 6/6] media: uvcvideo: Use dma_alloc_noncontiguous API

2021-03-12 Thread Ricardo Ribalda
On architectures where there is no coherent caching, such as ARM, use the
dma_alloc_noncontiguous API and handle the cache flushing manually using
dma_sync_sgtable().

With this patch on the affected architectures we can measure up to 20x
performance improvement in uvc_video_copy_data_work().

Eg: aarch64 with an external usb camera

NON_CONTIGUOUS
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 67034480 : duration 33303
FPS: 29.99
URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS)
decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
raw decode speed: 9.931 Gbits/s
raw URB handling speed: 1.025 Gbits/s
throughput: 16.102 Mbits/s
URB decode CPU usage 0.162600 %

COHERENT
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 54683536 : duration 33302
FPS: 29.99
URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS)
header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS)
decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max (uS)
raw decode speed: 365.470 Mbits/s
raw URB handling speed: 295.986 Mbits/s
throughput: 13.136 Mbits/s
URB decode CPU usage 3.594500 %

In non-affected architectures we see no significant impact.

Eg: x86 with an external usb camera

NON_CONTIGUOUS
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 70179056 : duration 33301
FPS: 29.99
URB: 288901/4897 uS/qty: 58.995 avg 26.022 std 4.319 min 253.853 max (uS)
header: 54792/4897 uS/qty: 11.189 avg 6.218 std 0.620 min 61.750 max (uS)
latency: 236602/4897 uS/qty: 48.315 avg 24.244 std 1.764 min 240.924 max (uS)
decode: 52298/4897 uS/qty: 10.679 avg 8.299 std 1.638 min 108.861 max (uS)
raw decode speed: 10.796 Gbits/s
raw URB handling speed: 1.949 Gbits/s
throughput: 16.859 Mbits/s
URB decode CPU usage 0.157000 %

COHERENT
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 71818320 : duration 33301
FPS: 29.99
URB: 321021/5000 uS/qty: 64.204 avg 23.001 std 10.430 min 268.837 max (uS)
header: 54308/5000 uS/qty: 10.861 avg 5.104 std 0.778 min 54.736 max (uS)
latency: 268799/5000 uS/qty: 53.759 avg 21.827 std 6.095 min 255.153 max (uS)
decode: 5/5000 uS/qty: 10.444 avg 7.137 std 1.874 min 71.103 max (uS)
raw decode speed: 11.048 Gbits/s
raw URB handling speed: 1.789 Gbits/s
throughput: 17.253 Mbits/s
URB decode CPU usage 0.156800 %

Signed-off-by: Ricardo Ribalda 
Reviewed-by: Laurent Pinchart 
Reviewed-by: Tomasz Figa 
Signed-off-by: Christoph Hellwig 
---

Changelog from v2 (Thanks Laurent!):

- Replace uvc_urb_dma_sync with uvc_submit_urb
- Add x86 stats in commit message

 drivers/media/usb/uvc/uvc_video.c | 92 ++-
 drivers/media/usb/uvc/uvcvideo.h  |  5 +-
 2 files changed, 71 insertions(+), 26 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_video.c 
b/drivers/media/usb/uvc/uvc_video.c
index f2f565281e63..37ee39412b83 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -6,11 +6,14 @@
  *  Laurent Pinchart (laurent.pinch...@ideasonboard.com)
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1096,6 +1099,28 @@ static int uvc_video_decode_start(struct uvc_streaming 
*stream,
return data[0];
 }
 
+static inline enum dma_data_direction stream_dir(struct uvc_streaming *stream)
+{
+   if (stream->type == V4L2_BUF_TYPE_VIDEO_CAPTURE)
+   return DMA_FROM_DEVICE;
+   else
+   return DMA_TO_DEVICE;
+}
+
+static inline struct device *stream_to_dmadev(struct uvc_streaming *stream)
+{
+   return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
+}
+
+static int uvc_submit_urb(struct uvc_urb *uvc_urb, gfp_t mem_flags)
+{
+   /* Sync DMA. */
+   dma_sync_sgtable_for_device(stream_to_dmadev(uvc_urb->stream),
+   uvc_urb->sgt,
+   stream_dir(uvc_urb->stream));
+   return usb_submit_urb(uvc_urb->urb, GFP_KERNEL);
+}
+
 /*
  * uvc_video_decode_data_work: Asynchronous memcpy processing
  *
@@ -1117,7 +1142,7 @@ static void uvc_video_copy_data_work(struct work_struct 
*work)
uvc_queue_buffer_release(op->buf);
}
 
-   ret = usb_submit_urb(uvc_urb->urb, 

Re: [RFC PATCH v2 02/11] PCI/P2PDMA: Avoid pci_get_slot() which sleeps

2021-03-12 Thread Logan Gunthorpe



On 2021-03-12 1:57 p.m., Bjorn Helgaas wrote:
> On Thu, Mar 11, 2021 at 04:31:32PM -0700, Logan Gunthorpe wrote:
>> In order to use upstream_bridge_distance_warn() from a dma_map function,
>> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
>> might sleep.
>>
>> In order to avoid this, try to get the host bridge's device from
>> bus->self, and if that is not set just get the first element in the
>> list. It should be impossible for the host bridges device to go away
>> while references are held on child devices, so the first element
>> should not change and this should be safe.
>>
>> Signed-off-by: Logan Gunthorpe 
>> ---
>>  drivers/pci/p2pdma.c | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index bd89437faf06..2135fe69bb07 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -311,11 +311,15 @@ static const struct pci_p2pdma_whitelist_entry {
>>  static bool __host_bridge_whitelist(struct pci_host_bridge *host,
>>  bool same_host_bridge)
>>  {
>> -struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
>>  const struct pci_p2pdma_whitelist_entry *entry;
>> +struct pci_dev *root = host->bus->self;
>>  unsigned short vendor, device;
>>  
>>  if (!root)
> >> +		root = list_first_entry_or_null(&host->bus->devices,
> >> +						struct pci_dev, bus_list);
> 
> Replacing one ugliness (assuming there is a pci_dev for the host
> bridge, and that it is at 00.0) with another (still assuming a pci_dev
> and that it is host->bus->self or the first entry).  I can't suggest
> anything better, but maybe a little comment in the code would help
> future readers.

Yeah, I struggled to find a solution here; this was the best I could
come up with. I'd love it if someone had a better idea. I can add a
comment for future iterations.
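
For what it's worth, a sketch of how that comment might read on the new
fallback (logic unchanged from the quoted hunk):

	struct pci_dev *root = host->bus->self;

	/*
	 * Host bridges do not always have bus->self set. In that case assume
	 * the bridge is represented by the first device on its root bus
	 * (typically 00.0); that device cannot go away while references are
	 * held on its children, so the first element is stable.
	 */
	if (!root)
		root = list_first_entry_or_null(&host->bus->devices,
						struct pci_dev, bus_list);

	if (!root || root->devfn)
		return false;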

> I wish we had a real way to discover this property without the
> whitelist, at least for future devices.  Was there ever any interest
> in a _DSM or similar interface for this?

I'd also like to get rid of the whitelist, but I have no idea how or who
would have to lead a fight to get the hardware to self describe in way
that we could use.

> I *am* very glad to remove a pci_get_slot() usage.
> 
>> +
>> +if (!root || root->devfn)
>>  return false;
>>  
>>  vendor = root->vendor;
> 
> Don't you need to also remove the "pci_dev_put(root)" a few lines
> below?

Yes, right. Good catch!

Logan



Re: [PATCH v2 6/6] media: uvcvideo: Use dma_alloc_noncontiguous API

2021-03-12 Thread Laurent Pinchart
Hi Ricardo,

Thank you for the patch.

On Fri, Mar 12, 2021 at 01:57:09PM +0100, Ricardo Ribalda wrote:
> On architectures where there is no coherent caching such as ARM use the
> dma_alloc_noncontiguous API and handle manually the cache flushing using
> dma_sync_sgtable().

You're actually switching to dma_alloc_noncontiguous() unconditionally,
not only on those architectures :-) Do I assume correctly that
dma_alloc_noncontiguous() will do the right thing on x86 too ?

> With this patch on the affected architectures we can measure up to 20x
> performance improvement in uvc_video_copy_data_work().

Have you measured performances on x86 to ensure there's no regression ?

> Eg: aarch64 with an external usb camera
> 
> NON_CONTIGUOUS
> frames:  999
> packets: 999
> empty:   0 (0 %)
> errors:  0
> invalid: 0
> pts: 0 early, 0 initial, 999 ok
> scr: 0 count ok, 0 diff ok
> sof: 2048 <= sof <= 0, freq 0.000 kHz
> bytes 67034480 : duration 33303
> FPS: 29.99
> URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
> header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
> latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS)
> decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
> raw decode speed: 9.931 Gbits/s
> raw URB handling speed: 1.025 Gbits/s
> throughput: 16.102 Mbits/s
> URB decode CPU usage 0.162600 %
> 
> COHERENT
> frames:  999
> packets: 999
> empty:   0 (0 %)
> errors:  0
> invalid: 0
> pts: 0 early, 0 initial, 999 ok
> scr: 0 count ok, 0 diff ok
> sof: 2048 <= sof <= 0, freq 0.000 kHz
> bytes 54683536 : duration 33302
> FPS: 29.99
> URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS)
> header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
> latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS)
> decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max 
> (uS)
> raw decode speed: 365.470 Mbits/s
> raw URB handling speed: 295.986 Mbits/s
> throughput: 13.136 Mbits/s
> URB decode CPU usage 3.594500 %
> 
> Signed-off-by: Ricardo Ribalda 
> Reviewed-by: Tomasz Figa 
> Signed-off-by: Christoph Hellwig 
> ---
> 
> Changelog from v2: (Thanks Laurent)
> 
> - Fix typos
> - Use the right dma direction if not capturing
> - Clear sgt during free
> 
>  drivers/media/usb/uvc/uvc_video.c | 92 +++
>  drivers/media/usb/uvc/uvcvideo.h  |  5 +-
>  2 files changed, 74 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/media/usb/uvc/uvc_video.c 
> b/drivers/media/usb/uvc/uvc_video.c
> index f2f565281e63..8e60f81e2257 100644
> --- a/drivers/media/usb/uvc/uvc_video.c
> +++ b/drivers/media/usb/uvc/uvc_video.c
> @@ -6,11 +6,14 @@
>   *  Laurent Pinchart (laurent.pinch...@ideasonboard.com)
>   */
>  
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1096,6 +1099,34 @@ static int uvc_video_decode_start(struct uvc_streaming 
> *stream,
>   return data[0];
>  }
>  
> +static inline enum dma_data_direction stream_dir(struct uvc_streaming 
> *stream)
> +{
> + if (stream->type == V4L2_BUF_TYPE_VIDEO_CAPTURE)
> + return DMA_FROM_DEVICE;
> + else
> + return DMA_TO_DEVICE;
> +}
> +
> +static inline struct device *stream_to_dmadev(struct uvc_streaming *stream)
> +{
> + return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
> +}
> +
> +static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)

Maybe nitpicking a little bit, but wouldn't the code be clearer if you
created uvc_urb_dma_sync_for_cpu() and uvc_urb_dma_sync_for_device() ?
When reading

uvc_urb_dma_sync(uvc_urb, true);

I have to constantly look up the definition of the function to figure
out what boolean value corresponds to what direction.

Given that uvc_urb_dma_sync(..., true) is always called right before
submitting the URB, we could even create a uvc_submit_urb() function
that groups the dma_sync_sgtable_for_device() and usb_submit_urb()
calls, and do without uvc_urb_dma_sync_for_device(). Up to you on this
one.
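
For illustration, a sketch of the split helpers suggested above
(hypothetical names; the bodies simply mirror the uvc_urb_dma_sync()
hunk quoted below):

	static void uvc_urb_dma_sync_for_device(struct uvc_urb *uvc_urb)
	{
		dma_sync_sgtable_for_device(stream_to_dmadev(uvc_urb->stream),
					    uvc_urb->sgt,
					    stream_dir(uvc_urb->stream));
	}

	static void uvc_urb_dma_sync_for_cpu(struct uvc_urb *uvc_urb)
	{
		dma_sync_sgtable_for_cpu(stream_to_dmadev(uvc_urb->stream),
					 uvc_urb->sgt,
					 stream_dir(uvc_urb->stream));
		invalidate_kernel_vmap_range(uvc_urb->buffer,
					     uvc_urb->stream->urb_size);
	}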

With those small changes, and assuming there's no performance regression
on x86,

Reviewed-by: Laurent Pinchart 

> +{
> + struct device *dma_dev = stream_to_dmadev(uvc_urb->stream);
> +
> + if (for_device) {
> + dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt,
> + stream_dir(uvc_urb->stream));
> + } else {
> + dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt,
> +  stream_dir(uvc_urb->stream));
> + invalidate_kernel_vmap_range(uvc_urb->buffer,
> +  uvc_urb->stream->urb_size);
> + }
> +}
> +
>  /*
>   * uvc_video_decode_data_work: Asynchronous memcpy processing
>   *
> @@ -1117,6 +1148,8 @@ static void 

Re: [RFC PATCH v2 02/11] PCI/P2PDMA: Avoid pci_get_slot() which sleeps

2021-03-12 Thread Bjorn Helgaas
On Thu, Mar 11, 2021 at 04:31:32PM -0700, Logan Gunthorpe wrote:
> In order to use upstream_bridge_distance_warn() from a dma_map function,
> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
> might sleep.
> 
> In order to avoid this, try to get the host bridge's device from
> bus->self, and if that is not set just get the first element in the
> list. It should be impossible for the host bridges device to go away
> while references are held on child devices, so the first element
> should not change and this should be safe.
> 
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/pci/p2pdma.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index bd89437faf06..2135fe69bb07 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -311,11 +311,15 @@ static const struct pci_p2pdma_whitelist_entry {
>  static bool __host_bridge_whitelist(struct pci_host_bridge *host,
>   bool same_host_bridge)
>  {
> - struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
>   const struct pci_p2pdma_whitelist_entry *entry;
> + struct pci_dev *root = host->bus->self;
>   unsigned short vendor, device;
>  
>   if (!root)
> +		root = list_first_entry_or_null(&host->bus->devices,
> +						struct pci_dev, bus_list);

Replacing one ugliness (assuming there is a pci_dev for the host
bridge, and that it is at 00.0) with another (still assuming a pci_dev
and that it is host->bus->self or the first entry).  I can't suggest
anything better, but maybe a little comment in the code would help
future readers.

I wish we had a real way to discover this property without the
whitelist, at least for future devices.  Was there ever any interest
in a _DSM or similar interface for this?

I *am* very glad to remove a pci_get_slot() usage.

> +
> + if (!root || root->devfn)
>   return false;
>  
>   vendor = root->vendor;

Don't you need to also remove the "pci_dev_put(root)" a few lines
below?


Re: [RFC PATCH v2 01/11] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()

2021-03-12 Thread Logan Gunthorpe




On 2021-03-12 1:39 p.m., Bjorn Helgaas wrote:

On Thu, Mar 11, 2021 at 04:31:31PM -0700, Logan Gunthorpe wrote:

In order to call this function from a dma_map function, it must not sleep.
The only reason it does sleep so to allocate the seqbuf to print
which devices are within the ACS path.


s/this function/upstream_bridge_distance_warn()/ ?
s/so to/is to/

Maybe the subject could say something about the purpose, e.g., allow
calling from atomic context or something?  "Pass gfp_mask flags" sort
of restates what we can read from the patch, but without the
motivation of why this is useful.


Switch the kmalloc call to use a passed in gfp_mask  and don't print that
message if the buffer fails to be allocated.

Signed-off-by: Logan Gunthorpe 


Acked-by: Bjorn Helgaas 


Thanks! I'll apply these changes for any future postings.

Logan
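
For context, a hypothetical atomic-context call site, which is the whole
point of the new gfp_mask parameter (a dma_map path must not sleep, so it
cannot pass GFP_KERNEL):

	int dist;

	ret = upstream_bridge_distance_warn(provider, pci_client, &dist,
					    GFP_ATOMIC);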


Re: [RFC PATCH v2 01/11] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()

2021-03-12 Thread Bjorn Helgaas
On Thu, Mar 11, 2021 at 04:31:31PM -0700, Logan Gunthorpe wrote:
> In order to call this function from a dma_map function, it must not sleep.
> The only reason it does sleep so to allocate the seqbuf to print
> which devices are within the ACS path.

s/this function/upstream_bridge_distance_warn()/ ?
s/so to/is to/

Maybe the subject could say something about the purpose, e.g., allow
calling from atomic context or something?  "Pass gfp_mask flags" sort
of restates what we can read from the patch, but without the
motivation of why this is useful.

> Switch the kmalloc call to use a passed in gfp_mask  and don't print that
> message if the buffer fails to be allocated.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/p2pdma.c | 21 +++--
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 196382630363..bd89437faf06 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -267,7 +267,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
>  
>  static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev 
> *pdev)
>  {
> - if (!buf)
> + if (!buf || !buf->buffer)
>   return;
>  
>   seq_buf_printf(buf, "%s;", pci_name(pdev));
> @@ -495,25 +495,26 @@ upstream_bridge_distance(struct pci_dev *provider, 
> struct pci_dev *client,
>  
>  static enum pci_p2pdma_map_type
>  upstream_bridge_distance_warn(struct pci_dev *provider, struct pci_dev 
> *client,
> -   int *dist)
> +   int *dist, gfp_t gfp_mask)
>  {
>   struct seq_buf acs_list;
>   bool acs_redirects;
>   int ret;
>  
> -	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> - if (!acs_list.buffer)
> - return -ENOMEM;
> +	seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, gfp_mask), PAGE_SIZE);
>  
> 	ret = upstream_bridge_distance(provider, client, dist, &acs_redirects,
> 				       &acs_list);
>   if (acs_redirects) {
>   pci_warn(client, "ACS redirect is set between the client and 
> provider (%s)\n",
>pci_name(provider));
> - /* Drop final semicolon */
> - acs_list.buffer[acs_list.len-1] = 0;
> - pci_warn(client, "to disable ACS redirect for this path, add 
> the kernel parameter: pci=disable_acs_redir=%s\n",
> -  acs_list.buffer);
> +
> + if (acs_list.buffer) {
> + /* Drop final semicolon */
> + acs_list.buffer[acs_list.len - 1] = 0;
> + pci_warn(client, "to disable ACS redirect for this 
> path, add the kernel parameter: pci=disable_acs_redir=%s\n",
> +  acs_list.buffer);
> + }
>   }
>  
>   if (ret == PCI_P2PDMA_MAP_NOT_SUPPORTED) {
> @@ -566,7 +567,7 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, 
> struct device **clients,
>  
>   if (verbose)
>   ret = upstream_bridge_distance_warn(provider,
> -						pci_client, &dist);
> +						pci_client, &dist, GFP_KERNEL);
>   else
>   ret = upstream_bridge_distance(provider, pci_client,
>  , NULL, NULL);
> -- 
> 2.20.1
> 


Re: [RFC PATCH v2 08/11] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg

2021-03-12 Thread Logan Gunthorpe



On 2021-03-12 12:47 p.m., Robin Murphy wrote:

   {
   struct scatterlist *s, *cur = sg;
   unsigned long seg_mask = dma_get_seg_boundary(dev);
@@ -864,6 +865,20 @@ static int __finalise_sg(struct device *dev,
struct scatterlist *sg, int nents,
   sg_dma_address(s) = DMA_MAPPING_ERROR;
   sg_dma_len(s) = 0;
   +    if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
+    if (i > 0)
+    cur = sg_next(cur);
+
+    sg_dma_address(cur) = sg_phys(s) + s->offset -


Are you sure about that? ;)


Do you see a bug? I don't follow you...


sg_phys() already accounts for the offset, so you're adding it twice.


Ah, oops. Nice catch. I missed that.




+    pci_p2pdma_bus_offset(sg_page(s));


Can the bus offset make P2P addresses overlap with regions of mem space
that we might use for regular IOVA allocation? That would be very bad...


No. IOMMU drivers already disallow all PCI addresses from being used as
IOVA addresses. See, for example,  dmar_init_reserved_ranges(). It would
be a huge problem for a whole lot of other reasons if it didn't.


I know we reserve the outbound windows (largely *because* some host 
bridges will consider those addresses as attempts at unsupported P2P and 
prevent them working), I just wanted to confirm that this bus offset is 
always something small that stays within the relevant window, rather 
than something that might make a BAR appear in a completely different 
place for P2P purposes. If so, that's good.


Yes, well if an IOVA overlaps with any PCI bus address there's going to 
be some difficult brokenness because when the IOVA is used it might be 
directed to a PCI device and not the IOMMU. I fixed a bug like that 
once.

I'm not really thrilled about the idea of passing zero-length segments
to iommu_map_sg(). Yes, it happens to trick the concatenation logic in
the current implementation into doing what you want, but it feels 
fragile.


We're not passing zero length segments to iommu_map_sg() (or any
function). This loop is just scanning to calculate the length of the
required IOVA. __finalise_sg() (which is intimately tied to this loop)
then needs a way to determine which segments were P2P segments. The
existing code already overwrites s->length with an aligned length and
stores the original length in sg_dma_len. So we're not relying on
tricking any logic here.


Yes, we temporarily shuffle in page-aligned quantities to satisfy the 
needs of the iommu_map_sg() call, before unpacking things again in 
__finalise_sg(). It's some disgusting trickery that I'm particularly 
proud of. My point is that if you have a mix of both p2p and normal 
segments - which seems to be a case you want to support - then the 
length of 0 that you set to flag p2p segments here will be seen by 
iommu_map_sg() (as it walks the list to map the other segments) before 
you then use it as a key to override the DMA address in the final step. 
It's not a concern if you have a p2p-only list and short-circuit 
straight to that step (in which case all the shuffling was wasted effort 
anyway), but since it's not entirely clear what a segment with zero 
length would mean in general, it seems like a good idea to avoid passing 
the list across a public boundary in that state, if possible.


Ok, well, I mean the iommu_map_sg() does the right thing as is without 
changing it and IMO sg->length set to zero does make sense. Supporting 
mixed P2P and normal segments is really the whole point of this series 
(the current kernel supports homogeneous SGLs with a specialized path -- 
see pci_p2pdma_unmap_sg_attrs()). But do you have an alternate solution 
for sg->length?


Logan

Re: [RFC PATCH v2 08/11] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg

2021-03-12 Thread Robin Murphy

On 2021-03-12 17:03, Logan Gunthorpe wrote:



On 2021-03-12 8:52 a.m., Robin Murphy wrote:

On 2021-03-11 23:31, Logan Gunthorpe wrote:

When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.


This misled me at first, but I see the implementation does actually
appear to accomodate the case of working ACS where P2P *would* still
need to be mapped at the IOMMU.


Yes, that's correct.

   static int __finalise_sg(struct device *dev, struct scatterlist *sg,
int nents,
-    dma_addr_t dma_addr)
+    dma_addr_t dma_addr, unsigned long attrs)
   {
   struct scatterlist *s, *cur = sg;
   unsigned long seg_mask = dma_get_seg_boundary(dev);
@@ -864,6 +865,20 @@ static int __finalise_sg(struct device *dev,
struct scatterlist *sg, int nents,
   sg_dma_address(s) = DMA_MAPPING_ERROR;
   sg_dma_len(s) = 0;
   +    if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
+    if (i > 0)
+    cur = sg_next(cur);
+
+    sg_dma_address(cur) = sg_phys(s) + s->offset -


Are you sure about that? ;)


Do you see a bug? I don't follow you...


sg_phys() already accounts for the offset, so you're adding it twice.


+    pci_p2pdma_bus_offset(sg_page(s));


Can the bus offset make P2P addresses overlap with regions of mem space
that we might use for regular IOVA allocation? That would be very bad...


No. IOMMU drivers already disallow all PCI addresses from being used as
IOVA addresses. See, for example,  dmar_init_reserved_ranges(). It would
be a huge problem for a whole lot of other reasons if it didn't.


I know we reserve the outbound windows (largely *because* some host 
bridges will consider those addresses as attempts at unsupported P2P and 
prevent them working), I just wanted to confirm that this bus offset is 
always something small that stays within the relevant window, rather 
than something that might make a BAR appear in a completely different 
place for P2P purposes. If so, that's good.



@@ -960,11 +975,12 @@ static int iommu_dma_map_sg(struct device *dev,
struct scatterlist *sg,
   struct iommu_dma_cookie *cookie = domain->iova_cookie;
    struct iova_domain *iovad = &cookie->iovad;
   struct scatterlist *s, *prev = NULL;
+    struct dev_pagemap *pgmap = NULL;
   int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
   dma_addr_t iova;
   size_t iova_len = 0;
   unsigned long mask = dma_get_seg_boundary(dev);
-    int i;
+    int i, map = -1, ret = 0;
      if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
   iommu_deferred_attach(dev, domain))
@@ -993,6 +1009,23 @@ static int iommu_dma_map_sg(struct device *dev,
struct scatterlist *sg,
   s_length = iova_align(iovad, s_length + s_iova_off);
   s->length = s_length;
   +    if (is_pci_p2pdma_page(sg_page(s))) {
+    if (sg_page(s)->pgmap != pgmap) {
+    pgmap = sg_page(s)->pgmap;
+    map = pci_p2pdma_dma_map_type(dev, pgmap);
+    }
+
+    if (map < 0) {


It rather feels like it should be the job of whoever creates the list in
the first place not to put unusable pages in it, especially since the
p2pdma_map_type looks to be a fairly coarse-grained and static thing.
The DMA API isn't responsible for validating normal memory pages, so
what makes P2P special?


Yes, that would be ideal, but there's some difficulties there. For the
driver to check the pages, it would need to loop through the entire SG
one more time on every transaction, regardless of whether there are
P2PDMA pages, or not. So that will have a performance impact even when
the feature isn't being used. I don't think that'll be acceptable for
many drivers.

The other possibility is for GUP to do it when it gets the pages from
userspace. But GUP doesn't have all the information to do this at the
moment. We'd have to pass the struct device that will eventually map the
pages through all the nested functions in the GUP to do that test at
that time. This might not be a bad option (that I half looked into), but
I'm not sure how acceptable it would be to the GUP developers.


Urgh, yes, if a page may or may not be valid for p2p depending on which 
device is trying to map it, then it probably is most reasonable to 
figure that out at this point. It's a little unfortunate having to cope 
with failure so late, but oh well.



But even if we do verify the pages ahead of time, we still need the same
infrastructure in dma_map_sg(); it could only now be a BUG if the driver
sent invalid pages instead of an error return.


The hope was that we could save doing even that - e.g. if you pass a 
dodgy page into dma_map_page(), maybe page_to_phys() will crash, maybe 
you'll just end up with a DMA address that won't work, but 

Re: [RFC PATCH v2 06/11] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg

2021-03-12 Thread Logan Gunthorpe


On 2021-03-12 11:11 a.m., Robin Murphy wrote:
> On 2021-03-12 16:24, Logan Gunthorpe wrote:
>>
>>
>> On 2021-03-12 8:52 a.m., Robin Murphy wrote:
 +
    sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
    sg->offset, sg->length, dir, attrs);
    if (sg->dma_address == DMA_MAPPING_ERROR)
 @@ -411,7 +440,7 @@ int dma_direct_map_sg(struct device *dev, struct
 scatterlist *sgl, int nents,
      out_unmap:
    dma_direct_unmap_sg(dev, sgl, i, dir, attrs |
 DMA_ATTR_SKIP_CPU_SYNC);
 -    return 0;
 +    return ret;
    }
      dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t
 paddr,
 diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
 index b6a633679933..adc1a83950be 100644
 --- a/kernel/dma/mapping.c
 +++ b/kernel/dma/mapping.c
 @@ -178,8 +178,15 @@ void dma_unmap_page_attrs(struct device *dev,
 dma_addr_t addr, size_t size,
    EXPORT_SYMBOL(dma_unmap_page_attrs);
      /*
 - * dma_maps_sg_attrs returns 0 on error and > 0 on success.
 - * It should never return a value < 0.
 + * dma_maps_sg_attrs returns 0 on any resource error and > 0 on
 success.
 + *
 + * If 0 is returned, the mapping can be retried and will succeed once
 + * sufficient resources are available.
>>>
>>> That's not a guarantee we can uphold. Retrying forever in the vain hope
>>> that a device might evolve some extra address bits, or a bounce buffer
>>> might magically grow big enough for a gigantic mapping, isn't
>>> necessarily the best idea.
>>
>> Perhaps this is just poorly worded. Returning 0 is the normal case and
>> nothing has changed there. The block layer, for example, will retry if
>> zero is returned as this only happens if it failed to allocate resources
>> for the mapping. The reason we have to return -1 is to tell the block
>> layer not to retry these requests as they will never succeed in the
>> future.
>>
 + *
 + * If there are P2PDMA pages in the scatterlist then this function may
 + * return -EREMOTEIO to indicate that the pages are not mappable by
 the
 + * device. In this case, an error should be returned for the IO as it
 + * will never be successfully retried.
     */
    int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int
 nents,
    enum dma_data_direction dir, unsigned long attrs)
 @@ -197,7 +204,7 @@ int dma_map_sg_attrs(struct device *dev, struct
 scatterlist *sg, int nents,
    ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
    else
    ents = ops->map_sg(dev, sg, nents, dir, attrs);
 -    BUG_ON(ents < 0);
 +
>>>
>>> This scares me - I hesitate to imagine the amount of driver/subsystem
>>> code out there that will see nonzero and merrily set off iterating a
>>> negative number of segments, if we open the floodgates of allowing
>>> implementations to return error codes here.
>>
>> Yes, but it will never happen on existing drivers/subsystems. The only
>> way it can return a negative number is if the driver passes in P2PDMA
>> pages which can't happen without changes in the driver. We are careful
>> about where P2PDMA pages can get into so we don't have to worry about
>> all the existing driver code out there.
> 
> Sure, that's how things stand immediately after this patch. But then
> someone comes along with the perfectly reasonable argument for returning
> more expressive error information for regular mapping failures as well
> (because sometimes those can be terminal too, as above), we start to get
> divergent behaviour across architectures and random bits of old code
> subtly breaking down the line. *That* is what makes me wary of making a
> fundamental change to a long-standing "nonzero means success" interface...

So then we reject the patches that make that change. Seems like an odd
argument to say that we can't do something that won't cause problems
because someone might use it as an example and do something that will
cause problems. Reject the change that causes the problem.

Logan
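
To make the return-value semantics discussed above concrete, a
hypothetical caller-side sketch (not from the series; dma_map_sg_attrs()
as in mainline):

	int ents;

	ents = dma_map_sg_attrs(dev, sgl, nents, dir, 0);
	if (ents > 0) {
		/* success: program 'ents' DMA segments into the hardware */
	} else if (ents == 0) {
		/* resource exhaustion: requeue the request and retry later */
	} else {
		/*
		 * e.g. -EREMOTEIO: P2PDMA pages not mappable by this device;
		 * fail the I/O, since a retry can never succeed.
		 */
	}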

Re: [RFC PATCH v2 00/11] Add support to dma_map_sg for P2PDMA

2021-03-12 Thread Logan Gunthorpe


On 2021-03-12 10:46 a.m., Robin Murphy wrote:
> On 2021-03-12 16:18, Logan Gunthorpe wrote:
>>
>>
>> On 2021-03-12 8:51 a.m., Robin Murphy wrote:
>>> On 2021-03-11 23:31, Logan Gunthorpe wrote:
 Hi,

 This is a rework of the first half of my RFC for doing P2PDMA in
 userspace
 with O_DIRECT[1].

 The largest issue with that series was the gross way of flagging P2PDMA
 SGL segments. This RFC proposes a different approach, (suggested by
 Dan Williams[2]) which uses the third bit in the page_link field of the
 SGL.

 This approach is a lot less hacky but comes at the cost of adding a
 CONFIG_64BIT dependency to CONFIG_PCI_P2PDMA and using up the last
 scarce bit in the page_link. For our purposes, a 64BIT restriction is
 acceptable but it's not clear if this is ok for all usecases hoping
 to make use of P2PDMA.

 Matthew Wilcox has already suggested (off-list) that this is the wrong
 approach, preferring a new dma mapping operation and an SGL
 replacement. I
 don't disagree that something along those lines would be a better long
 term solution, but it involves overcoming a lot of challenges to get
 there. Creating a new mapping operation still means adding support to
 more
 than 25 dma_map_ops implementations (many of which are on obscure
 architectures) or creating a redundant path to fallback with
 dma_map_sg()
 for every driver that uses the new operation. This RFC is an approach
 that doesn't require overcoming these blocks.
>>>
>>> I don't really follow that argument - you're only adding support to two
>>> implementations with the awkward flag, so why would using a dedicated
>>> operation instead be any different? Whatever callers need to do if
>>> dma_pci_p2pdma_supported() says no, they could equally do if
>>> dma_map_p2p_sg() (or whatever) returns -ENXIO, no?
>>
>> The thing is if the dma_map_sg doesn't support P2PDMA then P2PDMA
>> transactions cannot be done, but regular transactions can still go
>> through as they always did.
>>
>> But replacing dma_map_sg() with dma_map_new() affects all operations,
>> P2PDMA or otherwise. If dma_map_new() isn't supported it can't simply
>> not support P2PDMA; it has to maintain a fallback path to dma_map_sg().
> 
> But AFAICS the equivalent fallback path still has to exist either way.
> My impression so far is that callers would end up looking something like
> this:
> 
> if (dma_pci_p2pdma_supported()) {
>     if (dma_map_sg(...) < 0)
>     //do non-p2p fallback due to p2p failure
> } else {
>     //do non-p2p fallback due to lack of support
> }
> 
> at which point, simply:
> 
> if (dma_map_sg_p2p(...) < 0)
>     //do non-p2p fallback either way
> 
> seems entirely reasonable. What am I missing?

No, that's not how it works. The dma_pci_p2pdma_supported() flag gates
whether P2PDMA pages will be used at a much higher level. Currently with
NVMe, if P2PDMA is supported, it sets the QUEUE_FLAG_PCI_P2PDMA on the
block queue. This is then tested with blk_queue_pci_p2pdma() before any
P2PDMA transaction with the block device is started.

In the following patches that could add userspace support,
blk_queue_pci_p2pdma() is used to add a flag to GUP which allows P2PDMA
pages into the driver via O_DIRECT.

There is no fallback path on the dma_map_sg() operation if p2pdma is not
supported. dma_map_sg() is always used. The code simply prevents pages
from getting there in the first place.
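
For illustration, a minimal sketch of that gating, loosely based on the
existing block/NVMe pattern (simplified; the exact call sites in the tree
differ):

#include <linux/blkdev.h>

/*
 * Sketch only: P2PDMA capability is negotiated once, at queue setup time,
 * long before dma_map_sg() ever sees a P2PDMA page.
 */
static void example_setup_queue(struct request_queue *q, bool p2pdma_capable)
{
	if (p2pdma_capable)
		blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, q);
}

/* Submitters check the flag before ever handing P2PDMA pages to the queue. */
static bool example_may_submit_p2pdma(struct request_queue *q)
{
	return blk_queue_pci_p2pdma(q);
}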

A new DMA operation would have to be:

if (dma_has_new_operation()) {
 // create input for dma_map_new_op()
 // map with dma_map_new_op()
 // parse output of dma_map_new_op()
} else {
 // create SGL
 dma_map_sg()
 // convert SGL to hardware map
}

And this if statement has nothing to do with p2pdma support or not.

> 
> Let's not pretend that overloading an existing API means we can start
> feeding P2P pages into any old subsystem/driver without further changes
> - there already *are* at least some that retry ad infinitum if DMA
> mapping fails (the USB layer springs to mind...) and thus wouldn't
> handle the PCI_P2PDMA_MAP_NOT_SUPPORTED case acceptably.

Yes, nobody is pretending that at all. We are being very careful with
P2PDMA pages and we don't want them to get into any driver that isn't
explicitly written to expect them. With the code in the current kernel
this is all very explicit and done ahead of time (with
QUEUE_FLAG_PCI_P2PDMA and similar). Once the pages get into userspace,
GUP will reject them unless the driver getting the pages specifically
sets a flag indicating support for them.

>> Given that the inputs and outputs for dma_map_new() will be completely
>> different data structures this will be quite a lot of similar paths
>> required in the driver. (ie mapping a bvec to the input struct and the
>> output struct to hardware requirements) If a bug crops up in the old
dma_map_sg(), developers might not notice it for some time seeing it
won't be used on the most popular architectures.

Re: [RFC PATCH v2 06/11] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg

2021-03-12 Thread Robin Murphy

On 2021-03-12 16:24, Logan Gunthorpe wrote:



On 2021-03-12 8:52 a.m., Robin Murphy wrote:

+
   sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
   sg->offset, sg->length, dir, attrs);
   if (sg->dma_address == DMA_MAPPING_ERROR)
@@ -411,7 +440,7 @@ int dma_direct_map_sg(struct device *dev, struct
scatterlist *sgl, int nents,
     out_unmap:
   dma_direct_unmap_sg(dev, sgl, i, dir, attrs |
DMA_ATTR_SKIP_CPU_SYNC);
-    return 0;
+    return ret;
   }
     dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t
paddr,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index b6a633679933..adc1a83950be 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -178,8 +178,15 @@ void dma_unmap_page_attrs(struct device *dev,
dma_addr_t addr, size_t size,
   EXPORT_SYMBOL(dma_unmap_page_attrs);
     /*
- * dma_maps_sg_attrs returns 0 on error and > 0 on success.
- * It should never return a value < 0.
+ * dma_maps_sg_attrs returns 0 on any resource error and > 0 on success.
+ *
+ * If 0 is returned, the mapping can be retried and will succeed once
+ * sufficient resources are available.


That's not a guarantee we can uphold. Retrying forever in the vain hope
that a device might evolve some extra address bits, or a bounce buffer
might magically grow big enough for a gigantic mapping, isn't
necessarily the best idea.


Perhaps this is just poorly worded. Returning 0 is the normal case and
nothing has changed there. The block layer, for example, will retry if
zero is returned as this only happens if it failed to allocate resources
for the mapping. The reason we have to return -1 is to tell the block
layer not to retry these requests as they will never succeed in the future.


+ *
+ * If there are P2PDMA pages in the scatterlist then this function may
+ * return -EREMOTEIO to indicate that the pages are not mappable by the
+ * device. In this case, an error should be returned for the IO as it
+ * will never be successfully retried.
    */
   int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int
nents,
   enum dma_data_direction dir, unsigned long attrs)
@@ -197,7 +204,7 @@ int dma_map_sg_attrs(struct device *dev, struct
scatterlist *sg, int nents,
   ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
   else
   ents = ops->map_sg(dev, sg, nents, dir, attrs);
-    BUG_ON(ents < 0);
+


This scares me - I hesitate to imagine the amount of driver/subsystem
code out there that will see nonzero and merrily set off iterating a
negative number of segments, if we open the floodgates of allowing
implementations to return error codes here.


Yes, but it will never happen on existing drivers/subsystems. The only
way it can return a negative number is if the driver passes in P2PDMA
pages which can't happen without changes in the driver. We are careful
about where P2PDMA pages can get into so we don't have to worry about
all the existing driver code out there.


Sure, that's how things stand immediately after this patch. But then 
someone comes along with the perfectly reasonable argument for returning 
more expressive error information for regular mapping failures as well 
(because sometimes those can be terminal too, as above), we start to get 
divergent behaviour across architectures and random bits of old code 
subtly breaking down the line. *That* is what makes me wary of making a 
fundamental change to a long-standing "nonzero means success" interface...


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH v2 00/11] Add support to dma_map_sg for P2PDMA

2021-03-12 Thread Robin Murphy

On 2021-03-12 16:18, Logan Gunthorpe wrote:



On 2021-03-12 8:51 a.m., Robin Murphy wrote:

On 2021-03-11 23:31, Logan Gunthorpe wrote:

Hi,

This is a rework of the first half of my RFC for doing P2PDMA in
userspace
with O_DIRECT[1].

The largest issue with that series was the gross way of flagging P2PDMA
SGL segments. This RFC proposes a different approach, (suggested by
Dan Williams[2]) which uses the third bit in the page_link field of the
SGL.

This approach is a lot less hacky but comes at the cost of adding a
CONFIG_64BIT dependency to CONFIG_PCI_P2PDMA and using up the last
scarce bit in the page_link. For our purposes, a 64BIT restriction is
acceptable but it's not clear if this is ok for all usecases hoping
to make use of P2PDMA.

Matthew Wilcox has already suggested (off-list) that this is the wrong
approach, preferring a new dma mapping operation and an SGL
replacement. I
don't disagree that something along those lines would be a better long
term solution, but it involves overcoming a lot of challenges to get
there. Creating a new mapping operation still means adding support to
more
than 25 dma_map_ops implementations (many of which are on obscure
architectures) or creating a redundant path to fallback with dma_map_sg()
for every driver that uses the new operation. This RFC is an approach
that doesn't require overcoming these blocks.


I don't really follow that argument - you're only adding support to two
implementations with the awkward flag, so why would using a dedicated
operation instead be any different? Whatever callers need to do if
dma_pci_p2pdma_supported() says no, they could equally do if
dma_map_p2p_sg() (or whatever) returns -ENXIO, no?


The thing is if the dma_map_sg doesn't support P2PDMA then P2PDMA
transactions cannot be done, but regular transactions can still go
through as they always did.

But replacing dma_map_sg() with dma_map_new() affects all operations,
P2PDMA or otherwise. If dma_map_new() isn't supported it can't simply
not support P2PDMA; it has to maintain a fallback path to dma_map_sg().


But AFAICS the equivalent fallback path still has to exist either way. 
My impression so far is that callers would end up looking something like 
this:


if (dma_pci_p2pdma_supported()) {
if (dma_map_sg(...) < 0)
//do non-p2p fallback due to p2p failure
} else {
//do non-p2p fallback due to lack of support
}

at which point, simply:

if (dma_map_sg_p2p(...) < 0)
//do non-p2p fallback either way

seems entirely reasonable. What am I missing?

Let's not pretend that overloading an existing API means we can start 
feeding P2P pages into any old subsystem/driver without further changes 
- there already *are* at least some that retry ad infinitum if DMA 
mapping fails (the USB layer springs to mind...) and thus wouldn't 
handle the PCI_P2PDMA_MAP_NOT_SUPPORTED case acceptably.



Given that the inputs and outputs for dma_map_new() will be completely
different data structures this will be quite a lot of similar paths
required in the driver. (ie mapping a bvec to the input struct and the
output struct to hardware requirements) If a bug crops up in the old
dma_map_sg(), developers might not notice it for some time seeing it
won't be used on the most popular architectures.


Huh? I'm specifically suggesting a new interface that takes the *same* 
data structure (at least to begin with), but just gives us more 
flexibility in terms of introducing p2p-aware behaviour somewhat more 
safely. Yes, we already know that we ultimately want something better 
than scatterlists for representing things like this and dma-buf imports, 
but that hardly has to happen overnight.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v2 08/11] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg

2021-03-12 Thread Logan Gunthorpe


On 2021-03-12 8:52 a.m., Robin Murphy wrote:
> On 2021-03-11 23:31, Logan Gunthorpe wrote:
>> When a PCI P2PDMA page is seen, set the IOVA length of the segment
>> to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
>> apply the appropriate bus address to the segment. The IOVA is not
>> created if the scatterlist only consists of P2PDMA pages.
> 
> This misled me at first, but I see the implementation does actually
> appear to accomodate the case of working ACS where P2P *would* still
> need to be mapped at the IOMMU.

Yes, that's correct.
>>   static int __finalise_sg(struct device *dev, struct scatterlist *sg,
>> int nents,
>> -    dma_addr_t dma_addr)
>> +    dma_addr_t dma_addr, unsigned long attrs)
>>   {
>>   struct scatterlist *s, *cur = sg;
>>   unsigned long seg_mask = dma_get_seg_boundary(dev);
>> @@ -864,6 +865,20 @@ static int __finalise_sg(struct device *dev,
>> struct scatterlist *sg, int nents,
>>   sg_dma_address(s) = DMA_MAPPING_ERROR;
>>   sg_dma_len(s) = 0;
>>   +    if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
>> +    if (i > 0)
>> +    cur = sg_next(cur);
>> +
>> +    sg_dma_address(cur) = sg_phys(s) + s->offset -
> 
> Are you sure about that? ;)

Do you see a bug? I don't follow you...

>> +    pci_p2pdma_bus_offset(sg_page(s));
> 
> Can the bus offset make P2P addresses overlap with regions of mem space
> that we might use for regular IOVA allocation? That would be very bad...

No. IOMMU drivers already disallow all PCI addresses from being used as
IOVA addresses. See, for example,  dmar_init_reserved_ranges(). It would
be a huge problem for a whole lot of other reasons if it didn't.
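
For reference, a rough sketch of what that reservation looks like, in the
spirit of dmar_init_reserved_ranges() (simplified, not copied from any
particular driver):

#include <linux/iova.h>
#include <linux/pci.h>

/*
 * Sketch only: keep PCI MMIO windows out of the IOVA space so regular
 * IOVA allocations can never collide with peer bus addresses.
 */
static void example_reserve_pci_windows(struct pci_dev *pdev,
					struct iova_domain *iovad)
{
	struct resource *res;
	int i;

	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
		res = &pdev->resource[i];
		if (!(res->flags & IORESOURCE_MEM) || !res->start)
			continue;
		reserve_iova(iovad, iova_pfn(iovad, res->start),
			     iova_pfn(iovad, res->end));
	}
}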


>> @@ -960,11 +975,12 @@ static int iommu_dma_map_sg(struct device *dev,
>> struct scatterlist *sg,
>>   struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>   struct iova_domain *iovad = &cookie->iovad;
>>   struct scatterlist *s, *prev = NULL;
>> +    struct dev_pagemap *pgmap = NULL;
>>   int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
>>   dma_addr_t iova;
>>   size_t iova_len = 0;
>>   unsigned long mask = dma_get_seg_boundary(dev);
>> -    int i;
>> +    int i, map = -1, ret = 0;
>>     if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
>>   iommu_deferred_attach(dev, domain))
>> @@ -993,6 +1009,23 @@ static int iommu_dma_map_sg(struct device *dev,
>> struct scatterlist *sg,
>>   s_length = iova_align(iovad, s_length + s_iova_off);
>>   s->length = s_length;
>>   +    if (is_pci_p2pdma_page(sg_page(s))) {
>> +    if (sg_page(s)->pgmap != pgmap) {
>> +    pgmap = sg_page(s)->pgmap;
>> +    map = pci_p2pdma_dma_map_type(dev, pgmap);
>> +    }
>> +
>> +    if (map < 0) {
> 
> It rather feels like it should be the job of whoever creates the list in
> the first place not to put unusable pages in it, especially since the
> p2pdma_map_type looks to be a fairly coarse-grained and static thing.
> The DMA API isn't responsible for validating normal memory pages, so
> what makes P2P special?

Yes, that would be ideal, but there's some difficulties there. For the
driver to check the pages, it would need to loop through the entire SG
one more time on every transaction, regardless of whether there are
P2PDMA pages, or not. So that will have a performance impact even when
the feature isn't being used. I don't think that'll be acceptable for
many drivers.

The other possibility is for GUP to do it when it gets the pages from
userspace. But GUP doesn't have all the information to do this at the
moment. We'd have to pass the struct device that will eventually map the
pages through all the nested functions in the GUP to do that test at
that time. This might not be a bad option (that I half looked into), but
I'm not sure how acceptable it would be to the GUP developers.

But even if we do verify the pages ahead of time, we still need the same
infrastructure in dma_map_sg(); it could only now be a BUG if the driver
sent invalid pages instead of an error return.
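
To make the cost concrete, here is a rough sketch of the extra pass a
submitter (or GUP) would need on every request if validation were pushed out
of dma_map_sg(). pci_p2pdma_dma_map_type() is the helper added earlier in
this series; everything else here is illustrative:

#include <linux/device.h>
#include <linux/mm.h>
#include <linux/memremap.h>
#include <linux/scatterlist.h>
#include <linux/pci-p2pdma.h>

/*
 * Sketch only: walk the whole SGL once more, per transaction, to reject
 * unmappable P2PDMA pages before calling dma_map_sg().
 */
static bool example_sgl_p2pdma_mappable(struct device *dev,
					struct scatterlist *sgl, int nents)
{
	struct dev_pagemap *pgmap = NULL;
	struct scatterlist *sg;
	int i, map = 0;

	for_each_sg(sgl, sg, nents, i) {
		if (!is_pci_p2pdma_page(sg_page(sg)))
			continue;
		if (sg_page(sg)->pgmap != pgmap) {
			pgmap = sg_page(sg)->pgmap;
			map = pci_p2pdma_dma_map_type(dev, pgmap);
		}
		if (map < 0)
			return false;	/* peer not reachable from this device */
	}
	return true;
}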

>> +    ret = -EREMOTEIO;
>> +    goto out_restore_sg;
>> +    }
>> +
>> +    if (map) {
>> +    s->length = 0;
> 
> I'm not really thrilled about the idea of passing zero-length segments
> to iommu_map_sg(). Yes, it happens to trick the concatenation logic in
> the current implementation into doing what you want, but it feels fragile.

We're not passing zero length segments to iommu_map_sg() (or any
function). This loop is just scanning to calculate the length of the
required IOVA. __finalise_sg() (which is intimately tied to this loop)
then needs a way to determine which segments were P2P segments. The
existing code already overwrites s->length with an aligned length and
stores the original length in sg_dma_len. So we're not relying on
tricking any logic here.


>>   }
>>    

Re: [RFC PATCH v2 06/11] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg

2021-03-12 Thread Logan Gunthorpe


On 2021-03-12 8:52 a.m., Robin Murphy wrote:
>> +
>>   sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
>>   sg->offset, sg->length, dir, attrs);
>>   if (sg->dma_address == DMA_MAPPING_ERROR)
>> @@ -411,7 +440,7 @@ int dma_direct_map_sg(struct device *dev, struct
>> scatterlist *sgl, int nents,
>>     out_unmap:
>>   dma_direct_unmap_sg(dev, sgl, i, dir, attrs |
>> DMA_ATTR_SKIP_CPU_SYNC);
>> -    return 0;
>> +    return ret;
>>   }
>>     dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t
>> paddr,
>> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
>> index b6a633679933..adc1a83950be 100644
>> --- a/kernel/dma/mapping.c
>> +++ b/kernel/dma/mapping.c
>> @@ -178,8 +178,15 @@ void dma_unmap_page_attrs(struct device *dev,
>> dma_addr_t addr, size_t size,
>>   EXPORT_SYMBOL(dma_unmap_page_attrs);
>>     /*
>> - * dma_maps_sg_attrs returns 0 on error and > 0 on success.
>> - * It should never return a value < 0.
>> + * dma_maps_sg_attrs returns 0 on any resource error and > 0 on success.
>> + *
>> + * If 0 is returned, the mapping can be retried and will succeed once
>> + * sufficient resources are available.
> 
> That's not a guarantee we can uphold. Retrying forever in the vain hope
> that a device might evolve some extra address bits, or a bounce buffer
> might magically grow big enough for a gigantic mapping, isn't
> necessarily the best idea.

Perhaps this is just poorly worded. Returning 0 is the normal case and
nothing has changed there. The block layer, for example, will retry if
zero is returned as this only happens if it failed to allocate resources
for the mapping. The reason we have to return -1 is to tell the block
layer not to retry these requests as they will never succeed in the future.
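
For clarity, a sketch of how a caller would distinguish the two cases under
the proposed convention (the surrounding function is illustrative):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/errno.h>

/*
 * Sketch only: 0 still means a transient resource failure that may be
 * retried later, while a negative value such as -EREMOTEIO means the
 * mapping can never succeed and the IO should be failed outright.
 */
static int example_map_request(struct device *dev, struct scatterlist *sgl,
			       int nents, enum dma_data_direction dir)
{
	int ents = dma_map_sg_attrs(dev, sgl, nents, dir, 0);

	if (ents > 0)
		return ents;		/* mapped: number of DMA segments */
	if (ents == 0)
		return -EAGAIN;		/* out of resources: requeue and retry */
	return ents;			/* e.g. -EREMOTEIO: fail, do not retry */
}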

>> + *
>> + * If there are P2PDMA pages in the scatterlist then this function may
>> + * return -EREMOTEIO to indicate that the pages are not mappable by the
>> + * device. In this case, an error should be returned for the IO as it
>> + * will never be successfully retried.
>>    */
>>   int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int
>> nents,
>>   enum dma_data_direction dir, unsigned long attrs)
>> @@ -197,7 +204,7 @@ int dma_map_sg_attrs(struct device *dev, struct
>> scatterlist *sg, int nents,
>>   ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
>>   else
>>   ents = ops->map_sg(dev, sg, nents, dir, attrs);
>> -    BUG_ON(ents < 0);
>> +
> 
> This scares me - I hesitate to imagine the amount of driver/subsystem
> code out there that will see nonzero and merrily set off iterating a
> negative number of segments, if we open the floodgates of allowing
> implementations to return error codes here.

Yes, but it will never happen on existing drivers/subsystems. The only
way it can return a negative number is if the driver passes in P2PDMA
pages which can't happen without changes in the driver. We are careful
about where P2PDMA pages can get into so we don't have to worry about
all the existing driver code out there.

Logan
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH v2 00/11] Add support to dma_map_sg for P2PDMA

2021-03-12 Thread Logan Gunthorpe



On 2021-03-12 8:51 a.m., Robin Murphy wrote:
> On 2021-03-11 23:31, Logan Gunthorpe wrote:
>> Hi,
>>
>> This is a rework of the first half of my RFC for doing P2PDMA in
>> userspace
>> with O_DIRECT[1].
>>
>> The largest issue with that series was the gross way of flagging P2PDMA
>> SGL segments. This RFC proposes a different approach, (suggested by
>> Dan Williams[2]) which uses the third bit in the page_link field of the
>> SGL.
>>
>> This approach is a lot less hacky but comes at the cost of adding a
>> CONFIG_64BIT dependency to CONFIG_PCI_P2PDMA and using up the last
>> scarce bit in the page_link. For our purposes, a 64BIT restriction is
>> acceptable but it's not clear if this is ok for all usecases hoping
>> to make use of P2PDMA.
>>
>> Matthew Wilcox has already suggested (off-list) that this is the wrong
>> approach, preferring a new dma mapping operation and an SGL
>> replacement. I
>> don't disagree that something along those lines would be a better long
>> term solution, but it involves overcoming a lot of challenges to get
>> there. Creating a new mapping operation still means adding support to
>> more
>> than 25 dma_map_ops implementations (many of which are on obscure
>> architectures) or creating a redundant path to fallback with dma_map_sg()
>> for every driver that uses the new operation. This RFC is an approach
>> that doesn't require overcoming these blocks.
> 
> I don't really follow that argument - you're only adding support to two
> implementations with the awkward flag, so why would using a dedicated
> operation instead be any different? Whatever callers need to do if
> dma_pci_p2pdma_supported() says no, they could equally do if
> dma_map_p2p_sg() (or whatever) returns -ENXIO, no?

The thing is if the dma_map_sg doesn't support P2PDMA then P2PDMA
transactions cannot be done, but regular transactions can still go
through as they always did.

But replacing dma_map_sg() with dma_map_new() affects all operations,
P2PDMA or otherwise. If dma_map_new() isn't supported it can't simply
not support P2PDMA; it has to maintain a fallback path to dma_map_sg().
Given that the inputs and outputs for dma_map_new() will be completely
different data structures this will be quite a lot of similar paths
required in the driver. (ie mapping a bvec to the input struct and the
output struct to hardware requirements) If a bug crops up in the old
dma_map_sg(), developers might not notice it for some time seeing it
won't be used on the most popular architectures.

Logan
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-12 Thread Robin Murphy

On 2021-03-11 08:26, Christoph Hellwig wrote:

On Wed, Mar 10, 2021 at 06:39:57PM +, Robin Murphy wrote:

Actually... Just mirroring the iommu_dma_strict value into
struct iommu_domain should solve all of that with very little
boilerplate code.


Yes, my initial thought was to directly replace the attribute with a
common flag at iommu_domain level, but since in all cases the behaviour
is effectively global rather than actually per-domain, it seemed
reasonable to take it a step further. This passes compile-testing for
arm64 and x86, what do you think?


It seems to miss a few bits, and also generally seems to not actually
apply to recent mainline or something like it due to different empty
lines in a few places.


Yeah, that was sketched out on top of some other development patches, 
and in being so focused on not breaking any of the x86 behaviours I did 
indeed overlook fully converting the SMMU drivers... oops!


(my thought was to do the conversion for its own sake, then clean up the 
redundant attribute separately, but I guess it's fine either way)



Let me know what you think of the version here:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/iommu-cleanup

I'll happily switch the patch to you as the author if you're fine with
that as well.


I still have reservations about removing the attribute API entirely and 
pretending that io_pgtable_cfg is anything other than a SoC-specific 
private interface, but the reworked patch on its own looks reasonable to 
me, thanks! (I wasn't too convinced about the iommu_cmd_line wrappers 
either...) Just iommu_get_dma_strict() needs an export since the SMMU 
drivers can be modular - I consciously didn't add that myself since I 
was mistakenly thinking only iommu-dma would call it.
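
Roughly, the shape under discussion is something like the following (a
sketch of the approach only, not the actual patch in Christoph's branch):

#include <linux/iommu.h>
#include <linux/cache.h>
#include <linux/export.h>

/*
 * Sketch only: a single module-level flag set from the command line,
 * queried through an exported accessor so that modular drivers such as
 * the SMMUs can use it.
 */
static bool iommu_dma_strict __read_mostly = true;

void iommu_set_dma_strict(bool strict)
{
	iommu_dma_strict = strict;
}

bool iommu_get_dma_strict(struct iommu_domain *domain)
{
	/* Only DMA API domains are allowed to use lazy invalidation. */
	if (domain->type == IOMMU_DOMAIN_DMA)
		return iommu_dma_strict;
	return true;
}
EXPORT_SYMBOL_GPL(iommu_get_dma_strict);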


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v1] iommu/tegra-smmu: Make tegra_smmu_probe_device() to handle all IOMMU phandles

2021-03-12 Thread Dmitry Osipenko
tegra_smmu_probe_device() handles only the first IOMMU device-tree
phandle, skipping the rest. Devices like the 3D module on Tegra30 have
multiple IOMMU phandles, one for each h/w block, so only one IOMMU
phandle is added to the fwspec for the 3D module, breaking the GPU.
Previously this problem was masked by tegra_smmu_attach_dev(), which
didn't use the fwspec but parsed the DT by itself. The previous commit
to the tegra-smmu driver partially reverted the changes that caused
problems for T124, so we now have a tegra_smmu_attach_dev() that uses
the fwspec together with the old, buggy variant of
tegra_smmu_probe_device() which skips secondary IOMMUs.

Make tegra_smmu_probe_device() stop skipping the secondary IOMMUs. This
fixes the partially attached IOMMU of the 3D module on Tegra30, and the
GPU works properly once again.

Fixes: 765a9d1d02b2 ("iommu/tegra-smmu: Fix mc errors on tegra124-nyan")
Signed-off-by: Dmitry Osipenko 
---
 drivers/iommu/tegra-smmu.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 97eb62f667d2..602aab98c079 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -849,12 +849,11 @@ static struct iommu_device 
*tegra_smmu_probe_device(struct device *dev)
smmu = tegra_smmu_find(args.np);
if (smmu) {
		err = tegra_smmu_configure(smmu, dev, &args);
-   of_node_put(args.np);
 
-   if (err < 0)
+   if (err < 0) {
+   of_node_put(args.np);
return ERR_PTR(err);
-
-   break;
+   }
}
 
of_node_put(args.np);
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v2 08/11] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg

2021-03-12 Thread Robin Murphy

On 2021-03-11 23:31, Logan Gunthorpe wrote:

When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.


This misled me at first, but I see the implementation does actually 
appear to accommodate the case of working ACS where P2P *would* still 
need to be mapped at the IOMMU.



Similar to dma-direct, the sg_mark_pci_p2pdma() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is
set to DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for
P2PDMA pages.

Signed-off-by: Logan Gunthorpe 
---
  drivers/iommu/dma-iommu.c | 63 ---
  1 file changed, 53 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index af765c813cc8..c0821e9051a9 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -20,6 +20,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -846,7 +847,7 @@ static void iommu_dma_unmap_page(struct device *dev, 
dma_addr_t dma_handle,
   * segment's start address to avoid concatenating across one.
   */
  static int __finalise_sg(struct device *dev, struct scatterlist *sg, int 
nents,
-   dma_addr_t dma_addr)
+   dma_addr_t dma_addr, unsigned long attrs)
  {
struct scatterlist *s, *cur = sg;
unsigned long seg_mask = dma_get_seg_boundary(dev);
@@ -864,6 +865,20 @@ static int __finalise_sg(struct device *dev, struct 
scatterlist *sg, int nents,
sg_dma_address(s) = DMA_MAPPING_ERROR;
sg_dma_len(s) = 0;
  
+		if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {

+   if (i > 0)
+   cur = sg_next(cur);
+
+   sg_dma_address(cur) = sg_phys(s) + s->offset -


Are you sure about that? ;)


+   pci_p2pdma_bus_offset(sg_page(s));


Can the bus offset make P2P addresses overlap with regions of mem space 
that we might use for regular IOVA allocation? That would be very bad...



+   sg_dma_len(cur) = s->length;
+   sg_mark_pci_p2pdma(cur);
+
+   count++;
+   cur_len = 0;
+   continue;
+   }
+
/*
 * Now fill in the real DMA data. If...
 * - there is a valid output segment to append to
@@ -960,11 +975,12 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
	struct iova_domain *iovad = &cookie->iovad;
struct scatterlist *s, *prev = NULL;
+   struct dev_pagemap *pgmap = NULL;
int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
-   int i;
+   int i, map = -1, ret = 0;
  
	if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&

iommu_deferred_attach(dev, domain))
@@ -993,6 +1009,23 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
s_length = iova_align(iovad, s_length + s_iova_off);
s->length = s_length;
  
+		if (is_pci_p2pdma_page(sg_page(s))) {

+   if (sg_page(s)->pgmap != pgmap) {
+   pgmap = sg_page(s)->pgmap;
+   map = pci_p2pdma_dma_map_type(dev, pgmap);
+   }
+
+   if (map < 0) {


It rather feels like it should be the job of whoever creates the list in 
the first place not to put unusable pages in it, especially since the 
p2pdma_map_type looks to be a fairly coarse-grained and static thing. 
The DMA API isn't responsible for validating normal memory pages, so 
what makes P2P special?



+   ret = -EREMOTEIO;
+   goto out_restore_sg;
+   }
+
+   if (map) {
+   s->length = 0;


I'm not really thrilled about the idea of passing zero-length segments 
to iommu_map_sg(). Yes, it happens to trick the concatenation logic in 
the current implementation into doing what you want, but it feels fragile.



+   continue;
+   }
+   }
+
/*
 * Due to the alignment of our single IOVA allocation, we can
 * depend on these assumptions about the segment boundary mask:
@@ -1015,6 +1048,9 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
prev = 

Re: [RFC PATCH v2 06/11] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg

2021-03-12 Thread Robin Murphy

On 2021-03-11 23:31, Logan Gunthorpe wrote:

Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
PCI P2PDMA pages directly without a hack in the callers. This allows
for heterogeneous SGLs that contain both P2PDMA and regular pages.

SGL segments that contain PCI bus addresses are marked with
sg_mark_pci_p2pdma() and are ignored when unmapped.

Signed-off-by: Logan Gunthorpe 
---
  kernel/dma/direct.c  | 35 ---
  kernel/dma/mapping.c | 13 ++---
  2 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 002268262c9a..f326d32062dd 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -13,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
  #include "direct.h"
  
  /*

@@ -387,19 +388,47 @@ void dma_direct_unmap_sg(struct device *dev, struct 
scatterlist *sgl,
struct scatterlist *sg;
int i;
  
-	for_each_sg(sgl, sg, nents, i)

+   for_each_sg(sgl, sg, nents, i) {
+   if (sg_is_pci_p2pdma(sg))
+   continue;
+
dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
 attrs);
+   }
  }
  #endif
  
  int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,

enum dma_data_direction dir, unsigned long attrs)
  {
-   int i;
+   struct dev_pagemap *pgmap = NULL;
+   int i, map = -1, ret = 0;
struct scatterlist *sg;
+   u64 bus_off;
  
  	for_each_sg(sgl, sg, nents, i) {

+   if (is_pci_p2pdma_page(sg_page(sg))) {
+   if (sg_page(sg)->pgmap != pgmap) {
+   pgmap = sg_page(sg)->pgmap;
+   map = pci_p2pdma_dma_map_type(dev, pgmap);
+   bus_off = pci_p2pdma_bus_offset(sg_page(sg));
+   }
+
+   if (map < 0) {
+   sg->dma_address = DMA_MAPPING_ERROR;
+   ret = -EREMOTEIO;
+   goto out_unmap;
+   }
+
+   if (map) {
+   sg->dma_address = sg_phys(sg) + sg->offset -
+   bus_off;
+   sg_dma_len(sg) = sg->length;
+   sg_mark_pci_p2pdma(sg);
+   continue;
+   }
+   }
+
sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
sg->offset, sg->length, dir, attrs);
if (sg->dma_address == DMA_MAPPING_ERROR)
@@ -411,7 +440,7 @@ int dma_direct_map_sg(struct device *dev, struct 
scatterlist *sgl, int nents,
  
  out_unmap:

dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
-   return 0;
+   return ret;
  }
  
  dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index b6a633679933..adc1a83950be 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -178,8 +178,15 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t 
addr, size_t size,
  EXPORT_SYMBOL(dma_unmap_page_attrs);
  
  /*

- * dma_maps_sg_attrs returns 0 on error and > 0 on success.
- * It should never return a value < 0.
+ * dma_maps_sg_attrs returns 0 on any resource error and > 0 on success.
+ *
+ * If 0 is returned, the mapping can be retried and will succeed once
+ * sufficient resources are available.


That's not a guarantee we can uphold. Retrying forever in the vain hope 
that a device might evolve some extra address bits, or a bounce buffer 
might magically grow big enough for a gigantic mapping, isn't 
necessarily the best idea.



+ *
+ * If there are P2PDMA pages in the scatterlist then this function may
+ * return -EREMOTEIO to indicate that the pages are not mappable by the
+ * device. In this case, an error should be returned for the IO as it
+ * will never be successfully retried.
   */
  int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
enum dma_data_direction dir, unsigned long attrs)
@@ -197,7 +204,7 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist 
*sg, int nents,
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);
-   BUG_ON(ents < 0);
+


This scares me - I hesitate to imagine the amount of driver/subsystem 
code out there that will see nonzero and merrily set off iterating a 
negative number of segments, if we open the floodgates of allowing 
implementations to return error codes here.


Robin.


debug_dma_map_sg(dev, sg, nents, ents, dir);
  
  	return ents;



___
iommu mailing list
iommu@lists.linux-foundation.org

Re: [RFC PATCH v2 00/11] Add support to dma_map_sg for P2PDMA

2021-03-12 Thread Robin Murphy

On 2021-03-11 23:31, Logan Gunthorpe wrote:

Hi,

This is a rework of the first half of my RFC for doing P2PDMA in userspace
with O_DIRECT[1].

The largest issue with that series was the gross way of flagging P2PDMA
SGL segments. This RFC proposes a different approach, (suggested by
Dan Williams[2]) which uses the third bit in the page_link field of the
SGL.

This approach is a lot less hacky but comes at the cost of adding a
CONFIG_64BIT dependency to CONFIG_PCI_P2PDMA and using up the last
scarce bit in the page_link. For our purposes, a 64BIT restriction is
acceptable but it's not clear if this is ok for all usecases hoping
to make use of P2PDMA.

Matthew Wilcox has already suggested (off-list) that this is the wrong
approach, preferring a new dma mapping operation and an SGL replacement. I
don't disagree that something along those lines would be a better long
term solution, but it involves overcoming a lot of challenges to get
there. Creating a new mapping operation still means adding support to more
than 25 dma_map_ops implementations (many of which are on obscure
architectures) or creating a redundant path to fallback with dma_map_sg()
for every driver that uses the new operation. This RFC is an approach
that doesn't require overcoming these blocks.


I don't really follow that argument - you're only adding support to two 
implementations with the awkward flag, so why would using a dedicated 
operation instead be any different? Whatever callers need to do if 
dma_pci_p2pdma_supported() says no, they could equally do if 
dma_map_p2p_sg() (or whatever) returns -ENXIO, no?


We don't try to multiplex .map_resource through .map_page, so there 
doesn't seem to be any good reason to force that complexity on .map_sg 
either. And having a distinct API from the outset should make it a lot 
easier to transition to better "list of P2P memory regions" data 
structures in future without rewriting the whole world. As it is, there 
are potential benefits in a P2P interface which can define its own 
behaviour - for instance if it threw out the notion of segment merging it 
could save a load of bother by just maintaining the direct correlation 
between pages and DMA addresses.


Robin.


Any alternative ideas or feedback is welcome.

These patches are based on v5.12-rc2 and a git branch is available here:

   https://github.com/sbates130272/linux-p2pmem/  p2pdma_dma_map_ops_rfc

A branch with the patches from the previous RFC that add userspace
O_DIRECT support is available at the same URL with the name
"p2pdma_dma_map_ops_rfc+user" (however, none of the issues with those
extra patches from the feedback of the last posting have been fixed).

Thanks,

Logan

[1] 
https://lore.kernel.org/linux-block/20201106170036.18713-1-log...@deltatee.com/
[2] 
https://lore.kernel.org/linux-block/capcyv4ifgcrdotut8qr7pmfhmecghqgvre9g0rorgczcgve...@mail.gmail.com/

--

Logan Gunthorpe (11):
   PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
   PCI/P2PDMA: Avoid pci_get_slot() which sleeps
   PCI/P2PDMA: Attempt to set map_type if it has not been set
   PCI/P2PDMA: Introduce pci_p2pdma_should_map_bus() and
 pci_p2pdma_bus_offset()
   lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL
   dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
   dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support
   iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
   block: Add BLK_STS_P2PDMA
   nvme-pci: Check DMA ops when indicating support for PCI P2PDMA
   nvme-pci: Convert to using dma_map_sg for p2pdma pages

  block/blk-core.c|  2 +
  drivers/iommu/dma-iommu.c   | 63 +-
  drivers/nvme/host/core.c|  3 +-
  drivers/nvme/host/nvme.h|  2 +-
  drivers/nvme/host/pci.c | 38 +++-
  drivers/pci/Kconfig |  2 +-
  drivers/pci/p2pdma.c| 89 +++--
  include/linux/blk_types.h   |  7 +++
  include/linux/dma-map-ops.h |  3 ++
  include/linux/dma-mapping.h |  5 +++
  include/linux/pci-p2pdma.h  | 11 +
  include/linux/scatterlist.h | 49 ++--
  kernel/dma/direct.c | 35 +--
  kernel/dma/mapping.c| 21 +++--
  14 files changed, 271 insertions(+), 59 deletions(-)


base-commit: a38fd8748464831584a19438cbb3082b5a2dab15
--
2.20.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu




Re: [RFC PATCH 18/18] ioasid: Add /dev/ioasid for userspace

2021-03-12 Thread Jason Gunthorpe
On Thu, Mar 11, 2021 at 02:55:34PM -0800, Jacob Pan wrote:
> Hi Jason,
> 
> Thanks for the review.
> 
> On Wed, 10 Mar 2021 15:23:01 -0400, Jason Gunthorpe  wrote:
> 
> > On Sat, Feb 27, 2021 at 02:01:26PM -0800, Jacob Pan wrote:
> > 
> > > +/*  IOCTLs for IOASID file descriptor (/dev/ioasid)  */
> > > +
> > > +/**
> > > + * IOASID_GET_API_VERSION - _IO(IOASID_TYPE, IOASID_BASE + 0)
> > > + *
> > > + * Report the version of the IOASID API.  This allows us to bump the
> > > entire
> > > + * API version should we later need to add or change features in
> > > incompatible
> > > + * ways.
> > > + * Return: IOASID_API_VERSION
> > > + * Availability: Always
> > > + */
> > > +#define IOASID_GET_API_VERSION   _IO(IOASID_TYPE,
> > > IOASID_BASE + 0)  
> > 
> > I think this is generally a bad idea, if you change the API later then
> > also change the ioctl numbers and everything should work out
> > 
> > eg use the 4th argument to IOC to specify something about the ABI
> > 
> Let me try to understand the idea, do you mean something like this?
> #define IOASID_GET_INFO _IOC(_IOC_NONE, IOASID_TYPE, IOASID_BASE + 1,
> sizeof(struct ioasid_info))
> 
> If we later change the size of struct ioasid_info, IOASID_GET_INFO would be
> a different ioctl number. Then we will break the existing user space that
> uses the old number. So I am guessing you meant we need to have a different
> name also. i.e.

Something like that is more appropriate. Generally we should not be
planning to 'remove' IOCTLs. The kernel must always have backwards
compat, so any new format you introduce down the road has to have a new
IOCTL number so the old format can continue to be supported.

Negotiation of support can usually be done by probing for ENOIOCTLCMD
or similar on the new ioctls, not an API version.
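
For example, from userspace that negotiation can be as simple as the
following sketch (the kernel's ENOIOCTLCMD surfaces to userspace as ENOTTY;
cmd and arg stand in for whichever new ioctl is being probed):

#include <errno.h>
#include <stdbool.h>
#include <sys/ioctl.h>

/*
 * Sketch only: probe whether the kernel understands a given ioctl by
 * issuing it and checking for ENOTTY, rather than querying an API version.
 */
static bool ioasid_ioctl_supported(int fd, unsigned long cmd, void *arg)
{
	if (ioctl(fd, cmd, arg) == 0)
		return true;
	/* Any error other than ENOTTY means the ioctl itself is known. */
	return errno != ENOTTY;
}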

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC v1 13/15] iommu/virtio: Attach Arm PASID tables when available

2021-03-12 Thread Vivek Kumar Gautam




On 3/3/21 10:55 PM, Jean-Philippe Brucker wrote:

On Fri, Jan 15, 2021 at 05:43:40PM +0530, Vivek Gautam wrote:
[...]

+static int viommu_setup_pgtable(struct viommu_endpoint *vdev,
+   struct viommu_domain *vdomain)
+{
+   int ret, id;
+   u32 asid;
+   enum io_pgtable_fmt fmt;
+   struct io_pgtable_ops *ops = NULL;
+   struct viommu_dev *viommu = vdev->viommu;
+   struct virtio_iommu_probe_table_format *desc = vdev->pgtf;
+   struct iommu_pasid_table *tbl = vdomain->pasid_tbl;
+   struct iommu_vendor_psdtable_cfg *pst_cfg;
+   struct arm_smmu_cfg_info *cfgi;
+   struct io_pgtable_cfg cfg = {
+   .iommu_dev  = viommu->dev->parent,
+   .tlb= _flush_ops,
+   .pgsize_bitmap  = vdev->pgsize_mask ? vdev->pgsize_mask :
+ vdomain->domain.pgsize_bitmap,
+   .ias= (vdev->input_end ? ilog2(vdev->input_end) :
+  
ilog2(vdomain->domain.geometry.aperture_end)) + 1,
+   .oas= vdev->output_bits,
+   };
+
+   if (!desc)
+   return -EINVAL;
+
+   if (!vdev->output_bits)
+   return -ENODEV;
+
+   switch (le16_to_cpu(desc->format)) {
+   case VIRTIO_IOMMU_FOMRAT_PGTF_ARM_LPAE:
+   fmt = ARM_64_LPAE_S1;
+   break;
+   default:
+   dev_err(vdev->dev, "unsupported page table format 0x%x\n",
+   le16_to_cpu(desc->format));
+   return -EINVAL;
+   }
+
+   if (vdomain->mm.ops) {
+   /*
+* TODO: attach additional endpoint to the domain. Check that
+* the config is sane.
+*/
+   return -EEXIST;
+   }
+
+   vdomain->mm.domain = vdomain;
+   ops = alloc_io_pgtable_ops(fmt, , >mm);
+   if (!ops)
+   return -ENOMEM;
+
+   pst_cfg = >cfg;
+   cfgi = _cfg->vendor.cfg;
+   id = ida_simple_get(_ida, 1, 1 << desc->asid_bits, GFP_KERNEL);
+   if (id < 0) {
+   ret = id;
+   goto err_free_pgtable;
+   }
+
+   asid = id;
+   ret = iommu_psdtable_prepare(tbl, pst_cfg, , asid);
+   if (ret)
+   goto err_free_asid;
+
+   /*
+* Strange to setup an op here?
+* cd-lib is the actual user of sync op, and therefore the platform
+* drivers should assign this sync/maintenance ops as per need.
+*/
+   tbl->ops->sync = viommu_flush_pasid;


But this function deals with page tables, not pasid tables. As said on an
earlier patch, the TLB flush ops should probably be passed during table
registration - those ops are global so should really be const.


Right, will amend it.




+
+   /* Right now only PASID 0 supported ?? */
+   ret = iommu_psdtable_write(tbl, pst_cfg, 0, >s1_cfg->cd);
+   if (ret)
+   goto err_free_asid;
+
+   vdomain->mm.ops = ops;
+   dev_dbg(vdev->dev, "using page table format 0x%x\n", fmt);
+
+   return 0;
+
+err_free_asid:
+   ida_simple_remove(_ida, asid);
+err_free_pgtable:
+   free_io_pgtable_ops(ops);
+   return ret;
+}
+
+static int viommu_config_arm_pst(struct iommu_vendor_psdtable_cfg *pst_cfg,
+struct virtio_iommu_req_attach_pst_arm *req)
+{
+   struct arm_smmu_s1_cfg *s1_cfg = pst_cfg->vendor.cfg.s1_cfg;
+
+   if (!s1_cfg)
+   return -ENODEV;
+
+   req->format  = cpu_to_le16(VIRTIO_IOMMU_FORMAT_PSTF_ARM_SMMU_V3);
+   req->s1fmt   = s1_cfg->s1fmt;
+   req->s1dss   = VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_DSS_0;
+   req->s1contextptr = cpu_to_le64(pst_cfg->base);
+   req->s1cdmax = cpu_to_le32(s1_cfg->s1cdmax);
+
+   return 0;
+}
+
+static int viommu_config_pst(struct iommu_vendor_psdtable_cfg *pst_cfg,
+void *req, enum pasid_table_fmt fmt)
+{
+   int ret;
+
+   switch (fmt) {
+   case PASID_TABLE_ARM_SMMU_V3:
+   ret = viommu_config_arm_pst(pst_cfg, req);
+   break;
+   default:
+   ret = -EINVAL;
+   WARN_ON(1);
+   }
+
+   return ret;
+}
+
+static int viommu_prepare_arm_pst(struct viommu_endpoint *vdev,
+ struct iommu_vendor_psdtable_cfg *pst_cfg)
+{
+   struct virtio_iommu_probe_table_format *pgtf = vdev->pgtf;
+   struct arm_smmu_cfg_info *cfgi = _cfg->vendor.cfg;
+   struct arm_smmu_s1_cfg *cfg;
+
+   /* Some sanity checks */
+   if (pgtf->asid_bits != 8 && pgtf->asid_bits != 16)
+   return -EINVAL;


No need for this, next patch cheks asid size in viommu_config_arm_pgt()


Right, thanks for catching.




+
+   cfg = devm_kzalloc(pst_cfg->iommu_dev, sizeof(cfg), GFP_KERNEL);
+   if (!cfg)
+   return -ENOMEM;
+
+   cfgi->s1_cfg = cfg;
+   cfg->s1cdmax = 

Re: [PATCH RFC v1 15/15] iommu/virtio: Update fault type and reason info for viommu fault

2021-03-12 Thread Vivek Kumar Gautam




On 3/3/21 10:55 PM, Jean-Philippe Brucker wrote:

On Fri, Jan 15, 2021 at 05:43:42PM +0530, Vivek Gautam wrote:

Fault type information can indicate whether a fault is a page request
or an unrecoverable fault, and further additions to the fault reasons
and the related PASID information can help in handling faults
efficiently.

Signed-off-by: Vivek Gautam 
Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Michael S. Tsirkin 
Cc: Robin Murphy 
Cc: Jean-Philippe Brucker 
Cc: Eric Auger 
Cc: Alex Williamson 
Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Liu Yi L 
Cc: Lorenzo Pieralisi 
Cc: Shameerali Kolothum Thodi 
---
  drivers/iommu/virtio-iommu.c  | 27 +--
  include/uapi/linux/virtio_iommu.h | 13 -
  2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 9cc3d35125e9..10ef9e98214a 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -652,9 +652,16 @@ static int viommu_fault_handler(struct viommu_dev *viommu,
char *reason_str;
  
  	u8 reason	= fault->reason;

+   u16 type= fault->flt_type;
u32 flags   = le32_to_cpu(fault->flags);
u32 endpoint= le32_to_cpu(fault->endpoint);
u64 address = le64_to_cpu(fault->address);
+   u32 pasid   = le32_to_cpu(fault->pasid);
+
+   if (type == VIRTIO_IOMMU_FAULT_F_PAGE_REQ) {
+   dev_info(viommu->dev, "Page request fault - unhandled\n");
+   return 0;
+   }
  
  	switch (reason) {

case VIRTIO_IOMMU_FAULT_R_DOMAIN:
@@ -663,6 +670,21 @@ static int viommu_fault_handler(struct viommu_dev *viommu,
case VIRTIO_IOMMU_FAULT_R_MAPPING:
reason_str = "page";
break;
+   case VIRTIO_IOMMU_FAULT_R_WALK_EABT:
+   reason_str = "page walk external abort";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_PTE_FETCH:
+   reason_str = "pte fetch";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_PERMISSION:
+   reason_str = "permission";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_ACCESS:
+   reason_str = "access";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_OOR_ADDRESS:
+   reason_str = "output address";
+   break;
case VIRTIO_IOMMU_FAULT_R_UNKNOWN:
default:
reason_str = "unknown";
@@ -671,8 +693,9 @@ static int viommu_fault_handler(struct viommu_dev *viommu,
  
  	/* TODO: find EP by ID and report_iommu_fault */

if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS)
-   dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx 
[%s%s%s]\n",
-   reason_str, endpoint, address,
+   dev_err_ratelimited(viommu->dev,
+   "%s fault from EP %u PASID %u at %#llx 
[%s%s%s]\n",
+   reason_str, endpoint, pasid, address,
flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : 
"",
flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : 
"",
flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : 
"");
diff --git a/include/uapi/linux/virtio_iommu.h 
b/include/uapi/linux/virtio_iommu.h
index 608c8d642e1f..a537d82777f7 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -290,19 +290,30 @@ struct virtio_iommu_req_invalidate {
  #define VIRTIO_IOMMU_FAULT_R_UNKNOWN  0
  #define VIRTIO_IOMMU_FAULT_R_DOMAIN   1
  #define VIRTIO_IOMMU_FAULT_R_MAPPING  2
+#define VIRTIO_IOMMU_FAULT_R_WALK_EABT 3
+#define VIRTIO_IOMMU_FAULT_R_PTE_FETCH 4
+#define VIRTIO_IOMMU_FAULT_R_PERMISSION5
+#define VIRTIO_IOMMU_FAULT_R_ACCESS6
+#define VIRTIO_IOMMU_FAULT_R_OOR_ADDRESS   7
  
  #define VIRTIO_IOMMU_FAULT_F_READ		(1 << 0)

  #define VIRTIO_IOMMU_FAULT_F_WRITE(1 << 1)
  #define VIRTIO_IOMMU_FAULT_F_EXEC (1 << 2)
  #define VIRTIO_IOMMU_FAULT_F_ADDRESS  (1 << 8)
  
+#define VIRTIO_IOMMU_FAULT_F_DMA_UNRECOV	1

+#define VIRTIO_IOMMU_FAULT_F_PAGE_REQ  2


Currently all reported faults are unrecoverable, so to be consistent
DMA_UNRECOV should be 0. But I'd prefer having just a new "page request"
flag in the flags field, instead of the flt_type field.


Yea, looking at what I am currently trying as well - handle page-request 
and leave all other faults as unrecoverable - I will add the page 
request flag in the structure.




For page requests we'll also need a 16-bit fault ID field to store the PRI
"page request group index" or the stall "stag". "last" and "privileged"
flags as well, to match the PRI page request. And a new command to
complete a page fault.
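
Purely to illustrate what that information might look like, a hypothetical
layout (field names and widths are not from the virtio-iommu spec, they only
show what a page request fault would need to carry):

#include <linux/types.h>

/* Hypothetical sketch of a page-request fault record. */
struct example_viommu_fault_pgreq {
	__le32	flags;		/* e.g. LAST, PRIVILEGED, PAGE_REQ bits */
	__le32	endpoint;
	__le32	pasid;
	__le16	grpid;		/* PRI page request group index / stall stag */
	__le16	reserved;
	__le64	address;
};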


Right, will add the fields as suggested.
To complete the page request we would also need to send the response 
back to the host from virtio backend 

Re: [PATCH RFC v1 09/15] iommu/virtio: Update table format probing header

2021-03-12 Thread Vivek Kumar Gautam




On 3/3/21 10:51 PM, Jean-Philippe Brucker wrote:

On Fri, Jan 15, 2021 at 05:43:36PM +0530, Vivek Gautam wrote:

Add info about asid_bits and additional flags to table format
probing header.

Signed-off-by: Vivek Gautam 
Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Michael S. Tsirkin 
Cc: Robin Murphy 
Cc: Jean-Philippe Brucker 
Cc: Eric Auger 
Cc: Alex Williamson 
Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Liu Yi L 
Cc: Lorenzo Pieralisi 
Cc: Shameerali Kolothum Thodi 
---
  include/uapi/linux/virtio_iommu.h | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_iommu.h 
b/include/uapi/linux/virtio_iommu.h
index 43821e33e7af..8a0624bab4b2 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -169,7 +169,10 @@ struct virtio_iommu_probe_pasid_size {
  struct virtio_iommu_probe_table_format {
struct virtio_iommu_probe_property  head;
__le16  format;
-   __u8reserved[2];
+   __le16  asid_bits;
+
+   __le32  flags;


This struct should only contain the head and format fields. asid and flags
should go in a specialized structure - virtio_iommu_probe_pgt_arm64 in the
latest spec draft, where I dropped the asid_bits field in favor of an
"ASID16" flag.


Right, will take care of this looking at the spec draft.

Best regards
Vivek



Thanks,
Jean


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC v1 08/15] iommu: Add asid_bits to arm smmu-v3 stage1 table info

2021-03-12 Thread Vivek Kumar Gautam




On 3/3/21 10:48 PM, Jean-Philippe Brucker wrote:

On Fri, Jan 15, 2021 at 05:43:35PM +0530, Vivek Gautam wrote:

asid_bits data is required to prepare stage-1 tables for arm-smmu-v3.

Signed-off-by: Vivek Gautam 
Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Jean-Philippe Brucker 
Cc: Eric Auger 
Cc: Alex Williamson 
Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Liu Yi L 
Cc: Lorenzo Pieralisi 
Cc: Shameerali Kolothum Thodi 
---
  include/uapi/linux/iommu.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 082d758dd016..96abbfc7c643 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -357,7 +357,7 @@ struct iommu_pasid_smmuv3 {
__u32   version;
__u8s1fmt;
__u8s1dss;
-   __u8padding[2];
+   __u16   asid_bits;


Is this used anywhere?  This struct is passed from host userspace to host
kernel to attach the PASID table, so I don't think it needs an asid_bits
field.


Yea, must have missed removing it from the WIP work. Will remove it.

Thanks
Vivek
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 6/6] media: uvcvideo: Use dma_alloc_noncontiguous API

2021-03-12 Thread Ricardo Ribalda
On architectures without coherent caching, such as ARM, use the
dma_alloc_noncontiguous API and handle the cache flushing manually using
dma_sync_sgtable().

With this patch, the affected architectures see up to a 20x performance
improvement in uvc_video_copy_data_work().

Eg: aarch64 with an external usb camera

NON_CONTIGUOUS
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 67034480 : duration 33303
FPS: 29.99
URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS)
decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
raw decode speed: 9.931 Gbits/s
raw URB handling speed: 1.025 Gbits/s
throughput: 16.102 Mbits/s
URB decode CPU usage 0.162600 %

COHERENT
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 54683536 : duration 33302
FPS: 29.99
URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS)
header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS)
decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max (uS)
raw decode speed: 365.470 Mbits/s
raw URB handling speed: 295.986 Mbits/s
throughput: 13.136 Mbits/s
URB decode CPU usage 3.594500 %

Signed-off-by: Ricardo Ribalda 
Reviewed-by: Tomasz Figa 
Signed-off-by: Christoph Hellwig 
---

Changelog from v2: (Thanks Laurent)

- Fix typos
- Use the right dma direction if not capturing
- Clear sgt during free

 drivers/media/usb/uvc/uvc_video.c | 92 +++
 drivers/media/usb/uvc/uvcvideo.h  |  5 +-
 2 files changed, 74 insertions(+), 23 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_video.c 
b/drivers/media/usb/uvc/uvc_video.c
index f2f565281e63..8e60f81e2257 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -6,11 +6,14 @@
  *  Laurent Pinchart (laurent.pinch...@ideasonboard.com)
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1096,6 +1099,34 @@ static int uvc_video_decode_start(struct uvc_streaming 
*stream,
return data[0];
 }
 
+static inline enum dma_data_direction stream_dir(struct uvc_streaming *stream)
+{
+   if (stream->type == V4L2_BUF_TYPE_VIDEO_CAPTURE)
+   return DMA_FROM_DEVICE;
+   else
+   return DMA_TO_DEVICE;
+}
+
+static inline struct device *stream_to_dmadev(struct uvc_streaming *stream)
+{
+   return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
+}
+
+static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
+{
+   struct device *dma_dev = stream_to_dmadev(uvc_urb->stream);
+
+   if (for_device) {
+   dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt,
+   stream_dir(uvc_urb->stream));
+   } else {
+   dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt,
+stream_dir(uvc_urb->stream));
+   invalidate_kernel_vmap_range(uvc_urb->buffer,
+uvc_urb->stream->urb_size);
+   }
+}
+
 /*
  * uvc_video_decode_data_work: Asynchronous memcpy processing
  *
@@ -1117,6 +1148,8 @@ static void uvc_video_copy_data_work(struct work_struct 
*work)
uvc_queue_buffer_release(op->buf);
}
 
+   uvc_urb_dma_sync(uvc_urb, true);
+
ret = usb_submit_urb(uvc_urb->urb, GFP_KERNEL);
if (ret < 0)
dev_err(_urb->stream->intf->dev,
@@ -1541,10 +1574,12 @@ static void uvc_video_complete(struct urb *urb)
 * Process the URB headers, and optionally queue expensive memcpy tasks
 * to be deferred to a work queue.
 */
+   uvc_urb_dma_sync(uvc_urb, false);
stream->decode(uvc_urb, buf, buf_meta);
 
/* If no async work is needed, resubmit the URB immediately. */
if (!uvc_urb->async_operations) {
+   uvc_urb_dma_sync(uvc_urb, true);
ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC);
if (ret < 0)
dev_err(>intf->dev,
@@ -1560,24 +1595,49 @@ static void uvc_video_complete(struct urb *urb)
  */
 static void uvc_free_urb_buffers(struct uvc_streaming *stream)
 {
+   struct device *dma_dev = stream_to_dmadev(stream);
struct uvc_urb *uvc_urb;
 
for_each_uvc_urb(uvc_urb, stream) {
if (!uvc_urb->buffer)
continue;
 
-#ifndef CONFIG_DMA_NONCOHERENT
-   

Re: [PATCH RFC v1 06/15] iommu/virtio: Add headers for table format probing

2021-03-12 Thread Vivek Kumar Gautam




On 3/3/21 10:47 PM, Jean-Philippe Brucker wrote:

On Fri, Jan 15, 2021 at 05:43:33PM +0530, Vivek Gautam wrote:

From: Jean-Philippe Brucker 

Add the required UAPI defines for probing the table format of the
underlying IOMMU hardware. The device may provide information about
hardware tables and additional capabilities for each device.
This allows the guest to correctly fabricate stage-1 page tables.

Signed-off-by: Jean-Philippe Brucker 
[Vivek: Use a single "struct virtio_iommu_probe_table_format" rather
 than separate structures for page table and pasid table format.


Makes sense. I've integrated that into the spec draft, added more precise
documentation and modified some of the definitions.

The current draft is here:
https://jpbrucker.net/virtio-iommu/spec/virtio-iommu-v0.13.pdf
Posted on the list here
https://lists.oasis-open.org/archives/virtio-dev/202102/msg00012.html


Thanks, I took an initial look, will review it this week.




Also update commit message.]
Signed-off-by: Vivek Gautam 
Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Michael S. Tsirkin 
Cc: Robin Murphy 
Cc: Jean-Philippe Brucker 
Cc: Eric Auger 
Cc: Alex Williamson 
Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Liu Yi L 
Cc: Lorenzo Pieralisi 
Cc: Shameerali Kolothum Thodi 
---
  include/uapi/linux/virtio_iommu.h | 44 ++-
  1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_iommu.h 
b/include/uapi/linux/virtio_iommu.h
index 237e36a280cb..43821e33e7af 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -2,7 +2,7 @@
  /*
   * Virtio-iommu definition v0.12
   *
- * Copyright (C) 2019 Arm Ltd.
+ * Copyright (C) 2019-2021 Arm Ltd.


Not strictly necessary. But if you're modifying this comment please also
remove the "v0.12" above


Sure, let me keep the copyright year unchanged until we finalize the 
changes in draft spec.





   */
  #ifndef _UAPI_LINUX_VIRTIO_IOMMU_H
  #define _UAPI_LINUX_VIRTIO_IOMMU_H
@@ -111,6 +111,12 @@ struct virtio_iommu_req_unmap {
  
  #define VIRTIO_IOMMU_PROBE_T_NONE		0

  #define VIRTIO_IOMMU_PROBE_T_RESV_MEM 1
+#define VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK2
+#define VIRTIO_IOMMU_PROBE_T_INPUT_RANGE   3
+#define VIRTIO_IOMMU_PROBE_T_OUTPUT_SIZE   4
+#define VIRTIO_IOMMU_PROBE_T_PASID_SIZE5
+#define VIRTIO_IOMMU_PROBE_T_PAGE_TABLE_FMT6
+#define VIRTIO_IOMMU_PROBE_T_PASID_TABLE_FMT   7


Since there is a single struct we can have a single
VIRTIO_IOMMU_PROBE_T_TABLE_FORMAT.


Right, that would make sense.



  
  #define VIRTIO_IOMMU_PROBE_T_MASK		0xfff
  
@@ -130,6 +136,42 @@ struct virtio_iommu_probe_resv_mem {

__le64  end;
  };
  
+struct virtio_iommu_probe_page_size_mask {

+   struct virtio_iommu_probe_property  head;
+   __u8reserved[4];
+   __le64  mask;
+};
+
+struct virtio_iommu_probe_input_range {
+   struct virtio_iommu_probe_property  head;
+   __u8reserved[4];
+   __le64  start;
+   __le64  end;
+};
+
+struct virtio_iommu_probe_output_size {
+   struct virtio_iommu_probe_property  head;
+   __u8bits;
+   __u8reserved[3];
+};
+
+struct virtio_iommu_probe_pasid_size {
+   struct virtio_iommu_probe_property  head;
+   __u8bits;
+   __u8reserved[3];
+};
+
+/* Arm LPAE page table format */
+#define VIRTIO_IOMMU_FOMRAT_PGTF_ARM_LPAE  1


s/FOMRAT/FORMAT


Sure.




+/* Arm smmu-v3 type PASID table format */
+#define VIRTIO_IOMMU_FORMAT_PSTF_ARM_SMMU_V3   2


These should be with the Arm-specific definitions patches 11 and 14


Right, will add these definitions with Arm specific patches.

Best regards
Vivek

[snip]
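
As a rough illustration of the single table-format property discussed
above, the merged property could look something like the sketch below.
The property value and field layout here are assumptions for
illustration only, not the spec draft's final definition.

/* Sketch only: one possible shape for a single table-format property.
 * Value and fields are assumed, not taken from the virtio-iommu spec
 * draft referenced above. */
#define VIRTIO_IOMMU_PROBE_T_TABLE_FORMAT	8

struct virtio_iommu_probe_table_format {
	struct virtio_iommu_probe_property	head;
	__le16					format;	/* page table or PASID table format ID */
	__u8					reserved[2];
};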


Re: [PATCH RFC v1 05/15] iommu/arm-smmu-v3: Set sync op from consumer driver of cd-lib

2021-03-12 Thread Vivek Kumar Gautam




On 3/3/21 10:45 PM, Jean-Philippe Brucker wrote:

On Fri, Jan 15, 2021 at 05:43:32PM +0530, Vivek Gautam wrote:

The change allows different consumers of arm-smmu-v3-cd-lib to set
their respective sync op for pasid entries.

Signed-off-by: Vivek Gautam 
Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Jean-Philippe Brucker 
Cc: Eric Auger 
Cc: Alex Williamson 
Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Liu Yi L 
Cc: Lorenzo Pieralisi 
Cc: Shameerali Kolothum Thodi 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c | 1 -
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c| 7 +++
  2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
index ec37476c8d09..acaa09acecdd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
@@ -265,7 +265,6 @@ struct iommu_vendor_psdtable_ops arm_cd_table_ops = {
.free= arm_smmu_free_cd_tables,
.prepare = arm_smmu_prepare_cd,
.write   = arm_smmu_write_ctx_desc,
-   .sync= arm_smmu_sync_cd,
  };
  
  struct iommu_pasid_table *arm_smmu_register_cd_table(struct device *dev,

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2f86c6ac42b6..0c644be22b4b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1869,6 +1869,13 @@ static int arm_smmu_domain_finalise_s1(struct 
arm_smmu_domain *smmu_domain,
if (ret)
goto out_free_cd_tables;
  
+	/*

+* Strange to setup an op here?
+* cd-lib is the actual user of sync op, and therefore the platform
+* drivers should assign this sync/maintenance ops as per need.
+*/
+   tbl->ops->sync = arm_smmu_sync_cd;
+


Modifying a static struct from here doesn't feel right. I think the
interface should be roughly similar to io-pgtable since the principle is
the same. So the sync() op should be separate from arm_cd_table_ops since
it's a callback into the driver. Maybe pass it to
iommu_register_pasid_table().


Sure, will take care of this.

Thanks & regards
Vivek
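
One possible shape of what is suggested above — keeping arm_cd_table_ops
static and passing the driver's sync callback at registration time — is
sketched below. The signature and callback name are assumptions for
illustration, not the RFC's actual interface.

/* Sketch only: assumed registration interface that takes the sync
 * callback from the consumer driver instead of patching a static ops
 * structure. */
typedef void (*iommu_psdtable_sync_t)(void *cookie, int ssid, bool leaf);

struct iommu_pasid_table *
iommu_register_pasid_table(struct device *dev,
			   struct iommu_vendor_psdtable_cfg *cfg,
			   iommu_psdtable_sync_t sync, void *cookie);

/* arm-smmu-v3 would then register with its own (hypothetical) CD sync helper: */
tbl = iommu_register_pasid_table(smmu->dev, &cfg, arm_smmu_sync_cd_cb,
				 smmu_domain);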


Re: [PATCH RFC v1 02/15] iommu: Add a simple PASID table library

2021-03-12 Thread Vivek Kumar Gautam

Hi Jean,


On 3/3/21 10:41 PM, Jean-Philippe Brucker wrote:

Hi Vivek,

Thanks again for working on this. I have a few comments but it looks
sensible overall.


Thanks a lot for reviewing the patch-series. Please find my responses 
inline below.




Regarding the overall design, I was initially assigning page directories
instead of whole PASID tables, which would simplify the driver and host
implementation. A major complication, however, is that SMMUv3 accesses PASID
tables using a guest-physical address, so there is a messy negotiation
needed between host and guest when the host needs to allocate PASID
tables. Plus vSMMU needs PASID table assignment, so that's what the host
driver will implement.


By assigning the page directories, you mean setting up just the stage-1 
page table ops, and passing that information to the host using ATTACH_TABLE?
Right now when using kvmtool, the struct iommu_pasid_table_config is 
populated with the correct information, and this whole memory is mapped 
between host and guest by creating a mem bank using 
kvm__for_each_mem_bank().
Did I get you or did I fail terribly in understanding the point you are 
making here?

If it helps, I will publish my kvmtool branch.



On Fri, Jan 15, 2021 at 05:43:29PM +0530, Vivek Gautam wrote:

Add a small API in the iommu subsystem to handle PASID table allocation
requests from different consumer drivers, such as a paravirtualized
iommu driver. The API provides ops for allocating and freeing the PASID
table, writing to it and managing the table caches.

This library also provides for registering a vendor API that attaches
to these ops. The vendor APIs would eventually perform arch-level
implementations for these PASID tables.


Although Arm might be the only vendor to ever use this, I think the
abstraction makes sense and isn't too invasive. Even if we called directly
into the SMMU driver from the virtio one, we'd still need patch 3 and
separate TLB invalidations ops.


Right, the idea was to make users of iommu-pasid-table - virtio-iommu or
the arm-smmu-v3 - consistent. I also noticed that the whole process of
allocating the pasid tables (or cd tables) and populating them with
stage-1 page tables in viommu is in line with how things are in
arm-smmu-v3, or at least that's how the design can be in general -
allocate the pasid_table, program stage-1 information into it, and then
pass it across to the host.
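
A rough sketch of that flow, using the op names that appear later in
this series; the exact arguments are assumptions for illustration, not
copied from the RFC:

/* Illustration only: allocate a PASID/CD table, program stage-1
 * information into one entry, then hand the configuration to the host. */
tbl = iommu_register_pasid_table(dev, &psdtable_cfg);
ret = tbl->ops->alloc(&tbl->cfg);                     /* allocate the table    */
ret = tbl->ops->prepare(&tbl->cfg, &pgtbl_cfg, ssid); /* stage-1 page tables   */
ret = tbl->ops->write(&tbl->cfg, ssid, &ctx_desc);    /* program the entry     */
/* ...then pass tbl->cfg (format, base address, ...) to the host via the
 * ATTACH_TABLE request mentioned above. */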





Signed-off-by: Vivek Gautam 
Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Jean-Philippe Brucker 
Cc: Eric Auger 
Cc: Alex Williamson 
Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Liu Yi L 
Cc: Lorenzo Pieralisi 
Cc: Shameerali Kolothum Thodi 
---
  drivers/iommu/iommu-pasid-table.h | 134 ++
  1 file changed, 134 insertions(+)
  create mode 100644 drivers/iommu/iommu-pasid-table.h

diff --git a/drivers/iommu/iommu-pasid-table.h 
b/drivers/iommu/iommu-pasid-table.h
new file mode 100644
index ..bd4f57656f67
--- /dev/null
+++ b/drivers/iommu/iommu-pasid-table.h
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PASID table management for the IOMMU
+ *
+ * Copyright (C) 2021 Arm Ltd.
+ */
+#ifndef __IOMMU_PASID_TABLE_H
+#define __IOMMU_PASID_TABLE_H
+
+#include 
+
+#include "arm/arm-smmu-v3/arm-smmu-v3.h"
+
+enum pasid_table_fmt {
+   PASID_TABLE_ARM_SMMU_V3,
+   PASID_TABLE_NUM_FMTS,
+};
+
+/**
+ * struct arm_smmu_cfg_info - arm-smmu-v3 specific configuration data
+ *
+ * @s1_cfg: arm-smmu-v3 stage1 config data
+ * @feat_flag: features supported by arm-smmu-v3 implementation
+ */
+struct arm_smmu_cfg_info {
+   struct arm_smmu_s1_cfg  *s1_cfg;
+   u32 feat_flag;
+};
+
+/**
+ * struct iommu_vendor_psdtable_cfg - Configuration data for PASID tables
+ *
+ * @iommu_dev: device performing the DMA table walks
+ * @fmt: The PASID table format
+ * @base: DMA address of the allocated table, set by the vendor driver
+ * @cfg: arm-smmu-v3 specific config data
+ */
+struct iommu_vendor_psdtable_cfg {
+   struct device   *iommu_dev;
+   enum pasid_table_fmtfmt;
+   dma_addr_t  base;
+   union {
+   struct arm_smmu_cfg_infocfg;


For the union to be extensible, that field should be called "arm" or
something like that.


Sure. Will amend this.

Thanks,
Vivek



Thanks,
Jean


+   } vendor;
+};
+
+struct iommu_vendor_psdtable_ops;
+
+/**
+ * struct iommu_pasid_table - describes a set of PASID tables
+ *
+ * @cookie: An opaque token provided by the IOMMU driver and passed back to any
+ * callback routine.
+ * @cfg: A copy of the PASID table configuration
+ * @ops: The PASID table operations in use for this set of page tables
+ */
+struct iommu_pasid_table {
+   void*cookie;
+   struct iommu_vendor_psdtable_cfgcfg;
+   struct iommu_vendor_psdtable_ops*ops;
+};
+
+#define pasid_table_cfg_to_table(pst_cfg) \
+   container_of((pst_cfg), struct 

Re: [PATCH RFC v1 04/15] iommu/arm-smmu-v3: Update CD base address info for user-space

2021-03-12 Thread Vivek Kumar Gautam

Hi Jean,

On 3/3/21 10:44 PM, Jean-Philippe Brucker wrote:

On Fri, Jan 15, 2021 at 05:43:31PM +0530, Vivek Gautam wrote:

Update the base address information in the vendor pasid table info so it
can be passed to user-space for stage-1 table management.

Signed-off-by: Vivek Gautam 
Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Jean-Philippe Brucker 
Cc: Eric Auger 
Cc: Alex Williamson 
Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Liu Yi L 
Cc: Lorenzo Pieralisi 
Cc: Shameerali Kolothum Thodi 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
index 8a7187534706..ec37476c8d09 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
@@ -55,6 +55,9 @@ static __le64 *arm_smmu_get_cd_ptr(struct 
iommu_vendor_psdtable_cfg *pst_cfg,
if (arm_smmu_alloc_cd_leaf_table(dev, l1_desc))
return NULL;
  
+		if (s1cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)

+   pst_cfg->base = l1_desc->l2ptr_dma;
+


This isn't the right place, because this path allocates second-level
tables for two-level tables. I don't think we need pst_cfg->base at all,
because for both linear and two-level tables, the base pointer is in
cdcfg->cdtab_dma, which can be read directly.


Sure, will remove this.



Thanks,
Jean


l1ptr = cdcfg->cdtab + idx * CTXDESC_L1_DESC_DWORDS;
arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
/* An invalid L1CD can be cached */
@@ -211,6 +214,9 @@ static int arm_smmu_alloc_cd_tables(struct 
iommu_vendor_psdtable_cfg *pst_cfg)
goto err_free_l1;
}
  
+	if (s1cfg->s1fmt == STRTAB_STE_0_S1FMT_64K_L2)

+   pst_cfg->base = cdcfg->cdtab_dma;
+
return 0;
  
  err_free_l1:

--
2.17.1




[RFC PATCH 7/7] iommu/amd: Add support for using AMD IOMMU v2 page table for DMA-API

2021-03-12 Thread Suravee Suthikulpanit
Introduce an init function for setting up the DMA domain for DMA-API use
with the IOMMU v2 page table.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/iommu.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e29ece6e1e68..bd26de8764bd 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1937,6 +1937,24 @@ static int protection_domain_init_v1(struct 
protection_domain *domain, int mode)
return 0;
 }
 
+static int protection_domain_init_v2(struct protection_domain *domain)
+{
+   spin_lock_init(&domain->lock);
+   domain->id = domain_id_alloc();
+   if (!domain->id)
+   return -ENOMEM;
+   INIT_LIST_HEAD(>dev_list);
+
+   domain->giov = true;
+
+   if (amd_iommu_pgtable == AMD_IOMMU_V2 &&
+   domain_enable_v2(domain, 1, false)) {
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+
 static struct protection_domain *protection_domain_alloc(unsigned int type)
 {
struct io_pgtable_ops *pgtbl_ops;
@@ -1964,6 +1982,9 @@ static struct protection_domain 
*protection_domain_alloc(unsigned int type)
case AMD_IOMMU_V1:
ret = protection_domain_init_v1(domain, mode);
break;
+   case AMD_IOMMU_V2:
+   ret = protection_domain_init_v2(domain);
+   break;
default:
ret = -EINVAL;
}
-- 
2.17.1



[RFC PATCH 6/7] iommu/amd: Introduce amd_iommu_pgtable command-line option

2021-03-12 Thread Suravee Suthikulpanit
Allow specifying whether to use the v1 or v2 IOMMU page table for
DMA remapping when using the kernel DMA-API.

Signed-off-by: Suravee Suthikulpanit 
---
 Documentation/admin-guide/kernel-parameters.txt |  6 ++
 drivers/iommu/amd/init.c| 15 +++
 2 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..466e807369ea 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -319,6 +319,12 @@
 This mode requires kvm-amd.avic=1.
 (Default when IOMMU HW support is present.)
 
+   amd_iommu_pgtable= [HW,X86-64]
+   Specifies one of the following AMD IOMMU page table to
+   be used for DMA remapping for DMA-API:
+   v1 - Use v1 page table (Default)
+   v2 - Use v2 page table
+
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: ,
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9265c1bf1d84..6d5163bfb87e 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3123,6 +3123,20 @@ static int __init parse_amd_iommu_dump(char *str)
return 1;
 }
 
+static int __init parse_amd_iommu_pgtable(char *str)
+{
+   for (; *str; ++str) {
+   if (strncmp(str, "v1", 2) == 0) {
+   amd_iommu_pgtable = AMD_IOMMU_V1;
+   break;
+   } else if (strncmp(str, "v2", 2) == 0) {
+   amd_iommu_pgtable = AMD_IOMMU_V2;
+   break;
+   }
+   }
+   return 1;
+}
+
 static int __init parse_amd_iommu_intr(char *str)
 {
for (; *str; ++str) {
@@ -3246,6 +3260,7 @@ static int __init parse_ivrs_acpihid(char *str)
 
 __setup("amd_iommu_dump",  parse_amd_iommu_dump);
 __setup("amd_iommu=",  parse_amd_iommu_options);
+__setup("amd_iommu_pgtable=",  parse_amd_iommu_pgtable);
 __setup("amd_iommu_intr=", parse_amd_iommu_intr);
 __setup("ivrs_ioapic", parse_ivrs_ioapic);
 __setup("ivrs_hpet",   parse_ivrs_hpet);
-- 
2.17.1
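
For completeness, a usage illustration of the new option as posted in
this patch; the surrounding command line is only an example:

# Example kernel command line selecting the v2 page table for DMA-API:
#   ... amd_iommu_pgtable=v2 ...
# or explicitly keeping the default v1 behaviour:
#   ... amd_iommu_pgtable=v1 ...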



[RFC PATCH 3/7] iommu/amd: Decouple the logic to enable PPR and GT

2021-03-12 Thread Suravee Suthikulpanit
Currently, the function to enable IOMMU v2 (GT) assumes that the PPR log
must also be enabled. This is no longer the case, since the IOMMU
v2 page table can be enabled without PPR support (for the DMA-API
use case).

Therefore, separate the enabling logic for PPR and GT.
There is no functional change.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/init.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9126efcbaf2c..5def566de6f6 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -898,14 +898,6 @@ static void iommu_enable_xt(struct amd_iommu *iommu)
 #endif /* CONFIG_IRQ_REMAP */
 }
 
-static void iommu_enable_gt(struct amd_iommu *iommu)
-{
-   if (!iommu_feature(iommu, FEATURE_GT))
-   return;
-
-   iommu_feature_enable(iommu, CONTROL_GT_EN);
-}
-
 /* sets a specific bit in the device table entry. */
 static void set_dev_entry_bit(u16 devid, u8 bit)
 {
@@ -1882,6 +1874,7 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
amd_iommu_max_glx_val = glxval;
else
amd_iommu_max_glx_val = min(amd_iommu_max_glx_val, 
glxval);
+   iommu_feature_enable(iommu, CONTROL_GT_EN);
}
 
if (iommu_feature(iommu, FEATURE_GT) &&
@@ -2530,21 +2523,19 @@ static void early_enable_iommus(void)
 #endif
 }
 
-static void enable_iommus_v2(void)
+static void enable_iommus_ppr(void)
 {
struct amd_iommu *iommu;
 
-   for_each_iommu(iommu) {
+   for_each_iommu(iommu)
iommu_enable_ppr_log(iommu);
-   iommu_enable_gt(iommu);
-   }
 }
 
 static void enable_iommus(void)
 {
early_enable_iommus();
 
-   enable_iommus_v2();
+   enable_iommus_ppr();
 }
 
 static void disable_iommus(void)
@@ -2935,7 +2926,7 @@ static int __init state_next(void)
register_syscore_ops(&amd_iommu_syscore_ops);
ret = amd_iommu_init_pci();
init_state = ret ? IOMMU_INIT_ERROR : IOMMU_PCI_INIT;
-   enable_iommus_v2();
+   enable_iommus_ppr();
break;
case IOMMU_PCI_INIT:
ret = amd_iommu_enable_interrupts();
-- 
2.17.1



[RFC PATCH 4/7] iommu/amd: Initial support for AMD IOMMU v2 page table

2021-03-12 Thread Suravee Suthikulpanit
Introduce IO page table framework support for AMD IOMMU v2 page table.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/Makefile  |   2 +-
 drivers/iommu/amd/amd_iommu_types.h |   2 +
 drivers/iommu/amd/io_pgtable_v2.c   | 239 
 drivers/iommu/io-pgtable.c  |   1 +
 include/linux/io-pgtable.h  |   2 +
 5 files changed, 245 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/amd/io_pgtable_v2.c

diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index a935f8f4b974..773d8aa00283 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o
+obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += iommu_v2.o
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 6937e3674a16..25062eb86c8b 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -265,6 +265,7 @@
  * 512GB Pages are not supported due to a hardware bug
  */
 #define AMD_IOMMU_PGSIZES  ((~0xFFFUL) & ~(2ULL << 38))
+#define AMD_IOMMU_PGSIZES_V2   (PAGE_SIZE | (1ULL << 12) | (1ULL << 30))
 
 /* Bit value definition for dte irq remapping fields*/
 #define DTE_IRQ_PHYS_ADDR_MASK (((1ULL << 45)-1) << 6)
@@ -503,6 +504,7 @@ struct amd_io_pgtable {
int mode;
u64 *root;
atomic64_t  pt_root;/* pgtable root and pgtable mode */
+   struct mm_structv2_mm;
 };
 
 /*
diff --git a/drivers/iommu/amd/io_pgtable_v2.c 
b/drivers/iommu/amd/io_pgtable_v2.c
new file mode 100644
index ..b0b6ba2d8d35
--- /dev/null
+++ b/drivers/iommu/amd/io_pgtable_v2.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * CPU-agnostic AMD IO page table v2 allocator.
+ *
+ * Copyright (C) 2020 Advanced Micro Devices, Inc.
+ * Author: Suravee Suthikulpanit 
+ */
+
+#define pr_fmt(fmt) "AMD-Vi: " fmt
+#define dev_fmt(fmt)pr_fmt(fmt)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "amd_iommu_types.h"
+#include "amd_iommu.h"
+
+static pte_t *fetch_pte(struct amd_io_pgtable *pgtable,
+ unsigned long iova,
+ unsigned long *page_size)
+{
+   int level;
+   pte_t *ptep;
+
+   ptep = lookup_address_in_mm(&pgtable->v2_mm, iova, &level);
+   if (!ptep || pte_none(*ptep) || (level == PG_LEVEL_NONE))
+   return NULL;
+
+   *page_size = PTE_LEVEL_PAGE_SIZE(level-1);
+   return ptep;
+}
+
+static pte_t *v2_pte_alloc_map(struct mm_struct *mm, unsigned long vaddr)
+{
+   pgd_t *pgd;
+   p4d_t *p4d;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+
+   pgd = pgd_offset(mm, vaddr);
+   p4d = p4d_alloc(mm, pgd, vaddr);
+   if (!p4d)
+   return NULL;
+   pud = pud_alloc(mm, p4d, vaddr);
+   if (!pud)
+   return NULL;
+   pmd = pmd_alloc(mm, pud, vaddr);
+   if (!pmd)
+   return NULL;
+   pte = pte_alloc_map(mm, pmd, vaddr);
+   return pte;
+}
+
+static int iommu_v2_map_page(struct io_pgtable_ops *ops, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+{
+   struct protection_domain *dom = io_pgtable_ops_to_domain(ops);
+   struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+   pte_t *pte;
+   int ret, i, count;
+   bool updated = false;
+   unsigned long o_iova = iova;
+   unsigned long pte_pgsize;
+
+   BUG_ON(!IS_ALIGNED(iova, size) || !IS_ALIGNED(paddr, size));
+
+   ret = -EINVAL;
+   if (!(prot & IOMMU_PROT_MASK))
+   goto out;
+
+   count = PAGE_SIZE_PTE_COUNT(size);
+
+   for (i = 0; i < count; ++i, iova += PAGE_SIZE, paddr += PAGE_SIZE) {
+   pte = fetch_pte(pgtable, iova, &pte_pgsize);
+   if (!pte || pte_none(*pte)) {
+   pte = v2_pte_alloc_map(&dom->iop.v2_mm, iova);
+   if (!pte)
+   goto out;
+   } else {
+   updated = true;
+   }
+   set_pte(pte, __pte((paddr & PAGE_MASK)|_PAGE_PRESENT|_PAGE_USER));
+   if (prot & IOMMU_PROT_IW)
+   *pte = pte_mkwrite(*pte);
+   }
+
+   if (updated) {
+   if (count > 1)
+   amd_iommu_flush_tlb(&dom->domain, 0);
+   else
+   amd_iommu_flush_page(&dom->domain, 0, o_iova);
+   }
+
+   ret = 0;
+out:
+   return ret;
+}
+
+static unsigned long iommu_v2_unmap_page(struct io_pgtable_ops *ops,
+ unsigned long 

[RFC PATCH 1/7] iommu/amd: Refactor amd_iommu_domain_enable_v2

2021-03-12 Thread Suravee Suthikulpanit
The current function to enable IOMMU v2 also locks the domain. In order
to reuse the same code in a code path where the domain has already been
locked, refactor the function to separate the locking from the enabling
logic.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/iommu.c | 42 +--
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b573e40..6f3e42495709 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -88,6 +88,7 @@ struct iommu_cmd {
 struct kmem_cache *amd_iommu_irq_cache;
 
 static void detach_device(struct device *dev);
+static int domain_enable_v2(struct protection_domain *domain, int pasids, bool 
has_ppr);
 
 /
  *
@@ -2304,10 +2305,9 @@ void amd_iommu_domain_direct_map(struct iommu_domain 
*dom)
 }
 EXPORT_SYMBOL(amd_iommu_domain_direct_map);
 
-int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids)
+/* Note: This function expects iommu_domain->lock to be held prior calling the 
function. */
+static int domain_enable_v2(struct protection_domain *domain, int pasids, bool 
has_ppr)
 {
-   struct protection_domain *domain = to_pdomain(dom);
-   unsigned long flags;
int levels, ret;
 
if (pasids <= 0 || pasids > (PASID_MASK + 1))
@@ -2320,17 +2320,6 @@ int amd_iommu_domain_enable_v2(struct iommu_domain *dom, 
int pasids)
if (levels > amd_iommu_max_glx_val)
return -EINVAL;
 
-   spin_lock_irqsave(&domain->lock, flags);
-
-   /*
-* Save us all sanity checks whether devices already in the
-* domain support IOMMUv2. Just force that the domain has no
-* devices attached when it is switched into IOMMUv2 mode.
-*/
-   ret = -EBUSY;
-   if (domain->dev_cnt > 0 || domain->flags & PD_IOMMUV2_MASK)
-   goto out;
-
ret = -ENOMEM;
domain->gcr3_tbl = (void *)get_zeroed_page(GFP_ATOMIC);
if (domain->gcr3_tbl == NULL)
@@ -2344,8 +2333,31 @@ int amd_iommu_domain_enable_v2(struct iommu_domain *dom, 
int pasids)
ret = 0;
 
 out:
-   spin_unlock_irqrestore(&domain->lock, flags);
+   return ret;
+}
 
+int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids)
+{
+   int ret;
+   unsigned long flags;
+   struct protection_domain *pdom = to_pdomain(dom);
+
+   spin_lock_irqsave(&pdom->lock, flags);
+
+   /*
+* Save us all sanity checks whether devices already in the
+* domain support IOMMUv2. Just force that the domain has no
+* devices attached when it is switched into IOMMUv2 mode.
+*/
+   ret = -EBUSY;
+   if (pdom->dev_cnt > 0 || pdom->flags & PD_IOMMUV2_MASK)
+   goto out;
+
+   if (pdom->dev_cnt == 0 && !(pdom->gcr3_tbl))
+   ret = domain_enable_v2(pdom, pasids, true);
+
+out:
+   spin_unlock_irqrestore(&pdom->lock, flags);
return ret;
 }
 EXPORT_SYMBOL(amd_iommu_domain_enable_v2);
-- 
2.17.1



[RFC PATCH 5/7] iommu/amd: Add support for Guest IO protection

2021-03-12 Thread Suravee Suthikulpanit
AMD IOMMU introduces support for Guest I/O protection, where requests
from an I/O device without a PASID are treated as if they have PASID 0.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h | 3 +++
 drivers/iommu/amd/init.c| 8 
 drivers/iommu/amd/iommu.c   | 4 
 3 files changed, 15 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 25062eb86c8b..876ba1adf73e 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -93,6 +93,7 @@
 #define FEATURE_HE (1ULL<<8)
 #define FEATURE_PC (1ULL<<9)
 #define FEATURE_GAM_VAPIC  (1ULL<<21)
+#define FEATURE_GIOSUP (1ULL<<48)
 #define FEATURE_EPHSUP (1ULL<<50)
 #define FEATURE_SNP(1ULL<<63)
 
@@ -366,6 +367,7 @@
 #define DTE_FLAG_IW (1ULL << 62)
 
 #define DTE_FLAG_IOTLB (1ULL << 32)
+#define DTE_FLAG_GIOV  (1ULL << 54)
 #define DTE_FLAG_GV(1ULL << 55)
 #define DTE_FLAG_MASK  (0x3ffULL << 32)
 #define DTE_GLX_SHIFT  (56)
@@ -519,6 +521,7 @@ struct protection_domain {
spinlock_t lock;/* mostly used to lock the page table*/
u16 id; /* the domain id written to the device table */
int glx;/* Number of levels for GCR3 table */
+   bool giov;  /* guest IO protection domain */
u64 *gcr3_tbl;  /* Guest CR3 table */
unsigned long flags;/* flags to find out type of domain */
unsigned dev_cnt;   /* devices assigned to this domain */
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 5def566de6f6..9265c1bf1d84 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1895,6 +1895,12 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
 
init_iommu_perf_ctr(iommu);
 
+   if (amd_iommu_pgtable == AMD_IOMMU_V2 &&
+   !iommu_feature(iommu, FEATURE_GIOSUP)) {
+   pr_warn("Cannot enable v2 page table for DMA-API. Fallback to 
v1.\n");
+   amd_iommu_pgtable = AMD_IOMMU_V1;
+   }
+
if (is_rd890_iommu(iommu->dev)) {
int i, j;
 
@@ -1969,6 +1975,8 @@ static void print_iommu_info(void)
if (amd_iommu_xt_mode == IRQ_REMAP_X2APIC_MODE)
pr_info("X2APIC enabled\n");
}
+   if (amd_iommu_pgtable == AMD_IOMMU_V2)
+   pr_info("GIOV enabled\n");
 }
 
 static int __init amd_iommu_init_pci(void)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index f3800efdbb29..e29ece6e1e68 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1405,6 +1405,10 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain,
 
pte_root |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
+
+   if (domain->giov && (domain->flags & PD_IOMMUV2_MASK))
+   pte_root |= DTE_FLAG_GIOV;
+
pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V | DTE_FLAG_TV;
 
flags = amd_iommu_dev_table[devid].data[1];
-- 
2.17.1



[RFC PATCH 2/7] iommu/amd: Update sanity check when enable PRI/ATS

2021-03-12 Thread Suravee Suthikulpanit
Currently, PPR/ATS can be enabled only if the domain type is
identity mapping. However, when we allow the IOMMU v2 page table
to be used for DMA-API, the sanity check needs to be updated to
apply only when using the AMD_IOMMU_V1 page table mode.

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/iommu.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 6f3e42495709..f3800efdbb29 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1549,7 +1549,7 @@ static int pri_reset_while_enabled(struct pci_dev *pdev)
return 0;
 }
 
-static int pdev_iommuv2_enable(struct pci_dev *pdev)
+static int pdev_pri_ats_enable(struct pci_dev *pdev)
 {
bool reset_enable;
int reqs, ret;
@@ -1624,11 +1624,19 @@ static int attach_device(struct device *dev,
struct iommu_domain *def_domain = iommu_get_dma_domain(dev);
 
ret = -EINVAL;
-   if (def_domain->type != IOMMU_DOMAIN_IDENTITY)
+
+   /*
+* In case of using AMD_IOMMU_V1 page table mode, and the device
+* is enabling PPR/ATS support (using the v2 table),
+* we need to make sure that the domain type is identity map.
+*/
+   if ((amd_iommu_pgtable == AMD_IOMMU_V1) &&
+   def_domain->type != IOMMU_DOMAIN_IDENTITY) {
goto out;
+   }
 
if (dev_data->iommu_v2) {
-   if (pdev_iommuv2_enable(pdev) != 0)
+   if (pdev_pri_ats_enable(pdev) != 0)
goto out;
 
dev_data->ats.enabled = true;
-- 
2.17.1



[RFC PATCH 0/7] iommu/amd: Add Generic IO Page Table Framework Support for v2 Page Table

2021-03-12 Thread Suravee Suthikulpanit
This series introduces a new usage model for the v2 page table, where it
can be used to implement support for DMA-API by adopting the generic
IO page table framework.

One of the target use cases is to support nested IO page tables
where the guest uses the guest IO page table (v2) for translating
GVA to GPA, and the hypervisor uses the host I/O page table (v1) for
translating GPA to SPA. This is a prerequisite for supporting the new
HW-assisted vIOMMU presented at KVM Forum 2020.

  
https://static.sched.com/hosted_files/kvmforum2020/26/vIOMMU%20KVM%20Forum%202020.pdf

The following components are introduced in this series:

- Part 1 (patch 1-4 and 7)
  Refactor the current IOMMU page table v2 code
  to adopt the generic IO page table framework, and add
  AMD IOMMU Guest (v2) page table management code.

- Part 2 (patch 5)
  Add support for the AMD IOMMU Guest IO Protection feature (GIOV)
  where requests from the I/O device without a PASID are treated as
  if they have PASID of 0.

- Part 3 (patch 6)
  Introduce new amd_iommu_pgtable command-line to allow users
  to select the mode of operation (v1 or v2).

See AMD I/O Virtualization Technology Specification for more detail.

  http://www.amd.com/system/files/TechDocs/48882_IOMMU_3.05_PUB.pdf

Thanks,
Suravee

Suravee Suthikulpanit (7):
  iommu/amd: Refactor amd_iommu_domain_enable_v2
  iommu/amd: Update sanity check when enable PRI/ATS
  iommu/amd: Decouple the logic to enable PPR and GT
  iommu/amd: Initial support for AMD IOMMU v2 page table
  iommu/amd: Add support for Guest IO protection
  iommu/amd: Introduce amd_iommu_pgtable command-line option
  iommu/amd: Add support for using AMD IOMMU v2 page table for DMA-API

 .../admin-guide/kernel-parameters.txt |   6 +
 drivers/iommu/amd/Makefile|   2 +-
 drivers/iommu/amd/amd_iommu_types.h   |   5 +
 drivers/iommu/amd/init.c  |  42 ++-
 drivers/iommu/amd/io_pgtable_v2.c | 239 ++
 drivers/iommu/amd/iommu.c |  81 --
 drivers/iommu/io-pgtable.c|   1 +
 include/linux/io-pgtable.h|   2 +
 8 files changed, 345 insertions(+), 33 deletions(-)
 create mode 100644 drivers/iommu/amd/io_pgtable_v2.c

-- 
2.17.1
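
Read together, the selection logic added by patches 4, 6 and 7 can be
summarized with the simplified sketch below; the pgtbl_cfg field name is
an assumption, see the individual patches for the real code.

/* Simplified sketch of how the series wires the v1/v2 choice into the
 * generic io-pgtable framework. */
switch (amd_iommu_pgtable) {
case AMD_IOMMU_V1:
	ret = protection_domain_init_v1(domain, mode);
	break;
case AMD_IOMMU_V2:
	ret = protection_domain_init_v2(domain);	/* GIOV: PASID-less DMA uses PASID 0 */
	break;
default:
	ret = -EINVAL;
}
if (!ret)
	pgtbl_ops = alloc_io_pgtable_ops(amd_iommu_pgtable,
					 &domain->iop.pgtbl_cfg, domain);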
