Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-11-17 Thread Ricardo Ribalda
Hi Christoph

I have been testing with real hardware on arm64 your patchset. And uvc
performs 20 times better using Kieran's test

https://github.com/ribalda/linux/tree/uvc-noncontiguous

These are the result of running   yavta --capture=1000


dma_alloc_noncontiguous

frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 78466000 : duration 33303
FPS: 29.99
URB: 418105/5000 uS/qty: 83.621 avg 98.783 std 17.396 min 1264.688 max (uS)
header: 100040/5000 uS/qty: 20.008 avg 19.458 std 2.969 min 454.167 max (uS)
latency: 347653/5000 uS/qty: 69.530 avg 98.937 std 9.114 min 1256.875 max (uS)
decode: 70452/5000 uS/qty: 14.090 avg 11.547 std 6.146 min 271.510 max (uS)
raw decode speed: 8.967 Gbits/s
raw URB handling speed: 1.501 Gbits/s
throughput: 18.848 Mbits/s
URB decode CPU usage 0.211500 %


usb_alloc_coherent

frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 70501712 : duration 33319
FPS: 29.98
URB: 1854128/5000 uS/qty: 370.825 avg 417.133 std 14.539 min 2875.760 max (uS)
header: 98765/5000 uS/qty: 19.753 avg 30.714 std 1.042 min 573.463 max (uS)
latency: 453316/5000 uS/qty: 90.663 avg 114.987 std 4.065 min 860.795 max (uS)
decode: 1400811/5000 uS/qty: 280.162 avg 330.786 std 6.305 min 2758.202 max (uS)
raw decode speed: 402.866 Mbits/s
raw URB handling speed: 304.214 Mbits/s
throughput: 16.927 Mbits/s
URB decode CPU usage 4.204200 %


Best regards

On Tue, Nov 10, 2020 at 10:57 AM Christoph Hellwig  wrote:
>
> On Tue, Nov 10, 2020 at 06:50:32PM +0900, Tomasz Figa wrote:
> > In what terms it doesn't actually work? Last time I checked some
> > platforms actually defined CONFIG_DMA_NONCOHERENT, so those would
> > instead use the kmalloc() + dma_map() path. I don't have any
> > background on why that was added and whether it needs to be preserved,
> > though. Kieran, Laurent, do you have any insight?
>
> CONFIG_DMA_NONCOHERENT is set on sh and mips for platforms that may
> support non-coherent DMA at compile time (but at least for mips that
> doesn't actually means this gets used).  Using that ifdef to decide
> on using usb_alloc_coherent vs letting the usb layer map the data
> seems at best odd, and if we are unlucky papering over a bug somewhere.



-- 
Ricardo Ribalda


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-11-10 Thread Christoph Hellwig
On Tue, Nov 10, 2020 at 06:50:32PM +0900, Tomasz Figa wrote:
> In what terms it doesn't actually work? Last time I checked some
> platforms actually defined CONFIG_DMA_NONCOHERENT, so those would
> instead use the kmalloc() + dma_map() path. I don't have any
> background on why that was added and whether it needs to be preserved,
> though. Kieran, Laurent, do you have any insight?

CONFIG_DMA_NONCOHERENT is set on sh and mips for platforms that may
support non-coherent DMA at compile time (but at least for mips that
doesn't actually means this gets used).  Using that ifdef to decide
on using usb_alloc_coherent vs letting the usb layer map the data
seems at best odd, and if we are unlucky papering over a bug somewhere.


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-11-10 Thread Tomasz Figa
On Tue, Nov 10, 2020 at 6:33 PM Ricardo Ribalda  wrote:
>
> Hi Christoph
>
> On Tue, Nov 10, 2020 at 10:25 AM Christoph Hellwig  wrote:
> >
> > On Mon, Nov 09, 2020 at 03:53:55PM +0100, Ricardo Ribalda wrote:
> > > Hi Christoph
> > >
> > > I have started now to give a try to your patchset. Sorry for the delay.
> > >
> > > For uvc I have prepared this patch:
> > > https://github.com/ribalda/linux/commit/9094fe223fe38f8c8ff21366d893b43cbbdf0113
> > >
> > > I have tested successfully in a x86_64 noteboot..., yes I know there
> > > is no change for that platform :).
> > > I am trying to get hold of an arm device that can run the latest
> > > kernel from upstream.
> > >
> > > On the meanwhile if you could take a look to the patch to verify that
> > > this the way that you expect the drivers to use your api I would
> > > appreciate it
> >
> > This looks pretty reaosnable.
> >
>
> Great
>

Thanks Christoph for taking a look quickly.

> Also FYI, I managed to boot an ARM device with that tree. But I could
> not test the uvc driver (it was a remote device with no usb device
> attached)
>
> Hopefully I will be able to test it for real this week.
>
> Any suggestions for how to measure performance difference?

Back in time Kieran (+CC) shared a patch to add extra statistics for
packet processing and payload assembly, with results of various
approaches summarized in a spreadsheet:
https://docs.google.com/spreadsheets/d/1uPdbdVcebO9OQ0LQ8hR2LGIEySWgSnGwwhzv7LPXAlU/edit#gid=0

That and just simple CPU usage comparison would be enough.

>
> Thanks!
>
> > Note that ifdef  CONFIG_DMA_NONCOHERENT in the old code doesn't actually
> > work, as that option is an internal thing just for mips and sh..

In what terms it doesn't actually work? Last time I checked some
platforms actually defined CONFIG_DMA_NONCOHERENT, so those would
instead use the kmalloc() + dma_map() path. I don't have any
background on why that was added and whether it needs to be preserved,
though. Kieran, Laurent, do you have any insight?

Best regards,
Tomasz


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-11-10 Thread Christoph Hellwig
On Tue, Nov 10, 2020 at 10:33:05AM +0100, Ricardo Ribalda wrote:
> Also FYI, I managed to boot an ARM device with that tree. But I could
> not test the uvc driver (it was a remote device with no usb device
> attached)
> 
> Hopefully I will be able to test it for real this week.
> 
> Any suggestions for how to measure performance difference?

I have to admit I don't know at all how uvc works.  But the main
problem with dma_alloc_coherent is that all access is uncached.  So
anything that does larger and/or many data transfers to and from it
will be glacially slow.  With the dma streaming API we still have to
pay for cache flushes, but only before and after the transfers, and
in many cases in a somewhat optimized fashion.


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-11-10 Thread Ricardo Ribalda
Hi Christoph

On Tue, Nov 10, 2020 at 10:25 AM Christoph Hellwig  wrote:
>
> On Mon, Nov 09, 2020 at 03:53:55PM +0100, Ricardo Ribalda wrote:
> > Hi Christoph
> >
> > I have started now to give a try to your patchset. Sorry for the delay.
> >
> > For uvc I have prepared this patch:
> > https://github.com/ribalda/linux/commit/9094fe223fe38f8c8ff21366d893b43cbbdf0113
> >
> > I have tested successfully in a x86_64 noteboot..., yes I know there
> > is no change for that platform :).
> > I am trying to get hold of an arm device that can run the latest
> > kernel from upstream.
> >
> > On the meanwhile if you could take a look to the patch to verify that
> > this the way that you expect the drivers to use your api I would
> > appreciate it
>
> This looks pretty reaosnable.
>

Great

Also FYI, I managed to boot an ARM device with that tree. But I could
not test the uvc driver (it was a remote device with no usb device
attached)

Hopefully I will be able to test it for real this week.

Any suggestions for how to measure performance difference?

Thanks!

> Note that ifdef  CONFIG_DMA_NONCOHERENT in the old code doesn't actually
> work, as that option is an internal thing just for mips and sh..



-- 
Ricardo Ribalda


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-11-10 Thread Christoph Hellwig
On Mon, Nov 09, 2020 at 03:53:55PM +0100, Ricardo Ribalda wrote:
> Hi Christoph
> 
> I have started now to give a try to your patchset. Sorry for the delay.
> 
> For uvc I have prepared this patch:
> https://github.com/ribalda/linux/commit/9094fe223fe38f8c8ff21366d893b43cbbdf0113
> 
> I have tested successfully in a x86_64 noteboot..., yes I know there
> is no change for that platform :).
> I am trying to get hold of an arm device that can run the latest
> kernel from upstream.
> 
> On the meanwhile if you could take a look to the patch to verify that
> this the way that you expect the drivers to use your api I would
> appreciate it

This looks pretty reaosnable.

Note that ifdef  CONFIG_DMA_NONCOHERENT in the old code doesn't actually
work, as that option is an internal thing just for mips and sh..


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-11-09 Thread Ricardo Ribalda
Hi Christoph

I have started now to give a try to your patchset. Sorry for the delay.

For uvc I have prepared this patch:
https://github.com/ribalda/linux/commit/9094fe223fe38f8c8ff21366d893b43cbbdf0113

I have tested successfully in a x86_64 noteboot..., yes I know there
is no change for that platform :).
I am trying to get hold of an arm device that can run the latest
kernel from upstream.

On the meanwhile if you could take a look to the patch to verify that
this the way that you expect the drivers to use your api I would
appreciate it

Thanks



On Wed, Oct 14, 2020 at 3:20 PM Tomasz Figa  wrote:
>
> +CC Ricardo who will be looking into using this in the USB stack (UVC
> camera driver).
>
> On Wed, Sep 30, 2020 at 6:09 PM Christoph Hellwig  wrote:
> >
> > Add a new API that returns a virtually non-contigous array of pages
> > and dma address.  This API is only implemented for dma-iommu and will
> > not be implemented for non-iommu DMA API instances that have to allocate
> > contiguous memory.  It is up to the caller to check if the API is
> > available.
> >
> > The intent is that media drivers can use this API if either:
> >
> >  - no kernel mapping or only temporary kernel mappings are required.
> >That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
> >  - a kernel mapping is required for cached and DMA mapped pages, but
> >the driver also needs the pages to e.g. map them to userspace.
> >In that sense it is a replacement for some aspects of the recently
> >removed and never fully implemented DMA_ATTR_NON_CONSISTENT
> >
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  drivers/iommu/dma-iommu.c   | 73 +
> >  include/linux/dma-mapping.h |  9 +
> >  kernel/dma/mapping.c| 35 ++
> >  3 files changed, 93 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 7922f545cd5eef..158026a856622c 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -565,23 +565,12 @@ static struct page **__iommu_dma_alloc_pages(struct 
> > device *dev,
> > return pages;
> >  }
> >
> > -/**
> > - * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA 
> > space
> > - * @dev: Device to allocate memory for. Must be a real device
> > - *  attached to an iommu_dma_domain
> > - * @size: Size of buffer in bytes
> > - * @dma_handle: Out argument for allocated DMA handle
> > - * @gfp: Allocation flags
> > - * @prot: pgprot_t to use for the remapped mapping
> > - * @attrs: DMA attributes for this allocation
> > - *
> > - * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
> > +/*
> > + * If size is less than PAGE_SIZE, then a full CPU page will be allocated,
> >   * but an IOMMU which supports smaller pages might not map the whole thing.
> > - *
> > - * Return: Mapped virtual address, or NULL on failure.
> >   */
> > -static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
> > -   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
> > +static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
> > +   size_t size, dma_addr_t *dma_handle, gfp_t gfp, pgprot_t 
> > prot,
> > unsigned long attrs)
> >  {
> > struct iommu_domain *domain = iommu_get_dma_domain(dev);
> > @@ -593,7 +582,6 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> > size_t size,
> > struct page **pages;
> > struct sg_table sgt;
> > dma_addr_t iova;
> > -   void *vaddr;
> >
> > *dma_handle = DMA_MAPPING_ERROR;
> >
> > @@ -636,17 +624,10 @@ static void *iommu_dma_alloc_remap(struct device 
> > *dev, size_t size,
> > < size)
> > goto out_free_sg;
> >
> > -   vaddr = dma_common_pages_remap(pages, size, prot,
> > -   __builtin_return_address(0));
> > -   if (!vaddr)
> > -   goto out_unmap;
> > -
> > *dma_handle = iova;
> > sg_free_table(&sgt);
> > -   return vaddr;
> > +   return pages;
> >
> > -out_unmap:
> > -   __iommu_dma_unmap(dev, iova, size);
> >  out_free_sg:
> > sg_free_table(&sgt);
> >  out_free_iova:
> > @@ -656,6 +637,46 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> > size_t size,
> > return NULL;
> >  }
> >
> > +static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
> > +   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
> > +   unsigned long attrs)
> > +{
> > +   struct page **pages;
> > +   void *vaddr;
> > +
> > +   pages = __iommu_dma_alloc_noncontiguous(dev, size, dma_handle, gfp,
> > +   prot, attrs);
> > +   if (!pages)
> > +   return NULL;
> > +   vaddr = dma_common_pages_remap(pages, size, prot,
> > +   __builtin_return_address(0));
> > + 

RE: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-14 Thread David Laight
> On Wed, Sep 30, 2020 at 6:09 PM Christoph Hellwig  wrote:
> >
> > Add a new API that returns a virtually non-contigous array of pages
> > and dma address.  This API is only implemented for dma-iommu and will
> > not be implemented for non-iommu DMA API instances that have to allocate
> > contiguous memory.  It is up to the caller to check if the API is
> > available.

Isn't there already a flag that is only implemented for ARM
systems with iommu that forces pages to actually be physically
contiguous?

So isn't this some kind of strange opposite?

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-14 Thread Tomasz Figa
+CC Ricardo who will be looking into using this in the USB stack (UVC
camera driver).

On Wed, Sep 30, 2020 at 6:09 PM Christoph Hellwig  wrote:
>
> Add a new API that returns a virtually non-contigous array of pages
> and dma address.  This API is only implemented for dma-iommu and will
> not be implemented for non-iommu DMA API instances that have to allocate
> contiguous memory.  It is up to the caller to check if the API is
> available.
>
> The intent is that media drivers can use this API if either:
>
>  - no kernel mapping or only temporary kernel mappings are required.
>That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
>  - a kernel mapping is required for cached and DMA mapped pages, but
>the driver also needs the pages to e.g. map them to userspace.
>In that sense it is a replacement for some aspects of the recently
>removed and never fully implemented DMA_ATTR_NON_CONSISTENT
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/dma-iommu.c   | 73 +
>  include/linux/dma-mapping.h |  9 +
>  kernel/dma/mapping.c| 35 ++
>  3 files changed, 93 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7922f545cd5eef..158026a856622c 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -565,23 +565,12 @@ static struct page **__iommu_dma_alloc_pages(struct 
> device *dev,
> return pages;
>  }
>
> -/**
> - * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space
> - * @dev: Device to allocate memory for. Must be a real device
> - *  attached to an iommu_dma_domain
> - * @size: Size of buffer in bytes
> - * @dma_handle: Out argument for allocated DMA handle
> - * @gfp: Allocation flags
> - * @prot: pgprot_t to use for the remapped mapping
> - * @attrs: DMA attributes for this allocation
> - *
> - * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
> +/*
> + * If size is less than PAGE_SIZE, then a full CPU page will be allocated,
>   * but an IOMMU which supports smaller pages might not map the whole thing.
> - *
> - * Return: Mapped virtual address, or NULL on failure.
>   */
> -static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
> -   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
> +static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
> +   size_t size, dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
> unsigned long attrs)
>  {
> struct iommu_domain *domain = iommu_get_dma_domain(dev);
> @@ -593,7 +582,6 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> size_t size,
> struct page **pages;
> struct sg_table sgt;
> dma_addr_t iova;
> -   void *vaddr;
>
> *dma_handle = DMA_MAPPING_ERROR;
>
> @@ -636,17 +624,10 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> size_t size,
> < size)
> goto out_free_sg;
>
> -   vaddr = dma_common_pages_remap(pages, size, prot,
> -   __builtin_return_address(0));
> -   if (!vaddr)
> -   goto out_unmap;
> -
> *dma_handle = iova;
> sg_free_table(&sgt);
> -   return vaddr;
> +   return pages;
>
> -out_unmap:
> -   __iommu_dma_unmap(dev, iova, size);
>  out_free_sg:
> sg_free_table(&sgt);
>  out_free_iova:
> @@ -656,6 +637,46 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> size_t size,
> return NULL;
>  }
>
> +static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
> +   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
> +   unsigned long attrs)
> +{
> +   struct page **pages;
> +   void *vaddr;
> +
> +   pages = __iommu_dma_alloc_noncontiguous(dev, size, dma_handle, gfp,
> +   prot, attrs);
> +   if (!pages)
> +   return NULL;
> +   vaddr = dma_common_pages_remap(pages, size, prot,
> +   __builtin_return_address(0));
> +   if (!vaddr)
> +   goto out_unmap;
> +   return vaddr;
> +
> +out_unmap:
> +   __iommu_dma_unmap(dev, *dma_handle, size);
> +   __iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
> +   return NULL;
> +}
> +
> +#ifdef CONFIG_DMA_REMAP
> +static struct page **iommu_dma_alloc_noncontiguous(struct device *dev,
> +   size_t size, dma_addr_t *dma_handle, gfp_t gfp,
> +   unsigned long attrs)
> +{
> +   return __iommu_dma_alloc_noncontiguous(dev, size, dma_handle, gfp,
> +  PAGE_KERNEL, attrs);
> +}
> +
> +static void iommu_dma_free_noncontiguous(struct device *dev, size_t size,
> +   struct page **pages, dma_addr_t dma_handle)
> +{
> +   __iommu_dma_unmap(dev, dma_handle, size);
> +   __iommu_dma

Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-07 Thread Christoph Hellwig
On Wed, Oct 07, 2020 at 02:21:43PM +0200, Tomasz Figa wrote:
> My initial feeling is that it should work, but we'll give you a
> definitive answer once we prototype it. :)
> 
> We might actually give it a try in the USB HCD subsystem as well, to
> implement usb_alloc_noncoherent(), as an optimization for drivers
> which have to perform multiple random accesses to the URB buffers. I
> think you might recall discussing this by the way of the pwc and
> uvcvideo camera drivers.

Yes.  I guess for usb the dma_alloc_noncoherent as-is in linux-next
might be better.   So if you have the cycles please prototype it
either way, although for that we'd also need at least a
mmap_noncoherent method, and probaby a get_sgtable_noncoherent one.


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-07 Thread Tomasz Figa
On Wed, Oct 7, 2020 at 8:21 AM Christoph Hellwig  wrote:
>
> On Tue, Oct 06, 2020 at 10:56:04PM +0200, Tomasz Figa wrote:
> > > Yes.  And make sure the API isn't implemented when VIVT caches are
> > > used, but that isn't really different from the current interface.
> >
> > Okay, thanks. Let's see if we can make necessary changes to the videobuf2.
> >
> > +Sergey Senozhatsky for awareness too.
>
> I can defer the changes a bit to see if you'd really much prefer
> the former interface.  I think for now the most important thing is
> that it works properly for the potential users, and the prime one is
> videobuf2 for now.  drm also seems like a big potential users, but I
> had a really hard time getting the developers to engage in API
> development.

My initial feeling is that it should work, but we'll give you a
definitive answer once we prototype it. :)

We might actually give it a try in the USB HCD subsystem as well, to
implement usb_alloc_noncoherent(), as an optimization for drivers
which have to perform multiple random accesses to the URB buffers. I
think you might recall discussing this by the way of the pwc and
uvcvideo camera drivers.

Best regards,
Tomasz


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-06 Thread Christoph Hellwig
On Tue, Oct 06, 2020 at 10:56:04PM +0200, Tomasz Figa wrote:
> > Yes.  And make sure the API isn't implemented when VIVT caches are
> > used, but that isn't really different from the current interface.
> 
> Okay, thanks. Let's see if we can make necessary changes to the videobuf2.
> 
> +Sergey Senozhatsky for awareness too.

I can defer the changes a bit to see if you'd really much prefer
the former interface.  I think for now the most important thing is
that it works properly for the potential users, and the prime one is
videobuf2 for now.  drm also seems like a big potential users, but I
had a really hard time getting the developers to engage in API
development.


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-06 Thread Tomasz Figa
On Mon, Oct 5, 2020 at 10:26 AM Christoph Hellwig  wrote:
>
> On Fri, Oct 02, 2020 at 05:50:40PM +, Tomasz Figa wrote:
> > Hi Christoph,
> >
> > On Wed, Sep 30, 2020 at 06:09:17PM +0200, Christoph Hellwig wrote:
> > > Add a new API that returns a virtually non-contigous array of pages
> > > and dma address.  This API is only implemented for dma-iommu and will
> > > not be implemented for non-iommu DMA API instances that have to allocate
> > > contiguous memory.  It is up to the caller to check if the API is
> > > available.
> >
> > Would you mind scheding some more light on what made the previous attempt
> > not work well? I liked the previous API because it was more consistent with
> > the regular dma_alloc_coherent().
>
> The problem is that with a dma_alloc_noncoherent that can return pages
> not in the kernel mapping we can't just use virt_to_page to fill in
> scatterlists or mmap the buffer to userspace, but would need new helpers
> and another two methods.
>
> > >  - no kernel mapping or only temporary kernel mappings are required.
> > >That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
> > >  - a kernel mapping is required for cached and DMA mapped pages, but
> > >the driver also needs the pages to e.g. map them to userspace.
> > >In that sense it is a replacement for some aspects of the recently
> > >removed and never fully implemented DMA_ATTR_NON_CONSISTENT
> >
> > What's the expected allocation and mapping flow with the latter? Would that 
> > be
> >
> > pages = dma_alloc_noncoherent(...)
> > vaddr = vmap(pages, ...);
> >
> > ?
>
> Yes.  Witht the vmap step optional for replacements of
> DMA_ATTR_NO_KERNEL_MAPPING, which is another nightmare to deal with.
>
> > Would one just use the usual dma_sync_for_{cpu,device}() for cache
> > invallidate/clean, while keeping the mapping in place?
>
> Yes.  And make sure the API isn't implemented when VIVT caches are
> used, but that isn't really different from the current interface.

Okay, thanks. Let's see if we can make necessary changes to the videobuf2.

+Sergey Senozhatsky for awareness too.

Best regrards,
Tomasz


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-05 Thread Christoph Hellwig
On Fri, Oct 02, 2020 at 05:50:40PM +, Tomasz Figa wrote:
> Hi Christoph,
> 
> On Wed, Sep 30, 2020 at 06:09:17PM +0200, Christoph Hellwig wrote:
> > Add a new API that returns a virtually non-contigous array of pages
> > and dma address.  This API is only implemented for dma-iommu and will
> > not be implemented for non-iommu DMA API instances that have to allocate
> > contiguous memory.  It is up to the caller to check if the API is
> > available.
> 
> Would you mind scheding some more light on what made the previous attempt
> not work well? I liked the previous API because it was more consistent with
> the regular dma_alloc_coherent().

The problem is that with a dma_alloc_noncoherent that can return pages
not in the kernel mapping we can't just use virt_to_page to fill in
scatterlists or mmap the buffer to userspace, but would need new helpers
and another two methods.

> >  - no kernel mapping or only temporary kernel mappings are required.
> >That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
> >  - a kernel mapping is required for cached and DMA mapped pages, but
> >the driver also needs the pages to e.g. map them to userspace.
> >In that sense it is a replacement for some aspects of the recently
> >removed and never fully implemented DMA_ATTR_NON_CONSISTENT
> 
> What's the expected allocation and mapping flow with the latter? Would that be
> 
> pages = dma_alloc_noncoherent(...)
> vaddr = vmap(pages, ...);
> 
> ?

Yes.  Witht the vmap step optional for replacements of
DMA_ATTR_NO_KERNEL_MAPPING, which is another nightmare to deal with.

> Would one just use the usual dma_sync_for_{cpu,device}() for cache
> invallidate/clean, while keeping the mapping in place?

Yes.  And make sure the API isn't implemented when VIVT caches are
used, but that isn't really different from the current interface.


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-02 Thread Tomasz Figa
Hi Christoph,

On Wed, Sep 30, 2020 at 06:09:17PM +0200, Christoph Hellwig wrote:
> Add a new API that returns a virtually non-contigous array of pages
> and dma address.  This API is only implemented for dma-iommu and will
> not be implemented for non-iommu DMA API instances that have to allocate
> contiguous memory.  It is up to the caller to check if the API is
> available.

Would you mind scheding some more light on what made the previous attempt
not work well? I liked the previous API because it was more consistent with
the regular dma_alloc_coherent().

> 
> The intent is that media drivers can use this API if either:

FWIW, the USB subsystem also has similar needs, and so do some DRM drivers
using DMA API rather than IOMMU API directly. Basically I believe that all
the users removed in your previous series relied on custom downstream
patches to make DMA_ATTR_NON_CONSISTENT work and could be finally made work
in upstream using this API.

> 
>  - no kernel mapping or only temporary kernel mappings are required.
>That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
>  - a kernel mapping is required for cached and DMA mapped pages, but
>the driver also needs the pages to e.g. map them to userspace.
>In that sense it is a replacement for some aspects of the recently
>removed and never fully implemented DMA_ATTR_NON_CONSISTENT

What's the expected allocation and mapping flow with the latter? Would that be

pages = dma_alloc_noncoherent(...)
vaddr = vmap(pages, ...);

?

Would one just use the usual dma_sync_for_{cpu,device}() for cache
invallidate/clean, while keeping the mapping in place?

Best regards,
Tomasz