Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Christoph Hellwig
On Thu, Aug 20, 2020 at 06:43:47AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 19, 2020 at 03:57:53PM +0200, Tomasz Figa wrote:
> > > > Could you explain what makes you think it's unused? It's a feature of
> > > > the UAPI generally supported by the videobuf2 framework and relied on
> > > > by Chromium OS to get any kind of reasonable performance when
> > > > accessing V4L2 buffers in the userspace.
> > >
> > > Because it doesn't do anything except on PARISC and non-coherent MIPS,
> > > so by definition it isn't used by any of these media drivers.
> > 
> > It's still an UAPI feature, so we can't simply remove the flag, it
> > must stay there as a no-op, until the problem is resolved.
> 
> Ok, I'll switch to just ignoring it for the next version.

So I took a deeper look.  I don't really think it qualifies as a UAPI
in our traditional sense.  For one it only appeared in 5.9-rc1, so we
can trivially expedite the patch into 5.9-rc and not actually make it
show up in any released kernel version.  And even as of the current
Linus' tree the only user is a test driver.  So I really think the best
way to go ahead is to just revert it ASAP as the design wasn't thought
out at all.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc, free}_pages

2020-08-19 Thread Christoph Hellwig
On Wed, Aug 19, 2020 at 05:03:52PM +0200, Tomasz Figa wrote:
> >
> > -Warning: These pieces of the DMA API should not be used in the
> > -majority of cases, since they cater for unlikely corner cases that
> > -don't belong in usual drivers.
> > +These APIs allow to allocate pages that can be used like normal pages
> > +in the kernel direct mapping, but are guaranteed to be DMA addressable.
> 
> Could we elaborate a bit more on what "like normal pages in kernel
> direct mapping" mean from the driver perspective?

It mostly means you can call virt_to_page and then do anything you'd
do with a page struct.  Unlike dma_alloc_attrs that just return an
opaque virtual address that the caller is not allowed to poke into.

> There is one aspect that the existing dma_alloc_attrs() handles, but
> this new function doesn't: IOMMU support. The function will always
> allocate a physically-contiguous block memory, which is a costly
> operation and not even guaranteed to succeed, even if enough free
> memory is available.
> 
> Modern SoCs employ IOMMUs to avoid the need to allocate
> physically-contiguous memory and those happen to be also the devices
> that could benefit from non-coherent allocations a lot. One of the
> tasks of the DMA API was making it possible to allocate suitable
> memory for a given device, without having the driver know about the
> SoC integration details, such as the presence of an IOMMU.

This is completely out of scope for this API exactly because it
guarantees a page in the direct mapping.  But see my previous mail
in reply to Robin on how you can implement the funtionality you
want right now without any help from the dma-mapping subsystem.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Christoph Hellwig
On Wed, Aug 19, 2020 at 03:07:04PM +0100, Robin Murphy wrote:
>> FWIW, I asked back in time what the plan is for non-coherent
>> allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
>> dma_sync_*() was supposed to be the right thing to go with. [2] The
>> same thread also explains why dma_alloc_pages() isn't suitable for the
>> users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
>
> AFAICS even back then Christoph was implying getting rid of NON_CONSISTENT 
> and *replacing* it with something streaming-API-based - i.e. this series - 
> not encouraging mixing the existing APIs. It doesn't seem impossible to 
> implement a remapping version of this new dma_alloc_pages() for 
> IOMMU-backed ops if it's really warranted (although at that point it seems 
> like "non-coherent" vb2-dc starts to have significant conceptual overlap 
> with vb2-sg).

You can alway vmap the returned pages from dma_alloc_pages, but it will
make cache invalidation hell - you'll need to use
invalidate_kernel_vmap_range and flush_kernel_vmap_range to properly
handle virtually indexed caches.

Or with remapping you mean using the iommu do de-scatter/gather?

You can implement that trivially implement it yourself for the iommu
case:

{
merge_boundary = dma_get_merge_boundary(dev);
if (!merge_boundary || merge_boundary > chunk_size - 1) {
/* can't coalesce */
return -EINVAL;
}


nents = DIV_ROUND_UP(total_size, chunk_size);
sg = sgl_alloc();
for_each_sgl() {
sg->page = __alloc_pages(get_order(chunk_size))
sg->len = chunk_size;
}
dma_map_sg(sg, DMA_ATTR_SKIP_CPU_SYNC);
// you are guaranteed to get a single dma_addr out
}

Of course this still uses the scatterlist structure with its annoying
mix of input and output parametes, so I'd rather not expose it as
an official API at the DMA layer.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Christoph Hellwig
On Wed, Aug 19, 2020 at 03:57:53PM +0200, Tomasz Figa wrote:
> > > Could you explain what makes you think it's unused? It's a feature of
> > > the UAPI generally supported by the videobuf2 framework and relied on
> > > by Chromium OS to get any kind of reasonable performance when
> > > accessing V4L2 buffers in the userspace.
> >
> > Because it doesn't do anything except on PARISC and non-coherent MIPS,
> > so by definition it isn't used by any of these media drivers.
> 
> It's still an UAPI feature, so we can't simply remove the flag, it
> must stay there as a no-op, until the problem is resolved.

Ok, I'll switch to just ignoring it for the next version.

> Also, it of course might be disputable as an out-of-tree usage, but
> selecting CONFIG_DMA_NONCOHERENT_CACHE_SYNC makes the flag actually do
> something on other platforms, including ARM64.

It isn't just disputable, but by kernel policies simply is not relevant.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Christoph Hellwig
On Wed, Aug 19, 2020 at 04:11:52PM +0200, Tomasz Figa wrote:
> > > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> > > series related to the subsystem-facing DMA API changes, since
> > > videobuf2 is one of the biggest users of it.
> >
> > The cc list is too long - I cc lists and key maintainers.  As a reviewer
> > should should watch your subsystems lists closely.
> 
> Well, I guess we can disagree on this, because there is no clear
> policy. I'm listed in the MAINTAINERS file for the subsystem and I
> believe the purpose of the file is to list the people to CC on
> relevant patches. We're all overloaded with work and having to look
> through the huge volume of mailing lists like linux-media doesn't help
> and thus I'd still appreciate being added on CC.

I'm happy to Cc and active participant in the discussion.  I'm not
going to add all reviewers because even with the trimmed CC list
I'm already hitting the number of receipients limit on various lists.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Christoph Hellwig
On Wed, Aug 19, 2020 at 04:22:29PM +0200, Tomasz Figa wrote:
> > > FWIW, I asked back in time what the plan is for non-coherent
> > > allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
> > > dma_sync_*() was supposed to be the right thing to go with. [2] The
> > > same thread also explains why dma_alloc_pages() isn't suitable for the
> > > users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
> >
> > AFAICS even back then Christoph was implying getting rid of
> > NON_CONSISTENT and *replacing* it with something streaming-API-based -
> 
> That's not how I read his reply from the thread I pointed to, but that
> might of course be my misunderstanding.

Yes.  Without changes like in this series just calling dma_sync_single_*
will break in various cases, e.g. because dma_alloc_attrs returns
memory remapped in the vmalloc space, and the dma_sync_single_*
implementation implementation can't cope with vmalloc addresses.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v2 1/2] drm/nouveau/kms/nv50-: Program notifier offset before requesting disp caps

2020-08-19 Thread Sasha Levin
Hi

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag
fixing commit: 4a2cb4181b07 ("drm/nouveau/kms/nv50-: Probe SOR and PIOR caps 
for DP interlacing support").

The bot has tested the following trees: v5.8.1.

v5.8.1: Failed to apply! Possible dependencies:
3c43c362b3a5 ("drm/nouveau/kms/nv50-: convert core caps_init() to new push 
macros")
5e691222eac6 ("drm/nouveau/kms/nv50-: convert core init() to new push 
macros")
d8b24526ef68 ("drm/nouveau/kms/nv50-: use NVIDIA's headers for core 
caps_init()")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] drm/nouveau/kms/nv50-: Program notifier offset before requesting disp caps

2020-08-19 Thread Sasha Levin
Hi

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag
fixing commit: 4a2cb4181b07 ("drm/nouveau/kms/nv50-: Probe SOR and PIOR caps 
for DP interlacing support").

The bot has tested the following trees: v5.8.1.

v5.8.1: Failed to apply! Possible dependencies:
3c43c362b3a5 ("drm/nouveau/kms/nv50-: convert core caps_init() to new push 
macros")
5e691222eac6 ("drm/nouveau/kms/nv50-: convert core init() to new push 
macros")
d8b24526ef68 ("drm/nouveau/kms/nv50-: use NVIDIA's headers for core 
caps_init()")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [RFC 13/20] drm/i915/dp: Extract drm_dp_downstream_read_info()

2020-08-19 Thread Lyude Paul
(adding Ville and Imre to the cc here, they might be interested to know about
this, comments down below)

On Wed, 2020-08-19 at 11:15 -0400, Sean Paul wrote:
> On Tue, Aug 11, 2020 at 04:04:50PM -0400, Lyude Paul wrote:
> > We're going to be doing the same probing process in nouveau for
> > determining downstream DP port capabilities, so let's deduplicate the
> > work by moving i915's code for handling this into a shared helper:
> > drm_dp_downstream_read_info().
> > 
> > Note that when we do this, we also do make some functional changes while
> > we're at it:
> > * We always clear the downstream port info before trying to read it,
> >   just to make things easier for the caller
> > * We skip reading downstream port info if the DPCD indicates that we
> >   don't support downstream port info
> > * We only read as many bytes as needed for the reported number of
> >   downstream ports, no sense in reading the whole thing every time
> > 
> > Signed-off-by: Lyude Paul 
> > ---
> >  drivers/gpu/drm/drm_dp_helper.c | 32 +
> >  drivers/gpu/drm/i915/display/intel_dp.c | 14 ++-
> >  include/drm/drm_dp_helper.h |  3 +++
> >  3 files changed, 37 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_dp_helper.c
> > b/drivers/gpu/drm/drm_dp_helper.c
> > index 4c21cf69dad5a..9703b33599c3b 100644
> > --- a/drivers/gpu/drm/drm_dp_helper.c
> > +++ b/drivers/gpu/drm/drm_dp_helper.c
> > @@ -423,6 +423,38 @@ bool drm_dp_send_real_edid_checksum(struct drm_dp_aux
> > *aux,
> >  }
> >  EXPORT_SYMBOL(drm_dp_send_real_edid_checksum);
> >  
> > +/**
> > + * drm_dp_downstream_read_info() - read DPCD downstream port info if
> > available
> > + * @aux: DisplayPort AUX channel
> > + * @dpcd: A cached copy of the port's DPCD
> > + * @downstream_ports: buffer to store the downstream port info in
> > + *
> > + * Returns: 0 if either the downstream port info was read successfully or
> > + * there was no downstream info to read, or a negative error code
> > otherwise.
> > + */
> > +int drm_dp_downstream_read_info(struct drm_dp_aux *aux,
> > +   const u8 dpcd[DP_RECEIVER_CAP_SIZE],
> > +   u8 downstream_ports[DP_MAX_DOWNSTREAM_PORTS])
> > +{
> > +   int ret;
> > +   u8 len;
> > +
> > +   memset(downstream_ports, 0, DP_MAX_DOWNSTREAM_PORTS);
> > +
> > +   /* No downstream info to read */
> > +   if (!drm_dp_is_branch(dpcd) ||
> > +   dpcd[DP_DPCD_REV] < DP_DPCD_REV_10 ||
> > +   !(dpcd[DP_DOWNSTREAMPORT_PRESENT] & DP_DWN_STRM_PORT_PRESENT))
> > +   return 0;
> > +
> > +   len = (dpcd[DP_DOWN_STREAM_PORT_COUNT] & DP_PORT_COUNT_MASK) * 4;
> 
> I'm having a hard time rationalizing DP_MAX_DOWNSTREAM_PORTS being 16, but
> only
> having 4 ports worth of data in the DP_DOWNSTREAM_PORT_* registers. Do you
> know
> what's supposed to happen if dpcd[DP_DOWN_STREAM_PORT_COUNT] is > 4?
> 
ok!! Taking a lesson from our available_pbn/full_pbn confusion in the past, I
squinted very hard at the specification and eventually found something that I
think clears this up. Surprise - we definitely had this implemented incorrectly
in i915

>From section 5.3.3.1:

   Either one or four bytes are used, per DFP type indication. Therefore, up to
   16 (with 1-byte descriptor) or four (with 4-byte descriptor) DFP capabilities
   can be stored.

So, a couple takeaways from this:

 * A DisplayPort connector can have *multiple* different downstream port types,
   which I think actually makes sense as I've seen an adapter like this before.
 * We actually added the ability to determine the downstream port type for DP
   connectors using the subconnector prop, but it seems like if we want to aim
   for completeness we're going to need to come up with a new prop that can
   report multiple downstream port types :\.
 * It's not explicitly mentioned, but I'm assuming the correct way of handling
   multiple downstream BPC/pixel clock capabilities is to assume the max
   BPC/pixel clock is derived from the lowest max BPC/pixel clock we find on
   *connected* downstream ports (anything else wouldn't really make sense, imho)

So I'm going to rewrite this so we handle this properly in
drm_dp_downstream_read_info() and related helpers. I don't currently have the
time to do this, but if there's interest upstream in properly reporting the
downstream port types of DP ports in userspace someone might want to consider
coming up with another prop that accounts for multiple different downstream port
types.

> Sean
> 
> > +   ret = drm_dp_dpcd_read(aux, DP_DOWNSTREAM_PORT_0, downstream_ports,
> > +  len);
> > +
> > +   return ret == len ? 0 : -EIO;
> > +}
> > +EXPORT_SYMBOL(drm_dp_downstream_read_info);
> > +
> >  /**
> >   * drm_dp_downstream_max_clock() - extract branch device max
> >   * pixel rate for legacy VGA
> > diff --git a/drivers/gpu/drm/i915/display/intel_dp.c
> > 

Re: [Nouveau] [PATCH 10/20] drm/omapdrm: Introduce GEM object functions

2020-08-19 Thread Tomi Valkeinen
Hi,

On 13/08/2020 11:36, Thomas Zimmermann wrote:
> GEM object functions deprecate several similar callback interfaces in
> struct drm_driver. This patch replaces the per-driver callbacks with
> per-instance callbacks in omapdrm.
> 
> Signed-off-by: Thomas Zimmermann 
> ---
>  drivers/gpu/drm/omapdrm/omap_drv.c |  9 -
>  drivers/gpu/drm/omapdrm/omap_gem.c | 16 +++-
>  drivers/gpu/drm/omapdrm/omap_gem.h |  1 -
>  3 files changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/omapdrm/omap_drv.c 
> b/drivers/gpu/drm/omapdrm/omap_drv.c
> index 53d5e184ee77..2e598b8b72af 100644
> --- a/drivers/gpu/drm/omapdrm/omap_drv.c
> +++ b/drivers/gpu/drm/omapdrm/omap_drv.c
> @@ -521,12 +521,6 @@ static int dev_open(struct drm_device *dev, struct 
> drm_file *file)
>   return 0;
>  }
>  
> -static const struct vm_operations_struct omap_gem_vm_ops = {
> - .fault = omap_gem_fault,
> - .open = drm_gem_vm_open,
> - .close = drm_gem_vm_close,
> -};
> -
>  static const struct file_operations omapdriver_fops = {
>   .owner = THIS_MODULE,
>   .open = drm_open,
> @@ -549,10 +543,7 @@ static struct drm_driver omap_drm_driver = {
>  #endif
>   .prime_handle_to_fd = drm_gem_prime_handle_to_fd,
>   .prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> - .gem_prime_export = omap_gem_prime_export,
>   .gem_prime_import = omap_gem_prime_import,
> - .gem_free_object_unlocked = omap_gem_free_object,
> - .gem_vm_ops = _gem_vm_ops,
>   .dumb_create = omap_gem_dumb_create,
>   .dumb_map_offset = omap_gem_dumb_map_offset,
>   .ioctls = ioctls,
> diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c 
> b/drivers/gpu/drm/omapdrm/omap_gem.c
> index d0d12d5dd76c..d68dc63dea0a 100644
> --- a/drivers/gpu/drm/omapdrm/omap_gem.c
> +++ b/drivers/gpu/drm/omapdrm/omap_gem.c
> @@ -487,7 +487,7 @@ static vm_fault_t omap_gem_fault_2d(struct drm_gem_object 
> *obj,
>   * vma->vm_private_data points to the GEM object that is backing this
>   * mapping.
>   */
> -vm_fault_t omap_gem_fault(struct vm_fault *vmf)
> +static vm_fault_t omap_gem_fault(struct vm_fault *vmf)
>  {
>   struct vm_area_struct *vma = vmf->vma;
>   struct drm_gem_object *obj = vma->vm_private_data;
> @@ -1169,6 +1169,18 @@ static bool omap_gem_validate_flags(struct drm_device 
> *dev, u32 flags)
>   return true;
>  }
>  
> +static const struct vm_operations_struct omap_gem_vm_ops = {
> + .fault = omap_gem_fault,
> + .open = drm_gem_vm_open,
> + .close = drm_gem_vm_close,
> +};
> +
> +static const struct drm_gem_object_funcs omap_gem_object_funcs = {
> + .free = omap_gem_free_object,
> + .export = omap_gem_prime_export,
> + .vm_ops = _gem_vm_ops,
> +};
> +
>  /* GEM buffer object constructor */
>  struct drm_gem_object *omap_gem_new(struct drm_device *dev,
>   union omap_gem_size gsize, u32 flags)
> @@ -1236,6 +1248,8 @@ struct drm_gem_object *omap_gem_new(struct drm_device 
> *dev,
>   size = PAGE_ALIGN(gsize.bytes);
>   }
>  
> + obj->funcs = _gem_object_funcs;
> +
>   /* Initialize the GEM object. */
>   if (!(flags & OMAP_BO_MEM_SHMEM)) {
>   drm_gem_private_object_init(dev, obj, size);
> diff --git a/drivers/gpu/drm/omapdrm/omap_gem.h 
> b/drivers/gpu/drm/omapdrm/omap_gem.h
> index 729b7812a815..9e6b5c8195d9 100644
> --- a/drivers/gpu/drm/omapdrm/omap_gem.h
> +++ b/drivers/gpu/drm/omapdrm/omap_gem.h
> @@ -69,7 +69,6 @@ struct dma_buf *omap_gem_prime_export(struct drm_gem_object 
> *obj, int flags);
>  struct drm_gem_object *omap_gem_prime_import(struct drm_device *dev,
>   struct dma_buf *buffer);
>  
> -vm_fault_t omap_gem_fault(struct vm_fault *vmf);
>  int omap_gem_roll(struct drm_gem_object *obj, u32 roll);
>  void omap_gem_cpu_sync_page(struct drm_gem_object *obj, int pgoff);
>  void omap_gem_dma_sync_buffer(struct drm_gem_object *obj,

omap_gem_free_object() can also be made static, and removed from omap_gem.h.

Tested on AM5 EVM.

Reviewed-by: Tomi Valkeinen 

 Tomi

-- 
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [RFC 19/20] drm/i915/dp: Extract drm_dp_read_dpcd_caps()

2020-08-19 Thread Sean Paul
On Tue, Aug 11, 2020 at 04:04:56PM -0400, Lyude Paul wrote:
> Since DP 1.3, it's been possible for DP receivers to specify an
> additional set of DPCD capabilities, which can take precedence over the
> capabilities reported at DP_DPCD_REV.
> 
> Basically any device supporting DP is going to need to read these in an
> identical manner, in particular nouveau, so let's go ahead and just move
> this code out of i915 into a shared DRM DP helper that we can use in
> other drivers.
> 
> Signed-off-by: Lyude Paul 
> ---
>  drivers/gpu/drm/drm_dp_helper.c | 76 +
>  drivers/gpu/drm/i915/display/intel_dp.c | 60 +---
>  drivers/gpu/drm/i915/display/intel_dp.h |  1 -
>  drivers/gpu/drm/i915/display/intel_lspcon.c |  2 +-
>  include/drm/drm_dp_helper.h |  3 +
>  5 files changed, 82 insertions(+), 60 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
> index 0ff2959c8f8e8..f9445915c6c26 100644
> --- a/drivers/gpu/drm/drm_dp_helper.c
> +++ b/drivers/gpu/drm/drm_dp_helper.c
> @@ -423,6 +423,82 @@ bool drm_dp_send_real_edid_checksum(struct drm_dp_aux 
> *aux,
>  }
>  EXPORT_SYMBOL(drm_dp_send_real_edid_checksum);
>  
> +static int drm_dp_read_extended_dpcd_caps(struct drm_dp_aux *aux,
> +   u8 dpcd[DP_RECEIVER_CAP_SIZE])
> +{
> + u8 dpcd_ext[6];
> + int ret;
> +
> + /*
> +  * Prior to DP1.3 the bit represented by
> +  * DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT was reserved.
> +  * If it is set DP_DPCD_REV at h could be at a value less than
> +  * the true capability of the panel. The only way to check is to
> +  * then compare h and 2200h.
> +  */
> + if (!(dpcd[DP_TRAINING_AUX_RD_INTERVAL] &
> +   DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT))
> + return 0;
> +
> + ret = drm_dp_dpcd_read(aux, DP_DP13_DPCD_REV, _ext,
> +sizeof(dpcd_ext));
> + if (ret != sizeof(dpcd_ext))
> + return -EIO;
> +
> + if (dpcd[DP_DPCD_REV] > dpcd_ext[DP_DPCD_REV]) {
> + DRM_DEBUG_KMS("%s: Extended DPCD rev less than base DPCD rev 
> (%d > %d)\n",
> +   aux->name, dpcd[DP_DPCD_REV],
> +   dpcd_ext[DP_DPCD_REV]);

Might be a good opportunity to convert all of these to drm_dbg_dp()?

> + return 0;
> + }
> +
> + if (!memcmp(dpcd, dpcd_ext, sizeof(dpcd_ext)))
> + return 0;
> +
> + DRM_DEBUG_KMS("%s: Base DPCD: %*ph\n",
> +   aux->name, DP_RECEIVER_CAP_SIZE, dpcd);
> +
> + memcpy(dpcd, dpcd_ext, sizeof(dpcd_ext));
> +
> + return 0;
> +}
> +
> +/**
> + * drm_dp_read_dpcd_caps() - read DPCD caps and extended DPCD caps if
> + * available
> + * @aux: DisplayPort AUX channel
> + * @dpcd: Buffer to store the resulting DPCD in
> + *
> + * Attempts to read the base DPCD caps for @aux. Additionally, this function
> + * checks for and reads the extended DPRX caps (%DP_DP13_DPCD_REV) if
> + * present.
> + *
> + * Returns: %0 if the DPCD was read successfully, negative error code
> + * otherwise.
> + */
> +int drm_dp_read_dpcd_caps(struct drm_dp_aux *aux,
> +   u8 dpcd[DP_RECEIVER_CAP_SIZE])
> +{
> + int ret;
> +
> + ret = drm_dp_dpcd_read(aux, DP_DPCD_REV, dpcd, DP_RECEIVER_CAP_SIZE);
> + if (ret != DP_RECEIVER_CAP_SIZE || dpcd[DP_DPCD_REV] == 0)
> + return -EIO;
> +
> + ret = drm_dp_read_extended_dpcd_caps(aux, dpcd);
> + if (ret < 0)
> + return ret;

I wonder if we should just go with the "regular" dpcd caps we just read in this
case?

Regardless of my nits,

Reviewed-by: Sean Paul 

> +
> + DRM_DEBUG_KMS("%s: DPCD: %*ph\n",
> +   aux->name, DP_RECEIVER_CAP_SIZE, dpcd);
> +
> + if (dpcd[DP_DPCD_REV] == 0)
> + ret = -EIO;
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(drm_dp_read_dpcd_caps);
> +
>  /**
>   * drm_dp_downstream_read_info() - read DPCD downstream port info if 
> available
>   * @aux: DisplayPort AUX channel
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
> b/drivers/gpu/drm/i915/display/intel_dp.c
> index e343965a483df..230aa0360dc61 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -4449,62 +4449,6 @@ intel_dp_link_down(struct intel_encoder *encoder,
>   }
>  }
>  
> -static void
> -intel_dp_extended_receiver_capabilities(struct intel_dp *intel_dp)
> -{
> - struct drm_i915_private *i915 = dp_to_i915(intel_dp);
> - u8 dpcd_ext[6];
> -
> - /*
> -  * Prior to DP1.3 the bit represented by
> -  * DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT was reserved.
> -  * if it is set DP_DPCD_REV at h could be at a value less than
> -  * the true capability of the panel. The only way to check is to
> -  * then compare h and 2200h.
> -  */
> - if 

Re: [Nouveau] [RFC 16/20] drm/i915/dp: Extract drm_dp_get_sink_count()

2020-08-19 Thread Sean Paul
On Tue, Aug 11, 2020 at 04:04:53PM -0400, Lyude Paul wrote:
> And of course, we'll also need to read the sink count from other drivers
> as well if we're checking whether or not it's supported. So, let's
> extract the code for this into another helper.
> 
> Signed-off-by: Lyude Paul 
> ---
>  drivers/gpu/drm/drm_dp_helper.c | 20 
>  drivers/gpu/drm/i915/display/intel_dp.c | 17 +
>  include/drm/drm_dp_helper.h |  1 +
>  3 files changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
> index 05bb47e589731..0ff2959c8f8e8 100644
> --- a/drivers/gpu/drm/drm_dp_helper.c
> +++ b/drivers/gpu/drm/drm_dp_helper.c
> @@ -722,6 +722,26 @@ bool drm_dp_has_sink_count(struct drm_connector 
> *connector,
>  }
>  EXPORT_SYMBOL(drm_dp_has_sink_count);
>  
> +/**
> + * drm_dp_get_sink_count() - Retrieve the sink count for a given sink
> + * @aux: The DP AUX channel to use
> + *
> + * Returns: The current sink count reported by @aux, or a negative error code
> + * otherwise.
> + */
> +int drm_dp_get_sink_count(struct drm_dp_aux *aux)
> +{
> + u8 count;
> + int ret;
> +
> + ret = drm_dp_dpcd_readb(aux, DP_SINK_COUNT, );
> + if (ret < 1)
> + return -EIO;

This should probably be:
if (ret < 0)
return ret;
else if (ret != 1)
return -EIO;


> +
> + return DP_GET_SINK_COUNT(count);
> +}
> +EXPORT_SYMBOL(drm_dp_get_sink_count);
> +
>  /*
>   * I2C-over-AUX implementation
>   */
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
> b/drivers/gpu/drm/i915/display/intel_dp.c
> index 35a4779a442e2..e343965a483df 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -4648,6 +4648,8 @@ intel_dp_has_sink_count(struct intel_dp *intel_dp)
>  static bool
>  intel_dp_get_dpcd(struct intel_dp *intel_dp)
>  {
> + int ret;
> +
>   if (!intel_dp_read_dpcd(intel_dp))
>   return false;
>  
> @@ -4664,20 +4666,10 @@ intel_dp_get_dpcd(struct intel_dp *intel_dp)
>   }
>  
>   if (intel_dp_has_sink_count(intel_dp)) {
> - u8 count;
> - ssize_t r;
> -
> - r = drm_dp_dpcd_readb(_dp->aux, DP_SINK_COUNT, );
> - if (r < 1)
> + ret = drm_dp_get_sink_count(_dp->aux);
> + if (ret < 0)
>   return false;
>  
> - /*
> -  * Sink count can change between short pulse hpd hence
> -  * a member variable in intel_dp will track any changes
> -  * between short pulse interrupts.
> -  */

I think you could probably keep this comment and relocate the 
intel_dp->sink_count
assignment back.

With these nits (or something like them),

Reviewed-by: Sean Paul 

> - intel_dp->sink_count = DP_GET_SINK_COUNT(count);
> -
>   /*
>* SINK_COUNT == 0 and DOWNSTREAM_PORT_PRESENT == 1 implies that
>* a dongle is present but no display. Unless we require to know
> @@ -4685,6 +4677,7 @@ intel_dp_get_dpcd(struct intel_dp *intel_dp)
>* downstream port information. So, an early return here saves
>* time from performing other operations which are not required.
>*/
> + intel_dp->sink_count = ret;
>   if (!intel_dp->sink_count)
>   return false;
>   }
> diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> index a1413a531eaf4..0c141fc81aaa8 100644
> --- a/include/drm/drm_dp_helper.h
> +++ b/include/drm/drm_dp_helper.h
> @@ -1635,6 +1635,7 @@ struct drm_dp_desc;
>  bool drm_dp_has_sink_count(struct drm_connector *connector,
>  const u8 dpcd[DP_RECEIVER_CAP_SIZE],
>  const struct drm_dp_desc *desc);
> +int drm_dp_get_sink_count(struct drm_dp_aux *aux);
>  
>  void drm_dp_remote_aux_init(struct drm_dp_aux *aux);
>  void drm_dp_aux_init(struct drm_dp_aux *aux);
> -- 
> 2.26.2
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Sean Paul, Software Engineer, Google / Chromium OS
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [RFC 15/20] drm/i915/dp: Extract drm_dp_has_sink_count()

2020-08-19 Thread Sean Paul
On Tue, Aug 11, 2020 at 04:04:52PM -0400, Lyude Paul wrote:
> Since other drivers are also going to need to be aware of the sink count
> in order to do proper dongle detection, we might as well steal i915's
> DP_SINK_COUNT helpers and move them into DRM helpers so that other
> dirvers can use them as well.
> 
> Note that this also starts using intel_dp_has_sink_count() in
> intel_dp_detect_dpcd(), which is a functional change.
> 

Reviewed-by: Sean Paul 

> Signed-off-by: Lyude Paul 
> ---
>  drivers/gpu/drm/drm_dp_helper.c | 22 ++
>  drivers/gpu/drm/i915/display/intel_dp.c | 21 -
>  include/drm/drm_dp_helper.h |  8 +++-
>  3 files changed, 41 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
> index 9703b33599c3b..05bb47e589731 100644
> --- a/drivers/gpu/drm/drm_dp_helper.c
> +++ b/drivers/gpu/drm/drm_dp_helper.c
> @@ -700,6 +700,28 @@ void drm_dp_set_subconnector_property(struct 
> drm_connector *connector,
>  }
>  EXPORT_SYMBOL(drm_dp_set_subconnector_property);
>  
> +/**
> + * drm_dp_has_sink_count() - Check whether a given connector has a valid sink
> + * count
> + * @connector: The DRM connector to check
> + * @dpcd: A cached copy of the connector's DPCD RX capabilities
> + * @desc: A cached copy of the connector's DP descriptor
> + *
> + * Returns: %True if the (e)DP connector has a valid sink count that should
> + * be probed, %false otherwise.
> + */
> +bool drm_dp_has_sink_count(struct drm_connector *connector,
> +const u8 dpcd[DP_RECEIVER_CAP_SIZE],
> +const struct drm_dp_desc *desc)
> +{
> + /* Some eDP panels don't set a valid value for the sink count */
> + return connector->connector_type != DRM_MODE_CONNECTOR_eDP &&
> + dpcd[DP_DPCD_REV] >= DP_DPCD_REV_11 &&
> + dpcd[DP_DOWNSTREAMPORT_PRESENT] & DP_DWN_STRM_PORT_PRESENT &&
> + !drm_dp_has_quirk(desc, 0, DP_DPCD_QUIRK_NO_SINK_COUNT);
> +}
> +EXPORT_SYMBOL(drm_dp_has_sink_count);
> +
>  /*
>   * I2C-over-AUX implementation
>   */
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
> b/drivers/gpu/drm/i915/display/intel_dp.c
> index 984e49194ca31..35a4779a442e2 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -4634,6 +4634,16 @@ intel_edp_init_dpcd(struct intel_dp *intel_dp)
>   return true;
>  }
>  
> +static bool
> +intel_dp_has_sink_count(struct intel_dp *intel_dp)
> +{
> + if (!intel_dp->attached_connector)
> + return false;
> +
> + return drm_dp_has_sink_count(_dp->attached_connector->base,
> +  intel_dp->dpcd,
> +  _dp->desc);
> +}
>  
>  static bool
>  intel_dp_get_dpcd(struct intel_dp *intel_dp)
> @@ -4653,13 +4663,7 @@ intel_dp_get_dpcd(struct intel_dp *intel_dp)
>   intel_dp_set_common_rates(intel_dp);
>   }
>  
> - /*
> -  * Some eDP panels do not set a valid value for sink count, that is why
> -  * it don't care about read it here and in intel_edp_init_dpcd().
> -  */
> - if (!intel_dp_is_edp(intel_dp) &&
> - !drm_dp_has_quirk(_dp->desc, 0,
> -   DP_DPCD_QUIRK_NO_SINK_COUNT)) {
> + if (intel_dp_has_sink_count(intel_dp)) {
>   u8 count;
>   ssize_t r;
>  
> @@ -5939,9 +5943,8 @@ intel_dp_detect_dpcd(struct intel_dp *intel_dp)
>   return connector_status_connected;
>  
>   /* If we're HPD-aware, SINK_COUNT changes dynamically */
> - if (intel_dp->dpcd[DP_DPCD_REV] >= 0x11 &&
> + if (intel_dp_has_sink_count(intel_dp) &&
>   intel_dp->downstream_ports[0] & DP_DS_PORT_HPD) {
> -
>   return intel_dp->sink_count ?
>   connector_status_connected : connector_status_disconnected;
>   }
> diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> index 1349f16564ace..a1413a531eaf4 100644
> --- a/include/drm/drm_dp_helper.h
> +++ b/include/drm/drm_dp_helper.h
> @@ -1631,6 +1631,11 @@ void drm_dp_set_subconnector_property(struct 
> drm_connector *connector,
> const u8 *dpcd,
> const u8 port_cap[4]);
>  
> +struct drm_dp_desc;
> +bool drm_dp_has_sink_count(struct drm_connector *connector,
> +const u8 dpcd[DP_RECEIVER_CAP_SIZE],
> +const struct drm_dp_desc *desc);
> +
>  void drm_dp_remote_aux_init(struct drm_dp_aux *aux);
>  void drm_dp_aux_init(struct drm_dp_aux *aux);
>  int drm_dp_aux_register(struct drm_dp_aux *aux);
> @@ -1689,7 +1694,8 @@ enum drm_dp_quirk {
>* @DP_DPCD_QUIRK_NO_SINK_COUNT:
>*
>* The device does not set SINK_COUNT to a non-zero value.
> -  * The driver should ignore SINK_COUNT during detection.
> +  * The driver should ignore SINK_COUNT 

Re: [Nouveau] [RFC 13/20] drm/i915/dp: Extract drm_dp_downstream_read_info()

2020-08-19 Thread Sean Paul
On Tue, Aug 11, 2020 at 04:04:50PM -0400, Lyude Paul wrote:
> We're going to be doing the same probing process in nouveau for
> determining downstream DP port capabilities, so let's deduplicate the
> work by moving i915's code for handling this into a shared helper:
> drm_dp_downstream_read_info().
> 
> Note that when we do this, we also do make some functional changes while
> we're at it:
> * We always clear the downstream port info before trying to read it,
>   just to make things easier for the caller
> * We skip reading downstream port info if the DPCD indicates that we
>   don't support downstream port info
> * We only read as many bytes as needed for the reported number of
>   downstream ports, no sense in reading the whole thing every time
> 
> Signed-off-by: Lyude Paul 
> ---
>  drivers/gpu/drm/drm_dp_helper.c | 32 +
>  drivers/gpu/drm/i915/display/intel_dp.c | 14 ++-
>  include/drm/drm_dp_helper.h |  3 +++
>  3 files changed, 37 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
> index 4c21cf69dad5a..9703b33599c3b 100644
> --- a/drivers/gpu/drm/drm_dp_helper.c
> +++ b/drivers/gpu/drm/drm_dp_helper.c
> @@ -423,6 +423,38 @@ bool drm_dp_send_real_edid_checksum(struct drm_dp_aux 
> *aux,
>  }
>  EXPORT_SYMBOL(drm_dp_send_real_edid_checksum);
>  
> +/**
> + * drm_dp_downstream_read_info() - read DPCD downstream port info if 
> available
> + * @aux: DisplayPort AUX channel
> + * @dpcd: A cached copy of the port's DPCD
> + * @downstream_ports: buffer to store the downstream port info in
> + *
> + * Returns: 0 if either the downstream port info was read successfully or
> + * there was no downstream info to read, or a negative error code otherwise.
> + */
> +int drm_dp_downstream_read_info(struct drm_dp_aux *aux,
> + const u8 dpcd[DP_RECEIVER_CAP_SIZE],
> + u8 downstream_ports[DP_MAX_DOWNSTREAM_PORTS])
> +{
> + int ret;
> + u8 len;
> +
> + memset(downstream_ports, 0, DP_MAX_DOWNSTREAM_PORTS);
> +
> + /* No downstream info to read */
> + if (!drm_dp_is_branch(dpcd) ||
> + dpcd[DP_DPCD_REV] < DP_DPCD_REV_10 ||
> + !(dpcd[DP_DOWNSTREAMPORT_PRESENT] & DP_DWN_STRM_PORT_PRESENT))
> + return 0;
> +
> + len = (dpcd[DP_DOWN_STREAM_PORT_COUNT] & DP_PORT_COUNT_MASK) * 4;

I'm having a hard time rationalizing DP_MAX_DOWNSTREAM_PORTS being 16, but only
having 4 ports worth of data in the DP_DOWNSTREAM_PORT_* registers. Do you know
what's supposed to happen if dpcd[DP_DOWN_STREAM_PORT_COUNT] is > 4?

Sean

> + ret = drm_dp_dpcd_read(aux, DP_DOWNSTREAM_PORT_0, downstream_ports,
> +len);
> +
> + return ret == len ? 0 : -EIO;
> +}
> +EXPORT_SYMBOL(drm_dp_downstream_read_info);
> +
>  /**
>   * drm_dp_downstream_max_clock() - extract branch device max
>   * pixel rate for legacy VGA
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
> b/drivers/gpu/drm/i915/display/intel_dp.c
> index 1e29d3a012856..984e49194ca31 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -4685,18 +4685,8 @@ intel_dp_get_dpcd(struct intel_dp *intel_dp)
>   return false;
>   }
>  
> - if (!drm_dp_is_branch(intel_dp->dpcd))
> - return true; /* native DP sink */
> -
> - if (intel_dp->dpcd[DP_DPCD_REV] == 0x10)
> - return true; /* no per-port downstream info */
> -
> - if (drm_dp_dpcd_read(_dp->aux, DP_DOWNSTREAM_PORT_0,
> -  intel_dp->downstream_ports,
> -  DP_MAX_DOWNSTREAM_PORTS) < 0)
> - return false; /* downstream port status fetch failed */
> -
> - return true;
> + return drm_dp_downstream_read_info(_dp->aux, intel_dp->dpcd,
> +intel_dp->downstream_ports) == 0;
>  }
>  
>  static bool
> diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> index 5c28199248626..1349f16564ace 100644
> --- a/include/drm/drm_dp_helper.h
> +++ b/include/drm/drm_dp_helper.h
> @@ -1613,6 +1613,9 @@ int drm_dp_dpcd_read_link_status(struct drm_dp_aux *aux,
>  bool drm_dp_send_real_edid_checksum(struct drm_dp_aux *aux,
>   u8 real_edid_checksum);
>  
> +int drm_dp_downstream_read_info(struct drm_dp_aux *aux,
> + const u8 dpcd[DP_RECEIVER_CAP_SIZE],
> + u8 downstream_ports[DP_MAX_DOWNSTREAM_PORTS]);
>  int drm_dp_downstream_max_clock(const u8 dpcd[DP_RECEIVER_CAP_SIZE],
>   const u8 port_cap[4]);
>  int drm_dp_downstream_max_bpc(const u8 dpcd[DP_RECEIVER_CAP_SIZE],
> -- 
> 2.26.2
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> 

Re: [Nouveau] [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc, free}_pages

2020-08-19 Thread Tomasz Figa
Hi Christoph,

On Wed, Aug 19, 2020 at 8:57 AM Christoph Hellwig  wrote:
>
> Add a new API to allocate and free pages that are guaranteed to be
> addressable by a device, but otherwise behave like pages allocated by
> alloc_pages.  The intended APIs to sync them for use with the device
> and cpu are dma_sync_single_for_{device,cpu} that are also used for
> streaming mappings.
>
> Switch all drivers over to this new API, but keep the usage of the
> crufty dma_cache_sync API for now, which will be cleaned up on a driver
> by driver basis.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  Documentation/core-api/dma-api.rst| 68 +++
>  Documentation/core-api/dma-attributes.rst |  8 ---
>  arch/alpha/kernel/pci_iommu.c |  2 +
>  arch/arm/mm/dma-mapping-nommu.c   |  2 +
>  arch/arm/mm/dma-mapping.c |  4 ++
>  arch/ia64/hp/common/sba_iommu.c   |  2 +
>  arch/mips/jazz/jazzdma.c  |  7 +--
>  arch/powerpc/kernel/dma-iommu.c   |  2 +
>  arch/powerpc/platforms/ps3/system-bus.c   |  4 ++
>  arch/powerpc/platforms/pseries/vio.c  |  2 +
>  arch/s390/pci/pci_dma.c   |  2 +
>  arch/x86/kernel/amd_gart_64.c |  2 +
>  drivers/iommu/dma-iommu.c |  2 +
>  drivers/iommu/intel/iommu.c   |  4 ++
>  drivers/net/ethernet/i825xx/lasi_82596.c  | 13 ++---
>  drivers/net/ethernet/seeq/sgiseeq.c   | 12 ++--
>  drivers/parisc/ccio-dma.c |  2 +
>  drivers/parisc/sba_iommu.c|  2 +
>  drivers/scsi/53c700.c |  8 +--
>  drivers/scsi/sgiwd93.c| 12 ++--
>  drivers/xen/swiotlb-xen.c |  2 +
>  include/linux/dma-direct.h|  5 ++
>  include/linux/dma-mapping.h   | 29 --
>  include/linux/dma-noncoherent.h   |  3 -
>  kernel/dma/direct.c   | 51 -
>  kernel/dma/mapping.c  | 43 +-
>  kernel/dma/ops_helpers.c  | 35 
>  kernel/dma/virt.c |  2 +
>  sound/mips/hal2.c | 20 +++
>  29 files changed, 254 insertions(+), 96 deletions(-)
>

Thanks for the patch. The general design looks quite nice, but please
see my comments inline.


> diff --git a/Documentation/core-api/dma-api.rst 
> b/Documentation/core-api/dma-api.rst
> index 90239348b30f6f..047fcfffa0e5cf 100644
> --- a/Documentation/core-api/dma-api.rst
> +++ b/Documentation/core-api/dma-api.rst
> @@ -516,48 +516,53 @@ routines, e.g.:::
> }
>
>
> -Part II - Advanced dma usage
> -
> +Part II - Non-coherent DMA allocations
> +--
>
> -Warning: These pieces of the DMA API should not be used in the
> -majority of cases, since they cater for unlikely corner cases that
> -don't belong in usual drivers.
> +These APIs allow to allocate pages that can be used like normal pages
> +in the kernel direct mapping, but are guaranteed to be DMA addressable.

Could we elaborate a bit more on what "like normal pages in kernel
direct mapping" mean from the driver perspective?

>
>  If you don't understand how cache line coherency works between a
>  processor and an I/O device, you should not be using this part of the
> -API at all.
> +API.
>
>  ::
>
> void *
> -   dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t 
> *dma_handle,
> -   gfp_t flag, unsigned long attrs)
> +   dma_alloc_pages(struct device *dev, size_t size, dma_addr_t 
> *dma_handle,
> +   enum dma_data_direction dir, gfp_t gfp)
> +
> +This routine allocates a region of  bytes of consistent memory.  It
> +returns a pointer to the allocated region (in the processor's virtual address
> +space) or NULL if the allocation failed. The returned memory is guanteed to
> +behave like memory allocated using alloc_pages.

There is one aspect that the existing dma_alloc_attrs() handles, but
this new function doesn't: IOMMU support. The function will always
allocate a physically-contiguous block memory, which is a costly
operation and not even guaranteed to succeed, even if enough free
memory is available.

Modern SoCs employ IOMMUs to avoid the need to allocate
physically-contiguous memory and those happen to be also the devices
that could benefit from non-coherent allocations a lot. One of the
tasks of the DMA API was making it possible to allocate suitable
memory for a given device, without having the driver know about the
SoC integration details, such as the presence of an IOMMU.

Today, dma_alloc_attrs() uses the .alloc callback of the dma_ops
struct and the IOMMU-aware implementations, like the dma-iommu helpers
[1], would allocate discontiguous pages. Therefore, while I see the
DMA-aware page allocation functionality as a useful functionality on
its own for scatter-gather-capable hardware, I believe it is 

Re: [Nouveau] [RFC 09/20] drm/i915/dp: Extract drm_dp_has_mst()

2020-08-19 Thread Sean Paul
On Tue, Aug 11, 2020 at 04:04:46PM -0400, Lyude Paul wrote:
> Just a tiny drive-by cleanup, we can consolidate i915's code for
> checking for MST support into a helper to be shared across drivers.
> 

Reviewed-by: Sean Paul 

> Signed-off-by: Lyude Paul 
> ---
>  drivers/gpu/drm/i915/display/intel_dp.c | 18 ++
>  include/drm/drm_dp_mst_helper.h | 22 ++
>  2 files changed, 24 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
> b/drivers/gpu/drm/i915/display/intel_dp.c
> index 79c27f91f42c0..1e29d3a012856 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -4699,20 +4699,6 @@ intel_dp_get_dpcd(struct intel_dp *intel_dp)
>   return true;
>  }
>  
> -static bool
> -intel_dp_sink_can_mst(struct intel_dp *intel_dp)
> -{
> - u8 mstm_cap;
> -
> - if (intel_dp->dpcd[DP_DPCD_REV] < 0x12)
> - return false;
> -
> - if (drm_dp_dpcd_readb(_dp->aux, DP_MSTM_CAP, _cap) != 1)
> - return false;
> -
> - return mstm_cap & DP_MST_CAP;
> -}
> -
>  static bool
>  intel_dp_can_mst(struct intel_dp *intel_dp)
>  {
> @@ -4720,7 +4706,7 @@ intel_dp_can_mst(struct intel_dp *intel_dp)
>  
>   return i915->params.enable_dp_mst &&
>   intel_dp->can_mst &&
> - intel_dp_sink_can_mst(intel_dp);
> + drm_dp_has_mst(_dp->aux, intel_dp->dpcd);
>  }
>  
>  static void
> @@ -4729,7 +4715,7 @@ intel_dp_configure_mst(struct intel_dp *intel_dp)
>   struct drm_i915_private *i915 = dp_to_i915(intel_dp);
>   struct intel_encoder *encoder =
>   _to_dig_port(intel_dp)->base;
> - bool sink_can_mst = intel_dp_sink_can_mst(intel_dp);
> + bool sink_can_mst = drm_dp_has_mst(_dp->aux, intel_dp->dpcd);
>  
>   drm_dbg_kms(>drm,
>   "[ENCODER:%d:%s] MST support: port: %s, sink: %s, modparam: 
> %s\n",
> diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
> index 8b9eb4db3381c..2d8983a713e8c 100644
> --- a/include/drm/drm_dp_mst_helper.h
> +++ b/include/drm/drm_dp_mst_helper.h
> @@ -911,4 +911,26 @@ __drm_dp_mst_state_iter_get(struct drm_atomic_state 
> *state,
>   for ((__i) = 0; (__i) < (__state)->num_private_objs; (__i)++) \
>   for_each_if(__drm_dp_mst_state_iter_get((__state), &(mgr), 
> NULL, &(new_state), (__i)))
>  
> +/**
> + * drm_dp_has_mst() - check whether or not a sink supports MST
> + * @aux: The DP AUX channel to use
> + * @dpcd: A cached copy of the DPCD capabilities for this sink
> + *
> + * Returns: %True if the sink supports MST, %false otherwise
> + */
> +static inline bool
> +drm_dp_has_mst(struct drm_dp_aux *aux,
> +const u8 dpcd[DP_RECEIVER_CAP_SIZE])
> +{
> + u8 mstm_cap;
> +
> + if (dpcd[DP_DPCD_REV] < DP_DPCD_REV_12)
> + return false;
> +
> + if (drm_dp_dpcd_readb(aux, DP_MSTM_CAP, _cap) != 1)
> + return false;
> +
> + return !!(mstm_cap & DP_MST_CAP);
> +}
> +
>  #endif
> -- 
> 2.26.2
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Sean Paul, Software Engineer, Google / Chromium OS
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Tomasz Figa
On Wed, Aug 19, 2020 at 4:07 PM Robin Murphy  wrote:
>
> On 2020-08-19 13:49, Tomasz Figa wrote:
> > On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy  wrote:
> >>
> >> Hi Tomasz,
> >>
> >> On 2020-08-19 12:16, Tomasz Figa wrote:
> >>> Hi Christoph,
> >>>
> >>> On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig  wrote:
> 
>  The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> >>>
> >>> Could you explain what makes you think it's unused? It's a feature of
> >>> the UAPI generally supported by the videobuf2 framework and relied on
> >>> by Chromium OS to get any kind of reasonable performance when
> >>> accessing V4L2 buffers in the userspace.
> >>>
>  and causes
>  weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is
>  unimplemented except on PARISC and some MIPS configs, and about to be
>  removed.
> >>>
> >>> It is implemented by the generic DMA mapping layer [1], which is used
> >>> by a number of architectures including ARM64 and supposed to be used
> >>> by new architectures going forward.
> >>
> >> AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up
> >> controling whether DMA_ATTR_NON_CONSISTENT is added to 
> >> vb2_queue::dma_attrs.
> >>
> >> Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at
> >> all on arm64?
> >>
> >
> > With the default config it doesn't, but with
> > CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> > the pgprot value as is, without enforcing coherence attributes.
>
> How active are the PA-RISC and MIPS ports of Chromium OS?

Not active. We enable CONFIG_DMA_NONCOHERENT_CACHE_SYNC for ARM64,
given the directions received back in April when discussing the
noncoherent memory functionality on the mailing list in the thread I
pointed out in my previous message and no clarification on why it is
disabled for ARM64 in upstream, despite making several attempts to get
some.

>
> Hacking CONFIG_DMA_NONCOHERENT_CACHE_SYNC into an architecture that
> doesn't provide dma_cache_sync() is wrong, since at worst it may break
> other drivers. If downstream is wildly misusing an API then so be it,
> but it's hardly a strong basis for an upstream argument.

I guess it means that we're wildly misusing the API, but it still does
work. Could you explain how it could break other drivers?

>
> >> Also, I posit that videobuf2 is not actually relying on
> >> DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:
> >>
> >> "By using this API, you are guaranteeing to the platform
> >> that you have all the correct and necessary sync points for this memory
> >> in the driver should it choose to return non-consistent memory."
> >>
> >> $ git grep dma_cache_sync drivers/media
> >> $
> >
> > AFAIK dma_cache_sync() isn't the only way to perform the cache
> > synchronization. The earlier patch series that I reviewed relied on
> > dma_get_sgtable() and then dma_sync_sg_*() (which existed in the
> > vb2-dc since forever [1]). However, it looks like with the final code
> > the sgtable isn't acquired and the synchronization isn't happening, so
> > you have a point.
>
> Using the streaming sync calls on coherent allocations has also always
> been wrong per the API, regardless of the bodies of code that have
> happened to get away with it for so long.
>
> > FWIW, I asked back in time what the plan is for non-coherent
> > allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
> > dma_sync_*() was supposed to be the right thing to go with. [2] The
> > same thread also explains why dma_alloc_pages() isn't suitable for the
> > users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
>
> AFAICS even back then Christoph was implying getting rid of
> NON_CONSISTENT and *replacing* it with something streaming-API-based -

That's not how I read his reply from the thread I pointed to, but that
might of course be my misunderstanding.

> i.e. this series - not encouraging mixing the existing APIs. It doesn't
> seem impossible to implement a remapping version of this new
> dma_alloc_pages() for IOMMU-backed ops if it's really warranted
> (although at that point it seems like "non-coherent" vb2-dc starts to
> have significant conceptual overlap with vb2-sg).

No, there is no overlap between vb2-dc and vb2-sg. They differ on
another level - the former is to be used by devices without
scatter-gather or internal mapping capabilities and gives the driver a
single DMA address for the whole buffer, regardless of whether it's
IOVA-contiguous (for devices behind an IOMMU) or physically contiguous
(for the others), while the latter gives the driver an sgtable, which
of course may be DMA-contiguous internally, but doesn't have to and
usually isn't. This model makes it possible to hide the SoC
implementation details from particular drivers, since those are very
often reused on many SoCs which differ in the availability of IOMMU,
DMA addressing restrictions and so on.

Best regards,
Tomasz

Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Tomasz Figa
On Wed, Aug 19, 2020 at 3:57 PM Christoph Hellwig  wrote:
>
> On Wed, Aug 19, 2020 at 02:49:01PM +0200, Tomasz Figa wrote:
> > With the default config it doesn't, but with
> > CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> > the pgprot value as is, without enforcing coherence attributes.
>
> Which isn't selected on arm64, and that is for a good reason.
>
> > AFAIK dma_cache_sync() isn't the only way to perform the cache
> > synchronization.
>
> Yes, it is the only documented way to do it.  And if you read the whole
> series instead of screaming you'd see that it provides a proper way
> to deal with non-coherent memory which will also work with arm64.
> instead of screaming
>

I'm sorry if I have offended you in any way, but would also appreciate
it if a less aggressive tone was directed towards me as well.

I have valid reasons to object to this patch, as stated in my previous
emails. The fact that the original feature has problems is of course
another story and, as I mentioned too, I'm willing to look into fixing
them.

I'm of course happy to review the rest of the series and even more
happy to help migrating this code to whatever is added there, as long
as the functionality is preserved.

> > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> > series related to the subsystem-facing DMA API changes, since
> > videobuf2 is one of the biggest users of it.
>
> The cc list is too long - I cc lists and key maintainers.  As a reviewer
> should should watch your subsystems lists closely.

Well, I guess we can disagree on this, because there is no clear
policy. I'm listed in the MAINTAINERS file for the subsystem and I
believe the purpose of the file is to list the people to CC on
relevant patches. We're all overloaded with work and having to look
through the huge volume of mailing lists like linux-media doesn't help
and thus I'd still appreciate being added on CC.

Best regards,
Tomasz
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Robin Murphy

On 2020-08-19 13:49, Tomasz Figa wrote:

On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy  wrote:


Hi Tomasz,

On 2020-08-19 12:16, Tomasz Figa wrote:

Hi Christoph,

On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig  wrote:


The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,


Could you explain what makes you think it's unused? It's a feature of
the UAPI generally supported by the videobuf2 framework and relied on
by Chromium OS to get any kind of reasonable performance when
accessing V4L2 buffers in the userspace.


and causes
weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is
unimplemented except on PARISC and some MIPS configs, and about to be
removed.


It is implemented by the generic DMA mapping layer [1], which is used
by a number of architectures including ARM64 and supposed to be used
by new architectures going forward.


AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up
controling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs.

Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at
all on arm64?



With the default config it doesn't, but with
CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
the pgprot value as is, without enforcing coherence attributes.


How active are the PA-RISC and MIPS ports of Chromium OS?

Hacking CONFIG_DMA_NONCOHERENT_CACHE_SYNC into an architecture that 
doesn't provide dma_cache_sync() is wrong, since at worst it may break 
other drivers. If downstream is wildly misusing an API then so be it, 
but it's hardly a strong basis for an upstream argument.



Also, I posit that videobuf2 is not actually relying on
DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:

"By using this API, you are guaranteeing to the platform
that you have all the correct and necessary sync points for this memory
in the driver should it choose to return non-consistent memory."

$ git grep dma_cache_sync drivers/media
$


AFAIK dma_cache_sync() isn't the only way to perform the cache
synchronization. The earlier patch series that I reviewed relied on
dma_get_sgtable() and then dma_sync_sg_*() (which existed in the
vb2-dc since forever [1]). However, it looks like with the final code
the sgtable isn't acquired and the synchronization isn't happening, so
you have a point.


Using the streaming sync calls on coherent allocations has also always 
been wrong per the API, regardless of the bodies of code that have 
happened to get away with it for so long.



FWIW, I asked back in time what the plan is for non-coherent
allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
dma_sync_*() was supposed to be the right thing to go with. [2] The
same thread also explains why dma_alloc_pages() isn't suitable for the
users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.


AFAICS even back then Christoph was implying getting rid of 
NON_CONSISTENT and *replacing* it with something streaming-API-based - 
i.e. this series - not encouraging mixing the existing APIs. It doesn't 
seem impossible to implement a remapping version of this new 
dma_alloc_pages() for IOMMU-backed ops if it's really warranted 
(although at that point it seems like "non-coherent" vb2-dc starts to 
have significant conceptual overlap with vb2-sg).


Robin.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Christoph Hellwig
On Wed, Aug 19, 2020 at 01:16:51PM +0200, Tomasz Figa wrote:
> Hi Christoph,
> 
> On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig  wrote:
> >
> > The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> 
> Could you explain what makes you think it's unused? It's a feature of
> the UAPI generally supported by the videobuf2 framework and relied on
> by Chromium OS to get any kind of reasonable performance when
> accessing V4L2 buffers in the userspace.

Because it doesn't do anything except on PARISC and non-coherent MIPS,
so by definition it isn't used by any of these media drivers.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Christoph Hellwig
On Wed, Aug 19, 2020 at 02:49:01PM +0200, Tomasz Figa wrote:
> With the default config it doesn't, but with
> CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> the pgprot value as is, without enforcing coherence attributes.

Which isn't selected on arm64, and that is for a good reason.

> AFAIK dma_cache_sync() isn't the only way to perform the cache
> synchronization.

Yes, it is the only documented way to do it.  And if you read the whole
series instead of screaming you'd see that it provides a proper way
to deal with non-coherent memory which will also work with arm64.
instead of screaming 

> By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> series related to the subsystem-facing DMA API changes, since
> videobuf2 is one of the biggest users of it.

The cc list is too long - I cc lists and key maintainers.  As a reviewer
should should watch your subsystems lists closely.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Tomasz Figa
On Wed, Aug 19, 2020 at 3:55 PM Christoph Hellwig  wrote:
>
> On Wed, Aug 19, 2020 at 01:16:51PM +0200, Tomasz Figa wrote:
> > Hi Christoph,
> >
> > On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig  wrote:
> > >
> > > The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> >
> > Could you explain what makes you think it's unused? It's a feature of
> > the UAPI generally supported by the videobuf2 framework and relied on
> > by Chromium OS to get any kind of reasonable performance when
> > accessing V4L2 buffers in the userspace.
>
> Because it doesn't do anything except on PARISC and non-coherent MIPS,
> so by definition it isn't used by any of these media drivers.

It's still an UAPI feature, so we can't simply remove the flag, it
must stay there as a no-op, until the problem is resolved.

Also, it of course might be disputable as an out-of-tree usage, but
selecting CONFIG_DMA_NONCOHERENT_CACHE_SYNC makes the flag actually do
something on other platforms, including ARM64.

Best regards,
Tomasz
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Tomasz Figa
On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy  wrote:
>
> Hi Tomasz,
>
> On 2020-08-19 12:16, Tomasz Figa wrote:
> > Hi Christoph,
> >
> > On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig  wrote:
> >>
> >> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> >
> > Could you explain what makes you think it's unused? It's a feature of
> > the UAPI generally supported by the videobuf2 framework and relied on
> > by Chromium OS to get any kind of reasonable performance when
> > accessing V4L2 buffers in the userspace.
> >
> >> and causes
> >> weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is
> >> unimplemented except on PARISC and some MIPS configs, and about to be
> >> removed.
> >
> > It is implemented by the generic DMA mapping layer [1], which is used
> > by a number of architectures including ARM64 and supposed to be used
> > by new architectures going forward.
>
> AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up
> controling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs.
>
> Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at
> all on arm64?
>

With the default config it doesn't, but with
CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
the pgprot value as is, without enforcing coherence attributes.


> Also, I posit that videobuf2 is not actually relying on
> DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:
>
> "By using this API, you are guaranteeing to the platform
> that you have all the correct and necessary sync points for this memory
> in the driver should it choose to return non-consistent memory."
>
> $ git grep dma_cache_sync drivers/media
> $

AFAIK dma_cache_sync() isn't the only way to perform the cache
synchronization. The earlier patch series that I reviewed relied on
dma_get_sgtable() and then dma_sync_sg_*() (which existed in the
vb2-dc since forever [1]). However, it looks like with the final code
the sgtable isn't acquired and the synchronization isn't happening, so
you have a point.

FWIW, I asked back in time what the plan is for non-coherent
allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
dma_sync_*() was supposed to be the right thing to go with. [2] The
same thread also explains why dma_alloc_pages() isn't suitable for the
users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.

I think we could make a deal here. We could revert back the parts
using DMA_ATTR_NON_CONSISTENT, keeping the UAPI intact, but just
rendering it no-op, since it's just a hint after all. Then, you would
propose a proper, functionally equivalent and working for ARM64,
replacement for dma_alloc_attrs(..., DMA_ATTR_NON_CONSISTENT), which
we could then use to enable the functionality expected by this UAPI.
Does it sound like something that could work as a way forward here?

By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
series related to the subsystem-facing DMA API changes, since
videobuf2 is one of the biggest users of it.

[1] 
https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L98
[2] https://patchwork.kernel.org/comment/23312203/

Best regards,
Tomasz


>
> Robin.
>
> > [1] 
> > https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341
> >
> > When removing features from generic kernel code, I'd suggest first
> > providing viable alternatives for its users, rather than killing the
> > users altogether.
> >
> > Given the above, I'm afraid I have to NAK this.
> >
> > Best regards,
> > Tomasz
> >
> >>
> >> Signed-off-by: Christoph Hellwig 
> >> ---
> >>   .../userspace-api/media/v4l/buffer.rst| 17 -
> >>   .../media/v4l/vidioc-reqbufs.rst  |  1 -
> >>   .../media/common/videobuf2/videobuf2-core.c   | 36 +--
> >>   .../common/videobuf2/videobuf2-dma-contig.c   | 19 --
> >>   .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
> >>   .../media/common/videobuf2/videobuf2-v4l2.c   | 12 ---
> >>   include/media/videobuf2-core.h|  3 +-
> >>   include/uapi/linux/videodev2.h|  2 --
> >>   8 files changed, 3 insertions(+), 90 deletions(-)
> >>
> >> diff --git a/Documentation/userspace-api/media/v4l/buffer.rst 
> >> b/Documentation/userspace-api/media/v4l/buffer.rst
> >> index 57e752aaf414a7..2044ed13cd9d7d 100644
> >> --- a/Documentation/userspace-api/media/v4l/buffer.rst
> >> +++ b/Documentation/userspace-api/media/v4l/buffer.rst
> >> @@ -701,23 +701,6 @@ Memory Consistency Flags
> >>   :stub-columns: 0
> >>   :widths:   3 1 4
> >>
> >> -* .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
> >> -
> >> -  - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
> >> -  - 0x0001
> >> -  - A buffer is allocated either in consistent (it will be 
> >> automatically
> >> -   coherent between the CPU and the bus) or non-consistent memory. The
> >> -   latter can provide performance gains, for instance the 

[Nouveau] [PATCH 06/28] lib82596: move DMA allocation into the callers of i82596_probe

2020-08-19 Thread Christoph Hellwig
This allows us to get rid of the LIB82596_DMA_ATTR defined and prepare
for untangling the coherent vs non-coherent DMA allocation API.

Signed-off-by: Christoph Hellwig 
---
 drivers/net/ethernet/i825xx/lasi_82596.c | 24 ++--
 drivers/net/ethernet/i825xx/lib82596.c   | 36 
 drivers/net/ethernet/i825xx/sni_82596.c  | 19 +
 3 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c 
b/drivers/net/ethernet/i825xx/lasi_82596.c
index aec7e98bcc853a..8c5ab9b7553e75 100644
--- a/drivers/net/ethernet/i825xx/lasi_82596.c
+++ b/drivers/net/ethernet/i825xx/lasi_82596.c
@@ -96,8 +96,6 @@
 
 #define OPT_SWAP_PORT  0x0001  /* Need to wordswp on the MPU port */
 
-#define LIB82596_DMA_ATTR  DMA_ATTR_NON_CONSISTENT
-
 #define DMA_WBACK(ndev, addr, len) \
do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, 
DMA_TO_DEVICE); } while (0)
 
@@ -155,7 +153,7 @@ lan_init_chip(struct parisc_device *dev)
 {
struct  net_device *netdevice;
struct i596_private *lp;
-   int retval;
+   int retval = -ENOMEM;
int i;
 
if (!dev->irq) {
@@ -186,12 +184,22 @@ lan_init_chip(struct parisc_device *dev)
 
lp = netdev_priv(netdevice);
lp->options = dev->id.sversion == 0x72 ? OPT_SWAP_PORT : 0;
+   lp->dma = dma_alloc_attrs(dev->dev.parent, sizeof(struct i596_dma),
+ >dma_addr, GFP_KERNEL,
+ DMA_ATTR_NON_CONSISTENT);
+   if (!lp->dma)
+   goto out_free_netdev;
 
retval = i82596_probe(netdevice);
-   if (retval) {
-   free_netdev(netdevice);
-   return -ENODEV;
-   }
+   if (retval)
+   goto out_free_dma;
+   return 0;
+
+out_free_dma:
+   dma_free_attrs(dev->dev.parent, sizeof(struct i596_dma),
+  lp->dma, lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+out_free_netdev:
+   free_netdev(netdevice);
return retval;
 }
 
@@ -202,7 +210,7 @@ static int __exit lan_remove_chip(struct parisc_device 
*pdev)
 
unregister_netdev (dev);
dma_free_attrs(>dev, sizeof(struct i596_private), lp->dma,
-  lp->dma_addr, LIB82596_DMA_ATTR);
+  lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
free_netdev (dev);
return 0;
 }
diff --git a/drivers/net/ethernet/i825xx/lib82596.c 
b/drivers/net/ethernet/i825xx/lib82596.c
index b03757e169e475..b4e4b3eb5758b5 100644
--- a/drivers/net/ethernet/i825xx/lib82596.c
+++ b/drivers/net/ethernet/i825xx/lib82596.c
@@ -1047,9 +1047,8 @@ static const struct net_device_ops i596_netdev_ops = {
 
 static int i82596_probe(struct net_device *dev)
 {
-   int i;
struct i596_private *lp = netdev_priv(dev);
-   struct i596_dma *dma;
+   int ret;
 
/* This lot is ensure things have been cache line aligned. */
BUILD_BUG_ON(sizeof(struct i596_rfd) != 32);
@@ -1063,41 +1062,28 @@ static int i82596_probe(struct net_device *dev)
if (!dev->base_addr || !dev->irq)
return -ENODEV;
 
-   dma = dma_alloc_attrs(dev->dev.parent, sizeof(struct i596_dma),
- >dma_addr, GFP_KERNEL,
- LIB82596_DMA_ATTR);
-   if (!dma) {
-   printk(KERN_ERR "%s: Couldn't get shared memory\n", __FILE__);
-   return -ENOMEM;
-   }
-
dev->netdev_ops = _netdev_ops;
dev->watchdog_timeo = TX_TIMEOUT;
 
-   memset(dma, 0, sizeof(struct i596_dma));
-   lp->dma = dma;
-
-   dma->scb.command = 0;
-   dma->scb.cmd = I596_NULL;
-   dma->scb.rfd = I596_NULL;
+   memset(lp->dma, 0, sizeof(struct i596_dma));
+   lp->dma->scb.command = 0;
+   lp->dma->scb.cmd = I596_NULL;
+   lp->dma->scb.rfd = I596_NULL;
spin_lock_init(>lock);
 
-   DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma));
+   DMA_WBACK_INV(dev, lp->dma, sizeof(struct i596_dma));
 
-   i = register_netdev(dev);
-   if (i) {
-   dma_free_attrs(dev->dev.parent, sizeof(struct i596_dma),
-  dma, lp->dma_addr, LIB82596_DMA_ATTR);
-   return i;
-   }
+   ret = register_netdev(dev);
+   if (ret)
+   return ret;
 
DEB(DEB_PROBE, printk(KERN_INFO "%s: 82596 at %#3lx, %pM IRQ %d.\n",
  dev->name, dev->base_addr, dev->dev_addr,
  dev->irq));
DEB(DEB_INIT, printk(KERN_INFO
 "%s: dma at 0x%p (%d bytes), lp->scb at 0x%p\n",
-dev->name, dma, (int)sizeof(struct i596_dma),
->scb));
+dev->name, lp->dma, (int)sizeof(struct i596_dma),
+>dma->scb));
 
return 0;
 }
diff --git a/drivers/net/ethernet/i825xx/sni_82596.c 

[Nouveau] [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc, free}_pages

2020-08-19 Thread Christoph Hellwig
Add a new API to allocate and free pages that are guaranteed to be
addressable by a device, but otherwise behave like pages allocated by
alloc_pages.  The intended APIs to sync them for use with the device
and cpu are dma_sync_single_for_{device,cpu} that are also used for
streaming mappings.

Switch all drivers over to this new API, but keep the usage of the
crufty dma_cache_sync API for now, which will be cleaned up on a driver
by driver basis.

Signed-off-by: Christoph Hellwig 
---
 Documentation/core-api/dma-api.rst| 68 +++
 Documentation/core-api/dma-attributes.rst |  8 ---
 arch/alpha/kernel/pci_iommu.c |  2 +
 arch/arm/mm/dma-mapping-nommu.c   |  2 +
 arch/arm/mm/dma-mapping.c |  4 ++
 arch/ia64/hp/common/sba_iommu.c   |  2 +
 arch/mips/jazz/jazzdma.c  |  7 +--
 arch/powerpc/kernel/dma-iommu.c   |  2 +
 arch/powerpc/platforms/ps3/system-bus.c   |  4 ++
 arch/powerpc/platforms/pseries/vio.c  |  2 +
 arch/s390/pci/pci_dma.c   |  2 +
 arch/x86/kernel/amd_gart_64.c |  2 +
 drivers/iommu/dma-iommu.c |  2 +
 drivers/iommu/intel/iommu.c   |  4 ++
 drivers/net/ethernet/i825xx/lasi_82596.c  | 13 ++---
 drivers/net/ethernet/seeq/sgiseeq.c   | 12 ++--
 drivers/parisc/ccio-dma.c |  2 +
 drivers/parisc/sba_iommu.c|  2 +
 drivers/scsi/53c700.c |  8 +--
 drivers/scsi/sgiwd93.c| 12 ++--
 drivers/xen/swiotlb-xen.c |  2 +
 include/linux/dma-direct.h|  5 ++
 include/linux/dma-mapping.h   | 29 --
 include/linux/dma-noncoherent.h   |  3 -
 kernel/dma/direct.c   | 51 -
 kernel/dma/mapping.c  | 43 +-
 kernel/dma/ops_helpers.c  | 35 
 kernel/dma/virt.c |  2 +
 sound/mips/hal2.c | 20 +++
 29 files changed, 254 insertions(+), 96 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst 
b/Documentation/core-api/dma-api.rst
index 90239348b30f6f..047fcfffa0e5cf 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -516,48 +516,53 @@ routines, e.g.:::
}
 
 
-Part II - Advanced dma usage
-
+Part II - Non-coherent DMA allocations
+--
 
-Warning: These pieces of the DMA API should not be used in the
-majority of cases, since they cater for unlikely corner cases that
-don't belong in usual drivers.
+These APIs allow to allocate pages that can be used like normal pages
+in the kernel direct mapping, but are guaranteed to be DMA addressable.
 
 If you don't understand how cache line coherency works between a
 processor and an I/O device, you should not be using this part of the
-API at all.
+API.
 
 ::
 
void *
-   dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
-   gfp_t flag, unsigned long attrs)
+   dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
+   enum dma_data_direction dir, gfp_t gfp)
+
+This routine allocates a region of  bytes of consistent memory.  It
+returns a pointer to the allocated region (in the processor's virtual address
+space) or NULL if the allocation failed. The returned memory is guanteed to
+behave like memory allocated using alloc_pages.
+
+It also returns a  which may be cast to an unsigned integer the
+same width as the bus and given to the device as the DMA address base of
+the region.
 
-Identical to dma_alloc_coherent() except that when the
-DMA_ATTR_NON_CONSISTENT flags is passed in the attrs argument, the
-platform will choose to return either consistent or non-consistent memory
-as it sees fit.  By using this API, you are guaranteeing to the platform
-that you have all the correct and necessary sync points for this memory
-in the driver should it choose to return non-consistent memory.
+The dir parameter specified if data is read and/or written by the device,
+see dma_map_single() for details.
 
-Note: where the platform can return consistent memory, it will
-guarantee that the sync points become nops.
+The gfp parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation, but rejects flags used to specify a memory
+zone such as GFP_DMA or GFP_HIGHMEM.
 
-Warning:  Handling non-consistent memory is a real pain.  You should
-only use this API if you positively know your driver will be
-required to work on one of the rare (usually non-PCI) architectures
-that simply cannot make consistent memory.
+Before giving the memory to the device, dma_sync_single_for_device() needs
+to be called, and before reading memory written by the device,
+dma_sync_single_for_cpu(), just like for streaming DMA mappings that are
+reused.
 
 ::
 

[Nouveau] [PATCH 22/28] sgiseeq: convert from dma_cache_sync to dma_sync_single_for_device

2020-08-19 Thread Christoph Hellwig
Use the proper modern API to transfer cache ownership for incoherent DMA.

Signed-off-by: Christoph Hellwig 
---
 drivers/net/ethernet/seeq/sgiseeq.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/seeq/sgiseeq.c 
b/drivers/net/ethernet/seeq/sgiseeq.c
index 39599bbb5d45b6..f91dae16d69a19 100644
--- a/drivers/net/ethernet/seeq/sgiseeq.c
+++ b/drivers/net/ethernet/seeq/sgiseeq.c
@@ -112,14 +112,18 @@ struct sgiseeq_private {
 
 static inline void dma_sync_desc_cpu(struct net_device *dev, void *addr)
 {
-   dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
-  DMA_FROM_DEVICE);
+   struct sgiseeq_private *sp = netdev_priv(dev);
+
+   dma_sync_single_for_cpu(dev->dev.parent, VIRT_TO_DMA(sp, addr),
+   sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
 }
 
 static inline void dma_sync_desc_dev(struct net_device *dev, void *addr)
 {
-   dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
-  DMA_TO_DEVICE);
+   struct sgiseeq_private *sp = netdev_priv(dev);
+
+   dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr),
+   sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
 }
 
 static inline void hpc3_eth_reset(struct hpc3_ethregs *hregs)
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 25/28] dma-mapping: remove dma_cache_sync

2020-08-19 Thread Christoph Hellwig
All users are gone now, remove the API.

Signed-off-by: Christoph Hellwig 
---
 arch/mips/Kconfig   |  1 -
 arch/mips/jazz/jazzdma.c|  1 -
 arch/mips/mm/dma-noncoherent.c  |  6 --
 arch/parisc/Kconfig |  1 -
 arch/parisc/kernel/pci-dma.c|  6 --
 include/linux/dma-mapping.h |  9 -
 include/linux/dma-noncoherent.h | 10 --
 kernel/dma/Kconfig  |  3 ---
 kernel/dma/mapping.c| 14 --
 9 files changed, 51 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index c95fa3a2484cf0..1be91c5d666e61 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1134,7 +1134,6 @@ config DMA_NONCOHERENT
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_DMA_SET_UNCACHED
select DMA_NONCOHERENT_MMAP
-   select DMA_NONCOHERENT_CACHE_SYNC
select NEED_DMA_MAP_STATE
 
 config SYS_HAS_EARLY_PRINTK
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index 0f9a9cb7fe7a95..e2efe43f5f9cc3 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -614,7 +614,6 @@ const struct dma_map_ops jazz_dma_ops = {
.sync_single_for_device = jazz_dma_sync_single_for_device,
.sync_sg_for_cpu= jazz_dma_sync_sg_for_cpu,
.sync_sg_for_device = jazz_dma_sync_sg_for_device,
-   .cache_sync = arch_dma_cache_sync,
.mmap   = dma_common_mmap,
.get_sgtable= dma_common_get_sgtable,
.alloc_pages= dma_common_alloc_pages,
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 97a14adbafc99c..f34ad1f09799f1 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -137,12 +137,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 }
 #endif
 
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-   enum dma_data_direction direction)
-{
-   dma_sync_virt_for_device(vaddr, size, direction);
-}
-
 #ifdef CONFIG_DMA_PERDEV_COHERENT
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 3b0f53dd70bc9b..ed15da1da174e0 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -195,7 +195,6 @@ config PA11
depends on PA7000 || PA7100LC || PA7200 || PA7300LC
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
-   select DMA_NONCOHERENT_CACHE_SYNC
 
 config PREFETCH
def_bool y
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index 38c68e131bbe2a..ce38c0b9158125 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -454,9 +454,3 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 {
flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
 }
-
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-  enum dma_data_direction direction)
-{
-   flush_kernel_dcache_range((unsigned long)vaddr, size);
-}
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 73fa6e10c5c8b5..7321df0b9ffc83 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -123,8 +123,6 @@ struct dma_map_ops {
void (*sync_sg_for_device)(struct device *dev,
   struct scatterlist *sg, int nents,
   enum dma_data_direction dir);
-   void (*cache_sync)(struct device *dev, void *vaddr, size_t size,
-   enum dma_data_direction direction);
int (*dma_supported)(struct device *dev, u64 mask);
u64 (*get_required_mask)(struct device *dev);
size_t (*max_mapping_size)(struct device *dev);
@@ -258,9 +256,6 @@ void *dma_alloc_pages(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
enum dma_data_direction dir, gfp_t gfp);
 void dma_free_pages(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, enum dma_data_direction dir);
-/* dma_cache_sync is deprecated: don't use in new code */
-void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-   enum dma_data_direction dir);
 int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs);
@@ -353,10 +348,6 @@ static inline void dma_free_pages(struct device *dev, 
size_t size, void *vaddr,
dma_addr_t dma_handle, enum dma_data_direction dir)
 {
 }
-static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-   enum dma_data_direction dir)
-{
-}
 static inline int dma_get_sgtable_attrs(struct device *dev,
struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr,
size_t size, 

[Nouveau] [PATCH 10/28] MIPS/jazzdma: decouple from dma-direct

2020-08-19 Thread Christoph Hellwig
The jazzdma ops implement support for a very basic IOMMU.  Thus we really
should not use the dma-direct code that takes physical address limits
into account.  This survived through the great MIPS DMA ops cleanup mostly
because I was lazy, but now it is time to fully split the implementations.

Signed-off-by: Christoph Hellwig 
---
 arch/mips/jazz/jazzdma.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index fe40dbed04c1d6..d0b5a2ba2b1a8a 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -492,26 +491,38 @@ int vdma_get_enable(int channel)
 static void *jazz_dma_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
+   struct page *page;
void *ret;
 
-   ret = dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
-   if (!ret)
-   return NULL;
+   if (attrs & DMA_ATTR_NO_WARN)
+   gfp |= __GFP_NOWARN;
 
-   *dma_handle = vdma_alloc(virt_to_phys(ret), size);
-   if (*dma_handle == DMA_MAPPING_ERROR) {
-   dma_direct_free_pages(dev, size, ret, *dma_handle, attrs);
+   size = PAGE_ALIGN(size);
+   page = alloc_pages(gfp, get_order(size));
+   if (!page)
return NULL;
-   }
+   ret = page_address(page);
+   *dma_handle = vdma_alloc(virt_to_phys(ret), size);
+   if (*dma_handle == DMA_MAPPING_ERROR)
+   goto out_free_pages;
+
+   if (attrs & DMA_ATTR_NON_CONSISTENT)
+   return ret;
+   arch_dma_prep_coherent(page, size);
+   return (void *)(UNCAC_BASE + __pa(ret));
 
-   return ret;
+out_free_pages:
+   __free_pages(page, get_order(size));
+   return NULL;
 }
 
 static void jazz_dma_free(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
 {
vdma_free(dma_handle);
-   dma_direct_free_pages(dev, size, vaddr, dma_handle, attrs);
+   if (!(attrs & DMA_ATTR_NON_CONSISTENT))
+   vaddr = __va(vaddr - UNCAC_BASE);
+   __free_pages(virt_to_page(vaddr), get_order(size));
 }
 
 static dma_addr_t jazz_dma_map_page(struct device *dev, struct page *page,
@@ -608,7 +619,6 @@ const struct dma_map_ops jazz_dma_ops = {
.sync_single_for_device = jazz_dma_sync_single_for_device,
.sync_sg_for_cpu= jazz_dma_sync_sg_for_cpu,
.sync_sg_for_device = jazz_dma_sync_sg_for_device,
-   .dma_supported  = dma_direct_supported,
.cache_sync = arch_dma_cache_sync,
.mmap   = dma_common_mmap,
.get_sgtable= dma_common_get_sgtable,
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 27/28] nvme-pci: fix PRP pool size

2020-08-19 Thread Christoph Hellwig
All operations are based on the controller, not the host page size.
Switch the dma pool to use the controller page size as well to avoid
massive overallocations on large page size systems.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ba725ae47305ef..a33adab62acbaf 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2460,7 +2460,8 @@ static int nvme_disable_prepare_reset(struct nvme_dev 
*dev, bool shutdown)
 static int nvme_setup_prp_pools(struct nvme_dev *dev)
 {
dev->prp_page_pool = dma_pool_create("prp list page", dev->dev,
-   PAGE_SIZE, PAGE_SIZE, 0);
+   NVME_CTRL_PAGE_SIZE,
+   NVME_CTRL_PAGE_SIZE, 0);
if (!dev->prp_page_pool)
return -ENOMEM;
 
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 16/28] dma-direct: rename and cleanup __phys_to_dma

2020-08-19 Thread Christoph Hellwig
The __phys_to_dma vs phys_to_dma distinction isn't exactly obvious.  Try
to improve the situation by renaming __phys_to_dma to
phys_to_dma_unencryped, and not forcing architectures that want to
override phys_to_dma to actually provide __phys_to_dma.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/dma-direct.h  |  2 +-
 arch/mips/bmips/dma.c  |  2 +-
 arch/mips/cavium-octeon/dma-octeon.c   |  2 +-
 arch/mips/include/asm/dma-direct.h |  2 +-
 arch/mips/loongson2ef/fuloong-2e/dma.c |  2 +-
 arch/mips/loongson2ef/lemote-2f/dma.c  |  2 +-
 arch/mips/loongson64/dma.c |  2 +-
 arch/mips/pci/pci-ar2315.c |  2 +-
 arch/mips/pci/pci-xtalk-bridge.c   |  2 +-
 arch/mips/sgi-ip32/ip32-dma.c  |  2 +-
 arch/powerpc/include/asm/dma-direct.h  |  2 +-
 drivers/iommu/intel/iommu.c|  2 +-
 include/linux/dma-direct.h | 28 +++---
 kernel/dma/direct.c|  8 
 kernel/dma/swiotlb.c   |  4 ++--
 15 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/arch/arm/include/asm/dma-direct.h 
b/arch/arm/include/asm/dma-direct.h
index a8cee87a93e8ab..bca0de56753439 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -2,7 +2,7 @@
 #ifndef ASM_ARM_DMA_DIRECT_H
 #define ASM_ARM_DMA_DIRECT_H 1
 
-static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
unsigned int offset = paddr & ~PAGE_MASK;
return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset;
diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
index ba2a5d33dfd3fa..49061b870680b9 100644
--- a/arch/mips/bmips/dma.c
+++ b/arch/mips/bmips/dma.c
@@ -40,7 +40,7 @@ static struct bmips_dma_range *bmips_dma_ranges;
 
 #define FLUSH_RAC  0x100
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t pa)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t pa)
 {
struct bmips_dma_range *r;
 
diff --git a/arch/mips/cavium-octeon/dma-octeon.c 
b/arch/mips/cavium-octeon/dma-octeon.c
index 388b13ba2558c2..232fa1017b1ec9 100644
--- a/arch/mips/cavium-octeon/dma-octeon.c
+++ b/arch/mips/cavium-octeon/dma-octeon.c
@@ -168,7 +168,7 @@ void __init octeon_pci_dma_init(void)
 }
 #endif /* CONFIG_PCI */
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
 #ifdef CONFIG_PCI
if (dev && dev_is_pci(dev))
diff --git a/arch/mips/include/asm/dma-direct.h 
b/arch/mips/include/asm/dma-direct.h
index 8e178651c638c2..9a640118316c9d 100644
--- a/arch/mips/include/asm/dma-direct.h
+++ b/arch/mips/include/asm/dma-direct.h
@@ -2,7 +2,7 @@
 #ifndef _MIPS_DMA_DIRECT_H
 #define _MIPS_DMA_DIRECT_H 1
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr);
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr);
 phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr);
 
 #endif /* _MIPS_DMA_DIRECT_H */
diff --git a/arch/mips/loongson2ef/fuloong-2e/dma.c 
b/arch/mips/loongson2ef/fuloong-2e/dma.c
index 83fadeb3fd7d56..cea167d8aba8db 100644
--- a/arch/mips/loongson2ef/fuloong-2e/dma.c
+++ b/arch/mips/loongson2ef/fuloong-2e/dma.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
return paddr | 0x8000;
 }
diff --git a/arch/mips/loongson2ef/lemote-2f/dma.c 
b/arch/mips/loongson2ef/lemote-2f/dma.c
index 302b43a14eee74..3c9e994563578c 100644
--- a/arch/mips/loongson2ef/lemote-2f/dma.c
+++ b/arch/mips/loongson2ef/lemote-2f/dma.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
return paddr | 0x8000;
 }
diff --git a/arch/mips/loongson64/dma.c b/arch/mips/loongson64/dma.c
index b3dc5d0bd2b113..364f2f27c8723f 100644
--- a/arch/mips/loongson64/dma.c
+++ b/arch/mips/loongson64/dma.c
@@ -4,7 +4,7 @@
 #include 
 #include 
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
/* We extract 2bit node id (bit 44~47, only bit 44~45 used now) from
 * Loongson-3's 48bit address space and embed it into 40bit */
diff --git a/arch/mips/pci/pci-ar2315.c b/arch/mips/pci/pci-ar2315.c
index d88395684f487d..cef4a47ab06311 100644
--- a/arch/mips/pci/pci-ar2315.c
+++ b/arch/mips/pci/pci-ar2315.c
@@ -170,7 +170,7 @@ static inline dma_addr_t ar2315_dev_offset(struct device 
*dev)
return 0;
 }
 
-dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
return paddr + ar2315_dev_offset(dev);
 }
diff --git 

[Nouveau] [PATCH 04/28] net/au1000-eth: stop using DMA_ATTR_NON_CONSISTENT

2020-08-19 Thread Christoph Hellwig
The au1000-eth driver contains none of the manual cache synchronization
required for using DMA_ATTR_NON_CONSISTENT.  From what I can tell it
can be used on both dma coherent and non-coherent DMA platforms, but
I suspect it has been buggy on the non-coherent platforms all along.

Signed-off-by: Christoph Hellwig 
---
 drivers/net/ethernet/amd/au1000_eth.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/amd/au1000_eth.c 
b/drivers/net/ethernet/amd/au1000_eth.c
index 75dbd221dc594b..19e195420e2434 100644
--- a/drivers/net/ethernet/amd/au1000_eth.c
+++ b/drivers/net/ethernet/amd/au1000_eth.c
@@ -1131,10 +1131,9 @@ static int au1000_probe(struct platform_device *pdev)
/* Allocate the data buffers
 * Snooping works fine with eth on all au1xxx
 */
-   aup->vaddr = (u32)dma_alloc_attrs(>dev, MAX_BUF_SIZE *
+   aup->vaddr = (u32)dma_alloc_coherent(>dev, MAX_BUF_SIZE *
  (NUM_TX_BUFFS + NUM_RX_BUFFS),
- >dma_addr, 0,
- DMA_ATTR_NON_CONSISTENT);
+ >dma_addr, 0);
if (!aup->vaddr) {
dev_err(>dev, "failed to allocate data buffers\n");
err = -ENOMEM;
@@ -1310,9 +1309,8 @@ static int au1000_probe(struct platform_device *pdev)
 err_remap2:
iounmap(aup->mac);
 err_remap1:
-   dma_free_attrs(>dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
-   (void *)aup->vaddr, aup->dma_addr,
-   DMA_ATTR_NON_CONSISTENT);
+   dma_free_coherent(>dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + 
NUM_RX_BUFFS),
+   (void *)aup->vaddr, aup->dma_addr);
 err_vaddr:
free_netdev(dev);
 err_alloc:
@@ -1344,9 +1342,8 @@ static int au1000_remove(struct platform_device *pdev)
if (aup->tx_db_inuse[i])
au1000_ReleaseDB(aup, aup->tx_db_inuse[i]);
 
-   dma_free_attrs(>dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
-   (void *)aup->vaddr, aup->dma_addr,
-   DMA_ATTR_NON_CONSISTENT);
+   dma_free_coherent(>dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + 
NUM_RX_BUFFS),
+   (void *)aup->vaddr, aup->dma_addr);
 
iounmap(aup->macdma);
iounmap(aup->mac);
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 14/28] dma-direct: use phys_to_dma_direct in dma_direct_alloc

2020-08-19 Thread Christoph Hellwig
Replace the currently open code copy.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 01120510968fa1..2e280b9c063449 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -235,10 +235,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
goto out_encrypt_pages;
}
 done:
-   if (force_dma_unencrypted(dev))
-   *dma_handle = __phys_to_dma(dev, page_to_phys(page));
-   else
-   *dma_handle = phys_to_dma(dev, page_to_phys(page));
+   *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
return ret;
 
 out_encrypt_pages:
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 12/28] dma-direct: remove dma_direct_{alloc, free}_pages

2020-08-19 Thread Christoph Hellwig
Just merge these helpers into the main dma_direct_{alloc,free} routines,
as the additional checks are always false for the two callers.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/kernel/amd_gart_64.c |  6 +++---
 include/linux/dma-direct.h|  4 
 kernel/dma/direct.c   | 39 ++-
 kernel/dma/pool.c |  2 +-
 4 files changed, 19 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index e89031e9c84761..adbf616d35d15d 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -468,7 +468,7 @@ gart_alloc_coherent(struct device *dev, size_t size, 
dma_addr_t *dma_addr,
 {
void *vaddr;
 
-   vaddr = dma_direct_alloc_pages(dev, size, dma_addr, flag, attrs);
+   vaddr = dma_direct_alloc(dev, size, dma_addr, flag, attrs);
if (!vaddr ||
!force_iommu || dev->coherent_dma_mask <= DMA_BIT_MASK(24))
return vaddr;
@@ -480,7 +480,7 @@ gart_alloc_coherent(struct device *dev, size_t size, 
dma_addr_t *dma_addr,
goto out_free;
return vaddr;
 out_free:
-   dma_direct_free_pages(dev, size, vaddr, *dma_addr, attrs);
+   dma_direct_free(dev, size, vaddr, *dma_addr, attrs);
return NULL;
 }
 
@@ -490,7 +490,7 @@ gart_free_coherent(struct device *dev, size_t size, void 
*vaddr,
   dma_addr_t dma_addr, unsigned long attrs)
 {
gart_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL, 0);
-   dma_direct_free_pages(dev, size, vaddr, dma_addr, attrs);
+   dma_direct_free(dev, size, vaddr, dma_addr, attrs);
 }
 
 static int no_agp;
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 738485b3578062..6a96a8ecac7cbc 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -80,10 +80,6 @@ void *dma_direct_alloc(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
dma_addr_t dma_addr, unsigned long attrs);
-void *dma_direct_alloc_pages(struct device *dev, size_t size,
-   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
-void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
-   dma_addr_t dma_addr, unsigned long attrs);
 int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs);
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 1123e767f4315f..8da9a62dd9a72c 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -151,13 +151,18 @@ static struct page *__dma_direct_alloc_pages(struct 
device *dev, size_t size,
return page;
 }
 
-void *dma_direct_alloc_pages(struct device *dev, size_t size,
+void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
struct page *page;
void *ret;
int err;
 
+   if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+   dma_alloc_need_uncached(dev, attrs))
+   return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
+
size = PAGE_ALIGN(size);
 
if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
@@ -251,11 +256,18 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
return NULL;
 }
 
-void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
-   dma_addr_t dma_addr, unsigned long attrs)
+void dma_direct_free(struct device *dev, size_t size,
+   void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs)
 {
unsigned int page_order = get_order(size);
 
+   if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+   dma_alloc_need_uncached(dev, attrs)) {
+   arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
+   return;
+   }
+
/* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */
if (dma_should_free_from_pool(dev, attrs) &&
dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
@@ -279,27 +291,6 @@ void dma_direct_free_pages(struct device *dev, size_t 
size, void *cpu_addr,
dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
 }
 
-void *dma_direct_alloc(struct device *dev, size_t size,
-   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
-{
-   if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   dma_alloc_need_uncached(dev, attrs))
-   return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
-   return dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
-}
-
-void 

[Nouveau] [PATCH 24/28] 53c700: convert from dma_cache_sync to dma_sync_single_for_device

2020-08-19 Thread Christoph Hellwig
Use the proper modern API to transfer cache ownership for incoherent DMA.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/53c700.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 521950d0731e4a..57a08c42d00325 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -269,18 +269,25 @@ NCR_700_get_SXFER(struct scsi_device *SDp)
  spi_period(SDp->sdev_target));
 }
 
+static inline dma_addr_t virt_to_dma(struct NCR_700_Host_Parameters *h, void 
*p)
+{
+   return h->pScript + ((uintptr_t)p - (uintptr_t)h->script);
+}
+
 static inline void dma_sync_to_dev(struct NCR_700_Host_Parameters *h,
void *addr, size_t size)
 {
if (h->noncoherent)
-   dma_cache_sync(h->dev, addr, size, DMA_TO_DEVICE);
+   dma_sync_single_for_device(h->dev, virt_to_dma(h, addr),
+  size, DMA_BIDIRECTIONAL);
 }
 
 static inline void dma_sync_from_dev(struct NCR_700_Host_Parameters *h,
void *addr, size_t size)
 {
if (h->noncoherent)
-   dma_cache_sync(h->dev, addr, size, DMA_FROM_DEVICE);
+   dma_sync_single_for_device(h->dev, virt_to_dma(h, addr), size,
+  DMA_BIDIRECTIONAL);
 }
 
 struct Scsi_Host *
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Christoph Hellwig
The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused, and causes
weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is
unimplemented except on PARISC and some MIPS configs, and about to be
removed.

Signed-off-by: Christoph Hellwig 
---
 .../userspace-api/media/v4l/buffer.rst| 17 -
 .../media/v4l/vidioc-reqbufs.rst  |  1 -
 .../media/common/videobuf2/videobuf2-core.c   | 36 +--
 .../common/videobuf2/videobuf2-dma-contig.c   | 19 --
 .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
 .../media/common/videobuf2/videobuf2-v4l2.c   | 12 ---
 include/media/videobuf2-core.h|  3 +-
 include/uapi/linux/videodev2.h|  2 --
 8 files changed, 3 insertions(+), 90 deletions(-)

diff --git a/Documentation/userspace-api/media/v4l/buffer.rst 
b/Documentation/userspace-api/media/v4l/buffer.rst
index 57e752aaf414a7..2044ed13cd9d7d 100644
--- a/Documentation/userspace-api/media/v4l/buffer.rst
+++ b/Documentation/userspace-api/media/v4l/buffer.rst
@@ -701,23 +701,6 @@ Memory Consistency Flags
 :stub-columns: 0
 :widths:   3 1 4
 
-* .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
-
-  - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
-  - 0x0001
-  - A buffer is allocated either in consistent (it will be automatically
-   coherent between the CPU and the bus) or non-consistent memory. The
-   latter can provide performance gains, for instance the CPU cache
-   sync/flush operations can be avoided if the buffer is accessed by the
-   corresponding device only and the CPU does not read/write to/from that
-   buffer. However, this requires extra care from the driver -- it must
-   guarantee memory consistency by issuing a cache flush/sync when
-   consistency is needed. If this flag is set V4L2 will attempt to
-   allocate the buffer in non-consistent memory. The flag takes effect
-   only if the buffer is used for :ref:`memory mapping ` I/O and the
-   queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
-   ` capability.
-
 .. c:type:: v4l2_memory
 
 enum v4l2_memory
diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst 
b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
index 75d894d9c36c42..3180c111d368ee 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
@@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit
   - This capability is set by the driver to indicate that the queue 
supports
 cache and memory management hints. However, it's only valid when the
 queue is used for :ref:`memory mapping ` streaming I/O. See
-:ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT 
`,
 :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 
` and
 :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN `.
 
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c 
b/drivers/media/common/videobuf2/videobuf2-core.c
index f544d3393e9d6b..66a41cef33c1b1 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q,
 }
 EXPORT_SYMBOL(vb2_verify_memory_type);
 
-static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem)
-{
-   q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT;
-
-   if (!vb2_queue_allows_cache_hints(q))
-   return;
-   if (!consistent_mem)
-   q->dma_attrs |= DMA_ATTR_NON_CONSISTENT;
-}
-
-static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem)
-{
-   bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT);
-
-   if (consistent_mem != queue_is_consistent) {
-   dprintk(q, 1, "memory consistency model mismatch\n");
-   return false;
-   }
-   return true;
-}
-
 int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
 unsigned int flags, unsigned int *count)
 {
unsigned int num_buffers, allocated_buffers, num_planes = 0;
unsigned plane_sizes[VB2_MAX_PLANES] = { };
-   bool consistent_mem = true;
unsigned int i;
int ret;
 
-   if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
-   consistent_mem = false;
-
if (q->streaming) {
dprintk(q, 1, "streaming active\n");
return -EBUSY;
@@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory 
memory,
}
 
if (*count == 0 || q->num_buffers != 0 ||
-   (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) ||
-   !verify_consistency_attr(q, consistent_mem)) {
+   (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) {
/*
 * We already have buffers allocated, so first check if they
 * are not in use and can be freed.
@@ -803,7 +777,6 @@ 

[Nouveau] [PATCH 28/28] nvme-pci: use dma_alloc_pages backed dmapools

2020-08-19 Thread Christoph Hellwig
Switch from coherent DMA pools to those backed by dma_alloc_pages.  This
helps device with non-coherent DMA to avoid host accesses to uncached
memory for every submission of a larger than single entry I/O.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/pci.c | 80 -
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a33adab62acbaf..fb34dbcb973673 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -114,8 +114,8 @@ struct nvme_dev {
struct blk_mq_tag_set admin_tagset;
u32 __iomem *dbs;
struct device *dev;
-   struct dma_pool *prp_page_pool;
-   struct dma_pool *prp_small_pool;
+   struct dma_pool prp_page_pool;
+   struct dma_pool prp_small_pool;
unsigned online_queues;
unsigned max_qid;
unsigned io_queues[HCTX_MAX_TYPES];
@@ -536,7 +536,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct 
request *req)
 
 
if (iod->npages == 0)
-   dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
+   dma_pool_free(>prp_small_pool, nvme_pci_iod_list(req)[0],
dma_addr);
 
for (i = 0; i < iod->npages; i++) {
@@ -553,7 +553,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct 
request *req)
next_dma_addr = le64_to_cpu(prp_list[last_prp]);
}
 
-   dma_pool_free(dev->prp_page_pool, addr, dma_addr);
+   dma_pool_free(>prp_page_pool, addr, dma_addr);
dma_addr = next_dma_addr;
}
 
@@ -611,10 +611,10 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev 
*dev,
 
nprps = DIV_ROUND_UP(length, NVME_CTRL_PAGE_SIZE);
if (nprps <= (256 / 8)) {
-   pool = dev->prp_small_pool;
+   pool = >prp_small_pool;
iod->npages = 0;
} else {
-   pool = dev->prp_page_pool;
+   pool = >prp_page_pool;
iod->npages = 1;
}
 
@@ -630,6 +630,11 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev 
*dev,
for (;;) {
if (i == NVME_CTRL_PAGE_SIZE >> 3) {
__le64 *old_prp_list = prp_list;
+
+   dma_sync_single_for_device(dev->dev, prp_dma,
+  i * sizeof(*prp_list),
+  DMA_TO_DEVICE);
+
prp_list = dma_pool_alloc(pool, GFP_ATOMIC, _dma);
if (!prp_list)
return BLK_STS_RESOURCE;
@@ -653,6 +658,8 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev 
*dev,
dma_len = sg_dma_len(sg);
}
 
+   dma_sync_single_for_device(dev->dev, prp_dma, i * sizeof(*prp_list),
+  DMA_TO_DEVICE);
 done:
cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma);
@@ -706,10 +713,10 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev 
*dev,
}
 
if (entries <= (256 / sizeof(struct nvme_sgl_desc))) {
-   pool = dev->prp_small_pool;
+   pool = >prp_small_pool;
iod->npages = 0;
} else {
-   pool = dev->prp_page_pool;
+   pool = >prp_page_pool;
iod->npages = 1;
}
 
@@ -728,6 +735,10 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev 
*dev,
if (i == SGES_PER_PAGE) {
struct nvme_sgl_desc *old_sg_desc = sg_list;
struct nvme_sgl_desc *link = _sg_desc[i - 1];
+   
+   dma_sync_single_for_device(dev->dev, sgl_dma,
+  i * sizeof(*sg_list),
+  DMA_TO_DEVICE);
 
sg_list = dma_pool_alloc(pool, GFP_ATOMIC, _dma);
if (!sg_list)
@@ -743,6 +754,8 @@ static blk_status_t nvme_pci_setup_sgls(struct nvme_dev 
*dev,
sg = sg_next(sg);
} while (--entries > 0);
 
+   dma_sync_single_for_device(dev->dev, sgl_dma, i * sizeof(*sg_list),
+  DMA_TO_DEVICE);
return BLK_STS_OK;
 }
 
@@ -2457,30 +2470,6 @@ static int nvme_disable_prepare_reset(struct nvme_dev 
*dev, bool shutdown)
return 0;
 }
 
-static int nvme_setup_prp_pools(struct nvme_dev *dev)
-{
-   dev->prp_page_pool = dma_pool_create("prp list page", dev->dev,
-   NVME_CTRL_PAGE_SIZE,
-   NVME_CTRL_PAGE_SIZE, 0);
-   if (!dev->prp_page_pool)
-   return -ENOMEM;
-
-   /* Optimisation for I/Os between 4k and 128k */
-   dev->prp_small_pool = dma_pool_create("prp list 

[Nouveau] [PATCH 15/28] dma-direct: remove __dma_to_phys

2020-08-19 Thread Christoph Hellwig
There is no harm in just always clearing the SME encryption bit, while
significantly simplifying the interface.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/dma-direct.h  |  2 +-
 arch/mips/bmips/dma.c  |  2 +-
 arch/mips/cavium-octeon/dma-octeon.c   |  2 +-
 arch/mips/include/asm/dma-direct.h |  2 +-
 arch/mips/loongson2ef/fuloong-2e/dma.c |  2 +-
 arch/mips/loongson2ef/lemote-2f/dma.c  |  2 +-
 arch/mips/loongson64/dma.c |  2 +-
 arch/mips/pci/pci-ar2315.c |  2 +-
 arch/mips/pci/pci-xtalk-bridge.c   |  2 +-
 arch/mips/sgi-ip32/ip32-dma.c  |  2 +-
 arch/powerpc/include/asm/dma-direct.h  |  2 +-
 include/linux/dma-direct.h | 21 +++--
 kernel/dma/direct.c|  6 +-
 13 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/arch/arm/include/asm/dma-direct.h 
b/arch/arm/include/asm/dma-direct.h
index 7c3001a6a775bf..a8cee87a93e8ab 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -8,7 +8,7 @@ static inline dma_addr_t __phys_to_dma(struct device *dev, 
phys_addr_t paddr)
return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset;
 }
 
-static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t 
dev_addr)
+static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
 {
unsigned int offset = dev_addr & ~PAGE_MASK;
return __pfn_to_phys(dma_to_pfn(dev, dev_addr)) + offset;
diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
index df56bf4179e347..ba2a5d33dfd3fa 100644
--- a/arch/mips/bmips/dma.c
+++ b/arch/mips/bmips/dma.c
@@ -52,7 +52,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t pa)
return pa;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
struct bmips_dma_range *r;
 
diff --git a/arch/mips/cavium-octeon/dma-octeon.c 
b/arch/mips/cavium-octeon/dma-octeon.c
index 14ea680d180e07..388b13ba2558c2 100644
--- a/arch/mips/cavium-octeon/dma-octeon.c
+++ b/arch/mips/cavium-octeon/dma-octeon.c
@@ -177,7 +177,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t 
paddr)
return paddr;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
 #ifdef CONFIG_PCI
if (dev && dev_is_pci(dev))
diff --git a/arch/mips/include/asm/dma-direct.h 
b/arch/mips/include/asm/dma-direct.h
index 14e352651ce946..8e178651c638c2 100644
--- a/arch/mips/include/asm/dma-direct.h
+++ b/arch/mips/include/asm/dma-direct.h
@@ -3,6 +3,6 @@
 #define _MIPS_DMA_DIRECT_H 1
 
 dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr);
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr);
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr);
 
 #endif /* _MIPS_DMA_DIRECT_H */
diff --git a/arch/mips/loongson2ef/fuloong-2e/dma.c 
b/arch/mips/loongson2ef/fuloong-2e/dma.c
index e122292bf6660a..83fadeb3fd7d56 100644
--- a/arch/mips/loongson2ef/fuloong-2e/dma.c
+++ b/arch/mips/loongson2ef/fuloong-2e/dma.c
@@ -6,7 +6,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
return paddr | 0x8000;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
return dma_addr & 0x7fff;
 }
diff --git a/arch/mips/loongson2ef/lemote-2f/dma.c 
b/arch/mips/loongson2ef/lemote-2f/dma.c
index abf0e39d7e4696..302b43a14eee74 100644
--- a/arch/mips/loongson2ef/lemote-2f/dma.c
+++ b/arch/mips/loongson2ef/lemote-2f/dma.c
@@ -6,7 +6,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
return paddr | 0x8000;
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dma_addr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
 {
if (dma_addr > 0x8fff)
return dma_addr;
diff --git a/arch/mips/loongson64/dma.c b/arch/mips/loongson64/dma.c
index dbfe6e82fddd1c..b3dc5d0bd2b113 100644
--- a/arch/mips/loongson64/dma.c
+++ b/arch/mips/loongson64/dma.c
@@ -13,7 +13,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t 
paddr)
return ((nid << 44) ^ paddr) | (nid << node_id_offset);
 }
 
-phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
/* We extract 2bit node id (bit 44~47, only bit 44~45 used now) from
 * Loongson-3's 48bit address space and embed it into 40bit */
diff --git a/arch/mips/pci/pci-ar2315.c b/arch/mips/pci/pci-ar2315.c
index 490953f515282a..d88395684f487d 100644
--- a/arch/mips/pci/pci-ar2315.c
+++ b/arch/mips/pci/pci-ar2315.c
@@ -175,7 +175,7 @@ dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t 
paddr)
return paddr + ar2315_dev_offset(dev);
 }
 
-phys_addr_t __dma_to_phys(struct device 

[Nouveau] [PATCH 08/28] MIPS: make dma_sync_*_for_cpu a little less overzealous

2020-08-19 Thread Christoph Hellwig
When transferring DMA ownership back to the CPU there should never
be any writeback from the cache, as the buffer was owned by the
device until now.  Instead it should just be invalidated for the
mapping directions where the device could have written data.
Note that the changes rely on the fact that kmap_atomic is stubbed
out for the !HIGHMEM case to simplify the code a bit.

Signed-off-by: Christoph Hellwig 
---
 arch/mips/mm/dma-noncoherent.c | 44 +-
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 563c2c0d0c8193..97a14adbafc99c 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -55,22 +55,34 @@ void *arch_dma_set_uncached(void *addr, size_t size)
return (void *)(__pa(addr) + UNCAC_BASE);
 }
 
-static inline void dma_sync_virt(void *addr, size_t size,
+static inline void dma_sync_virt_for_device(void *addr, size_t size,
enum dma_data_direction dir)
 {
switch (dir) {
case DMA_TO_DEVICE:
dma_cache_wback((unsigned long)addr, size);
break;
-
case DMA_FROM_DEVICE:
dma_cache_inv((unsigned long)addr, size);
break;
-
case DMA_BIDIRECTIONAL:
dma_cache_wback_inv((unsigned long)addr, size);
break;
+   default:
+   BUG();
+   }
+}
 
+static inline void dma_sync_virt_for_cpu(void *addr, size_t size,
+   enum dma_data_direction dir)
+{
+   switch (dir) {
+   case DMA_TO_DEVICE:
+   break;
+   case DMA_FROM_DEVICE:
+   case DMA_BIDIRECTIONAL:
+   dma_cache_inv((unsigned long)addr, size);
+   break;
default:
BUG();
}
@@ -82,7 +94,7 @@ static inline void dma_sync_virt(void *addr, size_t size,
  * configured then the bulk of this loop gets optimized out.
  */
 static inline void dma_sync_phys(phys_addr_t paddr, size_t size,
-   enum dma_data_direction dir)
+   enum dma_data_direction dir, bool for_device)
 {
struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
unsigned long offset = paddr & ~PAGE_MASK;
@@ -90,18 +102,20 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t 
size,
 
do {
size_t len = left;
+   void *addr;
 
if (PageHighMem(page)) {
-   void *addr;
-
if (offset + len > PAGE_SIZE)
len = PAGE_SIZE - offset;
+   }
+
+   addr = kmap_atomic(page);
+   if (for_device)
+   dma_sync_virt_for_device(addr + offset, len, dir);
+   else
+   dma_sync_virt_for_cpu(addr + offset, len, dir);
+   kunmap_atomic(addr);
 
-   addr = kmap_atomic(page);
-   dma_sync_virt(addr + offset, len, dir);
-   kunmap_atomic(addr);
-   } else
-   dma_sync_virt(page_address(page) + offset, size, dir);
offset = 0;
page++;
left -= len;
@@ -111,7 +125,7 @@ static inline void dma_sync_phys(phys_addr_t paddr, size_t 
size,
 void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
enum dma_data_direction dir)
 {
-   dma_sync_phys(paddr, size, dir);
+   dma_sync_phys(paddr, size, dir, true);
 }
 
 #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU
@@ -119,16 +133,14 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
enum dma_data_direction dir)
 {
if (cpu_needs_post_dma_flush())
-   dma_sync_phys(paddr, size, dir);
+   dma_sync_phys(paddr, size, dir, false);
 }
 #endif
 
 void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction direction)
 {
-   BUG_ON(direction == DMA_NONE);
-
-   dma_sync_virt(vaddr, size, direction);
+   dma_sync_virt_for_device(vaddr, size, direction);
 }
 
 #ifdef CONFIG_DMA_PERDEV_COHERENT
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 20/28] sgiwd93: convert from dma_cache_sync to dma_sync_single_for_device

2020-08-19 Thread Christoph Hellwig
Use the proper modern API to transfer cache ownership for incoherent DMA.
This also means we can allocate the memory as DMA_TO_DEVICE instead
of bidirectional.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/sgiwd93.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/sgiwd93.c b/drivers/scsi/sgiwd93.c
index 133adcf99e9340..1538f65307f22f 100644
--- a/drivers/scsi/sgiwd93.c
+++ b/drivers/scsi/sgiwd93.c
@@ -95,7 +95,7 @@ void fill_hpc_entries(struct ip22_hostdata *hd, struct 
scsi_cmnd *cmd, int din)
 */
hcp->desc.pbuf = 0;
hcp->desc.cntinfo = HPCDMA_EOX;
-   dma_cache_sync(hd->dev, hd->cpu,
+   dma_sync_single_for_device(hd->dev, hd->dma,
   (unsigned long)(hcp + 1) - (unsigned long)hd->cpu,
   DMA_TO_DEVICE);
 }
@@ -235,7 +235,7 @@ static int sgiwd93_probe(struct platform_device *pdev)
hdata = host_to_hostdata(host);
hdata->dev = >dev;
hdata->cpu = dma_alloc_pages(>dev, HPC_DMA_SIZE, >dma,
-DMA_BIDIRECTIONAL, GFP_KERNEL);
+DMA_TO_DEVICE, GFP_KERNEL);
if (!hdata->cpu) {
printk(KERN_WARNING "sgiwd93: Could not allocate memory for "
   "host %d buffer.\n", unit);
@@ -275,7 +275,7 @@ static int sgiwd93_probe(struct platform_device *pdev)
free_irq(irq, host);
 out_free:
dma_free_pages(>dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
-   DMA_BIDIRECTIONAL);
+   DMA_TO_DEVICE);
 out_put:
scsi_host_put(host);
 out:
@@ -292,7 +292,7 @@ static int sgiwd93_remove(struct platform_device *pdev)
scsi_remove_host(host);
free_irq(pd->irq, host);
dma_free_pages(>dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
-   DMA_BIDIRECTIONAL);
+   DMA_TO_DEVICE);
scsi_host_put(host);
return 0;
 }
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Robin Murphy

Hi Tomasz,

On 2020-08-19 12:16, Tomasz Figa wrote:

Hi Christoph,

On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig  wrote:


The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,


Could you explain what makes you think it's unused? It's a feature of
the UAPI generally supported by the videobuf2 framework and relied on
by Chromium OS to get any kind of reasonable performance when
accessing V4L2 buffers in the userspace.


and causes
weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is
unimplemented except on PARISC and some MIPS configs, and about to be
removed.


It is implemented by the generic DMA mapping layer [1], which is used
by a number of architectures including ARM64 and supposed to be used
by new architectures going forward.


AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up 
controling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs.


Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at 
all on arm64?


Also, I posit that videobuf2 is not actually relying on 
DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:


"By using this API, you are guaranteeing to the platform
that you have all the correct and necessary sync points for this memory
in the driver should it choose to return non-consistent memory."

$ git grep dma_cache_sync drivers/media
$

Robin.


[1] https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341

When removing features from generic kernel code, I'd suggest first
providing viable alternatives for its users, rather than killing the
users altogether.

Given the above, I'm afraid I have to NAK this.

Best regards,
Tomasz



Signed-off-by: Christoph Hellwig 
---
  .../userspace-api/media/v4l/buffer.rst| 17 -
  .../media/v4l/vidioc-reqbufs.rst  |  1 -
  .../media/common/videobuf2/videobuf2-core.c   | 36 +--
  .../common/videobuf2/videobuf2-dma-contig.c   | 19 --
  .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
  .../media/common/videobuf2/videobuf2-v4l2.c   | 12 ---
  include/media/videobuf2-core.h|  3 +-
  include/uapi/linux/videodev2.h|  2 --
  8 files changed, 3 insertions(+), 90 deletions(-)

diff --git a/Documentation/userspace-api/media/v4l/buffer.rst 
b/Documentation/userspace-api/media/v4l/buffer.rst
index 57e752aaf414a7..2044ed13cd9d7d 100644
--- a/Documentation/userspace-api/media/v4l/buffer.rst
+++ b/Documentation/userspace-api/media/v4l/buffer.rst
@@ -701,23 +701,6 @@ Memory Consistency Flags
  :stub-columns: 0
  :widths:   3 1 4

-* .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
-
-  - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
-  - 0x0001
-  - A buffer is allocated either in consistent (it will be automatically
-   coherent between the CPU and the bus) or non-consistent memory. The
-   latter can provide performance gains, for instance the CPU cache
-   sync/flush operations can be avoided if the buffer is accessed by the
-   corresponding device only and the CPU does not read/write to/from that
-   buffer. However, this requires extra care from the driver -- it must
-   guarantee memory consistency by issuing a cache flush/sync when
-   consistency is needed. If this flag is set V4L2 will attempt to
-   allocate the buffer in non-consistent memory. The flag takes effect
-   only if the buffer is used for :ref:`memory mapping ` I/O and the
-   queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
-   ` capability.
-
  .. c:type:: v4l2_memory

  enum v4l2_memory
diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst 
b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
index 75d894d9c36c42..3180c111d368ee 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
@@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit
- This capability is set by the driver to indicate that the queue 
supports
  cache and memory management hints. However, it's only valid when the
  queue is used for :ref:`memory mapping ` streaming I/O. See
-:ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT 
`,
  :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 
` and
  :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN `.

diff --git a/drivers/media/common/videobuf2/videobuf2-core.c 
b/drivers/media/common/videobuf2/videobuf2-core.c
index f544d3393e9d6b..66a41cef33c1b1 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q,
  }
  EXPORT_SYMBOL(vb2_verify_memory_type);

-static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem)
-{
-   q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT;
-
-   if (!vb2_queue_allows_cache_hints(q))
-   return;
-   if 

[Nouveau] a saner API for allocating DMA addressable pages

2020-08-19 Thread Christoph Hellwig
Hi all,

this series replaced the DMA_ATTR_NON_CONSISTENT flag to dma_alloc_attrs
with a separate new dma_alloc_pages API, which is available on all
platforms.  In addition to cleaning up the convoluted code path, this
ensures that other drivers that have asked for better support for
non-coherent DMA to pages with incurring bounce buffering over can finally
be properly supported.

I'm still a little unsure about the API naming, as alloc_pages sort of
implies a struct page return value, but we return a kernel virtual
address.  The other alternative would be to name the API
dma_alloc_noncoherent, but the whole non-coherent naming seems to put
people off.  As a follow up I plan to move the implementation of the
DMA_ATTR_NO_KERNEL_MAPPING flag over to this framework as well, given
that is also is a fundamentally non coherent allocation.  The replacement
for that flag would then return a struct page, as it is allowed to
actually return pages without a kernel mapping as the name suggested
(although most of the time they will actually have a kernel mapping..)

In addition to the conversions of the existing non-coherent DMA users
the last three patches also convert the DMA coherent allocations in
the NVMe driver to use this new framework through a dmapool addition.
This was both to give me a good testing vehicle, but also because it
should speed up the NVMe driver on platforms with non-coherent DMA
nicely, without a downside on platforms with cache coherent DMA.


A git tree is available here:

git://git.infradead.org/users/hch/misc.git dma_alloc_pages

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages


Diffstat:
 Documentation/core-api/dma-api.rst   |   92 ++
 Documentation/core-api/dma-attributes.rst|8 
 Documentation/userspace-api/media/v4l/buffer.rst |   17 -
 Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst |1 
 arch/alpha/kernel/pci_iommu.c|2 
 arch/arm/include/asm/dma-direct.h|4 
 arch/arm/mm/dma-mapping-nommu.c  |2 
 arch/arm/mm/dma-mapping.c|4 
 arch/ia64/Kconfig|3 
 arch/ia64/hp/common/sba_iommu.c  |2 
 arch/ia64/kernel/dma-mapping.c   |   14 
 arch/ia64/mm/init.c  |3 
 arch/mips/Kconfig|1 
 arch/mips/bmips/dma.c|4 
 arch/mips/cavium-octeon/dma-octeon.c |4 
 arch/mips/include/asm/dma-direct.h   |4 
 arch/mips/include/asm/jazzdma.h  |2 
 arch/mips/jazz/jazzdma.c |  102 +--
 arch/mips/loongson2ef/fuloong-2e/dma.c   |4 
 arch/mips/loongson2ef/lemote-2f/dma.c|4 
 arch/mips/loongson64/dma.c   |4 
 arch/mips/mm/dma-noncoherent.c   |   48 +--
 arch/mips/pci/pci-ar2315.c   |4 
 arch/mips/pci/pci-xtalk-bridge.c |4 
 arch/mips/sgi-ip32/ip32-dma.c|4 
 arch/parisc/Kconfig  |1 
 arch/parisc/kernel/pci-dma.c |6 
 arch/powerpc/include/asm/dma-direct.h|4 
 arch/powerpc/kernel/dma-iommu.c  |2 
 arch/powerpc/platforms/ps3/system-bus.c  |4 
 arch/powerpc/platforms/pseries/vio.c |2 
 arch/s390/pci/pci_dma.c  |2 
 arch/x86/kernel/amd_gart_64.c|8 
 drivers/gpu/drm/exynos/exynos_drm_gem.c  |2 
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c  |3 
 drivers/iommu/dma-iommu.c|2 
 drivers/iommu/intel/iommu.c  |6 
 drivers/media/common/videobuf2/videobuf2-core.c  |   36 --
 drivers/media/common/videobuf2/videobuf2-dma-contig.c|   19 -
 drivers/media/common/videobuf2/videobuf2-dma-sg.c|3 
 drivers/media/common/videobuf2/videobuf2-v4l2.c  |   12 
 drivers/net/ethernet/amd/au1000_eth.c|   15 -
 drivers/net/ethernet/i825xx/lasi_82596.c |   36 +-
 drivers/net/ethernet/i825xx/lib82596.c   |  148 +-
 drivers/net/ethernet/i825xx/sni_82596.c  |   23 -
 drivers/net/ethernet/seeq/sgiseeq.c  |   24 -
 drivers/nvme/host/pci.c  |   79 ++---
 drivers/parisc/ccio-dma.c|2 
 drivers/parisc/sba_iommu.c   |2 
 drivers/scsi/53c700.c 

[Nouveau] [PATCH 02/28] drm/exynos: stop setting DMA_ATTR_NON_CONSISTENT

2020-08-19 Thread Christoph Hellwig
DMA_ATTR_NON_CONSISTENT is a no-op except on PARISC and some mips
configs, so don't set it in this ARM specific driver.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/exynos/exynos_drm_gem.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c 
b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index efa476858db54b..07073222b8f691 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -42,8 +42,6 @@ static int exynos_drm_alloc_buf(struct exynos_drm_gem 
*exynos_gem, bool kvmap)
if (exynos_gem->flags & EXYNOS_BO_WC ||
!(exynos_gem->flags & EXYNOS_BO_CACHABLE))
attr |= DMA_ATTR_WRITE_COMBINE;
-   else
-   attr |= DMA_ATTR_NON_CONSISTENT;
 
/* FBDev emulation requires kernel mapping */
if (!kvmap)
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 23/28] lib82596: convert from dma_cache_sync to dma_sync_single_for_device

2020-08-19 Thread Christoph Hellwig
Use the proper modern API to transfer cache ownership for incoherent DMA.
Note that this moves the DMA helpers to the main lib82596.c file, so
that they can use virt_to_dma.

Signed-off-by: Christoph Hellwig 
---
 drivers/net/ethernet/i825xx/lasi_82596.c |  11 +--
 drivers/net/ethernet/i825xx/lib82596.c   | 114 ++-
 drivers/net/ethernet/i825xx/sni_82596.c  |   4 -
 3 files changed, 73 insertions(+), 56 deletions(-)

diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c 
b/drivers/net/ethernet/i825xx/lasi_82596.c
index 0c493b7237a910..d13b610935bcf3 100644
--- a/drivers/net/ethernet/i825xx/lasi_82596.c
+++ b/drivers/net/ethernet/i825xx/lasi_82596.c
@@ -96,21 +96,14 @@
 
 #define OPT_SWAP_PORT  0x0001  /* Need to wordswp on the MPU port */
 
-#define DMA_WBACK(ndev, addr, len) \
-   do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, 
DMA_TO_DEVICE); } while (0)
-
-#define DMA_INV(ndev, addr, len) \
-   do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, 
DMA_FROM_DEVICE); } while (0)
-
-#define DMA_WBACK_INV(ndev, addr, len) \
-   do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, 
DMA_BIDIRECTIONAL); } while (0)
-
 #define SYSBUS  0x006c
 
 /* big endian CPU, 82596 "big" endian mode */
 #define SWAP32(x)   (((u32)(x)<<16) | u32)(x)))>>16))
 #define SWAP16(x)   (x)
 
+#define NONCOHERENT_DMA 1
+
 #include "lib82596.c"
 
 MODULE_AUTHOR("Richard Hirst");
diff --git a/drivers/net/ethernet/i825xx/lib82596.c 
b/drivers/net/ethernet/i825xx/lib82596.c
index b4e4b3eb5758b5..ca2fb303fcc6f6 100644
--- a/drivers/net/ethernet/i825xx/lib82596.c
+++ b/drivers/net/ethernet/i825xx/lib82596.c
@@ -365,13 +365,44 @@ static int max_cmd_backlog = TX_RING_SIZE-1;
 static void i596_poll_controller(struct net_device *dev);
 #endif
 
+static inline dma_addr_t virt_to_dma(struct i596_private *lp, volatile void *v)
+{
+   return lp->dma_addr + ((unsigned long)v - (unsigned long)lp->dma);
+}
+
+#ifdef NONCOHERENT_DMA
+static inline void dma_sync_dev(struct net_device *ndev, volatile void *addr,
+   size_t len)
+{
+   dma_sync_single_for_device(ndev->dev.parent,
+   virt_to_dma(netdev_priv(ndev), addr), len,
+   DMA_BIDIRECTIONAL);
+}
+
+static inline void dma_sync_cpu(struct net_device *ndev, volatile void *addr,
+   size_t len)
+{
+   dma_sync_single_for_cpu(ndev->dev.parent,
+   virt_to_dma(netdev_priv(ndev), addr), len,
+   DMA_BIDIRECTIONAL);
+}
+#else
+static inline void dma_sync_dev(struct net_device *ndev, volatile void *addr,
+   size_t len)
+{
+}
+static inline void dma_sync_cpu(struct net_device *ndev, volatile void *addr,
+   size_t len)
+{
+}
+#endif /* NONCOHERENT_DMA */
 
 static inline int wait_istat(struct net_device *dev, struct i596_dma *dma, int 
delcnt, char *str)
 {
-   DMA_INV(dev, &(dma->iscp), sizeof(struct i596_iscp));
+   dma_sync_cpu(dev, &(dma->iscp), sizeof(struct i596_iscp));
while (--delcnt && dma->iscp.stat) {
udelay(10);
-   DMA_INV(dev, &(dma->iscp), sizeof(struct i596_iscp));
+   dma_sync_cpu(dev, &(dma->iscp), sizeof(struct i596_iscp));
}
if (!delcnt) {
printk(KERN_ERR "%s: %s, iscp.stat %04x, didn't clear\n",
@@ -384,10 +415,10 @@ static inline int wait_istat(struct net_device *dev, 
struct i596_dma *dma, int d
 
 static inline int wait_cmd(struct net_device *dev, struct i596_dma *dma, int 
delcnt, char *str)
 {
-   DMA_INV(dev, &(dma->scb), sizeof(struct i596_scb));
+   dma_sync_cpu(dev, &(dma->scb), sizeof(struct i596_scb));
while (--delcnt && dma->scb.command) {
udelay(10);
-   DMA_INV(dev, &(dma->scb), sizeof(struct i596_scb));
+   dma_sync_cpu(dev, &(dma->scb), sizeof(struct i596_scb));
}
if (!delcnt) {
printk(KERN_ERR "%s: %s, status %4.4x, cmd %4.4x.\n",
@@ -451,12 +482,9 @@ static void i596_display_data(struct net_device *dev)
   SWAP32(rbd->b_data), SWAP16(rbd->size));
rbd = rbd->v_next;
} while (rbd != lp->rbd_head);
-   DMA_INV(dev, dma, sizeof(struct i596_dma));
+   dma_sync_cpu(dev, dma, sizeof(struct i596_dma));
 }
 
-
-#define virt_to_dma(lp, v) ((lp)->dma_addr + (dma_addr_t)((unsigned 
long)(v)-(unsigned long)((lp)->dma)))
-
 static inline int init_rx_bufs(struct net_device *dev)
 {
struct i596_private *lp = netdev_priv(dev);
@@ -508,7 +536,7 @@ static inline int init_rx_bufs(struct net_device *dev)
rfd->b_next = SWAP32(virt_to_dma(lp, dma->rfds));
rfd->cmd = SWAP16(CMD_EOL|CMD_FLEX);
 
-   DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma));
+   dma_sync_dev(dev, dma, sizeof(struct i596_dma));
return 0;
 }
 
@@ -547,7 +575,7 @@ static void rebuild_rx_bufs(struct net_device *dev)
lp->rbd_head = dma->rbds;
   

[Nouveau] [PATCH 13/28] dma-direct: lift gfp_t manipulation out of__dma_direct_alloc_pages

2020-08-19 Thread Christoph Hellwig
Move the detailed gfp_t setup from __dma_direct_alloc_pages into the
caller to clean things up a little.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 8da9a62dd9a72c..01120510968fa1 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -108,7 +108,7 @@ static inline bool dma_should_free_from_pool(struct device 
*dev,
 }
 
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
-   gfp_t gfp, unsigned long attrs)
+   gfp_t gfp)
 {
int node = dev_to_node(dev);
struct page *page = NULL;
@@ -116,11 +116,6 @@ static struct page *__dma_direct_alloc_pages(struct device 
*dev, size_t size,
 
WARN_ON_ONCE(!PAGE_ALIGNED(size));
 
-   if (attrs & DMA_ATTR_NO_WARN)
-   gfp |= __GFP_NOWARN;
-
-   /* we always manually zero the memory once we are done: */
-   gfp &= ~__GFP_ZERO;
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
   _limit);
page = dma_alloc_contiguous(dev, size, gfp);
@@ -164,6 +159,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
size = PAGE_ALIGN(size);
+   if (attrs & DMA_ATTR_NO_WARN)
+   gfp |= __GFP_NOWARN;
 
if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
ret = dma_alloc_from_pool(dev, size, , gfp);
@@ -172,7 +169,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
goto done;
}
 
-   page = __dma_direct_alloc_pages(dev, size, gfp, attrs);
+   /* we always manually zero the memory once we are done */
+   page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
 
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 26/28] dmapool: add dma_alloc_pages support

2020-08-19 Thread Christoph Hellwig
Add an new variant of a dmapool that uses non-coherent memory from
dma_alloc_pages.  Unlike the existing mempool_create this one
initialized a pool allocated by the caller to avoid a pointless extra
allocation.  At some point it might be worth to also switch the coherent
allocation over to a similar dma_pool_init_coherent helper, but that is
better done as a separate series including a few conversions.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dmapool.h |  23 -
 mm/dmapool.c| 211 +---
 2 files changed, 154 insertions(+), 80 deletions(-)

diff --git a/include/linux/dmapool.h b/include/linux/dmapool.h
index f632ecfb423840..1387525c4e52e8 100644
--- a/include/linux/dmapool.h
+++ b/include/linux/dmapool.h
@@ -11,6 +11,10 @@
 #ifndef LINUX_DMAPOOL_H
 #defineLINUX_DMAPOOL_H
 
+#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 
@@ -18,11 +22,28 @@ struct device;
 
 #ifdef CONFIG_HAS_DMA
 
+struct dma_pool {  /* the pool */
+   struct list_head page_list;
+   spinlock_t lock;
+   size_t size;
+   struct device *dev;
+   size_t allocation;
+   size_t boundary;
+   bool is_coherent;
+   enum dma_data_direction dir;
+   char name[32];
+   struct list_head pools;
+};
+
 struct dma_pool *dma_pool_create(const char *name, struct device *dev, 
size_t size, size_t align, size_t allocation);
-
 void dma_pool_destroy(struct dma_pool *pool);
 
+int dma_pool_init(struct device *dev, struct dma_pool *pool, const char *name,
+   size_t size, size_t align, size_t boundary,
+   enum dma_data_direction dir);
+void dma_pool_exit(struct dma_pool *pool);
+
 void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 dma_addr_t *handle);
 void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t addr);
diff --git a/mm/dmapool.c b/mm/dmapool.c
index f9fb9bbd733e0f..c60a48b22c8d6a 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -6,10 +6,10 @@
  * Copyright 2007 Intel Corporation
  *   Author: Matthew Wilcox 
  *
- * This allocator returns small blocks of a given size which are DMA-able by
- * the given device.  It uses the dma_alloc_coherent page allocator to get
- * new pages, then splits them up into blocks of the required size.
- * Many older drivers still have their own code to do this.
+ * This allocator returns small blocks of a given size which are DMA-able by 
the
+ * given device.  It either uses the dma_alloc_coherent or the dma_alloc_pages
+ * allocator to get new pages, then splits them up into blocks of the required
+ * size.
  *
  * The current design of this allocator is fairly simple.  The pool is
  * represented by the 'struct dma_pool' which keeps a doubly-linked list of
@@ -39,17 +39,6 @@
 #define DMAPOOL_DEBUG 1
 #endif
 
-struct dma_pool {  /* the pool */
-   struct list_head page_list;
-   spinlock_t lock;
-   size_t size;
-   struct device *dev;
-   size_t allocation;
-   size_t boundary;
-   char name[32];
-   struct list_head pools;
-};
-
 struct dma_page {  /* cacheable header for 'allocation' bytes */
struct list_head page_list;
void *vaddr;
@@ -104,74 +93,40 @@ show_pools(struct device *dev, struct device_attribute 
*attr, char *buf)
 
 static DEVICE_ATTR(pools, 0444, show_pools, NULL);
 
-/**
- * dma_pool_create - Creates a pool of consistent memory blocks, for dma.
- * @name: name of pool, for diagnostics
- * @dev: device that will be doing the DMA
- * @size: size of the blocks in this pool.
- * @align: alignment requirement for blocks; must be a power of two
- * @boundary: returned blocks won't cross this power of two boundary
- * Context: not in_interrupt()
- *
- * Given one of these pools, dma_pool_alloc()
- * may be used to allocate memory.  Such memory will all have "consistent"
- * DMA mappings, accessible by the device and its driver without using
- * cache flushing primitives.  The actual size of blocks allocated may be
- * larger than requested because of alignment.
- *
- * If @boundary is nonzero, objects returned from dma_pool_alloc() won't
- * cross that size boundary.  This is useful for devices which have
- * addressing restrictions on individual DMA transfers, such as not crossing
- * boundaries of 4KBytes.
- *
- * Return: a dma allocation pool with the requested characteristics, or
- * %NULL if one can't be created.
- */
-struct dma_pool *dma_pool_create(const char *name, struct device *dev,
-size_t size, size_t align, size_t boundary)
+static int __dma_pool_init(struct device *dev, struct dma_pool *pool,
+   const char *name, size_t size, size_t align, size_t boundary)
 {
-   struct dma_pool *retval;
size_t allocation;
bool empty = false;
 
if (align == 0)
align = 1;
-   else if (align & (align - 1))
-  

[Nouveau] [PATCH 07/28] 53c700: improve non-coherent DMA handling

2020-08-19 Thread Christoph Hellwig
Switch the 53c700 driver to only use non-coherent descriptor memory if it
really has to because dma_alloc_coherent fails.  This doesn't matter for
any of the platforms it runs on currently, but that will change soon.

To help with this two new helpers to transfer ownership to and from the
device are added that abstract the syncing of the non-coherent memory.
The two current bidirectional cases are mapped to transfers to the
device, as that appears to what they are used for.  Note that for parisc,
which is the only architecture this driver needs to use non-coherent
memory on, the direction argument of dma_cache_sync is ignored, so this
will not change behavior in any way.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/53c700.c | 113 +++---
 drivers/scsi/53c700.h |   9 ++--
 2 files changed, 68 insertions(+), 54 deletions(-)

diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 461b3babb601ef..b197ed9399e2e0 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -269,6 +269,20 @@ NCR_700_get_SXFER(struct scsi_device *SDp)
  spi_period(SDp->sdev_target));
 }
 
+static inline void dma_sync_to_dev(struct NCR_700_Host_Parameters *h,
+   void *addr, size_t size)
+{
+   if (h->noncoherent)
+   dma_cache_sync(h->dev, addr, size, DMA_TO_DEVICE);
+}
+
+static inline void dma_sync_from_dev(struct NCR_700_Host_Parameters *h,
+   void *addr, size_t size)
+{
+   if (h->noncoherent)
+   dma_cache_sync(h->dev, addr, size, DMA_FROM_DEVICE);
+}
+
 struct Scsi_Host *
 NCR_700_detect(struct scsi_host_template *tpnt,
   struct NCR_700_Host_Parameters *hostdata, struct device *dev)
@@ -283,9 +297,13 @@ NCR_700_detect(struct scsi_host_template *tpnt,
if(tpnt->sdev_attrs == NULL)
tpnt->sdev_attrs = NCR_700_dev_attrs;
 
-   memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, ,
-GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
-   if(memory == NULL) {
+   memory = dma_alloc_coherent(dev, TOTAL_MEM_SIZE, , GFP_KERNEL);
+   if (!memory) {
+   hostdata->noncoherent = 1;
+   memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, ,
+GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+   }
+   if (!memory) {
printk(KERN_ERR "53c700: Failed to allocate memory for driver, 
detaching\n");
return NULL;
}
@@ -339,11 +357,11 @@ NCR_700_detect(struct scsi_host_template *tpnt,
for (j = 0; j < PATCHES; j++)
script[LABELPATCHES[j]] = bS_to_host(pScript + 
SCRIPT[LABELPATCHES[j]]);
/* now patch up fixed addresses. */
-   script_patch_32(hostdata->dev, script, MessageLocation,
+   script_patch_32(hostdata, script, MessageLocation,
pScript + MSGOUT_OFFSET);
-   script_patch_32(hostdata->dev, script, StatusAddress,
+   script_patch_32(hostdata, script, StatusAddress,
pScript + STATUS_OFFSET);
-   script_patch_32(hostdata->dev, script, ReceiveMsgAddress,
+   script_patch_32(hostdata, script, ReceiveMsgAddress,
pScript + MSGIN_OFFSET);
 
hostdata->script = script;
@@ -395,8 +413,12 @@ NCR_700_release(struct Scsi_Host *host)
struct NCR_700_Host_Parameters *hostdata = 
(struct NCR_700_Host_Parameters *)host->hostdata[0];
 
-   dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
-  hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+   if (hostdata->noncoherent)
+   dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
+  hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+   else
+   dma_free_coherent(hostdata->dev, TOTAL_MEM_SIZE,
+ hostdata->script, hostdata->pScript);
return 1;
 }
 
@@ -804,8 +826,8 @@ process_extended_message(struct Scsi_Host *host,
shost_printk(KERN_WARNING, host,
"Unexpected SDTR msg\n");
hostdata->msgout[0] = A_REJECT_MSG;
-   dma_cache_sync(hostdata->dev, hostdata->msgout, 1, 
DMA_TO_DEVICE);
-   script_patch_16(hostdata->dev, hostdata->script,
+   dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+   script_patch_16(hostdata, hostdata->script,
MessageCount, 1);
/* SendMsgOut returns, so set up the return
 * address */
@@ -817,9 +839,8 @@ process_extended_message(struct Scsi_Host *host,
printk(KERN_INFO "scsi%d: (%d:%d), Unsolicited WDTR after CMD, 
Rejecting\n",
   host->host_no, pun, lun);
hostdata->msgout[0] = A_REJECT_MSG;
-   

[Nouveau] [PATCH 11/28] dma-mapping: add (back) arch_dma_mark_clean for ia64

2020-08-19 Thread Christoph Hellwig
Add back a hook to optimize dcache flushing after reading executable
code using DMA.  This gets ia64 out of the business of pretending to
be dma incoherent just for this optimization.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/Kconfig   |  3 +--
 arch/ia64/kernel/dma-mapping.c  | 14 +-
 arch/ia64/mm/init.c |  3 +--
 include/linux/dma-direct.h  |  3 +++
 include/linux/dma-noncoherent.h |  8 
 kernel/dma/Kconfig  |  6 ++
 kernel/dma/direct.c |  3 +++
 7 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 5b4ec80bf5863a..513ba0c5d33610 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -8,6 +8,7 @@ menu "Processor type and features"
 
 config IA64
bool
+   select ARCH_HAS_DMA_MARK_CLEAN
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
select ACPI
@@ -32,8 +33,6 @@ config IA64
select TTY
select HAVE_ARCH_TRACEHOOK
select HAVE_VIRT_CPU_ACCOUNTING
-   select DMA_NONCOHERENT_MMAP
-   select ARCH_HAS_SYNC_DMA_FOR_CPU
select VIRT_TO_BUS
select GENERIC_IRQ_PROBE
select GENERIC_PENDING_IRQ if SMP
diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
index 09ef9ce9988d1f..f640ed6fe1d576 100644
--- a/arch/ia64/kernel/dma-mapping.c
+++ b/arch/ia64/kernel/dma-mapping.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
-#include 
+#include 
 #include 
 
 /* Set this to 1 if there is a HW IOMMU in the system */
@@ -7,15 +7,3 @@ int iommu_detected __read_mostly;
 
 const struct dma_map_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
-
-void *arch_dma_alloc(struct device *dev, size_t size,
-   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
-{
-   return dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
-}
-
-void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
-   dma_addr_t dma_addr, unsigned long attrs)
-{
-   dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs);
-}
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 0b3fb4c7af2920..02e5aa08294ee0 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -73,8 +73,7 @@ __ia64_sync_icache_dcache (pte_t pte)
  * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
  * flush them when they get mapped into an executable vm-area.
  */
-void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
-   enum dma_data_direction dir)
+void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
 {
unsigned long pfn = PHYS_PFN(paddr);
 
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 5a3ce2a2479437..738485b3578062 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -153,6 +153,9 @@ static inline void dma_direct_sync_single_for_cpu(struct 
device *dev,
 
if (unlikely(is_swiotlb_buffer(paddr)))
swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
+
+   if (dir == DMA_FROM_DEVICE)
+   arch_dma_mark_clean(paddr, size);
 }
 
 static inline dma_addr_t dma_direct_map_page(struct device *dev,
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index ca09a4e07d2d3d..b9bc6c557ea46f 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -108,6 +108,14 @@ static inline void arch_dma_prep_coherent(struct page 
*page, size_t size)
 }
 #endif /* CONFIG_ARCH_HAS_DMA_PREP_COHERENT */
 
+#ifdef CONFIG_ARCH_HAS_DMA_MARK_CLEAN
+void arch_dma_mark_clean(phys_addr_t paddr, size_t size);
+#else
+static inline void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
+{
+}
+#endif /* ARCH_HAS_DMA_MARK_CLEAN */
+
 void *arch_dma_set_uncached(void *addr, size_t size);
 void arch_dma_clear_uncached(void *addr, size_t size);
 
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 847a9d1fa6343d..6cf7f7947ae797 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -43,6 +43,12 @@ config ARCH_HAS_DMA_SET_MASK
 config ARCH_HAS_DMA_WRITE_COMBINE
bool
 
+#
+# Select if the architectures provides the arch_dma_mark_clean hook
+#
+config ARCH_HAS_DMA_MARK_CLEAN
+   bool
+
 config DMA_DECLARE_COHERENT
bool
 
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index bb0041e9965975..1123e767f4315f 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -340,6 +340,9 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
if (unlikely(is_swiotlb_buffer(paddr)))
swiotlb_tbl_sync_single(dev, paddr, sg->length, dir,
SYNC_FOR_CPU);
+
+   if (dir == DMA_FROM_DEVICE)
+   arch_dma_mark_clean(paddr, sg->length);
}
 
if (!dev_is_dma_coherent(dev))
-- 
2.28.0

___
Nouveau mailing list

[Nouveau] [PATCH 09/28] MIPS/jazzdma: remove the unused vdma_remap function

2020-08-19 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/mips/include/asm/jazzdma.h |  2 -
 arch/mips/jazz/jazzdma.c| 70 -
 2 files changed, 72 deletions(-)

diff --git a/arch/mips/include/asm/jazzdma.h b/arch/mips/include/asm/jazzdma.h
index d13f940022d5f9..c831da7fa89803 100644
--- a/arch/mips/include/asm/jazzdma.h
+++ b/arch/mips/include/asm/jazzdma.h
@@ -10,8 +10,6 @@
  */
 extern unsigned long vdma_alloc(unsigned long paddr, unsigned long size);
 extern int vdma_free(unsigned long laddr);
-extern int vdma_remap(unsigned long laddr, unsigned long paddr,
- unsigned long size);
 extern unsigned long vdma_phys2log(unsigned long paddr);
 extern unsigned long vdma_log2phys(unsigned long laddr);
 extern void vdma_stats(void);  /* for debugging only */
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index 014773f0bfcd74..fe40dbed04c1d6 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -209,76 +209,6 @@ int vdma_free(unsigned long laddr)
 
 EXPORT_SYMBOL(vdma_free);
 
-/*
- * Map certain page(s) to another physical address.
- * Caller must have allocated the page(s) before.
- */
-int vdma_remap(unsigned long laddr, unsigned long paddr, unsigned long size)
-{
-   int first, pages;
-
-   if (laddr > 0xff) {
-   if (vdma_debug)
-   printk
-   ("vdma_map: Invalid logical address: %08lx\n",
-laddr);
-   return -EINVAL; /* invalid logical address */
-   }
-   if (paddr > 0x1fff) {
-   if (vdma_debug)
-   printk
-   ("vdma_map: Invalid physical address: %08lx\n",
-paddr);
-   return -EINVAL; /* invalid physical address */
-   }
-
-   pages = (((paddr & (VDMA_PAGESIZE - 1)) + size) >> 12) + 1;
-   first = laddr >> 12;
-   if (vdma_debug)
-   printk("vdma_remap: first=%x, pages=%x\n", first, pages);
-   if (first + pages > VDMA_PGTBL_ENTRIES) {
-   if (vdma_debug)
-   printk("vdma_alloc: Invalid size: %08lx\n", size);
-   return -EINVAL;
-   }
-
-   paddr &= ~(VDMA_PAGESIZE - 1);
-   while (pages > 0 && first < VDMA_PGTBL_ENTRIES) {
-   if (pgtbl[first].owner != laddr) {
-   if (vdma_debug)
-   printk("Trying to remap other's pages.\n");
-   return -EPERM;  /* not owner */
-   }
-   pgtbl[first].frame = paddr;
-   paddr += VDMA_PAGESIZE;
-   first++;
-   pages--;
-   }
-
-   /*
-* Update translation table
-*/
-   r4030_write_reg32(JAZZ_R4030_TRSTBL_INV, 0);
-
-   if (vdma_debug > 2) {
-   int i;
-   pages = (((paddr & (VDMA_PAGESIZE - 1)) + size) >> 12) + 1;
-   first = laddr >> 12;
-   printk("LADDR: ");
-   for (i = first; i < first + pages; i++)
-   printk("%08x ", i << 12);
-   printk("\nPADDR: ");
-   for (i = first; i < first + pages; i++)
-   printk("%08x ", pgtbl[i].frame);
-   printk("\nOWNER: ");
-   for (i = first; i < first + pages; i++)
-   printk("%08x ", pgtbl[i].owner);
-   printk("\n");
-   }
-
-   return 0;
-}
-
 /*
  * Translate a physical address to a logical address.
  * This will return the logical address of the first
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 01/28] mm: turn alloc_pages into an inline function

2020-08-19 Thread Christoph Hellwig
To prevent a compiler error when a method call alloc_pages is
added (which I plan to for the dma_map_ops).

Signed-off-by: Christoph Hellwig 
---
 include/linux/gfp.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 67a0774e080b98..dd2577c5407112 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -550,8 +550,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int 
order,
 #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 #else
-#define alloc_pages(gfp_mask, order) \
-   alloc_pages_node(numa_node_id(), gfp_mask, order)
+static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
+{
+   return alloc_pages_node(numa_node_id(), gfp_mask, order);
+}
 #define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
alloc_pages(gfp_mask, order)
 #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 18/28] dma-mapping: move the dma_declare_coherent_memory documentation

2020-08-19 Thread Christoph Hellwig
dma_declare_coherent_memory should not be in a DMA API guide aimed
at driver writers (that is consumers of the API).  Move it to a comment
near the function instead.

Signed-off-by: Christoph Hellwig 
---
 Documentation/core-api/dma-api.rst | 24 
 kernel/dma/coherent.c  | 17 +
 2 files changed, 17 insertions(+), 24 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst 
b/Documentation/core-api/dma-api.rst
index 3b3a4b9a6f..90239348b30f6f 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -586,30 +586,6 @@ the DMA_ATTR_NON_CONSISTENT flag starting at virtual 
address vaddr and
 continuing on for size.  Again, you *must* observe the cache line
 boundaries when doing this.
 
-::
-
-   int
-   dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
-   dma_addr_t device_addr, size_t size);
-
-Declare region of memory to be handed out by dma_alloc_coherent() when
-it's asked for coherent memory for this device.
-
-phys_addr is the CPU physical address to which the memory is currently
-assigned (this will be ioremapped so the CPU can access the region).
-
-device_addr is the DMA address the device needs to be programmed
-with to actually address this memory (this will be handed out as the
-dma_addr_t in dma_alloc_coherent()).
-
-size is the size of the area (must be multiples of PAGE_SIZE).
-
-As a simplification for the platforms, only *one* such region of
-memory may be declared per device.
-
-For reasons of efficiency, most platforms choose to track the declared
-region only at the granularity of a page.  For smaller allocations,
-you should use the dma_pool() API.
 
 Part III - Debug drivers use of the DMA-API
 ---
diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
index 2a0c4985f38e41..f85d14bbfcbe03 100644
--- a/kernel/dma/coherent.c
+++ b/kernel/dma/coherent.c
@@ -107,6 +107,23 @@ static int dma_assign_coherent_memory(struct device *dev,
return 0;
 }
 
+/*
+ * Declare a region of memory to be handed out by dma_alloc_coherent() when it
+ * is asked for coherent memory for this device.  This shall only be used
+ * from platform code, usually based on the device tree description.
+ * 
+ * phys_addr is the CPU physical address to which the memory is currently
+ * assigned (this will be ioremapped so the CPU can access the region).
+ *
+ * device_addr is the DMA address the device needs to be programmed with to
+ * actually address this memory (this will be handed out as the dma_addr_t in
+ * dma_alloc_coherent()).
+ *
+ * size is the size of the area (must be a multiple of PAGE_SIZE).
+ *
+ * As a simplification for the platforms, only *one* such region of memory may
+ * be declared per device.
+ */
 int dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
dma_addr_t device_addr, size_t size)
 {
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 03/28] drm/nouveau/gk20a: stop setting DMA_ATTR_NON_CONSISTENT

2020-08-19 Thread Christoph Hellwig
DMA_ATTR_NON_CONSISTENT is a no-op except on PARISC and some mips
configs, so don't set it in this ARM specific driver part.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
index 985f2990ab0dda..13d4d7ac0697b4 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -594,8 +594,7 @@ gk20a_instmem_new(struct nvkm_device *device, int index,
 
nvkm_info(>base.subdev, "using IOMMU\n");
} else {
-   imem->attrs = DMA_ATTR_NON_CONSISTENT |
- DMA_ATTR_WEAK_ORDERING |
+   imem->attrs = DMA_ATTR_WEAK_ORDERING |
  DMA_ATTR_WRITE_COMBINE;
 
nvkm_info(>base.subdev, "using DMA API\n");
-- 
2.28.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH 17/28] dma-mapping: move dma_common_{mmap, get_sgtable} out of mapping.c

2020-08-19 Thread Christoph Hellwig
Add a new file that contains helpera for misc DMA ops, which is only
built when CONFIG_DMA_OPS is set.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/Makefile  |  1 +
 kernel/dma/mapping.c | 47 +---
 kernel/dma/ops_helpers.c | 51 
 3 files changed, 53 insertions(+), 46 deletions(-)
 create mode 100644 kernel/dma/ops_helpers.c

diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index 32c7c1942bbd6c..dc755ab68aabf9 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_HAS_DMA)  += mapping.o direct.o
+obj-$(CONFIG_DMA_OPS)  += ops_helpers.o
 obj-$(CONFIG_DMA_OPS)  += dummy.o
 obj-$(CONFIG_DMA_CMA)  += contiguous.o
 obj-$(CONFIG_DMA_DECLARE_COHERENT) += coherent.o
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 0d129421e75fc8..848c95c27d79ff 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -8,7 +8,7 @@
 #include  /* for max_pfn */
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -295,22 +295,6 @@ void dma_sync_sg_for_device(struct device *dev, struct 
scatterlist *sg,
 }
 EXPORT_SYMBOL(dma_sync_sg_for_device);
 
-/*
- * Create scatter-list for the already allocated DMA buffer.
- */
-int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
-void *cpu_addr, dma_addr_t dma_addr, size_t size,
-unsigned long attrs)
-{
-   struct page *page = virt_to_page(cpu_addr);
-   int ret;
-
-   ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-   if (!ret)
-   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
-   return ret;
-}
-
 /*
  * The whole dma_get_sgtable() idea is fundamentally unsafe - it seems
  * that the intention is to allow exporting memory allocated via the
@@ -358,35 +342,6 @@ pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, 
unsigned long attrs)
 }
 #endif /* CONFIG_MMU */
 
-/*
- * Create userspace mapping for the DMA-coherent memory.
- */
-int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
-   void *cpu_addr, dma_addr_t dma_addr, size_t size,
-   unsigned long attrs)
-{
-#ifdef CONFIG_MMU
-   unsigned long user_count = vma_pages(vma);
-   unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   unsigned long off = vma->vm_pgoff;
-   int ret = -ENXIO;
-
-   vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs);
-
-   if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, ))
-   return ret;
-
-   if (off >= count || user_count > count - off)
-   return -ENXIO;
-
-   return remap_pfn_range(vma, vma->vm_start,
-   page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff,
-   user_count << PAGE_SHIFT, vma->vm_page_prot);
-#else
-   return -ENXIO;
-#endif /* CONFIG_MMU */
-}
-
 /**
  * dma_can_mmap - check if a given device supports dma_mmap_*
  * @dev: device to check
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
new file mode 100644
index 00..e443c69be4299f
--- /dev/null
+++ b/kernel/dma/ops_helpers.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Helpers for DMA ops implementations.  These generally rely on the fact that
+ * the allocated memory contains normal pages in the direct kernel mapping.
+ */
+#include 
+
+/*
+ * Create scatter-list for the already allocated DMA buffer.
+ */
+int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
+void *cpu_addr, dma_addr_t dma_addr, size_t size,
+unsigned long attrs)
+{
+   struct page *page = virt_to_page(cpu_addr);
+   int ret;
+
+   ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
+   if (!ret)
+   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
+   return ret;
+}
+
+/*
+ * Create userspace mapping for the DMA-coherent memory.
+ */
+int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
+   void *cpu_addr, dma_addr_t dma_addr, size_t size,
+   unsigned long attrs)
+{
+#ifdef CONFIG_MMU
+   unsigned long user_count = vma_pages(vma);
+   unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+   unsigned long off = vma->vm_pgoff;
+   int ret = -ENXIO;
+
+   vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs);
+
+   if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, ))
+   return ret;
+
+   if (off >= count || user_count > count - off)
+   return -ENXIO;
+
+   return remap_pfn_range(vma, vma->vm_start,
+   page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff,
+   user_count << PAGE_SHIFT, vma->vm_page_prot);
+#else
+   return -ENXIO;
+#endif /* CONFIG_MMU */
+}
-- 
2.28.0


[Nouveau] [PATCH 21/28] hal2: convert from dma_cache_sync to dma_sync_single_for_device

2020-08-19 Thread Christoph Hellwig
Use the proper modern API to transfer cache ownership for incoherent DMA.
This also means we can allocate the buffer memory with the proper
direction instead of bidirectional.

Signed-off-by: Christoph Hellwig 
---
 sound/mips/hal2.c | 44 
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/sound/mips/hal2.c b/sound/mips/hal2.c
index 746c410bd9bf11..c8e429a5f48f85 100644
--- a/sound/mips/hal2.c
+++ b/sound/mips/hal2.c
@@ -441,7 +441,8 @@ static inline void hal2_stop_adc(struct snd_hal2 *hal2)
hal2->adc.pbus.pbus->pbdma_ctrl = HPC3_PDMACTRL_LD;
 }
 
-static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
+static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec,
+   enum dma_data_direction buffer_dir)
 {
struct device *dev = hal2->card->dev;
struct hal2_desc *desc;
@@ -450,14 +451,14 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, 
struct hal2_codec *codec)
int i;
 
codec->buffer = dma_alloc_pages(dev, H2_BUF_SIZE, _dma,
-   DMA_BIDIRECTIONAL, GFP_KERNEL);
+   buffer_dir, GFP_KERNEL);
if (!codec->buffer)
return -ENOMEM;
desc = dma_alloc_pages(dev, count * sizeof(struct hal2_desc), _dma,
   DMA_BIDIRECTIONAL, GFP_KERNEL);
if (!desc) {
dma_free_pages(dev, H2_BUF_SIZE, codec->buffer, buffer_dma,
-   DMA_BIDIRECTIONAL);
+   buffer_dir);
return -ENOMEM;
}
codec->buffer_dma = buffer_dma;
@@ -470,20 +471,22 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, 
struct hal2_codec *codec)
  desc_dma : desc_dma + (i + 1) * sizeof(struct hal2_desc);
desc++;
}
-   dma_cache_sync(dev, codec->desc, count * sizeof(struct hal2_desc),
-  DMA_TO_DEVICE);
+   dma_sync_single_for_device(dev, codec->desc_dma,
+  count * sizeof(struct hal2_desc),
+  DMA_BIDIRECTIONAL);
codec->desc_count = count;
return 0;
 }
 
-static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
+static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec,
+   enum dma_data_direction buffer_dir)
 {
struct device *dev = hal2->card->dev;
 
dma_free_pages(dev, codec->desc_count * sizeof(struct hal2_desc),
   codec->desc, codec->desc_dma, DMA_BIDIRECTIONAL);
dma_free_pages(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma,
-   DMA_BIDIRECTIONAL);
+   buffer_dir);
 }
 
 static const struct snd_pcm_hardware hal2_pcm_hw = {
@@ -509,21 +512,16 @@ static int hal2_playback_open(struct snd_pcm_substream 
*substream)
 {
struct snd_pcm_runtime *runtime = substream->runtime;
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
-   int err;
 
runtime->hw = hal2_pcm_hw;
-
-   err = hal2_alloc_dmabuf(hal2, >dac);
-   if (err)
-   return err;
-   return 0;
+   return hal2_alloc_dmabuf(hal2, >dac, DMA_TO_DEVICE);
 }
 
 static int hal2_playback_close(struct snd_pcm_substream *substream)
 {
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
 
-   hal2_free_dmabuf(hal2, >dac);
+   hal2_free_dmabuf(hal2, >dac, DMA_TO_DEVICE);
return 0;
 }
 
@@ -579,7 +577,9 @@ static void hal2_playback_transfer(struct snd_pcm_substream 
*substream,
unsigned char *buf = hal2->dac.buffer + rec->hw_data;
 
memcpy(buf, substream->runtime->dma_area + rec->sw_data, bytes);
-   dma_cache_sync(hal2->card->dev, buf, bytes, DMA_TO_DEVICE);
+   dma_sync_single_for_device(hal2->card->dev,
+   hal2->dac.buffer_dma + rec->hw_data, bytes,
+   DMA_TO_DEVICE);
 
 }
 
@@ -597,22 +597,16 @@ static int hal2_capture_open(struct snd_pcm_substream 
*substream)
 {
struct snd_pcm_runtime *runtime = substream->runtime;
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
-   struct hal2_codec *adc = >adc;
-   int err;
 
runtime->hw = hal2_pcm_hw;
-
-   err = hal2_alloc_dmabuf(hal2, adc);
-   if (err)
-   return err;
-   return 0;
+   return hal2_alloc_dmabuf(hal2, >adc, DMA_FROM_DEVICE);
 }
 
 static int hal2_capture_close(struct snd_pcm_substream *substream)
 {
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
 
-   hal2_free_dmabuf(hal2, >adc);
+   hal2_free_dmabuf(hal2, >adc, DMA_FROM_DEVICE);
return 0;
 }
 
@@ -667,7 +661,9 @@ static void hal2_capture_transfer(struct snd_pcm_substream 
*substream,
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
unsigned char 

Re: [Nouveau] [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Tomasz Figa
Hi Christoph,

On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig  wrote:
>
> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,

Could you explain what makes you think it's unused? It's a feature of
the UAPI generally supported by the videobuf2 framework and relied on
by Chromium OS to get any kind of reasonable performance when
accessing V4L2 buffers in the userspace.

> and causes
> weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is
> unimplemented except on PARISC and some MIPS configs, and about to be
> removed.

It is implemented by the generic DMA mapping layer [1], which is used
by a number of architectures including ARM64 and supposed to be used
by new architectures going forward.

[1] https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341

When removing features from generic kernel code, I'd suggest first
providing viable alternatives for its users, rather than killing the
users altogether.

Given the above, I'm afraid I have to NAK this.

Best regards,
Tomasz

>
> Signed-off-by: Christoph Hellwig 
> ---
>  .../userspace-api/media/v4l/buffer.rst| 17 -
>  .../media/v4l/vidioc-reqbufs.rst  |  1 -
>  .../media/common/videobuf2/videobuf2-core.c   | 36 +--
>  .../common/videobuf2/videobuf2-dma-contig.c   | 19 --
>  .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
>  .../media/common/videobuf2/videobuf2-v4l2.c   | 12 ---
>  include/media/videobuf2-core.h|  3 +-
>  include/uapi/linux/videodev2.h|  2 --
>  8 files changed, 3 insertions(+), 90 deletions(-)
>
> diff --git a/Documentation/userspace-api/media/v4l/buffer.rst 
> b/Documentation/userspace-api/media/v4l/buffer.rst
> index 57e752aaf414a7..2044ed13cd9d7d 100644
> --- a/Documentation/userspace-api/media/v4l/buffer.rst
> +++ b/Documentation/userspace-api/media/v4l/buffer.rst
> @@ -701,23 +701,6 @@ Memory Consistency Flags
>  :stub-columns: 0
>  :widths:   3 1 4
>
> -* .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
> -
> -  - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
> -  - 0x0001
> -  - A buffer is allocated either in consistent (it will be automatically
> -   coherent between the CPU and the bus) or non-consistent memory. The
> -   latter can provide performance gains, for instance the CPU cache
> -   sync/flush operations can be avoided if the buffer is accessed by the
> -   corresponding device only and the CPU does not read/write to/from that
> -   buffer. However, this requires extra care from the driver -- it must
> -   guarantee memory consistency by issuing a cache flush/sync when
> -   consistency is needed. If this flag is set V4L2 will attempt to
> -   allocate the buffer in non-consistent memory. The flag takes effect
> -   only if the buffer is used for :ref:`memory mapping ` I/O and 
> the
> -   queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
> -   ` capability.
> -
>  .. c:type:: v4l2_memory
>
>  enum v4l2_memory
> diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst 
> b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> index 75d894d9c36c42..3180c111d368ee 100644
> --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> @@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit
>- This capability is set by the driver to indicate that the queue 
> supports
>  cache and memory management hints. However, it's only valid when the
>  queue is used for :ref:`memory mapping ` streaming I/O. See
> -:ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT 
> `,
>  :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 
> ` and
>  :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN `.
>
> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c 
> b/drivers/media/common/videobuf2/videobuf2-core.c
> index f544d3393e9d6b..66a41cef33c1b1 100644
> --- a/drivers/media/common/videobuf2/videobuf2-core.c
> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
> @@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q,
>  }
>  EXPORT_SYMBOL(vb2_verify_memory_type);
>
> -static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem)
> -{
> -   q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT;
> -
> -   if (!vb2_queue_allows_cache_hints(q))
> -   return;
> -   if (!consistent_mem)
> -   q->dma_attrs |= DMA_ATTR_NON_CONSISTENT;
> -}
> -
> -static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem)
> -{
> -   bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT);
> -
> -   if (consistent_mem != queue_is_consistent) {
> -   dprintk(q, 1, "memory consistency model mismatch\n");
> -   return false;
> -   }
> -   return true;
> -}
> -
>  int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
>