Re: disable large folios for shmem file used by xfs xfile

2024-01-10 Thread Christoph Hellwig
On Wed, Jan 10, 2024 at 07:38:43AM -0800, Andrew Morton wrote:
> I assume that kernels which contain 137db333b29186 ("xfs: teach xfile
> to pass back direct-map pages to caller") want this, so a Fixes: that
> and a cc:stable are appropriate?

I think it needs to go all the way back to 3934e8ebb7c
("xfs: create a big array data structure"), as that only clears the page
sized chunk of a new allocation and could lead to corruption or
information leaks.



Re: disable large folios for shmem file used by xfs xfile

2024-01-10 Thread Christoph Hellwig
On Wed, Jan 10, 2024 at 12:37:18PM +, Matthew Wilcox wrote:
> On Wed, Jan 10, 2024 at 10:21:07AM +0100, Christoph Hellwig wrote:
> > Hi all,
> > 
> > Darrick reported that the fairly new XFS xfile code blows up when force
> > enabling large folio for shmem.  This series fixes this quickly by disabling
> > large folios for this particular shmem file for now until it can be fixed
> > properly, which will be a lot more invasive.
> > 
> > I've added most of you to the CC list as I suspect most other users of
> > shmem_file_setup and friends will have similar issues.
> 
> The graphics users _want_ to use large folios.  I'm pretty sure they've
> been tested with this.  It's just XFS that didn't know about this
> feature of shmem.

At least sgx has all kinds of PAGE_SIZE assumptions.  I haven't audited
(and am probably not qualified to audit) that code, so I wanted to send
a heads-up to everyone.



[PATCH 2/2] xfs: disable large folio support in xfile_create

2024-01-10 Thread Christoph Hellwig
The xfarray code will crash if large folios are force enabled using:

   echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled

Fixing this will require a bit of an API change, and preferably sorting out
the hwpoison story for pages vs folios and where it is handled in the shmem
API.  For now use this one-liner to disable large folios.

Reported-by: Darrick J. Wong 
Signed-off-by: Christoph Hellwig 
---
 fs/xfs/scrub/xfile.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 090c3ead43fdf1..1a8d1bedd0b0dc 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -94,6 +94,11 @@ xfile_create(
 
	lockdep_set_class(&inode->i_rwsem, &xfile_i_mutex_key);
 
+   /*
+* We're not quite ready for large folios yet.
+*/
+   mapping_clear_large_folios(inode->i_mapping);
+
trace_xfile_create(xf);
 
*xfilep = xf;
-- 
2.39.2




[PATCH 1/2] mm: add a mapping_clear_large_folios helper

2024-01-10 Thread Christoph Hellwig
Users of shmem_kernel_file_setup might not be able to deal with large
folios (yet).  Give them a way to disable large folio support on their
mapping.

Signed-off-by: Christoph Hellwig 
---
 include/linux/pagemap.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 06142ff7f9ce0e..352d1f8423292c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -343,6 +343,20 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
	__set_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
 }
 
+/**
+ * mapping_clear_large_folios() - Disable large folio support for a mapping
+ * @mapping: The mapping.
+ *
+ * This can be called to undo the effect of mapping_set_large_folios().
+ *
+ * Context: This should not be called while the inode is active as it
+ * is non-atomic.
+ */
+static inline void mapping_clear_large_folios(struct address_space *mapping)
+{
+	__clear_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+}
+
 /*
  * Large folio support currently depends on THP.  These dependencies are
  * being worked on but are not yet fixed.
-- 
2.39.2




disable large folios for shmem file used by xfs xfile

2024-01-10 Thread Christoph Hellwig
Hi all,

Darrick reported that the fairly new XFS xfile code blows up when force
enabling large folio for shmem.  This series fixes this quickly by disabling
large folios for this particular shmem file for now until it can be fixed
properly, which will be a lot more invasive.

I've added most of you to the CC list as I suspect most other users of
shmem_file_setup and friends will have similar issues.



Re: [PATCH v2] fs: clean up usage of noop_dirty_folio

2023-08-28 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH] fs: clean up usage of noop_dirty_folio

2023-08-28 Thread Christoph Hellwig
On Mon, Aug 21, 2023 at 01:20:33PM +0100, Matthew Wilcox wrote:
> I was hoping Christoph would weigh in ;-)  I don't have a strong

I've enjoyed 2 weeks of almost uninterrupted vacation.

I agree with this patch and also your incremental improvements.




Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Thu, Jan 20, 2022 at 07:27:36AM -0800, Keith Busch wrote:
> It doesn't look like IOMMU page sizes are exported, or even necessarily
> consistently sized on at least one arch (power).

At the DMA API layer dma_get_merge_boundary is the API for it.
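
A rough sketch of what that gives you (illustrative only; "dev" is
whatever struct device the caller owns):

	/*
	 * The DMA layer reports the granule it can merge at as a mask;
	 * for the dma-iommu path that is the smallest supported IOMMU
	 * page size minus one, and 0 means no merging is possible.
	 */
	unsigned long merge_mask = dma_get_merge_boundary(dev);

	if (merge_mask)
		pr_info("device can merge at %lu byte granularity\n",
			merge_mask + 1);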



Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Tue, Jan 11, 2022 at 12:17:18AM -0800, John Hubbard wrote:
> Zooming in on the pinning aspect for a moment: last time I attempted to
> convert O_DIRECT callers from gup to pup, I recall wanting very much to
> record, in each bio_vec, whether these pages were acquired via FOLL_PIN,
> or some non-FOLL_PIN method. Because at the end of the IO, it is not
> easy to disentangle which pages require put_page() and which require
> unpin_user_page*().

I don't think that is a problem.  Pinning only needs to happen for
ITER_IOVEC, and the only non-user page there is the ZERO_PAGE added
for padding, which can be special-cased.
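
I.e. roughly something like this on the release side (a sketch only;
the helper name is made up):

	static void release_pinned_pages(struct page **pages, int nr)
	{
		int i;

		for (i = 0; i < nr; i++) {
			/* the padding ZERO_PAGE is never pinned; skip it */
			if (pages[i] == ZERO_PAGE(0))
				continue;
			unpin_user_page(pages[i]);
		}
	}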



Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Tue, Jan 11, 2022 at 04:26:48PM -0400, Jason Gunthorpe wrote:
> What I did in RDMA was make an iterator rdma_umem_for_each_dma_block()
> 
> The driver passes in the page size it wants and the iterator breaks up
> the SGL into that size.
> 
> So, eg on a 16k page size system the SGL would be full of 16K stuff,
> but the driver only support 4k and so the iterator hands out 4 pages
> for each SGL entry.
> 
> All the drivers use this to build their DMA lists and tables, it works
> really well.

The block layer also has the equivalent functionality by setting the
virt_boundary value in the queue_limits.  This is needed for NVMe
PRPs and RDMA drivers.
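
E.g. roughly what the nvme driver does (sketch; NVME_CTRL_PAGE_SIZE is
its name for the controller page size):

	/*
	 * With this virt_boundary every segment except the first is
	 * guaranteed to start on a controller page boundary, which is
	 * exactly the PRP list constraint.
	 */
	blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1);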



Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Wed, Jan 12, 2022 at 06:37:03PM +, Matthew Wilcox wrote:
> But let's go further than that (which only brings us to 32 bytes per
> range).  For the systems you care about which use an identity mapping,
> and have sizeof(dma_addr_t) == sizeof(phys_addr_t), we can simply
> point the dma_range pointer to the same memory as the phyr.  We just
> have to not free it too early.  That gets us down to 16 bytes per range,
> a saving of 33%.

Even without an IOMMU the dma_addr_t can have offsets vs the actual
physical address.  Not on x86 except for a weirdo SOC, but just about
everywhere else.



Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Tue, Jan 11, 2022 at 11:01:42AM -0400, Jason Gunthorpe wrote:
> Then we are we using get_user_phyr() at all if we are just storing it
> in a sg?

I think we need to stop calling the output of the phyr dma map
helper a sg.  Yes, a { dma_addr, len } tuple is scatter/gather I/O in its
purest form, but it will need a different name in Linux as the scatterlist
will have to stay around for a long time before all that mess is cleaned
up.



Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Mon, Jan 10, 2022 at 08:41:26PM -0400, Jason Gunthorpe wrote:
> > Finally, it may be possible to stop using scatterlist to describe the
> > input to the DMA-mapping operation.  We may be able to get struct
> > scatterlist down to just dma_address and dma_length, with chaining
> > handled through an enclosing struct.
> 
> Can you talk about this some more? IMHO one of the key properties of
> the scatterlist is that it can hold huge amounts of pages without
> having to do any kind of special allocation due to the chaining.
> 
> The same will be true of the phyr idea right?

No special allocations as in no vmalloc?  The chaining still has to
allocate memory using a mempool.

Anyway, to explain my idea which is very similar but not identical to
the one willy has:

 - on the input side to dma mapping the bio_vecs (or phyrs) are chained
   as bios or whatever the containing structure is.  These already exist
   and have infrastructure at least in the block layer
 - on the output side I plan for two options:

1) we have a sane IOMMU and everything will be coalesced into a
   single dma_range.  This requires setting the block layer
   merge boundary to match the IOMMU page size, but that is
   a very good thing to do anyway.
2) we have no IOMMU (or a weird one) and get one output dma_range
   per input bio_vec.  We'd either have to support chaining or use
   vmalloc or huge numbers of entries.

> If you limit to that scenario then we can be more optimal because
> things like byte granular offsets and size in the interior pages don't
> need to exist. Every interior chunk is always aligned to its order and
> we only need to record the order.

The block layer does need small offsets.  Direct I/O can often be
512 byte aligned, and some other passthrough commands can have even
smaller alignment, although I don't think we ever go below 4-byte
alignment anywhere in the block layer.

> IMHO storage density here is quite important, we end up having to keep
> this stuff around for a long time.

If we play these tricks it won't be general purpose enough to get rid
of the existing scatterlist usage.



Re: Phyr Starter

2022-01-20 Thread Christoph Hellwig
On Mon, Jan 10, 2022 at 07:34:49PM +, Matthew Wilcox wrote:
> TLDR: I want to introduce a new data type:
> 
> struct phyr {
> phys_addr_t addr;
> size_t len;
> };
> 
> and use it to replace bio_vec as well as using it to replace the array
> of struct pages used by get_user_pages() and friends.

FYI, I've done a fair amount of work (some already mainline as in all
the helpers for biovec page access), some of it still waiting (switching
more users over to these helpers and cleaning up some other mess)
to move bio_vecs into a form like that.  The difference in my plan
is to have a u32 len, both to allow for flags space on 64-bit which
we might need to support things like P2P without dev_pagemap structures.
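
In other words, roughly this shape (just a sketch, nothing settled):

	struct phyr {
		phys_addr_t	addr;
		u32		len;
		u32		flags;	/* e.g. a P2P marker; exact bits TBD */
	};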

> Finally, it may be possible to stop using scatterlist to describe the
> input to the DMA-mapping operation.  We may be able to get struct
> scatterlist down to just dma_address and dma_length, with chaining
> handled through an enclosing struct.

Yes, I have some prototype code that takes a bio_vec as input and
returns an array of

struct dma_range {
dma_addr_t  addr;
u32 len;
}

I need to get back to it, and especially back to the question of whether
this needs the chaining support that the current scatterlist has.



Re: [PATCH v2 1/4] x86/mm: Export force_dma_unencrypted

2019-09-04 Thread Christoph Hellwig
On Wed, Sep 04, 2019 at 09:32:30AM +0200, Thomas Hellström (VMware) wrote:
> That sounds great. Is there anything I can do to help out? I thought this
> was more or less a dead end since the current dma_mmap_ API requires the
> mmap_sem to be held in write mode (modifying the vma->vm_flags) whereas
> fault() only offers read mode. But that would definitely work.

We'll just need to split this into a setup and a fault phase.  I have some
sketches from a while ago, let me dust them off so that you can
try them.

> "If it's the latter, then I would like to reiterate that it would be better
> that we work to come up with a long term plan to add what's missing to the
> DMA api to help graphics drivers use coherent memory?"

I don't think we need a long term plan.  We've been adding features
on an as-needed basis.  And now that we have significantly fewer
implementations of the API this actually becomes much easier as well.


Re: [PATCH v2 1/4] x86/mm: Export force_dma_unencrypted

2019-09-03 Thread Christoph Hellwig
On Tue, Sep 03, 2019 at 04:32:45PM +0200, Thomas Hellström (VMware) wrote:
> Is this a layer violation concern, that is, would you be ok with a similar
> helper for TTM, or is it that you want to force the graphics drivers into
> adhering strictly to the DMA api, even when it from an engineering
> perspective makes no sense?

From looking at DRM I strongly believe that making DRM use the DMA
mapping API properly makes a lot of sense from the engineering perspective,
and this series is a good argument for that position.  If DRM was using
the DMA API properly we would not need this series to start with, as all the
SEV handling is hidden behind the DMA API.  While we had occasional
bugs in that support, fixing them meant that the fix covered all drivers
properly using that API.


Re: [PATCH v2 2/4] s390/mm: Export force_dma_unencrypted

2019-09-03 Thread Christoph Hellwig
On Tue, Sep 03, 2019 at 03:15:02PM +0200, Thomas Hellström (VMware) wrote:
> From: Thomas Hellstrom 
> 
> The force_dma_unencrypted symbol is needed by TTM to set up the correct
> page protection when memory encryption is active. Export it.

Same here.  None of a driver's business.  DMA decisions are hidden
behind the DMA API.


Re: [PATCH v2 1/4] x86/mm: Export force_dma_unencrypted

2019-09-03 Thread Christoph Hellwig
On Tue, Sep 03, 2019 at 03:15:01PM +0200, Thomas Hellström (VMware) wrote:
> From: Thomas Hellstrom 
> 
> The force_dma_unencrypted symbol is needed by TTM to set up the correct
> page protection when memory encryption is active. Export it.

NAK.  This is a helper for the core DMA code and drivers have no
business looking at it.


Re: [PATCHv2 2/2] i915: do not leak module ref counter

2019-08-19 Thread Christoph Hellwig
On Tue, Aug 20, 2019 at 12:13:59PM +0900, Sergey Senozhatsky wrote:
> Always put_filesystem() in i915_gemfs_init().
> 
> Signed-off-by: Sergey Senozhatsky 
> ---
>  - v2: rebased (i915 does not remount gemfs anymore)

Which means it really doesn't need its own mount anymore, and thus can use
plain old shmem_file_setup and doesn't need to mess with file system
types at all.

Assuming we find a legitimate reason why a driver should be able
to create a kernel mount of a file system type where it doesn't have
access to the struct file_system_type, an API that mounts by file system
name and thus hides get_fs_type and put_filesystem would be a much
better API than adding this random export.


Re: [PATCH] drm/virtio: use virtio_max_dma_size

2019-08-10 Thread Christoph Hellwig
On Thu, Aug 08, 2019 at 05:34:45PM +0200, Gerd Hoffmann wrote:
> We must make sure our scatterlist segments are not too big, otherwise
> we might see swiotlb failures (happens with sev, also reproducable with
> swiotlb=force).

Btw, any chance I could also draft you to replace the remaining
abuses of swiotlb_max_segment and swiotlb_nr_tbl in drm with a proper
DMA-level interface?
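
The kind of thing I mean, as a sketch ("dev" being the struct device
doing the DMA; dma_max_mapping_size() is the obvious candidate for the
segment-size part):

	/*
	 * Cap the scatterlist segment size based on what the DMA layer
	 * can map instead of poking at swiotlb internals.
	 */
	unsigned int max_segment =
		min_t(size_t, UINT_MAX, dma_max_mapping_size(dev));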


Re: [PATCH for-5.3] drm/omap: ensure we have a valid dma_mask

2019-08-09 Thread Christoph Hellwig
On Fri, Aug 09, 2019 at 01:00:38PM +0300, Tomi Valkeinen wrote:
> Alright, thanks for the clarification!
> 
> Here's my version.

Looks good to me:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 1/2] drm: add cache support for arm64

2019-08-09 Thread Christoph Hellwig
On Thu, Aug 08, 2019 at 09:44:32AM -0700, Rob Clark wrote:
> > GFP_HIGHUSER basically just means that this is an allocation that could
> > dip into highmem, in which case it would not have a kernel mapping.
> > This can happen on arm + LPAE, but not on arm64.
> 
> Just a dumb question, but why is *all* memory in the linear map on
> arm64?  It would seem useful to have a source of pages that is not in
> the linear map.
> I guess it is mapped as huge pages (or something larger than 4k pages)?

In general that is just how the Linux kernel always worked, on all
architectures - we always had a linear mapping for all memory in the
kernel to make accessing it simple.  That started to break down a bit
with the 32-bit x86 PAE mode that supported more physical address space
than virtual, which required the "high" memory to not be mapped into
the kernel direct mapping.  Similar schemes later showed up on various
other 32-bit architectures.

There is a patchset called XPFO that ensures memory is either in the
kernel direct map, or in userspace but not both to work around
speculation related vulnerabilities, but it has a pretty big performance
impact.

> Any recommended reading to understand how/why the kernel address space
> is setup the way it is (so I can ask fewer dumb questions)?

I don't really have a good pointer.  But usually there are just dumb
answers, not dumb questions.


Re: [PATCH 1/2] drm: add cache support for arm64

2019-08-09 Thread Christoph Hellwig
On Thu, Aug 08, 2019 at 01:58:08PM +0200, Daniel Vetter wrote:
> > > We use shmem to get at swappable pages. We generally just assume that
> > > the gpu can get at those pages, but things fall apart in fun ways:
> > > - some setups somehow inject bounce buffers. Some drivers just give
> > > up, others try to allocate a pool of pages with dma_alloc_coherent.
> > > - some devices are misdesigned and can't access as much as the cpu. We
> > > allocate using GFP_DMA32 to fix that.
> > 
> > Well, for shmem you can't really call allocators directly, right?
> 
> We can pass gfp flags to shmem_read_mapping_page_gfp, which is just about
> enough for the 2 cases on intel platforms where the gpu can only access
> 4G, but the cpu has way more.

Right.  And that works for architectures without weird DMA offsets and
devices that have exactly a 32-bit DMA limit.  It falls flat for all
the more complex ones, unfortunately.

> > But userspace malloc really means dma_map_* anyway, so not really
> > relevant for memory allocations.
> 
> It does tie in, since we'll want a dma_map which fails if a direct mapping
> isn't possible. It also helps the driver code a lot if we could use the
> same low-level flushing functions between our own memory (whatever that
> is) and anon pages from malloc. And in all the cases if it's not possible,
> we want a failure, not elaborate attempts at hiding the differences
> between all possible architectures out there.

At the very lowest level it all comes down to the same three primitives we
talked about anyway, but there are different ways they can be combined.
For the streaming mappings look at the table in arch/arc/mm/dma.c I
mentioned earlier.  For memory that is prepared just for mmaping to
userspace without a kernel user we'll always do a wb+inv.  But as the
other subthread shows, we'll eventually need to look into unmapping
(or remapping with the same attributes) that memory in kernel space
to avoid speculation bugs (or just invalid combinations on x86, where
we check for that), so the API will be a little more complex.

Btw, are all DRM drivers using vmf_insert_* to pre-populate the mapping
like the MSM case, or are some doing dynamic faulting from
vm_ops->fault?


Re: [PATCH for-5.3] drm/omap: ensure we have a valid dma_mask

2019-08-09 Thread Christoph Hellwig
On Fri, Aug 09, 2019 at 09:40:32AM +0300, Tomi Valkeinen wrote:
> We do call dma_set_coherent_mask() in omapdrm's probe() (in omap_drv.c), 
> but apparently that's not enough anymore. Changing that call to 
> dma_coerce_mask_and_coherent() removes the WARN. I can create a patch for 
> that, or Christoph can respin this one.

Oh, yes - that actually is the right thing to do here.  If you already
have it please just send it out.

>
> I am not too familiar with the dma mask handling, so maybe someone can 
> educate:
>
> dma_coerce_mask_and_coherent() overwrites dev->dma_mask. Isn't that a bad 
> thing? What if the platform has set dev->dma_mask, and the driver 
> overwrites it with its value? Or who is supposed to set dev->dma_mask?

->dma_mask is a complete mess.  It is a pointer when it really should
just be a u64, and that means every driver layer has to allocate space
for it.  We don't really do that for platform_devices, as that horribly
breaks assumptions in the usb code.  That is why
dma_coerce_mask_and_coherent exists as a nasty workaround that sets
the dma_mask to the coherent_dma_mask for devices that don't have
space for ->dma_mask allocated, which works as long as the device
doesn't have different addressing requirements for both.

I'm actually working to fix that mess up at the moment, but it is going
to take a few cycles until everything falls into place.


[PATCH for-5.3] drm/omap: ensure we have a valid dma_mask

2019-08-08 Thread Christoph Hellwig
The omapfb platform device does not have a DMA mask set.  The
traditional arm DMA code ignores that, but the generic dma-direct/swiotlb
code has stricter checks and thus fails mappings without a DMA mask.
As we use swiotlb for arm with LPAE now, omap needs to catch up
and actually set a DMA mask.

Fixes: ad3c7b18c5b3 ("arm: use swiotlb for bounce buffering on LPAE configs")
Reported-by: "H. Nikolaus Schaller" 
Tested-by: "H. Nikolaus Schaller" 
Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/omapdrm/omap_fbdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/omapdrm/omap_fbdev.c 
b/drivers/gpu/drm/omapdrm/omap_fbdev.c
index 561c4812545b..2c8abf07e617 100644
--- a/drivers/gpu/drm/omapdrm/omap_fbdev.c
+++ b/drivers/gpu/drm/omapdrm/omap_fbdev.c
@@ -232,6 +232,8 @@ void omap_fbdev_init(struct drm_device *dev)
if (!priv->num_pipes)
return;
 
+   dma_coerce_mask_and_coherent(dev->dev, DMA_BIT_MASK(32));
+
fbdev = kzalloc(sizeof(*fbdev), GFP_KERNEL);
if (!fbdev)
goto fail;
-- 
2.20.1



Re: [PATCH 1/2] drm: add cache support for arm64

2019-08-08 Thread Christoph Hellwig
On Wed, Aug 07, 2019 at 09:09:53AM -0700, Rob Clark wrote:
> > > (Eventually I'd like to support pages passed in from userspace.. but
> > > that is down the road.)
> >
> > Eww.  Please talk to the iommu list before starting on that.
> 
> This is more of a long term goal, we can't do it until we have
> per-context/process pagetables, ofc.
> 
> Getting a bit off topic, but I'm curious about what problems you are
> concerned about.  Userspace can shoot it's own foot, but if it is not
> sharing GPU pagetables with other processes, it can't shoot other's
> feet.  (I'm guessing you are concerned about non-page-aligned
> mappings?)

Maybe I misunderstood what you meant above, I thought you meant messing
with page cachability attributes for userspace pages.  If what you are
looking into is just "standard" SVM, I only hope that our APIs for that,
which currently are a mess, are in shape by then, as all users currently
have their own crufty and at least slightly buggy versions of that.  But
at least it is an issue that is being worked on.

> > So back to the question, I'd like to understand your use case (and
> > maybe hear from the other drm folks if that is common):
> >
> >  - you allocate pages from shmem (why shmem, btw?  if this is done by
> >other drm drivers how do they guarantee addressability without an
> >iommu?)
> 
> shmem for swappable pages.  I don't unpin and let things get swapped
> out yet, but I'm told it starts to become important when you have 50
> browser tabs open ;-)

Yes, but at that point the swapping can use the kernel linear mapping
and we are running into aliasing problems that can disturb the cache.  So
as-is this is going to be problematic without new hooks into shmemfs.

> >  - then the memory is either mapped to userspace or vmapped (or even
> >both, althrough the lack of aliasing you mentioned would speak
> >against it) as writecombine (aka arm v6+ normal uncached).  Does
> >the mapping live on until the memory is freed?
> 
> (side note, *most* of the drm/msm supported devices are armv8, the
> exceptions are 8060 and 8064 which are armv7.. I don't think drm/msm
> will ever have to deal w/ armv6)

Well, the point was that starting from v6 the kernel's DMA uncached
mapping really is write combine.  So that applies to v7 and v8 as well.


Re: [PATCH 1/2] drm: add cache support for arm64

2019-08-08 Thread Christoph Hellwig
On Wed, Aug 07, 2019 at 10:48:56AM +0200, Daniel Vetter wrote:
> >other drm drivers how do they guarantee addressability without an
> >iommu?)
> 
> We use shmem to get at swappable pages. We generally just assume that
> the gpu can get at those pages, but things fall apart in fun ways:
> - some setups somehow inject bounce buffers. Some drivers just give
> up, others try to allocate a pool of pages with dma_alloc_coherent.
> - some devices are misdesigned and can't access as much as the cpu. We
> allocate using GFP_DMA32 to fix that.

Well, for shmem you can't really call allocators directly, right?

One thing I have in my pipeline is a dma_alloc_pages API that allocates
pages that are guaranteed to be addressable by the device, or otherwise
fails.  But that doesn't really help with the shmem fs.
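
The rough shape I have in mind (prototype, so treat every detail here as
subject to change):

	struct page *dma_alloc_pages(struct device *dev, size_t size,
			dma_addr_t *dma_handle, enum dma_data_direction dir,
			gfp_t gfp);

	/* usage sketch: the pages are addressable by dev or we fail */
	struct page *page;
	dma_addr_t dma_addr;

	page = dma_alloc_pages(dev, size, &dma_addr, DMA_BIDIRECTIONAL,
			       GFP_KERNEL);
	if (!page)
		return -ENOMEM;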

> Also modern gpu apis pretty much assume you can malloc() and then use
> that directly with the gpu.

Which is fine as long as the GPU itself supports full 64-bit addressing
(or always sits behind an iommu), and the platform doesn't impose an
addressing limit, which unfortunately some that are shipping right now
still do :(

But userspace malloc really means dma_map_* anyway, so not really
relevant for memory allocations.


Re: [PATCH 1/2] drm: add cache support for arm64

2019-08-08 Thread Christoph Hellwig
On Wed, Aug 07, 2019 at 10:30:04AM -0700, Rob Clark wrote:
> So, we do end up using GFP_HIGHUSER, which appears to get passed thru
> when shmem gets to the point of actually allocating pages.. not sure
> if that just ends up being a hint, or if it guarantees that we don't
> get something in the linear map.
> 
> (Bear with me while I "page" this all back in.. last time I dug thru
> the shmem code was probably pre-armv8, or at least before I had any
> armv8 hw)

GFP_HIGHUSER basically just means that this is an allocation that could
dip into highmem, in which case it would not have a kernel mapping.
This can happen on arm + LPAE, but not on arm64.


Re: [PATCH 1/2] drm: add cache support for arm64

2019-08-08 Thread Christoph Hellwig
On Wed, Aug 07, 2019 at 05:49:59PM +0100, Mark Rutland wrote:
> I'm fairly confident that the linear/direct map cacheable alias is not
> torn down when pages are allocated. The gneeric page allocation code
> doesn't do so, and I see nothing the shmem code to do so.

It is not torn down anywhere.

> For arm64, we can tear down portions of the linear map, but that has to
> be done explicitly, and this is only possible when using rodata_full. If
> not using rodata_full, it is not possible to dynamically tear down the
> cacheable alias.

Interesting.  For this or the next merge window I plan to add support to the
generic DMA code to remap pages as uncachable in place based on the
openrisc code.  As far as I can tell the requirement for that is
basically just that the kernel direct mapping doesn't use PMD or bigger
mappings, so that it supports changing protection bits on a per-PTE basis.
Is that the case with arm64 + rodata_full?

> > My understanding is that a cacheable alias is "ok", with some
> > caveats.. ie. that the cacheable alias is not accessed.  
> 
> Unfortunately, that is not true. You'll often get away with it in
> practice, but that's a matter of probability rather than a guarantee.
> 
> You  cannot prevent a CPU from accessing a VA arbitrarily (e.g. as the
> result of wild speculation). The ARM ARM (ARM DDI 0487E.a) points this
> out explicitly:

Well, if we want to fix this properly we'll have to remap in place
for dma_alloc_coherent and friends.


Re: [PATCH 1/2] drm: add cache support for arm64

2019-08-08 Thread Christoph Hellwig
On Wed, Aug 07, 2019 at 01:38:08PM +0100, Mark Rutland wrote:
> > I *believe* that there are not alias mappings (that I don't control
> > myself) for pages coming from
> > shmem_file_setup()/shmem_read_mapping_page()..  
> 
> AFAICT, that's regular anonymous memory, so there will be a cacheable
> alias in the linear/direct map.

Yes.  Although shmem is in no way special in that regard.  Even with the
normal dma_alloc_coherent implementation on arm and arm64 we keep the
cacheable alias in the direct mapping and just create a new non-cacheable
one.  The only exceptions are CMA allocations on 32-bit arm, which do
get remapped to uncachable in place.


Re: drm pull for v5.3-rc1

2019-08-07 Thread Christoph Hellwig
On Tue, Aug 06, 2019 at 12:09:38PM -0700, Matthew Wilcox wrote:
> Has anyone looked at turning the interface inside-out?  ie something like:
> 
>   struct mm_walk_state state = { .mm = mm, .start = start, .end = end, };
> 
>   for_each_page_range(&state, page) {
>   ... do something with page ...
>   }
> 
> with appropriate macrology along the lines of:
> 
> #define for_each_page_range(state, page)  \
>   while ((page = page_range_walk_next(state)))
> 
> Then you don't need to package anything up into structs that are shared
> between the caller and the iterated function.

I'm not all that huge a fan of super magic macro loops.  But in this
case I don't see how it could even work, as we get special callbacks
for huge pages and holes, and people are trying to add a few more ops
as well.


Re: drm pull for v5.3-rc1

2019-08-07 Thread Christoph Hellwig
On Tue, Aug 06, 2019 at 11:50:42AM -0700, Linus Torvalds wrote:
> 
> In fact, I do note that a lot of the users don't actually use the
> "void *private" argument at all - they just want the walker - and just
> pass in a NULL private pointer. So we have things like this:
> 
> > +   if (walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops,
> > +   NULL)) {
> 
> and in a perfect world we'd have arguments with default values so that
> we could skip those entirely for when people just don't need it.
> 
> I'm not a huge fan of C++ because of a lot of the complexity (and some
> really bad decisions), but many of the _syntactic_ things in C++ would
> be nice to use. This one doesn't seem to be one that the gcc people
> have picked up as an extension ;(
> 
> Yes, yes, we could do it with a macro, I guess.
> 
>#define walk_page_range(mm, start,end, ops, ...) \
>__walk_page_range(mm, start, end, (NULL , ## __VA_ARGS__))
> 
> but I'm not sure it's worthwhile.

Given that it is just a single argument I'm not too worried.  A simpler
and a more complex variant seem more useful if we can skip a few
arguments, IMHO.


Re: [PATCHv2 2/3] i915: convert to new mount API

2019-08-07 Thread Christoph Hellwig
On Tue, Aug 06, 2019 at 12:50:10AM -0700, Hugh Dickins wrote:
> Though personally I'm averse to managing "f"objects through
> "m"interfaces, which can get ridiculous (notably, MADV_HUGEPAGE works
> on the virtual address of a mapping, but the huge-or-not alignment of
> that mapping must have been decided previously).  In Google we do use
> fcntls F_HUGEPAGE and F_NOHUGEPAGE to override on a per-file basis -
> one day I'll get to upstreaming those.

Such an interface seems very useful, although the two fcntls seem a bit
odd.

But I think the point here is that the i915 has its own somewhat odd
instance of tmpfs.  If we could pass the equivalent of the huge=*
options to shmem_file_setup all that garbage (including the
shmem_file_setup_with_mnt function) could go away.


Re: [PATCH 1/2] drm: add cache support for arm64

2019-08-07 Thread Christoph Hellwig
On Tue, Aug 06, 2019 at 09:23:51AM -0700, Rob Clark wrote:
> On Tue, Aug 6, 2019 at 8:50 AM Christoph Hellwig  wrote:
> >
> > On Tue, Aug 06, 2019 at 07:11:41AM -0700, Rob Clark wrote:
> > > Agreed that drm_cflush_* isn't a great API.  In this particular case
> > > (IIUC), I need wb+inv so that there aren't dirty cache lines that drop
> > > out to memory later, and so that I don't get a cache hit on
> > > uncached/wc mmap'ing.
> >
> > So what is the use case here?  Allocate pages using the page allocator
> > (or CMA for that matter), and then mmaping them to userspace and never
> > touching them again from the kernel?
> 
> Currently, it is pages coming from tmpfs.  Ideally we want pages that
> are swappable when unpinned.

tmpfs is basically a (complicated) frontend for alloc_pages as far
as page allocation is concerned.

> CPU mappings are *mostly* just mapping to userspace.  There are a few
> exceptions that are vmap'd (fbcon, and ringbuffer).

And those use the same backend?

> (Eventually I'd like to support pages passed in from userspace.. but
> that is down the road.)

Eww.  Please talk to the iommu list before starting on that.

> > > Tying it in w/ iommu seems a bit weird to me.. but maybe that is just
> > > me, I'm certainly willing to consider proposals or to try things and
> > > see how they work out.
> >
> > This was just my through as the fit seems easy.  But maybe you'll
> > need to explain your use case(s) a bit more so that we can figure out
> > what a good high level API is.
> 
> Tying it to iommu_map/unmap would be awkward, as we could need to
> setup cpu mmap before it ends up mapped to iommu.  And the plan to
> support per-process pagetables involved creating an iommu_domain per
> userspace gl context.. some buffers would end up mapped into multiple
> contexts/iommu_domains.
> 
> If the cache operation was detached from iommu_map/unmap, then it
> would seem weird to be part of the iommu API.
> 
> I guess I'm not entirely sure what you had in mind, but this is why
> iommu seemed to me like a bad fit.

So back to the question, I'd like to understand your use case (and
maybe hear from the other drm folks if that is common):

 - you allocate pages from shmem (why shmem, btw?  if this is done by
   other drm drivers how do they guarantee addressability without an
   iommu?)
 - then the memory is either mapped to userspace or vmapped (or even
   both, although the lack of aliasing you mentioned would speak
   against it) as writecombine (aka arm v6+ normal uncached).  Does
   the mapping live on until the memory is freed?
 - as you mention swapping - how do you guarantee there are no
   aliases in the kernel direct mapping after the page has been swapped
   in?
 - then the memory is potentially mapped to the iommu.  Is it using
   a long-living mapping, or does get unmapped/remapped repeatedly?
 


Re: [PATCH 1/2] drm: add cache support for arm64

2019-08-06 Thread Christoph Hellwig
On Tue, Aug 06, 2019 at 07:11:41AM -0700, Rob Clark wrote:
> Agreed that drm_cflush_* isn't a great API.  In this particular case
> (IIUC), I need wb+inv so that there aren't dirty cache lines that drop
> out to memory later, and so that I don't get a cache hit on
> uncached/wc mmap'ing.

So what is the use case here?  Allocate pages using the page allocator
(or CMA for that matter), and then mmaping them to userspace and never
touching them again from the kernel?

> Tying it in w/ iommu seems a bit weird to me.. but maybe that is just
> me, I'm certainly willing to consider proposals or to try things and
> see how they work out.

This was just my thought as the fit seems easy.  But maybe you'll
need to explain your use case(s) a bit more so that we can figure out
what a good high level API is.

> Exposing the arch_sync_* API and using that directly (bypassing
> drm_cflush_*) actually seems pretty reasonable and pragmatic.  I did
> have one doubt, as phys_to_virt() is only valid for kernel direct
> mapped memory (AFAIU), what happens for pages that are not in kernel
> linear map?  Maybe it is ok to ignore those pages, since they won't
> have an aliased mapping?

They could have an aliased mapping in vmalloc/vmap space for example,
if you created one.  We have the flush_kernel_vmap_range /
invalidate_kernel_vmap_range APIs for those, which are implemented
on architectures with VIVT caches.
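
The usage pattern is roughly this (sketch; buf/len stand for the vmap
alias and its size):

	/* write back the vmap alias before the device reads the buffer */
	flush_kernel_vmap_range(buf, len);

	/* ... start the I/O and wait for it to complete ... */

	/*
	 * drop any stale lines in the alias before the CPU reads data
	 * the device just wrote
	 */
	invalidate_kernel_vmap_range(buf, len);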


Re: [PATCH 1/2] drm: add cache support for arm64

2019-08-06 Thread Christoph Hellwig
On Tue, Aug 06, 2019 at 11:38:16AM +0200, Daniel Vetter wrote:
> I just read through all the arch_sync_dma_for_device/cpu functions and
> none seem to use the struct *dev argument. Iirc you've said that's on the
> way out?

Not actively on the way out yet, but now that we support all
architectures it becomes pretty clear we don't need it.  So yes, we
could eventually remove it.  But..

> That dev parameter is another holdup for the places where we do not yet
> know what the new device will be (e.g. generic dma-buf exporters like
> vgem). And sprinkling a fake dev or passing NULL is a bit silly.

This is where it becomes interesting.  Because while we don't need the
dev argument for the low-level arch API, we do need it at the driver
API level, given that in some systems both dma coherent and non-coherent
devices exist, and we need to decide based on that.

> Add a HAVE_ARCH_SYNC_DMA and the above refactor (assuming it's ok to roll
> out everywhere) and we should indeed be able to use this. We still need to
> have all the others for x86 and all that. Plus I guess we should roll out
> the split into invalidate and flush.

Well, we'll still need a defined API for buffer ownership vs device.
Just exporting arch_sync_dma_for_{device,cpu} is not the right
abstraction level.  Note that these two functions are always present,
just stubbed out for architectures that do not support non-cache
coherent devices (plus the arm false negative that is easily fixable).


Re: [PATCH 1/2] drm: add cache support for arm64

2019-08-06 Thread Christoph Hellwig
This goes in the wrong direction.  drm_cflush_* is a bad API we need to
get rid of, not add more users of.  The reason for that is two-fold:

 a) it doesn't address how cache maintenance actually works on most
platforms.  When talking about a cache we have three fundamental operations:

1) write back - this writes the content of the cache back to the
   backing memory
2) invalidate - this removes the content of the cache
3) write back + invalidate - do both of the above

 b) which of the above operations you use, and when, depends on what
you want to do with the range you perform the cache maintenance
operations on

Take a look at the comment in arch/arc/mm/dma.c around line 30 that
explains how this applies to buffer ownership management.  Note that
"for device" applies to "for userspace" in the same way, just that
userspace then also needs to follow this protocol.  So the whole idea
that random driver code calls random low-level cache maintenance
operations (and uses the non-specific term flush to make it all more
confusing) is a bad idea.  Fortunately enough we have really good
arch helpers for all non-coherent architectures (this excludes the
magic i915, which won't be covered by that, but that is a separate issue
to be addressed later, and arm32, which only grew them very recently and
doesn't expose them for all configs, which is easily fixable if needed)
with arch_sync_dma_for_device and
arch_sync_dma_for_cpu.  So what we need is to figure out where we
have valid cases for buffer ownership transfer outside the DMA
API, and build proper wrappers around the above functions for that.
My guess is they should probably be built to go with the iommu API
as that is the only other way to map memory for DMA access, but
if you have a better idea I'd be open to discussion.
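
To sketch the kind of wrapper I mean (names are made up, and the
arch_sync_dma_* prototypes have changed over time, so treat this as
illustrative only):

	#include <linux/dma-noncoherent.h>

	static void iommu_buf_give_to_device(struct device *dev,
			struct page *page, size_t size,
			enum dma_data_direction dir)
	{
		if (!dev_is_dma_coherent(dev))
			arch_sync_dma_for_device(dev, page_to_phys(page),
						 size, dir);
	}

	static void iommu_buf_give_to_cpu(struct device *dev,
			struct page *page, size_t size,
			enum dma_data_direction dir)
	{
		if (!dev_is_dma_coherent(dev))
			arch_sync_dma_for_cpu(dev, page_to_phys(page),
					      size, dir);
	}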


Re: drm pull for v5.3-rc1

2019-08-06 Thread Christoph Hellwig
[adding the real linux-mm list now]

On Tue, Aug 06, 2019 at 12:38:31AM -0700, Christoph Hellwig wrote:
> On Mon, Jul 15, 2019 at 03:17:42PM -0700, Linus Torvalds wrote:
> > The attached patch does add more lines than it removes, but in most
> > cases it's actually a clear improvement.
> 
> Seems like no one took this up.  Below is a version which I think is
> slightly better by also moving the mm_walk structure initialization
> into the helpers, with an outcome of just a handful of added lines.
> 
> What I also noticed doing that is that a lot of walk_page_range users
> have a vma at hand and should probably use walk_vma_range.
> 
> But one thing I noticed is that we don't just have Thomas series that
> touches this area, but also the "Generic page walk and ptdump" one
> from Steven.  In addition various users of the functionality are under
> heavy development.
> 
> So I think we need to queue this up in an actual git tree that others
> can pull in, similar to the hmm model.
> 
> --
> From 67c1c6b56322bdd2937008e7fb79fb6f6e345dab Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Mon, 5 Aug 2019 11:10:44 +0300
> Subject: pagewalk: clean up the API
> 
> The mm_walk structure currently mixed data and code.  Split out the
> operations vectors into a new mm_walk_ops structure, and while we
> are changing the API also declare the mm_walk structure inside the
> walk_page_range and walk_page_vma functions.
> 
> Last but not least move all the declarations to a new pagewalk.h
> header so that doesn't pollute every user of mm.h.
> 
> Based on patch from Linus Torvalds.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/openrisc/kernel/dma.c  |  23 +++--
>  arch/powerpc/mm/book3s64/subpage_prot.c |  12 +--
>  arch/s390/mm/gmap.c |  35 ---
>  fs/proc/task_mmu.c  |  76 +++
>  include/linux/mm.h  |  46 -
>  include/linux/pagewalk.h|  66 +
>  mm/hmm.c|  42 +++--
>  mm/madvise.c|  42 +++--
>  mm/memcontrol.c |  25 +++--
>  mm/mempolicy.c  |  17 ++--
>  mm/migrate.c|  16 ++--
>  mm/mincore.c|  17 ++--
>  mm/mprotect.c   |  26 ++---
>  mm/pagewalk.c   | 120 ++--
>  14 files changed, 284 insertions(+), 279 deletions(-)
>  create mode 100644 include/linux/pagewalk.h
> 
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index b41a79fcdbd9..4d5b8bd1d795 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -16,6 +16,7 @@
>   */
>  
>  #include 
> +#include <linux/pagewalk.h>
>  
>  #include 
>  #include 
> @@ -43,6 +44,10 @@ page_set_nocache(pte_t *pte, unsigned long addr,
>   return 0;
>  }
>  
> +static const struct mm_walk_ops set_nocache_walk_ops = {
> + .pte_entry  = page_set_nocache,
> +};
> +
>  static int
>  page_clear_nocache(pte_t *pte, unsigned long addr,
>  unsigned long next, struct mm_walk *walk)
> @@ -58,6 +63,10 @@ page_clear_nocache(pte_t *pte, unsigned long addr,
>   return 0;
>  }
>  
> +static const struct mm_walk_ops clear_nocache_walk_ops = {
> + .pte_entry  = page_clear_nocache,
> +};
> +
>  /*
>   * Alloc "coherent" memory, which for OpenRISC means simply uncached.
>   *
> @@ -80,10 +89,6 @@ arch_dma_alloc(struct device *dev, size_t size, dma_addr_t 
> *dma_handle,
>  {
>   unsigned long va;
>   void *page;
> - struct mm_walk walk = {
> - .pte_entry = page_set_nocache,
> - .mm = &init_mm
> - };
>  
>   page = alloc_pages_exact(size, gfp | __GFP_ZERO);
>   if (!page)
> @@ -98,7 +103,8 @@ arch_dma_alloc(struct device *dev, size_t size, dma_addr_t 
> *dma_handle,
>* We need to iterate through the pages, clearing the dcache for
>* them and setting the cache-inhibit bit.
>*/
> - if (walk_page_range(va, va + size, &walk)) {
> + if (walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops,
> + NULL)) {
>   free_pages_exact(page, size);
>   return NULL;
>   }
> @@ -111,13 +117,10 @@ arch_dma_free(struct device *dev, size_t size, void 
> *vaddr,
>   dma_addr_t dma_handle, unsigned long attrs)
>  {
>   unsigned long va = (unsigned long)vaddr;
> - struct mm_walk walk = {
> - .pte_entry = page_clear_nocache,
> - .mm = &init_mm

Re: drm pull for v5.3-rc1

2019-08-06 Thread Christoph Hellwig
On Mon, Jul 15, 2019 at 03:17:42PM -0700, Linus Torvalds wrote:
> The attached patch does add more lines than it removes, but in most
> cases it's actually a clear improvement.

Seems like no one took this up.  Below is a version which I think is
slightly better by also moving the mm_walk structure initialization
into the helpers, with an outcome of just a handful of added lines.

What I also noticed doing that is that a lot of walk_page_range users
have a vma at hand and should probably use walk_vma_range.

But one thing I noticed is that we don't just have Thomas series that
touches this area, but also the "Generic page walk and ptdump" one
from Steven.  In addition various users of the functionality are under
heavy development.

So I think we need to queue this up in an actual git tree that others
can pull in, similar to the hmm model.

--
From 67c1c6b56322bdd2937008e7fb79fb6f6e345dab Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Mon, 5 Aug 2019 11:10:44 +0300
Subject: pagewalk: clean up the API

The mm_walk structure currently mixed data and code.  Split out the
operations vectors into a new mm_walk_ops structure, and while we
are changing the API also declare the mm_walk structure inside the
walk_page_range and walk_page_vma functions.

Last but not least move all the declarations to a new pagewalk.h
header so that doesn't pollute every user of mm.h.

Based on patch from Linus Torvalds.

Signed-off-by: Christoph Hellwig 
---
 arch/openrisc/kernel/dma.c  |  23 +++--
 arch/powerpc/mm/book3s64/subpage_prot.c |  12 +--
 arch/s390/mm/gmap.c |  35 ---
 fs/proc/task_mmu.c  |  76 +++
 include/linux/mm.h  |  46 -
 include/linux/pagewalk.h|  66 +
 mm/hmm.c|  42 +++--
 mm/madvise.c|  42 +++--
 mm/memcontrol.c |  25 +++--
 mm/mempolicy.c  |  17 ++--
 mm/migrate.c|  16 ++--
 mm/mincore.c|  17 ++--
 mm/mprotect.c   |  26 ++---
 mm/pagewalk.c   | 120 ++--
 14 files changed, 284 insertions(+), 279 deletions(-)
 create mode 100644 include/linux/pagewalk.h

diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index b41a79fcdbd9..4d5b8bd1d795 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -16,6 +16,7 @@
  */
 
 #include 
+#include <linux/pagewalk.h>
 
 #include 
 #include 
@@ -43,6 +44,10 @@ page_set_nocache(pte_t *pte, unsigned long addr,
return 0;
 }
 
+static const struct mm_walk_ops set_nocache_walk_ops = {
+   .pte_entry  = page_set_nocache,
+};
+
 static int
 page_clear_nocache(pte_t *pte, unsigned long addr,
   unsigned long next, struct mm_walk *walk)
@@ -58,6 +63,10 @@ page_clear_nocache(pte_t *pte, unsigned long addr,
return 0;
 }
 
+static const struct mm_walk_ops clear_nocache_walk_ops = {
+   .pte_entry  = page_clear_nocache,
+};
+
 /*
  * Alloc "coherent" memory, which for OpenRISC means simply uncached.
  *
@@ -80,10 +89,6 @@ arch_dma_alloc(struct device *dev, size_t size, dma_addr_t 
*dma_handle,
 {
unsigned long va;
void *page;
-   struct mm_walk walk = {
-   .pte_entry = page_set_nocache,
-   .mm = &init_mm
-   };
 
page = alloc_pages_exact(size, gfp | __GFP_ZERO);
if (!page)
@@ -98,7 +103,8 @@ arch_dma_alloc(struct device *dev, size_t size, dma_addr_t 
*dma_handle,
 * We need to iterate through the pages, clearing the dcache for
 * them and setting the cache-inhibit bit.
 */
-   if (walk_page_range(va, va + size, &walk)) {
+   if (walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops,
+   NULL)) {
free_pages_exact(page, size);
return NULL;
}
@@ -111,13 +117,10 @@ arch_dma_free(struct device *dev, size_t size, void 
*vaddr,
dma_addr_t dma_handle, unsigned long attrs)
 {
unsigned long va = (unsigned long)vaddr;
-   struct mm_walk walk = {
-   .pte_entry = page_clear_nocache,
-   .mm = &init_mm
-   };
 
/* walk_page_range shouldn't be able to fail here */
-   WARN_ON(walk_page_range(va, va + size, &walk));
+   WARN_ON(walk_page_range(&init_mm, va, va + size,
+   &clear_nocache_walk_ops, NULL));
 
free_pages_exact(vaddr, size);
 }
diff --git a/arch/powerpc/mm/book3s64/subpage_prot.c 
b/arch/powerpc/mm/book3s64/subpage_prot.c
index 9ba07e55c489..2ef24a53f4c9 100644
--- a/arch/powerpc/mm/book3s64/subpage_prot.c
+++ b/arch/powerpc/mm/book3s64/subpage_prot.c
@@ -7,7 +7,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -139,14 +139,14 @@ static int subpage_walk_p

Re: [PATCH] dma-mapping: remove dma_{alloc,free,mmap}_writecombine

2019-07-30 Thread Christoph Hellwig
On Tue, Jul 30, 2019 at 10:50:32AM +0300, Tomi Valkeinen wrote:
> On 30/07/2019 09:18, Christoph Hellwig wrote:
>> We can already use DMA_ATTR_WRITE_COMBINE or the _wc prefixed version,
>> so remove the third way of doing things.
>>
>> Signed-off-by: Christoph Hellwig 
>> ---
>>   drivers/gpu/drm/omapdrm/dss/dispc.c | 11 +--
>>   include/linux/dma-mapping.h |  9 -
>>   2 files changed, 5 insertions(+), 15 deletions(-)
>
> Reviewed-by: Tomi Valkeinen 
>
> Which tree should this be applied to?

I'd like to add it to the dma-mapping tree if possible.


[PATCH] dma-mapping: remove dma_{alloc,free,mmap}_writecombine

2019-07-30 Thread Christoph Hellwig
We can already use DMA_ATTR_WRITE_COMBINE or the _wc prefixed version,
so remove the third way of doing things.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/omapdrm/dss/dispc.c | 11 +--
 include/linux/dma-mapping.h |  9 -
 2 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/omapdrm/dss/dispc.c 
b/drivers/gpu/drm/omapdrm/dss/dispc.c
index 785c5546067a..c70f3246a552 100644
--- a/drivers/gpu/drm/omapdrm/dss/dispc.c
+++ b/drivers/gpu/drm/omapdrm/dss/dispc.c
@@ -4609,11 +4609,10 @@ static int dispc_errata_i734_wa_init(struct 
dispc_device *dispc)
i734_buf.size = i734.ovli.width * i734.ovli.height *
color_mode_to_bpp(i734.ovli.fourcc) / 8;
 
-   i734_buf.vaddr = dma_alloc_writecombine(&dispc->pdev->dev,
-   i734_buf.size, &i734_buf.paddr,
-   GFP_KERNEL);
+   i734_buf.vaddr = dma_alloc_wc(&dispc->pdev->dev, i734_buf.size,
+   &i734_buf.paddr, GFP_KERNEL);
if (!i734_buf.vaddr) {
-   dev_err(&dispc->pdev->dev, "%s: dma_alloc_writecombine failed\n",
+   dev_err(&dispc->pdev->dev, "%s: dma_alloc_wc failed\n",
__func__);
return -ENOMEM;
}
@@ -4626,8 +4625,8 @@ static void dispc_errata_i734_wa_fini(struct dispc_device 
*dispc)
if (!dispc->feat->has_gamma_i734_bug)
return;
 
-   dma_free_writecombine(&dispc->pdev->dev, i734_buf.size, i734_buf.vaddr,
- i734_buf.paddr);
+   dma_free_wc(&dispc->pdev->dev, i734_buf.size, i734_buf.vaddr,
+   i734_buf.paddr);
 }
 
 static void dispc_errata_i734_wa(struct dispc_device *dispc)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f7d1eea32c78..633dae466097 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -786,9 +786,6 @@ static inline void *dma_alloc_wc(struct device *dev, size_t 
size,
 
return dma_alloc_attrs(dev, size, dma_addr, gfp, attrs);
 }
-#ifndef dma_alloc_writecombine
-#define dma_alloc_writecombine dma_alloc_wc
-#endif
 
 static inline void dma_free_wc(struct device *dev, size_t size,
   void *cpu_addr, dma_addr_t dma_addr)
@@ -796,9 +793,6 @@ static inline void dma_free_wc(struct device *dev, size_t 
size,
return dma_free_attrs(dev, size, cpu_addr, dma_addr,
  DMA_ATTR_WRITE_COMBINE);
 }
-#ifndef dma_free_writecombine
-#define dma_free_writecombine dma_free_wc
-#endif
 
 static inline int dma_mmap_wc(struct device *dev,
  struct vm_area_struct *vma,
@@ -808,9 +802,6 @@ static inline int dma_mmap_wc(struct device *dev,
return dma_mmap_attrs(dev, vma, cpu_addr, dma_addr, size,
  DMA_ATTR_WRITE_COMBINE);
 }
-#ifndef dma_mmap_writecombine
-#define dma_mmap_writecombine dma_mmap_wc
-#endif
 
 #ifdef CONFIG_NEED_DMA_MAP_STATE
 #define DEFINE_DMA_UNMAP_ADDR(ADDR_NAME)dma_addr_t ADDR_NAME
-- 
2.20.1



Re: [PATCH v6 4/5] dma-buf: heaps: Add CMA heap to dmabuf heaps

2019-07-25 Thread Christoph Hellwig
On Thu, Jul 25, 2019 at 09:47:11AM -0400, Andrew F. Davis wrote:
> This is a central allocator, it is not tied to any one device. If we
> knew the one device ahead of time we would just use the existing dma_alloc.
> 
> We might be able to solve some of that with late mapping after all the
> devices attach to the buffer, but even then, which device's CMA area
> would we chose to use from all the attached devices?
> 
> I can agree that allocating from per-device CMA using Heaps doesn't make
> much sense, but for global pools I'm not sure I see any way to allow
> devices to select which pool is right for a specific use. They don't
> have the full use-case information like the application does, the
> selection needs to be made from the application.

Well, the examples we had before were that we clearly want to use the
per-device CMA area.  And at least in upstream a CMA area is either
global or attached to a device, as we otherwise wouldn't find it.

Re: [PATCH v6 4/5] dma-buf: heaps: Add CMA heap to dmabuf heaps

2019-07-25 Thread Christoph Hellwig
On Thu, Jul 25, 2019 at 09:31:50AM -0400, Andrew F. Davis wrote:
> But that's just it, dma-buf does not assume buffers are backed by normal
> kernel managed memory, it is up to the buffer exporter where and when to
> allocate the memory. The memory backed by this SRAM buffer does not have
> the normal struct page backing. So moving the map, sync, etc functions
> to common code would fail for this and many other heap types. This was a
> major problem with Ion that prompted this new design.

The code clearly shows it has page backing, e.g. this:

+   sg_set_page(table->sgl, pfn_to_page(PFN_DOWN(buffer->paddr)), 
buffer->len, 0);

and the fact that it (and the dma-buf API) uses scatterlists, which 
require pages.


Re: [PATCH v7 3/5] dma-buf: heaps: Add system heap to dmabuf heaps

2019-07-25 Thread Christoph Hellwig
> +struct system_heap {
> + struct dma_heap *heap;
> +} sys_heap;

It seems like this structure could be removed and it would improve
the code flow.

> +static struct dma_heap_ops system_heap_ops = {
> + .allocate = system_heap_allocate,
> +};
> +
> +static int system_heap_create(void)
> +{
> + struct dma_heap_export_info exp_info;
> + int ret = 0;
> +
> + exp_info.name = "system_heap";
> + exp_info.ops = &system_heap_ops;
> + exp_info.priv = &sys_heap;
> +
> + sys_heap.heap = dma_heap_add(&exp_info);
> + if (IS_ERR(sys_heap.heap))
> + ret = PTR_ERR(sys_heap.heap);
> +
> + return ret;

The data structures here seem a little odd.  I think you want to:

 - mark all dma_heap_ops instances const, as we generally do that for
   all structures containing function pointers (see the sketch after
   this list)
 - move the name into dma_heap_ops.
 - remove the dma_heap_export_info structure, which is a bit pointless
 - don't bother setting private data, as you don't need it.
   If other heaps need private data I'd suggest switching to embedding
   the dma_heap structure into the containing structure instead so that
   you can use container_of to get at it.
 - also why is the free callback passed as a callback rather than
   kept in dma_heap_ops, next to the paired alloc one?
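
Rough sketch of the restructuring I mean.  Everything here is
illustrative only: it assumes dma_heap_add() is changed to take the heap
and a const ops pointer directly, and the allocate prototype is just an
approximation of the one in this series.

	struct dma_heap_ops {
		const char	*name;
		int		(*allocate)(struct dma_heap *heap,
					    unsigned long len,
					    unsigned long fd_flags,
					    unsigned long heap_flags);
	};

	struct system_heap {
		struct dma_heap	heap;	/* embedded, no private pointer needed */
	};

	static struct system_heap sys_heap;

	static int system_heap_allocate(struct dma_heap *heap,
					unsigned long len,
					unsigned long fd_flags,
					unsigned long heap_flags)
	{
		struct system_heap *sh =
			container_of(heap, struct system_heap, heap);

		/* ... allocate pages and export the dma-buf for sh ... */
		return 0;
	}

	static const struct dma_heap_ops system_heap_ops = {
		.name		= "system_heap",
		.allocate	= system_heap_allocate,
	};

	static int __init system_heap_create(void)
	{
		return dma_heap_add(&sys_heap.heap, &system_heap_ops);
	}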


Re: [PATCH v7 2/5] dma-buf: heaps: Add heap helpers

2019-07-25 Thread Christoph Hellwig
> +struct dma_buf *heap_helper_export_dmabuf(
> + struct heap_helper_buffer *helper_buffer,
> + int fd_flags)

Indentation seems odd here as it doesn't follow any of the usual schools
for multi-level prototypes.  But maybe shortening some identifier would
help avoid that problem altogether :)

> +static void *dma_heap_map_kernel(struct heap_helper_buffer *buffer)
> +{
> + void *vaddr;
> +
> + vaddr = vmap(buffer->pages, buffer->pagecount, VM_MAP, PAGE_KERNEL);
> + if (!vaddr)
> + return ERR_PTR(-ENOMEM);
> +
> + return vaddr;
> +}

Note that a lot of systems don't have highmem at all or don't support
CMA in highmem.  So for contiguous allocations that aren't in highmem we
could skip the vmap here altogether and just use the direct mapping.
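
I.e. something along these lines (sketch; "heap_buffer_is_contiguous()"
is a made-up helper standing in for however the heap records that fact):

	/*
	 * For a contiguous, non-highmem allocation the kernel direct
	 * mapping can be used instead of creating a vmap alias.
	 */
	if (heap_buffer_is_contiguous(buffer) &&
	    !PageHighMem(buffer->pages[0]))
		vaddr = page_address(buffer->pages[0]);
	else
		vaddr = vmap(buffer->pages, buffer->pagecount, VM_MAP,
			     PAGE_KERNEL);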


Re: [PATCH v6 4/5] dma-buf: heaps: Add CMA heap to dmabuf heaps

2019-07-25 Thread Christoph Hellwig
On Wed, Jul 24, 2019 at 11:46:24AM -0700, John Stultz wrote:
> I'm still not understanding how this would work. Benjamin and Laura
> already commented on this point, but for a simple example, with the
> HiKey boards, the DRM driver requires contiguous memory for the
> framebuffer, but the GPU can handle non-contiguous. Thus the target
> framebuffers that we pass to the display has to be CMA allocated, but
> all the other graphics buffers that the GPU will render to and
> composite can be system.

But that just means we need a flag that says the memory needs to be
contiguous, which totally makes sense at the API level.  But CMA is not
the only source of contiguous memory, so we should not conflate the two.

> Laura already touched on this, but similar logic can be used for
> camera buffers, which can make sure we allocate from a specifically
> reserved CMA region that is only used for the camera so we can always
> be sure to have N free buffers immediately to capture with, etc.

And for that we already have per-device CMA areas hanging off struct
device, which this should reuse instead of adding another duplicate
CMA area lookup scheme.
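
A fragment to show what I mean (sketch; "dev" is the device - camera,
display, ... - the buffer is being allocated for):

	/*
	 * Reuse the CMA area already hanging off the device instead of
	 * a separate named-heap lookup; falls back to the global area
	 * if the device has none.
	 */
	struct cma *cma = dev_get_cma_area(dev);
	struct page *page;

	page = cma_alloc(cma, nr_pages, 0, false);
	if (!page)
		return -ENOMEM;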


Re: [PATCH v6 4/5] dma-buf: heaps: Add CMA heap to dmabuf heaps

2019-07-25 Thread Christoph Hellwig
On Wed, Jul 24, 2019 at 11:46:01AM -0400, Andrew F. Davis wrote:
> https://patchwork.kernel.org/patch/10863957/
> 
> It's actually a more simple heap type IMHO, but the logic inside is
> incompatible with the system/CMA heaps, if you move any of their code
> into the core framework then this heap stops working. Leading to out of
> tree hacks on the core to get it back functional. I see the same for the
> "complex" heaps with ION.

Well, this mostly is just another allocator (gen_pool).  And given that
the whole dma-buf infrastructure assumes things are backed by pages, we
really shouldn't need much more than an alloc and a free callback (and
maybe the pgprot to map it) and handle the rest in common code.


Re: [PATCH v6 4/5] dma-buf: heaps: Add CMA heap to dmabuf heaps

2019-07-25 Thread Christoph Hellwig
On Wed, Jul 24, 2019 at 07:38:07AM -0400, Laura Abbott wrote:
> It's not just an optimization for Ion though. Ion was designed to
> let the callers choose between system and multiple CMA heaps.

Who cares about ion?  That's some out-of-tree Android crap that should
not be relevant for upstream, except as an example of how not to design
things.

> On other
> systems there may be multiple CMA regions dedicated to a specific
> purpose or placed at a specific address. The callers need to
> be able to choose exactly whether they want a particular CMA region
> or discontiguous regions.

At least in-kernel, CMA is only used with either the global pool or a
per-device CMA pool.  I think if you want to make this new dma-buf API
fit in with the rest of the kernel you should follow that model, and
pass in a struct device to select the particular CMA area, similar to
how the DMA allocator works.


Re: [PATCH v6 4/5] dma-buf: heaps: Add CMA heap to dmabuf heaps

2019-07-25 Thread Christoph Hellwig
On Wed, Jul 24, 2019 at 10:08:54AM +0200, Benjamin Gaignard wrote:
> CMA has made possible to get large regions of memories and to give some
> priority on device allocating pages on it. I don't think that possible
> with system
> heap so I suggest to keep CMA heap if we want to be able to port a maximum
> of applications on dma-buf-heap.

Yes, CMA is a way to better allocate contiguous regions, but it isn't
the only way to do that.

So at least for the system default CMA area it really should be a helper
for the system heap, especially given that CMA is an optional feature
and we can do high order contiguous allocations using alloc_pages as
well.

Something like:

if (!require_contiguous && order > 0) {
	for (i = 0; i < nr_pages; i++)
		page[i] = alloc_page();
	goto done;
} else if (order > 0) {
	page = cma_alloc();
	if (page)
		goto done;
}

page = alloc_pages(order);


Re: [PATCH v6 2/5] dma-buf: heaps: Add heap helpers

2019-07-25 Thread Christoph Hellwig
On Wed, Jul 24, 2019 at 11:20:31AM -0400, Andrew F. Davis wrote:
> Well then lets think on this. A given buffer can have 3 owners states
> (CPU-owned, Device-owned, and Un-owned). These are based on the caching
> state from the CPU perspective.
> 
> If a buffer is CPU-owned then we (Linux) can write to the buffer safely
> without worry that the data is stale or that it will be accessed by the
> device without having been flushed. Device-owned buffers should not be
> accessed by the CPU, and inter-device synchronization should be handled
> by fencing as Rob points out. Un-owned is how a buffer starts for
> consistency and to prevent unneeded cache operations on unwritten buffers.

CPU-owned also needs to be split by which mapping owns it - in the
normal DMA case this is the kernel direct mapping, but in dma-buf the
primary way of using it in kernel space seems to be the vmap.  In
addition, the buffer can also be mapped to userspace, which is yet
another mapping that is possibly not cache coherent with the kernel
ones.


Re: [PATCH v6 4/5] dma-buf: heaps: Add CMA heap to dmabuf heaps

2019-07-24 Thread Christoph Hellwig
On Mon, Jul 22, 2019 at 10:04:06PM -0700, John Stultz wrote:
> Apologies, I'm not sure I'm understanding your suggestion here.
> dma_alloc_contiguous() does have some interesting optimizations
> (avoiding allocating single page from cma), though its focus on
> default area vs specific device area doesn't quite match up the usage
> model for dma heaps.  Instead of allocating memory for a single
> device, we want to be able to allow userland, for a given usage mode,
> to be able to allocate a dmabuf from a specific heap of memory which
> will satisfy the usage mode for that buffer pipeline (across multiple
> devices).
> 
> Now, indeed, the system and cma heaps in this patchset are a bit
> simple/trivial (though useful with my devices that require contiguous
> buffers for the display driver), but more complex ION heaps have
> traditionally stayed out of tree in vendor code, in many cases making
> incompatible tweaks to the ION core dmabuf exporter logic.

So what would the more complicated heaps be?

> That's why
> dmabuf heaps is trying to destage ION in a way that allows heaps to
> implement their exporter logic themselves, so we can start pulling
> those more complicated heaps out of their vendor hidey-holes and get
> some proper upstream review.
> 
> But I suspect I just am confused as to what your suggesting. Maybe
> could you expand a bit? Apologies for being a bit dense.

My suggestion is to merge the system and CMA heaps.  CMA (at least
the system-wide CMA area) is really just an optimization to get
large contiguous regions more easily.  We should make use of it as
transparently as possible, just like we do in the DMA code.


Re: [PATCH v6 2/5] dma-buf: heaps: Add heap helpers

2019-07-24 Thread Christoph Hellwig
On Mon, Jul 22, 2019 at 09:09:25PM -0700, John Stultz wrote:
> On Thu, Jul 18, 2019 at 3:06 AM Christoph Hellwig  wrote:
> >
> > > +void INIT_HEAP_HELPER_BUFFER(struct heap_helper_buffer *buffer,
> > > +  void (*free)(struct heap_helper_buffer *))
> >
> > Please use a lower case naming following the naming scheme for the
> > rest of the file.
> 
> Yes! Apologies as this was a hold-over from when the initialization
> function was an inline function in the style of
> INIT_WORK/INIT_LIST_HEAD. No longer appropriate that its a function.
> I'll change it.
> 
> > > +static void *dma_heap_map_kernel(struct heap_helper_buffer *buffer)
> > > +{
> > > + void *vaddr;
> > > +
> > > + vaddr = vmap(buffer->pages, buffer->pagecount, VM_MAP, PAGE_KERNEL);
> > > + if (!vaddr)
> > > + return ERR_PTR(-ENOMEM);
> > > +
> > > + return vaddr;
> > > +}
> >
> > Unless I'm misreading the patches this is used for the same pages that
> > also might be dma mapped.  In this case you need to use
> > flush_kernel_vmap_range and invalidate_kernel_vmap_range in the right
> > places to ensure coherency between the vmap and device view.  Please
> > also document the buffer ownership, as this really can get complicated.
> 
> Forgive me I wasn't familiar with those calls, but this seems like it
> would apply to all dma-buf exporters if so, and I don't see any
> similar flush_kernel_vmap_range calls there (both functions are
> seemingly only used by xfs, md and bio).
> 
> We do have the 
> dma_heap_dma_buf_begin_cpu_access()/dma_heap_dma_buf_end_cpu_access()
> hooks (see more on these below) which sync the buffers for each
> attachment (via dma_sync_sg_for_cpu/device), and are used around cpu
> access to the buffers. Are you suggesting the
> flush_kernel_vmap_range() call be added to those hooks or is the
> dma_sync_sg_for_* calls sufficient there?

dma_sync_sg_for_* only operates on the kernel direct mapping, that is,
what you get from page_address/kmap* for the page.  But with vmap
you create another alias in kernel virtual space, which
dma_sync_sg_for_* can't know about.  Now, for most CPUs caches are
indexed by physical address so this doesn't matter, but a significant
minority of CPUs (parisc, some arm, some mips) index by virtual address,
in which case we need non-empty flush_kernel_vmap_range and
invalidate_kernel_vmap_range helpers to deal with that vmap alias.
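
As a rough sketch of where those calls would sit (struct and function
names invented here; this assumes the buffer was vmap()ed to ->vaddr and
is dma mapped through an sg_table):

#include <linux/dma-mapping.h>
#include <linux/highmem.h>
#include <linux/scatterlist.h>

struct vmapped_dma_buffer {
	struct device	*dev;
	struct sg_table	sgt;
	void		*vaddr;		/* alias created by vmap() */
	size_t		size;
};

static void buffer_begin_cpu_access(struct vmapped_dma_buffer *b)
{
	/* handles the kernel direct mapping vs the device ... */
	dma_sync_sg_for_cpu(b->dev, b->sgt.sgl, b->sgt.nents,
			    DMA_BIDIRECTIONAL);
	/* ... and this handles the vmap alias dma_sync_sg_* can't see */
	invalidate_kernel_vmap_range(b->vaddr, b->size);
}

static void buffer_end_cpu_access(struct vmapped_dma_buffer *b)
{
	/* write back dirty lines in the vmap alias before the device reads */
	flush_kernel_vmap_range(b->vaddr, b->size);
	dma_sync_sg_for_device(b->dev, b->sgt.sgl, b->sgt.nents,
			       DMA_BIDIRECTIONAL);
}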


Re: [PATCH v6 2/5] dma-buf: heaps: Add heap helpers

2019-07-24 Thread Christoph Hellwig
On Tue, Jul 23, 2019 at 01:09:55PM -0700, Rob Clark wrote:
> On Mon, Jul 22, 2019 at 9:09 PM John Stultz  wrote:
> >
> > On Thu, Jul 18, 2019 at 3:06 AM Christoph Hellwig  
> > wrote:
> > >
> > > Is there any exclusion between mmap / vmap and the device accessing
> > > the data?  Without that you are going to run into a lot of coherency
> > > problems.
> 
> dma_fence is basically the way to handle exclusion between different
> device access (since device access tends to be asynchronous).  For
> device<->device access, each driver is expected to take care of any
> cache(s) that the device might have.  (Ie. device writing to buffer
> should flush it's caches if needed before signalling fence to let
> reading device know that it is safe to read, etc.)
> 
> _begin/end_cpu_access() is intended to be the exclusion for CPU access
> (which is synchronous)

What I mean is that we need a clear state machine (preferably including
ownership tracking ala dma-debug) where a piece of memory has one
owner at a time that can access it.  Only the owner can access it at
that time, and at any owner switch we need to flush/invalidate all
relevant caches.  And with memory that is vmaped and mapped to userspace
that can get really complicated.

The above sounds like you have some of that in place, but we'll really
need clear rules to make sure we don't have holes in the scheme.
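
To make that concrete, a minimal sketch of single-owner tracking (names
made up; the cache maintenance shown only covers the direct mapping,
the vmap and userspace aliases need the extra handling discussed in the
other subthread):

#include <linux/bug.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

enum buf_owner {
	BUF_OWNER_NONE,		/* freshly allocated, nothing cached */
	BUF_OWNER_CPU,		/* CPU mappings may be read and written */
	BUF_OWNER_DEVICE,	/* device may DMA, CPU access is a bug */
};

struct owned_buffer {
	enum buf_owner	owner;
	struct device	*dev;
	struct sg_table	sgt;
};

static void buffer_give_to_device(struct owned_buffer *buf)
{
	WARN_ON_ONCE(buf->owner == BUF_OWNER_DEVICE);
	/* write back CPU-side state before the device may look at it */
	dma_sync_sg_for_device(buf->dev, buf->sgt.sgl, buf->sgt.nents,
			       DMA_BIDIRECTIONAL);
	buf->owner = BUF_OWNER_DEVICE;
}

static void buffer_give_to_cpu(struct owned_buffer *buf)
{
	WARN_ON_ONCE(buf->owner == BUF_OWNER_CPU);
	/* invalidate so the CPU can't see stale pre-DMA data */
	dma_sync_sg_for_cpu(buf->dev, buf->sgt.sgl, buf->sgt.nents,
			    DMA_BIDIRECTIONAL);
	buf->owner = BUF_OWNER_CPU;
}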


Re: [PATCH 1/3] mm/gup: introduce __put_user_pages()

2019-07-23 Thread Christoph Hellwig
On Mon, Jul 22, 2019 at 11:33:32PM -0700, John Hubbard wrote:
> I'm seeing about 18 places where set_page_dirty() is used, in the call site
> conversions so far, and about 20 places where set_page_dirty_lock() is
> used. So without knowing how many of the former (if any) represent bugs,
> you can see why the proposal here supports both DIRTY and DIRTY_LOCK.

Well, it should be fairly easy to audit.  set_page_dirty() is only
safe if we are dealing with a file-backed page where we have a reference
on the inode it hangs off.  Which should basically be never or almost
never.


Re: [PATCH 1/3] mm/gup: introduce __put_user_pages()

2019-07-22 Thread Christoph Hellwig
On Mon, Jul 22, 2019 at 03:34:13PM -0700, john.hubb...@gmail.com wrote:
> +enum pup_flags_t {
> + PUP_FLAGS_CLEAN = 0,
> + PUP_FLAGS_DIRTY = 1,
> + PUP_FLAGS_LOCK  = 2,
> + PUP_FLAGS_DIRTY_LOCK= 3,
> +};

Well, the enum defeats the ease of just being able to pass a boolean
expression to the function, which would simplify a lot of the callers,
so if we need to support the !locked version I'd rather see that as
a separate helper.

But do we actually have callers where not using the _lock version is
not a bug?  set_page_dirty makes sense in the context of file systems
that have a reference to the inode the page hangs off, but that is
(almost?) never the case for get_user_pages.
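
As a sketch of that calling convention (the helper name here is made up
just to illustrate, not the proposed API):

#include <linux/mm.h>

/* one helper that loops internally and takes dirty as a plain bool */
static void release_user_pages(struct page **pages, unsigned long npages,
			       bool make_dirty)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		if (make_dirty)
			set_page_dirty_lock(pages[i]);
		put_user_page(pages[i]);
	}
}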


Re: [PATCH 2/3] net/xdp: convert put_page() to put_user_page*()

2019-07-22 Thread Christoph Hellwig
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 83de74ca729a..9cbbb96c2a32 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -171,8 +171,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
>   for (i = 0; i < umem->npgs; i++) {
>   struct page *page = umem->pgs[i];
>  
> - set_page_dirty_lock(page);
> - put_page(page);
> + put_user_pages_dirty_lock(, 1);

Same here, we really should avoid the need for the loop here and
do the looping inside the helper.
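
With the looping moved inside the helper the whole function body here
becomes a single call (field names as in the quoted hunk):

	put_user_pages_dirty_lock(umem->pgs, umem->npgs);

instead of iterating over umem->pgs one page at a time.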


Re: [PATCH 1/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()

2019-07-22 Thread Christoph Hellwig
On Sun, Jul 21, 2019 at 09:30:10PM -0700, john.hubb...@gmail.com wrote:
>   for (i = 0; i < vsg->num_pages; ++i) {
>   if (NULL != (page = vsg->pages[i])) {
>   if (!PageReserved(page) && (DMA_FROM_DEVICE == 
> vsg->direction))
> - SetPageDirty(page);
> - put_page(page);
> + put_user_pages_dirty(, 1);
> + else
> + put_user_page(page);
>   }

Can't we just pass a dirty argument to put_user_pages?  Also, do we really
need a separate put_user_page for the single page case?  Or a separate
put_user_pages_dirty?

Also the PageReserved check looks bogus, as I can't see how a reserved
page can end up here.  So IMHO the above snippet should really look
something like this:

put_user_pages(vsg->pages[i], vsg->num_pages,
vsg->direction == DMA_FROM_DEVICE);

in the end.


Re: [PATCH v6 4/5] dma-buf: heaps: Add CMA heap to dmabuf heaps

2019-07-18 Thread Christoph Hellwig
This and the previous one seem like very much duplicated boilerplate
code.  Why can't we just use normal branches for allocating and freeing
normal pages vs CMA?  We even have an existing helper for that
with dma_alloc_contiguous().


Re: [PATCH v6 2/5] dma-buf: heaps: Add heap helpers

2019-07-18 Thread Christoph Hellwig
> +void INIT_HEAP_HELPER_BUFFER(struct heap_helper_buffer *buffer,
> +  void (*free)(struct heap_helper_buffer *))

Please use a lower case naming following the naming scheme for the
rest of the file.

> +static void *dma_heap_map_kernel(struct heap_helper_buffer *buffer)
> +{
> + void *vaddr;
> +
> + vaddr = vmap(buffer->pages, buffer->pagecount, VM_MAP, PAGE_KERNEL);
> + if (!vaddr)
> + return ERR_PTR(-ENOMEM);
> +
> + return vaddr;
> +}

Unless I'm misreading the patches this is used for the same pages that
also might be dma mapped.  In this case you need to use
flush_kernel_vmap_range and invalidate_kernel_vmap_range in the right
places to ensure coherency between the vmap and device view.  Please
also document the buffer ownership, as this really can get complicated.

> +static vm_fault_t dma_heap_vm_fault(struct vm_fault *vmf)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> + struct heap_helper_buffer *buffer = vma->vm_private_data;
> +
> + vmf->page = buffer->pages[vmf->pgoff];
> + get_page(vmf->page);
> +
> + return 0;
> +}

Is there any exclusion between mmap / vmap and the device accessing
the data?  Without that you are going to run into a lot of coherency
problems.


Re: use exact allocation for dma coherent memory

2019-07-08 Thread Christoph Hellwig
On Tue, Jul 02, 2019 at 11:48:44AM +0200, Arend Van Spriel wrote:
> You made me look ;-) Actually not touching my drivers so I'm off the hook. 
> However, I was wondering if drivers could know so I decided to look into 
> the DMA-API.txt documentation which currently states:
>
> """
> The flag parameter (dma_alloc_coherent() only) allows the caller to
> specify the ``GFP_`` flags (see kmalloc()) for the allocation (the
> implementation may choose to ignore flags that affect the location of
> the returned memory, like GFP_DMA).
> """
>
> I do expect you are going to change that description as well now that you 
> are going to issue a warning on __GFP_COMP. Maybe include that in patch 
> 15/16 where you introduce that warning.

Yes, that description needs an update, even without this series.
I'll make sure it is more clear.


Re: use exact allocation for dma coherent memory

2019-07-01 Thread Christoph Hellwig
On Fri, Jun 14, 2019 at 03:47:10PM +0200, Christoph Hellwig wrote:
> Switching to a slightly cleaned up alloc_pages_exact is pretty easy,
> but it turns out that because we didn't filter valid gfp_t flags
> on the DMA allocator, a bunch of drivers were passing __GFP_COMP
> to it, which is rather bogus in too many ways to explain.  Arm has
> been filtering it for a while, but this series instead tries to fix
> the drivers and warn when __GFP_COMP is passed, which makes it much
> larger than just adding the functionality.

Dear driver maintainers,

can you look over the patches touching your drivers, please?  I'd
like to get as many of the driver patches as possible into this
merge window, so that they can go through your maintainer trees.


Re: [PATCH v1 1/3] gpu: host1x: Remove implicit IOMMU backing on client's registration

2019-06-24 Thread Christoph Hellwig
Don't we have a device tree problem here if there is a domain covering
them?  I thought we should only pick up an IOMMU for a given device
if DT explicitly asked for that?


Re: use exact allocation for dma coherent memory

2019-06-20 Thread Christoph Hellwig
On Wed, Jun 19, 2019 at 01:29:03PM -0300, Jason Gunthorpe wrote:
> > Yes.  This will blow up badly on many platforms, as sq->queue
> > might be vmapped, ioremapped, come from a pool without page backing.
> 
> Gah, this addr gets fed into io_remap_pfn_range/remap_pfn_range too..
> 
> Potnuri, you should fix this.. 
> 
> You probably need to use dma_mmap_from_dev_coherent() in the mmap ?

The function to use is dma_mmap_coherent, dma_mmap_from_dev_coherent is
just an internal helper.

That being said, the drivers/infiniband code has a lot of
*remap_pfn_range, and a lot of them look like they might be for
DMA memory.
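
For reference, a sketch of what an mmap handler for dma_alloc_coherent()
memory should look like (driver plumbing omitted; cpu_addr/dma_addr/size
are whatever was stored at allocation time):

#include <linux/dma-mapping.h>
#include <linux/mm.h>

static int coherent_queue_mmap(struct device *dma_dev,
			       struct vm_area_struct *vma, void *cpu_addr,
			       dma_addr_t dma_addr, size_t size)
{
	/* vm_pgoff was only used to find the buffer, not as a real offset */
	vma->vm_pgoff = 0;

	/* no virt_to_phys() / remap_pfn_range() games on coherent memory */
	return dma_mmap_coherent(dma_dev, vma, cpu_addr, dma_addr, size);
}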


Re: use exact allocation for dma coherent memory

2019-06-17 Thread Christoph Hellwig
> drivers/infiniband/hw/cxgb4/qp.c
>129  static int alloc_host_sq(struct c4iw_rdev *rdev, struct t4_sq *sq)
>130  {
>131  sq->queue = dma_alloc_coherent(&(rdev->lldi.pdev->dev), 
> sq->memsize,
>132 &(sq->dma_addr), GFP_KERNEL);
>133  if (!sq->queue)
>134  return -ENOMEM;
>135  sq->phys_addr = virt_to_phys(sq->queue);
>136  dma_unmap_addr_set(sq, mapping, sq->dma_addr);
>137  return 0;
>138  }
> 
> Is this a bug?

Yes.  This will blow up badly on many platforms, as sq->queue
might be vmapped, ioremapped, or come from a pool without page backing.


Re: [PATCH 12/16] staging/comedi: mark as broken

2019-06-14 Thread Christoph Hellwig
On Fri, Jun 14, 2019 at 05:30:32PM +0200, Greg KH wrote:
> On Fri, Jun 14, 2019 at 04:48:57PM +0200, Christoph Hellwig wrote:
> > On Fri, Jun 14, 2019 at 04:02:39PM +0200, Greg KH wrote:
> > > Perhaps a hint as to how we can fix this up?  This is the first time
> > > I've heard of the comedi code not handling dma properly.
> > 
> > It can be fixed by:
> > 
> >  a) never calling virt_to_page (or vmalloc_to_page for that matter)
> > on dma allocation
> >  b) never remapping dma allocation with conflicting cache modes
> > (no remapping should be doable after a) anyway).
> 
> Ok, fair enough, have any pointers of drivers/core code that does this
> correctly?  I can put it on my todo list, but might take a week or so...

Just about everyone else.  They just need to remove the vmap and
either do one large allocation, or live with the fact that they need
helpers to access multiple array elements instead of one vmap,
which most of the users already seem to do anyway, with just a few
using the vmap (which might explain why we didn't see blowups yet).
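
For the second option, a sketch of such an accessor (names invented;
each chunk would be a separate dma_alloc_coherent() allocation):

#include <linux/mm.h>

struct chunked_buffer {
	void		**chunks;	/* one PAGE_SIZE chunk per entry */
	unsigned int	nchunks;
};

/* resolve a byte offset without ever creating a vmap alias */
static void *chunked_buffer_ptr(struct chunked_buffer *buf, size_t offset)
{
	if (offset >= (size_t)buf->nchunks * PAGE_SIZE)
		return NULL;

	return buf->chunks[offset / PAGE_SIZE] + (offset & ~PAGE_MASK);
}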


Re: [PATCH 16/16] dma-mapping: use exact allocation in dma_alloc_contiguous

2019-06-14 Thread 'Christoph Hellwig'
On Fri, Jun 14, 2019 at 04:05:33PM +0100, Robin Murphy wrote:
> That said, I don't believe this particular patch should make any 
> appreciable difference - alloc_pages_exact() is still going to give back 
> the same base address as the rounded up over-allocation would, and 
> PAGE_ALIGN()ing the size passed to get_order() already seemed to be 
> pointless.

True, we actually do get the right alignment just about everywhere.
Not 100% sure about the various static pool implementations, but we
can make sure that if any don't, we do the right thing once those
get consolidated.


Re: [PATCH 16/16] dma-mapping: use exact allocation in dma_alloc_contiguous

2019-06-14 Thread 'Christoph Hellwig'
On Fri, Jun 14, 2019 at 03:01:22PM +, David Laight wrote:
> I'm pretty sure there is a lot of code out there that makes that assumption.
> Without it many drivers will have to allocate almost double the
> amount of memory they actually need in order to get the required alignment.
> So instead of saving memory you'll actually make more be used.

That code would already be broken on a lot of Linux platforms.


Re: [PATCH 16/16] dma-mapping: use exact allocation in dma_alloc_contiguous

2019-06-14 Thread 'Christoph Hellwig'
On Fri, Jun 14, 2019 at 02:15:44PM +, David Laight wrote:
> Does this still guarantee that requests for 16k will not cross a 16k boundary?
> It looks like you are losing the alignment parameter.

The DMA API never gave you alignment guarantees to start with,
and you can get non-naturally-aligned memory from many of our
current implementations.


Re: [PATCH 12/16] staging/comedi: mark as broken

2019-06-14 Thread Christoph Hellwig
On Fri, Jun 14, 2019 at 04:02:39PM +0200, Greg KH wrote:
> Perhaps a hint as to how we can fix this up?  This is the first time
> I've heard of the comedi code not handling dma properly.

It can be fixed by:

 a) never calling virt_to_page (or vmalloc_to_page for that matter)
on dma allocation
 b) never remapping dma allocation with conflicting cache modes
(no remapping should be doable after a) anyway).


[PATCH 03/16] drm/i915: stop using drm_pci_alloc

2019-06-14 Thread Christoph Hellwig
Remove usage of the legacy drm PCI DMA wrappers, and with that the
incorrect usage cocktail of __GFP_COMP, virt_to_page on DMA allocation
and SetPageReserved.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/i915/i915_gem.c| 30 +-
 drivers/gpu/drm/i915/i915_gem_object.h |  3 ++-
 drivers/gpu/drm/i915/intel_display.c   |  2 +-
 3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ad01c92aaf74..8f2053c91aff 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -228,7 +228,6 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, void 
*data,
 static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
 {
struct address_space *mapping = obj->base.filp->f_mapping;
-   drm_dma_handle_t *phys;
struct sg_table *st;
struct scatterlist *sg;
char *vaddr;
@@ -242,13 +241,13 @@ static int i915_gem_object_get_pages_phys(struct 
drm_i915_gem_object *obj)
 * to handle all possible callers, and given typical object sizes,
 * the alignment of the buddy allocation will naturally match.
 */
-   phys = drm_pci_alloc(obj->base.dev,
-roundup_pow_of_two(obj->base.size),
-roundup_pow_of_two(obj->base.size));
-   if (!phys)
+   obj->phys_vaddr = dma_alloc_coherent(>base.dev->pdev->dev,
+   roundup_pow_of_two(obj->base.size),
+   >phys_handle, GFP_KERNEL);
+   if (!obj->phys_vaddr)
return -ENOMEM;
 
-   vaddr = phys->vaddr;
+   vaddr = obj->phys_vaddr;
for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
struct page *page;
char *src;
@@ -286,18 +285,17 @@ static int i915_gem_object_get_pages_phys(struct 
drm_i915_gem_object *obj)
sg->offset = 0;
sg->length = obj->base.size;
 
-   sg_dma_address(sg) = phys->busaddr;
+   sg_dma_address(sg) = obj->phys_handle;
sg_dma_len(sg) = obj->base.size;
 
-   obj->phys_handle = phys;
-
__i915_gem_object_set_pages(obj, st, sg->length);
 
return 0;
 
 err_phys:
-   drm_pci_free(obj->base.dev, phys);
-
+   dma_free_coherent(>base.dev->pdev->dev,
+   roundup_pow_of_two(obj->base.size), obj->phys_vaddr,
+   obj->phys_handle);
return err;
 }
 
@@ -335,7 +333,7 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object 
*obj,
 
if (obj->mm.dirty) {
struct address_space *mapping = obj->base.filp->f_mapping;
-   char *vaddr = obj->phys_handle->vaddr;
+   char *vaddr = obj->phys_vaddr;
int i;
 
for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
@@ -363,7 +361,9 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object 
*obj,
sg_free_table(pages);
kfree(pages);
 
-   drm_pci_free(obj->base.dev, obj->phys_handle);
+   dma_free_coherent(>base.dev->pdev->dev,
+   roundup_pow_of_two(obj->base.size), obj->phys_vaddr,
+   obj->phys_handle);
 }
 
 static void
@@ -603,7 +603,7 @@ i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
 struct drm_i915_gem_pwrite *args,
 struct drm_file *file)
 {
-   void *vaddr = obj->phys_handle->vaddr + args->offset;
+   void *vaddr = obj->phys_vaddr + args->offset;
char __user *user_data = u64_to_user_ptr(args->data_ptr);
 
/* We manually control the domain here and pretend that it
@@ -1431,7 +1431,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
ret = i915_gem_gtt_pwrite_fast(obj, args);
 
if (ret == -EFAULT || ret == -ENOSPC) {
-   if (obj->phys_handle)
+   if (obj->phys_vaddr)
ret = i915_gem_phys_pwrite(obj, args, file);
else
ret = i915_gem_shmem_pwrite(obj, args);
diff --git a/drivers/gpu/drm/i915/i915_gem_object.h 
b/drivers/gpu/drm/i915/i915_gem_object.h
index ca93a40c0c87..14bd2d61d0f6 100644
--- a/drivers/gpu/drm/i915/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/i915_gem_object.h
@@ -290,7 +290,8 @@ struct drm_i915_gem_object {
};
 
/** for phys allocated objects */
-   struct drm_dma_handle *phys_handle;
+   dma_addr_t phys_handle;
+   void *phys_vaddr;
 
struct reservation_object __builtin_resv;
 };
diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 5098228f1302..4f8b368ac4e2 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10066,7 +1006

[PATCH 05/16] drm: don't mark pages returned from drm_pci_alloc reserved

2019-06-14 Thread Christoph Hellwig
We are not allowed to call virt_to_page on pages returned from
dma_alloc_coherent, as in many cases the virtual address returned
is not actually part of the kernel direct mapping.  Also there
generally is no need to mark dma memory as reserved.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/drm_bufs.c | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/drm_bufs.c b/drivers/gpu/drm/drm_bufs.c
index 7418872d87c6..b640437ce90f 100644
--- a/drivers/gpu/drm/drm_bufs.c
+++ b/drivers/gpu/drm/drm_bufs.c
@@ -77,13 +77,6 @@ drm_dma_handle_t *drm_pci_alloc(struct drm_device * dev, 
size_t size, size_t ali
return NULL;
}
 
-   /* XXX - Is virt_to_page() legal for consistent mem? */
-   /* Reserve */
-   for (addr = (unsigned long)dmah->vaddr, sz = size;
-sz > 0; addr += PAGE_SIZE, sz -= PAGE_SIZE) {
-   SetPageReserved(virt_to_page((void *)addr));
-   }
-
return dmah;
 }
 
@@ -97,16 +90,9 @@ void __drm_legacy_pci_free(struct drm_device * dev, 
drm_dma_handle_t * dmah)
unsigned long addr;
size_t sz;
 
-   if (dmah->vaddr) {
-   /* XXX - Is virt_to_page() legal for consistent mem? */
-   /* Unreserve */
-   for (addr = (unsigned long)dmah->vaddr, sz = dmah->size;
-sz > 0; addr += PAGE_SIZE, sz -= PAGE_SIZE) {
-   ClearPageReserved(virt_to_page((void *)addr));
-   }
+   if (dmah->vaddr)
dma_free_coherent(>pdev->dev, dmah->size, dmah->vaddr,
  dmah->busaddr);
-   }
 }
 
 /**
-- 
2.20.1



[PATCH 09/16] cnic: stop passing bogus gfp flags arguments to dma_alloc_coherent

2019-06-14 Thread Christoph Hellwig
dma_alloc_coherent is not just the page allocator.  The only valid
arguments to pass are either GFP_KERNEL or GFP_ATOMIC with possible
modifiers of __GFP_NORETRY or __GFP_NOWARN.

Signed-off-by: Christoph Hellwig 
---
 drivers/net/ethernet/broadcom/cnic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c 
b/drivers/net/ethernet/broadcom/cnic.c
index 57dc3cbff36e..bd1c993680e5 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -1028,7 +1028,7 @@ static int __cnic_alloc_uio_rings(struct cnic_uio_dev 
*udev, int pages)
udev->l2_ring_size = pages * CNIC_PAGE_SIZE;
udev->l2_ring = dma_alloc_coherent(>pdev->dev, udev->l2_ring_size,
   >l2_ring_map,
-  GFP_KERNEL | __GFP_COMP);
+  GFP_KERNEL);
if (!udev->l2_ring)
return -ENOMEM;
 
@@ -1036,7 +1036,7 @@ static int __cnic_alloc_uio_rings(struct cnic_uio_dev 
*udev, int pages)
udev->l2_buf_size = CNIC_PAGE_ALIGN(udev->l2_buf_size);
udev->l2_buf = dma_alloc_coherent(>pdev->dev, udev->l2_buf_size,
  >l2_buf_map,
- GFP_KERNEL | __GFP_COMP);
+ GFP_KERNEL);
if (!udev->l2_buf) {
__cnic_free_uio_rings(udev);
return -ENOMEM;
-- 
2.20.1



[PATCH 11/16] s390/ism: stop passing bogus gfp flags arguments to dma_alloc_coherent

2019-06-14 Thread Christoph Hellwig
dma_alloc_coherent is not just the page allocator.  The only valid
arguments to pass are either GFP_KERNEL or GFP_ATOMIC with possible
modifiers of __GFP_NORETRY or __GFP_NOWARN.

Signed-off-by: Christoph Hellwig 
---
 drivers/s390/net/ism_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index 4fc2056bd227..4ff5506fa4c6 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -241,7 +241,8 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct 
smcd_dmb *dmb)
 
dmb->cpu_addr = dma_alloc_coherent(>pdev->dev, dmb->dmb_len,
   >dma_addr,
-  GFP_KERNEL | __GFP_NOWARN | 
__GFP_NOMEMALLOC | __GFP_COMP | __GFP_NORETRY);
+  GFP_KERNEL | __GFP_NOWARN |
+  __GFP_NORETRY);
if (!dmb->cpu_addr)
clear_bit(dmb->sba_idx, ism->sba_bitmap);
 
-- 
2.20.1



[PATCH 13/16] mm: rename alloc_pages_exact_nid to alloc_pages_exact_node

2019-06-14 Thread Christoph Hellwig
This fits in with the naming scheme used by alloc_pages_node.

Signed-off-by: Christoph Hellwig 
---
 include/linux/gfp.h | 2 +-
 mm/page_alloc.c | 4 ++--
 mm/page_ext.c   | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fb07b503dc45..4274ea6bc72b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -532,7 +532,7 @@ extern unsigned long get_zeroed_page(gfp_t gfp_mask);
 
 void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
 void free_pages_exact(void *virt, size_t size);
-void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
+void * __meminit alloc_pages_exact_node(int nid, size_t size, gfp_t gfp_mask);
 
 #define __get_free_page(gfp_mask) \
__get_free_pages((gfp_mask), 0)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d66bc8abe0af..dd2fed66b656 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4888,7 +4888,7 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
 EXPORT_SYMBOL(alloc_pages_exact);
 
 /**
- * alloc_pages_exact_nid - allocate an exact number of physically-contiguous
+ * alloc_pages_exact_node - allocate an exact number of physically-contiguous
  *pages on a node.
  * @nid: the preferred node ID where memory should be allocated
  * @size: the number of bytes to allocate
@@ -4899,7 +4899,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
  *
  * Return: pointer to the allocated area or %NULL in case of error.
  */
-void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
+void * __meminit alloc_pages_exact_node(int nid, size_t size, gfp_t gfp_mask)
 {
unsigned int order = get_order(size);
struct page *p;
diff --git a/mm/page_ext.c b/mm/page_ext.c
index d8f1aca4ad43..bca6bb316714 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -215,7 +215,7 @@ static void *__meminit alloc_page_ext(size_t size, int nid)
gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN;
void *addr = NULL;
 
-   addr = alloc_pages_exact_nid(nid, size, flags);
+   addr = alloc_pages_exact_node(nid, size, flags);
if (addr) {
kmemleak_alloc(addr, size, 1, flags);
return addr;
-- 
2.20.1



[PATCH 15/16] dma-mapping: clear __GFP_COMP in dma_alloc_attrs

2019-06-14 Thread Christoph Hellwig
Lift the code to clear __GFP_COMP from arm into the common DMA
allocator path.  For one this fixes the various other patches that
call alloc_pages_exact or split_page in case a bogus driver passes
the argument, and it also prepares for doing exact allocation in
the generic dma-direct allocator.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c | 17 -
 kernel/dma/mapping.c  |  9 +
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0a75058c11f3..86135feb2c05 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -759,14 +759,6 @@ static void *__dma_alloc(struct device *dev, size_t size, 
dma_addr_t *handle,
if (mask < 0xULL)
gfp |= GFP_DMA;
 
-   /*
-* Following is a work-around (a.k.a. hack) to prevent pages
-* with __GFP_COMP being passed to split_page() which cannot
-* handle them.  The real problem is that this flag probably
-* should be 0 on ARM as it is not supported on this
-* platform; see CONFIG_HUGETLBFS.
-*/
-   gfp &= ~(__GFP_COMP);
args.gfp = gfp;
 
*handle = DMA_MAPPING_ERROR;
@@ -1527,15 +1519,6 @@ static void *__arm_iommu_alloc_attrs(struct device *dev, 
size_t size,
return __iommu_alloc_simple(dev, size, gfp, handle,
coherent_flag, attrs);
 
-   /*
-* Following is a work-around (a.k.a. hack) to prevent pages
-* with __GFP_COMP being passed to split_page() which cannot
-* handle them.  The real problem is that this flag probably
-* should be 0 on ARM as it is not supported on this
-* platform; see CONFIG_HUGETLBFS.
-*/
-   gfp &= ~(__GFP_COMP);
-
pages = __iommu_alloc_buffer(dev, size, gfp, attrs, coherent_flag);
if (!pages)
return NULL;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index f7afdadb6770..4b618e1abbc1 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -252,6 +252,15 @@ void *dma_alloc_attrs(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
/* let the implementation decide on the zone to allocate from: */
flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
 
+   /*
+* __GFP_COMP interacts badly with splitting up a larger order
+* allocation.  But as our allocations might not even come from the
+* page allocator, the callers can't rely on the fact that they
+* even get pages, never mind which kind.
+*/
+   if (WARN_ON_ONCE(flag & __GFP_COMP))
+   flag &= ~__GFP_COMP;
+
if (dma_is_direct(ops))
cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
else if (ops->alloc)
-- 
2.20.1



[PATCH 14/16] mm: use alloc_pages_exact_node to implement alloc_pages_exact

2019-06-14 Thread Christoph Hellwig
No need to duplicate the logic over two functions that are almost the
same.

Signed-off-by: Christoph Hellwig 
---
 include/linux/gfp.h |  5 +++--
 mm/page_alloc.c | 39 +++
 2 files changed, 10 insertions(+), 34 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4274ea6bc72b..c616a23a3f81 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -530,9 +530,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int 
order,
 extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
 extern unsigned long get_zeroed_page(gfp_t gfp_mask);
 
-void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
 void free_pages_exact(void *virt, size_t size);
-void * __meminit alloc_pages_exact_node(int nid, size_t size, gfp_t gfp_mask);
+void *alloc_pages_exact_node(int nid, size_t size, gfp_t gfp_mask);
+#define alloc_pages_exact(size, gfp_mask) \
+   alloc_pages_exact_node(NUMA_NO_NODE, size, gfp_mask)
 
 #define __get_free_page(gfp_mask) \
__get_free_pages((gfp_mask), 0)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dd2fed66b656..dec68bd21a71 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4859,34 +4859,6 @@ static void *make_alloc_exact(unsigned long addr, 
unsigned int order,
return (void *)addr;
 }
 
-/**
- * alloc_pages_exact - allocate an exact number physically-contiguous pages.
- * @size: the number of bytes to allocate
- * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
- *
- * This function is similar to alloc_pages(), except that it allocates the
- * minimum number of pages to satisfy the request.  alloc_pages() can only
- * allocate memory in power-of-two pages.
- *
- * This function is also limited by MAX_ORDER.
- *
- * Memory allocated by this function must be released by free_pages_exact().
- *
- * Return: pointer to the allocated area or %NULL in case of error.
- */
-void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
-{
-   unsigned int order = get_order(size);
-   unsigned long addr;
-
-   if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
-   gfp_mask &= ~__GFP_COMP;
-
-   addr = __get_free_pages(gfp_mask, order);
-   return make_alloc_exact(addr, order, size);
-}
-EXPORT_SYMBOL(alloc_pages_exact);
-
 /**
  * alloc_pages_exact_node - allocate an exact number of physically-contiguous
  *pages on a node.
@@ -4894,12 +4866,15 @@ EXPORT_SYMBOL(alloc_pages_exact);
  * @size: the number of bytes to allocate
  * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
  *
- * Like alloc_pages_exact(), but try to allocate on node nid first before 
falling
- * back.
+ * This function is similar to alloc_pages_node(), except that it allocates the
+ * minimum number of pages to satisfy the request while alloc_pages() can only
+ * allocate memory in power-of-two pages.  This function is also limited by
+ * MAX_ORDER.
  *
- * Return: pointer to the allocated area or %NULL in case of error.
+ * Returns a pointer to the allocated area or %NULL in case of error, memory
+ * allocated by this function must be released by free_pages_exact().
  */
-void * __meminit alloc_pages_exact_node(int nid, size_t size, gfp_t gfp_mask)
+void *alloc_pages_exact_node(int nid, size_t size, gfp_t gfp_mask)
 {
unsigned int order = get_order(size);
struct page *p;
-- 
2.20.1



[PATCH 07/16] IB/hfi1: stop passing bogus gfp flags arguments to dma_alloc_coherent

2019-06-14 Thread Christoph Hellwig
dma_alloc_coherent is not just the page allocator.  The only valid
arguments to pass are either GFP_KERNEL or GFP_ATOMIC with possible
modifiers of __GFP_NORETRY or __GFP_NOWARN.

Signed-off-by: Christoph Hellwig 
---
 drivers/infiniband/hw/hfi1/init.c | 22 +++---
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/init.c 
b/drivers/infiniband/hw/hfi1/init.c
index 71cb9525c074..ff9d106ee784 100644
--- a/drivers/infiniband/hw/hfi1/init.c
+++ b/drivers/infiniband/hw/hfi1/init.c
@@ -1846,17 +1846,10 @@ int hfi1_create_rcvhdrq(struct hfi1_devdata *dd, struct 
hfi1_ctxtdata *rcd)
u64 reg;
 
if (!rcd->rcvhdrq) {
-   gfp_t gfp_flags;
-
amt = rcvhdrq_size(rcd);
 
-   if (rcd->ctxt < dd->first_dyn_alloc_ctxt || rcd->is_vnic)
-   gfp_flags = GFP_KERNEL;
-   else
-   gfp_flags = GFP_USER;
rcd->rcvhdrq = dma_alloc_coherent(>pcidev->dev, amt,
- >rcvhdrq_dma,
- gfp_flags | __GFP_COMP);
+ >rcvhdrq_dma, 
GFP_KERNEL);
 
if (!rcd->rcvhdrq) {
dd_dev_err(dd,
@@ -1870,7 +1863,7 @@ int hfi1_create_rcvhdrq(struct hfi1_devdata *dd, struct 
hfi1_ctxtdata *rcd)
rcd->rcvhdrtail_kvaddr = 
dma_alloc_coherent(>pcidev->dev,
PAGE_SIZE,

>rcvhdrqtailaddr_dma,
-   gfp_flags);
+   GFP_KERNEL);
if (!rcd->rcvhdrtail_kvaddr)
goto bail_free;
}
@@ -1926,19 +1919,10 @@ int hfi1_setup_eagerbufs(struct hfi1_ctxtdata *rcd)
 {
struct hfi1_devdata *dd = rcd->dd;
u32 max_entries, egrtop, alloced_bytes = 0;
-   gfp_t gfp_flags;
u16 order, idx = 0;
int ret = 0;
u16 round_mtu = roundup_pow_of_two(hfi1_max_mtu);
 
-   /*
-* GFP_USER, but without GFP_FS, so buffer cache can be
-* coalesced (we hope); otherwise, even at order 4,
-* heavy filesystem activity makes these fail, and we can
-* use compound pages.
-*/
-   gfp_flags = __GFP_RECLAIM | __GFP_IO | __GFP_COMP;
-
/*
 * The minimum size of the eager buffers is a groups of MTU-sized
 * buffers.
@@ -1969,7 +1953,7 @@ int hfi1_setup_eagerbufs(struct hfi1_ctxtdata *rcd)
dma_alloc_coherent(>pcidev->dev,
   rcd->egrbufs.rcvtid_size,
   >egrbufs.buffers[idx].dma,
-  gfp_flags);
+  GFP_KERNEL);
if (rcd->egrbufs.buffers[idx].addr) {
rcd->egrbufs.buffers[idx].len =
rcd->egrbufs.rcvtid_size;
-- 
2.20.1



[PATCH 10/16] iwlwifi: stop passing bogus gfp flags arguments to dma_alloc_coherent

2019-06-14 Thread Christoph Hellwig
dma_alloc_coherent is not just the page allocator.  The only valid
arguments to pass are either GFP_KERNEL or GFP_ATOMIC with possible
modifiers of __GFP_NORETRY or __GFP_NOWARN.

Signed-off-by: Christoph Hellwig 
---
 drivers/net/wireless/intel/iwlwifi/fw/dbg.c | 3 +--
 drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c 
b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
index 5f52e40a2903..323dc5d5ee88 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
+++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
@@ -2361,8 +2361,7 @@ iwl_fw_dbg_buffer_allocation(struct iwl_fw_runtime *fwrt, 
u32 size)
 
virtual_addr =
dma_alloc_coherent(fwrt->trans->dev, size, _addr,
-  GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO |
-  __GFP_COMP);
+  GFP_KERNEL | __GFP_NOWARN);
 
/* TODO: alloc fragments if needed */
if (!virtual_addr)
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c 
b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index 803fcbac4152..22a47f928dc8 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -210,8 +210,7 @@ static void iwl_pcie_alloc_fw_monitor_block(struct 
iwl_trans *trans,
for (power = max_power; power >= min_power; power--) {
size = BIT(power);
cpu_addr = dma_alloc_coherent(trans->dev, size, ,
- GFP_KERNEL | __GFP_NOWARN |
- __GFP_ZERO | __GFP_COMP);
+ GFP_KERNEL | __GFP_NOWARN);
if (!cpu_addr)
continue;
 
-- 
2.20.1



[PATCH 08/16] IB/qib: stop passing bogus gfp flags arguments to dma_alloc_coherent

2019-06-14 Thread Christoph Hellwig
dma_alloc_coherent is not just the page allocator.  The only valid
arguments to pass are either GFP_KERNEL or GFP_ATOMIC with possible
modifiers of __GFP_NORETRY or __GFP_NOWARN.

Signed-off-by: Christoph Hellwig 
---
 drivers/infiniband/hw/qib/qib_iba6120.c |  2 +-
 drivers/infiniband/hw/qib/qib_init.c| 20 +++-
 2 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba6120.c 
b/drivers/infiniband/hw/qib/qib_iba6120.c
index 531d8a1db2c3..d8a0b8993d22 100644
--- a/drivers/infiniband/hw/qib/qib_iba6120.c
+++ b/drivers/infiniband/hw/qib/qib_iba6120.c
@@ -2076,7 +2076,7 @@ static void alloc_dummy_hdrq(struct qib_devdata *dd)
dd->cspec->dummy_hdrq = dma_alloc_coherent(>pcidev->dev,
dd->rcd[0]->rcvhdrq_size,
>cspec->dummy_hdrq_phys,
-   GFP_ATOMIC | __GFP_COMP);
+   GFP_ATOMIC);
if (!dd->cspec->dummy_hdrq) {
qib_devinfo(dd->pcidev, "Couldn't allocate dummy hdrq\n");
/* fallback to just 0'ing */
diff --git a/drivers/infiniband/hw/qib/qib_init.c 
b/drivers/infiniband/hw/qib/qib_init.c
index d4fd8a6cff7b..072885a6684d 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -1547,18 +1547,13 @@ int qib_create_rcvhdrq(struct qib_devdata *dd, struct 
qib_ctxtdata *rcd)
 
if (!rcd->rcvhdrq) {
dma_addr_t phys_hdrqtail;
-   gfp_t gfp_flags;
-
amt = ALIGN(dd->rcvhdrcnt * dd->rcvhdrentsize *
sizeof(u32), PAGE_SIZE);
-   gfp_flags = (rcd->ctxt >= dd->first_user_ctxt) ?
-   GFP_USER : GFP_KERNEL;
 
old_node_id = dev_to_node(>pcidev->dev);
set_dev_node(>pcidev->dev, rcd->node_id);
rcd->rcvhdrq = dma_alloc_coherent(
-   >pcidev->dev, amt, >rcvhdrq_phys,
-   gfp_flags | __GFP_COMP);
+   >pcidev->dev, amt, >rcvhdrq_phys, GFP_KERNEL);
set_dev_node(>pcidev->dev, old_node_id);
 
if (!rcd->rcvhdrq) {
@@ -1578,7 +1573,7 @@ int qib_create_rcvhdrq(struct qib_devdata *dd, struct 
qib_ctxtdata *rcd)
set_dev_node(>pcidev->dev, rcd->node_id);
rcd->rcvhdrtail_kvaddr = dma_alloc_coherent(
>pcidev->dev, PAGE_SIZE, _hdrqtail,
-   gfp_flags);
+   GFP_KERNEL);
set_dev_node(>pcidev->dev, old_node_id);
if (!rcd->rcvhdrtail_kvaddr)
goto bail_free;
@@ -1622,17 +1617,8 @@ int qib_setup_eagerbufs(struct qib_ctxtdata *rcd)
struct qib_devdata *dd = rcd->dd;
unsigned e, egrcnt, egrperchunk, chunk, egrsize, egroff;
size_t size;
-   gfp_t gfp_flags;
int old_node_id;
 
-   /*
-* GFP_USER, but without GFP_FS, so buffer cache can be
-* coalesced (we hope); otherwise, even at order 4,
-* heavy filesystem activity makes these fail, and we can
-* use compound pages.
-*/
-   gfp_flags = __GFP_RECLAIM | __GFP_IO | __GFP_COMP;
-
egrcnt = rcd->rcvegrcnt;
egroff = rcd->rcvegr_tid_base;
egrsize = dd->rcvegrbufsize;
@@ -1664,7 +1650,7 @@ int qib_setup_eagerbufs(struct qib_ctxtdata *rcd)
rcd->rcvegrbuf[e] =
dma_alloc_coherent(>pcidev->dev, size,
   >rcvegrbuf_phys[e],
-  gfp_flags);
+  GFP_KERNEL);
set_dev_node(>pcidev->dev, old_node_id);
if (!rcd->rcvegrbuf[e])
goto bail_rcvegrbuf_phys;
-- 
2.20.1



[PATCH 16/16] dma-mapping: use exact allocation in dma_alloc_contiguous

2019-06-14 Thread Christoph Hellwig
Many architectures (e.g. arm, m68k and sh) have always used exact
allocation in their dma coherent allocator, which avoids a lot of
memory waste especially for larger allocations.  Lift this behavior
into the generic allocator so that dma-direct and the generic IOMMU
code benefit from this behavior as well.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-contiguous.h |  8 +---
 kernel/dma/contiguous.c| 17 +++--
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
index c05d4e661489..2e542e314acf 100644
--- a/include/linux/dma-contiguous.h
+++ b/include/linux/dma-contiguous.h
@@ -161,15 +161,17 @@ static inline struct page *dma_alloc_contiguous(struct 
device *dev, size_t size,
gfp_t gfp)
 {
int node = dev ? dev_to_node(dev) : NUMA_NO_NODE;
-   size_t align = get_order(PAGE_ALIGN(size));
+   void *cpu_addr = alloc_pages_exact_node(node, size, gfp);
 
-   return alloc_pages_node(node, gfp, align);
+   if (!cpu_addr)
+   return NULL;
+   return virt_to_page(cpu_addr);
 }
 
 static inline void dma_free_contiguous(struct device *dev, struct page *page,
size_t size)
 {
-   __free_pages(page, get_order(size));
+   free_pages_exact(page_address(page), size);
 }
 
 #endif
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index bfc0c17f2a3d..84f41eea2741 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -232,9 +232,8 @@ struct page *dma_alloc_contiguous(struct device *dev, 
size_t size, gfp_t gfp)
 {
int node = dev ? dev_to_node(dev) : NUMA_NO_NODE;
size_t count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   size_t align = get_order(PAGE_ALIGN(size));
-   struct page *page = NULL;
struct cma *cma = NULL;
+   void *cpu_addr;
 
if (dev && dev->cma_area)
cma = dev->cma_area;
@@ -243,14 +242,20 @@ struct page *dma_alloc_contiguous(struct device *dev, 
size_t size, gfp_t gfp)
 
/* CMA can be used only in the context which permits sleeping */
if (cma && gfpflags_allow_blocking(gfp)) {
+   size_t align = get_order(PAGE_ALIGN(size));
+   struct page *page;
+
align = min_t(size_t, align, CONFIG_CMA_ALIGNMENT);
page = cma_alloc(cma, count, align, gfp & __GFP_NOWARN);
+   if (page)
+   return page;
}
 
/* Fallback allocation of normal pages */
-   if (!page)
-   page = alloc_pages_node(node, gfp, align);
-   return page;
+   cpu_addr = alloc_pages_exact_node(node, size, gfp);
+   if (!cpu_addr)
+   return NULL;
+   return virt_to_page(cpu_addr);
 }
 
 /**
@@ -267,7 +272,7 @@ struct page *dma_alloc_contiguous(struct device *dev, 
size_t size, gfp_t gfp)
 void dma_free_contiguous(struct device *dev, struct page *page, size_t size)
 {
if (!cma_release(dev_get_cma_area(dev), page, size >> PAGE_SHIFT))
-   __free_pages(page, get_order(size));
+   free_pages_exact(page_address(page), size);
 }
 
 /*
-- 
2.20.1



[PATCH 12/16] staging/comedi: mark as broken

2019-06-14 Thread Christoph Hellwig
comedi_buf.c abuses the DMA API in gravely broken ways, as it assumes it
can call virt_to_page on the result, and then just remaps it as uncached
using vmap.  Disable the driver until this API abuse has been fixed.

Signed-off-by: Christoph Hellwig 
---
 drivers/staging/comedi/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/comedi/Kconfig b/drivers/staging/comedi/Kconfig
index 049b659fa6ad..e7c021d76cfa 100644
--- a/drivers/staging/comedi/Kconfig
+++ b/drivers/staging/comedi/Kconfig
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 config COMEDI
tristate "Data acquisition support (comedi)"
+   depends on BROKEN
help
  Enable support for a wide range of data acquisition devices
  for Linux.
-- 
2.20.1



[PATCH 02/16] drm/ati_pcigart: stop using drm_pci_alloc

2019-06-14 Thread Christoph Hellwig
Remove usage of the legacy drm PCI DMA wrappers, and with that the
incorrect usage cocktail of __GFP_COMP, virt_to_page on DMA allocation
and SetPageReserved.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/ati_pcigart.c | 27 +++
 include/drm/ati_pcigart.h |  5 -
 2 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/ati_pcigart.c b/drivers/gpu/drm/ati_pcigart.c
index 2362f07fe1fc..f66d4fccd812 100644
--- a/drivers/gpu/drm/ati_pcigart.c
+++ b/drivers/gpu/drm/ati_pcigart.c
@@ -41,21 +41,14 @@
 static int drm_ati_alloc_pcigart_table(struct drm_device *dev,
   struct drm_ati_pcigart_info *gart_info)
 {
-   gart_info->table_handle = drm_pci_alloc(dev, gart_info->table_size,
-   PAGE_SIZE);
-   if (gart_info->table_handle == NULL)
+   gart_info->table_vaddr = dma_alloc_coherent(>pdev->dev,
+   gart_info->table_size, _info->table_handle,
+   GFP_KERNEL);
+   if (!gart_info->table_vaddr)
return -ENOMEM;
-
return 0;
 }
 
-static void drm_ati_free_pcigart_table(struct drm_device *dev,
-  struct drm_ati_pcigart_info *gart_info)
-{
-   drm_pci_free(dev, gart_info->table_handle);
-   gart_info->table_handle = NULL;
-}
-
 int drm_ati_pcigart_cleanup(struct drm_device *dev, struct 
drm_ati_pcigart_info *gart_info)
 {
struct drm_sg_mem *entry = dev->sg;
@@ -87,8 +80,10 @@ int drm_ati_pcigart_cleanup(struct drm_device *dev, struct 
drm_ati_pcigart_info
}
 
if (gart_info->gart_table_location == DRM_ATI_GART_MAIN &&
-   gart_info->table_handle) {
-   drm_ati_free_pcigart_table(dev, gart_info);
+   gart_info->table_vaddr) {
+   dma_free_coherent(>pdev->dev, gart_info->table_size,
+   gart_info->table_vaddr, 
gart_info->table_handle);
+   gart_info->table_vaddr = NULL;
}
 
return 1;
@@ -127,9 +122,9 @@ int drm_ati_pcigart_init(struct drm_device *dev, struct 
drm_ati_pcigart_info *ga
goto done;
}
 
-   pci_gart = gart_info->table_handle->vaddr;
-   address = gart_info->table_handle->vaddr;
-   bus_address = gart_info->table_handle->busaddr;
+   pci_gart = gart_info->table_vaddr;
+   address = gart_info->table_vaddr;
+   bus_address = gart_info->table_handle;
} else {
address = gart_info->addr;
bus_address = gart_info->bus_addr;
diff --git a/include/drm/ati_pcigart.h b/include/drm/ati_pcigart.h
index a728a1364e66..2ffe278ba3b3 100644
--- a/include/drm/ati_pcigart.h
+++ b/include/drm/ati_pcigart.h
@@ -18,7 +18,10 @@ struct drm_ati_pcigart_info {
void *addr;
dma_addr_t bus_addr;
dma_addr_t table_mask;
-   struct drm_dma_handle *table_handle;
+
+   dma_addr_t table_handle;
+   void *table_vaddr;
+
struct drm_local_map mapping;
int table_size;
 };
-- 
2.20.1



use exact allocation for dma coherent memory

2019-06-14 Thread Christoph Hellwig
Hi all,

various architectures have used exact memory allocations for dma
allocations for a long time, but x86 and thus the common code based
on it kept using our normal power of two allocator, which tends to
waste a lot of memory for certain allocations.

Switching to a slightly cleaned up alloc_pages_exact is pretty easy,
but it turns out that because we didn't filter valid gfp_t flags
on the DMA allocator, a bunch of drivers were passing __GFP_COMP
to it, which is rather bogus in too many ways to explain.  Arm has
been filtering it for a while, but this series instead tries to fix
the drivers and warn when __GFP_COMP is passed, which makes it much
larger than just adding the functionality.


[PATCH 06/16] drm: don't pass __GFP_COMP to dma_alloc_coherent in drm_pci_alloc

2019-06-14 Thread Christoph Hellwig
The memory returned from dma_alloc_coherent is opaque to the user,
thus the exact way of page refcounting shall not matter either.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/drm_bufs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_bufs.c b/drivers/gpu/drm/drm_bufs.c
index b640437ce90f..7a79a16a055c 100644
--- a/drivers/gpu/drm/drm_bufs.c
+++ b/drivers/gpu/drm/drm_bufs.c
@@ -70,7 +70,7 @@ drm_dma_handle_t *drm_pci_alloc(struct drm_device * dev, 
size_t size, size_t ali
dmah->size = size;
dmah->vaddr = dma_alloc_coherent(>pdev->dev, size,
 >busaddr,
-GFP_KERNEL | __GFP_COMP);
+GFP_KERNEL);
 
if (dmah->vaddr == NULL) {
kfree(dmah);
-- 
2.20.1



[PATCH 04/16] drm: move drm_pci_{alloc,free} to drm_legacy

2019-06-14 Thread Christoph Hellwig
These functions are rather broken in that they try to pass __GFP_COMP
to dma_alloc_coherent, call virt_to_page on the return value and
mess with PageReserved.  And they are not actually used by any modern
driver.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/drm_bufs.c | 85 
 drivers/gpu/drm/drm_pci.c  | 89 --
 2 files changed, 85 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/drm_bufs.c b/drivers/gpu/drm/drm_bufs.c
index bfc419ed9d6c..7418872d87c6 100644
--- a/drivers/gpu/drm/drm_bufs.c
+++ b/drivers/gpu/drm/drm_bufs.c
@@ -38,6 +38,91 @@
 
 #include 
 
+/**
+ * drm_pci_alloc - Allocate a PCI consistent memory block, for DMA.
+ * @dev: DRM device
+ * @size: size of block to allocate
+ * @align: alignment of block
+ *
+ * FIXME: This is a needless abstraction of the Linux dma-api and should be
+ * removed.
+ *
+ * Return: A handle to the allocated memory block on success or NULL on
+ * failure.
+ */
+drm_dma_handle_t *drm_pci_alloc(struct drm_device * dev, size_t size, size_t 
align)
+{
+   drm_dma_handle_t *dmah;
+   unsigned long addr;
+   size_t sz;
+
+   /* pci_alloc_consistent only guarantees alignment to the smallest
+* PAGE_SIZE order which is greater than or equal to the requested size.
+* Return NULL here for now to make sure nobody tries for larger 
alignment
+*/
+   if (align > size)
+   return NULL;
+
+   dmah = kmalloc(sizeof(drm_dma_handle_t), GFP_KERNEL);
+   if (!dmah)
+   return NULL;
+
+   dmah->size = size;
+   dmah->vaddr = dma_alloc_coherent(>pdev->dev, size,
+>busaddr,
+GFP_KERNEL | __GFP_COMP);
+
+   if (dmah->vaddr == NULL) {
+   kfree(dmah);
+   return NULL;
+   }
+
+   /* XXX - Is virt_to_page() legal for consistent mem? */
+   /* Reserve */
+   for (addr = (unsigned long)dmah->vaddr, sz = size;
+sz > 0; addr += PAGE_SIZE, sz -= PAGE_SIZE) {
+   SetPageReserved(virt_to_page((void *)addr));
+   }
+
+   return dmah;
+}
+
+/*
+ * Free a PCI consistent memory block without freeing its descriptor.
+ *
+ * This function is for internal use in the Linux-specific DRM core code.
+ */
+void __drm_legacy_pci_free(struct drm_device * dev, drm_dma_handle_t * dmah)
+{
+   unsigned long addr;
+   size_t sz;
+
+   if (dmah->vaddr) {
+   /* XXX - Is virt_to_page() legal for consistent mem? */
+   /* Unreserve */
+   for (addr = (unsigned long)dmah->vaddr, sz = dmah->size;
+sz > 0; addr += PAGE_SIZE, sz -= PAGE_SIZE) {
+   ClearPageReserved(virt_to_page((void *)addr));
+   }
+   dma_free_coherent(>pdev->dev, dmah->size, dmah->vaddr,
+ dmah->busaddr);
+   }
+}
+
+/**
+ * drm_pci_free - Free a PCI consistent memory block
+ * @dev: DRM device
+ * @dmah: handle to memory block
+ *
+ * FIXME: This is a needless abstraction of the Linux dma-api and should be
+ * removed.
+ */
+void drm_pci_free(struct drm_device * dev, drm_dma_handle_t * dmah)
+{
+   __drm_legacy_pci_free(dev, dmah);
+   kfree(dmah);
+}
+
 static struct drm_map_list *drm_find_matching_map(struct drm_device *dev,
  struct drm_local_map *map)
 {
diff --git a/drivers/gpu/drm/drm_pci.c b/drivers/gpu/drm/drm_pci.c
index 693748ad8b88..77a215f2a8e4 100644
--- a/drivers/gpu/drm/drm_pci.c
+++ b/drivers/gpu/drm/drm_pci.c
@@ -31,95 +31,6 @@
 #include "drm_internal.h"
 #include "drm_legacy.h"
 
-/**
- * drm_pci_alloc - Allocate a PCI consistent memory block, for DMA.
- * @dev: DRM device
- * @size: size of block to allocate
- * @align: alignment of block
- *
- * FIXME: This is a needless abstraction of the Linux dma-api and should be
- * removed.
- *
- * Return: A handle to the allocated memory block on success or NULL on
- * failure.
- */
-drm_dma_handle_t *drm_pci_alloc(struct drm_device * dev, size_t size, size_t 
align)
-{
-   drm_dma_handle_t *dmah;
-   unsigned long addr;
-   size_t sz;
-
-   /* pci_alloc_consistent only guarantees alignment to the smallest
-* PAGE_SIZE order which is greater than or equal to the requested size.
-* Return NULL here for now to make sure nobody tries for larger 
alignment
-*/
-   if (align > size)
-   return NULL;
-
-   dmah = kmalloc(sizeof(drm_dma_handle_t), GFP_KERNEL);
-   if (!dmah)
-   return NULL;
-
-   dmah->size = size;
-   dmah->vaddr = dma_alloc_coherent(>pdev->dev, size,
->busaddr,
-GFP_KERNEL | __GFP_COMP);
-

[PATCH 01/16] media: videobuf-dma-contig: use dma_mmap_coherent

2019-06-14 Thread Christoph Hellwig
dma_alloc_coherent does not return a physical address, but a DMA
address, which might be remapped or have an offset.  Passing this
DMA address to vm_iomap_memory is completely bogus.  Use the proper
dma_mmap_coherent helper instead, and stop passing __GFP_COMP
to dma_alloc_coherent, as the memory management inside the DMA
allocator is hidden from the callers.

Fixes: a8f3c203e19b ("[media] videobuf-dma-contig: add cache support")
Signed-off-by: Christoph Hellwig 
---
 drivers/media/v4l2-core/videobuf-dma-contig.c | 23 +++
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c 
b/drivers/media/v4l2-core/videobuf-dma-contig.c
index e1bf50df4c70..a5942ea38f1f 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -39,11 +39,11 @@ struct videobuf_dma_contig_memory {
 
 static int __videobuf_dc_alloc(struct device *dev,
   struct videobuf_dma_contig_memory *mem,
-  unsigned long size, gfp_t flags)
+  unsigned long size)
 {
mem->size = size;
-   mem->vaddr = dma_alloc_coherent(dev, mem->size,
-   &mem->dma_handle, flags);
+   mem->vaddr = dma_alloc_coherent(dev, mem->size, &mem->dma_handle,
+   GFP_KERNEL);
 
if (!mem->vaddr) {
dev_err(dev, "memory alloc size %ld failed\n", mem->size);
@@ -260,8 +260,7 @@ static int __videobuf_iolock(struct videobuf_queue *q,
return videobuf_dma_contig_user_get(mem, vb);
 
/* allocate memory for the read() method */
-   if (__videobuf_dc_alloc(q->dev, mem, PAGE_ALIGN(vb->size),
-   GFP_KERNEL))
+   if (__videobuf_dc_alloc(q->dev, mem, PAGE_ALIGN(vb->size)))
return -ENOMEM;
break;
case V4L2_MEMORY_OVERLAY:
@@ -280,7 +279,6 @@ static int __videobuf_mmap_mapper(struct videobuf_queue *q,
struct videobuf_dma_contig_memory *mem;
struct videobuf_mapping *map;
int retval;
-   unsigned long size;
 
dev_dbg(q->dev, "%s\n", __func__);
 
@@ -298,23 +296,18 @@ static int __videobuf_mmap_mapper(struct videobuf_queue *q,
BUG_ON(!mem);
MAGIC_CHECK(mem->magic, MAGIC_DC_MEM);
 
-   if (__videobuf_dc_alloc(q->dev, mem, PAGE_ALIGN(buf->bsize),
-   GFP_KERNEL | __GFP_COMP))
+   if (__videobuf_dc_alloc(q->dev, mem, PAGE_ALIGN(buf->bsize)))
goto error;
 
-   /* Try to remap memory */
-   size = vma->vm_end - vma->vm_start;
-   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-
/* the "vm_pgoff" is just used in v4l2 to find the
 * corresponding buffer data structure which is allocated
 * earlier and it does not mean the offset from the physical
 * buffer start address as usual. So set it to 0 to pass
-* the sanity check in vm_iomap_memory().
+* the sanity check in dma_mmap_coherent().
 */
vma->vm_pgoff = 0;
-
-   retval = vm_iomap_memory(vma, mem->dma_handle, size);
+   retval = dma_mmap_coherent(q->dev, vma, mem->vaddr, mem->dma_handle,
+   vma->vm_end - vma->vm_start);
if (retval) {
dev_err(q->dev, "mmap: remap failed with error %d. ",
retval);
-- 
2.20.1
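
The pattern the patch moves to is worth spelling out: memory obtained from
dma_alloc_coherent() is handed to userspace with dma_mmap_coherent(), rather
than by feeding the DMA address to vm_iomap_memory() as if it were a physical
address.  A rough sketch of that pairing follows; the demo_buf structure is
made up for illustration and is not the videobuf code:

#include <linux/dma-mapping.h>
#include <linux/mm.h>

struct demo_buf {
	struct device	*dev;
	void		*vaddr;
	dma_addr_t	dma_handle;
	size_t		size;
};

static int demo_buf_alloc(struct demo_buf *buf, size_t size)
{
	buf->size = PAGE_ALIGN(size);
	buf->vaddr = dma_alloc_coherent(buf->dev, buf->size,
					&buf->dma_handle, GFP_KERNEL);
	return buf->vaddr ? 0 : -ENOMEM;
}

static int demo_buf_mmap(struct demo_buf *buf, struct vm_area_struct *vma)
{
	/*
	 * dma_mmap_coherent() knows how the buffer was allocated and
	 * remapped, so no manual pgprot fixups are needed here.
	 */
	vma->vm_pgoff = 0;
	return dma_mmap_coherent(buf->dev, vma, buf->vaddr, buf->dma_handle,
				 vma->vm_end - vma->vm_start);
}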



Re: [PATCH v5 2/9] mm: Add an apply_to_pfn_range interface

2019-06-12 Thread Christoph Hellwig
On Wed, Jun 12, 2019 at 08:42:36AM +0200, Thomas Hellström (VMware) wrote:
> From: Thomas Hellstrom 
> 
> This is basically apply_to_page_range with added functionality:
> Allocating missing parts of the page table becomes optional, which
> means that the function can be guaranteed not to error if allocation
> is disabled. Also passing of the closure struct and callback function
> becomes different and more in line with how things are done elsewhere.
> 
> Finally we keep apply_to_page_range as a wrapper around apply_to_pfn_range
> 
> The reason for not using the page-walk code is that we want to perform
> the page-walk on vmas pointing to an address space without requiring the
> mmap_sem to be held rather than on vmas belonging to a process with the
> mmap_sem held.
> 
> Notable changes since RFC:
> Don't export apply_to_pfn range.
> 
> Cc: Andrew Morton 
> Cc: Matthew Wilcox 
> Cc: Will Deacon 
> Cc: Peter Zijlstra 
> Cc: Rik van Riel 
> Cc: Minchan Kim 
> Cc: Michal Hocko 
> Cc: Huang Ying 
> Cc: Souptick Joarder 
> Cc: "Jérôme Glisse" 
> Cc: linux...@kvack.org
> Cc: linux-ker...@vger.kernel.org
> 
> Signed-off-by: Thomas Hellstrom 
> Reviewed-by: Ralph Campbell  #v1
> ---
>  include/linux/mm.h |  10 
>  mm/memory.c| 135 ++---
>  2 files changed, 113 insertions(+), 32 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0e8834ac32b7..3d06ce2a64af 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2675,6 +2675,16 @@ typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
>  extern int apply_to_page_range(struct mm_struct *mm, unsigned long address,
>  unsigned long size, pte_fn_t fn, void *data);
>  
> +struct pfn_range_apply;
> +typedef int (*pter_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
> +  struct pfn_range_apply *closure);
> +struct pfn_range_apply {
> + struct mm_struct *mm;
> + pter_fn_t ptefn;
> + unsigned int alloc;
> +};
> +extern int apply_to_pfn_range(struct pfn_range_apply *closure,
> +   unsigned long address, unsigned long size);
>  
>  #ifdef CONFIG_PAGE_POISONING
>  extern bool page_poisoning_enabled(void);
> diff --git a/mm/memory.c b/mm/memory.c
> index 168f546af1ad..462aa47f8878 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2032,18 +2032,17 @@ int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long
>  }
>  EXPORT_SYMBOL(vm_iomap_memory);
>  
> -static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
> -  unsigned long addr, unsigned long end,
> -  pte_fn_t fn, void *data)
> +static int apply_to_pte_range(struct pfn_range_apply *closure, pmd_t *pmd,
> +   unsigned long addr, unsigned long end)
>  {
>   pte_t *pte;
>   int err;
>   pgtable_t token;
>   spinlock_t *uninitialized_var(ptl);
>  
> - pte = (mm == &init_mm) ?
> + pte = (closure->mm == &init_mm) ?
>   pte_alloc_kernel(pmd, addr) :
> - pte_alloc_map_lock(mm, pmd, addr, &ptl);
> + pte_alloc_map_lock(closure->mm, pmd, addr, &ptl);
>   if (!pte)
>   return -ENOMEM;
>  
> @@ -2054,86 +2053,109 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
>   token = pmd_pgtable(*pmd);
>  
>   do {
> - err = fn(pte++, token, addr, data);
> + err = closure->ptefn(pte++, token, addr, closure);
>   if (err)
>   break;
>   } while (addr += PAGE_SIZE, addr != end);
>  
>   arch_leave_lazy_mmu_mode();
>  
> - if (mm != &init_mm)
> + if (closure->mm != &init_mm)
>   pte_unmap_unlock(pte-1, ptl);
>   return err;
>  }
>  
> -static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
> -  unsigned long addr, unsigned long end,
> -  pte_fn_t fn, void *data)
> +static int apply_to_pmd_range(struct pfn_range_apply *closure, pud_t *pud,
> +   unsigned long addr, unsigned long end)
>  {
>   pmd_t *pmd;
>   unsigned long next;
> - int err;
> + int err = 0;
>  
>   BUG_ON(pud_huge(*pud));
>  
> - pmd = pmd_alloc(mm, pud, addr);
> + pmd = pmd_alloc(closure->mm, pud, addr);
>   if (!pmd)
>   return -ENOMEM;
> +
>   do {
>   next = pmd_addr_end(addr, end);
> - err = apply_to_pte_range(mm, pmd, addr, next, fn, data);
> + if (!closure->alloc && pmd_none_or_clear_bad(pmd))
> + continue;
> + err = apply_to_pte_range(closure, pmd, addr, next);
>   if (err)
>   break;
>   } while (pmd++, addr = next, addr != end);
>   return err;
>  }
>  
> -static int apply_to_pud_range(struct mm_struct *mm, p4d_t *p4d,
> -   

Re: [PATCH v5 3/9] mm: Add write-protect and clean utilities for address space ranges

2019-06-12 Thread Christoph Hellwig
On Wed, Jun 12, 2019 at 04:23:50AM -0700, Christoph Hellwig wrote:
> friends.  Also in general new core functionality like this should go
> along with the actual user, we don't need to repeat the hmm disaster.

Ok, I see you actually did that, it just got hidden by the awful
selective cc stuff a lot of people do at the moment.  Sorry for the
noise.


Re: [PATCH v5 3/9] mm: Add write-protect and clean utilities for address space ranges

2019-06-12 Thread Christoph Hellwig
On Wed, Jun 12, 2019 at 08:42:37AM +0200, Thomas Hellström (VMware) wrote:
> From: Thomas Hellstrom 
> 
> Add two utilities to a) write-protect and b) clean all ptes pointing into
> a range of an address space.
> The utilities are intended to aid in tracking dirty pages (either
> driver-allocated system memory or pci device memory).
> The write-protect utility should be used in conjunction with
> page_mkwrite() and pfn_mkwrite() to trigger write page-faults on page
> accesses. Typically one would want to use this on sparse accesses into
> large memory regions. The clean utility should be used to utilize
> hardware dirtying functionality and avoid the overhead of page-faults,
> typically on large accesses into small memory regions.

Please use EXPORT_SYMBOL_GPL, just like for apply_to_page_range and
friends.  Also in general new core functionality like this should go
along with the actual user, we don't need to repeat the hmm disaster.


Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Christoph Hellwig
If you (and a few other actors in the thread) want people to actually
read what you wrote, please follow proper mailing list etiquette.  I've
given up on reading all the recent mails after scrolling through two
pages of full quotes.


Re: [PATCH] drm/vmwgfx: fix a warning due to missing dma_parms

2019-05-24 Thread Christoph Hellwig
On Thu, May 23, 2019 at 10:37:19PM -0400, Qian Cai wrote:
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index bf6c3500d363..5c567b81174f 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -747,6 +747,13 @@ static int vmw_driver_load(struct drm_device *dev, unsigned long chipset)
>   if (unlikely(ret != 0))
>   goto out_err0;
>  
> + dev->dev->dma_parms =  kzalloc(sizeof(*dev->dev->dma_parms),
> +GFP_KERNEL);
> + if (!dev->dev->dma_parms)
> + goto out_err0;

What bus does this device come from?  I thought vmwgfx was a (virtualized)
PCI device, in which case this should be provided by the PCI core.
Or are we calling DMA mapping routines on some arbitrary other struct device,
in which case that is the real bug and we should switch to the PCI device
instead.

> + dma_set_max_seg_size(dev->dev, *dev->dev->dma_mask);

That looks odd.  If you want to support an unlimited segment size
just pass UINT_MAX here.
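
Putting the two comments together: if the driver has to touch dma_parms at
all, a managed allocation plus an unlimited segment size is the obvious shape
of the fix.  A sketch only, not the final patch, assuming it sits in
vmw_driver_load() where the quoted hunk is:

	if (!dev->dev->dma_parms) {
		dev->dev->dma_parms = devm_kzalloc(dev->dev,
						   sizeof(*dev->dev->dma_parms),
						   GFP_KERNEL);
		if (!dev->dev->dma_parms)
			return -ENOMEM;
	}
	/* No real segment size limit for this device. */
	dma_set_max_seg_size(dev->dev, UINT_MAX);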


Re: [PATCH] etnaviv: allow to build on ARC

2019-01-16 Thread Christoph Hellwig
On Mon, Jan 14, 2019 at 07:31:57PM +0300, Eugeniy Paltsev wrote:
> ARC HSDK SoC has Vivante GPU IP so allow build etnaviv for ARC.
> 
> Signed-off-by: Eugeniy Paltsev 
> ---
>  drivers/gpu/drm/etnaviv/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/Kconfig b/drivers/gpu/drm/etnaviv/Kconfig
> index 342591a1084e..49a9957c3373 100644
> --- a/drivers/gpu/drm/etnaviv/Kconfig
> +++ b/drivers/gpu/drm/etnaviv/Kconfig
> @@ -2,7 +2,7 @@
>  config DRM_ETNAVIV
>   tristate "ETNAVIV (DRM support for Vivante GPU IP cores)"
>   depends on DRM
> - depends on ARCH_MXC || ARCH_DOVE || ARM || COMPILE_TEST
> + depends on ARCH_MXC || ARCH_DOVE || ARM || ARC || COMPILE_TEST

Is there any reason to not just remove the dependencies entirely?
It seems like it could literally build everywhere, and who knows what
other SOCs the IP blocks end up in sooner or later?


Re: [PATCH v2] drm/xen-front: Make shmem backed display buffer coherent

2019-01-15 Thread Christoph Hellwig
On Wed, Jan 16, 2019 at 07:30:02AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > +   if (!dma_map_sg(dev->dev, xen_obj->sgt->sgl, xen_obj->sgt->nents,
> > +   DMA_BIDIRECTIONAL)) {
> > +   ret = -EFAULT;
> > +   goto fail_free_sgt;
> > +   }
> 
> Hmm, so it seems the arm guys could not come up with a suggestion how to
> solve that one in a better way.  Ok, lets go with this then.
> 
> But didn't we agree that this deserves a comment exmplaining the purpose
> of the dma_map_sg() call is to flush caches and that there is no actual
> DMA happening here?

Using a dma mapping call to flush caches is a complete no-go.  But the
real question is why you'd even want to flush caches if you do not
want a dma mapping?

This whole issue keeps getting more and more confusing.


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-14 Thread Christoph Hellwig
Hmm, I wonder if we are not actually using swiotlb in the end.
Can you check whether your dmesg contains this line or not?

PCI-DMA: Using software bounce buffering for IO (SWIOTLB)

If not, I guess we found a bug in swiotlb_exit() vs is_swiotlb_buffer(),
and you can try this patch:

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d6361776dc5c..1fb6fd68b9c7 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -378,6 +378,8 @@ void __init swiotlb_exit(void)
memblock_free_late(io_tlb_start,
   PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
}
+   io_tlb_start = 0;
+   io_tlb_end = 0;
io_tlb_nslabs = 0;
max_segment = 0;
 }
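
The reason the two added assignments matter is the range check used to decide
whether an address belongs to the bounce buffer.  is_swiotlb_buffer() (quoted
in full further down the thread) is simply:

static inline bool is_swiotlb_buffer(phys_addr_t paddr)
{
	return paddr >= io_tlb_start && paddr < io_tlb_end;
}

If swiotlb_exit() frees the bounce buffer but leaves io_tlb_start and
io_tlb_end at their old values, any DMA address that happens to fall into the
stale range is still treated as a bounce buffer, and the unmap path then
tries to copy data back from memory that was never a bounce slot.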


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-14 Thread Christoph Hellwig
On Thu, Jan 10, 2019 at 06:52:26PM +0100, Sibren Vasse wrote:
> On Thu, 10 Jan 2019 at 15:48, Christoph Hellwig  wrote:
> >
> > On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
> > >>  From the trace it looks like we hit the case where swiotlb tries
> > >> to copy back data from a bounce buffer, but hits a dangling or NULL
> > >> pointer.  So a couple questions for the submitter:
> > >>
> > >>   - does the system have more than 4GB memory and thus use swiotlb?
> > >> (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
> > >>   - does the device this happens on have a DMA mask smaller than
> > >> the available memory, that is should swiotlb be used here to start
> > >> with?
> > >
> > > Rather unlikely. The device is an AMD GPU, so we can address memory up to
> > > 1TB.
> >
> > So we probably somehow got a false positive.
> >
> > For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> > backtrace really is in the swiotlb code (I can't think of anything else,
> > but I'd rather be sure).
> I'm not sure what you want me to confirm. Could you elaborate?

Please open the vmlinux file for which this happened in gdb,
then send the output from this command

l *(dma_direct_unmap_page+0x92)

to this thread.

> > Second it would be great to print what the contents of io_tlb_start
> > and io_tlb_end are, e.g. by doing a printk_once in is_swiotlb_buffer,
> > maybe that gives a clue why we are hitting the swiotlb code here.
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 7c007ed7505f..042246dbae00 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -69,6 +69,7 @@ extern phys_addr_t io_tlb_start, io_tlb_end;
> 
>  static inline bool is_swiotlb_buffer(phys_addr_t paddr)
>  {
> +printk_once(KERN_INFO "io_tlb_start: %llu, io_tlb_end: %llu", io_tlb_start, io_tlb_end);
>  return paddr >= io_tlb_start && paddr < io_tlb_end;
>  }
> 
> Result on boot:
> [   11.405558] io_tlb_start: 3782983680, io_tlb_end: 3850092544

So this is a normal swiotlb location, and it does definitely exist.


Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

2019-01-14 Thread Christoph Hellwig
On Thu, Jan 10, 2019 at 04:42:18PM -0700, Jason Gunthorpe wrote:
> > Changes since the RFC:
> > - Rework vmwgfx too [CH]
> > - Use a distinct type for the DMA page iterator [CH]
> > - Do not have a #ifdef [CH]
> 
> ChristophH: Will you ack?

This looks generally fine.

> 
> Are you still OK with the vmwgfx reworking, or should we go back to
> the original version that didn't have the type safety so this driver
> can be left broken?

I think the map method in vmwgfx that just does virt_to_phys is
pretty broken.  Thomas, can you check if you see any performance
difference with just doing the proper dma mapping, as that gets the
driver out of interface abuse land?

While we're at it I think we need to merge my series in this area
for 5.0, because without that the driver is already broken.  Where
should we merge it?
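
For context, "doing the proper dma mapping" means asking the DMA API for the
device address instead of handing back virt_to_phys() from the map callback;
the latter bypasses IOMMUs, DMA offsets and swiotlb entirely.  A minimal
sketch of the difference, with illustrative function names and the
dma_mapping_error() check omitted:

static dma_addr_t bogus_map(struct device *dev, struct page *page)
{
	/* Wrong: a physical address is not a bus/DMA address in general. */
	return (dma_addr_t)virt_to_phys(page_address(page));
}

static dma_addr_t proper_map(struct device *dev, struct page *page)
{
	/* Right: let the DMA API create and account for the mapping. */
	return dma_map_page(dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
}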


Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

2019-01-14 Thread Christoph Hellwig
On Sat, Jan 12, 2019 at 01:03:05PM -0600, Shiraz Saleem wrote:
> Well I was trying to convert the RDMA drivers to use your new iterator variant
> and saw the need for it in locations where we need the virtual address of
> the pages contained in the SGEs.

As far as I can tell that pg_arr[i] value is only ever used for
the case where we do an explicit dma coherent allocation, so you
should not have to fill it out at all.


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-10 Thread Christoph Hellwig
On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
>>  From the trace it looks like we hit the case where swiotlb tries
>> to copy back data from a bounce buffer, but hits a dangling or NULL
>> pointer.  So a couple questions for the submitter:
>>
>>   - does the system have more than 4GB memory and thus use swiotlb?
>> (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
>>   - does the device this happens on have a DMA mask smaller than
>> the available memory, that is should swiotlb be used here to start
>> with?
>
> Rather unlikely. The device is an AMD GPU, so we can address memory up to 
> 1TB.

So we probably somehow got a false positive.

For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
backtrace really is in the swiotlb code (I can't think of anything else,
but I'd rather be sure).

Second it would be great to print what the contents of io_tlb_start
and io_tlb_end are, e.g. by doing a printk_once in is_swiotlb_buffer,
maybe that gives a clue why we are hitting the swiotlb code here.


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-10 Thread Christoph Hellwig
On Thu, Jan 10, 2019 at 10:59:02AM +0100, Michel Dänzer wrote:
> 
> Hi Christoph,
> 
> 
> https://bugs.freedesktop.org/109234 (please ignore comments #6-#9) was
> bisected to your commit 55897af63091 "dma-direct: merge swiotlb_dma_ops
> into the dma_direct code". Any ideas?

From the trace it looks like we hit the case where swiotlb tries
to copy back data from a bounce buffer, but hits a dangling or NULL
pointer.  So a couple questions for the submitter:

 - does the system have more than 4GB memory and thus use swiotlb?
   (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
 - does the device this happens on have a DMA mask smaller than
   the available memory, that is should swiotlb be used here to start
   with?


[PATCH 1/4] drm/vmwgfx: remove CONFIG_X86 ifdefs

2019-01-05 Thread Christoph Hellwig
The driver depends on CONFIG_X86 so these are dead code.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 25afb1d594e3..69e325b2d954 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -565,7 +565,6 @@ static int vmw_dma_select_mode(struct vmw_private *dev_priv)
[vmw_dma_alloc_coherent] = "Using coherent TTM pages.",
[vmw_dma_map_populate] = "Keeping DMA mappings.",
[vmw_dma_map_bind] = "Giving up DMA mappings early."};
-#ifdef CONFIG_X86
const struct dma_map_ops *dma_ops = get_dma_ops(dev_priv->dev->dev);
 
 #ifdef CONFIG_INTEL_IOMMU
@@ -607,11 +606,6 @@ static int vmw_dma_select_mode(struct vmw_private *dev_priv)
if (dev_priv->map_mode == vmw_dma_alloc_coherent)
return -EINVAL;
 #endif
-
-#else /* CONFIG_X86 */
-   dev_priv->map_mode = vmw_dma_map_populate;
-#endif /* CONFIG_X86 */
-
DRM_INFO("DMA map mode: %s\n", names[dev_priv->map_mode]);
 
return 0;
-- 
2.20.1



  1   2   3   4   >