[PATCH v2 11/17] vfio/type1: Mark follow_pfn as unsafe

2020-10-09 Daniel Vetter
The code seems to stuff these pfns into iommu ptes (or something like
that, I didn't follow), but there's no mmu_notifier to ensure that
access is synchronized with pte updates.

Hence mark these as unsafe. This means that with
CONFIG_STRICT_FOLLOW_PFN, these will be rejected.

The real fix is to wire up an mmu_notifier ... somehow. That probably
means any invalidate is a fatal fault for this vfio device, but that
shouldn't ever happen if userspace is reasonable.
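To make the effect of the rename concrete, here is a rough userspace model (not kernel code). model_follow_pfn(), model_unsafe_follow_pfn() and the strict_follow_pfn switch are hypothetical stand-ins for follow_pfn(), unsafe_follow_pfn() and CONFIG_STRICT_FOLLOW_PFN; that strict mode fails with -EINVAL while non-strict mode only warns is an assumption about the wrapper, not a quote of it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_PAGE_SHIFT 12

/* Stand-in for a VM_PFNMAP vma: a linear virtual range backed by pfns. */
struct model_vma {
	unsigned long start;      /* first virtual address (inclusive) */
	unsigned long end;        /* last virtual address (exclusive) */
	unsigned long first_pfn;  /* pfn backing 'start' */
};

/* Models CONFIG_STRICT_FOLLOW_PFN=y when true (hypothetical switch). */
static bool strict_follow_pfn = true;

/* Plain lookup, playing the role of follow_pfn()'s page-table walk. */
static int model_follow_pfn(const struct model_vma *vma, unsigned long addr,
			    unsigned long *pfn)
{
	assert(vma->start < vma->end);
	if (addr < vma->start || addr >= vma->end)
		return -EINVAL;
	*pfn = vma->first_pfn + ((addr - vma->start) >> MODEL_PAGE_SHIFT);
	return 0;
}

/*
 * Assumed behaviour of the unsafe_ wrapper: refuse outright in strict
 * mode, otherwise complain and fall through to the normal lookup.
 * Nothing here keeps *pfn valid after return: that is the missing
 * mmu_notifier problem.
 */
static int model_unsafe_follow_pfn(const struct model_vma *vma,
				   unsigned long addr, unsigned long *pfn)
{
	if (strict_follow_pfn) {
		fprintf(stderr, "unsafe follow_pfn usage rejected\n");
		return -EINVAL;
	}
	fprintf(stderr, "warning: unsafe follow_pfn usage\n");
	return model_follow_pfn(vma, addr, pfn);
}
```

Even the non-strict path hands back a pfn that nothing keeps valid, which is the gap an mmu_notifier would close.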

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Dan Williams 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Alex Williamson 
Cc: Cornelia Huck 
Cc: k...@vger.kernel.org
---
 drivers/vfio/vfio_iommu_type1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5fbf0c1f7433..a4d53f3d0a35 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -421,7 +421,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 {
int ret;
 
-   ret = follow_pfn(vma, vaddr, pfn);
+   ret = unsafe_follow_pfn(vma, vaddr, pfn);
if (ret) {
bool unlocked = false;
 
@@ -435,7 +435,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
if (ret)
return ret;
 
-   ret = follow_pfn(vma, vaddr, pfn);
+   ret = unsafe_follow_pfn(vma, vaddr, pfn);
}
 
return ret;
-- 
2.28.0



[PATCH v2 13/17] /dev/mem: Only set filp->f_mapping

2020-10-09 Daniel Vetter
When we care about pagecache maintenance, we need to make sure that
both f_mapping and i_mapping point at the right mapping.

But for iomem mappings we only care about the virtual/pte side of
things, so f_mapping is enough. Also setting inode->i_mapping was
confusing me as a driver maintainer, since in e.g. drivers/gpu we
don't do that. Per Dan this seems to be copypasta from places which do
care about pagecache consistency, but it's not needed here. Hence remove
it for slightly less confusion.

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Dan Williams 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
---
 drivers/char/mem.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index abd4ffdc8cde..5502f56f3655 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -864,7 +864,6 @@ static int open_port(struct inode *inode, struct file *filp)
 * revocations when drivers want to take over a /dev/mem mapped
 * range.
 */
-   inode->i_mapping = devmem_inode->i_mapping;
filp->f_mapping = inode->i_mapping;
 
return 0;
-- 
2.28.0



[PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe

2020-10-09 Daniel Vetter
The media model assumes that buffers are all preallocated, so that
when a media pipeline is running we never miss a deadline because the
buffers aren't allocated or available.

This means we cannot fix the v4l follow_pfn usage through
mmu_notifier without breaking how this all works. The only real fix
is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
tell everyone to cut over to dma-buf memory sharing for zerocopy.

userptr for normal memory will keep working as-is.
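The deprecation described above could boil down to an admission check on the vma backing a userptr buffer. A minimal userspace sketch: the MODEL_VM_* bit values, struct model_vma and the -EINVAL return are all hypothetical; only the VM_IO | VM_PFNMAP policy comes from the text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag bits only; the real VM_IO/VM_PFNMAP live in <linux/mm.h>. */
#define MODEL_VM_IO     0x1UL
#define MODEL_VM_PFNMAP 0x2UL

struct model_vma {
	unsigned long vm_flags;
};

/*
 * Hypothetical userptr admission check: mappings backed by struct pages
 * keep working, raw pfn mappings are refused so that zerocopy sharing of
 * such memory has to go through dma-buf instead.
 */
static int model_userptr_check(const struct model_vma *vma)
{
	assert(vma != NULL);
	if (vma->vm_flags & (MODEL_VM_IO | MODEL_VM_PFNMAP))
		return -EINVAL;
	return 0;
}
```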

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Dan Williams 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Pawel Osciak 
Cc: Marek Szyprowski 
Cc: Kyungmin Park 
Cc: Tomasz Figa 
Cc: Laurent Dufour 
Cc: Vlastimil Babka 
Cc: Daniel Jordan 
Cc: Michel Lespinasse 
---
 drivers/media/common/videobuf2/frame_vector.c | 2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index 2b0b97761d15..a1b85fe9e7c1 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -69,7 +69,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
break;
 
while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
-   err = follow_pfn(vma, start, [ret]);
+   err = unsafe_follow_pfn(vma, start, [ret]);
if (err) {
if (ret == 0)
ret = err;
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 52312ce2ba05..821c4a76ab96 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
user_address = untagged_baddr;
 
while (pages_done < (mem->size >> PAGE_SHIFT)) {
-   ret = follow_pfn(vma, user_address, _pfn);
+   ret = unsafe_follow_pfn(vma, user_address, _pfn);
if (ret)
break;
 
-- 
2.28.0



[PATCH v2 00/17] follow_pfn and other iomap races

2020-10-09 Daniel Vetter
Hi all,

Round two of my patch series to clamp down a bunch of races and gaps
around follow_pfn and other access to iomem mmaps. Previous version:

v1: https://lore.kernel.org/dri-devel/20201007164426.1812530-1-daniel.vet...@ffwll.ch/

And the discussion that sparked this journey:

https://lore.kernel.org/dri-devel/20201007164426.1812530-1-daniel.vet...@ffwll.ch/

Changes in v2:
- tons of small polish all over, thanks to all the reviewers who
  spotted issues
- I managed to test at least the generic_access_phys and pci mmap revoke
  stuff with a few gdb sessions using our i915 debug tools (hence now also
  the drm/i915 patch to properly request all the pci bar regions)
- reworked approach for the pci mmap revoke: Infrastructure moved into
  kernel/resource.c, address_space mapping is now set up at open time for
  everyone (which required some sysfs changes). Does indeed look a lot
  cleaner and a lot less invasive than I feared at first.

What I can't test are all the frame_vector changes in habanalabs,
exynos and media. Gerald has given the s390 patch a spin already.

Review, testing, feedback all very much welcome.

Cheers, Daniel

Daniel Vetter (17):
  drm/exynos: Stop using frame_vector helpers
  drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
  misc/habana: Stop using frame_vector helpers
  misc/habana: Use FOLL_LONGTERM for userptr
  mm/frame-vector: Use FOLL_LONGTERM
  media: videobuf2: Move frame_vector into media subsystem
  mm: Close race in generic_access_phys
  s390/pci: Remove races against pte updates
  mm: Add unsafe_follow_pfn
  media/videbuf1|2: Mark follow_pfn usage as unsafe
  vfio/type1: Mark follow_pfn as unsafe
  PCI: Obey iomem restrictions for procfs mmap
  /dev/mem: Only set filp->f_mapping
  resource: Move devmem revoke code to resource framework
  sysfs: Support zapping of binary attr mmaps
  PCI: Revoke mappings like devmem
  drm/i915: Properly request PCI BARs

 arch/s390/pci/pci_mmio.c                      | 98 +++
 drivers/char/mem.c                            | 86 +---
 drivers/gpu/drm/exynos/Kconfig                |  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c       | 48 -
 drivers/gpu/drm/i915/intel_uncore.c           | 25 -
 drivers/media/common/videobuf2/Kconfig        |  1 -
 drivers/media/common/videobuf2/Makefile       |  1 +
 .../media/common/videobuf2}/frame_vector.c    | 54 --
 drivers/media/platform/omap/Kconfig           |  1 -
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 drivers/misc/habanalabs/Kconfig               |  1 -
 drivers/misc/habanalabs/common/habanalabs.h   |  3 +-
 drivers/misc/habanalabs/common/memory.c       | 50 --
 drivers/pci/pci-sysfs.c                       |  4 +
 drivers/pci/proc.c                            |  6 ++
 drivers/vfio/vfio_iommu_type1.c               |  4 +-
 fs/sysfs/file.c                               | 11 +++
 include/linux/ioport.h                        |  6 +-
 include/linux/mm.h                            | 47 +
 include/linux/sysfs.h                         |  2 +
 include/media/videobuf2-core.h                | 42 
 kernel/resource.c                             | 95 +-
 mm/Kconfig                                    |  3 -
 mm/Makefile                                   |  1 -
 mm/memory.c                                   | 76 +-
 mm/nommu.c                                    | 17 
 security/Kconfig                              | 13 +++
 27 files changed, 412 insertions(+), 286 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (85%)

-- 
2.28.0



Re: [PATCH 2/4] drm/prime: document that use the page array is deprecated

2020-10-09 Daniel Vetter
On Fri, Oct 09, 2020 at 09:36:41AM +0200, Christian König wrote:
> Am 08.10.20 um 16:14 schrieb Daniel Vetter:
> > On Thu, Oct 08, 2020 at 04:09:14PM +0200, Daniel Vetter wrote:
> > > On Thu, Oct 08, 2020 at 01:23:40PM +0200, Christian König wrote:
> > > > We have reoccurring requests on this so better document that
> > > > this approach doesn't work and dma_buf_mmap() needs to be used instead.
> > > > 
> > > > Signed-off-by: Christian König 
> > > > ---
> > > >   drivers/gpu/drm/drm_prime.c | 7 ++-
> > > >   1 file changed, 6 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > > > index 4910c446db83..16fa2bfc271e 100644
> > > > --- a/drivers/gpu/drm/drm_prime.c
> > > > +++ b/drivers/gpu/drm/drm_prime.c
> > > > @@ -956,7 +956,7 @@ EXPORT_SYMBOL(drm_gem_prime_import);
> > > >   /**
> > > >* drm_prime_sg_to_page_addr_arrays - convert an sg table into a page array
> > > >* @sgt: scatter-gather table to convert
> > > > - * @pages: optional array of page pointers to store the page array in
> > > > + * @pages: deprecated array of page pointers to store the page array in
> > > >* @addrs: optional array to store the dma bus address of each page
> > > >* @max_entries: size of both the passed-in arrays
> > > >*
> > > > @@ -965,6 +965,11 @@ EXPORT_SYMBOL(drm_gem_prime_import);
> > > >*
> > > >* Drivers can use this in their _driver.gem_prime_import_sg_table
> > > >* implementation.
> > > > + *
> > > > + * Specifying the pages array is deprecated and strongly discouraged for new
> > > > + * drivers. The pages array is only useful for page faults and those can
> > > > + * corrupt fields in the struct page if they are not handled by the exporting
> > > > + * driver.
> > > >*/
> > > I'd make this a _lot_ stronger: Aside from amdgpu and radeon all drivers
> > > using this only need it for the pages array. Imo just open-code the sg
> > > table walking loop in amdgpu/radeon (it's really not much code), and then
> > > drop the dma_addr_t parameter from this function here (it's set to NULL by
> > > everyone else).
> > > 
> > > And then deprecate this entire function here with a big warning that a)
> > > dma_buf_map_attachment is allowed to leave the struct page pointers NULL
> > > and b) this breaks mmap, users must call dma_buf_mmap instead.
> > > 
> > > Also maybe make it an uppercase DEPRECATED or something like that :-)
> > OK I just realized I missed nouveau. That would be 3 places where we need
> > to stuff the dma_addr_t list into something ttm can take. Still feels
> > better than this half-deprecated function kludge ...
> 
> Mhm, I don't see a reason why nouveau would need the struct page either.
> 
> How about we split that up into two functions?
> 
> One for converting the sg_table into a linear dma_addr array.
> 
> And one for converting the sg_table into a linear struct page array with a
> __deprecated attribute on it?

Yeah makes sense, since converting ttm to just use sgt iterations directly
everywhere is probably a bit too much. Maybe keep that converter in ttm
code, since outside of ttm the rough consensus is to converge on sgt for
handling buffers. Well, for those drivers not stuck on struct page arrays
:-)
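The dma_addr_t-only half of the split proposed above is essentially a loop over the DMA-mapped segments, emitting one bus address per page. A userspace sketch: struct model_sg is a hypothetical stand-in for a DMA-mapped struct scatterlist, and a kernel version would presumably iterate with for_each_sgtable_dma_sg() and read sg_dma_address()/sg_dma_len() instead (an assumption about how it would be written, not existing code).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u

/* Simplified stand-in for one DMA-mapped scatterlist entry. */
struct model_sg {
	uint64_t dma_address;  /* bus address of the segment */
	size_t dma_len;        /* length in bytes, assumed page aligned */
};

/*
 * Expand the segments into one bus address per page. Returns the number
 * of entries written, or -1 if addrs cannot hold them all.
 */
static int model_sg_to_dma_addr_array(const struct model_sg *sg, size_t nents,
				      uint64_t *addrs, int max_entries)
{
	int n = 0;

	for (size_t i = 0; i < nents; i++) {
		assert(sg[i].dma_len % MODEL_PAGE_SIZE == 0);
		for (size_t off = 0; off < sg[i].dma_len; off += MODEL_PAGE_SIZE) {
			if (n >= max_entries)
				return -1;
			addrs[n++] = sg[i].dma_address + off;
		}
	}
	return n;
}
```

No struct page is touched anywhere, which is the point: importers that only need bus addresses never depend on the exporter filling in page pointers.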
-Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > -Daniel
> > > 
> > > >   int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
> > > >  dma_addr_t *addrs, int max_entries)
> > > > -- 
> > > > 2.17.1
> > > > 
> > > -- 
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 1/4] mm: introduce vma_set_file function v2

2020-10-09 Daniel Vetter
On Fri, Oct 09, 2020 at 09:16:49AM +0200, Christian König wrote:
> Am 08.10.20 um 16:12 schrieb Daniel Vetter:
> > On Thu, Oct 08, 2020 at 01:23:39PM +0200, Christian König wrote:
> > > Add the new vma_set_file() function to allow changing
> > > vma->vm_file with the necessary refcount dance.
> > > 
> > > v2: add more users of this.
> > > 
> > > Signed-off-by: Christian König 
> > > ---
> > >   drivers/dma-buf/dma-buf.c  | 16 +---
> > >   drivers/gpu/drm/etnaviv/etnaviv_gem.c  |  4 +---
> > >   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  3 +--
> > >   drivers/gpu/drm/i915/gem/i915_gem_mman.c   |  4 ++--
> > >   drivers/gpu/drm/msm/msm_gem.c  |  4 +---
> > >   drivers/gpu/drm/omapdrm/omap_gem.c |  3 +--
> > >   drivers/gpu/drm/vgem/vgem_drv.c|  3 +--
> > >   drivers/staging/android/ashmem.c   |  5 ++---
> > >   include/linux/mm.h |  2 ++
> > >   mm/mmap.c  | 16 
> > >   10 files changed, 32 insertions(+), 28 deletions(-)
> > > 
> > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > > index a6ba4d598f0e..e4316aa7e0f4 100644
> > > --- a/drivers/dma-buf/dma-buf.c
> > > +++ b/drivers/dma-buf/dma-buf.c
> > > @@ -1163,20 +1163,14 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
> > >   return -EINVAL;
> > >   /* readjust the vma */
> > > - get_file(dmabuf->file);
> > > - oldfile = vma->vm_file;
> > > - vma->vm_file = dmabuf->file;
> > > + oldfile = vma_set_file(vma, dmabuf->file);
> > >   vma->vm_pgoff = pgoff;
> > >   ret = dmabuf->ops->mmap(dmabuf, vma);
> > > - if (ret) {
> > > - /* restore old parameters on failure */
> > > - vma->vm_file = oldfile;
> > > - fput(dmabuf->file);
> > > - } else {
> > > - if (oldfile)
> > > - fput(oldfile);
> > > - }
> > > + /* restore old parameters on failure */
> > > + if (ret)
> > > + vma_set_file(vma, oldfile);
> > I think these two lines here are cargo-cult: If this fails, the mmap fails
> > and therefore the vma structure is kfreed. No point at all in restoring
> > anything.
> 
> This was explicitly added with this patch to fix a problem:
> 
> commit 495c10cc1c0c359871d5bef32dd173252fc17995
> Author: John Sheu 
> Date:   Mon Feb 11 17:50:24 2013 -0800
> 
>     CHROMIUM: dma-buf: restore args on failure of dma_buf_mmap
> 
>     Callers to dma_buf_mmap expect to fput() the vma struct's vm_file
>     themselves on failure.  Not restoring the struct's data on failure
>     causes a double-decrement of the vm_file's refcount.
> 
> > With that: Reviewed-by: Daniel Vetter 
> 
> Can I keep that even with the error handling working? :)

Hm good find, I should have looked at git history myself.

I only noticed this in the patch because nobody else does it. But looking
at the mmap_region() code in mmap.c, we do indeed seem to have this
problem in the error path:

unmap_and_free_vma:
vma->vm_file = NULL;
fput(file);

Note that the success path does things correctly (a bit above):

file = vma->vm_file;
out:

So it indeed looks like dma-buf is the only one that does this fully
correctly. So maybe we should do a follow-up patch to change the
mmap_region exit code to pick up whatever vma->vm_file was set instead,
and fput that?
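The refcount problem can be replayed in a small userspace model. Everything here is hypothetical (model_* names, plain int refcounts), and model_vma_set_file() assumes the semantics implied by "the necessary refcount dance": take a reference on the new file, drop the one on the old file. The caller's own reference is modelled as an initial count of 1.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of struct file refcounting; names are hypothetical. */
struct model_file {
	int refcount;
};

struct model_vma {
	struct model_file *vm_file;
};

static struct model_file *model_get_file(struct model_file *f)
{
	f->refcount++;
	return f;
}

static void model_fput(struct model_file *f)
{
	assert(f->refcount > 0);  /* an underflow would be a use-after-free */
	f->refcount--;
}

/* Assumed vma_set_file() semantics: ref the new file, drop the old one. */
static void model_vma_set_file(struct model_vma *vma, struct model_file *file)
{
	struct model_file *old = vma->vm_file;

	vma->vm_file = model_get_file(file);
	if (old)
		model_fput(old);
}

/* A driver mmap hook that redirects vm_file and then fails. */
static int failing_driver_mmap(struct model_vma *vma, struct model_file *own)
{
	model_vma_set_file(vma, own);
	return -1;
}

/*
 * Error path of a modelled mmap_region(). fput_vm_file == 0 drops the
 * file that was installed originally (the unmap_and_free_vma code quoted
 * above); fput_vm_file == 1 drops whatever vma->vm_file points at now.
 */
static int model_mmap_region(struct model_file *file, struct model_file *own,
			     int fput_vm_file)
{
	struct model_vma vma = { .vm_file = model_get_file(file) };
	int ret = failing_driver_mmap(&vma, own);

	if (ret) {
		model_fput(fput_vm_file ? vma.vm_file : file);
		vma.vm_file = NULL;
	}
	return ret;
}
```

With the current exit path the caller's file ends up one reference short (the double decrement from the CHROMIUM commit) and the driver's file leaks one; dropping vma->vm_file instead leaves both balanced, which is why restoring the old file in dma_buf_mmap() is needed only as long as mmap_region() works the current way.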

Anyway, I stand corrected: r-b as-is.

Cheers, Daniel

> 
> Christian.
> 
> > 
> > > +
> > >   return ret;
> > >   }
> > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> > > index 312e9d58d5a7..10ce267c0947 100644
> > > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> > > @@ -145,10 +145,8 @@ static int etnaviv_gem_mmap_obj(struct etnaviv_gem_object *etnaviv_obj,
> > >* address_space (so unmap_mapping_range does what we want,
> > >* in particular in the case of mmap'd dmabufs)
> > >*/
> > > - fput(vma->vm_file);
> > > - get_file(etnaviv_obj->base.filp);
> > >   vma->vm_pgoff = 0;
> > > - vma->vm_file  = etnaviv_obj->

Re: [PATCH] MAINTAINERS: Update entry for st7703 driver after the rename

2020-10-09 Daniel Vetter
On Fri, Oct 09, 2020 at 06:27:46AM +0200, Lukas Bulwahn wrote:
> 
> 
> On Wed, 1 Jul 2020, Ondrej Jirman wrote:
> 
> > The driver was renamed, change the path in the MAINTAINERS file.
> > 
> > Signed-off-by: Ondrej Jirman 
> 
> This minor non-urgent cleanup patch has not been picked up yet by anyone.
> 
> Hence, ./scripts/get_maintainer.pl --self-test=patterns continues to 
> complain:
> 
>   warning: no file matches  F:  Documentation/devicetree/bindings/display/panel/rocktech,jh057n00900.txt
>   warning: no file matches  F:  drivers/gpu/drm/panel/panel-rocktech-jh057n00900.c
> 
> This patch cleanly applies on next-20201008 and resolves the issue above.

Generally after 2-3 weeks a patch is lost and unfortunately needs to be
resent. Please do that next time around instead of waiting.

Patch queued up now for 5.10, thanks.
-Daniel

> Reviewed-by: Lukas Bulwahn 
> 
> 
> Lukas
> 
> > ---
> >  MAINTAINERS | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 5f186a661a9b..f5183eae08df 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -5487,12 +5487,13 @@ S:  Maintained
> >  F: Documentation/devicetree/bindings/display/panel/raydium,rm67191.yaml
> >  F: drivers/gpu/drm/panel/panel-raydium-rm67191.c
> >  
> > -DRM DRIVER FOR ROCKTECH JH057N00900 PANELS
> > +DRM DRIVER FOR SITRONIX ST7703 PANELS
> >  M: Guido Günther 
> >  R: Purism Kernel Team 
> > +R: Ondrej Jirman 
> >  S: Maintained
> > -F: Documentation/devicetree/bindings/display/panel/rocktech,jh057n00900.txt
> > -F: drivers/gpu/drm/panel/panel-rocktech-jh057n00900.c
> > +F: Documentation/devicetree/bindings/display/panel/rocktech,jh057n00900.yaml
> > +F: drivers/gpu/drm/panel/panel-sitronix-st7703.c
> >  
> >  DRM DRIVER FOR SAVAGE VIDEO CARDS
> >  S: Orphan / Obsolete
> > -- 
> > 2.27.0
> > 
> > 


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 08/13] s390/pci: Remove races against pte updates

2020-10-08 Daniel Vetter
On Thu, Oct 8, 2020 at 6:44 PM Gerald Schaefer wrote:
>
> On Wed,  7 Oct 2020 18:44:21 +0200
> Daniel Vetter  wrote:
>
> > Way back it was a reasonable assumption that iomem mappings never
> > change the pfn range they point at. But this has changed:
> >
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> >
> > - contiguous dma allocations have moved from dedicated carveouts to
> > cma regions. This means if we miss the unmap the pfn might contain
> > pagecache or anon memory (well anything allocated with __GFP_MOVABLE)
> >
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> >
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea. Fix this.
> >
> > Since zpci_memcpy_from|toio seems to not do anything nefarious with
> > locks, we just need to open code get_pfn and follow_pfn and make sure
> > we drop the locks only after we're done. The write function also needs
> > the copy_from_user move, since we can't take userspace faults while
> > holding the mmap sem.
> >
> > Signed-off-by: Daniel Vetter 
> > Cc: Jason Gunthorpe 
> > Cc: Dan Williams 
> > Cc: Kees Cook 
> > Cc: Andrew Morton 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Jan Kara 
> > Cc: Dan Williams 
> > Cc: linux...@kvack.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-samsung-...@vger.kernel.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: Niklas Schnelle 
> > Cc: Gerald Schaefer 
> > Cc: linux-s...@vger.kernel.org
> > ---
> >  arch/s390/pci/pci_mmio.c | 98 +++-
> >  1 file changed, 57 insertions(+), 41 deletions(-)
>
> Looks good, thanks. Also survived some basic function test. Only some
> minor nitpick, see below.
>
> Reviewed-by: Gerald Schaefer 
>
> >
> > diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> > index 401cf670a243..4d194cb09372 100644
> > --- a/arch/s390/pci/pci_mmio.c
> > +++ b/arch/s390/pci/pci_mmio.c
> > @@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
> >   return rc;
> >  }
> >
> > -static long get_pfn(unsigned long user_addr, unsigned long access,
> > - unsigned long *pfn)
> > -{
> > - struct vm_area_struct *vma;
> > - long ret;
> > -
> > - mmap_read_lock(current->mm);
> > - ret = -EINVAL;
> > - vma = find_vma(current->mm, user_addr);
> > - if (!vma)
> > - goto out;
> > - ret = -EACCES;
> > - if (!(vma->vm_flags & access))
> > - goto out;
> > - ret = follow_pfn(vma, user_addr, pfn);
> > -out:
> > - mmap_read_unlock(current->mm);
> > - return ret;
> > -}
> > -
> >  SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >   const void __user *, user_buffer, size_t, length)
> >  {
> >   u8 local_buf[64];
> >   void __iomem *io_addr;
> >   void *buf;
> > - unsigned long pfn;
> > + struct vm_area_struct *vma;
> > + pte_t *ptep;
> > + spinlock_t *ptl;
> >   long ret;
> >
> >   if (!zpci_is_enabled())
> > @@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >* We only support write access to MIO capable devices if we are on
> >* a MIO enabled system. Otherwise we would have to check for every
> >* address if it is a special ZPCI_ADDR and would have to do
> > -  * a get_pfn() which we don't need for MIO capable devices.  Currently
> > +  * a pfn lookup which we don't need for MIO capable devices.  Currently
> >* ISM devices are the only devices without MIO support and there is no
> >* known need for accessing these from userspace.
> >*/
> > @@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >   } else
> >   buf = local_buf;
> >
> > - ret = get_pfn(mmio_addr, VM_WRITE, );
> > + ret = -EFAULT;
> > + if (copy_from_user(buf, user_buffer, length))
> > + goto out_free;
> > +
> > + mmap_read_lock(current->mm);
> 

Re: [PATCH v2] drm/fourcc: Add AXBXGXRX106106106106 format

2020-10-08 Daniel Vetter
On Thu, Oct 08, 2020 at 03:33:50PM +0100, Matteo Franchin wrote:
> Add ABGR format with 10-bit components packed in 64-bit per pixel.
> This format can be used to handle
> VK_FORMAT_R10X6G10X6B10X6A10X6_UNORM_4PACK16 on little-endian
> architectures.
> 
> Signed-off-by: Matteo Franchin 

Ok so 0xff and 0xfb for a true 16bit format have a slight difference,
whereas for this truncated format they're both max brightness. So yeah
there's a difference and I guess we need to add this.
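Reading the patch's "[63:0] A:x:B:x:G:x:R:x 10:6:10:6:10:6:10:6" comment as "each component sits in the top 10 bits of its own 16-bit word, low 6 bits are padding", a pixel can be packed and unpacked as below. The helper names are hypothetical; the point is that 16-bit words differing only in the padding bits decode to the same 10-bit level, unlike a true 16-bit format.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack one pixel: A in [63:54], B in [47:38], G in [31:22], R in [15:6];
 * the 6 bits below each component are left zero. The result is a host
 * integer; the format stores it little endian in memory.
 */
static uint64_t pack_axbxgxrx(uint16_t r, uint16_t g, uint16_t b, uint16_t a)
{
	assert(r < 1024 && g < 1024 && b < 1024 && a < 1024);
	return ((uint64_t)a << 54) | ((uint64_t)b << 38) |
	       ((uint64_t)g << 22) | ((uint64_t)r << 6);
}

/* 10-bit value of 16-bit word 'word' (0 = R, 1 = G, 2 = B, 3 = A). */
static uint16_t unpack_component(uint64_t pixel, unsigned int word)
{
	uint16_t w = (uint16_t)(pixel >> (16 * word));

	return w >> 6;  /* padding bits are discarded */
}
```

For example, 0xffc0 and 0xffff are different values in a 16:16:16:16 format, but both unpack to the maximum level 0x3ff here.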

Acked-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/drm_fourcc.c  | 1 +
>  include/uapi/drm/drm_fourcc.h | 6 ++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_fourcc.c b/drivers/gpu/drm/drm_fourcc.c
> index 722c7ebe4e88..03262472059c 100644
> --- a/drivers/gpu/drm/drm_fourcc.c
> +++ b/drivers/gpu/drm/drm_fourcc.c
> @@ -202,6 +202,7 @@ const struct drm_format_info *__drm_format_info(u32 format)
>   { .format = DRM_FORMAT_XBGR16161616F,   .depth = 0,  .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1 },
>   { .format = DRM_FORMAT_ARGB16161616F,   .depth = 0,  .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
>   { .format = DRM_FORMAT_ABGR16161616F,   .depth = 0,  .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
> + { .format = DRM_FORMAT_AXBXGXRX106106106106, .depth = 0, .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
>   { .format = DRM_FORMAT_RGB888_A8,   .depth = 32, .num_planes = 2, .cpp = { 3, 1, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
>   { .format = DRM_FORMAT_BGR888_A8,   .depth = 32, .num_planes = 2, .cpp = { 3, 1, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
>   { .format = DRM_FORMAT_XRGB_A8, .depth = 32, .num_planes = 2, .cpp = { 4, 1, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 82f327801267..9374d9558493 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -155,6 +155,12 @@ extern "C" {
> #define DRM_FORMAT_ARGB16161616F fourcc_code('A', 'R', '4', 'H') /* [63:0] A:R:G:B 16:16:16:16 little endian */
> #define DRM_FORMAT_ABGR16161616F fourcc_code('A', 'B', '4', 'H') /* [63:0] A:B:G:R 16:16:16:16 little endian */
>  
> +/*
> + * RGBA format with 10-bit components packed in 64-bit per pixel, with 6 bits
> + * of unused padding per component:
> + */
> +#define DRM_FORMAT_AXBXGXRX106106106106 fourcc_code('A', 'B', '1', '0') /* [63:0] A:x:B:x:G:x:R:x 10:6:10:6:10:6:10:6 little endian */
> +
>  /* packed YCbCr */
> #define DRM_FORMAT_YUYV  fourcc_code('Y', 'U', 'Y', 'V') /* [31:0] Cr0:Y1:Cb0:Y0 8:8:8:8 little endian */
> #define DRM_FORMAT_YVYU  fourcc_code('Y', 'V', 'Y', 'U') /* [31:0] Cb0:Y1:Cr0:Y0 8:8:8:8 little endian */
> -- 
> 2.17.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 2/4] drm/prime: document that use the page array is deprecated

2020-10-08 Daniel Vetter
On Thu, Oct 08, 2020 at 04:09:14PM +0200, Daniel Vetter wrote:
> On Thu, Oct 08, 2020 at 01:23:40PM +0200, Christian König wrote:
> > We have reoccurring requests on this so better document that
> > this approach doesn't work and dma_buf_mmap() needs to be used instead.
> > 
> > Signed-off-by: Christian König 
> > ---
> >  drivers/gpu/drm/drm_prime.c | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > index 4910c446db83..16fa2bfc271e 100644
> > --- a/drivers/gpu/drm/drm_prime.c
> > +++ b/drivers/gpu/drm/drm_prime.c
> > @@ -956,7 +956,7 @@ EXPORT_SYMBOL(drm_gem_prime_import);
> >  /**
> >   * drm_prime_sg_to_page_addr_arrays - convert an sg table into a page array
> >   * @sgt: scatter-gather table to convert
> > - * @pages: optional array of page pointers to store the page array in
> > + * @pages: deprecated array of page pointers to store the page array in
> >   * @addrs: optional array to store the dma bus address of each page
> >   * @max_entries: size of both the passed-in arrays
> >   *
> > @@ -965,6 +965,11 @@ EXPORT_SYMBOL(drm_gem_prime_import);
> >   *
> >   * Drivers can use this in their _driver.gem_prime_import_sg_table
> >   * implementation.
> > + *
> > + * Specifying the pages array is deprecated and strongly discouraged for new
> > + * drivers. The pages array is only useful for page faults and those can
> > + * corrupt fields in the struct page if they are not handled by the exporting
> > + * driver.
> >   */
> 
> I'd make this a _lot_ stronger: Aside from amdgpu and radeon all drivers
> using this only need it for the pages array. Imo just open-code the sg
> table walking loop in amdgpu/radeon (it's really not much code), and then
> drop the dma_addr_t parameter from this function here (it's set to NULL by
> everyone else).
> 
> And then deprecate this entire function here with a big warning that a)
> dma_buf_map_attachment is allowed to leave the struct page pointers NULL
> and b) this breaks mmap, users must call dma_buf_mmap instead.
> 
> Also maybe make it an uppercase DEPRECATED or something like that :-)

OK I just realized I missed nouveau. That would be 3 places where we need
to stuff the dma_addr_t list into something ttm can take. Still feels
better than this half-deprecated function kludge ...
-Daniel

> -Daniel
> 
> >  int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
> >  dma_addr_t *addrs, int max_entries)
> > -- 
> > 2.17.1
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 1/4] mm: introduce vma_set_file function v2

2020-10-08 Daniel Vetter
On Thu, Oct 08, 2020 at 01:23:39PM +0200, Christian König wrote:
> Add the new vma_set_file() function to allow changing
> vma->vm_file with the necessary refcount dance.
> 
> v2: add more users of this.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-buf.c  | 16 +---
>  drivers/gpu/drm/etnaviv/etnaviv_gem.c  |  4 +---
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  3 +--
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c   |  4 ++--
>  drivers/gpu/drm/msm/msm_gem.c  |  4 +---
>  drivers/gpu/drm/omapdrm/omap_gem.c |  3 +--
>  drivers/gpu/drm/vgem/vgem_drv.c|  3 +--
>  drivers/staging/android/ashmem.c   |  5 ++---
>  include/linux/mm.h |  2 ++
>  mm/mmap.c  | 16 
>  10 files changed, 32 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index a6ba4d598f0e..e4316aa7e0f4 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -1163,20 +1163,14 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
>   return -EINVAL;
>  
>   /* readjust the vma */
> - get_file(dmabuf->file);
> - oldfile = vma->vm_file;
> - vma->vm_file = dmabuf->file;
> + oldfile = vma_set_file(vma, dmabuf->file);
>   vma->vm_pgoff = pgoff;
>  
>   ret = dmabuf->ops->mmap(dmabuf, vma);
> - if (ret) {
> - /* restore old parameters on failure */
> - vma->vm_file = oldfile;
> - fput(dmabuf->file);
> - } else {
> - if (oldfile)
> - fput(oldfile);
> - }
> + /* restore old parameters on failure */
> + if (ret)
> + vma_set_file(vma, oldfile);

I think these two lines here are cargo-cult: If this fails, the mmap fails
and therefore the vma structure is kfreed. No point at all in restoring
anything.

With that: Reviewed-by: Daniel Vetter 

> +
>   return ret;
>  
>  }
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> index 312e9d58d5a7..10ce267c0947 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> @@ -145,10 +145,8 @@ static int etnaviv_gem_mmap_obj(struct etnaviv_gem_object *etnaviv_obj,
>* address_space (so unmap_mapping_range does what we want,
>* in particular in the case of mmap'd dmabufs)
>*/
> - fput(vma->vm_file);
> - get_file(etnaviv_obj->base.filp);
>   vma->vm_pgoff = 0;
> - vma->vm_file  = etnaviv_obj->base.filp;
> + vma_set_file(vma, etnaviv_obj->base.filp);
>  
>   vma->vm_page_prot = vm_page_prot;
>   }
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index fec0e1e3dc3e..8ce4c9e28b87 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -119,8 +119,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
>   if (ret)
>   return ret;
>  
> - fput(vma->vm_file);
> - vma->vm_file = get_file(obj->base.filp);
> + vma_set_file(vma, obj->base.filp);
>  
>   return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index 3d69e51f3e4d..c9d5f1a38af3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -893,8 +893,8 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
>* requires avoiding extraneous references to their filp, hence why
>* we prefer to use an anonymous file for their mmaps.
>*/
> - fput(vma->vm_file);
> - vma->vm_file = anon;
> + vma_set_file(vma, anon);
> + fput(anon);
>  
>   switch (mmo->mmap_type) {
>   case I915_MMAP_TYPE_WC:
> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> index de915ff6f4b4..a71f42870d5e 100644
> --- a/drivers/gpu/drm/msm/msm_gem.c
> +++ b/drivers/gpu/drm/msm/msm_gem.c
> @@ -223,10 +223,8 @@ int msm_gem_mmap_obj(struct drm_gem_object *obj,
>* address_space (so unmap_mapping_range does what we want,
>* in particular in the case of mmap'd dmabufs)
>*/
> - fput(vma->vm_file);
> - get_file(obj->filp);
>   vma->vm

Re: [PATCH 2/4] drm/prime: document that use the page array is deprecated

2020-10-08 Thread Daniel Vetter
On Thu, Oct 08, 2020 at 01:23:40PM +0200, Christian König wrote:
> We have recurring requests for this, so better document that
> this approach doesn't work and dma_buf_mmap() needs to be used instead.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/drm_prime.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 4910c446db83..16fa2bfc271e 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -956,7 +956,7 @@ EXPORT_SYMBOL(drm_gem_prime_import);
>  /**
>   * drm_prime_sg_to_page_addr_arrays - convert an sg table into a page array
>   * @sgt: scatter-gather table to convert
> - * @pages: optional array of page pointers to store the page array in
> + * @pages: deprecated array of page pointers to store the page array in
>   * @addrs: optional array to store the dma bus address of each page
>   * @max_entries: size of both the passed-in arrays
>   *
> @@ -965,6 +965,11 @@ EXPORT_SYMBOL(drm_gem_prime_import);
>   *
>   * Drivers can use this in their &drm_driver.gem_prime_import_sg_table
>   * implementation.
> + *
> + * Specifying the pages array is deprecated and strongly discouraged for new
> + * drivers. The pages array is only useful for page faults and those can
> + * corrupt fields in the struct page if they are not handled by the exporting
> + * driver.
>   */

I'd make this a _lot_ stronger: Aside from amdgpu and radeon all drivers
using this only need it for the pages array. Imo just open-code the sg
table walking loop in amdgpu/radeon (it's really not much code), and then
drop the dma_addr_t parameter from this function here (it's set to NULL by
everyone else).
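The open-coded loop could look roughly like the following userspace model. It walks only the DMA side of each segment and emits one address per page. In the kernel this would presumably use for_each_sg() with sg_dma_address() and sg_dma_len(). Here struct model_sg, MODEL_PAGE_SIZE and the -1 overflow return are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical stand-in for one DMA-mapped scatterlist segment. */
struct model_sg {
	uint64_t dma_address;
	uint32_t dma_len;   /* assumed to be a multiple of MODEL_PAGE_SIZE */
};

/*
 * Fill addrs[] with one DMA address per page, touching no struct page
 * pointers at all. Returns the number of entries written, or -1 if
 * addrs[] is too small.
 */
static int model_sg_to_dma_addrs(const struct model_sg *sg, size_t nents,
				 uint64_t *addrs, size_t max_entries)
{
	size_t index = 0;

	for (size_t i = 0; i < nents; i++) {
		for (uint64_t off = 0; off < sg[i].dma_len; off += MODEL_PAGE_SIZE) {
			if (index >= max_entries)
				return -1;
			addrs[index++] = sg[i].dma_address + off;
		}
	}
	return (int)index;
}
```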

And then deprecate this entire function here with a big warning that a)
dma_buf_map_attachment is allowed to leave the struct page pointers NULL
and b) this breaks mmap, users must call dma_buf_mmap instead.

Also maybe make it an uppercase DEPRECATED or something like that :-)
-Daniel

>  int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page 
> **pages,
>    dma_addr_t *addrs, int max_entries)
> -- 
> 2.17.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v2 0/3] drm: commit_work scheduling

2020-10-08 Thread Daniel Vetter
On Wed, Oct 07, 2020 at 05:30:10PM +0100, Qais Yousef wrote:
> On 10/07/20 08:57, Rob Clark wrote:
> > Yeah, I think we will end up making some use of uclamp.. there is
> > someone else working on that angle
> > 
> > But without it, this is a case that exposes legit prioritization
> > problems with commit_work which we should fix ;-)
> 
> I wasn't suggesting this as an alternative to fixing the other problem. But it
> seemed you had a different problem here that I thought I could help with :-)
> 
> I did give my opinion about how to handle that priority issue. If the 2 
> threads
> are kernel threads and by design they need relative priorities IMO the kernel
> needs to be taught to set this relative priority. It seemed the vblank worker
> could run as SCHED_DEADLINE. If this works, then the priority problem for
> commit_work disappears as SCHED_DEADLINE will preempt RT. If commit_work uses
> sched_set_fifo(), its priority will be 50, hence your SF threads can no longer
> preempt it. And you can manage the SF threads to be any value you want 
> relative
> to 50 anyway without having to manage commit_work itself.
> 
> I'm not sure if you have problems with RT tasks preempting important CFS
> tasks. My brain registered two conflicting statements.

I think the problem is that there are two modes CrOS runs in. Normal CrOS
mode mostly works like a Linux desktop, and CFS commit_work seems fine there.

The other mode is Android emulation, where the SurfaceFlinger thread runs
at SCHED_FIFO. I think Rob's plan is to switch priorities at runtime to
match each use case.
-Daniel


Re: [PATCH 10/13] PCI: revoke mappings like devmem

2020-10-08 Thread Daniel Vetter
On Thu, Oct 8, 2020 at 9:50 AM Dan Williams  wrote:
>
> On Wed, Oct 7, 2020 at 4:25 PM Jason Gunthorpe  wrote:
> >
> > On Wed, Oct 07, 2020 at 12:33:06PM -0700, Dan Williams wrote:
> > > On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter  
> > > wrote:
> > > >
> > > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > > the region") /dev/kmem zaps ptes when the kernel requests exclusive
> > > > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > > the default for all driver uses.
> > > >
> > > > Except there's two more ways to access pci bars: sysfs and proc mmap
> > > > support. Let's plug that hole.
> > >
> > > Ooh, yes, lets.
> > >
> > > >
> > > > For revoke_devmem() to work we need to link our vma into the same
> > > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > > > at ->open time, but that's a bit tricky here with all the entry points
> > > > and arch code. So instead create a fake file and adjust vma->vm_file.
> > >
> > > I don't think you want to share the devmem inode for this, this should
> > > be based off the sysfs inode which I believe there is already only one
> > > instance per resource. In contrast /dev/mem can have multiple inodes
> > > because anyone can just mknod a new character device file, the same
> > > problem does not exist for sysfs.
> >
> > The inode does not come from the filesystem; char/mem.c creates a
> > singular anon inode in devmem_init_inode()
>
> That's not quite right. An inode does come from the filesystem; I just
> arranged for that inode's i_mapping to be set to a common instance.
>
> > Seems OK to use this more widely, but it feels a bit weird to live in
> > char/memory.c.
>
> Sure, now that more users have arrived it should move somewhere common.
>
> > This is what got me thinking maybe this needs to be a bit bigger
> > generic infrastructure - eg enter this scheme from fops mmap and
> > everything else is in mm/user_iomem.c
>
> It still requires every file that can map physical memory to have its
> ->open fop do
>
>inode->i_mapping = devmem_inode->i_mapping;
>filp->f_mapping = inode->i_mapping;
>
> I don't see how you can centralize that part.

Btw, why are you setting inode->i_mapping? The inode is already
published, so changing it looks risky. And I don't think it's needed:
vma_link() only looks at filp->f_mapping, and in our drm_open() we
only set that one.
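The f_mapping point can be modelled in userspace. If the open hook only redirects filp->f_mapping to the shared mapping, every vma linked through that file lands in the shared mapping, while the already-published inode keeps its own i_mapping. The struct layouts and model_* names are hypothetical, and model_vma_link() assumes vma_link() consults only file->f_mapping, as described above.

```c
#include <stddef.h>

/* Hypothetical stand-ins; only the fields needed for the argument. */
struct model_address_space {
	int nr_vmas;            /* vmas linked via vma_link() */
};

struct model_inode {
	struct model_address_space *i_mapping;
	struct model_address_space own_mapping;
};

struct model_file {
	struct model_inode *inode;
	struct model_address_space *f_mapping;
};

/* The single shared mapping, like the one devmem_init_inode() sets up. */
static struct model_address_space model_iomem_mapping;

static void model_inode_init(struct model_inode *inode)
{
	inode->own_mapping.nr_vmas = 0;
	inode->i_mapping = &inode->own_mapping;
}

/* Open hook: redirect only the per-open file, never the published inode. */
static void model_open(struct model_inode *inode, struct model_file *filp)
{
	filp->inode = inode;
	filp->f_mapping = &model_iomem_mapping;
}

/* Assumed to mirror vma_link(): the vma goes on filp->f_mapping only. */
static void model_vma_link(struct model_file *filp)
{
	filp->f_mapping->nr_vmas++;
}
```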
-Daniel


Re: [PATCH 10/13] PCI: revoke mappings like devmem

2020-10-08 Thread Daniel Vetter
On Thu, Oct 8, 2020 at 12:29 AM Dan Williams  wrote:
>
> On Wed, Oct 7, 2020 at 3:23 PM Dan Williams  wrote:
> >
> > On Wed, Oct 7, 2020 at 12:49 PM Daniel Vetter  
> > wrote:
> > >
> > > On Wed, Oct 7, 2020 at 9:33 PM Dan Williams  
> > > wrote:
> > > >
> > > > On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter  
> > > > wrote:
> > > > >
> > > > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > > > the region") /dev/kmem zaps ptes when the kernel requests exclusive
> > > > > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > > > the default for all driver uses.
> > > > >
> > > > > Except there's two more ways to access pci bars: sysfs and proc mmap
> > > > > support. Let's plug that hole.
> > > >
> > > > Ooh, yes, lets.
> > > >
> > > > > For revoke_devmem() to work we need to link our vma into the same
> > > > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > > > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > > > > at ->open time, but that's a bit tricky here with all the entry points
> > > > > and arch code. So instead create a fake file and adjust vma->vm_file.
> > > >
> > > > I don't think you want to share the devmem inode for this, this should
> > > > be based off the sysfs inode which I believe there is already only one
> > > > instance per resource. In contrast /dev/mem can have multiple inodes
> > > > because anyone can just mknod a new character device file, the same
> > > > problem does not exist for sysfs.
> > >
> > > But then I need to find the right one, plus I also need to find the
> > > right one for the procfs side. That gets messy, and I already have no
> > > idea how to really test this. Shared address_space is the same trick
> > > we're using in drm (where we have multiple things all pointing to the
> > > same underlying resources, through different files), and it gets the
> > > job done. So that's why I figured the shared address_space is the
> > > cleaner solution since then unmap_mapping_range takes care of
> > > iterating over all vma for us. I guess I could reimplement that logic
> > > with our own locking and everything in revoke_devmem, but feels a bit
> > > silly. But it would also solve the problem of having multiple
> > > different mknod of /dev/kmem with different address_space behind them.
> > > Also because of how remap_pfn_range works, all these vma do use the
> > > same pgoff already anyway.
> >
> > True, remap_pfn_range() makes sure that ->pgoff is an absolute
> > physical address offset for all use cases. So you might be able to
> > just point proc_bus_pci_open() at the shared devmem address space. For
> > sysfs it's messier. I think you would need to somehow get the inode
> > from kernfs_fop_open() to adjust its address space, but only if the
> > bin_file will ultimately be used for PCI memory.

I just read the code a bit more, and for proc it's impossible. There's
only a single file, and before you mmap it you have to call a few
ioctls to select the PCI resource on that device you want to mmap.
That includes legacy ioport resources, and at least for now those
don't get revoked (maybe they should, but I'm only looking at iomem
here). Setting the mapping too early, in ->open, means that on
architectures which can mmap ioport ranges (not many, but powerpc is
among them) we'd shoot down those mmaps too.

Looking at the code, there's the generic implementation, which consults
pci_iobar_pfn(). The only other implementation, for sparc, looks
similar: both separate iomem from ioport through different pfns. So I
think this should indeed work.
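Why separate pfns are enough can be sketched with a model of the revoke. Suppose all iomem and ioport mmaps share one address_space and vm_pgoff equals the first pfn mapped, as assumed in this thread. Then zapping an iomem pfn range only hits vmas that overlap it, whichever file (sysfs, proc, /dev/mem) they were created through. The pfn values, including the ioport one, are made up, and model_revoke() is only a rough, hypothetical analogue of unmap_mapping_range().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one mmap linked into the shared address_space. */
struct model_vma {
	unsigned long pgoff;    /* assumed equal to the first pfn mapped */
	unsigned long npages;
	bool has_ptes;          /* false once the ptes were zapped */
};

struct model_mapping {
	struct model_vma *vmas[8];
	size_t nr;
};

/* Zap every vma whose [pgoff, pgoff + npages) overlaps the revoked range. */
static void model_revoke(struct model_mapping *m, unsigned long first_pfn,
			 unsigned long nr_pfns)
{
	for (size_t i = 0; i < m->nr; i++) {
		struct model_vma *v = m->vmas[i];

		if (v->pgoff < first_pfn + nr_pfns &&
		    first_pfn < v->pgoff + v->npages)
			v->has_ptes = false;
	}
}
```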

> To me this seems like a new sysfs_create_bin_file() flavor that
> registers the file with the common devmem address_space.

Hm, I think we could just add an i_mapping member to bin_attributes and
let the normal open code set that up for us. That should work;
mmapable binary sysfs files are already a similar special case.
-Daniel


Re: [PATCH 10/13] PCI: revoke mappings like devmem

2020-10-08 Thread Daniel Vetter
On Thu, Oct 8, 2020 at 1:24 AM Jason Gunthorpe  wrote:
>
> On Wed, Oct 07, 2020 at 12:33:06PM -0700, Dan Williams wrote:
> > On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter  
> > wrote:
> > >
> > > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > > the region") /dev/kmem zaps ptes when the kernel requests exclusive
> > > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > > the default for all driver uses.
> > >
> > > Except there's two more ways to access pci bars: sysfs and proc mmap
> > > support. Let's plug that hole.
> >
> > Ooh, yes, lets.
> >
> > >
> > > For revoke_devmem() to work we need to link our vma into the same
> > > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > > at ->open time, but that's a bit tricky here with all the entry points
> > > and arch code. So instead create a fake file and adjust vma->vm_file.
> >
> > I don't think you want to share the devmem inode for this, this should
> > be based off the sysfs inode which I believe there is already only one
> > instance per resource. In contrast /dev/mem can have multiple inodes
> > because anyone can just mknod a new character device file, the same
> > problem does not exist for sysfs.
>
> The inode does not come from the filesystem; char/mem.c creates a
> singular anon inode in devmem_init_inode()
>
> Seems OK to use this more widely, but it feels a bit weird to live in
> char/memory.c.
>
> This is what got me thinking maybe this needs to be a bit bigger
> generic infrastructure - eg enter this scheme from fops mmap and
> everything else is in mm/user_iomem.c

Yeah, moving it to iomem and renaming it to use an iomem_ prefix
instead of devmem sounds like a good idea.
-Daniel


Re: [PATCH 07/13] mm: close race in generic_access_phys

2020-10-08 Thread Daniel Vetter
On Thu, Oct 8, 2020 at 2:44 AM John Hubbard  wrote:
>
> On 10/7/20 9:44 AM, Daniel Vetter wrote:
> > Way back it was a reasonable assumption that iomem mappings never
> > change the pfn range they point at. But this has changed:
> >
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> >ptes with unmap_mapping_range when buffers get moved
> >
> > - contiguous dma allocations have moved from dedicated carvetouts to
>
> s/carvetouts/carveouts/
>
> >cma regions. This means if we miss the unmap the pfn might contain
> >pagecache or anon memory (well anything allocated with __GFP_MOVABLE)
> >
> > - even /dev/mem now invalidates mappings when the kernel requests that
> >iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> >("/dev/mem: Revoke mappings when a driver claims the region")
>
> Thanks for putting these references into the log, it's very helpful.
> ...
> > diff --git a/mm/memory.c b/mm/memory.c
> > index fcfc4ca36eba..8d467e23b44e 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4873,28 +4873,68 @@ int follow_phys(struct vm_area_struct *vma,
> >   return ret;
> >   }
> >
> > +/**
> > + * generic_access_phys - generic implementation for iomem mmap access
> > + * @vma: the vma to access
> > + * @addr: userspace address, not relative offset within @vma
> > + * @buf: buffer to read/write
> > + * @len: length of transfer
> > + * @write: set to FOLL_WRITE when writing, otherwise reading
> > + *
> > + * This is a generic implementation for &vm_operations_struct.access for an
> > + * iomem mapping. This callback is used by access_process_vm() when the 
> > @vma is
> > + * not page based.
> > + */
> >   int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> >   void *buf, int len, int write)
> >   {
> >   resource_size_t phys_addr;
> >   unsigned long prot = 0;
> >   void __iomem *maddr;
> > + pte_t *ptep, pte;
> > + spinlock_t *ptl;
> >   int offset = addr & (PAGE_SIZE-1);
> > + int ret = -EINVAL;
> > +
> > + if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > + return -EINVAL;
> > +
> > +retry:
> > + if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
> > + return -EINVAL;
> > + pte = *ptep;
> > + pte_unmap_unlock(ptep, ptl);
> >
> > - if (follow_phys(vma, addr, write, &prot, &phys_addr))
> > + prot = pgprot_val(pte_pgprot(pte));
> > + phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;
> > +
> > + if ((write & FOLL_WRITE) && !pte_write(pte))
> >   return -EINVAL;
> >
> >   maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
> >   if (!maddr)
> >   return -ENOMEM;
> >
> > + if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
> > +     goto out_unmap;
> > +
> > + if (pte_same(pte, *ptep)) {
>
>
> The ioremap area is something I'm sorta new to, so a newbie question:
> is it possible for the same pte to already be there, ever? If so, we'd
> be stuck in an infinite loop here. I'm sure that's not the case, but
> it's not yet obvious to me why it's impossible. Resource reservations
> maybe?

It's just buggy: it should be !pte_same(). And I guess I need to figure
out how to test this.
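For reference, the intended flow with the check inverted is: snapshot the pte, map the snapshotted pfn, re-take the lock, and retry only if the pte changed. John's infinite-loop concern then applies only if the pte keeps changing between attempts. This single-threaded model is hypothetical throughout: iomem[] replaces ioremap_prot(), and a callback stands in for a concurrent pte update while the lock is dropped.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_IOMEM_BYTES 16

/* Hypothetical single pte; not the kernel's pte_t. */
struct model_pte {
	unsigned long pfn;
	bool present;
	bool writable;
};

static bool model_pte_same(struct model_pte a, struct model_pte b)
{
	return a.pfn == b.pfn && a.present == b.present &&
	       a.writable == b.writable;
}

/*
 * Returns len on success or -1. iomem[pfn] stands in for the ioremap_prot()
 * mapping of that pfn; race (may be NULL) runs once while the modelled page
 * table lock is dropped, simulating a concurrent pte update.
 */
static int model_access_phys(struct model_pte *ptep,
			     char iomem[][MODEL_IOMEM_BYTES], char *buf, int len,
			     bool write, void (*race)(struct model_pte *),
			     int *retries)
{
	struct model_pte pte;
	char *maddr;

	if (len < 0 || len > MODEL_IOMEM_BYTES)
		return -1;
	*retries = 0;
retry:
	pte = *ptep;                    /* follow_pte(), copy the pte, unlock */
	if (!pte.present || (write && !pte.writable))
		return -1;

	maddr = iomem[pte.pfn];         /* ioremap_prot() of the snapshot */

	if (race) {
		race(ptep);
		race = NULL;
	}

	/* follow_pte() again: only use maddr if the pte did not change. */
	if (!model_pte_same(pte, *ptep)) {
		(*retries)++;           /* iounmap() and start over */
		goto retry;
	}
	if (write)
		memcpy(maddr, buf, (size_t)len);
	else
		memcpy(buf, maddr, (size_t)len);
	return len;
}
```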
-Daniel


Re: [PATCH 01/13] drm/exynos: Stop using frame_vector helpers

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 11:37 PM John Hubbard  wrote:
>
> On 10/7/20 2:32 PM, Daniel Vetter wrote:
> > On Wed, Oct 7, 2020 at 10:33 PM John Hubbard  wrote:
> >>
> >> On 10/7/20 9:44 AM, Daniel Vetter wrote:
> ...
> >>> @@ -398,15 +399,11 @@ static void g2d_userptr_put_dma_addr(struct 
> >>> g2d_data *g2d,
> >>>dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
> >>>  DMA_BIDIRECTIONAL, 0);
> >>>
> >>> - pages = frame_vector_pages(g2d_userptr->vec);
> >>> - if (!IS_ERR(pages)) {
> >>> - int i;
> >>> + for (i = 0; i < g2d_userptr->npages; i++)
> >>> + set_page_dirty_lock(g2d_userptr->pages[i]);
> >>>
> >>> - for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
> >>> - set_page_dirty_lock(pages[i]);
> >>> - }
> >>> - put_vaddr_frames(g2d_userptr->vec);
> >>> - frame_vector_destroy(g2d_userptr->vec);
> >>> + unpin_user_pages(g2d_userptr->pages, g2d_userptr->npages);
> >>> + kvfree(g2d_userptr->pages);
> >>
> >> You can avoid writing your own loop, and just simplify the whole thing 
> >> down to
> >> two lines:
> >>
> >>  unpin_user_pages_dirty_lock(g2d_userptr->pages, 
> >> g2d_userptr->npages,
> >>  true);
> >>  kvfree(g2d_userptr->pages);
> >
> > Oh nice, this is neat. I'll also roll it out in the habanalabs patch,
> > that has the same thing. Well almost, it only uses set_page_dirty, not
> > the _lock variant. But I have no idea whether that matters or not?
>
>
> It matters. And invariably, call sites that use set_page_dirty() instead
> of set_page_dirty_lock() were already wrong.  Which is why I never had to
> provide anything like "unpin_user_pages_dirty (not locked)".
>
> Although in habanalabs case, I just reviewed patch 3 and I think they *were*
> correctly using set_page_dirty_lock()...

Yeah, I mixed that up with some other code I read; habanalabs is using
the _lock variant. I have seen a pile of gup/pup code, though, that only
uses set_page_dirty(). And looking around, I could not really make sense
of the comment above set_page_dirty(). I guess just using the _lock
variant shouldn't hurt too much. I did find a comment in the infiniband
umem notifier code saying it's sometimes called with the page locked and
sometimes not, so life is complicated there. How it avoids races I
didn't understand, though.
-Daniel


Re: [PATCH 01/13] drm/exynos: Stop using frame_vector helpers

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 10:33 PM John Hubbard  wrote:
>
> On 10/7/20 9:44 AM, Daniel Vetter wrote:
> > All we need are a pages array, pin_user_pages_fast can give us that
> > directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> >
> > Signed-off-by: Daniel Vetter 
> > Cc: Jason Gunthorpe 
> > Cc: Inki Dae 
> > Cc: Joonyoung Shim 
> > Cc: Seung-Woo Kim 
> > Cc: Kyungmin Park 
> > Cc: Kukjin Kim 
> > Cc: Krzysztof Kozlowski 
> > Cc: Andrew Morton 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Jan Kara 
> > Cc: Dan Williams 
> > Cc: linux...@kvack.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-samsung-...@vger.kernel.org
> > Cc: linux-me...@vger.kernel.org
> > ---
> >   drivers/gpu/drm/exynos/Kconfig  |  1 -
> >   drivers/gpu/drm/exynos/exynos_drm_g2d.c | 48 -
> >   2 files changed, 22 insertions(+), 27 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
> > index 6417f374b923..43257ef3c09d 100644
> > --- a/drivers/gpu/drm/exynos/Kconfig
> > +++ b/drivers/gpu/drm/exynos/Kconfig
> > @@ -88,7 +88,6 @@ comment "Sub-drivers"
> >   config DRM_EXYNOS_G2D
> >   bool "G2D"
> >   depends on VIDEO_SAMSUNG_S5P_G2D=n || COMPILE_TEST
> > - select FRAME_VECTOR
> >   help
> > Choose this option if you want to use Exynos G2D for DRM.
> >
> > diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
> > b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> > index 967a5cdc120e..c83f6faac9de 100644
> > --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> > +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> > @@ -205,7 +205,8 @@ struct g2d_cmdlist_userptr {
> >   dma_addr_t  dma_addr;
> >   unsigned long   userptr;
> >   unsigned long   size;
> > - struct frame_vector *vec;
> > + struct page **pages;
> > + unsigned int    npages;
> >   struct sg_table *sgt;
> >   atomic_t        refcount;
> >   bool            in_pool;
> > @@ -378,7 +379,7 @@ static void g2d_userptr_put_dma_addr(struct g2d_data 
> > *g2d,
> >   bool force)
> >   {
> >   struct g2d_cmdlist_userptr *g2d_userptr = obj;
> > - struct page **pages;
> > + int i;
>
> The above line can also be deleted, see below.
>
> >
> >   if (!obj)
> >   return;
> > @@ -398,15 +399,11 @@ static void g2d_userptr_put_dma_addr(struct g2d_data 
> > *g2d,
> >   dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
> > DMA_BIDIRECTIONAL, 0);
> >
> > - pages = frame_vector_pages(g2d_userptr->vec);
> > - if (!IS_ERR(pages)) {
> > - int i;
> > + for (i = 0; i < g2d_userptr->npages; i++)
> > + set_page_dirty_lock(g2d_userptr->pages[i]);
> >
> > - for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
> > - set_page_dirty_lock(pages[i]);
> > - }
> > - put_vaddr_frames(g2d_userptr->vec);
> > - frame_vector_destroy(g2d_userptr->vec);
> > + unpin_user_pages(g2d_userptr->pages, g2d_userptr->npages);
> > + kvfree(g2d_userptr->pages);
>
> You can avoid writing your own loop, and just simplify the whole thing down to
> two lines:
>
> unpin_user_pages_dirty_lock(g2d_userptr->pages, g2d_userptr->npages,
> true);
> kvfree(g2d_userptr->pages);

Oh nice, this is neat. I'll also roll it out in the habanalabs patch,
that has the same thing. Well almost, it only uses set_page_dirty, not
the _lock variant. But I have no idea whether that matters or not?
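As a sanity check that John's two-line version matches the open-coded loop, here is a userspace model. It assumes unpin_user_pages_dirty_lock(pages, npages, make_dirty) dirties each page (when make_dirty is true) before dropping its pin. struct model_page and both model_* helpers are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical page model: a dirty bit and a pin count. */
struct model_page {
	bool dirty;
	int pincount;
};

/* Mirrors the open-coded version in the patch: dirty every page, then unpin. */
static void model_dirty_then_unpin(struct model_page **pages,
				   unsigned long npages)
{
	for (unsigned long i = 0; i < npages; i++)
		pages[i]->dirty = true;          /* set_page_dirty_lock() */
	for (unsigned long i = 0; i < npages; i++)
		pages[i]->pincount--;            /* unpin_user_pages() */
}

/* Assumed behaviour of the combined helper, one page at a time. */
static void model_unpin_user_pages_dirty_lock(struct model_page **pages,
					      unsigned long npages,
					      bool make_dirty)
{
	for (unsigned long i = 0; i < npages; i++) {
		if (make_dirty)
			pages[i]->dirty = true;  /* dirtied before the pin is dropped */
		pages[i]->pincount--;
	}
}
```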
-Daniel

>
>
> >
> >   if (!g2d_userptr->out_of_list)
> >   list_del_init(&g2d_userptr->list);
> > @@ -474,35 +471,34 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct 
> > g2d_data *g2d,
> >   offset = userptr & ~PAGE_MASK;
> >   end = PAGE_ALIGN(userptr + size);
> >   npages = (end - start) >> PAGE_SHIFT;
> > - g2d_userptr->vec = frame_vector_create(npages);
> > - if (!g2d_userptr->vec) {
> > + g2d_userptr->pages = kvmalloc_array(npages, 
> > sizeof(*g2d_userptr->pages),
> > + GFP_KERNEL);
> > + if (!g2d_userptr->p

Re: [PATCH 05/13] mm/frame-vector: Use FOLL_LONGTERM

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 11:13 PM John Hubbard  wrote:
>
> On 10/7/20 9:44 AM, Daniel Vetter wrote:
> > This is used by media/videobuf2 for persistent dma mappings, not just
> > for a single dma operation and then freed again, so needs
> > FOLL_LONGTERM.
> >
> > Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
> > locking issues. Rework the code to pull the pup path out from the
> > mmap_sem critical section as suggested by Jason.
> >
> > Signed-off-by: Daniel Vetter 
> > Cc: Jason Gunthorpe 
> > Cc: Pawel Osciak 
> > Cc: Marek Szyprowski 
> > Cc: Kyungmin Park 
> > Cc: Tomasz Figa 
> > Cc: Mauro Carvalho Chehab 
> > Cc: Andrew Morton 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Jan Kara 
> > Cc: Dan Williams 
> > Cc: linux...@kvack.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-samsung-...@vger.kernel.org
> > Cc: linux-me...@vger.kernel.org
> > ---
> >   mm/frame_vector.c | 36 +++-
> >   1 file changed, 11 insertions(+), 25 deletions(-)
> >
> > diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> > index 10f82d5643b6..39db520a51dc 100644
> > --- a/mm/frame_vector.c
> > +++ b/mm/frame_vector.c
> > @@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> > nr_frames,
> >   struct vm_area_struct *vma;
> >   int ret = 0;
> >   int err;
> > - int locked;
> >
> >   if (nr_frames == 0)
> >   return 0;
> > @@ -48,35 +47,22 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> > nr_frames,
> >
> >   start = untagged_addr(start);
> >
> > + ret = pin_user_pages_fast(start, nr_frames,
> > +   FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> > +   (struct page **)(vec->ptrs));
> > + if (ret > 0) {
> > + vec->got_ref = true;
> > + vec->is_pfns = false;
> > + goto out_unlocked;
> > + }
>
> This part looks good, and changing to _fast is a potential performance 
> improvement,
> too.
>
> > +
> >   mmap_read_lock(mm);
> > - locked = 1;
> >   vma = find_vma_intersection(mm, start, start + 1);
> >   if (!vma) {
> >   ret = -EFAULT;
> >   goto out;
> >   }
> >
> > - /*
> > -  * While get_vaddr_frames() could be used for transient (kernel
> > -  * controlled lifetime) pinning of memory pages all current
> > -  * users establish long term (userspace controlled lifetime)
> > -  * page pinning. Treat get_vaddr_frames() like
> > -  * get_user_pages_longterm() and disallow it for filesystem-dax
> > -  * mappings.
> > -  */
> > - if (vma_is_fsdax(vma)) {
> > - ret = -EOPNOTSUPP;
> > - goto out;
> > - }
>
> Are you sure we don't need to check vma_is_fsdax() anymore?

Since FOLL_LONGTERM checks for this, can only return struct page
backed memory, and explicitly excludes VM_IO | VM_PFNMAP, I was assuming
this check is not needed for the follow_pfn path. The
get_user_pages_locked() this code used back then didn't have the same
check, which is why it was added (and FOLL_LONGTERM still doesn't work
for the _locked versions, as you pointed out in the last round of this
discussion).
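The resulting control flow of get_vaddr_frames() can be summarized in a simplified userspace model: try FOLL_LONGTERM pinning first, and only fall back to the raw pfn walk when that fails. The flag values, the single-vma simplification and the model_* helpers are assumptions; in this model the pinning step fails only for VM_IO | VM_PFNMAP vmas.

```c
#include <stdbool.h>

#define MODEL_VM_IO      0x1u
#define MODEL_VM_PFNMAP  0x2u
#define MODEL_MAX_FRAMES 8u

/* Hypothetical vma: flags plus the first pfn it maps. */
struct model_vma {
	unsigned int vm_flags;
	unsigned long first_pfn;
};

struct model_frame_vector {
	bool got_ref;   /* pages were pinned and must be unpinned later */
	bool is_pfns;   /* raw pfns, no references held */
	unsigned long frames[MODEL_MAX_FRAMES];
	unsigned int nr;
};

/* Assumed: FOLL_LONGTERM pinning refuses VM_IO | VM_PFNMAP and returns 0. */
static int model_pin_longterm(const struct model_vma *vma, unsigned int n,
			      unsigned long *out)
{
	if (vma->vm_flags & (MODEL_VM_IO | MODEL_VM_PFNMAP))
		return 0;
	for (unsigned int i = 0; i < n; i++)
		out[i] = vma->first_pfn + i;   /* stands in for struct page * */
	return (int)n;
}

static int model_get_vaddr_frames(const struct model_vma *vma, unsigned int n,
				  struct model_frame_vector *vec)
{
	int ret;

	if (n == 0)
		return 0;
	if (n > MODEL_MAX_FRAMES)
		return -1;                     /* model limit only */

	/* Fast path, taken without the mmap lock. */
	ret = model_pin_longterm(vma, n, vec->frames);
	if (ret > 0) {
		vec->got_ref = true;
		vec->is_pfns = false;
		vec->nr = (unsigned int)ret;
		return ret;
	}

	/* Slow path under mmap_read_lock(): raw pfns, no references taken. */
	vec->got_ref = false;
	vec->is_pfns = true;
	for (unsigned int i = 0; i < n; i++)
		vec->frames[i] = vma->first_pfn + i;   /* follow_pfn() per page */
	vec->nr = n;
	return (int)n;
}
```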

But now that you're asking, I have no idea whether an fsdax vma can also
be of VM_IO | VM_PFNMAP type. I'm not seeing that set anywhere in
fs/dax.c, but that says nothing :-)

Dan, you added this check originally, do we need it for VM_SPECIAL vmas too?

Thanks, Daniel

>
> > -
> > - if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> > - vec->got_ref = true;
> > - vec->is_pfns = false;
> > - ret = pin_user_pages_locked(start, nr_frames,
> > - gup_flags, (struct page **)(vec->ptrs), &locked);
> > - goto out;
> > - }
> > -
> >   vec->got_ref = false;
> >   vec->is_pfns = true;
> >   do {
> > @@ -101,8 +87,8 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> > nr_frames,
> >   vma = find_vma_intersection(mm, start, start + 1);
> >   } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
> >   out:
> > - if (locked)
> > - mmap_read_unlock(mm);
> > + mmap_read_unlock(mm);
> > +out_unlocked:
> >   if (!ret)
> >   ret = -EFAULT;
> >   if (ret > 0)
> >
>
> All of the error handling still looks accurate there.
>
> thanks,
> --
> John Hubbard
> NVIDIA


Re: [PATCH 10/13] PCI: revoke mappings like devmem

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 9:33 PM Dan Williams  wrote:
>
> On Wed, Oct 7, 2020 at 11:11 AM Daniel Vetter  wrote:
> >
> > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > the region") /dev/kmem zaps ptes when the kernel requests exclusive
> > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > the default for all driver uses.
> >
> > Except there's two more ways to access pci bars: sysfs and proc mmap
> > support. Let's plug that hole.
>
> Ooh, yes, lets.
>
> > For revoke_devmem() to work we need to link our vma into the same
> > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > at ->open time, but that's a bit tricky here with all the entry points
> > and arch code. So instead create a fake file and adjust vma->vm_file.
>
> I don't think you want to share the devmem inode for this, this should
> be based off the sysfs inode which I believe there is already only one
> instance per resource. In contrast /dev/mem can have multiple inodes
> because anyone can just mknod a new character device file, the same
> problem does not exist for sysfs.

But then I need to find the right one, plus I also need to find the
right one for the procfs side. That gets messy, and I already have no
idea how to really test this. A shared address_space is the same trick
we're using in drm (where we have multiple things all pointing to the
same underlying resources, through different files), and it gets the
job done. That's why I figured the shared address_space is the
cleaner solution, since unmap_mapping_range() then takes care of
iterating over all vmas for us. I guess I could reimplement that logic,
with our own locking and everything, in revoke_devmem(), but that feels
a bit silly. But it would also solve the problem of multiple different
mknods of /dev/kmem with different address_spaces behind them.
Also, because of how remap_pfn_range works, all these vmas already use
the same pgoff anyway.
-Daniel


Re: [PATCH 11/13] mm: add unsafe_follow_pfn

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 9:00 PM Jason Gunthorpe  wrote:
>
> On Wed, Oct 07, 2020 at 08:10:34PM +0200, Daniel Vetter wrote:
> > On Wed, Oct 7, 2020 at 7:36 PM Jason Gunthorpe  wrote:
> > >
> > > On Wed, Oct 07, 2020 at 06:44:24PM +0200, Daniel Vetter wrote:
> > > > Way back it was a reasonable assumption that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > >
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > >
> > > > - contiguous dma allocations have moved from dedicated carveouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with __GFP_MOVABLE)
> > > >
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > >
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > >
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > >
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.
> > > >
> > > > Signed-off-by: Daniel Vetter 
> > > > Cc: Jason Gunthorpe 
> > > > Cc: Kees Cook 
> > > > Cc: Dan Williams 
> > > > Cc: Andrew Morton 
> > > > Cc: John Hubbard 
> > > > Cc: Jérôme Glisse 
> > > > Cc: Jan Kara 
> > > > Cc: Dan Williams 
> > > > Cc: linux...@kvack.org
> > > > Cc: linux-arm-ker...@lists.infradead.org
> > > > Cc: linux-samsung-...@vger.kernel.org
> > > > Cc: linux-me...@vger.kernel.org
> > > > Cc: k...@vger.kernel.org
> > > >  include/linux/mm.h |  2 ++
> > > >  mm/memory.c        | 32 +++-
> > > >  mm/nommu.c | 17 +
> > > >  security/Kconfig   | 13 +
> > > >  4 files changed, 63 insertions(+), 1 deletion(-)
> > >
> > > Makes sense to me.
> > >
> > > I wonder if we could change the original follow_pfn to require the
> > > ptep and then lockdep_assert_held() it against the page table lock?
> >
> > The safe variant with the pagetable lock is follow_pte_pmd. The only
> > way to make follow_pfn safe is if you have an mmu notifier and
> > corresponding retry logic. That is not covered by lockdep (it would
> > splat if we annotate the retry side), so I'm not sure how you'd check
> > for that?
>
> Right OK.
>
> > Checking for ptep lock doesn't work here, since the one leftover safe
> > user of this (kvm) doesn't need that at all, because it has the mmu
> > notifier.
>
> Ah, so a better name and/or function kdoc for follow_pfn is probably a
> good idea in this patch as well.

I did change that already to mention that you need an mmu notifier,
and that follow_pte_pmd and unsafe_follow_pfn are the alternatives.
Do you want more or something else here?

Note that I left the kerneldoc for the nommu.c case unchanged, since
without an mmu all bets are off anyway.

> > So I think we're as good as it gets, since I really have no idea how
> > to make sure follow_pfn callers do have an mmu notifier registered.
>
> Yah, can't be done. Most mmu notifier users should be using
> hmm_range_fault anyhow, kvm is really very special here.

We could pass an mmu notifier to follow_pfn and check that it has a
registration for vma->vm_mm, but that feels like overkill when kvm is
the only legit user for this.

> > I've followed the few other CONFIG_STRICT_FOO I've seen, which are all
> > explicit enables and default to "do not break uapi, damn the
> > (security) bugs". Which is I think how this should be done. It is in
> > the security section though, so hopefully competent distros will
> > enable this all.
>
> I thought the strict ones were more general and less clear security
> worries, not bugs like this.
>
> This is "allow a user triggerable use after free bug to exist in the
> kernel"

Since at most you get at __GFP_MOVABLE memory, I'm not sure you can use
this to take over the kernel. Maybe the best way is if you somehow get a
gpu pagetable into your pfn and then use that to access arbitrary
stuff, but there's still an iommu. I think leveraging this is going to
be very tricky, and pretty much has to be device or driver specific
somehow.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 10/13] PCI: revoke mappings like devmem

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 8:41 PM Bjorn Helgaas  wrote:
>
> Capitalize subject, like other patches in this series and previous
> drivers/pci history.
>
> On Wed, Oct 07, 2020 at 06:44:23PM +0200, Daniel Vetter wrote:
> > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > the region") /dev/mem zaps ptes when the kernel requests exclusive
> > access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > the default for all driver uses.
> >
> > Except there are two more ways to access pci bars: sysfs and proc mmap
> > support. Let's plug that hole.
>
> s/pci/PCI/ in commit logs and comments.
>
> > For revoke_devmem() to work we need to link our vma into the same
> > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
> > at ->open time, but that's a bit tricky here with all the entry points
> > and arch code. So instead create a fake file and adjust vma->vm_file.
> >
> > Note this only works for ARCH_GENERIC_PCI_MMAP_RESOURCE. But that
> > seems to be a subset of architectures supporting STRICT_DEVMEM, so we
> > should be good.
> >
> > The only difference in access checks left is that sysfs pci mmap does
> > not check for CAP_SYS_RAWIO. But I think that makes some sense compared to
> > /dev/mem and proc, where one file gives you access to everything and
> > no ownership applies.
>
> > --- a/drivers/char/mem.c
> > +++ b/drivers/char/mem.c
> > @@ -810,6 +810,7 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig)
> >  }
> >
> >  static struct inode *devmem_inode;
> > +static struct vfsmount *devmem_vfs_mount;
> >
> >  #ifdef CONFIG_IO_STRICT_DEVMEM
> >  void revoke_devmem(struct resource *res)
> > @@ -843,6 +844,20 @@ void revoke_devmem(struct resource *res)
> >
> >   unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
> >  }
> > +
> > +struct file *devmem_getfile(void)
> > +{
> > + struct file *file;
> > +
> > + file = alloc_file_pseudo(devmem_inode, devmem_vfs_mount, "devmem",
> > +  O_RDWR, _fops);
> > + if (IS_ERR(file))
> > + return NULL;
> > +
> > + file->f_mapping = devmem_indoe->i_mapping;
>
> "devmem_indoe"?  Obviously not compiled, I guess?

Yeah, apologies, I forgot to compile this with CONFIG_IO_STRICT_DEVMEM
set. The entire series is more of an RFC about the overall problem,
really. I also need to figure out how to even test this somehow. I guess
there's nothing ready-made here?
-Daniel

> > --- a/include/linux/ioport.h
> > +++ b/include/linux/ioport.h
> > @@ -304,8 +304,10 @@ struct resource *request_free_mem_region(struct resource *base,
> >
> >  #ifdef CONFIG_IO_STRICT_DEVMEM
> >  void revoke_devmem(struct resource *res);
> > +struct file *devm_getfile(void);
> >  #else
> >  static inline void revoke_devmem(struct resource *res) { };
> > +static inline struct file *devmem_getfile(void) { return NULL; };
>
> I guess these names are supposed to match?
>
> >  #endif
> >
> >  #endif /* __ASSEMBLY__ */
> > --
> > 2.28.0
> >





Re: [PATCH 13/13] vfio/type1: Mark follow_pfn as unsafe

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 7:39 PM Jason Gunthorpe  wrote:
>
> On Wed, Oct 07, 2020 at 06:44:26PM +0200, Daniel Vetter wrote:
> > The code seems to stuff these pfns into iommu page tables (or something like
> > that, I didn't follow), but there's no mmu_notifier to ensure that
> > access is synchronized with pte updates.
> >
> > Hence mark these as unsafe. This means that with
> > CONFIG_STRICT_FOLLOW_PFN, these will be rejected.
> >
> > Real fix is to wire up an mmu_notifier ... somehow. Probably means any
> > invalidate is a fatal fault for this vfio device, but then this
> > shouldn't ever happen if userspace is reasonable.
> >
> > Signed-off-by: Daniel Vetter 
> > Cc: Jason Gunthorpe 
> > Cc: Kees Cook 
> > Cc: Dan Williams 
> > Cc: Andrew Morton 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Jan Kara 
> > Cc: Dan Williams 
> > Cc: linux...@kvack.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-samsung-...@vger.kernel.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: Alex Williamson 
> > Cc: Cornelia Huck 
> > Cc: k...@vger.kernel.org
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 5fbf0c1f7433..a4d53f3d0a35 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -421,7 +421,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
> >  {
> >   int ret;
> >
> > - ret = follow_pfn(vma, vaddr, pfn);
> > + ret = unsafe_follow_pfn(vma, vaddr, pfn);
> >   if (ret) {
> >   bool unlocked = false;
> >
> > @@ -435,7 +435,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
> >   if (ret)
> >   return ret;
> >
> > - ret = follow_pfn(vma, vaddr, pfn);
> > + ret = unsafe_follow_pfn(vma, vaddr, pfn);
> >   }
>
> This is actually being commonly used, so it needs fixing.
>
> When I talked to Alex about this last we had worked out a patch series
> that adds a test on vm_ops that the vma came from vfio in the first
> place. The VMAs created by VFIO are 'safe' as the PTEs are never changed.

Hm, but wouldn't that need the semi-nasty vma_open trick to make sure
that the vma doesn't disappear at an untimely moment? Or is the idea to
look up the underlying vfio object, and refcount that directly?
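A rough userspace sketch of the idea under discussion, with hypothetical names throughout (model_vm_ops, model_vma, model_vma_get_obj()); none of this is VFIO code. The VMA is trusted only if its vm_ops points at our own ops table, and a reference is taken on the backing object so that its lifetime no longer depends on the VMA staying around.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for vm_operations_struct, the VMA and the
 * object backing it. */
struct model_vm_ops {
	const char *name;
};

struct model_vfio_obj {
	int refcount;
};

struct model_vma {
	const struct model_vm_ops *vm_ops;
	void *vm_private_data;	/* only meaningful if vm_ops is ours */
};

static const struct model_vm_ops model_vfio_vm_ops = { "vfio" };
static const struct model_vm_ops model_other_vm_ops = { "other" };

/* Return the backing object with an extra reference, or NULL if the VMA
 * was not created by our mmap handler. The reference, not the VMA, is
 * what keeps the object alive afterwards. */
static struct model_vfio_obj *model_vma_get_obj(const struct model_vma *vma)
{
	struct model_vfio_obj *obj;

	if (vma->vm_ops != &model_vfio_vm_ops)
		return NULL;
	obj = vma->vm_private_data;
	obj->refcount++;
	return obj;
}
```

The pointer comparison only answers "did we create this VMA"; the refcount is the part that addresses the lifetime question raised above.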
-Daniel


Re: [PATCH 11/13] mm: add unsafe_follow_pfn

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 7:36 PM Jason Gunthorpe  wrote:
>
> On Wed, Oct 07, 2020 at 06:44:24PM +0200, Daniel Vetter wrote:
> > Way back it was a reasonable assumption that iomem mappings never
> > change the pfn range they point at. But this has changed:
> >
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> >
> > - contiguous dma allocations have moved from dedicated carveouts to
> > cma regions. This means if we miss the unmap the pfn might contain
> > pagecache or anon memory (well anything allocated with __GFP_MOVABLE)
> >
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> >
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea.
> >
> > Unfortunately there are some users where this is not fixable (like v4l
> > userptr of iomem mappings) or involves a pile of work (vfio type1
> > iommu). For now annotate these as unsafe and splat appropriately.
> >
> > This patch adds an unsafe_follow_pfn, which later patches will then
> > roll out to all appropriate places.
> >
> > Signed-off-by: Daniel Vetter 
> > Cc: Jason Gunthorpe 
> > Cc: Kees Cook 
> > Cc: Dan Williams 
> > Cc: Andrew Morton 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Jan Kara 
> > Cc: Dan Williams 
> > Cc: linux...@kvack.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-samsung-...@vger.kernel.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: k...@vger.kernel.org
> > ---
> >  include/linux/mm.h |  2 ++
> >  mm/memory.c| 32 +++-
> >  mm/nommu.c | 17 +
> >  security/Kconfig   | 13 +
> >  4 files changed, 63 insertions(+), 1 deletion(-)
>
> Makes sense to me.
>
> I wonder if we could change the original follow_pfn to require the
> ptep and then lockdep_assert_held() it against the page table lock?

The safe variant with the pagetable lock is follow_pte_pmd. The only
way to make follow_pfn safe is if you have an mmu notifier and
corresponding retry logic. That is not covered by lockdep (it would
splat if we annotate the retry side), so I'm not sure how you'd check
for that?
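A minimal userspace sketch of "mmu notifier plus retry logic", assuming a single user address whose translation can change. The names (model_mm, model_read_begin(), model_read_retry()) are hypothetical; they only mimic the role of the kernel's mmu_interval_read_begin()/mmu_interval_read_retry() pair. The model is single-threaded and leaves out the driver lock that real code must hold when checking for a retry; a concurrent invalidation is simulated with a callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one user address whose translation (pfn) can be
 * changed by an invalidation, plus a sequence count bumped on each one. */
struct model_mm {
	unsigned long pfn;	/* current translation */
	unsigned long seq;	/* invalidation sequence count */
};

/* Invalidate side, e.g. a driver moving a buffer. */
static void model_invalidate(struct model_mm *mm, unsigned long new_pfn)
{
	mm->seq++;
	mm->pfn = new_pfn;
}

static unsigned long model_read_begin(const struct model_mm *mm)
{
	return mm->seq;
}

static bool model_read_retry(const struct model_mm *mm, unsigned long seq)
{
	return mm->seq != seq;
}

/* Look up the pfn and "use" it; if an invalidation ran in between,
 * discard the result and try again. @race simulates a concurrent
 * invalidation on the first attempt only. */
static unsigned long model_lookup_and_use(struct model_mm *mm,
					  void (*race)(struct model_mm *),
					  int *attempts)
{
	unsigned long seq, pfn;

	*attempts = 0;
	do {
		(*attempts)++;
		seq = model_read_begin(mm);
		pfn = mm->pfn;			/* the follow_pfn() step */
		if (race && *attempts == 1)
			race(mm);
	} while (model_read_retry(mm, seq));

	return pfn;
}

static void model_race_move(struct model_mm *mm)
{
	model_invalidate(mm, 0x2000);
}
```

Nothing here can be checked by lockdep: correctness depends entirely on the caller actually running the retry check, which is the point made above.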

Checking for ptep lock doesn't work here, since the one leftover safe
user of this (kvm) doesn't need that at all, because it has the mmu
notifier.

Also follow_pte_pmd will splat with lockdep if you get it wrong, since
the function returns with the right pte lock held. If you forget to
unlock it, lockdep will complain.
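The "returns with the lock held" contract can be sketched in userspace as follows. This is a hypothetical model, not follow_pte_pmd() itself: the page-table lock is a plain flag, and the assert() in model_unlock() stands in for the pairing check lockdep performs in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a page-table lock; the flag lets us check
 * lock/unlock pairing the way lockdep would. */
struct model_ptl {
	bool held;
};

struct model_pt {
	struct model_ptl ptl;
	unsigned long pte_pfn;
	bool present;
};

/* Like follow_pte_pmd(): on success return 0 with *ptlp held, and the
 * caller must unlock it; on failure return an error with nothing held. */
static int model_follow_pte(struct model_pt *pt, unsigned long *pfn,
			    struct model_ptl **ptlp)
{
	pt->ptl.held = true;
	if (!pt->present) {
		pt->ptl.held = false;
		return -EINVAL;
	}
	*pfn = pt->pte_pfn;
	*ptlp = &pt->ptl;
	return 0;			/* lock intentionally still held */
}

static void model_unlock(struct model_ptl *ptl)
{
	assert(ptl->held);		/* unbalanced unlock would be a bug */
	ptl->held = false;
}

/* Correct caller: use the pfn only while the lock is held. */
static int model_read_pfn(struct model_pt *pt, unsigned long *out)
{
	struct model_ptl *ptl;
	unsigned long pfn;
	int ret;

	ret = model_follow_pte(pt, &pfn, &ptl);
	if (ret)
		return ret;
	*out = pfn;			/* cannot change under us here */
	model_unlock(ptl);
	return 0;
}
```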

So I think we're as good as it gets, since I really have no idea how
to make sure follow_pfn callers do have an mmu notifier registered.

> > +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> > + unsigned long *pfn)
> > +{
> > +#ifdef CONFIG_STRICT_FOLLOW_PFN
> > + pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
>
> Wonder if we can print something useful here, like the current
> PID/process name?

Yeah adding comm/pid here makes sense.

> > diff --git a/security/Kconfig b/security/Kconfig
> > index 7561f6f99f1d..48945402e103 100644
> > --- a/security/Kconfig
> > +++ b/security/Kconfig
> > @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
> > If you wish for all usermode helper programs to be disabled,
> > specify an empty string here (i.e. "").
> >
> > +config STRICT_FOLLOW_PFN
> > + bool "Disable unsafe use of follow_pfn"
> > + depends on MMU
>
> I would probably invert this CONFIG_ALLOW_UNSAFE_FOLLOW_PFN
> default n

I've followed the few other CONFIG_STRICT_FOO I've seen, which are all
explicit enables and default to "do not break uapi, damn the
(security) bugs". Which is I think how this should be done. It is in
the security section though, so hopefully competent distros will
enable this all.
-Daniel


Re: [PATCH 07/13] mm: close race in generic_access_phys

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 7:27 PM Jason Gunthorpe  wrote:
>
> On Wed, Oct 07, 2020 at 06:44:20PM +0200, Daniel Vetter wrote:
> > Way back it was a reasonable assumption that iomem mappings never
> > change the pfn range they point at. But this has changed:
> >
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> >   ptes with unmap_mapping_range when buffers get moved
> >
> > - contiguous dma allocations have moved from dedicated carveouts to
> >   cma regions. This means if we miss the unmap the pfn might contain
> >   pagecache or anon memory (well anything allocated with __GFP_MOVABLE)
> >
> > - even /dev/mem now invalidates mappings when the kernel requests that
> >   iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> >   ("/dev/mem: Revoke mappings when a driver claims the region")
> >
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea. Fix this.
> >
> > Since ioremap might need to manipulate pagetables too, we need to drop
> > the pt lock and have a retry loop in case we raced.
> >
> > While at it, also add kerneldoc and improve the comment for the
> > vma_ops->access function. It's for accessing, not for moving the
> > memory from iomem to system memory, as the old comment seemed to
> > suggest.
> >
> > References: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
> > Cc: Jason Gunthorpe 
> > Cc: Dan Williams 
> > Cc: Kees Cook 
> > Cc: Rik van Riel 
> > Cc: Benjamin Herrensmidt 
> > Cc: Dave Airlie 
> > Cc: Hugh Dickins 
> > Cc: Andrew Morton 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Jan Kara 
> > Cc: Dan Williams 
> > Cc: linux...@kvack.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-samsung-...@vger.kernel.org
> > Cc: linux-me...@vger.kernel.org
> > Signed-off-by: Daniel Vetter 
> > ---
> >  include/linux/mm.h |  3 ++-
> >  mm/memory.c| 44 ++--
> >  2 files changed, 44 insertions(+), 3 deletions(-)
>
> This does seem to solve the race with revoke_devmem(), but it is really ugly.
>
> It would be much nicer to wrap a rwsem around this access and the unmap.
>
> Any place using it has a nice linear translation from vm_off to pfn,
> so I don't think there is such a good reason to use follow_pte in
> the first place.
>
> i.e. why not make the helper this:
>
>  int generic_access_phys(unsigned long pfn, unsigned long pgprot,
>   void *buf, size_t len, bool write)
>
> Then something like /dev/mem would compute the pfn and obtain the lock:
>
> static int dev_access(struct vm_area_struct *vma, unsigned long addr,
>                       void *buf, int len, int write)
> {
>  phys_addr_t cpu_addr = (phys_addr_t)vma->vm_pgoff * PAGE_SIZE +
>                         (addr - vma->vm_start);
>  int ret;
>
>  /* FIXME: Has to be over each page of len */
>  if (!devmem_is_allowed_access(PHYS_PFN(cpu_addr)))
>    return -EPERM;
>
>  down_read(_sem);
>  ret = generic_access_phys(PHYS_PFN(cpu_addr), pgprot_val(vma->vm_page_prot),
>                            buf, len, write);
>  up_read(_sem);
>  return ret;
> }
>
> The other cases looked simpler because they don't revoke, here the
> mmap_sem alone should be enough protection, they would just need to
> provide the linear translation to pfn.
>
> What do you think?

I think it'd fix the bug, until someone wires ->access up for
drivers/gpu, or the next subsystem. This is also just for ptrace, so
we really don't care when we stall the vm badly and other silly
things. So I figured the somewhat ugly, but fully generic solution is
the better one, so that people who want to be able to ptrace
read/write their iomem mmaps can just sprinkle this wherever they feel
like.

But yeah, if we go with the most minimal fix, i.e. only trying to fix the
current users, then your thing should work and is simpler. But it
leaves the door open for future problems.
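The "drop the lock, do the slow part, relock and recheck" loop from the generic_access_phys patch can be modelled roughly like this. It is a single-threaded sketch with hypothetical names; the lock is a flag, and the racing revoke is simulated by a callback that runs while the lock is dropped, where the real code would call ioremap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a page-table lock (a flag) and the pfn it protects. */
struct model_map {
	bool locked;
	unsigned long pfn;
};

typedef void (*model_hook)(struct model_map *);

static unsigned long model_lookup(const struct model_map *m)
{
	assert(m->locked);		/* lookups need the lock */
	return m->pfn;
}

/* Look up the pfn, drop the lock for the slow step (ioremap in the real
 * code), then relock and recheck. If the pfn changed, undo and retry.
 * Returns the pfn that was finally accessed. */
static unsigned long model_access_with_retry(struct model_map *m,
					     model_hook while_unlocked,
					     int *tries)
{
	unsigned long pfn;

	*tries = 0;
	for (;;) {
		(*tries)++;
		m->locked = true;
		pfn = model_lookup(m);
		m->locked = false;

		if (while_unlocked)
			while_unlocked(m);	/* a racing revoke fits here */

		m->locked = true;
		if (model_lookup(m) == pfn) {
			/* still valid: do the actual access under the lock */
			m->locked = false;
			return pfn;
		}
		m->locked = false;		/* raced: drop and retry */
	}
}

static void model_revoke_and_move(struct model_map *m)
{
	m->pfn = 200;	/* e.g. a driver claimed the region */
}
```

Unlike a subsystem-specific rwsem, this pattern needs no cooperation from whoever zaps the ptes, which is the genericity argument made above.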
-Daniel


Re: [PATCH 05/13] mm/frame-vector: Use FOLL_LONGTERM

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 6:53 PM Jason Gunthorpe  wrote:
>
> On Wed, Oct 07, 2020 at 06:44:18PM +0200, Daniel Vetter wrote:
> >
> > - /*
> > -  * While get_vaddr_frames() could be used for transient (kernel
> > -  * controlled lifetime) pinning of memory pages all current
> > -  * users establish long term (userspace controlled lifetime)
> > -  * page pinning. Treat get_vaddr_frames() like
> > -  * get_user_pages_longterm() and disallow it for filesystem-dax
> > -  * mappings.
> > -  */
> > - if (vma_is_fsdax(vma)) {
> > - ret = -EOPNOTSUPP;
> > - goto out;
> > - }
> > -
> > - if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> > - vec->got_ref = true;
> > - vec->is_pfns = false;
> > - ret = pin_user_pages_locked(start, nr_frames,
> > - gup_flags, (struct page **)(vec->ptrs), &locked);
> > - goto out;
> > - }
>
> The vm_flags still need to be checked before going into the while
> loop. If the break is taken then nothing would check vm_flags

Hm, right, that's a bit inconsistent. follow_pfn also checks for this,
so I think we can just ditch this entirely, both here and in the do {}
while () check, simplifying the latter to just while (vma). Or rather,
make it a real loop with less confusing control flow.

Or would you prefer I keep this and touch the code less?
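One way to read the "real loop" suggestion, sketched in userspace with hypothetical types and illustrative flag values (not the kernel's VM_* constants): check the VMA flags at the top of every iteration, so that no early exit can skip the check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values for this model only. */
#define MODEL_VM_IO	0x1UL
#define MODEL_VM_PFNMAP	0x2UL

/* Hypothetical VMA covering [start, end); the array is sorted by address. */
struct model_vma {
	unsigned long start, end, flags;
};

/* Walk every VMA covering [start, end) and check its flags at the top of
 * each iteration, so no early exit can skip the check. */
static int model_check_pfn_vmas(const struct model_vma *vmas, size_t nvmas,
				unsigned long start, unsigned long end)
{
	unsigned long addr = start;
	size_t i = 0;

	while (addr < end) {
		while (i < nvmas && vmas[i].end <= addr)
			i++;
		if (i == nvmas || vmas[i].start > addr)
			return -EFAULT;		/* hole in the range */
		if (!(vmas[i].flags & (MODEL_VM_IO | MODEL_VM_PFNMAP)))
			return -EINVAL;		/* not a raw pfn mapping */
		addr = vmas[i].end;
	}
	return 0;
}
```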
-Daniel


[PATCH 01/13] drm/exynos: Stop using frame_vector helpers

2020-10-07 Thread Daniel Vetter
All we need is a pages array, and pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
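The partial-pin error handling in the g2d change below can be modelled in userspace like this. model_pin() is a hypothetical stand-in for pin_user_pages_fast(), which may pin fewer pages than requested; the point of the sketch is that on a short pin only the pages actually pinned are released.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int model_pinned;	/* global pin balance, to spot leaks */

/* Hypothetical stand-in for pin_user_pages_fast(): pins up to
 * @available pages, or fails outright if @available is negative. */
static int model_pin(int *pages, int requested, int available)
{
	int i, n;

	if (available < 0)
		return -EFAULT;
	n = requested < available ? requested : available;
	for (i = 0; i < n; i++)
		pages[i] = i + 1;	/* stands in for struct page pointers */
	model_pinned += n;
	return n;
}

static void model_unpin(int *pages, int n)
{
	(void)pages;
	model_pinned -= n;
}

/* Pin exactly @npages or nothing: on a short pin, release only the
 * pages that were actually pinned, then free the array. */
static int model_pin_all(int npages, int available, int **pages_out)
{
	int *pages = calloc((size_t)npages, sizeof(*pages));
	int ret;

	if (!pages)
		return -ENOMEM;
	ret = model_pin(pages, npages, available);
	if (ret != npages) {
		if (ret > 0)
			model_unpin(pages, ret);
		free(pages);
		return ret < 0 ? ret : -EFAULT;
	}
	*pages_out = pages;
	return 0;
}
```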

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Inki Dae 
Cc: Joonyoung Shim 
Cc: Seung-Woo Kim 
Cc: Kyungmin Park 
Cc: Kukjin Kim 
Cc: Krzysztof Kozlowski 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
---
 drivers/gpu/drm/exynos/Kconfig  |  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 48 -
 2 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
index 6417f374b923..43257ef3c09d 100644
--- a/drivers/gpu/drm/exynos/Kconfig
+++ b/drivers/gpu/drm/exynos/Kconfig
@@ -88,7 +88,6 @@ comment "Sub-drivers"
 config DRM_EXYNOS_G2D
bool "G2D"
depends on VIDEO_SAMSUNG_S5P_G2D=n || COMPILE_TEST
-   select FRAME_VECTOR
help
  Choose this option if you want to use Exynos G2D for DRM.
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index 967a5cdc120e..c83f6faac9de 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -205,7 +205,8 @@ struct g2d_cmdlist_userptr {
dma_addr_t  dma_addr;
unsigned long   userptr;
unsigned long   size;
-   struct frame_vector *vec;
+   struct page **pages;
+   unsigned int    npages;
struct sg_table *sgt;
atomic_t        refcount;
bool            in_pool;
@@ -378,7 +379,7 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
bool force)
 {
struct g2d_cmdlist_userptr *g2d_userptr = obj;
-   struct page **pages;
+   int i;
 
if (!obj)
return;
@@ -398,15 +399,11 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
  DMA_BIDIRECTIONAL, 0);
 
-   pages = frame_vector_pages(g2d_userptr->vec);
-   if (!IS_ERR(pages)) {
-   int i;
+   for (i = 0; i < g2d_userptr->npages; i++)
+   set_page_dirty_lock(g2d_userptr->pages[i]);
 
-   for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
-   set_page_dirty_lock(pages[i]);
-   }
-   put_vaddr_frames(g2d_userptr->vec);
-   frame_vector_destroy(g2d_userptr->vec);
+   unpin_user_pages(g2d_userptr->pages, g2d_userptr->npages);
+   kvfree(g2d_userptr->pages);
 
if (!g2d_userptr->out_of_list)
list_del_init(&g2d_userptr->list);
@@ -474,35 +471,34 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
offset = userptr & ~PAGE_MASK;
end = PAGE_ALIGN(userptr + size);
npages = (end - start) >> PAGE_SHIFT;
-   g2d_userptr->vec = frame_vector_create(npages);
-   if (!g2d_userptr->vec) {
+   g2d_userptr->pages = kvmalloc_array(npages, sizeof(*g2d_userptr->pages),
+   GFP_KERNEL);
+   if (!g2d_userptr->pages) {
ret = -ENOMEM;
goto err_free;
}
 
-   ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-   g2d_userptr->vec);
+   ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+ g2d_userptr->pages);
if (ret != npages) {
DRM_DEV_ERROR(g2d->dev,
  "failed to get user pages from userptr.\n");
if (ret < 0)
-   goto err_destroy_framevec;
-   ret = -EFAULT;
-   goto err_put_framevec;
-   }
-   if (frame_vector_to_pages(g2d_userptr->vec) < 0) {
+   goto err_destroy_pages;
+   npages = ret;
ret = -EFAULT;
-   goto err_put_framevec;
+   goto err_unpin_pages;
}
+   g2d_userptr->npages = npages;
 
sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
if (!sgt) {
ret = -ENOMEM;
-   goto err_put_framevec;
+   goto err_unpin_pages;
}
 
ret = sg_alloc_table_from_pages(sgt,
-   frame_vector_pages(g2d_userptr->vec),
+   g2d_userptr->pages,
npages, offset, size, GFP_KERNEL);
if (ret < 0) {
DRM_DEV_ERROR(g2d->dev, "failed to get sgt from pages.\n");

[PATCH 10/13] PCI: revoke mappings like devmem

2020-10-07 Thread Daniel Vetter
Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
the region") /dev/mem zaps ptes when the kernel requests exclusive
access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
the default for all driver uses.

Except there are two more ways to access pci bars: sysfs and proc mmap
support. Let's plug that hole.

For revoke_devmem() to work we need to link our vma into the same
address_space, with consistent vma->vm_pgoff. ->pgoff is already
adjusted, because that's how (io_)remap_pfn_range works, but for the
mapping we need to adjust vma->vm_file->f_mapping. Usually that's done
at ->open time, but that's a bit tricky here with all the entry points
and arch code. So instead create a fake file and adjust vma->vm_file.
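Why the VMA has to share the /dev/mem address_space can be shown with a small userspace model. All names are hypothetical: model_revoke() stands in for unmap_mapping_range() on devmem_inode->i_mapping, and pgoff is the physical page offset, as io_remap_pfn_range leaves it. A mapping hanging off a different address_space is simply never found by the revoke.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct address_space. */
struct model_mapping {
	int id;
};

struct model_vma {
	struct model_mapping *mapping;	/* stands in for vm_file->f_mapping */
	unsigned long pgoff;		/* first physical page mapped */
	unsigned long npages;
	bool present;			/* ptes populated */
};

/* Like unmap_mapping_range(): zap every vma in @mapping that overlaps
 * [first, first + count) in page-offset space. */
static void model_revoke(struct model_mapping *mapping,
			 struct model_vma *vmas, size_t n,
			 unsigned long first, unsigned long count)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct model_vma *v = &vmas[i];

		if (v->mapping != mapping)
			continue;		/* invisible to the revoke */
		if (v->pgoff < first + count && first < v->pgoff + v->npages)
			v->present = false;
	}
}
```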

Note this only works for ARCH_GENERIC_PCI_MMAP_RESOURCE. But that
seems to be a subset of architectures supporting STRICT_DEVMEM, so we
should be good.

The only difference in access checks left is that sysfs pci mmap does
not check for CAP_SYS_RAWIO. But I think that makes some sense compared to
/dev/mem and proc, where one file gives you access to everything and
no ownership applies.

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Dan Williams 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Bjorn Helgaas 
Cc: linux-...@vger.kernel.org
---
 drivers/char/mem.c | 16 +++-
 drivers/pci/mmap.c |  3 +++
 include/linux/ioport.h |  2 ++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index abd4ffdc8cde..5e58a326d4ee 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -810,6 +810,7 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig)
 }
 
 static struct inode *devmem_inode;
+static struct vfsmount *devmem_vfs_mount;
 
 #ifdef CONFIG_IO_STRICT_DEVMEM
 void revoke_devmem(struct resource *res)
@@ -843,6 +844,20 @@ void revoke_devmem(struct resource *res)
 
unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
 }
+
+struct file *devmem_getfile(void)
+{
+   struct file *file;
+
+   file = alloc_file_pseudo(devmem_inode, devmem_vfs_mount, "devmem",
+O_RDWR, _fops);
+   if (IS_ERR(file))
+   return NULL;
+
+   file->f_mapping = devmem_inode->i_mapping;
+
+   return file;
+}
 #endif
 
 static int open_port(struct inode *inode, struct file *filp)
@@ -1010,7 +1025,6 @@ static struct file_system_type devmem_fs_type = {
 
 static int devmem_init_inode(void)
 {
-   static struct vfsmount *devmem_vfs_mount;
static int devmem_fs_cnt;
struct inode *inode;
int rc;
diff --git a/drivers/pci/mmap.c b/drivers/pci/mmap.c
index b8c9011987f4..63786cc9c746 100644
--- a/drivers/pci/mmap.c
+++ b/drivers/pci/mmap.c
@@ -7,6 +7,7 @@
  * Author: David Woodhouse 
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -64,6 +65,8 @@ int pci_mmap_resource_range(struct pci_dev *pdev, int bar,
vma->vm_pgoff += (pci_resource_start(pdev, bar) >> PAGE_SHIFT);
 
vma->vm_ops = &pci_phys_vm_ops;
+   fput(vma->vm_file);
+   vma->vm_file = devmem_getfile();
 
return io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
  vma->vm_end - vma->vm_start,
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6c2b06fe8beb..83238cba19fe 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -304,8 +304,10 @@ struct resource *request_free_mem_region(struct resource *base,
 
 #ifdef CONFIG_IO_STRICT_DEVMEM
 void revoke_devmem(struct resource *res);
+struct file *devmem_getfile(void);
 #else
 static inline void revoke_devmem(struct resource *res) { };
+static inline struct file *devmem_getfile(void) { return NULL; };
 #endif
 
 #endif /* __ASSEMBLY__ */
-- 
2.28.0



[PATCH 09/13] PCI: obey iomem restrictions for procfs mmap

2020-10-07 Thread Daniel Vetter
There are three ways to access pci bars from userspace: /dev/mem, sysfs
files, and the old proc interface. Two of them check against
iomem_is_exclusive; proc never did. And with CONFIG_IO_STRICT_DEVMEM,
this starts to matter, since we don't want random userspace having
access to pci bars while a driver is loaded and using it.

Fix this.
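A userspace sketch of the added check, with hypothetical stand-ins for struct resource and iomem_is_exclusive(); the flag value is illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IORESOURCE_MEM 0x200UL	/* illustrative flag value */

/* Hypothetical stand-in for struct resource: inclusive [start, end]. */
struct model_resource {
	unsigned long start, end, flags;
	bool exclusive;		/* claimed exclusively by a driver */
};

/* Stand-in for iomem_is_exclusive(addr). */
static bool model_iomem_is_exclusive(const struct model_resource *claimed,
				     size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (claimed[i].exclusive && addr >= claimed[i].start &&
		    addr <= claimed[i].end)
			return true;
	return false;
}

/* Mirrors the check added to proc_bus_pci_mmap(). */
static int model_proc_mmap_check(const struct model_resource *bar,
				 const struct model_resource *claimed, size_t n)
{
	if ((bar->flags & MODEL_IORESOURCE_MEM) &&
	    model_iomem_is_exclusive(claimed, n, bar->start))
		return -EINVAL;
	return 0;
}
```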

References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Dan Williams 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Bjorn Helgaas 
Cc: linux-...@vger.kernel.org
---
 drivers/pci/proc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index d35186b01d98..3a2f90beb4cb 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
else
return -EINVAL;
}
+
+   if (dev->resource[i].flags & IORESOURCE_MEM &&
+   iomem_is_exclusive(dev->resource[i].start))
+   return -EINVAL;
+
ret = pci_mmap_page_range(dev, i, vma,
  fpriv->mmap_state, write_combine);
if (ret < 0)
-- 
2.28.0



[PATCH 13/13] vfio/type1: Mark follow_pfn as unsafe

2020-10-07 Thread Daniel Vetter
The code seems to stuff these pfns into iommu page tables (or something like
that, I didn't follow), but there's no mmu_notifier to ensure that
access is synchronized with pte updates.

Hence mark these as unsafe. This means that with
CONFIG_STRICT_FOLLOW_PFN, these will be rejected.

Real fix is to wire up an mmu_notifier ... somehow. Probably means any
invalidate is a fatal fault for this vfio device, but then this
shouldn't ever happen if userspace is reasonable.
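A hypothetical sketch of the "any invalidate is a fatal fault" idea, assuming the device tracks which user address ranges it has DMA-mapped. None of these names exist in VFIO; the callback simply marks the device as failed when an invalidation overlaps a mapped range, instead of trying to remap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of a DMA-mapped user range, [start, end). */
struct model_dma_range {
	unsigned long start, end;
};

struct model_device {
	struct model_dma_range ranges[8];
	size_t nranges;
	bool failed;
};

/* What an mmu_notifier-style invalidate callback could do under this
 * policy: any overlap with a mapped range is treated as fatal. */
static void model_invalidate_range(struct model_device *dev,
				   unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < dev->nranges; i++) {
		if (start < dev->ranges[i].end && dev->ranges[i].start < end) {
			dev->failed = true;	/* stop DMA, report an error */
			return;
		}
	}
}
```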

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Dan Williams 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Alex Williamson 
Cc: Cornelia Huck 
Cc: k...@vger.kernel.org
---
 drivers/vfio/vfio_iommu_type1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5fbf0c1f7433..a4d53f3d0a35 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -421,7 +421,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 {
int ret;
 
-   ret = follow_pfn(vma, vaddr, pfn);
+   ret = unsafe_follow_pfn(vma, vaddr, pfn);
if (ret) {
bool unlocked = false;
 
@@ -435,7 +435,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
if (ret)
return ret;
 
-   ret = follow_pfn(vma, vaddr, pfn);
+   ret = unsafe_follow_pfn(vma, vaddr, pfn);
}
 
return ret;
-- 
2.28.0



[PATCH 11/13] mm: add unsafe_follow_pfn

2020-10-07 Thread Daniel Vetter
Way back it was a reasonable assumption that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carveouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with __GFP_MOVABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea.

Unfortunately there are some users where this is not fixable (like v4l
userptr of iomem mappings) or involves a pile of work (vfio type1
iommu). For now annotate these as unsafe and splat appropriately.

This patch adds an unsafe_follow_pfn, which later patches will then
roll out to all appropriate places.
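The policy unsafe_follow_pfn() implements can be modelled in userspace as below. The kernel version selects its behaviour at build time with CONFIG_STRICT_FOLLOW_PFN; this hypothetical sketch uses a runtime flag (model_strict) so both paths can be exercised, and model_follow_pfn() is a made-up linear lookup assuming 4 KiB pages.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool model_strict;	/* stands in for CONFIG_STRICT_FOLLOW_PFN */
static bool model_tainted;	/* stands in for add_taint(TAINT_USER, ...) */
static bool model_warned;	/* WARN_ONCE() state */

/* Hypothetical lookup: pretend the mapping is linear. */
static int model_follow_pfn(unsigned long addr, unsigned long *pfn)
{
	*pfn = addr >> 12;
	return 0;
}

static int model_unsafe_follow_pfn(unsigned long addr, unsigned long *pfn)
{
	if (model_strict) {
		fprintf(stderr, "unsafe follow_pfn usage rejected\n");
		return -EINVAL;		/* strict: refuse outright */
	}
	if (!model_warned) {
		model_warned = true;
		fprintf(stderr, "unsafe follow_pfn usage\n");
	}
	model_tainted = true;		/* permissive: warn, taint, proceed */
	return model_follow_pfn(addr, pfn);
}
```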

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Dan Williams 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: k...@vger.kernel.org
---
 include/linux/mm.h |  2 ++
 mm/memory.c| 32 +++-
 mm/nommu.c | 17 +
 security/Kconfig   | 13 +
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2a16631c1fda..ec8c90928fc9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
unsigned long *pfn);
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+ unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
unsigned int flags, unsigned long *prot, resource_size_t *phys);
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 8d467e23b44e..8db7ad1c261c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
  * @address: user virtual address
  * @pfn: location to store found PFN
  *
- * Only IO mappings and raw PFN mappings are allowed.
+ * Only IO mappings and raw PFN mappings are allowed. Note that callers must
+ * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
+ * If this is not feasible, or the access to the @pfn is only very short term,
+ * use follow_pte_pmd() and hold the pagetable lock for the duration of the
+ * access instead. Any caller not following these requirements must use
+ * unsafe_follow_pfn() instead.
  *
  * Return: zero and the pfn at @pfn on success, -ve otherwise.
  */
@@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+   unsigned long *pfn)
+{
+#ifdef CONFIG_STRICT_FOLLOW_PFN
+   pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
+   return -EINVAL;
+#else
+   WARN_ONCE(1, "unsafe follow_pfn usage\n");
+   add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
+   return follow_pfn(vma, address, pfn);
+#endif
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 #ifdef CONFIG_HAVE_IOREMAP_PROT
 int follow_phys(struct vm_area_struct *vma,
unsigned long address, unsigned int flags,
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af1..3db2910f0d64 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Return: zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+   unsigned long *pfn)
+{
+   return follow_pfn(vma, address, pfn);
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 LIST_HEAD(vmap_area_list);
 
 void vfree(const void *addr)
diff --git a/security/Kc

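The build-time gate inside unsafe_follow_pfn() above can be illustrated with a small userspace model. This is a sketch only: STRICT_FOLLOW_PFN stands in for CONFIG_STRICT_FOLLOW_PFN, and fake_follow_pfn(), the tainted flag and the warned flag are hypothetical substitutes for follow_pfn(), add_taint() and WARN_ONCE(); none of them are kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of the unsafe_follow_pfn() gate.
 * STRICT_FOLLOW_PFN plays the role of CONFIG_STRICT_FOLLOW_PFN.
 */
static bool tainted;	/* stands in for add_taint(TAINT_USER, ...) */
static bool warned;	/* stands in for WARN_ONCE()'s internal flag */

/* Stand-in for follow_pfn(): pretend 4 KiB pages are identity mapped. */
static int fake_follow_pfn(unsigned long address, unsigned long *pfn)
{
	*pfn = address >> 12;
	return 0;
}

static int model_unsafe_follow_pfn(unsigned long address, unsigned long *pfn)
{
#ifdef STRICT_FOLLOW_PFN
	/* Strict builds refuse every caller outright. */
	(void)address;
	(void)pfn;
	fprintf(stderr, "unsafe follow_pfn usage rejected\n");
	return -EINVAL;
#else
	/* Otherwise complain once, taint, and fall through to the lookup. */
	if (!warned) {
		warned = true;
		fprintf(stderr, "unsafe follow_pfn usage\n");
	}
	tainted = true;
	return fake_follow_pfn(address, pfn);
#endif
}
```

Built without -DSTRICT_FOLLOW_PFN, the call still succeeds but leaves the taint flag set, mirroring the "keep working, but make it visible" intent of the non-strict path.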
[PATCH 12/13] media/videobuf1|2: Mark follow_pfn usage as unsafe

2020-10-07 Thread Daniel Vetter
The media model assumes that buffers are all preallocated, so that
when a media pipeline is running we never miss a deadline because the
buffers aren't allocated or available.

This means we cannot fix the v4l follow_pfn usage through
mmu_notifier, without breaking how this all works. The only real fix
is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
tell everyone to cut over to dma-buf memory sharing for zerocopy.

userptr for normal memory will keep working as-is.

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Dan Williams 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Pawel Osciak 
Cc: Marek Szyprowski 
Cc: Kyungmin Park 
Cc: Tomasz Figa 
Cc: Laurent Dufour 
Cc: Vlastimil Babka 
Cc: Daniel Jordan 
Cc: Michel Lespinasse 
---
 drivers/media/common/videobuf2/frame_vector.c | 2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index b95f4f371681..d56eb6258f09 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -71,7 +71,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
unsigned long *nums = frame_vector_pfns(vec);
 
while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
-   err = follow_pfn(vma, start, &nums[ret]);
+   err = unsafe_follow_pfn(vma, start, &nums[ret]);
if (err) {
if (ret == 0)
ret = err;
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 52312ce2ba05..821c4a76ab96 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
user_address = untagged_baddr;
 
while (pages_done < (mem->size >> PAGE_SHIFT)) {
-   ret = follow_pfn(vma, user_address, _pfn);
+   ret = unsafe_follow_pfn(vma, user_address, _pfn);
if (ret)
break;
 
-- 
2.28.0

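As far as the visible hunk shows, the get_vaddr_frames() pfn loop returns how many frames were resolved, or the first error if none were, with -EFAULT for a walk that never starts. A userspace model of that convention is sketched below; model_lookup() and its limit parameter are hypothetical stand-ins for unsafe_follow_pfn() failing part-way through a range, and MODEL_PAGE_SIZE is an assumed 4 KiB page.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for the model */

/* Hypothetical stand-in for unsafe_follow_pfn(): fails at or past 'limit'. */
static int model_lookup(unsigned long addr, unsigned long limit,
			unsigned long *pfn)
{
	if (addr >= limit)
		return -EFAULT;
	*pfn = addr / MODEL_PAGE_SIZE;
	return 0;
}

/*
 * Walk [start, vm_end) one page at a time, storing pfns in nums[].
 * Returns the number of pfns stored, the first error if none were stored,
 * or -EFAULT if the walk never started.
 */
static int collect_pfns(unsigned long start, unsigned int nr_frames,
			unsigned long vm_end, unsigned long limit,
			unsigned long *nums)
{
	int ret = 0;

	while ((unsigned int)ret < nr_frames &&
	       start + MODEL_PAGE_SIZE <= vm_end) {
		int err = model_lookup(start, limit, &nums[ret]);

		if (err) {
			if (ret == 0)
				ret = err;	/* nothing resolved: report it */
			break;			/* else keep the partial count */
		}
		start += MODEL_PAGE_SIZE;
		ret++;
	}
	if (!ret)
		ret = -EFAULT;
	return ret;
}
```

Callers therefore have to treat a positive return smaller than nr_frames as a short result rather than as success.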


[PATCH 02/13] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists

2020-10-07 Thread Daniel Vetter
The exynos g2d interface is very unusual, but it looks like the
userptr objects are persistent. Hence they need FOLL_LONGTERM.

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Inki Dae 
Cc: Joonyoung Shim 
Cc: Seung-Woo Kim 
Cc: Kyungmin Park 
Cc: Kukjin Kim 
Cc: Krzysztof Kozlowski 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
---
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index c83f6faac9de..514fd000feb1 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -478,7 +478,8 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
goto err_free;
}
 
-   ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+   ret = pin_user_pages_fast(start, npages,
+ FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
  g2d_userptr->pages);
if (ret != npages) {
DRM_DEV_ERROR(g2d->dev,
-- 
2.28.0



[PATCH 00/13] follow_pfn and other iomap races

2020-10-07 Thread Daniel Vetter
Hi all,

This developed from a discussion with Jason, starting with some patches
touching get_vaddr_frames that I typed up.

The problem is that way back VM_IO | VM_PFNMAP mappings were pretty
static, and so just following the ptes to derive a pfn and then use that
somewhere else was ok.

But we're no longer in such a world: there are tons of little races and some
fundamental problems.

This series is an attempt to at least scope the problem; it covers all the
issues I've found with quite some code reading all over the tree:
- first part tries to move mm/frame-vector.c away, it's fundamentally an
  unsafe thing
- two patches to close follow_pfn races by holding pt locks
- two pci patches where I spotted inconsistencies between the 3 different
  ways userspace can map pci bars
- and finally some patches to mark up the remaining issue

No testing beyond "it compiles", this is very much an rfc to figure out
whether this makes sense, whether it's a real thing, and how to fix this
up properly.

Cheers, Daniel

Daniel Vetter (13):
  drm/exynos: Stop using frame_vector helpers
  drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
  misc/habana: Stop using frame_vector helpers
  misc/habana: Use FOLL_LONGTERM for userptr
  mm/frame-vector: Use FOLL_LONGTERM
  media: videobuf2: Move frame_vector into media subsystem
  mm: close race in generic_access_phys
  s390/pci: Remove races against pte updates
  PCI: obey iomem restrictions for procfs mmap
  PCI: revoke mappings like devmem
  mm: add unsafe_follow_pfn
  media/videobuf1|2: Mark follow_pfn usage as unsafe
  vfio/type1: Mark follow_pfn as unsafe

 arch/s390/pci/pci_mmio.c  | 98 +++
 drivers/char/mem.c| 16 ++-
 drivers/gpu/drm/exynos/Kconfig|  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c   | 49 +-
 drivers/media/common/videobuf2/Kconfig|  1 -
 drivers/media/common/videobuf2/Makefile   |  1 +
 .../media/common/videobuf2}/frame_vector.c| 40 +++-
 drivers/media/platform/omap/Kconfig   |  1 -
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 drivers/misc/habanalabs/Kconfig   |  1 -
 drivers/misc/habanalabs/common/habanalabs.h   |  3 +-
 drivers/misc/habanalabs/common/memory.c   | 52 +-
 drivers/pci/mmap.c|  3 +
 drivers/pci/proc.c|  5 +
 drivers/vfio/vfio_iommu_type1.c   |  4 +-
 include/linux/ioport.h|  2 +
 include/linux/mm.h| 47 +
 include/media/videobuf2-core.h| 42 
 mm/Kconfig|  3 -
 mm/Makefile   |  1 -
 mm/memory.c   | 76 +-
 mm/nommu.c| 17 
 security/Kconfig  | 13 +++
 23 files changed, 296 insertions(+), 182 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (90%)

-- 
2.28.0



[PATCH 04/13] misc/habana: Use FOLL_LONGTERM for userptr

2020-10-07 Thread Daniel Vetter
These are persistent, not just for the duration of a dma operation.

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Oded Gabbay 
Cc: Omer Shpigelman 
Cc: Ofir Bitton 
Cc: Tomer Tayar 
Cc: Moti Haimovski 
Cc: Daniel Vetter 
Cc: Greg Kroah-Hartman 
Cc: Pawel Piskorski 
---
 drivers/misc/habanalabs/common/memory.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index ef89cfa2f95a..94bef8faa82a 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1288,7 +1288,8 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
return -ENOMEM;
}
 
-   rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+   rc = pin_user_pages_fast(start, npages,
+FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
 userptr->pages);
 
if (rc != npages) {
-- 
2.28.0



[PATCH 08/13] s390/pci: Remove races against pte updates

2020-10-07 Thread Daniel Vetter
Way back it was a reasonable assumption that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
  ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carveouts to
  cma regions. This means if we miss the unmap the pfn might contain
  pagecache or anon memory (well anything allocated with __GFP_MOVABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
  iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
  ("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since zpci_memcpy_from|toio seems to not do anything nefarious with
locks, we just need to open code get_pfn and follow_pfn and make sure
we drop the locks only after we're done. The write function also needs
the copy_from_user move, since we can't take userspace faults while
holding the mmap sem.

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Dan Williams 
Cc: Kees Cook 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Niklas Schnelle 
Cc: Gerald Schaefer 
Cc: linux-s...@vger.kernel.org
---
 arch/s390/pci/pci_mmio.c | 98 +++-
 1 file changed, 57 insertions(+), 41 deletions(-)

diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index 401cf670a243..4d194cb09372 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
return rc;
 }
 
-static long get_pfn(unsigned long user_addr, unsigned long access,
-   unsigned long *pfn)
-{
-   struct vm_area_struct *vma;
-   long ret;
-
-   mmap_read_lock(current->mm);
-   ret = -EINVAL;
-   vma = find_vma(current->mm, user_addr);
-   if (!vma)
-   goto out;
-   ret = -EACCES;
-   if (!(vma->vm_flags & access))
-   goto out;
-   ret = follow_pfn(vma, user_addr, pfn);
-out:
-   mmap_read_unlock(current->mm);
-   return ret;
-}
-
 SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
const void __user *, user_buffer, size_t, length)
 {
u8 local_buf[64];
void __iomem *io_addr;
void *buf;
-   unsigned long pfn;
+   struct vm_area_struct *vma;
+   pte_t *ptep;
+   spinlock_t *ptl;
long ret;
 
if (!zpci_is_enabled())
@@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 * We only support write access to MIO capable devices if we are on
 * a MIO enabled system. Otherwise we would have to check for every
 * address if it is a special ZPCI_ADDR and would have to do
-* a get_pfn() which we don't need for MIO capable devices.  Currently
+* a pfn lookup which we don't need for MIO capable devices.  Currently
 * ISM devices are the only devices without MIO support and there is no
 * known need for accessing these from userspace.
 */
@@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
} else
buf = local_buf;
 
-   ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
+   ret = -EFAULT;
+   if (copy_from_user(buf, user_buffer, length))
+   goto out_free;
+
+   mmap_read_lock(current->mm);
+   ret = -EINVAL;
+   vma = find_vma(current->mm, mmio_addr);
+   if (!vma)
+   goto out_unlock_mmap;
+   ret = -EACCES;
+   if (!(vma->vm_flags & VM_WRITE))
+   goto out_unlock_mmap;
+   if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+   goto out_unlock_mmap;
+
+   ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
if (ret)
-   goto out;
-   io_addr = (void __iomem *)((pfn << PAGE_SHIFT) |
+   goto out_unlock_mmap;
+
+   io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
(mmio_addr & ~PAGE_MASK));
 
-   ret = -EFAULT;
if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE)
-   goto out;
-
-   if (copy_from_user(buf, user_buffer, length))
-   goto out;
+   goto out_unlock_pt;
 
ret = zpci_memcpy_toio(io_addr, buf, length);
-out:
+out_unlock_pt:
+   pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+   mmap_read_unlock(current->mm);
+out_free:
if (buf != local_buf)
kfree(buf);
return ret;
@@ -274,7 +272,9 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
u8 local_buf[

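The reordering in s390_pci_mmio_write() boils down to: copy from userspace before taking any lock (the copy may fault), take the mmap lock for the vma lookup, hold the page-table lock for as long as the pfn-derived address is used, and unwind in reverse order. A userspace model of that ladder is sketched below; the lock counters, fake_copy_from_user() and the vma_ok/pte_ok switches are hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lock depth counters standing in for mmap_read_lock() and
 * the page-table spinlock taken by follow_pte_pmd(). */
static int mmap_locked, pt_locked;

/* Stand-in for copy_from_user(): a NULL source models a faulting pointer. */
static int fake_copy_from_user(void *dst, const void *src, size_t len)
{
	if (!src)
		return -EFAULT;
	memcpy(dst, src, len);
	return 0;
}

static long model_mmio_write(char *io, const void *user_buf, size_t len,
			     int vma_ok, int pte_ok)
{
	char buf[64];
	long ret;

	if (len > sizeof(buf))
		return -EINVAL;

	/* May fault, so it must happen before any lock is held. */
	ret = fake_copy_from_user(buf, user_buf, len);
	if (ret)
		return ret;

	mmap_locked++;			/* mmap_read_lock() */
	ret = -EINVAL;
	if (!vma_ok)			/* find_vma() / vm_flags checks failed */
		goto out_unlock_mmap;
	if (!pte_ok)			/* follow_pte_pmd() failed, no ptl held */
		goto out_unlock_mmap;
	pt_locked++;			/* follow_pte_pmd() returned with ptl held */

	/* The pfn-derived address is only used while both locks are held. */
	assert(mmap_locked && pt_locked);
	memcpy(io, buf, len);		/* zpci_memcpy_toio() */
	ret = 0;

	pt_locked--;			/* pte_unmap_unlock() */
out_unlock_mmap:
	mmap_locked--;			/* mmap_read_unlock() */
	return ret;
}
```

Every exit path leaves both counters at zero, which is the property the goto labels in the patch are meant to guarantee.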
[PATCH 05/13] mm/frame-vector: Use FOLL_LONGTERM

2020-10-07 Thread Daniel Vetter
This is used by media/videobuf2 for persistent dma mappings, not just
for a single dma operation and then freed again, so needs
FOLL_LONGTERM.

Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
locking issues. Rework the code to pull the pup path out from the
mmap_sem critical section as suggested by Jason.

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Pawel Osciak 
Cc: Marek Szyprowski 
Cc: Kyungmin Park 
Cc: Tomasz Figa 
Cc: Mauro Carvalho Chehab 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
---
 mm/frame_vector.c | 36 +++-
 1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 10f82d5643b6..39db520a51dc 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
struct vm_area_struct *vma;
int ret = 0;
int err;
-   int locked;
 
if (nr_frames == 0)
return 0;
@@ -48,35 +47,22 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 
start = untagged_addr(start);
 
+   ret = pin_user_pages_fast(start, nr_frames,
+ FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
+ (struct page **)(vec->ptrs));
+   if (ret > 0) {
+   vec->got_ref = true;
+   vec->is_pfns = false;
+   goto out_unlocked;
+   }
+
mmap_read_lock(mm);
-   locked = 1;
vma = find_vma_intersection(mm, start, start + 1);
if (!vma) {
ret = -EFAULT;
goto out;
}
 
-   /*
-* While get_vaddr_frames() could be used for transient (kernel
-* controlled lifetime) pinning of memory pages all current
-* users establish long term (userspace controlled lifetime)
-* page pinning. Treat get_vaddr_frames() like
-* get_user_pages_longterm() and disallow it for filesystem-dax
-* mappings.
-*/
-   if (vma_is_fsdax(vma)) {
-   ret = -EOPNOTSUPP;
-   goto out;
-   }
-
-   if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
-   vec->got_ref = true;
-   vec->is_pfns = false;
-   ret = pin_user_pages_locked(start, nr_frames,
-   gup_flags, (struct page **)(vec->ptrs), &locked);
-   goto out;
-   }
-
vec->got_ref = false;
vec->is_pfns = true;
do {
@@ -101,8 +87,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
vma = find_vma_intersection(mm, start, start + 1);
} while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
 out:
-   if (locked)
-   mmap_read_unlock(mm);
+   mmap_read_unlock(mm);
+out_unlocked:
if (!ret)
ret = -EFAULT;
if (ret > 0)
-- 
2.28.0



[PATCH 07/13] mm: close race in generic_access_phys

2020-10-07 Thread Daniel Vetter
Way back it was a reasonable assumption that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
  ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carveouts to
  cma regions. This means if we miss the unmap the pfn might contain
  pagecache or anon memory (well anything allocated with __GFP_MOVABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
  iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
  ("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since ioremap might need to manipulate pagetables too we need to drop
the pt lock and have a retry loop if we raced.

While at it, also add kerneldoc and improve the comment for the
vma_ops->access function. It's for accessing, not for moving the
memory from iomem to system memory, as the old comment seemed to
suggest.

References: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
Cc: Jason Gunthorpe 
Cc: Dan Williams 
Cc: Kees Cook 
Cc: Rik van Riel 
Cc: Benjamin Herrensmidt 
Cc: Dave Airlie 
Cc: Hugh Dickins 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Signed-off-by: Daniel Vetter 
---
 include/linux/mm.h |  3 ++-
 mm/memory.c| 44 ++--
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index acd60fbf1a5a..2a16631c1fda 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -566,7 +566,8 @@ struct vm_operations_struct {
vm_fault_t (*pfn_mkwrite)(struct vm_fault *vmf);
 
/* called by access_process_vm when get_user_pages() fails, typically
-* for use by special VMAs that can switch between memory and hardware
+* for use by special VMAs. See also generic_access_phys() for a generic
+* implementation useful for any iomem mapping.
 */
int (*access)(struct vm_area_struct *vma, unsigned long addr,
  void *buf, int len, int write);
diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..8d467e23b44e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4873,28 +4873,68 @@ int follow_phys(struct vm_area_struct *vma,
return ret;
 }
 
+/**
+ * generic_access_phys - generic implementation for iomem mmap access
+ * @vma: the vma to access
+ * @addr: userspace address, not a relative offset within @vma
+ * @buf: buffer to read/write
+ * @len: length of transfer
+ * @write: set to FOLL_WRITE when writing, otherwise reading
+ *
+ * This is a generic implementation for &vm_operations_struct.access for an
+ * iomem mapping. This callback is used by access_process_vm() when the @vma is
+ * not page based.
+ */
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
void *buf, int len, int write)
 {
resource_size_t phys_addr;
unsigned long prot = 0;
void __iomem *maddr;
+   pte_t *ptep, pte;
+   spinlock_t *ptl;
int offset = addr & (PAGE_SIZE-1);
+   int ret = -EINVAL;
+
+   if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+   return -EINVAL;
+
+retry:
+   if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+   return -EINVAL;
+   pte = *ptep;
+   pte_unmap_unlock(ptep, ptl);
 
-   if (follow_phys(vma, addr, write, &prot, &phys_addr))
+   prot = pgprot_val(pte_pgprot(pte));
+   phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;
+
+   if ((write & FOLL_WRITE) && !pte_write(pte))
return -EINVAL;
 
maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
if (!maddr)
return -ENOMEM;
 
+   if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+   goto out_unmap;
+
+   if (!pte_same(pte, *ptep)) {
+   pte_unmap_unlock(ptep, ptl);
+   iounmap(maddr);
+
+   goto retry;
+   }
+
if (write)
memcpy_toio(maddr + offset, buf, len);
else
memcpy_fromio(buf, maddr + offset, len);
+   ret = len;
+   pte_unmap_unlock(ptep, ptl);
+out_unmap:
iounmap(maddr);
 
-   return len;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(generic_access_phys);
 #endif
-- 
2.28.0

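The snapshot/revalidate loop in generic_access_phys() can be reduced to a userspace model: read the pte under the lock, drop the lock for work that may sleep (ioremap_prot() in the patch), then retake the lock and start over if the pte no longer matches. Everything here is hypothetical: struct model_pt, pt_lock()/pt_unlock() and the race hook stand in for the page-table lock, follow_pte()/pte_unmap_unlock() and a concurrent unmap_mapping_range(). It is single-threaded and only illustrates the control flow.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one pte protected by its page-table lock. */
struct model_pt {
	bool locked;
	unsigned long pte;
};

static void pt_lock(struct model_pt *pt)
{
	assert(!pt->locked);
	pt->locked = true;
}

static void pt_unlock(struct model_pt *pt)
{
	assert(pt->locked);
	pt->locked = false;
}

/* Runs while the lock is dropped; may change the pte like a racing unmap. */
typedef void (*race_hook)(struct model_pt *pt, int attempt);

/* Returns how many retries were needed; *used receives the pte accessed. */
static int model_access(struct model_pt *pt, race_hook hook,
			unsigned long *used)
{
	unsigned long snap;
	int attempt = 0;

retry:
	pt_lock(pt);
	snap = pt->pte;			/* pte = *ptep */
	pt_unlock(pt);			/* ioremap_prot() may sleep */

	if (hook)
		hook(pt, attempt);

	pt_lock(pt);
	if (pt->pte != snap) {		/* pte changed: raced, start over */
		pt_unlock(pt);		/* the patch also iounmap()s here */
		attempt++;
		goto retry;
	}
	*used = snap;			/* memcpy_{to,from}io() under the lock */
	pt_unlock(pt);
	return attempt;
}

/* Example race: the pte changes once, during the first attempt only. */
static void change_once(struct model_pt *pt, int attempt)
{
	if (attempt == 0) {
		pt_lock(pt);
		pt->pte++;
		pt_unlock(pt);
	}
}
```

The retry condition is that the pte has changed; retrying when it is unchanged would spin forever on a mapping nobody touches.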


[PATCH 06/13] media: videobuf2: Move frame_vector into media subsystem

2020-10-07 Thread Daniel Vetter
It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
symbol from all over the tree (well just one place, somehow omap media
driver still had this in its Kconfig, despite not using it).

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Pawel Osciak 
Cc: Marek Szyprowski 
Cc: Kyungmin Park 
Cc: Tomasz Figa 
Cc: Mauro Carvalho Chehab 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Daniel Vetter 
---
 drivers/media/common/videobuf2/Kconfig|  1 -
 drivers/media/common/videobuf2/Makefile   |  1 +
 .../media/common/videobuf2}/frame_vector.c|  2 +
 drivers/media/platform/omap/Kconfig   |  1 -
 include/linux/mm.h| 42 ---
 include/media/videobuf2-core.h| 42 +++
 mm/Kconfig|  3 --
 mm/Makefile   |  1 -
 8 files changed, 45 insertions(+), 48 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)

diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
index edbc99ebba87..d2223a12c95f 100644
--- a/drivers/media/common/videobuf2/Kconfig
+++ b/drivers/media/common/videobuf2/Kconfig
@@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
 
 config VIDEOBUF2_MEMOPS
tristate
-   select FRAME_VECTOR
 
 config VIDEOBUF2_DMA_CONTIG
tristate
diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
index 77bebe8b202f..54306f8d096c 100644
--- a/drivers/media/common/videobuf2/Makefile
+++ b/drivers/media/common/videobuf2/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 videobuf2-common-objs := videobuf2-core.o
+videobuf2-common-objs += frame_vector.o
 
 ifeq ($(CONFIG_TRACEPOINTS),y)
   videobuf2-common-objs += vb2-trace.o
diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
similarity index 99%
rename from mm/frame_vector.c
rename to drivers/media/common/videobuf2/frame_vector.c
index 39db520a51dc..b95f4f371681 100644
--- a/mm/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -8,6 +8,8 @@
 #include 
 #include 
 
+#include 
+
 /**
  * get_vaddr_frames() - map virtual addresses to pfns
  * @start: starting user address
diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
index f73b5893220d..de16de46c0f4 100644
--- a/drivers/media/platform/omap/Kconfig
+++ b/drivers/media/platform/omap/Kconfig
@@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
depends on VIDEO_V4L2
select VIDEOBUF2_DMA_CONTIG
select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
-   select FRAME_VECTOR
help
  V4L2 Display driver support for OMAP2/3 based boards.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 16b799a0522c..acd60fbf1a5a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
 int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
struct task_struct *task, bool bypass_rlim);
 
-/* Container for pinned pfns / pages */
-struct frame_vector {
-   unsigned int nr_allocated;  /* Number of frames we have space for */
-   unsigned int nr_frames; /* Number of frames stored in ptrs array */
-   bool got_ref;   /* Did we pin pages by getting page ref? */
-   bool is_pfns;   /* Does array contain pages or pfns? */
-   void *ptrs[];   /* Array of pinned pfns / pages. Use
-* pfns_vector_pages() or pfns_vector_pfns()
-* for access */
-};
-
-struct frame_vector *frame_vector_create(unsigned int nr_frames);
-void frame_vector_destroy(struct frame_vector *vec);
-int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
-unsigned int gup_flags, struct frame_vector *vec);
-void put_vaddr_frames(struct frame_vector *vec);
-int frame_vector_to_pages(struct frame_vector *vec);
-void frame_vector_to_pfns(struct frame_vector *vec);
-
-static inline unsigned int frame_vector_count(struct frame_vector *vec)
-{
-   return vec->nr_frames;
-}
-
-static inline struct page **frame_vector_pages(struct frame_vector *vec)
-{
-   if (vec->is_pfns) {
-   int err = frame_vector_to_pages(vec);
-
-   if (err)
-   return ERR_PTR(err);
-   }
-   return (struct page **)(vec->ptrs);
-}
-
-static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
-{
-   if (!vec->is_pfns)
-   frame_vector_to_pfns(vec);
-   return (unsigned long *)(vec->ptrs);
-}
-
 struct kvec;
 int get_kernel_pages(const struct kvec 

[PATCH 03/13] misc/habana: Stop using frame_vector helpers

2020-10-07 Thread Daniel Vetter
All we need is a pages array, and pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.

Signed-off-by: Daniel Vetter 
Cc: Jason Gunthorpe 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Oded Gabbay 
Cc: Omer Shpigelman 
Cc: Ofir Bitton 
Cc: Tomer Tayar 
Cc: Moti Haimovski 
Cc: Daniel Vetter 
Cc: Greg Kroah-Hartman 
Cc: Pawel Piskorski 
---
 drivers/misc/habanalabs/Kconfig |  1 -
 drivers/misc/habanalabs/common/habanalabs.h |  3 +-
 drivers/misc/habanalabs/common/memory.c | 51 +
 3 files changed, 23 insertions(+), 32 deletions(-)

diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
index 8eb5d38c618e..2f04187f7167 100644
--- a/drivers/misc/habanalabs/Kconfig
+++ b/drivers/misc/habanalabs/Kconfig
@@ -6,7 +6,6 @@
 config HABANA_AI
tristate "HabanaAI accelerators (habanalabs)"
depends on PCI && HAS_IOMEM
-   select FRAME_VECTOR
select DMA_SHARED_BUFFER
select GENERIC_ALLOCATOR
select HWMON
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index edbd627b29d2..c1b3ad613b15 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -881,7 +881,8 @@ struct hl_ctx_mgr {
 struct hl_userptr {
enum vm_type_t  vm_type; /* must be first */
struct list_headjob_node;
-   struct frame_vector *vec;
+   struct page **pages;
+   unsigned intnpages;
struct sg_table *sgt;
enum dma_data_direction dir;
struct list_headdebugfs_list;
diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 5ff4688683fd..ef89cfa2f95a 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
return -EFAULT;
}
 
-   userptr->vec = frame_vector_create(npages);
-   if (!userptr->vec) {
+   userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
+   GFP_KERNEL);
+   if (!userptr->pages) {
dev_err(hdev->dev, "Failed to create frame vector\n");
return -ENOMEM;
}
 
-   rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-   userptr->vec);
+   rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+userptr->pages);
 
if (rc != npages) {
dev_err(hdev->dev,
"Failed to map host memory, user ptr probably wrong\n");
if (rc < 0)
-   goto destroy_framevec;
+   goto destroy_pages;
+   npages = rc;
rc = -EFAULT;
-   goto put_framevec;
-   }
-
-   if (frame_vector_to_pages(userptr->vec) < 0) {
-   dev_err(hdev->dev,
-   "Failed to translate frame vector to pages\n");
-   rc = -EFAULT;
-   goto put_framevec;
+   goto put_pages;
}
+   userptr->npages = npages;
 
rc = sg_alloc_table_from_pages(userptr->sgt,
-   frame_vector_pages(userptr->vec),
-   npages, offset, size, GFP_ATOMIC);
+  userptr->pages,
+  npages, offset, size, GFP_ATOMIC);
if (rc < 0) {
dev_err(hdev->dev, "failed to create SG table from pages\n");
-   goto put_framevec;
+   goto put_pages;
}
 
return 0;
 
-put_framevec:
-   put_vaddr_frames(userptr->vec);
-destroy_framevec:
-   frame_vector_destroy(userptr->vec);
+put_pages:
+   unpin_user_pages(userptr->pages, npages);
+destroy_pages:
+   kvfree(userptr->pages);
return rc;
 }
 
@@ -1405,7 +1401,7 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
  */
 void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 {
-   struct page **pages;
+   int i;
 
hl_debugfs_remove_userptr(hdev, userptr);
 
@@ -1414,15 +1410,10 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
userptr->sgt->nents,
userptr->dir);
 
-   pages = frame_vecto

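The new error path in get_user_memory() relies on pin_user_pages_fast() possibly pinning fewer pages than requested: only the rc pages that were actually pinned may be unpinned before the array is freed. A userspace model of that unwind follows; fake_pin(), fake_unpin() and the pinned counter are hypothetical stand-ins for pin_user_pages_fast() and unpin_user_pages(), and calloc()/free() stand in for kvmalloc_array()/kvfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int pinned;	/* hypothetical global pin count, for checking */

/* Stand-in for pin_user_pages_fast(): pins up to 'avail' pages,
 * or fails outright when avail < 0 (models a bad user pointer). */
static int fake_pin(int *pages, int npages, int avail)
{
	int i, n;

	if (avail < 0)
		return -EFAULT;
	n = npages < avail ? npages : avail;
	for (i = 0; i < n; i++)
		pages[i] = 1;
	pinned += n;
	return n;
}

/* Stand-in for unpin_user_pages(). */
static void fake_unpin(int *pages, int npages)
{
	int i;

	for (i = 0; i < npages; i++)
		pages[i] = 0;
	pinned -= npages;
}

static int model_get_user_memory(int npages, int avail, int **pages_out)
{
	int *pages = calloc((size_t)npages, sizeof(*pages));
	int rc;

	if (!pages)
		return -ENOMEM;

	rc = fake_pin(pages, npages, avail);
	if (rc != npages) {
		if (rc < 0)
			goto destroy_pages;	/* nothing was pinned */
		npages = rc;			/* unpin only what was pinned */
		rc = -EFAULT;
		goto put_pages;
	}
	*pages_out = pages;
	return 0;

put_pages:
	fake_unpin(pages, npages);
destroy_pages:
	free(pages);
	return rc;
}
```

On success the pages stay pinned until the caller releases them, which in the driver is the job of hl_unpin_host_memory().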
Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 4:12 PM Tomasz Figa  wrote:
>
> On Wed, Oct 7, 2020 at 4:09 PM Daniel Vetter  wrote:
> >
> > On Wed, Oct 7, 2020 at 3:34 PM Tomasz Figa  wrote:
> > >
> > > On Wed, Oct 7, 2020 at 3:06 PM Jason Gunthorpe  wrote:
> > > >
> > > > On Wed, Oct 07, 2020 at 02:58:33PM +0200, Daniel Vetter wrote:
> > > > > On Wed, Oct 7, 2020 at 2:48 PM Tomasz Figa  wrote:
> > > > > >
> > > > > > On Wed, Oct 7, 2020 at 2:44 PM Jason Gunthorpe  wrote:
> > > > > > >
> > > > > > > On Wed, Oct 07, 2020 at 02:33:56PM +0200, Marek Szyprowski wrote:
> > > > > > > > Well, it was in vb2_get_vma() function, but now I see that it has been
> > > > > > > > lost in fb639eb39154 and 6690c8c78c74 some time ago...
> > > > > > >
> > > > > > > There is no guarantee that holding a get on the file says anything
> > > > > > > about the VMA. This needed to check that the file was some special
> > > > > > > kind of file that promised the VMA layout and file lifetime are
> > > > > > > connected.
> > > > > > >
> > > > > > > Also, cloning a VMA outside the mm world is just really bad. That
> > > > > > > would screw up many assumptions the drivers make.
> > > > > > >
> > > > > > > If it is all obsolete I say we hide it behind a default n config
> > > > > > > symbol and taint the kernel if anything uses it.
> > > > > > >
> > > > > > > Add a big comment above the follow_pfn to warn others away from this
> > > > > > > code.
> > > > > >
> > > > > > Sadly it's just verbally declared as deprecated and not formally noted
> > > > > > anyway. There are a lot of userspace applications relying on user
> > > > > > pointer support.
> > > > >
> > > > > userptr can stay, it's the userptr abuse for zerocopy buffer sharing
> > > > > which doesn't work anymore. At least without major surgery (you'd need
> > > > > an mmu notifier to zap mappings and recreate them, and that pretty
> > > > > much breaks the v4l model of preallocating all buffers to make sure we
> > > > > never underflow the buffer queue). And static mappings are not coming
> > > > > back I think, we'll go ever more into the direction of dynamic
> > > > > mappings and moving stuff around as needed.
> > > >
> > > > Right, and to be clear, the last time I saw a security flaw of this
> > > > magnitude from a subsystem badly mis-designing itself, Linus's
> > > > knee-jerk reaction was to propose to remove the whole subsystem.
> > > >
> > > > Please don't take status-quo as acceptable, V4L community has to work
> > > > to resolve this, uABI breakage or not. The follow_pfn related code
> > > > must be compiled out of normal distro kernel builds.
> > >
> > > I think the userptr zero-copy hack should be able to go away indeed,
> > > given that we now have CMA that allows having carveouts backed by
> > > struct pages and having the memory represented as DMA-buf normally.
> >
> > Not sure whether there's a confusion here: dma-buf supports memory not
> > backed by struct page.
> >
>
> That's new to me. The whole API relies on sg_tables a lot, which in
> turn rely on struct page pointers to describe the physical memory.

You're not allowed to look at struct page pointers from the importer
side, those might not be there. Which isn't the prettiest thing, but
it works. And even if there's a struct page, you're still not allowed
to look at it, since it's fully managed by the exporter under whatever
rules that might need. So no touching it, ever.

This is also not news, supporting this was in the design brief from
the kickoff session 10+ years ago at some linaro connect thing (in
Budapest iirc). And we have implementations doing that for almost as
long merged in upstream.

> > > How about the regular userptr use case, though?
> > >
> > > The existing code resolves the user pointer into pages by following
> > > the get_vaddr_frames() -> frame_vector_to_pages() ->
> > > sg_alloc_table_from_pages() / vm_map_ram() approach.
> > > get_vaddr_frames() seems to use pin_user_p

Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 3:34 PM Tomasz Figa  wrote:
>
> On Wed, Oct 7, 2020 at 3:06 PM Jason Gunthorpe  wrote:
> >
> > On Wed, Oct 07, 2020 at 02:58:33PM +0200, Daniel Vetter wrote:
> > > On Wed, Oct 7, 2020 at 2:48 PM Tomasz Figa  wrote:
> > > >
> > > > On Wed, Oct 7, 2020 at 2:44 PM Jason Gunthorpe  wrote:
> > > > >
> > > > > On Wed, Oct 07, 2020 at 02:33:56PM +0200, Marek Szyprowski wrote:
> > > > > > Well, it was in vb2_get_vma() function, but now I see that it has been
> > > > > > lost in fb639eb39154 and 6690c8c78c74 some time ago...
> > > > >
> > > > > There is no guarantee that holding a get on the file says anything
> > > > > about the VMA. This needed to check that the file was some special
> > > > > kind of file that promised the VMA layout and file lifetime are
> > > > > connected.
> > > > >
> > > > > Also, cloning a VMA outside the mm world is just really bad. That
> > > > > would screw up many assumptions the drivers make.
> > > > >
> > > > > If it is all obsolete I say we hide it behind a default n config
> > > > > symbol and taint the kernel if anything uses it.
> > > > >
> > > > > Add a big comment above the follow_pfn to warn others away from this
> > > > > code.
> > > >
> > > > Sadly it's just verbally declared as deprecated and not formally noted
> > > > anyway. There are a lot of userspace applications relying on user
> > > > pointer support.
> > >
> > > userptr can stay, it's the userptr abuse for zero-copy buffer sharing
> > > which doesn't work anymore. At least without major surgery (you'd need
> > > an mmu notifier to zap mappings and recreate them, and that pretty
> > > much breaks the v4l model of preallocating all buffers to make sure we
> > > never underflow the buffer queue). And static mappings are not coming
> > > back I think, we'll go ever more into the direction of dynamic
> > > mappings and moving stuff around as needed.
> >
> > Right, and to be clear, the last time I saw a security flaw of this
> > magnitude from a subsystem badly mis-designing itself, Linus's
> > knee-jerk reaction was to propose to remove the whole subsystem.
> >
> > Please don't take status-quo as acceptable, V4L community has to work
> > to resolve this, uABI breakage or not. The follow_pfn related code
> > must be compiled out of normal distro kernel builds.
>
> I think the userptr zero-copy hack should be able to go away indeed,
> given that we now have CMA that allows having carveouts backed by
> struct pages and having the memory represented as DMA-buf normally.

Not sure whether there's a confusion here: dma-buf supports memory not
backed by struct page.

> How about the regular userptr use case, though?
>
> The existing code resolves the user pointer into pages by following
> the get_vaddr_frames() -> frame_vector_to_pages() ->
> sg_alloc_table_from_pages() / vm_map_ram() approach.
> get_vaddr_frames() seems to use pin_user_pages() behind the scenes if
> the vma is not an IO or a PFNMAP, falling back to follow_pfn()
> otherwise.

Yeah pin_user_pages is fine, it's just the VM_IO | VM_PFNMAP vmas that
don't work.
>
> Is your intention to drop get_vaddr_frames() or we could still keep
> using it and if vec->is_pfns is true:
> a) if CONFIG_VIDEO_LEGACY_PFN_USERPTR is set, taint the kernel
> b) otherwise just undo and fail?

I'm typing that patch series (plus a pile more) right now.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 2:48 PM Tomasz Figa  wrote:
>
> On Wed, Oct 7, 2020 at 2:44 PM Jason Gunthorpe  wrote:
> >
> > On Wed, Oct 07, 2020 at 02:33:56PM +0200, Marek Szyprowski wrote:
> > > Well, it was in vb2_get_vma() function, but now I see that it has been
> > > lost in fb639eb39154 and 6690c8c78c74 some time ago...
> >
> > There is no guarantee that holding a get on the file says anything
> > about the VMA. This needed to check that the file was some special
> > kind of file that promised the VMA layout and file lifetime are
> > connected.
> >
> > Also, cloning a VMA outside the mm world is just really bad. That
> > would screw up many assumptions the drivers make.
> >
> > If it is all obsolete I say we hide it behind a default n config
> > symbol and taint the kernel if anything uses it.
> >
> > Add a big comment above the follow_pfn to warn others away from this
> > code.
>
> Sadly it's just verbally declared as deprecated and not formally noted
> anyway. There are a lot of userspace applications relying on user
> pointer support.

userptr can stay, it's the userptr abuse for zero-copy buffer sharing
which doesn't work anymore. At least without major surgery (you'd need
an mmu notifier to zap mappings and recreate them, and that pretty
much breaks the v4l model of preallocating all buffers to make sure we
never underflow the buffer queue). And static mappings are not coming
back I think, we'll go ever more into the direction of dynamic
mappings and moving stuff around as needed.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 12:47 PM Marek Szyprowski
 wrote:
>
> Hi Daniel,
>
> On 03.10.2020 11:40, Daniel Vetter wrote:
> >> After the three places above should use pin_user_pages_fast(), then
> >> this whole broken API should be moved into videobuf2-memops.c and a
> >> big fat "THIS DOESN'T WORK" stuck on it.
> >>
> >> videobuf2 should probably use P2P DMA buf for this instead.
> > Yup this should be done with dma_buf instead, and v4l has that.
>
> Yes, V4L2 has dma_buf support NOW. In those days, using the so-called V4L2
> USERPTR method was the only way to achieve zero copy buffer sharing
> between devices, so this is just a historical baggage. I've been
> actively involved in implementing that. I've tried to make it secure as
> much as possible assuming the limitation of that approach. With a few
> assumptions it works fine. Buffers are refcounted both by the
> vm_ops->open or by incrementing the refcount of the vm->file. This
> basically works with any sane driver, which doesn't free the mmaped
> buffer until the file is released. This is true for V4L2 and FBdev devices.

I'm not seeing any of that vm->file refcounting going on, so not
seeing anything that prevents the mmap area from being removed. Can
you pls give me some pointers about which code you're thinking of?
-Daniel

> This API is considered as deprecated in V4L2 world, so I think
> supporting this hack can be removed one day as nowadays userspace should
> use dma buf.
>
>  > ...
>
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] drm/fourcc: Add AXBXGXRX106106106106 format

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 11:29 AM Matteo Franchin  wrote:
>
> Add ABGR format with 10-bit components packed in 64-bit per pixel.
> This format can be used to handle
> VK_FORMAT_R10X6G10X6B10X6A10X6_UNORM_4PACK16 on little-endian
> architectures.
>
> Signed-off-by: Matteo Franchin 

So is this essentially 16 bit, with the lowest 6 bits in each channel
ignored? What exactly is this used for where the full 16bit format
doesn't work?
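For reference, the layout the patch below documents ([63:0] A:x:B:x:G:x:R:x 10:6:10:6:10:6:10:6 little endian) can be modelled in a few lines of C. Each component occupies the top 10 bits of its own 16-bit lane, with 6 zero bits below it. Reading a lane as 16 bits therefore gives the 10-bit value multiplied by 64. Full scale comes out as 0xffc0 rather than 0xffff, so it is close to, but not bit-identical with, a 16-bit UNORM value. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers modelling the layout
 * [63:0] A:x:B:x:G:x:R:x 10:6:10:6:10:6:10:6 (x = 6 bits of zero padding).
 * Each 10-bit component sits in the top bits of its own 16-bit lane.
 */
static uint64_t pack_axbxgxrx(uint16_t r, uint16_t g, uint16_t b, uint16_t a)
{
	return ((uint64_t)(a & 0x3ff) << 54) |
	       ((uint64_t)(b & 0x3ff) << 38) |
	       ((uint64_t)(g & 0x3ff) << 22) |
	       ((uint64_t)(r & 0x3ff) << 6);
}

/* Lane i as a plain 16-bit value: 0 = R, 1 = G, 2 = B, 3 = A. */
static uint16_t lane16(uint64_t px, unsigned int i)
{
	assert(i < 4);
	return (uint16_t)(px >> (16 * i));
}

/* The 10-bit component in lane i, with the 6 padding bits dropped. */
static uint16_t component10(uint64_t px, unsigned int i)
{
	return (uint16_t)(lane16(px, i) >> 6);
}
```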
-Daniel

> ---
>  drivers/gpu/drm/drm_fourcc.c  | 1 +
>  include/uapi/drm/drm_fourcc.h | 7 +++
>  2 files changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_fourcc.c b/drivers/gpu/drm/drm_fourcc.c
> index 722c7ebe4e88..bba03fcb016d 100644
> --- a/drivers/gpu/drm/drm_fourcc.c
> +++ b/drivers/gpu/drm/drm_fourcc.c
> @@ -202,6 +202,7 @@ const struct drm_format_info *__drm_format_info(u32 format)
> { .format = DRM_FORMAT_XBGR16161616F,   .depth = 0,  .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1 },
> { .format = DRM_FORMAT_ARGB16161616F,   .depth = 0,  .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
> { .format = DRM_FORMAT_ABGR16161616F,   .depth = 0,  .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
> +   { .format = DRM_FORMAT_AXBXGXRX106106106106,.depth = 0,  .num_planes = 1, .cpp = { 8, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
> { .format = DRM_FORMAT_RGB888_A8,   .depth = 32, .num_planes = 2, .cpp = { 3, 1, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
> { .format = DRM_FORMAT_BGR888_A8,   .depth = 32, .num_planes = 2, .cpp = { 3, 1, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
> { .format = DRM_FORMAT_XRGB_A8, .depth = 32, .num_planes = 2, .cpp = { 4, 1, 0 }, .hsub = 1, .vsub = 1, .has_alpha = true },
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 82f327801267..76eedba52b77 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -155,6 +155,13 @@ extern "C" {
>  #define DRM_FORMAT_ARGB16161616F fourcc_code('A', 'R', '4', 'H') /* [63:0] A:R:G:B 16:16:16:16 little endian */
>  #define DRM_FORMAT_ABGR16161616F fourcc_code('A', 'B', '4', 'H') /* [63:0] A:B:G:R 16:16:16:16 little endian */
>
> +/*
> + * RGBA format with 10-bit components packed in 64-bit per pixel, with 6 bits
> + * of unused padding per component:
> + * [63:0] A:x:B:x:G:x:R:x 10:6:10:6:10:6:10:6 little endian
> + */
> +#define DRM_FORMAT_AXBXGXRX106106106106 fourcc_code('A', 'B', '1', '0')
> +
>  /* packed YCbCr */
>  #define DRM_FORMAT_YUYV fourcc_code('Y', 'U', 'Y', 'V') /* [31:0] Cr0:Y1:Cb0:Y0 8:8:8:8 little endian */
>  #define DRM_FORMAT_YVYU fourcc_code('Y', 'V', 'Y', 'U') /* [31:0] Cb0:Y1:Cr0:Y0 8:8:8:8 little endian */
> --
> 2.17.1
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH rdma-next v5 0/4] Dynamicaly allocate SG table from the pages

2020-10-07 Thread Daniel Vetter
On Wed, Oct 7, 2020 at 9:22 AM Jason Gunthorpe  wrote:
> On Tue, Oct 06, 2020 at 12:41:22PM +0200, Daniel Vetter wrote:
> > On Mon, Oct 05, 2020 at 08:56:50PM -0300, Jason Gunthorpe wrote:
> > > On Sun, Oct 04, 2020 at 06:43:36PM +0300, Leon Romanovsky wrote:
> > > > This series extends __sg_alloc_table_from_pages to allow chaining of
> > > > new pages to already initialized SG table.
> > > >
> > > > This allows for the drivers to utilize the optimization of merging contiguous
> > > > pages without a need to pre allocate all the pages and hold them in
> > > > a very large temporary buffer prior to the call to SG table initialization.
> > > >
> > > > The second patch changes the Infiniband driver to use the new API. It
> > > > removes duplicate functionality from the code and benefits the
> > > > optimization of allocating dynamic SG table from pages.
> > > >
> > > > In huge pages system of 2MB page size, without this change, the SG table
> > > > would contain x512 SG entries.
> > > > E.g. for 100GB memory registration:
> > > >
> > > >          Number of entries    Size
> > > > Before   26214400             600.0MB
> > > > After    51200                1.2MB
> > > >
> > > > Thanks
> > > >
> > > > Maor Gottlieb (2):
> > > >   lib/scatterlist: Add support in dynamic allocation of SG table from
> > > > pages
> > > >   RDMA/umem: Move to allocate SG table from pages
> > > >
> > > > Tvrtko Ursulin (2):
> > > >   tools/testing/scatterlist: Rejuvenate bit-rotten test
> > > >   tools/testing/scatterlist: Show errors in human readable form
> > >
> > > This looks OK, I'm going to send it into linux-next on the hmm tree
> > > for awhile to see if anything gets broken. If there is more
> > > remarks/tags/etc please continue
> >
> > An idea that just crossed my mind: A pin_user_pages_sgt might be useful
> > for both rdma and drm, since this would avoid the possible huge interim
> > struct pages array for thp pages. Or anything else that could be coalesced
> > down into a single sg entry.
> >
> > Not sure it's worth it, but would at least give a slightly neater
> > interface I think.
>
> We've talked about it. Christoph wants to see this area move to a biovec
> interface instead of sgl, but it might still be worthwhile to have an
> interim step at least as an API consolidation.

Hm but then we'd need a new struct for the mapped side of things
(which would still be what you get from dma-buf). That would be quite
a bit of work to roll out everywhere, and sgt isn't such a huge misfit
for passing buffer object mappings and system memory backing storage
around, and hence what we've (very slowly) been converging drivers/gpu
towards over the past 10 years or so.

And moving the dma_map step out of dma-buf doesn't work, because some
of the use-cases we have are for very special iommus which are managed
by the gpu driver directly. Stuff that e.g. rotates/retiles/compresses
on the fly, and is accessible by other (gfx related like video code,
camera, ..) devices. Not something I expect to ever be relevant for
rdma since this exists mostly on some small socs, but it's a thing.
Without that, dma-buf could hand out a biovec for struct_page backed
stuff, or some pfn_vec for the p2p stuff.

Anyway, it was just an idea. I guess we'll have to live with some
impedance mismatch, since rolling out the one and only iovec structure
which suits everyone is, I think, impossible :-)

> Avoiding the page list would be complicated as we'd somehow have to
> code share the page table iterator scheme.

We're (slowly) getting towards thp for vram mappings and everything so
I guess for drivers/gpu we might make that happen. But yeah it'd be
not so pretty I think.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-06 Thread Daniel Vetter
On Tue, Oct 6, 2020 at 2:26 PM Jason Gunthorpe  wrote:
>
> On Tue, Oct 06, 2020 at 08:23:23AM +0200, Daniel Vetter wrote:
> > On Tue, Oct 6, 2020 at 1:41 AM Jason Gunthorpe  wrote:
> > >
> > > On Tue, Oct 06, 2020 at 12:43:31AM +0200, Daniel Vetter wrote:
> > >
> > > > > iow I think I can outright delete the frame vector stuff.
> > > >
> > > > Ok this doesn't work, because dma_mmap always uses a remap_pfn_range,
> > > > which is a VM_IO | VM_PFNMAP vma and so even if it's cma backed and
> > > > not a carveout, we can't get the pages.
> > >
> > > If CMA memory has struct pages it probably should be mmap'd with
> > > different flags, and the lifecycle of the CMA memory needs to respect
> > > the struct page refcount?
> >
> > I guess yes and no. The problem is if there's pagecache in the cma
> > region, pup(FOLL_LONGTERM) needs to first migrate those pages out of
> > the cma range. Because all normal page allocation in cma regions must
> > be migratable at all times.
>
> Eh? Then how are we doing follow_pfn() on this stuff and not being
> completely broken?
>
> The entire point of this framevec API is to pin the memory and
> preventing it from moving around.
>
> Sounds like it is fundamentally incompatible with CMA. Why is
> something trying to mix the two?

I think the assumption way back when this started is that any VM_IO |
VM_PFNMAP vma is perma-pinned because it's just a piece of carveout.
Of course this ignored that it could also be a piece of iomem and
peer2peer dma doesn't Just Work, so could result in all kinds of
hilarity and hw exceptions. But no leaks. Well, if you assume that the
ownership of a device never changes after you've booted the system.

But now we have dynamic gpu memory management, a bunch of subsystems
that fully support revoke semantics (in a subsystem specific way), and
CMA trying really hard to make the old carveouts useable for the
system at large when the memory isn't needed by the device. So all
these assumptions behind follow_pfn are out of the window, and
follow_pfn is pretty much broken.

What's worse, I noticed that even for static pfnmaps (for userspace
drivers) we now revoke access to those mmaps. For example, this was
implemented for /dev/mem in 3234ac664a87 ("/dev/mem: Revoke mappings when
a driver claims the region"). This means follow_pfn isn't even working
correctly anymore for that case, and it's all pretty much broken.

> > This is actually worse than the gpu case I had in mind, where at most
> > you can sneak access other gpu buffers. With cma you should be able to
> > get at arbitrary pagecache (well anything that's GFP_MOVEABLE really).
> > Nice :-(
>
> Ah, we have a winner :\

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-06 Thread Daniel Vetter
On Mon, Oct 5, 2020 at 7:58 PM Jason Gunthorpe  wrote:
>
> On Mon, Oct 05, 2020 at 07:53:08PM +0200, Jan Kara wrote:
> > On Mon 05-10-20 14:38:54, Jason Gunthorpe wrote:
> > > When get_vaddr_frames() does its hacky follow_pfn() loop it should never
> > > be allowed to extract a struct page from a normal VMA. This could allow a
> > > serious use-after-free problem on any kernel memory.
> > >
> > > Restrict this to only work on VMA's with one of VM_IO | VM_PFNMAP
> > > set. This limits the use-after-free problem to only IO memory, which while
> > > still serious, is an improvement.
> > >
> > > Cc: sta...@vger.kernel.org
> > > Fixes: 8025e5ddf9c1 ("[media] mm: Provide new get_vaddr_frames() helper")
> > > Signed-off-by: Jason Gunthorpe 
> > >  mm/frame_vector.c | 4 
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> > > index 10f82d5643b6de..26cb20544b6c37 100644
> > > +++ b/mm/frame_vector.c
> > > @@ -99,6 +99,10 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> > > if (ret >= nr_frames || start < vma->vm_end)
> > > break;
> > > vma = find_vma_intersection(mm, start, start + 1);
> > > +   if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> > > +   ret = -EINVAL;
> > > +   goto out;
> > > +   }
> > > } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
> >
> > Hum, I fail to see how this helps. If vma has no VM_IO or VM_PFNMAP flag,
> > we'd exit the loop (to out: label) anyway due to the loop termination
> > condition and why not return the frames we already have? Furthermore
> > find_vma_intersection() can return NULL which would oops in your check
> > then. What am I missing?
>
> Oh, nothing, you are right. It just didn't read naturally because
> hitting the wrong kind of VMA should be an error condition :\

Also, follow_pfn already checks for this same condition, so this
isn't really stopping anything bad from happening.
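Jan's two points can be made concrete with a simplified userspace model that walks consecutive pfn-mapped vmas. First, the lookup can return NULL for a hole, so the NULL test has to come before any vm_flags access. Second, hitting a normal vma simply ends the walk with whatever was collected so far. The struct, the flag and the helper names are hypothetical, and the loop is a model, not a copy of get_vaddr_frames().

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_PAGE_SIZE 0x1000UL
/* Hypothetical stand-in for VM_IO | VM_PFNMAP. */
#define FAKE_PFNMAP_FLAGS 0x1UL

struct fake_vma {
	unsigned long vm_start, vm_end, vm_flags;
};

/* Return the vma containing addr, or NULL when addr falls in a hole. */
static const struct fake_vma *fake_find_vma(const struct fake_vma *vmas,
					    size_t n, unsigned long addr)
{
	assert(vmas != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].vm_start && addr < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

/*
 * Count up to nr_frames pages starting at start, across back-to-back
 * pfn-mapped vmas. The walk stops (without error) at a hole or at a
 * normal vma; the NULL test is evaluated before vm_flags is touched.
 */
static unsigned long count_pfnmap_frames(const struct fake_vma *vmas, size_t n,
					 unsigned long start,
					 unsigned long nr_frames)
{
	unsigned long got = 0;
	const struct fake_vma *vma = fake_find_vma(vmas, n, start);

	while (vma && (vma->vm_flags & FAKE_PFNMAP_FLAGS) && got < nr_frames) {
		while (start < vma->vm_end && got < nr_frames) {
			start += FAKE_PAGE_SIZE;
			got++;
		}
		vma = fake_find_vma(vmas, n, start);
	}
	return got;
}
```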
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH rdma-next v5 0/4] Dynamicaly allocate SG table from the pages

2020-10-06 Thread Daniel Vetter
On Mon, Oct 05, 2020 at 08:56:50PM -0300, Jason Gunthorpe wrote:
> On Sun, Oct 04, 2020 at 06:43:36PM +0300, Leon Romanovsky wrote:
> > This series extends __sg_alloc_table_from_pages to allow chaining of
> > new pages to already initialized SG table.
> > 
> > This allows for the drivers to utilize the optimization of merging contiguous
> > pages without a need to pre allocate all the pages and hold them in
> > a very large temporary buffer prior to the call to SG table initialization.
> > 
> > The second patch changes the Infiniband driver to use the new API. It
> > removes duplicate functionality from the code and benefits the
> > optimization of allocating dynamic SG table from pages.
> > 
> > In huge pages system of 2MB page size, without this change, the SG table
> > would contain x512 SG entries.
> > E.g. for 100GB memory registration:
> > 
> >          Number of entries    Size
> > Before   26214400             600.0MB
> > After    51200                1.2MB
> > 
> > Thanks
> > 
> > Maor Gottlieb (2):
> >   lib/scatterlist: Add support in dynamic allocation of SG table from
> > pages
> >   RDMA/umem: Move to allocate SG table from pages
> > 
> > Tvrtko Ursulin (2):
> >   tools/testing/scatterlist: Rejuvenate bit-rotten test
> >   tools/testing/scatterlist: Show errors in human readable form
> 
> This looks OK, I'm going to send it into linux-next on the hmm tree
> for awhile to see if anything gets broken. If there is more
> remarks/tags/etc please continue

An idea that just crossed my mind: A pin_user_pages_sgt might be useful
for both rdma and drm, since this would avoid the possible huge interim
struct pages array for thp pages. Or anything else that could be coalesced
down into a single sg entry.

Not sure it's worth it, but would at least give a slightly neater
interface I think.
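The coalescing idea, and the entry counts in the cover letter quoted above, can be checked with a small model. Merging physically contiguous frames collapses each run into one entry. For 100GB that gives 26214400 entries at 4KiB granularity, versus 51200 when each 2MB huge page becomes exactly one entry. count_segments() and entries_for() are illustrative helpers, not lib/scatterlist functions. The model ignores maximum segment size limits and any merging across adjacent huge pages.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many scatterlist-style segments a list of page frame numbers
 * collapses into when physically contiguous frames are merged.
 */
static size_t count_segments(const unsigned long *pfns, size_t n)
{
	size_t segs = 0;

	assert(pfns != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Start a new segment unless this frame directly follows the last one. */
		if (i == 0 || pfns[i] != pfns[i - 1] + 1)
			segs++;
	}
	return segs;
}

/* Entries needed when every chunk_size bytes become exactly one entry. */
static unsigned long long entries_for(unsigned long long bytes,
				      unsigned long long chunk_size)
{
	return (bytes + chunk_size - 1) / chunk_size;
}
```

A pin_user_pages_sgt-style interface would, roughly speaking, do the count_segments() merge while walking the page tables. It would then never need the 26214400-entry struct page array in the first place.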
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 13/14] drm/msm: Drop struct_mutex in shrinker path

2020-10-06 Thread Daniel Vetter
On Mon, Oct 05, 2020 at 08:40:12PM -0700, Rob Clark wrote:
> On Mon, Oct 5, 2020 at 5:44 PM Hillf Danton  wrote:
> >
> >
> > On Mon, 5 Oct 2020 18:17:01 Kristian H. Kristensen wrote:
> > > On Mon, Oct 5, 2020 at 4:02 PM Daniel Vetter  wrote:
> > > >
> > > > On Mon, Oct 05, 2020 at 05:24:19PM +0800, Hillf Danton wrote:
> > > > >
> > > > > On Sun,  4 Oct 2020 12:21:45
> > > > > > From: Rob Clark 
> > > > > >
> > > > > > Now that the inactive_list is protected by mm_lock, and everything
> > > > > > else on per-obj basis is protected by obj->lock, we no longer depend
> > > > > > on struct_mutex.
> > > > > >
> > > > > > Signed-off-by: Rob Clark 
> > > > > > ---
> > > > > >  drivers/gpu/drm/msm/msm_gem.c  |  1 -
> > > > > > >  drivers/gpu/drm/msm/msm_gem_shrinker.c | 54 --
> > > > > >  2 files changed, 55 deletions(-)
> > > > > >
> > > > > [...]
> > > > >
> > > > > > @@ -71,13 +33,8 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
> > > > > >  {
> > > > > > struct msm_drm_private *priv =
> > > > > > container_of(shrinker, struct msm_drm_private, shrinker);
> > > > > > -   struct drm_device *dev = priv->dev;
> > > > > > struct msm_gem_object *msm_obj;
> > > > > > unsigned long freed = 0;
> > > > > > -   bool unlock;
> > > > > > -
> > > > > > -   if (!msm_gem_shrinker_lock(dev, &unlock))
> > > > > > -   return SHRINK_STOP;
> > > > > >
> > > > > > mutex_lock(&priv->mm_lock);
> > > > >
> > > > > Better if the change in behavior is documented that SHRINK_STOP will
> > > > > no longer be needed.
> > > >
> > > > btw I read through this and noticed you have your own obj lock, plus
> > > > mutex_lock_nested. I strongly recommend to just cut over to dma_resv_lock
> > > > for all object lock needs (soc drivers have been terrible with this
> > > > unfortunately), and in the shrinker just use dma_resv_trylock instead of
> > > > trying to play clever games outsmarting lockdep.
> >
> > The trylock makes page reclaimers turn to their next target e.g. inode
> > cache instead of waiting for the mutex to be released. It makes sense
> > for instance in scenarios of mild memory pressure.
> 
> is there some behind-the-scenes signalling for this, or is this just
> down to what the shrinker callbacks return?  Generally when we get
> into shrinking, there are a big set of purgable bo's to consider, so
> the shrinker callback return wouldn't be considering just one
> potentially lock contended bo (buffer object).  Ie failing one
> trylock, we just move on to the next.
> 
> fwiw, what I've seen on the userspace bo cache vs shrinker (anything
> that is shrinker potential is in userspace bo cache and
> MADV(WONTNEED)) is that in steady state I see a very strong recycling
> of bo's (which avoids allocating and mmap'ing or mapping to gpu a new
> buffer object), so it is definitely a win in mmap/realloc bandwidth..
> in steady state there is a lot of free and realloc of same-sized
> buffers from frame to frame.
> 
> But in transient situations like moving to new game level when there
> is a heavy memory pressure and lots of freeing old
> buffers/textures/etc and then allocating new ones, I see shrinker
> kicking in hard (in android situations, not so much so with
> traditional linux userspace)

Yeah per-buffer trylock is fine. Trylock on the mm_lock (or anything else
device-global, like struct_mutex and msm_gem_shrinker_lock) I think isn't
fine, since if you're unlucky you're hogging a ton of memory and that's
the only freeable resource in the system. Going to other shrinkers won't
help when it's the gpu shrinker that has all the freeable memory.

Other shrinkers (inode and all these) also do lots of per-object
trylocking. I think there's a canonical threshold of shrinker rounds where
you're supposed to try harder (if possible), but that doesn't apply to
dma_resv_lock.
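Here is a userspace sketch of that locking pattern, with pthread mutexes standing in for the device-global list lock and for per-object dma_resv locks. The scan blocks on the list lock, trylocks each object, and skips busy objects instead of bailing out of the whole scan. All struct and function names are hypothetical and this is not the msm shrinker. In the kernel, blocking on the list lock is only safe if that lock is never held across allocations that can recurse into reclaim.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_bo {
	pthread_mutex_t lock; /* per-object lock, standing in for dma_resv */
	size_t npages;
	bool purgeable;       /* e.g. userspace marked it MADV(WONTNEED) */
	bool purged;
};

struct fake_dev {
	pthread_mutex_t list_lock; /* device-global lock protecting the bo list */
	struct fake_bo *bos;
	size_t nr_bos;
};

/*
 * Block on the device-global lock (a failed trylock here would skip the
 * whole device, possibly the biggest pool of freeable memory), then
 * trylock each object and simply move on when one is contended.
 */
static size_t shrink_scan(struct fake_dev *dev, size_t nr_to_scan)
{
	size_t freed = 0;

	assert(dev != NULL);
	pthread_mutex_lock(&dev->list_lock);
	for (size_t i = 0; i < dev->nr_bos && freed < nr_to_scan; i++) {
		struct fake_bo *bo = &dev->bos[i];

		if (!bo->purgeable || bo->purged)
			continue;
		if (pthread_mutex_trylock(&bo->lock) != 0)
			continue; /* busy: skip this object, try the next one */
		bo->purged = true;
		freed += bo->npages;
		pthread_mutex_unlock(&bo->lock);
	}
	pthread_mutex_unlock(&dev->list_lock);
	return freed;
}
```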
-Daniel

> 
> BR,
> -R
> 
> >
> > > >
> > > > I recently wrote an entire blog length rant on why I think
> > > > mutex_lock_nested is too dangerous to be us

Re: [PATCH v2 0/3] drm: commit_work scheduling

2020-10-06 Thread Daniel Vetter
eads.  We want commit_work to
> >
> > Why are they at different priorities? Different priority levels means that some
> > of them have more urgent deadlines to meet and it's okay to steal execution
> > time from lower priority tasks. Is this the case?
> 
> tbh, I'm not fully aware of the background.  It looks like most of the
> SF threads run at priority=2 (100-2==98), and the main one runs at
> priority=1
> 
> > RT planning and partitioning is not easy task for sure. You might want to
> > consider using affinities too to get stronger guarantees for some tasks and
> > prevent cross-talking.
> 
> There is some cgroup stuff that is pinning SF and some other stuff to
> the small cores, fwiw.. I think the reasoning is that they shouldn't
> be doing anything heavy enough to need the big cores.
> 
> > > run ASAP once fences are signalled, and vblank_work to run at a
> > > slightly higher priority still.  But the correct choice for priorities
> > > here depends on what userspace is using, it all needs to fit together
> > > properly.
> >
> > By userspace here I think you mean non-display-pipeline related RT tasks that
> > you need to coexist with and could still disrupt your pipeline?
> 
> I mean, commit_work should be higher priority than the other (display
> related) RT tasks.  But the kernel doesn't know what those priorities
> are.
> 
> > Using RT on General Purpose System is hard for sure. One of the major
> > challenge is that there's no admin that has full view of the system to do
> > proper RT planning.
> >
> > We need proper RT balancer daemon that helps partitioning the system for
> > multiple RT apps on these systems..
> >
> > >
> > > >
> > > > I do appreciate that maybe some of these tasks have varying requirements during
> > > > their life time. e.g: they have RT property during specific critical section
> > > > but otherwise are CFS tasks. I think the UI thread in Android behaves like
> > > > that.
> > > >
> > > > It's worth IMO trying that approach I pointed out earlier to see if making RT
> > > > try to pick an idle CPU rather than preempt CFS helps. Not sure if it'd be
> > > > accepted but IMHO it's a better direction to consider and discuss.
> > >
> > > The problem I was seeing was actually the opposite..  commit_work
> > > becomes runnable (fences signalled) but doesn't get a chance to run
> > > because a SCHED_FIFO SF thread is running.  (Maybe I misunderstood and
> > > your approach would help this case too?)
> >
> > Ah okay. Sorry I got it the wrong way around for some reason. I thought this
> > task is preempting other CFS-based pipelined tasks.
> >
> > So your system seems to be overcommitted. Is SF short for SurfaceFlinger? Under
> > what scenarios do you have many SurfaceFlinger tasks? On Android I remember
> > seeing they have priority of 1 or 2.
> 
> yeah, SF==SurfaceFlinger, and yeah, 1 and 2..
> 
> > sched_set_fifo() will use priority 50. If you set all your pipeline tasks
> > to this priority, what happens?
> 
> I think this would work.. drm/msm doesn't use vblank work, so I
> wouldn't really have problems with commit_work preempting vblank_work.
> But I think the best option (and to handle the case if android changes
> the RT priorities around in the future) is to let userspace set the
> priorities.
> 
> > >
> > > > Or maybe you can wrap userspace pipeline critical section lock such that any
> > > > task holding it will automatically be promoted to SCHED_FIFO and then demoted
> > > > to CFS once it releases it.
> > >
> > > The SCHED_DEADLINE + token passing approach that the lwn article
> > > mentioned sounds interesting, if that eventually becomes possible.
> > > But doesn't really help today..
> >
> > We were present in the room with Alessio when he gave that talk :-)
> >
> > You might have seen Valentin's talk in LPC where he's trying to get
> > proxy-execution into shape. Which is a pre-requisite to enable using of
> > SCHED_DEADLINE for these scenarios. IIRC it should allow all dependent tasks to
> > run from the context of the deadline task during the display pipeline critical
> > section.
> >
> > By the way, do you have issues with SoftIrqs delaying your RT tasks execution
> > time?
> 
> I don't *think* so, but I'm not 100% sure if they are showing up in
> traces.  So far it seems like SF stomping on commit_work.  (There is
> the added complication that there are some chrome gpu-process tasks in
> between SF and the display, including CrGpuMain (which really doesn't
> want to be SCHED_FIFO when executing gl commands on behalf of
> something unrelated to the compositor.. the deadline approach, IIUC,
> might be the better option eventually for this?)

deadline has the upside that it composes much better than SCHED_FIFO:
Everyone just drops their deadline requirements onto the scheduler,
scheduler makes sure it's all obeyed (or rejects your request).

The trouble is we'd need to know how long a commit takes, worst case, on a
given platform. And for that you need to measure stuff, and we kinda can't
spend a few minutes at boot-up going through the combinatorial maze of
atomic commits to make sure we have it all.

So I think in practice letting userspace set the right rt priority/mode is
the only way to go here :-/
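On the userspace side, setting an rt priority is a standard POSIX call. Which threads a compositor should target, and at what priority, is exactly the policy this thread leaves to userspace. Below is a minimal sketch that clamps a requested SCHED_FIFO priority to the range the system reports and applies it to a thread. The helper names are hypothetical. On Linux the call typically fails with EPERM unless the caller has CAP_SYS_NICE or a suitable RLIMIT_RTPRIO.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Clamp a requested SCHED_FIFO priority into the range the system reports. */
static int clamp_fifo_priority(int wanted)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	assert(lo >= 0 && hi >= lo); /* both calls return -1 on error */
	if (wanted < lo)
		return lo;
	if (wanted > hi)
		return hi;
	return wanted;
}

/*
 * Switch a thread to SCHED_FIFO at the clamped priority. Returns 0 on
 * success or an errno value (e.g. EPERM without sufficient privilege).
 */
static int make_thread_fifo(pthread_t thread, int wanted)
{
	struct sched_param param = { .sched_priority = clamp_fifo_priority(wanted) };

	return pthread_setschedparam(thread, SCHED_FIFO, &param);
}
```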
-Daniel

> 
> BR,
> -R
> 
> >
> > Thanks
> >
> > --
> > Qais Yousef

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-06 Thread Daniel Vetter
On Tue, Oct 6, 2020 at 1:41 AM Jason Gunthorpe  wrote:
>
> On Tue, Oct 06, 2020 at 12:43:31AM +0200, Daniel Vetter wrote:
>
> > > iow I think I can outright delete the frame vector stuff.
> >
> > Ok this doesn't work, because dma_mmap always uses a remap_pfn_range,
> > which is a VM_IO | VM_PFNMAP vma and so even if it's cma backed and
> > not a carveout, we can't get the pages.
>
> If CMA memory has struct pages it probably should be mmap'd with
> different flags, and the lifecycle of the CMA memory needs to respect
> the struct page refcount?

I guess yes and no. The problem is if there's pagecache in the cma
region, pup(FOLL_LONGTERM) needs to first migrate those pages out of
the cma range. Because all normal page allocation in cma regions must
be migratable at all times. But when you use cma as the contig
allocator (mostly with dma_alloc_coherent) and then remap that for
userspace (e.g. dma_mmap_wc), then anyone doing pup or gup should not
try to migrate anything. Also, in the past these contig ranges were
generally carveouts without any struct page; changing that would break
too much, I guess.

> > Plus trying to move the cma pages out of cma for FOLL_LONGTERM would
> > be kinda bad when they've been allocated as a contig block by
> > dma_alloc_coherent :-)
>
> Isn't holding a long term reference to a CMA page one of those really
> scary use-after-free security issues I've been talking about?
>
> I know nothing about CMA, so can't say too much, sorry

Uh ... yes :-/

This is actually worse than the gpu case I had in mind, where at most
you can sneak access to other gpu buffers. With cma you should be able to
get at arbitrary pagecache (well, anything that's __GFP_MOVABLE really).
Nice :-(
-Daniel


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-05 Thread Daniel Vetter
On Mon, Oct 5, 2020 at 8:54 PM Daniel Vetter  wrote:
>
> On Mon, Oct 5, 2020 at 8:37 PM Jason Gunthorpe  wrote:
> >
> > On Mon, Oct 05, 2020 at 08:16:33PM +0200, Daniel Vetter wrote:
> >
> > > > kvm is some similar hack added for P2P DMA, see commit
> > > > add6a0cd1c5ba51b201e1361b05a5df817083618. It might be protected by 
> > > > notifiers..
> > >
> > > Yeah my thinking is that kvm (and I think also vfio, also seems to
> > > have mmu notifier nearby) are ok because of the mmu notifier. Assuming
> > > that one works correctly.
> >
> > vfio doesn't have a notifier, Alex was looking to add a vfio private
> > scheme in the vma->private_data:
> >
> > https://lore.kernel.org/kvm/159017449210.18853.15037950701494323009.st...@gimli.home/
> >
> > Guess it never happened.
>
> I was misled by the mmu notifier in drivers/vfio/vfio.c. But looking
> closer, that's only used by some drivers, I guess to make sure their
> device pagetables are kept in sync with reality. And not to make sure
> the vfio pfn view is kept in sync with reality.
>
> This could get real nasty I think.
>
> > > > So, the answer really is that s390 and media need fixing, and this API
> > > > should go away (or become kvm specific)
> > >
> > > I'm still not clear how you want to fix this, since your vma->dma_buf
> > > idea is kinda a decade long plan and so just not going to happen:
> >
> > Well, it doesn't mean we have to change every part of dma_buf to
> > participate in this. Just the bits media cares about. Or maybe it is
> > some higher level variant on top of dma_buf.
> >
> > Or don't use dma_buf for this, add a new object that just provides
> > refcounts and P2P DMA connection for IO pfn ranges..
>
> So good news is, I dug some layers deeper in v4l, and there's only 2
> users which do actually handle pfn and don't immediately convert to a
> pages array:
> - videobuf-dma-contig.c. Luckily videobuf 1 is deprecated since
> forever, so I think we might get away with either just breaking this,
> or at least tainting kernels and hiding it behind a nasty Kconfig.
> This only uses follow_pfn, which we need to keep anyway for vfio in
> the unsafe variant :-/
> - videobuf2-vmalloc.c: Digging through history this was added to support
> import of v4l buffers from drivers that needed contig memory. And way
> back before CMA, that meant carveout memory not backed by struct
> page *. That should now all have struct pages and be managed by CMA (since
> videobuf2-dma-contig.c just uses dma_alloc_coherent underneath), so I
> think we can just switch to pin_user_pages(FOLL_LONGTERM) here too.
>
> iow I think I can outright delete the frame vector stuff.

Ok this doesn't work, because dma_mmap always uses a remap_pfn_range,
which is a VM_IO | VM_PFNMAP vma and so even if it's cma backed and
not a carveout, we can't get the pages. Plus trying to move the cma
pages out of cma for FOLL_LONGTERM would be kinda bad when they've
been allocated as a contig block by dma_alloc_coherent :-)

So this idea of switching over to pup only is going to break zerocopy.
I guess I'll need something else for this then.
-Daniel


Re: [PATCH] Revert "gpu/drm: ingenic: Add option to mmap GEM buffers cached"

2020-10-05 Thread Daniel Vetter
On Mon, Oct 5, 2020 at 4:47 PM Paul Cercueil  wrote:
>
> Hi,
>
> Le lun. 5 oct. 2020 à 16:05, Daniel Vetter  a écrit :
> > On Mon, Oct 05, 2020 at 11:01:50PM +1100, Stephen Rothwell wrote:
> >>  Hi Paul,
> >>
> >>  On Sun, 04 Oct 2020 22:11:23 +0200 Paul Cercueil
> >>  wrote:
> >>  >
> >>  > Pushed to drm-misc-next with the changelog fix, thanks.
> >>  >
> >>  > Stephen:
> >>  > Now it should build fine again. Could you remove the BROKEN flag?
> >>
> >>  Thanks for letting me know, but the fix has not appeared in any drm
> >>  tree included in linux-next yet ...
> >>
> >>  If it doesn't show up by the time I will merge the drm tree
> >> tomorrow, I
> >>  will apply this revert patch myself (instead of the patch marking
> >> the
> >>  driver BROKEN).
> >
> > Yeah it should have been pushed to drm-misc-next-fixes per
> >
> > https://drm.pages.freedesktop.org/maintainer-tools/committer-drm-misc.html#where-do-i-apply-my-patch
> >
> > Paul, can you pls git cherry-pick -x this over to drm-misc-next-fixes?
>
> I had a few commits on top of it in drm-misc-next, so the revert
> doesn't apply cleanly in drm-misc-next-fixes... I can revert it there,
> but then we'd have a different revert commit in drm-misc-next and
> drm-misc-next-next.
>
> Sorry for the mess. What should I do?

Hm, not sure why, but the reply I thought I'd typed doesn't seem to
have gone out.

Cherry-pick it over, fix up the conflict, and then fix up the resulting
conflict when rebuilding drm-tip. Please tell the drm-misc maintainers;
they probably want to do a backmerge once the drm-next merge window pull
is merged into Linus' tree.

If we don't fix this up then the drm-next pull goes nowhere.
-Daniel


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-05 Thread Daniel Vetter
On Mon, Oct 5, 2020 at 8:37 PM Jason Gunthorpe  wrote:
>
> On Mon, Oct 05, 2020 at 08:16:33PM +0200, Daniel Vetter wrote:
>
> > > kvm is some similar hack added for P2P DMA, see commit
> > > add6a0cd1c5ba51b201e1361b05a5df817083618. It might be protected by 
> > > notifiers..
> >
> > Yeah my thinking is that kvm (and I think also vfio, also seems to
> > have mmu notifier nearby) are ok because of the mmu notifier. Assuming
> > that one works correctly.
>
> vfio doesn't have a notifier, Alex was looking to add a vfio private
> scheme in the vma->private_data:
>
> https://lore.kernel.org/kvm/159017449210.18853.15037950701494323009.st...@gimli.home/
>
> Guess it never happened.

I was misled by the mmu notifier in drivers/vfio/vfio.c. But looking
closer, that's only used by some drivers, I guess to keep their device
pagetables in sync with reality, not to keep the vfio pfn view in sync
with reality.

This could get real nasty I think.

> > > So, the answer really is that s390 and media need fixing, and this API
> > > should go away (or become kvm specific)
> >
> > I'm still not clear how you want to fix this, since your vma->dma_buf
> > idea is kinda a decade long plan and so just not going to happen:
>
> Well, it doesn't mean we have to change every part of dma_buf to
> participate in this. Just the bits media cares about. Or maybe it is
> some higher level variant on top of dma_buf.
>
> Or don't use dma_buf for this, add a new object that just provides
> refcounts and P2P DMA connection for IO pfn ranges..

So good news is, I dug some layers deeper in v4l, and there's only 2
users which do actually handle pfn and don't immediately convert to a
pages array:
- videobuf-dma-contig.c. Luckily videobuf 1 is deprecated since
forever, so I think we might get away with either just breaking this,
or at least tainting kernels and hiding it behind a nasty Kconfig.
This only uses follow_pfn, which we need to keep anyway for vfio in
the unsafe variant :-/
- videobuf2-vmalloc.c: Digging through history this was added to support
import of v4l buffers from drivers that needed contig memory. And way
back before CMA, that meant carveout memory not backed by struct
page *. That should now all have struct pages and be managed by CMA (since
videobuf2-dma-contig.c just uses dma_alloc_coherent underneath), so I
think we can just switch to pin_user_pages(FOLL_LONGTERM) here too.

iow I think I can outright delete the frame vector stuff.
-Daniel


Re: [Freedreno] [PATCH 00/14] drm/msm: de-struct_mutex-ification

2020-10-05 Thread Daniel Vetter
On Mon, Oct 5, 2020 at 6:24 PM Kristian Høgsberg  wrote:
>
> On Sun, Oct 4, 2020 at 9:21 PM Rob Clark  wrote:
> >
> > From: Rob Clark 
> >
> > This doesn't remove *all* the struct_mutex, but it covers the worst
> > of it, ie. shrinker/madvise/free/retire.  The submit path still uses
> > struct_mutex, but it still needs *something* to serialize a portion of
> > the submit path, and lock_stat mostly just shows the lock contention
> > there being with other submits.  And there are a few other bits of
> > struct_mutex usage in less critical paths (debugfs, etc).  But this
> > seems like a reasonable step in the right direction.
>
> What a great patch set. Daniel has some good points and nothing that
> requires big changes, but on the other hand, I'm not sure it's
> something that needs to block this set either.

Personally I'd throw the lockdep priming on top to make sure this
stays correct (it's 3 lines), but yes imo this is all good to go. Just
figured I'd sprinkle the latest in terms of gem locking over the
series while it's here :-)
-Daniel

> Either way, for the series
>
> Reviewed-by: Kristian H. Kristensen 
>
> > Rob Clark (14):
> >   drm/msm: Use correct drm_gem_object_put() in fail case
> >   drm/msm: Drop chatty trace
> >   drm/msm: Move update_fences()
> >   drm/msm: Add priv->mm_lock to protect active/inactive lists
> >   drm/msm: Document and rename preempt_lock
> >   drm/msm: Protect ring->submits with it's own lock
> >   drm/msm: Refcount submits
> >   drm/msm: Remove obj->gpu
> >   drm/msm: Drop struct_mutex from the retire path
> >   drm/msm: Drop struct_mutex in free_object() path
> >   drm/msm: remove msm_gem_free_work
> >   drm/msm: drop struct_mutex in madvise path
> >   drm/msm: Drop struct_mutex in shrinker path
> >   drm/msm: Don't implicit-sync if only a single ring
> >
> >  drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  4 +-
> >  drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 12 +--
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c |  4 +-
> >  drivers/gpu/drm/msm/msm_debugfs.c |  7 ++
> >  drivers/gpu/drm/msm/msm_drv.c | 15 +---
> >  drivers/gpu/drm/msm/msm_drv.h | 19 +++--
> >  drivers/gpu/drm/msm/msm_gem.c | 76 ++
> >  drivers/gpu/drm/msm/msm_gem.h | 53 +
> >  drivers/gpu/drm/msm/msm_gem_shrinker.c| 58 ++
> >  drivers/gpu/drm/msm/msm_gem_submit.c  | 17 ++--
> >  drivers/gpu/drm/msm/msm_gpu.c | 96 ++-
> >  drivers/gpu/drm/msm/msm_gpu.h |  5 +-
> >  drivers/gpu/drm/msm/msm_ringbuffer.c  |  3 +-
> >  drivers/gpu/drm/msm/msm_ringbuffer.h  | 13 ++-
> >  14 files changed, 188 insertions(+), 194 deletions(-)
> >
> > --
> > 2.26.2
> >





Re: [PATCH 13/14] drm/msm: Drop struct_mutex in shrinker path

2020-10-05 Thread Daniel Vetter
On Mon, Oct 5, 2020 at 6:49 PM Rob Clark  wrote:
>
> On Mon, Oct 5, 2020 at 7:02 AM Daniel Vetter  wrote:
> >
> > On Mon, Oct 05, 2020 at 05:24:19PM +0800, Hillf Danton wrote:
> > >
> > > On Sun,  4 Oct 2020 12:21:45
> > > > From: Rob Clark 
> > > >
> > > > Now that the inactive_list is protected by mm_lock, and everything
> > > > else on per-obj basis is protected by obj->lock, we no longer depend
> > > > on struct_mutex.
> > > >
> > > > Signed-off-by: Rob Clark 
> > > > ---
> > > >  drivers/gpu/drm/msm/msm_gem.c  |  1 -
> > > >  drivers/gpu/drm/msm/msm_gem_shrinker.c | 54 --
> > > >  2 files changed, 55 deletions(-)
> > > >
> > > [...]
> > >
> > > > @@ -71,13 +33,8 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
> > > >  {
> > > > struct msm_drm_private *priv =
> > > > container_of(shrinker, struct msm_drm_private, shrinker);
> > > > -   struct drm_device *dev = priv->dev;
> > > > struct msm_gem_object *msm_obj;
> > > > unsigned long freed = 0;
> > > > -   bool unlock;
> > > > -
> > > > -   if (!msm_gem_shrinker_lock(dev, &unlock))
> > > > -   return SHRINK_STOP;
> > > >
> > > > mutex_lock(&priv->mm_lock);
> > >
> > > Better if the change in behavior is documented that SHRINK_STOP will
> > > no longer be needed.
> >
> > btw I read through this and noticed you have your own obj lock, plus
> > mutex_lock_nested. I strongly recommend to just cut over to dma_resv_lock
> > for all object lock needs (soc drivers have been terrible with this
> > unfortunately), and in the shrinker just use dma_resv_trylock instead of
> > trying to play clever games outsmarting lockdep.
> >
> > I recently wrote an entire blog length rant on why I think
> > mutex_lock_nested is too dangerous to be useful:
> >
> > https://blog.ffwll.ch/2020/08/lockdep-false-positives.html
> >
> > Not anything about this here, just general comment. The problem extends to
> > shmem helpers and all that also having their own locks for everything.
>
> the shrinker lock class has existed for a while.. and is based on the
> idea that anything in the get-pages/vmap path cannot happen on a
> WONTNEED bo.. although perhaps there should be a few more 'if
> (WARN_ON(obj->madv != WILLNEED)) return -EBUSY'..

Yeah it works, but it's the kind of really clever stuff that
eventually bites again. For pretty much no benefit: if the lock is
held, you can assume someone else is using the object and you
won't be able to shrink it anyway. So trylock is enough.
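
A userspace sketch of the "trylock is enough" argument, with a pthread mutex standing in for the per-object dma_resv lock (in the kernel the corresponding call would be dma_resv_trylock()). `struct fake_obj` and `shrink_scan()` are hypothetical; the point is only that the scan never blocks and simply skips objects whose lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GEM object on the shrinker's list. */
struct fake_obj {
	pthread_mutex_t lock;	/* plays the per-object dma_resv lock */
	bool purgeable;		/* e.g. userspace marked it DONTNEED */
	size_t npages;		/* backing pages that could be released */
};

/*
 * Scan up to n objects and release pages from purgeable ones until
 * nr_to_scan pages are freed. Never blocks: a held lock means someone
 * is using the object, so it could not be shrunk anyway.
 */
static size_t shrink_scan(struct fake_obj *objs, size_t n, size_t nr_to_scan)
{
	size_t freed = 0;

	for (size_t i = 0; i < n && freed < nr_to_scan; i++) {
		struct fake_obj *obj = &objs[i];

		if (pthread_mutex_trylock(&obj->lock) != 0)
			continue;	/* busy: skip it */
		if (obj->purgeable && obj->npages) {
			freed += obj->npages;
			obj->npages = 0;
		}
		pthread_mutex_unlock(&obj->lock);
	}
	return freed;
}
```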

> replacing obj->lock with dma_resv lock, might be a nice cleanup.. but
> I think it will be a bit churny..

Oh fully agreed, I tried to push the helpers a bit in that direction
for shmem/cma and gave up. Just something I think we should have in
mind. And in case your gpu ever becomes discrete ... yes the churn is
terrible :-/
-Daniel


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-05 Thread Daniel Vetter
On Mon, Oct 5, 2020 at 7:58 PM Jason Gunthorpe  wrote:
>
> On Mon, Oct 05, 2020 at 07:53:08PM +0200, Jan Kara wrote:
> > On Mon 05-10-20 14:38:54, Jason Gunthorpe wrote:
> > > When get_vaddr_frames() does its hacky follow_pfn() loop it should never
> > > be allowed to extract a struct page from a normal VMA. This could allow a
> > > serious use-after-free problem on any kernel memory.
> > >
> > > Restrict this to only work on VMA's with one of VM_IO | VM_PFNMAP
> > > set. This limits the use-after-free problem to only IO memory, which while
> > > still serious, is an improvement.
> > >
> > > Cc: sta...@vger.kernel.org
> > > Fixes: 8025e5ddf9c1 ("[media] mm: Provide new get_vaddr_frames() helper")
> > > Signed-off-by: Jason Gunthorpe 
> > >  mm/frame_vector.c | 4 
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> > > index 10f82d5643b6de..26cb20544b6c37 100644
> > > +++ b/mm/frame_vector.c
> > > @@ -99,6 +99,10 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> > > if (ret >= nr_frames || start < vma->vm_end)
> > > break;
> > > vma = find_vma_intersection(mm, start, start + 1);
> > > +   if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> > > +   ret = -EINVAL;
> > > +   goto out;
> > > +   }
> > > } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
> >
> > Hum, I fail to see how this helps. If vma has no VM_IO or VM_PFNMAP flag,
> > we'd exit the loop (to out: label) anyway due to the loop termination
> > condition and why not return the frames we already have? Furthermore
> > find_vma_intersection() can return NULL which would oops in your check
> > then. What am I missing?
>
> Oh, nothing, you are right. It just didn't read naturally because
> hitting the wrong kind of VMA should be an error condition :\

Afaik these mmio maps should all be VM_DONTEXPAND (or at least the
ones in drivers/gpu are all), so not sure why we need the loop here.
But maybe there are some drivers that don't set that, or have other
funny things going on with userspace piecing the mmap together, and
I'm not going to audit them all :-)
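
Jan's point about the quoted check can be verified with a small userspace model of the loop. Everything here is a stand-in: `struct fake_vma`, `lookup()` (playing find_vma_intersection() for a one-byte range), the flag values and the page size. The model only counts frames instead of calling follow_pfn(), and it simply returns 0 if the first vma isn't IO/PFNMAP (the real function takes a different path there).

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_VM_IO	0x1u	/* stand-in flag values, not the kernel's */
#define FAKE_VM_PFNMAP	0x2u

struct fake_vma {
	unsigned long start, end;	/* covers [start, end) */
	unsigned int flags;
};

/* Return the vma containing addr, or NULL if addr falls into a hole. */
static const struct fake_vma *lookup(const struct fake_vma *v, size_t n,
				     unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= v[i].start && addr < v[i].end)
			return &v[i];
	return NULL;
}

/* Count how many frames the quoted do/while loop would collect. */
static size_t collect_frames(const struct fake_vma *v, size_t n,
			     unsigned long start, size_t nr_frames,
			     unsigned long page)
{
	const struct fake_vma *vma = lookup(v, n, start);
	size_t ret = 0;

	if (!vma || !(vma->flags & (FAKE_VM_IO | FAKE_VM_PFNMAP)))
		return 0;	/* simplification, see above */

	do {
		while (ret < nr_frames && start + page <= vma->end) {
			ret++;		/* follow_pfn() would fill nums[ret] here */
			start += page;
		}
		if (ret >= nr_frames || start < vma->end)
			break;
		vma = lookup(v, n, start);
		/* Checking vma->flags right here would crash when vma == NULL. */
	} while (vma && (vma->flags & (FAKE_VM_IO | FAKE_VM_PFNMAP)));

	return ret;
}
```

Whether the next vma is a normal mapping, a hole, or another PFNMAP range, the loop condition alone stops or continues correctly without touching a NULL vma.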
-Daniel


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-05 Thread Daniel Vetter
On Mon, Oct 5, 2020 at 7:28 PM Jason Gunthorpe  wrote:
>
> On Sun, Oct 04, 2020 at 06:09:29PM +0200, Daniel Vetter wrote:
> > On Sun, Oct 4, 2020 at 2:51 PM Jason Gunthorpe  wrote:
> > >
> > > On Sat, Oct 03, 2020 at 11:40:22AM +0200, Daniel Vetter wrote:
> > >
> > > > > That leaves the only interesting places as vb2_dc_get_userptr() and
> > > > > vb2_vmalloc_get_userptr() which both completely fail to follow the
> > > > > REQUIRED behavior in the function's comment about checking PTEs. It
> > > > > just DMA maps them. Badly broken.
> > > > >
> > > > > Guessing this hackery is for some embedded P2P DMA transfer?
> > > >
> > > > Yeah, see also the follow_pfn trickery in
> > > > videobuf_dma_contig_user_get(), I think this is fully intentional and
> > > > userspace abi we can't break :-/
> > >
> > > We don't need to break uABI, it just needs to work properly in the
> > > kernel:
> > >
> > >   vma = find_vma_intersection()
> > >   dma_buf = dma_buf_get_from_vma(vma)
> > >   sg = dma_buf_p2p_dma_map(dma_buf)
> > >   [.. do dma ..]
> > >   dma_buf_unmap(sg)
> > >   dma_buf_put(dma_buf)
> > >
> > > It is as we discussed before, dma buf needs to be discoverable from a
> > > VMA, at least for users doing this kind of stuff.
> >
> > I'm not a big fan of magic behaviour like this, there's more to
> > dma-buf buffer sharing than just "how do I get at the backing
> > storage". Thus far we've done everything rather explicitly. Plus with
> > exynos and habanalabs converted there's only v4l left over, and that
> > has a proper dma-buf import path already.
>
> Well, any VA approach like this has to access some backing refcount
> via the VMA. Not really any way to avoid something like that
>
> > > A VM flag doesn't help - we need to introduce some kind of lifetime,
> > > and that has to be derived from the VMA. It needs data not just a flag
> >
> > I don't want to make it work, I just want to make it fail. Rough idea
> > I have in mind is to add a follow_pfn_longterm, for all callers which
> > aren't either synchronized through mmap_sem or an mmu_notifier.
>
> follow_pfn() doesn't work outside the pagetable locks or mmu notifier
> protection. Can't be fixed.
>
> We only have a few users:
>
> arch/s390/pci/pci_mmio.c:   ret = follow_pfn(vma, user_addr, pfn);
> drivers/media/v4l2-core/videobuf-dma-contig.c:  ret = follow_pfn(vma, user_address, &this_pfn);
> drivers/vfio/vfio_iommu_type1.c:ret = follow_pfn(vma, vaddr, pfn);
> drivers/vfio/vfio_iommu_type1.c:ret = follow_pfn(vma, vaddr, pfn);
> mm/frame_vector.c:  err = follow_pfn(vma, start, &nums[ret]);
> virt/kvm/kvm_main.c:r = follow_pfn(vma, addr, &pfn);
> virt/kvm/kvm_main.c:r = follow_pfn(vma, addr, &pfn);
>
> VFIO is broken like media, but I saw patches fixing the vfio cases
> using the VMA and a vfio specific refcount.
>
> media & frame_vector we are talking about here.
>
> kvm is some similar hack added for P2P DMA, see commit
> add6a0cd1c5ba51b201e1361b05a5df817083618. It might be protected by notifiers..

Yeah my thinking is that kvm (and I think also vfio, also seems to
have an mmu notifier nearby) are ok because of the mmu notifier. Assuming
that one works correctly.
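
For reference, one established pattern for keeping a pfn lookup coherent with page table changes is the mmu_interval_notifier sequence check (mmu_interval_read_begin() / mmu_interval_read_retry() in the kernel). Below is a single-threaded userspace model of the idea with hypothetical names: snapshot a sequence count, look up the pfn, and only program the device if no invalidation happened in between, otherwise retry. The `race` hook just injects an invalidation at the critical point; real code does the final check under the driver lock that the invalidate callback also takes, and that callback must also tear down the device mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU page table plus notifier state. */
struct fake_mm {
	unsigned long seq;	/* bumped by every invalidation */
	unsigned long pfn;	/* current CPU-side translation */
};

/* Invalidation: the CPU mapping changes and readers' snapshots go stale. */
static void fake_invalidate(struct fake_mm *mm, unsigned long new_pfn)
{
	mm->seq++;
	mm->pfn = new_pfn;
}

/* Hook used to inject an invalidation between lookup and commit. */
typedef void (*race_fn)(struct fake_mm *mm, int attempt);

/* Example hook: one invalidation during the first attempt only. */
static void race_once(struct fake_mm *mm, int attempt)
{
	if (attempt == 0)
		fake_invalidate(mm, 0x2000);
}

/* Install a pfn into *device_pfn, retrying while the snapshot is stale. */
static unsigned long map_with_retry(struct fake_mm *mm,
				    unsigned long *device_pfn,
				    race_fn race, int *attempts)
{
	for (int attempt = 0;; attempt++) {
		unsigned long seq = mm->seq;	/* ~ mmu_interval_read_begin() */
		unsigned long pfn = mm->pfn;	/* ~ follow_pfn() */

		if (race)
			race(mm, attempt);
		/* Real code takes the driver lock before this check. */
		if (mm->seq != seq)		/* ~ mmu_interval_read_retry() */
			continue;
		*device_pfn = pfn;		/* program the device page table */
		if (attempts)
			*attempts = attempt + 1;
		return pfn;
	}
}
```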

> s390 looks broken too, needs to hold the page table locks.

Hm yeah I guess that looks fairly reasonable to fix too.

> So, the answer really is that s390 and media need fixing, and this API
> should go away (or become kvm specific)

I'm still not clear how you want to fix this, since your vma->dma_buf
idea is kinda a decade-long plan and so just not going to happen:
- v4l used this mostly (afaik the lore at least) for buffer sharing
with v4l itself, and also a bit with fbdev. Neither even has any
dma-buf exporter code as-is.
- like I said, there's no central dma-buf instance, it was fairly
intentionally created as an all-to-all abstraction. Which means you
either have to roll out a vm_ops->gimme_the_dmabuf or, even more work,
refactor all the dma-buf exporters to go through the same things.
- even where we have dma-buf, most mmaps of buffer objects aren't a
dma-buf. Those are only set up when userspace explicitly asks for
one, so we'd also need to change the mmap code of all drivers involved
to make sure the dma-buf is always created when we do any kind of
mmap.

I don't see that as a realistic thing to ever happen, and meanwhile we
can't leave the gap open for a few years.

> > If this really breaks anyone's use-case we can add a tainting kernel
> >

Re: [PATCH] Revert "gpu/drm: ingenic: Add option to mmap GEM buffers cached"

2020-10-05 Thread Daniel Vetter
On Mon, Oct 5, 2020 at 4:47 PM Paul Cercueil  wrote:
>
> Hi,
>
> Le lun. 5 oct. 2020 à 16:05, Daniel Vetter  a écrit :
> > On Mon, Oct 05, 2020 at 11:01:50PM +1100, Stephen Rothwell wrote:
> >>  Hi Paul,
> >>
> >>  On Sun, 04 Oct 2020 22:11:23 +0200 Paul Cercueil
> >>  wrote:
> >>  >
> >>  > Pushed to drm-misc-next with the changelog fix, thanks.
> >>  >
> >>  > Stephen:
> >>  > Now it should build fine again. Could you remove the BROKEN flag?
> >>
> >>  Thanks for letting me know, but the fix has not appeared in any drm
> >>  tree included in linux-next yet ...
> >>
> >>  If it doesn't show up by the time I will merge the drm tree
> >> tomorrow, I
> >>  will apply this revert patch myself (instead of the patch marking
> >> the
> >>  driver BROKEN).
> >
> > Yeah it should have been pushed to drm-misc-next-fixes per
> >
> > https://drm.pages.freedesktop.org/maintainer-tools/committer-drm-misc.html#where-do-i-apply-my-patch
> >
> > Paul, can you pls git cherry-pick -x this over to drm-misc-next-fixes?
>
> I had a few commits on top of it in drm-misc-next, so the revert
> doesn't apply cleanly in drm-misc-next-fixes... I can revert it there,
> but then we'd have a different revert commit in drm-misc-next and
> drm-misc-next-next.
>
> Sorry for the mess. What should I do?

We need the revert in drm-misc-next-fixes or the drm-next pull request
doesn't work out. So cherry-pick over, fix up conflicts, push the
tree, and don't forget to fix up the conflicts when dim rebuilds
drm-tip. Also tell drm-misc maintainers what you've done, they
probably want to do a backmerge to clean this up a bit once the
drm-next pull request has landed.
-Daniel


Re: [PATCH v2 0/3] drm: commit_work scheduling

2020-10-05 Thread Daniel Vetter
On Mon, Oct 05, 2020 at 03:15:24PM +0300, Ville Syrjälä wrote:
> On Fri, Oct 02, 2020 at 10:55:52AM -0700, Rob Clark wrote:
> > On Fri, Oct 2, 2020 at 4:05 AM Ville Syrjälä wrote:
> > >
> > > On Fri, Oct 02, 2020 at 01:52:56PM +0300, Ville Syrjälä wrote:
> > > > On Thu, Oct 01, 2020 at 05:25:55PM +0200, Daniel Vetter wrote:
> > > > > On Thu, Oct 1, 2020 at 5:15 PM Rob Clark  wrote:
> > > > > >
> > > > > > On Thu, Oct 1, 2020 at 12:25 AM Daniel Vetter wrote:
> > > > > > >
> > > > > > > On Wed, Sep 30, 2020 at 11:16 PM Rob Clark wrote:
> > > > > > > >
> > > > > > > > From: Rob Clark
> > > > > > > >
> > > > > > > > The android userspace treats the display pipeline as a realtime
> > > > > > > > problem. And arguably, if your goal is to not miss frame deadlines
> > > > > > > > (ie. vblank), it is.  (See https://lwn.net/Articles/809545/ for the
> > > > > > > > best explanation that I found.)
> > > > > > > >
> > > > > > > > But this presents a problem with using workqueues for non-blocking
> > > > > > > > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > > > > > > > preempt the worker.  Which is not really the outcome you want.. once
> > > > > > > > the required fences are scheduled, you want to push the atomic commit
> > > > > > > > down to hw ASAP.
> > > > > > > >
> > > > > > > > But the decision of whether commit_work should be RT or not really
> > > > > > > > depends on what userspace is doing.  For a pure CFS userspace display
> > > > > > > > pipeline, commit_work() should remain SCHED_NORMAL.
> > > > > > > >
> > > > > > > > To handle this, convert non-blocking commit_work() to use per-CRTC
> > > > > > > > kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> > > > > > > > used to avoid serializing commits when userspace is using a per-CRTC
> > > > > > > > update loop.  And the last patch exposes the task id to userspace as
> > > > > > > > a CRTC property, so that userspace can adjust the priority and sched
> > > > > > > > policy to fit its needs.
> > > > > > > >
> > > > > > > >
> > > > > > > > v2: Drop client cap and in-kernel setting of priority/policy in
> > > > > > > > favor of exposing the kworker tid to userspace so that user-
> > > > > > > > space can set priority/policy.
> > > > > > >
> > > > > > > Yeah I think this looks more reasonable. Still a bit irky interface,
> > > > > > > so I'd like to get some kworker/rt ack on this. Other opens:
> > > > > > > - needs userspace, the usual drill
> > > > > >
> > > > > > fwiw, right now the userspace is "modetest + chrt".. *probably* the
> > > > > > userspace will become a standalone helper or daemon, mostly because
> > > > > > the chrome gpu-process sandbox does not allow setting SCHED_FIFO.  I'm
> > > > > > still entertaining the possibility of switching between rt and cfs
> > > > > > depending on what is in the foreground (ie. only do rt for android
> > > > > > apps).
> > > > > >
> > > > > > > - we need this also for vblank workers, otherwise this won't work for
> > > > > > > drivers needing those because of another priority inversion.
> > > > > >
> > 

Re: [PATCH] drm: bridge: dw-hdmi: Constify dw_hdmi_i2s_ops

2020-10-05 Thread Daniel Vetter
On Sun, Oct 04, 2020 at 10:06:53PM +0200, Rikard Falkeborn wrote:
> The only usage of dw_hdmi_i2s_ops is to assign its address to the ops
> field in the hdmi_codec_pdata struct, which is a const pointer. Make it
> const to allow the compiler to put it in read-only memory.
> 
> Signed-off-by: Rikard Falkeborn 

Queued up in drm-misc-next for 5.11, thanks for your patch.
-Daniel

> ---
>  drivers/gpu/drm/bridge/synopsys/dw-hdmi-i2s-audio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi-i2s-audio.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi-i2s-audio.c
> index 9fef6413741d..feb04f127b55 100644
> --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi-i2s-audio.c
> +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi-i2s-audio.c
> @@ -170,7 +170,7 @@ static int dw_hdmi_i2s_hook_plugged_cb(struct device *dev, void *data,
>   return dw_hdmi_set_plugged_cb(hdmi, fn, codec_dev);
>  }
>  
> -static struct hdmi_codec_ops dw_hdmi_i2s_ops = {
> +static const struct hdmi_codec_ops dw_hdmi_i2s_ops = {
>   .hw_params  = dw_hdmi_i2s_hw_params,
>   .audio_startup  = dw_hdmi_i2s_audio_startup,
>   .audio_shutdown = dw_hdmi_i2s_audio_shutdown,
> -- 
> 2.28.0
> 
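
The same idea in a minimal standalone form, with hypothetical stand-in types rather than the real hdmi-codec structs: because the consumer only holds a `const` pointer, the ops table itself can be `const`, which lets the compiler place it in read-only data and turns accidental writes to it into compile errors.

```c
#include <assert.h>

/* Hypothetical stand-in for struct hdmi_codec_ops: a table of callbacks. */
struct fake_codec_ops {
	int (*hw_params)(int rate);
	void (*audio_shutdown)(void);
};

/* Stand-in for struct hdmi_codec_pdata, whose ops member is a const pointer. */
struct fake_codec_pdata {
	const struct fake_codec_ops *ops;
};

static int fake_hw_params(int rate)
{
	return rate > 0 ? 0 : -1;
}

static void fake_audio_shutdown(void)
{
}

/* const: may live in read-only data; assigning to members won't compile. */
static const struct fake_codec_ops fake_i2s_ops = {
	.hw_params	= fake_hw_params,
	.audio_shutdown	= fake_audio_shutdown,
};

/* Storing its address is fine because the pointer type is const too. */
static const struct fake_codec_pdata fake_pdata = {
	.ops = &fake_i2s_ops,
};
```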



Re: [PATCH] Revert "gpu/drm: ingenic: Add option to mmap GEM buffers cached"

2020-10-05 Thread Daniel Vetter
On Mon, Oct 05, 2020 at 11:01:50PM +1100, Stephen Rothwell wrote:
> Hi Paul,
> 
> On Sun, 04 Oct 2020 22:11:23 +0200 Paul Cercueil  wrote:
> >
> > Pushed to drm-misc-next with the changelog fix, thanks.
> > 
> > Stephen:
> > Now it should build fine again. Could you remove the BROKEN flag?
> 
> Thanks for letting me know, but the fix has not appeared in any drm
> tree included in linux-next yet ...
> 
> If it doesn't show up by the time I will merge the drm tree tomorrow, I
> will apply this revert patch myself (instead of the patch marking the
> driver BROKEN).

Yeah it should have been pushed to drm-misc-next-fixes per

https://drm.pages.freedesktop.org/maintainer-tools/committer-drm-misc.html#where-do-i-apply-my-patch

Paul, can you pls git cherry-pick -x this over to drm-misc-next-fixes?

Thanks, Daniel


Re: [PATCH 13/14] drm/msm: Drop struct_mutex in shrinker path

2020-10-05 Thread Daniel Vetter
On Mon, Oct 05, 2020 at 05:24:19PM +0800, Hillf Danton wrote:
> 
> On Sun,  4 Oct 2020 12:21:45
> > From: Rob Clark 
> > 
> > Now that the inactive_list is protected by mm_lock, and everything
> > else on per-obj basis is protected by obj->lock, we no longer depend
> > on struct_mutex.
> > 
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/gpu/drm/msm/msm_gem.c  |  1 -
> >  drivers/gpu/drm/msm/msm_gem_shrinker.c | 54 --
> >  2 files changed, 55 deletions(-)
> > 
> [...]
> 
> > @@ -71,13 +33,8 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
> >  {
> > struct msm_drm_private *priv =
> > container_of(shrinker, struct msm_drm_private, shrinker);
> > -   struct drm_device *dev = priv->dev;
> > struct msm_gem_object *msm_obj;
> > unsigned long freed = 0;
> > -   bool unlock;
> > -
> > -   if (!msm_gem_shrinker_lock(dev, &unlock))
> > -   return SHRINK_STOP;
> >  
> > mutex_lock(&priv->mm_lock);
> 
> Better if the change in behavior is documented that SHRINK_STOP will
> no longer be needed.

btw I read through this and noticed you have your own obj lock, plus
mutex_lock_nested. I strongly recommend just cutting over to dma_resv_lock
for all object lock needs (soc drivers have been terrible with this,
unfortunately), and in the shrinker just use dma_resv_trylock instead of
trying to play clever games outsmarting lockdep.

I recently wrote an entire blog-length rant on why I think
mutex_lock_nested is too dangerous to be useful:

https://blog.ffwll.ch/2020/08/lockdep-false-positives.html

Nothing specific to this patch, just a general comment. The problem extends
to the shmem helpers and all that, which also have their own locks for
everything.
-Daniel


Re: [PATCH 07/14] drm/msm: Refcount submits

2020-10-05 Thread Daniel Vetter
> +
> + spin_lock(&ring->submit_lock);
> + list_del(&submit->node);
> + spin_unlock(&ring->submit_lock);
> +
> + msm_gem_submit_put(submit);
>  }
>  
>  static void retire_submits(struct msm_gpu *gpu)
> @@ -786,10 +791,6 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  
>   submit->seqno = ++ring->seqno;
>  
> - spin_lock(&ring->submit_lock);
> - list_add_tail(&submit->node, &ring->submits);
> - spin_unlock(&ring->submit_lock);
> -
>   msm_rd_dump_submit(priv->rd, submit, NULL);
>  
>   update_sw_cntrs(gpu);
> @@ -816,6 +817,16 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>   msm_gem_active_get(drm_obj, gpu);
>   }
>  
> + /*
> +  * ring->submits holds a ref to the submit, to deal with the case
> +  * that a submit completes before msm_ioctl_gem_submit() returns.
> +  */
> + msm_gem_submit_get(submit);
> +
> + spin_lock(&ring->submit_lock);
> + list_add_tail(&submit->node, &ring->submits);
> + spin_unlock(&ring->submit_lock);
> +
>   gpu->funcs->submit(gpu, submit);
>   priv->lastctx = submit->queue->ctx;
>  
> -- 
> 2.26.2
> 
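
The comment in the patch describes a common ownership pattern: the ring's submit list holds its own reference, so the submit stays valid whichever of retire or the ioctl return happens first. A minimal userspace model, assuming C11 atomics in place of kref; `struct model_submit` and its helpers are hypothetical and only mirror the roles of msm_gem_submit_get() and msm_gem_submit_put():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Userspace model of "the submits list holds its own reference".
 * Hypothetical names; they only mirror the roles of msm_gem_submit_get()
 * and msm_gem_submit_put(), and C11 atomics stand in for kref.
 */
struct model_submit {
	atomic_int refcount;
	bool *freed;		/* lets the caller observe the final put */
};

static struct model_submit *model_submit_create(bool *freed)
{
	struct model_submit *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	atomic_init(&s->refcount, 1);	/* reference owned by the ioctl path */
	s->freed = freed;
	*freed = false;
	return s;
}

static void model_submit_get(struct model_submit *s)
{
	atomic_fetch_add(&s->refcount, 1);
}

static void model_submit_put(struct model_submit *s)
{
	int old = atomic_fetch_sub(&s->refcount, 1);

	assert(old > 0);
	if (old == 1) {
		*s->freed = true;
		free(s);
	}
}

/* Queueing: the list takes a reference before the submit becomes visible. */
static void model_ring_add(struct model_submit *s)
{
	model_submit_get(s);
	/* real code: list_add_tail() under the ring's submit lock */
}

/* Retiring: unlink from the list, then drop the list's reference. */
static void model_retire(struct model_submit *s)
{
	/* real code: list_del() under the ring's submit lock */
	model_submit_put(s);
}
```

Running retire before the ioctl path drops its reference exercises exactly the race the extra list reference is meant to cover: the object survives until the last put.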



Re: [PATCH] drm/gma500: fix double free of gma_connector

2020-10-05 Thread Daniel Vetter
On Sat, Oct 03, 2020 at 12:39:28PM -0700, t...@redhat.com wrote:
> From: Tom Rix 
> 
> clang static analysis reports this problem:
> 
> cdv_intel_dp.c:2101:2: warning: Attempt to free released memory
> kfree(gma_connector);
> ^~~~
> 
> In cdv_intel_dp_init() when the call to cdv_intel_edp_panel_vdd_off()
> fails, the handler calls cdv_intel_dp_destroy(connector) which does
> the first free of gma_connector. So adjust the goto label and skip
> the second free.
> 
> Fixes: d112a8163f83 ("gma500/cdv: Add eDP support")
> Signed-off-by: Tom Rix 

Thanks for your patch, queued in drm-misc-next for 5.11.
-Daniel

> ---
>  drivers/gpu/drm/gma500/cdv_intel_dp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/gma500/cdv_intel_dp.c b/drivers/gpu/drm/gma500/cdv_intel_dp.c
> index 720a767118c9..deb4fd13591d 100644
> --- a/drivers/gpu/drm/gma500/cdv_intel_dp.c
> +++ b/drivers/gpu/drm/gma500/cdv_intel_dp.c
> @@ -2083,7 +2083,7 @@ cdv_intel_dp_init(struct drm_device *dev, struct psb_intel_mode_device *mode_dev
>   DRM_INFO("failed to retrieve link info, disabling eDP\n");
>   drm_encoder_cleanup(encoder);
>   cdv_intel_dp_destroy(connector);
> - goto err_priv;
> + goto err_connector;
>   } else {
>   DRM_DEBUG_KMS("DPCD: Rev=%x LN_Rate=%x LN_CNT=%x LN_DOWNSP=%x\n",
>   intel_dp->dpcd[0], intel_dp->dpcd[1], 
> -- 
> 2.18.1
> 



Re: [PATCH 04/14] drm/msm: Add priv->mm_lock to protect active/inactive lists

2020-10-04 Thread Daniel Vetter
cked(>dev->struct_mutex));
> +   might_sleep();
>
> if (!atomic_dec_return(&msm_obj->active_count)) {
> +   mutex_lock(&priv->mm_lock);
> msm_obj->gpu = NULL;
> list_del_init(&msm_obj->mm_list);
> list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
> +   mutex_unlock(&priv->mm_lock);
> }
>  }
>
> @@ -921,13 +927,16 @@ static void free_object(struct msm_gem_object *msm_obj)
>  {
> struct drm_gem_object *obj = &msm_obj->base;
> struct drm_device *dev = obj->dev;
> +   struct msm_drm_private *priv = dev->dev_private;
>
> WARN_ON(!mutex_is_locked(&dev->struct_mutex));
>
> /* object should not be on active list: */
> WARN_ON(is_active(msm_obj));
>
> +   mutex_lock(&priv->mm_lock);
> list_del(&msm_obj->mm_list);
> +   mutex_unlock(&priv->mm_lock);
>
> mutex_lock(&msm_obj->lock);
>
> @@ -1103,14 +1112,9 @@ static struct drm_gem_object *_msm_gem_new(struct drm_device *dev,
> mapping_set_gfp_mask(obj->filp->f_mapping, GFP_HIGHUSER);
> }
>
> -   if (struct_mutex_locked) {
> -   WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> -   list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
> -   } else {
> -   mutex_lock(&dev->struct_mutex);
> -   list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
> -   mutex_unlock(&dev->struct_mutex);
> -   }
> +   mutex_lock(&priv->mm_lock);
> +   list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
> +   mutex_unlock(&priv->mm_lock);
>
> return obj;
>
> @@ -1178,9 +1182,9 @@ struct drm_gem_object *msm_gem_import(struct drm_device *dev,
>
> mutex_unlock(&msm_obj->lock);
>
> -   mutex_lock(&dev->struct_mutex);
> +   mutex_lock(&priv->mm_lock);
> list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
> -   mutex_unlock(&dev->struct_mutex);
> +   mutex_unlock(&priv->mm_lock);
>
> return obj;
>
> diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c b/drivers/gpu/drm/msm/msm_gem_shrinker.c
> index 482576d7a39a..c41b84a3a484 100644
> --- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
> +++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
> @@ -51,11 +51,15 @@ msm_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
> if (!msm_gem_shrinker_lock(dev, &unlock))
> return 0;
>
> +   mutex_lock(&priv->mm_lock);
> +
> list_for_each_entry(msm_obj, &priv->inactive_list, mm_list) {
> if (is_purgeable(msm_obj))
> count += msm_obj->base.size >> PAGE_SHIFT;
> }
>
> +   mutex_unlock(&priv->mm_lock);
> +
> if (unlock)
> mutex_unlock(&dev->struct_mutex);
>
> @@ -75,6 +79,8 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
> if (!msm_gem_shrinker_lock(dev, &unlock))
> return SHRINK_STOP;
>
> +   mutex_lock(&priv->mm_lock);
> +
> list_for_each_entry(msm_obj, &priv->inactive_list, mm_list) {
> if (freed >= sc->nr_to_scan)
> break;
> @@ -84,6 +90,8 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
> }
> }
>
> +   mutex_unlock(&priv->mm_lock);
> +
> if (unlock)
> mutex_unlock(&dev->struct_mutex);
>
> @@ -106,6 +114,8 @@ msm_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr)
> if (!msm_gem_shrinker_lock(dev, &unlock))
> return NOTIFY_DONE;
>
> +   mutex_lock(&priv->mm_lock);
> +
> list_for_each_entry(msm_obj, &priv->inactive_list, mm_list) {
> if (is_vunmapable(msm_obj)) {
> msm_gem_vunmap(&msm_obj->base, OBJ_LOCK_SHRINKER);
> @@ -118,6 +128,8 @@ msm_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr)
> }
> }
>
> +   mutex_unlock(&priv->mm_lock);
> +
> if (unlock)
> mutex_unlock(&dev->struct_mutex);
>
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 6c9e1fdc1a76..1806e87600c0 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -94,7 +94,10 @@ struct msm_gpu {
> struct msm_ringbuffer *rb[MSM_GPU_MAX_RINGS];
> int nr_rings;
>
> -   /* list of GEM active objects: */
> +   /*
> +* List of GEM active objects on this gpu.  Protected by
> +* msm_drm_private::mm_lock
> +*/
> struct list_head active_list;
>
> /* does gpu need hw_init? */
> --
> 2.26.2
>




Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-04 Thread Daniel Vetter
On Sun, Oct 4, 2020 at 2:51 PM Jason Gunthorpe  wrote:
>
> On Sat, Oct 03, 2020 at 11:40:22AM +0200, Daniel Vetter wrote:
>
> > > That leaves the only interesting places as vb2_dc_get_userptr() and
> > > vb2_vmalloc_get_userptr() which both completely fail to follow the
> > > REQUIRED behavior in the function's comment about checking PTEs. It
> > > just DMA maps them. Badly broken.
> > >
> > > Guessing this hackery is for some embedded P2P DMA transfer?
> >
> > Yeah, see also the follow_pfn trickery in
> > videobuf_dma_contig_user_get(), I think this is fully intentional and
> > userspace abi we can't break :-/
>
> We don't need to break uABI, it just needs to work properly in the
> kernel:
>
>   vma = find_vma_intersection()
>   dma_buf = dma_buf_get_from_vma(vma)
>   sg = dma_buf_p2p_dma_map(dma_buf)
>   [.. do dma ..]
>   dma_buf_unmap(sg)
>   dma_buf_put(dma_buf)
>
> It is as we discussed before, dma buf needs to be discoverable from a
> VMA, at least for users doing this kind of stuff.

I'm not a big fan of magic behaviour like this, there's more to
dma-buf buffer sharing than just "how do I get at the backing
storage". Thus far we've done everything rather explicitly. Plus with
exynos and habanalabs converted there's only v4l left over, and that
has a proper dma-buf import path already.

> > Yup this should be done with dma_buf instead, and v4l has that. But
> > old uapi and all that. This is why I said we might need a new
> > VM_DYNAMIC_PFNMAP or so, to make follow_pfn not resolve this in the
> > case where the driver manages the underlying iomem range (or whatever
> > it is) dynamically and moves buffer objects around, like drm drivers
> > do. But I looked, and we've run out of vma->vm_flags :-(
>
> A VM flag doesn't help - we need to introduce some kind of lifetime,
> and that has to be derived from the VMA. It needs data not just a flag

I don't want to make it work, I just want to make it fail. Rough idea
I have in mind is to add a follow_pfn_longterm, for all callers which
aren't either synchronized through mmap_sem or an mmu_notifier. If
this really breaks anyone's use-case we can add a tainting kernel
option which re-enables this (we've done something similar for
phys_addr_t based buffer sharing in fbdev, entirely unfixable since
the other driver has to just blindly trust that what userspace passes
around is legit). This here isn't unfixable, but if v4l people want to
keep it without a big "security hole here" sticker, they should do the
work, not me :-)

> > The other problem is that I also have no real working clue about all
> > the VM_* flags and what they all mean, and whether drm drivers set the
> > right ones in all cases (they probably don't, but oh well).
> > Documentation for this stuff in headers is a bit thin at times.
>
> Yah, I don't really know either :\
>
> The comment above vm_normal_page() is a bit helpful. Don't know what
> VM_IO/VM_PFNMAP mean in their 3 combinations
>
> There are very few places that set VM_PFNMAP without VM_IO..

Best I could find is:
- m68k seems to outright reject page faults on VM_IO vmas
- some places set VM_IO together with VM_MIXEDMAP instead of
VM_PFNMAP. There are some comments saying this makes COW possible, but I
guess that's for the old pfn remap vma (remap_file_pages, which has since
been removed). But really no clue.

VM_IO | VM_MIXEDMAP kinda makes me wonder whether follow_pfn gets the
page refcounting all right or horribly wrong in some cases ...
-Daniel


Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-04 Thread Daniel Vetter
On Sun, Oct 4, 2020 at 1:24 AM Jason Gunthorpe  wrote:
> On Sat, Oct 03, 2020 at 03:52:32PM -0700, John Hubbard wrote:
> > On 10/3/20 2:45 AM, Daniel Vetter wrote:
> > > On Sat, Oct 3, 2020 at 12:39 AM John Hubbard  wrote:
> > > >
> > > > On 10/2/20 10:53 AM, Daniel Vetter wrote:
> > > > > For $reasons I've stumbled over this code and I'm not sure the change
> > > > > to the new gup functions in 55a650c35fea ("mm/gup: frame_vector:
> > > > > convert get_user_pages() --> pin_user_pages()") was entirely correct.
> > > > >
> > > > > This here is used for long term buffers (not just quick I/O) like
> > > > > RDMA, and John notes this in his patch. But I thought the rule for
> > > > > these is that they need to add FOLL_LONGTERM, which John's patch
> > > > > didn't do.
> > > >
> > > > Yep. The earlier gup --> pup conversion patches were intended to not
> > > > have any noticeable behavior changes, and FOLL_LONGTERM, with its
> > > > special cases and such, added some risk that I wasn't ready to take
> > > > on yet. Also, FOLL_LONGTERM rules are only *recently* getting firmed
> > > > up. So there was some doubt at least in my mind, about which sites
> > > > should have it.
> > > >
> > > > But now that we're here, I think it's really good that you've brought
> > > > this up. It's definitely time to add FOLL_LONGTERM wherever it's 
> > > > missing.
> > >
> > > So should I keep this patch, or will it collide with a series you're 
> > > working on?
> >
> > It doesn't collide with anything on my end yet, because I've been slow to
> > pick up on the need for changing callsites to add FOLL_LONGTERM. :)
> >
> > And it looks like that's actually a problem, because:
> >
> > >
> > > Also with the firmed up rules, correct that I can also drop the
> > > vma_is_fsdax check when the FOLL_LONGTERM flag is set?
> >
> > That's the right direction to go *in general*, but I see that the
> > pin_user_pages code is still a bit stuck in the past. And this patch
> > won't actually work, with or without that vma_is_fsdax() check.
> > Because:
> >
> > get_vaddr_frames(FOLL_LONGTERM)
> >pin_user_pages_locked()
> >   if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
> >   return -EINVAL;
>
> There is no particular reason this code needs to have the mm sem at
> that point.
>
> It should call pin_user_pages_fast() and only if that fails get the mmap
> lock and extract the VMA to do broken hackery.

Yeah I think that works. I tried to understand the gup.c code a bit more,
and it looks like FOLL_LONGTERM only works for the pup_fast variant
right now? All the others seem to have a comment saying it's somehow
incompatible with FAULT_FLAG_ALLOW_RETRY and daxfs. But grepping
around for that didn't turn up anything, at least not in the nearby dax code.
For my understanding of all this, what's the hiccup there?

For plans, I only started this for a bit of my own learning, but I
think I'll respin with the following changes:
- convert exynos and habanalabs to pin_user_pages_fast directly,
instead of going through this frame-vector detour
- move the locking and convert get_vaddr_frames to pup_fast as Jason suggested
- hack up some truly gross rfc to plug the follow_pfn hole
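
The fast-path-first structure Jason suggests can be sketched as plain control flow. All `model_*` names below are hypothetical and the real gup functions have different signatures. This sketch keeps whatever the lockless path managed to pin and hands only the remainder to the locked fallback; real code might instead unpin and retry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Control-flow sketch of "try the lockless pin first, only take mmap_lock
 * and look at the VMA if that fails". Hypothetical names throughout.
 */
struct model_mm {
	bool locked;			/* models mmap_lock being held */
	size_t fast_pinnable;		/* pages the lockless path can pin */
	unsigned int slow_calls;	/* how often the locked fallback ran */
};

/* Lockless fast path: may pin fewer pages than requested. */
static size_t model_pin_fast(struct model_mm *mm, size_t nr)
{
	assert(!mm->locked);
	return nr < mm->fast_pinnable ? nr : mm->fast_pinnable;
}

/* Locked slow path: the only place that needs mmap_lock and the VMA. */
static size_t model_slow_path(struct model_mm *mm, size_t nr)
{
	assert(mm->locked);
	mm->slow_calls++;
	return nr;	/* pretend the VMA-based handling covers the rest */
}

static size_t model_get_frames(struct model_mm *mm, size_t nr)
{
	size_t done = model_pin_fast(mm, nr);

	if (done == nr)
		return done;		/* common case: no lock taken at all */

	mm->locked = true;		/* mmap_read_lock() in real code */
	done += model_slow_path(mm, nr - done);
	mm->locked = false;		/* mmap_read_unlock() in real code */
	return done;
}
```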

Cheers, Daniel



Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-03 Thread Daniel Vetter
On Sat, Oct 3, 2020 at 12:39 AM John Hubbard  wrote:
>
> On 10/2/20 10:53 AM, Daniel Vetter wrote:
> > For $reasons I've stumbled over this code and I'm not sure the change
> > to the new gup functions in 55a650c35fea ("mm/gup: frame_vector:
> > convert get_user_pages() --> pin_user_pages()") was entirely correct.
> >
> > This here is used for long term buffers (not just quick I/O) like
> > RDMA, and John notes this in his patch. But I thought the rule for
> > these is that they need to add FOLL_LONGTERM, which John's patch
> > didn't do.
>
> Yep. The earlier gup --> pup conversion patches were intended to not
> have any noticeable behavior changes, and FOLL_LONGTERM, with its
> special cases and such, added some risk that I wasn't ready to take
> on yet. Also, FOLL_LONGTERM rules are only *recently* getting firmed
> up. So there was some doubt at least in my mind, about which sites
> should have it.
>
> But now that we're here, I think it's really good that you've brought
> this up. It's definitely time to add FOLL_LONGTERM wherever it's missing.

So should I keep this patch, or will it collide with a series you're working on?

Also with the firmed up rules, correct that I can also drop the
vma_is_fsdax check when the FOLL_LONGTERM flag is set?

Thanks, Daniel

>
> thanks,
> --
> John Hubbard
> NVIDIA
>
> >
> > There is already a dax specific check (added in b7f0554a56f2 ("mm:
> > fail get_vaddr_frames() for filesystem-dax mappings")), so this seems
> > like the prudent thing to do.
> >
> > Signed-off-by: Daniel Vetter 
> > Cc: Andrew Morton 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Jan Kara 
> > Cc: Dan Williams 
> > Cc: linux...@kvack.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-samsung-...@vger.kernel.org
> > Cc: linux-me...@vger.kernel.org
> > ---
> > Hi all,
> >
> > I stumbled over this and figured typing this patch can't hurt. Really
> > just to maybe learn a few things about how gup/pup is supposed to be
> > used (we have a bit of that in drivers/gpu), this here isn't really
> > related to anything I'm doing.
> >
> > I'm also wondering whether the explicit dax check should be removed,
> > since FOLL_LONGTERM should take care of that already.
> > -Daniel
> > ---
> >   mm/frame_vector.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> > index 5d34c9047e9c..3507e09cb3ff 100644
> > --- a/mm/frame_vector.c
> > +++ b/mm/frame_vector.c
> > @@ -35,7 +35,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> > nr_frames,
> >   {
> >   struct mm_struct *mm = current->mm;
> >   struct vm_area_struct *vma;
> > - unsigned int gup_flags = FOLL_WRITE | FOLL_FORCE;
> > + unsigned int gup_flags = FOLL_WRITE | FOLL_FORCE | FOLL_LONGTERM;
> >   int ret = 0;
> >   int err;
> >   int locked;
> >
>




Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-03 Thread Daniel Vetter
On Sat, Oct 3, 2020 at 1:31 AM Jason Gunthorpe  wrote:
>
> On Fri, Oct 02, 2020 at 08:16:48PM +0200, Daniel Vetter wrote:
> > On Fri, Oct 2, 2020 at 8:06 PM Jason Gunthorpe  wrote:
> > > On Fri, Oct 02, 2020 at 07:53:03PM +0200, Daniel Vetter wrote:
> > > > For $reasons I've stumbled over this code and I'm not sure the change
> > > > to the new gup functions in 55a650c35fea ("mm/gup: frame_vector:
> > > > convert get_user_pages() --> pin_user_pages()") was entirely correct.
> > > >
> > > > This here is used for long term buffers (not just quick I/O) like
> > > > RDMA, and John notes this in his patch. But I thought the rule for
> > > > these is that they need to add FOLL_LONGTERM, which John's patch
> > > > didn't do.
> > > >
> > > > There is already a dax specific check (added in b7f0554a56f2 ("mm:
> > > > fail get_vaddr_frames() for filesystem-dax mappings")), so this seems
> > > > like the prudent thing to do.
> > > >
> > > > Signed-off-by: Daniel Vetter 
> > > > Cc: Andrew Morton 
> > > > Cc: John Hubbard 
> > > > Cc: Jérôme Glisse 
> > > > Cc: Jan Kara 
> > > > Cc: Dan Williams 
> > > > Cc: linux...@kvack.org
> > > > Cc: linux-arm-ker...@lists.infradead.org
> > > > Cc: linux-samsung-...@vger.kernel.org
> > > > Cc: linux-me...@vger.kernel.org
> > > > Hi all,
> > > >
> > > > I stumbled over this and figured typing this patch can't hurt. Really
> > > > just to maybe learn a few things about how gup/pup is supposed to be
> > > > used (we have a bit of that in drivers/gpu), this here isn't really
> > > > related to anything I'm doing.
> > >
> > > FOLL_FORCE is a pretty big clue it should be FOLL_LONGTERM, IMHO
> >
> > Since you're here ... I've noticed that ib sets FOLL_FORCE when the ib
> > verb access mode indicates possible writes. I'm not really clear on
> > why FOLL_WRITE isn't enough and why you need to be able to write
> > through a vma that's write protected currently.
>
> Ah, FOLL_FORCE | FOLL_WRITE means *read* confusingly enough, and the
> only reason you'd want this version for read is if you are doing
> longterm stuff. I wrote about this recently:
>
> https://lore.kernel.org/linux-mm/20200928235739.gu9...@ziepe.ca/

Ah, so essentially it serves as a FOLL_GET_COW_ISSUES_OUT_OF_MY_WAY. I
think documentation for this, and/or just automatically adding the
flag set combination would be really good. But I see John is already
on top of that it seems.

Thanks for the explainer.

> > > Since every driver does this wrong anything that uses this is creating
> > > terrifying security issues.
> > >
> > > IMHO this whole API should be deleted :(
> >
> > Yeah that part I just tried to conveniently ignore. I guess this dates
> > back to a time when ioremaps were at best fixed, and there wasn't
> > anything like a gpu driver dynamically managing vram around, resulting
> > in random entirely unrelated things possibly being mapped to that set
> > of pfns.
>
> No, it was always wrong. Prior to GPU like cases the lifetime of the
> PTE was tied to the vma and when the vma becomes free the driver can
> move the things in the PTEs to 'free'. Easy to trigger use-after-free
> issues and devices like RDMA have security contexts attached to these
> PTEs so it becomes a serious security bug to do something like this.
>
> > The underlying follow_pfn is also used in other places within
> > drivers/media, so this doesn't seem to be an accident, but actually
> > intentional.
>
> Looking closely, there are very few users, most *seem* pointless, but
> maybe there is a crazy reason?
>
> The sequence
>   get_vaddr_frames();
>   frame_vector_to_pages();
>   sg_alloc_table_from_pages();
>
> Should be written
>   pin_user_pages_fast(FOLL_LONGTERM);
>   sg_alloc_table_from_pages()

Ok, that takes care of exynos and habanalabs. I'll try to whack
together a patch for exynos, that driver is a bit special - first arm
soc driver and we merged it fully well aware that it's full of warts,
just as a show to make it clear that drivers/gpu is also interested in
small gpu drivers ...
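
For reference, sg_alloc_table_from_pages() squashes runs of physically contiguous pages into a single scatterlist entry, which is part of why the two-call sequence Jason quotes is enough. A userspace model of just that merging step, assuming plain pfn values in place of `struct page` pointers; `struct model_seg` and `model_build_segments()` are hypothetical, and the per-segment size limit the kernel helper also applies is ignored:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of the page-merging step only: contiguous pfns collapse into one
 * segment. Hypothetical names; no maximum segment size is enforced here.
 */
struct model_seg {
	unsigned long first_pfn;
	size_t npages;
};

/* Returns the number of segments written to segs (at most n). */
static size_t model_build_segments(const unsigned long *pfns, size_t n,
				   struct model_seg *segs)
{
	size_t nsegs = 0;

	for (size_t i = 0; i < n; i++) {
		if (nsegs > 0 &&
		    segs[nsegs - 1].first_pfn + segs[nsegs - 1].npages == pfns[i]) {
			segs[nsegs - 1].npages++;	/* extends the current run */
			continue;
		}
		segs[nsegs].first_pfn = pfns[i];	/* starts a new segment */
		segs[nsegs].npages = 1;
		nsegs++;
	}
	assert(nsegs <= n);
	return nsegs;
}
```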

> There is some 'special' code in frame_vector_to_pages() that tries to
> get a struct page for things from a VM_IO or VM_PFNMAP...
>
> Oh snap, that is *completely* broken! If the first VMA is IO|PFNMAP
> then get_vaddr_frames() iterates over all VMAs in the range, of any
> kind and extracts the PTEs then blindly references them! This means it
> 

Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-02 Thread Daniel Vetter
On Fri, Oct 2, 2020 at 8:06 PM Jason Gunthorpe  wrote:
> On Fri, Oct 02, 2020 at 07:53:03PM +0200, Daniel Vetter wrote:
> > For $reasons I've stumbled over this code and I'm not sure the change
> > to the new gup functions in 55a650c35fea ("mm/gup: frame_vector:
> > convert get_user_pages() --> pin_user_pages()") was entirely correct.
> >
> > This here is used for long term buffers (not just quick I/O) like
> > RDMA, and John notes this in his patch. But I thought the rule for
> > these is that they need to add FOLL_LONGTERM, which John's patch
> > didn't do.
> >
> > There is already a dax specific check (added in b7f0554a56f2 ("mm:
> > fail get_vaddr_frames() for filesystem-dax mappings")), so this seems
> > like the prudent thing to do.
> >
> > Signed-off-by: Daniel Vetter 
> > Cc: Andrew Morton 
> > Cc: John Hubbard 
> > Cc: Jérôme Glisse 
> > Cc: Jan Kara 
> > Cc: Dan Williams 
> > Cc: linux...@kvack.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-samsung-...@vger.kernel.org
> > Cc: linux-me...@vger.kernel.org
> > Hi all,
> >
> > I stumbled over this and figured typing this patch can't hurt. Really
> > just to maybe learn a few things about how gup/pup is supposed to be
> > used (we have a bit of that in drivers/gpu), this here isn't really
> > related to anything I'm doing.
>
> FOLL_FORCE is a pretty big clue it should be FOLL_LONGTERM, IMHO

Since you're here ... I've noticed that ib sets FOLL_FORCE when the ib
verb access mode indicates possible writes. I'm not really clear on
why FOLL_WRITE isn't enough and why you need to be able to write
through a vma that's write protected currently.

> > I'm also wondering whether the explicit dax check should be removed,
> > since FOLL_LONGTERM should take care of that already.
>
> Yep! Confirms the above!
>
> This get_vaddr_frames() thing looks impossible to use properly. How on
> earth does a driver guarantee
>
>  "If @start belongs to VM_IO | VM_PFNMAP vma, we don't touch page
>  structures and the caller must make sure pfns aren't reused for
>  anything else while he is using them."
>
> The only possible way to do that is if the driver restricts the VMAs
> to ones it owns and interacts with the vm_private data to refcount
> something.
>
> Since every driver does this wrong anything that uses this is creating
> terrifying security issues.
>
> IMHO this whole API should be deleted :(
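
Jason's point is that a driver can only make the "pfns aren't reused" guarantee for VMAs it created itself. A minimal model of that check, assuming hypothetical `model_*` types that mirror the `vm_ops` and `vm_private_data` fields in name only; a real driver would use kref or refcount_t rather than a plain int:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model only: trust a VMA's private data only if the VMA is ours, and take
 * a reference on the backing object before using its pfns.
 */
struct model_vm_ops {
	int unused;
};

struct model_buffer {
	int refcount;
	unsigned long first_pfn;
};

struct model_vma {
	const struct model_vm_ops *vm_ops;
	void *vm_private_data;
};

static const struct model_vm_ops model_driver_vm_ops;

/* Returns the referenced buffer, or NULL if the VMA isn't ours. */
static struct model_buffer *model_buffer_get_from_vma(struct model_vma *vma)
{
	struct model_buffer *buf;

	if (vma->vm_ops != &model_driver_vm_ops)
		return NULL;	/* foreign mapping: nothing keeps its pfns alive */

	buf = vma->vm_private_data;
	assert(buf != NULL);
	buf->refcount++;	/* keeps the backing storage from being reused */
	return buf;
}

static void model_buffer_put(struct model_buffer *buf)
{
	assert(buf->refcount > 0);
	buf->refcount--;
}
```

Any VMA that fails the ownership check has to be rejected, since nothing ties the lifetime of its pfns to the caller.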

Yeah that part I just tried to conveniently ignore. I guess this dates
back to a time when ioremaps were at best fixed, and there wasn't
anything like a gpu driver dynamically managing vram around, resulting
in random entirely unrelated things possibly being mapped to that set
of pfns.

The underlying follow_pfn is also used in other places within
drivers/media, so this doesn't seem to be an accident, but actually
intentional.

I guess minimally we'd need a VM_PFNMAP flag for dynamically managed
drivers like modern drm gpu drivers, to make sure follow_pfn doesn't
follow these?
-Daniel


[PATCH 1/2] mm/frame-vec: Drop gup_flags from get_vaddr_frames()

2020-10-02 Thread Daniel Vetter
FOLL_WRITE | FOLL_FORCE is really the only reasonable thing to do for a
simple DMA device that can't guarantee write protection, and it is also
what all the callers already pass.

So just simplify this.

Signed-off-by: Daniel Vetter 
Cc: Inki Dae 
Cc: Joonyoung Shim 
Cc: Seung-Woo Kim 
Cc: Kyungmin Park 
Cc: Kukjin Kim 
Cc: Krzysztof Kozlowski 
Cc: Pawel Osciak 
Cc: Marek Szyprowski 
Cc: Tomasz Figa 
Cc: Andrew Morton 
Cc: Oded Gabbay 
Cc: Omer Shpigelman 
Cc: Tomer Tayar 
Cc: Greg Kroah-Hartman 
Cc: Pawel Piskorski 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: linux...@kvack.org
---
 drivers/gpu/drm/exynos/exynos_drm_g2d.c   | 3 +--
 drivers/media/common/videobuf2/videobuf2-memops.c | 3 +--
 drivers/misc/habanalabs/common/memory.c   | 3 +--
 include/linux/mm.h| 2 +-
 mm/frame_vector.c | 4 ++--
 5 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index 967a5cdc120e..ac452842bab3 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -480,8 +480,7 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data 
*g2d,
goto err_free;
}
 
-   ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-   g2d_userptr->vec);
+   ret = get_vaddr_frames(start, npages, g2d_userptr->vec);
if (ret != npages) {
DRM_DEV_ERROR(g2d->dev,
  "failed to get user pages from userptr.\n");
diff --git a/drivers/media/common/videobuf2/videobuf2-memops.c 
b/drivers/media/common/videobuf2/videobuf2-memops.c
index 6e9e05153f4e..9dd6c27162f4 100644
--- a/drivers/media/common/videobuf2/videobuf2-memops.c
+++ b/drivers/media/common/videobuf2/videobuf2-memops.c
@@ -40,7 +40,6 @@ struct frame_vector *vb2_create_framevec(unsigned long start,
unsigned long first, last;
unsigned long nr;
struct frame_vector *vec;
-   unsigned int flags = FOLL_FORCE | FOLL_WRITE;
 
first = start >> PAGE_SHIFT;
last = (start + length - 1) >> PAGE_SHIFT;
@@ -48,7 +47,7 @@ struct frame_vector *vb2_create_framevec(unsigned long start,
vec = frame_vector_create(nr);
if (!vec)
return ERR_PTR(-ENOMEM);
-   ret = get_vaddr_frames(start & PAGE_MASK, nr, flags, vec);
+   ret = get_vaddr_frames(start & PAGE_MASK, nr, vec);
if (ret < 0)
goto out_destroy;
/* We accept only complete set of PFNs */
diff --git a/drivers/misc/habanalabs/common/memory.c 
b/drivers/misc/habanalabs/common/memory.c
index 5ff4688683fd..43b10aee8150 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1287,8 +1287,7 @@ static int get_user_memory(struct hl_device *hdev, u64 
addr, u64 size,
return -ENOMEM;
}
 
-   rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-   userptr->vec);
+   rc = get_vaddr_frames(start, npages, userptr->vec);
 
if (rc != npages) {
dev_err(hdev->dev,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 16b799a0522c..7d14aa2780d2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1757,7 +1757,7 @@ struct frame_vector {
 struct frame_vector *frame_vector_create(unsigned int nr_frames);
 void frame_vector_destroy(struct frame_vector *vec);
 int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
-unsigned int gup_flags, struct frame_vector *vec);
+struct frame_vector *vec);
 void put_vaddr_frames(struct frame_vector *vec);
 int frame_vector_to_pages(struct frame_vector *vec);
 void frame_vector_to_pfns(struct frame_vector *vec);
diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 10f82d5643b6..5d34c9047e9c 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -12,7 +12,6 @@
  * get_vaddr_frames() - map virtual addresses to pfns
  * @start: starting user address
  * @nr_frames: number of pages / pfns from start to map
- * @gup_flags: flags modifying lookup behaviour
  * @vec:   structure which receives pages / pfns of the addresses mapped.
  * It should have space for at least nr_frames entries.
  *
@@ -32,10 +31,11 @@
  * This function takes care of grabbing mmap_lock as necessary.
  */
 int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
-unsigned int gup_flags, struct frame_vector *vec)
+struct frame_vector *vec)
 {
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
+   unsigned int gup_flags = FOLL_WRITE | FOLL_FORCE;
int ret = 0;
int err;
int locked;
-- 
2.28.0



[PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM

2020-10-02 Thread Daniel Vetter
For $reasons I've stumbled over this code and I'm not sure the change
to the new gup functions in 55a650c35fea ("mm/gup: frame_vector:
convert get_user_pages() --> pin_user_pages()") was entirely correct.

This here is used for long term buffers (not just quick I/O) like
RDMA, and John notes this in his patch. But I thought the rule for
these is that they need to add FOLL_LONGTERM, which John's patch
didn't do.

There is already a dax specific check (added in b7f0554a56f2 ("mm:
fail get_vaddr_frames() for filesystem-dax mappings")), so this seems
like the prudent thing to do.

Signed-off-by: Daniel Vetter 
Cc: Andrew Morton 
Cc: John Hubbard 
Cc: Jérôme Glisse 
Cc: Jan Kara 
Cc: Dan Williams 
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
---
Hi all,

I stumbled over this and figured typing this patch can't hurt. Really
just to maybe learn a few things about how gup/pup is supposed to be
used (we have a bit of that in drivers/gpu), this here isn't really
related to anything I'm doing.

I'm also wondering whether the explicit dax check should be removed,
since FOLL_LONGTERM should take care of that already.
-Daniel
---
 mm/frame_vector.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 5d34c9047e9c..3507e09cb3ff 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -35,7 +35,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 
nr_frames,
 {
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
-   unsigned int gup_flags = FOLL_WRITE | FOLL_FORCE;
+   unsigned int gup_flags = FOLL_WRITE | FOLL_FORCE | FOLL_LONGTERM;
int ret = 0;
int err;
int locked;
-- 
2.28.0



Re: [PATCH v3 0/2] Add configurable handler to execute a compound action

2020-10-02 Thread Daniel Vetter
On Fri, Oct 02, 2020 at 02:36:33PM +0200, Andrzej Pietrasiewicz wrote:
> W dniu 02.10.2020 o 14:33, Andrzej Pietrasiewicz pisze:
> > W dniu 02.10.2020 o 14:31, Greg Kroah-Hartman pisze:
> > > On Tue, Aug 18, 2020 at 01:28:23PM +0200, Andrzej Pietrasiewicz wrote:
> > > > This is a follow-up of this thread:
> > > > 
> > > > https://www.spinics.net/lists/linux-input/msg68446.html
> > > 
> > > lore.kernel.org is easier to pull stuff from :)
> > > 
> > > Anyway, what ever happened to this series?  Is there a newer one
> > > somewhere?
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > > 
> > 
> > https://lkml.org/lkml/2020/8/18/440
> > 
> > Andrzej
> 
> Sorry about confusion.
> 
> This is the same thing, so there is nothing newer.

Maybe split out the s/V/v/ in drm so I can pick that up? Alternatively
Acked-by: Daniel Vetter  if Greg takes it all.

Cheers, Daniel


Re: [RFC PATCH 0/4] Add a RPMsg driver to support AI Processing Unit (APU)

2020-10-02 Thread Daniel Vetter
On Thu, Oct 01, 2020 at 07:28:27PM +0200, Alexandre Bailon wrote:
> Hi Daniel,
> 
> On 10/1/20 10:48 AM, Daniel Vetter wrote:
> > On Wed, Sep 30, 2020 at 01:53:46PM +0200, Alexandre Bailon wrote:
> > > This adds an RPMsg driver that implements communication between the CPU
> > > and an APU.
> > > This uses VirtIO buffers to exchange messages, but for sharing data it uses
> > > a dmabuf, mapped to be shared between the CPU (userspace) and the APU.
> > > The driver is relatively generic and should work with any SoC implementing
> > > a hardware accelerator for AI, provided it supports remoteproc and VirtIO.
> > > 
> > > For the people interested by the firmware or userspace library,
> > > the sources are available here:
> > > https://github.com/BayLibre/open-amp/tree/v2020.01-mtk/apps/examples/apu
> > Since this has open userspace (from a very cursory look), and smells very
> > much like an acceleration driver, and seems to use dma-buf for memory
> > management: Why is this not just a drm driver?
> 
> I had never thought of DRM since, for me, it was only an RPMsg driver.
> I don't know DRM well. Could you tell me how you would do it so I could have
> a look?

Well internally it would still be an rpmsg driver ... I'm assuming that's
kinda similar to how most gpu drivers sit on top of a pci_device or a
platform_device, it's just a means to get at your "device"?

The part I'm talking about here is the userspace api. You're creating an
entirely new chardev interface, which at least from a quick look seems to
be based on dma-buf buffers and used to submit commands to your device to
do some kind of computing/processing. That's exactly what drivers/gpu/drm
does (if you ignore the display/modeset side of things) - at the kernel
level gpus have nothing to do with graphics, but all with handling buffer
objects and throwing workloads at some kind of accelerator thing.

Of course that's just my guess of what's going on, after scrolling through
your driver and userspace a bit, I might be completely off. But if my
guess is roughly right, then your driver is internally an rpmsg
driver, but towards userspace it should be a drm driver.

Cheers, Daniel

> 
> Thanks,
> Alexandre
> 
> > -Daniel
> > 
> > > Alexandre Bailon (3):
> > >   Add a RPMSG driver for the APU in the mt8183
> > >   rpmsg: apu_rpmsg: update the way to store IOMMU mapping
> > >   rpmsg: apu_rpmsg: Add an IOCTL to request IOMMU mapping
> > > 
> > > Julien STEPHAN (1):
> > >   rpmsg: apu_rpmsg: Add support for async apu request
> > > 
> > >   drivers/rpmsg/Kconfig  |   9 +
> > >   drivers/rpmsg/Makefile |   1 +
> > >   drivers/rpmsg/apu_rpmsg.c  | 752 +
> > >   drivers/rpmsg/apu_rpmsg.h  |  52 +++
> > >   include/uapi/linux/apu_rpmsg.h |  47 +++
> > >   5 files changed, 861 insertions(+)
> > >   create mode 100644 drivers/rpmsg/apu_rpmsg.c
> > >   create mode 100644 drivers/rpmsg/apu_rpmsg.h
> > >   create mode 100644 include/uapi/linux/apu_rpmsg.h
> > > 
> > > -- 
> > > 2.26.2
> > > 
> > > ___
> > > dri-devel mailing list
> > > dri-de...@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel



Re: [PATCH v2 0/3] drm: commit_work scheduling

2020-10-01 Thread Daniel Vetter
On Thu, Oct 1, 2020 at 5:15 PM Rob Clark  wrote:
>
> On Thu, Oct 1, 2020 at 12:25 AM Daniel Vetter  wrote:
> >
> > On Wed, Sep 30, 2020 at 11:16 PM Rob Clark  wrote:
> > >
> > > From: Rob Clark 
> > >
> > > The android userspace treats the display pipeline as a realtime problem.
> > > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > > it is.  (See https://lwn.net/Articles/809545/ for the best explanation
> > > that I found.)
> > >
> > > But this presents a problem with using workqueues for non-blocking
> > > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > > preempt the worker.  Which is not really the outcome you want.. once
> > > the required fences are scheduled, you want to push the atomic commit
> > > down to hw ASAP.
> > >
> > > But the decision of whether commit_work should be RT or not really
> > > depends on what userspace is doing.  For a pure CFS userspace display
> > > pipeline, commit_work() should remain SCHED_NORMAL.
> > >
> > > To handle this, convert non-blocking commit_work() to use per-CRTC
> > > kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> > > used to avoid serializing commits when userspace is using a per-CRTC
> > > update loop.  And the last patch exposes the task id to userspace as
> > > a CRTC property, so that userspace can adjust the priority and sched
> > > policy to fit its needs.
> > >
> > >
> > > v2: Drop client cap and in-kernel setting of priority/policy in
> > > favor of exposing the kworker tid to userspace so that user-
> > > space can set priority/policy.
> >
> > Yeah I think this looks more reasonable. Still a bit irky interface,
> > so I'd like to get some kworker/rt ack on this. Other opens:
> > - needs userspace, the usual drill
>
> fwiw, right now the userspace is "modetest + chrt".. *probably* the
> userspace will become a standalone helper or daemon, mostly because
> the chrome gpu-process sandbox does not allow setting SCHED_FIFO.  I'm
> still entertaining the possibility of switching between rt and cfs
> depending on what is in the foreground (ie. only do rt for android
> apps).
>
> > - we need this also for vblank workers, otherwise this won't work for
> > drivers needing those because of another priority inversion.
>
> I have a thought on that, see below..

Hm, not seeing anything about vblank worker below?

> > - we probably want some indication of whether this actually does
> > something useful, not all drivers use atomic commit helpers. Not sure
> > how to do that.
>
> I'm leaning towards converting the other drivers over to use the
> per-crtc kwork, and then dropping the `commit_work` from atomic state.
> I can add a patch to that, but figured I could postpone that churn
> until there is some buy-in on this whole idea.

i915 has its own commit code, it's not even using the current commit
helpers (nor the commit_work). Not sure how much other fun there is.

> > - not sure whether the vfunc is an awesome idea, I'd frankly just
> > open-code this inline. We have similar special cases already for e.g.
> > dpms (and in multiple places), this isn't the worst.
>
> I could introduce a "PID" property type.  This would be useful if we
> wanted to also expose the vblank workers.

Hm right, but I think we need at most two: the commit thread and the
vblank thread (at least with the current design). So an open-coded
if (prop == crtc_worker_pid_prop || prop == crtc_vblank_worker_pid_prop)
check doesn't seem too horrible to me. Otherwise people start creating
really funny stuff in their drivers with this backend, and I don't want
to spend all the time making sure they don't abuse this :-)

> > - still feeling we could at least change the default to highpriority niceness.
>
> AFAIU this would still be preempted by something that is SCHED_FIFO.
> Also, with cfs/SCHED_NORMAL, you can still be preempted by lower
> priority things that haven't had a chance to run for a while.

i915 uses a highprio workqueue, so I guess to avoid regressions we need
that (it's also not using the commit helpers right now, but there's no
reason not to afaics; stuff simply happened in parallel back then).

> > - there's still the problem that commit works can overlap, and a
> > single worker can't do that anymore. So rolling that out for everyone
> > as-is feels a bit risky.
>
> That is why it is per-CRTC..  I'm not sure there is a need to overlap
> work for a single CRTC?
>
> We could OFC make this a driver knob, and keep the old 'commit_

Re: linux-next: manual merge of the akpm tree with the drm-intel tree

2020-10-01 Thread Daniel Vetter
On Thu, Oct 1, 2020 at 5:08 PM Jani Nikula  wrote:
>
> On Thu, 01 Oct 2020, Daniel Vetter  wrote:
> > On Thu, Oct 1, 2020 at 3:53 PM Christoph Hellwig  wrote:
> >>
> >> On Thu, Oct 01, 2020 at 08:39:17PM +1000, Stephen Rothwell wrote:
> >> > Hi all,
> >> >
> >> > Today's linux-next merge of the akpm tree got a conflict in:
> >> >
> >> >   drivers/gpu/drm/i915/gem/i915_gem_pages.c
> >> >
> >> > between commit:
> >> >
> >> >   4caf017ee937 ("drm/i915/gem: Avoid implicit vmap for highmem on x86-32")
> >> >   ba2ebf605d5f ("drm/i915/gem: Prevent using pgprot_writecombine() if PAT is not supported")
> >
> > Uh these patches shouldn't be in linux-next because they're for 5.11,
> > not the 5.10 merge window that will open soon. Joonas?
>
> I don't know anything else, but both are tagged Cc: stable.

Uh right, I got confused, they're on -fixes now. Well, -next-fixes,
which seems like the wrong one for a Cc: stable patch; I guess these
should even go into 5.9. Apologies for my confusion.
-Daniel

>
> BR,
> Jani.
>
> >
> >> > from the drm-intel tree and patch:
> >> >
> >> >   "drm/i915: use vmap in i915_gem_object_map"
> >> >
> >> > from the akpm tree.
> >> >
> >> > I fixed it up (I just dropped the changes in the former commits) and
> >>
> >> Sigh.  The solution is a bit more complicated, but I just redid my
> >> patches to not depend on the above ones.  I can revert back to the old
> >> version, though.  Andrew, let me know what works for you.
> >
> > Imo ignore, rebasing onto linux-next without those intel patches was
> > the right thing for the 5.10 merge window.
> > -Daniel
>
> --
> Jani Nikula, Intel Open Source Graphics Center





Re: linux-next: manual merge of the akpm tree with the drm-intel tree

2020-10-01 Thread Daniel Vetter
On Thu, Oct 1, 2020 at 3:53 PM Christoph Hellwig  wrote:
>
> On Thu, Oct 01, 2020 at 08:39:17PM +1000, Stephen Rothwell wrote:
> > Hi all,
> >
> > Today's linux-next merge of the akpm tree got a conflict in:
> >
> >   drivers/gpu/drm/i915/gem/i915_gem_pages.c
> >
> > between commit:
> >
> >   4caf017ee937 ("drm/i915/gem: Avoid implicit vmap for highmem on x86-32")
> >   ba2ebf605d5f ("drm/i915/gem: Prevent using pgprot_writecombine() if PAT is not supported")

Uh these patches shouldn't be in linux-next because they're for 5.11,
not the 5.10 merge window that will open soon. Joonas?

> > from the drm-intel tree and patch:
> >
> >   "drm/i915: use vmap in i915_gem_object_map"
> >
> > from the akpm tree.
> >
> > I fixed it up (I just dropped the changes in the former commits) and
>
> Sigh.  The solution is a bit more complicated, but I just redid my
> patches to not depend on the above ones.  I can revert back to the old
> version, though.  Andrew, let me know what works for you.

Imo ignore, rebasing onto linux-next without those intel patches was
the right thing for the 5.10 merge window.
-Daniel


Re: [PATCH -next] drm/panfrost: simplify the return expression of panfrost_devfreq_target()

2020-10-01 Thread Daniel Vetter
On Thu, Oct 1, 2020 at 12:58 PM Steven Price  wrote:
>
> On 21/09/2020 14:10, Qinglang Miao wrote:
> > Simplify the return expression.
> >
> > Signed-off-by: Qinglang Miao 
>
> Reviewed-by: Steven Price 

As committer/maintainer for this, please indicate whether you'll merge
this or not; with just an r-b, patches are in an awkward limbo state.
Since Qinglang isn't a committer, you probably want to merge their
patches, otherwise they get lost.
-Daniel
>
> > ---
> >   drivers/gpu/drm/panfrost/panfrost_devfreq.c | 7 +--
> >   1 file changed, 1 insertion(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> > index 8ab025d00..913eaa6d0 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> > @@ -29,18 +29,13 @@ static int panfrost_devfreq_target(struct device *dev, unsigned long *freq,
> >  u32 flags)
> >   {
> >   struct dev_pm_opp *opp;
> > - int err;
> >
> >   opp = devfreq_recommended_opp(dev, freq, flags);
> >   if (IS_ERR(opp))
> >   return PTR_ERR(opp);
> >   dev_pm_opp_put(opp);
> >
> > - err = dev_pm_opp_set_rate(dev, *freq);
> > - if (err)
> > - return err;
> > -
> > - return 0;
> > + return dev_pm_opp_set_rate(dev, *freq);
> >   }
> >
> >   static void panfrost_devfreq_reset(struct panfrost_devfreq *pfdevfreq)
> >
>




Re: [PATCH 1/3] drm: Add and export function drm_gem_cma_create_noalloc

2020-10-01 Thread Daniel Vetter
On Wed, Sep 30, 2020 at 07:16:42PM +0200, Paul Cercueil wrote:
> Add and export the function drm_gem_cma_create_noalloc(), which is just
> __drm_gem_cma_create() renamed.
> 
> This function can be used by drivers that need to create a GEM object
> without allocating the backing memory.
> 
> Signed-off-by: Paul Cercueil 
> ---
>  drivers/gpu/drm/drm_gem_cma_helper.c | 11 ++-
>  include/drm/drm_gem_cma_helper.h |  3 +++
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
> index 59b9ca207b42..6abc4b306832 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -34,7 +34,7 @@
>   */
>  
>  /**
> - * __drm_gem_cma_create - Create a GEM CMA object without allocating memory
> + * drm_gem_cma_create_noalloc - Create a GEM CMA object without allocating memory
>   * @drm: DRM device
>   * @size: size of the object to allocate
>   *
> @@ -45,8 +45,8 @@
>   * A struct drm_gem_cma_object * on success or an ERR_PTR()-encoded negative
>   * error code on failure.
>   */
> -static struct drm_gem_cma_object *
> -__drm_gem_cma_create(struct drm_device *drm, size_t size)
> +struct drm_gem_cma_object *
> +drm_gem_cma_create_noalloc(struct drm_device *drm, size_t size)
>  {
>   struct drm_gem_cma_object *cma_obj;
>   struct drm_gem_object *gem_obj;
> @@ -76,6 +76,7 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
>   kfree(cma_obj);
>   return ERR_PTR(ret);
>  }
> +EXPORT_SYMBOL_GPL(drm_gem_cma_create_noalloc);

This feels a bit awkward, since for drivers who want to roll their own we
can do that already.

I think the better approach is to export a cma function which allocates
non-coherent dma memory.
-Daniel

>  
>  /**
>   * drm_gem_cma_create - allocate an object with the given size
> @@ -98,7 +99,7 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
>  
>   size = round_up(size, PAGE_SIZE);
>  
> - cma_obj = __drm_gem_cma_create(drm, size);
> + cma_obj = drm_gem_cma_create_noalloc(drm, size);
>   if (IS_ERR(cma_obj))
>   return cma_obj;
>  
> @@ -476,7 +477,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
>   return ERR_PTR(-EINVAL);
>  
>   /* Create a CMA GEM buffer. */
> - cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
> + cma_obj = drm_gem_cma_create_noalloc(dev, attach->dmabuf->size);
>   if (IS_ERR(cma_obj))
>   return ERR_CAST(cma_obj);
>  
> diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
> index 2bfa2502607a..be2b8e3a8ab2 100644
> --- a/include/drm/drm_gem_cma_helper.h
> +++ b/include/drm/drm_gem_cma_helper.h
> @@ -83,6 +83,9 @@ int drm_gem_cma_mmap(struct file *filp, struct vm_area_struct *vma);
>  struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
> size_t size);
>  
> +struct drm_gem_cma_object *
> +drm_gem_cma_create_noalloc(struct drm_device *drm, size_t size);
> +
>  extern const struct vm_operations_struct drm_gem_cma_vm_ops;
>  
>  #ifndef CONFIG_MMU
> -- 
> 2.28.0
> 



Re: [RFC PATCH 0/4] Add a RPMsg driver to support AI Processing Unit (APU)

2020-10-01 Thread Daniel Vetter
On Wed, Sep 30, 2020 at 01:53:46PM +0200, Alexandre Bailon wrote:
> This adds a RPMsg driver that implements communication between the CPU and an
> APU.
> This uses VirtIO buffer to exchange messages but for sharing data, this uses
> a dmabuf, mapped to be shared between CPU (userspace) and APU.
> The driver is relatively generic, and should work with any SoC implementing
> hardware accelerator for AI if they use support remoteproc and VirtIO.
> 
> For the people interested by the firmware or userspace library,
> the sources are available here:
> https://github.com/BayLibre/open-amp/tree/v2020.01-mtk/apps/examples/apu

Since this has open userspace (from a very cursory look), and smells very
much like an acceleration driver, and seems to use dma-buf for memory
management: Why is this not just a drm driver?
-Daniel

> 
> Alexandre Bailon (3):
>   Add a RPMSG driver for the APU in the mt8183
>   rpmsg: apu_rpmsg: update the way to store IOMMU mapping
>   rpmsg: apu_rpmsg: Add an IOCTL to request IOMMU mapping
> 
> Julien STEPHAN (1):
>   rpmsg: apu_rpmsg: Add support for async apu request
> 
>  drivers/rpmsg/Kconfig  |   9 +
>  drivers/rpmsg/Makefile |   1 +
>  drivers/rpmsg/apu_rpmsg.c  | 752 +
>  drivers/rpmsg/apu_rpmsg.h  |  52 +++
>  include/uapi/linux/apu_rpmsg.h |  47 +++
>  5 files changed, 861 insertions(+)
>  create mode 100644 drivers/rpmsg/apu_rpmsg.c
>  create mode 100644 drivers/rpmsg/apu_rpmsg.h
>  create mode 100644 include/uapi/linux/apu_rpmsg.h
> 
> -- 
> 2.26.2
> 



Re: [PATCH v2 0/3] drm: commit_work scheduling

2020-10-01 Thread Daniel Vetter
On Wed, Sep 30, 2020 at 11:16 PM Rob Clark  wrote:
>
> From: Rob Clark 
>
> The android userspace treats the display pipeline as a realtime problem.
> And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> it is.  (See https://lwn.net/Articles/809545/ for the best explanation
> that I found.)
>
> But this presents a problem with using workqueues for non-blocking
> atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> preempt the worker.  Which is not really the outcome you want.. once
> the required fences are scheduled, you want to push the atomic commit
> down to hw ASAP.
>
> But the decision of whether commit_work should be RT or not really
> depends on what userspace is doing.  For a pure CFS userspace display
> pipeline, commit_work() should remain SCHED_NORMAL.
>
> To handle this, convert non-blocking commit_work() to use per-CRTC
> kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> used to avoid serializing commits when userspace is using a per-CRTC
> update loop.  And the last patch exposes the task id to userspace as
> a CRTC property, so that userspace can adjust the priority and sched
> policy to fit its needs.
>
>
> v2: Drop client cap and in-kernel setting of priority/policy in
> favor of exposing the kworker tid to userspace so that user-
> space can set priority/policy.

Yeah I think this looks more reasonable. Still a bit irky interface,
so I'd like to get some kworker/rt ack on this. Other opens:
- needs userspace, the usual drill
- we need this also for vblank workers, otherwise this won't work for
drivers needing those because of another priority inversion.
- we probably want some indication of whether this actually does
something useful, not all drivers use atomic commit helpers. Not sure
how to do that.
- not sure whether the vfunc is an awesome idea, I'd frankly just
open-code this inline. We have similar special cases already for e.g.
dpms (and in multiple places), this isn't the worst.
- still feeling we could at least change the default to highpriority niceness.
- there's still the problem that commit works can overlap, and a
single worker can't do that anymore. So rolling that out for everyone
as-is feels a bit risky.
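On the "needs userspace" point: as the cover letter describes it, userspace reads the kworker task id from the new CRTC property and then adjusts the scheduling policy itself (Rob's current test setup is modetest plus chrt). A minimal sketch of that last step, assuming the tid has already been read from the property; the property lookup is omitted, and `set_commit_worker_policy()` is a hypothetical helper, not part of the series:

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Hypothetical helper: give the CRTC commit kworker identified by @tid
 * a realtime (SCHED_FIFO) or normal (SCHED_OTHER) policy. The tid is
 * assumed to come from the CRTC property added by the series.
 * Returns 0 on success or a negative errno value.
 */
int set_commit_worker_policy(pid_t tid, int realtime, int rt_prio)
{
	struct sched_param param = { .sched_priority = 0 };
	int policy = SCHED_OTHER;

	if (realtime) {
		int min = sched_get_priority_min(SCHED_FIFO);
		int max = sched_get_priority_max(SCHED_FIFO);

		/* Reject priorities outside the range SCHED_FIFO accepts. */
		if (rt_prio < min || rt_prio > max)
			return -EINVAL;
		policy = SCHED_FIFO;
		param.sched_priority = rt_prio;
	}

	/* On Linux, a thread id here changes the policy of just that thread. */
	if (sched_setscheduler(tid, policy, &param) == -1)
		return -errno;
	return 0;
}
```

Switching to SCHED_FIFO normally needs CAP_SYS_NICE or a non-zero RLIMIT_RTPRIO, which is one reason a sandboxed compositor process may have to delegate this to a separate helper or daemon.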

Cheers, Daniel

>
> Rob Clark (3):
>   drm/crtc: Introduce per-crtc kworker
>   drm/atomic: Use kthread worker for nonblocking commits
>   drm: Expose CRTC's kworker task id
>
>  drivers/gpu/drm/drm_atomic_helper.c | 13 
>  drivers/gpu/drm/drm_crtc.c  | 14 +
>  drivers/gpu/drm/drm_mode_config.c   | 14 +
>  drivers/gpu/drm/drm_mode_object.c   |  4 
>  include/drm/drm_atomic.h| 31 +
>  include/drm/drm_crtc.h  |  8 
>  include/drm/drm_mode_config.h   |  9 +
>  include/drm/drm_property.h  |  9 +
>  8 files changed, 98 insertions(+), 4 deletions(-)
>
> --
> 2.26.2
>




Re: remove alloc_vm_area v2

2020-09-30 Thread Daniel Vetter
On Wed, Sep 30, 2020 at 4:48 PM Christoph Hellwig  wrote:
>
> On Tue, Sep 29, 2020 at 03:43:30PM +0300, Joonas Lahtinen wrote:
> > Hmm, those are both committed after our last -next pull request, so they
> > would normally only target next merge window. drm-next closes the merge
> > window around -rc5 already.
> >
> > But, in this specific case those are both Fixes: patches with Cc: stable,
> > so they should be pulled into drm-intel-next-fixes PR.
> >
> > Rodrigo, can you cherry-pick those patches to -next-fixes that you send
> > to Dave?
>
> They still haven't made it to linux-next.  I think for now I'll just
> rebase without them again and then you can handle the conflicts for
> 5.11.

Yeah after -rc6 drm is frozen for features, so anything that's stuck
in subordinate trees rolls over to the next merge cycle. To avoid
upsetting sfr from linux-next we keep those -next branches out of
linux-next until after -rc1 again. iow, rebasing onto linux-next and
smashing this into 5.10 sounds like the right approach (since everyone
else freezes a bunch later afaik).

Cheers, Daniel


Re: [PATCH v4 20/52] drm: drm_dsc.h: fix a kernel-doc markup

2020-09-30 Thread Daniel Vetter
On Wed, Sep 30, 2020 at 03:24:43PM +0200, Mauro Carvalho Chehab wrote:
> As warned by Sphinx:
> 
>   ./Documentation/gpu/drm-kms-helpers:305: ./include/drm/drm_dsc.h:587: WARNING: Unparseable C cross-reference: 'struct'
>   Invalid C declaration: Expected identifier in nested name, got keyword: struct [error at 6]
> struct
> --^
> 
> The markup for one struct is wrong, as struct is used twice.
> 
> Signed-off-by: Mauro Carvalho Chehab 

Applied to drm-misc-fixes, thanks for your patch.
-Daniel

> ---
>  include/drm/drm_dsc.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/drm/drm_dsc.h b/include/drm/drm_dsc.h
> index 887954cbfc60..732f32740c86 100644
> --- a/include/drm/drm_dsc.h
> +++ b/include/drm/drm_dsc.h
> @@ -588,7 +588,7 @@ struct drm_dsc_picture_parameter_set {
>   * This structure represents the DSC PPS infoframe required to send the Picture
>   * Parameter Set metadata required before enabling VESA Display Stream
>   * Compression. This is based on the DP Secondary Data Packet structure and
> - * comprises of SDP Header as defined &struct struct dp_sdp_header in drm_dp_helper.h
> + * comprises of SDP Header as defined &struct dp_sdp_header in drm_dp_helper.h
>   * and PPS payload defined in &struct drm_dsc_picture_parameter_set.
>   *
>   * @pps_header: Header for PPS as per DP SDP header format of type
> -- 
> 2.26.2
> 



Re: [PATCH 0/3] Prevent out-of-bounds access for built-in font data buffers

2020-09-30 Thread Daniel Vetter
On Wed, Sep 30, 2020 at 12:56 PM Peilin Ye  wrote:
>
> On Wed, Sep 30, 2020 at 11:53:17AM +0200, Daniel Vetter wrote:
> > On Wed, Sep 30, 2020 at 03:11:51AM -0400, Peilin Ye wrote:
> > > On Tue, Sep 29, 2020 at 04:38:49PM +0200, Daniel Vetter wrote:
> > > > On Tue, Sep 29, 2020 at 2:34 PM Peilin Ye  wrote:
> > > > > Ah, and speaking of built-in fonts, see fbcon_startup():
> > > > >
> > > > > /* Setup default font */
> > > > > [...]
> > > > > vc->vc_font.charcount = 256; /* FIXME  Need to support more fonts */
> > > > > ^^^
> > > > >
> > > > > This is because find_font() and get_default_font() return a `struct
> > > > > font_desc *`, but `struct font_desc` doesn't contain `charcount`. I
> > > > > think we also need to add a `charcount` field to `struct font_desc`.
> > > >
> > > > Hm yeah ... I guess maybe struct font_desc should be the starting
> > > > point for the kernel internal font structure. It's at least there
> > > > already ...
> > >
> > > I see, that will also make handling built-in fonts much easier!
> >
> > I think the only downside with starting with font_desc as the internal
> > font representation is that there's a few fields we don't need/have for
> > userspace fonts (like the id/name stuff). So any helpers to e.g. print out
> > font information need to make sure they don't trip over that.
> >
> > But otherwise I don't see a problem with this, I think.
>
> Yes, and built-in fonts don't use refcount. Or maybe we can let
> find_font() and get_default_font() kmalloc() a copy of built-in font
> data, then keep track of refcount for both user and built-in fonts, but
> that will waste a few K of memory for each built-in font we use...

A possible trick for this would be to make sure built-in fonts start
out with a refcount of 1, so they never get freed. Plus maybe a check
that if the name is set, then it's a built-in font, and if we ever
underflow the refcount we just WARN, but don't free anything.

Another trick would be kern_font_get/put wrappers (we'd want those
anyway if the userspace fonts are refcounted) and if kern_font->name
!= NULL (i.e. built-in font with name) then we simply don't call
kref_get/put.
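A rough sketch of the second trick, written as plain userspace C so it compiles on its own. `struct kern_font`, `kern_font_get()` and `kern_font_put()` are the hypothetical names floated in this thread, and a bare int stands in for struct kref (no atomics, no WARN on underflow):

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical kernel-internal font; only the fields the sketch needs. */
struct kern_font {
	const char *name;	/* non-NULL only for built-in fonts */
	int refcount;		/* stand-in for struct kref */
	unsigned int charcount;
	unsigned char *data;
};

/* Take a reference; built-in fonts (name != NULL) are never counted. */
void kern_font_get(struct kern_font *font)
{
	if (font->name)
		return;
	font->refcount++;
}

/*
 * Drop a reference and free a user-supplied font once the last one is
 * gone. Returns true if the font was freed. Built-in fonts are assumed
 * to live in static storage and are never freed.
 */
bool kern_font_put(struct kern_font *font)
{
	if (font->name)
		return false;
	if (--font->refcount == 0) {
		free(font->data);
		free(font);
		return true;
	}
	return false;
}
```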
-Daniel

> > > > I think for vc_data->vc_font we might need a multi-step approach:
> > > > - first add a new helper function which sets the font for a vc using
> > > > an uapi console_font struct (and probably hard-code the assumption
> > > > cnt == 256).
> > >
> > > But user fonts may have a charcount different to 256... But yes I'll try
> > > to figure out how.
> >
> > Hm yeah, maybe we need a helper to give us the charcount then, which by
> > default is using the magic negative offset.
>
> Ah, I see! :)
>
> > Then once we've converted everything over to explicitly passing charcount
> > around, we can switch that helper. So something like
> >
> > int kern_font_charcount(struct kern_font *font);
> >
> > Feel free to bikeshed the struct name however you see fit :-)
>
> I think both `kern_font` and `font_desc` make sense, naming is so
> hard...
>
> > > > For first steps I'd start with demidlayering some of the internal
> > > > users of uapi structs, like the console_font_op really shouldn't be
> > > > used anywhere in any function, except in the ioctl handler that
> > > > converts it into the right function call. You'll probably discover a
> > > > few other places like this on the go.
> > >
> > > Sure, I'll start from this, then cleaning up these dummy functions, then
> > > `vc_data`. Thank you for the insights!
> >
> > Please don't take this rough plan as fixed, it's just where I'd start from
> > browsing the code and your analysis a bit. We'll probably have to adapt as
> > we go and more nasty things turn up ...
>
> Sure, I'll first give it a try and see. Thank you!
>
> Peilin Ye
>

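The transitional charcount helper discussed above might look something like this. It is only a sketch under assumptions: `struct kern_font` and `kern_font_charcount()` are hypothetical, and the layout (an int character count stored three words in front of the glyph data, four extra words in total) is assumed to mirror fbcon's `FNTCHARCNT()` "magic negative offset" convention:

```c
/* Assumed layout: four int "extra words" precede the glyph data. */
#define FONT_EXTRA_WORDS	4
#define FNTCHARCNT(fd)		(((int *)(fd))[-3])

/* Hypothetical internal font representation during the transition. */
struct kern_font {
	unsigned int charcount;	/* 0 = caller not converted yet, use header */
	unsigned char *data;	/* points just past the extra words */
};

/*
 * Return the number of glyphs in @font. Until every caller fills in
 * charcount explicitly, fall back to the word stored in front of the
 * font data; once the conversion is done only the first branch remains.
 */
int kern_font_charcount(const struct kern_font *font)
{
	if (font->charcount)
		return (int)font->charcount;
	return FNTCHARCNT(font->data);
}
```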



Re: [PATCH] drm/bridge: tc358764: restore connector support

2020-09-30 Thread Daniel Vetter
On Wed, Sep 30, 2020 at 12:31 PM Daniel Vetter  wrote:
>
> On Wed, Sep 30, 2020 at 12:13 PM Andrzej Hajda  wrote:
> >
> >
> > W dniu 24.09.2020 o 10:31, Marek Szyprowski pisze:
> > > This patch restores DRM connector registration in the TC358764 bridge
> > > driver and restores usage of the old drm_panel_* API, thus allowing dynamic
> > > panel registration. This fixes panel operation on Exynos5250-based
> > > Arndale board.
> > >
> > > This is equivalent to the revert of the following commits:
> > > 1644127f83bc "drm/bridge: tc358764: add drm_panel_bridge support"
> > > 385ca38da29c "drm/bridge: tc358764: drop drm_connector_(un)register"
> > > and removal of the calls to drm_panel_attach()/drm_panel_detach(), which
> > > were no-ops and have been removed in the meantime.
> > >
> > > Signed-off-by: Marek Szyprowski 
> > Reviewed-by: Andrzej Hajda 
> >
> > Regards
> > Andrzej
> > > ---
> > > As I've reported and Andrzej Hajda pointed, the reverted patches break
> > > operation of the panel on the Arndale board. No one suggested how to fix
> > > the regression, I've decided to send a revert until a new solution is
> > > found.
> > >
> > > The issues with tc358764 might be automatically resolved once the Exynos
> > > DSI itself is converted to DRM bridge:
> > > https://patchwork.kernel.org/cover/11770683/
> > > but that approach has also its own issues so far.
>
> I'm ok with the revert to fix the regression, but I'd kinda like to
> see a bit more than "maybe we fix this in the future". Otherwise this
> nice idea of having a common drm_bridge abstraction is just leading
> towards a complete disaster where every combination of bridge/driver
> works slightly differently. And we're half-way there in that mess
> already I think.

I think minimally it would be good to at least cc the author of the
commit you're reverting, and get their ack.

Adding Sam.
-Daniel

>
> Cheers, Daniel
>
> > >
> > > Best regards,
> > > Marek Szyprowski
> > > ---
> > >   drivers/gpu/drm/bridge/tc358764.c | 107 +-
> > >   1 file changed, 92 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/bridge/tc358764.c b/drivers/gpu/drm/bridge/tc358764.c
> > > index d89394bc5aa4..c1e35bdf9232 100644
> > > --- a/drivers/gpu/drm/bridge/tc358764.c
> > > +++ b/drivers/gpu/drm/bridge/tc358764.c
> > > @@ -153,9 +153,10 @@ static const char * const tc358764_supplies[] = {
> > >   struct tc358764 {
> > >   struct device *dev;
> > >   struct drm_bridge bridge;
> > > + struct drm_connector connector;
> > >   struct regulator_bulk_data supplies[ARRAY_SIZE(tc358764_supplies)];
> > >   struct gpio_desc *gpio_reset;
> > > - struct drm_bridge *panel_bridge;
> > > + struct drm_panel *panel;
> > >   int error;
> > >   };
> > >
> > > @@ -209,6 +210,12 @@ static inline struct tc358764 *bridge_to_tc358764(struct drm_bridge *bridge)
> > >   return container_of(bridge, struct tc358764, bridge);
> > >   }
> > >
> > > +static inline
> > > +struct tc358764 *connector_to_tc358764(struct drm_connector *connector)
> > > +{
> > > + return container_of(connector, struct tc358764, connector);
> > > +}
> > > +
> > >   static int tc358764_init(struct tc358764 *ctx)
> > >   {
> > >   u32 v = 0;
> > > @@ -271,11 +278,43 @@ static void tc358764_reset(struct tc358764 *ctx)
> > >   usleep_range(1000, 2000);
> > >   }
> > >
> > > +static int tc358764_get_modes(struct drm_connector *connector)
> > > +{
> > > + struct tc358764 *ctx = connector_to_tc358764(connector);
> > > +
> > > + return drm_panel_get_modes(ctx->panel, connector);
> > > +}
> > > +
> > > +static const
> > > +struct drm_connector_helper_funcs tc358764_connector_helper_funcs = {
> > > + .get_modes = tc358764_get_modes,
> > > +};
> > > +
> > > +static const struct drm_connector_funcs tc358764_connector_funcs = {
> > > + .fill_modes = drm_helper_probe_single_connector_modes,
> > > + .destroy = drm_connector_cleanup,
> > > + .reset = drm_atomic_helper_connector_reset,
> > > + .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_sta

Re: [PATCH] drm/bridge: tc358764: restore connector support

2020-09-30 Thread Daniel Vetter
 ret = drm_panel_unprepare(ctx->panel);
> > + if (ret < 0)
> > + dev_err(ctx->dev, "error unpreparing panel (%d)\n", ret);
> >   tc358764_reset(ctx);
> >   usleep_range(1, 15000);
> >   ret = regulator_bulk_disable(ARRAY_SIZE(ctx->supplies), ctx->supplies);
> > @@ -296,28 +335,71 @@ static void tc358764_pre_enable(struct drm_bridge *bridge)
> >   ret = tc358764_init(ctx);
> >   if (ret < 0)
> >   dev_err(ctx->dev, "error initializing bridge (%d)\n", ret);
> > + ret = drm_panel_prepare(ctx->panel);
> > + if (ret < 0)
> > + dev_err(ctx->dev, "error preparing panel (%d)\n", ret);
> > +}
> > +
> > +static void tc358764_enable(struct drm_bridge *bridge)
> > +{
> > + struct tc358764 *ctx = bridge_to_tc358764(bridge);
> > + int ret = drm_panel_enable(ctx->panel);
> > +
> > + if (ret < 0)
> > + dev_err(ctx->dev, "error enabling panel (%d)\n", ret);
> >   }
> >
> >   static int tc358764_attach(struct drm_bridge *bridge,
> >  enum drm_bridge_attach_flags flags)
> > +{
> > + struct tc358764 *ctx = bridge_to_tc358764(bridge);
> > + struct drm_device *drm = bridge->dev;
> > + int ret;
> > +
> > + if (flags & DRM_BRIDGE_ATTACH_NO_CONNECTOR) {
> > + DRM_ERROR("Fix bridge driver to make connector optional!");
> > + return -EINVAL;
> > + }
> > +
> > + ctx->connector.polled = DRM_CONNECTOR_POLL_HPD;
> > + ret = drm_connector_init(drm, &ctx->connector,
> > +  &tc358764_connector_funcs,
> > +  DRM_MODE_CONNECTOR_LVDS);
> > + if (ret) {
> > + DRM_ERROR("Failed to initialize connector\n");
> > + return ret;
> > + }
> > +
> > + drm_connector_helper_add(&ctx->connector,
> > +  &tc358764_connector_helper_funcs);
> > + drm_connector_attach_encoder(&ctx->connector, bridge->encoder);
> > + ctx->connector.funcs->reset(&ctx->connector);
> > + drm_connector_register(&ctx->connector);
> > +
> > + return 0;
> > +}
> > +
> > +static void tc358764_detach(struct drm_bridge *bridge)
> >   {
> >   struct tc358764 *ctx = bridge_to_tc358764(bridge);
> >
> > - return drm_bridge_attach(bridge->encoder, ctx->panel_bridge,
> > -  bridge, flags);
> > + drm_connector_unregister(&ctx->connector);
> > + ctx->panel = NULL;
> > + drm_connector_put(&ctx->connector);
> >   }
> >
> >   static const struct drm_bridge_funcs tc358764_bridge_funcs = {
> > + .disable = tc358764_disable,
> >   .post_disable = tc358764_post_disable,
> > + .enable = tc358764_enable,
> >   .pre_enable = tc358764_pre_enable,
> >   .attach = tc358764_attach,
> > + .detach = tc358764_detach,
> >   };
> >
> >   static int tc358764_parse_dt(struct tc358764 *ctx)
> >   {
> > - struct drm_bridge *panel_bridge;
> >   struct device *dev = ctx->dev;
> > - struct drm_panel *panel;
> >   int ret;
> >
> >   ctx->gpio_reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_LOW);
> > @@ -326,16 +408,12 @@ static int tc358764_parse_dt(struct tc358764 *ctx)
> >   return PTR_ERR(ctx->gpio_reset);
> >   }
> >
> > - ret = drm_of_find_panel_or_bridge(dev->of_node, 1, 0, &panel, NULL);
> > - if (ret)
> > - return ret;
> > -
> > - panel_bridge = devm_drm_panel_bridge_add(dev, panel);
> > - if (IS_ERR(panel_bridge))
> > - return PTR_ERR(panel_bridge);
> > + ret = drm_of_find_panel_or_bridge(ctx->dev->of_node, 1, 0, &ctx->panel,
> > +   NULL);
> > + if (ret && ret != -EPROBE_DEFER)
> > + dev_err(dev, "cannot find panel (%d)\n", ret);
> >
> > - ctx->panel_bridge = panel_bridge;
> > - return 0;
> > + return ret;
> >   }
> >
> >   static int tc358764_configure_regulators(struct tc358764 *ctx)
> > @@ -381,7 +459,6 @@ static int tc358764_probe(struct mipi_dsi_device *dsi)
> >   return ret;
> >
> >   ctx->bridge.funcs = &tc358764_bridge_funcs;
> > - ctx->bridge.type = DRM_MODE_CONNECTOR_LVDS;
> >   ctx->bridge.of_node = dev->of_node;
> >
> >   drm_bridge_add(&ctx->bridge);
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 3/3] RFC: dma-buf: Add an API for importing and exporting sync files (v5)

2020-09-30 Thread Daniel Vetter
On Wed, Sep 30, 2020 at 11:39:06AM +0200, Michel Dänzer wrote:
> On 2020-03-17 10:21 p.m., Jason Ekstrand wrote:
> > Explicit synchronization is the future.  At least, that seems to be what
> > most userspace APIs are agreeing on at this point.  However, most of our
> > Linux APIs (both userspace and kernel UAPI) are currently built around
> > implicit synchronization with dma-buf.  While work is ongoing to change
> > many of the userspace APIs and protocols to an explicit synchronization
> > model, switching over piecemeal is difficult due to the number of
> > potential components involved.  On the kernel side, many drivers use
> > dma-buf including GPU (3D/compute), display, v4l, and others.  In
> > userspace, we have X11, several Wayland compositors, 3D drivers, compute
> > drivers (OpenCL etc.), media encode/decode, and the list goes on.
> > 
> > This patch provides a path forward by allowing userspace to manually
> > manage the fences attached to a dma-buf.  Alternatively, one can think
> > of this as making dma-buf's implicit synchronization simply a carrier
> > for an explicit fence.  This is accomplished by adding two IOCTLs to
> > dma-buf for importing and exporting a sync file to/from the dma-buf.
> > This way a userspace component which uses explicit synchronization,
> > such as a Vulkan driver, can manually set the write fence on a buffer
> > before handing it off to an implicitly synchronized component such as a
> > Wayland compositor or video encoder.  In this way, each of the different
> > components can be upgraded to an explicit synchronization model one at a
> > time as long as the userspace pieces connecting them are aware of it and
> > import/export fences at the right times.
> > 
> > There is a potential race condition with this API if userspace is not
> > careful.  A typical use case for implicit synchronization is to wait for
> > the dma-buf to be ready, use it, and then signal it for some other
> > component.  Because a sync_file cannot be created until it is guaranteed
> > to complete in finite time, userspace can only signal the dma-buf after
> > it has already submitted the work which uses it to the kernel and has
> > received a sync_file back.  There is no way to atomically submit a
> > wait-use-signal operation.  This is not, however, really a problem with
> > this API so much as it is a problem with explicit synchronization
> > itself.  The way this is typically handled is to have very explicit
> > ownership transfer points in the API or protocol which ensure that only
> > one component is using it at any given time.  Both X11 (via the PRESENT
> > extension) and Wayland provide such ownership transfer points via
> > explicit present and idle messages.
> > 
> > The decision was intentionally made in this patch to make the import and
> > export operations IOCTLs on the dma-buf itself rather than as a DRM
> > IOCTL.  This makes the import/export operation universal across all
> > components which use dma-buf including GPU, display, v4l, and others.
> > It also means that a userspace component can do the import/export
> > without access to the DRM fd which may be tricky to get in cases where
> > the client communicates with DRM via a userspace API such as OpenGL or
> > Vulkan.  At a future date we may choose to add direct import/export APIs
> > to components such as drm_syncobj to avoid allocating a file descriptor
> > and going through two ioctls.  However, that seems to be something of a
> > micro-optimization as import/export operations are likely to happen at a
> > rate of a few per frame of rendered or decoded video.
> > 
> > v2 (Jason Ekstrand):
> >   - Use a wrapper dma_fence_array of all fences including the new one
> > when importing an exclusive fence.
> > 
> > v3 (Jason Ekstrand):
> >   - Lock around setting shared fences as well as exclusive
> >   - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
> >   - Initialize ret to 0 in dma_buf_wait_sync_file
> > 
> > v4 (Jason Ekstrand):
> >   - Use the new dma_resv_get_singleton helper
> > 
> > v5 (Jason Ekstrand):
> >   - Rename the IOCTLs to import/export rather than wait/signal
> >   - Drop the WRITE flag and always get/set the exclusive fence
> > 
> > Signed-off-by: Jason Ekstrand 
> 
> What's the status of this? DMA_BUF_IOCTL_EXPORT_SYNC_FILE would be useful
> for Wayland compositors to wait for client buffers to become ready without
> being prone to getting delayed by later HW access to them, so it would be
> nice to merge that at least (if DMA_BUF_IOCTL_IMPORT_SYNC_FILE is still
> controversial).

I think the missing bits are just the usual stuff
- igt testcases
- userspace using the new ioctls
- review of the entire pile

I don't think there's any fundamental objections aside from "no one ever
pushed this over the finish line".
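To make the import/export semantics in the quoted commit message a bit more concrete, here is a small userspace toy model. Everything in it (`toy_fence`, `toy_buf`, `toy_export()`, `toy_import()`) is hypothetical and is not the kernel's dma_fence/dma_resv API; it only tracks a single exclusive fence and ignores shared fences. Under those assumptions it shows two properties: export hands out a snapshot of the fence attached at that moment (what Michel wants compositors to wait on), and import, following the v2 note, wraps the old fence and the new one in an array fence instead of replacing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types: not the kernel's dma_fence / dma_resv API. */
struct toy_fence {
	bool signaled;                        /* state of a leaf fence */
	size_t nchildren;                     /* > 0 marks an "array" fence */
	const struct toy_fence *children[2];  /* old exclusive + imported */
};

/* An array fence counts as signaled only once all its children are. */
bool toy_fence_is_signaled(const struct toy_fence *f)
{
	if (f->nchildren == 0)
		return f->signaled;
	for (size_t i = 0; i < f->nchildren; i++)
		if (!toy_fence_is_signaled(f->children[i]))
			return false;
	return true;
}

/* One exclusive ("write") fence slot; shared fences are ignored here. */
struct toy_buf {
	const struct toy_fence *excl;         /* may be NULL */
};

/* Export: return whatever is attached right now, i.e. a snapshot. */
const struct toy_fence *toy_export(const struct toy_buf *buf)
{
	return buf->excl;
}

/*
 * Import: keep the fence that was already attached by wrapping it and
 * the new one in an array fence (storage supplied by the caller).
 */
void toy_import(struct toy_buf *buf, const struct toy_fence *fence,
		struct toy_fence *wrapper)
{
	wrapper->signaled = false;
	wrapper->nchildren = 0;
	if (buf->excl)
		wrapper->children[wrapper->nchildren++] = buf->excl;
	wrapper->children[wrapper->nchildren++] = fence;
	buf->excl = wrapper;
}
```

In this toy, a compositor that exported the fence before a Vulkan client imported its own only waits for the original render job, while anyone exporting afterwards waits for both.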

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 0/3] Prevent out-of-bounds access for built-in font data buffers

2020-09-30 Thread Daniel Vetter
On Wed, Sep 30, 2020 at 03:11:51AM -0400, Peilin Ye wrote:
> On Tue, Sep 29, 2020 at 04:38:49PM +0200, Daniel Vetter wrote:
> > On Tue, Sep 29, 2020 at 2:34 PM Peilin Ye  wrote:
> > > It seems that users don't use `console_font` directly, they use
> > > `console_font_op`. Then, in TTY:
> > 
> > Wow, this is a maze :-/
> > 
> > > (drivers/tty/vt/vt.c)
> > > int con_font_op(struct vc_data *vc, struct console_font_op *op)
> > > {
> > > switch (op->op) {
> > > case KD_FONT_OP_SET:
> > > return con_font_set(vc, op);
> > > case KD_FONT_OP_GET:
> > > return con_font_get(vc, op);
> > > case KD_FONT_OP_SET_DEFAULT:
> > > return con_font_default(vc, op);
> > > case KD_FONT_OP_COPY:
> > > return con_font_copy(vc, op);
> > > }
> > > return -ENOSYS;
> > > }
> > 
> > So my gut feeling is that this is just a bit of overenthusiastic
> > common code sharing, and all it results in is confusing everyone. I think
> > if we change the con_font_get/set/default/copy functions to not take
> > the *op struct (which is taken pretty arbitrarily from one of the
> > ioctls), but the parameters each needs directly, that would clean up
> > the code a _lot_. Since most callers would then directly call the
> > right operation, instead of this detour through console_font_op.
> > struct console_font_op is an uapi struct, so really shouldn't be used
> > for internal abstractions - we can't change uapi, hence this makes it
> > impossible to refactor anything from the get-go.
> > 
> > I also think that trying to get rid of con_font_op callers as much as
> > possible (everywhere where the op struct is constructed in the kernel
> > and doesn't come from userspace essentially) should be doable as a
> > stand-alone patch series.
> 
> I see, I'll do some code searching and try to clean them up.
> 
> > > These 4 functions allocate `console_font`. We can replace them with our
> > > `kernel_console_font`. So, ...
> > >
> > > $ vgrep "\.con_font_set"
> > 
> > An aside: git grep is awesome, and really fast.
> 
> Ah, yes, by default vgrep uses git-grep. I use vgrep when I need to see
> something colorful :)
> 
> > > $ vgrep "\.con_font_get"
> > > Index File                                    Line Content
> > > 0     drivers/usb/misc/sisusbvga/sisusb_con.c 1295 .con_font_get = sisusbcon_font_get,
> > > 1     drivers/video/console/vgacon.c          1227 .con_font_get = vgacon_font_get,
> > > 2     drivers/video/fbdev/core/fbcon.c        3121 .con_font_get = fbcon_get_font,
> > > $
> > > $ vgrep "\.con_font_default"
> > > Index File                                    Line Content
> > > 0     drivers/usb/misc/sisusbvga/sisusb_con.c 1379 .con_font_default = sisusbdummycon_font_default,
> > > 1     drivers/video/console/dummycon.c         163 .con_font_default = dummycon_font_default,
> > 
> > The above two return 0 but do nothing, which means width/height are
> > now bogus (or well the same as what userspace set). I don't think that
> > works correctly ...
> > 
> > > 2     drivers/video/console/newport_con.c      694 .con_font_default = newport_font_default,
> > 
> > This just seems to release the userspace font. This is already done in
> > other places where it makes a lot more sense to clean up.
> > 
> > > 3     drivers/video/fbdev/core/fbcon.c        3122 .con_font_default = fbcon_set_def_font,
> > 
> > This actually does something. tbh I would not be surprised if the
> > fb_set utility is the only thing that uses this - with a bit of code
> > search we could perhaps confirm this, and delete all the other
> > implementations.
> > 
> > > $ vgrep "\.con_font_copy"
> > > Index File                                    Line Content
> > > 0     drivers/usb/misc/sisusbvga/sisusb_con.c 1380 .con_font_copy = sisusbdummycon_font_copy,
> > > 1     drivers/video/console/dummycon.c         164 .con_font_copy = dummycon_font_copy,
> > 
> > Above two do nothing, but return 0. Again this won't work I think.
> > 
> > > 2     drivers/video/fbdev/core/fbcon.c        3123 .con_font_copy = fbcon_copy_font,
> 

Re: [PATCH] vt_ioctl: make VT_RESIZEX behave like VT_RESIZE

2020-09-29 Thread Daniel Vetter
On Tue, Sep 29, 2020 at 12:52:03PM +0200, Martin Hostettler wrote:
> On Tue, Sep 29, 2020 at 10:12:46AM +0900, Tetsuo Handa wrote:
> > On 2020/09/29 2:59, Martin Hostettler wrote:
> > > On Sun, Sep 27, 2020 at 08:46:30PM +0900, Tetsuo Handa wrote:
> > >> VT_RESIZEX was introduced in Linux 1.3.3, but it is unclear what the
> > >> "+ more" part refers to, and I couldn't find a user of VT_RESIZEX.
> > >>
> > > 
> > > It seems this is/was used by "svgatextmode" which seems to be at
> > > http://www.ibiblio.org/pub/Linux/utils/console/
> > > 
> > > Not sure if that kind of software still has a chance to work nowadays.
> > > 
> > 
> > Thanks for the information.
> > 
> > It seems that v.v_vlin = curr_textmode->VDisplay / 
> > (MOFLG_ISSET(curr_textmode, ATTR_DOUBLESCAN) ? 2 : 1)
> > and v.v_clin = curr_textmode->FontHeight . Thus, v.v_clin is font's height 
> > and seems to be non-zero.
> > But according to https://bugs.gentoo.org/19485 , people are using kernel 
> > framebuffer instead.
> > 
> 
> Yes, this seems to be from pre framebuffer times.
> 
> Back in the day "svga" was the wording you got for "pokes svga card
> hardware registers from userspace drivers". And textmode means font
> rendering is done by the (in those times fixed-function) hardware scanout
> engine. Of course, only having to update 2 bytes per character was a huge
> saving early on. This likely also predates VESA VBE being reliable.
> 
> So I guess the point where this all started going wrong was allowing the X parts
> of the API to be combined with FB-based rendering at all? Sounds like the only
> user didn't use that combination and so it was never tested?
> 
> Then again, this all relates to hardware from 20 years ago...

Imo userspace modesetting should be burned down anywhere we can. We've
gotten away with this in drivers/gpu by just seamlessly transitioning to
kernel drivers.

Since the only userspace we've found seems to be able to cope if this ioctl
doesn't do anything, my vote goes towards ripping it out completely and
doing nothing in there. The only question is whether we should return an error
or a silent success: the former is safer, the latter can avoid a few regression
reports since the userspace tools keep "working", and usually people don't
notice for stuff this old. It worked in drivers/gpu :-)
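The two options weighed above (return an error vs. a silent success) can be sketched as a hypothetical handler that treats VT_RESIZEX like VT_RESIZE. `struct toy_consize` copies the field names of `struct vt_consize` from `<linux/vt.h>`, but `toy_resizex()`, the `strict` switch and the "0 means keep the current value" rule are assumptions made for illustration, not the actual vt_ioctl.c code.

```c
#include <assert.h>
#include <errno.h>

/* Same field names as struct vt_consize in <linux/vt.h>. */
struct toy_consize {
	unsigned short v_rows, v_cols;  /* text rows / columns */
	unsigned short v_vlin, v_clin;  /* pixel rows on screen / per char */
	unsigned short v_vcol, v_ccol;  /* pixel cols on screen / per char */
};

struct toy_vc {
	unsigned rows, cols;
};

/*
 * VT_RESIZEX handled like VT_RESIZE: only rows/cols are honoured.
 * strict == 0: extra pixel/font fields are silently ignored.
 * strict != 0: nonzero extra fields are rejected with -EINVAL.
 * A zero rows/cols value keeps the current size (assumed convention).
 */
int toy_resizex(struct toy_vc *vc, const struct toy_consize *cs, int strict)
{
	if (strict && (cs->v_vlin || cs->v_clin || cs->v_vcol || cs->v_ccol))
		return -EINVAL;
	if (cs->v_rows)
		vc->rows = cs->v_rows;
	if (cs->v_cols)
		vc->cols = cs->v_cols;
	return 0;
}
```

With `strict` off, old tools like svgatextmode that fill in `v_clin` keep "working"; with it on, they get a clear error instead.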

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 0/3] Prevent out-of-bounds access for built-in font data buffers

2020-09-29 Thread Daniel Vetter
On Tue, Sep 29, 2020 at 2:34 PM Peilin Ye  wrote:
>
> On Fri, Sep 25, 2020 at 03:25:51PM +0200, Daniel Vetter wrote:
> > I think the only way to make this work is that we have one place which
> > takes in the userspace uapi struct, and then converts it once into a
> > kernel_console_font. With all the error checking.
>
> Hi Daniel,
>
> It seems that users don't use `console_font` directly, they use
> `console_font_op`. Then, in TTY:

Wow, this is a maze :-/

> (drivers/tty/vt/vt.c)
> int con_font_op(struct vc_data *vc, struct console_font_op *op)
> {
> switch (op->op) {
> case KD_FONT_OP_SET:
> return con_font_set(vc, op);
> case KD_FONT_OP_GET:
> return con_font_get(vc, op);
> case KD_FONT_OP_SET_DEFAULT:
> return con_font_default(vc, op);
> case KD_FONT_OP_COPY:
> return con_font_copy(vc, op);
> }
> return -ENOSYS;
> }

So my gut feeling is that this is just a bit of overenthusiastic
common code sharing, and all it results in is confusing everyone. I think
if we change the con_font_get/set/default/copy functions to not take
the *op struct (which is taken pretty arbitrarily from one of the
ioctls), but the parameters each needs directly, that would clean up
the code a _lot_. Since most callers would then directly call the
right operation, instead of this detour through console_font_op.
struct console_font_op is an uapi struct, so really shouldn't be used
for internal abstractions - we can't change uapi, hence this makes it
impossible to refactor anything from the get-go.

I also think that trying to get rid of con_font_op callers as much as
possible (everywhere where the op struct is constructed in the kernel
and doesn't come from userspace essentially) should be doable as a
stand-alone patch series.
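A rough sketch of that refactor, in userspace C with hypothetical `toy_*` names: only the ioctl entry point still sees the uapi struct (fields laid out like `struct console_font_op`), and the internal helpers take exactly the parameters they need. The COPY case shows why the op struct makes a poor internal interface: in vt.c, con_font_copy() reads the source console number out of `op->height`. GET and SET_DEFAULT are left out for brevity and fall through to -ENOSYS here; the checks in `toy_font_set()` are assumed limits.

```c
#include <assert.h>
#include <errno.h>

/* Field layout follows the uapi struct console_font_op (<linux/kd.h>). */
struct toy_font_op {
	unsigned int op, flags;
	unsigned int width, height, charcount;
	unsigned char *data;
};

enum { TOY_OP_SET, TOY_OP_GET, TOY_OP_SET_DEFAULT, TOY_OP_COPY };

/* Internal helpers take exactly what they need (hypothetical). */
int toy_font_set(unsigned w, unsigned h, unsigned count,
		 const unsigned char *data)
{
	if (!data || !w || !h || (count != 256 && count != 512))
		return -EINVAL;
	return 0;
}

int toy_font_copy(int from_console)
{
	return from_console < 0 ? -EINVAL : 0;
}

/* Only the ioctl entry point still knows about the uapi struct. */
int toy_font_ioctl(const struct toy_font_op *op)
{
	switch (op->op) {
	case TOY_OP_SET:
		return toy_font_set(op->width, op->height, op->charcount,
				    op->data);
	case TOY_OP_COPY:
		/* uapi quirk: the source console is passed in ->height */
		return toy_font_copy((int)op->height);
	}
	return -ENOSYS;
}
```

In-kernel callers that today build a `console_font_op` just to reach one operation could then call `toy_font_set()`-style helpers directly.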

> These 4 functions allocate `console_font`. We can replace them with our
> `kernel_console_font`. So, ...
>
> $ vgrep "\.con_font_set"

An aside: git grep is awesome, and really fast.

> Index File                                    Line Content
> 0     drivers/usb/misc/sisusbvga/sisusb_con.c 1294 .con_font_set = sisusbcon_font_set,
> 1     drivers/usb/misc/sisusbvga/sisusb_con.c 1378 .con_font_set = sisusbdummycon_font_set,
> 2     drivers/video/console/dummycon.c         162 .con_font_set = dummycon_font_set,
> 3     drivers/video/console/newport_con.c      693 .con_font_set = newport_font_set,
> 4     drivers/video/console/vgacon.c          1226 .con_font_set = vgacon_font_set,
> 5     drivers/video/fbdev/core/fbcon.c        3120 .con_font_set = fbcon_set_font,
> $
> $ vgrep "\.con_font_get"
> Index File                                    Line Content
> 0     drivers/usb/misc/sisusbvga/sisusb_con.c 1295 .con_font_get = sisusbcon_font_get,
> 1     drivers/video/console/vgacon.c          1227 .con_font_get = vgacon_font_get,
> 2     drivers/video/fbdev/core/fbcon.c        3121 .con_font_get = fbcon_get_font,
> $
> $ vgrep "\.con_font_default"
> Index File                                    Line Content
> 0     drivers/usb/misc/sisusbvga/sisusb_con.c 1379 .con_font_default = sisusbdummycon_font_default,
> 1     drivers/video/console/dummycon.c         163 .con_font_default = dummycon_font_default,

The above two return 0 but do nothing, which means width/height are
now bogus (or well the same as what userspace set). I don't think that
works correctly ...

> 2     drivers/video/console/newport_con.c      694 .con_font_default = newport_font_default,

This just seems to release the userspace font. This is already done in
other places where it makes a lot more sense to clean up.

> 3     drivers/video/fbdev/core/fbcon.c        3122 .con_font_default = fbcon_set_def_font,

This actually does something. tbh I would not be surprised if the
fb_set utility is the only thing that uses this - with a bit of code
search we could perhaps confirm this, and delete all the other
implementations.

> $
> $ vgrep "\.con_font_copy"
> Index File                                    Line Content
> 0     drivers/usb/misc/sisusbvga/sisusb_con.c 1380 .con_font_copy = sisusbdummycon_font_copy,
> 1     drivers/video/console/dummycon.c         164 .con_font_copy = dummycon_font_copy,

Above two do nothing, but return 0. Again this won't work I think.
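The difference between a dummy hook that returns 0 and no hook at all can be shown with a small hypothetical model. As far as the vt.c dispatch goes, the core appears to return -ENOSYS when a console driver leaves the corresponding `vc_sw` hook NULL, so deleting the dummy implementations would turn a silent "success" with stale width/height into an explicit error. The `toy_*` types and functions below are made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_font {
	unsigned int width, height;
};

/* Per-console-driver hooks; NULL means "not supported". */
struct toy_consw {
	int (*font_default)(struct toy_font *font);
};

/* Like the dummy hooks discussed above: claims success, fills nothing. */
int dummy_font_default(struct toy_font *font)
{
	(void)font;
	return 0;
}

/* A hook that actually reports the default font geometry. */
int real_font_default(struct toy_font *font)
{
	font->width = 8;
	font->height = 16;
	return 0;
}

/* Core dispatch: a missing hook is reported instead of faked. */
int toy_con_font_default(const struct toy_consw *sw, struct toy_font *font)
{
	if (!sw->font_default)
		return -ENOSYS;
	return sw->font_default(font);
}
```

With the dummy hook, the caller sees 0 and hands back whatever width/height userspace passed in; with the hook removed, it gets -ENOSYS.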

> 2     drivers/video/fbdev/core/fbcon.c        3123 .con_font_copy = fbcon_copy_font,

Smells again like something that's only used by fb_set, and we could
probably delete the other dummy implementations. Also I'm not even
really clear on wh

Re: [PATCH v2 4/4] drm/qxl: use qxl pin function

2020-09-29 Thread Daniel Vetter
On Tue, Sep 29, 2020 at 11:51:15AM +0200, Gerd Hoffmann wrote:
> Otherwise ttm throws a WARN because we try to pin without a reservation.
> 
> Fixes: 9d36d4320462 ("drm/qxl: switch over to the new pin interface")
> Signed-off-by: Gerd Hoffmann 
> ---
>  drivers/gpu/drm/qxl/qxl_object.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/qxl/qxl_object.c b/drivers/gpu/drm/qxl/qxl_object.c
> index d3635e3e3267..eb45267d51db 100644
> --- a/drivers/gpu/drm/qxl/qxl_object.c
> +++ b/drivers/gpu/drm/qxl/qxl_object.c
> @@ -145,7 +145,7 @@ int qxl_bo_create(struct qxl_device *qdev,
>   return r;
>   }
>   if (pinned)
> - ttm_bo_pin(&bo->tbo);
> + qxl_bo_pin(bo);

I think this is now after ttm_bo_init, and at that point the object is
visible to lru users and everything. So I do think you need to grab locks
here instead of just incrementing the pin count alone.

It's also I think a bit racy, since ttm_bo_init drops the lock, so someone
might have snuck in and evicted the object already.

I think what you need is to call ttm_bo_init_reserved, then ttm_bo_pin,
then ttm_bo_unreserve, all explicitly.
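The ordering suggested above can be illustrated with a single-threaded toy model. `struct toy_bo` and the `toy_bo_*` functions are hypothetical stand-ins, not TTM's API: `reserved` models holding the reservation lock and `on_lru` models the object being visible to eviction. The point is that the object gets pinned before it is ever published, so there is no window in which eviction can win.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a TTM buffer object; not the TTM API. */
struct toy_bo {
	bool reserved;      /* reservation lock held */
	bool on_lru;        /* visible to eviction once published */
	bool evicted;
	unsigned pin_count;
};

/* Init that hands back the object still reserved and unpublished. */
void toy_bo_init_reserved(struct toy_bo *bo)
{
	*bo = (struct toy_bo){ .reserved = true };
}

/* Pinning requires the reservation (TTM WARNs otherwise). */
bool toy_bo_pin(struct toy_bo *bo)
{
	if (!bo->reserved)
		return false;
	bo->pin_count++;
	return true;
}

/* Dropping the reservation publishes the object to the LRU. */
void toy_bo_unreserve(struct toy_bo *bo)
{
	bo->reserved = false;
	bo->on_lru = true;
}

/* Eviction only touches published, unreserved, unpinned objects. */
bool toy_try_evict(struct toy_bo *bo)
{
	if (!bo->on_lru || bo->reserved || bo->pin_count)
		return false;
	bo->evicted = true;
	return true;
}
```

Pinning after the object is already unreserved and on the LRU, as in the patch under review, leaves a window where `toy_try_evict()` succeeds first, and the pin itself then fails the reservation check.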
-Daniel

>   *bo_ptr = bo;
>   return 0;
>  }
> -- 
> 2.27.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 0/3] Prevent out-of-bounds access for built-in font data buffers

2020-09-29 Thread Daniel Vetter
On Fri, Sep 25, 2020 at 11:35:09AM -0400, Peilin Ye wrote:
> On Fri, Sep 25, 2020 at 03:25:51PM +0200, Daniel Vetter wrote:
> > I think the only way to make this work is that we have one place which
> > takes in the userspace uapi struct, and then converts it once into a
> > kernel_console_font. With all the error checking.
> 
> Ah, I didn't think of that! When trying to introduce
> `kernel_console_font` I ended up using the uapi version and the kernel
> version in parallel...
> 
> > Then all internal code deals in terms of kernel_console_font, with
> > properly typed and named struct members and helper functions and
> > everything. And we might need a gradual conversion for this, so that first
> > we can convert over invidual console drivers, then subsystems, until at
> > the end we've pushed the conversion from uapi array to kernel_console_font
> > all the way to the ioctl entry points.
> > 
> > But that's indeed a huge pile of work, and fair warning: fbcon is
> > semi-orphaned, so by doing this you'll pretty much volunteer for
> > maintainership :-)
> >
> > But I'd be very happy to help get this done and throw some maintainership
> > credentials at you in the process ...
> 
> Sounds exciting, I will be glad to do this! I'm just a beginner, but I
> will try to do what I can do.

If you want to follow along a bit, I think it would be good to subscribe to
the dri-devel mailing list. At least for all the fbcon/fbdev/gpu stuff.

I don't think there's a dedicated list for vt/console stuff, aside from
Greg's inbox :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [patch 00/13] preempt: Make preempt count unconditional

2020-09-29 Thread Daniel Vetter
On Tue, Sep 29, 2020 at 10:19:38AM +0200, Michal Hocko wrote:
> On Wed 16-09-20 23:43:02, Daniel Vetter wrote:
> > I can
> > then figure out whether it's better to risk not spotting issues with
> > call_rcu vs slapping a memalloc_noio_save/restore around all these
> > critical sections, which force-degrades any allocation to GFP_ATOMIC at
> 
> did you mean memalloc_noreclaim_* here?

Yeah I picked the wrong one of that family of functions.

> > most, but has the risk that we run into code that assumes "GFP_KERNEL
> > never fails for small stuff" and has a decidedly less tested fallback
> > path than rcu code.
> 
> Even if the above then please note that memalloc_noreclaim_* or
> PF_MEMALLOC should be used with an extreme care. Essentially only for
> internal memory reclaimers. It grants access to _all_ the available
> memory so any abuse can be detrimental to the overall system operation.
> Allocation failure in this mode means that we are out of memory and any
> code relying on such an allocation has to carefully consider failure.
> This is not a random allocation mode.

Agreed, that's why I don't like having these kind of automagic critical
sections. It's a bit of a shotgun approach. Paul said that the code would
handle failures, but the problem is that it applies everywhere.
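A sketch of what such a scoped section looks like, modelled on the shape of memalloc_noreclaim_save()/restore(): the save sets a per-task flag and returns the previous state, and the restore puts back only that bit, so sections nest. The flags word, bit value and `toy_*` names are hypothetical; in the kernel the state lives in `current->flags` as PF_MEMALLOC. It also shows why this is a shotgun approach: every allocation anywhere inside the section, however deep in the call chain, inherits the mode without being told.

```c
#include <assert.h>

/* Toy per-task flags word; the bit value is made up for this sketch. */
#define TOY_PF_MEMALLOC 0x1u

static unsigned int toy_task_flags;

/* Set the bit and return its previous state (save side). */
unsigned int toy_noreclaim_save(void)
{
	unsigned int old = toy_task_flags & TOY_PF_MEMALLOC;

	toy_task_flags |= TOY_PF_MEMALLOC;
	return old;
}

/* Restore only the saved bit, so nested sections unwind correctly. */
void toy_noreclaim_restore(unsigned int old)
{
	toy_task_flags = (toy_task_flags & ~TOY_PF_MEMALLOC) | old;
}

/* Any callee, however deep, sees the mode implicitly. */
int toy_may_dip_into_reserves(void)
{
	return (toy_task_flags & TOY_PF_MEMALLOC) != 0;
}
```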

Anyway my understanding is that call_rcu will be reworked and gain a pile
of tricks so that these problems for the callchains leading to call_rcu
all disappear.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: KASAN: global-out-of-bounds Read in bit_putcs (2)

2020-09-26 Thread Daniel Vetter
On Sat, Sep 26, 2020 at 9:19 AM syzbot
 wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:171d4ff7 Merge tag 'mmc-v5.9-rc4-2' of git://git.kernel.or..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=126e918d90
> kernel config:  https://syzkaller.appspot.com/x/.config?x=240e2ebab67245c7
> dashboard link: https://syzkaller.appspot.com/bug?extid=a889d70ef11d6e0f6f22
> compiler:   gcc (GCC) 10.1.0-syz 20200507
>
> Unfortunately, I don't have any reproducer for this issue yet.

Looking at the backtrace, this could be fixed by the font size checks
I just pushed:

commit 5af08640795b2b9a940c9266c0260455377ae262 (HEAD ->
drm-misc-fixes, drm-misc/for-linux-next-fixes,
drm-misc/drm-misc-fixes)
Author: Peilin Ye 
Date:   Thu Sep 24 09:43:48 2020 -0400

fbcon: Fix global-out-of-bounds read in fbcon_get_font()

But just an educated guess, no more.
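For context, the general shape of such a font size check is: compute how many bytes the requested glyphs occupy and refuse to copy if that exceeds the buffer's known size. The sketch below is hypothetical (`toy_copy_font()` is not the code in the commit above); it assumes glyphs are stored as `height` rows of DIV_ROUND_UP(width, 8) bytes and that the source size is known, which is what the FNTSIZE()-style metadata discussed in this thread is meant to provide. It skips the padding to 32 rows that fbcon does when returning fonts to userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical glyph copy with a range check: each glyph is assumed to
 * be `height` rows of `pitch` bytes, pitch = DIV_ROUND_UP(width, 8).
 */
int toy_copy_font(unsigned char *dst, size_t dst_len,
		  const unsigned char *src, size_t src_len,
		  unsigned width, unsigned height, unsigned charcount)
{
	size_t pitch = (width + 7) / 8;
	size_t need = (size_t)charcount * height * pitch;

	/* Reject instead of reading (or writing) past either buffer. */
	if (need > src_len || need > dst_len)
		return -EINVAL;
	memcpy(dst, src, need);
	return 0;
}
```

Asking for 512 glyphs from a 256-glyph built-in font is exactly the kind of request that has to fail here instead of reading past the font array.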
-Daniel

>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+a889d70ef11d6e0f6...@syzkaller.appspotmail.com
>
> ==
> BUG: KASAN: global-out-of-bounds in __fb_pad_aligned_buffer 
> include/linux/fb.h:654 [inline]
> BUG: KASAN: global-out-of-bounds in bit_putcs_aligned 
> drivers/video/fbdev/core/bitblit.c:96 [inline]
> BUG: KASAN: global-out-of-bounds in bit_putcs+0xbb6/0xd20 
> drivers/video/fbdev/core/bitblit.c:185
> Read of size 1 at addr 88db78e9 by task syz-executor.4/16465
>
> CPU: 0 PID: 16465 Comm: syz-executor.4 Not tainted 5.9.0-rc6-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x198/0x1fd lib/dump_stack.c:118
>  print_address_description.constprop.0.cold+0x5/0x497 mm/kasan/report.c:383
>  __kasan_report mm/kasan/report.c:513 [inline]
>  kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
>  __fb_pad_aligned_buffer include/linux/fb.h:654 [inline]
>  bit_putcs_aligned drivers/video/fbdev/core/bitblit.c:96 [inline]
>  bit_putcs+0xbb6/0xd20 drivers/video/fbdev/core/bitblit.c:185
>  fbcon_putcs+0x35a/0x450 drivers/video/fbdev/core/fbcon.c:1308
>  do_update_region+0x399/0x630 drivers/tty/vt/vt.c:675
>  redraw_screen+0x658/0x790 drivers/tty/vt/vt.c:1034
>  fbcon_do_set_font+0x718/0x880 drivers/video/fbdev/core/fbcon.c:2438
>  fbcon_copy_font+0x12f/0x1a0 drivers/video/fbdev/core/fbcon.c:2453
>  con_font_copy drivers/tty/vt/vt.c:4719 [inline]
>  con_font_op+0x65b/0x1140 drivers/tty/vt/vt.c:4734
>  vt_k_ioctl drivers/tty/vt/vt_ioctl.c:473 [inline]
>  vt_ioctl+0x21c9/0x2e90 drivers/tty/vt/vt_ioctl.c:852
>  tty_ioctl+0x1019/0x15f0 drivers/tty/tty_io.c:2656
>  vfs_ioctl fs/ioctl.c:48 [inline]
>  __do_sys_ioctl fs/ioctl.c:753 [inline]
>  __se_sys_ioctl fs/ioctl.c:739 [inline]
>  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x45e179
> Code: 3d b2 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 
> 83 0b b2 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7fe2e03e5c78 EFLAGS: 0246 ORIG_RAX: 0010
> RAX: ffda RBX: 00010840 RCX: 0045e179
> RDX: 2400 RSI: 4b72 RDI: 0003
> RBP: 0118cf80 R08:  R09: 
> R10:  R11: 0246 R12: 0118cf4c
> R13: 77397e9f R14: 7fe2e03e69c0 R15: 0118cf4c
>
> The buggy address belongs to the variable:
>  oid_index+0x109/0xae0
>
> Memory state around the buggy address:
>  88db7780: 00 00 00 00 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
>  88db7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 06 f9
> >88db7880: f9 f9 f9 f9 00 02 f9 f9 f9 f9 f9 f9 00 01 f9 f9
>   ^
>  88db7900: f9 f9 f9 f9 00 04 f9 f9 f9 f9 f9 f9 00 00 02 f9
>  88db7980: f9 f9 f9 f9 00 03 f9 f9 f9 f9 f9 f9 07 f9 f9 f9
> ==
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 0/3] Prevent out-of-bounds access for built-in font data buffers

2020-09-25 Thread Daniel Vetter
On Fri, Sep 25, 2020 at 06:13:00AM -0400, Peilin Ye wrote:
> Hi all!
> 
> On Fri, Sep 25, 2020 at 08:46:04AM +0200, Jiri Slaby wrote:
> > > In order to perform a reliable range check, fbcon_get_font() needs to know
> > > `FONTDATAMAX` for each built-in font under lib/fonts/. Unfortunately, we
> > > do not keep that information in our font descriptor,
> > > `struct console_font`:
> > > 
> > > (include/uapi/linux/kd.h)
> > > struct console_font {
> > >   unsigned int width, height; /* font size */
> > >   unsigned int charcount;
> > >   unsigned char *data;/* font data with height fixed to 32 */
> > > };
> > > 
> > > To make things worse, `struct console_font` is part of the UAPI, so we
> > > cannot add a new field to keep track of `FONTDATAMAX`.
> > 
> > Hi,
> > 
> > but you still can define struct kernel_console_font containing struct
> > console_font and the 4 more members you need in the kernel. See below.
> > 
> > > Fortunately, the framebuffer layer itself gives us a hint of how to
> > > resolve this issue without changing UAPI. When allocating a buffer for a
> > > user-provided font, fbcon_set_font() reserves four "extra words" at the
> > > beginning of the buffer:
> > > 
> > > (drivers/video/fbdev/core/fbcon.c)
> > >   new_data = kmalloc(FONT_EXTRA_WORDS * sizeof(int) + size, GFP_USER);
> > 
> > I might be missing something (like coffee in the morning), but why don't
> > you just:
> > 1) declare struct font_data as
> > {
> >   unsigned sum, char_count, size, refcnt;
> >   const unsigned char data[];
> > }
> > 
> > Or maybe "struct console_font font" instead of "const unsigned char
> > data[]", if need be.
> > 
> > 2) allocate by:
> >   kmalloc(struct_size(struct font_data, data, size));
> > 
> > 3) use container_of wherever needed
> > 
> > That is you name the data on negative indexes using struct as you
> > already have to define one.
> > 
> > Then you don't need the ugly macros with negative indexes. And you can
> > pass this structure down e.g. to fbcon_do_set_font, avoiding potential
> > mistakes in accessing data[-1] and similar.
> 
> Sorry that I didn't mention it in the cover letter, but yes, I've tried
> this - a new `kernel_console_font` would be much cleaner than negative
> array indexing.
> 
> The reason I ended up giving it up was, frankly speaking, these macros
> are being used in about 30 places, and I am not familiar enough with the
> framebuffer and newport_con code, so I wasn't confident how to clean
> them up and plug in `kernel_console_font` properly...
> 
> Another reason was that, functions like fbcon_get_font() handle both user
> fonts and built-in fonts, so I wanted a single solution for both of
> them. I think we can't really introduce `kernel_console_font` while
> keeping these macros, that would make the error handling logic etc.
> very messy.
> 
> I'm not very sure what to do now. Should I give it another try cleaning
> up all the macros?
> 
> And thank you for reviewing this!

I think the only way to make this work is that we have one place which
takes in the userspace uapi struct, and then converts it once into a
kernel_console_font. With all the error checking.
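A minimal sketch of that single conversion point, under stated assumptions: `struct toy_uapi_font` mirrors the fields of the uapi `struct console_font` quoted earlier in this thread (glyph data padded to 32 rows), while `struct toy_kernel_font` and `toy_font_from_uapi()` are hypothetical. The 1..32 width/height range and the 256/512 glyph counts are illustrative limits, not a statement of what fbcon enforces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Same fields as the uapi struct console_font (include/uapi/linux/kd.h). */
struct toy_uapi_font {
	unsigned int width, height, charcount;
	unsigned char *data;        /* glyphs padded to 32 rows each */
};

/* Hypothetical kernel-internal type: only ever built from checked input. */
struct toy_kernel_font {
	unsigned int width, height, charcount;
	size_t size;                /* bytes of glyph data behind `data` */
	const unsigned char *data;
};

/* The single conversion point; the limits used here are assumptions. */
int toy_font_from_uapi(struct toy_kernel_font *kf,
		       const struct toy_uapi_font *uf, size_t data_len)
{
	size_t pitch, size;

	if (!uf->data || uf->width < 1 || uf->width > 32 ||
	    uf->height < 1 || uf->height > 32)
		return -EINVAL;
	if (uf->charcount != 256 && uf->charcount != 512)
		return -EINVAL;

	pitch = (uf->width + 7) / 8;                /* bytes per glyph row */
	size = (size_t)uf->charcount * 32 * pitch;  /* 32-row uapi layout */
	if (size > data_len)
		return -EINVAL;

	kf->width = uf->width;
	kf->height = uf->height;
	kf->charcount = uf->charcount;
	kf->size = size;
	kf->data = uf->data;
	return 0;
}
```

Everything downstream then takes a `struct toy_kernel_font` and can rely on `size` instead of re-deriving (or not checking) it.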

Then all internal code deals in terms of kernel_console_font, with
properly typed and named struct members and helper functions and
everything. And we might need a gradual conversion for this, so that first
we can convert over individual console drivers, then subsystems, until at
the end we've pushed the conversion from uapi array to kernel_console_font
all the way to the ioctl entry points.

But that's indeed a huge pile of work, and fair warning: fbcon is
semi-orphaned, so by doing this you'll pretty much volunteer for
maintainership :-)

But I'd be very happy to help get this done and throw some maintainership
credentials at you in the process ...

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] drm/vc4: Deleted the drm_device declaration

2020-09-25 Thread Daniel Vetter
On Fri, Sep 25, 2020 at 04:51:38PM +0800, Tian Tao wrote:
> drm_modeset_lock.h already declares struct drm_device, so there's no
> need to declare it in vc4_drv.h
> 
> Signed-off-by: Tian Tao 

Just an aside: when submitting patches, please use
scripts/get_maintainer.pl to generate the recipient list. Looking through
your past few patches, the recipient list seems fairly arbitrary and often
misses the actual maintainers for a given piece of code, which increases
the odds a lot that the patch will get lost.

E.g. for this one I'm only like the 5th or so fallback person, and the
main maintainer isn't on the recipient list.

Cheers, Daniel

> ---
>  drivers/gpu/drm/vc4/vc4_drv.h | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
> index 8c8d96b..8717a1c 100644
> --- a/drivers/gpu/drm/vc4/vc4_drv.h
> +++ b/drivers/gpu/drm/vc4/vc4_drv.h
> @@ -19,7 +19,6 @@
>  
>  #include "uapi/drm/vc4_drm.h"
>  
> -struct drm_device;
>  struct drm_gem_object;
>  
>  /* Don't forget to update vc4_bo.c: bo_type_names[] when adding to
> -- 
> 2.7.4
> 



Re: [PATCH 0/3] Prevent out-of-bounds access for built-in font data buffers

2020-09-25 Thread Daniel Vetter
  |  7 ---
> >  drivers/video/fbdev/core/fbcon_rotate.c |  1 +
> >  drivers/video/fbdev/core/tileblit.c     |  1 +
> >  include/linux/font.h                    | 13 +
> >  lib/fonts/font_10x18.c                  |  9 -
> >  lib/fonts/font_6x10.c                   |  9 +
> >  lib/fonts/font_6x11.c                   |  9 -
> >  lib/fonts/font_7x14.c                   |  9 -
> >  lib/fonts/font_8x16.c                   |  9 -
> >  lib/fonts/font_8x8.c                    |  9 -
> >  lib/fonts/font_acorn_8x8.c              |  9 ++---
> >  lib/fonts/font_mini_4x6.c               |  8
> >  lib/fonts/font_pearl_8x8.c              |  9 -
> >  lib/fonts/font_sun12x22.c               |  9 -
> >  lib/fonts/font_sun8x16.c                |  7 ---
> >  lib/fonts/font_ter16x32.c               |  9 -
> >  18 files changed, 79 insertions(+), 67 deletions(-)
> 
> Gotta love going backwards in arrays :)
> 
> Nice work, whole series is:
> 
> Reviewed-by: Greg Kroah-Hartman 
> 
> 
> Daniel, can you take this through your tree?

Applied to drm-misc-fixes, but just missed the pull request train for
-rc7. Should land in Linus' tree next week.

But I did look at the code, and I have regrets. Macros into untyped arrays
and negative indices are very old skool C. It's definitely neater than
before, but we also can't deny that we're doing dental surgery on a living,
fire-breathing dragon here :-/

I guess I'll just add this to the list of requirements anyone has to
resolve before we're going to resurrect scrollback.

Cheers, Daniel

> 
> thanks,
> 
> greg k-h


