Re: [Intel-gfx] [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Daniel Vetter
On Wed, 23 Nov 2022 at 09:07, Thomas Zimmermann  wrote:
>
> Hi
>
> Am 22.11.22 um 18:08 schrieb Daniel Vetter:
> > tldr; DMA buffers aren't normal memory, expecting that you can use
> > them like that (like calling get_user_pages works, or that they're
> > accounted like any other normal memory) cannot be guaranteed.
> >
> > Since some userspace only runs on integrated devices, where all
> > buffers are actually all resident system memory, there's a huge
> > temptation to assume that a struct page is always present and usable
> > like for any other pagecache-backed mmap. This has the potential to
> > result in a uapi nightmare.
> >
> > To stop this gap, require that DMA buffer mmaps are VM_PFNMAP, which
> > blocks get_user_pages and all the other struct page based
> > infrastructure for everyone. In spirit this is the uapi counterpart to
> > the kernel-internal CONFIG_DMABUF_DEBUG.
> >
> > Motivated by a recent patch which wanted to switch the system dma-buf
> > heap to vm_insert_page instead of vm_insert_pfn.
> >
> > v2:
> >
> > Jason brought up that we also want to guarantee that all ptes have the
> > pte_special flag set, to catch fast get_user_pages (on architectures
> > that support this). Allowing VM_MIXEDMAP (like VM_SPECIAL does) would
> > still allow vm_insert_page, but limiting to VM_PFNMAP will catch that.
> >
> > From auditing the various functions to insert pfn pte entries
> > (vm_insert_pfn_prot, remap_pfn_range and all its callers like
> > dma_mmap_wc) it looks like VM_PFNMAP is already required anyway, so
> > this should be the correct flag to check for.
> >
> > v3: Change to WARN_ON_ONCE (Thomas Zimmermann)
> >
> > References: 
> > https://lore.kernel.org/lkml/cakmk7uhi+mg0z0humnt13qccvuturvjpcr0njrl12k-wbwz...@mail.gmail.com/
> > Acked-by: Christian König 
> > Acked-by: Thomas Zimmermann 
> > Cc: Thomas Zimmermann 
> > Cc: Jason Gunthorpe 
> > Cc: Suren Baghdasaryan 
> > Cc: Matthew Wilcox 
> > Cc: John Stultz 
> > Signed-off-by: Daniel Vetter 
> > Cc: Sumit Semwal 
> > Cc: "Christian König" 
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > --
> > Ok, I entirely forgot about this patch but stumbled over it and checked
> > what's up with it now. I think it's ready now for merging:
> > - shmem helper patches to fix up vgem landed
> > - ttm has been fixed for a while
> > - I don't think we've had any other open issues
> >
> > Time to lock down this uapi contract for real?
> > -Daniel
> > ---
> >   drivers/dma-buf/dma-buf.c | 4 
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index b6c36914e7c6..88718665c3c3 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -150,6 +150,8 @@ static int dma_buf_mmap_internal(struct file *file, 
> > struct vm_area_struct *vma)
> >   ret = dmabuf->ops->mmap(dmabuf, vma);
> >   dma_resv_unlock(dmabuf->resv);
> >
> > + WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP));
>
> Well, I already a-b'ed this, but does it work with the DMA helpers?
> I'm asking because of [1].
>
> Best regards
> Thomas
>
> [1]
> https://elixir.bootlin.com/linux/v6.1-rc6/source/drivers/gpu/drm/drm_gem_dma_helper.c#L533

This one is entertaining, but also doesn't matter, because the
remap_pfn_range() that the various dma_mmap functions boil down to
sets VM_PFNMAP and a pile of other flags. See

https://elixir.bootlin.com/linux/v6.1-rc6/source/mm/memory.c#L2518
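
For illustration only, here is a minimal sketch of an exporter mmap
that goes through dma_mmap_wc(); the my_heap_* names and the struct
layout are made up for the example. Because dma_mmap_wc() boils down
to remap_pfn_range(), the vma comes back with VM_PFNMAP (plus VM_IO,
VM_DONTEXPAND and VM_DONTDUMP) already set, so the WARN_ON_ONCE added
by this patch stays silent for such exporters:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/mm.h>

/* Made-up per-buffer bookkeeping, for the sketch only. */
struct my_heap_buffer {
        struct device *dev;
        void *cpu_addr;
        dma_addr_t dma_addr;
        size_t size;
};

static int my_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
{
        struct my_heap_buffer *buf = dmabuf->priv;

        /* remap_pfn_range() inside dma_mmap_wc() sets VM_PFNMAP for us */
        return dma_mmap_wc(buf->dev, vma, buf->cpu_addr, buf->dma_addr,
                           vma->vm_end - vma->vm_start);
}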

I have no idea why Laurent cleared this flag here just so it gets
reset again a bit later when he added that code. Laurent?
-Daniel

> > +
> >   return ret;
> >   }
> >
> > @@ -1495,6 +1497,8 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct 
> > vm_area_struct *vma,
> >   ret = dmabuf->ops->mmap(dmabuf, vma);
> >   dma_resv_unlock(dmabuf->resv);
> >
> > + WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP));
> > +
> >   return ret;
> >   }
> >   EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF);
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Daniel Vetter
On Wed, 23 Nov 2022 at 10:06, Christian König  wrote:
> Am 22.11.22 um 20:50 schrieb Daniel Vetter:
> > On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe  wrote:
> >> On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
> >>> You nuke all the ptes. Drivers that move have slightly more than a
> >>> bare struct file, they also have a struct address_space so that
> >>> invalidate_mapping_range() works.
> >> Okay, this is one of the ways that this can be made to work correctly,
> >> as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
> >> was the DAX mistake)
> > Hence this patch, to enforce that no dma-buf exporter gets this wrong.
> > Which some did, and then blamed bug reporters for the resulting splats
> > :-) One of the things we've reverted was the ttm huge pte support,
> > since that doesn't have the pmd_special flag (yet) and so would let
> > gup_fast through.
>
> The problem is not only gup: a lot of people seem to assume that once
> you are able to grab a reference to a page, the ptes pointing to that
> page can't change any more. And that's obviously incorrect.
>
> I witnessed tons of discussions about that already. Some customers even
> modified our code assuming that and then wondered why the heck they ran
> into data corruption.
>
> It's gotten so bad that I've even proposed intentionally mangling the
> page reference count on TTM allocated pages:
> https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koe...@amd.com/

Yeah, maybe something like this could be applied after we land this
patch here. We should maybe also have the same check in the gem mmap
code, to make sure no driver gets this wrong (rough sketch below).
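
Purely as an illustration of what that "same check in gem mmap code"
could look like, a rough sketch; the helper name and the exact hook
point are assumptions for the example, this has not been posted as a
patch:

#include <linux/bug.h>
#include <linux/mm.h>

/*
 * Hypothetical helper: call right after a GEM driver's ->mmap callback
 * has set up the vma, mirroring the dma-buf check in this patch.
 */
static inline void gem_mmap_check_pfnmap(struct vm_area_struct *vma)
{
        WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP));
}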

> I think it would be better if, instead of having special flags in the
> ptes and vmas saying that you can't follow them to a page structure, we
> added something to the page itself indicating that you can't grab a
> reference to it. But this might break some use cases as well.

Afaik the problem with that is that there are no free page bits left
for these debug checks. Plus the pte+vma flags already exist to make
this clear.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Christian König

Am 22.11.22 um 20:50 schrieb Daniel Vetter:

On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe  wrote:

On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:

You nuke all the ptes. Drivers that move have slightly more than a
bare struct file, they also have a struct address_space so that
invalidate_mapping_range() works.

Okay, this is one of the ways that this can be made to work correctly,
as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
was the DAX mistake)

Hence this patch, to enforce that no dma-buf exporter gets this wrong.
Which some did, and then blamed bug reporters for the resulting splats
:-) One of the things we've reverted was the ttm huge pte support,
since that doesn't have the pmd_special flag (yet) and so would let
gup_fast through.


The problem is not only gup: a lot of people seem to assume that once
you are able to grab a reference to a page, the ptes pointing to that
page can't change any more. And that's obviously incorrect.


I witnessed tons of discussions about that already. Some customers even 
modified our code assuming that and then wondered why the heck they ran 
into data corruption.


It's gotten so bad that I've even proposed intentionally mangling the 
page reference count on TTM allocated pages: 
https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koe...@amd.com/


I think it would be better if, instead of having special flags in the
ptes and vmas saying that you can't follow them to a page structure, we
added something to the page itself indicating that you can't grab a
reference to it. But this might break some use cases as well.


Regards,
Christian.


-Daniel




Re: [Intel-gfx] [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Thomas Zimmermann

Hi

Am 22.11.22 um 18:08 schrieb Daniel Vetter:

tldr; DMA buffers aren't normal memory, expecting that you can use
them like that (like calling get_user_pages works, or that they're
accounted like any other normal memory) cannot be guaranteed.

Since some userspace only runs on integrated devices, where all
buffers are actually all resident system memory, there's a huge
temptation to assume that a struct page is always present and usable
like for any other pagecache-backed mmap. This has the potential to
result in a uapi nightmare.

To stop this gap, require that DMA buffer mmaps are VM_PFNMAP, which
blocks get_user_pages and all the other struct page based
infrastructure for everyone. In spirit this is the uapi counterpart to
the kernel-internal CONFIG_DMABUF_DEBUG.

Motivated by a recent patch which wanted to switch the system dma-buf
heap to vm_insert_page instead of vm_insert_pfn.

v2:

Jason brought up that we also want to guarantee that all ptes have the
pte_special flag set, to catch fast get_user_pages (on architectures
that support this). Allowing VM_MIXEDMAP (like VM_SPECIAL does) would
still allow vm_insert_page, but limiting to VM_PFNMAP will catch that.

From auditing the various functions to insert pfn pte entries
(vm_insert_pfn_prot, remap_pfn_range and all its callers like
dma_mmap_wc) it looks like VM_PFNMAP is already required anyway, so
this should be the correct flag to check for.

v3: Change to WARN_ON_ONCE (Thomas Zimmermann)

References: 
https://lore.kernel.org/lkml/cakmk7uhi+mg0z0humnt13qccvuturvjpcr0njrl12k-wbwz...@mail.gmail.com/
Acked-by: Christian König 
Acked-by: Thomas Zimmermann 
Cc: Thomas Zimmermann 
Cc: Jason Gunthorpe 
Cc: Suren Baghdasaryan 
Cc: Matthew Wilcox 
Cc: John Stultz 
Signed-off-by: Daniel Vetter 
Cc: Sumit Semwal 
Cc: "Christian König" 
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
--
Ok, I entirely forgot about this patch but stumbled over it and checked
what's up with it now. I think it's ready now for merging:
- shmem helper patches to fix up vgem landed
- ttm has been fixed for a while
- I don't think we've had any other open issues

Time to lock down this uapi contract for real?
-Daniel
---
  drivers/dma-buf/dma-buf.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b6c36914e7c6..88718665c3c3 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -150,6 +150,8 @@ static int dma_buf_mmap_internal(struct file *file, struct 
vm_area_struct *vma)
ret = dmabuf->ops->mmap(dmabuf, vma);
dma_resv_unlock(dmabuf->resv);
  
+	WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP));


Well, I already a-b'ed this, but does it work with the DMA helpers? I'm
asking because of [1].


Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/v6.1-rc6/source/drivers/gpu/drm/drm_gem_dma_helper.c#L533



+
return ret;
  }
  
@@ -1495,6 +1497,8 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,

ret = dmabuf->ops->mmap(dmabuf, vma);
dma_resv_unlock(dmabuf->resv);
  
+	WARN_ON_ONCE(!(vma->vm_flags & VM_PFNMAP));

+
return ret;
  }
  EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF);


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev




Re: [Intel-gfx] [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-22 Thread Daniel Vetter
On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe  wrote:
> On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
> > You nuke all the ptes. Drivers that move have slightly more than a
> > bare struct file, they also have a struct address_space so that
> > invalidate_mapping_range() works.
>
> Okay, this is one of the ways that this can be made to work correctly,
> as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
> was the DAX mistake)

Hence this patch, to enforce that no dma-buf exporter gets this wrong.
Which some did, and then blamed bug reporters for the resulting splats
:-) One of the things we've reverted was the ttm huge pte support,
since that doesn't have the pmd_special flag (yet) and so would let
gup_fast through.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-22 Thread Daniel Vetter
On Tue, 22 Nov 2022 at 19:50, Jason Gunthorpe  wrote:
>
> On Tue, Nov 22, 2022 at 07:08:25PM +0100, Daniel Vetter wrote:
> > On Tue, 22 Nov 2022 at 19:04, Jason Gunthorpe  wrote:
> > >
> > > On Tue, Nov 22, 2022 at 06:08:00PM +0100, Daniel Vetter wrote:
> > > > tldr; DMA buffers aren't normal memory, expecting that you can use
> > > > them like that (like calling get_user_pages works, or that they're
> > > > accounted like any other normal memory) cannot be guaranteed.
> > > >
> > > > Since some userspace only runs on integrated devices, where all
> > > > buffers are actually all resident system memory, there's a huge
> > > > temptation to assume that a struct page is always present and usable
> > > > like for any other pagecache-backed mmap. This has the potential to
> > > > result in a uapi nightmare.
> > > >
> > > > To stop this gap, require that DMA buffer mmaps are VM_PFNMAP, which
> > > > blocks get_user_pages and all the other struct page based
> > > > infrastructure for everyone. In spirit this is the uapi counterpart to
> > > > the kernel-internal CONFIG_DMABUF_DEBUG.
> > > >
> > > > Motivated by a recent patch which wanted to switch the system dma-buf
> > > > heap to vm_insert_page instead of vm_insert_pfn.
> > > >
> > > > v2:
> > > >
> > > > Jason brought up that we also want to guarantee that all ptes have the
> > > > pte_special flag set, to catch fast get_user_pages (on architectures
> > > > that support this). Allowing VM_MIXEDMAP (like VM_SPECIAL does) would
> > > > still allow vm_insert_page, but limiting to VM_PFNMAP will catch that.
> > > >
> > > > From auditing the various functions to insert pfn pte entries
> > > > (vm_insert_pfn_prot, remap_pfn_range and all its callers like
> > > > dma_mmap_wc) it looks like VM_PFNMAP is already required anyway, so
> > > > this should be the correct flag to check for.
> > >
> > > I didn't look at how this actually gets used, but it is a bit of a
> > > pain to insert a lifetime controlled object like a struct page as a
> > > special PTE/VM_PFNMAP
> > >
> > > How is the lifetime model implemented here? How do you know when
> > > userspace has finally unmapped the page?
> >
> > The vma has a filp which is the refcounted dma_buf. With dma_buf you
> > never get an individual page; it's always the entire object. And it's
> > up to the allocator how exactly it wants to use or not use the page's
> > refcount. So if gup goes in and elevates the refcount, you can break
> > stuff, which is why I'm doing this.
>
> But how does move work?

You nuke all the ptes. Drivers that move have slightly more than a
bare struct file, they also have a struct address_space so that
invalidate_mapping_range() works. Refaulting and any coherency issues
when a refault races against a dma-buf migration are up to the
driver/exporter to handle correctly. None rely on struct page the way
the mm/ core does when moving stuff around for
compaction/ksm/numa-balancing/whatever.
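
As a minimal sketch of the exporter side of this, assuming the helper
meant here is unmap_mapping_range() and using a made-up my_buf
structure (real drivers such as TTM keep an address_space around for
exactly this purpose):

#include <linux/fs.h>
#include <linux/mm.h>

/* Made-up buffer bookkeeping, for the sketch only. */
struct my_buf {
        struct address_space *mapping;  /* backs every mmap of this buffer */
        pgoff_t pgoff;                  /* buffer offset within that mapping */
        size_t size;
};

static void my_buf_begin_move(struct my_buf *buf)
{
        /*
         * Nuke all ptes of existing userspace mappings; the next access
         * refaults and the fault handler inserts pfns pointing at the
         * buffer's new location.
         */
        unmap_mapping_range(buf->mapping,
                            (loff_t)buf->pgoff << PAGE_SHIFT,
                            buf->size, 1);
}
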
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-22 Thread Daniel Vetter
On Tue, 22 Nov 2022 at 19:04, Jason Gunthorpe  wrote:
>
> On Tue, Nov 22, 2022 at 06:08:00PM +0100, Daniel Vetter wrote:
> > tldr; DMA buffers aren't normal memory, expecting that you can use
> > them like that (like calling get_user_pages works, or that they're
> > accounted like any other normal memory) cannot be guaranteed.
> >
> > Since some userspace only runs on integrated devices, where all
> > buffers are actually all resident system memory, there's a huge
> > temptation to assume that a struct page is always present and usable
> > like for any other pagecache-backed mmap. This has the potential to
> > result in a uapi nightmare.
> >
> > To stop this gap, require that DMA buffer mmaps are VM_PFNMAP, which
> > blocks get_user_pages and all the other struct page based
> > infrastructure for everyone. In spirit this is the uapi counterpart to
> > the kernel-internal CONFIG_DMABUF_DEBUG.
> >
> > Motivated by a recent patch which wanted to switch the system dma-buf
> > heap to vm_insert_page instead of vm_insert_pfn.
> >
> > v2:
> >
> > Jason brought up that we also want to guarantee that all ptes have the
> > pte_special flag set, to catch fast get_user_pages (on architectures
> > that support this). Allowing VM_MIXEDMAP (like VM_SPECIAL does) would
> > still allow vm_insert_page, but limiting to VM_PFNMAP will catch that.
> >
> > From auditing the various functions to insert pfn pte entries
> > (vm_insert_pfn_prot, remap_pfn_range and all its callers like
> > dma_mmap_wc) it looks like VM_PFNMAP is already required anyway, so
> > this should be the correct flag to check for.
>
> I didn't look at how this actually gets used, but it is a bit of a
> pain to insert a lifetime controlled object like a struct page as a
> special PTE/VM_PFNMAP
>
> How is the lifetime model implemented here? How do you know when
> userspace has finally unmapped the page?

The vma has a filp which is the refcounted dma_buf. With dma_buf you
never get an individual page; it's always the entire object. And it's
up to the allocator how exactly it wants to use or not use the page's
refcount. So if gup goes in and elevates the refcount, you can break
stuff, which is why I'm doing this.
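
To spell out that lifetime model, a condensed sketch of roughly what
the dma_buf_mmap() helper does when an importer forwards a mapping
(simplified, the range checks of the real function are omitted): the
vma ends up holding a file reference on the dma-buf, so the whole
buffer, never an individual page, stays alive until the last munmap
drops it.

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>
#include <linux/mm.h>

/* Simplified sketch of dma_buf_mmap(); error and range checks omitted. */
static int dma_buf_mmap_sketch(struct dma_buf *dmabuf,
                               struct vm_area_struct *vma,
                               unsigned long pgoff)
{
        int ret;

        /*
         * vma_set_file() takes a reference on dmabuf->file (and drops
         * the one on the old vm_file), so the mapping pins the dma-buf
         * object as a whole, not any individual page.
         */
        vma_set_file(vma, dmabuf->file);
        vma->vm_pgoff = pgoff;

        dma_resv_lock(dmabuf->resv, NULL);
        ret = dmabuf->ops->mmap(dmabuf, vma);
        dma_resv_unlock(dmabuf->resv);

        return ret;
}
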
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch