Re: [Intel-gfx] [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Daniel Vetter
On Wed, 23 Nov 2022 at 16:15, Christian König  wrote:
>
> Am 23.11.22 um 16:08 schrieb Jason Gunthorpe:
> > On Wed, Nov 23, 2022 at 03:34:54PM +0100, Daniel Vetter wrote:
> >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>> index 1376a47fedeedb..4161241fc3228c 100644
> >>> --- a/virt/kvm/kvm_main.c
> >>> +++ b/virt/kvm/kvm_main.c
> >>> @@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
> >>>  return r;
> >>>  }
> >>>
> >>> +   /*
> >>> +    * Special PTEs are never convertible into a struct page, even if the
> >>> +    * driver that owns them might have put a PFN with a struct page into
> >>> +    * the PFNMAP. If the arch doesn't support special then we cannot
> >>> +    * safely process these pages.
> >>> +    */
> >>> +#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
> >>> +   if (pte_special(*ptep))
> >>> +   return -EINVAL;
On second thought this won't work, because it completely defeats the
> >> point of why this code here exists. remap_pfn_range() (which is what
> >> the various dma_mmap functions and the ioremap functions are built on
> >> top of too) sets VM_PFNMAP too, so this check would even catch the
> >> static mappings.
> > The problem with the way this code is designed is how it allows
> > returning the pfn without taking any reference based on things like
> > !pfn_valid or page_reserved. This allows it to then conditionally put
> > back the reference based on the same reasoning. It is impossible to
> > thread pte special into that since it is a PTE flag, not a property of
> > the PFN.
> >
> > I don't entirely understand why it needs the page reference at all,
>
> That's exactly what I've pointed out in the previous discussion about
> that code as well.
>
> As far as I can see, this is just another case where people assumed
> that grabbing a page reference somehow magically prevents the pte from
> changing.
>
> I have not the slightest idea how people got this impression, but I have
> heard it so many times from so many different sources that there must be
> some common cause. Is there maybe some book or tutorial on how to break
> the kernel in sophisticated ways, or something like this?

It's what get_user_pages does, so it does "work". Except this path
here is the fallback for when get_user_pages does not work (because of
the pte_special/VM_SPECIAL case). So essentially it's just a rather
broken get_user_pages that hand-rolls a bunch of things, with bugs and
races.

I have no idea why people don't realize they're just reinventing gup
without using gup, but that's essentially what's going on.
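
To spell out what that reinvented gup looks like, the core of
hva_to_pfn_remapped() is roughly the following (a paraphrased sketch,
not the verbatim kvm code):

	/*
	 * Paraphrased sketch of the pattern being criticized: walk the
	 * pte by hand, pull the raw pfn out, and opportunistically grab
	 * a page reference if the pfn happens to be backed by a
	 * refcounted struct page. Nothing here prevents the pte from
	 * being changed or invalidated concurrently.
	 */
	r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
	if (r)
		return r;
	pfn = pte_pfn(*ptep);
	if (pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn)) &&
	    !get_page_unless_zero(pfn_to_page(pfn)))
		r = -EFAULT;	/* hand-rolled gup, minus gup's rules */
	pte_unmap_unlock(ptep, ptl);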

> Anyway, as far as I can see the only correct approach would be to use an
> MMU notifier or the more high-level hmm_range_fault()+seq number.

Yeah, plus if you go through ptes you really have to obey all the
flags or things will break. Especially the RO pte flag.
-Daniel

>
> Regards,
> Christian.
>
> > even if it is available - so I can't guess why it is OK to ignore the
> > page reference in other cases, or why it is OK to be racy..
> >
> > Eg hmm_range_fault() does not obtain page references and implements a
> > very similar algorithm to kvm.
> >
> >> Plus these static mappings aren't all that static either, e.g. pci
> >> access also can revoke bar mappings nowadays.
> > And there are already mmu notifiers to handle that, AFAIK.
> >
> > Jason
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Daniel Vetter
On Wed, 23 Nov 2022 at 16:04, Jason Gunthorpe  wrote:
>
> On Wed, Nov 23, 2022 at 03:28:27PM +0100, Daniel Vetter wrote:
>
> > > This patch is known to be broken in so many ways. It also has a major
> > > security hole: it ignores the PTE flag that makes the page RO.
> > > Ignoring the special bit is somehow not surprising :(
> > >
> > > This probably doesn't work, but is the general idea of what KVM needs
> > > to do:
> >
> > Oh dear, when I dug around in there I entirely missed that
> > kvm_try_get_pfn exists, and it's very broken indeed. kvm really needs
> > to grow a proper mmu notifier.
> >
> > Another thing I'm wondering right now, the follow_pte();
> > fixup_user_fault(); follow_pte(); approach does not make any
> > guarantees of actually being right. If you're sufficiently unlucky you
> > might race against an immediate pte invalidate between the fixup and
> > the 2nd follow_pte(). But you can also not loop, because that would
> > fail to catch permanent faults.
>
> Yes, it is pretty broken.
>
> kvm already has support for mmu notifiers and uses it for other
> stuff. I can't remember what exactly this code path was for, IIRC
> Paolo talked about having a big rework/fix for it when we last talked
> about the missing write protect. I also vaguely recall he had some
> explanation of why this might be safe.
>
> > I think the iommu fault drivers have a similar pattern.
>
> Where? It shouldn't.
>
> The common code for SVA just calls handle_mm_fault() and restarts the
> PRI. Since the page table is physically shared there is no issue with
> a stale copy.
>
> > What am I missing here? Or is that also just broken? gup works around
> > this with the slow path that takes the mmap sem and walks the vma
> > tree; follow_pte/fixup_user_fault users don't.
>
> follow_pte() is just fundamentally broken, things must not use it.
>
> > Maybe mmu notifier based restarting would help with this too, if
> > done properly.
>
> That is called hmm_range_fault()

Ah right, I mixed that up on a quick grep, thanks for pointing me in
the right direction. Worries appeased.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Christian König

Am 23.11.22 um 16:08 schrieb Jason Gunthorpe:

On Wed, Nov 23, 2022 at 03:34:54PM +0100, Daniel Vetter wrote:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1376a47fedeedb..4161241fc3228c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 return r;
 }

+   /*
+    * Special PTEs are never convertible into a struct page, even if the
+    * driver that owns them might have put a PFN with a struct page into
+    * the PFNMAP. If the arch doesn't support special then we cannot
+    * safely process these pages.
+    */
+#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
+   if (pte_special(*ptep))
+   return -EINVAL;

On second thought this won't work, because it completely defeats the
point of why this code here exists. remap_pfn_range() (which is what
the various dma_mmap functions and the ioremap functions are built on
top of too) sets VM_PFNMAP too, so this check would even catch the
static mappings.

The problem with the way this code is designed is how it allows
returning the pfn without taking any reference based on things like
!pfn_valid or page_reserved. This allows it to then conditionally put
back the reference based on the same reasoning. It is impossible to
thread pte special into that since it is a PTE flag, not a property of
the PFN.

I don't entirely understand why it needs the page reference at all,


That's exactly what I've pointed out in the previous discussion about 
that code as well.


As far as I can see, this is just another case where people assumed
that grabbing a page reference somehow magically prevents the pte from 
changing.


I have not the slightest idea how people got this impression, but I have
heard it so many times from so many different sources that there must be
some common cause. Is there maybe some book or tutorial on how to break
the kernel in sophisticated ways, or something like this?


Anyway, as far as I can see the only correct approach would be to use an
MMU notifier or the more high-level hmm_range_fault()+seq number.
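
A minimal sketch of that retry pattern, lifted in spirit from
Documentation/mm/hmm.rst (driver locking elided, names shortened):

	struct hmm_range range = {
		.notifier	= &interval_sub,  /* mmu_interval_notifier */
		.start		= start,
		.end		= end,
		.hmm_pfns	= pfns,
		.default_flags	= HMM_PFN_REQ_FAULT,
	};

again:
	range.notifier_seq = mmu_interval_read_begin(&interval_sub);
	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret) {
		if (ret == -EBUSY)
			goto again;	/* invalidated while faulting */
		return ret;
	}
	/* take the driver lock that serializes against invalidations */
	if (mmu_interval_read_retry(&interval_sub, range.notifier_seq))
		goto again;		/* raced, pfns are stale */
	/* pfns stay valid until the next invalidation callback runs */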


Regards,
Christian.


even if it is available - so I can't guess why it is OK to ignore the
page reference in other cases, or why it is OK to be racy..

Eg hmm_range_fault() does not obtain page references and implements a
very similar algorithm to kvm.


Plus these static mappings aren't all that static either, e.g. pci
access also can revoke bar mappings nowadays.

And there are already mmu notifiers to handle that, AFAIK.

Jason




Re: [Intel-gfx] [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Daniel Vetter
On Wed, 23 Nov 2022 at 14:28, Jason Gunthorpe  wrote:
>
> On Wed, Nov 23, 2022 at 02:12:25PM +0100, Christian König wrote:
> > Am 23.11.22 um 13:53 schrieb Jason Gunthorpe:
> > > On Wed, Nov 23, 2022 at 01:49:41PM +0100, Christian König wrote:
> > > > Am 23.11.22 um 13:46 schrieb Jason Gunthorpe:
> > > > > On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:
> > > > >
> > > > > > > Maybe a GFP flag to set the page reference count to zero or
> > > > > > > something like this?
> > > > > > Hm yeah that might work. I'm not sure what it will all break though?
> > > > > > And we'd need to make sure that underflowing the page refcount dies
> > > > > > in a backtrace.
> > > > > Mucking with the refcount like this to protect against crazy
> > > > > out-of-tree drivers seems horrible..
> > > > Well, not only out-of-tree drivers. The in-tree KVM got that horribly
> > > > wrong as well; those were the latest guys complaining about it.
> > > kvm was taking refs on special PTEs? That seems really unlikely?
> >
> > Well then look at this code here:
> >
> > commit add6a0cd1c5ba51b201e1361b05a5df817083618
> > Author: Paolo Bonzini 
> > Date:   Tue Jun 7 17:51:18 2016 +0200
> >
> > KVM: MMU: try to fix up page faults before giving up
> >
> > The vGPU folks would like to trap the first access to a BAR by setting
> > vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
> > then can use remap_pfn_range to place some non-reserved pages in the VMA.
> >
> > This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
> > and fixup_user_fault together help supporting it.  The patch also supports
> > VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
> > reference counting.
> >
> > Cc: Xiao Guangrong 
> > Cc: Andrea Arcangeli 
> > Cc: Radim Krčmář 
> > Tested-by: Neo Jia 
> > Reported-by: Kirti Wankhede 
> > Signed-off-by: Paolo Bonzini 
>
> This patch is known to be broken in so many ways. It also has a major
> security hole: it ignores the PTE flag that makes the page RO.
> Ignoring the special bit is somehow not surprising :(
>
> This probably doesn't work, but is the general idea of what KVM needs
> to do:
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1376a47fedeedb..4161241fc3228c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
> return r;
> }
>
> +   /*
> +    * Special PTEs are never convertible into a struct page, even if the
> +    * driver that owns them might have put a PFN with a struct page into
> +    * the PFNMAP. If the arch doesn't support special then we cannot
> +    * safely process these pages.
> +    */
> +#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
> +   if (pte_special(*ptep))
> +   return -EINVAL;

On second thought this won't work, because it completely defeats the
point of why this code here exists. remap_pfn_range() (which is what
the various dma_mmap functions and the ioremap functions are built on
top of too) sets VM_PFNMAP too, so this check would even catch the
static mappings.
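
For reference, the static mappings in question look roughly like this
(a sketch with made-up foo_* names):

	/*
	 * Typical driver mmap built on remap_pfn_range().
	 * remap_pfn_range() sets VM_IO | VM_PFNMAP | VM_DONTEXPAND |
	 * VM_DONTDUMP on the vma and installs the ptes with
	 * pte_mkspecial(), so the pte_special() check above would
	 * reject these perfectly legitimate mappings too.
	 */
	static int foo_mmap(struct file *file, struct vm_area_struct *vma)
	{
		struct foo_dev *foo = file->private_data;

		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
		return remap_pfn_range(vma, vma->vm_start, foo->bar_pfn,
				       vma->vm_end - vma->vm_start,
				       vma->vm_page_prot);
	}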

Plus these static mappings aren't all that static either, e.g. pci
access also can revoke bar mappings nowadays.

I think nothing except full mmu_notifier will actually fix this.
-Daniel

> +#else
> +   return -EINVAL;
> +#endif
> +
> if (write_fault && !pte_write(*ptep)) {
> pfn = KVM_PFN_ERR_RO_FAULT;
> goto out;
>
> Jason



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Daniel Vetter
On Wed, 23 Nov 2022 at 14:28, Jason Gunthorpe  wrote:
>
> On Wed, Nov 23, 2022 at 02:12:25PM +0100, Christian König wrote:
> > Am 23.11.22 um 13:53 schrieb Jason Gunthorpe:
> > > On Wed, Nov 23, 2022 at 01:49:41PM +0100, Christian König wrote:
> > > > Am 23.11.22 um 13:46 schrieb Jason Gunthorpe:
> > > > > On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:
> > > > >
> > > > > > > Maybe a GFP flag to set the page reference count to zero or
> > > > > > > something like this?
> > > > > > Hm yeah that might work. I'm not sure what it will all break though?
> > > > > > And we'd need to make sure that underflowing the page refcount dies
> > > > > > in a backtrace.
> > > > > Mucking with the refcount like this to protect against crazy
> > > > > out-of-tree drivers seems horrible..
> > > > Well, not only out-of-tree drivers. The in-tree KVM got that horribly
> > > > wrong as well; those were the latest guys complaining about it.
> > > kvm was taking refs on special PTEs? That seems really unlikely?
> >
> > Well then look at this code here:
> >
> > commit add6a0cd1c5ba51b201e1361b05a5df817083618
> > Author: Paolo Bonzini 
> > Date:   Tue Jun 7 17:51:18 2016 +0200
> >
> > KVM: MMU: try to fix up page faults before giving up
> >
> > The vGPU folks would like to trap the first access to a BAR by setting
> > vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
> > then can use remap_pfn_range to place some non-reserved pages in the VMA.
> >
> > This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
> > and fixup_user_fault together help supporting it.  The patch also supports
> > VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
> > reference counting.
> >
> > Cc: Xiao Guangrong 
> > Cc: Andrea Arcangeli 
> > Cc: Radim Krčmář 
> > Tested-by: Neo Jia 
> > Reported-by: Kirti Wankhede 
> > Signed-off-by: Paolo Bonzini 
>
> This patch is known to be broken in so many ways. It also has a major
> security hole: it ignores the PTE flag that makes the page RO.
> Ignoring the special bit is somehow not surprising :(
>
> This probably doesn't work, but is the general idea of what KVM needs
> to do:

Oh dear, when I dug around in there I entirely missed that
kvm_try_get_pfn exists, and it's very broken indeed. kvm really needs
to grow a proper mmu notifier.

Another thing I'm wondering right now, the follow_pte();
fixup_user_fault(); follow_pte(); approach does not make any
guarantees of actually being right. If you're sufficiently unlucky you
might race against an immediate pte invalidate between the fixup and
the 2nd follow_pte(). But you can also not loop, because that would
fail to catch permanent faults.
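
Schematically, the window looks like this (an annotated sketch of the
existing flow, not a proposal):

	r = follow_pte(mm, addr, &ptep, &ptl);		/* pte not present */
	if (r) {
		fixup_user_fault(mm, addr, flags, NULL);  /* fault it in */
		/* <-- nothing stops an invalidation right here --> */
		r = follow_pte(mm, addr, &ptep, &ptl);	/* may fail again */
	}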

I think the iommu fault drivers have a similar pattern.

What am I missing here? Or is that also just broken? gup works around
this with the slow path that takes the mmap sem and walks the vma
tree; follow_pte/fixup_user_fault users don't. Maybe mmu notifier based
restarting would help with this too, if done properly.
-Daniel

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1376a47fedeedb..4161241fc3228c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
> return r;
> }
>
> +   /*
> +    * Special PTEs are never convertible into a struct page, even if the
> +    * driver that owns them might have put a PFN with a struct page into
> +    * the PFNMAP. If the arch doesn't support special then we cannot
> +    * safely process these pages.
> +    */
> +#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
> +   if (pte_special(*ptep))
> +   return -EINVAL;
> +#else
> +   return -EINVAL;
> +#endif
> +
> if (write_fault && !pte_write(*ptep)) {
> pfn = KVM_PFN_ERR_RO_FAULT;
> goto out;
>
> Jason



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Christian König

Am 23.11.22 um 13:53 schrieb Jason Gunthorpe:

On Wed, Nov 23, 2022 at 01:49:41PM +0100, Christian König wrote:

Am 23.11.22 um 13:46 schrieb Jason Gunthorpe:

On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:


Maybe a GFP flag to set the page reference count to zero or something
like this?

Hm yeah that might work. I'm not sure what it will all break though?
And we'd need to make sure that underflowing the page refcount dies in
a backtrace.

Mucking with the refcount like this to protect against crazy
out-of-tree drivers seems horrible..


Well, not only out-of-tree drivers. The in-tree KVM got that horribly
wrong as well; those were the latest guys complaining about it.

kvm was taking refs on special PTEs? That seems really unlikely?


Well then look at this code here:

commit add6a0cd1c5ba51b201e1361b05a5df817083618
Author: Paolo Bonzini 
Date:   Tue Jun 7 17:51:18 2016 +0200

    KVM: MMU: try to fix up page faults before giving up

    The vGPU folks would like to trap the first access to a BAR by setting
    vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
    then can use remap_pfn_range to place some non-reserved pages in the VMA.

    This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
    and fixup_user_fault together help supporting it.  The patch also supports
    VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
    reference counting.

    Cc: Xiao Guangrong 
    Cc: Andrea Arcangeli 
    Cc: Radim Krčmář 
    Tested-by: Neo Jia 
    Reported-by: Kirti Wankhede 
    Signed-off-by: Paolo Bonzini 

And see also the discussion here: 
https://patchwork.freedesktop.org/patch/414123/


as well as here: https://patchwork.freedesktop.org/patch/499190/

I can't count how often I have pointed out that this is absolutely 
illegal and KVM can't touch pages in VMAs with VM_PFNMAP.



The WARN_ON(page_count(p) != 1) seems like a reasonable thing to do,
though you must combine this with the special PTE flag..

That's not sufficient. The pages are released much later than when things
actually go wrong. In most cases this WARN_ON here won't hit.

How so? As long as the page is mapped into the PTE there is no issue
with corruption. If dmabuf checks the refcount after it does the
unmap_mapping_range() it should catch any bogus pin that might be confused
about address coherency.
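
As an exporter-side sketch of that check (dbuf, pages and npages are
made-up names here):

	/* revoke all userspace mappings; any leftover pin is a caught bug */
	unmap_mapping_range(dbuf->file->f_mapping, 0, dbuf->size, 1);
	for (i = 0; i < npages; i++)
		WARN_ON(page_count(pages[i]) != 1);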


Yeah, that would work. The problem is this WARN_ON() comes much later.

The device drivers usually keep the page around for a while even after 
it is unmapped. IIRC the cleanup worker only runs every 10ms or so.


Christian.



Jason




Re: [Intel-gfx] [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Christian König

Am 23.11.22 um 13:46 schrieb Jason Gunthorpe:

On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:


Maybe a GFP flag to set the page reference count to zero or something
like this?

Hm yeah that might work. I'm not sure what it will all break though?
And we'd need to make sure that underflowing the page refcount dies in
a backtrace.

Mucking with the refcount like this to protect against crazy
out-of-tree drivers seems horrible..


Well, not only out-of-tree drivers. The in-tree KVM got that horribly
wrong as well; those were the latest guys complaining about it.




The WARN_ON(page_count(p) != 1) seems like a reasonable thing to do,
though you must combine this with the special PTE flag..


That's not sufficient. The pages are released much later than when things
actually go wrong. In most cases this WARN_ON here won't hit.


Christian.



Jason




Re: [Intel-gfx] [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Daniel Vetter
On Wed, 23 Nov 2022 at 10:39, Christian König
 wrote:
>
> Am 23.11.22 um 10:30 schrieb Daniel Vetter:
> > On Wed, 23 Nov 2022 at 10:06, Christian König  
> > wrote:
> >> Am 22.11.22 um 20:50 schrieb Daniel Vetter:
> >>> On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe  wrote:
>  On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
> > You nuke all the ptes. Drivers that move have slightly more than a
> > bare struct file, they also have a struct address_space so that
> > invalidate_mapping_range() works.
>  Okay, this is one of the ways that this can be made to work correctly,
>  as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
>  was the DAX mistake)
> >>> Hence this patch, to enforce that no dma-buf exporter gets this wrong.
> >>> Which some did, and then blamed bug reporters for the resulting splats
> >>> :-) One of the things we've reverted was the ttm huge pte support,
> >>> since that doesn't have the pmd_special flag (yet) and so would let
> >>> gup_fast through.
> >> The problem is not only gup; a lot of people seem to assume that when
> >> you are able to grab a reference to a page, the ptes pointing to
> >> that page can't change any more. And that's obviously incorrect.
> >>
> >> I witnessed tons of discussions about that already. Some customers even
> >> modified our code assuming that and then wondered why the heck they ran
> >> into data corruption.
> >>
> >> It's gotten so bad that I've even proposed intentionally mangling the
> >> page reference count on TTM allocated pages:
> >> https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koe...@amd.com/
> > Yeah maybe something like this could be applied after we land this
> > patch here.
>
> I think both should land ASAP. We don't have any other way than to
> clearly point out incorrect approaches if we want to prevent the
> resulting data corruption.
>
> > Well maybe should have the same check in gem mmap code to
> > make sure no driver
>
> Reads like the sentence is somehow cut off?

Yeah, just wanted to say that we need to make sure in as many places
as possible that VM_PFNMAP is set for bo mmaps.
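
i.e. roughly this kind of check in dma_buf_mmap() (a sketch of the
idea, not the exact diff of the patch in the subject line):

	/* after the exporter's mmap callback ran, insist on VM_PFNMAP
	 * so that gup can never succeed on dma-buf mappings */
	ret = dmabuf->ops->mmap(dmabuf, vma);
	if (WARN_ON_ONCE(!ret && !(vma->vm_flags & VM_PFNMAP)))
		return -EINVAL;
	return ret;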

> >> I think it would be better if, instead of having special flags in the
> >> ptes and vmas saying that you can't follow them to a page structure, we
> >> would add something to the page indicating that you can't grab a
> >> reference to it. But this might break some use cases as well.
> > Afaik the problem with that is that there's no free page bits left for
> > these debug checks. Plus the pte+vma flags are the flags to make this
> > clear already.
>
> Maybe a GFP flag to set the page reference count to zero or something
> like this?

Hm yeah that might work. I'm not sure what it will all break though?
And we'd need to make sure that underflowing the page refcount dies in
a backtrace.
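
Something in this direction, perhaps (a purely hypothetical sketch;
set_page_count() is an mm-internal helper, and this is not the actual
TTM patch linked above):

	/* hide the page from refcounting: any get_page()/put_page()
	 * pair from a confused importer now underflows the refcount
	 * and trips the VM debug checks instead of silently pinning */
	struct page *p = alloc_page(GFP_KERNEL);

	set_page_count(p, 0);
	/* ... expose the pfn via vmf_insert_pfn() only ... */
	set_page_count(p, 1);	/* restore before freeing */
	__free_page(p);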
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

2022-11-23 Thread Christian König

Am 23.11.22 um 10:30 schrieb Daniel Vetter:

On Wed, 23 Nov 2022 at 10:06, Christian König  wrote:

Am 22.11.22 um 20:50 schrieb Daniel Vetter:

On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe  wrote:

On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:

You nuke all the ptes. Drivers that move have slightly more than a
bare struct file, they also have a struct address_space so that
invalidate_mapping_range() works.

Okay, this is one of the ways that this can be made to work correctly,
as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
was the DAX mistake)

Hence this patch, to enforce that no dma-buf exporter gets this wrong.
Which some did, and then blamed bug reporters for the resulting splats
:-) One of the things we've reverted was the ttm huge pte support,
since that doesn't have the pmd_special flag (yet) and so would let
gup_fast through.
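
For context, the fast path bails out on special ptes roughly like this
(paraphrasing gup_pte_range() in mm/gup.c); without a pmd_special
equivalent there is no matching bail-out for huge entries:

	if (pte_special(pte))
		goto pte_unmap;	/* fall back to the slow path, which
				 * sees VM_PFNMAP/VM_IO on the vma and
				 * fails */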

The problem is not only gup; a lot of people seem to assume that when
you are able to grab a reference to a page, the ptes pointing to
that page can't change any more. And that's obviously incorrect.

I witnessed tons of discussions about that already. Some customers even
modified our code assuming that and then wondered why the heck they ran
into data corruption.

It's gotten so bad that I've even proposed intentionally mangling the
page reference count on TTM allocated pages:
https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koe...@amd.com/

Yeah maybe something like this could be applied after we land this
patch here.


I think both should land ASAP. We don't have any other way than to 
clearly point out incorrect approaches if we want to prevent the 
resulting data corruption.



Well maybe should have the same check in gem mmap code to
make sure no driver


Reads like the sentence is somehow cut off?




I think it would be better if, instead of having special flags in the
ptes and vmas saying that you can't follow them to a page structure, we
would add something to the page indicating that you can't grab a
reference to it. But this might break some use cases as well.

Afaik the problem with that is that there's no free page bits left for
these debug checks. Plus the pte+vma flags are the flags to make this
clear already.


Maybe a GFP flag to set the page reference count to zero or something 
like this?


Christian.


-Daniel