Re: [Nouveau] [PATCH v9 03/10] mm/rmap: Split try_to_munlock from try_to_unmap

2021-06-06 Thread Alistair Popple
On Saturday, 5 June 2021 10:41:03 AM AEST Shakeel Butt wrote:
> On Fri, Jun 4, 2021 at 1:49 PM Liam Howlett  wrote:
> >
> > * Shakeel Butt  [210525 19:45]:
> > > > On Tue, May 25, 2021 at 11:40 AM Liam Howlett  wrote:
> > > >
> > > [...]
> > > > >
> > > > > +/*
> > > > > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are
> > > > > + * found. Once one is found the page is locked and the scan can be terminated.
> > > > > + */
> > > >
> > > > Can you please add that this requires the mmap_sem() lock to the
> > > > comments?
> > > >
> > >
> > > Why does this require mmap_sem() lock? Also mmap_sem() lock of which mm_struct?
> >
> >
> > Doesn't the mlock_vma_page() require the mmap_sem() for reading?  The
> > mm_struct in vma->vm_mm;
> >
> 
> We are traversing all the vmas where this page is mapped of possibly
> different mm_structs. I don't think we want to take mmap_sem() of all
> those mm_structs. The commit b87537d9e2fe ("mm: rmap use pte lock not
> mmap_sem to set PageMlocked") removed exactly that.
> 
> >
> > From what I can see, at least the following paths have mmap_lock held
> > for writing:
> >
> > munlock_vma_pages_range() from __do_munmap()
> > munlock_vma_pages_range() from remap_file_pages()
> >
> 
> The following path does not hold mmap_sem:
> 
> exit_mmap() -> munlock_vma_pages_all() -> munlock_vma_pages_range().
> 
> I would really suggest all to carefully read the commit message of
> b87537d9e2fe ("mm: rmap use pte lock not mmap_sem to set
> PageMlocked").
> 
> Particularly the following paragraph:
> ...
> Vlastimil Babka points out another race which this patch protects against.
>  try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
> moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
> leaving PageMlocked and unevictable when it should be evictable.  mmap_sem
> is ineffective because exit_mmap() does not hold it; page lock ineffective
> because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
> is effective because __munlock_pagevec_fill() takes it to get the page,
> after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.
> ...
>
> Alistair, please bring back the VM_LOCKED check with pte lock held and
> the comment "Holding pte lock, we do *not* need mmap_lock here".

Actually thanks for highlighting that paragraph. I have gone back through the 
code again in munlock_vma_pages_range() and think I have a better 
understanding of it now. So now I agree - the check of VM_LOCKED under the PTL 
is important to ensure mlock_vma_page() does not run after VM_LOCKED has been 
cleared and __munlock_pagevec_fill() has run.

Will post v10 to fix this and the try_to_munlock reference pointed out by Liam 
which I missed for v9. Thanks Shakeel for taking the time to point this out.
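
Concretely, the fix amounts to rechecking VM_LOCKED after
page_vma_mapped_walk() has taken the pte lock. A minimal sketch of what
the v10 page_mlock_one() could look like (the check simply moves inside
the walk loop; the posted patch may differ in detail):

static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
                           unsigned long address, void *unused)
{
        struct page_vma_mapped_walk pvmw = {
                .page = page,
                .vma = vma,
                .address = address,
        };

        /* An un-locked vma doesn't have any pages to lock, continue the scan */
        if (!(vma->vm_flags & VM_LOCKED))
                return true;

        while (page_vma_mapped_walk(&pvmw)) {
                /*
                 * Recheck VM_LOCKED under the pte lock to serialise with
                 * munlock_vma_pages_range(): VM_LOCKED is cleared before
                 * __munlock_pagevec_fill() takes this pte lock, so seeing
                 * it set here means the munlock pagevec pass has not yet
                 * run past this pte. Holding pte lock, we do *not* need
                 * mmap_lock here.
                 */
                if (vma->vm_flags & VM_LOCKED) {
                        /* PTE-mapped THP are never mlocked */
                        if (!PageTransCompound(page))
                                mlock_vma_page(page);
                        page_vma_mapped_walk_done(&pvmw);

                        /* The page is mlocked; stop scanning other vma's. */
                        return false;
                }
        }

        return true;
}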

> One positive outcome of this cleanup patch is the removal of
> unnecessary invalidation (unmapping for kvm case) of secondary mmus.





Re: [Nouveau] [PATCH v9 03/10] mm/rmap: Split try_to_munlock from try_to_unmap

2021-06-06 Thread Liam Howlett
* Shakeel Butt  [210604 20:41]:
> On Fri, Jun 4, 2021 at 1:49 PM Liam Howlett  wrote:
> >
> > * Shakeel Butt  [210525 19:45]:
> > > On Tue, May 25, 2021 at 11:40 AM Liam Howlett  
> > > wrote:
> > > >
> > > [...]
> > > > >
> > > > > +/*
> > > > > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are
> > > > > + * found. Once one is found the page is locked and the scan can be terminated.
> > > > > + */
> > > >
> > > > Can you please add that this requires the mmap_sem() lock to the
> > > > comments?
> > > >
> > >
> > > Why does this require mmap_sem() lock? Also mmap_sem() lock of which 
> > > mm_struct?
> >
> >
> > Doesn't the mlock_vma_page() require the mmap_sem() for reading?  The
> > mm_struct in vma->vm_mm;
> >
> 
> We are traversing all the vmas where this page is mapped of possibly
> different mm_structs. I don't think we want to take mmap_sem() of all
> those mm_structs. The commit b87537d9e2fe ("mm: rmap use pte lock not
> mmap_sem to set PageMlocked") removed exactly that.
> 
> >
> > From what I can see, at least the following paths have mmap_lock held
> > for writing:
> >
> > munlock_vma_pages_range() from __do_munmap()
> > munlock_vma_pages_range() from remap_file_pages()
> >
> 
> The following path does not hold mmap_sem:
> 
> exit_mmap() -> munlock_vma_pages_all() -> munlock_vma_pages_range().

Isn't this the benign race referenced by Hugh in the commit you point to
below?

> 
> I would really suggest all to carefully read the commit message of
> b87537d9e2fe ("mm: rmap use pte lock not mmap_sem to set
> PageMlocked").
> 
> Particularly the following paragraph:
> ...
> Vlastimil Babka points out another race which this patch protects against.
>  try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
> moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
> leaving PageMlocked and unevictable when it should be evictable.  mmap_sem
> is ineffective because exit_mmap() does not hold it; page lock ineffective
> because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
> is effective because __munlock_pagevec_fill() takes it to get the page,
> after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.
> ...

So this is saying the race with exit_mmap() isn't benign after all?

> 
> Alistair, please bring back the VM_LOCKED check with pte lock held and
> the comment "Holding pte lock, we do *not* need mmap_lock here".
> 
> One positive outcome of this cleanup patch is the removal of
> unnecessary invalidation (unmapping for kvm case) of secondary mmus.


Re: [Nouveau] [PATCH v9 03/10] mm/rmap: Split try_to_munlock from try_to_unmap

2021-06-06 Thread Liam Howlett
* Shakeel Butt  [210525 19:45]:
> On Tue, May 25, 2021 at 11:40 AM Liam Howlett  wrote:
> >
> [...]
> > >
> > > +/*
> > > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are
> > > + * found. Once one is found the page is locked and the scan can be terminated.
> > > + */
> >
> > Can you please add that this requires the mmap_sem() lock to the
> > comments?
> >
> 
> Why does this require mmap_sem() lock? Also mmap_sem() lock of which 
> mm_struct?


Doesn't the mlock_vma_page() require the mmap_sem() for reading?  The
mm_struct in vma->vm_mm;


From what I can see, at least the following paths have mmap_lock held
for writing:

munlock_vma_pages_range() from __do_munmap()
munlock_vma_pages_range() from remap_file_pages()

> 
> > > +static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
> > > +  unsigned long address, void *unused)
> > > +{
> > > + struct page_vma_mapped_walk pvmw = {
> > > + .page = page,
> > > + .vma = vma,
> > > + .address = address,
> > > + };
> > > +
> > > + /* An un-locked vma doesn't have any pages to lock, continue the scan */
> > > + if (!(vma->vm_flags & VM_LOCKED))
> > > + return true;
> > > +
> > > + while (page_vma_mapped_walk(&pvmw)) {
> > > + /* PTE-mapped THP are never mlocked */
> > > + if (!PageTransCompound(page))
> > > + mlock_vma_page(page);
> > > + page_vma_mapped_walk_done(&pvmw);
> > > +
> > > + /*
> > > +  * no need to continue scanning other vma's if the page has
> > > +  * been locked.
> > > +  */
> > > + return false;
> > > + }
> > > +
> > > + return true;
> > > +}

munlock_vma_pages_range() comments still reference try_to_{munlock|unmap}
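
For reference, the stale wording sits in the header comment above
munlock_vma_pages_range() in mm/mlock.c; the passage in question reads
roughly as follows (quoted from memory, so treat the exact text as
approximate):

/*
 * munlock_vma_pages_range() - munlock all pages in the vma range.
 * ...
 * We don't save and restore VM_LOCKED here because pages are
 * still on lru.  In unmap path, pages might be scanned by reclaim
 * and re-mlocked by try_to_{munlock|unmap} before we unmap and
 * free them.  This will result in freeing mlocked pages.
 */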


Re: [Nouveau] [PATCH v9 03/10] mm/rmap: Split try_to_munlock from try_to_unmap

2021-05-25 Thread Shakeel Butt
On Tue, May 25, 2021 at 11:40 AM Liam Howlett  wrote:
>
[...]
> >
> > +/*
> > + * Walks the vma's mapping a page and mlocks the page if any locked vma's are
> > + * found. Once one is found the page is locked and the scan can be terminated.
> > + */
>
> Can you please add that this requires the mmap_sem() lock to the
> comments?
>

Why does this require mmap_sem() lock? Also mmap_sem() lock of which mm_struct?

> > +static bool page_mlock_one(struct page *page, struct vm_area_struct *vma,
> > +  unsigned long address, void *unused)
> > +{
> > + struct page_vma_mapped_walk pvmw = {
> > + .page = page,
> > + .vma = vma,
> > + .address = address,
> > + };
> > +
> > + /* An un-locked vma doesn't have any pages to lock, continue the scan */
> > + if (!(vma->vm_flags & VM_LOCKED))
> > + return true;
> > +
> > + while (page_vma_mapped_walk(&pvmw)) {
> > + /* PTE-mapped THP are never mlocked */
> > + if (!PageTransCompound(page))
> > + mlock_vma_page(page);
> > + page_vma_mapped_walk_done(&pvmw);
> > +
> > + /*
> > +  * no need to continue scanning other vma's if the page has
> > +  * been locked.
> > +  */
> > + return false;
> > + }
> > +
> > + return true;
> > +}
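
To make the "which mm_struct?" point concrete: page_mlock_one() runs as
an rmap_walk() callback, so the vma it is handed can come from any
process mapping the page. A sketch of the driver side, assuming it keeps
the shape of the existing try_to_munlock() (rmap_walk_control fields as
in mm/rmap.c):

void page_mlock(struct page *page)
{
        struct rmap_walk_control rwc = {
                .rmap_one = page_mlock_one,
                .done = page_not_mapped,
                .anon_lock = page_lock_anon_vma_read,
        };

        VM_BUG_ON_PAGE(!PageLocked(page) || PageLRU(page), page);

        /*
         * rmap_walk() visits every vma mapping this page; the vmas can
         * belong to different mm_structs, so there is no single mmap_sem
         * that could cover the whole walk.
         */
        rmap_walk(page, &rwc);
}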


Re: [Nouveau] [PATCH v9 03/10] mm/rmap: Split try_to_munlock from try_to_unmap

2021-05-25 Thread Liam Howlett
* Alistair Popple  [210524 09:29]:
> The behaviour of try_to_unmap_one() is difficult to follow because it
> performs different operations based on a fairly large set of flags used
> in different combinations.
> 
> TTU_MUNLOCK is one such flag. However it is exclusively used by
> try_to_munlock() which specifies no other flags. Therefore rather than
> overload try_to_unmap_one() with unrelated behaviour split this out into
> its own function and remove the flag.
> 
> Signed-off-by: Alistair Popple 
> Reviewed-by: Ralph Campbell 
> Reviewed-by: Christoph Hellwig 
> 
> ---
> 
> v9:
> * Improved comments
> 
> v8:
> * Renamed try_to_munlock to page_mlock to better reflect what the
>   function actually does.
> * Removed the TODO from the documentation that this patch addresses.
> 
> v7:
> * Added Christoph's Reviewed-by
> 
> v4:
> * Removed redundant check for VM_LOCKED
> ---
>  Documentation/vm/unevictable-lru.rst | 33 ++-
>  include/linux/rmap.h                 |  3 +-
>  mm/mlock.c                           | 10 ++---
>  mm/rmap.c                            | 61
>  4 files changed, 63 insertions(+), 44 deletions(-)
> 
> diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst
> index 0e1490524f53..eae3af17f2d9 100644
> --- a/Documentation/vm/unevictable-lru.rst
> +++ b/Documentation/vm/unevictable-lru.rst
> @@ -389,14 +389,14 @@ mlocked, munlock_vma_page() updates that zone statistics for the number of
>  mlocked pages.  Note, however, that at this point we haven't checked whether
>  the page is mapped by other VM_LOCKED VMAs.
>  
> -We can't call try_to_munlock(), the function that walks the reverse map to
> +We can't call page_mlock(), the function that walks the reverse map to
>  check for other VM_LOCKED VMAs, without first isolating the page from the LRU.
> -try_to_munlock() is a variant of try_to_unmap() and thus requires that the page
> +page_mlock() is a variant of try_to_unmap() and thus requires that the page
>  not be on an LRU list [more on these below].  However, the call to
> -isolate_lru_page() could fail, in which case we couldn't try_to_munlock().  So,
> +isolate_lru_page() could fail, in which case we can't call page_mlock().  So,
>  we go ahead and clear PG_mlocked up front, as this might be the only chance we
> -have.  If we can successfully isolate the page, we go ahead and
> -try_to_munlock(), which will restore the PG_mlocked flag and update the zone
> +have.  If we can successfully isolate the page, we go ahead and call
> +page_mlock(), which will restore the PG_mlocked flag and update the zone
>  page statistics if it finds another VMA holding the page mlocked.  If we fail
>  to isolate the page, we'll have left a potentially mlocked page on the LRU.
>  This is fine, because we'll catch it later if and if vmscan tries to reclaim
> @@ -545,31 +545,24 @@ munlock or munmap system calls, mm teardown (munlock_vma_pages_all), reclaim,
>  holepunching, and truncation of file pages and their anonymous COWed pages.
>  
>  
> -try_to_munlock() Reverse Map Scan
> +page_mlock() Reverse Map Scan
>  ---------------------------------
>  
> -.. warning::
> -   [!] TODO/FIXME: a better name might be page_mlocked() - analogous to the
> -   page_referenced() reverse map walker.
> -
>  When munlock_vma_page() [see section :ref:`munlock()/munlockall() System Call
>  Handling ` above] tries to munlock a
>  page, it needs to determine whether or not the page is mapped by any
>  VM_LOCKED VMA without actually attempting to unmap all PTEs from the
>  page.  For this purpose, the unevictable/mlock infrastructure
> -introduced a variant of try_to_unmap() called try_to_munlock().
> +introduced a variant of try_to_unmap() called page_mlock().
>  
> -try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
> -mapped file and KSM pages with a flag argument specifying unlock versus unmap
> -processing.  Again, these functions walk the respective reverse maps looking
> -for VM_LOCKED VMAs.  When such a VMA is found, as in the try_to_unmap() case,
> -the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK.  This
> -undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page.
> +page_mlock() walks the respective reverse maps looking for VM_LOCKED VMAs. When
> +such a VMA is found the page is mlocked via mlock_vma_page(). This undoes the
> +pre-clearing of the page's PG_mlocked done by munlock_vma_page.
>  
> -Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
> +Note that page_mlock()'s reverse map walk must visit every VMA in a page's
>  reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
>  However, the scan can terminate when it encounters a VM_LOCKED VMA.
> -Although try_to_munlock() might be called a great many times when munlocking a
> +Although page_mlock() might be called a great many times when