Re: [PATCH v2 4/9] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem

2021-04-20 Thread Axel Rasmussen
On Fri, Apr 16, 2021 at 4:47 PM Hugh Dickins  wrote:
>
> On Mon, 12 Apr 2021, Axel Rasmussen wrote:
>
> > With this change, userspace can resolve a minor fault within a
> > shmem-backed area with a UFFDIO_CONTINUE ioctl. The semantics for this
> > match those for hugetlbfs - we look up the existing page in the page
> > cache, and install PTEs for it.
>
> s/PTEs/a PTE/
>
> >
> > This commit introduces a new helper: mcopy_atomic_install_ptes.
>
> The plural is misleading: it only installs a single pte, so I'm going
> to ask you to change it throughout to mcopy_atomic_install_pte()
> (I'm not thrilled with the "mcopy" nor the "atomic", but there you are
> being consistent with userfaultfd's peculiar naming, so let them be).
>
> >
> > Why handle UFFDIO_CONTINUE for shmem in mm/userfaultfd.c, instead of in
> > shmem.c? The existing userfault implementation only relies on shmem.c
> > for VM_SHARED VMAs. However, minor fault handling / CONTINUE work just
> > fine for !VM_SHARED VMAs as well. We'd prefer to handle CONTINUE for
> > shmem in one place, regardless of shared/private (to reduce code
> > duplication).
> >
> > Why add a new mcopy_atomic_install_ptes helper? A problem we have with
> > continue is that shmem_mcopy_atomic_pte() and mcopy_atomic_pte() are
> > *close* to what we want, but not exactly. We do want to setup the PTEs
> > in a CONTINUE operation, but we don't want to e.g. allocate a new page,
> > charge it (e.g. to the shmem inode), manipulate various flags, etc. Also
> > we have the problem stated above: shmem_mcopy_atomic_pte() and
> > mcopy_atomic_pte() both handle one-half of the problem (shared /
> > private) continue cares about. So, introduce mcontinue_atomic_pte(), to
> > handle all of the shmem continue cases. Introduce the helper so it
> > doesn't duplicate code with mcopy_atomic_pte().
> >
> > In a future commit, shmem_mcopy_atomic_pte() will also be modified to
> > use this new helper. However, since this is a bigger refactor, it seems
> > most clear to do it as a separate change.
>
> (Actually that turns out to be a nice deletion of lines,
> but you're absolutely right to do it as a separate patch.)
>
> >
> > Signed-off-by: Axel Rasmussen 
> > ---
> >  mm/userfaultfd.c | 176 +++
> >  1 file changed, 131 insertions(+), 45 deletions(-)
> >
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index 23fa2583bbd1..8df0438f5d6a 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -48,6 +48,87 @@ struct vm_area_struct *find_dst_vma(struct mm_struct 
> > *dst_mm,
> >   return dst_vma;
> >  }
> >
> > +/*
> > + * Install PTEs, to map dst_addr (within dst_vma) to page.
> > + *
> > + * This function handles MCOPY_ATOMIC_CONTINUE (which is always 
> > file-backed),
> > + * whether or not dst_vma is VM_SHARED. It also handles the more general
> > + * MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be 
> > file
> > + * backed, or not).
> > + *
> > + * Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by
> > + * shmem_mcopy_atomic_pte instead.
>
> Right, I'm thinking in terms of five cases below (I'm not for a moment
> saying that you need to list these out in the comment, just saying that
> I could not get my head around the issues in this function without
> listing them out for myself):
>
> 1. anon private mcopy (using anon page newly allocated)
> 2. shmem private mcopy (using anon page newly allocated)
> 3. shmem private mcontinue (using page in cache from shmem_getpage)
> 4. shmem shared mcontinue (using page in cache from shmem_getpage)
> 5. shmem shared mcopy (using page in cache newly allocated)
>
> Of which each has a VM_WRITE and a !VM_WRITE case; and the third and
> fourth cases are new in this patch (it really would have been better
> to introduce mcopy_atomic_install_pte() in a separate earlier patch,
> but don't change that now we've got this far); and the fifth case does
> *not* use mcopy_atomic_install_pte() in this patch, but will in future.
>
> And while making these notes, let's hightlight again what is commented
> elsewhere, the odd nature of the second case: where userfaultfd short
> circuits to an anonymous CoW page without instantiating the shmem page.
> (Please double-check me on that: quite a lot of my comments below are
> about this case 2, so if I've got it wrong, then I've got a lot wrong.)

My understanding of case (2) is the same. In mfill_atomic_pte(), we
call into mcopy_atomic_pte if !VM_SHARED, regardless of whether it's
anon or shmem we're dealing with. That function allocates an anon
page, and then mcopy_atomic_install_pte() will *not* mark it writable,
so we get CoW semantics.

>
> > + */
> > +static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t 
> > *dst_pmd,
>
> mcopy_atomic_install_pte() throughout please.
>
> > +  struct vm_area_struct *dst_vma,
> > +  unsigned long dst_addr, struct 

Re: [PATCH v2 4/9] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem

2021-04-16 Thread Hugh Dickins
On Mon, 12 Apr 2021, Axel Rasmussen wrote:

> With this change, userspace can resolve a minor fault within a
> shmem-backed area with a UFFDIO_CONTINUE ioctl. The semantics for this
> match those for hugetlbfs - we look up the existing page in the page
> cache, and install PTEs for it.

s/PTEs/a PTE/

> 
> This commit introduces a new helper: mcopy_atomic_install_ptes.

The plural is misleading: it only installs a single pte, so I'm going
to ask you to change it throughout to mcopy_atomic_install_pte()
(I'm not thrilled with the "mcopy" nor the "atomic", but there you are
being consistent with userfaultfd's peculiar naming, so let them be).

> 
> Why handle UFFDIO_CONTINUE for shmem in mm/userfaultfd.c, instead of in
> shmem.c? The existing userfault implementation only relies on shmem.c
> for VM_SHARED VMAs. However, minor fault handling / CONTINUE work just
> fine for !VM_SHARED VMAs as well. We'd prefer to handle CONTINUE for
> shmem in one place, regardless of shared/private (to reduce code
> duplication).
> 
> Why add a new mcopy_atomic_install_ptes helper? A problem we have with
> continue is that shmem_mcopy_atomic_pte() and mcopy_atomic_pte() are
> *close* to what we want, but not exactly. We do want to setup the PTEs
> in a CONTINUE operation, but we don't want to e.g. allocate a new page,
> charge it (e.g. to the shmem inode), manipulate various flags, etc. Also
> we have the problem stated above: shmem_mcopy_atomic_pte() and
> mcopy_atomic_pte() both handle one-half of the problem (shared /
> private) continue cares about. So, introduce mcontinue_atomic_pte(), to
> handle all of the shmem continue cases. Introduce the helper so it
> doesn't duplicate code with mcopy_atomic_pte().
> 
> In a future commit, shmem_mcopy_atomic_pte() will also be modified to
> use this new helper. However, since this is a bigger refactor, it seems
> most clear to do it as a separate change.

(Actually that turns out to be a nice deletion of lines,
but you're absolutely right to do it as a separate patch.)

> 
> Signed-off-by: Axel Rasmussen 
> ---
>  mm/userfaultfd.c | 176 +++
>  1 file changed, 131 insertions(+), 45 deletions(-)
> 
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 23fa2583bbd1..8df0438f5d6a 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -48,6 +48,87 @@ struct vm_area_struct *find_dst_vma(struct mm_struct 
> *dst_mm,
>   return dst_vma;
>  }
>  
> +/*
> + * Install PTEs, to map dst_addr (within dst_vma) to page.
> + *
> + * This function handles MCOPY_ATOMIC_CONTINUE (which is always file-backed),
> + * whether or not dst_vma is VM_SHARED. It also handles the more general
> + * MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be file
> + * backed, or not).
> + *
> + * Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by
> + * shmem_mcopy_atomic_pte instead.

Right, I'm thinking in terms of five cases below (I'm not for a moment
saying that you need to list these out in the comment, just saying that
I could not get my head around the issues in this function without
listing them out for myself):

1. anon private mcopy (using anon page newly allocated)
2. shmem private mcopy (using anon page newly allocated)
3. shmem private mcontinue (using page in cache from shmem_getpage)
4. shmem shared mcontinue (using page in cache from shmem_getpage)
5. shmem shared mcopy (using page in cache newly allocated)

Of which each has a VM_WRITE and a !VM_WRITE case; and the third and
fourth cases are new in this patch (it really would have been better
to introduce mcopy_atomic_install_pte() in a separate earlier patch,
but don't change that now we've got this far); and the fifth case does
*not* use mcopy_atomic_install_pte() in this patch, but will in future.

And while making these notes, let's hightlight again what is commented
elsewhere, the odd nature of the second case: where userfaultfd short
circuits to an anonymous CoW page without instantiating the shmem page.
(Please double-check me on that: quite a lot of my comments below are
about this case 2, so if I've got it wrong, then I've got a lot wrong.)

> + */
> +static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t 
> *dst_pmd,

mcopy_atomic_install_pte() throughout please.

> +  struct vm_area_struct *dst_vma,
> +  unsigned long dst_addr, struct page *page,
> +  bool newly_allocated, bool wp_copy)
> +{
> + int ret;
> + pte_t _dst_pte, *dst_pte;
> + int writable;

Sorry, it's silly of me, but I keep getting irritated by "int writable"
in company with the various bools; and the way vm_shared is initialized
below, but writable initialized later.  Please humour me by making it
bool writable = dst_vma->vm_flags & VM_WRITE;

> + bool vm_shared = dst_vma->vm_flags & VM_SHARED;

And I've found that we also need
bool