Hi David,

> > For drivers that would like to longterm-pin the pages associated
> > with a file, the pin_user_pages_fd() API provides an option to
> > not only FOLL_PIN the pages but also to check and migrate them
> > if they reside in movable zone or CMA block. For now, this API
> > can only work with files belonging to shmem or hugetlbfs given
> > that the udmabuf driver is the only user.
> 
> Maybe add "Other files are rejected.". Wasn't clear to me before I
> looked into the code.
Ok, will add it in v2.

> 
> >
> > It must be noted that the pages associated with hugetlbfs files
> > are expected to be found in the page cache. An error is returned
> > if they are not found. However, shmem pages can be swapped in or
> > allocated if they are not present in the page cache.
> >
> > Cc: David Hildenbrand <da...@redhat.com>
> > Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
> > Cc: Mike Kravetz <mike.krav...@oracle.com>
> > Cc: Hugh Dickins <hu...@google.com>
> > Cc: Peter Xu <pet...@redhat.com>
> > Cc: Gerd Hoffmann <kra...@redhat.com>
> > Cc: Dongwon Kim <dongwon....@intel.com>
> > Cc: Junxiao Chang <junxiao.ch...@intel.com>
> > Suggested-by: Jason Gunthorpe <j...@nvidia.com>
> > Signed-off-by: Vivek Kasireddy <vivek.kasire...@intel.com>
> > ---
> >   include/linux/mm.h |  2 ++
> >   mm/gup.c           | 87
> ++++++++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 89 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index bf5d0b1b16f4..af2121fb8101 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2457,6 +2457,8 @@ long get_user_pages_unlocked(unsigned long
> start, unsigned long nr_pages,
> >                 struct page **pages, unsigned int gup_flags);
> >   long pin_user_pages_unlocked(unsigned long start, unsigned long
> nr_pages,
> >                 struct page **pages, unsigned int gup_flags);
> > +long pin_user_pages_fd(int fd, pgoff_t start, unsigned long nr_pages,
> > +                  unsigned int gup_flags, struct page **pages);
> >
> >   int get_user_pages_fast(unsigned long start, int nr_pages,
> >                     unsigned int gup_flags, struct page **pages);
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 2f8a2d89fde1..e34b77a15fa8 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -3400,3 +3400,90 @@ long pin_user_pages_unlocked(unsigned long
> start, unsigned long nr_pages,
> >                                  &locked, gup_flags);
> >   }
> >   EXPORT_SYMBOL(pin_user_pages_unlocked);
> > +
> 
> This does look quite neat, nice! Let's take a closer look ...
> 
> > +/**
> > + * pin_user_pages_fd() - pin user pages associated with a file
> > + * @fd:         the fd whose pages are to be pinned
> > + * @start:      starting file offset
> > + * @nr_pages:   number of pages from start to pin
> > + * @gup_flags:  flags modifying pin behaviour
> 
> ^ I assume we should drop that. At least for now the flags are
> completely unused. And most likely we would want a different set of
> flags later (GUPFD_ ...).
Right now, FOLL_LONGTERM is the only accepted value for gup_flags but
yes, as you suggest, this can be made implicit by dropping gup_flags.

> 
> > + * @pages:      array that receives pointers to the pages pinned.
> > + *              Should be at least nr_pages long.
> > + *
> > + * Attempt to pin (and migrate) pages associated with a file belonging to
> 
> I'd drop the "and migrate" part, it's more of an implementation detail.
> 
> > + * either shmem or hugetlbfs. An error is returned if pages associated with
> > + * hugetlbfs files are not present in the page cache. However, shmem
> pages
> > + * are swapped in or allocated if they are not present in the page cache.
> 
> Why don't we do the same for hugetlbfs? Would make the interface more
> streamlined.
I am going off of what Mike has stated previously:
"It may not matter to your users, but the semantics for hugetlb and shmem
pages is different.  hugetlb requires the pages exist in the page cache
while shmem will create/add pages to the cache if necessary."

However, if we were to allocate a hugepage (assuming one is not present in the
page cache at a given index), what needs to be done in addition to calling 
these APIs?
folio = alloc_hugetlb_folio_nodemask(h, NUMA_NO_NODE, NULL, GFP_USER)
hugetlb_add_to_page_cache(folio, mapping, idx)

> 
> Certainly add that pinned pages have to be released using
> unpin_user_pages().
Sure, will include that in v2.

> 
> > + *
> > + * Returns number of pages pinned. This would be equal to the number of
> > + * pages requested.
> > + * If nr_pages is 0 or negative, returns 0. If no pages were pinned, 
> > returns
> > + * -errno.
> > + */
> > +long pin_user_pages_fd(int fd, pgoff_t start, unsigned long nr_pages,
> > +                  unsigned int gup_flags, struct page **pages)
> > +{
> > +   struct page *page;
> > +   struct file *filep;
> > +   unsigned int flags, i;
> > +   long ret;
> > +
> > +   if (nr_pages <= 0)
> > +           return 0;
> 
> I think we should just forbid that and use a WARN_ON_ONCE() here /
> return -EINVAL. So we'll never end up returning 0.
I think I'll drop this check in v2 as Jason suggested.

> 
> > +   if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN))
> > +           return 0;
> > +
> > +   if (start < 0)
> > +           return -EINVAL;
> > +
> > +   filep = fget(fd);
> > +   if (!filep)
> > +       return -EINVAL;
> > +
> > +   if (!shmem_file(filep) && !is_file_hugepages(filep))
> > +       return -EINVAL;
> > +
> > +   flags = memalloc_pin_save();
> > +   do {
> > +           for (i = 0; i < nr_pages; i++) {
> > +                   if (shmem_mapping(filep->f_mapping)) {
> > +                           page = shmem_read_mapping_page(filep-
> >f_mapping,
> > +                                                          start + i);
> > +                           if (IS_ERR(page)) {
> > +                                   ret = PTR_ERR(page);
> > +                                   goto err;
> > +                           }
> > +                   } else {
> > +                           page = find_get_page_flags(filep->f_mapping,
> > +                                                      start + i,
> > +                                                      FGP_ACCESSED);
> > +                           if (!page) {
> > +                                   ret = -EINVAL;
> > +                                   goto err;
> > +                           }
> > +                   }
> > +                   ret = try_grab_page(page, FOLL_PIN);
> > +                   if (unlikely(ret))
> > +                           goto err;
> > +
> > +                   pages[i] = page;
> > +                   put_page(pages[i]);
> > +           }
> > +
> > +           ret = check_and_migrate_movable_pages(nr_pages, pages);
> > +   } while (ret == -EAGAIN);
> > +
> > +err:
> > +   memalloc_pin_restore(flags);
> > +   fput(filep);
> > +   if (!ret)
> > +           return nr_pages;
> > +
> > +   while (i > 0 && pages[--i]) {
> > +           unpin_user_page(pages[i]);
> > +           pages[i] = NULL;
> 
> If migrate_longterm_unpinnable_pages() failed, say with -ENOMEM, the
> pages were already unpinned, but pages[i] has not been cleared, no?
You are right; the above while should not be executed in that case. I added
this chunk to cleanup after any errors thrown in the for loop above. I guess
I need to add a new error label to cleanup after errors thrown by
check_and_migrate_movable_pages().

Thanks,
Vivek
> 
> > +   }
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(pin_user_pages_fd);
> > +
> 
> --
> Cheers,
> 
> David / dhildenb

Reply via email to