On Tue, Jun 4, 2019 at 12:48 PM John Hubbard <jhubb...@nvidia.com> wrote:
>
> On 6/4/19 9:48 AM, ira.we...@intel.com wrote:
> > From: Ira Weiny <ira.we...@intel.com>
> >
> > release_pages() is an optimized version of a loop around put_page().
> > Unfortunately for devmap pages the logic is not entirely correct in
> > release_pages().  This is because device pages can be more than type
> > MEMORY_DEVICE_PUBLIC.  There are in fact 4 types, private, public, FS
> > DAX, and PCI P2PDMA.  Some of these have specific needs to "put" the
> > page while others do not.
> >
> > This logic to handle any special needs is contained in
> > put_devmap_managed_page().  Therefore all devmap pages should be
> > processed by this function where we can contain the correct logic for a
> > page put.
> >
> > Handle all device type pages within release_pages() by calling
> > put_devmap_managed_page() on all devmap pages.  If
> > put_devmap_managed_page() returns true the page has been put and we
> > continue with the next page.  A false return of
> > put_devmap_managed_page() means the page did not require special
> > processing and should fall to "normal" processing.
> >
> > This was found via code inspection while determining if release_pages()
> > and the new put_user_pages() could be interchangeable.[1]
> >
> > [1] 
> > https://lore.kernel.org/lkml/20190523172852.ga27...@iweiny-desk2.sc.intel.com/
> >
> > Cc: Jérôme Glisse <jgli...@redhat.com>
> > Cc: Michal Hocko <mho...@suse.com>
> > Reviewed-by: Dan Williams <dan.j.willi...@intel.com>
> > Reviewed-by: John Hubbard <jhubb...@nvidia.com>
> > Signed-off-by: Ira Weiny <ira.we...@intel.com>
> >
> > ---
> > Changes from V2:
> >       Update changelog for more clarity as requested by Michal
> >       Update comment WRT "failing" of put_devmap_managed_page()
> >
> > Changes from V1:
> >       Add comment clarifying that put_devmap_managed_page() can still
> >       fail.
> >       Add Reviewed-by tags.
> >
> >  mm/swap.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 7ede3eddc12a..6d153ce4cb8c 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
> >               if (is_huge_zero_page(page))
> >                       continue;
> >
> > -             /* Device public page can not be huge page */
> > -             if (is_device_public_page(page)) {
> > +             if (is_zone_device_page(page)) {
> >                       if (locked_pgdat) {
> >                               
> > spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> >                                                      flags);
> >                               locked_pgdat = NULL;
> >                       }
> > -                     put_devmap_managed_page(page);
> > -                     continue;
> > +                     /*
> > +                      * Not all zone-device-pages require special
> > +                      * processing.  Those pages return 'false' from
> > +                      * put_devmap_managed_page() expecting a call to
> > +                      * put_page_testzero()
> > +                      */
>
> Just a documentation tweak: how about:
>
>                         /*
>                          * ZONE_DEVICE pages that return 'false' from
>                          * put_devmap_managed_page() do not require special
>                          * processing, and instead, expect a call to
>                          * put_page_testzero().
>                          */

Looks better to me, but maybe just go ahead and list those
expectations explicitly. Something like:

                        /*
                         * put_devmap_managed_page() only handles
                         * ZONE_DEVICE (struct dev_pagemap managed)
                         * pages when the hosting dev_pagemap has the
                         * ->free() or ->fault() callback handlers
                         *  implemented as indicated by
                         *  dev_pagemap.type. Otherwise the expectation
                         *  is to fall back to a plain decrement /
                         *  put_page_testzero().
                         */

Reply via email to