Re: [PATCH v6 17/24] mm/gup: track FOLL_PIN pages

2019-11-19 Thread John Hubbard
On 11/19/19 3:37 AM, Jan Kara wrote:
> On Tue 19-11-19 00:16:36, John Hubbard wrote:
>> @@ -2025,6 +2149,20 @@ static int __record_subpages(struct page *page, unsigned long addr,
>>  	return nr;
>>  }
>>  
>> +static bool __pin_compound_head(struct page *head, int refs, unsigned int flags)
>> +{
> 
> I don't quite like the proliferation of names starting with __. I don't
> think there's a good reason for that, particularly in this case. Also 'pin'
> here is somewhat misleading, as we already use the term "pin" for the
> particular way of pinning the page. We could have grab_compound_head() or
> maybe nail_compound_head() :), but you're a native speaker so you may come
> up with a better word.


Yes, it is ugly naming, I'll change these as follows:

__pin_compound_head() --> grab_compound_head()
__record_subpages()   --> record_subpages()

I loved the "nail_compound_head()" suggestion, it just seems very vivid, but
in the end, I figured I'd better keep it relatively drab and colorless. :)
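
For reference, the renamed helper will look roughly like this (a sketch only;
the body may get simplified slightly in v7, but the FOLL_PIN vs FOLL_GET
behavior stays the same as in the hunk quoted below):

/*
 * Sketch: __pin_compound_head() under its new name. FOLL_PIN callers get the
 * pin-style reference, everyone else gets a normal compound head reference.
 */
static bool grab_compound_head(struct page *head, int refs, unsigned int flags)
{
	if (flags & FOLL_PIN)
		return try_pin_compound_head(head, refs);

	return try_get_compound_head(head, refs) != NULL;
}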

> 
>> +	if (flags & FOLL_PIN) {
>> +		if (unlikely(!try_pin_compound_head(head, refs)))
>> +			return false;
>> +	} else {
>> +		head = try_get_compound_head(head, refs);
>> +		if (!head)
>> +			return false;
>> +	}
>> +
>> +return true;
>> +}
>> +
>>  static void put_compound_head(struct page *page, int refs)
>>  {
>>  	/* Do a get_page() first, in case refs == page->_refcount */
> 
> put_compound_head() needs similar treatment to undo_dev_pagemap(), doesn't
> it?
> 

Yes, will fix that up.
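
Probably something along these lines (illustrative only; the exact accounting
may still shift before v7, but the idea is to undo a FOLL_PIN-sized reference
when that is what was taken):

static void put_compound_head(struct page *page, int refs, unsigned int flags)
{
	/*
	 * Sketch: FOLL_PIN references were taken as GUP_PIN_COUNTING_BIAS
	 * per pin, so scale up the number of references to drop accordingly.
	 */
	if (flags & FOLL_PIN)
		refs *= GUP_PIN_COUNTING_BIAS;

	/* Do a get_page() first, in case refs == page->_refcount */
	get_page(page);
	page_ref_sub(page, refs);
	put_page(page);
}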


>> @@ -968,7 +973,18 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
>>  	if (!*pgmap)
>>  		return ERR_PTR(-EFAULT);
>>  	page = pfn_to_page(pfn);
>> -	get_page(page);
>> +
>> +	if (flags & FOLL_GET)
>> +		get_page(page);
>> +	else if (flags & FOLL_PIN) {
>> +		/*
>> +		 * try_pin_page() is not actually expected to fail here because
>> +		 * we hold the pmd lock so no one can unmap the pmd and free the
>> +		 * page that it points to.
>> +		 */
>> +		if (unlikely(!try_pin_page(page)))
>> +			page = ERR_PTR(-EFAULT);
>> +	}
> 
> This pattern is rather common. So maybe I'd add a helper grab_page(page,
> flags) doing
> 
> 	if (flags & FOLL_GET)
> 		get_page(page);
> 	else if (flags & FOLL_PIN)
> 		return try_pin_page(page);
> 	return true;
> 

OK.
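
Quick sketch, just to confirm we're talking about the same helper (final name
and location still open):

/*
 * Sketch of the suggested helper: take either a FOLL_GET or a FOLL_PIN
 * reference on @page, and report whether that succeeded.
 */
static bool grab_page(struct page *page, unsigned int flags)
{
	if (flags & FOLL_GET)
		get_page(page);
	else if (flags & FOLL_PIN)
		return try_pin_page(page);

	return true;
}

The follow_devmap_pmd() hunk above then collapses to something like
"if (!grab_page(page, flags)) page = ERR_PTR(-EFAULT);".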

> Otherwise the patch looks good to me now.
> 
>   Honza

Great! I thought I'd have a v7 out today, but fate decided to have me repair
my test machine instead. So, soon. ha. :)

thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v6 17/24] mm/gup: track FOLL_PIN pages

2019-11-19 Thread Jan Kara
On Tue 19-11-19 00:16:36, John Hubbard wrote:
> @@ -2025,6 +2149,20 @@ static int __record_subpages(struct page *page, unsigned long addr,
>  	return nr;
>  }
>  
> +static bool __pin_compound_head(struct page *head, int refs, unsigned int flags)
> +{

I don't quite like the proliferation of names starting with __. I don't
think there's a good reason for that, particularly in this case. Also 'pin'
here is somewhat misleading, as we already use the term "pin" for the
particular way of pinning the page. We could have grab_compound_head() or
maybe nail_compound_head() :), but you're a native speaker so you may come
up with a better word.

> +	if (flags & FOLL_PIN) {
> +		if (unlikely(!try_pin_compound_head(head, refs)))
> +			return false;
> +	} else {
> +		head = try_get_compound_head(head, refs);
> +		if (!head)
> +			return false;
> +	}
> +
> + return true;
> +}
> +
>  static void put_compound_head(struct page *page, int refs)
>  {
>  	/* Do a get_page() first, in case refs == page->_refcount */

put_compound_head() needs similar treatment to undo_dev_pagemap(), doesn't
it?

> @@ -968,7 +973,18 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
>  	if (!*pgmap)
>  		return ERR_PTR(-EFAULT);
>  	page = pfn_to_page(pfn);
> -	get_page(page);
> +
> +	if (flags & FOLL_GET)
> +		get_page(page);
> +	else if (flags & FOLL_PIN) {
> +		/*
> +		 * try_pin_page() is not actually expected to fail here because
> +		 * we hold the pmd lock so no one can unmap the pmd and free the
> +		 * page that it points to.
> +		 */
> +		if (unlikely(!try_pin_page(page)))
> +			page = ERR_PTR(-EFAULT);
> +	}

This pattern is rather common. So maybe I'd add a helper grab_page(page,
flags) doing

	if (flags & FOLL_GET)
		get_page(page);
	else if (flags & FOLL_PIN)
		return try_pin_page(page);
	return true;

Otherwise the patch looks good to me now.

Honza
-- 
Jan Kara 
SUSE Labs, CR

[PATCH v6 17/24] mm/gup: track FOLL_PIN pages

2019-11-19 Thread John Hubbard
Add tracking of pages that were pinned via FOLL_PIN.

As mentioned in the FOLL_PIN documentation, callers who effectively set
FOLL_PIN are required to ultimately free such pages via put_user_page().
The effect is similar to FOLL_GET, and may be thought of as "FOLL_GET
for DIO and/or RDMA use".

Pages that have been pinned via FOLL_PIN are identifiable via a
new function call:

   bool page_dma_pinned(struct page *page);

What to do in response to encountering such a page is left to later
patchsets. There is discussion about this in [1], [2], and [3].
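
As a rough illustration, assuming GUP_PIN_COUNTING_BIAS (introduced by this
patch) is the per-pin refcount bias, the check boils down to approximately the
following sketch:

/*
 * Sketch: a page counts as dma-pinned if its refcount carries at least one
 * GUP_PIN_COUNTING_BIAS worth of pin references. Pages with very large
 * ordinary refcounts can produce false positives.
 */
static inline bool page_dma_pinned(struct page *page)
{
	return page_ref_count(compound_head(page)) >= GUP_PIN_COUNTING_BIAS;
}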

This also changes a BUG_ON() to a WARN_ON() in follow_page_mask().

[1] Some slow progress on get_user_pages() (Apr 2, 2019):
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018):
https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018):
https://lwn.net/Articles/753027/

Suggested-by: Jan Kara 
Suggested-by: Jérôme Glisse 
Signed-off-by: John Hubbard 
---
 Documentation/core-api/pin_user_pages.rst |   2 +-
 include/linux/mm.h                         |  86 +--
 include/linux/mmzone.h                     |   2 +
 include/linux/page_ref.h                   |  10 +
 mm/gup.c                                   | 290 --
 mm/huge_memory.c                           |  54 +++-
 mm/hugetlb.c                               |  39 ++-
 mm/vmstat.c                                |   2 +
 8 files changed, 390 insertions(+), 95 deletions(-)

diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
index 4f26637a5005..baa288a44a77 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -53,7 +53,7 @@ Which flags are set by each wrapper
 For these pin_user_pages*() functions, FOLL_PIN is OR'd in with whatever gup
 flags the caller provides. The caller is required to pass in a non-null struct
 pages* array, and the function then pin pages by incrementing each by a special
-value. For now, that value is +1, just like get_user_pages*().::
+value: GUP_PIN_COUNTING_BIAS.::
 
  Function
  
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 568cbb895f03..ab311b356ab1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1054,6 +1054,21 @@ static inline __must_check bool try_get_page(struct page *page)
 	return true;
 }
 
+__must_check bool try_pin_compound_head(struct page *page, int refs);
+
+/**
+ * try_pin_page() - mark a page as being used by pin_user_pages*().
+ *
+ * This is the FOLL_PIN counterpart to try_get_page().
+ *
+ * @page:  pointer to page to be marked
+ * @Return:	true for success, false for failure
+ */
+static inline __must_check bool try_pin_page(struct page *page)
+{
+	return try_pin_compound_head(page, 1);
+}
+
 static inline void put_page(struct page *page)
 {
page = compound_head(page);
@@ -1071,29 +1086,70 @@ static inline void put_page(struct page *page)
 		__put_page(page);
 }
 
-/**
- * put_user_page() - release a gup-pinned page
- * @page:	pointer to page to be released
+/*
+ * GUP_PIN_COUNTING_BIAS, and the associated functions that use it, overload
+ * the page's refcount so that two separate items are tracked: the original page
+ * reference count, and also a new count of how many pin_user_pages() calls were
+ * made against the page. ("gup-pinned" is another term for the latter).
+ *
+ * With this scheme, pin_user_pages() becomes special: such pages are marked as
+ * distinct from normal pages. As such, the put_user_page() call (and its
+ * variants) must be used in order to release gup-pinned pages.
+ *
+ * Choice of value:
+ *
+ * By making GUP_PIN_COUNTING_BIAS a power of two, debugging of page reference
+ * counts with respect to pin_user_pages() and put_user_page() becomes simpler,
+ * due to the fact that adding an even power of two to the page refcount has the
+ * effect of using only the upper N bits, for the code that counts up using the
+ * bias value. This means that the lower bits are left for the exclusive use of
+ * the original code that increments and decrements by one (or at least, by much
+ * smaller values than the bias value).
  *
- * Pages that were pinned via pin_user_pages*() must be released via either
- * put_user_page(), or one of the put_user_pages*() routines. This is so that
- * eventually such pages can be separately tracked and uniquely handled. In
- * particular, interactions with RDMA and filesystems need special handling.
+ * Of course, once the lower bits overflow into the upper bits (and this is
+ * OK, because subtraction recovers the original values), then visual inspection
+ * no longer suffices to directly view the separate counts. However, for normal
+ * applications that don't have huge page reference counts, this won't be an
+ * issue.
  *
- * put_user_page() and put_page() are not interchangeable, despite this early
- * implementation that makes them