On 12/04/2017 06:01 AM, Michal Hocko wrote:
> From: Michal Hocko <mho...@suse.com>
> 
> alloc_surplus_huge_page increases the pool size and the number of
> surplus pages opportunistically to prevent from races with the pool size
> change. See d1c3fb1f8f29 ("hugetlb: introduce nr_overcommit_hugepages
> sysctl") for more details.
> 
> The resulting code is unnecessarily hairy, cause code duplication and
> doesn't allow to share the allocation paths. Moreover pool size changes
> tend to be very seldom so optimizing for them is not really reasonable.
> Simplify the code and allow to allocate a fresh surplus page as long as
> we are under the overcommit limit and then recheck the condition after
> the allocation and drop the new page if the situation has changed. This
> should provide a reasonable guarantee that an abrupt allocation requests
> will not go way off the limit.
> 
> If we consider races with the pool shrinking and enlarging then we
> should be reasonably safe as well. In the first case we are off by one
> in the worst case and the second case should work OK because the page is
> not yet visible. We can waste CPU cycles for the allocation but that
> should be acceptable for a relatively rare condition.
> 
> Signed-off-by: Michal Hocko <mho...@suse.com>
> ---
>  mm/hugetlb.c | 60 
> +++++++++++++++++++++---------------------------------------
>  1 file changed, 21 insertions(+), 39 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a1b8b2888ec9..0c7dc269b6c0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1538,62 +1538,44 @@ int dissolve_free_huge_pages(unsigned long start_pfn, 
> unsigned long end_pfn)
>  static struct page *__alloc_surplus_huge_page(struct hstate *h, gfp_t 
> gfp_mask,
>               int nid, nodemask_t *nmask)
>  {
> -     struct page *page;
> -     unsigned int r_nid;
> +     struct page *page = NULL;
>  
>       if (hstate_is_gigantic(h))
>               return NULL;
>  
> -     /*
> -      * Assume we will successfully allocate the surplus page to
> -      * prevent racing processes from causing the surplus to exceed
> -      * overcommit
> -      *
> -      * This however introduces a different race, where a process B
> -      * tries to grow the static hugepage pool while alloc_pages() is
> -      * called by process A. B will only examine the per-node
> -      * counters in determining if surplus huge pages can be
> -      * converted to normal huge pages in adjust_pool_surplus(). A
> -      * won't be able to increment the per-node counter, until the
> -      * lock is dropped by B, but B doesn't drop hugetlb_lock until
> -      * no more huge pages can be converted from surplus to normal
> -      * state (and doesn't try to convert again). Thus, we have a
> -      * case where a surplus huge page exists, the pool is grown, and
> -      * the surplus huge page still exists after, even though it
> -      * should just have been converted to a normal huge page. This
> -      * does not leak memory, though, as the hugepage will be freed
> -      * once it is out of use. It also does not allow the counters to
> -      * go out of whack in adjust_pool_surplus() as we don't modify
> -      * the node values until we've gotten the hugepage and only the
> -      * per-node value is checked there.
> -      */
>       spin_lock(&hugetlb_lock);
> -     if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
> -             spin_unlock(&hugetlb_lock);
> -             return NULL;
> -     } else {
> -             h->nr_huge_pages++;
> -             h->surplus_huge_pages++;
> -     }
> +     if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages)
> +             goto out_unlock;
>       spin_unlock(&hugetlb_lock);
>  
>       page = __hugetlb_alloc_buddy_huge_page(h, gfp_mask, nid, nmask);
> +     if (!page)
> +             goto out_unlock;
>  
>       spin_lock(&hugetlb_lock);
> -     if (page) {
> +     /*
> +      * We could have raced with the pool size change.
> +      * Double check that and simply deallocate the new page
> +      * if we would end up overcommiting the surpluses. Abuse
> +      * temporary page to workaround the nasty free_huge_page
> +      * codeflow
> +      */
> +     if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
> +             SetPageHugeTemporary(page);
> +             put_page(page);
> +             page = NULL;
> +     } else {
> +             h->surplus_huge_pages_node[page_to_nid(page)]++;
> +             h->surplus_huge_pages++;
>               INIT_LIST_HEAD(&page->lru);
>               r_nid = page_to_nid(page);
>               set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
>               set_hugetlb_cgroup(page, NULL);
> -             /*
> -              * We incremented the global counters already
> -              */
>               h->nr_huge_pages_node[r_nid]++;
>               h->surplus_huge_pages_node[r_nid]++;
> -     } else {
> -             h->nr_huge_pages--;
> -             h->surplus_huge_pages--;

In the case of a successful surplus allocation, the following counters
are incremented:

h->surplus_huge_pages_node[page_to_nid(page)]++;
h->surplus_huge_pages++;
h->nr_huge_pages_node[r_nid]++;
h->surplus_huge_pages_node[r_nid]++;

Looks like per-node surplus_huge_pages_node is incremented twice, and
global nr_huge_pages is not incremented at all.

Also, you removed r_nid so I'm guessing this will not compile?
-- 
Mike Kravetz


>       }
> +
> +out_unlock:
>       spin_unlock(&hugetlb_lock);
>  
>       return page;
> 

Reply via email to