On 12/13/2017 11:50 PM, Michal Hocko wrote:
> On Wed 13-12-17 16:45:55, Mike Kravetz wrote:
>> On 12/04/2017 06:01 AM, Michal Hocko wrote:
>>> From: Michal Hocko <mho...@suse.com>
>>>
>>> alloc_surplus_huge_page increases the pool size and the number of
>>> surplus pages opportunistically to prevent from races with the pool size
>>> change. See d1c3fb1f8f29 ("hugetlb: introduce nr_overcommit_hugepages
>>> sysctl") for more details.
>>>
>>> The resulting code is unnecessarily hairy, cause code duplication and
>>> doesn't allow to share the allocation paths. Moreover pool size changes
>>> tend to be very seldom so optimizing for them is not really reasonable.
>>> Simplify the code and allow to allocate a fresh surplus page as long as
>>> we are under the overcommit limit and then recheck the condition after
>>> the allocation and drop the new page if the situation has changed. This
>>> should provide a reasonable guarantee that an abrupt allocation requests
>>> will not go way off the limit.
>>>
>>> If we consider races with the pool shrinking and enlarging then we
>>> should be reasonably safe as well. In the first case we are off by one
>>> in the worst case and the second case should work OK because the page is
>>> not yet visible. We can waste CPU cycles for the allocation but that
>>> should be acceptable for a relatively rare condition.
>>>
>>> Signed-off-by: Michal Hocko <mho...@suse.com>
>>> ---
>>>  mm/hugetlb.c | 60 
>>> +++++++++++++++++++++---------------------------------------
>>>  1 file changed, 21 insertions(+), 39 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index a1b8b2888ec9..0c7dc269b6c0 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -1538,62 +1538,44 @@ int dissolve_free_huge_pages(unsigned long 
>>> start_pfn, unsigned long end_pfn)
>>>  static struct page *__alloc_surplus_huge_page(struct hstate *h, gfp_t 
>>> gfp_mask,
>>>             int nid, nodemask_t *nmask)
>>>  {
>>> -   struct page *page;
>>> -   unsigned int r_nid;
>>> +   struct page *page = NULL;
>>>  
>>>     if (hstate_is_gigantic(h))
>>>             return NULL;
>>>  
>>> -   /*
>>> -    * Assume we will successfully allocate the surplus page to
>>> -    * prevent racing processes from causing the surplus to exceed
>>> -    * overcommit
>>> -    *
>>> -    * This however introduces a different race, where a process B
>>> -    * tries to grow the static hugepage pool while alloc_pages() is
>>> -    * called by process A. B will only examine the per-node
>>> -    * counters in determining if surplus huge pages can be
>>> -    * converted to normal huge pages in adjust_pool_surplus(). A
>>> -    * won't be able to increment the per-node counter, until the
>>> -    * lock is dropped by B, but B doesn't drop hugetlb_lock until
>>> -    * no more huge pages can be converted from surplus to normal
>>> -    * state (and doesn't try to convert again). Thus, we have a
>>> -    * case where a surplus huge page exists, the pool is grown, and
>>> -    * the surplus huge page still exists after, even though it
>>> -    * should just have been converted to a normal huge page. This
>>> -    * does not leak memory, though, as the hugepage will be freed
>>> -    * once it is out of use. It also does not allow the counters to
>>> -    * go out of whack in adjust_pool_surplus() as we don't modify
>>> -    * the node values until we've gotten the hugepage and only the
>>> -    * per-node value is checked there.
>>> -    */
>>>     spin_lock(&hugetlb_lock);
>>> -   if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
>>> -           spin_unlock(&hugetlb_lock);
>>> -           return NULL;
>>> -   } else {
>>> -           h->nr_huge_pages++;
>>> -           h->surplus_huge_pages++;
>>> -   }
>>> +   if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages)
>>> +           goto out_unlock;
>>>     spin_unlock(&hugetlb_lock);
>>>  
>>>     page = __hugetlb_alloc_buddy_huge_page(h, gfp_mask, nid, nmask);
>>> +   if (!page)
>>> +           goto out_unlock;
>>>  
>>>     spin_lock(&hugetlb_lock);
>>> -   if (page) {
>>> +   /*
>>> +    * We could have raced with the pool size change.
>>> +    * Double check that and simply deallocate the new page
>>> +    * if we would end up overcommiting the surpluses. Abuse
>>> +    * temporary page to workaround the nasty free_huge_page
>>> +    * codeflow
>>> +    */
>>> +   if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
>>> +           SetPageHugeTemporary(page);
>>> +           put_page(page);
>>> +           page = NULL;
>>> +   } else {
>>> +           h->surplus_huge_pages_node[page_to_nid(page)]++;
>>> +           h->surplus_huge_pages++;
>>>             INIT_LIST_HEAD(&page->lru);
>>>             r_nid = page_to_nid(page);
>>>             set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
>>>             set_hugetlb_cgroup(page, NULL);
>>> -           /*
>>> -            * We incremented the global counters already
>>> -            */
>>>             h->nr_huge_pages_node[r_nid]++;
>>>             h->surplus_huge_pages_node[r_nid]++;
>>> -   } else {
>>> -           h->nr_huge_pages--;
>>> -           h->surplus_huge_pages--;
>>
>> In the case of a successful surplus allocation, the following counters
>> are incremented:
>>
>> h->surplus_huge_pages_node[page_to_nid(page)]++;
>> h->surplus_huge_pages++;
>> h->nr_huge_pages_node[r_nid]++;
>> h->surplus_huge_pages_node[r_nid]++;
>>
>> Looks like per-node surplus_huge_pages_node is incremented twice, and
>> global nr_huge_pages is not incremented at all.
>>
>> Also, you removed r_nid so I'm guessing this will not compile?
> 
> Ups a hickup during the rebase/split up. The following code removes all
> this so I haven't noticed. Thanks for catching that!
> The incremental diff
> ---
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 41d2d9082f0d..3c16cde72ceb 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1565,8 +1565,10 @@ static struct page *__alloc_surplus_huge_page(struct 
> hstate *h, gfp_t gfp_mask,
>               put_page(page);
>               page = NULL;
>       } else {
> -             h->surplus_huge_pages_node[page_to_nid(page)]++;
> +             int r_nid;
> +
>               h->surplus_huge_pages++;
> +             h->nr_huge_pages++;
>               INIT_LIST_HEAD(&page->lru);
>               r_nid = page_to_nid(page);
>               set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> 

With the incremental diff,

Reviewed-by: Mike Kravetz <mike.krav...@oracle.com>
-- 
Mike Kravetz

Reply via email to