On 2/26/19 2:36 PM, Andrew Morton wrote:
>> ...
>>
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -2274,7 +2274,7 @@ static int adjust_pool_surplus(struct hstate *h,
>> nodemask_t *nodes_allowed,
> 
> Please tweak that email client to prevent the wordwraps.

Sorry about that.

>> +    /*
>> +     * Check for a node specific request.  Adjust global count, but
>> +     * restrict alloc/free to the specified node.
>> +     */

Better comment might be:

        /*
         * Check for a node specific request.
         * Changing node specific huge page count may require a corresponding
         * change to the global count.  In any case, the passed node mask
         * (nodes_allowed) will restrict alloc/free to the specified node.
         */

>> +    if (nid != NUMA_NO_NODE) {
>> +            unsigned long old_count = count;
>> +            count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
>> +            /*
>> +             * If user specified count causes overflow, set to
>> +             * largest possible value.
>> +             */

Updated comment:
                /*
                 * User may have specified a large count value which caused the
                 * above calculation to overflow.  In this case, they wanted
                 * to allocate as many huge pages as possible.  Set count to
                 * largest possible value to align with their intention.
                 */

>> +            if (count < old_count)
>> +                    count = ULONG_MAX;
>> +    }
> 
> The above two comments explain the code, but do not reveal the
> reasoning behind the policy decisions which that code implements.
> 
>> ...
>>
>> +    } else {
>>              /*
>> -             * per node hstate attribute: adjust count to global,
>> -             * but restrict alloc/free to the specified node.
>> +             * Node specific request, but we could not allocate
>> +             * node mask.  Pass in ALL nodes, and clear nid.
>>               */
> 
> Ditto here, somewhat.

I was just going to update the comments and send you a new patch, but
but your comment got me thinking about this situation.  I did not really
change the way this code operates.  As a reminder, the original code is like:

NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);

if (nid == NUMA_NO_NODE) {
        /* do something */
} else if (nodes_allowed) {
        /* do something else */
} else {
        nodes_allowed = &node_states[N_MEMORY];
}

So, the only way we get to that final else if if we can not allocate
a node mask (kmalloc a few words).  Right?  I wonder why we should
even try to continue in this case.  Why not just return right there?

The specified count value is either a request to increase number of
huge pages or decrease.  If we can't allocate a few words, we certainly
are not going to find memory to create huge pages.  There 'might' be
surplus pages which can be converted to permanent pages.  But remember
this is a 'node specific' request and we can't allocate a mask to pass
down to the conversion routines.  So, chances are good we would operate
on the wrong node.  The same goes for a request to 'free' huge pages.
Since, we can't allocate a node mask we are likely to free them from
the wrong node.

Unless my reasoning above is incorrect, I think that final else block
in __nr_hugepages_store_common() is wrong.

Any additional thoughts?
-- 
Mike Kravetz

Reply via email to