On 2016/7/25 15:52, Vlastimil Babka wrote:

> On 07/25/2016 09:42 AM, Xishi Qiu wrote:
>> On 2016/7/25 14:59, Vlastimil Babka wrote:
>>
>>> On 07/22/2016 04:57 AM, Xishi Qiu wrote:
>>>> Memory offline can happen in both a movable zone and a non-movable zone.
>>>> If the zone is movable we can offline the whole node, but if the zone is
>>>> non-movable we cannot offline the whole node, because some kernel memory
>>>> cannot be migrated.
>>>>
>>>> So when we offline a node with a movable zone, use a preferred mempolicy to
>>>> allocate the new page from the next node instead of the current node or
>>>> other remote nodes, because re-migrating is a waste of time and the distance
>>>> to remote nodes is often very large.
>>>>
>>>> Also use GFP_HIGHUSER_MOVABLE to allocate the new page if the zone is a
>>>> movable zone.
>>>>
>>>> Signed-off-by: Xishi Qiu <qiuxi...@huawei.com>
>>>
>>> I think this could be simpler if you preferred the next node regardless of
>>> whether it's a movable zone or not. What are the use cases for trying to
>>> offline part of a non-MOVABLE zone in a node? It's not guaranteed to succeed
>>> anyway. Also, if the reasoning is that the migration-target preference for
>>> non-MOVABLE offlining should instead be the *same* node, then
>>> alloc_migrate_target() would anyway prefer the node of the current CPU that
>>> happens to execute the offlining, which is random wrt the node in question.
>>> So consistently choosing a remote node is IMHO better than random, even for
>>> a non-MOVABLE zone.
>>>
>>
>> Hi Vlastimil,
>>
>> Use the next node for a movable zone and the current node for a non-movable
>> zone, right?
> 
> I was asking why not just use the next node for any zone, to make things
> simpler. What's the use case for offlining in a non-movable zone?
> 

OK, using the next node for both is much simpler.
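
Then the nid pre-calculation below could drop the zone check entirely, maybe
something like this (just an untested sketch of the idea):

	/* Prefer the online node next to the node that is being offlined. */
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!pfn_valid(pfn))
			continue;
		nid = next_node_in(page_to_nid(pfn_to_page(pfn)),
				   node_online_map);
		break;
	}

next_node_in() wraps around, so it always returns some online node, even when
the node being offlined is the last one in the map.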

>>
>>>> ---
>>>>  mm/memory_hotplug.c | 35 +++++++++++++++++++++++++++++------
>>>>  1 file changed, 29 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> index e3cbdca..930a5c6 100644
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -1501,6 +1501,16 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>>>>      return 0;
>>>>  }
>>>>
>>>> +static struct page *new_node_page(struct page *page, unsigned long node,
>>>> +        int **result)
>>>> +{
>>>> +    if (PageHuge(page))
>>>> +        return alloc_huge_page_node(page_hstate(compound_head(page)),
>>>> +                    node);
>>>> +    else
>>>> +        return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);
>>>
>>> You could just test for page in movable (or highmem?) zone here in the 
>>> callback.
>>>
>>
>> is_highmem_idx() always returns 0 if CONFIG_HIGHMEM is disabled.
> 
> Yeah, but then it doesn't matter if __GFP_HIGHMEM is given.
> 
>> And GFP_HIGHUSER_MOVABLE will choose the movable zone first, then the normal
>> zone. So how about this check: if (PageHighMem() || zone == ZONE_MOVABLE),
>> then use GFP_HIGHUSER_MOVABLE?
> 
> zone == ZONE_MOVABLE -> GFP_HIGHUSER_MOVABLE
> PageHighMem() -> GFP_HIGHUSER

How about using GFP_HIGHUSER_MOVABLE in both of the above cases? As Joonsoo said:

"all migratable pages are not from user space. For example,
blockdev file cache has __GFP_MOVABLE and migratable but it has no
__GFP_HIGHMEM and __GFP_USER.
And, zram's memory isn't GFP_HIGHUSER_MOVABLE but has __GFP_MOVABLE."

And you said earlier that "GFP_USER just specifies some reclaim flags".

> else GFP_USER ?

else use GFP_USER | __GFP_MOVABLE ?
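
So the gfp choice in the callback could look roughly like this (untested, only
meant to show what we are discussing; the final form is up for debate):

static struct page *new_node_page(struct page *page, unsigned long node,
		int **result)
{
	gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;

	if (PageHuge(page))
		return alloc_huge_page_node(page_hstate(compound_head(page)),
					    node);

	/*
	 * Highmem and ZONE_MOVABLE pages can go to any zone that accepts
	 * highmem, which effectively makes this GFP_HIGHUSER_MOVABLE.
	 */
	if (PageHighMem(page) ||
	    zone_idx(page_zone(page)) == ZONE_MOVABLE)
		gfp_mask |= __GFP_HIGHMEM;

	return __alloc_pages_node(node, gfp_mask, 0);
}

That way ZONE_MOVABLE and highmem pages end up with GFP_HIGHUSER_MOVABLE, and
everything else gets GFP_USER | __GFP_MOVABLE.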

Thanks,
Xishi Qiu

> 
>>>> +}
>>>> +
>>>>  #define NR_OFFLINE_AT_ONCE_PAGES    (256)
>>>>  static int
>>>>  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>> @@ -1510,6 +1520,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>      int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>>>>      int not_managed = 0;
>>>>      int ret = 0;
>>>> +    int nid = NUMA_NO_NODE;
>>>>      LIST_HEAD(source);
>>>>
>>>>      for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
>>>> @@ -1564,12 +1575,24 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>>              goto out;
>>>>          }
>>>>
>>>> -        /*
>>>> -         * alloc_migrate_target should be improooooved!!
>>>> -         * migrate_pages returns # of failed pages.
>>>> -         */
>>>> -        ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
>>>> -                    MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>>> +        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>> +            if (!pfn_valid(pfn))
>>>> +                continue;
>>>> +            page = pfn_to_page(pfn);
>>>> +            if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
>>>> +                nid = next_node_in(page_to_nid(page),
>>>> +                        node_online_map);
>>>> +            break;
>>>> +        }
>>>
>>> Then you could remove the ZONE_MOVABLE check here. I'm not sure how much the
>>> precalculation of nid is worth if it has to be rather complicated code like
>>> this, hm.
>>>
>>> Also, since we know that "next node in node_online_map" is in fact not
>>> optimal, what about using the opportunity to really try the best possible
>>> way? Maybe it's as simple as allocating via __alloc_pages_nodemask() with
>>> the current node's zonelist (where remote nodes should already be sorted
>>> according to NUMA distance), but with the current node (which would be first
>>> in the zonelist) removed from the nodemask so that it's skipped over? But
>>> check that the memory offlining process hasn't already torn down the
>>> zonelist at this point, or something.
>>>
>>
>> Do you mean calling __alloc_pages_nodemask() with the zonelist taken from the
>> current page's node, but with that node (the current page's node) not included
>> in the nodemask?
> 
> Exactly.
> The question is whether there should be a fallback to the current page's node
> if that fails. That only makes sense if somebody wants to offline only part of
> the node?
> 
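If we go that way, the allocation part of the callback might look roughly like
this (untested sketch, assuming gfp_mask has already been chosen as discussed
above):

	struct page *new_page = NULL;
	nodemask_t nmask = node_states[N_MEMORY];
	int nid = page_to_nid(page);

	/* Try remote nodes first, in zonelist (NUMA distance) order. */
	node_clear(nid, nmask);
	if (!nodes_empty(nmask))
		new_page = __alloc_pages_nodemask(gfp_mask, 0,
				node_zonelist(nid, gfp_mask), &nmask);

	/* Fall back to the source node, e.g. when only part of it is offlined. */
	if (!new_page)
		new_page = __alloc_pages(gfp_mask, 0,
				node_zonelist(nid, gfp_mask));

	return new_page;
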
>>
>> Thanks,
>> Xishi Qiu
>>
>>>> +
>>>> +        /* Alloc new page from the next node if possible */
>>>> +        if (nid != NUMA_NO_NODE)
>>>> +            ret = migrate_pages(&source, new_node_page, NULL,
>>>> +                    nid, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>>> +        else
>>>> +            ret = migrate_pages(&source, alloc_migrate_target, NULL,
>>>> +                    0, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>>>
>>> Please just use one new callback fully tailored for memory offline, instead 
>>> of choosing between the two like this.
>>>
>>>>          if (ret)
>>>>              putback_movable_pages(&source);
>>>>      }
>>>>