On 2016/7/25 14:59, Vlastimil Babka wrote:

> On 07/22/2016 04:57 AM, Xishi Qiu wrote:
>> Memory offline could happen on both movable zone and non-movable zone.
>> We can offline the whole node if the zone is movable zone, and if the
>> zone is non-movable zone, we cannot offline the whole node, because
>> some kernel memory can't be migrated.
>>
>> So if we offline a node with movable zone, use prefer mempolicy to alloc
>> new page from the next node instead of the current node or other remote
>> nodes, because re-migrate is a waste of time and the distance of the
>> remote nodes is often very large.
>>
>> Also use GFP_HIGHUSER_MOVABLE to alloc new page if the zone is movable
>> zone.
>>
>> Signed-off-by: Xishi Qiu <qiuxi...@huawei.com>
> 
> I think this could be simpler, if you preferred the next node regardless of 
> whether it's movable zone or not. What are use cases for trying to offline 
> part of non-MOVABLE zone in a node? It's not guaranteed to succeed anyway. 
> Also if the reasoning is that the non-MOVABLE offlining preference for 
> migration target should be instead on the *same* node, then 
> alloc_migrate_target() would anyway prefer the node of the current CPU that 
> happens to execute the offlining, which is random wrt the node in question. 
> So consistently choosing remote node is IMHO better than random even for 
> non-MOVABLE zone.
> 

Hi Vlastimil,

use next node for movable zone, use current node for non-movable zone, right?

>> ---
>>  mm/memory_hotplug.c | 35 +++++++++++++++++++++++++++++------
>>  1 file changed, 29 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index e3cbdca..930a5c6 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1501,6 +1501,16 @@ static unsigned long scan_movable_pages(unsigned long 
>> start, unsigned long end)
>>      return 0;
>>  }
>>
>> +static struct page *new_node_page(struct page *page, unsigned long node,
>> +        int **result)
>> +{
>> +    if (PageHuge(page))
>> +        return alloc_huge_page_node(page_hstate(compound_head(page)),
>> +                    node);
>> +    else
>> +        return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE, 0);
> 
> You could just test for page in movable (or highmem?) zone here in the 
> callback.
> 

is_highmem_idx() always return 0 if CONFIG_HIGHMEM closed.
And GFP_HIGHUSER_MOVABLE will choose movable_zone first, then normal_zone.
So how about this check? if (PageHighMem() or zone == ZONE_MOVABLE) then use 
GFP_HIGHUSER_MOVABLE

>> +}
>> +
>>  #define NR_OFFLINE_AT_ONCE_PAGES    (256)
>>  static int
>>  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>> @@ -1510,6 +1520,7 @@ do_migrate_range(unsigned long start_pfn, unsigned 
>> long end_pfn)
>>      int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
>>      int not_managed = 0;
>>      int ret = 0;
>> +    int nid = NUMA_NO_NODE;
>>      LIST_HEAD(source);
>>
>>      for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
>> @@ -1564,12 +1575,24 @@ do_migrate_range(unsigned long start_pfn, unsigned 
>> long end_pfn)
>>              goto out;
>>          }
>>
>> -        /*
>> -         * alloc_migrate_target should be improooooved!!
>> -         * migrate_pages returns # of failed pages.
>> -         */
>> -        ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
>> -                    MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>> +        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> +            if (!pfn_valid(pfn))
>> +                continue;
>> +            page = pfn_to_page(pfn);
>> +            if (zone_idx(page_zone(page)) == ZONE_MOVABLE)
>> +                nid = next_node_in(page_to_nid(page),
>> +                        node_online_map);
>> +            break;
>> +        }
> 
> Then you could remove the ZONE_MOVABLE check here. I'm not sure how much 
> worth the precalculation of nid is, if it has to be a rather complicated code 
> like this, hm.
> 
> Also, since we know that "next node in node_online_map" is in fact not 
> optimal, what about using the opportunity to really try the best possible 
> way? Maybe it's as simple as allocating via __alloc_pages_nodemask() with 
> current node's zonelist (where remote nodes should be already sorted 
> according to NUMA distance), but with current node (which would be first in 
> the zonelist) removed from the nodemask so that it's skipped over? But check 
> if memory offlining process didn't kill the zonelist already at this point, 
> or something.
> 

Do you mean that call __alloc_pages_nodemask(), the zonelist is from current 
page's node,
but it(the current page's node) is not include in the nodemask?

Thanks,
Xishi Qiu

>> +
>> +        /* Alloc new page from the next node if possible */
>> +        if (nid != NUMA_NO_NODE)
>> +            ret = migrate_pages(&source, new_node_page, NULL,
>> +                    nid, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
>> +        else
>> +            ret = migrate_pages(&source, alloc_migrate_target, NULL,
>> +                    0, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> 
> Please just use one new callback fully tailored for memory offline, instead 
> of choosing between the two like this.
> 
>>          if (ret)
>>              putback_movable_pages(&source);
>>      }
>>
> 
> 
> .
> 



Reply via email to