Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]

2012-10-25 Thread Lai Jiangshan
Hi, KOSAKI 

On 09/28/2012 06:03 AM, KOSAKI Motohiro wrote:
> (9/27/12 2:47 AM), Lai Jiangshan wrote:
>> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
>> it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
>> node_states[N_NORMAL_MEMORY] becomes stale.
> 
> What's mean 'stale'? I guess
> 
> : Currently memory_hotplug doesn't turn on/off node_states[N_NORMAL_MEMORY]


Right.

> and
> : then it will be invalid if the platform has highmem. Luckily, almost memory 
> : hotplug aware platform don't have highmem, but are not all.
> 
> right?

Some platforms(32 bit) support logic-memory-hotplug.
Some platforms have movable memory.
They are all considered.

> I supporse this patch only meaningful on ARM platform practically.
> 

any platform whic supports memory-hotplug.

> 
> 
>> We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
>> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
>> are changed while hotpluging.
> 
> 
>> Also add @status_change_nid_normal to struct memory_notify, thus
>> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
>> are changed.
> 
> status_change_nid_normal is very ugly to me. When status_change_nid and 
> status_change_nid_normal has positive value, they are always the same.
> nid and flags value are more natual to me.

If we use flags, the semantic of "status_change_nid" is changed and we need to
change more current code, and we will add complicated to the memory hotplug
callbacks.

like this:

-   node = arg->status_change_nid;
+   if (arg->status_change_flags & (1UL << N_HIGH_MEMORY))
+   node = arg->status_change_nid;
+   else
+   node = -1;

> 
> 
> 
>>
>> Signed-off-by: Lai Jiangshan 
>> ---
>>  Documentation/memory-hotplug.txt |5 ++-
>>  include/linux/memory.h   |1 +
>>  mm/memory_hotplug.c  |   94 
>> +++--
>>  3 files changed, 83 insertions(+), 17 deletions(-)
>>
>> diff --git a/Documentation/memory-hotplug.txt 
>> b/Documentation/memory-hotplug.txt
>> index 6d0c251..6e6cbc7 100644
>> --- a/Documentation/memory-hotplug.txt
>> +++ b/Documentation/memory-hotplug.txt
>> @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct 
>> memory_notify.
>>  struct memory_notify {
>> unsigned long start_pfn;
>> unsigned long nr_pages;
>> +   int status_change_nid_normal;
>> int status_change_nid;
>>  }
>>  
>>  start_pfn is start_pfn of online/offline memory.
>>  nr_pages is # of pages of online/offline memory.
>> +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
>> +is (will be) set/clear, if this is -1, then nodemask status is not changed.
>>  status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
>>  set/clear. It means a new(memoryless) node gets new memory by online and a
>>  node loses all memory. If this is -1, then nodemask status is not changed.
>> -If status_changed_nid >= 0, callback should create/discard structures for 
>> the
>> +If status_changed_nid* >= 0, callback should create/discard structures for 
>> the
>>  node if necessary.
>>  
>>  --
>> diff --git a/include/linux/memory.h b/include/linux/memory.h
>> index ff9a9f8..a09216d 100644
>> --- a/include/linux/memory.h
>> +++ b/include/linux/memory.h
>> @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
>>  struct memory_notify {
>>  unsigned long start_pfn;
>>  unsigned long nr_pages;
>> +int status_change_nid_normal;
>>  int status_change_nid;
>>  };
>>  
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 6a5b90d..b62d429b 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, 
>> unsigned long nr_pages,
>>  return 0;
>>  }
>>  
>> +static void check_nodemasks_changes_online(unsigned long nr_pages,
>> +struct zone *zone, struct memory_notify *arg)
>> +{
>> +int nid = zone_to_nid(zone);
>> +enum zone_type zone_last = ZONE_NORMAL;
>> +
>> +if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
>> +zone_last = ZONE_MOVABLE;
> 
> This is very strange (or ugly) code. ZONE_MOVABLE don't depend on high mem.

If we don't have HIGHMEM,
any node of N_NORMAL_MEMORY has 0...ZONE_MOVABLE

if we have HIGHMEM,
any node of N_NORMAL_MEMORY has 0...ZONE_NORMAL

> 
> 
>> +
>> +if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
>> +arg->status_change_nid_normal = nid;
>> +else
>> +arg->status_change_nid_normal = -1;
> 
> Wrong. The onlined node may only have high mem zone. IOW, think fake numa 
> case etc.

"zone_idx(zone) <= zone_last" checks this case. the result is "else" branch.

> 
> 
>> +
>> +if (!node_state(nid, N_HIGH_MEMORY))
>> +arg->status_change_nid = nid;
>> +   

Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]

2012-10-25 Thread Lai Jiangshan
Hi, KOSAKI 

On 09/28/2012 06:03 AM, KOSAKI Motohiro wrote:
 (9/27/12 2:47 AM), Lai Jiangshan wrote:
 Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
 it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
 node_states[N_NORMAL_MEMORY] becomes stale.
 
 What's mean 'stale'? I guess
 
 : Currently memory_hotplug doesn't turn on/off node_states[N_NORMAL_MEMORY]


Right.

 and
 : then it will be invalid if the platform has highmem. Luckily, almost memory 
 : hotplug aware platform don't have highmem, but are not all.
 
 right?

Some platforms(32 bit) support logic-memory-hotplug.
Some platforms have movable memory.
They are all considered.

 I supporse this patch only meaningful on ARM platform practically.
 

any platform whic supports memory-hotplug.

 
 
 We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
 to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
 are changed while hotpluging.
 
 
 Also add @status_change_nid_normal to struct memory_notify, thus
 the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
 are changed.
 
 status_change_nid_normal is very ugly to me. When status_change_nid and 
 status_change_nid_normal has positive value, they are always the same.
 nid and flags value are more natual to me.

If we use flags, the semantic of status_change_nid is changed and we need to
change more current code, and we will add complicated to the memory hotplug
callbacks.

like this:

-   node = arg-status_change_nid;
+   if (arg-status_change_flags  (1UL  N_HIGH_MEMORY))
+   node = arg-status_change_nid;
+   else
+   node = -1;

 
 
 

 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 ---
  Documentation/memory-hotplug.txt |5 ++-
  include/linux/memory.h   |1 +
  mm/memory_hotplug.c  |   94 
 +++--
  3 files changed, 83 insertions(+), 17 deletions(-)

 diff --git a/Documentation/memory-hotplug.txt 
 b/Documentation/memory-hotplug.txt
 index 6d0c251..6e6cbc7 100644
 --- a/Documentation/memory-hotplug.txt
 +++ b/Documentation/memory-hotplug.txt
 @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct 
 memory_notify.
  struct memory_notify {
 unsigned long start_pfn;
 unsigned long nr_pages;
 +   int status_change_nid_normal;
 int status_change_nid;
  }
  
  start_pfn is start_pfn of online/offline memory.
  nr_pages is # of pages of online/offline memory.
 +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
 +is (will be) set/clear, if this is -1, then nodemask status is not changed.
  status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
  set/clear. It means a new(memoryless) node gets new memory by online and a
  node loses all memory. If this is -1, then nodemask status is not changed.
 -If status_changed_nid = 0, callback should create/discard structures for 
 the
 +If status_changed_nid* = 0, callback should create/discard structures for 
 the
  node if necessary.
  
  --
 diff --git a/include/linux/memory.h b/include/linux/memory.h
 index ff9a9f8..a09216d 100644
 --- a/include/linux/memory.h
 +++ b/include/linux/memory.h
 @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
  struct memory_notify {
  unsigned long start_pfn;
  unsigned long nr_pages;
 +int status_change_nid_normal;
  int status_change_nid;
  };
  
 diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
 index 6a5b90d..b62d429b 100644
 --- a/mm/memory_hotplug.c
 +++ b/mm/memory_hotplug.c
 @@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, 
 unsigned long nr_pages,
  return 0;
  }
  
 +static void check_nodemasks_changes_online(unsigned long nr_pages,
 +struct zone *zone, struct memory_notify *arg)
 +{
 +int nid = zone_to_nid(zone);
 +enum zone_type zone_last = ZONE_NORMAL;
 +
 +if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
 +zone_last = ZONE_MOVABLE;
 
 This is very strange (or ugly) code. ZONE_MOVABLE don't depend on high mem.

If we don't have HIGHMEM,
any node of N_NORMAL_MEMORY has 0...ZONE_MOVABLE

if we have HIGHMEM,
any node of N_NORMAL_MEMORY has 0...ZONE_NORMAL

 
 
 +
 +if (zone_idx(zone) = zone_last  !node_state(nid, N_NORMAL_MEMORY))
 +arg-status_change_nid_normal = nid;
 +else
 +arg-status_change_nid_normal = -1;
 
 Wrong. The onlined node may only have high mem zone. IOW, think fake numa 
 case etc.

zone_idx(zone) = zone_last checks this case. the result is else branch.

 
 
 +
 +if (!node_state(nid, N_HIGH_MEMORY))
 +arg-status_change_nid = nid;
 +else
 +arg-status_change_nid = -1;
 +}
 +
 +static void set_nodemasks(int node, struct memory_notify *arg)
 
 Too ugly. just remove this and use node_set_state() directly.
 
 +{
 +if 

Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]

2012-09-28 Thread Lai Jiangshan
On 09/27/2012 10:32 PM, Ni zhan Chen wrote:
> On 09/27/2012 02:47 PM, Lai Jiangshan wrote:
>> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
>> it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
>> node_states[N_NORMAL_MEMORY] becomes stale.
>>
>> We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
>> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
>> are changed while hotpluging.
>>
>> Also add @status_change_nid_normal to struct memory_notify, thus
>> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
>> are changed.
> 
> I still don't understand why need care N_NORMAL_MEMORY here, could you explain
> in details?

Hi, Chen

In short node_states[N_NORMAL_MEMORY] will become wrong in some situation.
many memory management code access to this node_states[N_NORMAL_MEMORY].

I will add more detail in the changelog in next round.

Thanks,
Lai

> 
>>
>> Signed-off-by: Lai Jiangshan 
>> ---
>>   Documentation/memory-hotplug.txt |5 ++-
>>   include/linux/memory.h   |1 +
>>   mm/memory_hotplug.c  |   94 
>> +++--
>>   3 files changed, 83 insertions(+), 17 deletions(-)
>>
>> diff --git a/Documentation/memory-hotplug.txt 
>> b/Documentation/memory-hotplug.txt
>> index 6d0c251..6e6cbc7 100644
>> --- a/Documentation/memory-hotplug.txt
>> +++ b/Documentation/memory-hotplug.txt
>> @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct 
>> memory_notify.
>>   struct memory_notify {
>>  unsigned long start_pfn;
>>  unsigned long nr_pages;
>> +   int status_change_nid_normal;
>>  int status_change_nid;
>>   }
>> start_pfn is start_pfn of online/offline memory.
>>   nr_pages is # of pages of online/offline memory.
>> +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
>> +is (will be) set/clear, if this is -1, then nodemask status is not changed.
>>   status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will 
>> be)
>>   set/clear. It means a new(memoryless) node gets new memory by online and a
>>   node loses all memory. If this is -1, then nodemask status is not changed.
>> -If status_changed_nid >= 0, callback should create/discard structures for 
>> the
>> +If status_changed_nid* >= 0, callback should create/discard structures for 
>> the
>>   node if necessary.
>> --
>> diff --git a/include/linux/memory.h b/include/linux/memory.h
>> index ff9a9f8..a09216d 100644
>> --- a/include/linux/memory.h
>> +++ b/include/linux/memory.h
>> @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
>>   struct memory_notify {
>>   unsigned long start_pfn;
>>   unsigned long nr_pages;
>> +int status_change_nid_normal;
>>   int status_change_nid;
>>   };
>>   diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 6a5b90d..b62d429b 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, 
>> unsigned long nr_pages,
>>   return 0;
>>   }
>>   +static void check_nodemasks_changes_online(unsigned long nr_pages,
>> +struct zone *zone, struct memory_notify *arg)
>> +{
>> +int nid = zone_to_nid(zone);
>> +enum zone_type zone_last = ZONE_NORMAL;
>> +
>> +if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
>> +zone_last = ZONE_MOVABLE;
>> +
>> +if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
>> +arg->status_change_nid_normal = nid;
>> +else
>> +arg->status_change_nid_normal = -1;
>> +
>> +if (!node_state(nid, N_HIGH_MEMORY))
>> +arg->status_change_nid = nid;
>> +else
>> +arg->status_change_nid = -1;
>> +}
>> +
>> +static void set_nodemasks(int node, struct memory_notify *arg)
>> +{
>> +if (arg->status_change_nid_normal >= 0)
>> +node_set_state(node, N_NORMAL_MEMORY);
>> +
>> +node_set_state(node, N_HIGH_MEMORY);
>> +}
>> +
>> int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>>   {
>> @@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned 
>> long nr_pages)
>>   struct memory_notify arg;
>> lock_memory_hotplug();
>> +/*
>> + * This doesn't need a lock to do pfn_to_page().
>> + * The section can't be removed here because of the
>> + * memory_block->state_mutex.
>> + */
>> +zone = page_zone(pfn_to_page(pfn));
>> +
>>   arg.start_pfn = pfn;
>>   arg.nr_pages = nr_pages;
>> -arg.status_change_nid = -1;
>> +check_nodemasks_changes_online(nr_pages, zone, );
>> nid = page_to_nid(pfn_to_page(pfn));
>> -if (node_present_pages(nid) == 0)
>> -arg.status_change_nid = nid;
>> ret = memory_notify(MEM_GOING_ONLINE, );
>>   ret = notifier_to_errno(ret);
>> @@ -487,12 +520,6 @@ int __ref online_pages(unsigned long pfn, unsigned long 
>> 

Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]

2012-09-28 Thread Lai Jiangshan
On 09/27/2012 10:32 PM, Ni zhan Chen wrote:
 On 09/27/2012 02:47 PM, Lai Jiangshan wrote:
 Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
 it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
 node_states[N_NORMAL_MEMORY] becomes stale.

 We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
 to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
 are changed while hotpluging.

 Also add @status_change_nid_normal to struct memory_notify, thus
 the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
 are changed.
 
 I still don't understand why need care N_NORMAL_MEMORY here, could you explain
 in details?

Hi, Chen

In short node_states[N_NORMAL_MEMORY] will become wrong in some situation.
many memory management code access to this node_states[N_NORMAL_MEMORY].

I will add more detail in the changelog in next round.

Thanks,
Lai

 

 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 ---
   Documentation/memory-hotplug.txt |5 ++-
   include/linux/memory.h   |1 +
   mm/memory_hotplug.c  |   94 
 +++--
   3 files changed, 83 insertions(+), 17 deletions(-)

 diff --git a/Documentation/memory-hotplug.txt 
 b/Documentation/memory-hotplug.txt
 index 6d0c251..6e6cbc7 100644
 --- a/Documentation/memory-hotplug.txt
 +++ b/Documentation/memory-hotplug.txt
 @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct 
 memory_notify.
   struct memory_notify {
  unsigned long start_pfn;
  unsigned long nr_pages;
 +   int status_change_nid_normal;
  int status_change_nid;
   }
 start_pfn is start_pfn of online/offline memory.
   nr_pages is # of pages of online/offline memory.
 +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
 +is (will be) set/clear, if this is -1, then nodemask status is not changed.
   status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will 
 be)
   set/clear. It means a new(memoryless) node gets new memory by online and a
   node loses all memory. If this is -1, then nodemask status is not changed.
 -If status_changed_nid = 0, callback should create/discard structures for 
 the
 +If status_changed_nid* = 0, callback should create/discard structures for 
 the
   node if necessary.
 --
 diff --git a/include/linux/memory.h b/include/linux/memory.h
 index ff9a9f8..a09216d 100644
 --- a/include/linux/memory.h
 +++ b/include/linux/memory.h
 @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
   struct memory_notify {
   unsigned long start_pfn;
   unsigned long nr_pages;
 +int status_change_nid_normal;
   int status_change_nid;
   };
   diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
 index 6a5b90d..b62d429b 100644
 --- a/mm/memory_hotplug.c
 +++ b/mm/memory_hotplug.c
 @@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, 
 unsigned long nr_pages,
   return 0;
   }
   +static void check_nodemasks_changes_online(unsigned long nr_pages,
 +struct zone *zone, struct memory_notify *arg)
 +{
 +int nid = zone_to_nid(zone);
 +enum zone_type zone_last = ZONE_NORMAL;
 +
 +if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
 +zone_last = ZONE_MOVABLE;
 +
 +if (zone_idx(zone) = zone_last  !node_state(nid, N_NORMAL_MEMORY))
 +arg-status_change_nid_normal = nid;
 +else
 +arg-status_change_nid_normal = -1;
 +
 +if (!node_state(nid, N_HIGH_MEMORY))
 +arg-status_change_nid = nid;
 +else
 +arg-status_change_nid = -1;
 +}
 +
 +static void set_nodemasks(int node, struct memory_notify *arg)
 +{
 +if (arg-status_change_nid_normal = 0)
 +node_set_state(node, N_NORMAL_MEMORY);
 +
 +node_set_state(node, N_HIGH_MEMORY);
 +}
 +
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
   {
 @@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned 
 long nr_pages)
   struct memory_notify arg;
 lock_memory_hotplug();
 +/*
 + * This doesn't need a lock to do pfn_to_page().
 + * The section can't be removed here because of the
 + * memory_block-state_mutex.
 + */
 +zone = page_zone(pfn_to_page(pfn));
 +
   arg.start_pfn = pfn;
   arg.nr_pages = nr_pages;
 -arg.status_change_nid = -1;
 +check_nodemasks_changes_online(nr_pages, zone, arg);
 nid = page_to_nid(pfn_to_page(pfn));
 -if (node_present_pages(nid) == 0)
 -arg.status_change_nid = nid;
 ret = memory_notify(MEM_GOING_ONLINE, arg);
   ret = notifier_to_errno(ret);
 @@ -487,12 +520,6 @@ int __ref online_pages(unsigned long pfn, unsigned long 
 nr_pages)
   return ret;
   }
   /*
 - * This doesn't need a lock to do pfn_to_page().
 - * The section can't be removed here because of the
 - * memory_block-state_mutex.
 - */
 -zone = 

Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]

2012-09-27 Thread KOSAKI Motohiro
(9/27/12 2:47 AM), Lai Jiangshan wrote:
> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
> it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
> node_states[N_NORMAL_MEMORY] becomes stale.

What's mean 'stale'? I guess

: Currently memory_hotplug doesn't turn on/off node_states[N_NORMAL_MEMORY] and
: then it will be invalid if the platform has highmem. Luckily, almost memory 
: hotplug aware platform don't have highmem, but are not all.

right?
I supporse this patch only meaningful on ARM platform practically.



> We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
> are changed while hotpluging.


> Also add @status_change_nid_normal to struct memory_notify, thus
> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
> are changed.

status_change_nid_normal is very ugly to me. When status_change_nid and 
status_change_nid_normal has positive value, they are always the same.
nid and flags value are more natual to me.



> 
> Signed-off-by: Lai Jiangshan 
> ---
>  Documentation/memory-hotplug.txt |5 ++-
>  include/linux/memory.h   |1 +
>  mm/memory_hotplug.c  |   94 +++--
>  3 files changed, 83 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/memory-hotplug.txt 
> b/Documentation/memory-hotplug.txt
> index 6d0c251..6e6cbc7 100644
> --- a/Documentation/memory-hotplug.txt
> +++ b/Documentation/memory-hotplug.txt
> @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct 
> memory_notify.
>  struct memory_notify {
> unsigned long start_pfn;
> unsigned long nr_pages;
> +   int status_change_nid_normal;
> int status_change_nid;
>  }
>  
>  start_pfn is start_pfn of online/offline memory.
>  nr_pages is # of pages of online/offline memory.
> +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
> +is (will be) set/clear, if this is -1, then nodemask status is not changed.
>  status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
>  set/clear. It means a new(memoryless) node gets new memory by online and a
>  node loses all memory. If this is -1, then nodemask status is not changed.
> -If status_changed_nid >= 0, callback should create/discard structures for the
> +If status_changed_nid* >= 0, callback should create/discard structures for 
> the
>  node if necessary.
>  
>  --
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index ff9a9f8..a09216d 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
>  struct memory_notify {
>   unsigned long start_pfn;
>   unsigned long nr_pages;
> + int status_change_nid_normal;
>   int status_change_nid;
>  };
>  
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6a5b90d..b62d429b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, 
> unsigned long nr_pages,
>   return 0;
>  }
>  
> +static void check_nodemasks_changes_online(unsigned long nr_pages,
> + struct zone *zone, struct memory_notify *arg)
> +{
> + int nid = zone_to_nid(zone);
> + enum zone_type zone_last = ZONE_NORMAL;
> +
> + if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
> + zone_last = ZONE_MOVABLE;

This is very strange (or ugly) code. ZONE_MOVABLE don't depend on high mem.


> +
> + if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
> + arg->status_change_nid_normal = nid;
> + else
> + arg->status_change_nid_normal = -1;

Wrong. The onlined node may only have high mem zone. IOW, think fake numa case 
etc.


> +
> + if (!node_state(nid, N_HIGH_MEMORY))
> + arg->status_change_nid = nid;
> + else
> + arg->status_change_nid = -1;
> +}
> +
> +static void set_nodemasks(int node, struct memory_notify *arg)

Too ugly. just remove this and use node_set_state() directly.

> +{
> + if (arg->status_change_nid_normal >= 0)
> + node_set_state(node, N_NORMAL_MEMORY);
> +
> + node_set_state(node, N_HIGH_MEMORY);
> +}
> +
>  
>  int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
>  {
> @@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned long 
> nr_pages)
>   struct memory_notify arg;
>  
>   lock_memory_hotplug();
> + /*
> +  * This doesn't need a lock to do pfn_to_page().
> +  * The section can't be removed here because of the
> +  * memory_block->state_mutex.
> +  */

Please explain the intention of this comment. We think lock_memory_hotplug() 
close
a race against memory offline directly.


> + zone = page_zone(pfn_to_page(pfn));
> +
>   arg.start_pfn = pfn;
>   arg.nr_pages = 

Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]

2012-09-27 Thread Ni zhan Chen

On 09/27/2012 02:47 PM, Lai Jiangshan wrote:

Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
node_states[N_NORMAL_MEMORY] becomes stale.

We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
are changed while hotpluging.

Also add @status_change_nid_normal to struct memory_notify, thus
the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
are changed.


I still don't understand why need care N_NORMAL_MEMORY here, could you 
explain

in details?



Signed-off-by: Lai Jiangshan 
---
  Documentation/memory-hotplug.txt |5 ++-
  include/linux/memory.h   |1 +
  mm/memory_hotplug.c  |   94 +++--
  3 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6d0c251..6e6cbc7 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -377,15 +377,18 @@ The third argument is passed by pointer of struct 
memory_notify.
  struct memory_notify {
 unsigned long start_pfn;
 unsigned long nr_pages;
+   int status_change_nid_normal;
 int status_change_nid;
  }
  
  start_pfn is start_pfn of online/offline memory.

  nr_pages is # of pages of online/offline memory.
+status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
+is (will be) set/clear, if this is -1, then nodemask status is not changed.
  status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
  set/clear. It means a new(memoryless) node gets new memory by online and a
  node loses all memory. If this is -1, then nodemask status is not changed.
-If status_changed_nid >= 0, callback should create/discard structures for the
+If status_changed_nid* >= 0, callback should create/discard structures for the
  node if necessary.
  
  --

diff --git a/include/linux/memory.h b/include/linux/memory.h
index ff9a9f8..a09216d 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
  struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
+   int status_change_nid_normal;
int status_change_nid;
  };
  
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c

index 6a5b90d..b62d429b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, 
unsigned long nr_pages,
return 0;
  }
  
+static void check_nodemasks_changes_online(unsigned long nr_pages,

+   struct zone *zone, struct memory_notify *arg)
+{
+   int nid = zone_to_nid(zone);
+   enum zone_type zone_last = ZONE_NORMAL;
+
+   if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+   zone_last = ZONE_MOVABLE;
+
+   if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
+   arg->status_change_nid_normal = nid;
+   else
+   arg->status_change_nid_normal = -1;
+
+   if (!node_state(nid, N_HIGH_MEMORY))
+   arg->status_change_nid = nid;
+   else
+   arg->status_change_nid = -1;
+}
+
+static void set_nodemasks(int node, struct memory_notify *arg)
+{
+   if (arg->status_change_nid_normal >= 0)
+   node_set_state(node, N_NORMAL_MEMORY);
+
+   node_set_state(node, N_HIGH_MEMORY);
+}
+
  
  int __ref online_pages(unsigned long pfn, unsigned long nr_pages)

  {
@@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages)
struct memory_notify arg;
  
  	lock_memory_hotplug();

+   /*
+* This doesn't need a lock to do pfn_to_page().
+* The section can't be removed here because of the
+* memory_block->state_mutex.
+*/
+   zone = page_zone(pfn_to_page(pfn));
+
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
-   arg.status_change_nid = -1;
+   check_nodemasks_changes_online(nr_pages, zone, );
  
  	nid = page_to_nid(pfn_to_page(pfn));

-   if (node_present_pages(nid) == 0)
-   arg.status_change_nid = nid;
  
  	ret = memory_notify(MEM_GOING_ONLINE, );

ret = notifier_to_errno(ret);
@@ -487,12 +520,6 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages)
return ret;
}
/*
-* This doesn't need a lock to do pfn_to_page().
-* The section can't be removed here because of the
-* memory_block->state_mutex.
-*/
-   zone = page_zone(pfn_to_page(pfn));
-   /*
 * If this zone is not populated, then it is not in zonelist.
 * This means the page allocator ignores this zone.
 * So, zonelist must be updated after online.
@@ -517,7 +544,7 @@ int __ref 

[PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]

2012-09-27 Thread Lai Jiangshan
Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
node_states[N_NORMAL_MEMORY] becomes stale.

We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
are changed while hotpluging.

Also add @status_change_nid_normal to struct memory_notify, thus
the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
are changed.

Signed-off-by: Lai Jiangshan 
---
 Documentation/memory-hotplug.txt |5 ++-
 include/linux/memory.h   |1 +
 mm/memory_hotplug.c  |   94 +++--
 3 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6d0c251..6e6cbc7 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -377,15 +377,18 @@ The third argument is passed by pointer of struct 
memory_notify.
 struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
+   int status_change_nid_normal;
int status_change_nid;
 }
 
 start_pfn is start_pfn of online/offline memory.
 nr_pages is # of pages of online/offline memory.
+status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
+is (will be) set/clear, if this is -1, then nodemask status is not changed.
 status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
 set/clear. It means a new(memoryless) node gets new memory by online and a
 node loses all memory. If this is -1, then nodemask status is not changed.
-If status_changed_nid >= 0, callback should create/discard structures for the
+If status_changed_nid* >= 0, callback should create/discard structures for the
 node if necessary.
 
 --
diff --git a/include/linux/memory.h b/include/linux/memory.h
index ff9a9f8..a09216d 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
 struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
+   int status_change_nid_normal;
int status_change_nid;
 };
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6a5b90d..b62d429b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, 
unsigned long nr_pages,
return 0;
 }
 
+static void check_nodemasks_changes_online(unsigned long nr_pages,
+   struct zone *zone, struct memory_notify *arg)
+{
+   int nid = zone_to_nid(zone);
+   enum zone_type zone_last = ZONE_NORMAL;
+
+   if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+   zone_last = ZONE_MOVABLE;
+
+   if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
+   arg->status_change_nid_normal = nid;
+   else
+   arg->status_change_nid_normal = -1;
+
+   if (!node_state(nid, N_HIGH_MEMORY))
+   arg->status_change_nid = nid;
+   else
+   arg->status_change_nid = -1;
+}
+
+static void set_nodemasks(int node, struct memory_notify *arg)
+{
+   if (arg->status_change_nid_normal >= 0)
+   node_set_state(node, N_NORMAL_MEMORY);
+
+   node_set_state(node, N_HIGH_MEMORY);
+}
+
 
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 {
@@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages)
struct memory_notify arg;
 
lock_memory_hotplug();
+   /*
+* This doesn't need a lock to do pfn_to_page().
+* The section can't be removed here because of the
+* memory_block->state_mutex.
+*/
+   zone = page_zone(pfn_to_page(pfn));
+
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
-   arg.status_change_nid = -1;
+   check_nodemasks_changes_online(nr_pages, zone, );
 
nid = page_to_nid(pfn_to_page(pfn));
-   if (node_present_pages(nid) == 0)
-   arg.status_change_nid = nid;
 
ret = memory_notify(MEM_GOING_ONLINE, );
ret = notifier_to_errno(ret);
@@ -487,12 +520,6 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages)
return ret;
}
/*
-* This doesn't need a lock to do pfn_to_page().
-* The section can't be removed here because of the
-* memory_block->state_mutex.
-*/
-   zone = page_zone(pfn_to_page(pfn));
-   /*
 * If this zone is not populated, then it is not in zonelist.
 * This means the page allocator ignores this zone.
 * So, zonelist must be updated after online.
@@ -517,7 +544,7 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages)
zone->present_pages += onlined_pages;
zone->zone_pgdat->node_present_pages += onlined_pages;
if 

[PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]

2012-09-27 Thread Lai Jiangshan
Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
node_states[N_NORMAL_MEMORY] becomes stale.

We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
are changed while hotpluging.

Also add @status_change_nid_normal to struct memory_notify, thus
the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
are changed.

Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
---
 Documentation/memory-hotplug.txt |5 ++-
 include/linux/memory.h   |1 +
 mm/memory_hotplug.c  |   94 +++--
 3 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6d0c251..6e6cbc7 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -377,15 +377,18 @@ The third argument is passed by pointer of struct 
memory_notify.
 struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
+   int status_change_nid_normal;
int status_change_nid;
 }
 
 start_pfn is start_pfn of online/offline memory.
 nr_pages is # of pages of online/offline memory.
+status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
+is (will be) set/clear, if this is -1, then nodemask status is not changed.
 status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
 set/clear. It means a new(memoryless) node gets new memory by online and a
 node loses all memory. If this is -1, then nodemask status is not changed.
-If status_changed_nid = 0, callback should create/discard structures for the
+If status_changed_nid* = 0, callback should create/discard structures for the
 node if necessary.
 
 --
diff --git a/include/linux/memory.h b/include/linux/memory.h
index ff9a9f8..a09216d 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
 struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
+   int status_change_nid_normal;
int status_change_nid;
 };
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6a5b90d..b62d429b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, 
unsigned long nr_pages,
return 0;
 }
 
+static void check_nodemasks_changes_online(unsigned long nr_pages,
+   struct zone *zone, struct memory_notify *arg)
+{
+   int nid = zone_to_nid(zone);
+   enum zone_type zone_last = ZONE_NORMAL;
+
+   if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+   zone_last = ZONE_MOVABLE;
+
+   if (zone_idx(zone) = zone_last  !node_state(nid, N_NORMAL_MEMORY))
+   arg-status_change_nid_normal = nid;
+   else
+   arg-status_change_nid_normal = -1;
+
+   if (!node_state(nid, N_HIGH_MEMORY))
+   arg-status_change_nid = nid;
+   else
+   arg-status_change_nid = -1;
+}
+
+static void set_nodemasks(int node, struct memory_notify *arg)
+{
+   if (arg-status_change_nid_normal = 0)
+   node_set_state(node, N_NORMAL_MEMORY);
+
+   node_set_state(node, N_HIGH_MEMORY);
+}
+
 
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 {
@@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages)
struct memory_notify arg;
 
lock_memory_hotplug();
+   /*
+* This doesn't need a lock to do pfn_to_page().
+* The section can't be removed here because of the
+* memory_block-state_mutex.
+*/
+   zone = page_zone(pfn_to_page(pfn));
+
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
-   arg.status_change_nid = -1;
+   check_nodemasks_changes_online(nr_pages, zone, arg);
 
nid = page_to_nid(pfn_to_page(pfn));
-   if (node_present_pages(nid) == 0)
-   arg.status_change_nid = nid;
 
ret = memory_notify(MEM_GOING_ONLINE, arg);
ret = notifier_to_errno(ret);
@@ -487,12 +520,6 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages)
return ret;
}
/*
-* This doesn't need a lock to do pfn_to_page().
-* The section can't be removed here because of the
-* memory_block-state_mutex.
-*/
-   zone = page_zone(pfn_to_page(pfn));
-   /*
 * If this zone is not populated, then it is not in zonelist.
 * This means the page allocator ignores this zone.
 * So, zonelist must be updated after online.
@@ -517,7 +544,7 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages)
zone-present_pages += onlined_pages;
zone-zone_pgdat-node_present_pages += onlined_pages;
   

Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]

2012-09-27 Thread Ni zhan Chen

On 09/27/2012 02:47 PM, Lai Jiangshan wrote:

Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
node_states[N_NORMAL_MEMORY] becomes stale.

We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
are changed while hotpluging.

Also add @status_change_nid_normal to struct memory_notify, thus
the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
are changed.


I still don't understand why need care N_NORMAL_MEMORY here, could you 
explain

in details?



Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
---
  Documentation/memory-hotplug.txt |5 ++-
  include/linux/memory.h   |1 +
  mm/memory_hotplug.c  |   94 +++--
  3 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6d0c251..6e6cbc7 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -377,15 +377,18 @@ The third argument is passed by pointer of struct 
memory_notify.
  struct memory_notify {
 unsigned long start_pfn;
 unsigned long nr_pages;
+   int status_change_nid_normal;
 int status_change_nid;
  }
  
  start_pfn is start_pfn of online/offline memory.

  nr_pages is # of pages of online/offline memory.
+status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
+is (will be) set/clear, if this is -1, then nodemask status is not changed.
  status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
  set/clear. It means a new(memoryless) node gets new memory by online and a
  node loses all memory. If this is -1, then nodemask status is not changed.
-If status_changed_nid = 0, callback should create/discard structures for the
+If status_changed_nid* = 0, callback should create/discard structures for the
  node if necessary.
  
  --

diff --git a/include/linux/memory.h b/include/linux/memory.h
index ff9a9f8..a09216d 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
  struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
+   int status_change_nid_normal;
int status_change_nid;
  };
  
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c

index 6a5b90d..b62d429b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, 
unsigned long nr_pages,
return 0;
  }
  
+static void check_nodemasks_changes_online(unsigned long nr_pages,

+   struct zone *zone, struct memory_notify *arg)
+{
+   int nid = zone_to_nid(zone);
+   enum zone_type zone_last = ZONE_NORMAL;
+
+   if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+   zone_last = ZONE_MOVABLE;
+
+   if (zone_idx(zone) = zone_last  !node_state(nid, N_NORMAL_MEMORY))
+   arg-status_change_nid_normal = nid;
+   else
+   arg-status_change_nid_normal = -1;
+
+   if (!node_state(nid, N_HIGH_MEMORY))
+   arg-status_change_nid = nid;
+   else
+   arg-status_change_nid = -1;
+}
+
+static void set_nodemasks(int node, struct memory_notify *arg)
+{
+   if (arg-status_change_nid_normal = 0)
+   node_set_state(node, N_NORMAL_MEMORY);
+
+   node_set_state(node, N_HIGH_MEMORY);
+}
+
  
  int __ref online_pages(unsigned long pfn, unsigned long nr_pages)

  {
@@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages)
struct memory_notify arg;
  
  	lock_memory_hotplug();

+   /*
+* This doesn't need a lock to do pfn_to_page().
+* The section can't be removed here because of the
+* memory_block-state_mutex.
+*/
+   zone = page_zone(pfn_to_page(pfn));
+
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
-   arg.status_change_nid = -1;
+   check_nodemasks_changes_online(nr_pages, zone, arg);
  
  	nid = page_to_nid(pfn_to_page(pfn));

-   if (node_present_pages(nid) == 0)
-   arg.status_change_nid = nid;
  
  	ret = memory_notify(MEM_GOING_ONLINE, arg);

ret = notifier_to_errno(ret);
@@ -487,12 +520,6 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages)
return ret;
}
/*
-* This doesn't need a lock to do pfn_to_page().
-* The section can't be removed here because of the
-* memory_block-state_mutex.
-*/
-   zone = page_zone(pfn_to_page(pfn));
-   /*
 * If this zone is not populated, then it is not in zonelist.
 * This means the page allocator ignores this zone.
 * So, zonelist must be updated after online.
@@ -517,7 +544,7 @@ int __ref 

Re: [PATCH 1/3] memory_hotplug: fix stale node_states[N_NORMAL_MEMORY]

2012-09-27 Thread KOSAKI Motohiro
(9/27/12 2:47 AM), Lai Jiangshan wrote:
 Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
 it forgets to manage node_states[N_NORMAL_MEMORY]. it causes
 node_states[N_NORMAL_MEMORY] becomes stale.

What's mean 'stale'? I guess

: Currently memory_hotplug doesn't turn on/off node_states[N_NORMAL_MEMORY] and
: then it will be invalid if the platform has highmem. Luckily, almost memory 
: hotplug aware platform don't have highmem, but are not all.

right?
I supporse this patch only meaningful on ARM platform practically.



 We add check_nodemasks_changes_online() and check_nodemasks_changes_offline()
 to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
 are changed while hotpluging.


 Also add @status_change_nid_normal to struct memory_notify, thus
 the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
 are changed.

status_change_nid_normal is very ugly to me. When status_change_nid and 
status_change_nid_normal has positive value, they are always the same.
nid and flags value are more natual to me.



 
 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 ---
  Documentation/memory-hotplug.txt |5 ++-
  include/linux/memory.h   |1 +
  mm/memory_hotplug.c  |   94 +++--
  3 files changed, 83 insertions(+), 17 deletions(-)
 
 diff --git a/Documentation/memory-hotplug.txt 
 b/Documentation/memory-hotplug.txt
 index 6d0c251..6e6cbc7 100644
 --- a/Documentation/memory-hotplug.txt
 +++ b/Documentation/memory-hotplug.txt
 @@ -377,15 +377,18 @@ The third argument is passed by pointer of struct 
 memory_notify.
  struct memory_notify {
 unsigned long start_pfn;
 unsigned long nr_pages;
 +   int status_change_nid_normal;
 int status_change_nid;
  }
  
  start_pfn is start_pfn of online/offline memory.
  nr_pages is # of pages of online/offline memory.
 +status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
 +is (will be) set/clear, if this is -1, then nodemask status is not changed.
  status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
  set/clear. It means a new(memoryless) node gets new memory by online and a
  node loses all memory. If this is -1, then nodemask status is not changed.
 -If status_changed_nid = 0, callback should create/discard structures for the
 +If status_changed_nid* = 0, callback should create/discard structures for 
 the
  node if necessary.
  
  --
 diff --git a/include/linux/memory.h b/include/linux/memory.h
 index ff9a9f8..a09216d 100644
 --- a/include/linux/memory.h
 +++ b/include/linux/memory.h
 @@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
  struct memory_notify {
   unsigned long start_pfn;
   unsigned long nr_pages;
 + int status_change_nid_normal;
   int status_change_nid;
  };
  
 diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
 index 6a5b90d..b62d429b 100644
 --- a/mm/memory_hotplug.c
 +++ b/mm/memory_hotplug.c
 @@ -460,6 +460,34 @@ static int online_pages_range(unsigned long start_pfn, 
 unsigned long nr_pages,
   return 0;
  }
  
 +static void check_nodemasks_changes_online(unsigned long nr_pages,
 + struct zone *zone, struct memory_notify *arg)
 +{
 + int nid = zone_to_nid(zone);
 + enum zone_type zone_last = ZONE_NORMAL;
 +
 + if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
 + zone_last = ZONE_MOVABLE;

This is very strange (or ugly) code. ZONE_MOVABLE don't depend on high mem.


 +
 + if (zone_idx(zone) = zone_last  !node_state(nid, N_NORMAL_MEMORY))
 + arg-status_change_nid_normal = nid;
 + else
 + arg-status_change_nid_normal = -1;

Wrong. The onlined node may only have high mem zone. IOW, think fake numa case 
etc.


 +
 + if (!node_state(nid, N_HIGH_MEMORY))
 + arg-status_change_nid = nid;
 + else
 + arg-status_change_nid = -1;
 +}
 +
 +static void set_nodemasks(int node, struct memory_notify *arg)

Too ugly. just remove this and use node_set_state() directly.

 +{
 + if (arg-status_change_nid_normal = 0)
 + node_set_state(node, N_NORMAL_MEMORY);
 +
 + node_set_state(node, N_HIGH_MEMORY);
 +}
 +
  
  int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
  {
 @@ -471,13 +499,18 @@ int __ref online_pages(unsigned long pfn, unsigned long 
 nr_pages)
   struct memory_notify arg;
  
   lock_memory_hotplug();
 + /*
 +  * This doesn't need a lock to do pfn_to_page().
 +  * The section can't be removed here because of the
 +  * memory_block-state_mutex.
 +  */

Please explain the intention of this comment. We think lock_memory_hotplug() 
close
a race against memory offline directly.


 + zone = page_zone(pfn_to_page(pfn));
 +
   arg.start_pfn = pfn;
   arg.nr_pages = nr_pages;
 - arg.status_change_nid = -1;
 + check_nodemasks_changes_online(nr_pages, zone,