Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-26 Thread Ingo Molnar

* Andrew Morton  wrote:

> On Thu, 21 Mar 2013 19:03:21 +0100 Ingo Molnar  wrote:
> 
> > > IMO the local scope is more obvious as this is, and should only be, 
> > > used for caching purposes.
> > 
> > It's a pattern we actively avoid in kernel code.
> 
> On the contrary, I always encourage people to move the static 
> definitions into function scope if possible.  So the reader can see the 
> identifier's scope without having to search the whole file. 
> Unnecessarily giving the identifier file-scope seems weird.

A common solution I use is to move such variables right before the 
function itself. That makes the "this function's scope only" aspect pretty 
apparent - without the risks of hiding globals amongst local variables. 

The other approach is to comment very clearly that the variables are 
really globals, as the 'static' keyword is easy to miss while reading 
email.

Both solutions are basically just as visible as the solution you prefer - 
but more robust.
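
For illustration, the first variant might look like this when applied to 
the patch under discussion (a sketch only; the __meminitdata annotations 
follow David's earlier request and are my assumption, not part of the 
posted patch):

	/*
	 * Cache for __early_pfn_to_nid() below, and nothing else.
	 * SMP-unsafe, but only used early in boot while the kernel is
	 * still running single-threaded.
	 */
	static unsigned long last_start_pfn __meminitdata;
	static unsigned long last_end_pfn __meminitdata;
	static int last_nid __meminitdata;

	int __meminit __early_pfn_to_nid(unsigned long pfn)
	{
		unsigned long start_pfn, end_pfn;
		int i, nid;

		if (last_start_pfn <= pfn && pfn < last_end_pfn)
			return last_nid;

		for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
			if (start_pfn <= pfn && pfn < end_pfn) {
				last_start_pfn = start_pfn;
				last_end_pfn = end_pfn;
				last_nid = nid;
				return nid;
			}
		/* This is a memory hole */
		return -1;
	}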

Anyway, I guess we have to agree to disagree on that; we've probably 
already spent more energy discussing this than any worst-case problem the 
placement of these variables could ever cause in the future ;-)

Thanks,

Ingo


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-25 Thread Andrew Morton
On Mon, 25 Mar 2013 15:36:54 -0700 (PDT) David Rientjes  
wrote:

> On Mon, 25 Mar 2013, Andrew Morton wrote:
> 
> > > Um, defining them in a __meminit function places them in .meminit.data 
> > > already.
> > 
> > I wish it did, but it doesn't.
> > 
> 
> $ objdump -t mm/page_alloc.o | grep last_start_pfn
> 0240 l O .meminit.data  0008 last_start_pfn.34345
> 
> What version of gcc are you using?

4.4.4


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-25 Thread David Rientjes
On Mon, 25 Mar 2013, Andrew Morton wrote:

> > Um, defining them in a __meminit function places them in .meminit.data 
> > already.
> 
> I wish it did, but it doesn't.
> 

$ objdump -t mm/page_alloc.o | grep last_start_pfn
0240 l O .meminit.data  0008 last_start_pfn.34345

What version of gcc are you using?


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-25 Thread Yinghai Lu
On Mon, Mar 25, 2013 at 2:56 PM, Russ Anderson  wrote:
> On Mon, Mar 25, 2013 at 10:11:27AM +0800, Lin Feng wrote:
>> On 03/24/2013 04:37 AM, Yinghai Lu wrote:
>> > +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>> > +int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
>> > +unsigned long *start_pfn, unsigned long *end_pfn)
>> > +{
>> > +   struct memblock_type *type = &memblock.memory;
>> > +   int mid = memblock_search(type, (phys_addr_t)pfn << PAGE_SHIFT);
>>
>> I'm really eager to see how much time we can save using binary search
>> compared to linear search in this case :)
>
> I have machine time tonight to measure the difference.
>
> Based on earlier testing, a system with 9TB memory calls
> __early_pfn_to_nid() 2,377,198,300 times while booting, but
> only 6815 times does it not find that the memory range is
> the same as previous and search the table.  Caching the
> previous range avoids searching the table 2,377,191,485 times,
> saving a significant amount of time.
>
> Of the remaining 6815 times when it searches the table, a binary
> search may help, but with relatively few calls it may not
> make much of an overall difference.  Testing will show how much.

Please check the attached patch, which could be applied on top of your
patch in -mm.

Thanks

Yinghai


memblock_search_pfn_nid.patch
Description: Binary data
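
The attachment body is not preserved in this archive.  For illustration, 
a helper matching the signature quoted earlier in the thread might look 
like the sketch below (a reconstruction under assumptions, not the 
attached patch itself):

	#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
	/*
	 * Sketch: memblock_search() already does the binary search over
	 * memblock.memory; translate its result back into a pfn range
	 * plus nid for the caller to cache.
	 */
	int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
				unsigned long *start_pfn, unsigned long *end_pfn)
	{
		struct memblock_type *type = &memblock.memory;
		int mid = memblock_search(type, (phys_addr_t)pfn << PAGE_SHIFT);

		if (mid == -1)
			return -1;	/* pfn falls in a memory hole */

		*start_pfn = type->regions[mid].base >> PAGE_SHIFT;
		*end_pfn = (type->regions[mid].base + type->regions[mid].size)
				>> PAGE_SHIFT;

		return memblock_get_region_node(&type->regions[mid]);
	}
	#endif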


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-25 Thread Russ Anderson
On Mon, Mar 25, 2013 at 10:11:27AM +0800, Lin Feng wrote:
> On 03/24/2013 04:37 AM, Yinghai Lu wrote:
> > +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> > +int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
> > +unsigned long *start_pfn, unsigned long *end_pfn)
> > +{
> > +   struct memblock_type *type = &memblock.memory;
> > +   int mid = memblock_search(type, (phys_addr_t)pfn << PAGE_SHIFT);
> 
> I'm really eager to see how much time we can save using binary search
> compared to linear search in this case :)

I have machine time tonight to measure the difference.

Based on earlier testing, a system with 9TB memory calls
__early_pfn_to_nid() 2,377,198,300 times while booting, but
only 6815 times does it not find that the memory range is
the same as previous and search the table.  Caching the
previous range avoids searching the table 2,377,191,485 times,
saving a significant amount of time.

Of the remaining 6815 times when it searches the table, a binary
search may help, but with relatively few calls it may not
make much of an overall difference.  Testing will show how much.
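
(Rough arithmetic on that: a linear scan of 308 ranges averages about
154 comparisons per search, while a binary search needs at most
ceil(log2(308)) = 9.  The 6815 remaining searches therefore cost on the
order of 1.0M vs 0.06M comparisons -- small either way next to the ~2.38
billion lookups the cached range already short-circuits.)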

> (quote)
> > A 4 TB (single rack) UV1 system takes 512 seconds to get through
> > the zone code.  This performance optimization reduces the time
> > by 189 seconds, a 36% improvement.
> >
> > A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> > a 112.9 second (53%) reduction.
> (quote)
> 
> thanks,
> linfeng

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc  r...@sgi.com


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-25 Thread Andrew Morton
On Sun, 24 Mar 2013 17:28:12 -0700 (PDT) David Rientjes  
wrote:

> On Sat, 23 Mar 2013, KOSAKI Motohiro wrote:
> 
> > > --- linux.orig/mm/page_alloc.c  2013-03-19 16:09:03.736450861 -0500
> > > +++ linux/mm/page_alloc.c   2013-03-22 17:07:43.895405617 -0500
> > > @@ -4161,10 +4161,23 @@ int __meminit __early_pfn_to_nid(unsigne
> > >  {
> > > unsigned long start_pfn, end_pfn;
> > > int i, nid;
> > > +   /*
> > > +  NOTE: The following SMP-unsafe globals are only used early
> > > +  in boot when the kernel is running single-threaded.
> > > +*/
> > > +   static unsigned long last_start_pfn, last_end_pfn;
> > > +   static int last_nid;
> > 
> > Why don't you mark them __meminitdata? They seem freeable.
> > 
> 
> Um, defining them in a __meminit function places them in .meminit.data 
> already.

I wish it did, but it doesn't.


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-25 Thread Andrew Morton
On Thu, 21 Mar 2013 19:03:21 +0100 Ingo Molnar  wrote:

> > IMO the local scope is more obvious as this is, and should only be, 
> > used for caching purposes.
> 
> It's a pattern we actively avoid in kernel code.

On the contrary, I always encourage people to move the static
definitions into function scope if possible.  So the reader can see the
identifier's scope without having to search the whole file. 
Unnecessarily giving the identifier file-scope seems weird.



Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-24 Thread Lin Feng


On 03/24/2013 04:37 AM, Yinghai Lu wrote:
> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
> +  unsigned long *start_pfn, unsigned long *end_pfn)
> +{
> + struct memblock_type *type = &memblock.memory;
> + int mid = memblock_search(type, (phys_addr_t)pfn << PAGE_SHIFT);

I'm really eager to see how much time we can save using binary search
compared to linear search in this case :)

(quote)
> A 4 TB (single rack) UV1 system takes 512 seconds to get through
> the zone code.  This performance optimization reduces the time
> by 189 seconds, a 36% improvement.
>
> A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> a 112.9 second (53%) reduction.
(quote)

thanks,
linfeng


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-24 Thread David Rientjes
On Sat, 23 Mar 2013, KOSAKI Motohiro wrote:

> > --- linux.orig/mm/page_alloc.c  2013-03-19 16:09:03.736450861 -0500
> > +++ linux/mm/page_alloc.c   2013-03-22 17:07:43.895405617 -0500
> > @@ -4161,10 +4161,23 @@ int __meminit __early_pfn_to_nid(unsigne
> >  {
> > unsigned long start_pfn, end_pfn;
> > int i, nid;
> > +   /*
> > +  NOTE: The following SMP-unsafe globals are only used early
> > +  in boot when the kernel is running single-threaded.
> > +*/
> > +   static unsigned long last_start_pfn, last_end_pfn;
> > +   static int last_nid;
> 
> Why don't you mark them __meminitdata? They seem freeable.
> 

Um, defining them in a __meminit function places them in .meminit.data 
already.


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-24 Thread Ingo Molnar

* Russ Anderson  wrote:

> --- linux.orig/mm/page_alloc.c2013-03-19 16:09:03.736450861 -0500
> +++ linux/mm/page_alloc.c 2013-03-22 17:07:43.895405617 -0500
> @@ -4161,10 +4161,23 @@ int __meminit __early_pfn_to_nid(unsigne
>  {
>   unsigned long start_pfn, end_pfn;
>   int i, nid;
> + /*
> +NOTE: The following SMP-unsafe globals are only used early
> +in boot when the kernel is running single-threaded.
> +  */
> + static unsigned long last_start_pfn, last_end_pfn;
> + static int last_nid;

I guess I'm the nitpicker of the week:

please use the customary (multi-line) comment style:

	/*
	 * Comment .
	 * .. goes here.
	 */

specified in Documentation/CodingStyle.
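
Applied to the patch under review, the comment would then read:

	/*
	 * NOTE: The following SMP-unsafe globals are only used early
	 * in boot when the kernel is running single-threaded.
	 */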

Thanks,

Ingo



Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-23 Thread KOSAKI Motohiro
> --- linux.orig/mm/page_alloc.c  2013-03-19 16:09:03.736450861 -0500
> +++ linux/mm/page_alloc.c   2013-03-22 17:07:43.895405617 -0500
> @@ -4161,10 +4161,23 @@ int __meminit __early_pfn_to_nid(unsigne
>  {
> unsigned long start_pfn, end_pfn;
> int i, nid;
> +   /*
> +  NOTE: The following SMP-unsafe globals are only used early
> +  in boot when the kernel is running single-threaded.
> +*/
> +   static unsigned long last_start_pfn, last_end_pfn;
> +   static int last_nid;

Why don't you mark them __meminitdata? They seem freeable.


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-23 Thread Yinghai Lu
On Sat, Mar 23, 2013 at 8:29 AM, Russ Anderson  wrote:
> On Fri, Mar 22, 2013 at 08:25:32AM +0100, Ingo Molnar wrote:
> 
> When booting on a large memory system, the kernel spends
> considerable time in memmap_init_zone() setting up memory zones.
> Analysis shows significant time spent in __early_pfn_to_nid().
>
> The routine memmap_init_zone() checks each PFN to verify the
> nid is valid.  __early_pfn_to_nid() sequentially scans the list of
> pfn ranges to find the right range and returns the nid.  This does
> not scale well.  On a 4 TB (single rack) system there are 308
> memory ranges to scan.  The higher the PFN the more time spent
> sequentially spinning through memory ranges.
>
> Since memmap_init_zone() increments pfn, it will almost always be
> looking for the same range as the previous pfn, so check that
> range first.  If it is in the same range, return that nid.
> If not, scan the list as before.
>
> A 4 TB (single rack) UV1 system takes 512 seconds to get through
> the zone code.  This performance optimization reduces the time
> by 189 seconds, a 36% improvement.
>
> A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> a 112.9 second (53%) reduction.

Interesting, but there are only 308 entries in memblock...

Did you try extending memblock_search() to also return the nid?
Something like the attached patch. That should save more time.

>
> Signed-off-by: Russ Anderson 
> ---
>  arch/ia64/mm/numa.c |   15 ++-
>  mm/page_alloc.c |   15 ++-
>  2 files changed, 28 insertions(+), 2 deletions(-)
>
> Index: linux/mm/page_alloc.c
> ===
> --- linux.orig/mm/page_alloc.c  2013-03-19 16:09:03.736450861 -0500
> +++ linux/mm/page_alloc.c   2013-03-22 17:07:43.895405617 -0500
> @@ -4161,10 +4161,23 @@ int __meminit __early_pfn_to_nid(unsigne
>  {
> unsigned long start_pfn, end_pfn;
> int i, nid;
> +   /*
> +  NOTE: The following SMP-unsafe globals are only used early
> +  in boot when the kernel is running single-threaded.
> +*/
> +   static unsigned long last_start_pfn, last_end_pfn;
> +   static int last_nid;
> +
> +   if (last_start_pfn <= pfn && pfn < last_end_pfn)
> +   return last_nid;
>
> for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
> -   if (start_pfn <= pfn && pfn < end_pfn)
> +   if (start_pfn <= pfn && pfn < end_pfn) {
> +   last_start_pfn = start_pfn;
> +   last_end_pfn = end_pfn;
> +   last_nid = nid;
> return nid;
> +   }
> /* This is a memory hole */
> return -1;
>  }
> Index: linux/arch/ia64/mm/numa.c
> ===
> --- linux.orig/arch/ia64/mm/numa.c  2013-02-25 15:49:44.0 -0600
> +++ linux/arch/ia64/mm/numa.c   2013-03-22 16:09:44.662268239 -0500
> @@ -61,13 +61,26 @@ paddr_to_nid(unsigned long paddr)
>  int __meminit __early_pfn_to_nid(unsigned long pfn)
>  {
> int i, section = pfn >> PFN_SECTION_SHIFT, ssec, esec;
> +   /*
> +  NOTE: The following SMP-unsafe globals are only used early
> +  in boot when the kernel is running single-threaded.
> +   */
> +   static unsigned long last_start_pfn, last_end_pfn;

last_ssec, last_esec?  (the cache check below uses those names, but the
declarations above say last_start_pfn/last_end_pfn)


> +   static int last_nid;
> +
> +   if (section >= last_ssec && section < last_esec)
> +   return last_nid;
>
> for (i = 0; i < num_node_memblks; i++) {
> ssec = node_memblk[i].start_paddr >> PA_SECTION_SHIFT;
> esec = (node_memblk[i].start_paddr + node_memblk[i].size +
> ((1L << PA_SECTION_SHIFT) - 1)) >> PA_SECTION_SHIFT;
> -   if (section >= ssec && section < esec)
> +   if (section >= ssec && section < esec) {
> +   last_ssec = ssec;
> +   last_esec = esec;
> +   last_nid = node_memblk[i].nid;
> return node_memblk[i].nid;
> +   }
> }
>
> return -1;
>

Also, it looks like you forgot to put the ia64 maintainers on the To list.

Maybe put the ia64 part in a separate patch?

Thanks

Yinghai


memblock_search_pfn_nid.patch
Description: Binary data


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-23 Thread Russ Anderson
On Fri, Mar 22, 2013 at 08:25:32AM +0100, Ingo Molnar wrote:
> 
> * David Rientjes  wrote:
> 
> > On Thu, 21 Mar 2013, Ingo Molnar wrote:
> > 
> > > > Index: linux/mm/page_alloc.c
> > > > ===
> > > > --- linux.orig/mm/page_alloc.c  2013-03-18 10:52:11.510988843 -0500
> > > > +++ linux/mm/page_alloc.c   2013-03-18 10:52:14.214931348 -0500
> > > > @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
> > > >  {
> > > > unsigned long start_pfn, end_pfn;
> > > > int i, nid;
> > > > +   static unsigned long last_start_pfn, last_end_pfn;
> > > > +   static int last_nid;
> > > 
> > > Please move these globals out of function local scope, to make it more 
> > > apparent that they are not on-stack. I only noticed it in the second pass.
> > 
> > The way they're currently defined places these in .meminit.data as 
> > appropriate; if they are moved out, please make sure to annotate their 
> > definitions with __meminitdata.
> 
> I'm fine with having them within the function as well in this special 
> case, as long as a heavy /* NOTE: ... */ warning is put before them - 
> which explains why these SMP-unsafe globals are safe.
> 
> ( That warning will also act as a visual delimiter that breaks the 
>   normally confusing and misleading 'globals mixed amongst stack 
>   variables' pattern. )

Thanks Ingo.  Here is an updated patch with the heavy warning added.

As for the wrapper function, I was unable to find an obvious
way to add a wrapper without significantly changing both
versions of __early_pfn_to_nid().  It seems cleaner to add
the change in both versions.  I'm sure someone will point 
out if this conclusion is wrong.  :-)




When booting on a large memory system, the kernel spends
considerable time in memmap_init_zone() setting up memory zones.
Analysis shows significant time spent in __early_pfn_to_nid().

The routine memmap_init_zone() checks each PFN to verify the
nid is valid.  __early_pfn_to_nid() sequentially scans the list of
pfn ranges to find the right range and returns the nid.  This does
not scale well.  On a 4 TB (single rack) system there are 308
memory ranges to scan.  The higher the PFN the more time spent
sequentially spinning through memory ranges.

Since memmap_init_zone() increments pfn, it will almost always be
looking for the same range as the previous pfn, so check that
range first.  If it is in the same range, return that nid.
If not, scan the list as before.

A 4 TB (single rack) UV1 system takes 512 seconds to get through
the zone code.  This performance optimization reduces the time
by 189 seconds, a 36% improvement.

A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
a 112.9 second (53%) reduction.

Signed-off-by: Russ Anderson 
---
 arch/ia64/mm/numa.c |   15 ++-
 mm/page_alloc.c |   15 ++-
 2 files changed, 28 insertions(+), 2 deletions(-)

Index: linux/mm/page_alloc.c
===
--- linux.orig/mm/page_alloc.c  2013-03-19 16:09:03.736450861 -0500
+++ linux/mm/page_alloc.c   2013-03-22 17:07:43.895405617 -0500
@@ -4161,10 +4161,23 @@ int __meminit __early_pfn_to_nid(unsigne
 {
unsigned long start_pfn, end_pfn;
int i, nid;
+   /*
+  NOTE: The following SMP-unsafe globals are only used early
+  in boot when the kernel is running single-threaded.
+*/
+   static unsigned long last_start_pfn, last_end_pfn;
+   static int last_nid;
+
+   if (last_start_pfn <= pfn && pfn < last_end_pfn)
+   return last_nid;
 
for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
-   if (start_pfn <= pfn && pfn < end_pfn)
+   if (start_pfn <= pfn && pfn < end_pfn) {
+   last_start_pfn = start_pfn;
+   last_end_pfn = end_pfn;
+   last_nid = nid;
return nid;
+   }
/* This is a memory hole */
return -1;
 }
Index: linux/arch/ia64/mm/numa.c
===
--- linux.orig/arch/ia64/mm/numa.c  2013-02-25 15:49:44.0 -0600
+++ linux/arch/ia64/mm/numa.c   2013-03-22 16:09:44.662268239 -0500
@@ -61,13 +61,26 @@ paddr_to_nid(unsigned long paddr)
 int __meminit __early_pfn_to_nid(unsigned long pfn)
 {
int i, section = pfn >> PFN_SECTION_SHIFT, ssec, esec;
+   /*
+  NOTE: The following SMP-unsafe globals are only used early
+  in boot when the kernel is running single-threaded.
+   */
+   static unsigned long last_start_pfn, last_end_pfn;
+   static int last_nid;
+
+   if (section >= last_ssec && section < last_esec)
+   return last_nid;
 
	for (i = 0; i < num_node_memblks; i++) {
		ssec = node_memblk[i].start_paddr >> PA_SECTION_SHIFT;
		esec = (node_memblk[i].start_paddr + node_memblk[i].size +
		((1L << PA_SECTION_SHIFT) - 1)) >> PA_SECTION_SHIFT;
-		if (section >= ssec && section < esec)
+		if (section >= ssec && section < esec) {
+			last_ssec = ssec;
+			last_esec = esec;
+			last_nid = node_memblk[i].nid;
+			return node_memblk[i].nid;
+		}
	}

	return -1;
 }

Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-22 Thread Ingo Molnar

* David Rientjes  wrote:

> On Thu, 21 Mar 2013, Ingo Molnar wrote:
> 
> > > Index: linux/mm/page_alloc.c
> > > ===
> > > --- linux.orig/mm/page_alloc.c2013-03-18 10:52:11.510988843 -0500
> > > +++ linux/mm/page_alloc.c 2013-03-18 10:52:14.214931348 -0500
> > > @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
> > >  {
> > >   unsigned long start_pfn, end_pfn;
> > >   int i, nid;
> > > + static unsigned long last_start_pfn, last_end_pfn;
> > > + static int last_nid;
> > 
> > Please move these globals out of function local scope, to make it more 
> > apparent that they are not on-stack. I only noticed it in the second pass.
> 
> The way they're currently defined places these in .meminit.data as 
> appropriate; if they are moved out, please make sure to annotate their 
> definitions with __meminitdata.

I'm fine with having them within the function as well in this special 
case, as long as a heavy /* NOTE: ... */ warning is put before them - 
which explains why these SMP-unsafe globals are safe.

( That warning will also act as a visual delimiter that breaks the 
  normally confusing and misleading 'globals mixed amongst stack 
  variables' pattern. )

Thanks,

Ingo


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-21 Thread David Rientjes
On Thu, 21 Mar 2013, Ingo Molnar wrote:

> > Index: linux/mm/page_alloc.c
> > ===
> > --- linux.orig/mm/page_alloc.c  2013-03-18 10:52:11.510988843 -0500
> > +++ linux/mm/page_alloc.c   2013-03-18 10:52:14.214931348 -0500
> > @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
> >  {
> > unsigned long start_pfn, end_pfn;
> > int i, nid;
> > +   static unsigned long last_start_pfn, last_end_pfn;
> > +   static int last_nid;
> 
> Please move these globals out of function local scope, to make it more 
> apparent that they are not on-stack. I only noticed it in the second pass.
> 

The way they're currently defined places these in .meminit.data as 
appropriate; if they are moved out, please make sure to annotate their 
definitions with __meminitdata.


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-21 Thread Ingo Molnar

* Michal Hocko  wrote:

> On Thu 21-03-13 11:55:16, Ingo Molnar wrote:
> > 
> > * Russ Anderson  wrote:
> > 
> > > When booting on a large memory system, the kernel spends
> > > considerable time in memmap_init_zone() setting up memory zones.
> > > Analysis shows significant time spent in __early_pfn_to_nid().
> > > 
> > > The routine memmap_init_zone() checks each PFN to verify the
> > > nid is valid.  __early_pfn_to_nid() sequentially scans the list of
> > > pfn ranges to find the right range and returns the nid.  This does
> > > not scale well.  On a 4 TB (single rack) system there are 308
> > > memory ranges to scan.  The higher the PFN the more time spent
> > > sequentially spinning through memory ranges.
> > > 
> > > Since memmap_init_zone() increments pfn, it will almost always be
> > > looking for the same range as the previous pfn, so check that
> > > range first.  If it is in the same range, return that nid.
> > > If not, scan the list as before.
> > > 
> > > A 4 TB (single rack) UV1 system takes 512 seconds to get through
> > > the zone code.  This performance optimization reduces the time
> > > by 189 seconds, a 36% improvement.
> > > 
> > > A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> > > a 112.9 second (53%) reduction.
> > 
> > Nice speedup!
> > 
> > A minor nit, in addition to Andrew's suggestion about wrapping 
> > __early_pfn_to_nid():
> > 
> > > Index: linux/mm/page_alloc.c
> > > ===
> > > --- linux.orig/mm/page_alloc.c2013-03-18 10:52:11.510988843 -0500
> > > +++ linux/mm/page_alloc.c 2013-03-18 10:52:14.214931348 -0500
> > > @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
> > >  {
> > >   unsigned long start_pfn, end_pfn;
> > >   int i, nid;
> > > + static unsigned long last_start_pfn, last_end_pfn;
> > > + static int last_nid;
> > 
> > Please move these globals out of function local scope, to make it more 
> > apparent that they are not on-stack. I only noticed it in the second pass.
> 
> Wouldn't this just add more confusion with other _pfn variables? (e.g. 
> {min,max}_low_pfn and others)

I don't think so.

> IMO the local scope is more obvious as this is, and should only be, used 
> for caching purposes.

It's a pattern we actively avoid in kernel code.

Thanks,

Ingo


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-21 Thread Michal Hocko
On Thu 21-03-13 11:55:16, Ingo Molnar wrote:
> 
> * Russ Anderson  wrote:
> 
> > When booting on a large memory system, the kernel spends
> > considerable time in memmap_init_zone() setting up memory zones.
> > Analysis shows significant time spent in __early_pfn_to_nid().
> > 
> > The routine memmap_init_zone() checks each PFN to verify the
> > nid is valid.  __early_pfn_to_nid() sequentially scans the list of
> > pfn ranges to find the right range and returns the nid.  This does
> > not scale well.  On a 4 TB (single rack) system there are 308
> > memory ranges to scan.  The higher the PFN the more time spent
> > sequentially spinning through memory ranges.
> > 
> > Since memmap_init_zone() increments pfn, it will almost always be
> > looking for the same range as the previous pfn, so check that
> > range first.  If it is in the same range, return that nid.
> > If not, scan the list as before.
> > 
> > A 4 TB (single rack) UV1 system takes 512 seconds to get through
> > the zone code.  This performance optimization reduces the time
> > by 189 seconds, a 36% improvement.
> > 
> > A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> > a 112.9 second (53%) reduction.
> 
> Nice speedup!
> 
> A minor nit, in addition to Andrew's suggestion about wrapping 
> __early_pfn_to_nid():
> 
> > Index: linux/mm/page_alloc.c
> > ===
> > --- linux.orig/mm/page_alloc.c  2013-03-18 10:52:11.510988843 -0500
> > +++ linux/mm/page_alloc.c   2013-03-18 10:52:14.214931348 -0500
> > @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
> >  {
> > unsigned long start_pfn, end_pfn;
> > int i, nid;
> > +   static unsigned long last_start_pfn, last_end_pfn;
> > +   static int last_nid;
> 
> Please move these globals out of function local scope, to make it more 
> apparent that they are not on-stack. I only noticed it in the second pass.

Wouldn't this just add more confusion with other _pfn variables? (e.g.
{min,max}_low_pfn and others)

IMO the local scope is more obvious as this is, and should only be, used
for caching purposes.
-- 
Michal Hocko
SUSE Labs


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-21 Thread Ingo Molnar

* Russ Anderson  wrote:

> When booting on a large memory system, the kernel spends
> considerable time in memmap_init_zone() setting up memory zones.
> Analysis shows significant time spent in __early_pfn_to_nid().
> 
> The routine memmap_init_zone() checks each PFN to verify the
> nid is valid.  __early_pfn_to_nid() sequentially scans the list of
> pfn ranges to find the right range and returns the nid.  This does
> not scale well.  On a 4 TB (single rack) system there are 308
> memory ranges to scan.  The higher the PFN the more time spent
> sequentially spinning through memory ranges.
> 
> Since memmap_init_zone() increments pfn, it will almost always be
> looking for the same range as the previous pfn, so check that
> range first.  If it is in the same range, return that nid.
> If not, scan the list as before.
> 
> A 4 TB (single rack) UV1 system takes 512 seconds to get through
> the zone code.  This performance optimization reduces the time
> by 189 seconds, a 36% improvement.
> 
> A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> a 112.9 second (53%) reduction.

Nice speedup!

A minor nit, in addition to Andrew's suggestion about wrapping 
__early_pfn_to_nid():

> Index: linux/mm/page_alloc.c
> ===
> --- linux.orig/mm/page_alloc.c2013-03-18 10:52:11.510988843 -0500
> +++ linux/mm/page_alloc.c 2013-03-18 10:52:14.214931348 -0500
> @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
>  {
>   unsigned long start_pfn, end_pfn;
>   int i, nid;
> + static unsigned long last_start_pfn, last_end_pfn;
> + static int last_nid;

Please move these globals out of function local scope, to make it more 
apparent that they are not on-stack. I only noticed it in the second pass.

Thanks,

Ingo


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-20 Thread Andrew Morton
On Mon, 18 Mar 2013 10:56:19 -0500 Russ Anderson  wrote:

> When booting on a large memory system, the kernel spends
> considerable time in memmap_init_zone() setting up memory zones.
> Analysis shows significant time spent in __early_pfn_to_nid().
> 
> The routine memmap_init_zone() checks each PFN to verify the
> nid is valid.  __early_pfn_to_nid() sequentially scans the list of
> pfn ranges to find the right range and returns the nid.  This does
> not scale well.  On a 4 TB (single rack) system there are 308
> memory ranges to scan.  The higher the PFN the more time spent
> sequentially spinning through memory ranges.
> 
> Since memmap_init_zone() increments pfn, it will almost always be
> looking for the same range as the previous pfn, so check that
> range first.  If it is in the same range, return that nid.
> If not, scan the list as before.
> 
> A 4 TB (single rack) UV1 system takes 512 seconds to get through
> the zone code.  This performance optimization reduces the time
> by 189 seconds, a 36% improvement.
> 
> A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> a 112.9 second (53%) reduction.
> 
> ...
>
> --- linux.orig/mm/page_alloc.c2013-03-18 10:52:11.510988843 -0500
> +++ linux/mm/page_alloc.c 2013-03-18 10:52:14.214931348 -0500
> @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
>  {
>   unsigned long start_pfn, end_pfn;
>   int i, nid;
> + static unsigned long last_start_pfn, last_end_pfn;
> + static int last_nid;
> +
> + if (last_start_pfn <= pfn && pfn < last_end_pfn)
> + return last_nid;
>  
>   for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
> - if (start_pfn <= pfn && pfn < end_pfn)
> + if (start_pfn <= pfn && pfn < end_pfn) {
> + last_nid = nid;
> + last_start_pfn = start_pfn;
> + last_end_pfn = end_pfn;
>   return nid;
> + }
>   /* This is a memory hole */
>   return -1;

lol.  And yes, it seems pretty safe to assume that the kernel is
running single-threaded at this time.

arch/ia64/mm/numa.c's __early_pfn_to_nid might benefit from the same
treatment.  In fact if this had been implemented as a caching wrapper
around an unchanged __early_pfn_to_nid(), no ia64 edits would be
needed?
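
For illustration, such a wrapper might look like the sketch below.  The 
catch -- which Russ notes in his follow-up -- is that the arch hook has 
to report the matched range back to the wrapper, so 
__early_pfn_to_nid_range() here is a hypothetical, changed interface 
rather than the existing per-arch __early_pfn_to_nid():

	int __meminit early_pfn_to_nid(unsigned long pfn)
	{
		/*
		 * NOTE: SMP-unsafe cache, only used early in boot when
		 * the kernel is running single-threaded.
		 */
		static unsigned long last_start_pfn, last_end_pfn;
		static int last_nid;

		if (last_start_pfn <= pfn && pfn < last_end_pfn)
			return last_nid;

		/*
		 * Hypothetical per-arch hook: returns the nid and fills
		 * in the matched pfn range (assumed to leave the range
		 * empty for a memory hole, so holes are never cached).
		 */
		last_nid = __early_pfn_to_nid_range(pfn, &last_start_pfn,
						    &last_end_pfn);
		return last_nid;
	}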


Re: [patch] mm: speedup in __early_pfn_to_nid

2013-03-18 Thread David Rientjes
On Mon, 18 Mar 2013, Russ Anderson wrote:

> When booting on a large memory system, the kernel spends
> considerable time in memmap_init_zone() setting up memory zones.
> Analysis shows significant time spent in __early_pfn_to_nid().
> 
> The routine memmap_init_zone() checks each PFN to verify the
> nid is valid.  __early_pfn_to_nid() sequentially scans the list of
> pfn ranges to find the right range and returns the nid.  This does
> not scale well.  On a 4 TB (single rack) system there are 308
> memory ranges to scan.  The higher the PFN the more time spent
> sequentially spinning through memory ranges.
> 
> Since memmap_init_zone() increments pfn, it will almost always be
> looking for the same range as the previous pfn, so check that
> range first.  If it is in the same range, return that nid.
> If not, scan the list as before.
> 
> A 4 TB (single rack) UV1 system takes 512 seconds to get through
> the zone code.  This performance optimization reduces the time
> by 189 seconds, a 36% improvement.
> 
> A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> a 112.9 second (53%) reduction.
> 
> Signed-off-by: Russ Anderson 

Acked-by: David Rientjes 

Very nice improvement!