subject:"\[PATCH\] Fix sparsemem on Cell"

Re: [PATCH] Fix sparsemem on Cell (take 3)

2007-01-07 Thread Tim Pepper


On 1/7/07, Dave Hansen <[EMAIL PROTECTED]> wrote:

On Fri, 2007-01-05 at 22:52 -0600, John Rose wrote:
> Could this break ia64, given that it uses memmap_init_zone()?

You are right, I think it does.


Boot tested OK on ia64 with this latest version of the patch.

(forgot to click plain text on gmail the first time..sorry if you got
html mail or repeat)


Tim
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell (take 3)

2007-01-07 Thread Arnd Bergmann

On Sunday 07 January 2007 09:58, Dave Hansen wrote:
> The following patch fixes an oops experienced on the Cell architecture
> when init-time functions, early_*(), are called at runtime.  It alters
> the call paths to make sure that the callers explicitly say whether the
> call is being made on behalf of a hotplug even, or happening at
> boot-time. 
> 
> It has been compile tested on ia64, s390, i386 and x86_64.

I can't test it here, since I'm travelling at the moment, but
this version looks good to me. Thanks for picking it up again!

> Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>

Acked-by: Arnd Bergmann <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell (take 3)

2007-01-07 Thread Dave Hansen

On Fri, 2007-01-05 at 22:52 -0600, John Rose wrote:
> > I dropped this on the floor over Christmas.  This has had a few smoke
> > tests on ppc64 and i386 and is ready for -mm.  Against 2.6.20-rc2-mm1.
> 
> Could this break ia64, given that it uses memmap_init_zone()?

You are right, I think it does.

Here's an updated patch to replace the earlier one.  I had to move the
enum definition over to a different header because ia64 evidently has a
different include order.

---

The following patch fixes an oops experienced on the Cell architecture
when init-time functions, early_*(), are called at runtime.  It alters
the call paths to make sure that the callers explicitly say whether the
call is being made on behalf of a hotplug even, or happening at
boot-time. 

It has been compile tested on ia64, s390, i386 and x86_64.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>

---

 lxc-dave/arch/ia64/mm/init.c|5 +++--
 lxc-dave/arch/s390/mm/vmem.c|3 ++-
 lxc-dave/include/linux/mm.h |3 ++-
 lxc-dave/include/linux/mmzone.h |8 ++--
 lxc-dave/mm/memory_hotplug.c|6 --
 lxc-dave/mm/page_alloc.c|   25 +
 6 files changed, 34 insertions(+), 16 deletions(-)

diff -puN arch/s390/mm/vmem.c~Re-_PATCH_Fix_sparsemem_on_Cell 
arch/s390/mm/vmem.c
--- lxc/arch/s390/mm/vmem.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 
15:38:23.0 -0800
+++ lxc-dave/arch/s390/mm/vmem.c2007-01-07 00:47:02.0 -0800
@@ -61,7 +61,8 @@ void memmap_init(unsigned long size, int
 
if (map_start < map_end)
memmap_init_zone((unsigned long)(map_end - map_start),
-nid, zone, page_to_pfn(map_start));
+nid, zone, page_to_pfn(map_start),
+MEMMAP_EARLY);
}
 }
 
diff -puN include/linux/mm.h~Re-_PATCH_Fix_sparsemem_on_Cell include/linux/mm.h
--- lxc/include/linux/mm.h~Re-_PATCH_Fix_sparsemem_on_Cell  2007-01-05 
15:38:23.0 -0800
+++ lxc-dave/include/linux/mm.h 2007-01-06 23:57:59.0 -0800
@@ -979,7 +979,8 @@ extern int early_pfn_to_nid(unsigned lon
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long);
+extern void memmap_init_zone(unsigned long, int, unsigned long,
+   unsigned long, enum memmap_context);
 extern void setup_per_zone_pages_min(void);
 extern void mem_init(void);
 extern void show_mem(void);
diff -puN include/linux/mmzone.h~Re-_PATCH_Fix_sparsemem_on_Cell 
include/linux/mmzone.h
--- lxc/include/linux/mmzone.h~Re-_PATCH_Fix_sparsemem_on_Cell  2007-01-05 
15:38:23.0 -0800
+++ lxc-dave/include/linux/mmzone.h 2007-01-06 23:58:15.0 -0800
@@ -471,9 +471,13 @@ void build_all_zonelists(void);
 void wakeup_kswapd(struct zone *zone, int order);
 int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
int classzone_idx, int alloc_flags);
-
+enum memmap_context {
+   MEMMAP_EARLY,
+   MEMMAP_HOTPLUG,
+};
 extern int init_currently_empty_zone(struct zone *zone, unsigned long 
start_pfn,
-unsigned long size);
+unsigned long size,
+enum memmap_context context);
 
 #ifdef CONFIG_HAVE_MEMORY_PRESENT
 void memory_present(int nid, unsigned long start, unsigned long end);
diff -puN mm/memory_hotplug.c~Re-_PATCH_Fix_sparsemem_on_Cell 
mm/memory_hotplug.c
--- lxc/mm/memory_hotplug.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 
15:38:23.0 -0800
+++ lxc-dave/mm/memory_hotplug.c2007-01-05 15:38:23.0 -0800
@@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone,
zone_type = zone - pgdat->node_zones;
if (!populated_zone(zone)) {
int ret = 0;
-   ret = init_currently_empty_zone(zone, phys_start_pfn, nr_pages);
+   ret = init_currently_empty_zone(zone, phys_start_pfn,
+   nr_pages, MEMMAP_HOTPLUG);
if (ret < 0)
return ret;
}
-   memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn);
+   memmap_init_zone(nr_pages, nid, zone_type,
+phys_start_pfn, MEMMAP_HOTPLUG);
return 0;
 }
 
diff -puN mm/page_alloc.c~Re-_PATCH_Fix_sparsemem_on_Cell mm/page_alloc.c
--- lxc/mm/page_alloc.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 
15:38:23.0 -0800
+++ lxc-dave/mm/page_alloc.c2007-01-07 00:35:27.0 -0800
@@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b
  * done. Non-atomic initialization, single-pass.
  */
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long 
zone,
-

Re: [PATCH] Fix sparsemem on Cell (take 3)

2007-01-07 Thread Dave Hansen

On Fri, 2007-01-05 at 22:52 -0600, John Rose wrote:
  I dropped this on the floor over Christmas.  This has had a few smoke
  tests on ppc64 and i386 and is ready for -mm.  Against 2.6.20-rc2-mm1.
 
 Could this break ia64, given that it uses memmap_init_zone()?

You are right, I think it does.

Here's an updated patch to replace the earlier one.  I had to move the
enum definition over to a different header because ia64 evidently has a
different include order.

---

The following patch fixes an oops experienced on the Cell architecture
when init-time functions, early_*(), are called at runtime.  It alters
the call paths to make sure that the callers explicitly say whether the
call is being made on behalf of a hotplug even, or happening at
boot-time. 

It has been compile tested on ia64, s390, i386 and x86_64.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]

---

 lxc-dave/arch/ia64/mm/init.c|5 +++--
 lxc-dave/arch/s390/mm/vmem.c|3 ++-
 lxc-dave/include/linux/mm.h |3 ++-
 lxc-dave/include/linux/mmzone.h |8 ++--
 lxc-dave/mm/memory_hotplug.c|6 --
 lxc-dave/mm/page_alloc.c|   25 +
 6 files changed, 34 insertions(+), 16 deletions(-)

diff -puN arch/s390/mm/vmem.c~Re-_PATCH_Fix_sparsemem_on_Cell 
arch/s390/mm/vmem.c
--- lxc/arch/s390/mm/vmem.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 
15:38:23.0 -0800
+++ lxc-dave/arch/s390/mm/vmem.c2007-01-07 00:47:02.0 -0800
@@ -61,7 +61,8 @@ void memmap_init(unsigned long size, int
 
if (map_start  map_end)
memmap_init_zone((unsigned long)(map_end - map_start),
-nid, zone, page_to_pfn(map_start));
+nid, zone, page_to_pfn(map_start),
+MEMMAP_EARLY);
}
 }
 
diff -puN include/linux/mm.h~Re-_PATCH_Fix_sparsemem_on_Cell include/linux/mm.h
--- lxc/include/linux/mm.h~Re-_PATCH_Fix_sparsemem_on_Cell  2007-01-05 
15:38:23.0 -0800
+++ lxc-dave/include/linux/mm.h 2007-01-06 23:57:59.0 -0800
@@ -979,7 +979,8 @@ extern int early_pfn_to_nid(unsigned lon
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long);
+extern void memmap_init_zone(unsigned long, int, unsigned long,
+   unsigned long, enum memmap_context);
 extern void setup_per_zone_pages_min(void);
 extern void mem_init(void);
 extern void show_mem(void);
diff -puN include/linux/mmzone.h~Re-_PATCH_Fix_sparsemem_on_Cell 
include/linux/mmzone.h
--- lxc/include/linux/mmzone.h~Re-_PATCH_Fix_sparsemem_on_Cell  2007-01-05 
15:38:23.0 -0800
+++ lxc-dave/include/linux/mmzone.h 2007-01-06 23:58:15.0 -0800
@@ -471,9 +471,13 @@ void build_all_zonelists(void);
 void wakeup_kswapd(struct zone *zone, int order);
 int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
int classzone_idx, int alloc_flags);
-
+enum memmap_context {
+   MEMMAP_EARLY,
+   MEMMAP_HOTPLUG,
+};
 extern int init_currently_empty_zone(struct zone *zone, unsigned long 
start_pfn,
-unsigned long size);
+unsigned long size,
+enum memmap_context context);
 
 #ifdef CONFIG_HAVE_MEMORY_PRESENT
 void memory_present(int nid, unsigned long start, unsigned long end);
diff -puN mm/memory_hotplug.c~Re-_PATCH_Fix_sparsemem_on_Cell 
mm/memory_hotplug.c
--- lxc/mm/memory_hotplug.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 
15:38:23.0 -0800
+++ lxc-dave/mm/memory_hotplug.c2007-01-05 15:38:23.0 -0800
@@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone,
zone_type = zone - pgdat-node_zones;
if (!populated_zone(zone)) {
int ret = 0;
-   ret = init_currently_empty_zone(zone, phys_start_pfn, nr_pages);
+   ret = init_currently_empty_zone(zone, phys_start_pfn,
+   nr_pages, MEMMAP_HOTPLUG);
if (ret  0)
return ret;
}
-   memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn);
+   memmap_init_zone(nr_pages, nid, zone_type,
+phys_start_pfn, MEMMAP_HOTPLUG);
return 0;
 }
 
diff -puN mm/page_alloc.c~Re-_PATCH_Fix_sparsemem_on_Cell mm/page_alloc.c
--- lxc/mm/page_alloc.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 
15:38:23.0 -0800
+++ lxc-dave/mm/page_alloc.c2007-01-07 00:35:27.0 -0800
@@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b
  * done. Non-atomic initialization, single-pass.
  */
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long 
zone,
-

Re: [PATCH] Fix sparsemem on Cell (take 3)

2007-01-07 Thread Arnd Bergmann

On Sunday 07 January 2007 09:58, Dave Hansen wrote:
 The following patch fixes an oops experienced on the Cell architecture
 when init-time functions, early_*(), are called at runtime.  It alters
 the call paths to make sure that the callers explicitly say whether the
 call is being made on behalf of a hotplug even, or happening at
 boot-time. 
 
 It has been compile tested on ia64, s390, i386 and x86_64.

I can't test it here, since I'm travelling at the moment, but
this version looks good to me. Thanks for picking it up again!

 Signed-off-by: Dave Hansen [EMAIL PROTECTED]

Acked-by: Arnd Bergmann [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell (take 3)

2007-01-07 Thread Tim Pepper


On 1/7/07, Dave Hansen [EMAIL PROTECTED] wrote:

On Fri, 2007-01-05 at 22:52 -0600, John Rose wrote:
 Could this break ia64, given that it uses memmap_init_zone()?

You are right, I think it does.


Boot tested OK on ia64 with this latest version of the patch.

(forgot to click plain text on gmail the first time..sorry if you got
html mail or repeat)


Tim
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell (take 3)

2007-01-05 Thread John Rose

> I dropped this on the floor over Christmas.  This has had a few smoke
> tests on ppc64 and i386 and is ready for -mm.  Against 2.6.20-rc2-mm1.

Could this break ia64, given that it uses memmap_init_zone()?


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Fix sparsemem on Cell (take 3)

2007-01-05 Thread Dave Hansen

I dropped this on the floor over Christmas.  This has had a few smoke
tests on ppc64 and i386 and is ready for -mm.  Against 2.6.20-rc2-mm1.

The following patch fixes an oops experienced on the Cell architecture
when init-time functions, early_*(), are called at runtime.  It alters
the call paths to make sure that the callers explicitly say whether the
call is being made on behalf of a hotplug even, or happening at
boot-time. 

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>

---

 lxc-dave/arch/s390/mm/vmem.c|3 ++-
 lxc-dave/include/linux/mm.h |7 ++-
 lxc-dave/include/linux/mmzone.h |3 ++-
 lxc-dave/mm/memory_hotplug.c|6 --
 lxc-dave/mm/page_alloc.c|   25 +
 5 files changed, 31 insertions(+), 13 deletions(-)

diff -puN mm/page_alloc.c~sparsemem-enum1 mm/page_alloc.c
--- lxc/mm/page_alloc.c~sparsemem-enum1 2006-12-19 09:38:34.0 -0800
+++ lxc-dave/mm/page_alloc.c2006-12-19 11:18:24.0 -0800
@@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b
  * done. Non-atomic initialization, single-pass.
  */
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long 
zone,
-   unsigned long start_pfn)
+   unsigned long start_pfn, enum memmap_context context)
 {
struct page *page;
unsigned long end_pfn = start_pfn + size;
unsigned long pfn;
 
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
-   continue;
+   /*
+* There can be holes in boot-time mem_map[]s
+* handed to this function.  They do not
+* exist on hotplugged memory.
+*/
+   if (context == MEMMAP_EARLY) {
+   if (!early_pfn_valid(pfn))
+   continue;
+   if (!early_pfn_in_nid(pfn, nid))
+   continue;
+   }
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
@@ -2102,7 +2109,7 @@ void zone_init_free_lists(struct pglist_
 
 #ifndef __HAVE_ARCH_MEMMAP_INIT
 #define memmap_init(size, nid, zone, start_pfn) \
-   memmap_init_zone((size), (nid), (zone), (start_pfn))
+   memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
 #endif
 
 static int __cpuinit zone_batchsize(struct zone *zone)
@@ -2348,7 +2355,8 @@ static __meminit void zone_pcp_init(stru
 
 __meminit int init_currently_empty_zone(struct zone *zone,
unsigned long zone_start_pfn,
-   unsigned long size)
+   unsigned long size,
+   enum memmap_context context)
 {
struct pglist_data *pgdat = zone->zone_pgdat;
int ret;
@@ -2792,7 +2800,8 @@ static void __meminit free_area_init_cor
if (!size)
continue;
 
-   ret = init_currently_empty_zone(zone, zone_start_pfn, size);
+   ret = init_currently_empty_zone(zone, zone_start_pfn,
+   size, MEMMAP_EARLY);
BUG_ON(ret);
zone_start_pfn += size;
}
diff -puN include/linux/mm.h~sparsemem-enum1 include/linux/mm.h
--- lxc/include/linux/mm.h~sparsemem-enum1  2006-12-19 09:38:45.0 
-0800
+++ lxc-dave/include/linux/mm.h 2006-12-19 09:50:47.0 -0800
@@ -979,7 +979,12 @@ extern int early_pfn_to_nid(unsigned lon
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long);
+enum memmap_context {
+   MEMMAP_EARLY,
+   MEMMAP_HOTPLUG,
+};
+extern void memmap_init_zone(unsigned long, int, unsigned long,
+   unsigned long, enum memmap_context);
 extern void setup_per_zone_pages_min(void);
 extern void mem_init(void);
 extern void show_mem(void);
diff -puN mm/memory_hotplug.c~sparsemem-enum1 mm/memory_hotplug.c
--- lxc/mm/memory_hotplug.c~sparsemem-enum1 2006-12-19 09:39:19.0 
-0800
+++ lxc-dave/mm/memory_hotplug.c2006-12-19 09:50:24.0 -0800
@@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone,
zone_type = zone - pgdat->node_zones;
if (!populated_zone(zone)) {
int ret = 0;
-   ret = init_currently_empty_zone(zone, phys_start_pfn, nr_pages);
+   ret = init_currently_empty_zone(zone, phys_start_pfn,
+   nr_pages, MEMMAP_HOTPLUG);
if (ret < 0)
return ret;
}
-

[PATCH] Fix sparsemem on Cell (take 3)

2007-01-05 Thread Dave Hansen

I dropped this on the floor over Christmas.  This has had a few smoke
tests on ppc64 and i386 and is ready for -mm.  Against 2.6.20-rc2-mm1.

The following patch fixes an oops experienced on the Cell architecture
when init-time functions, early_*(), are called at runtime.  It alters
the call paths to make sure that the callers explicitly say whether the
call is being made on behalf of a hotplug even, or happening at
boot-time. 

Signed-off-by: Dave Hansen [EMAIL PROTECTED]

---

 lxc-dave/arch/s390/mm/vmem.c|3 ++-
 lxc-dave/include/linux/mm.h |7 ++-
 lxc-dave/include/linux/mmzone.h |3 ++-
 lxc-dave/mm/memory_hotplug.c|6 --
 lxc-dave/mm/page_alloc.c|   25 +
 5 files changed, 31 insertions(+), 13 deletions(-)

diff -puN mm/page_alloc.c~sparsemem-enum1 mm/page_alloc.c
--- lxc/mm/page_alloc.c~sparsemem-enum1 2006-12-19 09:38:34.0 -0800
+++ lxc-dave/mm/page_alloc.c2006-12-19 11:18:24.0 -0800
@@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b
  * done. Non-atomic initialization, single-pass.
  */
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long 
zone,
-   unsigned long start_pfn)
+   unsigned long start_pfn, enum memmap_context context)
 {
struct page *page;
unsigned long end_pfn = start_pfn + size;
unsigned long pfn;
 
for (pfn = start_pfn; pfn  end_pfn; pfn++) {
-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
-   continue;
+   /*
+* There can be holes in boot-time mem_map[]s
+* handed to this function.  They do not
+* exist on hotplugged memory.
+*/
+   if (context == MEMMAP_EARLY) {
+   if (!early_pfn_valid(pfn))
+   continue;
+   if (!early_pfn_in_nid(pfn, nid))
+   continue;
+   }
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
@@ -2102,7 +2109,7 @@ void zone_init_free_lists(struct pglist_
 
 #ifndef __HAVE_ARCH_MEMMAP_INIT
 #define memmap_init(size, nid, zone, start_pfn) \
-   memmap_init_zone((size), (nid), (zone), (start_pfn))
+   memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
 #endif
 
 static int __cpuinit zone_batchsize(struct zone *zone)
@@ -2348,7 +2355,8 @@ static __meminit void zone_pcp_init(stru
 
 __meminit int init_currently_empty_zone(struct zone *zone,
unsigned long zone_start_pfn,
-   unsigned long size)
+   unsigned long size,
+   enum memmap_context context)
 {
struct pglist_data *pgdat = zone-zone_pgdat;
int ret;
@@ -2792,7 +2800,8 @@ static void __meminit free_area_init_cor
if (!size)
continue;
 
-   ret = init_currently_empty_zone(zone, zone_start_pfn, size);
+   ret = init_currently_empty_zone(zone, zone_start_pfn,
+   size, MEMMAP_EARLY);
BUG_ON(ret);
zone_start_pfn += size;
}
diff -puN include/linux/mm.h~sparsemem-enum1 include/linux/mm.h
--- lxc/include/linux/mm.h~sparsemem-enum1  2006-12-19 09:38:45.0 
-0800
+++ lxc-dave/include/linux/mm.h 2006-12-19 09:50:47.0 -0800
@@ -979,7 +979,12 @@ extern int early_pfn_to_nid(unsigned lon
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long);
+enum memmap_context {
+   MEMMAP_EARLY,
+   MEMMAP_HOTPLUG,
+};
+extern void memmap_init_zone(unsigned long, int, unsigned long,
+   unsigned long, enum memmap_context);
 extern void setup_per_zone_pages_min(void);
 extern void mem_init(void);
 extern void show_mem(void);
diff -puN mm/memory_hotplug.c~sparsemem-enum1 mm/memory_hotplug.c
--- lxc/mm/memory_hotplug.c~sparsemem-enum1 2006-12-19 09:39:19.0 
-0800
+++ lxc-dave/mm/memory_hotplug.c2006-12-19 09:50:24.0 -0800
@@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone,
zone_type = zone - pgdat-node_zones;
if (!populated_zone(zone)) {
int ret = 0;
-   ret = init_currently_empty_zone(zone, phys_start_pfn, nr_pages);
+   ret = init_currently_empty_zone(zone, phys_start_pfn,
+   nr_pages, MEMMAP_HOTPLUG);
if (ret  0)
return ret;
}
-

Re: [PATCH] Fix sparsemem on Cell (take 3)

2007-01-05 Thread John Rose

 I dropped this on the floor over Christmas.  This has had a few smoke
 tests on ppc64 and i386 and is ready for -mm.  Against 2.6.20-rc2-mm1.

Could this break ia64, given that it uses memmap_init_zone()?


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-19 Thread Dave Hansen

Pulling cbe... list off the cc because it is giving annoying 'too many
recipients' warnings.

On Tue, 2006-12-19 at 09:59 +0100, Arnd Bergmann wrote:
> On Tuesday 19 December 2006 00:16, Dave Hansen wrote:
> > How about an enum, or a pair of #defines?
> > 
> > enum context
> > {
> > EARLY,
> > HOTPLUG
> > };

This patch should do the trick.  Arnd, can you try this to make sure it
solves the issue?  Also, if you get a chance, could you do a quick s390
boot of it?  It was the only arch touched by this whole thing.

It compiles on all of my i386 .configs.

--

The following patch fixes an oops experienced on the Cell architecture
when init-time functions, early_*(), are called at runtime.  It alters
the call paths to make sure that the callers explicitly say whether the
call is being made on behalf of a hotplug even, or happening at
boot-time. 

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>

---

 lxc-dave/arch/s390/mm/vmem.c|3 ++-
 lxc-dave/include/linux/mm.h |7 ++-
 lxc-dave/include/linux/mmzone.h |3 ++-
 lxc-dave/mm/memory_hotplug.c|6 --
 lxc-dave/mm/page_alloc.c|   25 +
 5 files changed, 31 insertions(+), 13 deletions(-)

diff -puN mm/page_alloc.c~sparsemem-enum1 mm/page_alloc.c
--- lxc/mm/page_alloc.c~sparsemem-enum1 2006-12-19 09:38:34.0 -0800
+++ lxc-dave/mm/page_alloc.c2006-12-19 11:18:24.0 -0800
@@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b
  * done. Non-atomic initialization, single-pass.
  */
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long 
zone,
-   unsigned long start_pfn)
+   unsigned long start_pfn, enum memmap_context context)
 {
struct page *page;
unsigned long end_pfn = start_pfn + size;
unsigned long pfn;
 
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
-   continue;
+   /*
+* There can be holes in boot-time mem_map[]s
+* handed to this function.  They do not
+* exist on hotplugged memory.
+*/
+   if (context == MEMMAP_EARLY) {
+   if (!early_pfn_valid(pfn))
+   continue;
+   if (!early_pfn_in_nid(pfn, nid))
+   continue;
+   }
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
@@ -2102,7 +2109,7 @@ void zone_init_free_lists(struct pglist_
 
 #ifndef __HAVE_ARCH_MEMMAP_INIT
 #define memmap_init(size, nid, zone, start_pfn) \
-   memmap_init_zone((size), (nid), (zone), (start_pfn))
+   memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
 #endif
 
 static int __cpuinit zone_batchsize(struct zone *zone)
@@ -2348,7 +2355,8 @@ static __meminit void zone_pcp_init(stru
 
 __meminit int init_currently_empty_zone(struct zone *zone,
unsigned long zone_start_pfn,
-   unsigned long size)
+   unsigned long size,
+   enum memmap_context context)
 {
struct pglist_data *pgdat = zone->zone_pgdat;
int ret;
@@ -2792,7 +2800,8 @@ static void __meminit free_area_init_cor
if (!size)
continue;
 
-   ret = init_currently_empty_zone(zone, zone_start_pfn, size);
+   ret = init_currently_empty_zone(zone, zone_start_pfn,
+   size, MEMMAP_EARLY);
BUG_ON(ret);
zone_start_pfn += size;
}
diff -puN include/linux/mm.h~sparsemem-enum1 include/linux/mm.h
--- lxc/include/linux/mm.h~sparsemem-enum1  2006-12-19 09:38:45.0 
-0800
+++ lxc-dave/include/linux/mm.h 2006-12-19 09:50:47.0 -0800
@@ -979,7 +979,12 @@ extern int early_pfn_to_nid(unsigned lon
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long);
+enum memmap_context {
+   MEMMAP_EARLY,
+   MEMMAP_HOTPLUG,
+};
+extern void memmap_init_zone(unsigned long, int, unsigned long,
+   unsigned long, enum memmap_context);
 extern void setup_per_zone_pages_min(void);
 extern void mem_init(void);
 extern void show_mem(void);
diff -puN mm/memory_hotplug.c~sparsemem-enum1 mm/memory_hotplug.c
--- lxc/mm/memory_hotplug.c~sparsemem-enum1 2006-12-19 09:39:19.0 
-0800
+++ lxc-dave/mm/memory_hotplug.c2006-12-19 09:50:24.0 -0800
@@ -67,11 +67,13 @@ static int __add_zone(struct zone

Re: [PATCH] Fix sparsemem on Cell

2006-12-19 Thread Arnd Bergmann

On Tuesday 19 December 2006 00:16, Dave Hansen wrote:
> How about an enum, or a pair of #defines?
> 
> enum context
> {
>         EARLY,
>         HOTPLUG
> };

Sounds good, but since this is in a global header file, it needs
to be in an appropriate name space, like

enum memmap_context {
MEMMAP_EARLY,
MEMMAP_HOTPLUG,
};

Arnd <><
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-19 Thread Arnd Bergmann

On Tuesday 19 December 2006 00:16, Dave Hansen wrote:
 How about an enum, or a pair of #defines?
 
 enum context
 {
         EARLY,
         HOTPLUG
 };

Sounds good, but since this is in a global header file, it needs
to be in an appropriate name space, like

enum memmap_context {
MEMMAP_EARLY,
MEMMAP_HOTPLUG,
};

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-19 Thread Dave Hansen

Pulling cbe... list off the cc because it is giving annoying 'too many
recipients' warnings.

On Tue, 2006-12-19 at 09:59 +0100, Arnd Bergmann wrote:
 On Tuesday 19 December 2006 00:16, Dave Hansen wrote:
  How about an enum, or a pair of #defines?
  
  enum context
  {
  EARLY,
  HOTPLUG
  };

This patch should do the trick.  Arnd, can you try this to make sure it
solves the issue?  Also, if you get a chance, could you do a quick s390
boot of it?  It was the only arch touched by this whole thing.

It compiles on all of my i386 .configs.

--

The following patch fixes an oops experienced on the Cell architecture
when init-time functions, early_*(), are called at runtime.  It alters
the call paths to make sure that the callers explicitly say whether the
call is being made on behalf of a hotplug even, or happening at
boot-time. 

Signed-off-by: Dave Hansen [EMAIL PROTECTED]

---

 lxc-dave/arch/s390/mm/vmem.c|3 ++-
 lxc-dave/include/linux/mm.h |7 ++-
 lxc-dave/include/linux/mmzone.h |3 ++-
 lxc-dave/mm/memory_hotplug.c|6 --
 lxc-dave/mm/page_alloc.c|   25 +
 5 files changed, 31 insertions(+), 13 deletions(-)

diff -puN mm/page_alloc.c~sparsemem-enum1 mm/page_alloc.c
--- lxc/mm/page_alloc.c~sparsemem-enum1 2006-12-19 09:38:34.0 -0800
+++ lxc-dave/mm/page_alloc.c2006-12-19 11:18:24.0 -0800
@@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b
  * done. Non-atomic initialization, single-pass.
  */
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long 
zone,
-   unsigned long start_pfn)
+   unsigned long start_pfn, enum memmap_context context)
 {
struct page *page;
unsigned long end_pfn = start_pfn + size;
unsigned long pfn;
 
for (pfn = start_pfn; pfn  end_pfn; pfn++) {
-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
-   continue;
+   /*
+* There can be holes in boot-time mem_map[]s
+* handed to this function.  They do not
+* exist on hotplugged memory.
+*/
+   if (context == MEMMAP_EARLY) {
+   if (!early_pfn_valid(pfn))
+   continue;
+   if (!early_pfn_in_nid(pfn, nid))
+   continue;
+   }
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
@@ -2102,7 +2109,7 @@ void zone_init_free_lists(struct pglist_
 
 #ifndef __HAVE_ARCH_MEMMAP_INIT
 #define memmap_init(size, nid, zone, start_pfn) \
-   memmap_init_zone((size), (nid), (zone), (start_pfn))
+   memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
 #endif
 
 static int __cpuinit zone_batchsize(struct zone *zone)
@@ -2348,7 +2355,8 @@ static __meminit void zone_pcp_init(stru
 
 __meminit int init_currently_empty_zone(struct zone *zone,
unsigned long zone_start_pfn,
-   unsigned long size)
+   unsigned long size,
+   enum memmap_context context)
 {
struct pglist_data *pgdat = zone-zone_pgdat;
int ret;
@@ -2792,7 +2800,8 @@ static void __meminit free_area_init_cor
if (!size)
continue;
 
-   ret = init_currently_empty_zone(zone, zone_start_pfn, size);
+   ret = init_currently_empty_zone(zone, zone_start_pfn,
+   size, MEMMAP_EARLY);
BUG_ON(ret);
zone_start_pfn += size;
}
diff -puN include/linux/mm.h~sparsemem-enum1 include/linux/mm.h
--- lxc/include/linux/mm.h~sparsemem-enum1  2006-12-19 09:38:45.0 
-0800
+++ lxc-dave/include/linux/mm.h 2006-12-19 09:50:47.0 -0800
@@ -979,7 +979,12 @@ extern int early_pfn_to_nid(unsigned lon
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long);
+enum memmap_context {
+   MEMMAP_EARLY,
+   MEMMAP_HOTPLUG,
+};
+extern void memmap_init_zone(unsigned long, int, unsigned long,
+   unsigned long, enum memmap_context);
 extern void setup_per_zone_pages_min(void);
 extern void mem_init(void);
 extern void show_mem(void);
diff -puN mm/memory_hotplug.c~sparsemem-enum1 mm/memory_hotplug.c
--- lxc/mm/memory_hotplug.c~sparsemem-enum1 2006-12-19 09:39:19.0 
-0800
+++ lxc-dave/mm/memory_hotplug.c2006-12-19 09:50:24.0 -0800
@@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone,

Re: [PATCH] Fix sparsemem on Cell

2006-12-18 Thread KAMEZAWA Hiroyuki

On Mon, 18 Dec 2006 15:16:20 -0800
Dave Hansen <[EMAIL PROTECTED]> wrote:

> enum context
> {
> EARLY,
> HOTPLUG
> };
I like this :)

Thanks,
-Kame

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-18 Thread Dave Hansen

On Mon, 2006-12-18 at 23:54 +0100, Arnd Bergmann wrote:
>  #ifndef __HAVE_ARCH_MEMMAP_INIT
>  #define memmap_init(size, nid, zone, start_pfn) \
> - memmap_init_zone((size), (nid), (zone), (start_pfn))
> + memmap_init_zone((size), (nid), (zone), (start_pfn), 1)
>  #endif

This is what I was thinking of.  Sometimes I find these kinds of calls a
bit annoying:

foo(0, 1, 1, 0, 99, 22)

It only takes a minute to look up what all of the numbers do, but that
is one minute too many. :)

How about an enum, or a pair of #defines?

enum context
{
EARLY,
HOTPLUG
};
extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
enum call_context);
...

So, the call I quoted above would become:

memmap_init_zone((size), (nid), (zone), (start_pfn), EARLY)

-- Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-18 Thread Arnd Bergmann

On Saturday 16 December 2006 09:03, KAMEZAWA Hiroyuki wrote:
> @@ -273,10 +284,13 @@
> if (ret)
> goto error;
> }
> +   atomic_inc(_hotadd_count);
>  
> /* call arch's memory hotadd */
> ret = arch_add_memory(nid, start, size);
>  
> +   atomic_dec(_hotadd_count);
> +
> if (ret < 0)
> goto error;
>  

This also doesn't fix the problem on cell, since the time when the bug
happens, we're not calling through this function, or arch_add_memory,
at all, but rather invoke __add_pages directly. As BenH already mentioned,
we shouldn't do really call __add_pages at all.

Let me attempt another fix that might address all cases. This is completely
untested as of now, but also addresses Dave's latest comment.

Arnd <><

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 1a3d8a2..723d220 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -543,7 +543,7 @@ virtual_memmap_init (u64 start, u64 end, void *arg)
 
if (map_start < map_end)
memmap_init_zone((unsigned long)(map_end - map_start),
-args->nid, args->zone, page_to_pfn(map_start));
+args->nid, args->zone, page_to_pfn(map_start), 
1);
return 0;
 }
 
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 7f2944d..1e52cd1 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -61,7 +61,7 @@ void memmap_init(unsigned long size, int nid, unsigned long 
zone,
 
if (map_start < map_end)
memmap_init_zone((unsigned long)(map_end - map_start),
-nid, zone, page_to_pfn(map_start));
+nid, zone, page_to_pfn(map_start), 1);
}
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a17b147..6d85068 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -978,7 +978,7 @@ extern int early_pfn_to_nid(unsigned long pfn);
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long);
+extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long, 
int);
 extern void setup_per_zone_pages_min(void);
 extern void mem_init(void);
 extern void show_mem(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0c055a0..16c9930 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -71,7 +71,7 @@ static int __add_zone(struct zone *zone, unsigned long 
phys_start_pfn)
if (ret < 0)
return ret;
}
-   memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn);
+   memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn, 0);
return 0;
 }
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8c1a116..60d1ac8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1953,17 +1953,19 @@ static inline unsigned long wait_table_bits(unsigned 
long size)
  * done. Non-atomic initialization, single-pass.
  */
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long 
zone,
-   unsigned long start_pfn)
+   unsigned long start_pfn, int early)
 {
struct page *page;
unsigned long end_pfn = start_pfn + size;
unsigned long pfn;
 
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
-   continue;
+   if (early) {
+   if (!early_pfn_valid(pfn))
+   continue;
+   if (!early_pfn_in_nid(pfn, nid))
+   continue;
+   }
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
@@ -1990,7 +1992,7 @@ void zone_init_free_lists(struct pglist_data *pgdat, 
struct zone *zone,
 
 #ifndef __HAVE_ARCH_MEMMAP_INIT
 #define memmap_init(size, nid, zone, start_pfn) \
-   memmap_init_zone((size), (nid), (zone), (start_pfn))
+   memmap_init_zone((size), (nid), (zone), (start_pfn), 1)
 #endif
 
 static int __cpuinit zone_batchsize(struct zone *zone)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-18 Thread Christoph Hellwig

On Mon, Dec 18, 2006 at 01:13:57PM -0800, Dave Hansen wrote:
> On Sat, 2006-12-16 at 17:03 +0900, KAMEZAWA Hiroyuki wrote:
> >  /* add this memory to iomem resource */
> >  static struct resource *register_memory_resource(u64 start, u64 size)
> >  {
> > @@ -273,10 +284,13 @@
> > if (ret)
> > goto error;
> > }
> > +   atomic_inc(_hotadd_count);
> >  
> > /* call arch's memory hotadd */
> > ret = arch_add_memory(nid, start, size);
> >  
> > +   atomic_dec(_hotadd_count);
> 
> I'd be willing to be that this will work just fine.  But, I think we can
> do it without any static state at all, if we just pass a runtime-or-not
> flag down into the arch_add_memory() call chain.
> 
> I'll code that up so we can compare to yours.  

Yes, I stronly concur that passing an explicit flag is much much better
than any hack involving global state.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-18 Thread Dave Hansen

On Sat, 2006-12-16 at 17:03 +0900, KAMEZAWA Hiroyuki wrote:
>  /* add this memory to iomem resource */
>  static struct resource *register_memory_resource(u64 start, u64 size)
>  {
> @@ -273,10 +284,13 @@
>   if (ret)
>   goto error;
>   }
> + atomic_inc(_hotadd_count);
>  
>   /* call arch's memory hotadd */
>   ret = arch_add_memory(nid, start, size);
>  
> + atomic_dec(_hotadd_count);

I'd be willing to be that this will work just fine.  But, I think we can
do it without any static state at all, if we just pass a runtime-or-not
flag down into the arch_add_memory() call chain.

I'll code that up so we can compare to yours.  

-- Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-18 Thread Dave Hansen

On Sat, 2006-12-16 at 17:03 +0900, KAMEZAWA Hiroyuki wrote:
  /* add this memory to iomem resource */
  static struct resource *register_memory_resource(u64 start, u64 size)
  {
 @@ -273,10 +284,13 @@
   if (ret)
   goto error;
   }
 + atomic_inc(memory_hotadd_count);
  
   /* call arch's memory hotadd */
   ret = arch_add_memory(nid, start, size);
  
 + atomic_dec(memory_hotadd_count);

I'd be willing to be that this will work just fine.  But, I think we can
do it without any static state at all, if we just pass a runtime-or-not
flag down into the arch_add_memory() call chain.

I'll code that up so we can compare to yours.  

-- Dave

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-18 Thread Christoph Hellwig

On Mon, Dec 18, 2006 at 01:13:57PM -0800, Dave Hansen wrote:
 On Sat, 2006-12-16 at 17:03 +0900, KAMEZAWA Hiroyuki wrote:
   /* add this memory to iomem resource */
   static struct resource *register_memory_resource(u64 start, u64 size)
   {
  @@ -273,10 +284,13 @@
  if (ret)
  goto error;
  }
  +   atomic_inc(memory_hotadd_count);
   
  /* call arch's memory hotadd */
  ret = arch_add_memory(nid, start, size);
   
  +   atomic_dec(memory_hotadd_count);
 
 I'd be willing to be that this will work just fine.  But, I think we can
 do it without any static state at all, if we just pass a runtime-or-not
 flag down into the arch_add_memory() call chain.
 
 I'll code that up so we can compare to yours.  

Yes, I stronly concur that passing an explicit flag is much much better
than any hack involving global state.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-18 Thread Arnd Bergmann

On Saturday 16 December 2006 09:03, KAMEZAWA Hiroyuki wrote:
 @@ -273,10 +284,13 @@
 if (ret)
 goto error;
 }
 +   atomic_inc(memory_hotadd_count);
  
 /* call arch's memory hotadd */
 ret = arch_add_memory(nid, start, size);
  
 +   atomic_dec(memory_hotadd_count);
 +
 if (ret  0)
 goto error;
  

This also doesn't fix the problem on cell, since the time when the bug
happens, we're not calling through this function, or arch_add_memory,
at all, but rather invoke __add_pages directly. As BenH already mentioned,
we shouldn't do really call __add_pages at all.

Let me attempt another fix that might address all cases. This is completely
untested as of now, but also addresses Dave's latest comment.

Arnd 

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 1a3d8a2..723d220 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -543,7 +543,7 @@ virtual_memmap_init (u64 start, u64 end, void *arg)
 
if (map_start  map_end)
memmap_init_zone((unsigned long)(map_end - map_start),
-args-nid, args-zone, page_to_pfn(map_start));
+args-nid, args-zone, page_to_pfn(map_start), 
1);
return 0;
 }
 
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 7f2944d..1e52cd1 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -61,7 +61,7 @@ void memmap_init(unsigned long size, int nid, unsigned long 
zone,
 
if (map_start  map_end)
memmap_init_zone((unsigned long)(map_end - map_start),
-nid, zone, page_to_pfn(map_start));
+nid, zone, page_to_pfn(map_start), 1);
}
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a17b147..6d85068 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -978,7 +978,7 @@ extern int early_pfn_to_nid(unsigned long pfn);
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long);
+extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long, 
int);
 extern void setup_per_zone_pages_min(void);
 extern void mem_init(void);
 extern void show_mem(void);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0c055a0..16c9930 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -71,7 +71,7 @@ static int __add_zone(struct zone *zone, unsigned long 
phys_start_pfn)
if (ret  0)
return ret;
}
-   memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn);
+   memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn, 0);
return 0;
 }
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8c1a116..60d1ac8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1953,17 +1953,19 @@ static inline unsigned long wait_table_bits(unsigned 
long size)
  * done. Non-atomic initialization, single-pass.
  */
 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long 
zone,
-   unsigned long start_pfn)
+   unsigned long start_pfn, int early)
 {
struct page *page;
unsigned long end_pfn = start_pfn + size;
unsigned long pfn;
 
for (pfn = start_pfn; pfn  end_pfn; pfn++) {
-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
-   continue;
+   if (early) {
+   if (!early_pfn_valid(pfn))
+   continue;
+   if (!early_pfn_in_nid(pfn, nid))
+   continue;
+   }
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
@@ -1990,7 +1992,7 @@ void zone_init_free_lists(struct pglist_data *pgdat, 
struct zone *zone,
 
 #ifndef __HAVE_ARCH_MEMMAP_INIT
 #define memmap_init(size, nid, zone, start_pfn) \
-   memmap_init_zone((size), (nid), (zone), (start_pfn))
+   memmap_init_zone((size), (nid), (zone), (start_pfn), 1)
 #endif
 
 static int __cpuinit zone_batchsize(struct zone *zone)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-18 Thread Dave Hansen

On Mon, 2006-12-18 at 23:54 +0100, Arnd Bergmann wrote:
  #ifndef __HAVE_ARCH_MEMMAP_INIT
  #define memmap_init(size, nid, zone, start_pfn) \
 - memmap_init_zone((size), (nid), (zone), (start_pfn))
 + memmap_init_zone((size), (nid), (zone), (start_pfn), 1)
  #endif

This is what I was thinking of.  Sometimes I find these kinds of calls a
bit annoying:

foo(0, 1, 1, 0, 99, 22)

It only takes a minute to look up what all of the numbers do, but that
is one minute too many. :)

How about an enum, or a pair of #defines?

enum context
{
EARLY,
HOTPLUG
};
extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
enum call_context);
...

So, the call I quoted above would become:

memmap_init_zone((size), (nid), (zone), (start_pfn), EARLY)

-- Dave

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-18 Thread KAMEZAWA Hiroyuki

On Mon, 18 Dec 2006 15:16:20 -0800
Dave Hansen [EMAIL PROTECTED] wrote:

 enum context
 {
 EARLY,
 HOTPLUG
 };
I like this :)

Thanks,
-Kame

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-17 Thread Arnd Bergmann

On Friday 15 December 2006 18:14, Dave Hansen wrote:
> +   if (system_state >= SYSTEM_RUNNING)
> +   return 1;
> +   if (!early_pfn_valid(pfn))
> +   return 0;
> +   if (!early_pfn_in_nid(pfn, nid))
> +   return 0;

I haven't tried it, but I assume this is still wrong. On cell,
we didn't actually hit the case where the init sections have
been overwritten, since we call __add_pages from an initcall.

However, the pages we add are not part of the early_node_map,
so early_pfn_in_nid() returns a bogus result, causing some
page structs not to get initialized. I believe your patch
is going in the right direction, but it does not solve the
bug we have...

Arnd <><
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-17 Thread Arnd Bergmann

On Friday 15 December 2006 18:14, Dave Hansen wrote:
 +   if (system_state = SYSTEM_RUNNING)
 +   return 1;
 +   if (!early_pfn_valid(pfn))
 +   return 0;
 +   if (!early_pfn_in_nid(pfn, nid))
 +   return 0;

I haven't tried it, but I assume this is still wrong. On cell,
we didn't actually hit the case where the init sections have
been overwritten, since we call __add_pages from an initcall.

However, the pages we add are not part of the early_node_map,
so early_pfn_in_nid() returns a bogus result, causing some
page structs not to get initialized. I believe your patch
is going in the right direction, but it does not solve the
bug we have...

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-16 Thread KAMEZAWA Hiroyuki

On Fri, 15 Dec 2006 11:45:36 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> Perhaps if the function's role in the world was commented it would be clearer.
> 

How about patch like this ? (this one is not tested.)
Already-exisiting-more-generic-flag is available ?
 
-Kame
==
 include/linux/memory_hotplug.h |8 
 mm/memory_hotplug.c|   14 ++
 mm/page_alloc.c|   11 +++
 3 files changed, 29 insertions(+), 4 deletions(-)

Index: linux-2.6.20-rc1-mm1/include/linux/memory_hotplug.h
===
--- linux-2.6.20-rc1-mm1.orig/include/linux/memory_hotplug.h2006-11-30 
06:57:37.0 +0900
+++ linux-2.6.20-rc1-mm1/include/linux/memory_hotplug.h 2006-12-16 
16:42:40.0 +0900
@@ -133,6 +133,10 @@
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
 
+extern int under_memory_hotadd(void);
+
+
+
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
@@ -159,6 +163,10 @@
dump_stack();
return -ENOSYS;
 }
+static inline int under_memory_hotadd()
+{
+   return 0;
+}
 
 #endif /* ! CONFIG_MEMORY_HOTPLUG */
 static inline int __remove_pages(struct zone *zone, unsigned long start_pfn,
Index: linux-2.6.20-rc1-mm1/mm/memory_hotplug.c
===
--- linux-2.6.20-rc1-mm1.orig/mm/memory_hotplug.c   2006-12-16 
14:24:10.0 +0900
+++ linux-2.6.20-rc1-mm1/mm/memory_hotplug.c2006-12-16 16:51:08.0 
+0900
@@ -26,6 +26,17 @@
 
 #include 
 
+/*
+ * Because memory hotplug shares some codes for initilization with boot,
+ * we sometimes have to check what we are doing ?
+ */
+static atomic_t memory_hotadd_count;
+
+int under_memory_hotadd(void)
+{
+   return atomic_read(_hotadd_count);
+}
+
 /* add this memory to iomem resource */
 static struct resource *register_memory_resource(u64 start, u64 size)
 {
@@ -273,10 +284,13 @@
if (ret)
goto error;
}
+   atomic_inc(_hotadd_count);
 
/* call arch's memory hotadd */
ret = arch_add_memory(nid, start, size);
 
+   atomic_dec(_hotadd_count);
+
if (ret < 0)
goto error;
 
Index: linux-2.6.20-rc1-mm1/mm/page_alloc.c
===
--- linux-2.6.20-rc1-mm1.orig/mm/page_alloc.c   2006-12-16 14:24:38.0 
+0900
+++ linux-2.6.20-rc1-mm1/mm/page_alloc.c2006-12-16 16:53:02.0 
+0900
@@ -2069,10 +2069,13 @@
unsigned long pfn;
 
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
-   continue;
+   if (!under_memory_hotadd()) {
+   /* we are in boot */
+   if (!early_pfn_valid(pfn))
+   continue;
+   if (!early_pfn_in_nid(pfn, nid))
+   continue;
+   }
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
init_page_count(page);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-16 Thread KAMEZAWA Hiroyuki

On Fri, 15 Dec 2006 11:45:36 -0800
Andrew Morton [EMAIL PROTECTED] wrote:

 Perhaps if the function's role in the world was commented it would be clearer.
 

How about patch like this ? (this one is not tested.)
Already-exisiting-more-generic-flag is available ?
 
-Kame
==
 include/linux/memory_hotplug.h |8 
 mm/memory_hotplug.c|   14 ++
 mm/page_alloc.c|   11 +++
 3 files changed, 29 insertions(+), 4 deletions(-)

Index: linux-2.6.20-rc1-mm1/include/linux/memory_hotplug.h
===
--- linux-2.6.20-rc1-mm1.orig/include/linux/memory_hotplug.h2006-11-30 
06:57:37.0 +0900
+++ linux-2.6.20-rc1-mm1/include/linux/memory_hotplug.h 2006-12-16 
16:42:40.0 +0900
@@ -133,6 +133,10 @@
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
 
+extern int under_memory_hotadd(void);
+
+
+
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
@@ -159,6 +163,10 @@
dump_stack();
return -ENOSYS;
 }
+static inline int under_memory_hotadd()
+{
+   return 0;
+}
 
 #endif /* ! CONFIG_MEMORY_HOTPLUG */
 static inline int __remove_pages(struct zone *zone, unsigned long start_pfn,
Index: linux-2.6.20-rc1-mm1/mm/memory_hotplug.c
===
--- linux-2.6.20-rc1-mm1.orig/mm/memory_hotplug.c   2006-12-16 
14:24:10.0 +0900
+++ linux-2.6.20-rc1-mm1/mm/memory_hotplug.c2006-12-16 16:51:08.0 
+0900
@@ -26,6 +26,17 @@
 
 #include asm/tlbflush.h
 
+/*
+ * Because memory hotplug shares some codes for initilization with boot,
+ * we sometimes have to check what we are doing ?
+ */
+static atomic_t memory_hotadd_count;
+
+int under_memory_hotadd(void)
+{
+   return atomic_read(memory_hotadd_count);
+}
+
 /* add this memory to iomem resource */
 static struct resource *register_memory_resource(u64 start, u64 size)
 {
@@ -273,10 +284,13 @@
if (ret)
goto error;
}
+   atomic_inc(memory_hotadd_count);
 
/* call arch's memory hotadd */
ret = arch_add_memory(nid, start, size);
 
+   atomic_dec(memory_hotadd_count);
+
if (ret  0)
goto error;
 
Index: linux-2.6.20-rc1-mm1/mm/page_alloc.c
===
--- linux-2.6.20-rc1-mm1.orig/mm/page_alloc.c   2006-12-16 14:24:38.0 
+0900
+++ linux-2.6.20-rc1-mm1/mm/page_alloc.c2006-12-16 16:53:02.0 
+0900
@@ -2069,10 +2069,13 @@
unsigned long pfn;
 
for (pfn = start_pfn; pfn  end_pfn; pfn++) {
-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
-   continue;
+   if (!under_memory_hotadd()) {
+   /* we are in boot */
+   if (!early_pfn_valid(pfn))
+   continue;
+   if (!early_pfn_in_nid(pfn, nid))
+   continue;
+   }
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
init_page_count(page);

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-15 Thread Benjamin Herrenschmidt


> The only other assumption is that all memory-hotplug-time pages 
> given to memmap_init_zone() are valid and able to be onlined into
> any any zone after the system is running.  The "valid" part is
> really just a question of whether or not a 'struct page' is there
> for the pfn, and *not* whether there is actual memory.  Since
> all sparsemem sections have contiguous mem_map[]s within them,
> and we only memory hotplug entire sparsemem sections, we can
> be confident that this assumption will hold.
> 
> As for the memory being in the right node, we'll assume tha
> memory hotplug is putting things in the right node.

BTW, just that people know, what we are adding isn't even memory :-) We
are calling __add_pages() to create struct page for the SPE local stores
and register space as we use them later from a nopage() handler (and no,
we can't use no_pfn just yet for various reasons, notably we need to
handle races with unmap_mapping_ranges() and thus have the truncate
logic in).

Those pages, thus, must never be onlined. Ever. It might make sense to
create a way to inform memory hotplug of that fact, but on the other
hand, I wouldn't bother as I have a plan to get rid of those
__add_pages() completely and work without struct page, maybe in a 2.6.21
timeframe.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-15 Thread Andrew Morton

On Fri, 15 Dec 2006 09:24:00 -0800
Dave Hansen <[EMAIL PROTECTED]> wrote:

> 
> ...
>
> I think the comments added say it pretty well, but I'll repeat it here.
> 
> This fix is pretty similar in concept to the one that Arnd posted
> as a temporary workaround, but I've added a few comments explaining
> what the actual assumptions are, and improved it a wee little bit.
> 
> The end goal here is to simply avoid calling the early_*() functions
> when it is _not_ early.  Those functions stop working as soon as
> free_initmem() is called.  system_state is set to SYSTEM_RUNNING
> just after free_initmem() is called, so it seems appropriate to use
> here.

Would really prefer not to do this.  system_state is evil.  Its semantics
are poorly-defined and if someone changes them a bit, or changes memory
initialisation order, you get whacked.

I think an mm-private flag with /*documented*/ semantics would be better. 
It's only a byte.

> +static int __meminit can_online_pfn_into_nid(unsigned long pfn, int nid)

I spent some time trying to work out what "can_online_pfn_into_nid" can
possibly mean and failed.  "We can bring a pfn online then turn it into a
NID"?  Don't think so.  "We can bring this page online and allocate it to
this node"?  Maybe.

Perhaps if the function's role in the world was commented it would be clearer.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-15 Thread Dave Hansen

On Fri, 2006-12-15 at 18:22 +0100, Michael Buesch wrote:
> On Friday 15 December 2006 17:53, Dave Hansen wrote:
> >  lxc-dave/init/main.c |4 
> >  lxc-dave/mm/page_alloc.c |   28 +---
> >  2 files changed, 29 insertions(+), 3 deletions(-)
> >
> > diff -puN init/main.c~sparsemem-fix init/main.c
> > --- lxc/init/main.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
> > +++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800
> > @@ -770,6 +770,10 @@ static int init(void * unused)
> > free_initmem();
> > unlock_kernel();
> > mark_rodata_ro();
> > +   /*
> > +* Memory hotplug requires that this system_state transition
> > +* happer after free_initmem().  (see memmap_init_zone())
> 
> s/happer/happens/
> 
> Other than that, can't this possibly race and crash here?
> I mean, it's not a big race window, but it can happen, no?

That's a good point.  Nice eye.

There are three routes in here: boot-time init, an ACPI call, and a
write to a sysfs file.  Bootmem is taken care of.  The write to a sysfs
file can't happen yet because userspace isn't up.

The only question would be about ACPI.  I _guess_ an ACPI event could
come in at any time, and could hit this race window.  

One other thought I had was to add an argument to memmap_init_zone() to
indicate that the memory being fed to it was contiguous and did not need
the validation checks.

Anybody have thoughts on that?

-- Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-15 Thread Andy Whitcroft


Dave Hansen wrote:

I think the comments added say it pretty well, but I'll repeat it here.

This fix is pretty similar in concept to the one that Arnd posted
as a temporary workaround, but I've added a few comments explaining
what the actual assumptions are, and improved it a wee little bit.

The end goal here is to simply avoid calling the early_*() functions
when it is _not_ early.  Those functions stop working as soon as
free_initmem() is called.  system_state is set to SYSTEM_RUNNING
just after free_initmem() is called, so it seems appropriate to use
here.

I did think twice about actually using SYSTEM_RUNNING because we
moved away from it in other parts of memory hotplug, but those
were actually for _allocations_ in favor of slab_is_available(),
and we really don't care about the slab here.

The only other assumption is that all memory-hotplug-time pages 
given to memmap_init_zone() are valid and able to be onlined into

any any zone after the system is running.  The "valid" part is
really just a question of whether or not a 'struct page' is there
for the pfn, and *not* whether there is actual memory.  Since
all sparsemem sections have contiguous mem_map[]s within them,
and we only memory hotplug entire sparsemem sections, we can
be confident that this assumption will hold.


ie. that if hotplug is pushing this memory into a zone on a node it 
really does know what its doing, and its putting it in the right place. 
 Obviously that code needs to be 'overlap' aware but thats ok for this 
interface.




As for the memory being in the right node, we'll assume tha
memory hotplug is putting things in the right node.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>

---

 lxc-dave/init/main.c |4 
 lxc-dave/mm/page_alloc.c |   28 +---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff -puN init/main.c~sparsemem-fix init/main.c
--- lxc/init/main.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800
@@ -770,6 +770,10 @@ static int init(void * unused)
free_initmem();
unlock_kernel();
mark_rodata_ro();
+   /*
+* Memory hotplug requires that this system_state transition
+* happer after free_initmem().  (see memmap_init_zone())
+*/


typo: happen


system_state = SYSTEM_RUNNING;
numa_default_policy();
 
diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c

--- lxc/mm/page_alloc.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800
@@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
 
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
 
+static int can_online_pfn_into_nid(unsigned long pfn, int nid)

+{


__meminit ?


+   /*
+* There are two things that make this work:
+* 1. The early_pfn...() functions are __init and
+*use __initdata.  If the system is < SYSTEM_RUNNING,
+*those functions and their data will still exist.
+* 2. We also assume that all actual memory hotplug
+*(as opposed to boot-time) calls to this are only
+*for contiguous memory regions.  With sparsemem,
+*this guaranteed is easy because all sections are
+*contiguous and we never online more than one
+*section at a time.  Boot-time memory can have holes
+*anywhere.
+*/
+   if (system_state >= SYSTEM_RUNNING)
+   return 1;


Is there any value in codifying the assumption here, as a safety net?

if (system_state >= SYSTEM_RUNNING)
#ifdef CONFIG_SPARSEMEM
return 1;
#else
return 0;
#endif


+   if (!early_pfn_valid(pfn))
+   return 0;
+   if (!early_pfn_in_nid(pfn, nid))
+   return 0;
+   return 1;
+}
+
 /*
  * Initially all pages are reserved - free ones are freed
  * up by free_all_bootmem() once the early boot process is
@@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
unsigned long pfn;
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {

-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
+   if (!can_online_pfn_into_nid(pfn))


We're not passing nid here?


continue;
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
_


Looks like this would do the job to me.

-apw
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-15 Thread Dave Hansen

On Fri, 2006-12-15 at 17:11 +, Andy Whitcroft wrote:
> ie. that if hotplug is pushing this memory into a zone on a node it
> really does know what its doing, and its putting it in the right place.
>   Obviously that code needs to be 'overlap' aware but thats ok for this
> interface.

I'm not sure if this, especially if we're only doing one section at a
time.

> > system_state = SYSTEM_RUNNING;
> > numa_default_policy();
> >
> > diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
> > --- lxc/mm/page_alloc.c~sparsemem-fix   2006-12-15 08:49:53.0 
> > -0800
> > +++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800
> > @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
> >
> >  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
> >
> > +static int can_online_pfn_into_nid(unsigned long pfn, int nid)
> > +{
> 
> __meminit ?

Yup.  I'll get that.

> > +   /*
> > +* There are two things that make this work:
> > +* 1. The early_pfn...() functions are __init and
> > +*use __initdata.  If the system is < SYSTEM_RUNNING,
> > +*those functions and their data will still exist.
> > +* 2. We also assume that all actual memory hotplug
> > +*(as opposed to boot-time) calls to this are only
> > +*for contiguous memory regions.  With sparsemem,
> > +*this guaranteed is easy because all sections are
> > +*contiguous and we never online more than one
> > +*section at a time.  Boot-time memory can have holes
> > +*anywhere.
> > +*/
> > +   if (system_state >= SYSTEM_RUNNING)
> > +   return 1;
> 
> Is there any value in codifying the assumption here, as a safety net?
> 
> if (system_state >= SYSTEM_RUNNING)
> #ifdef CONFIG_SPARSEMEM
>   return 1;
> #else
>   return 0;
> #endif

Dunno.  The normal case is that it isn't even called without memory
hotplug.  The only non-sparsemem memory hotplug is Keith's baby, and
they lay out all of their mem_map at boot-time anyway, so I don't think
this gets used.

Keith, do you know whether you even do memmap_init_zone() at runtime,
and if you ever have holes if/when you do?

> > +   if (!early_pfn_valid(pfn))
> > +   return 0;
> > +   if (!early_pfn_in_nid(pfn, nid))
> > +   return 0;
> > +   return 1;
> > +}
> > +
> >  /*
> >   * Initially all pages are reserved - free ones are freed
> >   * up by free_all_bootmem() once the early boot process is
> > @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
> > unsigned long pfn;
> >
> > for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> > -   if (!early_pfn_valid(pfn))
> > -   continue;
> > -   if (!early_pfn_in_nid(pfn, nid))
> > +   if (!can_online_pfn_into_nid(pfn))
> 
> We're not passing nid here?

No, because my ppc64 cross compiler has magically broken.  :(

Fixed patch appended.

-- Dave

I think the comments added say it pretty well, but I'll repeat it here.

This fix is pretty similar in concept to the one that Arnd posted
as a temporary workaround, but I've added a few comments explaining
what the actual assumptions are, and improved it a wee little bit.

The end goal here is to simply avoid calling the early_*() functions
when it is _not_ early.  Those functions stop working as soon as
free_initmem() is called.  system_state is set to SYSTEM_RUNNING
just after free_initmem() is called, so it seems appropriate to use
here.

I did think twice about actually using SYSTEM_RUNNING because we
moved away from it in other parts of memory hotplug, but those
were actually for _allocations_ in favor of slab_is_available(),
and we really don't care about the slab here.

The only other assumption is that all memory-hotplug-time pages 
given to memmap_init_zone() are valid and able to be onlined into
any any zone after the system is running.  The "valid" part is
really just a question of whether or not a 'struct page' is there
for the pfn, and *not* whether there is actual memory.  Since
all sparsemem sections have contiguous mem_map[]s within them,
and we only memory hotplug entire sparsemem sections, we can
be confident that this assumption will hold.

As for the memory being in the right node, we'll assume tha
memory hotplug is putting things in the right node.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>

---

 lxc-dave/init/main.c |4 
 lxc-dave/mm/page_alloc.c |   28 +---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff -puN init/main.c~sparsemem-fix init/main.c
--- lxc/init/main.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800
@@ -770,6 +770,10 @@ static int init(void * unused)
free_initmem();
unlock_kernel();
mark_rodata_ro();
+   /*
+* Memory hotplug requires that this system_state transition
+* happen after free_initmem().  (see

Re: [PATCH] Fix sparsemem on Cell

2006-12-15 Thread Michael Buesch

On Friday 15 December 2006 17:53, Dave Hansen wrote:
> 
> I think the comments added say it pretty well, but I'll repeat it here.
> 
> This fix is pretty similar in concept to the one that Arnd posted
> as a temporary workaround, but I've added a few comments explaining
> what the actual assumptions are, and improved it a wee little bit.
> 
> The end goal here is to simply avoid calling the early_*() functions
> when it is _not_ early.  Those functions stop working as soon as
> free_initmem() is called.  system_state is set to SYSTEM_RUNNING
> just after free_initmem() is called, so it seems appropriate to use
> here.
> 
> I did think twice about actually using SYSTEM_RUNNING because we
> moved away from it in other parts of memory hotplug, but those
> were actually for _allocations_ in favor of slab_is_available(),
> and we really don't care about the slab here.
> 
> The only other assumption is that all memory-hotplug-time pages 
> given to memmap_init_zone() are valid and able to be onlined into
> any any zone after the system is running.  The "valid" part is
> really just a question of whether or not a 'struct page' is there
> for the pfn, and *not* whether there is actual memory.  Since
> all sparsemem sections have contiguous mem_map[]s within them,
> and we only memory hotplug entire sparsemem sections, we can
> be confident that this assumption will hold.
> 
> As for the memory being in the right node, we'll assume tha
> memory hotplug is putting things in the right node.
> 
> Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
> 
> ---
> 
>  lxc-dave/init/main.c |4 
>  lxc-dave/mm/page_alloc.c |   28 +---
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff -puN init/main.c~sparsemem-fix init/main.c
> --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800
> +++ lxc-dave/init/main.c  2006-12-15 08:49:53.0 -0800
> @@ -770,6 +770,10 @@ static int init(void * unused)
>   free_initmem();
>   unlock_kernel();
>   mark_rodata_ro();
> + /*
> +  * Memory hotplug requires that this system_state transition
> +  * happer after free_initmem().  (see memmap_init_zone())

s/happer/happens/

Other than that, can't this possibly race and crash here?
I mean, it's not a big race window, but it can happen, no?

> +  */
>   system_state = SYSTEM_RUNNING;
>   numa_default_policy();
>  
> diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
> --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800
> +++ lxc-dave/mm/page_alloc.c  2006-12-15 08:49:53.0 -0800
> @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
>  
>  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
>  
> +static int can_online_pfn_into_nid(unsigned long pfn, int nid)
> +{
> + /*
> +  * There are two things that make this work:
> +  * 1. The early_pfn...() functions are __init and
> +  *use __initdata.  If the system is < SYSTEM_RUNNING,
> +  *those functions and their data will still exist.
> +  * 2. We also assume that all actual memory hotplug
> +  *(as opposed to boot-time) calls to this are only
> +  *for contiguous memory regions.  With sparsemem,
> +  *this guaranteed is easy because all sections are
> +  *contiguous and we never online more than one
> +  *section at a time.  Boot-time memory can have holes
> +  *anywhere.
> +  */
> + if (system_state >= SYSTEM_RUNNING)
> + return 1;
> + if (!early_pfn_valid(pfn))
> + return 0;
> + if (!early_pfn_in_nid(pfn, nid))
> + return 0;
> + return 1;
> +}
> +
>  /*
>   * Initially all pages are reserved - free ones are freed
>   * up by free_all_bootmem() once the early boot process is
> @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
>   unsigned long pfn;
>  
>   for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> - if (!early_pfn_valid(pfn))
> - continue;
> - if (!early_pfn_in_nid(pfn, nid))
> + if (!can_online_pfn_into_nid(pfn))
>   continue;
>   page = pfn_to_page(pfn);
>   set_page_links(page, zone, nid, pfn);
> _
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 

-- 
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Fix sparsemem on Cell

2006-12-15 Thread Dave Hansen


I think the comments added say it pretty well, but I'll repeat it here.

This fix is pretty similar in concept to the one that Arnd posted
as a temporary workaround, but I've added a few comments explaining
what the actual assumptions are, and improved it a wee little bit.

The end goal here is to simply avoid calling the early_*() functions
when it is _not_ early.  Those functions stop working as soon as
free_initmem() is called.  system_state is set to SYSTEM_RUNNING
just after free_initmem() is called, so it seems appropriate to use
here.

I did think twice about actually using SYSTEM_RUNNING because we
moved away from it in other parts of memory hotplug, but those
were actually for _allocations_ in favor of slab_is_available(),
and we really don't care about the slab here.

The only other assumption is that all memory-hotplug-time pages 
given to memmap_init_zone() are valid and able to be onlined into
any any zone after the system is running.  The "valid" part is
really just a question of whether or not a 'struct page' is there
for the pfn, and *not* whether there is actual memory.  Since
all sparsemem sections have contiguous mem_map[]s within them,
and we only memory hotplug entire sparsemem sections, we can
be confident that this assumption will hold.

As for the memory being in the right node, we'll assume tha
memory hotplug is putting things in the right node.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>

---

 lxc-dave/init/main.c |4 
 lxc-dave/mm/page_alloc.c |   28 +---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff -puN init/main.c~sparsemem-fix init/main.c
--- lxc/init/main.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800
@@ -770,6 +770,10 @@ static int init(void * unused)
free_initmem();
unlock_kernel();
mark_rodata_ro();
+   /*
+* Memory hotplug requires that this system_state transition
+* happer after free_initmem().  (see memmap_init_zone())
+*/
system_state = SYSTEM_RUNNING;
numa_default_policy();
 
diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
--- lxc/mm/page_alloc.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800
@@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
 
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
 
+static int can_online_pfn_into_nid(unsigned long pfn, int nid)
+{
+   /*
+* There are two things that make this work:
+* 1. The early_pfn...() functions are __init and
+*use __initdata.  If the system is < SYSTEM_RUNNING,
+*those functions and their data will still exist.
+* 2. We also assume that all actual memory hotplug
+*(as opposed to boot-time) calls to this are only
+*for contiguous memory regions.  With sparsemem,
+*this guaranteed is easy because all sections are
+*contiguous and we never online more than one
+*section at a time.  Boot-time memory can have holes
+*anywhere.
+*/
+   if (system_state >= SYSTEM_RUNNING)
+   return 1;
+   if (!early_pfn_valid(pfn))
+   return 0;
+   if (!early_pfn_in_nid(pfn, nid))
+   return 0;
+   return 1;
+}
+
 /*
  * Initially all pages are reserved - free ones are freed
  * up by free_all_bootmem() once the early boot process is
@@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
unsigned long pfn;
 
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
+   if (!can_online_pfn_into_nid(pfn))
continue;
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
_
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Fix sparsemem on Cell

2006-12-15 Thread Dave Hansen


I think the comments added say it pretty well, but I'll repeat it here.

This fix is pretty similar in concept to the one that Arnd posted
as a temporary workaround, but I've added a few comments explaining
what the actual assumptions are, and improved it a wee little bit.

The end goal here is to simply avoid calling the early_*() functions
when it is _not_ early.  Those functions stop working as soon as
free_initmem() is called.  system_state is set to SYSTEM_RUNNING
just after free_initmem() is called, so it seems appropriate to use
here.

I did think twice about actually using SYSTEM_RUNNING because we
moved away from it in other parts of memory hotplug, but those
were actually for _allocations_ in favor of slab_is_available(),
and we really don't care about the slab here.

The only other assumption is that all memory-hotplug-time pages 
given to memmap_init_zone() are valid and able to be onlined into
any any zone after the system is running.  The "valid" part is
really just a question of whether or not a 'struct page' is there
for the pfn, and *not* whether there is actual memory.  Since
all sparsemem sections have contiguous mem_map[]s within them,
and we only memory hotplug entire sparsemem sections, we can
be confident that this assumption will hold.

As for the memory being in the right node, we'll assume tha
memory hotplug is putting things in the right node.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>

---

 lxc-dave/init/main.c |4 
 lxc-dave/mm/page_alloc.c |   28 +---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff -puN init/main.c~sparsemem-fix init/main.c
--- lxc/init/main.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800
@@ -770,6 +770,10 @@ static int init(void * unused)
free_initmem();
unlock_kernel();
mark_rodata_ro();
+   /*
+* Memory hotplug requires that this system_state transition
+* happer after free_initmem().  (see memmap_init_zone())
+*/
system_state = SYSTEM_RUNNING;
numa_default_policy();
 
diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
--- lxc/mm/page_alloc.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800
@@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
 
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
 
+static int can_online_pfn_into_nid(unsigned long pfn, int nid)
+{
+   /*
+* There are two things that make this work:
+* 1. The early_pfn...() functions are __init and
+*use __initdata.  If the system is < SYSTEM_RUNNING,
+*those functions and their data will still exist.
+* 2. We also assume that all actual memory hotplug
+*(as opposed to boot-time) calls to this are only
+*for contiguous memory regions.  With sparsemem,
+*this guaranteed is easy because all sections are
+*contiguous and we never online more than one
+*section at a time.  Boot-time memory can have holes
+*anywhere.
+*/
+   if (system_state >= SYSTEM_RUNNING)
+   return 1;
+   if (!early_pfn_valid(pfn))
+   return 0;
+   if (!early_pfn_in_nid(pfn, nid))
+   return 0;
+   return 1;
+}
+
 /*
  * Initially all pages are reserved - free ones are freed
  * up by free_all_bootmem() once the early boot process is
@@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
unsigned long pfn;
 
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
+   if (!can_online_pfn_into_nid(pfn))
continue;
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
_
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Fix sparsemem on Cell

2006-12-15 Thread Dave Hansen


I think the comments added say it pretty well, but I'll repeat it here.

This fix is pretty similar in concept to the one that Arnd posted
as a temporary workaround, but I've added a few comments explaining
what the actual assumptions are, and improved it a wee little bit.

The end goal here is to simply avoid calling the early_*() functions
when it is _not_ early.  Those functions stop working as soon as
free_initmem() is called.  system_state is set to SYSTEM_RUNNING
just after free_initmem() is called, so it seems appropriate to use
here.

I did think twice about actually using SYSTEM_RUNNING because we
moved away from it in other parts of memory hotplug, but those
were actually for _allocations_ in favor of slab_is_available(),
and we really don't care about the slab here.

The only other assumption is that all memory-hotplug-time pages 
given to memmap_init_zone() are valid and able to be onlined into
any any zone after the system is running.  The valid part is
really just a question of whether or not a 'struct page' is there
for the pfn, and *not* whether there is actual memory.  Since
all sparsemem sections have contiguous mem_map[]s within them,
and we only memory hotplug entire sparsemem sections, we can
be confident that this assumption will hold.

As for the memory being in the right node, we'll assume tha
memory hotplug is putting things in the right node.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]

---

 lxc-dave/init/main.c |4 
 lxc-dave/mm/page_alloc.c |   28 +---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff -puN init/main.c~sparsemem-fix init/main.c
--- lxc/init/main.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800
@@ -770,6 +770,10 @@ static int init(void * unused)
free_initmem();
unlock_kernel();
mark_rodata_ro();
+   /*
+* Memory hotplug requires that this system_state transition
+* happer after free_initmem().  (see memmap_init_zone())
+*/
system_state = SYSTEM_RUNNING;
numa_default_policy();
 
diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
--- lxc/mm/page_alloc.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800
@@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
 
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)~((sizeof(long))-1))
 
+static int can_online_pfn_into_nid(unsigned long pfn, int nid)
+{
+   /*
+* There are two things that make this work:
+* 1. The early_pfn...() functions are __init and
+*use __initdata.  If the system is  SYSTEM_RUNNING,
+*those functions and their data will still exist.
+* 2. We also assume that all actual memory hotplug
+*(as opposed to boot-time) calls to this are only
+*for contiguous memory regions.  With sparsemem,
+*this guaranteed is easy because all sections are
+*contiguous and we never online more than one
+*section at a time.  Boot-time memory can have holes
+*anywhere.
+*/
+   if (system_state = SYSTEM_RUNNING)
+   return 1;
+   if (!early_pfn_valid(pfn))
+   return 0;
+   if (!early_pfn_in_nid(pfn, nid))
+   return 0;
+   return 1;
+}
+
 /*
  * Initially all pages are reserved - free ones are freed
  * up by free_all_bootmem() once the early boot process is
@@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
unsigned long pfn;
 
for (pfn = start_pfn; pfn  end_pfn; pfn++) {
-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
+   if (!can_online_pfn_into_nid(pfn))
continue;
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
_
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Fix sparsemem on Cell

2006-12-15 Thread Dave Hansen


I think the comments added say it pretty well, but I'll repeat it here.

This fix is pretty similar in concept to the one that Arnd posted
as a temporary workaround, but I've added a few comments explaining
what the actual assumptions are, and improved it a wee little bit.

The end goal here is to simply avoid calling the early_*() functions
when it is _not_ early.  Those functions stop working as soon as
free_initmem() is called.  system_state is set to SYSTEM_RUNNING
just after free_initmem() is called, so it seems appropriate to use
here.

I did think twice about actually using SYSTEM_RUNNING because we
moved away from it in other parts of memory hotplug, but those
were actually for _allocations_ in favor of slab_is_available(),
and we really don't care about the slab here.

The only other assumption is that all memory-hotplug-time pages 
given to memmap_init_zone() are valid and able to be onlined into
any any zone after the system is running.  The valid part is
really just a question of whether or not a 'struct page' is there
for the pfn, and *not* whether there is actual memory.  Since
all sparsemem sections have contiguous mem_map[]s within them,
and we only memory hotplug entire sparsemem sections, we can
be confident that this assumption will hold.

As for the memory being in the right node, we'll assume tha
memory hotplug is putting things in the right node.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]

---

 lxc-dave/init/main.c |4 
 lxc-dave/mm/page_alloc.c |   28 +---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff -puN init/main.c~sparsemem-fix init/main.c
--- lxc/init/main.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800
@@ -770,6 +770,10 @@ static int init(void * unused)
free_initmem();
unlock_kernel();
mark_rodata_ro();
+   /*
+* Memory hotplug requires that this system_state transition
+* happer after free_initmem().  (see memmap_init_zone())
+*/
system_state = SYSTEM_RUNNING;
numa_default_policy();
 
diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
--- lxc/mm/page_alloc.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800
@@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
 
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)~((sizeof(long))-1))
 
+static int can_online_pfn_into_nid(unsigned long pfn, int nid)
+{
+   /*
+* There are two things that make this work:
+* 1. The early_pfn...() functions are __init and
+*use __initdata.  If the system is  SYSTEM_RUNNING,
+*those functions and their data will still exist.
+* 2. We also assume that all actual memory hotplug
+*(as opposed to boot-time) calls to this are only
+*for contiguous memory regions.  With sparsemem,
+*this guaranteed is easy because all sections are
+*contiguous and we never online more than one
+*section at a time.  Boot-time memory can have holes
+*anywhere.
+*/
+   if (system_state = SYSTEM_RUNNING)
+   return 1;
+   if (!early_pfn_valid(pfn))
+   return 0;
+   if (!early_pfn_in_nid(pfn, nid))
+   return 0;
+   return 1;
+}
+
 /*
  * Initially all pages are reserved - free ones are freed
  * up by free_all_bootmem() once the early boot process is
@@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
unsigned long pfn;
 
for (pfn = start_pfn; pfn  end_pfn; pfn++) {
-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
+   if (!can_online_pfn_into_nid(pfn))
continue;
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
_
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-15 Thread Michael Buesch

On Friday 15 December 2006 17:53, Dave Hansen wrote:
 
 I think the comments added say it pretty well, but I'll repeat it here.
 
 This fix is pretty similar in concept to the one that Arnd posted
 as a temporary workaround, but I've added a few comments explaining
 what the actual assumptions are, and improved it a wee little bit.
 
 The end goal here is to simply avoid calling the early_*() functions
 when it is _not_ early.  Those functions stop working as soon as
 free_initmem() is called.  system_state is set to SYSTEM_RUNNING
 just after free_initmem() is called, so it seems appropriate to use
 here.
 
 I did think twice about actually using SYSTEM_RUNNING because we
 moved away from it in other parts of memory hotplug, but those
 were actually for _allocations_ in favor of slab_is_available(),
 and we really don't care about the slab here.
 
 The only other assumption is that all memory-hotplug-time pages 
 given to memmap_init_zone() are valid and able to be onlined into
 any any zone after the system is running.  The valid part is
 really just a question of whether or not a 'struct page' is there
 for the pfn, and *not* whether there is actual memory.  Since
 all sparsemem sections have contiguous mem_map[]s within them,
 and we only memory hotplug entire sparsemem sections, we can
 be confident that this assumption will hold.
 
 As for the memory being in the right node, we'll assume tha
 memory hotplug is putting things in the right node.
 
 Signed-off-by: Dave Hansen [EMAIL PROTECTED]
 
 ---
 
  lxc-dave/init/main.c |4 
  lxc-dave/mm/page_alloc.c |   28 +---
  2 files changed, 29 insertions(+), 3 deletions(-)
 
 diff -puN init/main.c~sparsemem-fix init/main.c
 --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800
 +++ lxc-dave/init/main.c  2006-12-15 08:49:53.0 -0800
 @@ -770,6 +770,10 @@ static int init(void * unused)
   free_initmem();
   unlock_kernel();
   mark_rodata_ro();
 + /*
 +  * Memory hotplug requires that this system_state transition
 +  * happer after free_initmem().  (see memmap_init_zone())

s/happer/happens/

Other than that, can't this possibly race and crash here?
I mean, it's not a big race window, but it can happen, no?

 +  */
   system_state = SYSTEM_RUNNING;
   numa_default_policy();
  
 diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
 --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800
 +++ lxc-dave/mm/page_alloc.c  2006-12-15 08:49:53.0 -0800
 @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
  
  #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)~((sizeof(long))-1))
  
 +static int can_online_pfn_into_nid(unsigned long pfn, int nid)
 +{
 + /*
 +  * There are two things that make this work:
 +  * 1. The early_pfn...() functions are __init and
 +  *use __initdata.  If the system is  SYSTEM_RUNNING,
 +  *those functions and their data will still exist.
 +  * 2. We also assume that all actual memory hotplug
 +  *(as opposed to boot-time) calls to this are only
 +  *for contiguous memory regions.  With sparsemem,
 +  *this guaranteed is easy because all sections are
 +  *contiguous and we never online more than one
 +  *section at a time.  Boot-time memory can have holes
 +  *anywhere.
 +  */
 + if (system_state = SYSTEM_RUNNING)
 + return 1;
 + if (!early_pfn_valid(pfn))
 + return 0;
 + if (!early_pfn_in_nid(pfn, nid))
 + return 0;
 + return 1;
 +}
 +
  /*
   * Initially all pages are reserved - free ones are freed
   * up by free_all_bootmem() once the early boot process is
 @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
   unsigned long pfn;
  
   for (pfn = start_pfn; pfn  end_pfn; pfn++) {
 - if (!early_pfn_valid(pfn))
 - continue;
 - if (!early_pfn_in_nid(pfn, nid))
 + if (!can_online_pfn_into_nid(pfn))
   continue;
   page = pfn_to_page(pfn);
   set_page_links(page, zone, nid, pfn);
 _
 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 
 

-- 
Greetings Michael.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-15 Thread Dave Hansen

On Fri, 2006-12-15 at 17:11 +, Andy Whitcroft wrote:
 ie. that if hotplug is pushing this memory into a zone on a node it
 really does know what its doing, and its putting it in the right place.
   Obviously that code needs to be 'overlap' aware but thats ok for this
 interface.

I'm not sure if this, especially if we're only doing one section at a
time.

  system_state = SYSTEM_RUNNING;
  numa_default_policy();
 
  diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c
  --- lxc/mm/page_alloc.c~sparsemem-fix   2006-12-15 08:49:53.0 
  -0800
  +++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800
  @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
 
   #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)~((sizeof(long))-1))
 
  +static int can_online_pfn_into_nid(unsigned long pfn, int nid)
  +{
 
 __meminit ?

Yup.  I'll get that.

  +   /*
  +* There are two things that make this work:
  +* 1. The early_pfn...() functions are __init and
  +*use __initdata.  If the system is  SYSTEM_RUNNING,
  +*those functions and their data will still exist.
  +* 2. We also assume that all actual memory hotplug
  +*(as opposed to boot-time) calls to this are only
  +*for contiguous memory regions.  With sparsemem,
  +*this guaranteed is easy because all sections are
  +*contiguous and we never online more than one
  +*section at a time.  Boot-time memory can have holes
  +*anywhere.
  +*/
  +   if (system_state = SYSTEM_RUNNING)
  +   return 1;
 
 Is there any value in codifying the assumption here, as a safety net?
 
 if (system_state = SYSTEM_RUNNING)
 #ifdef CONFIG_SPARSEMEM
   return 1;
 #else
   return 0;
 #endif

Dunno.  The normal case is that it isn't even called without memory
hotplug.  The only non-sparsemem memory hotplug is Keith's baby, and
they lay out all of their mem_map at boot-time anyway, so I don't think
this gets used.

Keith, do you know whether you even do memmap_init_zone() at runtime,
and if you ever have holes if/when you do?

  +   if (!early_pfn_valid(pfn))
  +   return 0;
  +   if (!early_pfn_in_nid(pfn, nid))
  +   return 0;
  +   return 1;
  +}
  +
   /*
* Initially all pages are reserved - free ones are freed
* up by free_all_bootmem() once the early boot process is
  @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
  unsigned long pfn;
 
  for (pfn = start_pfn; pfn  end_pfn; pfn++) {
  -   if (!early_pfn_valid(pfn))
  -   continue;
  -   if (!early_pfn_in_nid(pfn, nid))
  +   if (!can_online_pfn_into_nid(pfn))
 
 We're not passing nid here?

No, because my ppc64 cross compiler has magically broken.  :(

Fixed patch appended.

-- Dave


I think the comments added say it pretty well, but I'll repeat it here.

This fix is pretty similar in concept to the one that Arnd posted
as a temporary workaround, but I've added a few comments explaining
what the actual assumptions are, and improved it a wee little bit.

The end goal here is to simply avoid calling the early_*() functions
when it is _not_ early.  Those functions stop working as soon as
free_initmem() is called.  system_state is set to SYSTEM_RUNNING
just after free_initmem() is called, so it seems appropriate to use
here.

I did think twice about actually using SYSTEM_RUNNING because we
moved away from it in other parts of memory hotplug, but those
were actually for _allocations_ in favor of slab_is_available(),
and we really don't care about the slab here.

The only other assumption is that all memory-hotplug-time pages 
given to memmap_init_zone() are valid and able to be onlined into
any any zone after the system is running.  The valid part is
really just a question of whether or not a 'struct page' is there
for the pfn, and *not* whether there is actual memory.  Since
all sparsemem sections have contiguous mem_map[]s within them,
and we only memory hotplug entire sparsemem sections, we can
be confident that this assumption will hold.

As for the memory being in the right node, we'll assume tha
memory hotplug is putting things in the right node.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]

---

 lxc-dave/init/main.c |4 
 lxc-dave/mm/page_alloc.c |   28 +---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff -puN init/main.c~sparsemem-fix init/main.c
--- lxc/init/main.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800
@@ -770,6 +770,10 @@ static int init(void * unused)
free_initmem();
unlock_kernel();
mark_rodata_ro();
+   /*
+* Memory hotplug requires that this system_state transition
+* happen after free_initmem().  (see memmap_init_zone())
+*/
system_state = SYSTEM_RUNNING;
numa_default_policy();
 
diff -puN

Re: [PATCH] Fix sparsemem on Cell

2006-12-15 Thread Andy Whitcroft


Dave Hansen wrote:

I think the comments added say it pretty well, but I'll repeat it here.

This fix is pretty similar in concept to the one that Arnd posted
as a temporary workaround, but I've added a few comments explaining
what the actual assumptions are, and improved it a wee little bit.

The end goal here is to simply avoid calling the early_*() functions
when it is _not_ early.  Those functions stop working as soon as
free_initmem() is called.  system_state is set to SYSTEM_RUNNING
just after free_initmem() is called, so it seems appropriate to use
here.

I did think twice about actually using SYSTEM_RUNNING because we
moved away from it in other parts of memory hotplug, but those
were actually for _allocations_ in favor of slab_is_available(),
and we really don't care about the slab here.

The only other assumption is that all memory-hotplug-time pages 
given to memmap_init_zone() are valid and able to be onlined into

any any zone after the system is running.  The valid part is
really just a question of whether or not a 'struct page' is there
for the pfn, and *not* whether there is actual memory.  Since
all sparsemem sections have contiguous mem_map[]s within them,
and we only memory hotplug entire sparsemem sections, we can
be confident that this assumption will hold.


ie. that if hotplug is pushing this memory into a zone on a node it 
really does know what its doing, and its putting it in the right place. 
 Obviously that code needs to be 'overlap' aware but thats ok for this 
interface.




As for the memory being in the right node, we'll assume tha
memory hotplug is putting things in the right node.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]

---

 lxc-dave/init/main.c |4 
 lxc-dave/mm/page_alloc.c |   28 +---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff -puN init/main.c~sparsemem-fix init/main.c
--- lxc/init/main.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800
@@ -770,6 +770,10 @@ static int init(void * unused)
free_initmem();
unlock_kernel();
mark_rodata_ro();
+   /*
+* Memory hotplug requires that this system_state transition
+* happer after free_initmem().  (see memmap_init_zone())
+*/


typo: happen


system_state = SYSTEM_RUNNING;
numa_default_policy();
 
diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c

--- lxc/mm/page_alloc.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
+++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800
@@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b
 
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)~((sizeof(long))-1))
 
+static int can_online_pfn_into_nid(unsigned long pfn, int nid)

+{


__meminit ?


+   /*
+* There are two things that make this work:
+* 1. The early_pfn...() functions are __init and
+*use __initdata.  If the system is  SYSTEM_RUNNING,
+*those functions and their data will still exist.
+* 2. We also assume that all actual memory hotplug
+*(as opposed to boot-time) calls to this are only
+*for contiguous memory regions.  With sparsemem,
+*this guaranteed is easy because all sections are
+*contiguous and we never online more than one
+*section at a time.  Boot-time memory can have holes
+*anywhere.
+*/
+   if (system_state = SYSTEM_RUNNING)
+   return 1;


Is there any value in codifying the assumption here, as a safety net?

if (system_state = SYSTEM_RUNNING)
#ifdef CONFIG_SPARSEMEM
return 1;
#else
return 0;
#endif


+   if (!early_pfn_valid(pfn))
+   return 0;
+   if (!early_pfn_in_nid(pfn, nid))
+   return 0;
+   return 1;
+}
+
 /*
  * Initially all pages are reserved - free ones are freed
  * up by free_all_bootmem() once the early boot process is
@@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned
unsigned long pfn;
 
 	for (pfn = start_pfn; pfn  end_pfn; pfn++) {

-   if (!early_pfn_valid(pfn))
-   continue;
-   if (!early_pfn_in_nid(pfn, nid))
+   if (!can_online_pfn_into_nid(pfn))


We're not passing nid here?


continue;
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
_


Looks like this would do the job to me.

-apw
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-15 Thread Dave Hansen

On Fri, 2006-12-15 at 18:22 +0100, Michael Buesch wrote:
 On Friday 15 December 2006 17:53, Dave Hansen wrote:
   lxc-dave/init/main.c |4 
   lxc-dave/mm/page_alloc.c |   28 +---
   2 files changed, 29 insertions(+), 3 deletions(-)
 
  diff -puN init/main.c~sparsemem-fix init/main.c
  --- lxc/init/main.c~sparsemem-fix   2006-12-15 08:49:53.0 -0800
  +++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800
  @@ -770,6 +770,10 @@ static int init(void * unused)
  free_initmem();
  unlock_kernel();
  mark_rodata_ro();
  +   /*
  +* Memory hotplug requires that this system_state transition
  +* happer after free_initmem().  (see memmap_init_zone())
 
 s/happer/happens/
 
 Other than that, can't this possibly race and crash here?
 I mean, it's not a big race window, but it can happen, no?

That's a good point.  Nice eye.

There are three routes in here: boot-time init, an ACPI call, and a
write to a sysfs file.  Bootmem is taken care of.  The write to a sysfs
file can't happen yet because userspace isn't up.

The only question would be about ACPI.  I _guess_ an ACPI event could
come in at any time, and could hit this race window.  

One other thought I had was to add an argument to memmap_init_zone() to
indicate that the memory being fed to it was contiguous and did not need
the validation checks.

Anybody have thoughts on that?

-- Dave

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-15 Thread Andrew Morton

On Fri, 15 Dec 2006 09:24:00 -0800
Dave Hansen [EMAIL PROTECTED] wrote:

 
 ...

 I think the comments added say it pretty well, but I'll repeat it here.
 
 This fix is pretty similar in concept to the one that Arnd posted
 as a temporary workaround, but I've added a few comments explaining
 what the actual assumptions are, and improved it a wee little bit.
 
 The end goal here is to simply avoid calling the early_*() functions
 when it is _not_ early.  Those functions stop working as soon as
 free_initmem() is called.  system_state is set to SYSTEM_RUNNING
 just after free_initmem() is called, so it seems appropriate to use
 here.

Would really prefer not to do this.  system_state is evil.  Its semantics
are poorly-defined and if someone changes them a bit, or changes memory
initialisation order, you get whacked.

I think an mm-private flag with /*documented*/ semantics would be better. 
It's only a byte.

 +static int __meminit can_online_pfn_into_nid(unsigned long pfn, int nid)

I spent some time trying to work out what can_online_pfn_into_nid can
possibly mean and failed.  We can bring a pfn online then turn it into a
NID?  Don't think so.  We can bring this page online and allocate it to
this node?  Maybe.

Perhaps if the function's role in the world was commented it would be clearer.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix sparsemem on Cell

2006-12-15 Thread Benjamin Herrenschmidt


 The only other assumption is that all memory-hotplug-time pages 
 given to memmap_init_zone() are valid and able to be onlined into
 any any zone after the system is running.  The valid part is
 really just a question of whether or not a 'struct page' is there
 for the pfn, and *not* whether there is actual memory.  Since
 all sparsemem sections have contiguous mem_map[]s within them,
 and we only memory hotplug entire sparsemem sections, we can
 be confident that this assumption will hold.
 
 As for the memory being in the right node, we'll assume tha
 memory hotplug is putting things in the right node.

BTW, just that people know, what we are adding isn't even memory :-) We
are calling __add_pages() to create struct page for the SPE local stores
and register space as we use them later from a nopage() handler (and no,
we can't use no_pfn just yet for various reasons, notably we need to
handle races with unmap_mapping_ranges() and thus have the truncate
logic in).

Those pages, thus, must never be onlined. Ever. It might make sense to
create a way to inform memory hotplug of that fact, but on the other
hand, I wouldn't bother as I have a plan to get rid of those
__add_pages() completely and work without struct page, maybe in a 2.6.21
timeframe.

Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

44 matches

Mail list logo