Re: [PATCH] Fix sparsemem on Cell (take 3)
On 1/7/07, Dave Hansen <[EMAIL PROTECTED]> wrote: On Fri, 2007-01-05 at 22:52 -0600, John Rose wrote: > Could this break ia64, given that it uses memmap_init_zone()? You are right, I think it does. Boot tested OK on ia64 with this latest version of the patch. (forgot to click plain text on gmail the first time..sorry if you got html mail or repeat) Tim - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell (take 3)
On Sunday 07 January 2007 09:58, Dave Hansen wrote: > The following patch fixes an oops experienced on the Cell architecture > when init-time functions, early_*(), are called at runtime. It alters > the call paths to make sure that the callers explicitly say whether the > call is being made on behalf of a hotplug even, or happening at > boot-time. > > It has been compile tested on ia64, s390, i386 and x86_64. I can't test it here, since I'm travelling at the moment, but this version looks good to me. Thanks for picking it up again! > Signed-off-by: Dave Hansen <[EMAIL PROTECTED]> Acked-by: Arnd Bergmann <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell (take 3)
On Fri, 2007-01-05 at 22:52 -0600, John Rose wrote: > > I dropped this on the floor over Christmas. This has had a few smoke > > tests on ppc64 and i386 and is ready for -mm. Against 2.6.20-rc2-mm1. > > Could this break ia64, given that it uses memmap_init_zone()? You are right, I think it does. Here's an updated patch to replace the earlier one. I had to move the enum definition over to a different header because ia64 evidently has a different include order. --- The following patch fixes an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug even, or happening at boot-time. It has been compile tested on ia64, s390, i386 and x86_64. Signed-off-by: Dave Hansen <[EMAIL PROTECTED]> --- lxc-dave/arch/ia64/mm/init.c|5 +++-- lxc-dave/arch/s390/mm/vmem.c|3 ++- lxc-dave/include/linux/mm.h |3 ++- lxc-dave/include/linux/mmzone.h |8 ++-- lxc-dave/mm/memory_hotplug.c|6 -- lxc-dave/mm/page_alloc.c| 25 + 6 files changed, 34 insertions(+), 16 deletions(-) diff -puN arch/s390/mm/vmem.c~Re-_PATCH_Fix_sparsemem_on_Cell arch/s390/mm/vmem.c --- lxc/arch/s390/mm/vmem.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.0 -0800 +++ lxc-dave/arch/s390/mm/vmem.c2007-01-07 00:47:02.0 -0800 @@ -61,7 +61,8 @@ void memmap_init(unsigned long size, int if (map_start < map_end) memmap_init_zone((unsigned long)(map_end - map_start), -nid, zone, page_to_pfn(map_start)); +nid, zone, page_to_pfn(map_start), +MEMMAP_EARLY); } } diff -puN include/linux/mm.h~Re-_PATCH_Fix_sparsemem_on_Cell include/linux/mm.h --- lxc/include/linux/mm.h~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.0 -0800 +++ lxc-dave/include/linux/mm.h 2007-01-06 23:57:59.0 -0800 @@ -979,7 +979,8 @@ extern int early_pfn_to_nid(unsigned lon #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */ #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ extern void set_dma_reserve(unsigned long new_dma_reserve); -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long); +extern void memmap_init_zone(unsigned long, int, unsigned long, + unsigned long, enum memmap_context); extern void setup_per_zone_pages_min(void); extern void mem_init(void); extern void show_mem(void); diff -puN include/linux/mmzone.h~Re-_PATCH_Fix_sparsemem_on_Cell include/linux/mmzone.h --- lxc/include/linux/mmzone.h~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.0 -0800 +++ lxc-dave/include/linux/mmzone.h 2007-01-06 23:58:15.0 -0800 @@ -471,9 +471,13 @@ void build_all_zonelists(void); void wakeup_kswapd(struct zone *zone, int order); int zone_watermark_ok(struct zone *z, int order, unsigned long mark, int classzone_idx, int alloc_flags); - +enum memmap_context { + MEMMAP_EARLY, + MEMMAP_HOTPLUG, +}; extern int init_currently_empty_zone(struct zone *zone, unsigned long start_pfn, -unsigned long size); +unsigned long size, +enum memmap_context context); #ifdef CONFIG_HAVE_MEMORY_PRESENT void memory_present(int nid, unsigned long start, unsigned long end); diff -puN mm/memory_hotplug.c~Re-_PATCH_Fix_sparsemem_on_Cell mm/memory_hotplug.c --- lxc/mm/memory_hotplug.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.0 -0800 +++ lxc-dave/mm/memory_hotplug.c2007-01-05 15:38:23.0 -0800 @@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone, zone_type = zone - pgdat->node_zones; if (!populated_zone(zone)) { int ret = 0; - ret = init_currently_empty_zone(zone, phys_start_pfn, nr_pages); + ret = init_currently_empty_zone(zone, phys_start_pfn, + nr_pages, MEMMAP_HOTPLUG); if (ret < 0) return ret; } - memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn); + memmap_init_zone(nr_pages, nid, zone_type, +phys_start_pfn, MEMMAP_HOTPLUG); return 0; } diff -puN mm/page_alloc.c~Re-_PATCH_Fix_sparsemem_on_Cell mm/page_alloc.c --- lxc/mm/page_alloc.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.0 -0800 +++ lxc-dave/mm/page_alloc.c2007-01-07 00:35:27.0 -0800 @@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b * done. Non-atomic initialization, single-pass. */ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, -
Re: [PATCH] Fix sparsemem on Cell (take 3)
On Fri, 2007-01-05 at 22:52 -0600, John Rose wrote: I dropped this on the floor over Christmas. This has had a few smoke tests on ppc64 and i386 and is ready for -mm. Against 2.6.20-rc2-mm1. Could this break ia64, given that it uses memmap_init_zone()? You are right, I think it does. Here's an updated patch to replace the earlier one. I had to move the enum definition over to a different header because ia64 evidently has a different include order. --- The following patch fixes an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug even, or happening at boot-time. It has been compile tested on ia64, s390, i386 and x86_64. Signed-off-by: Dave Hansen [EMAIL PROTECTED] --- lxc-dave/arch/ia64/mm/init.c|5 +++-- lxc-dave/arch/s390/mm/vmem.c|3 ++- lxc-dave/include/linux/mm.h |3 ++- lxc-dave/include/linux/mmzone.h |8 ++-- lxc-dave/mm/memory_hotplug.c|6 -- lxc-dave/mm/page_alloc.c| 25 + 6 files changed, 34 insertions(+), 16 deletions(-) diff -puN arch/s390/mm/vmem.c~Re-_PATCH_Fix_sparsemem_on_Cell arch/s390/mm/vmem.c --- lxc/arch/s390/mm/vmem.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.0 -0800 +++ lxc-dave/arch/s390/mm/vmem.c2007-01-07 00:47:02.0 -0800 @@ -61,7 +61,8 @@ void memmap_init(unsigned long size, int if (map_start map_end) memmap_init_zone((unsigned long)(map_end - map_start), -nid, zone, page_to_pfn(map_start)); +nid, zone, page_to_pfn(map_start), +MEMMAP_EARLY); } } diff -puN include/linux/mm.h~Re-_PATCH_Fix_sparsemem_on_Cell include/linux/mm.h --- lxc/include/linux/mm.h~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.0 -0800 +++ lxc-dave/include/linux/mm.h 2007-01-06 23:57:59.0 -0800 @@ -979,7 +979,8 @@ extern int early_pfn_to_nid(unsigned lon #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */ #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ extern void set_dma_reserve(unsigned long new_dma_reserve); -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long); +extern void memmap_init_zone(unsigned long, int, unsigned long, + unsigned long, enum memmap_context); extern void setup_per_zone_pages_min(void); extern void mem_init(void); extern void show_mem(void); diff -puN include/linux/mmzone.h~Re-_PATCH_Fix_sparsemem_on_Cell include/linux/mmzone.h --- lxc/include/linux/mmzone.h~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.0 -0800 +++ lxc-dave/include/linux/mmzone.h 2007-01-06 23:58:15.0 -0800 @@ -471,9 +471,13 @@ void build_all_zonelists(void); void wakeup_kswapd(struct zone *zone, int order); int zone_watermark_ok(struct zone *z, int order, unsigned long mark, int classzone_idx, int alloc_flags); - +enum memmap_context { + MEMMAP_EARLY, + MEMMAP_HOTPLUG, +}; extern int init_currently_empty_zone(struct zone *zone, unsigned long start_pfn, -unsigned long size); +unsigned long size, +enum memmap_context context); #ifdef CONFIG_HAVE_MEMORY_PRESENT void memory_present(int nid, unsigned long start, unsigned long end); diff -puN mm/memory_hotplug.c~Re-_PATCH_Fix_sparsemem_on_Cell mm/memory_hotplug.c --- lxc/mm/memory_hotplug.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.0 -0800 +++ lxc-dave/mm/memory_hotplug.c2007-01-05 15:38:23.0 -0800 @@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone, zone_type = zone - pgdat-node_zones; if (!populated_zone(zone)) { int ret = 0; - ret = init_currently_empty_zone(zone, phys_start_pfn, nr_pages); + ret = init_currently_empty_zone(zone, phys_start_pfn, + nr_pages, MEMMAP_HOTPLUG); if (ret 0) return ret; } - memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn); + memmap_init_zone(nr_pages, nid, zone_type, +phys_start_pfn, MEMMAP_HOTPLUG); return 0; } diff -puN mm/page_alloc.c~Re-_PATCH_Fix_sparsemem_on_Cell mm/page_alloc.c --- lxc/mm/page_alloc.c~Re-_PATCH_Fix_sparsemem_on_Cell 2007-01-05 15:38:23.0 -0800 +++ lxc-dave/mm/page_alloc.c2007-01-07 00:35:27.0 -0800 @@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b * done. Non-atomic initialization, single-pass. */ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, -
Re: [PATCH] Fix sparsemem on Cell (take 3)
On Sunday 07 January 2007 09:58, Dave Hansen wrote: The following patch fixes an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug even, or happening at boot-time. It has been compile tested on ia64, s390, i386 and x86_64. I can't test it here, since I'm travelling at the moment, but this version looks good to me. Thanks for picking it up again! Signed-off-by: Dave Hansen [EMAIL PROTECTED] Acked-by: Arnd Bergmann [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell (take 3)
On 1/7/07, Dave Hansen [EMAIL PROTECTED] wrote: On Fri, 2007-01-05 at 22:52 -0600, John Rose wrote: Could this break ia64, given that it uses memmap_init_zone()? You are right, I think it does. Boot tested OK on ia64 with this latest version of the patch. (forgot to click plain text on gmail the first time..sorry if you got html mail or repeat) Tim - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell (take 3)
> I dropped this on the floor over Christmas. This has had a few smoke > tests on ppc64 and i386 and is ready for -mm. Against 2.6.20-rc2-mm1. Could this break ia64, given that it uses memmap_init_zone()? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Fix sparsemem on Cell (take 3)
I dropped this on the floor over Christmas. This has had a few smoke tests on ppc64 and i386 and is ready for -mm. Against 2.6.20-rc2-mm1. The following patch fixes an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug even, or happening at boot-time. Signed-off-by: Dave Hansen <[EMAIL PROTECTED]> --- lxc-dave/arch/s390/mm/vmem.c|3 ++- lxc-dave/include/linux/mm.h |7 ++- lxc-dave/include/linux/mmzone.h |3 ++- lxc-dave/mm/memory_hotplug.c|6 -- lxc-dave/mm/page_alloc.c| 25 + 5 files changed, 31 insertions(+), 13 deletions(-) diff -puN mm/page_alloc.c~sparsemem-enum1 mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-enum1 2006-12-19 09:38:34.0 -0800 +++ lxc-dave/mm/page_alloc.c2006-12-19 11:18:24.0 -0800 @@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b * done. Non-atomic initialization, single-pass. */ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn) + unsigned long start_pfn, enum memmap_context context) { struct page *page; unsigned long end_pfn = start_pfn + size; unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + /* +* There can be holes in boot-time mem_map[]s +* handed to this function. They do not +* exist on hotplugged memory. +*/ + if (context == MEMMAP_EARLY) { + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); @@ -2102,7 +2109,7 @@ void zone_init_free_lists(struct pglist_ #ifndef __HAVE_ARCH_MEMMAP_INIT #define memmap_init(size, nid, zone, start_pfn) \ - memmap_init_zone((size), (nid), (zone), (start_pfn)) + memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY) #endif static int __cpuinit zone_batchsize(struct zone *zone) @@ -2348,7 +2355,8 @@ static __meminit void zone_pcp_init(stru __meminit int init_currently_empty_zone(struct zone *zone, unsigned long zone_start_pfn, - unsigned long size) + unsigned long size, + enum memmap_context context) { struct pglist_data *pgdat = zone->zone_pgdat; int ret; @@ -2792,7 +2800,8 @@ static void __meminit free_area_init_cor if (!size) continue; - ret = init_currently_empty_zone(zone, zone_start_pfn, size); + ret = init_currently_empty_zone(zone, zone_start_pfn, + size, MEMMAP_EARLY); BUG_ON(ret); zone_start_pfn += size; } diff -puN include/linux/mm.h~sparsemem-enum1 include/linux/mm.h --- lxc/include/linux/mm.h~sparsemem-enum1 2006-12-19 09:38:45.0 -0800 +++ lxc-dave/include/linux/mm.h 2006-12-19 09:50:47.0 -0800 @@ -979,7 +979,12 @@ extern int early_pfn_to_nid(unsigned lon #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */ #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ extern void set_dma_reserve(unsigned long new_dma_reserve); -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long); +enum memmap_context { + MEMMAP_EARLY, + MEMMAP_HOTPLUG, +}; +extern void memmap_init_zone(unsigned long, int, unsigned long, + unsigned long, enum memmap_context); extern void setup_per_zone_pages_min(void); extern void mem_init(void); extern void show_mem(void); diff -puN mm/memory_hotplug.c~sparsemem-enum1 mm/memory_hotplug.c --- lxc/mm/memory_hotplug.c~sparsemem-enum1 2006-12-19 09:39:19.0 -0800 +++ lxc-dave/mm/memory_hotplug.c2006-12-19 09:50:24.0 -0800 @@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone, zone_type = zone - pgdat->node_zones; if (!populated_zone(zone)) { int ret = 0; - ret = init_currently_empty_zone(zone, phys_start_pfn, nr_pages); + ret = init_currently_empty_zone(zone, phys_start_pfn, + nr_pages, MEMMAP_HOTPLUG); if (ret < 0) return ret; } -
[PATCH] Fix sparsemem on Cell (take 3)
I dropped this on the floor over Christmas. This has had a few smoke tests on ppc64 and i386 and is ready for -mm. Against 2.6.20-rc2-mm1. The following patch fixes an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug even, or happening at boot-time. Signed-off-by: Dave Hansen [EMAIL PROTECTED] --- lxc-dave/arch/s390/mm/vmem.c|3 ++- lxc-dave/include/linux/mm.h |7 ++- lxc-dave/include/linux/mmzone.h |3 ++- lxc-dave/mm/memory_hotplug.c|6 -- lxc-dave/mm/page_alloc.c| 25 + 5 files changed, 31 insertions(+), 13 deletions(-) diff -puN mm/page_alloc.c~sparsemem-enum1 mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-enum1 2006-12-19 09:38:34.0 -0800 +++ lxc-dave/mm/page_alloc.c2006-12-19 11:18:24.0 -0800 @@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b * done. Non-atomic initialization, single-pass. */ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn) + unsigned long start_pfn, enum memmap_context context) { struct page *page; unsigned long end_pfn = start_pfn + size; unsigned long pfn; for (pfn = start_pfn; pfn end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + /* +* There can be holes in boot-time mem_map[]s +* handed to this function. They do not +* exist on hotplugged memory. +*/ + if (context == MEMMAP_EARLY) { + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); @@ -2102,7 +2109,7 @@ void zone_init_free_lists(struct pglist_ #ifndef __HAVE_ARCH_MEMMAP_INIT #define memmap_init(size, nid, zone, start_pfn) \ - memmap_init_zone((size), (nid), (zone), (start_pfn)) + memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY) #endif static int __cpuinit zone_batchsize(struct zone *zone) @@ -2348,7 +2355,8 @@ static __meminit void zone_pcp_init(stru __meminit int init_currently_empty_zone(struct zone *zone, unsigned long zone_start_pfn, - unsigned long size) + unsigned long size, + enum memmap_context context) { struct pglist_data *pgdat = zone-zone_pgdat; int ret; @@ -2792,7 +2800,8 @@ static void __meminit free_area_init_cor if (!size) continue; - ret = init_currently_empty_zone(zone, zone_start_pfn, size); + ret = init_currently_empty_zone(zone, zone_start_pfn, + size, MEMMAP_EARLY); BUG_ON(ret); zone_start_pfn += size; } diff -puN include/linux/mm.h~sparsemem-enum1 include/linux/mm.h --- lxc/include/linux/mm.h~sparsemem-enum1 2006-12-19 09:38:45.0 -0800 +++ lxc-dave/include/linux/mm.h 2006-12-19 09:50:47.0 -0800 @@ -979,7 +979,12 @@ extern int early_pfn_to_nid(unsigned lon #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */ #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ extern void set_dma_reserve(unsigned long new_dma_reserve); -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long); +enum memmap_context { + MEMMAP_EARLY, + MEMMAP_HOTPLUG, +}; +extern void memmap_init_zone(unsigned long, int, unsigned long, + unsigned long, enum memmap_context); extern void setup_per_zone_pages_min(void); extern void mem_init(void); extern void show_mem(void); diff -puN mm/memory_hotplug.c~sparsemem-enum1 mm/memory_hotplug.c --- lxc/mm/memory_hotplug.c~sparsemem-enum1 2006-12-19 09:39:19.0 -0800 +++ lxc-dave/mm/memory_hotplug.c2006-12-19 09:50:24.0 -0800 @@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone, zone_type = zone - pgdat-node_zones; if (!populated_zone(zone)) { int ret = 0; - ret = init_currently_empty_zone(zone, phys_start_pfn, nr_pages); + ret = init_currently_empty_zone(zone, phys_start_pfn, + nr_pages, MEMMAP_HOTPLUG); if (ret 0) return ret; } -
Re: [PATCH] Fix sparsemem on Cell (take 3)
I dropped this on the floor over Christmas. This has had a few smoke tests on ppc64 and i386 and is ready for -mm. Against 2.6.20-rc2-mm1. Could this break ia64, given that it uses memmap_init_zone()? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
Pulling cbe... list off the cc because it is giving annoying 'too many recipients' warnings. On Tue, 2006-12-19 at 09:59 +0100, Arnd Bergmann wrote: > On Tuesday 19 December 2006 00:16, Dave Hansen wrote: > > How about an enum, or a pair of #defines? > > > > enum context > > { > > EARLY, > > HOTPLUG > > }; This patch should do the trick. Arnd, can you try this to make sure it solves the issue? Also, if you get a chance, could you do a quick s390 boot of it? It was the only arch touched by this whole thing. It compiles on all of my i386 .configs. -- The following patch fixes an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug even, or happening at boot-time. Signed-off-by: Dave Hansen <[EMAIL PROTECTED]> --- lxc-dave/arch/s390/mm/vmem.c|3 ++- lxc-dave/include/linux/mm.h |7 ++- lxc-dave/include/linux/mmzone.h |3 ++- lxc-dave/mm/memory_hotplug.c|6 -- lxc-dave/mm/page_alloc.c| 25 + 5 files changed, 31 insertions(+), 13 deletions(-) diff -puN mm/page_alloc.c~sparsemem-enum1 mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-enum1 2006-12-19 09:38:34.0 -0800 +++ lxc-dave/mm/page_alloc.c2006-12-19 11:18:24.0 -0800 @@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b * done. Non-atomic initialization, single-pass. */ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn) + unsigned long start_pfn, enum memmap_context context) { struct page *page; unsigned long end_pfn = start_pfn + size; unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + /* +* There can be holes in boot-time mem_map[]s +* handed to this function. They do not +* exist on hotplugged memory. +*/ + if (context == MEMMAP_EARLY) { + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); @@ -2102,7 +2109,7 @@ void zone_init_free_lists(struct pglist_ #ifndef __HAVE_ARCH_MEMMAP_INIT #define memmap_init(size, nid, zone, start_pfn) \ - memmap_init_zone((size), (nid), (zone), (start_pfn)) + memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY) #endif static int __cpuinit zone_batchsize(struct zone *zone) @@ -2348,7 +2355,8 @@ static __meminit void zone_pcp_init(stru __meminit int init_currently_empty_zone(struct zone *zone, unsigned long zone_start_pfn, - unsigned long size) + unsigned long size, + enum memmap_context context) { struct pglist_data *pgdat = zone->zone_pgdat; int ret; @@ -2792,7 +2800,8 @@ static void __meminit free_area_init_cor if (!size) continue; - ret = init_currently_empty_zone(zone, zone_start_pfn, size); + ret = init_currently_empty_zone(zone, zone_start_pfn, + size, MEMMAP_EARLY); BUG_ON(ret); zone_start_pfn += size; } diff -puN include/linux/mm.h~sparsemem-enum1 include/linux/mm.h --- lxc/include/linux/mm.h~sparsemem-enum1 2006-12-19 09:38:45.0 -0800 +++ lxc-dave/include/linux/mm.h 2006-12-19 09:50:47.0 -0800 @@ -979,7 +979,12 @@ extern int early_pfn_to_nid(unsigned lon #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */ #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ extern void set_dma_reserve(unsigned long new_dma_reserve); -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long); +enum memmap_context { + MEMMAP_EARLY, + MEMMAP_HOTPLUG, +}; +extern void memmap_init_zone(unsigned long, int, unsigned long, + unsigned long, enum memmap_context); extern void setup_per_zone_pages_min(void); extern void mem_init(void); extern void show_mem(void); diff -puN mm/memory_hotplug.c~sparsemem-enum1 mm/memory_hotplug.c --- lxc/mm/memory_hotplug.c~sparsemem-enum1 2006-12-19 09:39:19.0 -0800 +++ lxc-dave/mm/memory_hotplug.c2006-12-19 09:50:24.0 -0800 @@ -67,11 +67,13 @@ static int __add_zone(struct zone
Re: [PATCH] Fix sparsemem on Cell
On Tuesday 19 December 2006 00:16, Dave Hansen wrote: > How about an enum, or a pair of #defines? > > enum context > { > EARLY, > HOTPLUG > }; Sounds good, but since this is in a global header file, it needs to be in an appropriate name space, like enum memmap_context { MEMMAP_EARLY, MEMMAP_HOTPLUG, }; Arnd <>< - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Tuesday 19 December 2006 00:16, Dave Hansen wrote: How about an enum, or a pair of #defines? enum context { EARLY, HOTPLUG }; Sounds good, but since this is in a global header file, it needs to be in an appropriate name space, like enum memmap_context { MEMMAP_EARLY, MEMMAP_HOTPLUG, }; Arnd - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
Pulling cbe... list off the cc because it is giving annoying 'too many recipients' warnings. On Tue, 2006-12-19 at 09:59 +0100, Arnd Bergmann wrote: On Tuesday 19 December 2006 00:16, Dave Hansen wrote: How about an enum, or a pair of #defines? enum context { EARLY, HOTPLUG }; This patch should do the trick. Arnd, can you try this to make sure it solves the issue? Also, if you get a chance, could you do a quick s390 boot of it? It was the only arch touched by this whole thing. It compiles on all of my i386 .configs. -- The following patch fixes an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug even, or happening at boot-time. Signed-off-by: Dave Hansen [EMAIL PROTECTED] --- lxc-dave/arch/s390/mm/vmem.c|3 ++- lxc-dave/include/linux/mm.h |7 ++- lxc-dave/include/linux/mmzone.h |3 ++- lxc-dave/mm/memory_hotplug.c|6 -- lxc-dave/mm/page_alloc.c| 25 + 5 files changed, 31 insertions(+), 13 deletions(-) diff -puN mm/page_alloc.c~sparsemem-enum1 mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-enum1 2006-12-19 09:38:34.0 -0800 +++ lxc-dave/mm/page_alloc.c2006-12-19 11:18:24.0 -0800 @@ -2062,17 +2062,24 @@ static inline unsigned long wait_table_b * done. Non-atomic initialization, single-pass. */ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn) + unsigned long start_pfn, enum memmap_context context) { struct page *page; unsigned long end_pfn = start_pfn + size; unsigned long pfn; for (pfn = start_pfn; pfn end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + /* +* There can be holes in boot-time mem_map[]s +* handed to this function. They do not +* exist on hotplugged memory. +*/ + if (context == MEMMAP_EARLY) { + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); @@ -2102,7 +2109,7 @@ void zone_init_free_lists(struct pglist_ #ifndef __HAVE_ARCH_MEMMAP_INIT #define memmap_init(size, nid, zone, start_pfn) \ - memmap_init_zone((size), (nid), (zone), (start_pfn)) + memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY) #endif static int __cpuinit zone_batchsize(struct zone *zone) @@ -2348,7 +2355,8 @@ static __meminit void zone_pcp_init(stru __meminit int init_currently_empty_zone(struct zone *zone, unsigned long zone_start_pfn, - unsigned long size) + unsigned long size, + enum memmap_context context) { struct pglist_data *pgdat = zone-zone_pgdat; int ret; @@ -2792,7 +2800,8 @@ static void __meminit free_area_init_cor if (!size) continue; - ret = init_currently_empty_zone(zone, zone_start_pfn, size); + ret = init_currently_empty_zone(zone, zone_start_pfn, + size, MEMMAP_EARLY); BUG_ON(ret); zone_start_pfn += size; } diff -puN include/linux/mm.h~sparsemem-enum1 include/linux/mm.h --- lxc/include/linux/mm.h~sparsemem-enum1 2006-12-19 09:38:45.0 -0800 +++ lxc-dave/include/linux/mm.h 2006-12-19 09:50:47.0 -0800 @@ -979,7 +979,12 @@ extern int early_pfn_to_nid(unsigned lon #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */ #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ extern void set_dma_reserve(unsigned long new_dma_reserve); -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long); +enum memmap_context { + MEMMAP_EARLY, + MEMMAP_HOTPLUG, +}; +extern void memmap_init_zone(unsigned long, int, unsigned long, + unsigned long, enum memmap_context); extern void setup_per_zone_pages_min(void); extern void mem_init(void); extern void show_mem(void); diff -puN mm/memory_hotplug.c~sparsemem-enum1 mm/memory_hotplug.c --- lxc/mm/memory_hotplug.c~sparsemem-enum1 2006-12-19 09:39:19.0 -0800 +++ lxc-dave/mm/memory_hotplug.c2006-12-19 09:50:24.0 -0800 @@ -67,11 +67,13 @@ static int __add_zone(struct zone *zone,
Re: [PATCH] Fix sparsemem on Cell
On Mon, 18 Dec 2006 15:16:20 -0800 Dave Hansen <[EMAIL PROTECTED]> wrote: > enum context > { > EARLY, > HOTPLUG > }; I like this :) Thanks, -Kame - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Mon, 2006-12-18 at 23:54 +0100, Arnd Bergmann wrote: > #ifndef __HAVE_ARCH_MEMMAP_INIT > #define memmap_init(size, nid, zone, start_pfn) \ > - memmap_init_zone((size), (nid), (zone), (start_pfn)) > + memmap_init_zone((size), (nid), (zone), (start_pfn), 1) > #endif This is what I was thinking of. Sometimes I find these kinds of calls a bit annoying: foo(0, 1, 1, 0, 99, 22) It only takes a minute to look up what all of the numbers do, but that is one minute too many. :) How about an enum, or a pair of #defines? enum context { EARLY, HOTPLUG }; extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long, enum call_context); ... So, the call I quoted above would become: memmap_init_zone((size), (nid), (zone), (start_pfn), EARLY) -- Dave - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Saturday 16 December 2006 09:03, KAMEZAWA Hiroyuki wrote: > @@ -273,10 +284,13 @@ > if (ret) > goto error; > } > + atomic_inc(_hotadd_count); > > /* call arch's memory hotadd */ > ret = arch_add_memory(nid, start, size); > > + atomic_dec(_hotadd_count); > + > if (ret < 0) > goto error; > This also doesn't fix the problem on cell, since the time when the bug happens, we're not calling through this function, or arch_add_memory, at all, but rather invoke __add_pages directly. As BenH already mentioned, we shouldn't do really call __add_pages at all. Let me attempt another fix that might address all cases. This is completely untested as of now, but also addresses Dave's latest comment. Arnd <>< diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index 1a3d8a2..723d220 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -543,7 +543,7 @@ virtual_memmap_init (u64 start, u64 end, void *arg) if (map_start < map_end) memmap_init_zone((unsigned long)(map_end - map_start), -args->nid, args->zone, page_to_pfn(map_start)); +args->nid, args->zone, page_to_pfn(map_start), 1); return 0; } diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c index 7f2944d..1e52cd1 100644 --- a/arch/s390/mm/vmem.c +++ b/arch/s390/mm/vmem.c @@ -61,7 +61,7 @@ void memmap_init(unsigned long size, int nid, unsigned long zone, if (map_start < map_end) memmap_init_zone((unsigned long)(map_end - map_start), -nid, zone, page_to_pfn(map_start)); +nid, zone, page_to_pfn(map_start), 1); } } diff --git a/include/linux/mm.h b/include/linux/mm.h index a17b147..6d85068 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -978,7 +978,7 @@ extern int early_pfn_to_nid(unsigned long pfn); #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */ #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ extern void set_dma_reserve(unsigned long new_dma_reserve); -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long); +extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long, int); extern void setup_per_zone_pages_min(void); extern void mem_init(void); extern void show_mem(void); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 0c055a0..16c9930 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -71,7 +71,7 @@ static int __add_zone(struct zone *zone, unsigned long phys_start_pfn) if (ret < 0) return ret; } - memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn); + memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn, 0); return 0; } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8c1a116..60d1ac8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1953,17 +1953,19 @@ static inline unsigned long wait_table_bits(unsigned long size) * done. Non-atomic initialization, single-pass. */ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn) + unsigned long start_pfn, int early) { struct page *page; unsigned long end_pfn = start_pfn + size; unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + if (early) { + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); @@ -1990,7 +1992,7 @@ void zone_init_free_lists(struct pglist_data *pgdat, struct zone *zone, #ifndef __HAVE_ARCH_MEMMAP_INIT #define memmap_init(size, nid, zone, start_pfn) \ - memmap_init_zone((size), (nid), (zone), (start_pfn)) + memmap_init_zone((size), (nid), (zone), (start_pfn), 1) #endif static int __cpuinit zone_batchsize(struct zone *zone) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Mon, Dec 18, 2006 at 01:13:57PM -0800, Dave Hansen wrote: > On Sat, 2006-12-16 at 17:03 +0900, KAMEZAWA Hiroyuki wrote: > > /* add this memory to iomem resource */ > > static struct resource *register_memory_resource(u64 start, u64 size) > > { > > @@ -273,10 +284,13 @@ > > if (ret) > > goto error; > > } > > + atomic_inc(_hotadd_count); > > > > /* call arch's memory hotadd */ > > ret = arch_add_memory(nid, start, size); > > > > + atomic_dec(_hotadd_count); > > I'd be willing to be that this will work just fine. But, I think we can > do it without any static state at all, if we just pass a runtime-or-not > flag down into the arch_add_memory() call chain. > > I'll code that up so we can compare to yours. Yes, I stronly concur that passing an explicit flag is much much better than any hack involving global state. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Sat, 2006-12-16 at 17:03 +0900, KAMEZAWA Hiroyuki wrote: > /* add this memory to iomem resource */ > static struct resource *register_memory_resource(u64 start, u64 size) > { > @@ -273,10 +284,13 @@ > if (ret) > goto error; > } > + atomic_inc(_hotadd_count); > > /* call arch's memory hotadd */ > ret = arch_add_memory(nid, start, size); > > + atomic_dec(_hotadd_count); I'd be willing to be that this will work just fine. But, I think we can do it without any static state at all, if we just pass a runtime-or-not flag down into the arch_add_memory() call chain. I'll code that up so we can compare to yours. -- Dave - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Sat, 2006-12-16 at 17:03 +0900, KAMEZAWA Hiroyuki wrote: /* add this memory to iomem resource */ static struct resource *register_memory_resource(u64 start, u64 size) { @@ -273,10 +284,13 @@ if (ret) goto error; } + atomic_inc(memory_hotadd_count); /* call arch's memory hotadd */ ret = arch_add_memory(nid, start, size); + atomic_dec(memory_hotadd_count); I'd be willing to be that this will work just fine. But, I think we can do it without any static state at all, if we just pass a runtime-or-not flag down into the arch_add_memory() call chain. I'll code that up so we can compare to yours. -- Dave - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Mon, Dec 18, 2006 at 01:13:57PM -0800, Dave Hansen wrote: On Sat, 2006-12-16 at 17:03 +0900, KAMEZAWA Hiroyuki wrote: /* add this memory to iomem resource */ static struct resource *register_memory_resource(u64 start, u64 size) { @@ -273,10 +284,13 @@ if (ret) goto error; } + atomic_inc(memory_hotadd_count); /* call arch's memory hotadd */ ret = arch_add_memory(nid, start, size); + atomic_dec(memory_hotadd_count); I'd be willing to be that this will work just fine. But, I think we can do it without any static state at all, if we just pass a runtime-or-not flag down into the arch_add_memory() call chain. I'll code that up so we can compare to yours. Yes, I stronly concur that passing an explicit flag is much much better than any hack involving global state. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Saturday 16 December 2006 09:03, KAMEZAWA Hiroyuki wrote: @@ -273,10 +284,13 @@ if (ret) goto error; } + atomic_inc(memory_hotadd_count); /* call arch's memory hotadd */ ret = arch_add_memory(nid, start, size); + atomic_dec(memory_hotadd_count); + if (ret 0) goto error; This also doesn't fix the problem on cell, since the time when the bug happens, we're not calling through this function, or arch_add_memory, at all, but rather invoke __add_pages directly. As BenH already mentioned, we shouldn't do really call __add_pages at all. Let me attempt another fix that might address all cases. This is completely untested as of now, but also addresses Dave's latest comment. Arnd diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index 1a3d8a2..723d220 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -543,7 +543,7 @@ virtual_memmap_init (u64 start, u64 end, void *arg) if (map_start map_end) memmap_init_zone((unsigned long)(map_end - map_start), -args-nid, args-zone, page_to_pfn(map_start)); +args-nid, args-zone, page_to_pfn(map_start), 1); return 0; } diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c index 7f2944d..1e52cd1 100644 --- a/arch/s390/mm/vmem.c +++ b/arch/s390/mm/vmem.c @@ -61,7 +61,7 @@ void memmap_init(unsigned long size, int nid, unsigned long zone, if (map_start map_end) memmap_init_zone((unsigned long)(map_end - map_start), -nid, zone, page_to_pfn(map_start)); +nid, zone, page_to_pfn(map_start), 1); } } diff --git a/include/linux/mm.h b/include/linux/mm.h index a17b147..6d85068 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -978,7 +978,7 @@ extern int early_pfn_to_nid(unsigned long pfn); #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */ #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */ extern void set_dma_reserve(unsigned long new_dma_reserve); -extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long); +extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long, int); extern void setup_per_zone_pages_min(void); extern void mem_init(void); extern void show_mem(void); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 0c055a0..16c9930 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -71,7 +71,7 @@ static int __add_zone(struct zone *zone, unsigned long phys_start_pfn) if (ret 0) return ret; } - memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn); + memmap_init_zone(nr_pages, nid, zone_type, phys_start_pfn, 0); return 0; } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8c1a116..60d1ac8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1953,17 +1953,19 @@ static inline unsigned long wait_table_bits(unsigned long size) * done. Non-atomic initialization, single-pass. */ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn) + unsigned long start_pfn, int early) { struct page *page; unsigned long end_pfn = start_pfn + size; unsigned long pfn; for (pfn = start_pfn; pfn end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + if (early) { + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); @@ -1990,7 +1992,7 @@ void zone_init_free_lists(struct pglist_data *pgdat, struct zone *zone, #ifndef __HAVE_ARCH_MEMMAP_INIT #define memmap_init(size, nid, zone, start_pfn) \ - memmap_init_zone((size), (nid), (zone), (start_pfn)) + memmap_init_zone((size), (nid), (zone), (start_pfn), 1) #endif static int __cpuinit zone_batchsize(struct zone *zone) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Mon, 2006-12-18 at 23:54 +0100, Arnd Bergmann wrote: #ifndef __HAVE_ARCH_MEMMAP_INIT #define memmap_init(size, nid, zone, start_pfn) \ - memmap_init_zone((size), (nid), (zone), (start_pfn)) + memmap_init_zone((size), (nid), (zone), (start_pfn), 1) #endif This is what I was thinking of. Sometimes I find these kinds of calls a bit annoying: foo(0, 1, 1, 0, 99, 22) It only takes a minute to look up what all of the numbers do, but that is one minute too many. :) How about an enum, or a pair of #defines? enum context { EARLY, HOTPLUG }; extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long, enum call_context); ... So, the call I quoted above would become: memmap_init_zone((size), (nid), (zone), (start_pfn), EARLY) -- Dave - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Mon, 18 Dec 2006 15:16:20 -0800 Dave Hansen [EMAIL PROTECTED] wrote: enum context { EARLY, HOTPLUG }; I like this :) Thanks, -Kame - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Friday 15 December 2006 18:14, Dave Hansen wrote: > + if (system_state >= SYSTEM_RUNNING) > + return 1; > + if (!early_pfn_valid(pfn)) > + return 0; > + if (!early_pfn_in_nid(pfn, nid)) > + return 0; I haven't tried it, but I assume this is still wrong. On cell, we didn't actually hit the case where the init sections have been overwritten, since we call __add_pages from an initcall. However, the pages we add are not part of the early_node_map, so early_pfn_in_nid() returns a bogus result, causing some page structs not to get initialized. I believe your patch is going in the right direction, but it does not solve the bug we have... Arnd <>< - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Friday 15 December 2006 18:14, Dave Hansen wrote: + if (system_state = SYSTEM_RUNNING) + return 1; + if (!early_pfn_valid(pfn)) + return 0; + if (!early_pfn_in_nid(pfn, nid)) + return 0; I haven't tried it, but I assume this is still wrong. On cell, we didn't actually hit the case where the init sections have been overwritten, since we call __add_pages from an initcall. However, the pages we add are not part of the early_node_map, so early_pfn_in_nid() returns a bogus result, causing some page structs not to get initialized. I believe your patch is going in the right direction, but it does not solve the bug we have... Arnd - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Fri, 15 Dec 2006 11:45:36 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > Perhaps if the function's role in the world was commented it would be clearer. > How about patch like this ? (this one is not tested.) Already-exisiting-more-generic-flag is available ? -Kame == include/linux/memory_hotplug.h |8 mm/memory_hotplug.c| 14 ++ mm/page_alloc.c| 11 +++ 3 files changed, 29 insertions(+), 4 deletions(-) Index: linux-2.6.20-rc1-mm1/include/linux/memory_hotplug.h === --- linux-2.6.20-rc1-mm1.orig/include/linux/memory_hotplug.h2006-11-30 06:57:37.0 +0900 +++ linux-2.6.20-rc1-mm1/include/linux/memory_hotplug.h 2006-12-16 16:42:40.0 +0900 @@ -133,6 +133,10 @@ #endif /* CONFIG_NUMA */ #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */ +extern int under_memory_hotadd(void); + + + #else /* ! CONFIG_MEMORY_HOTPLUG */ /* * Stub functions for when hotplug is off @@ -159,6 +163,10 @@ dump_stack(); return -ENOSYS; } +static inline int under_memory_hotadd() +{ + return 0; +} #endif /* ! CONFIG_MEMORY_HOTPLUG */ static inline int __remove_pages(struct zone *zone, unsigned long start_pfn, Index: linux-2.6.20-rc1-mm1/mm/memory_hotplug.c === --- linux-2.6.20-rc1-mm1.orig/mm/memory_hotplug.c 2006-12-16 14:24:10.0 +0900 +++ linux-2.6.20-rc1-mm1/mm/memory_hotplug.c2006-12-16 16:51:08.0 +0900 @@ -26,6 +26,17 @@ #include +/* + * Because memory hotplug shares some codes for initilization with boot, + * we sometimes have to check what we are doing ? + */ +static atomic_t memory_hotadd_count; + +int under_memory_hotadd(void) +{ + return atomic_read(_hotadd_count); +} + /* add this memory to iomem resource */ static struct resource *register_memory_resource(u64 start, u64 size) { @@ -273,10 +284,13 @@ if (ret) goto error; } + atomic_inc(_hotadd_count); /* call arch's memory hotadd */ ret = arch_add_memory(nid, start, size); + atomic_dec(_hotadd_count); + if (ret < 0) goto error; Index: linux-2.6.20-rc1-mm1/mm/page_alloc.c === --- linux-2.6.20-rc1-mm1.orig/mm/page_alloc.c 2006-12-16 14:24:38.0 +0900 +++ linux-2.6.20-rc1-mm1/mm/page_alloc.c2006-12-16 16:53:02.0 +0900 @@ -2069,10 +2069,13 @@ unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + if (!under_memory_hotadd()) { + /* we are in boot */ + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Fri, 15 Dec 2006 11:45:36 -0800 Andrew Morton [EMAIL PROTECTED] wrote: Perhaps if the function's role in the world was commented it would be clearer. How about patch like this ? (this one is not tested.) Already-exisiting-more-generic-flag is available ? -Kame == include/linux/memory_hotplug.h |8 mm/memory_hotplug.c| 14 ++ mm/page_alloc.c| 11 +++ 3 files changed, 29 insertions(+), 4 deletions(-) Index: linux-2.6.20-rc1-mm1/include/linux/memory_hotplug.h === --- linux-2.6.20-rc1-mm1.orig/include/linux/memory_hotplug.h2006-11-30 06:57:37.0 +0900 +++ linux-2.6.20-rc1-mm1/include/linux/memory_hotplug.h 2006-12-16 16:42:40.0 +0900 @@ -133,6 +133,10 @@ #endif /* CONFIG_NUMA */ #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */ +extern int under_memory_hotadd(void); + + + #else /* ! CONFIG_MEMORY_HOTPLUG */ /* * Stub functions for when hotplug is off @@ -159,6 +163,10 @@ dump_stack(); return -ENOSYS; } +static inline int under_memory_hotadd() +{ + return 0; +} #endif /* ! CONFIG_MEMORY_HOTPLUG */ static inline int __remove_pages(struct zone *zone, unsigned long start_pfn, Index: linux-2.6.20-rc1-mm1/mm/memory_hotplug.c === --- linux-2.6.20-rc1-mm1.orig/mm/memory_hotplug.c 2006-12-16 14:24:10.0 +0900 +++ linux-2.6.20-rc1-mm1/mm/memory_hotplug.c2006-12-16 16:51:08.0 +0900 @@ -26,6 +26,17 @@ #include asm/tlbflush.h +/* + * Because memory hotplug shares some codes for initilization with boot, + * we sometimes have to check what we are doing ? + */ +static atomic_t memory_hotadd_count; + +int under_memory_hotadd(void) +{ + return atomic_read(memory_hotadd_count); +} + /* add this memory to iomem resource */ static struct resource *register_memory_resource(u64 start, u64 size) { @@ -273,10 +284,13 @@ if (ret) goto error; } + atomic_inc(memory_hotadd_count); /* call arch's memory hotadd */ ret = arch_add_memory(nid, start, size); + atomic_dec(memory_hotadd_count); + if (ret 0) goto error; Index: linux-2.6.20-rc1-mm1/mm/page_alloc.c === --- linux-2.6.20-rc1-mm1.orig/mm/page_alloc.c 2006-12-16 14:24:38.0 +0900 +++ linux-2.6.20-rc1-mm1/mm/page_alloc.c2006-12-16 16:53:02.0 +0900 @@ -2069,10 +2069,13 @@ unsigned long pfn; for (pfn = start_pfn; pfn end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) - continue; + if (!under_memory_hotadd()) { + /* we are in boot */ + if (!early_pfn_valid(pfn)) + continue; + if (!early_pfn_in_nid(pfn, nid)) + continue; + } page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); init_page_count(page); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
> The only other assumption is that all memory-hotplug-time pages > given to memmap_init_zone() are valid and able to be onlined into > any any zone after the system is running. The "valid" part is > really just a question of whether or not a 'struct page' is there > for the pfn, and *not* whether there is actual memory. Since > all sparsemem sections have contiguous mem_map[]s within them, > and we only memory hotplug entire sparsemem sections, we can > be confident that this assumption will hold. > > As for the memory being in the right node, we'll assume tha > memory hotplug is putting things in the right node. BTW, just that people know, what we are adding isn't even memory :-) We are calling __add_pages() to create struct page for the SPE local stores and register space as we use them later from a nopage() handler (and no, we can't use no_pfn just yet for various reasons, notably we need to handle races with unmap_mapping_ranges() and thus have the truncate logic in). Those pages, thus, must never be onlined. Ever. It might make sense to create a way to inform memory hotplug of that fact, but on the other hand, I wouldn't bother as I have a plan to get rid of those __add_pages() completely and work without struct page, maybe in a 2.6.21 timeframe. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Fri, 15 Dec 2006 09:24:00 -0800 Dave Hansen <[EMAIL PROTECTED]> wrote: > > ... > > I think the comments added say it pretty well, but I'll repeat it here. > > This fix is pretty similar in concept to the one that Arnd posted > as a temporary workaround, but I've added a few comments explaining > what the actual assumptions are, and improved it a wee little bit. > > The end goal here is to simply avoid calling the early_*() functions > when it is _not_ early. Those functions stop working as soon as > free_initmem() is called. system_state is set to SYSTEM_RUNNING > just after free_initmem() is called, so it seems appropriate to use > here. Would really prefer not to do this. system_state is evil. Its semantics are poorly-defined and if someone changes them a bit, or changes memory initialisation order, you get whacked. I think an mm-private flag with /*documented*/ semantics would be better. It's only a byte. > +static int __meminit can_online_pfn_into_nid(unsigned long pfn, int nid) I spent some time trying to work out what "can_online_pfn_into_nid" can possibly mean and failed. "We can bring a pfn online then turn it into a NID"? Don't think so. "We can bring this page online and allocate it to this node"? Maybe. Perhaps if the function's role in the world was commented it would be clearer. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Fri, 2006-12-15 at 18:22 +0100, Michael Buesch wrote: > On Friday 15 December 2006 17:53, Dave Hansen wrote: > > lxc-dave/init/main.c |4 > > lxc-dave/mm/page_alloc.c | 28 +--- > > 2 files changed, 29 insertions(+), 3 deletions(-) > > > > diff -puN init/main.c~sparsemem-fix init/main.c > > --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 > > +++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800 > > @@ -770,6 +770,10 @@ static int init(void * unused) > > free_initmem(); > > unlock_kernel(); > > mark_rodata_ro(); > > + /* > > +* Memory hotplug requires that this system_state transition > > +* happer after free_initmem(). (see memmap_init_zone()) > > s/happer/happens/ > > Other than that, can't this possibly race and crash here? > I mean, it's not a big race window, but it can happen, no? That's a good point. Nice eye. There are three routes in here: boot-time init, an ACPI call, and a write to a sysfs file. Bootmem is taken care of. The write to a sysfs file can't happen yet because userspace isn't up. The only question would be about ACPI. I _guess_ an ACPI event could come in at any time, and could hit this race window. One other thought I had was to add an argument to memmap_init_zone() to indicate that the memory being fed to it was contiguous and did not need the validation checks. Anybody have thoughts on that? -- Dave - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
Dave Hansen wrote: I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. I did think twice about actually using SYSTEM_RUNNING because we moved away from it in other parts of memory hotplug, but those were actually for _allocations_ in favor of slab_is_available(), and we really don't care about the slab here. The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The "valid" part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. ie. that if hotplug is pushing this memory into a zone on a node it really does know what its doing, and its putting it in the right place. Obviously that code needs to be 'overlap' aware but thats ok for this interface. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. Signed-off-by: Dave Hansen <[EMAIL PROTECTED]> --- lxc-dave/init/main.c |4 lxc-dave/mm/page_alloc.c | 28 +--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* +* Memory hotplug requires that this system_state transition +* happer after free_initmem(). (see memmap_init_zone()) +*/ typo: happen system_state = SYSTEM_RUNNING; numa_default_policy(); diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800 @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) +static int can_online_pfn_into_nid(unsigned long pfn, int nid) +{ __meminit ? + /* +* There are two things that make this work: +* 1. The early_pfn...() functions are __init and +*use __initdata. If the system is < SYSTEM_RUNNING, +*those functions and their data will still exist. +* 2. We also assume that all actual memory hotplug +*(as opposed to boot-time) calls to this are only +*for contiguous memory regions. With sparsemem, +*this guaranteed is easy because all sections are +*contiguous and we never online more than one +*section at a time. Boot-time memory can have holes +*anywhere. +*/ + if (system_state >= SYSTEM_RUNNING) + return 1; Is there any value in codifying the assumption here, as a safety net? if (system_state >= SYSTEM_RUNNING) #ifdef CONFIG_SPARSEMEM return 1; #else return 0; #endif + if (!early_pfn_valid(pfn)) + return 0; + if (!early_pfn_in_nid(pfn, nid)) + return 0; + return 1; +} + /* * Initially all pages are reserved - free ones are freed * up by free_all_bootmem() once the early boot process is @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) + if (!can_online_pfn_into_nid(pfn)) We're not passing nid here? continue; page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); _ Looks like this would do the job to me. -apw - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Fri, 2006-12-15 at 17:11 +, Andy Whitcroft wrote: > ie. that if hotplug is pushing this memory into a zone on a node it > really does know what its doing, and its putting it in the right place. > Obviously that code needs to be 'overlap' aware but thats ok for this > interface. I'm not sure if this, especially if we're only doing one section at a time. > > system_state = SYSTEM_RUNNING; > > numa_default_policy(); > > > > diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c > > --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.0 > > -0800 > > +++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800 > > @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b > > > > #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) > > > > +static int can_online_pfn_into_nid(unsigned long pfn, int nid) > > +{ > > __meminit ? Yup. I'll get that. > > + /* > > +* There are two things that make this work: > > +* 1. The early_pfn...() functions are __init and > > +*use __initdata. If the system is < SYSTEM_RUNNING, > > +*those functions and their data will still exist. > > +* 2. We also assume that all actual memory hotplug > > +*(as opposed to boot-time) calls to this are only > > +*for contiguous memory regions. With sparsemem, > > +*this guaranteed is easy because all sections are > > +*contiguous and we never online more than one > > +*section at a time. Boot-time memory can have holes > > +*anywhere. > > +*/ > > + if (system_state >= SYSTEM_RUNNING) > > + return 1; > > Is there any value in codifying the assumption here, as a safety net? > > if (system_state >= SYSTEM_RUNNING) > #ifdef CONFIG_SPARSEMEM > return 1; > #else > return 0; > #endif Dunno. The normal case is that it isn't even called without memory hotplug. The only non-sparsemem memory hotplug is Keith's baby, and they lay out all of their mem_map at boot-time anyway, so I don't think this gets used. Keith, do you know whether you even do memmap_init_zone() at runtime, and if you ever have holes if/when you do? > > + if (!early_pfn_valid(pfn)) > > + return 0; > > + if (!early_pfn_in_nid(pfn, nid)) > > + return 0; > > + return 1; > > +} > > + > > /* > > * Initially all pages are reserved - free ones are freed > > * up by free_all_bootmem() once the early boot process is > > @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned > > unsigned long pfn; > > > > for (pfn = start_pfn; pfn < end_pfn; pfn++) { > > - if (!early_pfn_valid(pfn)) > > - continue; > > - if (!early_pfn_in_nid(pfn, nid)) > > + if (!can_online_pfn_into_nid(pfn)) > > We're not passing nid here? No, because my ppc64 cross compiler has magically broken. :( Fixed patch appended. -- Dave I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. I did think twice about actually using SYSTEM_RUNNING because we moved away from it in other parts of memory hotplug, but those were actually for _allocations_ in favor of slab_is_available(), and we really don't care about the slab here. The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The "valid" part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. Signed-off-by: Dave Hansen <[EMAIL PROTECTED]> --- lxc-dave/init/main.c |4 lxc-dave/mm/page_alloc.c | 28 +--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* +* Memory hotplug requires that this system_state transition +* happen after free_initmem(). (see
Re: [PATCH] Fix sparsemem on Cell
On Friday 15 December 2006 17:53, Dave Hansen wrote: > > I think the comments added say it pretty well, but I'll repeat it here. > > This fix is pretty similar in concept to the one that Arnd posted > as a temporary workaround, but I've added a few comments explaining > what the actual assumptions are, and improved it a wee little bit. > > The end goal here is to simply avoid calling the early_*() functions > when it is _not_ early. Those functions stop working as soon as > free_initmem() is called. system_state is set to SYSTEM_RUNNING > just after free_initmem() is called, so it seems appropriate to use > here. > > I did think twice about actually using SYSTEM_RUNNING because we > moved away from it in other parts of memory hotplug, but those > were actually for _allocations_ in favor of slab_is_available(), > and we really don't care about the slab here. > > The only other assumption is that all memory-hotplug-time pages > given to memmap_init_zone() are valid and able to be onlined into > any any zone after the system is running. The "valid" part is > really just a question of whether or not a 'struct page' is there > for the pfn, and *not* whether there is actual memory. Since > all sparsemem sections have contiguous mem_map[]s within them, > and we only memory hotplug entire sparsemem sections, we can > be confident that this assumption will hold. > > As for the memory being in the right node, we'll assume tha > memory hotplug is putting things in the right node. > > Signed-off-by: Dave Hansen <[EMAIL PROTECTED]> > > --- > > lxc-dave/init/main.c |4 > lxc-dave/mm/page_alloc.c | 28 +--- > 2 files changed, 29 insertions(+), 3 deletions(-) > > diff -puN init/main.c~sparsemem-fix init/main.c > --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 > +++ lxc-dave/init/main.c 2006-12-15 08:49:53.0 -0800 > @@ -770,6 +770,10 @@ static int init(void * unused) > free_initmem(); > unlock_kernel(); > mark_rodata_ro(); > + /* > + * Memory hotplug requires that this system_state transition > + * happer after free_initmem(). (see memmap_init_zone()) s/happer/happens/ Other than that, can't this possibly race and crash here? I mean, it's not a big race window, but it can happen, no? > + */ > system_state = SYSTEM_RUNNING; > numa_default_policy(); > > diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c > --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 > +++ lxc-dave/mm/page_alloc.c 2006-12-15 08:49:53.0 -0800 > @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b > > #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) > > +static int can_online_pfn_into_nid(unsigned long pfn, int nid) > +{ > + /* > + * There are two things that make this work: > + * 1. The early_pfn...() functions are __init and > + *use __initdata. If the system is < SYSTEM_RUNNING, > + *those functions and their data will still exist. > + * 2. We also assume that all actual memory hotplug > + *(as opposed to boot-time) calls to this are only > + *for contiguous memory regions. With sparsemem, > + *this guaranteed is easy because all sections are > + *contiguous and we never online more than one > + *section at a time. Boot-time memory can have holes > + *anywhere. > + */ > + if (system_state >= SYSTEM_RUNNING) > + return 1; > + if (!early_pfn_valid(pfn)) > + return 0; > + if (!early_pfn_in_nid(pfn, nid)) > + return 0; > + return 1; > +} > + > /* > * Initially all pages are reserved - free ones are freed > * up by free_all_bootmem() once the early boot process is > @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned > unsigned long pfn; > > for (pfn = start_pfn; pfn < end_pfn; pfn++) { > - if (!early_pfn_valid(pfn)) > - continue; > - if (!early_pfn_in_nid(pfn, nid)) > + if (!can_online_pfn_into_nid(pfn)) > continue; > page = pfn_to_page(pfn); > set_page_links(page, zone, nid, pfn); > _ > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > -- Greetings Michael. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Fix sparsemem on Cell
I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. I did think twice about actually using SYSTEM_RUNNING because we moved away from it in other parts of memory hotplug, but those were actually for _allocations_ in favor of slab_is_available(), and we really don't care about the slab here. The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The "valid" part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. Signed-off-by: Dave Hansen <[EMAIL PROTECTED]> --- lxc-dave/init/main.c |4 lxc-dave/mm/page_alloc.c | 28 +--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* +* Memory hotplug requires that this system_state transition +* happer after free_initmem(). (see memmap_init_zone()) +*/ system_state = SYSTEM_RUNNING; numa_default_policy(); diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800 @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) +static int can_online_pfn_into_nid(unsigned long pfn, int nid) +{ + /* +* There are two things that make this work: +* 1. The early_pfn...() functions are __init and +*use __initdata. If the system is < SYSTEM_RUNNING, +*those functions and their data will still exist. +* 2. We also assume that all actual memory hotplug +*(as opposed to boot-time) calls to this are only +*for contiguous memory regions. With sparsemem, +*this guaranteed is easy because all sections are +*contiguous and we never online more than one +*section at a time. Boot-time memory can have holes +*anywhere. +*/ + if (system_state >= SYSTEM_RUNNING) + return 1; + if (!early_pfn_valid(pfn)) + return 0; + if (!early_pfn_in_nid(pfn, nid)) + return 0; + return 1; +} + /* * Initially all pages are reserved - free ones are freed * up by free_all_bootmem() once the early boot process is @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) + if (!can_online_pfn_into_nid(pfn)) continue; page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Fix sparsemem on Cell
I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. I did think twice about actually using SYSTEM_RUNNING because we moved away from it in other parts of memory hotplug, but those were actually for _allocations_ in favor of slab_is_available(), and we really don't care about the slab here. The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The "valid" part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. Signed-off-by: Dave Hansen <[EMAIL PROTECTED]> --- lxc-dave/init/main.c |4 lxc-dave/mm/page_alloc.c | 28 +--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* +* Memory hotplug requires that this system_state transition +* happer after free_initmem(). (see memmap_init_zone()) +*/ system_state = SYSTEM_RUNNING; numa_default_policy(); diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800 @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1)) +static int can_online_pfn_into_nid(unsigned long pfn, int nid) +{ + /* +* There are two things that make this work: +* 1. The early_pfn...() functions are __init and +*use __initdata. If the system is < SYSTEM_RUNNING, +*those functions and their data will still exist. +* 2. We also assume that all actual memory hotplug +*(as opposed to boot-time) calls to this are only +*for contiguous memory regions. With sparsemem, +*this guaranteed is easy because all sections are +*contiguous and we never online more than one +*section at a time. Boot-time memory can have holes +*anywhere. +*/ + if (system_state >= SYSTEM_RUNNING) + return 1; + if (!early_pfn_valid(pfn)) + return 0; + if (!early_pfn_in_nid(pfn, nid)) + return 0; + return 1; +} + /* * Initially all pages are reserved - free ones are freed * up by free_all_bootmem() once the early boot process is @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) + if (!can_online_pfn_into_nid(pfn)) continue; page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Fix sparsemem on Cell
I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. I did think twice about actually using SYSTEM_RUNNING because we moved away from it in other parts of memory hotplug, but those were actually for _allocations_ in favor of slab_is_available(), and we really don't care about the slab here. The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The valid part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. Signed-off-by: Dave Hansen [EMAIL PROTECTED] --- lxc-dave/init/main.c |4 lxc-dave/mm/page_alloc.c | 28 +--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* +* Memory hotplug requires that this system_state transition +* happer after free_initmem(). (see memmap_init_zone()) +*/ system_state = SYSTEM_RUNNING; numa_default_policy(); diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800 @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)~((sizeof(long))-1)) +static int can_online_pfn_into_nid(unsigned long pfn, int nid) +{ + /* +* There are two things that make this work: +* 1. The early_pfn...() functions are __init and +*use __initdata. If the system is SYSTEM_RUNNING, +*those functions and their data will still exist. +* 2. We also assume that all actual memory hotplug +*(as opposed to boot-time) calls to this are only +*for contiguous memory regions. With sparsemem, +*this guaranteed is easy because all sections are +*contiguous and we never online more than one +*section at a time. Boot-time memory can have holes +*anywhere. +*/ + if (system_state = SYSTEM_RUNNING) + return 1; + if (!early_pfn_valid(pfn)) + return 0; + if (!early_pfn_in_nid(pfn, nid)) + return 0; + return 1; +} + /* * Initially all pages are reserved - free ones are freed * up by free_all_bootmem() once the early boot process is @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned unsigned long pfn; for (pfn = start_pfn; pfn end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) + if (!can_online_pfn_into_nid(pfn)) continue; page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); _ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Fix sparsemem on Cell
I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. I did think twice about actually using SYSTEM_RUNNING because we moved away from it in other parts of memory hotplug, but those were actually for _allocations_ in favor of slab_is_available(), and we really don't care about the slab here. The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The valid part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. Signed-off-by: Dave Hansen [EMAIL PROTECTED] --- lxc-dave/init/main.c |4 lxc-dave/mm/page_alloc.c | 28 +--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* +* Memory hotplug requires that this system_state transition +* happer after free_initmem(). (see memmap_init_zone()) +*/ system_state = SYSTEM_RUNNING; numa_default_policy(); diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800 @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)~((sizeof(long))-1)) +static int can_online_pfn_into_nid(unsigned long pfn, int nid) +{ + /* +* There are two things that make this work: +* 1. The early_pfn...() functions are __init and +*use __initdata. If the system is SYSTEM_RUNNING, +*those functions and their data will still exist. +* 2. We also assume that all actual memory hotplug +*(as opposed to boot-time) calls to this are only +*for contiguous memory regions. With sparsemem, +*this guaranteed is easy because all sections are +*contiguous and we never online more than one +*section at a time. Boot-time memory can have holes +*anywhere. +*/ + if (system_state = SYSTEM_RUNNING) + return 1; + if (!early_pfn_valid(pfn)) + return 0; + if (!early_pfn_in_nid(pfn, nid)) + return 0; + return 1; +} + /* * Initially all pages are reserved - free ones are freed * up by free_all_bootmem() once the early boot process is @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned unsigned long pfn; for (pfn = start_pfn; pfn end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) + if (!can_online_pfn_into_nid(pfn)) continue; page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); _ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Friday 15 December 2006 17:53, Dave Hansen wrote: I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. I did think twice about actually using SYSTEM_RUNNING because we moved away from it in other parts of memory hotplug, but those were actually for _allocations_ in favor of slab_is_available(), and we really don't care about the slab here. The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The valid part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. Signed-off-by: Dave Hansen [EMAIL PROTECTED] --- lxc-dave/init/main.c |4 lxc-dave/mm/page_alloc.c | 28 +--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/init/main.c 2006-12-15 08:49:53.0 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* + * Memory hotplug requires that this system_state transition + * happer after free_initmem(). (see memmap_init_zone()) s/happer/happens/ Other than that, can't this possibly race and crash here? I mean, it's not a big race window, but it can happen, no? + */ system_state = SYSTEM_RUNNING; numa_default_policy(); diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/mm/page_alloc.c 2006-12-15 08:49:53.0 -0800 @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)~((sizeof(long))-1)) +static int can_online_pfn_into_nid(unsigned long pfn, int nid) +{ + /* + * There are two things that make this work: + * 1. The early_pfn...() functions are __init and + *use __initdata. If the system is SYSTEM_RUNNING, + *those functions and their data will still exist. + * 2. We also assume that all actual memory hotplug + *(as opposed to boot-time) calls to this are only + *for contiguous memory regions. With sparsemem, + *this guaranteed is easy because all sections are + *contiguous and we never online more than one + *section at a time. Boot-time memory can have holes + *anywhere. + */ + if (system_state = SYSTEM_RUNNING) + return 1; + if (!early_pfn_valid(pfn)) + return 0; + if (!early_pfn_in_nid(pfn, nid)) + return 0; + return 1; +} + /* * Initially all pages are reserved - free ones are freed * up by free_all_bootmem() once the early boot process is @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned unsigned long pfn; for (pfn = start_pfn; pfn end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) + if (!can_online_pfn_into_nid(pfn)) continue; page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); _ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- Greetings Michael. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Fri, 2006-12-15 at 17:11 +, Andy Whitcroft wrote: ie. that if hotplug is pushing this memory into a zone on a node it really does know what its doing, and its putting it in the right place. Obviously that code needs to be 'overlap' aware but thats ok for this interface. I'm not sure if this, especially if we're only doing one section at a time. system_state = SYSTEM_RUNNING; numa_default_policy(); diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800 @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)~((sizeof(long))-1)) +static int can_online_pfn_into_nid(unsigned long pfn, int nid) +{ __meminit ? Yup. I'll get that. + /* +* There are two things that make this work: +* 1. The early_pfn...() functions are __init and +*use __initdata. If the system is SYSTEM_RUNNING, +*those functions and their data will still exist. +* 2. We also assume that all actual memory hotplug +*(as opposed to boot-time) calls to this are only +*for contiguous memory regions. With sparsemem, +*this guaranteed is easy because all sections are +*contiguous and we never online more than one +*section at a time. Boot-time memory can have holes +*anywhere. +*/ + if (system_state = SYSTEM_RUNNING) + return 1; Is there any value in codifying the assumption here, as a safety net? if (system_state = SYSTEM_RUNNING) #ifdef CONFIG_SPARSEMEM return 1; #else return 0; #endif Dunno. The normal case is that it isn't even called without memory hotplug. The only non-sparsemem memory hotplug is Keith's baby, and they lay out all of their mem_map at boot-time anyway, so I don't think this gets used. Keith, do you know whether you even do memmap_init_zone() at runtime, and if you ever have holes if/when you do? + if (!early_pfn_valid(pfn)) + return 0; + if (!early_pfn_in_nid(pfn, nid)) + return 0; + return 1; +} + /* * Initially all pages are reserved - free ones are freed * up by free_all_bootmem() once the early boot process is @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned unsigned long pfn; for (pfn = start_pfn; pfn end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) + if (!can_online_pfn_into_nid(pfn)) We're not passing nid here? No, because my ppc64 cross compiler has magically broken. :( Fixed patch appended. -- Dave I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. I did think twice about actually using SYSTEM_RUNNING because we moved away from it in other parts of memory hotplug, but those were actually for _allocations_ in favor of slab_is_available(), and we really don't care about the slab here. The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The valid part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. Signed-off-by: Dave Hansen [EMAIL PROTECTED] --- lxc-dave/init/main.c |4 lxc-dave/mm/page_alloc.c | 28 +--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* +* Memory hotplug requires that this system_state transition +* happen after free_initmem(). (see memmap_init_zone()) +*/ system_state = SYSTEM_RUNNING; numa_default_policy(); diff -puN
Re: [PATCH] Fix sparsemem on Cell
Dave Hansen wrote: I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. I did think twice about actually using SYSTEM_RUNNING because we moved away from it in other parts of memory hotplug, but those were actually for _allocations_ in favor of slab_is_available(), and we really don't care about the slab here. The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The valid part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. ie. that if hotplug is pushing this memory into a zone on a node it really does know what its doing, and its putting it in the right place. Obviously that code needs to be 'overlap' aware but thats ok for this interface. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. Signed-off-by: Dave Hansen [EMAIL PROTECTED] --- lxc-dave/init/main.c |4 lxc-dave/mm/page_alloc.c | 28 +--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* +* Memory hotplug requires that this system_state transition +* happer after free_initmem(). (see memmap_init_zone()) +*/ typo: happen system_state = SYSTEM_RUNNING; numa_default_policy(); diff -puN mm/page_alloc.c~sparsemem-fix mm/page_alloc.c --- lxc/mm/page_alloc.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/mm/page_alloc.c2006-12-15 08:49:53.0 -0800 @@ -2056,6 +2056,30 @@ static inline unsigned long wait_table_b #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)~((sizeof(long))-1)) +static int can_online_pfn_into_nid(unsigned long pfn, int nid) +{ __meminit ? + /* +* There are two things that make this work: +* 1. The early_pfn...() functions are __init and +*use __initdata. If the system is SYSTEM_RUNNING, +*those functions and their data will still exist. +* 2. We also assume that all actual memory hotplug +*(as opposed to boot-time) calls to this are only +*for contiguous memory regions. With sparsemem, +*this guaranteed is easy because all sections are +*contiguous and we never online more than one +*section at a time. Boot-time memory can have holes +*anywhere. +*/ + if (system_state = SYSTEM_RUNNING) + return 1; Is there any value in codifying the assumption here, as a safety net? if (system_state = SYSTEM_RUNNING) #ifdef CONFIG_SPARSEMEM return 1; #else return 0; #endif + if (!early_pfn_valid(pfn)) + return 0; + if (!early_pfn_in_nid(pfn, nid)) + return 0; + return 1; +} + /* * Initially all pages are reserved - free ones are freed * up by free_all_bootmem() once the early boot process is @@ -2069,9 +2093,7 @@ void __meminit memmap_init_zone(unsigned unsigned long pfn; for (pfn = start_pfn; pfn end_pfn; pfn++) { - if (!early_pfn_valid(pfn)) - continue; - if (!early_pfn_in_nid(pfn, nid)) + if (!can_online_pfn_into_nid(pfn)) We're not passing nid here? continue; page = pfn_to_page(pfn); set_page_links(page, zone, nid, pfn); _ Looks like this would do the job to me. -apw - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Fri, 2006-12-15 at 18:22 +0100, Michael Buesch wrote: On Friday 15 December 2006 17:53, Dave Hansen wrote: lxc-dave/init/main.c |4 lxc-dave/mm/page_alloc.c | 28 +--- 2 files changed, 29 insertions(+), 3 deletions(-) diff -puN init/main.c~sparsemem-fix init/main.c --- lxc/init/main.c~sparsemem-fix 2006-12-15 08:49:53.0 -0800 +++ lxc-dave/init/main.c2006-12-15 08:49:53.0 -0800 @@ -770,6 +770,10 @@ static int init(void * unused) free_initmem(); unlock_kernel(); mark_rodata_ro(); + /* +* Memory hotplug requires that this system_state transition +* happer after free_initmem(). (see memmap_init_zone()) s/happer/happens/ Other than that, can't this possibly race and crash here? I mean, it's not a big race window, but it can happen, no? That's a good point. Nice eye. There are three routes in here: boot-time init, an ACPI call, and a write to a sysfs file. Bootmem is taken care of. The write to a sysfs file can't happen yet because userspace isn't up. The only question would be about ACPI. I _guess_ an ACPI event could come in at any time, and could hit this race window. One other thought I had was to add an argument to memmap_init_zone() to indicate that the memory being fed to it was contiguous and did not need the validation checks. Anybody have thoughts on that? -- Dave - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
On Fri, 15 Dec 2006 09:24:00 -0800 Dave Hansen [EMAIL PROTECTED] wrote: ... I think the comments added say it pretty well, but I'll repeat it here. This fix is pretty similar in concept to the one that Arnd posted as a temporary workaround, but I've added a few comments explaining what the actual assumptions are, and improved it a wee little bit. The end goal here is to simply avoid calling the early_*() functions when it is _not_ early. Those functions stop working as soon as free_initmem() is called. system_state is set to SYSTEM_RUNNING just after free_initmem() is called, so it seems appropriate to use here. Would really prefer not to do this. system_state is evil. Its semantics are poorly-defined and if someone changes them a bit, or changes memory initialisation order, you get whacked. I think an mm-private flag with /*documented*/ semantics would be better. It's only a byte. +static int __meminit can_online_pfn_into_nid(unsigned long pfn, int nid) I spent some time trying to work out what can_online_pfn_into_nid can possibly mean and failed. We can bring a pfn online then turn it into a NID? Don't think so. We can bring this page online and allocate it to this node? Maybe. Perhaps if the function's role in the world was commented it would be clearer. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix sparsemem on Cell
The only other assumption is that all memory-hotplug-time pages given to memmap_init_zone() are valid and able to be onlined into any any zone after the system is running. The valid part is really just a question of whether or not a 'struct page' is there for the pfn, and *not* whether there is actual memory. Since all sparsemem sections have contiguous mem_map[]s within them, and we only memory hotplug entire sparsemem sections, we can be confident that this assumption will hold. As for the memory being in the right node, we'll assume tha memory hotplug is putting things in the right node. BTW, just that people know, what we are adding isn't even memory :-) We are calling __add_pages() to create struct page for the SPE local stores and register space as we use them later from a nopage() handler (and no, we can't use no_pfn just yet for various reasons, notably we need to handle races with unmap_mapping_ranges() and thus have the truncate logic in). Those pages, thus, must never be onlined. Ever. It might make sense to create a way to inform memory hotplug of that fact, but on the other hand, I wouldn't bother as I have a plan to get rid of those __add_pages() completely and work without struct page, maybe in a 2.6.21 timeframe. Ben. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/