Re: crash in kmem_cache_init
On (23/01/08 13:14), Olaf Hering didst pronounce:
> On Wed, Jan 23, Mel Gorman wrote:
>
> > Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
> > following patch against 2.6.24-rc8 please? It contains the debug information
> > that helped me figure out what was going wrong on the PPC64 machine here,
> > the revert and the !l3 checks (i.e. the two patches that made machines I
> > have access to work). Thanks
>
> It boots with your change.
> ...

Nice one! As the only addition here is debugging output, I can only assume
that the two patches were being booted in isolation instead of in combination
earlier. The two threads have been a little confused with hand-waving, so
that can easily happen. Looking at your log;

> early_node_map[1] active PFN ranges
> 1: 0 -> 892928

All memory on node 1.

> Online nodes
> o 0
> o 1
> Nodes with regular memory
> o 1
> Current running CPU 0 is associated with node 0
> Current node is 0

The running CPU is associated with node 0, so other than being node 1 instead
of node 2, your machine is similar to the one I had the problem on in terms
of memoryless nodes and CPU configuration.

> VFS: Cannot open root device "" or unknown-block(0,0)
> Please append a correct "root=" boot option; here are the available
> partitions:
> Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
> Rebooting in 1 seconds..

I see it failed to complete boot, but I'm going to assume this is a
relatively normal command-line, .config or initrd problem and not a
regression of some type. I'll post a patch suitable for pick-up shortly.
The two patches ran in combination with CONFIG_DEBUG_SLAB and compile-based
stress tests without difficulty, so hopefully there are no new surprises
hiding in the corners. Thanks Olaf.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Wed, Jan 23, Olaf Hering wrote:
> On Wed, Jan 23, Mel Gorman wrote:
>
> > Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
> > following patch against 2.6.24-rc8 please? It contains the debug information
> > that helped me figure out what was going wrong on the PPC64 machine here,
> > the revert and the !l3 checks (i.e. the two patches that made machines I
> > have access to work). Thanks
>
> It boots with your change.

This version of the patch boots ok for me:
Maybe I made a mistake with earlier patches, no idea.

---
 mm/slab.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1590,7 +1590,7 @@ void __init kmem_cache_init(void)
 	/* Replace the static kmem_list3 structures for the boot cpu */
 	init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
-	for_each_node_state(nid, N_NORMAL_MEMORY) {
+	for_each_online_node(nid) {
 		init_list(malloc_sizes[INDEX_AC].cs_cachep,
 			  &initkmem_list3[SIZE_AC + nid], nid);
@@ -1968,7 +1968,7 @@ static void __init set_up_list3s(struct
 {
 	int node;
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 		cachep->nodelists[node] = &initkmem_list3[index + node];
 		cachep->nodelists[node]->next_reap = jiffies +
 		    REAPTIMEOUT_LIST3 +
@@ -2099,7 +2099,7 @@ static int __init_refok setup_cpu_cache(
 			g_cpucache_up = PARTIAL_L3;
 		} else {
 			int node;
-			for_each_node_state(node, N_NORMAL_MEMORY) {
+			for_each_online_node(node) {
 				cachep->nodelists[node] =
 				    kmalloc_node(sizeof(struct kmem_list3),
 						GFP_KERNEL, node);
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);
 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *cache_alloc_node(struct
 	int x;
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);
 retry:
@@ -3815,7 +3824,7 @@ static int alloc_kmemlist(struct kmem_ca
 	struct array_cache *new_shared;
 	struct array_cache **new_alien = NULL;
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 		if (use_alien_caches) {
 			new_alien = alloc_alien_cache(node, cachep->limit);
Re: crash in kmem_cache_init
On Wed, Jan 23, Mel Gorman wrote:

> Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of the
> following patch against 2.6.24-rc8 please? It contains the debug information
> that helped me figure out what was going wrong on the PPC64 machine here,
> the revert and the !l3 checks (i.e. the two patches that made machines I
> have access to work). Thanks

It boots with your change.

boot: x
Please wait, loading kernel...
Allocated 00a0 bytes for kernel @ 0020
Elf64 kernel loaded...
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line: debug xmon=on panic=1 loglevel=8
memory layout at init:
  alloc_bottom : 00ac1000
  alloc_top    : 1000
  alloc_top_hi : da00
  rmo_top      : 1000
  ram_top      : da00
Looking for displays
found display   : /[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED], opening ... done
instantiating rtas at 0x0f6a1000 ... done
 : boot cpu 0002
 : starting cpu hw idx 0002... done
0004 : starting cpu hw idx 0004... done
0006 : starting cpu hw idx 0006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x00cc2000 -> 0x00cc34e4
Device tree struct 0x00cc4000 -> 0x00cd6000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #52 SMP Wed Jan 23 13:05:38 CET 2008
-
ppc64_pft_size= 0x1c
physicalMemorySize= 0xda00
htab_hash_mask= 0x1f
-
Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #52 SMP Wed Jan 23 13:05:38 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA        0 -> 892928
  Normal     892928 -> 892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
  1: 0 -> 892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line: debug xmon=on panic=1 loglevel=8
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.07 MHz
time_init: processor frequency = 2197.80 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
Online nodes
o 0
o 1
Nodes with regular memory
o 1
Current running CPU 0 is associated with node 0
Current node is 0
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 0
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 1
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 2
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 3
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 4
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 5
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 6
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 7
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 8
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 9
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 10
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 11
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 12
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 13
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 14
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 15
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 16
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 17
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 18
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 19
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 20
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 21
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 22
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 23
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 24
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 25
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 26
 o kmem_list3_init
kmem_cache_init Setting kmem_cache NULL 27
 o kmem_list3_
Re: crash in kmem_cache_init
On (23/01/08 08:58), Olaf Hering didst pronounce:
> On Tue, Jan 22, Christoph Lameter wrote:
>
> > > 0xc00fe018 is in setup_cpu_cache
> > > (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> > > 2106	BUG_ON(!cachep->nodelists[node]);
> > > 2107	kmem_list3_init(cachep->nodelists[node]);
> > > 2108	}
> > > 2109	}
> > > 2110	}
> >
> > if (cachep->nodelists[numa_node_id()])
> > 	return;
>
> Does not help.

Sorry this is dragging out. Can you post the full dmesg with loglevel=8 of
the following patch against 2.6.24-rc8 please? It contains the debug
information that helped me figure out what was going wrong on the PPC64
machine here, the revert and the !l3 checks (i.e. the two patches that made
machines I have access to work). Thanks

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-clean/mm/slab.c linux-2.6.24-rc8-015_debug_slab/mm/slab.c
--- linux-2.6.24-rc8-clean/mm/slab.c	2008-01-16 04:22:48.0 +
+++ linux-2.6.24-rc8-015_debug_slab/mm/slab.c	2008-01-23 10:44:36.0 +
@@ -348,6 +348,7 @@ static int slab_early_init = 1;
 static void kmem_list3_init(struct kmem_list3 *parent)
 {
+	printk(" o kmem_list3_init\n");
 	INIT_LIST_HEAD(&parent->slabs_full);
 	INIT_LIST_HEAD(&parent->slabs_partial);
 	INIT_LIST_HEAD(&parent->slabs_free);
@@ -1236,6 +1237,7 @@ static int __cpuinit cpuup_prepare(long
 	 * kmem_list3 and not this cpu's kmem_list3
 	 */
+	printk("cpuup_prepare %ld\n", cpu);
 	list_for_each_entry(cachep, &cache_chain, next) {
 		/*
 		 * Set up the size64 kmemlist for cpu before we can
@@ -1243,6 +1245,7 @@ static int __cpuinit cpuup_prepare(long
 		 * node has not already allocated this
 		 */
 		if (!cachep->nodelists[node]) {
+			printk(" o allocing %s %d\n", cachep->name, node);
 			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
 			if (!l3)
 				goto bad;
@@ -1256,6 +1259,7 @@ static int __cpuinit cpuup_prepare(long
 			 * protection here.
 			 */
 			cachep->nodelists[node] = l3;
+			printk(" o l3 setup\n");
 		}
 		spin_lock_irq(&cachep->nodelists[node]->list_lock);
@@ -1320,6 +1324,7 @@ static int __cpuinit cpuup_prepare(long
 	}
 	return 0;
 bad:
+	printk(" o bad\n");
 	cpuup_canceled(cpu);
 	return -ENOMEM;
 }
@@ -1405,6 +1410,7 @@ static void init_list(struct kmem_cache
 	spin_lock_init(&ptr->list_lock);
 	MAKE_ALL_LISTS(cachep, ptr, nodeid);
+	printk("init_list RESETTING %s node %d\n", cachep->name, nodeid);
 	cachep->nodelists[nodeid] = ptr;
 	local_irq_enable();
 }
@@ -1427,10 +1433,23 @@ void __init kmem_cache_init(void)
 		numa_platform = 0;
 	}
+	printk("Online nodes\n");
+	for_each_online_node(node)
+		printk("o %d\n", node);
+	printk("Nodes with regular memory\n");
+	for_each_node_state(node, N_NORMAL_MEMORY)
+		printk("o %d\n", node);
+	printk("Current running CPU %d is associated with node %d\n",
+		smp_processor_id(),
+		cpu_to_node(smp_processor_id()));
+	printk("Current node is %d\n",
+		numa_node_id());
+
 	for (i = 0; i < NUM_INIT_LISTS; i++) {
 		kmem_list3_init(&initkmem_list3[i]);
 		if (i < MAX_NUMNODES)
 			cache_cache.nodelists[i] = NULL;
+		printk("kmem_cache_init Setting %s NULL %d\n", cache_cache.name, i);
 	}
 	/*
@@ -1468,6 +1487,8 @@ void __init kmem_cache_init(void)
 	cache_cache.colour_off = cache_line_size();
 	cache_cache.array[smp_processor_id()] = &initarray_cache.cache;
 	cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE];
+	printk("kmem_cache_init Setting %s NULL %d\n", cache_cache.name, node);
+	printk("kmem_cache_init Setting %s initkmem_list3 %d\n", cache_cache.name, node);
 	/*
 	 * struct kmem_cache size depends on nr_node_ids, which
@@ -1590,7 +1611,7 @@ void __init kmem_cache_init(void)
 	/* Replace the static kmem_list3 structures for the boot cpu */
 	init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
-	for_each_node_state(nid, N_NORMAL_MEMORY) {
+	for_each_online_node(nid) {
 		init_list(malloc_sizes[INDEX_AC].cs_cachep,
 			  &initkmem_list3[SIZE_AC + nid], nid);
@@ -1968,11 +1989,13 @@ static void __init set_up_list3s(struct
 {
 	int node;
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	print
Re: crash in kmem_cache_init
On Wed, Jan 23, Pekka Enberg wrote:
> Hi Christoph,
>
> On Jan 23, 2008 1:18 AM, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> > My patch is useless (fascinating history of the changelog there through).
> > fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that
> > alloc_pages_node() will try to allocate on the current node but fallback
> > to neighboring node if nothing is there
>
> Sure, but I was referring to the scenario where current node _has_
> pages available but no ->nodelists. Olaf, did you try it?

Does not help.
Re: crash in kmem_cache_init
Hi Christoph,

On Jan 23, 2008 1:18 AM, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> My patch is useless (fascinating history of the changelog there through).
> fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that
> alloc_pages_node() will try to allocate on the current node but fallback
> to neighboring node if nothing is there

Sure, but I was referring to the scenario where current node _has_
pages available but no ->nodelists. Olaf, did you try it?

Pekka
Re: crash in kmem_cache_init
On Tue, Jan 22, Christoph Lameter wrote:

> > 0xc00fe018 is in setup_cpu_cache
> > (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> > 2106	BUG_ON(!cachep->nodelists[node]);
> > 2107	kmem_list3_init(cachep->nodelists[node]);
> > 2108	}
> > 2109	}
> > 2110	}
>
> if (cachep->nodelists[numa_node_id()])
> 	return;

Does not help.

Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #48 SMP Wed Jan 23 08:54:23 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA        0 -> 892928
  Normal     892928 -> 892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
  1: 0 -> 892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line: debug xmon=on panic=1
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.07 MHz
time_init: processor frequency = 2197.80 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
Kernel panic - not syncing: kmem_cache_create(): failed to create slab `size-32(DMA)'
Rebooting in 1 seconds..

---
 mm/slab.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1590,7 +1590,7 @@ void __init kmem_cache_init(void)
 	/* Replace the static kmem_list3 structures for the boot cpu */
 	init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);
-	for_each_node_state(nid, N_NORMAL_MEMORY) {
+	for_each_online_node(nid) {
 		init_list(malloc_sizes[INDEX_AC].cs_cachep,
 			  &initkmem_list3[SIZE_AC + nid], nid);
@@ -1968,7 +1968,7 @@ static void __init set_up_list3s(struct
 {
 	int node;
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 		cachep->nodelists[node] = &initkmem_list3[index + node];
 		cachep->nodelists[node]->next_reap = jiffies +
 		    REAPTIMEOUT_LIST3 +
@@ -2108,6 +2108,8 @@ static int __init_refok setup_cpu_cache(
 			}
 		}
 	}
+	if (!cachep->nodelists[numa_node_id()])
+		return -ENODEV;
 	cachep->nodelists[numa_node_id()]->next_reap =
 			jiffies + REAPTIMEOUT_LIST3 +
 			((unsigned long)cachep) % REAPTIMEOUT_LIST3;
@@ -2775,6 +2777,11 @@ static int cache_grow(struct kmem_cache
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);
 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3324,10 @@ static void *cache_alloc_node(struct
 	int x;
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);
 retry:
@@ -3815,7 +3826,7 @@ static int alloc_kmemlist(struct kmem_ca
 	struct array_cache *new_shared;
 	struct array_cache **new_alien = NULL;
-	for_each_node_state(node, N_NORMAL_MEMORY) {
+	for_each_online_node(node) {
 		if (use_alien_caches) {
 			new_alien = alloc_alien_cache(node, cachep->limit);
Re: crash in kmem_cache_init
On Tue, 22 Jan 2008, Christoph Lameter wrote:

> But I doubt that this is it. The fallback logic was added later and it
> worked fine.

My patch is useless (fascinating history of the changelog there though).
fallback_alloc calls kmem_getpages without GFP_THISNODE. This means that
alloc_pages_node() will try to allocate on the current node but fall back
to a neighboring node if nothing is there.
Re: crash in kmem_cache_init
On Tue, 22 Jan 2008, Mel Gorman wrote:

> Rather it should be 2. I'll admit the physical setup of this machine is
> less than ideal but clearly it's something that can happen even if
> it's a bad idea.

Ok. Let's hope that Pekka's find does the trick. But this would mean that
fallback gets memory from node 2 from the page allocator. Then fallback
alloc is going to try to insert it into the l3 of node 2 which is not
there yet. So another oops. Sigh.
Re: crash in kmem_cache_init
Hi,

Mel Gorman wrote:
> Faulting instruction address: 0xc03c8c00
> cpu 0x0: Vector: 300 (Data Access) at [c05c3840]
>     pc: c03c8c00: __lock_text_start+0x20/0x88
>     lr: c00dadec: .cache_grow+0x7c/0x338
>     sp: c05c3ac0
>    msr: 80009032
>    dar: 40
>  dsisr: 4000
>   current = 0xc0500f10
>   paca    = 0xc0501b80
>     pid   = 0, comm = swapper
> enter ? for help
> [c05c3b40] c00dadec .cache_grow+0x7c/0x338
> [c05c3c00] c00db54c .fallback_alloc+0x1c0/0x224
> [c05c3cb0] c00db958 .kmem_cache_alloc+0xe0/0x14c
> [c05c3d50] c00d .kmem_cache_create+0x230/0x4cc
> [c05c3e30] c04c05f4 .kmem_cache_init+0x310/0x640
> [c05c3ee0] c049f8d8 .start_kernel+0x304/0x3fc
> [c05c3f90] c0008594 .start_here_common+0x54/0xc0
> 0:mon>

I mentioned this already but received no response (maybe I am missing
something totally obvious here): when we call fallback_alloc() because the
current node has ->nodelists set to NULL, we end up calling kmem_getpages()
with -1 as the node id, which is then translated to numa_node_id() by
alloc_pages_node(). But the reason we called fallback_alloc() in the first
place is because numa_node_id() doesn't have a ->nodelist, which makes
cache_grow() oops.

Pekka
Re: crash in kmem_cache_init
On Wed, 23 Jan 2008, Pekka Enberg wrote:

> When we call fallback_alloc() because the current node has ->nodelists set to
> NULL, we end up calling kmem_getpages() with -1 as the node id which is then
> translated to numa_node_id() by alloc_pages_node. But the reason we called
> fallback_alloc() in the first place is because numa_node_id() doesn't have a
> ->nodelist which makes cache_grow() oops.

Right, if nodeid == -1 then we need to call alloc_pages... Essentially a
revert of 50c85a19e7b3928b5b5188524c44ffcbacdd4e35 from 2005. But I doubt
that this is it. The fallback logic was added later and it worked fine.

---
 mm/slab.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/slab.c
===
--- linux-2.6.orig/mm/slab.c	2008-01-22 15:05:26.185452369 -0800
+++ linux-2.6/mm/slab.c	2008-01-22 15:05:59.301637009 -0800
@@ -1668,7 +1668,11 @@ static void *kmem_getpages(struct kmem_c
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
 		flags |= __GFP_RECLAIMABLE;
-	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+	if (nodeid == -1)
+		page = alloc_pages(flags, cachep->gfporder);
+	else
+		page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+
 	if (!page)
 		return NULL;
Re: crash in kmem_cache_init
On (22/01/08 14:57), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
>
> > > > Whatever this was a problem fixed in the past or not, it's broken again now
> > > > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > > > at boot-time that would also fix this problem in a way that doesn't
> > > > affect runtime (like altering cache_grow in my patch does).
> > >
> > > The dropping of GFP_THISNODE has the same effect as your patch.
> >
> > The dropping of it totally? If so, this patch might fix a boot but it'll
> > potentially be a performance regression on NUMA machines that only have
> > nodes with memory, right?
>
> No the dropping during early allocations.

We can live with that if the machine otherwise survives during tests. They
are kicked off at the moment with CONFIG_SLAB_DEBUG set but the point is
moot if the patch doesn't work for Olaf. Am still waiting to hear if the
two patches in combination work for him.

> > o 0
> > o 2
> > Nodes with regular memory
> > o 2
> > Current running CPU 0 is associated with node 0
> > Current node is 0
> >
> > So node 2 has regular memory but it's trying to use node 0 at a glance.
> > I've attached the patch I used against 2.6.24-rc8. It includes the revert.
>
> We need the current processor to be attached to a node that has
> memory. We cannot fall back that early because the structures for the
> other nodes do not exist yet.

Or bodge it early in the boot process so that a node with memory is
always used.

> > Online nodes
> > o 0
> > o 2
> > Nodes with regular memory
> > o 2
> > Current running CPU 0 is associated with node 0
> > Current node is 0
> > o kmem_list3_init
>
> This needs to be node 2.

Rather it should be 2. I'll admit the physical setup of this machine is
less than ideal but clearly it's something that can happen even if
it's a bad idea.

> > [c05c3b40] c00dadec .cache_grow+0x7c/0x338
> > [c05c3c00] c00db54c .fallback_alloc+0x1c0/0x224
>
> Fallback during bootstrap.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
Re: crash in kmem_cache_init
On Tue, 22 Jan 2008, Mel Gorman wrote:

> > > Whatever this was a problem fixed in the past or not, it's broken again now
> > > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > > at boot-time that would also fix this problem in a way that doesn't
> > > affect runtime (like altering cache_grow in my patch does).
> >
> > The dropping of GFP_THISNODE has the same effect as your patch.
>
> The dropping of it totally? If so, this patch might fix a boot but it'll
> potentially be a performance regression on NUMA machines that only have
> nodes with memory, right?

No the dropping during early allocations.

> o 0
> o 2
> Nodes with regular memory
> o 2
> Current running CPU 0 is associated with node 0
> Current node is 0
>
> So node 2 has regular memory but it's trying to use node 0 at a glance.
> I've attached the patch I used against 2.6.24-rc8. It includes the revert.

We need the current processor to be attached to a node that has
memory. We cannot fall back that early because the structures for the
other nodes do not exist yet.

> Online nodes
> o 0
> o 2
> Nodes with regular memory
> o 2
> Current running CPU 0 is associated with node 0
> Current node is 0
> o kmem_list3_init

This needs to be node 2.

> [c05c3b40] c00dadec .cache_grow+0x7c/0x338
> [c05c3c00] c00db54c .fallback_alloc+0x1c0/0x224

Fallback during bootstrap.
Re: crash in kmem_cache_init
On (22/01/08 13:34), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
>
> > > After you reverted the slab memoryless node patch there should be per node
> > > structures created for node 0 unless the node is marked offline. Is it? If
> > > so then you are booting a cpu that is associated with an offline node.
> >
> > I'll roll a patch that prints out the online states before startup and
> > see what it looks like.
>
> Ok.

Great. The dmesg output is below.

> > > > Can you see a better solution than this?
> > >
> > > Well this means that bootstrap will work by introducing foreign objects
> > > into the per cpu queue (should only hold per cpu objects). They will
> > > later be consumed and then the queues will contain the right objects so
> > > the effect of the patch is minimal.
> >
> > By minimal, do you mean that you expect it to break in some other
> > respect later or minimal as in "this is bad but should have no
> > adverse impact".
>
> Should not have any adverse impact after the objects from the cpu queue
> have been consumed. If the cache_reaper tries to shift objects back
> from the per cpu queue into slabs then BUG_ONs may be triggered. Make sure
> you run the tests with full debugging please.

I am not running a full range of tests at the moment. Just getting boot
first. I'll queue up a range of tests to run with DEBUG on now but it'll
be the morning before I have the results.

> > Whatever this was a problem fixed in the past or not, it's broken again now
> > :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> > at boot-time that would also fix this problem in a way that doesn't
> > affect runtime (like altering cache_grow in my patch does).
>
> The dropping of GFP_THISNODE has the same effect as your patch.

The dropping of it totally? If so, this patch might fix a boot but it'll
potentially be a performance regression on NUMA machines that only have
nodes with memory, right?

> Objects from another node get into the per cpu queue. And on free we
> assume that per cpu queue objects are from the local node. If debug is on
> then we check that with BUG_ONs.

The interesting parts of the dmesg output are

Online nodes
o 0
o 2
Nodes with regular memory
o 2
Current running CPU 0 is associated with node 0
Current node is 0

So node 2 has regular memory but it's trying to use node 0 at a glance.
I've attached the patch I used against 2.6.24-rc8. It includes the revert.

Here is the full output

Please wait, loading kernel...
Elf64 kernel loaded...
Loading ramdisk...
ramdisk loaded at 0240, size: 1192 Kbytes
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line: ro console=hvc0 autobench_args: root=/dev/sda6 ABAT:1201041303 loglevel=8
memory layout at init:
  alloc_bottom : 0252a000
  alloc_top    : 0800
  alloc_top_hi : 0001
  rmo_top      : 0800
  ram_top      : 0001
Looking for displays
instantiating rtas at 0x077d9000 ... done
 : boot cpu 0002
 : starting cpu hw idx 0002... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0262b000 -> 0x0262c1d3
Device tree struct 0x0262d000 -> 0x02635000
Calling quiesce ...
returning from prom_init
Partition configured for 4 cpus.
Starting Linux PPC64 #1 SMP Tue Jan 22 17:15:48 EST 2008
-
ppc64_pft_size= 0x1a
physicalMemorySize= 0x1
htab_hash_mask= 0x7
-
Linux version 2.6.24-rc8-autokern1 ([EMAIL PROTECTED]) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Tue Jan 22 17:15:48 EST 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 7168 bytes
Zone PFN ranges:
  DMA        0 -> 1048576
  Normal     1048576 -> 1048576
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
  2: 0 -> 1048576
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 1034240
Policy zone: DMA
Kernel command line: ro console=hvc0 autobench_args: root=/dev/sda6 ABAT:1201041303 loglevel=8
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 238.059000 MHz
time_init: processor frequency = 1904.472000 MHz
clocksource: timebase mult[10cd746] shift[22] registered
clockevent: decrementer mult[3cf1] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg0] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order
Re: crash in kmem_cache_init
On Tue, 22 Jan 2008, Olaf Hering wrote:

> It crashes now in a different way if the patch below is applied:

Yup, no l3 structure for the current node. We are early in bootstrap. You
could just check if the l3 is there and if not just skip starting the
reaper? This will be redone later anyways. Not sure if this will solve
all your issues though. An l3 for the current node that we are booting
on needs to be created early on for SLAB bootstrap to succeed. AFAICT
SLUB doesn't care and simply uses whatever the page allocator gives it
for the cpu slab. We may have gotten there because you only tested with
SLUB recently and thus changes got in that broke SLAB boot assumptions.

> 0xc00fe018 is in setup_cpu_cache
> (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> 2106	BUG_ON(!cachep->nodelists[node]);
> 2107	kmem_list3_init(cachep->nodelists[node]);
> 2108	}
> 2109	}
> 2110	}

	if (cachep->nodelists[numa_node_id()])
		return;

> 2111	cachep->nodelists[numa_node_id()]->next_reap =
> 2112		jiffies + REAPTIMEOUT_LIST3 +
> 2113		((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> 2114
> 2115	cpu_cache_get(cachep)->avail = 0;
Re: crash in kmem_cache_init
On 1/22/08, Olaf Hering <[EMAIL PROTECTED]> wrote: > On Tue, Jan 22, Mel Gorman wrote: > > > http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch > > .. Can you please check on your machine if it fixes your problem? > > It does not fix or change the nature of the crash. > > > Olaf, please confirm whether you need the patch below as well as the > > revert to make your machine boot. > > It crashes now in a different way if the patch below is applied: Was this with the revert Mel mentioned applied as well? I get the feeling both patches are needed to fix up the memoryless SLAB issue. > Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 > 20070115 (prerelease) (SUSE Linux)) #43 SMP Tue Jan 22 22:39:05 CET 2008 > early_node_map[1] active PFN ranges > 1:0 -> 892928 > Unable to handle kernel paging request for data at address 0x0058 > Faulting instruction address: 0xc00fe018 > cpu 0x0: Vector: 300 (Data Access) at [c075bac0] > pc: c00fe018: .setup_cpu_cache+0x184/0x1f4 > lr: c00fdfa8: .setup_cpu_cache+0x114/0x1f4 > sp: c075bd40 >msr: 80009032 >dar: 58 > dsisr: 4200 > current = 0xc0665a50 > paca= 0xc0666380 > pid = 0, comm = swapper > enter ? for help > [c075bd40] c00fb368 .kmem_cache_create+0x3c0/0x478 > (unreliable) > [c075be20] c05e6780 .kmem_cache_init+0x284/0x4f4 > [c075bee0] c05bf8ec .start_kernel+0x2f8/0x3fc > [c075bf90] c0008590 .start_here_common+0x60/0xd0 > 0:mon> > > 0xc00fe018 is in setup_cpu_cache > (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111). > 2106BUG_ON(!cachep->nodelists[node]); > 2107 > kmem_list3_init(cachep->nodelists[node]); I might be barking up the wrong tree, but this block above is supposed to set up the cachep->nodeslists[*] that are used immediately below. But if the loop wasn't changed from N_NORMAL_MEMORY to N_ONLINE or whatever, you might get a bad access right below for node 0 that has no memory, if that's the node we're running on... 
> 2108	}
> 2109	}
> 2110	}
> 2111	cachep->nodelists[numa_node_id()]->next_reap =
> 2112		jiffies + REAPTIMEOUT_LIST3 +
> 2113		((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> 2114
> 2115	cpu_cache_get(cachep)->avail = 0;

Thanks,
Nish

___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Tue, Jan 22, Mel Gorman wrote: > http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch > .. Can you please check on your machine if it fixes your problem? It does not fix or change the nature of the crash. > Olaf, please confirm whether you need the patch below as well as the > revert to make your machine boot. It crashes now in a different way if the patch below is applied: Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #43 SMP Tue Jan 22 22:39:05 CET 2008 [boot]0012 Setup Arch EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 8192 bytes Zone PFN ranges: DMA 0 -> 892928 Normal 892928 -> 892928 Movable zone start PFN for each node early_node_map[1] active PFN ranges 1:0 -> 892928 Could not find start_pfn for node 0 [boot]0015 Setup Done Built 2 zonelists in Node order, mobility grouping on. Total pages: 880720 Policy zone: DMA Kernel command line: debug xmon=on panic=1 [boot]0020 XICS Init xics: no ISA interrupt controller [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 32768 bytes) time_init: decrementer frequency = 275.07 MHz time_init: processor frequency = 2197.80 MHz clocksource: timebase mult[e8ab05] shift[22] registered clockevent: decrementer mult[466a] shift[16] cpu[0] Console: colour dummy device 80x25 console handover: boot [udbg-1] -> real [hvc0] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) freeing bootmem node 1 Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init) Unable to handle kernel paging request for data at address 0x0058 Faulting instruction address: 0xc00fe018 cpu 0x0: Vector: 300 (Data Access) at [c075bac0] pc: c00fe018: .setup_cpu_cache+0x184/0x1f4 lr: c00fdfa8: .setup_cpu_cache+0x114/0x1f4 sp: c075bd40 msr: 80009032 dar: 58 dsisr: 4200 current = 0xc0665a50 paca= 0xc0666380 pid = 0, 
comm = swapper
enter ? for help
[c075bd40] c00fb368 .kmem_cache_create+0x3c0/0x478 (unreliable)
[c075be20] c05e6780 .kmem_cache_init+0x284/0x4f4
[c075bee0] c05bf8ec .start_kernel+0x2f8/0x3fc
[c075bf90] c0008590 .start_here_common+0x60/0xd0
0:mon>

0xc00fe018 is in setup_cpu_cache
(/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
2106		BUG_ON(!cachep->nodelists[node]);
2107		kmem_list3_init(cachep->nodelists[node]);
2108	}
2109	}
2110	}
2111	cachep->nodelists[numa_node_id()]->next_reap =
2112		jiffies + REAPTIMEOUT_LIST3 +
2113		((unsigned long)cachep) % REAPTIMEOUT_LIST3;
2114
2115	cpu_cache_get(cachep)->avail = 0;

___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Tue, 22 Jan 2008, Mel Gorman wrote:
> > After you reverted the slab memoryless node patch there should be per node
> > structures created for node 0 unless the node is marked offline. Is it? If
> > so then you are booting a cpu that is associated with an offline node.
>
> I'll roll a patch that prints out the online states before startup and
> see what it looks like.

Ok. Great.

> > > Can you see a better solution than this?
> >
> > Well this means that bootstrap will work by introducing foreign objects
> > into the per cpu queue (should only hold per cpu objects). They will
> > later be consumed and then the queues will contain the right objects so
> > the effect of the patch is minimal.
>
> By minimal, do you mean that you expect it to break in some other
> respect later, or minimal as in "this is bad but should have no
> adverse impact"?

It should not have any adverse impact after the objects from the cpu queue have been consumed. If the cache_reaper tries to shift objects back from the per cpu queue into slabs then BUG_ONs may be triggered. Make sure you run the tests with full debugging please.

> Whether this was a problem fixed in the past or not, it's broken again now
> :( . It's possible that there is a __GFP_THISNODE that can be dropped early
> at boot-time that would also fix this problem in a way that doesn't
> affect runtime (like altering cache_grow in my patch does).

The dropping of GFP_THISNODE has the same effect as your patch. Objects from another node get into the per cpu queue. And on free we assume that per cpu queue objects are from the local node. If debug is on then we check that with BUG_ONs.

___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On (22/01/08 12:11), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
> > Christoph/Pekka, this patch is papering over the problem and something
> > more fundamental may be going wrong. The crash occurs because l3 is NULL
> > and the cache is kmem_cache so this is early in the boot process. It is
> > selecting l3 based on node 2 which is correct in terms of available memory
> > but it initialises the lists on node 0 because that is the node the CPUs are
> > located. Hence later it uses an uninitialised nodelists and BLAM.
>
> Would it be possible to run the bootstrap on a cpu that has a
> node with memory associated to it?

Not in the way the machine is currently configured. All the CPUs appear to be on a node with no memory. It's best to assume I cannot get the machine reconfigured (which just hides the bug anyway). Physically, it's thousands of miles away so I can't do the work. I can get lab support to do the job but that will take a fair while and, at the end of the day, it doesn't tell us a lot. We know that other PPC64 machines work so it's not a general problem.

> I believe we had the same situation
> last year when GFP_THISNODE was introduced?

It feels vaguely familiar but I don't recall it in sufficient detail to recognise whether this is the same problem or not.

> After you reverted the slab memoryless node patch there should be per node
> structures created for node 0 unless the node is marked offline. Is it? If
> so then you are booting a cpu that is associated with an offline node.

I'll roll a patch that prints out the online states before startup and see what it looks like.

> > Can you see a better solution than this?
>
> Well this means that bootstrap will work by introducing foreign objects
> into the per cpu queue (should only hold per cpu objects). They will
> later be consumed and then the queues will contain the right objects so
> the effect of the patch is minimal.

By minimal, do you mean that you expect it to break in some other respect later, or minimal as in "this is bad but should have no adverse impact"?

> I thought we fixed the similar situation last year by dropping
> GFP_THISNODE for some allocations?

Whether this was a problem fixed in the past or not, it's broken again now :( . It's possible that there is a __GFP_THISNODE that can be dropped early at boot-time that would also fix this problem in a way that doesn't affect runtime (like altering cache_grow in my patch does).

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Tue, 22 Jan 2008, Mel Gorman wrote: > Christoph/Pekka, this patch is papering over the problem and something > more fundamental may be going wrong. The crash occurs because l3 is NULL > and the cache is kmem_cache so this is early in the boot process. It is > selecting l3 based on node 2 which is correct in terms of available memory > but it initialises the lists on node 0 because that is the node the CPUs are > located. Hence later it uses an uninitialised nodelists and BLAM. Relevant > parts of the log for seeing the memoryless nodes in relation to CPUs is; Would it be possible to run the bootstrap on a cpu that has a node with memory associated to it? I believe we had the same situation last year when GFP_THISNODE was introduced? After you reverted the slab memoryless node patch there should be per node structures created for node 0 unless the node is marked offline. Is it? If so then you are booting a cpu that is associated with an offline node. > Can you see a better solution than this? Well this means that bootstrap will work by introducing foreign objects into the per cpu queue (should only hold per cpu objects). They will later be consumed and then the queues will contain the right objects so the effect of the patch is minimal. I thought we fixed the similar situation last year by dropping GFP_THISNODE for some allocations? ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On (18/01/08 23:57), Olaf Hering didst pronounce:
> On Fri, Jan 18, Christoph Lameter wrote:
> > Could you try this patch?
>
> Does not help, same crash.

Hi Olaf,

It was suggested this problem was the same as another slab-related boot problem that was fixed for 2.6.24 by reverting a change. This fix can be found at http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch . Can you please check on your machine if it fixes your problem? I am 99% certain it will *not* fix your problem because there were two bugs, not one as previously believed.

On two test machines here, this kmem_cache_init problem still happens even with the revert which fixed a third machine. I was delayed in testing because these boxen were unavailable from Friday until yesterday evening (a stellar display of timing). It was missed on TKO because it was SLAB-specific and those machines were testing SLUB. I found that the patch below was necessary to fix the problem. Olaf, please confirm whether you need the patch below as well as the revert to make your machine boot.

Christoph/Pekka, this patch is papering over the problem and something more fundamental may be going wrong. The crash occurs because l3 is NULL and the cache is kmem_cache so this is early in the boot process. It is selecting l3 based on node 2 which is correct in terms of available memory but it initialises the lists on node 0 because that is the node the CPUs are located. Hence later it uses an uninitialised nodelists and BLAM. The relevant parts of the log for seeing the memoryless nodes in relation to CPUs are:

early_node_map[1] active PFN ranges
2:0 -> 1048576
Processor 1 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[2]
Processor 2 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[3]
Processor 3 found.
Brought up 4 CPUs
Node 0 CPUs: 0-3
Node 2 CPUs:

Can you see a better solution than this?

Recent changes to how slab operates mean a situation can occur on systems with memoryless nodes whereby the nodeid used when growing the slab does not map to the correct kmem_list3. The following patch adds the necessary check for the indicated preferred nodeid and, if it is bogus, uses numa_node_id() instead.

Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
---
 mm/slab.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c
--- linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c	2008-01-22 17:46:32.0 +
+++ linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c	2008-01-22 18:42:53.0 +
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);
 
 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);
 
 retry:

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Thu, 17 Jan 2008, Olaf Hering wrote: > On Thu, Jan 17, Olaf Hering wrote: > > > Since -mm boots further, what patch should I try? > > rc8-mm1 crashes as well, l3 passed to reap_alien() is NULL. Sigh. It looks like we need alien cache structures in some cases for nodes that have no memory. We must allocate structures for all nodes regardless if they have allocatable memory or not. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Fri, 18 Jan 2008, Olaf Hering wrote:
> calls cache_grow with nodeid 0
> [c075bbd0] [c00f82d0] .cache_alloc_refill+0x234/0x2c0
>
> calls cache_grow with nodeid 0
> [c075bbe0] [c00f7f38] .cache_alloc_node+0x17c/0x1e8
>
> calls cache_grow with nodeid 1
> [c075bbe0] [c00f7d68] .fallback_alloc+0x1a0/0x1f4

Okay, that makes sense. You have no node 0 with normal memory but the node assigned to the executing processor is zero (correct?). Thus it needs to fall back to node 1 and that is not possible during bootstrap.

You need to run kmem_cache_init() on a cpu that is on a node with memory. Or we need to revert the patch, which would again allocate control structures for all online nodes regardless of whether they have memory.

Does reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 change the situation? (However, we tried this on the other thread without success.)

___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Fri, Jan 18, Christoph Lameter wrote: > Could you try this patch? Does not help, same crash. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Fri, 18 Jan 2008, Christoph Lameter wrote:
> Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support HIGHMEM

If !CONFIG_HIGHMEM then

enum node_states {
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,		/* The node has regular or high memory */
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif

So

	for_each_online_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);
		free_area_init_node(nid, pgdat, NULL,
				find_min_pfn_for_node(nid), NULL);

		/* Any memory on that node */
		if (pgdat->node_present_pages)
			node_set_state(nid, N_HIGH_MEMORY);
			                         ^^^ sets N_NORMAL_MEMORY
		check_for_regular_memory(pgdat);
	}

___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On 1/18/08, Christoph Lameter <[EMAIL PROTECTED]> wrote: > Could you try this patch? > > Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support > HIGHMEM > > It seems that we only scan through zones to set N_NORMAL_MEMORY only if > CONFIG_HIGHMEM and CONFIG_NUMA are set. We need to set > N_NORMAL_MEMORY > in the !CONFIG_HIGHMEM case. I'm testing this exact patch right now on the machine Mel saw the issues with. Thanks, Nish ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
Could you try this patch?

Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support HIGHMEM

It seems that we scan through zones to set N_NORMAL_MEMORY only if CONFIG_HIGHMEM and CONFIG_NUMA are set. We need to set N_NORMAL_MEMORY in the !CONFIG_HIGHMEM case as well.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2008-01-18 14:08:41.0 -0800
+++ linux-2.6/mm/page_alloc.c	2008-01-18 14:13:34.0 -0800
@@ -3812,7 +3812,6 @@ restart:
 /* Any regular memory on that node ? */
 static void check_for_regular_memory(pg_data_t *pgdat)
 {
-#ifdef CONFIG_HIGHMEM
 	enum zone_type zone_type;
 
 	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
@@ -3820,7 +3819,6 @@ static void check_for_regular_memory(pg_
 		if (zone->present_pages)
 			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
 	}
-#endif
 }
 
 /**

___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Fri, 18 Jan 2008, Mel Gorman wrote:
> static void check_for_regular_memory(pg_data_t *pgdat)
> {
> #ifdef CONFIG_HIGHMEM
> 	enum zone_type zone_type;
>
> 	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
> 		struct zone *zone = &pgdat->node_zones[zone_type];
> 		if (zone->present_pages)
> 			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
> 	}
> #endif
> }
>
> i.e. go through the other zones and if any of them have memory, set
> N_NORMAL_MEMORY. But... it only does this on CONFIG_HIGHMEM which on
> PPC64 is not going to be set so N_NORMAL_MEMORY never gets set on
> POWER. That sounds bad.

Argh. We may need to do a node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY) in the !HIGHMEM case.

> and one of them is in kmem_cache_init(). That seems very significant.
> Christoph, can you think of possibilities of where N_NORMAL_MEMORY not
> being set would cause trouble for slab?

Yes. That results in the per node structures not being created and thus l3 == NULL. Explains our failures.

___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On (18/01/08 10:47), Christoph Lameter didst pronounce:
> On Thu, 17 Jan 2008, Olaf Hering wrote:
> > early_node_map[1] active PFN ranges
> > 1:0 -> 892928
> > Could not find start_pfn for node 0
>
> Corrupted min_pfn?

Doubtful. Node 0 has no memory but it is still being initialised. Still, I looked closer at what is going on when that message gets displayed and I see this in free_area_init_nodes():

	for_each_online_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);
		free_area_init_node(nid, pgdat, NULL,
				find_min_pfn_for_node(nid), NULL);

		/* Any memory on that node */
		if (pgdat->node_present_pages)
			node_set_state(nid, N_HIGH_MEMORY);
		check_for_regular_memory(pgdat);
	}

This "Any memory on that node" thing is new and it says if there is any memory on the node, set N_HIGH_MEMORY. Fine I guess, I haven't tracked these changes closely. It calls check_for_regular_memory() which looks like

static void check_for_regular_memory(pg_data_t *pgdat)
{
#ifdef CONFIG_HIGHMEM
	enum zone_type zone_type;

	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
		struct zone *zone = &pgdat->node_zones[zone_type];
		if (zone->present_pages)
			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
	}
#endif
}

i.e. go through the other zones and if any of them have memory, set N_NORMAL_MEMORY. But... it only does this on CONFIG_HIGHMEM which on PPC64 is not going to be set so N_NORMAL_MEMORY never gets set on POWER. That sounds bad.

[EMAIL PROTECTED]:~/git/linux-2.6/mm$ grep -n N_NORMAL_MEMORY slab.c
1593:	for_each_node_state(nid, N_NORMAL_MEMORY) {
1971:	for_each_node_state(node, N_NORMAL_MEMORY) {
2102:	for_each_node_state(node, N_NORMAL_MEMORY) {
3818:	for_each_node_state(node, N_NORMAL_MEMORY) {

and one of them is in kmem_cache_init(). That seems very significant. Christoph, can you think of possibilities of where N_NORMAL_MEMORY not being set would cause trouble for slab?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Thu, 17 Jan 2008, Olaf Hering wrote: > Normal 892928 -> 892928 > Movable zone start PFN for each node > early_node_map[1] active PFN ranges > 1:0 -> 892928 > Could not find start_pfn for node 0 We only have a single node that is node 1? And then we initialize nodes 0 to 3? > Memory: 3496633k/3571712k available (6188k kernel code, 75080k reserved, > 1324k data, 1220k bss, 304k init) > cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 0 > l3 c05fddf0 > cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 1 > l3 c05fddf0 > cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 2 > l3 c05fddf0 > cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 3 > l3 c05fddf0 ??? ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Fri, 18 Jan 2008, Olaf Hering wrote: > calls cache_grow with nodeid 0 > > [c075bbd0] [c00f82d0] .cache_alloc_refill+0x234/0x2c0 > calls cache_grow with nodeid 0 > > [c075bbe0] [c00f7f38] .cache_alloc_node+0x17c/0x1e8 > > calls cache_grow with nodeid 1 > > [c075bbe0] [c00f7d68] .fallback_alloc+0x1a0/0x1f4 Hmmm... fallback_alloc should not be called during bootstrap. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Thu, 17 Jan 2008, Olaf Hering wrote: > early_node_map[1] active PFN ranges > 1:0 -> 892928 > Could not find start_pfn for node 0 Corrupted min_pfn? ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Thu, Jan 17, Olaf Hering wrote: > On Thu, Jan 17, Christoph Lameter wrote: > > > On Thu, 17 Jan 2008, Olaf Hering wrote: > > > > > The patch does not help. > > > > Duh. We need to know more about the problem. > > cache_grow is called from 3 places. The third call has cleared l3 for > some reason. Typo in debug patch. calls cache_grow with nodeid 0 > [c075bbd0] [c00f82d0] .cache_alloc_refill+0x234/0x2c0 calls cache_grow with nodeid 0 > [c075bbe0] [c00f7f38] .cache_alloc_node+0x17c/0x1e8 calls cache_grow with nodeid 1 > [c075bbe0] [c00f7d68] .fallback_alloc+0x1a0/0x1f4 ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Thu, Jan 17, Christoph Lameter wrote: > On Thu, 17 Jan 2008, Olaf Hering wrote: > > > The patch does not help. > > Duh. We need to know more about the problem. cache_grow is called from 3 places. The third call has cleared l3 for some reason. Allocated 00a0 bytes for kernel @ 0020 Elf64 kernel loaded... OF stdout device is: /vdevice/[EMAIL PROTECTED] Hypertas detected, assuming LPAR ! command line: xmon=on sysrq=1 debug panic=1 memory layout at init: alloc_bottom : 00ac1000 alloc_top: 1000 alloc_top_hi : da00 rmo_top : 1000 ram_top : da00 Looking for displays found display : /[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED], opening ... done instantiating rtas at 0x0f6a1000 ... done : boot cpu 0002 : starting cpu hw idx 0002... done 0004 : starting cpu hw idx 0004... done 0006 : starting cpu hw idx 0006... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x00cc2000 -> 0x00cc34e4 Device tree struct 0x00cc4000 -> 0x00cd6000 Calling quiesce ... returning from prom_init Partition configured for 8 cpus. Starting Linux PPC64 #34 SMP Thu Jan 17 22:06:41 CET 2008 - ppc64_pft_size= 0x1c physicalMemorySize= 0xda00 htab_hash_mask= 0x1f - Linux version 2.6.24-rc8-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #34 SMP Thu Jan 17 22:06:41 CET 2008 [boot]0012 Setup Arch EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 8192 bytes Zone PFN ranges: DMA 0 -> 892928 Normal 892928 -> 892928 Movable zone start PFN for each node early_node_map[1] active PFN ranges 1:0 -> 892928 Could not find start_pfn for node 0 [boot]0015 Setup Done Built 2 zonelists in Node order, mobility grouping on. 
Total pages: 880720 Policy zone: DMA Kernel command line: xmon=on sysrq=1 debug panic=1 [boot]0020 XICS Init xics: no ISA interrupt controller [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 32768 bytes) time_init: decrementer frequency = 275.07 MHz time_init: processor frequency = 2197.80 MHz clocksource: timebase mult[e8ab05] shift[22] registered clockevent: decrementer mult[466a] shift[16] cpu[0] Console: colour dummy device 80x25 console handover: boot [udbg-1] -> real [hvc0] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) freeing bootmem node 1 Memory: 3496633k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init) cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 0 l3 c05fddf0 cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 1 l3 c05fddf0 cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 2 l3 c05fddf0 cache_grow(2778) swapper(0):c0,j4294937299 cachep c06a4fb8 nodeid 3 l3 c05fddf0 [ cut here ] Badness at /home/olaf/kernel/git/linux-2.6.24-rc8/mm/slab.c:2779 NIP: c00f78f4 LR: c00f78e0 CTR: 801af404 REGS: c075b880 TRAP: 0700 Not tainted (2.6.24-rc8-ppc64) MSR: 80029032 CR: 2422 XER: 0001 TASK = c0665a50[0] 'swapper' THREAD: c0758000 CPU: 0 GPR00: 0004 c075bb00 c07544c0 0063 GPR04: 0001 0001 GPR08: c06a19a0 c07a84b0 c07a84a8 GPR12: 4000 c0666380 GPR16: 4020 GPR20: 007fbd70 c054f6c8 000492d0 GPR24: c06a4fb8 c06a4fb8 c05fdc80 GPR28: 000412d0 c06e5b80 0004 NIP [c00f78f4] .cache_grow+0xc8/0x39c LR [c00f78e0] .cache_grow+0xb4/0x39c Call Trace: [c075bb00] [c00f78e0] .cache_grow+0xb4/0x39c (unreliable) [c075bbd0] [c00f82d0] .cache_alloc_refill+0x234/0x2c0 [c075bc90] [c00f842c] .kmem_cache_alloc+0xd0/0x294 [c075bd40] [c00fb4e8] .kmem_cache_create+0x208/0x478 [c075be20] [c05e670c] .kmem_cache_init+0x218/0x4f4 [c075bee0] [c05bf8ec] .start_kernel+0x2f8/0x3fc [c075bf90] [c0008590] 
.start_here_common+0x60/0xd0 Instruction dump: e89e80e0 e92a e80b0468 7f4ad378 fbe10070 f8010078 4bf85f01 6000 381f0001 7c1f07b4 2f9f0004
Re: crash in kmem_cache_init
On Thu, Jan 17, Olaf Hering wrote: > Since -mm boots further, what patch should I try? rc8-mm1 crashes as well, l3 passed to reap_alien() is NULL. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Thu, Jan 17, Christoph Lameter wrote: > > freeing bootmem node 1 > > Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, > > 1324k data, 1220k bss, 304k init) > > cache_grow(2781) swapper(0):c0,j4294937299 cp c06a4fb8 !l3 > > Is there more backtrace information? What function called cache_grow? I just put a 'if (!l3) return 0;' into cache_grow, the backtrace is the one from the initial report. Reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 does not change anything. Since -mm boots further, what patch should I try? The kernel boots on a different p570. See attached dmesg. huckleberry boots, cranberry crashes. --- huckleberry.suse.de-2.6.16.57-0.5-ppc64.txt 2008-01-17 20:48:18.510309000 +0100 +++ cranberry.suse.de-2.6.16.57-0.5-ppc64.txt 2008-01-17 20:48:09.425402000 +0100 @@ -1,56 +1,55 @@ Page orders: linear mapping = 24, others = 12 -Found initrd at 0xc270:0xc2a93000 +Found initrd at 0xc130:0xc16e6c1e Partition configured for 8 cpus. Starting Linux PPC64 #1 SMP Wed Dec 5 09:02:21 UTC 2007 - -ppc64_pft_size= 0x1b +ppc64_pft_size= 0x1c ppc64_interrupt_controller= 0x2 platform = 0x101 -physicalMemorySize= 0x15800 +physicalMemorySize= 0xda00 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0x -htab_hash_mask= 0xf +htab_hash_mask= 0x1f - [boot]0100 MM Init [boot]0100 MM Init Done Linux version 2.6.16.57-0.5-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #1 SMP Wed Dec 5 09:02:21 UTC 2007 [boot]0012 Setup Arch -Node 0 Memory: 0x0-0xb000 -Node 1 Memory: 0xb000-0x15800 +Node 0 Memory: +Node 1 Memory: 0x0-0xda00 EEH: PCI Enhanced I/O Error Handling Enabled -PPC64 nvram contains 7168 bytes +PPC64 nvram contains 8192 bytes Using dedicated idle loop -On node 0 totalpages: 720896 - DMA zone: 720896 pages, LIFO batch:31 +On node 0 totalpages: 0 + DMA zone: 0 pages, LIFO batch:0 DMA32 zone: 0 pages, LIFO batch:0 Normal zone: 0 pages, LIFO batch:0 HighMem zone: 0 pages, LIFO 
batch:0 -On node 1 totalpages: 688128 - DMA zone: 688128 pages, LIFO batch:31 +On node 1 totalpages: 892928 + DMA zone: 892928 pages, LIFO batch:31 DMA32 zone: 0 pages, LIFO batch:0 Normal zone: 0 pages, LIFO batch:0 HighMem zone: 0 pages, LIFO batch:0 [boot]0015 Setup Done Built 2 zonelists -Kernel command line: root=/dev/disk/by-id/scsi-SIBM_ST373453LC_3HW1CPW57445Q010-part5 xmon=on sysrq=1 quiet +Kernel command line: root=/dev/system/root xmon=on sysrq=1 quiet [boot]0020 XICS Init xics: no ISA interrupt controller [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 131072 bytes) -time_init: decrementer frequency = 207.052000 MHz -time_init: processor frequency = 1654.344000 MHz +time_init: decrementer frequency = 275.07 MHz +time_init: processor frequency = 2197.80 MHz Console: colour dummy device 80x25 -Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) -Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) -freeing bootmem node 0 +Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) +Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) freeing bootmem node 1 -Memory: 5524952k/5636096k available (4464k kernel code, 44k reserved, 1992k data, 836k bss, 264k init) -Calibrating delay loop... 413.69 BogoMIPS (lpj=2068480) +Memory: 3494648k/3571712k available (4464k kernel code, 77064k reserved, 1992k data, 836k bss, 264k init) +Calibrating delay loop... 548.86 BogoMIPS (lpj=2744320) Security Framework v1.0.0 initialized Mount-cache hash table entries: 256 checking if image is initramfs... it is -Freeing initrd memory: 3660k freed +Freeing initrd memory: 3995k freed Processor 1 found. Processor 2 found. Processor 3 found. @@ -61,7 +60,7 @@ Processor 7 found. 
Brought up 8 CPUs Node 0 CPUs: 0-3 Node 1 CPUs: 4-7 -migration_cost=41,0,4308 +migration_cost=38,0,3225 NET: Registered protocol family 16 PCI: Probing PCI hardware IOMMU table initialized, virtual merging enabled Page orders: linear mapping = 24, others = 12 Found initrd at 0xc270:0xc2a93000 Partition configured for 8 cpus. Starting Linux PPC64 #1 SMP Wed Dec 5 09:02:21 UTC 2007 - ppc64_pft_size= 0x1b ppc64_interrupt_controller= 0x2 platform = 0x101 physicalMemorySize= 0x15800 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0x htab_hash_mask= 0xf ---
Re: crash in kmem_cache_init
Could you try Pekka's suggestion of reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 ? ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Thu, 17 Jan 2008, Olaf Hering wrote:
> The patch does not help.

Duh. We need to know more about the problem.

> > --- linux-2.6.orig/mm/slab.c	2008-01-03 12:26:42.0 -0800
> > +++ linux-2.6/mm/slab.c	2008-01-09 15:59:49.0 -0800
> > @@ -2977,7 +2977,10 @@ retry:
> >  	}
> >  	l3 = cachep->nodelists[node];
> >
> > -	BUG_ON(ac->avail > 0 || !l3);
> > +	if (!l3)
> > +		return NULL;
> > +
> > +	BUG_ON(ac->avail > 0);
> >  	spin_lock(&l3->list_lock);
> >
> >  	/* See if we can refill from the shared array */
>
> Is this hunk supposed to go into cache_grow()? There is no NULL check
> for l3.

No, it's for cache_alloc_refill(). cache_grow() should only be called for nodes that have memory. l3 is always used before cache_grow() is called.

> freeing bootmem node 1
> Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved,
> 1324k data, 1220k bss, 304k init)
> cache_grow(2781) swapper(0):c0,j4294937299 cp c06a4fb8 !l3

Is there more backtrace information? What function called cache_grow?

___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: crash in kmem_cache_init
On Thu, Jan 17, Christoph Lameter wrote:

> On Thu, 17 Jan 2008, Pekka Enberg wrote:
>
> > Looks similar to the one discussed on linux-mm ("[BUG] at
> > mm/slab.c:3320" thread). Christoph?
>
> Right. Try the latest version of the patch to fix it:

The patch does not help.

> Index: linux-2.6/mm/slab.c
> ===================================================================
> --- linux-2.6.orig/mm/slab.c	2008-01-03 12:26:42.0 -0800
> +++ linux-2.6/mm/slab.c	2008-01-09 15:59:49.0 -0800
> @@ -2977,7 +2977,10 @@ retry:
> 	}
> 	l3 = cachep->nodelists[node];
>
> -	BUG_ON(ac->avail > 0 || !l3);
> +	if (!l3)
> +		return NULL;
> +
> +	BUG_ON(ac->avail > 0);
> 	spin_lock(&l3->list_lock);
>
> 	/* See if we can refill from the shared array */

Is this hunk supposed to go into cache_grow()? There is no NULL check
for l3. But if I do that, it does not help:

freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved,
1324k data, 1220k bss, 304k init)
cache_grow(2781) swapper(0):c0,j4294937299 cp c06a4fb8 !l3
Kernel panic - not syncing: kmem_cache_create(): failed to create slab `size-32'
Rebooting in 1 seconds..
Re: crash in kmem_cache_init
On Thu, 17 Jan 2008, Pekka Enberg wrote:

> Looks similar to the one discussed on linux-mm ("[BUG] at
> mm/slab.c:3320" thread). Christoph?

Right. Try the latest version of the patch to fix it:

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2008-01-03 12:26:42.0 -0800
+++ linux-2.6/mm/slab.c	2008-01-09 15:59:49.0 -0800
@@ -2977,7 +2977,10 @@ retry:
 	}
 	l3 = cachep->nodelists[node];
 
-	BUG_ON(ac->avail > 0 || !l3);
+	if (!l3)
+		return NULL;
+
+	BUG_ON(ac->avail > 0);
 	spin_lock(&l3->list_lock);
 
 	/* See if we can refill from the shared array */
@@ -3224,7 +3227,7 @@ static void *alternate_node_alloc(struct
 		nid_alloc = cpuset_mem_spread_node();
 	else if (current->mempolicy)
 		nid_alloc = slab_node(current->mempolicy);
-	if (nid_alloc != nid_here)
+	if (nid_alloc != nid_here && node_state(nid_alloc, N_NORMAL_MEMORY))
 		return ____cache_alloc_node(cachep, flags, nid_alloc);
 	return NULL;
 }
@@ -3439,8 +3442,14 @@ __do_cache_alloc(struct kmem_cache *cach
 	 * We may just have run out of memory on the local node.
 	 * ____cache_alloc_node() knows how to locate memory on other nodes
 	 */
-	if (!objp)
-		objp = ____cache_alloc_node(cache, flags, numa_node_id());
+	if (!objp) {
+		int node_id = numa_node_id();
+		if (likely(cache->nodelists[node_id])) /* fast path */
+			objp = ____cache_alloc_node(cache, flags, node_id);
+		else /* this function can do good fallback */
+			objp = __cache_alloc_node(cache, flags, node_id,
+						__builtin_return_address(0));
+	}
 out:
 	return objp;
Re: crash in kmem_cache_init
Hi Olaf,

[Adding Christoph as cc.]

On Jan 15, 2008 5:09 PM, Olaf Hering <[EMAIL PROTECTED]> wrote:
> Current linus tree crashes in kmem_cache_init, as shown below. The
> system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram.
> Firmware is 240_332, 2.6.23 boots ok with the same config.
>
> There is a series of mm related patches in 2.6.24-rc1:
> commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it,

So that's the "Memoryless nodes: Slab support" patch that I think caused
a similar oops a while ago.

> Unable to handle kernel paging request for data at address 0x0040
> Faulting instruction address: 0xc0437470
> cpu 0x0: Vector: 300 (Data Access) at [c075b830]
>     pc: c0437470: ._spin_lock+0x20/0x88
>     lr: c00f78a8: .cache_grow+0x7c/0x338
>     sp: c075bab0
>    msr: 80009032
>    dar: 40
>  dsisr: 4000
>   current = 0xc0665a50
>   paca    = 0xc0666380
>     pid   = 0, comm = swapper
> enter ? for help
> [c075bb30] c00f78a8 .cache_grow+0x7c/0x338
> [c075bbf0] c00f7d04 .fallback_alloc+0x1a0/0x1f4
> [c075bca0] c00f8544 .kmem_cache_alloc+0xec/0x150
> [c075bd40] c00fb1c0 .kmem_cache_create+0x208/0x478
> [c075be20] c05e670c .kmem_cache_init+0x218/0x4f4
> [c075bee0] c05bf8ec .start_kernel+0x2f8/0x3fc
> [c075bf90] c0008590 .start_here_common+0x60/0xd0

Looks similar to the one discussed on linux-mm ("[BUG] at
mm/slab.c:3320" thread). Christoph?
Re: crash in kmem_cache_init
On Tue, Jan 15, Olaf Hering wrote:

> Current linus tree crashes in kmem_cache_init, as shown below. The
> system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram.
> Firmware is 240_332, 2.6.23 boots ok with the same config.
>
> There is a series of mm related patches in 2.6.24-rc1:
> commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it,

2.6.24-rc6-mm1-ppc64 boots past this point, but crashes later.
Likely unrelated to the kmem_cache_init bug:

...
matroxfb: 640x480x8bpp (virtual: 640x26214)
matroxfb: framebuffer at 0x4017800, mapped to 0xd8008008, size 33554432
Console: switching to colour frame buffer device 80x30
fb0: MATROX frame buffer device
matroxfb_crtc2: secondary head of fb0 was registered as fb1
vio_register_driver: driver hvc_console registering
HVSI: registered 0 devices
Generic RTC Driver v1.07
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
pmac_zilog: 0.6 (Benjamin Herrenschmidt <[EMAIL PROTECTED]>)
input: Macintosh mouse button emulation as /devices/virtual/input/input0
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ehci_hcd :c8:01.2: EHCI Host Controller
ehci_hcd :c8:01.2: new USB bus registered, assigned bus number 1
ehci_hcd :c8:01.2: irq 85, io mem 0x400a0002000
ehci_hcd :c8:01.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 5 ports detected
Unable to handle kernel paging request for data at address 0x0050
Faulting instruction address: 0xc00fa1c4
cpu 0x7: Vector: 300 (Data Access) at [c000d82e7a70]
    pc: c00fa1c4: .cache_reap+0x74/0x29c
    lr: c00fa198: .cache_reap+0x48/0x29c
    sp: c000d82e7cf0
   msr: 80009032
   dar: 50
 dsisr: 4000
  current = 0xc000d82d85c0
  paca    = 0xc0668e00
    pid   = 27, comm = events/7
enter ? for help
[c000d82e7cf0] c070be98 vmstat_update+0x0/0x18 (unreliable)
[c000d82e7da0] c0092994 .run_workqueue+0x120/0x210
[c000d82e7e40] c0093bb8 .worker_thread+0xcc/0xf0
[c000d82e7f00] c0097b70 .kthread+0x78/0xc4
[c000d82e7f90] c002ab74 .kernel_thread+0x4c/0x68
7:mon>
...
crash in kmem_cache_init
Current linus tree crashes in kmem_cache_init, as shown below. The
system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram.
Firmware is 240_332, 2.6.23 boots ok with the same config.

There is a series of mm related patches in 2.6.24-rc1:
commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it,

==> .git/BISECT_LOG <==
git-bisect start
# good: [0b8bc8b91cf6befea20fe78b90367ca7b61cfa0d] Linux 2.6.23
git-bisect good 0b8bc8b91cf6befea20fe78b90367ca7b61cfa0d
# bad: [cebdeed27b068dcc3e7c311d7ec0d9c33b5138c2] Linux 2.6.24-rc1
git-bisect bad cebdeed27b068dcc3e7c311d7ec0d9c33b5138c2
# good: [9ac52315d4cf5f561f36dabaf0720c00d3553162] sched: guest CPU accounting: add guest-CPU /proc//stat fields
git-bisect good 9ac52315d4cf5f561f36dabaf0720c00d3553162
# bad: [b9ec0339d8e22cadf2d9d1b010b51dc53837dfb0] add consts where appropriate in fs/nls/Kconfig fs/nls/Makefile fs/nls/nls_ascii.c fs/nls/nls_base.c fs/nls/nls_cp1250.c fs/nls/nls_cp1251.c fs/nls/nls_cp1255.c fs/nls/nls_cp437.c fs/nls/nls_cp737.c fs/nls/nls_cp775.c fs/nls/nls_cp850.c fs/nls/nls_cp852.c fs/nls/nls_cp855.c fs/nls/nls_cp857.c fs/nls/nls_cp860.c fs/nls/nls_cp861.c fs/nls/nls_cp862.c fs/nls/nls_cp863.c fs/nls/nls_cp864.c fs/nls/nls_cp865.c fs/nls/nls_cp866.c fs/nls/nls_cp869.c fs/nls/nls_cp874.c fs/nls/nls_cp932.c fs/nls/nls_cp936.c fs/nls/nls_cp949.c fs/nls/nls_cp950.c fs/nls/nls_euc-jp.c fs/nls/nls_iso8859-1.c fs/nls/nls_iso8859-13.c fs/nls/nls_iso8859-14.c fs/nls/nls_iso8859-15.c fs/nls/nls_iso8859-2.c fs/nls/nls_iso8859-3.c fs/nls/nls_iso8859-4.c fs/nls/nls_iso8859-5.c fs/nls/nls_iso8859-6.c fs/nls/nls_iso8859-7.c fs/nls/nls_iso8859-9.c fs/nls/nls_koi8-r.c fs/nls/nls_koi8-ru.c fs/nls/nls_koi8-u.c fs/nls/nls_utf8.c
git-bisect bad b9ec0339d8e22cadf2d9d1b010b51dc53837dfb0
# bad: [78a26e25ce4837a03ac3b6c32cdae1958e547639] uml: separate timer initialization
git-bisect bad 78a26e25ce4837a03ac3b6c32cdae1958e547639
# good: [4acad72ded8e3f0211bd2a762e23c28229c61a51] [IPV6]: Consolidate the ip6_pol_route_(input|output) pair
git-bisect good 4acad72ded8e3f0211bd2a762e23c28229c61a51
# good: [64da82efae0d7b5f7c478021840fd329f76d965d] Add support for PCMCIA card Sierra WIreless AC850
git-bisect good 64da82efae0d7b5f7c478021840fd329f76d965d
# bad: [37b07e4163f7306aa735a6e250e8d22293e5b8de] memoryless nodes: fixup uses of node_online_map in generic code
git-bisect bad 37b07e4163f7306aa735a6e250e8d22293e5b8de
# good: [64649a58919e66ec21792dbb6c48cb3da22cbd7f] mm: trim more holes
git-bisect good 64649a58919e66ec21792dbb6c48cb3da22cbd7f
# good: [fb53b3094888be0cf8ddf052277654268904bdf5] smbfs: convert to new aops
git-bisect good fb53b3094888be0cf8ddf052277654268904bdf5
# good: [13808910713a98cc1159291e62cdfec92cc94d05] Memoryless nodes: Generic management of nodemasks for various purposes
git-bisect good 13808910713a98cc1159291e62cdfec92cc94d05

Please wait, loading kernel...
Allocated 00a0 bytes for kernel @ 0020
Elf64 kernel loaded...
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line: panic=1 debug xmon=on
memory layout at init:
  alloc_bottom : 00ac1000
  alloc_top    : 1000
  alloc_top_hi : da00
  rmo_top      : 1000
  ram_top      : da00
Looking for displays
found display  : /[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED], opening ... done
instantiating rtas at 0x0f6a1000 ... done
boot cpu 0002
starting cpu hw idx 0002... done
starting cpu hw idx 0004... done
starting cpu hw idx 0006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x00cc2000 -> 0x00cc34e4
Device tree struct 0x00cc4000 -> 0x00cd6000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #2 SMP Tue Jan 15 14:23:02 CET 2008
-
ppc64_pft_size= 0x1c
physicalMemorySize= 0xda00
htab_hash_mask= 0x1f
-
Linux version 2.6.24-rc7-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #2 SMP Tue Jan 15 14:23:02 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA             0 ->   892928
  Normal     892928 ->   892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    1:        0 ->   892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on.  Total pages: 880720
Policy zone: DMA
Kernel command line: panic=1 debug xmon=on
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decremen