Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
Michael Ellerman wrote:
> Does this patch, on top of Ben's patch, fix it?
>
> cheers
>
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index db556d2..1ade7eb 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -753,7 +753,7 @@ void __init early_init_mmu(void)
>  }
>
>  #ifdef CONFIG_SMP
> -void __init early_init_mmu_secondary(void)
> +void __cpuinit early_init_mmu_secondary(void)
>  {
>  	/* Initialize hash table for that CPU */
>  	if (!firmware_has_feature(FW_FEATURE_LPAR))

Yes, this patch fixed the issue. Now I can offline/online CPUs without any problem.

Thanks
-Sachin

--
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
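For anyone repeating the check by hand: the offline/online cycle that the LTP hotplug test drives can also be triggered through sysfs. Below is a minimal C sketch, assuming the standard /sys/devices/system/cpu/cpuN/online interface and root privileges. The helper names cpu_online_path() and set_cpu_online() are made up for illustration; they are not part of this thread or of LTP.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Build "/sys/devices/system/cpu/cpu<N>/online" into buf.
 * Returns 0 on success, -1 if the buffer is too small.
 * (Hypothetical helper, named for this example only.)
 */
int cpu_online_path(int cpu, char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "1" (online) or "0" (offline) to the CPU's sysfs control file.
 * Needs root. Returns 0 on success, -1 on failure (errno is left set
 * by open() or write()).
 */
int set_cpu_online(int cpu, int online)
{
	char path[64];
	const char *val = online ? "1" : "0";
	int fd, rc;

	if (cpu_online_path(cpu, path, sizeof(path)) != 0)
		return -1;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	rc = (write(fd, val, strlen(val)) == (ssize_t)strlen(val)) ? 0 : -1;
	close(fd);
	return rc;
}
```

Calling set_cpu_online(2, 0) followed by set_cpu_online(2, 1) would correspond to one offline/online cycle of cpu 2, the CPU shown in the xmon output in this thread.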
Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
On Thu, 2009-04-16 at 11:06 +0530, Sachin Sant wrote:
> Sachin Sant wrote:
> > Sachin Sant wrote:
> >> Benjamin Herrenschmidt wrote:
> >>> On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
> >>>> While executing CPU HotPlug[1] tests i observed that during
> >>>> every cpu offline process an exception is thrown.
> >>>
> >>> Looks like a BUG_ON() to me... can you look at what other
> >>> messages just before that ?
> >>
> > Ben, seems like the following patch is causing the cpu hotplug
> > test failure.
> > [PATCH 6/6] powerpc/mm: Introduce early_init_mmu() on 64-bit
> >
> > http://ozlabs.org/pipermail/linuxppc-dev/2009-March/069613.html
> >
> > If i back out this patch, i am able to offline/online cpu's
> > without any issue.
> I can recreate this problem with 2.6.30-rc2-git1 as well. Same BUG_ON while
> running cpu hotplug tests.
>
> Let me know if there is any thing i can help to find a fix for this.

Hi Sachin,

Does this patch, on top of Ben's patch, fix it?

cheers

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index db556d2..1ade7eb 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -753,7 +753,7 @@ void __init early_init_mmu(void)
 }

 #ifdef CONFIG_SMP
-void __init early_init_mmu_secondary(void)
+void __cpuinit early_init_mmu_secondary(void)
 {
 	/* Initialize hash table for that CPU */
 	if (!firmware_has_feature(FW_FEATURE_LPAR))
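A note on why this one-word change may matter: in kernels of this era, text marked __init is freed once boot finishes, while text marked __cpuinit is kept when CONFIG_HOTPLUG_CPU is enabled and discarded with the rest of the init text otherwise. The sketch below is only a toy user-space model of that rule, not kernel code. The enum and the function callable_after_boot() are invented for this example, and the thread itself does not spell out the failure path.

```c
#include <stdbool.h>

/* Hypothetical labels for where a function's text ends up. */
enum text_section {
	SEC_TEXT,         /* ordinary kernel text, never freed */
	SEC_INIT_TEXT,    /* __init: freed after boot completes */
	SEC_CPUINIT_TEXT  /* __cpuinit: kept only with CPU hotplug */
};

/*
 * Toy model: is code in this section still safe to call after the
 * kernel has freed its init memory?
 *   hotplug_cpu stands in for CONFIG_HOTPLUG_CPU=y.
 */
bool callable_after_boot(enum text_section sec, bool hotplug_cpu)
{
	switch (sec) {
	case SEC_TEXT:
		return true;
	case SEC_INIT_TEXT:
		return false;
	case SEC_CPUINIT_TEXT:
		return hotplug_cpu;
	}
	return false;
}
```

Under this model, early_init_mmu_secondary() marked __init corresponds to callable_after_boot(SEC_INIT_TEXT, true), which is false, while the patched __cpuinit version corresponds to callable_after_boot(SEC_CPUINIT_TEXT, true), which is true. That would be consistent with a secondary CPU faulting when it runs this function after boot.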
Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
Sachin Sant wrote:
> Sachin Sant wrote:
>> Benjamin Herrenschmidt wrote:
>>> On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
>>>> While executing CPU HotPlug[1] tests i observed that during
>>>> every cpu offline process an exception is thrown.
>>>
>>> Looks like a BUG_ON() to me... can you look at what other
>>> messages just before that ?
>>
> Ben, seems like the following patch is causing the cpu hotplug
> test failure.
> [PATCH 6/6] powerpc/mm: Introduce early_init_mmu() on 64-bit
>
> http://ozlabs.org/pipermail/linuxppc-dev/2009-March/069613.html
>
> If i back out this patch, i am able to offline/online cpu's
> without any issue.

I can recreate this problem with 2.6.30-rc2-git1 as well. Same BUG_ON while running CPU hotplug tests.

Let me know if there is anything I can do to help find a fix for this.

Thanks
-Sachin
Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
Sachin Sant wrote:
> Benjamin Herrenschmidt wrote:
>> On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
>>> While executing CPU HotPlug[1] tests i observed that during
>>> every cpu offline process an exception is thrown.
>>
>> Looks like a BUG_ON() to me... can you look at what other
>> messages just before that ?

Ben, it seems the following patch is causing the CPU hotplug test failure.

[PATCH 6/6] powerpc/mm: Introduce early_init_mmu() on 64-bit
http://ozlabs.org/pipermail/linuxppc-dev/2009-March/069613.html

If I back out this patch, I am able to offline/online CPUs without any issue.

Thanks
-Sachin
Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
Benjamin Herrenschmidt wrote:
> On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
>> While executing CPU HotPlug[1] tests i observed that during
>> every cpu offline process an exception is thrown.
>
> Looks like a BUG_ON() to me... can you look at what other
> messages just before that ?

I don't get any other messages when the problem occurs. In fact, if I don't have xmon enabled, the machine just hangs without any messages on the console. I extracted the dmesg log (attached in my previous mail) through xmon. Here are the last few related messages from the 2.6.29-git8 kernel during problem recreation.

<4>IRQ 18 affinity broken off cpu 2
<4>cpu 2 (hwid 2) Ready to die
<7>CPU0 attaching NULL sched-domain..
<7>CPU1 attaching NULL sched-domain..
<7>CPU2 attaching NULL sched-domain..
<7>CPU3 attaching NULL sched-domain..
<7>CPU0 attaching sched-domain:.
<7> domain 0: span 0-1 level SIBLING.
<7> groups: 0 1.
<7> domain 1: span 0-1,3 level CPU.
<7> groups: 0-1 3.
<7> domain 2: span 0-1,3 level NODE
<7>groups: 0-1,3.
<7>CPU1 attaching sched-domain:.
<7> domain 0: span 0-1 level SIBLING.
<7> groups: 1 0.
<7> domain 1: span 0-1,3 level CPU.
<7> groups: 0-1 3.
<7> domain 2: span 0-1,3 level NODE.
<7>groups: 0-1,3.
<7>CPU3 attaching sched-domain:.
<7> domain 0: span 0-1,3 level CPU.
<7> groups: 3 0-1.
<7> domain 1: span 0-1,3 level NODE.
<7> groups: 0-1,3...

> That or lookup where the PC and LR values are in System.map and maybe
> get us a backtrace from xmon ?
>
> (You seem to have no symbols, have you built with kallsyms ?)

I have the kallsyms and debug info options enabled.

CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_DEBUG_INFO=y

Here is the related information from the 2.6.29-git8 kernel.

llm62 login: cpu 0x2: Vector: 700 (Program Check) at [c74c7ca0]
    pc: 007b6640
    lr: 0079ddc0
    sp: c74c7f20
   msr: 80081002
  current = 0xc000fe1c8580
  paca= 0xc0ab2800
    pid = 0, comm = swapper
enter ? for help
[c74c7f20] 00018694 (unreliable)
[c74c7f90] 8278
SP (4f0003) is in userspace
2:mon> la %pc
007b6640
2:mon> la c07b6640
c07b6640: .kmem_cache_init+0x2d8/0x528
2:mon> la %lr
0079ddc0
2:mon> la c079ddc0
c079ddc0: .mem_init+0x150/0x22c
2:mon>

Regards
-Sachin
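The "la" lookups above, like the System.map lookup Ben suggested, come down to finding the last symbol whose start address is not above the address in question and reporting the offset from it. Below is a minimal C sketch of that lookup, assuming a symbol table sorted by address. The struct ksym type, ksym_lookup() and the two-entry example table are invented for illustration; the start addresses are derived from the symbol+offset pairs printed above, using the addresses exactly as they appear in this mail.

```c
#include <stddef.h>
#include <stdint.h>

/* One symbol table entry: start address and name (hypothetical type). */
struct ksym {
	uint64_t addr;
	const char *name;
};

/*
 * Return the last entry whose start address is <= addr, or NULL if addr
 * lies below the first symbol. The table must be sorted by ascending
 * address, as System.map is.
 */
const struct ksym *ksym_lookup(const struct ksym *tab, size_t n, uint64_t addr)
{
	size_t lo = 0, hi = n; /* search window is [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= addr)
			lo = mid + 1; /* candidate; look further right */
		else
			hi = mid;
	}
	return lo ? &tab[lo - 1] : NULL;
}

/*
 * Illustrative table only: c079ddc0 - 0x150 and c07b6640 - 0x2d8,
 * taken from the xmon "la" output quoted in this message.
 */
const struct ksym example_syms[] = {
	{ 0xc079dc70ULL, ".mem_init" },
	{ 0xc07b6368ULL, ".kmem_cache_init" },
};
```

With this table, looking up 0xc07b6640 lands on .kmem_cache_init at offset 0x2d8 and 0xc079ddc0 lands on .mem_init at offset 0x150, matching what xmon printed. A real System.map has no symbol sizes, so unlike xmon's "+0x2d8/0x528" form this sketch cannot tell whether the address is actually inside the function.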
Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
> While executing CPU HotPlug[1] tests i observed that during
> every cpu offline process an exception is thrown.

Looks like a BUG_ON() to me... can you look at what other messages just before that ?

That or lookup where the PC and LR values are in System.map and maybe get us a backtrace from xmon ?

(You seem to have no symbols, have you built with kallsyms ?)

Ben.

> cpu 0x2: Vector: 700 (Program Check) at [c74c7ca0]
>     pc: 007b6640
>     lr: 0079ddc0
>     sp: c74c7f20
>    msr: 80081002
>   current = 0xc000fe1c8580
>   paca= 0xc0ab2800
>     pid = 0, comm = swapper
> 2:mon> r
> R00 = R16 = 0002
> R01 = c74c7f20 R17 =
> R02 = 009e8dc0 R18 =
> R03 = 8278 R19 =
> R04 = 8000 R20 =
> R05 = 0002 R21 =
> R06 = 0002 R22 = c0b33ae0
> R07 = R23 =
> R08 = R24 = 0002
> R09 = 82fc R25 =
> R10 = R26 = 0004
> R11 = a0001002 R27 = c0a95bd8
> R12 = a000 R28 = 0008
> R13 = c0ab2800 R29 =
> R14 = R30 = c095e750
> R15 = 07531868 R31 = 07d70b20
> pc = 007b6640
> lr = 0079ddc0
> msr = 80081002 cr = 2204
> ctr = xer = 0020 trap = 700
> 2:mon> u
> SLB contents of cpu 2
> 00 c800 40004f7ca3000500 1T ESID= c0 VSID= 4f7ca3 LLP:100
> 01 d800 4000eb71b510 1T ESID= d0 VSID= eb71b0 LLP:110
> 24 0800 0c80 256M ESID=0 VSID=0 LLP: 0
> 2:mon>
>
> I can recreate this problem very easily on power5
> as well as power6 box.
>
> 2.6.29-git6 did not have this problem. Let me know if there
> is any other information i can provide. I have attached the
> dmesg log here.
>
> Thanks
> -Sachin
>
> [1] -> CPU Hotplug test which is part of LTP.
>
> plain text document attachment (dmesg_cpu_hotplug)
> <6>Phyp-dump disabled at boot time.
> <6>Using pSeries machine description.
> <7>Page orders: linear mapping = 24, virtual = 16, io = 12.
> <6>Using 1TB segments.
> <4>Found initrd at 0xc34d:0xc3c7f14f.
> <6>console [udbg0] enabled.
> <6>Partition configured for 4 cpus..
> <6>CPU maps initialized for 2 threads per core.
> <7> (thread shift is 1).
> <4>Starting Linux PPC64 #3 SMP Tue Mar 31 14:33:34 IST 2009.
> <4>-.
> <4>ppc64_pft_size= 0x1a.
> <4>physicalMemorySize= 0x1.
> <4>htab_hash_mask= 0x7.
> <4>-.
> <6>Initializing cgroup subsys cpuset.
> <6>Initializing cgroup subsys cpu.
> <5>Linux version 2.6.29-git7 (r...@llm62) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #3 SMP Tue Mar 31 14:33:34 IST 2009.
> <4>[boot]0012 Setup Arch.
> <7>Node 0 Memory: 0x0-0x1.
> <4>EEH: No capable adapters found.
> <6>PPC64 nvram contains 15360 bytes.
> <7>Using shared processor idle loop.
> <4>Zone PFN ranges:.
> <4> DMA 0x -> 0x0001.
> <4> Normal 0x0001 -> 0x0001.
> <4>Movable zone start PFN for each node.
> <4>early_node_map[1] active PFN ranges.
> <4>0: 0x -> 0x0001.
> <7>On node 0 totalpages: 65536.
> <7> DMA zone: 56 pages used for memmap.
> <7> DMA zone: 0 pages reserved.
> <7> DMA zone: 65480 pages, LIFO batch:1.
> <4>[boot]0015 Setup Done.
> <4>Built 1 zonelists in Node order, mobility grouping on. Total pages: 65480.
> <4>Policy zone: DMA.
> <5>Kernel command line: root=/dev/sda5 sysrq=1 insmod=sym53c8xx insmod=ipr crashkernel=512M-:256M .
> <6>NR_IRQS:512.
> <4>[boot]0020 XICS Init.
> <4>[boot]0021 XICS Done.
> <7>pic: no ISA interrupt controller.
> <4>PID hash table entries: 4096 (order: 12, 32768 bytes).
> <7>time_init: decrementer frequency = 512.00 MHz.
> <7>time_init: processor frequency = 4704.00 MHz.
> <6>clocksource: timebase mult[7d] shift[22] registered.
> <7>clockevent: decrementer mult[8312] shift[16] cpu[0].
> <4>Console: colour dummy device 80x25.
> <6>console handover: boot [udbg0] -> real [hvc0].
> <6>Dentry cache hash table entries: 524288 (order: 6, 4194304 bytes).
> <6>Inode-cache hash table entries: 262144 (order: 5, 2097152 bytes).
> <6>allocated 2621440 bytes of page_cgroup.
> <6>please try cgroup_disable=memory option if you don't want.
> <4>freeing bootmem node 0.
> <6>Memory: 4119872k/4194304k available (8192k kernel code, 74432k reserved, 1984k data, 4194k bss, 448k init).
> <6>Calibrating delay loop... 1022.36 BogoMIPS (lp
[ppc64] 2.6.29-git7 : offlining a cpu causes an exception
While executing CPU HotPlug[1] tests i observed that during every cpu offline process an exception is thrown.

cpu 0x2: Vector: 700 (Program Check) at [c74c7ca0]
    pc: 007b6640
    lr: 0079ddc0
    sp: c74c7f20
   msr: 80081002
  current = 0xc000fe1c8580
  paca= 0xc0ab2800
    pid = 0, comm = swapper
2:mon> r
R00 = R16 = 0002
R01 = c74c7f20 R17 =
R02 = 009e8dc0 R18 =
R03 = 8278 R19 =
R04 = 8000 R20 =
R05 = 0002 R21 =
R06 = 0002 R22 = c0b33ae0
R07 = R23 =
R08 = R24 = 0002
R09 = 82fc R25 =
R10 = R26 = 0004
R11 = a0001002 R27 = c0a95bd8
R12 = a000 R28 = 0008
R13 = c0ab2800 R29 =
R14 = R30 = c095e750
R15 = 07531868 R31 = 07d70b20
pc = 007b6640
lr = 0079ddc0
msr = 80081002 cr = 2204
ctr = xer = 0020 trap = 700
2:mon> u
SLB contents of cpu 2
00 c800 40004f7ca3000500 1T ESID= c0 VSID= 4f7ca3 LLP:100
01 d800 4000eb71b510 1T ESID= d0 VSID= eb71b0 LLP:110
24 0800 0c80 256M ESID=0 VSID=0 LLP: 0
2:mon>

I can recreate this problem very easily on power5 as well as power6 box.

2.6.29-git6 did not have this problem. Let me know if there is any other information i can provide. I have attached the dmesg log here.

Thanks
-Sachin

[1] -> CPU Hotplug test which is part of LTP.

<6>Phyp-dump disabled at boot time.
<6>Using pSeries machine description.
<7>Page orders: linear mapping = 24, virtual = 16, io = 12.
<6>Using 1TB segments.
<4>Found initrd at 0xc34d:0xc3c7f14f.
<6>console [udbg0] enabled.
<6>Partition configured for 4 cpus..
<6>CPU maps initialized for 2 threads per core.
<7> (thread shift is 1).
<4>Starting Linux PPC64 #3 SMP Tue Mar 31 14:33:34 IST 2009.
<4>-.
<4>ppc64_pft_size= 0x1a.
<4>physicalMemorySize= 0x1.
<4>htab_hash_mask= 0x7.
<4>-.
<6>Initializing cgroup subsys cpuset.
<6>Initializing cgroup subsys cpu.
<5>Linux version 2.6.29-git7 (r...@llm62) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #3 SMP Tue Mar 31 14:33:34 IST 2009.
<4>[boot]0012 Setup Arch.
<7>Node 0 Memory: 0x0-0x1.
<4>EEH: No capable adapters found.
<6>PPC64 nvram contains 15360 bytes.
<7>Using shared processor idle loop.
<4>Zone PFN ranges:.
<4> DMA 0x -> 0x0001.
<4> Normal 0x0001 -> 0x0001.
<4>Movable zone start PFN for each node.
<4>early_node_map[1] active PFN ranges.
<4>0: 0x -> 0x0001.
<7>On node 0 totalpages: 65536.
<7> DMA zone: 56 pages used for memmap.
<7> DMA zone: 0 pages reserved.
<7> DMA zone: 65480 pages, LIFO batch:1.
<4>[boot]0015 Setup Done.
<4>Built 1 zonelists in Node order, mobility grouping on. Total pages: 65480.
<4>Policy zone: DMA.
<5>Kernel command line: root=/dev/sda5 sysrq=1 insmod=sym53c8xx insmod=ipr crashkernel=512M-:256M .
<6>NR_IRQS:512.
<4>[boot]0020 XICS Init.
<4>[boot]0021 XICS Done.
<7>pic: no ISA interrupt controller.
<4>PID hash table entries: 4096 (order: 12, 32768 bytes).
<7>time_init: decrementer frequency = 512.00 MHz.
<7>time_init: processor frequency = 4704.00 MHz.
<6>clocksource: timebase mult[7d] shift[22] registered.
<7>clockevent: decrementer mult[8312] shift[16] cpu[0].
<4>Console: colour dummy device 80x25.
<6>console handover: boot [udbg0] -> real [hvc0].
<6>Dentry cache hash table entries: 524288 (order: 6, 4194304 bytes).
<6>Inode-cache hash table entries: 262144 (order: 5, 2097152 bytes).
<6>allocated 2621440 bytes of page_cgroup.
<6>please try cgroup_disable=memory option if you don't want.
<4>freeing bootmem node 0.
<6>Memory: 4119872k/4194304k available (8192k kernel code, 74432k reserved, 1984k data, 4194k bss, 448k init).
<6>Calibrating delay loop... 1022.36 BogoMIPS (lpj=5111808).
<6>Security Framework initialized.
<6>SELinux: Disabled at boot..
<4>Mount-cache hash table entries: 4096.
<6>Initializing cgroup subsys debug.
<6>Initializing cgroup subsys ns.
<6>Initializing cgroup subsys cpuacct.
<6>Initializing cgroup subsys memory.
<6>Initializing cgroup subsys devices.
<6>Initializing cgroup subsys freezer.
<7>clockevent: decrementer mult[8312] shift[16] cpu[1].
<4>Processor 1 found..
<7>