Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception

2009-04-16 Thread Sachin Sant

Michael Ellerman wrote:

> Does this patch, on top of Ben's patch, fix it?
>
> cheers
>
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index db556d2..1ade7eb 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -753,7 +753,7 @@ void __init early_init_mmu(void)
>  }
>  
>  #ifdef CONFIG_SMP
> -void __init early_init_mmu_secondary(void)
> +void __cpuinit early_init_mmu_secondary(void)
>  {
>  	/* Initialize hash table for that CPU */
>  	if (!firmware_has_feature(FW_FEATURE_LPAR))

Yes, this patch fixed the issue. Now i can offline/online cpus without
any problem.


Thanks
-Sachin

--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception

2009-04-16 Thread Michael Ellerman
On Thu, 2009-04-16 at 11:06 +0530, Sachin Sant wrote:
> Sachin Sant wrote:
> > Sachin Sant wrote:
> >> Benjamin Herrenschmidt wrote:
> >>> On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
> >>>
> >>>> While executing CPU HotPlug[1] tests i observed that during
> >>>> every cpu offline process an exception is thrown.
> >>>>
> >>>
> >>> Looks like a BUG_ON() to me... can you look at what other
> >>> messages just before that ?
> >>
> > Ben, seems like the following patch is causing the cpu hotplug
> > test failure.
> > [PATCH 6/6] powerpc/mm: Introduce early_init_mmu() on 64-bit
> >
> > http://ozlabs.org/pipermail/linuxppc-dev/2009-March/069613.html
> >
> > If i back out this patch, i am able to offline/online cpus
> > without any issue.
> I can recreate this problem with 2.6.30-rc2-git1 as well. Same BUG_ON while
> running cpu hotplug tests.
>
> Let me know if there is anything i can do to help find a fix for this.

Hi Sachin,

Does this patch, on top of Ben's patch, fix it?

cheers

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index db556d2..1ade7eb 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -753,7 +753,7 @@ void __init early_init_mmu(void)
 }
 
 #ifdef CONFIG_SMP
-void __init early_init_mmu_secondary(void)
+void __cpuinit early_init_mmu_secondary(void)
 {
 	/* Initialize hash table for that CPU */
 	if (!firmware_has_feature(FW_FEATURE_LPAR))
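
The one-word change in the patch above moves the function into a different linker section. In kernels of this era, `__init` code is placed in a section that is discarded once boot completes, while `__cpuinit` code is kept when CPU hotplug support is configured, so it can still be called when a CPU is brought back online. The following is only a userspace sketch of the underlying mechanism (tagging functions with a section attribute), not kernel code. The macro names `DEMO_INIT` and `DEMO_CPUINIT`, the section names, and all function names are hypothetical, and the `__start_`/`__stop_` symbols assume an ELF target linked with GNU ld or lld.

```c
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's __init and __cpuinit. */
#define DEMO_INIT    __attribute__((section("demo_init_text"), noinline))
#define DEMO_CPUINIT __attribute__((section("demo_cpuinit_text"), noinline))

/*
 * For sections whose names are valid C identifiers, GNU ld and lld
 * provide __start_<name> and __stop_<name> symbols marking their bounds.
 */
extern char __start_demo_init_text[], __stop_demo_init_text[];
extern char __start_demo_cpuinit_text[], __stop_demo_cpuinit_text[];

/* Would only be safe to call during boot if this section were discarded. */
DEMO_INIT int boot_only_setup(void)
{
	return 1;
}

/* Lives in a section that a hotplug-capable build would keep. */
DEMO_CPUINIT int per_cpu_setup(void)
{
	return 2;
}

/* Return nonzero if addr lies inside [start, stop). */
static int in_range(uintptr_t addr, const char *start, const char *stop)
{
	return addr >= (uintptr_t)start && addr < (uintptr_t)stop;
}

/* Is the given code address inside the demo "init" section? */
int in_demo_init(uintptr_t addr)
{
	return in_range(addr, __start_demo_init_text, __stop_demo_init_text);
}

/* Is the given code address inside the demo "cpuinit" section? */
int in_demo_cpuinit(uintptr_t addr)
{
	return in_range(addr, __start_demo_cpuinit_text,
			__stop_demo_cpuinit_text);
}
```

Nothing is freed in this sketch; it only shows that the attribute decides where the code lands. In the kernel, the memory behind the boot-only section is released after boot, which is why a function reached on the CPU online path must not be marked `__init`.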




Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception

2009-04-15 Thread Sachin Sant

Sachin Sant wrote:
> Sachin Sant wrote:
>> Benjamin Herrenschmidt wrote:
>>> On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
>>>> While executing CPU HotPlug[1] tests i observed that during
>>>> every cpu offline process an exception is thrown.
>>>
>>> Looks like a BUG_ON() to me... can you look at what other
>>> messages just before that ?
>>
> Ben, seems like the following patch is causing the cpu hotplug
> test failure.
> [PATCH 6/6] powerpc/mm: Introduce early_init_mmu() on 64-bit
>
> http://ozlabs.org/pipermail/linuxppc-dev/2009-March/069613.html
>
> If i back out this patch, i am able to offline/online cpus
> without any issue.

I can recreate this problem with 2.6.30-rc2-git1 as well. Same BUG_ON while
running cpu hotplug tests.

Let me know if there is anything i can do to help find a fix for this.

Thanks
-Sachin


--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-



Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception

2009-04-01 Thread Sachin Sant

Sachin Sant wrote:
> Benjamin Herrenschmidt wrote:
>> On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
>>> While executing CPU HotPlug[1] tests i observed that during
>>> every cpu offline process an exception is thrown.
>>
>> Looks like a BUG_ON() to me... can you look at what other
>> messages just before that ?
>

Ben, seems like the following patch is causing the cpu hotplug
test failure.

[PATCH 6/6] powerpc/mm: Introduce early_init_mmu() on 64-bit

http://ozlabs.org/pipermail/linuxppc-dev/2009-March/069613.html

If i back out this patch, i am able to offline/online cpus
without any issue.

Thanks
-Sachin


--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-



Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception

2009-03-31 Thread Sachin Sant

Benjamin Herrenschmidt wrote:
> On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
>> While executing CPU HotPlug[1] tests i observed that during
>> every cpu offline process an exception is thrown.
>
> Looks like a BUG_ON() to me... can you look at what other
> messages just before that ?

I don't get any other messages when the problem occurs. In fact,
if i don't have xmon enabled the machine just hangs without
any messages on the console. I extracted the dmesg log
(attached in my previous mail) through xmon. Here are the last few
related messages from the 2.6.29-git8 kernel during problem recreation.

<4>IRQ 18 affinity broken off cpu 2
<4>cpu 2 (hwid 2) Ready to die
<7>CPU0 attaching NULL sched-domain..
<7>CPU1 attaching NULL sched-domain..
<7>CPU2 attaching NULL sched-domain..
<7>CPU3 attaching NULL sched-domain..
<7>CPU0 attaching sched-domain:.
<7> domain 0: span 0-1 level SIBLING.
<7>  groups: 0 1.
<7>  domain 1: span 0-1,3 level CPU.
<7>   groups: 0-1 3.
<7>   domain 2: span 0-1,3 level NODE
<7>groups: 0-1,3.
<7>CPU1 attaching sched-domain:.
<7> domain 0: span 0-1 level SIBLING.
<7>  groups: 1 0.
<7>  domain 1: span 0-1,3 level CPU.
<7>   groups: 0-1 3.
<7>   domain 2: span 0-1,3 level NODE.
<7>groups: 0-1,3.
<7>CPU3 attaching sched-domain:.
<7> domain 0: span 0-1,3 level CPU.
<7>  groups: 3 0-1.
<7>  domain 1: span 0-1,3 level NODE.
<7>   groups: 0-1,3...


> That or lookup where the PC and LR values are in System.map
> and maybe get us a backtrace from xmon ?
>
> (You seem to have no symbols, have you built with kallsyms ?)

I have kallsyms and debug info options enabled.

CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_DEBUG_INFO=y

Here is the related information from 2.6.29-git8 kernel. 


llm62 login: cpu 0x2: Vector: 700 (Program Check) at [c74c7ca0]
  pc: 007b6640
  lr: 0079ddc0
  sp: c74c7f20
 msr: 80081002
current = 0xc000fe1c8580
paca= 0xc0ab2800
  pid   = 0, comm = swapper
enter ? for help
[c74c7f20] 00018694 (unreliable)
[c74c7f90] 8278
SP (4f0003) is in userspace
2:mon> la %pc
007b6640
2:mon> la c07b6640
c07b6640: .kmem_cache_init+0x2d8/0x528
2:mon> la %lr
0079ddc0
2:mon> la c079ddc0
c079ddc0: .mem_init+0x150/0x22c
2:mon>
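
For reference, the `la` lookups above (and the manual System.map lookup suggested earlier) amount to finding the nearest symbol at or below an address and reporting the offset from it. A minimal C sketch of that search is shown here. The `struct ksym` type, the helper names and the two-entry table are hypothetical and exist only for illustration. The two start addresses are derived from the xmon output above (PC minus 0x2d8, LR minus 0x150) and use the shortened addresses exactly as they appear in this archive, not full 64-bit kernel addresses.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One System.map-style entry: symbol start address and name (hypothetical). */
struct ksym {
	uint64_t addr;
	const char *name;
};

/*
 * Illustrative table, sorted by ascending address. The values are derived
 * from the xmon session in this mail, not taken from a real System.map.
 */
static const struct ksym demo_syms[] = {
	{ 0x0079dc70, ".mem_init" },
	{ 0x007b6368, ".kmem_cache_init" },
};

/*
 * Binary search for the last entry whose start address is <= addr.
 * Returns NULL if addr lies below the first symbol.
 */
static const struct ksym *ksym_lookup(const struct ksym *tab, size_t n,
				      uint64_t addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? &tab[lo - 1] : NULL;
}

/* Format addr as "symbol+0xoffset", or as a bare hex address if unknown. */
static int ksym_format(char *buf, size_t len, const struct ksym *tab,
		       size_t n, uint64_t addr)
{
	const struct ksym *s = ksym_lookup(tab, n, addr);

	if (!s)
		return snprintf(buf, len, "0x%" PRIx64, addr);
	return snprintf(buf, len, "%s+0x%" PRIx64, s->name, addr - s->addr);
}
```

Calling `ksym_format()` with the PC value from the trap resolves it against the table in the same symbol-plus-offset form that xmon prints. A real table would hold every text symbol, so the nearest-lower match would be meaningful for arbitrary addresses.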

Regards
-Sachin

--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-



Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception

2009-03-31 Thread Benjamin Herrenschmidt
On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
> While executing CPU HotPlug[1] tests i observed that during
> every cpu offline process an exception is thrown.

Looks like a BUG_ON() to me... can you look at what other
messages just before that ?

That or lookup where the PC and LR values are in System.map
and maybe get us a backtrace from xmon ?

(You seem to have no symbols, have you built with kallsyms ?)

Ben.

> cpu 0x2: Vector: 700 (Program Check) at [c74c7ca0]
> pc: 007b6640
> lr: 0079ddc0
> sp: c74c7f20
>msr: 80081002
>   current = 0xc000fe1c8580
>   paca= 0xc0ab2800
> pid   = 0, comm = swapper
> 2:mon> r
> R00 =    R16 = 0002
> R01 = c74c7f20   R17 = 
> R02 = 009e8dc0   R18 = 
> R03 = 8278   R19 = 
> R04 = 8000   R20 = 
> R05 = 0002   R21 = 
> R06 = 0002   R22 = c0b33ae0
> R07 =    R23 = 
> R08 =    R24 = 0002
> R09 = 82fc   R25 = 
> R10 =    R26 = 0004
> R11 = a0001002   R27 = c0a95bd8
> R12 = a000   R28 = 0008
> R13 = c0ab2800   R29 = 
> R14 =    R30 = c095e750
> R15 = 07531868   R31 = 07d70b20
> pc  = 007b6640
> lr  = 0079ddc0
> msr = 80081002   cr  = 2204
> ctr =    xer = 0020   trap =  700
> 2:mon> u
> SLB contents of cpu 2
> 00 c800 40004f7ca3000500  1T  ESID=   c0  VSID=   4f7ca3 
> LLP:100
> 01 d800 4000eb71b510  1T  ESID=   d0  VSID=   eb71b0 
> LLP:110
> 24 0800 0c80 256M ESID=0  VSID=0 
> LLP:  0
> 2:mon>
> 
> I can recreate this problem very easily on power5
> as well as power6 box.
> 
> 2.6.29-git6 did not have this problem. Let me know if there
> is any other information i can provide. I have attached the
> dmesg log here.
> 
> Thanks
> -Sachin
> 
> [1] -> CPU Hotplug test which is part of LTP.

[ppc64] 2.6.29-git7 : offlining a cpu causes an exception

2009-03-31 Thread Sachin Sant

While executing CPU HotPlug[1] tests i observed that during
every cpu offline process an exception is thrown.

cpu 0x2: Vector: 700 (Program Check) at [c74c7ca0]
   pc: 007b6640
   lr: 0079ddc0
   sp: c74c7f20
  msr: 80081002
 current = 0xc000fe1c8580
 paca= 0xc0ab2800
   pid   = 0, comm = swapper
2:mon> r
R00 =    R16 = 0002
R01 = c74c7f20   R17 = 
R02 = 009e8dc0   R18 = 
R03 = 8278   R19 = 
R04 = 8000   R20 = 
R05 = 0002   R21 = 
R06 = 0002   R22 = c0b33ae0
R07 =    R23 = 
R08 =    R24 = 0002
R09 = 82fc   R25 = 
R10 =    R26 = 0004
R11 = a0001002   R27 = c0a95bd8
R12 = a000   R28 = 0008
R13 = c0ab2800   R29 = 
R14 =    R30 = c095e750
R15 = 07531868   R31 = 07d70b20
pc  = 007b6640
lr  = 0079ddc0
msr = 80081002   cr  = 2204
ctr =    xer = 0020   trap =  700
2:mon> u
SLB contents of cpu 2
00 c800 40004f7ca3000500  1T  ESID=   c0  VSID=   4f7ca3 
LLP:100
01 d800 4000eb71b510  1T  ESID=   d0  VSID=   eb71b0 
LLP:110
24 0800 0c80 256M ESID=0  VSID=0 
LLP:  0
2:mon>

I can recreate this problem very easily on power5
as well as power6 box.

2.6.29-git6 did not have this problem. Let me know if there
is any other information i can provide. I have attached the
dmesg log here.

Thanks
-Sachin

[1] -> CPU Hotplug test which is part of LTP.
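
For anyone wanting to try a single offline/online cycle without the full LTP suite: on Linux a CPU is taken offline or brought back by writing 0 or 1 to its `online` file under /sys/devices/system/cpu/. A minimal C sketch of that follows. The helper names `cpu_online_path()` and `cpu_set_online()` are hypothetical, the program must run as root, and it is an assumption that this is equivalent to what the LTP test does.

```c
#include <stdio.h>

/* Build "/sys/devices/system/cpu/cpu<N>/online" into buf. */
static int cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
	return snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);
}

/*
 * Write "1" (online) or "0" (offline) to the CPU's sysfs control file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int cpu_set_online(unsigned int cpu, int online)
{
	char path[64];
	FILE *f;
	int ok;

	if (cpu_online_path(path, sizeof(path), cpu) >= (int)sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(online ? "1" : "0", f) >= 0;
	/* The write reaches the kernel when the stream is flushed on close. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

With these helpers, `cpu_set_online(2, 0)` followed by `cpu_set_online(2, 1)` would correspond to one offline/online cycle on cpu 2, the CPU that appears in the trap output above.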

--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-


<6>Phyp-dump disabled at boot time.
<6>Using pSeries machine description.
<7>Page orders: linear mapping = 24, virtual = 16, io = 12.
<6>Using 1TB segments.
<4>Found initrd at 0xc34d:0xc3c7f14f.
<6>console [udbg0] enabled.
<6>Partition configured for 4 cpus..
<6>CPU maps initialized for 2 threads per core.
<7> (thread shift is 1).
<4>Starting Linux PPC64 #3 SMP Tue Mar 31 14:33:34 IST 2009.
<4>-.
<4>ppc64_pft_size= 0x1a.
<4>physicalMemorySize= 0x1.
<4>htab_hash_mask= 0x7.
<4>-.
<6>Initializing cgroup subsys cpuset.
<6>Initializing cgroup subsys cpu.
<5>Linux version 2.6.29-git7 (r...@llm62) (gcc version 4.3.2 [gcc-4_3-branch 
revision 141291] (SUSE Linux) ) #3 SMP Tue Mar 31 14:33:34 IST 2009.
<4>[boot]0012 Setup Arch.
<7>Node 0 Memory: 0x0-0x1.
<4>EEH: No capable adapters found.
<6>PPC64 nvram contains 15360 bytes.
<7>Using shared processor idle loop.
<4>Zone PFN ranges:.
<4>  DMA  0x -> 0x0001.
<4>  Normal   0x0001 -> 0x0001.
<4>Movable zone start PFN for each node.
<4>early_node_map[1] active PFN ranges.
<4>0: 0x -> 0x0001.
<7>On node 0 totalpages: 65536.
<7>  DMA zone: 56 pages used for memmap.
<7>  DMA zone: 0 pages reserved.
<7>  DMA zone: 65480 pages, LIFO batch:1.
<4>[boot]0015 Setup Done.
<4>Built 1 zonelists in Node order, mobility grouping on.  Total pages: 65480.
<4>Policy zone: DMA.
<5>Kernel command line: root=/dev/sda5 sysrq=1 insmod=sym53c8xx insmod=ipr 
crashkernel=512M-:256M  .
<6>NR_IRQS:512.
<4>[boot]0020 XICS Init.
<4>[boot]0021 XICS Done.
<7>pic: no ISA interrupt controller.
<4>PID hash table entries: 4096 (order: 12, 32768 bytes).
<7>time_init: decrementer frequency = 512.00 MHz.
<7>time_init: processor frequency   = 4704.00 MHz.
<6>clocksource: timebase mult[7d] shift[22] registered.
<7>clockevent: decrementer mult[8312] shift[16] cpu[0].
<4>Console: colour dummy device 80x25.
<6>console handover: boot [udbg0] -> real [hvc0].
<6>Dentry cache hash table entries: 524288 (order: 6, 4194304 bytes).
<6>Inode-cache hash table entries: 262144 (order: 5, 2097152 bytes).
<6>allocated 2621440 bytes of page_cgroup.
<6>please try cgroup_disable=memory option if you don't want.
<4>freeing bootmem node 0.
<6>Memory: 4119872k/4194304k available (8192k kernel code, 74432k reserved, 
1984k data, 4194k bss, 448k init).
<6>Calibrating delay loop... 1022.36 BogoMIPS (lpj=5111808).
<6>Security Framework initialized.
<6>SELinux:  Disabled at boot..
<4>Mount-cache hash table entries: 4096.
<6>Initializing cgroup subsys debug.
<6>Initializing cgroup subsys ns.
<6>Initializing cgroup subsys cpuacct.
<6>Initializing cgroup subsys memory.
<6>Initializing cgroup subsys devices.
<6>Initializing cgroup subsys freezer.
<7>clockevent: decrementer mult[8312] shift[16] cpu[1].
<4>Processor 1 found..
<7>