Re: linux-next: spinlock lockup with next-20081118 on powerpc
Hi Jens,

On Wed, 19 Nov 2008 10:16:28 +0100 Jens Axboe [EMAIL PROTECTED] wrote:
>
> Strange, so it gets stuck on the timer lock, very weird. You don't
> happen to have output showing what the other CPU is up to at that point?

Unfortunately, no, but I will see what I can find tomorrow.

Today's linux-next still has a problem, but it is slightly different:

Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xc0503030
cpu 0x0: Vector: 300 (Data Access) at [ca40]
    pc: c0503030: ._spin_lock_irqsave+0x40/0x110
    lr: c02571f8: .blk_rq_timed_out_timer+0x48/0x190
    sp: ccc0
   msr: 80009032
   dar: 0
 dsisr: 4000
  current = 0xc00022d31040
  paca= 0xc0897300
    pid = 3399, comm = ckbcomp
enter ? for help
[cd50] c02571f8 .blk_rq_timed_out_timer+0x48/0x190
[ce00] c006c2f4 .run_timer_softirq+0x1c4/0x2a0
[ced0] c0065298 .__do_softirq+0xe8/0x1f0
[cf90] c0029224 .call_do_softirq+0x14/0x24
[c00022ad3c80] c000d420 .do_softirq+0xf0/0x140
[c00022ad3d20] c00654a4 .irq_exit+0x74/0x90
[c00022ad3da0] c0025844 .timer_interrupt+0x134/0x150
[c00022ad3e30] c0003700 decrementer_common+0x100/0x180
--- Exception: 901 (Decrementer) at 0ff52440

I am currently bisecting yesterday's linux-next.

--
Cheers,
Stephen Rothwell [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: linux-next: spinlock lockup with next-20081118 on powerpc
On Wed, Nov 19 2008, Stephen Rothwell wrote:
> Hi Jens,
>
> On Wed, 19 Nov 2008 10:16:28 +0100 Jens Axboe [EMAIL PROTECTED] wrote:
> >
> > Strange, so it gets stuck on the timer lock, very weird. You don't
> > happen to have output showing what the other CPU is up to at that point?
>
> Unfortunately, no, but I will see what I can find tomorrow.
>
> Today's linux-next still has a problem, but it is slightly different:
>
> Unable to handle kernel paging request for data at address 0x
> Faulting instruction address: 0xc0503030
> cpu 0x0: Vector: 300 (Data Access) at [ca40]
>     pc: c0503030: ._spin_lock_irqsave+0x40/0x110
>     lr: c02571f8: .blk_rq_timed_out_timer+0x48/0x190
>     sp: ccc0
>    msr: 80009032
>    dar: 0
>  dsisr: 4000
>   current = 0xc00022d31040
>   paca= 0xc0897300
>     pid = 3399, comm = ckbcomp
> enter ? for help
> [cd50] c02571f8 .blk_rq_timed_out_timer+0x48/0x190
> [ce00] c006c2f4 .run_timer_softirq+0x1c4/0x2a0
> [ced0] c0065298 .__do_softirq+0xe8/0x1f0
> [cf90] c0029224 .call_do_softirq+0x14/0x24
> [c00022ad3c80] c000d420 .do_softirq+0xf0/0x140
> [c00022ad3d20] c00654a4 .irq_exit+0x74/0x90
> [c00022ad3da0] c0025844 .timer_interrupt+0x134/0x150
> [c00022ad3e30] c0003700 decrementer_common+0x100/0x180
> --- Exception: 901 (Decrementer) at 0ff52440

That's even more weird, how could 'data' passed in to the timer ever be
0? It's set up like this:

        setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);

when we allocate the queue. How did this trigger?

--
Jens Axboe
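[Editorial sketch] Jens's question rests on how a timer callback gets its argument: the value registered at setup time is stored with the timer and handed back unchanged when the timer expires, so the handler can cast it back to the queue pointer. The userspace C below only mimics that shape. Every `fake_` name is hypothetical and none of it is kernel code; it also assumes `unsigned long` is wide enough to hold a pointer, as it is on Linux.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct fake_queue {
    int timeouts_seen;                 /* bumped each time the handler runs */
};

struct fake_timer {
    void (*function)(unsigned long);   /* callback run on expiry */
    unsigned long data;                /* opaque cookie passed to the callback */
};

/* Same shape as setup_timer(timer, fn, data): remember callback and cookie. */
static void fake_setup_timer(struct fake_timer *t,
                             void (*fn)(unsigned long), unsigned long data)
{
    t->function = fn;
    t->data = data;
}

/* What a timer core does on expiry: call the callback with the stored cookie. */
static void fake_fire_timer(struct fake_timer *t)
{
    t->function(t->data);
}

/* Handler in the style of a queue timeout handler: recover the queue pointer. */
static void fake_timed_out(unsigned long data)
{
    struct fake_queue *q = (struct fake_queue *)data;

    assert(q != NULL);                 /* the cookie was a valid queue pointer */
    q->timeouts_seen++;
}
```

Registering with `fake_setup_timer(&t, fake_timed_out, (unsigned long)&q)` and then calling `fake_fire_timer(&t)` increments `q.timeouts_seen`. Under this model the cookie can only be 0 if 0 was registered, which is why a zero argument looked implausible.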
Re: linux-next: spinlock lockup with next-20081118 on powerpc
On Wed, Nov 19 2008, Stephen Rothwell wrote:
> Hi all,
>
> I got this in my boot test last night:
>
> Begin: Waiting for root file system... ...
> BUG: spinlock lockup on CPU#1, vol_id/3246, c0b09700
> Call Trace:
> [c00040ef7080] [c000fb58] .show_stack+0x70/0x184 (unreliable)
> [c00040ef7130] [c027adac] ._raw_spin_lock+0x140/0x17c
> [c00040ef71d0] [c04ec648] ._spin_lock_irqsave+0x8c/0xc4
> [c00040ef7270] [c00659dc] .lock_timer_base+0x38/0x90
> [c00040ef7310] [c0065b50] .__mod_timer+0x4c/0x11c
> [c00040ef73c0] [c025ae9c] .blk_plug_device+0xc0/0xd8
> [c00040ef7440] [c025bb90] .__make_request+0x498/0x518
> [c00040ef74f0] [c0259dc8] .generic_make_request+0x24c/0x2a4
> [c00040ef75b0] [c025b6d0] .submit_bio+0x108/0x130
> [c00040ef7670] [c01210e4] .submit_bh+0x174/0x1c0
> [c00040ef7700] [c01259a8] .block_read_full_page+0x34c/0x3b4
> [c00040ef7820] [c0129a60] .blkdev_readpage+0x20/0x38
> [c00040ef78a0] [c00c111c] .__do_page_cache_readahead+0x23c/0x2b8
> [c00040ef7980] [c00c1370] .ondemand_readahead+0x1d8/0x210
> [c00040ef7a30] [c00b7f20] .generic_file_aio_read+0x224/0x620
> [c00040ef7b60] [c00f9020] .do_sync_read+0xc4/0x124
> [c00040ef7cf0] [c00f98e0] .vfs_read+0xd8/0x1bc
> [c00040ef7d90] [c00f9f0c] .sys_read+0x4c/0x8c
> [c00040ef7e30] [c00084d4] syscall_exit+0x0/0x40
>
> This was on a Power5 partition. I am attempting to reproduce the problem.
>
> Any clues?

Strange, so it gets stuck on the timer lock, very weird. You don't
happen to have output showing what the other CPU is up to at that point?

--
Jens Axboe
Re: linux-next: spinlock lockup with next-20081118 on powerpc
Hi Jens,

On Wed, 19 Nov 2008 10:43:00 +0100 Jens Axboe [EMAIL PROTECTED] wrote:
>
> On Wed, Nov 19 2008, Stephen Rothwell wrote:
> >
> > Unable to handle kernel paging request for data at address 0x
> > Faulting instruction address: 0xc0503030
> > cpu 0x0: Vector: 300 (Data Access) at [ca40]
> >     pc: c0503030: ._spin_lock_irqsave+0x40/0x110
> >     lr: c02571f8: .blk_rq_timed_out_timer+0x48/0x190
> >     sp: ccc0
> >    msr: 80009032
> >    dar: 0
> >  dsisr: 4000
> >   current = 0xc00022d31040
> >   paca= 0xc0897300
> >     pid = 3399, comm = ckbcomp
> > enter ? for help
> > [cd50] c02571f8 .blk_rq_timed_out_timer+0x48/0x190
> > [ce00] c006c2f4 .run_timer_softirq+0x1c4/0x2a0
> > [ced0] c0065298 .__do_softirq+0xe8/0x1f0
> > [cf90] c0029224 .call_do_softirq+0x14/0x24
> > [c00022ad3c80] c000d420 .do_softirq+0xf0/0x140
> > [c00022ad3d20] c00654a4 .irq_exit+0x74/0x90
> > [c00022ad3da0] c0025844 .timer_interrupt+0x134/0x150
> > [c00022ad3e30] c0003700 decrementer_common+0x100/0x180
> > --- Exception: 901 (Decrementer) at 0ff52440
>
> That's even more weird, how could 'data' passed in to the timer ever be
> 0? It's set up like this:

'data' above is generic, not a variable name. The 0 is probably the
address of the spinlock (though I need to check more to be sure) as it
crashed inside _spin_lock_irqsave.

>         setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);
>
> when we allocate the queue. How did this trigger?

Not sure what you mean?

--
Cheers,
Stephen Rothwell [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/
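[Editorial sketch] Stephen's reading is that the faulting address 0 belongs to the lock, not to the timer argument. One way that can happen is if the queue reaches its lock through a pointer field and that field is NULL while the queue pointer itself is valid. The userspace C below illustrates only that distinction. The struct layout and all `fake_` names are assumptions made for illustration, and the thread itself does not confirm this was the cause.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct fake_lock {
    int locked;
};

struct fake_queue {
    struct fake_lock *queue_lock;   /* assumption: the lock is reached via a pointer */
};

/*
 * Address a lock routine would touch when asked to lock this queue.
 * The queue pointer must be valid; the result is whatever the field holds.
 */
static uintptr_t fake_lock_address(const struct fake_queue *q)
{
    assert(q != NULL);
    return (uintptr_t)q->queue_lock;
}
```

With `queue_lock` left NULL, `fake_lock_address(&q)` yields the null address even though `&q` is a perfectly good pointer, which would match a data access fault reported at 0 from inside the lock routine.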
Re: linux-next: spinlock lockup with next-20081118 on powerpc
Hi Jens,

On Wed, 19 Nov 2008 14:34:09 +0100 Jens Axboe [EMAIL PROTECTED] wrote:
>
> Are you removing devices or modules? We have a bug there it seems, does
> this help?

This is early in boot (we are waiting for the root device while running
on the initramfs) so there could well be modules being unloaded.

That patch makes the problem go away.

--
Cheers,
Stephen Rothwell [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/
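[Editorial sketch] The patch Jens sent is not quoted anywhere in this thread, so the code below is not that patch. It only models the general hazard his question about device or module removal points at: a timer that is still armed when the object it refers to is torn down. All `fake_` names are hypothetical, and clearing a flag stands in for a real synchronous timer cancel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's types or API. */
struct fake_lock {
    int acquisitions;
};

struct fake_queue {
    struct fake_lock *queue_lock;  /* cleared when the queue is torn down */
    bool timer_pending;            /* true while a timeout timer is armed */
    int faults;                    /* expiries that found no lock to take */
};

/* Expiry handler: takes the lock through the queue pointer. */
static void fake_timer_expire(struct fake_queue *q)
{
    if (!q->timer_pending)
        return;                    /* timer was cancelled, nothing runs */
    q->timer_pending = false;
    if (q->queue_lock == NULL) {
        q->faults++;               /* real code would fault at address 0 here */
        return;
    }
    q->queue_lock->acquisitions++;
}

/* Teardown that forgets the armed timer. */
static void fake_teardown_unsafe(struct fake_queue *q)
{
    q->queue_lock = NULL;
}

/* Teardown that cancels the timer first, then drops the lock. */
static void fake_teardown_safe(struct fake_queue *q)
{
    q->timer_pending = false;      /* stands in for a synchronous cancel */
    q->queue_lock = NULL;
}
```

After `fake_teardown_unsafe()` a later `fake_timer_expire()` records a fault, while after `fake_teardown_safe()` the expiry is a no-op. The ordering is the point: cancel anything that can still run against the object before invalidating what it uses.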
Re: linux-next: spinlock lockup with next-20081118 on powerpc
On Thu, Nov 20 2008, Stephen Rothwell wrote:
> Hi Jens,
>
> On Wed, 19 Nov 2008 14:34:09 +0100 Jens Axboe [EMAIL PROTECTED] wrote:
> >
> > Are you removing devices or modules? We have a bug there it seems, does
> > this help?
>
> This is early in boot (we are waiting for the root device while running
> on the initramfs) so there could well be modules being unloaded.
>
> That patch makes the problem go away.

Excellent, since it was an apparent bug, I already updated the original
patch with this hunk. Thanks a lot for your bisection work, Stephen!

--
Jens Axboe
linux-next: spinlock lockup with next-20081118 on powerpc
Hi all,

I got this in my boot test last night:

Begin: Waiting for root file system... ...
BUG: spinlock lockup on CPU#1, vol_id/3246, c0b09700
Call Trace:
[c00040ef7080] [c000fb58] .show_stack+0x70/0x184 (unreliable)
[c00040ef7130] [c027adac] ._raw_spin_lock+0x140/0x17c
[c00040ef71d0] [c04ec648] ._spin_lock_irqsave+0x8c/0xc4
[c00040ef7270] [c00659dc] .lock_timer_base+0x38/0x90
[c00040ef7310] [c0065b50] .__mod_timer+0x4c/0x11c
[c00040ef73c0] [c025ae9c] .blk_plug_device+0xc0/0xd8
[c00040ef7440] [c025bb90] .__make_request+0x498/0x518
[c00040ef74f0] [c0259dc8] .generic_make_request+0x24c/0x2a4
[c00040ef75b0] [c025b6d0] .submit_bio+0x108/0x130
[c00040ef7670] [c01210e4] .submit_bh+0x174/0x1c0
[c00040ef7700] [c01259a8] .block_read_full_page+0x34c/0x3b4
[c00040ef7820] [c0129a60] .blkdev_readpage+0x20/0x38
[c00040ef78a0] [c00c111c] .__do_page_cache_readahead+0x23c/0x2b8
[c00040ef7980] [c00c1370] .ondemand_readahead+0x1d8/0x210
[c00040ef7a30] [c00b7f20] .generic_file_aio_read+0x224/0x620
[c00040ef7b60] [c00f9020] .do_sync_read+0xc4/0x124
[c00040ef7cf0] [c00f98e0] .vfs_read+0xd8/0x1bc
[c00040ef7d90] [c00f9f0c] .sys_read+0x4c/0x8c
[c00040ef7e30] [c00084d4] syscall_exit+0x0/0x40

This was on a Power5 partition. I am attempting to reproduce the problem.

Any clues?

--
Cheers,
Stephen Rothwell [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/
Re: linux-next: spinlock lockup with next-20081118 on powerpc
Hi all,

On Wed, 19 Nov 2008 09:30:23 +1100 Stephen Rothwell [EMAIL PROTECTED] wrote:
>
> This was on a Power5 partition. I am attempting to reproduce the problem.

OK, it reproduces. The machine is a Power5 partition (IBM,9124-720
eServer OpenPower 720) with 1 (2 way threaded) cpu (gr, rev2.1, 1.5GHz),
2G of memory, 2 NUMA nodes running Ubuntu Gutsy.

--
Cheers,
Stephen Rothwell [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/