Re: linux-next: spinlock lockup with next-20081118 on powerpc

2008-11-19 Thread Stephen Rothwell
Hi Jens,

On Wed, 19 Nov 2008 10:16:28 +0100 Jens Axboe [EMAIL PROTECTED] wrote:

> Strange, so it gets stuck on the timer lock, very weird. You don't
> happen to have output showing what the other CPU is up to at that point?

Unfortunately, no, but I will see what I can find tomorrow.

Today's linux-next still has a problem, but it is slightly different:

Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xc0503030
cpu 0x0: Vector: 300 (Data Access) at [ca40]
pc: c0503030: ._spin_lock_irqsave+0x40/0x110
lr: c02571f8: .blk_rq_timed_out_timer+0x48/0x190
sp: ccc0
   msr: 80009032
   dar: 0
 dsisr: 4000
  current = 0xc00022d31040
  paca= 0xc0897300
pid   = 3399, comm = ckbcomp
enter ? for help
[cd50] c02571f8 .blk_rq_timed_out_timer+0x48/0x190
[ce00] c006c2f4 .run_timer_softirq+0x1c4/0x2a0
[ced0] c0065298 .__do_softirq+0xe8/0x1f0
[cf90] c0029224 .call_do_softirq+0x14/0x24
[c00022ad3c80] c000d420 .do_softirq+0xf0/0x140
[c00022ad3d20] c00654a4 .irq_exit+0x74/0x90
[c00022ad3da0] c0025844 .timer_interrupt+0x134/0x150
[c00022ad3e30] c0003700 decrementer_common+0x100/0x180
--- Exception: 901 (Decrementer) at 0ff52440

I am currently bisecting yesterday's linux-next.
-- 
Cheers,
Stephen Rothwell [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: linux-next: spinlock lockup with next-20081118 on powerpc

2008-11-19 Thread Jens Axboe
On Wed, Nov 19 2008, Stephen Rothwell wrote:
> Hi Jens,
>
> On Wed, 19 Nov 2008 10:16:28 +0100 Jens Axboe [EMAIL PROTECTED] wrote:
>
> > Strange, so it gets stuck on the timer lock, very weird. You don't
> > happen to have output showing what the other CPU is up to at that point?
>
> Unfortunately, no, but I will see what I can find tomorrow.
>
> Today's linux-next still has a problem, but it is slightly different:
>
> Unable to handle kernel paging request for data at address 0x
> Faulting instruction address: 0xc0503030
> cpu 0x0: Vector: 300 (Data Access) at [ca40]
> pc: c0503030: ._spin_lock_irqsave+0x40/0x110
> lr: c02571f8: .blk_rq_timed_out_timer+0x48/0x190
> sp: ccc0
>    msr: 80009032
>    dar: 0
>  dsisr: 4000
>   current = 0xc00022d31040
>   paca= 0xc0897300
> pid   = 3399, comm = ckbcomp
> enter ? for help
> [cd50] c02571f8 .blk_rq_timed_out_timer+0x48/0x190
> [ce00] c006c2f4 .run_timer_softirq+0x1c4/0x2a0
> [ced0] c0065298 .__do_softirq+0xe8/0x1f0
> [cf90] c0029224 .call_do_softirq+0x14/0x24
> [c00022ad3c80] c000d420 .do_softirq+0xf0/0x140
> [c00022ad3d20] c00654a4 .irq_exit+0x74/0x90
> [c00022ad3da0] c0025844 .timer_interrupt+0x134/0x150
> [c00022ad3e30] c0003700 decrementer_common+0x100/0x180
> --- Exception: 901 (Decrementer) at 0ff52440

That's even more weird, how could 'data' passed in to the timer ever be
0? It's set up like this:

setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);

when we allocate the queue. How did this trigger?

-- 
Jens Axboe



Re: linux-next: spinlock lockup with next-20081118 on powerpc

2008-11-19 Thread Jens Axboe
On Wed, Nov 19 2008, Stephen Rothwell wrote:
> Hi all,
>
> I got this in my boot test last night:
>
> Begin: Waiting for root file system... ...
> BUG: spinlock lockup on CPU#1, vol_id/3246, c0b09700
> Call Trace:
> [c00040ef7080] [c000fb58] .show_stack+0x70/0x184 (unreliable)
> [c00040ef7130] [c027adac] ._raw_spin_lock+0x140/0x17c
> [c00040ef71d0] [c04ec648] ._spin_lock_irqsave+0x8c/0xc4
> [c00040ef7270] [c00659dc] .lock_timer_base+0x38/0x90
> [c00040ef7310] [c0065b50] .__mod_timer+0x4c/0x11c
> [c00040ef73c0] [c025ae9c] .blk_plug_device+0xc0/0xd8
> [c00040ef7440] [c025bb90] .__make_request+0x498/0x518
> [c00040ef74f0] [c0259dc8] .generic_make_request+0x24c/0x2a4
> [c00040ef75b0] [c025b6d0] .submit_bio+0x108/0x130
> [c00040ef7670] [c01210e4] .submit_bh+0x174/0x1c0
> [c00040ef7700] [c01259a8] .block_read_full_page+0x34c/0x3b4
> [c00040ef7820] [c0129a60] .blkdev_readpage+0x20/0x38
> [c00040ef78a0] [c00c111c] .__do_page_cache_readahead+0x23c/0x2b8
> [c00040ef7980] [c00c1370] .ondemand_readahead+0x1d8/0x210
> [c00040ef7a30] [c00b7f20] .generic_file_aio_read+0x224/0x620
> [c00040ef7b60] [c00f9020] .do_sync_read+0xc4/0x124
> [c00040ef7cf0] [c00f98e0] .vfs_read+0xd8/0x1bc
> [c00040ef7d90] [c00f9f0c] .sys_read+0x4c/0x8c
> [c00040ef7e30] [c00084d4] syscall_exit+0x0/0x40
>
> This was on a Power5 partition.  I am attempting to reproduce the problem.
>
> Any clues?

Strange, so it gets stuck on the timer lock, very weird. You don't
happen to have output showing what the other CPU is up to at that point?

-- 
Jens Axboe



Re: linux-next: spinlock lockup with next-20081118 on powerpc

2008-11-19 Thread Stephen Rothwell
Hi Jens,

On Wed, 19 Nov 2008 10:43:00 +0100 Jens Axboe [EMAIL PROTECTED] wrote:

> On Wed, Nov 19 2008, Stephen Rothwell wrote:
> >
> > Unable to handle kernel paging request for data at address 0x
> > Faulting instruction address: 0xc0503030
> > cpu 0x0: Vector: 300 (Data Access) at [ca40]
> > pc: c0503030: ._spin_lock_irqsave+0x40/0x110
> > lr: c02571f8: .blk_rq_timed_out_timer+0x48/0x190
> > sp: ccc0
> >    msr: 80009032
> >    dar: 0
> >  dsisr: 4000
> >   current = 0xc00022d31040
> >   paca= 0xc0897300
> > pid   = 3399, comm = ckbcomp
> > enter ? for help
> > [cd50] c02571f8 .blk_rq_timed_out_timer+0x48/0x190
> > [ce00] c006c2f4 .run_timer_softirq+0x1c4/0x2a0
> > [ced0] c0065298 .__do_softirq+0xe8/0x1f0
> > [cf90] c0029224 .call_do_softirq+0x14/0x24
> > [c00022ad3c80] c000d420 .do_softirq+0xf0/0x140
> > [c00022ad3d20] c00654a4 .irq_exit+0x74/0x90
> > [c00022ad3da0] c0025844 .timer_interrupt+0x134/0x150
> > [c00022ad3e30] c0003700 decrementer_common+0x100/0x180
> > --- Exception: 901 (Decrementer) at 0ff52440
>
> That's even more weird, how could 'data' passed in to the timer ever be
> 0? It's set up like this:

'data' above is generic, not a variable name. The 0 is probably the
address of the spinlock (though I need to check more to be sure) as it
crashed inside _spin_lock_irqsave.

> setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);
>
> when we allocate the queue. How did this trigger?

Not sure what you mean?
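
To make the two readings of the oops concrete, here is a minimal userspace
sketch. It is not kernel code: every fake_* name is a hypothetical stand-in,
and it assumes the queue reaches its lock through a pointer, so that a
cleared pointer (rather than a zero 'data' cookie) is what the lock routine
ends up dereferencing at address 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of this is the kernel's API. */
struct fake_lock { int held; };

struct fake_queue {
    struct fake_lock *queue_lock;   /* assumed to be a pointer; may be NULL */
    int timeouts_handled;
};

struct fake_timer {
    void (*function)(unsigned long);
    unsigned long data;             /* opaque cookie handed back to function */
};

/* Mirrors the shape of the setup_timer() call quoted in the thread. */
static void fake_setup_timer(struct fake_timer *t,
                             void (*fn)(unsigned long), unsigned long data)
{
    t->function = fn;
    t->data = data;
}

/* Returns 0 on success, -1 where a real lock routine would fault at 0. */
static int fake_lock_acquire(struct fake_lock *l)
{
    if (l == NULL)
        return -1;
    l->held = 1;
    return 0;
}

/* The callback casts the cookie back to the queue, then takes its lock. */
static void fake_timed_out_timer(unsigned long data)
{
    /* unsigned long holds a pointer on LP64 targets, as assumed here */
    struct fake_queue *q = (struct fake_queue *)data;

    assert(q != NULL);              /* the cookie itself is intact */
    if (fake_lock_acquire(q->queue_lock) != 0)
        return;                     /* stale lock pointer: nothing handled */
    q->timeouts_handled++;
    q->queue_lock->held = 0;
}

/* Stand-in for the timer softirq invoking an expired timer. */
static void fake_fire(struct fake_timer *t)
{
    t->function(t->data);
}
```

In this model the cookie passed at setup time never changes; only the lock
pointer inside the queue does, which would be consistent with a fault inside
the lock routine rather than in the timer callback itself.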
-- 
Cheers,
Stephen Rothwell [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/



Re: linux-next: spinlock lockup with next-20081118 on powerpc

2008-11-19 Thread Stephen Rothwell
Hi Jens,

On Wed, 19 Nov 2008 14:34:09 +0100 Jens Axboe [EMAIL PROTECTED] wrote:

> Are you removing devices or modules? We have a bug there it seems, does
> this help?

This is early in boot (we are waiting for the root device while running
on the initramfs) so there could well be modules being unloaded.

That patch makes the problem go away.
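
The patch itself is not quoted in this thread, so the following is not it.
It is only a generic userspace sketch of the class of bug being discussed,
a timer left armed while the object it refers to is torn down. All toy_*
names are hypothetical, and the "safe" variant merely illustrates the usual
ordering (cancel the timer first, then invalidate the object).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a queue and its timeout timer. */
struct toy_queue {
    int *lock;      /* NULL once the queue has been torn down */
    int fired;      /* number of timeouts handled successfully */
};

struct toy_timer {
    bool pending;   /* true while the timer is armed */
    struct toy_queue *q;
};

/*
 * Simulates one timer expiry.
 * Returns 1 if the handler ran, 0 if nothing was armed, and -1 if the
 * handler ran against a torn-down queue (where a kernel would oops).
 */
static int toy_tick(struct toy_timer *t)
{
    if (!t->pending)
        return 0;
    t->pending = false;
    if (t->q->lock == NULL)
        return -1;
    t->q->fired++;
    return 1;
}

/* Tears the queue down but leaves the timer armed: the racy ordering. */
static void toy_teardown_unsafe(struct toy_queue *q)
{
    q->lock = NULL;
}

/* Cancels the timer before invalidating the queue. */
static void toy_teardown_safe(struct toy_timer *t, struct toy_queue *q)
{
    t->pending = false;
    q->lock = NULL;
}
```

With the unsafe ordering the next expiry hits the torn-down queue; with the
safe ordering there is nothing left to expire.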

-- 
Cheers,
Stephen Rothwell [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/



Re: linux-next: spinlock lockup with next-20081118 on powerpc

2008-11-19 Thread Jens Axboe
On Thu, Nov 20 2008, Stephen Rothwell wrote:
> Hi Jens,
>
> On Wed, 19 Nov 2008 14:34:09 +0100 Jens Axboe [EMAIL PROTECTED] wrote:
>
> > Are you removing devices or modules? We have a bug there it seems, does
> > this help?
>
> This is early in boot (we are waiting for the root device while running
> on the initramfs) so there could well be modules being unloaded.
>
> That patch makes the problem go away.

Excellent, since it was an apparent bug, I already updated the original
patch with this hunk.

Thanks a lot for your bisection work, Stephen!

-- 
Jens Axboe



linux-next: spinlock lockup with next-20081118 on powerpc

2008-11-18 Thread Stephen Rothwell
Hi all,

I got this in my boot test last night:

Begin: Waiting for root file system... ...
BUG: spinlock lockup on CPU#1, vol_id/3246, c0b09700
Call Trace:
[c00040ef7080] [c000fb58] .show_stack+0x70/0x184 (unreliable)
[c00040ef7130] [c027adac] ._raw_spin_lock+0x140/0x17c
[c00040ef71d0] [c04ec648] ._spin_lock_irqsave+0x8c/0xc4
[c00040ef7270] [c00659dc] .lock_timer_base+0x38/0x90
[c00040ef7310] [c0065b50] .__mod_timer+0x4c/0x11c
[c00040ef73c0] [c025ae9c] .blk_plug_device+0xc0/0xd8
[c00040ef7440] [c025bb90] .__make_request+0x498/0x518
[c00040ef74f0] [c0259dc8] .generic_make_request+0x24c/0x2a4
[c00040ef75b0] [c025b6d0] .submit_bio+0x108/0x130
[c00040ef7670] [c01210e4] .submit_bh+0x174/0x1c0
[c00040ef7700] [c01259a8] .block_read_full_page+0x34c/0x3b4
[c00040ef7820] [c0129a60] .blkdev_readpage+0x20/0x38
[c00040ef78a0] [c00c111c] .__do_page_cache_readahead+0x23c/0x2b8
[c00040ef7980] [c00c1370] .ondemand_readahead+0x1d8/0x210
[c00040ef7a30] [c00b7f20] .generic_file_aio_read+0x224/0x620
[c00040ef7b60] [c00f9020] .do_sync_read+0xc4/0x124
[c00040ef7cf0] [c00f98e0] .vfs_read+0xd8/0x1bc
[c00040ef7d90] [c00f9f0c] .sys_read+0x4c/0x8c
[c00040ef7e30] [c00084d4] syscall_exit+0x0/0x40

This was on a Power5 partition.  I am attempting to reproduce the problem.

Any clues?
-- 
Cheers,
Stephen Rothwell [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/



Re: linux-next: spinlock lockup with next-20081118 on powerpc

2008-11-18 Thread Stephen Rothwell
Hi all,

On Wed, 19 Nov 2008 09:30:23 +1100 Stephen Rothwell [EMAIL PROTECTED] wrote:

> This was on a Power5 partition.  I am attempting to reproduce the problem.

OK, it reproduces.  The machine is a Power5 partition (IBM,9124-720
eServer OpenPower 720) with 1 (2 way threaded) cpu (gr, rev2.1, 1.5GHz),
2G of memory, 2 NUMA nodes running Ubuntu Gutsy.

-- 
Cheers,
Stephen Rothwell [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/

