Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
On Thu, 30 Aug 2007 13:29:37 +0200 Arkadiusz Miskiewicz <[EMAIL PROTECTED]> wrote:

> On Wednesday 29 of August 2007, Jens Axboe wrote:
> > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > > > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > > > > I guess I should sent these here since it looks like not scsi bug
> > > > > > > anyway.
> > > > > >
> > > > > > It's stex, right? It seems to have some issues with multiple
> > > > > > completions of commands, which craps out the block layer of course.
> > > > >
> > > > > Yes, stex. I'm staying with 2.6.19 in that case since it works fine
> > > > > in that version.
> > > > >
> > > > > So scsi bug ... 8-)
> > > >
> > > > And you based that conclusion on what exactly?

Could be viewed as a scsi deficiency at least. Is it unheard of for
"independent" queues to have shared resources? If so, then yeah, perhaps
some driver-private locking as James suggested is appropriate. But if
other drivers face similar problems then perhaps it is something which
scsi core should offer support for.

But whatever. The situation is that Ed suggested a fix eight months ago,
James suggested enhancements and afaict nobody did anything more, and
machines which use this driver are still crashing.

OK, Ed's email client breaks message threading, so you need to hyperjump
to a "different" thread a few days later, in which Ed points out that
qla4xxx also has a shared tag queue. Ed's email client proceeds to
splatter the discussion all over the Jan 2007 archive. Ed finds a
possible bug in qla4xxx. Jens proposes a block patch. Ed disagrees, Jeff
agrees with Ed, discussion dies, driver still crashing.

> > > Isn't drivers/scsi/* handled by [EMAIL PROTECTED] (that's what I mean)
> >
> > Yep indeed, I thought you meant that it was a scsi bug (and not an stex
> > one). You could try and copy the 2.6.19 stex driver into 2.6.20 and see
> > if that works, though.
>
> Looks like this bug is known for months :-(
>
> Ed Lin pointed to http://lkml.org/lkml/2007/1/23/268 with possible patch
> (that unfortunately serialises access to storage devices, well...)
>
> There is also: http://bugzilla.kernel.org/show_bug.cgi?id=7842
>
> I'm running 2.6.22 with that patch now, did huge (few hours) rsync that
> previously caused oopses and now everything works properly.
>
> Can we get some form of this patch into Linus tree?

Here's Ed's patch again. As a suboptimal driver is better than a
crashing one, perhaps we should merge it until we can sort out something
better?

From: "Ed Lin" <[EMAIL PROTECTED]>

The block layer uses lock to protect request queue. Every scsi device
has a unique request queue, and queue lock is the default lock in struct
request_queue. This is good for normal cases. But for a host with
shared queue tag (e.g. stex controllers), a queue lock per device means
the shared queue tag is not protected when multiple devices are accessed
at a same time. This patch is a simple fix for this situation by
introducing a host queue lock to protect shared queue tag. Without this
patch we will see various kernel panics (including the BUG() and kernel
errors in blk_queue_start_tag and blk_queue_end_tag of ll_rw_blk.c) when
accessing another in smp kernels).

Signed-off-by: Ed Lin <[EMAIL PROTECTED]>
Cc: James Bottomley <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Cc: Jens Axboe <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/scsi/scsi_lib.c  |    2 +-
 drivers/scsi/stex.c      |    2 ++
 include/scsi/scsi_host.h |    3 +++
 3 files changed, 6 insertions(+), 1 deletion(-)

diff -puN drivers/scsi/scsi_lib.c~scsi-use-lock-per-host-instead-of-per-device-for-shared-queue-tag-host drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c~scsi-use-lock-per-host-instead-of-per-device-for-shared-queue-tag-host
+++ a/drivers/scsi/scsi_lib.c
@@ -1670,7 +1670,7 @@ struct request_queue *__scsi_alloc_queue
 {
 	struct request_queue *q;
 
-	q = blk_init_queue(request_fn, NULL);
+	q = blk_init_queue(request_fn, shost->req_q_lock);
 	if (!q)
 		return NULL;
 
diff -puN drivers/scsi/stex.c~scsi-use-lock-per-host-instead-of-per-device-for-shared-queue-tag-host drivers/scsi/stex.c
--- a/drivers/scsi/stex.c~scsi-use-lock-per-host-instead-of-per-device-for-shared-queue-tag-host
+++ a/drivers/scsi/stex.c
@@ -1234,6 +1234,8 @@ stex_probe(struct pci_dev *pdev, const s
 	if (err)
 		goto out_free_irq;
 
+	spin_lock_init(&host->__req_q_lock);
+	host->req_q_lock = &host->__req_q_lock;
 	err = scsi_init_shared_tag_map(host, host->can_queue);
 	if (err) {
 		printk(KERN_ERR DRV_NAME "(%s): init shared queue failed\n",
diff -puN include/scsi/scsi_host.h~scsi-use-lock-per-host-instead-of-
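
For readers without a kernel tree at hand, the locking idea in the patch
description can be sketched in userspace. This is only an illustration
under stated assumptions, not the kernel code: pthread mutexes stand in
for spinlocks, and every name here (demo_host, demo_device,
demo_start_tag, demo_end_tag, demo_run) is hypothetical. Each "device"
takes its queue lock before touching the tag map, and because all
devices on the host point at the same lock, the shared tag map is never
modified by two devices at once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TAGS     8      /* size of the shared tag map (arbitrary) */
#define NR_DEVICES  4      /* devices hanging off one host (arbitrary) */
#define NR_REQUESTS 20000  /* requests issued per device (arbitrary) */

/* Hypothetical host: one tag map shared by all of its devices. */
struct demo_host {
	pthread_mutex_t lock;   /* plays the role of the per-host queue lock */
	bool busy[NR_TAGS];     /* shared tag map */
	int owner[NR_TAGS];     /* device currently holding each tag */
	long errors;            /* tags released by someone who did not own them */
};

/* Hypothetical device: its queue lock is a pointer, as in the patch. */
struct demo_device {
	int id;
	struct demo_host *host;
	pthread_mutex_t *queue_lock;  /* every device points at host->lock */
};

/* Grab the first free tag, or return -1 if the map is full. */
static int demo_start_tag(struct demo_device *dev)
{
	int tag = -1;

	pthread_mutex_lock(dev->queue_lock);
	for (int i = 0; i < NR_TAGS; i++) {
		if (!dev->host->busy[i]) {
			dev->host->busy[i] = true;
			dev->host->owner[i] = dev->id;
			tag = i;
			break;
		}
	}
	pthread_mutex_unlock(dev->queue_lock);
	return tag;
}

/* Release a tag and record an error if the map was corrupted meanwhile. */
static void demo_end_tag(struct demo_device *dev, int tag)
{
	assert(tag >= 0 && tag < NR_TAGS);

	pthread_mutex_lock(dev->queue_lock);
	if (!dev->host->busy[tag] || dev->host->owner[tag] != dev->id)
		dev->host->errors++;
	dev->host->busy[tag] = false;
	pthread_mutex_unlock(dev->queue_lock);
}

static void *demo_device_thread(void *arg)
{
	struct demo_device *dev = arg;

	for (int i = 0; i < NR_REQUESTS; i++) {
		int tag = demo_start_tag(dev);

		if (tag >= 0)
			demo_end_tag(dev, tag);
	}
	return NULL;
}

/* Run all devices concurrently; returns the number of tag map errors. */
long demo_run(void)
{
	struct demo_host host = { .errors = 0 };
	struct demo_device devs[NR_DEVICES];
	pthread_t threads[NR_DEVICES];

	pthread_mutex_init(&host.lock, NULL);
	for (int i = 0; i < NR_DEVICES; i++) {
		devs[i] = (struct demo_device){
			.id = i, .host = &host, .queue_lock = &host.lock,
		};
		pthread_create(&threads[i], NULL, demo_device_thread, &devs[i]);
	}
	for (int i = 0; i < NR_DEVICES; i++)
		pthread_join(threads[i], NULL);
	pthread_mutex_destroy(&host.lock);
	return host.errors;
}
```

With the single host lock, demo_run() should report no errors. Giving
each device a private mutex instead would leave the tag map unprotected,
which is roughly the situation the patch description attributes to stex.
The cost is the one Arkadiusz mentions: requests to all devices on the
host are serialised on one lock.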
Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
On Wednesday 29 of August 2007, Jens Axboe wrote:
> On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > > > I guess I should sent these here since it looks like not scsi bug
> > > > > > anyway.
> > > > >
> > > > > It's stex, right? It seems to have some issues with multiple
> > > > > completions of commands, which craps out the block layer of course.
> > > >
> > > > Yes, stex. I'm staying with 2.6.19 in that case since it works fine
> > > > in that version.
> > > >
> > > > So scsi bug ... 8-)
> > >
> > > And you based that conclusion on what exactly?
> >
> > Isn't drivers/scsi/* handled by [EMAIL PROTECTED] (that's what I mean)
>
> Yep indeed, I thought you meant that it was a scsi bug (and not an stex
> one). You could try and copy the 2.6.19 stex driver into 2.6.20 and see
> if that works, though.

Looks like this bug is known for months :-(

Ed Lin pointed to http://lkml.org/lkml/2007/1/23/268 with possible patch
(that unfortunately serialises access to storage devices, well...)

There is also: http://bugzilla.kernel.org/show_bug.cgi?id=7842

I'm running 2.6.22 with that patch now, did huge (few hours) rsync that
previously caused oopses and now everything works properly.

Can we get some form of this patch into Linus tree?

--
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> On Wednesday 29 of August 2007, Jens Axboe wrote:
> > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > > I guess I should sent these here since it looks like not scsi bug
> > > > > anyway.
> > > >
> > > > It's stex, right? It seems to have some issues with multiple
> > > > completions of commands, which craps out the block layer of course.
> > >
> > > Yes, stex. I'm staying with 2.6.19 in that case since it works fine in
> > > that version.
> > >
> > > So scsi bug ... 8-)
> >
> > And you based that conclusion on what exactly?
>
> Isn't drivers/scsi/* handled by [EMAIL PROTECTED] (that's what I mean)

Yep indeed, I thought you meant that it was a scsi bug (and not an stex
one). You could try and copy the 2.6.19 stex driver into 2.6.20 and see
if that works, though.

--
Jens Axboe
Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
On Wednesday 29 of August 2007, Jens Axboe wrote:
> On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > I guess I should sent these here since it looks like not scsi bug
> > > > anyway.
> > >
> > > It's stex, right? It seems to have some issues with multiple
> > > completions of commands, which craps out the block layer of course.
> >
> > Yes, stex. I'm staying with 2.6.19 in that case since it works fine in
> > that version.
> >
> > So scsi bug ... 8-)
>
> And you based that conclusion on what exactly?

Isn't drivers/scsi/* handled by [EMAIL PROTECTED] (that's what I mean)

--
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> On Wednesday 29 of August 2007, Jens Axboe wrote:
> > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > I guess I should sent these here since it looks like not scsi bug anyway.
> >
> > It's stex, right? It seems to have some issues with multiple completions
> > of commands, which craps out the block layer of course.
>
> Yes, stex. I'm staying with 2.6.19 in that case since it works fine in that
> version.
>
> So scsi bug ... 8-)

And you based that conclusion on what exactly?

--
Jens Axboe
Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
On Wednesday 29 of August 2007, Jens Axboe wrote:
> On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > I guess I should sent these here since it looks like not scsi bug anyway.
>
> It's stex, right? It seems to have some issues with multiple completions
> of commands, which craps out the block layer of course.

Yes, stex. I'm staying with 2.6.19 in that case since it works fine in that
version.

So scsi bug ... 8-)

--
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
>
> I guess I should sent these here since it looks like not scsi bug anyway.

It's stex, right? It seems to have some issues with multiple completions
of commands, which craps out the block layer of course.

> -- Forwarded Message --
>
> Subject: 2.6.22 oops kernel BUG at block/elevator.c:366!
> Date: Wednesday 29 of August 2007
> From: Arkadiusz Miskiewicz <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
>
> Hello,
>
> I'm trying to get stable kernel for Promise SuperTrak
> X16350 hardware. So far 2.6.20, 2.6.21 and 2.6.22 oopsed
> like this (while doing rsync):
>
> kernel BUG at block/elevator.c:366!
> invalid opcode: [1] SMP
> CPU 1
> Modules linked in: softdog sch_sfq forcedeth ext3 jbd mbcache dm_mod xfs
> scsi_wait_scan sd_mod stex scsi_mod
> Pid: 1139:#0, comm: xfsbufd Not tainted 2.6.22.5-0.2 #1
> RIP: 0010:[] [] elv_rb_del+0x3a/0x40
> RSP: :8100759b1c00 EFLAGS: 00010046
> RAX: 81000d1f5428 RBX: 81000d1f5428 RCX: 81007c1a1a00
> RDX: RSI: 81000d1f53b0 RDI: 81007c102af0
> RBP: 81000d1f53b0 R08: 81004a9dab50 R09:
> R10: R11: 880072c0 R12: 81007c102ac0
> R13: 81007c1a1a00 R14: 0004 R15: 81007c102b18
> FS: 2ba2cafc9be0() GS:81007d0a5b40() knlGS:
> CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
> CR2: 2ba2cab5a158 CR3: 3c5ce000 CR4: 06e0
> Process xfsbufd (pid: 1139[#0], threadinfo 8100759b, task
> 81007cac1040)
> Stack: 0001 81007c102ac0 81000d1f53b0 8034abe8
> 0246 81000d1f53b0 81007c1a1a00 81007c102ac0
> 81007c0f2d08 0004 81007c102b18 8034ad55
> Call Trace:
> [] cfq_remove_request+0x78/0x1b0
> [] cfq_dispatch_insert+0x35/0x70
> [] cfq_dispatch_requests+0x1bf/0x3a0
> [] elv_next_request+0x3f/0x150
> [] lock_timer_base+0x34/0x70
> [] :scsi_mod:scsi_request_fn+0x69/0x3d0
> [] __make_request+0xe6/0x5d0
> [] generic_make_request+0x18b/0x230
> [] submit_bio+0x5a/0xf0
> [] :xfs:_xfs_buf_ioapply+0x199/0x340
> [] :xfs:xfs_buf_iorequest+0x29/0x80
> [] :xfs:xfs_bdstrat_cb+0x3b/0x50
> [] :xfs:xfsbufd+0x92/0x140
> [] :xfs:xfsbufd+0x0/0x140
> [] kthread+0x4b/0x80
> [] child_rip+0xa/0x12
> [] kthread+0x0/0x80
> [] child_rip+0x0/0x12
>
>
> Code: 0f 0b eb fe 66 90 48 83 ec 08 49 89 f8 48 89 f8 31 c9 eb 09
> RIP [] elv_rb_del+0x3a/0x40
> RSP
>
>
> I can reproduce it without bigger problem.
>
>
> Here are the same oopses on 2.6.20:
> http://paste.stgraber.org/3138
>
> This is 1 x dual core athlon64 on asus m2npv mainboard, 2GB RAM.
> There is hw raid on fasttrack 16350 only (no software one).
>
> Has anyone seen this ?
>
> Going to try without cfq.
>
> --
> Arkadiusz Miśkiewicz        PLD/Linux Team
> arekm / maven.pl            http://ftp.pld-linux.org/
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> ---
>
> -- Forwarded Message --
>
> Subject: Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
> Date: Wednesday 29 of August 2007
> From: Arkadiusz Miskiewicz <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
>
> On Wednesday 29 of August 2007, Arkadiusz Miskiewicz wrote:
> > Hello,
> >
> > I'm trying to get stable kernel for Promise SuperTrak
> > X16350 hardware. So far 2.6.20, 2.6.21 and 2.6.22 oopsed
> > like this (while doing rsync):
>
> With anticipatory:
>
> berta login: [ cut here ]
> kernel BUG at block/as-iosched.c:1084!
> invalid opcode: [1] SMP
> CPU 1
> Modules linked in: softdog sch_sfq forcedeth ext3 jbd mbcache dm_mod xfs
> scsi_wait_scan sd_mod stex scsi_mod
> Pid: 32:#0, comm: kblockd/1 Not tainted 2.6.22.5-0.2 #1
> RIP: 0010:[] []
> as_dispatch_request+0x438/0x460
> RSP: 0018:81007d1fddc0 EFLAGS: 00010046
> RAX: RBX: 81007c765a00 RCX:
> RDX: 81007c765a28 RSI: RDI: 81007c54ad08
> RBP: R08: R09: 81006a289d80
> R10: R11: 0001 R12:
> R13: 0001 R14: R15: 81007cf85048
> FS: 2ba4421e8b00() GS:81007d0a5b40() knlGS:
> CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
> CR