Re: qla2xxx BUG: workqueue leaked lock or atomic
On 12:05, Mingming Cao wrote:
> > > BTW: Are ext3 filesystem sizes greater than 8T now officially
> > > supported?
> >
> > I think so, but I don't know how much 16TB testing developers and
> > distros are doing - perhaps the linux-ext4 denizens can tell us?
>
> IBM has done some testing (dbench, fsstress, fsx, tiobench, iozone etc)
> on 10TB ext3, I think RedHat and BULL have done similar test on >8TB
> ext3 too.

Thanks. I'm asking because some days ago I tried to create a 10T ext3
filesystem on a linear software raid over two hardware raids, and it
failed horribly. mke2fs from e2fsprogs-1.39 refused to create such a
large filesystem but did it with -F, and I could mount it afterwards.
But writing data immediately produced zillions of errors and only
power-cycling the box helped.

We're now using a 7.9T filesystem on the same hardware. That seems to
work fine on 2.6.21-rc2, so I think this is an ext3 problem. I cannot
completely rule out other reasons though, as the underlying qla2xxx
driver also had some problems on earlier kernels.

We'd much rather have a 10T filesystem if possible, so if you have time
to look into the issue I would be willing to recreate the 10T filesystem
and send details.

Regards
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
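The 8T boundary being discussed has a concrete origin: ext3 addresses blocks with a 32-bit block number, so the ceiling is 2^32 blocks times the block size (16 TiB at the usual 4 KiB block size), while code paths that treated the block number as signed stopped at 2^31 blocks = 8 TiB. A quick arithmetic check (an editorial sketch, not from the thread):

```shell
# ext3 size ceilings from 32-bit block numbers: 2^32 blocks * block size
# for the full unsigned range, 2^31 blocks where the number was treated
# as signed -- the boundary mke2fs 1.39 balked at without -F.
blocksize=4096               # the usual ext3 block size
tib=$(( 1 << 40 ))           # bytes per TiB
echo "unsigned 32-bit limit: $(( (1 << 32) * blocksize / tib )) TiB"
echo "signed 32-bit limit: $(( (1 << 31) * blocksize / tib )) TiB"
```

With 4 KiB blocks this prints 16 TiB and 8 TiB respectively, matching the limits discussed above.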
Re: qla2xxx BUG: workqueue leaked lock or atomic
On Thu, Mar 08 2007, Andre Noll wrote:
> On 10:36, Jens Axboe wrote:
> > - Edit .config and set CONFIG_DEBUG_INFO=y (near the bottom)
> > - make oldconfig
> > - rm block/cfq-iosched.o
> > - make block/cfq-iosched.o
> > - gdb block/cfq-iosched.o
> >
> > (gdb) l *cfq_dispatch_insert+0x28
> >
> > and see what that says. Should not take you more than a minute or so,
> > would appreciate it!
>
> No problem, here we go:
>
> # gdb block/cfq-iosched.o
> GNU gdb 6.4-debian
> Copyright 2005 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu"...Using host libthread_db
> library "/lib/libthread_db.so.1".
>
> (gdb) l *cfq_dispatch_insert+0x28
> 0xcf8 is in cfq_dispatch_insert (block/cfq-iosched.c:865).
> 860     }
> 861
> 862     static void cfq_dispatch_insert(request_queue_t *q, struct request *rq)
> 863     {
> 864             struct cfq_data *cfqd = q->elevator->elevator_data;
> 865             struct cfq_queue *cfqq = RQ_CFQQ(rq);
> 866
> 867             cfq_remove_request(rq);
> 868             cfqq->on_dispatch[rq_is_sync(rq)]++;
> 869             elv_dispatch_sort(q, rq);

Ok, so it's ->next_rq being NULL or invalid. Similar to the report from
Dan last week, that's a bit worrisome. I'll have to look further into
that.

--
Jens Axboe
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qla2xxx BUG: workqueue leaked lock or atomic
On 10:36, Jens Axboe wrote:
> - Edit .config and set CONFIG_DEBUG_INFO=y (near the bottom)
> - make oldconfig
> - rm block/cfq-iosched.o
> - make block/cfq-iosched.o
> - gdb block/cfq-iosched.o
>
> (gdb) l *cfq_dispatch_insert+0x28
>
> and see what that says. Should not take you more than a minute or so,
> would appreciate it!

No problem, here we go:

# gdb block/cfq-iosched.o
GNU gdb 6.4-debian
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".

(gdb) l *cfq_dispatch_insert+0x28
0xcf8 is in cfq_dispatch_insert (block/cfq-iosched.c:865).
860     }
861
862     static void cfq_dispatch_insert(request_queue_t *q, struct request *rq)
863     {
864             struct cfq_data *cfqd = q->elevator->elevator_data;
865             struct cfq_queue *cfqq = RQ_CFQQ(rq);
866
867             cfq_remove_request(rq);
868             cfqq->on_dispatch[rq_is_sync(rq)]++;
869             elv_dispatch_sort(q, rq);

Regards
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
Re: qla2xxx BUG: workqueue leaked lock or atomic
On Thu, Mar 08 2007, Andre Noll wrote:
> On 10:02, Jens Axboe wrote:
> > Do you still have the vmlinux? It'd be interesting to see what
> >
> > $ gdb vmlinux
> > (gdb) l *cfq_dispatch_insert+0x28
> >
> > says,
>
> The vmlinux in the kernel dir is dated March 5 and my bug report
> was Feb 28. So I'm afraid it's gone. I tried the gdb command anyway
> but it only gave me
>
> No symbol table is loaded. Use the "file" command.

Yeah, you'd need CONFIG_DEBUG_INFO enabled as well. I don't think there
were any CFQ changes between Feb 28 and March 5, so you could probably
still try it out. A quicker way:

- Edit .config and set CONFIG_DEBUG_INFO=y (near the bottom)
- make oldconfig
- rm block/cfq-iosched.o
- make block/cfq-iosched.o
- gdb block/cfq-iosched.o

(gdb) l *cfq_dispatch_insert+0x28

and see what that says. Should not take you more than a minute or so,
would appreciate it!

--
Jens Axboe
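Jens' recipe generalizes to any object file built with debug info; `-g` is the userspace counterpart of CONFIG_DEBUG_INFO=y. A minimal sketch of the same `list *symbol+offset` technique, assuming gcc and gdb are installed (`demo.c` and `answer` are made-up names for illustration):

```shell
# Build a tiny object file with debug info, then ask gdb to map a code
# address back to its source line -- the same trick used above for
# cfq_dispatch_insert+0x28.
cat > demo.c <<'EOF'
int answer(void)
{
    return 42;
}
EOF
gcc -g -c demo.c -o demo.o
# "list *answer" resolves the function's address to file:line and
# prints the surrounding source, e.g. "0x0 is in answer (demo.c:2)".
gdb -batch -ex 'list *answer' demo.o
```

Without `-g` the same command fails with "No symbol table is loaded", which is exactly what Andre saw on the vmlinux built without CONFIG_DEBUG_INFO.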
Re: qla2xxx BUG: workqueue leaked lock or atomic
On 10:02, Jens Axboe wrote:
> Do you still have the vmlinux? It'd be interesting to see what
>
> $ gdb vmlinux
> (gdb) l *cfq_dispatch_insert+0x28
>
> says,

The vmlinux in the kernel dir is dated March 5 and my bug report
was Feb 28. So I'm afraid it's gone. I tried the gdb command anyway
but it only gave me

No symbol table is loaded. Use the "file" command.

Sorry
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
Re: qla2xxx BUG: workqueue leaked lock or atomic
On Thu, Mar 08 2007, Andre Noll wrote:
> On 19:46, Jens Axboe wrote:
> > On Wed, Feb 28 2007, Andre Noll wrote:
> > > On 16:18, Andre Noll wrote:
> > >
> > > > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > > > writing to both raid systems at the same time via lvm still locks up
> > > > the system within minutes.
> > >
> > > Screenshot of the resulting kernel panic:
> > >
> > > http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
> >
> > Do you have the full oops as well?
>
> Unfortunately not, as there's no way to scroll up after a kernel panic
> (the screenshot was taken by using a KVM switch which just sends the
> video output over ethernet).

Do you still have the vmlinux? It'd be interesting to see what

$ gdb vmlinux
(gdb) l *cfq_dispatch_insert+0x28

says, here that'd be the cfqq dereference. And that must be valid, it's
set at allocation time and only cleared after free. So unless lvm issues
private requests that aren't properly allocated, this whole thing looks
very bizarre.

--
Jens Axboe
Re: qla2xxx BUG: workqueue leaked lock or atomic
On 19:46, Jens Axboe wrote:
> On Wed, Feb 28 2007, Andre Noll wrote:
> > On 16:18, Andre Noll wrote:
> >
> > > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > > writing to both raid systems at the same time via lvm still locks up
> > > the system within minutes.
> >
> > Screenshot of the resulting kernel panic:
> >
> > http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
>
> Do you have the full oops as well?

Unfortunately not, as there's no way to scroll up after a kernel panic
(the screenshot was taken by using a KVM switch which just sends the
video output over ethernet).

Thanks
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
Re: qla2xxx BUG: workqueue leaked lock or atomic
On Wed, 2007-03-07 at 11:45 -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2007 18:09:55 +0100 Andre Noll <[EMAIL PROTECTED]> wrote:
>
> > On 20:39, Andrew Morton wrote:
> > > On Wed, 28 Feb 2007 16:37:22 +0100 Andre Noll <[EMAIL PROTECTED]> wrote:
> > >
> > > > On 16:18, Andre Noll wrote:
> > > >
> > > > > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > > > > writing to both raid systems at the same time via lvm still locks up
> > > > > the system within minutes.
> > > >
> > > > Screenshot of the resulting kernel panic:
> > > >
> > > > http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
> > >
> > > It died in CFQ. Please try a different IO scheduler. Use something
> > > like
> > >
> > > echo deadline > /sys/block/sda/queue/scheduler
> > >
> > > This could still be the old qla2xxx bug, or it could be a new qla2xxx bug,
> > > or it could be a block bug, or it could be an LVM bug.
> >
> > OK. I'm running with deadline right now. But I guess this kernel
> > panic was caused by an LVM bug because lockdep reported problems with
> > LVM. Nobody responded to my bug report on the LVM mailing list (see
> > http://www.redhat.com/archives/linux-lvm/2007-February/msg00102.html).
> >
> > Non-working snapshots and no help from the mailing list convinced me
> > to ditch the lvm setup [1] in favour of linear software raid. This
> > means I can't do lvm-related tests any more.
>
> Sigh.
>
> > BTW: Are ext3 filesystem sizes greater than 8T now officially
> > supported?
>
> I think so, but I don't know how much 16TB testing developers and
> distros are doing - perhaps the linux-ext4 denizens can tell us?

IBM has done some testing (dbench, fsstress, fsx, tiobench, iozone etc)
on 10TB ext3, I think RedHat and BULL have done similar test on >8TB
ext3 too.

Mingming
Re: qla2xxx BUG: workqueue leaked lock or atomic
On Wed, 7 Mar 2007 18:09:55 +0100 Andre Noll <[EMAIL PROTECTED]> wrote:

> On 20:39, Andrew Morton wrote:
> > On Wed, 28 Feb 2007 16:37:22 +0100 Andre Noll <[EMAIL PROTECTED]> wrote:
> >
> > > On 16:18, Andre Noll wrote:
> > >
> > > > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > > > writing to both raid systems at the same time via lvm still locks up
> > > > the system within minutes.
> > >
> > > Screenshot of the resulting kernel panic:
> > >
> > > http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
> >
> > It died in CFQ. Please try a different IO scheduler. Use something
> > like
> >
> > echo deadline > /sys/block/sda/queue/scheduler
> >
> > This could still be the old qla2xxx bug, or it could be a new qla2xxx bug,
> > or it could be a block bug, or it could be an LVM bug.
>
> OK. I'm running with deadline right now. But I guess this kernel
> panic was caused by an LVM bug because lockdep reported problems with
> LVM. Nobody responded to my bug report on the LVM mailing list (see
> http://www.redhat.com/archives/linux-lvm/2007-February/msg00102.html).
>
> Non-working snapshots and no help from the mailing list convinced me
> to ditch the lvm setup [1] in favour of linear software raid. This
> means I can't do lvm-related tests any more.

Sigh.

> BTW: Are ext3 filesystem sizes greater than 8T now officially
> supported?

I think so, but I don't know how much 16TB testing developers and
distros are doing - perhaps the linux-ext4 denizens can tell us?
Re: qla2xxx BUG: workqueue leaked lock or atomic
On Wed, Feb 28 2007, Andre Noll wrote:
> On 16:18, Andre Noll wrote:
>
> > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > writing to both raid systems at the same time via lvm still locks up
> > the system within minutes.
>
> Screenshot of the resulting kernel panic:
>
> http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png

Do you have the full oops as well?

--
Jens Axboe
Re: qla2xxx BUG: workqueue leaked lock or atomic
On 20:39, Andrew Morton wrote:
> On Wed, 28 Feb 2007 16:37:22 +0100 Andre Noll <[EMAIL PROTECTED]> wrote:
>
> > On 16:18, Andre Noll wrote:
> >
> > > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > > writing to both raid systems at the same time via lvm still locks up
> > > the system within minutes.
> >
> > Screenshot of the resulting kernel panic:
> >
> > http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
>
> It died in CFQ. Please try a different IO scheduler. Use something
> like
>
> echo deadline > /sys/block/sda/queue/scheduler
>
> This could still be the old qla2xxx bug, or it could be a new qla2xxx bug,
> or it could be a block bug, or it could be an LVM bug.

OK. I'm running with deadline right now. But I guess this kernel
panic was caused by an LVM bug because lockdep reported problems with
LVM. Nobody responded to my bug report on the LVM mailing list (see
http://www.redhat.com/archives/linux-lvm/2007-February/msg00102.html).

Non-working snapshots and no help from the mailing list convinced me
to ditch the lvm setup [1] in favour of linear software raid. This
means I can't do lvm-related tests any more.

BTW: Are ext3 filesystem sizes greater than 8T now officially
supported?

Thanks
Andre

[1] vg of two hardware raids, 10T together, a single lv and some snapshots
Re: qla2xxx BUG: workqueue leaked lock or atomic
On Wed, 28 Feb 2007 16:37:22 +0100 Andre Noll <[EMAIL PROTECTED]> wrote:

> On 16:18, Andre Noll wrote:
>
> > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > writing to both raid systems at the same time via lvm still locks up
> > the system within minutes.
>
> Screenshot of the resulting kernel panic:
>
> http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png

It died in CFQ. Please try a different IO scheduler. Use something
like

echo deadline > /sys/block/sda/queue/scheduler

This could still be the old qla2xxx bug, or it could be a new qla2xxx bug,
or it could be a block bug, or it could be an LVM bug.

Adrian, can we please track this as a regression?
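The sysfs file Andrew points at lists every compiled-in elevator and marks the active one with brackets; writing a bare scheduler name switches it at runtime, per device. A sketch of that round trip, simulated on a temp file so it runs without the actual hardware (the real path would be /sys/block/sda/queue/scheduler):

```shell
# Simulated scheduler file; a real one lives under
# /sys/block/<dev>/queue/scheduler and is provided by the kernel.
sched=$(mktemp)
echo 'noop anticipatory deadline [cfq]' > "$sched"

# The active scheduler is the bracketed entry.
active=$(grep -o '\[[a-z]*\]' "$sched" | tr -d '[]')
echo "active: $active"

# On real sysfs, writing a name switches the elevator immediately --
# exactly what "echo deadline > .../scheduler" does above.
echo deadline > "$sched"
cat "$sched"
rm -f "$sched"
```

The switch takes effect without a reboot, which is what makes it a convenient way to rule CFQ in or out as the crashing component.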
Re: qla2xxx BUG: workqueue leaked lock or atomic
On 16:18, Andre Noll wrote:
> With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> writing to both raid systems at the same time via lvm still locks up
> the system within minutes.

Screenshot of the resulting kernel panic:

http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png

Andre
Re: qla2xxx BUG: workqueue leaked lock or atomic
On 10:51, Andrew Vasquez wrote:
> On Tue, 27 Feb 2007, Andre Noll wrote:
> > [ 68.532665] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
>
> Ok, since 2.6.20, there's been a patch added to qla2xxx which drops the
> spin_unlock_irq() call while attempting to ramp-up the queue-depth:
>
> Could you try the latest 2.6.21-rc which contains the correction?

With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
writing to both raid systems at the same time via lvm still locks up
the system within minutes.

As lockdep revealed another dm-related lock problem on this kernel,
I guess I'll have to bother the lvm people on this.

Thanks
Andre
Re: qla2xxx BUG: workqueue leaked lock or atomic
On Tue, 27 Feb 2007, Andre Noll wrote:
> On 10:26, Andrew Vasquez wrote:
> > You are loading some stale firmware that's left over on the card --
> > I'm not even sure what 4.00.70 is, as the latest release firmware is
> > 4.00.27.
>
> That's the firmware which came with the card. Anyway, I just upgraded
> the firmware, but the bug remains. The backtrace differs a bit though
> as now the tg3 network driver seems to be involved as well.
>
> Thanks for your help
> Andre

...

> [ 68.532665] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
> [ 68.532784]
> [ 68.532785] Call Trace:
> [ 68.532979]  [] trace_hardirqs_on+0xd7/0x180
> [ 68.533168]  [] _spin_unlock_irq+0x2b/0x40
> [ 68.533295]  [] :qla2xxx:qla2x00_process_completed_request+0x137/0x1d0
> [ 68.533457]  [] :qla2xxx:qla2x00_status_entry+0x82/0xa40
> [ 68.533577]  [] __lock_acquire+0xcdf/0xd90
> [ 68.533693]  [] _spin_unlock_irqrestore+0x42/0x60
> [ 68.533816]  [] :qla2xxx:qla24xx_intr_handler+0x4e/0x2b0
> [ 68.533942]  [] :qla2xxx:qla24xx_process_response_queue+0xc1/0x1c0
> [ 68.534102]  [] :qla2xxx:qla24xx_intr_handler+0x1d4/0x2b0

Ok, since 2.6.20, there's been a patch added to qla2xxx which drops the
spin_unlock_irq() call while attempting to ramp-up the queue-depth:

commit befede3dabd204e9c546cbfbe391b29286c57da2
Author: Seokmann Ju <[EMAIL PROTECTED]>
Date:   Tue Jan 9 11:37:52 2007 -0800

    [SCSI] qla2xxx: correct locking while call starget_for_each_device()

    Removed spin_unlock_irq()/spin_lock_irq() pairs surrounding
    starget_for_each_device() calls. As Matthew W. pointed out,
    starget_for_each_device() can be called under a spinlock being held.

    The change has been tested and verified on qla2xxx.ko module.
    Thanks Matthew W. and Hisashi H. for help.

    Signed-off-by: Andrew Vasquez <[EMAIL PROTECTED]>
    Signed-off-by: Seokmann Ju <[EMAIL PROTECTED]>
    Signed-off-by: James Bottomley <[EMAIL PROTECTED]>

http://marc.theaimsgroup.com/?l=linux-scsi&m=116837234904583&w=2

Could you try the latest 2.6.21-rc which contains the correction?

Regards,
Andrew Vasquez
Re: qla2xxx BUG: workqueue leaked lock or atomic
On 11:11, Andre Noll wrote:
> On 10:26, Andrew Vasquez wrote:
> > You are loading some stale firmware that's left over on the card --
> > I'm not even sure what 4.00.70 is, as the latest release firmware is
> > 4.00.27.
>
> That's the firmware which came with the card. Anyway, I just upgraded
> the firmware, but the bug remains.

The system crashed again btw., this time resulting in a kernel panic
instead of just locking up silently. Here's a screenshot:

http://systemlinux.org/~maan/shots/qla2xxx-crash-huangho2.png

Regards
Andre
Re: qla2xxx BUG: workqueue leaked lock or atomic
On 10:26, Andrew Vasquez wrote:
> You are loading some stale firmware that's left over on the card --
> I'm not even sure what 4.00.70 is, as the latest release firmware is
> 4.00.27.

That's the firmware which came with the card. Anyway, I just upgraded
the firmware, but the bug remains. The backtrace differs a bit though
as now the tg3 network driver seems to be involved as well.

Thanks for your help
Andre

[ 67.511167] qla2xxx :05:08.0: Allocated (64 KB) for EFT...
[ 67.511434] qla2xxx :05:08.0: Allocated (1413 KB) for firmware dump...
[ 67.531231] scsi0 : qla2xxx
[ 67.854344] qla2xxx :05:08.0:
[ 67.854346]  QLogic Fibre Channel HBA Driver: 8.01.07-k4
[ 67.854347]   QLogic HP AE369-60001 - QLA2340
[ 67.854348]   ISP2422: PCI-X Mode 1 (133 MHz) @ :05:08.0 hdma+, host#=0, fw=4.00.27 [IP]
[ 67.854881] ACPI: PCI Interrupt :05:08.1[B] -> GSI 33 (level, low) -> IRQ 33
[ 67.855230] qla2xxx :05:08.1: Found an ISP2422, irq 33, iobase 0xc2012000
[ 67.855645] qla2xxx :05:08.1: Configuring PCI space...
[ 67.855907] qla2xxx :05:08.1: Configure NVRAM parameters...
[ 67.862486] qla2xxx :05:08.1: Verifying loaded RISC code...
[ 68.106663] qla2xxx :05:08.1: Allocated (64 KB) for EFT...
[ 68.107058] qla2xxx :05:08.1: Allocated (1413 KB) for firmware dump...
[ 68.126759] scsi1 : qla2xxx
[ 68.196783] Adding 6540152k swap on /dev/md2. Priority:-1 extents:1 across:6540152k
[ 68.260645] qla2xxx :05:08.0: LIP reset occured (f8f7).
[ 68.296027] qla2xxx :05:08.0: LIP occured (f8f7).
[ 68.298214] qla2xxx :05:08.0: LOOP UP detected (2 Gbps).
[ 68.326627] qla2xxx :05:08.1:
[ 68.326628]  QLogic Fibre Channel HBA Driver: 8.01.07-k4
[ 68.326630]   QLogic HP AE369-60001 - QLA2340
[ 68.326631]   ISP2422: PCI-X Mode 1 (133 MHz) @ :05:08.1 hdma+, host#=1, fw=4.00.27 [IP]
[ 68.504335] EXT3 FS on md1, internal journal
[ 68.524627] PM: Writing back config space on device :03:06.0 at offset b (was 164814e4, writing d00e11)
[ 68.524644] PM: Writing back config space on device :03:06.0 at offset 3 (was 804000, writing 804010)
[ 68.524650] PM: Writing back config space on device :03:06.0 at offset 2 (was 200, writing 210)
[ 68.524657] PM: Writing back config space on device :03:06.0 at offset 1 (was 2b0, writing 2b00146)
[ 68.532665] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
[ 68.532784]
[ 68.532785] Call Trace:
[ 68.532979]  [] trace_hardirqs_on+0xd7/0x180
[ 68.533168]  [] _spin_unlock_irq+0x2b/0x40
[ 68.533295]  [] :qla2xxx:qla2x00_process_completed_request+0x137/0x1d0
[ 68.533457]  [] :qla2xxx:qla2x00_status_entry+0x82/0xa40
[ 68.533577]  [] __lock_acquire+0xcdf/0xd90
[ 68.533693]  [] _spin_unlock_irqrestore+0x42/0x60
[ 68.533816]  [] :qla2xxx:qla24xx_intr_handler+0x4e/0x2b0
[ 68.533942]  [] :qla2xxx:qla24xx_process_response_queue+0xc1/0x1c0
[ 68.534102]  [] :qla2xxx:qla24xx_intr_handler+0x1d4/0x2b0
[ 68.534224]  [] handle_IRQ_event+0x20/0x60
[ 68.534339]  [] handle_fasteoi_irq+0xbd/0x110
[ 68.534459]  [] do_IRQ+0x132/0x1a0
[ 68.534574]  [] ret_from_intr+0x0/0xf
[ 68.534687]  [] __delay+0xc/0x20
[ 68.534862]  [] __const_udelay+0x37/0x40
[ 68.534982]  [] :tg3:tg3_chip_reset+0x547/0x670
[ 68.535103]  [] :tg3:tg3_reset_hw+0x5d/0x1790
[ 68.535218]  [] __udelay+0x37/0x40
[ 68.535333]  [] :tg3:_tw32_flush+0x6d/0x80
[ 68.535451]  [] :tg3:tg3_open+0x2d6/0x610
[ 68.535569]  [] :tg3:tg3_init_hw+0x42/0x50
[ 68.535687]  [] :tg3:tg3_open+0x2e3/0x610
[ 68.535804]  [] dev_open+0x43/0x90
[ 68.535917]  [] dev_change_flags+0x74/0x160
[ 68.536034]  [] devinet_ioctl+0x2e6/0x730
[ 68.536149]  [] dev_ioctl+0x302/0x340
[ 68.536264]  [] __up_read+0x9b/0xb0
[ 68.536378]  [] inet_ioctl+0x4c/0x70
[ 68.536494]  [] sock_ioctl+0x1fc/0x230
[ 68.536610]  [] do_ioctl+0x31/0xa0
[ 68.536722]  [] vfs_ioctl+0x2bb/0x2e0
[ 68.536836]  [] sys_ioctl+0x4a/0x80
[ 68.536948]  [] system_call+0x7e/0x83
[ 68.537059]
[ 68.712832] scsi 0:0:0:0: Direct-Access transtec T6100F16R1-E 342I PQ: 0 ANSI: 5
[ 68.713384] sda : very big device. try to use READ CAPACITY(16).
[ 68.713594] SCSI device sda: 11714863104 512-byte hdwr sectors (5998010 MB)
[ 68.713976] sda: Write Protect is off
[ 68.714079] sda: Mode Sense: 9b 00 00 08
[ 68.714483] SCSI device sda: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 68.714876] sda : very big device. try to use READ CAPACITY(16).
[ 68.715080] SCSI device sda: 11714863104 512-byte hdwr sectors (5998010 MB)
[ 68.715436] sda: Write Protect is off
[ 68.715539] sda: Mode Sense: 9b 00 00 08
[ 68.715944] SCSI device sda: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 68.718244] sda: unknown partition table
[ 68.718707] sd 0:0:0:0: Attached scsi disk sda
[ 68.718945] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 68.719413]
Re: qla2xxx BUG: workqueue leaked lock or atomic
On Mon, 26 Feb 2007, Andre Noll wrote:
> On linux-2.6.20.1, we're seeing hard lockups with 2 raid systems
> connected to a qla2xxx card and used as a single volume via lvm.
> The system seems to lock up only if data gets written to both raid
> systems at the same time.
>
> On a standard kernel nothing makes it to the log, the system just
> freezes. So we tried a lockdep kernel which reports two BUGs during
> boot, see below.
>
> Could this be related to our problem?

Before we proceed further, could you retrieve the latest firmware
release for 24xx type HBAs:

> [ 64.151096] QLogic Fibre Channel HBA Driver
> [ 64.151405] ACPI: PCI Interrupt :05:08.0[A] -> GSI 32 (level, low) -> IRQ 32
> [ 64.151821] qla2xxx :05:08.0: Found an ISP2422, irq 32, iobase 0xc2006000
> [ 64.152231] qla2xxx :05:08.0: Configuring PCI space...
> [ 64.152498] qla2xxx :05:08.0: Configure NVRAM parameters...
> [ 64.159088] qla2xxx :05:08.0: Verifying loaded RISC code...
> [ 74.169623] qla2xxx :05:08.0: Firmware image unavailable.
> [ 74.169737] qla2xxx :05:08.0: Firmware images can be retrieved from: ftp://ftp.qlogic.com/outgoing/linux/firmware/.
> [ 74.169902] qla2xxx :05:08.0: Attempting to load (potentially outdated) firmware from flash.
> [ 74.760935] qla2xxx :05:08.0: Allocated (64 KB) for EFT...
> [ 74.761186] qla2xxx :05:08.0: Allocated (1413 KB) for firmware dump...
> [ 74.776988] scsi0 : qla2xxx
> [ 74.961451] qla2xxx :05:08.0:
> [ 74.961452]  QLogic Fibre Channel HBA Driver: 8.01.07-k4
> [ 74.961453]   QLogic HP AE369-60001 - QLA2340
> [ 74.961454]   ISP2422: PCI-X Mode 1 (133 MHz) @ :05:08.0 hdma+, host#=0, fw=4.00.70 [IP]

You are loading some stale firmware that's left over on the card --
I'm not even sure what 4.00.70 is, as the latest release firmware is
4.00.27. You can retrieve the image here:

ftp://ftp.qlogic.com/outgoing/linux/firmware/ql2400_fw.bin

Let's start there... before we move on to this:

> [ 75.778656] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
> [ 75.778771]
> [ 75.778772] Call Trace:
> [ 75.778967]  [] trace_hardirqs_on+0xd7/0x180
> [ 75.779154]  [] _spin_unlock_irq+0x2b/0x40
> [ 75.779271]  [] qla2x00_process_completed_request+0x137/0x1d0
> [ 75.779424]  [] qla2x00_status_entry+0x82/0xa40
> [ 75.779541]  [] __lock_acquire+0xcdf/0xd90
> [ 75.779657]  [] _spin_unlock_irqrestore+0x42/0x60
> [ 75.779775]  [] qla24xx_intr_handler+0x4e/0x2b0
> [ 75.779892]  [] qla24xx_process_response_queue+0xc1/0x1c0
> [ 75.780012]  [] qla24xx_intr_handler+0x1d4/0x2b0
> [ 75.780131]  [] handle_IRQ_event+0x20/0x60

Hmm.

Regards,
Andrew Vasquez