Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-03-09 Thread Andre Noll
On 12:05, Mingming Cao wrote:
   BTW: Are ext3 filesystem sizes greater than 8T now officially
   supported?
  
  I think so, but I don't know how much 16TB testing developers and
  distros are doing - perhaps the linux-ext4 denizens can tell us?
 
 IBM has done some testing (dbench, fsstress, fsx, tiobench, iozone etc)
 on 10TB ext3, I think RedHat and BULL have done similar tests on 8TB
 ext3 too.

Thanks. I'm asking because some days ago I tried to create a 10T ext3
filesystem on a linear software raid over two hardware raids, and it
failed horribly. mke2fs from e2fsprogs-1.39 refused to create such a
large filesystem but did it with -F, and I could mount it afterwards.
But writing data immediately produced zillions of errors and only
power-cycling the box helped.

We're now using a 7.9T filesystem on the same hardware. That seems
to work fine on 2.6.21-rc2, so I think this is an ext3 problem. I
cannot completely rule out other reasons though as the underlying
qla2xxx driver also had some problems on earlier kernels.
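
As a rough check on why 8T might be a threshold, here is a small sketch.
It assumes 4 KiB blocks and 32-bit block numbers (neither is stated in
this thread) and reads "T" as TiB; the function names are illustrative
only.

```python
# Back-of-the-envelope check of ext3 size thresholds.
# Assumptions (not stated in the thread): 4 KiB block size, 32-bit block numbers.

BLOCK_SIZE = 4096          # bytes per filesystem block (assumed)
TIB = 2 ** 40              # bytes per tebibyte


def max_bytes(block_number_bits: int, block_size: int = BLOCK_SIZE) -> int:
    """Largest filesystem size addressable with the given block-number width."""
    return (2 ** block_number_bits) * block_size


def blocks_needed(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to hold size_bytes, rounded up."""
    return -(-size_bytes // block_size)


unsigned_limit = max_bytes(32)   # 16 TiB if all 32 bits are usable
signed_limit = max_bytes(31)     # 8 TiB if a block number is ever treated as signed

for label, size in (("7.9T", int(7.9 * TIB)), ("10T", 10 * TIB)):
    n = blocks_needed(size)
    print(label, n, "fits in 31 bits:", n < 2 ** 31, "fits in 32 bits:", n < 2 ** 32)
```

Under these assumptions the 7.9T filesystem stays below 2^31 blocks while
the 10T one does not. That would be consistent with a signed-arithmetic
problem somewhere in the stack, but it does not show where.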

We'd much rather have a 10T filesystem if possible. So if you have
time to look into the issue I would be willing to recreate the 10T
filesystem and send details.

Regards
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe


signature.asc
Description: Digital signature


Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-03-08 Thread Andre Noll
On 19:46, Jens Axboe wrote:
 On Wed, Feb 28 2007, Andre Noll wrote:
  On 16:18, Andre Noll wrote:
  
   With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
   writing to both raid systems at the same time via lvm still locks up
   the system within minutes.
  
  Screenshot of the resulting kernel panic:
  
  http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
 
 Do you have the full oops as well?

Unfortunately not, as there's no way to scroll up after a kernel panic
(the screenshot was taken by using a KVM switch which just sends the
video output over ethernet).

Thanks
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe




Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-03-08 Thread Jens Axboe
On Thu, Mar 08 2007, Andre Noll wrote:
 On 19:46, Jens Axboe wrote:
  On Wed, Feb 28 2007, Andre Noll wrote:
   On 16:18, Andre Noll wrote:
   
With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
writing to both raid systems at the same time via lvm still locks up
the system within minutes.
   
   Screenshot of the resulting kernel panic:
   
 http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
  
  Do you have the full oops as well?
 
 Unfortunately not, as there's no way to scroll up after a kernel panic
 (the screenshot was taken by using a KVM switch which just sends the
 video output over ethernet).

Do you still have the vmlinux? It'd be interesting to see what

$ gdb vmlinux
(gdb) l *cfq_dispatch_insert+0x28

says; here that'd be the cfqq dereference. And that must be valid: it's set
at allocation time and only cleared after free. So unless lvm issues
private requests that aren't properly allocated, this whole thing looks
very bizarre.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-03-08 Thread Andre Noll
On 10:02, Jens Axboe wrote:
 Do you still have the vmlinux? It'd be interesting to see what
 
 $ gdb vmlinux
 (gdb) l *cfq_dispatch_insert+0x28
 
 says, 

The vmlinux in the kernel dir is dated March 5 and my bug report
was Feb 28. So I'm afraid it's gone. I tried the gdb command anyway
but it only gave me

No symbol table is loaded.  Use the file command.

Sorry
Andre

-- 
The only person who always got his work done by Friday was Robinson Crusoe




Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-03-08 Thread Jens Axboe
On Thu, Mar 08 2007, Andre Noll wrote:
 On 10:02, Jens Axboe wrote:
  Do you still have the vmlinux? It'd be interesting to see what
  
  $ gdb vmlinux
  (gdb) l *cfq_dispatch_insert+0x28
  
  says, 
 
 The vmlinux in the kernel dir is dated March 5 and my bug report
 was Feb 28. So I'm afraid it's gone. I tried the gdb command anyway
 but it only gave me
 
   No symbol table is loaded.  Use the file command.

Yeah, you'd need CONFIG_DEBUG_INFO enabled as well. I don't think there
were any CFQ changes between Feb 28 and March 5, so you could probably
still try it out. A quicker way:

- Edit .config and set CONFIG_DEBUG_INFO=y (near the bottom)
- make oldconfig
- rm block/cfq-iosched.o
- make block/cfq-iosched.o
- gdb block/cfq-iosched.o

(gdb) l *cfq_dispatch_insert+0x28

and see what that says. Should not take you more than a minute or so,
would appreciate it!
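
The steps above could be scripted roughly as follows. This is a sketch,
not something from the thread: it assumes it is run from the top of a
kernel tree whose .config already has CONFIG_DEBUG_INFO=y, and the helper
names are made up for illustration. It relies on gdb's -batch and -ex
options to run the list command non-interactively.

```python
import subprocess


def rebuild_commands(obj: str = "block/cfq-iosched.o") -> list[list[str]]:
    """Commands that force a rebuild of one object file (run in the kernel tree)."""
    return [
        ["make", "oldconfig"],   # may prompt if the config has new options
        ["rm", "-f", obj],       # -f so a missing object is not an error
        ["make", obj],
    ]


def gdb_list_command(symbol: str, offset: int,
                     obj: str = "block/cfq-iosched.o") -> list[str]:
    """Build a non-interactive gdb call equivalent to: (gdb) l *symbol+offset"""
    return ["gdb", "-batch", "-ex", f"list *{symbol}+{offset:#x}", obj]


def run_all(symbol: str, offset: int) -> str:
    """Rebuild the object with debug info and return gdb's source listing."""
    for cmd in rebuild_commands():
        subprocess.run(cmd, check=True)
    result = subprocess.run(gdb_list_command(symbol, offset),
                            check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Only print the gdb command line; call run_all() inside a kernel tree.
    print(" ".join(gdb_list_command("cfq_dispatch_insert", 0x28)))
```

Building the command as a list avoids shell quoting problems with the
"*" in the list expression.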

-- 
Jens Axboe



Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-03-08 Thread Andre Noll
On 10:36, Jens Axboe wrote:
 - Edit .config and set CONFIG_DEBUG_INFO=y (near the bottom)
 - make oldconfig
 - rm block/cfq-iosched.o
 - make block/cfq-iosched.o
 - gdb block/cfq-iosched.o
 
 (gdb) l *cfq_dispatch_insert+0x28
 
 and see what that says. Should not take you more than a minute or so,
 would appreciate it!

No problem, here we go:

# gdb block/cfq-iosched.o
GNU gdb 6.4-debian
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as x86_64-linux-gnu...Using host libthread_db library 
/lib/libthread_db.so.1.

(gdb) l *cfq_dispatch_insert+0x28
0xcf8 is in cfq_dispatch_insert (block/cfq-iosched.c:865).
860	}
861
862	static void cfq_dispatch_insert(request_queue_t *q, struct request *rq)
863	{
864		struct cfq_data *cfqd = q->elevator->elevator_data;
865		struct cfq_queue *cfqq = RQ_CFQQ(rq);
866
867		cfq_remove_request(rq);
868		cfqq->on_dispatch[rq_is_sync(rq)]++;
869		elv_dispatch_sort(q, rq);

Regards
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe




Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-03-07 Thread Andre Noll
On 20:39, Andrew Morton wrote:
 On Wed, 28 Feb 2007 16:37:22 +0100 Andre Noll [EMAIL PROTECTED] wrote:
 
  On 16:18, Andre Noll wrote:
  
   With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
   writing to both raid systems at the same time via lvm still locks up
   the system within minutes.
  
  Screenshot of the resulting kernel panic:
  
  http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
  
 
 It died in CFQ.  Please try a different IO scheduler.  Use something
 like
 
   echo deadline > /sys/block/sda/queue/scheduler
 
 This could still be the old qla2xxx bug, or it could be a new qla2xxx bug,
 or it could be a block bug, or it could be an LVM bug.

OK. I'm running with deadline right now. But I guess this kernel
panic was caused by an LVM bug because lockdep reported problems with
LVM. Nobody responded to my bug report on the LVM mailing list (see
http://www.redhat.com/archives/linux-lvm/2007-February/msg00102.html).

Non-working snapshots and no help from the mailing list convinced me
to ditch the lvm setup [1] in favour of linear software raid. This
means I can't do lvm-related tests any more.

BTW: Are ext3 filesystem sizes greater than 8T now officially
supported?

Thanks
Andre

[1] vg of two hardware raids, 10T together, a single lv and some snapshots
-- 




Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-03-07 Thread Jens Axboe
On Wed, Feb 28 2007, Andre Noll wrote:
 On 16:18, Andre Noll wrote:
 
  With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
  writing to both raid systems at the same time via lvm still locks up
  the system within minutes.
 
 Screenshot of the resulting kernel panic:
 
   http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png

Do you have the full oops as well?

-- 
Jens Axboe



Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-03-07 Thread Andrew Morton
On Wed, 7 Mar 2007 18:09:55 +0100 Andre Noll [EMAIL PROTECTED] wrote:

 On 20:39, Andrew Morton wrote:
  On Wed, 28 Feb 2007 16:37:22 +0100 Andre Noll [EMAIL PROTECTED] wrote:
  
   On 16:18, Andre Noll wrote:
   
With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
writing to both raid systems at the same time via lvm still locks up
the system within minutes.
   
   Screenshot of the resulting kernel panic:
   
 http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
   
  
  It died in CFQ.  Please try a different IO scheduler.  Use something
  like
  
  echo deadline > /sys/block/sda/queue/scheduler
  
  This could still be the old qla2xxx bug, or it could be a new qla2xxx bug,
  or it could be a block bug, or it could be an LVM bug.
 
 OK. I'm running with deadline right now. But I guess this kernel
 panic was caused by an LVM bug because lockdep reported problems with
 LVM. Nobody responded to my bug report on the LVM mailing list (see
 http://www.redhat.com/archives/linux-lvm/2007-February/msg00102.html).
 
 Non-working snapshots and no help from the mailing list convinced me
 to ditch the lvm setup [1] in favour of linear software raid. This
 means I can't do lvm-related tests any more.

Sigh.

 BTW: Are ext3 filesystem sizes greater than 8T now officially
 supported?

I think so, but I don't know how much 16TB testing developers and
distros are doing - perhaps the linux-ext4 denizens can tell us?


Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-03-06 Thread Andrew Morton
On Wed, 28 Feb 2007 16:37:22 +0100 Andre Noll [EMAIL PROTECTED] wrote:

 On 16:18, Andre Noll wrote:
 
  With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
  writing to both raid systems at the same time via lvm still locks up
  the system within minutes.
 
 Screenshot of the resulting kernel panic:
 
   http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
 

It died in CFQ.  Please try a different IO scheduler.  Use something
like

echo deadline > /sys/block/sda/queue/scheduler

This could still be the old qla2xxx bug, or it could be a new qla2xxx bug,
or it could be a block bug, or it could be an LVM bug.
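
For checking which scheduler is active before and after such a switch,
here is a minimal sketch. It assumes the usual sysfs convention that the
scheduler file lists the available names with the active one in square
brackets; the sample line and the helper names are illustrative, and
writing to the file needs root.

```python
from pathlib import Path


def parse_scheduler_line(line: str) -> tuple[str, list[str]]:
    """Return (active, available) from a line such as 'noop deadline [cfq]'."""
    active = ""
    available = []
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]      # strip the brackets marking the active one
            active = name
        available.append(name)
    return active, available


def set_scheduler(device: str, scheduler: str) -> None:
    """Same effect as: echo SCHEDULER > /sys/block/DEVICE/queue/scheduler"""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    _, available = parse_scheduler_line(path.read_text())
    if scheduler not in available:
        raise ValueError(f"{scheduler!r} not offered for {device}: {available}")
    path.write_text(scheduler)


# Illustrative input only; the names offered depend on the kernel config.
print(parse_scheduler_line("noop anticipatory deadline [cfq]\n"))
```

The setting is per block device, so with two raid systems each device
(sda and whatever the second one is called) would need to be switched.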

Adrian, can we please track this as a regression?


Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-02-28 Thread Andre Noll
On 10:51, Andrew Vasquez wrote:
 On Tue, 27 Feb 2007, Andre Noll wrote:
  [   68.532665] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
 
 Ok, since 2.6.20, there has been a patch added to qla2xxx which drops the
 spin_unlock_irq() call while attempting to ramp-up the queue-depth:
 
 Could you try the latest 2.6.21-rc which contains the correction?

With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
writing to both raid systems at the same time via lvm still locks up
the system within minutes.

As lockdep revealed another dm-related lock problem on this kernel,
I guess I'll have to bother the lvm people on this.

Thanks
Andre




Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-02-28 Thread Andre Noll
On 16:18, Andre Noll wrote:

 With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
 writing to both raid systems at the same time via lvm still locks up
 the system within minutes.

Screenshot of the resulting kernel panic:

http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png

Andre




Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-02-27 Thread Andre Noll
On 10:26, Andrew Vasquez wrote:
 You are loading some stale firmware that's left over on the card --
 I'm not even sure what 4.00.70 is, as the latest release firmware is
 4.00.27.

That's the firmware which came with the card. Anyway, I just upgraded
the firmware, but the bug remains. The backtrace differs a bit though
as now the tg3 network driver seems to be involved as well.
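
To see which modules show up in a backtrace like the one below, something
along these lines could be used. It is only a sketch: the regular
expression assumes the frame format seen in this log (an optional
":module:" prefix, then symbol+0xoffset/0xsize), and "vmlinux" is just a
label chosen here for frames without a module prefix.

```python
import re
from collections import Counter

# Optional ":module:" prefix, then symbol+0xOFFSET/0xSIZE.
FRAME_RE = re.compile(r"(?::(\w+):)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(trace: str) -> list[tuple[str, str, int, int]]:
    """Return (module, symbol, offset, size) for every frame found in trace."""
    frames = []
    for module, symbol, offset, size in FRAME_RE.findall(trace):
        frames.append((module or "vmlinux", symbol, int(offset, 16), int(size, 16)))
    return frames


def modules_involved(trace: str) -> Counter:
    """Count frames per module."""
    return Counter(module for module, _, _, _ in parse_frames(trace))


# A few frames copied from the log, including one wrapped by the mailer.
sample = """\
[   68.533295]  [88032747]
:qla2xxx:qla2x00_process_completed_request+0x137/0x1d0
[   68.534102]  [88034584] :qla2xxx:qla24xx_intr_handler+0x1d4/0x2b0
[   68.534224]  [8025e950] handle_IRQ_event+0x20/0x60
[   68.535103]  [8800df2d] :tg3:tg3_reset_hw+0x5d/0x1790
"""
print(modules_involved(sample))
```

Note that frames printed after the EOI marker belong to whatever was
running when the interrupt arrived, so tg3 appearing there may only mean
that tg3 was the interrupted code.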

Thanks for your help
Andre

[   67.511167] qla2xxx :05:08.0: Allocated (64 KB) for EFT...
[   67.511434] qla2xxx :05:08.0: Allocated (1413 KB) for firmware dump...
[   67.531231] scsi0 : qla2xxx
[   67.854344] qla2xxx :05:08.0: 
[   67.854346]  QLogic Fibre Channel HBA Driver: 8.01.07-k4
[   67.854347]   QLogic HP AE369-60001 - QLA2340
[   67.854348]   ISP2422: PCI-X Mode 1 (133 MHz) @ :05:08.0 hdma+, host#=0, 
fw=4.00.27 [IP] 
[   67.854881] ACPI: PCI Interrupt :05:08.1[B] - GSI 33 (level, low) - 
IRQ 33
[   67.855230] qla2xxx :05:08.1: Found an ISP2422, irq 33, iobase 
0xc2012000
[   67.855645] qla2xxx :05:08.1: Configuring PCI space...
[   67.855907] qla2xxx :05:08.1: Configure NVRAM parameters...
[   67.862486] qla2xxx :05:08.1: Verifying loaded RISC code...
[   68.106663] qla2xxx :05:08.1: Allocated (64 KB) for EFT...
[   68.107058] qla2xxx :05:08.1: Allocated (1413 KB) for firmware dump...
[   68.126759] scsi1 : qla2xxx
[   68.196783] Adding 6540152k swap on /dev/md2.  Priority:-1 extents:1 
across:6540152k
[   68.260645] qla2xxx :05:08.0: LIP reset occured (f8f7).
[   68.296027] qla2xxx :05:08.0: LIP occured (f8f7).
[   68.298214] qla2xxx :05:08.0: LOOP UP detected (2 Gbps).
[   68.326627] qla2xxx :05:08.1: 
[   68.326628]  QLogic Fibre Channel HBA Driver: 8.01.07-k4
[   68.326630]   QLogic HP AE369-60001 - QLA2340
[   68.326631]   ISP2422: PCI-X Mode 1 (133 MHz) @ :05:08.1 hdma+, host#=1, 
fw=4.00.27 [IP] 
[   68.504335] EXT3 FS on md1, internal journal
[   68.524627] PM: Writing back config space on device :03:06.0 at offset b 
(was 164814e4, writing d00e11)
[   68.524644] PM: Writing back config space on device :03:06.0 at offset 3 
(was 804000, writing 804010)
[   68.524650] PM: Writing back config space on device :03:06.0 at offset 2 
(was 200, writing 210)
[   68.524657] PM: Writing back config space on device :03:06.0 at offset 1 
(was 2b0, writing 2b00146)
[   68.532665] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
[   68.532784] 
[   68.532785] Call Trace:
[   68.532979]  IRQ  [8024b877] trace_hardirqs_on+0xd7/0x180
[   68.533168]  [80511f5b] _spin_unlock_irq+0x2b/0x40
[   68.533295]  [88032747] 
:qla2xxx:qla2x00_process_completed_request+0x137/0x1d0
[   68.533457]  [88032862] :qla2xxx:qla2x00_status_entry+0x82/0xa40
[   68.533577]  [8024b17f] __lock_acquire+0xcdf/0xd90
[   68.533693]  [80511ff2] _spin_unlock_irqrestore+0x42/0x60
[   68.533816]  [880343fe] :qla2xxx:qla24xx_intr_handler+0x4e/0x2b0
[   68.533942]  [88033551] 
:qla2xxx:qla24xx_process_response_queue+0xc1/0x1c0
[   68.534102]  [88034584] :qla2xxx:qla24xx_intr_handler+0x1d4/0x2b0
[   68.534224]  [8025e950] handle_IRQ_event+0x20/0x60
[   68.534339]  [802604ad] handle_fasteoi_irq+0xbd/0x110
[   68.534459]  [8020cf62] do_IRQ+0x132/0x1a0
[   68.534574]  [8020a236] ret_from_intr+0x0/0xf
[   68.534687]  EOI  [803ad15c] __delay+0xc/0x20
[   68.534862]  [803ad1a7] __const_udelay+0x37/0x40
[   68.534982]  [88006737] :tg3:tg3_chip_reset+0x547/0x670
[   68.535103]  [8800df2d] :tg3:tg3_reset_hw+0x5d/0x1790
[   68.535218]  [803ad1e7] __udelay+0x37/0x40
[   68.535333]  [8800408d] :tg3:_tw32_flush+0x6d/0x80
[   68.535451]  [88012196] :tg3:tg3_open+0x2d6/0x610
[   68.535569]  [8800f6a2] :tg3:tg3_init_hw+0x42/0x50
[   68.535687]  [880121a3] :tg3:tg3_open+0x2e3/0x610
[   68.535804]  [804b36e3] dev_open+0x43/0x90
[   68.535917]  [804b2814] dev_change_flags+0x74/0x160
[   68.536034]  [804f3e66] devinet_ioctl+0x2e6/0x730
[   68.536149]  [804b4bc2] dev_ioctl+0x302/0x340
[   68.536264]  [803aa71b] __up_read+0x9b/0xb0
[   68.536378]  [804f42fc] inet_ioctl+0x4c/0x70
[   68.536494]  [804a73ec] sock_ioctl+0x1fc/0x230
[   68.536610]  [8029c701] do_ioctl+0x31/0xa0
[   68.536722]  [8029ca2b] vfs_ioctl+0x2bb/0x2e0
[   68.536836]  [8029ca9a] sys_ioctl+0x4a/0x80
[   68.536948]  [80209cee] system_call+0x7e/0x83
[   68.537059] 
[   68.712832] scsi 0:0:0:0: Direct-Access transtec T6100F16R1-E 342I 
PQ: 0 ANSI: 5
[   68.713384] sda : very big device. try to use READ CAPACITY(16).
[   68.713594] SCSI device sda: 11714863104 512-byte hdwr sectors (5998010 MB)
[   68.713976] sda: Write Protect is off
[   68.714079] sda: Mode Sense: 9b 00 00 08
[   68.714483] SCSI device sda: write cache: disabled, read cache: enabled, 

Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-02-27 Thread Andre Noll
On 11:11, Andre Noll wrote:
 On 10:26, Andrew Vasquez wrote:
  You are loading some stale firmware that's left over on the card --
  I'm not even sure what 4.00.70 is, as the latest release firmware is
  4.00.27.
 
 That's the firmware which came with the card. Anyway, I just upgraded
 the firmware, but the bug remains.

The system crashed again, btw, this time resulting in a kernel panic
instead of just locking up silently. Here's a screenshot:

http://systemlinux.org/~maan/shots/qla2xxx-crash-huangho2.png

Regards
Andre




Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-02-27 Thread Andrew Vasquez
On Tue, 27 Feb 2007, Andre Noll wrote:

 On 10:26, Andrew Vasquez wrote:
  You are loading some stale firmware that's left over on the card --
  I'm not even sure what 4.00.70 is, as the latest release firmware is
  4.00.27.
 
 That's the firmware which came with the card. Anyway, I just upgraded
 the firmware, but the bug remains. The backtrace differs a bit though
 as now the tg3 network driver seems to be involved as well.
 
 Thanks for your help
 Andre
...
 [   68.532665] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
 [   68.532784] 
 [   68.532785] Call Trace:
 [   68.532979]  IRQ  [8024b877] trace_hardirqs_on+0xd7/0x180
 [   68.533168]  [80511f5b] _spin_unlock_irq+0x2b/0x40
 [   68.533295]  [88032747] 
 :qla2xxx:qla2x00_process_completed_request+0x137/0x1d0
 [   68.533457]  [88032862] :qla2xxx:qla2x00_status_entry+0x82/0xa40
 [   68.533577]  [8024b17f] __lock_acquire+0xcdf/0xd90
 [   68.533693]  [80511ff2] _spin_unlock_irqrestore+0x42/0x60
 [   68.533816]  [880343fe] :qla2xxx:qla24xx_intr_handler+0x4e/0x2b0
 [   68.533942]  [88033551] 
 :qla2xxx:qla24xx_process_response_queue+0xc1/0x1c0
 [   68.534102]  [88034584] :qla2xxx:qla24xx_intr_handler+0x1d4/0x2b0

Ok, since 2.6.20, there has been a patch added to qla2xxx which drops the
spin_unlock_irq() call while attempting to ramp-up the queue-depth:

commit befede3dabd204e9c546cbfbe391b29286c57da2
Author: Seokmann Ju [EMAIL PROTECTED]
Date:   Tue Jan 9 11:37:52 2007 -0800

    [SCSI] qla2xxx: correct locking while call starget_for_each_device()

    Removed spin_unlock_irq()/spin_lock_irq() pairs surrounding
    starget_for_each_device() calls.
    As Matthew W. pointed out, starget_for_each_device() can be called under
    a spinlock being held.
    The change has been tested and verified on qla2xxx.ko module.
    Thanks Matthew W. and Hisashi H. for help.

    Signed-off-by: Andrew Vasquez [EMAIL PROTECTED]
    Signed-off-by: Seokmann Ju [EMAIL PROTECTED]
    Signed-off-by: James Bottomley [EMAIL PROTECTED]

http://marc.theaimsgroup.com/?l=linux-scsi&m=116837234904583&w=2

Could you try the latest 2.6.21-rc which contains the correction?
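
To illustrate why an unconditional spin_unlock_irq() inside an interrupt
handler trips a check like the one in the trace, here is a toy model. It
is not kernel code: the class and its methods are invented for this
example, and the warning only approximates what lockdep appears to be
complaining about in trace_hardirqs_on().

```python
class ToyCpu:
    """Minimal model of per-CPU interrupt state; not real kernel code."""

    def __init__(self) -> None:
        self.irqs_enabled = True
        self.in_hardirq = False
        self.warnings: list[str] = []

    def _enable_irqs(self, caller: str) -> None:
        # Approximation of the check: enabling IRQs inside a hardirq is flagged.
        if self.in_hardirq:
            self.warnings.append(f"{caller}: interrupts enabled inside hardirq")
        self.irqs_enabled = True

    def spin_lock_irq(self) -> None:
        self.irqs_enabled = False

    def spin_unlock_irq(self) -> None:
        # Unconditionally re-enables interrupts.
        self._enable_irqs("spin_unlock_irq")

    def spin_lock_irqsave(self) -> bool:
        flags = self.irqs_enabled
        self.irqs_enabled = False
        return flags

    def spin_unlock_irqrestore(self, flags: bool) -> None:
        # Re-enables interrupts only if they were enabled before the lock.
        if flags:
            self._enable_irqs("spin_unlock_irqrestore")


def run_handler(body) -> list[str]:
    """Run body(cpu) as if it were an interrupt handler entered with IRQs off."""
    cpu = ToyCpu()
    cpu.irqs_enabled = False
    cpu.in_hardirq = True
    body(cpu)
    return cpu.warnings


def unlock_relock(cpu: ToyCpu) -> None:
    # Pattern seen in the trace: spin_unlock_irq() called from the handler path.
    cpu.spin_unlock_irq()
    cpu.spin_lock_irq()


def save_restore(cpu: ToyCpu) -> None:
    flags = cpu.spin_lock_irqsave()
    cpu.spin_unlock_irqrestore(flags)


print(run_handler(unlock_relock))
print(run_handler(save_restore))
```

In this model only the unlock/relock pattern produces a warning, which
matches the idea that removing the spin_unlock_irq()/spin_lock_irq() pair
makes the BUG message go away.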

Regards,
Andrew Vasquez


qla2xxx BUG: workqueue leaked lock or atomic

2007-02-26 Thread Andre Noll
Hi

On linux-2.6.20.1, we're seeing hard lockups with 2 raid systems
connected to a qla2xxx card and used as a single volume via lvm.
The system seems to lock up only if data gets written to both raid
systems at the same time.

On a standard kernel nothing makes it to the log, the system just
freezes. So we tried a lockdep kernel which reports two BUGs during
boot, see below.

Could this be related to our problem?

Thanks
Andre


[   64.150773] Loading iSCSI transport class v2.0-724.
[   64.151096] QLogic Fibre Channel HBA Driver
[   64.151405] ACPI: PCI Interrupt :05:08.0[A] - GSI 32 (level, low) - 
IRQ 32
[   64.151821] qla2xxx :05:08.0: Found an ISP2422, irq 32, iobase 
0xc2006000
[   64.152231] qla2xxx :05:08.0: Configuring PCI space...
[   64.152498] qla2xxx :05:08.0: Configure NVRAM parameters...
[   64.159088] qla2xxx :05:08.0: Verifying loaded RISC code...
[   74.169623] qla2xxx :05:08.0: Firmware image unavailable.
[   74.169737] qla2xxx :05:08.0: Firmware images can be retrieved from: 
ftp://ftp.qlogic.com/outgoing/linux/firmware/.
[   74.169902] qla2xxx :05:08.0: Attempting to load (potentially outdated) 
firmware from flash.
[   74.760935] qla2xxx :05:08.0: Allocated (64 KB) for EFT...
[   74.761186] qla2xxx :05:08.0: Allocated (1413 KB) for firmware dump...
[   74.776988] scsi0 : qla2xxx
[   74.961451] qla2xxx :05:08.0: 
[   74.961452]  QLogic Fibre Channel HBA Driver: 8.01.07-k4
[   74.961453]   QLogic HP AE369-60001 - QLA2340
[   74.961454]   ISP2422: PCI-X Mode 1 (133 MHz) @ :05:08.0 hdma+, host#=0, 
fw=4.00.70 [IP] 
[   74.961970] ACPI: PCI Interrupt :05:08.1[B] - GSI 33 (level, low) - 
IRQ 33
[   74.962296] qla2xxx :05:08.1: Found an ISP2422, irq 33, iobase 
0xc2172000
[   74.962662] qla2xxx :05:08.1: Configuring PCI space...
[   74.962914] qla2xxx :05:08.1: Configure NVRAM parameters...
[   74.969494] qla2xxx :05:08.1: Verifying loaded RISC code...
[   75.353426] qla2xxx :05:08.0: LIP reset occured (f7f7).
[   75.385670] qla2xxx :05:08.0: LIP occured (f7f7).
[   75.388282] qla2xxx :05:08.0: LOOP UP detected (2 Gbps).
[   75.778656] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
[   75.778771] 
[   75.778772] Call Trace:
[   75.778967]  IRQ  [8024b877] trace_hardirqs_on+0xd7/0x180
[   75.779154]  [8052bc1b] _spin_unlock_irq+0x2b/0x40
[   75.779271]  [804605d7] 
qla2x00_process_completed_request+0x137/0x1d0
[   75.779424]  [804606f2] qla2x00_status_entry+0x82/0xa40
[   75.779541]  [8024b17f] __lock_acquire+0xcdf/0xd90
[   75.779657]  [8052bcb2] _spin_unlock_irqrestore+0x42/0x60
[   75.779775]  [8046228e] qla24xx_intr_handler+0x4e/0x2b0
[   75.779892]  [804613e1] qla24xx_process_response_queue+0xc1/0x1c0
[   75.780012]  [80462414] qla24xx_intr_handler+0x1d4/0x2b0
[   75.780131]  [8025e950] handle_IRQ_event+0x20/0x60
[   75.780270]  [802604ad] handle_fasteoi_irq+0xbd/0x110
[   75.780411]  [8020cf62] do_IRQ+0x132/0x1a0
[   75.780545]  [80208430] default_idle+0x0/0x60
[   75.780682]  [8020a236] ret_from_intr+0x0/0xf
[   75.780818]  EOI  [80208467] default_idle+0x37/0x60
[   75.781021]  [80208469] default_idle+0x39/0x60
[   75.781156]  [80208467] default_idle+0x37/0x60
[   75.781294]  [802084f1] cpu_idle+0x61/0x90
[   75.781429]  [806d6f8b] start_secondary+0x51b/0x530
[   75.781569] 
[   75.781873] scsi 0:0:0:0: Direct-Access transtec T6100F16R1-E 342I 
PQ: 0 ANSI: 5
[   75.782532] BUG: workqueue leaked lock or atomic: scsi_wq_0/0x/362
[   75.782678] last function: fc_scsi_scan_rport+0x0/0x90
[   75.782878] 1 lock held by scsi_wq_0/362:
[   75.783008]  #0:  (shost-scan_mutex){--..}, at: [80529fe5] 
mutex_lock+0x25/0x30
[   75.783517] 
[   75.783518] Call Trace:
[   75.783754]  [80248319] debug_show_held_locks+0x9/0x10
[   75.783896]  [8023eb49] run_workqueue+0x149/0x1a0
[   75.784036]  [802427c0] keventd_create_kthread+0x0/0x90
[   75.784180]  [8023edc1] worker_thread+0x151/0x190
[   75.784322]  [80227e80] default_wake_function+0x0/0x10
[   75.784463]  [8023ec70] worker_thread+0x0/0x190
[   75.784600]  [80242a2a] kthread+0xda/0x110
[   75.784737]  [8020ab08] child_rip+0xa/0x12
[   75.784875]  [8052bc1b] _spin_unlock_irq+0x2b/0x40
[   75.785014]  [8020a28c] restore_args+0x0/0x30
[   75.785149]  [80242950] kthread+0x0/0x110
[   75.785285]  [8020aafe] child_rip+0x0/0x12
[   75.785417] 
[   84.980341] qla2xxx :05:08.1: Firmware image unavailable.
[   84.980455] qla2xxx :05:08.1: Firmware images can be retrieved from: 
ftp://ftp.qlogic.com/outgoing/linux/firmware/.
[   84.980620] qla2xxx :05:08.1: Attempting to load (potentially outdated) 
firmware from flash.
[   85.571726] qla2xxx :05:08.1: Allocated (64 KB) for 

Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-02-26 Thread Andrew Vasquez
On Mon, 26 Feb 2007, Andre Noll wrote:

 On linux-2.6.20.1, we're seeing hard lockups with 2 raid systems
 connected to a qla2xxx card and used as a single volume via lvm.
 The system seems to lock up only if data gets written to both raid
 systems at the same time.
 
 On a standard kernel nothing makes it to the log, the system just
 freezes. So we tried a lockdep kernel which reports two BUGs during
 boot, see below.
 
 Could this be related to our problem?

Before we proceed further, could you retrieve the latest firmware
release for 24xx type HBAs:

 [   64.151096] QLogic Fibre Channel HBA Driver
 [   64.151405] ACPI: PCI Interrupt :05:08.0[A] - GSI 32 (level, low) - 
 IRQ 32
 [   64.151821] qla2xxx :05:08.0: Found an ISP2422, irq 32, iobase 
 0xc2006000
 [   64.152231] qla2xxx :05:08.0: Configuring PCI space...
 [   64.152498] qla2xxx :05:08.0: Configure NVRAM parameters...
 [   64.159088] qla2xxx :05:08.0: Verifying loaded RISC code...
 [   74.169623] qla2xxx :05:08.0: Firmware image unavailable.
 [   74.169737] qla2xxx :05:08.0: Firmware images can be retrieved from: 
 ftp://ftp.qlogic.com/outgoing/linux/firmware/.
 [   74.169902] qla2xxx :05:08.0: Attempting to load (potentially 
 outdated) firmware from flash.
 [   74.760935] qla2xxx :05:08.0: Allocated (64 KB) for EFT...
 [   74.761186] qla2xxx :05:08.0: Allocated (1413 KB) for firmware dump...
 [   74.776988] scsi0 : qla2xxx
 [   74.961451] qla2xxx :05:08.0: 
 [   74.961452]  QLogic Fibre Channel HBA Driver: 8.01.07-k4
 [   74.961453]   QLogic HP AE369-60001 - QLA2340
 [   74.961454]   ISP2422: PCI-X Mode 1 (133 MHz) @ :05:08.0 hdma+, 
 host#=0, fw=4.00.70 [IP] 

You are loading some stale firmware that's left over on the card --
I'm not even sure what 4.00.70 is, as the latest release firmware is
4.00.27.  You can retrieve the image here:

ftp://ftp.qlogic.com/outgoing/linux/firmware/ql2400_fw.bin
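
For comparing the fw= field in the driver banner against a known release,
a small sketch follows. It assumes the version is a dotted sequence of
numbers that can be compared field by field, which may not match how the
firmware is actually numbered; the helper name is made up.

```python
import re


def firmware_version(log_line: str) -> tuple[int, ...]:
    """Extract the version after 'fw=' as a tuple of ints, e.g. (4, 0, 70)."""
    match = re.search(r"fw=(\d+(?:\.\d+)*)", log_line)
    if match is None:
        raise ValueError("no fw= field found")
    return tuple(int(part) for part in match.group(1).split("."))


flash = firmware_version("host#=0, fw=4.00.70 [IP]")   # version found on the card
release = firmware_version("fw=4.00.27 [IP]")          # latest release named above
print(flash, release, flash > release)
```

Compared this way, the version on the card sorts after the latest
release, which is part of why it looks out of place.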

Let's start there... before we move on to this:

 [   75.778656] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
 [   75.778771] 
 [   75.778772] Call Trace:
 [   75.778967]  IRQ  [8024b877] trace_hardirqs_on+0xd7/0x180
 [   75.779154]  [8052bc1b] _spin_unlock_irq+0x2b/0x40
 [   75.779271]  [804605d7] 
 qla2x00_process_completed_request+0x137/0x1d0
 [   75.779424]  [804606f2] qla2x00_status_entry+0x82/0xa40
 [   75.779541]  [8024b17f] __lock_acquire+0xcdf/0xd90
 [   75.779657]  [8052bcb2] _spin_unlock_irqrestore+0x42/0x60
 [   75.779775]  [8046228e] qla24xx_intr_handler+0x4e/0x2b0
 [   75.779892]  [804613e1] qla24xx_process_response_queue+0xc1/0x1c0
 [   75.780012]  [80462414] qla24xx_intr_handler+0x1d4/0x2b0
 [   75.780131]  [8025e950] handle_IRQ_event+0x20/0x60

Hmm

Regards,
Andrew Vasquez