Re: qla2xxx errors? (2.6.19)

2007-07-02 Thread Csillag Tamas
On 06/26, James Bottomley wrote:
> On Tue, 2007-06-26 at 05:46 -0700, Andrew Vasquez wrote:
> > On Thu, 21 Jun 2007, Csillag Tamas wrote:
> > > Hi Andrew,
> > > 
> > > On 06/12, Andrew Vasquez wrote:
> > > ...
> > > > I can set up something similar locally. I take it you are writing
> > > > directly to some LUN exported off the DS400, and not using DM?
> > > ...
> > > 
> > > Do you have any results? Did the information I sent regarding my
> > > system help?
> > > 
> > > Please tell me if you need more information or anything that can help.
> > 
> > The initial logs don't indicate any 'notable' driver issues during the
> > I/O test.  Concurrently, we've been unable to reproduce on similar
> > storage along with other array devices.  I've escalated this to our
> > formal support group (which should be in contact with you shortly) in
> > hopes of triaging this further and getting resolution.
> 
> The DS400 is actually a discontinued array ... any chance the original
> reproducer could redo the test with an LSI or an Emulex FC adapter just
> to see if it's QLogic or array related?

Well, I do not have any other card right now. The problem only appears
with kernel versions > 2.6.18, so I do not think that this is array
related.

If you think it is worth trying with other cards, I will try to borrow one,
but please confirm first, as it is not easy for me to get one.

Thanks.

Best regards,
cstamas
-- 
CSILLAG Tamas (cstamas) - http://digitus.itk.ppke.hu/~cstamas

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qla2xxx errors? (2.6.19)

2007-06-26 Thread James Bottomley
On Tue, 2007-06-26 at 05:46 -0700, Andrew Vasquez wrote:
> On Thu, 21 Jun 2007, Csillag Tamas wrote:
> 
> > Hi Andrew,
> > 
> > On 06/12, Andrew Vasquez wrote:
> > ...
> > > I can set up something similar locally. I take it you are writing
> > > directly to some LUN exported off the DS400, and not using DM?
> > ...
> > 
> > Do you have any results? Did the information I sent regarding my
> > system help?
> > 
> > Please tell me if you need more information or anything that can help.
> 
> The initial logs don't indicate any 'notable' driver issues during the
> I/O test.  Concurrently, we've been unable to reproduce on similar
> storage along with other array devices.  I've escalated this to our
> formal support group (which should be in contact with you shortly) in
> hopes of triaging this further and getting resolution.

The DS400 is actually a discontinued array ... any chance the original
reproducer could redo the test with an LSI or an Emulex FC adapter just
to see if it's QLogic or array related?

James




Re: qla2xxx errors? (2.6.19)

2007-06-26 Thread Andrew Vasquez
On Thu, 21 Jun 2007, Csillag Tamas wrote:

> Hi Andrew,
> 
> On 06/12, Andrew Vasquez wrote:
> ...
> > I can set up something similar locally. I take it you are writing
> > directly to some LUN exported off the DS400, and not using DM?
> ...
> 
> Do you have any results? Did the information I sent regarding my
> system help?
> 
> Please tell me if you need more information or anything that can help.

The initial logs don't indicate any 'notable' driver issues during the
I/O test.  Concurrently, we've been unable to reproduce on similar
storage along with other array devices.  I've escalated this to our
formal support group (which should be in contact with you shortly) in
hopes of triaging this further and getting resolution.

Regards,
Andrew Vasquez


Re: qla2xxx errors? (2.6.19)

2007-06-21 Thread Csillag Tamas
Hi Andrew,

On 06/12, Andrew Vasquez wrote:
...
> I can set up something similar locally. I take it you are writing
> directly to some LUN exported off the DS400, and not using DM?
...

Do you have any results? Did the information I sent regarding my
system help?

Please tell me if you need more information or anything that can help.

Thanks.
-- 
CSILLAG Tamas (cstamas) - http://digitus.itk.ppke.hu/~cstamas

"I busted a mirror and got seven years bad luck, but my lawyer
thinks he can get me five."
 -- Steven Wright


Re: qla2xxx errors? (2.6.19)

2007-06-13 Thread Csillag Tamas
On 06/12, Andrew Vasquez wrote:
> Could you load the driver with the ql2xextended_error_logging module
> parameter enabled:
> 
>   $ echo "6 4 1 7"  > /proc/sys/kernel/printk
>   $ insmod qla2xxx.ko ql2xextended_error_logging=1
> 
> and forward over the resultant messages file beginning with the load
> of the driver to the point at which the File-system failure occurs.

Ok, here we go...

Thanks.
-- 
CSILLAG Tamas (cstamas) - http://digitus.itk.ppke.hu/~cstamas

"Not all those who wander are lost."  - JRR Tolkien

2007-06-13_00:10:34.36780 kern.info: ACPI: PCI interrupt for device :06:01.1 disabled
2007-06-13_00:10:34.38819 kern.info: ACPI: PCI interrupt for device :06:01.0 disabled
2007-06-13_00:10:37.77002 kern.info: QLogic Fibre Channel HBA Driver
2007-06-13_00:10:37.77014 kern.info: ACPI: PCI Interrupt :06:01.0[A] -> GSI 49 (level, low) -> IRQ 21
2007-06-13_00:10:37.77015 kern.info: qla2xxx :06:01.0: Found an ISP2312, irq 21, iobase 0xf885c000
2007-06-13_00:10:37.77023 kern.info: qla2xxx :06:01.0: Configuring PCI space...
2007-06-13_00:10:37.77041 kern.info: qla2xxx :06:01.0: Configure NVRAM parameters...
2007-06-13_00:10:37.85574 kern.info: qla2xxx :06:01.0: Verifying loaded RISC code...
2007-06-13_00:10:37.85576 kern.warn: scsi(5):  Load RISC code
2007-06-13_00:10:37.95150 kern.warn: scsi(5): Verifying Checksum of loaded RISC code.
2007-06-13_00:10:37.97443 kern.warn: scsi(5): Checksum OK, start firmware.
2007-06-13_00:10:38.01821 kern.info: qla2xxx :06:01.0: Extended memory detected (512 KB)...
2007-06-13_00:10:38.01822 kern.info: qla2xxx :06:01.0: Resizing request queue depth (2048 -> 4096)...
2007-06-13_00:10:38.01829 kern.info: qla2xxx :06:01.0: Allocated (1308 KB) for firmware dump...
2007-06-13_00:10:38.05235 kern.warn: scsi(5): Issue init firmware.
2007-06-13_00:10:38.07547 kern.warn: DEBUG: detect hba 5 at address = e23262f8
2007-06-13_00:10:38.07549 kern.info: scsi5 : qla2xxx
2007-06-13_00:10:40.07542 kern.warn: scsi(5): qla2x00_loop_resync()
2007-06-13_00:10:41.66893 kern.warn: scsi(5): Asynchronous P2P MODE received.
2007-06-13_00:10:41.66896 kern.warn: scsi(5): Asynchronous LOOP UP (2 Gbps).
2007-06-13_00:10:41.66897 kern.info: qla2xxx :06:01.0: LOOP UP detected (2 Gbps).
2007-06-13_00:10:41.67430 kern.warn: scsi(5): Asynchronous PORT UPDATE.
2007-06-13_00:10:41.67435 kern.info: scsi(5): Port database changed  0006 .
2007-06-13_00:10:41.80908 kern.warn: scsi(5): Asynchronous PORT UPDATE ignored /0006/.
2007-06-13_00:10:41.80909 kern.warn: scsi(5): Asynchronous PORT UPDATE ignored /0007/.
2007-06-13_00:10:41.82060 kern.warn: scsi(5): Asynchronous PORT UPDATE ignored /0004/.
2007-06-13_00:10:42.14759 kern.warn: scsi(5): F/W Ready - OK
2007-06-13_00:10:42.15900 kern.warn: scsi(5): fw_state=3 curr time=13a9568.
2007-06-13_00:10:42.17041 kern.warn: scsi(5): Configure loop -- dpc flags =0x40800e1
2007-06-13_00:10:42.18179 kern.warn: scsi(5): RSCN queue entry[0] = [00/00].
2007-06-13_00:10:42.19295 kern.warn: scsi(5): device_resync: rscn overflow.
2007-06-13_00:10:42.20485 kern.warn: scsi(5): RFT_ID exiting normally.
2007-06-13_00:10:42.21633 kern.warn: scsi(5): RFF_ID exiting normally.
2007-06-13_00:10:42.22760 kern.warn: scsi(5): RNN_ID exiting normally.
2007-06-13_00:10:42.23881 kern.warn: scsi(5): RSNN_NN exiting normally.
2007-06-13_00:10:42.25540 kern.warn: scsi(5): GID_PT entry - nn 20112593fc1c pn 21112593fc1c portid=010100.
2007-06-13_00:10:42.26671 kern.warn: scsi(5): GID_PT entry - nn 20112593f89c pn 21112593f89c portid=010200.
2007-06-13_00:10:42.27796 kern.warn: scsi(5): GID_PT entry - nn 20145e241c2c pn 21145e241c2c portid=010300.
2007-06-13_00:10:42.28885 kern.warn: scsi(5): GID_PT entry - nn 20145e241dd2 pn 21145e241dd2 portid=010400.
2007-06-13_00:10:42.29975 kern.warn: scsi(5): GID_PT entry - nn 2000d12672a5 pn 2100d12672a5 portid=011100.
2007-06-13_00:10:42.29976 kern.warn: scsi(5): device wrap (011100)
2007-06-13_00:10:42.31919 kern.warn: scsi(5): Trying Fabric Login w/loop id 0x0081 for port 010100.
2007-06-13_00:10:42.32992 kern.warn: scsi(5): Trying Fabric Login w/loop id 0x0082 for port 010200.
2007-06-13_00:10:42.34015 kern.warn: scsi(5): Trying Fabric Login w/loop id 0x0083 for port 010300.
2007-06-13_00:10:42.35005 kern.warn: scsi(5): Trying Fabric Login w/loop id 0x0084 for port 011100.
2007-06-13_00:10:42.39558 kern.warn: scsi(5): LOOP READY
2007-06-13_00:10:42.39560 kern.info: qla2xxx :06:01.0:
2007-06-13_00:10:42.39561 kern.warn:  QLogic Fibre Channel HBA Driver: 8.01.07-k7-debug
2007-06-13_00:10:42.39562 kern.warn:   QLogic IBM FCEC -
2007-06-13_00:10:42.39563 kern.warn:   ISP2312: PCI-X (133 MHz) @ :06:01.0 hdma-, host#=5, fw=3.03.18 IPX
2007-06-13_00:10:42.39563 kern.info: ACPI: PCI Interrupt :06:01.1[B] -> GSI 50 (level, low) -> IRQ 23
2007-06-13_00:10:42.39564 kern.info: qla2xxx 000

Re: qla2xxx errors? (2.6.19)

2007-06-12 Thread Andrew Vasquez
On Sun, 10 Jun 2007, Csillag Tamas wrote:

> The university I work at has 4 Blade HS20 servers connected to an IBM
> DS400 storage array via Fibre Channel.
> The fibre channel module is:
> 
> 06:01.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 02)
> 06:01.1 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 02)
> 
> # lsmod
> Module  Size  Used by
> ...
> qla2xxx   143584  1
> 
> # ls -l /lib/firmware
> total 244
> lrwxrwxrwx 1 root root 21 2007-05-05 22:02 ql2300_fw.bin -> ql2300_fw.bin.3.03.18
> 
> With recent kernels, filesystem corruption occurs under heavy disk I/O.
> I have just tried 2.6.19 and got the same result, but kernels 2.6.18
> and earlier look safe.

Could you load the driver with the ql2xextended_error_logging module
parameter enabled:

$ echo "6 4 1 7"  > /proc/sys/kernel/printk
$ insmod qla2xxx.ko ql2xextended_error_logging=1

and forward over the resultant messages file beginning with the load
of the driver to the point at which the File-system failure occurs.
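For reference, the four numbers written to /proc/sys/kernel/printk above are, in order, console_loglevel, default_message_loglevel, minimum_console_loglevel and default_console_loglevel. The helper below is only an illustrative sketch assuming a POSIX shell; `decode_printk` is a hypothetical name, not part of any kernel tool, and it simply labels the four fields of a given string or of the live setting.

```shell
#!/bin/sh
# Hypothetical helper: label the four fields of /proc/sys/kernel/printk,
# or of a string passed as $1, in the order the kernel.printk sysctl uses.
decode_printk() {
    # Use the argument if one is given, otherwise read the live setting
    # (the /proc file separates the fields with tabs).
    _printk_values=${1:-$(cat /proc/sys/kernel/printk)}
    # Split on whitespace (spaces or tabs) into $1..$4.
    set -- $_printk_values
    echo "console_loglevel=$1"
    echo "default_message_loglevel=$2"
    echo "minimum_console_loglevel=$3"
    echo "default_console_loglevel=$4"
}

# Decode the value suggested above.
decode_printk "6 4 1 7"
```

Messages whose level is numerically lower than console_loglevel are printed on the console, so a console_loglevel of 6 lets KERN_EMERG (0) through KERN_NOTICE (5) through.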

> The test I use is the following:
> I first check out the kernel's git tree:
> # git-clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6.git
> Then (each "wait" blocks until all copies of that round have finished):
> # for i in {100..400}; do rsync -a linux-2.6.git/ c2_$i & done; wait
> # for i in {100..400}; do rsync -a linux-2.6.git/ c3_$i & done; wait
> # for i in {100..400}; do rsync -a linux-2.6.git/ c4_$i & done
> 
> Some error messages:
> ReiserFS: warning: is_tree_node: node level 32010 does not match to the expected one 1
> ReiserFS: sdb1: warning: vs-5150: search_by_key: invalid format found in block 8093723. Fsck?
> ReiserFS: sdb1: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [961831 1064427 0x0 SD]
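The reproducer quoted above can be sketched as a reusable function that also checks its own result. This is an illustrative sketch, not the script that was actually run: `run_round` is a hypothetical name, it uses a POSIX `while` loop instead of bash brace expansion, waits for all background rsync copies, and then compares each copy against the source with `diff -r`, so a bad copy is reported directly instead of only surfacing later as filesystem warnings.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC into PREFIX_FIRST .. PREFIX_LAST in
# parallel, wait for every copy, then verify each one against SRC.
# Returns 0 only if every copy matches the source tree.
run_round() {
    src=$1 prefix=$2 first=$3 last=$4

    # Start one background rsync per copy, as in the original loop.
    i=$first
    while [ "$i" -le "$last" ]; do
        rsync -a "$src/" "${prefix}_$i" &
        i=$((i + 1))
    done
    wait    # block until every background rsync has exited

    # diff -r exits non-zero if any file differs or exists on one side only.
    bad=0
    i=$first
    while [ "$i" -le "$last" ]; do
        if ! diff -r "$src" "${prefix}_$i" >/dev/null 2>&1; then
            echo "mismatch: ${prefix}_$i"
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    # Test the count instead of returning it: exit codes wrap at 256.
    [ "$bad" -eq 0 ]
}
```

For example, `run_round linux-2.6.git c2 100 400` corresponds to the first round above. Reading the copies back right after writing them may be served from the page cache rather than the array, so dropping caches first (for example via /proc/sys/vm/drop_caches, available since 2.6.16) would make the comparison more meaningful.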

I can set up something similar locally. I take it you are writing
directly to some LUN exported off the DS400, and not using DM?

> I am able to reproduce these errors.
> 
> With other filesystems I get similar errors.

Just curious, what are those errors?

> I suspect a bug in qla2xxx, but I am not sure.
> Can you help me track down this problem?

Please also provide the output of:

$ cat /proc/scsi/scsi
$ lspci -vvx

and your .config file.
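The three items requested above can be gathered into a single file. A minimal sketch, assuming a POSIX shell: `collect_diag` is a hypothetical helper, and reading the configuration from /proc/config.gz is an assumption that only works on kernels built with CONFIG_IKCONFIG_PROC (otherwise the .config from the kernel build tree has to be attached by hand).

```shell
#!/bin/sh
# Hypothetical helper: write each diagnostic command's output to one file,
# under a "=== command ===" header. A missing or failing command is
# recorded as unavailable instead of aborting the collection.
collect_diag() {
    out=$1
    : > "$out"    # start with an empty file
    for cmd in "cat /proc/scsi/scsi" "lspci -vvx" "zcat /proc/config.gz"; do
        echo "=== $cmd ===" >> "$out"
        # $cmd is left unquoted on purpose so it splits into command + args.
        if ! $cmd >> "$out" 2>/dev/null; then
            echo "(unavailable)" >> "$out"
        fi
    done
}
```

Running `collect_diag diag.txt` as root gives the most complete lspci output, since parts of PCI configuration space are only readable by root.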

Regards,
Andrew Vasquez