[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)

2011-04-27 Thread Lars
Hi BDV,

I suggest updating the driver to the most recent version from LSI and installing it
using DKMS.
You might need to change the source slightly, because there are statements
like
#if (KERNEL_VERSION >= 2.6.32)

which have to be
#if (KERNEL_VERSION > 2.6.32)

They are easy to find.
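In kernel module sources such checks are normally spelled with the LINUX_VERSION_CODE and KERNEL_VERSION() macros rather than the shorthand above. Assuming the LSI tarball follows that convention (not verified here), a bash sketch that turns `>=` into `>` only for comparisons against 2.6.32; relax_2632_checks and relax_2632_checks_in_tree are hypothetical helper names:

```shell
# Hypothetical helpers. Assumption: the checks are written as
# "LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32)" (spaces may vary).

# stdin -> stdout: change ">=" to ">" only where the right-hand side is
# KERNEL_VERSION(2,6,32); comparisons against other versions are untouched.
relax_2632_checks() {
  sed -E 's/(LINUX_VERSION_CODE *)>=( *KERNEL_VERSION\( *2 *, *6 *, *32 *\))/\1>\2/g'
}

# Apply the filter in place to every .c/.h file below a directory that
# mentions KERNEL_VERSION(2,6,32).
relax_2632_checks_in_tree() {
  local dir="$1" f
  grep -rlE --include='*.[ch]' 'KERNEL_VERSION\( *2 *, *6 *, *32' "$dir" |
    while IFS= read -r f; do
      relax_2632_checks < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

Keep a pristine copy of the unpacked source, run `relax_2632_checks_in_tree /path/to/driver/source`, and review the result with diff before building.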

Try to update your firmware too, if possible. LSI provides Linux utilities
for this.

Good luck.
Lars

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/599830

Title:
  system hangs after strange errors - raid6 and xfs defective (lsi
  driver?)

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)

2011-04-27 Thread BDV
One of my servers does have a similar problem.

Ubuntu 10.04.2 LTS 64 bit
kernel 2.6.32-31-server

The RAID controller is a Symbios Logic LSI MegaSAS 9260 (rev 03),
using the default drivers.

Today I found the following error in dmesg:
"task xfssyncd:  blocked for more than 120 seconds"

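That message comes from the kernel's hung-task detector, whose threshold is /proc/sys/kernel/hung_task_timeout_secs (120 seconds by default). A bash sketch for pulling task names and timeouts out of kernel log text; hung_tasks is a hypothetical helper, and the pattern assumes the usual "task NAME:PID blocked for more than N seconds" wording while tolerating a missing PID as in the quote above:

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "<task> <seconds>" for every hung-task warning. The PID is optional.
hung_tasks() {
  sed -n -E 's/.*task ([^ :]+):[0-9]*[[:space:]]*blocked for more than ([0-9]+) seconds.*/\1 \2/p'
}
```

For example, `dmesg | hung_tasks | sort | uniq -c` counts the warnings per task.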

[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)

2011-01-13 Thread Lars
Hi Jason,

the previous module versions were the ones shipped with Ubuntu amd64
server 10.04(.1) LTS (up to 2.6.32-25-server) and 10.10
(2.6.35-24-server).

I'll have a look at DKMS; I haven't used it yet.

If the server runs without related issues for the next 2 months, I'll
close the bug if it's still open.

Thanks again.
Lars


[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)

2011-01-13 Thread Jason Unrein
That is good to hear.  There have been some workarounds in the LSI
driver to handle problems similar to this.  It may be that you still
have a problem, but the new fw/driver combination masks it much better.
I would keep an eye on your system for anything suspicious for a while
so you don't have any downtime.

I was curious whether you knew which driver you were using before you
upgraded to the LSI driver.  I could try to compare the two and see what
the differences are (probably a lot, but maybe a specific change will
stand out).  If you don't know the driver version, your kernel version
will help narrow it down.

Also, that driver should come with a DKMS package.  It only has builds
for Red Hat Enterprise Linux and SUSE Linux Enterprise, but you should be
able to use DKMS to build it for your kernel, and future kernel updates
should then rebuild it automatically.  Here's a link to DKMS if you're
not familiar with it: https://help.ubuntu.com/community/DKMS.  You should
only have to do the bottom half, since LSI builds the DKMS package for
everyone.
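Building a module with DKMS typically comes down to three dkms commands once the source tree and its dkms.conf sit in /usr/src/<module>-<version>. A bash sketch; dkms_add_build_install is a hypothetical helper, the module name and version must come from LSI's dkms.conf, and the DKMS variable only exists so the steps can be dry-run with DKMS=echo:

```shell
# Hypothetical helper: register, build and install a module with DKMS,
# stopping at the first failing step. Expects the source, including its
# dkms.conf, in /usr/src/<name>-<version>/. Set DKMS=echo for a dry run.
dkms_add_build_install() {
  local name="$1" version="$2" step
  for step in add build install; do
    "${DKMS:-dkms}" "$step" -m "$name" -v "$version" || return 1
  done
}
```

Run it as root, e.g. `dkms_add_build_install mptlinux 4.24.00.00` (the module name here is only a placeholder), then check with `dkms status`. With AUTOINSTALL="yes" in dkms.conf, DKMS rebuilds the module when a new kernel is installed, which removes the manual step after each kernel update.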

Lastly, if you're satisfied with the LSI driver, you might close the bug.


[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)

2011-01-07 Thread Lars
Ha, wait!

I forgot the most important fact.
I installed a much newer driver version manually, and I redo that after every
kernel update.
It's the driver from LSI's zip archive. Strictly speaking it is for Red Hat and
SUSE, but the archive contains a source tarball.

http://lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas3081e-r/index.html

# cat /sys/module/mptbase/version 
4.24.00.00
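Since the manual install has to be repeated after every kernel update, it helps to record which mpt* versions are actually loaded. A bash sketch reading the same /sys/module/<name>/version files; mpt_versions is a hypothetical helper, the default module list (mptbase, mptscsih, mptsas) is an assumption, and SYSFS_MODULE is only a test hook:

```shell
# Hypothetical helper: print "<module> <version>" for each loaded module
# that exposes a version file. Modules that are not loaded are skipped.
mpt_versions() {
  local root="${SYSFS_MODULE:-/sys/module}" m
  local mods=("$@")
  [ "${#mods[@]}" -gt 0 ] || mods=(mptbase mptscsih mptsas)
  for m in "${mods[@]}"; do
    [ -r "$root/$m/version" ] && printf '%s %s\n' "$m" "$(cat "$root/$m/version")"
  done
  return 0
}
```

`modinfo -F version mptbase` shows the version of the module file on disk for the running kernel, which can differ from the loaded one until the next reboot or reload.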

Regards
Lars


[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)

2011-01-07 Thread Lars
Hi Jason,

thanks for your hints.
I did a firmware update of the LSI SAS controller and reduced the amount of data
on the filesystems. Since the update, and with the filesystems less full, the
error hasn't occurred again.

# cat /proc/scsi/mptsas/9
ioc1: LSISAS1068E B3, FwRev=011f0200h, Ports=1, MaxQ=483
# cat /proc/scsi/mptsas/10
ioc2: LSISAS1068E B3, FwRev=011f0200h, Ports=1, MaxQ=483


# ./sasflash -listall

 
LSI Corporation SAS FLASH Utility.

SASFlash Version 1.26.00.00 (2010.05.18)

Copyright (c) 2006-2007 LSI Corporation. All rights reserved.
 

Adapter Selected is a LSI SAS 1068E(B3):

 Num   Ctlr        FW Ver        NVDATA   x86-BIOS      EFI-BSD    PCI Addr
------------------------------------------------------------------------------
  1    1068E(B3)   01.31.02.00   2d.03    06.32.00.00   No Image   00:08:00:00
  2    1068E(B3)   01.31.02.00   2d.03    06.32.00.00   No Image   00:09:00:00

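The /proc/scsi/mptsas and sasflash listings appear to report the same firmware in different notation: each byte of FwRev=011f0200h read as decimal (0x01=1, 0x1f=31, 0x02=2, 0x00=0) gives the 01.31.02.00 that sasflash prints. A bash sketch of that conversion; fwrev_to_dotted is a hypothetical helper and assumes the eight-hex-digit FwRev format shown above:

```shell
# Hypothetical helper: convert an mptsas FwRev value such as "011f0200h"
# into dotted decimal ("01.31.02.00"), one byte per field.
fwrev_to_dotted() {
  local hex="${1%h}"                       # drop the trailing "h"
  [[ $hex =~ ^[0-9a-fA-F]{8}$ ]] || { echo "unexpected FwRev: $1" >&2; return 1; }
  printf '%02d.%02d.%02d.%02d\n' "0x${hex:0:2}" "0x${hex:2:2}" "0x${hex:4:2}" "0x${hex:6:2}"
}
```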

The filesystems look like this:
# LANG=C df -ht xfs
Filesystem  Size  Used Avail Use% Mounted on
/dev/md2    6.1T  2.2T  3.9T  36% /backup2
/dev/md3    6.1T  3.8T  2.3T  63% /backup1

Just for your interest:
# cat /sys/block/sd?/device/ioerr_cnt /sys/block/sd??/device/ioerr_cnt
0x358
0x358
0x53
0x48
0x47
0x46
0x59
0x55
0x55
0x60
0x63
0x62
0x60
0x5e
0x6c
0x62
0x60
0x67
0x68
0x6c
0x76
0x70
0x72
0x6e
0x6d
0x65
0xc3
0xbd
0xc5
0xca
0xf0
0x104
0x107
0x113
0x119
0x127
0x11b
0x127
0x126
0x12d
0x12f
0x13c
0x12e
0x142
0x17f
0x13a
0x141
0x144
0x13e
0x141
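The cat over sd? and sd?? prints the counters without device names and in hexadecimal (the first two, 0x358, are 856 in decimal). A bash sketch that prints each device next to its counter in decimal, so snapshots can be compared over time; ioerr_report is a hypothetical helper and SYSFS_BLOCK is only a test hook:

```shell
# Hypothetical helper: print "<device> <ioerr_cnt in decimal>" for every
# sd* device found under the sysfs block directory.
ioerr_report() {
  local root="${SYSFS_BLOCK:-/sys/block}" f dev
  for f in "$root"/sd*/device/ioerr_cnt; do
    [ -r "$f" ] || continue            # no match, or file not readable
    dev="${f#"$root"/}"                # strip the sysfs prefix ...
    dev="${dev%%/*}"                   # ... keeping only the device name
    printf '%s %d\n' "$dev" "$(cat "$f")"
  done
}
```

For example, `ioerr_report > "ioerr-$(date +%F).txt"` once a day, then diff consecutive files to see which drives' counters are growing.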


The first 2 drives are attached to a SATA controller (AHCI). I don't know
what numbers are normal, but there is a server whose drives have error
counts of more than 850 and work flawlessly.
I would disable NCQ only as a very last step, because throughput is important:
the server has to fill 2 LTO tapes at a high write speed.

Is it possible to reopen bug reports? If so, I think you can close this one
for now.
I'll report back if the problems occur again.

Thanks
Lars


[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)

2011-01-06 Thread Jason Unrein
The mpt messages in your logs suggest that the firmware had an NCQ
problem that required it to abort all the outstanding commands and have
the OS retry them (see http://en.wikipedia.org/wiki/NCQ for what NCQ
is).  You can disable NCQ, usually at the cost of I/O performance, to
work around the issue (see
https://ata.wiki.kernel.org/index.php/Libata_FAQ#Enabling.2C_disabling_and_checking_NCQ).
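Per the libata FAQ linked above, NCQ is turned off for a device by setting its queue depth to 1 via /sys/block/sdX/device/queue_depth. A bash sketch that sets the depth for several devices and reads back what the kernel accepted; set_queue_depth is a hypothetical helper, whether the driver behind the LSI HBA honours the write is an assumption to verify, the setting does not survive a reboot, and SYSFS_BLOCK is only a test hook:

```shell
# Hypothetical helper. Usage: set_queue_depth DEPTH DEV...
# (e.g. set_queue_depth 1 sdx sdy). Writes DEPTH to each device's
# queue_depth and prints the value read back afterwards.
set_queue_depth() {
  local root="${SYSFS_BLOCK:-/sys/block}" depth="$1" dev qd
  shift
  for dev in "$@"; do
    qd="$root/$dev/device/queue_depth"
    echo "$depth" > "$qd" || return 1
    printf '%s queue_depth=%s\n' "$dev" "$(cat "$qd")"
  done
}
```

Note the current values with `cat /sys/block/sdX/device/queue_depth` first, so NCQ can be restored by writing them back.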

The problem is probably a bad drive, or on the off chance, a bad
cable or card.  You might check each drive with smartctl to confirm
its health.  You might also watch /sys/block/sdX/device/ioerr_cnt
for each device to help pinpoint any problems (I've never used that file
before, so I'd be curious whether it helps).
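A bash sketch of the smartctl health check across several drives; it relies only on `smartctl -H` and on smartctl's exit status, a bit mask that is non-zero when something is wrong. check_smart_health is a hypothetical helper and SMARTCTL is only a test hook; drives behind the SAS HBA may additionally need a -d option depending on the smartctl version, which this sketch does not handle:

```shell
# Hypothetical helper: run "smartctl -H" on each device given and flag
# non-zero exit statuses (0 means smartctl detected no problem).
check_smart_health() {
  local dev rc
  for dev in "$@"; do
    echo "== $dev"
    "${SMARTCTL:-smartctl}" -H "$dev"
    rc=$?
    [ "$rc" -eq 0 ] || echo "smartctl exit status $rc for $dev"
  done
}
```

For example, as root: `check_smart_health /dev/sd[a-z] /dev/sd[a-z][a-z]`.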

Also, the xfs.log shows a panic from a null pointer.  This is probably
just a result of the problems with the fw<->drive communication.


[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)

2010-07-05 Thread Lars
Hello again,

here are some other logs that might be connected to the failure:

[...]
Jul  2 17:52:25 speicher48 kernel: [17690.334458] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Jul  2 17:52:25 speicher48 kernel: [17690.338722] sd 10:0:20:0: [sdx] Device not ready
Jul  2 17:52:25 speicher48 kernel: [17690.338723] sd 10:0:20:0: [sdx] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul  2 17:52:25 speicher48 kernel: [17690.338726] sd 10:0:20:0: [sdx] Sense Key : Not Ready [current]
Jul  2 17:52:25 speicher48 kernel: [17690.338728] sd 10:0:20:0: [sdx] Add. Sense: Logical unit failed self-configuration
Jul  2 17:52:25 speicher48 kernel: [17690.338731] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Jul  2 17:52:25 speicher48 kernel: [17690.342955] sd 10:0:20:0: [sdx] Device not ready
Jul  2 17:52:25 speicher48 kernel: [17690.342956] sd 10:0:20:0: [sdx] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul  2 17:52:25 speicher48 kernel: [17690.342959] sd 10:0:20:0: [sdx] Sense Key : Not Ready [current]
Jul  2 17:52:25 speicher48 kernel: [17690.342961] sd 10:0:20:0: [sdx] Add. Sense: Logical unit failed self-configuration
Jul  2 17:52:25 speicher48 kernel: [17690.342964] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 00 00 10 00 00 00 08 00
Jul  2 17:52:25 speicher48 kernel: [17690.374555] sd 10:0:20:0: [sdx] Device not ready
Jul  2 17:52:25 speicher48 kernel: [17690.374562] sd 10:0:20:0: [sdx] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul  2 17:52:25 speicher48 kernel: [17690.374569] sd 10:0:20:0: [sdx] Sense Key : Not Ready [current]
Jul  2 17:52:25 speicher48 kernel: [17690.374577] sd 10:0:20:0: [sdx] Add. Sense: Logical unit failed self-configuration
Jul  2 17:52:25 speicher48 kernel: [17690.374587] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 22 ee c0 80 00 00 08 00
Jul  2 17:52:25 speicher48 kernel: [17690.379051] sd 10:0:20:0: [sdx] Device not ready
Jul  2 17:52:25 speicher48 kernel: [17690.379055] sd 10:0:20:0: [sdx] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul  2 17:52:25 speicher48 kernel: [17690.379061] sd 10:0:20:0: [sdx] Sense Key : Not Ready [current]
Jul  2 17:52:25 speicher48 kernel: [17690.379068] sd 10:0:20:0: [sdx] Add. Sense: Logical unit failed self-configuration
Jul  2 17:52:25 speicher48 kernel: [17690.379076] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 22 ee c0 80 00 00 08 00
Jul  2 17:52:25 speicher48 kernel: [17690.383570] sd 10:0:20:0: [sdx] Device not ready
Jul  2 17:52:25 speicher48 kernel: [17690.383575] sd 10:0:20:0: [sdx] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul  2 17:52:25 speicher48 kernel: [17690.383581] sd 10:0:20:0: [sdx] Sense Key : Not Ready [current]
Jul  2 17:52:25 speicher48 kernel: [17690.383588] sd 10:0:20:0: [sdx] Add. Sense: Logical unit failed self-configuration
Jul  2 17:52:25 speicher48 kernel: [17690.383597] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 22 ee c1 20 00 00 08 00
Jul  2 17:52:25 speicher48 kernel: [17690.388115] sd 10:0:20:0: [sdx] Device not ready
Jul  2 17:52:25 speicher48 kernel: [17690.388120] sd 10:0:20:0: [sdx] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul  2 17:52:25 speicher48 kernel: [17690.388126] sd 10:0:20:0: [sdx] Sense Key : Not Ready [current]
Jul  2 17:52:25 speicher48 kernel: [17690.388133] sd 10:0:20:0: [sdx] Add. Sense: Logical unit failed self-configuration
Jul  2 17:52:25 speicher48 kernel: [17690.388141] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 22 ee c1 20 00 00 08 00
Jul  2 17:52:25 speicher48 kernel: [17690.392689] sd 10:0:20:0: [sdx] Device not ready
[...]
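The CDB lines are SCSI READ(10) commands (opcode 0x28); in that command, bytes 2-5 carry the logical block address and bytes 7-8 the transfer length in blocks. A bash sketch to decode them; decode_read10 is a hypothetical helper:

```shell
# Hypothetical helper: decode a READ(10) CDB given as space-separated hex
# bytes (as printed by the sd driver) into LBA and length in blocks.
decode_read10() {
  local -a b
  read -r -a b <<< "$1"
  [ "${#b[@]}" -eq 10 ] && [ "${b[0]}" = 28 ] ||
    { echo "not a READ(10) CDB: $1" >&2; return 1; }
  printf 'lba=%d blocks=%d\n' "0x${b[2]}${b[3]}${b[4]}${b[5]}" "0x${b[7]}${b[8]}"
}
```

For example, `28 00 22 ee c0 80 00 00 08 00` decodes to lba=586072192 blocks=8, and `28 00 00 00 10 00 00 00 08 00` to lba=4096 blocks=8.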

After this, sdx dropped out of md2.
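/proc/mdstat flags members that md has marked as failed with (F). A bash/awk sketch that lists them per array, assuming the usual "mdN : active raidX dev[n] ..." line format; failed_md_members is a hypothetical helper, and `mdadm --detail /dev/md2` gives the same information in more detail:

```shell
# Hypothetical helper: list "<array> <member>" for every member marked
# (F) in an mdstat-format file (default: /proc/mdstat).
failed_md_members() {
  awk '/^md[0-9]+ :/ {
         for (i = 3; i <= NF; i++)
           if ($i ~ /\(F\)$/) { m = $i; sub(/\[.*/, "", m); print $1, m }
       }' "${1:-/proc/mdstat}"
}
```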

Thanks
Lars


[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)

2010-07-01 Thread Lars
Hi,

here is a dmesg log of the system.
There are a lot of messages and errors from the LSI driver module (mptbase).
Maybe someone knows how to interpret them.

[...]
[48087.552179] mptbase: ioc0: LogInfo(0x3108): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x)
[48087.552381] mptbase: ioc0: LogInfo(0x3108): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x)
[48087.552606] mptbase: ioc0: LogInfo(0x3108): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x)
[48087.552831] mptbase: ioc0: LogInfo(0x3108): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x)
[48578.283740] mptbase: ioc2: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700)
[48579.771554] mptbase: ioc2: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700)
[48579.771724] mptbase: ioc2: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700)
[48579.771905] mptbase: ioc2: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700)
[...]
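If I read mptbase correctly, the LogInfo value packs the fields printed in braces: bus type in bits 31-28, originator in bits 27-24 (PL = 1), code in bits 23-16 and subcode in bits 15-0. Under that layout, 0x31110700 splits into originator 0x1, code 0x11 and subcode 0x0700, which is consistent with the "PL / Reset / 0x0700" text. A bash sketch with that layout as an assumption; split_loginfo is a hypothetical helper, and it expects the full 32-bit value, whereas the LogInfo(0x3108) entries above appear shortened:

```shell
# Hypothetical helper: split a 32-bit MPT LogInfo value into fields.
# Assumed layout: bus type 31-28, originator 27-24, code 23-16, subcode 15-0.
split_loginfo() {
  local v
  v=$(( $1 ))
  printf 'bus_type=0x%x originator=0x%x code=0x%02x subcode=0x%04x\n' \
    $(( (v >> 28) & 0xf )) $(( (v >> 24) & 0xf )) \
    $(( (v >> 16) & 0xff )) $(( v & 0xffff ))
}
```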


Regards
Lars

** Attachment added: "dm.log"
   http://launchpadlibrarian.net/51223911/dm.log


[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)

2010-06-29 Thread priya
** Package changed: ubuntu => ecs (Ubuntu)


[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)

2010-06-29 Thread Lars

** Attachment added: "error during xfs access"
   http://launchpadlibrarian.net/51117657/xfs.log1


[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)

2010-06-29 Thread Lars

** Attachment added: "screen shot of hanging system"
   http://launchpadlibrarian.net/51117530/sp48-hang.jpg
