[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
Hi BDV,

I suggest updating the driver to the most recent one from LSI and installing it using DKMS. You might need to change the source a little, because there are statements like #if (KERNEL_VERSION >= 2.6.32) that have to be #if (KERNEL_VERSION > 2.6.32). You'll find them easily.

Try to update your firmware too, if possible. LSI provides Linux utilities for this.

Good luck.
Lars

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/599830
Title: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
One of my servers has a similar problem.

System: Ubuntu 10.04.2 LTS 64-bit, kernel 2.6.32-31-server
RAID controller: Symbios Logic LSI MegaSAS 9260 (rev 03), default drivers

Today I found the following error in dmesg:
"task xfssyncd: blocked for more than 120 seconds"
[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
Hi Jason,

the previous module versions were the ones shipped with Ubuntu amd64 server 10.04(.1) LTS (up to 2.6.32-25-server) and 10.10 (2.6.35-24-server).

I'll have a look at DKMS. I haven't used it yet.

If the server runs without related issues for the next 2 months, I'll close the bug if it's still open.

Thanks again.
Lars
[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
That is good to hear. There have been some workarounds in the LSI driver to handle problems similar to this. It might be that you're still having a problem but the new fw/driver combination is able to mask it much better. I would keep an eye on your system for anything suspicious for a while so you don't have any downtime.

I was curious whether you know which driver you were using before you upgraded to the LSI driver. I could try to compare the two and see what the differences are (probably a lot, but maybe a specific change will stand out). If you don't know the driver version, your kernel version will help narrow it down.

Also, that driver should come with a DKMS package. It only has builds for Red Hat Enterprise and SUSE Linux Enterprise, but you should be able to use DKMS to build it for your kernel, and future updates of your kernel should then rebuild it automatically. Here's a link to DKMS if you're not familiar with it: https://help.ubuntu.com/community/DKMS . You should only have to do the bottom half, since LSI builds the DKMS package for everyone.

Lastly, if you're satisfied with the LSI driver, you might close the bug.
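For readers who have not used DKMS before, the usual add/build/install sequence can be sketched roughly as below. This is only an illustration, not LSI's documented procedure: the module name `mptlinux` is a hypothetical placeholder (use whatever the package's dkms.conf declares), the version is the one reported for mptbase elsewhere in this thread, and the sketch assumes the source tree has been unpacked to /usr/src/<module>-<version>/. It prints the commands instead of running them, so they can be reviewed and then run as root.

```shell
#!/bin/sh
# Hypothetical values: adjust to the name and version in the package's dkms.conf.
MODULE="${MODULE:-mptlinux}"       # assumed module name, not confirmed in the thread
VERSION="${VERSION:-4.24.00.00}"   # mptbase version reported in this thread

# Print the DKMS steps (dry run) so they can be checked before running as root.
# Assumes the sources live in /usr/src/<module>-<version>/ with a dkms.conf.
dkms_steps() {
    for step in add build install; do
        echo "dkms $step -m $1 -v $2"
    done
    echo "dkms status"
}

dkms_steps "$MODULE" "$VERSION"
```

Once a module is registered this way, DKMS is meant to rebuild it when a new kernel is installed, which would replace the manual rebuild after every kernel update.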
[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
Ha, wait! I forgot the most important fact. I installed a much newer driver version manually, and I do so again after every kernel update. It's the driver from the zip archive from LSI. It is actually meant for Red Hat and SUSE, but the archive contains a source tarball.

http://lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas3081e-r/index.html

# cat /sys/module/mptbase/version
4.24.00.00

Regards
Lars
[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
Hi Jason,

thanks for your hints. I did a FW update of the LSI SAS controller and reduced the fs content. Since the update, and with the filesystems less full, the error has not occurred again.

# cat /proc/scsi/mptsas/9
ioc1: LSISAS1068E B3, FwRev=011f0200h, Ports=1, MaxQ=483
# cat /proc/scsi/mptsas/10
ioc2: LSISAS1068E B3, FwRev=011f0200h, Ports=1, MaxQ=483

# ./sasflash -listall
LSI Corporation SAS FLASH Utility.
SASFlash Version 1.26.00.00 (2010.05.18)
Copyright (c) 2006-2007 LSI Corporation. All rights reserved.

Adapter Selected is a LSI SAS 1068E(B3):

Num   Ctlr        FW Ver        NVDATA   x86-BIOS      EFI-BSD    PCI Addr
---------------------------------------------------------------------------
1     1068E(B3)   01.31.02.00   2d.03    06.32.00.00   No Image   00:08:00:00
2     1068E(B3)   01.31.02.00   2d.03    06.32.00.00   No Image   00:09:00:00

The filesystems look like this:

# LANG=C df -ht xfs
Filesystem   Size   Used   Avail   Use%   Mounted on
/dev/md2     6.1T   2.2T   3.9T    36%    /backup2
/dev/md3     6.1T   3.8T   2.3T    63%    /backup1

Just for your interest:

# cat /sys/block/sd?/device/ioerr_cnt /sys/block/sd??/device/ioerr_cnt
0x358 0x358 0x53 0x48 0x47 0x46 0x59 0x55 0x55 0x60
0x63 0x62 0x60 0x5e 0x6c 0x62 0x60 0x67 0x68 0x6c
0x76 0x70 0x72 0x6e 0x6d 0x65 0xc3 0xbd 0xc5 0xca
0xf0 0x104 0x107 0x113 0x119 0x127 0x11b 0x127 0x126 0x12d
0x12f 0x13c 0x12e 0x142 0x17f 0x13a 0x141 0x144 0x13e 0x141

The first 2 drives are attached through a SATA controller (AHCI). I don't know what numbers are normal, but there is a server with drives that have an error count of more than 850 and work flawlessly.

I would disable NCQ only as a last resort, because throughput is important. The server has to fill 2 LTO tapes at a fast write speed.

Is it possible to reopen bug reports? If yes, I think you can close this one for now. I'll report when problems occur again.

Thanks
Lars
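The ioerr_cnt values above are hexadecimal, so they are easier to compare once converted to decimal and paired with their device names (0x358, for example, is 856). A minimal sketch, assuming the sysfs layout used in the message; the function name `ioerr_report` and its optional root argument are made up for illustration:

```shell
#!/bin/sh
# Print "<device> <decimal error count>" for every sd device.
# The optional first argument is the sysfs root (default /sys); it exists
# only so the function can be tried against a copy of the tree.
ioerr_report() {
    root="${1:-/sys}"
    for f in "$root"/block/sd*/device/ioerr_cnt; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        dev="${f#"$root"/block/}"        # strip the leading path ...
        dev="${dev%%/*}"                 # ... and everything after the device name
        printf '%s %d\n' "$dev" "$(cat "$f")"   # %d accepts the 0x... form
    done
}

ioerr_report
```

Piping the output through `sort -k2 -n` lists the devices by error count, which makes an outlier easier to spot than in the raw hex dump.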
[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
The mpt messages in your logs suggest that the firmware had an NCQ problem that required it to abort all the outstanding commands and have the OS retry them (see http://en.wikipedia.org/wiki/NCQ for what NCQ is). You can disable NCQ, usually at the cost of IO performance, to work around the issue (see https://ata.wiki.kernel.org/index.php/Libata_FAQ#Enabling.2C_disabling_and_checking_NCQ).

The problem is probably either a bad drive or, on the off chance, a bad cable or card. You might check each drive with smartctl to confirm its health. You might also want to watch /sys/block/sdX/device/ioerr_cnt for each device to help clue in on any problems (I have never used the file before, so I'd be curious whether it helps).

Also, the xfs.log shows a panic from a null pointer. This is probably just a result of the problems with the fw<->drive communication.
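The Libata FAQ linked above describes checking NCQ through the per-device queue_depth file in sysfs, where a depth of 1 means queueing is effectively off. A minimal sketch along those lines follows; the function name `ncq_report` and its optional root argument are hypothetical, and whether the same file behaves identically for drives behind the mptsas HBA is an assumption worth verifying on the actual system.

```shell
#!/bin/sh
# Report the queue depth of every sd device.
# The optional first argument is the sysfs root (default /sys).
ncq_report() {
    root="${1:-/sys}"
    for f in "$root"/block/sd*/device/queue_depth; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        dev="${f#"$root"/block/}"
        dev="${dev%%/*}"
        depth="$(cat "$f")"
        if [ "$depth" -gt 1 ]; then
            state="queueing on"
        else
            state="queueing off"
        fi
        echo "$dev depth=$depth ($state)"
    done
}

ncq_report

# To turn queueing off for one device (as root, sdX is a placeholder):
#   echo 1 > /sys/block/sdX/device/queue_depth
```

The setting does not persist across reboots, so it can be tried on a single suspect drive before deciding whether the throughput cost is acceptable.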
[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
Hello again,

here are some other logs that might be connected to the failure:

[...]
Jul 2 17:52:25 speicher48 kernel: [17690.334458] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Jul 2 17:52:25 speicher48 kernel: [17690.338722] sd 10:0:20:0: [sdx] Device not ready
Jul 2 17:52:25 speicher48 kernel: [17690.338723] sd 10:0:20:0: [sdx] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 2 17:52:25 speicher48 kernel: [17690.338726] sd 10:0:20:0: [sdx] Sense Key : Not Ready [current]
Jul 2 17:52:25 speicher48 kernel: [17690.338728] sd 10:0:20:0: [sdx] Add. Sense: Logical unit failed self-configuration
Jul 2 17:52:25 speicher48 kernel: [17690.338731] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Jul 2 17:52:25 speicher48 kernel: [17690.342955] sd 10:0:20:0: [sdx] Device not ready
Jul 2 17:52:25 speicher48 kernel: [17690.342956] sd 10:0:20:0: [sdx] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 2 17:52:25 speicher48 kernel: [17690.342959] sd 10:0:20:0: [sdx] Sense Key : Not Ready [current]
Jul 2 17:52:25 speicher48 kernel: [17690.342961] sd 10:0:20:0: [sdx] Add. Sense: Logical unit failed self-configuration
Jul 2 17:52:25 speicher48 kernel: [17690.342964] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 00 00 10 00 00 00 08 00
Jul 2 17:52:25 speicher48 kernel: [17690.374555] sd 10:0:20:0: [sdx] Device not ready
Jul 2 17:52:25 speicher48 kernel: [17690.374562] sd 10:0:20:0: [sdx] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 2 17:52:25 speicher48 kernel: [17690.374569] sd 10:0:20:0: [sdx] Sense Key : Not Ready [current]
Jul 2 17:52:25 speicher48 kernel: [17690.374577] sd 10:0:20:0: [sdx] Add. Sense: Logical unit failed self-configuration
Jul 2 17:52:25 speicher48 kernel: [17690.374587] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 22 ee c0 80 00 00 08 00
Jul 2 17:52:25 speicher48 kernel: [17690.379051] sd 10:0:20:0: [sdx] Device not ready
Jul 2 17:52:25 speicher48 kernel: [17690.379055] sd 10:0:20:0: [sdx] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 2 17:52:25 speicher48 kernel: [17690.379061] sd 10:0:20:0: [sdx] Sense Key : Not Ready [current]
Jul 2 17:52:25 speicher48 kernel: [17690.379068] sd 10:0:20:0: [sdx] Add. Sense: Logical unit failed self-configuration
Jul 2 17:52:25 speicher48 kernel: [17690.379076] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 22 ee c0 80 00 00 08 00
Jul 2 17:52:25 speicher48 kernel: [17690.383570] sd 10:0:20:0: [sdx] Device not ready
Jul 2 17:52:25 speicher48 kernel: [17690.383575] sd 10:0:20:0: [sdx] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 2 17:52:25 speicher48 kernel: [17690.383581] sd 10:0:20:0: [sdx] Sense Key : Not Ready [current]
Jul 2 17:52:25 speicher48 kernel: [17690.383588] sd 10:0:20:0: [sdx] Add. Sense: Logical unit failed self-configuration
Jul 2 17:52:25 speicher48 kernel: [17690.383597] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 22 ee c1 20 00 00 08 00
Jul 2 17:52:25 speicher48 kernel: [17690.388115] sd 10:0:20:0: [sdx] Device not ready
Jul 2 17:52:25 speicher48 kernel: [17690.388120] sd 10:0:20:0: [sdx] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 2 17:52:25 speicher48 kernel: [17690.388126] sd 10:0:20:0: [sdx] Sense Key : Not Ready [current]
Jul 2 17:52:25 speicher48 kernel: [17690.388133] sd 10:0:20:0: [sdx] Add. Sense: Logical unit failed self-configuration
Jul 2 17:52:25 speicher48 kernel: [17690.388141] sd 10:0:20:0: [sdx] CDB: Read(10): 28 00 22 ee c1 20 00 00 08 00
Jul 2 17:52:25 speicher48 kernel: [17690.392689] sd 10:0:20:0: [sdx] Device not ready
[...]

After this, sdx dropped out of md2.

Thanks
Lars
[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
Hi,

here is a dmesg log of the system. There are a lot of messages and errors from the LSI driver module (mptbase). Maybe someone knows how to interpret these.

[...]
[48087.552179] mptbase: ioc0: LogInfo(0x3108): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x)
[48087.552381] mptbase: ioc0: LogInfo(0x3108): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x)
[48087.552606] mptbase: ioc0: LogInfo(0x3108): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x)
[48087.552831] mptbase: ioc0: LogInfo(0x3108): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x)
[48578.283740] mptbase: ioc2: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700)
[48579.771554] mptbase: ioc2: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700)
[48579.771724] mptbase: ioc2: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700)
[48579.771905] mptbase: ioc2: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700)
[...]

Regards
Lars

** Attachment added: "dm.log" http://launchpadlibrarian.net/51223911/dm.log
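With this many repeated mptbase lines, a per-controller tally can show at a glance which IOC reports which code and how often. A rough sketch, assuming the line format shown in the excerpt above; the function name `summarize_loginfo` is made up for illustration, and it only counts the messages without interpreting the LogInfo codes:

```shell
#!/bin/sh
# Read kernel log text on stdin and print one line per distinct
# (controller, LogInfo value, Code) combination, prefixed by its count.
summarize_loginfo() {
    sed -n 's/.*mptbase: \(ioc[0-9]*\): LogInfo(\(0x[0-9a-fA-F]*\)).*Code={\([^}]*\)}.*/\1 \2 \3/p' |
        sort | uniq -c
}

# Usage, for example:
#   dmesg | summarize_loginfo
#   summarize_loginfo < dm.log
```

For the eight lines quoted in the message this would give a count of 4 for ioc0 with the NCQ code and 4 for ioc2 with the Reset code.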
[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
** Package changed: ubuntu => ecs (Ubuntu)
[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
** Attachment added: "error during xfs access" http://launchpadlibrarian.net/51117657/xfs.log1
[Bug 599830] Re: system hangs after strange errors - raid6 and xfs defective (lsi driver?)
** Attachment added: "screen shot of hanging system" http://launchpadlibrarian.net/51117530/sp48-hang.jpg