Bug#625922: SATA devices get reset without real hardware failure
This is mostly a "me too" to report that I have had this issue at least three times in the past 4 months (more or less), and every time with the one ST2000DL003-9VT166 drive in my computer (out of four). I'm also running with the CC32 firmware (apparently there's nothing more recent) but the Linux kernel is newer:

Linux version 3.2.0-2-amd64 (Debian 3.2.15-1) (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-1) ) #1 SMP Sun Apr 15 16:47:38 UTC 2012

Other details that may be relevant:
* It's part of a software RAID1 device (which causes it to drop out obviously).
* I did a long test with SeaTools and it passed.

Here's how it started last night:

May 16 01:42:29 amboise kernel: [713845.984035] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May 16 01:42:29 amboise kernel: [713845.984040] ata5.00: failed command: FLUSH CACHE EXT
May 16 01:42:29 amboise kernel: [713845.984046] ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
May 16 01:42:29 amboise kernel: [713845.984047] res 40/00:02:00:08:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
May 16 01:42:29 amboise kernel: [713845.984051] ata5.00: status: { DRDY }
May 16 01:42:29 amboise kernel: [713845.984060] ata5.00: hard resetting link
May 16 01:42:29 amboise kernel: [713846.304016] ata5.01: hard resetting link
May 16 01:42:34 amboise kernel: [713851.820015] ata5.00: link is slow to respond, please be patient (ready=0)
May 16 01:42:39 amboise kernel: [713856.020035] ata5.00: SRST failed (errno=-16)
May 16 01:42:39 amboise kernel: [713856.020045] ata5.00: hard resetting link
May 16 01:42:39 amboise kernel: [713856.340022] ata5.01: hard resetting link
May 16 01:42:44 amboise kernel: [713861.856014] ata5.00: link is slow to respond, please be patient (ready=0)
May 16 01:42:49 amboise kernel: [713866.056013] ata5.00: SRST failed (errno=-16)
May 16 01:42:49 amboise kernel: [713866.056022] ata5.00: hard resetting link
May 16 01:42:49 amboise kernel: [713866.376014] ata5.01: hard resetting link
May 16 01:42:54 amboise kernel: [713871.892010] ata5.00: link is slow to respond, please be patient (ready=0)
May 16 01:43:24 amboise kernel: [713901.068011] ata5.00: SRST failed (errno=-16)
May 16 01:43:24 amboise kernel: [713901.068019] ata5.00: limiting SATA link speed to 1.5 Gbps
May 16 01:43:24 amboise kernel: [713901.068023] ata5.01: limiting SATA link speed to 1.5 Gbps
May 16 01:43:24 amboise kernel: [713901.068028] ata5.00: hard resetting link
May 16 01:43:24 amboise kernel: [713901.388013] ata5.01: hard resetting link
May 16 01:43:29 amboise kernel: [713906.120012] ata5.00: SRST failed (errno=-16)
May 16 01:43:29 amboise kernel: [713906.130577] ata5.00: reset failed, giving up
May 16 01:43:29 amboise kernel: [713906.130580] ata5.00: disabled
May 16 01:43:29 amboise kernel: [713906.130585] ata5.01: disabled
May 16 01:43:29 amboise kernel: [713906.130589] ata5.00: device reported invalid CHS sector 0
May 16 01:43:29 amboise kernel: [713906.130598] ata5: EH complete
May 16 01:43:29 amboise kernel: [713906.130670] sd 4:0:0:0: [sdb] Unhandled error code
May 16 01:43:29 amboise kernel: [713906.130677] sd 4:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 16 01:43:29 amboise kernel: [713906.130691] sd 4:0:0:0: [sdb] CDB: Write(10): 2a 00 17 00 6f 80 00 00 08 00
May 16 01:43:29 amboise kernel: [713906.130713] end_request: I/O error, dev sdb, sector 385904512

-- 
Francois Gouget http://fgouget.free.fr/
In theory, theory and practice are the same, but in practice they're different.

-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/alpine.DEB.2.02.1205160840120.27276@amboise.dolphin
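Each libata error report in this thread packs the ATA taskfile into a `cmd` line (command byte first, so `ea` is FLUSH CACHE EXT) and a `res` line whose first two bytes are the status and error registers, followed by an `Emask` bitmask. As a rough reading aid, here is a minimal Python sketch that decodes those three values from a `res` line. It assumes the status/error register bit layout from the ATA command set and the `Emask` bit values of the kernel's `enum ata_completion_errors`; the helper names `decode_res` and `flags`, and the descriptive bit names, are hypothetical choices for this sketch, and other kernel versions may print extra fields.

```python
import re

# ATA status-register bits reported in libata's "status: { ... }" line.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
# ATA error-register bits reported in libata's "error: { ... }" line.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT", 0x01: "AMNF"}
# Low bits of the kernel's enum ata_completion_errors (the "Emask" value);
# the descriptive names are chosen for this sketch.
EMASK_BITS = {
    0x001: "device error", 0x002: "HSM violation", 0x004: "timeout",
    0x008: "media error", 0x010: "ATA bus error", 0x020: "host bus error",
    0x040: "system error", 0x080: "invalid argument", 0x100: "other",
}

# "res SS/EE:..." where SS is the status byte and EE the error byte.
RES_RE = re.compile(r"res ([0-9a-f]{2})/([0-9a-f]{2}):.*?Emask (0x[0-9a-f]+)")


def flags(value, table):
    """Names of the bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def decode_res(line):
    """Decode status, error and Emask from a libata 'res ... Emask 0x..' line."""
    m = RES_RE.search(line)
    if m is None:
        raise ValueError("no 'res .../... Emask 0x..' field in line")
    status, error, emask = (int(g, 16) for g in m.groups())
    return {
        "status": flags(status, STATUS_BITS),
        # The error register only carries meaning when the ERR status bit is set.
        "error": flags(error, ERROR_BITS) if status & 0x01 else [],
        "emask": flags(emask, EMASK_BITS),
    }
```

Applied to the `res` line above, it reports only `DRDY` with a `timeout` Emask, which is consistent with the flush timing out rather than the drive returning an error status of its own.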
Bug#625922: SATA devices get reset without real hardware failure
Hi all,

seems unreproducible to me with linux-2.6 3.1.8-2 currently available in testing.

Regards,

-- 
Alessio Treglia | www.alessiotreglia.com
Debian Developer | ales...@debian.org
Ubuntu Core Developer | quadris...@ubuntu.com
0416 0004 A827 6E40 BB98 90FB E8A4 8AE5 311D 765A

Archive: http://lists.debian.org/CAMHuwozR12Y6EwtHUOcGLyqGZf6gLfd9rzjF-Fw1Vi_ek7=z...@mail.gmail.com
Bug#625922: SATA devices get reset without real hardware failure
Summary of the bug so far:

Messages #5, #63 from Natalia Portillo:
package version: Ubuntu 2.6.38-8.?? (Debian did not use version 2.6.38-8)
                 Debian 2.6.32-38 ("all squeeze kernels up to two weeks away")
                 (Gentoo 2.6.32 does not have the problem)
drive(s): Seagate ST2000DL003-9VT166
controller(s): Intel ICH9 (AHCI mode) (and JMicron JMB36?)
kernel log:
[255352.928063] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[255352.928071] ata4.00: failed command: FLUSH CACHE EXT
[255352.928080] ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[255352.928082] res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[255352.928087] ata4.00: status: { DRDY }
[255352.928096] ata4: hard resetting link
[255362.932028] ata4: softreset failed (1st FIS failed)
[255362.932036] ata4: hard resetting link
[255372.932018] ata4: softreset failed (1st FIS failed)
[255372.932026] ata4: hard resetting link
[255407.932029] ata4: softreset failed (1st FIS failed)
[255407.932038] ata4: limiting SATA link speed to 1.5 Gbps
[255407.932042] ata4: hard resetting link
[255413.120028] ata4: softreset failed (device not ready)
[255413.120035] ata4: reset failed, giving up
[255413.120040] ata4.00: disabled
[255413.120060] ata4: EH complete

Messages #18, #33 from Paul Faure:
package version: Ubuntu 2.6.38-8.42
drive(s): Seagate ST2000DL003, ST2000DL003-9VT1 (but no problem with ST32000644NS)
controller(s): Intel ICH9 (legacy mode)
kernel log:
[247972.000120] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[247972.000132] ata3.00: failed command: FLUSH CACHE EXT
[247972.000146] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[247972.000148] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[247972.000155] ata3.00: status: { DRDY }
[247972.000169] ata3: hard resetting link
[247977.550053] ata3: link is slow to respond, please be patient (ready=0)
[247982.050113] ata3: SRST failed (errno=-16)
[247982.050138] ata3: hard resetting link
[247987.600068] ata3: link is slow to respond, please be patient (ready=0)
[247992.100087] ata3: SRST failed (errno=-16)
[247992.100109] ata3: hard resetting link
[247997.650040] ata3: link is slow to respond, please be patient (ready=0)
[248027.110050] ata3: SRST failed (errno=-16)
[248027.110066] ata3: limiting SATA link speed to 1.5 Gbps
[248027.110075] ata3: hard resetting link
[248032.120042] ata3: SRST failed (errno=-16)
[248032.120053] ata3: reset failed, giving up
[248032.120060] ata3.00: disabled
[248032.120069] ata3.00: device reported invalid CHS sector 0
[248032.120094] ata3: EH complete

Message #23 from Christian Robottom Reis:
package version: Ubuntu 2.6.35-28
drive(s): Seagate ST2000DL003-9VT166
controller(s): ?
kernel log: not provided

Messages #38, #43 from Juhani Karlsson:
Seems to be a different problem.

Message #48 from Javier Ortega Conde (Malkavian):
A bunch of different problems.

Message #70 from Alessio Treglia:
package version: Debian 3.1.1-1
drive(s): TOSHIBA MK5055GSXN
controller(s): ? (AHCI)
kernel log:
[ 6838.837215] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 6838.837222] ata2.00: failed command: FLUSH CACHE EXT
[ 6838.837230] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[ 6838.837231] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 6838.837241] ata2.00: status: { DRDY }
[ 6838.837254] ata2: hard resetting link
[ 6844.199464] ata2: link is slow to respond, please be patient (ready=0)
[ 6848.846062] ata2: COMRESET failed (errno=-16)
[ 6848.846075] ata2: hard resetting link
[ 6854.208316] ata2: link is slow to respond, please be patient (ready=0)
[ 6858.854933] ata2: COMRESET failed (errno=-16)
[ 6858.854943] ata2: hard resetting link
[ 6864.213249] ata2: link is slow to respond, please be patient (ready=0)
[ 6875.073958] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 6875.117586] ata2.00: configured for UDMA/100
[ 6875.117598] ata2.00: retrying FLUSH 0xea Emask 0x4
[ 6875.129847] ata2.00: device reported invalid CHS sector 0
[ 6875.129864] ata2: EH complete

-- 
Ben Hutchings
Computers are not intelligent. They only think they are.
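The summary above compares reports by the failed command, the number of link resets, and how the error handler finished (device disabled versus link back up). A hypothetical helper for making the same comparison on a new log might look like the Python sketch below. The matched substrings are copied from the logs quoted in this bug; other kernel versions or drivers may word their messages differently, so treat the result as a first pass rather than a diagnosis.

```python
def summarize_eh(lines):
    """Summarize one libata error-handling (EH) episode.

    `lines` are the kernel log lines of a single episode, roughly from the
    "exception Emask" line to "EH complete". The substrings matched below
    are taken from logs in this bug report (an assumption for other kernels).
    """
    summary = {"failed_command": None, "hard_resets": 0,
               "speed_limited": False, "outcome": "unknown"}
    for line in lines:
        text = line.rstrip()
        if "failed command:" in text:
            summary["failed_command"] = text.split("failed command:", 1)[1].strip()
        elif "hard resetting link" in text:
            summary["hard_resets"] += 1
        elif "limiting SATA link speed" in text:
            summary["speed_limited"] = True
        elif text.endswith(": disabled"):
            # libata gave up on the device; later I/O to it fails.
            summary["outcome"] = "device disabled"
        elif "SATA link up" in text and summary["outcome"] == "unknown":
            summary["outcome"] = "link recovered"
    return summary
```

Fed Natalia's excerpt, this gives four hard resets, a speed limit and a disabled device; Alessio's excerpt ends with the link coming back up, which is one of the differences between the reports listed above.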
Bug#625922: SATA devices get reset without real hardware failure
On 26/11/2011, at 07:49, Jonathan Nieder wrote:

> Hi,
>
> Natalia Portillo wrote:
>
>> While running stock Debian's sid linux 2.6.38-8-amd64 kernel I'm
>> getting random fails on SATA devices.
>>
>> I have a RAID5 system with 5 disks and 3 of them showed the same
>> exact failure, one each 48 hours.
>>
>> On reboot, the devices work perfectly, and badblocks runs through
>> them without a single failure.
>>
>> Kernel exact failure is:
>>
>> [255352.928063] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
>> frozen
>> [255352.928071] ata4.00: failed command: FLUSH CACHE EXT
> [...]
>> Devices are in different SATA ports (first failed ata2, then ata5,
>> then ata4) and are all Seagate ST2000DL003-9VT166.
>>
>> Same exact hardware has been running on Linux 2.6.32-gentoo for
>> weeks without a single failure.
>
> Thanks for reporting it, and sorry for the slow response.
>
> Some questions:
>
> - what kernel are you using now?

claunia@hades:~$ uname -a
Linux hades 3.0.0-1-amd64 #1 SMP Sat Aug 27 16:21:11 UTC 2011 x86_64 GNU/Linux
wheezy

> - can you still reproduce this?

It has only been two weeks with this kernel, and there is a bug, but a different one.

> - can you reproduce it with a squeeze kernel, too?

with all squeeze kernels up to two weeks away

> - do you know what exact version the working 2.6.32-gentoo kernel
> was?

r6 I think

> - please attach a log of the initialization of the kernel, either by
> saving full "dmesg" output right after booting or by gathering it
> from /var/log/dmesg*

I will have to dig through the rotated logs, stay tuned.

> - any workarounds or other weird symptoms?

Curiously, no workarounds, but other weird symptoms in the same and other kernels.
On both squeeze and wheezy kernels the following happens almost once a day (always on high network transfers):

[118801.372070] INFO: task bacula-sd:27996 blocked for more than 120 seconds.
[118801.372091] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[118801.372113] bacula-sd D 88009f63a2c0 0 27996 1 0x
[118801.372122] 88009f63a2c0 0082 8800
[118801.372130] 8800bc3780c0 00012800 88008954dfd8 88008954dfd8
[118801.372138] 00012800 88009f63a2c0 00012800 00012800
[118801.372146] Call Trace:
[118801.372161] [] ? schedule_timeout+0x2d/0xd7
[118801.372170] [] ? blk_peek_request+0x1a7/0x1bc
[118801.372176] [] ? wait_for_common+0x9d/0x116
[118801.372184] [] ? try_to_wake_up+0x199/0x199
[118801.372190] [] ? _raw_spin_lock_irq+0xd/0x1a
[118801.372218] [] ? st_do_scsi.clone.10+0x2d9/0x309 [st]
[118801.372228] [] ? st_int_ioctl+0x673/0xad5 [st]
[118801.372234] [] ? mmdrop+0xd/0x1c
[118801.372241] [] ? should_resched+0x5/0x24
[118801.372250] [] ? st_ioctl+0xb5e/0xedf [st]
[118801.372259] [] ? hrtimer_try_to_cancel+0x3c/0x46
[118801.372265] [] ? hrtimer_cancel+0xc/0x16
[118801.372272] [] ? do_vfs_ioctl+0x45b/0x49c
[118801.372278] [] ? update_rmtp+0x62/0x62
[118801.372284] [] ? hrtimer_start_expires+0x16/0x1b
[118801.372290] [] ? sys_ioctl+0x4b/0x72
[118801.372297] [] ? system_call_fastpath+0x16/0x1b

And it repeats a lot of times (the stack trace is always different, always being the process that's doing the transfer, like bacula-sd or netatalk, or the XFS or MDRAID processes).

On the squeeze kernel, when this happens nothing works. That is, if you open another process, it does not open. If you kill one process, it stays open. Hard reboot is the only way.
On wheezy the system continues working.

Curiously I received an Efika MX Smartbook machine yesterday that exhibits another bug, but really similar. With kernel Linux 2.6.31.14.26-efikamx the internal SSD suffers a lost interrupt and resets when there is high CPU usage. Sorry, I have to dig up those logs as well.

>
> If you can reproduce this reliably with a 3.1.y kernel, we should
> take this upstream (looks like that's linux-...@vger.kernel.org
> plus linux-ker...@vger.kernel.org; please cc me or this bug log if
> writing there so we can track it).
>
> Hope that helps,
> Jonathan

Archive: http://lists.debian.org/81165706-7715-4647-a055-64029a59a...@claunia.com
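Natalia reports that the blocked task differs from one occurrence to the next (bacula-sd, netatalk, XFS or MD RAID threads). To collect that list from a saved log, a small Python sketch could pull the task name, PID and timeout out of each hung-task warning. It assumes the `INFO: task NAME:PID blocked for more than N seconds.` wording shown above; the function name `hung_tasks` is hypothetical.

```python
import re

# Kernel hung-task warning, as quoted above:
# "INFO: task bacula-sd:27996 blocked for more than 120 seconds."
# The lazy name match still copes with task names that contain ':'.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(lines):
    """Return (task name, pid, seconds) for each hung-task warning in `lines`."""
    found = []
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            found.append((m["comm"], int(m["pid"]), int(m["secs"])))
    return found
```

Counting the first element of each tuple would then show whether the blocked task is always the one doing the large network transfer.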
Bug#625922: SATA devices get reset without real hardware failure
Hi,

Natalia Portillo wrote:

> While running stock Debian's sid linux 2.6.38-8-amd64 kernel I'm
> getting random fails on SATA devices.
>
> I have a RAID5 system with 5 disks and 3 of them showed the same
> exact failure, one each 48 hours.
>
> On reboot, the devices work perfectly, and badblocks runs through
> them without a single failure.
>
> Kernel exact failure is:
>
> [255352.928063] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [255352.928071] ata4.00: failed command: FLUSH CACHE EXT
[...]
> Devices are in different SATA ports (first failed ata2, then ata5,
> then ata4) and are all Seagate ST2000DL003-9VT166.
>
> Same exact hardware has been running on Linux 2.6.32-gentoo for
> weeks without a single failure.

Thanks for reporting it, and sorry for the slow response.

Some questions:

- what kernel are you using now?
- can you still reproduce this?
- can you reproduce it with a squeeze kernel, too?
- do you know what exact version the working 2.6.32-gentoo kernel
  was?
- please attach a log of the initialization of the kernel, either by
  saving full "dmesg" output right after booting or by gathering it
  from /var/log/dmesg*
- any workarounds or other weird symptoms?

If you can reproduce this reliably with a 3.1.y kernel, we should
take this upstream (looks like that's linux-...@vger.kernel.org
plus linux-ker...@vger.kernel.org; please cc me or this bug log if
writing there so we can track it).

Hope that helps,
Jonathan

Archive: http://lists.debian.org/2026074919.ga22...@elie.hsd1.il.comcast.net
Re: Bug#625922: SATA devices get reset without real hardware failure
On Wed, Oct 19, 2011 at 05:11:05PM +0200, U.Mutlu wrote:
> Ben Hutchings wrote, On 2011-10-19 15:07:
> >On Wed, 2011-10-19 at 13:31 +0200, U.Mutlu wrote:
> >>Javier Ortega Conde (Malkavian) wrote, On 2011-10-18 00:37:
> >>>This bug (in general, not just this on this web) have been in GNU/Linux
> >>>since
> >>>a long time with various disks, mainboards, SATA controllers, distros and
> >>>kernels (maybe since changes after 2.6.24).
> >>
> >>I'm using kernel 2.6.37.6 and there this bug is still present.
> >
> >Not a Debian kernel version, so please don't bother this list with it.
>
> I haven't mentioned anything of Debian, I'm using the kernel from kernel.org :
> http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.37.6.tar.bz2

You are writing to the debian-kernel mailing list, not LKML.

> >>Has it been fixed in any recent kernel versions?
> >>IMO it deserves the highest priority to fix this ASAP.
> >[...]
> >
> >It's not a single bug.
>
> It's disastrous situation: an OS with buggy HD kernel driver, and no fix on
> the way...

Stop ranting and explain your problem to the right people (not us).

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus

Archive: http://lists.debian.org/20111019161410.gn3...@decadent.org.uk
Re: Bug#625922: SATA devices get reset without real hardware failure
Ben Hutchings wrote, On 2011-10-19 15:07:
>On Wed, 2011-10-19 at 13:31 +0200, U.Mutlu wrote:
>>Javier Ortega Conde (Malkavian) wrote, On 2011-10-18 00:37:
>>>This bug (in general, not just this on this web) have been in GNU/Linux
>>>since
>>>a long time with various disks, mainboards, SATA controllers, distros and
>>>kernels (maybe since changes after 2.6.24).
>>
>>I'm using kernel 2.6.37.6 and there this bug is still present.
>
>Not a Debian kernel version, so please don't bother this list with it.

I haven't mentioned anything of Debian, I'm using the kernel from kernel.org :
http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.37.6.tar.bz2

>>Has it been fixed in any recent kernel versions?
>>IMO it deserves the highest priority to fix this ASAP.
>[...]
>
>It's not a single bug.

It's disastrous situation: an OS with buggy HD kernel driver, and no fix on
the way...

Archive: http://lists.debian.org/j7mpa9$8ns$1...@dough.gmane.org
Re: Bug#625922: SATA devices get reset without real hardware failure
On Wed, 2011-10-19 at 13:31 +0200, U.Mutlu wrote:
> Javier Ortega Conde (Malkavian) wrote, On 2011-10-18 00:37:
> > This bug (in general, not just this on this web) have been in GNU/Linux
> > since
> > a long time with various disks, mainboards, SATA controllers, distros and
> > kernels (maybe since changes after 2.6.24).
>
> I'm using kernel 2.6.37.6 and there this bug is still present.

Not a Debian kernel version, so please don't bother this list with it.

> Has it been fixed in any recent kernel versions?
> IMO it deserves the highest priority to fix this ASAP.
[...]

It's not a single bug.

Ben.

-- 
Ben Hutchings
73.46% of all statistics are made up.
Re: Bug#625922: SATA devices get reset without real hardware failure
Javier Ortega Conde (Malkavian) wrote, On 2011-10-18 00:37:
>This bug (in general, not just this on this web) have been in GNU/Linux
>since
>a long time with various disks, mainboards, SATA controllers, distros and
>kernels (maybe since changes after 2.6.24).

I'm using kernel 2.6.37.6 and there this bug is still present.
Has it been fixed in any recent kernel versions?
IMO it deserves the highest priority to fix this ASAP.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892
"
Raj B (bigwoof) wrote on 2011-01-03:
...
I've lost data because of this as well. my entire /var/lib/mysql directory was blown away and recovered into lost+found. other directories are there as well.
...
"

I had a similar disaster yesterday... :-(

From my syslog:
Oct 18 12:11:16 c12 kernel: [ 35.340954] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [ 35.342052] ata1.00: irq_stat 0x4001
Oct 18 12:11:16 c12 kernel: [ 35.343141] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [ 35.344230] ata1.00: cmd c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [ 35.344232] res 51/01:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x1 (device error)
Oct 18 12:11:16 c12 kernel: [ 35.346497] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [ 35.351588] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [ 35.352760] ata1: EH complete
Oct 18 12:11:16 c12 kernel: [ 36.374319] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [ 36.375516] ata1.00: irq_stat 0x4001
Oct 18 12:11:16 c12 kernel: [ 36.376722] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [ 36.377913] ata1.00: cmd c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [ 36.377915] res 51/01:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x1 (device error)
Oct 18 12:11:16 c12 kernel: [ 36.380393] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [ 36.385574] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [ 36.386828] ata1: EH complete
Oct 18 12:11:16 c12 kernel: [ 37.407698] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [ 37.409013] ata1.00: irq_stat 0x4001
Oct 18 12:11:16 c12 kernel: [ 37.410317] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [ 37.411638] ata1.00: cmd c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [ 37.411639] res 51/40:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x9 (media error)
Oct 18 12:11:16 c12 kernel: [ 37.414381] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [ 37.415758] ata1.00: error: { UNC }
Oct 18 12:11:16 c12 kernel: [ 37.421076] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [ 37.422466] ata1: EH complete
Oct 18 12:11:16 c12 kernel: [ 38.449412] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [ 38.450847] ata1.00: irq_stat 0x4000
Oct 18 12:11:16 c12 kernel: [ 38.452285] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [ 38.453718] ata1.00: cmd c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [ 38.453720] res 51/40:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x9 (media error)
Oct 18 12:11:16 c12 kernel: [ 38.456694] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [ 38.458190] ata1.00: error: { UNC }
Oct 18 12:11:16 c12 kernel: [ 38.463615] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [ 38.465135] ata1: EH complete
Oct 18 12:11:16 c12 kernel: [ 39.491124] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [ 39.492692] ata1.00: irq_stat 0x4001
Oct 18 12:11:16 c12 kernel: [ 39.494253] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [ 39.495829] ata1.00: cmd c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [ 39.495831] res 51/40:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x9 (media error)
Oct 18 12:11:16 c12 kernel: [ 39.499081] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [ 39.500710] ata1.00: error: { UNC }
Oct 18 12:11:16 c12 kernel: [ 39.506254] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [ 39.507867] ata1: EH complete
...
Oct 18 14:05:41 c12 kernel: [ 71.786231] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x191 action 0xe frozen
Oct 18 14:05:41 c12 kernel: [ 71.786236] ata1.00: irq_stat 0x0840, interface fatal error, PHY RDY changed
Oct 18 14:05:41 c12 kernel: [ 71.786240] ata1: SError: { PHYRdyChg Dispar LinkSeq TrStaTrns }
Oct 18 14:05:41 c12 kernel: [ 71.786243] ata1.00: failed command: READ DMA
Oct 18 14:05:41 c12 kernel: [ 71.786250] ata1.00: cmd c8/00:20:27:03:9d/00:00:00:00:00/e1 tag 0 dma 16384 in
Oct 18 14:05:41 c12 kernel: [ 71.786251] res 50/00:00:c6:02
Bug#625922: SATA devices get reset without real hardware failure
On Tue, 2011-10-18 at 00:37 +0200, Javier Ortega Conde (Malkavian) wrote:
> This bug (in general, not just this on this web) have been in GNU/Linux since
> a long time with various disks, mainboards, SATA controllers, distros and
> kernels (maybe since changes after 2.6.24).

Just because you see the same error messages, that does not mean you are seeing the same bug.

> In https://bugzilla.redhat.com/show_bug.cgi?id=684599 David Zeuthen says
> "it's most probably caused by this commit
> http://git.kernel.org/?p=linux/hotplug/udev.git;a=commitdiff;h=560de575148b7efda3b34a7f7073abd483c5f08e
>
> "

So that's a bug in some drives, though we need to work around it.

> Possible workarounds related to this bug:
> -1: Add "libata.atapi_passthru16=0" to the kernel boot options (because some
> devices may not support 16-byte ATA commands) (
> https://bugzilla.redhat.com/show_bug.cgi?id=684599 )
> -2: (Same as 1) Add options libata atapi_passthru16=0 to
> /etc/modprobe.d/modprobe.conf and add FILES="/etc/modprobe.d/modprobe.conf"
> to
> /etc/mkinitcpio.conf ( https://bbs.archlinux.org/viewtopic.php?pid=895404 )

OK.

> -3: Somebody called Fujisan said in 2009 "adding 'acpi=off noapic' to the
> kernel in /etc.grub.conf seems to have solved the problem for me" (
> https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=462425 ). Raman
> Gupta and Andreas M. Kirchwitz say in other forums that adding 'acpi=off'
> doesn't work ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
> -4: (Similar to 3) Completely disable ACPI in mainboard BIOS. (
> http://lists.debian.org/debian-user/2010/01/msg00023.html )

These are workarounds for bugs in IRQ routing on some motherboards. They are also outdated advice. Ten years ago, when both ACPI and the APIC architecture were quite new, there were a lot of bugs in both BIOS and kernel support for them. It was therefore sensible to try disabling them when a new system seemed unstable. Today, this is not the case.

> -5: Gaetan Cambier says "add the option line to grub to disable ncq :
> 'libata.force=noncq' for me, with this, i have no froze". (
> https://bugzilla.redhat.com/show_bug.cgi?id=549981 ). Others reply that it
> doesn't work for them. PsYcHoK9 says it works for him but John Doe replies
> that
> not for him ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 ).

Not even the same symptoms.

> -6: Reartes Guillermo says "booting with the kernel parameter: pcie_aspm=off
> ?
> For me it worked (nvidia)". Raman Gupta replies that "I tried this and it did
> not fix the problem." ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )

This is a workaround for a controller or chipset bug.

[...]
> Same problem in my old PC/Server Pentium II MMX with Debian 6.0.3 (stable)
> with kernel 2.6.32-5-686 and libata version 3.00 in an "IBM-DTLA-305010" 10Gb
> IDE disk (configured by debian as sda) in an old mainboard. No RAID used,
> but
> only soft reset, and no hard reset, so I don't lose data. Could send logs,
> but
> I think they wouldn't give any more info.
>
> Same problem in my desktop PC every 2 or 3 months in Debian testing with
> kernels 3.0.0-1-amd64, 3.0.0-rc2-amd64, 2.6.39-2-amd64, 2.6.39-amd64,
> 2.6.38-2-amd64, 2.6.38-amd64 and maybe others older, and libata 3.00 in two
> Seagate 7200.11 "ST3500320AS" 500Gb SATA2 disks (with last firmware) from a
> RAID10. Fortunately the other two Western Digital "WDC WD1002FAEX-00Z3A0" 1Tb
> SATA3 disks don't fail, but I have to reboot and re-add disk to reconstruct
> raid. Could send logs, but I think they wouldn't give any more info.
[...]

Use reportbug to open a *separate* bug report for *each* of these systems. Do send the logs. Please do not try to find connections with other bug reports.

Ben.

-- 
Ben Hutchings
No political challenge can be met by shopping. - George Monbiot
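Several of the workarounds quoted here are kernel boot parameters (`libata.force=noncq`, `libata.noacpi=1`, `libata.atapi_passthru16=0`), and "works for me / doesn't work for me" reports are only comparable if the option was actually active. A minimal Python sketch, assuming a Linux system that exposes the running kernel's command line in `/proc/cmdline`; the helper names are hypothetical. Note that it only sees options given on the boot line: options set through `/etc/modprobe.d` (workaround 2) do not appear there.

```python
def libata_boot_options(cmdline):
    """Return the libata.* parameters given on a kernel command line.

    Simplification: splits on whitespace, so quoted values containing
    spaces are not handled.
    """
    options = {}
    for token in cmdline.split():
        if token.startswith("libata."):
            name, _, value = token[len("libata."):].partition("=")
            options[name] = value  # in this sketch, a repeated option keeps the last value
    return options


def current_libata_options(path="/proc/cmdline"):
    """libata.* boot options of the running kernel (Linux only)."""
    with open(path) as f:
        return libata_boot_options(f.read())
```

For example, on a system booted with `libata.force=noncq` and no other libata options, `current_libata_options()` should return `{'force': 'noncq'}`.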
Bug#625922: SATA devices get reset without real hardware failure
This bug (in general, not just this on this web) have been in GNU/Linux since a long time with various disks, mainboards, SATA controllers, distros and kernels (maybe since changes after 2.6.24).

In https://bugzilla.redhat.com/show_bug.cgi?id=684599 David Zeuthen says "it's most probably caused by this commit http://git.kernel.org/?p=linux/hotplug/udev.git;a=commitdiff;h=560de575148b7efda3b34a7f7073abd483c5f08e "

Possible workarounds related to this bug:
-1: Add "libata.atapi_passthru16=0" to the kernel boot options (because some devices may not support 16-byte ATA commands) ( https://bugzilla.redhat.com/show_bug.cgi?id=684599 )
-2: (Same as 1) Add options libata atapi_passthru16=0 to /etc/modprobe.d/modprobe.conf and add FILES="/etc/modprobe.d/modprobe.conf" to /etc/mkinitcpio.conf ( https://bbs.archlinux.org/viewtopic.php?pid=895404 )
-3: Somebody called Fujisan said in 2009 "adding 'acpi=off noapic' to the kernel in /etc.grub.conf seems to have solved the problem for me" ( https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=462425 ). Raman Gupta and Andreas M. Kirchwitz say in other forums that adding 'acpi=off' doesn't work ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-4: (Similar to 3) Completely disable ACPI in mainboard BIOS. ( http://lists.debian.org/debian-user/2010/01/msg00023.html )
-5: Gaetan Cambier says "add the option line to grub to disable ncq : 'libata.force=noncq' for me, with this, i have no froze". ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 ). Others reply that it doesn't work for them. PsYcHoK9 says it works for him but John Doe replies that not for him ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 ).
-6: Reartes Guillermo says "booting with the kernel parameter: pcie_aspm=off ? For me it worked (nvidia)". Raman Gupta replies that "I tried this and it did not fix the problem." ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-7: A. Mani says "For the SB600 controller, the right thing to do is to restrict all drives to 1.5Gbps by jumpers or with a boot option." Raman Gupta replies "I also tried this -- but with this setting all drives attached to my Marvell controller could not even be started by the kernel -- permanent "failed to IDENTIFY" errors." ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-8: DjznBR (djzn-br) says he has tried several things WITHOUT success and finally found one that works.
Doesn't work: TURNED HDPARM OFF, CHANGED CABLE, EXPERIMENTED AHCI & RAID MODES, DISABLED NCQ, COMPILED KERNEL WITH CONFIG_SATA_PMP DISABLED, TRYING NOW LIBATA.FORCE=1.5GBPS, changed the cables to different routes... SATA1 -> SATA2 SATA2 -> SATA3
Works (but still gives "softreset failed (device not ready)" messages in dmesg and afterwards recovers without data loss): Added option for kernel in grub configuration "libata.noacpi=1".
He also says "libata.force=norst ... prevents soft and hard link resettings. If you have that switch on, when this bug comes up, there is a system lock down (because obviously the kernel prevented the soft & hard resetting." ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 )

Same problem in my old PC/Server Pentium II MMX with Debian 6.0.3 (stable) with kernel 2.6.32-5-686 and libata version 3.00 in an "IBM-DTLA-305010" 10Gb IDE disk (configured by debian as sda) in an old mainboard. No RAID used, but only soft reset, and no hard reset, so I don't lose data. Could send logs, but I think they wouldn't give any more info.

Same problem in my desktop PC every 2 or 3 months in Debian testing with kernels 3.0.0-1-amd64, 3.0.0-rc2-amd64, 2.6.39-2-amd64, 2.6.39-amd64, 2.6.38-2-amd64, 2.6.38-amd64 and maybe others older, and libata 3.00 in two Seagate 7200.11 "ST3500320AS" 500Gb SATA2 disks (with last firmware) from a RAID10. Fortunately the other two Western Digital "WDC WD1002FAEX-00Z3A0" 1Tb SATA3 disks don't fail, but I have to reboot and re-add disk to reconstruct raid. Could send logs, but I think they wouldn't give any more info.

Possibly these are the same bug: #539059, #603061, #524876

Same bug in other distros and kernels:
-Archlinux with udev-165 and udev-166: https://bbs.archlinux.org/viewtopic.php?pid=895404
-Fedora with kernel 2.6.38-0.rc8.git0.1.fc15.x86_64 and udev-166 in a DVD reader: https://bugzilla.redhat.com/show_bug.cgi?id=684599
-Fedora 13 with kernel 2.6.33.8-149.fc13.i686.PAE or Fedora 13 64bit on a Mac Mini
-Fedora 14 with kernels 2.6.31.6-166.fc12@x86_64, 2.6.32.11-99.fc12.x86_64, 2.6.35.9-64.fc14.x86_64, 2.6.35.10-72.fc14.i686 and 2.6.35.10-74.fc14.x86_64 and 2.6.35.11-83.fc14.x86_64 and 2.6.35.14-95.fc14.x86_64: https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Fedora 15 (updated from Fedora 14): https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Centos5.5-x64 with kernel 2.6.18-194-x64: https://bugzilla.redhat.com/show_bug.cgi?id=549981
-RHEL5 with vanilla kernel 2.6.37.3: https://bugzilla.redhat.com/show_bu
Bug#625922: SATA devices get reset without real hardware failure
I use custom kernels from:

root@lrdlnx:~# dpkg -l |grep linux-source
ii linux-source-2.6.32 2.6.32-35 Linux kernel source for version 2.6.32 with Debian patches
ii linux-source-2.6.38 2.6.38-5~bpo60+1 Linux kernel source for version 2.6.38 with Debian patches

No external patches or anything, just official Debian sources and stuff. Same error with 2.6.32 and 2.6.38; the first time I noticed errors was Aug 5 and I started using 2.6.38 on Aug 16.

I have also swapped my configuration between my two desktop computers: these drives have been attached to a different motherboard, also with an Nvidia chipset, not exactly the same, but anyway, the same error also happens with the other mainboard.

Aug 5 04:38:46 lrdlnx kernel: ata2: hard resetting link
Aug 5 04:38:46 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 5 04:38:46 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 5 04:38:46 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 5 04:38:46 lrdlnx kernel: ata2: EH complete

-- 
- Juhani Karlsson juhani dot karlsson at iki dot fi http://lrdlnx.iki.fi -

Archive: http://lists.debian.org/4e5a0961.2030...@lrdlnx.iki.fi
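Juhani dates the first errors to Aug 5 and the switch to 2.6.38 to Aug 16, and notes that the interval between resets varies from minutes to days. A per-day tally of `hard resetting link` messages for each link would show whether the rate changed around such dates. A minimal Python sketch, assuming traditional syslog lines like the ones quoted in this thread (with or without the kernel's `[seconds]` timestamp); the regex and the function name `resets_per_day` are hypothetical.

```python
import re
from collections import Counter

# Syslog lines such as
#   "Aug 28 00:11:45 lrdlnx kernel: ata2: hard resetting link"
#   "May 16 01:42:29 amboise kernel: [713846.304016] ata5.01: hard resetting link"
# \s+ after the month also accepts syslog's space-padded days ("Aug  5").
RESET_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) \d{2}:\d{2}:\d{2} \S+ kernel: "
    r"(?:\[\s*[\d.]+\] )?(?P<link>ata\d+(?:\.\d+)?): hard resetting link"
)


def resets_per_day(lines):
    """Count 'hard resetting link' messages per (month, day, link)."""
    counts = Counter()
    for line in lines:
        m = RESET_RE.match(line)
        if m:
            counts[(m["month"], int(m["day"]), m["link"])] += 1
    return counts
```

Running it over the rotated `/var/log/messages*` files, oldest first, and printing the counter in date order would give a rough timeline of the resets per link.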
Bug#625922: SATA devices get reset without real hardware failure
I can confirm the same problem.

cat /var/log/messages.0 |grep ata
Aug 28 00:11:45 lrdlnx kernel: ata2: hard resetting link
Aug 28 00:11:45 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 00:11:45 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 00:11:45 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 00:11:45 lrdlnx kernel: ata2: EH complete
Aug 28 00:31:24 lrdlnx kernel: ata2: hard resetting link
Aug 28 00:31:24 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 00:31:24 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 00:31:24 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 00:31:24 lrdlnx kernel: ata2: EH complete
Aug 28 01:02:13 lrdlnx clamd[4832]: SelfCheck: Database status OK.
Aug 28 02:39:01 lrdlnx freshclam[4935]: Database updated (1029731 signatures) from db.local.clamav.net (IP: 85.254.217.235)
Aug 28 02:50:15 lrdlnx kernel: ata2: hard resetting link
Aug 28 02:50:15 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 02:50:15 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 02:50:15 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 02:50:15 lrdlnx kernel: ata2: EH complete
Aug 28 03:02:07 lrdlnx clamd[4832]: SelfCheck: Database modification detected. Forcing reload.
Aug 28 03:02:08 lrdlnx clamd[4832]: Reading databases from /var/lib/clamav
Aug 28 03:02:18 lrdlnx clamd[4832]: Database correctly reloaded (1028330 signatures)
Aug 28 03:08:55 lrdlnx kernel: ata2: hard resetting link
Aug 28 03:08:55 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 03:08:56 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 03:08:56 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 03:08:56 lrdlnx kernel: ata2: EH complete
Aug 28 03:08:58 lrdlnx kernel: ata2: hard resetting link
Aug 28 03:08:58 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 03:08:58 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 03:08:58 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 03:08:58 lrdlnx kernel: ata2: EH complete

After 5 PM there were no errors in /var/log/messages. Sometimes the error shows up in the log every few minutes, sometimes only after hours or even days; the system runs 24/7.

Around the time I started to notice the errors, I had just replaced smaller drives with 2 TB Western Digital Caviar Green WD20EARS drives, which use "IntelliPower" (variable spin rate, 5400-7200 rpm). Just to be sure, I have already replaced the SATA cables with new ones.

The SATA controller is Nvidia:

root@lrdlnx:~# lspci |grep -i sata
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)

My RAID:

root@lrdlnx:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda5[2] sdb5[1]
      1857650986 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      70011200 blocks [2/2] [UU]

md3 : active raid1 sdd1[1] sdc1[0]
      730957376 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      136448 blocks [2/2] [UU]

unused devices:

I have run tests a few times with no errors. The only problem is these log messages; otherwise everything works perfectly:

root@lrdlnx:~# badblocks -vv /dev/sda
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
root@lrdlnx:~# badblocks -vv /dev/sdb
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
root@lrdlnx:~# badblocks -vv /dev/sdc
Checking blocks 0 to 732574583
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
root@lrdlnx:~# badblocks -vv /dev/sdd
Checking blocks 0 to 732574583
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
root@lrdlnx:~#
root@lrdlnx:~# smartctl -t short /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Aug 19 08:21:57 2011

Use smartctl -X to abort test.
root@lrdlnx:~# smartctl -t short /dev/sdb
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "
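To put numbers on "sometimes every few minutes, sometimes hours or even days", the reset events can be pulled out of the syslog and the gaps between them measured per ATA port. The following is a minimal Python sketch, assuming the classic syslog line format shown in these logs. Those timestamps carry no year, so the `year` parameter is a hypothetical default that you have to set by hand. The function names are illustrative only.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches kernel syslog lines such as
#   "Aug 28 00:11:45 lrdlnx kernel: ata2: hard resetting link"
#   "May 16 01:42:29 amboise kernel: [713846.304016] ata5.01: hard resetting link"
# The optional "[uptime]" prefix and the ".NN" device suffix are tolerated;
# events are grouped by port (ata2, ata5, ...).
RESET_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"(?:\[[ \d.]+\] )?(?P<port>ata\d+)(?:\.\d+)?: hard resetting link"
)


def reset_times(lines, year=2011):
    """Collect 'hard resetting link' timestamps per ATA port.

    Classic syslog timestamps have no year, so `year` (an assumed,
    hand-set value) is prepended before parsing.
    """
    events = defaultdict(list)
    for line in lines:
        m = RESET_RE.match(line)
        if m:
            stamp = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
            events[m["port"]].append(stamp)
    return dict(events)


def intervals_seconds(times):
    """Gaps in whole seconds between consecutive resets (input assumed chronological)."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]
```

To use it, open a log file such as /var/log/messages.0 and pass the file object to `reset_times`, then print `intervals_seconds(times)` for each port. Rotated logs must be fed in chronological order. For the Aug 28 excerpt above, the gaps on ata2 come out as 1179, 8331, 1120 and 3 seconds.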
Processed: Re: Bug#625922: SATA devices get reset without real hardware failure
Processing commands for cont...@bugs.debian.org:

> reassign 625922 linux-2.6 2.6.38-8-amd64
Bug #625922 [linux-image] SATA devices get reset without real hardware failure
Warning: Unknown package 'linux-image'
Bug reassigned from package 'linux-image' to 'linux-2.6'.
Bug No longer marked as found in versions 2.6.38-8-amd64.
Bug #625922 [linux-2.6] SATA devices get reset without real hardware failure
There is no source info for the package 'linux-2.6' at version '2.6.38-8-amd64' with architecture ''
Unable to make a source version for version '2.6.38-8-amd64'
Bug Marked as found in versions 2.6.38-8-amd64.
> --
Stopping processing here.

Please contact me if you need assistance.
--
625922: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=625922
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

Archive: http://lists.debian.org/handler.s.c.130475867918698.transcr...@bugs.debian.org
Bug#625922: SATA devices get reset without real hardware failure
Package: linux-image
Version: 2.6.38-8-amd64
Severity: critical

While running the stock Debian sid linux 2.6.38-8-amd64 kernel, I'm getting random failures on SATA devices. I have a RAID5 system with 5 disks, and 3 of them showed the exact same failure, one every 48 hours. After a reboot the devices work perfectly, and badblocks runs through them without a single failure.

The exact kernel failure is:

[255352.928063] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[255352.928071] ata4.00: failed command: FLUSH CACHE EXT
[255352.928080] ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[255352.928082] res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[255352.928087] ata4.00: status: { DRDY }
[255352.928096] ata4: hard resetting link
[255362.932028] ata4: softreset failed (1st FIS failed)
[255362.932036] ata4: hard resetting link
[255372.932018] ata4: softreset failed (1st FIS failed)
[255372.932026] ata4: hard resetting link
[255407.932029] ata4: softreset failed (1st FIS failed)
[255407.932038] ata4: limiting SATA link speed to 1.5 Gbps
[255407.932042] ata4: hard resetting link
[255413.120028] ata4: softreset failed (device not ready)
[255413.120035] ata4: reset failed, giving up
[255413.120040] ata4.00: disabled
[255413.120060] ata4: EH complete
[255413.120131] sd 4:0:0:0: [sdc] Unhandled error code
[255413.120134] sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[255413.120139] sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 a5 ec 28 24 00 00 f8 00
[255413.120149] end_request: I/O error, dev sdc, sector 2783717412
[255413.120162] sd 4:0:0:0: [sdc] Unhandled error code
[255413.120165] sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[255413.120169] sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 a5 ec 29 1c 00 00 10 00
[255413.120178] end_request: I/O error, dev sdc, sector 2783717660
[255413.120186] sd 4:0:0:0: [sdc] Unhandled error code
[255413.120188] sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[255413.120192] sd 4:0:0:0: [sdc] CDB: Write(10): 2a 00 00 00 00 2e 00 00 08 00
[255413.120201] end_request: I/O error, dev sdc, sector 46
[255413.120209] end_request: I/O error, dev sdc, sector 46
[255413.120212] md: super_written gets error=-5, uptodate=0
[255413.120218] md/raid:md0: Disk failure on sdc1, disabling device.
[255413.120219] md/raid:md0: Operation continuing on 4 devices.
[255413.332414] RAID conf printout:
[255413.332420] --- level:5 rd:5 wd:4
[255413.332425] disk 0, o:1, dev:sdb1
[255413.332428] disk 1, o:0, dev:sdc1
[255413.332432] disk 2, o:1, dev:sdd1
[255413.332435] disk 3, o:1, dev:sde1
[255413.332438] disk 4, o:1, dev:sdf1
[255413.352039] RAID conf printout:
[255413.352045] --- level:5 rd:5 wd:4
[255413.352049] disk 0, o:1, dev:sdb1
[255413.352052] disk 2, o:1, dev:sdd1
[255413.352055] disk 3, o:1, dev:sde1
[255413.352058] disk 4, o:1, dev:sdf1

The devices are on different SATA ports (first ata2 failed, then ata5, then ata4), and all of them are Seagate ST2000DL003-9VT166. The exact same hardware ran Linux 2.6.32-gentoo for weeks without a single failure.
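The CDB bytes in the sd error lines can be checked against the sectors reported by end_request. The following is a minimal Python sketch, assuming the standard 10-byte READ(10)/WRITE(10) layout from the SCSI block command set: byte 0 is the opcode, bytes 2-5 hold the big-endian LBA, and bytes 7-8 hold the big-endian transfer length. `decode_cdb10` is a hypothetical helper, not a kernel or sg3_utils interface.

```python
# Decode the 10-byte SCSI CDBs that the sd driver logs, e.g.
#   "CDB: Read(10): 28 00 a5 ec 28 24 00 00 f8 00"
# Assumed READ(10)/WRITE(10) layout: byte 0 = opcode, bytes 2-5 = logical
# block address (big-endian), bytes 7-8 = transfer length in blocks (big-endian).

OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(hex_bytes: str) -> tuple[str, int, int]:
    """Return (command name, starting LBA, number of blocks) for a 10-byte CDB."""
    cdb = bytes.fromhex(hex_bytes)  # spaces between byte pairs are accepted
    if len(cdb) != 10:
        raise ValueError(f"expected 10 CDB bytes, got {len(cdb)}")
    name = OPCODES.get(cdb[0], f"opcode {cdb[0]:#04x}")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return name, lba, blocks


# The three CDBs from the ata4/sdc failure above.
for cdb_text in (
    "28 00 a5 ec 28 24 00 00 f8 00",
    "28 00 a5 ec 29 1c 00 00 10 00",
    "2a 00 00 00 00 2e 00 00 08 00",
):
    print(decode_cdb10(cdb_text))
```

For the three CDBs in this log, the decoder yields:

- READ(10) at LBA 2783717412 for 248 blocks
- READ(10) at LBA 2783717660 for 16 blocks
- WRITE(10) at LBA 46 for 8 blocks

These LBAs match the end_request sectors one for one. That suggests the I/O errors are ordinary queued requests failing after ata4.00 was disabled (hostbyte=DID_BAD_TARGET), rather than reports of unreadable sectors at those addresses. This reading is consistent with badblocks finding nothing after a reboot.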
lspci output:

00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 02)
	Subsystem: Giga-byte Technology Device 5000
	Flags: bus master, fast devsel, latency 0
	Capabilities:

00:01.0 PCI bridge: Intel Corporation 82G33/G31/P35/P31 Express PCI Express Root Port (rev 02) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: a000-afff
	Memory behind bridge: f400-f5ff
	Prefetchable memory behind bridge: e000-efff
	Capabilities:
	Kernel driver in use: pcieport

00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Giga-byte Technology Device 5004
	Flags: bus master, medium devsel, latency 0, IRQ 16
	I/O ports at e000 [size=32]
	Capabilities:
	Kernel driver in use: uhci_hcd

00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Giga-byte Technology Device 5004
	Flags: bus master, medium devsel, latency 0, IRQ 21
	I/O ports at e100 [size=32]
	Capabilities:
	Kernel driver in use: uhci_hcd

00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Giga-byte Technology Device 5004
	Flags: bus master, medium devsel, latency 0, IRQ 18
	I/O ports at e500 [size=32]
	Capabilities:
	Kernel driver in use: uhci_hcd

00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02) (prog-if 20 [EHCI])
	Subsystem: Giga-byte Technology Device 5006