Bug#625922: SATA devices get reset without real hardware failure

2012-05-15 Thread Francois Gouget


This is mostly a "me too": I have had this issue at least three times
in the past 4 months (more or less), every time with the one
ST2000DL003-9VT166 drive in my computer (out of four). I'm also
running with the CC32 firmware (apparently there's nothing more recent)
but the Linux kernel is newer:

Linux version 3.2.0-2-amd64 (Debian 3.2.15-1) 
(debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-1) ) 
#1 SMP Sun Apr 15 16:47:38 UTC 2012

Other details that may be relevant:
 * It's part of a software RAID1 device (so each failure obviously
   causes it to drop out of the array).
 * I did a long test with SeaTools and it passed.

Here's how it started last night:

May 16 01:42:29 amboise kernel: [713845.984035] ata5.00: exception Emask 0x0 
SAct 0x0 SErr 0x0 action 0x6 frozen
May 16 01:42:29 amboise kernel: [713845.984040] ata5.00: failed command: FLUSH 
CACHE EXT
May 16 01:42:29 amboise kernel: [713845.984046] ata5.00: cmd 
ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
May 16 01:42:29 amboise kernel: [713845.984047]  res 
40/00:02:00:08:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
May 16 01:42:29 amboise kernel: [713845.984051] ata5.00: status: { DRDY }
May 16 01:42:29 amboise kernel: [713845.984060] ata5.00: hard resetting link
May 16 01:42:29 amboise kernel: [713846.304016] ata5.01: hard resetting link
May 16 01:42:34 amboise kernel: [713851.820015] ata5.00: link is slow to 
respond, please be patient (ready=0)
May 16 01:42:39 amboise kernel: [713856.020035] ata5.00: SRST failed (errno=-16)
May 16 01:42:39 amboise kernel: [713856.020045] ata5.00: hard resetting link
May 16 01:42:39 amboise kernel: [713856.340022] ata5.01: hard resetting link
May 16 01:42:44 amboise kernel: [713861.856014] ata5.00: link is slow to 
respond, please be patient (ready=0)
May 16 01:42:49 amboise kernel: [713866.056013] ata5.00: SRST failed (errno=-16)
May 16 01:42:49 amboise kernel: [713866.056022] ata5.00: hard resetting link
May 16 01:42:49 amboise kernel: [713866.376014] ata5.01: hard resetting link
May 16 01:42:54 amboise kernel: [713871.892010] ata5.00: link is slow to 
respond, please be patient (ready=0)
May 16 01:43:24 amboise kernel: [713901.068011] ata5.00: SRST failed (errno=-16)
May 16 01:43:24 amboise kernel: [713901.068019] ata5.00: limiting SATA link 
speed to 1.5 Gbps
May 16 01:43:24 amboise kernel: [713901.068023] ata5.01: limiting SATA link 
speed to 1.5 Gbps
May 16 01:43:24 amboise kernel: [713901.068028] ata5.00: hard resetting link
May 16 01:43:24 amboise kernel: [713901.388013] ata5.01: hard resetting link
May 16 01:43:29 amboise kernel: [713906.120012] ata5.00: SRST failed (errno=-16)
May 16 01:43:29 amboise kernel: [713906.130577] ata5.00: reset failed, giving up
May 16 01:43:29 amboise kernel: [713906.130580] ata5.00: disabled
May 16 01:43:29 amboise kernel: [713906.130585] ata5.01: disabled
May 16 01:43:29 amboise kernel: [713906.130589] ata5.00: device reported 
invalid CHS sector 0
May 16 01:43:29 amboise kernel: [713906.130598] ata5: EH complete
May 16 01:43:29 amboise kernel: [713906.130670] sd 4:0:0:0: [sdb] Unhandled 
error code
May 16 01:43:29 amboise kernel: [713906.130677] sd 4:0:0:0: [sdb]  Result: 
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 16 01:43:29 amboise kernel: [713906.130691] sd 4:0:0:0: [sdb] CDB: 
Write(10): 2a 00 17 00 6f 80 00 00 08 00
May 16 01:43:29 amboise kernel: [713906.130713] end_request: I/O error, dev 
sdb, sector 385904512
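
As a cross-check, the CDB in the last log lines can be decoded and compared with the sector reported by end_request. This is a minimal Python sketch, assuming the standard SCSI WRITE(10) layout (opcode 0x2a, a 4-byte big-endian LBA in bytes 2-5, a 2-byte big-endian transfer length in bytes 7-8); the function name is hypothetical.

```python
def decode_write10(cdb_hex):
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks). Assumes the standard 10-byte layout.
    """
    cdb = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks


lba, blocks = decode_write10("2a 00 17 00 6f 80 00 00 08 00")
print(lba, blocks)  # 385904512 8
```

The decoded LBA matches the "sector 385904512" in the I/O error line, so the failed request was an 8-block write at that address.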


-- 
Francois Gouget   http://fgouget.free.fr/
In theory, theory and practice are the same, but in practice they're different.



-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/alpine.DEB.2.02.1205160840120.27276@amboise.dolphin

Bug#625922: SATA devices get reset without real hardware failure

2012-01-26 Thread Alessio Treglia

Hi all,

this seems unreproducible for me with linux-2.6 3.1.8-2, currently available
in testing.

Regards,

-- 
Alessio Treglia          | www.alessiotreglia.com
Debian Developer         | ales...@debian.org
Ubuntu Core Developer    | quadris...@ubuntu.com
0416 0004 A827 6E40 BB98 90FB E8A4 8AE5 311D 765A




Bug#625922: SATA devices get reset without real hardware failure

2011-12-14 Thread Ben Hutchings

Summary of the bug so far:

Messages #5, #63 from Natalia Portillo :

package version: Ubuntu 2.6.38-8.?? (Debian did not use version 2.6.38-8)
 Debian 2.6.32-38 ("all squeeze kernels up to two weeks away")
 (Gentoo 2.6.32 does not have the problem)
drive(s): Seagate ST2000DL003-9VT166
controller(s): Intel ICH9 (AHCI mode) (and JMicron JMB36?)
kernel log:
[255352.928063] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[255352.928071] ata4.00: failed command: FLUSH CACHE EXT
[255352.928080] ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[255352.928082]  res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 
(timeout)
[255352.928087] ata4.00: status: { DRDY }
[255352.928096] ata4: hard resetting link
[255362.932028] ata4: softreset failed (1st FIS failed)
[255362.932036] ata4: hard resetting link
[255372.932018] ata4: softreset failed (1st FIS failed)
[255372.932026] ata4: hard resetting link
[255407.932029] ata4: softreset failed (1st FIS failed)
[255407.932038] ata4: limiting SATA link speed to 1.5 Gbps
[255407.932042] ata4: hard resetting link
[255413.120028] ata4: softreset failed (device not ready)
[255413.120035] ata4: reset failed, giving up
[255413.120040] ata4.00: disabled
[255413.120060] ata4: EH complete

Messages #18, #33 from Paul Faure :

package version: Ubuntu 2.6.38-8.42
drive(s): Seagate ST2000DL003, ST2000DL003-9VT1
  (but no problem with ST32000644NS)
controller(s): Intel ICH9 (legacy mode)
kernel log:
[247972.000120] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[247972.000132] ata3.00: failed command: FLUSH CACHE EXT
[247972.000146] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[247972.000148]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 
(timeout)
[247972.000155] ata3.00: status: { DRDY }
[247972.000169] ata3: hard resetting link
[247977.550053] ata3: link is slow to respond, please be patient (ready=0)
[247982.050113] ata3: SRST failed (errno=-16)
[247982.050138] ata3: hard resetting link
[247987.600068] ata3: link is slow to respond, please be patient (ready=0)
[247992.100087] ata3: SRST failed (errno=-16)
[247992.100109] ata3: hard resetting link
[247997.650040] ata3: link is slow to respond, please be patient (ready=0)
[248027.110050] ata3: SRST failed (errno=-16)
[248027.110066] ata3: limiting SATA link speed to 1.5 Gbps
[248027.110075] ata3: hard resetting link
[248032.120042] ata3: SRST failed (errno=-16)
[248032.120053] ata3: reset failed, giving up
[248032.120060] ata3.00: disabled
[248032.120069] ata3.00: device reported invalid CHS sector 0
[248032.120094] ata3: EH complete

Message #23 from Christian Robottom Reis :

package version: Ubuntu 2.6.35-28
drive(s): Seagate ST2000DL003-9VT166
controller(s): ?
kernel log: not provided

Messages #38, #43 from Juhani Karlsson :

Seems to be a different problem.

Message #48 from Javier Ortega Conde (Malkavian) :

A bunch of different problems.

Message #70 from Alessio Treglia :

package version: Debian 3.1.1-1
drive(s): TOSHIBA MK5055GSXN
controller(s): ? (AHCI)
kernel log:
[ 6838.837215] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen 
[ 6838.837222] ata2.00: failed command: FLUSH CACHE EXT
[ 6838.837230] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[ 6838.837231]  res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 
(timeout)
[ 6838.837241] ata2.00: status: { DRDY }
[ 6838.837254] ata2: hard resetting link
[ 6844.199464] ata2: link is slow to respond, please be patient (ready=0)
[ 6848.846062] ata2: COMRESET failed (errno=-16)
[ 6848.846075] ata2: hard resetting link
[ 6854.208316] ata2: link is slow to respond, please be patient (ready=0)
[ 6858.854933] ata2: COMRESET failed (errno=-16)
[ 6858.854943] ata2: hard resetting link
[ 6864.213249] ata2: link is slow to respond, please be patient (ready=0)
[ 6875.073958] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 6875.117586] ata2.00: configured for UDMA/100
[ 6875.117598] ata2.00: retrying FLUSH 0xea Emask 0x4
[ 6875.129847] ata2.00: device reported invalid CHS sector 0
[ 6875.129864] ata2: EH complete
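
To compare the reports above more easily, the logs can be reduced to a few fields per port. The following is a rough Python sketch, not an existing tool: the regular expression and the `summarize` helper are assumptions based only on the message formats quoted in this summary, and the sample is an excerpt of the first log (lines without an `ataN` prefix are skipped).

```python
import re

# "[timestamp] ataN[.MM]: message" as it appears in the logs above
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+)(?:\.\d+)?: (.*)")


def summarize(log_text):
    """Collect failed command, hard reset count and outcome per ATA port."""
    ports = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # continuation lines such as "res ..." are ignored
        ts, port, msg = float(m.group(1)), m.group(2), m.group(3)
        info = ports.setdefault(port, {"start": ts, "end": None, "command": None,
                                       "resets": 0, "disabled": False})
        if msg.startswith("failed command: "):
            info["command"] = msg[len("failed command: "):].strip()
        elif msg.startswith("hard resetting link"):
            info["resets"] += 1
        elif msg.startswith("disabled"):
            info["disabled"] = True
        elif msg.startswith("EH complete"):
            info["end"] = ts
    return ports


SAMPLE = """\
[255352.928063] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[255352.928071] ata4.00: failed command: FLUSH CACHE EXT
[255352.928096] ata4: hard resetting link
[255362.932036] ata4: hard resetting link
[255372.932026] ata4: hard resetting link
[255407.932042] ata4: hard resetting link
[255413.120040] ata4.00: disabled
[255413.120060] ata4: EH complete
"""

for port, info in summarize(SAMPLE).items():
    print(port, info["command"], info["resets"], info["disabled"],
          round(info["end"] - info["start"], 3))
```

For this excerpt the summary shows a FLUSH CACHE EXT timeout, four hard resets and a disabled device after roughly a minute of error handling.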

-- 
Ben Hutchings
Computers are not intelligent.  They only think they are.



Bug#625922: SATA devices get reset without real hardware failure

2011-11-26 Thread Natalia Portillo


On 26/11/2011, at 07:49, Jonathan Nieder wrote:

> Hi,
> 
> Natalia Portillo wrote:
> 
>> While running stock Debian's sid linux 2.6.38-8-amd64 kernel I'm
>> getting random fails on SATA devices.
>> 
>> I have a RAID5 system with 5 disks and 3 of them showed the same
>> exact failure, one each 48 hours.
>> 
>> On reboot, the devices work perfectly, and badblocks runs through
>> them without a single failure.
>> 
>> Kernel exact failure is:
>> 
>> [255352.928063] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
>> frozen
>> [255352.928071] ata4.00: failed command: FLUSH CACHE EXT
> [...]
>> Devices are in different SATA ports (first failed ata2, then ata5,
>> then ata4) and are all Seagate ST2000DL003-9VT166.
>> 
>> Same exact hardware has been running on Linux 2.6.32-gentoo for
>> weeks without a single failure.
> 
> Thanks for reporting it, and sorry for the slow response.
> 
> Some questions:
> 
> - what kernel are you using now?

claunia@hades:~$ uname -a
Linux hades 3.0.0-1-amd64 #1 SMP Sat Aug 27 16:21:11 UTC 2011 x86_64 GNU/Linux

wheezy

> - can you still reproduce this?

I have only been running this kernel for two weeks, and there is a bug, but a different one

> - can you reproduce it with a squeeze kernel, too?

with all squeeze kernels up to two weeks away

> - do you know what exact version the working 2.6.32-gentoo kernel
>   was?

r6 I think

> - please attach a log of the initialization of the kernel, either by
>   saving full "dmesg" output right after booting or by gathering it
>   from /var/log/dmesg*

I will have to dig through the rotated logs, stay tuned

> - any workarounds or other weird symptoms?

Curiously, no workarounds, but other weird symptoms in the same and other kernels.

On both the squeeze and wheezy kernels the following happens almost once a day 
(always during high network transfers):

[118801.372070] INFO: task bacula-sd:27996 blocked for more than 120 seconds.
[118801.372091] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[118801.372113] bacula-sd   D 88009f63a2c0 0 27996  1 0x
[118801.372122]  88009f63a2c0 0082  
8800
[118801.372130]  8800bc3780c0 00012800 88008954dfd8 
88008954dfd8
[118801.372138]  00012800 88009f63a2c0 00012800 
00012800
[118801.372146] Call Trace:
[118801.372161]  [] ? schedule_timeout+0x2d/0xd7
[118801.372170]  [] ? blk_peek_request+0x1a7/0x1bc
[118801.372176]  [] ? wait_for_common+0x9d/0x116
[118801.372184]  [] ? try_to_wake_up+0x199/0x199
[118801.372190]  [] ? _raw_spin_lock_irq+0xd/0x1a
[118801.372218]  [] ? st_do_scsi.clone.10+0x2d9/0x309 [st]
[118801.372228]  [] ? st_int_ioctl+0x673/0xad5 [st]
[118801.372234]  [] ? mmdrop+0xd/0x1c
[118801.372241]  [] ? should_resched+0x5/0x24
[118801.372250]  [] ? st_ioctl+0xb5e/0xedf [st]
[118801.372259]  [] ? hrtimer_try_to_cancel+0x3c/0x46
[118801.372265]  [] ? hrtimer_cancel+0xc/0x16
[118801.372272]  [] ? do_vfs_ioctl+0x45b/0x49c
[118801.372278]  [] ? update_rmtp+0x62/0x62
[118801.372284]  [] ? hrtimer_start_expires+0x16/0x1b
[118801.372290]  [] ? sys_ioctl+0x4b/0x72
[118801.372297]  [] ? system_call_fastpath+0x16/0x1b

This repeats many times (the stack trace is always different, always belonging
to the process that's doing the transfer, like bacula-sd or netatalk, or the
XFS or MDRAID processes).

On the squeeze kernel, nothing works when this happens. That is, if you start
another process, it does not start, and if you kill a process, it stays
running. A hard reboot is the only way out.
On wheezy the system continues working.

Curiously, yesterday I received an Efika MX Smartbook machine that exhibits a
different but very similar bug.

With kernel Linux 2.6.31.14.26-efikamx the internal SSD suffers a lost
interrupt and resets when there is high CPU usage. Sorry, I have to dig up
those logs as well.

> 
> If you can reproduce this reliably with a 3.1.y kernel, we should
> take this upstream (looks like that's linux-...@vger.kernel.org
> plus linux-ker...@vger.kernel.org; please cc me or this bug log if
> writing there so we can track it).
> 
> Hope that helps,
> Jonathan





Bug#625922: SATA devices get reset without real hardware failure

2011-11-25 Thread Jonathan Nieder

Hi,

Natalia Portillo wrote:

> While running stock Debian's sid linux 2.6.38-8-amd64 kernel I'm
> getting random fails on SATA devices.
>
> I have a RAID5 system with 5 disks and 3 of them showed the same
> exact failure, one each 48 hours.
>
> On reboot, the devices work perfectly, and badblocks runs through
> them without a single failure.
>
> Kernel exact failure is:
>
> [255352.928063] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
> frozen
> [255352.928071] ata4.00: failed command: FLUSH CACHE EXT
[...]
> Devices are in different SATA ports (first failed ata2, then ata5,
> then ata4) and are all Seagate ST2000DL003-9VT166.
>
> Same exact hardware has been running on Linux 2.6.32-gentoo for
> weeks without a single failure.

Thanks for reporting it, and sorry for the slow response.

Some questions:

 - what kernel are you using now?
 - can you still reproduce this?
 - can you reproduce it with a squeeze kernel, too?
 - do you know what exact version the working 2.6.32-gentoo kernel
   was?
 - please attach a log of the initialization of the kernel, either by
   saving full "dmesg" output right after booting or by gathering it
   from /var/log/dmesg*
 - any workarounds or other weird symptoms?

If you can reproduce this reliably with a 3.1.y kernel, we should
take this upstream (looks like that's linux-...@vger.kernel.org
plus linux-ker...@vger.kernel.org; please cc me or this bug log if
writing there so we can track it).

Hope that helps,
Jonathan




Re: Bug#625922: SATA devices get reset without real hardware failure

2011-10-19 Thread Ben Hutchings

On Wed, Oct 19, 2011 at 05:11:05PM +0200, U.Mutlu wrote:
> Ben Hutchings wrote, On 2011-10-19 15:07:
> >On Wed, 2011-10-19 at 13:31 +0200, U.Mutlu wrote:
> >>Javier Ortega Conde (Malkavian) wrote, On 2011-10-18 00:37:
> >>>This bug (in general, not just this on this web) have been in GNU/Linux 
> >>>since
> >>>a long time with various disks, mainboards, SATA controllers, distros and
> >>>kernels (maybe since changes after 2.6.24).
> >>
> >>I'm using kernel 2.6.37.6 and there this bug is still present.
> >
> >Not a Debian kernel version, so please don't bother this list with it.
> 
> I haven't mentioned anything of Debian, I'm using the kernel from kernel.org :
>  http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.37.6.tar.bz2
 
You are writing to the debian-kernel mailing list, not LKML.

> >>Has it been fixed in any recent kernel versions?
> >>IMO it deserves the highest priority to fix this ASAP.
> >[...]
> >
> >It's not a single bug.
> 
> It's disastrous situation: an OS with buggy HD kernel driver, and no fix on 
> the way...

Stop ranting and explain your problem to the right people (not us).

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
  - Albert Camus



Re: Bug#625922: SATA devices get reset without real hardware failure

2011-10-19 Thread U.Mutlu


Ben Hutchings wrote, On 2011-10-19 15:07:
> On Wed, 2011-10-19 at 13:31 +0200, U.Mutlu wrote:
>> Javier Ortega Conde (Malkavian) wrote, On 2011-10-18 00:37:
>>> This bug (in general, not just this on this web) have been in GNU/Linux since
>>> a long time with various disks, mainboards, SATA controllers, distros and
>>> kernels (maybe since changes after 2.6.24).
>>
>> I'm using kernel 2.6.37.6 and there this bug is still present.
>
> Not a Debian kernel version, so please don't bother this list with it.

I haven't mentioned anything of Debian, I'm using the kernel from kernel.org :
 http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.37.6.tar.bz2

>> Has it been fixed in any recent kernel versions?
>> IMO it deserves the highest priority to fix this ASAP.
> [...]
>
> It's not a single bug.

It's disastrous situation: an OS with buggy HD kernel driver, and no fix on the 
way...



Re: Bug#625922: SATA devices get reset without real hardware failure

2011-10-19 Thread Ben Hutchings

On Wed, 2011-10-19 at 13:31 +0200, U.Mutlu wrote:
> Javier Ortega Conde (Malkavian) wrote, On 2011-10-18 00:37:
> > This bug (in general, not just this on this web) have been in GNU/Linux 
> > since
> > a long time with various disks, mainboards, SATA controllers, distros and
> > kernels (maybe since changes after 2.6.24).
> 
> I'm using kernel 2.6.37.6 and there this bug is still present.

Not a Debian kernel version, so please don't bother this list with it.

> Has it been fixed in any recent kernel versions?
> IMO it deserves the highest priority to fix this ASAP.
[...]

It's not a single bug.

Ben.

-- 
Ben Hutchings
73.46% of all statistics are made up.



Re: Bug#625922: SATA devices get reset without real hardware failure

2011-10-19 Thread U.Mutlu


Javier Ortega Conde (Malkavian) wrote, On 2011-10-18 00:37:
> This bug (in general, not just this on this web) have been in GNU/Linux since
> a long time with various disks, mainboards, SATA controllers, distros and
> kernels (maybe since changes after 2.6.24).


I'm using kernel 2.6.37.6 and there this bug is still present.
Has it been fixed in any recent kernel versions?
IMO it deserves the highest priority to fix this ASAP.


https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892

"
Raj B (bigwoof) wrote on 2011-01-03:
...
I've lost data because of this as well. my entire /var/lib/mysql directory
was blown away and recovered into lost+found. other directories are there as 
well.
...
"

I had a similar disaster yesterday... :-(

From my syslog:

Oct 18 12:11:16 c12 kernel: [   35.340954] ata1.00: exception Emask 0x0 SAct 
0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [   35.342052] ata1.00: irq_stat 0x4001
Oct 18 12:11:16 c12 kernel: [   35.343141] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [   35.344230] ata1.00: cmd 
c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [   35.344232]  res 
51/01:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x1 (device error)
Oct 18 12:11:16 c12 kernel: [   35.346497] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [   35.351588] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [   35.352760] ata1: EH complete
Oct 18 12:11:16 c12 kernel: [   36.374319] ata1.00: exception Emask 0x0 SAct 
0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [   36.375516] ata1.00: irq_stat 0x4001
Oct 18 12:11:16 c12 kernel: [   36.376722] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [   36.377913] ata1.00: cmd 
c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [   36.377915]  res 
51/01:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x1 (device error)
Oct 18 12:11:16 c12 kernel: [   36.380393] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [   36.385574] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [   36.386828] ata1: EH complete
Oct 18 12:11:16 c12 kernel: [   37.407698] ata1.00: exception Emask 0x0 SAct 
0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [   37.409013] ata1.00: irq_stat 0x4001
Oct 18 12:11:16 c12 kernel: [   37.410317] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [   37.411638] ata1.00: cmd 
c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [   37.411639]  res 
51/40:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x9 (media error)
Oct 18 12:11:16 c12 kernel: [   37.414381] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [   37.415758] ata1.00: error: { UNC }
Oct 18 12:11:16 c12 kernel: [   37.421076] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [   37.422466] ata1: EH complete
Oct 18 12:11:16 c12 kernel: [   38.449412] ata1.00: exception Emask 0x0 SAct 
0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [   38.450847] ata1.00: irq_stat 0x4000
Oct 18 12:11:16 c12 kernel: [   38.452285] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [   38.453718] ata1.00: cmd 
c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [   38.453720]  res 
51/40:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x9 (media error)
Oct 18 12:11:16 c12 kernel: [   38.456694] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [   38.458190] ata1.00: error: { UNC }
Oct 18 12:11:16 c12 kernel: [   38.463615] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [   38.465135] ata1: EH complete
Oct 18 12:11:16 c12 kernel: [   39.491124] ata1.00: exception Emask 0x0 SAct 
0x0 SErr 0x0 action 0x0
Oct 18 12:11:16 c12 kernel: [   39.492692] ata1.00: irq_stat 0x4001
Oct 18 12:11:16 c12 kernel: [   39.494253] ata1.00: failed command: READ DMA
Oct 18 12:11:16 c12 kernel: [   39.495829] ata1.00: cmd 
c8/00:08:7f:04:f5/00:00:00:00:00/e1 tag 0 dma 4096 in
Oct 18 12:11:16 c12 kernel: [   39.495831]  res 
51/40:08:7f:04:f5/00:00:00:00:00/e1 Emask 0x9 (media error)
Oct 18 12:11:16 c12 kernel: [   39.499081] ata1.00: status: { DRDY ERR }
Oct 18 12:11:16 c12 kernel: [   39.500710] ata1.00: error: { UNC }
Oct 18 12:11:16 c12 kernel: [   39.506254] ata1.00: configured for UDMA/133
Oct 18 12:11:16 c12 kernel: [   39.507867] ata1: EH complete
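
The "res" lines above carry the drive's status and error registers in their first two bytes, which is where the "status: { DRDY ERR }" and "error: { UNC }" lines come from. Below is a small Python sketch of that decoding, assuming the usual ATA register bit positions and decoding only the bits named in the kernel messages in this thread; `decode_res` is a hypothetical helper, not a kernel or library function.

```python
# Bit masks of the ATA status and error registers (only the bits that the
# kernel messages in this thread name are listed here).
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]


def decode_res(res):
    """Decode the leading "status/error" bytes of a libata 'res' field."""
    status_hex, rest = res.split("/", 1)
    error_hex = rest.split(":", 1)[0]
    status, error = int(status_hex, 16), int(error_hex, 16)
    status_names = [name for bit, name in STATUS_BITS if status & bit]
    error_names = [name for bit, name in ERROR_BITS if error & bit]
    return status_names, error_names


print(decode_res("51/40:08:7f:04:f5/00:00:00:00:00/e1"))
# (['DRDY', 'ERR'], ['UNC'])
print(decode_res("51/01:08:7f:04:f5/00:00:00:00:00/e1"))
# (['DRDY', 'ERR'], [])
```

The second call corresponds to the first two exceptions in the log, which have no "error:" line; the UNC (uncorrectable data) bit only appears from the third exception on, together with "media error".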

...

Oct 18 14:05:41 c12 kernel: [   71.786231] ata1.00: exception Emask 0x10 SAct 
0x0 SErr 0x191 action 0xe frozen
Oct 18 14:05:41 c12 kernel: [   71.786236] ata1.00: irq_stat 0x0840, 
interface fatal error, PHY RDY changed
Oct 18 14:05:41 c12 kernel: [   71.786240] ata1: SError: { PHYRdyChg Dispar 
LinkSeq TrStaTrns }
Oct 18 14:05:41 c12 kernel: [   71.786243] ata1.00: failed command: READ DMA
Oct 18 14:05:41 c12 kernel: [   71.786250] ata1.00: cmd 
c8/00:20:27:03:9d/00:00:00:00:00/e1 tag 0 dma 16384 in
Oct 18 14:05:41 c12 kernel: [   71.786251]  res 
50/00:00:c6:02

Bug#625922: SATA devices get reset without real hardware failure

2011-10-17 Thread Ben Hutchings

On Tue, 2011-10-18 at 00:37 +0200, Javier Ortega Conde (Malkavian)
wrote:
> This bug (in general, not just this on this web) have been in GNU/Linux since 
> a long time with various disks, mainboards, SATA controllers, distros and 
> kernels (maybe since changes after 2.6.24).

Just because you see the same error messages, that does not mean you are
seeing the same bug.

> In https://bugzilla.redhat.com/show_bug.cgi?id=684599  David Zeuthen says 
> "it's most probably caused by this commit 
> http://git.kernel.org/?p=linux/hotplug/udev.git;a=commitdiff;h=560de575148b7efda3b34a7f7073abd483c5f08e
>  
> "

So that's a bug in some drives, though we need to work around it.

> Possible workarounds related to this bug: 
> -1: Add "libata.atapi_passthru16=0" to the kernel boot options (because some 
> devices may not support 16-byte ATA commands) ( 
> https://bugzilla.redhat.com/show_bug.cgi?id=684599 )
> -2: (Same as 1) Add options libata atapi_passthru16=0 to 
> /etc/modprobe.d/modprobe.conf and add FILES="/etc/modprobe.d/modprobe.conf" 
> to 
> /etc/mkinitcpio.conf ( https://bbs.archlinux.org/viewtopic.php?pid=895404 )

OK.

> -3: Somebody called Fujisan said in 2009 "adding 'acpi=off noapic' to the 
> kernel in /etc.grub.conf seems to have solved the problem for me"  ( 
> https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=462425 ).  Raman 
> Gupta  and Andreas M. Kirchwitz say in other forums that adding 'acpi=off' 
> doesn't work ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
> -4: (Similar to 3) Completely disable ACPI in mainboard BIOS. ( 
> http://lists.debian.org/debian-user/2010/01/msg00023.html )

These are workarounds for bugs in IRQ routing on some motherboards.

They are also outdated advice.  10 years ago when both ACPI and the APIC
architecture were quite new, there were a lot of bugs in both BIOS and
kernel support for them.  It was therefore sensible to try disabling it
when a new system seemed unstable.  Today, this is not the case.

> -5: Gaetan Cambier says "add the option line to grub to disable ncq : 
> 'libata.force=noncq' for me, with this, i have no froze". ( 
> https://bugzilla.redhat.com/show_bug.cgi?id=549981 ). Others reply that it 
> doesn't work for them. PsYcHoK9 says it works for him but John Doe replies 
> that 
> not for him ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 ).

Not even the same symptoms.

> -6: Reartes Guillermo says "booting with the kernel parameter: pcie_aspm=off 
> ? 
> For me it worked (nvidia)". Raman Gupta replies that "I tried this and it did 
> not fix the problem." ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )

This is a workaround for a controller or chipset bug.

[...]
> Same problem in my old PC/Server Pentium II MMX with Debian 6.0.3 (stable) 
> with kernel 2.6.32-5-686 and libata version 3.00 in an "IBM-DTLA-305010" 10Gb 
> IDE disk (configured by debian as sda) in an old mainboard . No RAID used, 
> but 
> only soft reset, and no hard reset, so I don't lose data. Could send logs, 
> but 
> I think they wouldn't give any more info.
> 
> Same problem in my desktop PC every 2 or 3 months in Debian testing with 
> kernels 3.0.0-1-amd64, 3.0.0-rc2-amd64, 2.6.39-2-amd64, 2.6.39-amd64, 
> 2.6.38-2-amd64, 2.6.38-amd64 and maybe others older, and libata 3.00 in two 
> Seagate 7200.11 "ST3500320AS" 500Gb SATA2 disks (with last firmware) from a 
> RAID10. Fortunately the other two Western Digital "WDC WD1002FAEX-00Z3A0" 1Tb 
> SATA3 disks don't fail, but I have to reboot and re-add disk to reconstruct 
> raid. Could send logs, but I think they wouldn't give any more info.
[...]

Use reportbug to open a *separate* bug report for *each* of these
systems.  Do send the logs.  Please do not try to find connections with
other bug reports.

Ben.

-- 
Ben Hutchings
No political challenge can be met by shopping. - George Monbiot



Bug#625922: SATA devices get reset without real hardware failure

2011-10-17 Thread Javier Ortega Conde (Malkavian)

This bug (in general, not just this on this web) have been in GNU/Linux since 
a long time with various disks, mainboards, SATA controllers, distros and 
kernels (maybe since changes after 2.6.24).

In https://bugzilla.redhat.com/show_bug.cgi?id=684599  David Zeuthen says 
"it's most probably caused by this commit 
http://git.kernel.org/?p=linux/hotplug/udev.git;a=commitdiff;h=560de575148b7efda3b34a7f7073abd483c5f08e
 
"

Possible workarounds related to this bug: 
-1: Add "libata.atapi_passthru16=0" to the kernel boot options (because some 
devices may not support 16-byte ATA commands) ( 
https://bugzilla.redhat.com/show_bug.cgi?id=684599 )
-2: (Same as 1) Add options libata atapi_passthru16=0 to 
/etc/modprobe.d/modprobe.conf and add FILES="/etc/modprobe.d/modprobe.conf" to 
/etc/mkinitcpio.conf ( https://bbs.archlinux.org/viewtopic.php?pid=895404 )
-3: Somebody called Fujisan said in 2009 "adding 'acpi=off noapic' to the 
kernel in /etc.grub.conf seems to have solved the problem for me"  ( 
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=462425 ).  Raman 
Gupta  and Andreas M. Kirchwitz say in other forums that adding 'acpi=off' 
doesn't work ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-4: (Similar to 3) Completely disable ACPI in mainboard BIOS. ( 
http://lists.debian.org/debian-user/2010/01/msg00023.html )
-5: Gaetan Cambier says "add the option line to grub to disable ncq : 
'libata.force=noncq' for me, with this, i have no froze". ( 
https://bugzilla.redhat.com/show_bug.cgi?id=549981 ). Others reply that it 
doesn't work for them. PsYcHoK9 says it works for him but John Doe replies that 
not for him ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 ).
-6: Reartes Guillermo says "booting with the kernel parameter: pcie_aspm=off ? 
For me it worked (nvidia)". Raman Gupta replies that "I tried this and it did 
not fix the problem." ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-7: A. Mani says "For the SB600 controller, the right thing to do is to 
restrict all drives to 1.5Gbps by jumpers or with a boot option."  Raman Gupta 
replies "I also tried this -- but with this setting all drives attached to my 
Marvell controller could not even be started by the kernel -- permanent 
"failed to IDENTIFY" errors." ( 
https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-8: DjznBR (djzn-br) says he tried several things WITHOUT success and 
finally found one that works. Doesn't work: TURNED HDPARM OFF, CHANGED CABLE, 
EXPERIMENTED AHCI & RAID MODES, DISABLED NCQ, COMPILED KERNEL WITH 
CONFIG_SATA_PMP DISABLED, TRYING NOW LIBATA.FORCE=1.5GBPS, changed the cables 
to different routes... SATA1 -> SATA2 SATA2 -> SATA3  Works (but still 
gives "softreset failed (device not ready)"  messages in dmesg and afterwards 
recover without data loss) :  Added option for kernel in grub configuration 
"libata.noacpi=1". Also says "libata.force=norst ... prevents soft and hard 
link resettings. If you have that switch on, when this bug comes up, there is 
a system lock down (because obviously the kernel prevented the soft & hard 
resetting." ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 )
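
When comparing reports it helps to know which of the boot options listed above a given system is actually running with. Here is a minimal Python sketch that scans a kernel command line for them; the parameter names are the ones quoted in the list, while the helper name and the example command line are made up for illustration.

```python
# Boot options mentioned in the workaround list above.
WORKAROUND_PARAMS = (
    "libata.atapi_passthru16",
    "libata.force",
    "libata.noacpi",
    "pcie_aspm",
    "acpi",
    "noapic",
)


def find_workarounds(cmdline):
    """Return {parameter: value} for the workaround options found in cmdline.

    Options without "=value" (such as "noapic") map to None.
    """
    found = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        if name in WORKAROUND_PARAMS:
            found[name] = value or None
    return found


# Hypothetical command line, for illustration only.
example = "BOOT_IMAGE=/vmlinuz root=/dev/md0 ro libata.force=noncq pcie_aspm=off noapic"
print(find_workarounds(example))
# {'libata.force': 'noncq', 'pcie_aspm': 'off', 'noapic': None}
```

On a running system the same function could be fed the contents of /proc/cmdline, for example find_workarounds(open("/proc/cmdline").read()).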


Same problem in my old PC/Server Pentium II MMX with Debian 6.0.3 (stable) 
with kernel 2.6.32-5-686 and libata version 3.00 in an "IBM-DTLA-305010" 10Gb 
IDE disk (configured by debian as sda) in an old mainboard . No RAID used, but 
only soft reset, and no hard reset, so I don't lose data. Could send logs, but 
I think they wouldn't give any more info.

Same problem in my desktop PC every 2 or 3 months in Debian testing with 
kernels 3.0.0-1-amd64, 3.0.0-rc2-amd64, 2.6.39-2-amd64, 2.6.39-amd64, 
2.6.38-2-amd64, 2.6.38-amd64 and maybe others older, and libata 3.00 in two 
Seagate 7200.11 "ST3500320AS" 500Gb SATA2 disks (with last firmware) from a 
RAID10. Fortunately the other two Western Digital "WDC WD1002FAEX-00Z3A0" 1Tb 
SATA3 disks don't fail, but I have to reboot and re-add disk to reconstruct 
raid. Could send logs, but I think they wouldn't give any more info.

Possibly these are the same bug: #539059, #603061, #524876

Same bug in other distros and kernels:
-Archlinux with udev-165 and udev-166: 
https://bbs.archlinux.org/viewtopic.php?pid=895404
-Fedora with kernel 2.6.38-0.rc8.git0.1.fc15.x86_64 and udev-166 in a DVD 
reader: https://bugzilla.redhat.com/show_bug.cgi?id=684599
-Fedora 13 with kernel 2.6.33.8-149.fc13.i686.PAE or Fedora 13 64bit on a Mac 
Mini
-Fedora 14 with kernels 2.6.31.6-166.fc12@x86_64, 2.6.32.11-99.fc12.x86_64, 
2.6.35.9-64.fc14.x86_64, 2.6.35.10-72.fc14.i686 and 2.6.35.10-74.fc14.x86_64 
and 2.6.35.11-83.fc14.x86_64 and 2.6.35.14-95.fc14.x86_64: 
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Fedora 15 (updated from Fedora 14): 
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Centos5.5-x64 with kernel 2.6.18-194-x64: 
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-RHEL5 with vanilla kernel 2.6.37.3: 
https://bugzilla.redhat.com/show_bu

Bug#625922: SATA devices get reset without real hardware failure

2011-08-28 Thread Juhani Karlsson

I use custom kernels from:

root@lrdlnx:~# dpkg -l |grep linux-source
ii  linux-source-2.6.32  2.6.32-35         Linux kernel source for version 2.6.32 with Debian patches
ii  linux-source-2.6.38  2.6.38-5~bpo60+1  Linux kernel source for version 2.6.38 with Debian patches
No external patches or anything, just official Debian sources and stuff.

I get the same error with 2.6.32 and 2.6.38. I first noticed the errors on
Aug 5 and started using 2.6.38 on Aug 16.
I have also moved these drives between my two desktop computers: they have
been attached to a different motherboard, also with an Nvidia chipset (not
exactly the same one), and the same error occurs with that mainboard as
well.

Aug  5 04:38:46 lrdlnx kernel: ata2: hard resetting link
Aug  5 04:38:46 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug  5 04:38:46 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug  5 04:38:46 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug  5 04:38:46 lrdlnx kernel: ata2: EH complete
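
The "SStatus 123" value in the "SATA link up" line is printed in hex and 
packs three 4-bit fields defined by the SATA specification: DET (device 
detection, bits 3:0), SPD (negotiated speed, bits 7:4) and IPM (interface 
power state, bits 11:8). A minimal sketch of decoding it follows; the 
function name decode_sstatus and the short field descriptions are my own 
wording, not kernel output, and only the common field values are listed.

```python
# Sketch: decode the SATA SStatus register value that libata prints in hex.
# Field layout: DET = bits 3:0, SPD = bits 7:4, IPM = bits 11:8.

DET = {0: "no device detected",
       1: "device detected, no PHY communication",
       3: "device detected, PHY communication established",
       4: "PHY offline"}
SPD = {0: "no speed negotiated", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
IPM = {0: "device not present", 1: "active", 2: "partial", 6: "slumber"}


def decode_sstatus(hex_text: str) -> dict:
    """Split an SStatus value such as '123' into its three fields."""
    value = int(hex_text, 16)
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {"det": DET.get(det, f"reserved ({det})"),
            "spd": SPD.get(spd, f"reserved ({spd})"),
            "ipm": IPM.get(ipm, f"reserved ({ipm})")}


status = decode_sstatus("123")
for field, meaning in status.items():
    print(f"{field}: {meaning}")
```

For the value in the log this gives an established link at 3.0 Gbps in the 
active power state, which agrees with the "SATA link up 3.0 Gbps" text on 
the same line.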

-- 
-
Juhani Karlsson
juhani dot karlsson at iki dot fi
http://lrdlnx.iki.fi
-


-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4e5a0961.2030...@lrdlnx.iki.fi

Bug#625922: SATA devices get reset without real hardware failure

2011-08-28 Thread Juhani Karlsson

I can confirm the same problem.

cat /var/log/messages.0 |grep ata
Aug 28 00:11:45 lrdlnx kernel: ata2: hard resetting link
Aug 28 00:11:45 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 00:11:45 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 00:11:45 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 00:11:45 lrdlnx kernel: ata2: EH complete
Aug 28 00:31:24 lrdlnx kernel: ata2: hard resetting link
Aug 28 00:31:24 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 00:31:24 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 00:31:24 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 00:31:24 lrdlnx kernel: ata2: EH complete
Aug 28 01:02:13 lrdlnx clamd[4832]: SelfCheck: Database status OK.
Aug 28 02:39:01 lrdlnx freshclam[4935]: Database updated (1029731 signatures) from db.local.clamav.net (IP: 85.254.217.235)
Aug 28 02:50:15 lrdlnx kernel: ata2: hard resetting link
Aug 28 02:50:15 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 02:50:15 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 02:50:15 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 02:50:15 lrdlnx kernel: ata2: EH complete
Aug 28 03:02:07 lrdlnx clamd[4832]: SelfCheck: Database modification detected. Forcing reload.
Aug 28 03:02:08 lrdlnx clamd[4832]: Reading databases from /var/lib/clamav
Aug 28 03:02:18 lrdlnx clamd[4832]: Database correctly reloaded (1028330 signatures)
Aug 28 03:08:55 lrdlnx kernel: ata2: hard resetting link
Aug 28 03:08:55 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 03:08:56 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 03:08:56 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 03:08:56 lrdlnx kernel: ata2: EH complete
Aug 28 03:08:58 lrdlnx kernel: ata2: hard resetting link
Aug 28 03:08:58 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 03:08:58 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 03:08:58 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 03:08:58 lrdlnx kernel: ata2: EH complete

After 5PM there are no errors in /var/log/messages. Sometimes the error
shows up in the log once every few minutes, sometimes only every few hours
or even days. The system is running 24/7.
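
To put numbers on how irregular the resets are, the gaps between successive 
"hard resetting link" lines can be computed from the syslog timestamps. 
This is only a sketch: the helper name reset_times is hypothetical, the 
sample lines are the five reset lines from the log above, and the year 2011 
is an assumption taken from the date of this message, since syslog 
timestamps carry no year.

```python
# Sketch: measure the time between successive libata link resets in syslog.
import re
from datetime import datetime

RESET_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (ata\d+): hard resetting link")


def reset_times(lines, year=2011):
    """Return [(port, datetime)] for each 'hard resetting link' line."""
    events = []
    for line in lines:
        match = RESET_RE.match(line)
        if match:
            # syslog has no year, so prepend the assumed one before parsing
            stamp = datetime.strptime(f"{year} {match.group(1)}",
                                      "%Y %b %d %H:%M:%S")
            events.append((match.group(2), stamp))
    return events


log = """\
Aug 28 00:11:45 lrdlnx kernel: ata2: hard resetting link
Aug 28 00:11:45 lrdlnx kernel: ata2: EH complete
Aug 28 00:31:24 lrdlnx kernel: ata2: hard resetting link
Aug 28 02:50:15 lrdlnx kernel: ata2: hard resetting link
Aug 28 03:08:55 lrdlnx kernel: ata2: hard resetting link
Aug 28 03:08:58 lrdlnx kernel: ata2: hard resetting link
""".splitlines()

events = reset_times(log)
# Seconds elapsed between each reset and the next one
gaps = [int((later - earlier).total_seconds())
        for (_, earlier), (_, later) in zip(events, events[1:])]
print(gaps)  # [1179, 8331, 1120, 3]
```

For this night the gaps range from 3 seconds to well over two hours.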

Around the time I started noticing the errors I had just replaced smaller
drives with 2TB Western Digital Caviar Green WD20EARS drives, which use
"IntelliPower", a variable spin rate of 5400-7200rpm.

Just to be sure, I have already replaced the SATA cables with new ones.

SATA is Nvidia:
root@lrdlnx:~# lspci |grep -i sata
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)


my raid:
root@lrdlnx:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda5[2] sdb5[1]
  1857650986 blocks super 1.2 [2/2] [UU]
 
md1 : active raid1 sdb2[1] sda2[0]
  70011200 blocks [2/2] [UU]
 
md3 : active raid1 sdd1[1] sdc1[0]
  730957376 blocks [2/2] [UU]
 
md0 : active raid1 sdb1[1] sda1[0]
  136448 blocks [2/2] [UU]
 
unused devices: <none>
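
Since a drive that gets reset can drop out of its array, it helps to check 
the member status in /proc/mdstat automatically. A minimal sketch follows, 
assuming the layout shown above (an "mdN : active raidX members..." line 
followed by a line containing "[n/m] [UU]"). The function name md_status is 
made up, and the degraded md9 example in the usage lines is hypothetical.

```python
# Sketch: find degraded md arrays in /proc/mdstat-style text.
# An underscore in the "[UU]" status field marks a missing member.
import re

ARRAY_RE = re.compile(r"^(md\d+) : (\S+) (\S+) (.+)$")
STATE_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def md_status(text: str) -> dict:
    """Return {array: {"level", "members", "degraded"}} parsed from text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        array = ARRAY_RE.match(line)
        if array:
            current = array.group(1)
            arrays[current] = {"level": array.group(3),
                               "members": array.group(4).split(),
                               "degraded": None}
            continue
        state = STATE_RE.search(line)
        if state and current:
            arrays[current]["degraded"] = "_" in state.group(3)
    return arrays


healthy = md_status("""\
Personalities : [raid1]
md2 : active raid1 sda5[2] sdb5[1]
  1857650986 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
  70011200 blocks [2/2] [UU]
""")
# Hypothetical array with one member missing
broken = md_status("md9 : active raid1 sdb1[1]\n  136448 blocks [2/1] [_U]\n")
print([name for name, info in {**healthy, **broken}.items()
       if info["degraded"]])
```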

I have run tests a few times with no errors. The only thing is that I get
these errors, but otherwise everything is working perfectly:

root@lrdlnx:~# badblocks -vv /dev/sda
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test):
done   
Pass completed, 0 bad blocks found.
root@lrdlnx:~# badblocks -vv /dev/sdb
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test):
done   
Pass completed, 0 bad blocks found.
root@lrdlnx:~# badblocks -vv /dev/sdc
Checking blocks 0 to 732574583
Checking for bad blocks (read-only test):
done   
Pass completed, 0 bad blocks found.
root@lrdlnx:~# badblocks -vv /dev/sdd
Checking blocks 0 to 732574583
Checking for bad blocks (read-only test):
done   
Pass completed, 0 bad blocks found.
root@lrdlnx:~#



root@lrdlnx:~# smartctl -t short /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in
off-line mode".
Drive command "Execute SMART Short self-test routine immediately in
off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Aug 19 08:21:57 2011
Use smartctl -X to abort test.
root@lrdlnx:~# smartctl -t short /dev/sdb
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "

Processed: Re: Bug#625922: SATA devices get reset without real hardware failure

2011-05-07 Thread Debian Bug Tracking System

Processing commands for cont...@bugs.debian.org:

> reassign 625922 linux-2.6 2.6.38-8-amd64
Bug #625922 [linux-image] SATA devices get reset without real hardware failure
Warning: Unknown package 'linux-image'
Bug reassigned from package 'linux-image' to 'linux-2.6'.
Bug No longer marked as found in versions 2.6.38-8-amd64.
Bug #625922 [linux-2.6] SATA devices get reset without real hardware failure
There is no source info for the package 'linux-2.6' at version '2.6.38-8-amd64' 
with architecture ''
Unable to make a source version for version '2.6.38-8-amd64'
Bug Marked as found in versions 2.6.38-8-amd64.
> --
Stopping processing here.

Please contact me if you need assistance.
-- 
625922: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=625922
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems


Archive: 
http://lists.debian.org/handler.s.c.130475867918698.transcr...@bugs.debian.org

Bug#625922: SATA devices get reset without real hardware failure

2011-05-07 Thread Natalia Portillo

Package: linux-image
Version: 2.6.38-8-amd64
Severity: critical

While running Debian sid's stock linux 2.6.38-8-amd64 kernel I'm getting 
random failures on SATA devices.

I have a RAID5 system with 5 disks and 3 of them showed the exact same 
failure, one every 48 hours.

On reboot, the devices work perfectly, and badblocks runs through them without 
a single failure.

The exact kernel failure is:

[255352.928063] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[255352.928071] ata4.00: failed command: FLUSH CACHE EXT
[255352.928080] ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[255352.928082]  res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[255352.928087] ata4.00: status: { DRDY }
[255352.928096] ata4: hard resetting link
[255362.932028] ata4: softreset failed (1st FIS failed)
[255362.932036] ata4: hard resetting link
[255372.932018] ata4: softreset failed (1st FIS failed)
[255372.932026] ata4: hard resetting link
[255407.932029] ata4: softreset failed (1st FIS failed)
[255407.932038] ata4: limiting SATA link speed to 1.5 Gbps
[255407.932042] ata4: hard resetting link
[255413.120028] ata4: softreset failed (device not ready)
[255413.120035] ata4: reset failed, giving up
[255413.120040] ata4.00: disabled
[255413.120060] ata4: EH complete
[255413.120131] sd 4:0:0:0: [sdc] Unhandled error code
[255413.120134] sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[255413.120139] sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 a5 ec 28 24 00 00 f8 00
[255413.120149] end_request: I/O error, dev sdc, sector 2783717412
[255413.120162] sd 4:0:0:0: [sdc] Unhandled error code
[255413.120165] sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[255413.120169] sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 a5 ec 29 1c 00 00 10 00
[255413.120178] end_request: I/O error, dev sdc, sector 2783717660
[255413.120186] sd 4:0:0:0: [sdc] Unhandled error code
[255413.120188] sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[255413.120192] sd 4:0:0:0: [sdc] CDB: Write(10): 2a 00 00 00 00 2e 00 00 08 00
[255413.120201] end_request: I/O error, dev sdc, sector 46
[255413.120209] end_request: I/O error, dev sdc, sector 46
[255413.120212] md: super_written gets error=-5, uptodate=0
[255413.120218] md/raid:md0: Disk failure on sdc1, disabling device.
[255413.120219] md/raid:md0: Operation continuing on 4 devices.
[255413.332414] RAID conf printout:
[255413.332420]  --- level:5 rd:5 wd:4
[255413.332425]  disk 0, o:1, dev:sdb1
[255413.332428]  disk 1, o:0, dev:sdc1
[255413.332432]  disk 2, o:1, dev:sdd1
[255413.332435]  disk 3, o:1, dev:sde1
[255413.332438]  disk 4, o:1, dev:sdf1
[255413.352039] RAID conf printout:
[255413.352045]  --- level:5 rd:5 wd:4
[255413.352049]  disk 0, o:1, dev:sdb1
[255413.352052]  disk 2, o:1, dev:sdd1
[255413.352055]  disk 3, o:1, dev:sde1
[255413.352058]  disk 4, o:1, dev:sdf1
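
The "CDB:" lines in the log above can be cross-checked against the sector 
numbers in the "end_request" lines. In a 10-byte SCSI READ(10)/WRITE(10) 
command, byte 0 is the opcode (0x28 or 0x2a), bytes 2-5 hold the starting 
LBA and bytes 7-8 the transfer length in blocks, both big-endian. A minimal 
sketch, with decode_rw10 as a made-up helper name:

```python
# Sketch: decode the READ(10)/WRITE(10) CDBs printed by the sd driver.

def decode_rw10(cdb_hex: str):
    """Return (operation, start_lba, block_count) for a 10-byte CDB."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    operation = "READ(10)" if cdb[0] == 0x28 else "WRITE(10)"
    start_lba = int.from_bytes(cdb[2:6], "big")
    block_count = int.from_bytes(cdb[7:9], "big")
    return operation, start_lba, block_count


# The three CDBs from the log above
for text in ("28 00 a5 ec 28 24 00 00 f8 00",
             "28 00 a5 ec 29 1c 00 00 10 00",
             "2a 00 00 00 00 2e 00 00 08 00"):
    print(decode_rw10(text))
# ('READ(10)', 2783717412, 248)
# ('READ(10)', 2783717660, 16)
# ('WRITE(10)', 46, 8)
```

The decoded LBAs match the sectors reported in the I/O errors (2783717412, 
2783717660 and 46), and the second read starts exactly where the first one 
ends (2783717412 + 248 = 2783717660). These are the requests that were 
failed once ata4.00 had been disabled.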

Devices are in different SATA ports (first failed ata2, then ata5, then ata4) 
and are all Seagate ST2000DL003-9VT166.

Same exact hardware has been running on Linux 2.6.32-gentoo for weeks without a 
single failure.

lspci output:
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM 
Controller (rev 02)
Subsystem: Giga-byte Technology Device 5000
Flags: bus master, fast devsel, latency 0
Capabilities: 

00:01.0 PCI bridge: Intel Corporation 82G33/G31/P35/P31 Express PCI Express 
Root Port (rev 02) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: a000-afff
Memory behind bridge: f400-f5ff
Prefetchable memory behind bridge: e000-efff
Capabilities: 
Kernel driver in use: pcieport

00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI 
Controller #4 (rev 02) (prog-if 00 [UHCI])
Subsystem: Giga-byte Technology Device 5004
Flags: bus master, medium devsel, latency 0, IRQ 16
I/O ports at e000 [size=32]
Capabilities: 
Kernel driver in use: uhci_hcd

00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI 
Controller #5 (rev 02) (prog-if 00 [UHCI])
Subsystem: Giga-byte Technology Device 5004
Flags: bus master, medium devsel, latency 0, IRQ 21
I/O ports at e100 [size=32]
Capabilities: 
Kernel driver in use: uhci_hcd

00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI 
Controller #6 (rev 02) (prog-if 00 [UHCI])
Subsystem: Giga-byte Technology Device 5004
Flags: bus master, medium devsel, latency 0, IRQ 18
I/O ports at e500 [size=32]
Capabilities: 
Kernel driver in use: uhci_hcd

00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI 
Controller #2 (rev 02) (prog-if 20 [EHCI])
Subsystem: Giga-byte Technology Device 5006
