I can confirm the same problem.

cat /var/log/messages.0 |grep ata
Aug 28 00:11:45 lrdlnx kernel: ata2: hard resetting link
Aug 28 00:11:45 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 00:11:45 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 00:11:45 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 00:11:45 lrdlnx kernel: ata2: EH complete
Aug 28 00:31:24 lrdlnx kernel: ata2: hard resetting link
Aug 28 00:31:24 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 00:31:24 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 00:31:24 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 00:31:24 lrdlnx kernel: ata2: EH complete
Aug 28 01:02:13 lrdlnx clamd[4832]: SelfCheck: Database status OK.
Aug 28 02:39:01 lrdlnx freshclam[4935]: Database updated (1029731 signatures) from db.local.clamav.net (IP: 85.254.217.235)
Aug 28 02:50:15 lrdlnx kernel: ata2: hard resetting link
Aug 28 02:50:15 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 02:50:15 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 02:50:15 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 02:50:15 lrdlnx kernel: ata2: EH complete
Aug 28 03:02:07 lrdlnx clamd[4832]: SelfCheck: Database modification detected. Forcing reload.
Aug 28 03:02:08 lrdlnx clamd[4832]: Reading databases from /var/lib/clamav
Aug 28 03:02:18 lrdlnx clamd[4832]: Database correctly reloaded (1028330 signatures)
Aug 28 03:08:55 lrdlnx kernel: ata2: hard resetting link
Aug 28 03:08:55 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 03:08:56 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 03:08:56 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 03:08:56 lrdlnx kernel: ata2: EH complete
Aug 28 03:08:58 lrdlnx kernel: ata2: hard resetting link
Aug 28 03:08:58 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 03:08:58 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 03:08:58 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 03:08:58 lrdlnx kernel: ata2: EH complete
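A plain `grep ata` also matches unrelated lines, which is why the clamd/freshclam "Database ..." messages show up in the log above (the word "Database" contains "ata"). A minimal sketch of a tighter filter that only counts link resets, assuming the syslog format shown here; `count_ata_resets` and `resets_by_hour` are hypothetical helper names, not part of any existing tool:

```shell
#!/bin/sh
# Hypothetical helpers for summarising libata link resets in a syslog file.
# Assumes lines like "Aug 28 00:11:45 host kernel: ata2: hard resetting link".
# Older kernels log "hard resetting port" instead, so both forms are matched.

RESET_RE='ata[0-9]+: hard resetting (link|port)'

# Print how many resets each ATA port logged, one "COUNT ataN" line per port.
count_ata_resets() {
    grep -oE "$RESET_RE" "${1:-/var/log/messages}" \
        | cut -d: -f1 \
        | sort | uniq -c
}

# Print how many resets happened in each hour of the day.
# Field 3 of a classic syslog line is the HH:MM:SS timestamp.
resets_by_hour() {
    grep -E "$RESET_RE" "${1:-/var/log/messages}" \
        | awk '{ split($3, t, ":"); print t[1] }' \
        | sort | uniq -c
}
```

Usage: `count_ata_resets /var/log/messages.0` and `resets_by_hour /var/log/messages.0`. On the excerpt above, the first should report 5 resets, all on ata2; the hourly view may help check whether the resets cluster at particular times of day.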
After 5 PM there have been no errors in /var/log/messages. The errors appear irregularly: sometimes once every few minutes, sometimes only after hours or even days. The system runs 24/7.

Around the time I started noticing the errors, I had just replaced smaller drives with 2 TB Western Digital Caviar Green WD20EARS drives, which use "IntelliPower" (variable spin rate, 5400-7200 rpm). Just to be sure, I have already replaced the SATA cables with new ones.

The SATA controller is Nvidia:

root@lrdlnx:~# lspci |grep -i sata
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)

My RAID:

root@lrdlnx:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda5[2] sdb5[1]
      1857650986 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      70011200 blocks [2/2] [UU]

md3 : active raid1 sdd1[1] sdc1[0]
      730957376 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      136448 blocks [2/2] [UU]

unused devices: <none>

I have run tests a few times with no errors. The only issue is these log messages; otherwise everything is working perfectly:

root@lrdlnx:~# badblocks -vv /dev/sda
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
root@lrdlnx:~# badblocks -vv /dev/sdb
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
root@lrdlnx:~# badblocks -vv /dev/sdc
Checking blocks 0 to 732574583
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
root@lrdlnx:~# badblocks -vv /dev/sdd
Checking blocks 0 to 732574583
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
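The mdstat output above shows every array as [2/2] [UU]. A minimal sketch for checking that automatically, assuming the usual /proc/mdstat layout in which a missing or failed member shows up as "_" in the status field (for example [U_]); `degraded_md_arrays` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose member-status
# field in /proc/mdstat contains "_" (a missing/failed member, e.g. "[U_]").
# Assumes each "mdN : ..." line is followed by a line whose last field is
# the status field, such as "[UU]".
degraded_md_arrays() {
    awk '
        # Remember which array the following status line belongs to.
        /^md[0-9]+ :/ { dev = $1 }
        # The status field consists only of "U" and "_" inside brackets.
        dev != "" && $NF ~ /^\[[U_]+\]$/ {
            if ($NF ~ /_/) print dev
            dev = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

For the mdstat shown above it should print nothing, since all four arrays report [UU]. It only reads /proc/mdstat, so it says nothing about the link resets themselves; `mdadm --monitor` is the more complete way to get alerts about array state.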
root@lrdlnx:~# smartctl -t short /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Aug 19 08:21:57 2011

Use smartctl -X to abort test.
root@lrdlnx:~# smartctl -t short /dev/sdb
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Aug 19 08:22:02 2011

Use smartctl -X to abort test.
root@lrdlnx:~# smartctl -t short /dev/sdc
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Aug 19 08:22:05 2011

Use smartctl -X to abort test.
root@lrdlnx:~# smartctl -t short /dev/sdd
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Aug 19 08:22:08 2011

Use smartctl -X to abort test.
root@lrdlnx:~# smartctl -l selftest /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1109         -
# 2  Short offline       Completed without error       00%      1104         -
# 3  Short offline       Completed without error       00%      1080         -
# 4  Short offline       Completed without error       00%      1057         -
# 5  Short offline       Completed without error       00%      1033         -
# 6  Short offline       Completed without error       00%      1009         -
# 7  Short offline       Completed without error       00%       985         -
# 8  Short offline       Completed without error       00%       961         -
# 9  Short offline       Completed without error       00%       937         -
#10  Short offline       Completed without error       00%       913         -
#11  Short offline       Completed without error       00%       889         -
#12  Short offline       Completed without error       00%       865         -
#13  Short offline       Completed without error       00%       841         -
#14  Short offline       Completed without error       00%       817         -
#15  Short offline       Completed without error       00%       793         -
#16  Short offline       Completed without error       00%       770         -
#17  Short offline       Completed without error       00%       748         -
#18  Short offline       Completed without error       00%       724         -
#19  Short offline       Completed without error       00%       700         -
#20  Short offline       Completed without error       00%       676         -
#21  Short offline       Completed without error       00%       652         -
root@lrdlnx:~# smartctl -l selftest /dev/sdb
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1116         -
# 2  Short offline       Completed without error       00%      1111         -
# 3  Short offline       Completed without error       00%      1087         -
# 4  Short offline       Completed without error       00%      1063         -
# 5  Short offline       Completed without error       00%      1039         -
# 6  Short offline       Completed without error       00%      1015         -
# 7  Short offline       Completed without error       00%       991         -
# 8  Short offline       Completed without error       00%       967         -
# 9  Short offline       Completed without error       00%       943         -
#10  Short offline       Completed without error       00%       919         -
#11  Short offline       Completed without error       00%       895         -
#12  Short offline       Completed without error       00%       871         -
#13  Short offline       Completed without error       00%       847         -
#14  Short offline       Completed without error       00%       823         -
#15  Short offline       Completed without error       00%       800         -
#16  Short offline       Completed without error       00%       776         -
#17  Short offline       Completed without error       00%       754         -
#18  Short offline       Completed without error       00%       730         -
#19  Short offline       Completed without error       00%       706         -
#20  Short offline       Completed without error       00%       682         -
#21  Short offline       Completed without error       00%       658         -
root@lrdlnx:~# smartctl -l selftest /dev/sdc
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     16121         -
# 2  Short offline       Completed without error       00%     16116         -
# 3  Short offline       Completed without error       00%     16092         -
# 4  Short offline       Completed without error       00%     16068         -
# 5  Short offline       Completed without error       00%     16044         -
# 6  Short offline       Completed without error       00%     16020         -
# 7  Short offline       Completed without error       00%     15996         -
# 8  Short offline       Completed without error       00%     15972         -
# 9  Short offline       Completed without error       00%     15948         -
#10  Short offline       Completed without error       00%     15924         -
#11  Short offline       Completed without error       00%     15900         -
#12  Short offline       Completed without error       00%     15876         -
#13  Short offline       Completed without error       00%     15852         -
#14  Short offline       Completed without error       00%     15828         -
#15  Short offline       Completed without error       00%     15804         -
#16  Short offline       Completed without error       00%     15780         -
#17  Short offline       Completed without error       00%     15758         -
#18  Short offline       Completed without error       00%     15734         -
#19  Short offline       Completed without error       00%     15710         -
#20  Short offline       Completed without error       00%     15686         -
#21  Short offline       Completed without error       00%     15662         -
root@lrdlnx:~# smartctl -l selftest /dev/sdd
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     16122         -
# 2  Short offline       Completed without error       00%     16117         -
# 3  Short offline       Completed without error       00%     16093         -
# 4  Short offline       Completed without error       00%     16069         -
# 5  Short offline       Completed without error       00%     16045         -
# 6  Short offline       Completed without error       00%     16021         -
# 7  Short offline       Completed without error       00%     15997         -
# 8  Short offline       Completed without error       00%     15973         -
# 9  Short offline       Completed without error       00%     15949         -
#10  Short offline       Completed without error       00%     15925         -
#11  Short offline       Completed without error       00%     15901         -
#12  Short offline       Completed without error       00%     15877         -
#13  Short offline       Completed without error       00%     15853         -
#14  Short offline       Completed without error       00%     15829         -
#15  Short offline       Completed without error       00%     15805         -
#16  Short offline       Completed without error       00%     15781         -
#17  Short offline       Completed without error       00%     15759         -
#18  Short offline       Completed without error       00%     15735         -
#19  Short offline       Completed without error       00%     15711         -
#20  Short offline       Completed without error       00%     15687         -
#21  Short offline       Completed without error       00%     15663         -

These errors just make me worried, because the last time I had a real HDD failure, I saw similar port reset errors, but also actual errors on the drive, such as I/O errors and read failures:

Apr 16 21:44:19 lrd-selleri kernel: res 40/00:00:00:00:e0/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Apr 16 21:44:19 lrd-selleri kernel: ata1: hard resetting port
Apr 16 21:44:19 lrd-selleri kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 16 21:44:19 lrd-selleri kernel: ata1.00: configured for UDMA/133
Apr 16 21:44:19 lrd-selleri kernel: ata1: EH complete
Apr 16 21:44:19 lrd-selleri kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
Apr 16 21:44:19 lrd-selleri kernel: sd 0:0:0:0: [sda] Write Protect is off
Apr 16 21:44:19 lrd-selleri kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 16 21:50:32 lrd-selleri kernel: res 40/00:00:00:00:e0/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Apr 16 21:50:32 lrd-selleri kernel: ata1: hard resetting port
Apr 16 21:50:32 lrd-selleri kernel: ata1: port is slow to respond, please be patient (Status 0x80)
Apr 16 21:50:32 lrd-selleri kernel: ata1: hard resetting port
Apr 16 21:50:32 lrd-selleri kernel: ata1: SATA link down (SStatus 0 SControl 300)
Apr 16 21:50:32 lrd-selleri kernel: ata1: failed to recover some devices, retrying in 5 secs
Apr 16 21:50:32 lrd-selleri kernel: ata1: hard resetting port
Apr 16 21:50:32 lrd-selleri kernel: ata1: SATA link down (SStatus 0 SControl 300)
Apr 16 21:50:33 lrd-selleri kernel: ata1.00: limiting speed to UDMA/133:PIO3
Apr 16 21:50:33 lrd-selleri kernel: ata1: failed to recover some devices, retrying in 5 secs
Apr 16 21:50:33 lrd-selleri kernel: ata1: hard resetting port
Apr 16 21:50:33 lrd-selleri kernel: ata1: SATA link down (SStatus 0 SControl 300)
Apr 16 21:50:33 lrd-selleri kernel: ata1.00: disabled
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] [descriptor]
Apr 16 21:50:33 lrd-selleri kernel: Descriptor sense data with sense descriptors (in hex):
Apr 16 21:50:33 lrd-selleri kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Apr 16 21:50:33 lrd-selleri kernel: 00 00 00 00
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Add. Sense: No additional sense information
Apr 16 21:50:33 lrd-selleri kernel: end_request: I/O error, dev sda, sector 272308480
Apr 16 21:50:33 lrd-selleri kernel: md: super_written gets error=-5, uptodate=0
Apr 16 21:50:33 lrd-selleri kernel: ^IOperation continuing on 1 devices
Apr 16 21:50:33 lrd-selleri kernel: ata1: EH complete
Apr 16 21:50:33 lrd-selleri kernel: ata1.00: detaching (SCSI 0:0:0:0)
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Stopping disk
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] START_STOP FAILED
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 16 21:50:33 lrd-selleri kernel: RAID1 conf printout:
Apr 16 21:50:33 lrd-selleri kernel: --- wd:1 rd:2
Apr 16 21:50:33 lrd-selleri kernel: disk 0, wo:0, o:1, dev:sdb2
Apr 16 21:50:33 lrd-selleri kernel: disk 1, wo:1, o:0, dev:sda2
Apr 16 21:50:33 lrd-selleri kernel: RAID1 conf printout:
Apr 16 21:50:33 lrd-selleri kernel: --- wd:1 rd:2
Apr 16 21:50:33 lrd-selleri kernel: disk 0, wo:0, o:1, dev:sdb2
Apr 16 21:50:33 lrd-selleri kernel: ^IOperation continuing on 1 devices
Apr 16 21:50:33 lrd-selleri kernel: RAID1 conf printout:
Apr 16 21:50:33 lrd-selleri kernel: --- wd:1 rd:2
Apr 16 21:50:33 lrd-selleri kernel: disk 0, wo:0, o:1, dev:sdb1
Apr 16 21:50:33 lrd-selleri kernel: disk 1, wo:1, o:0, dev:sda1
Apr 16 21:50:33 lrd-selleri kernel: RAID1 conf printout:
Apr 16 21:50:33 lrd-selleri kernel: --- wd:1 rd:2
Apr 16 21:50:33 lrd-selleri kernel: disk 0, wo:0, o:1, dev:sdb1
Apr 16 21:50:33 lrd-selleri kernel: to dead device
Apr 16 21:50:33 lrd-selleri kernel: ^IOperation continuing on 1 devices
Apr 16 21:50:34 lrd-selleri kernel: to dead device

-- 
-------------------------
Juhani Karlsson
juhani dot karlsson at iki dot fi
http://lrdlnx.iki.fi
-------------------------