On Fri, Dec 12, 2008 at 02:58:55PM +1100, Alex Samad wrote:
> On Thu, Dec 11, 2008 at 08:06:12PM -0600, lee wrote:
> > One option I haven't tried yet is to plug both SATA disks into the
> > same channel (i.e. use adjacent plugs). I didn't do that because they
> > might be blocking each other --- this isn't SCSI :( It shouldn't make
> > a difference, but then, who knows? Maybe both disks go offline if I do
> > that ...
>
> maybe it's a port (on the motherboard)

Well, I had the same problem with my old motherboard, so I doubt it's the port.

> The other thought that came to mind, maybe a bit far-fetched, is the
> drive going into power-saving mode?

I tested that by putting the drive to sleep with hdparm. It woke up just fine.

> what shows up in dmesg when the drive dies, do you see any SATA resets
> or ???

[...]
Dec 11 02:39:02 cat /USR/SBIN/CRON[19809]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
Dec 11 02:49:29 cat kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 11 02:49:29 cat kernel: ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Dec 11 02:49:29 cat kernel: res 40/00:00:00:4f:c2/00:00:00:c2:00/00 Emask 0x4 (timeout)
Dec 11 02:49:29 cat kernel: ata5.00: status: { DRDY }
Dec 11 02:49:29 cat kernel: ata5: hard resetting link
Dec 11 02:49:30 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)
Dec 11 02:49:35 cat kernel: ata5: hard resetting link
Dec 11 02:49:35 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)
Dec 11 02:49:40 cat kernel: ata5: hard resetting link
Dec 11 02:49:40 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)
Dec 11 02:49:40 cat kernel: ata5.00: disabled
Dec 11 02:49:40 cat kernel: sd 4:0:0:0: rejecting I/O to offline device
Dec 11 02:49:40 cat kernel: sd 4:0:0:0: rejecting I/O to offline device
Dec 11 02:49:40 cat kernel: end_request: I/O error, dev sdb, sector 478543967
Dec 11 02:49:40 cat kernel: md: super_written gets error=-5, uptodate=0
Dec 11 02:49:40 cat kernel: raid1: Disk failure on sdb2, disabling device.
Dec 11 02:49:40 cat kernel: raid1: Operation continuing on 1 devices.
Dec 11 02:49:40 cat kernel: ata5: EH complete
Dec 11 02:49:40 cat kernel: ata5.00: detaching (SCSI 4:0:0:0)
Dec 11 02:49:40 cat kernel: sd 4:0:0:0: [sdb] Synchronizing SCSI cache
Dec 11 02:49:40 cat kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Dec 11 02:49:40 cat kernel: sd 4:0:0:0: [sdb] Stopping disk
Dec 11 02:49:40 cat kernel: sd 4:0:0:0: [sdb] START_STOP FAILED
Dec 11 02:49:40 cat kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Dec 11 02:49:40 cat kernel: RAID1 conf printout:
Dec 11 02:49:40 cat kernel: --- wd:1 rd:2
Dec 11 02:49:40 cat kernel: disk 0, wo:0, o:1, dev:sda2
Dec 11 02:49:40 cat kernel: disk 1, wo:1, o:0, dev:sdb2
Dec 11 02:49:40 cat kernel: RAID1 conf printout:
Dec 11 02:49:40 cat kernel: --- wd:1 rd:2
Dec 11 02:49:40 cat kernel: disk 0, wo:0, o:1, dev:sda2
Dec 11 02:49:40 cat mdadm[1891]: Fail event detected on md device /dev/md1, component device /dev/sdb2
Dec 11 03:06:38 cat kernel: scsi 4:0:0:0: rejecting I/O to dead device
Dec 11 03:06:38 cat kernel: scsi 4:0:0:0: rejecting I/O to dead device
Dec 11 03:06:38 cat kernel: end_request: I/O error, dev sdb, sector 146496512
Dec 11 03:06:38 cat kernel: md: super_written gets error=-5, uptodate=0
Dec 11 03:06:38 cat kernel: raid1: Disk failure on sdb1, disabling device.
Dec 11 03:06:38 cat kernel: raid1: Operation continuing on 1 devices.
Dec 11 03:06:38 cat kernel: RAID1 conf printout:
Dec 11 03:06:38 cat kernel: --- wd:1 rd:2
Dec 11 03:06:38 cat kernel: disk 0, wo:0, o:1, dev:sda1
Dec 11 03:06:38 cat kernel: disk 1, wo:1, o:0, dev:sdb1
Dec 11 03:06:38 cat kernel: RAID1 conf printout:
Dec 11 03:06:38 cat kernel: --- wd:1 rd:2
Dec 11 03:06:38 cat kernel: disk 0, wo:0, o:1, dev:sda1
Dec 11 03:06:38 cat mdadm[1891]: Fail event detected on md device /dev/md0, component device /dev/sdb1
Dec 11 03:06:40 cat smartd[1797]: Device: /dev/sdb, No such device, open() failed
Dec 11 03:06:40 cat smartd[1797]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Dec 11 03:06:41 cat smartd[1797]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
[...]
Dec 11 03:36:40 cat smartd[1797]: Device: /dev/hdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 113 to 112
Dec 11 03:36:40 cat smartd[1797]: Device: /dev/sda, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 72 to 71
Dec 11 03:36:40 cat smartd[1797]: Device: /dev/sdb, No such device, open() failed
[...]
Dec 11 04:06:40 cat smartd[1797]: Device: /dev/sdb, No such device, open() failed
[...]
Dec 11 04:36:40 cat smartd[1797]: Device: /dev/hdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 112 to 113
Dec 11 04:36:41 cat smartd[1797]: Device: /dev/sdb, No such device, open() failed
[...]
Dec 11 05:06:40 cat smartd[1797]: Device: /dev/hdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 113 to 112
Dec 11 05:06:40 cat smartd[1797]: Device: /dev/sdb, No such device, open() failed
[...]
Dec 11 05:36:40 cat smartd[1797]: Device: /dev/hdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 112 to 113
Dec 11 05:36:40 cat smartd[1797]: Device: /dev/sda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 247 to 246
Dec 11 05:36:40 cat smartd[1797]: Device: /dev/sdb, No such device, open() failed
[...]
Dec 11 06:06:40 cat smartd[1797]: Device: /dev/hdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 113 to 112
Dec 11 06:06:41 cat smartd[1797]: Device: /dev/sdb, No such device, open() failed
[...]
Dec 11 06:25:12 cat kernel: raid1: sdb3: rescheduling sector 12320
Dec 11 06:25:12 cat kernel: raid1: Disk failure on sdb3, disabling device.
Dec 11 06:25:12 cat kernel: raid1: Operation continuing on 1 devices.
Dec 11 06:25:12 cat kernel: raid1: sda3: redirecting sector 12320 to another mirror
Dec 11 06:25:12 cat kernel: RAID1 conf printout:
Dec 11 06:25:12 cat kernel: --- wd:1 rd:2
Dec 11 06:25:12 cat kernel: disk 0, wo:0, o:1, dev:sda3
Dec 11 06:25:12 cat kernel: disk 1, wo:1, o:0, dev:sdb3
Dec 11 06:25:12 cat kernel: RAID1 conf printout:
Dec 11 06:25:12 cat kernel: --- wd:1 rd:2
Dec 11 06:25:12 cat kernel: disk 0, wo:0, o:1, dev:sda3
Dec 11 06:25:12 cat mdadm[1891]: Fail event detected on md device /dev/md2, component device /dev/sdb3
Dec 11 06:25:13 cat syslogd 1.5.0#5: restart.
Dec 11 06:26:12 cat mdadm[1891]: SpareActive event detected on md device /dev/md2, component device /dev/sdb3

There are no spares in the RAID, even though mdadm reports a SpareActive event.

As you can see, it comes out of nowhere: there are 10 minutes between the last log entry and the exception.

Hm, I could swap the drives to see whether the problem is with a particular disk or with whichever disk is second. If it always hits the second disk, it must be a software problem. If it follows one particular disk, that disk is probably defective from the factory.

-- 
"Don't let them, daddy. Don't let the stars run down."
http://adin.dyndns.org/adin/TheLastQ.htm
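A minimal Python sketch for pulling the same information out of a syslog file: it finds the first libata exception, reports how many seconds after the previous log entry it occurred (the "comes out of nowhere" gap), and lists the partitions that raid1 dropped, in order. The regular expressions only cover the message formats visible in the log above. The default year of 2008 (syslog timestamps omit the year) and the script name scan_log.py are assumptions for illustration.

```python
import re
import sys
from datetime import datetime

# Classic syslog timestamp, e.g. "Dec 11 02:49:29". Syslog omits the year,
# so it is passed in separately (default 2008 is an assumption).
TS_RE = re.compile(r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) ")
# libata exception line, e.g. "kernel: ata5.00: exception Emask ..."
EXC_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: exception")
# md/raid1 dropping a member, e.g. "raid1: Disk failure on sdb2, ..."
MD_FAIL_RE = re.compile(r"raid1: Disk failure on (\w+)")


def parse_ts(line, year=2008):
    """Return the line's syslog timestamp as a datetime, or None if absent."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def summarize(lines, year=2008):
    """Return (first_exception, md_failures) for an iterable of syslog lines.

    first_exception is (ata_port, seconds_since_previous_entry) for the first
    libata exception, or None if there is none; the gap is None when the
    exception is the first timestamped line.
    md_failures is a list of (timestamp, partition) in log order.
    """
    prev_ts = None
    first_exc = None
    failures = []
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue  # skip "[...]" markers and other non-syslog lines
        m = EXC_RE.search(line)
        if m and first_exc is None:
            gap = (ts - prev_ts).total_seconds() if prev_ts is not None else None
            first_exc = (m.group(1), gap)
        m = MD_FAIL_RE.search(line)
        if m:
            failures.append((ts, m.group(1)))
        prev_ts = ts
    return first_exc, failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 scan_log.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as f:
        exc, failures = summarize(f)
    if exc:
        print(f"first exception on {exc[0]}, {exc[1]} s after the previous entry")
    for ts, part in failures:
        print(f"{ts:%b %d %H:%M:%S}  md member {part} failed")
```

Run against the excerpt above, the gap between the 02:39:02 cron entry and the 02:49:29 exception works out to 627 seconds, i.e. the roughly 10 minutes mentioned. Swapping the drives and re-running it would show whether the failures keep landing on the same sdX names.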