Hi!
I created a software RAID (md0) with LVM on top of it, using four 250GB
Samsung SATA disks, a couple of months ago. I am not a RAID expert, but I
thought I could handle it with a little help from my friends at grml:
Andreas "jimmy" Gredler and Michael "mika" Prokop.
,----
| md0 <future mds> (PV:s on partitions or whole disks)
| \ /
| \ /
| datavg (VG)
| |
| |
| datalv (LV)
| |
| ext3 (filesystem)
`----
HW: Promise FastTrak SATA controller on a P3 board. (A previously
used, and preferred, Dawicontrol DC-150 did not work at all: I could
not access the hdds.)
Approximately once a month, there was a short timeout that caused a
disk to be removed from the RAID. Until now, a SMART check and a resync
(hot-add) solved the problem each time.
,----[ syslog ]
| May 1 23:12:51 ned kernel: ata2: command timeout
| May 1 23:12:51 ned kernel: ata2: translated ATA stat/err 0x25/00\
|   to SCSI SK/ASC/ASCQ 0x4/00/00
| May 1 23:12:51 ned kernel: ata2: status=0x25 { DeviceFault\
|   CorrectedError Error }
| May 1 23:12:51 ned kernel: SCSI error : <1 0 0 0> return code =\
|   0x8000002
| May 1 23:12:51 ned kernel: sdb: Current: sense key: Hardware Error
| May 1 23:12:51 ned kernel: Additional sense: No additional sense\
|   information
| May 1 23:12:51 ned kernel: end_request: I/O error, dev sdb, sector\
|   179281983
| May 1 23:12:51 ned kernel: raid5: Disk failure on sdb1, disabling\
|   device. Operation continuing on 3 devices
| May 1 23:12:51 ned kernel: RAID5 conf printout:
| May 1 23:12:51 ned kernel: --- rd:4 wd:3 fd:1
| May 1 23:12:51 ned kernel: disk 0, o:1, dev:sda1
| May 1 23:12:51 ned kernel: disk 1, o:0, dev:sdb1
| May 1 23:12:51 ned kernel: disk 2, o:1, dev:sdc1
| May 1 23:12:51 ned kernel: disk 3, o:1, dev:sdd1
| May 1 23:12:51 ned kernel: RAID5 conf printout:
| May 1 23:12:51 ned kernel: --- rd:4 wd:3 fd:1
| May 1 23:12:51 ned kernel: disk 0, o:1, dev:sda1
| May 1 23:12:51 ned kernel: disk 2, o:1, dev:sdc1
| May 1 23:12:51 ned kernel: disk 3, o:1, dev:sdd1
`----
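To check how often a member was kicked out, the kernel messages above can be filtered for the "Disk failure" lines. This is only a minimal sketch: the function name failed_members is made up for illustration, and it assumes the message format shown in my syslog excerpt.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not part of mdadm):
# list md member devices that the kernel reported as failed.
# Reads syslog-style text on stdin and matches lines such as
#   "kernel: raid5: Disk failure on sdb1, disabling device."
failed_members() {
    # Print only the device name captured after "Disk failure on",
    # then collapse repeated reports of the same device.
    sed -n 's/.*raid[0-9]*: Disk failure on \([a-z0-9]*\),.*/\1/p' | sort -u
}
```

It could be fed with something like `failed_members < /var/log/syslog`; the log path is an assumption and depends on the syslog configuration.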
But two weeks ago, there was another timeout during such a resync, and
that was the beginning of my problem.
Short summary (for the impatient)
=============
sda and sdb were removed from the array, hot-adding them did not work,
and I mistakenly thought that removing and re-adding the drives could
solve my problem. Bad idea.
Now I am not able to get the RAID working: all drives are marked as
spares and the array cannot be assembled:
[EMAIL PROTECTED] ~ # mdadm --examine /dev/sd[abcd]1
/dev/sda1:
Magic : a92b4efc
Version : 00.90.02
UUID : 15f07005:037e4abf:70f51389:83dde0ed
Creation Time : Sun Jan 29 21:35:05 2006
Raid Level : raid5
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Sun Jul 2 17:23:03 2006
State : clean
Active Devices : 0
Working Devices : 4
Failed Devices : 0
Spare Devices : 4
Checksum : 4eb2dfe6 - correct
Events : 0.1652541
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 1 4 spare /dev/sda1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 0 0 3 faulty removed
4 4 8 1 4 spare /dev/sda1
5 5 8 33 5 spare /dev/sdc1
6 6 8 17 6 spare /dev/sdb1
7 7 8 49 7 spare /dev/sdd1
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.02
UUID : 15f07005:037e4abf:70f51389:83dde0ed
Creation Time : Sun Jan 29 21:35:05 2006
Raid Level : raid5
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Sun Jul 2 17:23:03 2006
State : clean
Active Devices : 0
Working Devices : 4
Failed Devices : 0
Spare Devices : 4
Checksum : 4eb2dffa - correct
Events : 0.1652541
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 6 8 17 6 spare /dev/sdb1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 0 0 3 faulty removed
4 4 8 1 4 spare /dev/sda1
5 5 8 33 5 spare /dev/sdc1
6 6 8 17 6 spare /dev/sdb1
7 7 8 49 7 spare /dev/sdd1
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.02
UUID : 15f07005:037e4abf:70f51389:83dde0ed
Creation Time : Sun Jan 29 21:35:05 2006
Raid Level : raid5
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Sun Jul 2 17:23:03 2006
State : clean
Active Devices : 0
Working Devices : 4
Failed Devices : 0
Spare Devices : 4
Checksum : 4eb2e008 - correct
Events : 0.1652541
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 5 8 33 5 spare /dev/sdc1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 0 0 3 faulty removed
4 4 8 1 4 spare /dev/sda1
5 5 8 33 5 spare /dev/sdc1
6 6 8 17 6 spare /dev/sdb1
7 7 8 49 7 spare /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.02
UUID : 15f07005:037e4abf:70f51389:83dde0ed
Creation Time : Sun Jan 29 21:35:05 2006
Raid Level : raid5
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Sun Jul 2 17:23:03 2006
State : clean
Active Devices : 0
Working Devices : 4
Failed Devices : 0
Spare Devices : 4
Checksum : 4eb2e01c - correct
Events : 0.1652541
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 7 8 49 7 spare /dev/sdd1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 0 0 3 faulty removed
4 4 8 1 4 spare /dev/sda1
5 5 8 33 5 spare /dev/sdc1
6 6 8 17 6 spare /dev/sdb1
7 7 8 49 7 spare /dev/sdd1
[EMAIL PROTECTED] ~ #
[EMAIL PROTECTED] ~ # date;cat /proc/mdstat
Di Jul 4 21:36:15 CEST 2006
Personalities : [linear] [raid0] [raid1] [raid10] [raid5] [raid4]\
[raid6] [multipath]
unused devices: <none>
[EMAIL PROTECTED] ~ # mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
1 [EMAIL PROTECTED] ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1\
/dev/sdc1 /dev/sdd1
mdadm: /dev/md0 assembled from 0 drives and 4 spares - not enough to\
start the array.
1 [EMAIL PROTECTED] ~ # mdadm --stop /dev/md0
[EMAIL PROTECTED] ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1\
/dev/sdc1 /dev/sdd1 --force
mdadm: /dev/md0 assembled from 0 drives and 4 spares - not enough to\
start the array.
1 [EMAIL PROTECTED] ~ # mdadm --zero-superblock /dev/sda
mdadm: Couldn't open /dev/sda for write - not zeroing
1 [EMAIL PROTECTED] ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1\
/dev/sdc1 /dev/sdd1 --run
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
1 [EMAIL PROTECTED] ~ #
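For comparing the four superblocks at a glance, the long --examine output above can be condensed to one line per member. This is a rough sketch only: summarize_examine is a hypothetical name, and the awk field positions assume the 0.90 superblock output format shown in this mail.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not part of mdadm):
# condense `mdadm --examine` text read from stdin into one line per
# member showing the device, its event counter, and the first word of
# the state recorded on its "this" line.
summarize_examine() {
    awk '
        # Header line such as "/dev/sda1:" starts a new member section.
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }
        # "Events : 0.1652541" -> third field is the event counter.
        $1 == "Events" { events[dev] = $3 }
        # "this 4 8 1 4 spare /dev/sda1" -> sixth field is the state.
        $1 == "this" { role[dev] = $6; order[++n] = dev }
        END {
            for (i = 1; i <= n; i++)
                printf "%s events=%s role=%s\n", order[i], events[order[i]], role[order[i]]
        }
    '
}
```

Usage would be along the lines of `mdadm --examine /dev/sd[abcd]1 | summarize_examine`. Identical event counters combined with role=spare on every member is the situation I am stuck in.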
Andreas Gredler suggested the following commands as a last resort, but
they carry the risk of losing data, which I want to avoid:
mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sda
mdadm --zero-superblock /dev/sdb
mdadm --zero-superblock /dev/sdc
mdadm --zero-superblock /dev/sdd
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1\
/dev/sdd1 --force
mdadm --create -n 4 -l 5 /dev/md0 missing /dev/sdb1\
/dev/sdc1 /dev/sdd1
Is there another solution to get to my data?
Thank you!
Background history (the whole story - director's cut)
==================
I published the whole story (as much as I could log during my reboots
and so on) on the web:
http://paste.debian.net/8779
It is available for 72 hours from now. If you want to read it
after that, please write me an email and I will send you the log.
Please feel free to visit this page, and do not hesitate to tell me
what else I should check!
mdadm-version: 1.12.0-1
uname: Linux ned 2.6.13-grml #1 Tue Oct 4 18:24:46 CEST 2005\
i686 GNU/Linux