Joe Fleming wrote:
Hey all, I have a Debian box that was acting as a 4 drive RAID-5 mdadm
softraid server. I heard one of the drives making strange noises but
mdstat reported no problems with any of the drives. I decided to copy
the data off the array so I had a backup before I tried to figure out
which drive it was. Unfortunately, in the middle of copying said data,
2 of the drives dropped out at the same time. Since RAID-5 only
tolerates a single drive failure, the whole array is basically hosed
now. I've had drives drop out on me before, but never 2 at once. Sigh.
I tried to Google a little about dealing with multi-drive failures
with mdadm, but I couldn't find much in my initial looking. I'm going
to keep digging, but I thought I'd post a question to the group and
see what happens. So, is there a way to tell mdadm to "unmark" one of
the 2 drives as failed and try to bring up the array again WITHOUT
rebuilding it? I really don't think both of the drives failed on me
simultaneously and I'd like to try to return 1 of the 2 to the array
and test my theory. If I can get the array back up, I can either keep
trying to copy data off it or add a new replacement and try to
rebuild. I'm pretty much a novice with mdadm, though I don't see an option
that will let me do what I want. Can anyone offer me some advice or
point me in the right direction..... or am I just SOL?
As a side note, why can't hard drive manufacturers make drives that
last anymore? I've had like 5 drives fail on me in the last year...
WD, Seagate, Hitachi, they all suck equally! I can't find any that
last for any reasonable amount of time, and all the warranties leave
you with reman'd drives, which fail even more rapidly; some even show
up DOA. Plus, I'm not sending my unencrypted data off to some random
place! Sorry for venting, just a little ticked off at all of this.
Thanks in advance for any help.
-Joe
I've had luck in the past recovering from a multi-drive failure where
the second "failed" drive was not truly dead, but rather was dropped
because of an I/O error caused by a thermal recalibration or something
similar. The trick is to re-assemble the array with the dropped drive,
using the option that forces it NOT to try to rebuild. This used to
require several options like --really-force and --really-dangerous,
but now I think it's just something like --assemble --force /dev/md0.
This brings the array back up in its degraded (still down 1 disk)
state. If possible, replace the failed disk or copy your data off
before the other flaky drive fails.
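
For what it's worth, here's roughly the sequence I'd try. This is just
a sketch: /dev/md0 comes from your mail, but the /dev/sd[abcd]1 member
names and the /mnt/recovery mount point are placeholders for whatever
your drives and paths actually are.

  # Compare the event counters on the members; the drive that dropped
  # out first will have a lower count and is the one NOT to trust.
  mdadm --examine /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 | grep -i events

  # Stop whatever half-assembled state the array is in, then force
  # assembly from only the members you trust, leaving out the drive
  # you think is genuinely bad (sdd1 here, purely as an example).
  mdadm --stop /dev/md0
  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1

  # Confirm it came up degraded, then mount read-only and copy the
  # data off before touching anything else.
  cat /proc/mdstat
  mount -o ro /dev/md0 /mnt/recovery

Only after the data is safe would I --add a replacement drive and let
the array resync.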
---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss