On Thu, 2009-10-22 at 16:00 +1300, Roger Searle wrote:
> Hi, I noticed by chance that I have a failed drive in a raid1 array on a 
> file server that I need to replace, and I'm seeking some guidance or 
> confirmation that I'm on the right track to resolve this.  Since more 
> than one partition has failed, it seems I'll need to buy a new disk 
> rather than attempt any repair; I may as well get a new pair, but 
> replace the failed disk first, then replace the other once that's 
> resolved.  Yes, I have backups of the valuable data on other drives both in 
> the same machine (not in this array) and elsewhere.  And I then need to 
> set up better monitoring because the failure began a few weeks ago.  But 
> for now...
There are so many levels of electronics that you go through to get to
the platter these days that if you see even a single hard error, then
now's a good time to use it only for skeet shooting...
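
If you want confirmation before writing it off, smartmontools will show
you the drive's own error counters - just a sketch, assuming the suspect
disk is /dev/sdb:

  smartctl -H /dev/sdb    # overall health verdict
  smartctl -a /dev/sdb    # full attribute dump - watch the reallocated
                          # and current pending sector counts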
> 
> The failed disk is 320GB, and contains (mirrored) /, home, and swap.  
> Presumably I could buy much larger disks, and would need to repartition 
> the new one prior to adding it into the array? 
Best to use the same make/model of disk if possible. Speed differences
between the two can make the array unreliable (that's an exaggeration,
but you know what I mean).
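
On the repartitioning question: yes, the new disk needs to be
partitioned before its partitions can join the arrays, and the partition
type should be fd (Linux raid autodetect). A sketch for copying the
layout from the good disk, assuming MBR partition tables, the good disk
is sda and the new one is sdb:

  sfdisk -d /dev/sda > sda-table.txt   # dump the good disk's partition table
  sfdisk /dev/sdb < sda-table.txt      # write the same layout to the new disk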
> 
> The partitions should be at least the same size but could be much larger 
> without any problem?
If you want to add more space, then I'd buy a new pair of bigger disks,
create a new set, and copy everything across. The reason I'm saying that
is that your two existing disks are probably exactly the same make and
model, with similar serial numbers... guess what's going to fail next (:

Last time I looked, 1TB was around the $125 mark.
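
If you go that route, the rough shape of it - just a sketch, assuming
the new pair shows up as sdc and sdd and you've already partitioned
them:

  # build the new mirror from the two new partitions
  mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
  mkfs.ext3 /dev/md4           # or your filesystem of choice
  mkdir -p /mnt/new
  mount /dev/md4 /mnt/new
  rsync -aHx / /mnt/new/       # copy the old root across, staying on the
                               # one filesystem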
> 
> There is some configuration data in mdadm.conf including UUIDs of the 
> arrays, and this doesn't match the UUIDs in fstab.  Do I need to be 
> concerned about this sort of thing, or can I just use mdadm or other 
> tools to rebuild the arrays and have them update any relevant config files? 
mdadm.conf is pretty much redundant these days, I think - arrays tend to
be assembled automagically at boot time. Building a new raid array
*should* add the correct data to it, although I have a grand old time
with a Hardy server of mine in this respect.
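
To be clear, the UUIDs in mdadm.conf are the md arrays' own UUIDs and
the ones in fstab are filesystem UUIDs, so they'll never match - that's
normal. If you want mdadm.conf to reflect reality after the rebuild,
something like this should do it (the path may be /etc/mdadm.conf on
some distros, and check for duplicate ARRAY lines afterwards):

  mdadm --detail --scan                    # prints one ARRAY line per md device
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf
  blkid                                    # shows the filesystem UUIDs fstab uses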
> 
> Is there anything else I should be looking out for or preparing? 
Don't forget to install a boot loader on each new disk if it's going to
contain the boot partition as well.
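
With grub that's just (assuming the new disk is sdb):

  grub-install /dev/sdb    # put the boot loader in the new disk's MBR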
> 
> Thanks for any pointers anyone may care to share.
You could try 

mdadm --add /dev/md3 /dev/sdb4

and see whether it resilvers. dmesg is the best place to look for hard
errors.
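
If a partition is still attached to an array but marked (F) - as sdb2
is on md1 in your mdstat below - you may need to remove it before you
can re-add it. A sketch, assuming the replacement re-uses the same
device names:

  mdadm /dev/md1 --fail /dev/sdb2      # already failed here, but harmless
  mdadm /dev/md1 --remove /dev/sdb2    # drop the dead member from the array
  mdadm --add /dev/md1 /dev/sdb2       # add the replacement partition
  watch cat /proc/mdstat               # watch the rebuild progress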

hth,

Steve
> 
> A couple of examples of DegradedArray and Fail Event emails to root 
> recently follow:
> 
> To: [email protected]
> Subject: Fail event on /dev/md1:jupiter
> Date: Wed, 21 Oct 2009 17:42:49 +1300
> 
> This is an automatically generated mail message from mdadm
> running on jupiter
> 
> A Fail event had been detected on md device /dev/md1.
> 
> It could be related to component device /dev/sdb2.
> 
> Faithfully yours, etc.
> 
> P.S. The /proc/mdstat file currently contains the following:
> 
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] 
> [raid4] [raid10]
> md3 : active raid1 sda4[0]
>       290977216 blocks [2/1] [U_]
>      
> md2 : active raid1 sda3[0] sdb3[1]
>       104320 blocks [2/2] [UU]
>      
> md1 : active raid1 sda2[0] sdb2[2](F)
>       1951808 blocks [2/1] [U_]
>      
> md0 : active raid1 sda1[0]
>       19534912 blocks [2/1] [U_]
>      
> unused devices: <none>
> 
> 
> 
> Subject: DegradedArray event on /dev/md3:jupiter
> Date: Wed, 07 Oct 2009 08:26:49 +1300
> 
> This is an automatically generated mail message from mdadm
> running on jupiter
> 
> A DegradedArray event had been detected on md device /dev/md3.
> 
> Faithfully yours, etc.
> 
> P.S. The /proc/mdstat file currently contains the following:
> 
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] 
> [raid4] [raid10]
> md3 : active raid1 sda4[0]
>       290977216 blocks [2/1] [U_]
>      
> md2 : active raid1 sda3[0] sdb3[1]
>       104320 blocks [2/2] [UU]
>      
> md1 : active raid1 sda2[0] sdb2[1]
>       1951808 blocks [2/2] [UU]
>      
> md0 : active raid1 sda1[0]
>       19534912 blocks [2/1] [U_]
>      
> unused devices: <none>
> 
> Cheers,
> Roger
> 
-- 
Steve Holdoway <[email protected]>
http://www.greengecko.co.nz
MSN: [email protected]
GPG Fingerprint = B337 828D 03E1 4F11 CB90  853C C8AB AF04 EF68 52E0
