Hi everyone, thanks for the great replies. I have a couple of disks arriving soon, and I'm glad to be able to plan when to do the fix rather than having to drop everything for what could be a long day.

I've added the drives back in and they are spending a couple of hours resyncing. That's more for interest than anything else, though; they will both be replaced, one at a time.

My main nervousness was around reinstalling grub, though it seems this is actually a fairly straightforward process, judging by pages such as the following, which summarise it nicely:

http://flog.cruzn.net.au/articles/restore-grub.shtml

and

http://ubuntuforums.org/showthread.php?t=224351

though please feel free to comment on their usefulness.
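
For what it's worth, the gist of those pages seems to boil down to something like the sketch below. The device and partition names are examples only (and it assumes grub legacy with /boot on the first partition), so I'll check them against my own layout first:

    # reinstall grub to the MBR of each disk in the mirror, so either can boot
    grub-install /dev/sda
    grub-install /dev/sdb

    # or, from the grub shell, point the second disk at its own copy of /boot
    grub
    grub> device (hd0) /dev/sdb
    grub> root (hd0,0)
    grub> setup (hd0)
    grub> quit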
Roger



Craig Falconer wrote:
I agree with Steve - your best bet is to treat this as a warning and expect to replace both drives in the near future.

<!> Why is md2 still a valid raidset? That's odd.


The RAID has succeeded in allowing you to plan this change, rather than losing access to your files.

Since it's a software raid, I suggest you plan on buying two new, larger drives.

First try Steve's idea and just re-add the drive to the array.
It shouldn't hurt anything.
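In practice that's roughly the following (partition names taken from your mdstat output below, so double-check them):

    # drop the faulty member, then add it back and let md rebuild
    mdadm /dev/md1 --remove /dev/sdb2
    mdadm /dev/md1 --add /dev/sdb2
    # md0 and md3 have lost sdb1/sdb4 entirely, so those just need the --add
    mdadm /dev/md0 --add /dev/sdb1
    mdadm /dev/md3 --add /dev/sdb4
    # then watch the resync
    cat /proc/mdstat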

There are two ways to progress. The first:

0    Boot in single user mode.
1    Add one new drive to the machine and partition it with similar but larger partitions as appropriate (see the sketch after this list).
2    Then use
        mdadm --add /dev/md3 /dev/sdb4
        mdadm --add /dev/md2 /dev/sdb3
        mdadm --add /dev/md1 /dev/sdb2
        mdadm --add /dev/md0 /dev/sdb1
        sysctl -w dev.raid.speed_limit_max=99999999
3    While this is happening run
        watch -n 10 cat /proc/mdstat
     and wait until all the drives are synched.
4    If you boot off this raidset you'll need to reinstall a boot loader on each drive.
5    Shut the machine down and remove the last 320 GB drive.
6    Install the other new drive, then boot.
7    Partition the other new drive the same as the first big drive.
8    Repeat steps 2 and 3, but use sda rather than sdb. Once they're finished synching you can grow your filesystems to their full available space (again, see the sketch below).
9    Do the boot loader install onto both drives again.
10   Then you can reboot and it should all be good.
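
A rough sketch of the partitioning and growing steps, with assumed device names and an assumed ext3 filesystem, so adjust to your own setup:

    # dump the old disk's partition table as a starting point, edit the
    # dump so the data partition takes the extra space, then apply it
    sfdisk -d /dev/sda > table.txt
    sfdisk /dev/sdb < table.txt

    # once both new disks are in place and synced, grow each array to the
    # full size of its partitions, then grow the filesystem on top of it
    mdadm --grow /dev/md3 --size=max
    resize2fs /dev/md3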

The other way of doing it is to:
1    Add both new drives
2    Create a new single md device
3    Create a PV and add it to a VG, then create individual LVs as large as you want. Leave some spare space so you can grow individual LVs later.
4    Use something like rsync to copy the files from the old md to the new LVs (rough sketch below).
5    Enjoy.
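
Roughly, for the LVM route. The array number, VG/LV names, sizes and mount point here are all made up for illustration:

    # one new RAID1 across the two new disks, with LVM on top
    mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
    pvcreate /dev/md4
    vgcreate vg0 /dev/md4
    lvcreate -L 20G -n root vg0
    lvcreate -L 200G -n home vg0    # leave free space in the VG for growing later
    mkfs.ext3 /dev/vg0/home
    mkdir -p /mnt/newhome
    mount /dev/vg0/home /mnt/newhome
    rsync -aHx /home/ /mnt/newhome/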


The old 320 that still works can be relegated to a Windows box or something else that's unimportant. It could work for another 10 years or it could fail tomorrow... who would know.

steve wrote, On 22/10/09 16:26:
On Thu, 2009-10-22 at 16:00 +1300, Roger Searle wrote:
Hi, I noticed by chance that I have a failed drive in a raid1 array on a file server, and I'm seeking some guidance or confirmation that I'm on the right track to resolve this. Since more than one partition has failed, it seems I'll need to buy a new disk rather than attempt any repair; I may as well get a new pair, but replace the failed disk first and then replace the other once that's sorted. Yes, I have backups of the valuable data on other drives, both in the same machine (not in this array) and elsewhere. I also need to set up better monitoring, because the failure began a few weeks ago. But for now...
There are so many levels of electronics that you go through to get to
the platter these days that if you see even a single hard error, then
now's a good time to use it only for skeet shooting...
The failed disk is 320GB and contains (mirrored) /, home, and swap. Presumably I could buy much larger disks, and would need to partition the replacement before adding it back into the array?
Best to use the same make/model of disk if possible. Speed differences
between the two can make it unreliable ( that's an exaggeration, but you
know what I mean ).
The partitions should be at least the same size but could be much larger without any problem?
If you want to add more space, then I'd buy a new pair of bigger disks,
create a new set, and copy everything across. Reason I'm saying that is
that your 2 existing disks are probably exactly the same make/model,
with similar serial numbers??? Guess what's going to fail next (:

Last time I looked, 1TB was around the $125 mark.
There is some configuration data in mdadm.conf, including UUIDs of the arrays, and this doesn't match the UUIDs in fstab. Do I need to be concerned about this sort of thing, or can I just use mdadm or other tools to rebuild the arrays and have that update any relevant config files?
mdadm.conf is pretty redundant, I think; arrays tend to be automagically
configured at boot time these days. Building a new raid array *should*
add the correct data to it. Although I have a grand old time with a
hardy server of mine in this respect.
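
If you do want mdadm.conf to match what's actually running, something like this usually does it (the path below is the Debian/Ubuntu one; other distros use /etc/mdadm.conf):

    # append ARRAY lines for whatever the kernel currently has assembled
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf
    # and blkid shows the filesystem UUIDs that fstab refers to
    blkid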
Is there anything else I should be looking out for or preparing?
Don't forget to add a bootstrap to each new disk if this is going to
contain the boot partition as well.
Thanks for any pointers anyone may care to share.
You could try
mdadm --add /dev/md3 /dev/sdb4

and see whether it resilvers. Looking in dmesg for hard errors is the
best place.
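
For example (smartctl assumes the smartmontools package is installed):

    # any hard read/write errors against the suspect disk show up here
    dmesg | grep -i sdb
    # and the drive's own view of reallocated / pending sectors
    smartctl -a /dev/sdb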

hth,

Steve
A couple of recent examples of the DegradedArray and Fail event emails sent to root follow:

To: [email protected]
Subject: Fail event on /dev/md1:jupiter
Date: Wed, 21 Oct 2009 17:42:49 +1300

This is an automatically generated mail message from mdadm
running on jupiter

A Fail event had been detected on md device /dev/md1.

It could be related to component device /dev/sdb2.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md3 : active raid1 sda4[0]
      290977216 blocks [2/1] [U_]

md2 : active raid1 sda3[0] sdb3[1]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[2](F)
      1951808 blocks [2/1] [U_]

md0 : active raid1 sda1[0]
      19534912 blocks [2/1] [U_]

unused devices: <none>



Subject: DegradedArray event on /dev/md3:jupiter
Date: Wed, 07 Oct 2009 08:26:49 +1300

This is an automatically generated mail message from mdadm
running on jupiter

A DegradedArray event had been detected on md device /dev/md3.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md3 : active raid1 sda4[0]
      290977216 blocks [2/1] [U_]

md2 : active raid1 sda3[0] sdb3[1]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      1951808 blocks [2/2] [UU]

md0 : active raid1 sda1[0]
      19534912 blocks [2/1] [U_]

unused devices: <none>

Cheers,
Roger


