Hi,

We encountered a multi-disk failure on one of our mdadm RAID6 8+2 OSTs. Two 
drives in the array failed within a couple of hours of each other and were 
replaced. It is questionable whether both drives are actually bad, because we 
are seeing the same behavior in a test environment, where a bad drive is 
causing a good drive to be kicked out of an array.

Unfortunately, another of the drives encountered IO errors during the resync 
process and failed, causing the array to go out to lunch. The resync was 
attempted twice with the same result. Fortunately, I am able (at least for 
now) to assemble the array with the remaining 8 of 10 drives. I can fsck it, 
mount it via both ldiskfs and Lustre, and am in the process of copying files 
from the vulnerable OST to a backup location using "lfs find --obd <target> 
/scratch|cpio -puvdm ..."
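
To make the "8 of 10" state above concrete, here is a minimal sketch of how 
the number of missing members could be read from the "[total/active]" field 
that the md driver prints in /proc/mdstat. The function name missing_members 
and the sample status line are illustrative assumptions, not output from our 
system.

```shell
#!/bin/sh
# Sketch: report how many member devices an md array is missing, based on
# the "[total/active]" field shown in /proc/mdstat (e.g. "[10/8]").
# missing_members is a hypothetical helper name used only for illustration.
missing_members() {
    # $1: one line of mdstat text containing a "[total/active]" field
    counts=$(printf '%s\n' "$1" |
        sed -n 's/.*\[\([0-9][0-9]*\)\/\([0-9][0-9]*\)\].*/\1 \2/p')
    # Fail if the line has no "[total/active]" field
    [ -n "$counts" ] || return 1
    # Split "total active" into $1 and $2, then print the difference
    set -- $counts
    echo $(( $1 - $2 ))
}

# Made-up sample status line for a 10-device RAID6 with 8 active members
sample='      7813566464 blocks level 6, 128k chunk, algorithm 2 [10/8] [UUUUUUUU__]'
missing_members "$sample"
```

A RAID6 array tolerates at most two missing members, so a result of 2 means 
the array has no redundancy left and any further drive error is fatal.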

My question is: What is the best way to restore the OST? Obviously I will need 
to restore the array to its full 8+2 configuration somehow, whether by 
starting from scratch or by some other means; that is our first priority. But 
I would like to make the recovery as transparent to the users as possible. 

One option that we are considering is simply removing the OST from Lustre, 
fixing the array and copying the recovered files to a newly created OST (not 
desirable). Another is to fix the OST (without removing it from Lustre), 
delete the files that exist and then copy the recovered files back. The 
problem that comes to mind in either scenario is what happens if a file is 
striped across multiple OSTs. Does the part on this OST lose its affinity 
with the rest of the stripe?

Another scenario that we are wondering about: if we mount the OST via ldiskfs, 
copy everything on the file system to a backup location, fix the array while 
maintaining the same tunefs.lustre configuration, and then move everything 
back using the same method by which it was backed up, will the files be 
presented to Lustre (MDS and clients) just as they were before, when the OST 
was mounted as a Lustre file system? 

Thanks in advance for your advice and help.

Joe Mervini
Sandia National Laboratories
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
