Hi,
> A OST (raid 6: 8+2, spare 1) had 2 disk failures almost at the same time.
> While recovering it, another disk failed. so recovering procedure seems to be
> halt,

So did the md array stop itself on the 3rd disk failure (or at least turn
read-only)? If it did, you might be able to get it running again without
catastrophic corruption.

This is what I would try (without any warranty!):

-> Forget about the 2 syncing spares.
-> Take the 3rd failed disk and attach it to some PC.
-> Copy as much data as possible to a new spare using dd_rescue
   (-r might help).
-> Put the drive with the fresh copy (= the good, new drive) into the array
   and assemble + start it. Use --force if mdadm complains about outdated
   metadata. (Starting it as 'readonly' for now would also be a good idea.)
-> Add a new spare to the array and sync it as fast as possible to get at
   least 1 parity disk.
-> Run 'fsck -n /dev/mdX' to see how badly damaged your filesystem is. If you
   think that fsck can fix the errors (and will not cause more damage), run
   it without '-n'.
-> Add the 2nd parity disk, sync it, mount the filesystem and pray.

The amount of data corruption will depend on how well dd_rescue does: you are
probably lucky if it only failed to read a few sectors.

And I agree with Kevin: if you have a support contract, ask them to fix it.
(And if you have enough hardware + time: create a backup of ALL drives in the
failed RAID via 'dd' before touching anything!)

I'd also recommend starting periodic scrubbing: we do this once per month with
low priority (~5 MB/s), with little impact on the users.

Regards and good luck,
Adrian
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
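
For illustration only, the recovery steps and the monthly scrub described above might translate into commands roughly like the following. This is a dry-run sketch: it only prints the commands and touches no device. All device names (/dev/md0, /dev/sdx, ...) are made-up placeholders. The 'mdadm --readwrite' step is an assumption that is not in the steps above: an array assembled read-only has to be switched back before a newly added spare can resync. The scrub part assumes the md sysfs files sync_speed_max (roughly KB/s) and sync_action are available on your kernel.

```shell
#!/bin/sh
# Dry run: prints a command plan, executes nothing against any device.
# Every device name below is a hypothetical placeholder.
set -eu

MD=/dev/md0             # the degraded RAID 6 array
MDNAME=md0              # same array, as named under /sys/block
FAILED=/dev/sdx         # the 3rd failed disk, attached to a rescue PC
COPY=/dev/sdy           # new disk that receives the rescued copy
MEMBERS="/dev/sd[a-g]"  # the 7 surviving original members (assumed names)
SPARE1=/dev/sdz         # becomes the 1st parity disk
SPARE2=/dev/sdw         # becomes the 2nd parity disk

recovery_plan() {
    # The message suggests that dd_rescue's -r option might help on a bad
    # disk; it is left out of this plain first pass.
    cat <<EOF
dd_rescue $FAILED $COPY
mdadm --assemble --force --readonly $MD $MEMBERS $COPY
cat /proc/mdstat
mdadm --readwrite $MD
mdadm $MD --add $SPARE1
fsck -n $MD
mdadm $MD --add $SPARE2
EOF
}

scrub_plan() {
    # Cap the resync/check speed at about 5 MB/s, then start a check.
    cat <<EOF
echo 5000 > /sys/block/$MDNAME/md/sync_speed_max
echo check > /sys/block/$MDNAME/md/sync_action
EOF
}

recovery_plan
scrub_plan
```

Wait for the first resync to finish (watch /proc/mdstat) before trusting the fsck result or adding the second disk, and only drop '-n' from fsck once you have judged the damage.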