Re: [Lustre-discuss] recovery from multiple disks failure on the same md

2012-05-10 Thread Tae Young Hong
Thank you all for your valuable information.
We survived, and about 1 million files survived with us. At first I wanted to get
professional recovery help under our support contract, but it was not possible to
reach the right person at the right time.
So we had to do it on our own, roughly following the procedure Adrian mentioned.
It still felt risky and we needed some luck; I never want to do this again.

For your information,
dd_rescue showed that about 4 MB near the very end of the disk had bad sectors.
It took about 20 hours to run over the 1 TB SATA disk; we ran it on an OSS whose
load was relatively light.

After inserting the fresh copy into the original OSS in question (oss07), we
found that mdadm -A --force could assemble the array, with some errors, but its
state was "active, degraded, Not Started", so we had to use the following to
start it and kick off the resync:
echo clean > /sys/block/md12/md/array_state
I did not know of any other way to start it.
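
For anyone who hits the same "active, degraded, Not Started" state, the sequence was roughly the following (a sketch only; md12 and the member device names are specific to our system, so treat them as placeholders and check your own metadata before forcing anything):

    mdadm --assemble --force /dev/md12 /dev/sdm /dev/sdn /dev/sdo ...   # list all surviving members plus the fresh copy
    mdadm --detail /dev/md12                      # State showed: active, degraded, Not Started
    echo clean > /sys/block/md12/md/array_state   # starts the array; the spare then begins resyncing
    cat /proc/mdstat                              # watch the recovery progress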

On the first try we failed and two disks fell into the faulty state, probably
because at that time (during a periodic maintenance window) we rebooted the
partner OSS node (oss08) to patch its Lustre 1.8.5 kernel with the raid5
one-line fix Kevin mentioned earlier.
For the next try, I installed the raid5-patched Lustre kernel on oss07 as well
and simply power-cycled the JBOD (J4400) and oss07; this time the resync
completed without any error, and e2fsck afterwards found only 2 stale inodes.

Thank you also for the detailed explanation of why we need periodic scrubbing.

Taeyoung Hong
Senior Researcher
Supercomputing Center, KISTI 

On May 8, 2012, at 4:24 AM, Mark Hahn wrote:

>> I'd also recommend starting periodic scrubbing: we do this once per month
>> at low priority (~5 MB/s) with little impact on users.
>
> Yes.  And if you think a rebuild might overstress marginal disks,
> throttling via the dev.raid.speed_limit_max sysctl can help.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] recovery from multiple disks failure on the same md

2012-05-07 Thread Kevin Van Maren

On May 6, 2012, at 10:13 PM, Tae Young Hong wrote:

> Hi,
>
> We have run into a terrible situation on our Lustre system.
> An OST (RAID 6: 8+2, 1 spare) had 2 disk failures at almost the same time. While
> it was rebuilding, another disk failed, so the recovery appears to have halted,
> and the spare disk that was resyncing fell back to spare status. (I guess the
> resync was more than 95% complete.)
> Right now we have just 7 active disks for this md. Is there any possibility of
> recovering from this situation?



It might be possible, but it is not something I've done.  If the array has not
been written to since a drive failed, you might be able to power-cycle the failed
drives (to reset their firmware) and force re-add them (without a rebuild)?  If
the array _has_ been modified (most likely), you could write a sector of zeros
over the bad sector, which will corrupt just that stripe, then force re-add the
last failed drive and attempt the rebuild again.
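
Roughly, that second approach looks like the following (purely a sketch: the device names, md device, and BAD_SECTOR are placeholders, and whether the sector number md reports maps directly onto the raw device depends on your superblock layout, so verify before writing anything):

    # overwrite the unreadable sector on the last-failed drive with zeros so the drive remaps it
    # (this corrupts only the stripe covering that sector)
    dd if=/dev/zero of=/dev/sdp bs=512 count=1 seek=BAD_SECTOR oflag=direct
    # force-assemble with that drive back in; mdadm should accept it despite the stale event count
    mdadm --stop /dev/md12
    mdadm --assemble --force /dev/md12 /dev/sdm /dev/sdn ... /dev/sdp   # full member list
    cat /proc/mdstat   # the spare rebuild should resume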

Certainly if you have a support contract I'd recommend you get professional 
assistance.



Unfortunately, the failure mode you encountered is all too common.  Because the 
Linux SW RAID code does not read the parity blocks unless there is a problem, 
hard drive failures are NOT independent: drives appear to fail more often 
during a rebuild than at any other time.  The only way to work around this 
problem is to periodically do a verify of the MD array.

A verify allows the drive, which is failing in the 20% of the space that 
contains parity, to fail _before_ the data becomes unreadable, rather than fail 
_after_ the data becomes unreadable.  Don't do it on a degraded array, but it 
is a good way to ensure healthy arrays are really healthy.

Run "echo check > /sys/block/mdX/md/sync_action" to force a verify.  Parity
mismatches will be reported (not corrected), but drive failures can be dealt
with sooner, rather than letting them stack up.  Do "man md" and see the
sync_action section.
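
As a sketch of what that periodic check looks like in practice (md12 is just the device from this thread; substitute your own arrays, and see Mark Hahn's note elsewhere in the thread about throttling the check with the dev.raid.speed_limit_max sysctl):

    echo check > /sys/block/md12/md/sync_action   # start the verify
    cat /proc/mdstat                              # shows check progress alongside normal status
    cat /sys/block/md12/md/mismatch_cnt           # after it finishes: non-zero means parity mismatches were found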

Also note that Lustre 1.8.7 has a fix to the SW RAID code (corruption when
rebuilding under load).  Oracle's release called the patch
md-avoid-corrupted-ldiskfs-after-rebuild.patch, while Whamcloud called it
raid5-rebuild-corrupt-bug.patch.

Kevin




Re: [Lustre-discuss] recovery from multiple disks failure on the same md

2012-05-07 Thread Adrian Ulrich
Hi,


> An OST (RAID 6: 8+2, 1 spare) had 2 disk failures at almost the same time.
> While it was rebuilding, another disk failed, so the recovery appears to have
> halted,

So did the md array stop itself on the third disk failure (or at least turn
read-only)?

If it did, you might be able to get it running again without catastrophic
corruption.


This is what I would try (without any warranty!); a command sketch follows the list:


 - Forget about the 2 syncing spares

 - Take the third failed disk and attach it to some PC

 - Copy as much data as possible to a new spare using dd_rescue
(-r might help)

 - Put the drive with the fresh copy (= the good, new drive) into the array 
and assemble + start it.
Use --force if mdadm complains about outdated metadata.
(And starting it read-only for now would also be a good idea.)

 - Add a new spare to the array and sync it as fast as possible to get at 
least 1 parity disk.

 - Run 'fsck -n /dev/mdX' to see how badly damaged your filesystem is.
If you think that fsck can fix the errors (and will not cause more
damage), run it without '-n'.

 - Add the 2nd parity disk, sync it, mount the filesystem and pray.
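
In commands, the outline above would look roughly like this (a sketch under assumptions: /dev/md12, the member names, and the spare are placeholders taken from this thread, and the exact member list depends on which disks survived):

    # 1. clone the third failed disk onto a fresh drive (on a separate machine)
    dd_rescue /dev/failed_disk /dev/fresh_disk
    # 2. assemble the array from the survivors plus the fresh copy, read-only at first
    mdadm --assemble --force --readonly /dev/md12 /dev/sdm /dev/sdn ...   # full member list
    # 3. check the damage without modifying anything
    fsck -n /dev/md12
    # 4. if the damage looks fixable: make it writable, repair, then add a spare and let it sync
    mdadm --readwrite /dev/md12
    fsck /dev/md12
    mdadm /dev/md12 --add /dev/new_spare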


The amount of data corruption will depend on how well dd_rescue did: you are
probably lucky if it only failed to read a few sectors.


And I agree with Kevin:

If you have a support contract, ask them to fix it.
(And if you have enough hardware and time, create a backup of ALL drives in the
failed RAID via 'dd' before touching anything!)


I'd also recommend starting periodic scrubbing: we do this once per month at
low priority (~5 MB/s) with little impact on users.


Regards and good luck,
 Adrian
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] recovery from multiple disks failure on the same md

2012-05-07 Thread Mark Hahn
> I'd also recommend starting periodic scrubbing: we do this once per month
> at low priority (~5 MB/s) with little impact on users.

Yes.  And if you think a rebuild might overstress marginal disks,
throttling via the dev.raid.speed_limit_max sysctl can help.
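
For example (a sketch; the 5000 KB/s value simply matches the ~5 MB/s figure Adrian quotes, and these sysctls apply to resync/check activity on all md devices):

    # values are in KB/s
    sysctl -w dev.raid.speed_limit_max=5000
    # make it persistent across reboots by adding to /etc/sysctl.conf:
    #   dev.raid.speed_limit_max = 5000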
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] recovery from multiple disks failure on the same md

2012-05-06 Thread Tae Young Hong
Hi,

We have run into a terrible situation on our Lustre system.
An OST (RAID 6: 8+2, 1 spare) had 2 disk failures at almost the same time. While
it was rebuilding, another disk failed, so the recovery appears to have halted,
and the spare disk that was resyncing fell back to spare status. (I guess the
resync was more than 95% complete.)
Right now we have just 7 active disks for this md. Is there any possibility of
recovering from this situation?


The following is detailed log.
#1 the original configuration before any failure

 Number   Major   Minor   RaidDevice   State
    0       8      176        0        active sync   /dev/sdl
    1       8      192        1        active sync   /dev/sdm
    2       8      208        2        active sync   /dev/sdn
    3       8      224        3        active sync   /dev/sdo
    4       8      240        4        active sync   /dev/sdp
    5      65        0        5        active sync   /dev/sdq
    6      65       16        6        active sync   /dev/sdr
    7      65       32        7        active sync   /dev/sds
    8      65       48        8        active sync   /dev/sdt
    9      65       96        9        active sync   /dev/sdw

   10      65       64        -        spare         /dev/sdu

#2 a disk(sdl) failed, and resync started after adding spare disk(sdu)
May  7 04:53:33 oss07 kernel: sd 1:0:10:0: SCSI error: return code = 0x0802
May  7 04:53:33 oss07 kernel: sdl: Current: sense key: Medium Error
May  7 04:53:33 oss07 kernel: Add. Sense: Unrecovered read error
May  7 04:53:33 oss07 kernel:
May  7 04:53:33 oss07 kernel: Info fld=0x74241ace
May  7 04:53:33 oss07 kernel: end_request: I/O error, dev sdl, sector 1948523214
... ...
May  7 04:54:15 oss07 kernel: RAID5 conf printout:
May  7 04:54:16 oss07 kernel:  --- rd:10 wd:9 fd:1
May  7 04:54:16 oss07 kernel:  disk 1, o:1, dev:sdm
May  7 04:54:16 oss07 kernel:  disk 2, o:1, dev:sdn
May  7 04:54:16 oss07 kernel:  disk 3, o:1, dev:sdo
May  7 04:54:16 oss07 kernel:  disk 4, o:1, dev:sdp
May  7 04:54:16 oss07 kernel:  disk 5, o:1, dev:sdq
May  7 04:54:16 oss07 kernel:  disk 6, o:1, dev:sdr
May  7 04:54:16 oss07 kernel:  disk 7, o:1, dev:sds
May  7 04:54:16 oss07 kernel:  disk 8, o:1, dev:sdt
May  7 04:54:16 oss07 kernel:  disk 9, o:1, dev:sdw
May  7 04:54:16 oss07 kernel: RAID5 conf printout:
May  7 04:54:16 oss07 kernel:  --- rd:10 wd:9 fd:1
May  7 04:54:16 oss07 kernel:  disk 0, o:1, dev:sdu
May  7 04:54:16 oss07 kernel:  disk 1, o:1, dev:sdm
May  7 04:54:16 oss07 kernel:  disk 2, o:1, dev:sdn
May  7 04:54:16 oss07 kernel:  disk 3, o:1, dev:sdo
May  7 04:54:16 oss07 kernel:  disk 4, o:1, dev:sdp
May  7 04:54:16 oss07 kernel:  disk 5, o:1, dev:sdq
May  7 04:54:16 oss07 kernel:  disk 6, o:1, dev:sdr
May  7 04:54:16 oss07 kernel:  disk 7, o:1, dev:sds
May  7 04:54:16 oss07 kernel:  disk 8, o:1, dev:sdt
May  7 04:54:16 oss07 kernel:  disk 9, o:1, dev:sdw
May  7 04:54:16 oss07 kernel: md: syncing RAID array md12


#3 another disk(sdp) failed
May  7 04:54:42 oss07 kernel: end_request: I/O error, dev sdp, sector 1949298688
May  7 04:54:42 oss07 kernel: mptbase: ioc1: LogInfo(0x3108): 
Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x)
May  7 04:54:42 oss07 last message repeated 3 times
May  7 04:54:42 oss07 kernel: raid5:md12: read error not correctable (sector 
1949298688 on sdp).
May  7 04:54:42 oss07 kernel: raid5: Disk failure on sdp, disabling device. 
Operation continuing on
May  7 04:54:43 oss07 kernel: end_request: I/O error, dev sdp, sector 1948532499
... ...
May  7 04:54:44 oss07 kernel: raid5:md12: read error not correctable (sector 
1948532728 on sdp).
May  7 04:54:44 oss07 kernel: md: md12: sync done.
May  7 04:54:53 oss07 kernel: RAID5 conf printout:
May  7 04:54:53 oss07 kernel:  --- rd:10 wd:8 fd:2
May  7 04:54:53 oss07 kernel:  disk 0, o:1, dev:sdu
May  7 04:54:53 oss07 kernel:  disk 1, o:1, dev:sdm
May  7 04:54:53 oss07 kernel:  disk 2, o:1, dev:sdn
May  7 04:54:53 oss07 kernel:  disk 3, o:1, dev:sdo
May  7 04:54:53 oss07 kernel:  disk 4, o:0, dev:sdp
May  7 04:54:53 oss07 kernel:  disk 5, o:1, dev:sdq
May  7 04:54:53 oss07 kernel:  disk 6, o:1, dev:sdr
May  7 04:54:53 oss07 kernel:  disk 7, o:1, dev:sds
May  7 04:54:53 oss07 kernel:  disk 8, o:1, dev:sdt
May  7 04:54:53 oss07 kernel:  disk 9, o:1, dev:sdw
... ...
May  7 04:54:54 oss07 kernel: RAID5 conf printout:
May  7 04:54:54 oss07 kernel:  --- rd:10 wd:8 fd:2
May  7 04:54:54 oss07 kernel:  disk 1, o:1, dev:sdm
May  7 04:54:54 oss07 kernel:  disk 2, o:1, dev:sdn
May  7 04:54:54 oss07 kernel:  disk 3, o:1, dev:sdo
May  7 04:54:54 oss07 kernel:  disk 5, o:1, dev:sdq
May  7 04:54:54 oss07 kernel:  disk 6, o:1, dev:sdr
May  7 04:54:54 oss07 kernel:  disk 7, o:1, dev:sds
May  7 04:54:54 oss07 kernel:  disk 8, o:1, dev:sdt
May  7 04:54:54 oss07 kernel:  disk 9, o:1, dev:sdw
May  7 04:54:54 oss07 kernel: RAID5 conf printout:
May  7 04:54:54 oss07 kernel:  --- rd:10 wd:8 fd:2
May  7 04:54:54 oss07