On May 6, 2012, at 10:13 PM, Tae Young Hong wrote:

> Hi,
>
> I found a terrible situation on our Lustre system. An OST (RAID-6: 8+2, 1 spare) had two disk failures at almost the same time. While it was recovering, another disk failed, so the recovery procedure appears to have halted, and the spare disk that was resyncing fell back into "spare" status. (I guess the resync was more than 95% finished.) Right now we have just 7 disks for this md. Is there any possibility of recovering from this situation?

It might be possible, but it is not something I've done. If the array has not been written to since a drive failed, you might be able to power-cycle the failed drives (to reset their firmware) and force-re-add them (without a rebuild). If the array _has_ been modified (most likely), you could write a sector of zeros over the bad sector, which will corrupt just that stripe, and then force-re-add the last failed drive and attempt the rebuild again. Certainly, if you have a support contract, I'd recommend you get professional assistance.

Unfortunately, the failure mode you encountered is all too common. Because the Linux SW RAID code does not read the parity blocks unless there is a problem, hard drive failures are NOT independent: drives appear to fail more often during a rebuild than at any other time. The only way to work around this problem is to periodically do a "verify" of the MD array. A verify allows a drive that is failing in the 20% of its space that contains parity to fail _before_ the data becomes unreadable, rather than _after_ the data becomes unreadable. Don't do it on a degraded array, but it is a good way to ensure that healthy arrays really are healthy. Use "echo check > /sys/block/mdX/md/sync_action" to force a verify. Parity mismatches will be reported (not corrected), but drive failures can be dealt with sooner, rather than letting them stack up. Do "man md" and see the "sync_action" section.

Also note that Lustre 1.8.7 has a fix to the SW RAID code (corruption when rebuilding under load).
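The manual recovery Kevin outlines (zero the unreadable sector, then force the most recently failed members back into the array) could be sketched roughly as follows. The array name, device names, and sector number are placeholders taken from the log below, purely for illustration; this is a sketch of the idea, not a tested procedure, and if the data matters, get professional help first.

```shell
# Stop the array before touching member disks.
mdadm --stop /dev/md12

# Overwrite the single unreadable sector with zeros so the drive can
# remap it.  This corrupts only the one stripe that was already lost.
# seek= is counted in 512-byte units because bs=512.
dd if=/dev/zero of=/dev/sdp bs=512 count=1 seek=1949298688 oflag=direct

# Force-assemble from the members that failed last; --force tells md
# to accept superblocks whose event counts no longer agree.
mdadm --assemble --force /dev/md12 \
    /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq \
    /dev/sdr /dev/sds /dev/sdt /dev/sdw

# Re-add the spare so the rebuild can restart.
mdadm --add /dev/md12 /dev/sdu
```

Afterwards you would also want a filesystem check on the OST before remounting, since the zeroed stripe is now known-corrupt.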
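The periodic verify Kevin recommends can be automated; here is a minimal sketch, assuming a kernel new enough to expose the `degraded` sysfs attribute (the loop and schedule are an example, not part of md itself):

```shell
# Start a "check" (verify) pass on every md array that is currently
# idle and not degraded -- e.g. run this from a monthly cron job.
for md in /sys/block/md*/md; do
    [ -e "$md/sync_action" ] || continue
    if [ "$(cat "$md/sync_action")" = "idle" ] && \
       [ "$(cat "$md/degraded")" = "0" ]; then
        echo check > "$md/sync_action"
    fi
done
```

Once a pass finishes, `cat /sys/block/mdX/md/mismatch_cnt` reports how many parity mismatches were found; any non-zero value deserves investigation.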
Oracle's release called the patch md-avoid-corrupted-ldiskfs-after-rebuild.patch, while Whamcloud called it raid5-rebuild-corrupt-bug.patch.

Kevin

The detailed log follows.

#1 The original configuration, before any failure:

 Number  Major  Minor  RaidDevice  State
    0      8     176       0       active sync   /dev/sdl
    1      8     192       1       active sync   /dev/sdm
    2      8     208       2       active sync   /dev/sdn
    3      8     224       3       active sync   /dev/sdo
    4      8     240       4       active sync   /dev/sdp
    5     65       0       5       active sync   /dev/sdq
    6     65      16       6       active sync   /dev/sdr
    7     65      32       7       active sync   /dev/sds
    8     65      48       8       active sync   /dev/sdt
    9     65      96       9       active sync   /dev/sdw
   10     65      64       -       spare         /dev/sdu

#2 A disk (sdl) failed, and resync started after the spare disk (sdu) was added:

May 7 04:53:33 oss07 kernel: sd 1:0:10:0: SCSI error: return code = 0x08000002
May 7 04:53:33 oss07 kernel: sdl: Current: sense key: Medium Error
May 7 04:53:33 oss07 kernel: Add. Sense: Unrecovered read error
May 7 04:53:33 oss07 kernel: Info fld=0x74241ace
May 7 04:53:33 oss07 kernel: end_request: I/O error, dev sdl, sector 1948523214
...
May 7 04:54:15 oss07 kernel: RAID5 conf printout:
May 7 04:54:16 oss07 kernel:  --- rd:10 wd:9 fd:1
May 7 04:54:16 oss07 kernel:  disk 1, o:1, dev:sdm
May 7 04:54:16 oss07 kernel:  disk 2, o:1, dev:sdn
May 7 04:54:16 oss07 kernel:  disk 3, o:1, dev:sdo
May 7 04:54:16 oss07 kernel:  disk 4, o:1, dev:sdp
May 7 04:54:16 oss07 kernel:  disk 5, o:1, dev:sdq
May 7 04:54:16 oss07 kernel:  disk 6, o:1, dev:sdr
May 7 04:54:16 oss07 kernel:  disk 7, o:1, dev:sds
May 7 04:54:16 oss07 kernel:  disk 8, o:1, dev:sdt
May 7 04:54:16 oss07 kernel:  disk 9, o:1, dev:sdw
May 7 04:54:16 oss07 kernel: RAID5 conf printout:
May 7 04:54:16 oss07 kernel:  --- rd:10 wd:9 fd:1
May 7 04:54:16 oss07 kernel:  disk 0, o:1, dev:sdu
May 7 04:54:16 oss07 kernel:  disk 1, o:1, dev:sdm
May 7 04:54:16 oss07 kernel:  disk 2, o:1, dev:sdn
May 7 04:54:16 oss07 kernel:  disk 3, o:1, dev:sdo
May 7 04:54:16 oss07 kernel:  disk 4, o:1, dev:sdp
May 7 04:54:16 oss07 kernel:  disk 5, o:1, dev:sdq
May 7 04:54:16 oss07 kernel:  disk 6, o:1, dev:sdr
May 7 04:54:16 oss07 kernel:  disk 7, o:1, dev:sds
May 7 04:54:16 oss07 kernel:  disk 8, o:1, dev:sdt
May 7 04:54:16 oss07 kernel:  disk 9, o:1, dev:sdw
May 7 04:54:16 oss07 kernel: md: syncing RAID array md12

#3 Another disk (sdp) failed:

May 7 04:54:42 oss07 kernel: end_request: I/O error, dev sdp, sector 1949298688
May 7 04:54:42 oss07 kernel: mptbase: ioc1: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)
May 7 04:54:42 oss07 last message repeated 3 times
May 7 04:54:42 oss07 kernel: raid5:md12: read error not correctable (sector 1949298688 on sdp).
May 7 04:54:42 oss07 kernel: raid5: Disk failure on sdp, disabling device. Operation continuing on
May 7 04:54:43 oss07 kernel: end_request: I/O error, dev sdp, sector 1948532499
...
May 7 04:54:44 oss07 kernel: raid5:md12: read error not correctable (sector 1948532728 on sdp).
May 7 04:54:44 oss07 kernel: md: md12: sync done.
May 7 04:54:53 oss07 kernel: RAID5 conf printout:
May 7 04:54:53 oss07 kernel:  --- rd:10 wd:8 fd:2
May 7 04:54:53 oss07 kernel:  disk 0, o:1, dev:sdu
May 7 04:54:53 oss07 kernel:  disk 1, o:1, dev:sdm
May 7 04:54:53 oss07 kernel:  disk 2, o:1, dev:sdn
May 7 04:54:53 oss07 kernel:  disk 3, o:1, dev:sdo
May 7 04:54:53 oss07 kernel:  disk 4, o:0, dev:sdp
May 7 04:54:53 oss07 kernel:  disk 5, o:1, dev:sdq
May 7 04:54:53 oss07 kernel:  disk 6, o:1, dev:sdr
May 7 04:54:53 oss07 kernel:  disk 7, o:1, dev:sds
May 7 04:54:53 oss07 kernel:  disk 8, o:1, dev:sdt
May 7 04:54:53 oss07 kernel:  disk 9, o:1, dev:sdw
...
May 7 04:54:54 oss07 kernel: RAID5 conf printout:
May 7 04:54:54 oss07 kernel:  --- rd:10 wd:8 fd:2
May 7 04:54:54 oss07 kernel:  disk 1, o:1, dev:sdm
May 7 04:54:54 oss07 kernel:  disk 2, o:1, dev:sdn
May 7 04:54:54 oss07 kernel:  disk 3, o:1, dev:sdo
May 7 04:54:54 oss07 kernel:  disk 5, o:1, dev:sdq
May 7 04:54:54 oss07 kernel:  disk 6, o:1, dev:sdr
May 7 04:54:54 oss07 kernel:  disk 7, o:1, dev:sds
May 7 04:54:54 oss07 kernel:  disk 8, o:1, dev:sdt
May 7 04:54:54 oss07 kernel:  disk 9, o:1, dev:sdw
May 7 04:54:54 oss07 kernel: RAID5 conf printout:
May 7 04:54:54 oss07 kernel:  --- rd:10 wd:8 fd:2
May 7 04:54:54 oss07 kernel:  disk 0, o:1, dev:sdu
May 7 04:54:54 oss07 kernel:  disk 1, o:1, dev:sdm
May 7 04:54:54 oss07 kernel:  disk 2, o:1, dev:sdn
May 7 04:54:54 oss07 kernel:  disk 3, o:1, dev:sdo
May 7 04:54:54 oss07 kernel:  disk 5, o:1, dev:sdq
May 7 04:54:54 oss07 kernel:  disk 6, o:1, dev:sdr
May 7 04:54:54 oss07 kernel:  disk 7, o:1, dev:sds
May 7 04:54:55 oss07 kernel:  disk 8, o:1, dev:sdt
May 7 04:54:55 oss07 kernel:  disk 9, o:1, dev:sdw
May 7 04:54:55 oss07 kernel: md: syncing RAID array md12

#4 The 3rd disk (sdm) failed while resyncing:

May 7 09:41:53 oss07 kernel: mptbase: ioc1: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)
May 7 09:41:57 oss07 kernel: mptbase: ioc1: LogInfo(0x31110e00): Originator={PL}, Code={Reset}, SubCode(0x0e00)
May 7 09:41:59 oss07 last message repeated 24 times
May 7 09:42:04 oss07 kernel: mptbase: ioc1: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)
May 7 09:42:34 oss07 last message repeated 43 times
May 7 09:42:34 oss07 kernel: sd 1:0:11:0: SCSI error: return code = 0x000b0000
May 7 09:42:34 oss07 kernel: end_request: I/O error, dev sdm, sector 1948444160
May 7 09:42:34 oss07 kernel: mptbase: ioc1: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)
May 7 09:42:34 oss07 last message repeated 3 times
May 7 09:42:34 oss07 kernel: raid5:md12: read error not correctable (sector 1948444160 on sdm).
May 7 09:42:34 oss07 kernel: raid5: Disk failure on sdm, disabling device. Operation continuing on 7 devices
May 7 09:42:34 oss07 kernel: raid5:md12: read error not correctable (sector 1948444168 on sdm).
May 7 09:42:34 oss07 kernel: raid5:md12: read error not correctable (sector 1948444176 on sdm).
...
May 7 09:42:49 oss07 kernel:  --- rd:10 wd:7 fd:3
May 7 09:42:49 oss07 kernel:  disk 0, o:1, dev:sdu
May 7 09:42:49 oss07 kernel:  disk 1, o:0, dev:sdm
May 7 09:42:49 oss07 kernel:  disk 2, o:1, dev:sdn
May 7 09:42:49 oss07 kernel:  disk 3, o:1, dev:sdo
May 7 09:42:49 oss07 kernel:  disk 5, o:1, dev:sdq
May 7 09:42:49 oss07 kernel:  disk 6, o:1, dev:sdr
May 7 09:42:49 oss07 kernel:  disk 7, o:1, dev:sds
May 7 09:42:49 oss07 kernel:  disk 8, o:1, dev:sdt
May 7 09:42:49 oss07 kernel:  disk 9, o:1, dev:sdw
...
May 7 09:42:58 oss07 kernel: RAID5 conf printout:
May 7 09:42:58 oss07 kernel:  --- rd:10 wd:7 fd:3
May 7 09:42:58 oss07 kernel:  disk 1, o:0, dev:sdm
May 7 09:42:58 oss07 kernel:  disk 2, o:1, dev:sdn
May 7 09:42:58 oss07 kernel:  disk 3, o:1, dev:sdo
May 7 09:42:58 oss07 kernel:  disk 5, o:1, dev:sdq
May 7 09:42:58 oss07 kernel:  disk 6, o:1, dev:sdr
May 7 09:42:58 oss07 kernel:  disk 7, o:1, dev:sds
May 7 09:42:58 oss07 kernel:  disk 8, o:1, dev:sdt
May 7 09:42:58 oss07 kernel:  disk 9, o:1, dev:sdw

#5 Current md status:

[root@oss07 ~]# mdadm --detail /dev/md12
/dev/md12:
        Version : 00.90.03
  Creation Time : Mon Oct 4 15:30:53 2010
     Raid Level : raid6
     Array Size : 7814099968 (7452.11 GiB 8001.64 GB)
  Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
   Raid Devices : 10
  Total Devices : 11
Preferred Minor : 12
    Persistence : Superblock is persistent

  Intent Bitmap : /mnt/scratch/bitmaps/ost02/bitmap

    Update Time : Mon May 7 11:38:51 2012
          State : clean, degraded
 Active Devices : 7
Working Devices : 8
 Failed Devices : 3
  Spare Devices : 1

     Chunk Size : 128K

           UUID : 63eb5b15:294c1354:f0c167bd:f8e81f47
         Events : 0.7382

 Number  Major  Minor  RaidDevice  State
    0      0       0       0       removed
    1      0       0       1       removed
    2      8     208       2       active sync   /dev/sdn
    3      8     224       3       active sync   /dev/sdo
    4      0       0       4       removed
    5     65       0       5       active sync   /dev/sdq
    6     65      16       6       active sync   /dev/sdr
    7     65      32       7       active sync   /dev/sds
    8     65      48       8       active sync   /dev/sdt
    9     65      96       9       active sync   /dev/sdw
   10      8     176       -       faulty spare  /dev/sdl
   11     65      64       -       spare         /dev/sdu
   12      8     240       -       faulty spare  /dev/sdp
   13      8     192       -       faulty spare  /dev/sdm

Best regards,
Taeyoung Hong
Senior Researcher
Supercomputing Center of KISTI
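As an aside for anyone monitoring arrays like this: the device-state table that `mdadm --detail` prints can be parsed mechanically. A small sketch in Python, using the state table quoted above (the regular expression is written for that plain-text layout and is an assumption, not a stable mdadm interface):

```python
import re

def parse_md_detail_states(text):
    """Count device states in the table portion of `mdadm --detail` output."""
    counts = {}
    for line in text.splitlines():
        # Table rows look like: "2   8   208   2   active sync   /dev/sdn"
        m = re.match(r'\s*\d+\s+\d+\s+\d+\s+[-\d]+\s+([a-z ]+?)(?:\s+/dev/\S+)?\s*$', line)
        if m:
            state = m.group(1).strip()
            counts[state] = counts.get(state, 0) + 1
    return counts

# The device table from the failed array above:
detail = """\
   0    0    0    0   removed
   1    0    0    1   removed
   2    8  208    2   active sync   /dev/sdn
   3    8  224    3   active sync   /dev/sdo
   4    0    0    4   removed
   5   65    0    5   active sync   /dev/sdq
   6   65   16    6   active sync   /dev/sdr
   7   65   32    7   active sync   /dev/sds
   8   65   48    8   active sync   /dev/sdt
   9   65   96    9   active sync   /dev/sdw
  10    8  176    -   faulty spare   /dev/sdl
  11   65   64    -   spare   /dev/sdu
  12    8  240    -   faulty spare   /dev/sdp
  13    8  192    -   faulty spare   /dev/sdm
"""

counts = parse_md_detail_states(detail)
print(counts)
# RAID-6 tolerates only two missing members, so three "removed" slots
# means the array has lost redundancy entirely -- exactly the state
# described in this thread.
```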
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss