RAID6 issue: md_do_sync() got signal ... exiting
Hi all,

I am having trouble creating a RAID6 md device on a home-grown Linux 2.6.20.11 SMP 64-bit build. I first create the RAID6 without problems and see the following successful dump in /var/log/messages. If I check /proc/mdstat, the RAID6 is doing the initial syncing as expected.

Nov 30 01:26:39 testsystem kernel: md: bind
Nov 30 01:26:39 testsystem kernel: md: bind
Nov 30 01:26:39 testsystem kernel: md: bind
Nov 30 01:26:39 testsystem kernel: md: bind
Nov 30 01:26:39 testsystem kernel: md: md0: raid array is not clean -- starting background reconstruction
Nov 30 01:26:39 testsystem kernel: raid5: device dm-0 operational as raid disk 3
Nov 30 01:26:39 testsystem kernel: raid5: device dm-1 operational as raid disk 2
Nov 30 01:26:39 testsystem kernel: raid5: device dm-2 operational as raid disk 1
Nov 30 01:26:39 testsystem kernel: raid5: device dm-3 operational as raid disk 0
Nov 30 01:26:39 testsystem kernel: raid5: allocated 4268kB for md0
Nov 30 01:26:39 testsystem kernel: raid5: raid level 6 set md0 active with 4 out of 4 devices, algorithm 0
Nov 30 01:26:39 testsystem kernel: RAID5 conf printout:
Nov 30 01:26:39 testsystem kernel:  --- rd:4 wd:4
Nov 30 01:26:42 testsystem kernel:  disk 0, o:1, dev:dm-3
Nov 30 01:26:42 testsystem kernel:  disk 1, o:1, dev:dm-2
Nov 30 01:26:42 testsystem kernel:  disk 2, o:1, dev:dm-1
Nov 30 01:26:42 testsystem kernel:  disk 3, o:1, dev:dm-0
Nov 30 01:26:42 testsystem kernel: md: resync of RAID array md0
Nov 30 01:26:42 testsystem kernel: md: minimum _guaranteed_ speed: 0 KB/sec/disk.
Nov 30 01:26:42 testsystem kernel: md: using maximum available idle IO bandwidth (but not more than 0 KB/sec) for resync.
Nov 30 01:26:42 testsystem kernel: md: using 128k window, over a total of 143371968 blocks.

If I then delete the RAID6 and try to create the exact same RAID6 again, it sometimes fails, with the following appearing in /var/log/messages. Note the "md: md_do_sync() got signal ... exiting" line:

Nov 30 01:28:03 testsystem kernel: md: bind
Nov 30 01:28:03 testsystem kernel: md: bind
Nov 30 01:28:03 testsystem kernel: md: bind
Nov 30 01:28:03 testsystem kernel: md: bind
Nov 30 01:28:03 testsystem kernel: md: md0: raid array is not clean -- starting background reconstruction
Nov 30 01:28:03 testsystem kernel: raid5: device dm-0 operational as raid disk 3
Nov 30 01:28:03 testsystem kernel: raid5: device dm-1 operational as raid disk 2
Nov 30 01:28:03 testsystem kernel: raid5: device dm-2 operational as raid disk 1
Nov 30 01:28:03 testsystem kernel: raid5: device dm-3 operational as raid disk 0
Nov 30 01:28:03 testsystem kernel: raid5: allocated 4268kB for md0
Nov 30 01:28:03 testsystem kernel: raid5: raid level 6 set md0 active with 4 out of 4 devices, algorithm 0
Nov 30 01:28:03 testsystem kernel: RAID5 conf printout:
Nov 30 01:28:03 testsystem kernel:  --- rd:4 wd:4
Nov 30 01:28:04 testsystem kernel:  disk 0, o:1, dev:dm-3
Nov 30 01:28:05 testsystem kernel:  disk 1, o:1, dev:dm-2
Nov 30 01:28:05 testsystem kernel:  disk 2, o:1, dev:dm-1
Nov 30 01:28:05 testsystem kernel:  disk 3, o:1, dev:dm-0
Nov 30 01:28:05 testsystem kernel: md: resync of RAID array md0
Nov 30 01:28:05 testsystem kernel: md: minimum _guaranteed_ speed: 0 KB/sec/disk.
Nov 30 01:28:05 testsystem kernel: md: using maximum available idle IO bandwidth (but not more than 0 KB/sec) for resync.
Nov 30 01:28:05 testsystem kernel: md: using 128k window, over a total of 143368192 blocks.
Nov 30 01:28:05 testsystem kernel: md: md_do_sync() got signal ... exiting
Nov 30 01:28:05 testsystem kernel: md: checkpointing resync of md0.
Nov 30 01:28:05 testsystem kernel: md: md0 stopped.
Nov 30 01:28:05 testsystem kernel: md: unbind
Nov 30 01:28:05 testsystem kernel: md: export_rdev(dm-0)
Nov 30 01:28:05 testsystem kernel: md: unbind
Nov 30 01:28:05 testsystem kernel: md: export_rdev(dm-1)
Nov 30 01:28:05 testsystem kernel: md: unbind
Nov 30 01:28:05 testsystem kernel: md: export_rdev(dm-2)
Nov 30 01:28:05 testsystem kernel: md: unbind
Nov 30 01:28:05 testsystem kernel: md: export_rdev(dm-3)

The failure is VERY intermittent. Sometimes it fails, sometimes it succeeds, with the exact same creation procedure. Any ideas on what may be causing this issue?

Thank you very much in advance for your assistance!

Best regards,
Thomas
Re: raid6 check/repair
Dear Neil,

>> The point that I'm trying to make is that there does exist a specific
>> case in which recovery is possible, and that implementing recovery for
>> that case will not hurt in any way.
>
> Assuming that is true (maybe hpa got it wrong) what specific conditions
> would lead to one drive having corrupt data, and would correcting it on
> an occasional 'repair' pass be an appropriate response?

The use case for the proposed 'repair' would be occasional, low-frequency corruption, for which many sources can be imagined: any piece of hardware has a certain failure rate, which may depend on things like age, temperature, stability of operating voltage, cosmic rays, etc., but also on variations in the production process. Therefore, hardware may suffer from infrequent glitches, which are seldom enough to be impossible to trace back to a particular piece of equipment. It would be nice to recover gracefully from that. Kernel bugs or just plain administrator mistakes are another source. But also the case of power loss during writing that you have mentioned could profit from that 'repair': with heterogeneous hardware, blocks may be written in unpredictable order, so graceful recovery would be possible with 'repair' in more cases than with just recalculating parity.

> Does the value justify the cost of extra code complexity?

In the case of protecting data integrity, I'd say 'yes'.

> Everything costs extra. Code uses bytes of memory, requires maintenance,
> and possibly introduces new bugs.

Of course, you are right. However, in my other email I tried to sketch a piece of code which is very lean, as it makes use of functions which I assume to exist. (Sorry, I didn't look at the md code yet, so please correct me if I'm wrong.) Therefore I assume the costs in memory, maintenance and bugs to be rather low.

Kind regards,
Thiemo
Re: assemble vs create an array.......
Dragos wrote:
> Hello,
> I had created a raid 5 array on 3 232GB SATA drives. I had created one
> partition (for /home) formatted with either xfs or reiserfs (I do not
> recall). Last week I reinstalled my box from scratch with Ubuntu 7.10,
> with mdadm v. 2.6.2-1ubuntu2. Then I made a rookie mistake: I used
> --create instead of --assemble. The recovery completed. I then stopped
> the array, realizing the mistake.
> 1. Please make the warning more descriptive: ALL DATA WILL BE LOST when
> attempting to create an array over an existing one.
> 2. Do you know of any way to recover from this mistake? Or at least what
> filesystem it was formatted with.
> Any help would be greatly appreciated. I have hundreds of family digital
> pictures and videos that are irreplaceable.
> Thank you in advance,
> Dragos

Meh,... I do that all the time for testing.

The raid metadata is separate from the FS, in that you can trash it as much as you like and the FS it refers to will be fine, as long as you don't decide to mkfs over it.

If you've an old /var/log/messages kicking around from when the raid was correct, you should be able to extract the order, e.g.:

RAID5 conf printout:
 --- rd:5 wd:5
 disk 0, o:1, dev:sdf1
 disk 1, o:1, dev:sde1
 disk 2, o:1, dev:sdg1
 disk 3, o:1, dev:sdc1
 disk 4, o:1, dev:sdd1

Unfortunately, there is no point looking at mdadm -E <disk>, as you've already trashed the information there.

Anyway, from the above, the recreation of the array would be:

mdadm -C -l5 -n5 -c128 /dev/md0 /dev/sdf1 /dev/sde1 /dev/sdg1 /dev/sdc1 /dev/sdd1

(where -l5 = raid 5, -n5 = number of participating drives and -c128 = chunk size of 128K)

IF you don't have the configuration printout, then you're left with exhaustive brute force searching of the combinations of disks. Unfortunately, the number of possible orderings grows factorially, and going beyond 8 disks is a suicidally *bad* idea:

2=2
3=6
4=24
5=120
6=720
7=5040
8=40320

You only have 3 drives, so only 6 possible combinations to try (unlike myself with 5). So, just write yourself a small script with all 6 combinations and run them through a piece of shell similar to this pseudo script:

lvchange -an /dev/VolGroup01/LogVol00   # if you use lvm at all (change as appropriate or discard)
mdadm --stop --scan
yes | mdadm -C -l5 -n3 /dev/md0 /dev/sdd1 /dev/sde1 /dev/sdf1   # (replaceable combinations)
lvchange -ay /dev/VolGroup01/LogVol00   # if you use lvm (or discard)
mount /dev/md0 /mnt
# Let's use the success return code for mount to indicate we're able to
# mount the FS again and bail out (man mount)
if [ $? -eq 0 ] ; then
    exit 0
fi
Re: raid6 check/repair
Dear Neil and Eyal,

Eyal Lebedinsky wrote:
> Neil Brown wrote:
>> It would seem that either you or Peter Anvin is mistaken.
>>
>> On page 9 of
>>   http://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
>> at the end of section 4 it says:
>>
>>   Finally, as a word of caution it should be noted that RAID-6 by
>>   itself cannot even detect, never mind recover from, dual-disk
>>   corruption. If two disks are corrupt in the same byte positions,
>>   the above algorithm will in general introduce additional data
>>   corruption by corrupting a third drive.
>
> The above a/b/c cases are not correct for raid6. While we can detect
> 0, 1 or 2 errors, any higher number of errors will be misidentified as
> one of these.
>
> The cases we will always see are:
> a) no errors - nothing to do
> b) one error - correct it
> c) two errors - report? take the raid down? recalc syndromes?
> and any other case will always appear as *one* of these (not as [c]).

I still don't agree. I'll explain the algorithm for error handling that I have in mind; maybe you can point out if I'm mistaken at some point.

We have n data blocks D1...Dn and two parities P (XOR) and Q (Reed-Solomon). I assume the existence of two functions to calculate the parities

  P = calc_P(D1, ..., Dn)
  Q = calc_Q(D1, ..., Dn)

and two functions to recover a missing data block Dx using either parity

  Dx = recover_P(x, D1, ..., Dx-1, Dx+1, ..., Dn, P)
  Dx = recover_Q(x, D1, ..., Dx-1, Dx+1, ..., Dn, Q)

This pseudo-code should distinguish between a), b) and c) and properly repair case b):

P' = calc_P(D1, ..., Dn);
Q' = calc_Q(D1, ..., Dn);

if (P' == P && Q' == Q) {
        /* case a): zero errors */
        return;
}
if (P' == P && Q' != Q) {
        /* case b1): Q is bad, can be fixed */
        Q = Q';
        return;
}
if (P' != P && Q' == Q) {
        /* case b2): P is bad, can be fixed */
        P = P';
        return;
}

/* both parities are bad, so we try whether the problem can be fixed
 * by repairing data blocks */
for (i = 1; i <= n; i++) {
        /* assume only Di is bad, use P parity to repair */
        D' = recover_P(i, D1, ..., Di-1, Di+1, ..., Dn, P);
        /* use Q parity to check assumption */
        Q' = calc_Q(D1, ..., Di-1, D', Di+1, ..., Dn);
        if (Q == Q') {
                /* case b3): Q parity is ok, that means the assumption
                 * was correct and we can fix the problem */
                Di = D';
                return;
        }
}

/* case c): when we get here, we have excluded cases a) and b),
 * so now we really have a problem */
report_unrecoverable_error();
return;

Concerning misidentification: a situation can be imagined in which two or more simultaneous corruptions have occurred in a very special way, so that case b3) is diagnosed accidentally. While that is not impossible, I'd assume the probability of it to be negligible, comparable to that of undetectable corruption in a RAID 5 setup.

Kind regards,
Thiemo
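For anyone who wants to try the logic above outside the kernel, here is a small self-contained sketch. It is Python, not the md driver's code; the names gf_mul, gf_pow2, calc_P, calc_Q, recover_P and check_repair are invented for this example, and "blocks" are shrunk to single bytes. It computes P as plain XOR and Q as the GF(2^8) syndrome with generator 2 and polynomial 0x11d, following the conventions in hpa's raid6.pdf, and then runs the same a)/b)/c) case analysis, including the brute-force "assume Di is bad, rebuild it from P, verify against Q" loop. One point in support of the argument: with exactly one corrupted data block the loop cannot match the wrong block, because the trial Q differs from the stored Q by (g^x xor g^i) * e, which is only zero when i is the corrupted index, so misidentification indeed requires at least two corruptions.

# Toy demonstration of the repair idea sketched above (assumed names,
# not kernel code).  P is the XOR of the data blocks; Q is the sum of
# g^i * D_i over GF(2^8) with g = 2 and reduction polynomial 0x11d.
# Blocks are modelled as single bytes to keep the example short.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with the RAID6 polynomial 0x11d."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1D
    return r

def gf_pow2(i):
    """Return g^i for g = 2."""
    r = 1
    for _ in range(i):
        r = gf_mul(r, 2)
    return r

def calc_P(D):
    p = 0
    for d in D:
        p ^= d
    return p

def calc_Q(D):
    q = 0
    for i, d in enumerate(D):
        q ^= gf_mul(gf_pow2(i), d)
    return q

def recover_P(x, D, P):
    """Reconstruct block x from P and the other data blocks."""
    r = P
    for i, d in enumerate(D):
        if i != x:
            r ^= d
    return r

def check_repair(D, P, Q):
    """The a)/b)/c) case analysis from the pseudo-code above."""
    Pc, Qc = calc_P(D), calc_Q(D)
    if Pc == P and Qc == Q:
        return "a) clean"
    if Pc == P:
        return "b1) Q is stale, rewrite Q as %#x" % Qc
    if Qc == Q:
        return "b2) P is stale, rewrite P as %#x" % Pc
    # Both parities mismatch: assume exactly one data block is bad.
    for x in range(len(D)):
        cand = recover_P(x, D, P)
        trial = list(D)
        trial[x] = cand
        if calc_Q(trial) == Q:          # Q confirms the assumption
            return "b3) block %d is bad, repair to %#x" % (x, cand)
    return "c) more than one block corrupt, unrecoverable"

# Example: 4 data blocks, silently corrupt one, let the check find it.
D = [0x11, 0x22, 0x33, 0x44]
P, Q = calc_P(D), calc_Q(D)
D[2] ^= 0x5A                            # silent corruption of block 2
print(check_repair(D, P, Q))            # reports that block 2 is bad

Whether this belongs in md's 'repair' pass is exactly the cost/benefit question being debated in this thread; the sketch only shows that the case analysis itself is short.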
Re: assemble vs create an array.......
Bryce wrote:
[]
> mdadm -C -l5 -n5 -c128 /dev/md0 /dev/sdf1 /dev/sde1 /dev/sdg1 /dev/sdc1
> /dev/sdd1
...
> IF you don't have the configuration printout, then you're left with
> exhaustive brute force searching of the combinations

You're missing a very important point -- the --assume-clean option. For experiments like this (trying to figure out the order of disks), you'd better ensure the data on the disks isn't being changed while you try different combinations. But on each build, md always destroys one drive by re-calculating parity. You have to stop it from doing so, to not trash your data.

Another option is to always use one missing drive, i.e.,

mdadm -C -l5 -n5 -c128 /dev/md0 /dev/sdf1 missing /dev/sdg1 /dev/sdc1 /dev/sdd1

so that the array will be degraded and there is no way to resync anything - this also prevents md from trashing data.

/mjt
Re: assemble vs create an array.......
Neil Brown wrote:
> On Thursday November 29, [EMAIL PROTECTED] wrote:
>> 2. Do you know of any way to recover from this mistake? Or at least what
>> filesystem it was formatted with.

It may not have been lost - yet.

> If you created the same array with the same devices and layout etc,
> the data will still be there, untouched.
> Try to assemble the array and use "fsck" on it.

To be safe I'd use fsck -n (check the man page, as this is odd for reiserfs).

> When you create a RAID5 array, all that is changed is the metadata (at
> the end of the device) and one drive is changed to be the xor of all
> the others.

In other words, one of your 3 drives has just been erased. Unless you know the *exact* command you used and have the dmesg output to hand, we won't know which one.

Now what you need to do is to try all the permutations of creating a degraded array using 2 of the drives, specifying the 3rd as 'missing'. So something like:

mdadm --create /dev/md0 --raid-devices=3 --level=5 missing /dev/sdb1 /dev/sdc1
mdadm --create /dev/md0 --raid-devices=3 --level=5 /dev/sdb1 missing /dev/sdc1
mdadm --create /dev/md0 --raid-devices=3 --level=5 /dev/sdb1 /dev/sdc1 missing
mdadm --create /dev/md0 --raid-devices=3 --level=5 missing /dev/sdb1 /dev/sdd1
mdadm --create /dev/md0 --raid-devices=3 --level=5 /dev/sdb1 missing /dev/sdd1
mdadm --create /dev/md0 --raid-devices=3 --level=5 /dev/sdb1 /dev/sdd1 missing
etc etc

It is important to create the array using a 'missing' device so the xor data isn't written.

There is a program here:
http://linux-raid.osdl.org/index.php/Permute_array.pl
that may help...

David
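If typing out all the permutations by hand gets tedious (with 3 drives there are 18 orderings once 'missing' is allowed in any slot), the search can be scripted. Below is a rough sketch in the same spirit as the Permute_array.pl script linked above, but it is not that script: the device names /dev/sda1 through /dev/sdc1 and the mount point /mnt/probe are placeholders to replace, and it assumes no LVM sits between the array and the filesystem. It creates each candidate as a degraded array (one slot 'missing', so parity is never written, as stressed above), tries a read-only mount, and stops at the first ordering that mounts.

import itertools
import subprocess

# Placeholders -- replace with the real partitions and a real mount point.
DRIVES = ["/dev/sda1", "/dev/sdb1", "/dev/sdc1"]
MOUNTPOINT = "/mnt/probe"

def sh(cmd):
    """Run a shell command and return its exit status."""
    return subprocess.call(cmd, shell=True)

def try_order(slots):
    """Create a degraded array with this slot order and try a read-only mount."""
    sh("mdadm --stop /dev/md0 2>/dev/null")
    # One slot is 'missing', so the array comes up degraded and md never
    # writes parity over any drive.  If the original array was created
    # with a non-default chunk size, add the matching --chunk=... here.
    create = ("yes | mdadm --create /dev/md0 --level=5 --raid-devices=3 "
              + " ".join(slots))
    if sh(create) != 0:
        return False
    return sh("mount -o ro /dev/md0 " + MOUNTPOINT) == 0

found = None
for pair in itertools.permutations(DRIVES, 2):   # ordered pairs of drives
    for gap in range(3):                         # where 'missing' goes
        slots = list(pair)
        slots.insert(gap, "missing")
        if try_order(slots):
            found = slots
            break
    if found:
        break

if found:
    print("filesystem mounts read-only with slot order:", " ".join(found))
else:
    print("no ordering produced a mountable filesystem")

A successful read-only mount is only a first hint; running fsck -n (or xfs_check for XFS) against the candidate before trusting it is still advisable.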
Re: assemble vs create an array.......
Thank you for your very fast answers.

First I tried 'fsck -n' on the existing array. The answer was that if I wanted to check an XFS partition I should use 'xfs_check'. That seems to say that my array was formatted with xfs, not reiserfs. Am I correct?

Then I tried the different permutations:

mdadm --create /dev/md0 --raid-devices=3 --level=5 missing /dev/sda1 /dev/sdb1
mount /dev/md0 temp
mdadm --stop --scan

mdadm --create /dev/md0 --raid-devices=3 --level=5 /dev/sda1 missing /dev/sdb1
mount /dev/md0 temp
mdadm --stop --scan

mdadm --create /dev/md0 --raid-devices=3 --level=5 /dev/sda1 /dev/sdb1 missing
mount /dev/md0 temp
mdadm --stop --scan

mdadm --create /dev/md0 --raid-devices=3 --level=5 missing /dev/sda1 /dev/sdc1
mount /dev/md0 temp
mdadm --stop --scan

mdadm --create /dev/md0 --raid-devices=3 --level=5 /dev/sda1 missing /dev/sdc1
mount /dev/md0 temp
mdadm --stop --scan

mdadm --create /dev/md0 --raid-devices=3 --level=5 /dev/sda1 /dev/sdc1 missing
mount /dev/md0 temp
mdadm --stop --scan

mdadm --create /dev/md0 --raid-devices=3 --level=5 missing /dev/sdb1 /dev/sdc1
mount /dev/md0 temp
mdadm --stop --scan

mdadm --create /dev/md0 --raid-devices=3 --level=5 /dev/sdb1 missing /dev/sdc1
mount /dev/md0 temp
mdadm --stop --scan

mdadm --create /dev/md0 --raid-devices=3 --level=5 /dev/sdb1 /dev/sdc1 missing
mount /dev/md0 temp
mdadm --stop --scan

With some arrays mount reported:
mount: you must specify the filesystem type
and with others:
mount: Structure needs cleaning

No choice seems to have been successful. Please let me know of other ideas.

Thank you again,
Dragos
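A hedged reading of those two error messages: "mount: Structure needs cleaning" is the kernel's EUCLEAN error, which XFS typically returns when it does find a superblock but considers the on-disk structure corrupt, whereas "you must specify the filesystem type" usually means mount could not auto-detect any known superblock at all. So the orderings that gave "Structure needs cleaning" at least put a recognisable XFS superblock at the start of the array. A quick way to apply that filter without mounting is to look for the XFS magic "XFSB" in the first four bytes of the assembled device. The snippet below is a Python sketch; it assumes the assembled device is /dev/md0 as in the commands above, and it only inspects the first chunk, so it narrows the search rather than proving an ordering correct. xfs_check (or fsck -n, as suggested earlier) on the candidate remains the real test.

import sys

def looks_like_xfs(device):
    """Return True if the device begins with the XFS superblock magic.

    The primary XFS superblock sits at the very start of the filesystem,
    and its first four bytes are "XFSB", so with the right drive in the
    first slot the magic should appear at offset 0 of the md device.
    """
    with open(device, "rb") as f:
        return f.read(4) == b"XFSB"

if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/md0"
    if looks_like_xfs(dev):
        print(dev, "starts with an XFS superblock")
    else:
        print(dev, "does not start with an XFS superblock")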
Re: assemble vs create an array.......
I forgot one thing. After re-creating the array which deleted my data in the first place, 'mount' was giving me this answer:

mount: Structure needs cleaning

Thank you for your time,
Dragos