Re: PROBLEM: RAID5 reshape data corruption

Nagilum Fri, 11 Jan 2008 01:14:34 -0800

----- Message from [EMAIL PROTECTED] ---------
    Date: Sun, 06 Jan 2008 22:35:46 +0100
    From: Nagilum <[EMAIL PROTECTED]>
Reply-To: Nagilum <[EMAIL PROTECTED]>
 Subject: Re: PROBLEM: RAID5 reshape data corruption
      To: Nagilum <[EMAIL PROTECTED]>

Cc: Neil Brown <[EMAIL PROTECTED]>, [email protected], Dan Williams <[EMAIL PROTECTED]>, "H. Peter Anvin" <[EMAIL PROTECTED]>

----- Message from [EMAIL PROTECTED] ---------
    Date: Sun, 06 Jan 2008 00:31:46 +0100
    From: Nagilum <[EMAIL PROTECTED]>

At the moment I'm thinking about writing a small perl program that
will generate me a shell script or makefile containing dd commands
that will copy the chunks from the drive to /dev/md0. I don't care if
that will be dog slow as long as I get most of my data back. (I'd
probably go forward instead of backward to take advantage of the
readahead, after I've determined the exact start chunk.)
For that I need to know one more thing.
Used Dev Size is 488308672k for md0 as well as the disk, 16k chunk size.
488308672/16 = 30519292.00
so the first dd would look like:
dd if=/dev/sdg of=/dev/md0 bs=16k count=1 skip=30519291 seek=X
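
The arithmetic above can be sketched as a small shell fragment. It only
prints the dd command rather than running it. print_dd is a hypothetical
helper name, and the seek value is left as the literal X because it is
still unknown at this point.

```shell
#!/bin/sh
# Sketch: derive the chunk count and print the dd command for one chunk.
# The sizes are taken from the message; print_dd is a hypothetical helper.

dev_kib=488308672   # "Used Dev Size" in KiB
chunk_kib=16        # chunk size in KiB

# Number of whole chunks on the device.
chunks=$((dev_kib / chunk_kib))

# Index of the last chunk (dd's skip= counts blocks from zero).
last=$((chunks - 1))

# print_dd SRC DST SKIP SEEK: echo the command instead of executing it.
print_dd() {
    echo "dd if=$1 of=$2 bs=${chunk_kib}k count=1 skip=$3 seek=$4"
}

print_dd /dev/sdg /dev/md0 "$last" X
```

Printing the commands first makes it possible to review the generated
script before anything is written to the array.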

The big question now is how to calculate X.
Since I have a working testcase I can do a lot of testing before
touching the real thing. The formula to get X will probably contain a
5 for the 5(+1) devices the raid spans now, a 4 for the 4(+1) devices
the raid spanned before the reshape, a 3 for the device number of the
disk that failed and of course the skip/current chunk number.
Can you help me come up with it?
Thanks again for looking into the whole issue.
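
One possible approach to X, as a sketch only: if the array uses md's
default RAID5 layout (left-symmetric) and the data area starts at offset
0 on each member device (both are assumptions, the message states
neither), the chunk index within the array follows from the number of
member devices, the device's slot number, and the chunk number on that
device. logical_chunk is a hypothetical helper, not part of mdadm or of
the script discussed here. Which device count to pass (before or after
the reshape) depends on which layout the chunk being read was written in.

```shell
#!/bin/sh
# Sketch: map (member slot, chunk number on that member) to the chunk index
# inside the array, ASSUMING a left-symmetric RAID5 layout and no data offset.

# logical_chunk NDISKS DISK STRIPE
#   NDISKS  total member devices, parity included (e.g. 5 for 4+1)
#   DISK    zero-based slot number of the member device
#   STRIPE  zero-based chunk number on that device (the dd skip= value)
# Prints the array chunk index, or "parity" if that slot holds parity.
logical_chunk() {
    n=$1; disk=$2; stripe=$3

    # Left-symmetric: parity starts on the last slot and moves one slot
    # to the left with every stripe.
    parity=$(( (n - 1) - stripe % n ))
    if [ "$disk" -eq "$parity" ]; then
        echo parity
        return 0
    fi

    # Data chunks start on the slot after parity and wrap around.
    d=$(( (disk - parity - 1 + n) % n ))
    echo $(( stripe * (n - 1) + d ))
}
```

For example, `logical_chunk 5 3 0` prints 3, and `logical_chunk 5 3 1`
prints parity, since under these assumptions slot 3 holds the parity
chunk of stripe 1 in a five-device array. Checking the mapping against
the working testcase before touching the real array would confirm or
refute the layout assumption.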

----- End message from [EMAIL PROTECTED] -----

Ok, the spare time over the weekend allowed me to make some headway.
I'm not sure if the attachment will make it through to the ML so I
uploaded the perl script to: http://www.nagilum.de/md/rdrep.pl
First tests already show promising results, although I seem to miss the
start of the corrupted area. Anyway, unlike with the testcase, on the
real array I have to start after the area that is unreadable. I
already determined that last Friday.
Anyway I would appreciate it if someone could have a look over the script.
I'll probably change it a little bit and make every other dd run via
exec instead of system to use some parallelism. (I guess the overhead
for running dd will take about as much time as the transfer itself.)
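
The parallelism idea can be sketched in shell as follows. This is one
reading of "every other dd" and not the actual rdrep.pl code: run_pairs
is a hypothetical helper that reads one command per line, starts the
odd-numbered ones in the background, runs the even-numbered ones in the
foreground, and waits before moving on to the next pair.

```shell
#!/bin/sh
# Sketch: run commands two at a time. Every other command is started in
# the background so its startup overhead overlaps with the next command.

# run_pairs: reads one command per line from standard input.
run_pairs() {
    i=0
    while IFS= read -r cmd; do
        i=$((i + 1))
        if [ $((i % 2)) -eq 1 ]; then
            # Odd-numbered line: start it and carry on immediately.
            sh -c "$cmd" </dev/null &
        else
            # Even-numbered line: run in the foreground, then wait for
            # the background partner before starting the next pair.
            sh -c "$cmd" </dev/null
            wait
        fi
    done
    # Collect a trailing background job if the line count was odd.
    wait
}
```

It could be fed a generated file of dd commands, for example
`run_pairs < commands.txt`, where commands.txt is a placeholder name.
This only makes sense because each dd writes to a different seek
position, so the order in which a pair completes does not matter.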

----- End message from [EMAIL PROTECTED] -----

I just want to give a quick update.

The program ran for about one and a half days and it looks good; the directories and files appear OK. I'll do some work on it this evening and see if I can restore some more blocks before running xfs_repair.

Kind regards,

========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  [EMAIL PROTECTED] \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..
