BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)

2008-01-28 Thread Peter Rabbitson

Hello,

It seems that mdadm/md do not perform proper sanity checks before adding a 
component to a degraded array. If the size of the new component is just right, 
the superblock information will overlap with the data area. This happens 
without any error indication in syslog or elsewhere.
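
For concreteness, the arithmetic at play is roughly this (sizes as in the 
attached test script; the 16-sector data offset is the value the script reads 
back from sysfs):

    # Illustration only, using the test script's numbers:
    COMP_SIZE=$((1 * 1024 * 1024 + 8192))        # original component: 1 MiB data + 8 KiB metadata room
    DATA_OFFSET=16                               # v1.1 data offset in 512-byte sectors (8 KiB)
    NEWSIZE=$((COMP_SIZE - DATA_OFFSET * 512))   # replacement device: exactly 1 MiB

    # NEWSIZE leaves no room for a superblock in front of 1 MiB of component
    # data, yet the device is accepted, so metadata and data overlap.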


I came up with a reproducible scenario, which I am attaching to this email 
along with the entire test script. I have not tested other raid levels or 
other superblock types, but I suspect the same problem will occur for many 
other configurations.


I am willing to test patches; however, the attached script is non-intrusive 
enough to be executed anywhere.


The output of the script follows below.

Peter

======================================================================

[EMAIL PROTECTED]:/media/space/testmd# ./md_overlap_test
Creating component 1 (1056768 bytes)... done.
Creating component 2 (1056768 bytes)... done.
Creating component 3 (1056768 bytes)... done.


===
Creating 3 disk raid5 array with v1.1 superblock
mdadm: array /dev/md9 started.
Waiting for resync to finish... done.

md9 : active raid5 loop3[3] loop2[1] loop1[0]
  2048 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

Initial checksum of raw raid5 device: 4df1921524a3b717a956fceaed0ae691  /dev/md9


===
Failing first component
mdadm: set /dev/loop1 faulty in /dev/md9
mdadm: hot removed /dev/loop1

md9 : active raid5 loop3[3] loop2[1]
  2048 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/2] [_UU]

Checksum of raw raid5 device after failing component: 4df1921524a3b717a956fceaed0ae691  /dev/md9



===
Re-creating block device with size 1048576 bytes, so both the superblock and data start at the same spot

Adding back to array
mdadm: added /dev/loop1
Waiting for resync to finish... done.

md9 : active raid5 loop1[4] loop3[3] loop2[1]
  2048 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

Checksum of raw raid5 device after adding back smaller component: bb854f77ad222d224fcdd8c8f96b51f0  /dev/md9



===
Attempting recovery
Waiting for recovery to finish... done.
Performing check
Waiting for check to finish... done.

Current value of mismatch_cnt: 0

Checksum of raw raid5 device after repair/check: 146f5c37305c42cda64538782c8c3794  /dev/md9

[EMAIL PROTECTED]:/media/space/testmd#
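
As a side note, the placement mdadm chose on a member device can be inspected 
directly, e.g. (assuming /dev/loop1 is a current member of the array):

    # Dump the v1.1 superblock of a member device; among other fields this
    # should report the offset at which the data area starts.
    mdadm -E /dev/loop1

The attached test script follows.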
#!/bin/bash

echo "Please read the script first, and comment the exit line at the top."
echo "This script will require about 3MB of free space, it will free (and use)"
echo "loop devices 1 2 and 3, and will use the md device number specified in MD_DEV."
exit 0

MD_DEV=md9   # make sure this is not an array you use
COMP_NUM=3
COMP_SIZE=$((1 * 1024 * 1024 + 8192))   # 1MiB comp sizes with room for 8k (16 sect) of metadata

mdadm -S /dev/$MD_DEV &> /dev/null   # stop a stale array if one exists, ignore errors

DEVS=""
for i in $(seq $COMP_NUM); do
    echo -n "Creating component $i ($COMP_SIZE bytes)... "
    losetup -d /dev/loop${i} &> /dev/null   # detach a stale loop device, ignore errors

    set -e
    # fill entire image with the component number (0xiii...)
    PCMD="print \"\x${i}${i}\" x $COMP_SIZE"
    perl -e "$PCMD" > dummy${i}.img
    losetup /dev/loop${i} dummy${i}.img
    DEVS="$DEVS /dev/loop${i}"
    set +e
    echo "done."
done

echo
echo
echo "==="
echo "Creating $COMP_NUM disk raid5 array with v1.1 superblock"
# superblock at beginning of blockdev guarantees that it will overlap with real data, not with parity
mdadm -C /dev/$MD_DEV -l 5 -n $COMP_NUM -e 1.1 $DEVS

echo -n "Waiting for resync to finish..."
while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ] ; do
    echo -n .
    sleep 1
done
echo " done."
echo
grep -A1 $MD_DEV /proc/mdstat

echo
echo -n "Initial checksum of raw raid5 device: "
md5sum /dev/$MD_DEV

echo
echo
echo "==="
echo "Failing first component"
mdadm -f /dev/$MD_DEV /dev/loop1
mdadm -r /dev/$MD_DEV /dev/loop1

echo
grep -A1 $MD_DEV /proc/mdstat

echo
echo -n "Checksum of raw raid5 device after failing component: "
md5sum /dev/$MD_DEV

echo
echo
echo "==="
# shrink by the data offset (in sectors, read back from sysfs) so data starts where the superblock lives
NEWSIZE=$(( $COMP_SIZE - $(cat /sys/block/$MD_DEV/md/rd1/offset) * 512 ))
echo "Re-creating block device with size $NEWSIZE bytes, so both the superblock and data start at the same spot"
losetup -d /dev/loop1 &> /dev/null
PCMD="print \"\x11\" x $NEWSIZE"
perl -e "$PCMD" > dummy1.img
losetup /dev/loop1 dummy1.img

echo "Adding back to array"
mdadm -a /dev/$MD_DEV /dev/loop1

echo -n "Waiting for resync to finish..."
while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ] ; do
    echo -n .
    sleep 1
done
echo " done."
Re: BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)

2008-01-28 Thread Peter Rabbitson

Neil Brown wrote:

> On Monday January 28, [EMAIL PROTECTED] wrote:
>> Hello,
>>
>> It seems that mdadm/md do not perform proper sanity checks before adding a
>> component to a degraded array. If the size of the new component is just right,
>> the superblock information will overlap with the data area. This happens
>> without any error indication in syslog or elsewhere.
>
> I thought I fixed that... What versions of Linux kernel and mdadm are
> you using for your tests?



Linux is 2.6.23.14 with everything md-related compiled in (no modules)
mdadm - v2.6.4 - 19th October 2007 (latest in debian/sid)

Peter


Re: BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)

2008-01-28 Thread Neil Brown
On Monday January 28, [EMAIL PROTECTED] wrote:
> Hello,
>
> It seems that mdadm/md do not perform proper sanity checks before adding a
> component to a degraded array. If the size of the new component is just right,
> the superblock information will overlap with the data area. This happens
> without any error indication in syslog or elsewhere.

I thought I fixed that... What versions of Linux kernel and mdadm are
you using for your tests?

Thanks,
NeilBrown


Re: BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)

2008-01-28 Thread Neil Brown
On Monday January 28, [EMAIL PROTECTED] wrote:
> Hello,
>
> It seems that mdadm/md do not perform proper sanity checks before adding a
> component to a degraded array. If the size of the new component is just right,
> the superblock information will overlap with the data area. This happens
> without any error indication in syslog or elsewhere.
>
> I came up with a reproducible scenario, which I am attaching to this email
> along with the entire test script. I have not tested other raid levels or
> other superblock types, but I suspect the same problem will occur for many
> other configurations.
>
> I am willing to test patches; however, the attached script is non-intrusive
> enough to be executed anywhere.

Thanks for the report and the test script.

This patch for mdadm should fix this problem. I hate the fact that
we sometimes use K and sometimes use sectors for
sizes/offsets... groan.

I'll probably get a test in the kernel as well to guard against this.

Thanks,
NeilBrown


### Diffstat output
 ./Manage.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/Manage.c ./Manage.c
--- .prev/Manage.c	2008-01-29 11:15:54.000000000 +1100
+++ ./Manage.c	2008-01-29 11:16:15.000000000 +1100
@@ -337,7 +337,7 @@ int Manage_subdevs(char *devname, int fd
 
 			/* Make sure device is large enough */
 			if (tst->ss->avail_size(tst, ldsize/512) <
-			    array.size) {
+			    array.size*2) {
 				fprintf(stderr, Name ": %s not large enough to join array\n",
 					dv->devname);
 				return 1;
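
For concreteness, a sketch of how the unscaled comparison plays out with the 
test script's sizes, assuming array.size is reported in KiB while avail_size() 
returns 512-byte sectors (which is what the *2 implies):

    # 1 MiB replacement device, minus the 16-sector data offset:
    AVAIL_SECT=$(( (1048576 / 512) - 16 ))   # 2032 sectors available
    ARRAY_SIZE_K=1024                        # per-device data size in KiB (i.e. 2048 sectors needed)

    # The old check compared sectors against KiB, so the device passed (2032 >= 1024):
    [ $AVAIL_SECT -lt $ARRAY_SIZE_K ] && echo rejected || echo "accepted (bug)"

    # The patched check scales KiB to sectors and rejects it (2032 < 2048):
    [ $AVAIL_SECT -lt $((ARRAY_SIZE_K * 2)) ] && echo "rejected (fixed)" || echo accepted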