> after purposely trying to corrupt one drive in my raid1 array (for testing
> purposes of course),

RAID1 does not protect against disk corruption.

It protects against disk *FAILURE*.

> i setup raid1 array w/ 2 identical 9gig scsi drives, and made an ext2fs on the
> array.  then i stopped the array, and used dd(1) to zero the first 2 blocks on
> the first disk (/dev/sdb1, raid-disk 0 in /etc/raidtab).

RAID1 cannot deal with this.

If you were to ZAP the last 8K of the FS, it *WOULD* notice and be able to fix 
it.

> when i then run raidstart, the mirror starts up fine,

At the *RAID* level, it has no way to know that there is corruption.

It was closed down cleanly, and both PSBs (persistent superblocks) have the 
same sequence number.
That means "the disks were closed down correctly, and are perfect copies of 
each other".
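
As a rough illustration of that check (a toy model only, NOT the real kernel 
code; the class and the field names `events` and `clean` are my own 
assumptions, made up for the example):

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    # Hypothetical fields, loosely modelled on a persistent superblock (PSB).
    events: int   # sequence number, bumped when the array state changes
    clean: bool   # set when the array was stopped cleanly


def mirrors_trusted(a: Superblock, b: Superblock) -> bool:
    """Return True if both mirrors would be treated as identical copies.

    Only the superblocks are compared; the data blocks are never read,
    so zeroed data on one disk goes unnoticed.
    """
    return a.clean and b.clean and a.events == b.events
```

Nothing in that comparison ever looks at the blocks you zeroed with dd, 
which is why the array starts up without complaint.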

> but refuses to mount claiming a valid ext2fs doesnt exist on the device (in
> the same way trying to mount the 'corrupt' sdb1 would).

My guess is that it just so happens that it chose to look at the disk which 
you ZAPPed for the data.  It could have used the other, in which case it would 
have "worked".

> if i perform the same test using /dev/sdc1 (raid-disk 1), the drives are
> resync'd successfully during execution of the mount command.

I suspect that is because if it happens to read the data from the good disk, 
and then modifies it (the EXT2 [not RAID] Super Block is somewhere near there, 
and gets modified frequently), it will be written back to BOTH disks, so all 
will be well ...

NB: as you say, it is the *MOUNT* that fixes it ...
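
A toy model of what I think is going on (my guess, not the actual driver; 
`ToyMirror` and everything in it is invented for illustration). Reads are 
served from ONE member, writes go to BOTH:

```python
class ToyMirror:
    """Toy two-way mirror: a read uses one member, a write updates both."""

    def __init__(self, disk0, disk1):
        self.disks = [list(disk0), list(disk1)]

    def read(self, block, member):
        # A real driver picks the member itself; here the caller chooses,
        # so both outcomes can be shown.
        return self.disks[member][block]

    def write(self, block, data):
        # Writes always go to every member of the mirror.
        for disk in self.disks:
            disk[block] = data


zapped = ["\0\0\0\0", "file data"]        # block 0 zeroed with dd
good = ["ext2 superblock", "file data"]   # untouched copy
mirror = ToyMirror(zapped, good)

seen_from_zapped = mirror.read(0, member=0)  # mount would fail on this
seen_from_good = mirror.read(0, member=1)    # mount would succeed on this

# Mounting rewrites the ext2 superblock; the write lands on both disks,
# so the zapped copy is repaired as a side effect.
mirror.write(0, seen_from_good)
repaired = mirror.disks[0][0] == mirror.disks[1][0]
```

Which member gets read first is up to the driver, so whether you see the 
"broken" or the "self-healing" behaviour depends on which disk you zapped.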

> so...why doesnt sdb1 get resync'd?

Why should it be ?

It assumes the disks are in perfect sync.

If it *did* find that they were not, what should it do ?
With three disks, they could vote ...
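
A minimal sketch of the voting idea (purely hypothetical; I am not claiming 
the RAID code does anything like this, and `vote` is a name I made up):

```python
from collections import Counter


def vote(copies):
    """Return the value held by a strict majority of the mirror copies
    of one block, or None if no value has a strict majority."""
    value, count = Counter(copies).most_common(1)[0]
    return value if count > len(copies) / 2 else None
```

With three copies, `vote([b"good", b"good", b"zapped"])` picks `b"good"`. 
With two copies that disagree it returns `None`, which is exactly the 
problem here: with only two disks there is no way to break the tie.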

> i remember from raidtools version 0.50, there was at least one option to
> chkraid that would allow selection of the 'trusted' (ie not corrupt)
> partition.

What is needed is a "raidhotremove --DOIT!!" to allow you to remove an active 
disk ... (people have said this should be possible in the kernel -- but 
tricky. Sounds pretty simple to me, but I'm not an expert ...)

> now that everything lives in the kernel, i dont see how that would be
> possible.

I can think of HACKs ....

> i tried using raidhot<remove|add> but since the corrupted disks are still
> listed as active, removing fails.

Quite :-((

> perhaps this is a feature, but im just wondering what the logic behind it
> is...

Two reasons, I suspect ...
1) efficiency -- don't want to re-sync the disks every time
2) AI -- how can it work out for itself which is the "master"
