"Stephen C. Tweedie" wrote:

(...)

>
> 3) The soft-raid background rebuild code reads and writes through the
>    buffer cache with no synchronisation at all with other fs activity.
>    After a crash, this background rebuild code will kill the
>    write-ordering attempts of any journalling filesystem.
>
>    This affects both ext3 and reiserfs, under both RAID-1 and RAID-5.
>
> Interaction 3) needs a bit more work from the raid core to fix, but it's
> still not that hard to do.
>
> So, can any of these problems affect other, non-journaled filesystems
> too?  Yes, 1) can: throughout the kernel there are places where buffers
> are modified before the dirty bits are set.  In such places we will
> always mark the buffers dirty soon, so the window in which an incorrect
> parity can be calculated is _very_ narrow (almost non-existent on
> non-SMP machines), and the window in which it will persist on disk is
> also very small.
>
> This is not a problem.  It is just another example of a race window
> which exists already with _all_ non-battery-backed RAID-5 systems (both
> software and hardware): even with perfect parity calculations, it is
> simply impossible to guarantee that an entire stripe update on RAID-5
> completes in a single, atomic operation.  If you write a single data
> block and its parity block to the RAID array, then on an unexpected
> reboot you will always have some risk that the parity will have been
> written, but not the data.  On a reboot, if you lose a disk then you can
> reconstruct it incorrectly due to the bogus parity.
>
> THIS IS EXPECTED.  RAID-5 isn't proof against multiple failures, and the
> only way you can get bitten by this failure mode is to have a system
> failure and a disk failure at the same time.
>

>
> --Stephen
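
To make the failure mode described above concrete, here is a minimal
sketch in Python. It is a toy model, not the md driver's code: the
helper names (xor_blocks, reconstruct), the three-data-block stripe and
the block contents are all made up for illustration, and the "resync"
step simply assumes that parity is recomputed from whatever data blocks
are on disk.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (toy RAID-5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct(stripe, parity, lost):
    """Rebuild the data block at index `lost` from the survivors plus parity."""
    survivors = [b for i, b in enumerate(stripe) if i != lost]
    return xor_blocks(survivors + [parity])

# A consistent stripe: three data blocks plus their parity (hypothetical data).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Update block 0. The new parity reaches disk, the new data does not (crash).
new_block = b"ZZZZ"
parity_on_disk = xor_blocks([new_block, data[1], data[2]])
data_on_disk = list(data)  # block 0 still holds its old contents

# Case 1: all disks survive the crash. Assumed resync: recompute parity
# from the data blocks, after which any single block can be rebuilt.
resynced_parity = xor_blocks(data_on_disk)
rebuilt_after_resync = reconstruct(data_on_disk, resynced_parity, lost=1)

# Case 2: disk 1 dies before the resync. Rebuilding it from the stale
# parity yields garbage, even though block 1 was never being written.
rebuilt_with_stale_parity = reconstruct(data_on_disk, parity_on_disk, lost=1)

print("after resync:     ", rebuilt_after_resync == data[1])
print("with stale parity:", rebuilt_with_stale_parity == data[1])
```

In this model the crash alone only costs the in-flight update to block
0; it takes the crash plus a disk loss before the parity is repaired to
corrupt a block that was not being written.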

Thank you very much for these clear explanations.

One last doubt: :-)
Assume all the RAID code / FS interaction problems get fixed.
Since a Linux soft-RAID5 box has no battery backup,
does this mean that we will lose data
ONLY if there is a power failure AND a subsequent disk failure?
If we lose power and all disks remain intact after the reboot,
can the RAID layer reconstruct all the information safely?

The problem is that power outages are unpredictable even in the presence
of UPSes, so it is important to have some protection against
power losses.

Regards,
Benno.


