Perhaps I am confused.  How is it that a power outage while attached
to the UPS becomes "unpredictable"?  

  We run a Dell PowerEdge 2300/400 using Linux software RAID, and the
system monitors its own UPS.  When a power failure occurs, the system
brings itself down to a minimal state (runlevel 1) once the batteries
drop below 50%, and once they drop below 15% it shuts down, which
turns off the UPS.  When power comes back, the UPS starts up again and
the system resumes as normal.
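Roughly, the decision logic looks like the sketch below.  It assumes the
battery charge is available as a percentage and only models the two
thresholds described above; the function and action names are
hypothetical and not taken from the monitoring software we actually run.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal"          # on line power, or battery still at/above 50%
    RUNLEVEL_1 = "runlevel 1"  # drop to a minimal state to save battery
    SHUTDOWN = "shutdown"      # halt the system; the UPS then switches off


# Thresholds from the description above (assumed to be strict "below" checks).
MINIMAL_STATE_BELOW = 50
SHUTDOWN_BELOW = 15


def ups_action(on_battery: bool, charge_percent: float) -> Action:
    """Hypothetical helper: pick what the host should do for a given UPS state."""
    if not on_battery:
        return Action.NORMAL
    if charge_percent < SHUTDOWN_BELOW:
        return Action.SHUTDOWN
    if charge_percent < MINIMAL_STATE_BELOW:
        return Action.RUNLEVEL_1
    return Action.NORMAL
```

For example, `ups_action(True, 40)` gives `Action.RUNLEVEL_1`, and
`ups_action(True, 10)` gives `Action.SHUTDOWN`.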

  Admittedly this won't prevent issues like god reaching out and slapping
my system via lightning or something, nor will it help when someone
decides to grab the power cable and swing around on it, severing the
connection from the UPS to the system, but for the most part it has
so far proven to be a fairly decent configuration.

Benno Senoner wrote:
> 
> "Stephen C. Tweedie" wrote:
> 
> (...)
> 
> >
> > 3) The soft-raid background rebuild code reads and writes through the
> >    buffer cache with no synchronisation at all with other fs activity.
> >    After a crash, this background rebuild code will kill the
> >    write-ordering attempts of any journalling filesystem.
> >
> >    This affects both ext3 and reiserfs, under both RAID-1 and RAID-5.
> >
> > Interaction 3) needs a bit more work from the raid core to fix, but it's
> > still not that hard to do.
> >
> > So, can any of these problems affect other, non-journaled filesystems
> > too?  Yes, 1) can: throughout the kernel there are places where buffers
> > are modified before the dirty bits are set.  In such places we will
> > always mark the buffers dirty soon, so the window in which an incorrect
> > parity can be calculated is _very_ narrow (almost nonexistent on
> > non-SMP machines), and the window in which it will persist on disk is
> > also very small.
> >
> > This is not a problem.  It is just another example of a race window
> > which exists already with _all_ non-battery-backed RAID-5 systems (both
> > software and hardware): even with perfect parity calculations, it is
> > simply impossible to guarantee that an entire stripe update on RAID-5
> > completes in a single, atomic operation.  If you write a single data
> > block and its parity block to the RAID array, then on an unexpected
> > reboot you will always have some risk that the parity will have been
> > written, but not the data.  On a reboot, if you lose a disk then you can
> > reconstruct it incorrectly due to the bogus parity.
> >
> > THIS IS EXPECTED.  RAID-5 isn't proof against multiple failures, and the
> > only way you can get bitten by this failure mode is to have a system
> > failure and a disk failure at the same time.
> >
> 
> >
> > --Stephen
> 
> Thank you very much for these clear explanations.
> 
> One last question :-)
> Assuming all the RAID-code/FS interaction problems get fixed: since a
> Linux soft-RAID5 box has no battery backup, does this mean that we
> will lose data ONLY if there is a power failure AND a subsequent disk
> failure?  If we lose power and all disks are still intact after the
> reboot, can the RAID layer reconstruct all information safely?
> 
> The problem is that power outages are unpredictable even in the
> presence of UPSes, so it is important to have some protection against
> power losses.
> 
> regards,
> Benno.
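To make the failure mode in Stephen's explanation concrete, here is a
toy Python sketch of RAID-5 XOR parity for a single stripe.  It assumes
three data disks and 4-byte blocks (real chunks are much larger), and
all names are hypothetical.  It simulates a torn write where the new
parity reached disk but the new data did not, then shows two outcomes:
with all disks intact, recomputing parity from the data (roughly what a
parity resync does) makes the stripe consistent again, although the
interrupted update is lost; if a disk dies before that, the rebuilt
block is wrong.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks, i.e. RAID-5 parity."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# One stripe: three data blocks plus their parity block.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# Rewrite d1.  The new parity hits the disk, then power fails
# before the new d1 is written (the torn write).
new_d1 = b"ZZZZ"
parity_on_disk = xor_blocks(d0, new_d1, d2)
d1_on_disk = d1  # old data is still on disk

# Case 1: all disks survive.  Recomputing parity from the data blocks
# gives a consistent stripe again, but the update to d1 is lost.
resynced_parity = xor_blocks(d0, d1_on_disk, d2)

# Case 2: the disk holding d0 fails before any resync.  Rebuilding d0
# from the stale d1 and the new parity produces wrong data.
rebuilt_d0 = xor_blocks(parity_on_disk, d1_on_disk, d2)
print(rebuilt_d0)  # prints b'YYYY', not the original b'AAAA'
```

This only models the parity arithmetic; it says nothing about what the
filesystem on top sees after the lost update.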
