OK, regardless of how the failure occurs, my point is that a
resync is a potentially dangerous operation if you don't
know beforehand whether the source disk has bad sectors.
So I don't think a resync should be performed unless it is
absolutely necessary, or the source disk is known to
be free from errors.
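
To make the risk concrete, here is a toy model of what I assume happens
during a resync. It is Python, not the actual md code. The resync()
function and the list-of-sectors disk representation are made up for
illustration, and the worst-case behaviour (an unreadable source sector
wipes out the target's good copy) is my assumption, not something I have
verified in the driver.

```python
# Toy model of a RAID-1 resync; NOT the real md driver code.
# A disk is a list of sectors; None stands for an unreadable (bad) sector.

def resync(source, target):
    """Blindly copy every sector from source to target.

    Returns the sector numbers that could not be read from source.
    Assumption: an unreadable source sector still overwrites the
    target's copy, so the good data on the target is lost.
    """
    lost = []
    for i, data in enumerate(source):
        if data is None:
            lost.append(i)
        target[i] = data
    return lost


hda = ["a", None, "c", None]   # sectors 1 and 3 damaged by the power outage
hdc = ["a", "b", "c", "d"]     # the mirror still holds good copies

lost = resync(hda, hdc)
print(lost)  # [1, 3]
print(hdc)   # ['a', None, 'c', None]
```

Before the resync, sectors 1 and 3 could still have been read from hdc.
Afterwards, in this model, neither disk has them.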

Can someone answer my original question, which was:

     Could the SB_CLEAN flag be eliminated to reduce the
     risk of a resync damaging good data?
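
As a rough sketch of what I am asking for, the two boot-time decisions
could be compared like this. This is Python for illustration only; the
Superblock class and its field names are hypothetical and do not match
the real superblock layout. The "current" rule is just my reading of the
code.

```python
# Hypothetical sketch of the boot-time decision; NOT the real md code.
from dataclasses import dataclass


@dataclass
class Superblock:
    events: int   # event counter stored on this half of the mirror
    clean: bool   # stands in for the SB_CLEAN flag


def needs_resync_current(sbs):
    """My reading of today's behaviour: any unclean superblock forces a resync."""
    return any(not sb.clean for sb in sbs)


def needs_resync_proposed(sbs):
    """Proposed rule: resync only when the event counters disagree."""
    return len({sb.events for sb in sbs}) > 1


# State after a power cut with no writes in progress:
# same event counter on both halves, but no clean shutdown.
after_power_cut = [Superblock(events=42, clean=False),
                   Superblock(events=42, clean=False)]

print(needs_resync_current(after_power_cut))   # True
print(needs_resync_proposed(after_power_cut))  # False
```

The proposed rule is only safe if the event counter really changes for
every write that can be in flight, which I have not verified.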

> > I hope you're making a joke.
>
> That's not a joke - I found it in a document describing raid. I hope I can find it
> again, so I can send it to you!
>
> > The problem is that I'm trying to use the software in
> > the "real world" where the most likely need for raid-1 is
> > due to power problems.
>
> Therefore you should use a UPS, not raid. And even when you are using a journalling
> file-system it's possible to lose data.
>
> > At least that's been my experience over many years of
> > doing sysadmin - most disk failures seem to occur after
> > some sort of power outage.  Either the power goes out,
> > or someone accidentally pulls the plug, etc.
>
> I had a server running a raid5 with 3 disks and after some power failures the system
> didn't want to mount the disk again. The other, un-raided disks didn't have any
> problem. After reinstallation and adding a UPS all is fine (now for 8 months).
>
> >
> > Sam
> >
> > Thomas Kotzian wrote:
> >
> > > raid wasn't invented to survive a power failure but a disk-failure!
> > >
> > > Thomas
> > >
> > > ----- Original Message -----
> > > From: "Sam" <[EMAIL PROTECTED]>
> > > To: <[EMAIL PROTECTED]>
> > > Sent: Monday, March 27, 2000 1:00 PM
> > > Subject: Raid1 - dangerous resync after power-failure?
> > >
> > > > I'm setting up a web server with Raid-1, using raidtools 0.90-5
> > > > and linux kernel 2.2.12 (this is the Redhat 6.1 distr).  I want to
> > > > mirror all my data across two disks (hda and hdc).
> > > >
> > > > The problem I've noticed from testing is that if I shut off the power
> > > > and then reboot, the raidtools software will start re-syncing the
> > > > mirrors, even though there was no write activity at all when the power
> > > > went off and even though both parts of the mirror have the exact same
> > > > event counter.
> > > >
> > > > The problem I see with this is as follows:
> > > >
> > > >     - Assume a power outage hits and wipes out some sectors on the
> > > >       hda disk, but leaves the superblock alone.  I think this scenario
> > > >       is a fairly likely one.
> > > >
> > > >     - After the power outage, the system boots up and starts up a
> > > >       resync, copying data from hda to hdc
> > > >
> > > >     - The system tries to access the bad sectors on hda
> > > >
> > > > What would happen at this point?  I assume the data would be lost,
> > > > since hdc is undergoing a re-sync, and the sectors on hda are already
> > > > bad.  Even though at boot time hdc contained good copies of these
> > > > sectors, the raid software started re-syncing onto hdc and lost that
> > > > data.  If however the raid code had just left hdc alone it could've
> > > > recovered these sectors.
> > > >
> > > > I looked at the raidtools code, and it looks to me like what is
> > > > happening is that there is a SB_CLEAN flag in the superblock that is
> > > > set to false when raid is started on an md device.  This SB_CLEAN flag
> > > > is only set to true if a clean shutdown is performed.  So if a power
> > > > outage hits, this flag is always going to be false since no clean
> > > > shutdown is performed.  At boot time the md code then checks the
> > > > SB_CLEAN flag and if it is false a resync is performed.
> > > >
> > > > It seems to me that a resync should only be required if the system is
> > > > in the middle of a write where some data has been sent to one disk, but
> > > > not yet to another.  I think the event counter already performs this
> > > > function so I don't see why the SB_CLEAN flag is even needed.
> > > >
> > > > What do you think?  Could this SB_CLEAN flag be eliminated to reduce
> > > > the risk of a resync damaging good data?
> > > >
> >