RE: LOTS OF BAD STUFF in raid0: raid0145-19990824-2.2.11 is unstable

Roeland M.J. Meyer Sun, 3 Jan 1999 11:52:23 -0800
FWIW, I've had a RAID0 partition, doing UseNet, for the past year. No problems
found to date. It's a pair of 2GB SCSI-W drives on an Adaptec AHA290x controller.
The kernel is v2.0.37. I have had problems in the past where a disk was
flaky (but not completely dead) and the RAID stuff didn't help me much in
determining that. In those cases, I had to remount the disk, outside of the
RAID environment, and run single-disk diagnostics. In one extreme case, only
black-box replacement of the drive showed that it was the drive at fault
(process of elimination). I RMA'd the drive and everything has been working
fine since. I could not have diagnosed the problem had I not had spare
machines and extra hard drives to compare against.
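
For illustration only, a single-disk read check of the kind described above
could be sketched in Python roughly as follows. This is not the diagnostic
that was actually run; the function name `scan_device` and the chunk size are
assumptions made up for the example. It reads the device front to back and
records the offsets at which the OS reports an I/O error.

```python
import os


def scan_device(path, chunk_size=1024 * 1024):
    """Read `path` from start to end.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the byte
    offsets of chunks whose read raised an OSError. `scan_device` is a
    hypothetical helper, not part of any RAID tool.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Note the failing chunk and move on to the next one.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Pointed at the raw member disk (outside the RAID device, and typically as
root), a non-empty `bad_offsets` list points at the drive rather than at the
RAID layer. A clean pass does not prove the drive is good; a flaky disk can
still pass a single read scan.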

There are many cases where swap-n-pray is the ONLY diagnostic mechanism that
works (unless you have extensive hardware testing facilities, of course).
The lesson is, the problem's not always software.
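
For readers unfamiliar with the error discussed in the quoted message below:
a request "beyond end of device" is one whose sector number falls outside the
range the device covers. The following Python sketch shows the idea for a
simplified RAID0 layout with equal-sized member disks. It is an illustration
under those assumptions, not the kernel's code; `raid0_map` and its
parameters are hypothetical names.

```python
def raid0_map(sector, chunk_sectors, disk_sectors):
    """Map an array sector to (disk_index, sector_on_disk).

    Simplified striping: all member disks are assumed to be the same size,
    and only whole chunks on each disk are used. `disk_sectors` is a list of
    per-disk sizes in sectors. Raises ValueError for a sector outside the
    array, which is the condition the quoted error message describes.
    """
    ndisks = len(disk_sectors)
    chunks_per_disk = disk_sectors[0] // chunk_sectors
    array_sectors = chunks_per_disk * chunk_sectors * ndisks
    if not 0 <= sector < array_sectors:
        raise ValueError("attempt to access beyond end of device")
    # Chunks are dealt out to the disks round-robin.
    chunk, offset = divmod(sector, chunk_sectors)
    stripe, disk = divmod(chunk, ndisks)
    return disk, stripe * chunk_sectors + offset
```

With two 64-sector disks and 8-sector chunks, sectors 0 to 127 map onto the
disks and sector 128 is rejected. A bad sector number reaching this check can
come from a software bug or from corrupted data read off a failing disk, which
is why the message alone does not settle the hardware-versus-software question.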

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of David Mansfield
> Sent: Friday, November 05, 1999 11:29 AM
> To: Ingo Molnar
> Cc: Michael K. Johnson; [EMAIL PROTECTED];
> [EMAIL PROTECTED]; Alan Cox
> Subject: Re: LOTS OF BAD STUFF in raid0: raid0145-19990824-2.2.11 is unstable
>
>
>
> > it's 99.99% a problem with the disk. The RAID0 code has not had any
> > significant changes (due to it's simplicity) in the last couple of years.
> > We never rule out software bugs, but this is one of those cases where it's
> > way, way down in the list of potential problem sources.
> >
> > -- mingo
> >
>
> So all of the 'attempt to access beyond end of device' errors reported
> over the last 6 months, both RAID and standard kernel are all chalked up
> to hardware failure?  I mean, that's what this is.  It just hits the RAID0
> request layer BEFORE the standard check that generates the 'attempt...'
> error.  I believe there is a bug lurking, because people keep reporting
> the 'attempt...' error, and this is that exact problem, it just hits a
> different place.
>
> My $.02 (and very little kernel hacking exp :-( ) make me believe there is
> still one bug in the buffer-cache code (generic kernel) that causes this,
> but thanks for the feedback.  I'll shut-up now.
>
> David
>
> --
> /==============================\
> | David Mansfield              |
> | [EMAIL PROTECTED]             |
> \==============================/
>