Kurt Garloff wrote:
> 
> On Sat, Mar 18, 2000 at 12:38:00AM +0100, Michael Stumpf wrote:
> > I have a problem with a SCSI hard disk IBM DCAS-34330 connected to an
> > AHA-2940.
> > I think the error is caused by the domain validation.
> > Then it did this "Domain validation" and immediately afterwards the errors
> > occurred.
> > No login was possible anymore.
> >
> > 1. What is this domain validation ?
> 
> The controller tests whether the negotiated speed can be used safely and,
> if not, reduces its speed. AFAIK, this is done by writing to and reading
> from the device's buffer.

It is supposed to be done that way on SCSI-3 compliant devices.  However, in
the current aic7xxx driver, I don't do it that way.  Instead, I keep a CRC of
the INQUIRY data and simply check that.  It won't catch subtle errors, only
gross errors in the transfers.  But it will catch those gross errors on all
devices, not just SCSI-3 devices.
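
A minimal sketch of the idea, not the driver's actual code: keep a CRC of a
known-good INQUIRY response, re-read INQUIRY at the negotiated speed, and
compare.  The function names, the number of re-reads, the use of CRC-32 via
zlib, and the sample INQUIRY bytes are all assumptions made for illustration.

```python
import zlib

def inquiry_crc(inquiry_data: bytes) -> int:
    """CRC-32 of an INQUIRY response (stand-in for whatever CRC the driver uses)."""
    return zlib.crc32(inquiry_data) & 0xFFFFFFFF

def validate_domain(reference_crc: int, read_inquiry, attempts: int = 3) -> bool:
    """Re-read INQUIRY data `attempts` times and compare each CRC to the reference.

    `read_inquiry` is a hypothetical callable that issues INQUIRY at the
    negotiated speed and returns the raw response bytes.
    """
    for _ in range(attempts):
        if inquiry_crc(read_inquiry()) != reference_crc:
            return False  # gross transfer error: caller should fall back to a lower speed
    return True

# Made-up 36-byte INQUIRY response, for illustration only (not real device output).
SAMPLE_INQUIRY = bytes(8) + b"IBM     " + b"DCAS-34330      " + b"0000"

if __name__ == "__main__":
    reference = inquiry_crc(SAMPLE_INQUIRY)
    # Clean transfer: every re-read matches the stored CRC.
    print(validate_domain(reference, lambda: SAMPLE_INQUIRY))
    # Corrupted transfer: one flipped bit changes the CRC.
    corrupted = bytes([SAMPLE_INQUIRY[0] ^ 0x01]) + SAMPLE_INQUIRY[1:]
    print(validate_domain(reference, lambda: corrupted))
```

Because the check only ever moves the same small, fixed INQUIRY payload, it
says nothing about data patterns that payload does not exercise, which is why
it only catches gross errors.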

> > 2. Is there a bugfix available ?
> 
> First a bug has to be spotted. Maybe Doug knows about one?

Nope.

> There's the possibility that the device mixes up the data written to the
> buffer by WRITE_BUFFER with the data that it holds while reading from or
> writing to the disk. Could be either a firmware or a driver bug.

Except that given the description I gave above, you can see that isn't the
case ;-)

> > 3. Is it a hardware problem ?
> >    (the system was previously running under novell 3.2 w/o any problems)
> 
> Could be. The domain validation was most probably triggered by some sort of
> bug, most probably a parity error.

Typically, that's the cause, yes.

> > 4. Is there a way to force the Domain validation. -> reproducing the bug ?
> 
> Don't know.

Nope.

> > Mar 16 23:46:41 linux kernel: (scsi0:0:0:0) Performing Domain validation.
> > Mar 16 23:46:41 linux kernel: (scsi0:0:0:0) Successfully completed Domain
> > validation.
> > Mar 16 23:46:57 linux kernel: attempt to access beyond end of device
> > Mar 16 23:46:57 linux kernel: 08:03: rw=0, want=764136424, limit=1049600
> > Mar 16 23:46:57 linux kernel: dev 08:03 blksize=4096 blocknr=1801646841
> > sector=1528272840 size=4096 count=1
> > ...
> 
> Looks like filesystem corruption to me. A block with no. 764136424 certainly
> does not exist, but the filesystem most probably points to such a block,
> that's why the driver tries to read it.
> 
> The question is what was causing it.

There were 16 seconds between the domain validation and the filesystem error;
that seems more coincidental than causal to me.
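
The numbers in the quoted log can be cross-checked with a little arithmetic.
The sketch below assumes that "sector" counts 512-byte sectors, that "want"
and "limit" are in 1 KiB units, and that the sector value was held in a 32-bit
variable; those are assumptions about how the kernel derived the values, but
they do reproduce the logged figures exactly.

```python
SECTOR_SIZE = 512          # bytes per sector (assumed unit of "sector")

# Values copied from the quoted kernel log.
blksize = 4096
blocknr = 1801646841
size = 4096
logged_sector = 1528272840
logged_want = 764136424
limit_kib = 1049600        # assumed to be in 1 KiB units

# Block number -> 512-byte sector, wrapped to 32 bits (assumption).
sectors_per_block = blksize // SECTOR_SIZE
sector = (blocknr * sectors_per_block) % 2**32

# End of the request in 1 KiB units: (start sector + sector count) / 2.
count = size // SECTOR_SIZE
want_kib = (sector + count) // 2

# Number of 4 KiB blocks that fit in the partition; valid block numbers are below this.
total_blocks = limit_kib * 1024 // blksize

if __name__ == "__main__":
    print("computed sector:", sector, "logged:", logged_sector)
    print("computed want:  ", want_kib, "logged:", logged_want)
    print("blocks in partition:", total_blocks, "requested block:", blocknr)
```

The requested block number is several thousand times larger than the number
of blocks in the partition, which is consistent with a garbage block pointer
being read from the filesystem rather than a slightly-off request.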


-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
      Please check my web site for aic7xxx updates/answers before
                      e-mailing me about problems

