-- 
Debian GNU/Linux 2.0 is out! ( http://www.debian.org/ )
Email:  Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Package: kernel
Version: 2.0.34

submitted by: [EMAIL PROTECTED]  
symptom: system crash, scsi errors displayed

occasionally my system will stop doing almost all tasks,
spitting lots of scsi errors to whatever console is active,
continuing to allow virtual console switching and ip forwarding
but not logging remotely or doing anything that might require
disk access including swapping.

accessing large files several times seems to trigger the
abnormal behavior, which always requires a hard reset.

the disk is on a Tekram caching scsi controller which works
with the adaptec 1542 driver.  the scsi host adapter is
fully populated with 16M of ram and is connected only to
a CDROM drive and to a single hard drive.  the drive is
the last item on the chain and has active termination.
the external scsi port is terminated with a bank of
resistors on the card and is not in use.

the specific devices:
Configuring Adaptec (SCSI-ID 7) at IO:330, IRQ 11, DMA priority 5
scsi : 1 host.
  Vendor: Quantum   Model: XP32150W          Rev: L912
  Type:   Direct-Access                      ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
  Vendor: TOSHIBA   Model: CD-ROM XM-3401TA  Rev: 2873
  Type:   CD-ROM                             ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi0, channel 0, id 6, lun 0
scsi : detected 1 SCSI cdrom 1 SCSI disk total.
SCSI device sda: hdwr sector= 512 bytes. Sectors= 4199760 [2050 MB] [2.1 GB]

the specific errors (copied by hand -- syslog stops too, even remotely)

scsi: aborting due to timeout pid 27787, scsi0, channel
  0, id 0, lun 0 Write (10) 00 00 22 c0 11 00 00 02 00
SCSI host 0 abort (pid 27787) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
Sent BUS DEVICE RESET to target 0
Sending DID_RESET for target 0
Sending DID_RESET for target 0
Sending DID_RESET for target 0
aha1542_intr_handle:  Unexpected interrupt
tarstat=0 hastat=0 idlun=10 ccb#=1
aha1542_intr_handle:  Unexpected interrupt
tarstat=0 hastat=0 idlun=10 ccb#=2
Sending DID_RESET for target 0
Sending DID_RESET for target 0
aha1542_intr_handle:  Unexpected interrupt
tarstat=0 hastat=0 idlun=8 ccb#=6
aha1542_intr_handle:  Unexpected interrupt
tarstat=0 hastat=0 idlun=8 ccb#=7
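
for anyone reading the trace above, the bytes after "Write (10)" can be
decoded by hand.  a minimal python sketch, assuming the kernel printed the
nine command bytes that follow the opcode and that they use the standard
10-byte write layout (flags, 4-byte big-endian block address, reserved
byte, 2-byte big-endian length, control).  the name decode_write10 is made
up for illustration, and the sector count is the one reported for sda.

```python
import struct

TOTAL_SECTORS = 4199760  # "Sectors=" value reported for sda at boot


def decode_write10(tail_hex):
    """Decode the nine bytes that follow the opcode of a 10-byte write.

    Assumption: the log omits the opcode byte and prints the rest in hex.
    """
    tail = bytes.fromhex(tail_hex)
    if len(tail) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    # flags, big-endian block address, reserved byte,
    # big-endian transfer length in blocks, control byte
    flags, lba, _reserved, blocks, control = struct.unpack(">BIBHB", tail)
    return {"flags": flags, "lba": lba, "blocks": blocks, "control": control}


fields = decode_write10("00 00 22 c0 11 00 00 02 00")
in_range = fields["lba"] + fields["blocks"] <= TOTAL_SECTORS
print(hex(fields["lba"]), fields["blocks"], in_range)
```

under those assumptions the command that timed out was a 2-block write at
block 0x22c011, which lies inside the disk, so the request itself looks
ordinary.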

another time today it wigged.  the "aborting.." message was
scrolling by too fast to copy, but the rest of the messages were

aha1542_intr_handle:  Unexpected interrupt
tarstat=0, hastat=0 idlun=8 ccb#=0
Sending DID_RESET for target 0
aha1542_intr_handle:  Unexpected interrupt
tarstat=0, hastat=0 idlun=8 ccb#=1
Sending DID_RESET for target 0
aha1542_intr_handle:  Unexpected interrupt
tarstat=0, hastat=0 idlun=8 ccb#=2
Sending DID_RESET for target 0
aha1542_intr_handle:  Unexpected interrupt
tarstat=0, hastat=0 idlun=8 ccb#=3
Sending DID_RESET for target 0
aha1542_intr_handle:  Unexpected interrupt
tarstat=0, hastat=0 idlun=8 ccb#=4
Sending DID_RESET for target 0
aha1542_intr_handle:  Unexpected interrupt
tarstat=0, hastat=0 idlun=8 ccb#=5
Sending DID_RESET for target 0
aha1542_intr_handle:  Unexpected interrupt
tarstat=0, hastat=0 idlun=8 ccb#=6
Sending DID_RESET for target 0
aha1542_intr_handle:  Unexpected interrupt
tarstat=0, hastat=0 idlun=8 ccb#=7
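
the idlun values in these messages can be unpacked too.  a rough python
sketch, assuming (from my reading of the aha1542 driver, so please check
it against the source) that the value is printed in hex and packs the
target id in bits 7-5, an outbound-data flag in bit 4, an inbound-data
flag in bit 3 and the lun in bits 2-0.  decode_idlun is a made-up name.

```python
def decode_idlun(value_hex):
    """Split an idlun value, as printed in the log, into its fields.

    Assumed layout: target in bits 7-5, outbound flag in bit 4,
    inbound flag in bit 3, lun in bits 2-0.
    """
    value = int(value_hex, 16)
    return {
        "target": (value >> 5) & 0x7,
        "outbound": bool(value & 0x10),  # data going to the device
        "inbound": bool(value & 0x08),   # data coming from the device
        "lun": value & 0x7,
    }


first_trace = decode_idlun("10")
second_trace = decode_idlun("8")
print(first_trace)
print(second_trace)
```

if that layout is right, idlun=10 marks outbound (write) commands and
idlun=8 marks inbound (read) commands, both for target 0, lun 0, which
is the hard drive.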

a friend has suggested that adaptec controllers and linux
drivers just don't play well together and recommends 
another type of controller.  is he on target?

is there any more info i can dig from this system to be helpful?

thank you,

duncan.

