-- 
Debian GNU/Linux 2.0 is out! ( http://www.debian.org/ )
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Package: kernel
Version: 2.0.34
Submitted by: [EMAIL PROTECTED]
Symptom: system crash, SCSI errors displayed

Occasionally my system will stop doing almost all tasks and spit lots of SCSI errors to whatever console is active. It still allows virtual console switching and IP forwarding, but it does not log remotely or do anything that might require disk access, including swapping. Accessing large files several times seems to trigger the abnormal behavior, which always requires a hard reset.

The disk is on a Tekram caching SCSI controller, which works with the Adaptec 1542 driver. The SCSI host adapter is fully populated with 16M of RAM and is connected only to a CD-ROM drive and a single hard drive. The drive is the last item on the chain and has active termination. The external SCSI port is terminated with a bank of resistors on the card and is not in use.

The specific devices:

Configuring Adaptec (SCSI-ID 7) at IO:330, IRQ 11, DMA priority 5
scsi : 1 host.
  Vendor: Quantum  Model: XP32150W  Rev: L912
  Type: Direct-Access  ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
  Vendor: TOSHIBA  Model: CD-ROM XM-3401TA  Rev: 2873
  Type: CD-ROM  ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi0, channel 0, id 6, lun 0
scsi : detected 1 SCSI cdrom 1 SCSI disk total.
SCSI device sda: hdwr sector= 512 bytes. Sectors= 4199760 [2050 MB] [2.1 GB]

The specific errors (copied by hand; syslog stops too, even remotely):

scsi: aborting due to timeout pid 27787, scsi0, channel 0, id 0, lun 0 Write (10) 00 00 22 c0 11 00 00 02 00
SCSI host 0 abort (pid 27787) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
Sent BUS DEVICE RESET to target 0
Sending DID_RESET for target 0
Sending DID_RESET for target 0
Sending DID_RESET for target 0
aha1542_intr_handle: Unexpected interrupt tarstat=0 hastat=0 idlun=10 ccb#=1
aha1542_intr_handle: Unexpected interrupt tarstat=0 hastat=0 idlun=10 ccb#=2
Sending DID_RESET for target 0
Sending DID_RESET for target 0
aha1542_intr_handle: Unexpected interrupt tarstat=0 hastat=0 idlun=8 ccb#=6
aha1542_intr_handle: Unexpected interrupt tarstat=0 hastat=0 idlun=8 ccb#=7

Another time today it wigged out. The "aborting..." message was scrolling by too fast to copy, but the rest of the messages were:

aha1542_intr_handle: Unexpected interrupt tarstat=0, hastat=0 idlun=8 ccb#=0
Sending DID_RESET for target 0
aha1542_intr_handle: Unexpected interrupt tarstat=0, hastat=0 idlun=8 ccb#=1
Sending DID_RESET for target 0
aha1542_intr_handle: Unexpected interrupt tarstat=0, hastat=0 idlun=8 ccb#=2
Sending DID_RESET for target 0
aha1542_intr_handle: Unexpected interrupt tarstat=0, hastat=0 idlun=8 ccb#=3
Sending DID_RESET for target 0
aha1542_intr_handle: Unexpected interrupt tarstat=0, hastat=0 idlun=8 ccb#=4
Sending DID_RESET for target 0
aha1542_intr_handle: Unexpected interrupt tarstat=0, hastat=0 idlun=8 ccb#=5
Sending DID_RESET for target 0
aha1542_intr_handle: Unexpected interrupt tarstat=0, hastat=0 idlun=8 ccb#=6
Sending DID_RESET for target 0
aha1542_intr_handle: Unexpected interrupt tarstat=0, hastat=0 idlun=8 ccb#=7

A friend has suggested that Adaptec controllers and Linux drivers just don't play well together, and he recommends another type of controller. Is he on target? Is there any more info I can dig out of this system that would be helpful?

Thank you,
Duncan
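The first timeout message includes the raw bytes of the command that timed out. Assuming the 2.0 kernel prints the opcode by name ("Write (10)", opcode 0x2a) followed by the remaining nine bytes of the 10-byte CDB, those bytes can be decoded with the standard WRITE(10) field layout: logical block address in bytes 2 to 5, transfer length in bytes 7 and 8. A minimal sketch under that assumption; `decode_write10_tail` is a hypothetical helper, not part of any kernel or SCSI tool:

```python
# Hypothetical helper: decode the bytes printed after "Write (10)".
# ASSUMPTION: they are CDB bytes 1..9; byte 0 (opcode 0x2a) appears only as its name.
def decode_write10_tail(hex_bytes: str) -> dict:
    tail = bytes.fromhex(hex_bytes)  # fromhex ignores the spaces between bytes
    if len(tail) != 9:
        raise ValueError("expected 9 bytes after the WRITE(10) opcode")
    cdb = bytes([0x2A]) + tail  # rebuild the full 10-byte CDB
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return {"lba": lba, "blocks": blocks, "control": cdb[9]}


DISK_SECTORS = 4199760  # from "SCSI device sda: ... Sectors= 4199760"
SECTOR_SIZE = 512       # from "hdwr sector= 512 bytes"

if __name__ == "__main__":
    info = decode_write10_tail("00 00 22 c0 11 00 00 02 00")
    print(info)
    print("within disk:", info["lba"] + info["blocks"] <= DISK_SECTORS)
    print("bytes written:", info["blocks"] * SECTOR_SIZE)
```

For the logged bytes this gives LBA 2277393 and a transfer of 2 blocks (1024 bytes at 512 bytes per sector), which lies inside the 4199760-sector disk reported at boot.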
