Hello,
My name is Mariella and I am dealing with a dual-processor
machine (Intel Pentium II) running Linux (Red Hat 5.2, kernel
version 2.2.10) with the following hardware:
- Ultra SCSI RAID controller (CMD, CRD 5500),
which manages an array of disks with a total capacity
of 3 terabytes,
- 4 Adaptec SCSI adapters (the one
connected to the Ultra SCSI controller is an
Adaptec AHA-294X Ultra2 SCSI host adapter,
Ultra-2 LVD/SE Wide),
- 1 GB of RAM.
The hardware is frequently under heavy load because of an indexing
process that runs very often.
For a few months we have been getting kernel errors, likely
related to hardware problems.
We tried to solve them by replacing all the SCSI cables;
the cables now seem to be OK and we no longer get those errors.
A few days ago, however, we started getting Linux kernel
messages saying:
kernel: scsi2 channel 0 : resetting for second half of retries.
kernel: SCSI bus is being reset for host 2 channel 0.
kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 260 30000
and after a while the load average rises (because no one can send
messages on the SCSI bus while it is being reset) so high that the
system crashes.
Host 2 (the one that resets the SCSI bus) is the Adaptec adapter that
talks to the CMD controller.
We tried switching to our other Adaptec adapter (which has the same
characteristics and is qualified for the SCSI controller) and we
get the same problem; the only difference is the host number in the
error messages.
We also replaced the SCSI cable that leads from the adapter to the controller.
We have a menu system through which we can inspect the SCSI controller,
but it doesn't show any kind of error.
Since yesterday we have also been getting other messages, such as:
kernel: (scsi2:0:0:0) Parity error during Data-Out phase.
kernel: scsi : aborting command due to timeout : pid 1214123, scsi2, channel 0, id 0, lun 0 Read (10) 00 14 7b 23 8d 00 00 02 00
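In case it helps when reading the timeout message, the bytes after
"Read (10)" can be decoded as the rest of a SCSI READ(10) command
descriptor block. This is only a minimal sketch, assuming the kernel
prints CDB bytes 1 to 9 after the opcode name; the function name
decode_read10 is illustrative and not taken from the kernel sources.

```python
def decode_read10(tail_hex):
    """Decode the nine bytes printed after 'Read (10)' (assumed CDB bytes 1..9)."""
    tail = bytes.fromhex(tail_hex)  # whitespace between byte pairs is ignored
    if len(tail) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    return {
        "flags": tail[0],                                     # CDB byte 1
        "lba": int.from_bytes(tail[1:5], "big"),              # CDB bytes 2-5
        "transfer_blocks": int.from_bytes(tail[6:8], "big"),  # CDB bytes 7-8
        "control": tail[8],                                   # CDB byte 9
    }


fields = decode_read10("00 14 7b 23 8d 00 00 02 00")
print(fields)
```

Under that assumption, the aborted command was a read of 2 blocks
starting at logical block 0x147b238d.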
I am writing to you because I tried to read the kernel sources
where those messages are generated, and I found
that you wrote that code.
Since at the moment I don't have a clue what to do,
I wonder if you could tell me under what conditions the kernel
emits those messages (Parity error during Data-Out phase, RESET).
Thanks for your time.
Regards,
Mariella
_____________________________
Mariella Di Giacomo
Los Alamos National Laboratory
Research Library, MS P362, P.O. Box 1663
Los Alamos, NM 87545-1362
Email: [EMAIL PROTECTED]
Phone: +1 (505) 665 4601
Fax: +1 (505) 665 6452