Recently I have expanded the number of SCSI devices and SCSI
channels on my file server, and while for the most part things have been
working, a few problems remain that I have been unable to solve.
First of all, let me explain my SCSI layout. The machine is an XLT
366 DEC Alpha with 192MB of RAM, an onboard NCR8xxx fast SCSI controller, and an
Adaptec 2940 PCI fast SCSI card. SCSI chain zero is the Adaptec card,
while SCSI chain one is the NCR controller. Some devices are in the main
computer case; the rest are either external (in their own case) or in a mid-tower
computer case converted to hold multiple CD-ROM drives and hard drives.
scsi0:
  Second computer case (all data disks):
    4.3 GB IBM Ultra disk
    4.2 GB UltraWide Quantum disk with W->N adapter
    2.5 GB Seagate fast disk (x3)
    4.1 GB Seagate fast disk
scsi1:
  Primary computer case:
    2 GB IBM fast disk (system disk)
    Yamaha 4416S CD-R
  External cases:
    Exabyte 8700LT tape drive (8505XL tape drive type)
  Second computer case:
    6.7X, 4X, & 2X Toshiba SCSI CD-ROMs
I have double-checked both SCSI chains to verify that the cable length is less
than the maximum allowed for fast SCSI (3 m): scsi0 is about 2.1 m and scsi1 is
about 2.9 m. All connectors are also in good condition and solid. All
internal cables are 50-pin ribbon cables with connectors spaced at least 5 in
apart. External cables are HD50M<->HD50M cables. Lastly, the three 2.5GB
disks on scsi0 make up a RAID0 array, and eventually (once I solve these
problems) the 4.xGB disks will become a RAID5 array.
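The cable and array figures above can be checked with a little arithmetic. The Python sketch below is illustrative only: the 3 m limit and the measured lengths come from this post, the helper names are hypothetical, and the RAID formulas assume the usual layouts (RAID0 striping across equal-sized members, RAID5 giving up one member's worth of space to parity).

```python
from typing import Sequence

# Fast SCSI cable limit quoted above, in metres.
FAST_SCSI_MAX_M = 3.0

# Approximate measured chain lengths from this post, in metres.
CHAIN_LENGTHS_M = {"scsi0": 2.1, "scsi1": 2.9}


def cable_headroom_m(length_m: float, limit_m: float = FAST_SCSI_MAX_M) -> float:
    """Remaining cable budget in metres; a negative value means over the limit."""
    return round(limit_m - length_m, 2)


def raid0_capacity_gb(disks_gb: Sequence[float]) -> float:
    """Usable RAID0 size. The members here are equal-sized, so striping
    uses all of each disk and the usable size is simply their total."""
    return sum(disks_gb)


def raid5_capacity_gb(disks_gb: Sequence[float]) -> float:
    """Usable RAID5 size: one member's worth of space goes to parity,
    and every member is treated as the size of the smallest one."""
    return (len(disks_gb) - 1) * min(disks_gb)


if __name__ == "__main__":
    for chain, length in CHAIN_LENGTHS_M.items():
        print(f"{chain}: {length} m used, {cable_headroom_m(length)} m to spare")
    # The existing RAID0 array of three 2.5 GB Seagates.
    print("raid0:", raid0_capacity_gb([2.5, 2.5, 2.5]), "GB")
    # The planned RAID5 array built from the 4.3, 4.2 and 4.1 GB disks.
    print("raid5:", round(raid5_capacity_gb([4.3, 4.2, 4.1]), 2), "GB")
```

Run as a script, it reports about 0.9 m of spare cable on scsi0 but only about 0.1 m on scsi1, with usable sizes of 7.5 GB for the RAID0 array and 8.2 GB for the planned RAID5 array.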
Now for the problems I have been experiencing. They have been
consistent across a decent range of kernels, from 2.0.35 through
2.2.9 (not including the 2.1.x development kernels, of course).
* Now and then, when the load on scsi0 becomes rather high (i.e. four
or more disks being accessed heavily, such as during bulk copies or CD master
generation), a SCSI timeout occurs, and while the kernel tries to
recover, the result is endless timeouts/resets and essentially a system
crash. This can also occur during the massive fsck that runs when
rebooting after a crash, though in that case it usually recovers from the
SCSI timeouts (via SCSI bus resets).
* When using cdparanoia to copy CDDA data from audio CDs on the Toshiba
drives, I also get SCSI timeouts and transfer failures. This also
reliably crashes the system.
* When using dd (via xcdroast) to copy the contents of a CD in bulk to
the RAID0 array in preparation for making a CD-R copy, the copy usually
ends in a kernel panic just as it finishes. This leads me to think
that the system is trying to read beyond the end of the CD and hitting
something nasty.
* At times, when performing backups (especially incremental ones) to the
tape drive, a SCSI timeout occurs. This time the kernel is unable to reset the
SCSI bus, and the system usually crashes. New data tapes seem to have
solved this problem (the old ones were 1.5 years old).
The latter three problems are not as serious as the first
one, but all are quite annoying. I have not been able to find any
pattern or direct cause for the first two, while the third one appears to be a
kernel/xcdroast problem. If it is a software problem, I am not too worried, as
I can get a new version or work around it. It is the first two problems
that point to some sort of hardware problem.
I have shortened the SCSI chains as much as possible, and while
that has reduced the frequency of the problems to some extent, it has not
eliminated them. I have read through the comp.periphs.scsi FAQ several times
looking for clues, and the best I have been able to find is that fast SCSI
chains cannot really handle more than ~5 devices, and that those devices need
to be spaced farther apart on the chain than mine are.
I am also aware that Adaptec cards are not the greatest in terms
of quality, especially because of the auto-termination misfeature. I cannot
verify that the card's own termination is turned on (it has no internal
devices attached), since on the Alpha there is no way to enter the card's
BIOS. However, I am quite sure that when I last had the card in an Intel box,
the termination was NOT set to auto.
I have a BusLogic 930 (FlashPoint) SCSI card I would much rather
use, but apparently Alpha Linux does not support that card yet. :(
The only idea I have for solving this problem is to shorten the
scsi1 chain by moving the Toshiba drives to the primary computer case and
removing any connection to the second computer case, then get another SCSI
card and put the 2.5GB disks on one controller and the 4.xGB disks on the
other, using ribbon cables with more widely spaced connectors.
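To set the FAQ's rule of thumb (roughly five devices per fast SCSI chain) beside both the current and the proposed layouts, here is a small Python sketch that counts devices per chain. The device lists are transcribed from this post; the five-device threshold is only the FAQ heuristic as paraphrased above, not a hard limit; and `controller_a`/`controller_b` are hypothetical labels for the Adaptec and the additional card, since the post does not say which disk group goes on which card.

```python
from typing import Dict, List

# Rule of thumb from the FAQ as paraphrased above: fast SCSI chains struggle
# with more than roughly five devices. A heuristic, not a spec limit.
RULE_OF_THUMB_MAX = 5

Layout = Dict[str, List[str]]

# Current layout, transcribed from the device list earlier in this post.
current: Layout = {
    "scsi0": ["IBM 4.3GB", "Quantum 4.2GB", "Seagate 2.5GB #1",
              "Seagate 2.5GB #2", "Seagate 2.5GB #3", "Seagate 4.1GB"],
    "scsi1": ["IBM 2GB (system)", "Yamaha 4416S CD-R", "Exabyte 8700LT",
              "Toshiba 6.7X", "Toshiba 4X", "Toshiba 2X"],
}

# Proposed layout. "controller_a" and "controller_b" are hypothetical labels;
# the assignment of disk groups to cards here is arbitrary.
proposed: Layout = {
    "controller_a": ["Seagate 2.5GB #1", "Seagate 2.5GB #2", "Seagate 2.5GB #3"],
    "controller_b": ["IBM 4.3GB", "Quantum 4.2GB", "Seagate 4.1GB"],
    # Same devices as today; only the cabling gets shorter.
    "scsi1": list(current["scsi1"]),
}


def over_rule_of_thumb(layout: Layout, limit: int = RULE_OF_THUMB_MAX) -> Dict[str, int]:
    """Return {chain: device_count} for every chain with more than `limit` devices."""
    return {chain: len(devs) for chain, devs in layout.items() if len(devs) > limit}


if __name__ == "__main__":
    for name, layout in (("current", current), ("proposed", proposed)):
        counts = {chain: len(devs) for chain, devs in layout.items()}
        print(name, counts, "over:", over_rule_of_thumb(layout))
```

Under these assumptions, both current chains hold six devices. The proposed split brings each disk chain down to three, but scsi1 still holds six devices even though its cable gets shorter.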
So, finally, I am looking for advice on reducing or eliminating
these problems. Will my proposed solution above fix them, or is there
something else that I am missing? I am sure there are people out there
who have been down the same road I am on and could provide some
help. Thanks in advance! Sorry for the long post, but this is not exactly
a cut-and-dried problem. :(
----------------------------------------------------------------------------
| "For to me to live is Christ, and to die is gain." |
| --- Philippians 1:21 (KJV) |
----------------------------------------------------------------------------
| Ryan Kirkpatrick | Boulder, Colorado | [EMAIL PROTECTED] |
----------------------------------------------------------------------------
| http://www-ugrad.cs.colorado.edu/~rkirkpat/ |
----------------------------------------------------------------------------
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]