On Thu, 14 Jan 1999, Ola Sigurdson wrote:

> Don't rely on software RAID on the Compaqs!!!!
> 
> Unplugging active disks causes a 100 % repeatable kernel crash (total
> lockup).
> As far as I can tell it's caused by bugs in the NCR driver
> 
> This is with kernel 2.0.36 & Compaq Proliant 1600.

I would suggest the following tests:

1 - Upgrade to driver sym53c8xx-1.0a.
    ftp://ftp.tux.org/roudier/896/
        sym53c8xx-1.0.tar.gz            (full sources to move to 
                                         linux/drivers/scsi)
                 +
        sym53c8xx-1.0-to-1.0a.patch.gz  (kernel patch)

2 - Kill everything that may prevent kernel messages from being printed 
    to the console if the kernel becomes unable to perform disk I/O.
    (Killing syslogd and klogd should be enough.)

3 - Run something that writes to a file system and/or a partition
    without using RAID, and turn off the disk.

4 - Wait long enough for the SCSI timeouts to have a chance to occur.
    The timeout is 20 seconds on 2.0.35/36 kernels, but you can
    decrease it for the tests by changing the SD_TIMEOUT define in
    drivers/scsi/scsi.c to something like 5 seconds (5*HZ).
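
For step 3, any program that keeps writing and forcing the data to disk
will do. Here is a minimal C sketch, assuming a plain file on a non-RAID
file system located on the disk under test. The names write_load and
LOAD_BLOCK_SIZE are only illustrative and are not part of the driver or
the kernel.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define LOAD_BLOCK_SIZE 4096   /* illustrative block size, not a kernel constant */

/*
 * Hypothetical helper: write `blocks` blocks of LOAD_BLOCK_SIZE bytes to
 * `path`, calling fsync() after each one so the IO really reaches the disk.
 * Returns the number of blocks written and synced successfully. It stops
 * at the first error and reports it on stderr.
 */
long write_load(const char *path, long blocks)
{
    char buf[LOAD_BLOCK_SIZE];
    long done = 0;
    int fd;

    memset(buf, 0xA5, sizeof buf);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return 0;
    }

    while (done < blocks) {
        ssize_t n = write(fd, buf, sizeof buf);

        if (n != (ssize_t)sizeof buf) {
            fprintf(stderr, "write failed after %ld blocks: %s\n",
                    done, n < 0 ? strerror(errno) : "short write");
            break;
        }
        if (fsync(fd) < 0) {
            fprintf(stderr, "fsync failed after %ld blocks: %s\n",
                    done, strerror(errno));
            break;
        }
        done++;
    }

    close(fd);
    return done;
}
```

Call it from a small main() with a path on the disk you are going to turn
off and a block count large enough to outlast the timeout. If the driver
and kernel recover, the loop should stop with an IO error once the
timeouts have fired; if it never returns and nothing is printed to the
console, that is the hard lock-up I am asking about.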

Let me know if the system locks up hard under such a crash test (I mean
that no kernel messages about SCSI timeouts, resets and IO errors are
displayed on the console).

If it does not, repeat the same tests:

5 - Using a stock driver (preferably a version later than 3.1d)
6 - Using RAID.

Let me know. Thanks.


Regards,
   Gerard.
