Leonard N. Zubkoff wrote:
> 
> Generally, the Mylex PCI RAID controllers take disks offline when certain types
> of unrecoverable errors occur.  The driver will log the reason for any disk
> being killed as a console message.  Without further information as to precisely
> why the disks were taken offline and whether they all were taken offline
> simultaneously, it's hard to know what happened.  Firmware bugs in either the
> controller firmware or disk drives are a plausible reason, as would be a
> problem with the SCSI controller chip on the AcceleRAID, or an electrical
> problem on the SCSI bus.

I removed the Mylex controller, and since it's a production server
I cannot run experiments on it, so I cannot get any more information
(I'm sorry about that, because I find it very important to spend some
time helping maintainers fix things).
I also know that dmesg output is important for maintainers, so please find
below what I saved before removing the controller. Please also note that
the output contains many lines related to my attempts to force the
disks back online. I'm sorry not to have the very first error
messages, but there was so much output from the indirect troubles that
followed the initial problem that the beginning of dmesg was already
truncated when I logged on to the machine.

The most significant message is probably:
  DAC960#0: Physical Drive 0:x killed because of bad tag returned from drive
but I don't find it meaningful at all, and since there is no source code
available to read, I stopped trying to cope with this controller.
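
For anyone wanting to summarize such a log, here is a minimal Python
sketch that groups the "killed because ..." lines by physical drive. The
regular expression is only inferred from the messages pasted below, not
from the driver source, and the function name kill_reasons is made up
for illustration.

```python
import re
from collections import defaultdict

# Pattern inferred from the pasted log lines (an assumption, not taken
# from the DAC960 driver source), e.g.:
#   DAC960#0: Physical Drive 0:2 killed because of bad tag returned from drive
KILLED = re.compile(
    r"DAC960#(?P<ctrl>\d+): Physical Drive (?P<drive>\d+:\d+) "
    r"killed because (?P<reason>.+)$"
)

def kill_reasons(lines):
    """Map each physical drive to the kill reasons logged for it, in order."""
    reasons = defaultdict(list)
    for line in lines:
        m = KILLED.search(line.strip())
        if m:
            reasons[m.group("drive")].append(m.group("reason"))
    return dict(reasons)

# A few lines copied from the dmesg output in this message.
sample = [
    "DAC960#0: Physical Drive 0:2 killed because of bad tag returned from drive",
    "DAC960#0: Physical Drive 0:3 killed because of bad tag returned from drive",
    "DAC960#0: Physical Drive 0:2 killed because it was removed",
    "DAC960#0: Physical Drive 0:2 is now DEAD",
]

for drive, why in sorted(kill_reasons(sample).items()):
    print(drive, "->", "; ".join(why))
```

Fed the whole dmesg output, this would show at a glance whether every
drive was killed for the same reason.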

Also, if you want to test the controller yourself, I can ask my company
to send it to you to keep, since we are not going to use it any more.

In the coming months, I will use the same set of disks, with the same
cables (except the one connecting to the controller, since the pins are
different), driven by Linux software RAID and the Tekram DC390U2W, so I
will send you news of any failure that occurs.

I also discovered that Mylex does not use the last megabyte of each
disk, so I can use
  persistent-superblock 1
in my new /etc/raidtab file.
This may be interesting for you to know, since it means that switching
from Mylex AcceleRAID to Linux software RAID 0.90 can be done without
erasing the data.
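
As a rough check of the point above, the sketch below computes where a
0.90 persistent superblock would land on one of these disks. It assumes
512-byte sectors, that the 97691648 blocks reported in the log below is
the size Linux will see for the raw disk (it may not be), and the usual
0.90 placement rule: round the device size down to a 64 KiB boundary,
then step back 64 KiB. The function name is made up for illustration.

```python
SECTOR_BYTES = 512          # assumption: 512-byte sectors
MD_RESERVED_SECTORS = 128   # 64 KiB reserved at the end for the 0.90 superblock

def md090_superblock_sector(device_sectors):
    """Start sector of a 0.90 superblock on a device of the given size."""
    aligned = device_sectors & ~(MD_RESERVED_SECTORS - 1)
    return aligned - MD_RESERVED_SECTORS

# Block count taken from the "Physical Devices" lines in the log; treating
# it as the raw disk size is an assumption.
disk_sectors = 97691648

sb = md090_superblock_sector(disk_sectors)
tail_bytes = (disk_sectors - sb) * SECTOR_BYTES

print("superblock starts at sector", sb)
print("bytes from superblock start to end of disk:", tail_bytes)
print("inside the last megabyte:", tail_bytes <= 1024 * 1024)
```

This only shows where the superblock is written; it says nothing about
whether the Mylex stripe layout matches what the md driver expects.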

Regards, and many thanks for the great work that you do, even if my
personal experience is leading me to drop all sophisticated devices
in favor of simpler ones, with the sophisticated features performed
by free (source code available) software that I can read in case of
failure.

Hubert Tonneau
***** DAC960 RAID Driver Version 2.2.4 of 23 August 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <[EMAIL PROTECTED]>
Configuring Mylex DAC960PTL1 PCI RAID Controller
  Firmware Version: 4.06-0-60, Channels: 1, Memory Size: 8MB
  PCI Bus: 0, Device: 13, Function: 1, I/O Address: Unassigned
  PCI Address: 0xF6800000 mapped at 0xD0000000, IRQ Channel: 9
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 128/32
  Physical Devices:
    0:1  Vendor: SEAGATE   Model: ST150176LC        Revision: 0001
         Serial Number: NQ050821000019480B8Z
    0:2  Vendor: SEAGATE   Model: ST150176LC        Revision: 0001
         Serial Number: NQ0518520000194804HG
         Disk Status: Dead, 97691648 blocks, 4 resets
    0:3  Vendor: SEAGATE   Model: ST150176LC        Revision: 0001
         Serial Number: NQ05160200001948K2Z7
         Disk Status: Dead, 97691648 blocks, 4 resets
    0:4  Vendor: SEAGATE   Model: ST150176LC        Revision: 0001
         Serial Number: NQ01051700001948JQZA
         Disk Status: Dead, 97691648 blocks, 4 resets
    0:5  Vendor: SEAGATE   Model: ST150176LC        Revision: 0001
         Serial Number: NQ050859000019480B5E
         Disk Status: Dead, 97691648 blocks, 4 resets
    0:6  Vendor: SEAGATE   Model: ST150176LC        Revision: 0001
         Serial Number: NQ02821600001948JQMF
         Disk Status: Dead, 97691648 blocks, 4 resets
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Offline, 390766592 blocks, Write Back
  No Rebuild or Consistency Check in Progress

DAC960#0: Make Online of Physical Drive 0:6 Succeeded
DAC960#0: Physical Drive 0:6 is now ONLINE
DAC960#0: Make Online of Physical Drive 0:6 Illegal
DAC960#0: Make Online of Physical Drive 0:2 Succeeded
DAC960#0: Physical Drive 0:2 is now ONLINE
DAC960#0: Make Online of Physical Drive 0:3 Succeeded
DAC960#0: Physical Drive 0:3 is now ONLINE
DAC960#0: Make Online of Physical Drive 0:1 Illegal
DAC960#0: Make Online of Physical Drive 0:4 Succeeded
DAC960#0: Make Online of Physical Drive 0:5 Succeeded
DAC960#0: Physical Drive 0:4 is now ONLINE
DAC960#0: Physical Drive 0:5 is now ONLINE
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE
DAC960#0: Make Online of Physical Drive 0:6 Illegal
DAC960#0: Make Online of Physical Drive 0:1 Illegal
 rd/c0d0: unknown partition table
DAC960#0: Physical Drive 0:2 killed because of bad tag returned from drive
DAC960#0: Physical Drive 0:3 killed because of bad tag returned from drive
DAC960#0: Physical Drive 0:4 killed because of bad tag returned from drive
DAC960#0: Physical Drive 0:5 killed because of bad tag returned from drive
DAC960#0: Physical Drive 0:6 killed because of bad tag returned from drive
DAC960#0: Physical Drive 0:2 killed because it was removed
DAC960#0: Physical Drive 0:2 is now DEAD
DAC960#0: Physical Drive 0:3 is now DEAD
DAC960#0: Physical Drive 0:4 is now DEAD
DAC960#0: Physical Drive 0:5 is now DEAD
DAC960#0: Physical Drive 0:6 is now DEAD
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now OFFLINE
NET4: AppleTalk 0.18 for Linux NET4.0
DAC960#0: Make Online of Physical Drive 0:6 Failed - Unable to Start Device
DAC960#0: Make Online of Physical Drive 0:1 Illegal
DAC960#0: Make Online of Physical Drive 0:2 Failed - Unable to Start Device
DAC960#0: Make Online of Physical Drive 0:3 Failed - Unable to Start Device
DAC960#0: Make Online of Physical Drive 0:4 Failed - Unable to Start Device
