On Sat, May 26, 2007 at 12:51:09PM +0100, Steve Fairhead wrote:
> Hi folks,
> 
> One of the servers (running 4.0, generic, fully patched) I'm responsible for
> has had a panic (see title line). I'll confess right away that I wasn't able
> to run trace or ps; I was away from the machine at the time and had to guide
> a colleague by phone through restarting the machine in a hurry - he had an
> office full of users breathing down his neck...
> 
> Briefly: this machine runs an external 3TB RAID array (a Nexsan ATAboy) via
> an Adaptec 29160 SCSI card; the RAID array is configured as four logical
> drives. Checking the logs, I see a bunch of parity errors a few days before,
> and then another bunch immediately prior to the panic. (The log lines, and
> the dmesg, follow my sig.) After restarting, the ATAboy self-diagnostics
> reported no errors. (I've run other tests which have reassured me we've lost
> no data.) The log shows errors on three of the four drives, which perhaps is
> unsurprising if it's the SCSI connection which wobbled.
> 
> Are there any known issues with this SCSI card or driver (ahc)? Or do we
> just have flakey hardware? I've run memtest86+ ad nauseam etc etc with no
> issues at all, so I'm fairly confident about the base machine, but now
> unsure about the Adaptec card. The machine has otherwise been running
> happily with no errors or issues for several months now. Perhaps
> significantly, a large amount of data was being copied to the RAID array at
> the time, but this had been done many times before without issue.
> 
> All cluebats gratefully received.
> 
> Steve
> http://www.fivetrees.com

There are many known issues with ahc, known in the sense that
mysterious errors do occur on apparently random instances of
identical hardware. But if your hardware has worked up to this point
without error, I would tend to discount ahc as the problem. Assuming
the driver is correctly reporting parity errors while reading data
off the bus, it would appear that the data path between your external
box and the server is flaky or being disturbed in some way, and
eventually corrupt data gets through.
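
To see at a glance which logical drives reported errors, something like
the following would do. It is only a sketch: the function name
tally_parity_errors and the regular expression are illustrative, written
against the log format in the extract quoted below, and the sample lines
are the first line of each parity-error record copied from that extract.

```python
import re
from collections import Counter

# First line of each parity-error record from the quoted
# /var/log/messages extract (mail-wrapped continuation lines omitted).
LOG = """\
May 18 04:27:30 hglserver /bsd: sd3(ahc0:4:4): parity error detected in
May 18 04:27:30 hglserver /bsd: sd3(ahc0:4:4): parity error detected in
May 18 04:27:30 hglserver /bsd: sd3(ahc0:4:4): parity error detected in
May 18 04:27:30 hglserver /bsd: sd3(ahc0:4:4): parity error detected in
May 23 16:53:56 hglserver /bsd: sd1(ahc0:4:2): parity error detected in
May 23 16:54:22 hglserver /bsd: sd2(ahc0:4:3): parity error detected in
May 23 16:54:25 hglserver /bsd: sd2(ahc0:4:3): parity error detected in
May 23 16:54:27 hglserver /bsd: sd2(ahc0:4:3): parity error detected in
May 23 16:54:27 hglserver /bsd: sd2(ahc0:4:3): parity error detected in
May 23 16:54:38 hglserver /bsd: sd1(ahc0:4:2): parity error detected in
"""

# Hypothetical pattern: matches "sdN(ahcX:T:L): parity error detected"
# as it appears in this particular log, nothing more general.
PARITY_RE = re.compile(r"\b(sd\d+)\(ahc\d+:\d+:\d+\): parity error detected")


def tally_parity_errors(text):
    """Count parity-error records per sd device in the given log text."""
    counts = Counter()
    for line in text.splitlines():
        match = PARITY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for device, count in sorted(tally_parity_errors(LOG).items()):
        print(device, count)
```

On the extract you posted this reports errors for sd1, sd2 and sd3 and
none for sd0, which fits a problem on the shared data path better than a
problem with one logical drive.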

.... Ken

> 
> *** Extracts from /var/log/messages:
> 
> May 18 04:27:30 hglserver /bsd: sd3(ahc0:4:4): parity error detected in
> Data-in phase. SEQADDR(0x55) SCSIRATE(0xc2)
> May 18 04:27:30 hglserver /bsd:         CRC Value Mismatch
> May 18 04:27:30 hglserver /bsd: sd3(ahc0:4:4): parity error detected in
> Data-in phase. SEQADDR(0x63) SCSIRATE(0xc2)
> May 18 04:27:30 hglserver /bsd:         CRC Value Mismatch
> May 18 04:27:30 hglserver /bsd: sd3(ahc0:4:4): parity error detected in
> Data-in phase. SEQADDR(0x63) SCSIRATE(0xc2)
> May 18 04:27:30 hglserver /bsd:         CRC Value Mismatch
> May 18 04:27:30 hglserver /bsd: sd3(ahc0:4:4): parity error detected in
> Data-in phase. SEQADDR(0x4e) SCSIRATE(0xc2)
> May 18 04:27:30 hglserver /bsd:         CRC Value Mismatch
> 
> (note: 4:27 corresponds to a time during which I run a crontab'ed rsync from
> another machine for partial offsite backup.)
> 
> ... <snip> ...
> 
> May 23 16:53:56 hglserver /bsd: sd1(ahc0:4:2): parity error detected in
> Data-in phase. SEQADDR(0x1a7) SCSIRATE(0xc2)
> May 23 16:53:56 hglserver /bsd:         CRC Value Mismatch
> May 23 16:54:22 hglserver /bsd: sd2(ahc0:4:3): parity error detected in
> Data-in phase. SEQADDR(0x84) SCSIRATE(0xc2)
> May 23 16:54:22 hglserver /bsd:         CRC Value Mismatch
> May 23 16:54:25 hglserver /bsd: sd2(ahc0:4:3): parity error detected in
> Data-in phase. SEQADDR(0x54) SCSIRATE(0xc2)
> May 23 16:54:25 hglserver /bsd:         CRC Value Mismatch
> May 23 16:54:27 hglserver /bsd: sd2(ahc0:4:3): parity error detected in
> Data-in phase. SEQADDR(0x54) SCSIRATE(0xc2)
> May 23 16:54:27 hglserver /bsd:         CRC Value Mismatch
> May 23 16:54:27 hglserver /bsd: sd2(ahc0:4:3): parity error detected in
> Data-in phase. SEQADDR(0x54) SCSIRATE(0xc2)
> May 23 16:54:27 hglserver /bsd:         CRC Value Mismatch
> May 23 16:54:38 hglserver /bsd: sd1(ahc0:4:2): parity error detected in
> Data-in phase. SEQADDR(0x1a7) SCSIRATE(0xc2)
> May 23 16:54:38 hglserver /bsd:         CRC Value Mismatch
> May 23 18:31:21 hglserver syslogd: restart
> May 23 18:31:21 hglserver /bsd: start = 0, len = 9793, fs = /s1
> May 23 18:31:21 hglserver /bsd: panic: ffs_alloccg: map corrupted
> 
> (note: panic occurred at 16:54; machine restarted at 18:31 after lengthy
> fscks...)
> 
> *** dmesg:
> 
> OpenBSD 4.0-stable (GENERIC) #10: Mon May 14 20:04:41 BST 2007
>     [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
> cpu0: AMD Sempron(tm) 2400+ ("AuthenticAMD" 686-class, 256KB L2 cache) 1.67
> GHz
> cpu0:
> FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,
> FXSR,SSE
> real mem  = 1073246208 (1048092K)
> avail mem = 971010048 (948252K)
> using 4256 buffers containing 53764096 bytes (52504K) of memory
> mainbus0 (root)
> bios0 at mainbus0: AT/286+(00) BIOS, date 12/08/04, BIOS32 rev. 0 @ 0xfda50,
> SMBIOS rev. 2.3 @ 0xf0630 (29 entries)
> pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
> pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf7f00/192 (10 entries)
> pcibios0: PCI Interrupt Router at 000:17:0 ("VIA VT8237 ISA" rev 0x00)
> pcibios0: PCI bus #1 is the last bus
> bios0: ROM list: 0xc0000/0x9000 0xc9000/0x5400
> cpu0 at mainbus0
> pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
> pchb0 at pci0 dev 0 function 0 "VIA VT8377 PCI" rev 0x80
> ppb0 at pci0 dev 1 function 0 "VIA VT8377 AGP" rev 0x00
> pci1 at ppb0 bus 1
> vga1 at pci1 dev 0 function 0 "Matrox MGA G400/G450 AGP" rev 0x85
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> re0 at pci0 dev 10 function 0 "Realtek 8169" rev 0x10: irq 5, address
> 00:14:6c:c0:28:60
> rgephy0 at re0 phy 7: RTL8169S/8110S PHY, rev. 0
> ahc0 at pci0 dev 12 function 0 "Adaptec AHA-29160 U160" rev 0x02: irq 11
> scsibus0 at ahc0: 16 targets
> sd0 at scsibus0 targ 4 lun 1: <NEXSAN, ATAboy2, 4r41> SCSI2 0/direct fixed
> sd0: 858306MB, 53644 cyl, 128 head, 256 sec, 512 bytes/sec, 1757812500 sec
> total
> sd1 at scsibus0 targ 4 lun 2: <NEXSAN, ATAboy2, 4r41> SCSI2 0/direct fixed
> sd1: 858306MB, 53644 cyl, 128 head, 256 sec, 512 bytes/sec, 1757812500 sec
> total
> sd2 at scsibus0 targ 4 lun 3: <NEXSAN, ATAboy2, 4r41> SCSI2 0/direct fixed
> sd2: 858306MB, 53644 cyl, 128 head, 256 sec, 512 bytes/sec, 1757812500 sec
> total
> sd3 at scsibus0 targ 4 lun 4: <NEXSAN, ATAboy2, 4r41> SCSI2 0/direct fixed
> sd3: 286012MB, 35751 cyl, 128 head, 128 sec, 512 bytes/sec, 585753906 sec
> total
> pciide0 at pci0 dev 15 function 0 "VIA VT6420 SATA" rev 0x80: DMA
> pciide0: using irq 10 for native-PCI interrupt
> pciide1 at pci0 dev 15 function 1 "VIA VT82C571 IDE" rev 0x06: ATA133,
> channel 0 configured to compatibility, channel 1 configured to compatibility
> wd0 at pciide1 channel 0 drive 0: <Maxtor 7L250R0>
> wd0: 16-sector PIO, LBA48, 239372MB, 490234752 sectors
> wd1 at pciide1 channel 0 drive 1: <Maxtor 7L250R0>
> wd1: 16-sector PIO, LBA48, 239372MB, 490234752 sectors
> wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 6
> wd1(pciide1:0:1): using PIO mode 4, Ultra-DMA mode 6
> atapiscsi0 at pciide1 channel 1 drive 0
> scsibus1 at atapiscsi0: 2 targets
> cd0 at scsibus1 targ 0 lun 0: <SONY, DVD RW DW-D23A, CYS1> SCSI0 5/cdrom
> removable
> cd0(pciide1:1:0): using PIO mode 4, Ultra-DMA mode 4
> uhci0 at pci0 dev 16 function 0 "VIA VT83C572 USB" rev 0x81: irq 11
> usb0 at uhci0: USB revision 1.0
> uhub0 at usb0
> uhub0: VIA UHCI root hub, rev 1.00/1.00, addr 1
> uhub0: 2 ports with 2 removable, self powered
> uhci1 at pci0 dev 16 function 1 "VIA VT83C572 USB" rev 0x81: irq 11
> usb1 at uhci1: USB revision 1.0
> uhub1 at usb1
> uhub1: VIA UHCI root hub, rev 1.00/1.00, addr 1
> uhub1: 2 ports with 2 removable, self powered
> uhci2 at pci0 dev 16 function 2 "VIA VT83C572 USB" rev 0x81: irq 10
> usb2 at uhci2: USB revision 1.0
> uhub2 at usb2
> uhub2: VIA UHCI root hub, rev 1.00/1.00, addr 1
> uhub2: 2 ports with 2 removable, self powered
> uhci3 at pci0 dev 16 function 3 "VIA VT83C572 USB" rev 0x81: irq 10
> usb3 at uhci3: USB revision 1.0
> uhub3 at usb3
> uhub3: VIA UHCI root hub, rev 1.00/1.00, addr 1
> uhub3: 2 ports with 2 removable, self powered
> ehci0 at pci0 dev 16 function 4 "VIA VT6202 USB" rev 0x86: irq 5
> usb4 at ehci0: USB revision 2.0
> uhub4 at usb4
> uhub4: VIA EHCI root hub, rev 2.00/1.00, addr 1
> uhub4: 8 ports with 8 removable, self powered
> viapm0 at pci0 dev 17 function 0 "VIA VT8237 ISA" rev 0x00
> iic0 at viapm0
> auvia0 at pci0 dev 17 function 5 "VIA VT8233 AC97" rev 0x60: irq 5
> ac97: codec id 0x434d4983 (C-Media Electronics CMI9761A+)
> audio0 at auvia0
> vr0 at pci0 dev 18 function 0 "VIA RhineII-2" rev 0x78: irq 11, address
> 00:0b:6a:c0:3f:14
> ukphy0 at vr0 phy 1: Generic IEEE 802.3u media interface, rev. 10: OUI
> 0x004063, model 0x0032
> isa0 at mainbus0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pmsi0 at pckbc0 (aux slot)
> pckbc0: using irq 12 for aux slot
> wsmouse0 at pmsi0 mux 0
> pcppi0 at isa0 port 0x61
> midi0 at pcppi0: <PC speaker>
> spkr0 at pcppi0
> lpt0 at isa0 port 0x378/4 irq 7
> lm0 at isa0 port 0x290/8: W83697HF
> npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
> pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
> fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
> fd1 at fdc0 drive 1: density unknown
> biomask ef6d netmask ef6d ttymask ffef
> pctr: user-level cycle counter enabled
> mtrr: Pentium Pro MTRR support
> ahc0: target 4 using 16bit transfers
> ahc0: target 4 synchronous at 80.0MHz DT, offset = 0x30
> dkcsum: sd0 matches BIOS drive 0x82
> dkcsum: sd1 matches BIOS drive 0x83
> dkcsum: sd2 matches BIOS drive 0x84
> dkcsum: sd3 matches BIOS drive 0x85
> dkcsum: wd0 matches BIOS drive 0x80
> dkcsum: wd1 matches BIOS drive 0x81
> root on wd0a
> rootdev=0x0 rrootdev=0x300 rawdev=0x302
> WARNING: / was not properly unmounted
> re0: watchdog timeout
> re0: watchdog timeout
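
As a sanity check on the dmesg above, the sector counts reported for
sd0 through sd3 can be converted back to sizes. This is a minimal sketch;
the names SECTORS, size_mb and total_bytes are illustrative, and it
assumes the "MB" in the sd lines means units of 1024*1024 bytes rounded
down, which is what the figures in the dmesg are consistent with.

```python
SECTOR_BYTES = 512  # "512 bytes/sec" in the sd lines of the dmesg

# Sector totals copied from the sd0..sd3 lines of the dmesg.
SECTORS = {
    "sd0": 1757812500,
    "sd1": 1757812500,
    "sd2": 1757812500,
    "sd3": 585753906,
}


def size_mb(sectors):
    """Size in units of 1024*1024 bytes, rounded down (assumed convention)."""
    return sectors * SECTOR_BYTES // (1024 * 1024)


def total_bytes(sector_counts):
    """Total capacity in bytes across all listed logical drives."""
    return sum(sector_counts.values()) * SECTOR_BYTES


if __name__ == "__main__":
    for name, sectors in sorted(SECTORS.items()):
        print(name, size_mb(sectors), "MB")
    print("total bytes:", total_bytes(SECTORS))
```

The per-drive figures come out at 858306MB and 286012MB, matching the
dmesg, and the four logical drives together hold just under 3 * 10^12
bytes, which agrees with the array being described as 3TB.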
