UPDATE:

It might not be related, but one of our PowerEdge 2850s in the US (the 2950 was in... Argentina) just went kaboom in almost the same way. It stopped accessing the disk; it was responsive but you couldn't do anything, and the console (IP KVM) showed "ami0: timeout ccb" several times. This is an older system with a different LSI card, running OpenBSD 3.9 with RAID 10. This one went even further: on reboot (IP power strips :P), during the RAID card initialization it said "TBBU cache data is invalid" and then:

DRAM/NVRAM cfg match
Disks have good cfg but they do not match DRAM cfg
Firmware cannot flush cache

After resaving the RAID configuration it started OK and everything seems nominal. Nothing in the logs. It was very similar to the other problem, but both could have been crappy Dell hardware; I'm just mentioning this in case anyone gets an idea. Something funny to mention: this machine was in a CARP setup, and it kept sending advertisements over the carp interface even though it had no disk, so the other machine never took over as MASTER.
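
Since nothing showed up in the logs on either box, one thing that might help is scanning whatever console capture or log file is available for the two driver messages quoted in this thread. The following is only a rough sketch in Python: the function name, the pattern labels, and the regexes are my own assumptions about how these messages look in a capture, not anything taken from the ami or mfi drivers.

```python
import re
from collections import Counter

# The two messages quoted in this thread. The regexes are an assumption
# about how they appear in a captured console or log file.
PATTERNS = {
    "ami_timeout": re.compile(r"\bami\d+: timeout ccb\b"),
    "sd_not_queued": re.compile(r"\bsd\d+: not queued: error \d+\b"),
}


def count_storage_errors(lines):
    """Count how many lines contain each known storage error message."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Any iterable of strings works as input, so an open file object for a saved console capture can be passed directly to count_storage_errors(). It only tells you how often the messages appeared, not why.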

Thanks,
Alejandro.

dmesg:

OpenBSD 3.9-stable (GENERIC) #5: Thu Jan  4 19:36:23 GMT 2007
   [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Xeon(TM) CPU 2.80GHz ("GenuineIntel" 686-class) 2.80 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,CNXT-ID
real mem  = 2146807808 (2096492K)
avail mem = 1952804864 (1907036K)
using 4278 buffers containing 107442176 bytes (104924K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(00) BIOS, date 01/09/06, BIOS32 rev. 0 @ 0xffe90
pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfb600/320 (18 entries)
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82801EB/ER LPC" rev 0x00)
pcibios0: PCI bus #9 is the last bus
bios0: ROM list: 0xc0000/0xb000! 0xcb000/0x1000 0xcc000/0x800 0xcc800/0x1000 0xcd800/0x2600 0xec000/0x4000!
ipmi0 at mainbus0: version 1.5 interface KCS iobase 0xca8/8 spacing 4
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Intel E7520 MCH" rev 0x09
ppb0 at pci0 dev 2 function 0 "Intel MCH PCIE" rev 0x09
pci1 at ppb0 bus 1
ppb1 at pci1 dev 0 function 0 "Intel IOP331 Channel 0" rev 0x06
pci2 at ppb1 bus 2
mpt0 at pci2 dev 5 function 0 "Symbios Logic 53c1030" rev 0x08: irq 7
scsibus0 at mpt0: 16 targets
mpt1 at pci2 dev 5 function 1 "Symbios Logic 53c1030" rev 0x08: irq 3
scsibus1 at mpt1: 16 targets
ppb2 at pci1 dev 0 function 2 "Intel IOP331 Channel 1" rev 0x06
pci3 at ppb2 bus 3
ami0 at pci3 dev 11 function 0 "Symbios Logic MegaRAID" rev 0x01: irq 3 Dell 518 64b/lhc
ami0: FW 352A, BIOS v1.10, 128MB RAM
ami0: 2 channels, 0 FC loops, 1 logical drives
scsibus2 at ami0: 40 targets
sd0 at scsibus2 targ 0 lun 0: <AMI, Host drive #00, > SCSI2 0/direct fixed
sd0: 279800MB, 279800 cyl, 64 head, 32 sec, 512 bytes/sec, 573030400 sec total
scsibus3 at ami0: 16 targets
safte0 at scsibus3 targ 6 lun 0: <PE/PV, 1x6 SCSI BP, 1.0> SCSI2 3/processor fixed
scsibus4 at ami0: 16 targets
ppb3 at pci0 dev 4 function 0 "Intel MCH PCIE" rev 0x09
pci4 at ppb3 bus 4
ppb4 at pci0 dev 5 function 0 "Intel MCH PCIE" rev 0x09
pci5 at ppb4 bus 5
ppb5 at pci5 dev 0 function 0 "Intel PCIE-PCIE" rev 0x09
pci6 at ppb5 bus 6
em0 at pci6 dev 7 function 0 "Intel PRO/1000MT (82541GI)" rev 0x05: irq 11, address 00:18:8b:34:86:bd
ppb6 at pci5 dev 0 function 2 "Intel PCIE-PCIE" rev 0x09
pci7 at ppb6 bus 7
em1 at pci7 dev 8 function 0 "Intel PRO/1000MT (82541GI)" rev 0x05: irq 3, address 00:18:8b:34:86:be
ppb7 at pci0 dev 6 function 0 "Intel MCH PCIE" rev 0x09
pci8 at ppb7 bus 8
uhci0 at pci0 dev 29 function 0 "Intel 82801EB/ER USB" rev 0x02: irq 11
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1 "Intel 82801EB/ER USB" rev 0x02: irq 10
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2 at pci0 dev 29 function 2 "Intel 82801EB/ER USB" rev 0x02: irq 7
usb2 at uhci2: USB revision 1.0
uhub2 at usb2
uhub2: Intel UHCI root hub, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ehci0 at pci0 dev 29 function 7 "Intel 82801EB/ER USB2" rev 0x02: irq 5
usb3 at ehci0: USB revision 2.0
uhub3 at usb3
uhub3: Intel EHCI root hub, rev 2.00/1.00, addr 1
uhub3: 6 ports with 6 removable, self powered
ppb8 at pci0 dev 30 function 0 "Intel 82801BA AGP" rev 0xc2
pci9 at ppb8 bus 9
vga1 at pci9 dev 13 function 0 "ATI Radeon VE QY" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ichpcib0 at pci0 dev 31 function 0 "Intel 82801EB/ER LPC" rev 0x02
pciide0 at pci0 dev 31 function 1 "Intel 82801EB/ER IDE" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
atapiscsi0 at pciide0 channel 0 drive 0
scsibus5 at atapiscsi0: 2 targets
cd0 at scsibus5 targ 0 lun 0: <HL-DT-ST, CDRW/DVD GCC4244, B101> SCSI0 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 disabled (no drives)
isa0 at ichpcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pmsi0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pmsi0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
biomask efed netmask efed ttymask ffef
pctr: user-level cycle counter enabled
uhub4 at uhub3 port 3
uhub4: Dell product 0xa001, rev 2.00/0.00, addr 2
uhub4: 2 ports with 2 removable, self powered, multiple transaction translators
dkcsum: sd0 matches BIOS drive 0x80
root on sd0a
rootdev=0x400 rrootdev=0xd00 rawdev=0xd02
WARNING: / was not properly unmounted

Marco Peereboom wrote:

I honestly have no clue.  I have banged on my mfis as much as I could
and have never seen anything like this.  I am doing some investigation
into this.  If you find a way to repro this let me know please.

On Mon, Feb 12, 2007 at 01:12:56PM -0500, Alejandro Lozanoff wrote:
I was planning on running bonnie tonight (this is a production server)
to generate some I/O and see if I can reproduce the problem.
We have three of these servers with OpenBSD 4.0-stable and the same RAID
card; they have been running for about 2 months now, and this is the
first problem we have encountered. If you have any idea how to reproduce
this, I would be glad to test it on one of the servers that we are not
using right now.

Thanks for replying
Alejandro.-

Quoting Marco Peereboom <[EMAIL PROTECTED]>:

I have never seen this, but I am very interested in this particular
instance.  Apparently there is an issue with read-ahead on mfi that I
have never seen before on OpenBSD but that other OSes have run into.

Is this reproducible?  If so can you try to disable read ahead in
CTRL-R (bios)?

Thanks,
/marco

On Mon, Feb 12, 2007 at 11:31:26AM -0500, Alejandro Lozanoff wrote:
Sorry for the message without a body, I'm a little sleepy and hit the
wrong button... :p

Ok,

I had this problem last night on one of our shiny Dell PowerEdge 2950s
with RAID 10 and SAS disks.
For no apparent reason it started screaming "sd0: not queued: error 5"
on the console. The server didn't crash (no core, no trace) but was in
an unresponsive state; I couldn't log in or do anything. I was wondering
if this error could be a hardware problem (the RAID card, maybe?).
The logs don't show anything notable, and there was no heavy I/O at the
time of the problem. I'm kind of clueless as to what could have caused
it; a quick search turned up only some old messages about problems with
SCSI cards.

This is 4.0 Stable as shown below.

dmesg follows:

OpenBSD 4.0-stable (GENERIC) #1: Mon Nov 27 16:23:49 GMT 2006
  [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Xeon(TM) CPU 3.00GHz ("GenuineIntel" 686-class) 3 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,CNXT-ID,CX16
cpu0: Enhanced SpeedStep disabled by BIOS
real mem  = 1072955392 (1047808K)
avail mem = 970735616 (947984K)
using 4256 buffers containing 53751808 bytes (52492K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(00) BIOS, date 06/21/06, BIOS32 rev. 0 @ 0xffe90, SMBIOS rev. 2.4 @ 0x3ffbc000 (62 entries)
bios0: Dell Inc. PowerEdge 2950
pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfade0/384 (22 entries)
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 6321ESB LPC" rev 0x00)
pcibios0: PCI bus #16 is the last bus
bios0: ROM list: 0xc0000/0x9000! 0xc9000/0x1000 0xca000/0x1800 0xcb800/0x5200 0xec000/0x4000!
ipmi0 at mainbus0: version 2.0 interface KCS iobase 0xca8/8 spacing 4
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Intel 5000X Host" rev 0x12
ppb0 at pci0 dev 2 function 0 "Intel 5000 PCIE" rev 0x12
pci1 at ppb0 bus 6
ppb1 at pci1 dev 0 function 0 "Intel 6321ESB PCIE" rev 0x01
pci2 at ppb1 bus 7
ppb2 at pci2 dev 0 function 0 "Intel 6321ESB PCIE" rev 0x01
pci3 at ppb2 bus 8
ppb3 at pci3 dev 0 function 0 "ServerWorks PCIE-PCIX" rev 0xc2
pci4 at ppb3 bus 9
bnx0 at pci4 dev 0 function 0 "Broadcom BCM5708" rev 0x11: irq 5, address 00:18:8b:72:c0:fd
brgphy0 at bnx0 phy 1: BCM5708C 10/100/1000baseT PHY, rev. 5
ppb4 at pci2 dev 1 function 0 "Intel 6321ESB PCIE" rev 0x01
pci5 at ppb4 bus 10
ppb5 at pci1 dev 0 function 3 "Intel 6321ESB PCIE-PCIX" rev 0x01
pci6 at ppb5 bus 11
ppb6 at pci0 dev 3 function 0 "Intel 5000 PCIE" rev 0x12
pci7 at ppb6 bus 1
ppb7 at pci7 dev 0 function 0 "Intel IOP333 PCIE-PCIX" rev 0x00
pci8 at ppb7 bus 2
mfi0 at pci8 dev 14 function 0 "Dell PERC 5" rev 0x00: irq 6
mfi0: logical drives 1, version 5.0.1-0030, 256MB RAM
scsibus0 at mfi0: 1 targets
sd0 at scsibus0 targ 0 lun 0: <DELL, PERC 5/i, 1.00> SCSI3 0/direct fixed
sd0: 278784MB, 278784 cyl, 64 head, 32 sec, 512 bytes/sec, 570949632 sec total
ppb8 at pci7 dev 0 function 2 "Intel IOP333 PCIE-PCIX" rev 0x00
pci9 at ppb8 bus 3
ppb9 at pci0 dev 4 function 0 "Intel 5000 PCIE" rev 0x12
pci10 at ppb9 bus 12
ppb10 at pci0 dev 5 function 0 "Intel 5000 PCIE" rev 0x12
pci11 at ppb10 bus 13
ppb11 at pci0 dev 6 function 0 "Intel 5000 PCIE" rev 0x12
pci12 at ppb11 bus 14
ppb12 at pci0 dev 7 function 0 "Intel 5000 PCIE" rev 0x12
pci13 at ppb12 bus 15
pchb1 at pci0 dev 16 function 0 "Intel 5000 Error Reporting" rev 0x12
pchb2 at pci0 dev 16 function 1 "Intel 5000 Error Reporting" rev 0x12
pchb3 at pci0 dev 16 function 2 "Intel 5000 Error Reporting" rev 0x12
pchb4 at pci0 dev 17 function 0 "Intel 5000 Reserved" rev 0x12
pchb5 at pci0 dev 19 function 0 "Intel 5000 Reserved" rev 0x12
pchb6 at pci0 dev 21 function 0 "Intel 5000 FBD" rev 0x12
pchb7 at pci0 dev 22 function 0 "Intel 5000 FBD" rev 0x12
ppb13 at pci0 dev 28 function 0 "Intel 6321ESB PCIE" rev 0x09
pci14 at ppb13 bus 4
ppb14 at pci14 dev 0 function 0 "ServerWorks PCIE-PCIX" rev 0xc2
pci15 at ppb14 bus 5
bnx1 at pci15 dev 0 function 0 "Broadcom BCM5708" rev 0x11: irq 5, address 00:18:8b:72:c0:fb
brgphy1 at bnx1 phy 1: BCM5708C 10/100/1000baseT PHY, rev. 5
uhci0 at pci0 dev 29 function 0 "Intel 6321ESB USB" rev 0x09: irq 11
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1 "Intel 6321ESB USB" rev 0x09: irq 10
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2 at pci0 dev 29 function 2 "Intel 6321ESB USB" rev 0x09: irq 11
usb2 at uhci2: USB revision 1.0
uhub2 at usb2
uhub2: Intel UHCI root hub, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ehci0 at pci0 dev 29 function 7 "Intel 6321ESB USB" rev 0x09: irq 11
usb3 at ehci0: USB revision 2.0
uhub3 at usb3
uhub3: Intel EHCI root hub, rev 2.00/1.00, addr 1
uhub3: 6 ports with 6 removable, self powered
ppb15 at pci0 dev 30 function 0 "Intel 82801BA AGP" rev 0xd9
pci16 at ppb15 bus 16
vga1 at pci16 dev 13 function 0 "ATI ES1000" rev 0x02
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ichpcib0 at pci0 dev 31 function 0 "Intel 6321ESB LPC" rev 0x09: PM disabled
pciide0 at pci0 dev 31 function 1 "Intel 6321ESB IDE" rev 0x09: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
atapiscsi0 at pciide0 channel 0 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <HL-DT-ST, CD-ROM GCR-8240N, 1.10> SCSI0 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 ignored (disabled)
isa0 at ichpcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
biomask ffc5 netmask ffe5 ttymask ffe7
pctr: user-level cycle counter enabled
uhub4 at uhub3 port 5
uhub4: Cypress Semiconductor USB2 Hub, rev 2.00/0.0b, addr 2
uhub4: 4 ports with 4 removable, self powered, multiple transaction translators
uhidev0 at uhub1 port 1 configuration 1 interface 0
uhidev0: CHESEN PS2 to USB Converter, rev 1.10/0.10, addr 2, iclass 3/1
ukbd0 at uhidev0: 8 modifier keys, 6 key codes
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev1 at uhub1 port 1 configuration 1 interface 1
uhidev1: CHESEN PS2 to USB Converter, rev 1.10/0.10, addr 2, iclass 3/1
uhidev1: 3 report ids
ums0 at uhidev1 reportid 1: 5 buttons and Z dir.
wsmouse0 at ums0 mux 0
uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
uhid1 at uhidev1 reportid 3: input=3, output=0, feature=0
dkcsum: sd0 matches BIOS drive 0x80
root on sd0a
rootdev=0x400 rrootdev=0xd00 rawdev=0xd02
WARNING: / was not properly unmounted

