pseudo-crash on OpenBSD 4.5

2009-12-11 Thread bill234
Got a bit of an oddity with OpenBSD 4.5 - it's not quite a crash, but
close. It has happened 3 times now, usually after running flawlessly for
2-3 weeks.

Fully up to date with 4.5-stable, running GENERIC.MP on a Dell poweredge
R300 quad-core server with 4 gig ram (dmesg below). It's used as a
firewall/NAT/vpn gateway, and as an email server.

When the problem occurs, all services on the server stop responding
(pop,imap,smtp, etc).

The odd thing is that it does respond to ping, and the server still routes
traffic correctly, and the vpn is up.

The server console shows nothing out of the ordinary (white on black text
login prompt, no X11), but the console is frozen - doesn't respond to
keyboard.

Since it doesn't actually panic, I can't run the usual debug tools.

My only choice is to reboot.

This is my only quad-core server with 4 gig - I'm wondering if it's
related to GENERIC.MP or all the ram.

(I have many other openbsd 4.5 boxes, none have this issue, but they are
single-core and less than 3 gig ram)

Any suggestions? Dell has a bios upgrade, I'll give that a try.




OpenBSD 4.5-stable (GENERIC.MP) #2: Wed Nov  4 21:53:18 EST 2009
usern...@servername:/usr/src/sys/arch/i386/compile/GENERIC.MP
cpu0: Intel(R) Xeon(R) CPU X3323 @ 2.50GHz (GenuineIntel 686-class) 2.51 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,CX16,xTPR
real mem  = 3483598848 (3322MB)
avail mem = 3379777536 (3223MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 08/15/08, BIOS32 rev. 0 @ 0xfa520, SMBIOS rev. 2.5 @ 0xcfb9c000 (55 entries)
bios0: vendor Dell Inc. version 1.3.0 date 08/15/2008
bios0: Dell Inc. PowerEdge R300
acpi0 at bios0: rev 2
acpi0: tables DSDT FACP APIC SPCR HPET MCFG WD__ SLIC ERST HEST BERT EINJ TCPA
acpi0: wakeup devices PCI0(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: apic clock running at 333MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Xeon(R) CPU X3323 @ 2.50GHz (GenuineIntel 686-class) 2.50 GHz
cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,CX16,xTPR
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU X3323 @ 2.50GHz (GenuineIntel 686-class) 2.50 GHz
cpu2: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,CX16,xTPR
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Xeon(R) CPU X3323 @ 2.50GHz (GenuineIntel 686-class) 2.50 GHz
cpu3: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,CX16,xTPR
ioapic0 at mainbus0: apid 4 pa 0xfec0, version 20, 24 pins
ioapic0: misconfigured as apic 0, remapped to apid 4
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 5 (PEX4)
acpiprt2 at acpi0: bus 7 (PEX6)
acpiprt3 at acpi0: bus 1 (SBE4)
acpiprt4 at acpi0: bus 2 (SBE5)
acpiprt5 at acpi0: bus 10 (COMP)
acpicpu0 at acpi0: C3
acpicpu1 at acpi0: C3
acpicpu2 at acpi0: C3
acpicpu3 at acpi0: C3
bios0: ROM list: 0xc/0x9000 0xc9000/0x1000 0xca000/0x2000 0xcc000/0x5c00 0xec000/0x4000!
ipmi at mainbus0 not configured
cpu0: Enhanced SpeedStep disabled by BIOS
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 Intel 5100 Host rev 0x90
ppb0 at pci0 dev 2 function 0 Intel 5100 PCIE rev 0x90
pci1 at ppb0 bus 3
ppb1 at pci0 dev 3 function 0 Intel 5100 PCIE rev 0x90
pci2 at ppb1 bus 4
ppb2 at pci0 dev 4 function 0 Intel 5100 PCIE rev 0x90: apic 4 int 16 (irq 0)
pci3 at ppb2 bus 5
mpi0 at pci3 dev 0 function 0 Symbios Logic SAS1068E rev 0x08: apic 4 int 16 (irq 15)
scsibus0 at mpi0: 112 targets
sd0 at scsibus0 targ 0 lun 0: Dell, VIRTUAL DISK, 1028 SCSI3 0/direct fixed
sd0: 476416MB, 512 bytes/sec, 975699968 sec total
ses0 at scsibus0 targ 8 lun 0: DP, BACKPLANE, 1.05 SCSI3 13/enclosure services fixed
ppb3 at pci0 dev 5 function 0 Intel 5100 PCIE rev 0x90
pci4 at ppb3 bus 6
ppb4 at pci0 dev 6 function 0 Intel 5100 PCIE rev 0x90: apic 4 int 16 (irq 0)
pci5 at ppb4 bus 7
bge0 at pci5 dev 0 function 0 Broadcom BCM5722 rev 0x00, BCM5755 C0 (0xa200): apic 4 int 16 (irq 15), address 00:10:18:49:cd:f7
brgphy0 at bge0 phy 1: BCM5722 10/100/1000baseT PHY, rev. 0
ppb5 at pci0 dev 7 function 0 Intel 5100 PCIE rev 0x90
pci6 at ppb5 bus 8
pchb1 at pci0 dev 16 function 0 Intel 5100 FSB rev 0x90
pchb2 at pci0 dev 16 function 1 Intel 5100 FSB rev 0x90
pchb3 at pci0 dev 16 function 2 Intel 5100 FSB rev 0x90
pchb4 at pci0 dev 17 function 0 Intel 5100 Reserved rev 0x90
pchb5 at pci0 dev 19 function 0 Intel 5100 Reserved rev 0x90
pchb6 at pci0 dev 21 function 0 Intel 5100 DDR rev 0x90
pchb7 at pci0 dev 22 

Re: pseudo-crash on OpenBSD 4.5

2009-12-11 Thread andres
Quoting bill...@lavabit.com:

> Got a bit of an oddity with OpenBSD 4.5 - it's not quite a crash, but
> close. It has happened 3 times now, usually after running flawlessly for
> 2-3 weeks.
>
> Fully up to date with 4.5-stable, running GENERIC.MP on a Dell poweredge
> R300 quad-core server with 4 gig ram (dmesg below). It's used as a
> firewall/NAT/vpn gateway, and as an email server.
>
> When the problem occurs, all services on the server stop responding
> (pop,imap,smtp, etc).
>
> The odd thing is that it does respond to ping, and the server still routes
> traffic correctly, and the vpn is up.
>
> The server console shows nothing out of the ordinary (white on black text
> login prompt, no X11), but the console is frozen - doesn't respond to
> keyboard.
>
> Since it doesn't actually panic, I can't run the usual debug tools.
>
> My only choice is to reboot.
>
> This is my only quad-core server with 4 gig - I'm wondering if it's
> related to GENERIC.MP or all the ram.
>
> (I have many other openbsd 4.5 boxes, none have this issue, but they are
> single-core and less than 3 gig ram)
>
> Any suggestions? Dell has a bios upgrade, I'll give that a try.

Just to get two things out of the way: 1) try the sp kernel, and 2) drop down
to 2G of memory.

A friend ran across some Dell or HP system that didn't like having 4G of
RAM, so it's an easy test to do.

BIOS upgrades are almost always a great thing to try.

--STeve Andre'



Re: pseudo-crash on OpenBSD 4.5

2009-12-11 Thread Nick Guenther
On Fri, Dec 11, 2009 at 12:17 PM,  bill...@lavabit.com wrote:
> Got a bit of an oddity with OpenBSD 4.5 - it's not quite a crash, but
> close. It has happened 3 times now, usually after running flawlessly for
> 2-3 weeks.
>
> Fully up to date with 4.5-stable, running GENERIC.MP on a Dell poweredge
> R300 quad-core server with 4 gig ram (dmesg below). It's used as a
> firewall/NAT/vpn gateway, and as an email server.
>
> When the problem occurs, all services on the server stop responding
> (pop,imap,smtp, etc).
>
> The odd thing is that it does respond to ping, and the server still routes
> traffic correctly, and the vpn is up.
>
> The server console shows nothing out of the ordinary (white on black text
> login prompt, no X11), but the console is frozen - doesn't respond to
> keyboard.
>
> Since it doesn't actually panic, I can't run the usual debug tools.

I've definitely had this happen to me but never had conclusive proof
of the cause (because, as you say, all you can do is reboot). In fact I
have more information from a DD-WRT install: the web UI would stop
responding and traffic would slow to a crawl but not stop; we were 90%
sure the problem was memory pressure. When you get it back up, try
logging vmstat(8) every few minutes?
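
A minimal sketch of what that periodic logging could look like, assuming a
plain sh script started by hand or from rc.local. The function name
log_snapshot, the VMSTAT_LOG and INTERVAL variables, the "run" argument, and
the /var/log/vmstat.log path are all made up for illustration and are not
from the thread:

```shell
#!/bin/sh
# Append one timestamped snapshot of a command's output to a log file.
# Usage: log_snapshot <logfile> <command> [args...]
log_snapshot() {
    _log=$1
    shift
    {
        # Marker line so each sample can be matched to the time of a hang.
        echo "=== $(date '+%Y-%m-%d %H:%M:%S')"
        "$@"
    } >> "$_log" 2>&1
}

# Only loop when called with "run", so the file can also be sourced.
# Hypothetical defaults: sample vmstat every 300 seconds.
if [ "${1:-}" = "run" ]; then
    while :; do
        log_snapshot "${VMSTAT_LOG:-/var/log/vmstat.log}" vmstat
        sleep "${INTERVAL:-300}"
    done
fi
```

After the next hang and reboot, the last few entries before the gap in
timestamps would show whether free memory was shrinking or paging was
climbing. The same function could be pointed at other commands if vmstat
alone turns out not to be enough.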

-Nick