Re: 5.4 (GENERIC) box has begun to randomly reboot

2014-08-09 Thread Craig R. Skinner
On 2014-08-05 Tue 16:13 PM |, STeve Andre' wrote:
 
> In decreasing order I'd say 5) motherboard problem,  4) power
> supply, 3) memory, 2) cabling failure, 1) disk controller.
 

Thanks gents.

After a night with the power off, the same phantom rebooting started
within 10 minutes the next day.

The used computer shop downstairs is on summer holidays, so I swapped
the disks, cables & memory into another chassis I found in the spare
room. This has been stable since.

Someone suggested looking for swollen/domed capacitors on the main
board (Supermicro), but nothing out of the ordinary was seen.

Onward,
Craig.



5.4 (GENERIC) box has begun to randomly reboot

2014-08-05 Thread Craig R. Skinner
Hi,

A reliable box has begun to randomly reboot in the last couple of days.

There's nothing obviously unusual in /var/log/*

$ ls -ld /var/crash
drwxrwx---  2 root  wheel  512 Dec 24  2013 /var/crash/
$ ls -lA /var/crash
total 4
-rw-r--r--  1 root  wheel  5 Jul 30  2013 minfree
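
The listing shows only minfree, so no kernel dump was saved, which is consistent with a hardware reset rather than a panic. A minimal sketch of that check, assuming savecore's usual bsd.N / bsd.N.core file naming; the function name is made up for illustration:

```shell
# Count saved kernel core dumps in a crash directory.
# Assumes savecore(8) naming (bsd.N plus bsd.N.core); the function
# name and the default directory argument are illustrative only.
count_dumps() {
	dir=${1:-/var/crash}
	n=0
	for f in "$dir"/bsd.*.core; do
		# An unmatched glob stays literal, so test that the file exists.
		[ -e "$f" ] && n=$((n + 1))
	done
	echo "$n"
}
```

Run it as root or as a member of wheel, since /var/crash is mode 770 here, e.g. `count_dumps /var/crash`.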

I set up a 1 min cron job of: sysctl | fgrep hw.sensors.lm1.temp && uptime
The last one before a reboot was:
hw.sensors.lm1.temp0=34.00 degC
hw.sensors.lm1.temp2=33.50 degC
 2:53PM  up 31 mins, 2 users, load averages: 0.13, 0.19, 0.23
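
A minimal sketch of such a one-minute logging job. The script path, log file and function names are assumptions for illustration, not taken from the post:

```shell
#!/bin/sh
# Sketch of a one-minute sensor logger. Script path, log path and
# function names are assumptions, not taken from the original post.

# Keep only the lm1 temperature lines from sensor output on stdin.
filter_temps() {
	grep -F 'hw.sensors.lm1.temp'
}

# Print one timestamped sample. "$@" is the command that dumps the
# sensors, e.g. `sysctl hw.sensors` on OpenBSD.
log_sample() {
	date '+%Y-%m-%d %H:%M:%S'
	"$@" | filter_temps
	uptime
}

# Intended call from the script body on the OpenBSD box:
#   log_sample sysctl hw.sensors
# Hypothetical crontab entry to run it every minute:
#   * * * * * /usr/local/sbin/templog.sh >> /var/log/templog 2>&1
```

Adding a timestamp to each sample makes it easier to line the last entry before a reboot up against other logs.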

I'm guessing some bit of hardware is on its way out, but which?

$ ls -l /var/run/dmesg.boot
-rw-r--r--  1 root  wheel  3612 Aug  5 14:58 /var/run/dmesg.boot


OpenBSD 5.4 (GENERIC) #37: Tue Jul 30 12:05:01 MDT 2013
dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel Pentium III (GenuineIntel 686-class, 128KB L2 cache) 635 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PSE36,MMX,FXSR,SSE,PERF
real mem  = 535228416 (510MB)
avail mem = 515035136 (491MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 01/15/99, BIOS32 rev. 0 @ 0xfdb70, SMBIOS rev. 2.0 @ 0xf0480 (24 entries)
bios0: vendor American Megatrends Inc. version 063101 date 01/15/99
bios0: Supermicro Computer Intel 810
apm0 at bios0: Power Management spec V1.2
acpi at bios0 function 0x0 not configured
pcibios0 at bios0: rev 2.1 @ 0xf/0x1
pcibios0: PCI BIOS has 9 Interrupt Routing table entries
pcibios0: PCI Interrupt Router at 000:31:0 (Intel 82801AA LPC rev 0x00)
pcibios0: PCI bus #1 is the last bus
bios0: ROM list: 0xc/0x8000
cpu0 at mainbus0: (uniprocessor)
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 Intel 82810E Host rev 0x03
vga1 at pci0 dev 1 function 0 Intel 82810E Video rev 0x03
intagp0 at vga1
agp0 at intagp0: aperture at 0xec00, size 0x400
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ppb0 at pci0 dev 30 function 0 Intel 82801AA Hub-to-PCI rev 0x02
pci1 at ppb0 bus 1
rl0 at pci1 dev 0 function 0 Realtek 8139 rev 0x10: irq 11, address 00:90:47:05:99:6d
rlphy0 at rl0 phy 0: RTL internal PHY
rl1 at pci1 dev 1 function 0 Realtek 8139 rev 0x10: irq 10, address 00:90:47:05:30:e8
rlphy1 at rl1 phy 0: RTL internal PHY
ichpcib0 at pci0 dev 31 function 0 Intel 82801AA LPC rev 0x02: 24-bit timer at 3579545Hz
pciide0 at pci0 dev 31 function 1 Intel 82801AA IDE rev 0x02: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: ST3250820A
wd0: 16-sector PIO, LBA48, 238475MB, 488397168 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
wd1 at pciide0 channel 1 drive 0: Maxtor 5A320J0
wd1: 16-sector PIO, LBA48, 308921MB, 632672208 sectors
wd1(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
uhci0 at pci0 dev 31 function 2 Intel 82801AA USB rev 0x02: irq 5
ichiic0 at pci0 dev 31 function 3 Intel 82801AA SMBus rev 0x02: irq 10
iic0 at ichiic0
spdmem0 at iic0 addr 0x50: 256MB SDRAM non-parity PC133CL2
spdmem1 at iic0 addr 0x51: 256MB SDRAM non-parity PC133CL2
auich0 at pci0 dev 31 function 5 Intel 82801AA AC97 rev 0x02: irq 10, ICH AC97
ac97: codec id 0x43525934 (Cirrus Logic CS4299 rev 4)
ac97: codec features headphone, 20 bit DAC, 18 bit ADC, Crystal Semi 3D
audio0 at auich0
isa0 at ichpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
wbsio0 at isa0 port 0x2e/2: W83627HF rev 0x13
lm1 at wbsio0 port 0x290/8: W83627HF
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
usb0 at uhci0: USB revision 1.0
uhub0 at usb0 Intel UHCI root hub rev 1.00/1.00 addr 1
mtrr: Pentium Pro MTRR support
vscsi0 at root
scsibus0 at vscsi0: 256 targets
softraid0 at root
scsibus1 at softraid0: 256 targets
root on wd0a (0e3aa2ac975978d6.a) swap on wd0b dump on wd0b
WARNING: / was not properly unmounted
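
The "not properly unmounted" warning shows the box went down without a clean shutdown. To pin down when each reboot happened, one option is to log the boot time next to each sample and look for changes. A minimal sketch, assuming a hypothetical two-column log of "sample_epoch boot_epoch" pairs; on OpenBSD the second column could probably come from `sysctl -n kern.boottime`, which is an assumption to verify on the box:

```shell
# Report each change of boot time in a two-column log read on stdin.
# The "sample_epoch boot_epoch" format and the function name are
# assumptions for this sketch. A line could be produced with e.g.
#   printf '%s %s\n' "$(date +%s)" "$(sysctl -n kern.boottime)"
find_reboots() {
	awk '
		NR > 1 && $2 != prev_boot {
			# prev_sample is the last time the box was known alive.
			printf "last_alive=%s booted=%s\n", prev_sample, $2
		}
		{ prev_sample = $1; prev_boot = $2 }
	'
}
```

Usage would be something like `find_reboots < /var/log/bootlog` (hypothetical path). With one-minute samples, the gap between last_alive and booted is only accurate to about a minute.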



Re: 5.4 (GENERIC) box has begun to randomly reboot

2014-08-05 Thread STeve Andre'

On 08/05/14 10:02, Craig R. Skinner wrote:

> Hi,
>
> A reliable box has begun to randomly reboot in the last couple of days.
>
> There's nothing obviously unusual in /var/log/*
>
> $ ls -ld /var/crash
> drwxrwx---  2 root  wheel  512 Dec 24  2013 /var/crash/
> $ ls -lA /var/crash
> total 4
> -rw-r--r--  1 root  wheel  5 Jul 30  2013 minfree
>
> I set up a 1 min cron job of: sysctl | fgrep hw.sensors.lm1.temp && uptime
> The last one before a reboot was:
> hw.sensors.lm1.temp0=34.00 degC
> hw.sensors.lm1.temp2=33.50 degC
>  2:53PM  up 31 mins, 2 users, load averages: 0.13, 0.19, 0.23
>
> I'm guessing some bit of hardware is on its way out, but which?
>
> $ ls -l /var/run/dmesg.boot
> -rw-r--r--  1 root  wheel  3612 Aug  5 14:58 /var/run/dmesg.boot
>
>
> OpenBSD 5.4 (GENERIC) #37: Tue Jul 30 12:05:01 MDT 2013
> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> cpu0: Intel Pentium III (GenuineIntel 686-class, 128KB L2 cache) 635 MHz
> cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PSE36,MMX,FXSR,SSE,PERF
> real mem  = 535228416 (510MB)
> avail mem = 515035136 (491MB)



So, a nice venerable P III. I have several Dells of that vintage, all
running well after 10+ years.

Me, I'd get the memtest CD and use that for a start.  Easy.

In decreasing order I'd say 5) motherboard problem,  4) power
supply, 3) memory, 2) cabling failure, 1) disk controller.

I did once have a really strange crashing problem, which
turned out to be the on-board IDE controller.  I put a Siig
SATA controller in it and it still works today.  So a variant on 5).

Don't forget about dust in and around the fans.  I'd take it outside
and use compressed air of some kind to clean it.

Good luck...

--STeve Andre'



Re: 5.4 (GENERIC) box has begun to randomly reboot

2014-08-05 Thread Nick Holland
On 08/05/14 10:02, Craig R. Skinner wrote:
> Hi,
>
> A reliable box has begun to randomly reboot in the last couple of days.
...
> I'm guessing some bit of hardware is on its way out, but which?

I think that's a safe guess.

> OpenBSD 5.4 (GENERIC) #37: Tue Jul 30 12:05:01 MDT 2013
> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> cpu0: Intel Pentium III (GenuineIntel 686-class, 128KB L2 cache) 635 MHz
> cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PSE36,MMX,FXSR,SSE,PERF
> real mem  = 535228416 (510MB)
> avail mem = 515035136 (491MB)

This box has done its time.
I love running on old hw as much as anyone, but really, it isn't worth
troubleshooting a box like this.  If you spend an hour on it, it's not
worth it...if it crashes one more time, it's not worth it.

Go find yourself a P4 someone is tossing out, pull your 250G disk out of
this box, put it in the P4.  Lose the 30G.  Enjoy the faster
performance.  Not only is the CPU faster, you should be running at UDMA
mode 4 or 5, and you might get a lot more RAM if you are lucky.
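
For reference, the Ultra-DMA mode shown in a dmesg maps to a nominal peak transfer rate (mode 2 is about 33 MB/s, mode 4 about 67 MB/s, mode 5 is 100 MB/s). A rough sketch that pulls the mode out of wd lines like the ones in the dmesg above and prints the nominal rate; the function name is made up for illustration:

```shell
# Print the nominal peak rate for each "Ultra-DMA mode N" line on stdin.
# Rates are the standard ATA figures for modes 0-6; the function name
# is an assumption for this sketch.
udma_rates() {
	awk '
		BEGIN { split("16.7 25.0 33.3 44.4 66.7 100 133", rate, " ") }
		match($0, /Ultra-DMA mode [0-9]/) {
			# The mode digit is the last character of the match.
			mode = substr($0, RSTART + RLENGTH - 1, 1)
			disk = $1
			sub(/\(.*/, "", disk)
			printf "%s: UDMA mode %d, up to %s MB/s\n", disk, mode, rate[mode + 1]
		}
	'
}
```

On the box itself this could be run as `udma_rates < /var/run/dmesg.boot`.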

If it still crashes and reboots, it's your disk or the power you are
applying.

BTW: stupidly simple cause for reboots: old UPS systems.  Battery goes
dead, and your UPS saves you from a harmless power glitch...by turning
it into a multi-second power OUTAGE.  I've also seen perfectly
functional UPSs that couldn't switch over fast enough for some
computers, again causing a reboot.

Nick.