Spontaneous reboots with Dell PowerEdge

2006-07-07 Thread Odhiambo Washington
Hi,

I am hoping that someone has come across this weird behaviour with the Dell
PowerEdge 1800.

We have one such server, with dual power supply, running FreeBSD 4.11-STABLE.

The server spontaneously does a hard reboot and comes back up. /var/log/messages
only shows that the previous system shutdown was unexpected. This happens at random
times and is completely unpredictable. We have replaced all the RAM modules
with a completely different set, just to eliminate anything to do with faulty
modules, but this has not cured the problem.

The server has 7 disks in a RAID 5 set. A hard reboot definitely calls for an
fsck, which makes the reboot process take forever. I have fsck_y_enable="YES" in
rc.conf, because otherwise someone would have to run fsck manually after these
spontaneous reboots!

I am considering setting up a serial console to see if I can capture
something, but apart from that I am at my wit's end regarding this issue.
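
A minimal sketch of the capture side of that idea, assuming a second machine
is attached to the server's serial console over a null-modem cable and the
port is already set to the console speed. The function name and the device
and log paths in the comments are hypothetical. It only timestamps whatever
text arrives, so that the last lines printed before a reset can be matched
against the time of the reboot.

```python
import time


def timestamp_lines(stream, out, clock=time.time):
    """Copy every line from `stream` to `out`, prefixed with a UTC timestamp.

    `stream` and `out` are any text file objects. Returns the number of
    lines copied once `stream` reaches end of file.
    """
    count = 0
    for line in stream:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(clock()))
        out.write(f"{stamp} {line.rstrip()}\n")
        # Flush after every line: the last lines before a reset matter most.
        out.flush()
        count += 1
    return count


# Hypothetical usage on the capturing machine (paths are assumptions):
#
#   with open("/dev/cuaa0", errors="replace") as port, \
#        open("console.log", "a") as log:
#       timestamp_lines(port, log)
```

If the kernel prints a panic message before the machine resets, it should
appear as the last entries in the capture file. If nothing at all is printed,
that in itself points more towards hardware or power than towards software.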



Here is the output of dmesg.boot:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 4.11-STABLE #12: Thu Apr 20 16:44:32 EAT 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/SRV4.x
Timecounter i8254  frequency 1193182 Hz
CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3192.22-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0xf43  Stepping = 3
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 3757834240 (3669760K bytes)
avail memory = 3658661888 (3572912K bytes)
Preloaded elf kernel kernel at 0xc0414000.
Warning: Pentium 4 CPU: PSE disabled
VESA: v2.0, 16384k memory, flags:0x1, mode table:0xc0398502 (122)
VESA: ATI RADEON VE
netsmb_dev: loaded
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 25 entries at 0xc00fb750
npx0: math processor on motherboard
npx0: INT 16 interface
pcib0: Host to PCI bridge on motherboard
pci0: PCI bus on pcib0
pcib1: PCI to PCI bridge (vendor=8086 device=3595) irq 0 at device 2.0 on pci0
pci1: PCI bus on pcib1
pcib2: PCI to PCI bridge (vendor=8086 device=0330) at device 0.0 on pci1
pci2: PCI bus on pcib2
amr0: LSILogic MegaRAID 1.51 mem 0xfe9c-0xfe9f,0xfa0f-0xfa0f irq 7 at device 14.0 on pci2
amr0: LSILogic PERC 4e/Di Firmware 521X, BIOS H430, 256MB RAM
pcib3: PCI to PCI bridge (vendor=8086 device=0332) at device 0.2 on pci1
pci3: PCI bus on pcib3
pcib4: PCI to PCI bridge (vendor=8086 device=3596) irq 0 at device 3.0 on pci0
pci4: PCI bus on pcib4
pcib5: PCI to PCI bridge (vendor=8086 device=0329) at device 0.0 on pci4
pci5: PCI bus on pcib5
pcib6: PCI to PCI bridge (vendor=8086 device=032a) at device 0.2 on pci4
pci6: PCI bus on pcib6
pcib7: PCI to PCI bridge (vendor=8086 device=3597) irq 0 at device 4.0 on pci0
pci7: PCI bus on pcib7
pcib8: PCI to PCI bridge (vendor=8086 device=3598) irq 0 at device 5.0 on pci0
pci10: PCI bus on pcib8
pcib9: PCI to PCI bridge (vendor=8086 device=0329) at device 0.0 on pci10
pci11: PCI bus on pcib9
em0: Intel(R) PRO/1000 Network Connection, Version - 2.1.7 port 0xdcc0-0xdcff mem 0xfe4e-0xfe4f irq 11 at device 7.0 on pci11
em0:  Speed:N/A  Duplex:N/A
pcib10: PCI to PCI bridge (vendor=8086 device=032a) at device 0.2 on pci10
pci12: PCI bus on pcib10
em1: Intel(R) PRO/1000 Network Connection, Version - 2.1.7 port 0xccc0-0xccff mem 0xfe2e-0xfe2f irq 11 at device 8.0 on pci12
em1:  Speed:N/A  Duplex:N/A
pcib11: PCI to PCI bridge (vendor=8086 device=3599) irq 0 at device 6.0 on pci0
pci13: PCI bus on pcib11
uhci0: Intel 82801EB (ICH5) USB controller USB-A port 0x9ce0-0x9cff irq 11 at device 29.0 on pci0
usb0: Intel 82801EB (ICH5) USB controller USB-A on uhci0
usb0: USB revision 1.0
uhub0: 2 ports with 2 removable, self powered
uhci1: Intel 82801EB (ICH5) USB controller USB-B port 0x9cc0-0x9cdf irq 10 at device 29.1 on pci0
usb1: Intel 82801EB (ICH5) USB controller USB-B on uhci1
usb1: USB revision 1.0
uhub1: 2 ports with 2 removable, self powered
uhci2: Intel 82801EB (ICH5) USB controller USB-C port 0x9ca0-0x9cbf irq 7 at device 29.2 on pci0
usb2: Intel 82801EB (ICH5) USB controller USB-C on uhci2
usb2: USB revision 1.0
uhub2: 2 ports with 2 removable, self powered
pci0: USB controller at 29.7 irq 3
pcib12: Intel 82801BA/BAM (ICH2) Hub to PCI bridge at device 30.0 on pci0
pci16: PCI bus on pcib12
pci16: ATI model 5159 graphics accelerator at 13.0 irq 7
isab0: PCI to ISA bridge (vendor=8086 device=24d0) at device 31.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel ICH5 ATA100 controller port 0xfc00-0xfc0f,0-0x3,0-0x7,0-0x3,0-0x7 irq 0 at device 31.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
orm0: Option ROMs at iomem 0xc-0xcafff,0xcb000-0xcbfff,0xcc000-0xccfff,0xec000-0xe on isa0
pmtimer0 on isa0
fdc0: NEC 72065B or clone at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on 

Re: Spontaneous reboots with Dell PowerEdge

2006-07-07 Thread Jerry McAllister
 
 Hi,
 
 I am hoping that someone has come across this weird happening with 
 Dell PowerEdge 1800.
 
 We have one such server, with dual power supply, running FreeBSD 4.11-STABLE.
 
 Spontaneously, the server does a hard boot and comes back up. 
 /var/log/messages only states the previous system shutdown was unexpected. 
 This happens at random
 times and is completely unpredictable. We have replaced all the RAM modules
 with a completely different set, just to eliminate anything to do with faulty
 modules, but this has not cured the problem.
 
 The server has 7 disks in a RAID 5 set. A hard reboot definitely calls for an
 fsck, which makes the reboot process take forever. I have fsck_y_enable="YES" in
 rc.conf, because otherwise someone will have to manually run fsck after these
 spontaneous reboots!
 
 I am considering setting up a serial console to see if I can capture
 something, but apart from that I am at my wit's end regarding this issue.

Well, it sounds like a power or heat problem, but it is hard to tell.
dmesg.boot will not tell you much about what caused the system to go
down. It only contains information about it coming back up.

You might also look through /var/log/messages and the other logs for
anything written just before each reboot. But I would check the power
supply for consistency and whether any component is overheating.
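
One way to go through the logs is to pull out every boot and the last line
written before it. This is a rough sketch, not a finished tool. It assumes
that each boot leaves the kernel copyright banner ("The FreeBSD Project.") in
the log with the usual syslog timestamp in front. The function names are made
up for illustration.

```python
import re
from collections import Counter

# Assumption: every boot writes the kernel copyright banner into the log,
# so a line containing this text marks the start of a boot.
BOOT_MARKER = "The FreeBSD Project."

# Classic syslog prefix, e.g. "Jul  7 03:12:45 host ..."
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) ")


def find_boots(lines, marker=BOOT_MARKER):
    """Return (last_line_before_boot, boot_line) for every boot marker.

    The first element is None when the marker is the first line seen.
    """
    boots = []
    previous = None
    for raw in lines:
        line = raw.rstrip("\n")
        if marker in line:
            boots.append((previous, line))
        previous = line
    return boots


def boot_hours(boots):
    """Count boots per hour of day, to see whether they cluster."""
    hours = Counter()
    for _, boot_line in boots:
        match = STAMP.match(boot_line)
        if match:
            hours[int(match.group(3))] += 1
    return hours


# Example use (the path comes from the thread, the rest is illustrative):
#
#   with open("/var/log/messages", errors="replace") as log:
#       for before, boot in find_boots(log):
#           print(before, "->", boot)
```

The line before each banner gives the latest time the machine was known to be
alive. If the boots bunch up around the same hour, look at what runs then
(cron jobs, backups) or at what happens to the power or room temperature at
that time.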

jerry

 
 
 