On Mon, Mar 04, 2002 at 02:24:13PM +0100, Marcel Prisi wrote:
> Help !!
>
> We have a postgresql server (bi-PIII 833, 1Gb RAM, Adaptec SCSI RAID)
> rebooting by itself more and more frequently.
>
> I can not find any message anywhere, as if someone just presses the "reboot"
> button every now and then. It used to reboot every week or so, but it
> rebooted just six times today !
>
> What can I do to find the cause ? The machine runs 4.5-RELEASE, but used to
> run 4.3-PRERELEASE with the same trouble.
>
> I partially checked RAM through memtest86 (www.memtest86.com) but did not
> find any trouble. What might it be ? faulty processor ? Faulty power-supply
> ? any clues ? how to test ?
The program-driven memory testers are at best rather bad at
finding memory problems.
Candidates, not in any particular order:
o Overheated processor(s)
o Overheated or faulty memory
o Faulty or overloaded power supply
o Faulty line power from the wall
o Flaky motherboard (_Highly_ unlikely)
and, inevitably,
o "Other"
Problem Determination and Isolation techniques:
o Processor: run healthd or another voltage/temperature monitor
to watch voltages and temperatures. Check all fans for good
speed. Remove lint, cat-hair, etc., from heat sinks.[1]
o Faulty or overheated memory: Check fans, etc., as above. Take
memory to a store that has a hardware memory tester. Remove one
SIMM at a time, reboot, wait.
o Power Supply: Check DC power with voltmeter and oscilloscope.
Replace PS if you _think_ there might be a problem with it or if
it is anywhere near capacity. They're cheap here: about $100 will
get you a very-high-capacity PS.
o Faulty line power: Put the server on a known-good UPS.
o Flaky motherboard: Replace motherboard (absolute last resort).
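Since the box leaves no messages behind, it may help to keep a log of
the monitor readings that survives a sudden reset. What follows is only
a minimal sketch in Python, not a tested tool: log_reading and monitor
are hypothetical names, and the command that prints the readings is
whatever your monitor provides (you pass it on the command line; no
particular output format is assumed). Each reading is appended with a
timestamp and forced to disk, so the last line written before a reboot
shows the state shortly before the machine went down.

```python
import os
import subprocess
import sys
import time


def log_reading(log_path, command):
    """Run `command` once and append its output, timestamped, to log_path."""
    result = subprocess.run(command, capture_output=True, text=True)
    # Collapse the output onto one line so each reading is one log entry.
    reading = " ".join(result.stdout.split()) or "(no output)"
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    line = f"{stamp} rc={result.returncode} {reading}\n"
    with open(log_path, "a") as log:
        log.write(line)
        log.flush()
        os.fsync(log.fileno())  # push the entry to disk before the next reset
    return line


def monitor(log_path, command, interval_seconds=30):
    """Log a reading every interval_seconds until interrupted."""
    while True:
        log_reading(log_path, command)
        time.sleep(interval_seconds)


# Assumed invocation: python sensorlog.py LOGFILE COMMAND [ARGS...]
# where COMMAND is any program that prints the current readings.
if __name__ == "__main__" and len(sys.argv) >= 3:
    monitor(sys.argv[1], sys.argv[2:])
```

After the next reboot, compare the last few entries against earlier
ones: a temperature climbing or a voltage sagging just before the gap
points at cooling or the power supply; readings that stay flat right up
to the reset make memory or line power more likely suspects.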
[1] Cat hair in my house appears to be attracted preferentially
to CPU heat sinks.
--
Mike Andrews
[EMAIL PROTECTED]
Tired old sysadmin since 1964