How to figure out why a computer rebooted?

2005-11-16 Thread Adam Funk
I'm disturbed that I've received a Reboot logcheck report from a
computer to which I don't have physical access right now.  I logged in
by ssh and looked at /var/log/syslog, which shows routine stuff, then
a 3-minute gap, then the restart log entries.

Nov 16 08:02:01 argon /USR/SBIN/CRON[16445]: (logcheck) CMD (   if [ -x /usr/sbin/logcheck ]; then nice -n10 /usr/sbin/logcheck; fi)
Nov 16 08:02:01 argon /USR/SBIN/CRON[16446]: (root) CMD (/root/bin/minute)
Nov 16 08:03:01 argon /USR/SBIN/CRON[17248]: (root) CMD (/root/bin/minute)
Nov 16 08:04:01 argon /USR/SBIN/CRON[17251]: (root) CMD (/root/bin/minute)
Nov 16 08:05:01 argon /USR/SBIN/CRON[17255]: (root) CMD (/root/bin/minute)
Nov 16 08:06:01 argon /USR/SBIN/CRON[17259]: (root) CMD (/root/bin/minute)
Nov 16 08:09:15 argon syslogd 1.4.1#17: restart.
Nov 16 08:09:16 argon kernel: klogd 1.4.1#17, log source = /proc/kmsg started.
Nov 16 08:09:16 argon kernel: Inspecting /boot/System.map-2.4.27-2-386
Nov 16 08:09:16 argon kernel: Loaded 18328 symbols from /boot/System.map-2.4.27-2-386.
Nov 16 08:09:16 argon kernel: Symbols match kernel version 2.4.27.
Nov 16 08:09:16 argon kernel: Loaded 787 symbols from 36 modules.
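
A gap like the one above can also be found mechanically. This is a minimal Python sketch, assuming classic syslog timestamps with no year (so the year is passed in), English month names in the current locale, and a job that logs every minute, like the cron entry above, so that a silence longer than about 90 seconds is unusual. The name find_gaps and the threshold are illustrative choices, not part of any existing tool.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year=2005, threshold=timedelta(seconds=90)):
    """Return (previous, current) timestamp pairs that are further apart than threshold."""
    gaps = []
    prev = None
    for line in lines:
        try:
            # Classic syslog timestamps omit the year, so one is supplied by the caller.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # wrapped continuation line, or no leading timestamp
        if prev is not None and stamp - prev > threshold:
            gaps.append((prev, stamp))
        prev = stamp
    return gaps


# Three lines taken from the excerpt above.
sample = [
    "Nov 16 08:05:01 argon /USR/SBIN/CRON[17255]: (root) CMD (/root/bin/minute)",
    "Nov 16 08:06:01 argon /USR/SBIN/CRON[17259]: (root) CMD (/root/bin/minute)",
    "Nov 16 08:09:15 argon syslogd 1.4.1#17: restart.",
]
for before, after in find_gaps(sample):
    print(f"silent from {before} to {after} ({(after - before).total_seconds():.0f} s)")
```

On a real machine the lines would come from open("/var/log/syslog"), which usually needs root or adm group access. A gap only shows that nothing was logged, not why.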


A few days ago I deliberately shut the machine down for a while (with
physical access) and produced the following bit of syslog.

Nov 12 12:56:01 argon /USR/SBIN/CRON[21427]: (root) CMD (/root/bin/minute)
Nov 12 12:57:01 argon /USR/SBIN/CRON[21430]: (root) CMD (/root/bin/minute)
Nov 12 12:58:01 argon /USR/SBIN/CRON[21433]: (root) CMD (/root/bin/minute)
Nov 12 12:58:19 argon gdm[3158]: Master halting...
Nov 12 12:58:19 argon shutdown[3158]: shutting down for system halt
Nov 12 12:58:19 argon init: Switching to runlevel: 0
Nov 12 12:58:24 argon xfs[2254]: terminating 
Nov 12 12:58:25 argon xfs[2257]: terminating 
Nov 12 12:58:27 argon chronyd[30148]: chronyd exiting on signal
Nov 12 12:58:27 argon rpc.statd[2457]: Caught signal 15, un-registering and exiting.
Nov 12 12:58:27 argon exiting on signal 15
Nov 12 13:16:21 argon syslogd 1.4.1#17: restart.
Nov 12 13:16:22 argon kernel: klogd 1.4.1#17, log source = /proc/kmsg started.
Nov 12 13:16:22 argon kernel: Inspecting /boot/System.map-2.4.27-2-386
Nov 12 13:16:22 argon kernel: Loaded 18328 symbols from /boot/System.map-2.4.27-2-386.
Nov 12 13:16:22 argon kernel: Symbols match kernel version 2.4.27.
Nov 12 13:16:22 argon kernel: Loaded 787 symbols from 36 modules.


Obviously if I get home and the digital clocks are flashing, I'll know
there was a brief power cut.  But is there any way to determine from
the computer's own evidence why it restarted?

Thanks,
Adam


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] 
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: How to figure out why a computer rebooted?

2005-11-16 Thread cmetzler

> Obviously if I get home and the digital clocks are flashing, I'll know
> there was a brief power cut.  But is there any way to determine from
> the computer's own evidence why it restarted?


Look at this part of your second log, where you shut it down yourself:

> Nov 12 12:58:19 argon gdm[3158]: Master halting...
> Nov 12 12:58:19 argon shutdown[3158]: shutting down for system halt
> Nov 12 12:58:19 argon init: Switching to runlevel: 0
> Nov 12 12:58:24 argon xfs[2254]: terminating
> Nov 12 12:58:25 argon xfs[2257]: terminating
> Nov 12 12:58:27 argon chronyd[30148]: chronyd exiting on signal
> Nov 12 12:58:27 argon rpc.statd[2457]: Caught signal 15, un-registering and exiting.
> Nov 12 12:58:27 argon exiting on signal 15

These messages are written by various processes as the system goes to
runlevel 0 (halt), and they are *not* present in the log snippet from
the unexpected reboot.  So the machine didn't go down through a normal
shutdown procedure -- it died abruptly, and then rebooted.  Typically,
as you note, this is because of a power issue.  Depending on whether
your machine has e.g. a good UPS with a power conditioner, it need not
be a complete outage: I've had brief spells of bad power quality where
the digital clocks in the room didn't go to flashing 12, but the
computer's power supply let the motherboard voltage drop enough for
the motherboard to quit, then reboot shortly after.

I've also seen this happen because of hardware problems.  I've never
seen it because of an OS or other software issue -- for that, I'm more
likely to get hard lockups or, I guess, kernel panics -- but maybe
someone will chime in with a counterexample.

But whatever the source, the machine died instantaneously, and in such
circumstances the system doesn't get much of an opportunity to say
"I'm going down because of X".  So the only suggestion I'd have would
be to look through the logs for any sign of growing hardware issues,
which could appear (in the logs) nowhere near the actual shutdown.
And check what you're doing about your power quality.
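
The comparison above can be automated roughly as follows. This is only a sketch in Python, under the assumption that a "syslogd ... restart." line marks a boot and that the shutdown messages look like the ones in your second log. The marker strings are taken from that excerpt; classify_restarts and lookback are made-up names for illustration. A restart line might also be written when syslogd is merely reloaded, so treat an "unclean" result as a hint, not proof.

```python
# Substrings copied from the deliberate-shutdown excerpt; other systems may log differently.
SHUTDOWN_MARKERS = (
    "shutting down for system",
    "Switching to runlevel",
    "exiting on signal 15",
)


def classify_restarts(lines, lookback=20):
    """For each syslogd restart line, report whether shutdown messages precede it."""
    results = []
    last_restart = -1
    for i, line in enumerate(lines):
        if "syslogd" in line and line.rstrip().endswith("restart."):
            # Look only at lines logged since the previous restart, at most `lookback` of them.
            start = max(last_restart + 1, i - lookback)
            window = lines[start:i]
            clean = any(marker in earlier
                        for earlier in window
                        for marker in SHUTDOWN_MARKERS)
            results.append((line[:15], "clean" if clean else "unclean"))
            last_restart = i
    return results


# A few lines from the two excerpts in this thread.
sample = [
    "Nov 12 12:58:19 argon shutdown[3158]: shutting down for system halt",
    "Nov 12 12:58:27 argon exiting on signal 15",
    "Nov 12 13:16:21 argon syslogd 1.4.1#17: restart.",
    "Nov 16 08:06:01 argon /USR/SBIN/CRON[17259]: (root) CMD (/root/bin/minute)",
    "Nov 16 08:09:15 argon syslogd 1.4.1#17: restart.",
]
for stamp, verdict in classify_restarts(sample):
    print(stamp, verdict)
```

It can't say why the machine went down, only whether it had a chance to log a normal shutdown first.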

-c






