A server which has been running steadily for years has begun to
reboot spontaneously. To the best of my knowledge, nothing has changed.
It is a dual-processor PIII. It runs the stable release.
It is tucked away in the loft and usually has no monitor attached, so
tracking this down is difficult. However, even if I brought it into a
more convenient area, short of sitting and staring at the screen waiting
for a crash or reboot, I'm not sure it would help much.
I've tried building a newer kernel from backports.org, trimmed right
down as much as possible, but the reboots continue. There is nothing
useful in syslog. A typical series of reboots looks like:
dougie pts/0 tbird2xp:0.0 Tue Oct 31 17:15 still logged in
runlevel (to lvl 2) 2.6.17 Tue Oct 31 17:12 - 17:21 (00:08)
reboot system boot 2.6.17 Tue Oct 31 17:12 (00:08)
dougie pts/0 tbird2xp:0.0 Tue Oct 31 17:09 - crash (00:02)
runlevel (to lvl 2) 2.6.17 Tue Oct 31 16:59 - 17:12 (00:12)
reboot system boot 2.6.17 Tue Oct 31 16:59 (00:21)
dougie pts/0 tbird2xp:0.0 Tue Oct 31 16:05 - crash (00:54)
runlevel (to lvl 2) 2.6.17 Tue Oct 31 15:16 - 16:59 (01:43)
reboot system boot 2.6.17 Tue Oct 31 15:16 (02:04)
date new time Sun Oct 29 07:11
date old time Sun Oct 29 07:12
root pts/3 kitchens Sun Oct 29 07:11 - crash (2+08:04)
dougie pts/2 kitchens Sat Oct 28 20:29 - crash (2+19:46)
dougie pts/1 kitchens Sat Oct 28 11:37 - 16:04 (1+05:27)
dougie pts/0 tbird2xp:0.0 Fri Oct 27 13:16 - crash (4+03:00)
The syslog shows nothing notable around the time of each reboot. Usually
there are just lines from postfix as it processes the mail queue, then:
Oct 31 17:12:22 nick syslogd 1.4.1#17: restart (remote reception).
Oct 31 17:12:22 nick kernel: klogd 1.4.1#17, log source = /proc/kmsg started.
Oct 31 17:12:23 nick kernel: Inspecting /boot/System.map-2.6.17
Oct 31 17:12:23 nick kernel: Loaded 21314 symbols from /boot/System.map-2.6.17.
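To check every reboot rather than just the latest one, a small script can pull out the lines logged immediately before each syslogd restart marker. This is only a rough sketch: the function names, the default path /var/log/syslog, and the context size are my own assumptions, and the marker may also appear on routine syslogd restarts (log rotation, for instance), so not every hit is necessarily a crash.

```python
from collections import deque


def lines_before_restarts(lines, context=5):
    """Return (restart_line, preceding_lines) for each syslogd restart marker.

    `lines` is any iterable of log lines; `context` is how many earlier
    lines to keep for each marker (an arbitrary choice).
    """
    recent = deque(maxlen=context)  # rolling window of the latest lines
    found = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Matches lines such as "... syslogd 1.4.1#17: restart (remote reception)."
        if "syslogd" in line and "restart" in line:
            found.append((line, list(recent)))
        recent.append(line)
    return found


def report(path="/var/log/syslog", context=5):
    """Print the context before each restart marker (path is an assumption)."""
    with open(path, errors="replace") as fh:
        for marker, before in lines_before_restarts(fh, context):
            print("\n".join(before))
            print(">>> " + marker)
            print()
```

Calling report() on the box itself (or on a copy of the log) would show whether the last messages before each restart have anything in common beyond the postfix queue runs.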
I'm not sure how to go about tracking this down. Searching the
archives suggests that these symptoms could point to a faulty hardware
component, such as memory or the PSU. So my next step is probably going
to be swapping the PSU and running memtest. One thing about the reboots
is that they often appear in clusters. For example, on the morning of
Oct 24 it looks like it was bouncing off and on for several hours:
# last reboot
reboot system boot 2.6.8 Wed Oct 25 05:03 (06:50)
reboot system boot 2.6.8 Wed Oct 25 04:31 (07:22)
reboot system boot 2.6.8 Tue Oct 24 11:09 (1+00:44)
reboot system boot 2.6.8 Tue Oct 24 10:59 (00:06)
reboot system boot 2.6.8 Tue Oct 24 09:52 (01:01)
reboot system boot 2.6.8 Tue Oct 24 09:50 (01:03)
reboot system boot 2.6.8 Tue Oct 24 09:49 (01:05)
reboot system boot 2.6.8 Tue Oct 24 09:37 (01:17)
reboot system boot 2.6.8 Tue Oct 24 09:05 (01:49)
reboot system boot 2.6.8 Tue Oct 24 08:53 (02:00)
reboot system boot 2.6.8 Tue Oct 24 08:51 (02:03)
reboot system boot 2.6.8 Tue Oct 24 07:28 (03:26)
reboot system boot 2.6.8 Tue Oct 24 07:26 (03:27)
reboot system boot 2.6.8 Tue Oct 24 07:24 (03:29)
reboot system boot 2.6.8 Tue Oct 24 07:01 (03:52)
reboot system boot 2.6.8 Tue Oct 24 06:18 (04:36)
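To put numbers on the clustering, the boot times can be grouped by the gap between consecutive boots. The following is a minimal sketch, not a finished tool: the function names and the 30-minute threshold are arbitrary choices of mine, and because `last` prints no year, the year 2006 passed in below is an assumption (it only matters if the log spans a year boundary).

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "reboot system boot 2.6.8 Tue Oct 24 09:52 (01:01)"
BOOT_RE = re.compile(
    r"^reboot\s+system boot\s+\S+\s+\w{3}\s+(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2})"
)

SAMPLE = """\
reboot system boot 2.6.8 Wed Oct 25 05:03 (06:50)
reboot system boot 2.6.8 Wed Oct 25 04:31 (07:22)
reboot system boot 2.6.8 Tue Oct 24 11:09 (1+00:44)
reboot system boot 2.6.8 Tue Oct 24 10:59 (00:06)
reboot system boot 2.6.8 Tue Oct 24 09:52 (01:01)
reboot system boot 2.6.8 Tue Oct 24 09:50 (01:03)
reboot system boot 2.6.8 Tue Oct 24 09:49 (01:05)
reboot system boot 2.6.8 Tue Oct 24 09:37 (01:17)
reboot system boot 2.6.8 Tue Oct 24 09:05 (01:49)
reboot system boot 2.6.8 Tue Oct 24 08:53 (02:00)
reboot system boot 2.6.8 Tue Oct 24 08:51 (02:03)
reboot system boot 2.6.8 Tue Oct 24 07:28 (03:26)
reboot system boot 2.6.8 Tue Oct 24 07:26 (03:27)
reboot system boot 2.6.8 Tue Oct 24 07:24 (03:29)
reboot system boot 2.6.8 Tue Oct 24 07:01 (03:52)
reboot system boot 2.6.8 Tue Oct 24 06:18 (04:36)
"""


def parse_boot_times(lines, year):
    """Extract boot timestamps, oldest first; `year` is supplied by the caller."""
    times = []
    for line in lines:
        m = BOOT_RE.match(line.strip())
        if m:
            mon, day, hh, mm = m.groups()
            stamp = f"{year} {mon} {day} {hh}:{mm}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M"))
    return sorted(times)


def cluster(times, max_gap=timedelta(minutes=30)):
    """Group sorted times; a gap longer than max_gap starts a new cluster."""
    clusters = []
    for t in times:
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters


for group in cluster(parse_boot_times(SAMPLE.splitlines(), year=2006)):
    print(f"{group[0]:%b %d %H:%M} to {group[-1]:%H:%M}: {len(group)} boot(s)")
```

If the clusters line up with something periodic (a cron job, a heating cycle in the loft, mains load at certain hours), that might help separate a software trigger from a marginal PSU or memory.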
I'm a bit stumped on how to solve this and would appreciate any thoughts
on strategy.
Dougie