[CentOS] CentOS-5.8 - Problem booting remote host

2012-06-20 James B. Byrne
Kernel 2.6.18-308.8.2.el5

I recently experienced an odd problem with a host at our warm-site
location.  The facility we use suffered an HVAC failure during
elevated ambient temperatures (30C+) on Monday past, and the equipment
room reportedly cooked for some hours.  It was severe enough that our
equipment shut down.  In all probability this was due to an
over-temperature condition, since the systems are all powered from a
UPS, but possibly there was an extended power outage instead.

Whatever the cause, one of our hosts did not restart after this
shutdown, which required my presence on site.  When it was powered up
today in situ, the host's console displayed the CentOS splash screen
with the message [press any key to enter menu] and then the message
"Booting in 4 seconds".

However, the countdown timer never changed from its initial value and
the restart never took place.  When I entered the console menu and
selected the most recent kernel available, the system booted normally.
It had to do a lot of disk remediation on the first pass, but all of
that completed without untoward difficulty.  Subsequent reboots showed
the same behaviour: the splash screen displayed, the boot timer message
appeared, and then nothing further happened until I entered the boot
menu.

Selecting the default kernel in the boot menu allowed the restart to
continue, this time without any unusual reports.  I repeated this
process several more times to confirm that this was not a transient
effect.  Each time, operator intervention at the console was required
to restart the system, but once this was done no further problems were
noted.  I also repeated the process using each of the older kernels
present.  As far as I could determine, there was nothing wrong with any
of the boot images once past the automatic-selection stage of the boot
process.

I then edited /boot/grub/grub.conf and changed the default boot entry
from index 0 to index 1 so as to use the previous kernel.  After this
configuration change the system restarted normally without operator
intervention.
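In GRUB legacy's grub.conf, default=N selects the Nth "title" stanza counting from 0 in file order, so moving the default from index 0 to index 1 picks the second entry listed. The sketch below resolves which entry a given default selects. The helper name default_entry_title and the sample configuration (including the older kernel version and root device) are hypothetical, for illustration only, and the sketch deliberately ignores fallback, savedefault and default=saved.

```python
import re


def default_entry_title(conf_text: str) -> str:
    """Hypothetical helper: return the title of the entry that 'default=N' selects.

    Entries are the 'title' lines, counted from 0 in file order. Does not
    handle 'fallback', 'savedefault' or 'default=saved'.
    """
    default_index = 0  # GRUB legacy boots entry 0 when no default line is present
    titles = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Accept both "default=1" and "default 1"
        m = re.match(r"default\s*=?\s*(\d+)$", line)
        if m:
            default_index = int(m.group(1))
        elif line.startswith("title"):
            titles.append(line[len("title"):].strip())
    if not 0 <= default_index < len(titles):
        # This sketch raises; it does not model what GRUB itself would do here
        raise ValueError(f"default={default_index} but only {len(titles)} entries")
    return titles[default_index]


# Hypothetical grub.conf: newest kernel listed first, default moved to index 1
SAMPLE = """\
default=1
timeout=5
hiddenmenu
title CentOS (2.6.18-308.8.2.el5)
\troot (hd0,0)
\tkernel /vmlinuz-2.6.18-308.8.2.el5 ro root=/dev/VolGroup00/LogVol00
title CentOS (2.6.18-308.4.1.el5)
\troot (hd0,0)
\tkernel /vmlinuz-2.6.18-308.4.1.el5 ro root=/dev/VolGroup00/LogVol00
"""

print(default_entry_title(SAMPLE))  # CentOS (2.6.18-308.4.1.el5)
```

With default=0 the same file would resolve to the 2.6.18-308.8.2.el5 entry, which is the one that hung at the countdown here.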

The problem kernel was installed from Updates on June 13 and had been
running since that date, as the log entries below show.  That restart
was done remotely and evidently completed without any problem.

Jun 13 10:15:59 inet04 shutdown[19274]: shutting down for system reboot
. . .
Jun 13 10:16:25 inet04 exiting on signal 15
Jun 13 10:18:35 inet04 syslogd 1.4.1: restart.
Jun 13 10:18:35 inet04 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 13 10:18:35 inet04 kernel: Linux version 2.6.18-308.8.2.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Tue Jun 12 09:57:26 EDT 2012

The current syslog shows repeated restarts commencing at 18:12 on Jun
17 and ending at 19:08, after which the system recorded no activity
whatsoever until I restarted it manually earlier today.  The following
two syslog entries are adjacent in /var/log/messages:

Jun 18 19:08:48 inet04 kernel: PROBE_BLACKIST: IN=eth0 OUT=
MAC=00:1c:c4:a1:66:1e:00:18:73:e8:35:a1:08:00 SRC=126.67.126.141
DST=209.47.176.105 LEN=48 TOS=0x00 PREC=0x00 TTL=110 ID=56792 DF
PROTO=TCP SPT=2153 DPT=445 WINDOW=64240 RES=0x00 SYN URGP=0

Jun 20 12:11:31 inet04 ntpd[2480]: time reset +147721.498633 s
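The ntpd correction is easier to read in hours. The sketch below converts it and compares it with the interval between the two adjacent log entries. It assumes the year is 2012 (syslog timestamps omit it) and that both timestamps are in the same local time zone with no DST change between them.

```python
from datetime import datetime

# Syslog omits the year; 2012 is assumed from the message date.
last_entry = datetime(2012, 6, 18, 19, 8, 48)   # the PROBE_BLACKIST kernel line
ntp_reset = datetime(2012, 6, 20, 12, 11, 31)   # the ntpd "time reset" line
ntp_offset_s = 147721.498633                    # correction reported by ntpd, in seconds

# Wall-clock interval between the two adjacent entries
gap_s = (ntp_reset - last_entry).total_seconds()

print(f"gap between entries: {gap_s:.0f} s ({gap_s / 3600:.2f} h)")
print(f"ntpd correction:     {ntp_offset_s:.0f} s ({ntp_offset_s / 3600:.2f} h)")
print(f"difference:          {gap_s - ntp_offset_s:.1f} s")
```

With these values the gap is 147763 s (about 41.05 h) and the correction about 41.03 h, so the two differ by roughly 41.5 s. This is only arithmetic on the logged values; by itself it does not explain the boot hang.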

Have any of you ever experienced anything like this?  Does anyone have
any idea what might have caused the corruption of the restart
mechanism or where the problem might be?

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrne    mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] CentOS-5.8 - Problem booting remote host

2012-06-20 m.roth
James B. Byrne wrote:
> However, the countdown timer never changed from the initial value and
> the restart never took place.  When I entered the console menu and
> selected the most recent kernel available the system booted normally.
snip
I had that happen on one or two machines in the past and never got it
working.  I believe that eventually one upgrade or another fixed it.
You might try reinstalling the new kernel with yum, redoing the
grub-install, and seeing whether that helps.

   mark
