Hi,

This morning at my workplace I saw an email sent by Nagios at 01:15
stating:

Notification Type: PROBLEM
Service: SSH
Host: <removed>
Address: <removed>
State: CRITICAL
Date/Time: Mon Sept 25 01:15:58 BST 2006
Additional Info:
CRITICAL - Socket timeout after 10 seconds
 
A recovery notification was received at 01:40:

Notification Type: RECOVERY
Service: SSH
Host: <removed>
Address: <removed>
State: OK
Date/Time: Mon Sept 25 01:40:53 BST 2006
Additional Info:
SSH OK - OpenSSH_3.9p1 (protocol 1.99)

Upon checking the logs I found that all processing on the system stopped
at the time of the first notification (but, surprisingly, I did not
receive a HOST DOWN email from Nagios). Nothing in the logs suggests
that someone issued a reboot command, and in fact the 'last log' showed
that no one has logged on to the system since Friday.

The custom RRDtool graphs, which are stored locally on the machine and
capture stats every minute, do not indicate any strange behaviour
before 01:15, and they are all blank from 01:15 till 01:40. Even the
ethernet bandwidth graph, which should at least show the link
monitoring traffic, is empty. Since Nagios did not send a HOST DOWN
alert, I was assuming that the box was at least responding to ping.

This system is not in production yet, but is being built for production
use, and hence no SMS alerts were sent out; otherwise I would have hoped
to find out more.

No system log recorded anything from 01:15 to 01:36. Am I right in
believing that either of the following could have happened:

 * Someone manually issued a reboot and cleared the traces.
OR
 * Some authorised user scheduled a system reboot using the at command.
How can I rule out this possibility?
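
A rough sketch of one check that might help separate a commanded reboot
from a crash or power loss. It assumes the usual sysvinit behaviour
where an orderly shutdown or reboot (typed by hand or run from an at
job) writes a "shutdown" record to wtmp, which 'last -x' then lists
next to the "reboot" and "runlevel" records. The function name
count_boot_records is made up for illustration. Root can edit wtmp, so
a missing record cannot rule out tampering.

```python
from collections import Counter

# Pseudo-user names that `last -x` prints for boot-related wtmp records.
PSEUDO_USERS = ("reboot", "shutdown", "runlevel")


def count_boot_records(last_x_output):
    """Count reboot/shutdown/runlevel records in pasted `last -x` output."""
    counts = Counter()
    for line in last_x_output.splitlines():
        fields = line.split()
        if fields and fields[0] in PSEUDO_USERS:
            counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    # The two unsnipped lines from the `last -x` output in this post.
    sample = (
        "runlevel (to lvl 3)   2.6.9-42.0.2.ELs Mon Sep 25 01:36 - 11:42  (10:06)\n"
        "reboot   system boot  2.6.9-42.0.2.ELs Mon Sep 25 01:36          (10:06)\n"
    )
    counts = count_boot_records(sample)
    if counts["reboot"] > counts["shutdown"]:
        print("at least one boot has no matching shutdown record")
```

Under that assumption, more "reboot" records than "shutdown" records in
the full, unsnipped output would point towards an unclean restart.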

Following are some command outputs and log excerpts. I have removed
host identification details to protect the innocent.

One more thing: I noticed something strange in the /var/log/secure log.
Look at the excerpt below; do you also see the problem in the log
sequencing? The entries for 01:36 come before the entries for 00:40. I
have looked further through the log, and unlike the messages log there
is no break in the timestamps. The ssh disconnects you see are the
Nagios monitoring checks, and there is no break in those either. Any
ideas?
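
To make the sequencing problem easier to spot in a long log, here is a
minimal sketch that flags every entry whose timestamp is earlier than
that of the entry before it. It assumes the classic syslog prefix
("Sep 25 01:36:21") at the start of each line. Syslog lines carry no
year, so the year argument is an assumption, and the name
find_out_of_order is made up for illustration.

```python
from datetime import datetime


def find_out_of_order(lines, year=2006):
    """Return (line_number, previous_ts, current_ts) for each entry
    whose timestamp is earlier than the entry before it."""
    problems = []
    previous = None
    for number, line in enumerate(lines, start=1):
        stamp = line[:15]  # e.g. "Sep 25 01:36:21"
        try:
            current = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a syslog entry (blank line, snip marker, ...)
        if previous is not None and current < previous:
            problems.append((number, previous, current))
        previous = current
    return problems


if __name__ == "__main__":
    # Three entries shortened from the /var/log/secure excerpt in this post.
    sample = [
        "Sep 25 00:08:46 host sshd[28298]: Connection closed",
        "Sep 25 01:36:21 host sshd[2993]: Server listening on :: port 22.",
        "Sep 25 00:40:46 host sshd[5710]: Connection closed",
    ]
    for number, before, now in find_out_of_order(sample):
        print(f"line {number}: {now:%H:%M:%S} follows {before:%H:%M:%S}")
```

Run against the whole file, this would list each point where the
timestamps step backwards, which shows whether the 01:36 entries are
the only ones out of place.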

# last reboot
reboot   system boot  2.6.9-42.0.2.ELs Mon Sep 25 01:36          (10:05)

wtmp begins Mon Sep  4 11:09:22 2006

# last -x
<snipped......>
runlevel (to lvl 3)   2.6.9-42.0.2.ELs Mon Sep 25 01:36 - 11:42  (10:06)
reboot   system boot  2.6.9-42.0.2.ELs Mon Sep 25 01:36          (10:06)
<snipped......>

# who -b
         system boot  Sep 25 01:36

# cat /var/log/messages
<snipped...>

Sep 25 01:09:23 <hostname-removed> kernel: audit(1159142963.681:24678): avc:  denied  { getattr } for  pid=3265 comm="snmpd" name="/" dev=binfmt_misc ino=7994 scontext=system_u:system_r:snmpd_t tcontext=system_u:object_r:binfmt_misc_fs_t tclass=dir
Sep 25 01:36:19 <hostname-removed> syslogd 1.4.1: restart.
Sep 25 01:36:19 <hostname-removed> syslog: syslogd startup succeeded

<snipped... as these are all system startup entries>

# cat /var/log/secure
<snipped...>
Sep 25 00:08:46 <hostname-removed> sshd[28298]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 01:36:21 <hostname-removed> sshd[2993]: Server listening on :: port 22.
Sep 25 01:36:21 <hostname-removed> sshd[2993]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Sep 25 00:40:46 <hostname-removed> sshd[5710]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 00:45:46 <hostname-removed> sshd[6498]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 00:50:46 <hostname-removed> sshd[7289]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 00:55:46 <hostname-removed> sshd[8077]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 01:00:46 <hostname-removed> sshd[8868]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 01:05:46 <hostname-removed> sshd[9658]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 01:10:46 <hostname-removed> sshd[10449]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 01:15:46 <hostname-removed> sshd[11237]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 01:20:46 <hostname-removed> sshd[12028]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 01:25:46 <hostname-removed> sshd[12816]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 01:30:46 <hostname-removed> sshd[13611]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 01:35:46 <hostname-removed> sshd[14399]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 01:40:46 <hostname-removed> sshd[15190]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 01:45:46 <hostname-removed> sshd[16509]: Connection closed by ::ffff:<ipaddr-removed>
Sep 25 01:50:46 <hostname-removed> sshd[17300]: Connection closed by ::ffff:<ipaddr-removed>
<snipped...>


Any help will be appreciated.

Regards.
-- 
अजिताभ पांडे (Ajitabh Pandey)
http://www.ajitabhpandey.info/
ICQ - 150615062
Registered Linux User - 240748
GnuPG Key ID - C2AED210
Key fingerprint = 8A56 0684 44C2 3373 D441  20AF 7398 4DEB C2AE D210
-----------------------------------
Q:      Why do ducks have big flat feet?
A:      To stamp out forest fires.

Q:      Why do elephants have big flat feet?
A:      To stamp out flaming ducks.


_______________________________________________
linux-india-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linux-india-help