On Mon, 2009-03-30 at 08:05 -0400, John A. Sullivan III wrote:
> On Mon, 2009-03-30 at 07:10 -0400, John A. Sullivan III wrote:
> > On Mon, 2009-03-30 at 07:04 -0400, John A. Sullivan III wrote:
> > > On Mon, 2009-03-30 at 06:58 -0400, John A. Sullivan III wrote:
> > > > On Tue, 2009-03-24 at 11:49 -0400, John A. Sullivan III wrote:
> > > > > Here it is.  There is another problem.  My apologies for wondering why
> > > > > the list was so slow to respond.  I am not receiving any email from
> > > > > the list, including Nerijus' response below.  I only received your
> > > > > direct responses, Daniel.  Does one need a Gmail account to use
> > > > > Google Groups?
> > > > > 
> > > > > In any event, here is the bzip2 file.  Thanks - John
> > > > > 
> > > > > On Tue, 2009-03-24 at 11:44 -0300, Daniel Cid wrote:
> > > > > > Yes, try zipping it and sending to the list (or directly to my email
> > > > > > if you think it may contain confidential
> > > > > > information). It will certainly help us debug this issue.
> > > > > > 
> > > > > > Thanks,
> > > > > > 
> > > > > > --
> > > > > > Daniel B. Cid
> > > > > > dcid ( at ) ossec.net
> > > > > > 
> > > > > > On Fri, Mar 20, 2009 at 3:13 AM, Nerijus Krukauskas
> > > > > > <nkrukaus...@gmail.com> wrote:
> > > > > > >
> > > > > > > > On 19/03/2009, John A. Sullivan III
> > > > > > > > <jsulli...@opensourcedevel.com> wrote:
> > > > > > > >>
> > > > > > > >> Thanks, Daniel.  I have the trace but it is a 40 MB file.  How
> > > > > > > >> shall I send it to you? - John
> > > > > > > >
> > > > > > > >  I believe that if you try to zip it, it's gonna be something
> > > > > > > > around 4 MB... :)
> > > > > > >
> > > > > > > --
> > > > > > > http://nk99.org/
> > > > > > >
> > > > Hello, all.  I do have some more information on this serious bug.  It
> > > > has now bitten us on two out of two vservers.
> > > > 
> > > > We first thought it might have to do with our use of wildcards in the
> > > > localfile definitions, e.g., 
> > > >   <localfile>
> > > >     <log_format>syslog</log_format>
> > > >     <location>/vservers/[a-zA-Z0-9]*/var/log/maillog</location>
> > > >   </localfile>
> > > > So we pulled them all out.  We still had the same problem.  However, it
> > > > did seem to coincide with ossec not being able to find specified files.
> > > > We had mistyped some file names and paths and saw this in the error logs
> > > > before the service spun out of control:
> > > > 
> > > > 2009/03/30 04:57:14 ossec-syscheckd: INFO: Starting syscheck scan (db).
> > > > 2009/03/30 04:58:41 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.error_log'.
> > > > 2009/03/30 04:58:41 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.access_log'.
> > > > 2009/03/30 04:58:41 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/error'.
> > > > 2009/03/30 04:58:41 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/access'.
> > > > 2009/03/30 04:58:41 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/slapd-ldap01/errors'.
> > > > 2009/03/30 05:00:51 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.error_log'.
> > > > 2009/03/30 05:00:51 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.access_log'.
> > > > 2009/03/30 05:00:51 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/error'.
> > > > 2009/03/30 05:00:51 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/access'.
> > > > 2009/03/30 05:00:51 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/slapd-ldap01/errors'.
> > > > 2009/03/30 05:03:01 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.error_log'.
> > > > 2009/03/30 05:03:01 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.access_log'.
> > > > 2009/03/30 05:03:01 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/error'.
> > > > 2009/03/30 05:03:01 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/access'.
> > > > 2009/03/30 05:03:01 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/slapd-ldap01/errors'.
> > > > 2009/03/30 05:05:11 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.error_log'.
> > > > 2009/03/30 05:05:11 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.access_log'.
> > > > 2009/03/30 05:05:11 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/error'.
> > > > 2009/03/30 05:05:11 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/access'.
> > > > 2009/03/30 05:05:11 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/slapd-ldap01/errors'.
> > > > 2009/03/30 05:07:21 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.error_log'.
> > > > 2009/03/30 05:07:21 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.access_log'.
> > > > 2009/03/30 05:07:21 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/error'.
> > > > 2009/03/30 05:07:21 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/access'.
> > > > 2009/03/30 05:07:21 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/slapd-ldap01/errors'.
> > > > 2009/03/30 05:09:32 ossec-logcollector(1904): INFO: File not available, ignoring it: '/vservers/w01/var/log/httpd/ssipki.error_log'.
> > > > 2009/03/30 05:09:32 ossec-logcollector(1904): INFO: File not available, ignoring it: '/vservers/w01/var/log/httpd/ssipki.access_log'.
> > > > 2009/03/30 05:09:32 ossec-logcollector(1904): INFO: File not available, ignoring it: '/var/log/dirsrv/admin-serv/error'.
> > > > 2009/03/30 05:09:32 ossec-logcollector(1904): INFO: File not available, ignoring it: '/var/log/dirsrv/admin-serv/access'.
> > > > 2009/03/30 05:09:32 ossec-logcollector(1904): INFO: File not available, ignoring it: '/var/log/dirsrv/slapd-ldap01/errors'.
> > > > 2009/03/30 05:16:10 ossec-syscheckd: INFO: Ending syscheck scan (db).
> > > > 
> > > > On our second vserver, we did try wildcards in the directories
> > > > definitions.  That gave us the following before spinning out of control:
> > > > 2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/user/local/sbin': No such file or directory
> > > > 2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/etc': No such file or directory
> > > > 2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/usr/bin': No such file or directory
> > > > 2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/usr/sbin': No such file or directory
> > > > 2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/bin': No such file or directory
> > > > 2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/sbin': No such file or directory
> > > > 2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/usr/local/bin': No such file or directory
> > > > 2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/user/local/sbin': No such file or directory
> > > > 2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/usr/local/etc': No such file or directory
> > > > 2009/03/30 05:51:22 ossec-syscheckd: INFO: Starting syscheck scan (db).
> > > > 
> > > > Having corrected the paths in the first vserver and taken out the
> > > > wildcards, it seems to be behaving itself.  However, not being able to
> > > > use wildcards or regexes in the directories and localfile definitions
> > > > is certainly inconvenient when we anticipate hundreds of virtual
> > > > machines on some of these systems.
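
Until wildcards can be used here, one possible workaround for the localfile
case is to expand the pattern outside ossec and write one explicit block per
matching file.  This is only a minimal sketch, assuming Python is available on
the host.  The function name localfile_blocks is hypothetical, and the XML
layout simply mirrors the localfile example quoted above; it is not taken from
any ossec tooling.

```python
import glob
from xml.sax.saxutils import escape


def localfile_blocks(pattern, log_format="syslog"):
    """Return one <localfile> block per existing file matching pattern.

    Hypothetical helper: the block layout copies the example in this thread.
    """
    blocks = []
    for path in sorted(glob.glob(pattern)):
        blocks.append(
            "  <localfile>\n"
            f"    <log_format>{escape(log_format)}</log_format>\n"
            f"    <location>{escape(path)}</location>\n"
            "  </localfile>"
        )
    return blocks


if __name__ == "__main__":
    # Pattern taken from the example above; prints nothing if no file matches.
    for block in localfile_blocks("/vservers/[a-zA-Z0-9]*/var/log/maillog"):
        print(block)
```

Because only files that exist at generation time are listed, the output would
have to be regenerated whenever a vserver is added or removed.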
> > > > 
> > > > That still leaves us with the base problem.  It appears that if ossec
> > > > syscheckd encounters enough missing files, it does spin out of control
> > > > and requires a power cycle of the system to recover.  Thanks - John
> > > > 
> > > > PS - I'm still not receiving any emails from the mail list.
> > > > 
> > > Oops! I spoke too soon.  The first vserver just went out of control but
> > > again, it is about missing files.  We had defined some directories we
> > > knew didn't have any files just in case they were populated in the
> > > future.  We would hope we could do that to prevent human error.  Here is
> > > what the logs showed before CPU usage spiked to 100%:
> > > 
> > > 2009/03/30 06:22:20 ossec-syscheckd: Error opening directory: '/user/local/sbin': No such file or directory
> > > 2009/03/30 06:23:07 ossec-syscheckd: Error opening directory: '/vservers/ns02/user/local/sbin': No such file or directory
> > > 2009/03/30 06:23:57 ossec-syscheckd: Error opening directory: '/vservers/w01/user/local/sbin': No such file or directory
> > > 2009/03/30 06:25:18 ossec-syscheckd: Error opening directory: '/vservers/pg01/user/local/sbin': No such file or directory
> > > 2009/03/30 06:26:43 ossec-syscheckd: Error opening directory: '/vservers/ld01/user/local/sbin': No such file or directory
> > > 2009/03/30 06:28:43 ossec-syscheckd: INFO: Starting syscheck scan (db).
> > > 
> > > 
> > Talk about embarrassment - I just noticed the typo.  However, it again
> > emphasizes the point that ossec gets very unhappy if it can't find
> > something that has been defined in ossec.conf - John
> 
> Bad news! The first vserver spun out of control again.  This is with all
> typos corrected and no wildcards.  Here is the log since the last
> reboot:
> 
> 2009/03/30 07:09:44 ossec-execd: INFO: Started (pid: 5743).
> 2009/03/30 07:09:44 ossec-agentd(1410): INFO: Reading authentication keys file.
> 2009/03/30 07:09:44 ossec-agentd: INFO: No previous counter available for 'vs01'.
> 2009/03/30 07:09:44 ossec-agentd: INFO: Assigning counter for agent vserver01: '0:0'.
> 2009/03/30 07:09:44 ossec-agentd: INFO: Assigning sender counter: 6:4637
> 2009/03/30 07:09:44 ossec-agentd: INFO: Started (pid: 5747).
> 2009/03/30 07:09:44 ossec-agentd: INFO: Server IP Address: 172.x.x.30
> 2009/03/30 07:09:44 ossec-agentd: INFO: Trying to connect to server (172.x.x.30:1514).
> 2009/03/30 07:09:48 ossec-syscheckd: INFO: Started (pid: 5755).
> 2009/03/30 07:09:48 ossec-rootcheck: INFO: Started (pid: 5755).
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/messages'.
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/secure'.
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/maillog'.
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/cron'.
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/w01/var/log/httpd/ssipkipub.error_log'.
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/w01/var/log/httpd/ssipkipub.access_log'.
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/w01/var/log/httpd/error_log'.
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/w01/var/log/httpd/access_log'.
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/w01/var/log/httpd/ssl_error_log'.
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/w01/var/log/httpd/ssl_access_log'.
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ld01/var/log/dirsrv/admin-serv/error'.
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ld01/var/log/dirsrv/admin-serv/access'.
> 2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ld01/var/log/dirsrv/slapd-ldap01/errors'.
> 2009/03/30 07:09:50 ossec-logcollector: INFO: Started (pid: 5751).
> 2009/03/30 07:09:59 ossec-agentd(4102): INFO: Connected to the server (172.x.x.30:1514).
> 2009/03/30 07:19:03 ossec-syscheckd: INFO: Starting syscheck scan (db).
> 2009/03/30 07:38:01 ossec-syscheckd: INFO: Ending syscheck scan (db).
> 2009/03/30 07:38:21 ossec-rootcheck: INFO: Starting rootcheck scan.
> 
> I suppose this implies it is not about not finding files but something
> specific to searching these vserver directories. They should appear as
> normal file systems.  I will next try it without any vserver directories
> - John
<snip>
Argh! Even worse news.  It still hangs, even without a single mention of
vserver directories in the configuration.  As far as I can tell, this should
be just like a regular server - we are only scanning the host.  No clues in
the log files other than that it didn't take long to lock.  Here's the log
from restart:

2009/03/30 08:11:31 ossec-execd: INFO: Started (pid: 4373).
2009/03/30 08:11:31 ossec-agentd(1410): INFO: Reading authentication keys file.
2009/03/30 08:11:31 ossec-agentd: INFO: No previous counter available for 'vserver01'.
2009/03/30 08:11:31 ossec-agentd: INFO: Assigning counter for agent vserver01: '0:0'.
2009/03/30 08:11:31 ossec-agentd: INFO: Assigning sender counter: 7:3613
2009/03/30 08:11:31 ossec-agentd: INFO: Started (pid: 4377).
2009/03/30 08:11:31 ossec-agentd: INFO: Server IP Address: 172.30.10.30
2009/03/30 08:11:31 ossec-agentd: INFO: Trying to connect to server (172.30.10.30:1514).
2009/03/30 08:11:36 ossec-syscheckd: INFO: Started (pid: 4385).
2009/03/30 08:11:36 ossec-rootcheck: INFO: Started (pid: 4385).
2009/03/30 08:11:37 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/messages'.
2009/03/30 08:11:37 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/secure'.
2009/03/30 08:11:37 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/maillog'.
2009/03/30 08:11:37 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/cron'.
2009/03/30 08:11:37 ossec-logcollector: INFO: Started (pid: 4381).
2009/03/30 08:11:46 ossec-agentd(4102): INFO: Connected to the server (172.30.10.30:1514).
2009/03/30 08:11:57 ossec-logcollector(1225): INFO: SIGNAL Received. Exit Cleaning...
2009/03/30 08:11:57 ossec-syscheckd(1225): INFO: SIGNAL Received. Exit Cleaning...
2009/03/30 08:11:57 ossec-agentd(1225): INFO: SIGNAL Received. Exit Cleaning...
2009/03/30 08:11:57 ossec-execd(1314): INFO: Shutdown received. Deleting responses.
2009/03/30 08:11:57 ossec-execd(1225): INFO: SIGNAL Received. Exit Cleaning...
2009/03/30 08:12:50 ossec-execd: INFO: Started (pid: 5438).
2009/03/30 08:12:50 ossec-agentd(1410): INFO: Reading authentication keys file.
2009/03/30 08:12:50 ossec-agentd: INFO: No previous counter available for 'vserver01'.
2009/03/30 08:12:50 ossec-agentd: INFO: Assigning counter for agent vserver01: '0:0'.
2009/03/30 08:12:50 ossec-agentd: INFO: Assigning sender counter: 7:3623
2009/03/30 08:12:50 ossec-agentd: INFO: Started (pid: 5442).
2009/03/30 08:12:50 ossec-agentd: INFO: Server IP Address: 172.30.10.30
2009/03/30 08:12:50 ossec-agentd: INFO: Trying to connect to server (172.30.10.30:1514).
2009/03/30 08:12:51 ossec-agentd(4102): INFO: Connected to the server (172.30.10.30:1514).
2009/03/30 08:12:54 ossec-syscheckd: INFO: Started (pid: 5450).
2009/03/30 08:12:54 ossec-rootcheck: INFO: Started (pid: 5450).
2009/03/30 08:12:56 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/messages'.
2009/03/30 08:12:56 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/secure'.
2009/03/30 08:12:56 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/maillog'.
2009/03/30 08:12:56 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/cron'.
2009/03/30 08:12:56 ossec-logcollector: INFO: Started (pid: 5446).
2009/03/30 08:17:46 ossec-syscheckd: INFO: Starting syscheck scan (db).
2009/03/30 08:24:22 ossec-syscheckd: INFO: Ending syscheck scan (db).
2009/03/30 08:24:42 ossec-rootcheck: INFO: Starting rootcheck scan.

Where do I look now to solve this problem? Thanks - John
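
One way to narrow this down might be to pull out the last line each daemon
logged, to see which one was the last to report before a lockup.  What follows
is only a minimal sketch, assuming the log lines follow the
"date time daemon(code): message" layout shown in the excerpts above.  The
function name last_line_per_daemon is hypothetical, and wrapped continuation
lines are simply skipped.

```python
import re

# Matches lines such as:
#   2009/03/30 08:24:42 ossec-rootcheck: INFO: Starting rootcheck scan.
#   2009/03/30 08:12:51 ossec-agentd(4102): INFO: Connected to the server
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(ossec-[a-z]+)(?:\(\d+\))?: (.*)$"
)


def last_line_per_daemon(lines):
    """Map each daemon name to the (timestamp, message) of its last log line."""
    last = {}
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:  # lines that do not start with a timestamp are ignored
            timestamp, daemon, message = match.groups()
            last[daemon] = (timestamp, message)
    return last


if __name__ == "__main__":
    # Sample lines copied from the log in this message.
    sample = [
        "2009/03/30 08:12:56 ossec-logcollector: INFO: Started (pid: 5446).",
        "2009/03/30 08:17:46 ossec-syscheckd: INFO: Starting syscheck scan (db).",
        "2009/03/30 08:24:22 ossec-syscheckd: INFO: Ending syscheck scan (db).",
        "2009/03/30 08:24:42 ossec-rootcheck: INFO: Starting rootcheck scan.",
    ]
    for daemon, (stamp, message) in sorted(last_line_per_daemon(sample).items()):
        print(stamp, daemon, message)
```

Feeding it the open log file instead of the sample list would work the same
way, since it only needs an iterable of lines.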
-- 
John A. Sullivan III
Open Source Development Corporation
+1 207-985-7880
jsulli...@opensourcedevel.com

http://www.spiritualoutreach.com
Making Christianity intelligible to secular society
