Peter,

You haven't mentioned which Linux distribution this is, the approximate number of hosts or services being monitored, or whether this is a distributed monitoring system. It's always good to give some background info on the system, plus the Opsview version number, when asking for help.

The fact that you are not seeing insert errors in your Opsview log could mean the database is getting updated, or that Opsview isn't started at all, or something else.

First, are all the key processes running?

Check if opsview-web is running:

    /etc/init.d/opsview-web status

Check if the Opsview service and Nagios are running:

    /etc/init.d/opsview status

You could also check whether the underlying Nagios "runtime" database is getting written to. You do know how to access the MySQL database, right? If not, we all had to learn sometime. If you don't know the ID and password to get into MySQL, you could borrow the ones Opsview uses. Opsview would usually connect as user nagios. The Opsview password would normally be in the /usr/local/nagios/etc/opsview.conf file.

The command below pulls the last-seen info from the core monitoring runtime database. Times are in UTC, so you will have to convert to local time to know when things were last seen.

    r...@nms:~# mysql -uroot -p --batch -e "SELECT status_update_time,is_currently_running,last_command_check FROM runtime.nagios_programstatus"
    Enter password:
    status_update_time     is_currently_running   last_command_check
    2009-12-13 14:52:36    1                      2009-12-13 14:52:36

There are very few times a Linux box needs a reboot; just learning how to restart services in Linux usually takes care of 95% of the issues that arise.

Anyway, maybe someone else will chime in with some better instructions, but maybe these steps will give you some things to try. I hate to see you rebuild things if the fix is simple.

James Whittington
VC3, Inc.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Peter Plate
Sent: Sunday, December 13, 2009 6:16 AM
To: Opsview Users
Subject: Re: [opsview-users] Status Page contains old info

Hello,

I've just copied my my.cnf and edited the original. Rebooted the server just to be sure (reboot cmd).
Still no update on my status screens in Opsview. The status page still shows the last check as some hours ago, while clicking on the service shows the last check as some minutes ago (less than the set interval of 5 minutes).

Last entries in my opsview/opsview-web.log:

[2009/12/13 11:40:57] [Opsview.Web.Controller.Root] [ERROR] Errors encountered: DBIx::Class::ResultSet::first(): DBI Connection failed: DBI connect('database=opsview;host=l$
[2009/12/13 11:40:57] [Catalyst] [ERROR] DBIx::Class::ResultSet::first(): DBI Connection failed: DBI connect('database=opsview;host=localhost','opsview',...) failed: Can't $
[2009/12/13 11:40:57] [Catalyst] [ERROR] Caught exception in Opsview::Web::Controller::Root->end "Can't insert new Opsview::Auditlog: DBI connect('database=opsview;host=loc$ at /usr/local/opsview-web/script/../lib/Opsview/Web/Controller/Root.pm line 344"
[2009/12/13 11:55:48] [Opsview.Web.Controller.Root] [INFO] Username 'admin' logged in via auth_tkt

It seems it's unable to connect to the database. Or am I reading this error incorrectly?

Last entries in my user.log:

Dec 13 11:46:38 nagios ndo2db: Error: mysql_query() failed for 'INSERT INTO nagios_servicechecks SET instance_id='1', service_object_id='6778', check_type='0', current_chec$
Dec 13 11:46:38 nagios ndo2db: mysql_error: 'Lock wait timeout exceeded; try restarting transaction'

After the reboot at 11:48 there are no more messages in the user.log.

I also wanted to note the following: one of my hosts had downtime planned. This downtime is shown on the Opsview status page. However, when I click on it and remove it, it is not removed. I've already waited 10 minutes for it to go away, but it's still marked as downtime, and I can still remove it (again and again).

It would be great to get this fixed. If we cannot fix it quickly, I might have to rebuild the machine, as we depend on the monitoring for now.

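The ndo2db messages pasted in this thread are easier to compare when counted by error text, since a single failure mode can repeat hundreds of times. The following is only a minimal sketch, assuming standard grep, sed, sort and uniq, and assuming the log lines look like the ones quoted in this thread. The name summarize_ndo2db_errors is hypothetical (it is not an Opsview or Nagios tool), and the path /var/log/user.log in the usage comment is an assumption, since the thread only says "user.log".

```shell
# Count ndo2db MySQL errors in a syslog-style file, grouped by error text.
# Hypothetical helper for illustration; not part of Opsview or NDOUtils.
summarize_ndo2db_errors() {
    # 1. keep only the "mysql_error:" lines written by ndo2db
    # 2. reduce each line to the text between the single quotes
    # 3. count identical messages, most frequent first
    grep 'ndo2db: mysql_error:' "$1" \
        | sed "s/.*mysql_error: '\(.*\)'.*/\1/" \
        | sort | uniq -c | sort -rn
}

# Usage (path is an assumption; adjust to where your syslog writes user.log):
#   summarize_ndo2db_errors /var/log/user.log
```

Each output line is a count followed by the error message, so a change from one error to another after a configuration change or restart shows up at a glance.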
________________________________
From: [email protected] on behalf of James Whittington
Sent: Sun 13-12-2009 4:46
To: Opsview Users
Subject: Re: [opsview-users] Status Page contains old info

Peter,

That is an interesting MySQL error. I haven't run across it before, but a Google search brought up some good hits on "total number of locks exceeds the lock table size". Here is one such link:

http://mrothouse.wordpress.com/2006/10/20/mysql-error-1206/

It appears others were able to resolve the error by increasing the innodb_buffer_pool_size variable (http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html) in /etc/my.cnf. I would suggest making a copy of my.cnf, then adjusting that variable and restarting MySQL to see if that makes a difference.

Good luck.

James Whittington
VC3, Inc.

From: [email protected] [mailto:[email protected]] On Behalf Of Peter Plate
Sent: Saturday, December 12, 2009 4:49 PM
To: Opsview Users
Subject: Re: [opsview-users] Status Page contains old info

Syslog:

Dec 12 22:35:02 nagios /USR/SBIN/CRON[11973]: (nagios) CMD (/usr/local/nagios/bin/call_nmis nmis.pl type=collect mthread=true > /dev/null 2>&1)
Dec 12 22:35:02 nagios /USR/SBIN/CRON[11972]: (root) CMD (if [ -x /usr/bin/mrtg ] && [ -r /etc/mrtg.cfg ]; then env LANG=C /usr/bin/mrtg /etc/mrtg.cfg >> /var/log/mrtg/mrtg$
Dec 12 22:35:03 nagios /USR/SBIN/CRON[11975]: (nagios) CMD (/usr/local/nagios/bin/mrtg_genstats.sh > /dev/null 2>&1)
Dec 12 22:35:15 nagios ndo2db: Successfully disconnected from MySQL database
Dec 12 22:35:15 nagios ndo2db: Successfully connected to MySQL database
Dec 12 22:36:14 nagios ndo2db: Successfully disconnected from MySQL database
Dec 12 22:36:14 nagios ndo2db: Successfully connected to MySQL database
Dec 12 22:36:34 nagios ndo2db: Successfully disconnected from MySQL database
Dec 12 22:36:34 nagios ndo2db: Successfully connected to MySQL database

There are some older items which contain innodb instructions but seem ok to me.

Opsview is still 3.03.2290.

My user.log contains the following errors:

Dec 12 16:48:31 nagios ndo2db: Error: mysql_query() failed for 'INSERT INTO nagios_servicechecks SET instance_id='1', service_object_id='7636', check_type='0', current_chec$
Dec 12 16:48:31 nagios ndo2db: mysql_error: 'The total number of locks exceeds the lock table size'
Dec 12 16:48:31 nagios ndo2db: Error: mysql_query() failed for 'INSERT INTO nagios_servicechecks SET instance_id='1', service_object_id='3663', check_type='0', current_chec$
Dec 12 16:48:31 nagios ndo2db: mysql_error: 'The total number of locks exceeds the lock table size'
{Truncated !}

They seem to be related to the insert failures. However, I really don't know how to fix it.

________________________________
From: [email protected] [mailto:[email protected]] On Behalf Of Ton Voon
Sent: Saturday, 12 December 2009 22:33
To: Opsview Users
Subject: Re: [opsview-users] Status Page contains old info

On 12 Dec 2009, at 21:13, Peter Plate wrote:

    Hello All,

    Last night our server with Opsview stopped responding, so we needed to hard reboot it (power off/on). Now it seems the checks are OK and still scheduled; however, our status page contains old/outdated info.

    For example, a service check had a problem today at 1:00 am and was OK at 1:15 am (after a restart of the service). The status of the service check itself shows OK now (clicking on it shows status: OK, and the last check time was some minutes ago). But on the status page (Alerts > All Unhandled) it is still in CRITICAL state, and the last check is shown as about 8 hours ago, even though it is checked every 5 minutes.

    Can you help us get the Opsview pages in sync again?

    PS: My Unix knowledge is limited; the system was installed by previous system administrators. However, I learn quickly.

You likely have some database problem where the data is not being inserted correctly. Check syslog messages for mysql problems.

Which version of Opsview are you using?

Ton

_______________________________________________
Opsview-users mailing list
[email protected]
http://lists.opsview.org/lists/listinfo/opsview-users
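
James's suggestion in this thread was to make a copy of my.cnf, adjust innodb_buffer_pool_size, and restart MySQL. Before editing, it can help to see what the file currently sets. The following is only a minimal sketch, assuming a POSIX shell and awk. The name my_cnf_value is hypothetical (it is not part of MySQL or Opsview), and the helper ignores [section] headers, so it simply reports the last uncommented occurrence of the key anywhere in the file.

```shell
# Print the value assigned to a key (e.g. innodb_buffer_pool_size) in a
# my.cnf-style file. Prints nothing if the key is not set.
# Hypothetical helper for illustration; it does not parse [section] headers.
my_cnf_value() {
    awk -F'=' -v key="$2" '
        /^[ \t]*[#;]/ { next }              # skip comment lines
        {
            k = $1; gsub(/[ \t]/, "", k)    # key with whitespace removed
            if (k == key) {
                v = $2; gsub(/[ \t]/, "", v)
                val = v                     # a later occurrence overrides an earlier one
            }
        }
        END { if (val != "") print val }
    ' "$1"
}

# Usage (the /etc/my.cnf path is the one given in the thread; some
# distributions keep the file elsewhere):
#   cp -p /etc/my.cnf /etc/my.cnf.bak
#   my_cnf_value /etc/my.cnf innodb_buffer_pool_size
```

The value the running server is actually using can be checked separately from the mysql client with SHOW VARIABLES LIKE 'innodb_buffer_pool_size', which is useful for confirming that a restart picked up the edited file.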
