On 19 May 2010, at 00:09, Phillips, Dustin B wrote:
Hi Ton,
Interesting... I'm pretty confident the clock has not changed on the
server, though. Only a couple of people have access, and I have been
checking the clock periodically since this started, comparing the
current time to log timestamps, etc.
Also, the graphs stopped working last week when we upgraded to 3.7.0
but it appears the WebUI went stale earlier today as far as we can
tell. That may suggest the issues are not necessarily related.
We do have plans to migrate to a new monitoring host as early as
tomorrow and were hoping to have a sane configuration on this server
so that the problems are not carried over to the new server. At
this point I'm not 100% confident our database is sane, not to
mention our apparent failure to update RRDs. :(
Anyway, here is the data you suggested we check. Perhaps something
will stand out to you.
Thanks again for your attention on this matter.
To check the RRDs, pick an RRD file that is not updating and run
rrdtool dump {filename}.
Here are a few recent lines from the top of one of the failing RRDs:
<!-- 2010-05-18 21:30:00 GMT / 1274218200 --> <row><v> NaN </v></row>
<!-- 2010-05-18 21:35:00 GMT / 1274218500 --> <row><v> NaN </v></row>
<!-- 2010-05-18 21:40:00 GMT / 1274218800 --> <row><v> NaN </v></row>
<!-- 2010-05-18 21:45:00 GMT / 1274219100 --> <row><v> NaN </v></row>
<!-- 2010-05-18 21:50:00 GMT / 1274219400 --> <row><v> NaN </v></row>
<!-- 2010-05-18 21:55:00 GMT / 1274219700 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:00:00 GMT / 1274220000 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:05:00 GMT / 1274220300 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:10:00 GMT / 1274220600 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:15:00 GMT / 1274220900 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:20:00 GMT / 1274221200 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:25:00 GMT / 1274221500 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:30:00 GMT / 1274221800 --> <row><v> NaN </v></row>
What's the value if you grep for lastupdate?
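If grepping by hand gets tedious, the same check can be scripted. The
following is only a rough sketch: it assumes the output of "rrdtool dump"
has been saved to a string, and the helper names (parse_lastupdate,
seconds_since_update) and the sample fragment are made up for
illustration. The sample reuses the epoch of the last row quoted above;
the real lastupdate value in the failing file may well differ.

```python
import re
import time
from typing import Optional


def parse_lastupdate(dump_xml: str) -> int:
    """Return the epoch seconds stored in the <lastupdate> element."""
    match = re.search(r"<lastupdate>\s*(\d+)\s*</lastupdate>", dump_xml)
    if match is None:
        raise ValueError("no <lastupdate> element found in dump")
    return int(match.group(1))


def seconds_since_update(dump_xml: str, now: Optional[float] = None) -> float:
    """Seconds between `now` and the RRD's last update.

    Positive: the file is stale by that many seconds.
    Negative: lastupdate lies in the future relative to `now`.
    """
    if now is None:
        now = time.time()
    return now - parse_lastupdate(dump_xml)


# Hypothetical dump fragment, not taken from the failing file.
sample = (
    "<rrd> <version> 0003 </version> <step> 300 </step> "
    "<lastupdate> 1274221800 </lastupdate> </rrd>"
)

if __name__ == "__main__":
    print(parse_lastupdate(sample))
    print(seconds_since_update(sample, now=1274221800 + 600))
```

A negative result is the case worth looking for here: rrdtool rejects
any update whose timestamp is not later than the stored lastupdate, so a
file stamped in the future stops accepting data until the clock catches
up.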
The other thing we've seen is a mismatch between rrdtool and the Perl
libraries. What platform are you on? Which versions of rrdtool and its
Perl libraries do you have? Normally this affects all rrdtool
operations, though I guess it could affect inserts but not reads.
To check the runtime database, run: select * from
nagios_programstatus\G
mysql> select * from nagios_programstatus;
programstatus_id:               961480
instance_id:                    1
status_update_time:             2010-05-18 22:55:31
program_start_time:             2010-05-18 22:17:24
program_end_time:               0000-00-00 00:00:00
is_currently_running:           1
process_id:                     2829
daemon_mode:                    1
last_command_check:             2010-05-18 22:55:31
last_log_rotation:              1970-01-01 00:00:00
notifications_enabled:          1
active_service_checks_enabled:  1
passive_service_checks_enabled: 1
active_host_checks_enabled:     1
passive_host_checks_enabled:    1
event_handlers_enabled:         1
flap_detection_enabled:         1
failure_prediction_enabled:     1
process_performance_data:       1
obsess_over_hosts:              1
obsess_over_services:           1
modified_host_attributes:       223
modified_service_attributes:    223
global_host_event_handler:
global_service_event_handler:
1 row in set (0.00 sec)
This looks okay.
To check the hosts, run: select max(status_update_time) from
nagios_hoststatus;
mysql> select max(status_update_time) from nagios_hoststatus;
+-------------------------+
| max(status_update_time) |
+-------------------------+
| 2010-05-19 02:46:19 |
+-------------------------+
1 row in set (0.00 sec)
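This is why some hosts are not updating. The newest status_update_time
in nagios_hoststatus is 2010-05-19 02:46:19, a few hours ahead of the
2010-05-18 22:55:31 shown in nagios_programstatus, which is strange.
You can fix this by changing the value in the database, and then
updates will work again.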
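
To put a number on how far ahead the value is, the two timestamps
quoted in this thread can be subtracted directly. This is a minimal
sketch: the function name status_skew is made up for illustration, and
it assumes both values come from the same server clock and time zone.
The two queries were run a little apart, so the result is approximate.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the MySQL output quoted in this thread.
MYSQL_FORMAT = "%Y-%m-%d %H:%M:%S"


def status_skew(newest_host_update: str, program_status_update: str) -> timedelta:
    """Return how far the newest host status is ahead of the program status.

    A positive result means nagios_hoststatus holds a timestamp that is
    later than the most recent nagios_programstatus update.
    """
    newest = datetime.strptime(newest_host_update, MYSQL_FORMAT)
    reference = datetime.strptime(program_status_update, MYSQL_FORMAT)
    return newest - reference


# Values copied from the two query results in this thread.
skew = status_skew("2010-05-19 02:46:19", "2010-05-18 22:55:31")

if __name__ == "__main__":
    print(skew)  # 3:50:48
```

A skew of just under four hours points at a time zone or clock
adjustment more than at ordinary drift, though that is a guess from the
size of the gap alone.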
Ton