On 19 May 2010, at 00:09, Phillips, Dustin B wrote:

Hi Ton,

Interesting... I'm pretty confident the clock hasn't changed on the server, though. Only a couple of folks have access, and I have been checking the clock periodically since this started because I have been comparing the current time to log timestamps, etc. Also, the graphs stopped working last week when we upgraded to 3.7.0, but as far as we can tell the WebUI only went stale earlier today. That suggests the two issues may not be related.

We do have plans to migrate to a new monitoring host as early as tomorrow and were hoping to have a sane configuration on this server so that the problems are not carried over to the new server. At this point I'm not 100% confident our database is sane, not to mention our apparent failure to update RRDs. :(

Anyway, here is the data you suggested we check. Perhaps something will stand out to you.

Thanks again for your attention on this matter.

To check the RRDs, pick an RRD file that is not updating and run rrdtool dump {filename}.

Here are a few recent lines from the top of one of the failing RRDs:

<!-- 2010-05-18 21:30:00 GMT / 1274218200 --> <row><v> NaN </v></row>
<!-- 2010-05-18 21:35:00 GMT / 1274218500 --> <row><v> NaN </v></row>
<!-- 2010-05-18 21:40:00 GMT / 1274218800 --> <row><v> NaN </v></row>
<!-- 2010-05-18 21:45:00 GMT / 1274219100 --> <row><v> NaN </v></row>
<!-- 2010-05-18 21:50:00 GMT / 1274219400 --> <row><v> NaN </v></row>
<!-- 2010-05-18 21:55:00 GMT / 1274219700 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:00:00 GMT / 1274220000 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:05:00 GMT / 1274220300 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:10:00 GMT / 1274220600 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:15:00 GMT / 1274220900 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:20:00 GMT / 1274221200 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:25:00 GMT / 1274221500 --> <row><v> NaN </v></row>
<!-- 2010-05-18 22:30:00 GMT / 1274221800 --> <row><v> NaN </v></row>
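When there are many dumped rows to scan, a small Python sketch like the one below can confirm that every row is NaN and that the rows are evenly spaced. It assumes one data source per row and the `<!-- date / epoch --> <row><v> value </v></row>` layout of the dump lines above; the 300-second spacing it reports is simply what those timestamps imply, not something read from the RRD header. The function names are hypothetical.

```python
import math
import re

# One dumped row: "<!-- <date> / <epoch> --> <row><v> <value> </v></row>".
# Assumes a single data source, i.e. exactly one <v> per <row>.
ROW_RE = re.compile(r"<!--[^>]*?/\s*(\d+)\s*-->\s*<row><v>\s*([^<]+?)\s*</v></row>")


def parse_rows(dump_text):
    """Return (epoch_seconds, value) pairs; float() accepts the literal 'NaN'."""
    return [(int(ts), float(value)) for ts, value in ROW_RE.findall(dump_text)]


def summarize(rows):
    """Return (NaN row count, total rows, set of gaps between consecutive rows)."""
    nan_count = sum(1 for _, value in rows if math.isnan(value))
    gaps = {later[0] - earlier[0] for earlier, later in zip(rows, rows[1:])}
    return nan_count, len(rows), gaps


sample = (
    "<!-- 2010-05-18 21:30:00 GMT / 1274218200 --> <row><v> NaN </v></row> "
    "<!-- 2010-05-18 21:35:00 GMT / 1274218500 --> <row><v> NaN </v></row> "
    "<!-- 2010-05-18 21:40:00 GMT / 1274218800 --> <row><v> NaN </v></row>"
)
print(summarize(parse_rows(sample)))  # (3, 3, {300})
```

A file where every recent row is NaN but the gaps are regular means the RRD structure itself is intact; it is the incoming values that are missing.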

What's the value if you grep for lastupdate?
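`rrdtool last {filename}` prints the same timestamp directly. To turn the grepped value into something readable and compare it with the current time, a minimal Python sketch could look like this. It assumes the dump contains a `<lastupdate>` element holding a Unix epoch; the sample value is illustrative only and was not taken from the failing file.

```python
import re
from datetime import datetime, timezone

# rrdtool dump writes e.g. "<lastupdate>1274221800</lastupdate> <!-- ... -->";
# whitespace inside the element is tolerated in case a version pads it.
LASTUPDATE_RE = re.compile(r"<lastupdate>\s*(\d+)\s*</lastupdate>")


def last_update(dump_text):
    """Return the dump's <lastupdate> epoch as a timezone-aware UTC datetime."""
    match = LASTUPDATE_RE.search(dump_text)
    if match is None:
        raise ValueError("no <lastupdate> element found")
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


def seconds_behind(dump_text, now=None):
    """Seconds from lastupdate to `now`; negative means lastupdate is in the future."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - last_update(dump_text)).total_seconds()


sample = "<lastupdate>1274221800</lastupdate>"  # illustrative value only
print(last_update(sample))  # 2010-05-18 22:30:00+00:00
```

A negative `seconds_behind` would point to a timestamp in the future, which rrdtool refuses to update past.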

The other thing we've seen is a version mismatch between rrdtool and the Perl libraries. What platform are you on, and which versions of the rrdtool libraries do you have? Normally a mismatch affects all rrdtool operations, though I guess it could affect inserts but not reads.

To check the runtime database, run: select * from nagios_programstatus\G

mysql> select * from nagios_programstatus\G
*************************** 1. row ***************************
              programstatus_id: 961480
                   instance_id: 1
            status_update_time: 2010-05-18 22:55:31
            program_start_time: 2010-05-18 22:17:24
              program_end_time: 0000-00-00 00:00:00
          is_currently_running: 1
                    process_id: 2829
                   daemon_mode: 1
            last_command_check: 2010-05-18 22:55:31
             last_log_rotation: 1970-01-01 00:00:00
         notifications_enabled: 1
 active_service_checks_enabled: 1
passive_service_checks_enabled: 1
    active_host_checks_enabled: 1
   passive_host_checks_enabled: 1
        event_handlers_enabled: 1
        flap_detection_enabled: 1
    failure_prediction_enabled: 1
      process_performance_data: 1
             obsess_over_hosts: 1
          obsess_over_services: 1
      modified_host_attributes: 223
   modified_service_attributes: 223
     global_host_event_handler:
  global_service_event_handler:
1 row in set (0.00 sec)

This looks okay.
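The "looks okay" judgement here rests on two fields: the daemon reports `is_currently_running = 1`, and its `status_update_time` is current. A heuristic Python sketch of that check follows; the five-minute threshold, the dictionary layout of the row, and the function name are assumptions for illustration, not Opsview behaviour.

```python
from datetime import datetime, timedelta


def programstatus_looks_healthy(row, now, max_age=timedelta(minutes=5)):
    """Heuristic check of one nagios_programstatus row.

    The five-minute threshold is an assumption, not an Opsview setting.
    """
    if row["is_currently_running"] != 1:
        return False
    age = now - row["status_update_time"]
    # A timestamp in the future is treated as just as suspicious as a stale one.
    return timedelta(0) <= age <= max_age


# Values copied from the programstatus row above; `now` is a hypothetical check time.
row = {
    "is_currently_running": 1,
    "status_update_time": datetime(2010, 5, 18, 22, 55, 31),
}
print(programstatus_looks_healthy(row, now=datetime(2010, 5, 18, 22, 56, 0)))  # True
```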

To check the hosts, run: select max(status_update_time) from nagios_hoststatus;

mysql> select max(status_update_time) from nagios_hoststatus;
+-------------------------+
| max(status_update_time) |
+-------------------------+
| 2010-05-19 02:46:19     |
+-------------------------+
1 row in set (0.00 sec)

This is why some hosts are not updating. Strangely, the value has been pushed a few hours ahead: it is later than the status_update_time in nagios_programstatus. You can fix this by changing the value in the database; updates should then work again.
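The offset and the fix can be sketched in Python. The sketch below computes how far 2010-05-19 02:46:19 is ahead of the programstatus time 2010-05-18 22:55:31 (3h50m48s), then uses an in-memory SQLite table as a stand-in for the MySQL `nagios_hoststatus` table to show one possible correction: resetting any future `status_update_time` to the current time. Resetting to "now" is an assumption; the advice above only says to change the value. The table layout, the sample rows, and the function names are hypothetical. On MySQL the equivalent statement would presumably be `UPDATE nagios_hoststatus SET status_update_time = NOW() WHERE status_update_time > NOW();`, which has not been tested against Opsview.

```python
import sqlite3
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def seconds_ahead(host_time, program_time):
    """How far a host status time is ahead of the program status time, in seconds."""
    return (datetime.strptime(host_time, FMT) - datetime.strptime(program_time, FMT)).total_seconds()


def clamp_future_status_times(conn, now):
    """Reset every status_update_time later than `now` back to `now`; return rows changed."""
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically, so text comparison works here.
    cur = conn.execute(
        "UPDATE nagios_hoststatus SET status_update_time = ? WHERE status_update_time > ?",
        (now, now),
    )
    conn.commit()
    return cur.rowcount


# In-memory SQLite stand-in; the real table lives in MySQL and has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nagios_hoststatus (host_object_id INTEGER, status_update_time TEXT)")
conn.executemany(
    "INSERT INTO nagios_hoststatus VALUES (?, ?)",
    [(1, "2010-05-19 02:46:19"), (2, "2010-05-18 22:50:00")],  # hypothetical rows
)
print(seconds_ahead("2010-05-19 02:46:19", "2010-05-18 22:55:31"))  # 13848.0
print(clamp_future_status_times(conn, "2010-05-18 22:55:31"))  # 1
```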

Ton

_______________________________________________
Opsview-users mailing list
[email protected]
http://lists.opsview.org/lists/listinfo/opsview-users