Hi all,

Here is some quick background on my current monitoring setup:
I have an in-house tool for monitoring clusters. The tool simply uses ssh to launch Perl scripts on remote machines, grabs all of the output, and stores it in a logfile in a central location. This output is parsed for pre-defined tags (WARNING/CRITICAL/ERROR), and if any of these tags are found, the message is logged via syslog.

The script residing on each remote host is a collection of Perl functions, which are executed one after another. Some of these functions use a status file from the previous run to check whether the state of items has changed since last time. Some of them can also be given a special argument that saves the current state as the default (baseline) state for the next iteration of checks.

Clusters are monitored from their head nodes, since not all nodes are accessible from the central location. The head-node checks include a special function that simply uses DSH to launch the checks on all nodes.

After looking at Nagios and its check_cluster plugin, I realized I would really like to monitor each node individually, since I want to be able to disable a particular check on a particular node. I also want to be able to use status files for some of the checks. So far I have not found any plugin that uses a status file to monitor hosts; all of the plugins simply use the current output of commands to determine the status.

I will be using active checks on the clusters, so I will configure NRPE on all nodes. My plan of attack is to use the head node as a gateway, with all nodes and services defined on the head node (under NRPE). From the central location I can then simply execute a check_nrpe-type script to check the backend nodes. I still haven't figured out how to use the status files from each iteration of checks to validate the status.

I'd appreciate some input on the best options for monitoring clusters where the backend nodes are hidden from the central monitoring server, and also some help with the use of state files.

Thanks all,
TP.
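One way to reuse the existing Perl scripts under NRPE would be a small wrapper that turns their WARNING/CRITICAL/ERROR tags into a standard Nagios exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The sketch below assumes that ERROR should count as CRITICAL and that the script path passed to run_and_classify is hypothetical; it is an illustration, not a drop-in replacement for the current tool.

```python
"""Sketch: map WARNING/CRITICAL/ERROR tags in a script's output to a Nagios
exit code. Treating ERROR like CRITICAL is an assumption."""
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed severity of each tag used by the existing Perl functions.
TAG_SEVERITY = {"WARNING": WARNING, "CRITICAL": CRITICAL, "ERROR": CRITICAL}
TAG_RE = re.compile(r"\b(WARNING|CRITICAL|ERROR)\b")


def classify(output):
    """Return (worst_exit_code, tagged_lines) for the tags found in output."""
    worst, tagged = OK, []
    for line in output.splitlines():
        found = TAG_RE.findall(line)  # every tag on the line, not just the first
        if found:
            tagged.append(line.strip())
            worst = max([worst] + [TAG_SEVERITY[t] for t in found])
    return worst, tagged


def summary(code, tagged):
    """Build the single status line that Nagios displays."""
    if not tagged:
        return f"{LABELS[code]} - no tagged messages"
    return f"{LABELS[code]} - " + "; ".join(tagged[:3])


def run_and_classify(cmd):
    """Run an existing check script (hypothetical path) and classify its stdout."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, [str(exc)]
    return classify(proc.stdout)
```

A plugin built on this would print summary(code, tagged) and exit with code; syslog logging could stay in the existing tool or be added separately.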
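Regarding state files: nothing stops a Nagios plugin from keeping its own state on the node it runs on. A minimal sketch, assuming a JSON state file, a hypothetical default path of /var/tmp/check_mounts.state, a hypothetical --save-baseline flag (mirroring the special argument of the Perl functions), and a Linux-only example collector that records mounted filesystems from /proc/mounts:

```python
"""Sketch of a Nagios-style check that compares the current state with a
state file saved on a previous run. The JSON format, the default state-file
path and the --save-baseline flag are hypothetical choices."""
import argparse
import json
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def collect_mounts(path="/proc/mounts"):
    """Example collector (Linux only): map each mount point to its fs type."""
    state = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 3:
                state[fields[1]] = fields[2]
    return state


def load_state(path):
    """Return the saved baseline as a dict, or None if no state file exists."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def save_state(path, state):
    """Write the new baseline via temp file + rename, so readers never see half a file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f, sort_keys=True)
    os.replace(tmp, path)


def compare(baseline, current):
    """Return (exit_code, message) describing changes since the baseline."""
    if baseline is None:
        return UNKNOWN, "UNKNOWN - no baseline yet, run with --save-baseline"
    changed = sorted(k for k in baseline.keys() | current.keys()
                     if baseline.get(k) != current.get(k))
    if not changed:
        return OK, "OK - state unchanged"
    return CRITICAL, "CRITICAL - changed: " + ", ".join(changed)


def main(argv=None):
    """Parse arguments, run the check, print one status line, return the code."""
    parser = argparse.ArgumentParser(description="state-file check (sketch)")
    parser.add_argument("--state-file", default="/var/tmp/check_mounts.state")
    parser.add_argument("--save-baseline", action="store_true",
                        help="store the current state as the baseline")
    args = parser.parse_args(argv)
    try:
        current = collect_mounts()
        if args.save_baseline:
            save_state(args.state_file, current)
            print("OK - baseline saved to " + args.state_file)
            return OK
        code, msg = compare(load_state(args.state_file), current)
    except (OSError, ValueError) as exc:
        print("UNKNOWN - " + str(exc))
        return UNKNOWN
    print(msg)
    return code
```

Installed as a plugin, the script would end with `import sys` and `sys.exit(main())`. Running it once with --save-baseline records the "default" state; later runs exit CRITICAL if a mount disappears or changes type. Because the state file lives on the node itself, this works the same whether NRPE is reached directly or through the head node.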
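For the head-node-as-gateway plan, one option is to give the head node's nrpe.cfg one `command[...]` entry per node and check, each calling check_nrpe against a backend node. The sketch below only generates those lines; the node names, check names and the check_nrpe path are hypothetical and vary by installation.

```python
"""Sketch: generate head-node nrpe.cfg entries that proxy checks to backend
nodes with check_nrpe. Node names, check names and the plugin path are
hypothetical; adjust to your install."""

CHECK_NRPE = "/usr/lib/nagios/plugins/check_nrpe"  # path varies by distribution


def proxy_commands(nodes, checks, check_nrpe=CHECK_NRPE, timeout=30):
    """Return nrpe.cfg lines such as
    command[check_load_node01]=<check_nrpe> -H node01 -t 30 -c check_load
    where check_load must also be defined in node01's own nrpe.cfg."""
    lines = []
    for node in nodes:
        for check in checks:
            lines.append(
                f"command[{check}_{node}]={check_nrpe} -H {node} -t {timeout} -c {check}"
            )
    return lines


if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(1, 4)]  # hypothetical node names
    print("\n".join(proxy_commands(nodes, ["check_load", "check_disk"])))
```

On the central server, each generated command could then become its own Nagios service, e.g. `check_nrpe -H headnode -c check_load_node01` (headnode being a placeholder), so a single check on a single node can be disabled without affecting the others. The head node must be allowed to reach each backend node's NRPE daemon.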
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null