Someone on the list is looking at monitoring Hadoop features with
Nagios. Nagios can be configured with an event_handler, and in the past
I have written event handlers that do exactly this kind of thing: if the
service is down, use an SSH key to log in and restart it.
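A minimal sketch of such an event handler in Python follows. The install path, the `hadoop` user, the key location and the policy (act on the third SOFT attempt or on any HARD CRITICAL) are my assumptions. The macros $SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$ and $HOSTADDRESS$ are standard Nagios macros that you would pass in from the command definition.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios event handler that restarts a Hadoop DataNode over SSH.

Assumed Nagios command definition (hypothetical paths):
  command_line  /usr/local/nagios/libexec/eventhandlers/restart_datanode.py \
                $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$
"""
import os
import subprocess
import sys
from typing import List

# Hypothetical: where hadoop-daemon.sh lives on the slaves.
RESTART_CMD = "/usr/local/hadoop/bin/hadoop-daemon.sh start datanode"


def should_restart(state: str, state_type: str, attempt: int) -> bool:
    """Act only on CRITICAL: on the third SOFT attempt, or once the state is HARD."""
    if state != "CRITICAL":
        return False
    return state_type == "HARD" or (state_type == "SOFT" and attempt == 3)


def ssh_restart_argv(host: str, user: str = "hadoop",
                     key: str = "~/.ssh/id_rsa") -> List[str]:
    """Build the ssh command line (hypothetical user and key path).
    BatchMode=yes makes ssh fail instead of prompting for a password."""
    return ["ssh", "-i", os.path.expanduser(key), "-o", "BatchMode=yes",
            f"{user}@{host}", RESTART_CMD]


def main(argv: List[str]) -> int:
    state, state_type, attempt, host = argv
    if should_restart(state, state_type, int(attempt)):
        return subprocess.call(ssh_restart_argv(host))
    return 0


# Only run when called by Nagios with exactly the four macro arguments.
if __name__ == "__main__" and len(sys.argv) == 5:
    sys.exit(main(sys.argv[1:]))
```

Nagios runs event handlers as its own user, so that user needs read access to the key. The SOFT branch only fires if the service's max_check_attempts is greater than 3.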

However, since you already have an SSH key on your master node, you
should be able to run a centralized node restarter from cron on the
master. That may be an interesting argument for running a separate
Nagios instance as your hadoop user, so its event handlers can use
the same key!
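A master-side restarter might look roughly like this. The sketch assumes Hadoop 1.x-style `conf/slaves` and `bin/hadoop-daemon.sh` under a hypothetical /usr/local/hadoop, passwordless SSH from the cron user to every slave, and `pgrep` on the slaves. It only checks DataNodes.

```python
#!/usr/bin/env python3
"""Sketch: run from the hadoop user's crontab on the master, e.g.
  */5 * * * * /home/hadoop/bin/restart_slaves.py
(hypothetical path). Checks each slave over SSH and restarts dead DataNodes."""
import os
import shlex
import subprocess
from typing import Dict, List

# Hypothetical locations; adjust to your install.
SLAVES_FILE = "/usr/local/hadoop/conf/slaves"
# The [D] keeps pgrep -f from matching the remote shell, whose own command
# line contains this text but not the string "DataNode".
DAEMON_PATTERN = "org.apache.hadoop.hdfs.server.datanode.[D]ataNode"
RESTART_CMD = "/usr/local/hadoop/bin/hadoop-daemon.sh start datanode"
SSH_OPTS = ["-o", "BatchMode=yes", "-o", "ConnectTimeout=10"]


def parse_slaves(text: str) -> List[str]:
    """One host per line; blank lines and '#' comments are ignored."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def remote_status(host: str) -> int:
    """pgrep's exit code on the slave (0 = running, 1 = not running);
    ssh itself exits 255 on connection errors."""
    remote = "pgrep -f " + shlex.quote(DAEMON_PATTERN)
    return subprocess.call(["ssh", *SSH_OPTS, host, remote],
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def hosts_to_restart(statuses: Dict[str, int]) -> List[str]:
    """Restart only where pgrep positively reported no match (exit 1);
    unreachable hosts (255) are left for a human to look at."""
    return [host for host, rc in statuses.items() if rc == 1]


def main() -> None:
    if not os.path.exists(SLAVES_FILE):
        print(f"no slaves file at {SLAVES_FILE}; nothing to do")
        return
    with open(SLAVES_FILE) as f:
        hosts = parse_slaves(f.read())
    statuses = {host: remote_status(host) for host in hosts}
    for host in hosts_to_restart(statuses):
        print(f"restarting DataNode on {host}")
        subprocess.call(["ssh", *SSH_OPTS, host, RESTART_CMD])


if __name__ == "__main__":
    main()
```

Skipping unreachable hosts is deliberate. If ssh cannot connect, a restart over ssh would fail anyway, and a dead box needs more than a daemon restart.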

Alternatively, as suggested above, you can run a cron job on each slave.
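A per-slave watchdog needs no SSH at all. The sketch below assumes a hypothetical /usr/local/hadoop install and `pgrep` on each slave. It restarts only when `pgrep` cleanly reports that nothing matched.

```python
#!/usr/bin/env python3
"""Sketch of a per-slave DataNode watchdog, run from the hadoop user's crontab
on each slave, e.g.  */5 * * * * /home/hadoop/bin/datanode_watchdog.py
(hypothetical path)."""
import subprocess

# The [D] is a common trick so the pattern never matches a command line that
# merely contains the pattern text (such as a wrapping shell).
DAEMON_PATTERN = "org.apache.hadoop.hdfs.server.datanode.[D]ataNode"
# Hypothetical install path.
RESTART_ARGV = ["/usr/local/hadoop/bin/hadoop-daemon.sh", "start", "datanode"]


def needs_restart(pgrep_rc: int) -> bool:
    """pgrep exits 0 if something matched, 1 if nothing matched, >1 on error.
    Only a clean 'nothing matched' triggers a restart."""
    return pgrep_rc == 1


def main() -> None:
    try:
        rc = subprocess.call(["pgrep", "-f", DAEMON_PATTERN],
                             stdout=subprocess.DEVNULL)
        if needs_restart(rc):
            subprocess.call(RESTART_ARGV)
    except OSError as exc:  # pgrep or the restart script is missing
        print(f"watchdog error: {exc}")


if __name__ == "__main__":
    main()
```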

The catch with any system like this is that you have to remember to
disable it when you actually want the service down, for maintenance
and so on.
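One low-tech approach is a maintenance flag file that every restarter checks before doing anything. The flag path and the `maybe_restart` helper below are hypothetical. You would wire the check into whatever event handler or cron script you use.

```python
import os
from typing import Callable

# Hypothetical flag file: touch it before planned maintenance, delete it afterwards.
MAINTENANCE_FLAG = "/var/run/hadoop-maintenance"


def in_maintenance(flag_path: str = MAINTENANCE_FLAG) -> bool:
    """True while the flag file exists; restart scripts should do nothing then."""
    return os.path.exists(flag_path)


def maybe_restart(restart: Callable[[], None],
                  flag_path: str = MAINTENANCE_FLAG) -> bool:
    """Call restart() unless maintenance is flagged; return whether it ran."""
    if in_maintenance(flag_path):
        return False
    restart()
    return True
```

A file is convenient because any admin can create or remove it by hand, and every script on the box sees the same state without extra configuration.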

We run Nagios and Cacti, so I would like to develop check scripts for
these services. I am going to put an SVN repo together; if anyone is
interested in contributing, let me know.
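As a starting point, here is a sketch of a Nagios-style check. It uses the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and puts perfdata after the `|`. The live-DataNode metric, the script name and the argument order are my assumptions. Collecting the count (for example from `hadoop dfsadmin -report`) is left out, so here it is simply passed on the command line.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check for the number of live DataNodes.
Usage (hypothetical): check_datanodes.py LIVE WARN CRIT"""
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(live: int, warn: int, crit: int) -> int:
    """Fewer live nodes is worse: below crit is CRITICAL, below warn is WARNING."""
    if live < 0:
        return UNKNOWN
    if live < crit:
        return CRITICAL
    if live < warn:
        return WARNING
    return OK


def status_line(live: int, warn: int, crit: int) -> str:
    """One output line. Perfdata thresholds use the 'N:' range form,
    meaning 'alert if below N', to match classify()."""
    code = classify(live, warn, crit)
    return f"DATANODES {LABELS[code]} - {live} live | live={live};{warn}:;{crit}:;0"


def main(argv) -> int:
    try:
        live, warn, crit = (int(a) for a in argv)
    except ValueError:
        print("DATANODES UNKNOWN - usage: check_datanodes.py LIVE WARN CRIT")
        return UNKNOWN
    print(status_line(live, warn, crit))
    return classify(live, warn, crit)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```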
