Re: nagios to monitor hadoop datanodes!

Steve Loughran Wed, 08 Oct 2008 08:55:03 -0700

Edward Capriolo wrote:

The simple way would be to use nrpe and check_proc. I have never
tested, but a command like 'ps -ef | grep java | grep NameNode' would
be a fairly decent check. That is not very robust but it should let
you know if the process is alive.
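
For illustration, the same process check could be sketched as a small Python helper that nrpe might call. This is untested against a real cluster: the function names and the "NameNode" marker are assumptions, and the exit codes 0 and 2 follow the usual Nagios plugin convention for OK and CRITICAL.

```python
import subprocess


def matching_processes(ps_output, marker):
    """Return the command lines that mention both 'java' and `marker`."""
    return [
        line.strip()
        for line in ps_output.splitlines()
        if "java" in line and marker in line
    ]


def check_process(marker="NameNode"):
    """Return a Nagios-style (exit_code, message) pair for `marker`."""
    # `ps -e -o args` lists the full command line of every process.
    ps_output = subprocess.run(
        ["ps", "-e", "-o", "args"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    found = matching_processes(ps_output, marker)
    if found:
        return 0, f"OK - {len(found)} {marker} process(es) running"
    return 2, f"CRITICAL - no {marker} process found"
```

A wrapper script would print the message and exit with the returned code. Like the grep pipeline, this only shows that a process exists, not that it is answering requests.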


You could also monitor the web interfaces associated with the
different servers remotely.

check_tcp!hadoop1:56070
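
What a TCP check of this kind does can be sketched in a few lines of Python. This is a rough stand-in for illustration, not the check_tcp plugin itself; the function name and the 5 second timeout are assumptions.

```python
import socket


def check_tcp(host, port, timeout=5.0):
    """Return a Nagios-style (exit_code, message) pair for a TCP connect."""
    try:
        # Success only means something is listening on the port; it says
        # nothing about whether the service behind it is healthy.
        with socket.create_connection((host, port), timeout=timeout):
            return 0, f"OK - {host}:{port} accepts connections"
    except OSError as exc:
        return 2, f"CRITICAL - {host}:{port} unreachable ({exc})"
```

For the example above this would be called as check_tcp("hadoop1", 56070).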

Both the methods I suggested are quick hacks. I am going to
investigate the JMX options as well and work them into cacti.

We're developing liveness and pings under a couple of JIRA issues;
nothing will be released before 0.20.


https://issues.apache.org/jira/browse/HADOOP-3628
https://issues.apache.org/jira/browse/HADOOP-3969

I don't consider hitting the web page a quick hack; for HADOOP-3969 I'd
quite like to have the public liveness test a page you can GET or HEAD,
as that way it becomes trivial for your existing web page health
checking code to pull in all the hadoop services. The best bit: when it
fails, the ops team can point their browser at the same URL and see what
is up. And if you are a standalone developer, you are the ops team!
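
As a rough sketch of what such a health check could look like from the monitoring side, here is a HEAD request in Python. Nothing has been released yet, so the path is a placeholder rather than a real Hadoop URL, and the function name, the timeout and the "2xx/3xx means alive" rule are all assumptions.

```python
import http.client


def check_http(host, port, path="/", method="HEAD", timeout=5.0):
    """Return a Nagios-style (exit_code, message) pair for an HTTP probe."""
    url = f"http://{host}:{port}{path}"
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        return 2, f"CRITICAL - {method} {url} failed ({exc})"
    finally:
        conn.close()
    # Assumption: any 2xx or 3xx status counts as "alive".
    if 200 <= status < 400:
        return 0, f"OK - {method} {url} returned {status}"
    return 2, f"CRITICAL - {method} {url} returned {status}"
```

The URL in the message is the same one a person can open in a browser when the check goes red.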


-steve

--
Steve Loughran                  http://www.1060.org/blogxter/publish/5
Author: Ant in Action           http://antbook.org/
