[ 
https://issues.apache.org/jira/browse/HADOOP-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066833#comment-14066833
 ] 

Allen Wittenauer commented on HADOOP-6473:
------------------------------------------

We sort of have this today with the health check being done by YARN.  But that 
really should be expanded to cover HDFS as well.  That's probably a separate 
JIRA from this one, however.

> Add hadoop health check/diagnostics to run from command line, JSP pages, 
> other tools
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6473
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6473
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Steve Loughran
>            Priority: Minor
>
> If the lifecycle ping() is for short-duration "are we still alive" checks, 
> Hadoop still needs something bigger to check the overall system health. This 
> would be for end users, but also for automated cluster deployment: a complete 
> validation of the cluster.
> It could be a command line tool, and something that runs on different nodes, 
> checked via IPC or JSP. The idea would be to do thorough checks with good 
> diagnostics.  Oh, and they should be executable through JUnit too.
> For example:
>  -if running on Windows, check that Cygwin is on the path, fail with a 
> pointer to a wiki issue if not
>  -datanodes should check that they can create locks on the filesystem and 
> create files, and that timestamps are (roughly) aligned with local time.
>  -namenodes should try and create files/locks in the filesystem
>  -task tracker should try and exec() something
>  -run through the classpath and look for problems: duplicate JARs, 
> unsupported Java, Xerces versions, etc.
> * The set of tests should be extensible -rather than one single class with 
> all the tests, there'd be something separate for name, task, data, job 
> tracker nodes
> * They can't be in the nodes themselves, as they should be executable even if 
> the nodes don't come up. 
> * Output could be in human readable text or HTML, and a form that could be 
> processed through Hadoop itself in future
> * These tests could have side effects, such as actually trying to submit work 
> to a cluster
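As a rough illustration of the extensible, out-of-process checks described in the quoted issue, here is a minimal Java sketch. Nothing in it is an existing Hadoop API: the names HealthChecks, Check, Result, CanCreateFile and DuplicateJars are hypothetical, and "duplicate JAR" is assumed to mean the same JAR file name appearing in two different classpath entries. It only uses the JDK, so it can run even when no node is up.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types exist in Hadoop.
final class HealthChecks {

    /** Outcome of one check: pass/fail plus a human-readable diagnostic. */
    public static final class Result {
        public final String name;
        public final boolean ok;
        public final String detail;

        public Result(String name, boolean ok, String detail) {
            this.name = name;
            this.ok = ok;
            this.detail = detail;
        }

        @Override
        public String toString() {
            return (ok ? "OK   " : "FAIL ") + name + ": " + detail;
        }
    }

    /** One pluggable check; each node type could ship its own set. */
    public interface Check {
        String name();
        Result run();
    }

    /** Verifies that a file can be created and deleted in a directory. */
    public static final class CanCreateFile implements Check {
        private final Path dir;

        public CanCreateFile(Path dir) {
            this.dir = dir;
        }

        public String name() {
            return "can-create-file";
        }

        public Result run() {
            try {
                Path probe = Files.createTempFile(dir, "healthcheck", ".tmp");
                Files.delete(probe);
                return new Result(name(), true, "created and deleted a file in " + dir);
            } catch (IOException | RuntimeException e) {
                return new Result(name(), false, "cannot create files in " + dir + ": " + e);
            }
        }
    }

    /** Flags JAR file names that occur in more than one classpath entry. */
    public static final class DuplicateJars implements Check {
        private final String classpath;

        public DuplicateJars(String classpath) {
            this.classpath = classpath;
        }

        public String name() {
            return "duplicate-jars";
        }

        public Result run() {
            Map<String, String> firstSeen = new HashMap<>();
            List<String> duplicates = new ArrayList<>();
            for (String entry : classpath.split(File.pathSeparator)) {
                if (!entry.endsWith(".jar")) {
                    continue; // skip directories and empty entries
                }
                String jarName = new File(entry).getName();
                String previous = firstSeen.putIfAbsent(jarName, entry);
                if (previous != null && !previous.equals(entry)) {
                    duplicates.add(jarName + " (" + previous + " and " + entry + ")");
                }
            }
            return duplicates.isEmpty()
                ? new Result(name(), true, "no duplicate JAR names found")
                : new Result(name(), false, "duplicates: " + duplicates);
        }
    }

    /** Runs every check and collects the results, whether or not any fail. */
    public static List<Result> runAll(List<Check> checks) {
        List<Result> results = new ArrayList<>();
        for (Check check : checks) {
            results.add(check.run());
        }
        return results;
    }

    public static void main(String[] args) {
        List<Check> checks = new ArrayList<>();
        checks.add(new CanCreateFile(Paths.get(System.getProperty("java.io.tmpdir"))));
        checks.add(new DuplicateJars(System.getProperty("java.class.path")));
        for (Result result : runAll(checks)) {
            System.out.println(result);
        }
    }
}
```

Keeping each check behind a small interface is one way to let name, data, task and job tracker nodes contribute their own checks, and the same run() method could be called from a JUnit test, a command line tool or a JSP page. The text output here stands in for whatever report format is eventually chosen.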



--
This message was sent by Atlassian JIRA
(v6.2#6252)
