[ https://issues.apache.org/jira/browse/HADOOP-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661535#action_12661535 ]

Hemanth Yamijala commented on HADOOP-4937:
------------------------------------------

+1 for the fix.

Results of test patch:

     [exec]
     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

The -1 on tests is because HOD tests are run externally.

> [HOD] Include ringmaster RPC port information in the notes attribute
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4937
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4937
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>         Attachments: hadoop-4937-1.txt, hadoop-4937-2.txt, hadoop-4937.txt
>
>
> In large cluster deployments, node failures sometimes cause HOD clusters to
> be allocated but never deallocated, even after the cluster's idleness limit
> (the period during which no jobs are run) has been exceeded. One of the main
> reasons is that the ringmaster process, which is responsible for tracking
> and cleaning up the idle cluster it belongs to, itself goes down. To handle
> such scenarios, it makes sense to centrally track the ringmaster nodes of
> suspicious clusters. However, because the port the ringmaster is bound to
> is not centrally available, these nodes cannot be monitored.
> This issue is an enhancement request to include the ringmaster RPC port
> information, along with the JobTracker (JT) and NameNode (NN) information,
> in the resource manager's notes attribute, so that any monitoring process
> built around it can use it.

-- 
This message is automatically generated by JIRA.