[ https://issues.apache.org/jira/browse/HADOOP-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661535#action_12661535 ]
Hemanth Yamijala commented on HADOOP-4937:
------------------------------------------
+1 for the fix.
Results of test patch:
[exec]
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
The -1 on tests is because HOD tests are run externally.
> [HOD] Include ringmaster RPC port information in the notes attribute
> --------------------------------------------------------------------
>
> Key: HADOOP-4937
> URL: https://issues.apache.org/jira/browse/HADOOP-4937
> Project: Hadoop Core
> Issue Type: New Feature
> Components: contrib/hod
> Reporter: Hemanth Yamijala
> Assignee: Peeyush Bishnoi
> Attachments: hadoop-4937-1.txt, hadoop-4937-2.txt, hadoop-4937.txt
>
>
> In large cluster deployments, node failures sometimes leave HOD clusters
> allocated but never deallocated, even after the cluster's idleness limit
> (the time for which no jobs are run) has been exceeded. One of the main
> reasons is that the ringmaster process, which is responsible for tracking
> and cleaning up an idle cluster (of which it is a part), itself goes down.
> To handle such scenarios, it makes sense to centrally track the ringmaster
> nodes of suspicious clusters. But since the port the ringmaster is bound to
> is not centrally available, such monitoring is currently impossible.
> This issue is an enhancement request to include the ringmaster RPC port
> information, along with the JobTracker (JT) and NameNode (NN) information,
> in the resource manager's notes attribute so that any monitoring process
> built around that attribute can use it.