[HOD] Cleanup idle HOD clusters whose ringmaster nodes might have gone down
---------------------------------------------------------------------------
Key: HADOOP-4938
URL: https://issues.apache.org/jira/browse/HADOOP-4938
Project: Hadoop Core
Issue Type: Improvement
Components: contrib/hod
Reporter: Hemanth Yamijala
As mentioned in HADOOP-4937, in large cluster deployments the ringmaster
process sometimes comes up on a faulty node, and that node may go down after
the cluster has been successfully allocated. Such clusters fail to deallocate
automatically even when the cluster's idleness limit is exceeded, because
idleness is tracked by the ringmaster process, which has itself gone down.
As a large number of nodes can be held up because of this, such clusters
should be detected and deallocated in some manner.
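
To make the failure mode concrete, here is a minimal sketch of one possible
detection rule. It is not taken from HOD: the ClusterRecord type, its fields,
and find_orphaned_clusters are hypothetical names, and it assumes some external
component can probe ringmaster liveness and knows when each cluster last ran a
job.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterRecord:
    """Hypothetical view of one allocated cluster (not a real HOD type)."""
    cluster_id: str
    ringmaster_alive: bool        # assumed result of an external liveness probe
    last_job_finished_at: float   # seconds since the epoch
    idle_limit_secs: float        # the cluster's configured idleness limit


def find_orphaned_clusters(clusters: List[ClusterRecord], now: float) -> List[str]:
    """Return ids of clusters that are past their idleness limit while
    their ringmaster is down, so nothing is left to deallocate them."""
    orphaned = []
    for c in clusters:
        idle_for = now - c.last_job_finished_at
        if not c.ringmaster_alive and idle_for > c.idle_limit_secs:
            orphaned.append(c.cluster_id)
    return orphaned
```

A cluster whose ringmaster is still alive is deliberately skipped, since in
that case the ringmaster's own idleness tracking is expected to handle
deallocation.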