[
https://issues.apache.org/jira/browse/HADOOP-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587549#action_12587549
]
Hemanth Yamijala commented on HADOOP-3153:
------------------------------------------
- I am not sure we need the refactoring. Why can't we call
hadoopCluster.deallocate from the except block? Even if we cannot for some
reason, I think we should refactor this differently:
{noformat}
def shutdown_job(self, ringClient=None):
    if ringClient is not None:
        self.__log.debug("Calling rm.stop()")
        ringClient.stopRM()
        self.__log.debug("Returning from rm.stop()")
        self.__log.info("Job Shutdown by informing ringmaster.")
    else:
        self.delete_job(self.jobId)
        self.__log.info("Job %s removed from queue directly." % self.jobId)
{noformat}
And there must be a way to get the ringClient from hadoopCluster in hodRunner.py.
- In checkStateFile: I think we should check that self.__store is writable.
Alternatively, can we use errno to distinguish the case where the file does
not exist from permission errors?
- Provide an accessor for _hodState__stateFile in hodState and use that.
- testAllocateWithInvalidStateStore - we can add a test case where the
directory has no write permissions.
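
A rough sketch of the first two points, to make the suggestion concrete. This is not the HOD code: check_state_file and allocate_and_record are hypothetical names, and the cluster argument is a stand-in for any object that offers allocate() and deallocate(), as hadoopCluster does. It only shows the shape: use errno to tell a missing directory from a permission problem, and deallocate from the except block when the state cannot be saved.

```python
import errno
import os


def check_state_file(path):
    """Return None if `path` is usable as a state file, else an error string.

    Hypothetical helper for illustration, not HOD's checkStateFile.
    """
    existed = os.path.exists(path)
    try:
        # Append mode creates the file if it is missing and never truncates
        # an existing one, so a valid state file is left untouched.
        with open(path, "a"):
            pass
    except (IOError, OSError) as e:
        if e.errno == errno.EACCES:
            return "no permission to write %s" % path
        if e.errno == errno.ENOENT:
            return "parent directory of %s does not exist" % path
        raise  # anything else is unexpected, let the caller see it
    if not existed:
        os.remove(path)  # do not leave an empty probe file behind
    return None


def allocate_and_record(cluster, state_path, state):
    """Allocate, save the state, and deallocate again if saving fails.

    `cluster` is an assumed stand-in with allocate() and deallocate().
    """
    cluster.allocate()
    try:
        with open(state_path, "w") as f:
            f.write(state)
    except (IOError, OSError):
        # The user has no record of this cluster, so release it here
        # instead of leaving it allocated.
        cluster.deallocate()
        raise
```

Calling check_state_file before allocation catches the common cases early. The except block in allocate_and_record still matters, because the write can fail for reasons the earlier check cannot see, such as a full disk.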
> [HOD] Hod should deallocate cluster if there's a problem in writing
> information to the state file
> -------------------------------------------------------------------------------------------------
>
> Key: HADOOP-3153
> URL: https://issues.apache.org/jira/browse/HADOOP-3153
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/hod
> Affects Versions: 0.16.0
> Reporter: Hemanth Yamijala
> Assignee: Vinod Kumar Vavilapalli
> Fix For: 0.17.0
>
> Attachments: HADOOP-3153, HADOOP-3153.1
>
>
> Consider a scenario where hod runs allocate successfully, but isn't able to
> save the allocated information to the clusters.state file. In such a case, it
> gets an error and exits. But the cluster remains allocated, and unfortunately
> the user cannot deallocate the cluster now unless he knows the cluster
> directory.
> It is better if HOD can deallocate the cluster in such an error condition.