[ 
https://issues.apache.org/jira/browse/HADOOP-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587549#action_12587549
 ] 

Hemanth Yamijala commented on HADOOP-3153:
------------------------------------------

- I am not sure we need the refactoring. Why can't we call 
hadoopCluster.deallocate from the except code block? Even if we cannot for 
some reason, I think we must refactor this differently. 
{noformat}    
  def shutdown_job(self, ringClient=None):
    if ringClient is not None:
      self.__log.debug("Calling rm.stop()")
      ringClient.stopRM()
      self.__log.debug("Returning from rm.stop()")
      self.__log.info("Job Shutdown by informing ringmaster.")
    else:
      self.delete_job(self.jobId)
      self.__log.info("Job %s removed from queue directly." % self.jobId)
{noformat}
And there must be a way to get the ringClient from hadoopCluster in hodRunner.py.
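
A minimal sketch of the first option, calling deallocate straight from the 
except block, might look like the following. FakeCluster, StateWriteError, 
allocate_and_record and the deallocate(cluster_dir, cluster_info) signature are 
stand-ins made up for illustration, not the actual HOD classes.

```python
class StateWriteError(Exception):
    """Hypothetical error: the cluster was allocated but could not be recorded."""


class FakeCluster:
    """Stand-in for hadoopCluster; it only records which calls were made."""

    def __init__(self):
        self.calls = []

    def allocate(self, cluster_dir):
        self.calls.append(("allocate", cluster_dir))
        return {"clusterDir": cluster_dir}

    def deallocate(self, cluster_dir, cluster_info):
        # Assumed signature, chosen for this sketch only.
        self.calls.append(("deallocate", cluster_dir))


def allocate_and_record(cluster, cluster_dir, save_state):
    """Allocate a cluster, then persist its details via save_state.

    If persisting fails, release the cluster before re-raising, so that no
    allocation is left behind that the user cannot find.
    """
    info = cluster.allocate(cluster_dir)
    try:
        save_state(cluster_dir, info)
    except OSError as err:
        # Allocation succeeded but could not be written to the state file.
        cluster.deallocate(cluster_dir, info)
        raise StateWriteError(
            "could not record cluster %s" % cluster_dir) from err
    return info
```

With this shape no extra helper is needed: the caller only has to hold on to 
the cluster object and the allocation details until the state is saved.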

- In checkStateFile: I think we should check that self.__store is writable. 
Alternatively, can we check whether the file does not exist, using errno to 
tell that case apart from permission errors?
- Provide an accessor for _hodState__stateFile in hodState and use that.
- testAllocateWithInvalidStateStore - we can add a test case where the 
directory has no write permissions.
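
The errno idea in the checkStateFile point above could be sketched roughly as 
follows. check_state_file and its return values are hypothetical names for 
this example, and the real checkStateFile may be structured differently.

```python
import errno
import os


def check_state_file(path):
    """Probe whether the state file at `path` can be created or appended to.

    Returns None when the file is writable, otherwise a short reason:
    "missing-directory", "permission" or "other".
    """
    existed = os.path.exists(path)
    try:
        # Append mode creates the file if needed and never truncates it.
        with open(path, "a"):
            pass
    except OSError as err:
        if err.errno == errno.ENOENT:
            # A directory on the way to the file does not exist.
            return "missing-directory"
        if err.errno in (errno.EACCES, errno.EPERM, errno.EROFS):
            # The file or its directory is not writable by this user.
            return "permission"
        return "other"
    if not existed:
        # Do not leave behind the empty file created by the probe.
        os.remove(path)
    return None
```

Attempting the open and inspecting errno avoids a separate up-front 
writability check, which could give a stale answer if permissions change 
between the check and the actual write.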

> [HOD] Hod should deallocate cluster if there's a problem in writing 
> information to the state file
> -------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3153
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3153
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>    Affects Versions: 0.16.0
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod Kumar Vavilapalli
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-3153, HADOOP-3153.1
>
>
> Consider a scenario where hod runs allocate successfully, but isn't able to 
> save the allocated information to the clusters.state file. In such a case, it 
> gets an error and exits. But the cluster remains allocated, and unfortunately 
> the user cannot deallocate the cluster now unless he knows the cluster 
> directory.
> It is better if HOD can deallocate the cluster in such an error condition.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
