[ 
https://issues.apache.org/jira/browse/HDFS-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202846#comment-13202846
 ] 

Bikas Saha commented on HDFS-2909:
----------------------------------

This is happening because JournalSet.mapJournalsAndReportErrors() calls 
abortAllJournals() and throws new IOException when a required journal fails (in 
this case, the shared dir). I still have to see why the NN continues to run as 
active after this.
Coming back to the above, it seems that the abortAllJournals() code implies 
that the NN should stop running when something like this happens. That would 
mean that inaccessibility of the single shared edits dir will cause the active 
NN to shut down. Most likely the standby NN will also not be able to access the 
shared edits dir, which means that the shared edits dir has become a single 
point of failure for the HA service.
Still looking at why the NN did not abort.
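
To make the failure path concrete, here is a minimal sketch of the behavior 
described above. It is not the actual Hadoop source: the class 
JournalSetSketch, the Journal stand-in, its failing/aborted/disabled flags and 
the JournalClosure interface are all hypothetical names used for illustration. 
Only the method names mapJournalsAndReportErrors() and abortAllJournals() come 
from the comment. The assumption is that a failure on a non-required journal 
just disables that journal, while a failure on a required one aborts every 
journal and surfaces an IOException to the caller.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class JournalSetSketch {
    /** Hypothetical stand-in for one edits directory (local or shared). */
    public static class Journal {
        public final String name;
        public final boolean required;
        public boolean failing;   // simulates an inaccessible directory
        public boolean aborted;
        public boolean disabled;

        public Journal(String name, boolean required) {
            this.name = name;
            this.required = required;
        }

        public void write(String op) throws IOException {
            if (failing) {
                throw new IOException(name + " is inaccessible");
            }
        }
    }

    /** Hypothetical callback applied to every active journal. */
    public interface JournalClosure {
        void apply(Journal j) throws IOException;
    }

    private final List<Journal> journals = new ArrayList<>();

    public void add(Journal j) {
        journals.add(j);
    }

    public void mapJournalsAndReportErrors(JournalClosure closure, String status)
            throws IOException {
        for (Journal j : journals) {
            if (j.disabled) {
                continue;
            }
            try {
                closure.apply(j);
            } catch (IOException e) {
                if (j.required) {
                    // A required journal failed: abort everything and
                    // report the error to the caller.
                    abortAllJournals();
                    throw new IOException(
                        status + " failed for required journal " + j.name, e);
                }
                // Assumed behavior: an optional journal is simply dropped.
                j.disabled = true;
            }
        }
    }

    public void abortAllJournals() {
        for (Journal j : journals) {
            j.aborted = true;
        }
    }
}
```

In this sketch, whether the process actually stops depends entirely on what 
the caller does with the IOException, which is the open question for the NN 
here.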

                
> HA: Inaccessible shared edits dir not getting removed from FSImage storage 
> dirs upon error
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-2909
>                 URL: https://issues.apache.org/jira/browse/HDFS-2909
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
