[ https://issues.apache.org/jira/browse/HDFS-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202846#comment-13202846 ]
Bikas Saha commented on HDFS-2909:
----------------------------------

This is happening because JournalSet.mapJournalsAndReportErrors() calls abortAllJournals() and throws a new IOException when a required journal fails (in this case, the shared dir). I still have to see why the NN continues to run as active after this.

Coming back to the above, the abortAllJournals() code implies that the NN should stop running when something like this happens. That would mean that inaccessibility of the single shared edits dir causes the active NN to shut down. Most likely the standby NN will not be able to access the shared edits dir either, which means that the shared edits dir has become a single point of failure for the HA service.

Still looking at why the NN did not abort.

> HA: Inaccessible shared edits dir not getting removed from FSImage storage dirs upon error
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-2909
>                 URL: https://issues.apache.org/jira/browse/HDFS-2909
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
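
For readers following the comment on this issue, the failure path it describes can be sketched roughly as below. This is a simplified, hypothetical model, not the Hadoop source: only the method names mapJournalsAndReportErrors() and abortAllJournals() come from the comment. The Journal class, its fields, and the rule that a failed non-required journal is merely disabled are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the behavior described in the comment.
// NOT the real org.apache.hadoop JournalSet; every name other than
// mapJournalsAndReportErrors and abortAllJournals is invented here.
class JournalSetSketch {

    /** One edit-log destination; 'required' stands in for the shared edits dir. */
    static class Journal {
        final String name;
        final boolean required;
        boolean failing;   // set to true to simulate an inaccessible directory
        boolean aborted;
        boolean disabled;

        Journal(String name, boolean required) {
            this.name = name;
            this.required = required;
        }

        void write(String op) throws IOException {
            if (failing) {
                throw new IOException("cannot write to " + name);
            }
        }
    }

    private final List<Journal> journals = new ArrayList<>();

    void add(Journal j) {
        journals.add(j);
    }

    /** Marks every journal as aborted, healthy or not. */
    void abortAllJournals() {
        for (Journal j : journals) {
            j.aborted = true;
        }
    }

    /**
     * Applies a write to every enabled journal.
     * Assumption: a failed optional journal is only disabled, while a failed
     * required journal aborts all journals and surfaces an IOException.
     */
    void mapJournalsAndReportErrors(String op) throws IOException {
        for (Journal j : journals) {
            if (j.disabled) {
                continue;
            }
            try {
                j.write(op);
            } catch (IOException e) {
                if (j.required) {
                    abortAllJournals();
                    throw new IOException("required journal failed: " + j.name, e);
                }
                j.disabled = true;
            }
        }
    }
}
```

In this model, a caller that swallows the IOException would keep running with every journal aborted, which is one way the "NN continues to run as active" symptom could arise. Whether that is what actually happens in the NN is still the open question in the comment.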