[ https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172513#comment-13172513 ]
Todd Lipcon commented on HDFS-2702:
-----------------------------------

Oh, right. duh :) Thanks, +1.

> A single failed name dir can cause the NN to exit
> --------------------------------------------------
>
>                 Key: HDFS-2702
>                 URL: https://issues.apache.org/jira/browse/HDFS-2702
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>            Priority: Critical
>         Attachments: hdfs-2702.txt, hdfs-2702.txt, hdfs-2702.txt, hdfs-2702.txt
>
>
> There's a bug in FSEditLog#rollEditLog which results in the NN process
> exiting if a single name dir has failed. Here's the relevant code:
> {code}
> close() // So editStreams.size() is 0
> foreach edits dir {
>   ..
>   eStream = new ... // Might get an IOE here
>   editStreams.add(eStream);
> } catch (IOException ioe) {
>   removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1
> }
> {code}
> If we get an IOException before we've added two edits streams to the list
> we'll exit, eg if there's an error processing the 1st name dir we'll exit
> even if there are 4 valid name dirs. The fix is to move the checking out of
> removeEditsForStorageDir (nee processIOError) or modify it so it can be
> disabled in some cases, eg here where we don't yet know how many streams are
> valid.
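
The failure mode described in the issue can be reproduced in isolation. The following is a minimal sketch, not the actual FSEditLog code or the attached patch: the class `RollEditLogSketch`, the `StreamOpener` interface, the exception type, and both method names are hypothetical, edit streams are modeled as plain strings, and an exception stands in for the NN process exiting. It contrasts checking the stream count inside the catch block (while the list is still being rebuilt) with the first fix option mentioned above, moving the check out so it runs only after every name dir has been tried. The threshold used after the loop (at least one usable stream) is an assumption made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RollEditLogSketch {

    /** Hypothetical stand-in for opening an edit stream in one name dir. */
    interface StreamOpener {
        String open(String dir) throws IOException;
    }

    /** Stands in for the NN exiting; the real code terminates the process. */
    static class NoUsableEditStreamsException extends RuntimeException {
        NoUsableEditStreamsException(String msg) {
            super(msg);
        }
    }

    /** Buggy shape: the size check runs while the list is still being rebuilt. */
    static List<String> rollBuggy(List<String> dirs, StreamOpener opener) {
        List<String> editStreams = new ArrayList<>(); // close() left this empty
        for (String dir : dirs) {
            try {
                editStreams.add(opener.open(dir)); // might throw an IOException
            } catch (IOException ioe) {
                // Mirrors the check described for removeEditsForStorageDir:
                // with fewer than two streams added so far, give up.
                if (editStreams.size() <= 1) {
                    throw new NoUsableEditStreamsException("failed on " + dir);
                }
            }
        }
        return editStreams;
    }

    /** Deferred shape: record failures, check only once all dirs were tried. */
    static List<String> rollDeferred(List<String> dirs, StreamOpener opener) {
        List<String> editStreams = new ArrayList<>();
        List<String> failedDirs = new ArrayList<>();
        for (String dir : dirs) {
            try {
                editStreams.add(opener.open(dir));
            } catch (IOException ioe) {
                failedDirs.add(dir); // no exit decision here
            }
        }
        // Assumption: one surviving stream is enough to keep running.
        if (editStreams.isEmpty()) {
            throw new NoUsableEditStreamsException("all dirs failed: " + failedDirs);
        }
        return editStreams;
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("/d1", "/d2", "/d3", "/d4");
        // Only the first name dir fails; the other three are healthy.
        StreamOpener opener = d -> {
            if (d.equals("/d1")) {
                throw new IOException("disk failed");
            }
            return "edits@" + d;
        };
        try {
            rollBuggy(dirs, opener);
        } catch (NoUsableEditStreamsException e) {
            System.out.println("buggy roll gave up: " + e.getMessage());
        }
        System.out.println("deferred roll kept " + rollDeferred(dirs, opener).size() + " streams");
    }
}
```

With four name dirs where only the first one fails, `rollBuggy` gives up on the first directory because the list is still empty at that point, while `rollDeferred` continues and keeps the three healthy streams.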