[ 
https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172513#comment-13172513
 ] 

Todd Lipcon commented on HDFS-2702:
-----------------------------------

Oh, right. duh :) Thanks, +1.
                
> A single failed name dir can cause the NN to exit 
> --------------------------------------------------
>
>                 Key: HDFS-2702
>                 URL: https://issues.apache.org/jira/browse/HDFS-2702
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>            Priority: Critical
>         Attachments: hdfs-2702.txt, hdfs-2702.txt, hdfs-2702.txt, 
> hdfs-2702.txt
>
>
> There's a bug in FSEditLog#rollEditLog which results in the NN process 
> exiting if a single name dir has failed. Here's the relevant code:
> {code}
> close();  // So editStreams.size() is 0
> foreach edits dir {
>   try {
>     ..
>     eStream = new ...  // Might get an IOE here
>     editStreams.add(eStream);
>   } catch (IOException ioe) {
>     removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1
>   }
> }
> {code}
> If we get an IOException before we've added two edits streams to the list, 
> the NN exits. E.g., if there's an error processing the 1st name dir, we'll 
> exit even if there are 4 valid name dirs. The fix is to move the check out 
> of removeEditsForStorageDir (née processIOError), or to modify it so the 
> check can be disabled in some cases, e.g. here, where we don't yet know how 
> many streams are valid.
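
The first option described above (moving the check out of the per-directory error handler) might look roughly like the following. This is a minimal, self-contained sketch and not the attached patch: RollEditLogSketch, StreamOpener, and openAll are hypothetical names, strings stand in for the real stream and storage-directory types, and the "at least one usable stream" threshold is an assumption made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class RollEditLogSketch {
    /** Hypothetical stand-in for creating an edits stream in one name dir. */
    interface StreamOpener {
        String open(String dir) throws IOException;
    }

    /**
     * Tries every directory, collecting the streams that open and the
     * directories that fail. The "too few streams" check runs once, after
     * the loop, when the number of valid streams is actually known.
     */
    static List<String> openAll(List<String> dirs, StreamOpener opener,
                                List<String> failedDirs) {
        List<String> streams = new ArrayList<>();
        for (String dir : dirs) {
            try {
                streams.add(opener.open(dir));
            } catch (IOException ioe) {
                // Record the failure and keep going; no size check here.
                failedDirs.add(dir);
            }
        }
        // Assumed threshold: fail only if no directory at all is usable.
        if (streams.isEmpty()) {
            throw new IllegalStateException(
                "No usable edits directories out of " + dirs.size());
        }
        return streams;
    }
}
```

With four directories where only the first fails, openAll returns three streams and reports one failed directory instead of aborting on the first error.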

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
