[ 
https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2702:
------------------------------

    Attachment: hdfs-2702.txt

Thanks Todd. Minor update to the previous patch. It's not technically part of 
this change, but it's good to fix at the same time: the question in the current 
comment in purgeEditLog ("Should we also remove from edits") is valid, so let's 
fix that. Note that we won't, and don't want to, bail out here since we've just 
closed all the logs. 

{code}
-          // Should we also remove from edits
+          sd.unlock();
+          removeEditsForStorageDir(sd);
           fsimage.updateRemovedDirs(sd, null);
           it.remove();
{code}
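
A minimal in-memory model of the patched purgeEditLog loop, to show the intended control flow. Everything here is hypothetical: StorageDir, purgeOldEdits and the string-based editStreams/removedDirs lists stand in for Hadoop's StorageDirectory, the real purge work, removeEditsForStorageDir and fsimage.updateRemovedDirs. It sketches a failed dir being unlocked, dropped from the edit streams, recorded as removed and taken out of the iteration, while the loop carries on with the remaining dirs instead of bailing out.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PurgeSketch {
  // Hypothetical stand-in for a name/edits storage directory.
  static final class StorageDir {
    final String name;
    final boolean failing;   // simulate an IOException during purge
    boolean locked = true;

    StorageDir(String name, boolean failing) {
      this.name = name;
      this.failing = failing;
    }
  }

  final List<StorageDir> dirs = new ArrayList<>();
  final List<String> editStreams = new ArrayList<>(); // one "stream" per dir, by name
  final List<String> removedDirs = new ArrayList<>();

  // Hypothetical purge step; throws for a failing dir.
  void purgeOldEdits(StorageDir sd) throws IOException {
    if (sd.failing) {
      throw new IOException("cannot purge " + sd.name);
    }
  }

  // Mirrors the patched loop: on failure, unlock, drop the dir's edit stream,
  // record the removal, and keep going with the remaining dirs.
  void purgeEditLog() {
    for (Iterator<StorageDir> it = dirs.iterator(); it.hasNext();) {
      StorageDir sd = it.next();
      try {
        purgeOldEdits(sd);
      } catch (IOException ioe) {
        sd.locked = false;               // models sd.unlock()
        editStreams.remove(sd.name);     // models removeEditsForStorageDir(sd)
        removedDirs.add(sd.name);        // models fsimage.updateRemovedDirs(sd, null)
        it.remove();
      }
    }
  }

  public static void main(String[] args) {
    PurgeSketch log = new PurgeSketch();
    for (String n : new String[] {"dir1", "dir2", "dir3"}) {
      log.dirs.add(new StorageDir(n, n.equals("dir2")));
      log.editStreams.add(n);
    }
    log.purgeEditLog();
    System.out.println(log.editStreams); // [dir1, dir3]
    System.out.println(log.removedDirs); // [dir2]
  }
}
```

With dir2 failing, the loop removes only dir2 and still visits dir3, which is the "don't bail out" behavior described in the comment.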
                
> A single failed name dir can cause the NN to exit 
> --------------------------------------------------
>
>                 Key: HDFS-2702
>                 URL: https://issues.apache.org/jira/browse/HDFS-2702
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>            Priority: Critical
>         Attachments: hdfs-2702.txt, hdfs-2702.txt, hdfs-2702.txt, 
> hdfs-2702.txt, hdfs-2702.txt
>
>
> There's a bug in FSEditLog#rollEditLog which results in the NN process 
> exiting if a single name dir has failed. Here's the relevant code:
> {code}
> close();  // So editStreams.size() is 0
> foreach edits dir sd {
>   try {
>     ..
>     eStream = new ...;  // Might get an IOE here
>     editStreams.add(eStream);
>   } catch (IOException ioe) {
>     removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1
>   }
> }
> {code}
> If we get an IOException before we've added two edit streams to the list, 
> we'll exit; e.g., if there's an error processing the 1st name dir, we'll exit 
> even if there are 4 valid name dirs. The fix is to move the check out of 
> removeEditsForStorageDir (née processIOError), or to modify it so it can be 
> disabled in some cases, e.g., here, where we don't yet know how many streams 
> are valid.
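
A self-contained model of the failure described above, and of the first proposed fix (moving the check out of the per-dir removal). All names are hypothetical: EditsDir, openStream, removeEditsForStorageDirChecked, rollBuggy and rollFixed are illustrations, editStreams holds dir names rather than real edit log streams, and an IllegalStateException stands in for the NN process exiting. This is a sketch of the idea, not the patch itself.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RollEditLogSketch {
  // Hypothetical model: each edits dir either opens a stream or throws.
  static final class EditsDir {
    final String name;
    final boolean failing;

    EditsDir(String name, boolean failing) {
      this.name = name;
      this.failing = failing;
    }
  }

  final List<EditsDir> dirs = new ArrayList<>();
  final List<String> editStreams = new ArrayList<>();

  // Stand-in for opening an edit log stream; throws for a failed dir.
  String openStream(EditsDir d) throws IOException {
    if (d.failing) {
      throw new IOException(d.name + " failed");
    }
    return d.name;
  }

  // Models the original check: "exit" (here: throw) if editStreams.size() <= 1,
  // judged by however many streams happen to be in the list right now.
  void removeEditsForStorageDirChecked(EditsDir d) {
    if (editStreams.size() <= 1) {
      throw new IllegalStateException("exit: no valid edits dirs left");
    }
    editStreams.remove(d.name);
  }

  // Buggy roll: the check runs mid-loop while editStreams is still filling up.
  void rollBuggy() {
    editStreams.clear();                      // close(): size is now 0
    for (EditsDir d : dirs) {
      try {
        editStreams.add(openStream(d));
      } catch (IOException ioe) {
        removeEditsForStorageDirChecked(d);
      }
    }
  }

  // Fixed roll: drop failed dirs without checking, then check once at the end,
  // when the number of valid streams is actually known.
  void rollFixed() {
    editStreams.clear();
    for (EditsDir d : dirs) {
      try {
        editStreams.add(openStream(d));
      } catch (IOException ioe) {
        // Unchecked removal; a no-op here since the stream was never added.
        editStreams.remove(d.name);
      }
    }
    if (editStreams.isEmpty()) {
      throw new IllegalStateException("exit: no valid edits dirs left");
    }
  }

  public static void main(String[] args) {
    RollEditLogSketch log = new RollEditLogSketch();
    log.dirs.add(new EditsDir("dir1", true));   // the 1st dir fails
    for (int i = 2; i <= 5; i++) {
      log.dirs.add(new EditsDir("dir" + i, false));
    }
    try {
      log.rollBuggy();
    } catch (IllegalStateException e) {
      System.out.println("buggy: " + e.getMessage());
    }
    log.rollFixed();
    System.out.println("fixed: " + log.editStreams);
  }
}
```

Running main prints "buggy: exit: no valid edits dirs left" followed by "fixed: [dir2, dir3, dir4, dir5]": the buggy roll exits on the 1st dir despite 4 valid ones, while the fixed roll keeps them and would still exit if every dir failed.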

--
This message is automatically generated by JIRA.