[ 
https://issues.apache.org/jira/browse/HDFS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1955:
-----------------------------

    Attachment: hdfs-1955_1.patch

Here is a patch that provides the desired check, failing doUpgrade() if any 
storage directory fails.  The change in FSImage is just a few lines, and easily 
validated by inspection. 
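
For readers without the patch at hand, the kind of check described above can be 
sketched as follows.  This is a minimal illustration and not the contents of 
hdfs-1955_1.patch: the class UpgradeCheckSketch, the DirUpgrader interface and 
the upgradeAll() method are hypothetical names, and the real FSImage code works 
on storage directory objects rather than plain strings.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class UpgradeCheckSketch {
    /** Hypothetical stand-in for the per-directory upgrade step. */
    interface DirUpgrader {
        void upgrade(String dir) throws IOException;
    }

    /**
     * Attempts the upgrade on every directory, then fails the whole
     * operation if at least one directory reported an error.
     */
    static void upgradeAll(List<String> dirs, DirUpgrader upgrader)
            throws IOException {
        List<String> failed = new ArrayList<>();
        for (String dir : dirs) {
            try {
                upgrader.upgrade(dir);
            } catch (IOException e) {
                // Remember the failure instead of silently dropping it.
                failed.add(dir);
            }
        }
        if (!failed.isEmpty()) {
            throw new IOException("Upgrade failed for "
                    + failed.size() + " storage directories: " + failed);
        }
    }
}
```

With this shape, a caller sees an IOException as soon as any single directory 
fails, which matches the pre-HDFS-1826 behavior described in the issue.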

However, providing a unit test for it was very difficult. The problem is that 
the failure must be forced *within* the doUpgrade() method itself, which is 
buried in the Namenode startup code and is well protected.  First, I tried 
making the storage dir read-only, but that gets caught in 
recoverTransitionRead() well before doUpgrade() is invoked.  Second, I looked 
at using Mockito, but it seems that in order to spy on the startup/upgrade 
process one would have to mock the entire stack of HDFS system objects.  The 
invocation of NNStorage.rename() at line 367 of FSImage would be a convenient 
spy target, but it is static and I saw no way to get hold of it.  Third, I 
considered adding test-only (non-mock) parameters to production code, but 
rejected that approach.
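
The obstacle with the static call can be illustrated with a small sketch.  All 
names here (StaticSeamSketch, staticRename, Renamer) are hypothetical and are 
not taken from FSImage or NNStorage.  The point is only that a call to a static 
method is bound at compile time, so a test has nothing to substitute, whereas a 
call through an instance collaborator could be replaced by a failing fake.  It 
is not a proposal to change the production code.

```java
import java.io.File;
import java.io.IOException;

public class StaticSeamSketch {
    /** Hypothetical stand-in for a static rename helper. */
    static void staticRename(File from, File to) throws IOException {
        if (!from.renameTo(to)) {
            throw new IOException("Failed to rename " + from + " to " + to);
        }
    }

    /** Hypothetical instance-level collaborator that a test could replace. */
    interface Renamer {
        void rename(File from, File to) throws IOException;
    }

    /** Calls the static helper directly: a test cannot intercept this call. */
    static void upgradeWithStaticCall(File cur, File tmp) throws IOException {
        staticRename(cur, tmp);
    }

    /** Calls through an injected collaborator: a test can pass a failing fake. */
    static void upgradeWithSeam(File cur, File tmp, Renamer renamer)
            throws IOException {
        renamer.rename(cur, tmp);
    }
}
```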

Finally I just tested it manually by temporarily hacking the code in 
doUpgrade() to force the error.  I was able to validate my patch, and also 
found and fixed an NPE bug in FSEditLog.

> HDFS-1826 made FSImage.doUpgrade() too fault-tolerant
> -----------------------------------------------------
>
>                 Key: HDFS-1955
>                 URL: https://issues.apache.org/jira/browse/HDFS-1955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>         Attachments: hdfs-1955_1.patch
>
>
> Prior to HDFS-1826, doUpgrade() would fail if any of the storage directories 
> failed to successfully write the new fsimage or edits files.
> Now it appears to "succeed" even if some or all of the individual directories 
> fail.
> There is some discussion about whether doUpgrade() should have some fault 
> tolerance, but for now make it fail on any single storage directory failure, 
> as before.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
