[ https://issues.apache.org/jira/browse/HDFS-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Foley updated HDFS-1955:
-----------------------------

    Attachment: hdfs-1955_1.patch

Here is a patch that provides the desired check, failing doUpgrade() if any storage directory fails. The change in FSImage is just a few lines, and easily validated by inspection.

However, providing a unit test for it was very difficult. The problem is that failure must be forced *within* the doUpgrade() method itself, which is buried in the Namenode startup code, and quite well protected.

First I tried to make the storage dir read-only, but that gets caught in recoverTransitionRead() well before invoking doUpgrade(). Second I looked at using Mockito, but it seems that in order to spy on the startup/upgrade process one would have to mock the entire stack of HDFS system objects. The invocation of NNStorage.rename() at line 367 of FSImage would be a convenient spy target, but it is static and I saw no way to get hold of it. Third, I rejected non-mock test parameters in production code.

Finally I just tested it manually by temporarily hacking the code in doUpgrade() to force the error. I was able to validate my patch, and also found and fixed an NPE bug in FSEditLog.

> HDFS-1826 made FSImage.doUpgrade() too fault-tolerant
> -----------------------------------------------------
>
>                 Key: HDFS-1955
>                 URL: https://issues.apache.org/jira/browse/HDFS-1955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>         Attachments: hdfs-1955_1.patch
>
>
> Prior to HDFS-1826, doUpgrade() would fail if any of the storage directories
> failed to successfully write the new fsimage or edits files.
> Now it appears to "succeed" even if some or all of the individual directories
> fail.
> There is some discussion about whether doUpgrade() should have some fault
> tolerance, but for now make it fail on any single storage directory failure,
> as before.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
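
For readers without the patch at hand, the policy described above (fail the whole upgrade if any single storage directory fails) can be illustrated with a minimal, self-contained sketch. This is not the code from hdfs-1955_1.patch and does not use the real FSImage or NNStorage classes. The names UpgradableDir, UpgradeSketch, upgradeAll and fakeDir are hypothetical, and the choice to attempt every directory before reporting failure is an assumption rather than something the message states.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a storage directory; NOT the HDFS StorageDirectory class.
interface UpgradableDir {
    String name();
    void upgrade() throws IOException;
}

final class UpgradeSketch {
    private UpgradeSketch() {}

    // Tries to upgrade every directory, then fails the whole operation
    // if at least one directory could not be upgraded.
    static void upgradeAll(List<UpgradableDir> dirs) throws IOException {
        List<String> failed = new ArrayList<>();
        for (UpgradableDir dir : dirs) {
            try {
                dir.upgrade();
            } catch (IOException e) {
                // Remember the failure instead of silently dropping the directory.
                failed.add(dir.name() + ": " + e.getMessage());
            }
        }
        if (!failed.isEmpty()) {
            throw new IOException("Upgrade failed for " + failed.size() + " of "
                    + dirs.size() + " storage directories: " + failed);
        }
    }

    // Illustration-only helper: a directory whose upgrade succeeds or fails on demand.
    static UpgradableDir fakeDir(final String name, final boolean fails) {
        return new UpgradableDir() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public void upgrade() throws IOException {
                if (fails) {
                    throw new IOException("simulated write failure");
                }
            }
        };
    }
}
```

Because the failure here is injected through a small interface, the error path can be exercised without touching the file system. The real doUpgrade() offers no such seam, which is the testing difficulty the comment describes.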