[ https://issues.apache.org/jira/browse/HDFS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600637#comment-13600637 ]
Aaron T. Myers commented on HDFS-3277:
--------------------------------------

Patch largely looks good to me. A few small comments:

# I don't understand the need for the stashing away and potentially reloading the FSNS secret manager state in the event we don't read to the end of the fsimage file we're trying to load. Instead of doing all that, why not just throw an IOE and have that get handled in FSImage#loadFSImage just like it would be if we failed to load another part of the fsimage? That would end up calling DelegationTokenSecretManager#reset, which seems correct to me.
# Seems like there's an extraneous new import in FSNamesystem.
# Recommend adding a class comment to LogAppender, and perhaps renaming that class to something that makes it clear it's for interposing on/verifying log output.
# Recommend refactoring the code in TestDFSUpgradeFromImage and TestStartup which searches through the LogAppender lines into LogAppender itself, along the lines of a "{{int countOccurrencesOf(String)}}".
# In TestStartup#corruptFSImageMD5 you might want to use the constant Storage#STORAGE_DIR_CURRENT instead of hard-coding "current".
# In TestStartup#testImageChecksum, consider using GenericTestUtils#assertExceptionContains instead of "{{ioe.getMessage().contains(...)}}".

> fail over to loading a different FSImage if the first one we try to load is corrupt
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-3277
>                 URL: https://issues.apache.org/jira/browse/HDFS-3277
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Andrew Wang
>         Attachments: HDFS-3277.002.patch, HDFS-3277.003.patch, HDFS-3277.004.patch, HDFS-3277.005.patch
>
>
> Most users store multiple copies of the FSImage in order to prevent catastrophic data loss if a hard disk fails.
> However, our image loading code is currently not set up to start reading another FSImage if loading the first one does not succeed. We should add this capability.
> We should also be sure to remove the FSImage directory that failed from the list of FSImage directories to write to, in the way we normally do when a write (as opposed to read) fails.
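
The failover behavior requested in the description, combined with the "just throw an IOE" suggestion from the review, could look roughly like the following. This is a minimal sketch and not the code from the attached patches: ImageFailover, ImageLoader, loadFirstGood, and the reset hook are hypothetical names, and the reset callback only stands in for the kind of cleanup that the comment attributes to DelegationTokenSecretManager#reset.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical illustration of trying each image directory in turn. */
class ImageFailover {

    /** Hypothetical loader: throws IOException if the image cannot be used. */
    interface ImageLoader {
        void load(String imageDir) throws IOException;
    }

    /**
     * Try each directory in order and return the first one that loads.
     * After each failure, run the reset hook so that partially loaded state
     * is discarded, and record the directory in failedDirs so the caller can
     * stop writing to it.
     */
    static String loadFirstGood(List<String> imageDirs,
                                ImageLoader loader,
                                Runnable reset,
                                List<String> failedDirs) throws IOException {
        IOException lastFailure = null;
        for (String dir : imageDirs) {
            try {
                loader.load(dir);
                return dir;
            } catch (IOException e) {
                reset.run();          // discard any partially loaded state
                failedDirs.add(dir);  // caller can exclude this directory from writes
                lastFailure = e;
            }
        }
        // Every candidate failed; surface the last error as the cause.
        throw new IOException("Failed to load any image", lastFailure);
    }
}
```

Because every failure path goes through the same catch block, a truncated image needs no special handling beyond throwing an IOException from the loader.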
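
The "{{int countOccurrencesOf(String)}}" suggestion from the review can be illustrated as follows. This is only a sketch under assumptions: CapturingLog and append are hypothetical names, the class stores plain strings instead of hooking into a real logging framework, and it counts the number of recorded lines that contain the text, which is one plausible reading of the suggested method.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for a test-only log capture class: it records log
 * lines so that a test can verify what was logged.
 */
class CapturingLog {
    private final List<String> lines = new ArrayList<>();

    /** Record one log line (a real appender would receive a logging event). */
    synchronized void append(String line) {
        lines.add(line);
    }

    /** Return how many recorded lines contain the given text. */
    synchronized int countOccurrencesOf(String text) {
        int count = 0;
        for (String line : lines) {
            if (line.contains(text)) {
                count++;
            }
        }
        return count;
    }
}
```

With the search in one place, each test can assert on a single call such as countOccurrencesOf("some expected text") instead of repeating its own loop over the captured lines.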