[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907687#comment-13907687 ]
Suresh Srinivas commented on HDFS-5840:
---------------------------------------

[~atm], sorry for the late reply. I had lost track of this.

{quote}
As for handling the partial upgrade failure as you've described, I'd like to add one more RPC call to the JournalManager to initiate analysis/recovery of the storage dirs upon first contact, and then refactor the contents of FSImage#recoverStorageDirs into NNUpgradeUtil just like was done with the other upgrade-related procedures. If this sounds OK to you, I'll go ahead and add that stuff and appropriate tests.
{quote}

Why not always recover in the preupgrade/upgrade step, instead of adding another RPC?

With rolling upgrade getting ready, some of the functionality added for it may be useful here. For partial failures related to JournalNodes, the choice made in that feature was to make the JournalNode rollback operation idempotent. It looks like a lot of the rolling-upgrade code can be leveraged here, since upgrade is a special case of rolling upgrade. Should we explore that?

> Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-5840
>                 URL: https://issues.apache.org/jira/browse/HDFS-5840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>             Fix For: 3.0.0
>
>         Attachments: HDFS-5840.patch
>
>
> Suresh posted some good comments in HDFS-5138 after that patch had already
> been committed to trunk. This JIRA is to address those. See the first comment
> of this JIRA for the full content of the review.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)