[ 
https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907687#comment-13907687
 ] 

Suresh Srinivas commented on HDFS-5840:
---------------------------------------

[~atm], sorry for the late reply. I had lost track of this.

{quote}
As for handling the partial upgrade failure as you've described, I'd like to 
add one more RPC call to the JournalManager to initiate analysis/recovery of 
the storage dirs upon first contact, and then refactor the contents of 
FSImage#recoverStorageDirs into NNUpgradeUtil just like was done with the other 
upgrade-related procedures. If this sounds OK to you, I'll go ahead and add 
that stuff and appropriate tests.
{quote}
Why not always recover in the preupgrade/upgrade step, instead of adding another 
RPC?

With rolling upgrade getting ready, some of the functionality added for it may 
be useful here. For partial failures related to JournalNodes, the choice made in 
that feature was to make the JournalNode rollback operation idempotent. It looks 
like a lot of the rolling-upgrade code can be leveraged here, since upgrade 
is a special case of rolling upgrade. Should we explore that?
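
To illustrate what an idempotent rollback means here, below is a minimal 
sketch. It is not the actual JournalNode or rolling upgrade code: the class 
name, the method, and the directory names (current, previous, removed.tmp) 
are assumptions made only for this example. Each step checks the on-disk 
state before acting, so a retry after a partial failure converges on the 
same result.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; names and layout are illustrative, not HDFS APIs.
public class IdempotentRollbackSketch {

    /**
     * Restores the pre-upgrade state of a storage directory.
     * Returns true if any work was done, false if there was nothing to do.
     */
    public static boolean rollback(Path storageDir) throws IOException {
        Path current = storageDir.resolve("current");
        Path previous = storageDir.resolve("previous");
        Path removed = storageDir.resolve("removed.tmp");
        boolean changed = false;

        if (Files.exists(previous)) {
            // Step 1: set the upgraded state aside, unless an earlier
            // attempt already did so before failing.
            if (Files.exists(current)) {
                if (Files.exists(removed)) {
                    deleteRecursively(removed);
                }
                Files.move(current, removed);
            }
            // Step 2: restore the pre-upgrade state.
            Files.move(previous, current);
            changed = true;
        }
        // Step 3: clean up whatever an earlier or the current attempt left.
        if (Files.exists(removed)) {
            deleteRecursively(removed);
            changed = true;
        }
        return changed;
    }

    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jn-storage");
        Files.createDirectories(dir.resolve("current"));
        Files.createDirectories(dir.resolve("previous"));
        System.out.println(rollback(dir)); // prints "true"
        System.out.println(rollback(dir)); // prints "false"
    }
}
```

With this shape the caller does not need a separate analysis/recovery call: 
repeating the same rollback request after a partial failure is safe.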

> Follow-up to HDFS-5138 to improve error handling during partial upgrade 
> failures
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-5840
>                 URL: https://issues.apache.org/jira/browse/HDFS-5840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>             Fix For: 3.0.0
>
>         Attachments: HDFS-5840.patch
>
>
> Suresh posted some good comments in HDFS-5138 after that patch had already 
> been committed to trunk. This JIRA is to address those. See the first comment 
> of this JIRA for the full content of the review.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
