[ 
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450182#comment-13450182
 ] 

Colin Patrick McCabe commented on HDFS-3540:
--------------------------------------------

bq. Hi Colin, you keep mentioning HDFS-3004 or the recovery mode feature in 
trunk. However, we are talking about branch-1 recovery mode here.

I mentioned HDFS-3004 because the original design doc contains a good 
explanation of why recovery mode should not be enabled in normal operation:

{code}
Why can't we simply do recovery as part of normal NameNode operation?  Well,
recovery may involve destructive changes to the NameNode metadata.  Since the
metadata is corrupt, we will have to use guesswork to get back to a valid
state.
{code}

This issue is the same in both branch-1 and later branches: if you have to 
guess, you shouldn't make the process automatic.

bq. The branch-1 recovery mode feature is not yet released. If the new feature 
has problems, we should remove it. It is not a point if people already know how 
to use it. If there are people using development code, they have to get 
prepared that the un-released new feature may be changed or removed.

It would be inconvenient for us to remove recovery mode (RM) from branch-1.  I 
am willing to consider it, but I don't find the arguments presented here so 
far convincing.

I think the first question we need to answer is: what is the use case for edit 
log toleration?  What are your guidelines for when edit log toleration should 
be turned on?  This has never been clear to me.  It seems to me that if you 
wanted higher availability, you would be better off implementing edit log 
failover in branch-1.

At the very least, it would be nice to have a document explaining who the 
intended users of edit log toleration are, why they would use it rather than 
something else, and what the risks are.  At that point we could start to 
consider what the best resolution is, whatever that may be.
                
> Further improvement on recovery mode and edit log toleration in branch-1
> ------------------------------------------------------------------------
>
>                 Key: HDFS-3540
>                 URL: https://issues.apache.org/jira/browse/HDFS-3540
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 1.2.0
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1.  However, the 
> recovery mode feature in branch-1 is dramatically different from the recovery 
> mode in trunk since the edit log implementations in these two branches are 
> different.  For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not 
> in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy 
> UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
> There are overlaps between these two features.  We study potential further 
> improvement in this issue.

