[ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450182#comment-13450182 ]

Colin Patrick McCabe commented on HDFS-3540:
--------------------------------------------

bq. Hi Colin, you keep mentioning HDFS-3004 or the recovery mode feature in trunk. However, we are talking about branch-1 recovery mode here.

The reason why I mentioned HDFS-3004 is because the original design doc contains a good explanation of why recovery mode should not be enabled in normal operation:

{code}
Why can't we simply do recovery as part of normal NameNode operation? Well,
recovery may involve destructive changes to the NameNode metadata. Since the
metadata is corrupt, we will have to use guesswork to get back to a valid state.
{code}

This issue is the same in both branch-1 and later branches: if you have to guess, you shouldn't make the process automatic.

bq. The branch-1 recovery mode feature is not yet released. If the new feature has problems, we should remove it. It is not a point if people already know how to use it. If there are people using development code, they have to get prepared that the un-released new feature may be changed or removed.

It would be inconvenient for us to remove RM for branch-1. I am willing to consider it, but I just don't think the arguments presented here so far have been convincing.

I think the first thing we need to answer is what is the use case for edit log toleration? What are your guidelines for when edit log toleration should be turned on? This has never been clear to me.

It seems to me if you wanted to get higher availability, you would be better off implementing edit log failover in branch-1.

At the very least, it would be nice to have a document explaining who the intended users are for edit log toleration, why they would use it rather than something else, and what the risks are. At that point we could start to consider what the best resolution for this is-- whatever that may be.

> Further improvement on recovery mode and edit log toleration in branch-1
> ------------------------------------------------------------------------
>
>                 Key: HDFS-3540
>                 URL: https://issues.apache.org/jira/browse/HDFS-3540
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 1.2.0
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1. However, the
> recovery mode feature in branch-1 is dramatically different from the recovery
> mode in trunk since the edit log implementations in these two branches are
> different. For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not
> in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy
> UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
> There are overlaps between these two features. We study potential further
> improvement in this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira