[ https://issues.apache.org/jira/browse/HDFS-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434293#comment-13434293 ]

Konstantin Shvachko commented on HDFS-3772:
-------------------------------------------

I was writing this when Jira went down. Will try to reproduce that comment.

> files created with the old replication count would be expected to bump up to 
> the new minimum upon restart automatically

This is not an expected behavior.
{{dfs.namenode.replication.min}} has two purposes:
# Counting the blocks that satisfy the new minimum replication during startup.
# Controlling the minimal number of replicas that must be created in the write 
pipeline in order to call the data transfer successful.
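For context, here is a minimal sketch (my illustration, not code from this 
issue) of reading the setting through the {{Configuration}} API; it assumes 
hdfs-site.xml is on the classpath, and 1 is the usual default:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class ShowMinReplication {
  public static void main(String[] args) {
    // Assumes hdfs-site.xml is on the classpath so the key is picked up;
    // otherwise the fallback default (1) is returned.
    Configuration conf = new Configuration();
    conf.addResource("hdfs-site.xml");
    int minReplication = conf.getInt("dfs.namenode.replication.min", 1);
    System.out.println("dfs.namenode.replication.min = " + minReplication);
  }
}
{code}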

Setting {{replication.min}} to a higher value does not mean the NN replicates 
blocks up to that minimum.
It means the NN will wait for that many replicas of each block to be reported 
during startup before exiting SafeMode.
Setting it too high is one of the ways to never let the NN leave SafeMode 
automatically.
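Roughly, the startup bookkeeping works like the simplified sketch below. This 
is my illustration, not the exact FSNamesystem code, and it assumes the usual 
{{dfs.namenode.safemode.threshold-pct}} default of 0.999:

{code:java}
// Simplified sketch of the SafeMode bookkeeping, not the exact FSNamesystem
// code. A block counts as "safe" only once replication.min replicas of it
// have been reported; the NN stays in SafeMode while too few blocks are safe.
class SafeModeSketch {
  long blockSafe;              // blocks with >= replication.min reported replicas
  long blockTotal;             // total blocks in the namespace
  int minReplication;          // dfs.namenode.replication.min
  double thresholdPct = 0.999; // dfs.namenode.safemode.threshold-pct

  void blockReachedReplication(int reportedReplicas) {
    if (reportedReplicas == minReplication) {
      blockSafe++;             // mirrors incrementSafeBlockCount() quoted below
    }
  }

  boolean canLeaveSafeMode() {
    return blockSafe >= (long) (thresholdPct * blockTotal);
  }
}
{code}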

SafeMode prohibits replication or deletion of blocks and modification of the 
namespace, so block replication will not happen until the NN leaves SafeMode.
If you are trying to increase block replication for all files in your file 
system, you should call {{setReplication()}} on the root recursively (see the 
sketch below). But replication will start only after SafeMode is OFF.
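As a minimal sketch (my illustration, not code from this issue), walking the 
tree with the public {{FileSystem}} API and raising every file's replication 
could look like this:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaiseReplication {
  public static void main(String[] args) throws IOException {
    short targetReplication = 3;   // illustrative target value
    FileSystem fs = FileSystem.get(new Configuration());
    setRecursively(fs, new Path("/"), targetReplication);
  }

  private static void setRecursively(FileSystem fs, Path p, short rep)
      throws IOException {
    for (FileStatus st : fs.listStatus(p)) {
      if (st.isDirectory()) {
        setRecursively(fs, st.getPath(), rep);
      } else {
        // Only updates the target replication; the extra copies are made
        // after the NN leaves SafeMode.
        fs.setReplication(st.getPath(), rep);
      }
    }
  }
}
{code}

The shell equivalent, {{hadoop fs -setrep -R 3 /}}, does the same walk and is 
usually simpler.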

> I think we can change the semantics of this parameter to the percentage of 
> blocks that satisfy the real replication of each file.

Not a good idea. In general, changing the semantics of existing parameters is 
confusing.
And in this particular case it would make the NN stay in SafeMode forever if 
some DataNodes don't come up.

I think the question here is: what are you trying to achieve with this?
                
> HDFS NN will hang in safe mode and never come out if we change the 
> dfs.namenode.replication.min bigger.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3772
>                 URL: https://issues.apache.org/jira/browse/HDFS-3772
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Yanbo Liang
>
> If the NN restarts with a new minimum replication 
> (dfs.namenode.replication.min), any files created with the old replication 
> count would be expected to bump up to the new minimum upon restart 
> automatically. However, the real case is that if the NN restarts with a new 
> minimum replication which is bigger than the old one, the NN will hang in 
> safemode and never come out.
> The corresponding test case passes only because we are missing some test 
> coverage. It had been discussed in HDFS-3734.
> If the NN receives a sufficient number of reported blocks satisfying the new 
> minimum replication, it will exit safe mode. However, if we switch to a 
> bigger minimum replication, there will not be enough blocks satisfying the 
> limitation.
> Look at the code segment in FSNamesystem.java:
> private synchronized void incrementSafeBlockCount(short replication) {
>   if (replication == safeReplication) {
>     this.blockSafe++;
>     checkMode();
>   }
> }
> The DNs report blocks to the NN, and if the replication is equal to 
> safeReplication (which is assigned from the new minimum replication), we 
> increment blockSafe. But if we switch to a bigger minimum replication, all 
> the blocks whose replication is lower than it cannot satisfy this equality, 
> even though the NN has actually received complete block information. As a 
> result, blockSafe does not increment as usual, never reaches the amount 
> needed to exit safe mode, and the NN hangs.
