[jira] [Commented] (HDFS-3772) HDFS NN will hang in safe mode and never come out if we change the dfs.namenode.replication.min bigger.

Konstantin Shvachko (JIRA) Wed, 15 Aug 2012 10:12:40 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435307#comment-13435307
 ]


Konstantin Shvachko commented on HDFS-3772:
-------------------------------------------

This is not a hang. It works as designed. When you increase {{replication.min}}, 
SafeMode waits for block replicas to reach that safe replication limit.
You seem to assume that bumping up {{replication.min}} will result in more 
replicas, but that is not the case. You can use setReplication() for that.
                
> HDFS NN will hang in safe mode and never come out if we change the 
> dfs.namenode.replication.min bigger.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3772
>                 URL: https://issues.apache.org/jira/browse/HDFS-3772
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Yanbo Liang
>
> If the NN restarts with a new minimum replication 
> (dfs.namenode.replication.min), any files created with the old replication 
> count are expected to be bumped up to the new minimum automatically upon 
> restart. However, in practice, if the NN restarts with a new minimum 
> replication that is bigger than the old one, the NN hangs in safe mode 
> and never comes out.
> The corresponding test case passes only because some test coverage is 
> missing. This was discussed in HDFS-3734.
> If the NN receives enough reported blocks that satisfy the 
> new minimum replication, it exits safe mode. However, if we set a 
> bigger minimum replication, not enough blocks will satisfy that limit.
> Look at the code segment in FSNamesystem.java:
> private synchronized void incrementSafeBlockCount(short replication) {
>   if (replication == safeReplication) {
>     this.blockSafe++;
>     checkMode();
>   }
> }
> The DNs report blocks to the NN, and if a block's replication equals 
> safeReplication (which is assigned from the new minimum replication), we 
> increment blockSafe. But if we set a bigger minimum replication, none of the 
> blocks whose replication is lower than it can satisfy this equality, 
> even though the NN has actually received complete block information. As a 
> result, blockSafe is not incremented as usual, never reaches the amount 
> needed to exit safe mode, and the NN hangs.
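
The counting behaviour described in the quoted report can be illustrated with a small standalone model. This is only a sketch, not the real FSNamesystem code: the class SafeModeSketch, its methods, and the 0.999 threshold fraction are assumptions made for illustration. It keeps the same equality test as the quoted snippet and shows that blocks holding one replica each never become "safe" once the minimum is raised to 2, and that they do once a second replica is reported.

```java
/** Toy model of the safe-block counter; names and threshold are hypothetical. */
final class SafeModeSketch {
    private final int safeReplication;    // stands in for dfs.namenode.replication.min
    private final int blockTotal;         // number of blocks the NN knows about
    private final double threshold;       // assumed fraction of blocks that must be safe
    private final int[] reportedReplicas; // replicas reported so far, per block
    private int blockSafe;                // blocks that have reached safeReplication

    SafeModeSketch(int safeReplication, int blockTotal, double threshold) {
        this.safeReplication = safeReplication;
        this.blockTotal = blockTotal;
        this.threshold = threshold;
        this.reportedReplicas = new int[blockTotal];
    }

    /** A DataNode reports one more replica of the given block. */
    void reportReplica(int blockId) {
        int replication = ++reportedReplicas[blockId];
        // Same equality test as the quoted snippet: a block is counted once,
        // at the moment its replica count reaches safeReplication.
        if (replication == safeReplication) {
            blockSafe++;
        }
    }

    boolean canLeaveSafeMode() {
        return blockSafe >= threshold * blockTotal;
    }

    int blockSafe() {
        return blockSafe;
    }

    /** Reports replicasPerBlock replicas for every block and returns the model. */
    static SafeModeSketch simulate(int safeReplication, int blockTotal, int replicasPerBlock) {
        SafeModeSketch nn = new SafeModeSketch(safeReplication, blockTotal, 0.999);
        for (int b = 0; b < blockTotal; b++) {
            for (int r = 0; r < replicasPerBlock; r++) {
                nn.reportReplica(b);
            }
        }
        return nn;
    }

    public static void main(String[] args) {
        // Old minimum of 1, one replica per block: every block is safe.
        System.out.println(simulate(1, 100, 1).canLeaveSafeMode());
        // Minimum raised to 2, still one replica per block: nothing is safe.
        System.out.println(simulate(2, 100, 1).canLeaveSafeMode());
        // Minimum of 2 after a second replica of each block is reported.
        System.out.println(simulate(2, 100, 2).canLeaveSafeMode());
    }
}
```

In this model the middle case stays in safe mode indefinitely because nothing creates the missing replicas; raising the minimum only changes the value each block's count is compared against.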

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
