[ https://issues.apache.org/jira/browse/HDFS-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435307#comment-13435307 ]
Konstantin Shvachko commented on HDFS-3772:
-------------------------------------------

It is not a hang. It works as designed.
You increased {{replication.min}}, and SafeMode waits for replicas to reach that safe replication limit.
You seem to assume that bumping up {{replication.min}} will result in more replicas, but that is not the case. You can use setReplication() for that.

> HDFS NN will hang in safe mode and never come out if we change the
> dfs.namenode.replication.min bigger.
> -------------------------------------------------------------------
>
>                 Key: HDFS-3772
>                 URL: https://issues.apache.org/jira/browse/HDFS-3772
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Yanbo Liang
>
> If the NN restarts with a new minimum replication
> (dfs.namenode.replication.min), any files created with the old replication
> count are expected to be bumped up to the new minimum automatically upon
> restart. In practice, however, if the NN restarts with a new minimum
> replication that is larger than the old one, the NN hangs in safe mode
> and never comes out.
> The corresponding test case passes only because some test coverage is
> missing. This was discussed in HDFS-3734.
> If the NN receives enough reported blocks that satisfy the new minimum
> replication, it exits safe mode. However, if we raise the minimum
> replication, there will not be enough blocks that satisfy the limit.
> Look at the code segment in FSNamesystem.java:
>   private synchronized void incrementSafeBlockCount(short replication) {
>     if (replication == safeReplication) {
>       this.blockSafe++;
>       checkMode();
>     }
>   }
> The DNs report blocks to the NN, and if the replication is equal to
> safeReplication (which is assigned the new minimum replication), we
> increment blockSafe. But if we raise the minimum replication, no block
> whose replication is lower than the new minimum can satisfy this equality,
> even though the NN has received complete block information. As a result,
> blockSafe is not incremented as usual and never reaches the amount needed
> to exit safe mode, so the NN hangs.
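
The behaviour discussed above can be illustrated with a small, self-contained Java model. This is only a sketch and not the real FSNamesystem code: the class SafeModeSketch, its methods, and the threshold parameter are hypothetical names introduced for illustration, and the exit condition is simplified. It keeps the equality test from the quoted incrementSafeBlockCount() and assumes that method is called once per newly reported replica with the block's current live-replica count.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of safe-mode block counting; not the real FSNamesystem. */
public class SafeModeSketch {
    private final int safeReplication; // stands in for dfs.namenode.replication.min
    private final int blockTotal;      // number of blocks the NN knows about
    private final double threshold;    // assumed: fraction of blocks that must be safe
    private int blockSafe = 0;
    private final Map<Long, Integer> liveReplicas = new HashMap<>();

    public SafeModeSketch(int safeReplication, int blockTotal, double threshold) {
        this.safeReplication = safeReplication;
        this.blockTotal = blockTotal;
        this.threshold = threshold;
    }

    /** A DataNode reports one more live replica of the given block. */
    public void reportReplica(long blockId) {
        int live = liveReplicas.merge(blockId, 1, Integer::sum);
        // Same equality test as the quoted code: the block is counted exactly
        // once, at the moment its live-replica count reaches the minimum.
        if (live == safeReplication) {
            blockSafe++;
        }
    }

    /** Simplified exit condition: enough blocks have reached the minimum. */
    public boolean canLeaveSafeMode() {
        return blockSafe >= threshold * blockTotal;
    }

    /** Every block is reported with the same number of existing replicas. */
    public static boolean simulate(int safeReplication, int replicasPerBlock, int blocks) {
        SafeModeSketch nn = new SafeModeSketch(safeReplication, blocks, 1.0);
        for (long b = 0; b < blocks; b++) {
            for (int r = 0; r < replicasPerBlock; r++) {
                nn.reportReplica(b);
            }
        }
        return nn.canLeaveSafeMode();
    }

    public static void main(String[] args) {
        System.out.println(simulate(1, 1, 10)); // minimum 1, one replica each
        System.out.println(simulate(2, 1, 10)); // minimum raised to 2, still one replica each
        System.out.println(simulate(2, 3, 10)); // minimum 2, three replicas each
    }
}
```

In this model, blocks that have only one replica never reach a minimum of 2, so the safe-block count stays at zero no matter how complete the block reports are. Blocks with three replicas are still counted, because the count passes through 2 on the way to 3. Raising the number of replicas of existing files is a separate step, which per the comment above is done with setReplication().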