Yanbo Liang created HDFS-3772:
---------------------------------
Summary: HDFS NN will hang in safe mode and never come out if
dfs.namenode.replication.min is increased.
Key: HDFS-3772
URL: https://issues.apache.org/jira/browse/HDFS-3772
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Yanbo Liang
If the NN restarts with a new minimum replication
(dfs.namenode.replication.min), any files created with the old replication
count are expected to be bumped up to the new minimum automatically upon
restart. In reality, however, if the NN restarts with a new minimum
replication that is larger than the old one, the NN hangs in safe mode and
never comes out.
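For reference, the minimum replication is controlled by the
dfs.namenode.replication.min setting. A minimal sketch of reading it through
the standard HDFS configuration keys (illustrative code only, not taken from
the NN itself):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;

// Sketch: reading the NN-side minimum replication ("dfs.namenode.replication.min").
public class MinReplicationSketch {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();
    int minReplication = conf.getInt(
        DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY,      // "dfs.namenode.replication.min"
        DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_DEFAULT); // default is 1
    System.out.println("dfs.namenode.replication.min = " + minReplication);
  }
}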
The corresponding test case only passes because of missing test coverage;
this was discussed in HDFS-3734.
The NN exits safe mode once it has received block reports for enough blocks
that satisfy the new minimum replication. However, after raising the minimum
replication, there are not enough blocks that satisfy this requirement.
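For example, with the default safe mode threshold
(dfs.namenode.safemode.threshold-pct = 0.999f), the exit condition looks
roughly like the fragment below; the variable names are illustrative, not the
exact FSNamesystem fields:

float thresholdPct = 0.999f;   // dfs.namenode.safemode.threshold-pct default
long blockTotal = 1000;        // blocks known from the fsimage/edits
long blockThreshold = (long) (blockTotal * thresholdPct);  // = 999
long blockSafe = 0;            // blocks whose reported replicas reached safeReplication
boolean canLeaveSafeMode = (blockSafe >= blockThreshold);  // stays false if most
                                                           // blocks fall below the new minimum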
Look at the code segment in FSNamesystem.java:
private synchronized void incrementSafeBlockCount(short replication) {
  // Only blocks whose reported replica count equals safeReplication
  // (set from the configured minimum replication) are counted as safe.
  if (replication == safeReplication) {
    this.blockSafe++;
    checkMode();
  }
}
The DNs report blocks to the NN, and blockSafe is incremented only when the
reported replication equals safeReplication (which is assigned from the new
minimum replication). After the minimum replication is raised, blocks whose
replication is lower than the new minimum can never satisfy this equality,
even though the NN has actually received the complete block information. As a
result, blockSafe does not increase as expected, never reaches the amount
required to exit safe mode, and the NN hangs.
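As a minimal standalone sketch of the failure (illustrative code, not the
actual NN classes): assume every block was written with replication 1 and the
NN is restarted with dfs.namenode.replication.min = 2. Every block report then
misses the equality check, so blockSafe never moves.

// Standalone illustration of the counting problem; not HDFS code.
public class SafeBlockCountSketch {
  static int safeReplication = 2;  // new dfs.namenode.replication.min after restart
  static long blockTotal = 1000;   // blocks known from the fsimage
  static long blockSafe = 0;       // blocks counted as "safe"

  static void incrementSafeBlockCount(short replication) {
    // Same equality check as FSNamesystem: counts only exact matches.
    if (replication == safeReplication) {
      blockSafe++;
    }
  }

  public static void main(String[] args) {
    // Every block was created with replication 1, so each DN report
    // calls incrementSafeBlockCount(1), which never matches 2.
    for (int i = 0; i < blockTotal; i++) {
      incrementSafeBlockCount((short) 1);
    }
    // blockSafe stays at 0 and never reaches the safe mode threshold,
    // so the NN would remain in safe mode indefinitely.
    System.out.println("blockSafe = " + blockSafe + " of " + blockTotal);
  }
}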