[ https://issues.apache.org/jira/browse/HDFS-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064963#comment-13064963 ]
John George commented on HDFS-2114:
-----------------------------------

Mat, thanks a lot for reviewing this patch. I have tried to integrate your comments and will post a newer patch tomorrow with changes to checkFile().

> 1. I know you didn't write TestDecommission.checkFile(), but your addition in
> this patch of the {{isNodeDown}} flag raises a number of questions:
> * Aren't (isNodeDown && nodes[j].getName().equals(downnode)) and
> (nodes[j].getName().equals(downnode)) always the same? So is the first use of
> {{isNodeDown}} even necessary?

The first use of {{isNodeDown}} is necessary because {{downnode}} could be null in cases where we are checking for "recommission".

> * Can there be any cases where (nodes[j].getName().equals(downnode)) and
> (nodes[j].isDecommissioned()) are different? So can't the following two blocks
> be merged? And the location of the "is decommissioned" log message further
> confuses this issue.
> {code}
> if (isNodeDown && nodes[j].getName().equals(downnode)) {
>   hasdown++;
>   LOG.info("Block " + blk.getBlock() + " replica " + nodes[j].getName()
>       + " is decommissioned.");
> }
> if (nodes[j].isDecommissioned()) {
>   if (firstDecomNodeIndex == -1) {
>     firstDecomNodeIndex = j;
>   }
>   continue;
> }
> {code}

I think you are right. I will change the code to assert if they are not the same.

> * What is the purpose of the assertion
> {code}
> if (isNodeDown)
>   assertEquals("Decom node is not at the end", firstDecomNodeIndex, -1);
> {code}

My understanding is that this assert ensures that the current blk has only ONE replica that is in the decommissioned state.

> And why does it even work, since the node to decommission is chosen at random?
> * And in the same block, why is it important to condition it on
> {{isNodeDown}}, since (!isNodeDown) implies there shouldn't be any
> decommissioned nodes? So the second use of {{isNodeDown}} also seems
> unnecessary.

Because we don't care about this check in cases where there is no node that is down.
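As a rough standalone illustration of the "merge the two blocks and assert if the conditions differ" idea discussed above: this is a hedged sketch only, where NodeInfo and countDecommissionedReplicas are made-up stand-ins for the real DatanodeInfo and TestDecommission.checkFile() code, not the actual Hadoop classes.

```java
import java.util.Arrays;
import java.util.List;

public class CheckFileSketch {
    /** Minimal stand-in for DatanodeInfo (hypothetical, for illustration only). */
    static class NodeInfo {
        private final String name;
        private final boolean decommissioned;
        NodeInfo(String name, boolean decommissioned) {
            this.name = name;
            this.decommissioned = decommissioned;
        }
        String getName() { return name; }
        boolean isDecommissioned() { return decommissioned; }
    }

    /**
     * Merged version of the two blocks: count replicas on the down node,
     * and fail loudly if "is the down node" ever disagrees with
     * "is decommissioned" for any replica.
     */
    static int countDecommissionedReplicas(List<NodeInfo> nodes, String downnode) {
        boolean isNodeDown = (downnode != null);  // null downnode = recommission case
        int hasdown = 0;
        for (int j = 0; j < nodes.size(); j++) {
            NodeInfo n = nodes.get(j);
            boolean isDown = isNodeDown && n.getName().equals(downnode);
            if (isDown != n.isDecommissioned()) {
                throw new AssertionError(
                    "down-node and decommissioned state disagree at replica " + j);
            }
            if (isDown) {
                hasdown++;
            }
        }
        return hasdown;
    }

    public static void main(String[] args) {
        // Decommission case: exactly one replica sits on the down node.
        List<NodeInfo> nodes = Arrays.asList(
            new NodeInfo("dn1", false),
            new NodeInfo("dn2", true),
            new NodeInfo("dn3", false));
        System.out.println(countDecommissionedReplicas(nodes, "dn2")); // prints 1

        // Recommission case: downnode is null and no replica is decommissioned.
        List<NodeInfo> upNodes = Arrays.asList(
            new NodeInfo("dn1", false),
            new NodeInfo("dn2", false));
        System.out.println(countDecommissionedReplicas(upNodes, null)); // prints 0
    }
}
```

With this shape, the recommission path (null downnode) naturally exercises the same assertion: every replica must then report isDecommissioned() == false, so the second {{isNodeDown}} guard becomes unnecessary.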
> 2. Regarding the timeout: In TestDecommission, you appropriately set
> BLOCKREPORT_INTERVAL_MSEC down to 1 sec to match the HEARTBEAT_INTERVAL. You
> may also want to consider DFS_NAMENODE_REPLICATION_INTERVAL_KEY (default 3
> sec). Adjusting this value to 1 might allow testRecommission() to run in 5
> sec instead of 10.

Will try this.

> 3. I wish there were a clear way to share the duplicated code between
> testDecommission and testRecommission, but I couldn't see an obviously better
> choice. Can you think of a way to improve this?

OK.

> re-commission of a decommissioned node does not delete excess replica
> ---------------------------------------------------------------------
>
>                 Key: HDFS-2114
>                 URL: https://issues.apache.org/jira/browse/HDFS-2114
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: John George
>            Assignee: John George
>         Attachments: HDFS-2114-2.patch, HDFS-2114-3.patch, HDFS-2114.patch
>
> If a decommissioned node is removed from the decommissioned list, the namenode
> does not delete the excess replicas it created while the node was
> decommissioned.