[ https://issues.apache.org/jira/browse/HDFS-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531219#comment-15531219 ]
Lei (Eddy) Xu commented on HDFS-9390:
-------------------------------------

Hi, [~mingma]

Thanks so much for posting this patch. It looks good to me overall. Some small nits:

In {{HeartbeatManager.java#heartbeatCheck()}}:

{code}
try {
  dm.removeDeadDatanode(dead, !dead.isMaintenance());
}
{code}

If we change it to the following code, we can undo most of the {{DatanodeManager.java}} changes, whose motivation is not clear to me at first sight.

{code}
if (!dead.isMaintenance()) {
  dm.removeDeadDatanode(dead);
}
{code}

Can you elaborate a little more on the following code?

{code}
} else if (blockManager.getMinReplicationToBeInMaintenance() == 0) {
  LOG.info("MinReplicationToBeInMaintenance is set to zero. " + node
      + " is put in maintenance state" + " immediately.");
  node.setInMaintenance();
} else {
  stats.subtract(node);
  node.startMaintenance();
  stats.add(node);
}
{code}

Why does it not recalculate {{stats}} when {{minReplicationToBeInMaintenance == 0}}?

In {{DecommissionManager#startMaintenance()}}:

{code}
// hbManager.startDecommission will set dead node to decommissioned.
{code}

Is the comment correct in this context? A related question: why are {{startMaintenance()}} and {{stopMaintenance()}} in {{DecommissionManager}}?

In {{NumberReplicas.java}}, you might want to consider renaming {{int maintenance()}} to {{int maintenanceReplicas}}; the same goes for {{liveEnteringMaintenance()}}.

Thanks.

> Block management for maintenance states
> ---------------------------------------
>
>                 Key: HDFS-9390
>                 URL: https://issues.apache.org/jira/browse/HDFS-9390
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-9390-2.patch, HDFS-9390.patch
>
>
> When a node transitions into, stays in, or transitions out of maintenance state,
> we need to make sure the blocks on that node are handled properly.
> * When a node is put into maintenance, it first goes to
> ENTERING_MAINTENANCE; its blocks must be minimally replicated before
> the node transitions to IN_MAINTENANCE.
> * Do not replicate blocks when nodes are in maintenance states. Maintenance
> replicas remain in BlockMaps and are thus still considered valid from the
> block replication point of view. In other words, putting a node into
> "maintenance" mode won't trigger BlockManager to replicate its blocks.
> * Do not invalidate replicas on nodes under maintenance. After any file's
> replication factor is reduced, the NN needs to invalidate some replicas. It
> should exclude nodes under maintenance when doing so.
> * Do not put IN_MAINTENANCE replicas in LocatedBlock for read operations.
> * Do not allocate any new blocks on nodes under maintenance.
> * Have Balancer exclude nodes under maintenance.
> * Exclude nodes under maintenance for DN cache.
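
To make the {{stats}} question in the review concrete, here is a minimal, self-contained Java sketch of why a state change is usually bracketed by subtract/add on an aggregate counter. It is not the HDFS implementation: {{MaintenanceStatsSketch}}, {{Node}}, {{Stats}}, and {{AdminState}} are hypothetical stand-ins, and the sketch assumes the stats are keyed by a node's admin state. Whether the real {{setInMaintenance()}} path touches the stats is exactly what the review asks, and the sketch does not answer that.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch; none of these types are the real HDFS classes.
class MaintenanceStatsSketch {

    enum AdminState { NORMAL, ENTERING_MAINTENANCE, IN_MAINTENANCE }

    /** Stand-in for a datanode descriptor that carries an admin state. */
    static final class Node {
        final String name;
        AdminState state = AdminState.NORMAL;

        Node(String name) {
            this.name = name;
        }
    }

    /** Stand-in aggregate: counts nodes per admin state. */
    static final class Stats {
        private final Map<AdminState, Integer> counts =
            new EnumMap<>(AdminState.class);

        void add(Node n) {
            counts.merge(n.state, 1, Integer::sum);
        }

        void subtract(Node n) {
            counts.merge(n.state, -1, Integer::sum);
        }

        int count(AdminState s) {
            return counts.getOrDefault(s, 0);
        }
    }

    /**
     * Moves a node toward maintenance. The subtract/add pair wraps BOTH
     * branches, so the aggregate stays consistent whether the node goes
     * straight to IN_MAINTENANCE (minimum replication of zero) or has to
     * pass through ENTERING_MAINTENANCE first.
     */
    static void startMaintenance(Node node, Stats stats,
                                 int minReplicationToBeInMaintenance) {
        stats.subtract(node);               // remove the old-state contribution
        if (minReplicationToBeInMaintenance == 0) {
            node.state = AdminState.IN_MAINTENANCE;
        } else {
            node.state = AdminState.ENTERING_MAINTENANCE;
        }
        stats.add(node);                    // re-add under the new state
    }

    public static void main(String[] args) {
        Stats stats = new Stats();

        Node a = new Node("dn-a");
        stats.add(a);
        startMaintenance(a, stats, 0);      // bracketed transition

        Node b = new Node("dn-b");
        stats.add(b);
        b.state = AdminState.IN_MAINTENANCE; // unbracketed: stats not updated

        System.out.println("NORMAL=" + stats.count(AdminState.NORMAL)
            + " IN_MAINTENANCE=" + stats.count(AdminState.IN_MAINTENANCE));
    }
}
```

In this toy model the unbracketed change to {{dn-b}} leaves it counted under its old state, which is the kind of drift the review question is probing. If the real stats do not depend on the maintenance state, skipping the bracket in the zero-replication branch would be harmless.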