[ 
https://issues.apache.org/jira/browse/HDFS-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531219#comment-15531219
 ] 

Lei (Eddy) Xu commented on HDFS-9390:
-------------------------------------

Hi [~mingma], thanks so much for posting this patch.

It looks good to me overall. Some small nits:

In {{HeartbeatManager.java#heartbeatCheck()}}:

{code}
try {
 dm.removeDeadDatanode(dead, !dead.isMaintenance());
}
{code}

If we change it to the following code, we can undo most of the 
{{DatanodeManager.java}} changes, whose motivation is not clear to me at 
first sight.

{code}
if (!dead.isMaintenance()) {
   dm.removeDeadDatanode(dead);
}
{code}

Can you elaborate a little bit more about the following code?

{code}
} else if (blockManager.getMinReplicationToBeInMaintenance() == 0) {
  LOG.info("MinReplicationToBeInMaintenance is set to zero. " + node +
      " is put in maintenance state" + " immediately.");
  node.setInMaintenance();
} else {
  stats.subtract(node);
  node.startMaintenance();
  stats.add(node);
}
{code}

Why does it not re-calculate {{stats}} when 
{{minReplicationToBeInMaintenance == 0}}?
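
To illustrate the concern behind the question above, here is a minimal sketch. It does not use the real HDFS classes: {{Node}}, {{Stats}}, {{AdminState}} and both helper methods are hypothetical stand-ins, and it assumes the aggregate depends on a node's admin state. Under that assumption, a state change that is not wrapped in subtract/add leaves the aggregate stale.

```java
import java.util.EnumMap;
import java.util.Map;

public class MaintenanceStatsSketch {
  /** Hypothetical admin states, named after the states in the issue description. */
  enum AdminState { NORMAL, ENTERING_MAINTENANCE, IN_MAINTENANCE }

  /** Hypothetical stand-in for a datanode descriptor. */
  static class Node {
    AdminState state = AdminState.NORMAL;
  }

  /** Hypothetical aggregate that counts nodes per admin state. */
  static class Stats {
    private final Map<AdminState, Integer> counts =
        new EnumMap<>(AdminState.class);

    void add(Node n) {
      counts.merge(n.state, 1, Integer::sum);
    }

    void subtract(Node n) {
      counts.merge(n.state, -1, Integer::sum);
    }

    int count(AdminState s) {
      return counts.getOrDefault(s, 0);
    }
  }

  /** Mirrors the subtract / change state / add pattern of the last branch. */
  static void changeStateWithStats(Stats stats, Node node, AdminState next) {
    stats.subtract(node);  // remove the node's contribution under the old state
    node.state = next;
    stats.add(node);       // re-add it under the new state
  }

  /** Mirrors a branch that changes the state without touching stats. */
  static void changeStateWithoutStats(Node node, AdminState next) {
    node.state = next;
  }

  public static void main(String[] args) {
    Stats stats = new Stats();
    Node a = new Node();
    Node b = new Node();
    stats.add(a);
    stats.add(b);

    changeStateWithStats(stats, a, AdminState.ENTERING_MAINTENANCE);
    changeStateWithoutStats(b, AdminState.IN_MAINTENANCE);

    // Node b is IN_MAINTENANCE, but the aggregate still counts it as NORMAL.
    for (AdminState s : AdminState.values()) {
      System.out.println(s + " = " + stats.count(s));
    }
  }
}
```

If the real {{stats}} does not depend on the admin state, or is updated elsewhere on this path, the zero-replication branch is fine as written; a short comment in the patch saying so would help.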

In {{DecommissionManager#startMaintenance()}}:
{code}
 // hbManager.startDecommission will set dead node to decommissioned.
{code}
Is this comment correct in this context? It refers to {{startDecommission}} and the decommissioned state rather than to maintenance.

A related question: why are {{startMaintenance()}} and 
{{stopMaintenance()}} in {{DecommissionManager}}?

In {{NumberReplicas.java}}, you might want to consider renaming {{int 
maintenance()}} to {{int maintenanceReplicas()}}, and likewise for 
{{liveEnteringMaintenance()}}.

Thanks.

> Block management for maintenance states
> ---------------------------------------
>
>                 Key: HDFS-9390
>                 URL: https://issues.apache.org/jira/browse/HDFS-9390
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-9390-2.patch, HDFS-9390.patch
>
>
> When a node is transitioned to, stays in, or is transitioned out of maintenance 
> state, we need to make sure blocks w.r.t. that node are properly handled.
> * When nodes are put into maintenance, they first go to 
> ENTERING_MAINTENANCE, and blocks must be minimally replicated before 
> the nodes are transitioned to IN_MAINTENANCE.
> * Do not replicate blocks when nodes are in maintenance states. Maintenance 
> replicas will remain in BlockMaps and thus are still considered valid from the 
> block replication point of view. In other words, putting a node into 
> “maintenance” mode won’t trigger BlockManager to replicate its blocks.
> * Do not invalidate replicas on nodes under maintenance. After any file's 
> replication factor is reduced, NN needs to invalidate some replicas. It 
> should exclude nodes under maintenance in the handling.
> * Do not put IN_MAINTENANCE replicas in LocatedBlock for read operation.
> * Do not allocate any new block on nodes under maintenance.
> * Have Balancer exclude nodes under maintenance.
> * Exclude nodes under maintenance for DN cache.


