Daryn Sharp created HDFS-8776:
---------------------------------

             Summary: Decom manager should not be active on standby
                 Key: HDFS-8776
                 URL: https://issues.apache.org/jira/browse/HDFS-8776
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.6.0
            Reporter: Daryn Sharp
            Assignee: Daryn Sharp


The decommission manager should not be actively processing on the standby.

The decomm manager goes through the costly computation of determining that every 
block on the node requires replication, yet doesn't queue the blocks for 
replication because it's in standby. While it does this, the decomm manager 
holds the namesystem write lock, which sets up a cycle: DNs time out on 
heartbeats or IBRs, the NN purges the call queue of timed-out clients, and the 
NN processes some heartbeats/IBRs before the decomm manager locks up the 
namesystem again. Nodes attempting to register will be sending full BRs, which 
are more costly to send and discard than a heartbeat.

If a failover is required, the standby will likely have to struggle very hard 
to not GC while "catching up" on its queued IBRs while DNs continue to fill the 
call queue and time out.
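
To illustrate the behavior requested above, here is a minimal sketch of a periodic monitor that checks the HA state before taking the write lock, so a standby never pays for the scan. It is not the actual HDFS code or the eventual patch: the class DecomMonitorSketch, the HaState enum, and the runOnce() method are hypothetical names made up for this example, and the "scan" is reduced to a counter.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch only. None of these names come from the HDFS code base;
 * they stand in for the namesystem lock, the HA state, and the decomm scan.
 */
public class DecomMonitorSketch {
    enum HaState { ACTIVE, STANDBY }

    // Stand-in for the namesystem lock that the real monitor would hold.
    private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();
    private volatile HaState haState;
    private int scansPerformed = 0;

    DecomMonitorSketch(HaState initialState) {
        this.haState = initialState;
    }

    void setHaState(HaState newState) {
        this.haState = newState;
    }

    int getScansPerformed() {
        return scansPerformed;
    }

    /**
     * One tick of the monitor. Returns true if the (simulated) scan ran,
     * false if it was skipped because this node is not active.
     */
    boolean runOnce() {
        // Cheap check first: a standby returns without touching the write lock,
        // so heartbeats and IBRs are not blocked behind a scan whose result
        // would be discarded anyway.
        if (haState != HaState.ACTIVE) {
            return false;
        }
        namesystemLock.writeLock().lock();
        try {
            // Re-check under the lock in case the state changed while waiting.
            if (haState != HaState.ACTIVE) {
                return false;
            }
            scansPerformed++; // placeholder for the costly per-block scan
            return true;
        } finally {
            namesystemLock.writeLock().unlock();
        }
    }
}
```

In this sketch a monitor created in STANDBY skips every tick until setHaState(HaState.ACTIVE) is called, after which the scan runs under the write lock as before.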



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)