[jira] [Updated] (HDFS-6178) Decommission on standby NN couldn't finish

Ming Ma (JIRA) Tue, 08 Apr 2014 15:56:13 -0700

     [ 
https://issues.apache.org/jira/browse/HDFS-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ming Ma updated HDFS-6178:
--------------------------

    Attachment: HDFS-6178-2.patch

Thanks, Jing. Updated patch per suggestion.

> Decommission on standby NN couldn't finish
> ------------------------------------------
>
>                 Key: HDFS-6178
>                 URL: https://issues.apache.org/jira/browse/HDFS-6178
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Ming Ma
>         Attachments: HDFS-6178-2.patch, HDFS-6178.patch
>
>
> Currently decommissioning machines in HA-enabled cluster requires running 
> refreshNodes in both active and standby nodes. Sometimes decommissioning 
> won't finish from standby NN's point of view.  Here is the diagnosis of why 
> it could happen.
> Standby NN's blockManager manages blocks replication and block invalidation 
> as if it is the active NN; even though DNs will ignore block commands coming 
> from standby NN. When standby NN makes block operation decisions such as the 
> target of block replication and the node to remove excess blocks from, the 
> decision is independent of active NN. So active NN and standby NN could have 
> different states. When we try to decommission nodes on standby nodes; such 
> state inconsistency might prevent standby NN from making progress. Here is an 
> example.
> Machine A
> Machine B
> Machine C
> Machine D
> Machine E
> Machine F
> Machine G
> Machine H
> 1. For a given block, both active and standby have 5 replicas on machine A, 
> B, C, D, E. So both active and standby decide to pick excess nodes to 
> invalidate.
> Active picked D and E as excess DNs. After the next block reports from D and 
> E, active NN has 3 active replicas (A, B, C), 0 excess replica.
> {noformat}
> 2014-03-27 01:50:14,410 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
> (E:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks 
> set
> 2014-03-27 01:50:15,539 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
> (D:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks 
> set
> {noformat}
> Standby pick C, E as excess DNs. Given DNs ignore commands from standby, 
> After the next block reports from C, D, E,  standby has 2 active replicas (A, 
> B), 1 excess replica (C).
> {noformat}
> 2014-03-27 01:51:49,543 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
> (E:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks 
> set
> 2014-03-27 01:51:49,894 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
> (C:50010, blk_-5207804474559026159_121186764) is added to invalidated blocks 
> set
> {noformat}
> 2. Machine A decomm request was sent to standby. Standby only had one live 
> replica and picked machine G, H as targets, but given standby commands was 
> ignored by DNs, G, H remained in pending replication queue until they are 
> timed out. At this point, you have one decommissioning replica (A), 1 active 
> replica (B), one excess replica (C).
> {noformat}
> 2014-03-27 04:42:52,258 INFO BlockStateChange: BLOCK* ask A:50010 to 
> replicate blk_-5207804474559026159_121186764 to datanode(s) G:50010 H:50010
> {noformat}
> 3. Machine A decomm request was sent to active NN. Active NN picked machine F 
> as the target. It finished properly. So active NN had 3 active replicas (B, 
> C, F), one decommissioned replica (A).
> {noformat}
> 2014-03-27 04:44:15,239 INFO BlockStateChange: BLOCK* ask 10.42.246.110:50010 
> to replicate blk_-5207804474559026159_121186764 to datanode(s) F:50010
> 2014-03-27 04:44:16,083 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: F:50010 is added to blk_-5207804474559026159_121186764 size 
> 7100065
> {noformat}
> 4. Standby NN picked up F as a new replica. Thus standby had one 
> decommissioning replica (A), 2 active replicas (B, F), one excess replica 
> (C). Standby NN kept trying to schedule replication work, but DNs ignored the 
> commands.
> {noformat}
> 2014-03-27 04:44:16,084 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: F:50010 is added to blk_-5207804474559026159_121186764 size 
> 7100065
> 2014-03-28 23:06:11,970 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block: 
> blk_-5207804474559026159_121186764, Expected Replicas: 3, live replicas: 2, 
> corrupt replicas: 0, decommissioned replicas: 1, excess replicas: 1, Is Open 
> File: false, Datanodes having this block: C:50010 B:50010 A:50010 F:50010 , 
> Current Datanode: A:50010, Is current datanode decommissioning: true
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HDFS-6178) Decommission on standby NN couldn't finish

Reply via email to