qinyuren created HDFS-16333:
-------------------------------

             Summary: fix balancer bug when transfer an EC block
                 Key: HDFS-16333
                 URL: https://issues.apache.org/jira/browse/HDFS-16333
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: balancer & mover
            Reporter: qinyuren
         Attachments: image-2021-11-18-17-25-13-089.png, 
image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png

We use the (6+3) EC policy, and some nodes were being decommissioned while 
we ran the balancer.

While the balancer was running, we found many error logs like the following:

!image-2021-11-18-17-25-13-089.png!

Node A tries to transfer an EC block to node B, but the block is not on 
node A. The fsck command shows the block status as follows:

!image-2021-11-18-17-25-50-556.png!

Assume that the locations of an EC block look like this:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]

node:[a, b, c, d, e, f, g, h, i]

After the decommission operation, the internal block at indices[1] was 
moved to another node:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]

node:[a, j, c, d, e, f, g, h, i]

That is, the location of indices[1] changes from node b to node j.

In the Dispatcher.getBlockList function:

!image-2021-11-18-17-28-03-155.png!

If a node is not found in storageGroupMap, it is not added to 
block.locations.

However, the indices are not updated.

As a result, the block locations may look like this:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]

block.location:[a, c, d, e, f, g, h, i]

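The node locations no longer match their indices: there are nine indices but 
only eight locations, so node c is paired with index 1, node d with index 2, 
and so on.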
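
To illustrate the mismatch, here is a minimal, self-contained Java sketch. It 
is not the Dispatcher code and not the actual patch: the class 
EcLocationSketch, both method names, and the use of a plain Set in place of 
storageGroupMap are assumptions made for illustration only. The first method 
drops unknown nodes but leaves the indices alone (the behaviour described 
above); the second shows one possible remedy, filtering the indices in 
lockstep with the locations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Hadoop source.
class EcLocationSketch {

  /** Problem case: unknown nodes are skipped, the indices are not touched. */
  static List<String> filterLocationsOnly(List<String> nodes, Set<String> known) {
    List<String> kept = new ArrayList<>();
    for (String node : nodes) {
      if (known.contains(node)) {
        kept.add(node);
      }
    }
    return kept;
  }

  /**
   * Possible remedy: keep indices[i] only when nodes.get(i) is kept.
   * Assumes indices.length == nodes.size().
   */
  static int[] filterIndices(List<String> nodes, int[] indices, Set<String> known) {
    int[] kept = new int[indices.length];
    int n = 0;
    for (int i = 0; i < nodes.size(); i++) {
      if (known.contains(nodes.get(i))) {
        kept[n++] = indices[i];
      }
    }
    return Arrays.copyOf(kept, n);
  }

  public static void main(String[] args) {
    // Locations after decommission: indices[1] now lives on node j.
    List<String> nodes = Arrays.asList("a", "j", "c", "d", "e", "f", "g", "h", "i");
    int[] indices = {0, 1, 2, 3, 4, 5, 6, 7, 8};
    // Stand-in for storageGroupMap: node j is assumed to be unknown to it.
    Set<String> known =
        new HashSet<>(Arrays.asList("a", "c", "d", "e", "f", "g", "h", "i"));

    List<String> locations = filterLocationsOnly(nodes, known);
    // 8 locations against 9 indices: locations.get(1) is "c" but indices[1] is 1.
    System.out.println(locations + " vs " + Arrays.toString(indices));

    // Filtering in lockstep drops index 1 together with node j.
    System.out.println(Arrays.toString(filterIndices(nodes, indices, known)));
  }
}
```

With the lockstep variant, locations [a, c, d, e, f, g, h, i] line up with 
indices [0, 2, 3, 4, 5, 6, 7, 8], so each remaining node is again paired with 
the internal block it actually holds.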



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

