[jira] [Updated] (HDFS-16333) fix balancer bug when transfer an EC block

2022-01-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-16333:
---
Component/s: erasure-coding

> fix balancer bug when transfer an EC block
> --
>
> Key: HDFS-16333
> URL: https://issues.apache.org/jira/browse/HDFS-16333
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, erasure-coding
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: image-2021-11-18-17-25-13-089.png, 
> image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> We set the EC policy to (6+3), and some nodes were being decommissioned
> while the balancer was running.
> While the balancer was running, we found many error logs like the following.
> !image-2021-11-18-17-25-13-089.png|width=858,height=135!
> Node A tries to transfer an EC block to node B, but the block is not on
> node A. The fsck command shows the block status as follows.
> !image-2021-11-18-17-25-50-556.png|width=607,height=189!
> In the Dispatcher.getBlockList function:
> !image-2021-11-18-17-28-03-155.png!
>  
> Assume that the locations of an EC block in storageGroupMap look like this:
> indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
> node:[a, b, c, d, e, f, g, h, i]
> After the decommission operation, the internal block at indices[1] was
> moved to another node:
> indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
> node:[a, j, c, d, e, f, g, h, i]
> The location of indices[1] changed from node b to node j.
>  
> When the balancer gets the block locations, it checks each of them against
> storageGroupMap. If a node is not found in storageGroupMap, it is not added
> to the block locations.
> In this case, node j is not added to the block locations, but the indices
> are not updated.
> As a result, the block locations may look like this:
> indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
> block.location:[a, c, d, e, f, g, h, i]
> The nodes no longer match their indices.
>  
> Solution:
> Update the indices so that they match the remaining nodes:
> indices:[0, 2, 3, 4, 5, 6, 7, 8]
> block.location:[a, c, d, e, f, g, h, i]
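
As a rough illustration of the idea in the description (not the actual patch), the sketch below drops a location that is missing from the known-node set and drops its index in the same step, so the two lists stay aligned. The class EcLocationFilter, the Filtered holder, and the use of plain strings for nodes are hypothetical and only model the problem; the real balancer code works with Hadoop's own types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of filtering EC block locations; not Hadoop code. */
public class EcLocationFilter {

  /** Locations and indices kept in lockstep: indices[i] belongs to locations.get(i). */
  static final class Filtered {
    final List<String> locations;
    final byte[] indices;

    Filtered(List<String> locations, byte[] indices) {
      this.locations = locations;
      this.indices = indices;
    }
  }

  /**
   * Keeps only the nodes present in knownNodes (standing in for storageGroupMap)
   * and keeps exactly the indices that belong to those nodes.
   */
  static Filtered filter(String[] nodes, byte[] indices, Set<String> knownNodes) {
    if (nodes.length != indices.length) {
      throw new IllegalArgumentException("nodes and indices must have the same length");
    }
    List<String> keptNodes = new ArrayList<>();
    byte[] keptIndices = new byte[indices.length];
    int kept = 0;
    for (int i = 0; i < nodes.length; i++) {
      if (knownNodes.contains(nodes[i])) {
        keptNodes.add(nodes[i]);
        // Copy the index together with its node so they cannot drift apart.
        keptIndices[kept++] = indices[i];
      }
    }
    return new Filtered(keptNodes, Arrays.copyOf(keptIndices, kept));
  }

  public static void main(String[] args) {
    // After decommissioning, indices[1] lives on node j, which is not a known node.
    String[] nodes = {"a", "j", "c", "d", "e", "f", "g", "h", "i"};
    byte[] indices = {0, 1, 2, 3, 4, 5, 6, 7, 8};
    Set<String> known =
        new HashSet<>(Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "i"));

    Filtered result = filter(nodes, indices, known);
    System.out.println("indices:" + Arrays.toString(result.indices));
    System.out.println("block.location:" + result.locations);
  }
}
```

With the example from the description, this prints indices:[0, 2, 3, 4, 5, 6, 7, 8] and block.location:[a, c, d, e, f, g, h, i], which is the aligned state the proposed solution asks for.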



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16333) fix balancer bug when transfer an EC block

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16333:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> fix balancer bug when transfer an EC block
> --
>
> Key: HDFS-16333
> URL: https://issues.apache.org/jira/browse/HDFS-16333
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: image-2021-11-18-17-25-13-089.png, 
> image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HDFS-16333) fix balancer bug when transfer an EC block

2021-12-09 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16333:

Fix Version/s: 3.2.4
   3.3.3

> fix balancer bug when transfer an EC block
> --
>
> Key: HDFS-16333
> URL: https://issues.apache.org/jira/browse/HDFS-16333
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: image-2021-11-18-17-25-13-089.png, 
> image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (HDFS-16333) fix balancer bug when transfer an EC block

2021-12-08 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16333:

Fix Version/s: (was: 3.2.4)
   (was: 3.3.3)

> fix balancer bug when transfer an EC block
> --
>
> Key: HDFS-16333
> URL: https://issues.apache.org/jira/browse/HDFS-16333
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2021-11-18-17-25-13-089.png, 
> image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HDFS-16333) fix balancer bug when transfer an EC block

2021-11-18 Thread qinyuren (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qinyuren updated HDFS-16333:

Description: 
We set the EC policy to (6+3), and some nodes were being decommissioned while
the balancer was running.

While the balancer was running, we found many error logs like the following.

!image-2021-11-18-17-25-13-089.png|width=858,height=135!

Node A tries to transfer an EC block to node B, but the block is not on
node A. The fsck command shows the block status as follows.

!image-2021-11-18-17-25-50-556.png|width=607,height=189!

In the Dispatcher.getBlockList function:

!image-2021-11-18-17-28-03-155.png!

Assume that the locations of an EC block in storageGroupMap look like this:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
node:[a, b, c, d, e, f, g, h, i]

After the decommission operation, the internal block at indices[1] was moved
to another node:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
node:[a, j, c, d, e, f, g, h, i]

The location of indices[1] changed from node b to node j.

When the balancer gets the block locations, it checks each of them against
storageGroupMap. If a node is not found in storageGroupMap, it is not added
to the block locations.

In this case, node j is not added to the block locations, but the indices
are not updated.

As a result, the block locations may look like this:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
block.location:[a, c, d, e, f, g, h, i]

The nodes no longer match their indices.

Solution:

Update the indices so that they match the remaining nodes:

indices:[0, 2, 3, 4, 5, 6, 7, 8]
block.location:[a, c, d, e, f, g, h, i]



> fix balancer bug when transfer an EC block
> --
>
> Key: HDFS-16333
> URL: https://issues.apache.org/jira/browse/HDFS-16333
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-11-18-17-25-13-089.png, 
> image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>

[jira] [Updated] (HDFS-16333) fix balancer bug when transfer an EC block

2021-11-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16333:
--
Labels: pull-request-available  (was: )

> fix balancer bug when transfer an EC block
> --
>
> Key: HDFS-16333
> URL: https://issues.apache.org/jira/browse/HDFS-16333
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-11-18-17-25-13-089.png, 
> image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HDFS-16333) fix balancer bug when transfer an EC block

2021-11-18 Thread qinyuren (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qinyuren updated HDFS-16333:

Description: 
We set the EC policy to (6+3), and some nodes were being decommissioned while
the balancer was running.

While the balancer was running, we found many error logs like the following.

!image-2021-11-18-17-25-13-089.png|width=858,height=135!

Node A tries to transfer an EC block to node B, but the block is not on
node A. The fsck command shows the block status as follows.

!image-2021-11-18-17-25-50-556.png|width=607,height=189!

Assume that the locations of an EC block look like this:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
node:[a, b, c, d, e, f, g, h, i]

After the decommission operation, the internal block at indices[1] was moved
to another node:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
node:[a, j, c, d, e, f, g, h, i]

The location of indices[1] changed from node b to node j.

In the Dispatcher.getBlockList function:

!image-2021-11-18-17-28-03-155.png!

If a node is not found in storageGroupMap, it is not added to
block.locations, but the indices are not updated.

As a result, the block locations may look like this:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
block.location:[a, c, d, e, f, g, h, i]

The nodes no longer match their indices.



> fix balancer bug when transfer an EC block
> --
>
> Key: HDFS-16333
> URL: https://issues.apache.org/jira/browse/HDFS-16333
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: qinyuren
>Priority: Major
> Attachments: image-2021-11-18-17-25-13-089.png, 
> image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png
>
>






[jira] [Updated] (HDFS-16333) fix balancer bug when transfer an EC block

2021-11-18 Thread qinyuren (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qinyuren updated HDFS-16333:

Description: 
We set the EC policy to (6+3), and some nodes were being decommissioned while
the balancer was running.

While the balancer was running, we found many error logs like the following.

!image-2021-11-18-17-25-13-089.png|width=858,height=135!

Node A tries to transfer an EC block to node B, but the block is not on
node A. The fsck command shows the block status as follows.

!image-2021-11-18-17-25-50-556.png|width=607,height=189!

Assume that the locations of an EC block look like this:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
node:[a, b, c, d, e, f, g, h, i]

After the decommission operation, the internal block at indices[1] was moved
to another node:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
node:[a, j, c, d, e, f, g, h, i]

So the location of indices[1] changed from node b to node j.

In the Dispatcher.getBlockList function:

!image-2021-11-18-17-28-03-155.png!

If a node is not found in storageGroupMap, it is not added to
block.locations, but the indices are not updated.

As a result, the block locations may look like this:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
block.location:[a, c, d, e, f, g, h, i]

The nodes no longer match their indices.



> fix balancer bug when transfer an EC block
> --
>
> Key: HDFS-16333
> URL: https://issues.apache.org/jira/browse/HDFS-16333
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: qinyuren
>Priority: Major
> Attachments: image-2021-11-18-17-25-13-089.png, 
> image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png
>
>






[jira] [Updated] (HDFS-16333) fix balancer bug when transfer an EC block

2021-11-18 Thread qinyuren (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qinyuren updated HDFS-16333:

Description: 
We set the EC policy to (6+3), and some nodes were being decommissioned while
the balancer was running.

While the balancer was running, we found many error logs like the following.

!image-2021-11-18-17-25-13-089.png|width=1348,height=212!

Node A tries to transfer an EC block to node B, but the block is not on
node A. The fsck command shows the block status as follows.

!image-2021-11-18-17-25-50-556.png|width=855,height=266!

Assume that the locations of an EC block look like this:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
node:[a, b, c, d, e, f, g, h, i]

After the decommission operation, the internal block at indices[1] was moved
to another node:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
node:[a, j, c, d, e, f, g, h, i]

So the location of indices[1] changed from node b to node j.

In the Dispatcher.getBlockList function:

!image-2021-11-18-17-28-03-155.png!

If a node is not found in storageGroupMap, it is not added to
block.locations, but the indices are not updated.

As a result, the block locations may look like this:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
block.location:[a, c, d, e, f, g, h, i]

The nodes no longer match their indices.

  was:
We set the EC policy to (6+3), and some nodes were being decommissioned while
the balancer was running.

While the balancer was running, we found many error logs like the following.

!image-2021-11-18-17-25-13-089.png!

Node A tries to transfer an EC block to node B, but the block is not on
node A. The fsck command shows the block status as follows.

!image-2021-11-18-17-25-50-556.png!

Assume that the locations of an EC block look like this:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
node:[a, b, c, d, e, f, g, h, i]

After the decommission operation, the internal block at indices[1] was moved
to another node:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
node:[a, j, c, d, e, f, g, h, i]

So the location of indices[1] changed from node b to node j.

In the Dispatcher.getBlockList function:

!image-2021-11-18-17-28-03-155.png!

If a node is not found in storageGroupMap, it is not added to
block.locations, but the indices are not updated.

As a result, the block locations may look like this:

indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
block.location:[a, c, d, e, f, g, h, i]

The nodes no longer match their indices.


> fix balancer bug when transfer an EC block
> --
>
> Key: HDFS-16333
> URL: https://issues.apache.org/jira/browse/HDFS-16333
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: qinyuren
>Priority: Major
> Attachments: image-2021-11-18-17-25-13-089.png, 
> image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png
>
>


