[ https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021553#comment-14021553 ]
Binglin Chang commented on HDFS-6159:
-------------------------------------

The test error message:

Rebalancing expected avg utilization to become 0.16, but on datanode 127.0.0.1:55468 it remains at 0.02 after more than 40000 msec.

but from the balancer log:
{noformat}
2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
{noformat}
We can see that 8 blocks (800 bytes) were moved to 127.0.0.1:55468, so the utilization of this datanode should be 0.16 (800/5000).

But at the same time, those blocks are deleted by the block manager:
{noformat}
2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set
2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set
2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set
2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set
2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set
2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set
2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set
2014-06-06 18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008]
2014-06-06 18:16:02,423 INFO BlockStateChange
(BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021]
{noformat}
In the end, only block blk_1073741828_1004 is left on 127.0.0.1:55468, so the final utilization is 0.02 (100/5000). Those blocks were newly moved by the balancer and should not be invalidated by the block manager. Most likely some block invalidation logic in BlockManager.java is broken? Looking at the svn log, there have been some recent changes to block invalidation in BlockManager (HDFS-6424, HDFS-6362). Perhaps [~jingzhao] or [~arpitagarwal] can help look at this?

> TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block
> missing after balancer success
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6159
>                 URL: https://issues.apache.org/jira/browse/HDFS-6159
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.3.0
>            Reporter: Chen He
>            Assignee: Chen He
>             Fix For: 3.0.0, 2.5.0
>
>         Attachments: HDFS-6159-v2.patch, HDFS-6159-v2.patch, HDFS-6159.patch, logs.txt
>
>
> TestBalancerWithNodeGroup.testBalancerWithNodeGroup reports a false failure
> if data block(s) are lost after the balancer successfully finishes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
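For reference, the expected and observed utilizations quoted in the comment follow from simple arithmetic over the block sizes in the logs. A minimal sketch (not Hadoop code; class and method names here are hypothetical, and the 5000-byte per-datanode capacity is the test's configured value):

```java
// Sketch of the utilization arithmetic behind the test expectation.
// Utilization of a datanode = bytes used / datanode capacity.
public class UtilizationSketch {
    static double utilization(long usedBytes, long capacityBytes) {
        return (double) usedBytes / capacityBytes;
    }

    public static void main(String[] args) {
        long capacity = 5000;  // per-datanode capacity configured in the test
        long blockSize = 100;  // each moved block is 100 bytes per the logs

        // Balancer moved 8 blocks to 127.0.0.1:55468: the test expects 0.16.
        double expected = utilization(8 * blockSize, capacity);
        // After the block manager invalidated 7 of them, 1 block remains: 0.02.
        double observed = utilization(1 * blockSize, capacity);

        System.out.println(expected);
        System.out.println(observed);
    }
}
```

This is why the test times out: it polls for the target datanode to reach the expected average utilization, but the invalidations keep the node pinned at 0.02.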