[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003467#comment-14003467 ]

Hudson commented on HDFS-6250:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1780 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1780/])
HDFS-6250. Fix test failed in TestBalancerWithNodeGroup.testBalancerWithRackLocality (Contributed by Binglin Chang and Chen He) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595145)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithNodeGroup.java

> TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
>
> Key: HDFS-6250
> URL: https://issues.apache.org/jira/browse/HDFS-6250
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kihwal Lee
> Assignee: Binglin Chang
> Fix For: 2.5.0
>
> Attachments: HDFS-6250-v2.patch, HDFS-6250-v3.patch, HDFS-6250.patch, test_log.txt
>
> It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/
> {panel}
> java.lang.AssertionError: expected:<1800> but was:<1810>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:147)
> at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup.testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253)
> {panel}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003290#comment-14003290 ]

Hudson commented on HDFS-6250:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1754 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1754/])
HDFS-6250. Fix test failed in TestBalancerWithNodeGroup.testBalancerWithRackLocality (Contributed by Binglin Chang and Chen He) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595145)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithNodeGroup.java
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003255#comment-14003255 ]

Hudson commented on HDFS-6250:

FAILURE: Integrated in Hadoop-Yarn-trunk #562 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/562/])
HDFS-6250. Fix test failed in TestBalancerWithNodeGroup.testBalancerWithRackLocality (Contributed by Binglin Chang and Chen He) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595145)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithNodeGroup.java
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002935#comment-14002935 ]

Hudson commented on HDFS-6250:

SUCCESS: Integrated in Hadoop-trunk-Commit #5606 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5606/])
HDFS-6250. Fix test failed in TestBalancerWithNodeGroup.testBalancerWithRackLocality (Contributed by Binglin Chang and Chen He) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595145)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithNodeGroup.java
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999715#comment-13999715 ]

Junping Du commented on HDFS-6250:

Thanks [~airbots] for the verification work! I will commit the patch soon.
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998756#comment-13998756 ]

Chen He commented on HDFS-6250:

Hi [~djp], [~decster],

Thank you for the reply. You are right. There are 3 matchers in the Balancer: NODE_GROUP, SAME_RACK, and ANY_OTHER. If we give the SAME_RACK matcher to the balancer, it will not touch node0 even if it is highly over-utilized. It makes sense. Please check it in.
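The three matchers named in this comment can be illustrated with a minimal sketch. Everything below is a simplified, hypothetical model and not the Balancer's actual classes: the `Location` type, the rack and node-group strings, and the `match` signature are assumptions made for illustration only.

```java
/** Hypothetical, simplified model of the three locality matchers named in the thread. */
final class MatcherSketch {

    /** A datanode location; the rack and node-group strings are illustrative. */
    static final class Location {
        final String rack;
        final String nodeGroup;

        Location(String rack, String nodeGroup) {
            this.rack = rack;
            this.nodeGroup = nodeGroup;
        }
    }

    /** Each matcher decides whether a source and a target may be paired. */
    enum Matcher {
        NODE_GROUP {
            @Override
            boolean match(Location source, Location target) {
                return source.rack.equals(target.rack)
                    && source.nodeGroup.equals(target.nodeGroup);
            }
        },
        SAME_RACK {
            @Override
            boolean match(Location source, Location target) {
                return source.rack.equals(target.rack);
            }
        },
        ANY_OTHER {
            @Override
            boolean match(Location source, Location target) {
                return true; // no locality restriction at all
            }
        };

        abstract boolean match(Location source, Location target);
    }

    public static void main(String[] args) {
        // Assumed layout: node0 sits on rack0, the newly added node on rack1.
        Location node0 = new Location("/rack0", "/nodegroup0");
        Location newNode = new Location("/rack1", "/nodegroup2");
        for (Matcher m : Matcher.values()) {
            System.out.println(m + " pairs node0 with the new node: " + m.match(node0, newNode));
        }
    }
}
```

Under these assumptions only ANY_OTHER would pair node0 with a node on the other rack, which is consistent with the observation that a SAME_RACK matcher leaves node0 untouched.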
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998417#comment-13998417 ]

Binglin Chang commented on HDFS-6250:

Hi [~airbots], what Junping points out is correct: the blocks of the test file should never move to rack1, whether balancer.id is large or small. This is for reliability. If a replica in rack0 moved to rack1, all the replicas of that block would be in rack1, and if rack1 went down we would get a missing block.
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997759#comment-13997759 ]

Chen He commented on HDFS-6250:

Hi [~djp], such a block will not be considered a good candidate in that method. However, in this test case no block on rack0 is a good candidate. Should the balancer keep node0 over-utilized forever (assuming the over-utilization is caused by the balancer.id file)? There should be a priority order among the block-moving policies. I will double-check it today.
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998419#comment-13998419 ]

Binglin Chang commented on HDFS-6250:

The problem with adding a datanode on both rack0 and rack1 is that you can't verify the absence of cross-rack block movement by checking datanode DFS usage. Some blocks may move from rack0 to rack1 and others from rack1 to rack0; after this, the total usage of each rack may not change.
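The point about rack usage totals can be made concrete with a small worked example. This is only a sketch: the rack names, the starting usage of 1800 per rack, and the 10-byte block size are illustrative values, not figures taken from the test.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative arithmetic: opposite cross-rack moves cancel out in per-rack totals. */
final class RackUsageSketch {

    /** Moves {@code bytes} of usage from one rack's total to another's. */
    static void move(Map<String, Long> usedByRack, String from, String to, long bytes) {
        usedByRack.merge(from, -bytes, Long::sum);
        usedByRack.merge(to, bytes, Long::sum);
    }

    /** Starts from equal usage, then performs one move in each direction. */
    static Map<String, Long> afterSwap() {
        Map<String, Long> used = new LinkedHashMap<>();
        used.put("/rack0", 1800L); // illustrative starting usage
        used.put("/rack1", 1800L);
        move(used, "/rack0", "/rack1", 10L); // a block crosses from rack0 to rack1
        move(used, "/rack1", "/rack0", 10L); // another crosses from rack1 to rack0
        return used;
    }

    public static void main(String[] args) {
        // Two cross-rack moves happened, yet both totals equal their starting values.
        System.out.println(afterSwap());
    }
}
```

Because both totals end where they started, an assertion on per-rack usage alone would pass even though two blocks crossed racks.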
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998472#comment-13998472 ]

Junping Du commented on HDFS-6250:

bq. Should the balancer keep node0 always over-utilized (assume it is caused by balancer.id file)?

Yes, that's true. The balancer can end up (after ~5 iterations) with the cluster still unbalanced when no block can be moved for reliability reasons. The trade-off we assume here is that data reliability should always get first priority. One purpose of the test we designed here is to verify this. Does that make sense?
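The "give up after about five iterations" behavior described in this comment can be sketched as a loop that counts consecutive iterations in which nothing moved. This is a hedged illustration, not the Balancer's code: the constant name, the limit of exactly 5, and the array of per-iteration byte counts are assumptions based only on the "~5 iterations" wording.

```java
/** Sketch of a balancer loop that stops after several consecutive idle iterations. */
final class IdleIterationSketch {

    /** Assumed limit, taken from the "~5 iterations" remark in the thread. */
    static final int MAX_IDLE_ITERATIONS = 5;

    /**
     * Returns how many iterations run before the loop gives up, given the bytes
     * moved in each iteration. An iteration that moves nothing counts as idle;
     * any progress resets the idle counter.
     */
    static int iterationsUntilGiveUp(long[] bytesMovedPerIteration) {
        int idle = 0;
        int iterations = 0;
        for (long moved : bytesMovedPerIteration) {
            iterations++;
            idle = (moved == 0) ? idle + 1 : 0;
            if (idle >= MAX_IDLE_ITERATIONS) {
                break; // cluster may still be unbalanced, but reliability rules block every move
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        long[] nothingMovable = {0, 0, 0, 0, 0, 0, 0};
        System.out.println("iterations run: " + iterationsUntilGiveUp(nothingMovable));
    }
}
```

The design choice modeled here is the one stated in the comment: when every candidate move would hurt reliability, the loop exits with the cluster unbalanced rather than forcing a move.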
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997412#comment-13997412 ]

Junping Du commented on HDFS-6250:

Thanks [~decster] for the update on this. [~airbots], I will commit it if you are also fine with the patch here.
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997684#comment-13997684 ]

Junping Du commented on HDFS-6250:

Thanks for the review and comments, [~airbots]!

The good case you proposed above, if it is the current behavior, would be seen as a bug in the balancer, because the prerequisite for the balancer moving blocks is that it does not hurt data reliability. It shouldn't move replicas of balancer.id across rack0 and rack1, as that would reduce the block's rack count to 1, which affects the block's reliability and is inconsistent with the replica placement policy. Actually, we have the code below to prevent this case:
{code}
...
 * 3. doing the move does not reduce the number of racks that the block has
 */
private boolean isGoodBlockCandidate(Source source,
{code}
Would you double-check the behavior of the balancer in the case you described above? If it does behave that way, we should file a separate JIRA to fix the Balancer. What do you think?
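Condition 3 in the quoted comment ("doing the move does not reduce the number of racks that the block has") can be sketched in isolation. The helper below is hypothetical and is not the body of `isGoodBlockCandidate`; it models replicas simply as a list of rack names, which is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Hypothetical check for "the move does not reduce the number of racks". */
final class RackCountSketch {

    /**
     * Returns true if moving one replica from {@code sourceRack} to
     * {@code targetRack} leaves the block on fewer distinct racks than before.
     */
    static boolean reducesRackCount(List<String> replicaRacks, String sourceRack, String targetRack) {
        int before = new HashSet<>(replicaRacks).size();
        List<String> after = new ArrayList<>(replicaRacks);
        after.remove(sourceRack); // drops one replica located on the source rack
        after.add(targetRack);    // adds the replica at its new location
        return new HashSet<>(after).size() < before;
    }

    public static void main(String[] args) {
        // Assumed layout from the discussion: one replica on rack0, one on rack1.
        List<String> replicas = Arrays.asList("/rack0", "/rack1");
        System.out.println("rack0 -> rack1 reduces racks: "
            + reducesRackCount(replicas, "/rack0", "/rack1"));
        System.out.println("rack1 -> rack1 reduces racks: "
            + reducesRackCount(replicas, "/rack1", "/rack1"));
    }
}
```

With one replica on each rack, moving the rack0 replica to rack1 collapses the block onto a single rack, so a check of this kind would reject it, while a move within rack1 keeps both racks covered.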
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997583#comment-13997583 ]

Chen He commented on HDFS-6250:

Hi [~djp], I think it is fine for now; [~decster]'s patch can resolve most of the problems. There is a corner case in which this patch may fail.

A unit test exists to verify the correctness of the source code. This test case wants to check whether the balancer can keep rack locality if there are no over-utilized nodes. This patch may report a false negative under the following condition:

If the balancer.id file is really large and makes both node0 and node1 (the two nodes originally in the minicluster before new nodes are added) over-utilized, the balancer will move data blocks from rack0 to rack1, because the balancer's purpose is to get rid of over-utilized and under-utilized nodes by moving blocks. This patch will report a failure. However, the balancer is not malfunctioning.

If we bring up two new nodes in the minicluster, one on rack0 and one on rack1, it will be safe.
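The scenario in this comment depends on how datanodes are sorted into over-utilized and under-utilized groups. The sketch below is an assumption-laden illustration, not the Balancer's implementation: it assumes equal node capacities (so the cluster average is the plain mean), a threshold of 10 percentage points, and the boundary rules shown in the comments. The utilization values 30, 30 and 0 are chosen to mirror the first balancer log lines quoted in this thread.

```java
/** Sketch: classifying datanodes by utilization relative to the cluster average. */
final class UtilizationSketch {

    enum Category { OVER_UTILIZED, ABOVE_AVERAGE, BELOW_AVERAGE, UNDER_UTILIZED }

    /** Boundary handling here is an assumption made for this illustration. */
    static Category classify(double utilization, double average, double threshold) {
        if (utilization > average + threshold) {
            return Category.OVER_UTILIZED;
        }
        if (utilization > average) {
            return Category.ABOVE_AVERAGE;
        }
        if (utilization >= average - threshold) {
            return Category.BELOW_AVERAGE;
        }
        return Category.UNDER_UTILIZED;
    }

    /** Plain mean; valid as a cluster average only if all capacities are equal. */
    static double average(double[] utilizations) {
        double sum = 0.0;
        for (double u : utilizations) {
            sum += u;
        }
        return sum / utilizations.length;
    }

    public static void main(String[] args) {
        double[] utilizations = {30.0, 30.0, 0.0}; // percent used per datanode
        double threshold = 10.0;                   // assumed threshold in percentage points
        double avg = average(utilizations);
        for (double u : utilizations) {
            System.out.println(u + " -> " + classify(u, avg, threshold));
        }
    }
}
```

Under these assumptions the average is 20, so the two nodes at 30 fall in the above-average group rather than the over-utilized one, and the empty node is under-utilized, which agrees with the counts in the quoted log. A slightly larger file on node0 or node1 would tip them over the boundary, which is the sensitivity the comment is worried about.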
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997229#comment-13997229 ]

Binglin Chang commented on HDFS-6250:

Hi Junping,
I ran TestBalancerWithNodeGroup.testBalancerWithRackLocality 50 times: no failure or timeout, average running time 5.2s (before it was 20s).
I ran TestBalancerWithNodeGroup.testBalancerWithNodeGroup 50 times: no failure or timeout, average running time 10.2s (before it was 17s).
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996207#comment-13996207 ]

Junping Du commented on HDFS-6250:

Hi [~decster], patch LGTM. Would you run 50 iterations and update us with your results here? Thx!
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990925#comment-13990925 ]

Hadoop QA commented on HDFS-6250:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12643526/HDFS-6250-v3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6834//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6834//console

This message is automatically generated.
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990778#comment-13990778 ]

Hadoop QA commented on HDFS-6250:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12643526/HDFS-6250-v3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6831//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6831//console

This message is automatically generated.
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990619#comment-13990619 ]

Junping Du commented on HDFS-6250:

The new patch sounds like a good direction to me. Thanks [~decster] and [~airbots] for the effort here. Kicking off the Jenkins test manually.
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990491#comment-13990491 ]

Binglin Chang commented on HDFS-6250:

I made a patch to address this jira and HDFS-6159, along with a minor fix in the balancer.id related doc. Changes:
1. Set CAPACITY to 5000 rather than 6000, so it keeps the same ratio to the block size as before and makes DEFAULT_BLOCK_SIZE useful.
2. Change the validate method in testBalancerWithRackLocality so it doesn't depend on the balancer.id file.
3. Fix a doc error about balancer.id.

With the change the test now runs in only about 7 seconds, rather than 20+ seconds.
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990440#comment-13990440 ]

Binglin Chang commented on HDFS-6250:

Hi [~airbots], please see my comments about HDFS-6159:
https://issues.apache.org/jira/browse/HDFS-6159?focusedCommentId=13990433&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13990433
The fixes in HDFS-6159 and in this jira seem to be suboptimal; we may need to reconsider the approach.
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989598#comment-13989598 ] Chen He commented on HDFS-6250: --- Thank you for your comments, [~decster]. I agree with you that the balancer.id file can bring more problems. There are two ways to reduce the side effect of the balancer.id file in this test method: 1) increase the block size to reduce the impact of the balancer.id file (this is what I did); 2) introduce two new nodes, one in rack0 and one in rack1. The failure in this JIRA is caused by the balancer.id file's block (blk_181), which should have been deleted. We have to wait until that block is deleted. I created a sub-task HDFS-6342 to redesign this test method. > TestBalancerWithNodeGroup.testBalancerWithRackLocality fails > > > Key: HDFS-6250 > URL: https://issues.apache.org/jira/browse/HDFS-6250 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Chen He > Attachments: HDFS-6250-v2.patch, HDFS-6250.patch, test_log.txt > > > It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ > {panel} > java.lang.AssertionError: expected:<1800> but was:<1810> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:147) > at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup > .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) > {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989495#comment-13989495 ] Binglin Chang commented on HDFS-6250: -
Thanks for the analysis and patch [~airbots]. The fix makes sense; here are some additional concerns:
bq. HDFS creates a /system/balancer.id file (30B) to track the balancer
It looks like the file contains the hostname, whose size is not fixed. I see you increased the block size and capacity to minimize the impact of the file, but it seems the risk is still there. testBalancerWithRackLocality tests that the balancer does not perform cross-rack block movements in the test scenario. Here are the related balancer logs:
{code}
2014-04-15 18:29:48,649 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: []
2014-04-15 18:29:48,650 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=30.0], Source[127.0.0.1:46174, utilization=30.0]]
2014-04-15 18:29:48,650 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: []
2014-04-15 18:29:48,650 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=0.0]]
2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: []
2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=30.168], Source[127.0.0.1:46174, utilization=30.332]]
2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: []
2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=1.8333]]
2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: []
2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=28.5], Source[127.0.0.1:46174, utilization=30.332]]
2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: []
2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=5.0]]
2014-04-15 18:29:57,898 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: []
2014-04-15 18:29:57,898 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:46174, utilization=30.332], Source[127.0.0.1:54333, utilization=25.332]]
2014-04-15 18:29:57,899 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: []
2014-04-15 18:29:57,899 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=7.667]]
2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: []
2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=22.668], Source[127.0.0.1:46174, utilization=30.332]]
2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: []
2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=10.5]]
2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: []
2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 above-average: [Source[127.0.0.1:46174, utilization=30.332]]
2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 below-average: [BalancerDatanode[127.0.0.1:54333, utilization=19.832], BalancerDatanode[127.0.0.1:48293, utilization=12.0]]
2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 underutilized: []
{code}
I guess the test intended to let /rack0/NODEGROUP0/dn be above-average (<=30%) but not over-utilized (>30%, considering avg utilization=20%), so blocks on rack0 never move to rack1, but another balancer.id file may break the assumption. So there are some problems inherent in the test, not just race conditions or timeouts. We may need to change the test (e.g. file size, utilization rate, validate method) to prevent those corner cases.
> TestBalancerWithNodeGroup.testBalancerWithRackLocality fails > > > Key: HDFS-6250 > URL: https://issues.apache.org/jira/browse/HDFS-6250 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Chen He > Attachments: HD
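The threshold reasoning in the comment above can be sketched as a small classifier. This is a simplified model, not the Balancer's actual code: the class name, the enum, and the 10-point threshold are assumptions (the threshold is inferred from the comment's figures of a 20% average and a 30% boundary), and the inputs are the comment's round numbers rather than values recomputed from the log.

```java
// Hypothetical sketch of the four utilization groups named in the balancer log.
final class UtilizationClassSketch {
    enum Group { OVER_UTILIZED, ABOVE_AVERAGE, BELOW_AVERAGE, UNDER_UTILIZED }

    /** Classifies a node by comparing its utilization (percent) with average +/- threshold. */
    static Group classify(double utilization, double average, double threshold) {
        if (utilization > average + threshold) return Group.OVER_UTILIZED;
        if (utilization > average) return Group.ABOVE_AVERAGE;
        if (utilization >= average - threshold) return Group.BELOW_AVERAGE;
        return Group.UNDER_UTILIZED;
    }

    public static void main(String[] args) {
        double avg = 20.0;       // average utilization assumed in the comment
        double threshold = 10.0; // assumed threshold, giving the 30% boundary
        System.out.println(classify(30.0, avg, threshold));   // exactly on the boundary
        System.out.println(classify(30.332, avg, threshold)); // a few extra bytes past it
        System.out.println(classify(0.0, avg, threshold));    // the newly added empty node
    }
}
```

Under these assumed numbers, a node sitting exactly at 30% is only above-average, while a few extra bytes (for example from a balancer.id block) push it into the over-utilized group, which is the corner case the comment warns about.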
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982484#comment-13982484 ] Hadoop QA commented on HDFS-6250: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642149/HDFS-6250-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6752//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6752//console This message is automatically generated. 
> TestBalancerWithNodeGroup.testBalancerWithRackLocality fails > > > Key: HDFS-6250 > URL: https://issues.apache.org/jira/browse/HDFS-6250 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Chen He > Attachments: HDFS-6250-v2.patch, HDFS-6250.patch, test_log.txt > > > It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ > {panel} > java.lang.AssertionError: expected:<1800> but was:<1810> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:147) > at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup > .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) > {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982428#comment-13982428 ] Chen He commented on HDFS-6250: --- The above test failure is due to a timeout. I propose increasing the testBalancerWithRackLocality timeout to 80 seconds. Patch is attached. > TestBalancerWithNodeGroup.testBalancerWithRackLocality fails > > > Key: HDFS-6250 > URL: https://issues.apache.org/jira/browse/HDFS-6250 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Chen He > Attachments: HDFS-6250.patch, test_log.txt > > > It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ > {panel} > java.lang.AssertionError: expected:<1800> but was:<1810> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:147) > at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup > .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) > {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981845#comment-13981845 ] Hadoop QA commented on HDFS-6250: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642034/HDFS-6250.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6743//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6743//console This message is automatically generated. 
> TestBalancerWithNodeGroup.testBalancerWithRackLocality fails > > > Key: HDFS-6250 > URL: https://issues.apache.org/jira/browse/HDFS-6250 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Chen He > Attachments: HDFS-6250.patch, test_log.txt > > > It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ > {panel} > java.lang.AssertionError: expected:<1800> but was:<1810> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:147) > at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup > .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) > {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981701#comment-13981701 ] Chen He commented on HDFS-6250: --- I think this undeleted data block is caused by a race condition: testBalancerWithRackLocality uses the getDfsUsed() method to count HDFS usage. When the test code checks datanode usage, getDfsUsed() has not been updated yet. Here is the whole process: 1) ReplicationMonitor runs every 5 seconds to check for data blocks that need to be updated and sends them to the NN. We need 10 seconds to guarantee that any information reaches the NN; 2) Once the NN gets the operations, it sends them to the corresponding DNs. It needs 6 seconds (2 heartbeat intervals, assuming a heartbeat interval of 3 seconds) to let the DNs finish these operations and report updates back to the NN; 3) If we allow for other processing time, it will be safe to read the latest information after 20 seconds. Based on the analysis above, I attached my patch. > TestBalancerWithNodeGroup.testBalancerWithRackLocality fails > > > Key: HDFS-6250 > URL: https://issues.apache.org/jira/browse/HDFS-6250 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Chen He > Attachments: test_log.txt > > > It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ > {panel} > java.lang.AssertionError: expected:<1800> but was:<1810> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:147) > at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup > .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) > {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
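The timing budget in the comment above, and the idea of waiting for usage to settle instead of checking it once, might be sketched as follows. This is not the attached patch: UsageWaitSketch, budgetSeconds, and waitForUsage are hypothetical names, the 4-second slack is only what remains of the stated 20 seconds after the 10 + 6 seconds the comment itemizes, and the usage supplier stands in for whatever the test reads via getDfsUsed().

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch: a wait budget plus a poll-until-settled helper.
final class UsageWaitSketch {
    /** Two ReplicationMonitor cycles, two heartbeat intervals, and extra slack, in seconds. */
    static long budgetSeconds(long monitorIntervalSec, long heartbeatSec, long slackSec) {
        return 2 * monitorIntervalSec + 2 * heartbeatSec + slackSec;
    }

    /** Polls the supplier until it returns the expected value or the timeout elapses. */
    static boolean waitForUsage(LongSupplier usedBytes, long expected, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (usedBytes.getAsLong() == expected) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the expected value
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // 5s monitor interval and 3s heartbeat come from the comment; 4s slack is assumed.
        System.out.println(budgetSeconds(5, 3, 4) + " seconds");

        // Simulated usage counter that settles on the third read.
        long[] reads = {0};
        boolean settled = waitForUsage(() -> ++reads[0] >= 3 ? 1800L : 1810L, 1800L, 1000L, 10L);
        System.out.println("settled: " + settled);
    }
}
```

Polling with a deadline lets a test finish as soon as the value settles, whereas a fixed sleep always pays the full worst-case budget.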
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977645#comment-13977645 ] Chen He commented on HDFS-6250: --- In this test case, the program starts a minicluster composed of 2 datanodes. The block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then a new datanode is created and the balancer starts balancing the cluster, taking rack locality into account. However, once the balancer starts, HDFS creates a /system/balancer.id file (30B) to track the balancer. After the balancer succeeds, HDFS should delete the /system/balancer.id file, which is composed of 6 data blocks (2 replicas of each data block). But HDFS failed to delete 1 data block of this /system/balancer.id file. This is why the test reports 1810 instead of 1800: one leftover 10B block. > TestBalancerWithNodeGroup.testBalancerWithRackLocality fails > > > Key: HDFS-6250 > URL: https://issues.apache.org/jira/browse/HDFS-6250 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Chen He > Attachments: test_log.txt > > > It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ > {panel} > java.lang.AssertionError: expected:<1800> but was:<1810> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:147) > at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup > .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) > {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
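The arithmetic behind expected:<1800> but was:<1810> can be checked with a short sketch. The class and method names are hypothetical, and it assumes the figure of 180 counts stored block replicas, which is the reading that matches the expected value of 1800.

```java
// Hypothetical sketch reproducing the numbers quoted in the comment above.
final class BalancerTestArithmetic {
    static final long BLOCK_SIZE = 10;           // bytes per block, from the comment
    static final long DATA_BLOCK_REPLICAS = 180; // assumed to count stored replicas

    /** Bytes the test expects to be in use once balancing has finished. */
    static long expectedUsedBytes() {
        return DATA_BLOCK_REPLICAS * BLOCK_SIZE;
    }

    /** Block replicas needed to store a file of the given size at the given replication. */
    static long replicasFor(long fileBytes, int replication) {
        long blocks = (fileBytes + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
        return blocks * replication;
    }

    /** Bytes reported while some block replicas are still waiting to be deleted. */
    static long observedUsedBytes(long leftoverReplicas) {
        return expectedUsedBytes() + leftoverReplicas * BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(expectedUsedBytes());  // test data only
        System.out.println(replicasFor(30, 2));   // the 30B balancer.id file
        System.out.println(observedUsedBytes(1)); // one balancer.id replica left behind
    }
}
```

With these inputs the three values are 1800, 6, and 1810, matching the assertion failure in the stack trace.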
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971608#comment-13971608 ] Chen He commented on HDFS-6250: --- Hi [~kihwal] Sure, my pleasure. > TestBalancerWithNodeGroup.testBalancerWithRackLocality fails > > > Key: HDFS-6250 > URL: https://issues.apache.org/jira/browse/HDFS-6250 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee > Attachments: test_log.txt > > > It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ > {panel} > java.lang.AssertionError: expected:<1800> but was:<1810> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:147) > at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup > .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) > {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971605#comment-13971605 ] Kihwal Lee commented on HDFS-6250: -- [~airbots] Can you take a look at this failure? > TestBalancerWithNodeGroup.testBalancerWithRackLocality fails > > > Key: HDFS-6250 > URL: https://issues.apache.org/jira/browse/HDFS-6250 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee > Attachments: test_log.txt > > > It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ > {panel} > java.lang.AssertionError: expected:<1800> but was:<1810> > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:147) > at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup > .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) > {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)