[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-08-16 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436301#comment-13436301
 ] 

Aaron T. Myers commented on HDFS-2966:
--

Hey Trevor, please open a new issue for it. Thanks a lot.

> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.2.0-alpha
>
> Attachments: HDFS-2966.patch, HDFS-2966.patch, HDFS-2966.patch, 
> HDFS-2966.patch
>
>
> I've managed to recreate HDFS-540 and HDFS-2434 by the simple technique of 
> running the HDFS tests on a desktop without enough memory for all the 
> programs trying to run. Things got swapped out and the tests failed as the DN 
> heartbeats didn't come in on time.
> The tests both rely on {{waitForDeletion()}} to block the tests until the 
> delete operation has completed, but all it does is sleep for the same number 
> of seconds as there are datanodes. This is too brittle: it may work on a 
> lightly-loaded system, but not on a system under heavy load where it is 
> taking longer to replicate than expected.
> Immediate fix: double or triple the sleep time?
> Better fix: have the thread block until all the DN heartbeats have finished.
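
The "better fix" in the description above can be sketched as a poll loop that replaces the fixed sleep. This is only an illustration and is not taken from the attached patches: the class name PollingWait and the method waitFor are hypothetical, and the condition to poll (for example, "the delete has completed") is supplied by the caller.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of Hadoop): poll a condition instead of sleeping blindly. */
final class PollingWait {
  private PollingWait() {}

  /**
   * Checks the condition, then re-checks it every intervalMillis until it
   * holds or timeoutMillis has elapsed.
   *
   * @return true if the condition held before the deadline, false on timeout
   *         or if the waiting thread was interrupted
   */
  static boolean waitFor(BooleanSupplier condition, long intervalMillis, long timeoutMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt for the caller and stop waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

On a lightly loaded machine the loop returns as soon as the condition holds, so the test does not get slower; under heavy load it keeps waiting up to the timeout instead of failing after an arbitrary sleep.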

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-08-14 Thread Trevor Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434660#comment-13434660
 ] 

Trevor Robinson commented on HDFS-2966:
---

I hit this on my last 2 builds of trunk. I don't see an open issue on it, so 
should I create a new issue or reopen this one (or HDFS-540)?

{noformat}
testCorruptBlock(org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics)  Time elapsed: 7.082 sec  <<< FAILURE!
java.lang.AssertionError: Bad value for metric PendingReplicationBlocks expected:<0> but was:<1>
    at org.junit.Assert.fail(Assert.java:91)
    at org.junit.Assert.failNotEquals(Assert.java:645)
    at org.junit.Assert.assertEquals(Assert.java:126)
    at org.junit.Assert.assertEquals(Assert.java:470)
    at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:191)
    at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:186)
{noformat}






[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-03-10 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226857#comment-13226857
 ] 

Hudson commented on HDFS-2966:
--

Integrated in Hadoop-Mapreduce-trunk #1015 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1015/])
HDFS-2966 (Revision 1298820)

 Result = SUCCESS
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298820
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java


> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 0.24.0
>
> Attachments: HDFS-2966.patch, HDFS-2966.patch, HDFS-2966.patch, 
> HDFS-2966.patch




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-03-10 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226836#comment-13226836
 ] 

Hudson commented on HDFS-2966:
--

Integrated in Hadoop-Hdfs-trunk #980 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/980/])
HDFS-2966 (Revision 1298820)

 Result = SUCCESS
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298820
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java


> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 0.24.0
>
> Attachments: HDFS-2966.patch, HDFS-2966.patch, HDFS-2966.patch, 
> HDFS-2966.patch




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-03-09 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226082#comment-13226082
 ] 

Hudson commented on HDFS-2966:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #1865 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1865/])
HDFS-2966 (Revision 1298820)

 Result = ABORTED
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298820
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java


> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 0.24.0
>
> Attachments: HDFS-2966.patch, HDFS-2966.patch, HDFS-2966.patch, 
> HDFS-2966.patch




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-03-09 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226061#comment-13226061
 ] 

Hudson commented on HDFS-2966:
--

Integrated in Hadoop-Hdfs-trunk-Commit #1931 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1931/])
HDFS-2966 (Revision 1298820)

 Result = SUCCESS
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298820
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java


> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 0.24.0
>
> Attachments: HDFS-2966.patch, HDFS-2966.patch, HDFS-2966.patch, 
> HDFS-2966.patch




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-03-09 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226062#comment-13226062
 ] 

Hudson commented on HDFS-2966:
--

Integrated in Hadoop-Common-trunk-Commit #1856 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1856/])
HDFS-2966 (Revision 1298820)

 Result = SUCCESS
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298820
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java


> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 0.24.0
>
> Attachments: HDFS-2966.patch, HDFS-2966.patch, HDFS-2966.patch, 
> HDFS-2966.patch




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-03-05 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222611#comment-13222611
 ] 

Hadoop QA commented on HDFS-2966:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12517086/HDFS-2966.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1952//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1952//console

This message is automatically generated.

> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 0.24.0, 0.23.2
>
> Attachments: HDFS-2966.patch, HDFS-2966.patch, HDFS-2966.patch, 
> HDFS-2966.patch




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-03-05 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222543#comment-13222543
 ] 

Aaron T. Myers commented on HDFS-2966:
--

+1, pending Jenkins.

> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 0.24.0, 0.23.2
>
> Attachments: HDFS-2966.patch, HDFS-2966.patch, HDFS-2966.patch, 
> HDFS-2966.patch




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-02-24 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216230#comment-13216230
 ] 

Aaron T. Myers commented on HDFS-2966:
--

bq. Rather than rename the method, I made the metric scope a parameter, so it 
can block for other metrics too.

I still find the name a tad misleading, since the method is still a little 
DN-specific: for example, it retries based on the number of DNs and sleeps a 
multiple of DFS_REPLICATION_INTERVAL. But I don't feel super strongly about 
this point. Take it or leave it.

+1, the patch looks good to me, assuming you don't want to address the above 
feedback.
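
For readers without the patch at hand, here is a rough sketch of the shape being discussed: a wait method that takes the metric source as a parameter, yet still sizes its retry budget from the number of DNs and sleeps a multiple of the replication interval. It is not the code from the attached patch. The class GaugeWaiter, the constant RETRIES_PER_DATANODE, and the reader function standing in for a real metrics lookup are all assumptions made for illustration.

```java
import java.util.function.ToLongBiFunction;

/** Hypothetical sketch (not the patch): wait for a gauge in a caller-chosen metric source. */
final class GaugeWaiter {
  /** Assumed retry budget per datanode; the real value is not given in this thread. */
  private static final int RETRIES_PER_DATANODE = 10;

  /** Stand-in for a metrics lookup: (source name, gauge name) -> current value. */
  private final ToLongBiFunction<String, String> reader;
  private final int retries;
  private final long sleepMillis;

  GaugeWaiter(ToLongBiFunction<String, String> reader,
              int datanodeCount,
              long replicationIntervalMillis) {
    this.reader = reader;
    // The DN-specific parts noted in the comment above: the retry count
    // scales with the number of datanodes, and the sleep is tied to the
    // replication interval.
    this.retries = Math.max(1, datanodeCount) * RETRIES_PER_DATANODE;
    this.sleepMillis = replicationIntervalMillis;
  }

  /**
   * Polls the named gauge until it equals expected or the retries run out.
   *
   * @return the last value observed, so the caller can assert on it
   */
  long waitForGaugeValue(String source, String gauge, long expected) {
    long actual = reader.applyAsLong(source, gauge);
    for (int i = 0; i < retries && actual != expected; i++) {
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
      actual = reader.applyAsLong(source, gauge);
    }
    return actual;
  }
}
```

Returning the last observed value rather than a boolean lets the test keep a normal equality assertion afterwards, so a timeout still reports "expected X but was Y".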

> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 0.24.0, 0.23.2
>
> Attachments: HDFS-2966.patch, HDFS-2966.patch, HDFS-2966.patch




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-02-24 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216223#comment-13216223
 ] 

Hadoop QA commented on HDFS-2966:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12516000/HDFS-2966.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1910//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1910//console

This message is automatically generated.

> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 0.24.0, 0.23.2
>
> Attachments: HDFS-2966.patch, HDFS-2966.patch, HDFS-2966.patch




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-02-23 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214891#comment-13214891
 ] 

Suresh Srinivas commented on HDFS-2966:
---

Steve, please see HDFS-3002 - comment 
https://issues.apache.org/jira/browse/HDFS-3002?focusedCommentId=13214890&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13214890
 

> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 0.24.0, 0.23.2
>
> Attachments: HDFS-2966.patch, HDFS-2966.patch




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-02-20 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212194#comment-13212194
 ] 

Aaron T. Myers commented on HDFS-2966:
--

We can forgo #3 for this JIRA, if you want. I may switch the whole test to use 
separate mini clusters in HDFS-2978 anyway, which would render this point moot.

> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HDFS-2966.patch




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-02-20 Thread Steve Loughran (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212091#comment-13212091
 ] 

Steve Loughran commented on HDFS-2966:
--

Point 1: I was probably trying to minimize changes, to make sure I didn't 
accidentally remove an assertion. I will pull it.

Point 2: seems good.

Point 3: when I stripped it down too much, the following tests failed, because 
state propagates from one test to the next. Moving to separate mini clusters 
could fix that, but it would make things slower. That leaves adding poll loops 
before each test case to ensure the FS is in a stable state before each test 
run. That's a harder thing to do, and maybe something that can be put off 
unless this patch doesn't solve most people's problems.
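
One way the "poll loops before each test case" idea could look is sketched below, purely as an illustration. The helper polls a set of gauges until all of them read a baseline value and reports which ones never settled. The class StableStateCheck, the method awaitBaseline, and the gauge suppliers are hypothetical; in a JUnit test it would be called from a setup method before each case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch: block until the file system's gauges are back at a known baseline. */
final class StableStateCheck {
  private StableStateCheck() {}

  /**
   * Polls every gauge up to maxPolls times, sleeping sleepMillis between
   * polls, until all of them read baseline.
   *
   * @return names of the gauges that were still off baseline on the last
   *         poll; empty if the state is stable
   */
  static List<String> awaitBaseline(Map<String, LongSupplier> gauges,
                                    long baseline,
                                    int maxPolls,
                                    long sleepMillis) {
    int polls = Math.max(1, maxPolls); // always check at least once
    List<String> unsettled = new ArrayList<>();
    for (int poll = 0; poll < polls; poll++) {
      unsettled.clear();
      for (Map.Entry<String, LongSupplier> gauge : gauges.entrySet()) {
        if (gauge.getValue().getAsLong() != baseline) {
          unsettled.add(gauge.getKey());
        }
      }
      if (unsettled.isEmpty() || poll == polls - 1) {
        break;
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return unsettled;
  }
}
```

Reporting the unsettled gauge names makes a failed precondition point at the state leaked by the previous test rather than at the test that happens to run next.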


> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HDFS-2966.patch




[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-02-20 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212024#comment-13212024
 ] 

Aaron T. Myers commented on HDFS-2966:
--

Hey Steve, patch looks pretty good. I agree this issue could stand to be 
improved. I've also seen spurious failures in this test.

A few comments:

# In the spot where you call waitForGaugeValue for "FilesTotal", you also 
unnecessarily assert the value for FilesTotal.
# The name "waitForGaugeValue" seems a little misleading, since it's not a 
general-purpose method for gauges, but rather somewhat specific to gauges that 
are a function of _DN metrics_. Perhaps consider renaming it to something like 
"waitForDnMetricValue"?
# Though the patch manages to get rid of the most race-prone sleeps (DN 
metrics), I don't think it will necessarily completely solve the issue for very 
slow VMs, since there are still several calls to updateMetrics. Can we 
completely remove the need for updateMetrics in this test, by waiting for a 
specific value as you've done here?

> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HDFS-2966.patch
>
>
> I've managed to recreate HDFS-540 and HDFS-2434 by the simple technique of 
> running the HDFS tests on a desktop with out enough memory for all the 
> programs trying to run. Things got swapped out and the tests failed as the DN 
> heartbeats didn't come in on time.
> the tests both rely on {{waitForDeletion()}} to block the tests until the 
> delete operation has completed, but all it does is sleep for the same number 
> of seconds as there are datanodes. This is too brittle -it may work on a 
> lightly-loaded system, but not on a system under heavy load where it is 
> taking longer to replicate than expect.
> Immediate fix: double, triple, the sleep time?
> Better fix: have the thread block until all the DN heartbeats have finished.





[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-02-18 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210998#comment-13210998
 ] 

Hadoop QA commented on HDFS-2966:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515090/HDFS-2966.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1885//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1885//console

This message is automatically generated.

> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HDFS-2966.patch
>
>
> I've managed to recreate HDFS-540 and HDFS-2434 by the simple technique of 
> running the HDFS tests on a desktop with out enough memory for all the 
> programs trying to run. Things got swapped out and the tests failed as the DN 
> heartbeats didn't come in on time.
> the tests both rely on {{waitForDeletion()}} to block the tests until the 
> delete operation has completed, but all it does is sleep for the same number 
> of seconds as there are datanodes. This is too brittle -it may work on a 
> lightly-loaded system, but not on a system under heavy load where it is 
> taking longer to replicate than expect.
> Immediate fix: double, triple, the sleep time?
> Better fix: have the thread block until all the DN heartbeats have finished.





[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-02-18 Thread Steve Loughran (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210972#comment-13210972
 ] 

Steve Loughran commented on HDFS-2966:
--

The patch applies a sleep (for the same delay as before), then poll+sleep for 
a limited number of retries before giving up.

Provided the assertions are failing on exit from the wait cycle, rather than 
on the initial state of the tests, this polling should significantly reduce the 
probability of failure under load. 
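
The sleep-then-poll pattern described above can be sketched roughly as 
follows. This is an illustration only, not the code in the attached patch: the 
class {{SleepThenPoll}}, the method {{waitForValue}} and its parameters are 
hypothetical names chosen for this example.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper, not the code from the attached patch. */
final class SleepThenPoll {
    private SleepThenPoll() {}

    /**
     * Sleeps for initialDelayMs (the original fixed delay), reads the gauge,
     * then re-reads it up to maxRetries times, sleeping retryIntervalMs
     * between reads, until it equals expected.
     * Returns the last observed value so the caller can assert on it.
     */
    static long waitForValue(LongSupplier gauge, long expected,
                             long initialDelayMs, int maxRetries,
                             long retryIntervalMs) {
        pause(initialDelayMs);
        long observed = gauge.getAsLong();
        for (int retry = 0; retry < maxRetries && observed != expected; retry++) {
            pause(retryIntervalMs);
            observed = gauge.getAsLong();
        }
        return observed;
    }

    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

On a machine that is not under load the first read after the initial delay 
should already match, so no retries are spent; only a slow machine pays for the 
extra polls.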

To reassure anyone worried that this polling would slow down the test run on a 
machine not under load, this does not appear to be the case.

Before
{code}
Running org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 47.042 sec
{code}

On two runs after making changes, the elapsed times were 46.709 sec and 42.995 
sec. This implies it takes about the same time. 



> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HDFS-2966.patch
>
>
> I've managed to recreate HDFS-540 and HDFS-2434 by the simple technique of 
> running the HDFS tests on a desktop with out enough memory for all the 
> programs trying to run. Things got swapped out and the tests failed as the DN 
> heartbeats didn't come in on time.
> the tests both rely on {{waitForDeletion()}} to block the tests until the 
> delete operation has completed, but all it does is sleep for the same number 
> of seconds as there are datanodes. This is too brittle -it may work on a 
> lightly-loaded system, but not on a system under heavy load where it is 
> taking longer to replicate than expect.
> Immediate fix: double, triple, the sleep time?
> Better fix: have the thread block until all the DN heartbeats have finished.





[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-02-18 Thread Steve Loughran (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210969#comment-13210969
 ] 

Steve Loughran commented on HDFS-2966:
--

A problem here is that the tests are not independent: the FS events from the 
previous test can still be trickling through the filesystem when the next test 
starts running.

A simple poll/sleep cycle actually behaves worse, because it can exit too 
early: the state of the previous test is still there and the more recent 
changes aren't yet in the metrics. 

A sleep followed by a poll cycle would appear to be a better process, though it 
may still have problems under load that only moving to per-test mini HDFS 
clusters would fix.

> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> I've managed to recreate HDFS-540 and HDFS-2434 by the simple technique of 
> running the HDFS tests on a desktop with out enough memory for all the 
> programs trying to run. Things got swapped out and the tests failed as the DN 
> heartbeats didn't come in on time.
> the tests both rely on {{waitForDeletion()}} to block the tests until the 
> delete operation has completed, but all it does is sleep for the same number 
> of seconds as there are datanodes. This is too brittle -it may work on a 
> lightly-loaded system, but not on a system under heavy load where it is 
> taking longer to replicate than expect.
> Immediate fix: double, triple, the sleep time?
> Better fix: have the thread block until all the DN heartbeats have finished.





[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-02-18 Thread Steve Loughran (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210888#comment-13210888
 ] 

Steve Loughran commented on HDFS-2966:
--

My planned solution to this is to move from sleep-then-assert to 
sleep-poll-repeat over a longer period of time. If the state is reached sooner, 
the test finishes earlier, but if the machine is overloaded the test will 
stretch out. This may make it faster on some machines, as well as less brittle 
on others.
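
To make the sleep-poll-repeat idea concrete, here is a minimal sketch of such 
a loop with an overall time limit. The names {{PollUntil}} and {{await}} are 
hypothetical and chosen for this example only; the actual helper in the test 
may look quite different.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper illustrating sleep-poll-repeat; not the real test code. */
final class PollUntil {
    private PollUntil() {}

    /**
     * Repeats "sleep intervalMs, then check the condition" until the
     * condition holds or timeoutMs has elapsed.
     * Returns true if the condition was met, false on timeout.
     */
    static boolean await(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            pause(intervalMs);
            if (condition.getAsBoolean()) {
                return true;  // state reached: the test can finish early
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // overloaded machine: give up after the limit
            }
        }
    }

    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while polling", e);
        }
    }
}
```

With a short interval and a generous timeout, a lightly loaded machine pays 
only for the polls it needs, while an overloaded one gets more time before the 
assertion is made.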


> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Priority: Minor
>
> I've managed to recreate HDFS-540 and HDFS-2434 by the simple technique of 
> running the HDFS tests on a desktop with out enough memory for all the 
> programs trying to run. Things got swapped out and the tests failed as the DN 
> heartbeats didn't come in on time.
> the tests both rely on {{waitForDeletion()}} to block the tests until the 
> delete operation has completed, but all it does is sleep for the same number 
> of seconds as there are datanodes. This is too brittle -it may work on a 
> lightly-loaded system, but not on a system under heavy load where it is 
> taking longer to replicate than expect.
> Immediate fix: double, triple, the sleep time?
> Better fix: have the thread block until all the DN heartbeats have finished.





[jira] [Commented] (HDFS-2966) TestNameNodeMetrics tests can fail under load

2012-02-17 Thread Steve Loughran (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210421#comment-13210421
 ] 

Steve Loughran commented on HDFS-2966:
--

stack trace against trunk

{code}
---
Test set: org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
---
Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 53.526 sec <<< FAILURE!
testCorruptBlock(org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics)  Time elapsed: 8.968 sec  <<< FAILURE!
java.lang.AssertionError: Bad value for metric PendingReplicationBlocks expected:<0> but was:<1>
    at org.junit.Assert.fail(Assert.java:91)
    at org.junit.Assert.failNotEquals(Assert.java:645)
    at org.junit.Assert.assertEquals(Assert.java:126)
    at org.junit.Assert.assertEquals(Assert.java:470)
    at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185)
    at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:185)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
    at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
    at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
    at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:81)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
{code}

> TestNameNodeMetrics tests can fail under load
> -
>
> Key: HDFS-2966
> URL: https://issues.apache.org/jira/browse/HDFS-2966
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
> Environment: OS/X running intellij IDEA, firefox, winxp in a 
> virtualbox.
>Reporter: Steve Loughran
>Priority: Minor
>
> I've managed to recreate HDFS-540 and HDFS-2434 by the simple technique of 
> running the HDFS tests on a desktop with out enough memory for all the 
> programs trying to run. Things got swapped out and the tests failed as the DN 
> heartbeats didn't come in on time.
> the tests both rely on {{waitForDeletion()}} to block the tests until the 
> delete operation has completed, but all it does is sleep for the same number 
> of seconds as there are datanodes. This is too brittle -it may work o