[ https://issues.apache.org/jira/browse/HDFS-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212091#comment-13212091 ]

Steve Loughran commented on HDFS-2966:
--------------------------------------

point 1: I was probably trying to keep the change small, just to make sure I 
didn't accidentally remove an assertion. I will pull it.

point 2: seems good.

point 3: when I stripped it down too much, the subsequent tests failed: state 
propagates from one test to the next. Moving to separate mini clusters could 
fix that, but it would make things slower. That leaves adding poll loops before 
each test case to ensure the FS is in a stable state before each test runs. 
That's a harder thing to do, and maybe something that can be put off unless 
this patch doesn't solve most people's problems.
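A minimal sketch of such a pre-test poll loop, assuming "stable state" can be approximated as "some counter read from the cluster stops changing for a few consecutive polls". StableStateWait, waitForStable and the simulated metric are hypothetical names for illustration, not part of the patch or of Hadoop's test utilities; a real test would supply a reading from the mini cluster instead.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical pre-test check: wait until a counter stops changing. */
final class StableStateWait {
  private StableStateWait() {}

  /**
   * Returns true once {@code metric} has returned the same value on
   * {@code requiredStableReads} consecutive polls, or false on timeout
   * or interruption.
   */
  static boolean waitForStable(LongSupplier metric, int requiredStableReads,
                               long intervalMillis, long timeoutMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    long last = metric.getAsLong();
    int stableReads = 1;
    while (stableReads < requiredStableReads) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // never settled within the timeout
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
      long current = metric.getAsLong();
      if (current == last) {
        stableReads++;
      } else {
        last = current; // value moved: restart the count
        stableReads = 1;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Simulated metric: changes on the first few reads, then settles at 5.
    long[] reads = {0};
    LongSupplier metric = () -> Math.min(++reads[0], 5);
    boolean stable = waitForStable(metric, 3, 10, 5_000);
    System.out.println("stable=" + stable + " after " + reads[0] + " reads");
  }
}
```

In a JUnit 4 test this would be called from a @Before method, failing the test if it returns false, so each case starts from a settled cluster without paying for a fresh mini cluster.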

                
> TestNameNodeMetrics tests can fail under load
> ---------------------------------------------
>
>                 Key: HDFS-2966
>                 URL: https://issues.apache.org/jira/browse/HDFS-2966
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.24.0
>         Environment: OS X running IntelliJ IDEA, Firefox, and WinXP in a 
> VirtualBox VM.
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HDFS-2966.patch
>
>
> I've managed to recreate HDFS-540 and HDFS-2434 by the simple technique of 
> running the HDFS tests on a desktop without enough memory for all the 
> programs trying to run. Things got swapped out, and the tests failed because 
> the DN heartbeats didn't come in on time.
> The tests both rely on {{waitForDeletion()}} to block the tests until the 
> delete operation has completed, but all it does is sleep for as many 
> seconds as there are datanodes. This is too brittle: it may work on a 
> lightly loaded system, but not on a heavily loaded one, where replication 
> takes longer than expected.
> Immediate fix: double or triple the sleep time?
> Better fix: have the thread block until all the DN heartbeats have finished.
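
The "better fix" amounts to replacing the fixed sleep with a bounded poll on a condition. A minimal sketch, assuming the condition can be expressed as a boolean check (here a simulated count of blocks still pending deletion that slow heartbeats drain); PollingWait, waitFor and WaitForDeletionDemo are hypothetical names, not the code in HDFS-2966.patch.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: poll a condition instead of sleeping a fixed time. */
final class PollingWait {
  private PollingWait() {}

  /**
   * Polls {@code condition} every {@code intervalMillis} until it returns true
   * or {@code timeoutMillis} elapses. Returns true if the condition was met.
   */
  static boolean waitFor(BooleanSupplier condition, long intervalMillis, long timeoutMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out: the caller decides whether to fail the test
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
    }
  }
}

final class WaitForDeletionDemo {
  public static void main(String[] args) throws Exception {
    // Stand-in for "blocks still pending deletion"; a real test would read
    // this from the cluster (assumption, not shown here).
    AtomicInteger pendingDeletions = new AtomicInteger(3);
    Thread slowDataNodes = new Thread(() -> {
      try {
        while (pendingDeletions.get() > 0) {
          Thread.sleep(200); // simulate slow heartbeats on a loaded machine
          pendingDeletions.decrementAndGet();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    slowDataNodes.start();
    boolean done = PollingWait.waitFor(() -> pendingDeletions.get() == 0, 100, 30_000);
    slowDataNodes.join();
    System.out.println(done ? "deletions complete" : "timed out");
  }
}
```

Unlike a fixed sleep, the wait returns as soon as the condition holds on a lightly loaded machine, and the generous timeout only matters when the machine is swapping.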

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
