[ https://issues.apache.org/jira/browse/HDFS-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366411#comment-15366411 ]

Xiao Chen commented on HDFS-10336:
----------------------------------

Thanks [~linyiqun].
Please update the timeout of {{testUnknownDatanodeSimple}} to be the same, 
since it's calling the same underlying method.

Also, looking closer, {{testBalancerWithKeytabs}} has a 5-minute timeout, not
30s. Are you sure the test passes after bumping it to 10 minutes? I would prefer
that the test finish sooner if possible. I'm also okay with deferring this to a
separate JIRA.
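One way to keep the two tests in step is to have both reference a single shared constant. The following is a minimal Java sketch, assuming JUnit 4's {{@Test(timeout = ...)}}. The class name {{BalancerTestTimeouts}} and its constants are hypothetical. An annotation value must be a compile-time constant, so the timeout is written as arithmetic ({{10 * 60 * 1000L}}) rather than computed with {{TimeUnit}}.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical holder for timeouts shared by tests that drive the same
 * underlying helper (testUnknownDatanode). Not part of the actual patch.
 */
class BalancerTestTimeouts {
  // 10 minutes (600000 ms), the value discussed in this review.
  // Written as a constant expression so it can be used in an annotation.
  static final long UNKNOWN_DATANODE_TIMEOUT_MS = 10 * 60 * 1000L;

  // The timeout reported in the failure log: 300000 ms, i.e. 5 minutes.
  static final long ORIGINAL_KEYTAB_TIMEOUT_MS = 300_000L;

  // Intended usage in a JUnit 4 test class (JUnit is not compiled here):
  //   @Test(timeout = BalancerTestTimeouts.UNKNOWN_DATANODE_TIMEOUT_MS)
  //   public void testBalancerWithKeytabs() throws Exception { ... }
  //
  //   @Test(timeout = BalancerTestTimeouts.UNKNOWN_DATANODE_TIMEOUT_MS)
  //   public void testUnknownDatanodeSimple() throws Exception { ... }

  /** Converts a millisecond timeout to whole minutes for sanity checks. */
  static long toMinutes(long millis) {
    return TimeUnit.MILLISECONDS.toMinutes(millis);
  }
}
```

With a single constant, a later change to the timeout (or a decision to shorten it again) only has to be made in one place.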

> TestBalancer failing intermittently because of not resetting 
> UserGroupInformation completely
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10336
>                 URL: https://issues.apache.org/jira/browse/HDFS-10336
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10336.001.patch, HDFS-10336.002.patch
>
>
> The unit test {{TestBalancer}} fails intermittently. 
> I investigated and found two main causes.
> * 1st. The test {{TestBalancer#testBalancerWithKeytabs}} timed out.
> {code}
> org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  Time elapsed: 300.41 sec  <<< ERROR!
> java.lang.Exception: test timed out after 300000 milliseconds
>       at java.lang.Thread.sleep(Native Method)
>       at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
>       at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
>       at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
>       at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
>       at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
>       at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
>       at org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
> {code}
> * 2nd. The test {{TestBalancer#testBalancerWithKeytabs}} sometimes does not 
> completely reset the {{UGI}} in its finally block. This causes other unit 
> tests to throw {{IOException}}, like this:
> {code}
> testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  Time elapsed: 0 sec  <<< ERROR!
> java.io.IOException: Running in secure mode, but config doesn't have a keytab
>       at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
> {code}
> More than one test can be affected by this. We should add the following 
> line before the existing {{UGI}} reset operation to avoid the potential 
> exception:
> {code}
> UserGroupInformation.reset();
> {code}
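To illustrate why an incomplete reset in a {{finally}} block can leak into later tests in the same JVM, here is a minimal Java sketch. {{UgiResetSketch}}, its fields, and the principal string are hypothetical stand-ins, not Hadoop's {{UserGroupInformation}} API. The sketch assumes that the existing cleanup only resets the configuration and leaves a cached login user behind. It models the general hazard of static state and does not reproduce the exact {{IOException}} above.

```java
/**
 * Hypothetical stand-in for process-wide security state shared by every test
 * in one JVM. This is NOT Hadoop's UserGroupInformation; it only models the
 * idea that updating the configuration does not clear a cached login user.
 */
class UgiResetSketch {
  // Hypothetical static fields, shared by all tests in the JVM.
  static boolean securityEnabled = false;
  static String loginUser = null;

  /** Partial reset: recomputes the security flag but keeps the cached login user. */
  static void setConfiguration(boolean secure) {
    securityEnabled = secure;
  }

  /** Full reset: clears every cached field. */
  static void reset() {
    securityEnabled = false;
    loginUser = null;
  }

  /**
   * Simulates a secure test followed by its finally-block cleanup, then
   * returns the login user that the next test in the same JVM would observe.
   */
  static String loginUserSeenByNextTest(boolean callFullReset) {
    try {
      // The secure test switches security on and caches a principal.
      setConfiguration(true);
      loginUser = "hdfs/localhost@EXAMPLE.COM"; // hypothetical principal
    } finally {
      if (callFullReset) {
        reset(); // analogous to the proposed UserGroupInformation.reset() line
      }
      setConfiguration(false); // assumed pre-existing, partial cleanup
    }
    return loginUser;
  }
}
```

Calling {{loginUserSeenByNextTest(false)}} returns the stale principal, while {{loginUserSeenByNextTest(true)}} returns {{null}}. Doing the full reset first and then restoring the configuration keeps the cleanup order independent of which fields a given test happened to touch.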



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
