[ 
https://issues.apache.org/jira/browse/HDFS-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860849#comment-15860849
 ] 

Manjunath Anand commented on HDFS-11397:
----------------------------------------

The intermittent timeout occurs only when, due to a race condition, the 
executor service is shut down before LatchedCheckable#check is called. In that 
case the interrupt issued by the shutdown method is ignored, and the 
FutureTask#get in the line {code} assertFalse(olf.get().get()); {code} never 
sees that the executor service has been shut down and also misses the 
interrupt. As a result, FutureTask#get blocks indefinitely when no wait 
timeout is specified. 

To fix this, before calling shutdown on the executor service we first make sure 
that the thread has actually entered LatchedCheckable#check, using a 
CountDownLatch named startedSignal (refer to the patch), and only then proceed 
with the shutdown. This ensures that the interrupt issued by the shutdown on 
the executor service is never missed.
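
The ordering described above can be sketched roughly as follows. This is not 
the patch itself, only a minimal standalone illustration using 
java.util.concurrent. The names StartedSignalSketch, BlockingCheck, 
neverReleased and shutdownAfterStart are hypothetical, and BlockingCheck is 
only a stand-in for LatchedCheckable, whose real code is in the test.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StartedSignalSketch {

  /** Hypothetical stand-in for a check that blocks until it is interrupted. */
  static final class BlockingCheck implements Callable<Boolean> {
    private final CountDownLatch startedSignal;
    // Never counted down, so await() can only end via an interrupt.
    private final CountDownLatch neverReleased = new CountDownLatch(1);

    BlockingCheck(CountDownLatch startedSignal) {
      this.startedSignal = startedSignal;
    }

    @Override
    public Boolean call() throws InterruptedException {
      startedSignal.countDown();  // tell the test thread that the check is running
      neverReleased.await();      // throws InterruptedException once interrupted
      return true;
    }
  }

  /** Returns true if the worker was interrupted by the shutdown. */
  static boolean shutdownAfterStart() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch startedSignal = new CountDownLatch(1);
    Future<Boolean> future = executor.submit(new BlockingCheck(startedSignal));
    try {
      startedSignal.await();   // wait until the worker has entered call()
      executor.shutdownNow();  // only now interrupt the running worker
      // Bounded wait (an assumption of this sketch) so a bug cannot hang forever.
      future.get(10, TimeUnit.SECONDS);
      return false;            // the check completed without being interrupted
    } catch (ExecutionException e) {
      return e.getCause() instanceof InterruptedException;
    } catch (InterruptedException | TimeoutException e) {
      throw new IllegalStateException("sketch did not behave as expected", e);
    } finally {
      executor.shutdownNow();  // harmless if the executor is already shut down
    }
  }

  public static void main(String[] args) {
    System.out.println("worker interrupted: " + shutdownAfterStart());
  }
}
```

Because the worker counts down startedSignal from inside call(), the test 
thread cannot reach shutdownNow() before the task is running, so the interrupt 
always has a running task to land on.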

Attached is the patch for trunk. Please let me know if you have any suggestions.

> TestThrottledAsyncChecker#testCancellation timed out
> ----------------------------------------------------
>
>                 Key: HDFS-11397
>                 URL: https://issues.apache.org/jira/browse/HDFS-11397
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, test
>    Affects Versions: 3.0.0-alpha3
>            Reporter: John Zhuge
>            Assignee: Manjunath Anand
>            Priority: Minor
>         Attachments: HDFS-11397-V01.patch
>
>
> {noformat}
> Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 61.153 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker
> testCancellation(org.apache.hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker)  Time elapsed: 60.033 sec  <<< ERROR!
> java.lang.Exception: test timed out after 60000 milliseconds
>       at sun.misc.Unsafe.park(Native Method)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>       at org.apache.hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker.testCancellation(TestThrottledAsyncChecker.java:114)
> {noformat}


