[ https://issues.apache.org/jira/browse/HADOOP-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164829#comment-13164829 ]

Jason Lowe commented on HADOOP-7888:
------------------------------------

Before I submitted the patch I stepped through the code with the debugger to 
make sure the two threads were synchronizing within invokeMethod(), so I have 
high confidence it should address the issue.  RE: failure rate, I saw it only 
very intermittently when running the test directly via Eclipse, but on my 
machine I can reproduce it nearly 100% of the time (e.g., 34 out of 35 tries) 
with this build command:

mvn test -Dtest=TestFailoverProxy

With the patch I've never seen it fail, either from within Eclipse or from the 
build command, even when run in a test loop.
                
> TestFailoverProxy fails intermittently on trunk
> -----------------------------------------------
>
>                 Key: HADOOP-7888
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7888
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.24.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: hadoop-7888.patch
>
>
> TestFailoverProxy can fail intermittently with the failures occurring in 
> testConcurrentMethodFailures().  The test has a race condition where the two 
> threads may be sequentially invoking the unreliable interface rather than 
> concurrently.  Currently the proxy provider's getProxy() method contains the 
> thread synchronization to enforce a concurrent invocation, but examining the 
> source of RetryInvocationHandler.invoke() shows that the call to getProxy() 
> during failover is too late to enforce a truly concurrent invocation.
> For this particular test, one thread could race ahead and block on the 
> CountDownLatch in getProxy() before the other thread even enters 
> RetryInvocationHandler.invoke().  If that happens the second thread will 
> cache the newly updated value for proxyProviderFailoverCount, since the 
> failover has mostly been processed by the original thread.  Therefore the 
> second thread ends up assuming no other thread is present, performs a 
> failover, and the test fails because two failovers occurred instead of one.
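The race described above can be modeled in a few lines of Java. This is a minimal, hypothetical sketch, not the code in RetryInvocationHandler or TestFailoverProxy: the Provider class, invoke(), run() and the failover-count check are simplified stand-ins assumed for illustration. Each invocation caches the provider's failover count on entry and, after its simulated failure, fails over only if the count is still unchanged. When the two callers effectively run sequentially, the second one caches the already-updated count and performs a second failover. With a CountDownLatch rendezvous placed inside the invocation itself (analogous to synchronizing within invokeMethod() rather than in getProxy()), both callers cache the count before either fails over, so only one failover occurs.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical, simplified model of the failover race; not Hadoop's actual code.
class FailoverRaceSketch {

    // Stand-in for a proxy provider that tracks how many failovers occurred.
    static final class Provider {
        private int failoverCount = 0;

        synchronized int getFailoverCount() {
            return failoverCount;
        }

        // Fail over only if no other caller has failed over since this
        // caller cached the count on entry.
        synchronized void maybeFailover(int cachedCount) {
            if (failoverCount == cachedCount) {
                failoverCount++;
            }
        }
    }

    // One invocation: cache the count, hit a (simulated) failure, then ask
    // for a failover. A non-null latch makes both callers cache the count
    // before either one fails over, enforcing a truly concurrent invocation.
    static void invoke(Provider provider, CountDownLatch rendezvous) {
        int cached = provider.getFailoverCount();
        if (rendezvous != null) {
            rendezvous.countDown();
            try {
                rendezvous.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        // The call on the unreliable interface would fail here.
        provider.maybeFailover(cached);
    }

    // Runs two callers and returns the number of failovers performed.
    static int run(boolean concurrent) {
        Provider provider = new Provider();
        CountDownLatch rendezvous = concurrent ? new CountDownLatch(2) : null;
        Thread t1 = new Thread(() -> invoke(provider, rendezvous));
        Thread t2 = new Thread(() -> invoke(provider, rendezvous));
        try {
            t1.start();
            if (!concurrent) {
                // Sequential case: the second caller starts only after the
                // first has failed over, so it caches the updated count.
                t1.join();
            }
            t2.start();
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return provider.getFailoverCount();
    }

    public static void main(String[] args) {
        System.out.println("sequential failovers: " + run(false));
        System.out.println("concurrent failovers: " + run(true));
    }
}
```

Running main() reports 2 failovers for the sequential case and 1 for the concurrent case. In this model, a latch that only takes effect after one caller has already failed over is equivalent to the sequential case, which is the failure mode the description attributes to synchronizing in getProxy().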

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
