[ https://issues.apache.org/jira/browse/HADOOP-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164829#comment-13164829 ]
Jason Lowe commented on HADOOP-7888:
------------------------------------
Before submitting the patch I stepped through the code in the debugger and
confirmed that the two threads synchronize within invokeMethod(), so I have
high confidence that it addresses the issue. Regarding the failure rate: the
test failed only very intermittently when I ran it directly from Eclipse, but
on my machine it fails nearly 100% of the time (e.g. 34 out of 35 tries) with
this build command:
mvn test -Dtest=TestFailoverProxy
With the patch I have not seen it fail, either from within Eclipse or from the
build command, even when run in a test loop.
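
A minimal sketch of the idea of synchronizing inside the invoked method, so
that neither caller can return (or fail) before the other has entered the same
call. This is not the attached patch: SyncingService, runTwoCallers() and the
5-second timeout are hypothetical names and values chosen for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ConcurrentInvokeSketch {
    /** Hypothetical stand-in for the unreliable implementation the test calls. */
    static class SyncingService {
        // Opens only once two callers are inside invokeMethod().
        private final CountDownLatch bothInside = new CountDownLatch(2);

        /** Returns true if the other caller arrived before the timeout. */
        boolean invokeMethod() throws InterruptedException {
            bothInside.countDown();
            return bothInside.await(5, TimeUnit.SECONDS);
        }
    }

    /** Runs two callers and reports whether both were inside the call together. */
    static boolean runTwoCallers() throws Exception {
        SyncingService service = new SyncingService();
        Callable<Boolean> call = service::invokeMethod;
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Boolean> first = pool.submit(call);
            Future<Boolean> second = pool.submit(call);
            return first.get() && second.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("both callers overlapped: " + runTwoCallers());
    }
}
```

Because the latch is awaited inside the call itself, a caller that races ahead
simply waits there until the second caller arrives, rather than proceeding to
failure handling on its own.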
> TestFailoverProxy fails intermittently on trunk
> -----------------------------------------------
>
> Key: HADOOP-7888
> URL: https://issues.apache.org/jira/browse/HADOOP-7888
> Project: Hadoop Common
> Issue Type: Bug
> Components: test
> Affects Versions: 0.24.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: hadoop-7888.patch
>
>
> TestFailoverProxy can fail intermittently with the failures occurring in
> testConcurrentMethodFailures(). The test has a race condition where the two
> threads may be sequentially invoking the unreliable interface rather than
> concurrently. Currently the proxy provider's getProxy() method contains the
> thread synchronization to enforce a concurrent invocation, but examining the
> source of RetryInvocationHandler.invoke() shows that the call to getProxy()
> during failover is too late to enforce a truly concurrent invocation.
> For this particular test, one thread could race ahead and block on the
> CountDownLatch in getProxy() before the other thread even enters
> RetryInvocationHandler.invoke(). If that happens, the second thread will
> cache the newly updated value for proxyProviderFailoverCount, since the
> failover has mostly been processed by the original thread. Therefore the
> second thread ends up assuming no other thread is present, performs a
> failover, and the test fails because two failovers occurred instead of one.
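
The effect of the two orderings on the failover count can be sketched with a
simplified, single-threaded model. It is an illustration only and not the
RetryInvocationHandler source: the Handler class, its fields and both
interleaving methods are hypothetical, and it assumes that each caller's
invocation fails and that a caller performs a failover only when the count it
saw at the start of its call is unchanged.

```java
class FailoverCountSketch {
    /** Hypothetical, simplified model of a handler that counts failovers. */
    static class Handler {
        int failoverCount = 0; // failovers performed so far

        /** Value a caller records when it enters the invocation. */
        int snapshot() {
            return failoverCount;
        }

        /** Called when a caller's invocation fails. */
        void onFailure(int countSeenAtStart) {
            // Fail over only if nobody else has done so since this call began.
            if (countSeenAtStart == failoverCount) {
                failoverCount++;
            }
        }
    }

    /** Both callers enter before either one handles its failure. */
    static int concurrentInterleaving() {
        Handler h = new Handler();
        int a = h.snapshot();
        int b = h.snapshot();
        h.onFailure(a); // performs the failover
        h.onFailure(b); // sees the count changed and skips
        return h.failoverCount;
    }

    /** The first caller finishes its failover before the second one enters. */
    static int sequentialInterleaving() {
        Handler h = new Handler();
        int a = h.snapshot();
        h.onFailure(a); // performs the failover
        int b = h.snapshot(); // already sees the updated count
        h.onFailure(b); // counts match, so it fails over again
        return h.failoverCount;
    }

    public static void main(String[] args) {
        System.out.println("concurrent: " + concurrentInterleaving());
        System.out.println("sequential: " + sequentialInterleaving());
    }
}
```

In this model the concurrent ordering ends with one failover and the
sequential ordering with two, which matches the failure described above.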