[ https://issues.apache.org/jira/browse/HADOOP-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164735#comment-13164735 ]
Aaron T. Myers commented on HADOOP-7888:
----------------------------------------
Thanks a lot for tracking down this issue and providing a patch, Jason.
A question for you about testing: given that this test only failed
intermittently before, have you tried running it in a loop with and
without the patch applied to *ensure* that the patch addresses the issue? I
believe it should fix it, but I just want to make sure. Also, can you comment on
how often you observed this spurious test failure without the
patch?
The patch looks good to me. I'll commit it once Jason comments on the above
question.
> TestFailoverProxy fails intermittently on trunk
> -----------------------------------------------
>
> Key: HADOOP-7888
> URL: https://issues.apache.org/jira/browse/HADOOP-7888
> Project: Hadoop Common
> Issue Type: Bug
> Components: test
> Affects Versions: 0.24.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: hadoop-7888.patch
>
>
> TestFailoverProxy can fail intermittently, with the failures occurring in
> testConcurrentMethodFailures(). The test has a race condition: the two
> threads may invoke the unreliable interface sequentially rather than
> concurrently. Currently the thread synchronization that enforces a
> concurrent invocation lives in the proxy provider's getProxy() method, but
> the source of RetryInvocationHandler.invoke() shows that getProxy() is only
> called during failover, which is too late to enforce a truly concurrent
> invocation.
>
> For this particular test, one thread could race ahead and block on the
> CountDownLatch in getProxy() before the other thread even enters
> RetryInvocationHandler.invoke(). If that happens, the second thread will
> cache the already-updated value of proxyProviderFailoverCount, since the
> original thread has already processed most of the failover. The second
> thread therefore assumes no other thread is present, performs its own
> failover, and the test fails because two failovers occurred instead of one.
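To make the ordering described above concrete, here is a hypothetical single-threaded model of the race. It is not Hadoop's RetryInvocationHandler and not necessarily what hadoop-7888.patch does; the class FailoverRaceModel, its Invocation helper, and the method names are invented for illustration. Each invocation caches the failover count on entry, and after a failed call it fails over only if that count is unchanged. Replaying two fixed interleavings shows why the second thread fails over again when it enters after the first thread has already bumped the count.

```java
/**
 * Hypothetical model of the HADOOP-7888 race (illustration only, not Hadoop code).
 * It mimics the ordering "cache failover count, call, fail over only if the
 * count is unchanged" and replays two fixed interleavings of two callers.
 */
public class FailoverRaceModel {
  private int failoverCount = 0;      // stands in for proxyProviderFailoverCount
  private int proxyGeneration = 0;    // which proxy the provider currently hands out
  private int failoversPerformed = 0;

  /** One caller's trip through the simplified invoke() logic. */
  private final class Invocation {
    private int cachedCount;
    private int cachedProxy;

    /** Entry: snapshot the failover count and the current proxy. */
    void enter() {
      cachedCount = failoverCount;
      cachedProxy = proxyGeneration;
    }

    /** In this model, any call made through the original proxy fails. */
    boolean callFailed() {
      return cachedProxy == 0;
    }

    /** Fail over only if no other caller has failed over since enter(). */
    void failoverIfNeeded() {
      if (callFailed() && cachedCount == failoverCount) {
        failoverCount++;
        failoversPerformed++;
        // The new proxy is deliberately not published (proxyGeneration is
        // unchanged): in the test, the thread would now be blocked on the
        // CountDownLatch in getProxy().
      }
    }
  }

  /** Caller A races ahead and is parked in getProxy() before B even enters. */
  static int sequentialSchedule() {
    FailoverRaceModel m = new FailoverRaceModel();
    Invocation a = m.new Invocation();
    Invocation b = m.new Invocation();
    a.enter();
    a.failoverIfNeeded();  // count 0 -> 1, proxy not yet switched
    b.enter();             // B caches count 1 and still sees the old proxy
    b.failoverIfNeeded();  // 1 == 1, so B fails over a second time
    return m.failoversPerformed;
  }

  /** Both callers have entered before either one handles its failure. */
  static int concurrentSchedule() {
    FailoverRaceModel m = new FailoverRaceModel();
    Invocation a = m.new Invocation();
    Invocation b = m.new Invocation();
    a.enter();
    b.enter();             // both cache count 0 and the original proxy
    a.failoverIfNeeded();  // 0 == 0: A fails over, count -> 1
    b.failoverIfNeeded();  // 0 != 1: B sees A's failover and skips its own
    return m.failoversPerformed;
  }

  public static void main(String[] args) {
    System.out.println("failovers (sequential schedule): " + sequentialSchedule());
    System.out.println("failovers (concurrent schedule): " + concurrentSchedule());
  }
}
```

Under these assumptions the sequential schedule yields two failovers and the concurrent one yields one. The model suggests that a rendezvous meant to force concurrency has to hold both callers after they have cached the failover count but before either one handles its failure (for example, inside the unreliable method itself), rather than in getProxy().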