[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

Hudson (JIRA) Tue, 03 Jun 2014 09:02:22 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016808#comment-14016808
 ]


Hudson commented on HADOOP-10630:
---------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1790 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1790/])
HADOOP-10630. Possible race condition in RetryInvocationHandler. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1599366)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java


> Possible race condition in RetryInvocationHandler
> -------------------------------------------------
>
>                 Key: HADOOP-10630
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10630
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>             Fix For: 2.5.0
>
>         Attachments: HADOOP-10630.000.patch
>
>
> In one of our system tests with a NameNode HA setup, we ran 300 threads in 
> LoadGenerator. Although one of the NameNodes was already in the active state 
> and had started serving, one of the client threads still failed all of its 
> retries within a 20-second window. Meanwhile, we saw many instances of the 
> following warning message in the log:
> {noformat}
> WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
> {noformat}
> Checking the code, we found the following in RetryInvocationHandler:
> {code}
>   while (true) {
>       // The number of times this invocation handler has ever been failed over,
>       // before this method invocation attempt. Used to prevent concurrent
>       // failed method invocations from triggering multiple failover attempts.
>       long invocationAttemptFailoverCount;
>       synchronized (proxyProvider) {
>         invocationAttemptFailoverCount = proxyProviderFailoverCount;
>       }
>       ......
>       if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
>             // Make sure that concurrent failed method invocations only cause a
>             // single actual fail over.
>             synchronized (proxyProvider) {
>               if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
>                 proxyProvider.performFailover(currentProxy.proxy);
>                 proxyProviderFailoverCount++;
>                 currentProxy = proxyProvider.getProxy();
>               } else {
>                 LOG.warn("A failover has occurred since the start of this method"
>                     + " invocation attempt.");
>               }
>             }
>             invocationFailoverCount++;
>           }
>      ......
> {code}
> The value of currentProxy is refreshed only by the thread that performs the 
> failover, while it holds the monitor of the proxyProvider. Because 
> "currentProxy" is not volatile, a thread that does not perform the failover 
> (and instead logs the warning message) may fail to see the new value of 
> currentProxy.
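
The guard described above can be modelled in isolation. The following is a minimal sketch, not the Hadoop code and not necessarily what HADOOP-10630.000.patch does: ToyProxyProvider, ToyRetryHandler and FailoverGuardDemo are hypothetical names, the proxy is reduced to a String, and declaring currentProxy volatile is only one assumed way to make the refreshed value visible to threads that skip the failover.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a failover proxy provider; not the Hadoop API. */
class ToyProxyProvider {
    private int index = 0;                      // only touched under the provider's monitor
    final AtomicInteger failovers = new AtomicInteger();

    String getProxy() { return "nn" + index; }

    void performFailover(String current) {
        index = (index + 1) % 2;                // switch between two toy NameNodes
        failovers.incrementAndGet();
    }
}

/** Hypothetical model of the failover-count guard quoted in the description. */
class ToyRetryHandler {
    private final ToyProxyProvider proxyProvider;
    private long proxyProviderFailoverCount = 0;   // guarded by proxyProvider
    // Assumption: volatile, so threads that skip the failover still observe the new proxy.
    private volatile String currentProxy;

    ToyRetryHandler(ToyProxyProvider provider) {
        this.proxyProvider = provider;
        this.currentProxy = provider.getProxy();
    }

    /** Snapshot taken at the start of an invocation attempt. */
    long snapshotFailoverCount() {
        synchronized (proxyProvider) {
            return proxyProviderFailoverCount;
        }
    }

    /** Fails over only if nobody else has since the snapshot; returns true if it did. */
    boolean failoverIfNeeded(long invocationAttemptFailoverCount) {
        synchronized (proxyProvider) {
            if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
                proxyProvider.performFailover(currentProxy);
                proxyProviderFailoverCount++;
                currentProxy = proxyProvider.getProxy();
                return true;
            }
            return false;                       // another thread already failed over
        }
    }

    String currentProxy() { return currentProxy; }
}

class FailoverGuardDemo {
    /** Simulates `threads` invocations failing at once; returns the failovers performed. */
    static int runConcurrentFailures(int threads) {
        ToyProxyProvider provider = new ToyProxyProvider();
        ToyRetryHandler handler = new ToyRetryHandler(provider);
        CountDownLatch allSnapshotsTaken = new CountDownLatch(threads);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                long seen = handler.snapshotFailoverCount();
                allSnapshotsTaken.countDown();
                try {
                    allSnapshotsTaken.await();  // every thread snapshots before any failover
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                handler.failoverIfNeeded(seen);
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return provider.failovers.get();
    }

    public static void main(String[] args) {
        System.out.println("failovers performed: " + runConcurrentFailures(8));
    }
}
```

In this model all eight threads take the same snapshot, so only the first one to enter the synchronized block performs the failover; the others take the branch that corresponds to the logged warning. Reading currentProxy under the proxyProvider monitor at the start of each attempt would be another way to obtain the same visibility.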



--
This message was sent by Atlassian JIRA
(v6.2#6252)
