[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-06-03 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016403#comment-14016403 ]

Hudson commented on HADOOP-10630:
-

FAILURE: Integrated in Hadoop-Yarn-trunk #572 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/572/])
HADOOP-10630. Possible race condition in RetryInvocationHandler. Contributed by 
Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1599366)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java


 Possible race condition in RetryInvocationHandler
 -------------------------------------------------

 Key: HADOOP-10630
 URL: https://issues.apache.org/jira/browse/HADOOP-10630
 Project: Hadoop Common
 Issue Type: Bug
 Reporter: Jing Zhao
 Assignee: Jing Zhao
 Fix For: 2.5.0

 Attachments: HADOOP-10630.000.patch


 In one of our system tests with a NameNode HA setup, we ran 300 threads in
 LoadGenerator. Even though one of the NameNodes was already in the active
 state and had started to serve, we still saw one of the client threads fail
 all of its retries within a 20-second window. Meanwhile, we saw many of the
 following warning messages in the log:
 {noformat}
 WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt.
 {noformat}
 After checking the code, we see the following code in RetryInvocationHandler:
 {code}
 while (true) {
   // The number of times this invocation handler has ever been failed over,
   // before this method invocation attempt. Used to prevent concurrent
   // failed method invocations from triggering multiple failover attempts.
   long invocationAttemptFailoverCount;
   synchronized (proxyProvider) {
     invocationAttemptFailoverCount = proxyProviderFailoverCount;
   }
   ..
   if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
     // Make sure that concurrent failed method invocations only cause a
     // single actual fail over.
     synchronized (proxyProvider) {
       if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
         proxyProvider.performFailover(currentProxy.proxy);
         proxyProviderFailoverCount++;
         currentProxy = proxyProvider.getProxy();
       } else {
         LOG.warn("A failover has occurred since the start of this method"
             + " invocation attempt.");
       }
     }
     invocationFailoverCount++;
   }
   ..
 {code}
 We can see that the value of currentProxy is refreshed only by the thread
 that performs the failover (while holding the monitor of proxyProvider).
 Because currentProxy is not volatile, a thread that does not perform the
 failover (in which case it logs the warning message) may fail to see the new
 value of currentProxy.
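
The visibility concern described above can be illustrated with a small, self-contained sketch. This is not the Hadoop code and not necessarily what HADOOP-10630.000.patch does. The class FailoverGuard and its methods snapshotCount, proxy and failoverIfNeeded are hypothetical names invented for this example. It assumes one possible remedy: declare the proxy field volatile, so that a thread that loses the failover race still reads the proxy published by the winner.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the handler's failover bookkeeping; not Hadoop code. */
class FailoverGuard<T> {
    private final Object lock = new Object();
    // Hypothetical hook: yields the proxy to use after a failover.
    private final Supplier<T> nextProxy;
    // Number of failovers performed so far; guarded by lock.
    private long failoverCount = 0;
    // volatile: a read outside the lock still sees the most recent write.
    private volatile T currentProxy;

    FailoverGuard(T initialProxy, Supplier<T> nextProxy) {
        this.currentProxy = initialProxy;
        this.nextProxy = nextProxy;
    }

    /** Snapshot taken at the start of an invocation attempt. */
    long snapshotCount() {
        synchronized (lock) {
            return failoverCount;
        }
    }

    /** Read the proxy to use for the next attempt. */
    T proxy() {
        return currentProxy;
    }

    /**
     * Fail over only if nobody else has done so since observedCount was taken.
     * Returns true if this caller performed the failover.
     */
    boolean failoverIfNeeded(long observedCount) {
        synchronized (lock) {
            if (observedCount != failoverCount) {
                return false; // another thread already failed over
            }
            currentProxy = nextProxy.get();
            failoverCount++;
            return true;
        }
    }
}
```

A caller would take snapshotCount() before each attempt, call failoverIfNeeded(snapshot) when the retry policy asks for a failover, and then call proxy() again before retrying, whether or not it was the thread that performed the failover.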



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-06-03 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016621#comment-14016621 ]

Hudson commented on HADOOP-10630:
-

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1763 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1763/])
HADOOP-10630. Possible race condition in RetryInvocationHandler. Contributed by 
Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1599366)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java




[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-06-03 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016808#comment-14016808 ]

Hudson commented on HADOOP-10630:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1790 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1790/])
HADOOP-10630. Possible race condition in RetryInvocationHandler. Contributed by 
Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1599366)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java




[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-06-02 Thread Jing Zhao (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015916#comment-14015916 ]

Jing Zhao commented on HADOOP-10630:


It looks like our failover tests ran well with the patch over the weekend. I 
will commit the patch shortly.



[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-06-02 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015940#comment-14015940 ]

Hudson commented on HADOOP-10630:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5644 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5644/])
HADOOP-10630. Possible race condition in RetryInvocationHandler. Contributed by 
Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1599366)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java




[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-05-28 Thread Kihwal Lee (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011352#comment-14011352 ]

Kihwal Lee commented on HADOOP-10630:
-

The patch looks reasonable. Did you have a chance to verify that it fixes the 
issue?



[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-05-28 Thread Jing Zhao (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011364#comment-14011364 ]

Jing Zhao commented on HADOOP-10630:


Not yet. The issue cannot easily be reproduced, since by default the client 
will retry/failover 10 times. I will decrease the retry count and rerun the 
test with and without the patch in the next few days.



[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-05-28 Thread Suresh Srinivas (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011387#comment-14011387 ]

Suresh Srinivas commented on HADOOP-10630:
--

+1 for the patch, once the failover tests pass.



[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-05-27 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010477#comment-14010477 ]

Hadoop QA commented on HADOOP-10630:


{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12647006/HADOOP-10630.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3975//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3975//console

This message is automatically generated.
