[ https://issues.apache.org/jira/browse/HBASE-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220878#comment-14220878 ]
Hadoop QA commented on HBASE-12534:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12682854/HBASE-12534-v1.diff
  against master branch at commit 325cdc0987f8176ac46695f5b0c93b0fc6605ab9.
  ATTACHMENT ID: 12682854

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 3 new or modified tests.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the total number of checkstyle errors

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

    {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:red}-1 core tests{color}.
The patch failed these unit tests:
    org.apache.hadoop.hbase.master.balancer.TestBaseLoadBalancer

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11775//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11775//console

This message is automatically generated.

> Wrong region location cache in client after regions are moved
> -------------------------------------------------------------
>
>                 Key: HBASE-12534
>                 URL: https://issues.apache.org/jira/browse/HBASE-12534
>             Project: HBase
>          Issue Type: Bug
>   Affects Versions: 2.0.0
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Critical
>              Labels: client
>         Attachments: HBASE-12534-0.94-v1.diff, HBASE-12534-v1.diff
>
>
> In our 0.94 hbase cluster, we found that the client kept a wrong region
> location in its cache and did not update it after a region was moved to
> another regionserver.
> The cause is a wrong client config combined with a bug in RpcRetryingCaller
> in the hbase client.
> The rpc configs are as follows:
> {code}
> hbase.rpc.timeout=1000
> hbase.client.pause=200
> hbase.client.operation.timeout=1200
> {code}
> But the client retry number is 3
> {code}
> hbase.client.retries.number=3
> {code}
> Assume a region was on regionserver A and is then moved to regionserver B.
> The client tries to call regionserver A and gets a
> NotServingRegionException. Because the retry number is not 1, the region
> location cache is not cleaned. See: RpcRetryingCaller.java#141 and
> RegionServerCallable.java#127
> {code}
>   @Override
>   public void throwable(Throwable t, boolean retrying) {
>     if (t instanceof SocketTimeoutException ||
>         ....
>     } else if (t instanceof NotServingRegionException && !retrying) {
>       // Purge cache entries for this specific region from hbase:meta cache
>       // since we don't call connect(true) when number of retries is 1.
>       getConnection().deleteCachedRegionLocation(location);
>     }
>   }
> {code}
> But the call does not retry and throws a SocketTimeoutException, because the
> time the call would take is larger than the operation timeout. See
> RpcRetryingCaller.java#152
> {code}
>         expectedSleep = callable.sleep(pause, tries + 1);
>         // If, after the planned sleep, there won't be enough time left, we stop now.
>         long duration = singleCallDuration(expectedSleep);
>         if (duration > callTimeout) {
>           String msg = "callTimeout=" + callTimeout + ", callDuration=" + duration +
>               ": " + callable.getExceptionMessageAdditionalDetail();
>           throw (SocketTimeoutException)(new SocketTimeoutException(msg).initCause(t));
>         }
> {code}
> As a result, the wrong region location is never cleaned up.
> [~lhofhansl]
> In hbase 0.94, MIN_RPC_TIMEOUT in singleCallDuration is 2000 by default,
> which triggers this bug.
> {code}
>   private long singleCallDuration(final long expectedSleep) {
>     return (EnvironmentEdgeManager.currentTimeMillis() - this.globalStartTime)
>       + MIN_RPC_TIMEOUT + expectedSleep;
>   }
> {code}
> But the risk exists in the master code too.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
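
The timing argument in the issue description can be checked with a small stand-alone sketch. It is not HBase code: the class and method names are hypothetical, MIN_RPC_TIMEOUT_MS is the 2000 ms default the description reports for 0.94, the planned sleep is assumed to equal hbase.client.pause, and `retrying` is assumed to be derived as `retries != 1`, as the description implies.

```java
public class RetryBudgetSketch {
    // Assumption: the 0.94 default reported in the issue description.
    public static final long MIN_RPC_TIMEOUT_MS = 2000;

    // Mirrors the quoted singleCallDuration():
    // elapsed time + minimum RPC time + planned sleep.
    public static long singleCallDuration(long elapsedMs, long expectedSleepMs) {
        return elapsedMs + MIN_RPC_TIMEOUT_MS + expectedSleepMs;
    }

    // True when the estimate fits the operation timeout, i.e. the caller
    // would retry instead of throwing SocketTimeoutException.
    public static boolean retryFitsBudget(long elapsedMs, long expectedSleepMs,
                                          long callTimeoutMs) {
        return singleCallDuration(elapsedMs, expectedSleepMs) <= callTimeoutMs;
    }

    // Mirrors the quoted throwable(): on NotServingRegionException the cached
    // location is purged only when no retry is planned.
    public static boolean cachePurgedOnNotServingRegion(boolean retrying) {
        return !retrying;
    }

    public static void main(String[] args) {
        long pauseMs = 200;         // hbase.client.pause
        long callTimeoutMs = 1200;  // hbase.client.operation.timeout
        int retries = 3;            // hbase.client.retries.number

        // Assumption: "retrying" is true whenever more than one try is configured.
        boolean retrying = retries != 1;

        // Best case for the client: no time has elapsed yet (elapsedMs = 0).
        long estimate = singleCallDuration(0, pauseMs);

        System.out.println("estimated duration (ms): " + estimate);
        System.out.println("retry fits budget: "
            + retryFitsBudget(0, pauseMs, callTimeoutMs));
        System.out.println("cache purged: "
            + cachePurgedOnNotServingRegion(retrying));
    }
}
```

With the configuration from the description, the estimate is 0 + 2000 + 200 = 2200 ms against a 1200 ms operation timeout, so the caller gives up before its first retry. Because `retrying` was true when the NotServingRegionException arrived, the stale entry was not purged at that point either, which is the combination the reporter describes.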