[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751730#comment-16751730 ]
Hadoop QA commented on HBASE-21775:
-----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 28s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 12s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 24s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 17s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 11s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 26s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21775 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12956220/HBASE-21775.master.001.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux b60932886172 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 416b70f461 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15726/testReport/ |
| Max. process+thread count | 265 (vs. ulimit of 10000) |
| modules | C: hbase-client U: hbase-client |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/15726/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> The BufferedMutator doesn't ever refresh region location cache
> --------------------------------------------------------------
>
>                 Key: HBASE-21775
>                 URL: https://issues.apache.org/jira/browse/HBASE-21775
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>            Reporter: Tommy Li
>            Assignee: Tommy Li
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: HBASE-21775.master.001.patch
>
>
> {color:#222222}I noticed in some of my writing jobs that the BufferedMutator would get stuck retrying writes against a dead server.{color}
> {code:java}
> 19/01/18 15:15:47 INFO [Executor task launch worker for task 0] client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: dummy_table
> 19/01/18 15:15:54 WARN [htable-pool3-t56] client.AsyncRequestFutureImpl: id=2, table=dummy_table, attempt=15/21, failureCount=1ops, last exception=org.apache.hadoop.hbase.DoNotRetryIOException: Operation rpcTimeout on <SERVER>,17020,1547848193782, tracking started Fri Jan 18 14:55:37 PST 2019; NOT retrying, failed=1 -- final attempt!
> 19/01/18 15:15:54 ERROR [Executor task launch worker for task 0] IngestRawData.map(): [B@258bc2c7: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: Operation rpcTimeout: 1 time, servers with issues: <SERVER>,17020,1547848193782
> {code}
>
> After the single remaining action permanently failed, the mutator would resume progress, only to get stuck again retrying against the same dead server:
> {code:java}
> 19/01/18 15:21:18 INFO [Executor task launch worker for task 0] client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: dummy_table
> 19/01/18 15:21:18 INFO [Executor task launch worker for task 0] client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: dummy_table
> 19/01/18 15:21:20 INFO [htable-pool3-t55] client.AsyncRequestFutureImpl: id=2, table=dummy_table, attempt=6/21, failureCount=1ops, last exception=java.net.ConnectException: Call to <SERVER> failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: connection timed out: <SERVER> on <SERVER>,17020,1547848193782, tracking started null, retrying after=20089ms, operationsToReplay=1
> {code}
>
> Only restarting the client process to create a new BufferedMutator instance would fix the issue, at least until the next regionserver crash.
> The logs I've pasted show the issue happening with a ConnectTimeoutException, but we've also seen it with NotServingRegionException and some others.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
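To make the reported failure mode concrete, here is a hypothetical toy model in plain Java. It is not HBase client code and not what the attached patch does. It assumes a client that caches region-to-server locations and only consults a stand-in for hbase:meta on a cache miss; the class `LocationCacheModel`, its methods, and the `evictOnFailure` flag are invented for illustration. With `evictOnFailure = false`, every retry is routed to the cached dead server until the attempt budget runs out, which mirrors the `attempt=15/21 ... final attempt!` log line. With eviction enabled, the next attempt re-reads the location and reaches the region's new server.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical toy model of a client-side region location cache.
 * Not HBase code: "meta" stands in for hbase:meta, servers are plain strings.
 */
class LocationCacheModel {
    private final Map<String, String> meta = new HashMap<>();  // authoritative region -> server
    private final Map<String, String> cache = new HashMap<>(); // client's cached region -> server
    private final Set<String> deadServers = new HashSet<>();
    private final boolean evictOnFailure;                       // the behavior being compared
    int metaLookups = 0;                                        // counts cache misses

    LocationCacheModel(boolean evictOnFailure) {
        this.evictOnFailure = evictOnFailure;
    }

    /** Simulates the master (re)assigning a region to a server. */
    void assign(String region, String server) {
        meta.put(region, server);
    }

    /** Simulates a regionserver crash. */
    void kill(String server) {
        deadServers.add(server);
    }

    /** Returns the cached location, consulting "meta" only on a cache miss. */
    String locate(String region) {
        return cache.computeIfAbsent(region, r -> {
            metaLookups++;
            return meta.get(r);
        });
    }

    /**
     * Retries a single write up to maxAttempts times.
     * Returns the attempt number that succeeded, or -1 once retries are exhausted.
     */
    int write(String region, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String server = locate(region);
            if (server != null && !deadServers.contains(server)) {
                return attempt;
            }
            if (evictOnFailure) {
                cache.remove(region); // force a fresh "meta" lookup on the next attempt
            }
        }
        return -1;
    }
}
```

For example, assign region `r1` to `rs1`, write once, then kill `rs1` and reassign `r1` to `rs2`: the non-evicting model returns -1 after 21 attempts, while the evicting model succeeds on attempt 2. The sketch deliberately ignores how the real client decides, per exception type, whether to update its cached locations.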