[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761353#comment-16761353 ]

Hudson commented on HBASE-21775:

Results for branch branch-1
[build #668 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/668/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/668//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/668//JDK7_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/668//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 source release artifact{color}
-- See build output for details.

> The BufferedMutator doesn't ever refresh region location cache
> --------------------------------------------------------------
>
>                 Key: HBASE-21775
>                 URL: https://issues.apache.org/jira/browse/HBASE-21775
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>            Reporter: Tommy Li
>            Assignee: Tommy Li
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5
>
>         Attachments: HBASE-21775-ADDENDUM.master.001.patch,
>                      HBASE-21775.master.001.patch,
>                      org.apache.hadoop.hbase.client.TestAsyncProcess-with-HBASE-21775.txt,
>                      org.apache.hadoop.hbase.client.TestAsyncProcess-without-HBASE-21775.txt
>
>
> I noticed in some of my writing jobs that the BufferedMutator would get
> stuck retrying writes against a dead server.
> {code:java}
> 19/01/18 15:15:47 INFO [Executor task launch worker for task 0] client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: dummy_table
> 19/01/18 15:15:54 WARN [htable-pool3-t56] client.AsyncRequestFutureImpl: id=2, table=dummy_table, attempt=15/21, failureCount=1ops, last exception=org.apache.hadoop.hbase.DoNotRetryIOException: Operation rpcTimeout on ,17020,1547848193782, tracking started Fri Jan 18 14:55:37 PST 2019; NOT retrying, failed=1 -- final attempt!
> 19/01/18 15:15:54 ERROR [Executor task launch worker for task 0] IngestRawData.map(): [B@258bc2c7: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: Operation rpcTimeout: 1 time, servers with issues: ,17020,1547848193782
> {code}
>
> After the single remaining action permanently failed, it would resume
> progress only to get stuck again retrying against the same dead server:
> {code:java}
> 19/01/18 15:21:18 INFO [Executor task launch worker for task 0] client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: dummy_table
> 19/01/18 15:21:18 INFO [Executor task launch worker for task 0] client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: dummy_table
> 19/01/18 15:21:20 INFO [htable-pool3-t55] client.AsyncRequestFutureImpl: id=2, table=dummy_table, attempt=6/21, failureCount=1ops, last exception=java.net.ConnectException: Call to failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: connection timed out: on ,17020,1547848193782, tracking started null, retrying after=20089ms, operationsToReplay=1
> {code}
>
> Only restarting the client process to generate a new BufferedMutator
> instance would fix the issue, at least until the next regionserver crash.
> The logs I've pasted show the issue happening with a
> ConnectTimeoutException, but we've also seen it with
> NotServingRegionException and some others.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
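The failure mode described in the report (every retry goes to the same dead server because the cached region location is never dropped) can be illustrated with a toy model. This is a minimal sketch and not HBase code: `StaleLocationCacheDemo`, `Cluster`, `Writer`, and the `refreshOnFailure` flag are hypothetical names invented for the illustration, and the real client's retry and cache logic is considerably more involved. The limit of 21 attempts is taken from the `attempt=15/21` field in the logs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of a client-side region location cache. Hypothetical; not HBase code. */
public class StaleLocationCacheDemo {

    /** Stand-in for the cluster: authoritative region locations plus the set of live servers. */
    static class Cluster {
        final Map<String, String> meta = new HashMap<>();
        final Set<String> live = new HashSet<>();
    }

    /** Stand-in for a buffered writer that caches where each region lives. */
    static class Writer {
        private final Cluster cluster;
        private final boolean refreshOnFailure;
        private final Map<String, String> cache = new HashMap<>();

        Writer(Cluster cluster, boolean refreshOnFailure) {
            this.cluster = cluster;
            this.refreshOnFailure = refreshOnFailure;
        }

        /** Returns the number of attempts used, or -1 if all attempts failed. */
        int write(String region, int maxAttempts) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                // Look up the authoritative location only when nothing is cached.
                String server = cache.computeIfAbsent(region, cluster.meta::get);
                if (cluster.live.contains(server)) {
                    return attempt;
                }
                if (refreshOnFailure) {
                    // Drop the stale entry so the next attempt looks the region up again.
                    cache.remove(region);
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        cluster.meta.put("region-1", "rs-a");
        cluster.live.add("rs-a");

        Writer stale = new Writer(cluster, false);
        Writer refreshing = new Writer(cluster, true);
        stale.write("region-1", 21);       // warms the cache with rs-a
        refreshing.write("region-1", 21);  // warms the cache with rs-a

        // rs-a crashes and the region is reassigned to rs-b.
        cluster.live.remove("rs-a");
        cluster.live.add("rs-b");
        cluster.meta.put("region-1", "rs-b");

        System.out.println(stale.write("region-1", 21));      // prints -1
        System.out.println(stale.write("region-1", 21));      // prints -1 again
        System.out.println(refreshing.write("region-1", 21)); // prints 2
    }
}
```

In this model the writer that never invalidates its cache exhausts its retries on every call until it is replaced, which mirrors the observation that only a new BufferedMutator instance recovered.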
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761357#comment-16761357 ]

Hudson commented on HBASE-21775:

Results for branch branch-1.4
[build #658 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/658/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/658//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/658//JDK7_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/658//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761347#comment-16761347 ]

Hudson commented on HBASE-21775:

Results for branch branch-1.3
[build #640 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/640/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/640//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk7 checks{color}
-- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/640//JDK7_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/640//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761315#comment-16761315 ]

Hudson commented on HBASE-21775:

SUCCESS: Integrated in Jenkins build HBase-1.3-IT #524 (See [https://builds.apache.org/job/HBase-1.3-IT/524/])
Revert "HBASE-21775 The BufferedMutator doesn't ever refresh region (apurtell: [https://github.com/apache/hbase/commit/f2ea066141bf5c0079e7b101549356e5611b1ed7])
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java

[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761253#comment-16761253 ]

Andrew Purtell commented on HBASE-21775:

Also TestRegionLocationCaching. There are others. I'm going to revert this from branch-1 for now. If we can get a replacement in before next RC (Monday Feb 11), then we can try again.

> The BufferedMutator doesn't ever refresh region location cache
> --------------------------------------------------------------
>
>                 Key: HBASE-21775
>                 URL: https://issues.apache.org/jira/browse/HBASE-21775
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>            Reporter: Tommy Li
>            Assignee: Tommy Li
>            Priority: Major
>             Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.1.3, 2.0.5, 1.3.4

[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761256#comment-16761256 ]

Andrew Purtell commented on HBASE-21775:

Reverted from the branch-1s.

[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757384#comment-16757384 ]

Hudson commented on HBASE-21775:

Results for branch branch-2.2
[build #7 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/7/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/7//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/7//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/7//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757277#comment-16757277 ]

Hudson commented on HBASE-21775:

Results for branch branch-2
[build #1649 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1649/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1649//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1649//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1649//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757217#comment-16757217 ]

Hudson commented on HBASE-21775:

Results for branch master
[build #759 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/759/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/759//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/759//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/759//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756918#comment-16756918 ]

Duo Zhang commented on HBASE-21775:
---

Thanks lads.
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756882#comment-16756882 ] Hudson commented on HBASE-21775: Results for branch branch-2.1 [build #817 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/817/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/817//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/817//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/817//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. 
(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756870#comment-16756870 ] Hudson commented on HBASE-21775: Results for branch branch-2.0 [build #1301 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1301/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1301//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1301//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1301//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. 
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756690#comment-16756690 ]

stack commented on HBASE-21775:
---

I pushed addendum on branch-2.0+

branch-1 doesn't do this static messing w/ CONF so it should be ok.

Resolving.
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756687#comment-16756687 ]

Tommy Li commented on HBASE-21775:
--

Sorry about that [~stack], yeah I need to update my editor settings to match this project's styleguide
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756679#comment-16756679 ]

stack commented on HBASE-21775:
---

Thank you [~tommyzli]

On the addendum, it looks good but in future please don't do this (maybe you have to change your config on IDE?)

-import org.junit.Assert;
-import org.junit.BeforeClass;
-import org.junit.ClassRule;
-import org.junit.Test;
+import org.junit.*;

I changed the addendum and fixed the checkstyle and applied the patch. Thank you. Smile

one of the checkstyle complaints was

./hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncProcess.java:71:import org.junit.*;: Using the '.*' form of import should be avoided - org.junit.*. [AvoidStarImport]
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756653#comment-16756653 ]

Hadoop QA commented on HBASE-21775:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 54s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 0s{color} | {color:red} hbase-client: The patch generated 3 new + 16 unchanged - 1 fixed = 19 total (was 17) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 52s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 23s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 54s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 9s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 17s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21775 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12956966/HBASE-21775-ADDENDUM.master.001.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 55f2d5de9471 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 5ddda1a1f6 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/15795/artifact/patchprocess/diff-checkstyle-hbase-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15795/testReport/ |
| Max. process+thread count | 260 (vs. ulimit of 1) |
| modules | C: hbase-client U: hbase-client
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756608#comment-16756608 ]

Tommy Li commented on HBASE-21775:
--

So what I noticed is that those five tests only fail when I run the entire test class, but running them individually always succeeds. The class has a static configuration that gets modified by some tests. When I changed the tests to create a new configuration per test, the tests stopped failing on my machine.

I've uploaded HBASE-21775-ADDENDUM.master.001.patch with my changes
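The test-isolation problem described in this comment (a static configuration mutated by some tests, so that failures only show up when the whole class runs) can be sketched in a few lines. This is an illustration under assumptions, not the contents of the addendum patch: a plain `Map` stands in for the HBase `Configuration` object, the value `"10"` is an arbitrary baseline chosen for the sketch, and the class and method names are hypothetical. Only the key `hbase.client.retries.number` is a real HBase client setting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for the real Configuration object.
class SharedConfigSketch {
    static final String RETRIES_KEY = "hbase.client.retries.number";

    // Builds a fresh configuration with the baseline the tests assume.
    static Map<String, String> baseConfig() {
        Map<String, String> conf = new HashMap<>();
        conf.put(RETRIES_KEY, "10");   // arbitrary baseline value for this sketch
        return conf;
    }

    // A "test" that tweaks the retry count for its own purposes.
    static void testThatLowersRetries(Map<String, String> conf) {
        conf.put(RETRIES_KEY, "1");
    }

    // A "test" that silently depends on the baseline value.
    static boolean testThatExpectsBaseline(Map<String, String> conf) {
        return "10".equals(conf.get(RETRIES_KEY));
    }

    // Whole-class run with one shared configuration: the first test leaks into the second.
    static boolean runWithSharedConfig() {
        Map<String, String> shared = baseConfig();
        testThatLowersRetries(shared);
        return testThatExpectsBaseline(shared);
    }

    // Whole-class run where each test builds its own configuration.
    static boolean runWithFreshConfigs() {
        testThatLowersRetries(baseConfig());
        return testThatExpectsBaseline(baseConfig());
    }

    public static void main(String[] args) {
        System.out.println("shared config, second test passes: " + runWithSharedConfig());
        System.out.println("fresh configs, second test passes: " + runWithFreshConfigs());
    }
}
```

Running the second test on its own against `baseConfig()` passes, which matches the observation that the tests succeed individually and only fail as part of the full class.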
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756436#comment-16756436 ]

Tommy Li commented on HBASE-21775:
--

I was able to get a somewhat reliable local repro - will take a look later today
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755571#comment-16755571 ]

Duo Zhang commented on HBASE-21775:
-----------------------------------

On the master branch, TestAsyncProcess passes if I revert the change here. This may be a test issue: I suppose you changed the retry behavior here, and the failed UTs are all about how we do failure recovery, for example testFail, testErrorServers, etc.
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755347#comment-16755347 ]

Tommy Li commented on HBASE-21775:
----------------------------------

Thanks for the link, [~stack]. I took a look at the report from before my change went in and indeed TestAsyncProcess [is not listed there|https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.1/168/artifact/dashboard.html]. Could this be a build caching issue?
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755342#comment-16755342 ]

Tommy Li commented on HBASE-21775:
----------------------------------

So I pulled branch-2.1 and ran `mvn test -Dtest=org.apache.hadoop.hbase.client.TestAsyncProcess -Dskip.license.check=true` locally, both with my change and without, and I see the same 5 test failures in both runs. I've attached the surefire output of both runs. [^org.apache.hadoop.hbase.client.TestAsyncProcess-without-HBASE-21775.txt] has one extra failure, which is the test that I added. Unless I'm looking at the wrong tests, I don't think the failures are introduced by my change.
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755331#comment-16755331 ]

stack commented on HBASE-21775:
-------------------------------

Does it pass locally for you [~tommyzli]? I believe the mighty [~Apache9] is referring to this:

https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.1/lastSuccessfulBuild/artifact/dashboard.html

It looks like the test fails 100% of the time in the flaky test finder. Click on the 'show' to look at particular failure types. I took a look and the failures seem to be other tests, not TestAsyncProcess? Yeah, does it pass locally for you?
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755296#comment-16755296 ]

Tommy Li commented on HBASE-21775:
----------------------------------

I'm looking at this. [~Apache9] can you paste a link to the output of the failed test? I'm not familiar with Jenkins and am having trouble finding the failure.
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755229#comment-16755229 ]

stack commented on HBASE-21775:
-------------------------------

[~tommyzli] If you are looking at this, just say, and I'll give it a go sir. Thanks.
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753263#comment-16753263 ]

Hudson commented on HBASE-21775:
--------------------------------

Results for branch branch-2
[build #1639 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1639/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1639//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1639//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1639//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753246#comment-16753246 ]

Hudson commented on HBASE-21775:
--------------------------------

Results for branch branch-2.0
[build #1290 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1290/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1290//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1290//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1290//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753245#comment-16753245 ]

Hudson commented on HBASE-21775:
--------------------------------

Results for branch master
[build #749 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/749/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/749//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/749//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/749//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753243#comment-16753243 ] Hudson commented on HBASE-21775: Results for branch branch-2.1 [build #806 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/806/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/806//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/806//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/806//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. 
(/) {color:green}+1 client integration test{color} > The BufferedMutator doesn't ever refresh region location cache > -- > > Key: HBASE-21775 > URL: https://issues.apache.org/jira/browse/HBASE-21775 > Project: HBase > Issue Type: Bug > Components: Client >Reporter: Tommy Li >Assignee: Tommy Li >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.1.3, 2.0.5, 1.3.4 > > Attachments: HBASE-21775.master.001.patch > > > {color:#22}I noticed in some of my writing jobs that the BufferedMutator > would get stuck retrying writes against a dead server.{color} > {code:java} > 19/01/18 15:15:47 INFO [Executor task launch worker for task 0] > client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: > dummy_table > 19/01/18 15:15:54 WARN [htable-pool3-t56] client.AsyncRequestFutureImpl: > id=2, table=dummy_table, attempt=15/21, failureCount=1ops, last > exception=org.apache.hadoop.hbase.DoNotRetryIOException: Operation rpcTimeout > on ,17020,1547848193782, tracking started Fri Jan 18 14:55:37 PST > 2019; NOT retrying, failed=1 -- final attempt! 
> 19/01/18 15:15:54 ERROR [Executor task launch worker for task 0] > IngestRawData.map(): [B@258bc2c7: > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 > action: Operation rpcTimeout: 1 time, servers with issues: > ,17020,1547848193782 > {code} > > After the single remaining action permanently failed, it would resume > progress only to get stuck again retrying against the same dead server: > {code:java} > 19/01/18 15:21:18 INFO [Executor task launch worker for task 0] > client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: > dummy_table > 19/01/18 15:21:18 INFO [Executor task launch worker for task 0] > client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: > dummy_table > 19/01/18 15:21:20 INFO [htable-pool3-t55] client.AsyncRequestFutureImpl: > id=2, table=dummy_table, attempt=6/21, failureCount=1ops, last > exception=java.net.ConnectException: Call to failed on connection > exception: > org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: > connection timed out: on ,17020,1547848193782, tracking > started null, retrying after=20089ms, operationsToReplay=1 > {code} > > Only restarting the client process to generate a new BufferedMutator instance > would fix the issue, at least until the next regionserver crash > The logs I've pasted show the issue happening with a > ConnectionTimeoutException, but we've also seen it with > NotServingRegionException and some others -- This message was sent by Atlassian JIRA (v7.6.3#76005)
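The failure mode in the description above (a client that keeps retrying against the server it has cached as a region's location) can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the class `StaleLocationDemo`, the region and server names, and the `clearCacheOnFailure` flag are hypothetical and are not part of the HBase client API or of the attached patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a client-side region location cache; not HBase code.
class StaleLocationDemo {

    // Returns the attempt number on which a write succeeds, or -1 if all
    // attempts are used up while still targeting the dead server.
    static int attemptsUntilSuccess(boolean clearCacheOnFailure, int maxAttempts) {
        // Authoritative assignment: the region has already moved to server-b.
        Map<String, String> meta = new HashMap<>();
        meta.put("region-1", "server-b");

        // Client cache still holds the location from before server-a died.
        Map<String, String> cache = new HashMap<>();
        cache.put("region-1", "server-a");

        Set<String> liveServers = Collections.singleton("server-b");

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Use the cached location; look it up again only if it is missing.
            String server = cache.computeIfAbsent("region-1", meta::get);
            if (liveServers.contains(server)) {
                return attempt;
            }
            // Without this invalidation every retry goes to the same dead server.
            if (clearCacheOnFailure) {
                cache.remove("region-1");
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(attemptsUntilSuccess(true, 5));   // prints 2
        System.out.println(attemptsUntilSuccess(false, 5));  // prints -1
    }
}
```

In this model the stale entry is only replaced when a failure invalidates it, which mirrors the reported symptom that nothing short of a new client instance recovered the writes.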
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753241#comment-16753241 ] stack commented on HBASE-21775: --- Took another look. branch-1.2 is a little different hereabouts. Would take a little testing/messing to backport. Leaving for now.
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753237#comment-16753237 ] Hudson commented on HBASE-21775: Results for branch branch-1.3 [build #629 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/629/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/629//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/629//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/629//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. 
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753235#comment-16753235 ] Hudson commented on HBASE-21775: Results for branch branch-1 [build #655 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/655/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/655//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/655//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/655//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 source release artifact{color} -- See build output for details. 
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753236#comment-16753236 ] Hudson commented on HBASE-21775: Results for branch branch-1.4 [build #646 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/646/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/646//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/646//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/646//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. 
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753214#comment-16753214 ] Hudson commented on HBASE-21775: SUCCESS: Integrated in Jenkins build HBase-1.3-IT #523 (See [https://builds.apache.org/job/HBase-1.3-IT/523/]) HBASE-21775 The BufferedMutator doesn't ever refresh region location (stack: [https://github.com/apache/hbase/commit/725547cec3a83c47bfb422b449d9c94207d1304c]) * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753196#comment-16753196 ] Andrew Purtell commented on HBASE-21775: +1 Please, thank you [~stack] > The BufferedMutator doesn't ever refresh region location cache > -- > > Key: HBASE-21775 > URL: https://issues.apache.org/jira/browse/HBASE-21775 > Project: HBase > Issue Type: Bug > Components: Client >Reporter: Tommy Li >Assignee: Tommy Li >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5 > > Attachments: HBASE-21775.master.001.patch
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753195#comment-16753195 ] stack commented on HBASE-21775: --- FYI for future [~tommyzli], commit message should start out with the JIRA number and subject. See git log. Next time sir. Thanks for the nice patch that probably took a long time to figure out though it is a one-liner.
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753194#comment-16753194 ] stack commented on HBASE-21775: --- Pushed to branch-2.0+. [~apurtell], want this in branch-1 sir? I can pull it back for you, np. Leaving open for the moment...
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752856#comment-16752856 ] Tommy Li commented on HBASE-21775: -- [~stack] It definitely needs to go to branch-2. I haven't tested this on version 1, but I took a brief look at the code and that condition is [the same|https://github.com/apache/hbase/blob/branch-1.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java#L1259], so yeah, this can also go to branch-1. > The BufferedMutator doesn't ever refresh region location cache > -- > > Key: HBASE-21775 > URL: https://issues.apache.org/jira/browse/HBASE-21775 > Project: HBase > Issue Type: Bug > Components: Client >Reporter: Tommy Li >Assignee: Tommy Li >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21775.master.001.patch
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752837#comment-16752837 ] stack commented on HBASE-21775: --- This patch should go everywhere it seems [~tommyzli]? branch-2 and branch-1. Thanks.
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752658#comment-16752658 ] Tommy Li commented on HBASE-21775: -- [~stack] yes - ran a quick test where I killed the cluster while the ingestion process was running and confirmed that the buffered mutator picked up the new region locations when it came back up.
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752559#comment-16752559 ] stack commented on HBASE-21775: --- +1 on the patch. Have you deployed this patch [~tommyzli]? If so, does it make a difference sir?
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752522#comment-16752522 ] Tommy Li commented on HBASE-21775: -- [~stack] from what I can see, tableName shouldn't be null unless you manually create a BufferedMutatorImpl instead of using ConnectionFactory.createConnection().getBufferedMutator(). I'm not sure if the BufferedMutator would work at all without a table name. I'm running a build taken from master a few months ago, but I've seen the same issue in the latest release.
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751907#comment-16751907 ] stack commented on HBASE-21775: --- Patch looks good [~tommyzli]. Why would tablename be null do you think? What version of hbase are you running?
[jira] [Commented] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751730#comment-16751730 ] Hadoop QA commented on HBASE-21775: ---

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 5m 27s | master passed |
| +1 | compile | 0m 45s | master passed |
| +1 | checkstyle | 0m 42s | master passed |
| +1 | shadedjars | 5m 28s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 1m 15s | master passed |
| +1 | javadoc | 0m 26s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 5m 23s | the patch passed |
| +1 | compile | 0m 44s | the patch passed |
| +1 | javac | 0m 44s | the patch passed |
| +1 | checkstyle | 0m 38s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 5m 12s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 10m 24s | Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. |
| +1 | findbugs | 1m 23s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 17s | hbase-client in the patch passed. |
| +1 | asflicense | 0m 11s | The patch does not generate ASF License warnings. |
| | | 42m 26s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21775 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12956220/HBASE-21775.master.001.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux b60932886172 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 416b70f461 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15726/testReport/ |
| Max. process+thread count | 265 (vs. ulimit of 1) |
| modules | C: hbase-client U: hbase-client |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/15726/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.