[jira] [Updated] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-21775:
-----------------------------------
    Fix Version/s:     (was: 1.3.4)
                       (was: 1.4.10)
                       (was: 1.5.0)

> The BufferedMutator doesn't ever refresh region location cache
> --------------------------------------------------------------
>
>                 Key: HBASE-21775
>                 URL: https://issues.apache.org/jira/browse/HBASE-21775
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>            Reporter: Tommy Li
>            Assignee: Tommy Li
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5
>
>         Attachments: HBASE-21775-ADDENDUM.master.001.patch, HBASE-21775.master.001.patch, org.apache.hadoop.hbase.client.TestAsyncProcess-with-HBASE-21775.txt, org.apache.hadoop.hbase.client.TestAsyncProcess-without-HBASE-21775.txt
>
>
> I noticed in some of my writing jobs that the BufferedMutator would get stuck retrying writes against a dead server.
> {code:java}
> 19/01/18 15:15:47 INFO [Executor task launch worker for task 0] client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: dummy_table
> 19/01/18 15:15:54 WARN [htable-pool3-t56] client.AsyncRequestFutureImpl: id=2, table=dummy_table, attempt=15/21, failureCount=1ops, last exception=org.apache.hadoop.hbase.DoNotRetryIOException: Operation rpcTimeout on ,17020,1547848193782, tracking started Fri Jan 18 14:55:37 PST 2019; NOT retrying, failed=1 -- final attempt!
> 19/01/18 15:15:54 ERROR [Executor task launch worker for task 0] IngestRawData.map(): [B@258bc2c7: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: Operation rpcTimeout: 1 time, servers with issues: ,17020,1547848193782
> {code}
>
> After the single remaining action permanently failed, it would resume progress only to get stuck again retrying against the same dead server:
> {code:java}
> 19/01/18 15:21:18 INFO [Executor task launch worker for task 0] client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: dummy_table
> 19/01/18 15:21:18 INFO [Executor task launch worker for task 0] client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: dummy_table
> 19/01/18 15:21:20 INFO [htable-pool3-t55] client.AsyncRequestFutureImpl: id=2, table=dummy_table, attempt=6/21, failureCount=1ops, last exception=java.net.ConnectException: Call to failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: connection timed out: on ,17020,1547848193782, tracking started null, retrying after=20089ms, operationsToReplay=1
> {code}
>
> Only restarting the client process to generate a new BufferedMutator instance would fix the issue, at least until the next regionserver crash.
>
> The logs I've pasted show the issue happening with a ConnectTimeoutException, but we've also seen it with NotServingRegionException and some others.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
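
The failure mode reported above, a cached region location that keeps pointing at a dead server, can be illustrated with a small plain-Java toy model. This is only a sketch under stated assumptions: it is not HBase client code and not the attached patch, and every name in it (`StaleLocationCacheDemo`, `write`, `region-1`, `server-a`, `server-b`) is hypothetical. The retry limit of 21 is borrowed from the `attempt=15/21` counter in the logs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of a client-side region location cache. Hypothetical names; not HBase code. */
public class StaleLocationCacheDemo {

    /**
     * Tries to write to one region up to maxAttempts times.
     * Returns the attempt number that succeeded, or -1 if every attempt failed.
     */
    public static int write(boolean refreshOnFailure, int maxAttempts) {
        // Authoritative assignment: the region has already moved to server-b.
        Map<String, String> meta = new HashMap<>();
        meta.put("region-1", "server-b");

        // The client cache still points at server-a, which has crashed.
        Map<String, String> cache = new HashMap<>();
        cache.put("region-1", "server-a");

        Set<String> liveServers = Collections.singleton("server-b");

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Use the cached location; consult the authoritative map only on a cache miss.
            String server = cache.computeIfAbsent("region-1", meta::get);
            if (liveServers.contains(server)) {
                return attempt;
            }
            if (refreshOnFailure) {
                // Drop the stale entry so the next attempt re-resolves the location.
                cache.remove("region-1");
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("without refresh: " + write(false, 21)); // prints -1
        System.out.println("with refresh:    " + write(true, 21));  // prints 2
    }
}
```

Without invalidation, every retry goes to the same dead server until the attempts are exhausted, and the next write starts from the same stale entry. With invalidation, the second attempt re-resolves the location and succeeds.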
[jira] [Updated] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-21775:
--------------------------
      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Resolving.

>             Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.1.3, 2.0.5, 1.3.4
>         Attachments: HBASE-21775-ADDENDUM.master.001.patch, HBASE-21775.master.001.patch, org.apache.hadoop.hbase.client.TestAsyncProcess-with-HBASE-21775.txt, org.apache.hadoop.hbase.client.TestAsyncProcess-without-HBASE-21775.txt
[jira] [Updated] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommy Li updated HBASE-21775:
-----------------------------
    Status: Patch Available  (was: Reopened)

>             Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.1.3, 2.0.5, 1.3.4
>         Attachments: HBASE-21775-ADDENDUM.master.001.patch, HBASE-21775.master.001.patch, org.apache.hadoop.hbase.client.TestAsyncProcess-with-HBASE-21775.txt, org.apache.hadoop.hbase.client.TestAsyncProcess-without-HBASE-21775.txt
[jira] [Updated] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommy Li updated HBASE-21775:
-----------------------------
    Attachment: HBASE-21775-ADDENDUM.master.001.patch

>             Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.1.3, 2.0.5, 1.3.4
>         Attachments: HBASE-21775-ADDENDUM.master.001.patch, HBASE-21775.master.001.patch, org.apache.hadoop.hbase.client.TestAsyncProcess-with-HBASE-21775.txt, org.apache.hadoop.hbase.client.TestAsyncProcess-without-HBASE-21775.txt
[jira] [Updated] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommy Li updated HBASE-21775:
-----------------------------
    Attachment: org.apache.hadoop.hbase.client.TestAsyncProcess-without-HBASE-21775.txt
                org.apache.hadoop.hbase.client.TestAsyncProcess-with-HBASE-21775.txt

>             Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.1.3, 2.0.5, 1.3.4
>         Attachments: HBASE-21775.master.001.patch, org.apache.hadoop.hbase.client.TestAsyncProcess-with-HBASE-21775.txt, org.apache.hadoop.hbase.client.TestAsyncProcess-without-HBASE-21775.txt
[jira] [Updated] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-21775:
--------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Applied to branch-1.3+ but w/o the test since it uses branch-2-isms. Did not go back to branch-1.2 so skipped it (shout if you want this and I'll shoehorn [~busbey])

>             Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.1.3, 2.0.5, 1.3.4
>         Attachments: HBASE-21775.master.001.patch
[jira] [Updated] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-21775:
--------------------------
    Fix Version/s: 1.3.4
                   1.4.10
                   1.5.0

>             Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.1.3, 2.0.5, 1.3.4
>         Attachments: HBASE-21775.master.001.patch
[jira] [Updated] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-21775:
--------------------------
    Fix Version/s: 2.0.5
                   2.1.3
                   2.2.0

>             Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5
>         Attachments: HBASE-21775.master.001.patch
[jira] [Updated] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommy Li updated HBASE-21775:
-----------------------------
    Description: the existing text and logs are unchanged; this update appends one closing paragraph:

The logs I've pasted show the issue happening with a ConnectTimeoutException, but we've also seen it with NotServingRegionException and some others.

  (was: the same description, ending at "at least until the next regionserver crash")

>             Fix For: 3.0.0
>         Attachments: HBASE-21775.master.001.patch
[jira] [Updated] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommy Li updated HBASE-21775:
-----------------------------
    Attachment: HBASE-21775.master.001.patch

>             Fix For: 3.0.0
>         Attachments: HBASE-21775.master.001.patch
[jira] [Updated] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommy Li updated HBASE-21775:
-----------------------------
    Status: Patch Available  (was: Open)

>             Fix For: 3.0.0
>         Attachments: HBASE-21775.master.001.patch