[jira] [Assigned] (HBASE-21785) master reports open regions as RITs and also messes up rit age metric
[ https://issues.apache.org/jira/browse/HBASE-21785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning reassigned HBASE-21785:
-------------------------------------

    Assignee: Sergey Shelukhin  (was: David Manning)

> master reports open regions as RITs and also messes up rit age metric
> ---------------------------------------------------------------------
>
>                 Key: HBASE-21785
>                 URL: https://issues.apache.org/jira/browse/HBASE-21785
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha-1, 2.2.0
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.2.0
>
>         Attachments: HBASE-21785.01.patch, HBASE-21785.patch
>
>
> {noformat}
> RegionState                        RIT time (ms)    Retries
> dba183f0dadfcc9dc8ae0a6dd59c84e6   dba183f0dadfcc9dc8ae0a6dd59c84e6. state=OPEN, ts=Wed Dec 31 16:00:00 PST 1969 (1548453918s ago), server=server,17020,1548452922054   1548453918735   0
> {noformat}
> RIT age metric also gets set to a bogus value.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
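The log above shows {{ts=Wed Dec 31 16:00:00 PST 1969}}, i.e. a timestamp of 0 ms since the epoch, and the reported RIT time (1548453918735 ms, roughly 49 years) is exactly "now minus 0". A minimal sketch of how an uninitialized transition timestamp yields the bogus age (names here are illustrative, not the actual RegionState code):

```java
// Hypothetical sketch: RIT age is computed as (now - transition timestamp).
// If the timestamp was never set for an OPEN region, it defaults to 0 (the
// epoch), so the "age" becomes the full time since 1970, matching the log.
public class RitAgeSketch {
    static long ritAgeMillis(long nowMillis, long transitionTsMillis) {
        return nowMillis - transitionTsMillis;
    }

    public static void main(String[] args) {
        long now = 1548453918735L;               // the RIT time from the log above
        long bogus = ritAgeMillis(now, 0L);      // uninitialized ts => epoch => huge age
        long sane = ritAgeMillis(now, now - 5_000L);
        System.out.println("bogus age: " + bogus + " ms, sane age: " + sane + " ms");
    }
}
```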
[jira] [Assigned] (HBASE-21785) master reports open regions as RITs and also messes up rit age metric
[ https://issues.apache.org/jira/browse/HBASE-21785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning reassigned HBASE-21785:
-------------------------------------

    Assignee: David Manning  (was: Sergey Shelukhin)
[jira] [Updated] (HBASE-28663) CanaryTool continues executing and scanning after timeout
[ https://issues.apache.org/jira/browse/HBASE-28663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning updated HBASE-28663:
----------------------------------
    Description: 
If you run the {{CanaryTool}} in region mode until it reaches the configured timeout, the logs and sink results will show that it can continue executing and scanning for 10 seconds.

This is because the RegionTasks have already been submitted to an ExecutorService, which continues execution after the timeout, and the Monitor continues execution on a separate thread.

The 10-second delay in shutdown is seen, in hbase 2.x at least, because {{runMonitor}} will close the {{Connection}} ([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/CanaryTool.java#L1054-L1094]), which leads to {{ConnectionImplementation#close}} ([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L2272-L2300]); inside {{shutdownPools}} we will potentially wait the full 10 seconds of {{awaitTermination}} if client operations are in progress.

The scenario can be improved by simply interrupting the monitor thread, as we will often be in an {{invokeAll}} call in a {{sniff}} method. The {{invokeAll}} method is blocking, and interrupting the monitor in this call will interrupt the client threads and generally shut down properly and promptly. However, we can be more robust by also watching for a shutdown signal in the various tasks, such as {{RegionTask}}, so any remaining tasks drain quickly and without errors. This will remove a lot of errors from the canary logs during shutdown.

{code:java}
2024-06-12 02:57:14 [Time-limited test] ERROR tool.Canary(1076): The monitor is running too long (1140098) after timeout limit:114 will be killed itself !!
2024-06-12 02:57:14 [Time-limited test] INFO client.ConnectionImplementation(2039): Closing master protocol: MasterService
2024-06-12 02:57:14 [pool-3-thread-4] ERROR tool.Canary(353): Read from REGION1. on serverName=REGIONSERVER-1, columnFamily=0 failed
java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@54f2a9a4 rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting down, pool size = 7, active threads = 7, queued tasks = 0, completed tasks = 180094]
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:199)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:271)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:440)
    at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:314)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:612)
    at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.readColumnFamily(CanaryTool.java:565)
    at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:609)
    at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503)
    at org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471)
[... repeats for 10 seconds and tens of thousands of regions ...]
2024-06-12 02:57:16 [pool-3-thread-11] ERROR tool.Canary(353): Read from REGION1. on serverName=REGIONSERVER-2, columnFamily=0 failed
java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@d08d21f rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting down, pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 180098]
[...]
2024-06-12 02:57:24 [pool-3-thread-11] ERROR tool.Canary(353): Read from REGION42000. on serverName=REGIONSERVER-3, columnFamily=0 failed
java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@38e7a5a1 rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 180101]
2024-06-12T02:57:24.202Z, java.io.InterruptedIOException
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:294)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:255)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:53)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:191)
    at
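The "watch for a shutdown signal in the tasks" idea can be sketched as follows. This is a hypothetical model, not the actual CanaryTool code: each task checks a shared flag and returns immediately once the monitor has timed out, so already-queued work drains quickly instead of erroring against a closing Connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the proposed fix: tasks check a shared "stopped"
// flag and become no-ops after the monitor's timeout, so the queue drains
// without RejectedExecutionException noise in the logs.
public class ShutdownAwareTasks {
    static final AtomicBoolean stopped = new AtomicBoolean(false);

    // Stand-in for CanaryTool's RegionTask: records which "regions" were scanned.
    static Callable<Void> regionTask(int regionId, List<Integer> scanned) {
        return () -> {
            if (stopped.get()) {
                return null;           // shutdown signalled: drain without doing anything
            }
            scanned.add(regionId);     // stand-in for the actual region read
            return null;
        };
    }

    static int runScenario() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Integer> scanned = new CopyOnWriteArrayList<>();
        List<Callable<Void>> batch = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            batch.add(regionTask(i, scanned));
        }
        pool.invokeAll(batch);         // runs normally before the timeout
        stopped.set(true);             // the monitor hit its timeout
        batch.clear();
        for (int i = 5; i < 10; i++) {
            batch.add(regionTask(i, scanned));
        }
        pool.invokeAll(batch);         // these drain as no-ops
        pool.shutdown();
        return scanned.size();         // 5: the post-timeout batch did no work
    }

    public static void main(String[] args) throws Exception {
        System.out.println("regions scanned: " + runScenario());
    }
}
```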
[jira] [Updated] (HBASE-28663) CanaryTool continues executing and scanning after timeout
[ https://issues.apache.org/jira/browse/HBASE-28663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning updated HBASE-28663:
----------------------------------
    Status: Patch Available  (was: In Progress)
[jira] [Work logged] (HBASE-28663) CanaryTool continues executing and scanning after timeout
[ https://issues.apache.org/jira/browse/HBASE-28663?focusedWorklogId=923552=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-923552 ]

David Manning logged work on HBASE-28663:
-----------------------------------------
                Author: David Manning
            Created on: 14/Jun/24 17:05
            Start Date: 14/Jun/24 17:05
    Worklog Time Spent: 24h

Issue Time Tracking
-------------------
            Worklog Id: (was: 923552)
    Remaining Estimate: 0h  (was: 24h)
            Time Spent: 24h
[jira] [Work started] (HBASE-28663) CanaryTool continues executing and scanning after timeout
[ https://issues.apache.org/jira/browse/HBASE-28663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-28663 started by David Manning.
---------------------------------------------
[jira] [Created] (HBASE-28663) CanaryTool continues executing and scanning after timeout
David Manning created HBASE-28663:
--------------------------------------
             Summary: CanaryTool continues executing and scanning after timeout
                 Key: HBASE-28663
                 URL: https://issues.apache.org/jira/browse/HBASE-28663
             Project: HBase
          Issue Type: Bug
          Components: canary
    Affects Versions: 2.0.0, 3.0.0
            Reporter: David Manning
            Assignee: David Manning

If you run the {{CanaryTool}} in region mode until it reaches the configured timeout, the logs and sink results will show that it can continue executing and scanning for 10 seconds.

This is because the RegionTasks have already been submitted to an ExecutorService, which continues execution after the timeout, and the Monitor continues execution on a separate thread.

The 10 seconds is seen in hbase 2.x, at least, because {{runMonitor}} will close the {{Connection}} ([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/CanaryTool.java#L1054-L1094]), which leads to {{ConnectionImplementation#close}} ([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L2272-L2300]); inside {{shutdownPools}} we will potentially wait the full 10 seconds of {{awaitTermination}} if client operations are in progress.

The scenario can be improved by simply interrupting the monitor thread, as we will often be in an {{invokeAll}} call in a {{sniff}} method, which will interrupt the client threads and generally shut down properly and promptly. However, we could be more robust by also watching for a shutdown signal in the various tasks, such as {{RegionTask}}.
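The {{awaitTermination}} behavior described above can be sketched in isolation. This is illustrative only, not the HBase code: a graceful {{shutdown()}} lets in-flight work keep running, so {{awaitTermination}} blocks for up to its full timeout, while an interrupt (here via {{shutdownNow()}}, analogous to interrupting the monitor inside {{invokeAll}}) ends the wait almost immediately.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch (not HBase code): why close() can stall for the full
// awaitTermination timeout while client operations are in progress, and why
// interrupting the in-flight work ends the wait promptly.
public class AwaitTerminationSketch {
    // Returns {finishedAfterShutdown, finishedAfterShutdownNow}.
    static boolean[] runScenario() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.submit(() -> {
            try {
                Thread.sleep(10_000);   // stand-in for an in-flight scan RPC
            } catch (InterruptedException e) {
                // interrupted: give up promptly, as a canary task should
            }
        });
        Thread.sleep(100);              // let the task start running
        pool.shutdown();                // graceful: the task keeps sleeping
        boolean graceful = pool.awaitTermination(1, TimeUnit.SECONDS);
        pool.shutdownNow();             // interrupt the sleeping task
        boolean forced = pool.awaitTermination(1, TimeUnit.SECONDS);
        return new boolean[] { graceful, forced };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = runScenario();
        System.out.println("finished within 1s after shutdown():    " + r[0]);
        System.out.println("finished within 1s after shutdownNow(): " + r[1]);
    }
}
```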
[jira] [Comment Edited] (HBASE-28584) RS SIGSEGV under heavy replication load
[ https://issues.apache.org/jira/browse/HBASE-28584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848778#comment-17848778 ]

David Manning edited comment on HBASE-28584 at 5/22/24 11:57 PM:
-----------------------------------------------------------------
We see it too in HBASE-28437. We have hbase.region.store.parallel.put.limit=0, but that is also the default in 2.5 after HBASE-26814. For us it always correlates with a lot of load that shows up suddenly, and then replicates to a peer cluster, and that peer cluster throws RegionTooBusyExceptions (blockedRequestCount metric).

was (Author: dmanning):
We see it too. We have hbase.region.store.parallel.put.limit=0, but that is also the default in 2.5 after HBASE-26814. For us it always correlates with a lot of load that shows up suddenly, and then replicates to a peer cluster, and that peer cluster throws RegionTooBusyExceptions (blockedRequestCount metric).

> RS SIGSEGV under heavy replication load
> ---------------------------------------
>
>                 Key: HBASE-28584
>                 URL: https://issues.apache.org/jira/browse/HBASE-28584
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.5.6
>         Environment: RHEL 7.9
> JDK 11.0.23
> Hadoop 3.2.4
> Hbase 2.5.6
>            Reporter: Whitney Jackson
>            Priority: Major
>
> I'm observing RS crashes under heavy replication load:
>
> {code:java}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f7546873b69, pid=29890, tid=36828
> #
> # JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.23+7) (build 11.0.23+7-LTS-222)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.23+7-LTS-222, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
> # Problematic frame:
> # J 24625 c2 org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209]
> {code}
>
> The heavier load comes when a replication peer has been disabled for several hours for patching etc. When the peer is re-enabled the replication load is high until the peer is all caught up. The crashes happen on the cluster receiving the replication edits.
>
> I believe this problem started after upgrading from 2.4.x to 2.5.x.
>
> One possibly relevant non-standard config I run with:
> {code:java}
> <property>
>   <name>hbase.region.store.parallel.put.limit</name>
>   <value>100</value>
>   <description>Added after seeing "failed to accept edits" replication errors in the destination region servers indicating this limit was being exceeded while trying to process replication edits.</description>
> </property>
> {code}
>
> I understand from other Jiras that the problem is likely around direct memory usage by Netty. I haven't yet tried switching the Netty allocator to {{unpooled}} or {{heap}}. I also haven't yet tried any of the {{io.netty.allocator.*}} options.
>
> {{MaxDirectMemorySize}} is set to 26g.
>
> Here's the full stack for the relevant thread:
>
> {code:java}
> Stack: [0x7f72e2e5f000,0x7f72e2f6], sp=0x7f72e2f5e450, free space=1021k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> J 24625 c2 org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209]
> J 26253 c2 org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I (21 bytes) @ 0x7f7545af2d84 [0x7f7545af2d20+0x0064]
> J 22971 c2 org.apache.hadoop.hbase.codec.KeyValueCodecWithTags$KeyValueEncoder.write(Lorg/apache/hadoop/hbase/Cell;)V (27 bytes) @ 0x7f754663f700 [0x7f754663f4c0+0x0240]
> J 25251 c2 org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (90 bytes) @ 0x7f7546a53038 [0x7f7546a50e60+0x21d8]
> J 21182 c2 org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (73 bytes) @ 0x7f7545f4d90c [0x7f7545f4d3a0+0x056c]
> J 21181 c2 org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (149 bytes) @ 0x7f7545fd680c [0x7f7545fd65e0+0x022c]
> J 25389 c2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$$Lambda$247.run()V (16 bytes) @ 0x7f7546ade660 [0x7f7546ade140+0x0520]
> J 24098 c2 org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z
[jira] [Commented] (HBASE-28584) RS SIGSEGV under heavy replication load
[ https://issues.apache.org/jira/browse/HBASE-28584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848778#comment-17848778 ] David Manning commented on HBASE-28584: --- We see it too. We have hbase.region.store.parallel.put.limit=0, but that is also the default in 2.5 after HBASE-26814. For us it always correlates with a lot of load that shows up suddenly, and then replicates to a peer cluster, and that peer cluster throws RegionTooBusyExceptions (blockedRequestCount metric.) > RS SIGSEGV under heavy replication load > --- > > Key: HBASE-28584 > URL: https://issues.apache.org/jira/browse/HBASE-28584 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.5.6 > Environment: RHEL 7.9 > JDK 11.0.23 > Hadoop 3.2.4 > Hbase 2.5.6 >Reporter: Whitney Jackson >Priority: Major > > I'm observing RS crashes under heavy replication load: > > {code:java} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f7546873b69, pid=29890, tid=36828 > # > # JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.23+7) (build > 11.0.23+7-LTS-222) > # Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.23+7-LTS-222, mixed > mode, tiered, compressed oops, g1 gc, linux-amd64) > # Problematic frame: > # J 24625 c2 > org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V > (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209] > {code} > > The heavier load comes when a replication peer has been disabled for several > hours for patching etc. When the peer is re-enabled the replication load is > high until the peer is all caught up. The crashes happen on the cluster > receiving the replication edits. > > I believe this problem started after upgrading from 2.4.x to 2.5.x. 
> > One possibly relevant non-standard config I run with: > {code:java} > > hbase.region.store.parallel.put.limit > > 100 > Added after seeing "failed to accept edits" replication errors > in the destination region servers indicating this limit was being exceeded > while trying to process replication edits. > > {code} > > I understand from other Jiras that the problem is likely around direct memory > usage by Netty. I haven't yet tried switching the Netty allocator to > {{unpooled}} or {{{}heap{}}}. I also haven't yet tried any of the > {{io.netty.allocator.*}} options. > > {{MaxDirectMemorySize}} is set to 26g. > > Here's the full stack for the relevant thread: > > {code:java} > Stack: [0x7f72e2e5f000,0x7f72e2f6], sp=0x7f72e2f5e450, free > space=1021k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > J 24625 c2 > org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V > (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209] > J 26253 c2 > org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I > (21 bytes) @ 0x7f7545af2d84 [0x7f7545af2d20+0x0064] > J 22971 c2 > org.apache.hadoop.hbase.codec.KeyValueCodecWithTags$KeyValueEncoder.write(Lorg/apache/hadoop/hbase/Cell;)V > (27 bytes) @ 0x7f754663f700 [0x7f754663f4c0+0x0240] > J 25251 c2 > org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V > (90 bytes) @ 0x7f7546a53038 [0x7f7546a50e60+0x21d8] > J 21182 c2 > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V > (73 bytes) @ 0x7f7545f4d90c [0x7f7545f4d3a0+0x056c] > J 21181 c2 > 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V > (149 bytes) @ 0x7f7545fd680c [0x7f7545fd65e0+0x022c] > J 25389 c2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$$Lambda$247.run()V > (16 bytes) @ 0x7f7546ade660 [0x7f7546ade140+0x0520] > J 24098 c2 > org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z > (109 bytes) @ 0x7f754678fbb8 [0x7f754678f8e0+0x02d8] > J 27297% c2 > org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run()V (603 > bytes) @ 0x7f75466c4d48 [0x7f75466c4c80+0x00c8] > j > org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run()V+44 > j >
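The crashing frame in the stack above is ByteBufferUtils.copyBufferToStream, whose JNI signature (Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V shows it copies a ByteBuffer range to a stream. The following is a minimal, simplified sketch (not the real HBase implementation) of that operation; a SIGSEGV in the compiled form of such a copy is consistent with the underlying direct memory having been released (e.g. returned to Netty's pool) while the copy is still reading it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CopyBufferSketch {
    // Simplified model of copying [offset, offset+length) of a ByteBuffer
    // to a stream. A heap buffer exposes its backing array; a direct buffer
    // must be read through get(), i.e. through raw native memory -- which
    // segfaults if that memory has already been freed by the allocator.
    static void copyBufferToStream(OutputStream out, ByteBuffer in,
                                   int offset, int length) throws IOException {
        if (in.hasArray()) {
            out.write(in.array(), in.arrayOffset() + offset, length);
        } else {
            for (int i = 0; i < length; i++) {
                out.write(in.get(offset + i));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer direct = ByteBuffer.allocateDirect(8);
        direct.put(new byte[] {1, 2, 3, 4, 5, 6, 7, 8}).flip();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        copyBufferToStream(bos, direct, 2, 4);
        System.out.println(Arrays.toString(bos.toByteArray()));
    }
}
```

This is only a model for reasoning about the crash, not a claim about the exact body of the real method.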
[jira] [Created] (HBASE-28422) SplitWalProcedure will attempt SplitWalRemoteProcedure on the same target RegionServer indefinitely
David Manning created HBASE-28422: - Summary: SplitWalProcedure will attempt SplitWalRemoteProcedure on the same target RegionServer indefinitely Key: HBASE-28422 URL: https://issues.apache.org/jira/browse/HBASE-28422 Project: HBase Issue Type: Bug Components: master, proc-v2, wal Affects Versions: 2.5.5 Reporter: David Manning Similar to HBASE-28050. If HMaster selects a RegionServer for SplitWalRemoteProcedure, it will retry this server as long as the server is alive. I believe this is because even though {{RSProcedureDispatcher.ExecuteProceduresRemoteCall.run}} calls {{{}remoteCallFailed{}}}, there is no logic after this to select a new target server. For {{TransitRegionStateProcedure}} there is logic to select a new server for opening a region, using {{{}forceNewPlan{}}}. But SplitWalRemoteProcedure only has logic to try another server if we receive a {{DoNotRetryIOException}} in SplitWALRemoteProcedure#complete: [https://github.com/apache/hbase/blob/780ff56b3f23e7041ef1b705b7d3d0a53fdd05ae/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SplitWALRemoteProcedure.java#L104-L110] If we receive any other IOException, we will just retry the target server forever. Just like in HBASE-28050, if there is a SaslException, this will never lead to retrying a SplitWalRemoteProcedure on a new server, which can lead to ServerCrashProcedure never finishing until the target server for SplitWalRemoteProcedure is restarted. The following log is seen repeatedly, always sending to the same host. {code:java} 2024-01-31 15:59:43,616 WARN [RSProcedureDispatcher-pool-72846] procedure.SplitWALRemoteProcedure - Failed split of hdfs:///hbase/WALs/,1704984571464-splitting/1704984571464.1706710908543, retry... java.io.IOException: Call to address= failed on local exception: java.io.IOException: Can not send request because relogin is in progress. 
at sun.reflect.GeneratedConstructorAccessor363.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:239) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:391) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:425) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:420) at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114) at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129) at org.apache.hadoop.hbase.ipc.NettyRpcConnection.lambda$sendRequest$4(NettyRpcConnection.java:365) at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403) at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:750) Caused by: java.io.IOException: Can not send request because relogin is in progress. 
at org.apache.hadoop.hbase.ipc.NettyRpcConnection.sendRequest0(NettyRpcConnection.java:321) at org.apache.hadoop.hbase.ipc.NettyRpcConnection.lambda$sendRequest$4(NettyRpcConnection.java:363) ... 8 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
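The retry behavior described in HBASE-28422 can be reduced to one decision: only a DoNotRetryIOException makes SplitWALRemoteProcedure#complete abandon the chosen worker; any other IOException (including a wrapped SaslException like the "relogin is in progress" failure above) retries the same target server. A hedged sketch of that decision, with DoNotRetryIOException as a local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException:

```java
import java.io.IOException;

public class SplitWalRetrySketch {
    // Local stand-in for org.apache.hadoop.hbase.DoNotRetryIOException.
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg) { super(msg); }
    }

    // Sketch (not the real procedure code) of the decision described above:
    // only DoNotRetryIOException causes a new worker to be selected; every
    // other IOException retries the same target server indefinitely.
    static String nextAction(IOException error) {
        if (error instanceof DoNotRetryIOException) {
            return "pick-new-server";
        }
        return "retry-same-server";
    }
}
```

Under this model, a transient-looking but persistent failure such as a SaslException never escapes the "retry-same-server" branch, matching the repeated log line above.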
[jira] [Commented] (HBASE-28344) Flush journal logs are missing from 2.x
[ https://issues.apache.org/jira/browse/HBASE-28344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823462#comment-17823462 ] David Manning commented on HBASE-28344: --- Compaction status journal has the same problem, too. > Flush journal logs are missing from 2.x > > > Key: HBASE-28344 > URL: https://issues.apache.org/jira/browse/HBASE-28344 > Project: HBase > Issue Type: Improvement >Reporter: Prathyusha >Assignee: Prathyusha >Priority: Minor > > After refactoring of TaskMonitor from branch-1 > [ public synchronized MonitoredTask createStatus(String > description)|https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/monitoring/TaskMonitor.java#L87] > to branch-2/master > public MonitoredTask createStatus(String description){ [return > createStatus(description, > false);|https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/monitoring/TaskMonitor.java#L87] > > Flush journal logs are missing. > While flush, currently we do no set ignore monitor flag as true here > [MonitoredTask status = TaskMonitor.get().createStatus("Flushing " + > this);|https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L2459] -- This message was sent by Atlassian Jira (v8.20.10#820010)
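The mechanism described in HBASE-28344 is an overload chain where the one-arg createStatus delegates with a defaulted boolean, so existing callers (like HRegion's flush) silently lose their journal. A simplified model of that pattern follows; the class and method shapes here are illustrative, not the actual TaskMonitor/MonitoredTask API:

```java
public class TaskJournalSketch {
    static class MonitoredTask {
        final String description;
        final StringBuilder journal; // null when journaling is disabled

        MonitoredTask(String description, boolean enableJournal) {
            this.description = description;
            this.journal = enableJournal ? new StringBuilder() : null;
        }

        void setStatus(String status) {
            if (journal != null) {
                journal.append(status).append('\n');
            }
        }

        String journalOrEmpty() {
            return journal == null ? "" : journal.toString();
        }
    }

    // The one-arg overload delegates with the flag defaulted off, so a
    // long-standing caller such as createStatus("Flushing " + region)
    // stops producing a journal after the refactor.
    static MonitoredTask createStatus(String description) {
        return createStatus(description, false);
    }

    static MonitoredTask createStatus(String description, boolean enableJournal) {
        return new MonitoredTask(description, enableJournal);
    }
}
```

The same default would explain the missing compaction status journal mentioned in the comment above.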
[jira] [Commented] (HBASE-25749) Improved logging when interrupting active RPC handlers holding the region close lock (HBASE-25212 hbase.regionserver.close.wait.abort)
[ https://issues.apache.org/jira/browse/HBASE-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818200#comment-17818200 ] David Manning commented on HBASE-25749: --- [~umesh9414] It doesn't let me assign this to you - maybe your profile needs to be updated before issues can be assigned to you. > Improved logging when interrupting active RPC handlers holding the region > close lock (HBASE-25212 hbase.regionserver.close.wait.abort) > -- > > Key: HBASE-25749 > URL: https://issues.apache.org/jira/browse/HBASE-25749 > Project: HBase > Issue Type: Bug > Components: regionserver, rpc >Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.4.0 >Reporter: David Manning >Priority: Minor > Fix For: 3.0.0-beta-2 > > > HBASE-25212 adds an optional improvement to Close Region, for interrupting > active RPC handlers holding the region close lock. If, after the timeout is > reached, the close lock can still not be acquired, the regionserver may > abort. It would be helpful to add logging for which threads or components are > holding the region close lock at this time. > Depending on the size of regionLockHolders, or use of any stack traces, log > output may need to be truncated. The interrupt code is in > HRegion#interruptRegionOperations. -- This message was sent by Atlassian Jira (v8.20.10#820010)
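The description above suggests logging which operations hold the close lock, truncated when regionLockHolders is large. One way such truncation could look, sketched with illustrative names (not the real HRegion fields or any actual HBase helper):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CloseLockLoggingSketch {
    // Render at most `max` lock-holder descriptions, then summarize the
    // remainder, so the abort-path log line stays bounded in size.
    static String describeHolders(List<String> regionLockHolders, int max) {
        String suffix = regionLockHolders.size() > max
            ? " ... (" + (regionLockHolders.size() - max) + " more)"
            : "";
        return regionLockHolders.stream()
            .limit(max)
            .collect(Collectors.joining(", ")) + suffix;
    }

    public static void main(String[] args) {
        List<String> holders = Arrays.asList("scan-1", "mutate-2", "scan-3");
        // e.g. logged just before interrupting handlers on close timeout
        System.out.println("close lock held by: " + describeHolders(holders, 2));
    }
}
```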
[jira] [Commented] (HBASE-28221) Introduce regionserver metric for delayed flushes
[ https://issues.apache.org/jira/browse/HBASE-28221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816282#comment-17816282 ] David Manning commented on HBASE-28221: --- Alerting on {{flushQueueLength}} is not really the same. You will capture every delayed flush from {{PeriodicMemstoreFlusher}} as well. So it will capture both cases where you enqueue a delayed flush: either for a very busy region beyond {{blockingStoreFiles}} limit or for a region which is mostly idle and flushing after memstore edits are an hour old. Alerting on {{blockedRequestsCount}} will give you a stronger signal, because this is when you have a {{RegionTooBusyException}} due to the memstore being full, waiting on a delayed flush. But if you want to alert on a delayed flush without a full memstore, I don't know that it could be done today without adding a new metric. If you have site-wide settings for {{blockingStoreFiles}}, you could alert when {{maxStoreFileCount}} is above, or near, {{blockingStoreFiles}}. But if it varies by table, you would have to alert per-table. So there could still be some value in adding this type of metric (but consider whether alerting on client impact, i.e. {{RegionTooBusyException}} and {{blockedRequestsCount}} would be sufficient first.) [~rkrahul324] [~vjasani] > Introduce regionserver metric for delayed flushes > - > > Key: HBASE-28221 > URL: https://issues.apache.org/jira/browse/HBASE-28221 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.4.17, 2.5.6 >Reporter: Viraj Jasani >Assignee: Rahul Kumar >Priority: Major > Fix For: 2.4.18, 2.7.0, 2.5.8, 3.0.0-beta-2, 2.6.1 > > > If compaction is disabled temporarily to allow stabilizing hdfs load, we can > forget re-enabling the compaction. This can result into flushes getting > delayed for "hbase.hstore.blockingWaitTime" time (90s). 
While flushes do > happen eventually after waiting for max blocking time, it is important to > realize that any cluster cannot function well with compaction disabled for > significant amount of time. > > We would also block any write requests until region is flushed (90+ sec, by > default): > {code:java} > 2023-11-27 20:40:52,124 WARN [,queue=18,port=60020] regionserver.HRegion - > Region is too busy due to exceeding memstore size limit. > org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, > regionName=table1,1699923733811.4fd5e52e2133df1e347f32c646f23ab4., > server=server-1,60020,1699421714454, memstoreSize=1073820928, > blockingMemStoreSize=1073741824 > at > org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4200) > at > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3264) > at > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3215) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:967) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:895) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2524) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36812) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2432) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291) > {code} > > Delayed flush logs: > {code:java} > LOG.warn("{} has too many store files({}); delaying flush up to {} ms", > region.getRegionInfo().getEncodedName(), getStoreFileCount(region), > this.blockingWaitTime); {code} > Suggestion: Introduce regionserver metric (MetricsRegionServerSource) for the > num of flushes getting delayed due to too many 
store files. -- This message was sent by Atlassian Jira (v8.20.10#820010)
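The metric proposed above would be incremented exactly where the "has too many store files; delaying flush" warning is logged. A hedged sketch of that counter; the field and method names are illustrative, not the actual MetricsRegionServerSource API:

```java
import java.util.concurrent.atomic.AtomicLong;

public class DelayedFlushMetricSketch {
    // Proposed counter: flushes delayed because the region is at or over
    // the blockingStoreFiles limit (the branch that logs "delaying flush
    // up to hbase.hstore.blockingWaitTime ms").
    final AtomicLong flushesDelayedForTooManyStoreFiles = new AtomicLong();

    void onFlushRequest(int storeFileCount, int blockingStoreFiles) {
        if (storeFileCount >= blockingStoreFiles) {
            flushesDelayedForTooManyStoreFiles.incrementAndGet();
        }
    }
}
```

Unlike flushQueueLength, this counter would not mix in the periodic flushes of idle regions, so it isolates the "compaction left disabled" signal the issue describes.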
[jira] [Assigned] (HBASE-28257) Memstore flushRequest can be blocked by a delayed flush scheduled by PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes
[ https://issues.apache.org/jira/browse/HBASE-28257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning reassigned HBASE-28257: - Assignee: David Manning > Memstore flushRequest can be blocked by a delayed flush scheduled by > PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes > - > > Key: HBASE-28257 > URL: https://issues.apache.org/jira/browse/HBASE-28257 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 3.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > > *Steps to reproduce:* > # Make an edit to a region. > # Wait 1 hour + 10 seconds (default value of > {{hbase.regionserver.optionalcacheflushinterval}} plus > {{hbase.regionserver.flush.check.period}}.) > # Make a very large number of edits to the region (i.e. >= 1GB, pressure the > memstore.) > *Expected:* > Memstore pressure leads to flushes. > *Result:* > The PeriodicMemstoreFlusher has scheduled a refresh with a random delay of > 0-5 minutes (default for > {{hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds}}.) Memstore > pressure flushes are blocked by the scheduled delayed flush. Client receives > 0-5 minutes of {{RegionTooBusyExceptions}} until the delayed flush executes. > *Logs:* > 2023-12-13 06:00:13,573 INFO regionserver.HRegionServer - > MemstoreFlusherChore requesting flush of because has an old > edit so flush to free WALs after random delay 166761 ms > 2023-12-13 06:00:53,219 DEBUG regionserver.HRegion - Flush requested on > > 2023-12-13 06:01:47,694 WARN regionserver.HRegion - Region is too busy due to > exceeding memstore size limit. > org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, > regionName=, server= > at org.apache.hadoop.hbase.regionserver.HRegion.checkResources > ... > repeats > ... > 2023-12-13 06:01:52,223 WARN regionserver.HRegion - Region is too busy due to > exceeding memstore size limit. 
> org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, > regionName=, server= > at org.apache.hadoop.hbase.regionserver.HRegion.checkResources > ... > 2023-12-13 06:03:00,340 INFO regionserver.HRegion - Flushing 1/1 > column families, dataSize=534.77 MB heapSize=1.00 GB > 2023-12-13 06:03:09,634 INFO regionserver.HRegion - Finished flush of > dataSize ~534.77 MB/560744948, heapSize ~1.00 GB/1073816296, currentSize=0 > B/0 for in 9294ms, sequenceid=21310753, compaction requested=false > Note also this is the same cause as discussed in HBASE-16030 conversation > https://issues.apache.org/jira/browse/HBASE-16030?focusedCommentId=15340153=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15340153 -- This message was sent by Atlassian Jira (v8.20.10#820010)
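The interaction in the steps above can be modeled as a per-region flush queue where a later request is a no-op if an entry is already queued. This is a hedged model, not the real MemStoreFlusher: once the periodic flusher enqueues a flush with a random 0-5 minute delay, the memstore-pressure "Flush requested" path is swallowed, and writers see RegionTooBusyException until the delayed entry runs.

```java
import java.util.HashMap;
import java.util.Map;

public class FlushQueueSketch {
    // Default hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds
    static final long MAX_RANDOM_DELAY_MS = 5 * 60 * 1000L;

    // region -> absolute time (ms) at which the queued flush will run
    final Map<String, Long> queuedFlushTimes = new HashMap<>();

    // PeriodicMemstoreFlusher path: "flush to free WALs after random delay"
    void periodicFlushRequest(String region, long nowMs, long randomDelayMs) {
        queuedFlushTimes.putIfAbsent(region, nowMs + randomDelayMs);
    }

    // Memstore-pressure path: returns false when an already-queued
    // (possibly far-future) delayed entry swallows the request.
    boolean pressureFlushRequest(String region, long nowMs) {
        return queuedFlushTimes.putIfAbsent(region, nowMs) == null;
    }
}
```

In the logs above, the 166761 ms random delay queued at 06:00:13 is exactly what makes the 06:00:53 pressure flush request a no-op until 06:03:00.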
[jira] [Commented] (HBASE-28293) Add metric for GetClusterStatus request count.
[ https://issues.apache.org/jira/browse/HBASE-28293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804409#comment-17804409 ] David Manning commented on HBASE-28293: --- Yeah it would be nice to follow similar patterns as for other HMaster operations, like Move, Snapshot, etc. But I think most of those are now tracked by procedures, which we would not have in this case. > Add metric for GetClusterStatus request count. > -- > > Key: HBASE-28293 > URL: https://issues.apache.org/jira/browse/HBASE-28293 > Project: HBase > Issue Type: Bug >Reporter: Rushabh Shah >Priority: Major > > We have been bitten multiple times by GetClusterStatus request overwhelming > HMaster's memory usage. It would be good to add a metric for the total > GetClusterStatus requests count. > In almost all of our production incidents involving GetClusterStatus request, > HMaster were running out of memory with many clients call this RPC in > parallel and the response size is very big. > In hbase2 we have > [ClusterMetrics.Option|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterMetrics.java#L164-L224] > which can reduce the size of the response. > It would be nice to add another metric to indicate if the response size of > GetClusterStatus is greater than some threshold (like 5MB) -- This message was sent by Atlassian Jira (v8.20.10#820010)
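The two metrics discussed above (a request counter plus a large-response counter) could be maintained together at the RPC entry point. A hedged sketch; the names and the 5 MB threshold are illustrative, not existing HBase metrics:

```java
import java.util.concurrent.atomic.AtomicLong;

public class ClusterStatusMetricsSketch {
    // Illustrative threshold from the discussion above (5 MB).
    static final long LARGE_RESPONSE_THRESHOLD_BYTES = 5L * 1024 * 1024;

    final AtomicLong getClusterStatusRequests = new AtomicLong();
    final AtomicLong getClusterStatusLargeResponses = new AtomicLong();

    // Called once per GetClusterStatus RPC with the serialized response size.
    void onGetClusterStatus(long responseSizeBytes) {
        getClusterStatusRequests.incrementAndGet();
        if (responseSizeBytes > LARGE_RESPONSE_THRESHOLD_BYTES) {
            getClusterStatusLargeResponses.incrementAndGet();
        }
    }
}
```

A spike in the first counter, or any growth in the second, would flag the parallel-big-response pattern that has caused HMaster memory pressure.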
[jira] [Commented] (HBASE-28271) Infinite waiting on lock acquisition by snapshot can result in unresponsive master
[ https://issues.apache.org/jira/browse/HBASE-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799543#comment-17799543 ] David Manning commented on HBASE-28271: --- {quote}In cases where a region stays in RIT for considerable time, if enough attempts are made by the client to create snapshots on the table, it can easily exhaust all handler threads, leading to potentially unresponsive master.{quote} It can happen more easily than this, too, because you don't have to make repeat attempts to create snapshot on the same table. You can attempt to snapshot a different table, and it will still hang a new RPC handler. This is because the {{SnapshotManager#snapshotTable}} is {{synchronized}} and this is where the {{handler.prepare()}} call is made to acquire the lock. We indefinitely await the lock held by the region in transition, but we do so within {{SnapshotManager}}'s synchronized block. Any additional snapshot RPC, even for a different table, will end up blocked on entering a separate {{synchronized}} method in {{SnapshotManager#cleanupSentinels}}. This makes the condition easier to hit if you are doing a process which snapshots all tables in the cluster. > Infinite waiting on lock acquisition by snapshot can result in unresponsive > master > -- > > Key: HBASE-28271 > URL: https://issues.apache.org/jira/browse/HBASE-28271 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0-alpha-4, 2.4.17, 2.5.7 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Attachments: image.png > > > When a region is stuck in transition for significant time, any attempt to > take snapshot on the table would keep master handler thread in forever > waiting state. 
As part of the creating snapshot on enabled or disabled table, > in order to get the table level lock, LockProcedure is executed but if any > region of the table is in transition, LockProcedure could not be executed by > the snapshot handler, resulting in forever waiting until the region > transition is completed, allowing the table level lock to be acquired by the > snapshot handler. > In cases where a region stays in RIT for considerable time, if enough > attempts are made by the client to create snapshots on the table, it can > easily exhaust all handler threads, leading to potentially unresponsive > master. Attached a sample thread dump. > Proposal: The snapshot handler should not stay stuck forever if it cannot > take table level lock, it should fail-fast. > !image.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
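The blocking described in the comment above can be demonstrated with a small model: two methods synchronized on the same instance, where the first waits indefinitely (for the region in transition) while holding the monitor, so the second (which every snapshot RPC passes through) blocks even for unrelated tables. This is a hedged model of SnapshotManager's structure, not its actual code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SnapshotMonitorSketch {
    final CountDownLatch entered = new CountDownLatch(1);
    final CountDownLatch ritResolved = new CountDownLatch(1);

    synchronized void snapshotTable(String table) throws InterruptedException {
        entered.countDown();
        // Stand-in for handler.prepare() awaiting the table lock; note the
        // instance monitor stays held for the entire (unbounded) wait.
        while (!ritResolved.await(5, TimeUnit.MILLISECONDS)) {
            // still waiting for the region transition to finish
        }
    }

    synchronized void cleanupSentinels() {
        // every snapshot RPC blocks here on the instance monitor
    }

    public static void main(String[] args) throws Exception {
        SnapshotMonitorSketch mgr = new SnapshotMonitorSketch();
        Thread stuck = new Thread(() -> {
            try { mgr.snapshotTable("tableWithRIT"); } catch (InterruptedException ignored) { }
        });
        stuck.start();
        mgr.entered.await(); // snapshotTable now holds the monitor
        Thread other = new Thread(mgr::cleanupSentinels); // RPC for a different table
        other.start();
        other.join(200);
        System.out.println("second snapshot RPC blocked: " + other.isAlive());
        mgr.ritResolved.countDown(); // region transition completes
        stuck.join();
        other.join();
    }
}
```

A fail-fast prepare (bounded lock wait) or moving the wait outside the synchronized section would break exactly this chain.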
[jira] [Resolved] (HBASE-28257) Memstore flushRequest can be blocked by a delayed flush scheduled by PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes
[ https://issues.apache.org/jira/browse/HBASE-28257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning resolved HBASE-28257. --- Resolution: Duplicate > Memstore flushRequest can be blocked by a delayed flush scheduled by > PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes > - > > Key: HBASE-28257 > URL: https://issues.apache.org/jira/browse/HBASE-28257 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 3.0.0 >Reporter: David Manning >Priority: Minor > > *Steps to reproduce:* > # Make an edit to a region. > # Wait 1 hour + 10 seconds (default value of > {{hbase.regionserver.optionalcacheflushinterval}} plus > {{hbase.regionserver.flush.check.period}}.) > # Make a very large number of edits to the region (i.e. >= 1GB, pressure the > memstore.) > *Expected:* > Memstore pressure leads to flushes. > *Result:* > The PeriodicMemstoreFlusher has scheduled a refresh with a random delay of > 0-5 minutes (default for > {{hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds}}.) Memstore > pressure flushes are blocked by the scheduled delayed flush. Client receives > 0-5 minutes of {{RegionTooBusyExceptions}} until the delayed flush executes. > *Logs:* > 2023-12-13 06:00:13,573 INFO regionserver.HRegionServer - > MemstoreFlusherChore requesting flush of because has an old > edit so flush to free WALs after random delay 166761 ms > 2023-12-13 06:00:53,219 DEBUG regionserver.HRegion - Flush requested on > > 2023-12-13 06:01:47,694 WARN regionserver.HRegion - Region is too busy due to > exceeding memstore size limit. > org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, > regionName=, server= > at org.apache.hadoop.hbase.regionserver.HRegion.checkResources > ... > repeats > ... > 2023-12-13 06:01:52,223 WARN regionserver.HRegion - Region is too busy due to > exceeding memstore size limit. 
> org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, > regionName=, server= > at org.apache.hadoop.hbase.regionserver.HRegion.checkResources > ... > 2023-12-13 06:03:00,340 INFO regionserver.HRegion - Flushing 1/1 > column families, dataSize=534.77 MB heapSize=1.00 GB > 2023-12-13 06:03:09,634 INFO regionserver.HRegion - Finished flush of > dataSize ~534.77 MB/560744948, heapSize ~1.00 GB/1073816296, currentSize=0 > B/0 for in 9294ms, sequenceid=21310753, compaction requested=false > Note also this is the same cause as discussed in HBASE-16030 conversation > https://issues.apache.org/jira/browse/HBASE-16030?focusedCommentId=15340153=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15340153 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HBASE-28257) Memstore flushRequest can be blocked by a delayed flush scheduled by PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes
[ https://issues.apache.org/jira/browse/HBASE-28257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning reassigned HBASE-28257: - Assignee: (was: David Manning) > Memstore flushRequest can be blocked by a delayed flush scheduled by > PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes > - > > Key: HBASE-28257 > URL: https://issues.apache.org/jira/browse/HBASE-28257 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 3.0.0 >Reporter: David Manning >Priority: Minor > > *Steps to reproduce:* > # Make an edit to a region. > # Wait 1 hour + 10 seconds (default value of > {{hbase.regionserver.optionalcacheflushinterval}} plus > {{hbase.regionserver.flush.check.period}}.) > # Make a very large number of edits to the region (i.e. >= 1GB, pressure the > memstore.) > *Expected:* > Memstore pressure leads to flushes. > *Result:* > The PeriodicMemstoreFlusher has scheduled a refresh with a random delay of > 0-5 minutes (default for > {{hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds}}.) Memstore > pressure flushes are blocked by the scheduled delayed flush. Client receives > 0-5 minutes of {{RegionTooBusyExceptions}} until the delayed flush executes. > *Logs:* > 2023-12-13 06:00:13,573 INFO regionserver.HRegionServer - > MemstoreFlusherChore requesting flush of because has an old > edit so flush to free WALs after random delay 166761 ms > 2023-12-13 06:00:53,219 DEBUG regionserver.HRegion - Flush requested on > > 2023-12-13 06:01:47,694 WARN regionserver.HRegion - Region is too busy due to > exceeding memstore size limit. > org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, > regionName=, server= > at org.apache.hadoop.hbase.regionserver.HRegion.checkResources > ... > repeats > ... > 2023-12-13 06:01:52,223 WARN regionserver.HRegion - Region is too busy due to > exceeding memstore size limit. 
> org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, > regionName=, server= > at org.apache.hadoop.hbase.regionserver.HRegion.checkResources > ... > 2023-12-13 06:03:00,340 INFO regionserver.HRegion - Flushing 1/1 > column families, dataSize=534.77 MB heapSize=1.00 GB > 2023-12-13 06:03:09,634 INFO regionserver.HRegion - Finished flush of > dataSize ~534.77 MB/560744948, heapSize ~1.00 GB/1073816296, currentSize=0 > B/0 for in 9294ms, sequenceid=21310753, compaction requested=false > Note also this is the same cause as discussed in HBASE-16030 conversation > https://issues.apache.org/jira/browse/HBASE-16030?focusedCommentId=15340153=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15340153 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28257) Memstore flushRequest can be blocked by a delayed flush scheduled by PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes
[ https://issues.apache.org/jira/browse/HBASE-28257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-28257: -- Description: *Steps to reproduce:* # Make an edit to a region. # Wait 1 hour + 10 seconds (default value of {{hbase.regionserver.optionalcacheflushinterval}} plus {{hbase.regionserver.flush.check.period}}.) # Make a very large number of edits to the region (i.e. >= 1GB, pressure the memstore.) *Expected:* Memstore pressure leads to flushes. *Result:* The PeriodicMemstoreFlusher has scheduled a refresh with a random delay of 0-5 minutes (default for {{hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds}}.) Memstore pressure flushes are blocked by the scheduled delayed flush. Client receives 0-5 minutes of {{RegionTooBusyExceptions}} until the delayed flush executes. *Logs:* 2023-12-13 06:00:13,573 INFO regionserver.HRegionServer - MemstoreFlusherChore requesting flush of because has an old edit so flush to free WALs after random delay 166761 ms 2023-12-13 06:00:53,219 DEBUG regionserver.HRegion - Flush requested on 2023-12-13 06:01:47,694 WARN regionserver.HRegion - Region is too busy due to exceeding memstore size limit. org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, regionName=, server= at org.apache.hadoop.hbase.regionserver.HRegion.checkResources ... repeats ... 2023-12-13 06:01:52,223 WARN regionserver.HRegion - Region is too busy due to exceeding memstore size limit. org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, regionName=, server= at org.apache.hadoop.hbase.regionserver.HRegion.checkResources ... 
2023-12-13 06:03:00,340 INFO regionserver.HRegion - Flushing 1/1 column families, dataSize=534.77 MB heapSize=1.00 GB 2023-12-13 06:03:09,634 INFO regionserver.HRegion - Finished flush of dataSize ~534.77 MB/560744948, heapSize ~1.00 GB/1073816296, currentSize=0 B/0 for in 9294ms, sequenceid=21310753, compaction requested=false Note also this is the same cause as discussed in HBASE-16030 conversation https://issues.apache.org/jira/browse/HBASE-16030?focusedCommentId=15340153=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15340153 was: *Steps to reproduce:* # Make an edit to a region. # Wait 1 hour (default value of hbase.regionserver.optionalcacheflushinterval.) # Make a very large number of edits to the region (i.e. >= 1GB, pressure the memstore.) *Expected:* Memstore pressure leads to flushes. *Result:* The PeriodicMemstoreFlusher has scheduled a refresh with a random delay of 0-5 minutes (default for hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds.) Memstore pressure flushes are blocked by the scheduled delayed flush. Client receives 0-5 minutes of RegionTooBusyExceptions until the delayed flush executes. *Logs:* 2023-12-13 06:00:13,573 INFO regionserver.HRegionServer - MemstoreFlusherChore requesting flush of because has an old edit so flush to free WALs after random delay 166761 ms 2023-12-13 06:00:53,219 DEBUG regionserver.HRegion - Flush requested on 2023-12-13 06:01:47,694 WARN regionserver.HRegion - Region is too busy due to exceeding memstore size limit. org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, regionName=, server= at org.apache.hadoop.hbase.regionserver.HRegion.checkResources ... repeats ... 2023-12-13 06:01:52,223 WARN regionserver.HRegion - Region is too busy due to exceeding memstore size limit. org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, regionName=, server= at org.apache.hadoop.hbase.regionserver.HRegion.checkResources ... 
2023-12-13 06:03:00,340 INFO regionserver.HRegion - Flushing 1/1 column families, dataSize=534.77 MB heapSize=1.00 GB 2023-12-13 06:03:09,634 INFO regionserver.HRegion - Finished flush of dataSize ~534.77 MB/560744948, heapSize ~1.00 GB/1073816296, currentSize=0 B/0 for in 9294ms, sequenceid=21310753, compaction requested=false Note also this is the same cause as discussed in HBASE-16030 conversation https://issues.apache.org/jira/browse/HBASE-16030?focusedCommentId=15340153=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15340153 > Memstore flushRequest can be blocked by a delayed flush scheduled by > PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes > - > > Key: HBASE-28257 > URL: https://issues.apache.org/jira/browse/HBASE-28257 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 3.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > > *Steps to reproduce:* > # Make an edit to a region. > #
[jira] [Created] (HBASE-28257) Memstore flushRequest can be blocked by a delayed flush scheduled by PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes
David Manning created HBASE-28257: - Summary: Memstore flushRequest can be blocked by a delayed flush scheduled by PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes Key: HBASE-28257 URL: https://issues.apache.org/jira/browse/HBASE-28257 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 2.0.0, 3.0.0 Reporter: David Manning Assignee: David Manning *Steps to reproduce:* # Make an edit to a region. # Wait 1 hour (default value of hbase.regionserver.optionalcacheflushinterval.) # Make a very large number of edits to the region (i.e. >= 1GB, pressure the memstore.) *Expected:* Memstore pressure leads to flushes. *Result:* The PeriodicMemstoreFlusher has scheduled a refresh with a random delay of 0-5 minutes (default for hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds.) Memstore pressure flushes are blocked by the scheduled delayed flush. Client receives 0-5 minutes of RegionTooBusyExceptions until the delayed flush executes. *Logs:* 2023-12-13 06:00:13,573 INFO regionserver.HRegionServer - MemstoreFlusherChore requesting flush of because has an old edit so flush to free WALs after random delay 166761 ms 2023-12-13 06:00:53,219 DEBUG regionserver.HRegion - Flush requested on 2023-12-13 06:01:47,694 WARN regionserver.HRegion - Region is too busy due to exceeding memstore size limit. org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, regionName=, server= at org.apache.hadoop.hbase.regionserver.HRegion.checkResources ... repeats ... 2023-12-13 06:01:52,223 WARN regionserver.HRegion - Region is too busy due to exceeding memstore size limit. org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, regionName=, server= at org.apache.hadoop.hbase.regionserver.HRegion.checkResources ... 
2023-12-13 06:03:00,340 INFO regionserver.HRegion - Flushing 1/1 column families, dataSize=534.77 MB heapSize=1.00 GB 2023-12-13 06:03:09,634 INFO regionserver.HRegion - Finished flush of dataSize ~534.77 MB/560744948, heapSize ~1.00 GB/1073816296, currentSize=0 B/0 for in 9294ms, sequenceid=21310753, compaction requested=false Note also this is the same cause as discussed in HBASE-16030 conversation https://issues.apache.org/jira/browse/HBASE-16030?focusedCommentId=15340153=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15340153 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-20034) Make periodic flusher delay configurable
[ https://issues.apache.org/jira/browse/HBASE-20034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-20034: -- Resolution: Duplicate Status: Resolved (was: Patch Available) > Make periodic flusher delay configurable > > > Key: HBASE-20034 > URL: https://issues.apache.org/jira/browse/HBASE-20034 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 3.0.0-alpha-1 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Major > Attachments: HBASE-20034.branch-1.patch, HBASE-20034.master.patch > > > PeriodicMemstoreFlusher is currently configured to flush with a random delay > of up to 5 minutes. Make this configurable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-21785) master reports open regions as RITs and also messes up rit age metric
[ https://issues.apache.org/jira/browse/HBASE-21785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788523#comment-17788523 ] David Manning commented on HBASE-21785: --- [~sershe] This says fixed in 2.2.0, but I don't see the commit https://github.com/apache/hbase/commit/9ef6bc4323c9be0e18f0cf9918a582e6b4a11853 in branch-2. > master reports open regions as RITs and also messes up rit age metric > - > > Key: HBASE-21785 > URL: https://issues.apache.org/jira/browse/HBASE-21785 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.2.0 > > Attachments: HBASE-21785.01.patch, HBASE-21785.patch > > > {noformat} > RegionState RIT time (ms) Retries > dba183f0dadfcc9dc8ae0a6dd59c84e6 dba183f0dadfcc9dc8ae0a6dd59c84e6. > state=OPEN, ts=Wed Dec 31 16:00:00 PST 1969 (1548453918s ago), > server=server,17020,1548452922054 1548453918735 0 > {noformat} > RIT age metric also gets set to a bogus value. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-25222) Add a cost function to move the daughter regions of a recent split to different region servers
[ https://issues.apache.org/jira/browse/HBASE-25222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning resolved HBASE-25222. --- Resolution: Duplicate > Add a cost function to move the daughter regions of a recent split to > different region servers > --- > > Key: HBASE-25222 > URL: https://issues.apache.org/jira/browse/HBASE-25222 > Project: HBase > Issue Type: Improvement >Reporter: Sandeep Pal >Assignee: Sandeep Pal >Priority: Major > > In HBase, hotspot regions are easily formed whenever there is skew and there > is high write volume. A few regions grow really fast, which also becomes the > bottleneck on those few region servers. > It would be beneficial to add a cost function to move the regions after the > split to different region servers. In this way the writes to the hot key range > will be distributed to multiple region servers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-25222) Add a cost function to move the daughter regions of a recent split to different region servers
[ https://issues.apache.org/jira/browse/HBASE-25222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1498#comment-1498 ] David Manning commented on HBASE-25222: --- {{hbase.master.auto.separate.child.regions.after.split.enabled}} was introduced in HBASE-25518. It is {{false}} by default, but it seems to solve the problem this issue describes. > Add a cost function to move the daughter regions of a recent split to > different region servers > --- > > Key: HBASE-25222 > URL: https://issues.apache.org/jira/browse/HBASE-25222 > Project: HBase > Issue Type: Improvement >Reporter: Sandeep Pal >Assignee: Sandeep Pal >Priority: Major > > In HBase, hotspot regions are easily formed whenever there is skew and there > is high write volume. A few regions grow really fast, which also becomes the > bottleneck on those few region servers. > It would be beneficial to add a cost function to move the regions after the > split to different region servers. In this way the writes to the hot key range > will be distributed to multiple region servers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-25222) Add a cost function to move the daughter regions of a recent split to different region servers
[ https://issues.apache.org/jira/browse/HBASE-25222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774707#comment-17774707 ] David Manning commented on HBASE-25222: --- It seems that in HBase 2.x this is already less of an issue, because the SplitTableRegionProcedure will choose new servers to open the daughter regions. In HBase 1.x, the daughter regions would open on the same server as the parent region. > Add a cost function to move the daughter regions of a recent split to > different region servers > --- > > Key: HBASE-25222 > URL: https://issues.apache.org/jira/browse/HBASE-25222 > Project: HBase > Issue Type: Improvement >Reporter: Sandeep Pal >Assignee: Sandeep Pal >Priority: Major > > In HBase, hotspot regions are easily formed whenever there is skew and there > is high write volume. A few regions grow really fast, which also becomes the > bottleneck on those few region servers. > It would be beneficial to add a cost function to move the regions after the > split to different region servers. In this way the writes to the hot key range > will be distributed to multiple region servers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-27540) Client metrics for success/failure counts.
[ https://issues.apache.org/jira/browse/HBASE-27540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-27540: -- Component/s: metrics > Client metrics for success/failure counts. > -- > > Key: HBASE-27540 > URL: https://issues.apache.org/jira/browse/HBASE-27540 > Project: HBase > Issue Type: Improvement > Components: Client, metrics >Affects Versions: 3.0.0-alpha-3, 2.5.2 >Reporter: Victor Li >Assignee: Victor Li >Priority: Major > Fix For: 3.0.0-alpha-4, 2.4.16, 2.5.3 > > > Client metrics for total success or failure counts of related > RPC calls like get, mutate, scan, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-15242) Client metrics for retries and timeouts
[ https://issues.apache.org/jira/browse/HBASE-15242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687334#comment-17687334 ] David Manning commented on HBASE-15242: --- For example, a {{RetriesExhaustedException}} will tell us we were doing retries and still failed. A {{CallTimeoutException}} will tell us that we hit a timeout. We could choose a subset of exceptions to instrument for metrics, just like the regionserver does. > Client metrics for retries and timeouts > --- > > Key: HBASE-15242 > URL: https://issues.apache.org/jira/browse/HBASE-15242 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Mikhail Antonov >Assignee: Victor Li >Priority: Major > > Client metrics to see total/avg number of retries, retries exhausted and > timeouts. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-15242) Client metrics for retries and timeouts
[ https://issues.apache.org/jira/browse/HBASE-15242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687333#comment-17687333 ] David Manning commented on HBASE-15242: --- [~vli02us] Maybe we can report exception counts for some specific exceptions? This may be enough to give details about retries and timeouts and other errors too. We could do something similar to what the server metrics show: https://github.com/apache/hbase/blob/a854cba59f52bd5574b55146352b2236a718f6b0/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/MetricsHBaseServer.java#L100-L107 > Client metrics for retries and timeouts > --- > > Key: HBASE-15242 > URL: https://issues.apache.org/jira/browse/HBASE-15242 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Mikhail Antonov >Assignee: Victor Li >Priority: Major > > Client metrics to see total/avg number of retries, retries exhausted and > timeouts. -- This message was sent by Atlassian Jira (v8.20.10#820010)
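The approach sketched in these two comments (counting a chosen subset of exception types, similar in spirit to the server-side MetricsHBaseServer counters) could look roughly like this on the client. All class and method names below are illustrative, not HBase's actual client metrics API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch of per-exception-type client counters. A real integration would
// increment these in the RPC call path's catch blocks and surface them
// through the client's metrics registry.
public class ClientExceptionMetrics {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Record one occurrence of the given exception type, keyed by simple class name.
    public void record(Throwable t) {
        counters.computeIfAbsent(t.getClass().getSimpleName(), k -> new LongAdder())
                .increment();
    }

    public long count(String exceptionSimpleName) {
        LongAdder a = counters.get(exceptionSimpleName);
        return a == null ? 0 : a.sum();
    }

    public static void main(String[] args) {
        ClientExceptionMetrics metrics = new ClientExceptionMetrics();
        // Simulate one timed-out call and two exhausted-retries failures.
        metrics.record(new java.util.concurrent.TimeoutException("rpc timed out"));
        metrics.record(new RuntimeException("retries exhausted"));
        metrics.record(new RuntimeException("retries exhausted"));
        System.out.println("TimeoutException: " + metrics.count("TimeoutException")); // 1
        System.out.println("RuntimeException: " + metrics.count("RuntimeException")); // 2
    }
}
```

Keying by class name keeps the metric cardinality bounded by the chosen subset of exception types rather than by message content.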
[jira] [Updated] (HBASE-27159) Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of hits and misses for cacheable requests
[ https://issues.apache.org/jira/browse/HBASE-27159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-27159: -- Status: Patch Available (was: Open) > Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of > hits and misses for cacheable requests > > > Key: HBASE-27159 > URL: https://issues.apache.org/jira/browse/HBASE-27159 > Project: HBase > Issue Type: Improvement > Components: BlockCache, metrics >Affects Versions: 2.0.0, 3.0.0-alpha-1 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > > [https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400] > {code:java} > public double getHitCachingRatio() { > double requestCachingCount = getRequestCachingCount(); > if (requestCachingCount == 0) { > return 0; > } > return getHitCachingCount() / requestCachingCount; > } {code} > This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. > The metric represents the percentage of requests which were cacheable, but > not found in the cache. Unfortunately, since the counters are process-level > counters, the ratio is for the lifetime of the process. This makes it less > useful for looking at cache behavior during a smaller time period. > The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. > Having access to the underlying counters allows for offline computation of > the same metric for any given time period. But these counters are not emitted > today from {{{}MetricsRegionServerWrapperImpl.java{}}}. > Compare this to {{hitCount}} and {{missCount}} which are emitted as metrics > {{blockCacheHitCount}} and {{{}blockCacheMissCount{}}}. But these are raw > counts for the cache, which include requests that are not cacheable. The > cacheable metrics are more interesting, since it can be common to miss on a > request which is not cacheable. 
> Interestingly, these metrics are emitted regularly as part of a log line in > {{{}StatisticsThread.logStats{}}}. > We should emit blockCache{{{}HitCachingCount{}}} and > {{blockCacheMissCachingCount}} along with the current metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010)
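As the description notes, emitting the raw caching counters would allow offline computation of the hit-caching ratio over any window. A minimal sketch of that computation, assuming two scrapes of the proposed cumulative counters (the Snapshot type and field names here are illustrative):

```java
// Sketch of the offline computation the description motivates: given two
// scrapes of the cumulative caching counters, derive the hit-caching ratio
// for just the interval between them, instead of the process lifetime.
public class IntervalCacheRatio {
    static final class Snapshot {
        final long hitCachingCount;
        final long missCachingCount;
        Snapshot(long hits, long misses) {
            this.hitCachingCount = hits;
            this.missCachingCount = misses;
        }
    }

    // Ratio of cacheable hits to cacheable requests within (earlier, later].
    static double intervalHitCachingRatio(Snapshot earlier, Snapshot later) {
        long hits = later.hitCachingCount - earlier.hitCachingCount;
        long misses = later.missCachingCount - earlier.missCachingCount;
        long requests = hits + misses;
        if (requests == 0) {
            return 0; // mirror getHitCachingRatio's zero-requests behavior
        }
        return (double) hits / requests;
    }

    public static void main(String[] args) {
        // The lifetime ratio looks healthy, hiding a bad recent window.
        Snapshot t0 = new Snapshot(9_000, 1_000);
        Snapshot t1 = new Snapshot(9_100, 1_900);
        double lifetime =
            (double) t1.hitCachingCount / (t1.hitCachingCount + t1.missCachingCount);
        System.out.printf("lifetime ratio: %.2f%n", lifetime);
        System.out.printf("interval ratio: %.2f%n", intervalHitCachingRatio(t0, t1));
    }
}
```

The example makes the description's point concrete: a process-lifetime ratio near 0.83 can mask an interval where only 10% of cacheable requests hit the cache.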
[jira] [Assigned] (HBASE-27159) Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of hits and misses for cacheable requests
[ https://issues.apache.org/jira/browse/HBASE-27159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning reassigned HBASE-27159: - Assignee: David Manning > Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of > hits and misses for cacheable requests > > > Key: HBASE-27159 > URL: https://issues.apache.org/jira/browse/HBASE-27159 > Project: HBase > Issue Type: Improvement > Components: BlockCache, metrics >Affects Versions: 3.0.0-alpha-1, 2.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > > [https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400] > {code:java} > public double getHitCachingRatio() { > double requestCachingCount = getRequestCachingCount(); > if (requestCachingCount == 0) { > return 0; > } > return getHitCachingCount() / requestCachingCount; > } {code} > This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. > The metric represents the percentage of requests which were cacheable, but > not found in the cache. Unfortunately, since the counters are process-level > counters, the ratio is for the lifetime of the process. This makes it less > useful for looking at cache behavior during a smaller time period. > The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. > Having access to the underlying counters allows for offline computation of > the same metric for any given time period. But these counters are not emitted > today from {{{}MetricsRegionServerWrapperImpl.java{}}}. > Compare this to {{hitCount}} and {{missCount}} which are emitted as metrics > {{blockCacheHitCount}} and {{{}blockCacheMissCount{}}}. But these are raw > counts for the cache, which include requests that are not cacheable. The > cacheable metrics are more interesting, since it can be common to miss on a > request which is not cacheable. 
> Interestingly, these metrics are emitted regularly as part of a log line in > {{{}StatisticsThread.logStats{}}}. > We should emit blockCache{{{}HitCachingCount{}}} and > {{blockCacheMissCachingCount}} along with the current metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-27302) Adding a trigger for Stochastic Balancer to safeguard against upper bound outliers.
[ https://issues.apache.org/jira/browse/HBASE-27302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579310#comment-17579310 ] David Manning commented on HBASE-27302: --- [~claraxiong] you may find https://issues.apache.org/jira/browse/HBASE-22349 useful. I used it for exactly this reason - triggering a balancer run in "sloppy" cases where a regionserver has more than 1+X% or less than 1-X% regions, compared to the average (mean) of regions per regionserver in the cluster. > Adding a trigger for Stochastic Balancer to safeguard against upper bound > outliers. > > > Key: HBASE-27302 > URL: https://issues.apache.org/jira/browse/HBASE-27302 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Priority: Major > > In large clusters, if one outlier has a lot of regions, the calculated > imbalance for RegionCountSkewCostFunction is quite low and often fails to > trigger the balancer. > For example, a node with twice the average count on a 400-node cluster only > produces an imbalance of 0.004 < 0.02 (the current default threshold to trigger the > balancer). An empty node also has a similar effect, but we have a safeguard in > place. https://issues.apache.org/jira/browse/HBASE-24139 > We can add a safeguard for this so we don't have to lower the threshold on > larger clusters, which makes the balancer more sensitive to other minor > imbalances. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate region count distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579308#comment-17579308 ] David Manning commented on HBASE-25625: --- [~bbeaudreault] [~claraxiong] you may find https://issues.apache.org/jira/browse/HBASE-22349 useful. I used it for exactly this reason - triggering a balancer run in "sloppy" cases where a regionserver has more than 1+X% or less than 1-X% regions. > StochasticBalancer CostFunctions needs a better way to evaluate region count > distribution > - > > Key: HBASE-25625 > URL: https://issues.apache.org/jira/browse/HBASE-25625 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > Attachments: image-2021-10-05-17-17-50-944.png > > > Currently CostFunctions including RegionCountSkewCostFunctions, > PrimaryRegionCountSkewCostFunctions and all load cost functions calculate the > unevenness of the distribution by getting the sum of deviation per region > server. This simple implementation works when the cluster is small. But when > the cluster gets larger with more region servers and regions, it doesn't work > well with hot spots or a small number of unbalanced servers. The proposal is > to use the standard deviation of the count per region server to capture the > existence of a small portion of region servers with overwhelming > load/allocation. > TableSkewCostFunction uses the sum of the max deviation region per server for > all tables as the measure of unevenness. It doesn't work in a very common > scenario in operations. Say we have 100 regions on 50 nodes, two on each. We > add 50 new nodes and they have 0 each. The max deviation from the mean is 1, > compared to 99 in the worst case scenario of 100 regions on a single server. > The normalized cost is 1/99 ≈ 0.01, below the default threshold of 0.05. The balancer > wouldn't move. The proposal is to use the standard deviation of the count > per region server to detect this scenario, generating a cost of 3.1/31 = 0.1 > in this case. > A patch is in test and will follow shortly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
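The arithmetic in the description above can be checked with a short sketch. The normalization used below (dividing by the worst case of all regions on one server) is an assumption for illustration, not the balancer's exact cost function; it reproduces a max-deviation cost of about 0.01 and a standard-deviation cost of about 0.1 for the 100-region scenario.

```java
// Back-of-the-envelope check of the balancer arithmetic in the description.
// These are simplified stand-ins for HBase's cost functions.
public class BalancerCostSketch {
    static double mean(int[] counts) {
        long sum = 0;
        for (int c : counts) sum += c;
        return (double) sum / counts.length;
    }

    // Max deviation from the mean, normalized by the worst case
    // (all regions on a single server).
    static double maxDeviationCost(int[] counts, int totalRegions) {
        double m = mean(counts);
        double maxDev = 0;
        for (int c : counts) maxDev = Math.max(maxDev, Math.abs(c - m));
        double worst = totalRegions - m; // one server holds everything
        return maxDev / worst;
    }

    // Standard deviation of the counts, normalized by the worst-case
    // standard deviation (one server with all regions, the rest empty).
    static double stdDevCost(int[] counts, int totalRegions) {
        double m = mean(counts);
        double var = 0;
        for (int c : counts) var += (c - m) * (c - m);
        double std = Math.sqrt(var / counts.length);
        double worstVar =
            ((totalRegions - m) * (totalRegions - m) + (counts.length - 1) * m * m)
                / counts.length;
        return std / Math.sqrt(worstVar);
    }

    public static void main(String[] args) {
        // 100 regions on 100 servers: 50 old servers with 2 each, 50 new with 0.
        int[] counts = new int[100];
        for (int i = 0; i < 50; i++) counts[i] = 2;
        System.out.printf("max-deviation cost: %.3f%n", maxDeviationCost(counts, 100)); // ~0.01, under 0.05
        System.out.printf("std-dev cost: %.3f%n", stdDevCost(counts, 100));             // ~0.10, over 0.05
    }
}
```

The standard-deviation formulation crosses the 0.05 trigger threshold in exactly the half-empty-cluster scenario where the max-deviation cost stays quiet.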
[jira] [Updated] (HBASE-27159) Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of hits and misses for cacheable requests
[ https://issues.apache.org/jira/browse/HBASE-27159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-27159: -- Description: [https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400] {code:java} public double getHitCachingRatio() { double requestCachingCount = getRequestCachingCount(); if (requestCachingCount == 0) { return 0; } return getHitCachingCount() / requestCachingCount; } {code} This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. The metric represents the percentage of requests which were cacheable, but not found in the cache. Unfortunately, since the counters are process-level counters, the ratio is for the lifetime of the process. This makes it less useful for looking at cache behavior during a smaller time period. The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. Having access to the underlying counters allows for offline computation of the same metric for any given time period. But these counters are not emitted today from {{{}MetricsRegionServerWrapperImpl.java{}}}. Compare this to {{hitCount}} and {{missCount}} which are emitted as metrics {{blockCacheHitCount}} and {{{}blockCacheMissCount{}}}. But these are raw counts for the cache, which include requests that are not cacheable. The cacheable metrics are more interesting, since it can be common to miss on a request which is not cacheable. Interestingly, these metrics are emitted regularly as part of a log line in {{{}StatisticsThread.logStats{}}}. We should emit blockCache{{{}HitCachingCount{}}} and {{blockCacheMissCachingCount}} along with the current metrics. 
was: [https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400] {code:java} public double getHitCachingRatio() { double requestCachingCount = getRequestCachingCount(); if (requestCachingCount == 0) { return 0; } return getHitCachingCount() / requestCachingCount; } {code} This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. The metric represents the percentage of requests which were cacheable, but not found in the cache. Unfortunately, since the counters are process-level counters, the ratio is for the lifetime of the process. This makes it less useful for looking at cache behavior during a smaller time period. The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. Having access to the underlying counters allows for offline computation of the same metric for any given time period. But these counters are not emitted today from {{{}MetricsRegionServerWrapperImpl.java{}}}. Compare this to {{hitCount}} and {{missCount}} which are emitted as metrics {{blockCacheHitCount}} and {{{}blockCacheMissCount{}}}. But these are raw counts for the cache, which include requests that are not cacheable. The cacheable metrics are more interesting, since it can be common to miss on a request which is not cacheable. We should emit blockCache{{{}HitCachingCount{}}} and {{blockCacheMissCachingCount}} along with the current metrics. 
> Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of > hits and misses for cacheable requests > > > Key: HBASE-27159 > URL: https://issues.apache.org/jira/browse/HBASE-27159 > Project: HBase > Issue Type: Improvement > Components: BlockCache, metrics >Affects Versions: 3.0.0-alpha-1, 2.0.0 >Reporter: David Manning >Priority: Minor > > [https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400] > {code:java} > public double getHitCachingRatio() { > double requestCachingCount = getRequestCachingCount(); > if (requestCachingCount == 0) { > return 0; > } > return getHitCachingCount() / requestCachingCount; > } {code} > This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. > The metric represents the percentage of requests which were cacheable, but > not found in the cache. Unfortunately, since the counters are process-level > counters, the ratio is for the lifetime of the process. This makes it less > useful for looking at cache behavior during a smaller time period. > The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. > Having access to the underlying counters allows for offline computation of > the same metric for any given time period. But these counters are not emitted > today from {{{}MetricsRegionServerWrapperImpl.java{}}}.
[jira] [Updated] (HBASE-27159) Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of hits and misses for cacheable requests
[ https://issues.apache.org/jira/browse/HBASE-27159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-27159: -- Summary: Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of hits and misses for cacheable requests (was: Emit source metrics for BlockCacheExpressHitPercent, getHitCachingRatio, getHitCachingCount, getMissCachingCount) > Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of > hits and misses for cacheable requests > > > Key: HBASE-27159 > URL: https://issues.apache.org/jira/browse/HBASE-27159 > Project: HBase > Issue Type: Improvement > Components: BlockCache, metrics >Affects Versions: 3.0.0-alpha-1, 2.0.0 >Reporter: David Manning >Priority: Minor > > [https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400] > {code:java} > public double getHitCachingRatio() { > double requestCachingCount = getRequestCachingCount(); > if (requestCachingCount == 0) { > return 0; > } > return getHitCachingCount() / requestCachingCount; > } {code} > This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. > The metric represents the percentage of requests which were cacheable, but > not found in the cache. Unfortunately, since the counters are process-level > counters, the ratio is for the lifetime of the process. This makes it less > useful for looking at cache behavior during a smaller time period. > The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. > Having access to the underlying counters allows for offline computation of > the same metric for any given time period. But these counters are not emitted > today from {{{}MetricsRegionServerWrapperImpl.java{}}}. > Compare this to {{hitCount}} and {{missCount}} which are emitted as metrics > {{blockCacheHitCount}} and {{{}blockCacheMissCount{}}}. 
But these are raw > counts for the cache, which include requests that are not cacheable. The > cacheable metrics are more interesting, since it can be common to miss on a > request which is not cacheable. > We should emit blockCache{{{}HitCachingCount{}}} and > {{blockCacheMissCachingCount}} along with the current metrics. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HBASE-27159) Emit source metrics for BlockCacheExpressHitPercent, getHitCachingRatio, getHitCachingCount, getMissCachingCount
David Manning created HBASE-27159: - Summary: Emit source metrics for BlockCacheExpressHitPercent, getHitCachingRatio, getHitCachingCount, getMissCachingCount Key: HBASE-27159 URL: https://issues.apache.org/jira/browse/HBASE-27159 Project: HBase Issue Type: Improvement Components: BlockCache, metrics Affects Versions: 2.0.0, 3.0.0-alpha-1 Reporter: David Manning [https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400] {code:java} public double getHitCachingRatio() { double requestCachingCount = getRequestCachingCount(); if (requestCachingCount == 0) { return 0; } return getHitCachingCount() / requestCachingCount; } {code} This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. The metric represents the percentage of requests which were cacheable, but not found in the cache. Unfortunately, since the counters are process-level counters, the ratio is for the lifetime of the process. This makes it less useful for looking at cache behavior during a smaller time period. The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. Having access to the underlying counters allows for offline computation of the same metric for any given time period. But these counters are not emitted today from {{{}MetricsRegionServerWrapperImpl.java{}}}. Compare this to {{hitCount}} and {{missCount}} which are emitted as metrics {{blockCacheHitCount}} and {{{}blockCacheMissCount{}}}. But these are raw counts for the cache, which include requests that are not cacheable. The cacheable metrics are more interesting, since it can be common to miss on a request which is not cacheable. We should emit blockCache{{{}HitCachingCount{}}} and {{blockCacheMissCachingCount}} along with the current metrics. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky
[ https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17540525#comment-17540525 ] David Manning commented on HBASE-27054: --- Thanks for validating and committing! > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > is flaky > --- > > Key: HBASE-27054 > URL: https://issues.apache.org/jira/browse/HBASE-27054 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 2.5.0 >Reporter: Andrew Kyle Purtell >Assignee: David Manning >Priority: Major > Fix For: 2.5.0, 3.0.0-alpha-3, 2.4.13 > > > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > . Looks like we can be off by one on either side of an expected value. > Any idea what is going on here [~dmanning]? > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.779 s <<< FAILURE! > java.lang.AssertionError: All servers should have load no less than 60. > server=srv1351292323,46522,-3543799643652531264 , load=59 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.781 s <<< FAILURE! 
> java.lang.AssertionError: All servers should have load no more than 60. > server=srv1402325691,7995,26308078476749652 , load=61 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky
[ https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-27054: -- Status: Patch Available (was: Open) > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > is flaky > --- > > Key: HBASE-27054 > URL: https://issues.apache.org/jira/browse/HBASE-27054 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 2.5.0 >Reporter: Andrew Kyle Purtell >Assignee: David Manning >Priority: Major > Fix For: 2.5.0, 3.0.0-alpha-3 > > > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > . Looks like we can be off by one on either side of an expected value. > Any idea what is going on here [~dmanning]? > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.779 s <<< FAILURE! > java.lang.AssertionError: All servers should have load no less than 60. > server=srv1351292323,46522,-3543799643652531264 , load=59 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.781 s <<< FAILURE! > java.lang.AssertionError: All servers should have load no more than 60. 
> server=srv1402325691,7995,26308078476749652 , load=61 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky
[ https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539878#comment-17539878 ] David Manning commented on HBASE-27054: --- I see some good results by changing the cost function weights. I will propose a PR with those changes. {code:java} conf.setFloat("hbase.master.balancer.stochastic.moveCost", 0f); conf.setFloat("hbase.master.balancer.stochastic.tableSkewCost", 0f); {code} If I make one change, with {{maxRunningTime}} from 180s to 30s, I see 100% failure rate. If I make the above cost function weight updates, I see 100% pass rate, even with a {{maxRunningTime}} of 15s. > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > is flaky > --- > > Key: HBASE-27054 > URL: https://issues.apache.org/jira/browse/HBASE-27054 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 2.5.0 >Reporter: Andrew Kyle Purtell >Priority: Major > Fix For: 2.5.0, 3.0.0-alpha-3 > > > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > . Looks like we can be off by one on either side of an expected value. > Any idea what is going on here [~dmanning]? > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.779 s <<< FAILURE! > java.lang.AssertionError: All servers should have load no less than 60. 
> server=srv1351292323,46522,-3543799643652531264 , load=59 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.781 s <<< FAILURE! > java.lang.AssertionError: All servers should have load no more than 60. > server=srv1402325691,7995,26308078476749652 , load=61 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky
[ https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning reassigned HBASE-27054: - Assignee: David Manning > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > is flaky > --- > > Key: HBASE-27054 > URL: https://issues.apache.org/jira/browse/HBASE-27054 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 2.5.0 >Reporter: Andrew Kyle Purtell >Assignee: David Manning >Priority: Major > Fix For: 2.5.0, 3.0.0-alpha-3 > > > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > . Looks like we can be off by one on either side of an expected value. > Any idea what is going on here [~dmanning]? > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.779 s <<< FAILURE! > java.lang.AssertionError: All servers should have load no less than 60. > server=srv1351292323,46522,-3543799643652531264 , load=59 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.781 s <<< FAILURE! > java.lang.AssertionError: All servers should have load no more than 60. 
> server=srv1402325691,7995,26308078476749652 , load=61 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky
[ https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539839#comment-17539839 ] David Manning edited comment on HBASE-27054 at 5/19/22 11:06 PM: - I ran it 50 times locally using latest {{{}master{}}}, it failed twice, even with 3-minute timeout, and ~3.9 million stochastic steps. So the 77s appears irrelevant. was (Author: dmanning): I ran it 50 times locally using latest {{master}}, it failed twice, even with 3-minute timeout, and ~3.9 million stochastic steps. > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > is flaky > --- > > Key: HBASE-27054 > URL: https://issues.apache.org/jira/browse/HBASE-27054 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 2.5.0 >Reporter: Andrew Kyle Purtell >Priority: Major > Fix For: 2.5.0, 3.0.0-alpha-3 > > > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > . Looks like we can be off by one on either side of an expected value. > Any idea what is going on here [~dmanning]? > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.779 s <<< FAILURE! > java.lang.AssertionError: All servers should have load no less than 60. 
> server=srv1351292323,46522,-3543799643652531264 , load=59 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.781 s <<< FAILURE! > java.lang.AssertionError: All servers should have load no more than 60. > server=srv1402325691,7995,26308078476749652 , load=61 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky
[ https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539839#comment-17539839 ] David Manning commented on HBASE-27054: --- I ran it 50 times locally using latest {{master}}, it failed twice, even with 3-minute timeout, and ~3.9 million stochastic steps. > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > is flaky > --- > > Key: HBASE-27054 > URL: https://issues.apache.org/jira/browse/HBASE-27054 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 2.5.0 >Reporter: Andrew Kyle Purtell >Priority: Major > Fix For: 2.5.0, 3.0.0-alpha-3 > > > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > . Looks like we can be off by one on either side of an expected value. > Any idea what is going on here [~dmanning]? > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.779 s <<< FAILURE! > java.lang.AssertionError: All servers should have load no less than 60. > server=srv1351292323,46522,-3543799643652531264 , load=59 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.781 s <<< FAILURE! 
> java.lang.AssertionError: All servers should have load no more than 60. > server=srv1402325691,7995,26308078476749652 , load=61 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky
[ https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539754#comment-17539754 ] David Manning commented on HBASE-27054: --- With a lower timeout, like 60 seconds, or on slower hardware, we could get fewer iterations. I suppose in that sense we may just get unlucky in not being able to get to fully balanced state given current configuration. 50,000 regions have to move, and the {{RegionReplicaCandidateGenerator}} is doing most of that work, which is chosen roughly 25% of the time. There are likely some missteps. Conservatively, it seems like we may need 200,000 calls to guarantee the work gets done. That means 800,000 iterations. Running locally, if I had set a timeout of 60 seconds, I'd see 1.3 million iterations. It's close enough that we may see the occasional problem. The tests should ensure that even on slow hardware, with unlucky random choices, we are still virtually guaranteed success. We may not be doing that here. But a 3 minute timeout should make it much more likely. So I'm interested in the test message that says it ran 77 seconds, even though I'm sure the test could be improved to be more deterministic. > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > is flaky > --- > > Key: HBASE-27054 > URL: https://issues.apache.org/jira/browse/HBASE-27054 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 2.5.0 >Reporter: Andrew Kyle Purtell >Priority: Major > Fix For: 2.5.0, 3.0.0-alpha-3 > > > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > . Looks like we can be off by one on either side of an expected value. > Any idea what is going on here [~dmanning]? > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.779 s <<< FAILURE! 
> java.lang.AssertionError: All servers should have load no less than 60. > server=srv1351292323,46522,-3543799643652531264 , load=59 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} > {noformat} > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > Time elapsed: 77.781 s <<< FAILURE! > java.lang.AssertionError: All servers should have load no more than 60. > server=srv1402325691,7995,26308078476749652 , load=61 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577) > at > org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544) > at > org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
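The capacity estimate in the comment above is quick arithmetic; this sketch just restates it with the comment's own numbers (the ~25% selection rate and the conservative 200,000-call figure are the comment's assumptions, not measured values):

```python
# Comment's estimate: ~50,000 regions must move, and conservatively
# ~200,000 RegionReplicaCandidateGenerator calls are needed to do it.
calls_needed = 200_000
generator_share = 0.25            # generator chosen roughly 25% of the time
total_iterations = calls_needed / generator_share
assert total_iterations == 800_000

# Locally observed throughput from the comment: ~3.9M steps in the
# 180-second budget, so a 60-second budget would allow about 1.3M.
steps_per_second = 3_900_000 / 180
iterations_in_60s = steps_per_second * 60
assert round(iterations_in_60s) == 1_300_000
```

With only ~1.3M iterations available against an ~800K requirement, the margin is thin enough that an unlucky run on slow hardware can plausibly fall short, which matches the observed flakiness.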
[jira] [Commented] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky
[ https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539749#comment-17539749 ] David Manning commented on HBASE-27054: --- [~apurtell] Do we know if this is a recent regression, or has it always been flaky? My initial thought is that there may be some randomness (it is a stochastic balancer after all) which leads to this end result. I don't believe any recent changes would have caused this to become more flaky, but I suppose it's possible. HBASE-26311 is interesting, since it changes calculations to use standard deviation. [~claraxiong] Why does the error message say it failed after 77 seconds? The test takes 3 minutes to run for me locally, which is the configured timeout for the balancer in {{StochasticBalancerTestBase2}}. Is there a link to a test failure with full logs that I can inspect? (Note, 3 minute timeout was updated in HBASE-25873. Previous value was 90 seconds.) With region replicas involved, the {{RegionReplicaCandidateGenerator}} will just move a colocated replica to a random server, without consideration of how many regions that target server is hosting. The cost functions will allow it in basically every case, since it heavily prioritizes resolving colocated replicas. So maybe by the time all the region replicas have been resolved, the number of moves is already pushing limits of one balancer iteration, with having randomly overloaded one regionserver. A situation that the balancer will have a difficult time getting out of is if one regionserver is hosting 61 replicas of 61 regions, and another regionserver is hosting 59 regions, which are replicas of those 61 regions. The {{LoadCandidateGenerator}} will keep trying to take a region from the server with 61 and give it to the server with 59, but because there is already a replica that matches, it will be too expensive to move. 
But as long as we can process enough iterations, probabilistically speaking we should be able to get to one of the 2 safe regions to move... when I run this test locally I see nearly 4 million iterations, and with 1/4 of those using the {{LoadCandidateGenerator}} it seems like we should generally find a solution that moves them all. {code} Finished computing new moving plan. Computation took 180001 ms to try 3975554 different iterations. Found a solution that moves 50006 regions; Going from a computed imbalance of 0.9026309610781538 to a new imbalance of 5.252006025578701E-5. funtionCost=RegionCountSkewCostFunction : (multiplier=500.0, imbalance=0.0); PrimaryRegionCountSkewCostFunction : (multiplier=500.0, imbalance=0.0); MoveCostFunction : (multiplier=7.0, imbalance=0.83343334, need balance); RackLocalityCostFunction : (multiplier=15.0, imbalance=0.0); TableSkewCostFunction : (multiplier=35.0, imbalance=0.0); RegionReplicaHostCostFunction : (multiplier=10.0, imbalance=0.0); RegionReplicaRackCostFunction : (multiplier=1.0, imbalance=0.0); ReadRequestCostFunction : (multiplier=5.0, imbalance=0.0); CPRequestCostFunction : (multiplier=5.0, imbalance=0.0); WriteRequestCostFunction : (multiplier=5.0, imbalance=0.0); MemStoreSizeCostFunction : (multiplier=5.0, imbalance=0.0); StoreFileCostFunction : (multiplier=5.0, imbalance=0.0); {code} Since the test case is also using 100 tables, and there is a {{TableSkewCostFunction}} involved, it's also possible that the balancer is happy with a slightly uneven region count balance, because balancing the last region would push towards an imbalance of tables if the target regionserver already has too many regions of that table for every region that is chosen. I don't know if the math would support this, though. 
If it does, it's possible that out of the last 61 regions, moving any region to the server with 59 would either cause table skew or colocated replicas, and so the balancer cannot fully balance based on the simple {{LoadCandidateGenerator}} alone. This is all hypothetical, without yet trying to debug. Given the large size of the test, the number of balancer iterations, and the flakiness, it may be difficult to debug. I ran it 10+ times locally so far, and it passes each time. So, some ideas to explore: # Don't assert that the cluster is fully balanced in this test case, just assert that there are no colocated replicas. Arguably this is the purpose of the test, and the test framework already appears to allow for this. # Change cost function weights for everything else, other than region counts and replica counts, to be 0. In this way, nothing prevents the balancer optimizing for these variables, which the test is expecting to validate. Specifically, set TableSkew and MoveCost functions to 0. # Use fewer than 100 tables, if table skew is a contributing factor. > TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster > is flaky >
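The "61 vs 59" trap described above can also be sized probabilistically. A hedged sketch, taking the comment's hypothetical that only 2 of the 61 candidate regions are safe to move and assuming each `LoadCandidateGenerator` call picks uniformly at random:

```python
# Probability a single LoadCandidateGenerator call selects one of the
# 2 safe (non-colocating) regions out of 61 on the overloaded server.
p_safe = 2 / 61

# Roughly 1/4 of the ~4M observed iterations go to LoadCandidateGenerator.
calls = 1_000_000

# Probability that no call ever finds a safe move (assumed independent picks).
p_never = (1 - p_safe) ** calls
assert p_never < 1e-100  # escaping the trap is virtually certain at this scale
```

So under these assumptions the generator alone should eventually escape the trap given enough iterations, which supports the comment's conclusion that iteration budget, not a hard impossibility, is the limiting factor.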
[jira] [Commented] (HBASE-26989) TestStochasticLoadBalancer has some slow methods, and inconsistent set, reset, unset of configuration
[ https://issues.apache.org/jira/browse/HBASE-26989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529794#comment-17529794 ] David Manning commented on HBASE-26989: --- When running the tests locally, I see these runtime improvements: {{testNeedBalance}}: from 120 seconds to 11 seconds {{testSloppyTablesLoadBalanceByTable}} 27 seconds to <1 second {{testBalanceOfSloppyServers}} 67 seconds to <1 second So total class {{TestStochasticLoadBalancer}} runtime reduces from 230 seconds to 31 seconds. Additionally, we get more deterministic behavior, since tests are more likely to have consistent results with a max number of steps when compared to a max running time. > TestStochasticLoadBalancer has some slow methods, and inconsistent set, > reset, unset of configuration > - > > Key: HBASE-26989 > URL: https://issues.apache.org/jira/browse/HBASE-26989 > Project: HBase > Issue Type: Test > Components: Balancer, test >Affects Versions: 3.0.0-alpha-1, 2.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > > Some test ordering issues were exposed by adding new tests in HBASE-22349. I > think this is a legitimate issue which is tracked in HBASE-26988. > But we can update the tests to be consistent in how they update configuration > to reduce confusion, removing the {{unset}} calls. > We can also update other configuration values to significantly speed up the > long-running methods. Methods that are simply checking for balancer plans do > not need to {{runMaxSteps}}. All we need to do is run enough steps to > guarantee we will plan to move one region. That can be far fewer than the > tens of millions of steps we may be running given {{runMaxSteps}}. -- This message was sent by Atlassian Jira (v8.20.7#820007)
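The determinism point in the comment above generalizes: bounding balancer work by wall-clock time makes the step count vary with hardware, while bounding by step count does identical work everywhere. A toy model (hypothetical numbers and function, not HBase code):

```python
def run_balancer(max_steps=None, max_seconds=None, steps_per_sec=10_000):
    # Toy model of how many candidate steps a balancer run performs.
    if max_steps is not None:
        return max_steps                   # hardware-independent
    return steps_per_sec * max_seconds     # hardware-dependent

# Time-bounded runs diverge between a fast and a slow machine:
assert run_balancer(max_seconds=60, steps_per_sec=50_000) != \
       run_balancer(max_seconds=60, steps_per_sec=10_000)

# Step-bounded runs perform identical work on both:
assert run_balancer(max_steps=100_000, steps_per_sec=50_000) == \
       run_balancer(max_steps=100_000, steps_per_sec=10_000)
```

This is why switching the slow test methods from a running-time budget to a max-steps budget both speeds them up and makes their results more reproducible.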
[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration
[ https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-26988: -- Description: # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}} # Start HMaster # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}} # Dynamically reload configuration for hmaster (https://hbase.apache.org/book.html#dyn_config) *Expected:* load balancing would no longer happen by table *Actual:* load balancing still happens by table *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to {{false}} *Note 1*: The steps may only work if the config value is not in {{hbase-default.xml}} so it may be an unlikely scenario. *Note 2*: I see this when running tests added in HBASE-22349, depending on the order of execution of test methods. If {{testSloppyTablesLoadBalanceByTable}} executes before {{testBalanceOfSloppyServers}} there will be a failure. We could apply the workaround to the tests (explicitly set to {{false}}), but it seems better to fix the dynamic reconfiguration behavior. Regardless, I will propose test fixes in HBASE-26989. was: # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}} # Start HMaster # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}} # Dynamically reload configuration for hmaster (https://hbase.apache.org/book.html#dyn_config) *Expected:* load balancing would no longer happen by table *Actual:* load balancing still happens by table *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to {{false}} Note: I see this when running tests added in HBASE-22349, depending on the order of execution of test methods. If {{testSloppyTablesLoadBalanceByTable}} executes before {{testBalanceOfSloppyServers}} there will be a failure. We could apply the workaround to the tests (explicitly set to {{false}}), but it seems better to fix the dynamic reconfiguration behavior. 
> Balancer should reset to default setting for hbase.master.loadbalance.bytable > if dynamically reloading configuration > > > Key: HBASE-26988 > URL: https://issues.apache.org/jira/browse/HBASE-26988 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 3.0.0-alpha-1, 2.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > > # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}} > # Start HMaster > # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}} > # Dynamically reload configuration for hmaster > (https://hbase.apache.org/book.html#dyn_config) > *Expected:* load balancing would no longer happen by table > *Actual:* load balancing still happens by table > *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to > {{false}} > *Note 1*: The steps may only work if the config value is not in > {{hbase-default.xml}} so it may be an unlikely scenario. > *Note 2*: I see this when running tests added in HBASE-22349, depending on > the order of execution of test methods. If > {{testSloppyTablesLoadBalanceByTable}} executes before > {{testBalanceOfSloppyServers}} there will be a failure. We could apply the > workaround to the tests (explicitly set to {{false}}), but it seems better to > fix the dynamic reconfiguration behavior. Regardless, I will propose test > fixes in HBASE-26989. -- This message was sent by Atlassian Jira (v8.20.7#820007)
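The reload bug described above fits a common pattern: a configuration observer that only overwrites its cached value when the key is present in the new configuration, so a removed key never resets to its default. A generic sketch of the buggy pattern versus the fix (class and method names here are hypothetical illustrations, not the actual HBase code):

```python
DEFAULT_BYTABLE = False
KEY = "hbase.master.loadbalance.bytable"

class BuggyBalancer:
    # Keeps the stale cached value when the key disappears from the config.
    def __init__(self):
        self.by_table = DEFAULT_BYTABLE
    def on_configuration_change(self, conf):
        if KEY in conf:                      # only updates when key is present
            self.by_table = conf[KEY]

class FixedBalancer:
    # Falls back to the default when the key was removed.
    def __init__(self):
        self.by_table = DEFAULT_BYTABLE
    def on_configuration_change(self, conf):
        self.by_table = conf.get(KEY, DEFAULT_BYTABLE)

buggy, fixed = BuggyBalancer(), FixedBalancer()
for b in (buggy, fixed):
    b.on_configuration_change({KEY: True})   # step 1-2: key set, master started
    b.on_configuration_change({})            # step 3-4: key removed, config reloaded
assert buggy.by_table is True   # stale: still balancing by table
assert fixed.by_table is False  # reset to default, as expected
```

Always re-reading the key with an explicit default on every reload is what makes the removed-key case behave the same as never setting the key at all.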
[jira] [Created] (HBASE-26989) TestStochasticLoadBalancer has some slow methods, and inconsistent set, reset, unset of configuration
David Manning created HBASE-26989: - Summary: TestStochasticLoadBalancer has some slow methods, and inconsistent set, reset, unset of configuration Key: HBASE-26989 URL: https://issues.apache.org/jira/browse/HBASE-26989 Project: HBase Issue Type: Test Components: Balancer, test Affects Versions: 2.0.0, 3.0.0-alpha-1 Reporter: David Manning Assignee: David Manning Some test ordering issues were exposed by adding new tests in HBASE-22349. I think this is a legitimate issue which is tracked in HBASE-26988. But we can update the tests to be consistent in how they update configuration to reduce confusion, removing the {{unset}} calls. We can also update other configuration values to significantly speed up the long-running methods. Methods that are simply checking for balancer plans do not need to {{runMaxSteps}}. All we need to do is run enough steps to guarantee we will plan to move one region. That can be far fewer than the tens of millions of steps we may be running given {{runMaxSteps}}. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration
[ https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529774#comment-17529774 ] David Manning commented on HBASE-26988: --- I guess this behavior would apply to a lot of {{StochasticLoadBalancer}} settings as well... > Balancer should reset to default setting for hbase.master.loadbalance.bytable > if dynamically reloading configuration > > > Key: HBASE-26988 > URL: https://issues.apache.org/jira/browse/HBASE-26988 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 3.0.0-alpha-1, 2.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > > # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}} > # Start HMaster > # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}} > # Dynamically reload configuration for hmaster > (https://hbase.apache.org/book.html#dyn_config) > *Expected:* load balancing would no longer happen by table > *Actual:* load balancing still happens by table > *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to > {{false}} > Note: I see this when running tests added in HBASE-22349, depending on the > order of execution of test methods. If {{testSloppyTablesLoadBalanceByTable}} > executes before {{testBalanceOfSloppyServers}} there will be a failure. We > could apply the workaround to the tests (explicitly set to {{false}}), but it > seems better to fix the dynamic reconfiguration behavior. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration
[ https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-26988: -- Status: Patch Available (was: Open) > Balancer should reset to default setting for hbase.master.loadbalance.bytable > if dynamically reloading configuration > > > Key: HBASE-26988 > URL: https://issues.apache.org/jira/browse/HBASE-26988 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 2.0.0, 3.0.0-alpha-1 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > > # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}} > # Start HMaster > # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}} > # Dynamically reload configuration for hmaster > (https://hbase.apache.org/book.html#dyn_config) > *Expected:* load balancing would no longer happen by table > *Actual:* load balancing still happens by table > *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to > {{false}} > Note: I see this when running tests added in HBASE-22349, depending on the > order of execution of test methods. If {{testSloppyTablesLoadBalanceByTable}} > executes before {{testBalanceOfSloppyServers}} there will be a failure. We > could apply the workaround to the tests (explicitly set to {{false}}), but it > seems better to fix the dynamic reconfiguration behavior. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration
[ https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529772#comment-17529772 ] David Manning commented on HBASE-26988: --- I randomly didn't notice it when running tests locally, because {{testUpdateBalancerLoadInfo}} also sets it to {{false}} as the last update. So if that test runs in between {{testSloppyTablesLoadBalanceByTable}} and {{testBalanceOfSloppyServers}}, everything is also okay. > Balancer should reset to default setting for hbase.master.loadbalance.bytable > if dynamically reloading configuration > > > Key: HBASE-26988 > URL: https://issues.apache.org/jira/browse/HBASE-26988 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 3.0.0-alpha-1, 2.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > > # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}} > # Start HMaster > # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}} > # Dynamically reload configuration for hmaster > (https://hbase.apache.org/book.html#dyn_config) > *Expected:* load balancing would no longer happen by table > *Actual:* load balancing still happens by table > *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to > {{false}} > Note: I see this when running tests added in HBASE-22349, depending on the > order of execution of test methods. If {{testSloppyTablesLoadBalanceByTable}} > executes before {{testBalanceOfSloppyServers}} there will be a failure. We > could apply the workaround to the tests (explicitly set to {{false}}), but it > seems better to fix the dynamic reconfiguration behavior. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration
[ https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning updated HBASE-26988:
----------------------------------
    Description: added a note that the failure reproduces when {{testSloppyTablesLoadBalanceByTable}} executes before {{testBalanceOfSloppyServers}}.
[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration
[ https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning updated HBASE-26988:
----------------------------------
    Description: added a note that the issue shows up in the tests added in HBASE-22349, depending on the order of execution of the test methods.
[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration
[ https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning updated HBASE-26988:
----------------------------------
    Description: clarified step 4 to "Dynamically reload configuration for hmaster" (was "Reload configuration").
[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration
[ https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning updated HBASE-26988:
----------------------------------
    Description: added the workaround of leaving the entry in {{hbase-site.xml}} but explicitly setting it to {{false}}.
[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration
[ https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning updated HBASE-26988:
----------------------------------
    Description: (minor punctuation edit to the workaround sentence)
[jira] [Created] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration
David Manning created HBASE-26988:
-------------------------------------

             Summary: Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration
                 Key: HBASE-26988
                 URL: https://issues.apache.org/jira/browse/HBASE-26988
             Project: HBase
          Issue Type: Bug
          Components: Balancer
    Affects Versions: 2.0.0, 3.0.0-alpha-1
            Reporter: David Manning
            Assignee: David Manning

# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove the {{hbase.master.loadbalance.bytable}} entry from {{hbase-site.xml}}
# Reload configuration (https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table
[jira] [Updated] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster
[ https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning updated HBASE-22349:
----------------------------------
    Release Note: StochasticLoadBalancer now respects the hbase.regions.slop configuration value as another factor in determining whether to attempt a balancer run. If any regionserver has a region count outside of the target range, the balancer will attempt to balance. Using the default 0.2 value, the target range is 80%-120% of the average (mean) region count per server. Whether the balancer will ultimately move regions will still depend on the weights of StochasticLoadBalancer's cost functions.

> Stochastic Load Balancer skips balancing when node is replaced in cluster
> -------------------------------------------------------------------------
>
>                 Key: HBASE-22349
>                 URL: https://issues.apache.org/jira/browse/HBASE-22349
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>    Affects Versions: 3.0.0-alpha-1, 1.3.0, 1.4.4, 2.0.0
>            Reporter: Suthan Phillips
>            Assignee: David Manning
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-3
>
>         Attachments: Hbase-22349.pdf
>
> HBASE-24139 allows the load balancer to run when one server has 0 regions and
> another server has more than 1 region. This is a special case of a more
> generic problem, where one server has far too few or far too many regions.
> The StochasticLoadBalancer defaults may decide the cluster is "balanced
> enough" according to {{hbase.master.balancer.stochastic.minCostNeedBalance}},
> even though one server may have a far higher or lower region count than the
> rest of the cluster.
> One specific example we have seen is using {{RegionMover}} to move regions
> back to a restarted RegionServer while the {{StochasticLoadBalancer}} happens
> to be running. The load balancer sees a newly restarted RegionServer with 0
> regions and, after HBASE-24139, balances regions to that server.
> Simultaneously, {{RegionMover}} moves regions back. The end result is that
> the newly restarted RegionServer has twice the load of any other server in
> the cluster. Future iterations of the load balancer do nothing, as the
> cluster cost does not exceed {{minCostNeedBalance}}.
> Another example: if the load balancer makes very slow progress on a cluster,
> it may not move the average cluster load to a newly restarted regionserver in
> one iteration. But after the first iteration, the balancer may again not run
> because the cluster cost does not exceed {{minCostNeedBalance}}.
> We propose a solution that reuses the {{slop}} concept from
> {{SimpleLoadBalancer}} and extends the HBASE-24139 logic to run the balancer
> as long as there is a "sloppy" server in the cluster.
>
> +*Previous description notes below, which are relevant but, as stated, were
> already fixed by HBASE-24139*+
>
> In an EMR cluster, whenever I replace one of the nodes, the regions never get
> rebalanced.
> The default minCostNeedBalance of 0.05 is too high.
> The region counts on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20
> = 203
> Once a node (region server) was replaced with a new node (terminated and
> recreated by EMR), the region counts became: 23, 0, 23, 22, 22, 22, 22, 23,
> 23, 23 = 203
> From the hbase-master logs, I can see the lines below, which indicate that
> the default minCostNeedBalance does not hold good for these scenarios.
> {noformat}
> 2019-04-29 09:31:37,027 WARN
> [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1]
> cleaner.CleanerChore: WALs outstanding under
> hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs
> 2019-04-29 09:31:42,920 INFO
> [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1]
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced
> cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost
> which need balance is 0.05
> {noformat}
> To mitigate this, I had to lower the default minCostNeedBalance to a value
> like 0.01f and restart the Region Servers and HBase Master. After changing
> the value to 0.01f, I could see the regions getting rebalanced.
> This has led me to the following questions, which I would like the HBase
> experts to answer:
> 1) What factors affect the total cost and sum multiplier values? How can we
> determine the right minCostNeedBalance value for any cluster?
> 2) How did HBase arrive at the default value of 0.05f? Is it an optimal
> value? If yes, what is the recommended way to mitigate this scenario?
> Attached: steps to reproduce
>
> Note: the HBASE-17565 patch is already applied.
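The slop check described in the release note can be sketched numerically. This is an assumed reading of the semantics ("outside 80%-120% of the mean region count per server" with the default slop of 0.2), not the actual StochasticLoadBalancer code; the function name is hypothetical.

```python
# Sketch of the hbase.regions.slop eligibility check from the release note:
# with slop = 0.2, any server whose region count falls outside 80%-120% of the
# per-server mean makes the balancer eligible to run.
def sloppy_servers(region_counts, slop=0.2):
    avg = sum(region_counts) / len(region_counts)
    low, high = avg * (1 - slop), avg * (1 + slop)
    return [c for c in region_counts if c < low or c > high]

# Node-replacement example from the report: one restarted server holds 0 of
# the 203 regions, so the mean is 20.3 and the acceptable range is 16.24-24.36.
counts = [23, 0, 23, 22, 22, 22, 22, 23, 23, 23]
assert sloppy_servers(counts) == [0]            # 0 < 16.24, triggers a run
assert sloppy_servers([21, 21, 20, 20]) == []   # balanced, no trigger
```

Note the release-note caveat still applies: eligibility to run does not guarantee region moves, which depend on the cost-function weights.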
[jira] [Updated] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster
[ https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning updated HBASE-22349:
----------------------------------
    Component/s: Balancer
[jira] [Updated] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster
[ https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning updated HBASE-22349:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster
[ https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Manning updated HBASE-22349:
----------------------------------
    Description: rewrote the description around HBASE-24139 and the proposed {{slop}}-based short-circuit, keeping the original EMR node-replacement report under "Previous Description Notes".
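The "Skipping load balancing because balanced cluster" log line quoted earlier in this thread can be checked numerically. The comparison below (total cost divided by the sum of cost-function multipliers against minCostNeedBalance) is an assumed reading of that log message, not the exact expression in the balancer source.

```python
# Worked check of the values from the quoted log line: the balancer skips when
# the weighted cost ratio stays at or below minCostNeedBalance.
total_cost = 52.041826194833405
sum_multiplier = 1102.0
weighted_cost = total_cost / sum_multiplier   # roughly 0.047

assert weighted_cost < 0.05   # below the 0.05 default, so balancing is skipped
assert weighted_cost > 0.01   # lowering the threshold to 0.01f allows a run
```

This matches the reporter's mitigation: dropping minCostNeedBalance from 0.05 to 0.01 put the observed ratio above the threshold, so the balancer ran.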
[jira] [Assigned] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster
[ https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning reassigned HBASE-22349: - Assignee: David Manning > Stochastic Load Balancer skips balancing when node is replaced in cluster > - > > Key: HBASE-22349 > URL: https://issues.apache.org/jira/browse/HBASE-22349 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 1.3.0, 1.4.4, 2.0.0 >Reporter: Suthan Phillips >Assignee: David Manning >Priority: Major > Attachments: Hbase-22349.pdf > > > In EMR cluster, whenever I replace one of the nodes, the regions never get > rebalanced. > The default minCostNeedBalance set to 0.05 is too high. > The region count on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 > = 203 > Once a node(region server) got replaced with a new node (terminated and EMR > recreated a node), the region count on the servers became: 23, 0, 23, 22, 22, > 22, 22, 23, 23, 23 = 203 > From hbase-master-logs, I can see the below WARN which indicates that the > default minCostNeedBalance does not hold good for these scenarios. > ## > 2019-04-29 09:31:37,027 WARN > [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] > cleaner.CleanerChore: WALs outstanding under > hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs2019-04-29 > 09:31:42,920 INFO > [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] > balancer.StochasticLoadBalancer: Skipping load balancing because balanced > cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost > which need balance is 0.05 > ## > To mitigate this, I had to modify the default minCostNeedBalance to lower > value like 0.01f and restart Region Servers and Hbase Master. After modifying > this value to 0.01f I could see the regions getting re-balanced. > This has led me to the following questions which I would like to get it > answered from the HBase experts. 
> 1) What are the factors that affect the value of the total cost and the sum > multiplier? How could we determine the right minCostNeedBalance value for any > cluster? > 2) How did HBase arrive at the default value of 0.05f? Is it an optimal > value? If yes, then what is the recommended way to mitigate this scenario? > Attached: Steps to reproduce > > Note: The HBASE-17565 patch is already applied. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster
[ https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522585#comment-17522585 ] David Manning commented on HBASE-22349: --- The scenario as originally described is fixed by HBASE-24139. However, I would like to propose using this to track other cases where we should execute the balancer, like one server with far fewer regions, or far more regions, than the average server in the cluster. (Take the original scenario, and instead of having 0 regions on the server, have only 1 region on the server.) I can think of a few options: # Use some hot/cold threshold like 50%. Compute the average regions per server. If a server has a region count which is >150% or <50% of this average, allow the balancer to run (short-circuit in {{needsBalance}}) # Find outliers using some type of standard deviation, and short-circuit in {{needsBalance}} if one is found. # Introduce a "force run" of the balancer on some timed interval. I'm inclined to try option 1. Option 3 sounds appealing to me, because it is a backstop to catch all of the cases which are ignored by {{minCostNeedBalance}}. However, other operators may find it too disruptive, if they need little to no region movement in the cluster. For reference, one scenario where we find ourselves in this undesirable state is by running {{region_mover}} at the same time as the load balancer. As stated in the {{region_mover}} comments, those two operations will conflict. The result can be one regionserver which has double the regions of any other server in the cluster. And if {{minCostNeedBalance}} is not exceeded, which is not difficult in a sizable cluster, one regionserver will run with double the load indefinitely. 
> Stochastic Load Balancer skips balancing when node is replaced in cluster > - > > Key: HBASE-22349 > URL: https://issues.apache.org/jira/browse/HBASE-22349 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 1.3.0, 1.4.4, 2.0.0 >Reporter: Suthan Phillips >Priority: Major > Attachments: Hbase-22349.pdf > > > In EMR cluster, whenever I replace one of the nodes, the regions never get > rebalanced. > The default minCostNeedBalance set to 0.05 is too high. > The region count on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 > = 203 > Once a node(region server) got replaced with a new node (terminated and EMR > recreated a node), the region count on the servers became: 23, 0, 23, 22, 22, > 22, 22, 23, 23, 23 = 203 > From hbase-master-logs, I can see the below WARN which indicates that the > default minCostNeedBalance does not hold good for these scenarios. > ## > 2019-04-29 09:31:37,027 WARN > [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] > cleaner.CleanerChore: WALs outstanding under > hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs2019-04-29 > 09:31:42,920 INFO > [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] > balancer.StochasticLoadBalancer: Skipping load balancing because balanced > cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost > which need balance is 0.05 > ## > To mitigate this, I had to modify the default minCostNeedBalance to lower > value like 0.01f and restart Region Servers and Hbase Master. After modifying > this value to 0.01f I could see the regions getting re-balanced. > This has led me to the following questions which I would like to get it > answered from the HBase experts. > 1)What are the factors that affect the value of total cost and sum > multiplier? How could we determine the right minCostNeedBalance value for any > cluster? > 2)How did Hbase arrive at setting the default value to 0.05f? Is it optimal > value? 
If yes, then what is the recommended way to mitigate this scenario? > Attached: Steps to reproduce > > Note: HBase-17565 patch is already applied. -- This message was sent by Atlassian Jira (v8.20.1#820001)
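The hot/cold threshold proposed as option 1 in the comment above can be sketched as follows. This is a minimal illustration, not the actual StochasticLoadBalancer code; the class name, method name, and SLOP constant are all hypothetical.

```java
// A minimal sketch of the proposed short-circuit: run the balancer whenever
// any server's region count deviates more than 50% from the cluster mean,
// regardless of the minCostNeedBalance check. Illustrative names only.
public class SloppyBalanceCheck {
  static final double SLOP = 0.5; // allow counts within +/-50% of the mean

  /** Returns true if some server is far enough from the mean to force a run. */
  static boolean needsBalanceBySlop(int[] regionsPerServer) {
    long total = 0;
    for (int c : regionsPerServer) {
      total += c;
    }
    double mean = (double) total / regionsPerServer.length;
    double low = mean * (1 - SLOP);
    double high = mean * (1 + SLOP);
    for (int c : regionsPerServer) {
      if (c < low || c > high) {
        return true; // outlier server: run the balancer regardless of cost
      }
    }
    return false; // fall through to the usual minCostNeedBalance check
  }

  public static void main(String[] args) {
    // Variant of the reported scenario: the replaced node holds only 1 region,
    // so the minCostNeedBalance check alone would likely skip balancing.
    int[] skewed = {23, 1, 23, 22, 22, 22, 22, 23, 23, 22};
    int[] even = {21, 21, 20, 20, 20, 20, 21, 20, 20, 20};
    System.out.println(needsBalanceBySlop(skewed)); // true
    System.out.println(needsBalanceBySlop(even));   // false
  }
}
```

In the skewed case the mean is 20.3, so the 1-region server falls below the 50% floor of 10.15 and the check forces a balancer run; the even distribution stays inside the band.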
[jira] [Updated] (HBASE-26718) HFileArchiver can remove referenced StoreFiles from the archive
[ https://issues.apache.org/jira/browse/HBASE-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-26718: -- Status: Patch Available (was: Open) > HFileArchiver can remove referenced StoreFiles from the archive > --- > > Key: HBASE-26718 > URL: https://issues.apache.org/jira/browse/HBASE-26718 > Project: HBase > Issue Type: Bug > Components: Compaction, HFile, snapshots >Affects Versions: 2.0.0, 3.0.0-alpha-1, 1.0.0, 0.95.0 >Reporter: David Manning >Assignee: David Manning >Priority: Major > Fix For: 2.5.0, 1.7.2, 2.6.0, 3.0.0-alpha-3, 2.4.12 > > > There is a comment in {{HFileArchiver#resolveAndArchiveFile}}: > {code:java} > // if the file already exists in the archive, move that one to a timestamped > backup. This is a > // really, really unlikely situtation, where we get the same name for the > existing file, but > // is included just for that 1 in trillion chance. > {code} > In reality, we did encounter this frequently enough to cause problems. More > details will be included and linked in a separate issue. > But regardless of how we get into this situation, we can consider a different > approach to solving it. If we assume store files are immutable, and a store > file with the same name and location already exists in the archive, then it > can be safer to assume the file was already archived successfully, and react > accordingly. -- This message was sent by Atlassian Jira (v8.20.1#820001)
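The safer behavior described above can be sketched with plain java.nio file operations: under the store-file immutability assumption, a same-named file already present in the archive is treated as a completed archival instead of being renamed to a timestamped backup. The class and method names are hypothetical; this is not the actual HFileArchiver API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the proposed archive behavior. Because store files
// are immutable, a same-named file already in the archive is assumed to hold
// the same bytes, so the archival is reported as already done rather than
// moving the existing archive copy aside.
public class ArchiveSketch {
  /** Moves storeFile into archiveDir; returns true once the file is archived. */
  static boolean archiveStoreFile(Path storeFile, Path archiveDir) throws IOException {
    Files.createDirectories(archiveDir);
    Path dest = archiveDir.resolve(storeFile.getFileName());
    if (Files.exists(dest)) {
      // Immutability assumption: dest already has this file's contents.
      // Delete the source and report success instead of renaming dest aside.
      Files.deleteIfExists(storeFile);
      return true;
    }
    Files.move(storeFile, dest);
    return true;
  }
}
```

The key difference from the current behavior is that the pre-existing archive entry is never renamed, so any snapshot reference to it stays valid.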
[jira] [Commented] (HBASE-26720) ExportSnapshot should validate the source snapshot before copying files
[ https://issues.apache.org/jira/browse/HBASE-26720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501601#comment-17501601 ] David Manning commented on HBASE-26720: --- [~apurtell] I don't have a patch yet, but I do still plan to work on it. I am not opposed to giving up ownership if anyone else already has a patch. > ExportSnapshot should validate the source snapshot before copying files > --- > > Key: HBASE-26720 > URL: https://issues.apache.org/jira/browse/HBASE-26720 > Project: HBase > Issue Type: Improvement > Components: snapshots >Affects Versions: 0.99.0, 1.0.0, 3.0.0-alpha-1, 2.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Major > Fix For: 2.5.0, 1.7.2, 2.6.0, 3.0.0-alpha-3, 2.4.11 > > > Running {{ExportSnapshot}} with default parameters will copy the snapshot to > a target location, and then use {{verifySnapshot}} to validate the integrity > of the written snapshot. However, it is possible for the source snapshot to > be invalid which leads to an invalid exported snapshot. > We can validate the source snapshot before export. > By default, we can validate the source snapshot unless the > {{-no-target-verify}} parameter is set. We could also introduce a separate > parameter for {{-no-source-verify}} if an operator wanted to validate the > target but not validate the source for some reason, to provide some amount of > backwards compatibility if that scenario is important. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HBASE-26718) HFileArchiver can remove referenced StoreFiles from the archive
[ https://issues.apache.org/jira/browse/HBASE-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501600#comment-17501600 ] David Manning commented on HBASE-26718: --- [~apurtell] I don't have a patch yet, but I do still plan to work on it. I am not opposed to giving up ownership if anyone else already has a patch. > HFileArchiver can remove referenced StoreFiles from the archive > --- > > Key: HBASE-26718 > URL: https://issues.apache.org/jira/browse/HBASE-26718 > Project: HBase > Issue Type: Bug > Components: Compaction, HFile, snapshots >Affects Versions: 0.95.0, 1.0.0, 3.0.0-alpha-1, 2.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Major > Fix For: 2.5.0, 1.7.2, 2.6.0, 3.0.0-alpha-3, 2.4.11 > > > There is a comment in {{HFileArchiver#resolveAndArchiveFile}}: > {code:java} > // if the file already exists in the archive, move that one to a timestamped > backup. This is a > // really, really unlikely situtation, where we get the same name for the > existing file, but > // is included just for that 1 in trillion chance. > {code} > In reality, we did encounter this frequently enough to cause problems. More > details will be included and linked in a separate issue. > But regardless of how we get into this situation, we can consider a different > approach to solving it. If we assume store files are immutable, and a store > file with the same name and location already exists in the archive, then it > can be safer to assume the file was already archived successfully, and react > accordingly. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HBASE-26722) Snapshot is corrupted due to interaction between move, warmupRegion, compaction, and HFileArchiver
[ https://issues.apache.org/jira/browse/HBASE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-26722: -- Affects Version/s: (was: 2.0.0) > Snapshot is corrupted due to interaction between move, warmupRegion, > compaction, and HFileArchiver > -- > > Key: HBASE-26722 > URL: https://issues.apache.org/jira/browse/HBASE-26722 > Project: HBase > Issue Type: Bug > Components: Compaction, mover, snapshots >Affects Versions: 1.3.5 >Reporter: David Manning >Priority: Critical > Fix For: 2.2.0, 2.3.0 > > > There is an interesting sequence of events which leads to split-brain, > double-assignment type of behavior with management of store files. > The scenario is this: > # Take snapshot > # RegionX of snapshotted table is hosted on RegionServer1. > # Stop RegionServer1, using {{region_mover}}, gracefully moving all regions > to other regionservers using {{move}} RPCs. > # RegionX is now opened on RegionServer2. > # RegionServer2 compacts RegionX after opening. > # RegionServer1 starts and uses {{region_mover}} to {{move}} all previously > owned regions back to itself. > # The HMaster RPC to {{move}} calls {{warmupRegion}} on RegionServer1. > # As part of {{warmupRegion}}, RegionServer1 opens all store files of > RegionX. CompactedHFilesDischarger chore has not yet archived the > pre-compacted store file. RegionServer1 finds both the pre-compacted store > file and post-compacted store file. It logs a warning and archives the > pre-compacted file. > # RegionServer1 has warmed up the region, so now HMaster resumes the {{move}} > and sends {{close}} RegionX to RegionServer2. > # RegionServer2 closes its store files. As part of this, it archives any > compacted files which have not yet been archived by the > {{CompactedHFilesDischarger}} chore. > # Since RegionServer1 already archived the file, RegionServer2's > {{HFileArchiver}} finds the destination archive file already exists. 
(code > link) > # RegionServer2 renames the archived file, to free up the desired destination > filename. > With the archived file renamed, RegionServer2 attempts to archive the file as > planned. But the source file doesn't exist because RegionServer1 already > moved it... to the location RegionServer2 expected to use! > # RegionServer2 silently ignores this archival failure. (code link) > # HMaster {{HFileCleaner}} chore later deletes the renamed archive file, > because there is no active reference to it. (The snapshot reference is to the > original named file, not the "backup" timestamped version.) The snapshot data > is irretrievably lost. > HBASE-26718 tracks a potential, specific change to the archival process to > avoid this specific issue. > However, there is a more fundamental problem here that a region opened by > {{warmupRegion}} can operate on that region's store files while the region is > opened elsewhere, which must not be allowed. > This was seen on branch-1, and is a combination of HBASE-22330 and not having > the fix for HBASE-22163. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HBASE-26722) Snapshot is corrupted due to interaction between move, warmupRegion, compaction, and HFileArchiver
[ https://issues.apache.org/jira/browse/HBASE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-26722: -- Fix Version/s: 2.3.0 2.2.0 > Snapshot is corrupted due to interaction between move, warmupRegion, > compaction, and HFileArchiver > -- > > Key: HBASE-26722 > URL: https://issues.apache.org/jira/browse/HBASE-26722 > Project: HBase > Issue Type: Bug > Components: Compaction, mover, snapshots >Affects Versions: 2.0.0, 1.3.5 >Reporter: David Manning >Priority: Critical > Fix For: 2.2.0, 2.3.0 > > > There is an interesting sequence of events which leads to split-brain, > double-assignment type of behavior with management of store files. > The scenario is this: > # Take snapshot > # RegionX of snapshotted table is hosted on RegionServer1. > # Stop RegionServer1, using {{region_mover}}, gracefully moving all regions > to other regionservers using {{move}} RPCs. > # RegionX is now opened on RegionServer2. > # RegionServer2 compacts RegionX after opening. > # RegionServer1 starts and uses {{region_mover}} to {{move}} all previously > owned regions back to itself. > # The HMaster RPC to {{move}} calls {{warmupRegion}} on RegionServer1. > # As part of {{warmupRegion}}, RegionServer1 opens all store files of > RegionX. CompactedHFilesDischarger chore has not yet archived the > pre-compacted store file. RegionServer1 finds both the pre-compacted store > file and post-compacted store file. It logs a warning and archives the > pre-compacted file. > # RegionServer1 has warmed up the region, so now HMaster resumes the {{move}} > and sends {{close}} RegionX to RegionServer2. > # RegionServer2 closes its store files. As part of this, it archives any > compacted files which have not yet been archived by the > {{CompactedHFilesDischarger}} chore. > # Since RegionServer1 already archived the file, RegionServer2's > {{HFileArchiver}} finds the destination archive file already exists. 
(code > link) > # RegionServer2 renames the archived file, to free up the desired destination > filename. > With the archived file renamed, RegionServer2 attempts to archive the file as > planned. But the source file doesn't exist because RegionServer1 already > moved it... to the location RegionServer2 expected to use! > # RegionServer2 silently ignores this archival failure. (code link) > # HMaster {{HFileCleaner}} chore later deletes the renamed archive file, > because there is no active reference to it. (The snapshot reference is to the > original named file, not the "backup" timestamped version.) The snapshot data > is irretrievably lost. > HBASE-26718 tracks a potential, specific change to the archival process to > avoid this specific issue. > However, there is a more fundamental problem here that a region opened by > {{warmupRegion}} can operate on that region's store files while the region is > opened elsewhere, which must not be allowed. > This was seen on branch-1, and is a combination of HBASE-22330 and not having > the fix for HBASE-22163. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HBASE-26722) Snapshot is corrupted due to interaction between move, warmupRegion, compaction, and HFileArchiver
[ https://issues.apache.org/jira/browse/HBASE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning resolved HBASE-26722. --- Resolution: Duplicate > Snapshot is corrupted due to interaction between move, warmupRegion, > compaction, and HFileArchiver > -- > > Key: HBASE-26722 > URL: https://issues.apache.org/jira/browse/HBASE-26722 > Project: HBase > Issue Type: Bug > Components: Compaction, mover, snapshots >Affects Versions: 2.0.0, 1.3.5 >Reporter: David Manning >Priority: Critical > > There is an interesting sequence of events which leads to split-brain, > double-assignment type of behavior with management of store files. > The scenario is this: > # Take snapshot > # RegionX of snapshotted table is hosted on RegionServer1. > # Stop RegionServer1, using {{region_mover}}, gracefully moving all regions > to other regionservers using {{move}} RPCs. > # RegionX is now opened on RegionServer2. > # RegionServer2 compacts RegionX after opening. > # RegionServer1 starts and uses {{region_mover}} to {{move}} all previously > owned regions back to itself. > # The HMaster RPC to {{move}} calls {{warmupRegion}} on RegionServer1. > # As part of {{warmupRegion}}, RegionServer1 opens all store files of > RegionX. CompactedHFilesDischarger chore has not yet archived the > pre-compacted store file. RegionServer1 finds both the pre-compacted store > file and post-compacted store file. It logs a warning and archives the > pre-compacted file. > # RegionServer1 has warmed up the region, so now HMaster resumes the {{move}} > and sends {{close}} RegionX to RegionServer2. > # RegionServer2 closes its store files. As part of this, it archives any > compacted files which have not yet been archived by the > {{CompactedHFilesDischarger}} chore. > # Since RegionServer1 already archived the file, RegionServer2's > {{HFileArchiver}} finds the destination archive file already exists. 
(code > link) > # RegionServer2 renames the archived file, to free up the desired destination > filename. > With the archived file renamed, RegionServer2 attempts to archive the file as > planned. But the source file doesn't exist because RegionServer1 already > moved it... to the location RegionServer2 expected to use! > # RegionServer2 silently ignores this archival failure. (code link) > # HMaster {{HFileCleaner}} chore later deletes the renamed archive file, > because there is no active reference to it. (The snapshot reference is to the > original named file, not the "backup" timestamped version.) The snapshot data > is irretrievably lost. > HBASE-26718 tracks a potential, specific change to the archival process to > avoid this specific issue. > However, there is a more fundamental problem here that a region opened by > {{warmupRegion}} can operate on that region's store files while the region is > opened elsewhere, which must not be allowed. > This was seen on branch-1, and is a combination of HBASE-22330 and not having > the fix for HBASE-22163. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HBASE-26722) Snapshot is corrupted due to interaction between move, warmupRegion, compaction, and HFileArchiver
David Manning created HBASE-26722: - Summary: Snapshot is corrupted due to interaction between move, warmupRegion, compaction, and HFileArchiver Key: HBASE-26722 URL: https://issues.apache.org/jira/browse/HBASE-26722 Project: HBase Issue Type: Bug Components: Compaction, mover, snapshots Affects Versions: 1.3.5, 2.0.0 Reporter: David Manning There is an interesting sequence of events which leads to split-brain, double-assignment type of behavior with management of store files. The scenario is this: # Take snapshot # RegionX of snapshotted table is hosted on RegionServer1. # Stop RegionServer1, using {{region_mover}}, gracefully moving all regions to other regionservers using {{move}} RPCs. # RegionX is now opened on RegionServer2. # RegionServer2 compacts RegionX after opening. # RegionServer1 starts and uses {{region_mover}} to {{move}} all previously owned regions back to itself. # The HMaster RPC to {{move}} calls {{warmupRegion}} on RegionServer1. # As part of {{warmupRegion}}, RegionServer1 opens all store files of RegionX. CompactedHFilesDischarger chore has not yet archived the pre-compacted store file. RegionServer1 finds both the pre-compacted store file and post-compacted store file. It logs a warning and archives the pre-compacted file. # RegionServer1 has warmed up the region, so now HMaster resumes the {{move}} and sends {{close}} RegionX to RegionServer2. # RegionServer2 closes its store files. As part of this, it archives any compacted files which have not yet been archived by the {{CompactedHFilesDischarger}} chore. # Since RegionServer1 already archived the file, RegionServer2's {{HFileArchiver}} finds the destination archive file already exists. (code link) # RegionServer2 renames the archived file, to free up the desired destination filename. With the archived file renamed, RegionServer2 attempts to archive the file as planned. But the source file doesn't exist because RegionServer1 already moved it... 
to the location RegionServer2 expected to use! # RegionServer2 silently ignores this archival failure. (code link) # HMaster {{HFileCleaner}} chore later deletes the renamed archive file, because there is no active reference to it. (The snapshot reference is to the original named file, not the "backup" timestamped version.) The snapshot data is irretrievably lost. HBASE-26718 tracks a potential, specific change to the archival process to avoid this specific issue. However, there is a more fundamental problem here that a region opened by {{warmupRegion}} can operate on that region's store files while the region is opened elsewhere, which must not be allowed. This was seen on branch-1, and is a combination of HBASE-22330 and not having the fix for HBASE-22163. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HBASE-26720) ExportSnapshot should validate the source snapshot before copying files
David Manning created HBASE-26720: - Summary: ExportSnapshot should validate the source snapshot before copying files Key: HBASE-26720 URL: https://issues.apache.org/jira/browse/HBASE-26720 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 2.0.0, 3.0.0-alpha-1, 1.0.0, 0.99.0 Reporter: David Manning Assignee: David Manning Running {{ExportSnapshot}} with default parameters will copy the snapshot to a target location, and then use {{verifySnapshot}} to validate the integrity of the written snapshot. However, it is possible for the source snapshot to be invalid which leads to an invalid exported snapshot. We can validate the source snapshot before export. By default, we can validate the source snapshot unless the {{-no-target-verify}} parameter is set. We could also introduce a separate parameter for {{-no-source-verify}} if an operator wanted to validate the target but not validate the source for some reason, to provide some amount of backwards compatibility if that scenario is important. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HBASE-26718) HFileArchiver can remove referenced StoreFiles from the archive
[ https://issues.apache.org/jira/browse/HBASE-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-26718: -- Affects Version/s: 0.95.0 > HFileArchiver can remove referenced StoreFiles from the archive > --- > > Key: HBASE-26718 > URL: https://issues.apache.org/jira/browse/HBASE-26718 > Project: HBase > Issue Type: Bug > Components: Compaction, HFile, snapshots >Affects Versions: 0.95.0, 1.0.0, 3.0.0-alpha-1, 2.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Major > > There is a comment in {{HFileArchiver#resolveAndArchiveFile}}: > {code:java} > // if the file already exists in the archive, move that one to a timestamped > backup. This is a > // really, really unlikely situtation, where we get the same name for the > existing file, but > // is included just for that 1 in trillion chance. > {code} > In reality, we did encounter this frequently enough to cause problems. More > details will be included and linked in a separate issue. > But regardless of how we get into this situation, we can consider a different > approach to solving it. If we assume store files are immutable, and a store > file with the same name and location already exists in the archive, then it > can be safer to assume the file was already archived successfully, and react > accordingly. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HBASE-26718) HFileArchiver can remove referenced StoreFiles from the archive
David Manning created HBASE-26718: - Summary: HFileArchiver can remove referenced StoreFiles from the archive Key: HBASE-26718 URL: https://issues.apache.org/jira/browse/HBASE-26718 Project: HBase Issue Type: Bug Components: Compaction, HFile, snapshots Affects Versions: 2.0.0, 3.0.0-alpha-1, 1.0.0 Reporter: David Manning Assignee: David Manning There is a comment in {{HFileArchiver#resolveAndArchiveFile}}: {code:java} // if the file already exists in the archive, move that one to a timestamped backup. This is a // really, really unlikely situtation, where we get the same name for the existing file, but // is included just for that 1 in trillion chance. {code} In reality, we did encounter this frequently enough to cause problems. More details will be included and linked in a separate issue. But regardless of how we get into this situation, we can consider a different approach to solving it. If we assume store files are immutable, and a store file with the same name and location already exists in the archive, then it can be safer to assume the file was already archived successfully, and react accordingly. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HBASE-22300) SLB doesn't perform well with increase in number of regions
[ https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17378549#comment-17378549 ] David Manning commented on HBASE-22300: --- and more specifically in subtasks HBASE-25947 or HBASE-25894 > SLB doesn't perform well with increase in number of regions > --- > > Key: HBASE-22300 > URL: https://issues.apache.org/jira/browse/HBASE-22300 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Biju Nair >Assignee: David Manning >Priority: Major > Labels: balancer > Attachments: CostFromRegionLoadFunctionNew.rtf > > > With an increase in the number of regions in a cluster, the number of steps taken by > the balancer in 30 sec (the default balancer runtime) drops noticeably. The > following is the number of steps taken by the balancer with region loads set > and without the loads being set, i.e. cost functions using region > loads are not fully exercised. > {noformat} > Nodes regions Tables # of steps # of steps > with RS Load With no load > 5 50 5 20 20 > 100 2000 110 104707 100 > > 100 1 40 19911 100 > > 200 10 400 870 100 > {noformat} > As one would expect, the reduced number of steps also makes the balancer take > a long time to get to an optimal cost. Note that only 2 data points were used > in the region load histogram while in practice 15 region load data points are > remembered. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22300) SLB doesn't perform well with increase in number of regions
[ https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17378546#comment-17378546 ] David Manning commented on HBASE-22300: --- I was working on this but it looks like it's already resolved in HBASE-25832. > SLB doesn't perform well with increase in number of regions > --- > > Key: HBASE-22300 > URL: https://issues.apache.org/jira/browse/HBASE-22300 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Biju Nair >Assignee: David Manning >Priority: Major > Labels: balancer > Attachments: CostFromRegionLoadFunctionNew.rtf > > > With increase in number of regions in a cluster the number of steps taken by > balancer in 30 sec (default balancer runtime) reduces noticeably. The > following is the number of steps taken with by balancer with region loads set > and running it without the loads being set i.e. cost functions using region > loads are not fully exercised. > {noformat} > Nodes regions Tables # of steps # of steps > with RS Load With no load > 5 50 5 20 20 > 100 2000 110 104707 100 > > 100 1 40 19911 100 > > 200 10 400 870 100 > {noformat} > As one would expect the reduced number of steps also makes the balancer take > long time to get to an optimal cost. Note that only 2 data points were used > in the region load histogram while in practice 15 region load data points are > remembered. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-22300) SLB doesn't perform well with increase in number of regions
[ https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning resolved HBASE-22300. --- Resolution: Duplicate > SLB doesn't perform well with increase in number of regions > --- > > Key: HBASE-22300 > URL: https://issues.apache.org/jira/browse/HBASE-22300 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Biju Nair >Assignee: David Manning >Priority: Major > Labels: balancer > Attachments: CostFromRegionLoadFunctionNew.rtf > > > With increase in number of regions in a cluster the number of steps taken by > balancer in 30 sec (default balancer runtime) reduces noticeably. The > following is the number of steps taken with by balancer with region loads set > and running it without the loads being set i.e. cost functions using region > loads are not fully exercised. > {noformat} > Nodes regions Tables # of steps # of steps > with RS Load With no load > 5 50 5 20 20 > 100 2000 110 104707 100 > > 100 1 40 19911 100 > > 200 10 400 870 100 > {noformat} > As one would expect the reduced number of steps also makes the balancer take > long time to get to an optimal cost. Note that only 2 data points were used > in the region load histogram while in practice 15 region load data points are > remembered. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-22300) SLB doesn't perform well with increase in number of regions
[ https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning reassigned HBASE-22300: - Assignee: David Manning > SLB doesn't perform well with increase in number of regions > --- > > Key: HBASE-22300 > URL: https://issues.apache.org/jira/browse/HBASE-22300 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Biju Nair >Assignee: David Manning >Priority: Major > Labels: balancer > Attachments: CostFromRegionLoadFunctionNew.rtf > > > With increase in number of regions in a cluster the number of steps taken by > balancer in 30 sec (default balancer runtime) reduces noticeably. The > following is the number of steps taken with by balancer with region loads set > and running it without the loads being set i.e. cost functions using region > loads are not fully exercised. > {noformat} > Nodes regions Tables # of steps # of steps > with RS Load With no load > 5 50 5 20 20 > 100 2000 110 104707 100 > > 100 1 40 19911 100 > > 200 10 400 870 100 > {noformat} > As one would expect the reduced number of steps also makes the balancer take > long time to get to an optimal cost. Note that only 2 data points were used > in the region load histogram while in practice 15 region load data points are > remembered. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-22300) SLB doesn't perform well with increase in number of regions
[ https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362413#comment-17362413 ] David Manning edited comment on HBASE-22300 at 6/12/21, 10:37 PM: -- [~gsbiju] do you still have interest in pursuing this work? If not, I would like to attempt a fix based on your proposal. was (Author: dmanning): [~gsbiju] do you still have interest in pursuing this work? If not, I would like to attempt a fix. > SLB doesn't perform well with increase in number of regions > --- > > Key: HBASE-22300 > URL: https://issues.apache.org/jira/browse/HBASE-22300 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Biju Nair >Priority: Major > Labels: balancer > Attachments: CostFromRegionLoadFunctionNew.rtf > > > With increase in number of regions in a cluster the number of steps taken by > balancer in 30 sec (default balancer runtime) reduces noticeably. The > following is the number of steps taken with by balancer with region loads set > and running it without the loads being set i.e. cost functions using region > loads are not fully exercised. > {noformat} > Nodes regions Tables # of steps # of steps > with RS Load With no load > 5 50 5 20 20 > 100 2000 110 104707 100 > > 100 1 40 19911 100 > > 200 10 400 870 100 > {noformat} > As one would expect the reduced number of steps also makes the balancer take > long time to get to an optimal cost. Note that only 2 data points were used > in the region load histogram while in practice 15 region load data points are > remembered. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22300) SLB doesn't perform well with increase in number of regions
[ https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362413#comment-17362413 ] David Manning commented on HBASE-22300: --- [~gsbiju] do you still have interest in pursuing this work? If not, I would like to attempt a fix. > SLB doesn't perform well with increase in number of regions > --- > > Key: HBASE-22300 > URL: https://issues.apache.org/jira/browse/HBASE-22300 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Biju Nair >Priority: Major > Labels: balancer > Attachments: CostFromRegionLoadFunctionNew.rtf > > > With an increase in the number of regions in a cluster, the number of steps taken by > the balancer in 30 sec (default balancer runtime) reduces noticeably. The > following is the number of steps taken by the balancer with region loads set > and running it without the loads being set, i.e. cost functions using region > loads are not fully exercised. > {noformat} > Nodes regions Tables # of steps # of steps > with RS Load With no load > 5 50 5 20 20 > 100 2000 110 104707 100 > > 100 1 40 19911 100 > > 200 10 400 870 100 > {noformat} > As one would expect, the reduced number of steps also makes the balancer take a > long time to get to an optimal cost. Note that only 2 data points were used > in the region load histogram while in practice 15 region load data points are > remembered. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25739) TableSkewCostFunction need to use aggregated deviation
[ https://issues.apache.org/jira/browse/HBASE-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320237#comment-17320237 ] David Manning commented on HBASE-25739: --- oops yes! 198. Thanks [~claraxiong] > TableSkewCostFunction need to use aggregated deviation > -- > > Key: HBASE-25739 > URL: https://issues.apache.org/jira/browse/HBASE-25739 > Project: HBase > Issue Type: Sub-task > Components: Balancer, master >Reporter: Clara Xiong >Priority: Major > > TableSkewCostFunction uses the sum of the max deviation region per server for > all tables as the measure of unevenness. It doesn't work in a very common > scenario in operations. Say we have 100 regions on 50 nodes, two on each. We > add 50 new nodes and they have 0 each. The max deviation from the mean is 1, > compared to 99 in the worst case scenario of 100 regions on a single server. > The normalized cost is 1/99 = 0.011 < default threshold of 0.05. Balancer > wouldn't move. The proposal is to use aggregated deviation of the count per > region server to detect this scenario, generating a cost of 100/198 = 0.5 in > this case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25739) TableSkewCostFunction need to use aggregated deviation
[ https://issues.apache.org/jira/browse/HBASE-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319945#comment-17319945 ] David Manning commented on HBASE-25739: --- Yes [~clarax98007] that sounds good to me. I wasn’t suggesting any default weights need to change but was curious what you had found. Thanks for sharing. The final cost in the description probably goes to 100/298 instead of 3.1/31, is that right? > TableSkewCostFunction need to use aggregated deviation > -- > > Key: HBASE-25739 > URL: https://issues.apache.org/jira/browse/HBASE-25739 > Project: HBase > Issue Type: Sub-task > Components: Balancer, master >Reporter: Clara Xiong >Priority: Major > > TableSkewCostFunction uses the sum of the max deviation region per server for > all tables as the measure of unevenness. It doesn't work in a very common > scenario in operations. Say we have 100 regions on 50 nodes, two on each. We > add 50 new nodes and they have 0 each. The max deviation from the mean is 1, > compared to 99 in the worst case scenario of 100 regions on a single server. > The normalized cost is 1/99 = 0.011 < default threshold of 0.05. Balancer > wouldn't move. The proposal is to use aggregated deviation of the count per > region server to detect this scenario, generating a cost of 3.1/31 = 0.1 in > this case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25739) TableSkewCostFunction need to use aggregated deviation
[ https://issues.apache.org/jira/browse/HBASE-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318128#comment-17318128 ] David Manning commented on HBASE-25739: --- Can you update the description since we are no longer using standard deviation in the current proposal? Do you have any thoughts on the current default weight of TableSkewCostFunction? DEFAULT_TABLE_SKEW_COST = 35 - I wonder if this still makes sense given this change, or if it had this value due to the previous cost calculation. I don't really know myself... intuitively it makes sense to me to leave at 35, as it seems more important than most other cost functions, and less important than RegionCountSkewCostFunction. I was just curious if you had any thoughts. Thanks for the nice improvement. > TableSkewCostFunction need to use aggregated deviation > -- > > Key: HBASE-25739 > URL: https://issues.apache.org/jira/browse/HBASE-25739 > Project: HBase > Issue Type: Sub-task > Components: Balancer, master >Reporter: Clara Xiong >Priority: Major > > TableSkewCostFunction uses the sum of the max deviation region per server for > all tables as the measure of unevenness. It doesn't work in a very common > scenario in operations. Say we have 100 regions on 50 nodes, two on each. We > add 50 new nodes and they have 0 each. The max deviation from the mean is 1, > compared to 99 in the worst case scenario of 100 regions on a single server. > The normalized cost is 1/99 = 0.011 < default threshold of 0.05. Balancer > wouldn't move. The proposal is to use the standard deviation of the count > per region server to detect this scenario, generating a cost of 3.1/31 = 0.1 > in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
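The cost arithmetic discussed in this thread (1/99 vs. 100/198 for the 100-regions, 50-plus-50-servers example) can be sanity-checked with a short standalone Python sketch. This is illustrative only, not HBase's CostFunction code; the normalization constants are the worst-case deviations quoted in the description.

```python
# Scenario from the description: 100 regions on 50 servers (2 each),
# then 50 empty servers are added. Mean = 100 regions / 100 servers = 1.
counts = [2] * 50 + [0] * 50
mean = sum(counts) / len(counts)

# Current cost: max deviation from the mean, normalized by the worst case
# of all 100 regions on a single server (deviation 100 - 1 = 99).
old_cost = max(abs(c - mean) for c in counts) / 99   # 1/99, about 0.011

# Proposed cost: aggregated (summed) deviation, normalized by its worst
# case: |100 - 1| + 99 * |0 - 1| = 198.
new_cost = sum(abs(c - mean) for c in counts) / 198  # 100/198, about 0.505

print(old_cost, new_cost)
```

The old cost sits below the default 0.05 threshold, so the balancer stays put; the aggregated-deviation cost of roughly 0.5 would trigger balancing, matching the 100/198 figure in the description.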
[jira] [Assigned] (HBASE-25749) Improved logging when interrupting active RPC handlers holding the region close lock (HBASE-25212 hbase.regionserver.close.wait.abort)
[ https://issues.apache.org/jira/browse/HBASE-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning reassigned HBASE-25749: - Assignee: Andrew Kyle Purtell > Improved logging when interrupting active RPC handlers holding the region > close lock (HBASE-25212 hbase.regionserver.close.wait.abort) > -- > > Key: HBASE-25749 > URL: https://issues.apache.org/jira/browse/HBASE-25749 > Project: HBase > Issue Type: Bug > Components: regionserver, rpc >Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.4.0 >Reporter: David Manning >Assignee: Andrew Kyle Purtell >Priority: Minor > > HBASE-25212 adds an optional improvement to Close Region, for interrupting > active RPC handlers holding the region close lock. If, after the timeout is > reached, the close lock can still not be acquired, the regionserver may > abort. It would be helpful to add logging for which threads or components are > holding the region close lock at this time. > Depending on the size of regionLockHolders, or use of any stack traces, log > output may need to be truncated. The interrupt code is in > HRegion#interruptRegionOperations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25749) Improved logging when interrupting active RPC handlers holding the region close lock (HBASE-25212 hbase.regionserver.close.wait.abort)
David Manning created HBASE-25749: - Summary: Improved logging when interrupting active RPC handlers holding the region close lock (HBASE-25212 hbase.regionserver.close.wait.abort) Key: HBASE-25749 URL: https://issues.apache.org/jira/browse/HBASE-25749 Project: HBase Issue Type: Bug Components: regionserver, rpc Affects Versions: 2.4.0, 3.0.0-alpha-1, 1.7.0 Reporter: David Manning HBASE-25212 adds an optional improvement to Close Region, for interrupting active RPC handlers holding the region close lock. If, after the timeout is reached, the close lock can still not be acquired, the regionserver may abort. It would be helpful to add logging for which threads or components are holding the region close lock at this time. Depending on the size of regionLockHolders, or use of any stack traces, log output may need to be truncated. The interrupt code is in HRegion#interruptRegionOperations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25726) MoveCostFunction is not included in the list of cost functions for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-25726: -- Status: Patch Available (was: Open) > MoveCostFunction is not included in the list of cost functions for > StochasticLoadBalancer > - > > Key: HBASE-25726 > URL: https://issues.apache.org/jira/browse/HBASE-25726 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 2.4.0, 2.3.1, 3.0.0-alpha-1, 1.7.0 >Reporter: David Manning >Assignee: David Manning >Priority: Major > > After OffPeakHours fix for MoveCostFunction (HBASE-24709), MoveCostFunction > is no longer included in costFunctions list. {{addCostFunction}} expects > multiplier to be non-zero, but multiplier is now only set in {{cost}} > function. > As a result, {{hbase.master.balancer.stochastic.maxMovePercent}} is not > respected, and there is no cost function to oppose a move. Any move that > decreases total cost at all will be accepted, causing more churn and > disruption from balancer executions. > We noticed this when investigating a case where the balancer would run after > a regionserver was restarted without use of region_mover script. The > regionserver comes online with 0 regions, leading to a shortcut in > {{needsBalance}} for {{idleRegionServerExist}}. The balancer runs to move > regions to that newly restarted regionserver. However, it moves a large > number of regions in the cluster, hyper-optimizing the other cost variables. > There were ~4300 regions in the cluster at the time, so moving 25% of the > regions should have had a final cost of at least 7 (default moveCostFunction > weight.) MoveCostFunction is also not listed in the functions contributing to > the initial cost. 
> {{2021-03-30 15:47:43,396 INFO [49187_ChoreService_3] > balancer.StochasticLoadBalancer - start StochasticLoadBalancer.balancer, > initCost=12.91377229840024, functionCost=RegionCountSkewCostFunction : > (500.0, 0.014878672009326464); TableSkewCostFunction : (35.0, > 0.013600280177445717); RegionReplicaHostCostFunction : (10.0, 0.0); > RegionReplicaRackCostFunction : (1.0, 0.0); ReadRequestCostFunction : > (5.0, 0.8296332203204705); WriteRequestCostFunction : (5.0, > 0.06818455421617946); MemstoreSizeCostFunction : (5.0, 0.08132131691669181); > StoreFileCostFunction : (5.0, 0.02054620605193966); computedMaxSteps: > 100}} > {{2021-03-30 15:48:13,385 DEBUG [49187_ChoreService_3] > balancer.StochasticLoadBalancer - Finished computing new load balance plan. > Computation took 30004ms to try 6571 different iterations. Found a solution > that moves 1095 regions; Going from a computed cost of 12.91377229840024 to a > new cost of 4.804625730746651}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-25726) MoveCostFunction is not included in the list of cost functions for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning reassigned HBASE-25726: - Assignee: David Manning > MoveCostFunction is not included in the list of cost functions for > StochasticLoadBalancer > - > > Key: HBASE-25726 > URL: https://issues.apache.org/jira/browse/HBASE-25726 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0 >Reporter: David Manning >Assignee: David Manning >Priority: Major > > After OffPeakHours fix for MoveCostFunction (HBASE-24709), MoveCostFunction > is no longer included in costFunctions list. {{addCostFunction}} expects > multiplier to be non-zero, but multiplier is now only set in {{cost}} > function. > As a result, {{hbase.master.balancer.stochastic.maxMovePercent}} is not > respected, and there is no cost function to oppose a move. Any move that > decreases total cost at all will be accepted, causing more churn and > disruption from balancer executions. > We noticed this when investigating a case where the balancer would run after > a regionserver was restarted without use of region_mover script. The > regionserver comes online with 0 regions, leading to a shortcut in > {{needsBalance}} for {{idleRegionServerExist}}. The balancer runs to move > regions to that newly restarted regionserver. However, it moves a large > number of regions in the cluster, hyper-optimizing the other cost variables. > There were ~4300 regions in the cluster at the time, so moving 25% of the > regions should have had a final cost of at least 7 (default moveCostFunction > weight.) MoveCostFunction is also not listed in the functions contributing to > the initial cost. 
> {{2021-03-30 15:47:43,396 INFO [49187_ChoreService_3] > balancer.StochasticLoadBalancer - start StochasticLoadBalancer.balancer, > initCost=12.91377229840024, functionCost=RegionCountSkewCostFunction : > (500.0, 0.014878672009326464); TableSkewCostFunction : (35.0, > 0.013600280177445717); RegionReplicaHostCostFunction : (10.0, 0.0); > RegionReplicaRackCostFunction : (1.0, 0.0); ReadRequestCostFunction : > (5.0, 0.8296332203204705); WriteRequestCostFunction : (5.0, > 0.06818455421617946); MemstoreSizeCostFunction : (5.0, 0.08132131691669181); > StoreFileCostFunction : (5.0, 0.02054620605193966); computedMaxSteps: > 100}} > {{2021-03-30 15:48:13,385 DEBUG [49187_ChoreService_3] > balancer.StochasticLoadBalancer - Finished computing new load balance plan. > Computation took 30004ms to try 6571 different iterations. Found a solution > that moves 1095 regions; Going from a computed cost of 12.91377229840024 to a > new cost of 4.804625730746651}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25726) MoveCostFunction is not included in the list of cost functions for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-25726: -- Description: After OffPeakHours fix for MoveCostFunction (HBASE-24709), MoveCostFunction is no longer included in costFunctions list. {{addCostFunction}} expects multiplier to be non-zero, but multiplier is now only set in {{cost}} function. As a result, {{hbase.master.balancer.stochastic.maxMovePercent}} is not respected, and there is no cost function to oppose a move. Any move that decreases total cost at all will be accepted, causing more churn and disruption from balancer executions. We noticed this when investigating a case where the balancer would run after a regionserver was restarted without use of region_mover script. The regionserver comes online with 0 regions, leading to a shortcut in {{needsBalance}} for {{idleRegionServerExist}}. The balancer runs to move regions to that newly restarted regionserver. However, it moves a large number of regions in the cluster, hyper-optimizing the other cost variables. There were ~4300 regions in the cluster at the time, so moving 25% of the regions should have had a final cost of at least 7 (default moveCostFunction weight.) MoveCostFunction is also not listed in the functions contributing to the initial cost. 
{{2021-03-30 15:47:43,396 INFO [49187_ChoreService_3] balancer.StochasticLoadBalancer - start StochasticLoadBalancer.balancer, initCost=12.91377229840024, functionCost=RegionCountSkewCostFunction : (500.0, 0.014878672009326464); TableSkewCostFunction : (35.0, 0.013600280177445717); RegionReplicaHostCostFunction : (10.0, 0.0); RegionReplicaRackCostFunction : (1.0, 0.0); ReadRequestCostFunction : (5.0, 0.8296332203204705); WriteRequestCostFunction : (5.0, 0.06818455421617946); MemstoreSizeCostFunction : (5.0, 0.08132131691669181); StoreFileCostFunction : (5.0, 0.02054620605193966); computedMaxSteps: 100}} {{2021-03-30 15:48:13,385 DEBUG [49187_ChoreService_3] balancer.StochasticLoadBalancer - Finished computing new load balance plan. Computation took 30004ms to try 6571 different iterations. Found a solution that moves 1095 regions; Going from a computed cost of 12.91377229840024 to a new cost of 4.804625730746651}} was: After OffPeakHours fix for MoveCostFunction (HBASE-24709), MoveCostFunction is no longer included in costFunctions list. {{addCostFunction}} expects multiplier to be non-zero, but multiplier is now only set in {{cost}} function. As a result, {{hbase.master.balancer.stochastic.maxMovePercent}} is not respected, and there is no cost function to oppose a move. Any move that decreases total cost at all will be accepted, causing more churn and disruption from balancer executions. We noticed this when investigating a case where the balancer would run after a regionserver was restarted without use of region_mover script. The regionserver comes online with 0 regions, leading to a shortcut in {{needsBalance}} for {{idleRegionServerExist}}. The balancer runs to move regions to that newly restarted regionserver. However, it moves a large number of regions in the cluster, hyper-optimizing the other cost variables. 
There were ~4300 regions in the cluster at the time, so moving 25% of the regions should have had a final cost of at least 7 (default moveCostFunction weight.) MoveCostFunction is also not listed in the functions contributing to the initial cost. {{2021-03-30 15:47:43,396 INFO [49187_ChoreService_3] balancer.StochasticLoadBalancer - start StochasticLoadBalancer.balancer, initCost=12.91377229840024, functionCost=RegionCountSkewCostFunction : (500.0, 0.014878672009326464); TableSkewCostFunction : (35.0, 0.013600280177445717); RegionReplicaHostCostFunction : (10.0, 0.0); RegionReplicaRackCostFunction : (1.0, 0.0); ReadRequestCostFunction : (5.0, 0.8296332203204705); WriteRequestCostFunction : (5.0, 0.06818455421617946); MemstoreSizeCostFunction : (5.0, 0.08132131691669181); StoreFileCostFunction : (5.0, 0.02054620605193966); computedMaxSteps: 100}} {{2021-03-30 15:48:13,385 DEBUG [49187_ChoreService_3] balancer.StochasticLoadBalancer - Finished computing new load balance plan. Computation took 30004ms to try 6571 different iterations. Found a solution that moves 1095 regions; Going from a computed cost of 12.91377229840024 to a new cost of 4.804625730746651}}
[jira] [Created] (HBASE-25726) MoveCostFunction is not included in the list of cost functions for StochasticLoadBalancer
David Manning created HBASE-25726: - Summary: MoveCostFunction is not included in the list of cost functions for StochasticLoadBalancer Key: HBASE-25726 URL: https://issues.apache.org/jira/browse/HBASE-25726 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 2.4.0, 2.3.1, 3.0.0-alpha-1, 1.7.0 Reporter: David Manning After OffPeakHours fix for MoveCostFunction (HBASE-24709), MoveCostFunction is no longer included in costFunctions list. {{addCostFunction}} expects multiplier to be non-zero, but multiplier is now only set in {{cost}} function. As a result, {{hbase.master.balancer.stochastic.maxMovePercent}} is not respected, and there is no cost function to oppose a move. Any move that decreases total cost at all will be accepted, causing more churn and disruption from balancer executions. We noticed this when investigating a case where the balancer would run after a regionserver was restarted without use of region_mover script. The regionserver comes online with 0 regions, leading to a shortcut in {{needsBalance}} for {{idleRegionServerExist}}. The balancer runs to move regions to that newly restarted regionserver. However, it moves a large number of regions in the cluster, hyper-optimizing the other cost variables. There were ~4300 regions in the cluster at the time, so moving 25% of the regions should have had a final cost of at least 7 (default moveCostFunction weight.) MoveCostFunction is also not listed in the functions contributing to the initial cost. 
{{2021-03-30 15:47:43,396 INFO [49187_ChoreService_3] balancer.StochasticLoadBalancer - start StochasticLoadBalancer.balancer, initCost=12.91377229840024, functionCost=RegionCountSkewCostFunction : (500.0, 0.014878672009326464); TableSkewCostFunction : (35.0, 0.013600280177445717); RegionReplicaHostCostFunction : (10.0, 0.0); RegionReplicaRackCostFunction : (1.0, 0.0); ReadRequestCostFunction : (5.0, 0.8296332203204705); WriteRequestCostFunction : (5.0, 0.06818455421617946); MemstoreSizeCostFunction : (5.0, 0.08132131691669181); StoreFileCostFunction : (5.0, 0.02054620605193966); computedMaxSteps: 100}} {{2021-03-30 15:48:13,385 DEBUG [49187_ChoreService_3] balancer.StochasticLoadBalancer - Finished computing new load balance plan. Computation took 30004ms to try 6571 different iterations. Found a solution that moves 1095 regions; Going from a computed cost of 12.91377229840024 to a new cost of 4.804625730746651}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
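The effect described in this issue can be illustrated with a simplified model of the balancer's weighted cost sum. Everything below is hypothetical — made-up normalized costs and a simplified scale for the move term; it is not the StochasticLoadBalancer code, and only shows why omitting the move-cost term lets a 1095-region plan look cheap:

```python
# Total cost as a weighted sum of normalized (0..1) per-function costs.
def total_cost(weighted):
    return sum(w * c for w, c in weighted)

# Hypothetical normalized costs of the other functions after the plan.
others = [(500.0, 0.005), (35.0, 0.01), (5.0, 0.4)]

# Move term: default weight 7, scaled by moves relative to the cap of
# maxMovePercent (25%) of ~4300 regions; 1095 moves is at the cap.
moves, max_moves = 1095, 0.25 * 4300
move_term = (7.0, min(1.0, moves / max_moves))

print(total_cost(others))                # bug: move cost dropped, plan looks cheap
print(total_cost(others + [move_term]))  # with the term: ~7 higher, opposing churn
```

With the move term present, the plan's final cost includes the full weight of 7, consistent with the description's observation that moving 25% of the regions should have produced a cost of at least 7.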
[jira] [Updated] (HBASE-25648) Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after HBASE-25592 HBASE-23932
[ https://issues.apache.org/jira/browse/HBASE-25648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-25648: -- Summary: Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after HBASE-25592 HBASE-23932 (was: Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after HBASE-25592 HABSE-23932) > Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after > HBASE-25592 HBASE-23932 > > > Key: HBASE-25648 > URL: https://issues.apache.org/jira/browse/HBASE-25648 > Project: HBase > Issue Type: Bug > Components: Normalizer >Affects Versions: 1.7.0 >Reporter: David Manning >Assignee: David Manning >Priority: Major > > On branch-1 run > {{mvn test -Dtest=TestSimpleRegionNormalizerOnCluster}} > It fails. It appears to be due to some problems in the refactoring related to > HBASE-25592 and HBASE-23932. > > {code:java} > [INFO] Running > org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: > 131.753 s <<< FAILURE! - in > org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster > [ERROR] > testRegionNormalizationSplitOnCluster(org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster) > Time elapsed: 60.107 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 6 > milliseconds > at > org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster.testRegionNormalizationSplitOnCluster(TestSimpleRegionNormalizerOnCluster.java:132) > [ERROR] > testRegionNormalizationMergeOnCluster(org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster) > Time elapsed: 60.117 s <<< ERROR! 
> org.junit.runners.model.TestTimedOutException: test timed out after 6 > milliseconds > at > org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster.testRegionNormalizationMergeOnCluster(TestSimpleRegionNormalizerOnCluster.java:199) > [INFO] > [INFO] Results: > [INFO] > [ERROR] Errors: > [ERROR] > TestSimpleRegionNormalizerOnCluster.testRegionNormalizationMergeOnCluster:199 > » TestTimedOut > [ERROR] > TestSimpleRegionNormalizerOnCluster.testRegionNormalizationSplitOnCluster:132 > TestTimedOut > [INFO] > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25648) Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after HBASE-25592 HABSE-23932
[ https://issues.apache.org/jira/browse/HBASE-25648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-25648: -- Description: On branch-1 run {{mvn test -Dtest=TestSimpleRegionNormalizerOnCluster}} It fails. It appears to be due to some problems in the refactoring related to HBASE-25592 and HBASE-23932. {code:java} [INFO] Running org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 131.753 s <<< FAILURE! - in org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster [ERROR] testRegionNormalizationSplitOnCluster(org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster) Time elapsed: 60.107 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 6 milliseconds at org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster.testRegionNormalizationSplitOnCluster(TestSimpleRegionNormalizerOnCluster.java:132) [ERROR] testRegionNormalizationMergeOnCluster(org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster) Time elapsed: 60.117 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 6 milliseconds at org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster.testRegionNormalizationMergeOnCluster(TestSimpleRegionNormalizerOnCluster.java:199) [INFO] [INFO] Results: [INFO] [ERROR] Errors: [ERROR] TestSimpleRegionNormalizerOnCluster.testRegionNormalizationMergeOnCluster:199 » TestTimedOut [ERROR] TestSimpleRegionNormalizerOnCluster.testRegionNormalizationSplitOnCluster:132 TestTimedOut [INFO] [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0 {code} was: On branch-1 run {{mvn test -Dtest=TestSimpleRegionNormalizerOnCluster}} It fails. It appears to be due to some problems in the refactoring related to HBASE-25592 and HBASE-23932. 
> Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after > HBASE-25592 HABSE-23932 > > > Key: HBASE-25648 > URL: https://issues.apache.org/jira/browse/HBASE-25648 > Project: HBase > Issue Type: Bug > Components: Normalizer >Affects Versions: 1.7.0 >Reporter: David Manning >Assignee: David Manning >Priority: Major > > On branch-1 run > {{mvn test -Dtest=TestSimpleRegionNormalizerOnCluster}} > It fails. It appears to be due to some problems in the refactoring related to > HBASE-25592 and HBASE-23932. > > {code:java} > [INFO] Running > org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: > 131.753 s <<< FAILURE! - in > org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster > [ERROR] > testRegionNormalizationSplitOnCluster(org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster) > Time elapsed: 60.107 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 6 > milliseconds > at > org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster.testRegionNormalizationSplitOnCluster(TestSimpleRegionNormalizerOnCluster.java:132) > [ERROR] > testRegionNormalizationMergeOnCluster(org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster) > Time elapsed: 60.117 s <<< ERROR! 
> org.junit.runners.model.TestTimedOutException: test timed out after 6 > milliseconds > at > org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster.testRegionNormalizationMergeOnCluster(TestSimpleRegionNormalizerOnCluster.java:199) > [INFO] > [INFO] Results: > [INFO] > [ERROR] Errors: > [ERROR] > TestSimpleRegionNormalizerOnCluster.testRegionNormalizationMergeOnCluster:199 > » TestTimedOut > [ERROR] > TestSimpleRegionNormalizerOnCluster.testRegionNormalizationSplitOnCluster:132 > TestTimedOut > [INFO] > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25648) Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after HBASE-25592 HABSE-23932
David Manning created HBASE-25648: - Summary: Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after HBASE-25592 HABSE-23932 Key: HBASE-25648 URL: https://issues.apache.org/jira/browse/HBASE-25648 Project: HBase Issue Type: Bug Components: Normalizer Affects Versions: 1.7.0 Reporter: David Manning Assignee: David Manning On branch-1 run {{mvn test -Dtest=TestSimpleRegionNormalizerOnCluster}} It fails. It appears to be due to some problems in the refactoring related to HBASE-25592 and HBASE-23932. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate resource distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295860#comment-17295860 ] David Manning commented on HBASE-25625: --- I'm excited about working towards a balancer that works better for large clusters! Thanks for proposing changes in that direction. I agree that the TableSkewCostFunction seems limited in its current form of only tracking the max regions on any given server. For the other cost functions, I'm having a hard time working through the math and seeing the benefit, though. For example, if I take an 11-node cluster with 100 regions per server on average: 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100 And one node goes down, then I see: 110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 0 With sum of deviation (old computation), it is (110 - 100) * 10 + (100 - 0) * 1 = 200. Max deviation would be 1100 regions on one server, for (100 - 0) * 10 + (1100 - 100) * 1 = 2000. So the scaled cost would be 200 / 2000 = 0.1. With stdev (new computation), it also gives a scaled cost of 0.1. stdev = sqrt(((110 - 100) ^ 2 * 10 + (0 - 100) ^ 2 * 1) / 11) = sqrt(1000). Maximum possible stdev = sqrt(((0 - 100) ^ 2 * 10 + (1100 - 100) ^ 2 * 1) / 11) = sqrt(100000). If another server goes down and its regions are redistributed round-robin, the cluster state would look like: 121, 121, 121, 121, 121, 121, 121, 121, 121, 0, 11 If I did the math right, then I see: old computation: 378 / 2000 = 0.189 new computation: 0.140 So the stdev-based calculation is less likely to balance in these scenarios. How big does the cluster have to get to benefit from the new calculations? I tried 100 nodes with 1000 regions per node. One node at 0 results in 0.01 cost in both old and new calculations. Two nodes down (assuming round-robin balancing again) gives me 0.019 for the old calculation and 0.014 for the new stdev calculation. 
> StochasticBalancer CostFunctions needs a better way to evaluate resource > distribution > - > > Key: HBASE-25625 > URL: https://issues.apache.org/jira/browse/HBASE-25625 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > Currently CostFunctions including RegionCountSkewCostFunctions, > PrimaryRegionCountSkewCostFunctions and all load cost functions calculate the > unevenness of the distribution by getting the sum of deviation per region > server. This simple implementation works when the cluster is small. But when > the cluster gets larger with more region servers and regions, it doesn't work > well with hot spots or a small number of unbalanced servers. The proposal is > to use the standard deviation of the count per region server to capture the > existence of a small portion of region servers with overwhelming > load/allocation. > TableSkewCostFunction uses the sum of the max deviation region per server for > all tables as the measure of unevenness. It doesn't work in a very common > scenario in operations. Say we have 100 regions on 50 nodes, two on each. We > add 50 new nodes and they have 0 each. The max deviation from the mean is 1, > compared to 99 in the worst case scenario of 100 regions on a single server. > The normalized cost is 1/99 = 0.011 < default threshold of 0.05. Balancer > wouldn't move. The proposal is to use the standard deviation of the count > per region server to detect this scenario, generating a cost of 3.1/31 = 0.1 > in this case. > Patch is in test and will follow shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24657) JsonBean representation of metrics at /jmx endpoint now quotes all numbers
[ https://issues.apache.org/jira/browse/HBASE-24657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-24657: -- Fix Version/s: 1.6.0 1.4.14 1.3.7 Status: Patch Available (was: In Progress) > JsonBean representation of metrics at /jmx endpoint now quotes all numbers > -- > > Key: HBASE-24657 > URL: https://issues.apache.org/jira/browse/HBASE-24657 > Project: HBase > Issue Type: Bug > Components: metrics >Affects Versions: 1.4.11, 1.3.6, 1.6.0, 1.5.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > Fix For: 1.3.7, 1.4.14, 1.6.0 > > > HBASE-20571 had a fix to look for NaN or Infinity in numbers, and to quote > those as strings. The order of the `if-else` block is different in branch-1 > (https://github.com/apache/hbase/commit/2d493556f3c8ae87fb92422b525bf7c9345e6ccd) > and branch-2 > (https://github.com/apache/hbase/commit/39ea1efa885e2f27f41af59228e0a12c4ded08f8) > HBASE-23015 changed the JsonBean.java code in a meaningful way, and the order > of the changes was consistent between branch-1 > ([https://github.com/apache/hbase/commit/f77c14d18150f55ee892f8d24a5ee231c1ae7e20#diff-87e9e2722b9210eebfd8c820c5d72a46L319-L324]) > and branch-2 > ([https://github.com/apache/hbase/commit/761aef6d9d0b8a455842de4d5eac7d9486f00633#diff-2c8f5dd222141c69112c5c5b5f70cf55R319-R324]) > Unfortunately, they need to be reversed since the order is different between > branch-1 and branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
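The quoting logic at issue can be sketched as follows. This is an illustrative reconstruction, not the actual JsonBean.java code: the point is that the NaN/Infinity check must sit ahead of the generic numeric branch, because non-finite doubles are not legal JSON numbers. Getting the branch order wrong during a cherry-pick quotes every number, which is the symptom this issue describes.

```java
// Illustrative sketch, not the actual JsonBean.java code.
// Non-finite doubles (NaN, Infinity) are not valid JSON numbers, so
// they must be emitted as quoted strings; finite numbers must be
// emitted bare. Inverting or reordering these branches carelessly
// ends up quoting everything.
public class JsonNumberSketch {
  static String writeNumber(double value) {
    if (Double.isNaN(value) || Double.isInfinite(value)) {
      return "\"" + value + "\""; // e.g. "NaN", "Infinity"
    }
    return String.valueOf(value); // bare JSON number
  }
}
```

With this ordering, `writeNumber(1.5)` yields `1.5` while `writeNumber(Double.NaN)` yields `"NaN"`.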
[jira] [Created] (HBASE-24657) JsonBean representation of metrics at /jmx endpoint now quotes all numbers
David Manning created HBASE-24657: - Summary: JsonBean representation of metrics at /jmx endpoint now quotes all numbers Key: HBASE-24657 URL: https://issues.apache.org/jira/browse/HBASE-24657 Project: HBase Issue Type: Bug Components: metrics Affects Versions: 1.4.11, 1.3.6, 1.6.0, 1.5.0 Reporter: David Manning Assignee: David Manning HBASE-20571 had a fix to look for NaN or Infinity in numbers, and to quote those as strings. The order of the `if-else` block is different in branch-1 (https://github.com/apache/hbase/commit/2d493556f3c8ae87fb92422b525bf7c9345e6ccd) and branch-2 (https://github.com/apache/hbase/commit/39ea1efa885e2f27f41af59228e0a12c4ded08f8) HBASE-23015 changed the JsonBean.java code in a meaningful way, and the order of the changes was consistent between branch-1 ([https://github.com/apache/hbase/commit/f77c14d18150f55ee892f8d24a5ee231c1ae7e20#diff-87e9e2722b9210eebfd8c820c5d72a46L319-L324]) and branch-2 ([https://github.com/apache/hbase/commit/761aef6d9d0b8a455842de4d5eac7d9486f00633#diff-2c8f5dd222141c69112c5c5b5f70cf55R319-R324]) Unfortunately, they need to be reversed since the order is different between branch-1 and branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HBASE-24657) JsonBean representation of metrics at /jmx endpoint now quotes all numbers
[ https://issues.apache.org/jira/browse/HBASE-24657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-24657 started by David Manning. - > JsonBean representation of metrics at /jmx endpoint now quotes all numbers > -- > > Key: HBASE-24657 > URL: https://issues.apache.org/jira/browse/HBASE-24657 > Project: HBase > Issue Type: Bug > Components: metrics >Affects Versions: 1.5.0, 1.6.0, 1.3.6, 1.4.11 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > > HBASE-20571 had a fix to look for NaN or Infinity in numbers, and to quote > those as strings. The order of the `if-else` block is different in branch-1 > (https://github.com/apache/hbase/commit/2d493556f3c8ae87fb92422b525bf7c9345e6ccd) > and branch-2 > (https://github.com/apache/hbase/commit/39ea1efa885e2f27f41af59228e0a12c4ded08f8) > HBASE-23015 changed the JsonBean.java code in a meaningful way, and the order > of the changes was consistent between branch-1 > ([https://github.com/apache/hbase/commit/f77c14d18150f55ee892f8d24a5ee231c1ae7e20#diff-87e9e2722b9210eebfd8c820c5d72a46L319-L324]) > and branch-2 > ([https://github.com/apache/hbase/commit/761aef6d9d0b8a455842de4d5eac7d9486f00633#diff-2c8f5dd222141c69112c5c5b5f70cf55R319-R324]) > Unfortunately, they need to be reversed since the order is different between > branch-1 and branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-24099) Use a fair ReentrantReadWriteLock for the region close lock
[ https://issues.apache.org/jira/browse/HBASE-24099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076842#comment-17076842 ] David Manning edited comment on HBASE-24099 at 4/7/20, 2:52 AM: Yep, understood. I was just hoping someone could explain why numbers would consistently get faster with a fair lock policy... or even why a fair lock would get much slower when there is no thread waiting for a writer lock. So it's a +1 from me, for whatever that's worth. was (Author: dmanning): Yep, understood. I was just hoping someone could explain why numbers would consistently get faster with a new lock policy... or even why they would get much slower given the fairness when there is no thread waiting for a writer lock. So it's a +1 from me, for whatever that's worth. > Use a fair ReentrantReadWriteLock for the region close lock > --- > > Key: HBASE-24099 > URL: https://issues.apache.org/jira/browse/HBASE-24099 > Project: HBase > Issue Type: Improvement >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > Fix For: 3.0.0, 2.3.1, 1.3.7, 1.7.0, 2.4.0, 2.1.10, 1.4.14, 2.2.5 > > Attachments: ltt_results.pdf, pe_results.pdf, ycsb_results.pdf > > > Consider creating the region's ReentrantReadWriteLock with the fair locking > policy. We have had a couple of production incidents where a regionserver > stalled in shutdown for a very very long time, leading to RIT (FAILED_CLOSE). > The latest example is a 43 minute shutdown, ~40 minutes (2465280 ms) of that > time was spent waiting to acquire the write lock on the region in order to > finish closing it. > {quote} > ... > Finished memstore flush of ~66.92 MB/70167112, currentsize=0 B/0 for region > . in 927ms, sequenceid=6091133815, compaction requested=false at > 1585175635349 (+60 ms) > Disabling writes for close at 1585178100629 (+2465280 ms) > {quote} > This time was spent in between the memstore flush and the task status change > "Disabling writes for close at...". 
This is at HRegion.java:1481 in 1.3.6: > {code} > 1480: // block waiting for the lock for closing > 1481: lock.writeLock().lock(); // FindBugs: Complains > UL_UNRELEASED_LOCK_EXCEPTION_PATH but seems fine > {code} > > The close lock is operating in unfair mode. The table in question is under > constant high query load. When the close request was received, there were > active readers. After the close request there were more active readers, > near-continuous contention. Although the clients would receive > RegionServerStoppingException and other error notifications, because the > region could not be reassigned, they kept coming, region (re-)location would > find the region still hosted on the stuck server. Finally the closing thread > waiting for the write lock became no longer starved (by chance) after 40 > minutes. > The ReentrantReadWriteLock javadoc is clear about the possibility of > starvation when continuously contended: "_When constructed as non-fair (the > default), the order of entry to the read and write lock is unspecified, > subject to reentrancy constraints. A nonfair lock that is continuously > contended may indefinitely postpone one or more reader or writer threads, but > will normally have higher throughput than a fair lock._" > We could try changing the acquisition semantics of this lock to fair. This is > a one line change, where we call the RW lock constructor. Then: > "_When constructed as fair, threads contend for entry using an approximately > arrival-order policy. When the currently held lock is released, either the > longest-waiting single writer thread will be assigned the write lock, or if > there is a group of reader threads waiting longer than all waiting writer > threads, that group will be assigned the read lock._" > This could be better. The close process will have to wait until all readers > and writers already waiting for acquisition either acquire and release or go > away but won't be starved by future/incoming requests. 
> There could be a throughput loss in request handling, though, because this is > the global reentrant RW lock for the region. -- This message was sent by Atlassian Jira (v8.3.4#803005)
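The "one line change" described above amounts to passing `true` to the ReentrantReadWriteLock constructor. A minimal sketch of the idea follows; the class, field, and method names are illustrative, not the actual HRegion code:

```java
// Minimal sketch of the proposed change: construct the region's
// close lock in fair mode. Names are illustrative, not HRegion code.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockSketch {
  // true => fair policy: a waiting writer (the closing thread) cannot
  // be starved indefinitely by a continuous stream of new readers.
  public final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

  public void doClose() {
    lock.writeLock().lock(); // waits only behind already-queued readers
    try {
      // disable writes, flush memstore, close stores...
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

With the default (nonfair) constructor argument, the write-lock acquisition in `doClose` is exactly the call that can be postponed indefinitely under continuous read contention, per the javadoc quoted above.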
[jira] [Commented] (HBASE-24099) Use a fair ReentrantReadWriteLock for the region close lock
[ https://issues.apache.org/jira/browse/HBASE-24099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076842#comment-17076842 ] David Manning commented on HBASE-24099: --- Yep, understood. I was just hoping someone could explain why numbers would consistently get faster with a new lock policy... or even why they would get much slower given the fairness when there is no thread waiting for a writer lock. So it's a +1 from me, for whatever that's worth. > Use a fair ReentrantReadWriteLock for the region close lock > --- > > Key: HBASE-24099 > URL: https://issues.apache.org/jira/browse/HBASE-24099 > Project: HBase > Issue Type: Improvement >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > Fix For: 3.0.0, 2.3.1, 1.3.7, 1.7.0, 2.4.0, 2.1.10, 1.4.14, 2.2.5 > > Attachments: ltt_results.pdf, pe_results.pdf, ycsb_results.pdf > > > Consider creating the region's ReentrantReadWriteLock with the fair locking > policy. We have had a couple of production incidents where a regionserver > stalled in shutdown for a very very long time, leading to RIT (FAILED_CLOSE). > The latest example is a 43 minute shutdown, ~40 minutes (2465280 ms) of that > time was spent waiting to acquire the write lock on the region in order to > finish closing it. > {quote} > ... > Finished memstore flush of ~66.92 MB/70167112, currentsize=0 B/0 for region > . in 927ms, sequenceid=6091133815, compaction requested=false at > 1585175635349 (+60 ms) > Disabling writes for close at 1585178100629 (+2465280 ms) > {quote} > This time was spent in between the memstore flush and the task status change > "Disabling writes for close at...". This is at HRegion.java:1481 in 1.3.6: > {code} > 1480: // block waiting for the lock for closing > 1481: lock.writeLock().lock(); // FindBugs: Complains > UL_UNRELEASED_LOCK_EXCEPTION_PATH but seems fine > {code} > > The close lock is operating in unfair mode. The table in question is under > constant high query load. 
When the close request was received, there were > active readers. After the close request there were more active readers, > near-continuous contention. Although the clients would receive > RegionServerStoppingException and other error notifications, because the > region could not be reassigned, they kept coming, region (re-)location would > find the region still hosted on the stuck server. Finally the closing thread > waiting for the write lock became no longer starved (by chance) after 40 > minutes. > The ReentrantReadWriteLock javadoc is clear about the possibility of > starvation when continuously contended: "_When constructed as non-fair (the > default), the order of entry to the read and write lock is unspecified, > subject to reentrancy constraints. A nonfair lock that is continuously > contended may indefinitely postpone one or more reader or writer threads, but > will normally have higher throughput than a fair lock._" > We could try changing the acquisition semantics of this lock to fair. This is > a one line change, where we call the RW lock constructor. Then: > "_When constructed as fair, threads contend for entry using an approximately > arrival-order policy. When the currently held lock is released, either the > longest-waiting single writer thread will be assigned the write lock, or if > there is a group of reader threads waiting longer than all waiting writer > threads, that group will be assigned the read lock._" > This could be better. The close process will have to wait until all readers > and writers already waiting for acquisition either acquire and release or go > away but won't be starved by future/incoming requests. > There could be a throughput loss in request handling, though, because this is > the global reentrant RW lock for the region. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24099) Use a fair ReentrantReadWriteLock for the region close lock
[ https://issues.apache.org/jira/browse/HBASE-24099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076799#comment-17076799 ] David Manning commented on HBASE-24099: --- I am not an expert on the locks, but looking through the code I only found two cases where the {{writeLock}} is taken: {{startBulkRegionOperation}} and {{doClose}}. I'm guessing very few of those operations happen during the performance tests. As a result, I'd expect not to see much overhead from enforcing fairness. Most locks should be read-only, so contention should be minimal. If that's true, then it could mean that a lot of the ~10% changes are just normal variance. Put another way, there should be absolutely no reason why some read cases get faster with a fair lock pattern... right? So that seems to suggest a variance level around ~10%. All of this makes me feel pretty good about the performance results not showing a regression. > Use a fair ReentrantReadWriteLock for the region close lock > --- > > Key: HBASE-24099 > URL: https://issues.apache.org/jira/browse/HBASE-24099 > Project: HBase > Issue Type: Improvement >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > Fix For: 3.0.0, 2.3.1, 1.3.7, 1.7.0, 2.4.0, 2.1.10, 1.4.14, 2.2.5 > > Attachments: ltt_results.pdf, pe_results.pdf, ycsb_results.pdf > > > Consider creating the region's ReentrantReadWriteLock with the fair locking > policy. We have had a couple of production incidents where a regionserver > stalled in shutdown for a very very long time, leading to RIT (FAILED_CLOSE). > The latest example is a 43 minute shutdown, ~40 minutes (2465280 ms) of that > time was spent waiting to acquire the write lock on the region in order to > finish closing it. > {quote} > ... > Finished memstore flush of ~66.92 MB/70167112, currentsize=0 B/0 for region > . 
in 927ms, sequenceid=6091133815, compaction requested=false at > 1585175635349 (+60 ms) > Disabling writes for close at 1585178100629 (+2465280 ms) > {quote} > This time was spent in between the memstore flush and the task status change > "Disabling writes for close at...". This is at HRegion.java:1481 in 1.3.6: > {code} > 1480: // block waiting for the lock for closing > 1481: lock.writeLock().lock(); // FindBugs: Complains > UL_UNRELEASED_LOCK_EXCEPTION_PATH but seems fine > {code} > > The close lock is operating in unfair mode. The table in question is under > constant high query load. When the close request was received, there were > active readers. After the close request there were more active readers, > near-continuous contention. Although the clients would receive > RegionServerStoppingException and other error notifications, because the > region could not be reassigned, they kept coming, region (re-)location would > find the region still hosted on the stuck server. Finally the closing thread > waiting for the write lock became no longer starved (by chance) after 40 > minutes. > The ReentrantReadWriteLock javadoc is clear about the possibility of > starvation when continuously contended: "_When constructed as non-fair (the > default), the order of entry to the read and write lock is unspecified, > subject to reentrancy constraints. A nonfair lock that is continuously > contended may indefinitely postpone one or more reader or writer threads, but > will normally have higher throughput than a fair lock._" > We could try changing the acquisition semantics of this lock to fair. This is > a one line change, where we call the RW lock constructor. Then: > "_When constructed as fair, threads contend for entry using an approximately > arrival-order policy. 
When the currently held lock is released, either the > longest-waiting single writer thread will be assigned the write lock, or if > there is a group of reader threads waiting longer than all waiting writer > threads, that group will be assigned the read lock._" > This could be better. The close process will have to wait until all readers > and writers already waiting for acquisition either acquire and release or go > away but won't be starved by future/incoming requests. > There could be a throughput loss in request handling, though, because this is > the global reentrant RW lock for the region. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23372) ZooKeeper Assignment can result in stale znodes in region-in-transition after table is dropped and hbck run
David Manning created HBASE-23372: - Summary: ZooKeeper Assignment can result in stale znodes in region-in-transition after table is dropped and hbck run Key: HBASE-23372 URL: https://issues.apache.org/jira/browse/HBASE-23372 Project: HBase Issue Type: Bug Components: hbck, master, Region Assignment, Zookeeper Affects Versions: 1.3.2 Reporter: David Manning It is possible for znodes under /hbase/region-in-transition to remain long after a table is deleted. There does not appear to be any cleanup logic for these. The details are a little fuzzy, but it seems to be fallout from HBASE-22617. Incidents related to that bug involved regions stuck in transition, and use of hbck to fix clusters. There was a temporary table created and deleted once per day, but somehow it led to receiving {{FSLimitException$MaxDirectoryItemsExceededException}} and regions stuck in transition. Even weeks after fixing the bug and upgrading the cluster, the znodes remain under /hbase/region-in-transition. In the most impacted cluster, {{hbase zkcli ls /hbase/region-in-transition | wc -w}} returns almost 100,000 entries. This causes very slow region transition times (often 80 seconds), likely due to enumerating all these entries when the zk watch on this node is triggered. Log lines for slow region transitions: {code:java} 2019-12-05 07:02:14,714 DEBUG [K.Worker-pool3-t7344] master.AssignmentManager - Handling RS_ZK_REGION_CLOSED, server=<>, region=<>, which is more than 15 seconds late, current_state={<> state=PENDING_CLOSE, ts=1575529254635, server=<>} {code} Even during hmaster failover, entries are not cleaned, but the following log lines can be seen: {code:java} 2019-11-27 00:26:27,044 WARN [.activeMasterManager] master.AssignmentManager - Couldn't find the region in recovering region=<>, state=RS_ZK_REGION_FAILED_OPEN, servername=<>, createTime=1565603905404, payload.length=0 {code} Possible solutions: # Logic to parse the RIT znode during master failover that checks whether the table exists. 
Clean up entries for nonexistent tables. # New mode for hbck to do cleanup of nonexistent regions under the znode. # Others? -- This message was sent by Atlassian Jira (v8.3.4#803005)
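The first option above can be sketched as a pure filtering step. Everything here is hypothetical, not actual HBase code: in particular, decoding a RIT znode name back to its table is abstracted behind a function argument, since the children of /hbase/region-in-transition are encoded region names rather than table names.

```java
// Hypothetical sketch of cleanup option 1; not actual HBase code.
// regionToTable stands in for the real znode-name-to-table decoding,
// which is assumed to happen elsewhere.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class RitCleanupSketch {
  static List<String> staleZnodes(List<String> ritZnodes,
                                  Set<String> existingTables,
                                  Function<String, String> regionToTable) {
    List<String> stale = new ArrayList<>();
    for (String znode : ritZnodes) {
      // A znode is safe to delete only if its table no longer exists.
      if (!existingTables.contains(regionToTable.apply(znode))) {
        stale.add(znode);
      }
    }
    return stale;
  }
}
```

Running this during master failover (option 1) or from a new hbck mode (option 2) would then reduce to deleting the returned znodes.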
[jira] [Commented] (HBASE-23153) PrimaryRegionCountSkewCostFunction SLB function should implement CostFunction#isNeeded
[ https://issues.apache.org/jira/browse/HBASE-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949852#comment-16949852 ] David Manning commented on HBASE-23153: --- Thanks [~apurtell] for doing literally all the work. I made a comment on the github PR about keeping the {{cost}} method as is. Otherwise LGTM > PrimaryRegionCountSkewCostFunction SLB function should implement > CostFunction#isNeeded > -- > > Key: HBASE-23153 > URL: https://issues.apache.org/jira/browse/HBASE-23153 > Project: HBase > Issue Type: Bug >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > Fix For: 3.0.0, 2.3.0, 1.6.0, 2.2.2, 2.1.8, 1.5.1 > > > The PrimaryRegionCountSkewCostFunction SLB function should implement > CostFunction#isNeeded and like the other region replica specific functions > should return false for it when region replicas are not in use. Otherwise it > will always report a cost of 0 even though its weight will be included in the > sum of the weights. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22935) TaskMonitor warns MonitoredRPCHandler task may be stuck when it recently started
[ https://issues.apache.org/jira/browse/HBASE-22935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Manning updated HBASE-22935: -- Status: Patch Available (was: Open) > TaskMonitor warns MonitoredRPCHandler task may be stuck when it recently > started > > > Key: HBASE-22935 > URL: https://issues.apache.org/jira/browse/HBASE-22935 > Project: HBase > Issue Type: Bug > Components: logging >Affects Versions: 2.0.0, 1.3.3, 1.4.0, 3.0.0, 1.5.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > Attachments: HBASE-22935.master.001.patch > > > After setting {{hbase.taskmonitor.rpc.warn.time}} to 18, the logs show > WARN messages such as these > {noformat} > 2019-08-08 21:50:02,601 WARN [read for TaskMonitor] monitoring.TaskMonitor - > Task may be stuck: RpcServer.FifoWFPBQ.default.handler=4,queue=4,port=60020: > status=Servicing call from :55164: Scan, state=RUNNING, > startTime=1563305858103, completionTime=-1, queuetimems=1565301002599, > starttimems=1565301002599, clientaddress=, remoteport=55164, > packetlength=370, rpcMethod=Scan > {noformat} > Notice that the first {{starttimems}} is far in the past. The second > {{starttimems}} and the {{queuetimems}} are much closer to the log timestamp > than 180 seconds. I think this is because the warnTime is initialized to the > time that MonitoredTaskImpl is created, but never updated until we write a > warn message to the log. -- This message was sent by Atlassian Jira (v8.3.2#803003)