[jira] [Assigned] (HBASE-21785) master reports open regions as RITs and also messes up rit age metric

2024-07-01 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-21785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning reassigned HBASE-21785:
-

Assignee: Sergey Shelukhin  (was: David Manning)

> master reports open regions as RITs and also messes up rit age metric
> -
>
> Key: HBASE-21785
> URL: https://issues.apache.org/jira/browse/HBASE-21785
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.0
>
> Attachments: HBASE-21785.01.patch, HBASE-21785.patch
>
>
> {noformat}
> RegionState   RIT time (ms)   Retries
> dba183f0dadfcc9dc8ae0a6dd59c84e6  dba183f0dadfcc9dc8ae0a6dd59c84e6. 
> state=OPEN, ts=Wed Dec 31 16:00:00 PST 1969 (1548453918s ago), 
> server=server,17020,1548452922054  1548453918735   0
> {noformat}
> RIT age metric also gets set to a bogus value.
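> The bogus value follows directly from computing an age against the unset 
> (epoch-zero) timestamp shown above; a minimal sketch of the arithmetic 
> (illustrative only, not the actual AssignmentManager code):
> {code:java}
> // An OPEN region carried over with an unset timestamp (0) makes the "age"
> // equal to the full time elapsed since the Unix epoch.
> long regionStateTs = 0L;                         // ts=Wed Dec 31 16:00:00 PST 1969
> long now = System.currentTimeMillis();           // e.g. 1548453918735
> long ritAgeMs = now - regionStateTs;             // ~1548453918735 ms, the bogus metric value
> System.out.println((ritAgeMs / 1000) + "s ago"); // matches the "(1548453918s ago)" output
> {code}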



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-21785) master reports open regions as RITs and also messes up rit age metric

2024-07-01 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-21785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning reassigned HBASE-21785:
-

Assignee: David Manning  (was: Sergey Shelukhin)

> master reports open regions as RITs and also messes up rit age metric
> -
>
> Key: HBASE-21785
> URL: https://issues.apache.org/jira/browse/HBASE-21785
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: David Manning
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.0
>
> Attachments: HBASE-21785.01.patch, HBASE-21785.patch
>
>
> {noformat}
> RegionState   RIT time (ms)   Retries
> dba183f0dadfcc9dc8ae0a6dd59c84e6  dba183f0dadfcc9dc8ae0a6dd59c84e6. 
> state=OPEN, ts=Wed Dec 31 16:00:00 PST 1969 (1548453918s ago), 
> server=server,17020,1548452922054  1548453918735   0
> {noformat}
> RIT age metric also gets set to a bogus value.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28663) CanaryTool continues executing and scanning after timeout

2024-06-14 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-28663:
--
Description: 
If you run the {{CanaryTool}} in region mode until it reaches the configured 
timeout, the logs and sink results will show that it can continue executing and 
scanning for 10 seconds.

This is because the RegionTasks have already been submitted to an 
ExecutorService which continues execution after timeout, and the Monitor 
continues execution on a separate thread.

The 10 second delay in shutdown is seen, in hbase 2.x at least, because 
{{runMonitor}} will close the {{Connection}} and that process 
([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/CanaryTool.java#L1054-L1094])
 will lead to {{ConnectionImplementation#close}} 
([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L2272-L2300])
 and inside {{shutdownPools}} we will potentially wait the full 10 seconds of 
{{awaitTermination}} if client operations are in progress.

The scenario can be improved by simply interrupting the monitor thread, since we 
will often be blocked in an {{invokeAll}} call in a {{sniff}} method. The 
{{invokeAll}} method is blocking, so interrupting the monitor in this call also 
interrupts the client threads, and shutdown generally completes properly and in 
a timely manner. However, we can be more robust by also watching for a shutdown 
signal in the various tasks, such as {{RegionTask}}, so any remaining tasks 
drain quickly and without errors. This removes a lot of errors from the canary 
logs during shutdown.
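
A minimal sketch of the shutdown-signal idea (the class and field names here are 
illustrative, not the actual CanaryTool members):
{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch: tasks consult a shared shutdown flag so that work already
// queued in the ExecutorService drains quickly (and quietly) after the timeout.
class ShutdownAwareRegionTask implements Callable<Void> {
  private final AtomicBoolean shutdownRequested; // set by the monitor when the timeout fires

  ShutdownAwareRegionTask(AtomicBoolean shutdownRequested) {
    this.shutdownRequested = shutdownRequested;
  }

  @Override
  public Void call() {
    if (shutdownRequested.get()) {
      return null; // drain without scanning and without logging an error
    }
    // ... perform the usual region read/write sniff here ...
    return null;
  }
}
{code}
Interrupting the monitor covers tasks already blocked in {{invokeAll}}; the flag 
covers tasks that have not started yet.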

 
{code:java}
2024-06-12 02:57:14 [Time-limited test] ERROR tool.Canary(1076): The monitor is 
running too long (1140098) after timeout limit:114 will be killed itself !!

2024-06-12 02:57:14 [Time-limited test] INFO 
client.ConnectionImplementation(2039): Closing master protocol: MasterService

2024-06-12 02:57:14 [pool-3-thread-4] ERROR tool.Canary(353): Read from 
REGION1. on serverName=REGIONSERVER-1, columnFamily=0 failed
java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
Task 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@54f2a9a4
 rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting down, 
pool size = 7, active threads = 7, queued tasks = 0, completed tasks = 180094]
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:199)
at 
org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:271)
at 
org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:440)
at 
org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:314)
at 
org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:612)
at 
org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.readColumnFamily(CanaryTool.java:565)
at 
org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:609)
at 
org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503)
at 
org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471)

[... repeats for 10 seconds and tens of thousands of regions ... ]

2024-06-12 02:57:16 [pool-3-thread-11] ERROR tool.Canary(353): Read from 
REGION1. on serverName=REGIONSERVER-2, columnFamily=0 failed
java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
Task 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@d08d21f
 rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting down, 
pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 180098]

[...]

2024-06-12 02:57:24 [pool-3-thread-11] ERROR tool.Canary(353): Read from 
REGION42000. on serverName=REGIONSERVER-3, columnFamily=0 failed
java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
Task 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@38e7a5a1
 rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Terminated, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 180101]

2024-06-12T02:57:24.202Z, java.io.InterruptedIOException

at 
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:294)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:255)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:53)
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:191)
at 

[jira] [Updated] (HBASE-28663) CanaryTool continues executing and scanning after timeout

2024-06-14 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-28663:
--
Status: Patch Available  (was: In Progress)

> CanaryTool continues executing and scanning after timeout
> -
>
> Key: HBASE-28663
> URL: https://issues.apache.org/jira/browse/HBASE-28663
> Project: HBase
>  Issue Type: Bug
>  Components: canary
>Affects Versions: 2.0.0, 3.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> If you run the {{CanaryTool}} in region mode until it reaches the configured 
> timeout, the logs and sink results will show that it can continue executing 
> and scanning for 10 seconds.
> This is because the RegionTasks have already been submitted to an 
> ExecutorService which continues execution after timeout, and the Monitor 
> continues execution on a separate thread.
> The 10 second delay in shutdown is seen, in hbase 2.x at least, because 
> {{runMonitor}} will close the {{Connection}} and that process 
> ([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/CanaryTool.java#L1054-L1094])
>  will lead to {{ConnectionImplementation#close}} 
> ([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L2272-L2300])
>  and inside {{shutdownPools}} we will potentially wait the full 10 seconds of 
> {{awaitTermination}} if client operations are in progress.
> The scenario can be improved by simply interrupting the monitor thread, since 
> we will often be blocked in an {{invokeAll}} call in a {{sniff}} method. The 
> {{invokeAll}} method is blocking, so interrupting the monitor in this call 
> also interrupts the client threads, and shutdown generally completes properly 
> and in a timely manner. However, we can be more robust by also watching for a 
> shutdown signal in the various tasks, such as {{RegionTask}}, so any remaining 
> tasks drain quickly and without errors.
>  
> {code:java}
> 2024-06-12 02:57:14 [Time-limited test] ERROR tool.Canary(1076): The monitor 
> is running too long (1140098) after timeout limit:114 will be killed 
> itself !!
> 2024-06-12 02:57:14 [Time-limited test] INFO 
> client.ConnectionImplementation(2039): Closing master protocol: MasterService
> 2024-06-12 02:57:14 [pool-3-thread-4] ERROR tool.Canary(353): Read from 
> REGION1. on serverName=REGIONSERVER-1, columnFamily=0 failed
> java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
> Task 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@54f2a9a4
>  rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting 
> down, pool size = 7, active threads = 7, queued tasks = 0, completed tasks = 
> 180094]
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:199)
>   at 
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:271)
>   at 
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:440)
>   at 
> org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:314)
>   at 
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:612)
>   at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.readColumnFamily(CanaryTool.java:565)
>   at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:609)
>   at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503)
>   at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471)
> [... repeats for 10 seconds and tens of thousands of regions ... ]
> 2024-06-12 02:57:16 [pool-3-thread-11] ERROR tool.Canary(353): Read from 
> REGION1. on serverName=REGIONSERVER-2, columnFamily=0 failed
> java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
> Task 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@d08d21f
>  rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting 
> down, pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 
> 180098]
> [...]
> 2024-06-12 02:57:24 [pool-3-thread-11] ERROR tool.Canary(353): Read from 
> REGION42000. on serverName=REGIONSERVER-3, columnFamily=0 failed
> java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
> Task 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@38e7a5a1
>  rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Terminated, 
> pool 

[jira] [Work logged] (HBASE-28663) CanaryTool continues executing and scanning after timeout

2024-06-14 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28663?focusedWorklogId=923552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-923552
 ]

David Manning logged work on HBASE-28663:
-

Author: David Manning
Created on: 14/Jun/24 17:05
Start Date: 14/Jun/24 17:05
Worklog Time Spent: 24h 

Issue Time Tracking
---

Worklog Id: (was: 923552)
Remaining Estimate: 0h  (was: 24h)
Time Spent: 24h

> CanaryTool continues executing and scanning after timeout
> -
>
> Key: HBASE-28663
> URL: https://issues.apache.org/jira/browse/HBASE-28663
> Project: HBase
>  Issue Type: Bug
>  Components: canary
>Affects Versions: 2.0.0, 3.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 24h
>  Time Spent: 24h
>  Remaining Estimate: 0h
>
> If you run the {{CanaryTool}} in region mode until it reaches the configured 
> timeout, the logs and sink results will show that it can continue executing 
> and scanning for 10 seconds.
> This is because the RegionTasks have already been submitted to an 
> ExecutorService which continues execution after timeout, and the Monitor 
> continues execution on a separate thread.
> The 10 second delay in shutdown is seen, in hbase 2.x at least, because 
> {{runMonitor}} will close the {{Connection}} and that process 
> ([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/CanaryTool.java#L1054-L1094])
>  will lead to {{ConnectionImplementation#close}} 
> ([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L2272-L2300])
>  and inside {{shutdownPools}} we will potentially wait the full 10 seconds of 
> {{awaitTermination}} if client operations are in progress.
> The scenario can be improved by simply interrupting the monitor thread, since 
> we will often be blocked in an {{invokeAll}} call in a {{sniff}} method. The 
> {{invokeAll}} method is blocking, so interrupting the monitor in this call 
> also interrupts the client threads, and shutdown generally completes properly 
> and in a timely manner. However, we can be more robust by also watching for a 
> shutdown signal in the various tasks, such as {{RegionTask}}, so any remaining 
> tasks drain quickly and without errors.
>  
> {code:java}
> 2024-06-12 02:57:14 [Time-limited test] ERROR tool.Canary(1076): The monitor 
> is running too long (1140098) after timeout limit:114 will be killed 
> itself !!
> 2024-06-12 02:57:14 [Time-limited test] INFO 
> client.ConnectionImplementation(2039): Closing master protocol: MasterService
> 2024-06-12 02:57:14 [pool-3-thread-4] ERROR tool.Canary(353): Read from 
> REGION1. on serverName=REGIONSERVER-1, columnFamily=0 failed
> java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
> Task 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@54f2a9a4
>  rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting 
> down, pool size = 7, active threads = 7, queued tasks = 0, completed tasks = 
> 180094]
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:199)
>   at 
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:271)
>   at 
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:440)
>   at 
> org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:314)
>   at 
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:612)
>   at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.readColumnFamily(CanaryTool.java:565)
>   at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:609)
>   at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503)
>   at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471)
> [... repeats for 10 seconds and tens of thousands of regions ... ]
> 2024-06-12 02:57:16 [pool-3-thread-11] ERROR tool.Canary(353): Read from 
> REGION1. on serverName=REGIONSERVER-2, columnFamily=0 failed
> java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
> Task 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@d08d21f
>  rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting 
> down, pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 
> 180098]
> [...]
> 2024-06-12 02:57:24 [pool-3-thread-11] ERROR tool.Canary(353): Read from 

[jira] [Updated] (HBASE-28663) CanaryTool continues executing and scanning after timeout

2024-06-14 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-28663:
--
Description: 
If you run the {{CanaryTool}} in region mode until it reaches the configured 
timeout, the logs and sink results will show that it can continue executing and 
scanning for 10 seconds.

This is because the RegionTasks have already been submitted to an 
ExecutorService which continues execution after timeout, and the Monitor 
continues execution on a separate thread.

The 10 second delay in shutdown is seen, in hbase 2.x at least, because 
{{runMonitor}} will close the {{Connection}} and that process 
([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/CanaryTool.java#L1054-L1094])
 will lead to {{ConnectionImplementation#close}} 
([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L2272-L2300])
 and inside {{shutdownPools}} we will potentially wait the full 10 seconds of 
{{awaitTermination}} if client operations are in progress.

The scenario can be improved by simply interrupting the monitor thread, since we 
will often be blocked in an {{invokeAll}} call in a {{sniff}} method. The 
{{invokeAll}} method is blocking, so interrupting the monitor in this call also 
interrupts the client threads, and shutdown generally completes properly and in 
a timely manner. However, we can be more robust by also watching for a shutdown 
signal in the various tasks, such as {{RegionTask}}, so any remaining tasks 
drain quickly and without errors.

 
{code:java}
2024-06-12 02:57:14 [Time-limited test] ERROR tool.Canary(1076): The monitor is 
running too long (1140098) after timeout limit:114 will be killed itself !!

2024-06-12 02:57:14 [Time-limited test] INFO 
client.ConnectionImplementation(2039): Closing master protocol: MasterService

2024-06-12 02:57:14 [pool-3-thread-4] ERROR tool.Canary(353): Read from 
REGION1. on serverName=REGIONSERVER-1, columnFamily=0 failed
java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
Task 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@54f2a9a4
 rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting down, 
pool size = 7, active threads = 7, queued tasks = 0, completed tasks = 180094]
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:199)
at 
org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:271)
at 
org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:440)
at 
org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:314)
at 
org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:612)
at 
org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.readColumnFamily(CanaryTool.java:565)
at 
org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:609)
at 
org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503)
at 
org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471)

[... repeats for 10 seconds and tens of thousands of regions ... ]

2024-06-12 02:57:16 [pool-3-thread-11] ERROR tool.Canary(353): Read from 
REGION1. on serverName=REGIONSERVER-2, columnFamily=0 failed
java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
Task 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@d08d21f
 rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting down, 
pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 180098]

[...]

2024-06-12 02:57:24 [pool-3-thread-11] ERROR tool.Canary(353): Read from 
REGION42000. on serverName=REGIONSERVER-3, columnFamily=0 failed
java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
Task 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@38e7a5a1
 rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Terminated, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 180101]

2024-06-12T02:57:24.202Z, java.io.InterruptedIOException

at 
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:294)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:255)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:53)
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:191)
at 

[jira] [Work started] (HBASE-28663) CanaryTool continues executing and scanning after timeout

2024-06-13 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-28663 started by David Manning.
-
> CanaryTool continues executing and scanning after timeout
> -
>
> Key: HBASE-28663
> URL: https://issues.apache.org/jira/browse/HBASE-28663
> Project: HBase
>  Issue Type: Bug
>  Components: canary
>Affects Versions: 2.0.0, 3.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> If you run the {{CanaryTool}} in region mode until it reaches the configured 
> timeout, the logs and sink results will show that it can continue executing 
> and scanning for 10 seconds.
> This is because the RegionTasks have already been submitted to an 
> ExecutorService which continues execution after timeout, and the Monitor 
> continues execution on a separate thread.
> The 10 seconds is seen in hbase 2.x, at least, because {{runMonitor}} will 
> close the {{Connection}} and that process 
> ([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/CanaryTool.java#L1054-L1094])
>  will lead to {{ConnectionImplementation#close}} 
> ([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L2272-L2300])
>  and inside {{shutdownPools}} we will potentially wait the full 10 seconds of 
> {{awaitTermination}} if client operations are in progress.
> The scenario can be improved by simply interrupting the monitor thread, since 
> we will often be blocked in an {{invokeAll}} call in a {{sniff}} method; the 
> interrupt also reaches the client threads, and shutdown generally completes 
> properly and in a timely manner. However, we could be more robust by also 
> watching for a shutdown signal in the various tasks, such as {{RegionTask}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28663) CanaryTool continues executing and scanning after timeout

2024-06-13 Thread David Manning (Jira)
David Manning created HBASE-28663:
-

 Summary: CanaryTool continues executing and scanning after timeout
 Key: HBASE-28663
 URL: https://issues.apache.org/jira/browse/HBASE-28663
 Project: HBase
  Issue Type: Bug
  Components: canary
Affects Versions: 2.0.0, 3.0.0
Reporter: David Manning
Assignee: David Manning


If you run the {{CanaryTool}} in region mode until it reaches the configured 
timeout, the logs and sink results will show that it can continue executing and 
scanning for 10 seconds.

This is because the RegionTasks have already been submitted to an 
ExecutorService which continues execution after timeout, and the Monitor 
continues execution on a separate thread.

The 10 seconds is seen in hbase 2.x, at least, because {{runMonitor}} will 
close the {{Connection}} and that process 
([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/CanaryTool.java#L1054-L1094])
 will lead to {{ConnectionImplementation#close}} 
([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L2272-L2300])
 and inside {{shutdownPools}} we will potentially wait the full 10 seconds of 
{{awaitTermination}} if client operations are in progress.

The scenario can be improved by simply interrupting the monitor thread, since we 
will often be blocked in an {{invokeAll}} call in a {{sniff}} method; the 
interrupt also reaches the client threads, and shutdown generally completes 
properly and in a timely manner. However, we could be more robust by also 
watching for a shutdown signal in the various tasks, such as {{RegionTask}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HBASE-28584) RS SIGSEGV under heavy replication load

2024-05-22 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848778#comment-17848778
 ] 

David Manning edited comment on HBASE-28584 at 5/22/24 11:57 PM:
-

We see it too in HBASE-28437. We have hbase.region.store.parallel.put.limit=0, 
but that is also the default in 2.5 after HBASE-26814. For us it always 
correlates with a lot of load that shows up suddenly, and then replicates to a 
peer cluster, and that peer cluster throws RegionTooBusyExceptions 
(blockedRequestCount metric.)


was (Author: dmanning):
We see it too. We have hbase.region.store.parallel.put.limit=0, but that is 
also the default in 2.5 after HBASE-26814. For us it always correlates with a 
lot of load that shows up suddenly, and then replicates to a peer cluster, and 
that peer cluster throws RegionTooBusyExceptions (blockedRequestCount metric.)

> RS SIGSEGV under heavy replication load
> ---
>
> Key: HBASE-28584
> URL: https://issues.apache.org/jira/browse/HBASE-28584
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.5.6
> Environment: RHEL 7.9
> JDK 11.0.23
> Hadoop 3.2.4
> Hbase 2.5.6
>Reporter: Whitney Jackson
>Priority: Major
>
> I'm observing RS crashes under heavy replication load:
>  
> {code:java}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f7546873b69, pid=29890, tid=36828
> #
> # JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.23+7) (build 
> 11.0.23+7-LTS-222)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.23+7-LTS-222, mixed 
> mode, tiered, compressed oops, g1 gc, linux-amd64)
> # Problematic frame:
> # J 24625 c2 
> org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V
>  (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209]
> {code}
>  
> The heavier load comes when a replication peer has been disabled for several 
> hours for patching etc. When the peer is re-enabled the replication load is 
> high until the peer is all caught up. The crashes happen on the cluster 
> receiving the replication edits.
>  
> I believe this problem started after upgrading from 2.4.x to 2.5.x.
>  
> One possibly relevant non-standard config I run with:
> {code:java}
> <property>
>   <name>hbase.region.store.parallel.put.limit</name>
>   <value>100</value>
>   <description>Added after seeing "failed to accept edits" replication errors 
> in the destination region servers indicating this limit was being exceeded 
> while trying to process replication edits.</description>
> </property>
> {code}
>  
> I understand from other Jiras that the problem is likely around direct memory 
> usage by Netty. I haven't yet tried switching the Netty allocator to 
> {{unpooled}} or {{heap}}. I also haven't yet tried any of the 
> {{io.netty.allocator.*}} options.
>  
> {{MaxDirectMemorySize}} is set to 26g.
>  
> Here's the full stack for the relevant thread:
>  
> {code:java}
> Stack: [0x7f72e2e5f000,0x7f72e2f6],  sp=0x7f72e2f5e450,  free 
> space=1021k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> J 24625 c2 
> org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V
>  (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209]
> J 26253 c2 
> org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I 
> (21 bytes) @ 0x7f7545af2d84 [0x7f7545af2d20+0x0064]
> J 22971 c2 
> org.apache.hadoop.hbase.codec.KeyValueCodecWithTags$KeyValueEncoder.write(Lorg/apache/hadoop/hbase/Cell;)V
>  (27 bytes) @ 0x7f754663f700 [0x7f754663f4c0+0x0240]
> J 25251 c2 
> org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
>  (90 bytes) @ 0x7f7546a53038 [0x7f7546a50e60+0x21d8]
> J 21182 c2 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
>  (73 bytes) @ 0x7f7545f4d90c [0x7f7545f4d3a0+0x056c]
> J 21181 c2 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
>  (149 bytes) @ 0x7f7545fd680c [0x7f7545fd65e0+0x022c]
> J 25389 c2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$$Lambda$247.run()V 
> (16 bytes) @ 0x7f7546ade660 [0x7f7546ade140+0x0520]
> J 24098 c2 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z

[jira] [Commented] (HBASE-28584) RS SIGSEGV under heavy replication load

2024-05-22 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848778#comment-17848778
 ] 

David Manning commented on HBASE-28584:
---

We see it too. We have hbase.region.store.parallel.put.limit=0, but that is 
also the default in 2.5 after HBASE-26814. For us it always correlates with a 
lot of load that shows up suddenly, and then replicates to a peer cluster, and 
that peer cluster throws RegionTooBusyExceptions (blockedRequestCount metric.)

> RS SIGSEGV under heavy replication load
> ---
>
> Key: HBASE-28584
> URL: https://issues.apache.org/jira/browse/HBASE-28584
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.5.6
> Environment: RHEL 7.9
> JDK 11.0.23
> Hadoop 3.2.4
> Hbase 2.5.6
>Reporter: Whitney Jackson
>Priority: Major
>
> I'm observing RS crashes under heavy replication load:
>  
> {code:java}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f7546873b69, pid=29890, tid=36828
> #
> # JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.23+7) (build 
> 11.0.23+7-LTS-222)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.23+7-LTS-222, mixed 
> mode, tiered, compressed oops, g1 gc, linux-amd64)
> # Problematic frame:
> # J 24625 c2 
> org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V
>  (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209]
> {code}
>  
> The heavier load comes when a replication peer has been disabled for several 
> hours for patching etc. When the peer is re-enabled the replication load is 
> high until the peer is all caught up. The crashes happen on the cluster 
> receiving the replication edits.
>  
> I believe this problem started after upgrading from 2.4.x to 2.5.x.
>  
> One possibly relevant non-standard config I run with:
> {code:java}
> <property>
>   <name>hbase.region.store.parallel.put.limit</name>
>   <value>100</value>
>   <description>Added after seeing "failed to accept edits" replication errors 
> in the destination region servers indicating this limit was being exceeded 
> while trying to process replication edits.</description>
> </property>
> {code}
>  
> I understand from other Jiras that the problem is likely around direct memory 
> usage by Netty. I haven't yet tried switching the Netty allocator to 
> {{unpooled}} or {{heap}}. I also haven't yet tried any of the 
> {{io.netty.allocator.*}} options.
>  
> {{MaxDirectMemorySize}} is set to 26g.
>  
> Here's the full stack for the relevant thread:
>  
> {code:java}
> Stack: [0x7f72e2e5f000,0x7f72e2f6],  sp=0x7f72e2f5e450,  free 
> space=1021k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> J 24625 c2 
> org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V
>  (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209]
> J 26253 c2 
> org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I 
> (21 bytes) @ 0x7f7545af2d84 [0x7f7545af2d20+0x0064]
> J 22971 c2 
> org.apache.hadoop.hbase.codec.KeyValueCodecWithTags$KeyValueEncoder.write(Lorg/apache/hadoop/hbase/Cell;)V
>  (27 bytes) @ 0x7f754663f700 [0x7f754663f4c0+0x0240]
> J 25251 c2 
> org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
>  (90 bytes) @ 0x7f7546a53038 [0x7f7546a50e60+0x21d8]
> J 21182 c2 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
>  (73 bytes) @ 0x7f7545f4d90c [0x7f7545f4d3a0+0x056c]
> J 21181 c2 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
>  (149 bytes) @ 0x7f7545fd680c [0x7f7545fd65e0+0x022c]
> J 25389 c2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$$Lambda$247.run()V 
> (16 bytes) @ 0x7f7546ade660 [0x7f7546ade140+0x0520]
> J 24098 c2 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z
>  (109 bytes) @ 0x7f754678fbb8 [0x7f754678f8e0+0x02d8]
> J 27297% c2 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run()V (603 
> bytes) @ 0x7f75466c4d48 [0x7f75466c4c80+0x00c8]
> j  
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run()V+44
> j  
> 

[jira] [Created] (HBASE-28422) SplitWalProcedure will attempt SplitWalRemoteProcedure on the same target RegionServer indefinitely

2024-03-05 Thread David Manning (Jira)
David Manning created HBASE-28422:
-

 Summary: SplitWalProcedure will attempt SplitWalRemoteProcedure on 
the same target RegionServer indefinitely
 Key: HBASE-28422
 URL: https://issues.apache.org/jira/browse/HBASE-28422
 Project: HBase
  Issue Type: Bug
  Components: master, proc-v2, wal
Affects Versions: 2.5.5
Reporter: David Manning


Similar to HBASE-28050. If HMaster selects a RegionServer for 
SplitWalRemoteProcedure, it will retry this server as long as the server is 
alive. I believe this is because even though 
{{RSProcedureDispatcher.ExecuteProceduresRemoteCall.run}} calls 
{{remoteCallFailed}}, there is no logic after this to select a new target 
server. For {{TransitRegionStateProcedure}} there is logic to select a new 
server for opening a region, using {{forceNewPlan}}. But 
SplitWalRemoteProcedure only has logic to try another server if we receive a 
{{DoNotRetryIOException}} in SplitWALRemoteProcedure#complete: 
[https://github.com/apache/hbase/blob/780ff56b3f23e7041ef1b705b7d3d0a53fdd05ae/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SplitWALRemoteProcedure.java#L104-L110]

If we receive any other IOException, we will just retry the target server 
forever. Just like in HBASE-28050, if there is a SaslException, this will never 
lead to retrying a SplitWalRemoteProcedure on a new server, which can lead to 
ServerCrashProcedure never finishing until the target server for 
SplitWalRemoteProcedure is restarted. The following log is seen repeatedly, 
always sending to the same host.
{code:java}
2024-01-31 15:59:43,616 WARN  [RSProcedureDispatcher-pool-72846] 
procedure.SplitWALRemoteProcedure - Failed split of 
hdfs:///hbase/WALs/,1704984571464-splitting/1704984571464.1706710908543,
 retry...
java.io.IOException: Call to address= failed on local exception: 
java.io.IOException: Can not send request because relogin is in progress.
at sun.reflect.GeneratedConstructorAccessor363.newInstance(Unknown 
Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:239)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:391)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:425)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:420)
at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114)
at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129)
at 
org.apache.hadoop.hbase.ipc.NettyRpcConnection.lambda$sendRequest$4(NettyRpcConnection.java:365)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at 
org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Can not send request because relogin is in 
progress.
at 
org.apache.hadoop.hbase.ipc.NettyRpcConnection.sendRequest0(NettyRpcConnection.java:321)
at 
org.apache.hadoop.hbase.ipc.NettyRpcConnection.lambda$sendRequest$4(NettyRpcConnection.java:363)
... 8 more
{code}
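
The retry decision described above boils down to a pattern like the following 
(an illustrative sketch with hypothetical helper names, not the actual 
SplitWALRemoteProcedure code):
{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.DoNotRetryIOException;

// Illustrative sketch: only a DoNotRetryIOException leads to choosing a different
// worker, so any other IOException (e.g. one wrapping a SaslException) keeps
// retrying the same target server indefinitely.
class SplitWalRetrySketch {
  void onRemoteCallFailed(IOException error) {
    if (error instanceof DoNotRetryIOException) {
      releaseWorkerAndRequeue(); // let another region server pick up the WAL split
    } else {
      retrySameWorker();         // "retry..." on the same server, forever
    }
  }

  private void releaseWorkerAndRequeue() { /* hypothetical: select a new target server */ }

  private void retrySameWorker() { /* hypothetical: schedule another attempt on the same server */ }
}
{code}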



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28344) Flush journal logs are missing from 2.x

2024-03-04 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823462#comment-17823462
 ] 

David Manning commented on HBASE-28344:
---

Compaction status journal has the same problem, too.

> Flush journal logs are missing from 2.x 
> 
>
> Key: HBASE-28344
> URL: https://issues.apache.org/jira/browse/HBASE-28344
> Project: HBase
>  Issue Type: Improvement
>Reporter: Prathyusha
>Assignee: Prathyusha
>Priority: Minor
>
> After the refactoring of TaskMonitor from branch-1
> [public synchronized MonitoredTask createStatus(String 
> description)|https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/monitoring/TaskMonitor.java#L87]
> to branch-2/master
> [public MonitoredTask createStatus(String description) { return 
> createStatus(description, 
> false); }|https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/monitoring/TaskMonitor.java#L87]
> the flush journal logs are missing.
> During flush, we currently do not set the ignore monitor flag to true here: 
> [MonitoredTask status = TaskMonitor.get().createStatus("Flushing " + 
> this);|https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L2459]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-25749) Improved logging when interrupting active RPC handlers holding the region close lock (HBASE-25212 hbase.regionserver.close.wait.abort)

2024-02-17 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818200#comment-17818200
 ] 

David Manning commented on HBASE-25749:
---

[~umesh9414] It doesn't let me assign this to you - maybe your profile has to 
be updated before issues can be assigned to you.

> Improved logging when interrupting active RPC handlers holding the region 
> close lock (HBASE-25212 hbase.regionserver.close.wait.abort)
> --
>
> Key: HBASE-25749
> URL: https://issues.apache.org/jira/browse/HBASE-25749
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, rpc
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.4.0
>Reporter: David Manning
>Priority: Minor
> Fix For: 3.0.0-beta-2
>
>
> HBASE-25212 adds an optional improvement to Close Region, for interrupting 
> active RPC handlers holding the region close lock. If, after the timeout is 
> reached, the close lock can still not be acquired, the regionserver may 
> abort. It would be helpful to add logging for which threads or components are 
> holding the region close lock at this time.
> Depending on the size of regionLockHolders, or use of any stack traces, log 
> output may need to be truncated. The interrupt code is in 
> HRegion#interruptRegionOperations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28221) Introduce regionserver metric for delayed flushes

2024-02-09 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816282#comment-17816282
 ] 

David Manning commented on HBASE-28221:
---

Alerting on {{flushQueueLength}} is not really the same. You will capture every 
delayed flush from {{PeriodicMemstoreFlusher}} as well. So it will capture both 
cases where a delayed flush is enqueued: either for a very busy region beyond 
the {{blockingStoreFiles}} limit, or for a mostly idle region whose memstore 
edits are flushed once they are an hour old.

Alerting on {{blockedRequestsCount}} will give you a stronger signal, because 
this is when you have a {{RegionTooBusyException}} due to the memstore being 
full, waiting on a delayed flush.

But if you want to alert on a delayed flush without a full memstore, I don't 
know that it could be done today without adding a new metric. If you have 
site-wide settings for {{blockingStoreFiles}}, you could alert when 
{{maxStoreFileCount}} is above, or near, {{blockingStoreFiles}}. But if it 
varies by table, you would have to alert per-table.

So there could still be some value in adding this type of metric (but consider 
first whether alerting on client impact, i.e. {{RegionTooBusyException}} and 
{{blockedRequestsCount}}, would be sufficient). [~rkrahul324] [~vjasani]

> Introduce regionserver metric for delayed flushes
> -
>
> Key: HBASE-28221
> URL: https://issues.apache.org/jira/browse/HBASE-28221
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.4.17, 2.5.6
>Reporter: Viraj Jasani
>Assignee: Rahul Kumar
>Priority: Major
> Fix For: 2.4.18, 2.7.0, 2.5.8, 3.0.0-beta-2, 2.6.1
>
>
> If compaction is disabled temporarily to allow stabilizing hdfs load, we can 
> forget re-enabling the compaction. This can result into flushes getting 
> delayed for "hbase.hstore.blockingWaitTime" time (90s). While flushes do 
> happen eventually after waiting for max blocking time, it is important to 
> realize that any cluster cannot function well with compaction disabled for 
> significant amount of time.
>  
> We would also block any write requests until region is flushed (90+ sec, by 
> default):
> {code:java}
> 2023-11-27 20:40:52,124 WARN  [,queue=18,port=60020] regionserver.HRegion - 
> Region is too busy due to exceeding memstore size limit.
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=table1,1699923733811.4fd5e52e2133df1e347f32c646f23ab4., 
> server=server-1,60020,1699421714454, memstoreSize=1073820928, 
> blockingMemStoreSize=1073741824
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4200)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3264)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3215)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:967)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:895)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2524)
>     at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36812)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2432)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291) 
> {code}
>  
> Delayed flush logs:
> {code:java}
> LOG.warn("{} has too many store files({}); delaying flush up to {} ms",
>   region.getRegionInfo().getEncodedName(), getStoreFileCount(region),
>   this.blockingWaitTime); {code}
> Suggestion: Introduce regionserver metric (MetricsRegionServerSource) for the 
> num of flushes getting delayed due to too many store files.
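> A minimal sketch of where such a counter could be incremented (the counter 
> itself is hypothetical, not an existing MetricsRegionServerSource field):
> {code:java}
> // Hypothetical sketch: bump a "delayed flush" counter at the same point the
> // existing warning is logged, so it can be exposed via MetricsRegionServerSource.
> if (isTooManyStoreFiles(region)) {
>   LOG.warn("{} has too many store files({}); delaying flush up to {} ms",
>     region.getRegionInfo().getEncodedName(), getStoreFileCount(region),
>     this.blockingWaitTime);
>   delayedFlushCounter.increment(); // hypothetical counter for the new metric
> }
> {code}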



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-28257) Memstore flushRequest can be blocked by a delayed flush scheduled by PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes

2024-01-19 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning reassigned HBASE-28257:
-

Assignee: David Manning

> Memstore flushRequest can be blocked by a delayed flush scheduled by 
> PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes
> -
>
> Key: HBASE-28257
> URL: https://issues.apache.org/jira/browse/HBASE-28257
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0, 3.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> *Steps to reproduce:*
> # Make an edit to a region.
> # Wait 1 hour + 10 seconds (default value of 
> {{hbase.regionserver.optionalcacheflushinterval}} plus 
> {{hbase.regionserver.flush.check.period}}.)
> # Make a very large number of edits to the region (i.e. >= 1GB, pressure the 
> memstore.)
> *Expected:*
> Memstore pressure leads to flushes.
> *Result:*
> The PeriodicMemstoreFlusher has scheduled a refresh with a random delay of 
> 0-5 minutes (default for 
> {{hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds}}.) Memstore 
> pressure flushes are blocked by the scheduled delayed flush. Client receives 
> 0-5 minutes of {{RegionTooBusyExceptions}} until the delayed flush executes.
> *Logs:*
> 2023-12-13 06:00:13,573 INFO regionserver.HRegionServer - 
> MemstoreFlusherChore requesting flush of  because  has an old 
> edit so flush to free WALs after random delay 166761 ms
> 2023-12-13 06:00:53,219 DEBUG regionserver.HRegion - Flush requested on 
> 
> 2023-12-13 06:01:47,694 WARN regionserver.HRegion - Region is too busy due to 
> exceeding memstore size limit. 
> org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, 
> regionName=, server=
> at org.apache.hadoop.hbase.regionserver.HRegion.checkResources
> ...
> repeats
> ...
> 2023-12-13 06:01:52,223 WARN regionserver.HRegion - Region is too busy due to 
> exceeding memstore size limit. 
> org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, 
> regionName=, server=
> at org.apache.hadoop.hbase.regionserver.HRegion.checkResources
> ...
> 2023-12-13 06:03:00,340 INFO regionserver.HRegion - Flushing  1/1 
> column families, dataSize=534.77 MB heapSize=1.00 GB
> 2023-12-13 06:03:09,634 INFO regionserver.HRegion - Finished flush of 
> dataSize ~534.77 MB/560744948, heapSize ~1.00 GB/1073816296, currentSize=0 
> B/0 for  in 9294ms, sequenceid=21310753, compaction requested=false
> Note also this is the same cause as discussed in HBASE-16030 conversation 
> https://issues.apache.org/jira/browse/HBASE-16030?focusedCommentId=15340153&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15340153
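> A rough sketch of the interaction described above (hypothetical names, not the 
> actual MemStoreFlusher code):
> {code:java}
> // Hypothetical sketch: the periodic flusher enqueues the region with a random
> // 0-5 minute delay; a later pressure-driven flush request finds the region
> // already queued and becomes a no-op, so writers see RegionTooBusyException
> // until the delayed entry finally runs.
> long rangeOfDelaySeconds = 300; // hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds default
> long delayMs = java.util.concurrent.ThreadLocalRandom.current().nextLong(rangeOfDelaySeconds * 1000L);
> scheduleDelayedFlush(region, delayMs);   // e.g. "after random delay 166761 ms"
> 
> // later, under memstore pressure:
> if (!isAlreadyQueued(region)) {          // false here, because of the delayed entry
>   scheduleImmediateFlush(region);        // never reached until the delay expires
> }
> {code}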



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28293) Add metric for GetClusterStatus request count.

2024-01-08 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804409#comment-17804409
 ] 

David Manning commented on HBASE-28293:
---

Yeah it would be nice to follow similar patterns as for other HMaster 
operations, like Move, Snapshot, etc. But I think most of those are now tracked 
by procedures, which we would not have in this case.

> Add metric for GetClusterStatus request count.
> --
>
> Key: HBASE-28293
> URL: https://issues.apache.org/jira/browse/HBASE-28293
> Project: HBase
>  Issue Type: Bug
>Reporter: Rushabh Shah
>Priority: Major
>
> We have been bitten multiple times by GetClusterStatus request overwhelming 
> HMaster's memory usage. It would be good to add a metric for the total 
> GetClusterStatus requests count.
> In almost all of our production incidents involving GetClusterStatus requests, 
> the HMaster was running out of memory because many clients were calling this 
> RPC in parallel and the response size was very big.
> In hbase2 we have 
> [ClusterMetrics.Option|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterMetrics.java#L164-L224]
>  which can reduce the size of the response.
> It would be nice to add another metric to indicate if the response size of 
> GetClusterStatus is greater than some threshold (like 5MB)
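> For reference, a client can already trim the payload by requesting only 
> specific options (a sketch against the hbase-2 Admin API; the option set 
> chosen here is just an example):
> {code:java}
> import java.util.EnumSet;
> import org.apache.hadoop.hbase.ClusterMetrics;
> import org.apache.hadoop.hbase.ClusterMetrics.Option;
> import org.apache.hadoop.hbase.client.Admin;
> 
> // Sketch: request only the pieces of cluster status that are needed instead of
> // the full (potentially multi-MB) response. "admin" is an Admin obtained from
> // an existing Connection.
> ClusterMetrics metrics =
>   admin.getClusterMetrics(EnumSet.of(Option.LIVE_SERVERS, Option.MASTER));
> {code}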



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28271) Infinite waiting on lock acquisition by snapshot can result in unresponsive master

2023-12-21 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799543#comment-17799543
 ] 

David Manning commented on HBASE-28271:
---

{quote}In cases where a region stays in RIT for considerable time, if enough 
attempts are made by the client to create snapshots on the table, it can easily 
exhaust all handler threads, leading to potentially unresponsive master.{quote}

It can happen more easily than this, too, because you don't have to make 
repeated attempts to create a snapshot on the same table. You can attempt to 
snapshot a different table, and it will still hang a new RPC handler.

This is because the {{SnapshotManager#snapshotTable}} is {{synchronized}} and 
this is where the {{handler.prepare()}} call is made to acquire the lock. We 
indefinitely await the lock held by the region in transition, but we do so 
within {{SnapshotManager}}'s synchronized block.

Any additional snapshot RPC, even for a different table, will end up blocked on 
entering a separate {{synchronized}} method in 
{{SnapshotManager#cleanupSentinels}}. This makes the condition easier to hit if 
you are doing a process which snapshots all tables in the cluster.
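
A stripped-down illustration of that blocking pattern (not the actual 
SnapshotManager code):
{code:java}
// Both methods synchronize on the same SnapshotManager instance, so if
// snapshotTable() blocks waiting for a table lock held by a region in
// transition, every other snapshot RPC blocks on entering cleanupSentinels(),
// even for unrelated tables.
class SnapshotManagerSketch {
  synchronized void snapshotTable() throws InterruptedException {
    waitForTableLock(); // analogue of handler.prepare(): may wait indefinitely
  }

  synchronized void cleanupSentinels() {
    // every new snapshot request passes through here first and parks on the monitor
  }

  private void waitForTableLock() throws InterruptedException {
    Thread.sleep(Long.MAX_VALUE); // stands in for awaiting the table-level lock
  }
}
{code}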

> Infinite waiting on lock acquisition by snapshot can result in unresponsive 
> master
> --
>
> Key: HBASE-28271
> URL: https://issues.apache.org/jira/browse/HBASE-28271
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-4, 2.4.17, 2.5.7
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
> Attachments: image.png
>
>
> When a region is stuck in transition for a significant time, any attempt to 
> take a snapshot of the table keeps a master handler thread waiting forever. 
> As part of creating a snapshot of an enabled or disabled table, a 
> LockProcedure is executed to acquire the table-level lock, but if any region 
> of the table is in transition, the LockProcedure cannot complete, so the 
> snapshot handler waits forever until the region transition completes and the 
> table-level lock can be acquired.
> In cases where a region stays in RIT for a considerable time, if enough 
> attempts are made by the client to create snapshots on the table, it can 
> easily exhaust all handler threads, leading to a potentially unresponsive 
> master. A sample thread dump is attached.
> Proposal: The snapshot handler should not stay stuck forever if it cannot 
> take table level lock, it should fail-fast.
> !image.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28257) Memstore flushRequest can be blocked by a delayed flush scheduled by PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes

2023-12-13 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning resolved HBASE-28257.
---
Resolution: Duplicate

> Memstore flushRequest can be blocked by a delayed flush scheduled by 
> PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes
> -
>
> Key: HBASE-28257
> URL: https://issues.apache.org/jira/browse/HBASE-28257
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0, 3.0.0
>Reporter: David Manning
>Priority: Minor
>
> *Steps to reproduce:*
> # Make an edit to a region.
> # Wait 1 hour + 10 seconds (default value of 
> {{hbase.regionserver.optionalcacheflushinterval}} plus 
> {{hbase.regionserver.flush.check.period}}.)
> # Make a very large number of edits to the region (i.e. >= 1GB, pressure the 
> memstore.)
> *Expected:*
> Memstore pressure leads to flushes.
> *Result:*
> The PeriodicMemstoreFlusher has scheduled a refresh with a random delay of 
> 0-5 minutes (default for 
> {{hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds}}.) Memstore 
> pressure flushes are blocked by the scheduled delayed flush. Client receives 
> 0-5 minutes of {{RegionTooBusyExceptions}} until the delayed flush executes.
> *Logs:*
> 2023-12-13 06:00:13,573 INFO regionserver.HRegionServer - 
> MemstoreFlusherChore requesting flush of  because  has an old 
> edit so flush to free WALs after random delay 166761 ms
> 2023-12-13 06:00:53,219 DEBUG regionserver.HRegion - Flush requested on 
> 
> 2023-12-13 06:01:47,694 WARN regionserver.HRegion - Region is too busy due to 
> exceeding memstore size limit. 
> org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, 
> regionName=, server=
> at org.apache.hadoop.hbase.regionserver.HRegion.checkResources
> ...
> repeats
> ...
> 2023-12-13 06:01:52,223 WARN regionserver.HRegion - Region is too busy due to 
> exceeding memstore size limit. 
> org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, 
> regionName=, server=
> at org.apache.hadoop.hbase.regionserver.HRegion.checkResources
> ...
> 2023-12-13 06:03:00,340 INFO regionserver.HRegion - Flushing  1/1 
> column families, dataSize=534.77 MB heapSize=1.00 GB
> 2023-12-13 06:03:09,634 INFO regionserver.HRegion - Finished flush of 
> dataSize ~534.77 MB/560744948, heapSize ~1.00 GB/1073816296, currentSize=0 
> B/0 for  in 9294ms, sequenceid=21310753, compaction requested=false
> Note also this is the same cause as discussed in HBASE-16030 conversation 
> https://issues.apache.org/jira/browse/HBASE-16030?focusedCommentId=15340153&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15340153



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-28257) Memstore flushRequest can be blocked by a delayed flush scheduled by PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes

2023-12-13 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning reassigned HBASE-28257:
-

Assignee: (was: David Manning)

> Memstore flushRequest can be blocked by a delayed flush scheduled by 
> PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes
> -
>
> Key: HBASE-28257
> URL: https://issues.apache.org/jira/browse/HBASE-28257
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0, 3.0.0
>Reporter: David Manning
>Priority: Minor
>
> *Steps to reproduce:*
> # Make an edit to a region.
> # Wait 1 hour + 10 seconds (default value of 
> {{hbase.regionserver.optionalcacheflushinterval}} plus 
> {{hbase.regionserver.flush.check.period}}.)
> # Make a very large number of edits to the region (i.e. >= 1GB, pressure the 
> memstore.)
> *Expected:*
> Memstore pressure leads to flushes.
> *Result:*
> The PeriodicMemstoreFlusher has scheduled a refresh with a random delay of 
> 0-5 minutes (default for 
> {{hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds}}.) Memstore 
> pressure flushes are blocked by the scheduled delayed flush. Client receives 
> 0-5 minutes of {{RegionTooBusyExceptions}} until the delayed flush executes.
> *Logs:*
> 2023-12-13 06:00:13,573 INFO regionserver.HRegionServer - 
> MemstoreFlusherChore requesting flush of  because  has an old 
> edit so flush to free WALs after random delay 166761 ms
> 2023-12-13 06:00:53,219 DEBUG regionserver.HRegion - Flush requested on 
> 
> 2023-12-13 06:01:47,694 WARN regionserver.HRegion - Region is too busy due to 
> exceeding memstore size limit. 
> org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, 
> regionName=, server=
> at org.apache.hadoop.hbase.regionserver.HRegion.checkResources
> ...
> repeats
> ...
> 2023-12-13 06:01:52,223 WARN regionserver.HRegion - Region is too busy due to 
> exceeding memstore size limit. 
> org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=1.0 G, 
> regionName=, server=
> at org.apache.hadoop.hbase.regionserver.HRegion.checkResources
> ...
> 2023-12-13 06:03:00,340 INFO regionserver.HRegion - Flushing  1/1 
> column families, dataSize=534.77 MB heapSize=1.00 GB
> 2023-12-13 06:03:09,634 INFO regionserver.HRegion - Finished flush of 
> dataSize ~534.77 MB/560744948, heapSize ~1.00 GB/1073816296, currentSize=0 
> B/0 for  in 9294ms, sequenceid=21310753, compaction requested=false
> Note also this is the same cause as discussed in HBASE-16030 conversation 
> https://issues.apache.org/jira/browse/HBASE-16030?focusedCommentId=15340153&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15340153



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28257) Memstore flushRequest can be blocked by a delayed flush scheduled by PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes

2023-12-13 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-28257:
--
Description: 
*Steps to reproduce:*
# Make an edit to a region.
# Wait 1 hour + 10 seconds (default value of 
{{hbase.regionserver.optionalcacheflushinterval}} plus 
{{hbase.regionserver.flush.check.period}}.)
# Make a very large number of edits to the region (i.e. >= 1GB, pressure the 
memstore.)

*Expected:*
Memstore pressure leads to flushes.

*Result:*
The PeriodicMemstoreFlusher has scheduled a refresh with a random delay of 0-5 
minutes (default for 
{{hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds}}.) Memstore 
pressure flushes are blocked by the scheduled delayed flush. Client receives 
0-5 minutes of {{RegionTooBusyExceptions}} until the delayed flush executes.

*Logs:*
2023-12-13 06:00:13,573 INFO regionserver.HRegionServer - MemstoreFlusherChore 
requesting flush of  because  has an old edit so flush to free WALs 
after random delay 166761 ms
2023-12-13 06:00:53,219 DEBUG regionserver.HRegion - Flush requested on 
2023-12-13 06:01:47,694 WARN regionserver.HRegion - Region is too busy due to 
exceeding memstore size limit. org.apache.hadoop.hbase.RegionTooBusyException: 
Over memstore limit=1.0 G, regionName=, server=
at org.apache.hadoop.hbase.regionserver.HRegion.checkResources
...
repeats
...
2023-12-13 06:01:52,223 WARN regionserver.HRegion - Region is too busy due to 
exceeding memstore size limit. org.apache.hadoop.hbase.RegionTooBusyException: 
Over memstore limit=1.0 G, regionName=, server=
at org.apache.hadoop.hbase.regionserver.HRegion.checkResources
...
2023-12-13 06:03:00,340 INFO regionserver.HRegion - Flushing  1/1 
column families, dataSize=534.77 MB heapSize=1.00 GB
2023-12-13 06:03:09,634 INFO regionserver.HRegion - Finished flush of dataSize 
~534.77 MB/560744948, heapSize ~1.00 GB/1073816296, currentSize=0 B/0 for 
 in 9294ms, sequenceid=21310753, compaction requested=false

Note also this is the same cause as discussed in HBASE-16030 conversation 
https://issues.apache.org/jira/browse/HBASE-16030?focusedCommentId=15340153&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15340153

  was:
*Steps to reproduce:*
# Make an edit to a region.
# Wait 1 hour (default value of hbase.regionserver.optionalcacheflushinterval.)
# Make a very large number of edits to the region (i.e. >= 1GB, pressure the 
memstore.)

*Expected:*
Memstore pressure leads to flushes.

*Result:*
The PeriodicMemstoreFlusher has scheduled a refresh with a random delay of 0-5 
minutes (default for 
hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds.) Memstore 
pressure flushes are blocked by the scheduled delayed flush. Client receives 
0-5 minutes of RegionTooBusyExceptions until the delayed flush executes.

*Logs:*
2023-12-13 06:00:13,573 INFO regionserver.HRegionServer - MemstoreFlusherChore 
requesting flush of  because  has an old edit so flush to free WALs 
after random delay 166761 ms
2023-12-13 06:00:53,219 DEBUG regionserver.HRegion - Flush requested on 
2023-12-13 06:01:47,694 WARN regionserver.HRegion - Region is too busy due to 
exceeding memstore size limit. org.apache.hadoop.hbase.RegionTooBusyException: 
Over memstore limit=1.0 G, regionName=, server=
at org.apache.hadoop.hbase.regionserver.HRegion.checkResources
...
repeats
...
2023-12-13 06:01:52,223 WARN regionserver.HRegion - Region is too busy due to 
exceeding memstore size limit. org.apache.hadoop.hbase.RegionTooBusyException: 
Over memstore limit=1.0 G, regionName=, server=
at org.apache.hadoop.hbase.regionserver.HRegion.checkResources
...
2023-12-13 06:03:00,340 INFO regionserver.HRegion - Flushing  1/1 
column families, dataSize=534.77 MB heapSize=1.00 GB
2023-12-13 06:03:09,634 INFO regionserver.HRegion - Finished flush of dataSize 
~534.77 MB/560744948, heapSize ~1.00 GB/1073816296, currentSize=0 B/0 for 
 in 9294ms, sequenceid=21310753, compaction requested=false

Note also this is the same cause as discussed in HBASE-16030 conversation 
https://issues.apache.org/jira/browse/HBASE-16030?focusedCommentId=15340153&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15340153


> Memstore flushRequest can be blocked by a delayed flush scheduled by 
> PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes
> -
>
> Key: HBASE-28257
> URL: https://issues.apache.org/jira/browse/HBASE-28257
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0, 3.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> *Steps to reproduce:*
> # Make an edit to a region.
> # 

[jira] [Created] (HBASE-28257) Memstore flushRequest can be blocked by a delayed flush scheduled by PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 minutes

2023-12-13 Thread David Manning (Jira)
David Manning created HBASE-28257:
-

 Summary: Memstore flushRequest can be blocked by a delayed flush 
scheduled by PeriodicMemstoreFlusher, RegionTooBusyExceptions for up to 5 
minutes
 Key: HBASE-28257
 URL: https://issues.apache.org/jira/browse/HBASE-28257
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 2.0.0, 3.0.0
Reporter: David Manning
Assignee: David Manning


*Steps to reproduce:*
# Make an edit to a region.
# Wait 1 hour (default value of hbase.regionserver.optionalcacheflushinterval.)
# Make a very large number of edits to the region (i.e. >= 1GB, pressure the 
memstore.)

*Expected:*
Memstore pressure leads to flushes.

*Result:*
The PeriodicMemstoreFlusher has scheduled a refresh with a random delay of 0-5 
minutes (default for 
hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds.) Memstore 
pressure flushes are blocked by the scheduled delayed flush. Client receives 
0-5 minutes of RegionTooBusyExceptions until the delayed flush executes.

*Logs:*
2023-12-13 06:00:13,573 INFO regionserver.HRegionServer - MemstoreFlusherChore 
requesting flush of  because  has an old edit so flush to free WALs 
after random delay 166761 ms
2023-12-13 06:00:53,219 DEBUG regionserver.HRegion - Flush requested on 
2023-12-13 06:01:47,694 WARN regionserver.HRegion - Region is too busy due to 
exceeding memstore size limit. org.apache.hadoop.hbase.RegionTooBusyException: 
Over memstore limit=1.0 G, regionName=, server=
at org.apache.hadoop.hbase.regionserver.HRegion.checkResources
...
repeats
...
2023-12-13 06:01:52,223 WARN regionserver.HRegion - Region is too busy due to 
exceeding memstore size limit. org.apache.hadoop.hbase.RegionTooBusyException: 
Over memstore limit=1.0 G, regionName=, server=
at org.apache.hadoop.hbase.regionserver.HRegion.checkResources
...
2023-12-13 06:03:00,340 INFO regionserver.HRegion - Flushing  1/1 
column families, dataSize=534.77 MB heapSize=1.00 GB
2023-12-13 06:03:09,634 INFO regionserver.HRegion - Finished flush of dataSize 
~534.77 MB/560744948, heapSize ~1.00 GB/1073816296, currentSize=0 B/0 for 
 in 9294ms, sequenceid=21310753, compaction requested=false

Note also this is the same cause as discussed in HBASE-16030 conversation 
https://issues.apache.org/jira/browse/HBASE-16030?focusedCommentId=15340153&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15340153



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-20034) Make periodic flusher delay configurable

2023-12-13 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-20034:
--
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

> Make periodic flusher delay configurable
> 
>
> Key: HBASE-20034
> URL: https://issues.apache.org/jira/browse/HBASE-20034
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 3.0.0-alpha-1
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Major
> Attachments: HBASE-20034.branch-1.patch, HBASE-20034.master.patch
>
>
> PeriodicMemstoreFlusher is currently configured to flush with a random delay 
> of up to 5 minutes.  Make this configurable.
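
This was later made tunable via 
{{hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds}} (the property 
named in the HBASE-28257 description above). A minimal sketch of tightening it; 
the value type and unit (seconds, set as an int) are assumptions to verify 
against your HBase version:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class PeriodicFlushDelayConfig {
  public static Configuration tightenPeriodicFlushDelay() {
    Configuration conf = HBaseConfiguration.create();
    // Default is a random delay of up to 5 minutes; lowering it bounds how long a
    // periodic "old edit" flush can sit in the queue ahead of pressure-driven flushes.
    conf.setInt("hbase.regionserver.periodicmemstoreflusher.rangeofdelayseconds", 30);
    return conf;
  }
}
{code}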



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-21785) master reports open regions as RITs and also messes up rit age metric

2023-11-21 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-21785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788523#comment-17788523
 ] 

David Manning commented on HBASE-21785:
---

[~sershe] This says fixed in 2.2.0, but I don't see the commit 
https://github.com/apache/hbase/commit/9ef6bc4323c9be0e18f0cf9918a582e6b4a11853 
in branch-2.

> master reports open regions as RITs and also messes up rit age metric
> -
>
> Key: HBASE-21785
> URL: https://issues.apache.org/jira/browse/HBASE-21785
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.0
>
> Attachments: HBASE-21785.01.patch, HBASE-21785.patch
>
>
> {noformat}
> RegionState   RIT time (ms)   Retries
> dba183f0dadfcc9dc8ae0a6dd59c84e6  dba183f0dadfcc9dc8ae0a6dd59c84e6. 
> state=OPEN, ts=Wed Dec 31 16:00:00 PST 1969 (1548453918s ago), 
> server=server,17020,1548452922054  1548453918735   0
> {noformat}
> RIT age metric also gets set to a bogus value.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-25222) Add a cost function to move the daughter regions of a recent split to different region servers

2023-10-19 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning resolved HBASE-25222.
---
Resolution: Duplicate

> Add a cost function to move the daughter regions of a recent split to 
> different region servers 
> ---
>
> Key: HBASE-25222
> URL: https://issues.apache.org/jira/browse/HBASE-25222
> Project: HBase
>  Issue Type: Improvement
>Reporter: Sandeep Pal
>Assignee: Sandeep Pal
>Priority: Major
>
> In HBase, hotspot regions are easily formed whenever there is skew and there 
> is high write volume. A few regions grow really fast, and they also become the 
> bottleneck on the few region servers hosting them. 
> It would be beneficial to add a cost function to move the regions after the 
> split to different region servers. In this way the writes to a hot key range 
> will be distributed to multiple region servers. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-25222) Add a cost function to move the daughter regions of a recent split to different region servers

2023-10-19 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1498#comment-1498
 ] 

David Manning commented on HBASE-25222:
---

{{hbase.master.auto.separate.child.regions.after.split.enabled}} was introduced 
in HBASE-25518. It is {{false}} by default, but it seems to solve the problem 
this issue is describing.
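
For reference, a hedged sketch of turning that behavior on through the 
Configuration API (the property name comes from HBASE-25518; whether a 
deployment sets it in code or in hbase-site.xml is up to the operator):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SeparateDaughterRegionsConfig {
  public static Configuration enableDaughterSeparation() {
    Configuration conf = HBaseConfiguration.create();
    // false by default; when true, the master places the daughter regions of a
    // split on different region servers instead of leaving both on the parent's host.
    conf.setBoolean("hbase.master.auto.separate.child.regions.after.split.enabled", true);
    return conf;
  }
}
{code}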

> Add a cost function to move the daughter regions of a recent split to 
> different region servers 
> ---
>
> Key: HBASE-25222
> URL: https://issues.apache.org/jira/browse/HBASE-25222
> Project: HBase
>  Issue Type: Improvement
>Reporter: Sandeep Pal
>Assignee: Sandeep Pal
>Priority: Major
>
> In HBase, hotspot regions are easily formed whenever there is skew and there 
> is high write volume. A few regions grow really fast, and they also become the 
> bottleneck on the few region servers hosting them. 
> It would be beneficial to add a cost function to move the regions after the 
> split to different region servers. In this way the writes to a hot key range 
> will be distributed to multiple region servers. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-25222) Add a cost function to move the daughter regions of a recent split to different region servers

2023-10-12 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774707#comment-17774707
 ] 

David Manning commented on HBASE-25222:
---

Seems like in hbase 2.x, this is already less of an issue, because the 
SplitTableRegionProcedure will choose new servers to open the daughter regions. 
In hbase 1.x, the daughter regions would open on the same server as the parent 
region.

> Add a cost function to move the daughter regions of a recent split to 
> different region servers 
> ---
>
> Key: HBASE-25222
> URL: https://issues.apache.org/jira/browse/HBASE-25222
> Project: HBase
>  Issue Type: Improvement
>Reporter: Sandeep Pal
>Assignee: Sandeep Pal
>Priority: Major
>
> In HBase, hotspot regions are easily formed whenever there is skew and there 
> is high write volume. A few regions grow really fast, and they also become the 
> bottleneck on the few region servers hosting them. 
> It would be beneficial to add a cost function to move the regions after the 
> split to different region servers. In this way the writes to a hot key range 
> will be distributed to multiple region servers. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27540) Client metrics for success/failure counts.

2023-02-14 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-27540:
--
Component/s: metrics

> Client metrics for success/failure counts.
> --
>
> Key: HBASE-27540
> URL: https://issues.apache.org/jira/browse/HBASE-27540
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, metrics
>Affects Versions: 3.0.0-alpha-3, 2.5.2
>Reporter: Victor Li
>Assignee: Victor Li
>Priority: Major
> Fix For: 3.0.0-alpha-4, 2.4.16, 2.5.3
>
>
> Client metrics to see total number of successful or failure counts of related 
> RPC calls like get, mutate, scan etc...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-15242) Client metrics for retries and timeouts

2023-02-10 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-15242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687334#comment-17687334
 ] 

David Manning commented on HBASE-15242:
---

For example, a {{RetriesExhaustedException}} will tell us we were doing retries 
and still failed. A {{CallTimeoutException}} will tell us that we hit a 
timeout. We could choose a subset of exceptions to instrument for metrics, just 
like the regionserver does.
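
A rough sketch of what that per-exception counting could look like on the 
client side; this is not the actual hbase-client metrics API, just a minimal 
illustration of the pattern MetricsHBaseServer uses for a handful of 
"interesting" exception types:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

public class ClientExceptionCounters {
  private final ConcurrentMap<String, LongAdder> counters = new ConcurrentHashMap<>();

  /** Bump a counter keyed by exception type, lumping everything uncommon together. */
  public void onException(Throwable t) {
    String name = t.getClass().getSimpleName();
    String key = name.equals("RetriesExhaustedException") || name.equals("CallTimeoutException")
      ? name
      : "OtherException";
    counters.computeIfAbsent(key, k -> new LongAdder()).increment();
  }

  /** Current count for a given key, e.g. "CallTimeoutException". */
  public long count(String key) {
    LongAdder adder = counters.get(key);
    return adder == null ? 0L : adder.sum();
  }
}
{code}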

> Client metrics for retries and timeouts
> ---
>
> Key: HBASE-15242
> URL: https://issues.apache.org/jira/browse/HBASE-15242
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Mikhail Antonov
>Assignee: Victor Li
>Priority: Major
>
> Client metrics to see total/avg number of retries, retries exhausted, and 
> timeouts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-15242) Client metrics for retries and timeouts

2023-02-10 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-15242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687333#comment-17687333
 ] 

David Manning commented on HBASE-15242:
---

[~vli02us] Maybe we can report exception counts for some specific exceptions? 
This may be enough to give details about retries and timeouts and other errors 
too. We could do something similar to what the server metrics show: 
https://github.com/apache/hbase/blob/a854cba59f52bd5574b55146352b2236a718f6b0/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/MetricsHBaseServer.java#L100-L107

> Client metrics for retries and timeouts
> ---
>
> Key: HBASE-15242
> URL: https://issues.apache.org/jira/browse/HBASE-15242
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Mikhail Antonov
>Assignee: Victor Li
>Priority: Major
>
> Client metrics to see total/avg number of retries, retries exhausted, and 
> timeouts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27159) Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of hits and misses for cacheable requests

2022-10-15 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-27159:
--
Status: Patch Available  (was: Open)

> Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of 
> hits and misses for cacheable requests
> 
>
> Key: HBASE-27159
> URL: https://issues.apache.org/jira/browse/HBASE-27159
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache, metrics
>Affects Versions: 2.0.0, 3.0.0-alpha-1
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> [https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400]
> {code:java}
> public double getHitCachingRatio() {
>   double requestCachingCount = getRequestCachingCount();
>   if (requestCachingCount == 0) {
> return 0;
>   }
>   return getHitCachingCount() / requestCachingCount;
> } {code}
> This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. 
> The metric represents the percentage of cacheable requests which were found 
> in the cache. Unfortunately, since the counters are process-level 
> counters, the ratio is for the lifetime of the process. This makes it less 
> useful for looking at cache behavior during a smaller time period.
> The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. 
> Having access to the underlying counters allows for offline computation of 
> the same metric for any given time period. But these counters are not emitted 
> today from {{{}MetricsRegionServerWrapperImpl.java{}}}.
> Compare this to {{hitCount}} and {{missCount}} which are emitted as metrics 
> {{blockCacheHitCount}} and {{{}blockCacheMissCount{}}}. But these are raw 
> counts for the cache, which include requests that are not cacheable. The 
> cacheable metrics are more interesting, since it can be common to miss on a 
> request which is not cacheable.
> Interestingly, these metrics are emitted regularly as part of a log line in 
> {{{}StatisticsThread.logStats{}}}.
> We should emit blockCache{{{}HitCachingCount{}}} and 
> {{blockCacheMissCachingCount}} along with the current metrics.
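
Once the raw counters are emitted, the windowed equivalent of 
BlockCacheExpressHitPercent can be computed offline from any two samples. A 
small sketch of that delta computation (the sampling mechanism and metric names 
are assumed to follow the proposal above):

{code:java}
public class CachingHitRatioWindow {
  /**
   * Caching hit ratio for the window between two samples of the monotonically
   * increasing hitCachingCount / missCachingCount counters.
   */
  public static double windowedHitCachingRatio(long hitStart, long missStart,
      long hitEnd, long missEnd) {
    long hits = hitEnd - hitStart;
    long requests = hits + (missEnd - missStart);
    return requests == 0 ? 0.0 : (double) hits / requests;
  }

  public static void main(String[] args) {
    // Two scrapes taken some minutes apart: 90,000 caching hits and 10,000 caching
    // misses in the window gives a 0.9 ratio for that window only.
    System.out.println(windowedHitCachingRatio(1_000_000L, 50_000L, 1_090_000L, 60_000L));
  }
}
{code}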



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HBASE-27159) Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of hits and misses for cacheable requests

2022-10-15 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning reassigned HBASE-27159:
-

Assignee: David Manning

> Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of 
> hits and misses for cacheable requests
> 
>
> Key: HBASE-27159
> URL: https://issues.apache.org/jira/browse/HBASE-27159
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache, metrics
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> [https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400]
> {code:java}
> public double getHitCachingRatio() {
>   double requestCachingCount = getRequestCachingCount();
>   if (requestCachingCount == 0) {
> return 0;
>   }
>   return getHitCachingCount() / requestCachingCount;
> } {code}
> This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. 
> The metric represents the percentage of cacheable requests which were found 
> in the cache. Unfortunately, since the counters are process-level 
> counters, the ratio is for the lifetime of the process. This makes it less 
> useful for looking at cache behavior during a smaller time period.
> The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. 
> Having access to the underlying counters allows for offline computation of 
> the same metric for any given time period. But these counters are not emitted 
> today from {{{}MetricsRegionServerWrapperImpl.java{}}}.
> Compare this to {{hitCount}} and {{missCount}} which are emitted as metrics 
> {{blockCacheHitCount}} and {{{}blockCacheMissCount{}}}. But these are raw 
> counts for the cache, which include requests that are not cacheable. The 
> cacheable metrics are more interesting, since it can be common to miss on a 
> request which is not cacheable.
> Interestingly, these metrics are emitted regularly as part of a log line in 
> {{{}StatisticsThread.logStats{}}}.
> We should emit blockCache{{{}HitCachingCount{}}} and 
> {{blockCacheMissCachingCount}} along with the current metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27302) Adding a trigger for Stochastica Balancer to safeguard for upper bound outliers.

2022-08-13 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579310#comment-17579310
 ] 

David Manning commented on HBASE-27302:
---

[~claraxiong] you may find https://issues.apache.org/jira/browse/HBASE-22349 
useful to you. I used it for exactly this reason - triggering a balancer run in 
"sloppy" cases where a regionserver has more than 1+X% or less than 1-X% 
regions, compared to the average (mean) of regions per regionserver in the 
cluster.
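
For reference, the "sloppy" trigger from HBASE-22349 boils down to comparing 
every server's region count against mean*(1 +/- slop) and forcing a balancer run 
when any server falls outside that band. A hedged sketch of the check (method 
and parameter names here are illustrative, not the exact balancer code):

{code:java}
public class SloppyClusterCheck {
  /**
   * True if any server carries more than (1 + slop) * mean regions or fewer than
   * (1 - slop) * mean regions, i.e. the cluster is skewed enough to force a run
   * regardless of the overall cost value.
   */
  public static boolean isSloppy(int[] regionsPerServer, double slop) {
    double mean = 0;
    for (int count : regionsPerServer) {
      mean += count;
    }
    mean /= regionsPerServer.length;
    double floor = Math.floor(mean * (1 - slop));
    double ceiling = Math.ceil(mean * (1 + slop));
    for (int count : regionsPerServer) {
      if (count < floor || count > ceiling) {
        return true;
      }
    }
    return false;
  }
}
{code}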

> Adding a trigger for Stochastica Balancer to safeguard for upper bound 
> outliers.
> 
>
> Key: HBASE-27302
> URL: https://issues.apache.org/jira/browse/HBASE-27302
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Reporter: Clara Xiong
>Priority: Major
>
> In large clusters, if one outlier has a lot of regions, the calculated 
> imbalance for RegionCountSkewCostFunction is quite low and often fails to 
> trigger the balancer.
> For example, a node with twice the average count on a 400-node cluster only 
> produces an imbalance of 0.004 < 0.02 (the current default threshold to 
> trigger the balancer). An empty node also has a similar effect, but we have a 
> safeguard in place for that: https://issues.apache.org/jira/browse/HBASE-24139
> We can add a safeguard for this so we don't have to lower the threshold on 
> larger clusters, which would make the balancer more sensitive to other minor 
> imbalances.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate region count distribution

2022-08-13 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579308#comment-17579308
 ] 

David Manning commented on HBASE-25625:
---

[~bbeaudreault] [~claraxiong] you may find 
https://issues.apache.org/jira/browse/HBASE-22349 useful to you. I used it for 
exactly this reason - triggering a balancer run in "sloppy" cases where a 
regionserver has more than 1+X% or less than 1-X% regions.

> StochasticBalancer CostFunctions needs a better way to evaluate region count 
> distribution
> -
>
> Key: HBASE-25625
> URL: https://issues.apache.org/jira/browse/HBASE-25625
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer, master
>Reporter: Clara Xiong
>Assignee: Clara Xiong
>Priority: Major
> Attachments: image-2021-10-05-17-17-50-944.png
>
>
> Currently CostFunctions including RegionCountSkewCostFunctions, 
> PrimaryRegionCountSkewCostFunctions and all load cost functions calculate the 
> unevenness of the distribution by getting the sum of deviation per region 
> server. This simple implementation works when the cluster is small. But when 
> the cluster gets larger with more region servers and regions, it doesn't work 
> well with hot spots or a small number of unbalanced servers. The proposal is 
> to use the standard deviation of the count per region server to capture the 
> existence of a small portion of region servers with overwhelming 
> load/allocation.
> TableSkewCostFunction uses the sum of the max deviation region per server for 
> all tables as the measure of unevenness. It doesn't work in a very common 
> scenario in operations. Say we have 100 regions on 50 nodes, two on each. We 
> add 50 new nodes and they have 0 each. The max deviation from the mean is 1, 
> compared to 99 in the worst case scenario of 100 regions on a single server. 
> The normalized cost is 1/99 = 0.011 < default threshold of 0.05. Balancer 
> wouldn't move.  The proposal is to use the standard deviation of the count 
> per region server to detect this scenario, generating a cost of 3.1/31 = 0.1 
> in this case.
> Patch is in test and will follow shortly.
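
The 100-regions-on-100-servers example can be sanity-checked in a few lines: 
the max-deviation view normalizes to about 0.01 (below the 0.05 threshold), 
while a standard-deviation-based cost lands around 0.1. A rough sketch of that 
arithmetic (the normalization here is simplified and not the exact 
StochasticLoadBalancer code):

{code:java}
public class SkewCostSketch {
  static double stdDev(int[] counts) {
    double mean = 0;
    for (int c : counts) {
      mean += c;
    }
    mean /= counts.length;
    double variance = 0;
    for (int c : counts) {
      variance += (c - mean) * (c - mean);
    }
    return Math.sqrt(variance / counts.length);
  }

  public static void main(String[] args) {
    // 100 regions on 100 servers: 50 old servers hold 2 regions each, 50 new servers hold 0.
    int[] observed = new int[100];
    for (int i = 0; i < 50; i++) {
      observed[i] = 2;
    }
    // Worst case for the same cluster: all 100 regions piled onto a single server.
    int[] worst = new int[100];
    worst[0] = 100;

    // Max-deviation view: 1 / 99 ~= 0.01, which stays under the 0.05 threshold.
    System.out.println(1.0 / 99);
    // Std-dev view: ~1 / ~9.95 ~= 0.1, which would trigger balancing.
    System.out.println(stdDev(observed) / stdDev(worst));
  }
}
{code}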



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27159) Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of hits and misses for cacheable requests

2022-06-24 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-27159:
--
Description: 
[https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400]
{code:java}
public double getHitCachingRatio() {
  double requestCachingCount = getRequestCachingCount();
  if (requestCachingCount == 0) {
return 0;
  }
  return getHitCachingCount() / requestCachingCount;
} {code}
This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. 
The metric represents the percentage of cacheable requests which were found in 
the cache. Unfortunately, since the counters are process-level 
counters, the ratio is for the lifetime of the process. This makes it less 
useful for looking at cache behavior during a smaller time period.

The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. 
Having access to the underlying counters allows for offline computation of the 
same metric for any given time period. But these counters are not emitted today 
from {{{}MetricsRegionServerWrapperImpl.java{}}}.

Compare this to {{hitCount}} and {{missCount}} which are emitted as metrics 
{{blockCacheHitCount}} and {{{}blockCacheMissCount{}}}. But these are raw 
counts for the cache, which include requests that are not cacheable. The 
cacheable metrics are more interesting, since it can be common to miss on a 
request which is not cacheable.

Interestingly, these metrics are emitted regularly as part of a log line in 
{{{}StatisticsThread.logStats{}}}.

We should emit blockCache{{{}HitCachingCount{}}} and 
{{blockCacheMissCachingCount}} along with the current metrics.

  was:
[https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400]
{code:java}
public double getHitCachingRatio() {
  double requestCachingCount = getRequestCachingCount();
  if (requestCachingCount == 0) {
return 0;
  }
  return getHitCachingCount() / requestCachingCount;
} {code}
This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. 
The metric represents the percentage of cacheable requests which were found in 
the cache. Unfortunately, since the counters are process-level 
counters, the ratio is for the lifetime of the process. This makes it less 
useful for looking at cache behavior during a smaller time period.

The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. 
Having access to the underlying counters allows for offline computation of the 
same metric for any given time period. But these counters are not emitted today 
from {{{}MetricsRegionServerWrapperImpl.java{}}}.

Compare this to {{hitCount}} and {{missCount}} which are emitted as metrics 
{{blockCacheHitCount}} and {{{}blockCacheMissCount{}}}. But these are raw 
counts for the cache, which include requests that are not cacheable. The 
cacheable metrics are more interesting, since it can be common to miss on a 
request which is not cacheable.

We should emit blockCache{{{}HitCachingCount{}}} and 
{{blockCacheMissCachingCount}} along with the current metrics.


> Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of 
> hits and misses for cacheable requests
> 
>
> Key: HBASE-27159
> URL: https://issues.apache.org/jira/browse/HBASE-27159
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache, metrics
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Priority: Minor
>
> [https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400]
> {code:java}
> public double getHitCachingRatio() {
>   double requestCachingCount = getRequestCachingCount();
>   if (requestCachingCount == 0) {
> return 0;
>   }
>   return getHitCachingCount() / requestCachingCount;
> } {code}
> This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. 
> The metric represents the percentage of cacheable requests which were found 
> in the cache. Unfortunately, since the counters are process-level 
> counters, the ratio is for the lifetime of the process. This makes it less 
> useful for looking at cache behavior during a smaller time period.
> The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. 
> Having access to the underlying counters allows for offline computation of 
> the same metric for any given time period. But these counters are not emitted 
> today from {{{}MetricsRegionServerWrapperImpl.java{}}}.

[jira] [Updated] (HBASE-27159) Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of hits and misses for cacheable requests

2022-06-24 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-27159:
--
Summary: Emit source metrics for BlockCacheExpressHitPercent, blockCache 
counts of hits and misses for cacheable requests  (was: Emit source metrics for 
BlockCacheExpressHitPercent, getHitCachingRatio, getHitCachingCount, 
getMissCachingCount)

> Emit source metrics for BlockCacheExpressHitPercent, blockCache counts of 
> hits and misses for cacheable requests
> 
>
> Key: HBASE-27159
> URL: https://issues.apache.org/jira/browse/HBASE-27159
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache, metrics
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Priority: Minor
>
> [https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400]
> {code:java}
> public double getHitCachingRatio() {
>   double requestCachingCount = getRequestCachingCount();
>   if (requestCachingCount == 0) {
> return 0;
>   }
>   return getHitCachingCount() / requestCachingCount;
> } {code}
> This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. 
> The metric represents the percentage of cacheable requests which were found 
> in the cache. Unfortunately, since the counters are process-level 
> counters, the ratio is for the lifetime of the process. This makes it less 
> useful for looking at cache behavior during a smaller time period.
> The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. 
> Having access to the underlying counters allows for offline computation of 
> the same metric for any given time period. But these counters are not emitted 
> today from {{{}MetricsRegionServerWrapperImpl.java{}}}.
> Compare this to {{hitCount}} and {{missCount}} which are emitted as metrics 
> {{blockCacheHitCount}} and {{{}blockCacheMissCount{}}}. But these are raw 
> counts for the cache, which include requests that are not cacheable. The 
> cacheable metrics are more interesting, since it can be common to miss on a 
> request which is not cacheable.
> We should emit blockCache{{{}HitCachingCount{}}} and 
> {{blockCacheMissCachingCount}} along with the current metrics.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-27159) Emit source metrics for BlockCacheExpressHitPercent, getHitCachingRatio, getHitCachingCount, getMissCachingCount

2022-06-24 Thread David Manning (Jira)
David Manning created HBASE-27159:
-

 Summary: Emit source metrics for BlockCacheExpressHitPercent, 
getHitCachingRatio, getHitCachingCount, getMissCachingCount
 Key: HBASE-27159
 URL: https://issues.apache.org/jira/browse/HBASE-27159
 Project: HBase
  Issue Type: Improvement
  Components: BlockCache, metrics
Affects Versions: 2.0.0, 3.0.0-alpha-1
Reporter: David Manning


[https://github.com/apache/hbase/blob/d447fa01ba36a11d57927b78cce1bbca361b1d52/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java#L346-L400]
{code:java}
public double getHitCachingRatio() {
  double requestCachingCount = getRequestCachingCount();
  if (requestCachingCount == 0) {
return 0;
  }
  return getHitCachingCount() / requestCachingCount;
} {code}
This code is responsible for the metric {{{}BlockCacheExpressHitPercent{}}}. 
The metric represents the percentage of cacheable requests which were found in 
the cache. Unfortunately, since the counters are process-level 
counters, the ratio is for the lifetime of the process. This makes it less 
useful for looking at cache behavior during a smaller time period.

The underlying counters are {{hitCachingCount}} and {{{}missCachingCount{}}}. 
Having access to the underlying counters allows for offline computation of the 
same metric for any given time period. But these counters are not emitted today 
from {{{}MetricsRegionServerWrapperImpl.java{}}}.

Compare this to {{hitCount}} and {{missCount}} which are emitted as metrics 
{{blockCacheHitCount}} and {{{}blockCacheMissCount{}}}. But these are raw 
counts for the cache, which include requests that are not cacheable. The 
cacheable metrics are more interesting, since it can be common to miss on a 
request which is not cacheable.

We should emit blockCache{{{}HitCachingCount{}}} and 
{{blockCacheMissCachingCount}} along with the current metrics.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky

2022-05-21 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540525#comment-17540525
 ] 

David Manning commented on HBASE-27054:
---

Thanks for validating and committing!

> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>  is flaky  
> ---
>
> Key: HBASE-27054
> URL: https://issues.apache.org/jira/browse/HBASE-27054
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.5.0
>Reporter: Andrew Kyle Purtell
>Assignee: David Manning
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3, 2.4.13
>
>
> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   . Looks like we can be off by one on either side of an expected value.
> Any idea what is going on here [~dmanning]? 
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.779 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no less than 60.
> server=srv1351292323,46522,-3543799643652531264 , load=59
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.781 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no more than 60. 
> server=srv1402325691,7995,26308078476749652 , load=61
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky

2022-05-19 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-27054:
--
Status: Patch Available  (was: Open)

> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>  is flaky  
> ---
>
> Key: HBASE-27054
> URL: https://issues.apache.org/jira/browse/HBASE-27054
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.5.0
>Reporter: Andrew Kyle Purtell
>Assignee: David Manning
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
>
> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   . Looks like we can be off by one on either side of an expected value.
> Any idea what is going on here [~dmanning]? 
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.779 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no less than 60.
> server=srv1351292323,46522,-3543799643652531264 , load=59
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.781 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no more than 60. 
> server=srv1402325691,7995,26308078476749652 , load=61
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky

2022-05-19 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539878#comment-17539878
 ] 

David Manning commented on HBASE-27054:
---

I see some good results by changing the cost function weights. I will propose a 
PR with those changes.
{code:java}
conf.setFloat("hbase.master.balancer.stochastic.moveCost", 0f);
conf.setFloat("hbase.master.balancer.stochastic.tableSkewCost", 0f); {code}
If I make one change, with {{maxRunningTime}} from 180s to 30s, I see 100% 
failure rate. If I make the above cost function weight updates, I see 100% pass 
rate, even with a {{maxRunningTime}} of 15s.

> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>  is flaky  
> ---
>
> Key: HBASE-27054
> URL: https://issues.apache.org/jira/browse/HBASE-27054
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.5.0
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
>
> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   . Looks like we can be off by one on either side of an expected value.
> Any idea what is going on here [~dmanning]? 
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.779 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no less than 60.
> server=srv1351292323,46522,-3543799643652531264 , load=59
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.781 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no more than 60. 
> server=srv1402325691,7995,26308078476749652 , load=61
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky

2022-05-19 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning reassigned HBASE-27054:
-

Assignee: David Manning

> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>  is flaky  
> ---
>
> Key: HBASE-27054
> URL: https://issues.apache.org/jira/browse/HBASE-27054
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.5.0
>Reporter: Andrew Kyle Purtell
>Assignee: David Manning
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
>
> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   . Looks like we can be off by one on either side of an expected value.
> Any idea what is going on here [~dmanning]? 
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.779 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no less than 60.
> server=srv1351292323,46522,-3543799643652531264 , load=59
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.781 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no more than 60. 
> server=srv1402325691,7995,26308078476749652 , load=61
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky

2022-05-19 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539839#comment-17539839
 ] 

David Manning edited comment on HBASE-27054 at 5/19/22 11:06 PM:
-

I ran it 50 times locally using latest {{{}master{}}}, it failed twice, even 
with 3-minute timeout, and ~3.9 million stochastic steps. So the 77s appears 
irrelevant.


was (Author: dmanning):
I ran it 50 times locally using latest {{master}}, it failed twice, even with 
3-minute timeout, and ~3.9 million stochastic steps.

> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>  is flaky  
> ---
>
> Key: HBASE-27054
> URL: https://issues.apache.org/jira/browse/HBASE-27054
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.5.0
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
>
> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   . Looks like we can be off by one on either side of an expected value.
> Any idea what is going on here [~dmanning]? 
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.779 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no less than 60.
> server=srv1351292323,46522,-3543799643652531264 , load=59
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.781 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no more than 60. 
> server=srv1402325691,7995,26308078476749652 , load=61
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky

2022-05-19 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539839#comment-17539839
 ] 

David Manning commented on HBASE-27054:
---

I ran it 50 times locally using latest {{master}}, it failed twice, even with 
3-minute timeout, and ~3.9 million stochastic steps.

> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>  is flaky  
> ---
>
> Key: HBASE-27054
> URL: https://issues.apache.org/jira/browse/HBASE-27054
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.5.0
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
>
> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   . Looks like we can be off by one on either side of an expected value.
> Any idea what is going on here [~dmanning]? 
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.779 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no less than 60.
> server=srv1351292323,46522,-3543799643652531264 , load=59
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.781 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no more than 60. 
> server=srv1402325691,7995,26308078476749652 , load=61
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky

2022-05-19 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539754#comment-17539754
 ] 

David Manning commented on HBASE-27054:
---

With a lower timeout, like 60 seconds, or on slower hardware, we could get 
fewer iterations. In that sense we may just get unlucky and not reach a fully 
balanced state given the current configuration.

50,000 regions have to move, and the {{RegionReplicaCandidateGenerator}} does 
most of that work, but it is only chosen roughly 25% of the time. There are 
likely some missteps, so conservatively we may need 200,000 calls to that 
generator to guarantee the work gets done, which means roughly 800,000 
iterations overall. Running locally with a 60-second timeout I would see about 
1.3 million iterations, which is close enough that we may see the occasional 
problem. The tests should ensure that even on slow hardware, with unlucky 
random choices, we are still virtually guaranteed success; we may not be doing 
that here. A 3-minute timeout should make success much more likely, so I'm 
interested in the test message that says it ran 77 seconds, even though I'm 
sure the test could be improved to be more deterministic.
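
Spelled out, that back-of-envelope estimate is (the 4x allowance for missteps 
and the ~25% selection rate are rough assumptions, not measured values):
{noformat}
  50,000  colocated replicas that must move
x 4       allowance for missteps               = ~200,000 generator calls needed
x 4       generator only picked ~25% of steps  = ~800,000 total balancer steps
{noformat}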

> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>  is flaky  
> ---
>
> Key: HBASE-27054
> URL: https://issues.apache.org/jira/browse/HBASE-27054
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.5.0
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
>
> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   . Looks like we can be off by one on either side of an expected value.
> Any idea what is going on here [~dmanning]? 
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.779 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no less than 60.
> server=srv1351292323,46522,-3543799643652531264 , load=59
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:200)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}
> {noformat}
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>   Time elapsed: 77.781 s  <<< FAILURE!
> java.lang.AssertionError: All servers should have load no more than 60. 
> server=srv1402325691,7995,26308078476749652 , load=61
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:198)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:577)
>   at 
> org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:544)
>   at 
> org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster(TestStochasticLoadBalancerRegionReplicaLargeCluster.java:41)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-27054) TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster is flaky

2022-05-19 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539749#comment-17539749
 ] 

David Manning commented on HBASE-27054:
---

[~apurtell] Do we know if this is a recent regression, or has it always been 
flaky? My initial thought is that there may be some randomness (it is a 
stochastic balancer after all) which leads to this end result. I don't believe 
any recent changes would have caused this to become more flaky, but I suppose 
it's possible. HBASE-26311 is interesting, since it changes calculations to use 
standard deviation. [~claraxiong]

Why does the error message say it failed after 77 seconds? The test takes 3 
minutes to run for me locally, which is the configured timeout for the balancer 
in {{StochasticBalancerTestBase2}}. Is there a link to a test failure with full 
logs that I can inspect? (Note: the timeout was updated to 3 minutes in 
HBASE-25873; the previous value was 90 seconds.)

With region replicas involved, the {{RegionReplicaCandidateGenerator}} will 
just move a colocated replica to a random server, without considering how many 
regions that target server is hosting. The cost functions will allow it in 
basically every case, since the cost model heavily prioritizes resolving 
colocated replicas. So by the time all the colocated replicas have been 
resolved, the number of moves may already be pushing the limits of one 
balancer run, having randomly overloaded one regionserver.

A situation the balancer will have a difficult time getting out of is when one 
regionserver hosts 61 replicas of 61 regions, and another regionserver hosts 
59 regions which are replicas of those same 61 regions. The 
{{LoadCandidateGenerator}} will keep trying to take a region from the server 
with 61 and give it to the server with 59, but because a matching replica is 
already there, the move will be too expensive. As long as we can process 
enough iterations, though, we should probabilistically land on one of the 2 
regions that are safe to move... when I run this test locally I see nearly 4 
million iterations, and with 1/4 of those using the 
{{LoadCandidateGenerator}} it seems like we should generally find a solution 
that moves them all.

{code}
Finished computing new moving plan. Computation took 180001 ms to try 3975554 
different iterations.  Found a solution that moves 50006 regions; Going from a 
computed imbalance of 0.9026309610781538 to a new imbalance of 
5.252006025578701E-5. funtionCost=RegionCountSkewCostFunction : 
(multiplier=500.0, imbalance=0.0); PrimaryRegionCountSkewCostFunction : 
(multiplier=500.0, imbalance=0.0); MoveCostFunction : (multiplier=7.0, 
imbalance=0.83343334, need balance); RackLocalityCostFunction : 
(multiplier=15.0, imbalance=0.0); TableSkewCostFunction : (multiplier=35.0, 
imbalance=0.0); RegionReplicaHostCostFunction : (multiplier=10.0, 
imbalance=0.0); RegionReplicaRackCostFunction : (multiplier=1.0, 
imbalance=0.0); ReadRequestCostFunction : (multiplier=5.0, imbalance=0.0); 
CPRequestCostFunction : (multiplier=5.0, imbalance=0.0); 
WriteRequestCostFunction : (multiplier=5.0, imbalance=0.0); 
MemStoreSizeCostFunction : (multiplier=5.0, imbalance=0.0); 
StoreFileCostFunction : (multiplier=5.0, imbalance=0.0);
{code}

Since the test case also uses 100 tables, and there is a 
{{TableSkewCostFunction}} involved, it's also possible that the balancer is 
happy with a slightly uneven region count, because moving the last region 
would push toward table imbalance if, for every candidate region, the target 
regionserver already has too many regions of that table. I don't know if the 
math would support this, though. If it does, it's possible that out of the 
last 61 regions, moving any of them to the server with 59 would cause either 
table skew or colocated replicas, and so the balancer cannot fully balance 
using the simple {{LoadCandidateGenerator}} alone.

This is all hypothetical; I have not tried to debug it yet. Given the large 
size of the test, the number of balancer iterations, and the flakiness, it may 
be difficult to debug. I have run it 10+ times locally so far, and it passes 
each time. So, some ideas to explore:
# Don't assert that the cluster is fully balanced in this test case, just 
assert that there are no colocated replicas. Arguably this is the purpose of 
the test, and the test framework already appears to allow for this.
# Change cost function weights for everything other than region counts and 
replica counts to 0. That way, nothing prevents the balancer from optimizing 
for the variables the test is expected to validate. Specifically, set the 
TableSkew and MoveCost functions to 0 (see the sketch after this list).
# Use fewer than 100 tables, if table skew is a contributing factor.
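
For idea #2, a hypothetical sketch of the test-side change (the multiplier key 
names are assumed from the cost function class names and should be verified 
against the branch under test; the balancer instance is the one the test base 
already holds):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer;

// Zero out the cost functions that can work against a pure
// region-count / replica-count balance (idea #2 above).
static void zeroCompetingCostFunctions(StochasticLoadBalancer balancer) {
  Configuration conf = HBaseConfiguration.create();
  conf.setFloat("hbase.master.balancer.stochastic.tableSkewCost", 0f);
  conf.setFloat("hbase.master.balancer.stochastic.moveCost", 0f);
  balancer.onConfigurationChange(conf);
}
{code}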

> TestStochasticLoadBalancerRegionReplicaLargeCluster.testRegionReplicasOnLargeCluster
>  is flaky  
> 

[jira] [Commented] (HBASE-26989) TestStochasticLoadBalancer has some slow methods, and inconsistent set, reset, unset of configuration

2022-04-29 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529794#comment-17529794
 ] 

David Manning commented on HBASE-26989:
---

When running the tests locally, I see these runtime improvements:
{{testNeedBalance}}: from 120 seconds to 11 seconds
{{testSloppyTablesLoadBalanceByTable}}: from 27 seconds to <1 second
{{testBalanceOfSloppyServers}}: from 67 seconds to <1 second

So the total runtime of the {{TestStochasticLoadBalancer}} class drops from 
230 seconds to 31 seconds.

Additionally, we get more deterministic behavior, since tests are more likely 
to have consistent results when bounded by a maximum number of steps rather 
than by a maximum running time.

> TestStochasticLoadBalancer has some slow methods, and inconsistent set, 
> reset, unset of configuration
> -
>
> Key: HBASE-26989
> URL: https://issues.apache.org/jira/browse/HBASE-26989
> Project: HBase
>  Issue Type: Test
>  Components: Balancer, test
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> Some test ordering issues were exposed by adding new tests in HBASE-22349. I 
> think this is a legitimate issue which is tracked in HBASE-26988.
> But we can update the tests to be consistent in how they update configuration 
> to reduce confusion, removing the {{unset}} calls.
> We can also update other configuration values to significantly speed up the 
> long-running methods. Methods that are simply checking for balancer plans do 
> not need to {{runMaxSteps}}. All we need to do is run enough steps to 
> guarantee we will plan to move one region. That can be far fewer than the 
> tens of millions of steps we may be running given {{runMaxSteps}}.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration

2022-04-29 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-26988:
--
Description: 
# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Dynamically reload configuration for hmaster 
(https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table
*Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
{{false}}

*Note 1*: The steps may only work if the config value is not in 
{{hbase-default.xml}} so it may be an unlikely scenario.

*Note 2*: I see this when running tests added in HBASE-22349, depending on the 
order of execution of test methods. If {{testSloppyTablesLoadBalanceByTable}} 
executes before {{testBalanceOfSloppyServers}} there will be a failure. We 
could apply the workaround to the tests (explicitly set to {{false}}), but it 
seems better to fix the dynamic reconfiguration behavior. Regardless, I will 
propose test fixes in HBASE-26989.

  was:
# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Dynamically reload configuration for hmaster 
(https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table
*Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
{{false}}

Note: I see this when running tests added in HBASE-22349, depending on the 
order of execution of test methods. If {{testSloppyTablesLoadBalanceByTable}} 
executes before {{testBalanceOfSloppyServers}} there will be a failure. We 
could apply the workaround to the tests (explicitly set to {{false}}), but it 
seems better to fix the dynamic reconfiguration behavior.


> Balancer should reset to default setting for hbase.master.loadbalance.bytable 
> if dynamically reloading configuration
> 
>
> Key: HBASE-26988
> URL: https://issues.apache.org/jira/browse/HBASE-26988
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
> # Start HMaster
> # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
> # Dynamically reload configuration for hmaster 
> (https://hbase.apache.org/book.html#dyn_config)
> *Expected:* load balancing would no longer happen by table
> *Actual:* load balancing still happens by table
> *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
> {{false}}
> *Note 1*: The steps may only work if the config value is not in 
> {{hbase-default.xml}} so it may be an unlikely scenario.
> *Note 2*: I see this when running tests added in HBASE-22349, depending on 
> the order of execution of test methods. If 
> {{testSloppyTablesLoadBalanceByTable}} executes before 
> {{testBalanceOfSloppyServers}} there will be a failure. We could apply the 
> workaround to the tests (explicitly set to {{false}}), but it seems better to 
> fix the dynamic reconfiguration behavior. Regardless, I will propose test 
> fixes in HBASE-26989.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-26989) TestStochasticLoadBalancer has some slow methods, and inconsistent set, reset, unset of configuration

2022-04-28 Thread David Manning (Jira)
David Manning created HBASE-26989:
-

 Summary: TestStochasticLoadBalancer has some slow methods, and 
inconsistent set, reset, unset of configuration
 Key: HBASE-26989
 URL: https://issues.apache.org/jira/browse/HBASE-26989
 Project: HBase
  Issue Type: Test
  Components: Balancer, test
Affects Versions: 2.0.0, 3.0.0-alpha-1
Reporter: David Manning
Assignee: David Manning


Some test ordering issues were exposed by adding new tests in HBASE-22349. I 
think this is a legitimate issue which is tracked in HBASE-26988.

But we can update the tests to be consistent in how they update configuration 
to reduce confusion, removing the {{unset}} calls.

We can also update other configuration values to significantly speed up the 
long-running methods. Methods that are simply checking for balancer plans do 
not need to {{runMaxSteps}}. All we need to do is run enough steps to guarantee 
we will plan to move one region. That can be far fewer than the tens of 
millions of steps we may be running given {{runMaxSteps}}.
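
For illustration, the direction proposed for the slow methods (a sketch; the 
key names should match what {{StochasticLoadBalancer}} reads, and the exact 
step count would be tuned per test):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer;

// Instead of runMaxSteps (which can mean tens of millions of steps), cap the
// step count at a value that still guarantees at least one region move is
// planned.
static void capBalancerSteps(StochasticLoadBalancer balancer, long maxSteps) {
  Configuration conf = HBaseConfiguration.create();
  conf.setBoolean("hbase.master.balancer.stochastic.runMaxSteps", false);
  conf.setLong("hbase.master.balancer.stochastic.maxSteps", maxSteps);
  balancer.onConfigurationChange(conf);
}
{code}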



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration

2022-04-28 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529774#comment-17529774
 ] 

David Manning commented on HBASE-26988:
---

I guess this behavior would apply to a lot of {{StochasticLoadBalancer}} 
settings as well...

> Balancer should reset to default setting for hbase.master.loadbalance.bytable 
> if dynamically reloading configuration
> 
>
> Key: HBASE-26988
> URL: https://issues.apache.org/jira/browse/HBASE-26988
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
> # Start HMaster
> # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
> # Dynamically reload configuration for hmaster 
> (https://hbase.apache.org/book.html#dyn_config)
> *Expected:* load balancing would no longer happen by table
> *Actual:* load balancing still happens by table
> *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
> {{false}}
> Note: I see this when running tests added in HBASE-22349, depending on the 
> order of execution of test methods. If {{testSloppyTablesLoadBalanceByTable}} 
> executes before {{testBalanceOfSloppyServers}} there will be a failure. We 
> could apply the workaround to the tests (explicitly set to {{false}}), but it 
> seems better to fix the dynamic reconfiguration behavior.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration

2022-04-28 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-26988:
--
Status: Patch Available  (was: Open)

> Balancer should reset to default setting for hbase.master.loadbalance.bytable 
> if dynamically reloading configuration
> 
>
> Key: HBASE-26988
> URL: https://issues.apache.org/jira/browse/HBASE-26988
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 3.0.0-alpha-1
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
> # Start HMaster
> # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
> # Dynamically reload configuration for hmaster 
> (https://hbase.apache.org/book.html#dyn_config)
> *Expected:* load balancing would no longer happen by table
> *Actual:* load balancing still happens by table
> *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
> {{false}}
> Note: I see this when running tests added in HBASE-22349, depending on the 
> order of execution of test methods. If {{testSloppyTablesLoadBalanceByTable}} 
> executes before {{testBalanceOfSloppyServers}} there will be a failure. We 
> could apply the workaround to the tests (explicitly set to {{false}}), but it 
> seems better to fix the dynamic reconfiguration behavior.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration

2022-04-28 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529772#comment-17529772
 ] 

David Manning commented on HBASE-26988:
---

I didn't notice it in some local runs, because {{testUpdateBalancerLoadInfo}} 
also sets it to {{false}} as its last update. So if that test happens to run 
in between {{testSloppyTablesLoadBalanceByTable}} and 
{{testBalanceOfSloppyServers}}, everything passes.

> Balancer should reset to default setting for hbase.master.loadbalance.bytable 
> if dynamically reloading configuration
> 
>
> Key: HBASE-26988
> URL: https://issues.apache.org/jira/browse/HBASE-26988
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
> # Start HMaster
> # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
> # Dynamically reload configuration for hmaster 
> (https://hbase.apache.org/book.html#dyn_config)
> *Expected:* load balancing would no longer happen by table
> *Actual:* load balancing still happens by table
> *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
> {{false}}
> Note: I see this when running tests added in HBASE-22349, depending on the 
> order of execution of test methods. If {{testSloppyTablesLoadBalanceByTable}} 
> executes before {{testBalanceOfSloppyServers}} there will be a failure. We 
> could apply the workaround to the tests (explicitly set to {{false}}), but it 
> seems better to fix the dynamic reconfiguration behavior.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration

2022-04-28 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-26988:
--
Description: 
# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Dynamically reload configuration for hmaster 
(https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table
*Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
{{false}}

Note: I see this when running tests added in HBASE-22349, depending on the 
order of execution of test methods. If {{testSloppyTablesLoadBalanceByTable}} 
executes before {{testBalanceOfSloppyServers}} there will be a failure. We 
could apply the workaround to the tests (explicitly set to {{false}}), but it 
seems better to fix the dynamic reconfiguration behavior.

  was:
# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Dynamically reload configuration for hmaster 
(https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table
*Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
{{false}}

Note: I see this when running tests added in HBASE-22349, depending on the 
order of execution of test methods. We could apply the workaround to the tests 
(explicitly set to {{false}}), but it seems better to fix the dynamic 
reconfiguration behavior.


> Balancer should reset to default setting for hbase.master.loadbalance.bytable 
> if dynamically reloading configuration
> 
>
> Key: HBASE-26988
> URL: https://issues.apache.org/jira/browse/HBASE-26988
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
> # Start HMaster
> # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
> # Dynamically reload configuration for hmaster 
> (https://hbase.apache.org/book.html#dyn_config)
> *Expected:* load balancing would no longer happen by table
> *Actual:* load balancing still happens by table
> *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
> {{false}}
> Note: I see this when running tests added in HBASE-22349, depending on the 
> order of execution of test methods. If {{testSloppyTablesLoadBalanceByTable}} 
> executes before {{testBalanceOfSloppyServers}} there will be a failure. We 
> could apply the workaround to the tests (explicitly set to {{false}}), but it 
> seems better to fix the dynamic reconfiguration behavior.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration

2022-04-28 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-26988:
--
Description: 
# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Dynamically reload configuration for hmaster 
(https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table
*Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
{{false}}

Note: I see this when running tests added in HBASE-22349, depending on the 
order of execution of test methods. We could apply the workaround to the tests 
(explicitly set to {{false}}), but it seems better to fix the dynamic 
reconfiguration behavior.

  was:
# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Dynamically reload configuration for hmaster 
(https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table
*Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
{{false}}


> Balancer should reset to default setting for hbase.master.loadbalance.bytable 
> if dynamically reloading configuration
> 
>
> Key: HBASE-26988
> URL: https://issues.apache.org/jira/browse/HBASE-26988
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
> # Start HMaster
> # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
> # Dynamically reload configuration for hmaster 
> (https://hbase.apache.org/book.html#dyn_config)
> *Expected:* load balancing would no longer happen by table
> *Actual:* load balancing still happens by table
> *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
> {{false}}
> Note: I see this when running tests added in HBASE-22349, depending on the 
> order of execution of test methods. We could apply the workaround to the 
> tests (explicitly set to {{false}}), but it seems better to fix the dynamic 
> reconfiguration behavior.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration

2022-04-28 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-26988:
--
Description: 
# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Dynamically reload configuration for hmaster 
(https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table

  was:
# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Reload configuration (https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table


> Balancer should reset to default setting for hbase.master.loadbalance.bytable 
> if dynamically reloading configuration
> 
>
> Key: HBASE-26988
> URL: https://issues.apache.org/jira/browse/HBASE-26988
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
> # Start HMaster
> # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
> # Dynamically reload configuration for hmaster 
> (https://hbase.apache.org/book.html#dyn_config)
> *Expected:* load balancing would no longer happen by table
> *Actual:* load balancing still happens by table



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration

2022-04-28 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-26988:
--
Description: 
# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Dynamically reload configuration for hmaster 
(https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table
*Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
{{false}}.

  was:
# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Dynamically reload configuration for hmaster 
(https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table


> Balancer should reset to default setting for hbase.master.loadbalance.bytable 
> if dynamically reloading configuration
> 
>
> Key: HBASE-26988
> URL: https://issues.apache.org/jira/browse/HBASE-26988
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
> # Start HMaster
> # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
> # Dynamically reload configuration for hmaster 
> (https://hbase.apache.org/book.html#dyn_config)
> *Expected:* load balancing would no longer happen by table
> *Actual:* load balancing still happens by table
> *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
> {{false}}.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration

2022-04-28 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-26988:
--
Description: 
# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Dynamically reload configuration for hmaster 
(https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table
*Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
{{false}}

  was:
# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Dynamically reload configuration for hmaster 
(https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table
*Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
{{false}}.


> Balancer should reset to default setting for hbase.master.loadbalance.bytable 
> if dynamically reloading configuration
> 
>
> Key: HBASE-26988
> URL: https://issues.apache.org/jira/browse/HBASE-26988
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> # Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
> # Start HMaster
> # Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
> # Dynamically reload configuration for hmaster 
> (https://hbase.apache.org/book.html#dyn_config)
> *Expected:* load balancing would no longer happen by table
> *Actual:* load balancing still happens by table
> *Workaround:* leave the entry in {{hbase-site.xml}} but explicitly set to 
> {{false}}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-26988) Balancer should reset to default setting for hbase.master.loadbalance.bytable if dynamically reloading configuration

2022-04-28 Thread David Manning (Jira)
David Manning created HBASE-26988:
-

 Summary: Balancer should reset to default setting for 
hbase.master.loadbalance.bytable if dynamically reloading configuration
 Key: HBASE-26988
 URL: https://issues.apache.org/jira/browse/HBASE-26988
 Project: HBase
  Issue Type: Bug
  Components: Balancer
Affects Versions: 2.0.0, 3.0.0-alpha-1
Reporter: David Manning
Assignee: David Manning


# Set {{hbase.master.loadbalance.bytable}} to {{true}} in {{hbase-site.xml}}
# Start HMaster
# Remove {{hbase.master.loadbalance.bytable}} entry in {{hbase-site.xml}}
# Reload configuration (https://hbase.apache.org/book.html#dyn_config)

*Expected:* load balancing would no longer happen by table
*Actual:* load balancing still happens by table
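
For step 4, the reload can be driven from the HBase shell (update_config / 
update_all_config, per the book link above) or from the Java client; a minimal 
sketch of the latter, assuming {{hbase-site.xml}} on the master host has 
already been edited:
{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Ask every server, including the active master, to reload its configuration.
static void reloadClusterConfig() throws IOException {
  try (Connection conn =
         ConnectionFactory.createConnection(HBaseConfiguration.create());
       Admin admin = conn.getAdmin()) {
    admin.updateConfiguration();
  }
}
{code}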



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster

2022-04-28 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-22349:
--
Release Note: StochasticLoadBalancer now respects the hbase.regions.slop 
configuration value as another factor in determining whether to attempt a 
balancer run. If any regionserver has a region count outside of the target 
range, the balancer will attempt to balance. Using the default 0.2 value, the 
target range is 80%-120% of the average (mean) region count per server. Whether 
the balancer will ultimately move regions will still depend on the weights of 
StochasticLoadBalancer's cost functions.
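
Illustratively (variable and method names here are only for the example, not 
the actual implementation), the check behaves like:
{code:java}
import org.apache.hadoop.conf.Configuration;

// A server is "sloppy" if its region count falls outside mean * (1 +/- slop);
// with the default slop of 0.2 that is 80%-120% of the mean per-server count.
static boolean sloppyServerExists(Configuration conf, int numServers,
    long totalRegionCount, long minRegionsOnAnyServer,
    long maxRegionsOnAnyServer) {
  float slop = conf.getFloat("hbase.regions.slop", 0.2f);
  double mean = (double) totalRegionCount / numServers;
  long floor = (long) Math.floor(mean * (1 - slop));
  long ceiling = (long) Math.ceil(mean * (1 + slop));
  return maxRegionsOnAnyServer > ceiling || minRegionsOnAnyServer < floor;
}
{code}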

> Stochastic Load Balancer skips balancing when node is replaced in cluster
> -
>
> Key: HBASE-22349
> URL: https://issues.apache.org/jira/browse/HBASE-22349
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 1.3.0, 1.4.4, 2.0.0
>Reporter: Suthan Phillips
>Assignee: David Manning
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
> Attachments: Hbase-22349.pdf
>
>
> HBASE-24139 allows the load balancer to run when one server has 0 regions and 
> another server has more than 1 region. This is a special case of a more 
> generic problem, where one server has far too few or far too many regions. 
> The StochasticLoadBalancer defaults may decide the cluster is "balanced 
> enough" according to {{hbase.master.balancer.stochastic.minCostNeedBalance}}, 
> even though one server may have a far higher or lower number of regions 
> compared to the rest of the cluster.
> One specific example of this we have seen is when we use {{RegionMover}} to 
> move regions back to a restarted RegionServer, if the 
> {{StochasticLoadBalancer}} happens to be running. The load balancer sees a 
> newly restarted RegionServer with 0 regions, and after HBASE-24139, it will 
> balance regions to this server. Simultaneously, {{RegionMover}} moves back 
> regions. The end result is that the newly restarted RegionServer has twice 
> the load of any other server in the cluster. Future iterations of the load 
> balancer do nothing, as the cluster cost does not exceed 
> {{minCostNeedBalance}}.
> Another example is if the load balancer makes very slow progress on a 
> cluster, it may not move the average cluster load to a newly restarted 
> regionserver in one iteration. But after the first iteration, the balancer 
> may again not run due to cluster cost not exceeding {{minCostNeedBalance}}.
> We can propose a solution where we reuse the {{slop}} concept in 
> {{SimpleLoadBalancer}} and use this to extend the HBASE-24139 logic for 
> deciding to run the balancer as long as there is a "sloppy" server in the 
> cluster.
> +*Previous Description Notes Below, which are relevant, but as stated, were 
> already fixed by HBASE-24139*+
> In EMR cluster, whenever I replace one of the nodes, the regions never get 
> rebalanced.
> The default minCostNeedBalance set to 0.05 is too high.
> The region count on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 
> = 203
> Once a node(region server) got replaced with a new node (terminated and EMR 
> recreated a node), the region count on the servers became: 23, 0, 23, 22, 22, 
> 22, 22, 23, 23, 23 = 203
> From hbase-master-logs, I can see the below WARN which indicates that the 
> default minCostNeedBalance does not hold good for these scenarios.
> ##
> 2019-04-29 09:31:37,027 WARN  
> [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
> cleaner.CleanerChore: WALs outstanding under 
> hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs2019-04-29 
> 09:31:42,920 INFO  
> [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost 
> which need balance is 0.05
> ##
> To mitigate this, I had to modify the default minCostNeedBalance to lower 
> value like 0.01f and restart Region Servers and Hbase Master. After modifying 
> this value to 0.01f I could see the regions getting re-balanced.
> This has led me to the following questions which I would like to get it 
> answered from the HBase experts.
> 1)What are the factors that affect the value of total cost and sum 
> multiplier? How could we determine the right minCostNeedBalance value for any 
> cluster?
> 2)How did Hbase arrive at setting the default value to 0.05f? Is it optimal 
> value? If yes, then what is the recommended way to mitigate this scenario? 
> Attached: Steps to reproduce
>  
> Note: HBase-17565 patch is already applied.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster

2022-04-26 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-22349:
--
Component/s: Balancer

> Stochastic Load Balancer skips balancing when node is replaced in cluster
> -
>
> Key: HBASE-22349
> URL: https://issues.apache.org/jira/browse/HBASE-22349
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 1.3.0, 1.4.4, 2.0.0
>Reporter: Suthan Phillips
>Assignee: David Manning
>Priority: Major
> Attachments: Hbase-22349.pdf
>
>
> HBASE-24139 allows the load balancer to run when one server has 0 regions and 
> another server has more than 1 region. This is a special case of a more 
> generic problem, where one server has far too few or far too many regions. 
> The StochasticLoadBalancer defaults may decide the cluster is "balanced 
> enough" according to {{hbase.master.balancer.stochastic.minCostNeedBalance}}, 
> even though one server may have a far higher or lower number of regions 
> compared to the rest of the cluster.
> One specific example of this we have seen is when we use {{RegionMover}} to 
> move regions back to a restarted RegionServer, if the 
> {{StochasticLoadBalancer}} happens to be running. The load balancer sees a 
> newly restarted RegionServer with 0 regions, and after HBASE-24139, it will 
> balance regions to this server. Simultaneously, {{RegionMover}} moves back 
> regions. The end result is that the newly restarted RegionServer has twice 
> the load of any other server in the cluster. Future iterations of the load 
> balancer do nothing, as the cluster cost does not exceed 
> {{minCostNeedBalance}}.
> Another example is if the load balancer makes very slow progress on a 
> cluster, it may not move the average cluster load to a newly restarted 
> regionserver in one iteration. But after the first iteration, the balancer 
> may again not run due to cluster cost not exceeding {{minCostNeedBalance}}.
> We can propose a solution where we reuse the {{slop}} concept in 
> {{SimpleLoadBalancer}} and use this to extend the HBASE-24139 logic for 
> deciding to run the balancer as long as there is a "sloppy" server in the 
> cluster.
> +*Previous Description Notes Below, which are relevant, but as stated, were 
> already fixed by HBASE-24139*+
> In EMR cluster, whenever I replace one of the nodes, the regions never get 
> rebalanced.
> The default minCostNeedBalance set to 0.05 is too high.
> The region count on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 
> = 203
> Once a node(region server) got replaced with a new node (terminated and EMR 
> recreated a node), the region count on the servers became: 23, 0, 23, 22, 22, 
> 22, 22, 23, 23, 23 = 203
> From hbase-master-logs, I can see the below WARN which indicates that the 
> default minCostNeedBalance does not hold good for these scenarios.
> ##
> 2019-04-29 09:31:37,027 WARN  
> [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
> cleaner.CleanerChore: WALs outstanding under 
> hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs2019-04-29 
> 09:31:42,920 INFO  
> [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost 
> which need balance is 0.05
> ##
> To mitigate this, I had to modify the default minCostNeedBalance to lower 
> value like 0.01f and restart Region Servers and Hbase Master. After modifying 
> this value to 0.01f I could see the regions getting re-balanced.
> This has led me to the following questions which I would like to get it 
> answered from the HBase experts.
> 1)What are the factors that affect the value of total cost and sum 
> multiplier? How could we determine the right minCostNeedBalance value for any 
> cluster?
> 2)How did Hbase arrive at setting the default value to 0.05f? Is it optimal 
> value? If yes, then what is the recommended way to mitigate this scenario? 
> Attached: Steps to reproduce
>  
> Note: HBase-17565 patch is already applied.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster

2022-04-26 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-22349:
--
Status: Patch Available  (was: Open)

> Stochastic Load Balancer skips balancing when node is replaced in cluster
> -
>
> Key: HBASE-22349
> URL: https://issues.apache.org/jira/browse/HBASE-22349
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.4.4, 1.3.0, 3.0.0-alpha-1
>Reporter: Suthan Phillips
>Assignee: David Manning
>Priority: Major
> Attachments: Hbase-22349.pdf
>
>
> HBASE-24139 allows the load balancer to run when one server has 0 regions and 
> another server has more than 1 region. This is a special case of a more 
> generic problem, where one server has far too few or far too many regions. 
> The StochasticLoadBalancer defaults may decide the cluster is "balanced 
> enough" according to {{hbase.master.balancer.stochastic.minCostNeedBalance}}, 
> even though one server may have a far higher or lower number of regions 
> compared to the rest of the cluster.
> One specific example of this we have seen is when we use {{RegionMover}} to 
> move regions back to a restarted RegionServer, if the 
> {{StochasticLoadBalancer}} happens to be running. The load balancer sees a 
> newly restarted RegionServer with 0 regions, and after HBASE-24139, it will 
> balance regions to this server. Simultaneously, {{RegionMover}} moves back 
> regions. The end result is that the newly restarted RegionServer has twice 
> the load of any other server in the cluster. Future iterations of the load 
> balancer do nothing, as the cluster cost does not exceed 
> {{minCostNeedBalance}}.
> Another example is if the load balancer makes very slow progress on a 
> cluster, it may not move the average cluster load to a newly restarted 
> regionserver in one iteration. But after the first iteration, the balancer 
> may again not run due to cluster cost not exceeding {{minCostNeedBalance}}.
> We can propose a solution where we reuse the {{slop}} concept in 
> {{SimpleLoadBalancer}} and use this to extend the HBASE-24139 logic for 
> deciding to run the balancer as long as there is a "sloppy" server in the 
> cluster.
> +*Previous Description Notes Below, which are relevant, but as stated, were 
> already fixed by HBASE-24139*+
> In EMR cluster, whenever I replace one of the nodes, the regions never get 
> rebalanced.
> The default minCostNeedBalance set to 0.05 is too high.
> The region count on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 
> = 203
> Once a node(region server) got replaced with a new node (terminated and EMR 
> recreated a node), the region count on the servers became: 23, 0, 23, 22, 22, 
> 22, 22, 23, 23, 23 = 203
> From hbase-master-logs, I can see the below WARN which indicates that the 
> default minCostNeedBalance does not hold good for these scenarios.
> ##
> 2019-04-29 09:31:37,027 WARN  
> [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
> cleaner.CleanerChore: WALs outstanding under 
> hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs2019-04-29 
> 09:31:42,920 INFO  
> [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost 
> which need balance is 0.05
> ##
> To mitigate this, I had to modify the default minCostNeedBalance to lower 
> value like 0.01f and restart Region Servers and Hbase Master. After modifying 
> this value to 0.01f I could see the regions getting re-balanced.
> This has led me to the following questions which I would like to get it 
> answered from the HBase experts.
> 1)What are the factors that affect the value of total cost and sum 
> multiplier? How could we determine the right minCostNeedBalance value for any 
> cluster?
> 2)How did Hbase arrive at setting the default value to 0.05f? Is it optimal 
> value? If yes, then what is the recommended way to mitigate this scenario? 
> Attached: Steps to reproduce
>  
> Note: HBase-17565 patch is already applied.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster

2022-04-26 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-22349:
--
Description: 
HBASE-24139 allows the load balancer to run when one server has 0 regions and 
another server has more than 1 region. This is a special case of a more generic 
problem, where one server has far too few or far too many regions. The 
StochasticLoadBalancer defaults may decide the cluster is "balanced enough" 
according to {{hbase.master.balancer.stochastic.minCostNeedBalance}}, even 
though one server may have a far higher or lower number of regions compared to 
the rest of the cluster.

One specific example of this we have seen is when we use {{RegionMover}} to 
move regions back to a restarted RegionServer, if the 
{{StochasticLoadBalancer}} happens to be running. The load balancer sees a 
newly restarted RegionServer with 0 regions, and after HBASE-24139, it will 
balance regions to this server. Simultaneously, {{RegionMover}} moves back 
regions. The end result is that the newly restarted RegionServer has twice the 
load of any other server in the cluster. Future iterations of the load balancer 
do nothing, as the cluster cost does not exceed {{minCostNeedBalance}}.

Another example is if the load balancer makes very slow progress on a cluster, 
it may not move the average cluster load to a newly restarted regionserver in 
one iteration. But after the first iteration, the balancer may again not run 
due to cluster cost not exceeding {{minCostNeedBalance}}.

We can propose a solution where we reuse the {{slop}} concept in 
{{SimpleLoadBalancer}} and use this to extend the HBASE-24139 logic for 
deciding to run the balancer as long as there is a "sloppy" server in the 
cluster.

+*Previous Description Notes Below, which are relevant, but as stated, were 
already fixed by HBASE-24139*+

In EMR cluster, whenever I replace one of the nodes, the regions never get 
rebalanced.

The default minCostNeedBalance set to 0.05 is too high.

The region count on the servers was: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 = 
203

Once a node (region server) got replaced with a new node (terminated and EMR 
recreated a node), the region count on the servers became: 23, 0, 23, 22, 22, 
22, 22, 23, 23, 23 = 203

From hbase-master-logs, I can see the below WARN which indicates that the 
default minCostNeedBalance does not hold good for these scenarios.

##

2019-04-29 09:31:37,027 WARN  
[ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
cleaner.CleanerChore: WALs outstanding under 
hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs2019-04-29 
09:31:42,920 INFO  
[ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost 
which need balance is 0.05

##

To mitigate this, I had to modify the default minCostNeedBalance to lower value 
like 0.01f and restart Region Servers and Hbase Master. After modifying this 
value to 0.01f I could see the regions getting re-balanced.

This has led me to the following questions, which I would like to get 
answered by the HBase experts.

1) What are the factors that affect the value of total cost and sum 
multiplier? How could we determine the right minCostNeedBalance value for any 
cluster?

2) How did HBase arrive at setting the default value to 0.05f? Is it an 
optimal value? If yes, then what is the recommended way to mitigate this 
scenario?

Attached: Steps to reproduce

 

Note: HBase-17565 patch is already applied.

  was:
In EMR cluster, whenever I replace one of the nodes, the regions never get 
rebalanced.

The default minCostNeedBalance set to 0.05 is too high.

The region count on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 = 
203

Once a node(region server) got replaced with a new node (terminated and EMR 
recreated a node), the region count on the servers became: 23, 0, 23, 22, 22, 
22, 22, 23, 23, 23 = 203

From hbase-master-logs, I can see the below WARN which indicates that the 
default minCostNeedBalance does not hold good for these scenarios.

##

2019-04-29 09:31:37,027 WARN  
[ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
cleaner.CleanerChore: WALs outstanding under 
hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs2019-04-29 
09:31:42,920 INFO  
[ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost 
which need balance is 0.05

##

To mitigate this, I had to modify the default minCostNeedBalance to lower value 
like 0.01f and restart Region Servers and Hbase Master. After modifying this 
value to 0.01f I could see the regions getting re-balanced.

This has led me to 

[jira] [Assigned] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster

2022-04-26 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning reassigned HBASE-22349:
-

Assignee: David Manning

> Stochastic Load Balancer skips balancing when node is replaced in cluster
> -
>
> Key: HBASE-22349
> URL: https://issues.apache.org/jira/browse/HBASE-22349
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 1.3.0, 1.4.4, 2.0.0
>Reporter: Suthan Phillips
>Assignee: David Manning
>Priority: Major
> Attachments: Hbase-22349.pdf
>
>
> In EMR cluster, whenever I replace one of the nodes, the regions never get 
> rebalanced.
> The default minCostNeedBalance set to 0.05 is too high.
> The region count on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 
> = 203
> Once a node(region server) got replaced with a new node (terminated and EMR 
> recreated a node), the region count on the servers became: 23, 0, 23, 22, 22, 
> 22, 22, 23, 23, 23 = 203
> From hbase-master-logs, I can see the below WARN which indicates that the 
> default minCostNeedBalance does not hold good for these scenarios.
> ##
> 2019-04-29 09:31:37,027 WARN  
> [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
> cleaner.CleanerChore: WALs outstanding under 
> hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs2019-04-29 
> 09:31:42,920 INFO  
> [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost 
> which need balance is 0.05
> ##
> To mitigate this, I had to modify the default minCostNeedBalance to lower 
> value like 0.01f and restart Region Servers and Hbase Master. After modifying 
> this value to 0.01f I could see the regions getting re-balanced.
> This has led me to the following questions which I would like to get it 
> answered from the HBase experts.
> 1)What are the factors that affect the value of total cost and sum 
> multiplier? How could we determine the right minCostNeedBalance value for any 
> cluster?
> 2)How did Hbase arrive at setting the default value to 0.05f? Is it optimal 
> value? If yes, then what is the recommended way to mitigate this scenario? 
> Attached: Steps to reproduce
>  
> Note: HBase-17565 patch is already applied.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster

2022-04-14 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17522585#comment-17522585
 ] 

David Manning commented on HBASE-22349:
---

The scenario as originally described is fixed by HBASE-24139. However, I would 
like to propose using this issue to track other cases where we should execute 
the balancer, like one server with far fewer or far more regions than the 
average server in the cluster. (Take the original scenario, and instead of 
having 0 regions on the server, have only 1 region on the server.)

I can think of a few options:
# Use some hot/cold threshold like 50%: compute the average regions per 
server, and if a server has a region count >150% or <50% of this average, 
allow the balancer to run (short-circuit in {{needsBalance}}).
# Find outliers using some type of standard deviation, and short-circuit the 
run in {{needsBalance}} if one is found.
# Introduce a "force run" of the balancer on some timed interval.

I'm inclined to try option 1. Option 3 also sounds appealing to me, because it 
is a backstop that catches all of the cases ignored by {{minCostNeedBalance}}. 
However, other operators may find it too disruptive if they need little to no 
region movement in the cluster.

For reference, one scenario where we find ourselves in this undesirable state 
is by running {{region_mover}} at the same time as the load balancer. As stated 
in the {{region_mover}} comments, those two operations will conflict. The 
result can be one regionserver which has double the regions of any other server 
in the cluster. And if {{minCostNeedBalance}} is not exceeded, which can easily 
be the case in a sizable cluster, one regionserver will run with double the load 
indefinitely.

> Stochastic Load Balancer skips balancing when node is replaced in cluster
> -
>
> Key: HBASE-22349
> URL: https://issues.apache.org/jira/browse/HBASE-22349
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 1.3.0, 1.4.4, 2.0.0
>Reporter: Suthan Phillips
>Priority: Major
> Attachments: Hbase-22349.pdf
>
>
> In an EMR cluster, whenever I replace one of the nodes, the regions never get 
> rebalanced.
> The default minCostNeedBalance of 0.05 is too high.
> The region counts on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 
> = 203
> Once a node (region server) got replaced with a new node (terminated and EMR 
> recreated a node), the region counts on the servers became: 23, 0, 23, 22, 22, 
> 22, 22, 23, 23, 23 = 203
> From hbase-master-logs, I can see the below WARN which indicates that the 
> default minCostNeedBalance does not hold good for these scenarios.
> ##
> 2019-04-29 09:31:37,027 WARN  
> [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
> cleaner.CleanerChore: WALs outstanding under 
> hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs2019-04-29 
> 09:31:42,920 INFO  
> [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] 
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
> cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost 
> which need balance is 0.05
> ##
> To mitigate this, I had to modify the default minCostNeedBalance to a lower 
> value like 0.01f and restart the Region Servers and HBase Master. After 
> modifying this value to 0.01f I could see the regions getting re-balanced.
> This has led me to the following questions, which I would like to get 
> answered by the HBase experts.
> 1) What are the factors that affect the value of total cost and sum 
> multiplier? How could we determine the right minCostNeedBalance value for any 
> cluster?
> 2) How did HBase arrive at setting the default value to 0.05f? Is it an 
> optimal value? If yes, then what is the recommended way to mitigate this scenario? 
> Attached: Steps to reproduce
>  
> Note: HBase-17565 patch is already applied.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HBASE-26718) HFileArchiver can remove referenced StoreFiles from the archive

2022-03-25 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-26718:
--
Status: Patch Available  (was: Open)

> HFileArchiver can remove referenced StoreFiles from the archive
> ---
>
> Key: HBASE-26718
> URL: https://issues.apache.org/jira/browse/HBASE-26718
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction, HFile, snapshots
>Affects Versions: 2.0.0, 3.0.0-alpha-1, 1.0.0, 0.95.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Major
> Fix For: 2.5.0, 1.7.2, 2.6.0, 3.0.0-alpha-3, 2.4.12
>
>
> There is a comment in {{HFileArchiver#resolveAndArchiveFile}}:
> {code:java}
> // if the file already exists in the archive, move that one to a timestamped 
> backup. This is a
> // really, really unlikely situtation, where we get the same name for the 
> existing file, but
> // is included just for that 1 in trillion chance.
> {code}
> In reality, we did encounter this frequently enough to cause problems. More 
> details will be included and linked in a separate issue.
> But regardless of how we get into this situation, we can consider a different 
> approach to solving it. If we assume store files are immutable, and a store 
> file with the same name and location already exists in the archive, then it 
> can be safer to assume the file was already archived successfully, and react 
> accordingly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26720) ExportSnapshot should validate the source snapshot before copying files

2022-03-04 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501601#comment-17501601
 ] 

David Manning commented on HBASE-26720:
---

[~apurtell] I don't have a patch yet, but I do still plan to work on it. I am 
not opposed to giving up ownership if anyone else already has a patch.

> ExportSnapshot should validate the source snapshot before copying files
> ---
>
> Key: HBASE-26720
> URL: https://issues.apache.org/jira/browse/HBASE-26720
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 0.99.0, 1.0.0, 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Major
> Fix For: 2.5.0, 1.7.2, 2.6.0, 3.0.0-alpha-3, 2.4.11
>
>
> Running {{ExportSnapshot}} with default parameters will copy the snapshot to 
> a target location, and then use {{verifySnapshot}} to validate the integrity 
> of the written snapshot. However, it is possible for the source snapshot to 
> be invalid which leads to an invalid exported snapshot.
> We can validate the source snapshot before export.
> By default, we can validate the source snapshot unless the 
> {{-no-target-verify}} parameter is set. We could also introduce a separate 
> parameter for {{-no-source-verify}} if an operator wanted to validate the 
> target but not validate the source for some reason, to provide some amount of 
> backwards compatibility if that scenario is important.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26718) HFileArchiver can remove referenced StoreFiles from the archive

2022-03-04 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501600#comment-17501600
 ] 

David Manning commented on HBASE-26718:
---

[~apurtell] I don't have a patch yet, but I do still plan to work on it. I am 
not opposed to giving up ownership if anyone else already has a patch.

> HFileArchiver can remove referenced StoreFiles from the archive
> ---
>
> Key: HBASE-26718
> URL: https://issues.apache.org/jira/browse/HBASE-26718
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction, HFile, snapshots
>Affects Versions: 0.95.0, 1.0.0, 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Major
> Fix For: 2.5.0, 1.7.2, 2.6.0, 3.0.0-alpha-3, 2.4.11
>
>
> There is a comment in {{HFileArchiver#resolveAndArchiveFile}}:
> {code:java}
> // if the file already exists in the archive, move that one to a timestamped 
> backup. This is a
> // really, really unlikely situtation, where we get the same name for the 
> existing file, but
> // is included just for that 1 in trillion chance.
> {code}
> In reality, we did encounter this frequently enough to cause problems. More 
> details will be included and linked in a separate issue.
> But regardless of how we get into this situation, we can consider a different 
> approach to solving it. If we assume store files are immutable, and a store 
> file with the same name and location already exists in the archive, then it 
> can be safer to assume the file was already archived successfully, and react 
> accordingly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HBASE-26722) Snapshot is corrupted due to interaction between move, warmupRegion, compaction, and HFileArchiver

2022-01-28 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-26722:
--
Affects Version/s: (was: 2.0.0)

> Snapshot is corrupted due to interaction between move, warmupRegion, 
> compaction, and HFileArchiver
> --
>
> Key: HBASE-26722
> URL: https://issues.apache.org/jira/browse/HBASE-26722
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction, mover, snapshots
>Affects Versions: 1.3.5
>Reporter: David Manning
>Priority: Critical
> Fix For: 2.2.0, 2.3.0
>
>
> There is an interesting sequence of events which leads to split-brain, 
> double-assignment type of behavior with management of store files.
> The scenario is this:
> # Take snapshot
> # RegionX of snapshotted table is hosted on RegionServer1.
> # Stop RegionServer1, using {{region_mover}}, gracefully moving all regions 
> to other regionservers using {{move}} RPCs.
> # RegionX is now opened on RegionServer2.
> # RegionServer2 compacts RegionX after opening.
> # RegionServer1 starts and uses {{region_mover}} to {{move}} all previously 
> owned regions back to itself.
> # The HMaster RPC to {{move}} calls {{warmupRegion}} on RegionServer1.
> # As part of {{warmupRegion}}, RegionServer1 opens all store files of 
> RegionX. CompactedHFilesDischarger chore has not yet archived the 
> pre-compacted store file. RegionServer1 finds both the pre-compacted store 
> file and post-compacted store file. It logs a warning and archives the 
> pre-compacted file.
> # RegionServer1 has warmed up the region, so now HMaster resumes the {{move}} 
> and sends {{close}} RegionX to RegionServer2.
> # RegionServer2 closes its store files. As part of this, it archives any 
> compacted files which have not yet been archived by the 
> {{CompactedHFilesDischarger}} chore.
> # Since RegionServer1 already archived the file, RegionServer2's 
> {{HFileArchiver}} finds the destination archive file already exists. (code 
> link)
> # RegionServer2 renames the archived file, to free up the desired destination 
> filename.
> With the archived file renamed, RegionServer2 attempts to archive the file as 
> planned. But the source file doesn't exist because RegionServer1 already 
> moved it... to the location RegionServer2 expected to use!
> # RegionServer2 silently ignores this archival failure. (code link)
> # HMaster {{HFileCleaner}} chore later deletes the renamed archive file, 
> because there is no active reference to it. (The snapshot reference is to the 
> original named file, not the "backup" timestamped version.) The snapshot data 
> is irretrievably lost.
> HBASE-26718 tracks a potential, specific change to the archival process to 
> avoid this specific issue.
> However, there is a more fundamental problem here that a region opened by 
> {{warmupRegion}} can operate on that region's store files while the region is 
> opened elsewhere, which must not be allowed.
> This was seen on branch-1, and is a combination of HBASE-22330 and not having 
> the fix for HBASE-22163.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HBASE-26722) Snapshot is corrupted due to interaction between move, warmupRegion, compaction, and HFileArchiver

2022-01-28 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-26722:
--
Fix Version/s: 2.3.0
   2.2.0

> Snapshot is corrupted due to interaction between move, warmupRegion, 
> compaction, and HFileArchiver
> --
>
> Key: HBASE-26722
> URL: https://issues.apache.org/jira/browse/HBASE-26722
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction, mover, snapshots
>Affects Versions: 2.0.0, 1.3.5
>Reporter: David Manning
>Priority: Critical
> Fix For: 2.2.0, 2.3.0
>
>
> There is an interesting sequence of events which leads to split-brain, 
> double-assignment type of behavior with management of store files.
> The scenario is this:
> # Take snapshot
> # RegionX of snapshotted table is hosted on RegionServer1.
> # Stop RegionServer1, using {{region_mover}}, gracefully moving all regions 
> to other regionservers using {{move}} RPCs.
> # RegionX is now opened on RegionServer2.
> # RegionServer2 compacts RegionX after opening.
> # RegionServer1 starts and uses {{region_mover}} to {{move}} all previously 
> owned regions back to itself.
> # The HMaster RPC to {{move}} calls {{warmupRegion}} on RegionServer1.
> # As part of {{warmupRegion}}, RegionServer1 opens all store files of 
> RegionX. CompactedHFilesDischarger chore has not yet archived the 
> pre-compacted store file. RegionServer1 finds both the pre-compacted store 
> file and post-compacted store file. It logs a warning and archives the 
> pre-compacted file.
> # RegionServer1 has warmed up the region, so now HMaster resumes the {{move}} 
> and sends {{close}} RegionX to RegionServer2.
> # RegionServer2 closes its store files. As part of this, it archives any 
> compacted files which have not yet been archived by the 
> {{CompactedHFilesDischarger}} chore.
> # Since RegionServer1 already archived the file, RegionServer2's 
> {{HFileArchiver}} finds the destination archive file already exists. (code 
> link)
> # RegionServer2 renames the archived file, to free up the desired destination 
> filename.
> With the archived file renamed, RegionServer2 attempts to archive the file as 
> planned. But the source file doesn't exist because RegionServer1 already 
> moved it... to the location RegionServer2 expected to use!
> # RegionServer2 silently ignores this archival failure. (code link)
> # HMaster {{HFileCleaner}} chore later deletes the renamed archive file, 
> because there is no active reference to it. (The snapshot reference is to the 
> original named file, not the "backup" timestamped version.) The snapshot data 
> is irretrievably lost.
> HBASE-26718 tracks a potential, specific change to the archival process to 
> avoid this specific issue.
> However, there is a more fundamental problem here that a region opened by 
> {{warmupRegion}} can operate on that region's store files while the region is 
> opened elsewhere, which must not be allowed.
> This was seen on branch-1, and is a combination of HBASE-22330 and not having 
> the fix for HBASE-22163.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26722) Snapshot is corrupted due to interaction between move, warmupRegion, compaction, and HFileArchiver

2022-01-28 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning resolved HBASE-26722.
---
Resolution: Duplicate

> Snapshot is corrupted due to interaction between move, warmupRegion, 
> compaction, and HFileArchiver
> --
>
> Key: HBASE-26722
> URL: https://issues.apache.org/jira/browse/HBASE-26722
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction, mover, snapshots
>Affects Versions: 2.0.0, 1.3.5
>Reporter: David Manning
>Priority: Critical
>
> There is an interesting sequence of events which leads to split-brain, 
> double-assignment type of behavior with management of store files.
> The scenario is this:
> # Take snapshot
> # RegionX of snapshotted table is hosted on RegionServer1.
> # Stop RegionServer1, using {{region_mover}}, gracefully moving all regions 
> to other regionservers using {{move}} RPCs.
> # RegionX is now opened on RegionServer2.
> # RegionServer2 compacts RegionX after opening.
> # RegionServer1 starts and uses {{region_mover}} to {{move}} all previously 
> owned regions back to itself.
> # The HMaster RPC to {{move}} calls {{warmupRegion}} on RegionServer1.
> # As part of {{warmupRegion}}, RegionServer1 opens all store files of 
> RegionX. CompactedHFilesDischarger chore has not yet archived the 
> pre-compacted store file. RegionServer1 finds both the pre-compacted store 
> file and post-compacted store file. It logs a warning and archives the 
> pre-compacted file.
> # RegionServer1 has warmed up the region, so now HMaster resumes the {{move}} 
> and sends {{close}} RegionX to RegionServer2.
> # RegionServer2 closes its store files. As part of this, it archives any 
> compacted files which have not yet been archived by the 
> {{CompactedHFilesDischarger}} chore.
> # Since RegionServer1 already archived the file, RegionServer2's 
> {{HFileArchiver}} finds the destination archive file already exists. (code 
> link)
> # RegionServer2 renames the archived file, to free up the desired destination 
> filename.
> With the archived file renamed, RegionServer2 attempts to archive the file as 
> planned. But the source file doesn't exist because RegionServer1 already 
> moved it... to the location RegionServer2 expected to use!
> # RegionServer2 silently ignores this archival failure. (code link)
> # HMaster {{HFileCleaner}} chore later deletes the renamed archive file, 
> because there is no active reference to it. (The snapshot reference is to the 
> original named file, not the "backup" timestamped version.) The snapshot data 
> is irretrievably lost.
> HBASE-26718 tracks a potential, specific change to the archival process to 
> avoid this specific issue.
> However, there is a more fundamental problem here that a region opened by 
> {{warmupRegion}} can operate on that region's store files while the region is 
> opened elsewhere, which must not be allowed.
> This was seen on branch-1, and is a combination of HBASE-22330 and not having 
> the fix for HBASE-22163.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26722) Snapshot is corrupted due to interaction between move, warmupRegion, compaction, and HFileArchiver

2022-01-28 Thread David Manning (Jira)
David Manning created HBASE-26722:
-

 Summary: Snapshot is corrupted due to interaction between move, 
warmupRegion, compaction, and HFileArchiver
 Key: HBASE-26722
 URL: https://issues.apache.org/jira/browse/HBASE-26722
 Project: HBase
  Issue Type: Bug
  Components: Compaction, mover, snapshots
Affects Versions: 1.3.5, 2.0.0
Reporter: David Manning


There is an interesting sequence of events which leads to split-brain, 
double-assignment type of behavior with management of store files.

The scenario is this:
# Take snapshot
# RegionX of snapshotted table is hosted on RegionServer1.
# Stop RegionServer1, using {{region_mover}}, gracefully moving all regions to 
other regionservers using {{move}} RPCs.
# RegionX is now opened on RegionServer2.
# RegionServer2 compacts RegionX after opening.
# RegionServer1 starts and uses {{region_mover}} to {{move}} all previously 
owned regions back to itself.
# The HMaster RPC to {{move}} calls {{warmupRegion}} on RegionServer1.
# As part of {{warmupRegion}}, RegionServer1 opens all store files of RegionX. 
CompactedHFilesDischarger chore has not yet archived the pre-compacted store 
file. RegionServer1 finds both the pre-compacted store file and post-compacted 
store file. It logs a warning and archives the pre-compacted file.
# RegionServer1 has warmed up the region, so now HMaster resumes the {{move}} 
and sends {{close}} RegionX to RegionServer2.
# RegionServer2 closes its store files. As part of this, it archives any 
compacted files which have not yet been archived by the 
{{CompactedHFilesDischarger}} chore.
# Since RegionServer1 already archived the file, RegionServer2's 
{{HFileArchiver}} finds the destination archive file already exists. (code link)
# RegionServer2 renames the archived file, to free up the desired destination 
filename.
With the archived file renamed, RegionServer2 attempts to archive the file as 
planned. But the source file doesn't exist because RegionServer1 already moved 
it... to the location RegionServer2 expected to use!
# RegionServer2 silently ignores this archival failure. (code link)
# HMaster {{HFileCleaner}} chore later deletes the renamed archive file, 
because there is no active reference to it. (The snapshot reference is to the 
original named file, not the "backup" timestamped version.) The snapshot data 
is irretrievably lost.

HBASE-26718 tracks a potential, specific change to the archival process to 
avoid this specific issue.

However, there is a more fundamental problem here that a region opened by 
{{warmupRegion}} can operate on that region's store files while the region is 
opened elsewhere, which must not be allowed.

This was seen on branch-1, and is a combination of HBASE-22330 and not having 
the fix for HBASE-22163.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26720) ExportSnapshot should validate the source snapshot before copying files

2022-01-27 Thread David Manning (Jira)
David Manning created HBASE-26720:
-

 Summary: ExportSnapshot should validate the source snapshot before 
copying files
 Key: HBASE-26720
 URL: https://issues.apache.org/jira/browse/HBASE-26720
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 2.0.0, 3.0.0-alpha-1, 1.0.0, 0.99.0
Reporter: David Manning
Assignee: David Manning


Running {{ExportSnapshot}} with default parameters will copy the snapshot to a 
target location, and then use {{verifySnapshot}} to validate the integrity of 
the written snapshot. However, it is possible for the source snapshot to be 
invalid which leads to an invalid exported snapshot.

We can validate the source snapshot before export.

By default, we can validate the source snapshot unless the 
{{-no-target-verify}} parameter is set. We could also introduce a separate 
parameter for {{-no-source-verify}} if an operator wanted to validate the 
target but not validate the source for some reason, to provide some amount of 
backwards compatibility if that scenario is important.
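
For illustration only, the proposed ordering could look roughly like this; the parameter and helper names are placeholders rather than the real {{ExportSnapshot}} internals:
{code:java}
// Sketch of the proposed flow: verify the source snapshot before copying, and keep
// the existing post-copy verification. The Runnable parameters stand in for the real
// verification and copy steps, which are not shown here.
static void exportWithSourceCheck(boolean skipSourceVerify, boolean skipTargetVerify,
    Runnable verifySourceSnapshot, Runnable copySnapshotFiles, Runnable verifyTargetSnapshot) {
  if (!skipSourceVerify) {
    verifySourceSnapshot.run(); // new step: fail fast on an invalid source snapshot
  }
  copySnapshotFiles.run();      // copy of snapshot manifest and store files
  if (!skipTargetVerify) {
    verifyTargetSnapshot.run(); // existing behavior: validate what was written
  }
}
{code}
Here {{skipSourceVerify}} would map to the proposed {{-no-source-verify}} option and {{skipTargetVerify}} to {{-no-target-verify}}.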



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HBASE-26718) HFileArchiver can remove referenced StoreFiles from the archive

2022-01-27 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-26718:
--
Affects Version/s: 0.95.0

> HFileArchiver can remove referenced StoreFiles from the archive
> ---
>
> Key: HBASE-26718
> URL: https://issues.apache.org/jira/browse/HBASE-26718
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction, HFile, snapshots
>Affects Versions: 0.95.0, 1.0.0, 3.0.0-alpha-1, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Major
>
> There is a comment in {{HFileArchiver#resolveAndArchiveFile}}:
> {code:java}
> // if the file already exists in the archive, move that one to a timestamped 
> backup. This is a
> // really, really unlikely situtation, where we get the same name for the 
> existing file, but
> // is included just for that 1 in trillion chance.
> {code}
> In reality, we did encounter this frequently enough to cause problems. More 
> details will be included and linked in a separate issue.
> But regardless of how we get into this situation, we can consider a different 
> approach to solving it. If we assume store files are immutable, and a store 
> file with the same name and location already exists in the archive, then it 
> can be safer to assume the file was already archived successfully, and react 
> accordingly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26718) HFileArchiver can remove referenced StoreFiles from the archive

2022-01-27 Thread David Manning (Jira)
David Manning created HBASE-26718:
-

 Summary: HFileArchiver can remove referenced StoreFiles from the 
archive
 Key: HBASE-26718
 URL: https://issues.apache.org/jira/browse/HBASE-26718
 Project: HBase
  Issue Type: Bug
  Components: Compaction, HFile, snapshots
Affects Versions: 2.0.0, 3.0.0-alpha-1, 1.0.0
Reporter: David Manning
Assignee: David Manning


There is a comment in {{HFileArchiver#resolveAndArchiveFile}}:
{code:java}
// if the file already exists in the archive, move that one to a timestamped 
backup. This is a
// really, really unlikely situtation, where we get the same name for the 
existing file, but
// is included just for that 1 in trillion chance.
{code}

In reality, we did encounter this frequently enough to cause problems. More 
details will be included and linked in a separate issue.

But regardless of how we get into this situation, we can consider a different 
approach to solving it. If we assume store files are immutable, and a store 
file with the same name and location already exists in the archive, then it can 
be safer to assume the file was already archived successfully, and react 
accordingly.
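
As a rough sketch of that alternative, using plain Hadoop {{FileSystem}} calls (simplified; not the actual {{HFileArchiver}} code, and error handling is omitted):
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ArchiveIdempotencySketch {
  // If an identically named file already exists in the archive, treat the archival as
  // already done (store files are immutable) instead of renaming the archived copy to
  // a timestamped backup and moving the source over it.
  static boolean archiveStoreFile(FileSystem fs, Path source, Path archiveDest)
      throws IOException {
    if (fs.exists(archiveDest)) {
      // Another server, or an earlier attempt, already archived this store file.
      return true;
    }
    return fs.rename(source, archiveDest);
  }
}
{code}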



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-22300) SLB doesn't perform well with increase in number of regions

2021-07-10 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17378549#comment-17378549
 ] 

David Manning commented on HBASE-22300:
---

and more specifically in subtasks HBASE-25947 or HBASE-25894

> SLB doesn't perform well with increase in number of regions
> ---
>
> Key: HBASE-22300
> URL: https://issues.apache.org/jira/browse/HBASE-22300
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Biju Nair
>Assignee: David Manning
>Priority: Major
>  Labels: balancer
> Attachments: CostFromRegionLoadFunctionNew.rtf
>
>
> With increase in number of regions in a cluster the number of steps taken by 
> balancer in 30 sec (default balancer runtime) reduces noticeably. The 
> following is the number of steps taken by the balancer with region loads set 
> and running it without the loads being set i.e. cost functions using region 
> loads are not fully exercised.
> {noformat}
> Nodes  regions  Tables    # of steps   # of steps 
>   with RS Load     With no load   
> 5       50       5        20               20
> 100     2000     110      104707               100                        
>   
> 100     1    40       19911                100                        
>   
> 200     10   400      870                  100                        
>   {noformat}
> As one would expect the reduced number of steps also makes the balancer take 
> long time to get to an optimal cost. Note that only 2 data points were used 
> in the region load histogram while in practice 15 region load data points are 
> remembered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-22300) SLB doesn't perform well with increase in number of regions

2021-07-10 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17378546#comment-17378546
 ] 

David Manning commented on HBASE-22300:
---

I was working on this but it looks like it's already resolved in HBASE-25832.

> SLB doesn't perform well with increase in number of regions
> ---
>
> Key: HBASE-22300
> URL: https://issues.apache.org/jira/browse/HBASE-22300
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Biju Nair
>Assignee: David Manning
>Priority: Major
>  Labels: balancer
> Attachments: CostFromRegionLoadFunctionNew.rtf
>
>
> With increase in number of regions in a cluster the number of steps taken by 
> balancer in 30 sec (default balancer runtime) reduces noticeably. The 
> following is the number of steps taken by the balancer with region loads set 
> and running it without the loads being set i.e. cost functions using region 
> loads are not fully exercised.
> {noformat}
> Nodes  regions  Tables    # of steps   # of steps 
>   with RS Load     With no load   
> 5       50       5        20               20
> 100     2000     110      104707               100                        
>   
> 100     1    40       19911                100                        
>   
> 200     10   400      870                  100                        
>   {noformat}
> As one would expect the reduced number of steps also makes the balancer take 
> long time to get to an optimal cost. Note that only 2 data points were used 
> in the region load histogram while in practice 15 region load data points are 
> remembered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-22300) SLB doesn't perform well with increase in number of regions

2021-07-10 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning resolved HBASE-22300.
---
Resolution: Duplicate

> SLB doesn't perform well with increase in number of regions
> ---
>
> Key: HBASE-22300
> URL: https://issues.apache.org/jira/browse/HBASE-22300
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Biju Nair
>Assignee: David Manning
>Priority: Major
>  Labels: balancer
> Attachments: CostFromRegionLoadFunctionNew.rtf
>
>
> With increase in number of regions in a cluster the number of steps taken by 
> balancer in 30 sec (default balancer runtime) reduces noticeably. The 
> following is the number of steps taken by the balancer with region loads set 
> and running it without the loads being set i.e. cost functions using region 
> loads are not fully exercised.
> {noformat}
> Nodes  regions  Tables    # of steps   # of steps 
>   with RS Load     With no load   
> 5       50       5        20               20
> 100     2000     110      104707               100                        
>   
> 100     1    40       19911                100                        
>   
> 200     10   400      870                  100                        
>   {noformat}
> As one would expect the reduced number of steps also makes the balancer take 
> long time to get to an optimal cost. Note that only 2 data points were used 
> in the region load histogram while in practice 15 region load data points are 
> remembered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-22300) SLB doesn't perform well with increase in number of regions

2021-06-14 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning reassigned HBASE-22300:
-

Assignee: David Manning

> SLB doesn't perform well with increase in number of regions
> ---
>
> Key: HBASE-22300
> URL: https://issues.apache.org/jira/browse/HBASE-22300
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Biju Nair
>Assignee: David Manning
>Priority: Major
>  Labels: balancer
> Attachments: CostFromRegionLoadFunctionNew.rtf
>
>
> With increase in number of regions in a cluster the number of steps taken by 
> balancer in 30 sec (default balancer runtime) reduces noticeably. The 
> following is the number of steps taken by the balancer with region loads set 
> and running it without the loads being set i.e. cost functions using region 
> loads are not fully exercised.
> {noformat}
> Nodes  regions  Tables    # of steps   # of steps 
>   with RS Load     With no load   
> 5       50       5        20               20
> 100     2000     110      104707               100                        
>   
> 100     1    40       19911                100                        
>   
> 200     10   400      870                  100                        
>   {noformat}
> As one would expect the reduced number of steps also makes the balancer take 
> long time to get to an optimal cost. Note that only 2 data points were used 
> in the region load histogram while in practice 15 region load data points are 
> remembered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-22300) SLB doesn't perform well with increase in number of regions

2021-06-12 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362413#comment-17362413
 ] 

David Manning edited comment on HBASE-22300 at 6/12/21, 10:37 PM:
--

[~gsbiju] do you still have interest in pursuing this work? If not, I would 
like to attempt a fix based on your proposal.


was (Author: dmanning):
[~gsbiju] do you still have interest in pursuing this work? If not, I would 
like to attempt a fix.

> SLB doesn't perform well with increase in number of regions
> ---
>
> Key: HBASE-22300
> URL: https://issues.apache.org/jira/browse/HBASE-22300
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Biju Nair
>Priority: Major
>  Labels: balancer
> Attachments: CostFromRegionLoadFunctionNew.rtf
>
>
> With increase in number of regions in a cluster the number of steps taken by 
> balancer in 30 sec (default balancer runtime) reduces noticeably. The 
> following is the number of steps taken by the balancer with region loads set 
> and running it without the loads being set i.e. cost functions using region 
> loads are not fully exercised.
> {noformat}
> Nodes  regions  Tables    # of steps   # of steps 
>   with RS Load     With no load   
> 5       50       5        20               20
> 100     2000     110      104707               100                        
>   
> 100     1    40       19911                100                        
>   
> 200     10   400      870                  100                        
>   {noformat}
> As one would expect the reduced number of steps also makes the balancer take 
> long time to get to an optimal cost. Note that only 2 data points were used 
> in the region load histogram while in practice 15 region load data points are 
> remembered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-22300) SLB doesn't perform well with increase in number of regions

2021-06-12 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362413#comment-17362413
 ] 

David Manning commented on HBASE-22300:
---

[~gsbiju] do you still have interest in pursuing this work? If not, I would 
like to attempt a fix.

> SLB doesn't perform well with increase in number of regions
> ---
>
> Key: HBASE-22300
> URL: https://issues.apache.org/jira/browse/HBASE-22300
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Biju Nair
>Priority: Major
>  Labels: balancer
> Attachments: CostFromRegionLoadFunctionNew.rtf
>
>
> With increase in number of regions in a cluster the number of steps taken by 
> balancer in 30 sec (default balancer runtime) reduces noticeably. The 
> following is the number of steps taken by the balancer with region loads set 
> and running it without the loads being set i.e. cost functions using region 
> loads are not fully exercised.
> {noformat}
> Nodes  regions  Tables    # of steps   # of steps 
>   with RS Load     With no load   
> 5       50       5        20               20
> 100     2000     110      104707               100                        
>   
> 100     1    40       19911                100                        
>   
> 200     10   400      870                  100                        
>   {noformat}
> As one would expect the reduced number of steps also makes the balancer take 
> long time to get to an optimal cost. Note that only 2 data points were used 
> in the region load histogram while in practice 15 region load data points are 
> remembered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25739) TableSkewCostFunction need to use aggregated deviation

2021-04-13 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320237#comment-17320237
 ] 

David Manning commented on HBASE-25739:
---

oops yes! 198. Thanks [~claraxiong]
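
For anyone following along, the 198 falls out of the numbers in the description:
{noformat}
100 regions on 100 servers                    => mean = 1 region per server
this scenario (50 servers with 2, 50 with 0): 50*|2-1| + 50*|0-1| = 100
worst case (100 regions on 1 server):         |100-1| + 99*|0-1| = 198
normalized cost = 100 / 198 ≈ 0.5
{noformat}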

> TableSkewCostFunction need to use aggregated deviation
> --
>
> Key: HBASE-25739
> URL: https://issues.apache.org/jira/browse/HBASE-25739
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, master
>Reporter: Clara Xiong
>Priority: Major
>
> TableSkewCostFunction uses the sum of the max deviation region per server for 
> all tables as the measure of unevenness. It doesn't work in a very common 
> scenario in operations. Say we have 100 regions on 50 nodes, two on each. We 
> add 50 new nodes and they have 0 each. The max deviation from the mean is 1, 
> compared to 99 in the worst case scenario of 100 regions on a single server. 
> The normalized cost is 1/99 = 0.011 < default threshold of 0.05. Balancer 
> wouldn't move.  The proposal is to use aggregated deviation of the count per 
> region server to detect this scenario, generating a cost of 100/198 = 0.5 in 
> this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25739) TableSkewCostFunction need to use aggregated deviation

2021-04-13 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319945#comment-17319945
 ] 

David Manning commented on HBASE-25739:
---

Yes [~clarax98007] that sounds good to me. I wasn’t suggesting any default 
weights need to change but was curious what you had found. Thanks for sharing.

The final cost in the description probably goes to 100/298 instead of 3.1/31, 
is that right?

> TableSkewCostFunction need to use aggregated deviation
> --
>
> Key: HBASE-25739
> URL: https://issues.apache.org/jira/browse/HBASE-25739
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, master
>Reporter: Clara Xiong
>Priority: Major
>
> TableSkewCostFunction uses the sum of the max deviation region per server for 
> all tables as the measure of unevenness. It doesn't work in a very common 
> scenario in operations. Say we have 100 regions on 50 nodes, two on each. We 
> add 50 new nodes and they have 0 each. The max deviation from the mean is 1, 
> compared to 99 in the worst case scenario of 100 regions on a single server. 
> The normalized cost is 1/99 = 0.011 < default threshold of 0.05. Balancer 
> wouldn't move.  The proposal is to use aggregated deviation of the count per 
> region server to detect this scenario, generating a cost of 3.1/31 = 0.1 in 
> this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25739) TableSkewCostFunction need to use aggregated deviation

2021-04-09 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318128#comment-17318128
 ] 

David Manning commented on HBASE-25739:
---

Can you update the description since we are no longer using standard deviation 
in the current proposal?

Do you have any thoughts on the current default weight of 
TableSkewCostFunction? DEFAULT_TABLE_SKEW_COST = 35 - I wonder if this still 
makes sense given this change, or if it had this value due to the previous cost 
calculation. I don't really know myself... intuitively it makes sense to me to 
leave at 35, as it seems more important than most other cost functions, and 
less important than RegionCountSkewCostFunction. I was just curious if you had 
any thoughts.

Thanks for the nice improvement.

> TableSkewCostFunction need to use aggregated deviation
> --
>
> Key: HBASE-25739
> URL: https://issues.apache.org/jira/browse/HBASE-25739
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, master
>Reporter: Clara Xiong
>Priority: Major
>
> TableSkewCostFunction uses the sum of the max deviation region per server for 
> all tables as the measure of unevenness. It doesn't work in a very common 
> scenario in operations. Say we have 100 regions on 50 nodes, two on each. We 
> add 50 new nodes and they have 0 each. The max deviation from the mean is 1, 
> compared to 99 in the worst case scenario of 100 regions on a single server. 
> The normalized cost is 1/99 = 0.011 < default threshold of 0.05. Balancer 
> wouldn't move.  The proposal is to use the standard deviation of the count 
> per region server to detect this scenario, generating a cost of 3.1/31 = 0.1 
> in this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-25749) Improved logging when interrupting active RPC handlers holding the region close lock (HBASE-25212 hbase.regionserver.close.wait.abort)

2021-04-08 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning reassigned HBASE-25749:
-

Assignee: Andrew Kyle Purtell

> Improved logging when interrupting active RPC handlers holding the region 
> close lock (HBASE-25212 hbase.regionserver.close.wait.abort)
> --
>
> Key: HBASE-25749
> URL: https://issues.apache.org/jira/browse/HBASE-25749
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, rpc
>Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.4.0
>Reporter: David Manning
>Assignee: Andrew Kyle Purtell
>Priority: Minor
>
> HBASE-25212 adds an optional improvement to Close Region, for interrupting 
> active RPC handlers holding the region close lock. If, after the timeout is 
> reached, the close lock can still not be acquired, the regionserver may 
> abort. It would be helpful to add logging for which threads or components are 
> holding the region close lock at this time.
> Depending on the size of regionLockHolders, or use of any stack traces, log 
> output may need to be truncated. The interrupt code is in 
> HRegion#interruptRegionOperations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25749) Improved logging when interrupting active RPC handlers holding the region close lock (HBASE-25212 hbase.regionserver.close.wait.abort)

2021-04-08 Thread David Manning (Jira)
David Manning created HBASE-25749:
-

 Summary: Improved logging when interrupting active RPC handlers 
holding the region close lock (HBASE-25212 hbase.regionserver.close.wait.abort)
 Key: HBASE-25749
 URL: https://issues.apache.org/jira/browse/HBASE-25749
 Project: HBase
  Issue Type: Bug
  Components: regionserver, rpc
Affects Versions: 2.4.0, 3.0.0-alpha-1, 1.7.0
Reporter: David Manning


HBASE-25212 adds an optional improvement to Close Region, for interrupting 
active RPC handlers holding the region close lock. If, after the timeout is 
reached, the close lock can still not be acquired, the regionserver may abort. 
It would be helpful to add logging for which threads or components are holding 
the region close lock at this time.

Depending on the size of regionLockHolders, or use of any stack traces, log 
output may need to be truncated. The interrupt code is in 
HRegion#interruptRegionOperations.
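
A rough sketch of the kind of logging meant here; the collection of lock-holding threads and the truncation limit are illustrative, not the actual {{HRegion}} fields:
{code:java}
// Build a bounded description of the threads still holding the region close lock,
// suitable for logging just before interrupting them.
static String describeLockHolders(java.util.Collection<Thread> lockHolders) {
  final int limit = 10; // keep the log line bounded on busy regions
  StringBuilder sb = new StringBuilder("Region close lock still held by: ");
  int logged = 0;
  for (Thread t : lockHolders) {
    if (logged == limit) {
      sb.append("... and ").append(lockHolders.size() - limit).append(" more");
      break;
    }
    sb.append(t.getName()).append("; ");
    logged++;
  }
  return sb.toString();
}
{code}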



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25726) MoveCostFunction is not included in the list of cost functions for StochasticLoadBalancer

2021-04-01 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-25726:
--
Status: Patch Available  (was: Open)

> MoveCostFunction is not included in the list of cost functions for 
> StochasticLoadBalancer
> -
>
> Key: HBASE-25726
> URL: https://issues.apache.org/jira/browse/HBASE-25726
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.4.0, 2.3.1, 3.0.0-alpha-1, 1.7.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Major
>
> After OffPeakHours fix for MoveCostFunction (HBASE-24709), MoveCostFunction 
> is no longer included in costFunctions list. {{addCostFunction}} expects 
> multiplier to be non-zero, but multiplier is now only set in {{cost}} 
> function.
> As a result, {{hbase.master.balancer.stochastic.maxMovePercent}} is not 
> respected, and there is no cost function to oppose a move. Any move that 
> decreases total cost at all will be accepted, causing more churn and 
> disruption from balancer executions.
> We noticed this when investigating a case where the balancer would run after 
> a regionserver was restarted without use of region_mover script. The 
> regionserver comes online with 0 regions, leading to a shortcut in 
> {{needsBalance}} for {{idleRegionServerExist}}. The balancer runs to move 
> regions to that newly restarted regionserver. However, it moves a large 
> number of regions in the cluster, hyper-optimizing the other cost variables. 
> There were ~4300 regions in the cluster at the time, so moving 25% of the 
> regions should have had a final cost of at least 7 (default moveCostFunction 
> weight.) MoveCostFunction is also not listed in the functions contributing to 
> the initial cost.
> {{2021-03-30 15:47:43,396 INFO [49187_ChoreService_3] 
> balancer.StochasticLoadBalancer - start StochasticLoadBalancer.balancer, 
> initCost=12.91377229840024, functionCost=RegionCountSkewCostFunction : 
> (500.0, 0.014878672009326464); TableSkewCostFunction : (35.0, 
> 0.013600280177445717); RegionReplicaHostCostFunction : (10.0, 0.0); 
> RegionReplicaRackCostFunction : (1.0, 0.0); ReadRequestCostFunction : 
> (5.0, 0.8296332203204705); WriteRequestCostFunction : (5.0, 
> 0.06818455421617946); MemstoreSizeCostFunction : (5.0, 0.08132131691669181); 
> StoreFileCostFunction : (5.0, 0.02054620605193966); computedMaxSteps: 
> 100}}
> {{2021-03-30 15:48:13,385 DEBUG [49187_ChoreService_3] 
> balancer.StochasticLoadBalancer - Finished computing new load balance plan. 
> Computation took 30004ms to try 6571 different iterations. Found a solution 
> that moves 1095 regions; Going from a computed cost of 12.91377229840024 to a 
> new cost of 4.804625730746651}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-25726) MoveCostFunction is not included in the list of cost functions for StochasticLoadBalancer

2021-04-01 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning reassigned HBASE-25726:
-

Assignee: David Manning

> MoveCostFunction is not included in the list of cost functions for 
> StochasticLoadBalancer
> -
>
> Key: HBASE-25726
> URL: https://issues.apache.org/jira/browse/HBASE-25726
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Major
>
> After OffPeakHours fix for MoveCostFunction (HBASE-24709), MoveCostFunction 
> is no longer included in costFunctions list. {{addCostFunction}} expects 
> multiplier to be non-zero, but multiplier is now only set in {{cost}} 
> function.
> As a result, {{hbase.master.balancer.stochastic.maxMovePercent}} is not 
> respected, and there is no cost function to oppose a move. Any move that 
> decreases total cost at all will be accepted, causing more churn and 
> disruption from balancer executions.
> We noticed this when investigating a case where the balancer would run after 
> a regionserver was restarted without use of region_mover script. The 
> regionserver comes online with 0 regions, leading to a shortcut in 
> {{needsBalance}} for {{idleRegionServerExist}}. The balancer runs to move 
> regions to that newly restarted regionserver. However, it moves a large 
> number of regions in the cluster, hyper-optimizing the other cost variables. 
> There were ~4300 regions in the cluster at the time, so moving 25% of the 
> regions should have had a final cost of at least 7 (default moveCostFunction 
> weight.) MoveCostFunction is also not listed in the functions contributing to 
> the initial cost.
> {{2021-03-30 15:47:43,396 INFO [49187_ChoreService_3] 
> balancer.StochasticLoadBalancer - start StochasticLoadBalancer.balancer, 
> initCost=12.91377229840024, functionCost=RegionCountSkewCostFunction : 
> (500.0, 0.014878672009326464); TableSkewCostFunction : (35.0, 
> 0.013600280177445717); RegionReplicaHostCostFunction : (10.0, 0.0); 
> RegionReplicaRackCostFunction : (1.0, 0.0); ReadRequestCostFunction : 
> (5.0, 0.8296332203204705); WriteRequestCostFunction : (5.0, 
> 0.06818455421617946); MemstoreSizeCostFunction : (5.0, 0.08132131691669181); 
> StoreFileCostFunction : (5.0, 0.02054620605193966); computedMaxSteps: 
> 100}}
> {{2021-03-30 15:48:13,385 DEBUG [49187_ChoreService_3] 
> balancer.StochasticLoadBalancer - Finished computing new load balance plan. 
> Computation took 30004ms to try 6571 different iterations. Found a solution 
> that moves 1095 regions; Going from a computed cost of 12.91377229840024 to a 
> new cost of 4.804625730746651}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25726) MoveCostFunction is not included in the list of cost functions for StochasticLoadBalancer

2021-04-01 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-25726:
--
Description: 
After OffPeakHours fix for MoveCostFunction (HBASE-24709), MoveCostFunction is 
no longer included in costFunctions list. {{addCostFunction}} expects 
multiplier to be non-zero, but multiplier is now only set in {{cost}} function.

As a result, {{hbase.master.balancer.stochastic.maxMovePercent}} is not 
respected, and there is no cost function to oppose a move. Any move that 
decreases total cost at all will be accepted, causing more churn and disruption 
from balancer executions.

We noticed this when investigating a case where the balancer would run after a 
regionserver was restarted without use of region_mover script. The regionserver 
comes online with 0 regions, leading to a shortcut in {{needsBalance}} for 
{{idleRegionServerExist}}. The balancer runs to move regions to that newly 
restarted regionserver. However, it moves a large number of regions in the 
cluster, hyper-optimizing the other cost variables. There were ~4300 regions in 
the cluster at the time, so moving 25% of the regions should have had a final 
cost of at least 7 (default moveCostFunction weight.) MoveCostFunction is also 
not listed in the functions contributing to the initial cost.

{{2021-03-30 15:47:43,396 INFO [49187_ChoreService_3] 
balancer.StochasticLoadBalancer - start StochasticLoadBalancer.balancer, 
initCost=12.91377229840024, functionCost=RegionCountSkewCostFunction : (500.0, 
0.014878672009326464); TableSkewCostFunction : (35.0, 0.013600280177445717); 
RegionReplicaHostCostFunction : (10.0, 0.0); RegionReplicaRackCostFunction 
: (1.0, 0.0); ReadRequestCostFunction : (5.0, 0.8296332203204705); 
WriteRequestCostFunction : (5.0, 0.06818455421617946); MemstoreSizeCostFunction 
: (5.0, 0.08132131691669181); StoreFileCostFunction : (5.0, 
0.02054620605193966); computedMaxSteps: 100}}

{{2021-03-30 15:48:13,385 DEBUG [49187_ChoreService_3] 
balancer.StochasticLoadBalancer - Finished computing new load balance plan. 
Computation took 30004ms to try 6571 different iterations. Found a solution 
that moves 1095 regions; Going from a computed cost of 12.91377229840024 to a 
new cost of 4.804625730746651}}

  was:
After OffPeakHours fix for MoveCostFunction (HBASE-24709), MoveCostFunction is 
no longer included in costFunctions list. {{addCostFunction}} expects 
multiplier to be non-zero, but multiplier is now only set in {{cost}} function.

As a result, {{hbase.master.balancer.stochastic.maxMovePercent}} is not 
respected, and there is no cost function to oppose a move. Any move that 
decreases total cost at all will be accepted, causing more churn and disruption 
from balancer executions.

We noticed this when investigating a case where the balancer would run after a 
regionserver was restarted without use of region_mover script. The regionserver 
comes online with 0 regions, leading to a shortcut in {{needsBalance}} for 
{{idleRegionServerExist}}. The balancer runs to move regions to that newly 
restarted regionserver. However, it moves a large number of regions in the 
cluster, hyper-optimizing the other cost variables. There were ~4300 regions in 
the cluster at the time, so moving 25% of the regions should have had a final 
cost of at least 7 (default moveCostFunction weight.) MoveCostFunction is also 
not listed in the functions contributing to the initial cost.

{{2021-03-30 15:47:43,396 INFO [49187_ChoreService_3] 
balancer.StochasticLoadBalancer - start StochasticLoadBalancer.balancer, 
initCost=12.91377229840024, functionCost=RegionCountSkewCostFunction : (500.0, 
0.014878672009326464); TableSkewCostFunction : (35.0, 0.013600280177445717); 
RegionReplicaHostCostFunction : (10.0, 0.0); RegionReplicaRackCostFunction 
: (1.0, 0.0); ReadRequestCostFunction : (5.0, 0.8296332203204705); 
WriteRequestCostFunction : (5.0, 0.06818455421617946); MemstoreSizeCostFunction 
: (5.0, 0.08132131691669181); StoreFileCostFunction : (5.0, 
0.02054620605193966); computedMaxSteps: 100}}

{{2021-03-30 15:48:13,385 DEBUG [49187_ChoreService_3] 
balancer.StochasticLoadBalancer - Finished computing new load balance plan. 
Computation took 30004ms to try 6571 different iterations. Found a solution 
that moves 1095 regions; Going from a computed cost of 12.91377229840024 to a 
new cost of 4.804625730746651}}

[jira] [Created] (HBASE-25726) MoveCostFunction is not included in the list of cost functions for StochasticLoadBalancer

2021-04-01 Thread David Manning (Jira)
David Manning created HBASE-25726:
-

 Summary: MoveCostFunction is not included in the list of cost 
functions for StochasticLoadBalancer
 Key: HBASE-25726
 URL: https://issues.apache.org/jira/browse/HBASE-25726
 Project: HBase
  Issue Type: Bug
  Components: Balancer
Affects Versions: 2.4.0, 2.3.1, 3.0.0-alpha-1, 1.7.0
Reporter: David Manning


After OffPeakHours fix for MoveCostFunction (HBASE-24709), MoveCostFunction is 
no longer included in costFunctions list. {{addCostFunction}} expects 
multiplier to be non-zero, but multiplier is now only set in {{cost}} function.

As a result, {{hbase.master.balancer.stochastic.maxMovePercent}} is not 
respected, and there is no cost function to oppose a move. Any move that 
decreases total cost at all will be accepted, causing more churn and disruption 
from balancer executions.

We noticed this when investigating a case where the balancer would run after a 
regionserver was restarted without use of region_mover script. The regionserver 
comes online with 0 regions, leading to a shortcut in {{needsBalance}} for 
{{idleRegionServerExist}}. The balancer runs to move regions to that newly 
restarted regionserver. However, it moves a large number of regions in the 
cluster, hyper-optimizing the other cost variables. There were ~4300 regions in 
the cluster at the time, so moving 25% of the regions should have had a final 
cost of at least 7 (default moveCostFunction weight.) MoveCostFunction is also 
not listed in the functions contributing to the initial cost.

{{2021-03-30 15:47:43,396 INFO [49187_ChoreService_3] 
balancer.StochasticLoadBalancer - start StochasticLoadBalancer.balancer, 
initCost=12.91377229840024, functionCost=RegionCountSkewCostFunction : (500.0, 
0.014878672009326464); TableSkewCostFunction : (35.0, 0.013600280177445717); 
RegionReplicaHostCostFunction : (10.0, 0.0); RegionReplicaRackCostFunction 
: (1.0, 0.0); ReadRequestCostFunction : (5.0, 0.8296332203204705); 
WriteRequestCostFunction : (5.0, 0.06818455421617946); MemstoreSizeCostFunction 
: (5.0, 0.08132131691669181); StoreFileCostFunction : (5.0, 
0.02054620605193966); computedMaxSteps: 100}}

{{2021-03-30 15:48:13,385 DEBUG [49187_ChoreService_3] 
balancer.StochasticLoadBalancer - Finished computing new load balance plan. 
Computation took 30004ms to try 6571 different iterations. Found a solution 
that moves 1095 regions; Going from a computed cost of 12.91377229840024 to a 
new cost of 4.804625730746651}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25648) Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after HBASE-25592 HBASE-23932

2021-03-08 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-25648:
--
Summary: Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 
after HBASE-25592 HBASE-23932  (was: Fix normalizer and 
TestSimpleRegionNormalizerOnCluster in branch-1 after HBASE-25592 HABSE-23932)

> Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after 
> HBASE-25592 HBASE-23932
> 
>
> Key: HBASE-25648
> URL: https://issues.apache.org/jira/browse/HBASE-25648
> Project: HBase
>  Issue Type: Bug
>  Components: Normalizer
>Affects Versions: 1.7.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Major
>
> On branch-1 run
> {{mvn test -Dtest=TestSimpleRegionNormalizerOnCluster}}
> It fails. It appears to be due to some problems in the refactoring related to 
> HBASE-25592 and HBASE-23932.
>  
> {code:java}
> [INFO] Running 
> org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 
> 131.753 s <<< FAILURE! - in 
> org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster
> [ERROR] 
> testRegionNormalizationSplitOnCluster(org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster)
>   Time elapsed: 60.107 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 6 
> milliseconds
>   at 
> org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster.testRegionNormalizationSplitOnCluster(TestSimpleRegionNormalizerOnCluster.java:132)
> [ERROR] 
> testRegionNormalizationMergeOnCluster(org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster)
>   Time elapsed: 60.117 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 6 
> milliseconds
>   at 
> org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster.testRegionNormalizationMergeOnCluster(TestSimpleRegionNormalizerOnCluster.java:199)
> [INFO]
> [INFO] Results:
> [INFO]
> [ERROR] Errors:
> [ERROR]   
> TestSimpleRegionNormalizerOnCluster.testRegionNormalizationMergeOnCluster:199 
> » TestTimedOut
> [ERROR]   
> TestSimpleRegionNormalizerOnCluster.testRegionNormalizationSplitOnCluster:132 
> TestTimedOut
> [INFO]
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25648) Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after HBASE-25592 HABSE-23932

2021-03-08 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-25648:
--
Description: 
On branch-1 run

{{mvn test -Dtest=TestSimpleRegionNormalizerOnCluster}}

It fails. It appears to be due to some problems in the refactoring related to 
HBASE-25592 and HBASE-23932.

 
{code:java}
[INFO] Running 
org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 131.753 
s <<< FAILURE! - in 
org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster
[ERROR] 
testRegionNormalizationSplitOnCluster(org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster)
  Time elapsed: 60.107 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 6 
milliseconds
at 
org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster.testRegionNormalizationSplitOnCluster(TestSimpleRegionNormalizerOnCluster.java:132)

[ERROR] 
testRegionNormalizationMergeOnCluster(org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster)
  Time elapsed: 60.117 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 6 
milliseconds
at 
org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster.testRegionNormalizationMergeOnCluster(TestSimpleRegionNormalizerOnCluster.java:199)

[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   
TestSimpleRegionNormalizerOnCluster.testRegionNormalizationMergeOnCluster:199 » 
TestTimedOut
[ERROR]   
TestSimpleRegionNormalizerOnCluster.testRegionNormalizationSplitOnCluster:132 
TestTimedOut
[INFO]
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0
{code}

  was:
On branch-1 run

{{mvn test -Dtest=TestSimpleRegionNormalizerOnCluster}}

It fails. It appears to be due to some problems in the refactoring related to 
HBASE-25592 and HBASE-23932.


> Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after 
> HBASE-25592 HABSE-23932
> 
>
> Key: HBASE-25648
> URL: https://issues.apache.org/jira/browse/HBASE-25648
> Project: HBase
>  Issue Type: Bug
>  Components: Normalizer
>Affects Versions: 1.7.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Major
>
> On branch-1 run
> {{mvn test -Dtest=TestSimpleRegionNormalizerOnCluster}}
> It fails. It appears to be due to some problems in the refactoring related to 
> HBASE-25592 and HBASE-23932.
>  
> {code:java}
> [INFO] Running 
> org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 
> 131.753 s <<< FAILURE! - in 
> org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster
> [ERROR] 
> testRegionNormalizationSplitOnCluster(org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster)
>   Time elapsed: 60.107 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 6 
> milliseconds
>   at 
> org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster.testRegionNormalizationSplitOnCluster(TestSimpleRegionNormalizerOnCluster.java:132)
> [ERROR] 
> testRegionNormalizationMergeOnCluster(org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster)
>   Time elapsed: 60.117 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 6 
> milliseconds
>   at 
> org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster.testRegionNormalizationMergeOnCluster(TestSimpleRegionNormalizerOnCluster.java:199)
> [INFO]
> [INFO] Results:
> [INFO]
> [ERROR] Errors:
> [ERROR]   
> TestSimpleRegionNormalizerOnCluster.testRegionNormalizationMergeOnCluster:199 
> » TestTimedOut
> [ERROR]   
> TestSimpleRegionNormalizerOnCluster.testRegionNormalizationSplitOnCluster:132 
> TestTimedOut
> [INFO]
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25648) Fix normalizer and TestSimpleRegionNormalizerOnCluster in branch-1 after HBASE-25592 HABSE-23932

2021-03-08 Thread David Manning (Jira)
David Manning created HBASE-25648:
-

 Summary: Fix normalizer and TestSimpleRegionNormalizerOnCluster in 
branch-1 after HBASE-25592 HABSE-23932
 Key: HBASE-25648
 URL: https://issues.apache.org/jira/browse/HBASE-25648
 Project: HBase
  Issue Type: Bug
  Components: Normalizer
Affects Versions: 1.7.0
Reporter: David Manning
Assignee: David Manning


On branch-1 run

{{mvn test -Dtest=TestSimpleRegionNormalizerOnCluster}}

It fails. It appears to be due to some problems in the refactoring related to 
HBASE-25592 and HBASE-23932.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate resource distribution

2021-03-05 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295860#comment-17295860
 ] 

David Manning commented on HBASE-25625:
---

I'm excited about working towards a balancer that works better for large 
clusters! Thanks for proposing changes in that direction.

I agree that the TableSkewCostFunction seems limited in its current form of 
only tracking the max regions on any given server.

For the other cost functions, I'm having a hard time working through the math 
and seeing the benefit, though. For example, if I take an 11-node cluster with 
100 regions per server on average:

100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100

And one node goes down, then I see:

110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 0

With sum of deviation (old computation), it is (110 - 100) * 10 + (100 - 0) * 1 
= 200. Max deviation would be 1100 regions on one server, for (100 - 0) * 10 + 
(1100 - 100) * 1 = 2000. So the scaled cost would be 200 / 2000 = 0.1.

With stdev (new computation), it also gives a scaled cost of 0.1. stdev = 
sqrt(((110 - 100) ^ 2 * 10 + (0 - 100) ^ 2 * 1) / 11) = sqrt(1000). Maximum 
possible stdev = sqrt(((0 - 100) ^ 2 * 10 + (1100 - 100) ^ 2 * 1) / 11) = 
sqrt(100000).

If another server goes down and its regions are distributed round-robin, the 
cluster state would look like:

121, 121, 121, 121, 121, 121, 121, 121, 121, 0, 11

If I did the math right, then I see:

old computation: 378 / 2000 = 0.189

new computation: 0.140

So the stdev-based calculation is less likely to balance in these scenarios.

How big does the cluster have to get to benefit from the new calculations? I 
tried 100 nodes with 1000 regions per node. One node at 0 results in 0.01 cost 
in both old and new calculations. Two nodes down (assuming round-robin 
balancing again), gives me 0.019 for the old calculation and 0.014 for the new 
stdev calculation.
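
A small self-contained sketch of the arithmetic above (an illustration of the two scaling schemes, not the actual CostFunction code) that reproduces the numbers for the two 11-node scenarios:

{code:java}
// Standalone re-computation of the two 11-node scenarios discussed above.
// This is illustrative arithmetic, not the actual CostFunction implementation.
public class SkewCostComparison {

  // Old style: sum of |count - mean|, scaled by the worst case
  // (all regions piled on a single server).
  static double sumOfDeviationCost(int[] counts) {
    double mean = mean(counts);
    double total = 0, sum = 0;
    for (int c : counts) {
      sum += Math.abs(c - mean);
      total += c;
    }
    double max = (total - mean) + (counts.length - 1) * mean; // worst case
    return sum / max;
  }

  // Proposed style: standard deviation, scaled by the worst-case stdev.
  static double stdevCost(int[] counts) {
    double mean = mean(counts);
    double total = 0, var = 0;
    for (int c : counts) {
      var += (c - mean) * (c - mean);
      total += c;
    }
    double maxVar = (total - mean) * (total - mean)
        + (counts.length - 1) * mean * mean;
    return Math.sqrt(var / counts.length) / Math.sqrt(maxVar / counts.length);
  }

  static double mean(int[] counts) {
    double sum = 0;
    for (int c : counts) {
      sum += c;
    }
    return sum / counts.length;
  }

  public static void main(String[] args) {
    int[] oneDown = {110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 0};
    int[] twoDown = {121, 121, 121, 121, 121, 121, 121, 121, 121, 0, 11};
    System.out.println(sumOfDeviationCost(oneDown)); // ~0.1
    System.out.println(stdevCost(oneDown));          // ~0.1
    System.out.println(sumOfDeviationCost(twoDown)); // ~0.189
    System.out.println(stdevCost(twoDown));          // ~0.14
  }
}
{code}

Running it prints roughly 0.1, 0.1, 0.189, and 0.14, matching the figures above.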

> StochasticBalancer CostFunctions needs a better way to evaluate resource 
> distribution
> -
>
> Key: HBASE-25625
> URL: https://issues.apache.org/jira/browse/HBASE-25625
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer, master
>Reporter: Clara Xiong
>Assignee: Clara Xiong
>Priority: Major
>
> Currently CostFunctions including RegionCountSkewCostFunctions, 
> PrimaryRegionCountSkewCostFunctions and all load cost functions calculate the 
> unevenness of the distribution by getting the sum of deviation per region 
> server. This simple implementation works when the cluster is small. But when 
> the cluster get larger with more region servers and regions, it doesn't work 
> well with hot spots or a small number of unbalanced servers. The proposal is 
> to use the standard deviation of the count per region server to capture the 
> existence of a small portion of region servers with overwhelming 
> load/allocation.
> TableSkewCostFunction uses the sum of the max deviation region per server for 
> all tables as the measure of unevenness. It doesn't work in a very common 
> scenario in operations. Say we have 100 regions on 50 nodes, two on each. We 
> add 50 new nodes and they have 0 each. The max deviation from the mean is 1, 
> compared to 99 in the worst case scenario of 100 regions on a single server. 
> The normalized cost is 1/99 = 0.011 < default threshold of 0.05. Balancer 
> wouldn't move.  The proposal is to use the standard deviation of the count 
> per region server to detect this scenario, generating a cost of 3.1/31 = 0.1 
> in this case.
> Patch is in test and will follow shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24657) JsonBean representation of metrics at /jmx endpoint now quotes all numbers

2020-06-29 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-24657:
--
Fix Version/s: 1.6.0
   1.4.14
   1.3.7
   Status: Patch Available  (was: In Progress)

> JsonBean representation of metrics at /jmx endpoint now quotes all numbers
> --
>
> Key: HBASE-24657
> URL: https://issues.apache.org/jira/browse/HBASE-24657
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.4.11, 1.3.6, 1.6.0, 1.5.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
> Fix For: 1.3.7, 1.4.14, 1.6.0
>
>
> HBASE-20571 had a fix to look for NaN or Infinity in numbers, and to quote 
> those as strings. The order of the `if-else` block is different in branch-1 
> (https://github.com/apache/hbase/commit/2d493556f3c8ae87fb92422b525bf7c9345e6ccd)
>  and branch-2 
> (https://github.com/apache/hbase/commit/39ea1efa885e2f27f41af59228e0a12c4ded08f8)
> HBASE-23015 changed the JsonBean.java code in a meaningful way, and the order 
> of the changes was consistent between branch-1 
> ([https://github.com/apache/hbase/commit/f77c14d18150f55ee892f8d24a5ee231c1ae7e20#diff-87e9e2722b9210eebfd8c820c5d72a46L319-L324])
>  and branch-2 
> ([https://github.com/apache/hbase/commit/761aef6d9d0b8a455842de4d5eac7d9486f00633#diff-2c8f5dd222141c69112c5c5b5f70cf55R319-R324])
>  Unfortunately, they need to be reversed since the order is different between 
> branch-1 and branch-2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24657) JsonBean representation of metrics at /jmx endpoint now quotes all numbers

2020-06-29 Thread David Manning (Jira)
David Manning created HBASE-24657:
-

 Summary: JsonBean representation of metrics at /jmx endpoint now 
quotes all numbers
 Key: HBASE-24657
 URL: https://issues.apache.org/jira/browse/HBASE-24657
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 1.4.11, 1.3.6, 1.6.0, 1.5.0
Reporter: David Manning
Assignee: David Manning


HBASE-20571 had a fix to look for NaN or Infinity in numbers, and to quote 
those as strings. The order of the `if-else` block is different in branch-1 
(https://github.com/apache/hbase/commit/2d493556f3c8ae87fb92422b525bf7c9345e6ccd)
 and branch-2 
(https://github.com/apache/hbase/commit/39ea1efa885e2f27f41af59228e0a12c4ded08f8)

HBASE-23015 changed the JsonBean.java code in a meaningful way, and the order 
of the changes was consistent between branch-1 
([https://github.com/apache/hbase/commit/f77c14d18150f55ee892f8d24a5ee231c1ae7e20#diff-87e9e2722b9210eebfd8c820c5d72a46L319-L324])
 and branch-2 
([https://github.com/apache/hbase/commit/761aef6d9d0b8a455842de4d5eac7d9486f00633#diff-2c8f5dd222141c69112c5c5b5f70cf55R319-R324])
 Unfortunately, they need to be reversed since the order is different between 
branch-1 and branch-2.
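
A contrived, hypothetical illustration of the ordering hazard (the method and branches below are invented for this sketch and are not the real JsonBean.java code): if the same change lands on an if-else chain whose branches are in the opposite order, the quoting behavior flips for ordinary numbers.

{code:java}
// Hypothetical illustration of the ordering hazard described above; the
// methods and branch structure are invented for this sketch and are not the
// real JsonBean.java code.
public class QuotingOrderSketch {

  // Order A: special-case NaN/Infinity first, then emit plain numbers.
  static String writeA(Object value) {
    if (value instanceof Double && !Double.isFinite((Double) value)) {
      return "\"" + value + "\"";      // quote NaN / Infinity
    } else if (value instanceof Number) {
      return value.toString();         // plain, unquoted number
    } else {
      return "\"" + value + "\"";
    }
  }

  // Order B: the same branches, but the generic branch comes first. Applying
  // a patch written against order A on top of order B quotes every number.
  static String writeB(Object value) {
    if (value instanceof Number) {
      return "\"" + value + "\"";      // patch landed on the wrong branch
    } else if (value instanceof Double && !Double.isFinite((Double) value)) {
      return value.toString();
    } else {
      return "\"" + value + "\"";
    }
  }

  public static void main(String[] args) {
    System.out.println(writeA(42L));        // 42
    System.out.println(writeA(Double.NaN)); // "NaN"
    System.out.println(writeB(42L));        // "42"  <- every number quoted
  }
}
{code}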



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HBASE-24657) JsonBean representation of metrics at /jmx endpoint now quotes all numbers

2020-06-29 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-24657 started by David Manning.
-
> JsonBean representation of metrics at /jmx endpoint now quotes all numbers
> --
>
> Key: HBASE-24657
> URL: https://issues.apache.org/jira/browse/HBASE-24657
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.5.0, 1.6.0, 1.3.6, 1.4.11
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
>
> HBASE-20571 had a fix to look for NaN or Infinity in numbers, and to quote 
> those as strings. The order of the `if-else` block is different in branch-1 
> (https://github.com/apache/hbase/commit/2d493556f3c8ae87fb92422b525bf7c9345e6ccd)
>  and branch-2 
> (https://github.com/apache/hbase/commit/39ea1efa885e2f27f41af59228e0a12c4ded08f8)
> HBASE-23015 changed the JsonBean.java code in a meaningful way, and the order 
> of the changes was consistent between branch-1 
> ([https://github.com/apache/hbase/commit/f77c14d18150f55ee892f8d24a5ee231c1ae7e20#diff-87e9e2722b9210eebfd8c820c5d72a46L319-L324])
>  and branch-2 
> ([https://github.com/apache/hbase/commit/761aef6d9d0b8a455842de4d5eac7d9486f00633#diff-2c8f5dd222141c69112c5c5b5f70cf55R319-R324])
>  Unfortunately, they need to be reversed since the order is different between 
> branch-1 and branch-2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-24099) Use a fair ReentrantReadWriteLock for the region close lock

2020-04-06 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076842#comment-17076842
 ] 

David Manning edited comment on HBASE-24099 at 4/7/20, 2:52 AM:


Yep, understood. I was just hoping someone could explain why numbers would 
consistently get faster with a fair lock policy... or even why a fair lock 
would get much slower when there is no thread waiting for a writer lock.

So it's a +1 from me, for whatever that's worth.


was (Author: dmanning):
Yep, understood. I was just hoping someone could explain why numbers would 
consistently get faster with a new lock policy... or even why they would get 
much slower given the fairness when there is no thread waiting for a writer 
lock.

So it's a +1 from me, for whatever that's worth.

> Use a fair ReentrantReadWriteLock for the region close lock
> ---
>
> Key: HBASE-24099
> URL: https://issues.apache.org/jira/browse/HBASE-24099
> Project: HBase
>  Issue Type: Improvement
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 3.0.0, 2.3.1, 1.3.7, 1.7.0, 2.4.0, 2.1.10, 1.4.14, 2.2.5
>
> Attachments: ltt_results.pdf, pe_results.pdf, ycsb_results.pdf
>
>
> Consider creating the region's ReentrantReadWriteLock with the fair locking 
> policy. We have had a couple of production incidents where a regionserver 
> stalled in shutdown for a very very long time, leading to RIT (FAILED_CLOSE). 
> The latest example is a 43 minute shutdown, ~40 minutes (2465280 ms) of that 
> time was spent waiting to acquire the write lock on the region in order to 
> finish closing it.
> {quote}
> ...
> Finished memstore flush of ~66.92 MB/70167112, currentsize=0 B/0 for region 
> . in 927ms, sequenceid=6091133815, compaction requested=false at 
> 1585175635349 (+60 ms)
> Disabling writes for close at 1585178100629 (+2465280 ms)
> {quote}
> This time was spent in between the memstore flush and the task status change 
> "Disabling writes for close at...". This is at HRegion.java:1481 in 1.3.6:
> {code}
> 1480:   // block waiting for the lock for closing
> 1481:  lock.writeLock().lock(); // FindBugs: Complains 
> UL_UNRELEASED_LOCK_EXCEPTION_PATH but seems fine
> {code}
>  
> The close lock is operating in unfair mode. The table in question is under 
> constant high query load. When the close request was received, there were 
> active readers. After the close request there were more active readers, 
> near-continuous contention. Although the clients would receive 
> RegionServerStoppingException and other error notifications, because the 
> region could not be reassigned, they kept coming, region (re-)location would 
> find the region still hosted on the stuck server. Finally the closing thread 
> waiting for the write lock became no longer starved (by chance) after 40 
> minutes.
> The ReentrantReadWriteLock javadoc is clear about the possibility of 
> starvation when continuously contended: "_When constructed as non-fair (the 
> default), the order of entry to the read and write lock is unspecified, 
> subject to reentrancy constraints. A nonfair lock that is continuously 
> contended may indefinitely postpone one or more reader or writer threads, but 
> will normally have higher throughput than a fair lock._"
> We could try changing the acquisition semantics of this lock to fair. This is 
> a one line change, where we call the RW lock constructor. Then:
>  "_When constructed as fair, threads contend for entry using an approximately 
> arrival-order policy. When the currently held lock is released, either the 
> longest-waiting single writer thread will be assigned the write lock, or if 
> there is a group of reader threads waiting longer than all waiting writer 
> threads, that group will be assigned the read lock._" 
> This could be better. The close process will have to wait until all readers 
> and writers already waiting for acquisition either acquire and release or go 
> away but won't be starved by future/incoming requests.
> There could be a throughput loss in request handling, though, because this is 
> the global reentrant RW lock for the region. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24099) Use a fair ReentrantReadWriteLock for the region close lock

2020-04-06 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076842#comment-17076842
 ] 

David Manning commented on HBASE-24099:
---

Yep, understood. I was just hoping someone could explain why numbers would 
consistently get faster with a new lock policy... or even why they would get 
much slower given the fairness when there is no thread waiting for a writer 
lock.

So it's a +1 from me, for whatever that's worth.

> Use a fair ReentrantReadWriteLock for the region close lock
> ---
>
> Key: HBASE-24099
> URL: https://issues.apache.org/jira/browse/HBASE-24099
> Project: HBase
>  Issue Type: Improvement
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 3.0.0, 2.3.1, 1.3.7, 1.7.0, 2.4.0, 2.1.10, 1.4.14, 2.2.5
>
> Attachments: ltt_results.pdf, pe_results.pdf, ycsb_results.pdf
>
>
> Consider creating the region's ReentrantReadWriteLock with the fair locking 
> policy. We have had a couple of production incidents where a regionserver 
> stalled in shutdown for a very very long time, leading to RIT (FAILED_CLOSE). 
> The latest example is a 43 minute shutdown, ~40 minutes (2465280 ms) of that 
> time was spent waiting to acquire the write lock on the region in order to 
> finish closing it.
> {quote}
> ...
> Finished memstore flush of ~66.92 MB/70167112, currentsize=0 B/0 for region 
> . in 927ms, sequenceid=6091133815, compaction requested=false at 
> 1585175635349 (+60 ms)
> Disabling writes for close at 1585178100629 (+2465280 ms)
> {quote}
> This time was spent in between the memstore flush and the task status change 
> "Disabling writes for close at...". This is at HRegion.java:1481 in 1.3.6:
> {code}
> 1480:   // block waiting for the lock for closing
> 1481:  lock.writeLock().lock(); // FindBugs: Complains 
> UL_UNRELEASED_LOCK_EXCEPTION_PATH but seems fine
> {code}
>  
> The close lock is operating in unfair mode. The table in question is under 
> constant high query load. When the close request was received, there were 
> active readers. After the close request there were more active readers, 
> near-continuous contention. Although the clients would receive 
> RegionServerStoppingException and other error notifications, because the 
> region could not be reassigned, they kept coming, region (re-)location would 
> find the region still hosted on the stuck server. Finally the closing thread 
> waiting for the write lock became no longer starved (by chance) after 40 
> minutes.
> The ReentrantReadWriteLock javadoc is clear about the possibility of 
> starvation when continuously contended: "_When constructed as non-fair (the 
> default), the order of entry to the read and write lock is unspecified, 
> subject to reentrancy constraints. A nonfair lock that is continuously 
> contended may indefinitely postpone one or more reader or writer threads, but 
> will normally have higher throughput than a fair lock._"
> We could try changing the acquisition semantics of this lock to fair. This is 
> a one line change, where we call the RW lock constructor. Then:
>  "_When constructed as fair, threads contend for entry using an approximately 
> arrival-order policy. When the currently held lock is released, either the 
> longest-waiting single writer thread will be assigned the write lock, or if 
> there is a group of reader threads waiting longer than all waiting writer 
> threads, that group will be assigned the read lock._" 
> This could be better. The close process will have to wait until all readers 
> and writers already waiting for acquisition either acquire and release or go 
> away but won't be starved by future/incoming requests.
> There could be a throughput loss in request handling, though, because this is 
> the global reentrant RW lock for the region. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24099) Use a fair ReentrantReadWriteLock for the region close lock

2020-04-06 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076799#comment-17076799
 ] 

David Manning commented on HBASE-24099:
---

I am not an expert on the locks, but looking through the code I only found two 
cases where the {{writeLock}} is taken: {{startBulkRegionOperation}} and 
{{doClose}}. I'm guessing very few of those operations happen during the 
performance tests. As a result, I'd expect not to see much overhead from 
enforcing fairness. Most lock acquisitions should be read-only, so contention 
should be minimal. If that's true, it could explain why a lot of the ~10% 
changes are just normal variance.

Put another way, there should be absolutely no reason why some read cases get 
faster with a fair lock pattern... right? So that seems to suggest a variance 
level around ~10%.

All of this makes me feel pretty good about the performance results not showing 
a regression.
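
For reference, the fair vs. non-fair choice under discussion is just the boolean constructor argument on java.util.concurrent.locks.ReentrantReadWriteLock; a minimal sketch of the one-line difference:

{code:java}
// Minimal sketch of the one-line difference under discussion: the fairness
// policy is chosen at construction time of the JDK lock.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockSketch {
  public static void main(String[] args) {
    ReentrantReadWriteLock nonFair = new ReentrantReadWriteLock();  // default: unfair
    ReentrantReadWriteLock fair = new ReentrantReadWriteLock(true); // approximate arrival order

    // Typical usage pattern around a critical section; with the fair lock a
    // waiting writer is no longer starved by a continuous stream of readers.
    fair.readLock().lock();
    try {
      // read-side work
    } finally {
      fair.readLock().unlock();
    }

    fair.writeLock().lock();
    try {
      // write-side work (e.g. region close)
    } finally {
      fair.writeLock().unlock();
    }

    System.out.println("nonFair.isFair() = " + nonFair.isFair()
        + ", fair.isFair() = " + fair.isFair());
  }
}
{code}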

> Use a fair ReentrantReadWriteLock for the region close lock
> ---
>
> Key: HBASE-24099
> URL: https://issues.apache.org/jira/browse/HBASE-24099
> Project: HBase
>  Issue Type: Improvement
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 3.0.0, 2.3.1, 1.3.7, 1.7.0, 2.4.0, 2.1.10, 1.4.14, 2.2.5
>
> Attachments: ltt_results.pdf, pe_results.pdf, ycsb_results.pdf
>
>
> Consider creating the region's ReentrantReadWriteLock with the fair locking 
> policy. We have had a couple of production incidents where a regionserver 
> stalled in shutdown for a very very long time, leading to RIT (FAILED_CLOSE). 
> The latest example is a 43 minute shutdown, ~40 minutes (2465280 ms) of that 
> time was spent waiting to acquire the write lock on the region in order to 
> finish closing it.
> {quote}
> ...
> Finished memstore flush of ~66.92 MB/70167112, currentsize=0 B/0 for region 
> . in 927ms, sequenceid=6091133815, compaction requested=false at 
> 1585175635349 (+60 ms)
> Disabling writes for close at 1585178100629 (+2465280 ms)
> {quote}
> This time was spent in between the memstore flush and the task status change 
> "Disabling writes for close at...". This is at HRegion.java:1481 in 1.3.6:
> {code}
> 1480:   // block waiting for the lock for closing
> 1481:  lock.writeLock().lock(); // FindBugs: Complains 
> UL_UNRELEASED_LOCK_EXCEPTION_PATH but seems fine
> {code}
>  
> The close lock is operating in unfair mode. The table in question is under 
> constant high query load. When the close request was received, there were 
> active readers. After the close request there were more active readers, 
> near-continuous contention. Although the clients would receive 
> RegionServerStoppingException and other error notifications, because the 
> region could not be reassigned, they kept coming, region (re-)location would 
> find the region still hosted on the stuck server. Finally the closing thread 
> waiting for the write lock became no longer starved (by chance) after 40 
> minutes.
> The ReentrantReadWriteLock javadoc is clear about the possibility of 
> starvation when continuously contended: "_When constructed as non-fair (the 
> default), the order of entry to the read and write lock is unspecified, 
> subject to reentrancy constraints. A nonfair lock that is continuously 
> contended may indefinitely postpone one or more reader or writer threads, but 
> will normally have higher throughput than a fair lock._"
> We could try changing the acquisition semantics of this lock to fair. This is 
> a one line change, where we call the RW lock constructor. Then:
>  "_When constructed as fair, threads contend for entry using an approximately 
> arrival-order policy. When the currently held lock is released, either the 
> longest-waiting single writer thread will be assigned the write lock, or if 
> there is a group of reader threads waiting longer than all waiting writer 
> threads, that group will be assigned the read lock._" 
> This could be better. The close process will have to wait until all readers 
> and writers already waiting for acquisition either acquire and release or go 
> away but won't be starved by future/incoming requests.
> There could be a throughput loss in request handling, though, because this is 
> the global reentrant RW lock for the region. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23372) ZooKeeper Assignment can result in stale znodes in region-in-transition after table is dropped and hbck run

2019-12-05 Thread David Manning (Jira)
David Manning created HBASE-23372:
-

 Summary: ZooKeeper Assignment can result in stale znodes in 
region-in-transition after table is dropped and hbck run
 Key: HBASE-23372
 URL: https://issues.apache.org/jira/browse/HBASE-23372
 Project: HBase
  Issue Type: Bug
  Components: hbck, master, Region Assignment, Zookeeper
Affects Versions: 1.3.2
Reporter: David Manning


It is possible for znodes under /hbase/region-in-transition to remain long 
after a table is deleted. There does not appear to be any cleanup logic for 
these.

The details are a little fuzzy, but it seems to be fallout from HBASE-22617. 
Incidents related to that bug involved regions stuck in transition, and use of 
hbck to fix clusters. There was a temporary table created and deleted once per 
day, but somehow it led to receiving 
{{FSLimitException$MaxDirectoryItemsExceededException}} and regions stuck in 
transition. Even weeks after fixing the bug and upgrading the cluster, the 
znodes remain under /hbase/region-in-transition. In the most impacted cluster, 
{{hbase zkcli ls /hbase/region-in-transition | wc -w}} returns almost 100,000 
entries. This causes very slow region transition times (often 80 seconds), 
likely due to enumerating all these entries when zk watch on this node is 
triggered.

Log lines for slow region transitions:
{code:java}
2019-12-05 07:02:14,714 DEBUG [K.Worker-pool3-t7344] master.AssignmentManager - 
Handling RS_ZK_REGION_CLOSED, server=<>, region=<>, 
which is more than 15 seconds late, current_state={<> 
state=PENDING_CLOSE, ts=1575529254635, server=<>}
{code}
Even during hmaster failover, entries are not cleaned, but the following log 
lines can be seen:
{code:java}
2019-11-27 00:26:27,044 WARN  [.activeMasterManager] master.AssignmentManager - 
Couldn't find the region in recovering region=<>, 
state=RS_ZK_REGION_FAILED_OPEN, servername=<>, 
createTime=1565603905404, payload.length=0
{code}
Possible solutions:
 # Logic to parse the RIT znodes during master failover and check whether the 
table still exists. Clean up entries for nonexistent tables.
 # New mode for hbck to do cleanup of nonexistent regions under the znode.
 # Others?
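
A rough sketch of what such a cleanup pass might look like against the plain ZooKeeper client (the connection string, the table-existence check, and the class name are placeholders, not actual hbck or HBase code):

{code:java}
// Hedged sketch only: one way a cleanup tool could prune stale
// region-in-transition znodes for tables that no longer exist.
// Connection string and the table-existence check are placeholders.
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class StaleRitZnodeCleaner {
  public static void main(String[] args) throws Exception {
    // Quorum string and session timeout are example values.
    ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000, event -> { });
    String ritPath = "/hbase/region-in-transition";
    List<String> encodedRegionNames = zk.getChildren(ritPath, false);
    for (String encodedName : encodedRegionNames) {
      // tableStillExists() is deliberately left abstract; a real tool would
      // resolve the region to its table via hbase:meta or the Admin API.
      if (!tableStillExists(encodedName)) {
        zk.delete(ritPath + "/" + encodedName, -1); // -1 = any znode version
      }
    }
    zk.close();
  }

  private static boolean tableStillExists(String encodedRegionName) {
    return true; // placeholder
  }
}
{code}

Either of the first two options would need essentially this loop, plus a reliable way to map an encoded region name back to its table.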



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23153) PrimaryRegionCountSkewCostFunction SLB function should implement CostFunction#isNeeded

2019-10-11 Thread David Manning (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949852#comment-16949852
 ] 

David Manning commented on HBASE-23153:
---

Thanks [~apurtell] for doing literally all the work. I made a comment on the 
github PR about keeping the {{cost}} method as is. Otherwise LGTM

> PrimaryRegionCountSkewCostFunction SLB function should implement 
> CostFunction#isNeeded
> --
>
> Key: HBASE-23153
> URL: https://issues.apache.org/jira/browse/HBASE-23153
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 1.6.0, 2.2.2, 2.1.8, 1.5.1
>
>
> The PrimaryRegionCountSkewCostFunction SLB function should implement 
> CostFunction#isNeeded and like the other region replica specific functions 
> should return false for it when region replicas are not in use. Otherwise it 
> will always report a cost of 0 even though its weight will be included in the 
> sum of the weights. 
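
A hypothetical sketch of the opt-out being described (class, field, and method bodies below are illustrative stand-ins, not the exact HBase source):

{code:java}
// Hypothetical sketch of the isNeeded() opt-out being discussed; class and
// field names here are illustrative stand-ins, not the exact HBase source.
public class IsNeededSketch {
  static abstract class CostFunction {
    float multiplier = 500f;
    boolean isNeeded() { return true; }       // default: participate
    abstract double cost();
  }

  static class PrimaryRegionCountSkew extends CostFunction {
    final boolean hasRegionReplicas;
    PrimaryRegionCountSkew(boolean hasRegionReplicas) {
      this.hasRegionReplicas = hasRegionReplicas;
    }
    @Override boolean isNeeded() { return hasRegionReplicas; } // opt out when unused
    @Override double cost() { return 0.0d; }  // always 0 without replicas
  }

  public static void main(String[] args) {
    CostFunction f = new PrimaryRegionCountSkew(false);
    // A combiner that skips !isNeeded() functions avoids adding this weight
    // to the sum of weights for a function that can only ever report 0.
    System.out.println("include in weighted sum? " + f.isNeeded());
  }
}
{code}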



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-22935) TaskMonitor warns MonitoredRPCHandler task may be stuck when it recently started

2019-08-27 Thread David Manning (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Manning updated HBASE-22935:
--
Status: Patch Available  (was: Open)

> TaskMonitor warns MonitoredRPCHandler task may be stuck when it recently 
> started
> 
>
> Key: HBASE-22935
> URL: https://issues.apache.org/jira/browse/HBASE-22935
> Project: HBase
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 2.0.0, 1.3.3, 1.4.0, 3.0.0, 1.5.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
> Attachments: HBASE-22935.master.001.patch
>
>
> After setting {{hbase.taskmonitor.rpc.warn.time}} to 18, the logs show 
> WARN messages such as these
> {noformat}
> 2019-08-08 21:50:02,601 WARN  [read for TaskMonitor] monitoring.TaskMonitor - 
> Task may be stuck: RpcServer.FifoWFPBQ.default.handler=4,queue=4,port=60020: 
> status=Servicing call from :55164: Scan, state=RUNNING, 
> startTime=1563305858103, completionTime=-1, queuetimems=1565301002599, 
> starttimems=1565301002599, clientaddress=, remoteport=55164, 
> packetlength=370, rpcMethod=Scan
> {noformat}
> Notice that the first {{starttimems}} is far in the past. The second 
> {{starttimems}} and the {{queuetimems}} are much closer to the log timestamp 
> than 180 seconds. I think this is because the warnTime is initialized to the 
> time that MonitoredTaskImpl is created, but never updated until we write a 
> warn message to the log.
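
A hypothetical, simplified model of the stale-warnTime effect described above (names and structure are invented for illustration, not the actual TaskMonitor code):

{code:java}
// Hypothetical, simplified model of the stale-warnTime effect described
// above; names and structure are illustrative, not the actual TaskMonitor.
public class WarnTimeSketch {
  static class Task {
    final long warnTime = System.currentTimeMillis(); // set once, at creation
    long rpcStartTime;                                // reset for each new RPC

    void startRpc() {
      rpcStartTime = System.currentTimeMillis();
    }

    // The monitor compares "now" against warnTime, which still reflects task
    // creation, so a handler that just began a fresh RPC can look "stuck".
    boolean looksStuck(long warnThresholdMs) {
      return System.currentTimeMillis() - warnTime > warnThresholdMs;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Task handlerTask = new Task();   // long-lived, reused per RPC handler
    Thread.sleep(250);               // stand-in for a long-lived handler
    handlerTask.startRpc();          // a brand-new RPC arrives
    // Reports true even though the current RPC started just now.
    System.out.println("looks stuck: " + handlerTask.looksStuck(200));
  }
}
{code}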



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

