[jira] [Created] (HBASE-27493) Allow namespace admins to clone snapshots created by them

2022-11-18 Thread Szabolcs Bukros (Jira)
Szabolcs Bukros created HBASE-27493:
---

 Summary: Allow namespace admins to clone snapshots created by them
 Key: HBASE-27493
 URL: https://issues.apache.org/jira/browse/HBASE-27493
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 2.5.1, 3.0.0-alpha-3
Reporter: Szabolcs Bukros
Assignee: Szabolcs Bukros


Creating a snapshot requires table admin permissions, but cloning it requires 
global admin permissions unless the user owns the snapshot and wants to 
recreate the original table the snapshot was based on, using the same table 
name. This puts unnecessary load on the few people who have global admin 
permissions. I would like to relax this rule a bit and allow the owner of a 
snapshot to clone it into any namespace where they have admin permissions, 
regardless of the table name used.
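The proposed relaxation can be sketched as a small decision function (a toy model with hypothetical names, not the actual AccessController code):

```java
// Toy model of the relaxed clone-snapshot permission check proposed above.
// All names here are hypothetical; the real check lives in HBase's
// access-control layer and operates on real user/ACL objects.
public class CloneSnapshotCheck {

  /** Returns true if the requesting user may clone the snapshot. */
  static boolean canClone(boolean isGlobalAdmin, boolean isSnapshotOwner,
                          boolean isNamespaceAdmin) {
    if (isGlobalAdmin) {
      return true; // current rule: a global admin may always clone
    }
    // proposed relaxation: the snapshot owner may clone into any namespace
    // where they hold namespace admin permissions, regardless of table name
    return isSnapshotOwner && isNamespaceAdmin;
  }

  public static void main(String[] args) {
    if (!canClone(true, false, false)) throw new AssertionError();
    if (!canClone(false, true, true)) throw new AssertionError();
    if (canClone(false, true, false)) throw new AssertionError();
    if (canClone(false, false, true)) throw new AssertionError();
  }
}
```

Note that ownership alone is not enough: the user must also be an admin of the target namespace.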



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27494) Client meta cache clear by exception metrics are missing some cases

2022-11-18 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-27494:
-

 Summary: Client meta cache clear by exception metrics are missing 
some cases
 Key: HBASE-27494
 URL: https://issues.apache.org/jira/browse/HBASE-27494
 Project: HBase
  Issue Type: Improvement
Reporter: Bryan Beaudreault
Assignee: Briana Augenreich


MetricsConnection has metrics for meta cache clears by server and region, and 
also metrics by the exception type that triggered the clear. The per-exception 
metric is currently missing at least one call site (in 
AsyncRequestFutureImpl) where the cache is cleared due to an exception.

We should audit the cache clear calls and ensure all the appropriate ones 
are tracked by exception too.
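A minimal sketch of what the audit should guarantee (hypothetical class and method names, not the actual MetricsConnection API): every code path that clears the meta cache because of an exception also records the exception type.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-exception-type counter for meta cache clears.
public class MetaCacheClearMetrics {
  private final Map<String, LongAdder> clearsByException = new ConcurrentHashMap<>();

  /** Call from every cache-clear site that was triggered by an exception. */
  public void recordClear(Throwable cause) {
    clearsByException
        .computeIfAbsent(cause.getClass().getSimpleName(), k -> new LongAdder())
        .increment();
  }

  public long count(String exceptionName) {
    LongAdder adder = clearsByException.get(exceptionName);
    return adder == null ? 0 : adder.sum();
  }
}
```

The audit then reduces to checking that each `clearCaches(...)` call site in the client also invokes the recording method.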





[jira] [Created] (HBASE-27495) Improve HFileLinkCleaner to validate back reference links ahead the next traverse

2022-11-18 Thread Tak-Lon (Stephen) Wu (Jira)
Tak-Lon (Stephen) Wu created HBASE-27495:


 Summary: Improve HFileLinkCleaner to validate back reference links 
ahead the next traverse 
 Key: HBASE-27495
 URL: https://issues.apache.org/jira/browse/HBASE-27495
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.5.2
Reporter: Tak-Lon (Stephen) Wu
Assignee: Tak-Lon (Stephen) Wu


We found a race in the CleanerChore related to back reference links. When the 
HFileLinkCleaner runs for a file it makes one of two decisions depending on 
the file type.
 - HFiles: the cleaner only checks whether the .links-<> directory still 
contains files.
 - Back reference links: the cleaner checks whether the forward link is still 
available in the data directory.

The order in which the cleaner checks these two files matters. When the back 
reference is checked first, it can remove both the reference and the HFile from 
the archive. However, when it runs for the HFile first, only the 
back-reference is removed; the HFile is then deleted only in the next 
iteration of the CleanerChore, which can be very slow when the list of files 
is huge, for example when using an object store.
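The ordering race can be illustrated with a toy model (hypothetical code, not the real HFileLinkCleaner): a back-reference is deletable once its forward link is gone, while the HFile is deletable only once no back-reference remains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of one CleanerChore pass over an archived HFile and its
// back-reference link, showing why visit order matters.
public class CleanerOrderModel {
  private final Set<String> archive =
      new HashSet<>(Arrays.asList("hfile", "backref"));

  private boolean isDeletable(String file) {
    if (file.equals("backref")) {
      return true; // the forward link is already gone from the data directory
    }
    // the HFile is kept while its back-reference directory still has files
    return !archive.contains("backref");
  }

  /** Runs one chore pass visiting files in the given order; returns survivors. */
  public Set<String> runChore(List<String> order) {
    for (String f : new ArrayList<>(order)) {
      if (archive.contains(f) && isDeletable(f)) {
        archive.remove(f);
      }
    }
    return archive;
  }
}
```

Visiting the back-reference first empties the archive in a single pass; visiting the HFile first leaves it behind for the next chore iteration, which is the slow path described above.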





[jira] [Created] (HBASE-27496) Limit size of plans produced by SimpleRegionNormalizer

2022-11-18 Thread Charles Connell (Jira)
Charles Connell created HBASE-27496:
---

 Summary: Limit size of plans produced by SimpleRegionNormalizer
 Key: HBASE-27496
 URL: https://issues.apache.org/jira/browse/HBASE-27496
 Project: HBase
  Issue Type: Improvement
  Components: Normalizer
Reporter: Charles Connell


My company (HubSpot) is starting to use {{SimpleRegionNormalizer}}. We turn 
the normalizer switch on for 30 minutes each day, when our database traffic is 
at a low point. We're using the 
{{hbase.normalizer.throughput.max_bytes_per_sec}} setting to create a rate 
limit. I've found that while the {{SimpleRegionNormalizer}} only produces new 
plans for 30 minutes each day, the plans often take many hours to execute. This 
leads to region splits, merges, and moves occurring in our HBase clusters during 
hours we'd prefer them not to.

I propose two new settings:
 * {{hbase.normalizer.merge.plans_size_limit.mb}}
 * {{hbase.normalizer.split.plans_size_limit.mb}}

This will allow HBase administrators to limit the total size of the plans 
produced by a run of {{SimpleRegionNormalizer}}, and thereby limit 
approximately how long it takes to execute them. Because the time to execute 
plans is primarily determined by a per-byte rate limit, I propose that the new 
settings also work on a byte-size basis. This makes it feasible to reason 
about how the rate limit and the size limits interact.
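A sketch of how such a size cap could work (hypothetical, simplified): stop emitting plans once the cumulative size of the data they touch exceeds the configured cap, mirroring the per-byte basis of the existing rate limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the proposed plans_size_limit settings:
// keep plans in order until the byte-size cap would be exceeded.
public class PlanSizeLimiter {

  /**
   * Keeps plans (represented here only by the MB of data each plan touches)
   * until the configured cap is reached; the rest wait for a future run.
   */
  public static List<Long> limit(List<Long> planSizesMb, long capMb) {
    List<Long> kept = new ArrayList<>();
    long total = 0;
    for (long size : planSizesMb) {
      if (total + size > capMb) {
        break; // remaining plans are deferred to the next normalizer run
      }
      total += size;
      kept.add(size);
    }
    return kept;
  }
}
```

With a per-byte rate limit of R bytes/sec and a cap of C MB, the execution time of one run's plans is then bounded by roughly C / R, which is the property the proposal is after.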





[jira] [Resolved] (HBASE-27399) Add config for setting a max actions per normalizer run

2022-11-18 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-27399.
---
Resolution: Duplicate

> Add config for setting a max actions per normalizer run
> ---
>
> Key: HBASE-27399
> URL: https://issues.apache.org/jira/browse/HBASE-27399
> Project: HBase
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Priority: Major
>
> When you enable the normalizer for the first time on an existing table, the 
> next time it runs it can do a massive amount of merges/splits. This can be 
> painful for callers.
> Additionally, if someone were to write a large amount of Puts or Deletes it 
> may cause a merge/split storm as well.
> We should add a config which allows operators to limit the number of actions 
> per run. This way the normalizer can work towards its plan gracefully over 
> the course of a few hours rather than all at once.





[jira] [Created] (HBASE-27497) Add a note for RegionMerge tool.

2022-11-18 Thread Karthik Palanisamy (Jira)
Karthik Palanisamy created HBASE-27497:
--

 Summary: Add a note for RegionMerge tool. 
 Key: HBASE-27497
 URL: https://issues.apache.org/jira/browse/HBASE-27497
 Project: HBase
  Issue Type: Bug
  Components: hbck2
Reporter: Karthik Palanisamy


NOTE: 

Do not perform region merge operations on Phoenix salted tables. Doing so will 
corrupt the region boundaries and produce incorrect query results. 

 





[jira] [Created] (HBASE-27498) Observed lot of threads blocked in ConnectionImplementation.getKeepAliveMasterService

2022-11-18 Thread Vaibhav Joshi (Jira)
Vaibhav Joshi created HBASE-27498:
-

 Summary: Observed lot of threads blocked in 
ConnectionImplementation.getKeepAliveMasterService
 Key: HBASE-27498
 URL: https://issues.apache.org/jira/browse/HBASE-27498
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 2.5.0
Reporter: Vaibhav Joshi
 Attachments: Screenshot 2022-11-16 at 10.06.59 AM.png

We recently observed that a lot of threads are blocked in 
"ConnectionImplementation.getKeepAliveMasterService" during some initialization 
stages of a rolling restart workflow. 

During a rolling restart, we make RPC calls to the Master using 
RpcRetryingCallerImpl, so as part of initialization each thread calls 
"ConnectionImplementation.getKeepAliveMasterService". 
Internally this method makes an RPC call within a synchronized block to check 
whether the master is running (mss.isMasterRunning).

Many threads are in a blocked state due to the following synchronized block:

synchronized (masterLock) {
  if (!isKeepAliveMasterConnectedAndRunning(this.masterServiceState)) {
    MasterServiceStubMaker stubMaker = new MasterServiceStubMaker();
    this.masterServiceState.stub = stubMaker.makeStub();
  }
  resetMasterServiceState(this.masterServiceState);
}

Thread Dump Analyzer (2.4) warns: "A lot of threads are waiting for this 
monitor to become available again. This might indicate a congestion. You also 
should analyze other locks blocked by threads waiting for this monitor as 
there might be much more threads waiting for it." Please check the attached 
screenshot: !Screenshot 2022-11-16 at 10.06.59 AM.png|width=1639,height=971!



"pool-11-thread-158" #313 prio=5 os_prio=0 tid=0x55b88bcb8800 nid=0x404e 
waiting for monitor entry [0x7fa48aa86000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at 
org.apache.hadoop.hbase.client.ConnectionImplementation.getKeepAliveMasterService(ConnectionImplementation.java:1336)
    - waiting to lock <0x0005d30ecb68> (a java.lang.Object)
    at 
org.apache.hadoop.hbase.client.ConnectionImplementation.getMaster(ConnectionImplementation.java:1327)
    at 
org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:57)
    at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:103)
    at 
org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3019)
    at 
org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3011)
    at org.apache.hadoop.hbase.client.HBaseAdmin.move(HBaseAdmin.java:1458)
    at org.apache.hadoop.hbase.util.MoveWithoutAck.call(MoveWithoutAck.java:58)
    at org.apache.hadoop.hbase.util.MoveWithoutAck.call(MoveWithoutAck.java:33)

---

 

*Proposal:*
We can optimize this flow as follows:
1. Use a double-checked lock around 
"isKeepAliveMasterConnectedAndRunning(this.masterServiceState)" so that threads 
don't race for the monitor when the master is running.
2. "isKeepAliveMasterConnectedAndRunning()" should reuse the globally cached 
isMasterRunning state instead of making an expensive RPC call from each 
thread. 
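Proposal (1) can be sketched as classic double-checked locking (hypothetical names; the real code would publish the actual MasterService stub, and failover handling would still need to clear the field under the lock):

```java
// Sketch of double-checked locking for the keep-alive master stub: publish
// the stub through a volatile field and check it before taking the lock, so
// threads skip the monitor entirely once the stub is known to be live.
public class KeepAliveMasterHolder {
  private final Object masterLock = new Object();
  private volatile Object masterStub; // stands in for the MasterService stub

  public Object getKeepAliveMasterService() {
    Object stub = masterStub;       // first check: no lock, no RPC
    if (stub == null) {
      synchronized (masterLock) {
        stub = masterStub;          // second check, now under the lock
        if (stub == null) {
          stub = makeStub();        // expensive: RPC + isMasterRunning probe
          masterStub = stub;
        }
      }
    }
    return stub;
  }

  // placeholder for MasterServiceStubMaker.makeStub()
  private Object makeStub() {
    return new Object();
  }
}
```

The volatile field is what makes the lock-free first check safe under the Java memory model; without it, a thread could observe a partially constructed stub.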

 

Note: the "master" branch uses "AsyncConnectionImpl", so this issue apparently 
does not occur there.


