[jira] [Created] (HBASE-26806) HBase tombstone markers for MOB files are not flushed during the major compaction

2022-03-07 Thread Manish Sharma (Jira)
Manish Sharma created HBASE-26806:
-

 Summary: HBase tombstone markers for MOB files are not flushed 
during the major compaction
 Key: HBASE-26806
 URL: https://issues.apache.org/jira/browse/HBASE-26806
 Project: HBase
  Issue Type: Bug
  Components: Admin
Affects Versions: 2.0.2
Reporter: Manish Sharma


Hi, 

While running a major compaction, non-MOB files are flushed but MOB files are 
not. Can someone suggest a solution?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-25782) TestStochasticLoadBalancerBalanceCluster.testBalanceCluster is flaky

2022-03-07 Thread Xiaolin Ha (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha resolved HBASE-25782.

Resolution: Duplicate

> TestStochasticLoadBalancerBalanceCluster.testBalanceCluster is flaky
> 
>
> Key: HBASE-25782
> URL: https://issues.apache.org/jira/browse/HBASE-25782
> Project: HBase
>  Issue Type: Improvement
>Reporter: Xiaolin Ha
>Priority: Major
>
> This seems to have started after HBASE-25739.
> It shows up in 
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3176/1/testReport/org.apache.hadoop.hbase.master.balancer/TestStochasticLoadBalancerBalanceCluster/testBalanceCluster/]
> A local test run reproduces the failure:
> {code:java}
> 2021-04-16T20:11:21,425 INFO  [Time-limited test] balancer.TestStochasticLoadBalancerBalanceCluster(61): Mock Cluster : { srv1241949559:1 , srv609693614:5 , srv1125287745:6 , srv1143442391:6 , srv1165735784:6 , srv1221998538:6 , srv394489737:6 , srv593165442:6 , srv736809440:6 , srv741384165:6 } [srvr=10 rgns=54 avg=5.4 max=6 min=5]
> 2021-04-16T20:11:21,425 INFO  [Time-limited test] balancer.TestStochasticLoadBalancerBalanceCluster(61): Mock Cluster : { srv1241949559:1 , srv609693614:5 , srv1125287745:6 , srv1143442391:6 , srv1165735784:6 , srv1221998538:6 , srv394489737:6 , srv593165442:6 , srv736809440:6 , srv741384165:6 } [srvr=10 rgns=54 avg=5.4 max=6 min=5]
> 2021-04-16T20:11:21,425 INFO  [Time-limited test] balancer.BaseLoadBalancer(1791): Start Generate Balance plan for cluster.
> 2021-04-16T20:11:21,425 DEBUG [Time-limited test] balancer.StochasticLoadBalancer$RegionCountSkewCostFunction(925): RegionCountSkewCostFunction sees a total of 10 servers and 54 regions.
> 2021-04-16T20:11:21,425 DEBUG [Time-limited test] balancer.StochasticLoadBalancer(361): Skipping load balancing because balanced cluster; total cost=25.97402597402596, sum multiplier=582.0; cost/multiplier to need a balance is 0.05
> 2021-04-16T20:11:21,425 INFO  [Time-limited test] balancer.TestStochasticLoadBalancerBalanceCluster(67): Mock Balance : { srv1241949559:1 , srv609693614:5 , srv1125287745:6 , srv1143442391:6 , srv1165735784:6 , srv1221998538:6 , srv394489737:6 , srv593165442:6 , srv736809440:6 , srv741384165:6 }
> java.lang.AssertionError: All servers should have load no less than 5. server=srv1241949559,13844,-2719393974186553415 , load=1
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.assertClusterAsBalanced(BalancerTestBase.java:207)
>   at org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancerBalanceCluster.testBalanceCluster(TestStochasticLoadBalancerBalanceCluster.java:68)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
> @[~claraxiong] Could you help to fix this issue?
>  





[jira] [Resolved] (HBASE-23186) Close dfs output stream in fsck threads when master exit

2022-03-07 Thread Xiaolin Ha (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha resolved HBASE-23186.

Resolution: Won't Fix

> Close dfs output stream in fsck threads when master exit
> 
>
> Key: HBASE-23186
> URL: https://issues.apache.org/jira/browse/HBASE-23186
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
>
> HBASE-21072 introduced the use of HBaseFsck by default in hbase2.
> {code:java}
> if (this.conf.getBoolean("hbase.write.hbck1.lock.file", true)) {
>   HBaseFsck.checkAndMarkRunningHbck(this.conf,
>   HBaseFsck.createLockRetryCounterFactory(this.conf).create());
> }{code}
>  
> We should close the DFS output stream when the master aborts or stops.
>  
>  





[jira] [Resolved] (HBASE-22940) Fix snapshot NoNode error

2022-03-07 Thread Xiaolin Ha (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha resolved HBASE-22940.

Resolution: Won't Fix

> Fix snapshot NoNode error
> -
>
> Key: HBASE-22940
> URL: https://issues.apache.org/jira/browse/HBASE-22940
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Minor
> Attachments: detailed snapshot nonode errror logs.txt
>
>
> When we take snapshots of thousands of tables on our cluster, we occasionally 
> see a NoNodeException. The error stack is as follows:
> {quote}ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot \{ ss=KYLIN_2JAU7T91XU_mtzjyprc table=kylin_zjyprc_bigdata_staging:KYLIN_2JAU7T91XU type=FLUSH } had an error. Procedure KYLIN_2JAU7T91XU_mtzjyprc \{ waiting=[] done=[] }
>  at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:350)
>  at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:3674)
>  at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:44817)
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2059)
>  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:126)
>  at org.apache.hadoop.hbase.ipc.MasterFifoRpcScheduler.lambda$dispatch$1(MasterFifoRpcScheduler.java:68)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via zjy-hadoop-prc-st1309.bj,24600,1557969473924:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/zjyprc-xiaomi/online-snapshot/reached/KYLIN_2JAU7T91XU_mtzjyprc/zjy-hadoop-prc-st1309.bj,24600,1557969473924
>  at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
>  at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:312)
>  at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:340)
>  ... 10 more
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/zjyprc-xiaomi/online-snapshot/reached/KYLIN_2JAU7T91XU_mtzjyprc/zjy-hadoop-prc-st1309.bj,24600,1557969473924
>  at org.apache.hadoop.hbase.procedure.Subprocedure.cancel(Subprocedure.java:270)
>  at org.apache.hadoop.hbase.procedure.ProcedureMember.controllerConnectionFailure(ProcedureMember.java:225)
>  at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.sendMemberCompleted(ZKProcedureMemberRpcs.java:267)
>  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:185)
>  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
>  ... 4 more
> @zjy-hadoop-prc-zk05.bj/10.152.48.41:24500
> Here is some help for this command:
> Take a snapshot of specified table. Examples:
>   hbase> snapshot 'sourceTable', 'snapshotName'
>   hbase> snapshot 'namespace:sourceTable', 'snapshotName', \{SKIP_FLUSH => true}
> {quote}
> I looked through the relevant server logs and found that the current 
> snapshot implementation has a problem. When the Procedure for a snapshot is 
> created, the region servers hosting the table's regions are set as the 
> acquired and released barriers. The master watches zk, and once all the 
> barrier region servers have added nodes under the parent reached node, the 
> coordinator releases ALL the barriers and the snapshot procedure is 
> considered complete. The relevant parent reached/required node is then 
> cleared by `resetMembers()`. However, all region servers add a node under the 
> parent reached/required node, so non-barrier region servers that add children 
> at this point encounter a NoNodeException.
> We think having the coordinator set only the relevant region servers as 
> barriers may not be enough. Since all region servers add nodes, perhaps all 
> of them should be barriers.
>  
>  
>  





[jira] [Resolved] (HBASE-22479) Release sycLatch too early in CreateTable() to get table state failed in postCreateTable

2022-03-07 Thread Xiaolin Ha (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha resolved HBASE-22479.

Resolution: Won't Fix

> Release sycLatch too early in CreateTable() to get table state failed in 
> postCreateTable
> 
>
> Key: HBASE-22479
> URL: https://issues.apache.org/jira/browse/HBASE-22479
> Project: HBase
>  Issue Type: Bug
>  Components: master, rsgroup
>Affects Versions: 2.2.0
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
>
> The syncLatch is released as soon as the create-table procedure has been 
> prepared.
> But if postCreateTable needs to get some information about the created table, 
> the lookup fails.
> This can be reproduced by calling createTable() on a cluster with rsgroup 
> enabled.
> The ERROR log is as follows:
> 2019-05-10,11:28:07,394 ERROR [RpcServer.default.FPBQ.Fifo.handler=254,queue=14,port=57900] org.apache.hadoop.hbase.master.TableStateManager: Unable to get table work:error1 state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: work:error1
> at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
> at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:147)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:365)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:411)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:444)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:467)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost$13.call(MasterCoprocessorHost.java:351)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost$13.call(MasterCoprocessorHost.java:348)
> at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551)
> at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:348)
> at org.apache.hadoop.hbase.master.HMaster$4.run(HMaster.java:2082)
> at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134)
> at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:2065)
> at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:681)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> I think the syncLatch of createTable should be released after postCreate in 
> CTP. Any suggestions or concerns?
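
The ordering proposed above can be illustrated with a toy model. This is only a sketch and not HBase code: the class name, the flag standing in for the persisted table state, and the thread standing in for the procedure are all hypothetical. The latch is released only after the state that the post-create hook reads has been written, so the waiting caller always sees it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

final class CreateTableLatchSketch {

  /** Returns what a post-create hook would observe when the latch is released late. */
  static boolean hookSeesTableState() {
    // Stand-in for the table state that the procedure persists (hypothetical).
    AtomicBoolean tableStateWritten = new AtomicBoolean(false);
    CountDownLatch latch = new CountDownLatch(1);

    Thread procedure = new Thread(() -> {
      // The "prepare" step would run here. Releasing the latch at this point
      // is the race described in the report.
      tableStateWritten.set(true); // later step: table state becomes visible
      latch.countDown();           // release only after the state is written
    });
    procedure.start();

    try {
      latch.await(); // the caller blocks until the procedure releases the latch
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    // The post-create hook runs here and reads the table state.
    return tableStateWritten.get();
  }

  public static void main(String[] args) {
    System.out.println("hook sees table state: " + hookSeesTableState());
  }
}
```

The trade-off is that the caller waits longer before createTable() returns, which may be part of why the change was not pursued.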





[jira] [Resolved] (HBASE-19975) Remove dead servers from rsgroup

2022-03-07 Thread Xiaolin Ha (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-19975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha resolved HBASE-19975.

Resolution: Won't Fix

> Remove dead servers from rsgroup
> 
>
> Key: HBASE-19975
> URL: https://issues.apache.org/jira/browse/HBASE-19975
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>Affects Versions: 2.0.0-beta-2
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
>
> Currently, only decommissioned servers can be removed from an rsgroup.
> No regions will be assigned to dead servers either, and users may need to 
> remove dead servers more often.





[jira] [Resolved] (HBASE-23201) Setting switch for moveSystemRegionAsync to highest version servers

2022-03-07 Thread Xiaolin Ha (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha resolved HBASE-23201.

Resolution: Won't Fix

> Setting switch for moveSystemRegionAsync to highest version servers
> ---
>
> Key: HBASE-23201
> URL: https://issues.apache.org/jira/browse/HBASE-23201
> Project: HBase
>  Issue Type: Improvement
>  Components: amv2
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Minor
>
> This behavior was introduced by HBASE-17931 and, as in HBASE-22767, may leave 
> system tables unable to be assigned.
> It is triggered when the master is starting up and a new RS is added. 





[jira] [Resolved] (HBASE-21331) TestAccessController.testRemoteLocks is flakey

2022-03-07 Thread Xiaolin Ha (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-21331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha resolved HBASE-21331.

Resolution: Won't Fix

> TestAccessController.testRemoteLocks is flakey
> --
>
> Key: HBASE-21331
> URL: https://issues.apache.org/jira/browse/HBASE-21331
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Attachments: HBASE-21331.branch-2.001.patch
>
>
> TestAccessController.testRemoteLocks might fail with 
> {color:#205081}java.lang.AssertionError: Expected action to pass for user 
> 'qLTableACUser' but was denied
>   at org.apache.hadoop.hbase.security.access.TestAccessController.testRemoteLocks(TestAccessController.java:3017){color}
> The test should pause for a while to wait for the permission change to 
> propagate to all watchers.
> This is similar to HBASE-10465.
> Error logs are as follows:
> {color:#205081}2018-10-15 16:35:03,292 INFO  [RS-EventLoopGroup-6-3] 
> ipc.ServerRpcConnection(556): Connection from 172.32.9.50:34590, 
> version=2.1.0-mdh3-SNAPSHOT, sasl=false, ugi=default (auth:SIMPLE), 
> service=ClientService
> 2018-10-15 16:35:03,300 DEBUG 
> [RpcServer.priority.FPBQ.Fifo.handler=8,queue=0,port=42095] 
> access.AccessController(2045): Received request to grant access permission 
> UserPermission: user=qLTableACUser, [TablePermission: 
> table=preQueueNs:testRemoteLocks, family=null, qualifier=null, 
> actions=ADMIN,CREATE]
> 2018-10-15 16:35:03,301 DEBUG 
> [RpcServer.priority.FPBQ.Fifo.handler=8,queue=0,port=42095] 
> access.AccessControlLists(181): Writing permission with rowKey 
> preQueueNs:testRemoteLocks qLTableACUser: CA
> 2018-10-15 16:35:03,304 DEBUG [htable-pool1175-t1] 
> access.AccessControlLists(607): Read acl: kv [qLTableACUser: CA]
> 2018-10-15 16:35:03,305 DEBUG [htable-pool1175-t1] 
> access.AccessControlLists(607): Read acl: kv [qLTableRWXUser: RWX]
> 2018-10-15 16:35:03,310 DEBUG [Time-limited test-EventThread] 
> zookeeper.ZKWatcher(485): regionserver:42095-0x16676dbff930001, 
> quorum=localhost:49645, baseZNode=/hbase Received ZooKeeper Event, 
> type=NodeDataChanged, state=SyncConnected, 
> path=/hbase/acl/preQueueNs:testRemoteLocks
> 2018-10-15 16:35:03,344 DEBUG [Time-limited test-EventThread] 
> zookeeper.ZKWatcher(485): master:34983-0x16676dbff93, 
> quorum=localhost:49645, baseZNode=/hbase Received ZooKeeper Event, 
> type=NodeDataChanged, state=SyncConnected, 
> path=/hbase/acl/preQueueNs:testRemoteLocks
> 2018-10-15 16:35:03,367 INFO  [Time-limited test] 
> zookeeper.ReadOnlyZKClient(350): Close zookeeper connection 0x5d6295fe to 
> localhost:49645
> 2018-10-15 16:35:03,368 DEBUG [Time-limited test] ipc.AbstractRpcClient(491): 
> Stopping rpc client
> 2018-10-15 16:35:03,369 INFO  [Time-limited test] hbase.Waiter(189): Waiting 
> up to [10,000] milli-secs(wait.for.ratio=[1])
> 2018-10-15 16:35:03,369 INFO  [Time-limited test] 
> access.SecureTestUtil$1(356): AccessController on region 
> hbase:acl,,1539592430177.caa9354e41bc4d1d52f493194490f66c. has not updated: 
> mtime=219
> 2018-10-15 16:35:03,440 DEBUG [zk-permission-watcher2-thread-1] 
> access.ZKPermissionWatcher(245): Updating permissions cache from 
> preQueueNs:testRemoteLocks with data 
> PBUF\x0A8\x0A\x0DqLTableACUser\x12'\x08\x03"#\x0A\x1D\x0A\x0ApreQueueNs\x12\x0FtestRemoteLocks
>  \x03 
> \x04\x0A;\x0A\x0EqLTableRWXUser\x12)\x08\x03"%\x0A\x1D\x0A\x0ApreQueueNs\x12\x0FtestRemoteLocks
>  \x00 \x01 \x02{color}
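
The suggestion above, waiting for the permission change to reach all watchers before asserting, is usually written as a bounded polling loop rather than a fixed sleep. The helper below is a minimal self-contained sketch; `WaitForSketch` and `waitFor` are hypothetical names, and a real test would more likely use the project's own waiting utility (the log above shows an `hbase.Waiter` class in use).

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class WaitForSketch {

  /**
   * Polls the condition until it holds or the timeout elapses.
   * Returns true if the condition became true, false on timeout or interrupt.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Stand-in condition: becomes true about 100 ms after start.
    boolean ok = waitFor(10_000, 10, () -> System.currentTimeMillis() - start >= 100);
    System.out.println("condition met: " + ok);
  }
}
```

In the test, the condition would check that every watcher's permission cache reflects the new grant; how that is observed is left open here.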





[jira] [Resolved] (HBASE-25720) Sync WAL stuck when prepare flush cache will prevent flush cache and cause OOM

2022-03-07 Thread Xiaolin Ha (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha resolved HBASE-25720.

Resolution: Won't Fix

> Sync WAL stuck when prepare flush cache will prevent flush cache and cause OOM
> --
>
> Key: HBASE-25720
> URL: https://issues.apache.org/jira/browse/HBASE-25720
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.13
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Attachments: prepare-flush-cache-stuck.png
>
>
> We call HRegion#doSyncOfUnflushedWALChanges when preparing to flush the cache. 
> But this WAL sync may get stuck and abort the flush of the cache. 
> !prepare-flush-cache-stuck.png|width=519,height=246!
> If we are not made aware of this problem in time, the RS will be OOM-killed.
> I think we should force-abort the RS when the sync gets stuck during 
> preparation, as we do when committing snapshots.
>  
>  





[jira] [Resolved] (HBASE-26303) Use priority queue in dir scan pool of cleaner

2022-03-07 Thread Xiaolin Ha (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha resolved HBASE-26303.

Resolution: Won't Fix

> Use priority queue in dir scan pool of cleaner
> --
>
> Key: HBASE-26303
> URL: https://issues.apache.org/jira/browse/HBASE-26303
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
>
> DirScanPool uses a plain LinkedBlockingQueue when creating its thread pool,
> {code:java}
> private static ThreadPoolExecutor initializePool(int size) {
>   return Threads.getBoundedCachedThreadPool(size, 1, TimeUnit.MINUTES,
>     new ThreadFactoryBuilder().setNameFormat("dir-scan-pool-%d").setDaemon(true)
>       .setUncaughtExceptionHandler(Threads.LOGGING_EXCEPTION_HANDLER).build());
> }
> {code}
> so larger directories are not scanned and cleaned first as expected, even 
> though CleanerChore#sortByConsumedSpace() sorts the directories before they 
> are put on the queue.
> Subdirectories of larger directories and small directories are taken from the 
> queue with equal priority.
> We should use a priority queue here instead, e.g. PriorityBlockingQueue, so 
> that larger directories are cleaned earlier. 
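
To make the PriorityBlockingQueue suggestion concrete, here is a minimal sketch using only java.util.concurrent. It is not the DirScanPool code: `PriorityDirScanSketch`, `DirScanTask`, and the byte sizes are hypothetical, and the pool is built directly with ThreadPoolExecutor rather than through HBase's Threads helper. A single worker is held busy while three tasks queue up, so the order in which they later run shows the queue's priority ordering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PriorityDirScanSketch {

  /** Hypothetical scan task: tasks for larger directories sort first. */
  static final class DirScanTask implements Runnable, Comparable<DirScanTask> {
    private final long consumedBytes;
    private final Runnable body;

    DirScanTask(long consumedBytes, Runnable body) {
      this.consumedBytes = consumedBytes;
      this.body = body;
    }

    @Override
    public void run() {
      body.run();
    }

    @Override
    public int compareTo(DirScanTask other) {
      // Descending by size, so the head of the queue is the largest directory.
      return Long.compare(other.consumedBytes, consumedBytes);
    }
  }

  /** Queues three tasks behind a blocker and returns the order in which they ran. */
  static List<String> runDemo() {
    List<String> visited = Collections.synchronizedList(new ArrayList<>());
    // Tasks must go through execute(): submit() would wrap them in a
    // FutureTask, which is not Comparable and would be rejected by the queue.
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 1, TimeUnit.MINUTES, new PriorityBlockingQueue<>());
    CountDownLatch gate = new CountDownLatch(1);

    // Keep the single worker busy so that the next three tasks are queued.
    pool.execute(new DirScanTask(Long.MAX_VALUE, () -> {
      try {
        gate.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }));
    pool.execute(new DirScanTask(10, () -> visited.add("small")));
    pool.execute(new DirScanTask(1000, () -> visited.add("large")));
    pool.execute(new DirScanTask(100, () -> visited.add("medium")));

    gate.countDown(); // let the worker drain the queue in priority order
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new ArrayList<>(visited);
  }

  public static void main(String[] args) {
    System.out.println(runDemo());
  }
}
```

Two caveats apply to this design: a PriorityBlockingQueue is unbounded, so the executor never grows beyond its core pool size, and tasks with equal priority have no guaranteed relative order.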





[jira] [Created] (HBASE-26807) Unify CallQueueTooBigException special pause with CallDroppedException

2022-03-07 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-26807:
-

 Summary: Unify CallQueueTooBigException special pause with 
CallDroppedException
 Key: HBASE-26807
 URL: https://issues.apache.org/jira/browse/HBASE-26807
 Project: HBase
  Issue Type: Improvement
Reporter: Bryan Beaudreault


CallQueueTooBigException and CallDroppedException crop up in very similar 
circumstances – the former is thrown if the request cannot be enqueued because 
the queue is full; the latter is thrown when a call is dropped from the queue 
to make room for another call.

HBASE-17114 added a special pause feature, which allows pausing for a longer 
period of time when CallQueueTooBigException is encountered, vs the normal 
pause for other exceptions. The idea here is to help reduce load so the server 
can process its queue. We should extend this feature to encompass 
CallDroppedException for the same reason.

Currently the config is called "hbase.client.pause.cqtbe". We should probably 
deprecate that in favor of a more generic name.
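
As a rough illustration of the proposal, the sketch below picks the longer pause for both exception types and reads the pause from a generic key with a fallback to the existing one. It is an assumption-laden sketch, not the client code: the two exception classes are local stand-ins for the HBase ones, the configuration is a plain Map, and the key name "hbase.client.pause.server.overloaded" is hypothetical. Only "hbase.client.pause.cqtbe" comes from the text above.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

final class ServerOverloadPauseSketch {

  // Local stand-ins for the HBase exception classes named in the issue.
  static class CallQueueTooBigException extends IOException {}
  static class CallDroppedException extends IOException {}

  static final String DEPRECATED_KEY = "hbase.client.pause.cqtbe";
  // Hypothetical generic name; the issue does not specify one.
  static final String GENERIC_KEY = "hbase.client.pause.server.overloaded";

  /** Reads the special pause, preferring the generic key over the deprecated one. */
  static long resolveOverloadedPauseMs(Map<String, String> conf, long normalPauseMs) {
    String value = conf.get(GENERIC_KEY);
    if (value == null) {
      value = conf.get(DEPRECATED_KEY);
    }
    return value == null ? normalPauseMs : Long.parseLong(value);
  }

  /** Uses the special pause for both overload signals, the normal pause otherwise. */
  static long pauseFor(Throwable error, long normalPauseMs, long overloadedPauseMs) {
    boolean serverOverloaded =
        error instanceof CallQueueTooBigException || error instanceof CallDroppedException;
    return serverOverloaded ? overloadedPauseMs : normalPauseMs;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(DEPRECATED_KEY, "500");
    long overloaded = resolveOverloadedPauseMs(conf, 100);
    System.out.println(pauseFor(new CallDroppedException(), 100, overloaded));
    System.out.println(pauseFor(new IOException("other"), 100, overloaded));
  }
}
```

Falling back to the old key keeps existing client configurations working while the new name is phased in.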





[jira] [Created] (HBASE-26808) YCSB section of docs refers to unexisting github repo

2022-03-07 Thread Laurent Edel (Jira)
Laurent Edel created HBASE-26808:


 Summary: YCSB section of docs refers to unexisting github repo
 Key: HBASE-26808
 URL: https://issues.apache.org/jira/browse/HBASE-26808
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Laurent Edel


In [the YCSB section|https://hbase.apache.org/book.html#ycsb] of the Apache 
HBase book, [Ted Dunning's YCSB repo|https://github.com/tdunning/YCSB] is 
mentioned.

This repo no longer exists; maybe it should be replaced by [Brian Frank 
Cooper's|https://github.com/brianfrankcooper/YCSB/]?





[jira] [Resolved] (HBASE-25844) Fix Jersey for hbase-server processes

2022-03-07 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-25844.
--
Resolution: Fixed

All subtasks are resolved, I think this is working correctly as of the upgrade 
to hbase-thirdparty:4.0.1

> Fix Jersey for hbase-server processes
> -
>
> Key: HBASE-25844
> URL: https://issues.apache.org/jira/browse/HBASE-25844
> Project: HBase
>  Issue Type: Task
>  Components: master, regionserver, thirdparty
>Affects Versions: 3.0.0-alpha-1, 2.5.0
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Major
>
> I spent some time trying to use Jersey from within the Master and it's not 
> working. To summarize, we have unshaded resources from both 
> jersey-server-1.19 and jersey-server-2.32 on the hbase-server classpath. 
> Jersey's initialization uses ServiceLoader to look up concrete implementation 
> classes of {{javax.ws.rs}} classes at runtime. Because we do not shade 
> {{javax.ws.rs}} in hbase-thirdparty-jersey, an attempt to use shaded 
> jersey-2.x still results in loading unshaded jersey-1.x jars, leading to an 
> error like this
> {noformat}
> java.lang.AbstractMethodError: javax.ws.rs.core.UriBuilder.uri(Ljava/lang/String;)Ljavax/ws/rs/core/UriBuilder;
>   at javax.ws.rs.core.UriBuilder.fromUri(UriBuilder.java:96)
>   at org.apache.hbase.thirdparty.org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:275)
>   at org.apache.hbase.thirdparty.org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
>   at org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791)
>   at org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
>   at org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:112)
> We cannot override which version of these classes is loaded at runtime via 
> a Java property because Jersey's load order implementation checks system 
> properties as a last resort, not first as the javadoc claims.
> So I can think of two solutions.
> # One is to shade {{javax.ws.rs}} in hbase-thirdparty-jersey. This would 
> shade both the interfaces and the resource files that are referenced at 
> runtime, allowing an entirely isolated jersey container to be instantiated.
> # Another idea is to add a custom {{ClassLoader}} that is inserted before 
> jersey is initialized. This would filter out resources that are "banned", 
> allowing our desired implementation through.
> Between these, I think (1) is better, but I don't know what else might break. 
> I've attempted both, but with neither approach can I get a jersey 
> environment to respond from my resource class... either because the solution 
> is incomplete, or because I don't have the jersey environment configured 
> properly.
> See also some discussion that happened over on Slack, 
> https://apache-hbase.slack.com/archives/C13K8NVAM/p1618857521051700
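
The second idea in the description, a custom ClassLoader that filters out "banned" resources, could look roughly like the sketch below. It is an illustration under assumptions, not the approach that was tried: the class name and the URL predicate are hypothetical, and it only filters resource lookups (which is what ServiceLoader uses to discover provider files), not class loading itself.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.function.Predicate;

final class ResourceFilteringClassLoader extends ClassLoader {

  private final Predicate<URL> banned;

  /** Delegates to the parent but hides every resource URL the predicate bans. */
  ResourceFilteringClassLoader(ClassLoader parent, Predicate<URL> banned) {
    super(parent);
    this.banned = banned;
  }

  @Override
  public Enumeration<URL> getResources(String name) throws IOException {
    List<URL> allowed = new ArrayList<>();
    for (URL url : Collections.list(super.getResources(name))) {
      if (!banned.test(url)) {
        allowed.add(url);
      }
    }
    return Collections.enumeration(allowed);
  }

  @Override
  public URL getResource(String name) {
    // Return the first resource that survives the filter, if any.
    try {
      Enumeration<URL> urls = getResources(name);
      return urls.hasMoreElements() ? urls.nextElement() : null;
    } catch (IOException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    // Hypothetical rule: hide resources that come from a jersey 1.x jar.
    ClassLoader filtered = new ResourceFilteringClassLoader(
        ClassLoader.getSystemClassLoader(),
        url -> url.toString().contains("jersey-server-1."));
    System.out.println(filtered.getResource("META-INF/services/javax.ws.rs.ext.RuntimeDelegate"));
  }
}
```

For this to have any effect, the loader would have to be the one Jersey consults during initialization, for example by installing it as the thread context class loader beforehand. Whether that is sufficient in the hbase-server processes is exactly the open question in the description.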





[jira] [Reopened] (HBASE-26732) [hbase-thirdparty] Update jackson (databind) to 2.13.1

2022-03-07 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell reopened HBASE-26732:
-

Reopening to also encompass main project POM changes.

> [hbase-thirdparty] Update jackson (databind) to 2.13.1
> --
>
> Key: HBASE-26732
> URL: https://issues.apache.org/jira/browse/HBASE-26732
> Project: HBase
>  Issue Type: Bug
>  Components: security, thirdparty
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: thirdparty-4.1.0
>
>
> Update jackson-databind to 2.13.1 to address a reported vulnerability that 
> could allow a DoS attack against certain versions of Jackson. Please refer to 
> https://github.com/FasterXML/jackson-databind/issues/3328 for further info.





[jira] [Resolved] (HBASE-22338) LICENSE file only contains Apache 2.0

2022-03-07 Thread Tak-Lon (Stephen) Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tak-Lon (Stephen) Wu resolved HBASE-22338.
--
Fix Version/s: hbase-connectors-1.1.0
   (was: hbase-connectors-1.0.1)
 Hadoop Flags: Reviewed
   Resolution: Fixed

[036729dea0d1041e957f0349254f4ea1cf7cfba9|https://github.com/apache/hbase-connectors/commit/036729dea0d1041e957f0349254f4ea1cf7cfba9]
 pushed, resolving

> LICENSE file only contains Apache 2.0
> -
>
> Key: HBASE-22338
> URL: https://issues.apache.org/jira/browse/HBASE-22338
> Project: HBase
>  Issue Type: Bug
>  Components: hbase-connectors
>Affects Versions: connector-1.0.0
>Reporter: Peter Somogyi
>Assignee: Tak-Lon (Stephen) Wu
>Priority: Critical
> Fix For: hbase-connectors-1.1.0
>
> Attachments: NOTICE.aggregate-no-build-year, 
> hbase-connectors-dependency.html
>
>
> The LICENSE.md file contains only the Apache 2.0 license, but we package 
> dependencies that use different ones. For example, jcodings uses MIT.





[jira] [Created] (HBASE-26809) Report client backoff time for server overloaded in ConnectionMetrics

2022-03-07 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-26809:
-

 Summary: Report client backoff time for server overloaded in 
ConnectionMetrics
 Key: HBASE-26809
 URL: https://issues.apache.org/jira/browse/HBASE-26809
 Project: HBase
  Issue Type: New Feature
Reporter: Bryan Beaudreault


Servers can throw CallQueueTooBigException and CallDroppedException when 
overloaded. As of HBASE-26807, these can have a configurable extra backoff to 
allow the server to recover. Depending on the server side queue implementation, 
different callers may be more or less impacted by server load. It is very 
useful to measure in the client how much time is spent backing off so we can 
see which clients are most affected by server overload.
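
A minimal sketch of what such a client-side measurement could look like follows. The class and method names are hypothetical and this is not the ConnectionMetrics API. It simply accumulates how often and for how long the client backed off because the server was overloaded.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class OverloadBackoffMetricsSketch {

  // LongAdder keeps contention low when many client threads record at once.
  private final LongAdder backoffCount = new LongAdder();
  private final LongAdder backoffNanos = new LongAdder();

  /** Records one backoff caused by a server-overloaded response. */
  void recordBackoff(long duration, TimeUnit unit) {
    backoffCount.increment();
    backoffNanos.add(unit.toNanos(duration));
  }

  long getBackoffCount() {
    return backoffCount.sum();
  }

  long getTotalBackoffMillis() {
    return TimeUnit.NANOSECONDS.toMillis(backoffNanos.sum());
  }

  public static void main(String[] args) {
    OverloadBackoffMetricsSketch metrics = new OverloadBackoffMetricsSketch();
    metrics.recordBackoff(500, TimeUnit.MILLISECONDS);
    metrics.recordBackoff(1, TimeUnit.SECONDS);
    System.out.println(metrics.getBackoffCount() + " backoffs, "
        + metrics.getTotalBackoffMillis() + " ms total");
  }
}
```

A real implementation would probably expose a timer or histogram rather than two counters, so that the distribution of backoff times is visible as well as the total.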





[jira] [Created] (HBASE-26810) Add dynamic configuration support for system coprocessors

2022-03-07 Thread Tak-Lon (Stephen) Wu (Jira)
Tak-Lon (Stephen) Wu created HBASE-26810:


 Summary: Add dynamic configuration support for system coprocessors
 Key: HBASE-26810
 URL: https://issues.apache.org/jira/browse/HBASE-26810
 Project: HBase
  Issue Type: Bug
  Components: conf, Coprocessors
Affects Versions: 3.0.0-alpha-3
Reporter: Tak-Lon (Stephen) Wu


[Dynamic Configuration|https://hbase.apache.org/book.html#dyn_config] is very 
helpful because it lets the operator keep the regionserver or master JVM 
running without restarting it. 

Building on this feature, this task aims to extend the scope of on-demand 
configuration to system coprocessors, including REGION, USER REGION, 
REGIONSERVER, and MASTER, so that a quicker update can save the time of a 
rolling restart. 

 





[jira] [Created] (HBASE-26811) Secondary replica may be disabled for read forever

2022-03-07 Thread chenglei (Jira)
chenglei created HBASE-26811:


 Summary: Secondary replica may be disabled for read forever
 Key: HBASE-26811
 URL: https://issues.apache.org/jira/browse/HBASE-26811
 Project: HBase
  Issue Type: Bug
Reporter: chenglei








[jira] [Created] (HBASE-26812) ShortCircuitingClusterConnection fails to close RegionScanners when making short-circuited calls

2022-03-07 Thread Lars Hofhansl (Jira)
Lars Hofhansl created HBASE-26812:
-

 Summary: ShortCircuitingClusterConnection fails to close 
RegionScanners when making short-circuited calls
 Key: HBASE-26812
 URL: https://issues.apache.org/jira/browse/HBASE-26812
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.9
Reporter: Lars Hofhansl


Just ran into this on the Phoenix side.
We retrieve a Connection via {{RegionCoprocessorEnvironment.createConnection... 
getTable(...)}} and then call get on that table. The Get's key happens to be 
local. Now each call to table.get() leaves an open StoreScanner around forever 
(verified with a memory profiler).

The references are held via 
RegionScannerImpl.storeHeap.scannersForDelayedClose. Eventually the 
RegionServer goes into a GC of death.

The reason appears to be that in this case there is no currentCall context. 
Some time in 2.x the Rpc handler/call was made responsible for closing open 
region scanners, but we forgot to handle {{ShortCircuitingClusterConnection}}.

It's not immediately clear how to fix this. But it does make 
ShortCircuitingClusterConnection useless and dangerous. If you use it, you 
*will* create a giant memory leak.


