[jira] [Created] (HBASE-28399) region size can be wrong from RegionSizeCalculator

2024-02-23 Thread ruanhui (Jira)
ruanhui created HBASE-28399:
---

 Summary: region size can be wrong from RegionSizeCalculator
 Key: HBASE-28399
 URL: https://issues.apache.org/jira/browse/HBASE-28399
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 3.0.0-beta-1
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-beta-2


The RegionSizeCalculator computes a region's size in bytes as follows:
{code:java}
private static final long MEGABYTE = 1024L * 1024L;
long regionSizeBytes =
  ((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE; 
{code}
However, this calculation loses precision, because the cast to long truncates
the fractional megabyte value. For example, the result of 
{code:java}
((long) new Size(1, Size.Unit.BYTE).get(Size.Unit.MEGABYTE)) * MEGABYTE {code}
is 0. The resulting TableInputSplit has a length of 0 even though it actually
contains a small amount of data.

Such a TableInputSplit is skipped when spark.hadoopRDD.ignoreEmptySplits is
enabled.
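
The truncation can be reproduced without HBase. The sketch below is only an
illustration: RegionSizeSketch, truncatedBytes and exactBytes are hypothetical
names, and plain double arithmetic stands in for Size.get(Size.Unit.MEGABYTE).
It contrasts the megabyte round trip with keeping the size in bytes, which is
one possible way to stop small regions from being reported as empty.

```java
final class RegionSizeSketch {
  private static final long MEGABYTE = 1024L * 1024L;

  // Mirrors the current calculation: convert to megabytes, cast to long, scale back.
  static long truncatedBytes(long storeFileSizeBytes) {
    double megabytes = (double) storeFileSizeBytes / MEGABYTE;
    return ((long) megabytes) * MEGABYTE;
  }

  // Assumed alternative: stay in bytes so nothing below 1 MB is discarded.
  static long exactBytes(long storeFileSizeBytes) {
    return storeFileSizeBytes;
  }

  public static void main(String[] args) {
    long oneByte = 1L;
    System.out.println("via megabytes: " + truncatedBytes(oneByte));
    System.out.println("via bytes:     " + exactBytes(oneByte));
  }
}
```

Any region smaller than 1 MB collapses to 0 in the first method, which is the
case that produces an empty split.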



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28195) set start row as prefix if a scan with PrefixFilter

2023-11-09 Thread ruanhui (Jira)
ruanhui created HBASE-28195:
---

 Summary: set start row as prefix if a scan with PrefixFilter
 Key: HBASE-28195
 URL: https://issues.apache.org/jira/browse/HBASE-28195
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Affects Versions: 3.0.0-alpha-4
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-beta-1


If a scan uses a PrefixFilter, we can set the start row to the prefix. This
reduces the amount of data that has to be read and filtered out.
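
The effect can be sketched with a sorted set standing in for the rows of a
table. This is not HBase code: PrefixScanSketch and its scan method are
hypothetical, and the early break only models the fact that rows are sorted.
It counts how many rows are examined with and without the prefix as start row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class PrefixScanSketch {
  // Walks the sorted rows from startRow (inclusive), collects rows matching the
  // prefix, and returns how many rows had to be examined.
  static int scan(TreeSet<String> rows, String startRow, String prefix, List<String> matches) {
    int examined = 0;
    for (String row : rows.tailSet(startRow, true)) {
      examined++;
      if (row.startsWith(prefix)) {
        matches.add(row);
      } else if (row.compareTo(prefix) > 0) {
        break; // sorted order: once past the prefix range, nothing further can match
      }
    }
    return examined;
  }

  public static void main(String[] args) {
    TreeSet<String> rows = new TreeSet<>(Arrays.asList("a1", "a2", "b1", "b2", "c1", "c2"));
    List<String> fromBeginning = new ArrayList<>();
    List<String> fromPrefix = new ArrayList<>();
    System.out.println("examined from table start: " + scan(rows, "", "b", fromBeginning));
    System.out.println("examined from prefix:      " + scan(rows, "b", "b", fromPrefix));
  }
}
```

Both scans return the same rows; starting at the prefix only skips the rows
that sort before it.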





[jira] [Created] (HBASE-28194) New Splittable Meta

2023-11-09 Thread ruanhui (Jira)
ruanhui created HBASE-28194:
---

 Summary: New Splittable Meta
 Key: HBASE-28194
 URL: https://issues.apache.org/jira/browse/HBASE-28194
 Project: HBase
  Issue Type: New Feature
  Components: meta, Region Assignment
Reporter: ruanhui


This issue tracks the effort to land a solution for splittable meta.





[jira] [Created] (HBASE-28116) Move snapshot storage from filesystem to a separated HBase table

2023-09-27 Thread ruanhui (Jira)
ruanhui created HBASE-28116:
---

 Summary: Move snapshot storage from filesystem to a separated 
HBase table
 Key: HBASE-28116
 URL: https://issues.apache.org/jira/browse/HBASE-28116
 Project: HBase
  Issue Type: New Feature
  Components: snapshots
Reporter: ruanhui


As we know, rename and list are very expensive operations on object storage. 
Currently, snapshots in HBase rely on both operations. For example, when 
taking a snapshot, we first write the snapshot description and data manifest 
file to a temporary directory, then commit them with a rename. When listing 
all snapshots, we scan the snapshot directory to find all completed 
snapshots.

So maybe we can introduce a new snapshot storage that uses an HBase table to 
store snapshots.
Here are a few ways in which we might benefit:
1. It makes HBase easier to deploy on object storage such as S3.
2. It makes snapshots faster and more lightweight. In the current 
filesystem-based snapshot implementation, when consolidating the snapshot 
manifest, we first list all region manifests with a thread pool, read their 
content and then delete them. When the number of regions is large, this may 
take a lot of time. In comparison, reads and writes on HBase tables are more 
lightweight than reads and writes on HDFS files.
3. It is likely to reduce the number of small files on HDFS.





[jira] [Created] (HBASE-28080) correct span name in AbstractRpcBasedConnectionRegistry#getActiveMaster

2023-09-12 Thread ruanhui (Jira)
ruanhui created HBASE-28080:
---

 Summary: correct span name in 
AbstractRpcBasedConnectionRegistry#getActiveMaster
 Key: HBASE-28080
 URL: https://issues.apache.org/jira/browse/HBASE-28080
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 3.0.0-alpha-4
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 4.0.0-alpha-1


It looks like the span name does not match what the method actually does: 
getActiveMaster() is traced under the span name ".getClusterId".
 
{code:java}
public CompletableFuture<ServerName> getActiveMaster() {
  return tracedFuture(
    () -> this
      .<GetActiveMasterResponse> call(
        (c, s, d) -> s.getActiveMaster(c, GetActiveMasterRequest.getDefaultInstance(), d),
        GetActiveMasterResponse::hasServerName, "getActiveMaster()")
      .thenApply(resp -> ProtobufUtil.toServerName(resp.getServerName())),
    getClass().getSimpleName() + ".getClusterId");
}
{code}





[jira] [Created] (HBASE-28015) rpc handler can get stuck on LruBlockCache

2023-08-10 Thread ruanhui (Jira)
ruanhui created HBASE-28015:
---

 Summary: rpc handler can get stuck on LruBlockCache
 Key: HBASE-28015
 URL: https://issues.apache.org/jira/browse/HBASE-28015
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Affects Versions: 3.0.0-alpha-4
Reporter: ruanhui
 Fix For: 4.0.0-alpha-1


We found lots of read handlers stuck in LruBlockCache#getBlock. This may be 
caused by a bug in the JDK 8 ConcurrentHashMap. To make the common case fast, 
I think we'd better call get and check the result before calling 
ConcurrentHashMap#computeIfAbsent.

 

 

"RpcServer.priority.RWQ.Fifo.scan.handler=190,queue=57,port=60020" #1807 daemon 
prio=5 os_prio=0 cpu=9703.28ms elapsed=88160.93s tid=0x7f38d338a800 
nid=0x8f4 waiting for monitor entry [0x7f0af4baa000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.concurrent.ConcurrentHashMap.computeIfPresent(ConcurrentHashMap.java:1760)
        - waiting to lock <0x7f2fc6495fe0> (a java.util.concurrent.ConcurrentHashMap$Node)
        at org.apache.hadoop.hbase.io.hfile.LruBlockCache.getBlock(LruBlockCache.java:538)
        at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:88)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.getCachedBlock(HFileReaderImpl.java:1124)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1300)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:331)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:679)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:631)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.backwardSeek(StoreFileScanner.java:561)
        at org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.backwardSeek(ReversedKeyValueHeap.java:117)
        at org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.backwardSeek(ReversedStoreScanner.java:134)
        at org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekAsDirection(ReversedStoreScanner.java:94)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:821)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:727)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:155)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:7515)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:7683)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:7447)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3403)
        - locked <0x7f2ff1fc8f40> (a org.apache.hadoop.hbase.regionserver.ReversedRegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3662)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45253)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:447)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:136)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

   Locked ownable synchronizers:
        - None
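
The "get and check first" idea can be sketched on a plain ConcurrentHashMap. 
This is a generic illustration, not the LruBlockCache code: GetFirstCache and 
getOrLoad are hypothetical names, and the thread dump above shows 
computeIfPresent rather than computeIfAbsent on the blocked path. The point is 
only that a lock-free get serves the common hit case, so the compute call, 
which may lock a bin, runs only on a miss.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class GetFirstCache<K, V> {
  private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

  // Fast path: ConcurrentHashMap.get does not lock.
  // Slow path: fall back to computeIfAbsent only when the key is missing.
  V getOrLoad(K key, Function<? super K, ? extends V> loader) {
    V cached = map.get(key);
    if (cached != null) {
      return cached;
    }
    return map.computeIfAbsent(key, loader);
  }
}
```

A call such as cache.getOrLoad(key, k -> load(k)), where load is whatever 
populates the value, invokes the loader at most once per key while the entry 
stays in the map.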





[jira] [Created] (HBASE-27988) NPE in AddPeerProcedure recovery

2023-07-23 Thread ruanhui (Jira)
ruanhui created HBASE-27988:
---

 Summary: NPE in AddPeerProcedure recovery
 Key: HBASE-27988
 URL: https://issues.apache.org/jira/browse/HBASE-27988
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 4.0.0-alpha-1
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-4


AddPeerProcedure restores the syncReplicationPeerLock when it is replayed 
during master recovery. However, the replicationPeerManager has not been 
initialized yet when the procedure is replayed, which causes a 
NullPointerException and aborts the master.
{code:java}
@Override
protected void afterReplay(MasterProcedureEnv env) {
  // ..
  if (peerConfig.isSyncReplication()) {
    if (!env.getReplicationPeerManager().tryAcquireSyncReplicationPeerLock()) {
      throw new IllegalStateException(
        "Can not acquire sync replication peer lock for peer " + peerId);
    }
  }
}
{code}





[jira] [Created] (HBASE-27984) NPE in MigrateReplicationQueueFromZkToTableProcedure recovery

2023-07-19 Thread ruanhui (Jira)
ruanhui created HBASE-27984:
---

 Summary: NPE in MigrateReplicationQueueFromZkToTableProcedure 
recovery
 Key: HBASE-27984
 URL: https://issues.apache.org/jira/browse/HBASE-27984
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 3.0.0-alpha-4
Reporter: ruanhui
 Fix For: 4.0.0-alpha-1


MigrateReplicationQueueFromZkToTableProcedure restores the disabled state of 
the replication log cleaner barrier when it is replayed during master recovery:
{code:java}
@Override
protected void afterReplay(MasterProcedureEnv env) {
  if (getCurrentState() == getInitialState()) {
    // do not need to disable log cleaner or acquire lock if we are in the initial state, later
    // when executing the procedure we will try to disable and acquire.
    return;
  }
  if (!env.getReplicationPeerManager().getReplicationLogCleanerBarrier().disable()) {
    throw new IllegalStateException("can not disable log cleaner, this should not happen");
  }
}
{code}


However, the replicationPeerManager has not been initialized yet when the 
procedure is replayed, which causes a NullPointerException and aborts the 
master.

It may be better to add a check after the replicationPeerManager is 
initialized to determine whether the replication log cleaner barrier needs to 
be disabled.
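
One way to picture such a deferred check is sketched below. Everything in it 
is hypothetical (DeferredBarrierSketch, Barrier, onInitialized); it does not 
use the real MasterProcedureEnv or ReplicationPeerManager, and the actual fix 
may look quite different. During replay the sketch only records that the 
barrier must be disabled, and applies that once the barrier is available.

```java
final class DeferredBarrierSketch {
  // Hypothetical stand-in for the replication log cleaner barrier.
  static final class Barrier {
    private boolean disabled;

    boolean disable() {
      if (disabled) {
        return false;
      }
      disabled = true;
      return true;
    }

    boolean isDisabled() {
      return disabled;
    }
  }

  private Barrier barrier;        // null until initialization, like replicationPeerManager
  private boolean disablePending; // remembered during replay when the barrier is missing

  // Called while replaying a procedure; never dereferences a missing barrier.
  void afterReplay() {
    if (barrier == null) {
      disablePending = true;
      return;
    }
    disableOrFail();
  }

  // Called once initialization has finished; applies any deferred disable.
  void onInitialized(Barrier initialized) {
    barrier = initialized;
    if (disablePending) {
      disablePending = false;
      disableOrFail();
    }
  }

  boolean barrierDisabled() {
    return barrier != null && barrier.isDisabled();
  }

  private void disableOrFail() {
    if (!barrier.disable()) {
      throw new IllegalStateException("can not disable log cleaner, this should not happen");
    }
  }
}
```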





[jira] [Created] (HBASE-27968) add JvmPauseMonitor in hbase-client

2023-07-09 Thread ruanhui (Jira)
ruanhui created HBASE-27968:
---

 Summary: add JvmPauseMonitor in hbase-client
 Key: HBASE-27968
 URL: https://issues.apache.org/jira/browse/HBASE-27968
 Project: HBase
  Issue Type: New Feature
  Components: Client
Affects Versions: 3.0.0-alpha-4
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-beta-1


Many of our users integrate hbase-client into frameworks such as Spring Boot. 
Running a JvmPauseMonitor in the client would help them detect GC problems.





[jira] [Created] (HBASE-27967) introduce a ConnectionLimitHandler to limit the number of concurrent connections to the Server

2023-07-07 Thread ruanhui (Jira)
ruanhui created HBASE-27967:
---

 Summary: introduce a ConnectionLimitHandler to limit the number of 
concurrent connections to the Server
 Key: HBASE-27967
 URL: https://issues.apache.org/jira/browse/HBASE-27967
 Project: HBase
  Issue Type: New Feature
  Components: IPC/RPC
Affects Versions: 3.0.0-alpha-4
Reporter: ruanhui
 Fix For: 3.0.0-beta-1


Unreasonable client retries can exhaust the file descriptors of an HBase 
server, so that it can no longer accept or create connections and eventually 
aborts. We can consider introducing a ConnectionLimitHandler, similar to the 
one in Cassandra, in our NettyRpcServer to protect the HBase servers.

 
ERROR [master:store-WAL-Roller] master.HMaster: * ABORTING master hmaster,6,1679921578648: IOE in log roller *
java.net.SocketException: Call From hmaster/hmaster to namenode:9000 failed on socket exception: java.net.SocketException: Too many open files; For more details see: [http://wiki.apache.org/hadoop/SocketException]
java.io.IOException: Too many open files
at java.base/sun.nio.ch.Net.accept(Native Method)
at java.base/sun.nio.ch.ServerSocketChannelImpl.implAccept(ServerSocketChannelImpl.java:425)
at java.base/sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:391)
at org.apt 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:376)
at jdk.proxy2/jdk.proxy2.$Proxy24.getFileInfo(Unknown Source)
at jdk.internal.reflect.GeneratedMethodAccessor139.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:376)
at jdk.proxy2/jdk.proxy2.$Proxy24.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1753)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1617)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1614)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1629)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1713)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.getNewPath(AbstractFSWAL.java:582)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:843)
at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:268)
at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:187)
Caused by: java.net.SocketException: Too many open files
at java.base/sun.nio.ch.Net.socket0(Native Method)
at java.base/sun.nio.ch.Net.socket(Net.java:524)
at java.base/sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:146)
at java.base/sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:129)
at java.base/sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:77)
at java.base/java.nio.channels.SocketChannel.open(SocketChannel.java:192)
at org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:656)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:812)
at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1636)
at org.apache.hadoop.ipc.Client.call(Client.java:1452)
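
The core of such a handler is a bounded connection counter. The sketch below 
shows only that counter, under assumptions: ConnectionLimiter is a 
hypothetical class, it has no Netty dependency, and the idea of calling 
tryAcquire when a channel becomes active and release when it becomes inactive 
is an assumed wiring, not the proposed implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ConnectionLimiter {
  private final int maxConnections;
  private final AtomicInteger active = new AtomicInteger();

  ConnectionLimiter(int maxConnections) {
    if (maxConnections <= 0) {
      throw new IllegalArgumentException("maxConnections must be positive");
    }
    this.maxConnections = maxConnections;
  }

  // Returns true if the new connection may proceed; false means it should be closed.
  boolean tryAcquire() {
    while (true) {
      int current = active.get();
      if (current >= maxConnections) {
        return false;
      }
      if (active.compareAndSet(current, current + 1)) {
        return true;
      }
    }
  }

  // Must be called exactly once for every connection that was admitted.
  void release() {
    active.decrementAndGet();
  }

  int activeConnections() {
    return active.get();
  }
}
```

The compare-and-set loop keeps the count from ever exceeding the limit, even 
when many connections arrive concurrently.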





[jira] [Created] (HBASE-27905) Directly schedule procedures that do not need to acquire locks

2023-06-03 Thread ruanhui (Jira)
ruanhui created HBASE-27905:
---

 Summary: Directly schedule procedures that do not need to acquire 
locks
 Key: HBASE-27905
 URL: https://issues.apache.org/jira/browse/HBASE-27905
 Project: HBase
  Issue Type: Improvement
  Components: proc-v2
Affects Versions: 3.0.0-alpha-3
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-4


Currently, once a procedure holds the exclusive lock for a queue, the 
procedure scheduler does not schedule any other procedure from that queue, 
even one that does not require any lock. We would prefer such lock-free 
procedures to be executed directly, without waiting for the procedure that 
holds the exclusive lock to finish. Otherwise, if the procedure holding the 
exclusive lock is stuck, the procedures that do not need the lock also wait 
forever.





[jira] [Created] (HBASE-27885) expose metaCacheHits in MetricsConnection

2023-05-24 Thread ruanhui (Jira)
ruanhui created HBASE-27885:
---

 Summary: expose metaCacheHits in MetricsConnection
 Key: HBASE-27885
 URL: https://issues.apache.org/jira/browse/HBASE-27885
 Project: HBase
  Issue Type: New Feature
  Components: Client
Affects Versions: 3.0.0-alpha-3
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-4








[jira] [Created] (HBASE-27855) Support dynamic adjustment of flusher count

2023-05-10 Thread ruanhui (Jira)
ruanhui created HBASE-27855:
---

 Summary: Support dynamic adjustment of flusher count
 Key: HBASE-27855
 URL: https://issues.apache.org/jira/browse/HBASE-27855
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 3.0.0-alpha-3
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-4








[jira] [Created] (HBASE-27844) changed type names to avoid conflicts with built-in types

2023-05-04 Thread ruanhui (Jira)
ruanhui created HBASE-27844:
---

 Summary: changed type names to avoid conflicts with built-in types
 Key: HBASE-27844
 URL: https://issues.apache.org/jira/browse/HBASE-27844
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 2.5.4
Reporter: ruanhui
Assignee: ruanhui


Some compilers will resolve Builder to java.lang.Thread.Builder instead of 
Builder in pb and cause compilation failure. We should try to avoid conflicts 
with built-in class names.





[jira] [Created] (HBASE-27463) Reset sizeOfLogQueue when refresh replication source

2022-11-03 Thread ruanhui (Jira)
ruanhui created HBASE-27463:
---

 Summary: Reset sizeOfLogQueue when refresh replication source
 Key: HBASE-27463
 URL: https://issues.apache.org/jira/browse/HBASE-27463
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 3.0.0-alpha-3
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-4


When we refresh replication sources, we don't clear the metric. As a result, 
the value of the sizeOfLogQueue metric may be wrong.





[jira] [Created] (HBASE-27458) Use ReadWriteLock for region scanner readpoint map

2022-11-01 Thread ruanhui (Jira)
ruanhui created HBASE-27458:
---

 Summary: Use ReadWriteLock for region scanner readpoint map 
 Key: HBASE-27458
 URL: https://issues.apache.org/jira/browse/HBASE-27458
 Project: HBase
  Issue Type: Improvement
  Components: Scanners
Affects Versions: 3.0.0-alpha-3
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-4
 Attachments: jstack-2.png

Currently we manage the concurrency between the RegionScanner and 
getSmallestReadPoint by synchronizing on the scannerReadPoints object. In our 
production, we find that many read threads are blocked by this when we have a 
heavy read load. 

We need to get the smallest read point when we:
a. flush a memstore 
b. compact a memstore/storefile 
c. perform a delta operation such as increment/append
These operations are usually much less frequent than read requests. 

Using an exclusive lock here is a little expensive, because all a region 
scanner needs to do is calculate its readpoint and put it in the scanner 
readpoint map, which is thread-safe. Multiple read threads can do this in 
parallel without synchronization.

Based on the above, maybe we can replace the synchronized lock with a 
read-write lock. This will improve read performance if the bottleneck is the 
synchronization here.

!jstack.png!
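
The locking scheme described above can be sketched as follows. This is a 
minimal illustration under assumptions, not the HRegion code: 
ReadPointRegistry and its methods are hypothetical names. Scanners register 
under the shared read lock, so they do not block each other, while the 
computation of the smallest read point takes the write lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ReadPointRegistry {
  private final ConcurrentHashMap<Object, Long> scannerReadPoints = new ConcurrentHashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final AtomicLong currentReadPoint = new AtomicLong();

  // Simulates the region's read point moving forward as writes complete.
  void advanceTo(long readPoint) {
    currentReadPoint.set(readPoint);
  }

  // Scanner path: shared lock, many scanners may register in parallel.
  long register(Object scanner) {
    lock.readLock().lock();
    try {
      long readPoint = currentReadPoint.get();
      scannerReadPoints.put(scanner, readPoint);
      return readPoint;
    } finally {
      lock.readLock().unlock();
    }
  }

  void unregister(Object scanner) {
    scannerReadPoints.remove(scanner);
  }

  // Flush/compaction path: exclusive lock, so no scanner registers half-way through.
  long smallestReadPoint() {
    lock.writeLock().lock();
    try {
      long smallest = currentReadPoint.get();
      for (long readPoint : scannerReadPoints.values()) {
        smallest = Math.min(smallest, readPoint);
      }
      return smallest;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```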





[jira] [Created] (HBASE-27445) result of DirectMemoryUtils#getDirectMemorySize may be wrong

2022-10-25 Thread ruanhui (Jira)
ruanhui created HBASE-27445:
---

 Summary: result of DirectMemoryUtils#getDirectMemorySize may be 
wrong
 Key: HBASE-27445
 URL: https://issues.apache.org/jira/browse/HBASE-27445
 Project: HBase
  Issue Type: Bug
  Components: UI
Affects Versions: 3.0.0-alpha-3
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-4


If the parameter is set more than once, the last occurrence takes effect. For 
example, if we set 

-Xms30g -Xmx30g -XX:MaxDirectMemorySize=40g -XX:MaxDirectMemorySize=50g

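the MaxDirectMemorySize will be set to 50g, so DirectMemoryUtils#getDirectMemorySize 
has to honor the last occurrence to report the correct value.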
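
A "last occurrence wins" parse can be sketched as below. It is a simplified 
illustration: DirectMemoryArgs and lastMaxDirectMemorySize are hypothetical 
names, the argument list is passed in instead of being read from the running 
JVM, and only the k, m and g suffixes are handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DirectMemoryArgs {
  private static final Pattern MAX_DIRECT =
    Pattern.compile("-XX:MaxDirectMemorySize=([0-9]+)([kKmMgG]?)");

  // Returns the size in bytes from the LAST matching argument, or -1 if none is present.
  static long lastMaxDirectMemorySize(List<String> jvmArgs) {
    long result = -1L;
    for (String arg : jvmArgs) {
      Matcher m = MAX_DIRECT.matcher(arg);
      if (!m.matches()) {
        continue;
      }
      long value = Long.parseLong(m.group(1));
      switch (m.group(2).toLowerCase(Locale.ROOT)) {
        case "k": value <<= 10; break;
        case "m": value <<= 20; break;
        case "g": value <<= 30; break;
        default: break;
      }
      result = value; // keep iterating: a later occurrence overrides an earlier one
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> jvmArgs = Arrays.asList(
      "-Xms30g", "-Xmx30g", "-XX:MaxDirectMemorySize=40g", "-XX:MaxDirectMemorySize=50g");
    System.out.println(lastMaxDirectMemorySize(jvmArgs));
  }
}
```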





[jira] [Created] (HBASE-27355) Separate meta read requests from master and client

2022-09-02 Thread ruanhui (Jira)
ruanhui created HBASE-27355:
---

 Summary: Separate meta read requests from master and client 
 Key: HBASE-27355
 URL: https://issues.apache.org/jira/browse/HBASE-27355
 Project: HBase
  Issue Type: Improvement
  Components: IPC/RPC
Affects Versions: 3.0.0-alpha-3
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-4


If a single region has a large number of store files, or the response from 
HDFS is slow, region transitions can be slow, and retrying clients may put a 
lot of pressure on the server hosting meta. This may block the master's system 
read requests. Maybe we can set a special priority for master requests to 
isolate the read requests of the master from those of clients.





[jira] [Created] (HBASE-27325) the bulkload max call queue size can be update to a wrong value

2022-08-24 Thread ruanhui (Jira)
ruanhui created HBASE-27325:
---

 Summary: the bulkload max call queue size can be update to a wrong 
value
 Key: HBASE-27325
 URL: https://issues.apache.org/jira/browse/HBASE-27325
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 3.0.0-alpha-3
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-4


The configKey can be wrong, because 

name.toLowerCase(Locale.ROOT).contains("bulkLoad") is always false: a 
lowercased string can never contain the uppercase "L".
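
The behaviour is easy to confirm in isolation. In the sketch below, 
BulkLoadKeyCheck, buggyMatch and lowercaseMatch are hypothetical names, and 
comparing against an all-lowercase literal is only one assumed way to correct 
the check.

```java
import java.util.Locale;

final class BulkLoadKeyCheck {
  // The check as described: the needle contains an uppercase 'L', the haystack never does.
  static boolean buggyMatch(String name) {
    return name.toLowerCase(Locale.ROOT).contains("bulkLoad");
  }

  // Assumed correction: compare against an all-lowercase needle.
  static boolean lowercaseMatch(String name) {
    return name.toLowerCase(Locale.ROOT).contains("bulkload");
  }

  public static void main(String[] args) {
    String name = "BulkLoad.queue"; // hypothetical executor name
    System.out.println(buggyMatch(name));
    System.out.println(lowercaseMatch(name));
  }
}
```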





[jira] [Created] (HBASE-27320) hide some sensitive configuration information in the UI

2022-08-23 Thread ruanhui (Jira)
ruanhui created HBASE-27320:
---

 Summary: hide some sensitive configuration information in the UI
 Key: HBASE-27320
 URL: https://issues.apache.org/jira/browse/HBASE-27320
 Project: HBase
  Issue Type: Improvement
  Components: UI
Affects Versions: 3.0.0-alpha-3
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-4


In the discussion about how to store the keystore/truststore password 
securely, [~bbeaudreault] made the following point, which I quote here:

"I agree that it seems insecure to put it directly into the hbase-site.xml. 
Another reason is due to the RS UI which (helpfully) can print the entire site 
configuration. We’d need to make sure the password is excluded from that, but 
better to remove it from site xml altogether".

I also feel that some sensitive information is exposed in the UI. For example, 
if we set the superuser in hbase-site.xml, non-admin users can obtain the 
superuser information and impersonate the superuser to perform operations on 
the cluster that they are not permitted to perform. So I think we should hide 
this sensitive information in the UI.
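
A redaction step of this kind could look roughly like the sketch below. It is 
an assumption-laden illustration: ConfigRedactionSketch, the marker list and 
the mask string are all hypothetical, and which keys count as sensitive would 
have to be decided in the actual change.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class ConfigRedactionSketch {
  // Hypothetical markers; a real list of sensitive keys would need review.
  private static final String[] SENSITIVE_MARKERS = { "password", "superuser" };
  private static final String MASK = "******";

  // Returns a copy of the configuration with sensitive values masked for display.
  static Map<String, String> redact(Map<String, String> conf) {
    Map<String, String> shown = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : conf.entrySet()) {
      String key = entry.getKey().toLowerCase(Locale.ROOT);
      boolean sensitive = false;
      for (String marker : SENSITIVE_MARKERS) {
        if (key.contains(marker)) {
          sensitive = true;
          break;
        }
      }
      shown.put(entry.getKey(), sensitive ? MASK : entry.getValue());
    }
    return shown;
  }
}
```

The original map is left untouched, so only the copy rendered in the UI is 
masked.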





[jira] [Created] (HBASE-27305) add an option to skip file splitting when bulkload hfiles

2022-08-16 Thread ruanhui (Jira)
ruanhui created HBASE-27305:
---

 Summary: add an option to skip file splitting when bulkload hfiles
 Key: HBASE-27305
 URL: https://issues.apache.org/jira/browse/HBASE-27305
 Project: HBase
  Issue Type: Improvement
  Components: tooling
Affects Versions: 3.0.0-alpha-3
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-4


When bulk loading hfiles, if the key range of an hfile does not fit within the 
key range of a region, the BulkLoadHFilesTool splits the hfile so that the key 
range of each new file matches the key range of a region. If many files need 
to be split, the load on the BulkLoadHFilesTool becomes very high. Sometimes 
we would rather avoid this, fail directly, and regenerate the hfiles. Here we 
introduce a new option: when it is enabled and a split would be required, an 
exception is thrown so that the calling client can handle it.





[jira] [Created] (HBASE-27158) Add namespace column family to UNDELETABLE_META_COLUMNFAMILIES

2022-06-24 Thread ruanhui (Jira)
ruanhui created HBASE-27158:
---

 Summary: Add namespace column family to 
UNDELETABLE_META_COLUMNFAMILIES
 Key: HBASE-27158
 URL: https://issues.apache.org/jira/browse/HBASE-27158
 Project: HBase
  Issue Type: Improvement
  Components: proc-v2
Affects Versions: 2.4.12
Reporter: ruanhui
 Fix For: 3.0.0-alpha-4


Deleting the namespace family from hbase:meta can also break a cluster. So I 
think we should add the namespace family to the list of families that cannot 
be deleted.





[jira] [Created] (HBASE-27157) Potential race condition in WorkerAssigner

2022-06-23 Thread ruanhui (Jira)
ruanhui created HBASE-27157:
---

 Summary: Potential race condition in WorkerAssigner
 Key: HBASE-27157
 URL: https://issues.apache.org/jira/browse/HBASE-27157
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Affects Versions: 2.4.12
Reporter: ruanhui
 Fix For: 3.0.0-alpha-2


Multiple SplitWALProcedures share the same WorkerAssigner instance, so there 
is a potential race condition because the suspend and wake methods are not 
synchronized.





[jira] [Created] (HBASE-26974) Introduce a LogRollProcedure

2022-04-24 Thread ruanhui (Jira)
ruanhui created HBASE-26974:
---

 Summary: Introduce a LogRollProcedure
 Key: HBASE-26974
 URL: https://issues.apache.org/jira/browse/HBASE-26974
 Project: HBase
  Issue Type: Improvement
  Components: backuprestore, proc-v2
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-3


The current log rolling for all regionservers is based on ZK. This issue is an 
attempt to reimplement it with procedure v2.

Here are some requirements for the implementation.
The procedure can be introduced as a new feature. It should remain fully 
compatible with previous implementations, and it should be possible to disable 
it through configuration. Currently we only use the log roll procedure when 
running a backup job, so I think the code should live in the hbase-backup 
module as much as possible (I'm not sure this is the right way to do it; if 
you have any suggestions, please let me know).


Here are some details about the implementation.
LogRollProcedure
The LogRollProcedure rolls the WAL on all the regionservers in the cluster. It 
acquires the shared lock of the backup system table.
RSLogRollProcedure
The RSLogRollProcedure schedules an RSLogRollRemoteProcedure for each 
regionserver. When the subprocedure returns, the RSLogRollProcedure checks the 
log rolling result in the backup system table. If the roll failed, the 
RSLogRollProcedure schedules a new RSLogRollRemoteProcedure to retry.
RSLogRollRemoteProcedure
The RSLogRollRemoteProcedure sends the log roll request to the remote server.

This is only a first version of the implementation; any suggestions and 
feedback are appreciated.





[jira] [Created] (HBASE-26961) cache region locations when getAllRegionLocations() for branch-2.2+

2022-04-19 Thread ruanhui (Jira)
ruanhui created HBASE-26961:
---

 Summary: cache region locations when getAllRegionLocations() for 
branch-2.2+
 Key: HBASE-26961
 URL: https://issues.apache.org/jira/browse/HBASE-26961
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 2.4.11, 2.3.7, 2.2.7
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 2.4.12


backport HBASE-26942 for branch-2.2, branch-2.3 and branch-2.4





[jira] [Created] (HBASE-26942) cache region locations when getAllRegionLocations()

2022-04-10 Thread ruanhui (Jira)
ruanhui created HBASE-26942:
---

 Summary: cache region locations when getAllRegionLocations()
 Key: HBASE-26942
 URL: https://issues.apache.org/jira/browse/HBASE-26942
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 2.4.11, 3.0.0-alpha-2
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-3


When we get all region locations of a table from meta, we can cache the result.





[jira] [Created] (HBASE-26867) Introduce a FlushProcedure

2022-03-19 Thread ruanhui (Jira)
ruanhui created HBASE-26867:
---

 Summary: Introduce a FlushProcedure
 Key: HBASE-26867
 URL: https://issues.apache.org/jira/browse/HBASE-26867
 Project: HBase
  Issue Type: New Feature
  Components: proc-v2
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 2.6.0, 3.0.0-alpha-3


Reimplement the proc-v1 based flush procedure in proc-v2.





[jira] [Created] (HBASE-26859) Split TestSnapshotProcedure to several smaller tests

2022-03-17 Thread ruanhui (Jira)
ruanhui created HBASE-26859:
---

 Summary: Split TestSnapshotProcedure to several smaller tests
 Key: HBASE-26859
 URL: https://issues.apache.org/jira/browse/HBASE-26859
 Project: HBase
  Issue Type: Improvement
  Components: proc-v2, snapshots
Affects Versions: 2.4.11
Reporter: ruanhui
 Fix For: 3.0.0-alpha-3, 2.4.12


TestSnapshotProcedure is too big, so it easily times out.





[jira] [Created] (HBASE-26842) TestSnapshotProcedure fails in branch-2

2022-03-15 Thread ruanhui (Jira)
ruanhui created HBASE-26842:
---

 Summary: TestSnapshotProcedure fails in branch-2
 Key: HBASE-26842
 URL: https://issues.apache.org/jira/browse/HBASE-26842
 Project: HBase
  Issue Type: Bug
  Components: proc-v2, snapshots
Reporter: ruanhui
 Fix For: 2.4.11


We still use the original Admin implementation in branch-2, which is different 
from the AdminOverAsyncAdmin in the master branch. This patch tries to 
introduce the snapshot procedure to the original Admin client implementation 
in branch-2.





[jira] [Created] (HBASE-26769) add archive directory, old WAL direcotry and disabled tables space usage information to metrics

2022-02-23 Thread ruanhui (Jira)
ruanhui created HBASE-26769:
---

 Summary: add archive directory, old WAL direcotry and disabled 
tables space usage information to metrics
 Key: HBASE-26769
 URL: https://issues.apache.org/jira/browse/HBASE-26769
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: ruanhui


Currently we don't have space usage information for the archive directory, the 
old wal directory and disabled tables.  This patch is to add this part of the 
information to the metric system.

 

 





[jira] [Created] (HBASE-26716) NPE caused by converting uppercase hostname to lowercase in RegionMover

2022-01-27 Thread ruanhui (Jira)
ruanhui created HBASE-26716:
---

 Summary: NPE caused by converting uppercase hostname to lowercase 
in RegionMover
 Key: HBASE-26716
 URL: https://issues.apache.org/jira/browse/HBASE-26716
 Project: HBase
  Issue Type: Improvement
  Components: util
Affects Versions: 2.4.9
Reporter: ruanhui


In HBASE-19456, we introduced case-insensitive hostname handling in RegionMover 
and converted uppercase hostnames to lowercase. But this can prevent us from 
getting the rsgroup info of the unloading server, because the addresses in HBase 
are case-sensitive. This makes 
org.apache.hadoop.hbase.util.TestRegionMoverWithRSGroupEnable fail, as the log 
below shows:
 
 
2022-01-27T20:53:31,948 INFO [Time-limited test] util.TestRegionMoverWithRSGroupEnable(127): Unloading VM-154-75-centos
2022-01-27T20:53:31,959 INFO [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=49232] master.MasterRpcServices(3011): rsGroupInfo of vm-154-75-centos:39126 is null
2022-01-27T20:53:31,961 INFO [pool-332-thread-1] util.RegionMover(419): rsgroup of vm-154-75-centos:39126 is null
2022-01-27T20:53:31,961 ERROR [pool-332-thread-1] util.RegionMover(471): Error while unloading regions
java.lang.NullPointerException: null
    at org.apache.hadoop.hbase.util.RegionMover.lambda$unloadRegions$3(RegionMover.java:421) ~[classes/:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
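
The failure mode in the log can be reproduced in isolation. The following is only a minimal sketch, not the RegionMover or rsgroup code: the class name, the map, and the group name "default" are hypothetical stand-ins for a lookup keyed by the exact address string. It shows why a lowercased hostname misses an exact-match lookup, and one possible way a case-insensitive comparison would avoid the null.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class HostnameCaseSketch {
  // Hypothetical stand-in for a server-address-to-rsgroup mapping,
  // keyed by the exact (case-preserving) address string.
  static Map<String, String> groupsByAddress() {
    Map<String, String> groups = new HashMap<>();
    groups.put("VM-154-75-centos:39126", "default");
    return groups;
  }

  // Exact lookup: the key must match character for character.
  static String exactLookup(String address) {
    return groupsByAddress().get(address);
  }

  // Case-insensitive lookup: compares each key while ignoring case.
  static String caseInsensitiveLookup(String address) {
    for (Map.Entry<String, String> entry : groupsByAddress().entrySet()) {
      if (entry.getKey().equalsIgnoreCase(address)) {
        return entry.getValue();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String lowered = "VM-154-75-centos:39126".toLowerCase(Locale.ROOT);
    System.out.println(exactLookup(lowered));           // prints "null"
    System.out.println(caseInsensitiveLookup(lowered)); // prints "default"
  }
}
```

Dereferencing the null returned by the exact lookup is what would surface as a NullPointerException in a caller that assumes a group always exists.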





[jira] [Created] (HBASE-26610) RSRollLogTask didn't call coprocessor when request roll log in backup

2021-12-19 Thread ruanhui (Jira)
ruanhui created HBASE-26610:
---

 Summary: RSRollLogTask didn't call coprocessor when request roll 
log in backup
 Key: HBASE-26610
 URL: https://issues.apache.org/jira/browse/HBASE-26610
 Project: HBase
  Issue Type: Improvement
  Components: backuprestore
Affects Versions: 2.4.9
Reporter: ruanhui
Assignee: ruanhui








[jira] [Created] (HBASE-26554) Introduce a new parameter in jmx servlet to exclude the specific mbean

2021-12-09 Thread ruanhui (Jira)
ruanhui created HBASE-26554:
---

 Summary: Introduce a new parameter in jmx servlet to exclude the 
specific mbean
 Key: HBASE-26554
 URL: https://issues.apache.org/jira/browse/HBASE-26554
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 2.4.8
Reporter: ruanhui
Assignee: ruanhui
 Fix For: 3.0.0-alpha-2


Many regionservers serve over a thousand regions, and their metric load is 
pretty big.
I tried to exclude some huge MBeans like 
'Hadoop:service=HBase,name=RegionServer,sub=Regions' with a regex, but didn't 
succeed.
So I want to propose a new parameter 'excl' in the jmx servlet to exclude a 
specific bean or beans.
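
As an illustration of the idea only, here is a small sketch of excluding a named bean from a list of MBean names. It is not the jmx servlet code: the class and method names are hypothetical, and the bean name ending in sub=Server is made up for the example. It relies on the standard javax.management.ObjectName class, whose apply method matches another name exactly or, when the name is a pattern, by pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class MBeanExcludeSketch {
  // Returns the bean names that do NOT match the excluded name (or pattern).
  // Hypothetical helper; a real servlet would read 'excl' from the request.
  static List<String> excludeBeans(List<String> beanNames, String excl) {
    try {
      ObjectName excluded = new ObjectName(excl);
      List<String> kept = new ArrayList<>();
      for (String name : beanNames) {
        // apply() is true when 'excluded' equals or pattern-matches the name.
        if (!excluded.apply(new ObjectName(name))) {
          kept.add(name);
        }
      }
      return kept;
    } catch (MalformedObjectNameException e) {
      throw new IllegalArgumentException("malformed MBean name", e);
    }
  }

  public static void main(String[] args) {
    List<String> beans = Arrays.asList(
        "Hadoop:service=HBase,name=RegionServer,sub=Regions",
        "Hadoop:service=HBase,name=RegionServer,sub=Server");
    System.out.println(
        excludeBeans(beans, "Hadoop:service=HBase,name=RegionServer,sub=Regions"));
  }
}
```

Matching on the full object name avoids the regex escaping problems that commas and equals signs in bean names tend to cause.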





[jira] [Resolved] (HBASE-26485) Introduce a method to clean restore directory after Snapshot Scan

2021-11-30 Thread ruanhui (Jira)



ruanhui resolved HBASE-26485.
-
Resolution: Fixed

> Introduce a method to clean restore directory after Snapshot Scan
> -
>
> Key: HBASE-26485
> URL: https://issues.apache.org/jira/browse/HBASE-26485
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: ruanhui
>Assignee: ruanhui
>Priority: Minor
>
> SnapshotScan is widely used in our company. However, after the snapshot scan 
> job, the restore directory is not cleaned, and over time this may put a lot of 
> pressure on HDFS. So maybe we can introduce a method for users to clean the 
> snapshot restore directory after the job.





[jira] [Created] (HBASE-26485) Introduce a method to clean restore directory after Snapshot Scan

2021-11-24 Thread ruanhui (Jira)
ruanhui created HBASE-26485:
---

 Summary: Introduce a method to clean restore directory after 
Snapshot Scan
 Key: HBASE-26485
 URL: https://issues.apache.org/jira/browse/HBASE-26485
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Reporter: ruanhui
Assignee: ruanhui


SnapshotScan is widely used in our company. However, after the snapshot scan 
job, the restore directory is not cleaned, and over time this may put a lot of 
pressure on HDFS. So maybe we can introduce a method for users to clean the 
snapshot restore directory after the job.
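
The shape of such a cleanup can be sketched as follows. This is an assumption-laden illustration, not the proposed HBase method: it uses java.nio.file on the local filesystem as a stand-in for HDFS, and every class and method name here is hypothetical. The point is only that the cleanup runs in a finally block, so the restore directory is removed whether or not the job succeeds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class RestoreDirCleanupSketch {
  // Recursively deletes the restore directory; children are deleted before parents.
  static void cleanRestoreDirectory(Path restoreDir) {
    if (!Files.exists(restoreDir)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(restoreDir)) {
      paths.sorted(Comparator.reverseOrder()).forEach(path -> {
        try {
          Files.delete(path);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Runs the job, then cleans the restore directory even if the job throws.
  static void runJobThenClean(Path restoreDir, Runnable job) {
    try {
      job.run();
    } finally {
      cleanRestoreDirectory(restoreDir);
    }
  }

  // Creates a temporary "restore directory", runs a no-op job, and reports
  // whether the directory is gone afterwards.
  static boolean demo() {
    try {
      Path restoreDir = Files.createTempDirectory("snapshot-restore-");
      Files.createFile(restoreDir.resolve("region-data"));
      runJobThenClean(restoreDir, () -> { /* snapshot scan omitted */ });
      return !Files.exists(restoreDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("restore directory removed: " + demo());
  }
}
```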





[jira] [Created] (HBASE-26323) introduce a SnapshotProcedure

2021-10-03 Thread ruanhui (Jira)
ruanhui created HBASE-26323:
---

 Summary: introduce a SnapshotProcedure
 Key: HBASE-26323
 URL: https://issues.apache.org/jira/browse/HBASE-26323
 Project: HBase
  Issue Type: New Feature
  Components: proc-v2, snapshots
Reporter: ruanhui


Currently, snapshots in HBase use ZooKeeper as the coordinator. This has some 
limitations:
 a. A snapshot may fail when region servers crash.
 b. A snapshot may fail when the master restarts.
 c. Only one snapshot per table can be taken at a time.
 d. Snapshot verification is handled by the master, which may take a long time 
when the table has a large number of regions, for example 1.

 

Since we have the procedure v2 framework now, it is possible to solve the above 
problems. So here is a procedure-v2-based snapshot implementation. It has the 
following goals:
 a. A snapshot can continue when region servers crash.
 b. A snapshot can continue when the master restarts.
 c. More than one snapshot per table can be taken at a time.
 d. Region servers can verify the snapshot, which speeds up the procedure.

 

Here are some details about the implementation.
 *SnapshotProcedure*
 SnapshotProcedure is used to take a snapshot of a table. It acquires the shared 
table lock on the snapshot table and holds the shared lock during suspend and 
yield. 
 *SnapshotRegionProcedure*
 SnapshotRegionProcedure is used to take a snapshot of a specific region of the 
snapshot table. It acquires the exclusive region lock and releases the lock 
during suspend and yield. Before dispatching remote snapshot operations to a 
region server, it checks whether the target region is in RIT (region in 
transition). If it is, the procedure sleeps for some time and retries.
 *SnapshotVerifyProcedure*
 SnapshotVerifyProcedure is used to send a snapshot verify request to a region 
server. If the snapshot is corrupted, it notifies the parent snapshot procedure 
to retry. When the remote region server has crashed, it chooses another online 
server and retries.
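
The SnapshotVerifyProcedure behaviour described above can be sketched as a plain function. This is a simplified model under stated assumptions, not the procedure v2 implementation: the class, the enum, and the way crashed servers are represented are all hypothetical, and real procedures persist state and run asynchronously.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SnapshotVerifySketch {
  enum Outcome { VERIFIED, RETRY_SNAPSHOT, NO_SERVER_AVAILABLE }

  // Tries the online servers in order. A crashed server is skipped and the
  // next one is tried; the first live server decides the outcome.
  static Outcome verify(List<String> onlineServers, Set<String> crashedServers,
      boolean snapshotCorrupted) {
    for (String server : onlineServers) {
      if (crashedServers.contains(server)) {
        continue; // remote server crashed: choose another online server
      }
      // A live server ran the verification; a corrupted snapshot means the
      // parent has to retry.
      return snapshotCorrupted ? Outcome.RETRY_SNAPSHOT : Outcome.VERIFIED;
    }
    return Outcome.NO_SERVER_AVAILABLE;
  }

  public static void main(String[] args) {
    List<String> servers = Arrays.asList("rs1", "rs2");
    Set<String> crashed = new HashSet<>(Arrays.asList("rs1"));
    System.out.println(verify(servers, crashed, false)); // prints "VERIFIED"
    System.out.println(verify(servers, crashed, true));  // prints "RETRY_SNAPSHOT"
  }
}
```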

 

I would be very grateful for any advice and guidance. Is anyone interested in 
taking a look?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-26166) table list in master ui has a minor bug

2021-08-02 Thread ruanhui (Jira)
ruanhui created HBASE-26166:
---

 Summary: table list in master ui has a minor bug
 Key: HBASE-26166
 URL: https://issues.apache.org/jira/browse/HBASE-26166
 Project: HBase
  Issue Type: Bug
  Components: UI
Affects Versions: 2.4.5
Reporter: ruanhui
 Attachments: image-2021-08-03-13-09-24-030.png

!image-2021-08-03-13-09-24-030.png!





[jira] [Created] (HBASE-25880) remove files from filesCompacting when clear compaction queues

2021-05-11 Thread ruanhui (Jira)
ruanhui created HBASE-25880:
---

 Summary: remove files from filesCompacting when clear compaction 
queues
 Key: HBASE-25880
 URL: https://issues.apache.org/jira/browse/HBASE-25880
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: ruanhui
Assignee: ruanhui


When clearing the compaction queues, we only clear the workQueue of the 
ThreadPoolExecutor, but the files in the dropped compaction requests are still 
in the filesCompacting list. Maybe we should remove them as well.
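
The mismatch can be shown with a small model. This is a sketch under assumptions, not the HBase compaction code: CompactionTask, the file names, and the plain list standing in for filesCompacting are all hypothetical. Tasks are placed directly into the executor's queue, so no worker thread starts and the result is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ClearCompactionQueueSketch {
  // Hypothetical compaction request that remembers which files it selected.
  static final class CompactionTask implements Runnable {
    final List<String> files;

    CompactionTask(List<String> files) {
      this.files = files;
    }

    @Override
    public void run() {
      // compaction work omitted
    }
  }

  // Clears the work queue and returns how many files are still tracked as
  // "compacting" afterwards.
  static int remainingTrackedFiles(boolean alsoReleaseFiles) {
    BlockingQueue<Runnable> workQueue = new LinkedBlockingQueue<>();
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS, workQueue);

    List<String> selected = Arrays.asList("hfile-1", "hfile-2");
    List<String> filesCompacting = new ArrayList<>(selected);
    // Added straight to the queue so that no worker thread picks it up.
    workQueue.add(new CompactionTask(selected));

    // Clearing the queue alone drops the request but not the tracked files.
    List<Runnable> dropped = new ArrayList<>();
    workQueue.drainTo(dropped);
    if (alsoReleaseFiles) {
      for (Runnable task : dropped) {
        filesCompacting.removeAll(((CompactionTask) task).files);
      }
    }
    pool.shutdown();
    return filesCompacting.size();
  }

  public static void main(String[] args) {
    System.out.println(remainingTrackedFiles(false)); // prints "2"
    System.out.println(remainingTrackedFiles(true));  // prints "0"
  }
}
```

Files left in the tracking list would presumably stay ineligible for later compaction selection, which is why releasing them together with the dropped requests seems worthwhile.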





[jira] [Created] (HBASE-25102) fix replication.stats.thread.period.seconds default setting bug

2020-09-27 Thread ruanhui (Jira)
ruanhui created HBASE-25102:
---

 Summary: fix replication.stats.thread.period.seconds default 
setting bug 
 Key: HBASE-25102
 URL: https://issues.apache.org/jira/browse/HBASE-25102
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.3.2, 2.2.6
Reporter: ruanhui


replication.stats.thread.period.seconds is in seconds, while the default 
TimeUnit is TimeUnit.MILLISECONDS.
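
The size of the error is easy to see with java.util.concurrent.TimeUnit. The snippet below is only an illustration: the value 300 is an assumed example period, not necessarily the real default, and the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

public class PeriodUnitSketch {
  // Converts a configured period to milliseconds under the given unit.
  static long effectivePeriodMillis(long configuredPeriod, TimeUnit unit) {
    return unit.toMillis(configuredPeriod);
  }

  public static void main(String[] args) {
    long period = 300L; // assumed example value for the configured period

    // Interpreted as milliseconds, the task would run every 0.3 seconds.
    System.out.println(effectivePeriodMillis(period, TimeUnit.MILLISECONDS)); // prints "300"

    // Interpreted as seconds, as the property name says, every 5 minutes.
    System.out.println(effectivePeriodMillis(period, TimeUnit.SECONDS)); // prints "300000"
  }
}
```

With the wrong unit the stats thread would run a thousand times more often than the configuration suggests.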


