Re: [DISCUSS] Upgrade Zookeeper and Curator to latest version

2023-10-11 Thread Duo Zhang
Bump.

There is a new CVE in ZooKeeper releases before 3.7.2.

https://nvd.nist.gov/vuln/detail/CVE-2023-44981

I think it is time for us to bump the ZooKeeper version now.

Thanks.

张铎 (Duo Zhang) wrote on Thursday, March 16, 2023 at 18:27:
>
> We only use Curator in hbase-examples IIRC, so it should be OK to upgrade it.
>
> For ZooKeeper, besides client-server wire compatibility, we also need to 
> consider Java-level compatibility. For example, different protobuf versions can 
> communicate with each other over the wire, but if you depend on protobuf 2.5 and 3.x 
> in the same Java project, you will be in trouble because the classes are different...
>
> So the question here is, if we upgrade ZooKeeper to 3.8.x, will it break 
> downstream users who are still on ZooKeeper 3.6.x or 3.7.x?
>
> Thanks.
>
> Villő Szűcs wrote on Tuesday, February 28, 2023 at 22:33:
>>
>> Hi,
>> I’d like to upgrade ZooKeeper in HBase (and in other components as well) to
>> version 3.8.1, and Curator to version 5.4.0.
>> This is useful since the current ZooKeeper version, 3.5.7, is EOL, and we should
>> release HBase 3 with the latest ZooKeeper so that we are on an actively maintained
>> version.
>> ZooKeeper clients from 3.5.x onwards are fully compatible with 3.8.x
>> servers. ZooKeeper 3.8.x clients are compatible with 3.5.x, 3.6.x and 3.7.x
>> servers as long as we are not using new APIs that are not present in these versions.
>> See the ZooKeeper 3.8.0 Release Notes[1] for details.
>> Curator 5.0 contains a few non-backward-compatible/breaking changes from
>> previous versions: https://curator.apache.org/breaking-changes.html, but
>> these changes have no effect on HBase. See the Curator Release Notes[2] for
>> details.
>> Do you have any suggestions?
>>
>> [1] https://zookeeper.apache.org/doc/r3.8.0/releasenotes.html
>> [2] https://cwiki.apache.org/confluence/display/CURATOR/Releases
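
For context, a minimal, hypothetical sketch of the kind of basic Curator client usage the thread refers to (Curator only appears in hbase-examples per the discussion above). The connect string, retry values and znode path are illustrative, and the calls shown exist in both Curator 4.x and 5.x:

{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorClientSketch {
  public static void main(String[] args) throws Exception {
    // Bounded retries against an illustrative local ZooKeeper ensemble.
    try (CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new ExponentialBackoffRetry(1000, 3))) {
      client.start();
      client.blockUntilConnected();
      // Check whether the (hypothetical) /hbase base znode exists.
      System.out.println("exists: " + (client.checkExists().forPath("/hbase") != null));
    }
  }
}
{code}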


[jira] [Resolved] (HBASE-27382) Cluster completely down due to WAL splitting failing for hbase:meta table.

2023-10-11 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-27382.
-
  Assignee: (was: Rushabh Shah)
Resolution: Won't Fix

> Cluster completely down due to WAL splitting failing for hbase:meta table.
> --
>
> Key: HBASE-27382
> URL: https://issues.apache.org/jira/browse/HBASE-27382
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.0, 1.7.2, 2.4.14
>Reporter: Rushabh Shah
>Priority: Major
>
> We are running a version based on 1.7.2 in our production environment. We 
> encountered this issue recently.
> We colocate the namenode and the region server holding the hbase:meta table on a 
> set of 5 master nodes. Coincidentally, the active namenode and the region server 
> holding the meta table were on the same physical node, and that node went down 
> due to a hardware issue. We have suboptimal HDFS-level timeouts configured, so 
> whenever the active namenode goes down, it takes around 12-15 minutes for the 
> HDFS client within HBase to connect to the new active namenode. As a result, all 
> the region servers had trouble connecting to the new active namenode for about 
> 15 minutes.
> Below is the sequence of events:
> 1. Host running active namenode and hbase:meta went down at +2022-09-09 
> 16:56:56,878+
> 2. HMaster started running ServerCrashProcedure at +2022-09-09 16:59:05,696+
> {noformat}
> 2022-09-09 16:59:05,696 DEBUG [t-processor-pool2-t1] 
> procedure2.ProcedureExecutor - Procedure ServerCrashProcedure 
> serverName=,61020,1662714013670, shouldSplitWal=true, 
> carryingMeta=true id=1 owner=dummy state=RUNNABLE:SERVER_CRASH_START added to 
> the store.
> 2022-09-09 16:59:05,702 DEBUG [t-processor-pool2-t1] master.ServerManager - 
> Added=,61020,1662714013670 to dead servers, submitted shutdown 
> handler to be executed meta=true
> 2022-09-09 16:59:05,707 DEBUG [ProcedureExecutor-0] master.DeadServer - 
> Started processing ,61020,1662714013670; numProcessing=1
> 2022-09-09 16:59:05,712 INFO  [ProcedureExecutor-0] 
> procedure.ServerCrashProcedure - Start processing crashed 
> ,61020,1662714013670
> {noformat}
> 3. SplitLogManager created 2 split log tasks in zookeeper.
> {noformat}
> 2022-09-09 16:59:06,049 INFO  [ProcedureExecutor-1] master.SplitLogManager - 
> Started splitting 2 logs in 
> [hdfs:///hbase/WALs/,61020,1662714013670-splitting]
>  for [,61020,1662714013670]
> 2022-09-09 16:59:06,081 DEBUG [main-EventThread] 
> coordination.SplitLogManagerCoordination - put up splitlog task at znode 
> /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta
> 2022-09-09 16:59:06,093 DEBUG [main-EventThread] 
> coordination.SplitLogManagerCoordination - put up splitlog task at znode 
> /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662739251611.meta
> {noformat}
> 4. The first split log task is more interesting: 
> +/hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta+
> 5. Since all the region servers were having problems connecting to the active 
> namenode, SplitLogManager tried a total of 4 times to assign this task (3 
> resubmits, configured by hbase.splitlog.max.resubmit) and then finally gave up 
> (a configuration sketch for this limit follows the log excerpt below).
> {noformat}
> -- try 1 -
> 2022-09-09 16:59:06,205 INFO  [main-EventThread] 
> coordination.SplitLogManagerCoordination - task 
> /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta
>  acquired by ,61020,1662540522069
> -- try 2 -
> 2022-09-09 17:01:06,642 INFO  [ager__ChoreService_1] 
> coordination.SplitLogManagerCoordination - resubmitting task 
> /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta
> 2022-09-09 17:01:06,666 DEBUG [main-EventThread] 
> coordination.SplitLogManagerCoordination - task not yet acquired 
> /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta
>  ver = 2
> 2022-09-09 17:01:06,715 INFO  [main-EventThread] 
> coordination.SplitLogManagerCoordination - task 
> /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta
>  acquired by ,61020,1662530684713
> -- try 3 -
> 2022-09-09 17:03:07,643 INFO  [ager__ChoreService_1] 
> coordination.SplitLogManagerCoordination - resubmitting task 
> /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta
> 2022-09-09 17:03:07,687 DEBUG [main-EventThread] 
> coordination.SplitLogManagerCoordination - task not yet acquired 
> 
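
For reference, the resubmit limit named in step 5 above is an ordinary HBase configuration key. A minimal, hypothetical sketch of reading it follows; the default of 3 matches the behaviour described in the issue, but where exactly SplitLogManager performs this lookup is not shown here:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SplitLogResubmitSketch {
  public static void main(String[] args) {
    // Loads hbase-default.xml / hbase-site.xml from the classpath.
    Configuration conf = HBaseConfiguration.create();
    // 3 resubmits means 4 assignment attempts in total, as seen in the log above.
    int maxResubmit = conf.getInt("hbase.splitlog.max.resubmit", 3);
    System.out.println("hbase.splitlog.max.resubmit = " + maxResubmit);
  }
}
{code}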

[jira] [Resolved] (HBASE-28133) TestSyncTimeRangeTracker fails with OOM with small -Xms values

2023-10-11 Thread Peter Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi resolved HBASE-28133.
---
Fix Version/s: 2.6.0
   2.4.18
   2.5.6
   3.0.0-beta-1
   Resolution: Fixed

Merged to all active branches. Thanks [~stoty]!

> TestSyncTimeRangeTracker fails with OOM with small -Xms values
> --
>
> Key: HBASE-28133
> URL: https://issues.apache.org/jira/browse/HBASE-28133
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.4.17
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: Arm64, test
> Fix For: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1
>
>
> Edit2: It's not the OS, it's the -Xmx value determined from the host memory 
> size.
> Edit: It's related to the OS and its default Java 8, not to the processor 
> architecture.
> This test seems to be cutting it very close to the heap size.
> It consistently fails on my RHEL 8.8 aarch64 VM with Java 8.
> {noformat}
> mvn test -P runDevTests -Dtest.build.data.basedirectory=/ram2G 
> -Dhadoop.profile=3.0 -fn -B -Dtest=TestSyncTimeRangeTracker* -pl hbase-server
> ...
> [ERROR] 
> org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker.testConcurrentIncludeTimestampCorrectness
>   Time elapsed: 1.969 s  <<< ERROR!
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> It seems that Java on ARM has somewhat higher memory overhead than on x86_64.
> Simply bumping -Xmx from the default 2200m to 2300m allows it to pass. 
> {noformat}
> mvn test -P runDevTests -Dtest.build.data.basedirectory=/ram2G 
> -Dhadoop.profile=3.0 -fn -B -Dtest=TestSyncTimeRangeTracker* -pl hbase-server 
> -Dsurefire.Xmx=2300m
> ...
> [INFO] Running org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker
> [INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.395 
> s - in org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker
> {noformat}
> However, the real solution should be reducing the memory usage for this test.
>  
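
One hypothetical way to make a test like this less sensitive to whichever surefire -Xmx is in effect (not the fix that was merged, just an illustration of the point about the heap ceiling) is to budget its allocations relative to the runtime's reported maximum heap:

{code:java}
public class HeapBudgetSketch {
  public static void main(String[] args) {
    // The heap ceiling actually in effect for this JVM, i.e. the -Xmx that
    // surefire (or the host-memory-based default) ended up choosing, in bytes.
    long maxHeap = Runtime.getRuntime().maxMemory();
    // Size test data as a fraction of the available heap rather than
    // hard-coding amounts that only fit one particular -Xmx value.
    long budget = (long) (maxHeap * 0.5);
    System.out.printf("max heap = %d MiB, test budget = %d MiB%n",
        maxHeap >> 20, budget >> 20);
  }
}
{code}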





[jira] [Created] (HBASE-28146) ServerManager's rsAdmins map should be thread safe

2023-10-11 Thread Ray Mattingly (Jira)
Ray Mattingly created HBASE-28146:
-

 Summary: ServerManager's rsAdmins map should be thread safe
 Key: HBASE-28146
 URL: https://issues.apache.org/jira/browse/HBASE-28146
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.5.5
Reporter: Ray Mattingly
Assignee: Ray Mattingly


On 2.x [the ServerManager registers admins in a 
HashMap|https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java].
 This can result in thread safety issues — we recently observed an exception 
which caused a region to be indefinitely stuck in transition until we could 
manually intervene. We saw the following exception in the HMaster logs:
{code:java}
2023-10-11 02:20:05.213 [RSProcedureDispatcher-pool-325] ERROR 
org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher: Unexpected 
error caught, this may cause the procedure to hang forever
    java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast 
to class java.util.HashMap$TreeNode (java.util.HashMap$Node and 
java.util.HashMap$TreeNode are in module java.base of loader 'bootstrap')
        at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1900) ~[?:?]
        at java.util.HashMap$TreeNode.treeify(HashMap.java:2016) ~[?:?]
        at java.util.HashMap.treeifyBin(HashMap.java:768) ~[?:?]
        at java.util.HashMap.putVal(HashMap.java:640) ~[?:?]
        at java.util.HashMap.put(HashMap.java:608) ~[?:?]
        at 
org.apache.hadoop.hbase.master.ServerManager.getRsAdmin(ServerManager.java:723){code}
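
A minimal sketch of the kind of change the summary points at (hypothetical; the actual patch may differ): replace the plain HashMap with a ConcurrentHashMap so that concurrent getRsAdmin calls from multiple dispatcher threads cannot corrupt the map's internal structure.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-ins for the real server name and admin stub types.
class ServerName {}
class AdminStub {}

public class RsAdminCacheSketch {
  // ConcurrentHashMap gives thread-safe reads and writes without external locking.
  private final ConcurrentMap<ServerName, AdminStub> rsAdmins = new ConcurrentHashMap<>();

  AdminStub getRsAdmin(ServerName server) {
    // computeIfAbsent creates the stub at most once per server,
    // even under concurrent calls.
    return rsAdmins.computeIfAbsent(server, s -> new AdminStub());
  }
}
{code}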





[jira] [Created] (HBASE-28145) When specifying a wrong BLOOMFILTER type while creating a table in HBase shell, the shell reports an 'uninitialized constant' error.

2023-10-11 Thread Xiao Zhang (Jira)
Xiao Zhang created HBASE-28145:
--

 Summary: When specifying a wrong BLOOMFILTER type while creating 
a table in HBase shell, the shell reports an 'uninitialized constant' error.
 Key: HBASE-28145
 URL: https://issues.apache.org/jira/browse/HBASE-28145
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 2.5.5, 3.0.0-alpha-4
Reporter: Xiao Zhang
Assignee: Xiao Zhang
 Attachments: image-2023-10-11-16-14-31-219.png

Executing the following command in the HBase shell with a wrong BLOOMFILTER 
type prompts "ERROR: uninitialized constant 
Java::OrgApacheHadoopHbaseRegionserver::StoreFile::BloomType".

```
create 'zx', {NAME => '0', BLOOMFILTER => 'TEST'}
```

!image-2023-10-11-16-14-31-219.png!
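
The error above suggests the shell still references an old StoreFile::BloomType constant; the current enum is org.apache.hadoop.hbase.regionserver.BloomType. A minimal, hypothetical sketch of validating a user-supplied value against that enum (illustrative only, not the shell's actual Ruby code path):

{code:java}
import java.util.Arrays;
import org.apache.hadoop.hbase.regionserver.BloomType;

public class BloomTypeCheckSketch {
  public static void main(String[] args) {
    String requested = "TEST"; // the invalid value from the example above
    try {
      System.out.println("Valid bloom filter type: " + BloomType.valueOf(requested.toUpperCase()));
    } catch (IllegalArgumentException e) {
      // A clearer failure than an uninitialized-constant error:
      // report the values that are actually accepted.
      System.out.println("Unknown BLOOMFILTER type '" + requested
          + "', valid values: " + Arrays.toString(BloomType.values()));
    }
  }
}
{code}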


