Re: [DISCUSS] Upgrade Zookeeper and Curator to latest version
Bump. There is a new CVE in ZooKeeper before 3.7.2: https://nvd.nist.gov/vuln/detail/CVE-2023-44981 I think it is time for us to bump the ZooKeeper version now. Thanks.

张铎 (Duo Zhang) wrote on Thu, Mar 16, 2023 at 18:27:
>
> We only use Curator in hbase-examples IIRC, so it should be OK to upgrade it.
>
> For ZooKeeper, besides client-server wire compatibility, we also need to
> consider Java compatibility. For example, different protobuf versions can
> communicate with each other, but if you depend on both protobuf 2.5 and 3.x in a
> Java project, you will be in trouble as the classes are different...
>
> So the question here is: if we upgrade ZooKeeper to 3.8.x, will it break
> downstream users who are still on ZooKeeper 3.6.x or 3.7.x?
>
> Thanks.
>
> Villő Szűcs wrote on Tue, Feb 28, 2023 at 22:33:
>>
>> Hi,
>> I'd like to upgrade ZooKeeper in HBase (and in other components as well) to
>> version 3.8.1, and Curator to version 5.4.0.
>> This is useful since the current ZooKeeper version 3.5.7 is EOL, and we should
>> release HBase 3 with the latest ZooKeeper to be on an actively maintained version.
>> ZooKeeper clients from 3.5.x onwards are fully compatible with 3.8.x
>> servers. ZooKeeper 3.8.x clients are compatible with 3.5.x, 3.6.x and 3.7.x
>> servers as long as we are not using new APIs not present in those versions.
>> See the ZooKeeper 3.8.0 Release Notes[1] for details.
>> Curator 5.0 contains a few non-backward-compatible/breaking changes from
>> previous versions (https://curator.apache.org/breaking-changes.html), but
>> these changes have no effect on HBase. See the Curator Release Notes[2] for
>> details.
>> Do you have any suggestions?
>>
>> [1] https://zookeeper.apache.org/doc/r3.8.0/releasenotes.html
>> [2] https://cwiki.apache.org/confluence/display/CURATOR/Releases
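For a downstream build that wants to trial the versions proposed in this thread, the upgrade is essentially a matter of bumping dependency coordinates. A minimal Maven fragment as a sketch; the version numbers come from the proposal above, and whether your build routes these through dependencyManagement or version properties is project-specific:

```xml
<!-- Proposed versions from this thread; adapt to your build's
     dependencyManagement conventions. -->
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.8.1</version>
</dependency>
<dependency>
  <groupId>org.apache.curator</groupId>
  <artifactId>curator-framework</artifactId>
  <version>5.4.0</version>
</dependency>
```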
[jira] [Resolved] (HBASE-27382) Cluster completely down due to WAL splitting failing for hbase:meta table.
[ https://issues.apache.org/jira/browse/HBASE-27382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Kyle Purtell resolved HBASE-27382.
-----------------------------------------
    Assignee: (was: Rushabh Shah)
    Resolution: Won't Fix

> Cluster completely down due to WAL splitting failing for hbase:meta table.
> --------------------------------------------------------------------------
>
> Key: HBASE-27382
> URL: https://issues.apache.org/jira/browse/HBASE-27382
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.5.0, 1.7.2, 2.4.14
> Reporter: Rushabh Shah
> Priority: Major
>
> We are running some version of 1.7.2 in our production environment. We
> encountered this issue recently.
> We colocate the namenode and the region server holding the hbase:meta table
> on a set of 5 master nodes. Coincidentally, the active namenode and the
> region server holding the meta table were on the same physical node, and that
> node went down due to a hardware issue. We have suboptimal HDFS-level
> timeouts configured, so whenever the active namenode goes down, it takes
> around 12-15 minutes for the HDFS client within HBase to connect to the new
> active namenode. So all the region servers had problems connecting to the new
> active namenode for about 15 minutes.
> Below is the sequence of events:
> 1. The host running the active namenode and hbase:meta went down at +2022-09-09 16:56:56,878+
> 2. HMaster started running ServerCrashProcedure at +2022-09-09 16:59:05,696+
> {noformat}
> 2022-09-09 16:59:05,696 DEBUG [t-processor-pool2-t1] procedure2.ProcedureExecutor - Procedure ServerCrashProcedure serverName=,61020,1662714013670, shouldSplitWal=true, carryingMeta=true id=1 owner=dummy state=RUNNABLE:SERVER_CRASH_START added to the store.
> 2022-09-09 16:59:05,702 DEBUG [t-processor-pool2-t1] master.ServerManager - Added=,61020,1662714013670 to dead servers, submitted shutdown handler to be executed meta=true
> 2022-09-09 16:59:05,707 DEBUG [ProcedureExecutor-0] master.DeadServer - Started processing ,61020,1662714013670; numProcessing=1
> 2022-09-09 16:59:05,712 INFO [ProcedureExecutor-0] procedure.ServerCrashProcedure - Start processing crashed ,61020,1662714013670
> {noformat}
> 3. SplitLogManager created 2 split log tasks in zookeeper.
> {noformat}
> 2022-09-09 16:59:06,049 INFO [ProcedureExecutor-1] master.SplitLogManager - Started splitting 2 logs in [hdfs:///hbase/WALs/,61020,1662714013670-splitting] for [,61020,1662714013670]
> 2022-09-09 16:59:06,081 DEBUG [main-EventThread] coordination.SplitLogManagerCoordination - put up splitlog task at znode /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta
> 2022-09-09 16:59:06,093 DEBUG [main-EventThread] coordination.SplitLogManagerCoordination - put up splitlog task at znode /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662739251611.meta
> {noformat}
> 4. The first split log task is more interesting:
> +/hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta+
> 5. Since all the region servers were having problems connecting to the active
> namenode, SplitLogManager tried a total of 4 times to assign this task (3
> resubmits, configured by hbase.splitlog.max.resubmit) and then finally gave up.
> {noformat}
> -- try 1 -
> 2022-09-09 16:59:06,205 INFO [main-EventThread] coordination.SplitLogManagerCoordination - task /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta acquired by ,61020,1662540522069
> -- try 2 -
> 2022-09-09 17:01:06,642 INFO [ager__ChoreService_1] coordination.SplitLogManagerCoordination - resubmitting task /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta
> 2022-09-09 17:01:06,666 DEBUG [main-EventThread] coordination.SplitLogManagerCoordination - task not yet acquired /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta ver = 2
> 2022-09-09 17:01:06,715 INFO [main-EventThread] coordination.SplitLogManagerCoordination - task /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta acquired by ,61020,1662530684713
> -- try 3 -
> 2022-09-09 17:03:07,643 INFO [ager__ChoreService_1] coordination.SplitLogManagerCoordination - resubmitting task /hbase/splitWAL/WALs%2F%2C61020%2C1662714013670-splitting%2F%252C61020%252C1662714013670.meta.1662735651285.meta
> 2022-09-09 17:03:07,687 DEBUG [main-EventThread] coordination.SplitLogManagerCoordination - task not yet acquired
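The resubmit ceiling mentioned in step 5 is an ordinary site property, hbase.splitlog.max.resubmit, which the description above says defaults to 3. A cluster that wants the SplitLogManager to keep retrying through a slow namenode failover could raise it in hbase-site.xml. A sketch only; the value 6 is illustrative, not a tuning recommendation:

```xml
<!-- hbase-site.xml: raise the WAL-split task resubmit limit from its
     default of 3. The value 6 here is illustrative only. -->
<property>
  <name>hbase.splitlog.max.resubmit</name>
  <value>6</value>
</property>
```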
[jira] [Resolved] (HBASE-28133) TestSyncTimeRangeTracker fails with OOM with small -Xms values
[ https://issues.apache.org/jira/browse/HBASE-28133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Somogyi resolved HBASE-28133.
-----------------------------------
    Fix Version/s: 2.6.0
                   2.4.18
                   2.5.6
                   3.0.0-beta-1
    Resolution: Fixed

Merged to all active branches. Thanks [~stoty]!

> TestSyncTimeRangeTracker fails with OOM with small -Xms values
> --------------------------------------------------------------
>
> Key: HBASE-28133
> URL: https://issues.apache.org/jira/browse/HBASE-28133
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.4.17
> Reporter: Istvan Toth
> Assignee: Istvan Toth
> Priority: Major
> Labels: Arm64, test
> Fix For: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1
>
> Edit2: It's not the OS, it's the -Xmx value determined from the host memory size.
> Edit: It's related to the OS and its default Java 8, not to the processor architecture.
> This test seems to be cutting it very close to the heap size.
> On ARM, it consistently fails on my RHEL 8.8 AArch64 VM with Java 8.
> {noformat}
> mvn test -P runDevTests -Dtest.build.data.basedirectory=/ram2G -Dhadoop.profile=3.0 -fn -B -Dtest=TestSyncTimeRangeTracker* -pl hbase-server
> ...
> [ERROR] org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker.testConcurrentIncludeTimestampCorrectness Time elapsed: 1.969 s <<< ERROR!
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> It seems that Java on ARM has a somewhat higher memory overhead than on x86_64.
> Simply bumping -Xmx from the default 2200m to 2300m allows it to pass.
> {noformat}
> mvn test -P runDevTests -Dtest.build.data.basedirectory=/ram2G -Dhadoop.profile=3.0 -fn -B -Dtest=TestSyncTimeRangeTracker* -pl hbase-server -Dsurefire.Xmx=2300m
> ...
> [INFO] Running org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker
> [INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.395 s - in org.apache.hadoop.hbase.regionserver.TestSyncTimeRangeTracker
> {noformat}
> However, the real solution should be reducing the memory usage of this test.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (HBASE-28146) ServerManager's rsAdmins map should be thread safe
Ray Mattingly created HBASE-28146:
-------------------------------------

             Summary: ServerManager's rsAdmins map should be thread safe
                 Key: HBASE-28146
                 URL: https://issues.apache.org/jira/browse/HBASE-28146
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.5.5
            Reporter: Ray Mattingly
            Assignee: Ray Mattingly

On 2.x [the ServerManager registers admins in a HashMap|https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java]. This can result in thread safety issues: we recently observed an exception that caused a region to be stuck in transition indefinitely until we could manually intervene.

We saw the following exception in the HMaster logs:
{code:java}
2023-10-11 02:20:05.213 [RSProcedureDispatcher-pool-325] ERROR org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher: Unexpected error caught, this may cause the procedure to hang forever
java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.util.HashMap$TreeNode (java.util.HashMap$Node and java.util.HashMap$TreeNode are in module java.base of loader 'bootstrap')
    at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1900) ~[?:?]
    at java.util.HashMap$TreeNode.treeify(HashMap.java:2016) ~[?:?]
    at java.util.HashMap.treeifyBin(HashMap.java:768) ~[?:?]
    at java.util.HashMap.putVal(HashMap.java:640) ~[?:?]
    at java.util.HashMap.put(HashMap.java:608) ~[?:?]
    at org.apache.hadoop.hbase.master.ServerManager.getRsAdmin(ServerManager.java:723){code}
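The usual remedy for this class of bug is to back the cache with a ConcurrentHashMap (or to synchronize access), since plain HashMap can corrupt its internal structure under racing put() calls, which is exactly what the treeify stack trace above shows. A minimal, self-contained sketch of that fix follows; it is not the actual ServerManager code, and the class name RsAdminRegistry and the String stand-in for the per-server admin stub are illustrative only:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RsAdminRegistry {
    // ConcurrentHashMap makes computeIfAbsent atomic and safe under
    // concurrent access from multiple dispatcher threads, unlike HashMap,
    // whose internal treeification can corrupt under racing put() calls.
    private final Map<String, String> rsAdmins = new ConcurrentHashMap<>();

    // Stand-in for ServerManager#getRsAdmin: create the admin stub on
    // first access, return the cached one afterwards.
    public String getRsAdmin(String serverName) {
        return rsAdmins.computeIfAbsent(serverName, sn -> "admin-for-" + sn);
    }

    public static void main(String[] args) throws Exception {
        RsAdminRegistry registry = new RsAdminRegistry();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                // Each thread registers 1000 distinct servers concurrently.
                for (int j = 0; j < 1000; j++) {
                    registry.getRsAdmin("rs-" + id + "-" + j);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // All 8000 entries present: no lost updates, no ClassCastException.
        System.out.println(registry.rsAdmins.size());
    }
}
```

With the original HashMap, the same concurrent-registration pattern can lose entries or throw the ClassCastException quoted in the report; with ConcurrentHashMap the map stays consistent without any external locking.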
[jira] [Created] (HBASE-28145) When specifying the wrong BloomFilter type while creating a table in HBase shell, the log prompt will report an error.
Xiao Zhang created HBASE-28145:
----------------------------------

             Summary: When specifying the wrong BloomFilter type while creating a table in HBase shell, the log prompt will report an error.
                 Key: HBASE-28145
                 URL: https://issues.apache.org/jira/browse/HBASE-28145
             Project: HBase
          Issue Type: Bug
          Components: shell
    Affects Versions: 2.5.5, 3.0.0-alpha-4
            Reporter: Xiao Zhang
            Assignee: Xiao Zhang
         Attachments: image-2023-10-11-16-14-31-219.png

Executing the following command in the HBase shell with an invalid BLOOMFILTER type will prompt "ERROR: uninitialized constant Java::OrgApacheHadoopHbaseRegionserver::StoreFile::BloomType".
{code}
create 'zx', {NAME => '0', BLOOMFILTER => 'TEST'}
{code}
!image-2023-10-11-16-14-31-219.png!
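The friendlier behavior the report implies is validating the supplied name against the enum's values before use and printing a readable message. A minimal sketch of that pattern in plain Java; the inner enum here is a stand-in for HBase's actual org.apache.hadoop.hbase.regionserver.BloomType (whose values include NONE, ROW and ROWCOL), and the class and method names are illustrative only:

```java
import java.util.Arrays;

public class BloomTypeCheck {
    // Stand-in for org.apache.hadoop.hbase.regionserver.BloomType.
    enum BloomType { NONE, ROW, ROWCOL }

    // Return the matching BloomType, or throw an IllegalArgumentException
    // with a readable message instead of surfacing a raw
    // "uninitialized constant" error to the shell user.
    static BloomType parse(String name) {
        try {
            return BloomType.valueOf(name.toUpperCase());
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unknown BLOOMFILTER type '" + name + "'; valid values: "
                    + Arrays.toString(BloomType.values()));
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("row"));  // prints ROW
        try {
            parse("TEST");                 // the bad value from the report
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```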