[jira] [Resolved] (HBASE-28648) Change the deprecation cycle for RegionObserver.postInstantiateDeleteTracker
[ https://issues.apache.org/jira/browse/HBASE-28648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liangjun He resolved HBASE-28648. - Fix Version/s: 3.0.0-beta-2 Resolution: Fixed > Change the deprecation cycle for RegionObserver.postInstantiateDeleteTracker > > > Key: HBASE-28648 > URL: https://issues.apache.org/jira/browse/HBASE-28648 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Reporter: Duo Zhang >Assignee: Liangjun He >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0-beta-2 > > > The visibility label feature still uses this method, so it cannot be removed in > 3.0.0. We should change the deprecation cycle javadoc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28389) HBase backup yarn queue parameter ignored
[ https://issues.apache.org/jira/browse/HBASE-28389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liangjun He resolved HBASE-28389. - Fix Version/s: 2.7.0 3.0.0-beta-2 2.6.1 Resolution: Fixed > HBase backup yarn queue parameter ignored > - > > Key: HBASE-28389 > URL: https://issues.apache.org/jira/browse/HBASE-28389 > Project: HBase > Issue Type: Bug > Components: backuprestore >Affects Versions: 2.6.0 > Environment: HBase branch-2.6 >Reporter: Dieter De Paepe >Assignee: Liangjun He >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1 > > > It seems the parameter to specify the yarn queue for HBase backup (`-q`) is > ignored: > {code:java} > hbase backup create full hdfs:///tmp/backups/hbasetest/hbase -q hbase-backup > {code} > gets executed on the "default" queue. > Setting the queue through the configuration does work. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28761) Expose HTTP context in REST Client
Istvan Toth created HBASE-28761: --- Summary: Expose HTTP context in REST Client Key: HBASE-28761 URL: https://issues.apache.org/jira/browse/HBASE-28761 Project: HBase Issue Type: Improvement Components: REST Reporter: Istvan Toth We already expose the Apache HTTP Client object in the REST client, but we specify the context for each call separately, so it is not possible to retrieve it. Add a getter and setter for the stickyContext object. The use case for this is copying session cookies between clients to avoid re-authentication by each client object, but this may also be useful for debugging purposes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
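The proposed getter/setter is not yet specified; the sketch below is a hypothetical stand-in (the names getStickyContext/setStickyContext, the HttpContext class, and the session-cookie field are all assumptions, not the HBase REST Client API) showing how sharing one context would let a second client reuse the first client's session instead of re-authenticating:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for an HTTP execution context that accumulates session cookies.
class HttpContext {
    final Map<String, String> cookies = new HashMap<>();
}

// Stand-in REST client with the proposed sticky-context accessors.
class RestClient {
    private HttpContext stickyContext = new HttpContext();

    HttpContext getStickyContext() { return stickyContext; }

    void setStickyContext(HttpContext context) { this.stickyContext = context; }

    void authenticate() {
        // Pretend the server handed back a session cookie after authentication.
        stickyContext.cookies.put("hbase.rest.session", "abc123");
    }

    boolean hasSession() {
        return stickyContext.cookies.containsKey("hbase.rest.session");
    }
}

public class StickyContextDemo {
    public static void main(String[] args) {
        RestClient first = new RestClient();
        first.authenticate();                              // pays the authentication cost once

        RestClient second = new RestClient();
        second.setStickyContext(first.getStickyContext()); // shares the session cookies
        System.out.println(second.hasSession());           // true: no re-authentication needed
    }
}
```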
[jira] [Resolved] (HBASE-28587) Remove deprecated methods in Cell
[ https://issues.apache.org/jira/browse/HBASE-28587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28587. --- Fix Version/s: 3.0.0-beta-2 Hadoop Flags: Incompatible change,Reviewed Release Note: Removed these deprecated methods from Cell interface byte getTypeByte(); long getSequenceId(); byte[] getTagsArray(); int getTagsOffset(); int getTagsLength(); Resolution: Fixed > Remove deprecated methods in Cell > - > > Key: HBASE-28587 > URL: https://issues.apache.org/jira/browse/HBASE-28587 > Project: HBase > Issue Type: Sub-task > Components: API, Client >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0-beta-2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28729) Change the generic type of List in InternalScanner.next
[ https://issues.apache.org/jira/browse/HBASE-28729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28729. --- Fix Version/s: 3.0.0-beta-2 Hadoop Flags: Incompatible change,Reviewed (was: Incompatible change) Release Note: Change the InternalScanner.next method to accept List<? super ExtendedCell> rather than List<Cell>, so we do not need to cast everywhere in the code. This is a breaking change for coprocessor users, especially if you implement your own InternalScanner. In general, we can make sure that all the elements in the returned List are ExtendedCells, thus Cells, so you are free to cast them to ExtendedCell when you want to intercept the results. And all Cells created via CellBuilder are ExtendedCells, so you are free to cast them to ExtendedCell before adding them to the List, or you can cast the List to List<Cell> or even List<Object> to add Cells to it. Assignee: Duo Zhang Resolution: Fixed Pushed to master and branch-3. Thanks [~sunxin] for reviewing! > Change the generic type of List in InternalScanner.next > --- > > Key: HBASE-28729 > URL: https://issues.apache.org/jira/browse/HBASE-28729 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors, regionserver >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0-beta-2 > > > Plan to change it from List<Cell> to List<? super ExtendedCell>, so we could > pass both List<Cell> and List<ExtendedCell> to it, or even List<Object> for > coprocessors. > This could save a lot of casting in our main code. > This is an incompatible change for coprocessors, so it will only go into > branch-3+, and will be marked as an incompatible change. -- This message was sent by Atlassian Jira (v8.20.10#820010)
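The signature change hinges on a contravariant parameter (something like List<? super ExtendedCell>, inferred from the release note's description; the mailing-list rendering strips angle brackets). A minimal, self-contained sketch with stand-in Cell/ExtendedCell types shows why such a parameter accepts List<Cell>, List<ExtendedCell>, and List<Object> at call sites without any casting:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types: in HBase, ExtendedCell extends Cell.
interface Cell {}
interface ExtendedCell extends Cell {}

public class VarianceDemo {
    // A lower-bounded wildcard (List<? super ExtendedCell>) accepts any list
    // whose element type is ExtendedCell or one of its supertypes.
    static void next(List<? super ExtendedCell> results) {
        results.add(new ExtendedCell() {}); // the scanner only ever adds ExtendedCells
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        List<ExtendedCell> extended = new ArrayList<>();
        List<Object> objects = new ArrayList<>();
        next(cells);     // all three calls compile: no casting at the call site
        next(extended);
        next(objects);
        System.out.println(cells.size() + extended.size() + objects.size()); // 3
    }
}
```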
[jira] [Resolved] (HBASE-28753) FNFE may occur when accessing the region.jsp of the replica region
[ https://issues.apache.org/jira/browse/HBASE-28753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28753. --- Hadoop Flags: Reviewed Resolution: Fixed Pushed to all active branches. Thanks [~guluo] for contributing and [~PankajKumar] for reviewing! > FNFE may occur when accessing the region.jsp of the replica region > -- > > Key: HBASE-28753 > URL: https://issues.apache.org/jira/browse/HBASE-28753 > Project: HBase > Issue Type: Bug > Components: Replication, UI >Affects Versions: 2.4.13 >Reporter: guluo >Assignee: guluo >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11 > > Attachments: image-2024-07-24-20-13-22-014.png > > > On the HBase UI, we can get the details of the storefiles in a region by > accessing region.jsp. > However, when a table enables region replication, the replica region > may reference a deleted storefile because it doesn't refresh in a timely manner, > so in this case, we would get an FNFE when opening the region.jsp of the region. 
> > java.io.FileNotFoundException: File > file:/home/gl/code/github/hbase/hbase-assembly/target/hbase-4.0.0-alpha-1-SNAPSHOT/tmp/hbase/data/default/t01/e073c6b7c05eadda3f91d5b9692fc98d/info/5c52361153044b89aa61090cd5497998.4433b98ccf6b4a011ab03fc4a5e38a1a > does not exist at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:915) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1236) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:905) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462) > at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:1881) at > org.apache.hadoop.hbase.generated.regionserver.region_jsp._jspService(region_jsp.java:97) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:111) at > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28758) Remove the aarch64 profile
[ https://issues.apache.org/jira/browse/HBASE-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28758. --- Fix Version/s: 3.0.0-beta-2 Hadoop Flags: Reviewed Resolution: Fixed Pushed to master and branch-3. Thanks [~misterwang] for contributing! > Remove the aarch64 profile > -- > > Key: HBASE-28758 > URL: https://issues.apache.org/jira/browse/HBASE-28758 > Project: HBase > Issue Type: Improvement > Components: build, pom, Protobufs >Reporter: Duo Zhang >Assignee: MisterWang >Priority: Major > Labels: beginner, pull-request-available > Fix For: 3.0.0-beta-2 > > > We do not depend on protobuf 2.5 on branch-3+, so we do not need the special > protoc compiler for arm any more. > Just remove the profile. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28759) SLF4J: Class path contains multiple SLF4J bindings.
[ https://issues.apache.org/jira/browse/HBASE-28759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Longping Jie resolved HBASE-28759. -- Resolution: Not A Bug > SLF4J: Class path contains multiple SLF4J bindings. > --- > > Key: HBASE-28759 > URL: https://issues.apache.org/jira/browse/HBASE-28759 > Project: HBase > Issue Type: Improvement > Components: logging >Affects Versions: 2.6.0, 2.5.10 > Environment: hbase2.5.x 2.6.x > hadoop3.3.6 >Reporter: Longping Jie >Priority: Minor > > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/data/app/hadoop-3.3.6/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/data/app/hbase-2.5.10/lib/client-facing-thirdparty/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an > explanation. > > The above log dependency conflict causes the regionserver to be unable to > output logs after it is started. > By default, in the hbase script file in the bin directory, the value of > HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP is true, which will append the hadoop > lib to the classpath. In this way, after the hbase process is started, the > hadoop jar will be loaded, which may cause dependency conflicts. > Is it possible to set the variable HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP in > the hbase-env.sh file, set the default value to true, and only modify this > value to false when necessary? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28747) HBase-Nightly-s390x Build failures
[ https://issues.apache.org/jira/browse/HBASE-28747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28747. --- Assignee: Duo Zhang Resolution: Fixed > HBase-Nightly-s390x Build failures > -- > > Key: HBASE-28747 > URL: https://issues.apache.org/jira/browse/HBASE-28747 > Project: HBase > Issue Type: Task > Components: community, jenkins >Reporter: Soham Munshi >Assignee: Duo Zhang >Priority: Major > > Hi [~zhangduo] > This is regarding the recent [s390x CI > failures|https://ci-hbase.apache.org/job/HBase-Nightly-s390x/] . > The install.log and junit.log have the below output - > {code:java} > /tmp/jenkins18056117051185954087.sh: line 12: > /home/jenkins/tools/maven/latest3//bin/mvn: No such file or directory{code} > Upon checking the machine stats it seems like the Apache Maven path is not > getting set properly, since the mvn_home outputs - > {code:java} > MAVEN_HOME: /home/jenkins/tools/maven/latest3/{code} > whereas the mvn_version outputs the following - > {code:java} > Apache Maven 3.6.3 Maven home: /usr/share/maven Java version: 11.0.23, > vendor: Ubuntu, runtime: /usr/lib/jvm/java-11-openjdk-s390x Default locale: > en_US, platform encoding: UTF-8 OS name: "linux", version: > "5.4.0-174-generic", arch: "s390x", family: "unix"{code} > Could you please help us get this fixed? > Thanks. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28722) Should wipe out all the output directories before unstash in nightly job
[ https://issues.apache.org/jira/browse/HBASE-28722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28722. --- Hadoop Flags: Reviewed Resolution: Fixed > Should wipe out all the output directories before unstash in nightly job > > > Key: HBASE-28722 > URL: https://issues.apache.org/jira/browse/HBASE-28722 > Project: HBase > Issue Type: Bug > Components: jenkins, scripts >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11 > > > For master and branch-3, we do not have jdk8 and jdk11 stages but we can > still see there are comments on jira which include these stages' results. > I think the problem is that, in the 'init health results' stage, we want to > stash some empty results but actually there are some build results from > previous builds there, so we stash some non-empty results. > We should wipe out these directories first before stashing them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28760) Client integration test fails on master branch
Duo Zhang created HBASE-28760: - Summary: Client integration test fails on master branch Key: HBASE-28760 URL: https://issues.apache.org/jira/browse/HBASE-28760 Project: HBase Issue Type: Bug Components: jenkins, scripts Reporter: Duo Zhang Permission denied... Not sure what is the real problem. {noformat} 17:17:52 [Sun Jul 28 09:17:51 AM UTC 2024 INFO]: Personality: patch mvninstall 17:17:52 cd /home/jenkins/jenkins-home/workspace/HBase_Nightly_master/component 17:17:52 /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-home/workspace/HBase_Nightly_master/yetus-m2/hbase-master-full-0 --threads=2 -Djava.io.tmpdir=/home/jenkins/jenkins-home/workspace/HBase_Nightly_master/component/target -DHBasePatchProcess -fae clean install -DskipTests=true -Dmaven.javadoc.skip=true -Dcheckstyle.skip=true -Dfindbugs.skip=true -Dspotbugs.skip=true > /home/jenkins/jenkins-home/workspace/HBase_Nightly_master/output-general/patch-mvninstall-root.txt 2>&1 17:21:17 Building a binary tarball from the source tarball succeeded. [Pipeline] echo 17:21:17 unpacking the hbase bin tarball into 'hbase-install' and the client tarball into 'hbase-client' [Pipeline] sh 17:21:18 tar: /jaxws-ri-2.3.2.pom: Cannot open: Permission denied 17:21:20 tar: Exiting with failure status due to previous errors Post stage [Pipeline] stash 17:21:20 Warning: overwriting stash ‘srctarball-result’ 17:21:20 Stashed 2 file(s) [Pipeline] sshPublisher 17:21:20 SSH: Current build result is [FAILURE], not going to run. 
[Pipeline] sh 17:21:20 Remove /home/jenkins/jenkins-home/workspace/HBase_Nightly_master/output-srctarball/hbase-src.tar.gz for saving space [Pipeline] archiveArtifacts 17:21:20 Archiving artifacts [Pipeline] archiveArtifacts 17:21:20 Archiving artifacts [Pipeline] archiveArtifacts 17:21:20 Archiving artifacts [Pipeline] archiveArtifacts 17:21:20 Archiving artifacts [Pipeline] } [Pipeline] // withEnv [Pipeline] } [Pipeline] // node [Pipeline] } [Pipeline] // stage [Pipeline] } 17:21:20 Failed in branch packaging and integration {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28759) SLF4J: Class path contains multiple SLF4J bindings.
Longping Jie created HBASE-28759: Summary: SLF4J: Class path contains multiple SLF4J bindings. Key: HBASE-28759 URL: https://issues.apache.org/jira/browse/HBASE-28759 Project: HBase Issue Type: Bug Components: logging Affects Versions: 2.5.10, 2.6.0 Environment: hbase2.5.x 2.6.x hadoop3.3.6 Reporter: Longping Jie SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/data/app/hadoop-3.3.6/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/data/app/hbase-2.5.10/lib/client-facing-thirdparty/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. The above log dependency conflict causes the regionserver to be unable to output logs after it is started. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28758) Remove the aarch64 profile
Duo Zhang created HBASE-28758: - Summary: Remove the aarch64 profile Key: HBASE-28758 URL: https://issues.apache.org/jira/browse/HBASE-28758 Project: HBase Issue Type: Improvement Components: build, pom Reporter: Duo Zhang We do not depend on protobuf 2.5 on branch-3+, so we do not need the special protoc compiler for arm any more. Just remove the profile. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28719) Use ExtendedCell in WALEdit
[ https://issues.apache.org/jira/browse/HBASE-28719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28719. --- Fix Version/s: 3.0.0-beta-2 Resolution: Fixed Pushed to master and branch-3. Thanks [~sunxin] for reviewing! > Use ExtendedCell in WALEdit > --- > > Key: HBASE-28719 > URL: https://issues.apache.org/jira/browse/HBASE-28719 > Project: HBase > Issue Type: Sub-task > Components: wal >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0-beta-2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28748) Replication blocking: InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.
[ https://issues.apache.org/jira/browse/HBASE-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28748. --- Hadoop Flags: Reviewed Resolution: Fixed Pushed to branch-2.6+. Thanks to [~leojie] for reporting this issue and helping verify the patch. Thanks [~sunxin] for reviewing! > Replication blocking: > InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag > had invalid wire type. > -- > > Key: HBASE-28748 > URL: https://issues.apache.org/jira/browse/HBASE-28748 > Project: HBase > Issue Type: Bug > Components: Replication, wal >Affects Versions: 2.6.0 > Environment: hbase2.6.0 > hadoop3.3.6 >Reporter: Longping Jie >Assignee: Duo Zhang >Priority: Critical > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1 > > Attachments: image-2024-07-23-12-33-50-395.png, > rs-replciation-error.log, > tx1-int-hbase-main-prod-4%2C16020%2C1720602602602.1720609818921 > > > h2. Replication queue backlog, as shown below: > !image-2024-07-23-12-33-50-395.png! > > In the figure, the first WAL file no longer exists but has not been skipped, > causing replication to block. > The second and third WAL files were moved to oldWALs; as you can see in the attachment, > reading these two files failed. > h2. The error log in the RS is > 2024-07-22T17:47:49,130 WARN > [RS_CLAIM_REPLICATION_QUEUE-regionserver/sh2-int-hbase-main-ha-9:16020-0.replicationSource,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464.replicationSource.wal-reader.tx1-int-hbase-main-prod-3%2C16020%2C1720602522464,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464] > wal.ProtobufWALStreamReader: Error while reading WALKey, originalPosition=0, > currentPosition=81 > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: > Protocol message tag had invalid wire type. 
> at > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:119) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:503) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:770) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2829) > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4212) > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4204) > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > at > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:192) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage.parseWithIOException(GeneratedMessage.java:321) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey.parseFrom(WALProtos.java:2321) > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > at > 
org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.readWALKey(ProtobufWALTailingReader.java:128) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.next(ProtobufWALTailingReader.java:257) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:490) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.lastAttempt(WALEntryStream.java:306) > ~
[jira] [Resolved] (HBASE-28522) UNASSIGN proc indefinitely stuck on dead rs
[ https://issues.apache.org/jira/browse/HBASE-28522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28522. --- Fix Version/s: 2.7.0 3.0.0-beta-2 2.6.1 2.5.11 Hadoop Flags: Reviewed Assignee: Duo Zhang (was: Prathyusha) Resolution: Fixed Pushed to all active branches. Thanks all for helping and reviewing! > UNASSIGN proc indefinitely stuck on dead rs > --- > > Key: HBASE-28522 > URL: https://issues.apache.org/jira/browse/HBASE-28522 > Project: HBase > Issue Type: Improvement > Components: proc-v2, Region Assignment >Reporter: Prathyusha >Assignee: Duo Zhang >Priority: Critical > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11 > > Attachments: timeline.jpg > > > One scenario we noticed in production - > we had a DisableTableProc and an SCP triggered at almost the same time > 2024-03-16 17:59:23,014 INFO [PEWorker-11] procedure.DisableTableProcedure - > Set to state=DISABLING > 2024-03-16 17:59:15,243 INFO [PEWorker-26] procedure.ServerCrashProcedure - > Start pid=21592440, state=RUNNABLE:SERVER_CRASH_START, locked=true; > ServerCrashProcedure > , splitWal=true, meta=false > DisableTableProc creates unassign procs, and at this time the ASSIGNs of the SCP are > not completed > {{2024-03-16 17:59:23,003 DEBUG [PEWorker-40] procedure2.ProcedureExecutor - > LOCK_EVENT_WAIT pid=21594220, ppid=21592440, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; > TransitRegionStateProcedure table=, region=, ASSIGN}} > The UNASSIGN created by DisableTableProc is stuck on the dead regionserver and we > had to manually bypass the unassign of DisableTableProc and then do an ASSIGN. > If we can break the loop so the UNASSIGN procedure does not retry when there is an SCP > for that server, we would not need manual intervention; at least the > DisableTableProc could go to a rollback state? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28742) CompactionTool fails with NPE when mslab is enabled
[ https://issues.apache.org/jira/browse/HBASE-28742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28742. --- Fix Version/s: 2.7.0 3.0.0-beta-2 2.6.1 2.5.11 Hadoop Flags: Reviewed Resolution: Fixed Pushed to all active branches. Thanks [~vineet.4008] for contributing and [~PankajKumar] for reviewing! > CompactionTool fails with NPE when mslab is enabled > --- > > Key: HBASE-28742 > URL: https://issues.apache.org/jira/browse/HBASE-28742 > Project: HBase > Issue Type: Bug > Components: Compaction >Affects Versions: 2.6.0, 3.0.0-beta-1, 2.5.9 >Reporter: Vineet Kumar Maheshwari >Assignee: Vineet Kumar Maheshwari >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11 > > > While using the CompactionTool, NPE is observed. > *Command:* > {code:java} > hbase org.apache.hadoop.hbase.regionserver.CompactionTool -major > {code} > *Exception Details:* > {code:java} > Exception in thread "main" java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.recycleChunks(MemStoreLABImpl.java:296) > at > org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.lambda$new$0(MemStoreLABImpl.java:109) > at org.apache.hadoop.hbase.nio.RefCnt.deallocate(RefCnt.java:95) > at > org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.handleRelease(AbstractReferenceCounted.java:86) > at > org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.release(AbstractReferenceCounted.java:76) > at org.apache.hadoop.hbase.nio.RefCnt.release(RefCnt.java:84) > at > org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.close(MemStoreLABImpl.java:269) > at > org.apache.hadoop.hbase.regionserver.Segment.close(Segment.java:143) > at > org.apache.hadoop.hbase.regionserver.AbstractMemStore.close(AbstractMemStore.java:381) > at > org.apache.hadoop.hbase.regionserver.HStore.closeWithoutLock(HStore.java:723) > at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:795) 
> at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compactStoreFiles(CompactionTool.java:171) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compactRegion(CompactionTool.java:137) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compactTable(CompactionTool.java:129) > at > org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compact(CompactionTool.java:118) > at > org.apache.hadoop.hbase.regionserver.CompactionTool.doClient(CompactionTool.java:374) > at > org.apache.hadoop.hbase.regionserver.CompactionTool.run(CompactionTool.java:424) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at > org.apache.hadoop.hbase.regionserver.CompactionTool.main(CompactionTool.java:460){code} > *Fix Suggestions:* > Initialize the ChunkCreator in CompactionTool when > hbase.hregion.memstore.mslab.enabled is enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
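The failure mode and the suggested fix can be sketched without HBase dependencies. The stand-in singleton below is not the real ChunkCreator API; it only mimics the shape of the bug: the MSLAB close path calls back into a process-wide singleton that CompactionTool never initialized, and initializing it up front makes the same call succeed:

```java
public class ChunkCreatorDemo {
    // Stand-in for HBase's ChunkCreator singleton: null until initialize() runs.
    static ChunkCreatorDemo instance;

    static void initialize() { instance = new ChunkCreatorDemo(); }

    // The MSLAB close path calls this; it NPEs if nobody initialized the singleton.
    static String recycleChunks() { return instance.doRecycle(); }

    String doRecycle() { return "recycled"; }

    public static void main(String[] args) {
        try {
            recycleChunks(); // what CompactionTool hit with mslab enabled
        } catch (NullPointerException e) {
            System.out.println("NPE without initialize");
        }
        initialize();        // the suggested fix: initialize before compacting
        System.out.println(recycleChunks());
    }
}
```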
[jira] [Resolved] (HBASE-28756) RegionSizeCalculator ignored the size of memstore, which leads Spark miss data
[ https://issues.apache.org/jira/browse/HBASE-28756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sun Xin resolved HBASE-28756. - Fix Version/s: 3.0.0-beta-2 2.6.1 2.5.11 Resolution: Fixed > RegionSizeCalculator ignored the size of memstore, which leads Spark miss data > -- > > Key: HBASE-28756 > URL: https://issues.apache.org/jira/browse/HBASE-28756 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 2.6.0, 3.0.0-beta-1, 2.5.10 >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0-beta-2, 2.6.1, 2.5.11 > > > RegionSizeCalculator only considers the size of the StoreFiles and ignores the > size of the MemStore. For a new region that has only been written to the MemStore and > has not been flushed, it will consider its size to be 0. > We use TableInputFormat to read HBase table data in Spark: > {code:java} > spark.sparkContext.newAPIHadoopRDD( > conf, > classOf[TableInputFormat], > classOf[ImmutableBytesWritable], > classOf[Result]) > {code} > Spark defaults to ignoring empty InputSplits, which is determined by the > configuration {{spark.hadoopRDD.ignoreEmptySplits}}. > {code:java} > private[spark] val HADOOP_RDD_IGNORE_EMPTY_SPLITS = > ConfigBuilder("spark.hadoopRDD.ignoreEmptySplits") > .internal() > .doc("When true, HadoopRDD/NewHadoopRDD will not create partitions for > empty input splits.") > .version("2.3.0") > .booleanConf > .createWithDefault(true) {code} > The above reasons lead to Spark missing data. So we should consider both the > size of the StoreFile and the MemStore in the RegionSizeCalculator. -- This message was sent by Atlassian Jira (v8.20.10#820010)
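The interaction can be illustrated without Spark or HBase on the classpath. The Region record and sizing helpers below are hypothetical stand-ins, not HBase API; they show how a memstore-only region is sized to 0 and then silently dropped by an ignore-empty-splits filter, while including the memstore size keeps it:

```java
import java.util.List;
import java.util.stream.Collectors;

public class EmptySplitDemo {
    // Stand-in for a region: flushed storefile bytes plus unflushed memstore bytes.
    record Region(String name, long storefileBytes, long memstoreBytes) {}

    // Buggy sizing: storefiles only (the behavior this issue fixes).
    static long sizeStorefilesOnly(Region r) { return r.storefileBytes(); }

    // Fixed sizing: storefiles plus memstore.
    static long sizeWithMemstore(Region r) { return r.storefileBytes() + r.memstoreBytes(); }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region("flushed", 1024, 0),
                new Region("freshWrites", 0, 512)); // written to, but never flushed

        // Mimics spark.hadoopRDD.ignoreEmptySplits=true: zero-sized splits are dropped.
        List<String> kept = regions.stream()
                .filter(r -> sizeStorefilesOnly(r) > 0)
                .map(Region::name).collect(Collectors.toList());
        System.out.println(kept); // freshWrites is silently skipped

        List<String> keptFixed = regions.stream()
                .filter(r -> sizeWithMemstore(r) > 0)
                .map(Region::name).collect(Collectors.toList());
        System.out.println(keptFixed); // both regions survive the filter
    }
}
```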
[ANNOUNCE] Apache HBase 2.5.10 is now available for download
The HBase team is happy to announce the immediate availability of HBase 2.5.10.

Apache HBase™ is an open-source, distributed, versioned, non-relational database. Apache HBase gives you low latency random access to billions of rows with millions of columns atop non-specialized hardware. To learn more about HBase, see https://hbase.apache.org/.

HBase 2.5.10 is the latest patch release in the HBase 2.5.x line. The full list of issues can be found in the included CHANGES and RELEASENOTES, or via our issue tracker: https://s.apache.org/2.5.10-jira

To download please follow the links and instructions on our website: https://hbase.apache.org/downloads.html

Questions, comments, and problems are always welcome at: dev@hbase.apache.org.

Thanks to all who contributed and made this release possible.

Cheers,
The HBase Dev Team
[jira] [Resolved] (HBASE-28755) Update downloads.xml for 2.5.10
[ https://issues.apache.org/jira/browse/HBASE-28755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-28755. - Resolution: Fixed > Update downloads.xml for 2.5.10 > --- > > Key: HBASE-28755 > URL: https://issues.apache.org/jira/browse/HBASE-28755 > Project: HBase > Issue Type: Task >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28655) TestHFileCompressionZstd fails with IllegalArgumentException: Illegal bufferSize
[ https://issues.apache.org/jira/browse/HBASE-28655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankaj Kumar resolved HBASE-28655. -- Resolution: Fixed Thanks [~zhangduo] for the review. > TestHFileCompressionZstd fails with IllegalArgumentException: Illegal > bufferSize > > > Key: HBASE-28655 > URL: https://issues.apache.org/jira/browse/HBASE-28655 > Project: HBase > Issue Type: Bug > Components: HFile, Operability >Affects Versions: 2.6.0, 3.0.0-beta-1, 2.5.8 >Reporter: Pankaj Kumar >Assignee: Pankaj Kumar >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11 > > > HADOOP-18810 added io.compression.codec.zstd.buffersize in core-default.xml > with default value as 0. > So ZSTD buffer size will be returned as 0 based on core-default.xml, > {code:java} > static int getBufferSize(Configuration conf) { > return conf.getInt(ZSTD_BUFFER_SIZE_KEY, > > conf.getInt(CommonConfigurationKeys.IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_KEY, > // IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_DEFAULT is 0! We can't allow > that. > ZSTD_BUFFER_SIZE_DEFAULT)); > } > {code} > HBASE-26259 added a value check, but got reverted in HBASE-26959. > > This issue will also occur during region flush and abort the RegionServer. 
> > TestHFileCompressionZstd and other zstd-related test cases are also > failing, > {code:java} > java.lang.IllegalArgumentException: Illegal bufferSize > at > org.apache.hadoop.io.compress.CompressorStream.<init>(CompressorStream.java:42) > at > org.apache.hadoop.io.compress.BlockCompressorStream.<init>(BlockCompressorStream.java:56) > at > org.apache.hadoop.hbase.io.compress.aircompressor.ZstdCodec.createOutputStream(ZstdCodec.java:106) > at > org.apache.hadoop.hbase.io.compress.Compression$Algorithm.createPlainCompressionStream(Compression.java:454) > at > org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.<init>(HFileBlockDefaultEncodingContext.java:99) > at > org.apache.hadoop.hbase.io.hfile.NoOpDataBlockEncoder.newDataBlockEncodingContext(NoOpDataBlockEncoder.java:85) > at > org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.<init>(HFileBlock.java:846) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishInit(HFileWriterImpl.java:304) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.<init>(HFileWriterImpl.java:185) > at > org.apache.hadoop.hbase.io.hfile.HFile$WriterFactory.create(HFile.java:312) > at > org.apache.hadoop.hbase.io.compress.HFileTestBase.doTest(HFileTestBase.java:73) > at > org.apache.hadoop.hbase.io.compress.aircompressor.TestHFileCompressionZstd.test(TestHFileCompressionZstd.java:54) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
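The quoted getBufferSize shows why 0 leaks through: the Hadoop key is present in core-default.xml, so the fallback default is never consulted. A minimal sketch of that lookup and of the guard that HBASE-26259-style validation provides follows; the constants and values below are illustrative stand-ins, not the real Hadoop/HBase defaults:

```java
public class ZstdBufferSizeDemo {
    // HADOOP-18810 ships io.compression.codec.zstd.buffersize = 0 in core-default.xml,
    // so the Hadoop-side lookup always "succeeds" with 0.
    static final int HADOOP_CORE_DEFAULT = 0;
    static final int ZSTD_BUFFER_SIZE_DEFAULT = 256 * 1024; // illustrative value

    // Buggy lookup: with no HBase-side override, the Hadoop key's value (0) wins,
    // and 0 eventually reaches CompressorStream, which rejects it.
    static int getBufferSizeBuggy(Integer hbaseSetting) {
        return hbaseSetting != null ? hbaseSetting : HADOOP_CORE_DEFAULT;
    }

    // Guarded lookup: clamp non-positive sizes back to a sane default --
    // the kind of check HBASE-26259 added and HBASE-26959 later reverted.
    static int getBufferSizeGuarded(Integer hbaseSetting) {
        int size = getBufferSizeBuggy(hbaseSetting);
        return size > 0 ? size : ZSTD_BUFFER_SIZE_DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(getBufferSizeBuggy(null));   // 0 -> "Illegal bufferSize" downstream
        System.out.println(getBufferSizeGuarded(null)); // falls back to the positive default
    }
}
```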
[jira] [Created] (HBASE-28757) Understand how supportplaintext property works in TLS setup.
Rushabh Shah created HBASE-28757: Summary: Understand how supportplaintext property works in TLS setup. Key: HBASE-28757 URL: https://issues.apache.org/jira/browse/HBASE-28757 Project: HBase Issue Type: Improvement Components: security Affects Versions: 2.6.0 Reporter: Rushabh Shah We are testing the TLS feature and I am confused about how the hbase.server.netty.tls.supportplaintext property works. Here is our current setup. This is a fresh cluster deployment. hbase.server.netty.tls.enabled --> true hbase.client.netty.tls.enabled --> true hbase.server.netty.tls.supportplaintext --> false (We don't want to fall back on kerberos) We still have our kerberos-related configuration enabled. hbase.security.authentication --> kerberos *Our expectation:* During regionserver startup, the regionserver will use TLS for authentication and the communication will succeed. *Actual observation:* During regionserver startup, hmaster authenticates the regionserver *via kerberos authentication* and the *regionserver's reportForDuty RPC fails*. RS logs: {noformat} 2024-07-25 16:59:55,098 INFO [regionserver/regionserver-0:60020] regionserver.HRegionServer - reportForDuty to master=hmaster-0,6,1721926791062 with isa=regionserver-0/:60020, startcode=1721926793434 2024-07-25 16:59:55,548 DEBUG [RS-EventLoopGroup-1-2] ssl.SslHandler - [id: 0xa48e3487, L:/:39837 - R:hmaster-0/:6] HANDSHAKEN: protocol:TLSv1.2 cipher suite:TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 2024-07-25 16:59:55,578 DEBUG [RS-EventLoopGroup-1-2] security.UserGroupInformation - PrivilegedAction [as: hbase/regionserver-0. 
(auth:KERBEROS)][action: org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler$2@3769e55] java.lang.Exception at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1896) at org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler.channelRead0(NettyHBaseSaslRpcClientHandler.java:161) at org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler.channelRead0(NettyHBaseSaslRpcClientHandler.java:43) ... ... 2024-07-25 16:59:55,581 DEBUG [RS-EventLoopGroup-1-2] security.UserGroupInformation - PrivilegedAction [as: hbase/regionserver-0 (auth:KERBEROS)][action: org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler$2@c6f0806] java.lang.Exception at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1896) at org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler.channelRead0(NettyHBaseSaslRpcClientHandler.java:161) at org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler.channelRead0(NettyHBaseSaslRpcClientHandler.java:43) at org.apache.hbase.thirdparty.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) 2024-07-25 16:59:55,602 WARN [regionserver/regionserver-0:60020] regionserver.HRegionServer - error telling master we are up org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=hmaster-0:6 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:340) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:92) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:595) at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:16398) at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2997) at org.apache.hadoop.hbase.regionserver.HRegionServer.lambda$run$2(HRegionServer.java:1084) at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187) at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:177) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1079) Caused by: org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=hmaster-0:6 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:233) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:391) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:425) at org.apache.hadoop.hbase.ipc.AbstractRpc
[jira] [Created] (HBASE-28756) RegionSizeCalculator ignored the size of memstore, which leads Spark to miss data
Sun Xin created HBASE-28756: --- Summary: RegionSizeCalculator ignored the size of memstore, which leads Spark to miss data Key: HBASE-28756 URL: https://issues.apache.org/jira/browse/HBASE-28756 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 2.5.10, 3.0.0-beta-1, 2.6.0 Reporter: Sun Xin Assignee: Sun Xin RegionSizeCalculator only considers the size of StoreFiles and ignores the size of the MemStore. For a new region whose data has only been written to the MemStore and has not yet been flushed, it will consider the region's size to be 0. When we use TableInputFormat to read HBase table data in Spark: {code:java} spark.sparkContext.newAPIHadoopRDD( conf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result]) {code} Spark defaults to ignoring empty InputSplits, which is determined by the configuration "{{{}spark.hadoopRDD.ignoreEmptySplits{}}}". {code:java} private[spark] val HADOOP_RDD_IGNORE_EMPTY_SPLITS = ConfigBuilder("spark.hadoopRDD.ignoreEmptySplits") .internal() .doc("When true, HadoopRDD/NewHadoopRDD will not create partitions for empty input splits.") .version("2.3.0") .booleanConf .createWithDefault(true) {code} The above reasons lead to Spark missing data. So we should consider both the size of the StoreFile and the MemStore in the RegionSizeCalculator. -- This message was sent by Atlassian Jira (v8.20.10#820010)
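The proposed fix amounts to summing both components when sizing a region. A hedged sketch with plain stand-in values (the real code would read these sizes from HBase's region metrics; the method name here is illustrative):

```java
public class RegionSizeSketch {
    // Size a region by store files *plus* memstore, so a freshly written
    // region that has never flushed is not reported as empty.
    static long regionSizeBytes(long storeFileSizeBytes, long memStoreSizeBytes) {
        return storeFileSizeBytes + memStoreSizeBytes;
    }

    public static void main(String[] args) {
        // Unflushed region: no store files yet, 1 MiB sitting in the memstore.
        // A non-zero result means Spark's ignoreEmptySplits no longer drops
        // the corresponding InputSplit.
        System.out.println(regionSizeBytes(0L, 1024L * 1024));
    }
}
```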
[jira] [Created] (HBASE-28755) Update downloads.xml for 2.5.10
Andrew Kyle Purtell created HBASE-28755: --- Summary: Update downloads.xml for 2.5.10 Key: HBASE-28755 URL: https://issues.apache.org/jira/browse/HBASE-28755 Project: HBase Issue Type: Task Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28754) Verify the first argument passed to compaction_switch
JueWang created HBASE-28754: --- Summary: Verify the first argument passed to compaction_switch Key: HBASE-28754 URL: https://issues.apache.org/jira/browse/HBASE-28754 Project: HBase Issue Type: Improvement Components: shell Reporter: JueWang Sometimes users may inadvertently pass an invalid first argument to compaction_switch; therefore, it is advisable to implement a verification step for the first argument passed to this command, ensuring that incorrect inputs do not accidentally disable compaction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
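The shell command itself is Ruby, but the requested check amounts to strict parsing of the first argument. A hedged Java sketch of that validation logic (names are illustrative, not the actual shell code):

```java
public class CompactionSwitchArgCheck {
    // Accept only an explicit true/false; reject anything else instead of
    // silently coercing it, which is what could accidentally disable
    // compaction cluster-wide.
    static boolean parseSwitchState(String arg) {
        if ("true".equalsIgnoreCase(arg)) {
            return true;
        }
        if ("false".equalsIgnoreCase(arg)) {
            return false;
        }
        throw new IllegalArgumentException(
            "compaction_switch expects 'true' or 'false' as its first argument, got: " + arg);
    }

    public static void main(String[] args) {
        System.out.println(parseSwitchState("true"));
        System.out.println(parseSwitchState("FALSE"));
    }
}
```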
Re: [DISCUSS] HBase backup API with record/store phase
Hi Dieter, I don't see a problem with making the individual steps accessible from some external "driver". My only requirement is that there's a clear interface between each step so that whatever driver implementations exist don't get caught with divergent semantics. In the current state, the only driver is the one that we ship with the project, so there's only one place where such semantics must be correct. Because this is an area where data loss is possible, and data loss is a reputation-killer for a data storage system like ours, we must tread carefully. Thanks, Nick On Mon, Jul 15, 2024 at 5:27 PM Dieter De Paepe wrote: > At NGData, we are using HBase backup as part of the backup procedure for > our product. Besides HBase, some other components (HDFS, ZooKeeper, ...) > are also backed up. > Due to how our product works, there are some dependencies between these > components, i.e. HBase should be backed up first, then ZooKeeper, then... > To minimize the time between the backup for each component (i.e. to > minimize data drift), we designed a phased approach in our backup procedure: > > * > a "record" phase, where all data relevant for a backup is captured. E.g., > for HDFS this is an HDFS snapshot. > * > a "store" phase, where the captured data is moved to cloud storage. E.g., > for HDFS, this is a DistCp of that snapshot > > This approach allows us to push all data-transfer delays to the > end of the backup procedure, meaning the time between data capture for all > component backups is minimized. 
> > The HBase backup API currently doesn't support this kind of phased > approach, though the steps that are executed certainly would allow it: > > * > Record phase (full backup): roll WALs, snapshot tables > * > Store phase (full backup): snapshot copy, bulk load copy, updating > metadata, terminating backup session > * > Record phase (incremental backup): roll WALs > * > Store phase (incremental backup): convert WALs to HFiles, bulk load copy, > HFile copy, metadata updates, terminating backup session > > As this seems like a general use-case, I would like to suggest refactoring > the HBase backup API to allow this kind of 2-phase approach. CLI usage can > remain unchanged. > > Before logging any ticket about this, I wanted to hear the community's > thoughts about this. > Unfortunately, I can't promise we will be available to actually spend time > on this in the short term, but I'd rather have a plan of attack ready once > we (or someone else) does have the time. > > Regards, > Dieter >
[jira] [Created] (HBASE-28753) FNFE may occur when accessing the region.jsp of the replica region
guluo created HBASE-28753: - Summary: FNFE may occur when accessing the region.jsp of the replica region Key: HBASE-28753 URL: https://issues.apache.org/jira/browse/HBASE-28753 Project: HBase Issue Type: Bug Components: Replication, UI Affects Versions: 2.4.13 Reporter: guluo Assignee: guluo Attachments: image-2024-07-24-20-08-22-820.png On the HBase UI, we can get the details of the storefiles in a region by accessing region.jsp. However, when a table enables region replication, the replica region may reference a deleted storefile because it doesn't refresh in a timely manner, so in this case we get an FNFE when opening region.jsp for that region. !image-2024-07-24-20-08-22-820.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28752) wal.AsyncFSWAL: sync failed
SunQiang created HBASE-28752: Summary: wal.AsyncFSWAL: sync failed Key: HBASE-28752 URL: https://issues.apache.org/jira/browse/HBASE-28752 Project: HBase Issue Type: Improvement Components: asyncclient, wal Affects Versions: 2.2.5, 2.1.10 Reporter: SunQiang Our HBase system is used for OLAP. The client has strict requirements for latency and stability, and the client configuration is as follows: {code:java} hbase.rpc.timeout: 100 hbase.client.operation.timeout: 500 hbase.client.retries.number: 3 hbase.client.pause: 120 {code} When I took a DataNode offline, I received this exception: {code:java} 2024-06-03 17:19:16,535 WARN [RpcServer.default.RWQ.Fifo.read.handler=216,queue=4,port=16020] hdfs.BlockReaderFactory: I/O error constructing remote block reader. org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.111.242.219:50010] at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534) at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3436) at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:777) at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:694) at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355) at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1173) at org.apache.hadoop.hdfs.DFSInputStream.access$200(DFSInputStream.java:92) at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1118) at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1110) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:2022) at 
org.apache.hadoop.hdfs.DFSClient$2.rejectedExecution(DFSClient.java:3481) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) at java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:181) at org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1297) {code} This will cause the HBase service to become unstable, because HBase has accessed an offline datanode and creating a socket connection to that datanode takes a long time. From the stack traces, I found that this is controlled by the dfs.client.socket-timeout configuration. -- In hbase-site.xml, I found that adjusting the dfs.client.socket-timeout configuration is effective, so I turned it down from 60s to 5s. But when I continued to turn dfs.client.socket-timeout down to 200ms, the following exception occurred: {code:java} 2024-06-18 15:51:24,212 WARN [AsyncFSWAL-0] wal.AsyncFSWAL: sync failed java.io.IOException: Timeout(200ms) waiting for response {code} The dfs.client.socket-timeout configuration is reused in HBase's FanOutOneBlockAsyncDFSOutput class. 
-- In the 'FanOutOneBlockAsyncDFSOutput' constructor:
{code:java}
FanOutOneBlockAsyncDFSOutput(Configuration conf, FSUtils fsUtils, DistributedFileSystem dfs,
    DFSClient client, ClientProtocol namenode, String clientName, String src, long fileId,
    LocatedBlock locatedBlock, Encryptor encryptor, List<DatanodeInfo> datanodeList,
    DataChecksum summer, ByteBufAllocator alloc) {
  this.conf = conf;
  this.fsUtils = fsUtils;
  this.dfs = dfs;
  this.client = client;
  this.namenode = namenode;
  this.fileId = fileId;
  this.clientName = clientName;
  this.src = src;
  this.block = locatedBlock.getBlock();
  this.locations = locatedBlock.getLocations();
  this.encryptor = encryptor;
  this.datanodeList = datanodeList;
  this.summer = summer;
  this.maxDataLen = MAX_DATA_LEN - (MAX_DATA_LEN % summer.getBytesPerChecksum());
  this.alloc = alloc;
  this.buf = alloc.directBuffer(sendBufSizePRedictor.initialSize());
  this.state = State.STREAMING;
  setupReceiver(conf.getInt(DFS_CLIENT_SOCKET_TIMEOUT_KEY, READ_TIMEOUT));
}
{code}
My implementation process: 1. add a new configuration in hbase-site.xml
{code:java}
+ <property>
+   <name>hbase.wal.asyncfsoutput.timeout</name>
+   <value>6</value>
+ </property>
{code}
2. modify code
{code:java}
151 + private static final String FANOUT_TIMEOUTKEY = "hbase.wal.asyncfsoutput.timeout";
339 - setupReceiver(conf.getInt(DFS_CLIENT_SOCKET_TIMEOUT_KEY, READ_TIMEOUT));
339 + setupReceiver(c
[jira] [Created] (HBASE-28751) Metrics for ConnectionRegistry APIs need to be added
Umesh Kumar Kumawat created HBASE-28751: --- Summary: Metrics for ConnectionRegistry APIs need to be added Key: HBASE-28751 URL: https://issues.apache.org/jira/browse/HBASE-28751 Project: HBase Issue Type: Improvement Affects Versions: 2.5.8, 2.4.17 Reporter: Umesh Kumar Kumawat For now, no metrics are being pushed for the connection registry APIs. We need at least some basic metrics for these APIs: requestCount - the number of requests from clients; failureCount - the number of requests that received a failed response; responseTime - the time taken to respond to a request. -- This message was sent by Atlassian Jira (v8.20.10#820010)
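A minimal sketch of what such instrumentation could look like (field and method names are hypothetical, not the eventual HBase metrics API):

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hedged sketch: wrap each registry call so every invocation updates the
// three metrics the ticket asks for. Names are illustrative only.
public class ConnectionRegistryMetricsSketch {
    final LongAdder requestCount = new LongAdder();  // requests from clients
    final LongAdder failureCount = new LongAdder();  // requests answered with a failure
    volatile long lastResponseTimeNanos;             // time taken by the last request

    <T> T timed(Supplier<T> call) {
        requestCount.increment();
        long start = System.nanoTime();
        try {
            return call.get();
        } catch (RuntimeException e) {
            failureCount.increment();
            throw e;
        } finally {
            lastResponseTimeNanos = System.nanoTime() - start;
        }
    }

    public static void main(String[] args) {
        ConnectionRegistryMetricsSketch metrics = new ConnectionRegistryMetricsSketch();
        metrics.timed(() -> "active-master");
        System.out.println(metrics.requestCount.sum() + " " + metrics.failureCount.sum());
    }
}
```

In practice this would feed HBase's existing metrics framework rather than bare counters; the sketch only shows the measurement points.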
[jira] [Created] (HBASE-28750) Region normalizer should work in off-peak hours if configured
MisterWang created HBASE-28750: -- Summary: Region normalizer should work in off-peak hours if configured Key: HBASE-28750 URL: https://issues.apache.org/jira/browse/HBASE-28750 Project: HBase Issue Type: Improvement Components: Normalizer Reporter: MisterWang The region normalizer involves splitting and merging regions, which can cause jitter in online services, especially when there are many normalizer plans. We should run this task during off-peak hours if so configured. -- This message was sent by Atlassian Jira (v8.20.10#820010)
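The gate itself is simple; a hedged sketch of an off-peak window check (illustrative only — HBase already has off-peak-hours configuration for compactions, which would presumably be reused rather than reimplemented):

```java
public class OffPeakWindow {
    // Returns true when hourOfDay falls inside [startHour, endHour),
    // handling windows that wrap past midnight, e.g. 22 -> 6.
    static boolean isOffPeak(int hourOfDay, int startHour, int endHour) {
        if (startHour <= endHour) {
            return hourOfDay >= startHour && hourOfDay < endHour;
        }
        return hourOfDay >= startHour || hourOfDay < endHour;
    }

    public static void main(String[] args) {
        System.out.println(isOffPeak(23, 22, 6)); // inside a wrapping window
        System.out.println(isOffPeak(12, 22, 6)); // midday, outside the window
    }
}
```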
[jira] [Resolved] (HBASE-28749) Remove the duplicate configurations named hbase.wal.batch.size
[ https://issues.apache.org/jira/browse/HBASE-28749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sun Xin resolved HBASE-28749. - Resolution: Fixed > Remove the duplicate configurations named hbase.wal.batch.size > -- > > Key: HBASE-28749 > URL: https://issues.apache.org/jira/browse/HBASE-28749 > Project: HBase > Issue Type: Improvement > Components: wal >Affects Versions: 3.0.0-beta-1 >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Minor > Labels: pull-request-available > Fix For: 3.0.0-beta-2 > > > The following code appears in two places: AsyncFSWAL and AbstractFSWAL > {code:java} > public static final String WAL_BATCH_SIZE = "hbase.wal.batch.size"; > public static final long DEFAULT_WAL_BATCH_SIZE = 64L * 1024; {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28743) Snapshot based mapreduce jobs fails with NPE while trying to close mslab within mapper
[ https://issues.apache.org/jira/browse/HBASE-28743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28743. --- Fix Version/s: 2.7.0 3.0.0-beta-2 2.6.1 2.5.11 (was: 2.5.9) Hadoop Flags: Reviewed Resolution: Fixed Pushed to all active branches. Thanks [~vineet.4008] for contributing! > Snapshot based mapreduce jobs fails with NPE while trying to close mslab > within mapper > -- > > Key: HBASE-28743 > URL: https://issues.apache.org/jira/browse/HBASE-28743 > Project: HBase > Issue Type: Bug > Components: snapshots >Reporter: Ujjawal Kumar >Assignee: Vineet Kumar Maheshwari >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11 > > > {code:java} > 2024-07-11 10:20:38,800 WARN [main] client.ClientSideRegionScanner - > Exception while closing region > java.io.IOException: java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1808) > at > org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1557) > at > org.apache.hadoop.hbase.client.ClientSideRegionScanner.close(ClientSideRegionScanner.java:133) > at > org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl$RecordReader.close(TableSnapshotInputFormatImpl.java:310) > at > org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat$TableSnapshotRegionRecordReader.close(TableSnapshotInputFormat.java:184) > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:536) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:804) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172) > Caused by: 
java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.recycleChunks(MemStoreLABImpl.java:296) > at > org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.lambda$new$0(MemStoreLABImpl.java:109) > at org.apache.hadoop.hbase.nio.RefCnt.deallocate(RefCnt.java:95) > at > org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.handleRelease(AbstractReferenceCounted.java:86) > at > org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.release(AbstractReferenceCounted.java:76) > at org.apache.hadoop.hbase.nio.RefCnt.release(RefCnt.java:84) > at > org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.close(MemStoreLABImpl.java:269) > at > org.apache.hadoop.hbase.regionserver.Segment.close(Segment.java:143) > at > org.apache.hadoop.hbase.regionserver.AbstractMemStore.close(AbstractMemStore.java:381) > at > org.apache.hadoop.hbase.regionserver.HStore.closeWithoutLock(HStore.java:723) > at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:795) > at > org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1786) > at > org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1783) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) {code} > This happens because the ChunkCreator is only initialized as part of > HRegionServer > [here.|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/HBaseServerBase.java#L410-L431] > HRegion created on top of snapshot files within mapper wouldn't have > ChunkCreator initialized causing NPE while trying to close the memstore > This is seen after 
https://issues.apache.org/jira/browse/HBASE-28401 > There are 2 possible solutions here: > 1. Initialize ChunkCreator while trying to create HRegion within the snapshot-based > mapper > 2. Disable the mslab altogether (via hbase.hregion.memstore.mslab.enabled set > to false) within the snapshot-based mapper -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28724) BucketCache.notifyFileCachingCompleted may throw IllegalMonitorStateException
[ https://issues.apache.org/jira/browse/HBASE-28724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28724. -- Fix Version/s: 3.0.0 2.7.0 2.6.1 Resolution: Fixed Merged into master, branch-2 and branch-2.6. Thanks for reviewing it, [~psomogyi] ! > BucketCache.notifyFileCachingCompleted may throw IllegalMonitorStateException > -- > > Key: HBASE-28724 > URL: https://issues.apache.org/jira/browse/HBASE-28724 > Project: HBase > Issue Type: Bug >Affects Versions: 2.6.0, 3.0.0, 4.0.0-alpha-1, 2.7.0 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0, 2.7.0, 2.6.1 > > > If the prefetch thread completes reading the file blocks faster than the > bucket cache writer threads are able to drain it from the writer queues, we > might run into a scenario where BucketCache.notifyFileCachingCompleted may > throw IllegalMonitorStateException, as we can reach [this block of the > code|https://github.com/wchevreuil/hbase/blob/684964f1c1693d2a0792b7b721c92693d75b4cea/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java#L2106]. > I believe the impact is not critical, as the prefetch thread is already > finishing at that point, but nevertheless, such error in the logs might be > misleading. -- This message was sent by Atlassian Jira (v8.20.10#820010)
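For context on the exception itself: IllegalMonitorStateException is what the JDK throws when a thread releases a lock it does not hold, which is consistent with the race described above where the notification path can reach an unlock it never paired with a lock. A generic, self-contained illustration (not the BucketCache code):

```java
import java.util.concurrent.locks.ReentrantLock;

public class UnlockWithoutHold {
    // Unlocking a ReentrantLock the current thread never acquired raises
    // IllegalMonitorStateException; the same applies to Condition methods
    // invoked without holding the associated lock.
    static String tryUnlock() {
        ReentrantLock lock = new ReentrantLock();
        try {
            lock.unlock(); // never locked by this thread
            return "no exception";
        } catch (IllegalMonitorStateException e) {
            return "IllegalMonitorStateException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryUnlock());
    }
}
```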
Failure: HBase Generate Website
Build status: FAILURE The HBase website has not been updated to incorporate recent HBase changes. See https://ci-hbase.apache.org/job/hbase_generate_website/574/console
Re: [jira] [Created] (HBASE-28748) Protocol message tag had invalid wire type.
Sorry, I meant to reply to the message in the hbase-zh mailing list... 张铎 (Duo Zhang) wrote on Mon, Jul 22, 2024 at 19:58: > > Is replication stuck? The stream reader keeps tailing the file, so it may > throw an exception when it hits a half-written file, and it will retry. If replication is not stuck and it can keep reading afterwards, there is no problem. > > You can also try reading that file with WALPrettyPrinter to see whether it can be read. > > Longping Jie (Jira) wrote on Mon, Jul 22, 2024 at 17:59: > > > > Longping Jie created HBASE-28748: > > > > > > Summary: Protocol message tag had invalid wire type. > > Key: HBASE-28748 > > URL: https://issues.apache.org/jira/browse/HBASE-28748 > > Project: HBase > > Issue Type: Bug > > Affects Versions: 2.6.0 > > Environment: hbase2.6.0 > > > > hadoop3.3.6 > > Reporter: Longping Jie > > > > > > 2024-07-22T17:47:49,130 WARN > > [RS_CLAIM_REPLICATION_QUEUE-regionserver/sh2-int-hbase-main-ha-9:16020-0.replicationSource,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464.replicationSource.wal-reader.tx1-int-hbase-main-prod-3%2C16020%2C1720602522464,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464] > > wal.ProtobufWALStreamReader: Error while reading WALKey, > > originalPosition=0, currentPosition=81 > > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: > > Protocol message tag had invalid wire type. 
> > at > > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:119) > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > > at > > org.apache.hbase.thirdparty.com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:503) > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > > at > > org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:770) > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > > at > > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2829) > > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > > at > > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4212) > > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > > at > > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4204) > > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > > at > > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:192) > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > > at > > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209) > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > > at > > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214) > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > > at > > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > > at > > org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage.parseWithIOException(GeneratedMessage.java:321) > > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > > at > > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey.parseFrom(WALProtos.java:2321) > > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > > at > > 
org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.readWALKey(ProtobufWALTailingReader.java:128) > > ~[hbase-server-2.6.0.jar:2.6.0] > > at > > org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.next(ProtobufWALTailingReader.java:257) > > ~[hbase-server-2.6.0.jar:2.6.0] > > at > > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:490) > > ~[hbase-server-2.6.0.jar:2.6.0] > > at > > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.lastAttempt(WALEntryStream.java:306) > > ~[hbase-server-2.6.0.jar:2.6.0] > > at > > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:388) > > ~[hbase-server-2.6.0.jar:2.6.0] > > at > > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:130) > > ~[hbase-server-2.6.0.jar:2.6.0] > > at > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:153) > > ~[hbase-server-2.6.0.jar:2.6.0] > > 2024-07-22T
Re: [jira] [Created] (HBASE-28748) Protocol message tag had invalid wire type.
Is replication stuck? The stream reader keeps tailing the file, so it may throw an exception when it hits a half-written file, and it will retry. If replication is not stuck and it can keep reading afterwards, there is no problem. You can also try reading that file with WALPrettyPrinter to see whether it can be read. Longping Jie (Jira) wrote on Mon, Jul 22, 2024 at 17:59: > > Longping Jie created HBASE-28748: > > > Summary: Protocol message tag had invalid wire type. > Key: HBASE-28748 > URL: https://issues.apache.org/jira/browse/HBASE-28748 > Project: HBase > Issue Type: Bug > Affects Versions: 2.6.0 > Environment: hbase2.6.0 > > hadoop3.3.6 > Reporter: Longping Jie > > > 2024-07-22T17:47:49,130 WARN > [RS_CLAIM_REPLICATION_QUEUE-regionserver/sh2-int-hbase-main-ha-9:16020-0.replicationSource,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464.replicationSource.wal-reader.tx1-int-hbase-main-prod-3%2C16020%2C1720602522464,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464] > wal.ProtobufWALStreamReader: Error while reading WALKey, originalPosition=0, > currentPosition=81 > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: > Protocol message tag had invalid wire type. 
> at > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:119) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:503) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:770) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2829) > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4212) > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4204) > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > at > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:192) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage.parseWithIOException(GeneratedMessage.java:321) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey.parseFrom(WALProtos.java:2321) > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > at > 
org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.readWALKey(ProtobufWALTailingReader.java:128) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.next(ProtobufWALTailingReader.java:257) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:490) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.lastAttempt(WALEntryStream.java:306) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:388) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:130) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:153) > ~[hbase-server-2.6.0.jar:2.6.0] > 2024-07-22T17:48:13,315 WARN [RS-EventLoopGroup-1-65] ipc.NettyRpcConnection: > Exception encountered while connecting to the server > tx1-int-hbase-main-prod-3:16020 > org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: > connection timed out after 1 ms: tx1-int-hbase-main-prod-3/127.0.0.1:16020 > at > org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615) > ~[hbase-shaded-netty-4.1.7.jar:?] > at > org.apache.h
Re: Want to join the hbase slack channel
Since you already have an apache.org email address, you can join the ASF Slack workspace on your own and then join the #hbase channel. Just follow the guide here: https://infra.apache.org/slack.html Thanks. leojie wrote on Mon, Jul 22, 2024 at 18:11: > > Hi > I want to join the hbase slack channel, and I hope to have the opportunity > to learn more about HBase from experienced contributors. > Thanks a lot. > Best wishes to you!
[jira] [Created] (HBASE-28749) Remove the duplicate configurations named hbase.wal.batch.size
Sun Xin created HBASE-28749: --- Summary: Remove the duplicate configurations named hbase.wal.batch.size Key: HBASE-28749 URL: https://issues.apache.org/jira/browse/HBASE-28749 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 3.0.0-beta-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-beta-2 The following code appears in two places: AsyncFSWAL and AbstractFSWAL {code:java} public static final String WAL_BATCH_SIZE = "hbase.wal.batch.size"; public static final long DEFAULT_WAL_BATCH_SIZE = 64L * 1024; {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
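The cleanup the issue describes amounts to keeping a single definition in the shared parent class. A minimal, simplified sketch of the idea; the class bodies below are hypothetical stand-ins for illustration, not the real HBase classes:

```java
// Hoist the duplicated constants into the common parent so that
// AsyncFSWAL (and other subclasses) inherit a single definition.
// Simplified stand-ins for illustration only, not the real HBase classes.
abstract class AbstractFSWAL {
    // Single authoritative definition of the batch-size config.
    public static final String WAL_BATCH_SIZE = "hbase.wal.batch.size";
    public static final long DEFAULT_WAL_BATCH_SIZE = 64L * 1024;
}

class AsyncFSWAL extends AbstractFSWAL {
    // No local redefinition: references resolve to the parent's constants.
    long batchSize() {
        return DEFAULT_WAL_BATCH_SIZE;
    }
}
```

With the constants only in `AbstractFSWAL`, a later change to the default or key name happens in exactly one place.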
Want to join the hbase slack channel
Hi, I want to join the hbase slack channel. I hope to have the opportunity to learn more about HBase from experienced contributors. Thanks a lot. Best wishes to you!
[jira] [Created] (HBASE-28748) Protocol message tag had invalid wire type.
Longping Jie created HBASE-28748: Summary: Protocol message tag had invalid wire type. Key: HBASE-28748 URL: https://issues.apache.org/jira/browse/HBASE-28748 Project: HBase Issue Type: Bug Affects Versions: 2.6.0 Environment: hbase2.6.0 hadoop3.3.6 Reporter: Longping Jie 2024-07-22T17:47:49,130 WARN [RS_CLAIM_REPLICATION_QUEUE-regionserver/sh2-int-hbase-main-ha-9:16020-0.replicationSource,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464.replicationSource.wal-reader.tx1-int-hbase-main-prod-3%2C16020%2C1720602522464,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464] wal.ProtobufWALStreamReader: Error while reading WALKey, originalPosition=0, currentPosition=81 org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type. at org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:119) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] at org.apache.hbase.thirdparty.com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:503) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] at org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:770) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2829) ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4212) ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4204) ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:192) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] at org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage.parseWithIOException(GeneratedMessage.java:321) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] at org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey.parseFrom(WALProtos.java:2321) ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] at org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.readWALKey(ProtobufWALTailingReader.java:128) ~[hbase-server-2.6.0.jar:2.6.0] at org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.next(ProtobufWALTailingReader.java:257) ~[hbase-server-2.6.0.jar:2.6.0] at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:490) ~[hbase-server-2.6.0.jar:2.6.0] at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.lastAttempt(WALEntryStream.java:306) ~[hbase-server-2.6.0.jar:2.6.0] at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:388) ~[hbase-server-2.6.0.jar:2.6.0] at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:130) ~[hbase-server-2.6.0.jar:2.6.0] at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:153) ~[hbase-server-2.6.0.jar:2.6.0] 2024-07-22T17:48:13,315 WARN [RS-EventLoopGroup-1-65] ipc.NettyRpcConnection: Exception encountered while connecting to the server tx1-int-hbase-main-prod-3:16020 org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: connection timed out after 1 ms: 
tx1-int-hbase-main-prod-3/127.0.0.1:16020 at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615) ~[hbase-shaded-netty-4.1.7.jar:?] at org.apache.hbase.thirdparty.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[hbase-shaded-netty-4.1.7.jar:?] at org.apache.hbase.thirdparty.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153) ~[hbase-shaded-netty-4.1.7.jar:?] at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[hbase-shaded-netty-4.1.7.jar:?] at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[hbase-shaded-netty-4.1.7.jar
[jira] [Created] (HBASE-28747) HBase-Nightly-s390x Build failures
Soham Munshi created HBASE-28747: Summary: HBase-Nightly-s390x Build failures Key: HBASE-28747 URL: https://issues.apache.org/jira/browse/HBASE-28747 Project: HBase Issue Type: Task Components: community, jenkins Reporter: Soham Munshi Hi [~qi...@zhang.net] This is regarding recent [s390x CI failures|https://ci-hbase.apache.org/job/HBase-Nightly-s390x/] . The install.log and junit.log contain the output below - {code:java} /tmp/jenkins18056117051185954087.sh: line 12: /home/jenkins/tools/maven/latest3//bin/mvn: No such file or directory{code} Upon checking the machine stats, it seems the Apache Maven path is not being set properly, since mvn_home outputs - {code:java} MAVEN_HOME: /home/jenkins/tools/maven/latest3/{code} whereas mvn_version outputs the following - {code:java} Apache Maven 3.6.3 Maven home: /usr/share/maven Java version: 11.0.23, vendor: Ubuntu, runtime: /usr/lib/jvm/java-11-openjdk-s390x Default locale: en_US, platform encoding: UTF-8 OS name: "linux", version: "5.4.0-174-generic", arch: "s390x", family: "unix"{code} Could you please help us get this fixed? Thanks. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28734) Improve HBase shell snapshot command Doc with TTL option
[ https://issues.apache.org/jira/browse/HBASE-28734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liangjun He resolved HBASE-28734. - Resolution: Fixed > Improve HBase shell snapshot command Doc with TTL option > - > > Key: HBASE-28734 > URL: https://issues.apache.org/jira/browse/HBASE-28734 > Project: HBase > Issue Type: Improvement > Components: shell >Reporter: Ashok shetty >Assignee: Liangjun He >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11 > > > The current HBase shell snapshot command allows users to create a snapshot of > a specific table. While this command is useful, it could be enhanced by > adding a TTL (Time-to-Live) option. This would allow users to specify a time > period after which the snapshot would automatically be deleted. > I propose we introduce a TTL option in the snapshot command doc as follows: > hbase> snapshot 'sourceTable', 'snapshotName', \{TTL => '7d'} > This would create a snapshot of 'sourceTable' called 'snapshotName' that > would automatically be deleted after 7 days. Documenting the TTL option would > provide a better user experience and assist with efficient > storage management. -- This message was sent by Atlassian Jira (v8.20.10#820010)
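For the doc addition, a short usage sketch may help. This assumes a running cluster; `snapshot`, `list_snapshots`, and `delete_snapshot` are existing HBase shell commands:

```
hbase> snapshot 'sourceTable', 'snapshotName', {TTL => '7d'}
hbase> list_snapshots
hbase> delete_snapshot 'snapshotName'
```

The snapshot is cleaned up automatically once the 7-day TTL elapses; `delete_snapshot` remains available for manual cleanup before expiry.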
[jira] [Created] (HBASE-28746) [hbase-thirdparty] Bump netty to latest 4.1.112.Final version
Pankaj Kumar created HBASE-28746: Summary: [hbase-thirdparty] Bump netty to latest 4.1.112.Final version Key: HBASE-28746 URL: https://issues.apache.org/jira/browse/HBASE-28746 Project: HBase Issue Type: Bug Components: dependencies, security, thirdparty Reporter: Pankaj Kumar Assignee: Pankaj Kumar netty 4.1.112.Final was released recently; let's upgrade the dependency. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (HBASE-28734) Improve HBase shell snapshot command Doc with TTL option
[ https://issues.apache.org/jira/browse/HBASE-28734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang reopened HBASE-28734: --- > Improve HBase shell snapshot command Doc with TTL option > - > > Key: HBASE-28734 > URL: https://issues.apache.org/jira/browse/HBASE-28734 > Project: HBase > Issue Type: Improvement > Components: shell >Reporter: Ashok shetty >Assignee: Liangjun He >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0-alpha-1 > > > The current HBase shell snapshot command allows users to create a snapshot of > a specific table. While this command is useful, it could be enhanced by > adding a TTL (Time-to-Live) option. This would allow users to specify a time > period after which the snapshot would automatically be deleted. > I propose we introduce a TTL option in the snapshot command doc as follows: > hbase> snapshot 'sourceTable', 'snapshotName', \{TTL => '7d'} > This would create a snapshot of 'sourceTable' called 'snapshotName' that > would automatically be deleted after 7 days. Documenting the TTL option would > provide a better user experience and assist with efficient > storage management. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28734) Improve HBase shell snapshot command Doc with TTL option
[ https://issues.apache.org/jira/browse/HBASE-28734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liangjun He resolved HBASE-28734. - Fix Version/s: 4.0.0-alpha-1 Resolution: Fixed > Improve HBase shell snapshot command Doc with TTL option > - > > Key: HBASE-28734 > URL: https://issues.apache.org/jira/browse/HBASE-28734 > Project: HBase > Issue Type: Improvement > Components: shell >Reporter: Ashok shetty >Assignee: Liangjun He >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0-alpha-1 > > > The current HBase shell snapshot command allows users to create a snapshot of > a specific table. While this command is useful, it could be enhanced by > adding a TTL (Time-to-Live) option. This would allow users to specify a time > period after which the snapshot would automatically be deleted. > I propose we introduce a TTL option in the snapshot command doc as follows: > hbase> snapshot 'sourceTable', 'snapshotName', \{TTL => '7d'} > This would create a snapshot of 'sourceTable' called 'snapshotName' that > would automatically be deleted after 7 days. Documenting the TTL option would > provide a better user experience and assist with efficient > storage management. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28744) Add a new command-line option for table backup in our ref guide
[ https://issues.apache.org/jira/browse/HBASE-28744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liangjun He resolved HBASE-28744. - Resolution: Fixed > Add a new command-line option for table backup in our ref guide > --- > > Key: HBASE-28744 > URL: https://issues.apache.org/jira/browse/HBASE-28744 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Liangjun He >Assignee: Liangjun He >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28702) TestBackupMerge fails 100% of times on flaky dashboard
[ https://issues.apache.org/jira/browse/HBASE-28702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liangjun He resolved HBASE-28702. - Resolution: Fixed > TestBackupMerge fails 100% of times on flaky dashboard > -- > > Key: HBASE-28702 > URL: https://issues.apache.org/jira/browse/HBASE-28702 > Project: HBase > Issue Type: Bug > Components: backuprestore >Reporter: Duo Zhang >Assignee: Liangjun He >Priority: Critical > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28745) Default Zookeeper ConnectionRegistry APIs timeout should be less
[ https://issues.apache.org/jira/browse/HBASE-28745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani resolved HBASE-28745. -- Hadoop Flags: Reviewed Resolution: Fixed > Default Zookeeper ConnectionRegistry APIs timeout should be less > > > Key: HBASE-28745 > URL: https://issues.apache.org/jira/browse/HBASE-28745 > Project: HBase > Issue Type: Sub-task >Reporter: Viraj Jasani >Assignee: Divneet Kaur >Priority: Minor > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11 > > > HBASE-28428 introduces timeouts for Zookeeper ConnectionRegistry APIs. > However, the default timeout value we have set is 60s. Given that connection > registry calls are metadata APIs, they should have a much lower timeout value, > including the default. > Let's set the default timeout to 10s. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28745) Default Zookeeper ConnectionRegistry APIs timeout should be less
Viraj Jasani created HBASE-28745: Summary: Default Zookeeper ConnectionRegistry APIs timeout should be less Key: HBASE-28745 URL: https://issues.apache.org/jira/browse/HBASE-28745 Project: HBase Issue Type: Sub-task Reporter: Viraj Jasani HBASE-28428 introduces timeouts for Zookeeper ConnectionRegistry APIs. However, the default timeout value we have set is 60s. Given that connection registry calls are metadata APIs, they should have a much lower timeout value, including the default. Let's set the default timeout to 10s. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28744) Add a new command-line option for table backup in our ref guide
Liangjun He created HBASE-28744: --- Summary: Add a new command-line option for table backup in our ref guide Key: HBASE-28744 URL: https://issues.apache.org/jira/browse/HBASE-28744 Project: HBase Issue Type: Task Components: documentation Reporter: Liangjun He Assignee: Liangjun He Fix For: 4.0.0-alpha-1 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28428) Zookeeper ConnectionRegistry APIs should have timeout
[ https://issues.apache.org/jira/browse/HBASE-28428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani resolved HBASE-28428. -- Fix Version/s: 2.7.0 2.6.1 2.5.11 Hadoop Flags: Reviewed Resolution: Fixed > Zookeeper ConnectionRegistry APIs should have timeout > - > > Key: HBASE-28428 > URL: https://issues.apache.org/jira/browse/HBASE-28428 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.4.17, 3.0.0-beta-1, 2.5.8 >Reporter: Viraj Jasani >Assignee: Divneet Kaur >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11 > > > Came across a couple of instances where active master failover happens around > the same time as Zookeeper leader failover, leading to a stuck HBase client if > one of the threads is blocked on one of the ConnectionRegistry rpc calls. > ConnectionRegistry APIs are wrapped with CompletableFuture. However, their > usages do not have any timeouts, which can potentially leave the entire > client stuck indefinitely, as we take some global locks. For > instance, _getKeepAliveMasterService()_ takes > {_}masterLock{_}, hence if getting the active master from _masterAddressZNode_ > gets stuck, we can block any admin operation that needs > {_}getKeepAliveMasterService(){_}. > > Sample stacktrace that blocked all client operations that required the table > descriptor from Admin: > {code:java} > jdk.internal.misc.Unsafe.park > java.util.concurrent.locks.LockSupport.park > java.util.concurrent.CompletableFuture$Signaller.block > java.util.concurrent.ForkJoinPool.managedBlock > java.util.concurrent.CompletableFuture.waitingGet > java.util.concurrent.CompletableFuture.get > org.apache.hadoop.hbase.client.ConnectionImplementation.get > org.apache.hadoop.hbase.client.ConnectionImplementation.access$? 
> org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStubNoRetries > org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStub > org.apache.hadoop.hbase.client.ConnectionImplementation.getKeepAliveMasterService > org.apache.hadoop.hbase.client.ConnectionImplementation.getMaster > org.apache.hadoop.hbase.client.MasterCallable.prepare > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable > org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor > org.apache.hadoop.hbase.client.HTable.getDescriptor > org.apache.phoenix.query.ConnectionQueryServicesImpl.getTableDescriptor > org.apache.phoenix.query.DelegateConnectionQueryServices.getTableDescriptor > org.apache.phoenix.util.IndexUtil.isGlobalIndexCheckerEnabled > org.apache.phoenix.execute.MutationState.filterIndexCheckerMutations > org.apache.phoenix.execute.MutationState.sendBatch > org.apache.phoenix.execute.MutationState.send > org.apache.phoenix.execute.MutationState.send > org.apache.phoenix.execute.MutationState.commit > org.apache.phoenix.jdbc.PhoenixConnection$?.call > org.apache.phoenix.jdbc.PhoenixConnection$?.call > org.apache.phoenix.call.CallRunner.run > org.apache.phoenix.jdbc.PhoenixConnection.commit {code} > Another similar incident is captured on PHOENIX-7233. In this case, > retrieving the clusterId from the ZNode got stuck, and that blocked the client from being > able to create any more HBase Connections. Stacktrace for reference: > {code:java} > jdk.internal.misc.Unsafe.park > java.util.concurrent.locks.LockSupport.park > java.util.concurrent.CompletableFuture$Signaller.block > java.util.concurrent.ForkJoinPool.managedBlock > java.util.concurrent.CompletableFuture.waitingGet > java.util.concurrent.CompletableFuture.get > org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId > org.apache.hadoop.hbase.client.ConnectionImplementation. 
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance? > jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance > jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance > java.lang.reflect.Constructor.newInstance > org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$? > org.apache.hadoop.hbase.client.ConnectionFactory$$Lambda$?.run > java.security.AccessController.doPrivileged > javax.security.auth.Subject.doAs > org.apache.hadoop.security.UserGroupInformation.doAs > org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs > org.apache.hadoop.hbase.client.ConnectionFactory.createConnection > org.apache.hado
[jira] [Created] (HBASE-28742) CompactionTool fails with NPE when mslab is enabled
Vineet Kumar Maheshwari created HBASE-28742: --- Summary: CompactionTool fails with NPE when mslab is enabled Key: HBASE-28742 URL: https://issues.apache.org/jira/browse/HBASE-28742 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 2.5.9, 3.0.0-beta-1, 2.6.0 Reporter: Vineet Kumar Maheshwari Assignee: Vineet Kumar Maheshwari While using the CompactionTool, NPE is observed. *Command:* hbase org.apache.hadoop.hbase.regionserver.CompactionTool -major *Exception Details:* Exception in thread "main" java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.recycleChunks(MemStoreLABImpl.java:296) at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.lambda$new$0(MemStoreLABImpl.java:109) at org.apache.hadoop.hbase.nio.RefCnt.deallocate(RefCnt.java:95) at org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.handleRelease(AbstractReferenceCounted.java:86) at org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.release(AbstractReferenceCounted.java:76) at org.apache.hadoop.hbase.nio.RefCnt.release(RefCnt.java:84) at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.close(MemStoreLABImpl.java:269) at org.apache.hadoop.hbase.regionserver.Segment.close(Segment.java:143) at org.apache.hadoop.hbase.regionserver.AbstractMemStore.close(AbstractMemStore.java:381) at org.apache.hadoop.hbase.regionserver.HStore.closeWithoutLock(HStore.java:723) at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:795) at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compactStoreFiles(CompactionTool.java:171) at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compactRegion(CompactionTool.java:137) at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compactTable(CompactionTool.java:129) at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compact(CompactionTool.java:118) at 
org.apache.hadoop.hbase.regionserver.CompactionTool.doClient(CompactionTool.java:374) at org.apache.hadoop.hbase.regionserver.CompactionTool.run(CompactionTool.java:424) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.hbase.regionserver.CompactionTool.main(CompactionTool.java:460) *Fix Suggestions:* Initialize the ChunkCreator in CompactionTool when hbase.hregion.memstore.mslab.enabled is enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28741) Rpc ConnectionRegistry APIs should have timeout
Viraj Jasani created HBASE-28741: Summary: Rpc ConnectionRegistry APIs should have timeout Key: HBASE-28741 URL: https://issues.apache.org/jira/browse/HBASE-28741 Project: HBase Issue Type: Improvement Affects Versions: 2.5.10, 2.4.18, 2.6.0 Reporter: Viraj Jasani ConnectionRegistry APIs are some of the most basic metadata APIs that determine how clients can interact with the servers after getting the required metadata. These APIs should time out quickly if they cannot serve metadata in time. Similar to HBASE-28428, which introduced timeouts for Zookeeper ConnectionRegistry APIs, we should also introduce timeouts (with the same timeout values) for Rpc ConnectionRegistry APIs. RpcConnectionRegistry uses the HBase RPC framework with hedged read fanout mode. We have two options to introduce the timeout: # Use RetryTimer to watch the CompletableFuture and make it complete exceptionally if the timeout is reached (similar proposal as HBASE-28428). # Introduce a separate Rpc timeout config for AbstractRpcBasedConnectionRegistry, as the rpc timeout for generic RPC operations (hbase.rpc.timeout) could be higher. -- This message was sent by Atlassian Jira (v8.20.10#820010)
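Option 1 is essentially a deadline attached to the returned future. A minimal, self-contained sketch of the idea using `CompletableFuture.orTimeout` (Java 9+); the actual proposal uses HBase's RetryTimer, and `withTimeout` here is a hypothetical helper, not an HBase API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class RegistryTimeoutSketch {
    // Attach a deadline to a registry call's future: if the underlying
    // call has not completed when the timeout fires, the future
    // completes exceptionally with a TimeoutException, so callers
    // blocked on get()/join() are released instead of hanging forever.
    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> call, long timeoutMs) {
        return call.orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

Option 2 would instead enforce the deadline at the RPC layer, by threading a dedicated (smaller) timeout config into AbstractRpcBasedConnectionRegistry rather than watching the future.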
[jira] [Created] (HBASE-28743) Snapshot based mapreduce jobs fails with NPE while trying to close mslab within mapper
Ujjawal Kumar created HBASE-28743: - Summary: Snapshot based mapreduce jobs fails with NPE while trying to close mslab within mapper Key: HBASE-28743 URL: https://issues.apache.org/jira/browse/HBASE-28743 Project: HBase Issue Type: Bug Components: snapshots Reporter: Ujjawal Kumar 2024-07-11 10:20:38,800 WARN [main] client.ClientSideRegionScanner - Exception while closing region java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1808) at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1557) at org.apache.hadoop.hbase.client.ClientSideRegionScanner.close(ClientSideRegionScanner.java:133) at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl$RecordReader.close(TableSnapshotInputFormatImpl.java:310) at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat$TableSnapshotRegionRecordReader.close(TableSnapshotInputFormat.java:184) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:536) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:804) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172) Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.recycleChunks(MemStoreLABImpl.java:296) at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.lambda$new$0(MemStoreLABImpl.java:109) at org.apache.hadoop.hbase.nio.RefCnt.deallocate(RefCnt.java:95) at org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.handleRelease(AbstractReferenceCounted.java:86) at 
org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.release(AbstractReferenceCounted.java:76) at org.apache.hadoop.hbase.nio.RefCnt.release(RefCnt.java:84) at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.close(MemStoreLABImpl.java:269) at org.apache.hadoop.hbase.regionserver.Segment.close(Segment.java:143) at org.apache.hadoop.hbase.regionserver.AbstractMemStore.close(AbstractMemStore.java:381) at org.apache.hadoop.hbase.regionserver.HStore.closeWithoutLock(HStore.java:723) at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:795) at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1786) at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1783) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28704) The expired snapshot can be read by CopyTable or ExportSnapshot
[ https://issues.apache.org/jira/browse/HBASE-28704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guluo resolved HBASE-28704. --- Resolution: Fixed > The expired snapshot can be read by CopyTable or ExportSnapshot > > > Key: HBASE-28704 > URL: https://issues.apache.org/jira/browse/HBASE-28704 > Project: HBase > Issue Type: Bug > Components: mapreduce, snapshots >Affects Versions: 2.4.13 >Reporter: guluo >Assignee: guluo >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11 > > > We can read data from an expired snapshot in the following way. > {code:java} > hbase org.apache.hadoop.hbase.mapreduce.CopyTable --snapshot expired_snapshot > --new.name my_table{code} > We also do not check whether the snapshot is expired when exporting a snapshot > with the ExportSnapshot tool. -- This message was sent by Atlassian Jira (v8.20.10#820010)
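The missing guard boils down to a simple expiry test before CopyTable or ExportSnapshot reads the snapshot. A hypothetical sketch of the check (illustrative names and a simplified signature, not the real HBase SnapshotDescription API):

```java
// Hypothetical expiry check mirroring the missing guard: a snapshot
// with a positive TTL is expired once creationTime + TTL has passed.
// Illustrative helper only, not the real HBase API.
class SnapshotExpiry {
    static boolean isExpired(long creationTimeMs, long ttlSeconds, long nowMs) {
        if (ttlSeconds <= 0) {
            return false; // TTL disabled: the snapshot never expires
        }
        return nowMs > creationTimeMs + ttlSeconds * 1000L;
    }
}
```

Tools reading snapshots would reject the request when this check returns true, matching the behavior the fix adds.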
[jira] [Resolved] (HBASE-28740) Need to call parent class's serialization methods in CloseExcessRegionReplicasProcedure
[ https://issues.apache.org/jira/browse/HBASE-28740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-28740. - Hadoop Flags: Reviewed Resolution: Fixed > Need to call parent class's serialization methods in > CloseExcessRegionReplicasProcedure > --- > > Key: HBASE-28740 > URL: https://issues.apache.org/jira/browse/HBASE-28740 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Blocker > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.10 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28735) Move our official slack channel from apache-hbase.slack.com to the one in the-asf.slack.com
[ https://issues.apache.org/jira/browse/HBASE-28735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28735. --- Assignee: Duo Zhang Resolution: Fixed Done. > Move our official slack channel from apache-hbase.slack.com to the one in > the-asf.slack.com > --- > > Key: HBASE-28735 > URL: https://issues.apache.org/jira/browse/HBASE-28735 > Project: HBase > Issue Type: Task > Components: community >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > > According to this thread in the mailing list > https://lists.apache.org/thread/cyr8vfxvfqm2srz7m1kkp4mkk015r8wx > Let's do the move. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28738) Send notice email to all mailing list to mention the slack channel change
[ https://issues.apache.org/jira/browse/HBASE-28738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28738. --- Resolution: Fixed Done. > Send notice email to all mailing list to mention the slack channel change > - > > Key: HBASE-28738 > URL: https://issues.apache.org/jira/browse/HBASE-28738 > Project: HBase > Issue Type: Sub-task > Components: community >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[NOTICE] Official slack channel moved to #hbase on https://the-asf.slack.com/
Per the discussion thread[1], we finally decided to move our official slack channel from apache-hbase.slack.com to #hbase channel on the-asf.slack.com. Please mail to dev@hbase to request an invite. Thanks. below are Chinese 以下是中文 === 经过讨论[1],我们决定把官方 slack channel 从 apache-hbase.slack.com 转移到 the-asf.slack.com 上的 #hbase。 如果你想加入,请发邮件给 dev@hbase。 谢谢 1. https://lists.apache.org/thread/cyr8vfxvfqm2srz7m1kkp4mkk015r8wx
[jira] [Resolved] (HBASE-28737) Add the slack channel related information in README.md
[ https://issues.apache.org/jira/browse/HBASE-28737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28737. --- Fix Version/s: 2.7.0 3.0.0-beta-2 2.6.1 2.5.10 Hadoop Flags: Reviewed Resolution: Fixed Pushed to all active branches. Thanks [~meiyi] for reviewing! > Add the slack channel related information in README.md > -- > > Key: HBASE-28737 > URL: https://issues.apache.org/jira/browse/HBASE-28737 > Project: HBase > Issue Type: Sub-task > Components: documentation >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.10 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28736) Modify our ref guide about the slack channel change
[ https://issues.apache.org/jira/browse/HBASE-28736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28736. --- Fix Version/s: 4.0.0-alpha-1 Hadoop Flags: Reviewed Resolution: Fixed Merged to master. Thanks [~meiyi] for reviewing! > Modify our ref guide about the slack channel change > --- > > Key: HBASE-28736 > URL: https://issues.apache.org/jira/browse/HBASE-28736 > Project: HBase > Issue Type: Sub-task > Components: documentation >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28740) Need to call parent class's serialization methods in CloseExcessRegionReplicasProcedure
Duo Zhang created HBASE-28740: - Summary: Need to call parent class's serialization methods in CloseExcessRegionReplicasProcedure Key: HBASE-28740 URL: https://issues.apache.org/jira/browse/HBASE-28740 Project: HBase Issue Type: Bug Components: proc-v2 Reporter: Duo Zhang Assignee: Duo Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[ANNOUNCE] Apache HBase 2.5.9 is now available for download
The HBase team is happy to announce the immediate availability of HBase 2.5.9. Apache HBase™ is an open-source, distributed, versioned, non-relational database. Apache HBase gives you low latency random access to billions of rows with millions of columns atop non-specialized hardware. To learn more about HBase, see https://hbase.apache.org/. HBase 2.5.9 is the latest patch release in the HBase 2.5.x line. The full list of issues can be found in the included CHANGES and RELEASENOTES, or via our issue tracker: https://s.apache.org/2.5.9-jira To download please follow the links and instructions on our website: https://hbase.apache.org/downloads.html Questions, comments, and problems are always welcome at: dev@hbase.apache.org. Thanks to all who contributed and made this release possible. Cheers, The HBase Dev Team
[jira] [Resolved] (HBASE-28739) Update downloads.xml for 2.5.9
[ https://issues.apache.org/jira/browse/HBASE-28739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-28739. - Resolution: Fixed > Update downloads.xml for 2.5.9 > -- > > Key: HBASE-28739 > URL: https://issues.apache.org/jira/browse/HBASE-28739 > Project: HBase > Issue Type: Task >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28739) Update downloads.xml for 2.5.9
Andrew Kyle Purtell created HBASE-28739: --- Summary: Update downloads.xml for 2.5.9 Key: HBASE-28739 URL: https://issues.apache.org/jira/browse/HBASE-28739 Project: HBase Issue Type: Task Reporter: Andrew Kyle Purtell -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28737) Add the slack channel related information in README.md
Duo Zhang created HBASE-28737: - Summary: Add the slack channel related information in README.md Key: HBASE-28737 URL: https://issues.apache.org/jira/browse/HBASE-28737 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Duo Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28738) Send notice email to all mailing list to mention the slack channel change
Duo Zhang created HBASE-28738: - Summary: Send notice email to all mailing list to mention the slack channel change Key: HBASE-28738 URL: https://issues.apache.org/jira/browse/HBASE-28738 Project: HBase Issue Type: Sub-task Reporter: Duo Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28735) Move our official slack channel from apache-hbase.slack.com to the one in the-asf.slack.com
Duo Zhang created HBASE-28735: - Summary: Move our official slack channel from apache-hbase.slack.com to the one in the-asf.slack.com Key: HBASE-28735 URL: https://issues.apache.org/jira/browse/HBASE-28735 Project: HBase Issue Type: Task Components: community Reporter: Duo Zhang According to this thread in the mailing list https://lists.apache.org/thread/cyr8vfxvfqm2srz7m1kkp4mkk015r8wx Let's do the move. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28736) Modify our ref guide about the slack channel change
Duo Zhang created HBASE-28736: - Summary: Modify our ref guide about the slack channel change Key: HBASE-28736 URL: https://issues.apache.org/jira/browse/HBASE-28736 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Duo Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28674) Bump minimum supported java version to 17 for HBase 3.x
[ https://issues.apache.org/jira/browse/HBASE-28674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28674. --- Fix Version/s: 3.0.0-beta-2 Release Note: HBase 3.x can only be compiled and run by JDK17+. We set release to 17 when compiling so the generated byte code can only be executed by JRE17+, and we also change the scripts to check JDK version to be 17+ before actually doing anything. Resolution: Fixed > Bump minimum supported java version to 17 for HBase 3.x > --- > > Key: HBASE-28674 > URL: https://issues.apache.org/jira/browse/HBASE-28674 > Project: HBase > Issue Type: Umbrella > Components: build, java >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0-beta-2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
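The release note above mentions setting `release` to 17 at compile time. That maps to a Maven compiler configuration roughly like the following (an illustrative fragment, not the exact HBase pom):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <!-- Unlike 'source'/'target', 'release' also checks API usage
         against the JDK 17 platform, so the produced bytecode can
         only be executed by JRE 17+. -->
    <release>17</release>
  </configuration>
</plugin>
```

The advantage of `release` over the older `source`/`target` pair is that it rejects accidental use of APIs that do not exist in the targeted JDK.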
[jira] [Resolved] (HBASE-28683) Only allow one TableProcedureInterface for a single table to run at the same time for some special procedure types
[ https://issues.apache.org/jira/browse/HBASE-28683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28683. --- Fix Version/s: 2.7.0 3.0.0-beta-2 2.6.1 2.5.10 Hadoop Flags: Reviewed Resolution: Fixed Pushed to all active branches. Thanks [~vjasani] for reviewing! > Only allow one TableProcedureInterface for a single table to run at the same > time for some special procedure types > -- > > Key: HBASE-28683 > URL: https://issues.apache.org/jira/browse/HBASE-28683 > Project: HBase > Issue Type: Improvement > Components: master, proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Critical > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.10 > > > We have a table lock in the MasterProcedureScheduler, which is designed to > only allow one procedure to run at the same time when it requires an exclusive > lock. > But there is a problem: for availability, we usually can not hold > the exclusive lock through the whole procedure lifetime, because if we did, we could > not execute region assignment for this table either. The solution is to set > holdLock to false, which means we will release the table lock after one > execution cycle. > In this way, it is possible that different table procedures may execute at > the same time, which could mess things up. > In particular, in HBASE-28522, we found out that it is even impossible for > DisableTableProcedure to hold the exclusive lock all the time. If the steps > for DisableTableProcedure can be overlapped with other procedures like > ModifyTableProcedure or even EnableTableProcedure, things will > definitely be messed up... > So we need to find another way to ensure that for a single table, only one of > these procedures can be executed at the same time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
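The idea in this ticket can be sketched as follows (an illustrative model, not HBase's actual MasterProcedureScheduler code): even when each procedure releases the table lock between execution cycles (holdLock=false), a per-table gate admits only one "special" table procedure at a time.

```python
# Illustrative sketch: serialize special table procedures per table even
# though each one releases the table lock between execution cycles.
class TableProcedureGate:
    def __init__(self):
        self._active = {}  # table -> id of the currently admitted procedure

    def try_admit(self, table, proc_id):
        """Admit proc_id for `table` unless another special procedure owns it."""
        owner = self._active.get(table)
        if owner is None or owner == proc_id:
            self._active[table] = proc_id
            return True
        return False  # another Disable/Modify/EnableTableProcedure is running

    def finish(self, table, proc_id):
        if self._active.get(table) == proc_id:
            del self._active[table]

gate = TableProcedureGate()
assert gate.try_admit("t1", 100)       # DisableTableProcedure starts
assert gate.try_admit("t1", 100)       # re-admitted after releasing the table lock
assert not gate.try_admit("t1", 200)   # ModifyTableProcedure must wait
gate.finish("t1", 100)
assert gate.try_admit("t1", 200)       # now the next procedure may run
```

The key property is that re-admission of the same procedure succeeds across execution cycles, so releasing the table lock for region assignment does not open a window for a second special procedure on the same table.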
[jira] [Created] (HBASE-28734) Improve HBase shell snapshot command Doc with TTL option
Ashok shetty created HBASE-28734: Summary: Improve HBase shell snapshot command Doc with TTL option Key: HBASE-28734 URL: https://issues.apache.org/jira/browse/HBASE-28734 Project: HBase Issue Type: Improvement Components: shell Reporter: Ashok shetty The current HBase shell snapshot command allows users to create a snapshot of a specific table. While this command is useful, it could be enhanced by adding a TTL (Time-to-Live) option. This would allow users to specify a time period after which the snapshot would automatically be deleted. I propose we document a TTL option for the snapshot command as follows: hbase> snapshot 'sourceTable', 'snapshotName', {TTL => '7d'} This would create a snapshot of 'sourceTable' called 'snapshotName' that would automatically be deleted after 7 days. Documenting the TTL option would provide a better user experience and assist with efficient storage management. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28733) Publish API docs for 2.6
Nick Dimiduk created HBASE-28733: Summary: Publish API docs for 2.6 Key: HBASE-28733 URL: https://issues.apache.org/jira/browse/HBASE-28733 Project: HBase Issue Type: Task Components: community, documentation Reporter: Nick Dimiduk We have released 2.6 but the website has not been updated with the new API docs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-26092) JVM core dump in the replication path
[ https://issues.apache.org/jira/browse/HBASE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-26092. - Resolution: Duplicate > JVM core dump in the replication path > - > > Key: HBASE-26092 > URL: https://issues.apache.org/jira/browse/HBASE-26092 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.3.5 >Reporter: Huaxiang Sun >Priority: Critical > > When replication is turned on, we found the following core dump in the region > server. > I checked the core dump for replication. I think I got some ideas. For > replication, when the RS receives walEdits from a remote cluster, it needs to send > them out to the final RS. In this case, NettyRpcConnection is deployed, calls are > queued while it refers to a ByteBuffer in the context of replicationHandler > (returned to the pool once it returns). A core dump will happen since the > byteBuffer has been reused. Ref counting is needed in this asynchronous processing. > > Feel free to take it, otherwise, I will try to work on a patch later. 
> > > {code:java} > Stack: [0x7fb1bf039000,0x7fb1bf13a000], sp=0x7fb1bf138560, free > space=1021k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > J 28175 C2 > org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I > (21 bytes) @ 0x7fd2663c [0x7fd263c0+0x27c] > J 14912 C2 > org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.writeRequest(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Lorg/apache/hadoop/hbase/ipc/Call;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V > (370 bytes) @ 0x7fdbbb94b590 [0x7fdbbb949c00+0x1990] > J 14911 C2 > org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V > (30 bytes) @ 0x7fdbb972d1d4 [0x7fdbb972d1a0+0x34] > J 30476 C2 > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V > (149 bytes) @ 0x7fdbbd4e7084 [0x7fdbbd4e6900+0x784] > J 14914 C2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$6$1.run()V (22 > bytes) @ 0x7fdbbb9344ec [0x7fdbbb934280+0x26c] > J 23528 C2 > org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z > (106 bytes) @ 0x7fdbbcbb0efc [0x7fdbbcbb0c40+0x2bc] > J 15987% C2 > org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run()V (461 > bytes) @ 0x7fdbbbaf1580 [0x7fdbbbaf1360+0x220] > j > org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run()V+44 > j > org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run()V+11 > j > org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run()V+4 > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
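The fix direction suggested in the report (reference counting the pooled buffer across the asynchronous send) can be sketched like this. The names here are hypothetical; this is not the Netty/HBase API, just the lifetime rule it would enforce.

```python
# Illustrative sketch: a pooled buffer is retained before being queued for
# an asynchronous write and only returns to the pool when the last
# reference is released, so the queued write never sees reused memory.
class PooledBuffer:
    def __init__(self, pool):
        self._pool = pool
        self._refs = 1
        self.in_pool = False

    def retain(self):
        assert not self.in_pool, "use-after-return to pool"
        self._refs += 1
        return self

    def release(self):
        self._refs -= 1
        if self._refs == 0:
            self.in_pool = True
            self._pool.append(self)

pool = []
buf = PooledBuffer(pool)
queued = buf.retain()    # the connection queues the call asynchronously
buf.release()            # the replication handler is done with the buffer
assert not buf.in_pool   # still alive: the queued write holds a reference
queued.release()         # the asynchronous write completed
assert buf.in_pool       # only now is the buffer safely back in the pool
```

Without the retain/release pair, the handler's release would return the buffer to the pool while the queued write still points at it, which is exactly the reuse-after-return the stack trace shows.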
[DISCUSS] HBase backup API with record/store phase
At NGData, we are using HBase backup as part of the backup procedure for our product. Besides HBase, some other components (HDFS, ZooKeeper, ...) are also backed up. Due to how our product works, there are some dependencies between these components, i.e. HBase should be backed up first, then ZooKeeper, then... To minimize the time between the backup for each component (i.e. to minimize data drift), we designed a phased approach in our backup procedure: * a "record" phase, where all data relevant for a backup is captured. E.g., for HDFS this is an HDFS snapshot. * a "store" phase, where the captured data is moved to cloud storage. E.g., for HDFS, this is a DistCP of that snapshot This approach allows us to defer any delay related to data transfer to the end of the backup procedure, meaning the time between data capture for all component backups is minimized. The HBase backup API currently doesn't support this kind of phased approach, though the steps that are executed certainly would allow this: * Record phase (full backup): roll WALs, snapshot tables * Store phase (full backup): snapshot copy, bulk load copy, updating metadata, terminating backup session * Record phase (incremental backup): roll WALs * Store phase (incremental backup): convert WALs to HFiles, bulk load copy, HFile copy, metadata updates, terminating backup session As this seems like a general use-case, I would like to suggest refactoring the HBase backup API to allow this kind of 2-phase approach. CLI usage can remain unchanged. Before logging any ticket about this, I wanted to hear the community's thoughts about this. Unfortunately, I can't promise we will be available to actually spend time on this in the short term, but I'd rather have a plan of attack ready once we (or someone else) does have the time. Regards, Dieter
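The phased orchestration described in this proposal can be sketched as follows. The interface here is hypothetical, not the current HBase backup API: each component exposes a cheap "record" step and a slow "store" step, and the driver runs all record steps back-to-back before any transfer begins, minimizing drift between components.

```python
# Illustrative sketch of a two-phase backup orchestration (hypothetical API).
class ComponentBackup:
    def __init__(self, name):
        self.name = name

    def record(self, log):   # e.g. roll WALs, take table/HDFS snapshots
        log.append(("record", self.name))

    def store(self, log):    # e.g. DistCp/export captured data to cloud storage
        log.append(("store", self.name))

def run_backup(components):
    log = []
    for c in components:     # capture all component state back-to-back...
        c.record(log)
    for c in components:     # ...then do the slow transfers afterwards
        c.store(log)
    return log

log = run_backup([ComponentBackup("hbase"), ComponentBackup("zookeeper")])
assert log == [("record", "hbase"), ("record", "zookeeper"),
               ("store", "hbase"), ("store", "zookeeper")]
```

The point of the interleaving is visible in the log: both captures happen before either store, so the time between the HBase and ZooKeeper capture points no longer depends on transfer speed.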
[jira] [Resolved] (HBASE-28727) SteppingSplitPolicy may not work when table enables region replication
[ https://issues.apache.org/jira/browse/HBASE-28727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28727. --- Fix Version/s: 2.7.0 3.0.0-beta-2 2.6.1 2.5.10 Hadoop Flags: Reviewed Resolution: Fixed Pushed to all active branches. Thanks [~guluo] for contributing! > SteppingSplitPolicy may not work when table enables region replication > -- > > Key: HBASE-28727 > URL: https://issues.apache.org/jira/browse/HBASE-28727 > Project: HBase > Issue Type: Bug >Affects Versions: 2.4.13 >Reporter: guluo >Assignee: guluo >Priority: Minor > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.10 > > > Reproduction: > 1. Create a table with region replication, and ensure that the primary region > and replica region are on the same RS (e.g. the HBase cluster has only one RS) > create 't01', 'info', {REGION_REPLICATION => 2} > 2. The first region does not split when the storefile size exceeds flushsize * 2, > because we get 2 regions for this table on this RS (1 primary region > and 1 replica region) > > I think we should ignore the replica region when getting the count of > regions on this same regionserver. > Is my idea correct? Maybe we can discuss it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
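The fix idea in this ticket, counting only primary regions when the split policy looks at the local regionserver, can be sketched as below. The region representation is hypothetical; in HBase the check would be against the region's replica id, where 0 denotes the primary.

```python
# Illustrative sketch: replica regions (replica_id > 0) should not inflate
# the per-RS region count that SteppingSplitPolicy uses for its threshold.
def primary_region_count(regions, table):
    return sum(1 for r in regions
               if r["table"] == table and r["replica_id"] == 0)

regions_on_rs = [
    {"table": "t01", "replica_id": 0},  # primary region
    {"table": "t01", "replica_id": 1},  # read replica hosted on the same RS
]
# A naive count sees 2 regions, so the policy skips the flushsize * 2
# first-region threshold; counting primaries only restores the behavior.
assert len(regions_on_rs) == 2
assert primary_region_count(regions_on_rs, "t01") == 1
```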
[jira] [Created] (HBASE-28732) Fix typo in Jenkinsfile_Github for jdk8 hadoop2 check
Duo Zhang created HBASE-28732: - Summary: Fix typo in Jenkinsfile_Github for jdk8 hadoop2 check Key: HBASE-28732 URL: https://issues.apache.org/jira/browse/HBASE-28732 Project: HBase Issue Type: Improvement Components: jenkins Reporter: Duo Zhang https://github.com/apache/hbase/blob/9dee538f65d84a900724d424c71793dff46e9684/dev-support/Jenkinsfile_GitHub#L314 This line, "PR JDK8 Hadoop3 Check Report", should be "PR JDK8 Hadoop2 Check Report". -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28731) Remove the IA.Private annotation on WALEdit's add methods as they have already been used by CP users
Duo Zhang created HBASE-28731: - Summary: Remove the IA.Private annotation on WALEdit's add methods as they have already been used by CP users Key: HBASE-28731 URL: https://issues.apache.org/jira/browse/HBASE-28731 Project: HBase Issue Type: Task Components: Coprocessors, wal Reporter: Duo Zhang Assignee: Duo Zhang Per the discussion thread here https://lists.apache.org/thread/b7zfyqmxo9lrt2rpo0lc0m6vsomn217w -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28723) [JDK17] TestSecureIPC fails under JDK17
[ https://issues.apache.org/jira/browse/HBASE-28723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28723. --- Hadoop Flags: Reviewed Resolution: Fixed The flaky dashboard is OK for branch-2.5. Resolve. > [JDK17] TestSecureIPC fails under JDK17 > --- > > Key: HBASE-28723 > URL: https://issues.apache.org/jira/browse/HBASE-28723 > Project: HBase > Issue Type: Sub-task > Components: java, test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.10 > > > Although the tests only fail on branch-2.5, the same exception also produced > on other active branches, so even if the tests passes, it does not test what > we want I think. > {noformat} > 2024-07-11T11:56:44,323 DEBUG [Thread-3 {}] ipc.BlockingRpcConnection$1(409): > Exception encountered while connecting to the server localhost:39851 > java.lang.reflect.InaccessibleObjectException: Unable to make field private > transient java.lang.String java.net.InetAddress.canonicalHostName accessible: > module java.base does not "opens java.net" to unnamed module @26a7b76d > at > java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354) > ~[?:?] > at > java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297) > ~[?:?] > at java.lang.reflect.Field.checkCanSetAccessible(Field.java:178) ~[?:?] > at java.lang.reflect.Field.setAccessible(Field.java:172) ~[?:?] > at > org.apache.hadoop.hbase.security.AbstractTestSecureIPC$CanonicalHostnameTestingAuthenticationProviderSelector$1.createClient(AbstractTestSecureIPC.java:202) > ~[test-classes/:?] > at > org.apache.hadoop.hbase.security.AbstractHBaseSaslRpcClient.(AbstractHBaseSaslRpcClient.java:79) > ~[classes/:?] > at > org.apache.hadoop.hbase.security.HBaseSaslRpcClient.(HBaseSaslRpcClient.java:74) > ~[classes/:?] 
> at > org.apache.hadoop.hbase.ipc.BlockingRpcConnection.setupSaslConnection(BlockingRpcConnection.java:366) > ~[classes/:?] > at > org.apache.hadoop.hbase.ipc.BlockingRpcConnection$2.run(BlockingRpcConnection.java:541) > ~[classes/:?] > at > org.apache.hadoop.hbase.ipc.BlockingRpcConnection$2.run(BlockingRpcConnection.java:1) > ~[classes/:?] > at > java.security.AccessController.doPrivileged(AccessController.java:712) ~[?:?] > at javax.security.auth.Subject.doAs(Subject.java:439) ~[?:?] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > ~[hadoop-common-3.3.5.jar:?] > at > org.apache.hadoop.hbase.ipc.BlockingRpcConnection.setupIOstreams(BlockingRpcConnection.java:538) > ~[classes/:?] > at > org.apache.hadoop.hbase.ipc.BlockingRpcConnection.writeRequest(BlockingRpcConnection.java:685) > ~[classes/:?] > at > org.apache.hadoop.hbase.ipc.BlockingRpcConnection$4.run(BlockingRpcConnection.java:819) > ~[classes/:?] > at > org.apache.hadoop.hbase.ipc.HBaseRpcControllerImpl.notifyOnCancel(HBaseRpcControllerImpl.java:276) > ~[classes/:?] > at > org.apache.hadoop.hbase.ipc.BlockingRpcConnection.sendRequest(BlockingRpcConnection.java:792) > ~[classes/:?] > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:449) > ~[classes/:?] > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:336) > ~[classes/:?] > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:606) > ~[classes/:?] > at > org.apache.hadoop.hbase.shaded.ipc.protobuf.generated.TestRpcServiceProtos$TestProtobufRpcProto$BlockingStub.echo(TestRpcServiceProtos.java:500) > ~[classes/:?] > at > org.apache.hadoop.hbase.security.AbstractTestSecureIPC$TestThread.run(AbstractTestSecureIPC.java:451) > ~[test-classes/:?] > {noformat} > We need to open java.net too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
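The "open java.net" fix mentioned at the end corresponds to a JVM flag of roughly this shape (shown for illustration; the exact target module set and how HBase's test scripts pass the flag are assumptions here):

```shell
# Grant reflective access to java.net internals under JDK 17+,
# so Field.setAccessible on InetAddress.canonicalHostName succeeds
java --add-opens java.base/java.net=ALL-UNNAMED ...
```

Without the flag, JDK 17's strong module encapsulation throws the InaccessibleObjectException shown in the stack trace above.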
[jira] [Created] (HBASE-28730) Locating region can exceed client operation timeout
Daniel Roudnitsky created HBASE-28730: - Summary: Locating region can exceed client operation timeout Key: HBASE-28730 URL: https://issues.apache.org/jira/browse/HBASE-28730 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 2.5.9, 2.4.18, 2.6.0, 2.3.7 Reporter: Daniel Roudnitsky Assignee: Daniel Roudnitsky I'll be referring to hbase.client.operation.timeout as 'operation timeout' and hbase.client.meta.operation.timeout as 'meta timeout'. In the branch-2 client there is a userRegionLock that a thread needs to acquire to run a meta scan to locate a region. userRegionLock acquisition time is bounded by the meta timeout (HBASE-24956) and once the lock is acquired the meta scan time is bounded by hbase.client.meta.scanner.timeout.period (HBASE-27078). The following describes two cases where resolving the region location for an operation can exceed the end to end operation timeout when there is contention around userRegionLock and/or meta slowness (high contention could result from meta slowness/hotspotting, and is more likely in a high concurrency environment where lots of batch operations are being executed): # In locateRegionInMeta, if the relevant region location is not cached, userRegionLock acquisition and meta scan (if userRegionLock is able to be acquired within the lock timeout) may be retried up to hbase.client.retries.number times. An operation timeout check is not done between retries, so even if one has meta timeout + meta scanner timeout < operation timeout, retries could take the client beyond the operation timeout before we exit out of locateRegionInMeta and an operation timeout check is done, if (meta operation timeout + meta scanner timeout) * region lookup attempts > operation timeout. 
Suppose we have operation timeout = meta timeout = 10sec and client retries = 2, and there is enough contention/meta slowness that userRegionLock cannot be acquired for 1min, and we have a new thread running an operation that needs to do a region lookup. For this operation, locateRegionInMeta will try to acquire the userRegionLock 3 times, taking 3 * 10sec + some pause time in between retries before we exit out of locateRegionInMeta and the operation times out after >3x the configured 10sec operation/meta timeout. # Without any retries, if one has (hbase.client.meta.operation.timeout || hbase.client.meta.scanner.timeout.period) > hbase.client.operation.timeout (meta operation timeout default makes this easily possible - HBASE-28608) the client operation timeout could be exceeded. +Proposal+ I propose two changes: # Doing an operation timeout check between retrying userRegionLock acquisition + meta scan (perhaps moving the retry logic + loop outside of the locateRegionInMeta method?) # Change userRegionLock timeout and meta scanner timeout to dynamic values that depend on the time remaining for the end to end operation. userRegionLock acquisition and meta scan time are bounded by static values regardless of how much time was already spent trying to do region location lookups or how much time might be remaining to run the actual operations once all required region locations are found. If we were to use time remaining for the operation for the lock timeout, and then set the meta scanner timeout to min(hbase.client.meta.scanner.timeout.period, operation time remaining after userRegionLock acquisition), that would provide a good upper bound on time spent attempting to locate a region that should keep the operation closely within the desired end to end timeout. 
Dynamic userRegionLock and meta scanner timeouts would also remove some complexity/dependence on client configurations in the locate region codepath, which should simplify the thought process behind choosing appropriate client timeouts. The branch-2 blocking client is affected; I am not yet sure and have not tested how branch-2 AsyncTable is affected. Branch-3+ does not have userRegionLock, and the sync client connection implementation is very [different|https://github.com/apache/hbase/pull/6000#issuecomment-2210913557] (thank you Duo for explaining). This issue extends/develops on what was originally reported in the bottom of HBASE-28358. HBASE-27490 is related work which greatly improved the upper bound on region location resolution time for batch operations. -- This message was sent by Atlassian Jira (v8.20.10#820010)
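The dynamic-timeout proposal above can be sketched as a small budget calculation: bound lock acquisition and the meta scan by the time remaining in the overall operation instead of by independent static timeouts. All names here are illustrative, not HBase client API.

```python
# Illustrative sketch of the proposal: derive per-step timeouts for region
# location from the remaining end-to-end operation budget.
def locate_region_budget(op_timeout_ms, elapsed_ms, meta_scanner_timeout_ms):
    remaining = op_timeout_ms - elapsed_ms
    if remaining <= 0:
        # this is the missing check the ticket asks for between retries
        raise TimeoutError("operation timeout exceeded before region lookup")
    lock_timeout = remaining                                 # userRegionLock acquisition
    scan_timeout = min(meta_scanner_timeout_ms, remaining)   # meta scan
    return lock_timeout, scan_timeout

# 10s operation timeout with 4s already spent retrying: both bounds shrink to 6s,
# so total lookup time can no longer exceed the operation timeout.
assert locate_region_budget(10_000, 4_000, 30_000) == (6_000, 6_000)
# When the static scanner timeout is smaller, it still applies.
assert locate_region_budget(10_000, 1_000, 5_000) == (9_000, 5_000)
```

In a real retry loop the elapsed time would be re-measured before each attempt, so a stalled lock acquisition consumes the budget of later attempts rather than multiplying it.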
[jira] [Created] (HBASE-28729) Change the generic type of List in InternalScanner.next
Duo Zhang created HBASE-28729: - Summary: Change the generic type of List in InternalScanner.next Key: HBASE-28729 URL: https://issues.apache.org/jira/browse/HBASE-28729 Project: HBase Issue Type: Sub-task Components: Coprocessors, regionserver Reporter: Duo Zhang Plan to change it from List to List, so we could pass both List and List to it, or even List for coprocessors. This could save a lot of casting in our main code. This is an incompatible change for coprocessors, so it will only go into branch-3+, and will be marked as incompatible change. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28728) No data available when scan cross-region and setReversed(true) in Spark on HBase sc.newAPIHadoopRDD
Li Zhexi created HBASE-28728: Summary: No data available when scan cross-region and setReversed(true) in Spark on HBase sc.newAPIHadoopRDD Key: HBASE-28728 URL: https://issues.apache.org/jira/browse/HBASE-28728 Project: HBase Issue Type: Bug Affects Versions: 2.4.14, 2.2.3 Reporter: Li Zhexi Using the Scala code below to scan data in Spark on HBase:

val scan = new Scan()
scan.withStartRow(Bytes.toBytes(startKey))
scan.withStopRow(Bytes.toBytes(stopKey))
if (reversed) {
  scan.setReversed(true)
}
val conf = ConnectionFactory.createConnection.getConfiguration
conf.set(TableInputFormat.INPUT_TABLE, tableName)
conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan))
val rdd = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])

1. When the scan crosses regions without reversed=true, the scan is performed normally and results are returned.
2. When the scan does not cross regions but has reversed=true, the scan is performed normally and results are returned.
3. When the scan crosses regions with reversed=true, the result is empty. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28727) SteppingSplitPolicy may not work when table enables region replication
guluo created HBASE-28727: - Summary: SteppingSplitPolicy may not work when table enables region replication Key: HBASE-28727 URL: https://issues.apache.org/jira/browse/HBASE-28727 Project: HBase Issue Type: Bug Affects Versions: 2.4.13 Reporter: guluo Reproduction: 1. Create a table with region replication, and ensure that the primary region and replica region are on the same RS (e.g. the HBase cluster has only one RS) create 't01', 'info', {REGION_REPLICATION => 2} 2. The first region does not split when the storefile size exceeds flushsize * 2, because we get 2 regions for this table on this RS (1 primary region and 1 replica region) I think we should ignore the replica region when getting the count of regions on this same regionserver. Is my idea correct? Maybe we can discuss it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28713) Add 2.6.x in hadoop support matrix in our ref guide
[ https://issues.apache.org/jira/browse/HBASE-28713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-28713. --- Fix Version/s: 3.0.0-beta-2 Hadoop Flags: Reviewed Resolution: Fixed Merged to master. Thanks [~bbeaudreault] for reviewing! > Add 2.6.x in hadoop support matrix in our ref guide > --- > > Key: HBASE-28713 > URL: https://issues.apache.org/jira/browse/HBASE-28713 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0-beta-2 > > > Now it is only up to 2.5.x. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] Reverting hbase rest protobuf package name
HBASE-23975 was the original ticket. My guess is that since hbase-shaded-protocol was already set up to do the compiling and shading, moving it there was the easiest solution. I guess that the same logic was behind the rename: since every other class there uses the .shaded. package, change the REST messages the same way. regards Istvan On Fri, Jul 12, 2024 at 9:48 AM 张铎(Duo Zhang) wrote: > In which jira we did this moving? Are there any reasons why we did > this in the past? > > Istvan Toth 于2024年7月12日周五 03:57写道: > > > > Hi! > > > > While working on HBASE-28725, I realized that in HBase 3+ the REST > protobuf > > definition files have been moved to hbase-shaded-protobuf, and the > package > > name has also been renamed. > > > > While I fully agree with the move to using the thirdparty protobuf > library > > (in fact I'd like to backport that change to 2.x), I think that moving > the > > .proto files and renaming the package was not a good idea. > > > > The REST interface does not use the HBase patched features of the > protobuf > > library, and if we want to maintain any pretense that the REST protobuf > > encoding is usable by non-java code, then we should not use it in the > > future either. > > > > (If we ever decide to use the patched features for performance reasons, > we > > will need to define new protobuf messages for that anyway) > > > > Protobuf does not use the package name on the wire, so wire compatibility > > is not an issue. > > > > In the unlikely case that someone has implemented an independent REST > > client that uses protobuf encoding, this will also ensure compatibility > > with the 3.0+ .protoc definitions. > > > > My proposal is: > > > > HBASE-28726 <https://issues.apache.org/jira/browse/HBASE-28726> Revert > REST > > protobuf package to org.apache.hadoop.hbase.shaded.rest > > *This applies only to branch-3+:* > > 1. 
Move the REST .proto files and compiling back to the hbase-rest module > > (but use the same protoc compiler that we use now) > > 2. Revert the package name of the protobuf messages to the original > > 3. No other changes, we still use the thirdparty protobuf library. > > > > The other issue is that on HBase 2.x the REST client still requires > > unshaded protobuf 2.5.0 which brings back all the protobuf library > > conflicts that were fixed in 3.0 and by hbase-shaded-client. To fix this, > > my proposal is: > > > > HBASE-28725 <https://issues.apache.org/jira/browse/HBASE-28725> Use > > thirdparty protobuf for REST interface in HBase 2.x > > *This applies only to branch-2.x:* > > 1. Backport the code changes that use the thirdparty protobuf library for > > REST to branch-2.x > > > > With these two changes, the REST code would be almost identical on every > > branch, easing maintenance. > > > > What do you think ? > > > > Istvan > -- *István Tóth* | Sr. Staff Software Engineer *Email*: st...@cloudera.com cloudera.com <https://www.cloudera.com>
Re: [DISCUSS] Reverting hbase rest protobuf package name
In which Jira did we do this move? Were there any reasons for doing it this way in the past?

Istvan Toth wrote on Fri, Jul 12, 2024, 03:57:
>
> Hi!
>
> While working on HBASE-28725, I realized that in HBase 3+ the REST protobuf
> definition files have been moved to hbase-shaded-protobuf, and the package
> name has also been renamed.
>
> While I fully agree with the move to using the thirdparty protobuf library
> (in fact I'd like to backport that change to 2.x), I think that moving the
> .proto files and renaming the package was not a good idea.
>
> The REST interface does not use the HBase patched features of the protobuf
> library, and if we want to maintain any pretense that the REST protobuf
> encoding is usable by non-java code, then we should not use it in the
> future either.
>
> (If we ever decide to use the patched features for performance reasons, we
> will need to define new protobuf messages for that anyway.)
>
> Protobuf does not use the package name on the wire, so wire compatibility
> is not an issue.
>
> In the unlikely case that someone has implemented an independent REST
> client that uses protobuf encoding, this will also ensure compatibility
> with the 3.0+ .proto definitions.
>
> My proposal is:
>
> HBASE-28726 <https://issues.apache.org/jira/browse/HBASE-28726> Revert REST
> protobuf package to org.apache.hadoop.hbase.shaded.rest
> *This applies only to branch-3+:*
> 1. Move the REST .proto files and compiling back to the hbase-rest module
> (but use the same protoc compiler that we use now)
> 2. Revert the package name of the protobuf messages to the original
> 3. No other changes, we still use the thirdparty protobuf library.
>
> The other issue is that on HBase 2.x the REST client still requires
> unshaded protobuf 2.5.0, which brings back all the protobuf library
> conflicts that were fixed in 3.0 and by hbase-shaded-client. To fix this,
> my proposal is:
>
> HBASE-28725 <https://issues.apache.org/jira/browse/HBASE-28725> Use
> thirdparty protobuf for REST interface in HBase 2.x
> *This applies only to branch-2.x:*
> 1. Backport the code changes that use the thirdparty protobuf library for
> REST to branch-2.x
>
> With these two changes, the REST code would be almost identical on every
> branch, easing maintenance.
>
> What do you think?
>
> Istvan
[DISCUSS] Reverting hbase rest protobuf package name
Hi!

While working on HBASE-28725, I realized that in HBase 3+ the REST protobuf definition files have been moved to hbase-shaded-protobuf, and the package name has also been renamed.

While I fully agree with the move to using the thirdparty protobuf library (in fact I'd like to backport that change to 2.x), I think that moving the .proto files and renaming the package was not a good idea.

The REST interface does not use the HBase patched features of the protobuf library, and if we want to maintain any pretense that the REST protobuf encoding is usable by non-java code, then we should not use it in the future either.

(If we ever decide to use the patched features for performance reasons, we will need to define new protobuf messages for that anyway.)

Protobuf does not use the package name on the wire, so wire compatibility is not an issue.

In the unlikely case that someone has implemented an independent REST client that uses protobuf encoding, this will also ensure compatibility with the 3.0+ .proto definitions.

My proposal is:

HBASE-28726 <https://issues.apache.org/jira/browse/HBASE-28726> Revert REST protobuf package to org.apache.hadoop.hbase.shaded.rest
*This applies only to branch-3+:*
1. Move the REST .proto files and compiling back to the hbase-rest module (but use the same protoc compiler that we use now)
2. Revert the package name of the protobuf messages to the original
3. No other changes, we still use the thirdparty protobuf library.

The other issue is that on HBase 2.x the REST client still requires unshaded protobuf 2.5.0, which brings back all the protobuf library conflicts that were fixed in 3.0 and by hbase-shaded-client. To fix this, my proposal is:

HBASE-28725 <https://issues.apache.org/jira/browse/HBASE-28725> Use thirdparty protobuf for REST interface in HBase 2.x
*This applies only to branch-2.x:*
1. Backport the code changes that use the thirdparty protobuf library for REST to branch-2.x

With these two changes, the REST code would be almost identical on every branch, easing maintenance.

What do you think?

Istvan
[jira] [Created] (HBASE-28726) Revert REST protobuf package to org.apache.hadoop.hbase.shaded.rest
Istvan Toth created HBASE-28726:
-----------------------------------

             Summary: Revert REST protobuf package to org.apache.hadoop.hbase.shaded.rest
                 Key: HBASE-28726
                 URL: https://issues.apache.org/jira/browse/HBASE-28726
             Project: HBase
          Issue Type: Bug
          Components: REST
            Reporter: Istvan Toth

In HBase 3+, the package name of the REST protobuf messages has been renamed from org.apache.hadoop.hbase.rest to org.apache.hadoop.hbase.shaded.rest.

These definitions are only used by REST, and have nothing to do with standard HBase RPC communication.

I propose reverting the package name. We may also want to move the protobuf definitions back to the hbase-rest module.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
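Since protobuf serializes field numbers rather than package names, the rename is invisible on the wire. A minimal sketch of the point (hypothetical message and file contents, not the actual hbase-rest .proto definitions):

```proto
// Hypothetical sketch -- not the real hbase-rest .proto file.
syntax = "proto2";

// HBase 3+ renamed this to org.apache.hadoop.hbase.shaded.rest; reverting it
// only changes the namespace of the generated Java classes:
package org.apache.hadoop.hbase.rest;

message Version {
  // Only the field numbers (1, 2, ...) appear in the encoded bytes, so
  // clients built against either package name stay wire-compatible.
  optional string restVersion = 1;
}
```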
[jira] [Created] (HBASE-28725) Use thirdparty protobuf for REST interface in HBase 2.x
Istvan Toth created HBASE-28725:
-----------------------------------

             Summary: Use thirdparty protobuf for REST interface in HBase 2.x
                 Key: HBASE-28725
                 URL: https://issues.apache.org/jira/browse/HBASE-28725
             Project: HBase
          Issue Type: Improvement
          Components: REST
            Reporter: Istvan Toth
            Assignee: Istvan Toth

This change has already been done in branch-3+ as part of the protobuf 2.5 removal; we just need to backport it to 2.x.

This removes the requirement of having unshaded protobuf 2.5.0 on the hbase-rest client classpath.
[jira] [Created] (HBASE-28724) BucketCache.notifyFileCachingCompleted may throw IllegalMonitorStateException
Wellington Chevreuil created HBASE-28724:
--------------------------------------------

             Summary: BucketCache.notifyFileCachingCompleted may throw IllegalMonitorStateException
                 Key: HBASE-28724
                 URL: https://issues.apache.org/jira/browse/HBASE-28724
             Project: HBase
          Issue Type: Bug
            Reporter: Wellington Chevreuil
            Assignee: Wellington Chevreuil

If the prefetch thread finishes reading the file blocks faster than the bucket cache writer threads can drain them from the writer queues, we might run into a scenario where BucketCache.notifyFileCachingCompleted throws IllegalMonitorStateException, as we can reach [this block of the code|https://github.com/wchevreuil/hbase/blob/684964f1c1693d2a0792b7b721c92693d75b4cea/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java#L2106].

I believe the impact is not critical, as the prefetch thread is already finishing at that point, but nevertheless such errors in the logs might be misleading.
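For context on the exception itself (illustrative only, not HBase code): IllegalMonitorStateException is what the JDK lock classes throw when a thread releases a lock it does not hold, which is the situation the race above can produce when notifyFileCachingCompleted reaches the unlock path for a lock another thread owns.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only -- not HBase code. Demonstrates that releasing a lock
// the current thread does not hold throws IllegalMonitorStateException,
// the exception class reported in HBASE-28724.
public class MonitorStateDemo {
    public static boolean unlockWithoutHolding() {
        ReentrantLock lock = new ReentrantLock();
        try {
            lock.unlock(); // this thread never acquired the lock
            return false;
        } catch (IllegalMonitorStateException e) {
            return true; // exactly the failure mode described above
        }
    }

    public static void main(String[] args) {
        System.out.println("threw IllegalMonitorStateException: " + unlockWithoutHolding());
    }
}
```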
[jira] [Resolved] (HBASE-28684) Remove CellWrapper and use ExtendedCell internally in client side data structure
[ https://issues.apache.org/jira/browse/HBASE-28684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-28684.
-------------------------------
    Hadoop Flags: Incompatible change, Reviewed  (was: Incompatible change)
      Resolution: Fixed

Pushed to master and branch-3. Thanks [~Ddupg] for reviewing!

> Remove CellWrapper and use ExtendedCell internally in client side data structure
> ---------------------------------------------------------------------------------
>                 Key: HBASE-28684
>                 URL: https://issues.apache.org/jira/browse/HBASE-28684
>             Project: HBase
>          Issue Type: Sub-task
>          Components: API, Client
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.0.0-beta-2
>
> In general, all Cells in HBase are ExtendedCells; we introduced the Cell
> interface only to prevent users from calling methods which could damage the
> system. So I think we should have internal methods which can get an
> ExtendedCell from the client side data structures, so we do not need to cast
> everywhere.
[jira] [Resolved] (HBASE-28707) Backport the code changes in HBASE-28675 to branch-2.x
[ https://issues.apache.org/jira/browse/HBASE-28707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-28707.
-------------------------------
    Fix Version/s: 2.7.0
                   2.6.1
                   2.5.10
     Hadoop Flags: Reviewed
         Assignee: Duo Zhang
       Resolution: Fixed

Pushed to all branch-2.x. Thanks [~ndimiduk] for reviewing!

> Backport the code changes in HBASE-28675 to branch-2.x
> -------------------------------------------------------
>                 Key: HBASE-28707
>                 URL: https://issues.apache.org/jira/browse/HBASE-28707
>             Project: HBase
>          Issue Type: Task
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 2.6.1, 2.5.10
>
> For aligning the code between different branches.
[jira] [Resolved] (HBASE-28665) WALs not marked closed when there are errors in closing WALs
[ https://issues.apache.org/jira/browse/HBASE-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Jasani resolved HBASE-28665.
----------------------------------
    Fix Version/s: 2.7.0
                   2.6.1
                   2.5.10
     Hadoop Flags: Reviewed
       Resolution: Fixed

> WALs not marked closed when there are errors in closing WALs
> -------------------------------------------------------------
>                 Key: HBASE-28665
>                 URL: https://issues.apache.org/jira/browse/HBASE-28665
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.5.8
>            Reporter: Kiran Kumar Maturi
>            Assignee: Kiran Kumar Maturi
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 2.7.0, 2.6.1, 2.5.10
>
> In our production clusters we have observed that when a WAL close fails, the
> oldWAL files are not marked as closed, which prevents them from being cleaned
> up. When a WAL close fails in closeWriter, it increments the error count:
> {code:java}
> Span span = Span.current();
> try {
>   span.addEvent("closing writer");
>   writer.close();
>   span.addEvent("writer closed");
> } catch (IOException ioe) {
>   int errors = closeErrorCount.incrementAndGet();
>   boolean hasUnflushedEntries = isUnflushedEntries();
>   if (syncCloseCall && (hasUnflushedEntries || (errors > this.closeErrorsTolerated))) {
>     LOG.error("Close of WAL " + path + " failed. Cause=\"" + ioe.getMessage()
>       + "\", errors=" + errors + ", hasUnflushedEntries=" + hasUnflushedEntries);
>     throw ioe;
>   }
>   LOG.warn("Riding over failed WAL close of " + path
>     + "; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK", ioe);
> }
> {code}
> Once WAL close has failed enough times (closeErrorCount >= closeErrorsTolerated),
> doReplaceWALWriter enters this code block:
> {code:java}
> if (isUnflushedEntries() || closeErrorCount.get() >= this.closeErrorsTolerated) {
>   try {
>     closeWriter(this.writer, oldPath, true);
>   } finally {
>     inflightWALClosures.remove(oldPath.getName());
>   }
> }
> {code}
> Here we do not mark the WAL as closed, unlike the normal close path:
> {code:java}
> Writer localWriter = this.writer;
> closeExecutor.execute(() -> {
>   try {
>     closeWriter(localWriter, oldPath, false);
>   } catch (IOException e) {
>     LOG.warn("close old writer failed", e);
>   } finally {
>     // call this even if the above close fails, as there is no other chance we can set
>     // closed to true, it will not cause big problems.
>     markClosedAndClean(oldPath);
>     inflightWALClosures.remove(oldPath.getName());
>   }
> });
> {code}
[jira] [Created] (HBASE-28723) [JDK17] TestSecureIPC fails under JDK17
Duo Zhang created HBASE-28723:
---------------------------------

             Summary: [JDK17] TestSecureIPC fails under JDK17
                 Key: HBASE-28723
                 URL: https://issues.apache.org/jira/browse/HBASE-28723
             Project: HBase
          Issue Type: Sub-task
            Reporter: Duo Zhang

Although the tests only fail on branch-2.5, the same exception is also produced on the other active branches, so even if the tests pass, I think they do not test what we want.

{noformat}
2024-07-11T11:56:44,323 DEBUG [Thread-3 {}] ipc.BlockingRpcConnection$1(409): Exception encountered while connecting to the server localhost:39851
java.lang.reflect.InaccessibleObjectException: Unable to make field private transient java.lang.String java.net.InetAddress.canonicalHostName accessible: module java.base does not "opens java.net" to unnamed module @26a7b76d
  at java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354) ~[?:?]
  at java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297) ~[?:?]
  at java.lang.reflect.Field.checkCanSetAccessible(Field.java:178) ~[?:?]
  at java.lang.reflect.Field.setAccessible(Field.java:172) ~[?:?]
  at org.apache.hadoop.hbase.security.AbstractTestSecureIPC$CanonicalHostnameTestingAuthenticationProviderSelector$1.createClient(AbstractTestSecureIPC.java:202) ~[test-classes/:?]
  at org.apache.hadoop.hbase.security.AbstractHBaseSaslRpcClient.<init>(AbstractHBaseSaslRpcClient.java:79) ~[classes/:?]
  at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.<init>(HBaseSaslRpcClient.java:74) ~[classes/:?]
  at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.setupSaslConnection(BlockingRpcConnection.java:366) ~[classes/:?]
  at org.apache.hadoop.hbase.ipc.BlockingRpcConnection$2.run(BlockingRpcConnection.java:541) ~[classes/:?]
  at org.apache.hadoop.hbase.ipc.BlockingRpcConnection$2.run(BlockingRpcConnection.java:1) ~[classes/:?]
  at java.security.AccessController.doPrivileged(AccessController.java:712) ~[?:?]
  at javax.security.auth.Subject.doAs(Subject.java:439) ~[?:?]
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) ~[hadoop-common-3.3.5.jar:?]
  at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.setupIOstreams(BlockingRpcConnection.java:538) ~[classes/:?]
  at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.writeRequest(BlockingRpcConnection.java:685) ~[classes/:?]
  at org.apache.hadoop.hbase.ipc.BlockingRpcConnection$4.run(BlockingRpcConnection.java:819) ~[classes/:?]
  at org.apache.hadoop.hbase.ipc.HBaseRpcControllerImpl.notifyOnCancel(HBaseRpcControllerImpl.java:276) ~[classes/:?]
  at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.sendRequest(BlockingRpcConnection.java:792) ~[classes/:?]
  at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:449) ~[classes/:?]
  at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:336) ~[classes/:?]
  at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:606) ~[classes/:?]
  at org.apache.hadoop.hbase.shaded.ipc.protobuf.generated.TestRpcServiceProtos$TestProtobufRpcProto$BlockingStub.echo(TestRpcServiceProtos.java:500) ~[classes/:?]
  at org.apache.hadoop.hbase.security.AbstractTestSecureIPC$TestThread.run(AbstractTestSecureIPC.java:451) ~[test-classes/:?]
{noformat}

We need to open java.net too.
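Whether the test JVM has java.net opened can be checked with a small stand-alone probe (illustrative only, not HBase test code; the field name comes from the stack trace above). On JDK 9+ the reflective access below only succeeds when the JVM is started with `--add-opens java.base/java.net=ALL-UNNAMED`, which is the kind of extra flag the issue says is needed:

```java
import java.lang.reflect.Field;
import java.net.InetAddress;

// Illustrative probe, not HBase test code: deep reflection into java.net
// fails with InaccessibleObjectException on JDK 9+ unless the JVM was
// started with --add-opens java.base/java.net=ALL-UNNAMED.
public class JavaNetOpensProbe {
    public static boolean canReflectIntoJavaNet() {
        try {
            Field f = InetAddress.class.getDeclaredField("canonicalHostName");
            f.setAccessible(true); // throws without the --add-opens flag
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            // InaccessibleObjectException (module closed) or
            // NoSuchFieldException (older JDK field layout)
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.net open for deep reflection: " + canReflectIntoJavaNet());
    }
}
```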
[jira] [Resolved] (HBASE-28672) Ensure large batches are not indefinitely blocked by quotas
[ https://issues.apache.org/jira/browse/HBASE-28672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk resolved HBASE-28672.
----------------------------------
    Fix Version/s: 2.7.0
                   3.0.0-beta-2
                   2.6.1
       Resolution: Fixed

Pushed to branch-2.6+. Thanks [~rmdmattingly] for the contribution and to [~zhangduo] for build system quick-fixes.

[~rmdmattingly] should this also go back to 2.5? The patch did not apply cleanly; it looked like some interfaces aren't present there. Maybe a dependency needs to be backported first?

> Ensure large batches are not indefinitely blocked by quotas
> ------------------------------------------------------------
>                 Key: HBASE-28672
>                 URL: https://issues.apache.org/jira/browse/HBASE-28672
>             Project: HBase
>          Issue Type: Improvement
>          Components: Quotas
>    Affects Versions: 2.6.0
>            Reporter: Ray Mattingly
>            Assignee: Ray Mattingly
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.1
>
> At my day job we are trying to implement default quotas for a variety of
> access patterns. We began by introducing a default read IO limit per-user,
> per-machine. This has been very successful in reducing hotspots, even on
> clusters with thousands of distinct users.
>
> While implementing a default writes/second throttle, I realized that doing so
> would put us in a precarious situation where large-enough batches may never
> succeed. If your batch size is greater than your TimeLimiter's max
> throughput, then you will always fail in the quota estimation stage.
> Meanwhile [IO estimates are more optimistic|https://github.com/apache/hbase/blob/bdb3f216e864e20eb2b09352707a751a5cf7460f/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/DefaultOperationQuota.java#L192-L193],
> deliberately, which can let large requests do targeted oversubscription of
> an IO quota:
> {code:java}
> // assume 1 block required for reads. this is probably a low estimate, which is okay
> readConsumed = numReads > 0 ? blockSizeBytes : 0;
> {code}
> This is okay because the Limiter's availability will go negative and force a
> longer backoff on subsequent requests. I believe this is preferable UX
> compared to a doomed throttling loop.
>
> In my opinion, we should do something similar in batch request estimation, by
> estimating a batch request's workload at {{Math.min(batchSize, limiterMaxThroughput)}}
> rather than simply {{batchSize}}.
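The clamp proposed in the last paragraph can be sketched as follows (a minimal illustration with hypothetical names, not the actual DefaultOperationQuota change):

```java
// Minimal sketch of the HBASE-28672 proposal -- not the real
// DefaultOperationQuota code; class and method names are hypothetical.
// Estimating a batch at min(batchSize, limiterMaxThroughput) means the
// estimate can never exceed what the limiter could ever grant in one
// interval, so the request is admitted, the limiter's availability goes
// negative, and subsequent requests back off longer -- instead of the
// batch failing estimation forever.
public class BatchQuotaEstimate {
    /** Workload estimate used when checking a batch against a TimeLimiter. */
    public static long estimateBatchWorkload(long batchSize, long limiterMaxThroughput) {
        return Math.min(batchSize, limiterMaxThroughput);
    }

    public static void main(String[] args) {
        // A 10,000-op batch against a 2,000 ops/interval limiter: the naive
        // estimate (10,000) could never be granted; the clamped one can.
        System.out.println(estimateBatchWorkload(10_000, 2_000)); // prints 2000
    }
}
```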
[jira] [Resolved] (HBASE-28364) Warn: Cache key had block type null, but was found in L1 cache
[ https://issues.apache.org/jira/browse/HBASE-28364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wellington Chevreuil resolved HBASE-28364.
------------------------------------------
    Resolution: Fixed

Merged to 2.6 and 2.5 branches.

> Warn: Cache key had block type null, but was found in L1 cache
> ---------------------------------------------------------------
>                 Key: HBASE-28364
>                 URL: https://issues.apache.org/jira/browse/HBASE-28364
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 2.4.18, 2.5.9
>            Reporter: Bryan Beaudreault
>            Assignee: Nikita Pande
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.6.1, 2.5.10
>
> I'm ITBLL testing branch-2.6 and am seeing lots of these warns. This is new
> to me. I would expect a warn to be on the rare side or be indicative of a
> problem, but unclear from the code.
>
> cc [~wchevreuil]
[jira] [Created] (HBASE-28722) Should wipe out all the output directories in the 'init health results' stage in nightly job
Duo Zhang created HBASE-28722:
---------------------------------

             Summary: Should wipe out all the output directories in the 'init health results' stage in nightly job
                 Key: HBASE-28722
                 URL: https://issues.apache.org/jira/browse/HBASE-28722
             Project: HBase
          Issue Type: Bug
          Components: jenkins, scripts
            Reporter: Duo Zhang

For master and branch-3, we do not have jdk8 and jdk11 stages, but we can still see comments on Jira which include those stages' results.

I think the problem is that, in the 'init health results' stage, we want to stash some empty results, but there are build results left over from previous builds, so we actually stash non-empty results. We should wipe out these directories before stashing them.
[jira] [Resolved] (HBASE-28668) Add documentation about specifying connection uri in replication and map reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-28668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-28668.
-------------------------------
    Fix Version/s: 4.0.0-alpha-1
     Hadoop Flags: Reviewed
       Resolution: Fixed

Merged to master. Thanks [~bbeaudreault] and [~ndimiduk] for reviewing!

> Add documentation about specifying connection uri in replication and map reduce jobs
> -------------------------------------------------------------------------------------
>                 Key: HBASE-28668
>                 URL: https://issues.apache.org/jira/browse/HBASE-28668
>             Project: HBase
>          Issue Type: Task
>          Components: documentation, mapreduce, Replication
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-1
[jira] [Resolved] (HBASE-28714) Hadoop check for hadoop 3.4.0 is failing
[ https://issues.apache.org/jira/browse/HBASE-28714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-28714.
-------------------------------
     Hadoop Flags: Reviewed
       Resolution: Fixed

Pushed to all active branches. Thanks [~ndimiduk] for reviewing!

> Hadoop check for hadoop 3.4.0 is failing
> -----------------------------------------
>                 Key: HBASE-28714
>                 URL: https://issues.apache.org/jira/browse/HBASE-28714
>             Project: HBase
>          Issue Type: Bug
>          Components: dependencies, hadoop3
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.10
>
> In hadoop 3.4.0, hadoop-common depends on org.bouncycastle:bcprov-jdk15on,
> while we have an enforcer rule that forces depending on
> org.bouncycastle:*-jdk18on. We should exclude org.bouncycastle:bcprov-jdk15on
> from the hadoop dependencies.
>
> And also remove direct references to protobuf 2.5 in our asyncfs code.
[jira] [Created] (HBASE-28721) AsyncFSWAL is broken when running against hadoop 3.4.0
Duo Zhang created HBASE-28721:
---------------------------------

             Summary: AsyncFSWAL is broken when running against hadoop 3.4.0
                 Key: HBASE-28721
                 URL: https://issues.apache.org/jira/browse/HBASE-28721
             Project: HBase
          Issue Type: Bug
          Components: hadoop3, wal
            Reporter: Duo Zhang

{noformat}
2024-07-10T10:09:33,161 ERROR [master/localhost:0:becomeActiveMaster {}] asyncfs.FanOutOneBlockAsyncDFSOutputHelper(258): Couldn't properly initialize access to HDFS internals. Please update your WAL Provider to not make use of the 'asyncfs' provider. See HBASE-16110 for more information.
java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.DFSClient.beginFileLease(long,org.apache.hadoop.hdfs.DFSOutputStream)
  at java.lang.Class.getDeclaredMethod(Class.java:2675) ~[?:?]
  at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createLeaseManager(FanOutOneBlockAsyncDFSOutputHelper.java:175) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
  at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.<clinit>(FanOutOneBlockAsyncDFSOutputHelper.java:252) ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
  at java.lang.Class.forName0(Native Method) ~[?:?]
  at java.lang.Class.forName(Class.java:375) ~[?:?]
  at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.load(AsyncFSWALProvider.java:149) ~[classes/:?]
  at org.apache.hadoop.hbase.wal.WALFactory.getProviderClass(WALFactory.java:174) ~[classes/:?]
  at org.apache.hadoop.hbase.wal.WALFactory.<init>(WALFactory.java:262) ~[classes/:?]
  at org.apache.hadoop.hbase.wal.WALFactory.<init>(WALFactory.java:231) ~[classes/:?]
  at org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:383) ~[classes/:?]
  at org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:135) ~[classes/:?]
  at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1003) ~[classes/:?]
  at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) ~[classes/:?]
  at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) ~[classes/:?]
  at org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) ~[hbase-common-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
  at java.lang.Thread.run(Thread.java:840) ~[?:?]
{noformat}
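The failure mode here is a plain reflective lookup miss: Class.getDeclaredMethod throws NoSuchMethodException when the private signature asyncfs expects no longer exists in the Hadoop version on the classpath. A sketch using a stand-in class (hypothetical; not the actual FanOutOneBlockAsyncDFSOutputHelper code):

```java
import java.lang.reflect.Method;

// Illustrative sketch of the failure mode, using a stand-in class instead of
// the real org.apache.hadoop.hdfs.DFSClient. getDeclaredMethod throws
// NoSuchMethodException when a private method's signature changes between
// releases -- which is how asyncfs ends up printing the "Couldn't properly
// initialize access to HDFS internals" error above.
public class LeaseMethodLookup {
    static class FakeDfsClient {
        // imagine the upstream signature changed and the (long, stream)
        // overload no longer exists
        @SuppressWarnings("unused")
        private void beginFileLease(String key, Object stream) {}
    }

    /** Returns the method if present, or null when the signature is gone. */
    public static Method findBeginFileLease(Class<?> clientClass) {
        try {
            return clientClass.getDeclaredMethod("beginFileLease", long.class, Object.class);
        } catch (NoSuchMethodException e) {
            return null; // signature mismatch: caller must fall back or fail loudly
        }
    }

    public static void main(String[] args) {
        System.out.println(findBeginFileLease(FakeDfsClient.class)); // prints null
    }
}
```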