[jira] [Updated] (HBASE-28621) PrefixFilter should use SEEK_NEXT_USING_HINT
     [ https://issues.apache.org/jira/browse/HBASE-28621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28621:
-----------------------------------
    Labels: beginner beginner-friendly pull-request-available  (was: beginner beginner-friendly)

> PrefixFilter should use SEEK_NEXT_USING_HINT
> --------------------------------------------
>
>     Key: HBASE-28621
>     URL: https://issues.apache.org/jira/browse/HBASE-28621
>     Project: HBase
>     Issue Type: Improvement
>     Components: Filters
>     Reporter: Istvan Toth
>     Assignee: Dávid Paksy
>     Priority: Major
>     Labels: beginner, beginner-friendly, pull-request-available
>
> Looking at PrefixFilter, I have noticed that it doesn't use the
> SEEK_NEXT_USING_HINT mechanism.
> AFAICT, we could safely set the prefix as the next row hint, which could be
> a huge performance win.
> Of course, ideally the user would set the scan startRow to the prefix, which
> avoids the problem, but the user may forget to do that, or may use the filter
> in a FilterList that doesn't allow for setting the start/stop rows close to
> the prefix.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
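[Editor's note] The seek-hint idea above can be sketched outside HBase: instead of the filter rejecting every row that sorts before the prefix one at a time, the scanner jumps straight to the first row >= prefix. A toy Python sketch of the logic (not HBase's Filter API; all names are illustrative):

```python
from bisect import bisect_left

def scan_with_prefix_hint(sorted_rows, prefix):
    # One seek to the first row >= prefix replaces a filter call per
    # preceding row (the SEEK_NEXT_USING_HINT idea).
    start = bisect_left(sorted_rows, prefix)
    matches = []
    for row in sorted_rows[start:]:
        if not row.startswith(prefix):
            break  # rows are sorted, so nothing later can match
        matches.append(row)
    return matches

rows = ["aaa1", "abc1", "abc2", "abd9", "zzz0"]
print(scan_with_prefix_hint(rows, "abc"))  # ['abc1', 'abc2']
```

Without the hint, a scan starting at the beginning of the table evaluates the filter against every row before the prefix; with it, those rows are skipped in a single seek.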
[jira] [Updated] (HBASE-28900) Avoid resetting bucket cache during restart if inconsistency is observed for some blocks.
     [ https://issues.apache.org/jira/browse/HBASE-28900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28900:
-----------------------------------
    Labels: pull-request-available  (was: )

> Avoid resetting bucket cache during restart if inconsistency is observed for
> some blocks.
> -----------------------------------------------------------------------------
>
>     Key: HBASE-28900
>     URL: https://issues.apache.org/jira/browse/HBASE-28900
>     Project: HBase
>     Issue Type: Bug
>     Components: BucketCache
>     Reporter: Janardhan Hungund
>     Assignee: Janardhan Hungund
>     Priority: Major
>     Labels: pull-request-available
>
> While the backing map is being persisted to the persistence file, it is not
> guarded by a lock against concurrent block caching and block evictions.
> Hence, some of the block entries in the persisted backing map may not be
> consistent with the bucket cache.
> During the retrieval of the backing map from persistence, if an inconsistency
> is detected, the complete bucket cache is discarded and rebuilt.
> One of the errors seen is shown below:
> {code:java}
> 2024-09-30 08:58:33,840 WARN org.apache.hadoop.hbase.io.hfile.bucket.BucketCache: Can't restore from file[/hadoopfs/ephfs1/bucketcache.map]. The bucket cache will be reset and rebuilt. Exception seen:
> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException: Couldn't find match for index 26 in free list
>     at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$Bucket.addAllocation(BucketAllocator.java:140)
>     at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.<init>(BucketAllocator.java:406)
>     at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.retrieveFromFile(BucketCache.java:1486)
>     at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.lambda$startPersistenceRetriever$0(BucketCache.java:377)
>     at java.base/java.lang.Thread.run(Thread.java:840)
> {code}
> This retrieval can be optimised to discard only the inconsistent entries in
> the persisted backing map and retain the remaining entries. The bucket cache
> validator will throw away the inconsistent entries from the backing map.
> Thanks,
> Janardhan
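[Editor's note] The proposed optimisation can be sketched generically: validate each persisted entry and drop only those that disagree with the cache, rather than resetting the whole cache on the first bad entry. A hedged Python sketch (function and key names are hypothetical, not the BucketCache API):

```python
def prune_inconsistent(backing_map, is_consistent):
    # Keep only the persisted entries that still validate against the
    # cache; report which keys were discarded.
    kept = {k: v for k, v in backing_map.items() if is_consistent(k, v)}
    dropped = [k for k in backing_map if k not in kept]
    return kept, dropped

# Toy validation rule for illustration: an entry is "consistent" when
# its recorded offset is even.
persisted = {"blk1": 0, "blk2": 3, "blk3": 8}
kept, dropped = prune_inconsistent(persisted, lambda k, off: off % 2 == 0)
print(kept, dropped)  # {'blk1': 0, 'blk3': 8} ['blk2']
```

The point is that one allocator mismatch costs a single entry, not the entire warmed cache.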
[jira] [Updated] (HBASE-28898) Use reflection to access recoverLease(), setSafeMode() APIs.
     [ https://issues.apache.org/jira/browse/HBASE-28898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28898:
-----------------------------------
    Labels: pull-request-available  (was: )

> Use reflection to access recoverLease(), setSafeMode() APIs.
> ------------------------------------------------------------
>
>     Key: HBASE-28898
>     URL: https://issues.apache.org/jira/browse/HBASE-28898
>     Project: HBase
>     Issue Type: Task
>     Components: Filesystem Integration
>     Reporter: Wei-Chiu Chuang
>     Assignee: Wei-Chiu Chuang
>     Priority: Major
>     Labels: pull-request-available
>     Fix For: 2.7.0
>
> HBASE-27769 used the new Hadoop API (available since Hadoop 3.3.6/3.4.0) to
> access the recoverLease() and setSafeMode() APIs, and committed the change in
> a feature branch.
> However, until we move beyond Hadoop 3.3.6, we cannot use them directly.
> I'd like to propose using reflection to access these APIs in the interim so
> HBase can use Ozone sooner.
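[Editor's note] In Python terms, the interim approach amounts to runtime feature detection with a fallback; HBase itself would go through java.lang.reflect to probe Hadoop's classes. A sketch with made-up stand-in classes:

```python
class LegacyFs:
    """Stands in for a filesystem from a Hadoop build without the new API."""

class ModernFs:
    """Stands in for a filesystem from Hadoop >= 3.3.6."""
    def recoverLease(self, path):
        return f"recovered {path}"

def recover_lease(fs, path):
    # Probe for the new API at runtime (the reflection idea) and fall
    # back to the old behaviour when it is absent.
    fn = getattr(fs, "recoverLease", None)
    if callable(fn):
        return fn(path)
    return f"legacy recovery for {path}"

print(recover_lease(ModernFs(), "/wal/1"))  # recovered /wal/1
print(recover_lease(LegacyFs(), "/wal/1"))  # legacy recovery for /wal/1
```

This lets one binary run against both old and new Hadoop versions, at the cost of a lookup that the real code would typically cache.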
[jira] [Updated] (HBASE-28906) Run nightly tests with multiple Hadoop 3 versions
     [ https://issues.apache.org/jira/browse/HBASE-28906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28906:
-----------------------------------
    Labels: pull-request-available  (was: )

> Run nightly tests with multiple Hadoop 3 versions
> -------------------------------------------------
>
>     Key: HBASE-28906
>     URL: https://issues.apache.org/jira/browse/HBASE-28906
>     Project: HBase
>     Issue Type: Sub-task
>     Components: integration tests, test
>     Reporter: Istvan Toth
>     Assignee: Istvan Toth
>     Priority: Major
>     Labels: pull-request-available
[jira] [Updated] (HBASE-28901) checkcompatibility.py can run maven commands with parallelism
     [ https://issues.apache.org/jira/browse/HBASE-28901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28901:
-----------------------------------
    Labels: pull-request-available  (was: )

> checkcompatibility.py can run maven commands with parallelism
> -------------------------------------------------------------
>
>     Key: HBASE-28901
>     URL: https://issues.apache.org/jira/browse/HBASE-28901
>     Project: HBase
>     Issue Type: Task
>     Components: create-release
>     Reporter: Nick Dimiduk
>     Assignee: Nick Dimiduk
>     Priority: Major
>     Labels: pull-request-available
>     Fix For: 4.0.0-alpha-1
>
> We can speed up the create-release process by taking advantage of Maven
> parallelism during creation of the API compatibility report.
[jira] [Updated] (HBASE-28903) Incremental backup test missing explicit test for bulkloads
     [ https://issues.apache.org/jira/browse/HBASE-28903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28903:
-----------------------------------
    Labels: pull-request-available  (was: )

> Incremental backup test missing explicit test for bulkloads
> -----------------------------------------------------------
>
>     Key: HBASE-28903
>     URL: https://issues.apache.org/jira/browse/HBASE-28903
>     Project: HBase
>     Issue Type: Improvement
>     Reporter: Hernan Gelaf-Romer
>     Priority: Major
>     Labels: pull-request-available
>
> Our incremental backup tests don't explicitly test our ability to back up and
> restore bulkloads. It'd be nice to have this to verify bulkloads work in the
> context of the backup/restore flow, and to avoid regressions in the future.
[jira] [Updated] (HBASE-28905) Skip excessive evaluations of LINK_NAME_PATTERN and REF_NAME_PATTERN regular expressions
     [ https://issues.apache.org/jira/browse/HBASE-28905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28905:
-----------------------------------
    Labels: pull-request-available  (was: )

> Skip excessive evaluations of LINK_NAME_PATTERN and REF_NAME_PATTERN regular
> expressions
> -----------------------------------------------------------------------------
>
>     Key: HBASE-28905
>     URL: https://issues.apache.org/jira/browse/HBASE-28905
>     Project: HBase
>     Issue Type: Improvement
>     Affects Versions: 2.6.0, 3.0.0-beta-1, 2.7.0
>     Reporter: Charles Connell
>     Assignee: Charles Connell
>     Priority: Minor
>     Labels: pull-request-available
>     Attachments: cpu_time_flamegraph_2.6.0.html,
>         cpu_time_flamegraph_with_optimization.html,
>         performance_test_query_latency_2.6.0.png,
>         performance_test_query_latency_with_optimization.png
>
> To test if a file is a link file, HBase checks if its file name matches the
> regex
> {code:java}
> ^(?:((?:[_\p{Digit}\p{IsAlphabetic}]+))(?:\=))?((?:[_\p{Digit}\p{IsAlphabetic}][-_.\p{Digit}\p{IsAlphabetic}]*))=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$
> {code}
> To test if an HFile has a "reference name," HBase checks if its file name
> matches the regex
> {code:java}
> ^([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?|^(?:((?:[_\p{Digit}\p{IsAlphabetic}]+))(?:\=))?((?:[_\p{Digit}\p{IsAlphabetic}][-_.\p{Digit}\p{IsAlphabetic}]*))=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$)\.(.+)$
> {code}
> Matching against these big regexes is computationally expensive. HBASE-27474
> introduced (in 2.6.0) [code in a hot path|https://github.com/apache/hbase/blob/1602c531b245b4d455b48161757cde2ec3d1930b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java#L1716]
> in {{HFileReaderImpl}} that checks whether an HFile is a link or reference
> file while deciding whether to cache blocks from that file. In flamegraphs
> taken at my company during performance tests, these regex evaluations took
> 2-3% of the CPU time on a busy RegionServer.
> Later, the hot-path invocation of the regexes was removed by HBASE-28596 in
> branch-2 and later, but not in branch-2.6, so only the 2.6.x series suffers
> the performance regression. Nonetheless, all invocations of these regexes are
> still unnecessarily expensive and can be fast-failed easily.
> The link name pattern contains a literal "=", so any string that does not
> contain a "=" can be assumed not to match the regex. The reference name
> pattern contains a literal ".", so any string that does not contain a "." can
> be assumed not to match the regex. This optimization is mostly helpful in
> 2.6.x, but is valid in all branches.
> Running performance tests of this optimization removed the regex evaluations
> from my flamegraphs entirely, and reduced query latency by 15%. Some charts
> are attached.
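[Editor's note] The fast-fail trick described above can be illustrated with deliberately simplified stand-ins for the two patterns (the real ones are quoted in the message; only the guard logic is the point here):

```python
import re

# Simplified stand-ins for LINK_NAME_PATTERN / REF_NAME_PATTERN.
LINK_NAME_PATTERN = re.compile(r"^(?:\w+=)?[\w][-_.\w]*=[a-f0-9]+-[0-9a-f]+$")
REF_NAME_PATTERN = re.compile(r"^[0-9a-f]+\.(.+)$")

def is_hfile_link(name):
    # The full pattern always contains a literal '=', so a name without
    # one can never match: skip the expensive regex entirely.
    return "=" in name and LINK_NAME_PATTERN.match(name) is not None

def is_reference(name):
    # Likewise, the reference pattern requires a literal '.'.
    return "." in name and REF_NAME_PATTERN.match(name) is not None

# The common case, a plain hfile name, now fails in a substring scan:
print(is_hfile_link("d41d8cd98f00b204e9800998ecf8427e"))  # False
print(is_reference("d41d8cd98f00b204e9800998ecf8427e"))   # False
```

Names that do contain the required literal still go through the full regex, so the guard never changes the result, only the cost of the common negative case.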
[jira] [Updated] (HBASE-28897) Incremental backups can be taken with incompatible column families
     [ https://issues.apache.org/jira/browse/HBASE-28897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28897:
-----------------------------------
    Labels: pull-request-available  (was: )

> Incremental backups can be taken with incompatible column families
> ------------------------------------------------------------------
>
>     Key: HBASE-28897
>     URL: https://issues.apache.org/jira/browse/HBASE-28897
>     Project: HBase
>     Issue Type: Bug
>     Components: backup&restore
>     Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1
>     Reporter: Hernan Gelaf-Romer
>     Assignee: Hernan Gelaf-Romer
>     Priority: Major
>     Labels: pull-request-available
>
> Incremental backups can be taken even if the table descriptor of the current
> table does not match the column families of the full backup for that same
> table. When restoring the table, we choose to use the families of the full
> backup. This can cause the restore process to fail if we add a column family
> in the incremental backup that doesn't exist in the full backup. The bulkload
> process will fail because it is trying to write column families that don't
> exist in the restore table.
> I think the correct solution here is to prevent incremental backups from
> being taken if the families of the current table don't match those of the
> full backup. This will force users to instead take a full backup.
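[Editor's note] The proposed guard reduces to a set comparison between the column families recorded in the full backup and those on the current table. A minimal Python sketch (names hypothetical, not the backup client API):

```python
def incremental_backup_allowed(full_backup_families, current_families):
    # Proposed guard: only allow an incremental backup when the table's
    # current column families exactly match the full backup's; any
    # drift forces the user to take a fresh full backup instead.
    return set(full_backup_families) == set(current_families)

print(incremental_backup_allowed({"cf1"}, {"cf1"}))         # True
print(incremental_backup_allowed({"cf1"}, {"cf1", "cf2"}))  # False
```

Requiring exact equality (rather than a subset check) also rejects dropped families, whose data would silently vanish from a restore.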
[jira] [Updated] (HBASE-28894) NPE on TestPrefetch.testPrefetchWithDelay
     [ https://issues.apache.org/jira/browse/HBASE-28894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28894:
-----------------------------------
    Labels: pull-request-available  (was: )

> NPE on TestPrefetch.testPrefetchWithDelay
> -----------------------------------------
>
>     Key: HBASE-28894
>     URL: https://issues.apache.org/jira/browse/HBASE-28894
>     Project: HBase
>     Issue Type: Bug
>     Reporter: Wellington Chevreuil
>     Assignee: Wellington Chevreuil
>     Priority: Major
>     Labels: pull-request-available
>
> I'm seeing some failures on TestPrefetch.testPrefetchWithDelay in some
> pre-commit runs. I believe this is due to a race condition in
> PrefetchExecutor.loadConfiguration.
> In these failures, it seems we are getting the NPE below:
> {noformat}
> Stacktrace
> java.lang.NullPointerException
>     at java.util.concurrent.ConcurrentSkipListMap.put(ConcurrentSkipListMap.java:1580)
>     at org.apache.hadoop.hbase.io.hfile.PrefetchExecutor.request(PrefetchExecutor.java:108)
>     at org.apache.hadoop.hbase.io.hfile.PrefetchExecutor.lambda$loadConfiguration$0(PrefetchExecutor.java:206)
>     at java.util.concurrent.ConcurrentSkipListMap.forEach(ConcurrentSkipListMap.java:3269)
>     at org.apache.hadoop.hbase.io.hfile.PrefetchExecutor.loadConfiguration(PrefetchExecutor.java:200)
>     at org.apache.hadoop.hbase.regionserver.PrefetchExecutorNotifier.onConfigurationChange(PrefetchExecutorNotifier.java:51)
>     at org.apache.hadoop.hbase.io.hfile.TestPrefetch.testPrefetchWithDelay(TestPrefetch.java:378)
> {noformat}
> I think this is because we are completing prefetch in this test before the
> induced delay; the test then triggers a new configuration change, but the
> prefetch thread calls PrefetchExecutor.complete just before the test thread
> reaches [this point|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/PrefetchExecutor.java#L206]:
> {noformat}
> 2024-10-01T11:28:10,660 DEBUG [Time-limited test {}] hfile.PrefetchExecutor(102): Prefetch requested for /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-6328/yetus-jdk8-hadoop2-check/src/hbase-server/target/test-data/b646497b-7616-6533-e8cb-98e5c9e2e083/TestPrefetchWithDelay/71eefdb271ae4f65b694a6ec3d4287a0, delay=25000 ms
> 2024-10-01T11:28:30,668 INFO [Time-limited test {}] hbase.Waiter(181): Waiting up to [10,000] milli-secs(wait.for.ratio=[1])
> 2024-10-01T11:28:35,661 DEBUG [hfile-prefetch-1727782088576 {}] hfile.HFilePreadReader$1(103): No entry in the backing map for cache key 71eefdb271ae4f65b694a6ec3d4287a0_0.
> ...
> 2024-10-01T11:28:35,673 DEBUG [hfile-prefetch-1727782088576 {}] hfile.HFilePreadReader$1(103): No entry in the backing map for cache key 71eefdb271ae4f65b694a6ec3d4287a0_52849.
> 2024-10-01T11:28:35,674 DEBUG [Time-limited test {}] hfile.PrefetchExecutor(142): Prefetch cancelled for /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-6328/yetus-jdk8-hadoop2-check/src/hbase-server/target/test-data/b646497b-7616-6533-e8cb-98e5c9e2e083/TestPrefetchWithDelay/71eefdb271ae4f65b694a6ec3d4287a0
> 2024-10-01T11:28:35,674 DEBUG [hfile-prefetch-1727782088576 {}] hfile.PrefetchExecutor(121): Prefetch completed for 71eefdb271ae4f65b694a6ec3d4287a0
> 2024-10-01T11:28:35,674 DEBUG [Time-limited test {}] hfile.PrefetchExecutor(102): Prefetch requested for /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-6328/yetus-jdk8-hadoop2-check/src/hbase-server/target/test-data/b646497b-7616-6533-e8cb-98e5c9e2e083/TestPrefetchWithDelay/71eefdb271ae4f65b694a6ec3d4287a0, delay=991 ms
> ...
> {noformat}
> CC: [~kabhishek4]
[jira] [Updated] (HBASE-28890) RefCnt Leak error when caching index blocks at write time
     [ https://issues.apache.org/jira/browse/HBASE-28890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28890:
-----------------------------------
    Labels: pull-request-available  (was: )

> RefCnt Leak error when caching index blocks at write time
> ---------------------------------------------------------
>
>     Key: HBASE-28890
>     URL: https://issues.apache.org/jira/browse/HBASE-28890
>     Project: HBase
>     Issue Type: Bug
>     Reporter: Wellington Chevreuil
>     Assignee: Wellington Chevreuil
>     Priority: Major
>     Labels: pull-request-available
>
> Following [~bbeaudreault]'s work from HBASE-27170, which added the (very
> useful) refcount leak detector, we sometimes see these reports on some
> branch-2 based deployments:
> {noformat}
> 2024-09-25 10:06:42,413 ERROR org.apache.hbase.thirdparty.io.netty.util.ResourceLeakDetector: LEAK: RefCnt.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
> Recent access records:
> Created at:
>     org.apache.hadoop.hbase.nio.RefCnt.<init>(RefCnt.java:59)
>     org.apache.hadoop.hbase.nio.RefCnt.create(RefCnt.java:54)
>     org.apache.hadoop.hbase.nio.ByteBuff.wrap(ByteBuff.java:550)
>     org.apache.hadoop.hbase.io.ByteBuffAllocator.allocate(ByteBuffAllocator.java:357)
>     org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.cloneUncompressedBufferWithHeader(HFileBlock.java:1153)
>     org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.getBlockForCaching(HFileBlock.java:1215)
>     org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.lambda$writeIndexBlocks$0(HFileBlockIndex.java:997)
>     java.base/java.util.Optional.ifPresent(Optional.java:178)
>     org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:996)
>     org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:635)
>     org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:378)
>     org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
>     org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74)
>     org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:831)
>     org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2033)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2878)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2620)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2592)
>     org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2462)
>     org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:602)
>     org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:572)
>     org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:65)
>     org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:344)
> {noformat}
> It turns out that we always convert the block to an "on-heap" one inside
> LruBlockCache.cacheBlock, so when the index block is a SharedMemHFileBlock,
> the blockForCaching instance in the code
> [here|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java#L1076]
> becomes eligible for GC without releasing buffers/decreasing the refcount
> (leak), right after we return from the BlockIndexWriter.writeIndexBlocks call.
[jira] [Updated] (HBASE-28733) Publish API docs for 2.6
     [ https://issues.apache.org/jira/browse/HBASE-28733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28733:
-----------------------------------
    Labels: pull-request-available  (was: )

> Publish API docs for 2.6
> ------------------------
>
>     Key: HBASE-28733
>     URL: https://issues.apache.org/jira/browse/HBASE-28733
>     Project: HBase
>     Issue Type: Task
>     Components: community, documentation
>     Reporter: Nick Dimiduk
>     Assignee: Dávid Paksy
>     Priority: Major
>     Labels: pull-request-available
>
> We have released 2.6 but the website has not been updated with the new API
> docs.
[jira] [Updated] (HBASE-28888) Backport "HBASE-18382 [Thrift] Add transport type info to info server" to branch-2
     [ https://issues.apache.org/jira/browse/HBASE-28888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28888:
-----------------------------------
    Labels: beginner pull-request-available  (was: beginner)

> Backport "HBASE-18382 [Thrift] Add transport type info to info server" to
> branch-2
> --------------------------------------------------------------------------
>
>     Key: HBASE-28888
>     URL: https://issues.apache.org/jira/browse/HBASE-28888
>     Project: HBase
>     Issue Type: Improvement
>     Components: Thrift
>     Reporter: Lars George
>     Assignee: Nihal Jain
>     Priority: Minor
>     Labels: beginner, pull-request-available
>     Fix For: 3.0.0-alpha-1
>
> It would be really helpful to know if the Thrift server was started using the
> HTTP or binary transport. Any additional info, like QOP settings for SASL
> etc. would be great too. Right now the UI is very limited and shows
> {{true/false}} for, for example, {{Compact Transport}}. I'd suggest changing
> this to show something more useful, like this:
> {noformat}
> Thrift Impl Type: non-blocking
> Protocol: Binary
> Transport: Framed
> QOP: Authentication & Confidential
> {noformat}
> or
> {noformat}
> Protocol: Binary + HTTP
> Transport: Standard
> QOP: none
> {noformat}
[jira] [Updated] (HBASE-28887) Fix broken link to mailing lists page
     [ https://issues.apache.org/jira/browse/HBASE-28887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28887:
-----------------------------------
    Labels: pull-request-available  (was: )

> Fix broken link to mailing lists page
> -------------------------------------
>
>     Key: HBASE-28887
>     URL: https://issues.apache.org/jira/browse/HBASE-28887
>     Project: HBase
>     Issue Type: Task
>     Components: documentation
>     Affects Versions: 4.0.0-alpha-1
>     Reporter: Dávid Paksy
>     Priority: Minor
>     Labels: pull-request-available
>
> The Reference Guide (book) link to the mailing lists page is broken.
[jira] [Updated] (HBASE-28866) Setting `hbase.oldwals.cleaner.thread.size` to negative value will break HMaster and produce hard-to-diagnose logs
     [ https://issues.apache.org/jira/browse/HBASE-28866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28866:
-----------------------------------
    Labels: pull-request-available  (was: )

> Setting `hbase.oldwals.cleaner.thread.size` to negative value will break
> HMaster and produce hard-to-diagnose logs
> -------------------------------------------------------------------------
>
>     Key: HBASE-28866
>     URL: https://issues.apache.org/jira/browse/HBASE-28866
>     Project: HBase
>     Issue Type: Bug
>     Components: master
>     Affects Versions: 2.4.2, 3.0.0-beta-1
>     Reporter: Ariadne_team
>     Priority: Critical
>     Labels: pull-request-available
>     Fix For: 3.0.0-beta-2
>     Attachments: HBASE-28866-000-1.patch, HBASE-28866-000.patch
>
> Problem
> -------
> HBase Master cannot be initialized with the following setting:
> {code:xml}
> <property>
>   <name>hbase.oldwals.cleaner.thread.size</name>
>   <value>-1</value>
>   <!-- Default is 2 -->
> </property>
> {code}
> After running start-hbase.sh, the Master node could not be started due to an
> exception:
> {code:java}
> ERROR [master/localhost:16000:becomeActiveMaster] master.HMaster: Failed to become active master
> java.lang.IllegalArgumentException: Illegal Capacity: -1
>     at java.util.ArrayList.<init>(ArrayList.java:157)
>     at org.apache.hadoop.hbase.master.cleaner.LogCleaner.createOldWalsCleaner(LogCleaner.java:149)
>     at org.apache.hadoop.hbase.master.cleaner.LogCleaner.<init>(LogCleaner.java:80)
>     at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1329)
>     at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:917)
>     at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2081)
>     at org.apache.hadoop.hbase.master.HMaster.lambda$0(HMaster.java:505)
>     at java.lang.Thread.run(Thread.java:750){code}
> We were really confused and misled by the error log, as the 'Illegal Capacity'
> of ArrayList seems like an internal code issue.
> After we read the source code, we found that
> "hbase.oldwals.cleaner.thread.size" is parsed and used in the
> createOldWalsCleaner() function without checking:
> {code:java}
> int size = conf.getInt(OLD_WALS_CLEANER_THREAD_SIZE, DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE);
> this.oldWALsCleaner = createOldWalsCleaner(size);
> {code}
> The value of "hbase.oldwals.cleaner.thread.size" is used as the
> initialCapacity of an ArrayList. If the configuration value is negative, an
> IllegalArgumentException will be thrown:
> {code:java}
> private List<Thread> createOldWalsCleaner(int size) {
>   ...
>   List<Thread> oldWALsCleaner = new ArrayList<>(size);
>   ...
> }
> {code}
> Solution (the attached patch)
> -----------------------------
> The basic idea of the attached patch is to add a check and relevant logging
> for this value during the initialization of the {{LogCleaner}} in the
> constructor. This will help users better diagnose the issue. The detailed
> patch is shown below.
> {code:java}
> @@ -78,6 +78,11 @@
>  public class LogCleaner extends CleanerChore<BaseLogCleanerDelegate>
>      pool, params, null);
>      this.pendingDelete = new LinkedBlockingQueue<>();
>      int size = conf.getInt(OLD_WALS_CLEANER_THREAD_SIZE, DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE);
> +    if (size <= 0) {
> +      LOG.warn("The size of old WALs cleaner thread is {}, which is invalid, "
> +        + "the default value will be used.", size);
> +      size = DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE;
> +    }
>      this.oldWALsCleaner = createOldWalsCleaner(size);
>      this.cleanerThreadTimeoutMsec = conf.getLong(OLD_WALS_CLEANER_THREAD_TIMEOUT_MSEC, DEFAULT_OLD_WALS_CLEANER_THREAD_TIMEOUT_MSEC);
> {code}
> Thanks!
[jira] [Updated] (HBASE-28884) SFT's BrokenStoreFileCleaner may cause data loss
     [ https://issues.apache.org/jira/browse/HBASE-28884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28884:
-----------------------------------
    Labels: pull-request-available  (was: )

> SFT's BrokenStoreFileCleaner may cause data loss
> ------------------------------------------------
>
>     Key: HBASE-28884
>     URL: https://issues.apache.org/jira/browse/HBASE-28884
>     Project: HBase
>     Issue Type: Bug
>     Reporter: Wellington Chevreuil
>     Assignee: Wellington Chevreuil
>     Priority: Major
>     Labels: pull-request-available
>
> With BrokenStoreFileCleaner enabled, one of our customers ran into a data
> loss situation, probably due to a race condition between regions getting
> moved out of the regionserver while the BrokenStoreFileCleaner was checking
> the region's files' eligibility for deletion. We have seen that the file got
> deleted by the given region server around the same time the region got closed
> on that region server. I believe a race condition during region close is
> possible here:
> 1) In BrokenStoreFileCleaner, for each region online on the given RS, we get
> the list of files in the store dirs, then iterate through it [1];
> 2) For each file listed, we perform several checks, including this one [2]
> that checks if the file is "active".
> The problem is, if the region for the file we are checking got closed between
> point #1 and #2, by the time we check if the file is active in [2], the store
> may have already been closed as part of the region closure, so this check
> would consider the file as deletable.
> One simple solution is to check if the store's region is still open before
> proceeding with deleting the file.
> [1] https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BrokenStoreFileCleaner.java#L99
> [2] https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BrokenStoreFileCleaner.java#L133
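[Editor's note] The "check-then-act" race and the proposed re-check can be sketched abstractly (names hypothetical; this is the shape of the fix, not the BrokenStoreFileCleaner code):

```python
def eligible_for_delete(is_file_active, region_still_open):
    # Proposed fix: re-check that the file's region is still open on
    # this server right before deleting. If the region has moved away,
    # an "inactive" answer may only mean the store was already closed,
    # not that the file is unused.
    if not region_still_open():
        return False  # region closed/moved: its files may be live elsewhere
    return not is_file_active()

print(eligible_for_delete(lambda: False, lambda: True))   # True
print(eligible_for_delete(lambda: False, lambda: False))  # False
```

Note this narrows the window rather than eliminating it; a full fix would need the activity check and the delete to happen under the same region-open guarantee.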
[jira] [Updated] (HBASE-28883) Manage hbase-thirdparty transitive dependencies via BOM pom
     [ https://issues.apache.org/jira/browse/HBASE-28883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28883:
-----------------------------------
    Labels: pull-request-available  (was: )

> Manage hbase-thirdparty transitive dependencies via BOM pom
> -----------------------------------------------------------
>
>     Key: HBASE-28883
>     URL: https://issues.apache.org/jira/browse/HBASE-28883
>     Project: HBase
>     Issue Type: Task
>     Components: build, thirdparty
>     Reporter: Nick Dimiduk
>     Priority: Major
>     Labels: pull-request-available
>
> Despite intentions to the contrary, there are several places where we need
> the version of a dependency managed in hbase-thirdparty to match an import in
> the main product (and maybe also in our other repos). Right now, this is
> managed via comments in the poms, which read "when this changes there, don't
> forget to update it here...". We can do better than this.
> I think that hbase-thirdparty could publish a BOM pom file that can be
> imported into any of the downstream hbase projects that make use of that
> release of hbase-thirdparty. That will centralize management of these
> dependencies in the hbase-thirdparty repo.
> This blog post has a nice write-up on the idea:
> https://www.garretwilson.com/blog/2023/06/14/improve-maven-bom-pattern
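[Editor's note] In a consumer pom, the BOM pattern described above would look roughly like this; the artifact coordinates are hypothetical, since hbase-thirdparty does not publish such a BOM yet:

```xml
<dependencyManagement>
  <dependencies>
    <!-- Hypothetical BOM import: one version declared here would pin
         every hbase-thirdparty-managed dependency downstream, replacing
         the "don't forget to update it here" comments. -->
    <dependency>
      <groupId>org.apache.hbase.thirdparty</groupId>
      <artifactId>hbase-thirdparty-bom</artifactId>
      <version>4.1.9</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

With `scope=import`, Maven merges the BOM's `dependencyManagement` section into the importing project, so downstream poms no longer declare those versions themselves.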
[jira] [Updated] (HBASE-28879) Bump hbase-thirdparty to 4.1.9
     [ https://issues.apache.org/jira/browse/HBASE-28879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28879:
-----------------------------------
    Labels: pull-request-available  (was: )

> Bump hbase-thirdparty to 4.1.9
> ------------------------------
>
>     Key: HBASE-28879
>     URL: https://issues.apache.org/jira/browse/HBASE-28879
>     Project: HBase
>     Issue Type: Task
>     Components: dependencies, thirdparty
>     Reporter: Duo Zhang
>     Assignee: Nick Dimiduk
>     Priority: Major
>     Labels: pull-request-available
>     Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1
[jira] [Updated] (HBASE-28882) Backup restores are broken if the backup has moved locations
     [ https://issues.apache.org/jira/browse/HBASE-28882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28882:
-----------------------------------
    Labels: pull-request-available  (was: )

> Backup restores are broken if the backup has moved locations
> ------------------------------------------------------------
>
>     Key: HBASE-28882
>     URL: https://issues.apache.org/jira/browse/HBASE-28882
>     Project: HBase
>     Issue Type: Bug
>     Affects Versions: 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0, 2.6.1
>     Reporter: Ray Mattingly
>     Assignee: Ray Mattingly
>     Priority: Major
>     Labels: pull-request-available
>
> My company runs a few hundred HBase clusters. We want to take backups every
> day in one public cloud region, and then use said cloud's native replication
> solution to "backup our backups" in a secondary region. This is how we plan
> for region-wide disaster recovery.
> This system should work, but doesn't, because of the way that BackupManifests
> are constructed.
> Backing up a bit (no pun intended): when we replicate backups verbatim, the
> manifest file continues to point to the original backup root. This shouldn't
> matter, because when taking a restore one passes a RestoreRequest to the
> RestoreTablesClient, and this RestoreRequest includes a BackupRootDir field.
> This works as you would expect initially, but eventually we build a
> BackupManifest that fails to interpolate this provided root directory and,
> instead, falls back to what it finds on disk in the backup (which would point
> back to the primary backup location, even if reading a replicated backup).
> To fix this, I'm proposing that we properly interpolate the request's root
> directory field when building BackupManifests.
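[Editor's note] The proposed interpolation boils down to rebasing paths recorded under the manifest's (stale) root onto the root supplied in the RestoreRequest. A minimal Python sketch (function and URI values are illustrative, not the backup API):

```python
def rebase_backup_path(path, manifest_root, requested_root):
    # Sketch of the proposed fix: prefer the RestoreRequest's root over
    # the root recorded in the (possibly replicated) manifest, so paths
    # resolve inside the copy actually being restored.
    if not path.startswith(manifest_root):
        raise ValueError("path does not live under the manifest root")
    return requested_root + path[len(manifest_root):]

print(rebase_backup_path(
    "s3://primary/backups/backup_1/t1",
    "s3://primary/backups",
    "s3://replica/backups"))  # s3://replica/backups/backup_1/t1
```

When the backup has not moved, `manifest_root` and `requested_root` coincide and the rebase is a no-op, which is why the bug only shows up for replicated copies.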
[jira] [Updated] (HBASE-28880) ParseException may occur when getting the fileDate of the mob file recovered through snapshot
     [ https://issues.apache.org/jira/browse/HBASE-28880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28880:
-----------------------------------
    Labels: pull-request-available  (was: )

> ParseException may occur when getting the fileDate of the mob file recovered
> through snapshot
> -----------------------------------------------------------------------------
>
>     Key: HBASE-28880
>     URL: https://issues.apache.org/jira/browse/HBASE-28880
>     Project: HBase
>     Issue Type: Bug
>     Components: mob
>     Affects Versions: 2.4.13
>     Environment: hbase2.4.13
>         centos
>     Reporter: guluo
>     Assignee: guluo
>     Priority: Major
>     Labels: pull-request-available
>
> The ExpiredMobFileCleaner task may hit a ParseException when parsing the name
> of a MOB file recovered through snapshot, causing these expired MOB files to
> never be deleted.
> The reason:
> The ExpiredMobFileCleaner task obtains a MOB file's creation time by parsing
> the MOB filename.
> In a regular MOB table, the 32nd to 40th characters of the MOB filename
> encode the file creation time, so ExpiredMobFileCleaner can get the creation
> time of a MOB file by reading those characters.
> However, in MOB tables recovered through snapshot, the MOB filename has the
> format tableName-mobregionname-hfilename, so ExpiredMobFileCleaner cannot
> obtain the creation time from the characters at the above position. In this
> situation a ParseException occurs, and these expired MOB files are never
> deleted.
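[Editor's note] The failure mode can be sketched in Python: fixed-offset date parsing works for a regular MOB file name but raises on a snapshot-restored name, the analogue of the ParseException. The exact offsets and the yyyyMMdd layout here are illustrative assumptions, not a restatement of HBase's MobFileName code:

```python
from datetime import datetime

def mob_file_date(file_name):
    # Assumed layout: a 32-char hash prefix followed by an 8-digit
    # yyyyMMdd creation date. A snapshot-restored name of the form
    # tableName-mobRegionName-hfileName has no date at that offset, so
    # strptime raises ValueError (mirroring the ParseException).
    return datetime.strptime(file_name[32:40], "%Y%m%d")

regular = "0" * 32 + "20241001" + "c0ffee"
print(mob_file_date(regular).date())  # 2024-10-01
```

The restored-name case is exactly why the cleaner needs either a tolerant parser or a different source of truth for the file's age.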
[jira] [Updated] (HBASE-26867) Introduce a FlushProcedure
[ https://issues.apache.org/jira/browse/HBASE-26867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-26867: --- Labels: pull-request-available (was: ) > Introduce a FlushProcedure > -- > > Key: HBASE-26867 > URL: https://issues.apache.org/jira/browse/HBASE-26867 > Project: HBase > Issue Type: New Feature > Components: proc-v2 >Reporter: ruanhui >Assignee: ruanhui >Priority: Minor > Labels: pull-request-available > Fix For: 2.6.0, 3.0.0-beta-1 > > > Reimplement proc-v1 based flush procedure in proc-v2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-27757) Clean up ScanMetrics API
[ https://issues.apache.org/jira/browse/HBASE-27757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-27757: --- Labels: pull-request-available (was: ) > Clean up ScanMetrics API > > > Key: HBASE-27757 > URL: https://issues.apache.org/jira/browse/HBASE-27757 > Project: HBase > Issue Type: Improvement >Reporter: Bryan Beaudreault >Assignee: Chandra Sekhar K >Priority: Major > Labels: pull-request-available > > The ScanMetrics object exposes public instance variables for all metrics. For example, ScanMetrics.countOfRPCcalls. This is not standard API design in Java or in HBase, and requires suppressing VisibilityModifier checkstyle warnings. > We should clean this up, but doing so would require a major version change since it's part of the public API. -- This message was sent by Atlassian Jira (v8.20.10#820010)
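A minimal sketch of the cleanup direction (hypothetical names, not the eventual HBase API): keep the counter private and expose a conventional getter instead of a public instance field, which also removes the need to suppress the VisibilityModifier checkstyle rule.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: a getter-based shape for a metric that today is exposed as a
// public field like ScanMetrics.countOfRPCcalls.
public class ScanMetricsSketch {
    private final AtomicLong countOfRpcCalls = new AtomicLong();

    /** Incremented internally by the scanner machinery. */
    public void addRpcCall() {
        countOfRpcCalls.incrementAndGet();
    }

    /** Standard accessor; callers no longer reach into the field. */
    public long getCountOfRpcCalls() {
        return countOfRpcCalls.get();
    }

    public static void main(String[] args) {
        ScanMetricsSketch metrics = new ScanMetricsSketch();
        metrics.addRpcCall();
        metrics.addRpcCall();
        System.out.println(metrics.getCountOfRpcCalls()); // prints 2
    }
}
```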
[jira] [Updated] (HBASE-28382) Support building hbase-connectors with JDK17
[ https://issues.apache.org/jira/browse/HBASE-28382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28382: --- Labels: pull-request-available (was: ) > Support building hbase-connectors with JDK17 > > > Key: HBASE-28382 > URL: https://issues.apache.org/jira/browse/HBASE-28382 > Project: HBase > Issue Type: Sub-task > Components: hbase-connectors, java >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28790) hbase-connectors fails to build with hbase 2.6.0
[ https://issues.apache.org/jira/browse/HBASE-28790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28790: --- Labels: pull-request-available (was: ) > hbase-connectors fails to build with hbase 2.6.0 > > > Key: HBASE-28790 > URL: https://issues.apache.org/jira/browse/HBASE-28790 > Project: HBase > Issue Type: Bug > Components: build, hbase-connectors >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > hbase-connectors fails to build with hbase 2.6.0 > {code:java} > [INFO] Reactor Summary for Apache HBase Connectors 1.1.0-SNAPSHOT: > [INFO] > [INFO] Apache HBase Connectors SUCCESS [ 4.377 > s] > [INFO] Apache HBase - Kafka ... SUCCESS [ 0.116 > s] > [INFO] Apache HBase - Model Objects for Kafka Proxy ... SUCCESS [ 3.222 > s] > [INFO] Apache HBase - Kafka Proxy . FAILURE [ 8.305 > s] > [INFO] Apache HBase - Spark ... SKIPPED > [INFO] Apache HBase - Spark Protocol .. SKIPPED > [INFO] Apache HBase - Spark Protocol (Shaded) . SKIPPED > [INFO] Apache HBase - Spark Connector . SKIPPED > [INFO] Apache HBase - Spark Integration Tests . SKIPPED > [INFO] Apache HBase Connectors - Assembly . SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 16.703 s > [INFO] Finished at: 2024-08-17T11:29:20Z > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile > (default-compile) on project hbase-kafka-proxy: Compilation failure > [ERROR] > /workspaces/hbase-connectors/kafka/hbase-kafka-proxy/src/main/java/org/apache/hadoop/hbase/kafka/KafkaBridgeConnection.java:[169,31] > is not > abstract and does not override abstract method > setRequestAttribute(java.lang.String,byte[]) in > org.apache.hadoop.hbase.client.TableBuilder {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
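The compile error above says the anonymous TableBuilder in KafkaBridgeConnection predates the abstract setRequestAttribute(String, byte[]) method added to the hbase-client interface. The shape of the fix can be sketched with simplified stand-in types (illustrative only, not the real org.apache.hadoop.hbase.client interfaces):

```java
public class TableBuilderSketch {
    // Simplified stand-in for the real TableBuilder interface, which gained
    // an abstract setRequestAttribute(String, byte[]) method in HBase 2.6.0.
    interface TableBuilder {
        TableBuilder setRequestAttribute(String key, byte[] value); // new in 2.6.0
        String build();
    }

    // An implementor written against the older interface fails to compile
    // (the error quoted above) until it also overrides the new method:
    public static TableBuilder newBuilder() {
        return new TableBuilder() {
            @Override
            public TableBuilder setRequestAttribute(String key, byte[] value) {
                return this; // a pass-through is enough to satisfy the contract
            }

            @Override
            public String build() {
                return "table";
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(newBuilder().setRequestAttribute("id", new byte[0]).build());
        // prints "table"
    }
}
```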
[jira] [Updated] (HBASE-28876) Should call ProcedureScheduler.completionCleanup for non-root procedure too
[ https://issues.apache.org/jira/browse/HBASE-28876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28876: --- Labels: pull-request-available (was: ) > Should call ProcedureScheduler.completionCleanup for non-root procedure too > -- > > Key: HBASE-28876 > URL: https://issues.apache.org/jira/browse/HBASE-28876 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Duo Zhang >Priority: Major > Labels: pull-request-available > > Per the discussion in this PR > https://github.com/apache/hbase/pull/6247 > and the related issue HBASE-28830, it seems incorrect that we only call cleanup for root procedures. > This issue aims to see if there are any issues if we call this method for every procedure. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28875) FSHlog closewrite closeErrorCount should increment for initial catch exception
[ https://issues.apache.org/jira/browse/HBASE-28875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28875: --- Labels: pull-request-available (was: ) > FSHlog closewrite closeErrorCount should increment for initial catch exception > -- > > Key: HBASE-28875 > URL: https://issues.apache.org/jira/browse/HBASE-28875 > Project: HBase > Issue Type: Bug > Components: wal >Affects Versions: 2.7.0 >Reporter: Y. SREENIVASULU REDDY >Assignee: Y. SREENIVASULU REDDY >Priority: Minor > Labels: pull-request-available > > During writer close for FSHlog, if any error occurs, the closeErrorCount counter needs to be incremented for the initial exception itself. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28868) Add missing permission check for updateRSGroupConfig in branch-2
[ https://issues.apache.org/jira/browse/HBASE-28868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28868: --- Labels: pull-request-available (was: ) > Add missing permission check for updateRSGroupConfig in branch-2 > > > Key: HBASE-28868 > URL: https://issues.apache.org/jira/browse/HBASE-28868 > Project: HBase > Issue Type: Task > Components: rsgroup >Affects Versions: 2.7.0 >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Minor > Labels: pull-request-available > > Found this during HBASE-28867: we do not have a security check for updateRSGroupConfig in branch-2. See > [https://github.com/apache/hbase/blob/0dc334f572329be7eb2455cec3519fc820c04c25/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminEndpoint.java#L450] > The same check exists in master: > [https://github.com/apache/hbase/blob/52082bc5b80a60406bfaaa630ed5cb23027436c1/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java#L2279] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28871) [hbase-thirdparty] Bump dependency versions before releasing
[ https://issues.apache.org/jira/browse/HBASE-28871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28871: --- Labels: pull-request-available (was: ) > [hbase-thirdparty] Bump dependency versions before releasing > > > Key: HBASE-28871 > URL: https://issues.apache.org/jira/browse/HBASE-28871 > Project: HBase > Issue Type: Sub-task > Components: dependencies, thirdparty >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28869) [hbase-thirdparty] Bump protobuf java to 4.27.5+
[ https://issues.apache.org/jira/browse/HBASE-28869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28869: --- Labels: pull-request-available (was: ) > [hbase-thirdparty] Bump protobuf java to 4.27.5+ > > > Key: HBASE-28869 > URL: https://issues.apache.org/jira/browse/HBASE-28869 > Project: HBase > Issue Type: Task > Components: Protobufs, security, thirdparty >Reporter: Duo Zhang >Priority: Major > Labels: pull-request-available > Fix For: thirdparty-4.1.9 > > > For addressing CVE-2024-7254 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28867) Backport "HBASE-20653 Add missing observer hooks for region server group to MasterObserver" to branch-2
[ https://issues.apache.org/jira/browse/HBASE-28867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28867: --- Labels: pull-request-available (was: ) > Backport "HBASE-20653 Add missing observer hooks for region server group to > MasterObserver" to branch-2 > --- > > Key: HBASE-28867 > URL: https://issues.apache.org/jira/browse/HBASE-28867 > Project: HBase > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0, 2.5.10 >Reporter: Ted Yu >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > Currently the following region server group operations don't have a corresponding hook in MasterObserver: > * getRSGroupInfo > * getRSGroupInfoOfServer > * getRSGroupInfoOfTable > * listRSGroup > This JIRA is to > * add them to MasterObserver > * add pre/post hook calls in RSGroupAdminEndpoint through master.getMasterCoprocessorHost for the above operations > * add corresponding tests to TestRSGroups (in a similar manner to that of HBASE-20627) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28864) NoMethodError undefined method assignment_expression?
[ https://issues.apache.org/jira/browse/HBASE-28864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28864: --- Labels: pull-request-available (was: ) > NoMethodError undefined method assignment_expression? > - > > Key: HBASE-28864 > URL: https://issues.apache.org/jira/browse/HBASE-28864 > Project: HBase > Issue Type: Bug > Components: shell >Affects Versions: 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0 >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2 > > > After HBASE-28250 (Bump jruby to 9.4.8.0 to fix snakeyaml CVE), the message "NoMethodError undefined method assignment_expression?" is printed after every command. > This is called from code copied from > https://github.com/ruby/irb/blob/v1.4.2/lib/irb.rb . The fix is to also copy > over the definition of `assignment_expression`. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28835) Make connector support for Decimal type
[ https://issues.apache.org/jira/browse/HBASE-28835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28835: --- Labels: pull-request-available (was: ) > Make connector support for Decimal type > --- > > Key: HBASE-28835 > URL: https://issues.apache.org/jira/browse/HBASE-28835 > Project: HBase > Issue Type: Improvement > Components: spark >Affects Versions: connector-1.0.0 >Reporter: yan.duan >Priority: Minor > Labels: pull-request-available > Fix For: connector-1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28862) Change the generic type for ObserverContext from 'RegionCoprocessorEnvironment' to '? extends RegionCoprocessorEnvironment' in RegionObserver
[ https://issues.apache.org/jira/browse/HBASE-28862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28862: --- Labels: pull-request-available (was: ) > Change the generic type for ObserverContext from > 'RegionCoprocessorEnvironment' to '? extends RegionCoprocessorEnvironment' in > RegionObserver > - > > Key: HBASE-28862 > URL: https://issues.apache.org/jira/browse/HBASE-28862 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors, regionserver >Reporter: Duo Zhang >Priority: Major > Labels: pull-request-available > > This will be a breaking change for coprocessor implementation, but the > ability of region observer is not changed, so I think it is OK to include > this in 3.0.0 release, as we have already changed the coprocessor protobuf to > the relocated one, which already breaks lots of coprocessors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28721) AsyncFSWAL is broken when running against hadoop 3.4.0
[ https://issues.apache.org/jira/browse/HBASE-28721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28721: --- Labels: pull-request-available (was: ) > AsyncFSWAL is broken when running against hadoop 3.4.0 > -- > > Key: HBASE-28721 > URL: https://issues.apache.org/jira/browse/HBASE-28721 > Project: HBase > Issue Type: Bug > Components: hadoop3, wal >Reporter: Duo Zhang >Assignee: Istvan Toth >Priority: Major > Labels: pull-request-available > > {noformat} > 2024-07-10T10:09:33,161 ERROR [master/localhost:0:becomeActiveMaster {}] > asyncfs.FanOutOneBlockAsyncDFSOutputHelper(258): Couldn't properly initialize > access to HDFS internals. Please update your WAL Provider to not make use of > the 'asyncfs' provider. See HBASE-16110 for more information. > java.lang.NoSuchMethodException: > org.apache.hadoop.hdfs.DFSClient.beginFileLease(long,org.apache.hadoop.hdfs.DFSOutputStream) > at java.lang.Class.getDeclaredMethod(Class.java:2675) ~[?:?] > at > org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createLeaseManager(FanOutOneBlockAsyncDFSOutputHelper.java:175) > ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] > at > org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.(FanOutOneBlockAsyncDFSOutputHelper.java:252) > ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] > at java.lang.Class.forName0(Native Method) ~[?:?] > at java.lang.Class.forName(Class.java:375) ~[?:?] > at > org.apache.hadoop.hbase.wal.AsyncFSWALProvider.load(AsyncFSWALProvider.java:149) > ~[classes/:?] > at > org.apache.hadoop.hbase.wal.WALFactory.getProviderClass(WALFactory.java:174) > ~[classes/:?] > at org.apache.hadoop.hbase.wal.WALFactory.(WALFactory.java:262) > ~[classes/:?] > at org.apache.hadoop.hbase.wal.WALFactory.(WALFactory.java:231) > ~[classes/:?] > at > org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:383) > ~[classes/:?] 
> at > org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:135) > ~[classes/:?] > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1003) > ~[classes/:?] > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) > ~[classes/:?] > at > org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) > ~[classes/:?] > at > org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) > ~[hbase-common-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] > at java.lang.Thread.run(Thread.java:840) ~[?:?] > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
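The NoSuchMethodException above is the classic failure mode of an exact-signature reflective lookup against internals whose shape changed between releases. A self-contained sketch (dummy stand-in class, hypothetical signatures; not the actual FanOutOneBlockAsyncDFSOutputHelper code) of the lookup and a name-based fallback:

```java
import java.lang.reflect.Method;

public class LeaseLookupSketch {
    // Dummy stand-in for a DFSClient whose lease method changed its parameter
    // list between hadoop releases (hypothetical shape, for illustration only).
    public static class DfsClient340 {
        private void beginFileLease(String inodeId) { }
    }

    /**
     * An exact-signature lookup throws NoSuchMethodException once the
     * dependency changes the parameter list; scanning by name still finds it.
     */
    public static Method findBeginFileLease(Class<?> dfsClientClass)
            throws NoSuchMethodException {
        try {
            // the shape the caller was compiled against
            return dfsClientClass.getDeclaredMethod("beginFileLease", long.class, Object.class);
        } catch (NoSuchMethodException e) {
            for (Method m : dfsClientClass.getDeclaredMethods()) {
                if (m.getName().equals("beginFileLease")) {
                    return m; // tolerate a changed signature
                }
            }
            throw e; // truly gone: surface the original error
        }
    }

    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println(findBeginFileLease(DfsClient340.class));
    }
}
```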
[jira] [Updated] (HBASE-28569) Race condition during WAL splitting leading to corrupt recovered.edits
[ https://issues.apache.org/jira/browse/HBASE-28569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28569: --- Labels: pull-request-available (was: ) > Race condition during WAL splitting leading to corrupt recovered.edits > -- > > Key: HBASE-28569 > URL: https://issues.apache.org/jira/browse/HBASE-28569 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.4.17 >Reporter: Benoit Sigoure >Priority: Major > Labels: pull-request-available > > There is a race condition that can happen when a regionserver aborts initialisation while splitting a WAL from another regionserver. This race leads to writing the WAL trailer for recovered edits while the writer threads are still running; thus the trailer gets interleaved with the edits, corrupting the recovered edits file (and preventing the region from being assigned). > We've seen this happening on HBase 2.4.17, but looking at the latest code it seems that the race can still happen there. > The sequence of operations that leads to this issue: > * {{org.apache.hadoop.hbase.wal.WALSplitter.splitWAL}} calls {{outputSink.close()}} after adding all the entries to the buffers > * The output sink is {{org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink}} and its {{close}} method first calls {{finishWriterThreads}} in a try block, which in turn will call {{finish}} on every thread and then join it to make sure it's done. > * However, if the splitter thread gets interrupted because of the RS aborting, the join will get interrupted and {{finishWriterThreads}} will rethrow without waiting for the writer threads to stop. > * This is problematic because, coming back to {{org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.close}}, it will call {{closeWriters}} in a finally block (so it will execute even when the join was interrupted). 
> * {{closeWriters}} will call {{org.apache.hadoop.hbase.wal.AbstractRecoveredEditsOutputSink.closeRecoveredEditsWriter}} which will call {{close}} on {{{}editWriter.writer{}}}. > * When {{editWriter.writer}} is {{{}org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter{}}}, its {{close}} method will write the trailer before closing the file. > * This trailer write will now go in parallel with the writer threads writing entries, causing corruption. > * If there are no other errors, {{closeWriters}} will succeed in renaming all temporary files to final recovered edits, causing problems next time the region is assigned. > Log evidence supporting the above flow: > Abort is triggered (because it failed to open the WAL due to some ongoing infra issue): > {noformat} > regionserver-2 regionserver 06:22:00.384 > [RS_OPEN_META-regionserver/host01:16201-0] ERROR > org.apache.hadoop.hbase.regionserver.HRegionServer - * ABORTING region > server host01,16201,1709187641249: WAL can not clean up after init failed > *{noformat} > We can see that the writer threads were still active after closing (even considering that the ordering in the log might not be accurate, we see that they die because the channel is closed while still writing, not because they're stopping): > {noformat} > regionserver-2 regionserver 06:22:09.662 [DataStreamer for file > /hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/03169600719-host02%2C16201%2C1709180140645.1709186722780.temp > block BP-1645452845-192.168.2.230-1615455682886:blk_1076340939_2645368] WARN > org.apache.hadoop.hdfs.DataStreamer - Error Recovery for > BP-1645452845-192.168.2.230-1615455682886:blk_1076340939_2645368 in pipeline > [DatanodeInfoWithStorage[192.168.2.230:15010,DS-2aa201ab-1027-47ec-b05f-b39d795fda85,DISK], > > DatanodeInfoWithStorage[192.168.2.232:15010,DS-39651d5a-67d2-4126-88f0-45cdee967dab,DISK], > Datanode > 
InfoWithStorage[192.168.2.231:15010,DS-e08a1d17-f7b1-4e39-9713-9706bd762f48,DISK]]: > datanode > 2(DatanodeInfoWithStorage[192.168.2.231:15010,DS-e08a1d17-f7b1-4e39-9713-9706bd762f48,DISK]) > is bad. > regionserver-2 regionserver 06:22:09.742 [split-log-closeStream-pool-1] INFO > org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink - Closed recovered edits > writer > path=hdfs://mycluster/hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/03169600719-host02%2C16201% > 2C1709180140645.1709186722780.temp (wrote 5949 edits, skipped 0 edits in 93 > ms) > regionserver-2 regionserver 06:22:09.743 > [RS_LOG_REPLAY_OPS-regionserver/host01:16201-1-Writer-0] ERROR > org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink - Failed to write log > entry aeris_v2/53308260a6b22eaf6ebb8353f7df3077/3169611655=[#edits: 8 = > ] to log > regionserver-2 regionserver
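The interrupted-join race described in HBASE-28569 can be reproduced in miniature with plain Java threads (an illustrative model, not the actual WALSplitter/RecoveredEditsOutputSink code): interrupting the thread that joins the writer makes the join return early, so the finally-style cleanup (the trailer write) runs while the writer is still alive.

```java
public class InterruptedJoinRaceSketch {
    /** Returns true if the "cleanup" ran while the writer thread was still alive. */
    public static boolean demonstrateRace() {
        Thread writer = new Thread(() -> {
            // stands in for a writer thread still appending edits
            try { Thread.sleep(5000); } catch (InterruptedException ignored) { }
        });
        writer.start();

        final boolean[] aliveAtClose = new boolean[1];
        Thread closer = new Thread(() -> {
            try {
                writer.join(); // analogous to finishWriterThreads joining each writer
            } catch (InterruptedException e) {
                // interrupted: fall through WITHOUT waiting for the writer to stop
            }
            // finally-style cleanup ("write the trailer") runs regardless:
            aliveAtClose[0] = writer.isAlive();
        });
        closer.start();
        try {
            Thread.sleep(100);
            closer.interrupt(); // simulates the RS abort interrupting the split thread
            closer.join();
            writer.interrupt();
            writer.join();
        } catch (InterruptedException ignored) { }
        return aliveAtClose[0];
    }

    public static void main(String[] args) {
        System.out.println("cleanup raced with writer: " + demonstrateRace());
    }
}
```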
[jira] [Updated] (HBASE-28797) New version of Region#getRowLock with timeout
[ https://issues.apache.org/jira/browse/HBASE-28797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28797: --- Labels: pull-request-available (was: ) > New version of Region#getRowLock with timeout > - > > Key: HBASE-28797 > URL: https://issues.apache.org/jira/browse/HBASE-28797 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.6.0, 3.0.0-beta-1 >Reporter: Viraj Jasani >Assignee: Chandra Sekhar K >Priority: Major > Labels: pull-request-available > > Region APIs are LimitedPrivate for Coprocs. One of the APIs provided by HBase > for Coproc use is to acquire row level read/write lock(s): > {code:java} > /** > * Get a row lock for the specified row. All locks are reentrant. Before > calling this function > * make sure that a region operation has already been started (the calling > thread has already > * acquired the region-close-guard lock). > * > * The obtained locks should be released after use by {@link > RowLock#release()} > * > * NOTE: the boolean passed here has changed. It used to be a boolean that > stated whether or not > * to wait on the lock. Now it is whether it an exclusive lock is requested. > * @param row The row actions will be performed against > * @param readLock is the lock reader or writer. True indicates that a > non-exclusive lock is > * requested > * @see #startRegionOperation() > * @see #startRegionOperation(Operation) > */ > RowLock getRowLock(byte[] row, boolean readLock) throws IOException; {code} > The implementation by default uses the config "hbase.rowlock.wait.duration" as the row level lock timeout for both read and write locks. The default value is quite high (~30s). > While updating the cluster level row lock timeout might not be worthwhile for all use cases, having a new API that takes a timeout param would be really helpful for critical latency-sensitive Coproc APIs. 
> > The new signature should be: > {code:java} > RowLock getRowLock(byte[] row, boolean readLock, int timeout) throws > IOException; {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
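The proposed timeout semantics can be sketched with a standalone stand-in (illustrative only, not HBase's internal row-lock implementation): try the lock for at most the caller-supplied timeout and fail fast, instead of blocking for the cluster-wide hbase.rowlock.wait.duration default.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RowLockTimeoutSketch {
    // one read/write lock standing in for a per-row lock
    private final ReentrantReadWriteLock rowLock = new ReentrantReadWriteLock();

    /** Sketch of the proposed API: acquire within timeoutMs or throw. */
    public Lock getRowLock(boolean readLock, long timeoutMs) throws IOException {
        Lock lock = readLock ? rowLock.readLock() : rowLock.writeLock();
        try {
            if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IOException("Timed out after " + timeoutMs
                    + " ms waiting for row lock");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("Interrupted while waiting for row lock");
        }
        return lock;
    }

    public static void main(String[] args) throws IOException {
        RowLockTimeoutSketch region = new RowLockTimeoutSketch();
        Lock write = region.getRowLock(false, 100); // uncontended: acquired immediately
        try {
            // latency-sensitive work under the exclusive lock
        } finally {
            write.unlock();
        }
        System.out.println("acquired and released");
    }
}
```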
[jira] [Updated] (HBASE-28850) Only return from ReplicationSink.replicationEntries while all background tasks are finished
[ https://issues.apache.org/jira/browse/HBASE-28850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28850: --- Labels: pull-request-available (was: ) > Only return from ReplicationSink.replicationEntries while all background > tasks are finished > --- > > Key: HBASE-28850 > URL: https://issues.apache.org/jira/browse/HBASE-28850 > Project: HBase > Issue Type: Improvement > Components: Replication, rpc >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28645) Add build information to the REST server version endpoint
[ https://issues.apache.org/jira/browse/HBASE-28645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28645: --- Labels: pull-request-available (was: ) > Add build information to the REST server version endpoint > - > > Key: HBASE-28645 > URL: https://issues.apache.org/jira/browse/HBASE-28645 > Project: HBase > Issue Type: New Feature > Components: REST >Reporter: Istvan Toth >Priority: Minor > Labels: pull-request-available > > There is currently no way to check the REST server version / build number > remotely. > The */version/cluster* endpoint takes the version from master (fair enough), > and the */version/rest* does not include the build information. > We should add a version field to the /version/rest endpoint, which reports > the version of the REST server component. > We should also log this at startup, just like we log the cluster version now. > We may have to add and store the version in the hbase-rest code during build, similarly to how we do it for the other components. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28846) Change the default Hadoop3 version to 3.4.0, and add tests to make sure HBase works with earlier supported Hadoop versions
[ https://issues.apache.org/jira/browse/HBASE-28846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28846: --- Labels: pull-request-available (was: ) > Change the default Hadoop3 version to 3.4.0, and add tests to make sure HBase > works with earlier supported Hadoop versions > -- > > Key: HBASE-28846 > URL: https://issues.apache.org/jira/browse/HBASE-28846 > Project: HBase > Issue Type: Improvement > Components: hadoop3, test >Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0 >Reporter: Istvan Toth >Assignee: Istvan Toth >Priority: Major > Labels: pull-request-available > > Discussed on the mailing list: > https://lists.apache.org/thread/orc62x0v2ktvj26ltvrqpfgzr94ncswn -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28860) Add a metric of the amount of data written to WAL to determine the pressure of replication
[ https://issues.apache.org/jira/browse/HBASE-28860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28860: --- Labels: pull-request-available (was: ) > Add a metric of the amount of data written to WAL to determine the pressure > of replication > -- > > Key: HBASE-28860 > URL: https://issues.apache.org/jira/browse/HBASE-28860 > Project: HBase > Issue Type: Improvement >Reporter: terrytlu >Priority: Major > Labels: pull-request-available > > Add a metric of the amount of data written to WAL to determine the pressure > of replication. > Combined with the replication shipped size metric, the user can determine how many RegionServers are needed to meet the data (WAL) writing requirements, that is, to achieve the goal of no replication lag. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28845) table level wal appendSize and replication source metrics not correctly shown in /jmx response
[ https://issues.apache.org/jira/browse/HBASE-28845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28845: --- Labels: pull-request-available (was: ) > table level wal appendSize and replication source metrics not correctly shown > in /jmx response > -- > > Key: HBASE-28845 > URL: https://issues.apache.org/jira/browse/HBASE-28845 > Project: HBase > Issue Type: Bug >Reporter: terrytlu >Assignee: terrytlu >Priority: Major > Labels: pull-request-available > Attachments: image-2024-09-18-11-21-10-279.png, > image-2024-09-18-11-21-20-295.png > > > Found that 2 metrics did not display in the /jmx HTTP interface response: table level wal appendSize and table level replication source. > I suspect it's because the metric name contains a colon : > !image-2024-09-18-11-21-10-279.png|width=521,height=161! > > !image-2024-09-18-11-21-20-295.png|width=521,height=282! > > After modifying the table name string to "Namespace_$namespace_table_$table", the metrics display correctly in the /jmx response. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
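The renaming described in the last paragraph can be sketched as a small helper (hypothetical name, not the actual patch): replace the colon-containing "namespace:table" string with the Namespace_..._table_... form that the /jmx endpoint renders correctly.

```java
public class MetricNameSketch {
    /**
     * Sketch of the workaround above: JMX metric names containing ':' (from a
     * "namespace:table" qualifier) are rewritten to a colon-free form.
     */
    public static String toMetricName(String fullTableName) {
        int idx = fullTableName.indexOf(':');
        String ns = idx >= 0 ? fullTableName.substring(0, idx) : "default";
        String table = idx >= 0 ? fullTableName.substring(idx + 1) : fullTableName;
        return "Namespace_" + ns + "_table_" + table;
    }

    public static void main(String[] args) {
        System.out.println(toMetricName("my_ns:my_table"));
        // → Namespace_my_ns_table_my_table
    }
}
```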
[jira] [Updated] (HBASE-28842) TestRequestAttributes should fail when expected
[ https://issues.apache.org/jira/browse/HBASE-28842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28842: --- Labels: pull-request-available (was: ) > TestRequestAttributes should fail when expected > --- > > Key: HBASE-28842 > URL: https://issues.apache.org/jira/browse/HBASE-28842 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 3.0.0 >Reporter: Evelyn Boland >Assignee: Evelyn Boland >Priority: Major > Labels: pull-request-available > > Problem: > The tests in the TestRequestAttributes class pass even when they should fail. > I've included below an example of a test that should fail but does not. > Fix: > Throw an IOException in the AttributesCoprocessor when the map of expected > request attributes does not match the map of given request attributes. > > Test: > We set 2+ request attributes on the Get request but always return 0 request attributes from the AttributesCoprocessor::getRequestAttributesForRowKey method. > Yet the test passes even though the map of expected request attributes never > matches the map of given request attributes. 
> {code:java}
> @Category({ ClientTests.class, MediumTests.class })
> public class TestRequestAttributes {
>   @ClassRule
>   public static final HBaseClassTestRule CLASS_RULE =
>     HBaseClassTestRule.forClass(TestRequestAttributes.class);
>   private static final byte[] ROW_KEY1 = Bytes.toBytes("1");
>   private static final Map<String, byte[]> CONNECTION_ATTRIBUTES = new HashMap<>();
>   private static final Map<byte[], Map<String, byte[]>> ROW_KEY_TO_REQUEST_ATTRIBUTES =
>     new HashMap<>();
>   static {
>     CONNECTION_ATTRIBUTES.put("clientId", Bytes.toBytes("foo"));
>     ROW_KEY_TO_REQUEST_ATTRIBUTES.put(ROW_KEY1, addRandomRequestAttributes());
>   }
>   private static final ExecutorService EXECUTOR_SERVICE = Executors.newFixedThreadPool(100);
>   private static final byte[] FAMILY = Bytes.toBytes("0");
>   private static final TableName TABLE_NAME = TableName.valueOf("testRequestAttributes");
>   private static final HBaseTestingUtil TEST_UTIL = new HBaseTestingUtil();
>   private static SingleProcessHBaseCluster cluster;
>   @BeforeClass
>   public static void setUp() throws Exception {
>     cluster = TEST_UTIL.startMiniCluster(1);
>     Table table = TEST_UTIL.createTable(TABLE_NAME, new byte[][] { FAMILY }, 1,
>       HConstants.DEFAULT_BLOCKSIZE, AttributesCoprocessor.class.getName());
>     table.close();
>   }
>   @AfterClass
>   public static void afterClass() throws Exception {
>     cluster.close();
>     TEST_UTIL.shutdownMiniCluster();
>   }
>   @Test
>   public void testRequestAttributesGet() throws IOException {
>     Configuration conf = TEST_UTIL.getConfiguration();
>     try (
>       Connection conn = ConnectionFactory.createConnection(conf, null, AuthUtil.loginClient(conf),
>         CONNECTION_ATTRIBUTES);
>       Table table = configureRequestAttributes(conn.getTableBuilder(TABLE_NAME, EXECUTOR_SERVICE),
>         ROW_KEY_TO_REQUEST_ATTRIBUTES.get(ROW_KEY1)).build()) {
>       table.get(new Get(ROW_KEY1));
>     }
>   }
>   private static Map<String, byte[]> addRandomRequestAttributes() {
>     Map<String, byte[]> requestAttributes = new HashMap<>();
>     int j = Math.max(2, (int) (10 * Math.random()));
>     for (int i = 0; i < j; i++) {
>       requestAttributes.put(String.valueOf(i), Bytes.toBytes(UUID.randomUUID().toString()));
>     }
>     return requestAttributes;
>   }
>   public static class AttributesCoprocessor implements RegionObserver, RegionCoprocessor {
>     @Override
>     public Optional<RegionObserver> getRegionObserver() {
>       return Optional.of(this);
>     }
>     @Override
>     public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> c, Get get,
>       List<Cell> result) throws IOException {
>       validateRequestAttributes(getRequestAttributesForRowKey(get.getRow()));
>     }
>     private Map<String, byte[]> getRequestAttributesForRowKey(byte[] rowKey) {
>       return Collections.emptyMap(); // This line helps demonstrate the bug
>     }
>     private void validateRequestAttributes(Map<String, byte[]> requestAttributes) {
>       RpcCall rpcCall = RpcServer.getCurrentCall().get();
>       Map<String, byte[]> attrs = rpcCall.getRequestAttributes();
>       if (attrs.size() != requestAttributes.size()) {
>         return;
>       }
>       for (Map.Entry<String, byte[]> attr : attrs.entrySet()) {
>         if (!requestAttributes.containsKey(attr.getKey())) {
>           return;
>         }
>         if (!Arrays.equals(requestAttributes.get(attr.getKey()), attr.getValue())) {
>           return;
>         }
>       }
>       return;
>     }
>   }
> } {code}
> 
> 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
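The fix described above ("throw an IOException when the maps do not match") can be sketched as a standalone helper (hypothetical names, not the actual coprocessor code): validation fails loudly on any mismatch instead of silently returning.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Map;

public class ValidateAttributesSketch {
    /**
     * Sketch of the proposed validation: throw instead of returning silently,
     * so a mismatch actually fails the test.
     */
    public static void validateRequestAttributes(Map<String, byte[]> expected,
            Map<String, byte[]> actual) throws IOException {
        if (actual.size() != expected.size()) {
            throw new IOException("Attribute count mismatch: expected "
                + expected.size() + ", got " + actual.size());
        }
        for (Map.Entry<String, byte[]> attr : actual.entrySet()) {
            byte[] expectedValue = expected.get(attr.getKey());
            if (expectedValue == null || !Arrays.equals(expectedValue, attr.getValue())) {
                throw new IOException("Attribute mismatch for key " + attr.getKey());
            }
        }
    }
}
```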
[jira] [Updated] (HBASE-28803) HBase Master stuck due to improper handling of WALSyncTimeoutException within UncheckedIOException
[ https://issues.apache.org/jira/browse/HBASE-28803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28803: --- Labels: pull-request-available (was: ) > HBase Master stuck due to improper handling of WALSyncTimeoutException within > UncheckedIOException > -- > > Key: HBASE-28803 > URL: https://issues.apache.org/jira/browse/HBASE-28803 > Project: HBase > Issue Type: Bug > Components: wal >Affects Versions: 2.6.0, 3.0.0-alpha-4 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Critical > Labels: pull-request-available > > One of our test clusters got stuck during a rolling restart due to a WAL.sync timeout. This issue did not result in the Master aborting because the WALSyncTimeoutException was wrapped in an UncheckedIOException, which prevented the proper exception handling mechanism from being triggered. As a result, the Master was hanging for a long time and procedures were stuck. > This was a 2.4-based HBase with HBASE-27230. > {noformat} > 2024-08-17 17:23:07,567 ERROR > org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore: Failed > to delete pid=2027 > org.apache.hadoop.hbase.regionserver.wal.WALSyncTimeoutIOException: > org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync > result after 30 ms for txid=4347, WAL system stuck? 
> at > org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:848) > at > org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:718) > at org.apache.hadoop.hbase.regionserver.HRegion.sync(HRegion.java:8902) > at > org.apache.hadoop.hbase.regionserver.HRegion.doWALAppend(HRegion.java:8469) > at > org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4523) > at > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4447) > at > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4377) > at > org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4853) > at > org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4847) > at > org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4843) > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3155) > at > org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.lambda$delete$8(RegionProcedureStore.java:379) > at > org.apache.hadoop.hbase.master.region.MasterRegion.update(MasterRegion.java:141) > at > org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.delete(RegionProcedureStore.java:379) > at > org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.delete(RegionProcedureStore.java:410) > at > org.apache.hadoop.hbase.procedure2.CompletedProcedureCleaner.periodicExecute(CompletedProcedureCleaner.java:135) > at > org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeInMemoryChore(TimeoutExecutorThread.java:122) > at > org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:101) > at > org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68) > Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to > get sync result after 30 ms for txid=4347, WAL system stuck? 
> at > org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:171) > at > org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:844) > ... 18 more > 2024-08-17 17:23:07,568 ERROR > org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread: Ignoring pid=-1, > state=WAITING_TIMEOUT; > org.apache.hadoop.hbase.procedure2.CompletedProcedureCleaner exception: > org.apache.hadoop.hbase.regionserver.wal.WALSyncTimeoutIOException: > org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync > result after 30 ms for txid=4347, WAL system stuck? > java.io.UncheckedIOException: > org.apache.hadoop.hbase.regionserver.wal.WALSyncTimeoutIOException: > org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync > result after 30 ms for txid=4347, WAL system stuck? > at > org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.delete(RegionProcedureStore.java:383) > at > org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.delete(RegionProcedureStore.java:410) > at > org.apache.hadoop.hbase.procedure2.CompletedProcedureCleaner.periodicExecute(CompletedProcedureCleaner.java:135) > at > org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeInMemoryChore(Ti
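The fix described in HBASE-28803 hinges on looking through the UncheckedIOException wrapper so the fatal timeout becomes visible to abort handling. A minimal sketch of that unwrapping, using an invented stand-in class rather than HBase's actual WALSyncTimeoutIOException:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class UnwrapDemo {
    // Hypothetical marker exception; stands in for
    // org.apache.hadoop.hbase.regionserver.wal.WALSyncTimeoutIOException.
    static class WalSyncTimeoutException extends IOException {
        WalSyncTimeoutException(String msg) { super(msg); }
    }

    // Unwrap an UncheckedIOException so callers can match on the underlying
    // IOException type instead of on the unchecked wrapper.
    static IOException unwrap(RuntimeException e) {
        if (e instanceof UncheckedIOException) {
            // UncheckedIOException.getCause() is declared to return IOException.
            return ((UncheckedIOException) e).getCause();
        }
        return null;
    }

    public static void main(String[] args) {
        RuntimeException wrapped =
            new UncheckedIOException(new WalSyncTimeoutException("WAL system stuck?"));
        IOException cause = unwrap(wrapped);
        // The fatal timeout is now visible to the caller's abort logic.
        System.out.println(cause instanceof WalSyncTimeoutException); // true
    }
}
```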
[jira] [Updated] (HBASE-28840) Optimise memory utilisation retrieval of bucket-cache from persistence.
[ https://issues.apache.org/jira/browse/HBASE-28840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28840: --- Labels: pull-request-available (was: ) > Optimise memory utilisation retrieval of bucket-cache from persistence. > --- > > Key: HBASE-28840 > URL: https://issues.apache.org/jira/browse/HBASE-28840 > Project: HBase > Issue Type: Bug > Components: BucketCache >Reporter: Janardhan Hungund >Assignee: Janardhan Hungund >Priority: Major > Labels: pull-request-available > > During the persistence of the bucket-cache backing map to a file, the backing map > is divided into multiple smaller chunks and persisted to the file. This > chunking avoids high memory utilisation during persistence, since only > a small subset of backing map entries needs to be persisted in one chunk. > However, during the retrieval of the backing map during server startup, > we accumulate all these chunks into a list and then process each chunk to > recreate the in-memory backing map. Since all the chunks are fetched from > the persistence file and then processed, the memory requirement is higher. > The retrieval of the bucket-cache from the persistence file can be optimised to > process one chunk at a time, avoiding high memory utilisation. > Thanks, > Janardhan -- This message was sent by Atlassian Jira (v8.20.10#820010)
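The one-chunk-at-a-time approach proposed in HBASE-28840 can be sketched generically. The chunk format here (a count line followed by that many entry lines) is purely hypothetical, chosen only to contrast streaming consumption with accumulating all chunks into a list first:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedReadDemo {
    // Instead of reading every chunk into a list and only then processing
    // (peak memory = all chunks), hand each chunk to the handler as soon as
    // it is read (peak memory = one chunk).
    static void readChunks(BufferedReader in, Consumer<List<String>> handler) {
        try {
            String header;
            while ((header = in.readLine()) != null) {
                int n = Integer.parseInt(header.trim());
                List<String> chunk = new ArrayList<>(n);
                for (int i = 0; i < n; i++) {
                    chunk.add(in.readLine());
                }
                handler.accept(chunk); // process and release before the next chunk
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String data = "2\na\nb\n1\nc\n"; // two chunks: [a, b] and [c]
        List<Integer> sizes = new ArrayList<>();
        readChunks(new BufferedReader(new StringReader(data)), c -> sizes.add(c.size()));
        System.out.println(sizes); // [2, 1]
    }
}
```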
[jira] [Updated] (HBASE-28841) Modify default value of hbase.bucketcache.persistence.chunksize to 10K
[ https://issues.apache.org/jira/browse/HBASE-28841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28841: --- Labels: pull-request-available (was: ) > Modify default value of hbase.bucketcache.persistence.chunksize to 10K > -- > > Key: HBASE-28841 > URL: https://issues.apache.org/jira/browse/HBASE-28841 > Project: HBase > Issue Type: Bug >Reporter: Janardhan Hungund >Assignee: Janardhan Hungund >Priority: Major > Labels: pull-request-available > > Currently, the default value of the configuration parameter > "hbase.bucketcache.persistence.chunksize" is 10 million (10000000). > This is the number of block entries that are processed during the persistence > of the bucket-cache backing map to the persistence file. During testing, it > was found that this high chunksize resulted in high heap > utilisation in region servers, leading to longer GC pauses which also led to > intermittent server crashes. > When the value of this configuration is set to 10K (10000), the cache remains > stable. No GC delays are observed. Also no server crashes are observed. > The jmap outputs collected against the regionservers showed reduced memory > utilisation from 4.5-5GB to 1-1.5GB for the objects related to persistence > code. > Hence, we need to adjust the default value of this configuration parameter to > 10K. > Thanks, > Janardhan -- This message was sent by Atlassian Jira (v8.20.10#820010)
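On releases whose built-in default has not yet been adjusted, the value discussed in HBASE-28841 can presumably be pinned explicitly in hbase-site.xml; the property name is taken from the issue title, and whether this exact key applies to a given release should be verified against its documentation:

```xml
<!-- hbase-site.xml: limit how many backing-map entries are serialized per chunk -->
<property>
  <name>hbase.bucketcache.persistence.chunksize</name>
  <value>10000</value>
</property>
```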
[jira] [Updated] (HBASE-28839) Exception handling during retrieval of bucket-cache from persistence.
[ https://issues.apache.org/jira/browse/HBASE-28839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28839: --- Labels: pull-request-available (was: ) > Exception handling during retrieval of bucket-cache from persistence. > - > > Key: HBASE-28839 > URL: https://issues.apache.org/jira/browse/HBASE-28839 > Project: HBase > Issue Type: Bug > Components: BucketCache >Reporter: Janardhan Hungund >Assignee: Janardhan Hungund >Priority: Major > Labels: pull-request-available > > During the retrieval of the bucket cache from the persistence file during > startup, it was observed that, if an exception other than IOException > occurs, the bucket cache internal members remain uninitialised, causing the > bucket cache to remain unusable. The exception is not logged in the trace file and > the retrieval thread exits without initialising the bucket-cache. > NullPointerExceptions are also seen when trying to use the cache. > {code:java} > 2024-09-10 14:33:30,020 ERROR > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache: WriterThread encountered > error > java.lang.NullPointerException > at > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1975) > at > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.doDrain(BucketCache.java:1298) > {code} > > {code:java} > 2024-09-13 07:01:05,964 ERROR > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: Error getting metrics > from source RegionServer,sub=Server > java.lang.NullPointerException > at > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getFreeSize(BucketCache.java:1819) > at > org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getFreeSize(CombinedBlockCache.java:179) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionServerWrapperImpl.getBlockCacheFreeSize(MetricsRegionServerWrapperImpl.java:308) > at > 
org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceImpl.addGaugesToMetricsRecordBuilder(MetricsRegionServerSourceImpl.java:525) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceImpl.getMetrics(MetricsRegionServerSourceImpl.java:333) > {code} > All types of exceptions need to be handled gracefully. > All types of exceptions must be logged to the trace file. > The bucket cache needs to be reinitialised and made usable. > Thanks, > Janardhan -- This message was sent by Atlassian Jira (v8.20.10#820010)
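The graceful handling asked for in HBASE-28839 amounts to widening the catch beyond IOException so that any failure is logged and the cache falls back to a clean state. A minimal sketch, with a hypothetical retrieveOrReset wrapper rather than BucketCache's real retrieval path:

```java
public class RetrievalDemo {
    // Hypothetical retrieval wrapper: catch every Throwable (not only
    // IOException), log it so it appears in the trace file, and fall back to
    // a fresh cache so the server never keeps half-initialised state.
    static String retrieveOrReset(Runnable retrieve) {
        try {
            retrieve.run();
            return "retrieved";
        } catch (Throwable t) {
            // Logging here is what makes the failure diagnosable later.
            System.err.println("Retrieval failed, resetting cache: " + t);
            return "reset";
        }
    }

    public static void main(String[] args) {
        System.out.println(retrieveOrReset(() -> {}));
        // A non-IO failure (e.g. a corrupt entry) no longer kills the thread:
        System.out.println(retrieveOrReset(() -> { throw new IllegalStateException("bad entry"); }));
    }
}
```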
[jira] [Updated] (HBASE-28432) Refactor tools which are under test packaging to a new module hbase-tools
[ https://issues.apache.org/jira/browse/HBASE-28432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28432: --- Labels: pull-request-available (was: ) > Refactor tools which are under test packaging to a new module hbase-tools > - > > Key: HBASE-28432 > URL: https://issues.apache.org/jira/browse/HBASE-28432 > Project: HBase > Issue Type: Sub-task >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > The purpose of this task is to refactor and move certain tools currently > located under the test packaging to a new module, named 'hbase-tools'. > The following tools have been initially identified for relocation(will add > more as and when identified): > - > [PerformanceEvaluation|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java] > - > [LoadTestTool|https://github.com/apache/hbase/blob/936d267d1094e37222b9b836ab068689ccce3574/hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/util/LoadTestTool.java] > - > [HFilePerformanceEvaluation|https://github.com/apache/hbase/blob/936d267d1094e37222b9b836ab068689ccce3574/hbase-server/src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java] > - > [ScanPerformanceEvaluation|https://github.com/apache/hbase/blob/936d267d1094e37222b9b836ab068689ccce3574/hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/ScanPerformanceEvaluation.java] > - > [LoadBalancerPerformanceEvaluation|https://github.com/apache/hbase/blob/936d267d1094e37222b9b836ab068689ccce3574/hbase-balancer/src/test/java/org/apache/hadoop/hbase/master/balancer/LoadBalancerPerformanceEvaluation.java] > These tools are valuable beyond the scope of testing and should be accessible > in the binary distribution of HBase. 
However, their current location within > the test jars adds unnecessary bloat to the assembly and classpath, and > potentially introduces CVE-prone JARs into the binary assemblies. We plan to > remove all test jars from assembly with HBASE-28433. > This task involves creating the new 'hbase-tools' module, and moving the > identified tools into this module. It also includes ensuring that these tools > function correctly in their new location and that their relocation does not > negatively impact any existing functionality or dependencies. > CC: [~stoty], [~zhangduo], [~ndimiduk], [~bbeaudreault] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28830) when a procedure on a table executed as a child procedure, further table procedure operations on that table are blocked forever waiting to acquire the table procedure lo
[ https://issues.apache.org/jira/browse/HBASE-28830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28830: --- Labels: pull-request-available (was: ) > when a procedure on a table executed as a child procedure, further table > procedure operations on that table are blocked forever waiting to acquire the > table procedure lock > > > Key: HBASE-28830 > URL: https://issues.apache.org/jira/browse/HBASE-28830 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Affects Versions: 3.0.0-beta-1, 2.5.10 >Reporter: Chandra Sekhar K >Assignee: Chandra Sekhar K >Priority: Critical > Labels: pull-request-available > > When a procedure on a table is executed as a child procedure, further table > procedure operations on that table are blocked forever waiting to acquire the > table procedure lock. > This issue occurs because the table lock is not cleared for table procedures > submitted as child procedures, after the changes in HBASE-28683. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28836) Parallelize the archival of compacted files
[ https://issues.apache.org/jira/browse/HBASE-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28836: --- Labels: pull-request-available (was: ) > Parallelize the archival of compacted files > > > Key: HBASE-28836 > URL: https://issues.apache.org/jira/browse/HBASE-28836 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 2.5.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Major > Labels: pull-request-available > > While splitting a region in hbase it has to clean up compacted files for > bookkeeping. > > Currently we do it sequentially and that is good enough because for hdfs it > is a fast operation. When we do the same in s3 it becomes an issue. We need to > parallelize this to make it faster. > {code:java} > // code placeholder > for (File file : toArchive) { > // if its a file archive it > try { > LOG.trace("Archiving {}", file); > if (file.isFile()) { > // attempt to archive the file > if (!resolveAndArchiveFile(baseArchiveDir, file, startTime)) { > LOG.warn("Couldn't archive " + file + " into backup directory: " > + baseArchiveDir); > failures.add(file); > } > } else { > // otherwise its a directory and we need to archive all files > LOG.trace("{} is a directory, archiving children files", file); > // so we add the directory name to the one base archive > Path parentArchiveDir = new Path(baseArchiveDir, file.getName()); > // and then get all the files from that directory and attempt to > // archive those too > Collection<File> children = file.getChildren(); > failures.addAll(resolveAndArchive(fs, parentArchiveDir, children, > start)); > } > } catch (IOException e) { > LOG.warn("Failed to archive {}", file, e); > failures.add(file); > } > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
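A hedged sketch of how the sequential loop above could be parallelized with an ExecutorService: one task per file, failures collected thread-safely. Here archiveOne is a hypothetical stand-in for resolveAndArchiveFile, and the actual HBase change may be structured differently:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

public class ParallelArchiveDemo {
    // Submit one archival task per file instead of archiving sequentially.
    // archiveOne returns true on success, mirroring resolveAndArchiveFile.
    static List<String> archiveAll(List<String> files, Predicate<String> archiveOne, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> failures = Collections.synchronizedList(new ArrayList<>());
        for (String f : files) {
            pool.execute(() -> {
                try {
                    if (!archiveOne.test(f)) failures.add(f);
                } catch (RuntimeException e) {
                    failures.add(f); // mirror the sequential loop's IOException handling
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return failures;
    }

    public static void main(String[] args) {
        // "b" fails to archive; the other files succeed in parallel.
        List<String> failures = archiveAll(Arrays.asList("a", "b", "c"), f -> !f.equals("b"), 2);
        System.out.println(failures); // [b]
    }
}
```

For S3-backed storage, where each rename is a slow copy, the thread count becomes the knob that hides per-file latency.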
[jira] [Updated] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-25768: --- Labels: pull-request-available (was: ) > Support an overall coarse and fast balance strategy for StochasticLoadBalancer > -- > > Key: HBASE-25768 > URL: https://issues.apache.org/jira/browse/HBASE-25768 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 3.0.0-alpha-1, 2.0.0, 1.4.13 >Reporter: Xiaolin Ha >Assignee: Xiaolin Ha >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0-beta-2 > > > When we use StochasticLoadBalancer + balanceByTable, we could face two > difficulties. > # For each table, their regions are distributed uniformly, but for the > overall cluster, imbalance may still exist between RSes; > # When there are large-scale restarts of RSes, or expansion of groups or the > cluster, we hope the balancer can execute as soon as possible, but the > StochasticLoadBalancer may need a lot of time to compute costs. > We can detect these circumstances in StochasticLoadBalancer (such as using the > percentage of skewed tables), and before trying the normal balance steps, we > can add a strategy to let it balance like the SimpleLoadBalancer or use > a few lightweight cost functions here. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28825) Add deprecation cycle for methods in TokenUtil
[ https://issues.apache.org/jira/browse/HBASE-28825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28825: --- Labels: pull-request-available (was: ) > Add deprecation cycle for methods in TokenUtil > -- > > Key: HBASE-28825 > URL: https://issues.apache.org/jira/browse/HBASE-28825 > Project: HBase > Issue Type: Sub-task > Components: Client >Reporter: Duo Zhang >Assignee: MisterWang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28822) Change the LOG field in CodecPerformance to private
[ https://issues.apache.org/jira/browse/HBASE-28822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28822: --- Labels: pull-request-available (was: ) > Change the LOG field in CodecPerformance to private > --- > > Key: HBASE-28822 > URL: https://issues.apache.org/jira/browse/HBASE-28822 > Project: HBase > Issue Type: Sub-task > Components: logging, test >Reporter: Duo Zhang >Assignee: MisterWang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28821) Optimise bucket cache persistence by reusing backmap entry object.
[ https://issues.apache.org/jira/browse/HBASE-28821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28821: --- Labels: pull-request-available (was: ) > Optimise bucket cache persistence by reusing backmap entry object. > -- > > Key: HBASE-28821 > URL: https://issues.apache.org/jira/browse/HBASE-28821 > Project: HBase > Issue Type: Bug > Components: BucketCache >Reporter: Janardhan Hungund >Assignee: Janardhan Hungund >Priority: Major > Labels: pull-request-available > > During the persistence of backing map entries into the backing map file, we > create a new BackingMapEntry.Builder for each entry in the backing map. This > can be optimised by using a single BackingMapEntry.Builder object and using > it to build each entry during serialisation. > This Jira tracks the optimisation by avoiding multiple builder objects. > Thanks, > Janardhan -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28812) Upgrade from 2.6.0 to 3.0.0 crashed
[ https://issues.apache.org/jira/browse/HBASE-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28812: --- Labels: pull-request-available upgrade (was: upgrade) > Upgrade from 2.6.0 to 3.0.0 crashed > --- > > Key: HBASE-28812 > URL: https://issues.apache.org/jira/browse/HBASE-28812 > Project: HBase > Issue Type: Bug > Components: compatibility >Affects Versions: 3.0.0 >Reporter: Ke Han >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available, upgrade > Attachments: hbase--master-2d6e4fad2af5.log, > hbase--master-440ed844e077.log > > > I am trying to upgrade from 2.6.0 (stable release) to 3.0.0. I built 3.0.0 > using the following commit (a030e8099840e640684a68b6e4a79e7c1d5a6823) > {code:java} > commit a030e8099840e640684a68b6e4a79e7c1d5a6823 (HEAD -> branch-3, > upstream/branch-3) > Author: Ray Mattingly > Date: Mon Sep 2 04:38:29 2024 -0400 HBASE-28697 Don't clean bulk load > system entries until backup is complete (#6089) > > Co-authored-by: Ray Mattingly > {code} > However, the HMaster would crash during the upgrade process. > h1. Reproduce > Step1: Start up 2.6.0 cluster (1 HDFS, 1 HM, 1 RS) > Step2: Stop the entire cluster > Step3: Upgrade to 3.0.0 cluster. 
> HMaster will crash with the following error message > {code:java} > 2024-09-04T04:29:18,917 WARN [master/hmaster:16000:becomeActiveMaster] > regionserver.HRegion: Failed initialize of region= > master:store,,1.1595e783b53d99cd5eef43b6debb2682., starting to roll back > memstore > java.io.IOException: java.io.IOException: > org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile > Trailer from file > hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75 > at > org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1215) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1158) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1030) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:974) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7794) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7749) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:277) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:432) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:135) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1003) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) > ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at java.lang.Thread.run(Thread.java:833) ~[?:?] > Caused by: java.io.IOException: > org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile > Trailer from file > hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75 > at > org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:289) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:339) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:301) > ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] > at > o
[jira] [Updated] (HBASE-28580) Revert the deprecation for methods in WALObserver
[ https://issues.apache.org/jira/browse/HBASE-28580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28580: --- Labels: pull-request-available (was: ) > Revert the deprecation for methods in WALObserver > - > > Key: HBASE-28580 > URL: https://issues.apache.org/jira/browse/HBASE-28580 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors, wal >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > > Per the discussion in this thread > https://lists.apache.org/thread/28c2tn4kn9gwvtsdtcbxx1c5tjdfh5jy -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28810) BackupLogCleaner is difficult to debug
[ https://issues.apache.org/jira/browse/HBASE-28810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28810: --- Labels: pull-request-available (was: ) > BackupLogCleaner is difficult to debug > -- > > Key: HBASE-28810 > URL: https://issues.apache.org/jira/browse/HBASE-28810 > Project: HBase > Issue Type: Bug >Affects Versions: 2.6.0, 2.6.1 >Reporter: Ray Mattingly >Assignee: Ray Mattingly >Priority: Major > Labels: pull-request-available > > While implementing HBase's incremental backups across a few hundred clusters, > we continue to step on some rakes. Now and again, we find old WALs piling up > due to a poorly cleaned up BackupInfo, or a bug in the BackupLogCleaner, etc. > The BackupLogCleaner is difficult to debug for a couple of reasons: > # It has a lack of useful debug logging > # It has [a misleadingly named > method|https://github.com/HubSpot/hbase/blob/2d08fa67dfe9458260bc0be3ebc7bbd769850190/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java#L83-L83] > (this method returns the newest backup ts, not the oldest) > I'm going to introduce a small refactor that will alleviate these pain points. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28807) Remove some useless code and add some logs for CanaryTool
[ https://issues.apache.org/jira/browse/HBASE-28807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28807: --- Labels: pull-request-available (was: ) > Remove some useless code and add some logs for CanaryTool > - > > Key: HBASE-28807 > URL: https://issues.apache.org/jira/browse/HBASE-28807 > Project: HBase > Issue Type: Improvement > Components: canary >Reporter: MisterWang >Assignee: MisterWang >Priority: Minor > Labels: pull-request-available > > Remove some useless code in CanaryTool.sniff. > Add some logs when a null location is returned for a table region. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28433) Modify the assembly to not include test jars and their transitive dependencies
[ https://issues.apache.org/jira/browse/HBASE-28433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28433: --- Labels: pull-request-available (was: ) > Modify the assembly to not include test jars and their transitive dependencies > -- > > Key: HBASE-28433 > URL: https://issues.apache.org/jira/browse/HBASE-28433 > Project: HBase > Issue Type: Sub-task >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > This task aims to modify the HBase assembly to exclude test jars and their > transitive dependencies. > Currently, our assembly includes test jars and test dependencies, which adds > unnecessary bloat to the assembly and classpath. This not only increases the > distribution size but also potentially introduces CVE-prone JARs into the > binary assemblies. > The objective of this task is to modify the build and assembly process to > exclude these test jars and their dependencies. This will result in a leaner, > more secure assembly with a faster startup time. > CC: [~stoty], [~zhangduo], [~ndimiduk], [~bbeaudreault] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28805) Implement chunked persistence of backing map for persistent bucket cache.
[ https://issues.apache.org/jira/browse/HBASE-28805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28805: --- Labels: pull-request-available (was: ) > Implement chunked persistence of backing map for persistent bucket cache. > - > > Key: HBASE-28805 > URL: https://issues.apache.org/jira/browse/HBASE-28805 > Project: HBase > Issue Type: Task > Components: BucketCache >Reporter: Janardhan Hungund >Assignee: Janardhan Hungund >Priority: Major > Labels: pull-request-available > > The persistent bucket cache feature relies on the persistence > of the backing map to a persistent file. The protobuf APIs are used to serialise > the backing map and its related structures into the file. An asynchronous > thread periodically flushes the contents of the backing map to the persistence > file. > The protobuf library has a limitation of 2GB on the size of protobuf > messages. If the size of the backing map increases beyond 2GB, an unexpected > exception is reported in the asynchronous thread and stops the persister > thread. This causes the persistent file to go out of sync with the actual bucket > cache. Due to this, the bucket cache shrinks to a smaller size after a cache > restart. Checksum errors are also reported. > This Jira tracks the implementation of chunking of the backing > map persistence such that every protobuf message is smaller than 2GB in size. > Thanks, > Janardhan -- This message was sent by Atlassian Jira (v8.20.10#820010)
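The chunking idea in HBASE-28805 can be illustrated generically. This is not the HBase implementation, only a sketch that partitions a map's entries into bounded chunks so each serialized message stays below a size limit; in practice the bound would be driven by serialized byte size rather than entry count:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChunkPersistDemo {
    // Split the backing map's entries into fixed-size chunks; each chunk
    // would then be serialized as its own protobuf message, keeping every
    // message safely under the 2 GB protobuf limit.
    static <K, V> List<List<Map.Entry<K, V>>> toChunks(Map<K, V> map, int chunkSize) {
        List<List<Map.Entry<K, V>>> chunks = new ArrayList<>();
        List<Map.Entry<K, V>> current = new ArrayList<>(chunkSize);
        for (Map.Entry<K, V> e : map.entrySet()) {
            current.add(e);
            if (current.size() == chunkSize) {
                chunks.add(current);
                current = new ArrayList<>(chunkSize);
            }
        }
        if (!current.isEmpty()) chunks.add(current); // trailing partial chunk
        return chunks;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new LinkedHashMap<>();
        for (int i = 0; i < 5; i++) m.put("k" + i, i);
        // 5 entries with chunk size 2 -> chunks of 2, 2, and 1 entries.
        System.out.println(toChunks(m, 2).size()); // 3
    }
}
```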
[jira] [Updated] (HBASE-28804) Implement asynchronous retrieval of bucket-cache data from persistence.
[ https://issues.apache.org/jira/browse/HBASE-28804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28804: --- Labels: pull-request-available (was: ) > Implement asynchronous retrieval of bucket-cache data from persistence. > --- > > Key: HBASE-28804 > URL: https://issues.apache.org/jira/browse/HBASE-28804 > Project: HBase > Issue Type: Task > Components: BucketCache >Reporter: Janardhan Hungund >Assignee: Janardhan Hungund >Priority: Major > Labels: pull-request-available > > During the retrieval of data from the bucket cache persistence file, a > transient structure that stores the blocks ordered by filename is constructed > from the backing map entries. The population of this transient structure is > done during the server start-up. This process increases the region-server > startup time if the bucket cache has a large number of blocks. > This population happens inline with the server restart and blocks the server > for several minutes. This makes server restarts inconvenient for > external users. Restarts during upgrade can run into timeout issues due to > this delay in the server startup. > Hence, the recommendation in this Jira is to make the cache retrieval > asynchronous to the server startup. During a server startup, a new thread is > spawned that reads the persistence file and creates the required structures > from it. The server continues with the restart and does not > wait for the bucket-cache initialisation to complete. > Note that the bucket cache is not available immediately for usage and will > only be ready to use after the data is repopulated from persistence into > memory. > Thanks, > Janardhan -- This message was sent by Atlassian Jira (v8.20.10#820010)
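A minimal, hypothetical sketch of the asynchronous-retrieval pattern described above (the class and method names are invented): startup returns immediately, a background task repopulates the cache, and a readiness flag gates cache usage until it finishes:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

public class AsyncRetrieveDemo {
    // Gates cache usage: callers must check this before serving from cache.
    private final AtomicBoolean cacheReady = new AtomicBoolean(false);

    // Kick off retrieval on a background thread and return immediately,
    // so server startup does not block on reading the persistence file.
    CompletableFuture<Void> startRetriever(Runnable loadFromPersistence) {
        return CompletableFuture.runAsync(() -> {
            loadFromPersistence.run(); // e.g. rebuild the backing map from disk
            cacheReady.set(true);      // only now is the cache usable
        });
    }

    boolean isCacheReady() { return cacheReady.get(); }

    public static void main(String[] args) {
        AsyncRetrieveDemo demo = new AsyncRetrieveDemo();
        // join() here only so the demo is deterministic; the real server
        // would keep starting up while the retriever runs.
        demo.startRetriever(() -> {}).join();
        System.out.println(demo.isCacheReady()); // true
    }
}
```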
[jira] [Updated] (HBASE-28801) WALs are not cleaned even after all entries are flushed
[ https://issues.apache.org/jira/browse/HBASE-28801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28801: --- Labels: pull-request-available (was: ) > WALs are not cleaned even after all entries are flushed > --- > > Key: HBASE-28801 > URL: https://issues.apache.org/jira/browse/HBASE-28801 > Project: HBase > Issue Type: Bug > Components: wal >Affects Versions: 2.5.6 >Reporter: Kiran Kumar Maturi >Assignee: Kiran Kumar Maturi >Priority: Minor > Labels: pull-request-available > > In our production fleet we have observed that WAL files are not cleaned up > even when all the entries have been flushed. I have fixed the WAL close > issue when there is an issue with the WAL closures as part of > [HBASE-28665|https://issues.apache.org/jira/projects/HBASE/issues/HBASE-28665]. > There is a case with unflushed entries that can lead to the WAL not > being cleaned even after all the entries have been flushed > [FSHLog.java > |https://github.com/apache/hbase/blob/branch-2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L388] > {code:java} > if (isUnflushedEntries() || closeErrorCount.get() >= > this.closeErrorsTolerated) { > try { > closeWriter(this.writer, oldPath, true); > } finally { > inflightWALClosures.remove(oldPath.getName()); > if (!isUnflushedEntries()) { > markClosedAndClean(oldPath); > } > } > {code} > If there are unflushed entries then the WAL will never be marked closed and won't > be cleaned further > {code:java} > private synchronized void cleanOldLogs() { > List<Pair<Path, Long>> logsToArchive = null; > // For each log file, look at its Map of regions to the highest sequence > id; if all sequence ids > // are older than what is currently in memory, the WAL can be GC'd. 
> for (Map.Entry e : this.walFile2Props.entrySet()) { > if (!e.getValue().closed) { > LOG.debug("{} is not closed yet, will try archiving it next time", > e.getKey()); > continue; > } > Path log = e.getKey(); > Map sequenceNums = > e.getValue().encodedName2HighestSequenceId; > if (this.sequenceIdAccounting.areAllLower(sequenceNums)) { > if (logsToArchive == null) { > logsToArchive = new ArrayList<>(); > } > logsToArchive.add(Pair.newPair(log, e.getValue().logSize)); > if (LOG.isTraceEnabled()) { > LOG.trace("WAL file ready for archiving " + log); > } > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28802) Fix the expected RS hostname message during useIP feature enabled
[ https://issues.apache.org/jira/browse/HBASE-28802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28802: --- Labels: pull-request-available (was: ) > Fix the expected RS hostname message during useIP feature enabled > -- > > Key: HBASE-28802 > URL: https://issues.apache.org/jira/browse/HBASE-28802 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 3.0.0-alpha-3, 2.5.3 >Reporter: Y. SREENIVASULU REDDY >Assignee: Y. SREENIVASULU REDDY >Priority: Minor > Labels: pull-request-available > Fix For: 3.0.0-alpha-4, 2.7.0 > > > For HRegionServer#handleReportForDutyResponse, when the hostname differs > between the regionserver and master side, both conditions should abort the > RS; this change corrects the error message. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28732) Fix typo in Jenkinsfile_Github for jdk8 hadoop2 check
[ https://issues.apache.org/jira/browse/HBASE-28732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28732: --- Labels: beginner pull-request-available trivial (was: beginner trivial) > Fix typo in Jenkinsfile_Github for jdk8 hadoop2 check > - > > Key: HBASE-28732 > URL: https://issues.apache.org/jira/browse/HBASE-28732 > Project: HBase > Issue Type: Improvement > Components: jenkins >Reporter: Duo Zhang >Assignee: JinHyuk Kim >Priority: Major > Labels: beginner, pull-request-available, trivial > > https://github.com/apache/hbase/blob/9dee538f65d84a900724d424c71793dff46e9684/dev-support/Jenkinsfile_GitHub#L314 > This line > PR JDK8 Hadoop3 Check Report > Should be > PR JDK8 Hadoop2 Check Report -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28774) Protect hbase:meta from any hotspotting regions by balancing them to different region servers
[ https://issues.apache.org/jira/browse/HBASE-28774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28774: --- Labels: pull-request-available (was: ) > Protect hbase:meta from any hotspotting regions by balancing them to > different region servers > - > > Key: HBASE-28774 > URL: https://issues.apache.org/jira/browse/HBASE-28774 > Project: HBase > Issue Type: Improvement > Components: meta >Reporter: Ranganath Govardhanagiri >Assignee: Ranganath Govardhanagiri >Priority: Major > Labels: pull-request-available > > During some incidents, it was observed that when hbase:meta is > colocated with a high-load (or hotspotting) region, it can make meta unavailable > and cause availability issues. This item is to provide a way to load balance > such regions so that meta is not impacted. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-27746) Check if the file system supports storage policy before invoking setStoragePolicy()
[ https://issues.apache.org/jira/browse/HBASE-27746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-27746: --- Labels: pull-request-available (was: ) > Check if the file system supports storage policy before invoking > setStoragePolicy() > --- > > Key: HBASE-27746 > URL: https://issues.apache.org/jira/browse/HBASE-27746 > Project: HBase > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: pull-request-available > > Found these messages on an Ozone cluster: > {noformat} > 2023-03-20 12:27:09,185 WARN org.apache.hadoop.hbase.util.CommonFSUtils: > Unable to set storagePolicy=HOT for > path=ofs://ozone1/vol1/bucket1/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/proc. > DEBUG log level might have more details. > java.lang.UnsupportedOperationException: RootedOzoneFileSystem doesn't > support setStoragePolicy > at > org.apache.hadoop.fs.FileSystem.setStoragePolicy(FileSystem.java:3227) > at > org.apache.hadoop.hbase.util.CommonFSUtils.invokeSetStoragePolicy(CommonFSUtils.java:521) > at > org.apache.hadoop.hbase.util.CommonFSUtils.setStoragePolicy(CommonFSUtils.java:504) > at > org.apache.hadoop.hbase.util.CommonFSUtils.setStoragePolicy(CommonFSUtils.java:477) > at > org.apache.hadoop.hbase.regionserver.HRegionFileSystem.setStoragePolicy(HRegionFileSystem.java:225) > at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:275) > at > org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6387) > at > org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1115) > at > org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1112) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > Ozone does not support storage policy. If we use > FileSystem.hasPathCapability() API to check before invoking the API, these > warning messages can be avoided. -- This message was sent by Atlassian Jira (v8.20.10#820010)
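The guard proposed in HBASE-27746 can be sketched as a capability check before attempting the unsupported call. This is a minimal stand-in, not HBase's `CommonFSUtils`: the `Fs` interface below mimics the relevant shape of Hadoop's `FileSystem` (which exposes `hasPathCapability(Path, String)`), and the capability string is an assumption modeled on Hadoop's `CommonPathCapabilities`; verify the exact constant against your Hadoop version.

```java
// Minimal stand-in for the Hadoop FileSystem API; real code would call
// FileSystem.hasPathCapability(path, capability) before setStoragePolicy().
interface Fs {
  boolean hasPathCapability(String path, String capability);
  void setStoragePolicy(String path, String policy);
}

public class StoragePolicyUtil {
  // Assumed capability key, modeled on Hadoop's CommonPathCapabilities.
  static final String STORAGE_POLICY_CAPABILITY = "fs.capability.paths.storagepolicy";

  // Returns true only when the policy was actually applied; quietly skips
  // (instead of logging a WARN with a full stack trace) on file systems
  // that lack the capability, such as Ozone's RootedOzoneFileSystem.
  public static boolean setStoragePolicyIfSupported(Fs fs, String path, String policy) {
    if (!fs.hasPathCapability(path, STORAGE_POLICY_CAPABILITY)) {
      return false; // unsupported: never invoke setStoragePolicy()
    }
    fs.setStoragePolicy(path, policy);
    return true;
  }
}
```

The design point is that the probe is per-path, not per-filesystem-class, which is exactly why `hasPathCapability()` is preferable to catching `UnsupportedOperationException` after the fact.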
[jira] [Updated] (HBASE-27903) Skip submitting Split/Merge procedure when split/merge is disabled at table level
[ https://issues.apache.org/jira/browse/HBASE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-27903: --- Labels: pull-request-available (was: ) > Skip submitting Split/Merge procedure when split/merge is disabled at table > level > - > > Key: HBASE-27903 > URL: https://issues.apache.org/jira/browse/HBASE-27903 > Project: HBase > Issue Type: Improvement > Components: Admin >Reporter: Ashok shetty >Assignee: Nihal Jain >Priority: Minor > Labels: pull-request-available > > *Scenario* > If split/merge is disabled at table level, the master will still submit a > SplitTableRegionProcedure/MergeTableRegionsProcedure, and roll it back when > execution fails during pre-checks. > *Improvement* > The master can check this early and avoid submitting a > SplitTableRegionProcedure/MergeTableRegionsProcedure when the split/merge switch > is disabled at table level. > *Steps* > {code:java} > create 'testCreateTableWithMergeDisableParameter', 'f1', {MERGE_ENABLED => > false} > list_regions 'testCreateTableWithMergeDisableParameter' > merge_region > 'd21cdc5d488e8036017696c46cffd9b1','6382c8f731a4f0379b6e98ece4b06e3e' > {code} > {code:java} > create 'testcreatetablewithsplitdisableparameter', 'f1', {SPLIT_ENABLED => > false} > split 'testcreatetablewithsplitdisableparameter','30'{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
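The early check proposed in HBASE-27903 amounts to consulting the table's SPLIT_ENABLED/MERGE_ENABLED attributes before any procedure is created. The sketch below is purely illustrative: the class and its maps are hypothetical stand-ins for the master's table-descriptor lookup, not HBase's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: reject a split/merge request up front when the table-level
// switch is off, instead of submitting a procedure that will roll back.
public class SplitMergeSwitch {
  private final Map<String, Boolean> splitEnabled = new HashMap<>();
  private final Map<String, Boolean> mergeEnabled = new HashMap<>();

  // Stand-in for reading SPLIT_ENABLED / MERGE_ENABLED from the table descriptor.
  public void setTableFlags(String table, boolean split, boolean merge) {
    splitEnabled.put(table, split);
    mergeEnabled.put(table, merge);
  }

  // Returns false (nothing submitted) when the split switch is off.
  public boolean trySubmitSplit(String table) {
    if (!splitEnabled.getOrDefault(table, true)) {
      return false; // fail fast: skip SplitTableRegionProcedure entirely
    }
    // ... a real master would submit SplitTableRegionProcedure here ...
    return true;
  }

  // Same early check for merges.
  public boolean trySubmitMerge(String table) {
    if (!mergeEnabled.getOrDefault(table, true)) {
      return false; // skip MergeTableRegionsProcedure entirely
    }
    return true;
  }
}
```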
[jira] [Updated] (HBASE-28792) AsyncTableImpl calls coprocessor callbacks in undefined order
[ https://issues.apache.org/jira/browse/HBASE-28792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28792: --- Labels: pull-request-available (was: ) > AsyncTableImpl calls coprocessor callbacks in undefined order > - > > Key: HBASE-28792 > URL: https://issues.apache.org/jira/browse/HBASE-28792 > Project: HBase > Issue Type: Bug > Components: Client >Reporter: Charles Connell >Priority: Major > Labels: pull-request-available > > To call a coprocessor endpoint asynchronously, you start by calling > {{AsyncTable#coprocessorService()}}, which gives you a > {{CoprocessorServiceBuilder}}, and a few steps later you can talk to your > coprocessor over the network. One argument to > {{AsyncTable#coprocessorService()}} is a {{CoprocessorCallback}} object, > which contains several methods that will be called during the lifecycle of a > coprocessor endpoint call. {{AsyncTableImpl}}'s implementation of > {{AsyncTable#coprocessorService()}} wraps your {{CoprocessorCallback}} with > its own that delegates the work to a thread pool. A snippet of this: > {code} > @Override > public void onRegionComplete(RegionInfo region, R resp) { > pool.execute(context.wrap(() -> callback.onRegionComplete(region, > resp))); > } > ... > @Override > public void onComplete() { > pool.execute(context.wrap(callback::onComplete)); > } > {code} > The trouble with this is that your implementations of {{onRegionComplete()}} > and {{onComplete()}} will end up getting called in a random order, and/or at > the same time. The tasks of calling them are delegated to a thread pool, and > the completion of those tasks is not waited on, so the thread pool can choose > any ordering it wants to. 
Troublingly, {{onComplete()}} can be called before > the final {{onRegionComplete()}}, which is a violation of [the contract > specified in the {{CoprocessorCallback#onComplete()}} > javadoc|https://github.com/apache/hbase/blob/41dd87cd908d4d089d0b8cff6c88c01ed60622c5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTable.java#L671]. > I discovered this while working on HBASE-28770. I found that > {{AsyncAggregationClient#rowCount()}} returns incorrect results 5-10% of the > time, and this bug is the reason. I presume other {{AsyncAggregationClient}} > methods are similarly affected. -- This message was sent by Atlassian Jira (v8.20.10#820010)
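One way to restore the ordering contract, sketched below with plain JDK types, is to funnel all callback invocations for a single coprocessor call through a single-threaded executor: such an executor runs tasks in submission order, so `onComplete` can no longer overtake the final `onRegionComplete`. This is an illustrative fix pattern, not the patch actually applied in HBASE-28792; class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: callbacks dispatched through a single-threaded executor preserve
// submission order, unlike a shared multi-threaded pool.
public class OrderedCallbackDispatch {
  public static List<String> dispatch(int regions) {
    // Only the single worker thread mutates this list; we read it after
    // awaitTermination, which establishes the needed happens-before edge.
    List<String> events = new ArrayList<>();
    ExecutorService pool = Executors.newSingleThreadExecutor();
    for (int i = 0; i < regions; i++) {
      final int region = i;
      pool.execute(() -> events.add("onRegionComplete:" + region));
    }
    pool.execute(() -> events.add("onComplete")); // submitted last, runs last
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return events;
  }
}
```

The trade-off is that per-region callbacks for one call no longer run in parallel; an alternative that keeps parallelism would be a completion counter that defers `onComplete` until the last region's task finishes.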
[jira] [Updated] (HBASE-28770) Pass partial results from AggregateImplementation when quotas are exceeded
[ https://issues.apache.org/jira/browse/HBASE-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28770: --- Labels: pull-request-available (was: ) > Pass partial results from AggregateImplementation when quotas are exceeded > -- > > Key: HBASE-28770 > URL: https://issues.apache.org/jira/browse/HBASE-28770 > Project: HBase > Issue Type: Improvement > Components: Coprocessors >Reporter: Charles Connell >Assignee: Charles Connell >Priority: Major > Labels: pull-request-available > > Currently there is a gap in the coverage of HBase's quota-based workload > throttling. Requests sent by {{[Async]AggregationClient}} reach > {{AggregateImplementation}}. This then executes Scans in a way that bypasses > the quota system. We see issues with this at HubSpot where clusters suffer > under this load and we don't have a good way to protect them. > In this ticket I'm teaching {{AggregateImplementation}} to optionally stop > scanning when a throttle is violated, and send back just the results it has > accumulated so far. In addition, it will send back a row key to > {{AsyncAggregationClient}}. When the client gets a response with a row key, > it will sleep in order to satisfy the throttle, and then send a new request > with a scan starting at that row key. This will have the effect of continuing > the work where the last request stopped. > This feature will be unconditionally enabled by {{AsyncAggregationClient}} > once this ticket is finished. {{AggregateImplementation}} will not assume > that clients support partial results, however, so it can keep supporting > older clients. For clients that do not support partial results, throttles > will not be respected, and results will always be complete. > This feature was [first proposed on the mailing > list|https://lists.apache.org/thread/1vqnxb71z7swq2cogz4qg3cn6b10xp4v]. > Builds on work in HBASE-28346. -- This message was sent by Atlassian Jira (v8.20.10#820010)
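The resume protocol described in HBASE-28770 can be simulated end-to-end with a sorted in-memory "table": the server counts rows until a per-request budget (standing in for the quota) is exhausted, returns the partial count plus the row key to resume from, and the client loops until no resume key comes back. All names below are hypothetical; a real client would also sleep between requests to satisfy the throttle.

```java
import java.util.SortedMap;

// Illustrative simulation of partial aggregation with resume keys.
public class PartialAggregation {
  // Server response: partial row count plus the next row to scan, or null when done.
  record Response(long count, String resumeRow) {}

  // "Server side": count rows starting at startRow, stopping at the budget.
  static Response serverCount(SortedMap<String, byte[]> table, String startRow, int budget) {
    long count = 0;
    for (String row : table.tailMap(startRow).keySet()) {
      if (count >= budget) {
        return new Response(count, row); // quota hit: partial result + resume key
      }
      count++;
    }
    return new Response(count, null); // scanned to the end of the table
  }

  // "Client side": accumulate partial counts until the server reports completion.
  public static long clientRowCount(SortedMap<String, byte[]> table, int budget) {
    long total = 0;
    String startRow = ""; // scan from the beginning
    while (true) {
      Response r = serverCount(table, startRow, budget);
      total += r.count();
      if (r.resumeRow() == null) {
        return total;
      }
      startRow = r.resumeRow(); // real client would sleep here to honor the throttle
    }
  }
}
```

Note that `tailMap(startRow)` is inclusive of the resume key, which is correct here because the resume row was never counted by the previous request.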
[jira] [Updated] (HBASE-28793) Update hbase-thirdparty to 4.1.8
[ https://issues.apache.org/jira/browse/HBASE-28793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28793: --- Labels: pull-request-available (was: ) > Update hbase-thirdparty to 4.1.8 > > > Key: HBASE-28793 > URL: https://issues.apache.org/jira/browse/HBASE-28793 > Project: HBase > Issue Type: Task > Components: dependencies >Reporter: Pankaj Kumar >Assignee: Pankaj Kumar >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28786) Fix classname for command: copyreppeers in bin/hbase
[ https://issues.apache.org/jira/browse/HBASE-28786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28786: --- Labels: pull-request-available (was: ) > Fix classname for command: copyreppeers in bin/hbase > > > Key: HBASE-28786 > URL: https://issues.apache.org/jira/browse/HBASE-28786 > Project: HBase > Issue Type: Bug >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Minor > Labels: pull-request-available > > Stumbled upon this. Digging deeper, it seems during review we missed renaming > the classname in bin/hbase when the actual class was renamed from > ReplicationPeerMigrationTool -> CopyReplicationPeers > > See > https://github.com/apache/hbase/compare/69603351b3f2817c74d869d32da0596bab3c409e..1d11ce96c44277df6ccdd16ae2c9d8a1c419f3da > [hbase@hostname~]$ hbase copyreppeers > Error: Could not find or load main class > org.apache.hadoop.hbase.replication.ReplicationPeerMigrationTool > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.replication.ReplicationPeerMigrationTool > > FYI [~zhangduo] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28784) Exclude samples and release-documentation zip of jaxws-ri from output tarball
[ https://issues.apache.org/jira/browse/HBASE-28784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28784: --- Labels: pull-request-available (was: ) > Exclude samples and release-documentation zip of jaxws-ri from output tarball > - > > Key: HBASE-28784 > URL: https://issues.apache.org/jira/browse/HBASE-28784 > Project: HBase > Issue Type: Task >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > Found this while (testing HBASE-28070 and) I was checking lib folder for > extracted assembly for master. I guess this must be a problem for all > branches fixed for HBASE-28760 > Following zip files are there in hbase-4.0.0-alpha-1-SNAPSHOT/lib: > * samples-2.3.2.zip > * release-documentation-2.3.2-docbook.zip -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28783) Concurrent execution of normalizer operations on tables in RegionNormalizerWorker
[ https://issues.apache.org/jira/browse/HBASE-28783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28783: --- Labels: pull-request-available (was: ) > Concurrent execution of normalizer operations on tables in > RegionNormalizerWorker > - > > Key: HBASE-28783 > URL: https://issues.apache.org/jira/browse/HBASE-28783 > Project: HBase > Issue Type: Improvement > Components: Normalizer >Reporter: MisterWang >Assignee: MisterWang >Priority: Major > Labels: pull-request-available > > Recently, I have been managing the large tables in the HBase cluster by > enabling the normalizer to set the size of the regions and keep the number of > regions within a reasonable range. > The current code retrieves tables from RegionNormalizerWorkQueue and performs > normalization operations on the tables serially. When there are multiple > large tables in a cluster that need to be managed, the efficiency will be > very low. > I have confirmed that each split or merge plan generated for each table during > the normalizer process is limited by RateLimiter, so I think it is > reasonable to normalize tables concurrently. > In terms of implementation, create a thread pool for executing tasks in > RegionNormalizerWorker, with a default pool size of 1, and provide a > configuration parameter to change it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
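The proposed change in HBASE-28783 boils down to handing tables pulled off the work queue to a configurable pool instead of normalizing them serially, with a default pool size of 1 preserving today's behavior. The sketch below is a JDK-only illustration; the class name and the shape of "normalizing" a table are invented for the example.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a worker that normalizes tables concurrently on a fixed pool.
public class NormalizerWorkerSketch {
  // poolSize would come from a (hypothetical) configuration key; 1 keeps
  // the current serial behavior.
  public static int normalizeAll(List<String> tables, int poolSize) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, poolSize));
    AtomicInteger normalized = new AtomicInteger();
    for (String table : tables) {
      pool.execute(() -> {
        // Each generated split/merge plan would still be throttled by the
        // normalizer's RateLimiter, so concurrency stays bounded.
        normalized.incrementAndGet();
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return normalized.get();
  }
}
```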
[jira] [Updated] (HBASE-28778) NPE may occur when opening master-status or table.jsp or procedure.jsp while Master is initializing
[ https://issues.apache.org/jira/browse/HBASE-28778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28778: --- Labels: pull-request-available (was: ) > NPE may occur when opening master-status or table.jsp or procedure.jsp while > Master is initializing > --- > > Key: HBASE-28778 > URL: https://issues.apache.org/jira/browse/HBASE-28778 > Project: HBase > Issue Type: Bug > Components: UI >Reporter: guluo >Priority: Major > Labels: pull-request-available > > The reason: > For table.jsp, NPE may occur when calling master.getConnection() while > Master is initializing. > asyncClusterConnection will only be initialized when HMaster calls > setupClusterConnection, > so before that, asyncClusterConnection is null; at this moment, we would get > an NPE when opening table.jsp. > procedure.jsp and master-status may also encounter an NPE for a similar reason. > > Error Message: > java.lang.NullPointerException: Cannot invoke > "org.apache.hadoop.hbase.procedure2.ProcedureExecutor.getProcedures()" > because "procExecutor" is null at > org.apache.hadoop.hbase.generated.master.procedures_jsp._jspService(procedures_jsp.java:76) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:111) at > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at > org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1450) > at > org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) > at > org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) > at > org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:117) > at > org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > > 2024-08-13T20:41:40,056 WARN [qtp463313451-80] server.HttpChannel: > /master-status > 
java.lang.NullPointerException: Cannot invoke > "org.apache.hadoop.hbase.master.assignment.RegionStateNode.isInState(org.apache.hadoop.hbase.master.RegionState$State[])" > because "rsn" is null > at > org.apache.hadoop.hbase.master.http.MasterStatusServlet.getMetaLocationOrNull(MasterStatusServlet.java:80) > ~[hbase-server-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] > at > org.apache.hadoop.hbase.master.http.MasterStatusServlet.doGet(MasterStatusServlet.java:60) > ~[hbase-server-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > ~[javax.servlet-api-3.1.0.jar:3.1.0] > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > ~[javax.servlet-api-3.1.0.jar:3.1.0] > at > org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) > ~[hbase-shaded-jetty-4.1.7.jar:?] > at > org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) > ~[hbase-shaded-jetty-4.1.7.jar:?] > at > org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:117) > ~[hbase-http-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28654) Support only remove expired files in the compact process
[ https://issues.apache.org/jira/browse/HBASE-28654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28654: --- Labels: pull-request-available (was: ) > Support only remove expired files in the compact process > > > Key: HBASE-28654 > URL: https://issues.apache.org/jira/browse/HBASE-28654 > Project: HBase > Issue Type: Improvement > Components: Compaction >Reporter: MisterWang >Assignee: MisterWang >Priority: Minor > Labels: pull-request-available > > As is well known, compaction generates a certain amount of I/O. > But in some cases, not compacting in time does not affect online services. > For example, in the scenario where the TTL time of a table is short, there > are more writes than reads. To ensure write performance and reduce cluster > I/O pressure, removing only expired files in the compaction process is a good > idea. > Another usage scenario: tables can have a primary and a backup, and the backup > table can add this attribute to reduce compaction I/O on the backup cluster. > Optimization solution: Add a table attribute for the HBase table -- This message was sent by Atlassian Jira (v8.20.10#820010)
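The "expired-files-only" selection proposed in HBASE-28654 can be illustrated as a filter over store files: a file whose newest cell has already passed the table's TTL can be dropped outright, with no read/rewrite I/O. The record and field names below are hypothetical stand-ins, not HBase's store-file metadata API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative selection of wholly-expired store files for deletion.
public class ExpiredFileSelector {
  // Minimal stand-in for a store file: just its newest cell timestamp (ms).
  record StoreFile(String name, long maxTimestampMs) {}

  // A file is wholly expired when even its newest cell is older than the TTL,
  // so it can be removed without rewriting any data.
  public static List<StoreFile> selectExpired(List<StoreFile> files, long ttlMs, long nowMs) {
    return files.stream()
        .filter(f -> f.maxTimestampMs() + ttlMs < nowMs)
        .collect(Collectors.toList());
  }
}
```

Files containing any still-live cells are left untouched, which is what keeps this mode nearly free of compaction I/O.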
[jira] [Updated] (HBASE-28777) mTLS client hostname verification doesn't work with OptionalSslHandler
[ https://issues.apache.org/jira/browse/HBASE-28777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28777: --- Labels: pull-request-available (was: ) > mTLS client hostname verification doesn't work with OptionalSslHandler > -- > > Key: HBASE-28777 > URL: https://issues.apache.org/jira/browse/HBASE-28777 > Project: HBase > Issue Type: Bug > Components: rpc >Affects Versions: 3.0.0-beta-1 >Reporter: Andor Molnar >Assignee: Andor Molnar >Priority: Major > Labels: pull-request-available > > Netty's OptionalSslHandler cannot carry hostport information to SslEngine, > hence HBASE-27673 fixed the TLS-only case only. We need to have a custom > handling for the plaintext-enabled TLS mode in order to support client > hostname verification in that case too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28775) Change the output of DatanodeInfo in the log to the hostname of the datanode
[ https://issues.apache.org/jira/browse/HBASE-28775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28775: --- Labels: pull-request-available (was: ) > Change the output of DatanodeInfo in the log to the hostname of the datanode > > > Key: HBASE-28775 > URL: https://issues.apache.org/jira/browse/HBASE-28775 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: MisterWang >Priority: Minor > Labels: pull-request-available > > Currently, DatanodeInfo is output in the log. When we are > troubleshooting and searching for slow datanode nodes, we need to convert IP > addresses to hostnames, which is quite cumbersome. > I think the log output should have good readability, so it would be better to > output the hostname of the datanode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-27781) AssertionError in AsyncRequestFutureImpl when timing out during location resolution
[ https://issues.apache.org/jira/browse/HBASE-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-27781: --- Labels: pull-request-available (was: ) > AssertionError in AsyncRequestFutureImpl when timing out during location > resolution > --- > > Key: HBASE-27781 > URL: https://issues.apache.org/jira/browse/HBASE-27781 > Project: HBase > Issue Type: Bug >Reporter: Bryan Beaudreault >Assignee: Daniel Roudnitsky >Priority: Major > Labels: pull-request-available > > In AsyncFutureRequestImpl we fail fast when operation timeout is exceeded > during location resolution > [here|https://github.com/apache/hbase/blob/branch-2.5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L460-L462]. > In that handling, we loop all actions and set them as failed. The problem > is, some number of actions may have already finished when we get to this spot. > So the actionsInProgress would have been decremented for those already, and now > we're going to decrement by all actions. This causes an assertion error since > we go negative > [here|https://github.com/apache/hbase/blob/branch-2.5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L1197] > We still want to fail all actions, because none will be executed. But we need > special handling to avoid this case. Maybe don't bother decrementing the > actionsInProgress at all, instead set it to 0. -- This message was sent by Atlassian Jira (v8.20.10#820010)
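The bug and the suggested fix from HBASE-27781 can be reproduced with a bare `AtomicLong` standing in for `actionsInProgress`. This is an illustration of the counting hazard, not the real `AsyncRequestFutureImpl` code; method names are invented.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: why decrementing once per action on fail-fast goes negative,
// and why zeroing the counter instead does not.
public class ActionCounter {
  private final AtomicLong actionsInProgress;

  public ActionCounter(int totalActions) {
    this.actionsInProgress = new AtomicLong(totalActions);
  }

  // Normal path: each completed action decrements the counter once.
  public void actionCompleted() {
    actionsInProgress.decrementAndGet();
  }

  // Buggy fail-fast: decrements once per action regardless of how many
  // already completed (and already decremented), so the count can go
  // negative and trip the reported AssertionError.
  public void failAllBuggy(int totalActions) {
    actionsInProgress.addAndGet(-totalActions);
  }

  // Suggested fix: nothing else will run, so just zero the counter.
  public void failAllFixed() {
    actionsInProgress.set(0);
  }

  public long remaining() {
    return actionsInProgress.get();
  }
}
```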
[jira] [Updated] (HBASE-28771) Add support for non replica actions to AsyncRequestFutureImpl.isActionComplete
[ https://issues.apache.org/jira/browse/HBASE-28771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28771: --- Labels: pull-request-available (was: ) > Add support for non replica actions to AsyncRequestFutureImpl.isActionComplete > -- > > Key: HBASE-28771 > URL: https://issues.apache.org/jira/browse/HBASE-28771 > Project: HBase > Issue Type: Sub-task > Components: Client >Reporter: Daniel Roudnitsky >Assignee: Daniel Roudnitsky >Priority: Minor > Labels: pull-request-available > > The current isActionComplete method we have in AsyncRequestFutureImpl is only > designed to support replica actions, it would be useful to have > isActionComplete support non replica actions that could be reused for > HBASE-28358 and in other paths where non replica actions are being handled. > Since isActionComplete currently only has one caller we can move the replica > action check to the caller method instead of having the check be inside > isActionComplete. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28669) After one RegionServer restarts, another RegionServer leaks a connection to ZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-28669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28669: --- Labels: Replication pull-request-available (was: Replication) > After one RegionServer restarts, another RegionServer leaks a connection to > ZooKeeper > - > > Key: HBASE-28669 > URL: https://issues.apache.org/jira/browse/HBASE-28669 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.4.5 >Reporter: ZhongYou Li >Priority: Minor > Labels: Replication, pull-request-available > > The peer "to_pd_A" has been removed, but there is an error log in > RegionServer, error log: > {code:java} > 2024-06-11 09:42:34.074 ERROR > [ReplicationExecutor-0.replicationSource,to_pd_A-172.30.12.12,6002,1709612684705-SendThread(bjtx-hbase-onll-meta-01:2181)] > client.StaticHostProvider: Unable to resolve address: > bjtx-hbase-onll-meta-03:2181 > java.net.UnknownHostException: bjtx-hbase-onll-meta-03 > at java.net.InetAddress$CachedAddresses.get(InetAddress.java:764) > at java.net.InetAddress.getAllByName0(InetAddress.java:1291) > at java.net.InetAddress.getAllByName(InetAddress.java:1144) > at java.net.InetAddress.getAllByName(InetAddress.java:1065) > at > org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:92) > at > org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:147) > at > org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:375) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1137){code} > Here are the steps to reproduce: > I have 3 RegionServers. The following steps can reproduce the phenomenon of > ZK connection leakage: > 1. Enable replication > 2. Create a peer > 3. Shut down any two RegionServers for a few minutes and restart them > 4. 
Print the thread stack on the RegionServer that was not shut down, search > for the keyword , and you can see that there are 4 extra ZooKeeper > threads > 5. Even after removing the peer, the 4 extra threads still exist > The following is the thread stack leak in one of my RegionServers: > {code:java} > "ReplicationExecutor-0.replicationSource,lizy_test_replication-10.0.16.29,6002,1718180442225-EventThread" > #610 daemon prio=5 os_prio=0 cpu=0.27ms elapsed=466.94s > tid=0x7efc58179000 nid=0x5a051 waiting on condition [0x7efc2cdef000] > "ReplicationExecutor-0.replicationSource,lizy_test_replication-10.0.16.29,6002,1718180442225-SendThread(10.0.16.100:2181)" > #609 daemon prio=5 os_prio=0 cpu=3.02ms elapsed=466.94s > tid=0x7efc58178800 nid=0x5a050 runnable [0x7efc2cef] > "ReplicationExecutor-0.replicationSource,lizy_test_replication-10.0.16.9,6002,1718180457260-EventThread" > #505 daemon prio=5 os_prio=0 cpu=0.27ms elapsed=556.09s > tid=0x7efc50094800 nid=0x59c04 waiting on condition [0x7efc2d7f7000] > "ReplicationExecutor-0.replicationSource,lizy_test_replication-10.0.16.9,6002,1718180457260-SendThread(10.0.16.100:2181)" > #504 daemon prio=5 os_prio=0 cpu=3.72ms elapsed=556.09s > tid=0x7efc50093000 nid=0x59c03 runnable [0x7efc2d8f8000] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28769) Create PoC for RangeStoreFileReader to support multi region splitting
[ https://issues.apache.org/jira/browse/HBASE-28769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28769: --- Labels: pull-request-available (was: ) > Create PoC for RangeStoreFileReader to support multi region splitting > - > > Key: HBASE-28769 > URL: https://issues.apache.org/jira/browse/HBASE-28769 > Project: HBase > Issue Type: Sub-task >Reporter: Rajeshbabu Chintaguntla >Assignee: Rajeshbabu Chintaguntla >Priority: Major > Labels: pull-request-available > > This is the JIRA to create the PoC of the range store file reader. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28760) Exclude pom file of jaxws-ri in output tarball
[ https://issues.apache.org/jira/browse/HBASE-28760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28760: --- Labels: pull-request-available (was: ) > Exclude pom file of jaxws-ri in output tarball > -- > > Key: HBASE-28760 > URL: https://issues.apache.org/jira/browse/HBASE-28760 > Project: HBase > Issue Type: Bug > Components: jenkins, scripts >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > > Permission denied... > Not sure what is the real problem. > {noformat} > 17:21:17 Building a binary tarball from the source tarball succeeded. > [Pipeline] echo > 17:21:17 unpacking the hbase bin tarball into 'hbase-install' and the client > tarball into 'hbase-client' > [Pipeline] sh > 17:21:18 tar: /jaxws-ri-2.3.2.pom: Cannot open: Permission denied > 17:21:20 tar: Exiting with failure status due to previous errors > Post stage > [Pipeline] stash > 17:21:20 Warning: overwriting stash ‘srctarball-result’ > 17:21:20 Stashed 2 file(s) > [Pipeline] sshPublisher > 17:21:20 SSH: Current build result is [FAILURE], not going to run. > [Pipeline] sh > 17:21:20 Remove > /home/jenkins/jenkins-home/workspace/HBase_Nightly_master/output-srctarball/hbase-src.tar.gz > for saving space > [Pipeline] archiveArtifacts > 17:21:20 Archiving artifacts > [Pipeline] archiveArtifacts > 17:21:20 Archiving artifacts > [Pipeline] archiveArtifacts > 17:21:20 Archiving artifacts > [Pipeline] archiveArtifacts > 17:21:20 Archiving artifacts > [Pipeline] } > [Pipeline] // withEnv > [Pipeline] } > [Pipeline] // node > [Pipeline] } > [Pipeline] // stage > [Pipeline] } > 17:21:20 Failed in branch packaging and integration > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28767) Simplify backup bulk-loading code
[ https://issues.apache.org/jira/browse/HBASE-28767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28767: --- Labels: pull-request-available (was: ) > Simplify backup bulk-loading code > - > > Key: HBASE-28767 > URL: https://issues.apache.org/jira/browse/HBASE-28767 > Project: HBase > Issue Type: Task > Components: backup&restore >Reporter: Dieter De Paepe >Priority: Minor > Labels: pull-request-available > > While working on HBASE-28706, I came across a lot of overly complex/duplicate > code related to how bulk uploads are tracked for backups. > This ticket is to simplify some of it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28648) Change the deprecation cycle for RegionObserver.postInstantiateDeleteTracker
[ https://issues.apache.org/jira/browse/HBASE-28648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28648: --- Labels: pull-request-available (was: ) > Change the deprecation cycle for RegionObserver.postInstantiateDeleteTracker > > > Key: HBASE-28648 > URL: https://issues.apache.org/jira/browse/HBASE-28648 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Reporter: Duo Zhang >Assignee: Liangjun He >Priority: Major > Labels: pull-request-available > > The visibility label feature still uses this method, so it cannot be removed in > 3.0.0. The deprecation cycle javadoc should be changed accordingly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-23778) Region History Redo
[ https://issues.apache.org/jira/browse/HBASE-23778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-23778: --- Labels: pull-request-available (was: ) > Region History Redo > --- > > Key: HBASE-23778 > URL: https://issues.apache.org/jira/browse/HBASE-23778 > Project: HBase > Issue Type: Improvement >Reporter: Swaroopa Kadam >Assignee: Akshita Jain >Priority: Major > Labels: pull-request-available > > My initial thought is mainly to extend the HBase shell(gradually extend to > CLI and UI) to allow the use of > {code:java} > where {code} > to get the necessary information and allow passing history as an additional > parameter to get the history. We can configure how many transitions we want > to store so that anything (ZK or small data structure in the table region > itself or maybe something else) that is used for state management is not > exploded. > As pointed by > [~andrew.purt...@gmail.com] need to be watchful of mistakes done in the past: > HBASE-533 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28389) HBase backup yarn queue parameter ignored
[ https://issues.apache.org/jira/browse/HBASE-28389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28389: --- Labels: pull-request-available (was: ) > HBase backup yarn queue parameter ignored > - > > Key: HBASE-28389 > URL: https://issues.apache.org/jira/browse/HBASE-28389 > Project: HBase > Issue Type: Bug > Components: backup&restore >Affects Versions: 2.6.0 > Environment: HBase branch-2.6 >Reporter: Dieter De Paepe >Assignee: Liangjun He >Priority: Major > Labels: pull-request-available > > It seems the parameter to specify the yarn queue for HBase backup (`-q`) is > ignored: > {code:java} > hbase backup create full hdfs:///tmp/backups/hbasetest/hbase -q hbase-backup > {code} > gets executed on the "default" queue. > Setting the queue through the configuration does work. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28690) Aborting Active HMaster is not rejecting reportRegionStateTransition if procedure is initialised by next Active master
[ https://issues.apache.org/jira/browse/HBASE-28690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28690: --- Labels: pull-request-available (was: ) > Aborting Active HMaster is not rejecting reportRegionStateTransition if > procedure is initialised by next Active master > -- > > Key: HBASE-28690 > URL: https://issues.apache.org/jira/browse/HBASE-28690 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Affects Versions: 2.5.8 >Reporter: Umesh Kumar Kumawat >Assignee: Umesh Kumar Kumawat >Priority: Major > Labels: pull-request-available > > A CloseRegionProcedure on master requests the RS to close the region and > after closing the region RS reports RegionStateTransition > back([here|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L1853]). > On receiving the report, the master checks if regionNode has any procedure > assigned to it > ([code|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1294]). > > > {code:java} > private boolean reportTransition(RegionStateNode regionNode, ServerStateNode > serverNode, > TransitionCode state, long seqId, long procId) throws IOException { > ServerName serverName = serverNode.getServerName(); > TransitRegionStateProcedure proc = regionNode.getProcedure(); > if (proc == null) { > return false; > } > > proc.reportTransition(master.getMasterProcedureExecutor().getEnvironment(), > regionNode, > serverName, state, seqId, procId); > return true; > } {code} > If regionNode doesn't have any procedure, the master just logs it and doesn't > throw any error to RPC. > > Think of a case when MasterFailover is happening and the new Active master > only initialized the TRSP and CloseRegionProcedure. Now aborting Master has > stale/false data. 
If the transition report comes to the aborting master, not > rejecting this report is causing the procedure to get stuck. > > *Logs for more understanding* > active master server4-1 failing > {noformat} > 2024-06-20 04:45:05,576 ERROR > [iority.RWQ.Fifo.write.handler=3,queue=0,port=61000] master.HMaster - * > ABORTING master server4-1,61000,1715413775736: Failed to record region server > as started *{noformat} > *logs of new active master server5-1* > > {noformat} > 2024-06-20 04:49:28,893 DEBUG [aster/server5-1:61000:becomeActiveMaster] > assignment.RegionStateStore - Load hbase:meta entry > region=888a715d5926adbb89c985d8967f40d4, regionState=OPEN, > lastHost=server1-119,61020,1717560166420, > regionLocation=server1-119,61020,1717560166420, openSeqNum=34892620 > 024-06-20 04:49:51,886 INFO [PEWorker-22] procedure2.ProcedureExecutor - > Initialized subprocedures=[{pid=16276416, ppid=16276108, > state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure > table=RIMBS.UPLOADER_JOB_DETAILS, region=888a715d5926adbb89c985d8967f40d4, > UNASSIGN}] (on server5-1) > 2024-06-20 04:49:52,022 INFO [PEWorker-40] procedure2.ProcedureExecutor - > Initialized subprocedures=[{pid=16276470, ppid=16276416, state=RUNNABLE; > CloseRegionProcedure 888a715d5926adbb89c985d8967f40d4, > server=server1-119,61020,1717560166420}] (on server5-1){noformat} > > *RS logs for closing* > {noformat} > 2024-06-20 04:49:52,267 INFO [_REGION-regionserver/server1-119:61020-2] > handler.UnassignRegionHandler - Close 888a715d5926adbb89c985d8967f40d4 > 2024-06-20 04:49:52,267 DEBUG [_REGION-regionserver/server1-119:61020-2] > regionserver.HRegion - Closing 888a715d5926adbb89c985d8967f40d4, disabling > compactions & flushes > 2024-06-20 04:49:52,354 INFO [_REGION-regionserver/server1-119:61020-2] > regionserver.HRegion - Closed > TABLE,KW\x00na240-app1-16\x00/Events-120620231740\x00MARKER-Events,1702619592612.888a715d5926adbb89c985d8967f40d4. 
> {noformat} > *Logs of report on aborting active Hmaster* > {noformat} > 2024-06-20 04:49:52,355 WARN > [iority.RWQ.Fifo.write.handler=1,queue=0,port=61000] > assignment.AssignmentManager - No matching procedure found for > server1-119,61020,1717560166420 transition on state=OPEN, > location=server1-119,61020,1717560166420, table=RIMBS.UPLOADER_JOB_DETAILS, > region=888a715d5926adbb89c985d8967f40d4 to CLOSED ( host = server4-1 , > hbaseMasterLogFile){noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
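The fix implied above is for an aborting master to reject the transition report instead of only logging it, so the RegionServer retries against the new active master. A minimal sketch of that guard follows; the class and method shapes are illustrative, not HBase's actual AssignmentManager API:

```java
// Illustrative sketch (not the real AssignmentManager): a master that is
// stopping should reject reportRegionStateTransition loudly, so the RS
// retries against the new active master instead of the TRSP getting stuck.
public class TransitionReportGuard {
    private final boolean masterStopping;

    public TransitionReportGuard(boolean masterStopping) {
        this.masterStopping = masterStopping;
    }

    /** Returns true if the report was accepted by a matching procedure. */
    public boolean reportTransition(boolean hasMatchingProcedure) {
        if (masterStopping) {
            // Reject rather than silently dropping the report (the silent
            // drop is what leaves the procedure stuck in the logs above).
            throw new IllegalStateException(
                "Master is stopping; rejecting region state transition report");
        }
        return hasMatchingProcedure;
    }
}
```

An RPC-level rejection like this turns the "No matching procedure found" warning into a retryable error on the RegionServer side.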
[jira] [Updated] (HBASE-28758) Remove the aarch64 profile
[ https://issues.apache.org/jira/browse/HBASE-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28758: --- Labels: beginner pull-request-available (was: beginner) > Remove the aarch64 profile > -- > > Key: HBASE-28758 > URL: https://issues.apache.org/jira/browse/HBASE-28758 > Project: HBase > Issue Type: Improvement > Components: build, pom, Protobufs >Reporter: Duo Zhang >Assignee: MisterWang >Priority: Major > Labels: beginner, pull-request-available > > We do not depend on protobuf 2.5 on branch-3+, so we do not need the special > protoc compiler for arm any more. > Just remove the profile. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28250) Bump jruby to 9.4.8.0 to fix snakeyaml CVE
[ https://issues.apache.org/jira/browse/HBASE-28250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28250: --- Labels: pull-request-available (was: ) > Bump jruby to 9.4.8.0 to fix snakeyaml CVE > -- > > Key: HBASE-28250 > URL: https://issues.apache.org/jira/browse/HBASE-28250 > Project: HBase > Issue Type: Task > Components: jruby, security, shell >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > As a follow up of HBASE-28249, we want to bump to latest 9.4.x line here. > This release line drops critical snakeyaml CVE ({*}org.yaml : snakeyaml : > 1.33{*} having > [CVE-2022-1471|https://nvd.nist.gov/vuln/detail/CVE-2022-1471]) from our > classpath with following change along with several other bugs/fixes: > * The Psych YAML library is updated to 5.1.0. This version switches the > JRuby extension to SnakeYAML Engine, avoiding CVEs against the original > SnakeYAML and updating YAML compatibility to specification version 1.2. > [#6365|https://github.com/jruby/jruby/issues/6365], > [#7570|https://github.com/jruby/jruby/issues/7570], > [#7626|https://github.com/jruby/jruby/pull/7626] > NOTE: JRuby 9.4.x targets Ruby 3.1 compatibility instead of Ruby 2.6 which > 9.3.x were having! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28729) Change the generic type of List in InternalScanner.next
[ https://issues.apache.org/jira/browse/HBASE-28729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28729: --- Labels: pull-request-available (was: ) > Change the generic type of List in InternalScanner.next > --- > > Key: HBASE-28729 > URL: https://issues.apache.org/jira/browse/HBASE-28729 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors, regionserver >Reporter: Duo Zhang >Priority: Major > Labels: pull-request-available > > Plan to change it from List to List, so we could > pass both List and List to it, or even List for > coprocessors. > This could save a lot of casting in our main code. > This is an incompatible change for coprocessors, so it will only go into > branch-3+, and will be marked as incompatible change. -- This message was sent by Atlassian Jira (v8.20.10#820010)
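The generic type parameters in the description above were stripped during extraction; assuming the proposal is a lower-bounded wildcard such as List<? super ExtendedCell>, the variance it buys can be sketched with stand-in types:

```java
import java.util.ArrayList;
import java.util.List;

public class WildcardSketch {
    // Stand-ins for the HBase cell hierarchy, only to show the variance.
    interface Cell {}
    interface ExtendedCell extends Cell {}

    // A lower-bounded wildcard lets one signature accept a List of
    // ExtendedCell or of any of its supertypes (e.g. Cell), so callers
    // holding either list type need no casts.
    static boolean next(List<? super ExtendedCell> result) {
        result.add(new ExtendedCell() {}); // the scanner produces ExtendedCells
        return false;                      // false = no more rows, in this sketch
    }
}
```

With this shape, both a `List<Cell>` and a `List<ExtendedCell>` can be passed to the same `next` call, which is the casting savings the issue describes.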
[jira] [Updated] (HBASE-28587) Remove deprecated methods in Cell
[ https://issues.apache.org/jira/browse/HBASE-28587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28587: --- Labels: pull-request-available (was: ) > Remove deprecated methods in Cell > - > > Key: HBASE-28587 > URL: https://issues.apache.org/jira/browse/HBASE-28587 > Project: HBase > Issue Type: Sub-task > Components: API, Client >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28584) RS SIGSEGV under heavy replication load
[ https://issues.apache.org/jira/browse/HBASE-28584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28584: --- Labels: pull-request-available (was: ) > RS SIGSEGV under heavy replication load > --- > > Key: HBASE-28584 > URL: https://issues.apache.org/jira/browse/HBASE-28584 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.5.6 > Environment: RHEL 7.9 > JDK 11.0.23 > Hadoop 3.2.4 > Hbase 2.5.6 >Reporter: Whitney Jackson >Assignee: Andrew Kyle Purtell >Priority: Major > Labels: pull-request-available > Attachments: > 0001-Deep-clone-cells-set-to-be-replicated-onto-the-local.patch, > 0001-Support-configuration-based-selection-of-netty-chann.patch, > rs_profile_after.html, rs_profile_before.html > > > I'm observing RS crashes under heavy replication load: > > {code:java} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f7546873b69, pid=29890, tid=36828 > # > # JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.23+7) (build > 11.0.23+7-LTS-222) > # Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.23+7-LTS-222, mixed > mode, tiered, compressed oops, g1 gc, linux-amd64) > # Problematic frame: > # J 24625 c2 > org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V > (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209] > {code} > > The heavier load comes when a replication peer has been disabled for several > hours for patching etc. When the peer is re-enabled the replication load is > high until the peer is all caught up. The crashes happen on the cluster > receiving the replication edits. > > I believe this problem started after upgrading from 2.4.x to 2.5.x. 
> > One possibly relevant non-standard config I run with: > {code:java} > <property> > <name>hbase.region.store.parallel.put.limit</name> > <value>100</value> > <description>Added after seeing "failed to accept edits" replication errors > in the destination region servers indicating this limit was being exceeded > while trying to process replication edits.</description> > </property> > {code} > > I understand from other Jiras that the problem is likely around direct memory > usage by Netty. I haven't yet tried switching the Netty allocator to > {{unpooled}} or {{{}heap{}}}. I also haven't yet tried any of the > {{io.netty.allocator.*}} options. > > {{MaxDirectMemorySize}} is set to 26g. > > Here's the full stack for the relevant thread: > > {code:java} > Stack: [0x7f72e2e5f000,0x7f72e2f6], sp=0x7f72e2f5e450, free > space=1021k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > J 24625 c2 > org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V > (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209] > J 26253 c2 > org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I > (21 bytes) @ 0x7f7545af2d84 [0x7f7545af2d20+0x0064] > J 22971 c2 > org.apache.hadoop.hbase.codec.KeyValueCodecWithTags$KeyValueEncoder.write(Lorg/apache/hadoop/hbase/Cell;)V > (27 bytes) @ 0x7f754663f700 [0x7f754663f4c0+0x0240] > J 25251 c2 > org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V > (90 bytes) @ 0x7f7546a53038 [0x7f7546a50e60+0x21d8] > J 21182 c2 > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V > (73 bytes) @ 0x7f7545f4d90c [0x7f7545f4d3a0+0x056c] > J 21181 c2 > 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V > (149 bytes) @ 0x7f7545fd680c [0x7f7545fd65e0+0x022c] > J 25389 c2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$$Lambda$247.run()V > (16 bytes) @ 0x7f7546ade660 [0x7f7546ade140+0x0520] > J 24098 c2 > org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z > (109 bytes) @ 0x7f754678fbb8 [0x7f754678f8e0+0x02d8] > J 27297% c2 > org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run()V (603 > bytes) @ 0x7f75466c4d48 [0x7f75466c4c80+0x00c8] > j > org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run()V+44 > j > org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.
[jira] [Updated] (HBASE-28756) RegionSizeCalculator ignored the size of memstore, which leads Spark miss data
[ https://issues.apache.org/jira/browse/HBASE-28756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28756: --- Labels: pull-request-available (was: ) > RegionSizeCalculator ignored the size of memstore, which leads Spark miss data > -- > > Key: HBASE-28756 > URL: https://issues.apache.org/jira/browse/HBASE-28756 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 2.6.0, 3.0.0-beta-1, 2.5.10 >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Major > Labels: pull-request-available > > RegionSizeCalculator only considers the size of StoreFile and ignores the > size of MemStore. For a new region that has only been written to MemStore and > has not been flushed, will consider its size to be 0. > When we use TableInputFormat to read HBase table data in Spark. > {code:java} > spark.sparkContext.newAPIHadoopRDD( > conf, > classOf[TableInputFormat], > classOf[ImmutableBytesWritable], > classOf[Result]) > }{code} > Spark defaults to ignoring empty InputSplits, which is determined by the > configuration "{{{}spark.hadoopRDD.ignoreEmptySplits{}}}". > {code:java} > private[spark] val HADOOP_RDD_IGNORE_EMPTY_SPLITS = > ConfigBuilder("spark.hadoopRDD.ignoreEmptySplits") > .internal() > .doc("When true, HadoopRDD/NewHadoopRDD will not create partitions for > empty input splits.") > .version("2.3.0") > .booleanConf > .createWithDefault(true) {code} > The above reasons lead to Spark missing data. So we should consider both the > size of the StoreFile and the MemStore in the RegionSizeCalculator. -- This message was sent by Atlassian Jira (v8.20.10#820010)
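In the real code the fix presumably combines RegionMetrics.getStoreFileSize() and RegionMetrics.getMemStoreSize(); the helper below is a hypothetical stand-in that states the accounting change in isolation:

```java
public class RegionSizeSketch {
    // Hypothetical helper mirroring the proposed fix: a region's estimated
    // size is store files plus memstore, so a freshly written region that
    // has never been flushed is not reported as size 0 (and its input split
    // is not discarded by spark.hadoopRDD.ignoreEmptySplits).
    static long regionSizeBytes(long storeFileSizeBytes, long memStoreSizeBytes) {
        return storeFileSizeBytes + memStoreSizeBytes;
    }
}
```

With this accounting, an unflushed region with a non-empty memstore yields a non-zero split size, so Spark keeps its partition.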
[jira] [Updated] (HBASE-6028) Implement a cancel for in-progress compactions
[ https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-6028: -- Labels: beginner pull-request-available (was: beginner) > Implement a cancel for in-progress compactions > -- > > Key: HBASE-6028 > URL: https://issues.apache.org/jira/browse/HBASE-6028 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Derek Wollenstein >Assignee: Mohit Goel >Priority: Minor > Labels: beginner, pull-request-available > Fix For: 3.0.0-alpha-1, 2.2.0 > > Attachments: HBASE-6028.master.007.patch, > HBASE-6028.master.008.patch, HBASE-6028.master.008.patch, > HBASE-6028.master.009.patch > > > Depending on current server load, it can be extremely expensive to run > periodic minor / major compactions. It would be helpful to have a feature > where a user could use the shell or a client tool to explicitly cancel an > in-progress compactions. This would allow a system to recover when too many > regions became eligible for compactions at once -- This message was sent by Atlassian Jira (v8.20.10#820010)
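Cancelling a long-running compaction is typically done cooperatively: the compaction loop polls a flag that the cancel request sets, and bails out between units of work. A generic sketch under that assumption (the names here are illustrative, not HBase's actual compaction classes):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CancellableCompaction {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    /** Called from the shell/client RPC to request cancellation. */
    public void cancel() {
        cancelled.set(true);
    }

    /**
     * Processes "blocks" of work, checking the flag between blocks.
     * Returns the number of blocks completed before finishing or cancelling.
     */
    public int run(int totalBlocks) {
        int done = 0;
        for (int i = 0; i < totalBlocks; i++) {
            if (cancelled.get()) {
                break; // stop between blocks; the caller discards partial output
            }
            done++; // stand-in for compacting one block of cells
        }
        return done;
    }
}
```

The key design point is that cancellation is only observed at safe boundaries, so a half-written compacted file can be cleanly discarded rather than interrupted mid-write.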
[jira] [Updated] (HBASE-28753) FNFE may occur when accessing the region.jsp of the replica region
[ https://issues.apache.org/jira/browse/HBASE-28753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28753: --- Labels: pull-request-available (was: ) > FNFE may occur when accessing the region.jsp of the replica region > -- > > Key: HBASE-28753 > URL: https://issues.apache.org/jira/browse/HBASE-28753 > Project: HBase > Issue Type: Bug > Components: Replication, UI >Affects Versions: 2.4.13 >Reporter: guluo >Assignee: guluo >Priority: Major > Labels: pull-request-available > Attachments: image-2024-07-24-20-13-22-014.png > > > On the HBase UI, we can get the details of the storefiles in a region by > accessing region.jsp. > However, when a table enables region replication, the replica region > may reference a deleted storefile because it doesn't refresh in a timely manner, > so in this case we would get an FNFE when opening the region.jsp of that region. > > java.io.FileNotFoundException: File > file:/home/gl/code/github/hbase/hbase-assembly/target/hbase-4.0.0-alpha-1-SNAPSHOT/tmp/hbase/data/default/t01/e073c6b7c05eadda3f91d5b9692fc98d/info/5c52361153044b89aa61090cd5497998.4433b98ccf6b4a011ab03fc4a5e38a1a > does not exist at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:915) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1236) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:905) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462) > at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:1881) at > org.apache.hadoop.hbase.generated.regionserver.region_jsp._jspService(region_jsp.java:97) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:111) at > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) -- This message was sent by Atlassian Jira 
(v8.20.10#820010)
[jira] [Updated] (HBASE-28748) Replication blocking: InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.
[ https://issues.apache.org/jira/browse/HBASE-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28748: --- Labels: pull-request-available (was: ) > Replication blocking: > InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag > had invalid wire type. > -- > > Key: HBASE-28748 > URL: https://issues.apache.org/jira/browse/HBASE-28748 > Project: HBase > Issue Type: Bug > Components: Replication, wal >Affects Versions: 2.6.0 > Environment: hbase2.6.0 > hadoop3.3.6 >Reporter: Longping Jie >Assignee: Duo Zhang >Priority: Critical > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1 > > Attachments: image-2024-07-23-12-33-50-395.png, > rs-replciation-error.log, > tx1-int-hbase-main-prod-4%2C16020%2C1720602602602.1720609818921 > > > h2. Replication queue backlog, as shown below: > !image-2024-07-23-12-33-50-395.png! > > In the figure, the first WAL file no longer exists but has not been skipped, > causing replication to block. > The second and third WAL files were moved to oldWALs; as the attachment shows, > reading these two files failed. > h2. The error log in the RS is > 2024-07-22T17:47:49,130 WARN > [RS_CLAIM_REPLICATION_QUEUE-regionserver/sh2-int-hbase-main-ha-9:16020-0.replicationSource,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464.replicationSource.wal-reader.tx1-int-hbase-main-prod-3%2C16020%2C1720602522464,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464] > wal.ProtobufWALStreamReader: Error while reading WALKey, originalPosition=0, > currentPosition=81 > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: > Protocol message tag had invalid wire type. 
> at > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:119) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:503) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:770) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2829) > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4212) > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4204) > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > at > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:192) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage.parseWithIOException(GeneratedMessage.java:321) > ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7] > at > org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey.parseFrom(WALProtos.java:2321) > ~[hbase-protocol-shaded-2.6.0.jar:2.6.0] > at > 
org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.readWALKey(ProtobufWALTailingReader.java:128) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.next(ProtobufWALTailingReader.java:257) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:490) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.lastAttempt(WALEntryStream.java:306) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:388) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:130) > ~[hbase-server-2.6.0.jar:2.6.0] > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(Repl
[jira] [Updated] (HBASE-28655) TestHFileCompressionZstd fails with IllegalArgumentException: Illegal bufferSize
[ https://issues.apache.org/jira/browse/HBASE-28655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28655: --- Labels: pull-request-available (was: ) > TestHFileCompressionZstd fails with IllegalArgumentException: Illegal > bufferSize > > > Key: HBASE-28655 > URL: https://issues.apache.org/jira/browse/HBASE-28655 > Project: HBase > Issue Type: Bug > Components: HFile, Operability >Affects Versions: 2.6.0, 3.0.0-beta-1, 2.5.8 >Reporter: Pankaj Kumar >Assignee: Pankaj Kumar >Priority: Major > Labels: pull-request-available > > HADOOP-18810 added io.compression.codec.zstd.buffersize in core-default.xml > with default value as 0. > So ZSTD buffer size will be returned as 0 based on core-default.xml, > {code:java} > static int getBufferSize(Configuration conf) { > return conf.getInt(ZSTD_BUFFER_SIZE_KEY, > > conf.getInt(CommonConfigurationKeys.IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_KEY, > // IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_DEFAULT is 0! We can't allow > that. > ZSTD_BUFFER_SIZE_DEFAULT)); > } > {code} > HBASE-26259 added a value check, but got reverted in HBASE-26959. > > This issue will also occur during region flush and abort the RegionServer. 
> > TestHFileCompressionZstd and other zstd-related test cases are also > failing: > {code:java} > java.lang.IllegalArgumentException: Illegal bufferSize > at > org.apache.hadoop.io.compress.CompressorStream.<init>(CompressorStream.java:42) > at > org.apache.hadoop.io.compress.BlockCompressorStream.<init>(BlockCompressorStream.java:56) > at > org.apache.hadoop.hbase.io.compress.aircompressor.ZstdCodec.createOutputStream(ZstdCodec.java:106) > at > org.apache.hadoop.hbase.io.compress.Compression$Algorithm.createPlainCompressionStream(Compression.java:454) > at > org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.<init>(HFileBlockDefaultEncodingContext.java:99) > at > org.apache.hadoop.hbase.io.hfile.NoOpDataBlockEncoder.newDataBlockEncodingContext(NoOpDataBlockEncoder.java:85) > at > org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.<init>(HFileBlock.java:846) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishInit(HFileWriterImpl.java:304) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.<init>(HFileWriterImpl.java:185) > at > org.apache.hadoop.hbase.io.hfile.HFile$WriterFactory.create(HFile.java:312) > at > org.apache.hadoop.hbase.io.compress.HFileTestBase.doTest(HFileTestBase.java:73) > at > org.apache.hadoop.hbase.io.compress.aircompressor.TestHFileCompressionZstd.test(TestHFileCompressionZstd.java:54) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
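The value check that HBASE-26259 added (and HBASE-26959 later removed) amounts to refusing a non-positive configured buffer size and falling back to a sane default; a hedged sketch of that guard:

```java
public class ZstdBufferSize {
    // Sketch of the guard: since HADOOP-18810, core-default.xml can yield 0
    // for io.compression.codec.zstd.buffersize, and a 0 buffer makes
    // CompressorStream throw "Illegal bufferSize" — so fall back instead.
    static int safeBufferSize(int configured, int fallbackDefault) {
        return configured > 0 ? configured : fallbackDefault;
    }
}
```

Applying this guard where getBufferSize resolves its configuration keys would keep a 0 from core-default.xml from ever reaching CompressorStream.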
[jira] [Updated] (HBASE-28750) Region normalizer should work in off peak if config
[ https://issues.apache.org/jira/browse/HBASE-28750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28750: --- Labels: pull-request-available (was: ) > Region normalizer should work in off peak if config > --- > > Key: HBASE-28750 > URL: https://issues.apache.org/jira/browse/HBASE-28750 > Project: HBase > Issue Type: Improvement > Components: Normalizer >Reporter: MisterWang >Priority: Minor > Labels: pull-request-available > > The region normalizer involves splitting and merging regions, which can > cause jitter in online services, especially when there are many region > normalizer plans. We should run this task during off-peak hours if so configured. -- This message was sent by Atlassian Jira (v8.20.10#820010)
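HBase already expresses off-peak windows as a start hour and an end hour (as used for off-peak compaction ratios); the wrap-around check a normalizer schedule would need can be sketched as:

```java
public class OffPeakWindow {
    /**
     * True if {@code hour} (0-23) falls in [startHour, endHour), allowing
     * the window to wrap past midnight: start=22, end=6 covers 22:00-06:00.
     * An empty window (start == end) disables the feature, which matches
     * how an unset off-peak configuration usually behaves.
     */
    static boolean isOffPeak(int hour, int startHour, int endHour) {
        if (startHour == endHour) {
            return false; // no window configured
        }
        if (startHour < endHour) {
            return hour >= startHour && hour < endHour;
        }
        return hour >= startHour || hour < endHour; // wraps past midnight
    }
}
```

The normalizer chore would then simply skip submitting plans whenever the current hour falls outside the configured window.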
[jira] [Updated] (HBASE-28749) Remove the duplicate configurations named hbase.wal.batch.size
[ https://issues.apache.org/jira/browse/HBASE-28749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28749: --- Labels: pull-request-available (was: ) > Remove the duplicate configurations named hbase.wal.batch.size > -- > > Key: HBASE-28749 > URL: https://issues.apache.org/jira/browse/HBASE-28749 > Project: HBase > Issue Type: Improvement > Components: wal >Affects Versions: 3.0.0-beta-1 >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Minor > Labels: pull-request-available > Fix For: 3.0.0-beta-2 > > > The following code appears in two places: AsyncFSWAL and AbstractFSWAL > {code:java} > public static final String WAL_BATCH_SIZE = "hbase.wal.batch.size"; > public static final long DEFAULT_WAL_BATCH_SIZE = 64L * 1024; {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28719) Use ExtendedCell in WALEdit
[ https://issues.apache.org/jira/browse/HBASE-28719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28719: --- Labels: pull-request-available (was: ) > Use ExtendedCell in WALEdit > --- > > Key: HBASE-28719 > URL: https://issues.apache.org/jira/browse/HBASE-28719 > Project: HBase > Issue Type: Sub-task > Components: wal >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28734) Improve HBase shell snapshot command Doc with TTL option
[ https://issues.apache.org/jira/browse/HBASE-28734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28734: --- Labels: pull-request-available (was: ) > Improve HBase shell snapshot command Doc with TTL option > - > > Key: HBASE-28734 > URL: https://issues.apache.org/jira/browse/HBASE-28734 > Project: HBase > Issue Type: Improvement > Components: shell >Reporter: Ashok shetty >Assignee: Liangjun He >Priority: Minor > Labels: pull-request-available > > The current HBase shell snapshot command allows users to create a snapshot of > a specific table. While this command is useful, it could be enhanced by > documenting the TTL (Time-to-Live) option. This would allow users to specify a time > period after which the snapshot would automatically be deleted. > I propose we introduce a TTL option in the snapshot command doc as follows: > hbase> snapshot 'sourceTable', 'snapshotName', \{TTL => '7d'} > This would create a snapshot of 'sourceTable' called 'snapshotName' that > would automatically be deleted after 7 days. Documenting the TTL option > would provide a better user experience and assist with efficient > storage management. -- This message was sent by Atlassian Jira (v8.20.10#820010)
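To make the proposed doc example concrete, a TTL string like '7d' has to resolve to a duration in seconds. The suffix grammar below is this sketch's assumption, not HBase's actual shell syntax (the shell may take TTL directly in seconds):

```java
public class TtlParser {
    // Illustrative parser for TTL strings like "7d", "12h", "30m", "45s".
    static long ttlToSeconds(String ttl) {
        char unit = ttl.charAt(ttl.length() - 1);
        long n = Long.parseLong(ttl.substring(0, ttl.length() - 1));
        switch (unit) {
            case 'd': return n * 86400L; // days
            case 'h': return n * 3600L;  // hours
            case 'm': return n * 60L;    // minutes
            case 's': return n;          // seconds
            default:
                throw new IllegalArgumentException("Unknown TTL unit: " + unit);
        }
    }
}
```

Under this grammar, the '7d' in the proposed example would resolve to 604800 seconds before being stored on the snapshot.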
[jira] [Updated] (HBASE-28745) Default Zookeeper ConnectionRegistry APIs timeout should be less
[ https://issues.apache.org/jira/browse/HBASE-28745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28745: --- Labels: pull-request-available (was: ) > Default Zookeeper ConnectionRegistry APIs timeout should be less > > > Key: HBASE-28745 > URL: https://issues.apache.org/jira/browse/HBASE-28745 > Project: HBase > Issue Type: Sub-task >Reporter: Viraj Jasani >Assignee: Divneet Kaur >Priority: Minor > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11 > > > HBASE-28428 introduces timeout for Zookeeper ConnectionRegistry APIs. > However, the default timeout value we have set is 60s. Given that connection > registry are metadata APIs, they should have much lesser timeout value, > including default. > Let's set default timeout to 10s. -- This message was sent by Atlassian Jira (v8.20.10#820010)