[jira] [Updated] (HBASE-28621) PrefixFilter should use SEEK_NEXT_USING_HINT

2024-10-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28621:
---
Labels: beginner beginner-friendly pull-request-available  (was: beginner 
beginner-friendly)

> PrefixFilter should use SEEK_NEXT_USING_HINT 
> -
>
> Key: HBASE-28621
> URL: https://issues.apache.org/jira/browse/HBASE-28621
> Project: HBase
>  Issue Type: Improvement
>  Components: Filters
>Reporter: Istvan Toth
>Assignee: Dávid Paksy
>Priority: Major
>  Labels: beginner, beginner-friendly, pull-request-available
>
> Looking at PrefixFilter, I have noticed that it doesn't use the 
> SEEK_NEXT_USING_HINT mechanism.
> AFAICT, we could safely set the prefix as a next row hint, which could be 
> a huge performance win.
> Of course, ideally the user would set the scan startRow to the prefix, which 
> avoids the problem, but the user may forget to do that, or may use the filter 
> in a FilterList that doesn't allow for setting the start/stop rows close to 
> the prefix.
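> For illustration, a minimal sketch of the hinting mechanism (the filter class 
> here is hypothetical, not the actual PrefixFilter change):
> {code:java}
> import org.apache.hadoop.hbase.Cell;
> import org.apache.hadoop.hbase.KeyValueUtil;
> import org.apache.hadoop.hbase.filter.FilterBase;
> import org.apache.hadoop.hbase.util.Bytes;
> 
> public class PrefixHintFilter extends FilterBase {
>   private final byte[] prefix;
> 
>   public PrefixHintFilter(byte[] prefix) {
>     this.prefix = prefix;
>   }
> 
>   @Override
>   public ReturnCode filterCell(Cell c) {
>     byte[] row = Bytes.copy(c.getRowArray(), c.getRowOffset(), c.getRowLength());
>     if (Bytes.startsWith(row, prefix)) {
>       return ReturnCode.INCLUDE;              // row carries the prefix, keep it
>     }
>     if (Bytes.compareTo(row, prefix) < 0) {
>       return ReturnCode.SEEK_NEXT_USING_HINT; // still before the prefix, jump ahead
>     }
>     return ReturnCode.NEXT_ROW;               // already past the prefix
>   }
> 
>   @Override
>   public Cell getNextCellHint(Cell currentCell) {
>     // Hint the scanner to seek straight to the first possible cell of the prefix row.
>     return KeyValueUtil.createFirstOnRow(prefix);
>   }
> }
> {code}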



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28900) Avoid resetting bucket cache during restart if inconsistency is observed for some blocks.

2024-10-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28900:
---
Labels: pull-request-available  (was: )

> Avoid resetting bucket cache during restart if inconsistency is observed for 
> some blocks.
> -
>
> Key: HBASE-28900
> URL: https://issues.apache.org/jira/browse/HBASE-28900
> Project: HBase
>  Issue Type: Bug
>  Components: BucketCache
>Reporter: Janardhan Hungund
>Assignee: Janardhan Hungund
>Priority: Major
>  Labels: pull-request-available
>
> While the backing map is being persisted to the persistence file, it is not 
> guarded by a lock against block caching and block evictions.
> Hence, some of the block entries in the persisted backing map may not be 
> consistent with the bucket cache.
> During retrieval of the backing map from persistence, if an inconsistency 
> is detected, the complete bucket cache is discarded and rebuilt.
> One of the errors seen is shown below:
> {code:java}
> 2024-09-30 08:58:33,840 WARN 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache: Can't restore from 
> file[/hadoopfs/ephfs1/bucketcache.map]. The bucket cache will be reset and 
> rebuilt. Exception seen:
> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException: Couldn't 
> find match for index 26 in free list
>         at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$Bucket.addAllocation(BucketAllocator.java:140)
>         at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.<init>(BucketAllocator.java:406)
>         at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.retrieveFromFile(BucketCache.java:1486)
>         at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.lambda$startPersistenceRetriever$0(BucketCache.java:377)
>         at java.base/java.lang.Thread.run(Thread.java:840) {code}
> This retrieval can be optimised to discard only the inconsistent entries in 
> the persisted backing map and retain the remaining entries: the bucket cache 
> validator would simply throw away each inconsistent entry from the backing map.
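> As a rough illustration of that direction (names are hypothetical, not the 
> actual BucketCache code), the validation could prune the map in place:
> {code:java}
> import java.util.Iterator;
> import java.util.Map;
> import java.util.function.Predicate;
> 
> // Drop only the entries that fail the consistency check instead of
> // discarding the whole persisted backing map.
> static <K, V> void pruneInconsistentEntries(Map<K, V> backingMap, Predicate<V> isConsistent) {
>   Iterator<Map.Entry<K, V>> it = backingMap.entrySet().iterator();
>   while (it.hasNext()) {
>     if (!isConsistent.test(it.next().getValue())) {
>       it.remove(); // throw away just this stale block entry
>     }
>   }
> }
> {code}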
> Thanks,
> Janardhan



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28898) Use reflection to access recoverLease(), setSafeMode() APIs.

2024-10-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28898:
---
Labels: pull-request-available  (was: )

> Use reflection to access recoverLease(), setSafeMode() APIs.
> 
>
> Key: HBASE-28898
> URL: https://issues.apache.org/jira/browse/HBASE-28898
> Project: HBase
>  Issue Type: Task
>  Components: Filesystem Integration
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.7.0
>
>
> HBASE-27769 used the new Hadoop API (available since Hadoop 3.3.6/3.4.0) to 
> access recoverLease() and setSafeMode() APIs, and committed the change in a 
> feature branch.
> However, until we require at least Hadoop 3.3.6, we cannot use these APIs 
> directly.
> I'd like to propose using reflection to access these APIs in the interim so 
> HBase can use Ozone sooner.
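> A minimal sketch of the reflection approach (illustrative only, not the actual 
> patch; it assumes the target filesystem exposes a public recoverLease(Path) 
> method, as HDFS does):
> {code:java}
> import java.lang.reflect.Method;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> 
> static boolean tryRecoverLease(FileSystem fs, Path path) throws Exception {
>   try {
>     // Look the method up at runtime so HBase still compiles and runs against
>     // Hadoop versions that predate the new public API.
>     Method recoverLease = fs.getClass().getMethod("recoverLease", Path.class);
>     return (Boolean) recoverLease.invoke(fs, path);
>   } catch (NoSuchMethodException e) {
>     // The filesystem does not expose recoverLease(); fall back to older handling.
>     return false;
>   }
> }
> {code}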



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28906) Run nightly tests with multiple Hadoop 3 versions

2024-10-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28906:
---
Labels: pull-request-available  (was: )

> Run nightly tests with multiple Hadoop 3 versions
> -
>
> Key: HBASE-28906
> URL: https://issues.apache.org/jira/browse/HBASE-28906
> Project: HBase
>  Issue Type: Sub-task
>  Components: integration tests, test
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28901) checkcompatibility.py can run maven commands with parallelism

2024-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28901:
---
Labels: pull-request-available  (was: )

> checkcompatibility.py can run maven commands with parallelism
> -
>
> Key: HBASE-28901
> URL: https://issues.apache.org/jira/browse/HBASE-28901
> Project: HBase
>  Issue Type: Task
>  Components: create-release
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>
> We can speed up the create-release process by taking advantage of maven 
> parallelism during creation of the API compatibility report.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28903) Incremental backup test missing explicit test for bulkloads

2024-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28903:
---
Labels: pull-request-available  (was: )

> Incremental backup test missing explicit test for bulkloads
> ---
>
> Key: HBASE-28903
> URL: https://issues.apache.org/jira/browse/HBASE-28903
> Project: HBase
>  Issue Type: Improvement
>Reporter: Hernan Gelaf-Romer
>Priority: Major
>  Labels: pull-request-available
>
> Our incremental backup tests don't explicitly test our ability to back up and 
> restore bulkloads. It'd be nice to have this to verify bulkloads work in the 
> context of the backup/restore flow, and to avoid regressions in the future.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28905) Skip excessive evaluations of LINK_NAME_PATTERN and REF_NAME_PATTERN regular expressions

2024-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28905:
---
Labels: pull-request-available  (was: )

> Skip excessive evaluations of LINK_NAME_PATTERN and REF_NAME_PATTERN regular 
> expressions
> 
>
> Key: HBASE-28905
> URL: https://issues.apache.org/jira/browse/HBASE-28905
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.6.0, 3.0.0-beta-1, 2.7.0
>Reporter: Charles Connell
>Assignee: Charles Connell
>Priority: Minor
>  Labels: pull-request-available
> Attachments: cpu_time_flamegraph_2.6.0.html, 
> cpu_time_flamegraph_with_optimization.html, 
> performance_test_query_latency_2.6.0.png, 
> performance_test_query_latency_with_optimization.png
>
>
> To test if a file is a link file, HBase checks if its file name matches the 
> regex
> {code:java}
> ^(?:((?:[_\p{Digit}\p{IsAlphabetic}]+))(?:\=))?((?:[_\p{Digit}\p{IsAlphabetic}][-_.\p{Digit}\p{IsAlphabetic}]*))=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$
> {code}
> To test if an HFile has a "reference name," HBase checks if its file name 
> matches the regex
> {code:java}
> ^([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?|^(?:((?:[_\p{Digit}\p{IsAlphabetic}]+))(?:\=))?((?:[_\p{Digit}\p{IsAlphabetic}][-_.\p{Digit}\p{IsAlphabetic}]*))=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$)\.(.+)$
> {code}
> Matching against these big regexes is computationally expensive. HBASE-27474 
> introduced (in 2.6.0) [code in a hot 
> path|https://github.com/apache/hbase/blob/1602c531b245b4d455b48161757cde2ec3d1930b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java#L1716]
>  in {{HFileReaderImpl}} that checks whether an HFile is a link or reference 
> file while deciding whether to cache blocks from that file. In flamegraphs 
> taken at my company during performance tests, these regex 
> evaluations account for 2-3% of the CPU time on a busy RegionServer.
> Later, the hot-path invocation of the regexes was removed in HBASE-28596 in 
> branch-2 and later, but not branch-2.6, so only the 2.6.x series suffers the 
> performance regression. Nonetheless, all invocations of these regexes are 
> still unnecessarily expensive and can be fast-failed easily.
> The link name pattern contains a literal "=", so any string that does not 
> contain a "=" can be assumed to not match the regex. The reference name 
> pattern contains a literal ".", so any string that does not contain a "." can 
> be assumed to not match the regex. This optimization is mostly helpful in 
> 2.6.x, but is valid in all branches.
> Running performance tests of this optimization removed the regex evaluations 
> from my flamegraphs entirely, and reduced query latency by 15%. Some charts 
> are attached.
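> A sketch of the fast-fail check described above (the two Pattern arguments 
> stand for the compiled forms of the regexes quoted earlier; the helper names 
> are illustrative):
> {code:java}
> import java.util.regex.Pattern;
> 
> static boolean isHFileLinkName(Pattern linkNamePattern, String fileName) {
>   // The link pattern can only match names containing a literal '='.
>   return fileName.indexOf('=') >= 0 && linkNamePattern.matcher(fileName).matches();
> }
> 
> static boolean isReferenceName(Pattern refNamePattern, String fileName) {
>   // The reference pattern can only match names containing a literal '.'.
>   return fileName.indexOf('.') >= 0 && refNamePattern.matcher(fileName).matches();
> }
> {code}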



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28897) Incremental backups can be taken with incompatible column families

2024-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28897:
---
Labels: pull-request-available  (was: )

> Incremental backups can be taken with incompatible column families
> --
>
> Key: HBASE-28897
> URL: https://issues.apache.org/jira/browse/HBASE-28897
> Project: HBase
>  Issue Type: Bug
>  Components: backup&restore
>Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1
>Reporter: Hernan Gelaf-Romer
>Assignee: Hernan Gelaf-Romer
>Priority: Major
>  Labels: pull-request-available
>
> Incremental backups can be taken even if the table descriptor of the current 
> table does not match the column families of the full backup for that same 
> table. When restoring the table, we choose to use the families of the full 
> backup. This can cause the restore process to fail if we add a column family 
> in the incremental backup that doesn't exist in the full backup. The bulkload 
> process will fail because it is trying to write column families that don't 
> exist in the restore table. 
>  
> I think the correct solution here is to prevent incremental backups from 
> being taken if the families of the current table don't match those of the 
> full backup. This will force users to instead take a full backup.
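> A hedged sketch of such a pre-check (the helper name is hypothetical, not the 
> actual backup code):
> {code:java}
> import java.io.IOException;
> import java.util.HashSet;
> import java.util.Set;
> import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
> import org.apache.hadoop.hbase.client.TableDescriptor;
> 
> static void verifyFamiliesMatch(TableDescriptor current, TableDescriptor fullBackup)
>     throws IOException {
>   Set<String> currentFamilies = new HashSet<>();
>   for (ColumnFamilyDescriptor cf : current.getColumnFamilies()) {
>     currentFamilies.add(cf.getNameAsString());
>   }
>   Set<String> backupFamilies = new HashSet<>();
>   for (ColumnFamilyDescriptor cf : fullBackup.getColumnFamilies()) {
>     backupFamilies.add(cf.getNameAsString());
>   }
>   if (!currentFamilies.equals(backupFamilies)) {
>     throw new IOException("Column families changed since the last full backup; "
>       + "take a new full backup before running incrementals.");
>   }
> }
> {code}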



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28894) NPE on TestPrefetch.testPrefetchWithDelay

2024-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28894:
---
Labels: pull-request-available  (was: )

> NPE on TestPrefetch.testPrefetchWithDelay
> -
>
> Key: HBASE-28894
> URL: https://issues.apache.org/jira/browse/HBASE-28894
> Project: HBase
>  Issue Type: Bug
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>  Labels: pull-request-available
>
> I'm seeing some failures on TestPrefetch.testPrefetchWithDelay in some 
> pre-commit runs. I believe this is due to a race condition in 
> PrefetchExecutor.loadConfiguration.
> In these failures, it seems we are getting the NPE below:
> {noformat}
> Stacktrace:
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentSkipListMap.put(ConcurrentSkipListMap.java:1580)
>   at 
> org.apache.hadoop.hbase.io.hfile.PrefetchExecutor.request(PrefetchExecutor.java:108)
>   at 
> org.apache.hadoop.hbase.io.hfile.PrefetchExecutor.lambda$loadConfiguration$0(PrefetchExecutor.java:206)
>   at 
> java.util.concurrent.ConcurrentSkipListMap.forEach(ConcurrentSkipListMap.java:3269)
>   at 
> org.apache.hadoop.hbase.io.hfile.PrefetchExecutor.loadConfiguration(PrefetchExecutor.java:200)
>   at 
> org.apache.hadoop.hbase.regionserver.PrefetchExecutorNotifier.onConfigurationChange(PrefetchExecutorNotifier.java:51)
>   at 
> org.apache.hadoop.hbase.io.hfile.TestPrefetch.testPrefetchWithDelay(TestPrefetch.java:378)
>  {noformat}
> I think this is because we are completing prefetch in this test before the 
> induced delay, then this test triggers a new configuration change, but the 
> prefetch thread calls PrefetchExecutor.complete just before the test thread 
> reaches [this 
> point|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/PrefetchExecutor.java#L206]:
> {noformat}
> 2024-10-01T11:28:10,660 DEBUG [Time-limited test {}] 
> hfile.PrefetchExecutor(102): Prefetch requested for 
> /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-6328/yetus-jdk8-hadoop2-check/src/hbase-server/target/test-data/b646497b-7616-6533-e8cb-98e5c9e2e083/TestPrefetchWithDelay/71eefdb271ae4f65b694a6ec3d4287a0,
>  delay=25000 ms
> 2024-10-01T11:28:30,668 INFO  [Time-limited test {}] hbase.Waiter(181): 
> Waiting up to [10,000] milli-secs(wait.for.ratio=[1])
> 2024-10-01T11:28:35,661 DEBUG [hfile-prefetch-1727782088576 {}] 
> hfile.HFilePreadReader$1(103): No entry in the backing map for cache key 
> 71eefdb271ae4f65b694a6ec3d4287a0_0. 
> ...
> 2024-10-01T11:28:35,673 DEBUG [hfile-prefetch-1727782088576 {}] 
> hfile.HFilePreadReader$1(103): No entry in the backing map for cache key 
> 71eefdb271ae4f65b694a6ec3d4287a0_52849. 
> 2024-10-01T11:28:35,674 DEBUG [Time-limited test {}] 
> hfile.PrefetchExecutor(142): Prefetch cancelled for 
> /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-6328/yetus-jdk8-hadoop2-check/src/hbase-server/target/test-data/b646497b-7616-6533-e8cb-98e5c9e2e083/TestPrefetchWithDelay/71eefdb271ae4f65b694a6ec3d4287a0
> 2024-10-01T11:28:35,674 DEBUG [hfile-prefetch-1727782088576 {}] 
> hfile.PrefetchExecutor(121): Prefetch completed for 
> 71eefdb271ae4f65b694a6ec3d4287a0
> 2024-10-01T11:28:35,674 DEBUG [Time-limited test {}] 
> hfile.PrefetchExecutor(102): Prefetch requested for 
> /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-6328/yetus-jdk8-hadoop2-check/src/hbase-server/target/test-data/b646497b-7616-6533-e8cb-98e5c9e2e083/TestPrefetchWithDelay/71eefdb271ae4f65b694a6ec3d4287a0,
>  delay=991 ms
> ...
> {noformat}
> CC: [~kabhishek4]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28890) RefCnt Leak error when caching index blocks at write time

2024-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28890:
---
Labels: pull-request-available  (was: )

> RefCnt Leak error when caching index blocks at write time
> -
>
> Key: HBASE-28890
> URL: https://issues.apache.org/jira/browse/HBASE-28890
> Project: HBase
>  Issue Type: Bug
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>  Labels: pull-request-available
>
> Following [~bbeaudreault]'s work in HBASE-27170, which added the (very useful) 
> refcount leak detector, we sometimes see these reports on some branch-2 based 
> deployments:
> {noformat}
> 2024-09-25 10:06:42,413 ERROR 
> org.apache.hbase.thirdparty.io.netty.util.ResourceLeakDetector: LEAK: 
> RefCnt.release() was not called before it's garbage-collected. See 
> https://netty.io/wiki/reference-counted-objects.html for more information.
> Recent access records:  
> Created at:
> org.apache.hadoop.hbase.nio.RefCnt.<init>(RefCnt.java:59)
> org.apache.hadoop.hbase.nio.RefCnt.create(RefCnt.java:54)
> org.apache.hadoop.hbase.nio.ByteBuff.wrap(ByteBuff.java:550)
> 
> org.apache.hadoop.hbase.io.ByteBuffAllocator.allocate(ByteBuffAllocator.java:357)
> 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.cloneUncompressedBufferWithHeader(HFileBlock.java:1153)
> 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.getBlockForCaching(HFileBlock.java:1215)
> 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.lambda$writeIndexBlocks$0(HFileBlockIndex.java:997)
> java.base/java.util.Optional.ifPresent(Optional.java:178)
> 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:996)
> 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:635)
> 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:378)
> 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74)
> 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:831)
> 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2033)
> 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2878)
> 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2620)
> 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2592)
> 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2462)
> 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:602)
> 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:572)
> 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:65)
> 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:344)
> {noformat}
> It turns out that we always convert the block to an "on-heap" one inside 
> LruBlockCache.cacheBlock, so when the index block is a SharedMemHFileBlock, 
> the blockForCaching instance in the code 
> [here|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java#L1076]
>  becomes eligible for GC without releasing its buffers/decreasing the refcount 
> (leak), right after we return from the BlockIndexWriter.writeIndexBlocks call.
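> One possible direction, sketched below (illustrative, not the actual patch): 
> have the writer side release its reference once the cache has taken or copied 
> the block.
> {code:java}
> import org.apache.hadoop.hbase.io.hfile.BlockCache;
> import org.apache.hadoop.hbase.io.hfile.BlockCacheKey;
> import org.apache.hadoop.hbase.io.hfile.HFileBlock;
> 
> static void cacheAndRelease(BlockCache cache, BlockCacheKey key, HFileBlock blockForCaching) {
>   try {
>     cache.cacheBlock(key, blockForCaching); // LruBlockCache copies to heap if needed
>   } finally {
>     blockForCaching.release(); // drop the writer-side refcount so shared buffers are returned
>   }
> }
> {code}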



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28733) Publish API docs for 2.6

2024-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28733:
---
Labels: pull-request-available  (was: )

> Publish API docs for 2.6
> 
>
> Key: HBASE-28733
> URL: https://issues.apache.org/jira/browse/HBASE-28733
> Project: HBase
>  Issue Type: Task
>  Components: community, documentation
>Reporter: Nick Dimiduk
>Assignee: Dávid Paksy
>Priority: Major
>  Labels: pull-request-available
>
> We have released 2.6 but the website has not been updated with the new API 
> docs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28888) Backport "HBASE-18382 [Thrift] Add transport type info to info server" to branch-2

2024-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28888:
---
Labels: beginner pull-request-available  (was: beginner)

> Backport "HBASE-18382 [Thrift] Add transport type info to info server" to 
> branch-2
> --
>
> Key: HBASE-28888
> URL: https://issues.apache.org/jira/browse/HBASE-28888
> Project: HBase
>  Issue Type: Improvement
>  Components: Thrift
>Reporter: Lars George
>Assignee: Nihal Jain
>Priority: Minor
>  Labels: beginner, pull-request-available
> Fix For: 3.0.0-alpha-1
>
>
> It would be really helpful to know if the Thrift server was started using the 
> HTTP or binary transport. Any additional info, like QOP settings for SASL 
> etc. would be great too. Right now the UI is very limited and shows 
> {{true/false}} for, for example, {{Compact Transport}}. I'd suggest 
> changing this to show something more useful, like this:
> {noformat}
> Thrift Impl Type: non-blocking
> Protocol: Binary
> Transport: Framed
> QOP: Authentication & Confidential
> {noformat}
> or
> {noformat}
> Protocol: Binary + HTTP
> Transport: Standard
> QOP: none
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28887) Fix broken link to mailing lists page

2024-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28887:
---
Labels: pull-request-available  (was: )

> Fix broken link to mailing lists page
> -
>
> Key: HBASE-28887
> URL: https://issues.apache.org/jira/browse/HBASE-28887
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 4.0.0-alpha-1
>Reporter: Dávid Paksy
>Priority: Minor
>  Labels: pull-request-available
>
> The Reference Guide (book) link to the mailing lists page is broken.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28866) Setting `hbase.oldwals.cleaner.thread.size` to negative value will break HMaster and produce hard-to-diagnose logs

2024-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28866:
---
Labels: pull-request-available  (was: )

> Setting `hbase.oldwals.cleaner.thread.size` to negative value will break 
> HMaster and produce hard-to-diagnose logs
> --
>
> Key: HBASE-28866
> URL: https://issues.apache.org/jira/browse/HBASE-28866
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.4.2, 3.0.0-beta-1
>Reporter: Ariadne_team
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0-beta-2
>
> Attachments: HBASE-28866-000-1.patch, HBASE-28866-000.patch
>
>
> 
> Problem
> -
> HBase Master cannot be initialized with the following setting:
>   <property>
>     <name>hbase.oldwals.cleaner.thread.size</name>
>     <value>-1</value>
>     <description>Default is 2</description>
>   </property>
>  
> After running the start-hbase.sh, the Master node could not be started due to 
> an exception:
> {code:java}
> ERROR [master/localhost:16000:becomeActiveMaster] master.HMaster: Failed to 
> become active master
> java.lang.IllegalArgumentException: Illegal Capacity: -1
>     at java.util.ArrayList.<init>(ArrayList.java:157)
>     at 
> org.apache.hadoop.hbase.master.cleaner.LogCleaner.createOldWalsCleaner(LogCleaner.java:149)
>     at 
> org.apache.hadoop.hbase.master.cleaner.LogCleaner.<init>(LogCleaner.java:80)
>     at 
> org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1329)
>     at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:917)
>     at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2081)
>     at org.apache.hadoop.hbase.master.HMaster.lambda$0(HMaster.java:505)
>     at java.lang.Thread.run(Thread.java:750){code}
> We were really confused and misled by the error log as the 'Illegal Capacity' 
> of ArrayList seems like an internal code issue.
>  
> After we read the source code, we found that 
> "hbase.oldwals.cleaner.thread.size" is parsed and passed to the 
> createOldWalsCleaner() function without any validation:
> {code:java}
> int size = conf.getInt(OLD_WALS_CLEANER_THREAD_SIZE, DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE);
> this.oldWALsCleaner = createOldWalsCleaner(size);
> {code}
> The value of "hbase.oldwals.cleaner.thread.size" is used as the 
> initialCapacity of an ArrayList. If the configuration value is negative, an 
> IllegalArgumentException is thrown:
> {code:java}
> private List<Thread> createOldWalsCleaner(int size) {
> ...
>     List<Thread> oldWALsCleaner = new ArrayList<>(size);
> ...
> }
> {code}
>  
> Solution (the attached patch) 
> -
> The basic idea of the attached patch is to add a check and relevant logging 
> for this value during the initialization of the {{LogCleaner}} in the 
> constructor. This will help users better diagnose the issue. The detailed 
> patch is shown below.
> {code:java}
> @@ -78,6 +78,11 @@ public class LogCleaner extends CleanerChore<BaseLogCleanerDelegate>
>        pool, params, null);
>      this.pendingDelete = new LinkedBlockingQueue<>();
>      int size = conf.getInt(OLD_WALS_CLEANER_THREAD_SIZE, DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE);
> +    if (size <= 0) {
> +      LOG.warn("The size of old WALs cleaner thread is {}, which is invalid, "
> +          + "the default value will be used.", size);
> +      size = DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE;
> +    }
>      this.oldWALsCleaner = createOldWalsCleaner(size);
>      this.cleanerThreadTimeoutMsec = conf.getLong(OLD_WALS_CLEANER_THREAD_TIMEOUT_MSEC,
>        DEFAULT_OLD_WALS_CLEANER_THREAD_TIMEOUT_MSEC);
> {code}
> Thanks!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28884) SFT's BrokenStoreFileCleaner may cause data loss

2024-09-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28884:
---
Labels: pull-request-available  (was: )

> SFT's BrokenStoreFileCleaner may cause data loss
> 
>
> Key: HBASE-28884
> URL: https://issues.apache.org/jira/browse/HBASE-28884
> Project: HBase
>  Issue Type: Bug
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>  Labels: pull-request-available
>
> When having this BrokenStoreFileCleaner enabled, one of our customers has run 
> into a data loss situation, probably due to a race condition between regions 
> getting moved out of the regionserver while the BrokenStoreFileCleaner was 
> checking this region's files eligibility for deletion. We have seen that the 
> file got deleted by the given region server, around the same time the region 
> got closed on this region server. I believe a race condition during region 
> close is possible here:
> 1) In BrokenStoreFileCleaner, for each region online on the given RS, we get 
> the list of files in the store dirs, then iterate through it [1]; 
> 2) For each file listed, we perform several checks, including this one [2] 
> that checks if the file is "active"
> The problem is, if the region for the file we are checking got closed between 
> point #1 and #2, by the time we check if the file is active in [2], the store 
> may have already been closed as part of the region closure, so this check 
> would consider the file as deletable.
> One simple solution is to check if the store's region is still open before 
> proceeding with deleting the file.
> [1] 
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BrokenStoreFileCleaner.java#L99
> [2] 
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BrokenStoreFileCleaner.java#L133
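> A minimal sketch of that guard (illustrative, not the actual patch):
> {code:java}
> import org.apache.hadoop.hbase.regionserver.HRegion;
> 
> static boolean regionStillOpen(HRegion region) {
>   // Re-check right before deletion: if the region was closed or is closing
>   // (e.g. it moved away after its store files were listed), its stores may
>   // already be closed and the "is this file active" check is unreliable.
>   return !region.isClosing() && !region.isClosed();
> }
> {code}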



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28883) Manage hbase-thirdparty transitive dependencies via BOM pom

2024-09-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28883:
---
Labels: pull-request-available  (was: )

> Manage hbase-thirdparty transitive dependencies via BOM pom
> ---
>
> Key: HBASE-28883
> URL: https://issues.apache.org/jira/browse/HBASE-28883
> Project: HBase
>  Issue Type: Task
>  Components: build, thirdparty
>Reporter: Nick Dimiduk
>Priority: Major
>  Labels: pull-request-available
>
> Despite the intentions to the contrary, there are several places where we 
> need the version of a dependency managed in hbase-thirdparty to match an 
> import in the main product (and maybe also in our other repos). Right now, 
> this is managed via comments in the poms, which read "when this changes 
> there, don't forget to update it here...". We can do better than this.
> I think that hbase-thirdparty could publish a BOM pom file that can be 
> imported into any of the downstream hbase projects that make use of that 
> release of hbase-thirdparty. That will centralize management of these 
> dependencies in the hbase-thirdparty repo.
> This blog post has a nice write-up on the idea, 
> https://www.garretwilson.com/blog/2023/06/14/improve-maven-bom-pattern



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28879) Bump hbase-thirdparty to 4.1.9

2024-09-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28879:
---
Labels: pull-request-available  (was: )

> Bump hbase-thirdparty to 4.1.9
> --
>
> Key: HBASE-28879
> URL: https://issues.apache.org/jira/browse/HBASE-28879
> Project: HBase
>  Issue Type: Task
>  Components: dependencies, thirdparty
>Reporter: Duo Zhang
>Assignee: Nick Dimiduk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28882) Backup restores are broken if the backup has moved locations

2024-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28882:
---
Labels: pull-request-available  (was: )

> Backup restores are broken if the backup has moved locations
> 
>
> Key: HBASE-28882
> URL: https://issues.apache.org/jira/browse/HBASE-28882
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0, 2.6.1
>Reporter: Ray Mattingly
>Assignee: Ray Mattingly
>Priority: Major
>  Labels: pull-request-available
>
> My company runs a few hundred HBase clusters. We want to take backups 
> everyday in one public cloud region, and then use said cloud's native 
> replication solution to "backup our backups" in a secondary region. This is 
> how we plan for region-wide disaster recovery.
> This system should work, but doesn't because of the way that BackupManifests 
> are constructed.
> Backing up a bit (no pun intended): when we replicate backups verbatim, the 
> manifest file continues to point to the original backup root. This shouldn't 
> matter, because when taking a restore one passes a RestoreRequest to the 
> RestoreTablesClient — and this RestoreRequest includes a BackupRootDir field. 
> This works as you would expect initially, but eventually we build a 
> BackupManifest that fails to interpolate this provided root directory and, 
> instead, falls back to what it finds on disk in the backup (which would point 
> back to the primary backup location, even if reading a replicated backup).
> To fix this, I'm proposing that we properly interpolate the request's root 
> directory field when building BackupManifests.
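> A minimal sketch of that interpolation (a purely illustrative helper, not the 
> actual BackupManifest code):
> {code:java}
> // Prefer the root directory supplied in the RestoreRequest over the one
> // recorded on disk when the backup was originally taken.
> static String effectiveBackupRoot(String requestRootDir, String manifestRootDir) {
>   return (requestRootDir != null && !requestRootDir.isEmpty())
>     ? requestRootDir
>     : manifestRootDir;
> }
> {code}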



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28880) ParseException may occur when getting the fileDate of the mob file recovered through snapshot

2024-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28880:
---
Labels: pull-request-available  (was: )

> ParseException may occur when getting the fileDate of the mob file recovered 
> through snapshot
> -
>
> Key: HBASE-28880
> URL: https://issues.apache.org/jira/browse/HBASE-28880
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.4.13
> Environment: hbase2.4.13
> centos
>Reporter: guluo
>Assignee: guluo
>Priority: Major
>  Labels: pull-request-available
>
> The task ExpiredMobFileCleaner may hit a ParseException when parsing MOB files 
> recovered through a snapshot, so these expired MOB files cannot be deleted.
>  
> The reason:
> ExpiredMobFileCleaner obtains the MOB file creation time by parsing 
> the MOB filename.
> In a regular MOB table, the 32nd to 40th characters of the MOB filename 
> encode the file creation date, so ExpiredMobFileCleaner can get the creation 
> time of a MOB file by reading those characters.
> However, in MOB tables recovered through a snapshot, the MOB filename has the 
> format tableName-mobRegionName-hfileName, so ExpiredMobFileCleaner may not be 
> able to obtain the creation time from the characters at the above position. 
> In this situation a ParseException occurs, and these expired MOB files can 
> never be deleted.
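> A sketch of the parsing involved (offsets taken from the description above; 
> the yyyyMMdd format is an assumption, and this is not the actual cleaner code):
> {code:java}
> import java.text.ParseException;
> import java.text.SimpleDateFormat;
> import java.util.Date;
> 
> static Date tryParseMobFileDate(String fileName) {
>   try {
>     // Regular MOB file names carry the creation date in characters 32-40.
>     String dateStr = fileName.substring(32, 40);
>     return new SimpleDateFormat("yyyyMMdd").parse(dateStr);
>   } catch (IndexOutOfBoundsException | ParseException e) {
>     // Names restored from a snapshot (tableName-mobRegionName-hfileName) do not
>     // follow that layout; the caller should skip such files instead of failing.
>     return null;
>   }
> }
> {code}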



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-26867) Introduce a FlushProcedure

2024-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-26867:
---
Labels: pull-request-available  (was: )

> Introduce a FlushProcedure
> --
>
> Key: HBASE-26867
> URL: https://issues.apache.org/jira/browse/HBASE-26867
> Project: HBase
>  Issue Type: New Feature
>  Components: proc-v2
>Reporter: ruanhui
>Assignee: ruanhui
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-1
>
>
> Reimplement proc-v1 based flush procedure in proc-v2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27757) Clean up ScanMetrics API

2024-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-27757:
---
Labels: pull-request-available  (was: )

> Clean up ScanMetrics API
> 
>
> Key: HBASE-27757
> URL: https://issues.apache.org/jira/browse/HBASE-27757
> Project: HBase
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Chandra Sekhar K
>Priority: Major
>  Labels: pull-request-available
>
> The ScanMetrics object exposes public instance variables for all metrics, for 
> example ScanMetrics.countOfRPCcalls. This is not standard API design in Java 
> or in HBase, and requires suppressing VisibilityModifier checkstyle warnings. 
> We should clean this up, but it would require a major version change since 
> it's part of the public API.
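> A hedged sketch of that direction (class and method names are illustrative, 
> not the actual ScanMetrics API):
> {code:java}
> import java.util.concurrent.atomic.AtomicLong;
> 
> public class ScanMetricsSketch {
>   // The counter stays internal; callers read it through a standard accessor
>   // instead of a public instance variable.
>   private final AtomicLong countOfRpcCalls = new AtomicLong();
> 
>   public long getCountOfRpcCalls() {
>     return countOfRpcCalls.get();
>   }
> 
>   void addRpcCall() {
>     countOfRpcCalls.incrementAndGet();
>   }
> }
> {code}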



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28382) Support building hbase-connectors with JDK17

2024-09-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28382:
---
Labels: pull-request-available  (was: )

> Support building hbase-connectors with JDK17
> 
>
> Key: HBASE-28382
> URL: https://issues.apache.org/jira/browse/HBASE-28382
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbase-connectors, java
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28790) hbase-connectors fails to build with hbase 2.6.0

2024-09-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28790:
---
Labels: pull-request-available  (was: )

> hbase-connectors fails to build with hbase 2.6.0
> 
>
> Key: HBASE-28790
> URL: https://issues.apache.org/jira/browse/HBASE-28790
> Project: HBase
>  Issue Type: Bug
>  Components: build, hbase-connectors
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
>  Labels: pull-request-available
>
> hbase-connectors fails to build with hbase 2.6.0
> {code:java}
> [INFO] Reactor Summary for Apache HBase Connectors 1.1.0-SNAPSHOT:
> [INFO] 
> [INFO] Apache HBase Connectors  SUCCESS [  4.377 
> s]
> [INFO] Apache HBase - Kafka ... SUCCESS [  0.116 
> s]
> [INFO] Apache HBase - Model Objects for Kafka Proxy ... SUCCESS [  3.222 
> s]
> [INFO] Apache HBase - Kafka Proxy . FAILURE [  8.305 
> s]
> [INFO] Apache HBase - Spark ... SKIPPED
> [INFO] Apache HBase - Spark Protocol .. SKIPPED
> [INFO] Apache HBase - Spark Protocol (Shaded) . SKIPPED
> [INFO] Apache HBase - Spark Connector . SKIPPED
> [INFO] Apache HBase - Spark Integration Tests . SKIPPED
> [INFO] Apache HBase Connectors - Assembly . SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  16.703 s
> [INFO] Finished at: 2024-08-17T11:29:20Z
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile 
> (default-compile) on project hbase-kafka-proxy: Compilation failure
> [ERROR] 
> /workspaces/hbase-connectors/kafka/hbase-kafka-proxy/src/main/java/org/apache/hadoop/hbase/kafka/KafkaBridgeConnection.java:[169,31]
>   is not 
> abstract and does not override abstract method 
> setRequestAttribute(java.lang.String,byte[]) in 
> org.apache.hadoop.hbase.client.TableBuilder {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28876) Should call ProcedureScheduler.completionCleanup for non-root procedure too

2024-09-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28876:
---
Labels: pull-request-available  (was: )

> Should call ProcedureScheduler.completionCleanup for non-root procedure too
> --
>
> Key: HBASE-28876
> URL: https://issues.apache.org/jira/browse/HBASE-28876
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Duo Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Per the discussion in this PR
> https://github.com/apache/hbase/pull/6247
> and the related issue HBASE-28830, it seems incorrect that we only call 
> cleanup for root procedures.
> This issue aims to check whether there are any problems with calling this 
> method for every procedure.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28875) FSHlog closewrite closeErrorCount should increment for initial catch exception

2024-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28875:
---
Labels: pull-request-available  (was: )

> FSHlog closewrite closeErrorCount should increment for initial catch exception
> --
>
> Key: HBASE-28875
> URL: https://issues.apache.org/jira/browse/HBASE-28875
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.7.0
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Y. SREENIVASULU REDDY
>Priority: Minor
>  Labels: pull-request-available
>
> When closing the writer in FSHLog, if any error occurs, the closeErrorCount 
> counter should be incremented for the initial exception itself.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28868) Add missing permission check for updateRSGroupConfig in branch-2

2024-09-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28868:
---
Labels: pull-request-available  (was: )

> Add missing permission check for updateRSGroupConfig in branch-2
> 
>
> Key: HBASE-28868
> URL: https://issues.apache.org/jira/browse/HBASE-28868
> Project: HBase
>  Issue Type: Task
>  Components: rsgroup
>Affects Versions: 2.7.0
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Minor
>  Labels: pull-request-available
>
> Found this during HBASE-28867, we do not have security check for 
> updateRSGroupConfig in branch-2. See 
> [https://github.com/apache/hbase/blob/0dc334f572329be7eb2455cec3519fc820c04c25/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminEndpoint.java#L450]
> Same check exists in master 
> [https://github.com/apache/hbase/blob/52082bc5b80a60406bfaaa630ed5cb23027436c1/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java#L2279]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28871) [hbase-thirdparty] Bump dependency versions before releasing

2024-09-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28871:
---
Labels: pull-request-available  (was: )

> [hbase-thirdparty] Bump dependency versions before releasing
> 
>
> Key: HBASE-28871
> URL: https://issues.apache.org/jira/browse/HBASE-28871
> Project: HBase
>  Issue Type: Sub-task
>  Components: dependencies, thirdparty
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28869) [hbase-thirdparty] Bump protobuf java to 4.27.5+

2024-09-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28869:
---
Labels: pull-request-available  (was: )

> [hbase-thirdparty] Bump protobuf java to 4.27.5+
> 
>
> Key: HBASE-28869
> URL: https://issues.apache.org/jira/browse/HBASE-28869
> Project: HBase
>  Issue Type: Task
>  Components: Protobufs, security, thirdparty
>Reporter: Duo Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: thirdparty-4.1.9
>
>
> For addressing CVE-2024-7254



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28867) Backport "HBASE-20653 Add missing observer hooks for region server group to MasterObserver" to branch-2

2024-09-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28867:
---
Labels: pull-request-available  (was: )

> Backport "HBASE-20653 Add missing observer hooks for region server group to 
> MasterObserver" to branch-2
> ---
>
> Key: HBASE-28867
> URL: https://issues.apache.org/jira/browse/HBASE-28867
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.7.0, 2.5.10
>Reporter: Ted Yu
>Assignee: Nihal Jain
>Priority: Major
>  Labels: pull-request-available
>
> Currently the following region server group operations don't have 
> corresponding hook in MasterObserver :
> * getRSGroupInfo
> * getRSGroupInfoOfServer
> * getRSGroupInfoOfTable
> * listRSGroup
> This JIRA is to 
> * add them to MasterObserver
> * add pre/post hook calls in RSGroupAdminEndpoint thru 
> master.getMasterCoprocessorHost for the above operations
> * add corresponding tests to TestRSGroups (in similar manner to that of 
> HBASE-20627)
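> For illustration, the first pair of hooks could follow the existing 
> MasterObserver default-method convention (parameter naming here is an 
> assumption; the actual backport covers all four operations):
> {code:java}
> default void preGetRSGroupInfo(ObserverContext<MasterCoprocessorEnvironment> ctx,
>     String groupName) throws IOException {
> }
> 
> default void postGetRSGroupInfo(ObserverContext<MasterCoprocessorEnvironment> ctx,
>     String groupName) throws IOException {
> }
> {code}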



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28864) NoMethodError undefined method assignment_expression?

2024-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28864:
---
Labels: pull-request-available  (was: )

> NoMethodError undefined method assignment_expression?
> -
>
> Key: HBASE-28864
> URL: https://issues.apache.org/jira/browse/HBASE-28864
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2
>
>
> After HBASE-28250 (Bump jruby to 9.4.8.0 to fix snakeyaml CVE), the message 
> "NoMethodError undefined method assignment_expression?" is printed after 
> every command. 
> This is called from code copied from 
> https://github.com/ruby/irb/blob/v1.4.2/lib/irb.rb . The fix is to also copy 
> over the definition of `assignment_expression`. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28835) Make connector support for Decimal type

2024-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28835:
---
Labels: pull-request-available  (was: )

> Make connector support for Decimal type
> ---
>
> Key: HBASE-28835
> URL: https://issues.apache.org/jira/browse/HBASE-28835
> Project: HBase
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: connector-1.0.0
>Reporter: yan.duan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: connector-1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28862) Change the generic type for ObserverContext from 'RegionCoprocessorEnvironment' to '? extends RegionCoprocessorEnvironment' in RegionObserver

2024-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28862:
---
Labels: pull-request-available  (was: )

> Change the generic type for ObserverContext from 
> 'RegionCoprocessorEnvironment' to '? extends RegionCoprocessorEnvironment' in 
> RegionObserver
> -
>
> Key: HBASE-28862
> URL: https://issues.apache.org/jira/browse/HBASE-28862
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors, regionserver
>Reporter: Duo Zhang
>Priority: Major
>  Labels: pull-request-available
>
> This will be a breaking change for coprocessor implementations, but the 
> capabilities of region observers are unchanged, so I think it is OK to include 
> this in the 3.0.0 release, as we have already changed the coprocessor protobuf to 
> the relocated one, which already breaks lots of coprocessors.
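> Sketched on a single RegionObserver method (a sketch only; the real change 
> touches every hook in the interface):
> {code:java}
> // before:
> default void preFlush(ObserverContext<RegionCoprocessorEnvironment> c,
>     FlushLifeCycleTracker tracker) throws IOException {
> }
> 
> // after:
> default void preFlush(ObserverContext<? extends RegionCoprocessorEnvironment> c,
>     FlushLifeCycleTracker tracker) throws IOException {
> }
> {code}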



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28721) AsyncFSWAL is broken when running against hadoop 3.4.0

2024-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28721:
---
Labels: pull-request-available  (was: )

> AsyncFSWAL is broken when running against hadoop 3.4.0
> --
>
> Key: HBASE-28721
> URL: https://issues.apache.org/jira/browse/HBASE-28721
> Project: HBase
>  Issue Type: Bug
>  Components: hadoop3, wal
>Reporter: Duo Zhang
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>
> {noformat}
> 2024-07-10T10:09:33,161 ERROR [master/localhost:0:becomeActiveMaster {}] 
> asyncfs.FanOutOneBlockAsyncDFSOutputHelper(258): Couldn't properly initialize 
> access to HDFS internals. Please update your WAL Provider to not make use of 
> the 'asyncfs' provider. See HBASE-16110 for more information.
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.DFSClient.beginFileLease(long,org.apache.hadoop.hdfs.DFSOutputStream)
> at java.lang.Class.getDeclaredMethod(Class.java:2675) ~[?:?]
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createLeaseManager(FanOutOneBlockAsyncDFSOutputHelper.java:175)
>  ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
> at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.<clinit>(FanOutOneBlockAsyncDFSOutputHelper.java:252)
>  ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
> at java.lang.Class.forName0(Native Method) ~[?:?]
> at java.lang.Class.forName(Class.java:375) ~[?:?]
> at 
> org.apache.hadoop.hbase.wal.AsyncFSWALProvider.load(AsyncFSWALProvider.java:149)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hbase.wal.WALFactory.getProviderClass(WALFactory.java:174) 
> ~[classes/:?]
> at org.apache.hadoop.hbase.wal.WALFactory.<init>(WALFactory.java:262) 
> ~[classes/:?]
> at org.apache.hadoop.hbase.wal.WALFactory.<init>(WALFactory.java:231) 
> ~[classes/:?]
> at 
> org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:383)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:135)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1003)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[classes/:?]
> at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
>  ~[hbase-common-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
> at java.lang.Thread.run(Thread.java:840) ~[?:?]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28569) Race condition during WAL splitting leading to corrupt recovered.edits

2024-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28569:
---
Labels: pull-request-available  (was: )

> Race condition during WAL splitting leading to corrupt recovered.edits
> --
>
> Key: HBASE-28569
> URL: https://issues.apache.org/jira/browse/HBASE-28569
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.4.17
>Reporter: Benoit Sigoure
>Priority: Major
>  Labels: pull-request-available
>
> There is a race condition that can happen when a regionserver aborts 
> initialisation while splitting a WAL from another regionserver. This race 
> leads to writing the WAL trailer for recovered edits while the writer threads 
> are still running, thus the trailer gets interleaved with the edits 
> corrupting the recovered edits file (and preventing the region to be 
> assigned).
> We've seen this happening on HBase 2.4.17, but looking at the latest code it 
> seems that the race can still happen there.
> The sequence of operations that leads to this issue:
>  * {{org.apache.hadoop.hbase.wal.WALSplitter.splitWAL}} calls 
> {{outputSink.close()}} after adding all the entries to the buffers
>  * The output sink is 
> {{org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink}} and its {{close}} 
> method calls first {{finishWriterThreads}} in a try block which in turn will 
> call {{finish}} on every thread and then join it to make sure it's done.
>  * However if the splitter thread gets interrupted because of RS aborting, 
> the join will get interrupted and {{finishWriterThreads}} will rethrow 
> without waiting for the writer threads to stop.
>  * This is problematic because coming back to 
> {{org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.close}} it will call 
> {{closeWriters}} in a finally block (so it will execute even when the join 
> was interrupted).
>  * {{closeWriters}} will call 
> {{org.apache.hadoop.hbase.wal.AbstractRecoveredEditsOutputSink.closeRecoveredEditsWriter}}
>  which will call {{close}} on {{{}editWriter.writer{}}}.
>  * When {{editWriter.writer}} is 
> {{{}org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter{}}}, its 
> {{close}} method will write the trailer before closing the file.
>  * This trailer write will now go in parallel with writer threads writing 
> entries causing corruption.
>  * If there are no other errors, {{closeWriters}} will succeed renaming all 
> temporary files to final recovered edits, causing problems next time the 
> region is assigned.
> Logs evidence supporting the above flow:
> Abort is triggered (because it failed to open the WAL due to some ongoing 
> infra issue):
> {noformat}
> regionserver-2 regionserver 06:22:00.384 
> [RS_OPEN_META-regionserver/host01:16201-0] ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer - * ABORTING region 
> server host01,16201,1709187641249: WAL can not clean up after init failed 
> *{noformat}
> We can see that the writer threads were still active after closing (even 
> considering that the
> ordering in the log might not be accurate, we see that they die because the 
> channel is closed while still writing, not because they're stopping):
> {noformat}
> regionserver-2 regionserver 06:22:09.662 [DataStreamer for file 
> /hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/03169600719-host02%2C16201%2C1709180140645.1709186722780.temp
>  block BP-1645452845-192.168.2.230-1615455682886:blk_1076340939_2645368] WARN 
>  org.apache.hadoop.hdfs.DataStreamer - Error Recovery for 
> BP-1645452845-192.168.2.230-1615455682886:blk_1076340939_2645368 in pipeline 
> [DatanodeInfoWithStorage[192.168.2.230:15010,DS-2aa201ab-1027-47ec-b05f-b39d795fda85,DISK],
>  
> DatanodeInfoWithStorage[192.168.2.232:15010,DS-39651d5a-67d2-4126-88f0-45cdee967dab,DISK],
>  Datanode
> InfoWithStorage[192.168.2.231:15010,DS-e08a1d17-f7b1-4e39-9713-9706bd762f48,DISK]]:
>  datanode 
> 2(DatanodeInfoWithStorage[192.168.2.231:15010,DS-e08a1d17-f7b1-4e39-9713-9706bd762f48,DISK])
>  is bad.
> regionserver-2 regionserver 06:22:09.742 [split-log-closeStream-pool-1] INFO  
> org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink - Closed recovered edits 
> writer 
> path=hdfs://mycluster/hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/03169600719-host02%2C16201%
> 2C1709180140645.1709186722780.temp (wrote 5949 edits, skipped 0 edits in 93 
> ms)
> regionserver-2 regionserver 06:22:09.743 
> [RS_LOG_REPLAY_OPS-regionserver/host01:16201-1-Writer-0] ERROR 
> org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink - Failed to write log 
> entry aeris_v2/53308260a6b22eaf6ebb8353f7df3077/3169611655=[#edits: 8 = 
> ] to log
> regionserver-2 regionserver

[jira] [Updated] (HBASE-28797) New version of Region#getRowLock with timeout

2024-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28797:
---
Labels: pull-request-available  (was: )

> New version of Region#getRowLock with timeout
> -
>
> Key: HBASE-28797
> URL: https://issues.apache.org/jira/browse/HBASE-28797
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.6.0, 3.0.0-beta-1
>Reporter: Viraj Jasani
>Assignee: Chandra Sekhar K
>Priority: Major
>  Labels: pull-request-available
>
> Region APIs are LimitedPrivate for Coprocs. One of the APIs provided by HBase 
> for Coproc use is to acquire row level read/write lock(s):
> {code:java}
> /**
>  * Get a row lock for the specified row. All locks are reentrant. Before 
> calling this function
>  * make sure that a region operation has already been started (the calling 
> thread has already
>  * acquired the region-close-guard lock).
>  * 
>  * The obtained locks should be released after use by {@link 
> RowLock#release()}
>  * 
>  * NOTE: the boolean passed here has changed. It used to be a boolean that 
> stated whether or not
>  * to wait on the lock. Now it is whether an exclusive lock is requested.
>  * @param row  The row actions will be performed against
>  * @param readLock is the lock reader or writer. True indicates that a 
> non-exclusive lock is
>  * requested
>  * @see #startRegionOperation()
>  * @see #startRegionOperation(Operation)
>  */
> RowLock getRowLock(byte[] row, boolean readLock) throws IOException; {code}
> The implementation by default uses the config "hbase.rowlock.wait.duration" as the 
> row-level lock timeout for both read and write locks. The default value is 
> quite high (~30s).
> While changing the cluster-wide row lock timeout might not be worthwhile for all 
> use cases, a new API that takes a timeout param would be really helpful 
> for critical, latency-sensitive Coproc APIs.
>  
> The new signature should be:
> {code:java}
> RowLock getRowLock(byte[] row, boolean readLock, int timeout) throws 
> IOException; {code}
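> As a purely illustrative sketch (not part of this ticket), a latency-sensitive coprocessor could then fail fast instead of waiting out the cluster-wide default; the millisecond unit and the exception-on-timeout behaviour are assumptions until the API is finalised:
> {code:java}
> // "region" is the Region handle obtained from the coprocessor's ObserverContext.
> region.startRegionOperation();
> Region.RowLock lock = null;
> try {
>   // Ask for an exclusive (write) lock, but give up after 500 ms instead of
>   // waiting up to hbase.rowlock.wait.duration (~30s by default).
>   lock = region.getRowLock(row, false, 500);
>   // ... perform the latency-sensitive mutation under the row lock ...
> } finally {
>   if (lock != null) {
>     lock.release();
>   }
>   region.closeRegionOperation();
> }
> {code}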



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28850) Only return from ReplicationSink.replicationEntries while all background tasks are finished

2024-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28850:
---
Labels: pull-request-available  (was: )

> Only return from ReplicationSink.replicationEntries while all background 
> tasks are finished
> ---
>
> Key: HBASE-28850
> URL: https://issues.apache.org/jira/browse/HBASE-28850
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication, rpc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28645) Add build information to the REST server version endpoint

2024-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28645:
---
Labels: pull-request-available  (was: )

> Add build information to the REST server version endpoint
> -
>
> Key: HBASE-28645
> URL: https://issues.apache.org/jira/browse/HBASE-28645
> Project: HBase
>  Issue Type: New Feature
>  Components: REST
>Reporter: Istvan Toth
>Priority: Minor
>  Labels: pull-request-available
>
> There is currently no way to check the REST server version / build number 
> remotely.
> The */version/cluster* endpoint takes the version from master (fair enough),
> and the */version/rest* does not include the build information.
> We should add a version field to the /version/rest endpoint, which reports 
> the version of the REST server component.
> We should also log this at startup, just like we log the cluster version now.
> We may have to add and store the version in the hbase-rest code during build, 
> similarly to how we do it for the other components.
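> As an illustration only (not from this ticket), a client could then check the REST server build remotely with a plain HTTP call; the host/port and the exact field name in the response are assumptions:
> {code:java}
> // Hypothetical remote check of the REST server version endpoint.
> URL url = new URL("http://localhost:8080/version/rest");
> HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> conn.setRequestProperty("Accept", "application/json");
> try (BufferedReader in = new BufferedReader(
>     new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
>   // With the proposed change, the response would also carry the REST server's
>   // own build/version information (field name still to be decided).
>   in.lines().forEach(System.out::println);
> }
> {code}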



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28846) Change the default Hadoop3 version to 3.4.0, and add tests to make sure HBase works with earlier supported Hadoop versions

2024-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28846:
---
Labels: pull-request-available  (was: )

> Change the default Hadoop3 version to 3.4.0, and add tests to make sure HBase 
> works with earlier supported Hadoop versions
> --
>
> Key: HBASE-28846
> URL: https://issues.apache.org/jira/browse/HBASE-28846
> Project: HBase
>  Issue Type: Improvement
>  Components: hadoop3, test
>Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>
> Discussed on the mailing list:
> https://lists.apache.org/thread/orc62x0v2ktvj26ltvrqpfgzr94ncswn



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28860) Add a metric of the amount of data written to WAL to determine the pressure of replication

2024-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28860:
---
Labels: pull-request-available  (was: )

> Add a metric of the amount of data written to WAL to determine the pressure 
> of replication
> --
>
> Key: HBASE-28860
> URL: https://issues.apache.org/jira/browse/HBASE-28860
> Project: HBase
>  Issue Type: Improvement
>Reporter: terrytlu
>Priority: Major
>  Labels: pull-request-available
>
> Add a metric of the amount of data written to WAL to determine the pressure 
> of replication.
> Combined with the replication shipped-size metric, the user can determine how 
> many RegionServers are needed to meet the data (WAL) writing requirements, 
> that is, to achieve the goal of no replication lag.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28845) table level wal appendSize and replication source metrics not correctly shown in /jmx response

2024-09-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28845:
---
Labels: pull-request-available  (was: )

> table level wal appendSize and replication source metrics not correctly shown 
> in /jmx response
> --
>
> Key: HBASE-28845
> URL: https://issues.apache.org/jira/browse/HBASE-28845
> Project: HBase
>  Issue Type: Bug
>Reporter: terrytlu
>Assignee: terrytlu
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-09-18-11-21-10-279.png, 
> image-2024-09-18-11-21-20-295.png
>
>
> Found two metrics that did not display in the /jmx HTTP interface response: 
> table-level WAL appendSize and table-level replication source.
> I suspect it's because the metric name contains a colon:
> !image-2024-09-18-11-21-10-279.png|width=521,height=161!
>  
> !image-2024-09-18-11-21-20-295.png|width=521,height=282!
>  
> After modifying the table name string to "Namespace_$namespace_table_$table", the 
> metrics display correctly in the /jmx response.
>  
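> A minimal sketch (not the actual patch) of the workaround described above, building an underscore-based metric name instead of the colon-separated table name; the helper method is hypothetical:
> {code:java}
> // JMX-unfriendly characters such as ':' in "ns:table" break the /jmx output,
> // so derive a flat metric name from the TableName instead.
> static String toMetricTableName(TableName tableName) {
>   // e.g. "ns:tbl" -> "Namespace_ns_table_tbl"
>   return "Namespace_" + tableName.getNamespaceAsString()
>     + "_table_" + tableName.getQualifierAsString();
> }
> {code}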



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28842) TestRequestAttributes should fail when expected

2024-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28842:
---
Labels: pull-request-available  (was: )

> TestRequestAttributes should fail when expected
> ---
>
> Key: HBASE-28842
> URL: https://issues.apache.org/jira/browse/HBASE-28842
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Evelyn Boland
>Assignee: Evelyn Boland
>Priority: Major
>  Labels: pull-request-available
>
> Problem:
> The tests in the TestRequestAttributes class pass even when they should fail. 
> I've included an example of a test that should fail but does not below.
> Fix:
> Throw an IOException in the AttributesCoprocessor when the map of expected 
> request attributes does not match the map of given request attributes.
>  
> Test: 
> We set 2+ request attributes on the Get request but always return 0 request 
> attributes from AttributesCoprocessor::getRequestAttributesForRowKey method. 
> Yet the test passes even though the map of expected request attributes never 
> matches the map of given request attributes.
> {code:java}
> @Category({ ClientTests.class, MediumTests.class })
> public class TestRequestAttributes {
>   @ClassRule
>   public static final HBaseClassTestRule CLASS_RULE =
>     HBaseClassTestRule.forClass(TestRequestAttributes.class);
> 
>   private static final byte[] ROW_KEY1 = Bytes.toBytes("1");
>   private static final Map<String, byte[]> CONNECTION_ATTRIBUTES = new HashMap<>();
>   private static final Map<byte[], Map<String, byte[]>> ROW_KEY_TO_REQUEST_ATTRIBUTES =
>     new HashMap<>();
>   static {
>     CONNECTION_ATTRIBUTES.put("clientId", Bytes.toBytes("foo"));
>     ROW_KEY_TO_REQUEST_ATTRIBUTES.put(ROW_KEY1, addRandomRequestAttributes());
>   }
>   private static final ExecutorService EXECUTOR_SERVICE = Executors.newFixedThreadPool(100);
>   private static final byte[] FAMILY = Bytes.toBytes("0");
>   private static final TableName TABLE_NAME = TableName.valueOf("testRequestAttributes");
>   private static final HBaseTestingUtil TEST_UTIL = new HBaseTestingUtil();
>   private static SingleProcessHBaseCluster cluster;
> 
>   @BeforeClass
>   public static void setUp() throws Exception {
>     cluster = TEST_UTIL.startMiniCluster(1);
>     Table table = TEST_UTIL.createTable(TABLE_NAME, new byte[][] { FAMILY }, 1,
>       HConstants.DEFAULT_BLOCKSIZE, AttributesCoprocessor.class.getName());
>     table.close();
>   }
> 
>   @AfterClass
>   public static void afterClass() throws Exception {
>     cluster.close();
>     TEST_UTIL.shutdownMiniCluster();
>   }
> 
>   @Test
>   public void testRequestAttributesGet() throws IOException {
>     Configuration conf = TEST_UTIL.getConfiguration();
>     try (
>       Connection conn = ConnectionFactory.createConnection(conf, null, AuthUtil.loginClient(conf),
>         CONNECTION_ATTRIBUTES);
>       Table table = configureRequestAttributes(conn.getTableBuilder(TABLE_NAME, EXECUTOR_SERVICE),
>         ROW_KEY_TO_REQUEST_ATTRIBUTES.get(ROW_KEY1)).build()) {
>       table.get(new Get(ROW_KEY1));
>     }
>   }
> 
>   private static Map<String, byte[]> addRandomRequestAttributes() {
>     Map<String, byte[]> requestAttributes = new HashMap<>();
>     int j = Math.max(2, (int) (10 * Math.random()));
>     for (int i = 0; i < j; i++) {
>       requestAttributes.put(String.valueOf(i), Bytes.toBytes(UUID.randomUUID().toString()));
>     }
>     return requestAttributes;
>   }
> 
>   public static class AttributesCoprocessor implements RegionObserver, RegionCoprocessor {
>     @Override
>     public Optional<RegionObserver> getRegionObserver() {
>       return Optional.of(this);
>     }
> 
>     @Override
>     public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> c, Get get,
>       List<Cell> result) throws IOException {
>       validateRequestAttributes(getRequestAttributesForRowKey(get.getRow()));
>     }
> 
>     private Map<String, byte[]> getRequestAttributesForRowKey(byte[] rowKey) {
>       return Collections.emptyMap(); // This line helps demonstrate the bug
>     }
> 
>     private void validateRequestAttributes(Map<String, byte[]> requestAttributes) {
>       RpcCall rpcCall = RpcServer.getCurrentCall().get();
>       Map<String, byte[]> attrs = rpcCall.getRequestAttributes();
>       if (attrs.size() != requestAttributes.size()) {
>         return;
>       }
>       for (Map.Entry<String, byte[]> attr : attrs.entrySet()) {
>         if (!requestAttributes.containsKey(attr.getKey())) {
>           return;
>         }
>         if (!Arrays.equals(requestAttributes.get(attr.getKey()), attr.getValue())) {
>           return;
>         }
>       }
>     }
>   }
> } {code}
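> A sketch of the proposed fix (not the final patch): make the coprocessor throw instead of silently returning, so a mismatch fails the Get and therefore the test:
> {code:java}
> private void validateRequestAttributes(Map<String, byte[]> requestAttributes) throws IOException {
>   RpcCall rpcCall = RpcServer.getCurrentCall().get();
>   Map<String, byte[]> attrs = rpcCall.getRequestAttributes();
>   if (attrs.size() != requestAttributes.size()) {
>     throw new IOException("Expected " + requestAttributes.size()
>       + " request attributes but got " + attrs.size());
>   }
>   for (Map.Entry<String, byte[]> attr : attrs.entrySet()) {
>     if (!requestAttributes.containsKey(attr.getKey())
>       || !Arrays.equals(requestAttributes.get(attr.getKey()), attr.getValue())) {
>       throw new IOException("Unexpected value for request attribute " + attr.getKey());
>     }
>   }
> }
> {code}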
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28803) HBase Master stuck due to improper handling of WALSyncTimeoutException within UncheckedIOException

2024-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28803:
---
Labels: pull-request-available  (was: )

> HBase Master stuck due to improper handling of WALSyncTimeoutException within 
> UncheckedIOException
> --
>
> Key: HBASE-28803
> URL: https://issues.apache.org/jira/browse/HBASE-28803
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.6.0, 3.0.0-alpha-4
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Critical
>  Labels: pull-request-available
>
> One of our test clusters got stuck during a rolling restart due to a WAL.sync 
> timeout. The issue did not result in the Master aborting because the 
> WALSyncTimeoutException was wrapped in an UncheckedIOException, which 
> prevented the proper exception handling mechanism from being triggered. As a 
> result, the Master was hanging for a long time and procedures were stuck.
> This was a 2.4-based HBase with HBASE-27230.
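> A minimal sketch (not the actual patch) of the kind of unwrapping that would let the existing WAL-timeout handling trigger; the surrounding call is a stand-in, and the log excerpt that follows shows the wrapped exception as observed:
> {code:java}
> try {
>   // stand-in for the RegionProcedureStore update that may throw an
>   // UncheckedIOException wrapping the WAL sync timeout
>   store.delete(procIds);
> } catch (UncheckedIOException e) {
>   IOException cause = e.getCause();
>   if (cause instanceof WALSyncTimeoutIOException) {
>     // Re-surface the wrapped exception so the normal WAL-timeout handling
>     // (abort the Master rather than hang) can trigger.
>     throw cause;
>   }
>   throw e;
> }
> {code}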
> {noformat}
> 2024-08-17 17:23:07,567 ERROR 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore: Failed 
> to delete pid=2027
> org.apache.hadoop.hbase.regionserver.wal.WALSyncTimeoutIOException: 
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync 
> result after 30 ms for txid=4347, WAL system stuck?
>     at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:848)
>     at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:718)
>     at org.apache.hadoop.hbase.regionserver.HRegion.sync(HRegion.java:8902)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.doWALAppend(HRegion.java:8469)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4523)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4447)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4377)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4853)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4847)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4843)
>     at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3155)
>     at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.lambda$delete$8(RegionProcedureStore.java:379)
>     at 
> org.apache.hadoop.hbase.master.region.MasterRegion.update(MasterRegion.java:141)
>     at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.delete(RegionProcedureStore.java:379)
>     at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.delete(RegionProcedureStore.java:410)
>     at 
> org.apache.hadoop.hbase.procedure2.CompletedProcedureCleaner.periodicExecute(CompletedProcedureCleaner.java:135)
>     at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeInMemoryChore(TimeoutExecutorThread.java:122)
>     at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:101)
>     at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
> Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to 
> get sync result after 30 ms for txid=4347, WAL system stuck?
>     at 
> org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:171)
>     at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:844)
>     ... 18 more
> 2024-08-17 17:23:07,568 ERROR 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread: Ignoring pid=-1, 
> state=WAITING_TIMEOUT; 
> org.apache.hadoop.hbase.procedure2.CompletedProcedureCleaner exception: 
> org.apache.hadoop.hbase.regionserver.wal.WALSyncTimeoutIOException: 
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync 
> result after 30 ms for txid=4347, WAL system stuck?
> java.io.UncheckedIOException: 
> org.apache.hadoop.hbase.regionserver.wal.WALSyncTimeoutIOException: 
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync 
> result after 30 ms for txid=4347, WAL system stuck?
>     at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.delete(RegionProcedureStore.java:383)
>     at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.delete(RegionProcedureStore.java:410)
>     at 
> org.apache.hadoop.hbase.procedure2.CompletedProcedureCleaner.periodicExecute(CompletedProcedureCleaner.java:135)
>     at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeInMemoryChore(Ti

[jira] [Updated] (HBASE-28840) Optimise memory utilisation retrieval of bucket-cache from persistence.

2024-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28840:
---
Labels: pull-request-available  (was: )

> Optimise memory utilisation retrieval of bucket-cache from persistence.
> ---
>
> Key: HBASE-28840
> URL: https://issues.apache.org/jira/browse/HBASE-28840
> Project: HBase
>  Issue Type: Bug
>  Components: BucketCache
>Reporter: Janardhan Hungund
>Assignee: Janardhan Hungund
>Priority: Major
>  Labels: pull-request-available
>
> During the persistence of the bucket-cache backing map to a file, the backing map 
> is divided into multiple smaller chunks and persisted to the file. This 
> chunking avoids high memory utilisation during persistence, since only 
> a small subset of backing map entries needs to be persisted in one chunk.
> However, during the retrieval of the backing map at server startup, 
> we accumulate all these chunks into a list and then process each chunk to 
> recreate the in-memory backing map. Since all the chunks are fetched from 
> the persistence file before being processed, the memory requirement is higher.
> The retrieval of the bucket cache from the persistence file can be optimised to 
> process one chunk at a time and avoid high memory utilisation, as sketched below.
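> A rough sketch of the proposed read path (the chunk message and helper are placeholders, not the real persisted schema): parse and apply one chunk at a time instead of first collecting all chunks into a list:
> {code:java}
> try (FileInputStream in = new FileInputStream(persistencePath)) {
>   BucketCacheProtos.BackingMapChunk chunk; // placeholder proto type
>   // parseDelimitedFrom returns null at end-of-stream; each chunk is applied to
>   // the in-memory backing map and can then be garbage-collected.
>   while ((chunk = BucketCacheProtos.BackingMapChunk.parseDelimitedFrom(in)) != null) {
>     applyChunkToBackingMap(chunk); // placeholder for the per-chunk rebuild step
>   }
> }
> {code}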
> Thanks,
> Janardhan 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28841) Modify default value of hbase.bucketcache.persistence.chunksize to 10K

2024-09-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28841:
---
Labels: pull-request-available  (was: )

> Modify default value of hbase.bucketcache.persistence.chunksize to 10K
> --
>
> Key: HBASE-28841
> URL: https://issues.apache.org/jira/browse/HBASE-28841
> Project: HBase
>  Issue Type: Bug
>Reporter: Janardhan Hungund
>Assignee: Janardhan Hungund
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the default value of the configuration parameter
> "hbase.bucketcache.persistence.chunksize" is 10 million (10,000,000). 
> This is the number of block entries that are processed in one chunk during the 
> persistence of the bucket-cache backing map to the persistence file. During 
> testing, it was found that this high chunk size resulted in high heap 
> utilisation in region servers, leading to longer GC pauses which in turn led to 
> intermittent server crashes.
> When the value of this configuration is set to 10K (10,000), the cache remains 
> stable: no GC delays and no server crashes are observed.
> The jmap outputs collected against the regionservers showed reduced memory 
> utilisation, from 4.5-5GB down to 1-1.5GB, for the objects related to the 
> persistence code.
> Hence, we need to adjust the default value of this configuration parameter to 
> 10K.
> Thanks,
> Janardhan



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28839) Exception handling during retrieval of bucket-cache from persistence.

2024-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28839:
---
Labels: pull-request-available  (was: )

> Exception handling during retrieval of bucket-cache from persistence.
> -
>
> Key: HBASE-28839
> URL: https://issues.apache.org/jira/browse/HBASE-28839
> Project: HBase
>  Issue Type: Bug
>  Components: BucketCache
>Reporter: Janardhan Hungund
>Assignee: Janardhan Hungund
>Priority: Major
>  Labels: pull-request-available
>
> During the retrieval of the bucket cache from the persistence file at startup, 
> it was observed that if an exception other than an IOException occurs, the 
> bucket cache's internal members remain uninitialised and the bucket cache stays 
> unusable. The exception is not logged in the trace file, and the retrieval 
> thread exits without initialising the bucket cache.
> Also, NullPointerExceptions are seen when trying to use the cache.
> {code:java}
> 2024-09-10 14:33:30,020 ERROR 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache: WriterThread encountered 
> error
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1975)
> at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.doDrain(BucketCache.java:1298)
>  {code}
>  
> {code:java}
> 2024-09-13 07:01:05,964 ERROR 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: Error getting metrics 
> from source RegionServer,sub=Server
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getFreeSize(BucketCache.java:1819)
> at 
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getFreeSize(CombinedBlockCache.java:179)
> at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionServerWrapperImpl.getBlockCacheFreeSize(MetricsRegionServerWrapperImpl.java:308)
> at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceImpl.addGaugesToMetricsRecordBuilder(MetricsRegionServerSourceImpl.java:525)
> at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceImpl.getMetrics(MetricsRegionServerSourceImpl.java:333)
>  {code}
> All types of exceptions need to be handled gracefully.
> All types of exceptions must be logged to the trace file.
> The bucket cache needs to be reinitialised and made usable, for example as sketched below.
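> A minimal sketch of the hardened retrieval thread described above (the helper names are hypothetical):
> {code:java}
> Runnable retriever = () -> {
>   try {
>     retrieveCacheFromPersistence(); // stand-in for the existing retrieval step
>   } catch (Throwable t) {           // not only IOException
>     LOG.warn("Failed to restore bucket cache from persistence, rebuilding an empty cache", t);
>     discardPersistenceFile();       // hypothetical: drop the stale persistence file
>     initializeEmptyCache();         // hypothetical: reinitialise internal members so the cache is usable
>   }
> };
> {code}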
> Thanks,
> Janardhan



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28432) Refactor tools which are under test packaging to a new module hbase-tools

2024-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28432:
---
Labels: pull-request-available  (was: )

> Refactor tools which are under test packaging to a new module hbase-tools
> -
>
> Key: HBASE-28432
> URL: https://issues.apache.org/jira/browse/HBASE-28432
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
>  Labels: pull-request-available
>
> The purpose of this task is to refactor and move certain tools currently 
> located under the test packaging to a new module, named 'hbase-tools'.
> The following tools have been initially identified for relocation(will add 
> more as and when identified):
>  - 
> [PerformanceEvaluation|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java]
>  - 
> [LoadTestTool|https://github.com/apache/hbase/blob/936d267d1094e37222b9b836ab068689ccce3574/hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/util/LoadTestTool.java]
>  - 
> [HFilePerformanceEvaluation|https://github.com/apache/hbase/blob/936d267d1094e37222b9b836ab068689ccce3574/hbase-server/src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java]
>  - 
> [ScanPerformanceEvaluation|https://github.com/apache/hbase/blob/936d267d1094e37222b9b836ab068689ccce3574/hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/ScanPerformanceEvaluation.java]
>  - 
> [LoadBalancerPerformanceEvaluation|https://github.com/apache/hbase/blob/936d267d1094e37222b9b836ab068689ccce3574/hbase-balancer/src/test/java/org/apache/hadoop/hbase/master/balancer/LoadBalancerPerformanceEvaluation.java]
> These tools are valuable beyond the scope of testing and should be accessible 
> in the binary distribution of HBase. However, their current location within 
> the test jars adds unnecessary bloat to the assembly and classpath, and 
> potentially introduces CVE-prone JARs into the binary assemblies. We plan to 
> remove all test jars from assembly with HBASE-28433.
> This task involves creating the new 'hbase-tools' module, and moving the 
> identified tools into this module. It also includes ensuring that these tools 
> function correctly in their new location and that their relocation does not 
> negatively impact any existing functionality or dependencies.
> CC: [~stoty], [~zhangduo], [~ndimiduk], [~bbeaudreault]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28830) when a procedure on a table executed as a child procedure, further table procedure operations on that table are blocked forever waiting to acquire the table procedure lo

2024-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28830:
---
Labels: pull-request-available  (was: )

> when a procedure on a table executed as a child procedure, further table 
> procedure operations on that table are blocked forever waiting to acquire the 
> table procedure lock 
> 
>
> Key: HBASE-28830
> URL: https://issues.apache.org/jira/browse/HBASE-28830
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 3.0.0-beta-1, 2.5.10
>Reporter: Chandra Sekhar K
>Assignee: Chandra Sekhar K
>Priority: Critical
>  Labels: pull-request-available
>
> When a procedure on a table is executed as a child procedure, further table 
> procedure operations on that table are blocked forever, waiting to acquire the 
> table procedure lock.
> This issue occurs because the table lock is not cleared for table procedures 
> submitted as child procedures, after the changes in HBASE-28683.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28836) Parallelize the archival of compacted files

2024-09-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28836:
---
Labels: pull-request-available  (was: )

> Parallelize the archival of compacted files 
> 
>
> Key: HBASE-28836
> URL: https://issues.apache.org/jira/browse/HBASE-28836
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 2.5.10
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
>
> While splitting a region in HBase, it has to clean up compacted files for 
> bookkeeping.
>  
> Currently we do it sequentially, and that is good enough for HDFS because there 
> it is a fast operation. When we do the same on S3 it becomes an issue. We need to 
> parallelize this to make it faster; see the sketch after the snippet below.
> {code:java}
> // code placeholder
> for (File file : toArchive) {
>       // if its a file archive it
>       try {
>         LOG.trace("Archiving {}", file);
>         if (file.isFile()) {
>           // attempt to archive the file
>           if (!resolveAndArchiveFile(baseArchiveDir, file, startTime)) {
>             LOG.warn("Couldn't archive " + file + " into backup directory: " 
> + baseArchiveDir);
>             failures.add(file);
>           }
>         } else {
>           // otherwise its a directory and we need to archive all files
>           LOG.trace("{} is a directory, archiving children files", file);
>           // so we add the directory name to the one base archive
>           Path parentArchiveDir = new Path(baseArchiveDir, file.getName());
>           // and then get all the files from that directory and attempt to
>           // archive those too
>           Collection<File> children = file.getChildren();
>           failures.addAll(resolveAndArchive(fs, parentArchiveDir, children, 
> start));
>         }
>       } catch (IOException e) {
>         LOG.warn("Failed to archive {}", file, e);
>         failures.add(file);
>       }
>     } {code}
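> One possible shape of the parallel version (a sketch, not the actual patch): submit each file or directory to an executor and collect the failures afterwards:
> {code:java}
> // Pool size is a tuning choice; the per-item logic mirrors the sequential loop above.
> ExecutorService pool = Executors.newFixedThreadPool(archiveThreads);
> List<Future<List<File>>> pending = new ArrayList<>();
> for (File file : toArchive) {
>   pending.add(pool.submit(() -> {
>     List<File> failed = new ArrayList<>();
>     try {
>       if (file.isFile()) {
>         if (!resolveAndArchiveFile(baseArchiveDir, file, startTime)) {
>           failed.add(file);
>         }
>       } else {
>         Path parentArchiveDir = new Path(baseArchiveDir, file.getName());
>         failed.addAll(resolveAndArchive(fs, parentArchiveDir, file.getChildren(), startTime));
>       }
>     } catch (IOException e) {
>       LOG.warn("Failed to archive {}", file, e);
>       failed.add(file);
>     }
>     return failed;
>   }));
> }
> for (Future<List<File>> f : pending) {
>   try {
>     failures.addAll(f.get());
>   } catch (InterruptedException | ExecutionException e) {
>     LOG.warn("Archival task failed", e);
>   }
> }
> pool.shutdown();
> {code}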



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-25768:
---
Labels: pull-request-available  (was: )

> Support an overall coarse and fast balance strategy for StochasticLoadBalancer
> --
>
> Key: HBASE-25768
> URL: https://issues.apache.org/jira/browse/HBASE-25768
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 2.0.0, 1.4.13
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.0-beta-2
>
>
> When we use StochasticLoadBalancer + balanceByTable, we could face two 
> difficulties.
>  # For each table, its regions are distributed uniformly, but for the 
> overall cluster there can still be imbalance between RSes;
>  # When there is a large-scale restart of RSes, or an expansion of groups or of 
> the cluster, we hope the balancer can execute as soon as possible, but the 
> StochasticLoadBalancer may need a lot of time to compute costs.
> We can detect these circumstances in StochasticLoadBalancer (for example using the 
> percentage of skewed tables), and before trying the normal balance steps, we 
> can add a strategy that lets it balance like the SimpleLoadBalancer or use 
> a few lightweight cost functions.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28825) Add deprecation cycle for methods in TokenUtil

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28825:
---
Labels: pull-request-available  (was: )

> Add deprecation cycle for methods in TokenUtil
> --
>
> Key: HBASE-28825
> URL: https://issues.apache.org/jira/browse/HBASE-28825
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Duo Zhang
>Assignee: MisterWang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28822) Change the LOG field in CodecPerformance to private

2024-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28822:
---
Labels: pull-request-available  (was: )

> Change the LOG field in CodecPerformance to private
> ---
>
> Key: HBASE-28822
> URL: https://issues.apache.org/jira/browse/HBASE-28822
> Project: HBase
>  Issue Type: Sub-task
>  Components: logging, test
>Reporter: Duo Zhang
>Assignee: MisterWang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28821) Optimise bucket cache persistence by reusing backmap entry object.

2024-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28821:
---
Labels: pull-request-available  (was: )

> Optimise bucket cache persistence by reusing backmap entry object.
> --
>
> Key: HBASE-28821
> URL: https://issues.apache.org/jira/browse/HBASE-28821
> Project: HBase
>  Issue Type: Bug
>  Components: BucketCache
>Reporter: Janardhan Hungund
>Assignee: Janardhan Hungund
>Priority: Major
>  Labels: pull-request-available
>
> During the persistence of backing map entries into the backing map file, we 
> create a new BackingMapEntry.Builder for each entry in the backing map. This 
> can be optimised by using a single BackingMapEntry.Builder object and using 
> it to build each entry during serialisation.
> This Jira tracks the optimisation by avoiding multiple builder objects.
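> A sketch of the idea (the generated message and surrounding builder names are assumed here): keep a single entry builder and clear() it between entries rather than allocating a new builder per entry:
> {code:java}
> // Builder.clear() resets the builder to its default state without a new allocation.
> BucketCacheProtos.BackingMapEntry.Builder entryBuilder =
>   BucketCacheProtos.BackingMapEntry.newBuilder();
> for (Map.Entry<BlockCacheKey, BucketEntry> entry : backingMap.entrySet()) {
>   entryBuilder.clear();
>   // ... populate the key and value fields for this entry on entryBuilder ...
>   backingMapBuilder.addEntry(entryBuilder.build()); // add to the enclosing (assumed) backing-map builder
> }
> {code}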
> Thanks,
> Janardhan



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28812) Upgrade from 2.6.0 to 3.0.0 crashed

2024-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28812:
---
Labels: pull-request-available upgrade  (was: upgrade)

> Upgrade from 2.6.0 to 3.0.0 crashed
> ---
>
> Key: HBASE-28812
> URL: https://issues.apache.org/jira/browse/HBASE-28812
> Project: HBase
>  Issue Type: Bug
>  Components: compatibility
>Affects Versions: 3.0.0
>Reporter: Ke Han
>Assignee: Duo Zhang
>Priority: Major
>  Labels: pull-request-available, upgrade
> Attachments: hbase--master-2d6e4fad2af5.log, 
> hbase--master-440ed844e077.log
>
>
> I am trying to upgrade from 2.6.0 (stable release) to 3.0.0. I built 3.0.0 
> using the following commit (a030e8099840e640684a68b6e4a79e7c1d5a6823)
> {code:java}
> commit a030e8099840e640684a68b6e4a79e7c1d5a6823 (HEAD -> branch-3, 
> upstream/branch-3)
> Author: Ray Mattingly 
> Date:   Mon Sep 2 04:38:29 2024 -0400
> 
>     HBASE-28697 Don't clean bulk load system entries until backup is complete (#6089)
>     
>     Co-authored-by: Ray Mattingly 
> {code}
> However, the HMaster would crash during the upgrade process.
> h1. Reproduce
> Step1: Start up 2.6.0 cluster (1 HDFS, 1 HM, 1 RS)
> Step2: Stop the entire cluster
> Step3: Upgrade to 3.0.0 cluster.
> HMaster will crash with the following error message
> {code:java}
> 2024-09-04T04:29:18,917 WARN  [master/hmaster:16000:becomeActiveMaster] 
> regionserver.HRegion: Failed initialize of region= 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682., starting to roll back 
> memstore
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
> Trailer from file 
> hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1215)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1158)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1030)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:974) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7794) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7749)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:277)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:432)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:135)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1003)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
>  ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at java.lang.Thread.run(Thread.java:833) ~[?:?]
> Caused by: java.io.IOException: 
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
> Trailer from file 
> hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75
>         at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:289)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:339)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:301) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> o

[jira] [Updated] (HBASE-28580) Revert the deprecation for methods in WALObserver

2024-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28580:
---
Labels: pull-request-available  (was: )

> Revert the deprecation for methods in WALObserver
> -
>
> Key: HBASE-28580
> URL: https://issues.apache.org/jira/browse/HBASE-28580
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors, wal
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Per the discussion in this thread
> https://lists.apache.org/thread/28c2tn4kn9gwvtsdtcbxx1c5tjdfh5jy



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28810) BackupLogCleaner is difficult to debug

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28810:
---
Labels: pull-request-available  (was: )

> BackupLogCleaner is difficult to debug
> --
>
> Key: HBASE-28810
> URL: https://issues.apache.org/jira/browse/HBASE-28810
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.6.1
>Reporter: Ray Mattingly
>Assignee: Ray Mattingly
>Priority: Major
>  Labels: pull-request-available
>
> While implementing HBase's incremental backups across a few hundred clusters, 
> we continue to step on some rakes. Now and again, we find old WALs piling up 
> due to a poorly cleaned up BackupInfo, or a bug in the BackupLogCleaner, etc.
> The BackupLogCleaner is difficult to debug for a couple of reasons:
>  # It has a lack of useful debug logging
>  # It has [a misleadingly named 
> method|https://github.com/HubSpot/hbase/blob/2d08fa67dfe9458260bc0be3ebc7bbd769850190/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/master/BackupLogCleaner.java#L83-L83]
>  (this method returns the newest backup ts, not the oldest)
> I'm going to introduce a small refactor that will alleviate these pain points.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28807) Remove some useless code and add some logs for CanaryTool

2024-08-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28807:
---
Labels: pull-request-available  (was: )

> Remove some useless code and add some logs for CanaryTool
> -
>
> Key: HBASE-28807
> URL: https://issues.apache.org/jira/browse/HBASE-28807
> Project: HBase
>  Issue Type: Improvement
>  Components: canary
>Reporter: MisterWang
>Assignee: MisterWang
>Priority: Minor
>  Labels: pull-request-available
>
> Remove some useless code in CanaryTool.sniff.
> Add some logs when a null location is returned for a table region.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28433) Modify the assembly to not include test jars and their transitive dependencies

2024-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28433:
---
Labels: pull-request-available  (was: )

> Modify the assembly to not include test jars and their transitive dependencies
> --
>
> Key: HBASE-28433
> URL: https://issues.apache.org/jira/browse/HBASE-28433
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
>  Labels: pull-request-available
>
> This task aims to modify the HBase assembly to exclude test jars and their 
> transitive dependencies.
> Currently, our assembly includes test jars and test dependencies, which adds 
> unnecessary bloat to the assembly and classpath. This not only increases the 
> distribution size but also potentially introduces CVE-prone JARs into the 
> binary assemblies.
> The objective of this task is to modify the build and assembly process to 
> exclude these test jars and their dependencies. This will result in a leaner, 
> more secure assembly with a faster startup time.
> CC:  [~stoty], [~zhangduo], [~ndimiduk], [~bbeaudreault]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28805) Implement chunked persistence of backing map for persistent bucket cache.

2024-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28805:
---
Labels: pull-request-available  (was: )

> Implement chunked persistence of backing map for persistent bucket cache.
> -
>
> Key: HBASE-28805
> URL: https://issues.apache.org/jira/browse/HBASE-28805
> Project: HBase
>  Issue Type: Task
>  Components: BucketCache
>Reporter: Janardhan Hungund
>Assignee: Janardhan Hungund
>Priority: Major
>  Labels: pull-request-available
>
> The persistent bucket cache feature relies on persisting the backing map to a 
> persistence file. The protobuf APIs are used to serialise the backing map and 
> its related structures into the file. An asynchronous thread periodically 
> flushes the contents of the backing map to the persistence file.
> The protobuf library has a 2GB limit on the size of protobuf messages. If the 
> size of the backing map grows beyond 2GB, an unexpected exception is reported 
> in the asynchronous thread and stops the persister thread. This causes the 
> persistence file to go out of sync with the actual bucket cache. Due to this, 
> the bucket cache shrinks to a smaller size after a cache restart, and checksum 
> errors are also reported.
> This Jira tracks the implementation of chunking the backing map during 
> persistence such that every protobuf message is smaller than 2GB, roughly as 
> sketched below.
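> A rough sketch of the write side (message and helper names are placeholders, not the real schema): split the backing map into fixed-size chunks and write each chunk as its own length-delimited protobuf message, so no single message approaches the 2GB limit:
> {code:java}
> try (FileOutputStream out = new FileOutputStream(persistencePath)) {
>   BucketCacheProtos.BackingMapChunk.Builder chunk = BucketCacheProtos.BackingMapChunk.newBuilder();
>   int inChunk = 0;
>   for (Map.Entry<BlockCacheKey, BucketEntry> e : backingMap.entrySet()) {
>     chunk.addEntry(toProtoEntry(e)); // placeholder for the per-entry serialisation
>     if (++inChunk == chunkSize) {    // e.g. hbase.bucketcache.persistence.chunksize
>       chunk.build().writeDelimitedTo(out); // length-prefixed, so chunks can be read back one at a time
>       chunk.clear();
>       inChunk = 0;
>     }
>   }
>   if (inChunk > 0) {
>     chunk.build().writeDelimitedTo(out); // flush the final partial chunk
>   }
> }
> {code}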
> Thanks,
> Janardhan



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28804) Implement asynchronous retrieval of bucket-cache data from persistence.

2024-08-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28804:
---
Labels: pull-request-available  (was: )

> Implement asynchronous retrieval of bucket-cache data from persistence.
> ---
>
> Key: HBASE-28804
> URL: https://issues.apache.org/jira/browse/HBASE-28804
> Project: HBase
>  Issue Type: Task
>  Components: BucketCache
>Reporter: Janardhan Hungund
>Assignee: Janardhan Hungund
>Priority: Major
>  Labels: pull-request-available
>
> During the retrieval of data from the bucket cache persistence file, a 
> transient structure that stores the blocks ordered by filename is constructed 
> from the backing map entries. The population of this transient structure is 
> done during server start-up. This process increases the region-server 
> startup time if the bucket cache has a large number of blocks.
> This population happens inline with the server restart and blocks the server 
> for several minutes. This makes the server restart inconvenient for external 
> users, and restarts during an upgrade can run into timeout issues due to this 
> delay in the server startup.
> Hence, the recommendation in this Jira is to make the cache retrieval 
> asynchronous to the server startup. During a server startup, a new thread is 
> spawned that reads the persistence file and creates the required structures 
> from it. The server continues with the restart and does not wait for the 
> bucket-cache initialisation to complete.
> Note that the bucket cache is not available immediately for usage and will 
> only be ready to use after the data is repopulated from persistence into 
> memory.
> Thanks,
> Janardhan



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28801) WALs are not cleaned even after all entries are flushed

2024-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28801:
---
Labels: pull-request-available  (was: )

> WALs are not cleaned even after all entries are flushed
> ---
>
> Key: HBASE-28801
> URL: https://issues.apache.org/jira/browse/HBASE-28801
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.5.6
>Reporter: Kiran Kumar Maturi
>Assignee: Kiran Kumar Maturi
>Priority: Minor
>  Labels: pull-request-available
>
> In our production fleet we have observed that WAL files are not cleaned up 
> even when all the entries have been flushed. I fixed the WAL close issue for 
> the case where the WAL closures fail, as part of 
> [HBASE-28665|https://issues.apache.org/jira/projects/HBASE/issues/HBASE-28665].
> There is another case, involving unflushed entries, that can leave the WAL 
> uncleaned even after all the entries have eventually been flushed:
> [FSHLog.java|https://github.com/apache/hbase/blob/branch-2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L388]
> {code:java}
>  if (isUnflushedEntries() || closeErrorCount.get() >= 
> this.closeErrorsTolerated) {
>   try {
> closeWriter(this.writer, oldPath, true);
>   } finally {
> inflightWALClosures.remove(oldPath.getName());
> if (!isUnflushedEntries()) {
>   markClosedAndClean(oldPath);
> }
>   }
> {code}
> If there are unflushed entries, the WAL will never be marked closed and won't 
> be cleaned up further:
> {code:java}
> private synchronized void cleanOldLogs() {
> List<Pair<Path, Long>> logsToArchive = null;
> // For each log file, look at its Map of regions to the highest sequence 
> id; if all sequence ids
> // are older than what is currently in memory, the WAL can be GC'd.
> for (Map.Entry<Path, WALProps> e : this.walFile2Props.entrySet()) {
>   if (!e.getValue().closed) {
> LOG.debug("{} is not closed yet, will try archiving it next time", 
> e.getKey());
> continue;
>   }
>   Path log = e.getKey();
>   Map<byte[], Long> sequenceNums = e.getValue().encodedName2HighestSequenceId;
>   if (this.sequenceIdAccounting.areAllLower(sequenceNums)) {
> if (logsToArchive == null) {
>   logsToArchive = new ArrayList<>();
> }
> logsToArchive.add(Pair.newPair(log, e.getValue().logSize));
> if (LOG.isTraceEnabled()) {
>   LOG.trace("WAL file ready for archiving " + log);
> }
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28802) Fix the expected RS hostname message during useIP feature enabled

2024-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28802:
---
Labels: pull-request-available  (was: )

> Fix the expected  RS hostname message during useIP feature enabled
> --
>
> Key: HBASE-28802
> URL: https://issues.apache.org/jira/browse/HBASE-28802
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 3.0.0-alpha-3, 2.5.3
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Y. SREENIVASULU REDDY
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.0.0-alpha-4, 2.7.0
>
>
> In HRegionServer#handleReportForDutyResponse, when the hostname differs between 
> the regionserver and the master side, both conditions should abort the RS; the 
> error message for this case is corrected.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28732) Fix typo in Jenkinsfile_Github for jdk8 hadoop2 check

2024-08-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28732:
---
Labels: beginner pull-request-available trivial  (was: beginner trivial)

> Fix typo in Jenkinsfile_Github for jdk8 hadoop2 check
> -
>
> Key: HBASE-28732
> URL: https://issues.apache.org/jira/browse/HBASE-28732
> Project: HBase
>  Issue Type: Improvement
>  Components: jenkins
>Reporter: Duo Zhang
>Assignee: JinHyuk Kim
>Priority: Major
>  Labels: beginner, pull-request-available, trivial
>
> https://github.com/apache/hbase/blob/9dee538f65d84a900724d424c71793dff46e9684/dev-support/Jenkinsfile_GitHub#L314
> This line
> PR JDK8 Hadoop3 Check Report
> Should be
> PR JDK8 Hadoop2 Check Report



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28774) Protect hbase:meta from any hotspotting regions by balancing them to different region servers

2024-08-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28774:
---
Labels: pull-request-available  (was: )

> Protect hbase:meta from any hotspotting regions by balancing them to 
> different region servers
> -
>
> Key: HBASE-28774
> URL: https://issues.apache.org/jira/browse/HBASE-28774
> Project: HBase
>  Issue Type: Improvement
>  Components: meta
>Reporter: Ranganath Govardhanagiri
>Assignee: Ranganath Govardhanagiri
>Priority: Major
>  Labels: pull-request-available
>
> During some incidents, it was observed that when hbase:meta is colocated with a 
> high-load (or hotspotting) region, meta becomes unavailable, causing 
> availability issues. This item is to provide a way to load-balance such regions 
> so that meta is not impacted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27746) Check if the file system supports storage policy before invoking setStoragePolicy()

2024-08-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-27746:
---
Labels: pull-request-available  (was: )

> Check if the file system supports storage policy before invoking 
> setStoragePolicy()
> ---
>
> Key: HBASE-27746
> URL: https://issues.apache.org/jira/browse/HBASE-27746
> Project: HBase
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: pull-request-available
>
> Found these messages on an Ozone cluster:
> {noformat}
> 2023-03-20 12:27:09,185 WARN org.apache.hadoop.hbase.util.CommonFSUtils: 
> Unable to set storagePolicy=HOT for 
> path=ofs://ozone1/vol1/bucket1/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/proc.
>  DEBUG log level might have more details.
> java.lang.UnsupportedOperationException: RootedOzoneFileSystem doesn't 
> support setStoragePolicy
> at 
> org.apache.hadoop.fs.FileSystem.setStoragePolicy(FileSystem.java:3227)
> at 
> org.apache.hadoop.hbase.util.CommonFSUtils.invokeSetStoragePolicy(CommonFSUtils.java:521)
> at 
> org.apache.hadoop.hbase.util.CommonFSUtils.setStoragePolicy(CommonFSUtils.java:504)
> at 
> org.apache.hadoop.hbase.util.CommonFSUtils.setStoragePolicy(CommonFSUtils.java:477)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionFileSystem.setStoragePolicy(HRegionFileSystem.java:225)
> at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:275)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6387)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1115)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1112)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Ozone does not support storage policies. If we use the 
> FileSystem.hasPathCapability() API to check before invoking setStoragePolicy(), 
> these warning messages can be avoided.
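> A sketch of the proposed guard (assuming a Hadoop version that ships the PathCapabilities API and the storage-policy capability key):
> {code:java}
> // Only try to set a storage policy if the FileSystem claims to support it;
> // otherwise skip quietly instead of logging a WARN with a stack trace.
> if (fs.hasPathCapability(path, CommonPathCapabilities.FS_STORAGEPOLICY)) {
>   fs.setStoragePolicy(path, storagePolicy);
> } else {
>   LOG.debug("{} does not support storage policies, skipping setStoragePolicy for {}",
>     fs.getUri(), path);
> }
> {code}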



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27903) Skip submitting Split/Merge procedure when split/merge is disabled at table level

2024-08-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-27903:
---
Labels: pull-request-available  (was: )

> Skip submitting Split/Merge procedure when split/merge is disabled at table 
> level
> -
>
> Key: HBASE-27903
> URL: https://issues.apache.org/jira/browse/HBASE-27903
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin
>Reporter: Ashok shetty
>Assignee: Nihal Jain
>Priority: Minor
>  Labels: pull-request-available
>
> *Scenario*
> If split/merge is disabled at the table level, the master will submit a 
> SplitTableRegionProcedure/MergeTableRegionsProcedure and then roll it back, 
> because execution fails during the pre-checks.
> *Improvement*
> The master can check this early and skip submitting the 
> SplitTableRegionProcedure/MergeTableRegionsProcedure when the split/merge switch 
> is disabled at the table level.
> *Steps*
> {code:java}
> create 'testCreateTableWithMergeDisableParameter', 'f1', {MERGE_ENABLED => 
> false}
> list_regions 'testCreateTableWithMergeDisableParameter'
> merge_region 
> 'd21cdc5d488e8036017696c46cffd9b1','6382c8f731a4f0379b6e98ece4b06e3e'
> {code}
> {code:java}
> create 'testcreatetablewithsplitdisableparameter', 'f1', {SPLIT_ENABLED => 
> false}
> split 'testcreatetablewithsplitdisableparameter','30'{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28792) AsyncTableImpl calls coprocessor callbacks in undefined order

2024-08-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28792:
---
Labels: pull-request-available  (was: )

> AsyncTableImpl calls coprocessor callbacks in undefined order
> -
>
> Key: HBASE-28792
> URL: https://issues.apache.org/jira/browse/HBASE-28792
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Reporter: Charles Connell
>Priority: Major
>  Labels: pull-request-available
>
> To call a coprocessor endpoint asynchronously, you start by calling 
> {{AsyncTable#coprocessorService()}}, which gives you a 
> {{CoprocessorServiceBuilder}}, and a few steps later you can talk to your 
> coprocessor over the network. One argument to 
> {{AsyncTable#coprocessorService()}} is a {{CoprocessorCallback}} object, 
> which contains several methods that will be called during the lifecycle of a 
> coprocessor endpoint call. {{AsyncTableImpl}}'s implementation of 
> {{AsyncTable#coprocessorService()}} wraps your {{CoprocessorCallback}} with 
> its own that delegates the work to a thread pool. A snippet of this:
> {code}
>   @Override
>   public void onRegionComplete(RegionInfo region, R resp) {
> pool.execute(context.wrap(() -> callback.onRegionComplete(region, 
> resp)));
>   }
> ...
>   @Override
>   public void onComplete() {
> pool.execute(context.wrap(callback::onComplete));
>   }
> {code}
> The trouble with this is that your implementations of {{onRegionComplete()}} 
> and {{onComplete()}} will end up getting called in a random order, and/or at 
> the same time. The tasks of calling them are delegated to a thread pool, and 
> the completion of those tasks is not waited on, so the thread pool can choose 
> any ordering it wants to. Troublingly, {{onComplete()}} can be called before 
> the final {{onRegionComplete()}}, which is an violation of [the contract 
> specified in the {{CoprocessorCallback#onComplete()}} 
> javadoc|https://github.com/apache/hbase/blob/41dd87cd908d4d089d0b8cff6c88c01ed60622c5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTable.java#L671].
> I discovered this while working on HBASE-28770. I found that 
> {{AsyncAggregationClient#rowCount()}} returns incorrect results 5-10% of the 
> time, and this bug is the reason. Other {{AsyncAggregationClient}} methods I 
> presume are similarly affected.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28770) Pass partial results from AggregateImplementation when quotas are exceeded

2024-08-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28770:
---
Labels: pull-request-available  (was: )

> Pass partial results from AggregateImplementation when quotas are exceeded
> --
>
> Key: HBASE-28770
> URL: https://issues.apache.org/jira/browse/HBASE-28770
> Project: HBase
>  Issue Type: Improvement
>  Components: Coprocessors
>Reporter: Charles Connell
>Assignee: Charles Connell
>Priority: Major
>  Labels: pull-request-available
>
> Currently there is a gap in the coverage of HBase's quota-based workload 
> throttling. Requests sent by {{[Async]AggregationClient}} reach 
> {{AggregateImplementation}}. This then executes Scans in a way that bypasses 
> the quota system. We see issues with this at Hubspot where clusters suffer 
> under this load and we don't have a good way to protect them.
> In this ticket I'm teaching {{AggregateImplementation}} to optionally stop 
> scanning when a throttle is violated, and send back just the results it has 
> accumulated so far. In addition, it will send back a row key to 
> {{AsyncAggregationClient}}. When the client gets a response with a row key, 
> it will sleep in order to satisfy the throttle, and then send a new request 
> with a scan starting at that row key. This will have the effect of continuing 
> the work where the last request stopped.
> This feature will be unconditionally enabled by {{AsyncAggregationClient}} 
> once this ticket is finished. {{AggregateImplementation}} will not assume 
> that clients support partial results, however, so it can keep supporting 
> older clients. For clients that do not support partial results, throttles 
> will not be respected, and results will always be complete.
> This feature was [first proposed on the mailing 
> list|https://lists.apache.org/thread/1vqnxb71z7swq2cogz4qg3cn6b10xp4v]. 
> Builds on work in HBASE-28346.
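> A rough sketch of the client-side resume loop described above; the interface 
> shape and names below are illustrative, not the real protobuf API:
> {code:java}
> import java.util.Optional;
> 
> /** Illustrative shape of one partial-aggregation round trip. */
> interface PartialCountClient {
>   PartialResult countFrom(byte[] startRow) throws Exception;
> 
>   /** Count so far, plus where to resume if a quota stopped the scan early. */
>   record PartialResult(long partialCount, Optional<byte[]> resumeRow, long waitMs) {}
> }
> 
> final class ResumingRowCount {
>   /** Re-issues the request from the returned row key until the server finishes. */
>   static long rowCount(PartialCountClient client, byte[] startRow) throws Exception {
>     long total = 0;
>     byte[] next = startRow;
>     while (true) {
>       PartialCountClient.PartialResult r = client.countFrom(next);
>       total += r.partialCount();
>       if (r.resumeRow().isEmpty()) {
>         return total; // full range covered
>       }
>       Thread.sleep(r.waitMs()); // back off to satisfy the violated throttle
>       next = r.resumeRow().get(); // resume exactly where the server stopped
>     }
>   }
> }
> {code}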



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28793) Update hbase-thirdparty to 4.1.8

2024-08-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28793:
---
Labels: pull-request-available  (was: )

> Update hbase-thirdparty to 4.1.8
> 
>
> Key: HBASE-28793
> URL: https://issues.apache.org/jira/browse/HBASE-28793
> Project: HBase
>  Issue Type: Task
>  Components: dependencies
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28786) Fix classname for command: copyreppeers in bin/hbase

2024-08-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28786:
---
Labels: pull-request-available  (was: )

> Fix classname for command: copyreppeers in bin/hbase
> 
>
> Key: HBASE-28786
> URL: https://issues.apache.org/jira/browse/HBASE-28786
> Project: HBase
>  Issue Type: Bug
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Minor
>  Labels: pull-request-available
>
> Stumbled upon this. Digging deeper, it seems that during review we missed 
> renaming the classname in bin/hbase when the actual class was renamed from 
> ReplicationPeerMigrationTool -> CopyReplicationPeers
>  
> See 
> https://github.com/apache/hbase/compare/69603351b3f2817c74d869d32da0596bab3c409e..1d11ce96c44277df6ccdd16ae2c9d8a1c419f3da
> [hbase@hostname~]$ hbase copyreppeers
> Error: Could not find or load main class 
> org.apache.hadoop.hbase.replication.ReplicationPeerMigrationTool
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.replication.ReplicationPeerMigrationTool 
>  
> FYI [~zhangduo] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28784) Exclude samples and release-documentation zip of jaxws-ri from output tarball

2024-08-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28784:
---
Labels: pull-request-available  (was: )

> Exclude samples and release-documentation zip of jaxws-ri from output tarball
> -
>
> Key: HBASE-28784
> URL: https://issues.apache.org/jira/browse/HBASE-28784
> Project: HBase
>  Issue Type: Task
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
>  Labels: pull-request-available
>
> Found this while testing HBASE-28070, when I was checking the lib folder of 
> the extracted assembly for master. I guess this must be a problem for all 
> branches fixed by HBASE-28760.
> The following zip files are present in hbase-4.0.0-alpha-1-SNAPSHOT/lib:
> * samples-2.3.2.zip
> * release-documentation-2.3.2-docbook.zip



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28783) Concurrent execution of normalizer operations on tables in RegionNormalizerWorker

2024-08-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28783:
---
Labels: pull-request-available  (was: )

> Concurrent execution of normalizer operations on tables in 
> RegionNormalizerWorker
> -
>
> Key: HBASE-28783
> URL: https://issues.apache.org/jira/browse/HBASE-28783
> Project: HBase
>  Issue Type: Improvement
>  Components: Normalizer
>Reporter: MisterWang
>Assignee: MisterWang
>Priority: Major
>  Labels: pull-request-available
>
> Recently, I have been managing the large tables in our HBase cluster by 
> enabling the normalizer to control region sizes and keep the number of 
> regions within a reasonable range.
> The current code retrieves tables from RegionNormalizerWorkQueue and performs 
> normalization on them serially. When a cluster has multiple large tables that 
> need to be managed, this is very inefficient.
> I have confirmed that each split or merge plan generated for each table during 
> normalization is limited by the RateLimiter, so I think it is reasonable to 
> normalize tables concurrently.
> In terms of implementation: create a thread pool for executing tasks in 
> RegionNormalizerWorker, with a default pool size of 1, and provide a 
> configuration parameter to change it.
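> A minimal sketch of that implementation idea; the property name below is an 
> assumption, not the final configuration key:
> {code:java}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> 
> import org.apache.hadoop.conf.Configuration;
> 
> /** Illustrative worker that drains the queue with a configurable pool size. */
> final class ParallelNormalizerWorker {
>   // hypothetical property name; the actual key would be decided in the patch
>   static final String POOL_SIZE_KEY = "hbase.normalizer.worker.pool.size";
> 
>   private final ExecutorService pool;
> 
>   ParallelNormalizerWorker(Configuration conf) {
>     int poolSize = conf.getInt(POOL_SIZE_KEY, 1); // default keeps today's serial behavior
>     this.pool = Executors.newFixedThreadPool(poolSize);
>   }
> 
>   void submit(Runnable normalizeTable) {
>     pool.execute(normalizeTable); // each table's normalization runs as one task
>   }
> }
> {code}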



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28778) NPE may occur when opening master-status or table.jsp or procedure.jsp while Master is initializing

2024-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28778:
---
Labels: pull-request-available  (was: )

> NPE may occur when opening master-status or table.jsp or procedure.jsp while 
> Master is initializing
> ---
>
> Key: HBASE-28778
> URL: https://issues.apache.org/jira/browse/HBASE-28778
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Reporter: guluo
>Priority: Major
>  Labels: pull-request-available
>
> The reason:
> For table.jsp, an NPE may occur when calling master.getConnection() while the 
> Master is initializing.
> asyncClusterConnection is only initialized when HMaster calls 
> setupClusterConnection, so before that asyncClusterConnection is null and 
> opening table.jsp at that moment results in an NPE.
> procedure.jsp and master-status may also encounter an NPE for a similar reason.
>  
> Error Message:
> java.lang.NullPointerException: Cannot invoke 
> "org.apache.hadoop.hbase.procedure2.ProcedureExecutor.getProcedures()" 
> because "procExecutor" is null at 
> org.apache.hadoop.hbase.generated.master.procedures_jsp._jspService(procedures_jsp.java:76)
>  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:111) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1450)
>  at 
> org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
>  at 
> org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
>  at 
> org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:117)
>  at 
> org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>  
> 2024-08-13T20:41:40,056 WARN  [qtp463313451-80] server.HttpChannel: 
> /master-status
> java.lang.NullPointerException: Cannot invoke 
> "org.apache.hadoop.hbase.master.assignment.RegionStateNode.isInState(org.apache.hadoop.hbase.master.RegionState$State[])"
>  because "rsn" is null
>         at 
> org.apache.hadoop.hbase.master.http.MasterStatusServlet.getMetaLocationOrNull(MasterStatusServlet.java:80)
>  ~[hbase-server-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.http.MasterStatusServlet.doGet(MasterStatusServlet.java:60)
>  ~[hbase-server-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) 
> ~[javax.servlet-api-3.1.0.jar:3.1.0]
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) 
> ~[javax.servlet-api-3.1.0.jar:3.1.0]
>         at 
> org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
>  ~[hbase-shaded-jetty-4.1.7.jar:?]
>         at 
> org.apache.hbase.thirdparty.org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
>  ~[hbase-shaded-jetty-4.1.7.jar:?]
>         at 
> org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:117)
>  ~[hbase-http-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
>  
>  
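> A small sketch of the kind of guard the pages could use; the helper below is 
> illustrative, and each JSP would check the dependency it actually 
> dereferences:
> {code:java}
> import java.io.IOException;
> 
> import javax.servlet.http.HttpServletResponse;
> 
> /** Illustrative null guard for UI pages while the Master is still initializing. */
> final class MasterPageGuard {
>   /** Returns true if the page can proceed; otherwise writes a friendly message. */
>   static boolean checkInitialized(Object dependency, HttpServletResponse resp, String what)
>       throws IOException {
>     if (dependency != null) {
>       return true;
>     }
>     // e.g. asyncClusterConnection before setupClusterConnection(), or procExecutor
>     resp.getWriter().println("Master is initializing; " + what + " is not available yet.");
>     return false;
>   }
> }
> {code}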



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28654) Support only remove expired files in the compact process

2024-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28654:
---
Labels: pull-request-available  (was: )

> Support only remove expired files in the compact process
> 
>
> Key: HBASE-28654
> URL: https://issues.apache.org/jira/browse/HBASE-28654
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: MisterWang
>Assignee: MisterWang
>Priority: Minor
>  Labels: pull-request-available
>
> As is well known, compactions generate a certain amount of I/O, but in some 
> cases delaying compaction does not affect online services.
> For example, when a table has a short TTL and is written far more than it is 
> read, removing only the expired files during compaction is a good way to 
> preserve write performance and reduce cluster I/O pressure.
> Another usage scenario: a table has a primary and a backup copy, and the 
> backup table can set this attribute to reduce compaction I/O on the backup 
> cluster.
> Proposed solution: add a table attribute to the HBase table.
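> A sketch of how such a table attribute could be set from client code; the 
> attribute name below is hypothetical and would be defined by this change:
> {code:java}
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.Admin;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.ConnectionFactory;
> import org.apache.hadoop.hbase.client.TableDescriptor;
> import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
> 
> public class ExpiredOnlyCompactionExample {
>   // hypothetical attribute name
>   static final String EXPIRED_FILES_ONLY = "COMPACTION_EXPIRED_FILES_ONLY";
> 
>   public static void main(String[] args) throws Exception {
>     try (Connection conn = ConnectionFactory.createConnection();
>         Admin admin = conn.getAdmin()) {
>       TableName table = TableName.valueOf("backup_table");
>       TableDescriptor current = admin.getDescriptor(table);
>       TableDescriptor updated = TableDescriptorBuilder.newBuilder(current)
>         .setValue(EXPIRED_FILES_ONLY, "true") // ask compactions to only drop expired files
>         .build();
>       admin.modifyTable(updated);
>     }
>   }
> }
> {code}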



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28777) mTLS client hostname verification doesn't work with OptionalSslHandler

2024-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28777:
---
Labels: pull-request-available  (was: )

> mTLS client hostname verification doesn't work with OptionalSslHandler
> --
>
> Key: HBASE-28777
> URL: https://issues.apache.org/jira/browse/HBASE-28777
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 3.0.0-beta-1
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
>  Labels: pull-request-available
>
> Netty's OptionalSslHandler cannot carry host/port information to the 
> SSLEngine, so HBASE-27673 only fixed the TLS-only case. We need custom 
> handling for the plaintext-enabled TLS mode in order to support client 
> hostname verification in that case too.
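> A sketch of what that custom handling could look like, using plain Netty 
> class names (HBase would use the shaded org.apache.hbase.thirdparty 
> packages); this is an illustration of the idea, not the actual fix:
> {code:java}
> import java.net.InetSocketAddress;
> import java.util.List;
> 
> import io.netty.buffer.ByteBuf;
> import io.netty.channel.ChannelHandlerContext;
> import io.netty.handler.codec.ByteToMessageDecoder;
> import io.netty.handler.ssl.SslContext;
> import io.netty.handler.ssl.SslHandler;
> 
> /**
>  * Like OptionalSslHandler, but when the first bytes look like a TLS record it
>  * installs an SslHandler created with the peer's host and port, so hostname
>  * verification has something to verify against.
>  */
> final class HostAwareOptionalSslHandler extends ByteToMessageDecoder {
>   private final SslContext sslContext;
> 
>   HostAwareOptionalSslHandler(SslContext sslContext) {
>     this.sslContext = sslContext;
>   }
> 
>   @Override
>   protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
>     if (in.readableBytes() < 5) {
>       return; // need a full TLS record header before we can sniff
>     }
>     if (SslHandler.isEncrypted(in)) {
>       InetSocketAddress remote = (InetSocketAddress) ctx.channel().remoteAddress();
>       // pass host/port so the SSLEngine can perform client hostname verification
>       ctx.pipeline().replace(this, "ssl",
>         sslContext.newHandler(ctx.alloc(), remote.getHostString(), remote.getPort()));
>     } else {
>       ctx.pipeline().remove(this); // plaintext connection, continue without TLS
>     }
>   }
> }
> {code}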



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28775) Change the output of DatanodeInfo in the log to the hostname of the datanode

2024-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28775:
---
Labels: pull-request-available  (was: )

> Change the output of DatanodeInfo in the log to the hostname of the datanode
> 
>
> Key: HBASE-28775
> URL: https://issues.apache.org/jira/browse/HBASE-28775
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: MisterWang
>Priority: Minor
>  Labels: pull-request-available
>
> Currently, DatanodeInfo is printed in the log. When we are troubleshooting 
> and searching for slow datanodes, we need to convert IP addresses to 
> hostnames, which is quite cumbersome.
> The log output should be easy to read, so it would be better to output the 
> hostname of the datanode.
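> A tiny sketch of the intended output change (illustrative helper, not the 
> actual patch):
> {code:java}
> import java.util.Arrays;
> import java.util.stream.Collectors;
> 
> import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
> 
> /** Log datanode hostnames instead of the default IP-based toString(). */
> final class DatanodeLogging {
>   static String toHostnames(DatanodeInfo[] pipeline) {
>     return Arrays.stream(pipeline)
>       .map(DatanodeInfo::getHostName) // e.g. "dn-01.example.com" instead of "10.0.0.5:9866"
>       .collect(Collectors.joining(","));
>   }
> }
> {code}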



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27781) AssertionError in AsyncRequestFutureImpl when timing out during location resolution

2024-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-27781:
---
Labels: pull-request-available  (was: )

> AssertionError in AsyncRequestFutureImpl when timing out during location 
> resolution
> ---
>
> Key: HBASE-27781
> URL: https://issues.apache.org/jira/browse/HBASE-27781
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Assignee: Daniel Roudnitsky
>Priority: Major
>  Labels: pull-request-available
>
> In AsyncRequestFutureImpl we fail fast when the operation timeout is exceeded 
> during location resolution 
> [here|https://github.com/apache/hbase/blob/branch-2.5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L460-L462].
> In that handling, we loop over all actions and mark them as failed. The 
> problem is that some actions may already have finished when we get to this 
> spot, so actionsInProgress has already been decremented for those, and now we 
> decrement by the full action count. This causes an assertion error since 
> we go negative 
> [here|https://github.com/apache/hbase/blob/branch-2.5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L1197]
> We still want to fail all actions, because none will be executed, but we need 
> special handling to avoid this case. Maybe don't bother decrementing 
> actionsInProgress at all and instead set it to 0.
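> A minimal sketch of that counter handling, with illustrative names (not the 
> actual patch):
> {code:java}
> import java.util.concurrent.atomic.AtomicLong;
> 
> /** Illustrative fail-fast counter handling. */
> final class FailFastCounter {
>   private final AtomicLong actionsInProgress;
> 
>   FailFastCounter(int totalActions) {
>     this.actionsInProgress = new AtomicLong(totalActions);
>   }
> 
>   /** Normal completion path: one action finished. */
>   void onActionDone() {
>     actionsInProgress.decrementAndGet();
>   }
> 
>   /**
>    * Fail-fast path: some actions may already have finished and decremented the
>    * counter, so drop it straight to zero instead of subtracting the full total,
>    * which is what trips the "went negative" assertion today.
>    */
>   long failRemaining() {
>     return actionsInProgress.getAndSet(0);
>   }
> }
> {code}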



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28771) Add support for non replica actions to AsyncRequestFutureImpl.isActionComplete

2024-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28771:
---
Labels: pull-request-available  (was: )

> Add support for non replica actions to AsyncRequestFutureImpl.isActionComplete
> --
>
> Key: HBASE-28771
> URL: https://issues.apache.org/jira/browse/HBASE-28771
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Daniel Roudnitsky
>Assignee: Daniel Roudnitsky
>Priority: Minor
>  Labels: pull-request-available
>
> The current isActionComplete method in AsyncRequestFutureImpl is only 
> designed to support replica actions. It would be useful to have 
> isActionComplete support non-replica actions so it could be reused for 
> HBASE-28358 and in other paths where non-replica actions are handled. 
> Since isActionComplete currently has only one caller, we can move the replica 
> action check into the caller method instead of keeping the check inside 
> isActionComplete. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28669) After one RegionServer restarts, another RegionServer leaks a connection to ZooKeeper

2024-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28669:
---
Labels: Replication pull-request-available  (was: Replication)

> After one RegionServer restarts, another RegionServer leaks a connection to 
> ZooKeeper
> -
>
> Key: HBASE-28669
> URL: https://issues.apache.org/jira/browse/HBASE-28669
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.4.5
>Reporter: ZhongYou Li
>Priority: Minor
>  Labels: Replication, pull-request-available
>
> The peer "to_pd_A" has been removed, but there is still an error log in the 
> RegionServer:
> {code:java}
> 2024-06-11 09:42:34.074 ERROR 
> [ReplicationExecutor-0.replicationSource,to_pd_A-172.30.12.12,6002,1709612684705-SendThread(bjtx-hbase-onll-meta-01:2181)]
>  client.StaticHostProvider: Unable to resolve address: 
> bjtx-hbase-onll-meta-03:2181
> java.net.UnknownHostException: bjtx-hbase-onll-meta-03
>    at java.net.InetAddress$CachedAddresses.get(InetAddress.java:764)
>    at java.net.InetAddress.getAllByName0(InetAddress.java:1291)
>    at java.net.InetAddress.getAllByName(InetAddress.java:1144)
>    at java.net.InetAddress.getAllByName(InetAddress.java:1065)
>    at 
> org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:92)
>    at 
> org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:147)
>    at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:375)
>    at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1137){code}
> Here are the steps to reproduce:
> I have 3 RegionServers. The following steps can reproduce the phenomenon of 
> ZK connection leakage:
> 1. Enable replication
> 2. Create a peer
> 3. Shut down any two RegionServers for a few minutes and restart them
> 4. Print the thread stack on the RegionServer that was not shut down, search 
> for the keyword , and you can see 4 extra ZooKeeper-related threads
> 5. Even after removing the peer, the 4 extra threads still exist
> The following is the thread stack leak in one of my RegionServers:
> {code:java}
> "ReplicationExecutor-0.replicationSource,lizy_test_replication-10.0.16.29,6002,1718180442225-EventThread"
>  #610 daemon prio=5 os_prio=0 cpu=0.27ms elapsed=466.94s 
> tid=0x7efc58179000 nid=0x5a051 waiting on condition [0x7efc2cdef000]
> "ReplicationExecutor-0.replicationSource,lizy_test_replication-10.0.16.29,6002,1718180442225-SendThread(10.0.16.100:2181)"
>  #609 daemon prio=5 os_prio=0 cpu=3.02ms elapsed=466.94s 
> tid=0x7efc58178800 nid=0x5a050 runnable [0x7efc2cef]
> "ReplicationExecutor-0.replicationSource,lizy_test_replication-10.0.16.9,6002,1718180457260-EventThread"
>  #505 daemon prio=5 os_prio=0 cpu=0.27ms elapsed=556.09s 
> tid=0x7efc50094800 nid=0x59c04 waiting on condition [0x7efc2d7f7000]
> "ReplicationExecutor-0.replicationSource,lizy_test_replication-10.0.16.9,6002,1718180457260-SendThread(10.0.16.100:2181)"
>  #504 daemon prio=5 os_prio=0 cpu=3.72ms elapsed=556.09s 
> tid=0x7efc50093000 nid=0x59c03 runnable [0x7efc2d8f8000] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28769) Create PoC for RangeStoreFileReader to support multi region splitting

2024-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28769:
---
Labels: pull-request-available  (was: )

> Create PoC for RangeStoreFileReader to support multi region splitting
> -
>
> Key: HBASE-28769
> URL: https://issues.apache.org/jira/browse/HBASE-28769
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
>Priority: Major
>  Labels: pull-request-available
>
> This is the JIRA to create the PoC of the range store file reader.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28760) Exclude pom file of jaxws-ri in output tarball

2024-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28760:
---
Labels: pull-request-available  (was: )

> Exclude pom file of jaxws-ri in output tarball
> --
>
> Key: HBASE-28760
> URL: https://issues.apache.org/jira/browse/HBASE-28760
> Project: HBase
>  Issue Type: Bug
>  Components: jenkins, scripts
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Permission denied...
> Not sure what the real problem is.
> {noformat}
> 17:21:17  Building a binary tarball from the source tarball succeeded.
> [Pipeline] echo
> 17:21:17  unpacking the hbase bin tarball into 'hbase-install' and the client 
> tarball into 'hbase-client'
> [Pipeline] sh
> 17:21:18  tar: /jaxws-ri-2.3.2.pom: Cannot open: Permission denied
> 17:21:20  tar: Exiting with failure status due to previous errors
> Post stage
> [Pipeline] stash
> 17:21:20  Warning: overwriting stash ‘srctarball-result’
> 17:21:20  Stashed 2 file(s)
> [Pipeline] sshPublisher
> 17:21:20  SSH: Current build result is [FAILURE], not going to run.
> [Pipeline] sh
> 17:21:20  Remove 
> /home/jenkins/jenkins-home/workspace/HBase_Nightly_master/output-srctarball/hbase-src.tar.gz
>  for saving space
> [Pipeline] archiveArtifacts
> 17:21:20  Archiving artifacts
> [Pipeline] archiveArtifacts
> 17:21:20  Archiving artifacts
> [Pipeline] archiveArtifacts
> 17:21:20  Archiving artifacts
> [Pipeline] archiveArtifacts
> 17:21:20  Archiving artifacts
> [Pipeline] }
> [Pipeline] // withEnv
> [Pipeline] }
> [Pipeline] // node
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] }
> 17:21:20  Failed in branch packaging and integration
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28767) Simplify backup bulk-loading code

2024-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28767:
---
Labels: pull-request-available  (was: )

> Simplify backup bulk-loading code
> -
>
> Key: HBASE-28767
> URL: https://issues.apache.org/jira/browse/HBASE-28767
> Project: HBase
>  Issue Type: Task
>  Components: backup&restore
>Reporter: Dieter De Paepe
>Priority: Minor
>  Labels: pull-request-available
>
> While working on HBASE-28706, I came across a lot of overly complex/duplicate 
> code related to how bulk uploads are tracked for backups.
> This ticket is to simplify some of it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28648) Change the deprecation cycle for RegionObserver.postInstantiateDeleteTracker

2024-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28648:
---
Labels: pull-request-available  (was: )

> Change the deprecation cycle for RegionObserver.postInstantiateDeleteTracker
> 
>
> Key: HBASE-28648
> URL: https://issues.apache.org/jira/browse/HBASE-28648
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors
>Reporter: Duo Zhang
>Assignee: Liangjun He
>Priority: Major
>  Labels: pull-request-available
>
> The visibility label feature still uses this method, so it cannot be removed 
> in 3.0.0. We should change the deprecation cycle in the javadoc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-23778) Region History Redo

2024-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-23778:
---
Labels: pull-request-available  (was: )

> Region History Redo
> ---
>
> Key: HBASE-23778
> URL: https://issues.apache.org/jira/browse/HBASE-23778
> Project: HBase
>  Issue Type: Improvement
>Reporter: Swaroopa Kadam
>Assignee: Akshita Jain
>Priority: Major
>  Labels: pull-request-available
>
> My initial thought is mainly to extend the HBase shell (gradually extending 
> to CLI and UI) to allow the use of 
> {code:java}
> where {code}
> to get the necessary information, and allow passing history as an additional 
> parameter. We can configure how many transitions we want to store so that 
> whatever is used for state management (ZK, a small data structure in the 
> table region itself, or maybe something else) does not explode.
> As pointed out by
> [~andrew.purt...@gmail.com], we need to be watchful of mistakes made in the 
> past: HBASE-533 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28389) HBase backup yarn queue parameter ignored

2024-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28389:
---
Labels: pull-request-available  (was: )

> HBase backup yarn queue parameter ignored
> -
>
> Key: HBASE-28389
> URL: https://issues.apache.org/jira/browse/HBASE-28389
> Project: HBase
>  Issue Type: Bug
>  Components: backup&restore
>Affects Versions: 2.6.0
> Environment: HBase branch-2.6
>Reporter: Dieter De Paepe
>Assignee: Liangjun He
>Priority: Major
>  Labels: pull-request-available
>
> It seems the parameter to specify the yarn queue for HBase backup (`-q`) is 
> ignored:
> {code:java}
> hbase backup create full hdfs:///tmp/backups/hbasetest/hbase -q hbase-backup 
> {code}
> gets executed on the "default" queue.
> Setting the queue through the configuration does work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28690) Aborting Active HMaster is not rejecting reportRegionStateTransition if procedure is initialised by next Active master

2024-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28690:
---
Labels: pull-request-available  (was: )

> Aborting Active HMaster is not rejecting reportRegionStateTransition if 
> procedure is initialised by next Active master
> --
>
> Key: HBASE-28690
> URL: https://issues.apache.org/jira/browse/HBASE-28690
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.8
>Reporter: Umesh Kumar Kumawat
>Assignee: Umesh Kumar Kumawat
>Priority: Major
>  Labels: pull-request-available
>
> A CloseRegionProcedure on the master requests the RS to close the region, and 
> after closing the region the RS reports the RegionStateTransition 
> back([here|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L1853]).
>  On receiving the report, the master checks if regionNode has any procedure 
> assigned to it 
> ([code|https://github.com/apache/hbase/blob/d1015a68ed9f94d74668abd37edefd32f5e9305b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1294]).
>  
>  
> {code:java}
> private boolean reportTransition(RegionStateNode regionNode, ServerStateNode serverNode,
>     TransitionCode state, long seqId, long procId) throws IOException {
>   ServerName serverName = serverNode.getServerName();
>   TransitRegionStateProcedure proc = regionNode.getProcedure();
>   if (proc == null) {
>     return false;
>   }
>   proc.reportTransition(master.getMasterProcedureExecutor().getEnvironment(), regionNode,
>     serverName, state, seqId, procId);
>   return true;
> }
> {code}
> If regionNode doesn't have any procedure, the master just logs it and does 
> not return any error over the RPC. 
>  
> Think of a case where a master failover is happening and the new active 
> master has just initialized the TRSP and CloseRegionProcedure. The aborting 
> master now has stale data. If the transition report reaches the aborting 
> master, not rejecting the report causes the procedure to get stuck. 
>  
> *Logs for more understanding* 
> active master server4-1 failing
> {noformat}
> 2024-06-20 04:45:05,576 ERROR 
> [iority.RWQ.Fifo.write.handler=3,queue=0,port=61000] master.HMaster - * 
> ABORTING master server4-1,61000,1715413775736: Failed to record region server 
> as started *{noformat}
> *logs of new active master server5-1*
>  
> {noformat}
> 2024-06-20 04:49:28,893 DEBUG [aster/server5-1:61000:becomeActiveMaster] 
> assignment.RegionStateStore - Load hbase:meta entry 
> region=888a715d5926adbb89c985d8967f40d4, regionState=OPEN, 
> lastHost=server1-119,61020,1717560166420, 
> regionLocation=server1-119,61020,1717560166420, openSeqNum=34892620
> 024-06-20 04:49:51,886 INFO [PEWorker-22] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[{pid=16276416, ppid=16276108, 
> state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure 
> table=RIMBS.UPLOADER_JOB_DETAILS, region=888a715d5926adbb89c985d8967f40d4, 
> UNASSIGN}]  (on server5-1)
> 2024-06-20 04:49:52,022 INFO [PEWorker-40] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[{pid=16276470, ppid=16276416, state=RUNNABLE; 
> CloseRegionProcedure 888a715d5926adbb89c985d8967f40d4, 
> server=server1-119,61020,1717560166420}] (on server5-1){noformat}
>  
> *RS logs for closing* 
> {noformat}
> 2024-06-20 04:49:52,267 INFO [_REGION-regionserver/server1-119:61020-2] 
> handler.UnassignRegionHandler - Close 888a715d5926adbb89c985d8967f40d4
> 2024-06-20 04:49:52,267 DEBUG [_REGION-regionserver/server1-119:61020-2] 
> regionserver.HRegion - Closing 888a715d5926adbb89c985d8967f40d4, disabling 
> compactions & flushes
> 2024-06-20 04:49:52,354 INFO [_REGION-regionserver/server1-119:61020-2] 
> regionserver.HRegion - Closed 
> TABLE,KW\x00na240-app1-16\x00/Events-120620231740\x00MARKER-Events,1702619592612.888a715d5926adbb89c985d8967f40d4.
> {noformat}
> *Logs of report on aborting active Hmaster*
> {noformat}
> 2024-06-20 04:49:52,355 WARN 
> [iority.RWQ.Fifo.write.handler=1,queue=0,port=61000] 
> assignment.AssignmentManager - No matching procedure found for 
> server1-119,61020,1717560166420 transition on state=OPEN, 
> location=server1-119,61020,1717560166420, table=RIMBS.UPLOADER_JOB_DETAILS, 
> region=888a715d5926adbb89c985d8967f40d4 to CLOSED ( host = server4-1 , 
> hbaseMasterLogFile){noformat}
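> A sketch of the kind of guard that could reject the report on a master that 
> is no longer active; where exactly it belongs in the RPC path is part of the 
> fix, so treat this as an illustration only:
> {code:java}
> import org.apache.hadoop.hbase.PleaseHoldException;
> import org.apache.hadoop.hbase.master.HMaster;
> 
> /** Illustrative guard for reportRegionStateTransition handling. */
> final class TransitionReportGuard {
>   static void rejectIfNotActive(HMaster master) throws PleaseHoldException {
>     if (master.isStopped() || master.isAborted() || !master.isActiveMaster()) {
>       // Throwing makes the RegionServer retry against the new active master,
>       // instead of the report being swallowed with "No matching procedure found".
>       throw new PleaseHoldException("Master is aborting or no longer active");
>     }
>   }
> }
> {code}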



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28758) Remove the aarch64 profile

2024-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28758:
---
Labels: beginner pull-request-available  (was: beginner)

> Remove the aarch64 profile
> --
>
> Key: HBASE-28758
> URL: https://issues.apache.org/jira/browse/HBASE-28758
> Project: HBase
>  Issue Type: Improvement
>  Components: build, pom, Protobufs
>Reporter: Duo Zhang
>Assignee: MisterWang
>Priority: Major
>  Labels: beginner, pull-request-available
>
> We do not depend on protobuf 2.5 on branch-3+, so we do not need the special 
> protoc compiler for arm any more.
> Just remove the profile.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28250) Bump jruby to 9.4.8.0 to fix snakeyaml CVE

2024-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28250:
---
Labels: pull-request-available  (was: )

> Bump jruby to 9.4.8.0 to fix snakeyaml CVE
> --
>
> Key: HBASE-28250
> URL: https://issues.apache.org/jira/browse/HBASE-28250
> Project: HBase
>  Issue Type: Task
>  Components: jruby, security, shell
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
>  Labels: pull-request-available
>
> As a follow up of HBASE-28249, we want to bump to latest 9.4.x line here. 
> This release line drops critical snakeyaml CVE ({*}org.yaml : snakeyaml : 
> 1.33{*} having 
> [CVE-2022-1471|https://nvd.nist.gov/vuln/detail/CVE-2022-1471]) from our 
> classpath with the following change, along with several other bug fixes: 
>  * The Psych YAML library is updated to 5.1.0. This version switches the 
> JRuby extension to SnakeYAML Engine, avoiding CVEs against the original 
> SnakeYAML and updating YAML compatibility to specification version 1.2. 
> [#6365|https://github.com/jruby/jruby/issues/6365], 
> [#7570|https://github.com/jruby/jruby/issues/7570], 
> [#7626|https://github.com/jruby/jruby/pull/7626]
> NOTE: JRuby 9.4.x targets Ruby 3.1 compatibility, instead of the Ruby 2.6 
> compatibility that 9.3.x had!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28729) Change the generic type of List in InternalScanner.next

2024-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28729:
---
Labels: pull-request-available  (was: )

> Change the generic type of List in InternalScanner.next
> ---
>
> Key: HBASE-28729
> URL: https://issues.apache.org/jira/browse/HBASE-28729
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors, regionserver
>Reporter: Duo Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Plan to change it from List to List, so we could 
> pass both List and List to it, or even List for 
> coprocessors.
> This could save a lot of casting in our main code.
> This is an incompatible change for coprocessors, so it will only go into 
> branch-3+, and will be marked as an incompatible change.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28587) Remove deprecated methods in Cell

2024-07-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28587:
---
Labels: pull-request-available  (was: )

> Remove deprecated methods in Cell
> -
>
> Key: HBASE-28587
> URL: https://issues.apache.org/jira/browse/HBASE-28587
> Project: HBase
>  Issue Type: Sub-task
>  Components: API, Client
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28584) RS SIGSEGV under heavy replication load

2024-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28584:
---
Labels: pull-request-available  (was: )

> RS SIGSEGV under heavy replication load
> ---
>
> Key: HBASE-28584
> URL: https://issues.apache.org/jira/browse/HBASE-28584
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.5.6
> Environment: RHEL 7.9
> JDK 11.0.23
> Hadoop 3.2.4
> Hbase 2.5.6
>Reporter: Whitney Jackson
>Assignee: Andrew Kyle Purtell
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-Deep-clone-cells-set-to-be-replicated-onto-the-local.patch, 
> 0001-Support-configuration-based-selection-of-netty-chann.patch, 
> rs_profile_after.html, rs_profile_before.html
>
>
> I'm observing RS crashes under heavy replication load:
>  
> {code:java}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f7546873b69, pid=29890, tid=36828
> #
> # JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.23+7) (build 
> 11.0.23+7-LTS-222)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.23+7-LTS-222, mixed 
> mode, tiered, compressed oops, g1 gc, linux-amd64)
> # Problematic frame:
> # J 24625 c2 
> org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V
>  (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209]
> {code}
>  
> The heavier load comes when a replication peer has been disabled for several 
> hours for patching etc. When the peer is re-enabled the replication load is 
> high until the peer is all caught up. The crashes happen on the cluster 
> receiving the replication edits.
>  
> I believe this problem started after upgrading from 2.4.x to 2.5.x.
>  
> One possibly relevant non-standard config I run with:
> {code:java}
> 
>   hbase.region.store.parallel.put.limit
>   
>   100
>   Added after seeing "failed to accept edits" replication errors 
> in the destination region servers indicating this limit was being exceeded 
> while trying to process replication edits.
> 
> {code}
>  
> I understand from other Jiras that the problem is likely around direct memory 
> usage by Netty. I haven't yet tried switching the Netty allocator to 
> {{unpooled}} or {{heap}}. I also haven't yet tried any of the 
> {{io.netty.allocator.*}} options.
>  
> {{MaxDirectMemorySize}} is set to 26g.
>  
> Here's the full stack for the relevant thread:
>  
> {code:java}
> Stack: [0x7f72e2e5f000,0x7f72e2f6],  sp=0x7f72e2f5e450,  free 
> space=1021k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> J 24625 c2 
> org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V
>  (75 bytes) @ 0x7f7546873b69 [0x7f7546873960+0x0209]
> J 26253 c2 
> org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I 
> (21 bytes) @ 0x7f7545af2d84 [0x7f7545af2d20+0x0064]
> J 22971 c2 
> org.apache.hadoop.hbase.codec.KeyValueCodecWithTags$KeyValueEncoder.write(Lorg/apache/hadoop/hbase/Cell;)V
>  (27 bytes) @ 0x7f754663f700 [0x7f754663f4c0+0x0240]
> J 25251 c2 
> org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
>  (90 bytes) @ 0x7f7546a53038 [0x7f7546a50e60+0x21d8]
> J 21182 c2 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
>  (73 bytes) @ 0x7f7545f4d90c [0x7f7545f4d3a0+0x056c]
> J 21181 c2 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
>  (149 bytes) @ 0x7f7545fd680c [0x7f7545fd65e0+0x022c]
> J 25389 c2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$$Lambda$247.run()V 
> (16 bytes) @ 0x7f7546ade660 [0x7f7546ade140+0x0520]
> J 24098 c2 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z
>  (109 bytes) @ 0x7f754678fbb8 [0x7f754678f8e0+0x02d8]
> J 27297% c2 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run()V (603 
> bytes) @ 0x7f75466c4d48 [0x7f75466c4c80+0x00c8]
> j  
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run()V+44
> j  
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.

[jira] [Updated] (HBASE-28756) RegionSizeCalculator ignored the size of memstore, which leads Spark miss data

2024-07-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28756:
---
Labels: pull-request-available  (was: )

> RegionSizeCalculator ignored the size of memstore, which leads Spark miss data
> --
>
> Key: HBASE-28756
> URL: https://issues.apache.org/jira/browse/HBASE-28756
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 2.6.0, 3.0.0-beta-1, 2.5.10
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>  Labels: pull-request-available
>
> RegionSizeCalculator only considers the size of the StoreFiles and ignores 
> the size of the MemStore. For a new region whose data has only been written 
> to the MemStore and has not been flushed yet, it considers the size to be 0.
> We use TableInputFormat to read HBase table data in Spark:
> {code:java}
> spark.sparkContext.newAPIHadoopRDD(
> conf,
> classOf[TableInputFormat],
> classOf[ImmutableBytesWritable],
> classOf[Result])
> }{code}
> Spark defaults to ignoring empty InputSplits, which is determined by the 
> configuration  "{{{}spark.hadoopRDD.ignoreEmptySplits{}}}".
> {code:java}
> private[spark] val HADOOP_RDD_IGNORE_EMPTY_SPLITS =
>   ConfigBuilder("spark.hadoopRDD.ignoreEmptySplits")
> .internal()
> .doc("When true, HadoopRDD/NewHadoopRDD will not create partitions for 
> empty input splits.")
> .version("2.3.0")
> .booleanConf
> .createWithDefault(true) {code}
> The above reasons lead to Spark missing data, so the RegionSizeCalculator 
> should take both the StoreFile size and the MemStore size into account.
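> A minimal sketch of the sizing idea (the real change lives inside 
> RegionSizeCalculator; this helper is illustrative):
> {code:java}
> import org.apache.hadoop.hbase.RegionMetrics;
> import org.apache.hadoop.hbase.Size;
> 
> final class RegionSizing {
>   /** Count a region as non-empty if either flushed files or the memstore hold data. */
>   static long regionSizeBytes(RegionMetrics metrics) {
>     long storeFileBytes = (long) metrics.getStoreFileSize().get(Size.Unit.BYTE);
>     long memStoreBytes = (long) metrics.getMemStoreSize().get(Size.Unit.BYTE);
>     return storeFileBytes + memStoreBytes;
>   }
> }
> {code}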



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-6028) Implement a cancel for in-progress compactions

2024-07-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-6028:
--
Labels: beginner pull-request-available  (was: beginner)

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner, pull-request-available
> Fix For: 3.0.0-alpha-1, 2.2.0
>
> Attachments: HBASE-6028.master.007.patch, 
> HBASE-6028.master.008.patch, HBASE-6028.master.008.patch, 
> HBASE-6028.master.009.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel an 
> in-progress compaction.  This would allow a system to recover when too many 
> regions become eligible for compaction at once.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28753) FNFE may occur when accessing the region.jsp of the replica region

2024-07-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28753:
---
Labels: pull-request-available  (was: )

> FNFE may occur when accessing the region.jsp of the replica region
> --
>
> Key: HBASE-28753
> URL: https://issues.apache.org/jira/browse/HBASE-28753
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, UI
>Affects Versions: 2.4.13
>Reporter: guluo
>Assignee: guluo
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-07-24-20-13-22-014.png
>
>
> On the HBase UI, we can get the details of the storefiles in a region by 
> accessing region.jsp.
> However, when region replication is enabled for a table, the replica region 
> may reference a deleted storefile because it doesn't refresh in a timely 
> manner, so in this case we get an FNFE when opening region.jsp for that region.
>  
> java.io.FileNotFoundException: File 
> file:/home/gl/code/github/hbase/hbase-assembly/target/hbase-4.0.0-alpha-1-SNAPSHOT/tmp/hbase/data/default/t01/e073c6b7c05eadda3f91d5b9692fc98d/info/5c52361153044b89aa61090cd5497998.4433b98ccf6b4a011ab03fc4a5e38a1a
>  does not exist at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:915)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1236)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:905)
>  at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
>  at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
>  at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:1881) at 
> org.apache.hadoop.hbase.generated.regionserver.region_jsp._jspService(region_jsp.java:97)
>  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:111) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28748) Replication blocking: InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.

2024-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28748:
---
Labels: pull-request-available  (was: )

> Replication blocking: 
> InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag 
> had invalid wire type.
> --
>
> Key: HBASE-28748
> URL: https://issues.apache.org/jira/browse/HBASE-28748
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 2.6.0
> Environment: hbase2.6.0
> hadoop3.3.6
>Reporter: Longping Jie
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1
>
> Attachments: image-2024-07-23-12-33-50-395.png, 
> rs-replciation-error.log, 
> tx1-int-hbase-main-prod-4%2C16020%2C1720602602602.1720609818921
>
>
> h2. Replication queue backlog, as shown below:
> !image-2024-07-23-12-33-50-395.png!
>  
> In the figure, the first WAL file no longer exists but has not been skipped, 
> causing replication to block.
> The second and third WAL files were moved to oldWALs; as the attachment 
> shows, reading these two files failed.
> h2. The error log in the RS is
> 2024-07-22T17:47:49,130 WARN 
> [RS_CLAIM_REPLICATION_QUEUE-regionserver/sh2-int-hbase-main-ha-9:16020-0.replicationSource,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464.replicationSource.wal-reader.tx1-int-hbase-main-prod-3%2C16020%2C1720602522464,test_hbase_258-tx1-int-hbase-main-prod-3,16020,1720602522464]
>  wal.ProtobufWALStreamReader: Error while reading WALKey, originalPosition=0, 
> currentPosition=81
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException:
>  Protocol message tag had invalid wire type.
> at 
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:119)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> at 
> org.apache.hbase.thirdparty.com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:503)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> at 
> org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:770)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:2829)
>  ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4212)
>  ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:4204)
>  ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
> at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:192)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> at 
> org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessage.parseWithIOException(GeneratedMessage.java:321)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey.parseFrom(WALProtos.java:2321)
>  ~[hbase-protocol-shaded-2.6.0.jar:2.6.0]
> at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.readWALKey(ProtobufWALTailingReader.java:128)
>  ~[hbase-server-2.6.0.jar:2.6.0]
> at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufWALTailingReader.next(ProtobufWALTailingReader.java:257)
>  ~[hbase-server-2.6.0.jar:2.6.0]
> at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:490)
>  ~[hbase-server-2.6.0.jar:2.6.0]
> at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.lastAttempt(WALEntryStream.java:306)
>  ~[hbase-server-2.6.0.jar:2.6.0]
> at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:388)
>  ~[hbase-server-2.6.0.jar:2.6.0]
> at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:130)
>  ~[hbase-server-2.6.0.jar:2.6.0]
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(Repl

[jira] [Updated] (HBASE-28655) TestHFileCompressionZstd fails with IllegalArgumentException: Illegal bufferSize

2024-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28655:
---
Labels: pull-request-available  (was: )

> TestHFileCompressionZstd fails with IllegalArgumentException: Illegal 
> bufferSize
> 
>
> Key: HBASE-28655
> URL: https://issues.apache.org/jira/browse/HBASE-28655
> Project: HBase
>  Issue Type: Bug
>  Components: HFile, Operability
>Affects Versions: 2.6.0, 3.0.0-beta-1, 2.5.8
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
>  Labels: pull-request-available
>
> HADOOP-18810 added io.compression.codec.zstd.buffersize to core-default.xml 
> with a default value of 0.
> So the ZSTD buffer size will be returned as 0 based on core-default.xml:
> {code:java}
>   static int getBufferSize(Configuration conf) {
> return conf.getInt(ZSTD_BUFFER_SIZE_KEY,
>   
> conf.getInt(CommonConfigurationKeys.IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_KEY,
> // IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_DEFAULT is 0! We can't allow 
> that.
> ZSTD_BUFFER_SIZE_DEFAULT));
>   }
> {code}
> HBASE-26259 added a value check, but it was reverted in HBASE-26959.
>  
> This issue will also occur during a region flush and abort the RegionServer.
>  
> TestHFileCompressionZstd and other zstd-related test cases are also 
> failing:
> {code:java}
> java.lang.IllegalArgumentException: Illegal bufferSize
>   at 
> org.apache.hadoop.io.compress.CompressorStream.(CompressorStream.java:42)
>   at 
> org.apache.hadoop.io.compress.BlockCompressorStream.(BlockCompressorStream.java:56)
>   at 
> org.apache.hadoop.hbase.io.compress.aircompressor.ZstdCodec.createOutputStream(ZstdCodec.java:106)
>   at 
> org.apache.hadoop.hbase.io.compress.Compression$Algorithm.createPlainCompressionStream(Compression.java:454)
>   at 
> org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.(HFileBlockDefaultEncodingContext.java:99)
>   at 
> org.apache.hadoop.hbase.io.hfile.NoOpDataBlockEncoder.newDataBlockEncodingContext(NoOpDataBlockEncoder.java:85)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.(HFileBlock.java:846)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishInit(HFileWriterImpl.java:304)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.(HFileWriterImpl.java:185)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFile$WriterFactory.create(HFile.java:312)
>   at 
> org.apache.hadoop.hbase.io.compress.HFileTestBase.doTest(HFileTestBase.java:73)
>   at 
> org.apache.hadoop.hbase.io.compress.aircompressor.TestHFileCompressionZstd.test(TestHFileCompressionZstd.java:54)
> {code}
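> A sketch of the kind of guard that would avoid handing a zero buffer size to 
> the compressor; the key and default values below mirror the snippet above but 
> are assumptions here:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> 
> final class ZstdBufferSize {
>   static final String ZSTD_BUFFER_SIZE_KEY = "hbase.io.compress.zstd.buffersize"; // assumed
>   static final int ZSTD_BUFFER_SIZE_DEFAULT = 256 * 1024; // assumed default
> 
>   static int getBufferSize(Configuration conf) {
>     int size = conf.getInt(ZSTD_BUFFER_SIZE_KEY,
>       conf.getInt("io.compression.codec.zstd.buffersize", ZSTD_BUFFER_SIZE_DEFAULT));
>     // core-default.xml ships 0 since HADOOP-18810; never pass that to CompressorStream.
>     return size > 0 ? size : ZSTD_BUFFER_SIZE_DEFAULT;
>   }
> }
> {code}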



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28750) Region normalizer should work in off peak if config

2024-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28750:
---
Labels: pull-request-available  (was: )

> Region normalizer should work in off peak if config
> ---
>
> Key: HBASE-28750
> URL: https://issues.apache.org/jira/browse/HBASE-28750
> Project: HBase
>  Issue Type: Improvement
>  Components: Normalizer
>Reporter: MisterWang
>Priority: Minor
>  Labels: pull-request-available
>
> The region normalizer involves splitting and merging regions, which can 
> cause jitter in online services, especially when there are many normalizer 
> plans. We should run this task during off-peak hours if configured to do so.
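> A minimal sketch of the check, assuming the normalizer could reuse the 
> existing off-peak window configuration; the switch name below is hypothetical:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.regionserver.compactions.OffPeakHours;
> 
> final class NormalizerOffPeakCheck {
>   // hypothetical switch; the real property name would be decided in the patch
>   static final String NORMALIZER_OFF_PEAK_ONLY = "hbase.normalizer.offpeak.only";
> 
>   static boolean shouldRunNow(Configuration conf) {
>     if (!conf.getBoolean(NORMALIZER_OFF_PEAK_ONLY, false)) {
>       return true; // feature off: keep today's behavior
>     }
>     // reuse the existing off-peak window (hbase.offpeak.start.hour / end.hour)
>     return OffPeakHours.getInstance(conf).isOffPeakHour();
>   }
> }
> {code}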



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28749) Remove the duplicate configurations named hbase.wal.batch.size

2024-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28749:
---
Labels: pull-request-available  (was: )

> Remove the duplicate configurations named hbase.wal.batch.size
> --
>
> Key: HBASE-28749
> URL: https://issues.apache.org/jira/browse/HBASE-28749
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-beta-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.0.0-beta-2
>
>
> The following code appears in two places: AsyncFSWAL and AbstractFSWAL
> {code:java}
> public static final String WAL_BATCH_SIZE = "hbase.wal.batch.size";
> public static final long DEFAULT_WAL_BATCH_SIZE = 64L * 1024; {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28719) Use ExtendedCell in WALEdit

2024-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28719:
---
Labels: pull-request-available  (was: )

> Use ExtendedCell in WALEdit
> ---
>
> Key: HBASE-28719
> URL: https://issues.apache.org/jira/browse/HBASE-28719
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28734) Improve HBase shell snapshot command Doc with TTL option

2024-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28734:
---
Labels: pull-request-available  (was: )

> Improve HBase shell snapshot command Doc with TTL option 
> -
>
> Key: HBASE-28734
> URL: https://issues.apache.org/jira/browse/HBASE-28734
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Ashok shetty
>Assignee: Liangjun He
>Priority: Minor
>  Labels: pull-request-available
>
> The current HBase shell snapshot command allows users to create a snapshot of 
> a specific table. While this command is useful, it could be enhanced by 
> adding a TTL (Time-to-Live) option. This would allow users to specify a time 
> period after which the snapshot would automatically be deleted.
> I propose we document the TTL option of the snapshot command as follows:
> hbase> snapshot 'sourceTable', 'snapshotName', {TTL => '7d'}
> This would create a snapshot of 'sourceTable' called 'snapshotName' that 
> would automatically be deleted after 7 days. Documenting the TTL option 
> would provide a better user experience and assist with efficient storage 
> management.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28745) Default Zookeeper ConnectionRegistry APIs timeout should be less

2024-07-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28745:
---
Labels: pull-request-available  (was: )

> Default Zookeeper ConnectionRegistry APIs timeout should be less
> 
>
> Key: HBASE-28745
> URL: https://issues.apache.org/jira/browse/HBASE-28745
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Viraj Jasani
>Assignee: Divneet Kaur
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.11
>
>
> HBASE-28428 introduced a timeout for the Zookeeper ConnectionRegistry APIs. 
> However, the default timeout value we set is 60s. Given that the connection 
> registry calls are metadata APIs, they should have a much smaller timeout 
> value, including the default.
> Let's set the default timeout to 10s.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

