[jira] [Created] (HBASE-28724) BucketCache.notifyFileCachingCompleted may throw IllegalMonitorStateException
Wellington Chevreuil created HBASE-28724: Summary: BucketCache.notifyFileCachingCompleted may throw IllegalMonitorStateException Key: HBASE-28724 URL: https://issues.apache.org/jira/browse/HBASE-28724 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil If the prefetch thread completes reading the file blocks faster than the bucket cache writer threads are able to drain them from the writer queues, we might run into a scenario where BucketCache.notifyFileCachingCompleted may throw IllegalMonitorStateException, as we can reach [this block of the code|https://github.com/wchevreuil/hbase/blob/684964f1c1693d2a0792b7b721c92693d75b4cea/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java#L2106]. I believe the impact is not critical, as the prefetch thread is already finishing at that point, but nevertheless, such errors in the logs might be misleading. -- This message was sent by Atlassian Jira (v8.20.10#820010)
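For readers unfamiliar with this failure mode, here is a minimal, self-contained illustration (deliberately not HBase code) of how an IllegalMonitorStateException arises: a thread releasing a lock it does not hold, which is the same class of problem as a cache thread unlocking an offset lock owned by another thread.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MonitorStateDemo {
  public static void main(String[] args) {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    try {
      // Unlocking without a matching lock() on this thread throws
      // IllegalMonitorStateException -- the failure mode named above.
      lock.writeLock().unlock();
    } catch (IllegalMonitorStateException e) {
      System.out.println("caught: " + e);
    }
  }
}
{code}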
[jira] [Resolved] (HBASE-28364) Warn: Cache key had block type null, but was found in L1 cache
[ https://issues.apache.org/jira/browse/HBASE-28364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28364. -- Resolution: Fixed Merged to 2.6 and 2.5 branches. > Warn: Cache key had block type null, but was found in L1 cache > -- > > Key: HBASE-28364 > URL: https://issues.apache.org/jira/browse/HBASE-28364 > Project: HBase > Issue Type: Bug >Affects Versions: 2.6.0, 2.4.18, 2.5.9 >Reporter: Bryan Beaudreault >Assignee: Nikita Pande >Priority: Major > Labels: pull-request-available > Fix For: 2.6.1, 2.5.10 > > > I'm ITBLL testing branch-2.6 and am seeing lots of these warns. This is new > to me. I would expect a warn to be on the rare side or be indicative of a > problem, but unclear from the code. > cc [~wchevreuil] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28596) Optimise BucketCache usage upon region splits/merges.
[ https://issues.apache.org/jira/browse/HBASE-28596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28596. -- Release Note: This adds a new configuration property, "hbase.rs.evictblocksonsplit", with default value set to true, which makes all parent region blocks get evicted on split. It modifies the behaviour implemented in HBASE-27474, to allow prefetch to run on the daughters' refs (if hbase.rs.prefetchblocksonopen is true). It also modifies how BucketCache deals with blocks from reference files: 1) When adding blocks for a reference file, it first resolves the reference and checks if the related block from the parent file is already in the cache. If so, it doesn't add this block to the cache. Otherwise, it will add the block with the reference as the cache key. 2) When searching for blocks from a reference file in the cache, it first resolves the reference and checks for the block from the original file, returning this one if found. Otherwise, it searches the cache again, now using the reference file as cache key. Resolution: Fixed Merged into master, branch-3 and branch-2. Thanks for reviewing it [~taklwu] ! > Optimise BucketCache usage upon region splits/merges. > -- > > Key: HBASE-28596 > URL: https://issues.apache.org/jira/browse/HBASE-28596 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2 > > > This proposal aims to give more flexibility for users to decide whether or > not blocks from a parent region should be evicted, and also optimise cache > usage by resolving file reference blocks to the referred block in the cache. > Some extra context: > 1) Originally, the default behaviour on splits was to rely on the > "hbase.rs.evictblocksonclose" value to decide if the cached blocks from the > parent split should be evicted or not. Then the resulting split daughters get > opened with refs to the parent file. If hbase.rs.prefetchblocksonopen is set, > these openings will trigger a prefetch of the blocks from the parent split, > now with cache keys from the ref path. That means, if > "hbase.rs.evictblocksonclose" is false and “hbase.rs.prefetchblocksonopen” is > true, we will be duplicating blocks in the cache. In scenarios where cache > usage is at capacity and the added latency for reading from the file system is > high (for example reading from cloud storage), this can have a severe > impact, as the prefetch for the refs would trigger evictions. Also, the refs > tend to be short lived, as compaction is triggered on the split daughters > soon after they are opened. > 2) HBASE-27474 changed the original behaviour described above, to > always evict blocks from the split parent once the split is completed, and > to skip prefetch for refs (since refs are short lived). The side effect is > that the daughters' blocks would only be cached once compaction is completed, > but compaction itself will run slower since it needs to read the blocks from > the file system. On regions as large as 20GB, the performance degradation > reported by users has been severe. > This proposes a new “hbase.rs.evictblocksonsplit” configuration property that > makes the eviction over split configurable. 
Depending on the use case, the > impact of mass evictions due to cache capacity may be higher, in which case > users might prefer to keep evicting split parent blocks. Additionally, it > modifies the way we handle refs when caching. HBASE-27474 behaviour was to > skip caching refs to avoid duplicate data in the cache as long as compaction > was enabled, relying on the fact that refs from splits are usually short > lived. Here, we propose modifying the search for blocks cache keys, so that > we always resolve the referenced file first and look for the related > referenced file block in the cache. That way we avoid duplicates in the cache > and also expedite scan performance on the split daughters, as it’s now > resolving the referenced file and reading from the cache. -- This message was sent by Atlassian Jira (v8.20.10#820010)
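The two-step lookup described in the release note can be sketched with plain maps. Everything below is illustrative, not the actual BucketCache implementation: it only shows the order of operations (resolve the reference to the parent file's key first, fall back to the reference's own key), and the `"parent.region"` naming for references is a simplified stand-in.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the reference-resolving cache lookup; all names are
// illustrative stand-ins for the real HBase types.
public class RefResolvingCacheDemo {
  private final Map<String, byte[]> cache = new HashMap<>();

  // A split reference such as "parentfile.regionA" points at "parentfile".
  static String resolveReference(String refName) {
    return refName.substring(0, refName.indexOf('.'));
  }

  byte[] getBlock(String fileName, long offset) {
    if (fileName.contains(".")) { // looks like a reference file
      // Check for the parent file's block first to avoid duplicates.
      byte[] parentBlock = cache.get(resolveReference(fileName) + "_" + offset);
      if (parentBlock != null) {
        return parentBlock;
      }
    }
    // Fall back to the reference file itself as the cache key.
    return cache.get(fileName + "_" + offset);
  }

  public static void main(String[] args) {
    RefResolvingCacheDemo demo = new RefResolvingCacheDemo();
    demo.cache.put("parentfile_0", new byte[] { 1, 2, 3 });
    // A lookup through the ref resolves to the parent's cached block.
    System.out.println(demo.getBlock("parentfile.regionA", 0).length); // 3
  }
}
{code}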
[jira] [Resolved] (HBASE-28657) Backport HBASE-28246 Expose region cached size over JMX metrics and report in the RS UI
[ https://issues.apache.org/jira/browse/HBASE-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28657. -- Resolution: Fixed Merged into branch-2.6. Thanks for backporting it, [~szucsvillo] . > Backport HBASE-28246 Expose region cached size over JMX metrics and report in > the RS UI > --- > > Key: HBASE-28657 > URL: https://issues.apache.org/jira/browse/HBASE-28657 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.6.0 >Reporter: Szucs Villo >Assignee: Szucs Villo >Priority: Major > Labels: pull-request-available > Fix For: 2.6.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28467) Integration of time-based priority caching into cacheOnRead read code paths.
[ https://issues.apache.org/jira/browse/HBASE-28467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28467. -- Resolution: Fixed Merged into the feature branch. Thanks for the contribution, [~janardhan.hungund] ! > Integration of time-based priority caching into cacheOnRead read code paths. > > > Key: HBASE-28467 > URL: https://issues.apache.org/jira/browse/HBASE-28467 > Project: HBase > Issue Type: Task > Components: BucketCache >Reporter: Janardhan Hungund >Assignee: Janardhan Hungund >Priority: Major > Labels: pull-request-available > > This Jira tracks the integration of time-based caching framework APIs into > read code paths. > Thanks, > Janardhan > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27915) Update hbase_docker with an extra Dockerfile compatible with mac m1 platform
[ https://issues.apache.org/jira/browse/HBASE-27915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27915. -- Resolution: Fixed Thanks for reviewing it [~swu] ! I have merged this into master, branch-3, branch-2, branch-2.6, branch-2.5 and branch-2.4. > Update hbase_docker with an extra Dockerfile compatible with mac m1 platform > > > Key: HBASE-27915 > URL: https://issues.apache.org/jira/browse/HBASE-27915 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Labels: pull-request-available > > When trying to use the current Dockerfile under "./dev-support/hbase_docker" > on m1 macs, the docker build fails at the git clone & mvn build stage with > below error: > {noformat} > #0 8.214 qemu-x86_64: Could not open '/lib64/ld-linux-x86-64.so.2': No such > file or directory > {noformat} > It turns out for mac m1, we have to explicitly define the platform flag for > the ubuntu image. I thought we could add a note in this readme, together with > an "m1" subfolder containing a modified copy of this Dockerfile that works on > mac m1s. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28469) Integration of time-based priority caching into compaction paths.
[ https://issues.apache.org/jira/browse/HBASE-28469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28469. -- Resolution: Fixed Merged into the feature branch. Thanks for the contribution [~vinayakhegde], and for reviewing [~janardhan.hungund] ! > Integration of time-based priority caching into compaction paths. > - > > Key: HBASE-28469 > URL: https://issues.apache.org/jira/browse/HBASE-28469 > Project: HBase > Issue Type: Task >Reporter: Janardhan Hungund >Assignee: Vinayak Hegde >Priority: Major > Labels: pull-request-available > > The time-based priority caching is dependent on the date-tiered compaction > that structures store files in a date-based tiered layout. This Jira tracks the > changes needed for the integration of this compaction strategy with the > data-tiering to enable appropriate caching of hot data in the cache, while > the cold data can remain in cloud storage. > Thanks, > Janardhan -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28596) Optimise BucketCache usage upon region splits/merges.
Wellington Chevreuil created HBASE-28596: Summary: Optimise BucketCache usage upon region splits/merges. Key: HBASE-28596 URL: https://issues.apache.org/jira/browse/HBASE-28596 Project: HBase Issue Type: Improvement Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil This proposal aims to give more flexibility for users to decide whether or not blocks from a parent region should be evicted, and also optimise cache usage by resolving file reference blocks to the referred block in the cache. Some extra context: 1) Originally, the default behaviour on splits was to rely on the "hbase.rs.evictblocksonclose" value to decide if the cached blocks from the parent split should be evicted or not. Then the resulting split daughters get opened with refs to the parent file. If hbase.rs.prefetchblocksonopen is set, these openings will trigger a prefetch of the blocks from the parent split, now with cache keys from the ref path. That means, if "hbase.rs.evictblocksonclose" is false and “hbase.rs.prefetchblocksonopen” is true, we will be duplicating blocks in the cache. In scenarios where cache usage is at capacity and the added latency for reading from the file system is high (for example reading from cloud storage), this can have a severe impact, as the prefetch for the refs would trigger evictions. Also, the refs tend to be short lived, as compaction is triggered on the split daughters soon after they are opened. 2) HBASE-27474 changed the original behaviour described above, to always evict blocks from the split parent once the split is completed, and to skip prefetch for refs (since refs are short lived). The side effect is that the daughters' blocks would only be cached once compaction is completed, but compaction itself will run slower since it needs to read the blocks from the file system. On regions as large as 20GB, the performance degradation reported by users has been severe. This proposes a new “hbase.rs.evictblocksonsplit” configuration property that makes the eviction over split configurable. Depending on the use case, the impact of mass evictions due to cache capacity may be higher, in which case users might prefer to keep evicting split parent blocks. Additionally, it modifies the way we handle refs when caching. HBASE-27474 behaviour was to skip caching refs to avoid duplicate data in the cache as long as compaction was enabled, relying on the fact that refs from splits are usually short lived. Here, we propose modifying the search for blocks cache keys, so that we always resolve the referenced file first and look for the related referenced file block in the cache. That way we avoid duplicates in the cache and also expedite scan performance on the split daughters, as it’s now resolving the referenced file and reading from the cache. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28535) Implement a region server level configuration to enable/disable data-tiering
[ https://issues.apache.org/jira/browse/HBASE-28535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28535. -- Resolution: Fixed Merged into the feature branch. Thanks for the contribution, [~janardhan.hungund] ! > Implement a region server level configuration to enable/disable data-tiering > > > Key: HBASE-28535 > URL: https://issues.apache.org/jira/browse/HBASE-28535 > Project: HBase > Issue Type: Task > Components: BucketCache >Reporter: Janardhan Hungund >Assignee: Janardhan Hungund >Priority: Major > Labels: pull-request-available > > Provide the user with the ability to enable and disable the data tiering > feature. The time-based data tiering is applicable to a specific set of use > cases which write date-based records and access recently written data. > The feature, in general, should be avoided for use cases which are not > dependent on date-based reads and writes, as the code flows which enable > data temperature checks can induce performance regressions. > This Jira is added to track an optional region-server-wide > configuration to enable or disable the feature. > Thanks, > Janardhan -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28468) Integration of time-based priority caching logic into cache evictions.
[ https://issues.apache.org/jira/browse/HBASE-28468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28468. -- Resolution: Fixed Merged into the feature branch. > Integration of time-based priority caching logic into cache evictions. > -- > > Key: HBASE-28468 > URL: https://issues.apache.org/jira/browse/HBASE-28468 > Project: HBase > Issue Type: Task >Reporter: Janardhan Hungund >Assignee: Janardhan Hungund >Priority: Major > Labels: pull-request-available > > When time-based priority caching is enabled, the block evictions > triggered when the cache is full should use the time-based priority caching > framework APIs to detect the cold files and evict the blocks of those files > first. This ensures that the hot data remains in cache while the cold data is > evicted from cache. > Thanks, > Janardhan -- This message was sent by Atlassian Jira (v8.20.10#820010)
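The cold-first ordering described above can be sketched as follows. This is purely illustrative: the Block type and the boolean hotness flag are simplified stand-ins for the framework APIs in the HBASE-28463 feature branch.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of cold-first eviction ordering: blocks belonging to cold files
// become eviction candidates before any hot-file blocks.
public class ColdFirstEvictionDemo {
  static class Block {
    final String file;
    final boolean hot;
    Block(String file, boolean hot) { this.file = file; this.hot = hot; }
  }

  static List<Block> evictionOrder(List<Block> blocks) {
    List<Block> order = new ArrayList<>(blocks);
    order.sort(Comparator.comparingInt(b -> b.hot ? 1 : 0)); // cold sorts first
    return order;
  }

  public static void main(String[] args) {
    List<Block> blocks = new ArrayList<>();
    blocks.add(new Block("hot-file", true));
    blocks.add(new Block("cold-file", false));
    System.out.println(evictionOrder(blocks).get(0).file); // cold-file
  }
}
{code}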
[jira] [Resolved] (HBASE-28466) Integration of time-based priority logic of bucket cache in prefetch functionality of HBase.
[ https://issues.apache.org/jira/browse/HBASE-28466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28466. -- Resolution: Fixed Merged into feature branch. Thanks for the contribution, [~vinayakhegde] ! > Integration of time-based priority logic of bucket cache in prefetch > functionality of HBase. > > > Key: HBASE-28466 > URL: https://issues.apache.org/jira/browse/HBASE-28466 > Project: HBase > Issue Type: Task > Components: BucketCache >Reporter: Janardhan Hungund >Assignee: Vinayak Hegde >Priority: Major > Labels: pull-request-available > > This Jira tracks the integration of the framework of APIs (implemented in > HBASE-28465) related to data tiering into prefetch logic of HBase. The > implementation should filter out the cold data and enable the prefetching of > hot data into bucket cache. > Thanks, > Janardhan > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28292) Make the prefetch delay property dynamically configurable
[ https://issues.apache.org/jira/browse/HBASE-28292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28292. -- Resolution: Fixed PR got merged into master branch by [~psomogyi] and I had backported it into branch-3, branch-2, branch-2.6, branch-2.5 and branch-2.4. Thanks for the contributions, [~kabhishek4] ! > Make the prefetch delay property dynamically configurable > - > > Key: HBASE-28292 > URL: https://issues.apache.org/jira/browse/HBASE-28292 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.6.0, 2.4.17, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0, 2.5.8 >Reporter: Abhishek Kothalikar >Assignee: Abhishek Kothalikar >Priority: Major > Labels: pull-request-available > Fix For: 2.6.0, 2.4.18, 3.0.0, 4.0.0-alpha-1, 2.7.0, 2.5.9 > > Attachments: HBASE-28292.docx > > > Make the prefetch delay configurable. The prefetch delay is associated with > the hbase.hfile.prefetch.delay configuration. There are some cases where > configuring hbase.hfile.prefetch.delay would help in achieving better > throughput. -- This message was sent by Atlassian Jira (v8.20.10#820010)
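A hedged sketch of the pattern behind a dynamically configurable property: re-read the value on a configuration-change callback instead of caching it once at startup. The class, the callback wiring, and the default of 1000 ms shown here are assumptions for illustration, not the actual HBase implementation.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class PrefetchDelayHolder {
  public static final String PREFETCH_DELAY = "hbase.hfile.prefetch.delay";
  private volatile int prefetchDelayMillis;

  public PrefetchDelayHolder(Configuration conf) {
    this.prefetchDelayMillis = conf.getInt(PREFETCH_DELAY, 1000); // default illustrative
  }

  // Would be invoked by HBase's online-configuration mechanism when an
  // operator runs update_config; registration wiring omitted here.
  public void onConfigurationChange(Configuration conf) {
    this.prefetchDelayMillis = conf.getInt(PREFETCH_DELAY, 1000);
  }

  public int getPrefetchDelayMillis() {
    // Callers read the volatile field each time, so updates take effect
    // without a region server restart.
    return prefetchDelayMillis;
  }
}
{code}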
[jira] [Resolved] (HBASE-28505) Implement enforcement to require Date Tiered Compaction for Time Range Data Tiering
[ https://issues.apache.org/jira/browse/HBASE-28505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28505. -- Resolution: Fixed Merged into HBASE-28463 feature branch. > Implement enforcement to require Date Tiered Compaction for Time Range Data > Tiering > --- > > Key: HBASE-28505 > URL: https://issues.apache.org/jira/browse/HBASE-28505 > Project: HBase > Issue Type: Task >Reporter: Vinayak Hegde >Assignee: Vinayak Hegde >Priority: Major > Labels: pull-request-available > > The implementation should enforce the requirement of enabling Date Tiered > Compaction for Time Range Data Tiering. This restriction ensures that users > can fully benefit from Time Range Data Tiering functionality by disallowing > its usage unless Date Tiered Compaction is enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28465) Implementation of framework for time-based priority bucket-cache.
[ https://issues.apache.org/jira/browse/HBASE-28465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28465. -- Resolution: Fixed Merged into the [HBASE-28463|https://github.com/apache/hbase/tree/HBASE-28463] feature branch. > Implementation of framework for time-based priority bucket-cache. > - > > Key: HBASE-28465 > URL: https://issues.apache.org/jira/browse/HBASE-28465 > Project: HBase > Issue Type: Task >Reporter: Janardhan Hungund >Assignee: Vinayak Hegde >Priority: Major > Labels: pull-request-available > > In this Jira, we track the implementation of the framework for the time-based > priority cache. > This framework helps us get the required metadata of the HFiles and > make decisions about the hotness or coldness of the data. > Thanks, > Janardhan -- This message was sent by Atlassian Jira (v8.20.10#820010)
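The hot/cold decision at the heart of the framework can be illustrated as below. The method name and the 7-day window are assumptions for the sketch; the real framework derives the file's newest cell timestamp from HFile metadata.

{code:java}
// Illustrative hot/cold decision: a file is "hot" if its newest cell
// timestamp falls within a configured hot-data age window.
public class DataTieringDemo {
  static final long HOT_DATA_AGE_MILLIS = 7L * 24 * 60 * 60 * 1000; // 7 days, illustrative

  static boolean isHotFile(long maxTimestampMillis, long nowMillis) {
    return nowMillis - maxTimestampMillis <= HOT_DATA_AGE_MILLIS;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    System.out.println(isHotFile(now - 1000, now));                   // true: written just now
    System.out.println(isHotFile(now - 30L * 24 * 3600 * 1000, now)); // false: a month old
  }
}
{code}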
[jira] [Resolved] (HBASE-28458) BucketCache.notifyFileCachingCompleted may incorrectly consider a file fully cached
[ https://issues.apache.org/jira/browse/HBASE-28458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28458. -- Resolution: Fixed Merged into master, branch-3, branch-2 and branch-2.6. Thanks for reviewing it [~zhangduo] [~psomogyi] ! > BucketCache.notifyFileCachingCompleted may incorrectly consider a file fully > cached > --- > > Key: HBASE-28458 > URL: https://issues.apache.org/jira/browse/HBASE-28458 > Project: HBase > Issue Type: Bug >Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Labels: pull-request-available > Fix For: 2.6.0, 3.0.0, 4.0.0-alpha-1, 2.7.0 > > > Noticed that > TestBucketCachePersister.testPrefetchBlockEvictionWhilePrefetchRunning was > flaky, failing whenever the block eviction happened while prefetch was still > ongoing. > In the test, we pass an instance of BucketCache directly to the cache config, > so the test is actually placing both data and meta blocks in the bucket > cache. So sometimes the test calls BucketCache.notifyFileCachingCompleted > after it has already evicted two blocks. > Inside BucketCache.notifyFileCachingCompleted, we iterate through the > backingMap entry set, counting the number of blocks for the given file. Then, to > consider whether the file is fully cached or not, we do the following > validation: > {noformat} > if (dataBlockCount == count.getValue() || totalBlockCount == > count.getValue()) { > LOG.debug("File {} has now been fully cached.", fileName); > fileCacheCompleted(fileName, size); > } {noformat} > But the test generates 57 total blocks, 55 data and 2 meta blocks. It evicts > two blocks and asserts that the file hasn't been considered fully cached. > When these evictions happen while prefetch is still ongoing, we'll pass that > check, as the number of blocks for the file in the backingMap would still > be 55, which is what we pass as dataBlockCount. > As BucketCache is intended for storing data blocks only, I believe we should > make sure BucketCache.notifyFileCachingCompleted only accounts for data > blocks. Also, the > TestBucketCachePersister.testPrefetchBlockEvictionWhilePrefetchRunning should > be updated to consistently reproduce the eviction concurrent to the prefetch. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
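The fix direction ("only account for data blocks") can be sketched with simplified types. The enum and map below are illustrative stand-ins, not the BucketCache internals: the point is that filtering by block type stops stray META blocks from masking evicted DATA blocks.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class FullyCachedCheckDemo {
  enum BlockType { DATA, META }

  // Count only DATA blocks when deciding whether a file is fully cached.
  static boolean isFullyCached(Map<Long, BlockType> cachedBlocksByOffset,
                               int expectedDataBlockCount) {
    long dataBlocks = cachedBlocksByOffset.values().stream()
        .filter(t -> t == BlockType.DATA).count();
    return dataBlocks == expectedDataBlockCount;
  }

  public static void main(String[] args) {
    Map<Long, BlockType> cached = new HashMap<>();
    cached.put(0L, BlockType.DATA);
    cached.put(100L, BlockType.META);
    // Two entries total, but only one DATA block: not fully cached.
    System.out.println(isFullyCached(cached, 2)); // false
  }
}
{code}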
[jira] [Resolved] (HBASE-28450) BucketCache.evictBlocksByHfileName won't work after a cache recovery from file
[ https://issues.apache.org/jira/browse/HBASE-28450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28450. -- Resolution: Fixed Merged to master, branch-3, branch-2 and branch-2.6. Thanks for the reviews, [~psomogyi] [~ankit.jhil]! > BucketCache.evictBlocksByHfileName won't work after a cache recovery from file > - > > Key: HBASE-28450 > URL: https://issues.apache.org/jira/browse/HBASE-28450 > Project: HBase > Issue Type: Bug >Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.1 > > > HBASE-27313, HBASE-27686 and HBASE-27743 have extended BucketCache persistent > cache capabilities to make it resilient to RS crashes or non graceful stops, > when using file based ioengine for BucketCache. > BucketCache maintains two main collections for tracking blocks in the cache: > backingMap and blocksByHFile. The former is used as the main index of blocks > for the actual cache, whilst the latter is a set of all blocks in the cache > ordered by name, in order to conveniently and efficiently retrieve the list > of all blocks from a single file in the BucketCache.evictBlocksByHfile method. > > The problem is that at cache recovery time, we are not populating the > blocksByHFile set, which causes any calls to the BucketCache.evictBlocksByHfile > method to not evict any blocks, once we have recovered the cache from the > cache persistence file (for instance, after an RS restart). -- This message was sent by Atlassian Jira (v8.20.10#820010)
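The repair implied above, in a self-contained sketch with simplified types: when the backing map is rebuilt from the persistence file, every recovered key must also be re-inserted into the by-file index, or per-file eviction finds nothing. The string keys and range-query trick are illustrative only.

{code:java}
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class CacheRecoveryDemo {
  public static void main(String[] args) {
    // Simulates a backingMap rebuilt from the persisted cache file.
    Map<String, byte[]> recoveredBackingMap = new TreeMap<>();
    recoveredBackingMap.put("hfile1_0", new byte[] { 1 });
    recoveredBackingMap.put("hfile1_64", new byte[] { 2 });

    NavigableSet<String> blocksByHFile = new TreeSet<>();
    // The missing step: repopulate the per-file index during recovery.
    blocksByHFile.addAll(recoveredBackingMap.keySet());

    // An evictBlocksByHfileName-style lookup now finds the file's blocks.
    System.out.println(blocksByHFile.subSet("hfile1_", true, "hfile1_~", true));
  }
}
{code}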
[jira] [Created] (HBASE-28458) BucketCache.notifyFileCachingCompleted may incorrectly consider a file fully cached
Wellington Chevreuil created HBASE-28458: Summary: BucketCache.notifyFileCachingCompleted may incorrectly consider a file fully cached Key: HBASE-28458 URL: https://issues.apache.org/jira/browse/HBASE-28458 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil Noticed that TestBucketCachePersister.testPrefetchBlockEvictionWhilePrefetchRunning was flaky, failing whenever the block eviction happened while prefetch was still ongoing. In the test, we pass an instance of BucketCache directly to the cache config, so the test is actually placing both data and meta blocks in the bucket cache. So sometimes the test calls BucketCache.notifyFileCachingCompleted after it has already evicted two blocks. Inside BucketCache.notifyFileCachingCompleted, we iterate through the backingMap entry set, counting the number of blocks for the given file. Then, to consider whether the file is fully cached or not, we do the following validation: {noformat} if (dataBlockCount == count.getValue() || totalBlockCount == count.getValue()) { LOG.debug("File {} has now been fully cached.", fileName); fileCacheCompleted(fileName, size); } {noformat} But the test generates 57 total blocks, 55 data and 2 meta blocks. It evicts two blocks and asserts that the file hasn't been considered fully cached. When these evictions happen while prefetch is still ongoing, we'll pass that check, as the number of blocks for the file in the backingMap would still be 55, which is what we pass as dataBlockCount. As BucketCache is intended for storing data blocks only, I believe we should make sure BucketCache.notifyFileCachingCompleted only accounts for data blocks. Also, the TestBucketCachePersister.testPrefetchBlockEvictionWhilePrefetchRunning should be updated to consistently reproduce the eviction concurrent to the prefetch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28450) BucketCache.evictBlocksByHfileName won't work after a cache recovery from file
Wellington Chevreuil created HBASE-28450: Summary: BucketCache.evictBlocksByHfileName won't work after a cache recovery from file Key: HBASE-28450 URL: https://issues.apache.org/jira/browse/HBASE-28450 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil HBASE-27313, HBASE-27686 and HBASE-27743 have extended BucketCache persistent cache capabilities to make it resilient to RS crashes or non graceful stops, when using file based ioengine for BucketCache. BucketCache maintains two main collections for tracking blocks in the cache: backingMap and blocksByHFile. The former is used as the main index of blocks for the actual cache, whilst the latter is a set of all blocks in the cache ordered by name, in order to conveniently and efficiently retrieve the list of all blocks from a single file in the BucketCache.evictBlocksByHfile method. The problem is that at cache recovery time, we are not populating the blocksByHFile set, which causes any calls to the BucketCache.evictBlocksByHfile method to not evict any blocks, once we have recovered the cache from the cache persistence file (for instance, after an RS restart). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28303) Interrupt cache prefetch thread when a heap usage threshold is reached
[ https://issues.apache.org/jira/browse/HBASE-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28303. -- Resolution: Fixed Merged into master, branch-3 and branch-2. > Interrupt cache prefetch thread when a heap usage threshold is reached > -- > > Key: HBASE-28303 > URL: https://issues.apache.org/jira/browse/HBASE-28303 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.6.0, 2.4.17, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7, 2.7.0 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 3.0.0, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2 > > > Mostly critical when using non-heap cache implementations, such as offheap or > file based. If the cache medium is too large and there are many blocks to be > cached, it may create a lot of cache index objects in the RegionServer heap. > We should have guardrails to prevent caching from exhausting the available > heap. -- This message was sent by Atlassian Jira (v8.20.10#820010)
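A minimal sketch of such a guardrail: sample JVM heap usage via the standard MemoryMXBean and bail out of a prefetch loop past a threshold. The 0.88 threshold and the check placement are assumptions for illustration, not the values HBase uses.

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapGuardrailDemo {
  static final double HEAP_USAGE_THRESHOLD = 0.88; // illustrative value

  static boolean heapAboveThreshold() {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    long max = heap.getMax(); // may be -1 if the max is undefined
    return max > 0 && (double) heap.getUsed() / max > HEAP_USAGE_THRESHOLD;
  }

  public static void main(String[] args) {
    // Inside a prefetch loop, one would bail out instead of caching further:
    if (heapAboveThreshold()) {
      System.out.println("interrupting prefetch: heap usage above threshold");
    } else {
      System.out.println("heap ok, prefetch continues");
    }
  }
}
{code}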
[jira] [Created] (HBASE-28303) Interrupt cache prefetch thread when a heap usage threshold is reached
Wellington Chevreuil created HBASE-28303: Summary: Interrupt cache prefetch thread when a heap usage threshold is reached Key: HBASE-28303 URL: https://issues.apache.org/jira/browse/HBASE-28303 Project: HBase Issue Type: Improvement Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil Mostly critical when using non-heap cache implementations, such as offheap or file based. If the cache medium is too large and there are many blocks to be cached, it may create a lot of cache index objects in the RegionServer heap. We should have guardrails to prevent caching from exhausting the available heap. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28259) Add java.base/java.io=ALL-UNNAMED open to jdk11_jvm_flags
[ https://issues.apache.org/jira/browse/HBASE-28259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28259. -- Resolution: Fixed Merged to master, branch-3, branch-2, branch-2.6, branch-2.5 and branch-2.4. Thanks for the contribution, [~mrzhao] ! > Add java.base/java.io=ALL-UNNAMED open to jdk11_jvm_flags > -- > > Key: HBASE-28259 > URL: https://issues.apache.org/jira/browse/HBASE-28259 > Project: HBase > Issue Type: Bug > Components: java >Reporter: Moran >Assignee: Moran >Priority: Trivial > > hbase shell > 2023-12-13T23:49:50.846+08:00 [main] WARN FilenoUtil : Native subprocess > control requires open access to the JDK IO subsystem > Pass '--add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens > java.base/java.io=ALL-UNNAMED' to enable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28246) Expose region cached size over JMX metrics and report in the RS UI
[ https://issues.apache.org/jira/browse/HBASE-28246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28246. -- Resolution: Fixed Thanks for reviewing it, [~psomogyi]! Merged into master, branch-3 and branch-2. > Expose region cached size over JMX metrics and report in the RS UI > -- > > Key: HBASE-28246 > URL: https://issues.apache.org/jira/browse/HBASE-28246 > Project: HBase > Issue Type: Improvement > Components: BucketCache >Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 3.0.0-beta-1, 4.0.0-alpha-1 > > Attachments: Screenshot 2023-12-06 at 22.58.17.png > > > With a large file-based bucket cache, the prefetch executor can take a long time > to cache all of the dataset. It would be useful to report what percentage > of each region's data is already cached, in order to give an idea of how much work > the prefetch executor has done. > This PR adds JMX metrics for the region cache percentage and also reports the same in the > RS UI "Store File Metrics" tab as below: > !Screenshot 2023-12-06 at 22.58.17.png|width=658,height=114! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28251) [SFT] Add description for specifying SFT impl during snapshot recovery
[ https://issues.apache.org/jira/browse/HBASE-28251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28251. -- Resolution: Fixed Merged into master branch. Thanks for reviewing this, [~psomogyi] , [~nihaljain.cs] and [~zhangduo] ! > [SFT] Add description for specifying SFT impl during snapshot recovery > -- > > Key: HBASE-28251 > URL: https://issues.apache.org/jira/browse/HBASE-28251 > Project: HBase > Issue Type: Sub-task > Components: documentation >Affects Versions: 4.0.0-alpha-1 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 4.0.0-alpha-1 > > > HBASE-26286 added an option to the clone_snapshot command that allows for > specifying the SFT implementation during the snapshot recovery. This is really > useful when recovering snapshots imported from clusters not using the same > SFT impl as the one where we are cloning it. Without this, the cloned/restored > table will get created with the SFT impl of its original cluster, > requiring extra conversion steps using the MIGRATION tracker. > This also fixes formatting for the "Bulk Data Generator Tool", which is > currently displayed as a sub-topic of the SFT chapter. It should have its > own chapter. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28251) [SFT] Add description for specifying SFT impl during snapshot recovery
Wellington Chevreuil created HBASE-28251: Summary: [SFT] Add description for specifying SFT impl during snapshot recovery Key: HBASE-28251 URL: https://issues.apache.org/jira/browse/HBASE-28251 Project: HBase Issue Type: Improvement Components: documentation Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil HBASE-26286 added an option to the clone_snapshot command that allows for specifying the SFT implementation during the snapshot recovery. This is really useful when recovering snapshots imported from clusters not using the same SFT impl as the one where we are cloning it. Without this, the cloned/restored table will get created with the SFT impl of its original cluster, requiring extra conversion steps using the MIGRATION tracker. This also fixes formatting for the "Bulk Data Generator Tool", which is currently displayed as a sub-topic of the SFT chapter. It should have its own chapter. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28209) Create a jmx metric to expose the oldWALs directory size
[ https://issues.apache.org/jira/browse/HBASE-28209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28209. -- Resolution: Fixed I have now merged the branch-2 PR and cherry-picked into branch-2.6, branch-2.5 and branch-2.4. Thanks for the contribution, [~vinayakhegde] ! > Create a jmx metric to expose the oldWALs directory size > - > > Key: HBASE-28209 > URL: https://issues.apache.org/jira/browse/HBASE-28209 > Project: HBase > Issue Type: Improvement > Components: metrics >Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1 >Reporter: Vinayak Hegde >Assignee: Vinayak Hegde >Priority: Major > Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7, 2.7.0 > > > Create a jmx metric that can return the size of the old WALs in bytes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
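A hedged sketch of how such a metric's value can be computed: the total size in bytes under the oldWALs directory via a recursive content summary on the Hadoop FileSystem API. The "/hbase/oldWALs" path is an assumption reflecting the default root-dir layout, not a value read from configuration here.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OldWalsSizeDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path oldWals = new Path("/hbase/oldWALs"); // assumed default layout
    // getContentSummary walks the directory tree and sums file lengths.
    long sizeBytes = fs.getContentSummary(oldWals).getLength();
    System.out.println("oldWALs size in bytes: " + sizeBytes);
  }
}
{code}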
[jira] [Created] (HBASE-28246) Expose region cached size over JMX metrics and report in the RS UI
Wellington Chevreuil created HBASE-28246: Summary: Expose region cached size over JMX metrics and report in the RS UI Key: HBASE-28246 URL: https://issues.apache.org/jira/browse/HBASE-28246 Project: HBase Issue Type: Improvement Components: BucketCache Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil Attachments: Screenshot 2023-12-06 at 22.58.17.png With a large file-based bucket cache, the prefetch executor can take a long time to cache all of the dataset. It would be useful to report what percentage of each region's data is already cached, in order to give an idea of how much work the prefetch executor has done. This PR adds JMX metrics for the region cache percentage and also reports the same in the RS UI "Store File Metrics" tab as below: !Screenshot 2023-12-06 at 22.58.17.png|width=658,height=114! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28186) Rebase CacheAwareBalance related commits into master branch
[ https://issues.apache.org/jira/browse/HBASE-28186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28186. -- Resolution: Fixed Thanks for helping with the backport to branch-2, [~ragarkar] ! > Rebase CacheAwareBalance related commits into master branch > --- > > Key: HBASE-28186 > URL: https://issues.apache.org/jira/browse/HBASE-28186 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28211) BucketCache.blocksByHFile may leak on allocationFailure or if we reach io errors tolerated
[ https://issues.apache.org/jira/browse/HBASE-28211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28211. -- Resolution: Fixed Thanks for reviewing it, [~zhangduo]! I have now merged it to master, branch-3, branch-2, branch-2.5 and branch-2.6. > BucketCache.blocksByHFile may leak on allocationFailure or if we reach io > errors tolerated > -- > > Key: HBASE-28211 > URL: https://issues.apache.org/jira/browse/HBASE-28211 > Project: HBase > Issue Type: Bug >Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7 > > > We add blocks to BucketCache.blocksByHFile on doDrain before we have actually > successfully added the block to the cache. We may still fail to cache the > block if it is too big to fit any of the configured bucket sizes, or if we > fail to write it to the ioengine and reach the tolerated io errors threshold. > In such cases, the related block would remain in the > BucketCache.blocksByHFile indefinitely. -- This message was sent by Atlassian Jira (v8.20.10#820010)
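The leak-avoidance pattern implied above, in a simplified sketch: undo the auxiliary-index entry whenever the actual cache write fails, so no dangling key survives an allocation or IO failure. All types and method names here are illustrative stand-ins, not the BucketCache code.

{code:java}
import java.util.HashSet;
import java.util.Set;

public class IndexLeakFixDemo {
  private final Set<String> blocksByHFile = new HashSet<>();

  boolean cacheBlock(String key, byte[] block) {
    blocksByHFile.add(key);
    try {
      writeToIoEngine(key, block); // may fail: allocation or IO error
      return true;
    } catch (Exception e) {
      blocksByHFile.remove(key);   // the fix: don't leave a dangling entry
      return false;
    }
  }

  private void writeToIoEngine(String key, byte[] block) throws Exception {
    throw new Exception("simulated allocation failure");
  }

  public static void main(String[] args) {
    IndexLeakFixDemo demo = new IndexLeakFixDemo();
    System.out.println(demo.cacheBlock("f_0", new byte[8])); // false
    System.out.println(demo.blocksByHFile.isEmpty());        // true: no leak
  }
}
{code}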
[jira] [Resolved] (HBASE-28217) PrefetchExecutor should not run for files from CFs that have disabled BLOCKCACHE
[ https://issues.apache.org/jira/browse/HBASE-28217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28217. -- Resolution: Fixed Thanks for reviewing it, [~psomogyi]. I have merged it to master, branch-3, branch-2, branch-2.5 and branch-2.4. > PrefetchExecutor should not run for files from CFs that have disabled > BLOCKCACHE > > > Key: HBASE-28217 > URL: https://issues.apache.org/jira/browse/HBASE-28217 > Project: HBase > Issue Type: Bug >Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7 > > > HFilePreadReader relies on the return value of CacheConfig.shouldPrefetchOnOpen > to decide whether it should run the PrefetchExecutor for the files. > Currently, CacheConfig.shouldPrefetchOnOpen returns true if > "hbase.rs.prefetchblocksonopen" is set to true at the config, OR > PREFETCH_BLOCKS_ON_OPEN is set to true at CF level. > There's also the CacheConfig.shouldCacheDataOnRead, which returns true if > both hbase.block.data.cacheonread is set to true at the config AND BLOCKCACHE > is set to true at CF level. > If BLOCKCACHE is set to false at CF level, HFilePreadReader will still run > the PrefetchExecutor to read all the file's blocks from the FileSystem, but > then would find out the given block shouldn't be cached. > I believe we should change CacheConfig.shouldPrefetchOnOpen to return true > only if CacheConfig.shouldCacheDataOnRead is also true. -- This message was sent by Atlassian Jira (v8.20.10#820010)
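The proposed condition reduces to a one-line conjunction. The sketch below uses plain boolean fields as illustrative stand-ins for the CacheConfig internals; only the logic is the point.

{code:java}
public class PrefetchDecisionDemo {
  boolean prefetchOnOpenConfigured;  // hbase.rs.prefetchblocksonopen / CF flag
  boolean cacheDataOnRead;           // hbase.block.data.cacheonread
  boolean blockCacheEnabledForCf;    // BLOCKCACHE at CF level

  boolean shouldCacheDataOnRead() {
    return cacheDataOnRead && blockCacheEnabledForCf;
  }

  // Proposed: don't prefetch for CFs whose blocks would never be cached.
  boolean shouldPrefetchOnOpen() {
    return prefetchOnOpenConfigured && shouldCacheDataOnRead();
  }

  public static void main(String[] args) {
    PrefetchDecisionDemo cfg = new PrefetchDecisionDemo();
    cfg.prefetchOnOpenConfigured = true;
    cfg.cacheDataOnRead = true;
    cfg.blockCacheEnabledForCf = false; // BLOCKCACHE => false at CF level
    System.out.println(cfg.shouldPrefetchOnOpen()); // false: no wasted reads
  }
}
{code}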
[jira] [Resolved] (HBASE-28176) PrefetchExecutor should stop once cache reaches capacity
[ https://issues.apache.org/jira/browse/HBASE-28176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28176. -- Resolution: Fixed Thanks for reviewing it, [~psomogyi]! Have merged into master, branch-3 and branch-2. > PrefetchExecutor should stop once cache reaches capacity > > > Key: HBASE-28176 > URL: https://issues.apache.org/jira/browse/HBASE-28176 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.6.0, 3.0.0-alpha-4, 4.0.0-alpha-1 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1 > > > The prefetch executor runs a full scan on regions in the background once > regions are opened, if the "hbase.rs.prefetchblocksonopen" property is set to > true. However, if the store file size is much larger than the cache capacity, > we should interrupt the prefetch once it has reached the cache capacity, > otherwise it would just be triggering evictions of little value, since we > don't have any sense of block priority at that point. It's better to stop the > read, and let client reads cause the eviction of LFU blocks and cache the > most accessed blocks. -- This message was sent by Atlassian Jira (v8.20.10#820010)
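The interruption described above amounts to a capacity check inside the prefetch loop. The 0.95 acceptance factor and the byte counters below are illustrative assumptions, not the actual HBase API.

{code:java}
public class PrefetchCapacityDemo {
  static final double ACCEPT_FACTOR = 0.95; // illustrative

  public static void main(String[] args) {
    long capacityBytes = 1024;
    long usedBytes = 0;
    for (int block = 0; block < 100; block++) {
      if (usedBytes >= capacityBytes * ACCEPT_FACTOR) {
        System.out.println("cache full, stopping prefetch at block " + block);
        break; // stop reading: further caching would only trigger evictions
      }
      usedBytes += 64; // simulate caching one 64-byte block
    }
  }
}
{code}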
[jira] [Created] (HBASE-28217) PrefetchExecutor should not run for files from CFs that have disabled BLOCKCACHE
Wellington Chevreuil created HBASE-28217: Summary: PrefetchExecutor should not run for files from CFs that have disabled BLOCKCACHE Key: HBASE-28217 URL: https://issues.apache.org/jira/browse/HBASE-28217 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil HFilePreadReader relies on the return value of CacheConfig.shouldPrefetchOnOpen to decide whether it should run the PrefetchExecutor for the files. Currently, CacheConfig.shouldPrefetchOnOpen returns true if "hbase.rs.prefetchblocksonopen" is set to true at the config, OR PREFETCH_BLOCKS_ON_OPEN is set to true at CF level. There's also the CacheConfig.shouldCacheDataOnRead, which returns true if both hbase.block.data.cacheonread is set to true at the config AND BLOCKCACHE is set to true at CF level. If BLOCKCACHE is set to false at CF level, HFilePreadReader will still run the PrefetchExecutor to read all the file's blocks from the FileSystem, but then would find out the given block shouldn't be cached. I believe we should change CacheConfig.shouldPrefetchOnOpen to return true only if CacheConfig.shouldCacheDataOnRead is also true. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28211) BucketCache.blocksByHFile may leak on allocationFailure or if we reach io errors tolerated
Wellington Chevreuil created HBASE-28211: Summary: BucketCache.blocksByHFile may leak on allocationFailure or if we reach io errors tolerated Key: HBASE-28211 URL: https://issues.apache.org/jira/browse/HBASE-28211 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil We add blocks to BucketCache.blocksByHFile on doDrain before we have actually successfully added the block to the cache. We may still fail to cache the block if it is too big to fit any of the configured bucket sizes, or if we fail to write it to the ioengine and reach the tolerated io errors threshold. In such cases, the related block would remain in the BucketCache.blocksByHFile indefinitely. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28174) DELETE endpoint in REST API does not support deleting binary row keys/columns
[ https://issues.apache.org/jira/browse/HBASE-28174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28174. -- Resolution: Fixed Thanks for the contribution, [~james_udiljak_bhp]. I had now merged the PR on master branch and cherry-picked it to branch-3 and branch-2. > DELETE endpoint in REST API does not support deleting binary row keys/columns > - > > Key: HBASE-28174 > URL: https://issues.apache.org/jira/browse/HBASE-28174 > Project: HBase > Issue Type: Bug > Components: REST >Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1 >Reporter: James Udiljak >Assignee: James Udiljak >Priority: Blocker > Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1 > > Attachments: delete_base64_1.png > > > h2. Notes > This is the first time I have raised an issue in the ASF Jira. Please let me > know if there's anything I need to adjust on the issue to fit in with your > development flow. > I have marked the priority as "blocker" because this issue blocks me as a > user of the HBase REST API from deploying an effective solution for our > setup. Please feel free to change this if the Priority field has another > meaning to you. > I have also chosen 2.4.17 as the affected version because this is the version > I am running, however looking at the source code on GitHub in the default > branch, I think many other versions would be affected. > h2. Description of Issue > The DELETE operation in the [HBase REST > API|https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/rest/package-summary.html#operation_delete] > requires specifying row keys and column families/offsets in the URI (i.e. as > UTF-8 text). This makes it impossible to specify a delete operation via the > REST API for a binary row key or column family/offset, as single bytes with a > decimal value greater than 127 are not valid in UTF-8. > Percent-encoding these "high" values does not work around the issue, as the > HBase REST API uses Java's {{URLDecoder.decode(percentEncodedString, > "UTF-8")}} function, which replaces any percent-encoded byte in the range > {{%80}} to {{%FF}} with the [replacement > character|https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character]. > Even if this were not the case, the row-key is ultimately [converted to a > byte > array|https://github.com/apache/hbase/blob/rel/2.4.17/hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/RowSpec.java#L60-L100] > using UTF-8 encoding, wherein code points >127 are encoded across multiple > bytes, corrupting the user-supplied row key. > h2. Proposed Solution > I do not believe it is possible to allow encoding of arbitrary bytes in the > URL for the DELETE endpoint without breaking compatibility for any users who > may have been unknowingly UTF-8 encoding their binary row keys. Even if it > were possible, the syntax would likely be terse. > Instead, I propose a new version of the DELETE endpoint that would accept row > keys and column families/offsets in the request _body_ (using Base64 encoding > for the JSON and XML formats, and bare binary for protobuf). This new > endpoint would follow the same conventions as the PUT operations, except that > cell values would not need to be specified (unless the user is performing a > check-and-delete operation). > As an additional benefit, using the request body could potentially allow for > deleting multiple rows in a single request, which would drastically improve > the efficiency of my use case. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
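The corruption the reporter describes is easy to demonstrate in a self-contained program: UTF-8 decoding of a percent-encoded high byte yields U+FFFD, and re-encoding any code point above 127 spans multiple bytes, so the original single byte is unrecoverable.

{code:java}
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class BinaryRowKeyDemo {
  public static void main(String[] args) {
    String decoded = URLDecoder.decode("%FF", StandardCharsets.UTF_8);
    // 0xFF is not valid UTF-8 on its own, so it becomes the replacement char.
    System.out.printf("decoded: U+%04X%n", (int) decoded.charAt(0)); // U+FFFD
    byte[] roundTripped = decoded.getBytes(StandardCharsets.UTF_8);
    System.out.println("bytes after round trip: " + roundTripped.length); // 3, not 1
  }
}
{code}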
[jira] [Resolved] (HBASE-28189) Fix the miss count in one of CombinedBlockCache getBlock implementations
[ https://issues.apache.org/jira/browse/HBASE-28189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28189. -- Resolution: Fixed Thanks for reviewing this, [~psomogyi]. I had merged this to master, branch-3, branch-2, branch-2.5 and branch-2.4. > Fix the miss count in one of CombinedBlockCache getBlock implementations > > > Key: HBASE-28189 > URL: https://issues.apache.org/jira/browse/HBASE-28189 > Project: HBase > Issue Type: Bug >Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7 > > > In one of the CombinedBlockCache.getBlock implementations, > getBlock(cacheKey, caching, repeat, updateCacheMetrics), we > always compute a miss in L1 if the passed block is of type DATA. We > should compute the miss in one of the caches only, not both. -- This message was sent by Atlassian Jira (v8.20.10#820010)
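The corrected accounting can be sketched as a combined L1+L2 lookup that records a single miss only after both caches fail, instead of counting an L1 miss for every DATA request. The maps and counter below are simplified stand-ins for the real block cache types.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class CombinedCacheMissDemo {
  Map<String, byte[]> l1 = new HashMap<>();
  Map<String, byte[]> l2 = new HashMap<>();
  long missCount = 0;

  byte[] getBlock(String key, boolean updateMetrics) {
    byte[] block = l1.get(key);  // no metrics update on the L1 probe alone
    if (block == null) {
      block = l2.get(key);
    }
    if (updateMetrics && block == null) {
      missCount++;               // one miss, for the combined lookup
    }
    return block;
  }

  public static void main(String[] args) {
    CombinedCacheMissDemo cache = new CombinedCacheMissDemo();
    cache.l2.put("k", new byte[] { 1 });
    cache.getBlock("k", true);           // found in L2: no miss recorded
    System.out.println(cache.missCount); // 0
  }
}
{code}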
[jira] [Created] (HBASE-28189) Fix the miss count in one of CombinedBlockCache getBlock implementations
Wellington Chevreuil created HBASE-28189: Summary: Fix the miss count in one of CombinedBlockCache getBlock implementations Key: HBASE-28189 URL: https://issues.apache.org/jira/browse/HBASE-28189 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil In one of the CombinedBlockCache.getBlock implementations, getBlock(cacheKey, caching, repeat, updateCacheMetrics), we always compute a miss in L1 if the passed block is of type DATA. We should compute the miss in one of the caches only, not both. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28186) Rebase CacheAwareBalance related commits into master branch
Wellington Chevreuil created HBASE-28186: Summary: Rebase CacheAwareBalance related commits into master branch Key: HBASE-28186 URL: https://issues.apache.org/jira/browse/HBASE-28186 Project: HBase Issue Type: Sub-task Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28168) Add option in RegionMover.java to isolate one or more regions on the RegionServer
[ https://issues.apache.org/jira/browse/HBASE-28168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28168. -- Release Note: This adds a new "isolate_regions" operation to RegionMover, which allows operators to pass a list of region encoded ids to be "isolated" in the passed RegionServer. Regions currently deployed in the RegionServer that are not in the passed list of regions would be moved to other RegionServers. Regions in the passed list that are currently on other RegionServers would be moved to the passed RegionServer. Please refer to the command help for further information. Resolution: Fixed > Add option in RegionMover.java to isolate one or more regions on the > RegionServer > > > Key: HBASE-28168 > URL: https://issues.apache.org/jira/browse/HBASE-28168 > Project: HBase > Issue Type: New Feature >Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1 >Reporter: Mihir Monani >Assignee: Mihir Monani >Priority: Minor > Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7 > > > Sometimes one or more HBase regions on an RS are under high load. This can lead > to resource starvation for other regions hosted on the RS. It might be > necessary to isolate one or more regions on the RS so that regions with heavy > load don't impact other regions hosted on the same RS. > RegionMover.java class provides a way to load/unload the regions from a > specific RS. It would be good to have an option to pass a list of region hashes > that should be left on (or moved to) the RS and put the RS in the > draining/decommission mode so the HMaster doesn't assign new regions to the RS. > Ex. > {code:java} > --isolateRegionIds regionHash1,regionHash2,regionHash3{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28170) Put the cached time at the beginning of the block; run cache validation in the background when retrieving the persistent cache
[ https://issues.apache.org/jira/browse/HBASE-28170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28170. -- Resolution: Fixed Merged into master, branch-3 and branch-2. Thanks for reviewing it, [~psomogyi]! > Put the cached time at the beginning of the block; run cache validation in > the background when retrieving the persistent cache > -- > > Key: HBASE-28170 > URL: https://issues.apache.org/jira/browse/HBASE-28170 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.6.0, 3.0.0, 4.0.0-alpha-1 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1 > > > In HBASE-28004, we added a "cached time" long at the end of each block on the > bucket cache. We also record the cached time in the backing map we persist to > disk periodically, in order to retrieve the cache upon crashes/restarts. The > persisted backing map includes the last modification time of the cache itself. > On restarts, once we read the backing map from the persisted file, we compare > the last modification time recorded there against the cache's actual last > modification time. If those differ, it means the cache has been > updated after the backing map has been persisted, so the backing map might > not be accurate. We then iterate through the backing map entries and compare > the entries' cached time against the related block in the cache, and if those > differ, we remove the entry from the map. > Currently this validation is made at RS initialisation time, but with caches > as large as 1.6TB/30M+ blocks, it can take up to an hour, meaning the RS is > useless over that time. This PR changes this validation to be performed in > the background, whilst direct accesses to a block in the cache would also > perform the "cached time" comparison. > This PR also moves the "cached time" to the beginning of the block in the > cache, instead of the end. We noticed that with the "cached time" at the end > we can fail to ensure consistency under some conditions. Consider the following: > 1) A block B1 of size S gets allocated at offset 0 with cached time T1; > 2) The backing map is persisted, containing B1 at offset 0 and cached time T1; > 3) B1 is evicted. Its offset in the cache is now free, however its contents > are still there, including the cached time T1 at its end; > 4) A new block B2 of size S/2 gets allocated at offset 0 with cached time T2; > 5) RS crashes before the backing map gets saved, so the persisted backing map > still has only the reference to B1, but not B2; > 6) At restart, we run the validation. Because B2 was half the size of B1, we > haven't overridden B1's cached time in the cache, so we will successfully > validate B1, although its content is now half overridden by B2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
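The layout argument in steps 1-6 can be reproduced with a plain ByteBuffer standing in for the ioengine. The offsets and sizes below are illustrative: a trailing timestamp from a larger evicted block can survive a smaller overwrite and falsely validate, while a leading timestamp is always clobbered when the offset is reused.

{code:java}
import java.nio.ByteBuffer;

public class CachedTimeLayoutDemo {
  public static void main(String[] args) {
    ByteBuffer engine = ByteBuffer.allocate(64);
    long t1 = 1000L, t2 = 2000L;

    // Old layout: B1 is 32 bytes at offset 0, cached time T1 at its tail (offset 24).
    engine.putLong(24, t1);
    // B2 (16 bytes) reuses offset 0, with its own tail timestamp T2 at offset 8.
    engine.putLong(8, t2);
    // Stale validation of B1 against T1 still "passes" -- the bug.
    System.out.println("tail layout validates stale B1: " + (engine.getLong(24) == t1)); // true

    // New layout: cached time at the start. B2's write at offset 0 always
    // clobbers B1's leading timestamp, so the stale entry is rejected.
    engine.putLong(0, t1); // B1's leading time
    engine.putLong(0, t2); // B2 reuses offset 0
    System.out.println("head layout validates stale B1: " + (engine.getLong(0) == t1)); // false
  }
}
{code}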
[jira] [Created] (HBASE-28176) PrefetchExecutor should stop once cache reaches capacity
Wellington Chevreuil created HBASE-28176: Summary: PrefetchExecutor should stop once cache reaches capacity Key: HBASE-28176 URL: https://issues.apache.org/jira/browse/HBASE-28176 Project: HBase Issue Type: Improvement Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil The prefetch executor runs a full scan on regions in the background once regions are opened, if the "hbase.rs.prefetchblocksonopen" property is set to true. However, if the store file size is much larger than the cache capacity, we should interrupt the prefetch once it has reached the cache capacity, otherwise it would just be triggering evictions of little value, since we don't have any sense of block priority at that point. It's better to stop the read, and let client reads cause the eviction of LFU blocks and cache the most accessed blocks. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28170) Put the cached time at the beginning of the block; run cache validation in the background when retrieving the persistent cache
Wellington Chevreuil created HBASE-28170: Summary: Put the cached time at the beginning of the block; run cache validation in the background when retrieving the persistent cache Key: HBASE-28170 URL: https://issues.apache.org/jira/browse/HBASE-28170 Project: HBase Issue Type: Sub-task Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil In HBASE-28004, we added a "cached time" long at the end of each block on the bucket cache. We also record the cached time in the backing map we persist to disk periodically, in order to retrieve the cache upon crashes/restarts. The persisted backing map includes the last modification time of the cache itself. On restarts, once we read the backing map from the persisted file, we compare the last modification time recorded there against the cache's actual last modification time. If those differ, it means the cache has been updated after the backing map has been persisted, so the backing map might not be accurate. We then iterate through the backing map entries and compare the entries' cached time against the related block in the cache, and if those differ, we remove the entry from the map. Currently this validation is made at RS initialisation time, but with caches as large as 1.6TB/30M+ blocks, it can take up to an hour, meaning the RS is useless over that time. This PR changes this validation to be performed in the background, whilst direct accesses to a block in the cache would also perform the "cached time" comparison. This PR also moves the "cached time" to the beginning of the block in the cache, instead of the end. We noticed that with the "cached time" at the end we can fail to ensure consistency under some conditions. Consider the following: 1) A block B1 of size S gets allocated at offset 0 with cached time T1; 2) The backing map is persisted, containing B1 at offset 0 and cached time T1; 3) B1 is evicted. Its offset in the cache is now free, however its contents are still there, including the cached time T1 at its end; 4) A new block B2 of size S/2 gets allocated at offset 0 with cached time T2; 5) RS crashes before the backing map gets saved, so the persisted backing map still has only the reference to B1, but not B2; 6) At restart, we run the validation. Because B2 was half the size of B1, we haven't overridden B1's cached time in the cache, so we will successfully validate B1, although its content is now half overridden by B2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27999) Implement cache aware load balancer
[ https://issues.apache.org/jira/browse/HBASE-27999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27999. -- Resolution: Fixed Merged into the feature branch. > Implement cache aware load balancer > --- > > Key: HBASE-27999 > URL: https://issues.apache.org/jira/browse/HBASE-27999 > Project: HBase > Issue Type: Sub-task > Components: Balancer >Reporter: Rahul Agarkar >Assignee: Rahul Agarkar >Priority: Major > > HBase uses an ephemeral cache to cache blocks, reading them from the slow > storages and storing them in the bucket cache. This cache is warmed up > every time a region server is started. Depending on the data size and the > configured cache size, the cache warm up can take anywhere from a few > minutes to a few hours. Doing this every time the region server starts can be a > very expensive process. To eliminate this, HBASE-27313 implemented the cache > persistence feature where the region servers periodically persist the blocks > cached in the bucket cache. This persisted information is then used to > resurrect the cache in the event of a region server restart, whether a normal > restart or a crash. > This feature aims to enhance this capability by having the > balancer implementation consider the cache allocation of each region on > region servers when calculating a new assignment plan. It uses the > region/region server cache allocation info reported by region servers > to calculate the percentage of HFiles cached for each > region on the hosting server, and then uses that as another factor when > deciding on an optimal, new assignment plan. > > A design document describing the balancer can be found at > https://docs.google.com/document/d/1A8-eVeRhZjwL0hzFw9wmXl8cGP4BFomSlohX2QcaFg4/edit?usp=sharing -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (HBASE-27389) Add cost function in balancer to consider the cost of building bucket cache before moving regions
[ https://issues.apache.org/jira/browse/HBASE-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil reopened HBASE-27389: -- > Add cost function in balancer to consider the cost of building bucket cache > before moving regions > - > > Key: HBASE-27389 > URL: https://issues.apache.org/jira/browse/HBASE-27389 > Project: HBase > Issue Type: Task > Components: Balancer >Reporter: Rahul Agarkar >Assignee: Rahul Agarkar >Priority: Major > > HBase currently uses StochasticLoadBalancer to determine the cost of moving > regions from one RS to another. Each cost function gives a result between > 0 and 1, with 0 being the lowest cost and 1 the highest. The balancer > iterates through each cost function and comes up with the total cost. The > balancer then creates multiple balancing plans based on random actions and > computes the cost of each plan as if it were executed; if the cost of a > plan is less than the initial cost, the plan is executed. > Implement a new "CacheAwareCostFunction" which takes into account whether the > region is fully cached and returns the highest cost if the plan suggests > moving this region. -- This message was sent by Atlassian Jira (v8.20.10#820010)
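To make the cost-function idea concrete, here is an illustrative-only sketch; the names and the map of cache ratios are assumptions for the example, not the actual StochasticLoadBalancer API:
{code:java}
import java.util.Map;

// A fully cached region yields the maximum cost, so plans that move it are
// unlikely to be picked by the balancer.
public class CacheAwareCostSketch {
  // cacheRatios: region name -> fraction of its hfiles cached on the current host
  static double cost(Map<String, Double> cacheRatios, String regionToMove) {
    double cached = cacheRatios.getOrDefault(regionToMove, 0.0);
    // Normalized to [0, 1]: 0 for an uncached region (cheap to move),
    // 1 for a fully cached region (expensive to rebuild elsewhere).
    return Math.max(0.0, Math.min(1.0, cached));
  }
}
{code}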
[jira] [Resolved] (HBASE-27389) Add cost function in balancer to consider the cost of building bucket cache before moving regions
[ https://issues.apache.org/jira/browse/HBASE-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27389. -- Resolution: Fixed Merged into the feature branch. > Add cost function in balancer to consider the cost of building bucket cache > before moving regions > - > > Key: HBASE-27389 > URL: https://issues.apache.org/jira/browse/HBASE-27389 > Project: HBase > Issue Type: Task > Components: Balancer >Reporter: Rahul Agarkar >Assignee: Rahul Agarkar >Priority: Major > > HBase currently uses StochasticLoadBalancer to determine the cost of moving > regions from one RS to another. Each cost function gives a result between > 0 and 1, with 0 being the lowest cost and 1 the highest. The balancer > iterates through each cost function and comes up with the total cost. The > balancer then creates multiple balancing plans based on random actions and > computes the cost of each plan as if it were executed; if the cost of a > plan is less than the initial cost, the plan is executed. > Implement a new "CacheAwareCostFunction" which takes into account whether the > region is fully cached and returns the highest cost if the plan suggests > moving this region. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28097) Add documentation section for the Cache Aware balancer function
Wellington Chevreuil created HBASE-28097: Summary: Add documentation section for the Cache Aware balancer function Key: HBASE-28097 URL: https://issues.apache.org/jira/browse/HBASE-28097 Project: HBase Issue Type: Sub-task Reporter: Wellington Chevreuil -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27998) Enhance region metrics to include prefetch ratio for each region
[ https://issues.apache.org/jira/browse/HBASE-27998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27998. -- Resolution: Done Merged into the feature branch. > Enhance region metrics to include prefetch ratio for each region > > > Key: HBASE-27998 > URL: https://issues.apache.org/jira/browse/HBASE-27998 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Rahul Agarkar >Assignee: Rahul Agarkar >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28044) Reduce frequency of saving backing map in persistence cache
Wellington Chevreuil created HBASE-28044: Summary: Reduce frequency of saving backing map in persistence cache Key: HBASE-28044 URL: https://issues.apache.org/jira/browse/HBASE-28044 Project: HBase Issue Type: Sub-task Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil Currently we always write the whole cache mapping into the persistent map file. This is not a lightweight operation: on a full 1.6TB cache with tens of millions of blocks, this can grow to as much as 10GB. In the current persistent cache implementation, we flush it to disk every 1s. If we raise the "checkpoint" period, we risk losing more cache events in the event of a recovery. This proposes reducing the frequency of saving the backing map as follows: 1) Save every block addition/eviction into a single file on disk; 2) Checkpoint at higher intervals, consolidating all transactions into the larger map file; 3) In the event of failure, recovery would consist of loading the latest map file, then applying all the transaction files sequentially. -- This message was sent by Atlassian Jira (v8.20.10#820010)
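An illustrative sketch of the journal-plus-checkpoint scheme in steps 1-3 above; the one-op-per-line text format and the map representation are assumptions for the example:
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

public class CacheJournalSketch {
  // Step 1: log each addition ("A <key>") or eviction ("E <key>") as it happens.
  static void logOp(Path journal, char op, String blockKey) throws IOException {
    Files.writeString(journal, op + " " + blockKey + "\n", StandardCharsets.UTF_8,
        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
  }

  // Step 3: recovery loads the last checkpointed map, then replays the journal.
  static Map<String, Boolean> recover(Map<String, Boolean> checkpoint, Path journal)
      throws IOException {
    Map<String, Boolean> map = new HashMap<>(checkpoint);
    for (String line : Files.readAllLines(journal, StandardCharsets.UTF_8)) {
      String key = line.substring(2);
      if (line.charAt(0) == 'A') {
        map.put(key, Boolean.TRUE); // addition
      } else {
        map.remove(key);            // eviction
      }
    }
    return map;
  }
}
{code}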
[jira] [Resolved] (HBASE-28004) Persistent cache map can get corrupt if crash happens midway through the write
[ https://issues.apache.org/jira/browse/HBASE-28004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28004. -- Resolution: Fixed > Persistent cache map can get corrupt if crash happens midway through the write > -- > > Key: HBASE-28004 > URL: https://issues.apache.org/jira/browse/HBASE-28004 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.6.0, 3.0.0-alpha-4, 4.0.0-alpha-1 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1 > > > HBASE-27686 added a background thread for periodically saving the cache index > map, together with a list of completed cached files, so that we can recover > the cache state in case of crash or restart. The problem is that the cache index > can become a few GB large (a sample case with 1.6TB of used bucket cache would > map to between 8GB and 10GB of indexes), and these writes take a few seconds to > complete, making it very likely that any RS crash leaves a corrupt index file that > can't be recovered when the RS starts again. Worse, since we store the list > of cached files in a separate file, this also leads to cache inconsistencies: > files can remain in the list of cached files yet never be cached once the RS is > restarted, since we have no cache index for those, and every read ends up going to > the FS. > This task aims to refactor the cache persistence as follows: > 1) Write both the list of completely cached files and the cache indexes in a > single file, so that we can have this synced atomically; > 2) When writing the persistent cache file, use a temp name first, then once > the write has successfully finished, rename it to the actual name. This way, > if a crash happens whilst the persistent cache is still being written, the temp > file would be corrupt, but we could still recover from the last successful > sync, and we would only lose the caching ops since the last sync. -- This message was sent by Atlassian Jira (v8.20.10#820010)
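A minimal sketch of the write-then-rename pattern from item 2 above, shown on the local filesystem for brevity (HBase itself writes through the Hadoop FileSystem API, which this example does not use):
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicPersistSketch {
  static void persist(Path target, byte[] serializedIndexAndFileList) throws IOException {
    Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    Files.write(tmp, serializedIndexAndFileList); // slow write; may be interrupted
    // The rename is atomic, so readers only ever see the last complete file;
    // a crash mid-write leaves only a corrupt .tmp behind.
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
  }
}
{code}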
[jira] [Resolved] (HBASE-28041) Rebase HBASE-27389 branch with master and fix conflicts
[ https://issues.apache.org/jira/browse/HBASE-28041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-28041. -- Resolution: Fixed Rebased and fixed conflicts. > Rebase HBASE-27389 branch with master and fix conflicts > --- > > Key: HBASE-28041 > URL: https://issues.apache.org/jira/browse/HBASE-28041 > Project: HBase > Issue Type: Sub-task >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28041) Rebase HBASE-27389 branch with master
Wellington Chevreuil created HBASE-28041: Summary: Rebase HBASE-27389 branch with master Key: HBASE-28041 URL: https://issues.apache.org/jira/browse/HBASE-28041 Project: HBase Issue Type: Sub-task Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27997) Enhance prefetch executor to record region prefetch information along with the list of hfiles prefetched
[ https://issues.apache.org/jira/browse/HBASE-27997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27997. -- Resolution: Done > Enhance prefetch executor to record region prefetch information along with > the list of hfiles prefetched > > > Key: HBASE-27997 > URL: https://issues.apache.org/jira/browse/HBASE-27997 > Project: HBase > Issue Type: Sub-task > Components: BucketCache >Affects Versions: 2.6.0, 3.0.0-alpha-4 >Reporter: Rahul Agarkar >Assignee: Rahul Agarkar >Priority: Major > > HBASE-27313 implemented the prefetch persistence feature where it persists > the list of hFiles prefetched in the bucket cache. This information is used > to reconstruct the cache in the event of a server restart/crash. > Currently, only the list of hFiles is persisted. > However, for the new PrefetchAwareLoadBalancer (work in progress) to work, we > need information about how much of a region is prefetched on a region server. > This Jira introduces an additional map in the prefetch executor to maintain > the information about how much of a region has been prefetched on that region > server. The prefetched size of a region is calculated as the total size > of all hFiles prefetched for that region. -- This message was sent by Atlassian Jira (v8.20.10#820010)
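A hypothetical sketch of the additional map described above; the class and method names are illustrative, not the actual prefetch executor API:
{code:java}
import java.util.HashMap;
import java.util.Map;

// Accumulates the prefetched bytes per region as hfiles finish prefetching.
public class RegionPrefetchSizeSketch {
  private final Map<String, Long> prefetchedBytesPerRegion = new HashMap<>();

  // Called when the prefetch of one hfile completes.
  void onFilePrefetched(String regionName, long hfileSize) {
    prefetchedBytesPerRegion.merge(regionName, hfileSize, Long::sum);
  }

  // Prefetched size of a region = total size of all hfiles prefetched for it.
  long prefetchedSize(String regionName) {
    return prefetchedBytesPerRegion.getOrDefault(regionName, 0L);
  }
}
{code}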
[jira] [Created] (HBASE-28004) Persistent cache map can get corrupt if crash happens midway through the write
Wellington Chevreuil created HBASE-28004: Summary: Persistent cache map can get corrupt if crash happens midway through the write Key: HBASE-28004 URL: https://issues.apache.org/jira/browse/HBASE-28004 Project: HBase Issue Type: Sub-task Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27871) Meta replication stuck forever if wal it's still reading gets rolled and deleted
[ https://issues.apache.org/jira/browse/HBASE-27871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27871. -- Resolution: Fixed > Meta replication stuck forever if wal it's still reading gets rolled and > deleted > > > Key: HBASE-27871 > URL: https://issues.apache.org/jira/browse/HBASE-27871 > Project: HBase > Issue Type: Bug > Components: meta replicas >Affects Versions: 2.6.0, 2.4.16, 2.4.17, 2.5.4 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 2.4.18, 2.5.6 > > > This affects branch-2 based releases only (in master, HBASE-26416 refactored > region replication to not rely on the replication framework anymore). > Per the original [meta region replicas > design|https://docs.google.com/document/d/1jJWVc-idHhhgL4KDRpjMsQJKCl_NRaCLGiH3Wqwd3O8/edit], > we use most of the replication framework for communicating changes in the > primary replica back to the secondary ones, but we skip storing the queue > state in ZK. In the event of a region replication crash, we should let the > related replication source thread be interrupted, so that > RegionReplicaReplicationEndpoint would set a new source from scratch and > make sure to update the secondary replicas. > > We have run into a situation in one of our customers' clusters where the > region replica source faced a long lag (probably because the RSes hosting the > secondary replicas were busy and slower in processing the region replication > entries), so that the current wal got rolled and eventually deleted whilst > the replication source reader was still referring to it. In such cases, > ReplicationSourceReader only sees the IOException and keeps retrying the read > indefinitely, but since the file is now gone, it will just get stuck there > forever. In the particular case of FNFE (which I believe would only happen > for region replication), we should just raise an exception and let > RegionReplicaReplicationEndpoint handle it to reset the region replication > source. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27915) Update hbase_docker with an extra Dockerfile compatible with mac m1 platform
Wellington Chevreuil created HBASE-27915: Summary: Update hbase_docker with an extra Dockerfile compatible with mac m1 platform Key: HBASE-27915 URL: https://issues.apache.org/jira/browse/HBASE-27915 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil When trying to use the current Dockerfile under "./dev-support/hbase_docker" on m1 macs, the docker build fails at the git clone & mvn build stage with below error: {noformat} #0 8.214 qemu-x86_64: Could not open '/lib64/ld-linux-x86-64.so.2': No such file or directory {noformat} It turns out for mac m1, we have to explicitly define the platform flag for the ubuntu image. I thought we could add a note in this readme, together with an "m1" subfolder containing a modified copy of this Dockerfile that works on mac m1s. -- This message was sent by Atlassian Jira (v8.20.10#820010)
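For reference, this is roughly what the platform flag looks like in a Dockerfile; the base image tag below is an assumption for the example, not necessarily what the m1 Dockerfile uses:
{noformat}
# Force the amd64 variant of the base image so the build works on Apple Silicon
FROM --platform=linux/amd64 ubuntu:22.04
{noformat}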
[jira] [Resolved] (HBASE-27820) HBase is not starting due to Jersey library conflicts with javax.ws.rs.api jar
[ https://issues.apache.org/jira/browse/HBASE-27820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27820. -- Resolution: Fixed Also merged the branch-2.4 PR. > HBase is not starting due to Jersey library conflicts with javax.ws.rs.api jar > -- > > Key: HBASE-27820 > URL: https://issues.apache.org/jira/browse/HBASE-27820 > Project: HBase > Issue Type: Task > Components: dependencies >Affects Versions: 3.0.0-alpha-3 >Reporter: Rahul Agarkar >Assignee: Rahul Agarkar >Priority: Major > Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.5, 2.4.18 > > > With some recent Atlas changes adding HTTP based hook support, HBase > is not starting because of conflicts between the jersey jars and the rs-api jar. > This Jira is to exclude the javax.ws.rs-api.jar from the HBase classpath. > HBase uses shaded jersey jars and hence does not need to use this jar > directly. However, it still adds this jar to the CLASSPATH while starting the > server. Atlas, on the other hand, is using a non-shaded version of the > javax.ws.rs-api jar, which causes this conflict and makes the HBase server > fail while initializing the Atlas coprocessor. > Since HBase is using the shaded jersey jar and not using this jar directly, it > should be removed from the bundle as it may cause similar conflicts with > other client applications potentially using it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27874) Problem in flakey generated report causes pre-commit run to fail
Wellington Chevreuil created HBASE-27874: Summary: Problem in flakey generated report causes pre-commit run to fail Key: HBASE-27874 URL: https://issues.apache.org/jira/browse/HBASE-27874 Project: HBase Issue Type: Bug Components: build Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil Have noticed the UT pre-commit run failed on this latest PR for branch-2 with the below: {noformat} Thu May 18 10:37:32 AM UTC 2023 cd /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-5241/yetus-jdk8-hadoop2-check/src/hbase-server /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-5241/yetus-m2/hbase-branch-2-patch-1 --threads=4 -Djava.io.tmpdir=/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-5241/yetus-jdk8-hadoop2-check/src/target -DHBasePatchProcess -PrunAllTests -Dtest.exclude.pattern=**/regionserver.TestMetricsRegionServer.java,**/master.procedure.TestSnapshotProcedureRSCrashes.java,**/security.access.TestAccessController.java,**/conf.TestConfigurationManagerWARNING: package jdk.internal.util.random not in java.base.java,**/io.hfile.bucket.TestPrefetchPersistence.java,**/client.TestFromClientSide3.java,**/replication.TestReplicationMetricsforUI.java,**/io.hfile.bucket.TestBucketCache.java,**/replication.regionserver.TestReplicationValueCompressedWAL.java,**/master.procedure.TestHBCKSCP.java,**/http.TestInfoServersACL.java,**/io.hfile.bucket.TestBucketCachePersister.java,**/replication.TestReplicationKillSlaveRS.java,**/regionserver.TestClearRegionBlockCache.java,**/master.TestUnknownServers.java,**/replication.TestReplicationKillSlaveRSWithSeparateOldWALs.java,**/quotas.TestClusterScopeQuotaThrottle.java,**/io.hfile.TestBlockEvictionOnRegionMovement.java,**/replication.regionserver.TestMetaRegionReplicaReplicationEndpoint.java,**/regionserver.TestRegionReplicas.java,**/coprocessor.TestCoprocessorEndpointTracing.java,**/master.region.TestMasterRegionCompaction.java,**/io.hfile.TestPrefetchRSClose.java -Dsurefire.firstPartForkCount=0.5C -Dsurefire.secondPartForkCount=0.5C clean test -fae [INFO] BUILD FAILURE [INFO] [INFO] Total time: 0.861 s (Wall Clock) [INFO] Finished at: 2023-05-18T10:37:34Z [INFO] [ERROR] Unknown lifecycle phase "jdk.internal.util.random". You must specify a valid lifecycle phase or a goal in the format : or :[:]:. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy. -> [Help 1] [ERROR] {noformat} Note the "{+}**/conf.TestConfigurationManagerWARNING: package jdk.internal.util.random not in java.base.java{+}" passed as one of the supposedly flakey tests. Looking around our build scripts, I figured we pull the list of flakey from the "{+}excludes{+}" artifact generated by the latest "find flakey" build. 
It seems the [latest branch-2 run|https://ci-hbase.apache.org/job/HBase-Find-Flaky-Tests/job/branch-2/1063/artifact/output/excludes] generated this artifact with the wrong name already: {noformat} **/replication.TestReplicationMetricsforUI.java,**/conf.TestConfigurationManagerWARNING: package jdk.internal.util.random not in java.base.java,**/master.region.TestMasterRegionCompaction.java,**/regionserver.TestRegionReplicas.java,**/replication.regionserver.TestReplicationValueCompressedWAL.java,**/coprocessor.TestCoprocessorEndpointTracing.java,**/quotas.TestClusterScopeQuotaThrottle.java,**/replication.TestReplicationKillSlaveRSWithSeparateOldWALs.java,**/client.TestFromClientSide3.java,**/io.hfile.TestBlockEvictionOnRegionMovement.java,**/io.hfile.bucket.TestPrefetchPersistence.java,**/regionserver.TestMetricsRegionServer.java,**/io.hfile.bucket.TestBucketCachePersister.java,**/regionserver.TestClearRegionBlockCache.java,**/master.procedure.TestHBCKSCP.java,**/replication.regionserver.TestMetaRegionReplicaReplicationEndpoint.java,**/security.access.TestAccessController.java,**/io.hfile.bucket.TestBucketCache.java,**/io.hfile.TestPrefetchRSClose.java,**/replication.TestReplicationKillSlaveRS.java,**/master.TestUnknownServers.java,**/http.TestInfoServersACL.java {noformat} Digging deeper, found that the "find flakey" build checks the UT output of latest nightly and flakey builds, to
[jira] [Created] (HBASE-27871) Meta replication stuck forever if wal it's still reading gets rolled and deleted
Wellington Chevreuil created HBASE-27871: Summary: Meta replication stuck forever if wal it's still reading gets rolled and deleted Key: HBASE-27871 URL: https://issues.apache.org/jira/browse/HBASE-27871 Project: HBase Issue Type: Bug Components: meta replicas Affects Versions: 2.5.4, 2.4.17 Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil This affects branch-2 based releases only (in master, HBASE-26416 refactored region replication to not rely on the replication framework anymore). Per the original [meta region replicas design|https://docs.google.com/document/d/1jJWVc-idHhhgL4KDRpjMsQJKCl_NRaCLGiH3Wqwd3O8/edit], we use most of the replication framework for communicating changes in the primary replica back to the secondary ones, but we skip storing the queue state in ZK. In the event of a region replication crash, we should let the related replication source thread be interrupted, so that RegionReplicaReplicationEndpoint would set a new source from scratch and make sure to update the secondary replicas. We have run into a situation in one of our customers' clusters where the region replica source faced a long lag (probably because the RSes hosting the secondary replicas were busy and slower in processing the region replication entries), so that the current wal got rolled and eventually deleted whilst the replication source reader was still referring to it. In such cases, ReplicationSourceReader only sees the IOException and keeps retrying the read indefinitely, but since the file is now gone, it will just get stuck there forever. In the particular case of FNFE (which I believe would only happen for region replication), we should just raise an exception and let RegionReplicaReplicationEndpoint handle it to reset the region replication source. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27789) Backport "HBASE-24914 Remove duplicate code appearing continuously in method ReplicationPeerManager.updatePeerConfig" to branch-2
[ https://issues.apache.org/jira/browse/HBASE-27789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27789. -- Resolution: Fixed Merged into branch-2.5. I have also made you a contributor, [~Li Zhexi], so you should be able to self-assign jiras now. > Backport "HBASE-24914 Remove duplicate code appearing continuously in method > ReplicationPeerManager.updatePeerConfig" to branch-2 > - > > Key: HBASE-27789 > URL: https://issues.apache.org/jira/browse/HBASE-27789 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 2.5.0, 2.4.14 >Reporter: Li Zhexi >Assignee: Li Zhexi >Priority: Minor > Fix For: 2.5.5 > > > Branch-2/ Branch-2.4/Branch-2.5 also have duplicate code in > ReplicationPeerManager#updatePeerConfig > newPeerConfigBuilder.putAllConfiguration(oldPeerConfig.getConfiguration()); > newPeerConfigBuilder.putAllConfiguration(peerConfig.getConfiguration()); > newPeerConfigBuilder.putAllConfiguration(oldPeerConfig.getConfiguration()); > newPeerConfigBuilder.putAllConfiguration(peerConfig.getConfiguration()); -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27795) Define RPC API for cache cleaning
Wellington Chevreuil created HBASE-27795: Summary: Define RPC API for cache cleaning Key: HBASE-27795 URL: https://issues.apache.org/jira/browse/HBASE-27795 Project: HBase Issue Type: Sub-task Reporter: Wellington Chevreuil We should add an RPC API to allow for a "limited manual" cache cleaning. If hbase.rs.evictblocksonclose is set to false, blocks may linger in the cache when regions move between RSes. The method, at the RS level, should compare the files from its online regions against the files in the prefetch list file, evicting blocks of any file in the prefetch list that does not belong to one of the online regions for the given RS. -- This message was sent by Atlassian Jira (v8.20.10#820010)
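An illustrative sketch of the comparison the proposed RPC would perform; the names are assumptions for the example, not the actual RS-side API:
{code:java}
import java.util.HashSet;
import java.util.Set;

public class CacheCleanerSketch {
  // Returns files whose blocks should be evicted: files present in the
  // prefetch list but not belonging to any region currently online on this RS.
  static Set<String> filesToEvict(Set<String> prefetchListFiles, Set<String> onlineRegionFiles) {
    Set<String> stale = new HashSet<>(prefetchListFiles);
    stale.removeAll(onlineRegionFiles);
    return stale;
  }
}
{code}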
[jira] [Created] (HBASE-27794) Tooling for parsing/reading the prefetch files list file
Wellington Chevreuil created HBASE-27794: Summary: Tooling for parsing/reading the prefetch files list file Key: HBASE-27794 URL: https://issues.apache.org/jira/browse/HBASE-27794 Project: HBase Issue Type: Sub-task Reporter: Wellington Chevreuil The content of the file defined by hbase.prefetch.file.list.path is encoded. It would be nice to have some extra tool for properly parsing it and printing the list in a human readable format, for ease of troubleshooting. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27750) Update the list of prefetched hfiles upon simple block eviction
[ https://issues.apache.org/jira/browse/HBASE-27750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27750. -- Resolution: Fixed Thanks for the contribution [~sk...@cloudera.com]. I have now merged it into the master and branch-2 branches. > Update the list of prefetched hfiles upon simple block eviction > --- > > Key: HBASE-27750 > URL: https://issues.apache.org/jira/browse/HBASE-27750 > Project: HBase > Issue Type: Sub-task > Components: BucketCache >Affects Versions: 2.6.0, 3.0.0-alpha-4 >Reporter: Shanmukha Haripriya Kota >Assignee: Shanmukha Haripriya Kota >Priority: Major > Fix For: 2.6.0, 3.0.0-alpha-4 > > > Currently, we maintain a list of Hfiles on disk for which prefetch is > complete and avoid prefetching those files after a restart. But we don't > handle cases where blocks are evicted from the cache. This ticket is for > updating the list of prefetched files upon simple block eviction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27686) Recovery of BucketCache and Prefetched data after RS Crash
[ https://issues.apache.org/jira/browse/HBASE-27686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27686. -- Resolution: Fixed Thanks for the contribution [~sk...@cloudera.com]! I have now merged this into master and branch-2. > Recovery of BucketCache and Prefetched data after RS Crash > -- > > Key: HBASE-27686 > URL: https://issues.apache.org/jira/browse/HBASE-27686 > Project: HBase > Issue Type: Improvement > Components: BucketCache >Reporter: Shanmukha Haripriya Kota >Assignee: Shanmukha Haripriya Kota >Priority: Major > > HBASE-27313 introduced the ability to persist a list of hfiles for which > prefetch has already been completed, so that we can avoid prefetching those > files again in the event of a graceful restart, but it doesn't cover crash > scenarios: if the RS is killed or abnormally stopped, the list wouldn't be > saved. > This change aims to persist the list of already prefetched files from a background > thread that periodically checks the cache state and persists the list if updates > have happened. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27619) Bulkload fails when trying to bulkload files with invalid names after HBASE-26707
[ https://issues.apache.org/jira/browse/HBASE-27619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27619. -- Resolution: Fixed Merged into master, branch-2 and branch-2.5 branches. Thanks for reviewing it, [~swu] > Bulkload fails when trying to bulkload files with invalid names after > HBASE-26707 > - > > Key: HBASE-27619 > URL: https://issues.apache.org/jira/browse/HBASE-27619 > Project: HBase > Issue Type: Bug >Affects Versions: 2.6.0, 3.0.0-alpha-3, 2.5.3 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4 > > > HBASE-26707 has introduced changes to reduce renames on bulkload when using > FILE based SFT. However, if the bulkloading file has an invalid hfile name, > or has been split in the bulkload process, we don't do any renaming when FILE > based SFT is enabled, and we keep the file name as-is in the store dir. > This later fails the validations performed by StoreFileReader when it tries > to open the file. > This jira adds extra validation for the bulkloading file name format in > HRegion.bulkLoadHFiles and also extends TestLoadIncrementalHFiles to run the > same test suite with FILE based SFT enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27619) Bulkload fails when trying to bulkload files with invalid names after HBASE-26707
Wellington Chevreuil created HBASE-27619: Summary: Bulkload fails when trying to bulkload files with invalid names after HBASE-26707 Key: HBASE-27619 URL: https://issues.apache.org/jira/browse/HBASE-27619 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil HBASE-26707 has introduced changes to reduce renames on bulkload when using FILE based SFT. However, if the bulkloading file has an invalid hfile name, or has been split in the bulkload process, we don't do any renaming when FILE based SFT is enabled, and we keep the file name as-is in the store dir. This later fails the validations performed by StoreFileReader when it tries to open the file. This jira adds extra validation for the bulkloading file name format in HRegion.bulkLoadHFiles and also extends TestLoadIncrementalHFiles to run the same test suite with FILE based SFT enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27551) Add config options to delay assignment to retain last region location
[ https://issues.apache.org/jira/browse/HBASE-27551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27551. -- Resolution: Fixed Merged into master and branch-2. Thanks for reviewing it [~zhangduo][~swu]! > Add config options to delay assignment to retain last region location > - > > Key: HBASE-27551 > URL: https://issues.apache.org/jira/browse/HBASE-27551 > Project: HBase > Issue Type: Improvement >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 3.0.0-alpha-4 > > > HBASE-27313 introduced the ability to persist the list of files cached in a > given RS, but temporary RS losses or restarts would cause regions to be > eagerly reassigned to other RSes, making the persisted cache useless. For > some use cases, such as when using ObjectStore based persistence, the > performance degradation caused by cache misses has a worse impact than > temporary region unavailability. > This proposes an additional config property (disabled by default) to > make the TRSP wait for a configurable time, checking for the > previous RS holding the region to get back online, before proceeding with the > region assignment. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27551) Add config options to delay assignment to retain last region location
Wellington Chevreuil created HBASE-27551: Summary: Add config options to delay assignment to retain last region location Key: HBASE-27551 URL: https://issues.apache.org/jira/browse/HBASE-27551 Project: HBase Issue Type: Improvement Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil HBASE-27313 introduced the ability to persist the list of files cached in a given RS, but temporary RS losses or restarts would cause regions to be eagerly reassigned to other RSes, making the persisted cache useless. For some use cases, such as when using ObjectStore based persistence, the performance degradation caused by cache misses has a worse impact than temporary region unavailability. This proposes an additional config property (disabled by default) to make the TRSP wait for a configurable time, checking for the previous RS holding the region to get back online, before proceeding with the region assignment. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27474) Evict blocks on split/merge; Avoid caching reference/hlinks if compaction is enabled
[ https://issues.apache.org/jira/browse/HBASE-27474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27474. -- Release Note: This modifies behaviour of block cache management as follows: 1) Always evict blocks for the files from the parent split region once it's closed, regardless of the "hbase.rs.evictblocksonclose" configured value; 2) If compactions are enabled, don't cache blocks for the refs/link files under split daughters once these regions are opened; For #1 above, an additional evict_cache property has been added to the CloseRegionRequest protobuf message. It defaults to false. Clusters undergoing a rolling upgrade would retain the previous behaviour on RSes not yet upgraded. Resolution: Fixed Thanks for the review [~psomogyi]! Merged into master and branch-2. > Evict blocks on split/merge; Avoid caching reference/hlinks if compaction is > enabled > > > Key: HBASE-27474 > URL: https://issues.apache.org/jira/browse/HBASE-27474 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0-alpha-3 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 3.0.0-alpha-4 > > > This change aims to improve block cache usage upon splits/merges. On a > split/merge event the following main steps happen: > 1) parent regions are closed; 2) daughters are created and opened with > refs/hlinks; 3) compaction is triggered soon after the daughters get online. > With "hbase.rs.evictblocksonclose" set to false, we keep all blocks for the > closed regions in step 1, then try to load the same blocks again in step 2 (since we > are using the refs/links for the cache key), just to throw them away and cache > the compaction's resulting files in step 3. > If the block cache is close to its capacity, blocks from the compacted files > in step 3 will likely miss the cache. > The proposal here is to always evict blocks for parent regions on a > split/merge event, and also avoid caching blocks for refs/hlinks if > compactions are enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27511) Lock contention when doing multiple parallel preads due to StoreFileReader reuse
Wellington Chevreuil created HBASE-27511: Summary: Lock contention when doing multiple parallel preads due to StoreFileReader reuse Key: HBASE-27511 URL: https://issues.apache.org/jira/browse/HBASE-27511 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil Attachments: rs-stack-lock-contention In HStoreFile, we reuse the StoreFileReader created during HStoreFile initialization when creating a StoreFileScanner for preads: [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStoreFile.java#L545] When using S3 as HBase storage, we noticed this caused lock contention when multiple clients were doing preads in parallel: {noformat} ... "RpcServer.default.FPBQ.Fifo.handler=38,queue=8,port=16020" #125 daemon prio=5 os_prio=0 tid=0x7fc11d83c000 nid=0x73f2 waiting for monitor entry [0x7fc1154e6000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:73) - waiting to lock <0x7fc4c2769660> (a org.apache.hadoop.fs.s3a.S3AInputStream) ... "RpcServer.default.FPBQ.Fifo.handler=37,queue=7,port=16020" #124 daemon prio=5 os_prio=0 tid=0x7fc11d83a000 nid=0x73f1 waiting for monitor entry [0x7fc1155e7000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:73) - waiting to lock <0x7fc4c2769660> (a org.apache.hadoop.fs.s3a.S3AInputStream) ... "RpcServer.default.FPBQ.Fifo.handler=36,queue=6,port=16020" #123 daemon prio=5 os_prio=0 tid=0x7fc11d838000 nid=0x73f0 runnable [0x7fc1156e8000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) ... at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:216) - locked <0x7fc4c2769660> (a org.apache.hadoop.fs.s3a.S3AInputStream) ... {noformat} We should create a new instance of StoreFileReader for each StoreFileScanner when doing preads, instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27484) FNFE on StoreFileScanner after a flush followed by a compaction
[ https://issues.apache.org/jira/browse/HBASE-27484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27484. -- Resolution: Fixed Merged to master, branch-2, branch-2.5 and branch-2.4. Thanks for reviewing it, [~psomogyi] ! > FNFE on StoreFileScanner after a flush followed by a compaction > --- > > Key: HBASE-27484 > URL: https://issues.apache.org/jira/browse/HBASE-27484 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > > One of our customers was running SyncTable from a 1.2 based cluster, where > SyncTable map tasks were opening scanners on a 2.4 based cluster for comparing > the two clusters. A few of the map tasks failed with a DoNotRetryException > caused by a FileNotFoundException blowing all the way up to the client: > {noformat} > Error: org.apache.hadoop.hbase.DoNotRetryIOException: > org.apache.hadoop.hbase.DoNotRetryIOException: java.io.FileNotFoundException: > open > s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0 > at 7225 on > s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0: > com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does > not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; > Request ID: KBRNC67WZGCS4SCF; S3 Extended Request ID: > wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=; > Proxy: null), S3 Extended Request ID: > wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=:NoSuchKey > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3712) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45819) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339) > Caused by: java.io.FileNotFoundException: open > s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0 > at 7225 on > s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0: > com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does > not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; > Request ID: KBRNC67WZGCS4SCF; S3 Extended Request ID: > wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=; > Proxy: null), S3 Extended Request ID: > wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=:NoSuchKey > ... 
> at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:632) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:417) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.reopenAfterFlush(StoreScanner.java:1018) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:552) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:155) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:7399) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:7567) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:7331) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3373) > {noformat} > We can see on the RS logs that the above file got recently created as an > outcome of a memstore flush, and compaction is triggered shortly after: > {noformat} > 2022-11-11 22:16:50,322 INFO > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed memstore > data size=208.15 KB at sequenceid=4949703 (bloomFilter=false), > to=s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0 > 2022-11-11 22:16:50,513 INFO org.apache.hadoop.hbase.regionserver.HStore: > Added >
[jira] [Created] (HBASE-27484) FNFE on StoreFileScanner after a flush followed by a compaction
Wellington Chevreuil created HBASE-27484: Summary: FNFE on StoreFileScanner after a flush followed by a compaction Key: HBASE-27484 URL: https://issues.apache.org/jira/browse/HBASE-27484 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil One of our customers was running SyncTable from a 1.2 based cluster, where SyncTable map tasks were opening scanners on a 2.4 based cluster for comparing the two clusters. A few of the map tasks failed with a DoNotRetryException caused by a FileNotFoundException blowing all the way up to the client: {noformat} Error: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: java.io.FileNotFoundException: open s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0 at 7225 on s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: KBRNC67WZGCS4SCF; S3 Extended Request ID: wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=; Proxy: null), S3 Extended Request ID: wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=:NoSuchKey at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3712) at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45819) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339) Caused by: java.io.FileNotFoundException: open s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0 at 7225 on s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: KBRNC67WZGCS4SCF; S3 Extended Request ID: wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=; Proxy: null), S3 Extended Request ID: wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=:NoSuchKey ... 
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:632) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216) at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:417) at org.apache.hadoop.hbase.regionserver.StoreScanner.reopenAfterFlush(StoreScanner.java:1018) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:552) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:155) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:7399) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:7567) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:7331) at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3373) {noformat} We can see on the RS logs that the above file got recently created as an outcome of a memstore flush, and compaction is triggered shortly after: {noformat} 2022-11-11 22:16:50,322 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed memstore data size=208.15 KB at sequenceid=4949703 (bloomFilter=false), to=s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0 2022-11-11 22:16:50,513 INFO org.apache.hadoop.hbase.regionserver.HStore: Added s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0, entries=951, sequenceid=4949703, filesize=26.2 K ... 2022-11-11 22:16:50,791 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction of 4c53da8c2ab9b7d7a0d6046ef3bb701c/0 in xx,IT001E90506702\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1618275339031.4c53da8c2ab9b7d7a0d6046ef3bb701c. 2022-11-11 22:16:50,791 INFO
[jira] [Created] (HBASE-27474) Evict blocks on split/merge; Avoid caching reference/hlinks if compaction is enabled
Wellington Chevreuil created HBASE-27474: Summary: Evict blocks on split/merge; Avoid caching reference/hlinks if compaction is enabled Key: HBASE-27474 URL: https://issues.apache.org/jira/browse/HBASE-27474 Project: HBase Issue Type: Improvement Affects Versions: 3.0.0-alpha-3 Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil This change aims to improve block cache usage upon splits/merges. On a split/merge event the following main steps happen: 1) parent regions are closed; 2) daughters are created and opened with refs/hlinks; 3) compaction is triggered soon after the daughters get online. With "hbase.rs.evictblocksonclose" set to false, we keep all blocks for the closed regions in step 1, then try to load the same blocks again in step 2 (since we are using the refs/links for the cache key), just to throw them away and cache the compaction's resulting files in step 3. If the block cache is close to its capacity, blocks from the compacted files in step 3 will likely miss the cache. The proposal here is to always evict blocks for parent regions on a split/merge event, and also avoid caching blocks for refs/hlinks if compactions are enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27407) Fixing check for "description" request param in JMXJsonServlet.java
[ https://issues.apache.org/jira/browse/HBASE-27407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27407. -- Resolution: Fixed Thanks for your contribution [~lkovacs], I have now merged it into master, branch-2, branch-2.5 and branch-2.4. > Fixing check for "description" request param in JMXJsonServlet.java > --- > > Key: HBASE-27407 > URL: https://issues.apache.org/jira/browse/HBASE-27407 > Project: HBase > Issue Type: Bug > Components: metrics >Affects Versions: 2.6.0, 3.0.0-alpha-3 >Reporter: Luca Kovacs >Assignee: Luca Kovacs >Priority: Minor > Fix For: 2.6.0, 2.5.1, 3.0.0-alpha-4, 2.4.15 > > > When trying to access the JMX metrics' description via the "description=true" > URL parameter, any value is accepted. > The current version checks only whether a "description" parameter is present in the URL, but > doesn't check the parameter value. > I would like to fix this by checking whether the parameter value is 'true' and > showing the description only when this condition is met. -- This message was sent by Atlassian Jira (v8.20.10#820010)
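A minimal sketch of the corrected check (servlet wiring elided); whether the actual patch compares case-insensitively is an assumption here:
{code:java}
import javax.servlet.http.HttpServletRequest;

public class DescriptionParamCheckSketch {
  static boolean showDescription(HttpServletRequest request) {
    // Before: only the presence of the "description" parameter was checked.
    // After: its value must actually be "true" for the description to show.
    return "true".equalsIgnoreCase(request.getParameter("description"));
  }
}
{code}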
[jira] [Resolved] (HBASE-27386) Use encoded size for calculating compression ratio in block size predicator
[ https://issues.apache.org/jira/browse/HBASE-27386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27386. -- Resolution: Fixed Merged into master and branch-2. Thanks for reviewing, [~ankit.singhal]! > Use encoded size for calculating compression ratio in block size predicator > --- > > Key: HBASE-27386 > URL: https://issues.apache.org/jira/browse/HBASE-27386 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-3 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 3.0.0-alpha-4 > > > In HBASE-27264 we had introduced the notion of block size predicators to > define hfile block boundaries when writing a new hfile, and provided the > PreviousBlockCompressionRatePredicator implementation for calculating block > sizes based on a compression ratio. It was using the raw data size written to > the block so far to calculate the compression ratio, but in the case where > encoding is enabled, this could lead to a very high compression ratio and > therefore larger block sizes. We should use the encoded size to calculate the > compression ratio, instead. > Here's an example scenario: > 1) Sample block size when not using the > PreviousBlockCompressionRatePredicator as implemented by HBASE-27264: > {noformat} > onDiskSizeWithoutHeader=6613, uncompressedSizeWithoutHeader=32928 {noformat} > 2) Sample block size when using PreviousBlockCompressionRatePredicator as > implemented by HBASE-27264 (uses raw data size to calculate compression rate): > {noformat} > onDiskSizeWithoutHeader=126920, uncompressedSizeWithoutHeader=655393 > {noformat} > 3) Sample block size when using PreviousBlockCompressionRatePredicator with > encoded size for calculating compression rate: > {noformat} > onDiskSizeWithoutHeader=54299, uncompressedSizeWithoutHeader=328051 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27386) Use encoded size for calculating compression ratio in block size predicator
Wellington Chevreuil created HBASE-27386: Summary: Use encoded size for calculating compression ratio in block size predicator Key: HBASE-27386 URL: https://issues.apache.org/jira/browse/HBASE-27386 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil In HBASE-27264 we had introduced the notion of block size predicators to define hfile block boundaries when writing a new hfile, and provided the PreviousBlockCompressionRatePredicator implementation for calculating block sizes based on a compression ratio. It was using the raw data size written to the block so far to calculate the compression ratio, but in the case where encoding is enabled, this could lead to a very high compression ratio and therefore larger block sizes. We should use the encoded size to calculate the compression ratio, instead. Here's an example scenario: 1) Sample block size when not using the PreviousBlockCompressionRatePredicator as implemented by HBASE-27264: {noformat} onDiskSizeWithoutHeader=6613, uncompressedSizeWithoutHeader=32928 {noformat} 2) Sample block size when using PreviousBlockCompressionRatePredicator as implemented by HBASE-27264 (uses raw data size to calculate compression rate): {noformat} onDiskSizeWithoutHeader=126920, uncompressedSizeWithoutHeader=655393 {noformat} 3) Sample block size when using PreviousBlockCompressionRatePredicator with encoded size for calculating compression rate: {noformat} onDiskSizeWithoutHeader=54299, uncompressedSizeWithoutHeader=328051 {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
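An illustrative-only sketch of how a compression-ratio based predicator might derive the next block boundary; the exact formula used by PreviousBlockCompressionRatePredicator is not reproduced here, the point is merely which numerator feeds the ratio (with encoding enabled, rawSize/onDiskSize is inflated, while encodedSize/onDiskSize reflects the real compression savings):
{code:java}
public class CompressionRatioSketch {
  // Ratio observed from the previous block; the fix feeds the encoded size
  // here instead of the raw size.
  static double compressionRatio(long sizeBeforeCompression, long onDiskSize) {
    return (double) sizeBeforeCompression / onDiskSize;
  }

  // Let the in-memory block grow by the observed ratio so the on-disk size
  // lands near the configured block size.
  static long adjustedBlockSize(long configuredBlockSize, double ratio) {
    return (long) (configuredBlockSize * ratio);
  }
}
{code}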
[jira] [Resolved] (HBASE-27370) Avoid decompressing blocks when reading from bucket cache prefetch threads
[ https://issues.apache.org/jira/browse/HBASE-27370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27370. -- Resolution: Fixed Thanks for reviewing this [~psomogyi] [~taklwu], I have merged it into master, then cherry-picked into branch-2, branch-2.5 and branch-2.4. > Avoid decompressing blocks when reading from bucket cache prefetch threads > --- > > Key: HBASE-27370 > URL: https://issues.apache.org/jira/browse/HBASE-27370 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0-alpha-4 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 2.5.1, 3.0.0-alpha-4, 2.4.15 > > > When prefetching blocks into bucket cache, we observed consistent CPU > usage of around 70% with no other workloads ongoing. For large bucket caches > (i.e. when using file based bucket cache), the prefetch can last for some time, > and having such high CPU usage may impact the database usage by client > applications. > Further analysis of the prefetch threads' stack traces showed that, very often, > decompress logic is being executed by these threads: > {noformat} > "hfile-prefetch-1654895061122" #234 daemon prio=5 os_prio=0 > tid=0x557bb2907000 nid=0x406d runnable [0x7f294a504000] > java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native > Method) > at > org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:235) > at > org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88) > at > org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x0002d24c0ae8> (a java.io.BufferedInputStream) > at > org.apache.hadoop.hbase.io.util.BlockIOUtils.readFullyWithHeapBuffer(BlockIOUtils.java:105) > at > org.apache.hadoop.hbase.io.compress.Compression.decompress(Compression.java:465) > at > org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultDecodingContext.prepareDecoding(HFileBlockDefaultDecodingContext.java:90) > at > org.apache.hadoop.hbase.io.hfile.HFileBlock.unpack(HFileBlock.java:650) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1342) > {noformat} > This is because *HFileReaderImpl.readBlock* is always decompressing blocks > even when *hbase.block.data.cachecompressed* is set to true. > This patch proposes an alternative flag to differentiate prefetch from normal > reads, so that it doesn't decompress DATA blocks when prefetching with > *hbase.block.data.cachecompressed* set to true. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27370) Avoid decompressing blocks when reading from bucket cache prefetch threads
Wellington Chevreuil created HBASE-27370: Summary: Avoid decompressing blocks when reading from bucket cache prefetch threads Key: HBASE-27370 URL: https://issues.apache.org/jira/browse/HBASE-27370 Project: HBase Issue Type: Improvement Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil When prefetching blocks into bucket cache, we observed consistent CPU usage of around 70% with no other workloads ongoing. For large bucket caches (i.e. when using file based bucket cache), the prefetch can last for some time, and having such high CPU usage may impact the database usage by client applications. Further analysis of the prefetch threads' stack traces showed that, very often, decompress logic is being executed by these threads: {noformat} "hfile-prefetch-1654895061122" #234 daemon prio=5 os_prio=0 tid=0x557bb2907000 nid=0x406d runnable [0x7f294a504000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method) at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:235) at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88) at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105) at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) - locked <0x0002d24c0ae8> (a java.io.BufferedInputStream) at org.apache.hadoop.hbase.io.util.BlockIOUtils.readFullyWithHeapBuffer(BlockIOUtils.java:105) at org.apache.hadoop.hbase.io.compress.Compression.decompress(Compression.java:465) at org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultDecodingContext.prepareDecoding(HFileBlockDefaultDecodingContext.java:90) at org.apache.hadoop.hbase.io.hfile.HFileBlock.unpack(HFileBlock.java:650) at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1342) {noformat} This is because *HFileReaderImpl.readBlock* is always decompressing blocks even when *hbase.block.data.cachecompressed* is set to true. This patch proposes an alternative flag to differentiate prefetch from normal reads, so that it doesn't decompress DATA blocks when prefetching with *hbase.block.data.cachecompressed* set to true. -- This message was sent by Atlassian Jira (v8.20.10#820010)
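A sketch of the idea only; the real signatures in HFileReaderImpl differ, and the flag name here is an assumption:
{code:java}
public class PrefetchReadSketch {
  // When the cache stores compressed DATA blocks and the read comes from a
  // prefetch thread, the packed block can be cached as-is, skipping the
  // decompression step.
  static byte[] readBlock(byte[] packedBlock, boolean cacheCompressed, boolean isPrefetch) {
    if (cacheCompressed && isPrefetch) {
      return packedBlock; // cache as-is; a later client read will unpack it
    }
    return unpack(packedBlock);
  }

  // Placeholder for the actual decompression/decoding step.
  static byte[] unpack(byte[] packed) {
    return packed;
  }
}
{code}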
[jira] [Resolved] (HBASE-27265) Tool to read the contents of the storefile tracker file
[ https://issues.apache.org/jira/browse/HBASE-27265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27265. -- Resolution: Fixed Thanks for the contribution, [~abhradeep.kundu]! I have now merged this on master and branch-2. > Tool to read the contents of the storefile tracker file > --- > > Key: HBASE-27265 > URL: https://issues.apache.org/jira/browse/HBASE-27265 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.6.0, 3.0.0-alpha-4 >Reporter: Abhradeep Kundu >Assignee: Abhradeep Kundu >Priority: Minor > Fix For: 2.6.0, 3.0.0-alpha-4 > > > It will be useful to have a tool that provides the ability to read the > contents of the tracker file (.filelist/f1 or f2). > Using hdfs -cat or -text displays the contents of the tracker file in > binary form, and there is no option to show the contents in plain text. > {code:java} > x[cloudbreak@cod--z4t08rqbuyms-master0 ~]$ sudo hdfs dfs -text > s3a://odx-qe-bucket/odx-d7v40h/audit/cod--z4t08rqbuyms/hbase/data/default/one/6126beb5b349a1eee4b92987b78f1058/cf/.filelist/f1 > 22/04/05 19:51:14 WARN impl.MetricsConfig: Cannot locate configuration: tried > hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties > 22/04/05 19:51:14 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot > period at 10 second(s). > 22/04/05 19:51:14 INFO impl.MetricsSystemImpl: s3a-file-system metrics system > started > 22/04/05 19:51:14 INFO s3a.IDBDelegationTokenBinding: There is no Knox Token > available, fetching one from IDBroker... > 22/04/05 19:51:14 INFO idbroker.AbstractIDBClient: Authenticating with > IDBroker requires Kerberos > 22/04/05 19:51:14 INFO idbroker.AbstractIDBClient: Kerberos credentials are > available, using Kerberos to establish a session. > UGI=hbase/cod--z4t08rqbuyms-master0.odx-d7v4.svbr-nqvp.int.cldr.w...@odx-d7v4.svbr-nqvp.int.cldr.work > (auth:KERBEROS) > Apr 05, 2022 7:51:14 PM org.apache.knox.gateway.shell.KnoxSession createClient > INFO: Using default JAAS configuration > 22/04/05 19:51:15 INFO s3a.IDBDelegationTokenBinding: Bonded to Knox token > eyJqa3...OGdQmQ > 22/04/05 19:51:15 INFO Configuration.deprecation: No unit for > fs.s3a.connection.request.timeout(0) assuming SECONDS > 22/04/05 19:51:15 INFO idbroker.AbstractIDBClient: Creating Knox CAB session > using Knox DT eyJqa3...OGdQmQ ... > 22/04/05 19:51:16 INFO s3a.S3AInputStream: Switching to Random IO seek policy > ��/% > 98ef4cf597be48598c8376bc1cac200d�&22/04/05 19:51:16 INFO > impl.MetricsSystemImpl: Stopping s3a-file-system metrics system... > 22/04/05 19:51:16 INFO impl.MetricsSystemImpl: s3a-file-system metrics system > stopped. > 22/04/05 19:51:16 INFO impl.MetricsSystemImpl: s3a-file-system metrics system > shutdown complete.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27147) [HBCK2] extraRegionsInMeta does not work If RegionInfo is null
[ https://issues.apache.org/jira/browse/HBASE-27147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27147. -- Resolution: Fixed > [HBCK2] extraRegionsInMeta does not work If RegionInfo is null > -- > > Key: HBASE-27147 > URL: https://issues.apache.org/jira/browse/HBASE-27147 > Project: HBase > Issue Type: Bug > Components: hbck2 >Reporter: Karthik Palanisamy >Assignee: Wellington Chevreuil >Priority: Major > > extraRegionsInMeta will not clean/fix meta if the info:regioninfo column is > missing. > > Somehow, the customer has the following empty row in meta as a stale entry: > 'I1xx,16332508x.f53609cc1ae366b43205dxxx', 'info:state', > 16223 > > And no corresponding table "I1xx" exists. > > We used extraRegionsInMeta but it didn't clean it. Also, we created the same table > again and used extraRegionsInMeta after removing the HDFS data, but the stale row > was never cleaned. It looks like extraRegionsInMeta works only when "info:regioninfo" > is present. > > We need to handle the scenario for the other columns, i.e. info:state, info:server, > etc. -- This message was sent by Atlassian Jira (v8.20.7#820007)
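For reference, once the command handles rows with missing info:regioninfo, cleaning such a stale entry would be invoked roughly as follows; the jar path is illustrative and the table name is taken from the report above:

{noformat}
hbase hbck -j hbase-hbck2.jar extraRegionsInMeta --fix default:I1xx
{noformat}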
[jira] [Created] (HBASE-27264) Add options to consider compressed size when delimiting blocks during hfile writes
Wellington Chevreuil created HBASE-27264: Summary: Add options to consider compressed size when delimiting blocks during hfile writes Key: HBASE-27264 URL: https://issues.apache.org/jira/browse/HBASE-27264 Project: HBase Issue Type: New Feature Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil In HBASE-27232 we modified the "hbase.writer.unified.encoded.blocksize.ratio" property so that it allows the encoded size to be considered when delimiting hfile blocks during writes. Here we propose two additional properties, "hbase.block.size.limit.compressed" and "hbase.block.size.max.compressed", that would allow the compressed size (if compression is in use) to be considered when delimiting blocks during hfile writing. When compression is enabled, certain datasets can have very high compression efficiency, so the default 64KB block size and 10GB max file size can lead to hfiles with a very large number of blocks. In this proposal, "hbase.block.size.limit.compressed" is a boolean flag that switches to the compressed size for delimiting blocks, and "hbase.block.size.max.compressed" is an int with the limit, in bytes, for the compressed block size (defaulting to 320KB), in order to avoid blocks that are very large once uncompressed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
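A rough sketch of how the two proposed properties could combine in a boundary check follows; this is one plausible reading of the proposal, not the final patch, and the helper and its parameters are invented:

{code:java}
// Sketch only: one plausible reading of the proposed check.
public class CompressedBoundarySketch {

  // limitCompressed maps to hbase.block.size.limit.compressed,
  // maxCompressed to hbase.block.size.max.compressed.
  static boolean shouldFinishBlock(long uncompressedSize, long compressedSize,
      long configuredBlockSize, boolean limitCompressed, long maxCompressed) {
    if (limitCompressed) {
      // Delimit on the compressed size, but cap the raw payload so a highly
      // compressible block cannot grow without bound once uncompressed.
      return compressedSize >= configuredBlockSize
        || uncompressedSize >= maxCompressed;
    }
    return uncompressedSize >= configuredBlockSize;
  }

  public static void main(String[] args) {
    long blockSize = 64 * 1024;      // default 64KB block size
    long maxCompressed = 320 * 1024; // proposed default cap
    // 500KB of raw data compressing to 50KB: the compressed size alone would
    // keep the block open, but the 320KB cap forces it closed.
    System.out.println(
      shouldFinishBlock(500 * 1024, 50 * 1024, blockSize, true, maxCompressed));
  }
}
{code}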
[jira] [Resolved] (HBASE-27232) Fix checking for encoded block size when deciding if block should be closed
[ https://issues.apache.org/jira/browse/HBASE-27232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27232. -- Resolution: Fixed > Fix checking for encoded block size when deciding if block should be closed > --- > > Key: HBASE-27232 > URL: https://issues.apache.org/jira/browse/HBASE-27232 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0-alpha-3, 2.4.13 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.6.0, 3.0.0-alpha-4 > > > On HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and > uncompressed data size when deciding to close a block and start a new one. > That could lead to varying "on-disk" block sizes, depending on the encoding > efficiency for the cells in each block. > HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio > property, as a ratio of the originally configured block size, to be compared > against the encoded size. This was an attempt to ensure homogeneous block > sizes. However, the check introduced by HBASE-17757 also considers the > unencoded size, so in cases where the encoding efficiency is higher than > what's configured in hbase.writer.unified.encoded.blocksize.ratio, it would > still lead to varying block sizes. > This patch changes that logic to only consider the encoded size if the > hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it > will consider the unencoded size. This gives finer control over the on-disk > block sizes and the overall number of blocks when encoding is in use. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27232) Fix checking for encoded block size when deciding if block should be closed
Wellington Chevreuil created HBASE-27232: Summary: Fix checking for encoded block size when deciding if block should be closed Key: HBASE-27232 URL: https://issues.apache.org/jira/browse/HBASE-27232 Project: HBase Issue Type: Improvement Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil On HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and uncompressed data size when deciding to close a block and start a new one. That could lead to varying "on-disk" block sizes, depending on the encoding efficiency for the cells in each block. HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio property, as a ratio of the originally configured block size, to be compared against the encoded size. This was an attempt to ensure homogeneous block sizes. However, the check introduced by HBASE-17757 also considers the unencoded size, so in cases where the encoding efficiency is higher than what's configured in hbase.writer.unified.encoded.blocksize.ratio, it would still lead to varying block sizes. This patch changes that logic to only consider the encoded size if the hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it will consider the unencoded size. This gives finer control over the on-disk block sizes and the overall number of blocks when encoding is in use. -- This message was sent by Atlassian Jira (v8.20.10#820010)
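The change boils down to a branch like the following simplified sketch; the real check lives in HFileWriterImpl.checkBlockBoundary, and the standalone helper here is invented for illustration:

{code:java}
// Simplified sketch of the described logic, not the actual HFileWriterImpl code.
public class EncodedBoundarySketch {

  // encodedRatio corresponds to hbase.writer.unified.encoded.blocksize.ratio;
  // 0 models the property being unset.
  static boolean blockBoundaryReached(long unencodedSize, long encodedSize,
      long configuredBlockSize, double encodedRatio) {
    if (encodedRatio > 0) {
      // Only the encoded size is compared, so on-disk block sizes stay
      // homogeneous regardless of how well each block's cells encode.
      return encodedSize >= (long) (configuredBlockSize * encodedRatio);
    }
    return unencodedSize >= configuredBlockSize;
  }

  public static void main(String[] args) {
    // 64KB configured size, ratio 1.0: a block holding 120KB of raw cells
    // that encode down to 40KB stays open instead of closing early.
    System.out.println(blockBoundaryReached(120 * 1024, 40 * 1024, 64 * 1024, 1.0d));
  }
}
{code}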
[jira] [Resolved] (HBASE-27147) Add a hbck2 option to clear emptyRegion from meta
[ https://issues.apache.org/jira/browse/HBASE-27147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27147. -- Resolution: Not A Problem Please see the already existing command: *extraRegionsInMeta* > Add a hbck2 option to clear emptyRegion from meta > - > > Key: HBASE-27147 > URL: https://issues.apache.org/jira/browse/HBASE-27147 > Project: HBase > Issue Type: Bug > Components: hbck2 >Reporter: Karthik Palanisamy >Priority: Major > > There is no alternative option in hbck2 to fix empty regions. The hbck1 equivalent is > "-fixEmptyMetaCells": > "Try to fix hbase:meta entries not referencing any region (empty > REGIONINFO_QUALIFIER rows)" > > NOTE: This is an inconsistent meta bug. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HBASE-27119) [HBCK2] Some commands are broken after HBASE-24587
Wellington Chevreuil created HBASE-27119: Summary: [HBCK2] Some commands are broken after HBASE-24587 Key: HBASE-27119 URL: https://issues.apache.org/jira/browse/HBASE-27119 Project: HBase Issue Type: Bug Components: hbase-operator-tools, hbck2 Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil The HBCK2 _replication_ and _filesystem_ commands are broken after HBASE-24587. Trying to pass the _-f_ or _--fix_ options gives the below error: {noformat} ERROR: Unrecognized option: -f FOR USAGE, use the -h or --help option 2022-06-14T16:07:32,296 INFO [main] client.ConnectionImplementation: Closing master protocol: MasterService Exception in thread "main" java.lang.NullPointerException at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:1083) at org.apache.hbase.HBCK2.run(HBCK2.java:982) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.hbase.HBCK2.main(HBCK2.java:1318) {noformat} This is because the _getInputList_ calls [here|https://github.com/apache/hbase-operator-tools/blob/master/hbase-hbck2/src/main/java/org/apache/hbase/HBCK2.java#L1073] and [here|https://github.com/apache/hbase-operator-tools/blob/master/hbase-hbck2/src/main/java/org/apache/hbase/HBCK2.java#L1082] only accept the _-i_/_--inputFiles_ option, throwing an exception if we pass the _-f_/_--fix_ options. We still need to confirm whether any other command is affected by this. -- This message was sent by Atlassian Jira (v8.20.7#820007)
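For context, an invocation of one of the affected commands that triggers the error would look roughly like the following; the jar path and table name are illustrative:

{noformat}
hbase hbck -j hbase-hbck2.jar filesystem --fix my_table
{noformat}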
[jira] [Resolved] (HBASE-27017) MOB snapshot is broken when FileBased SFT is used
[ https://issues.apache.org/jira/browse/HBASE-27017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27017. -- Resolution: Fixed Merged into master, branch-2 and branch-2.5. > MOB snapshot is broken when FileBased SFT is used > - > > Key: HBASE-27017 > URL: https://issues.apache.org/jira/browse/HBASE-27017 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.5.0, 3.0.0-alpha-2 >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3 > > > During snapshots, MOB regions are treated like any other region. When a > snapshot is taken and hfile references are collected, a StoreFileTracker is > created to get the current active hfile list. But the MOB region stores are > not tracked, so an empty list is returned, resulting in a broken snapshot. > When this snapshot is cloned, the resulting table will have no MOB files or > references. > The problematic code can be found here: > [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotManifest.java#L313] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HBASE-27069) Hbase SecureBulkload permission regression
[ https://issues.apache.org/jira/browse/HBASE-27069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27069. -- Resolution: Fixed > Hbase SecureBulkload permission regression > -- > > Key: HBASE-27069 > URL: https://issues.apache.org/jira/browse/HBASE-27069 > Project: HBase > Issue Type: Bug >Affects Versions: 2.5.0, 3.0.0-alpha-3 >Reporter: Istvan Toth >Assignee: Istvan Toth >Priority: Major > Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3 > > > HBASE-26707 has introduced a bug, where setting the permission of the bulk > loaded HFile to 777 is made conditional. > However, as discussed in HBASE-15790, that permission is essential for > HBase's correct operation. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HBASE-27061) two phase bulkload is broken when SFT is in use.
[ https://issues.apache.org/jira/browse/HBASE-27061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27061. -- Resolution: Fixed Thanks for the fix, [~sergey.soldatov]. I have merged into master, branch-2 and branch-2.5. > two phase bulkload is broken when SFT is in use. > > > Key: HBASE-27061 > URL: https://issues.apache.org/jira/browse/HBASE-27061 > Project: HBase > Issue Type: Bug >Affects Versions: 2.5.0, 3.0.0-alpha-2, 2.6.0 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3 > > > In HBASE-26707, for the SFT case, we are writing files directly to the region > location, and for that we are using HRegion.regionDir as the staging directory. > The problem is that in reality this dir points to the WAL dir, so for > S3 deployments it would be pointing to HDFS. As a result, during the > execution of LoadIncrementalHFiles, the process fails with the exception: > {noformat} > 2022-05-24 03:31:23,656 ERROR > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to > complete bulk load > java.lang.IllegalArgumentException: Wrong FS > hdfs://ns1//hbase-wals/data/default/employees/4f367b303da4fed7667fff07fd4c6066/department/acd971097924463da6d6e3a15f9527da > -expected s3a://hbase > at > org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1375) > at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:647) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1337) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1363) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3521) > at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:511) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:397) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337) > at > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$SecureBulkLoadListener.prepareBulkLoad(SecureBulkLoadManager.java:397) > at > org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:6994) > at > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:291) > at > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1879) > at > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2453) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45821) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
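The root cause is a plain filesystem scheme mismatch between the staging path and the destination. A standalone sketch of that check, mirroring in spirit what the checkPath calls in the stack trace enforce (the class and helper here are invented, not the S3A code):

{code:java}
import java.net.URI;

// Standalone sketch of the scheme mismatch behind the error above.
public class WrongFsSketch {

  static void checkPath(URI fsUri, URI path) {
    // In spirit what the checkPath calls in the stack trace enforce:
    // a path handed to a filesystem must carry that filesystem's scheme.
    if (!fsUri.getScheme().equals(path.getScheme())) {
      throw new IllegalArgumentException(
        "Wrong FS " + path + " -expected " + fsUri);
    }
  }

  public static void main(String[] args) {
    URI s3Fs = URI.create("s3a://hbase");
    URI staging =
      URI.create("hdfs://ns1/hbase-wals/data/default/employees/cf/hfile");
    checkPath(s3Fs, staging); // throws, just like the bulk load in the report
  }
}
{code}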
[jira] [Resolved] (HBASE-27021) StoreFileInfo should set its initialPath in a consistent way
[ https://issues.apache.org/jira/browse/HBASE-27021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-27021. -- Resolution: Fixed Merged into master, branch-2 and branch-2.5. Thanks for reviewing it, [~zhangduo] [~elserj]! > StoreFileInfo should set its initialPath in a consistent way > > > Key: HBASE-27021 > URL: https://issues.apache.org/jira/browse/HBASE-27021 > Project: HBase > Issue Type: Bug >Affects Versions: 2.5.0, 3.0.0-alpha-2, 2.6.0 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3 > > > Currently, StoreFileInfo provides overloaded public constructors where the > related file path can be passed as either a Path or a FileStatus instance. This > can lead to StoreFileInfo instances related to the same file entry > having different representations of the file path, which could create problems > for functions relying on equality for comparing store files. One example I > could find is the StoreEngine.refreshStoreFiles method, which lists some files > from the SFT, then compares them against a list of files from the SFM to decide > how it should update the SFM internal cache. Here's a sample output from > TestHStore.testRefreshStoreFiles: > --- > 2022-05-10T15:06:42,831 INFO [Time-limited test] > regionserver.StoreEngine(399): Refreshing store files for > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine@69d58ac1 files to > add: > [file:/hbase/hbase-server/target/test-data/e3eac5ce-9bdf-8624-bcec-09c89790d682/TestStoretestRefreshStoreFiles/data/default/table/da6a3cf38941b37cd16438d554b13bbc/family/6e92c2f5cf1f40f7b8c6b6b34a176fa5, > > file:/hbase/hbase-server/target/test-data/e3eac5ce-9bdf-8624-bcec-09c89790d682/TestStoretestRefreshStoreFiles/data/default/table/da6a3cf38941b37cd16438d554b13bbc/family/{*}fa4d5909da644d94873cbfdc6b5a07da{*}] > files to remove: > [/hbase/hbase-server/target/test-data/e3eac5ce-9bdf-8624-bcec-09c89790d682/TestStoretestRefreshStoreFiles/data/default/table/da6a3cf38941b37cd16438d554b13bbc/family/{*}fa4d5909da644d94873cbfdc6b5a07da{*}] > --- > The above will wrongly add the file to SFM's list of compacted files, making a > valid file potentially eligible for deletion, leading to data loss. > I think we can avoid that by always converting Path instances passed in > StoreFileInfo constructors to a FileStatus, to consistently build the > internal StoreFileInfo path. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HBASE-27022) SFT seems apparently tracking invalid/malformed store files
Wellington Chevreuil created HBASE-27022: Summary: SFT seems apparently tracking invalid/malformed store files Key: HBASE-27022 URL: https://issues.apache.org/jira/browse/HBASE-27022 Project: HBase Issue Type: Sub-task Reporter: Wellington Chevreuil Opening this on behalf of [~apurtell], who first reported this issue on HBASE-26999: When running scale tests using ITLCC, the following errors were observed: {noformat} [00]2022-05-05 15:59:52,280 WARN [region-location-0] regionserver.StoreFileInfo: Skipping hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/9eafc10e1b5a25532a4f0adf550828fc/c/9d07757144a7404fac02e161b5bd035e because it is empty. HBASE-646 DATA LOSS? ... [00]2022-05-05 15:59:52,320 WARN [region-location-2] regionserver.StoreFileInfo: Skipping hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/5322c54b9a899eae03cb16e956a836d5/c/184b4f55ab1a4dbc813e77aeae1343ae because it is empty. HBASE-646 DATA LOSS? {noformat} From some discussions in HBASE-26999, it seems that SFT has wrongly tracked an incomplete/unfinished store file. For further context, follow the [comments thread on HBASE-26999|https://issues.apache.org/jira/browse/HBASE-26999?focusedCommentId=17533508&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17533508]. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HBASE-27021) StoreFileInfo should set its initialPath in a consistent way
Wellington Chevreuil created HBASE-27021: Summary: StoreFileInfo should set its initialPath in a consistent way Key: HBASE-27021 URL: https://issues.apache.org/jira/browse/HBASE-27021 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil Currently, StoreFileInfo provides overloaded public constructors where the related file path can be passed as either a Path or a FileStatus instance. This can lead to StoreFileInfo instances related to the same file entry having different representations of the file path, which could create problems for functions relying on equality for comparing store files. One example I could find is the StoreEngine.refreshStoreFiles method, which lists some files from the SFT, then compares them against a list of files from the SFM to decide how it should update the SFM internal cache. Here's a sample output from TestHStore.testRefreshStoreFiles: --- 2022-05-10T15:06:42,831 INFO [Time-limited test] regionserver.StoreEngine(399): Refreshing store files for org.apache.hadoop.hbase.regionserver.DefaultStoreEngine@69d58ac1 files to add: [file:/hbase/hbase-server/target/test-data/e3eac5ce-9bdf-8624-bcec-09c89790d682/TestStoretestRefreshStoreFiles/data/default/table/da6a3cf38941b37cd16438d554b13bbc/family/6e92c2f5cf1f40f7b8c6b6b34a176fa5, file:/hbase/hbase-server/target/test-data/e3eac5ce-9bdf-8624-bcec-09c89790d682/TestStoretestRefreshStoreFiles/data/default/table/da6a3cf38941b37cd16438d554b13bbc/family/{*}fa4d5909da644d94873cbfdc6b5a07da{*}] files to remove: [/hbase/hbase-server/target/test-data/e3eac5ce-9bdf-8624-bcec-09c89790d682/TestStoretestRefreshStoreFiles/data/default/table/da6a3cf38941b37cd16438d554b13bbc/family/{*}fa4d5909da644d94873cbfdc6b5a07da{*}] --- The above will wrongly add the file to SFM's list of compacted files, making a valid file potentially eligible for deletion, leading to data loss. I think we can avoid that by always converting Path instances passed in StoreFileInfo constructors to a FileStatus, to consistently build the internal StoreFileInfo path. -- This message was sent by Atlassian Jira (v8.20.7#820007)
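The mismatch is easy to reproduce outside HBase. A minimal sketch, assuming only hadoop-common on the classpath, showing why a Path built from a plain string and the fully qualified path from a FileStatus refer to the same file yet do not compare equal:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal reproduction of the equality pitfall described above.
public class PathEqualitySketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Path raw = new Path(System.getProperty("java.io.tmpdir"), "sfi-demo");
    fs.createNewFile(raw);
    FileStatus status = fs.getFileStatus(raw);
    System.out.println(raw);              // e.g. /tmp/sfi-demo (no scheme)
    System.out.println(status.getPath()); // e.g. file:/tmp/sfi-demo
    // Path equality compares the underlying URIs, so these differ:
    System.out.println(raw.equals(status.getPath())); // false
  }
}
{code}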
[jira] [Created] (HBASE-26999) HStore should try write WAL compact marker before replacing compacted files in StoreEngine
Wellington Chevreuil created HBASE-26999: Summary: HStore should try write WAL compact marker before replacing compacted files in StoreEngine Key: HBASE-26999 URL: https://issues.apache.org/jira/browse/HBASE-26999 Project: HBase Issue Type: Sub-task Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil On HBASE-26064, it seems we altered the order in which we update the different places with the results of a compaction: {noformat} @@ -1510,14 +1149,13 @@ public class HStore implements Store, HeapSize, StoreConfigInformation, List<Path> newFiles) throws IOException { // Do the steps necessary to complete the compaction. setStoragePolicyFromFileName(newFiles); - List<HStoreFile> sfs = commitStoreFiles(newFiles, true); + List<HStoreFile> sfs = storeEngine.commitStoreFiles(newFiles, true); if (this.getCoprocessorHost() != null) { for (HStoreFile sf : sfs) { getCoprocessorHost().postCompact(this, sf, cr.getTracker(), cr, user); } } - writeCompactionWalRecord(filesToCompact, sfs); - replaceStoreFiles(filesToCompact, sfs); + replaceStoreFiles(filesToCompact, sfs, true); ... @@ -1581,25 +1219,24 @@ public class HStore implements Store, HeapSize, StoreConfigInformation, this.region.getRegionInfo(), compactionDescriptor, this.region.getMVCC()); } - void replaceStoreFiles(Collection<HStoreFile> compactedFiles, Collection<HStoreFile> result) - throws IOException { - this.lock.writeLock().lock(); - try { - this.storeEngine.getStoreFileManager().addCompactionResults(compactedFiles, result); - synchronized (filesCompacting) { - filesCompacting.removeAll(compactedFiles); - } - - // These may be null when the RS is shutting down. The space quota Chores will fix the Region - // sizes later so it's not super-critical if we miss these. - RegionServerServices rsServices = region.getRegionServerServices(); - if (rsServices != null && rsServices.getRegionServerSpaceQuotaManager() != null) { - updateSpaceQuotaAfterFileReplacement( - rsServices.getRegionServerSpaceQuotaManager().getRegionSizeStore(), getRegionInfo(), - compactedFiles, result); - } - } finally { - this.lock.writeLock().unlock(); + @RestrictedApi(explanation = "Should only be called in TestHStore", link = "", + allowedOnPath = ".*/(HStore|TestHStore).java") + void replaceStoreFiles(Collection<HStoreFile> compactedFiles, Collection<HStoreFile> result, + boolean writeCompactionMarker) throws IOException { + storeEngine.replaceStoreFiles(compactedFiles, result); + if (writeCompactionMarker) { + writeCompactionWalRecord(compactedFiles, result); + } + synchronized (filesCompacting) { + filesCompacting.removeAll(compactedFiles); + } + // These may be null when the RS is shutting down. The space quota Chores will fix the Region + // sizes later so it's not super-critical if we miss these. + RegionServerServices rsServices = region.getRegionServerServices(); + if (rsServices != null && rsServices.getRegionServerSpaceQuotaManager() != null) { + updateSpaceQuotaAfterFileReplacement( + rsServices.getRegionServerSpaceQuotaManager().getRegionSizeStore(), getRegionInfo(), + compactedFiles, result); {noformat} While running a large scale load test, we ran into a file-based SFT meta file inconsistency that we believe could have been avoided if the original order was in place. Here is the scenario we had: 1) Region R with one CF f was open on RS1. At this time, the given store had some files, let's say these were file1, file2 and file3; 2) Compaction started on RS1; 3) RS1 entered a long GC pause and lost its ZK lock. Compaction is still running, though. 4) RS2 opens R.
The related file-based SFT instance for this store then creates a new meta file with file1, file2 and file3. 5) Compaction on RS1 successfully completes the *storeEngine.replaceStoreFiles* call. This updates the in-memory cache of valid files (StoreFileManager) and the SFT meta file for the store engine on RS1 with the compaction resulting file, say file4, removing file1, file2 and file3. Note that the SFT meta file used by RS1 here is different from (older than) the one used by RS2. 6) Compaction on RS1 tries to write the WAL marker, but fails to do so, as the WAL was already closed when RS1's ZK lock expired. This triggers a store close on RS1. As part of the store close process, it removes all files it sees as already compacted, in this case file1, file2 and file3. 7) RS2 still references file1, file2 and file3. It then gets a FileNotFoundException when trying to open any of these files. This situation would have been avoided if the original order of a) write the WAL marker, then b) replace the store files, had been kept. -- This message was sent by Atlassian Jira (v8.20.7#820007)
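The fencing argument above reduces to a commit-point ordering. A standalone toy sketch (invented names, not the HStore code) of why writing the WAL marker first is the safe order: a fenced region server fails at the WAL write and never reaches the step that mutates shared SFT state:

{code:java}
// Toy model of the ordering argument; names are invented for illustration.
public class CompactionCommitOrderSketch {

  static boolean walOpen = false; // simulate a fenced RS whose WAL was closed

  static void writeCompactionWalRecord() {
    // The WAL write acts as the commit point: it fails once the RS is fenced.
    if (!walOpen) {
      throw new IllegalStateException("WAL closed: RS lost its lease");
    }
  }

  static void replaceStoreFiles() {
    System.out.println("store files replaced (reachable only after WAL commit)");
  }

  public static void main(String[] args) {
    try {
      writeCompactionWalRecord(); // commit point first...
      replaceStoreFiles();        // ...then mutate the shared file lists
    } catch (IllegalStateException e) {
      System.out.println("aborted before touching store files: " + e.getMessage());
    }
  }
}
{code}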
[jira] [Resolved] (HBASE-26971) SnapshotInfo --snapshot param is marked as required even when trying to list all snapshots
[ https://issues.apache.org/jira/browse/HBASE-26971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-26971. -- Resolution: Fixed Merged to master, branch-2, branch-2.5 and branch-2.4. Thanks for reviewing, [~elserj]! > SnapshotInfo --snapshot param is marked as required even when trying to list > all snapshots > -- > > Key: HBASE-26971 > URL: https://issues.apache.org/jira/browse/HBASE-26971 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.5.0, 3.0.0-alpha-2, 2.4.11 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.12 > > > SnapshotInfo --list-snapshots lists all existing snapshots and doesn't need > any filter; however, the --snapshot param is marked as required, causing the listing to > fail if this param is not defined. > Also, the help description is a bit confusing about which options should be > used together. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HBASE-26971) SnapshotInfo --snapshot param is marked as required even when trying to list all snapshots
Wellington Chevreuil created HBASE-26971: Summary: SnapshotInfo --snapshot param is marked as required even when trying to list all snapshots Key: HBASE-26971 URL: https://issues.apache.org/jira/browse/HBASE-26971 Project: HBase Issue Type: Improvement Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil SnapshotInfo --list-snapshots lists all existing snapshots and doesn't need any filter; however, the --snapshot param is marked as required, causing the listing to fail if this param is not defined. Also, the help description is a bit confusing about which options should be used together. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HBASE-26927) Add snapshot scanner UT with SFT and some cleanups to TestTableSnapshotScanner
[ https://issues.apache.org/jira/browse/HBASE-26927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-26927. -- Resolution: Fixed Merged into master, branch-2 and branch-2.5. Thanks for reviewing it, [~zhangduo] [~elserj]! > Add snapshot scanner UT with SFT and some cleanups to TestTableSnapshotScanner > -- > > Key: HBASE-26927 > URL: https://issues.apache.org/jira/browse/HBASE-26927 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.5.0, 3.0.0-alpha-2, 3.0.0-alpha-3, 2.4.11 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3 > > > This is to replicate the current TestTableSnapshotScanner UTs to run over an SFT > cluster, just extending the current TestTableSnapshotScanner with overrides to the setup > method. Also applied some fixes/cleanups to TestTableSnapshotScanner. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HBASE-26927) Add snapshot scanner UT with SFT and some cleanups to TestTableSnapshotScanner
Wellington Chevreuil created HBASE-26927: Summary: Add snapshot scanner UT with SFT and some cleanups to TestTableSnapshotScanner Key: HBASE-26927 URL: https://issues.apache.org/jira/browse/HBASE-26927 Project: HBase Issue Type: Sub-task Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil This is to replicate the current TestTableSnapshotScanner UTs to run over an SFT cluster, just extending the current TestTableSnapshotScanner with overrides to the setup method. Also applied some fixes/cleanups to TestTableSnapshotScanner. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HBASE-26838) Junit jar is not included in the hbase tar ball, causing issues for some hbase tools that do rely on it
[ https://issues.apache.org/jira/browse/HBASE-26838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-26838. -- Resolution: Fixed Merged into master, then cherry-picked into branch-2, branch-2.5 and branch-2.4. Thanks for reviewing it, [~elserj], [~ndimiduk]! > Junit jar is not included in the hbase tar ball, causing issues for some > hbase tools that do rely on it > > > Key: HBASE-26838 > URL: https://issues.apache.org/jira/browse/HBASE-26838 > Project: HBase > Issue Type: Bug > Components: integration tests, tooling >Affects Versions: 2.5.0, 3.0.0-alpha-2, 2.6.0, 2.4.11 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.12 > > > We used to include the junit jar in the generated tar ball lib directory. After > some sanitisation of unnecessary libs for most hbase processes, junit got > removed from the packaging, so that it doesn't end up in the hbase classpath by default. > Some testing tools, however, do depend on junit at runtime, and would now fail > with NoClassDefFoundError, like > [IntegrationTestIngest:|https://hbase.apache.org/book.html#chaos.monkey.properties] > {noformat} > 2022-03-14T21:54:50,483 INFO [main] client.AsyncConnectionImpl: Connection > has been closed by main. > Exception in thread "main" java.lang.NoClassDefFoundError: org/junit/Assert > at > org.apache.hadoop.hbase.IntegrationTestIngest.initTable(IntegrationTestIngest.java:101) > at > org.apache.hadoop.hbase.IntegrationTestIngest.setUpCluster(IntegrationTestIngest.java:92) > at > org.apache.hadoop.hbase.IntegrationTestBase.setUp(IntegrationTestBase.java:170) > at > org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:153) > at > org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:153) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at > org.apache.hadoop.hbase.IntegrationTestIngest.main(IntegrationTestIngest.java:259) > Caused by: java.lang.ClassNotFoundException: org.junit.Assert > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521) > ... 7 more {noformat} > Discussing with [~elserj] internally, we believe a reasonable solution would > be to include the junit jar back in the tarball, under the "lib/test" dir, so that > it's not automatically added to the hbase processes' classpath, but still allow > operators to manually define it in a convenient way, like below: > {noformat} > HBASE_CLASSPATH="$HBASE_HOME/lib/tests/*" hbase > org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HBASE-26881) Backport HBASE-25368 to branch-2
[ https://issues.apache.org/jira/browse/HBASE-26881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-26881. -- Resolution: Fixed Merged to branch-2. Thanks for reviewing, [~apurtell]! > Backport HBASE-25368 to branch-2 > - > > Key: HBASE-26881 > URL: https://issues.apache.org/jira/browse/HBASE-26881 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.5.0, 2.6.0 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > > HBASE-26640 introduced two extra paths under the master:store table: > ".initializing" and ".initialized", in order to control when such store has > been completely initialized for SFT. > The problem is that TestHFileProcedurePrettyPrinter uses > RegionInfo.isEncodedRegionName to determine if a given child path in the > table dir is a valid region dir. The current implementation of > RegionInfo.isEncodedRegionName considers ".initializing" and ".initialized" > as valid region encoded names, thus the test ends up picking one of the flag > dirs to list hfiles that should have been modified when validating the test > outcome. > Further improvements were made to RegionInfo.isEncodedRegionName in > HBASE-25368 to properly validate region names, but those weren't backported to > branch-2. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HBASE-26881) Fix TestHFileProcedurePrettyPrinter broken by changes from HBASE-26640
Wellington Chevreuil created HBASE-26881: Summary: Fix TestHFileProcedurePrettyPrinter broken by changes from HBASE-26640 Key: HBASE-26881 URL: https://issues.apache.org/jira/browse/HBASE-26881 Project: HBase Issue Type: Sub-task Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil HBASE-26640 introduced two extra paths under the master:store table: ".initializing" and ".initialized", in order to control when such store has been completely initialized for SFT. The problem is that TestHFileProcedurePrettyPrinter assumes all child dirs of master:store are region dirs, so it ends up picking one of the flag dirs to list hfiles that should have been modified when validating the test outcome. -- This message was sent by Atlassian Jira (v8.20.1#820001)
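The stricter validation from HBASE-25368 essentially checks the MD5-hex shape of an encoded region name, which the flag directories cannot match. A simplified standalone sketch of that idea (not the exact RegionInfo implementation):

{code:java}
// Simplified sketch: an MD5-based encoded region name is exactly 32 lowercase
// hex characters, so flag dirs like ".initializing" must not pass.
public class EncodedNameSketch {

  static boolean looksLikeEncodedRegionName(String name) {
    return name.length() == 32 && name.chars().allMatch(
      c -> (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'));
  }

  public static void main(String[] args) {
    System.out.println(looksLikeEncodedRegionName(".initializing")); // false
    System.out.println(
      looksLikeEncodedRegionName("da6a3cf38941b37cd16438d554b13bbc")); // true
  }
}
{code}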
[jira] [Created] (HBASE-26838) Junit jar is not included in the hbase tar ball, causing issues for some hbase tools that do rely on it
Wellington Chevreuil created HBASE-26838: Summary: Junit jar is not included in the hbase tar ball, causing issues for some hbase tools that do rely on it Key: HBASE-26838 URL: https://issues.apache.org/jira/browse/HBASE-26838 Project: HBase Issue Type: Bug Reporter: Wellington Chevreuil Assignee: Wellington Chevreuil We used to include the junit jar in the generated tar ball lib directory. After some sanitisation of unnecessary libs for most hbase processes, junit got removed from the packaging, so that it doesn't end up in the hbase classpath by default. Some testing tools, however, do depend on junit at runtime, and would now fail with NoClassDefFoundError, like [IntegrationTestIngest:|https://hbase.apache.org/book.html#chaos.monkey.properties] {noformat} 2022-03-14T21:54:50,483 INFO [main] client.AsyncConnectionImpl: Connection has been closed by main. Exception in thread "main" java.lang.NoClassDefFoundError: org/junit/Assert at org.apache.hadoop.hbase.IntegrationTestIngest.initTable(IntegrationTestIngest.java:101) at org.apache.hadoop.hbase.IntegrationTestIngest.setUpCluster(IntegrationTestIngest.java:92) at org.apache.hadoop.hbase.IntegrationTestBase.setUp(IntegrationTestBase.java:170) at org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:153) at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:153) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.hbase.IntegrationTestIngest.main(IntegrationTestIngest.java:259) Caused by: java.lang.ClassNotFoundException: org.junit.Assert at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521) ... 7 more {noformat} Discussing with [~elserj] internally, we believe a reasonable solution would be to include the junit jar back in the tarball, under the "lib/test" dir, so that it's not automatically added to the hbase processes' classpath, but still allow operators to manually define it in a convenient way, like below: {noformat} HBASE_CLASSPATH="$HBASE_HOME/lib/tests/*" hbase org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HBASE-26707) Reduce number of renames during bulkload
[ https://issues.apache.org/jira/browse/HBASE-26707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-26707. -- Resolution: Fixed Thanks for the contribution, [~bszabolcs]! > Reduce number of renames during bulkload > > > Key: HBASE-26707 > URL: https://issues.apache.org/jira/browse/HBASE-26707 > Project: HBase > Issue Type: Sub-task >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Major > > Make sure we only do a single rename operation during bulkload when > StoreEngine does not require the use of tmp directories. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HBASE-26577) Update ref guide section for IT and Chaos Monkey to explain the additions from HBASE-26556
[ https://issues.apache.org/jira/browse/HBASE-26577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-26577. -- Resolution: Fixed Merged into master. Thanks for reviewing it, [~elserj]! > Update ref guide section for IT and Chaos Monkey to explain the additions > from HBASE-26556 > -- > > Key: HBASE-26577 > URL: https://issues.apache.org/jira/browse/HBASE-26577 > Project: HBase > Issue Type: Improvement >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > > HBASE-26556 introduced a customisable monkey factory for the slow > deterministic policy, as well as making pluggable implementations > of the hbase remote shell commands possible. This is to document how these new > features can be used. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HBASE-26662) User.createUserForTesting should not reset UserProvider.groups every time if hbase.group.service.for.test.only is true
[ https://issues.apache.org/jira/browse/HBASE-26662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-26662. -- Resolution: Fixed Thanks for reviewing it, [~elserj], [~zhangduo]! > User.createUserForTesting should not reset UserProvider.groups every time if > hbase.group.service.for.test.only is true > -- > > Key: HBASE-26662 > URL: https://issues.apache.org/jira/browse/HBASE-26662 > Project: HBase > Issue Type: Bug >Affects Versions: 2.5.0, 3.0.0-alpha-2, 2.4.9, 2.6.0 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.10 > > > The _if check_ below will always unnecessarily reset the static var > _UserProvider.groups_ to a newly created instance of TestingGroups every time > `User.createUserForTesting` is called, whenever hbase.group.service.for.test.only is true. > {noformat} > if (!(UserProvider.groups instanceof TestingGroups) || > conf.getBoolean(TestingGroups.TEST_CONF, false)) { > UserProvider.groups = new TestingGroups(UserProvider.groups); > } > {noformat} > For tests creating multiple {_}test users{_}, this causes the latest created > user to reset _groups_, and all previously created users would now have to be > available in the {_}User.underlyingImplementation{_}, which will not always > be true. -- This message was sent by Atlassian Jira (v8.20.1#820001)
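A standalone toy reproduction of the flawed condition, with stand-in types for the HBase ones, showing how the latest call discards the previously installed TestingGroups instance:

{code:java}
// Toy reproduction: the OR clause makes the reset unconditional whenever the
// test-only flag is set, dropping the previously installed TestingGroups.
public class GroupsResetSketch {

  static class TestingGroups {
    final Object previous;
    TestingGroups(Object previous) { this.previous = previous; }
  }

  static Object groups = new Object(); // stands in for UserProvider.groups

  static void createUserForTesting(boolean testOnlyFlag) {
    // Mirrors the buggy check quoted above.
    if (!(groups instanceof TestingGroups) || testOnlyFlag) {
      groups = new TestingGroups(groups);
    }
  }

  public static void main(String[] args) {
    createUserForTesting(true);
    Object first = groups;
    createUserForTesting(true);
    // false: the first TestingGroups (and any users registered with it) is gone.
    System.out.println(groups == first);
  }
}
{code}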