[jira] [Created] (HBASE-28724) BucketCache.notifyFileCachingCompleted may throw IllegalMonitorStateException

2024-07-11 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28724:


 Summary: BucketCache.notifyFileCachingCompleted may throw 
IllegalMonitorStateException 
 Key: HBASE-28724
 URL: https://issues.apache.org/jira/browse/HBASE-28724
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


If the prefetch thread completes reading the file blocks faster than the bucket 
cache writer threads can drain them from the writer queues, we might run 
into a scenario where BucketCache.notifyFileCachingCompleted throws 
IllegalMonitorStateException, as we can reach [this block of the 
code|https://github.com/wchevreuil/hbase/blob/684964f1c1693d2a0792b7b721c92693d75b4cea/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java#L2106].
 I believe the impact is not critical, as the prefetch thread is already 
finishing at that point, but nevertheless, such errors in the logs might be 
misleading.
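
A minimal sketch of the kind of guard that avoids the exception, assuming the 
failing path releases a per-file ReentrantLock it may not currently hold; the 
field and method names below are illustrative, not the actual BucketCache 
members:
{code:java}
import java.util.concurrent.locks.ReentrantLock;

public class UnlockGuardSketch {
  // Hypothetical stand-in for the per-file lock used while caching blocks.
  private final ReentrantLock fileLock = new ReentrantLock();

  void finishFileCaching() {
    // ReentrantLock.unlock() throws IllegalMonitorStateException when the
    // calling thread does not own the lock, so only release it if held.
    if (fileLock.isHeldByCurrentThread()) {
      fileLock.unlock();
    }
  }
}
{code}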



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28364) Warn: Cache key had block type null, but was found in L1 cache

2024-07-10 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28364.
--
Resolution: Fixed

Merged to 2.6 and 2.5 branches.

> Warn: Cache key had block type null, but was found in L1 cache
> --
>
> Key: HBASE-28364
> URL: https://issues.apache.org/jira/browse/HBASE-28364
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.4.18, 2.5.9
>Reporter: Bryan Beaudreault
>Assignee: Nikita Pande
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.1, 2.5.10
>
>
> I'm ITBLL testing branch-2.6 and am seeing lots of these warnings. This is new 
> to me. I would expect a warning to be rare or indicative of a problem, but 
> it's unclear from the code.
> cc [~wchevreuil] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28596) Optimise BucketCache usage upon regions splits/merges.

2024-06-21 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28596.
--
Release Note: 
This adds a new configuration property, "hbase.rs.evictblocksonsplit", with the 
default value set to true, which makes all parent region blocks get evicted 
on split. 

It modifies the behaviour implemented in HBASE-27474, to allow 
prefetch to run on the daughters' refs (if hbase.rs.prefetchblocksonopen is 
true).

It also modifies how BucketCache deals with blocks from reference files:
1) When adding blocks for a reference file, it first resolves the reference and 
checks if the related block from the parent file is already in the cache. If so, 
it doesn't add the block to the cache. Otherwise, it adds the block 
with the reference as the cache key.
2) When searching for blocks from a reference file in the cache, it first 
resolves the reference and checks for the block from the original file, 
returning it if found. Otherwise, it searches the cache again, now using 
the reference file as the cache key.


  Resolution: Fixed

Merged into master, branch-3 and branch-2. Thanks for reviewing it [~taklwu] !

> Optimise BucketCache usage upon regions splits/merges.
> --
>
> Key: HBASE-28596
> URL: https://issues.apache.org/jira/browse/HBASE-28596
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2
>
>
> This proposal aims to give users more flexibility to decide whether or 
> not blocks from a parent region should be evicted, and also to optimise cache 
> usage by resolving file reference blocks to the referred block in the cache.
> Some extra context:
> 1) Originally, the default behaviour on splits was to rely on the 
> "hbase.rs.evictblocksonclose" value to decide if the cached blocks from the 
> parent split should be evicted or not. The resulting split daughters then get 
> opened with refs to the parent file. If hbase.rs.prefetchblocksonopen is set, 
> these openings will trigger a prefetch of the blocks from the parent split, 
> now with cache keys from the ref path. That means, if 
> "hbase.rs.evictblocksonclose" is false and "hbase.rs.prefetchblocksonopen" is 
> true, we will be duplicating blocks in the cache. In scenarios where cache 
> usage is at capacity and the added latency for reading from the file system is 
> high (for example, reading from cloud storage), this can have a severe 
> impact, as the prefetch for the refs would trigger evictions. Also, the refs 
> tend to be short-lived, as compaction is triggered on the split daughters 
> soon after they are opened.
> 2) HBASE-27474 changed the original behaviour described above to now 
> always evict blocks from the split parent once the split is completed, and to 
> skip prefetch for refs (since refs are short-lived). The side effect is 
> that the daughters' blocks would only be cached once compaction is completed, 
> but compaction itself will run slower since it needs to read the blocks from 
> the file system. On regions as large as 20GB, the performance degradation 
> reported by users has been severe.
> This proposes a new "hbase.rs.evictblocksonsplit" configuration property that 
> makes the eviction on split configurable. Depending on the use case, the 
> impact of mass evictions due to cache capacity may be higher, in which case 
> users might prefer to keep evicting split parent blocks. Additionally, it 
> modifies the way we handle refs when caching. The HBASE-27474 behaviour was to 
> skip caching refs to avoid duplicate data in the cache as long as compaction 
> was enabled, relying on the fact that refs from splits are usually short 
> lived. Here, we propose modifying the search for block cache keys, so that 
> we always resolve the referenced file first and look for the related 
> referenced file block in the cache. That way we avoid duplicates in the cache 
> and also expedite scan performance on the split daughters, as it's now 
> resolving the referenced file and reading from the cache.
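
A rough sketch of the lookup order described above, with hypothetical key 
layout and helper names (resolveReference, a plain map) standing in for the 
real BucketCache structures:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RefAwareLookupSketch {
  // Hypothetical cache: key is "<fileName>_<offset>", value is the block bytes.
  private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

  // Hypothetical: maps a reference file name back to the referred parent file.
  private String resolveReference(String fileName) {
    int idx = fileName.indexOf(".ref.");
    return idx >= 0 ? fileName.substring(0, idx) : fileName;
  }

  byte[] getBlock(String fileName, long offset) {
    // 1) Resolve the reference and look for the parent file's block first.
    byte[] block = cache.get(resolveReference(fileName) + "_" + offset);
    if (block != null) {
      return block;
    }
    // 2) Fall back to the reference file itself as the cache key.
    return cache.get(fileName + "_" + offset);
  }
}
{code}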



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28657) Backport HBASE-28246 Expose region cached size over JMX metrics and report in the RS UI

2024-06-17 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28657.
--
Resolution: Fixed

Merged into branch-2.6. Thanks for backporting it, [~szucsvillo] .

> Backport HBASE-28246 Expose region cached size over JMX metrics and report in 
> the RS UI
> ---
>
> Key: HBASE-28657
> URL: https://issues.apache.org/jira/browse/HBASE-28657
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Szucs Villo
>Assignee: Szucs Villo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28467) Integration of time-based priority caching into cacheOnRead read code paths.

2024-05-22 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28467.
--
Resolution: Fixed

Merged into the feature branch. Thanks for the contribution, 
[~janardhan.hungund] !

> Integration of time-based priority caching into cacheOnRead read code paths.
> 
>
> Key: HBASE-28467
> URL: https://issues.apache.org/jira/browse/HBASE-28467
> Project: HBase
>  Issue Type: Task
>  Components: BucketCache
>Reporter: Janardhan Hungund
>Assignee: Janardhan Hungund
>Priority: Major
>  Labels: pull-request-available
>
> This Jira tracks the integration of time-based caching framework APIs into 
> read code paths.
> Thanks,
> Janardhan
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27915) Update hbase_docker with an extra Dockerfile compatible with mac m1 platform

2024-05-22 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27915.
--
Resolution: Fixed

Thanks for reviewing it [~swu] ! I had merged this into master, branch-3, 
branch-2, branch-2.6, branch-2.5 and branch-2.4.

> Update hbase_docker with an extra Dockerfile compatible with mac m1 platform
> 
>
> Key: HBASE-27915
> URL: https://issues.apache.org/jira/browse/HBASE-27915
> Project: HBase
>  Issue Type: Bug
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
>  Labels: pull-request-available
>
> When trying to use the current Dockerfile under "./dev-support/hbase_docker" 
> on m1 macs, the docker build fails at the git clone & mvn build stage with 
> below error:
> {noformat}
>  #0 8.214 qemu-x86_64: Could not open '/lib64/ld-linux-x86-64.so.2': No such 
> file or directory
> {noformat}
> It turns out for mac m1, we have to explicitly define the platform flag for 
> the ubuntu image. I thought we could add a note in this readme, together with 
> an "m1" subfolder containing a modified copy of this Dockerfile that works on 
> mac m1s.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28469) Integration of time-based priority caching into compaction paths.

2024-05-22 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28469.
--
Resolution: Fixed

Merged into the feature branch. Thanks for the contribution [~vinayakhegde], 
and for reviewing [~janardhan.hungund] !

> Integration of time-based priority caching into compaction paths.
> -
>
> Key: HBASE-28469
> URL: https://issues.apache.org/jira/browse/HBASE-28469
> Project: HBase
>  Issue Type: Task
>Reporter: Janardhan Hungund
>Assignee: Vinayak Hegde
>Priority: Major
>  Labels: pull-request-available
>
> The time-based priority caching is dependent on the date-tiered compaction 
> that structures store files in a date-based tiered layout. This Jira tracks 
> the changes needed for the integration of this compaction strategy with the 
> data-tiering to enable appropriate caching of hot data in the cache, while 
> the cold data can remain in cloud storage.
> Thanks,
> Janardhan



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28596) Optimise BucketCache usage upon regions splits/merges.

2024-05-15 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28596:


 Summary: Optimise BucketCache usage upon regions splits/merges.
 Key: HBASE-28596
 URL: https://issues.apache.org/jira/browse/HBASE-28596
 Project: HBase
  Issue Type: Improvement
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


This proposal aims to give users more flexibility to decide whether or not 
blocks from a parent region should be evicted, and also to optimise cache usage 
by resolving file reference blocks to the referred block in the cache.

Some extra context:

1) Originally, the default behaviour on splits was to rely on the 
"hbase.rs.evictblocksonclose" value to decide if the cached blocks from the 
parent split should be evicted or not. The resulting split daughters then get 
opened with refs to the parent file. If hbase.rs.prefetchblocksonopen is set, 
these openings will trigger a prefetch of the blocks from the parent split, now 
with cache keys from the ref path. That means, if "hbase.rs.evictblocksonclose" 
is false and "hbase.rs.prefetchblocksonopen" is true, we will be duplicating 
blocks in the cache. In scenarios where cache usage is at capacity and the 
added latency for reading from the file system is high (for example, reading 
from cloud storage), this can have a severe impact, as the prefetch for the 
refs would trigger evictions. Also, the refs tend to be short-lived, as 
compaction is triggered on the split daughters soon after they are opened.

2) HBASE-27474 changed the original behaviour described above to now always 
evict blocks from the split parent once the split is completed, and to skip 
prefetch for refs (since refs are short-lived). The side effect is that the 
daughters' blocks would only be cached once compaction is completed, but 
compaction itself will run slower since it needs to read the blocks from the 
file system. On regions as large as 20GB, the performance degradation reported 
by users has been severe.

This proposes a new "hbase.rs.evictblocksonsplit" configuration property that 
makes the eviction on split configurable. Depending on the use case, the impact 
of mass evictions due to cache capacity may be higher, in which case users 
might prefer to keep evicting split parent blocks. Additionally, it modifies 
the way we handle refs when caching. The HBASE-27474 behaviour was to skip 
caching refs to avoid duplicate data in the cache as long as compaction was 
enabled, relying on the fact that refs from splits are usually short-lived. 
Here, we propose modifying the search for block cache keys, so that we always 
resolve the referenced file first and look for the related referenced file 
block in the cache. That way we avoid duplicates in the cache and also expedite 
scan performance on the split daughters, as it's now resolving the referenced 
file and reading from the cache.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28535) Implement a region server level configuration to enable/disable data-tiering

2024-05-02 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28535.
--
Resolution: Fixed

Merged into the feature branch. Thanks for the contribution, 
[~janardhan.hungund] !

> Implement a region server level configuration to enable/disable data-tiering
> 
>
> Key: HBASE-28535
> URL: https://issues.apache.org/jira/browse/HBASE-28535
> Project: HBase
>  Issue Type: Task
>  Components: BucketCache
>Reporter: Janardhan Hungund
>Assignee: Janardhan Hungund
>Priority: Major
>  Labels: pull-request-available
>
> Provide the user with the ability to enable and disable the data tiering 
> feature. The time-based data tiering is applicable to a specific set of use 
> cases which write date-based records and mostly access recently written data.
> The feature, in general, should be avoided for use cases which are not 
> dependent on date-based reads and writes, as the code flows which enable 
> data temperature checks can induce performance regressions.
> This Jira is added to track an optional region-server-wide configuration to 
> enable or disable the feature.
> Thanks,
> Janardhan



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28468) Integration of time-based priority caching logic into cache evictions.

2024-04-25 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28468.
--
Resolution: Fixed

Merged into the feature branch.

> Integration of time-based priority caching logic into cache evictions.
> --
>
> Key: HBASE-28468
> URL: https://issues.apache.org/jira/browse/HBASE-28468
> Project: HBase
>  Issue Type: Task
>Reporter: Janardhan Hungund
>Assignee: Janardhan Hungund
>Priority: Major
>  Labels: pull-request-available
>
> When the time-based priority caching is enabled, the block evictions 
> triggered when the cache is full should use the time-based priority caching 
> framework APIs to detect the cold files and evict the blocks of those files 
> first, as sketched below. This ensures that the hot data remains in the cache 
> while the cold data is evicted.
> Thanks,
> Janardhan
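
Not the actual framework API, but a toy sketch of the cold-first ordering the 
description above calls for, with a hypothetical hot/cold boundary timestamp:
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ColdFirstEvictionSketch {
  // Hypothetical descriptor: the file a cached block belongs to and that
  // file's newest cell timestamp (what a date-tiered layout exposes).
  record CachedBlock(String fileName, long fileMaxTimestamp) {}

  // Order eviction candidates so blocks of cold files (newest data older than
  // the hot boundary) are evicted before blocks of hot files.
  static List<CachedBlock> evictionOrder(List<CachedBlock> candidates, long hotBoundaryTs) {
    List<CachedBlock> order = new ArrayList<>(candidates);
    Comparator<CachedBlock> coldFirst =
        Comparator.comparing(b -> b.fileMaxTimestamp() >= hotBoundaryTs);
    order.sort(coldFirst);
    return order;
  }
}
{code}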



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28466) Integration of time-based priority logic of bucket cache in prefetch functionality of HBase.

2024-04-22 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28466.
--
Resolution: Fixed

Merged into feature branch. Thanks for the contribution, [~vinayakhegde] !

> Integration of time-based priority logic of bucket cache in prefetch 
> functionality of HBase.
> 
>
> Key: HBASE-28466
> URL: https://issues.apache.org/jira/browse/HBASE-28466
> Project: HBase
>  Issue Type: Task
>  Components: BucketCache
>Reporter: Janardhan Hungund
>Assignee: Vinayak Hegde
>Priority: Major
>  Labels: pull-request-available
>
> This Jira tracks the integration of the framework of APIs (implemented in 
> HBASE-28465) related to data tiering into the prefetch logic of HBase. The 
> implementation should filter out the cold data and enable the prefetching of 
> hot data into the bucket cache.
> Thanks,
> Janardhan
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28292) Make Delay prefetch property to be dynamically configured

2024-04-16 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28292.
--
Resolution: Fixed

The PR got merged into the master branch by [~psomogyi] and I have backported 
it into branch-3, branch-2, branch-2.6, branch-2.5 and branch-2.4. Thanks for 
the contributions, [~kabhishek4] !

> Make Delay prefetch property to be dynamically configured
> -
>
> Key: HBASE-28292
> URL: https://issues.apache.org/jira/browse/HBASE-28292
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.6.0, 2.4.17, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0, 2.5.8
>Reporter: Abhishek Kothalikar
>Assignee: Abhishek Kothalikar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0, 4.0.0-alpha-1, 2.7.0, 2.5.9
>
> Attachments: HBASE-28292.docx
>
>
> Make the prefetch delay dynamically configurable. The prefetch delay is 
> associated with the hbase.hfile.prefetch.delay configuration. There are some 
> cases where tuning hbase.hfile.prefetch.delay would help in achieving better 
> throughput. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28505) Implement enforcement to require Date Tiered Compaction for Time Range Data Tiering

2024-04-12 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28505.
--
Resolution: Fixed

Merged into HBASE-28463 feature branch.

> Implement enforcement to require Date Tiered Compaction for Time Range Data 
> Tiering
> ---
>
> Key: HBASE-28505
> URL: https://issues.apache.org/jira/browse/HBASE-28505
> Project: HBase
>  Issue Type: Task
>Reporter: Vinayak Hegde
>Assignee: Vinayak Hegde
>Priority: Major
>  Labels: pull-request-available
>
> The implementation should enforce the requirement of enabling Date Tiered 
> Compaction for Time Range Data Tiering. This restriction ensures that users 
> can fully benefit from Time Range Data Tiering functionality by disallowing 
> its usage unless Date Tiered Compaction is enabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28465) Implementation of framework for time-based priority bucket-cache.

2024-04-08 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28465.
--
Resolution: Fixed

Merged into the [HBASE-28463|https://github.com/apache/hbase/tree/HBASE-28463] 
feature branch.

> Implementation of framework for time-based priority bucket-cache.
> -
>
> Key: HBASE-28465
> URL: https://issues.apache.org/jira/browse/HBASE-28465
> Project: HBase
>  Issue Type: Task
>Reporter: Janardhan Hungund
>Assignee: Vinayak Hegde
>Priority: Major
>  Labels: pull-request-available
>
> In this Jira, we track the implementation of the framework for the time-based 
> priority cache.
> This framework would help us get the required metadata of the HFiles and 
> make decisions about the hotness or coldness of data.
> Thanks,
> Janardhan



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28458) BucketCache.notifyFileCachingCompleted may incorrectly consider a file fully cached

2024-04-05 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28458.
--
Resolution: Fixed

Merged into master, branch-3, branch-2 and branch-2.6. Thanks for reviewing it 
[~zhangduo] [~psomogyi] !

> BucketCache.notifyFileCachingCompleted may incorrectly consider a file fully 
> cached
> ---
>
> Key: HBASE-28458
> URL: https://issues.apache.org/jira/browse/HBASE-28458
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 3.0.0, 4.0.0-alpha-1, 2.7.0
>
>
> Noticed that 
> TestBucketCachePersister.testPrefetchBlockEvictionWhilePrefetchRunning was 
> flaky, failing whenever the block eviction happened while prefetch was still 
> ongoing.
> In the test, we pass an instance of BucketCache directly to the cache config, 
> so the test is actually placing both data and meta blocks in the bucket 
> cache. So sometimes, the test calls BucketCache.notifyFileCachingCompleted 
> after it has already evicted two blocks.  
> Inside BucketCache.notifyFileCachingCompleted, we iterate through the 
> backingMap entry set, counting the number of blocks for the given file. Then, 
> to consider whether the file is fully cached or not, we do the following 
> validation:
> {noformat}
> if (dataBlockCount == count.getValue() || totalBlockCount == 
> count.getValue()) {
>   LOG.debug("File {} has now been fully cached.", fileName);
>   fileCacheCompleted(fileName, size);
> }  {noformat}
> But the test generates 57 total blocks, 55 data and 2 meta blocks. It evicts 
> two blocks and asserts that the file hasn't been considered fully cached. 
> When these evictions happen while prefetch is still going, we'll pass that 
> check, as the number of blocks for the file in the backingMap would still 
> be 55, which is what we pass as dataBlockCount.
> As BucketCache is intended for storing data blocks only, I believe we should 
> make sure BucketCache.notifyFileCachingCompleted only accounts for data 
> blocks. Also, the 
> TestBucketCachePersister.testPrefetchBlockEvictionWhilePrefetchRunning should 
> be updated to consistently reproduce the eviction concurrent to the prefetch. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28450) BucketCache.evictBlocksByHfileName won't work after a cache recovery from file

2024-03-28 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28450.
--
Resolution: Fixed

Merged to master, branch-3, branch-2 and branch-2.6. Thanks for the reviews, 
[~psomogyi]  [~ankit.jhil]!

> BucketCache.evictBlocksByHfileName won't work after a cache recovery from file
> -
>
> Key: HBASE-28450
> URL: https://issues.apache.org/jira/browse/HBASE-28450
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.1
>
>
> HBASE-27313, HBASE-27686 and HBASE-27743 have extended BucketCache persistent 
> cache capabilities to make it resilient to RS crashes or non-graceful stops, 
> when using a file-based ioengine for BucketCache.
> BucketCache maintains two main collections for tracking blocks in the cache: 
> backingMap and blocksByHFile. The former is used as the main index of blocks 
> for the actual cache, whilst the latter is a set of all blocks in the cache 
> ordered by name, in order to conveniently and efficiently retrieve the list 
> of all blocks from a single file in the BucketCache.evictBlocksByHfileName 
> method.
>  
> The problem is that at cache recovery time, we are not populating the 
> blocksByHFile set, which causes any calls to the 
> BucketCache.evictBlocksByHfileName method to not evict any blocks once we 
> have recovered the cache from the cache persistence file (for instance, after 
> an RS restart).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28458) BucketCache.notifyFileCachingCompleted may incorrectly consider a file fully cached

2024-03-26 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28458:


 Summary: BucketCache.notifyFileCachingCompleted may incorrectly 
consider a file fully cached
 Key: HBASE-28458
 URL: https://issues.apache.org/jira/browse/HBASE-28458
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


Noticed that 
TestBucketCachePersister.testPrefetchBlockEvictionWhilePrefetchRunning was 
flaky, failing whenever the block eviction happened while prefetch was still 
ongoing.

In the test, we pass an instance of BucketCache directly to the cache config, 
so the test is actually placing both data and meta blocks in the bucket cache. 
So sometimes, the test calls BucketCache.notifyFileCachingCompleted after it 
has already evicted two blocks.  

Inside BucketCache.notifyFileCachingCompleted, we iterate through the 
backingMap entry set, counting the number of blocks for the given file. Then, 
to consider whether the file is fully cached or not, we do the following 
validation:
{noformat}
if (dataBlockCount == count.getValue() || totalBlockCount == count.getValue()) {
  LOG.debug("File {} has now been fully cached.", fileName);
  fileCacheCompleted(fileName, size);
}  {noformat}
But the test generates 57 total blocks, 55 data and 2 meta blocks. It evicts 
two blocks and asserts that the file hasn't been considered fully cached. When 
these evictions happen while prefetch is still going, we'll pass that check, as 
the number of blocks for the file in the backingMap would still be 55, 
which is what we pass as dataBlockCount.

As BucketCache is intended for storing data blocks only, I believe we should 
make sure BucketCache.notifyFileCachingCompleted only accounts for data blocks. 
Also, the 
TestBucketCachePersister.testPrefetchBlockEvictionWhilePrefetchRunning should 
be updated to consistently reproduce the eviction concurrent to the prefetch. 
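
A simplified sketch of the proposed check, counting only DATA blocks; the key 
type and its fields are stand-ins for the real backingMap keys, which carry the 
block type:
{code:java}
import java.util.Map;

public class FullyCachedCheckSketch {
  // Hypothetical cache key: file name plus whether the block is a DATA block.
  record Key(String fileName, boolean dataBlock) {}

  // A file is only "fully cached" when the number of its DATA blocks found in
  // the backing map matches the expected data block count, so meta blocks and
  // concurrent evictions can no longer satisfy the check by accident.
  static boolean isFullyCached(Map<Key, byte[]> backingMap, String fileName, long dataBlockCount) {
    long cachedDataBlocks = backingMap.keySet().stream()
      .filter(k -> k.fileName().equals(fileName) && k.dataBlock())
      .count();
    return cachedDataBlocks == dataBlockCount;
  }
}
{code}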

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28450) BucketCache.evictBlocksByHfileName won't work after a cache recovery from file

2024-03-20 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28450:


 Summary: BucketCache.evictBlocksByHfileName won't work after a 
cache recovery from file
 Key: HBASE-28450
 URL: https://issues.apache.org/jira/browse/HBASE-28450
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


HBASE-27313, HBASE-27686 and HBASE-27743 have extended BucketCache persistent 
cache capabilities to make it resilient to RS crashes or non-graceful stops, 
when using a file-based ioengine for BucketCache.

BucketCache maintains two main collections for tracking blocks in the cache: 
backingMap and blocksByHFile. The former is used as the main index of blocks 
for the actual cache, whilst the latter is a set of all blocks in the cache 
ordered by name, in order to conveniently and efficiently retrieve the list of 
all blocks from a single file in the BucketCache.evictBlocksByHfileName method.

The problem is that at cache recovery time, we are not populating the 
blocksByHFile set, which causes any calls to the 
BucketCache.evictBlocksByHfileName method to not evict any blocks once we have 
recovered the cache from the cache persistence file (for instance, after an RS 
restart).
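
A minimal sketch of the missing step, assuming the backing map has already been 
deserialised from the persistence file; the simplified key type stands in for 
the real BucketCache structures:
{code:java}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

public class RecoverySketch {
  // Per-file index used by evictBlocksByHfileName; simplified to plain strings.
  private final Set<String> blocksByHFile = new ConcurrentSkipListSet<>();

  // After recovering the backing map from disk, rebuild the per-file index so
  // evictions by file name can find the recovered blocks again.
  void rebuildBlocksByHFile(Map<String, Long> recoveredBackingMap) {
    blocksByHFile.addAll(recoveredBackingMap.keySet());
  }
}
{code}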



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28303) Interrupt cache prefetch thread when a heap usage threshold is reached

2024-02-06 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28303.
--
Resolution: Fixed

Merged into master, branch-3 and branch-2.

> Interrupt cache prefetch thread when a heap usage threshold is reached
> --
>
> Key: HBASE-28303
> URL: https://issues.apache.org/jira/browse/HBASE-28303
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.6.0, 2.4.17, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7, 2.7.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 3.0.0, 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2
>
>
> Mostly critical when using non-heap cache implementations, such as offheap or 
> file-based. If the cache medium is too large and there are many blocks to be 
> cached, it may create a lot of cache index objects in the RegionServer heap. 
> We should have guardrails to prevent caching from exhausting the available 
> heap.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28303) Interrupt cache prefetch thread when a heap usage threshold is reached

2024-01-10 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28303:


 Summary: Interrupt cache prefetch thread when a heap usage 
threshold is reached
 Key: HBASE-28303
 URL: https://issues.apache.org/jira/browse/HBASE-28303
 Project: HBase
  Issue Type: Improvement
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


Mostly critical when using non-heap cache implementations, such as offheap or 
file-based. If the cache medium is too large and there are many blocks to be 
cached, it may create a lot of cache index objects in the RegionServer heap. We 
should have guardrails to prevent caching from exhausting the available heap.
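
One way such a guardrail could look, using only standard JDK memory beans; the 
threshold value and the idea of checking it between blocks are assumptions, not 
the shipped implementation:
{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapGuardSketch {
  // Hypothetical threshold: stop prefetching once heap usage crosses 88% of max.
  private static final double HEAP_USAGE_THRESHOLD = 0.88;

  static boolean heapAboveThreshold() {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    return heap.getMax() > 0
      && (double) heap.getUsed() / heap.getMax() > HEAP_USAGE_THRESHOLD;
  }
}
{code}
A prefetch loop would then call heapAboveThreshold() between blocks and bail 
out early instead of exhausting the heap.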



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28259) Add java.base/java.io=ALL-UNNAMED open to jdk11_jvm_flags

2024-01-05 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28259.
--
Resolution: Fixed

Merged to master, branch-3, branch-2, branch-2.6, branch-2.5 and branch-2.4. 
Thanks for the contribution, [~mrzhao] !

> Add  java.base/java.io=ALL-UNNAMED open to jdk11_jvm_flags
> --
>
> Key: HBASE-28259
> URL: https://issues.apache.org/jira/browse/HBASE-28259
> Project: HBase
>  Issue Type: Bug
>  Components: java
>Reporter: Moran
>Assignee: Moran
>Priority: Trivial
>
> hbase shell
> 2023-12-13T23:49:50.846+08:00 [main] WARN FilenoUtil : Native subprocess 
> control requires open access to the JDK IO subsystem
> Pass '--add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens 
> java.base/java.io=ALL-UNNAMED' to enable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28246) Expose region cached size over JMX metrics and report in the RS UI

2023-12-14 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28246.
--
Resolution: Fixed

Thanks for reviewing it, [~psomogyi]! Merged into master, branch-3 and branch-2.

> Expose region cached size over JMX metrics and report in the RS UI
> --
>
> Key: HBASE-28246
> URL: https://issues.apache.org/jira/browse/HBASE-28246
> Project: HBase
>  Issue Type: Improvement
>  Components: BucketCache
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 3.0.0-beta-1, 4.0.0-alpha-1
>
> Attachments: Screenshot 2023-12-06 at 22.58.17.png
>
>
> With a large file-based bucket cache, the prefetch executor can take a long 
> time to cache all of the dataset. It would be useful to report how much of 
> each region's data is already cached, in order to give an idea of how much 
> work the prefetch executor has done.
> This adds JMX metrics for the region cache % and also reports the same in the 
> RS UI "Store File Metrics" tab as below:
> !Screenshot 2023-12-06 at 22.58.17.png|width=658,height=114!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28251) [SFT] Add description for specifying SFT impl during snapshot recovery

2023-12-11 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28251.
--
Resolution: Fixed

Merged into master branch. Thanks for reviewing this, [~psomogyi] , 
[~nihaljain.cs] and [~zhangduo] !

> [SFT] Add description for specifying SFT impl during snapshot recovery
> --
>
> Key: HBASE-28251
> URL: https://issues.apache.org/jira/browse/HBASE-28251
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 4.0.0-alpha-1
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 4.0.0-alpha-1
>
>
> HBASE-26286 added an option to the clone_snapshot command that allows for 
> specifying the SFT implementation during the snapshot recovery. This is really 
> useful when recovering snapshots imported from clusters not using the same 
> SFT impl as the one where we are cloning it. Without this, the cloned/restored 
> table will get created with the SFT impl of its original cluster, 
> requiring extra conversion steps using the MIGRATION tracker.
> This also fixes formatting for the "Bulk Data Generator Tool", which is 
> currently displayed as a sub-topic of the SFT chapter. It should have its 
> own chapter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28251) [SFT] Add description for specifying SFT impl during snapshot recovery

2023-12-08 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28251:


 Summary: [SFT] Add description for specifying SFT impl during 
snapshot recovery
 Key: HBASE-28251
 URL: https://issues.apache.org/jira/browse/HBASE-28251
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


HBASE-26286 added an option to the clone_snapshot command that allows for 
specifying the SFT implementation during the snapshot recovery. This is really 
useful when recovering snapshots imported from clusters not using the same SFT 
impl as the one where we are cloning it. Without this, the cloned/restored 
table will get created with the SFT impl of its original cluster, requiring 
extra conversion steps using the MIGRATION tracker.

This also fixes formatting for the "Bulk Data Generator Tool", which is 
currently displayed as a sub-topic of the SFT chapter. It should have its own 
chapter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28209) Create a jmx metrics to expose the oldWALs directory size

2023-12-08 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28209.
--
Resolution: Fixed

I have now merged the branch-2 PR and cherry-picked into branch-2.6, branch-2.5 
and branch-2.4. Thanks for the contribution, [~vinayakhegde] !

> Create a jmx metrics to expose the oldWALs directory size
> -
>
> Key: HBASE-28209
> URL: https://issues.apache.org/jira/browse/HBASE-28209
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1
>Reporter: Vinayak Hegde
>Assignee: Vinayak Hegde
>Priority: Major
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7, 2.7.0
>
>
> Create a JMX metric that can return the size of the old WALs in bytes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28246) Expose region cached size over JMX metrics and report in the RS UI

2023-12-06 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28246:


 Summary: Expose region cached size over JMX metrics and report in 
the RS UI
 Key: HBASE-28246
 URL: https://issues.apache.org/jira/browse/HBASE-28246
 Project: HBase
  Issue Type: Improvement
  Components: BucketCache
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil
 Attachments: Screenshot 2023-12-06 at 22.58.17.png

With a large file-based bucket cache, the prefetch executor can take a long 
time to cache all of the dataset. It would be useful to report how much of each 
region's data is already cached, in order to give an idea of how much work the 
prefetch executor has done.

This adds JMX metrics for the region cache % and also reports the same in the 
RS UI "Store File Metrics" tab as below:

!Screenshot 2023-12-06 at 22.58.17.png|width=658,height=114!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28186) Rebase CacheAwareBalance related commits into master branch

2023-11-30 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28186.
--
Resolution: Fixed

Thanks for helping with the backport to branch-2, [~ragarkar] ! 

> Rebase CacheAwareBalance related commits into master branch
> ---
>
> Key: HBASE-28186
> URL: https://issues.apache.org/jira/browse/HBASE-28186
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28211) BucketCache.blocksByHFile may leak on allocationFailure or if we reach io errors tolerated

2023-11-29 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28211.
--
Resolution: Fixed

Thanks for reviewing it, [~zhangduo]! I have now merged it to master, branch-3, 
branch-2, branch-2.5 and branch-2.6.

> BucketCache.blocksByHFile may leak on allocationFailure or if we reach io 
> errors tolerated
> --
>
> Key: HBASE-28211
> URL: https://issues.apache.org/jira/browse/HBASE-28211
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7
>
>
> We add blocks to BucketCache.blocksByHFile on doDrain before we have actually 
> added the block to the cache successfully. We may still fail to cache the 
> block if it is too big to fit any of the configured bucket sizes, or if we 
> fail to write it in the ioengine and reach the tolerated io errors threshold. 
> In such cases, the related block would remain in 
> BucketCache.blocksByHFile indefinitely.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28217) PrefetchExecutor should not run for files from CFs that have disabled BLOCKCACHE

2023-11-28 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28217.
--
Resolution: Fixed

Thanks for reviewing it, [~psomogyi]. I have merged it to master, branch-3, 
branch-2, branch-2.5 and branch-2.4.

> PrefetchExecutor should not run for files from CFs that have disabled 
> BLOCKCACHE
> 
>
> Key: HBASE-28217
> URL: https://issues.apache.org/jira/browse/HBASE-28217
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7
>
>
> HFilePReadReader relies on the return of CacheConfig.shouldPrefetchOnOpen 
> to decide if it should run the PrefetchExecutor for the files. 
> Currently, CacheConfig.shouldPrefetchOnOpen returns true if 
> "hbase.rs.prefetchblocksonopen" is set to true at the config, OR 
> PREFETCH_BLOCKS_ON_OPEN is set to true at CF level.
> There's also the CacheConfig.shouldCacheDataOnRead, which returns true if 
> both hbase.block.data.cacheonread is set to true at the config AND BLOCKCACHE 
> is set to true at CF level.
> If BLOCKCACHE is set to false at CF level, HFilePReadReader will still run 
> the PrefetchExecutor to read all the file's blocks from the FileSystem, but 
> then would find out the given block shouldn't be cached. 
> I believe we should change CacheConfig.shouldPrefetchOnOpen to return true 
> only if CacheConfig.shouldCacheDataOnRead is also true.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28176) PrefetchExecutor should stop once cache reaches capacity

2023-11-23 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28176.
--
Resolution: Fixed

Thanks for reviewing it, [~psomogyi]! Have merged into master, branch-3 and 
branch-2.

> PrefetchExecutor should stop once cache reaches capacity
> 
>
> Key: HBASE-28176
> URL: https://issues.apache.org/jira/browse/HBASE-28176
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 4.0.0-alpha-1
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1
>
>
> The prefetch executor runs a full scan on regions in the background once 
> regions are opened, if the "hbase.rs.prefetchblocksonopen" property is set to 
> true. However, if the store file size is much larger than the cache capacity, 
> we should interrupt the prefetch once it has reached the cache capacity, 
> otherwise it would just be triggering evictions of little value, since we 
> don't have any sense of block priority at that point. It's better to stop the 
> read, and let client reads cause the eviction of LFU blocks and cache the 
> most accessed blocks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28217) PrefetchExecutor should not run for files from CFs that have disabled BLOCKCACHE

2023-11-22 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28217:


 Summary: PrefetchExecutor should not run for files from CFs that 
have disabled BLOCKCACHE
 Key: HBASE-28217
 URL: https://issues.apache.org/jira/browse/HBASE-28217
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


HFilePReadReader relies on the return of CacheConfig.shouldPrefetchOnOpen 
to decide if it should run the PrefetchExecutor for the files. 

Currently, CacheConfig.shouldPrefetchOnOpen returns true if 
"hbase.rs.prefetchblocksonopen" is set to true at the config, OR 
PREFETCH_BLOCKS_ON_OPEN is set to true at CF level.

There's also the CacheConfig.shouldCacheDataOnRead, which returns true if both 
hbase.block.data.cacheonread is set to true at the config AND BLOCKCACHE is set 
to true at CF level.

If BLOCKCACHE is set to false at CF level, HFilePReadReader will still run the 
PrefetchExecutor to read all the file's blocks from the FileSystem, but then 
would find out the given block shouldn't be cached. 

I believe we should change CacheConfig.shouldPrefetchOnOpen to return true only 
if CacheConfig.shouldCacheDataOnRead is also true.
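
A sketch of the proposed combination, with simplified boolean fields standing 
in for the real CacheConfig state:
{code:java}
public class CacheConfigSketch {
  private final boolean prefetchOnOpenConf;   // hbase.rs.prefetchblocksonopen
  private final boolean prefetchOnOpenCf;     // PREFETCH_BLOCKS_ON_OPEN at CF level
  private final boolean cacheDataOnReadConf;  // hbase.block.data.cacheonread
  private final boolean blockCacheEnabledCf;  // BLOCKCACHE at CF level

  CacheConfigSketch(boolean prefetchOnOpenConf, boolean prefetchOnOpenCf,
      boolean cacheDataOnReadConf, boolean blockCacheEnabledCf) {
    this.prefetchOnOpenConf = prefetchOnOpenConf;
    this.prefetchOnOpenCf = prefetchOnOpenCf;
    this.cacheDataOnReadConf = cacheDataOnReadConf;
    this.blockCacheEnabledCf = blockCacheEnabledCf;
  }

  boolean shouldCacheDataOnRead() {
    return cacheDataOnReadConf && blockCacheEnabledCf;
  }

  // Proposed: never prefetch a file whose CF has block caching disabled.
  boolean shouldPrefetchOnOpen() {
    return (prefetchOnOpenConf || prefetchOnOpenCf) && shouldCacheDataOnRead();
  }
}
{code}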



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28211) BucketCache.blocksByHFile may leak on allocationFailure or if we reach io errors tolerated

2023-11-20 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28211:


 Summary: BucketCache.blocksByHFile may leak on allocationFailure 
or if we reach io errors tolerated
 Key: HBASE-28211
 URL: https://issues.apache.org/jira/browse/HBASE-28211
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


We add blocks to BucketCache.blocksByHFile on doDrain before we have actually 
added the block to the cache successfully. We may still fail to cache the block 
if it is too big to fit any of the configured bucket sizes, or if we fail to 
write it in the ioengine and reach the tolerated io errors threshold. In such 
cases, the related block would remain in BucketCache.blocksByHFile 
indefinitely.
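
A sketch of one way to avoid the leak, indexing the block only once the write 
has actually succeeded; the Writer interface and string keys are 
simplifications, not the real doDrain signature:
{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

public class DrainSketch {
  private final Set<String> blocksByHFile = new ConcurrentSkipListSet<>();

  // Hypothetical stand-in for bucket allocation plus the IOEngine write.
  interface Writer {
    void write(String cacheKey, byte[] block) throws Exception;
  }

  void drainOne(String cacheKey, byte[] block, Writer writer) {
    try {
      writer.write(cacheKey, block);
      // Only index the block after it has actually made it into the cache.
      blocksByHFile.add(cacheKey);
    } catch (Exception allocationOrIoFailure) {
      // Nothing was cached, so nothing is left behind in blocksByHFile.
    }
  }
}
{code}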



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28174) DELETE endpoint in REST API does not support deleting binary row keys/columns

2023-11-17 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28174.
--
Resolution: Fixed

Thanks for the contribution, [~james_udiljak_bhp]. I had now merged the PR on 
master branch and cherry-picked it to branch-3 and branch-2.

> DELETE endpoint in REST API does not support deleting binary row keys/columns
> -
>
> Key: HBASE-28174
> URL: https://issues.apache.org/jira/browse/HBASE-28174
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1
>Reporter: James Udiljak
>Assignee: James Udiljak
>Priority: Blocker
> Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1
>
> Attachments: delete_base64_1.png
>
>
> h2. Notes
> This is the first time I have raised an issue in the ASF Jira. Please let me 
> know if there's anything I need to adjust on the issue to fit in with your 
> development flow.
> I have marked the priority as "blocker" because this issue blocks me as a 
> user of the HBase REST API from deploying an effective solution for our 
> setup. Please feel free to change this if the Priority field has another 
> meaning to you.
> I have also chosen 2.4.17 as the affected version because this is the version 
> I am running, however looking at the source code on GitHub in the default 
> branch, I think many other versions would be affected.
> h2. Description of Issue
> The DELETE operation in the [HBase REST 
> API|https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/rest/package-summary.html#operation_delete]
>  requires specifying row keys and column families/offsets in the URI (i.e. as 
> UTF-8 text). This makes it impossible to specify a delete operation via the 
> REST API for a binary row key or column family/offset, as single bytes with a 
> decimal value greater than 127 are not valid in UTF-8.
> Percent-encoding these "high" values does not work around the issue, as the 
> HBase REST API uses Java's {{URLDecoder.Decode(percentEncodedString, 
> "UTF-8")}} function, which replaces any percent-encoded byte in the range 
> {{%80}} to {{%FF}} with the [replacement 
> character|https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character].
>  Even if this were not the case, the row-key is ultimately [converted to a 
> byte 
> array|https://github.com/apache/hbase/blob/rel/2.4.17/hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/RowSpec.java#L60-L100]
>  using UTF-8 encoding, wherein code points >127 are encoded across multiple 
> bytes, corrupting the user-supplied row key.
> h2. Proposed Solution
> I do not believe it is possible to allow encoding of arbitrary bytes in the 
> URL for the DELETE endpoint without breaking compatibility for any users who 
> may have been unknowingly UTF-8 encoding their binary row keys. Even if it 
> were possible, the syntax would likely be terse.
> Instead, I propose a new version of the DELETE endpoint that would accept row 
> keys and column families/offsets in the request _body_ (using Base64 encoding 
> for the JSON and XML formats, and bare binary for protobuf). This new 
> endpoint would follow the same conventions as the PUT operations, except that 
> cell values would not need to be specified (unless the user is performing a 
> check-and-delete operation).
> As an additional benefit, using the request body could potentially allow for 
> deleting multiple rows in a single request, which would drastically improve 
> the efficiency of my use case.
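
A small stand-alone illustration of the corruption described above, using only 
standard JDK classes:
{code:java}
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class BinaryRowKeyDemo {
  public static void main(String[] args) throws Exception {
    // A lone percent-encoded byte above 0x7F is not a valid UTF-8 sequence...
    String decoded = URLDecoder.decode("%FF", "UTF-8");
    // ...so it decodes to the U+FFFD replacement character, and re-encoding it
    // as UTF-8 produces three bytes instead of the original single 0xFF byte.
    byte[] roundTripped = decoded.getBytes(StandardCharsets.UTF_8);
    System.out.println(decoded);              // the replacement character
    System.out.println(roundTripped.length);  // 3, not 1: the row key is corrupted
  }
}
{code}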



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28189) Fix the miss count in one of CombinedBlockCache getBlock implementations

2023-11-09 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28189.
--
Resolution: Fixed

Thanks for reviewing this, [~psomogyi]. I had merged this to master, branch-3, 
branch-2, branch-2.5 and branch-2.4.

> Fix the miss count in one of CombinedBlockCache getBlock implementations
> 
>
> Key: HBASE-28189
> URL: https://issues.apache.org/jira/browse/HBASE-28189
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7
>
>
> In one of the 
> CombinedBlockCache.getBlock(cacheKey, caching, repeat, updateCacheMetrics) 
> implementations we always compute a miss in L1 if the passed block is of type 
> DATA. We should compute the miss in only one of the caches, not both.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28189) Fix the miss count in one of CombinedBlockCache getBlock implementations

2023-11-06 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28189:


 Summary: Fix the miss count in one of CombinedBlockCache getBlock 
implementations
 Key: HBASE-28189
 URL: https://issues.apache.org/jira/browse/HBASE-28189
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


In one of the 
CombinedBlockCache.getBlock(cacheKey, caching, repeat, updateCacheMetrics) 
implementations we always compute a miss in L1 if the passed block is of type 
DATA. We should compute the miss in only one of the caches, not both.
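
A toy sketch of the intended accounting, where only the cache that is actually 
expected to hold the block records the miss; the interfaces and method names 
here are simplified, not the real HBase API:
{code:java}
public class CombinedLookupSketch {
  interface Cache {
    byte[] get(String key, boolean updateMetrics);
  }

  private final Cache l1; // on-heap cache, holds index/bloom (meta) blocks
  private final Cache l2; // bucket cache, holds DATA blocks

  CombinedLookupSketch(Cache l1, Cache l2) {
    this.l1 = l1;
    this.l2 = l2;
  }

  // For a DATA block, consult L2 directly and let only L2 record a miss,
  // instead of also charging a miss to L1, which was never expected to hold it.
  byte[] getDataBlock(String key) {
    return l2.get(key, true);
  }

  byte[] getMetaBlock(String key) {
    byte[] block = l1.get(key, true);
    return block != null ? block : l2.get(key, false);
  }
}
{code}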



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28186) Rebase CacheAwareBalance related commits into master branch

2023-11-02 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28186:


 Summary: Rebase CacheAwareBalance related commits into master 
branch
 Key: HBASE-28186
 URL: https://issues.apache.org/jira/browse/HBASE-28186
 Project: HBase
  Issue Type: Sub-task
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28168) Add option in RegionMover.java to isolate one or more regions on the RegionSever

2023-10-27 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28168.
--
Release Note: 
This adds a new "isolate_regions" operation to RegionMover, which allows 
operators to pass a list of region encoded ids to be "isolated" in the passed 
RegionServer. 
Regions currently deployed in the RegionServer that are not in the passed list 
of regions would be moved to other RegionServers. Regions in the passed list 
that are currently on other RegionServers would be moved to the passed 
RegionServer.

Please refer to the command help for further information.
  Resolution: Fixed

> Add option in RegionMover.java to isolate one or more regions on the 
> RegionSever
> 
>
> Key: HBASE-28168
> URL: https://issues.apache.org/jira/browse/HBASE-28168
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.6, 4.0.0-alpha-1
>Reporter: Mihir Monani
>Assignee: Mihir Monani
>Priority: Minor
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7
>
>
> Sometimes one or more HBase regions on an RS are under high load. This can 
> lead to resource starvation for other regions hosted on the RS. It might be 
> necessary to isolate one or more regions on the RS so that the regions with 
> heavy load don't impact other regions hosted on the same RS.
> The RegionMover.java class provides a way to load/unload the regions from a 
> specific RS. It would be good to have an option to pass a list of region 
> hashes that should be left (or moved) on the RS and put the RS in 
> draining/decommission mode so HMaster doesn't assign new regions to the RS.
> Ex.
> {code:java}
> --isolateRegionIds regionHash1,regionHash2,regionHash3{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28170) Put the cached time at the beginning of the block; run cache validation in the background when retrieving the persistent cache

2023-10-24 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28170.
--
Resolution: Fixed

Merged into master, branch-3 and branch-2. Thanks for reviewing it, [~psomogyi]!

> Put the cached time at the beginning of the block; run cache validation in 
> the background when retrieving the persistent cache
> --
>
> Key: HBASE-28170
> URL: https://issues.apache.org/jira/browse/HBASE-28170
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 3.0.0, 4.0.0-alpha-1
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1
>
>
> In HBASE-28004, we added a "cached time" long at the end of each block on the 
> bucket cache. We also record the cached time in the backing map we persist to 
> disk periodically, in order to retrieve the cache upon crashes/restarts. The 
> persisted backing map includes the last modification time of the cache itself.
> On restarts, once we read the backing map from the persisted file, we compare 
> the last modification time of the cache recorded there against the last 
> modification time of the cache. If those differ, it means the cache has been 
> updated after the backing map has been persisted, so the backing map might 
> not be accurate. We then iterate through the backing map entries and compare 
> the entries' cached time against the related block in the cache, and if those 
> differ, we remove the entry from the map. 
> Currently this validation is made at RS initialisation time, but with caches 
> as large as 1.6TB/30M+ blocks, it can last up to an hour, meaning the RS is 
> useless over that time. This PR changes this validation to be performed in 
> the background, whilst direct accesses to a block in the cache would also 
> perform the "cached time" comparison.
> This PR also moves the "cached time" to the beginning of the block in the 
> cache, instead of the end. We noticed that with the "cached time" at the end 
> we can fail to ensure consistency under some conditions. Consider the 
> following: 
> 1) A block B1 of size S gets allocated at offset 0 with cached time T1;
> 2) The backing map is persisted, containing B1 at offset 0 and cached time T1;
> 3) B1 is evicted. Its offset in the cache is now free, however its contents 
> are still there, including the cached time T1 at its end;
> 4) A new block B2 of size S/2 gets allocated at offset 0 with cached time T2;
> 5) RS crashes before the backing map gets saved, so the persisted backing map 
> still has only the reference to B1, but not B2;
> 6) At restart, we run the validation. Because B2 was half the size of B1, we 
> haven't overridden B1's cached time in the cache, so we will successfully 
> validate B1, although its content is now half overridden by B2. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28176) PrefetchExecutor should stop once cache reaches capacity

2023-10-23 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28176:


 Summary: PrefetchExecutor should stop once cache reaches capacity
 Key: HBASE-28176
 URL: https://issues.apache.org/jira/browse/HBASE-28176
 Project: HBase
  Issue Type: Improvement
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


The prefetch executor runs a full scan on regions in the background once 
regions are opened, if the "hbase.rs.prefetchblocksonopen" property is set to 
true. However, if the store file size is much larger than the cache capacity, 
we should interrupt the prefetch once it has reached the cache capacity, 
otherwise it would just trigger evictions of little value, since we don't have 
any sense of block priority at that point. It's better to stop the read and let 
client reads drive the eviction of least-frequently-used blocks, so that the 
most accessed blocks end up cached.
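
As an illustration only (not the final implementation), below is a minimal 
sketch of how the prefetch loop could bail out once the cache is close to full. 
The BlockCache accessors used (getCurrentSize, getMaxSize) exist on the 
BlockCache interface; the surrounding loop, the threshold and the 
readNextBlockIntoCache helper are hypothetical:
{code:java}
// Illustrative sketch only: stop prefetching once cache usage crosses a threshold.
static final float PREFETCH_STOP_RATIO = 0.98f; // hypothetical threshold

void prefetchFile(BlockCache cache, HFile.Reader reader) throws IOException {
  long offset = 0;
  long end = reader.getTrailer().getLoadOnOpenDataOffset();
  while (offset < end) {
    // Bail out once the cache is (nearly) full: any further prefetch would only
    // trigger evictions of blocks whose priority we don't know yet.
    if (cache.getCurrentSize() >= cache.getMaxSize() * PREFETCH_STOP_RATIO) {
      return;
    }
    offset += readNextBlockIntoCache(reader, offset); // hypothetical helper
  }
}
{code}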



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28170) Put the cached time at the beginning of the block run cache validation in the background when retrieving the persistent cache

2023-10-19 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28170:


 Summary: Put the cached time at the beginning of the block run 
cache validation in the background when retrieving the persistent cache
 Key: HBASE-28170
 URL: https://issues.apache.org/jira/browse/HBASE-28170
 Project: HBase
  Issue Type: Sub-task
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


In HBASE-28004, we added a "cached time" long at the end of each block on the 
bucket cache. We also record the cached time in the backing map we persist to 
disk periodically, in order to retrieve the cache upon crashes/restarts. The 
persisted backing map includes the last modification time of the cache itself.

On restarts, once we read the backing map from the persisted file, we compare 
the last modification time recorded there against the actual last modification 
time of the cache. If those differ, it means the cache has been updated after 
the backing map was persisted, so the backing map might not be accurate. We 
then iterate through the backing map entries and compare each entry's cached 
time against the related block in the cache, and if those differ, we remove the 
entry from the map.

Currently this validation is done at RS initialisation time, but with caches as 
large as 1.6TB/30M+ blocks, it can last up to an hour, meaning the RS is useless 
over that time. This PR changes this validation to be performed in the 
background, whilst direct accesses to a block in the cache also perform the 
"cached time" comparison.

This PR also moves the "cached time" to the beginning of the block in the 
cache, instead of the end. We noticed that with the "cached time" at the end we 
can fail to ensure consistency under some conditions. Consider the following: 
1) A block B1 of size S gets allocated at offset 0 with cached time T1;
2) The backing map is persisted, containing B1 at offset 0 and cached time T1;
3) B1 is evicted. Its offset in the cache is now free, however its contents 
are still there, including the cached time T1 at its end;
4) A new block B2 of size S/2 gets allocated at offset 0 with cached time T2;
5) The RS crashes before the backing map gets saved, so the persisted backing 
map still has only the reference to B1, but not B2;
6) At restart, we run the validation. Because B2 was half the size of B1, we 
haven't overwritten B1's cached time in the cache, so we will successfully 
validate B1, although its content is now half overwritten by B2.
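
For illustration, a minimal sketch of the per-entry check the background 
validation performs, comparing the cached time recorded in the persisted entry 
against the cached time now stored at the start of the block slot. The accessor 
names below (getCachedTime, offset, readCachedTimeAtOffset) are hypothetical; 
the actual logic lives in BucketCache:
{code:java}
// Illustrative sketch of the background check; accessor names are hypothetical.
void validateCacheEntries(Map<BlockCacheKey, BucketEntry> backingMap) {
  Iterator<Map.Entry<BlockCacheKey, BucketEntry>> it = backingMap.entrySet().iterator();
  while (it.hasNext()) {
    Map.Entry<BlockCacheKey, BucketEntry> entry = it.next();
    long persistedTime = entry.getValue().getCachedTime();
    // Read the "cached time" long now stored at the *beginning* of the block
    // slot referenced by this entry (hypothetical helper).
    long timeInCache = readCachedTimeAtOffset(entry.getValue().offset());
    if (persistedTime != timeInCache) {
      // The slot was reused after the map was persisted; the entry is stale.
      it.remove();
    }
  }
}
{code}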




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27999) Implement cache aware load balancer

2023-10-16 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27999.
--
Resolution: Fixed

Merged into the feature branch.

> Implement cache aware load balancer
> ---
>
> Key: HBASE-27999
> URL: https://issues.apache.org/jira/browse/HBASE-27999
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer
>Reporter: Rahul Agarkar
>Assignee: Rahul Agarkar
>Priority: Major
>
> HBase uses an ephemeral cache, reading blocks from the slow storage and 
> storing them in the bucket cache. This cache is warmed up every time a region 
> server is started. Depending on the data size and the configured cache size, 
> the cache warm-up can take anywhere between a few minutes and a few hours. 
> Doing this every time the region server starts can be a very expensive 
> process. To eliminate this, HBASE-27313 implemented the cache persistence 
> feature, where the region servers periodically persist the blocks cached in 
> the bucket cache. This persisted information is then used to resurrect the 
> cache in the event of a region server restart, whether a normal restart or a 
> crash.
> This feature aims to enhance that capability so that the balancer 
> implementation considers the cache allocation of each region on the region 
> servers when calculating a new assignment plan. It uses the region/region 
> server cache allocation info reported by region servers to calculate the 
> percentage of HFiles cached for each region on its hosting server, and then 
> uses that as another factor when deciding on an optimal new assignment plan.
>  
> A design document describing the balancer can be found at 
> https://docs.google.com/document/d/1A8-eVeRhZjwL0hzFw9wmXl8cGP4BFomSlohX2QcaFg4/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HBASE-27389) Add cost function in balancer to consider the cost of building bucket cache before moving regions

2023-10-16 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil reopened HBASE-27389:
--

> Add cost function in balancer to consider the cost of building bucket cache 
> before moving regions
> -
>
> Key: HBASE-27389
> URL: https://issues.apache.org/jira/browse/HBASE-27389
> Project: HBase
>  Issue Type: Task
>  Components: Balancer
>Reporter: Rahul Agarkar
>Assignee: Rahul Agarkar
>Priority: Major
>
> HBase currently uses the StochasticLoadBalancer to determine the cost of 
> moving regions from one RS to another. Each cost function gives a result 
> between 0 and 1, with 0 being the lowest cost and 1 the highest. The balancer 
> iterates through each cost function and comes up with the total cost. The 
> balancer then creates multiple balancing plans from random actions and 
> computes the cost of each plan as if it were executed; if the cost of a plan 
> is less than the initial cost, the plan is executed.
> Implement a new "CacheAwareCostFunction" which takes into account whether a 
> region is fully cached and returns the highest cost if the plan suggests 
> moving this region.
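
For illustration only, a rough sketch of the kind of check such a cost function 
could do, in the StochasticLoadBalancer cost-function style (a value between 0 
and 1). cacheRatioOnCurrentServer() is a hypothetical lookup of how much of the 
region is cached where it currently lives; the real class is the 
CacheAwareCostFunction developed on the feature branch:
{code:java}
// Illustrative only: return the maximum cost when the plan would move a region
// that is fully cached on its current server.
double cost(List<RegionInfo> regionsBeingMoved) {
  double cost = 0.0;
  for (RegionInfo region : regionsBeingMoved) {
    double cachedRatio = cacheRatioOnCurrentServer(region); // hypothetical
    if (cachedRatio >= 1.0) {
      return 1.0; // fully cached region: moving it throws the warm cache away
    }
    cost = Math.max(cost, cachedRatio);
  }
  return cost;
}
{code}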



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27389) Add cost function in balancer to consider the cost of building bucket cache before moving regions

2023-10-16 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27389.
--
Resolution: Fixed

Merged into the feature branch.

> Add cost function in balancer to consider the cost of building bucket cache 
> before moving regions
> -
>
> Key: HBASE-27389
> URL: https://issues.apache.org/jira/browse/HBASE-27389
> Project: HBase
>  Issue Type: Task
>  Components: Balancer
>Reporter: Rahul Agarkar
>Assignee: Rahul Agarkar
>Priority: Major
>
> HBase currently uses the StochasticLoadBalancer to determine the cost of 
> moving regions from one RS to another. Each cost function gives a result 
> between 0 and 1, with 0 being the lowest cost and 1 the highest. The balancer 
> iterates through each cost function and comes up with the total cost. The 
> balancer then creates multiple balancing plans from random actions and 
> computes the cost of each plan as if it were executed; if the cost of a plan 
> is less than the initial cost, the plan is executed.
> Implement a new "CacheAwareCostFunction" which takes into account whether a 
> region is fully cached and returns the highest cost if the plan suggests 
> moving this region.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28097) Add documentation section for the Cache Aware balancer function

2023-09-19 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28097:


 Summary: Add documentation section for the Cache Aware balancer 
function
 Key: HBASE-28097
 URL: https://issues.apache.org/jira/browse/HBASE-28097
 Project: HBase
  Issue Type: Sub-task
Reporter: Wellington Chevreuil






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27998) Enhance region metrics to include prefetch ratio for each region

2023-08-29 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27998.
--
Resolution: Done

Merged into the feature branch.

> Enhance region metrics to include prefetch ratio for each region
> 
>
> Key: HBASE-27998
> URL: https://issues.apache.org/jira/browse/HBASE-27998
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Rahul Agarkar
>Assignee: Rahul Agarkar
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28044) Reduce frequency of saving backing map in persistence cache

2023-08-24 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28044:


 Summary: Reduce frequency of saving backing map in persistence 
cache
 Key: HBASE-28044
 URL: https://issues.apache.org/jira/browse/HBASE-28044
 Project: HBase
  Issue Type: Sub-task
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


Currently we always write the whole cache mapping into the persistent map file. 
This is not a lightweight operation: on a full 1.6TB cache with tens of 
millions of blocks, the file can grow as large as 10GB. In the current 
persistent cache implementation, we flush it to disk every 1s. If we raise the 
"checkpoint" period, we risk losing more cache events in the event of a 
recovery.

This proposes reducing the frequency of saving the backing map as follows:
1) Save every block addition/eviction into a single transaction file on disk;
2) Checkpoint at higher intervals, consolidating all transactions into the 
larger map file;
3) In the event of failure, recovery would consist of loading the latest map 
file, then applying all the transaction files sequentially.
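
A rough sketch of what the append-only transaction log plus periodic 
checkpoint could look like; all names, the record format and the file layout 
below are hypothetical, not the committed implementation:
{code:java}
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Rough sketch only: append tiny records per cache mutation, consolidate rarely.
class PersistentCacheJournal {
  private DataOutputStream journal;
  private long lastCheckpointTs = System.currentTimeMillis();

  PersistentCacheJournal(String journalPath) throws IOException {
    this.journal = new DataOutputStream(new FileOutputStream(journalPath, true));
  }

  // Each cache mutation is appended as a small record instead of rewriting the map.
  synchronized void onBlockCached(String cacheKey, long offset, int size) throws IOException {
    journal.writeChar('A');
    journal.writeUTF(cacheKey);
    journal.writeLong(offset);
    journal.writeInt(size);
  }

  synchronized void onBlockEvicted(String cacheKey) throws IOException {
    journal.writeChar('E');
    journal.writeUTF(cacheKey);
  }

  // Consolidate only at larger intervals; the expensive full-map write is the Runnable.
  synchronized void maybeCheckpoint(long intervalMs, Runnable persistFullBackingMap,
      String journalPath) throws IOException {
    if (System.currentTimeMillis() - lastCheckpointTs < intervalMs) {
      return;
    }
    persistFullBackingMap.run();
    journal.close();
    journal = new DataOutputStream(new FileOutputStream(journalPath, false)); // truncate
    lastCheckpointTs = System.currentTimeMillis();
  }
  // On recovery: load the latest map file, then replay the journal records in order.
}
{code}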



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28004) Persistent cache map can get corrupt if crash happens midway through the write

2023-08-23 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28004.
--
Resolution: Fixed

> Persistent cache map can get corrupt if crash happens midway through the write
> --
>
> Key: HBASE-28004
> URL: https://issues.apache.org/jira/browse/HBASE-28004
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 3.0.0-alpha-4, 4.0.0-alpha-1
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1
>
>
> HBASE-27686 added a background thread for periodically saving the cache index 
> map, together with a list of completely cached files, so that we can recover 
> the cache state in case of a crash or restart. The problem is that the cache 
> index can become a few GB large (a sample case with 1.6TB of used bucket 
> cache would map to between 8GB and 10GB of indexes), and these writes take a 
> few seconds to complete, making it very likely that any RS crash leaves a 
> corrupt index file that can't be recovered when the RS starts again. Worse, 
> since we store the list of cached files in a separate file, this also leads 
> to cache inconsistencies: files appear in the list of cached files but are 
> never cached again once the RS is restarted, yet we have no cache index for 
> them, so every read ends up going to the FS.
> This task aims to refactor the cache persistence as follows:
> 1) Write both the list of completely cached files and the cache indexes into 
> a single file, so that they can be synced atomically;
> 2) When writing the persistent cache file, use a temp name first, then once 
> the write has successfully finished, rename it to the actual name. This way, 
> if a crash happens whilst the persistent cache is still being written, the 
> temp file would be corrupt, but we could still recover from the last 
> successful sync, and we would only lose the caching ops since the last sync.
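
For illustration, a minimal sketch of the "write temp, then rename" pattern 
from item 2, using plain java.nio; the temp suffix is an assumption and the 
serialization of the combined file list plus index is elided ("payload"):
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class AtomicPersistSketch {
  // Write the new snapshot to a temp file and only then swap it into place.
  static void persistAtomically(byte[] payload, String persistencePath) throws IOException {
    Path finalFile = Paths.get(persistencePath);
    Path tempFile = Paths.get(persistencePath + ".tmp"); // hypothetical temp suffix
    Files.write(tempFile, payload);
    // A crash before this point can only corrupt the temp file; the previously
    // persisted file stays intact. On most filesystems the rename below
    // replaces the old file atomically.
    Files.move(tempFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
  }
}
{code}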



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28041) Rebase HBASE-27389 branch with master and fix conflicts

2023-08-23 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-28041.
--
Resolution: Fixed

Rebased and fixed conflicts.

> Rebase HBASE-27389 branch with master and fix conflicts
> ---
>
> Key: HBASE-28041
> URL: https://issues.apache.org/jira/browse/HBASE-28041
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28041) Rebase HBASE-27389 branch with master

2023-08-23 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28041:


 Summary: Rebase HBASE-27389 branch with master
 Key: HBASE-28041
 URL: https://issues.apache.org/jira/browse/HBASE-28041
 Project: HBase
  Issue Type: Sub-task
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27997) Enhance prefetch executor to record region prefetch information along with the list of hfiles prefetched

2023-08-03 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27997.
--
Resolution: Done

> Enhance prefetch executor to record region prefetch information along with 
> the list of hfiles prefetched
> 
>
> Key: HBASE-27997
> URL: https://issues.apache.org/jira/browse/HBASE-27997
> Project: HBase
>  Issue Type: Sub-task
>  Components: BucketCache
>Affects Versions: 2.6.0, 3.0.0-alpha-4
>Reporter: Rahul Agarkar
>Assignee: Rahul Agarkar
>Priority: Major
>
> HBASE-27313 implemented the prefetch persistence feature where it persists 
> the list of hFiles prefetched in the bucket cache. This information is used 
> to reconstruct the cache in the event of a server restart/crash.
> Currently, only the list of hFiles is persisted.
> However, for the new PrefetchAwareLoadBalancer (work in progress) to work, we 
> need the information about how much a region is prefetched on a region server.
> This Jira introduces an additional map in the prefetch executor to maintain 
> the information about how much of each region has been prefetched on that 
> region server. The prefetched size of a region is calculated as the total 
> size of all hFiles prefetched for that region.
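
As a simple illustration of the bookkeeping described above (the class and 
field names are illustrative, not the actual names used by the prefetch 
executor):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch: aggregate prefetched hfile sizes per region.
class RegionPrefetchSizeTracker {
  // encoded region name -> total bytes of hfiles prefetched for that region
  private final Map<String, Long> regionPrefetchedBytes = new ConcurrentHashMap<>();

  void onFilePrefetched(String encodedRegionName, long hfileSize) {
    regionPrefetchedBytes.merge(encodedRegionName, hfileSize, Long::sum);
  }

  void onFileEvicted(String encodedRegionName, long hfileSize) {
    regionPrefetchedBytes.computeIfPresent(encodedRegionName,
      (region, total) -> Math.max(0, total - hfileSize));
  }

  long prefetchedBytes(String encodedRegionName) {
    return regionPrefetchedBytes.getOrDefault(encodedRegionName, 0L);
  }
}
{code}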



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28004) Persistent cache map can get corrupt if crash happens midway through the write

2023-08-01 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-28004:


 Summary: Persistent cache map can get corrupt if crash happens 
midway through the write
 Key: HBASE-28004
 URL: https://issues.apache.org/jira/browse/HBASE-28004
 Project: HBase
  Issue Type: Sub-task
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27871) Meta replication stuck forever if wal it's still reading gets rolled and deleted

2023-06-20 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27871.
--
Resolution: Fixed

> Meta replication stuck forever if wal it's still reading gets rolled and 
> deleted
> 
>
> Key: HBASE-27871
> URL: https://issues.apache.org/jira/browse/HBASE-27871
> Project: HBase
>  Issue Type: Bug
>  Components: meta replicas
>Affects Versions: 2.6.0, 2.4.16, 2.4.17, 2.5.4
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 2.4.18, 2.5.6
>
>
> This affects branch-2 based releases only (in master, HBASE-26416 refactored 
> region replication to not rely on the replication framework anymore).
> Per the original [meta region replicas 
> design|https://docs.google.com/document/d/1jJWVc-idHhhgL4KDRpjMsQJKCl_NRaCLGiH3Wqwd3O8/edit],
>  we use most of the replication framework for communicating changes in the 
> primary replica back to the secondary ones, but we skip storing the queue 
> state in ZK. In the event of a region replication crash, we should let the 
> related replication source thread be interrupted, so that 
> RegionReplicaReplicationEndpoint would set up a new source from scratch and 
> make sure to update the secondary replicas.
>  
> We have run into a situation in one of our customers' clusters where the 
> region replica source faced a long lag (probably because the RSes hosting the 
> secondary replicas were busy and slower in processing the region replication 
> entries), so the current wal got rolled and eventually deleted whilst the 
> replication source reader was still referencing it. In such cases, 
> ReplicationSourceReader only sees the IOException and keeps retrying the read 
> indefinitely, but since the file is now gone, it will just get stuck there 
> forever. In the particular case of FNFE (which I believe would only happen 
> for region replication), we should just raise an exception and let 
> RegionReplicaReplicationEndpoint handle it to reset the region replication 
> source.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27915) Update hbase_docker with an extra Dockerfile compatible with mac m1 platform

2023-06-07 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27915:


 Summary: Update hbase_docker with an extra Dockerfile compatible 
with mac m1 platform
 Key: HBASE-27915
 URL: https://issues.apache.org/jira/browse/HBASE-27915
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


When trying to use the current Dockerfile under "./dev-support/hbase_docker" on 
m1 macs, the docker build fails at the git clone & mvn build stage with the 
below error:
{noformat}
 #0 8.214 qemu-x86_64: Could not open '/lib64/ld-linux-x86-64.so.2': No such 
file or directory
{noformat}

It turns out that for mac m1, we have to explicitly define the platform flag 
for the ubuntu image. I thought we could add a note to the README, together 
with an "m1" subfolder containing a modified copy of this Dockerfile that works 
on mac m1s.
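
For reference, the platform pinning referred to above could look like the 
snippet below; the ubuntu tag and the image name are assumptions, not 
necessarily what the actual Dockerfile uses:
{noformat}
# In the m1 copy of the Dockerfile, pin the platform explicitly:
FROM --platform=linux/amd64 ubuntu:22.04

# or, alternatively, pass it at build time:
docker build --platform linux/amd64 -t hbase_docker .
{noformat}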



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27820) HBase is not starting due to Jersey library conflicts with javax.ws.rs.api jar

2023-06-01 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27820.
--
Resolution: Fixed

Also merged the branch-2.4 PR.

> HBase is not starting due to Jersey library conflicts with javax.ws.rs.api jar
> --
>
> Key: HBASE-27820
> URL: https://issues.apache.org/jira/browse/HBASE-27820
> Project: HBase
>  Issue Type: Task
>  Components: dependencies
>Affects Versions: 3.0.0-alpha-3
>Reporter: Rahul Agarkar
>Assignee: Rahul Agarkar
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.5, 2.4.18
>
>
> With some recent Atlas changes adding HTTP based hook support, HBase does not 
> start because of conflicts between the jersey jars and the rs-api jar.
> This Jira is to exclude the javax.ws.rs-api.jar from the HBase classpath.
> HBase uses shaded jersey jars and hence does not need to use this jar 
> directly. However, it still adds this jar to the CLASSPATH while starting the 
> server. Atlas, on the other hand, uses a non-shaded version of the 
> javax.ws.rs-api jar, which causes this conflict and makes the HBase server 
> fail while initializing the Atlas coprocessor.
> Since HBase uses the shaded jersey jar and does not use this jar directly, it 
> should be removed from the bundle, as it may cause similar conflicts with 
> other client applications potentially using it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27874) Problem in flakey generated report causes pre-commit run to fail

2023-05-18 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27874:


 Summary: Problem in flakey generated report causes pre-commit run 
to fail
 Key: HBASE-27874
 URL: https://issues.apache.org/jira/browse/HBASE-27874
 Project: HBase
  Issue Type: Bug
  Components: build
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


Have noticed the UT pre-commit run failed on this latest PR for branch-2 with 
the below:
{noformat}
Thu May 18 10:37:32 AM UTC 2023
cd 
/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-5241/yetus-jdk8-hadoop2-check/src/hbase-server
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-5241/yetus-m2/hbase-branch-2-patch-1
 --threads=4 
-Djava.io.tmpdir=/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-5241/yetus-jdk8-hadoop2-check/src/target
 -DHBasePatchProcess -PrunAllTests 
-Dtest.exclude.pattern=**/regionserver.TestMetricsRegionServer.java,**/master.procedure.TestSnapshotProcedureRSCrashes.java,**/security.access.TestAccessController.java,**/conf.TestConfigurationManagerWARNING:
 package jdk.internal.util.random not in 
java.base.java,**/io.hfile.bucket.TestPrefetchPersistence.java,**/client.TestFromClientSide3.java,**/replication.TestReplicationMetricsforUI.java,**/io.hfile.bucket.TestBucketCache.java,**/replication.regionserver.TestReplicationValueCompressedWAL.java,**/master.procedure.TestHBCKSCP.java,**/http.TestInfoServersACL.java,**/io.hfile.bucket.TestBucketCachePersister.java,**/replication.TestReplicationKillSlaveRS.java,**/regionserver.TestClearRegionBlockCache.java,**/master.TestUnknownServers.java,**/replication.TestReplicationKillSlaveRSWithSeparateOldWALs.java,**/quotas.TestClusterScopeQuotaThrottle.java,**/io.hfile.TestBlockEvictionOnRegionMovement.java,**/replication.regionserver.TestMetaRegionReplicaReplicationEndpoint.java,**/regionserver.TestRegionReplicas.java,**/coprocessor.TestCoprocessorEndpointTracing.java,**/master.region.TestMasterRegionCompaction.java,**/io.hfile.TestPrefetchRSClose.java
 -Dsurefire.firstPartForkCount=0.5C -Dsurefire.secondPartForkCount=0.5C clean 
test -fae


[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  0.861 s (Wall Clock)
[INFO] Finished at: 2023-05-18T10:37:34Z
[INFO] 
[ERROR] Unknown lifecycle phase "jdk.internal.util.random". You must specify a 
valid lifecycle phase or a goal in the format : or 
:[:]:. Available 
lifecycle phases are: validate, initialize, generate-sources, process-sources, 
generate-resources, process-resources, compile, process-classes, 
generate-test-sources, process-test-sources, generate-test-resources, 
process-test-resources, test-compile, process-test-classes, test, 
prepare-package, package, pre-integration-test, integration-test, 
post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, 
pre-site, site, post-site, site-deploy. -> [Help 1]
[ERROR] 
{noformat}
Note the "{+}**/conf.TestConfigurationManagerWARNING: package 
jdk.internal.util.random not in java.base.java{+}" passed as one of the 
supposedly flakey tests. Looking around our build scripts, I figured we pull 
the list of flakey from the "{+}excludes{+}" artifact generated by the latest 
"find flakey" build. It seems the [latest branch-2 
run|https://ci-hbase.apache.org/job/HBase-Find-Flaky-Tests/job/branch-2/1063/artifact/output/excludes]
 generated this artifact with the wrong name already:

{noformat}
**/replication.TestReplicationMetricsforUI.java,**/conf.TestConfigurationManagerWARNING:
 package jdk.internal.util.random not in 
java.base.java,**/master.region.TestMasterRegionCompaction.java,**/regionserver.TestRegionReplicas.java,**/replication.regionserver.TestReplicationValueCompressedWAL.java,**/coprocessor.TestCoprocessorEndpointTracing.java,**/quotas.TestClusterScopeQuotaThrottle.java,**/replication.TestReplicationKillSlaveRSWithSeparateOldWALs.java,**/client.TestFromClientSide3.java,**/io.hfile.TestBlockEvictionOnRegionMovement.java,**/io.hfile.bucket.TestPrefetchPersistence.java,**/regionserver.TestMetricsRegionServer.java,**/io.hfile.bucket.TestBucketCachePersister.java,**/regionserver.TestClearRegionBlockCache.java,**/master.procedure.TestHBCKSCP.java,**/replication.regionserver.TestMetaRegionReplicaReplicationEndpoint.java,**/security.access.TestAccessController.java,**/io.hfile.bucket.TestBucketCache.java,**/io.hfile.TestPrefetchRSClose.java,**/replication.TestReplicationKillSlaveRS.java,**/master.TestUnknownServers.java,**/http.TestInfoServersACL.java
{noformat}

Digging deeper, found that the "find flakey" build checks the UT output of 
latest nightly and flakey builds, to 

[jira] [Created] (HBASE-27871) Meta replication stuck forever if wal it's still reading gets rolled and deleted

2023-05-17 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27871:


 Summary: Meta replication stuck forever if wal it's still reading 
gets rolled and deleted
 Key: HBASE-27871
 URL: https://issues.apache.org/jira/browse/HBASE-27871
 Project: HBase
  Issue Type: Bug
  Components: meta replicas
Affects Versions: 2.5.4, 2.4.17
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


This affects branch-2 based releases only (in master, HBASE-26416 refactored 
region replication to not rely on the replication framework anymore).

Per the original [meta region replicas 
design|https://docs.google.com/document/d/1jJWVc-idHhhgL4KDRpjMsQJKCl_NRaCLGiH3Wqwd3O8/edit],
 we use most of the replication framework for communicating changes in the 
primary replica back to the secondary ones, but we skip storing the queue state 
in ZK. In the event of a region replication crash, we should let the related 
replication source thread be interrupted, so that 
RegionReplicaReplicationEndpoint would set up a new source from scratch and 
make sure to update the secondary replicas.
 
We have run into a situation in one of our customers' clusters where the region 
replica source faced a long lag (probably because the RSes hosting the 
secondary replicas were busy and slower in processing the region replication 
entries), so the current wal got rolled and eventually deleted whilst the 
replication source reader was still referencing it. In such cases, 
ReplicationSourceReader only sees the IOException and keeps retrying the read 
indefinitely, but since the file is now gone, it will just get stuck there 
forever. In the particular case of FNFE (which I believe would only happen for 
region replication), we should just raise an exception and let 
RegionReplicaReplicationEndpoint handle it to reset the region replication 
source.
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27789) Backport "HBASE-24914 Remove duplicate code appearing continuously in method ReplicationPeerManager.updatePeerConfig" to branch-2

2023-04-14 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27789.
--
Resolution: Fixed

Merged into branch-2.5. I had also made you a contributor, [~Li Zhexi], so you 
should be able to self-assign jiras now.

> Backport "HBASE-24914 Remove duplicate code appearing continuously in method 
> ReplicationPeerManager.updatePeerConfig" to branch-2
> -
>
> Key: HBASE-27789
> URL: https://issues.apache.org/jira/browse/HBASE-27789
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 2.5.0, 2.4.14
>Reporter: Li Zhexi
>Assignee: Li Zhexi
>Priority: Minor
> Fix For: 2.5.5
>
>
> Branch-2/ Branch-2.4/Branch-2.5 also have duplicate code in 
> ReplicationPeerManager#updatePeerConfig
> newPeerConfigBuilder.putAllConfiguration(oldPeerConfig.getConfiguration());
> newPeerConfigBuilder.putAllConfiguration(peerConfig.getConfiguration());
> newPeerConfigBuilder.putAllConfiguration(oldPeerConfig.getConfiguration());
> newPeerConfigBuilder.putAllConfiguration(peerConfig.getConfiguration());



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27795) Define RPC API for cache cleaning

2023-04-14 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27795:


 Summary: Define RPC API for cache cleaning
 Key: HBASE-27795
 URL: https://issues.apache.org/jira/browse/HBASE-27795
 Project: HBase
  Issue Type: Sub-task
Reporter: Wellington Chevreuil


We should add an RPC API to allow for a "limited manual" cache cleaning. If 
hbase.rs.evictblocksonclose is set to false, blocks may linger in the cache 
when regions move between RSes.

The method, at the RS level, should compare the files from its online regions 
against the files in the prefetch list file, evicting blocks for files in the 
prefetch list file that do not belong to any of the online regions on the given 
RS.
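
A minimal sketch of that RS-side comparison follows; this is illustrative only 
and not the proposed RPC signature. The two sets are assumed to already hold 
plain store file names, and evictBlocksByHfileName is the existing BlockCache 
eviction call (to the best of my knowledge):
{code:java}
// Illustrative sketch only.
// onlineStoreFiles: store file names of all regions currently online on this RS.
// prefetchedFiles:  file names recorded in the persisted prefetch list.
int cleanupStaleCachedFiles(Set<String> onlineStoreFiles, Set<String> prefetchedFiles,
    BlockCache blockCache) {
  int cleaned = 0;
  for (String fileName : prefetchedFiles) {
    if (!onlineStoreFiles.contains(fileName)) {
      // The region owning this file is no longer online here; drop its blocks.
      blockCache.evictBlocksByHfileName(fileName);
      cleaned++;
    }
  }
  return cleaned;
}
{code}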



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27794) Tooling for parsing/reading the prefetch files list file

2023-04-14 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27794:


 Summary: Tooling for parsing/reading the prefetch files list file
 Key: HBASE-27794
 URL: https://issues.apache.org/jira/browse/HBASE-27794
 Project: HBase
  Issue Type: Sub-task
Reporter: Wellington Chevreuil


The content of the file defined by hbase.prefetch.file.list.path is encoded. 
It would be nice to have an extra tool for properly parsing it and printing the 
list in a human-readable format, for ease of troubleshooting.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27750) Update the list of prefetched hfiles upon simple block eviction

2023-03-27 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27750.
--
Resolution: Fixed

Thanks for the contribution [~sk...@cloudera.com]. I have now merged it into 
the master and branch-2 branches.

> Update the list of prefetched hfiles upon simple block eviction
> ---
>
> Key: HBASE-27750
> URL: https://issues.apache.org/jira/browse/HBASE-27750
> Project: HBase
>  Issue Type: Sub-task
>  Components: BucketCache
>Affects Versions: 2.6.0, 3.0.0-alpha-4
>Reporter: Shanmukha Haripriya Kota
>Assignee: Shanmukha Haripriya Kota
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> Currently, we maintain a list of Hfiles on disk for which prefetch is 
> complete and avoid prefetching those files after a restart. But we don't 
> handle cases where blocks are evicted from the cache. This ticket is for 
> updating the list of prefetched files upon simple block eviction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27686) Recovery of BucketCache and Prefetched data after RS Crash

2023-03-16 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27686.
--
Resolution: Fixed

Thanks for the contribution [~sk...@cloudera.com]! I have now merged this into 
master and branch-2.

> Recovery of BucketCache and Prefetched data after RS Crash
> --
>
> Key: HBASE-27686
> URL: https://issues.apache.org/jira/browse/HBASE-27686
> Project: HBase
>  Issue Type: Improvement
>  Components: BucketCache
>Reporter: Shanmukha Haripriya Kota
>Assignee: Shanmukha Haripriya Kota
>Priority: Major
>
> HBASE-27313 introduced the ability to persist a list of hfiles for which 
> prefetch has already been completed, so that we can avoid prefetching those 
> files again in the event of a graceful restart. However, it doesn't cover 
> crash scenarios: if the RS is killed or abnormally stopped, the list wouldn't 
> be saved.
> This change aims to persist the list of already prefetched files from a 
> background thread that periodically checks the cache state and persists the 
> list if updates have happened.
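
As an illustration of such a background thread, a minimal sketch using a 
ScheduledExecutorService; all names are illustrative and the actual persister 
lives in the bucket cache code:
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch: periodically persist the prefetch list, but only when it changed.
class PrefetchListPersister {
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
  private final AtomicBoolean dirty = new AtomicBoolean(false);

  void start(long intervalMs, Runnable persistPrefetchList) {
    scheduler.scheduleAtFixedRate(() -> {
      // Only rewrite the file when the cache state actually changed since last run.
      if (dirty.compareAndSet(true, false)) {
        persistPrefetchList.run();
      }
    }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
  }

  void markDirty() { // called when a file finishes prefetching or is evicted
    dirty.set(true);
  }

  void stop() {
    scheduler.shutdown();
  }
}
{code}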



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27619) Bulkload fails when trying to bulkload files with invalid names after HBASE-26707

2023-02-09 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27619.
--
Resolution: Fixed

Merged into master, branch-2 and branch-2.5 branches. Thanks for reviewing it, 
[~swu]

> Bulkload fails when trying to bulkload files with invalid names after 
> HBASE-26707
> -
>
> Key: HBASE-27619
> URL: https://issues.apache.org/jira/browse/HBASE-27619
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.0-alpha-3, 2.5.3
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
>
> HBASE-26707 has introduced changes to reduce renames on bulkload when using 
> FILE based SFT. However, if the bulkloading file has an invalid hfile name, 
> or has been split in the bulkload process, we don't do any renaming when FILE 
> based SFT is enabled, and we place the file in the store dir with its name as 
> is. 
> This later fails the validations performed by StoreFileReader when it tries 
> to open the file.
> This jira adds extra validation for the bulkloading file name format in 
> HRegion.bulkLoadHFiles and also extends TestLoadIncrementalHFiles to run the 
> same test suite with FILE based SFT enabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27619) Bulkload fails when trying to bulkload files with invalid names after HBASE-26707

2023-02-07 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27619:


 Summary: Bulkload fails when trying to bulkload files with invalid 
names after HBASE-26707
 Key: HBASE-27619
 URL: https://issues.apache.org/jira/browse/HBASE-27619
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


HBASE-26707 has introduced changes to reduce renames on bulkload when using 
FILE based SFT. However, if the bulkloading file has an invalid hfile name, or 
has been split in the bulkload process, we don't do any renaming when FILE 
based SFT is enabled, and we place the file in the store dir with its name as 
is. 
This later fails the validations performed by StoreFileReader when it tries to 
open the file.

This jira adds extra validation for the bulkloading file name format in 
HRegion.bulkLoadHFiles and also extends TestLoadIncrementalHFiles to run the 
same test suite with FILE based SFT enabled.
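
For illustration, a sketch of the kind of name check involved. 
StoreFileInfo.validateStoreFileName is the existing validator (to the best of 
my knowledge), while the surrounding method and the rename helper are 
hypothetical simplifications of the bulkload path, not the actual 
HRegion.bulkLoadHFiles code:
{code:java}
// Illustrative sketch only.
Path prepareBulkLoadedFile(Path srcPath) throws IOException {
  String name = srcPath.getName();
  if (!StoreFileInfo.validateStoreFileName(name)) {
    // Invalid hfile name (e.g. the output of a split during bulkload): route it
    // through the legacy rename path instead of keeping the name as is.
    return renameToNewStoreFileName(srcPath); // hypothetical helper
  }
  return srcPath; // valid name: can be kept as is with FILE based SFT
}
{code}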



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27551) Add config options to delay assignment to retain last region location

2023-01-25 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27551.
--
Resolution: Fixed

Merged into master and branch-2. Thanks for reviewing it [~zhangduo][~swu]!

> Add config options to delay assignment to retain last region location
> -
>
> Key: HBASE-27551
> URL: https://issues.apache.org/jira/browse/HBASE-27551
> Project: HBase
>  Issue Type: Improvement
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> HBASE-27313 introduced the ability to persist the list of files cached on a 
> given RS, but temporary RS losses or restarts would cause regions to be 
> eagerly reassigned to other RSes, making the persisted cache useless. For 
> some use cases, such as when using ObjectStore based persistence, the 
> performance degradation caused by cache misses has a worse impact than 
> temporary region unavailability.
> This proposes an additional config property (disabled by default) to make the 
> TRSP wait for a configurable time, checking whether the previous RS holding 
> the region comes back online, before proceeding with the region assignment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27551) Add config options to delay assignment to retain last region location

2023-01-05 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27551:


 Summary: Add config options to delay assignment to retain last 
region location
 Key: HBASE-27551
 URL: https://issues.apache.org/jira/browse/HBASE-27551
 Project: HBase
  Issue Type: Improvement
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


HBASE-27313 introduced the ability to persist the list of files cached on a 
given RS, but temporary RS losses or restarts would cause regions to be eagerly 
reassigned to other RSes, making the persisted cache useless. For some use 
cases, such as when using ObjectStore based persistence, the performance 
degradation caused by cache misses has a worse impact than temporary region 
unavailability.

This proposes an additional config property (disabled by default) to make the 
TRSP wait for a configurable time, checking whether the previous RS holding the 
region comes back online, before proceeding with the region assignment.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27474) Evict blocks on split/merge; Avoid caching reference/hlinks if compaction is enabled

2022-12-19 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27474.
--
Release Note: 
This modifies the behaviour of block cache management as follows:
1) Always evict blocks for the files of a parent split region once it's closed, 
regardless of the "hbase.rs.evictblocksonclose" configured value;
2) If compactions are enabled, don't cache blocks for the refs/link files under 
split daughters once these regions are opened;

For #1 above, an additional evict_cache property has been added to the 
CloseRegionRequest protobuf message. It defaults to false. Clusters being 
rolling-upgraded retain the previous behaviour on RSes not yet upgraded.

  Resolution: Fixed

Thanks for the review [~psomogyi] ! Had merged into master and branch-2.

> Evict blocks on split/merge; Avoid caching reference/hlinks if compaction is 
> enabled
> 
>
> Key: HBASE-27474
> URL: https://issues.apache.org/jira/browse/HBASE-27474
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-3
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> This change aims to improve block cache usage upon splits/merges. On a 
> split/merge event the following main steps happen:
> 1) parent regions are closed; 2) daughters are created and opened with 
> refs/hlinks; 3) Compaction is triggered soon after the daughters get online;
> With "hbase.rs.evictblocksonclose" set to false, we keep all blocks for the 
> closed regions in 1, then will try to load same blocks again on 2 (since we 
> are using the refs/links for the cache key), just to throw it away and cache 
> the compaction resulting file in 3. 
> If the block cache is close to its capacity, blocks from the compacted files 
> in 3 will likely miss the cache.
> The proposal here is to always evict blocks for parent regions on a 
> split/merge event, and also avoid caching blocks for refs/hlinks if 
> compactions are enabled. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27511) Lock contention when doing multiple parallel preads due to StoreFileReader reuse

2022-11-29 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27511:


 Summary: Lock contention when doing multiple parallel preads due 
to StoreFileReader reuse
 Key: HBASE-27511
 URL: https://issues.apache.org/jira/browse/HBASE-27511
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil
 Attachments: rs-stack-lock-contention

In HStoreFile, we reuse the StoreFileReader created during HStoreFile 
initialization when creating a StoreFileScanner for preads:

[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStoreFile.java#L545]

When using S3 as hbase storage, we noticed this caused lock contention when 
multiple clients were doing preads in parallel:
{noformat}
...
"RpcServer.default.FPBQ.Fifo.handler=38,queue=8,port=16020" #125 daemon prio=5 
os_prio=0 tid=0x7fc11d83c000 nid=0x73f2 waiting for monitor entry 
[0x7fc1154e6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:73)
        - waiting to lock <0x7fc4c2769660> (a 
org.apache.hadoop.fs.s3a.S3AInputStream)
...
"RpcServer.default.FPBQ.Fifo.handler=37,queue=7,port=16020" #124 daemon prio=5 
os_prio=0 tid=0x7fc11d83a000 nid=0x73f1 waiting for monitor entry 
[0x7fc1155e7000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:73)
        - waiting to lock <0x7fc4c2769660> (a 
org.apache.hadoop.fs.s3a.S3AInputStream)
...
"RpcServer.default.FPBQ.Fifo.handler=36,queue=6,port=16020" #123 daemon prio=5 
os_prio=0 tid=0x7fc11d838000 nid=0x73f0 runnable [0x7fc1156e8000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
...
at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:216)
        - locked <0x7fc4c2769660> (a 
org.apache.hadoop.fs.s3a.S3AInputStream)
... {noformat}
 We should instead create a new instance of StoreFileReader for each 
StoreFileScanner when doing preads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27484) FNFE on StoreFileScanner after a flush followed by a compaction

2022-11-29 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27484.
--
Resolution: Fixed

Merged to master, branch-2, branch-2.5 and branch-2.4. Thanks for reviewing it, 
[~psomogyi] !

> FNFE on StoreFileScanner after a flush followed by a compaction
> ---
>
> Key: HBASE-27484
> URL: https://issues.apache.org/jira/browse/HBASE-27484
> Project: HBase
>  Issue Type: Bug
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> One of our customers was running SyncTable from a 1.2 based cluster, where 
> SyncTable map tasks were opening scanners on a 2.4 based cluster for 
> comparing the two clusters. A few of the map tasks failed with a 
> DoNotRetryException caused by a FileNotFoundException blowing all the way up 
> to the client:
> {noformat}
> Error: org.apache.hadoop.hbase.DoNotRetryIOException: 
> org.apache.hadoop.hbase.DoNotRetryIOException: java.io.FileNotFoundException: 
> open 
> s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0
>  at 7225 on 
> s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0:
>  com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does 
> not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; 
> Request ID: KBRNC67WZGCS4SCF; S3 Extended Request ID: 
> wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=; 
> Proxy: null), S3 Extended Request ID: 
> wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=:NoSuchKey
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3712)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45819)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339)
> Caused by: java.io.FileNotFoundException: open 
> s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0
>  at 7225 on 
> s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0:
>  com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does 
> not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; 
> Request ID: KBRNC67WZGCS4SCF; S3 Extended Request ID: 
> wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=; 
> Proxy: null), S3 Extended Request ID: 
> wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=:NoSuchKey
> ...
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:632)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:417)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.reopenAfterFlush(StoreScanner.java:1018)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:552)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:155)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:7399)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:7567)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:7331)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3373)
>  {noformat}
> We can see in the RS logs that the above file got recently created as an 
> outcome of a memstore flush, and then a compaction is triggered shortly after:
> {noformat}
> 2022-11-11 22:16:50,322 INFO 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed memstore 
> data size=208.15 KB at sequenceid=4949703 (bloomFilter=false), 
> to=s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0
> 2022-11-11 22:16:50,513 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Added 
> 

[jira] [Created] (HBASE-27484) FNFE on StoreFileScanner after a flush followed by a compaction

2022-11-14 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27484:


 Summary: FNFE on StoreFileScanner after a flush followed by a 
compaction
 Key: HBASE-27484
 URL: https://issues.apache.org/jira/browse/HBASE-27484
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


One of our customers was running SyncTable from a 1.2 based cluster, where 
SyncTable map tasks were opening scanners on a 2.4 based cluster for comparing 
the two clusters. A few of the map tasks failed with a DoNotRetryException 
caused by a FileNotFoundException blowing all the way up to the client:
{noformat}
Error: org.apache.hadoop.hbase.DoNotRetryIOException: 
org.apache.hadoop.hbase.DoNotRetryIOException: java.io.FileNotFoundException: 
open 
s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0
 at 7225 on 
s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0:
 com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not 
exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request 
ID: KBRNC67WZGCS4SCF; S3 Extended Request ID: 
wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=; 
Proxy: null), S3 Extended Request ID: 
wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=:NoSuchKey
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3712)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45819)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339)
Caused by: java.io.FileNotFoundException: open 
s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0
 at 7225 on 
s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0:
 com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not 
exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request 
ID: KBRNC67WZGCS4SCF; S3 Extended Request ID: 
wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=; 
Proxy: null), S3 Extended Request ID: 
wWEJm8tlFSj/8g+xFpmD1vgWzT/n7HzBcOFAZ8ayIqKMsDZGN/d2kEhdusPLhMd540h+QAPP1xw=:NoSuchKey
...
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:632)
at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315)
at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:417)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.reopenAfterFlush(StoreScanner.java:1018)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:552)
at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:155)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:7399)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:7567)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:7331)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3373)
 {noformat}

We can see in the RS logs that the above file got recently created as an 
outcome of a memstore flush, and then a compaction is triggered shortly after:
{noformat}
2022-11-11 22:16:50,322 INFO 
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed memstore data 
size=208.15 KB at sequenceid=4949703 (bloomFilter=false), 
to=s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0
2022-11-11 22:16:50,513 INFO org.apache.hadoop.hbase.regionserver.HStore: Added 
s3a://x--xxx/hbase/data/default/xx/4c53da8c2ab9b7d7a0d6046ef3bb701c/0/e2c58350e4e54c21b0f713a4c271b8c0,
 entries=951, sequenceid=4949703, filesize=26.2 K
...
2022-11-11 22:16:50,791 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Starting compaction of 4c53da8c2ab9b7d7a0d6046ef3bb701c/0 in 
xx,IT001E90506702\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1618275339031.4c53da8c2ab9b7d7a0d6046ef3bb701c.
2022-11-11 22:16:50,791 INFO 

[jira] [Created] (HBASE-27474) Evict blocks on split/merge; Avoid caching reference/hlinks if compaction is enabled

2022-11-08 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27474:


 Summary: Evict blocks on split/merge; Avoid caching 
reference/hlinks if compaction is enabled
 Key: HBASE-27474
 URL: https://issues.apache.org/jira/browse/HBASE-27474
 Project: HBase
  Issue Type: Improvement
Affects Versions: 3.0.0-alpha-3
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


This change aims to improve block cache usage upon splits/merges. On a 
split/merge event the following main steps happen:

1) parent regions are closed; 2) daughters are created and opened with 
refs/hlinks; 3) Compaction is triggered soon after the daughters get online;

With "hbase.rs.evictblocksonclose" set to false, we keep all blocks for the 
closed regions in 1, then will try to load same blocks again on 2 (since we are 
using the refs/links for the cache key), just to throw it away and cache the 
compaction resulting file in 3. 

If the block cache is close to its capacity, blocks from the compacted files in 
3 will likely miss the cache.

The proposal here is to always evict blocks for parent regions on a split/merge 
event, and also avoid caching blocks for refs/hlinks if compactions are 
enabled. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27407) Fixing check for "description" request param in JMXJsonServlet.java

2022-10-06 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27407.
--
Resolution: Fixed

Thanks for your contribution [~lkovacs], I have now merged it into master, 
branch-2, branch-2.5 and branch-2.4.

> Fixing check for "description" request param in JMXJsonServlet.java
> ---
>
> Key: HBASE-27407
> URL: https://issues.apache.org/jira/browse/HBASE-27407
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 2.6.0, 3.0.0-alpha-3
>Reporter: Luca Kovacs
>Assignee: Luca Kovacs
>Priority: Minor
> Fix For: 2.6.0, 2.5.1, 3.0.0-alpha-4, 2.4.15
>
>
> When trying to access the JMX metrics' description via the "description=true" 
> URL parameter, any value is accepted.
> The current version checks only if "description" is in the URL parameter, but 
> doesn't check the parameter value. 
> I would like to fix this by checking if the parameter value is 'true' and 
> showing the description only when this condition is met.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27386) Use encoded size for calculating compression ratio in block size predicator

2022-09-23 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27386.
--
Resolution: Fixed

Merged into master and branch-2. Thanks for reviewing, [~ankit.singhal] !

> Use encoded size for calculating compression ratio in block size predicator
> ---
>
> Key: HBASE-27386
> URL: https://issues.apache.org/jira/browse/HBASE-27386
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-3
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> In HBASE-27264 we had introduced the notion of block size predicators to 
> define hfile block boundaries when writing a new hfile, and provided the
> PreviousBlockCompressionRatePredicator implementation for calculating block 
> sizes based on a compression ratio. It was using the raw data size written to 
> the block so far to calculate the compression ratio, but in the case where 
> encoding is enabled, this could lead to a very high compression ratio and 
> therefore, larger block sizes. We should use the encoded size to calculate 
> compression ratio, instead.
> Here's an example scenario:
> 1) Sample block size when not using the  
> PreviousBlockCompressionRatePredicator as implemented by HBASE-27264:
> {noformat}
> onDiskSizeWithoutHeader=6613, uncompressedSizeWithoutHeader=32928 {noformat}
> 2) Sample block size when using PreviousBlockCompressionRatePredicator as 
> implemented by HBASE-27264 (uses raw data size to calculate compression rate):
> {noformat}
> onDiskSizeWithoutHeader=126920, uncompressedSizeWithoutHeader=655393
> {noformat}
> 3) Sample block size when using PreviousBlockCompressionRatePredicator with 
> encoded size for calculating compression rate:
> {noformat}
> onDiskSizeWithoutHeader=54299, uncompressedSizeWithoutHeader=328051
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27386) Use encoded size for calculating compression ratio in block size predicator

2022-09-22 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27386:


 Summary: Use encoded size for calculating compression ratio in 
block size predicator
 Key: HBASE-27386
 URL: https://issues.apache.org/jira/browse/HBASE-27386
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


In HBASE-27264 we had introduced the notion of block size predicators to define 
hfile block boundaries when writing a new hfile, and provided the

PreviousBlockCompressionRatePredicator implementation for calculating block 
sizes based on a compression ratio. It was using the raw data size written to 
the block so far to calculate the compression ratio, but in the case where 
encoding is enabled, this could lead to a very high compression ratio and 
therefore, larger block sizes. We should use the encoded size to calculate 
compression ratio, instead.

Here's an example scenario:

1) Sample block size when not using the  PreviousBlockCompressionRatePredicator 
as implemented by HBASE-27264:
{noformat}
onDiskSizeWithoutHeader=6613, uncompressedSizeWithoutHeader=32928 {noformat}

2) Sample block size when using PreviousBlockCompressionRatePredicator as 
implemented by HBASE-27264 (uses raw data size to calculate compression rate):
{noformat}
onDiskSizeWithoutHeader=126920, uncompressedSizeWithoutHeader=655393
{noformat}

3) Sample block size when using PreviousBlockCompressionRatePredicator with 
encoded size for calculating compression rate:
{noformat}
onDiskSizeWithoutHeader=54299, uncompressedSizeWithoutHeader=328051
{noformat}
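
For illustration, a minimal sketch of the predicator idea (hypothetical class and method 
names, not the actual PreviousBlockCompressionRatePredicator code), showing why the ratio 
should be fed the encoded size: the projected on-disk size then stays close to the 
configured 64KB target even when encoding is enabled.
{code:java}
final class CompressionRatioSketch {
  private final int configuredBlockSize;   // e.g. 64 * 1024
  private double compressionRatio = 1.0;   // encoded-to-disk ratio of the last written block

  CompressionRatioSketch(int configuredBlockSize) {
    this.configuredBlockSize = configuredBlockSize;
  }

  /** Called after a block is written; both sizes refer to the block just flushed. */
  void updateLatestBlockSizes(long encodedSize, long onDiskSize) {
    // Using the *encoded* size keeps the ratio realistic when data block encoding is on.
    compressionRatio = (double) encodedSize / onDiskSize;
  }

  /** The writer closes the current block once the projected on-disk size hits the target. */
  boolean shouldFinishBlock(long encodedSizeSoFar) {
    return encodedSizeSoFar / compressionRatio >= configuredBlockSize;
  }

  public static void main(String[] args) {
    CompressionRatioSketch p = new CompressionRatioSketch(64 * 1024);
    p.updateLatestBlockSizes(328_051, 54_299);        // numbers from scenario 3 above
    System.out.println(p.shouldFinishBlock(300_000)); // false: ~49KB projected on disk
    System.out.println(p.shouldFinishBlock(400_000)); // true: ~66KB projected on disk
  }
}
{code}
Feeding the raw cell size into the ratio instead inflates it and, with it, the block 
boundary, which is what scenario 2 above shows.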



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27370) Avoid decompressing blocks when reading from bucket cache prefetch threads

2022-09-20 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27370.
--
Resolution: Fixed

Thanks for reviewing this [~psomogyi] [~taklwu] , had merged into master, then 
cherry-picked into branch-2, branch-2.5 and branch-2.4.

> Avoid decompressing blocks when reading from bucket cache prefetch threads 
> ---
>
> Key: HBASE-27370
> URL: https://issues.apache.org/jira/browse/HBASE-27370
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-4
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 2.5.1, 3.0.0-alpha-4, 2.4.15
>
>
> When prefetching blocks into bucket cache, we had observed a consistent CPU 
> usage around 70% with no other workloads ongoing. For large bucket caches 
> (i.e. when using file based bucket cache), the prefetch can last for some time 
> and having such a high CPU usage may impact the database usage by client 
> applications.
> Further analysis of the prefetch threads stack trace showed that very often, 
> decompress logic is being executed by these threads:
> {noformat}
> "hfile-prefetch-1654895061122" #234 daemon prio=5 os_prio=0 
> tid=0x557bb2907000 nid=0x406d runnable [0x7f294a504000]
>    java.lang.Thread.State: RUNNABLE
>         at 
> org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native
>  Method)
>         at 
> org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:235)
>         at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
>         at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>         - locked <0x0002d24c0ae8> (a java.io.BufferedInputStream)
>         at 
> org.apache.hadoop.hbase.io.util.BlockIOUtils.readFullyWithHeapBuffer(BlockIOUtils.java:105)
>         at 
> org.apache.hadoop.hbase.io.compress.Compression.decompress(Compression.java:465)
>         at 
> org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultDecodingContext.prepareDecoding(HFileBlockDefaultDecodingContext.java:90)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.unpack(HFileBlock.java:650)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1342)
>  {noformat}
> This is because *HFileReaderImpl.readBlock* is always decompressing blocks 
> even when *hbase.block.data.cachecompressed* is set to true. 
> This patch proposes an alternative flag to differentiate prefetch from normal 
> reads, so that it doesn't decompress DATA blocks when prefetching with 
> *hbase.block.data.cachecompressed* set to true. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27370) Avoid decompressing blocks when reading from bucket cache prefetch threads

2022-09-13 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27370:


 Summary: Avoid decompressing blocks when reading from bucket cache 
prefetch threads 
 Key: HBASE-27370
 URL: https://issues.apache.org/jira/browse/HBASE-27370
 Project: HBase
  Issue Type: Improvement
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


When prefetching blocks into bucket cache, we had observed a consistent CPU 
usage around 70% with no other workloads ongoing. For large bucket caches (i.e. 
when using file based bucket cache), the prefetch can last for some time and 
having such a high CPU usage may impact the database usage by client 
applications.

Further analysis of the prefetch threads stack trace showed that very often, 
decompress logic is being executed by these threads:
{noformat}
"hfile-prefetch-1654895061122" #234 daemon prio=5 os_prio=0 
tid=0x557bb2907000 nid=0x406d runnable [0x7f294a504000]
   java.lang.Thread.State: RUNNABLE
        at 
org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native
 Method)
        at 
org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:235)
        at 
org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
        at 
org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        - locked <0x0002d24c0ae8> (a java.io.BufferedInputStream)
        at 
org.apache.hadoop.hbase.io.util.BlockIOUtils.readFullyWithHeapBuffer(BlockIOUtils.java:105)
        at 
org.apache.hadoop.hbase.io.compress.Compression.decompress(Compression.java:465)
        at 
org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultDecodingContext.prepareDecoding(HFileBlockDefaultDecodingContext.java:90)
        at 
org.apache.hadoop.hbase.io.hfile.HFileBlock.unpack(HFileBlock.java:650)
        at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1342)
 {noformat}

This is because *HFileReaderImpl.readBlock* is always decompressing blocks even 
when *hbase.block.data.cachecompressed* is set to true. 

This patch proposes an alternative flag to differentiate prefetch from normal 
reads, so that it doesn't decompress DATA blocks when prefetching with 
*hbase.block.data.cachecompressed* set to true. 
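
As a self-contained illustration of the proposal (plain Java stand-ins; the flag name and 
types below are not the actual HFileReaderImpl/HFileBlock API), only the prefetch path 
skips the unpack step, so the prefetch threads never pay the decompression cost:
{code:java}
final class PrefetchUnpackSketch {
  enum BlockType { DATA, INDEX, BLOOM }

  static final class Block {
    final BlockType type;
    final boolean compressed;
    Block(BlockType type, boolean compressed) { this.type = type; this.compressed = compressed; }
  }

  /** Decide what goes into the cache for one block read. */
  static Block prepareForCache(Block read, boolean cacheCompressed, boolean isPrefetch) {
    if (isPrefetch && cacheCompressed && read.type == BlockType.DATA) {
      // Skip unpack: the block is decompressed lazily on the first real read instead.
      return read;
    }
    // Normal reads, and non-DATA blocks, are still unpacked before caching.
    return read.compressed ? new Block(read.type, false) : read;
  }

  public static void main(String[] args) {
    Block data = new Block(BlockType.DATA, true);
    System.out.println(prepareForCache(data, true, true).compressed);  // true: left compressed
    System.out.println(prepareForCache(data, true, false).compressed); // false: unpacked for a normal read
  }
}
{code}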



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27265) Tool to read the contents of the storefile tracker file

2022-08-08 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27265.
--
Resolution: Fixed

Thanks for the contribution, [~abhradeep.kundu] ! Had now merged this on master 
and branch-2.

> Tool to read the contents of the storefile tracker file
> ---
>
> Key: HBASE-27265
> URL: https://issues.apache.org/jira/browse/HBASE-27265
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 3.0.0-alpha-4
>Reporter: Abhradeep Kundu
>Assignee: Abhradeep Kundu
>Priority: Minor
> Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> It will be useful to have a tool that provides an ability to read the 
> contents of the tracker file (.filelist/f1 or f2)
> Using the hdfs -cat or -text displays the contents of the tracker file in 
> binary and there is no option to show the contents in plain text.
> {code:java}
> x[cloudbreak@cod--z4t08rqbuyms-master0 ~]$ sudo hdfs dfs -text 
> s3a://odx-qe-bucket/odx-d7v40h/audit/cod--z4t08rqbuyms/hbase/data/default/one/6126beb5b349a1eee4b92987b78f1058/cf/.filelist/f1
> 22/04/05 19:51:14 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 22/04/05 19:51:14 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 22/04/05 19:51:14 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 22/04/05 19:51:14 INFO s3a.IDBDelegationTokenBinding: There is no Knox Token 
> available, fetching one from IDBroker...
> 22/04/05 19:51:14 INFO idbroker.AbstractIDBClient: Authenticating with 
> IDBroker requires Kerberos
> 22/04/05 19:51:14 INFO idbroker.AbstractIDBClient: Kerberos credentials are 
> available, using Kerberos to establish a session. 
> UGI=hbase/cod--z4t08rqbuyms-master0.odx-d7v4.svbr-nqvp.int.cldr.w...@odx-d7v4.svbr-nqvp.int.cldr.work
>  (auth:KERBEROS)
> Apr 05, 2022 7:51:14 PM org.apache.knox.gateway.shell.KnoxSession createClient
> INFO: Using default JAAS configuration
> 22/04/05 19:51:15 INFO s3a.IDBDelegationTokenBinding: Bonded to Knox token 
> eyJqa3...OGdQmQ
> 22/04/05 19:51:15 INFO Configuration.deprecation: No unit for 
> fs.s3a.connection.request.timeout(0) assuming SECONDS
> 22/04/05 19:51:15 INFO idbroker.AbstractIDBClient: Creating Knox CAB session 
> using Knox DT eyJqa3...OGdQmQ ...
> 22/04/05 19:51:16 INFO s3a.S3AInputStream: Switching to Random IO seek policy
> ⑙��/%
>  98ef4cf597be48598c8376bc1cac200d�&22/04/05 19:51:16 INFO 
> impl.MetricsSystemImpl: Stopping s3a-file-system metrics system...
> 22/04/05 19:51:16 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 22/04/05 19:51:16 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete.{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27147) [HBCK2] extraRegionsInMeta does not work If RegionInfo is null

2022-08-02 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27147.
--
Resolution: Fixed

> [HBCK2] extraRegionsInMeta does not work If RegionInfo is null
> --
>
> Key: HBASE-27147
> URL: https://issues.apache.org/jira/browse/HBASE-27147
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2
>Reporter: Karthik Palanisamy
>Assignee: Wellington Chevreuil
>Priority: Major
>
> extraRegionsInMeta will not clean/fix meta if the info:regioninfo column is 
> missing.
>  
> Somehow, the customer has the following empty row in meta as a stale entry: 
> 'I1xx,16332508x.f53609cc1ae366b43205dxxx', 'info:state', 
> 16223
>  
> And no corresponding table "I1xx" exists. 
>  
> We used extraRegionsInMeta but it didn't clean it. Also, we created the same table 
> again and used extraRegionsInMeta after removing the HDFS data, but the stale row 
> was never cleaned. It looks like extraRegionsInMeta works only when "info:regioninfo" 
> is present. 
>  
> We need to handle the scenario for the other columns, i.e. info:state, info:server, 
> etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27264) Add options to consider compressed size when delimiting blocks during hfile writes

2022-08-01 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27264:


 Summary: Add options to consider compressed size when delimiting 
blocks during hfile writes
 Key: HBASE-27264
 URL: https://issues.apache.org/jira/browse/HBASE-27264
 Project: HBase
  Issue Type: New Feature
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


In HBASE-27232 we had modified the "hbase.writer.unified.encoded.blocksize.ratio" 
property so that the encoded size can be considered when 
delimiting hfile blocks during writes.

Here we propose two additional properties, "hbase.block.size.limit.compressed" 
and "hbase.block.size.max.compressed", that would allow the 
compressed size (if compression is in use) to be considered when delimiting blocks during hfile 
writing. When compression is enabled, certain datasets can have very high 
compression efficiency, so the default 64KB block size and 10GB max file 
size can lead to hfiles with a very large number of blocks. 

In this proposal, "hbase.block.size.limit.compressed" is a boolean flag that 
switches to the compressed size for delimiting blocks, and 
"hbase.block.size.max.compressed" is an int with the limit, in bytes, for the 
compressed block size, in order to avoid very large uncompressed blocks 
(defaulting to 320KB).
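
To make the proposal concrete, a hedged sketch of how the two properties could interact; 
this is one possible reading of the description above, not the committed implementation:
{code:java}
final class CompressedBlockLimitSketch {
  static boolean shouldCloseBlock(long rawSizeSoFar, long compressedSizeSoFar,
      long blockSize, boolean limitCompressed, long maxCompressedBlockSize) {
    if (!limitCompressed) {
      return rawSizeSoFar >= blockSize;            // legacy behaviour: raw size only
    }
    // hbase.block.size.limit.compressed=true: keep filling the block until the
    // compressed payload reaches the block size, but cap the raw size at
    // hbase.block.size.max.compressed so a highly compressible dataset cannot
    // produce arbitrarily large uncompressed blocks.
    return compressedSizeSoFar >= blockSize
      || rawSizeSoFar >= maxCompressedBlockSize;
  }

  public static void main(String[] args) {
    long blockSize = 64 * 1024, cap = 320 * 1024;
    // 10:1 compressible data: 200KB raw is still only ~20KB compressed, so keep writing.
    System.out.println(shouldCloseBlock(200 * 1024, 20 * 1024, blockSize, true, cap));  // false
    // ...but at 320KB raw the cap closes the block even though the compressed size is small.
    System.out.println(shouldCloseBlock(320 * 1024, 32 * 1024, blockSize, true, cap));  // true
  }
}
{code}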

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27232) Fix checking for encoded block size when deciding if block should be closed

2022-07-25 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27232.
--
Resolution: Fixed

> Fix checking for encoded block size when deciding if block should be closed
> ---
>
> Key: HBASE-27232
> URL: https://issues.apache.org/jira/browse/HBASE-27232
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-3, 2.4.13
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> In HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and 
> uncompressed data size when deciding to close a block and start a new one. 
> That could lead to varying "on-disk" block sizes, depending on the encoding 
> efficiency for the cells in each block.
> HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio 
> property, as a ratio of the original configured block size, to be compared 
> against the encoded size. This was an attempt to ensure homogeneous block 
> sizes. However, the check introduced by HBASE-17757 also considers the 
> unencoded size, so in cases where the encoding efficiency is higher than 
> what's configured in hbase.writer.unified.encoded.blocksize.ratio, it would 
> still lead to varying block sizes.
> This patch changes that logic to only consider the encoded size if the 
> hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it 
> will consider the unencoded size. This gives finer control over the on-disk 
> block sizes and the overall number of blocks when encoding is in use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27232) Fix checking for encoded block size when deciding if block should be closed

2022-07-21 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27232:


 Summary: Fix checking for encoded block size when deciding if 
block should be closed
 Key: HBASE-27232
 URL: https://issues.apache.org/jira/browse/HBASE-27232
 Project: HBase
  Issue Type: Improvement
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


In HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and 
uncompressed data size when deciding to close a block and start a new one. That 
could lead to varying "on-disk" block sizes, depending on the encoding 
efficiency for the cells in each block.

HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio 
property, as a ratio of the original configured block size, to be compared 
against the encoded size. This was an attempt to ensure homogeneous block 
sizes. However, the check introduced by HBASE-17757 also considers the 
unencoded size, so in cases where the encoding efficiency is higher than 
what's configured in hbase.writer.unified.encoded.blocksize.ratio, it would 
still lead to varying block sizes.

This patch changes that logic to only consider the encoded size if the 
hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it 
will consider the unencoded size. This gives finer control over the on-disk 
block sizes and the overall number of blocks when encoding is in use.
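
For illustration, a minimal sketch of the changed check (illustrative signature, not the 
real HFileWriterImpl.checkBlockBoundary code):
{code:java}
final class EncodedBoundarySketch {
  static boolean shouldCloseBlock(long unencodedSize, long encodedSize,
      long blockSize, double encodedBlockSizeRatio) {
    if (encodedBlockSizeRatio > 0) {
      // Ratio configured: compare only the encoded size against the scaled target,
      // so a very efficient encoder no longer produces oversized on-disk blocks.
      return encodedSize >= blockSize * encodedBlockSizeRatio;
    }
    // Ratio not configured: legacy check on the raw, unencoded size.
    return unencodedSize >= blockSize;
  }

  public static void main(String[] args) {
    long blockSize = 64 * 1024;
    // 5:1 encoding efficiency with ratio set to 1.0: the block closes once 64KB of
    // *encoded* data has accumulated, regardless of the ~320KB of raw cells behind it.
    System.out.println(shouldCloseBlock(320 * 1024, 64 * 1024, blockSize, 1.0)); // true
    System.out.println(shouldCloseBlock(320 * 1024, 64 * 1024, blockSize, 0.0)); // true (legacy path)
    System.out.println(shouldCloseBlock(32 * 1024, 8 * 1024, blockSize, 1.0));   // false
  }
}
{code}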



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27147) Add a hbck2 option to clear emptyRegion from meta

2022-06-22 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27147.
--
Resolution: Not A Problem

Please see the already existing command: *extraRegionsInMeta*

> Add a hbck2 option to clear emptyRegion from meta
> -
>
> Key: HBASE-27147
> URL: https://issues.apache.org/jira/browse/HBASE-27147
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2
>Reporter: Karthik Palanisamy
>Priority: Major
>
> No alternative option in hbck2 to fix empty regions.  hbck1 equivalent is 
> "-fixEmptyMetaCells".  
> "Try to fix hbase:meta entries not referencing any region (empty 
> REGIONINFO_QUALIFIER rows)"
>  
> NOTE: This is an inconsistent meta bug. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-27119) [HBCK2] Some commands are broken after HBASE-24587

2022-06-14 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27119:


 Summary: [HBCK2] Some commands are broken after HBASE-24587
 Key: HBASE-27119
 URL: https://issues.apache.org/jira/browse/HBASE-27119
 Project: HBase
  Issue Type: Bug
  Components: hbase-operator-tools, hbck2
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


HBCK2 _replication_ and _filesystem_ commands are broken after HBASE-24587. 
Trying to pass the _-f_ or _--fix_ options gives the below error:
{noformat}
ERROR: Unrecognized option: -f
FOR USAGE, use the -h or --help option
2022-06-14T16:07:32,296 INFO  [main] client.ConnectionImplementation: Closing 
master protocol: MasterService
Exception in thread "main" java.lang.NullPointerException
at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:1083)
at org.apache.hbase.HBCK2.run(HBCK2.java:982)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hbase.HBCK2.main(HBCK2.java:1318)
{noformat}

This is because the _getInputList_ calls 
[here|https://github.com/apache/hbase-operator-tools/blob/master/hbase-hbck2/src/main/java/org/apache/hbase/HBCK2.java#L1073]
 and 
[here|https://github.com/apache/hbase-operator-tools/blob/master/hbase-hbck2/src/main/java/org/apache/hbase/HBCK2.java#L1082]
 only accept the _-i_/_--inputFiles_ option, throwing an exception if we pass the 
_-f_/_--fix_ options.

We still need to confirm whether any other command is affected by this.
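
For context, a small illustrative example of the parsing failure, using plain Apache 
Commons CLI (the option wiring below is a simplified stand-in, not the actual HBCK2.java 
code): if a command's Options object only declares -i/--inputFiles, parsing -f fails with 
"Unrecognized option: -f" before the command even runs; declaring both options lets the 
fix flag through.
{code:java}
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public class Hbck2OptionSketch {
  public static void main(String[] args) throws ParseException {
    Options options = new Options();
    options.addOption("i", "inputFiles", false, "take input from files instead of arguments");
    options.addOption("f", "fix", false, "attempt to fix the reported problems");

    CommandLine cl = new DefaultParser().parse(options,
      new String[] { "-f", "filesystem" });
    // With both options declared, -f parses cleanly instead of failing before the command runs.
    System.out.println("fix requested: " + cl.hasOption("f"));
    System.out.println("remaining args: " + cl.getArgList());
  }
}
{code}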



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HBASE-27017) MOB snapshot is broken when FileBased SFT is used

2022-06-05 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27017.
--
Resolution: Fixed

Merged into master, branch-2 and branch-2.5.

> MOB snapshot is broken when FileBased SFT is used
> -
>
> Key: HBASE-27017
> URL: https://issues.apache.org/jira/browse/HBASE-27017
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.5.0, 3.0.0-alpha-2
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>
>
> During a snapshot, MOB regions are treated like any other region. When a 
> snapshot is taken and hfile references are collected, a StoreFileTracker is 
> created to get the current active hfile list. But the MOB region stores are 
> not tracked, so an empty list is returned, resulting in a broken snapshot. 
> When this snapshot is cloned, the resulting table will have no MOB files or 
> references.
> The problematic code can be found here:
> [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotManifest.java#L313]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HBASE-27069) Hbase SecureBulkload permission regression

2022-05-31 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27069.
--
Resolution: Fixed

> Hbase SecureBulkload permission regression
> --
>
> Key: HBASE-27069
> URL: https://issues.apache.org/jira/browse/HBASE-27069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.0, 3.0.0-alpha-3
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>
>
> HBASE-26707 has introduced a bug, where setting the permission of the bulk 
> loaded HFile to 777 is made conditional.
> However, as discussed in HBASE-15790, that permission is essential for 
> HBase's correct operation.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HBASE-27061) two phase bulkload is broken when SFT is in use.

2022-05-26 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27061.
--
Resolution: Fixed

Thanks for the fix, [~sergey.soldatov]. Had merged into master, branch-2 and 
branch-2.5.

> two phase bulkload is broken when SFT is in use.
> 
>
> Key: HBASE-27061
> URL: https://issues.apache.org/jira/browse/HBASE-27061
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.0, 3.0.0-alpha-2, 2.6.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>
>
> In HBASE-26707, for the SFT case, we are writing files directly to the region 
> location. For that we are using HRegion.regionDir as the staging directory. 
> The problem is that in reality this dir points to the WAL dir, so for 
> S3 deployments it would be pointing to HDFS. As a result, during the 
> execution of LoadIncrementalHFiles the process fails with the exception:
> {noformat}
> 2022-05-24 03:31:23,656 ERROR 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to 
> complete bulk load
> java.lang.IllegalArgumentException: Wrong FS 
> hdfs://ns1//hbase-wals/data/default/employees/4f367b303da4fed7667fff07fd4c6066/department/acd971097924463da6d6e3a15f9527da
>  -expected s3a://hbase
> at 
> org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1375)
> at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:647)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1337)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1363)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3521)
> at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:511)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:397)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
> at 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$SecureBulkLoadListener.prepareBulkLoad(SecureBulkLoadManager.java:397)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:6994)
> at 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:291)
> at 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1879)
> at 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2453)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45821)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339)
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HBASE-27021) StoreFileInfo should set its initialPath in a consistent way

2022-05-12 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-27021.
--
Resolution: Fixed

Merged into master, branch-2 and branch-2.5. Thanks for reviewing it 
[~zhangduo] [~elserj] !

> StoreFileInfo should set its initialPath in a consistent way
> 
>
> Key: HBASE-27021
> URL: https://issues.apache.org/jira/browse/HBASE-27021
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.0, 3.0.0-alpha-2, 2.6.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>
>
> Currently, StoreFileInfo provides overloaded public constructors where the 
> related file path can be passed as either a Path or a FileStatus instance. This 
> can lead to StoreFileInfo instances related to the same file entry having 
> different representations of the file path, which could create problems 
> for functions relying on equality for comparing store files. One example I 
> could find is the StoreEngine.refreshStoreFiles method, which lists some files 
> from the SFT, then compares them against a list of files from the SFM to decide 
> how it should update the SFM internal cache. Here's a sample output from 
> TestHStore.testRefreshStoreFiles:
> ---
> 2022-05-10T15:06:42,831 INFO [Time-limited test] 
> regionserver.StoreEngine(399): Refreshing store files for 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine@69d58ac1 files to 
> add: 
> [file:/hbase/hbase-server/target/test-data/e3eac5ce-9bdf-8624-bcec-09c89790d682/TestStoretestRefreshStoreFiles/data/default/table/da6a3cf38941b37cd16438d554b13bbc/family/6e92c2f5cf1f40f7b8c6b6b34a176fa5,
>  
> file:/hbase/hbase-server/target/test-data/e3eac5ce-9bdf-8624-bcec-09c89790d682/TestStoretestRefreshStoreFiles/data/default/table/da6a3cf38941b37cd16438d554b13bbc/family/{*}fa4d5909da644d94873cbfdc6b5a07da{*}]
>  files to remove: 
> [/hbase/hbase-server/target/test-data/e3eac5ce-9bdf-8624-bcec-09c89790d682/TestStoretestRefreshStoreFiles/data/default/table/da6a3cf38941b37cd16438d554b13bbc/family/{*}fa4d5909da644d94873cbfdc6b5a07da{*}]
> ---
> The above will wrongly add it to SFM's list of compacted files, making a 
> valid file potentially eligible for deletion and data loss.
> I think we can avoid that by always converting Path instances passed in 
> StoreFileInfo constructors to a FileStatus, to consistently build the 
> internal StoreFileInfo path.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-27022) SFT seems apparently tracking invalid/malformed store files

2022-05-10 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27022:


 Summary: SFT seems apparently tracking invalid/malformed store 
files
 Key: HBASE-27022
 URL: https://issues.apache.org/jira/browse/HBASE-27022
 Project: HBase
  Issue Type: Sub-task
Reporter: Wellington Chevreuil


Opening this on behalf of [~apurtell] , who first reported this issue on 
HBASE-26999: When running scale tests using ITLCC, the following errors were 
observed:
{noformat}
[00]2022-05-05 15:59:52,280 WARN [region-location-0] regionserver.StoreFileInfo:
Skipping 
hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/9eafc10e1b5a25532a4f0adf550828fc/c/9d07757144a7404fac02e161b5bd035e
because it is empty. HBASE-646 DATA LOSS?
...
[00]2022-05-05 15:59:52,320 WARN [region-location-2] 
regionserver.StoreFileInfo: 
Skipping 
hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/5322c54b9a899eae03cb16e956a836d5/c/184b4f55ab1a4dbc813e77aeae1343ae
 
because it is empty. HBASE-646 DATA LOSS? {noformat}
 

From some discussions in HBASE-26999, it seems that SFT has wrongly tracked an 
incomplete/unfinished store file. 

For further context, follow the [comments thread on 
HBASE-26999|https://issues.apache.org/jira/browse/HBASE-26999?focusedCommentId=17533508=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17533508].



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-27021) StoreFileInfo should set its initialPath in a consistent way

2022-05-10 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27021:


 Summary: StoreFileInfo should set its initialPath in a consistent 
way
 Key: HBASE-27021
 URL: https://issues.apache.org/jira/browse/HBASE-27021
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


Currently, StoreFileInfo provides overloaded public constructors where the 
related file path can be passed as either a Path or a FileStatus instance. This 
can lead to StoreFileInfo instances related to the same file entry having 
different representations of the file path, which could create problems for 
functions relying on equality for comparing store files. One example I could 
find is the StoreEngine.refreshStoreFiles method, which lists some files from 
the SFT, then compares them against a list of files from the SFM to decide how it 
should update the SFM internal cache. Here's a sample output from 
TestHStore.testRefreshStoreFiles:

---

2022-05-10T15:06:42,831 INFO [Time-limited test] regionserver.StoreEngine(399): 
Refreshing store files for 
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine@69d58ac1 files to add: 
[file:/hbase/hbase-server/target/test-data/e3eac5ce-9bdf-8624-bcec-09c89790d682/TestStoretestRefreshStoreFiles/data/default/table/da6a3cf38941b37cd16438d554b13bbc/family/6e92c2f5cf1f40f7b8c6b6b34a176fa5,
 
file:/hbase/hbase-server/target/test-data/e3eac5ce-9bdf-8624-bcec-09c89790d682/TestStoretestRefreshStoreFiles/data/default/table/da6a3cf38941b37cd16438d554b13bbc/family/{*}fa4d5909da644d94873cbfdc6b5a07da{*}]
 files to remove: 
[/hbase/hbase-server/target/test-data/e3eac5ce-9bdf-8624-bcec-09c89790d682/TestStoretestRefreshStoreFiles/data/default/table/da6a3cf38941b37cd16438d554b13bbc/family/{*}fa4d5909da644d94873cbfdc6b5a07da{*}]

---

The above will wrongly add it to SFM's list of compacted files, making a valid 
file potentially eligible for deletion and data loss.

I think we can avoid that by always converting Path instances passed in 
StoreFileInfo constructors to a FileStatus, to consistently build the internal 
StoreFileInfo path.
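
A minimal, self-contained sketch of the normalization idea (plain Java; StoreFileInfoSketch 
and its helpers are hypothetical, not the actual StoreFileInfo code): both "constructors" 
funnel the path through one qualification step, so "file:/a/b/f1" coming from a FileStatus 
and "/a/b/f1" coming from a bare Path end up comparing equal.
{code:java}
import java.net.URI;

final class StoreFileInfoSketch {
  private final URI initialPath;

  private StoreFileInfoSketch(URI normalized) {
    this.initialPath = normalized;
  }

  /** Stand-in for the Path-based constructor: may receive an unqualified path. */
  static StoreFileInfoSketch fromPath(String rawPath) {
    return new StoreFileInfoSketch(qualify(rawPath));
  }

  /** Stand-in for the FileStatus-based constructor: paths here are already qualified. */
  static StoreFileInfoSketch fromFileStatus(String qualifiedPath) {
    return new StoreFileInfoSketch(qualify(qualifiedPath));
  }

  private static URI qualify(String p) {
    URI uri = URI.create(p);
    // Add the scheme for bare paths so equality no longer depends on which overload
    // created the instance upstream (the root cause of the refreshStoreFiles mismatch).
    return uri.getScheme() == null ? URI.create("file:" + p) : uri;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof StoreFileInfoSketch
      && initialPath.equals(((StoreFileInfoSketch) o).initialPath);
  }

  @Override
  public int hashCode() {
    return initialPath.hashCode();
  }

  public static void main(String[] args) {
    StoreFileInfoSketch a = StoreFileInfoSketch.fromFileStatus("file:/data/table/family/f1");
    StoreFileInfoSketch b = StoreFileInfoSketch.fromPath("/data/table/family/f1");
    System.out.println(a.equals(b)); // true once both paths are qualified the same way
  }
}
{code}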



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-26999) HStore should try write WAL compact marker before replacing compacted files in StoreEngine

2022-05-05 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-26999:


 Summary: HStore should try write WAL compact marker before 
replacing compacted files in StoreEngine
 Key: HBASE-26999
 URL: https://issues.apache.org/jira/browse/HBASE-26999
 Project: HBase
  Issue Type: Sub-task
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


In HBASE-26064, it seems we altered the order in which we update different places with 
the results of a compaction:
{noformat}
@@ -1510,14 +1149,13 @@ public class HStore implements Store, HeapSize, StoreConfigInformation,
       List<Path> newFiles) throws IOException {
     // Do the steps necessary to complete the compaction.
     setStoragePolicyFromFileName(newFiles);
-    List<HStoreFile> sfs = commitStoreFiles(newFiles, true);
+    List<HStoreFile> sfs = storeEngine.commitStoreFiles(newFiles, true);
     if (this.getCoprocessorHost() != null) {
       for (HStoreFile sf : sfs) {
         getCoprocessorHost().postCompact(this, sf, cr.getTracker(), cr, user);
       }
     }
-    writeCompactionWalRecord(filesToCompact, sfs);
-    replaceStoreFiles(filesToCompact, sfs);
+    replaceStoreFiles(filesToCompact, sfs, true);
...
@@ -1581,25 +1219,24 @@ public class HStore implements Store, HeapSize, StoreConfigInformation,
         this.region.getRegionInfo(), compactionDescriptor, this.region.getMVCC());
   }
 
-  void replaceStoreFiles(Collection<HStoreFile> compactedFiles, Collection<HStoreFile> result)
-      throws IOException {
-    this.lock.writeLock().lock();
-    try {
-      this.storeEngine.getStoreFileManager().addCompactionResults(compactedFiles, result);
-      synchronized (filesCompacting) {
-        filesCompacting.removeAll(compactedFiles);
-      }
-
-      // These may be null when the RS is shutting down. The space quota Chores will fix the Region
-      // sizes later so it's not super-critical if we miss these.
-      RegionServerServices rsServices = region.getRegionServerServices();
-      if (rsServices != null && rsServices.getRegionServerSpaceQuotaManager() != null) {
-        updateSpaceQuotaAfterFileReplacement(
-            rsServices.getRegionServerSpaceQuotaManager().getRegionSizeStore(), getRegionInfo(),
-            compactedFiles, result);
-      }
-    } finally {
-      this.lock.writeLock().unlock();
+  @RestrictedApi(explanation = "Should only be called in TestHStore", link = "",
+    allowedOnPath = ".*/(HStore|TestHStore).java")
+  void replaceStoreFiles(Collection<HStoreFile> compactedFiles, Collection<HStoreFile> result,
+    boolean writeCompactionMarker) throws IOException {
+    storeEngine.replaceStoreFiles(compactedFiles, result);
+    if (writeCompactionMarker) {
+      writeCompactionWalRecord(compactedFiles, result);
+    }
+    synchronized (filesCompacting) {
+      filesCompacting.removeAll(compactedFiles);
+    }
+    // These may be null when the RS is shutting down. The space quota Chores will fix the Region
+    // sizes later so it's not super-critical if we miss these.
+    RegionServerServices rsServices = region.getRegionServerServices();
+    if (rsServices != null && rsServices.getRegionServerSpaceQuotaManager() != null) {
+      updateSpaceQuotaAfterFileReplacement(
+        rsServices.getRegionServerSpaceQuotaManager().getRegionSizeStore(), getRegionInfo(),
+        compactedFiles, result); {noformat}
While running some large-scale load tests, we ran into a FileBased SFT metafile 
inconsistency that we believe could have been avoided if the original order was 
in place. Here is the scenario we had:

1) Region R with one CF f was open on RS1. At this time, the given store had 
some files, let's say these were file1, file2 and file3;

2) Compaction started on RS1;

3) RS1 entered a long GC pause, lost ZK lock. Compaction is still running, 
though.

4) RS2 opens R. The related File SFT instance for this store then creates a new 
meta file with file1, file2 and file3.

5) Compaction on RS1 successfully completes the *storeEngine.replaceStoreFiles* 
call. This updates the in memory cache of valid files (StoreFileManager) and 
the SFT metafile for  the store engine on RS1 with the compaction resulting 
file, say file4, removing file1, file2 and file3. Note that the SFT meta file 
used by RS1 here is different (older) than the one used by RS2.

6) Compaction on RS1 tries to update WAL marker, but fails to do so, as the WAL 
already got closed when the RS1 ZK lock expired. This triggers a store close in 
RS1. As part of the store close process, it removes all files it sees as 
completed compacted, in this case, file1, file2 and file3.

7) RS2 still references file1, file2 and file3. It then gets 
FileNotFoundException when trying to open any of these files.

This situation would have been avoided if the original order of a) write the WAL 
marker, then b) replace the store files had been kept. 
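
To illustrate why the original order matters, a self-contained sketch (plain Java with 
stub methods, not the actual HStore code): writing the WAL compaction marker before 
swapping files means a failed marker write, e.g. because the WAL was already closed after 
the RS lost its ZK lock, aborts the whole step, so the stale server never rewrites its 
SFT metafile or deletes file1-file3 that the new server still references.
{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class CompactionCommitOrderSketch {
  private boolean walClosed = true; // simulate step 6: the WAL is already gone on RS1

  void completeCompaction(List<String> compactedFiles, List<String> newFiles) throws IOException {
    // a) Marker first: this throws on the stale server, leaving the store untouched.
    writeCompactionWalRecord(compactedFiles, newFiles);
    // b) Only a server that still owns its WAL gets this far and updates SFT + in-memory state.
    replaceStoreFiles(compactedFiles, newFiles);
  }

  private void writeCompactionWalRecord(List<String> in, List<String> out) throws IOException {
    if (walClosed) {
      throw new IOException("WAL closed, refusing to commit compaction of " + in);
    }
  }

  private void replaceStoreFiles(List<String> in, List<String> out) {
    System.out.println("SFT now tracks " + out + ", dropped " + in);
  }

  public static void main(String[] args) {
    try {
      new CompactionCommitOrderSketch()
        .completeCompaction(Arrays.asList("file1", "file2", "file3"), Arrays.asList("file4"));
    } catch (IOException e) {
      // Expected on the stale RS: nothing was replaced, so file1-file3 stay valid for RS2.
      System.out.println("aborted: " + e.getMessage());
    }
  }
}
{code}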



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HBASE-26971) SnapshotInfo --snapshot param is marked as required even when trying to list all snapshots

2022-04-26 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-26971.
--
Resolution: Fixed

Merged to master, branch-2, branch-2.5 and branch-2.4. Thanks for reviewing, 
[~elserj] !

> SnapshotInfo --snapshot param is marked as required even when trying to list 
> all snapshots
> --
>
> Key: HBASE-26971
> URL: https://issues.apache.org/jira/browse/HBASE-26971
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.5.0, 3.0.0-alpha-2, 2.4.11
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.12
>
>
> SnapshotInfo --list-snapshots lists all existing snapshots and doesn't need 
> any filter; however, the --snapshot param is marked as required, causing the list to 
> fail if this param is not defined. 
> Also, the help description is a bit confusing about which options should be 
> used together.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HBASE-26971) SnapshotInfo --snapshot param is marked as required even when trying to list all snapshots

2022-04-22 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-26971:


 Summary: SnapshotInfo --snapshot param is marked as required even 
when trying to list all snapshots
 Key: HBASE-26971
 URL: https://issues.apache.org/jira/browse/HBASE-26971
 Project: HBase
  Issue Type: Improvement
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


SnapshotInfo --list-snapshots lists all existing snapshots and doesn't need any 
filter; however, the --snapshot param is marked as required, causing the list to fail 
if this param is not defined. 

Also, the help description is a bit confusing about which options should be 
used together.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HBASE-26927) Add snapshot scanner UT with SFT and some cleanups to TestTableSnapshotScanner

2022-04-05 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-26927.
--
Resolution: Fixed

Merged into master, branch-2 and branch-2.5. Thanks for reviewing it, 
[~zhangduo] [~elserj] !

> Add snapshot scanner UT with SFT and some cleanups to TestTableSnapshotScanner
> --
>
> Key: HBASE-26927
> URL: https://issues.apache.org/jira/browse/HBASE-26927
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.5.0, 3.0.0-alpha-2, 3.0.0-alpha-3, 2.4.11
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>
>
> This is to replicate current TableSnapshotScanner UTs to run over an SFT 
> cluster, just extending the current TableSnapshotScanner UTs with overrides to the setup 
> method. Also applied some fixes/cleanups to TableSnapshotScanner.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26927) Add snapshot scanner UT with SFT and some cleanups to TestTableSnapshotScanner

2022-04-04 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-26927:


 Summary: Add snapshot scanner UT with SFT and some cleanups to 
TestTableSnapshotScanner
 Key: HBASE-26927
 URL: https://issues.apache.org/jira/browse/HBASE-26927
 Project: HBase
  Issue Type: Sub-task
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


This is to replicate current TableSnapshotScanner UTs to run over an SFT 
cluster, just extending the current TableSnapshotScanner UTs with overrides to the setup 
method. Also applied some fixes/cleanups to TableSnapshotScanner.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26838) Junit jar is not included in the hbase tar ball, causing issues for some hbase tools that do rely on it

2022-03-31 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-26838.
--
Resolution: Fixed

Merged into master, then cherry-picked into branch-2, branch-2.4 and 
branch-2.4. Thanks for reviewing it, [~elserj] , [~ndimiduk] !

> Junit jar is not included in the hbase tar ball, causing issues for some  
> hbase tools that do rely on it
> 
>
> Key: HBASE-26838
> URL: https://issues.apache.org/jira/browse/HBASE-26838
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests, tooling
>Affects Versions: 2.5.0, 3.0.0-alpha-2, 2.6.0, 2.4.11
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.12
>
>
> We used to include the junit jar in the generated tar ball lib directory. After 
> some sanitisation of unnecessary libs for most of the hbase processes, junit got 
> removed from the packaging, so that it doesn't end up in the hbase classpath by default. 
> Some testing tools, however, do depend on junit at runtime, and would now fail 
> with NoClassDefFoundError, like 
> [IntegrationTestIngest:|https://hbase.apache.org/book.html#chaos.monkey.properties]
> {noformat}
> 2022-03-14T21:54:50,483 INFO  [main] client.AsyncConnectionImpl: Connection 
> has been closed by main.
> Exception in thread "main" java.lang.NoClassDefFoundError: org/junit/Assert
>   at 
> org.apache.hadoop.hbase.IntegrationTestIngest.initTable(IntegrationTestIngest.java:101)
>   at 
> org.apache.hadoop.hbase.IntegrationTestIngest.setUpCluster(IntegrationTestIngest.java:92)
>   at 
> org.apache.hadoop.hbase.IntegrationTestBase.setUp(IntegrationTestBase.java:170)
>   at 
> org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:153)
>   at 
> org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:153)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at 
> org.apache.hadoop.hbase.IntegrationTestIngest.main(IntegrationTestIngest.java:259)
> Caused by: java.lang.ClassNotFoundException: org.junit.Assert
>   at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
>   at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
>   ... 7 more {noformat}
> Discussing with [~elserj] internally, we believe a reasonable solution would 
> be to include junit jar back into the tarball, under "lib/test" dir so that 
> it's not automatically added to hbase processes classpath, but still allow 
> operators to manually define it in a convenient way, like below:
> {noformat}
> HBASE_CLASSPATH="$HBASE_HOME/lib/tests/*" hbase 
> org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26881) Backport HBASE-25368 to branch-2

2022-03-25 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-26881.
--
Resolution: Fixed

Merged to branch-2. Thanks for reviewing, [~apurtell] !

> Backport HBASE-25368 to branch-2 
> -
>
> Key: HBASE-26881
> URL: https://issues.apache.org/jira/browse/HBASE-26881
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> HBASE-26640 introduced two extra paths under master:store table: 
> ".initializing" and ".initialized", in order to control when such store has 
> been completed started for SFT.
> The problem is that TestHFileProcedurePrettyPrinter uses 
> RegionInfo.isEncodedRegionName to determine if a given child path in the 
> table dir is a valid region dir. The current implementation of 
> RegionInfo.isEncodedRegionName considers ".initializing" and ".initialized" 
> as valid region encoded names, thus the test ends up picking one of the flag 
> dirs to list hfiles that should have been modified when validating the test 
> outcome.
> Further improvements have been made to RegionInfo.isEncodedRegionName in 
> HBASE-25368 to properly validate region names, but those weren't backported to 
> branch-2.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26881) Fix TestHFileProcedurePrettyPrinter broken by changes from HBASE-26640

2022-03-23 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-26881:


 Summary: Fix TestHFileProcedurePrettyPrinter broken by changes 
from HBASE-26640 
 Key: HBASE-26881
 URL: https://issues.apache.org/jira/browse/HBASE-26881
 Project: HBase
  Issue Type: Sub-task
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


HBASE-26640 introduced two extra paths under master:store table: 
".initializing" and ".initialized", in order to control when such store has 
been completed started for SFT.

The problem is that TestHFileProcedurePrettyPrinter assumes all child dirs from 
master:store would be region dirs, so it ends up picking up one of the flag 
dirs to list hfiles that should have been modified when validating the test 
outcome.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26838) Junit jar is not included in the hbase tar ball, causing issues for some hbase tools that do rely on it

2022-03-14 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-26838:


 Summary: Junit jar is not included in the hbase tar ball, causing 
issues for some  hbase tools that do rely on it
 Key: HBASE-26838
 URL: https://issues.apache.org/jira/browse/HBASE-26838
 Project: HBase
  Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


We used to include the junit jar in the generated tar ball lib directory. After 
some sanitisation of unnecessary libs for most of the hbase processes, junit got 
removed from the packaging, so that it doesn't end up in the hbase classpath by default. 

Some testing tools, however, do depend on junit at runtime, and would now fail 
with NoClassDefFoundError, like 
[IntegrationTestIngest:|https://hbase.apache.org/book.html#chaos.monkey.properties]
{noformat}
2022-03-14T21:54:50,483 INFO  [main] client.AsyncConnectionImpl: Connection has 
been closed by main.
Exception in thread "main" java.lang.NoClassDefFoundError: org/junit/Assert
at 
org.apache.hadoop.hbase.IntegrationTestIngest.initTable(IntegrationTestIngest.java:101)
at 
org.apache.hadoop.hbase.IntegrationTestIngest.setUpCluster(IntegrationTestIngest.java:92)
at 
org.apache.hadoop.hbase.IntegrationTestBase.setUp(IntegrationTestBase.java:170)
at 
org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:153)
at 
org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at 
org.apache.hadoop.hbase.IntegrationTestIngest.main(IntegrationTestIngest.java:259)
Caused by: java.lang.ClassNotFoundException: org.junit.Assert
at 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 7 more {noformat}
Discussing with [~elserj] internally, we believe a reasonable solution would be 
to include junit jar back into the tarball, under "lib/test" dir so that it's 
not automatically added to hbase processes classpath, but still allow operators 
to manually define it in a convenient way, like below:
{noformat}
HBASE_CLASSPATH="$HBASE_HOME/lib/tests/*" hbase 
org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26707) Reduce number of renames during bulkload

2022-02-25 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-26707.
--
Resolution: Fixed

Thanks for the contribution, [~bszabolcs] !

> Reduce number of renames during bulkload
> 
>
> Key: HBASE-26707
> URL: https://issues.apache.org/jira/browse/HBASE-26707
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
>
> Make sure we only do a single rename operation during bulkload when the 
> StoreEngine does not require the use of tmp directories.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26577) Update ref guide section for IT and Chaos Monkey to explain the additions from HBASE-26556

2022-01-25 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-26577.
--
Resolution: Fixed

Merged into master. Thanks for reviewing it, [~elserj] !

> Update ref guide section for IT and Chaos Monkey to explain the additions 
> from HBASE-26556
> --
>
> Key: HBASE-26577
> URL: https://issues.apache.org/jira/browse/HBASE-26577
> Project: HBase
>  Issue Type: Improvement
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
>
> HBASE-26556 introduced a customisable monkey factory for the slow 
> deterministic policy, as well as made it possible to plug in implementations 
> of the hbase remote shell commands. This is to document how these new 
> features can be used.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26662) User.createUserForTesting should not reset UserProvider.groups every time if hbase.group.service.for.test.only is true

2022-01-18 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-26662.
--
Resolution: Fixed

Thanks for reviewing it, [~elserj] , [~zhangduo] !

> User.createUserForTesting should not reset UserProvider.groups every time if 
> hbase.group.service.for.test.only is true
> --
>
> Key: HBASE-26662
> URL: https://issues.apache.org/jira/browse/HBASE-26662
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.5.0, 3.0.0-alpha-2, 2.4.9, 2.6.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.10
>
>
> The _if check_ below will always unnecessarily reset static var 
> _UserProvider.groups_ to a newly created instance of TestingGroups every time 
> `User.createUserForTesting` is called.
> {noformat}
> if (!(UserProvider.groups instanceof TestingGroups) ||
> conf.getBoolean(TestingGroups.TEST_CONF, false)) {
>   UserProvider.groups = new TestingGroups(UserProvider.groups);
> }
> {noformat}
> For tests creating multiple {_}test users{_}, this causes the latest created 
> user to reset _groups_, and all previously created users would now have to be 
> available on the {_}User.underlyingImplementation{_}, which will not always 
> be true.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

