[ 
https://issues.apache.org/jira/browse/HBASE-28170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-28170:
-----------------------------------------
    Summary: Put the cached time at the beginning of the block; run cache 
validation in the background when retrieving the persistent cache  (was: Put 
the cached time at the beginning of the block run cache validation in the 
background when retrieving the persistent cache)

> Put the cached time at the beginning of the block; run cache validation in 
> the background when retrieving the persistent cache
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-28170
>                 URL: https://issues.apache.org/jira/browse/HBASE-28170
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>
> In HBASE-28004, we added a "cached time" long at the end of each block in the
> bucket cache. We also record the cached time in the backing map that we
> periodically persist to disk, so that the cache can be recovered after
> crashes/restarts. The persisted backing map also includes the last
> modification time of the cache itself.
> On restart, once we read the backing map from the persisted file, we compare
> the last modification time recorded there against the current last
> modification time of the cache. If they differ, the cache was updated after
> the backing map was persisted, so the backing map may not be accurate. We
> then iterate through the backing map entries and compare each entry's cached
> time against the cached time stored with the related block in the cache; if
> they differ, we remove the entry from the map.
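The startup validation described above can be sketched in Java as follows. This is a minimal, hedged sketch, not HBase's BucketCache code. `BackingMapEntry`, `PersistedCacheValidator` and `validate` are hypothetical names, and a `ByteBuffer` stands in for the cache file. Each entry is assumed to hold the block's offset, its total length (payload plus the 8-byte cached time at the end, per the HBASE-28004 layout) and the cached time recorded when the map was persisted.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for a persisted backing-map entry: the block's offset
// in the cache, its total on-cache length (payload plus the 8-byte cached
// time), and the cached time recorded when the map was persisted.
record BackingMapEntry(int offset, int length, long cachedTime) {}

class PersistedCacheValidator {
    // HBASE-28004 layout as described: the cached time is the last 8 bytes
    // of the block.
    static long trailingCachedTime(ByteBuffer cache, BackingMapEntry e) {
        return cache.getLong(e.offset() + e.length() - Long.BYTES);
    }

    // Removes every entry whose cached time no longer matches the bytes in
    // the cache. Returns the number of entries removed.
    static int validate(ByteBuffer cache, long persistedCacheMtime,
                        long currentCacheMtime,
                        Map<String, BackingMapEntry> backingMap) {
        if (persistedCacheMtime == currentCacheMtime) {
            // Cache untouched since the map was persisted: trust the map.
            return 0;
        }
        int removed = 0;
        Iterator<Map.Entry<String, BackingMapEntry>> it =
            backingMap.entrySet().iterator();
        while (it.hasNext()) {
            BackingMapEntry e = it.next().getValue();
            if (trailingCachedTime(cache, e) != e.cachedTime()) {
                it.remove(); // block was overwritten after the map was saved
                removed++;
            }
        }
        return removed;
    }
}
```

Every entry is visited once, so the cost grows with the number of cached blocks, which is why this scan becomes expensive for very large caches.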
> Currently this validation runs at RS initialisation time, but with caches as
> large as 1.6TB/30M+ blocks it can take up to an hour, leaving the RS unusable
> for that whole time. This PR moves the validation to a background task, while
> direct accesses to a block in the cache also perform the "cached time"
> comparison, so a block read before the background scan reaches it is still
> checked.
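A minimal sketch of the proposed change, assuming a `ConcurrentHashMap` backing map and a single daemon thread for the scan: startup only submits the scan, and every read repeats the cached-time check until the scan has processed that block. `BackgroundValidatedCache`, `CachedBlockRef`, `startValidation` and `getBlock` are hypothetical names. The cached time is read from the first 8 bytes of the block, following the layout this PR proposes. This illustrates the idea only; it is not the PR's implementation.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical entry: offset, total length (8-byte cached time + payload),
// and the cached time recorded in the persisted backing map.
record CachedBlockRef(int offset, int length, long cachedTime) {}

class BackgroundValidatedCache {
    private final ByteBuffer cache;
    private final Map<String, CachedBlockRef> backingMap = new ConcurrentHashMap<>();
    // Daemon thread so a pending scan never keeps the JVM alive.
    private final ExecutorService validator = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "cache-validator");
        t.setDaemon(true);
        return t;
    });

    BackgroundValidatedCache(ByteBuffer cache, Map<String, CachedBlockRef> persisted) {
        this.cache = cache;
        this.backingMap.putAll(persisted);
    }

    // Layout proposed by this PR: the cached time is the first 8 bytes.
    private boolean isValid(CachedBlockRef ref) {
        return cache.getLong(ref.offset()) == ref.cachedTime();
    }

    // Full scan, submitted to the background so startup does not wait on it.
    CompletableFuture<Void> startValidation() {
        return CompletableFuture.runAsync(() ->
            // remove(key, value) only drops the mapping if it still points at
            // the ref we checked, so a concurrently re-cached block survives.
            backingMap.forEach((key, ref) -> {
                if (!isValid(ref)) {
                    backingMap.remove(key, ref);
                }
            }), validator);
    }

    // Read path: the same check runs on access, so a block the background
    // scan has not reached yet is never served stale.
    byte[] getBlock(String key) {
        CachedBlockRef ref = backingMap.get(key);
        if (ref == null) {
            return null;
        }
        if (!isValid(ref)) {
            backingMap.remove(key, ref);
            return null;
        }
        byte[] payload = new byte[ref.length() - Long.BYTES];
        ByteBuffer view = cache.duplicate(); // independent position for the read
        view.position(ref.offset() + Long.BYTES);
        view.get(payload);
        return payload;
    }

    int size() {
        return backingMap.size();
    }
}
```

Checking on access as well as in the scan means correctness does not depend on when the scan finishes; the scan only reclaims stale entries that are never read.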
> This PR also moves the "cached time" to the beginning of the block in the
> cache, instead of the end. We noticed that with the "cached time" at the end
> we can fail to ensure consistency under some conditions. Consider the
> following:
> 1) A block B1 of size S gets allocated at offset 0 with cached time T1;
> 2) The backing map is persisted, containing B1 at offset 0 and cached time T1;
> 3) B1 is evicted. Its offset in the cache is now free, but its contents are
> still there, including the cached time T1 at its end;
> 4) A new block B2 of size S/2 gets allocated at offset 0 with cached time T2;
> 5) RS crashes before the backing map gets saved, so the persisted backing map 
> still has only the reference to B1, but not B2;
> 6) At restart, we run the validation. Because B2 is half the size of B1, it
> has not overwritten B1's cached time in the cache, so B1 passes validation
> even though the first half of its content has been overwritten by B2.
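The six steps above can be replayed against a plain `ByteBuffer` to compare the two layouts. This is a minimal Java sketch, not HBase's BucketCache code: `CachedTimeLayoutDemo`, `write`, `timePosition` and `b1PassesValidation` are hypothetical names, the buffer stands in for the bucket cache, block sizes include the 8-byte cached time, and S is assumed to be 64 bytes.

```java
import java.nio.ByteBuffer;

// Hypothetical replay of steps 1-6; not HBase code. Sizes include the 8-byte
// cached time; timeAtStart selects the layout being tested.
class CachedTimeLayoutDemo {
    // Writes a block of `size` bytes at `offset`: payload filled with `fill`,
    // plus the cached time either before or after the payload.
    static void write(ByteBuffer cache, int offset, int size, long cachedTime,
                      boolean timeAtStart, byte fill) {
        int payloadStart = timeAtStart ? offset + Long.BYTES : offset;
        for (int i = 0; i < size - Long.BYTES; i++) {
            cache.put(payloadStart + i, fill);
        }
        cache.putLong(timePosition(offset, size, timeAtStart), cachedTime);
    }

    static int timePosition(int offset, int size, boolean timeAtStart) {
        return timeAtStart ? offset : offset + size - Long.BYTES;
    }

    // Returns true if B1 passes validation after B2 reused its offset.
    static boolean b1PassesValidation(boolean timeAtStart) {
        final int s = 64;                                   // size S of B1
        final long t1 = 1_000L, t2 = 2_000L;
        ByteBuffer cache = ByteBuffer.allocate(s);

        write(cache, 0, s, t1, timeAtStart, (byte) 1);      // 1) B1 at offset 0, time T1
        long persistedB1Time = t1;                          // 2) map persisted: B1 -> T1
        // 3) B1 evicted: the slot is free but its bytes are left in place.
        write(cache, 0, s / 2, t2, timeAtStart, (byte) 2);  // 4) B2 (S/2) at offset 0, T2
        // 5) crash: the persisted map still only knows about B1.
        long storedTime = cache.getLong(timePosition(0, s, timeAtStart));
        return storedTime == persistedB1Time;               // 6) validate B1
    }
}
```

With the time at the end, `b1PassesValidation(false)` returns true even though bytes 0 to 31 now belong to B2. With the time at the start, B2's header replaces T1 with T2, so `b1PassesValidation(true)` returns false. The sketch relies on the scenario's assumption that the new block starts at the freed block's offset.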



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
