[ https://issues.apache.org/jira/browse/HBASE-28170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wellington Chevreuil updated HBASE-28170:
-----------------------------------------
    Summary: Put the cached time at the beginning of the block; run cache validation in the background when retrieving the persistent cache  (was: Put the cached time at the beginning of the block run cache validation in the background when retrieving the persistent cache)

> Put the cached time at the beginning of the block; run cache validation in
> the background when retrieving the persistent cache
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-28170
>                 URL: https://issues.apache.org/jira/browse/HBASE-28170
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>
> In HBASE-28004, we added a "cached time" long at the end of each block in the
> bucket cache. We also record the cached time in the backing map we persist to
> disk periodically, in order to recover the cache after crashes/restarts. The
> persisted backing map includes the last modification time of the cache itself.
> On restarts, once we read the backing map from the persisted file, we compare
> the last modification time of the cache recorded there against the actual
> last modification time of the cache. If those differ, it means the cache was
> updated after the backing map was persisted, so the backing map might not be
> accurate. We then iterate through the backing map entries and compare each
> entry's cached time against the one stored with the related block in the
> cache; if those differ, we remove the entry from the map.
> Currently this validation is performed at RS initialisation time, but with
> caches as large as 1.6TB/30M+ blocks, it can take up to an hour, during which
> the RS is unusable. This PR changes this validation to be performed in the
> background, whilst direct accesses to a block in the cache also perform the
> "cached time" comparison.
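The validation flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual BucketCache code: the class `BackgroundCacheValidation`, the offset-keyed `backingMap`, and the `cachedTimeInCache` reader are all hypothetical simplifications (the real backing map is keyed by block, not by offset).

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.LongUnaryOperator;

// Hypothetical sketch: validate a recovered backing map in the background,
// while each direct access also checks the entry it touches.
final class BackgroundCacheValidation {
    // Simplified stand-in for the persisted backing map: block offset -> cached time.
    final Map<Long, Long> backingMap = new ConcurrentHashMap<>();
    // Assumed reader that returns the cached time stored with the block at an offset.
    private final LongUnaryOperator cachedTimeInCache;

    BackgroundCacheValidation(Map<Long, Long> persisted, LongUnaryOperator cachedTimeInCache) {
        this.backingMap.putAll(persisted);
        this.cachedTimeInCache = cachedTimeInCache;
    }

    // Used by both the background pass and direct block accesses.
    // Returns true if the entry matches the cache; otherwise drops it from the map.
    boolean checkEntry(long offset) {
        Long expected = backingMap.get(offset);
        if (expected == null) {
            return false; // no entry (possibly already removed)
        }
        if (cachedTimeInCache.applyAsLong(offset) == expected) {
            return true;
        }
        backingMap.remove(offset, expected);
        return false;
    }

    // On restart: only validate if the cache changed after the map was persisted,
    // and do so on a separate thread so initialisation is not blocked.
    Future<?> validateInBackground(long persistedModTime, long actualModTime,
                                   ExecutorService pool) {
        if (persistedModTime == actualModTime) {
            return CompletableFuture.completedFuture(null);
        }
        return pool.submit(() -> backingMap.keySet().forEach(this::checkEntry));
    }
}
```

In this sketch a read path would call `checkEntry` before serving a block, so an entry that the background pass has not reached yet is still verified on first access.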
> This PR also moves the "cached time" to the beginning of the block in the
> cache, instead of the end. We noticed that with the "cached time" at the end
> we can fail to ensure consistency under some conditions. Consider the
> following:
> 1) A block B1 of size S gets allocated at offset 0 with cached time T1;
> 2) The backing map is persisted, containing B1 at offset 0 and cached time T1;
> 3) B1 is evicted. Its offset in the cache is now free, however its contents
> are still there, including the cached time T1 at its end;
> 4) A new block B2 of size S/2 gets allocated at offset 0 with cached time T2;
> 5) RS crashes before the backing map gets saved, so the persisted backing map
> still has only the reference to B1, but not B2;
> 6) At restart, we run the validation. Because B2 is half the size of B1, it
> has not overwritten B1's cached time in the cache, so we will successfully
> validate B1, although the first half of its content has been overwritten by
> B2.
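The six steps above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the cache is modelled as a plain `ByteBuffer`, and the class and method names (`CachedTimePlacementDemo`, `write`, `validate`, `staleEntryPasses`) are hypothetical, not taken from HBase.

```java
import java.nio.ByteBuffer;

// Hypothetical simulation of the B1/B2 scenario with the cached time stored
// either at the end or at the beginning of each block.
final class CachedTimePlacementDemo {
    static final int TS = Long.BYTES; // bytes used by the cached time

    // Writes a block of total length `size` at `offset`.
    static void write(ByteBuffer cache, int offset, int size, long cachedTime, boolean timeFirst) {
        int timePos = timeFirst ? offset : offset + size - TS;
        int payloadPos = timeFirst ? offset + TS : offset;
        for (int i = 0; i < size - TS; i++) {
            cache.put(payloadPos + i, (byte) cachedTime); // filler payload bytes
        }
        cache.putLong(timePos, cachedTime);
    }

    // Compares a persisted entry's cached time with the one found in the cache.
    static boolean validate(ByteBuffer cache, int offset, int size, long expected, boolean timeFirst) {
        int timePos = timeFirst ? offset : offset + size - TS;
        return cache.getLong(timePos) == expected;
    }

    // Returns true if the stale B1 entry wrongly passes validation.
    static boolean staleEntryPasses(boolean timeFirst) {
        ByteBuffer cache = ByteBuffer.allocate(128);
        int s = 64;
        long t1 = 1000L;
        long t2 = 2000L;
        write(cache, 0, s, t1, timeFirst);     // steps 1-2: B1 cached, backing map persisted
        write(cache, 0, s / 2, t2, timeFirst); // steps 3-4: B1 evicted, B2 reuses offset 0
        // steps 5-6: crash; the persisted map still holds B1 at offset 0, size S, time T1
        return validate(cache, 0, s, t1, timeFirst);
    }

    public static void main(String[] args) {
        System.out.println("time at end, stale B1 passes:   " + staleEntryPasses(false));
        System.out.println("time at start, stale B1 passes: " + staleEntryPasses(true));
    }
}
```

With the cached time at the end, B2 never touches the last eight bytes of B1's old region, so the stale entry validates. With the cached time at the beginning, any block later written at the same offset overwrites it, so the mismatch is detected.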