[ https://issues.apache.org/jira/browse/HBASE-29857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18054891#comment-18054891 ]
rstest commented on HBASE-29857:
--------------------------------
I found that the issue is partially solved by
[HBASE-28839|https://issues.apache.org/jira/browse/HBASE-28839]:
*HBase 2.6.3 Persistence Format (OLD):*
{code:java}
// The writer stores numChunks first, then the chunks.
byte[] bytes = new byte[Long.BYTES];
in.read(bytes); // fill bytes from the persistence stream
long numChunks = Bytes.toLong(bytes, 0); // read numChunks
// BUG: when numChunks == 0 (empty cache), the code still tries to read the first chunk
BucketCacheProtos.BucketCacheEntry firstChunk =
  BucketCacheProtos.BucketCacheEntry.parseDelimitedFrom(in); // returns null
parsePB(firstChunk); // calls firstChunk.getDeserializersMap() -> NPE
{code}
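To make the old failure mode concrete, here is a minimal, self-contained sketch (Java 11+) that models the 2.6.3 layout with plain byte streams instead of protobuf. Everything in it is hypothetical: OldFormatSketch, serializeOld, retrieveOld and parseDelimited are illustrative names, strings stand in for BucketCacheEntry messages, and a one-byte length prefix replaces protobuf's varint (the two encodings coincide for records shorter than 128 bytes). The one behaviour deliberately copied from protobuf is that a delimited parse returns null when the stream is already at EOF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical model of the 2.6.3 layout: an 8-byte numChunks header followed by
// length-delimited chunks. Strings stand in for BucketCacheEntry messages.
class OldFormatSketch {

  // Old writer: numChunks first; the per-chunk loop never runs for an empty cache.
  static byte[] serializeOld(List<String> chunks) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] header = ByteBuffer.allocate(Long.BYTES).putLong(chunks.size()).array();
    out.write(header, 0, header.length);
    for (String chunk : chunks) {
      byte[] b = chunk.getBytes(StandardCharsets.UTF_8);
      if (b.length >= 128) {
        throw new IllegalArgumentException("sketch only supports chunks < 128 bytes");
      }
      out.write(b.length); // one-byte prefix, same as protobuf's varint for lengths < 128
      out.write(b, 0, b.length);
    }
    return out.toByteArray();
  }

  // Stand-in for parseDelimitedFrom(): returns null if the stream is already at EOF.
  static byte[] parseDelimited(InputStream in) throws IOException {
    int len = in.read();
    if (len == -1) {
      return null;
    }
    byte[] b = in.readNBytes(len);
    if (b.length != len) {
      throw new IOException("truncated chunk");
    }
    return b;
  }

  // Old reader: reads numChunks, then reads the first chunk unconditionally.
  static int retrieveOld(byte[] file) throws IOException {
    InputStream in = new ByteArrayInputStream(file);
    byte[] bytes = in.readNBytes(Long.BYTES);
    if (bytes.length != Long.BYTES) {
      throw new IOException("missing numChunks header");
    }
    long numChunks = ByteBuffer.wrap(bytes).getLong(); // 0 for an empty cache, but never checked
    byte[] firstChunk = parseDelimited(in);            // null when no chunk was written
    return firstChunk.length;                          // NullPointerException for an empty cache
  }

  public static void main(String[] args) throws IOException {
    System.out.println(retrieveOld(serializeOld(List.of("meta")))); // 4
    try {
      retrieveOld(serializeOld(List.of()));
    } catch (NullPointerException e) {
      System.out.println("NullPointerException on an empty cache");
    }
  }
}
```

The non-empty case reads its first chunk normally; the empty case writes only the 8-byte header, so the delimited parse hits EOF, returns null, and the dereference fails.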
*Master Persistence Format (NEW):*
{code:java}
// In BucketProtoUtils.serializeAsPB():
fos.write(PB_MAGIC_V2);                     // 1. Write magic bytes
toPB(cache, builder).writeDelimitedTo(fos); // 2. ALWAYS write metadata (even if empty)
for (Map.Entry<BlockCacheKey, BucketEntry> entry : cache.backingMap.entrySet()) {
  // ... 3. Write chunks (the loop body only runs if the map is non-empty)
}

// In BucketCache.retrieveChunkedBackingMap():
BucketCacheProtos.BucketCacheEntry cacheEntry =
  BucketCacheProtos.BucketCacheEntry.parseDelimitedFrom(in); // reads metadata (always present)
while (in.available() > 0) {
  // ... gracefully handles the case with no chunks
}
{code}
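A similarly hypothetical sketch (Java 11+) of the newer layout: magic bytes, a metadata record that is always written, then zero or more chunk records read while in.available() > 0. NewFormatSketch, serialize, retrieve, writeDelimited and parseDelimited are illustrative names, the MAGIC value is a placeholder rather than the real PB_MAGIC_V2 bytes, and strings again stand in for protobuf messages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the master layout: magic bytes, an always-present
// metadata record, then zero or more chunk records.
class NewFormatSketch {
  // Placeholder magic; NOT the real PB_MAGIC_V2 value.
  static final byte[] MAGIC = {'S', 'K', 'V', '2'};

  static void writeDelimited(ByteArrayOutputStream out, String s) {
    byte[] b = s.getBytes(StandardCharsets.UTF_8);
    if (b.length >= 128) {
      throw new IllegalArgumentException("sketch only supports records < 128 bytes");
    }
    out.write(b.length); // one-byte prefix, same as protobuf's varint for lengths < 128
    out.write(b, 0, b.length);
  }

  // Stand-in for parseDelimitedFrom(): returns null if the stream is already at EOF.
  static String parseDelimited(InputStream in) throws IOException {
    int len = in.read();
    if (len == -1) {
      return null;
    }
    byte[] b = in.readNBytes(len);
    if (b.length != len) {
      throw new IOException("truncated record");
    }
    return new String(b, StandardCharsets.UTF_8);
  }

  // 1. magic, 2. metadata ALWAYS, 3. one record per entry (none for an empty cache).
  static byte[] serialize(String metadata, List<String> entries) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(MAGIC, 0, MAGIC.length);
    writeDelimited(out, metadata);
    for (String entry : entries) {
      writeDelimited(out, entry);
    }
    return out.toByteArray();
  }

  // Returns the metadata record followed by every chunk record that was read.
  static List<String> retrieve(byte[] file) throws IOException {
    InputStream in = new ByteArrayInputStream(file);
    if (!Arrays.equals(in.readNBytes(MAGIC.length), MAGIC)) {
      throw new IOException("bad magic");
    }
    List<String> result = new ArrayList<>();
    result.add(parseDelimited(in)); // metadata is always present in this layout
    while (in.available() > 0) {    // zero iterations for an empty cache
      result.add(parseDelimited(in));
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(retrieve(serialize("meta", List.of())));         // [meta]
    System.out.println(retrieve(serialize("meta", List.of("a", "b")))); // [meta, a, b]
  }
}
```

Because the metadata record is written unconditionally, an empty cache still produces a readable record, and the chunk loop simply runs zero times.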
However, it would still be good to add a null check on the result of
parseDelimitedFrom(in): currently the NPE is only caught by a generic exception
handler introduced in
[HBASE-28839|https://issues.apache.org/jira/browse/HBASE-28839].
Proposed fix:
{code:java}
private void retrieveChunkedBackingMap(FileInputStream in) throws IOException {
  BucketCacheProtos.BucketCacheEntry cacheEntry =
    BucketCacheProtos.BucketCacheEntry.parseDelimitedFrom(in);
  // HBASE-29857: Handle the case where the persistence file is empty or corrupted.
  // parseDelimitedFrom() returns null when there is no data to read.
  if (cacheEntry == null) {
    throw new IOException(
      "Failed to read cache entry from persistence file (possibly empty or corrupted)");
  }
  // ... continue parsing cacheEntry and the chunks that follow ...
}
{code}
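The guard can be exercised in isolation. In this sketch (Java 11+) NullGuardSketch, readCacheEntry and parseDelimited are hypothetical names, and parseDelimited only imitates protobuf's null-at-EOF behaviour with a one-byte length prefix; the null check and the exception message are taken from the proposed fix.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class NullGuardSketch {
  // Stand-in for BucketCacheEntry.parseDelimitedFrom(): null at EOF, as protobuf does.
  static byte[] parseDelimited(InputStream in) throws IOException {
    int len = in.read();
    if (len == -1) {
      return null;
    }
    byte[] b = in.readNBytes(len);
    if (b.length != len) {
      throw new IOException("truncated record");
    }
    return b;
  }

  // Mirrors the proposed guard: a missing record becomes a descriptive IOException
  // instead of a NullPointerException further down the parsing path.
  static byte[] readCacheEntry(InputStream in) throws IOException {
    byte[] cacheEntry = parseDelimited(in);
    if (cacheEntry == null) {
      throw new IOException(
          "Failed to read cache entry from persistence file (possibly empty or corrupted)");
    }
    return cacheEntry;
  }

  public static void main(String[] args) throws IOException {
    byte[] ok = readCacheEntry(new ByteArrayInputStream(new byte[] {3, 'a', 'b', 'c'}));
    System.out.println(ok.length); // 3
    try {
      readCacheEntry(new ByteArrayInputStream(new byte[0]));
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the check in place, an empty persistence stream fails with an explicit, descriptive IOException instead of an NPE that only a generic handler happens to catch.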
[~wchevreuil] What do you think?
> BucketCache fails to start when persistence file was written with empty cache
> -----------------------------------------------------------------------------
>
> Key: HBASE-29857
> URL: https://issues.apache.org/jira/browse/HBASE-29857
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.6.3
> Reporter: rstest
> Priority: Critical
>
> When a RegionServer with BucketCache persistence enabled is restarted, if the
> BucketCache was empty at shutdown time, the new RegionServer fails to start
> with a `NullPointerException` in `BucketCache.parsePB()`.
>
> The bug is in the interaction between `BucketProtoUtils.serializeAsPB()` and
> `BucketCache.retrieveChunkedBackingMap()`:
> 1. During shutdown with an empty cache: when `backingMap.size() == 0`,
> `serializeAsPB()` writes `numChunks = 0` to the persistence file, but the
> loop that writes `BucketCacheEntry` objects never executes (there are
> no entries to iterate), so no `BucketCacheEntry` is written to the file.
> 2. During startup: `retrieveChunkedBackingMap()` reads `numChunks = 0` from
> the file, but still attempts to read the first chunk using
> `parseDelimitedFrom()`. Since no `BucketCacheEntry` was written,
> `parseDelimitedFrom()` returns `null`.
> 3. NPE occurs: The null `firstChunk` is passed to `parsePB()`, which calls
> `firstChunk.getDeserializersMap()` on the null object, causing NPE.
>
> This bug prevents the RegionServer from restarting.
> I will provide a fix in a PR, along with a unit test that reproduces the bug
> when the fix is not applied.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)