[
https://issues.apache.org/jira/browse/HBASE-29627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wellington Chevreuil resolved HBASE-29627.
------------------------------------------
Resolution: Fixed
Merged into master, branch-3, branch-2 and branch-2.6. Thanks for reviewing it,
[~psomogyi]!
> Handle any block cache fetching errors when reading a block in HFileReaderImpl
> ------------------------------------------------------------------------------
>
> Key: HBASE-29627
> URL: https://issues.apache.org/jira/browse/HBASE-29627
> Project: HBase
> Issue Type: Improvement
> Components: BlockCache
> Affects Versions: 3.0.0-beta-1, 2.7.0
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.4
>
>
> One of our customers faced a situation where blocks were cached
> uncompressed into the bucket cache following flushes or compactions (due
> to the issue reported in HBASE-29623). At read time, during cache
> retrieval in HFileReaderImpl, decoding such a block fails, throwing the
> uncaught exception below and failing the read indefinitely (a minimal
> sketch of the failing lookup follows the trace):
> {noformat}
> 2025-09-17 06:38:25,607 ERROR org.apache.hadoop.hbase.regionserver.CompactSplit: Compaction failed region=XXXXXXXXXXXXXXXXXXXXXX,1721528038124.a3012627f502c78738430343b0b54966., storeName=a3012627f502c78738430343b0b54966/0, priority=45, startTime=1758091104691
> java.lang.IllegalArgumentException: There is no data block encoder for given id '5'
>   at org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.getEncodingById(DataBlockEncoding.java:157)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock.getDataBlockEncoding(HFileBlock.java:2003)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.getCachedBlock(HFileReaderImpl.java:1122)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1288)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1249)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:750)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$EncodedScanner.next(HFileReaderImpl.java:1528)
>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:194)
>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:112)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:681)
>   at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:437)
>   at org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:354)
> {noformat}
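> For reference, here is a self-contained stand-in for that failing lookup.
> The id/name pairs are a subset for illustration only; DataBlockEncoding in
> the trace above is the real implementation.
> {code:java}
> import java.util.Map;
>
> public class EncoderLookupSketch {
>   // Illustrative subset of known encoding ids.
>   private static final Map<Short, String> ENCODERS =
>       Map.of((short) 0, "NONE", (short) 2, "PREFIX",
>              (short) 3, "DIFF", (short) 4, "FAST_DIFF");
>
>   static String getEncodingById(short id) {
>     String name = ENCODERS.get(id);
>     if (name == null) {
>       // Same failure mode as DataBlockEncoding.getEncodingById above.
>       throw new IllegalArgumentException(
>           "There is no data block encoder for given id '" + id + "'");
>     }
>     return name;
>   }
>
>   public static void main(String[] args) {
>     getEncodingById((short) 5); // reproduces the exception in the trace
>   }
> }
> {code}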
> I suspect this is mostly related to an error in calculating the meta_space
> initial offset, due to some extra byte in the byte buffer (something along
> the lines of this [comment from
> HBASE-27053|https://issues.apache.org/jira/browse/HBASE-27053?focusedCommentId=17564026&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17564026]).
> If the buffer has extra bytes, we could miscalculate the meta space offset
> and read a wrong byte value for the "usesChecksum" flag. That would lead
> to a wrong header size calculation (24 bytes without checksum, 33 with),
> and in turn to a wrong position for reading the encoding type short.
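> A self-contained toy model of that failure chain; the flag position and
> buffer contents here are hypothetical, not the real HFileBlock layout,
> and only the 24/33 header sizes come from the paragraph above:
> {code:java}
> import java.nio.ByteBuffer;
>
> public class OffsetShiftSketch {
>   static final int NO_CHECKSUM_HEADER = 24; // header size without checksums
>   static final int CHECKSUM_HEADER = 33;    // header size with checksums
>   static final int FLAG_POS = 20;           // hypothetical flag position
>
>   // Reads the 2-byte encoding id that sits right after the block header.
>   static short readEncodingId(ByteBuffer buf, int blockStart) {
>     boolean usesChecksum = buf.get(blockStart + FLAG_POS) != 0;
>     int headerSize = usesChecksum ? CHECKSUM_HEADER : NO_CHECKSUM_HEADER;
>     return buf.getShort(blockStart + headerSize);
>   }
>
>   public static void main(String[] args) {
>     ByteBuffer buf = ByteBuffer.allocate(64);
>     int realStart = 1;                        // one stray leading byte
>     buf.put(realStart + FLAG_POS, (byte) 1);  // checksums are in use
>     buf.putShort(realStart + CHECKSUM_HEADER, (short) 4); // a valid id
>     buf.putShort(24, (short) 5);              // unrelated bytes elsewhere
>
>     System.out.println(readEncodingId(buf, realStart)); // 4: correct read
>     // An unadjusted start misses the flag byte, assumes the 24-byte
>     // header, and decodes unrelated bytes as the id (the bogus '5'):
>     System.out.println(readEncodingId(buf, 0));         // 5
>   }
> }
> {code}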
> Unfortunately, I could not reproduce this issue in a controlled test
> environment. However, I think we should still change HFileReaderImpl to
> handle any possible exception thrown while retrieving blocks from the
> cache: instead of failing the whole read operation, it should evict the
> corrupt block from the cache and fall back to reading it from the file
> system.
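> A minimal sketch of that defensive pattern, assuming a simple map-backed
> cache and hypothetical decode/readFromFileSystem stand-ins; the real
> change would presumably sit around the getCachedBlock call seen in the
> trace above:
> {code:java}
> import java.io.IOException;
> import java.util.concurrent.ConcurrentHashMap;
>
> public class CacheFallbackSketch {
>   // Stand-in for the block cache; the real BlockCache API differs.
>   private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();
>
>   byte[] readBlock(String key) throws IOException {
>     try {
>       byte[] cached = cache.get(key);
>       if (cached != null) {
>         return decode(cached); // may throw on a corrupt cached block
>       }
>     } catch (RuntimeException e) {
>       // Treat any failure materializing a cached block as a corrupt
>       // entry: evict it and fall through to the file system read.
>       cache.remove(key);
>     }
>     return readFromFileSystem(key);
>   }
>
>   // Hypothetical stand-ins for block decoding and the HFile read path.
>   private byte[] decode(byte[] raw) {
>     if (raw.length < 2) {
>       throw new IllegalArgumentException("corrupt cached block");
>     }
>     return raw;
>   }
>
>   private byte[] readFromFileSystem(String key) throws IOException {
>     return new byte[] {0, 0}; // pretend we re-read the block from disk
>   }
> }
> {code}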
--
This message was sent by Atlassian Jira
(v8.20.10#820010)