[ 
https://issues.apache.org/jira/browse/HBASE-29627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-29627.
------------------------------------------
    Resolution: Fixed

Merged into master, branch-3, branch-2 and branch-2.6. Thanks for reviewing it, 
[~psomogyi]!

> Handle any block cache fetching errors when reading a block in HFileReaderImpl
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-29627
>                 URL: https://issues.apache.org/jira/browse/HBASE-29627
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache
>    Affects Versions: 3.0.0-beta-1, 2.7.0
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.4
>
>
> One of our customers faced a situation where blocks were getting cached 
> uncompressed into the bucket cache (due to the separate issue reported in 
> HBASE-29623) following flushes or compactions. Then, at read time, during 
> cache retrieval in HFileReaderImpl, decoding the block fails, throwing the 
> uncaught exception below and failing the read indefinitely: 
> {noformat}
> 2025-09-17 06:38:25,607 ERROR 
> org.apache.hadoop.hbase.regionserver.CompactSplit: Compaction failed 
> region=XXXXXXXXXXXXXXXXXXXXXX,1721528038124.a3012627f502c78738430343b0b54966.,
>  storeName=a3012627f502c78738430343b0b54966/0, priority=45, 
> startTime=1758091104691
> java.lang.IllegalArgumentException: There is no data block encoder for given 
> id '5'
>         at 
> org.apache.hadoop.hbase.io.encoding.DataBlockEncoding.getEncodingById(DataBlockEncoding.java:157)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.getDataBlockEncoding(HFileBlock.java:2003)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.getCachedBlock(HFileReaderImpl.java:1122)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1288)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1249)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:750)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$EncodedScanner.next(HFileReaderImpl.java:1528)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:194)
>         at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:112)
>         at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:681)
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:437)
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:354)
>  {noformat}
> I suspect this is mostly related to an error in calculating the meta_space 
> initial offset, caused by extra bytes in the byte buffer (along the lines of 
> this [comment from 
> HBASE-27053|https://issues.apache.org/jira/browse/HBASE-27053?focusedCommentId=17564026&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17564026]).
>  If the buffer has extra bytes, we could miscalculate the meta space 
> offset and read a wrong byte value for the "usesChecksum" flag, which would 
> lead to a wrong header size calculation (24 bytes without checksum, 33 with 
> checksum) and therefore a wrong position for reading the encoding type 
> short. 
> Unfortunately, I could not reproduce this issue in a controlled test 
> environment. However, I think we should still change HFileReaderImpl to 
> handle any exception thrown while retrieving blocks from the cache, so that 
> instead of failing the whole read operation, it evicts the corrupt block 
> from the cache and falls back to reading it from the file system.
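The evict-and-fall-back behavior described in the last paragraph of the issue can be sketched roughly as below. This is a minimal illustration with simplified stand-in types (SimpleCache, BlockSource), not the actual HBase BlockCache or HFileReaderImpl API:

```java
import java.util.HashMap;
import java.util.Map;

public class CacheFallbackSketch {
  // Stand-in for the file-system read path (hypothetical, for illustration).
  interface BlockSource {
    String readBlock(String key);
  }

  // Stand-in for a block cache whose entries may fail to decode.
  static class SimpleCache {
    final Map<String, String> entries = new HashMap<>();

    String get(String key) {
      String v = entries.get(key);
      if (v != null && v.startsWith("corrupt")) {
        // Simulates the failure mode from the stack trace: the cached bytes
        // cannot be decoded into a known data block encoding id.
        throw new IllegalArgumentException("There is no data block encoder for given id");
      }
      return v;
    }

    void evict(String key) {
      entries.remove(key);
    }
  }

  static String readBlockWithFallback(SimpleCache cache, BlockSource fs, String key) {
    try {
      String cached = cache.get(key);
      if (cached != null) {
        return cached;
      }
    } catch (RuntimeException e) {
      // Instead of failing the whole read, drop the suspect cache entry...
      cache.evict(key);
    }
    // ...and fall back to reading the block from the file system.
    return fs.readBlock(key);
  }

  public static void main(String[] args) {
    SimpleCache cache = new SimpleCache();
    cache.entries.put("block-1", "corrupt-bytes");
    BlockSource fs = key -> "fresh-from-fs:" + key;
    System.out.println(readBlockWithFallback(cache, fs, "block-1"));
  }
}
```

With a corrupt entry cached under "block-1", the lookup throws, the entry is evicted, and the read is served from the fallback source rather than failing the compaction or scan outright.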



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
