Tak-Lon (Stephen) Wu created HDDS-15424:
-------------------------------------------

             Summary: Non-Stream read failed HBase read block checksum 
intermittently
                 Key: HDDS-15424
                 URL: https://issues.apache.org/jira/browse/HDDS-15424
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Tak-Lon (Stephen) Wu
         Attachments: cf-d7f3673bffb24e0eb692b37c76a06925.log

While testing HBase on Ozone with the YCSB read-only workload C (no SCR), 
we saw an unexpected checksum error when reading blocks through Ozone's 
ChunkInputStream. 

[HBase's data block 
checksum|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java#L1531-L1558]
 (hbase.regionserver.checksum.verify) is enabled by default. Its goal is to 
avoid a separate round trip for the filesystem checksum: HBase keeps its own 
checksums inside the HFile data blocks and verifies them at the HBase level.
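
For readers less familiar with this scheme, here is a minimal sketch of per-chunk CRC32C checksums stored after the block data. It is not HBase's actual code: the class and method names are hypothetical, and the layout (data followed by one 4-byte checksum per bytesPerChecksum chunk) is an assumption inferred from the bytesPerChecksum and totalChecksumBytes fields in the trace below.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32C;

/** Illustrative sketch only; hypothetical names, not HBase's implementation. */
final class ChunkedChecksumSketch {

    /** Number of checksum chunks needed to cover dataLen bytes. */
    static int numChunks(int dataLen, int bytesPerChecksum) {
        return (dataLen + bytesPerChecksum - 1) / bytesPerChecksum;
    }

    /** Returns data followed by one 4-byte CRC32C per chunk. */
    static byte[] appendChecksums(byte[] data, int bytesPerChecksum) {
        int chunks = numChunks(data.length, bytesPerChecksum);
        ByteBuffer out = ByteBuffer.allocate(data.length + chunks * 4);
        out.put(data);
        CRC32C crc = new CRC32C();
        for (int off = 0; off < data.length; off += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            out.putInt((int) crc.getValue());
        }
        return out.array();
    }

    /** Recomputes each chunk's CRC32C and compares it with the stored value. */
    static boolean verify(byte[] block, int dataLen, int bytesPerChecksum) {
        ByteBuffer stored = ByteBuffer.wrap(block, dataLen, block.length - dataLen);
        CRC32C crc = new CRC32C();
        for (int off = 0; off < dataLen; off += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, dataLen - off);
            crc.reset();
            crc.update(block, off, len);
            // A missing or mismatching stored checksum means the bytes differ.
            if (stored.remaining() < 4 || stored.getInt() != (int) crc.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

Under this scheme, a single wrong byte anywhere in the returned buffer makes verify return false, even if the bytes stored by the filesystem are intact.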

The same workload does not fail with HDFS (no SCR, no bucketcache).

{code}
2026-05-21 14:31:41,627 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock: 
[RpcServer.default.FPBQ.Fifo.handler=14,queue=5,port=22101]: Reading 
d7f3673bffb24e0eb692b37c76a06925 at offset=66368883, pread=true, 
verifyChecksum=true, cachedHeader=null, onDiskSizeWithHeader=65613
2026-05-21 14:31:41,655 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock: 
[RpcServer.default.FPBQ.Fifo.handler=81,queue=0,port=22101]: Read 
[blockType=DATA, fileOffset=238227405, headerSize=33, 
onDiskSizeWithoutHeader=65584, uncompressedSizeWithoutHeader=65564, 
prevBlockOffset=238161816, isUseHBaseChecksum=true, checksumType=CRC32C, 
bytesPerChecksum=16384, onDiskDataSizeWithHeader=65597, 
getOnDiskSizeWithHeader=65617, totalChecksumBytes=20, isUnpacked=true, 
buf=[SingleByteBuff[pos=0, lim=65597, cap= 65650]], 
dataBeginsWith=\x00\x00\x00+\x00\x00\x00d\x00\x17user157290070305533595, 
fileContext=[usesHBaseChecksum=true, checksumType=CRC32C, 
bytesPerChecksum=16384, blocksize=65536, encoding=NONE, 
indexBlockEncoding=NONE, includesMvcc=true, includesTags=false, 
compressAlgo=NONE, compressTags=false, decompressionContext=null, 
cryptoContext=[cipher=NONE keyHash=NONE], 
name=d7f3673bffb24e0eb692b37c76a06925, 
cellComparator=org.apache.hadoop.hbase.InnerStoreCellComparator@4d3ead0a], 
nextBlockOnDiskSize=65633] in 101 ms
2026-05-21 14:31:41,772 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock: 
[RpcServer.default.FPBQ.Fifo.handler=14,queue=5,port=22101]: Read 
[blockType=DATA, fileOffset=66368883, headerSize=33, 
onDiskSizeWithoutHeader=65580, uncompressedSizeWithoutHeader=65560, 
prevBlockOffset=66303290, isUseHBaseChecksum=true, checksumType=CRC32C, 
bytesPerChecksum=16384, onDiskDataSizeWithHeader=65593, 
getOnDiskSizeWithHeader=65613, totalChecksumBytes=20, isUnpacked=true, 
buf=[SingleByteBuff[pos=0, lim=65593, cap= 65646]], 
dataBeginsWith=\x00\x00\x00+\x00\x00\x00d\x00\x17user156719743603804260, 
fileContext=[usesHBaseChecksum=true, checksumType=CRC32C, 
bytesPerChecksum=16384, blocksize=65536, encoding=NONE, 
indexBlockEncoding=NONE, includesMvcc=true, includesTags=false, 
compressAlgo=NONE, compressTags=false, decompressionContext=null, 
cryptoContext=[cipher=NONE keyHash=NONE], 
name=d7f3673bffb24e0eb692b37c76a06925, 
cellComparator=org.apache.hadoop.hbase.InnerStoreCellComparator@4d3ead0a], 
nextBlockOnDiskSize=65738] in 145 ms
2026-05-21 14:31:45,042 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock: 
[RpcServer.default.FPBQ.Fifo.handler=18,queue=0,port=22101]: Reading 
d7f3673bffb24e0eb692b37c76a06925 at offset=280830996, pread=true, 
verifyChecksum=true, cachedHeader=null, onDiskSizeWithHeader=65609
2026-05-21 14:32:12,355 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
[RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HBase checksumType 
verification failed for file 
ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
 at offset 237833541 filesize 392584403 checksumType 49. Retrying read with 
HDFS checksums turned on...
2026-05-21 14:32:12,355 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
[RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HBase checksum 
verification failed for file 
ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
 at offset 237833541 filesize 392584403. Retrying read with HDFS checksums 
turned on...
2026-05-21 14:32:12,404 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
[RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HDFS checksum 
verification succeeded for file 
ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
 at offset 237833541 filesize 392584403
{code}
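
As a sanity check on the trace above, the sizes logged for the two completed reads can be cross-checked against each other. The sketch below assumes a 4-byte checksum per bytesPerChecksum chunk; the class and method names are hypothetical, and the numeric arguments are copied from the two "Read [blockType=DATA, ...]" entries.

```java
/** Illustrative sketch only; hypothetical names, values taken from the trace. */
final class LoggedBlockSizeCheck {
    // Assumption: each chunk checksum occupies 4 bytes.
    static final int CHECKSUM_SIZE = 4;

    /** Checksum bytes needed to cover header plus data. */
    static int checksumBytes(int onDiskDataSizeWithHeader, int bytesPerChecksum) {
        int chunks = (onDiskDataSizeWithHeader + bytesPerChecksum - 1) / bytesPerChecksum;
        return chunks * CHECKSUM_SIZE;
    }

    /** True when the logged size fields agree with one another. */
    static boolean consistent(int headerSize, int onDiskSizeWithoutHeader,
                              int onDiskDataSizeWithHeader, int bytesPerChecksum,
                              int loggedChecksumBytes, int loggedOnDiskSizeWithHeader) {
        int expected = checksumBytes(onDiskDataSizeWithHeader, bytesPerChecksum);
        return expected == loggedChecksumBytes
            && onDiskDataSizeWithHeader + expected == loggedOnDiskSizeWithHeader
            && headerSize + onDiskSizeWithoutHeader == loggedOnDiskSizeWithHeader;
    }

    public static void main(String[] args) {
        // Block at fileOffset=238227405
        System.out.println(consistent(33, 65584, 65597, 16384, 20, 65617));
        // Block at fileOffset=66368883
        System.out.println(consistent(33, 65580, 65593, 16384, 20, 65613));
    }
}
```

Both calls print true, so the size fields of the two successful reads are internally consistent. This says nothing about the failing read at offset 237833541, for which the trace contains no block details.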

The question is why a read from Ozone can make HBase consider the data block 
corrupt, yet when the read falls back to the Ozone (filesystem) checksum it 
completes successfully? 

The extra filesystem-level checksum adds a minor delay. The failure is 
intermittent and happens only once in a while. Please see the captured reads 
for cf d7f3673bffb24e0eb692b37c76a06925 in cf-d7f3673bffb24e0eb692b37c76a06925.log; 
you will find that most of the reads succeed.


