Tak-Lon (Stephen) Wu created HDDS-15424:
-------------------------------------------
Summary: Non-Stream read failed HBase read block checksum
intermittently
Key: HDDS-15424
URL: https://issues.apache.org/jira/browse/HDDS-15424
Project: Apache Ozone
Issue Type: Bug
Reporter: Tak-Lon (Stephen) Wu
Attachments: cf-d7f3673bffb24e0eb692b37c76a06925.log
When testing HBase on Ozone by running the YCSB read-only workload C (no SCR),
we found an intermittent checksum error when reading blocks through Ozone's
ChunkInputStream.
[HBase's data block
checksum|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java#L1531-L1558]
(hbase.regionserver.checksum.verify) is enabled by default. Its goal is to
avoid a separate round trip for the filesystem-level checksum by keeping a
checksum at the HBase level, stored inside the HFile data blocks.
This does not fail with HDFS (no SCR, no bucketcache).
{code}
2026-05-21 14:31:41,627 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock:
[RpcServer.default.FPBQ.Fifo.handler=14,queue=5,port=22101]: Reading
d7f3673bffb24e0eb692b37c76a06925 at offset=66368883, pread=true,
verifyChecksum=true, cachedHeader=null, onDiskSizeWithHeader=65613
2026-05-21 14:31:41,655 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock:
[RpcServer.default.FPBQ.Fifo.handler=81,queue=0,port=22101]: Read
[blockType=DATA, fileOffset=238227405, headerSize=33,
onDiskSizeWithoutHeader=65584, uncompressedSizeWithoutHeader=65564,
prevBlockOffset=238161816, isUseHBaseChecksum=true, checksumType=CRC32C,
bytesPerChecksum=16384, onDiskDataSizeWithHeader=65597,
getOnDiskSizeWithHeader=65617, totalChecksumBytes=20, isUnpacked=true,
buf=[SingleByteBuff[pos=0, lim=65597, cap= 65650]],
dataBeginsWith=\x00\x00\x00+\x00\x00\x00d\x00\x17user157290070305533595,
fileContext=[usesHBaseChecksum=true, checksumType=CRC32C,
bytesPerChecksum=16384, blocksize=65536, encoding=NONE,
indexBlockEncoding=NONE, includesMvcc=true, includesTags=false,
compressAlgo=NONE, compressTags=false, decompressionContext=null,
cryptoContext=[cipher=NONE keyHash=NONE],
name=d7f3673bffb24e0eb692b37c76a06925,
cellComparator=org.apache.hadoop.hbase.InnerStoreCellComparator@4d3ead0a],
nextBlockOnDiskSize=65633] in 101 ms
2026-05-21 14:31:41,772 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock:
[RpcServer.default.FPBQ.Fifo.handler=14,queue=5,port=22101]: Read
[blockType=DATA, fileOffset=66368883, headerSize=33,
onDiskSizeWithoutHeader=65580, uncompressedSizeWithoutHeader=65560,
prevBlockOffset=66303290, isUseHBaseChecksum=true, checksumType=CRC32C,
bytesPerChecksum=16384, onDiskDataSizeWithHeader=65593,
getOnDiskSizeWithHeader=65613, totalChecksumBytes=20, isUnpacked=true,
buf=[SingleByteBuff[pos=0, lim=65593, cap= 65646]],
dataBeginsWith=\x00\x00\x00+\x00\x00\x00d\x00\x17user156719743603804260,
fileContext=[usesHBaseChecksum=true, checksumType=CRC32C,
bytesPerChecksum=16384, blocksize=65536, encoding=NONE,
indexBlockEncoding=NONE, includesMvcc=true, includesTags=false,
compressAlgo=NONE, compressTags=false, decompressionContext=null,
cryptoContext=[cipher=NONE keyHash=NONE],
name=d7f3673bffb24e0eb692b37c76a06925,
cellComparator=org.apache.hadoop.hbase.InnerStoreCellComparator@4d3ead0a],
nextBlockOnDiskSize=65738] in 145 ms
2026-05-21 14:31:45,042 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock:
[RpcServer.default.FPBQ.Fifo.handler=18,queue=0,port=22101]: Reading
d7f3673bffb24e0eb692b37c76a06925 at offset=280830996, pread=true,
verifyChecksum=true, cachedHeader=null, onDiskSizeWithHeader=65609
2026-05-21 14:32:12,355 WARN org.apache.hadoop.hbase.io.hfile.HFile:
[RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HBase checksumType
verification failed for file
ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
at offset 237833541 filesize 392584403 checksumType 49. Retrying read with
HDFS checksums turned on...
2026-05-21 14:32:12,355 WARN org.apache.hadoop.hbase.io.hfile.HFile:
[RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HBase checksum
verification failed for file
ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
at offset 237833541 filesize 392584403. Retrying read with HDFS checksums
turned on...
2026-05-21 14:32:12,404 WARN org.apache.hadoop.hbase.io.hfile.HFile:
[RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HDFS checksum
verification succeeded for file
ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
at offset 237833541 filesize 392584403
{code}
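For reference, the size fields in the trace above are consistent with one 4-byte CRC32C value per bytesPerChecksum chunk of header plus data: 65597 bytes at 16384 bytes per chunk gives 5 chunks, hence totalChecksumBytes=20 and getOnDiskSizeWithHeader=65617. The sketch below reproduces that arithmetic and a per-chunk verification using only the JDK. The class and method names are hypothetical and this is not HBase's actual code; it only illustrates the kind of check that is failing.

```java
import java.util.Arrays;
import java.util.zip.CRC32C;

public class HBaseChecksumSketch {
    /** Checksum bytes for a block, assuming one 4-byte CRC per chunk (last chunk may be short). */
    public static int totalChecksumBytes(int onDiskDataSizeWithHeader, int bytesPerChecksum) {
        int chunks = (onDiskDataSizeWithHeader + bytesPerChecksum - 1) / bytesPerChecksum;
        return chunks * Integer.BYTES;
    }

    /** Computes a CRC32C over each bytesPerChecksum-sized chunk of data[0..length). */
    public static int[] chunkChecksums(byte[] data, int length, int bytesPerChecksum) {
        int chunks = (length + bytesPerChecksum - 1) / bytesPerChecksum;
        int[] sums = new int[chunks];
        CRC32C crc = new CRC32C();
        for (int i = 0; i < chunks; i++) {
            int start = i * bytesPerChecksum;
            int n = Math.min(bytesPerChecksum, length - start);
            crc.reset();
            crc.update(data, start, n);
            sums[i] = (int) crc.getValue();
        }
        return sums;
    }

    /** True when the recomputed per-chunk CRCs match the expected ones. */
    public static boolean verify(byte[] data, int length, int bytesPerChecksum, int[] expected) {
        return Arrays.equals(chunkChecksums(data, length, bytesPerChecksum), expected);
    }

    public static void main(String[] args) {
        // Values taken from the two "Read [...]" trace lines above.
        System.out.println(totalChecksumBytes(65597, 16384)); // prints 20
        System.out.println(totalChecksumBytes(65593, 16384)); // prints 20
    }
}
```

Under this model, a single wrong byte anywhere in the returned buffer is enough to fail the HBase-level check for the whole block.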
The question is why a read from Ozone can make HBase consider the data block
corrupt, while the retry that falls back to the Ozone (filesystem) checksum
completes successfully.
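One way to narrow this down is to tell apart "the stored bytes are bad" from "the first read returned wrong bytes". The sketch below is a simplified model of the verify-then-retry flow seen in the WARN lines, with one addition that is my assumption rather than HBase behaviour: it also re-checks the CRC on the retried bytes. BlockSource and all names are hypothetical stand-ins, not HBase or Ozone APIs.

```java
import java.io.IOException;
import java.util.zip.CRC32C;

public class ChecksumFallbackSketch {
    /** Hypothetical positional block read; the flag models "filesystem checksums turned on". */
    public interface BlockSource {
        byte[] read(long offset, boolean filesystemChecksum) throws IOException;
    }

    /** CRC32C over the whole buffer. */
    public static long crc32c(byte[] bytes) {
        CRC32C crc = new CRC32C();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    /** Reads once without filesystem checksums, then retries with them if the CRC does not match. */
    public static String readWithFallback(BlockSource source, long offset, long expectedCrc)
            throws IOException {
        byte[] first = source.read(offset, false);
        if (crc32c(first) == expectedCrc) {
            return "first-read-ok";
        }
        byte[] second = source.read(offset, true);
        if (crc32c(second) == expectedCrc) {
            // Stored data is intact; the first read delivered different bytes.
            return "fallback-read-ok";
        }
        return "corrupt";
    }
}
```

If a reproduction lands in the "fallback-read-ok" case, the stored block is fine and the suspicion shifts to the non-checksummed read path.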
The extra filesystem-level checksum pass adds a minor delay. The failure is
intermittent and happens only once in a while. See the captured reads for
cf d7f3673bffb24e0eb692b37c76a06925 in cf-d7f3673bffb24e0eb692b37c76a06925.log;
in most cases the read succeeds.