[
https://issues.apache.org/jira/browse/HDDS-15424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087732#comment-18087732
]
Tak-Lon (Stephen) Wu edited comment on HDDS-15424 at 6/9/26 7:12 PM:
---------------------------------------------------------------------
I changed the checksum type to CRC32C in both the HBase and Ozone client
configuration, and it is still not working. I now think this issue is at the
HBase level, so I may open a new JIRA in HBase (if it turns out to be related
to the cached header; from what I have observed so far, though, this looks
like a common filesystem pread issue).
FYI, the problem is mostly related to how HBase expects the next block header
from the input stream.
1. When reading the next block header from the input stream, HBase found
`checksumType = 50` or `checksumType = 52` (either a misaligned read or
corruption); it was expecting ((byte) 2) instead of 50. It then falls back to
the Ozone checksum.
{code}
# example one
2026-06-08 19:23:17,991 WARN org.apache.hadoop.hbase.io.hfile.HFile:
[RpcServer.default.FPBQ.Fifo.handler=5,queue=2,port=22101]: HBase checksumType
verification failed for file
ofs://ozone1780561471/hbase1/hbaseroot/hbase2/data/default/200m/0d6f7d68f3583f7ecae2238d15308395/cf/bc7d85d8cb2545d686aefb9b61dd0a7a
at offset 92071526 filesize 284102854 checksumType 50. Retrying read with HDFS
checksums turned on...
2026-06-08 19:23:17,991 WARN org.apache.hadoop.hbase.io.hfile.HFile:
[RpcServer.default.FPBQ.Fifo.handler=5,queue=2,port=22101]: HBase checksum
verification failed for file
ofs://ozone1780561471/hbase1/hbaseroot/hbase2/data/default/200m/0d6f7d68f3583f7ecae2238d15308395/cf/bc7d85d8cb2545d686aefb9b61dd0a7a
at offset 92071526 filesize 284102854. Retrying read with HDFS checksums
turned on...
2026-06-08 19:23:18,672 WARN org.apache.hadoop.hbase.io.hfile.HFile:
[RpcServer.default.FPBQ.Fifo.handler=5,queue=2,port=22101]: HDFS checksum
verification succeeded for file
ofs://ozone1780561471/hbase1/hbaseroot/hbase2/data/default/200m/0d6f7d68f3583f7ecae2238d15308395/cf/bc7d85d8cb2545d686aefb9b61dd0a7a
at offset 92071526 filesize 284102854
# example two
2026-06-08 19:24:33,968 WARN org.apache.hadoop.hbase.io.hfile.HFile:
[RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=22101]: HBase checksumType
verification failed for file
ofs://ozone1780561471/hbase1/hbaseroot/hbase2/data/default/200m/4dd125d79c4537b96631994072d7c3fa/cf/7191c7cc2a984c01a2b264d4ef54b65c
at offset 96368429 filesize 394458568 checksumType 52. Retrying read with HDFS
checksums turned on...
2026-06-08 19:24:33,968 WARN org.apache.hadoop.hbase.io.hfile.HFile:
[RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=22101]: HBase checksum
verification failed for file
ofs://ozone1780561471/hbase1/hbaseroot/hbase2/data/default/200m/4dd125d79c4537b96631994072d7c3fa/cf/7191c7cc2a984c01a2b264d4ef54b65c
at offset 96368429 filesize 394458568. Retrying read with HDFS checksums
turned on...
2026-06-08 19:24:33,968 WARN org.apache.hadoop.hbase.io.hfile.HFile:
[RpcServer.default.FPBQ.Fifo.handler=12,queue=0,port=22101]: HBase checksum
verification failed for file
ofs://ozone1780561471/hbase1/hbaseroot/hbase2/data/default/200m/4dd125d79c4537b96631994072d7c3fa/cf/7191c7cc2a984c01a2b264d4ef54b65c
at offset 104246354 filesize 394458568. Retrying read with HDFS checksums
turned on...
{code}
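To make the `checksumType` values in these logs easier to interpret, here is a minimal Java sketch of where that byte sits in a block header. It assumes the 33-byte layout used when HBase checksums are enabled (8-byte magic, two 4-byte sizes, 8-byte previous-block offset, 1-byte checksum type, 4-byte bytesPerChecksum, 4-byte onDiskDataSizeWithHeader), which matches the `headerSize=33` in the TRACE output, and the type codes 0 = NULL, 1 = CRC32, 2 = CRC32C. The class and method names are hypothetical and this is not HBase's own parsing code. One observation it supports: 49, 50 and 52 are the ASCII codes for '1', '2' and '4', so the byte may have come from key or value data rather than from a header, which would fit the misaligned-read theory. That reading is a guess, not something the logs prove.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of HBase.
final class HFileBlockHeaderSketch {
    // Assumed layout: magic(8) + onDiskSizeWithoutHeader(4)
    // + uncompressedSizeWithoutHeader(4) + prevBlockOffset(8) = 24 bytes
    // before the 1-byte checksum type.
    static final int HEADER_SIZE = 33;
    static final int CHECKSUM_TYPE_OFFSET = 24;

    // Reads the checksum-type byte of a block that starts at blockStart.
    static byte checksumTypeAt(ByteBuffer buf, int blockStart) {
        return buf.get(blockStart + CHECKSUM_TYPE_OFFSET);
    }

    // Maps a code to a name, or shows it as ASCII when it is not a known code.
    static String describe(byte code) {
        switch (code) {
            case 0: return "NULL";
            case 1: return "CRC32";
            case 2: return "CRC32C";
            default:
                if (code >= 32 && code < 127) {
                    return "unknown code " + code + " (ASCII '" + (char) code + "')";
                }
                return "unknown code " + code;
        }
    }

    // Builds a well-formed header from values taken from the TRACE log.
    static ByteBuffer sampleHeader() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE);
        buf.put("DATABLK*".getBytes(StandardCharsets.US_ASCII)); // block magic
        buf.putInt(65584);       // onDiskSizeWithoutHeader
        buf.putInt(65564);       // uncompressedSizeWithoutHeader
        buf.putLong(238161816L); // prevBlockOffset
        buf.put((byte) 2);       // checksumType = CRC32C
        buf.putInt(16384);       // bytesPerChecksum
        buf.putInt(65597);       // onDiskDataSizeWithHeader
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        System.out.println(describe(checksumTypeAt(sampleHeader(), 0)));
        // The values reported in the WARN lines:
        for (byte seen : new byte[] {49, 50, 52}) {
            System.out.println(describe(seen));
        }
    }
}
```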
These are all reads in the middle of the file. Even though HBase provides the
offset, from what I found checking the code (with help from Cursor), the
client may have some over-read bytes in its buffer, so the pread does not get
the right bytes. That is why HBase applies this correction for any
filesystem, including HDFS and OFS.
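For reference, the behaviour a positional read (pread) is expected to have can be sketched as below: the bytes returned must come from the requested offset, regardless of any sequential reads or buffering that happened earlier on the same stream. This sketch uses the JDK's `FileChannel.read(ByteBuffer, long)` on a local temp file as a stand-in; it is not Ozone's or Hadoop's client code, and the class and method names are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of pread semantics using a local file.
final class PreadSketch {
    // Reads exactly length bytes starting at position, looping on short reads.
    static byte[] preadFully(FileChannel ch, long position, int length) throws IOException {
        ByteBuffer dst = ByteBuffer.allocate(length);
        long pos = position;
        while (dst.hasRemaining()) {
            int n = ch.read(dst, pos); // positional read: channel position is untouched
            if (n < 0) {
                throw new EOFException("EOF before " + length + " bytes at " + position);
            }
            pos += n;
        }
        return dst.array();
    }

    // Writes bytes 0..255 to a temp file, does a sequential read first, then a
    // pread at offset 40. The pread must not be affected by the earlier read.
    static byte[] demo() {
        try {
            Path tmp = Files.createTempFile("pread-sketch", ".bin");
            try {
                byte[] data = new byte[256];
                for (int i = 0; i < data.length; i++) {
                    data[i] = (byte) i;
                }
                Files.write(tmp, data);
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    ch.read(ByteBuffer.allocate(100)); // sequential read moves the cursor
                    return preadFully(ch, 40, 4);
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] got = demo();
        System.out.println(got[0] + " " + got[1] + " " + got[2] + " " + got[3]);
    }
}
```

If an input stream served a pread out of a buffer that belongs to a different offset, the caller would see valid-looking bytes from the wrong place in the file, which is the failure mode suspected above.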
was (Author: taklwu):
I changed the checksum type to CRC32C in both the HBase and Ozone client
configuration, and it is still not working. I now think this issue is at the
HBase level, so I will open a new JIRA in HBase.
> Non-Stream read failed HBase read block checksum intermittently
> ---------------------------------------------------------------
>
> Key: HDDS-15424
> URL: https://issues.apache.org/jira/browse/HDDS-15424
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Tak-Lon (Stephen) Wu
> Priority: Minor
> Attachments: cf-d7f3673bffb24e0eb692b37c76a06925.log
>
>
> When testing HBase with Ozone by running the read-only YCSB workload C (no
> SCR), we found a strange checksum error when reading blocks from Ozone's
> ChunkInputStream.
> [HBase's data block
> checksum|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java#L1531-L1558]
> (hbase.regionserver.checksum.verify) is enabled by default. Its goal is to
> save the round trip for the filesystem checksum by keeping the check at the
> HBase level, inside the HFile data blocks.
> This does not fail with HDFS (no SCR, no bucketcache).
> {code}
> 2026-05-21 14:31:41,627 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock:
> [RpcServer.default.FPBQ.Fifo.handler=14,queue=5,port=22101]: Reading
> d7f3673bffb24e0eb692b37c76a06925 at offset=66368883, pread=true,
> verifyChecksum=true, cachedHeader=null, onDiskSizeWithHeader=65613
> 2026-05-21 14:31:41,655 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock:
> [RpcServer.default.FPBQ.Fifo.handler=81,queue=0,port=22101]: Read
> [blockType=DATA, fileOffset=238227405, headerSize=33,
> onDiskSizeWithoutHeader=65584, uncompressedSizeWithoutHeader=65564,
> prevBlockOffset=238161816, isUseHBaseChecksum=true, checksumType=CRC32C,
> bytesPerChecksum=16384, onDiskDataSizeWithHeader=65597,
> getOnDiskSizeWithHeader=65617, totalChecksumBytes=20, isUnpacked=true,
> buf=[SingleByteBuff[pos=0, lim=65597, cap= 65650]],
> dataBeginsWith=\x00\x00\x00+\x00\x00\x00d\x00\x17user157290070305533595,
> fileContext=[usesHBaseChecksum=true, checksumType=CRC32C,
> bytesPerChecksum=16384, blocksize=65536, encoding=NONE,
> indexBlockEncoding=NONE, includesMvcc=true, includesTags=false,
> compressAlgo=NONE, compressTags=false, decompressionContext=null,
> cryptoContext=[cipher=NONE keyHash=NONE],
> name=d7f3673bffb24e0eb692b37c76a06925,
> cellComparator=org.apache.hadoop.hbase.InnerStoreCellComparator@4d3ead0a],
> nextBlockOnDiskSize=65633] in 101 ms
> 2026-05-21 14:31:41,772 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock:
> [RpcServer.default.FPBQ.Fifo.handler=14,queue=5,port=22101]: Read
> [blockType=DATA, fileOffset=66368883, headerSize=33,
> onDiskSizeWithoutHeader=65580, uncompressedSizeWithoutHeader=65560,
> prevBlockOffset=66303290, isUseHBaseChecksum=true, checksumType=CRC32C,
> bytesPerChecksum=16384, onDiskDataSizeWithHeader=65593,
> getOnDiskSizeWithHeader=65613, totalChecksumBytes=20, isUnpacked=true,
> buf=[SingleByteBuff[pos=0, lim=65593, cap= 65646]],
> dataBeginsWith=\x00\x00\x00+\x00\x00\x00d\x00\x17user156719743603804260,
> fileContext=[usesHBaseChecksum=true, checksumType=CRC32C,
> bytesPerChecksum=16384, blocksize=65536, encoding=NONE,
> indexBlockEncoding=NONE, includesMvcc=true, includesTags=false,
> compressAlgo=NONE, compressTags=false, decompressionContext=null,
> cryptoContext=[cipher=NONE keyHash=NONE],
> name=d7f3673bffb24e0eb692b37c76a06925,
> cellComparator=org.apache.hadoop.hbase.InnerStoreCellComparator@4d3ead0a],
> nextBlockOnDiskSize=65738] in 145 ms
> 2026-05-21 14:31:45,042 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock:
> [RpcServer.default.FPBQ.Fifo.handler=18,queue=0,port=22101]: Reading
> d7f3673bffb24e0eb692b37c76a06925 at offset=280830996, pread=true,
> verifyChecksum=true, cachedHeader=null, onDiskSizeWithHeader=65609
> 2026-05-21 14:32:12,355 WARN org.apache.hadoop.hbase.io.hfile.HFile:
> [RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HBase
> checksumType verification failed for file
> ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
> at offset 237833541 filesize 392584403 checksumType 49. Retrying read with
> HDFS checksums turned on...
> 2026-05-21 14:32:12,355 WARN org.apache.hadoop.hbase.io.hfile.HFile:
> [RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HBase checksum
> verification failed for file
> ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
> at offset 237833541 filesize 392584403. Retrying read with HDFS checksums
> turned on...
> 2026-05-21 14:32:12,404 WARN org.apache.hadoop.hbase.io.hfile.HFile:
> [RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HDFS checksum
> verification succeeded for file
> ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
> at offset 237833541 filesize 392584403
> {code}
> The question is: why would a read from Ozone make HBase think the data
> block is corrupted, when the retry with the Ozone (filesystem) checksum
> completes successfully?
> This extra filesystem checksum causes a minor delay. It does not always
> fail, only once in a while. Please see the captured reads for cf
> d7f3673bffb24e0eb692b37c76a06925 in cf-d7f3673bffb24e0eb692b37c76a06925.log;
> in most cases the read was fine.
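As a cross-check of the size fields in the quoted TRACE lines, the sketch below assumes one 4-byte checksum per `bytesPerChecksum` chunk of the on-disk data (header included), appended after the data. Under that assumption, 65597 bytes with 16384 bytes per checksum gives 5 chunks and 20 checksum bytes, and 65597 + 20 = 65617, which agrees with `totalChecksumBytes=20` and `getOnDiskSizeWithHeader=65617` in the log. The class and method names are hypothetical and the code uses the JDK's `java.util.zip.CRC32C` (Java 9+); it illustrates the idea of per-chunk verification and is not HBase's implementation.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32C;

// Hypothetical illustration of per-chunk CRC32C checksums; not HBase code.
final class BlockChecksumSketch {
    static final int CHECKSUM_SIZE = 4; // assumed: one 4-byte value per chunk

    // Number of checksum bytes needed for dataSize bytes of data.
    static int totalChecksumBytes(int dataSize, int bytesPerChecksum) {
        int chunks = (dataSize + bytesPerChecksum - 1) / bytesPerChecksum;
        return chunks * CHECKSUM_SIZE;
    }

    // CRC32C of one chunk, truncated to the 32-bit value that gets stored.
    static int chunkCrc(byte[] buf, int off, int len) {
        CRC32C crc = new CRC32C();
        crc.update(buf, off, len);
        return (int) crc.getValue();
    }

    // Returns data followed by one checksum per chunk.
    static byte[] appendChecksums(byte[] data, int bytesPerChecksum) {
        int sums = totalChecksumBytes(data.length, bytesPerChecksum);
        ByteBuffer out = ByteBuffer.allocate(data.length + sums);
        out.put(data);
        for (int off = 0; off < data.length; off += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, data.length - off);
            out.putInt(chunkCrc(data, off, len));
        }
        return out.array();
    }

    // Recomputes each chunk's checksum and compares it with the stored one.
    static boolean verify(byte[] block, int dataSize, int bytesPerChecksum) {
        ByteBuffer sums = ByteBuffer.wrap(block, dataSize, block.length - dataSize);
        for (int off = 0; off < dataSize; off += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, dataSize - off);
            if (chunkCrc(block, off, len) != sums.getInt()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(totalChecksumBytes(65597, 16384));
        byte[] block = appendChecksums(new byte[65597], 16384);
        System.out.println(block.length + " " + verify(block, 65597, 16384));
    }
}
```

A check of this kind only passes if the bytes handed to it are the ones that were written at that offset, so a read that returns data from a shifted position would fail it even when the stored file is intact.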