[ 
https://issues.apache.org/jira/browse/HDDS-15424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087732#comment-18087732
 ] 

Tak-Lon (Stephen) Wu edited comment on HDDS-15424 at 6/9/26 7:12 PM:
---------------------------------------------------------------------

I changed the checksum type to CRC32C in both the HBase and Ozone client 
configuration, but it is still not working. I now think this issue is at the 
HBase level, so I may open a new JIRA in HBase if it turns out to be related to 
the cached header. From what I have observed so far, though, this looks like a 
common filesystem pread issue. 

FYI, the problem was mostly related to how HBase expects the next block header 
from the input stream.

1. When reading from the input stream, HBase read the next block header and 
found `checksumType = 50` or `checksumType = 52` (either a misaligned read or 
corrupted data); it was expecting `(byte) 2` instead of 50. It then fell back 
to the Ozone (filesystem) checksum. 

{code}
# example one
2026-06-08 19:23:17,991 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
[RpcServer.default.FPBQ.Fifo.handler=5,queue=2,port=22101]: HBase checksumType 
verification failed for file 
ofs://ozone1780561471/hbase1/hbaseroot/hbase2/data/default/200m/0d6f7d68f3583f7ecae2238d15308395/cf/bc7d85d8cb2545d686aefb9b61dd0a7a
 at offset 92071526 filesize 284102854 checksumType 50. Retrying read with HDFS 
checksums turned on...
2026-06-08 19:23:17,991 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
[RpcServer.default.FPBQ.Fifo.handler=5,queue=2,port=22101]: HBase checksum 
verification failed for file 
ofs://ozone1780561471/hbase1/hbaseroot/hbase2/data/default/200m/0d6f7d68f3583f7ecae2238d15308395/cf/bc7d85d8cb2545d686aefb9b61dd0a7a
 at offset 92071526 filesize 284102854. Retrying read with HDFS checksums 
turned on...
2026-06-08 19:23:18,672 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
[RpcServer.default.FPBQ.Fifo.handler=5,queue=2,port=22101]: HDFS checksum 
verification succeeded for file 
ofs://ozone1780561471/hbase1/hbaseroot/hbase2/data/default/200m/0d6f7d68f3583f7ecae2238d15308395/cf/bc7d85d8cb2545d686aefb9b61dd0a7a
 at offset 92071526 filesize 284102854

# example two 
2026-06-08 19:24:33,968 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
[RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=22101]: HBase checksumType 
verification failed for file 
ofs://ozone1780561471/hbase1/hbaseroot/hbase2/data/default/200m/4dd125d79c4537b96631994072d7c3fa/cf/7191c7cc2a984c01a2b264d4ef54b65c
 at offset 96368429 filesize 394458568 checksumType 52. Retrying read with HDFS 
checksums turned on...
2026-06-08 19:24:33,968 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
[RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=22101]: HBase checksum 
verification failed for file 
ofs://ozone1780561471/hbase1/hbaseroot/hbase2/data/default/200m/4dd125d79c4537b96631994072d7c3fa/cf/7191c7cc2a984c01a2b264d4ef54b65c
 at offset 96368429 filesize 394458568. Retrying read with HDFS checksums 
turned on...
2026-06-08 19:24:33,968 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
[RpcServer.default.FPBQ.Fifo.handler=12,queue=0,port=22101]: HBase checksum 
verification failed for file 
ofs://ozone1780561471/hbase1/hbaseroot/hbase2/data/default/200m/4dd125d79c4537b96631994072d7c3fa/cf/7191c7cc2a984c01a2b264d4ef54b65c
 at offset 104246354 filesize 394458568. Retrying read with HDFS checksums 
turned on...
{code}
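
One detail about the values above: 50 and 52 are the ASCII codes of the characters '2' and '4'. That would be consistent with the checksum-type byte being taken from cell data (the YCSB row keys are digit strings) rather than from a real block header, although this is only an inference and is not confirmed by the logs. Below is a minimal Python sketch for inspecting a suspect header. It assumes the 33-byte checksum-enabled HFile block header layout (8-byte magic, two 4-byte sizes, 8-byte previous offset, then the 1-byte checksum type at offset 24) and the codes 0=NULL, 1=CRC32, 2=CRC32C. The name `describe_header` is hypothetical.

```python
import struct

# Assumed layout of an HFile block header with HBase checksums (big-endian):
# magic(8) onDiskSizeWithoutHeader(4) uncompressedSizeWithoutHeader(4)
# prevBlockOffset(8) checksumType(1) bytesPerChecksum(4) onDiskDataSizeWithHeader(4)
HEADER = struct.Struct(">8siiqBii")  # 33 bytes in total
CHECKSUM_TYPES = {0: "NULL", 1: "CRC32", 2: "CRC32C"}  # assumed HBase codes


def describe_header(buf: bytes) -> dict:
    """Decode the first 33 bytes of buf and report how the checksum byte looks."""
    magic, on_disk, uncompressed, prev, ctype, bpc, data_size = HEADER.unpack_from(buf)
    return {
        "magic": magic,
        "on_disk_size_without_header": on_disk,
        "prev_block_offset": prev,
        "checksum_type": ctype,
        # None means the byte is not a known checksum code.
        "checksum_name": CHECKSUM_TYPES.get(ctype),
        # If the byte is printable ASCII, show it; a digit here hints at cell data.
        "as_ascii": chr(ctype) if 32 <= ctype < 127 else None,
        "bytes_per_checksum": bpc,
        "on_disk_data_size_with_header": data_size,
    }
```

Feeding it the 33 bytes that HBase treated as the header at a failing offset should show whether the magic is a block magic at all, or whether the whole window looks like key/value bytes.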

All of these reads are in the middle of the file. Even though HBase supplies 
the offset, from what I can tell from the code (checked with Cursor), the 
client may have some over-read bytes left in its buffer, so it does not return 
the right bytes for the pread. HBase applies this correction (retrying with 
filesystem checksums) for any filesystem, including HDFS and OFS. 
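
The retry visible in the WARN lines can be pictured as a two-pass read: try the fast path with the application-level check, and re-read through the filesystem-checksum path only if that check fails. The following is a minimal Python sketch of that pattern, not HBase's actual implementation. All function names are hypothetical, and `zlib.crc32` merely stands in for the block checksum.

```python
import zlib
from typing import Callable, Tuple

ReadFn = Callable[[int, int], bytes]  # (offset, length) -> bytes


def crc_verifier(expected_crc: int) -> Callable[[bytes], bool]:
    """Build a check that compares the CRC32 of the data with an expected value."""
    return lambda data: zlib.crc32(data) == expected_crc


def read_with_fallback(
    fast_read: ReadFn,
    checked_read: ReadFn,
    verify: Callable[[bytes], bool],
    offset: int,
    length: int,
) -> Tuple[bytes, bool]:
    """Return (data, used_fallback)."""
    data = fast_read(offset, length)
    if verify(data):
        return data, False
    # The application-level check failed, so re-read via the path that
    # verifies filesystem checksums and trust that result.
    return checked_read(offset, length), True
```

Counting how often the second element is `True` per file and offset would give a rough measure of how frequently the fast path returns unexpected bytes.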




> Non-Stream read failed HBase read block checksum intermittently
> ---------------------------------------------------------------
>
>                 Key: HDDS-15424
>                 URL: https://issues.apache.org/jira/browse/HDDS-15424
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Tak-Lon (Stephen) Wu
>            Priority: Minor
>         Attachments: cf-d7f3673bffb24e0eb692b37c76a06925.log
>
>
> When testing HBase with Ozone by running the YCSB read-only workload C (no 
> SCR), we found a strange checksum error when reading a block from Ozone's 
> ChunkInputStream. 
> [HBase's data block 
> checksum|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java#L1531-L1558]
>  (hbase.regionserver.checksum.verify) is enabled by default. Its goal is to 
> avoid the extra filesystem checksum round trip by keeping a checksum inside 
> the HFile data blocks and verifying it at the HBase level.
> This does not fail with HDFS (no SCR, no bucketcache).
> {code}
> 2026-05-21 14:31:41,627 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock: 
> [RpcServer.default.FPBQ.Fifo.handler=14,queue=5,port=22101]: Reading 
> d7f3673bffb24e0eb692b37c76a06925 at offset=66368883, pread=true, 
> verifyChecksum=true, cachedHeader=null, onDiskSizeWithHeader=65613
> 2026-05-21 14:31:41,655 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock: 
> [RpcServer.default.FPBQ.Fifo.handler=81,queue=0,port=22101]: Read 
> [blockType=DATA, fileOffset=238227405, headerSize=33, 
> onDiskSizeWithoutHeader=65584, uncompressedSizeWithoutHeader=65564, 
> prevBlockOffset=238161816, isUseHBaseChecksum=true, checksumType=CRC32C, 
> bytesPerChecksum=16384, onDiskDataSizeWithHeader=65597, 
> getOnDiskSizeWithHeader=65617, totalChecksumBytes=20, isUnpacked=true, 
> buf=[SingleByteBuff[pos=0, lim=65597, cap= 65650]], 
> dataBeginsWith=\x00\x00\x00+\x00\x00\x00d\x00\x17user157290070305533595, 
> fileContext=[usesHBaseChecksum=true, checksumType=CRC32C, 
> bytesPerChecksum=16384, blocksize=65536, encoding=NONE, 
> indexBlockEncoding=NONE, includesMvcc=true, includesTags=false, 
> compressAlgo=NONE, compressTags=false, decompressionContext=null, 
> cryptoContext=[cipher=NONE keyHash=NONE], 
> name=d7f3673bffb24e0eb692b37c76a06925, 
> cellComparator=org.apache.hadoop.hbase.InnerStoreCellComparator@4d3ead0a], 
> nextBlockOnDiskSize=65633] in 101 ms
> 2026-05-21 14:31:41,772 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock: 
> [RpcServer.default.FPBQ.Fifo.handler=14,queue=5,port=22101]: Read 
> [blockType=DATA, fileOffset=66368883, headerSize=33, 
> onDiskSizeWithoutHeader=65580, uncompressedSizeWithoutHeader=65560, 
> prevBlockOffset=66303290, isUseHBaseChecksum=true, checksumType=CRC32C, 
> bytesPerChecksum=16384, onDiskDataSizeWithHeader=65593, 
> getOnDiskSizeWithHeader=65613, totalChecksumBytes=20, isUnpacked=true, 
> buf=[SingleByteBuff[pos=0, lim=65593, cap= 65646]], 
> dataBeginsWith=\x00\x00\x00+\x00\x00\x00d\x00\x17user156719743603804260, 
> fileContext=[usesHBaseChecksum=true, checksumType=CRC32C, 
> bytesPerChecksum=16384, blocksize=65536, encoding=NONE, 
> indexBlockEncoding=NONE, includesMvcc=true, includesTags=false, 
> compressAlgo=NONE, compressTags=false, decompressionContext=null, 
> cryptoContext=[cipher=NONE keyHash=NONE], 
> name=d7f3673bffb24e0eb692b37c76a06925, 
> cellComparator=org.apache.hadoop.hbase.InnerStoreCellComparator@4d3ead0a], 
> nextBlockOnDiskSize=65738] in 145 ms
> 2026-05-21 14:31:45,042 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock: 
> [RpcServer.default.FPBQ.Fifo.handler=18,queue=0,port=22101]: Reading 
> d7f3673bffb24e0eb692b37c76a06925 at offset=280830996, pread=true, 
> verifyChecksum=true, cachedHeader=null, onDiskSizeWithHeader=65609
> 2026-05-21 14:32:12,355 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
> [RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HBase 
> checksumType verification failed for file 
> ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
>  at offset 237833541 filesize 392584403 checksumType 49. Retrying read with 
> HDFS checksums turned on...
> 2026-05-21 14:32:12,355 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
> [RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HBase checksum 
> verification failed for file 
> ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
>  at offset 237833541 filesize 392584403. Retrying read with HDFS checksums 
> turned on...
> 2026-05-21 14:32:12,404 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
> [RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HDFS checksum 
> verification succeeded for file 
> ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
>  at offset 237833541 filesize 392584403
> {code}
> The question is why a read from Ozone can make HBase think the data block is 
> corrupt, yet complete successfully when it falls back to the Ozone 
> (filesystem) checksum. 
> This extra filesystem checksum causes a minor delay. The read does not 
> always fail, only once in a while. Please see the captured reads for cf 
> d7f3673bffb24e0eb692b37c76a06925 in 
> cf-d7f3673bffb24e0eb692b37c76a06925.log; in most cases the read was fine.
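
As a sanity check on the successful TRACE lines in the description, the reported sizes are internally consistent: header size plus on-disk size without header equals the on-disk size with header, and the data-with-header size plus the checksum bytes gives the same total. The following is a small Python sketch of that arithmetic, assuming 4 bytes per CRC32C value and one checksum per `bytesPerChecksum` chunk (including a final partial chunk). The function names are hypothetical.

```python
CHECKSUM_SIZE = 4  # assumed bytes per CRC32/CRC32C value


def checksum_bytes(on_disk_data_size_with_header: int, bytes_per_checksum: int) -> int:
    """One checksum per chunk, counting a final partial chunk (ceiling division)."""
    chunks = -(-on_disk_data_size_with_header // bytes_per_checksum)
    return chunks * CHECKSUM_SIZE


def sizes_consistent(
    header_size: int,
    on_disk_size_without_header: int,
    on_disk_data_size_with_header: int,
    bytes_per_checksum: int,
) -> bool:
    """Check that both ways of computing the total on-disk block size agree."""
    total = header_size + on_disk_size_without_header
    tail = checksum_bytes(on_disk_data_size_with_header, bytes_per_checksum)
    return total == on_disk_data_size_with_header + tail


# Block at fileOffset=66368883: 33 + 65580 = 65613 and 65593 + 20 = 65613.
print(sizes_consistent(33, 65580, 65593, 16384))
# Block at fileOffset=238227405: 33 + 65584 = 65617 and 65597 + 20 = 65617.
print(sizes_consistent(33, 65584, 65597, 16384))
```

Both blocks need five checksum chunks (65593 and 65597 are each just over 4 x 16384), which matches `totalChecksumBytes=20` in the log.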


