Tak-Lon (Stephen) Wu created HDDS-15422:
-------------------------------------------
Summary: Stream read with `seek` close stream incorrectly
Key: HDDS-15422
URL: https://issues.apache.org/jira/browse/HDDS-15422
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Client
Affects Versions: 2.2.0
Reporter: Tak-Lon (Stephen) Wu
While testing HBase read only with Ozone by using YCSB workload C (L1
blockcache / no bucketcache /), We are experiencing degraded read throughput
and hanging handler threads on our HBase RegionServers. Despite a healthy
BlockCache hit rate (~87.5%), active read threads are becoming blocked,
creating a bottleneck.
Profiling reveals that handler threads are hanging IdLock.getLockEntry while
waiting for a single thread to fetch an HFile block from Apache Ozone.
and while tuning on TRACE/DEBUG log , we found the input stream are closing
immediately after seeking from end/middle of the file back to 0. the next seek
that tried to seek from 0 to middle hangs and wait till timeout. this
indicates that read amplification issue triggered by Reverse Scans over the
Ozone gRPC client cannot reuse the stream.
{code}
2026-05-29 12:50:06,171 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
stream119-reader3831: return response positon 179405298, length 4095502 (block
offset 179306496, length 4194304)
2026-05-29 12:50:06,171 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream119: seek
179471034 -> 0
2026-05-29 12:50:06,172 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing
stream119-reader3831
2026-05-29 12:50:06,172 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
stream119-reader3831 setCompleted success
2026-05-29 12:50:06,172 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream120: seek 0
-> 133955559
2026-05-29 12:50:06,172 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[grpc-default-executor-11]: stream119-reader3831 Failed to setCompleted:
Already completed
2026-05-29 12:50:06,985 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek
217274129 -> 0
2026-05-29 12:50:06,985 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek 0
-> 102738300
2026-05-29 12:50:06,985 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: position
102738300, length 65686, requested 102738300, diff 65686, readLength 65686,
preReadSize=0
2026-05-29 12:50:07,008 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[grpc-default-executor-24]: stream492-reader3832: enqueue response offset
101711872, length 4194304, numChecksums 4, bytesPerChecksum=1048576
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
stream492-reader3832: return response positon 102738300, length 3167876 (block
offset 101711872, length 4194304)
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek
102803986 -> 0
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing
stream492-reader3832
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]:
stream492-reader3832 setCompleted success
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek 0
-> 217274129
2026-05-29 12:50:07,009 DEBUG
org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream:
[grpc-default-executor-24]: stream492-reader3832 Failed to setCompleted:
Already completed
{code}
please check [this snippet of code
|https://github.com/apache/ozone/blob/master/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/StreamBlockInputStream.java#L212]
in StreamBlockInputStream.java.
one thought is to reduce the read chunk size from e.g. 4MB or 1MB to 64KB to
match the hbase block, but this size reduction does not help the fact of
Reverse Scan for HFile reading cases.
Setup configuration
{code}
ozone.fs.datastream.enabled = true
ozone.client.stream.readblock.enable = true
hdds.container.ratis.datastream.enabled = true
ozone.client.verify.checksum = false (was true and turned to false when
checksum failed)
hdds.grpc.tls.enabled = false (final)
ozone.client.stream.read.timeout = 120s
scm.container.client.idle.threshold = 120s
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]