[ 
https://issues.apache.org/jira/browse/HDDS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084998#comment-18084998
 ] 

Tsz-wo Sze edited comment on HDDS-15422 at 5/31/26 8:27 PM:
------------------------------------------------------------

bq. ... we found the input stream are closing immediately after seeking from 
end/middle of the file back to 0. the next seek that tried to seek from 0 to 
middle hangs and wait till timeout.  ...

{code}
2026-05-29 12:50:06,171 DEBUG StreamBlockInputStream: stream119-reader3831: 
return response positon 179405298, length 4095502 (block offset 179306496, 
length 4194304)
2026-05-29 12:50:06,171 DEBUG StreamBlockInputStream: stream119: seek 179471034 
-> 0
2026-05-29 12:50:06,172 DEBUG StreamBlockInputStream: Closing 
stream119-reader3831
2026-05-29 12:50:06,172 DEBUG StreamBlockInputStream: stream119-reader3831 
setCompleted success
2026-05-29 12:50:06,172 DEBUG StreamBlockInputStream: stream120: seek 0 -> 
133955559
{code}

[~taklwu], thanks for filing this bug!  

>From the log above, it seems the calls are:
# read position is 179471034
# seek to 0
# seek to 133955559

Somehow the stream is closed between two seeks.  Let me see how to fix it.


was (Author: szetszwo):
bq. ... we found the input stream are closing immediately after seeking from 
end/middle of the file back to 0. the next seek that tried to seek from 0 to 
middle hangs and wait till timeout.  ...

{code}
2026-05-29 12:50:06,171 DEBUG StreamBlockInputStream: stream119-reader3831: 
return response positon 179405298, length 4095502 (block offset 179306496, 
length 4194304)
2026-05-29 12:50:06,171 DEBUG StreamBlockInputStream: stream119: seek 179471034 
-> 0
2026-05-29 12:50:06,172 DEBUG StreamBlockInputStream: Closing 
stream119-reader3831
2026-05-29 12:50:06,172 DEBUG StreamBlockInputStream: stream119-reader3831 
setCompleted success
2026-05-29 12:50:06,172 DEBUG StreamBlockInputStream: stream120: seek 0 -> 
133955559
{code}

[~taklwu], thanks for filing this bug!  

>From the log above, it seems the call is:
# read position is 179471034
# seek to 0
# seek to 133955559

Somehow the stream is closed between two seeks.  Let me see how to fix it.

> Stream read with `seek` close stream during reverse scan may cause 
> performance bottleneck 
> ------------------------------------------------------------------------------------------
>
>                 Key: HDDS-15422
>                 URL: https://issues.apache.org/jira/browse/HDDS-15422
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: Ozone Client
>    Affects Versions: 2.2.0
>            Reporter: Tak-Lon (Stephen) Wu
>            Priority: Critical
>         Attachments: hbase-rpc-handler-are-all-parked.png, 
> hbase-rs-read-stream-60-1.txt, hbase-rs-read-stream-60-2.txt, 
> hbase-rs-read-stream-60-3.txt, hbase-rs-read-stream-60-4.html, 
> ozone-datanode-1.txt, ozone-datanode-2.txt
>
>
> While testing HBase read only with Ozone by using YCSB workload C (L1 
> blockcache / no bucketcache /), We are experiencing degraded read throughput 
> and hanging handler threads on our HBase RegionServers. Despite a healthy 
> BlockCache hit rate (~87.5%), active read threads are becoming blocked, 
> creating a bottleneck.
> Profiling reveals that handler threads are hanging IdLock.getLockEntry while 
> waiting for a single thread to fetch an HFile block from Apache Ozone.
> and while tuning on TRACE/DEBUG log , we found the input stream are closing 
> immediately after seeking from end/middle of the file back to 0. the next 
> seek that tried to seek from 0 to middle hangs and wait till timeout.  this 
> indicates that read amplification issue triggered by Reverse Scans over the 
> Ozone gRPC client cannot reuse the stream.
> {code}
> 2026-05-29 12:50:06,171 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: 
> stream119-reader3831: return response positon 179405298, length 4095502 
> (block offset 179306496, length 4194304)
> 2026-05-29 12:50:06,171 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream119: seek 
> 179471034 -> 0
> 2026-05-29 12:50:06,172 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing 
> stream119-reader3831
> 2026-05-29 12:50:06,172 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: 
> stream119-reader3831 setCompleted success
> 2026-05-29 12:50:06,172 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream120: seek 
> 0 -> 133955559
> 2026-05-29 12:50:06,172 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [grpc-default-executor-11]: stream119-reader3831 Failed to setCompleted: 
> Already completed
> 2026-05-29 12:50:06,985 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek 
> 217274129 -> 0
> 2026-05-29 12:50:06,985 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek 
> 0 -> 102738300
> 2026-05-29 12:50:06,985 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: position 
> 102738300, length 65686, requested 102738300, diff 65686, readLength 65686, 
> preReadSize=0
> 2026-05-29 12:50:07,008 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [grpc-default-executor-24]: stream492-reader3832: enqueue response offset 
> 101711872, length 4194304, numChecksums 4, bytesPerChecksum=1048576
> 2026-05-29 12:50:07,009 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: 
> stream492-reader3832: return response positon 102738300, length 3167876 
> (block offset 101711872, length 4194304)
> 2026-05-29 12:50:07,009 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek 
> 102803986 -> 0
> 2026-05-29 12:50:07,009 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing 
> stream492-reader3832
> 2026-05-29 12:50:07,009 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: 
> stream492-reader3832 setCompleted success
> 2026-05-29 12:50:07,009 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek 
> 0 -> 217274129
> 2026-05-29 12:50:07,009 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [grpc-default-executor-24]: stream492-reader3832 Failed to setCompleted: 
> Already completed
> {code}
> please check [this snippet of code 
> |https://github.com/apache/ozone/blob/master/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/StreamBlockInputStream.java#L212]
>  in StreamBlockInputStream.java. 
> one thought is to reduce the read chunk size from e.g. 4MB or 1MB to 64KB to 
> match the hbase block, but this size reduction does not help the fact of 
> Reverse Scan for HFile reading cases.
> {code}
> 2026-05-29 12:50:06,985 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: position 
> 102738300, length 65686, requested 102738300, diff 65686, readLength 65686, 
> preReadSize=0
> 2026-05-29 12:50:07,008 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [grpc-default-executor-24]: stream492-reader3832: enqueue response offset 
> 101711872, length 4194304, numChecksums 4, bytesPerChecksum=1048576
> {code}
> Setup configuration
> {code}
>  ozone.fs.datastream.enabled = true
>  ozone.client.stream.readblock.enable = true
>  hdds.container.ratis.datastream.enabled = true
>  ozone.client.verify.checksum = false (was true and turned to false when 
> checksum failed)
>  hdds.grpc.tls.enabled = false (final)
>  ozone.client.stream.read.timeout = 120s
>  scm.container.client.idle.threshold = 120s
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to