[ 
https://issues.apache.org/jira/browse/HDDS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tak-Lon (Stephen) Wu updated HDDS-15422:
----------------------------------------
    Attachment: hbase-rs-read-stream-60-1.txt
                hbase-rs-read-stream-60-2.txt
                hbase-rs-read-stream-60-3.txt
                hbase-rs-read-stream-60-4.html

> Stream read with `seek` close stream during reverse scan may cause 
> performance bottleneck 
> ------------------------------------------------------------------------------------------
>
>                 Key: HDDS-15422
>                 URL: https://issues.apache.org/jira/browse/HDDS-15422
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Client
>    Affects Versions: 2.2.0
>            Reporter: Tak-Lon (Stephen) Wu
>            Priority: Major
>         Attachments: hbase-rs-read-stream-60-1.txt, 
> hbase-rs-read-stream-60-2.txt, hbase-rs-read-stream-60-3.txt, 
> hbase-rs-read-stream-60-4.html, ozone-datanode-1.txt, ozone-datanode-2.txt
>
>
> While testing HBase read only with Ozone by using YCSB workload C (L1 
> blockcache / no bucketcache /), We are experiencing degraded read throughput 
> and hanging handler threads on our HBase RegionServers. Despite a healthy 
> BlockCache hit rate (~87.5%), active read threads are becoming blocked, 
> creating a bottleneck.
> Profiling reveals that handler threads are hanging IdLock.getLockEntry while 
> waiting for a single thread to fetch an HFile block from Apache Ozone.
> and while tuning on TRACE/DEBUG log , we found the input stream are closing 
> immediately after seeking from end/middle of the file back to 0. the next 
> seek that tried to seek from 0 to middle hangs and wait till timeout.  this 
> indicates that read amplification issue triggered by Reverse Scans over the 
> Ozone gRPC client cannot reuse the stream.
> {code}
> 2026-05-29 12:50:06,171 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: 
> stream119-reader3831: return response positon 179405298, length 4095502 
> (block offset 179306496, length 4194304)
> 2026-05-29 12:50:06,171 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream119: seek 
> 179471034 -> 0
> 2026-05-29 12:50:06,172 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing 
> stream119-reader3831
> 2026-05-29 12:50:06,172 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: 
> stream119-reader3831 setCompleted success
> 2026-05-29 12:50:06,172 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream120: seek 
> 0 -> 133955559
> 2026-05-29 12:50:06,172 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [grpc-default-executor-11]: stream119-reader3831 Failed to setCompleted: 
> Already completed
> 2026-05-29 12:50:06,985 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek 
> 217274129 -> 0
> 2026-05-29 12:50:06,985 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek 
> 0 -> 102738300
> 2026-05-29 12:50:06,985 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: position 
> 102738300, length 65686, requested 102738300, diff 65686, readLength 65686, 
> preReadSize=0
> 2026-05-29 12:50:07,008 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [grpc-default-executor-24]: stream492-reader3832: enqueue response offset 
> 101711872, length 4194304, numChecksums 4, bytesPerChecksum=1048576
> 2026-05-29 12:50:07,009 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: 
> stream492-reader3832: return response positon 102738300, length 3167876 
> (block offset 101711872, length 4194304)
> 2026-05-29 12:50:07,009 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream492: seek 
> 102803986 -> 0
> 2026-05-29 12:50:07,009 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: Closing 
> stream492-reader3832
> 2026-05-29 12:50:07,009 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: 
> stream492-reader3832 setCompleted success
> 2026-05-29 12:50:07,009 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: stream493: seek 
> 0 -> 217274129
> 2026-05-29 12:50:07,009 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [grpc-default-executor-24]: stream492-reader3832 Failed to setCompleted: 
> Already completed
> {code}
> please check [this snippet of code 
> |https://github.com/apache/ozone/blob/master/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/StreamBlockInputStream.java#L212]
>  in StreamBlockInputStream.java. 
> one thought is to reduce the read chunk size from e.g. 4MB or 1MB to 64KB to 
> match the hbase block, but this size reduction does not help the fact of 
> Reverse Scan for HFile reading cases.
> {code}
> 2026-05-29 12:50:06,985 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [RpcServer.default.FPBQ.Fifo.handler=42,queue=6,port=22101]: position 
> 102738300, length 65686, requested 102738300, diff 65686, readLength 65686, 
> preReadSize=0
> 2026-05-29 12:50:07,008 DEBUG 
> org.apache.hadoop.hdds.scm.storage.StreamBlockInputStream: 
> [grpc-default-executor-24]: stream492-reader3832: enqueue response offset 
> 101711872, length 4194304, numChecksums 4, bytesPerChecksum=1048576
> {code}
> Setup configuration
> {code}
>  ozone.fs.datastream.enabled = true
>  ozone.client.stream.readblock.enable = true
>  hdds.container.ratis.datastream.enabled = true
>  ozone.client.verify.checksum = false (was true and turned to false when 
> checksum failed)
>  hdds.grpc.tls.enabled = false (final)
>  ozone.client.stream.read.timeout = 120s
>  scm.container.client.idle.threshold = 120s
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to