smengcl opened a new pull request, #10353: URL: https://github.com/apache/ozone/pull/10353
Generated-by: Claude Code (Opus 4.7) ## What changes were proposed in this pull request? `BufferUtils.assignByteBuffers` previously allocated heap `ByteBuffer`s for chunk reads. Reading from a `FileChannel` into a heap buffer forces the JDK to first read into a thread-local `DirectByteBuffer` (in `sun.nio.ch.IOUtil`) and then memcpy the chunk-sized payload into the user's heap buffer. So every default-config chunk read paid one extra user-space chunk-sized memcpy plus one chunk-sized heap allocation. Two prior PRs added opt-in direct-buffer paths to skip exactly this copy: * HDDS-7188 (Tsz-Wo Nicholas Sze, Jan 2025): `ChunkUtils.readDataNettyChunkedNioFile` via Netty `PooledByteBufAllocator.DEFAULT` * HDDS-10488 (Sammi Chen, Sep 2024): `MappedBufferManager` mmap reads with semaphore-tracked quota Both deliberately avoided raw `ByteBuffer.allocateDirect` because it is a known leak pattern under sustained load. Tiny wrapper objects mean GC fires rarely, the `Cleaner` that frees direct memory never runs promptly, and the native budget exhausts with `OutOfMemoryError: Direct buffer memory`. Neither PR updated the default `assignByteBuffers` path, so the heap allocation stayed. This change uses Hadoop's `org.apache.hadoop.util.DirectBufferPool`, the same pool already used by `BlockOutputStream` and `KeyValueContainerCheck`. Release plumbing lives entirely in `ChunkUtils.readData`: a release lambda is registered on `DispatcherContext.setReleaseMethod`, fires once in `GrpcXceiverService`'s `finally` block after `responseObserver.onNext` returns (the standard server marshaller is synchronous, so the response bytes are already serialized into the Netty outgoing buffer by then), and returns each buffer to the pool. Paths without a release-supporting context fall back to GC reclaim via the pool's `WeakReference` layer, so the pool degrades to plain `allocateDirect` rather than leaking. `ChunkBuffer` and `ChunkBufferImplWithByteBufferList` are unchanged from master. A consumer audit confirmed no `.array()` usage breaks with direct buffers. `ChecksumByteBufferImpl` guards `.array()` with `hasArray()`, and `UnsafeByteOperations.unsafeWrap` of a direct `ByteBuffer` produces a zero-copy `NioByteString` without calling `.array()`. One test (`TestChunkUtils.concurrentReadWriteOfSameFile`) called `.array()` directly and was updated to use `get(byte[])`. ### Expected impact Per chunk read: * One chunk-sized user-space memcpy eliminated (~270 µs for a 4 MB chunk on a typical x86 server with ~15 GB/s memory bandwidth). * One chunk-sized heap allocation eliminated (heap churn moves to bounded direct memory in the pool). Wall-clock effect depends on storage: | Storage | Latency per chunk | Effect | |---|---|---| | Page-cache hit | sub-ms total | meaningful (memcpy was a large fraction) | | NVMe sequential | ~1 ms total | ~20 percent reduction | | SATA SSD | ~10 ms total | a few percent | | HDD | tens of ms total | invisible at wall-clock | Heap-churn reduction is consistent across storage tiers: at 100 MB/s read throughput a DN sheds roughly 100 MB/s of heap allocation. ### Pool sizing and operational note `DirectBufferPool` is unbounded by design (per `ConcurrentMap<Integer, Queue<WeakReference<ByteBuffer>>>`). Steady-state pool memory converges to peak concurrent in-flight buffers per capacity class. For default-config chunk reads (4 MB) on a 64-thread DN, that is roughly 256 MB of direct memory. Operators should ensure `-XX:MaxDirectMemorySize` accommodates this; the existing `BlockOutputStream` and `KeyValueContainerCheck` consumers of `DirectBufferPool` are already sized this way. ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-15359 ## How was this patch tested? - Existing tests -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
