yandrey321 commented on code in PR #10438:
URL: https://github.com/apache/ozone/pull/10438#discussion_r3383454418
##########
hadoop-hdds/common/src/test/java/org/apache/hadoop/ozone/common/ChunkBufferPutBenchmark.java:
##########
@@ -0,0 +1,346 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.common;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.text.DecimalFormat;
+import java.util.concurrent.ThreadLocalRandom;
+import org.apache.hadoop.ozone.common.JfrByteBufferAllocations.AllocationStats;
+
+/**
+ * Microbenchmark for ChunkBuffer.put(byte[]) direct copy vs ByteBuffer.wrap path.
+ *
+ * <p>Focused on the scenarios where HDDS-15485 shows the clearest benefit:
+ * <ul>
+ *   <li>Throughput: 4KB stream fill with incremental buffer (64KB increment)</li>
+ *   <li>Allocations: same 4KB / 64KB-increment incremental buffer path (JFR + wrap calls)</li>
+ * </ul>
+ *
+ * <p>Run from the repo root:
+ * <pre>
+ * mvn -pl hadoop-hdds/common -q test-compile exec:java \
+ *   -Dexec.mainClass=org.apache.hadoop.ozone.common.ChunkBufferPutBenchmark \
+ *   -Dexec.classpathScope=test \
+ *   -Dexec.args="--add-opens jdk.jfr/jdk.jfr=ALL-UNNAMED --add-opens jdk.jfr/jdk.jfr.consumer=ALL-UNNAMED"
+ * </pre>
+ * JFR ByteBuffer counts are sampled; put-op count reports exact wrap calls.
+ * Wrap-path timings use a blackhole on each {@code ByteBuffer.wrap} so the JVM
+ * cannot eliminate short-lived wrapper allocations via escape analysis.
+ */
+public final class ChunkBufferPutBenchmark {
+
+  /** Prevents escape analysis from removing ByteBuffer.wrap allocations in the wrap path. */
+  private static volatile Object blackhole;

Review Comment:
@szetszwo I removed the blackhole and reran the test; I still see the same improvement:

```
mvn -pl hadoop-hdds/common -q test-compile exec:java \
  -Dexec.mainClass=org.apache.hadoop.ozone.common.ChunkBufferPutBenchmark \
  -Dexec.classpathScope=test \
  -Dexec.args="--add-opens jdk.jfr/jdk.jfr=ALL-UNNAMED --add-opens jdk.jfr/jdk.jfr.consumer=ALL-UNNAMED"

ChunkBuffer.put(byte[]) microbenchmark (pre-allocated buffer, put-only)
JVM: 17 on aarch64

=== Throughput showcase ===

--- Incremental buffer showcase ---
Config: ozone.client.stream.buffer.size=4MB, ozone.client.stream.buffer.increment=64KB, io.file.buffer.size=4KB
Pattern: 4KB stream fill into IncrementalChunkBuffer (64KB steps)
Chunk=4096KB increment=64KB write=4KB
  round 1:
    direct put(byte[]): 42,262.7 MB/s | 92 ns/op | 20.00s | 216,385,536 ops
    wrap put(ByteBuffer): 38,938.0 MB/s | 100 ns/op | 20.00s | 199,363,584 ops
    improvement: 8.5% faster (1.09x) per 4KB write; throughput 8.5% (1.09x)
  round 2:
    direct put(byte[]): 41,901.0 MB/s | 93 ns/op | 20.00s | 214,533,120 ops
    wrap put(ByteBuffer): 38,885.1 MB/s | 100 ns/op | 20.00s | 199,092,224 ops
    improvement: 7.8% faster (1.08x) per 4KB write; throughput 7.8% (1.08x)
  round 3:
    direct put(byte[]): 41,889.6 MB/s | 93 ns/op | 20.00s | 214,474,752 ops
    wrap put(ByteBuffer): 38,947.6 MB/s | 100 ns/op | 20.00s | 199,412,736 ops
    improvement: 7.6% faster (1.08x) per 4KB write; throughput 7.6% (1.08x)
  median improvement over 3 rounds: 7.8%

=== Allocation showcase ===

--- Incremental buffer showcase ---
Config: ozone.client.stream.buffer.size=4MB, ozone.client.stream.buffer.increment=64KB, io.file.buffer.size=4KB
Pattern: 4KB stream fill into IncrementalChunkBuffer (64KB steps)
Chunk=4096KB increment=64KB write=4KB
  direct put(byte[]): 53,298,176 put ops | 0 ByteBuffer TLAB allocs | 0 alloc bytes
  wrap put(ByteBuffer): 48,955,392 put ops | 724 ByteBuffer TLAB allocs | 40,544 alloc bytes
  ByteBuffer.wrap calls on wrap path (1 per put): 48,955,392
  direct path avoids 48,955,392 ByteBuffer.wrap calls per run
  JFR confirms zero ByteBuffer TLAB allocations on direct path
  JFR sampled ByteBuffer TLAB allocations avoided on direct path: 724
  (JFR samples TLAB events; put-op count is the exact wrap-call metric)
```

Removing the `-Dexec.args="--add-opens jdk.jfr/jdk.jfr=ALL-UNNAMED --add-opens jdk.jfr/jdk.jfr.consumer=ALL-UNNAMED"` parameter roughly doubles the measured improvement (median 15.7% instead of 7.8%):

```
mvn -pl hadoop-hdds/common -q test-compile exec:java \
  -Dexec.mainClass=org.apache.hadoop.ozone.common.ChunkBufferPutBenchmark \
  -Dexec.classpathScope=test

ChunkBuffer.put(byte[]) microbenchmark (pre-allocated buffer, put-only)
JVM: 17 on aarch64

=== Throughput showcase ===

--- Incremental buffer showcase ---
Config: ozone.client.stream.buffer.size=4MB, ozone.client.stream.buffer.increment=64KB, io.file.buffer.size=4KB
Pattern: 4KB stream fill into IncrementalChunkBuffer (64KB steps)
Chunk=4096KB increment=64KB write=4KB
  round 1:
    direct put(byte[]): 42,609.4 MB/s | 92 ns/op | 20.00s | 218,161,152 ops
    wrap put(ByteBuffer): 36,828.7 MB/s | 106 ns/op | 20.00s | 188,563,456 ops
    improvement: 15.7% faster (1.16x) per 4KB write; throughput 15.7% (1.16x)
  round 2:
    direct put(byte[]): 42,486.2 MB/s | 92 ns/op | 20.00s | 217,530,368 ops
    wrap put(ByteBuffer): 37,137.1 MB/s | 105 ns/op | 20.00s | 190,142,464 ops
    improvement: 14.4% faster (1.14x) per 4KB write; throughput 14.4% (1.14x)
  round 3:
    direct put(byte[]): 42,399.2 MB/s | 92 ns/op | 20.00s | 217,084,928 ops
    wrap put(ByteBuffer): 36,557.4 MB/s | 107 ns/op | 20.00s | 187,174,912 ops
    improvement: 16.0% faster (1.16x) per 4KB write; throughput 16.0% (1.16x)
  median improvement over 3 rounds: 15.7%

=== Allocation showcase ===

--- Incremental buffer showcase ---
Config: ozone.client.stream.buffer.size=4MB, ozone.client.stream.buffer.increment=64KB, io.file.buffer.size=4KB
Pattern: 4KB stream fill into IncrementalChunkBuffer (64KB steps)
Chunk=4096KB increment=64KB write=4KB
  direct put(byte[]): 50,421,760 put ops | 0 ByteBuffer TLAB allocs | 0 alloc bytes
  wrap put(ByteBuffer): 48,524,288 put ops | 714 ByteBuffer TLAB allocs | 39,984 alloc bytes
  ByteBuffer.wrap calls on wrap path (1 per put): 48,524,288
  direct path avoids 48,524,288 ByteBuffer.wrap calls per run
  JFR confirms zero ByteBuffer TLAB allocations on direct path
  JFR sampled ByteBuffer TLAB allocations avoided on direct path: 714
  (JFR samples TLAB events; put-op count is the exact wrap-call metric)
```
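
For readers following the thread, the two paths being compared can be sketched roughly as below. This is not the benchmark or the ChunkBuffer code from the PR: it uses a plain `java.nio.ByteBuffer` as a stand-in for the destination buffer, and the class and method names (`PutPathsSketch`, `putDirect`, `putWrapped`) are hypothetical. It only illustrates where the per-call `ByteBuffer.wrap` allocation comes from and what a volatile blackhole field does.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in; the real benchmark targets Ozone's ChunkBuffer.
public final class PutPathsSketch {

  /** Volatile sink: publishing each wrapper here stops escape analysis from removing it. */
  private static volatile Object blackhole;

  /** Direct path: bulk-copy the array into the destination, no wrapper object. */
  static void putDirect(ByteBuffer dst, byte[] src) {
    dst.put(src, 0, src.length);
  }

  /** Wrap path: allocate one heap ByteBuffer view per call, then copy from it. */
  static void putWrapped(ByteBuffer dst, byte[] src) {
    ByteBuffer wrapped = ByteBuffer.wrap(src);
    blackhole = wrapped; // assumption: mirrors the blackhole idea described in the Javadoc
    dst.put(wrapped);
  }

  public static void main(String[] args) {
    byte[] write = new byte[4 * 1024];               // one 4KB write
    Arrays.fill(write, (byte) 7);
    ByteBuffer a = ByteBuffer.allocate(64 * 1024);   // one 64KB increment
    ByteBuffer b = ByteBuffer.allocate(64 * 1024);

    // Fill both buffers with 4KB writes, one path each.
    while (a.remaining() >= write.length) {
      putDirect(a, write);
      putWrapped(b, write);
    }
    System.out.println("identical contents: " + Arrays.equals(a.array(), b.array()));
    System.out.println("bytes written per buffer: " + a.position());
  }
}
```

Both paths leave the same bytes in the destination; the only difference is the short-lived wrapper object on the wrap path, which is what the put-op and JFR TLAB counts in the output are measuring.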

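
As a quick cross-check of the reported percentages, the sketch below recomputes the per-round improvement from the MB/s figures quoted in the comment, assuming the benchmark defines improvement as direct throughput divided by wrap throughput, minus one. That definition is inferred from the numbers, not taken from the benchmark source, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper; recomputes the improvement figures from the quoted throughput.
public final class ImprovementCheck {

  /** Percentage by which direct throughput exceeds wrap throughput (assumed definition). */
  static double improvementPercent(double directMBps, double wrapMBps) {
    return (directMBps / wrapMBps - 1.0) * 100.0;
  }

  /** Median of the values; sorts a copy so the input is left untouched. */
  static double median(double[] values) {
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  /** Median improvement over rounds given as {direct MB/s, wrap MB/s} pairs. */
  static double medianImprovement(double[][] rounds) {
    double[] perRound = new double[rounds.length];
    for (int i = 0; i < rounds.length; i++) {
      perRound[i] = improvementPercent(rounds[i][0], rounds[i][1]);
    }
    return median(perRound);
  }

  public static void main(String[] args) {
    // Throughput pairs copied from the two runs quoted in the comment.
    double[][] withExecArgs = {{42262.7, 38938.0}, {41901.0, 38885.1}, {41889.6, 38947.6}};
    double[][] withoutExecArgs = {{42609.4, 36828.7}, {42486.2, 37137.1}, {42399.2, 36557.4}};

    double with = medianImprovement(withExecArgs);
    double without = medianImprovement(withoutExecArgs);
    System.out.printf(Locale.ROOT, "with -Dexec.args:    median %.1f%%%n", with);
    System.out.printf(Locale.ROOT, "without -Dexec.args: median %.1f%%%n", without);
    System.out.printf(Locale.ROOT, "ratio: %.2fx%n", without / with);
  }
}
```

Under that assumed definition the recomputed medians agree with the reported 7.8% and 15.7%, and the second run's median is about twice the first.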