This is an automated email from the ASF dual-hosted git repository.

gangwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-java.git


The following commit(s) were added to refs/heads/master by this push:
     new 3ac860e11 GH-2994: Optimize string to binary conversion in AvroWriteSupport (#2995)
3ac860e11 is described below

commit 3ac860e1145c0fba1cf3b902c943f1703dd9db52
Author: sschepens <sebastian.schep...@mercadolibre.com>
AuthorDate: Wed Aug 28 11:58:54 2024 -0300

    GH-2994: Optimize string to binary conversion in AvroWriteSupport (#2995)

    `Binary.fromCharSequence` is an order of magnitude slower than `Binary.fromString` when input is a `String`:
    ```
    Benchmarks.fromCharSequence  thrpt   25   5885347.328 ±  186669.738  ops/s
    Benchmarks.fromString        thrpt   25  71335979.492 ± 8800704.044  ops/s
    ```

    Here is the code for the benchmarks:
    ```java
    public class Benchmarks {
        private static final String string = RandomStringUtils.randomAlphanumeric(100);

        @Benchmark
        @BenchmarkMode(Mode.Throughput)
        public void fromCharSequence(Blackhole blackhole) {
            blackhole.consume(Binary.fromCharSequence(string));
        }

        @Benchmark
        @BenchmarkMode(Mode.Throughput)
        public void fromString(Blackhole blackhole) {
            blackhole.consume(Binary.fromString(string));
        }
    }
    ```
---
 .../src/main/java/org/apache/parquet/avro/AvroWriteSupport.java | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java b/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java
index 846fb8bab..53fc3d59c 100644
--- a/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java
+++ b/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java
@@ -403,10 +403,12 @@ public class AvroWriteSupport<T> extends WriteSupport<T> {
     if (value instanceof Utf8) {
       Utf8 utf8 = (Utf8) value;
       return Binary.fromReusedByteArray(utf8.getBytes(), 0, utf8.getByteLength());
+    } else if (value instanceof String) {
+      return Binary.fromString((String) value);
     } else if (value instanceof CharSequence) {
       return Binary.fromCharSequence((CharSequence) value);
     }
-    return Binary.fromCharSequence(value.toString());
+    return Binary.fromString(value.toString());
   }
 
   private static GenericData getDataModel(ParquetConfiguration conf, Schema schema) {
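
The ordering of the type checks in the patch can be illustrated with a minimal JDK-only sketch. It assumes the two conversions differ roughly as `String.getBytes(UTF_8)` versus a generic `CharsetEncoder` run over a `CharBuffer`; that is an assumption about the internals, not something the commit states. `Utf8DispatchSketch`, `toUtf8`, and `encodeGeneric` are hypothetical names and are not part of parquet-java.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in parquet-java.
final class Utf8DispatchSketch {

  // Mirrors the order of the instanceof checks in the patched method:
  // String must be tested before CharSequence, because every String is
  // also a CharSequence and would otherwise take the generic path.
  static byte[] toUtf8(Object value) {
    if (value instanceof String) {
      // Fast path (assumed): the JDK's built-in String-to-UTF-8 conversion.
      return ((String) value).getBytes(StandardCharsets.UTF_8);
    } else if (value instanceof CharSequence) {
      // Generic path for StringBuilder, CharBuffer, and similar types.
      return encodeGeneric((CharSequence) value);
    }
    // Fallback: toString() yields a String, so the fast path applies.
    return value.toString().getBytes(StandardCharsets.UTF_8);
  }

  // Encodes any CharSequence with a CharsetEncoder. Unlike String.getBytes,
  // a fresh encoder reports malformed input (e.g. a lone surrogate) instead
  // of replacing it, so the two paths agree only on well-formed text.
  static byte[] encodeGeneric(CharSequence chars) {
    try {
      ByteBuffer buf = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(chars));
      byte[] out = new byte[buf.remaining()];
      buf.get(out);
      return out;
    } catch (CharacterCodingException e) {
      throw new IllegalArgumentException("input is not well-formed UTF-16", e);
    }
  }

  public static void main(String[] args) {
    byte[] fast = toUtf8("h\u00e9llo");
    byte[] generic = toUtf8(new StringBuilder("h\u00e9llo"));
    System.out.println(java.util.Arrays.equals(fast, generic));
  }
}
```

Both paths produce the same bytes for well-formed input, so in this sketch the choice between them affects speed rather than output.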