This is an automated email from the ASF dual-hosted git repository.

gangwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-java.git


The following commit(s) were added to refs/heads/master by this push:
     new 3ac860e11 GH-2994: Optimize string to binary conversion in AvroWriteSupport (#2995)
3ac860e11 is described below

commit 3ac860e1145c0fba1cf3b902c943f1703dd9db52
Author: sschepens <sebastian.schep...@mercadolibre.com>
AuthorDate: Wed Aug 28 11:58:54 2024 -0300

    GH-2994: Optimize string to binary conversion in AvroWriteSupport (#2995)
    
    `Binary.fromCharSequence` is an order of magnitude slower than `Binary.fromString` when input is a `String`:
    
    ```
    Benchmarks.fromCharSequence  thrpt   25   5885347.328 ±  186669.738  ops/s
    Benchmarks.fromString        thrpt   25  71335979.492 ± 8800704.044  ops/s
    ```
    
    Here is the code for the benchmarks:
    ```java
    public class Benchmarks {
        private static final String string = RandomStringUtils.randomAlphanumeric(100);
    
        @Benchmark
        @BenchmarkMode(Mode.Throughput)
        public void fromCharSequence(Blackhole blackhole) {
            blackhole.consume(Binary.fromCharSequence(string));
        }
    
        @Benchmark
        @BenchmarkMode(Mode.Throughput)
        public void fromString(Blackhole blackhole) {
            blackhole.consume(Binary.fromString(string));
        }
    }
    ```
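
A minimal JDK-only sketch of why a `String`-specific path can be cheaper than a generic `CharSequence` path. It assumes, without confirming against the Parquet sources, that the two factory methods differ roughly the way these two helpers do: one uses the JDK's built-in `String.getBytes`, the other drives a `CharsetEncoder` over a `CharBuffer`. The class and method names (`Utf8EncodingPaths`, `encodeString`, `encodeCharSequence`) are hypothetical and are not part of parquet-java.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8EncodingPaths {
    // String-specific path: the JDK's own String-to-UTF-8 conversion.
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Generic path: works for any CharSequence by going through a CharsetEncoder.
    static byte[] encodeCharSequence(CharSequence cs) {
        try {
            ByteBuffer buf = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(cs));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Unlike String.getBytes, a fresh encoder reports malformed input.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = "parquet-avro";
        // Both paths yield the same bytes for well-formed input; only the cost differs.
        System.out.println(Arrays.equals(encodeString(s), encodeCharSequence(s)));
    }
}
```

Both helpers produce identical bytes for well-formed text, so choosing between them is purely a performance decision.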
---
 .../src/main/java/org/apache/parquet/avro/AvroWriteSupport.java       | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java b/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java
index 846fb8bab..53fc3d59c 100644
--- a/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java
+++ b/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java
@@ -403,10 +403,12 @@ public class AvroWriteSupport<T> extends WriteSupport<T> {
     if (value instanceof Utf8) {
       Utf8 utf8 = (Utf8) value;
       return Binary.fromReusedByteArray(utf8.getBytes(), 0, utf8.getByteLength());
+    } else if (value instanceof String) {
+      return Binary.fromString((String) value);
     } else if (value instanceof CharSequence) {
       return Binary.fromCharSequence((CharSequence) value);
     }
-    return Binary.fromCharSequence(value.toString());
+    return Binary.fromString(value.toString());
   }
 
   private static GenericData getDataModel(ParquetConfiguration conf, Schema schema) {
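
The branch order in the hunk above matters because `String` implements `CharSequence`: the `String` check has to come first or it would never be reached. A small sketch of that dispatch, using only the JDK. The `Utf8` branch is omitted to avoid an Avro dependency, and `DispatchOrder` and `pathFor` are hypothetical names that only report which branch a value would take.

```java
public class DispatchOrder {
    // Hypothetical helper: names the conversion branch a value would reach.
    static String pathFor(Object value) {
        if (value instanceof String) {
            // Must precede the CharSequence check, since String is a CharSequence.
            return "fromString";
        } else if (value instanceof CharSequence) {
            return "fromCharSequence";
        }
        // Fallback: value.toString() is always a String, so the String path applies.
        return "fromString(toString())";
    }

    public static void main(String[] args) {
        System.out.println(pathFor("abc"));
        System.out.println(pathFor(new StringBuilder("abc")));
        System.out.println(pathFor(42));
    }
}
```

A `StringBuilder` or other non-`String` `CharSequence` still takes the generic path, so only `String` values and the `toString()` fallback change behavior.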
