dongjoon-hyun opened a new pull request, #2521: URL: https://github.com/apache/orc/pull/2521
### What changes were proposed in this pull request?

This PR aims to support Parquet LZ4 in bench module.

### Why are the changes needed?

To benchmark `LZ4` like the other codecs.

### How was this patch tested?

Manually run the following.

**BUILD**
```
$ cd java
$ mvn package -DskipTests -Pbenchmark
```

**WRITE**
```
$ java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -d sales -c lz4 -f parquet
Processing sales [parquet]
[main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.lz4]
```

**FILE NAME**
```
$ ls -alR data/generated/sales
total 13396024
drwxr-xr-x@ 4 dongjoon staff 128 Feb 6 16:51 .
drwxr-xr-x@ 3 dongjoon staff 96 Feb 6 14:50 ..
-rw-r--r--@ 1 dongjoon staff 3768120878 Feb 6 16:53 parquet.lz4
```

**READ**
```
$ java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -d sales -c lz4 -f parquet
...
[main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 10 ms. row count = 374588
data/generated/sales/parquet.lz4 rows: 25000000 batches: 24415
```

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: `Opus 4.5` on `Claude Code`

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
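
A quick sanity check on the READ and FILE NAME output in the PR description: the reported `batches: 24415` is what 25000000 rows would produce if each batch held up to 1024 rows. The 1024-row batch size is an assumption for illustration and is not stated in the PR. The row, batch, and byte counts are copied from the output above. A minimal sketch using shell integer arithmetic:

```shell
# Figures copied from the PR's READ and FILE NAME output.
rows=25000000
bytes=3768120878
# Assumption: up to 1024 rows per batch (not stated in the PR).
batch_size=1024

# Ceiling division: number of batches needed to hold all rows.
batches=$(( (rows + batch_size - 1) / batch_size ))
# Bytes per row for parquet.lz4, truncated to an integer.
bytes_per_row=$(( bytes / rows ))

echo "batches=$batches bytes_per_row=$bytes_per_row"
```

Under that assumption the computed batch count agrees with the 24415 reported by the scan, and the generated file works out to roughly 150 bytes per row.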
