dongjoon-hyun opened a new pull request, #2521:
URL: https://github.com/apache/orc/pull/2521

   ### What changes were proposed in this pull request?
   
   This PR aims to support Parquet LZ4 in the bench module.
   
   ### Why are the changes needed?
   
   To benchmark `LZ4` like the other codecs.
   
   ### How was this patch tested?
   
   Manually ran the following commands.
   
   **BUILD**
   ```
   $ cd java
   
   $ mvn package -DskipTests -Pbenchmark
   ```
   
   **WRITE**
   ```
   $ java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -d sales -c lz4 -f parquet
   Processing sales [parquet]
   [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.lz4]
   ```
   
   **FILE NAME**
   ```
   $ ls -alR data/generated/sales
   total 13396024
   drwxr-xr-x@ 4 dongjoon  staff         128 Feb  6 16:51 .
   drwxr-xr-x@ 3 dongjoon  staff          96 Feb  6 14:50 ..
   -rw-r--r--@ 1 dongjoon  staff  3768120878 Feb  6 16:53 parquet.lz4
   ```
   
   **READ**
   ```
   $ java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -d sales -c lz4 -f parquet
   ...
   [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 10 ms. row count = 374588
   data/generated/sales/parquet.lz4 rows: 25000000 batches: 24415
   ```
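
   As a quick sanity check on the scan output above, the reported batch count is consistent with reading 25000000 rows in batches of up to 1024 rows. The batch size of 1024 is an assumption here (it is not stated in this PR), so treat this as a rough sketch rather than a description of the benchmark internals.

   ```shell
   # Assumption: the scan reads in batches of up to 1024 rows (not stated in this PR).
   rows=25000000
   batch_size=1024
   # Ceiling division: number of batches needed to cover all rows.
   batches=$(( (rows + batch_size - 1) / batch_size ))
   echo "$batches"
   ```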
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: `Opus 4.5` on `Claude Code`

