[GitHub] [orc] dongjoon-hyun opened a new pull request #630: ORC-737: Upgrade Spark to 3.1.0

GitBox Sun, 17 Jan 2021 20:38:34 -0800


dongjoon-hyun opened a new pull request #630:
URL: https://github.com/apache/orc/pull/630



   ### What changes were proposed in this pull request?
   
   This PR aims to upgrade Spark from 2.4.6 to 3.1.0 in the benchmark module.
   Please note that we will upgrade to Spark 3.1.1 as soon as possible because 3.1.0 is not an official release.
   
   ### Why are the changes needed?
   
   To use the latest Spark version in the benchmark module.
   
   ### How was this patch tested?
   
   ```
   $ cd java/bench
   
   $ mvn package
   [INFO] ------------------------------------------------------------------------
   [INFO] Reactor Summary for ORC Benchmarks 1.7.0-SNAPSHOT:
   [INFO] 
   [INFO] ORC Benchmarks ..................................... SUCCESS [  0.891 s]
   [INFO] ORC Benchmarks Core ................................ SUCCESS [  5.977 s]
   [INFO] ORC Benchmarks Hive ................................ SUCCESS [  6.855 s]
   [INFO] ORC Benchmarks Spark ............................... SUCCESS [ 26.922 s]
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  40.809 s
   [INFO] Finished at: 2021-01-17T20:28:48-08:00
   [INFO] ------------------------------------------------------------------------
   
   $ java -jar spark/target/orc-benchmarks-spark-*.jar spark -i 1 -I 1 ~/data
   # JMH version: 1.20
   # VM version: JDK 1.8.0_275, VM 25.275-b01
   # VM invoker: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
   # VM options: -server -Xms256m -Xmx2g -Dbench.root.dir=/home/dongjoon/data
   # Warmup: 1 iterations, 10 s each
   # Measurement: 1 iterations, 10 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   # Benchmark: org.apache.orc.bench.spark.SparkBenchmark.fullRead
   # Parameters: (compression = none, dataset = taxi, format = orc)
   
   # Run progress: 0.00% complete, ETA 00:27:00
   # Fork: 1 of 1
   # Warmup Iteration   1: [WARN ] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   
   Records: 22773249
   Invocations: 1
   Reads: 81
   Bytes: 1333068794
   11158424.688 us/op
   Iteration   1: 
   Records: 45546498
   Invocations: 2
   Reads: 162
   Bytes: 2666137588
   9776462.312 us/op
                    bytesPerRecord: 58.537 #
                    perRecord:      0.429 us/op
                    reads:          81.000 #
                    records:        22773249.000 #
   ...
   ```
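
As a sanity check on the log above, the secondary metrics appear consistent with simple ratios of the printed counters. The following is a minimal sketch, assuming `bytesPerRecord` is bytes read divided by records read, and `perRecord` is the measured average time per operation divided by the records read in one invocation. The variable names are illustrative only and are not taken from the benchmark source.

```python
# Counters copied from the log above (one invocation of fullRead).
records = 22_773_249
bytes_read = 1_333_068_794

# Average time for measurement iteration 1, in microseconds per operation.
avg_time_us = 9_776_462.312

# Assumed definitions: plain ratios of the counters above.
bytes_per_record = bytes_read / records
us_per_record = avg_time_us / records

print(f"bytesPerRecord: {bytes_per_record:.3f}")
print(f"perRecord:      {us_per_record:.3f} us/op")
```

Under these assumptions the two ratios round to 58.537 and 0.429, matching the `bytesPerRecord` and `perRecord` lines in the log.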


