[spark] branch master updated: [SPARK-27043][SQL] Add ORC nested schema pruning benchmarks

dongjoon Tue, 05 Mar 2019 11:13:47 -0800

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 8385749  [SPARK-27043][SQL] Add ORC nested schema pruning benchmarks
8385749 is described below

commit 83857496e53520aa0fdf3978fcbcdd6c49c3ab5c
Author: Liang-Chi Hsieh <vii...@gmail.com>
AuthorDate: Tue Mar 5 11:12:57 2019 -0800

    [SPARK-27043][SQL] Add ORC nested schema pruning benchmarks
    
    ## What changes were proposed in this pull request?
    
    We have a benchmark for nested schema pruning, but only for Parquet. This adds a similar benchmark for ORC, for use with nested schema pruning of ORC.
    
    ## How was this patch tested?
    
    Added test.
    
    Closes #23955 from viirya/orc-nested-schema-pruning-benchmark.
    
    Authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 .../NestedSchemaPruningBenchmark-results.txt       | 40 ----------------
 .../OrcNestedSchemaPruningBenchmark-results.txt    | 40 ++++++++++++++++
 .../OrcV2NestedSchemaPruningBenchmark-results.txt  | 40 ++++++++++++++++
 ...ParquetNestedSchemaPruningBenchmark-results.txt | 40 ++++++++++++++++
 .../benchmark/NestedSchemaPruningBenchmark.scala   | 54 ++++++++++------------
 .../OrcNestedSchemaPruningBenchmark.scala          | 44 ++++++++++++++++++
 .../OrcV2NestedSchemaPruningBenchmark.scala        | 35 ++++++++++++++
 .../ParquetNestedSchemaPruningBenchmark.scala      | 35 ++++++++++++++
 8 files changed, 258 insertions(+), 70 deletions(-)
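
A minimal, Spark-free sketch of the refactoring summarized above: the former `object` becomes an abstract base class, and each data source supplies only its two names. The names `FormatBenchmarkSketch`, `OrcSketch`, `ParquetSketch`, `caseNames`, and `describe` are hypothetical stand-ins for illustration; the real classes extend `SqlBasedBenchmark` and run Spark queries.

```scala
// Hypothetical stand-in for the abstract NestedSchemaPruningBenchmark class.
abstract class FormatBenchmarkSketch {
  // Each concrete benchmark supplies these two values.
  val dataSourceName: String
  val benchmarkName: String

  protected val N = 1000000
  protected val numIters = 10

  // Stand-in for the shared select/limit/repartition/sort benchmark cases.
  protected def caseNames: Seq[String] =
    Seq("Selection", "Limiting", "Repartitioning", "Repartitioning by exprs", "Sorting")

  // Describes what would run; the real code reads and writes via format(dataSourceName).
  def describe(): Seq[String] =
    caseNames.map(c => s"$benchmarkName / $c: format=$dataSourceName, rows=$N, iters=$numIters")
}

object OrcSketch extends FormatBenchmarkSketch {
  override val dataSourceName: String = "orc"
  override val benchmarkName: String = "Nested Schema Pruning Benchmark For ORC v1"
}

object ParquetSketch extends FormatBenchmarkSketch {
  override val dataSourceName: String = "parquet"
  override val benchmarkName: String = "Nested Schema Pruning Benchmark For Parquet"
}

object SketchDemo {
  def main(args: Array[String]): Unit =
    (OrcSketch.describe() ++ ParquetSketch.describe()).foreach(println)
}
```

Under this layout, a further data source would only need one more small `object` overriding the two values.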

diff --git a/sql/core/benchmarks/NestedSchemaPruningBenchmark-results.txt b/sql/core/benchmarks/NestedSchemaPruningBenchmark-results.txt
deleted file mode 100644
index 7585cae..0000000
--- a/sql/core/benchmarks/NestedSchemaPruningBenchmark-results.txt
+++ /dev/null
@@ -1,40 +0,0 @@
-================================================================================================
-Nested Schema Pruning Benchmark
-================================================================================================
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_201-b09 on Mac OS X 10.14.3
-Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
-Selection:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Top-level column                                59 /   68         16.9          59.1       1.0X
-Nested column                                  180 /  186          5.6         179.7       0.3X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_201-b09 on Mac OS X 10.14.3
-Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
-Limiting:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Top-level column                               241 /  246          4.2         240.9       1.0X
-Nested column                                 1828 / 1904          0.5        1827.5       0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_201-b09 on Mac OS X 10.14.3
-Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
-Repartitioning:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Top-level column                               201 /  208          5.0         200.8       1.0X
-Nested column                                 1811 / 1864          0.6        1811.4       0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_201-b09 on Mac OS X 10.14.3
-Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
-Repartitioning by exprs:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Top-level column                               206 /  212          4.9         205.8       1.0X
-Nested column                                 1814 / 1863          0.6        1814.3       0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_201-b09 on Mac OS X 10.14.3
-Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
-Sorting:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------
-Top-level column                               282 /  302          3.5         281.7       1.0X
-Nested column                                 2093 / 2199          0.5        2093.1       0.1X
-
-
diff --git a/sql/core/benchmarks/OrcNestedSchemaPruningBenchmark-results.txt b/sql/core/benchmarks/OrcNestedSchemaPruningBenchmark-results.txt
new file mode 100644
index 0000000..f738256
--- /dev/null
+++ b/sql/core/benchmarks/OrcNestedSchemaPruningBenchmark-results.txt
@@ -0,0 +1,40 @@
+================================================================================================
+Nested Schema Pruning Benchmark For ORC v1
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Selection:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    113            196          89          8.8         113.0       1.0X
+Nested column                                      1316           1639         240          0.8        1315.5       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Limiting:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    260            474         211          3.8         260.4       1.0X
+Nested column                                      2322           3312         701          0.4        2322.3       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Repartitioning:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    275            318          55          3.6         274.8       1.0X
+Nested column                                      2482           3263         759          0.4        2482.2       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Repartitioning by exprs:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    274            288          11          3.7         273.9       1.0X
+Nested column                                      2783           2905          86          0.4        2782.7       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Sorting:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    382            419          23          2.6         382.4       1.0X
+Nested column                                      2974           3517         699          0.3        2974.1       0.1X
+
+
diff --git a/sql/core/benchmarks/OrcV2NestedSchemaPruningBenchmark-results.txt b/sql/core/benchmarks/OrcV2NestedSchemaPruningBenchmark-results.txt
new file mode 100644
index 0000000..ad43ffb
--- /dev/null
+++ b/sql/core/benchmarks/OrcV2NestedSchemaPruningBenchmark-results.txt
@@ -0,0 +1,40 @@
+================================================================================================
+Nested Schema Pruning Benchmark For ORC v2
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Selection:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                     91            102           9         11.0          91.2       1.0X
+Nested column                                      1459           1548          80          0.7        1458.5       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Limiting:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    101            112          10          9.9         100.7       1.0X
+Nested column                                      1459           1619         109          0.7        1458.9       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Repartitioning:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    268            284          12          3.7         268.2       1.0X
+Nested column                                      2781           2865          73          0.4        2780.8       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Repartitioning by exprs:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    309            318           6          3.2         309.2       1.0X
+Nested column                                      2426           2891         253          0.4        2425.8       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Sorting:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    179            194           8          5.6         179.3       1.0X
+Nested column                                      2084           2277         243          0.5        2083.7       0.1X
+
+
diff --git a/sql/core/benchmarks/ParquetNestedSchemaPruningBenchmark-results.txt b/sql/core/benchmarks/ParquetNestedSchemaPruningBenchmark-results.txt
new file mode 100644
index 0000000..d51ebc6
--- /dev/null
+++ b/sql/core/benchmarks/ParquetNestedSchemaPruningBenchmark-results.txt
@@ -0,0 +1,40 @@
+================================================================================================
+Nested Schema Pruning Benchmark For Parquet
+================================================================================================
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Selection:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                     88            114          16         11.4          87.5       1.0X
+Nested column                                       201            223          27          5.0         200.5       0.4X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Limiting:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    263            315          36          3.8         263.2       1.0X
+Nested column                                      2111           2622         613          0.5        2111.1       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Repartitioning:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    222            250          34          4.5         222.2       1.0X
+Nested column                                      2084           2339         266          0.5        2084.2       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Repartitioning by exprs:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    238            306          96          4.2         238.1       1.0X
+Nested column                                      2080           2373         218          0.5        2079.5       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
+Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
+Sorting:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+Top-level column                                    328            383          57          3.1         327.6       1.0X
+Nested column                                      2595           3136         638          0.4        2595.1       0.1X
+
+
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/NestedSchemaPruningBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/NestedSchemaPruningBenchmark.scala
index ddfc8ae..e852de1 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/NestedSchemaPruningBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/NestedSchemaPruningBenchmark.scala
@@ -21,23 +21,17 @@ import org.apache.spark.benchmark.Benchmark
 import org.apache.spark.sql.internal.SQLConf
 
 /**
- * Synthetic benchmark for nested schema pruning performance.
- * To run this benchmark:
- * {{{
- *   1. without sbt:
- *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
- *   2. build/sbt "sql/test:runMain <this class>"
- *   3. generate result:
- *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
- *      Results will be written to "benchmarks/NestedSchemaPruningBenchmark-results.txt".
- * }}}
+ * The base class for synthetic benchmark for nested schema pruning performance.
  */
-object NestedSchemaPruningBenchmark extends SqlBasedBenchmark {
+abstract class NestedSchemaPruningBenchmark extends SqlBasedBenchmark {
 
   import spark.implicits._
 
-  private val N = 1000000
-  private val numIters = 10
+  val dataSourceName: String
+  val benchmarkName: String
+
+  protected val N = 1000000
+  protected val numIters = 10
 
   // We use `col1 BIGINT, col2 STRUCT<_1: BIGINT, _2: STRING>` as a test schema.
   // col1 and col2._1 is used for comparision. col2._2 mimics the burden for the other columns
@@ -53,13 +47,13 @@ object NestedSchemaPruningBenchmark extends SqlBasedBenchmark {
     }
   }
 
-  private def selectBenchmark(numRows: Int, numIters: Int): Unit = {
+  protected def selectBenchmark(numRows: Int, numIters: Int): Unit = {
     withTempPath { dir =>
       val path = dir.getCanonicalPath
 
       Seq(1, 2).foreach { i =>
-        df.write.parquet(path + s"/$i")
-        spark.read.parquet(path + s"/$i").createOrReplaceTempView(s"t$i")
+        df.write.format(dataSourceName).save(path + s"/$i")
+        spark.read.format(dataSourceName).load(path + s"/$i").createOrReplaceTempView(s"t$i")
       }
 
       val benchmark = new Benchmark(s"Selection", numRows, numIters, output = output)
@@ -71,13 +65,13 @@ object NestedSchemaPruningBenchmark extends SqlBasedBenchmark {
     }
   }
 
-  private def limitBenchmark(numRows: Int, numIters: Int): Unit = {
+  protected def limitBenchmark(numRows: Int, numIters: Int): Unit = {
     withTempPath { dir =>
       val path = dir.getCanonicalPath
 
       Seq(1, 2).foreach { i =>
-        df.write.parquet(path + s"/$i")
-        spark.read.parquet(path + s"/$i").createOrReplaceTempView(s"t$i")
+        df.write.format(dataSourceName).save(path + s"/$i")
+        spark.read.format(dataSourceName).load(path + s"/$i").createOrReplaceTempView(s"t$i")
       }
 
       val benchmark = new Benchmark(s"Limiting", numRows, numIters, output = output)
@@ -91,13 +85,13 @@ object NestedSchemaPruningBenchmark extends SqlBasedBenchmark {
     }
   }
 
-  private def repartitionBenchmark(numRows: Int, numIters: Int): Unit = {
+  protected def repartitionBenchmark(numRows: Int, numIters: Int): Unit = {
     withTempPath { dir =>
       val path = dir.getCanonicalPath
 
       Seq(1, 2).foreach { i =>
-        df.write.parquet(path + s"/$i")
-        spark.read.parquet(path + s"/$i").createOrReplaceTempView(s"t$i")
+        df.write.format(dataSourceName).save(path + s"/$i")
+        spark.read.format(dataSourceName).load(path + s"/$i").createOrReplaceTempView(s"t$i")
       }
 
       val benchmark = new Benchmark(s"Repartitioning", numRows, numIters, output = output)
@@ -111,13 +105,13 @@ object NestedSchemaPruningBenchmark extends SqlBasedBenchmark {
     }
   }
 
-  private def repartitionByExprBenchmark(numRows: Int, numIters: Int): Unit = {
+  protected def repartitionByExprBenchmark(numRows: Int, numIters: Int): Unit = {
     withTempPath { dir =>
       val path = dir.getCanonicalPath
 
       Seq(1, 2).foreach { i =>
-        df.write.parquet(path + s"/$i")
-        spark.read.parquet(path + s"/$i").createOrReplaceTempView(s"t$i")
+        df.write.format(dataSourceName).save(path + s"/$i")
+        spark.read.format(dataSourceName).load(path + s"/$i").createOrReplaceTempView(s"t$i")
       }
 
       val benchmark = new Benchmark(s"Repartitioning by exprs", numRows, numIters, output = output)
@@ -131,13 +125,13 @@ object NestedSchemaPruningBenchmark extends SqlBasedBenchmark {
     }
   }
 
-  private def sortBenchmark(numRows: Int, numIters: Int): Unit = {
+  protected def sortBenchmark(numRows: Int, numIters: Int): Unit = {
     withTempPath { dir =>
       val path = dir.getCanonicalPath
 
       Seq(1, 2).foreach { i =>
-        df.write.parquet(path + s"/$i")
-        spark.read.parquet(path + s"/$i").createOrReplaceTempView(s"t$i")
+        df.write.format(dataSourceName).save(path + s"/$i")
+        spark.read.format(dataSourceName).load(path + s"/$i").createOrReplaceTempView(s"t$i")
       }
 
       val benchmark = new Benchmark(s"Sorting", numRows, numIters, output = output)
@@ -150,8 +144,8 @@ object NestedSchemaPruningBenchmark extends SqlBasedBenchmark {
   }
 
   override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
-    runBenchmark(s"Nested Schema Pruning Benchmark") {
-      withSQLConf (SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.key -> "true") {
+    runBenchmark(benchmarkName) {
+      withSQLConf(SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.key -> "true") {
         selectBenchmark (N, numIters)
         limitBenchmark (N, numIters)
         repartitionBenchmark (N, numIters)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/OrcNestedSchemaPruningBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/OrcNestedSchemaPruningBenchmark.scala
new file mode 100644
index 0000000..947fc67
--- /dev/null
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/OrcNestedSchemaPruningBenchmark.scala
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Synthetic benchmark for nested schema pruning performance for ORC V1 datasource.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/OrcNestedSchemaPruningBenchmark-results.txt".
+ * }}}
+ */
+object OrcNestedSchemaPruningBenchmark extends NestedSchemaPruningBenchmark {
+  override val dataSourceName: String = "orc"
+  override val benchmarkName: String = "Nested Schema Pruning Benchmark For ORC v1"
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withSQLConf(SQLConf.USE_V1_SOURCE_READER_LIST.key -> "orc",
+        SQLConf.USE_V1_SOURCE_WRITER_LIST.key -> "orc") {
+      super.runBenchmarkSuite(mainArgs)
+    }
+  }
+}
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/OrcV2NestedSchemaPruningBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/OrcV2NestedSchemaPruningBenchmark.scala
new file mode 100644
index 0000000..e735d1c
--- /dev/null
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/OrcV2NestedSchemaPruningBenchmark.scala
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+/**
+ * Synthetic benchmark for nested schema pruning performance for ORC V2 datasource.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/OrcV2NestedSchemaPruningBenchmark-results.txt".
+ * }}}
+ */
+object OrcV2NestedSchemaPruningBenchmark extends NestedSchemaPruningBenchmark {
+  override val dataSourceName: String = "orc"
+  override val benchmarkName: String = "Nested Schema Pruning Benchmark For ORC v2"
+}
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/ParquetNestedSchemaPruningBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/ParquetNestedSchemaPruningBenchmark.scala
new file mode 100644
index 0000000..1c9cc2c
--- /dev/null
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/ParquetNestedSchemaPruningBenchmark.scala
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+/**
+ * Synthetic benchmark for nested schema pruning performance for Parquet datasource.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/ParquetNestedSchemaPruningBenchmark-results.txt".
+ * }}}
+ */
+object ParquetNestedSchemaPruningBenchmark extends NestedSchemaPruningBenchmark {
+    override val dataSourceName: String = "parquet"
+    override val benchmarkName: String = "Nested Schema Pruning Benchmark For Parquet"
+}
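
The Rate(M/s), Per Row(ns), and Relative columns in the results files above appear to follow from the Best Time column and the row count N = 1000000. The sketch below reproduces that arithmetic for the ORC v1 Selection table. The formulas are an assumption inferred from the reported numbers, not taken from Spark's `Benchmark` class, and `BenchmarkColumns` is a hypothetical helper.

```scala
// Hypothetical helper; formulas are inferred from the reported tables.
object BenchmarkColumns {
  // Millions of rows per second, from the best time in milliseconds.
  def rateMillionsPerSec(numRows: Long, bestMs: Double): Double =
    numRows / (bestMs * 1000.0)

  // Nanoseconds spent per row at the best time.
  def perRowNs(numRows: Long, bestMs: Double): Double =
    bestMs * 1e6 / numRows

  // Speed relative to the first (baseline) case in the same table.
  def relative(baselineBestMs: Double, bestMs: Double): Double =
    baselineBestMs / bestMs
}

object BenchmarkColumnsDemo {
  def main(args: Array[String]): Unit = {
    val n = 1000000L
    val topLevelBestMs = 113.0  // ORC v1 Selection, top-level column
    val nestedBestMs = 1315.5   // ORC v1 Selection, nested column

    // The table reports 8.8 M/s and 113.0 ns for the top-level column.
    println(f"${BenchmarkColumns.rateMillionsPerSec(n, topLevelBestMs)}%.2f M/s")
    println(f"${BenchmarkColumns.perRowNs(n, topLevelBestMs)}%.1f ns per row")
    // The table reports 0.1X for the nested column.
    println(f"${BenchmarkColumns.relative(topLevelBestMs, nestedBestMs)}%.2fX")
  }
}
```

Read this way, a Relative value of 0.1X means the nested-column query took roughly ten times as long as the top-level one in that table.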


