Repository: spark
Updated Branches:
  refs/heads/master bd32b509a -> 1c9acc243


[SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBenchmark benchmark results

## What changes were proposed in this pull request?
This PR corrects the default configuration (`spark.master=local[1]`) for the
benchmarks. It also updates the performance results, measured on an AWS `r3.xlarge` instance.
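
The difference between the two `SparkConf` setters involved in this change can be sketched as follows. This is a minimal illustration, not the benchmark code itself: the object name and the pre-existing `local[*]` value are assumptions chosen for the example, and it needs `spark-core` on the classpath.

```scala
import org.apache.spark.SparkConf

object MasterOverrideSketch {
  def main(args: Array[String]): Unit = {
    // loadDefaults = false: ignore spark.* system properties so the output is deterministic.
    // "local[*]" stands in for a value that was already present (an assumption for illustration).
    val preset = new SparkConf(false).set("spark.master", "local[*]")

    // setIfMissing leaves an existing key untouched.
    val kept = preset.clone.setIfMissing("spark.master", "local[1]")
    // set replaces the value unconditionally.
    val forced = preset.clone.set("spark.master", "local[1]")

    println(kept.get("spark.master"))   // local[*]
    println(forced.get("spark.master")) // local[1]
  }
}
```

If `spark.master` is already populated by the time the benchmark builds its configuration, `setIfMissing` has no effect, which is presumably why the patch switches to `set` to pin the run to a single local thread.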

## How was this patch tested?
N/A

Author: Takeshi Yamamuro <yamam...@apache.org>

Closes #21625 from maropu/FixDataSourceReadBenchmark.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1c9acc24
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1c9acc24
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1c9acc24

Branch: refs/heads/master
Commit: 1c9acc2438f9a97134ae5213a12112b2361fbb78
Parents: bd32b50
Author: Takeshi Yamamuro <yamam...@apache.org>
Authored: Thu Jun 28 09:21:10 2018 +0800
Committer: hyukjinkwon <gurwls...@apache.org>
Committed: Thu Jun 28 09:21:10 2018 +0800

----------------------------------------------------------------------
 .../benchmark/DataSourceReadBenchmark.scala     | 296 ++++++++++---------
 1 file changed, 152 insertions(+), 144 deletions(-)
----------------------------------------------------------------------
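
For reading the result tables in the diff, the sketch below shows one plausible way the `Rate(M/s)`, `Per Row(ns)` and `Relative` columns relate to the best time. The helper names are hypothetical, and the row count of 15 * 1024 * 1024 is an assumption back-calculated from the printed numbers, not something stated in this commit.

```scala
object BenchmarkColumnsSketch {
  // Hypothetical container for the derived columns.
  final case class Derived(rateMPerSec: Double, perRowNs: Double, relative: Double)

  // bestMs: best wall-clock time of this case; baselineBestMs: best time of the first case.
  def derive(bestMs: Double, baselineBestMs: Double, rows: Long): Derived = {
    val perRowNs = bestMs * 1e6 / rows        // milliseconds -> nanoseconds, per row
    val rateMPerSec = rows / (bestMs * 1e3)   // rows per microsecond == million rows per second
    Derived(rateMPerSec, perRowNs, baselineBestMs / bestMs)
  }

  def main(args: Array[String]): Unit = {
    val rows = 15L * 1024 * 1024 // assumed row count
    val csv = derive(22964, 22964, rows)
    val parquet = derive(164, 22964, rows)
    println(f"CSV:     ${csv.rateMPerSec}%.1f M/s, ${csv.perRowNs}%.1f ns/row, ${csv.relative}%.1fX")
    println(f"Parquet: ${parquet.rateMPerSec}%.1f M/s, ${parquet.perRowNs}%.1f ns/row, ${parquet.relative}%.1fX")
  }
}
```

Values recomputed this way can differ from the table in the last digit, presumably because the table prints rounded milliseconds while the ratios are taken from unrounded timings.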


http://git-wip-us.apache.org/repos/asf/spark/blob/1c9acc24/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
index fc6d8ab..8711f5a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
@@ -39,9 +39,11 @@ import org.apache.spark.util.{Benchmark, Utils}
 object DataSourceReadBenchmark {
   val conf = new SparkConf()
     .setAppName("DataSourceReadBenchmark")
-    .setIfMissing("spark.master", "local[1]")
+    // Since `spark.master` always exists, overrides this value
+    .set("spark.master", "local[1]")
     .setIfMissing("spark.driver.memory", "3g")
     .setIfMissing("spark.executor.memory", "3g")
+    .setIfMissing("spark.ui.enabled", "false")
 
   val spark = SparkSession.builder.config(conf).getOrCreate()
 
@@ -154,73 +156,73 @@ object DataSourceReadBenchmark {
           }
         }
 
-
         /*
-        Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
+        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
+        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
         SQL Single TINYINT Column Scan:      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                 15231 / 15267          1.0         968.3       1.0X
-        SQL Json                                  8476 / 8498          1.9         538.9       1.8X
-        SQL Parquet Vectorized                     121 /  127        130.0           7.7     125.9X
-        SQL Parquet MR                            1515 / 1543         10.4          96.3      10.1X
-        SQL ORC Vectorized                         164 /  171         95.9          10.4      92.9X
-        SQL ORC Vectorized with copy               228 /  234         69.0          14.5      66.8X
-        SQL ORC MR                                1297 / 1309         12.1          82.5      11.7X
+        SQL CSV                                 22964 / 23096          0.7        1460.0       1.0X
+        SQL Json                                  8469 / 8593          1.9         538.4       2.7X
+        SQL Parquet Vectorized                     164 /  177         95.8          10.4     139.9X
+        SQL Parquet MR                            1687 / 1706          9.3         107.2      13.6X
+        SQL ORC Vectorized                         191 /  197         82.3          12.2     120.2X
+        SQL ORC Vectorized with copy               215 /  219         73.2          13.7     106.9X
+        SQL ORC MR                                1392 / 1412         11.3          88.5      16.5X
 
 
         SQL Single SMALLINT Column Scan:     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                 16344 / 16374          1.0        1039.1       1.0X
-        SQL Json                                  8634 / 8648          1.8         548.9       1.9X
-        SQL Parquet Vectorized                     172 /  177         91.5          10.9      95.1X
-        SQL Parquet MR                            1744 / 1746          9.0         110.9       9.4X
-        SQL ORC Vectorized                         189 /  194         83.1          12.0      86.4X
-        SQL ORC Vectorized with copy               244 /  250         64.5          15.5      67.0X
-        SQL ORC MR                                1341 / 1386         11.7          85.3      12.2X
+        SQL CSV                                 24090 / 24097          0.7        1531.6       1.0X
+        SQL Json                                  8791 / 8813          1.8         558.9       2.7X
+        SQL Parquet Vectorized                     204 /  212         77.0          13.0     117.9X
+        SQL Parquet MR                            1813 / 1850          8.7         115.3      13.3X
+        SQL ORC Vectorized                         226 /  230         69.7          14.4     106.7X
+        SQL ORC Vectorized with copy               295 /  298         53.3          18.8      81.6X
+        SQL ORC MR                                1526 / 1549         10.3          97.1      15.8X
 
 
         SQL Single INT Column Scan:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                 17874 / 17875          0.9        1136.4       1.0X
-        SQL Json                                  9190 / 9204          1.7         584.3       1.9X
-        SQL Parquet Vectorized                     141 /  160        111.2           9.0     126.4X
-        SQL Parquet MR                            1930 / 2049          8.2         122.7       9.3X
-        SQL ORC Vectorized                         259 /  264         60.7          16.5      69.0X
-        SQL ORC Vectorized with copy               265 /  272         59.4          16.8      67.5X
-        SQL ORC MR                                1528 / 1569         10.3          97.2      11.7X
+        SQL CSV                                 25637 / 25791          0.6        1629.9       1.0X
+        SQL Json                                  9532 / 9570          1.7         606.0       2.7X
+        SQL Parquet Vectorized                     181 /  191         86.8          11.5     141.5X
+        SQL Parquet MR                            2210 / 2227          7.1         140.5      11.6X
+        SQL ORC Vectorized                         309 /  317         50.9          19.6      83.0X
+        SQL ORC Vectorized with copy               316 /  322         49.8          20.1      81.2X
+        SQL ORC MR                                1650 / 1680          9.5         104.9      15.5X
 
 
         SQL Single BIGINT Column Scan:       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                 22812 / 22839          0.7        1450.4       1.0X
-        SQL Json                                12026 / 12054          1.3         764.6       1.9X
-        SQL Parquet Vectorized                     222 /  227         70.8          14.1     102.6X
-        SQL Parquet MR                            2199 / 2204          7.2         139.8      10.4X
-        SQL ORC Vectorized                         331 /  335         47.6          21.0      69.0X
-        SQL ORC Vectorized with copy               338 /  343         46.6          21.5      67.6X
-        SQL ORC MR                                1618 / 1622          9.7         102.9      14.1X
+        SQL CSV                                 31617 / 31764          0.5        2010.1       1.0X
+        SQL Json                                12440 / 12451          1.3         790.9       2.5X
+        SQL Parquet Vectorized                     284 /  315         55.4          18.0     111.4X
+        SQL Parquet MR                            2382 / 2390          6.6         151.5      13.3X
+        SQL ORC Vectorized                         398 /  403         39.5          25.3      79.5X
+        SQL ORC Vectorized with copy               410 /  413         38.3          26.1      77.1X
+        SQL ORC MR                                1783 / 1813          8.8         113.4      17.7X
 
 
         SQL Single FLOAT Column Scan:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                 18703 / 18740          0.8        1189.1       1.0X
-        SQL Json                                11779 / 11869          1.3         748.9       1.6X
-        SQL Parquet Vectorized                     143 /  145        110.1           9.1     130.9X
-        SQL Parquet MR                            1954 / 1963          8.0         124.2       9.6X
-        SQL ORC Vectorized                         347 /  355         45.3          22.1      53.8X
-        SQL ORC Vectorized with copy               356 /  359         44.1          22.7      52.5X
-        SQL ORC MR                                1570 / 1598         10.0          99.8      11.9X
+        SQL CSV                                 26679 / 26742          0.6        1696.2       1.0X
+        SQL Json                                12490 / 12541          1.3         794.1       2.1X
+        SQL Parquet Vectorized                     174 /  183         90.4          11.1     153.3X
+        SQL Parquet MR                            2201 / 2223          7.1         140.0      12.1X
+        SQL ORC Vectorized                         415 /  429         37.9          26.4      64.3X
+        SQL ORC Vectorized with copy               422 /  428         37.2          26.9      63.2X
+        SQL ORC MR                                1767 / 1773          8.9         112.3      15.1X
 
 
         SQL Single DOUBLE Column Scan:       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                 23832 / 23838          0.7        1515.2       1.0X
-        SQL Json                                16204 / 16226          1.0        1030.2       1.5X
-        SQL Parquet Vectorized                     242 /  306         65.1          15.4      98.6X
-        SQL Parquet MR                            2462 / 2482          6.4         156.5       9.7X
-        SQL ORC Vectorized                         419 /  451         37.6          26.6      56.9X
-        SQL ORC Vectorized with copy               426 /  447         36.9          27.1      55.9X
-        SQL ORC MR                                1885 / 1931          8.3         119.8      12.6X
+        SQL CSV                                 34223 / 34324          0.5        2175.8       1.0X
+        SQL Json                                17784 / 17785          0.9        1130.7       1.9X
+        SQL Parquet Vectorized                     277 /  283         56.7          17.6     123.4X
+        SQL Parquet MR                            2356 / 2386          6.7         149.8      14.5X
+        SQL ORC Vectorized                         533 /  536         29.5          33.9      64.2X
+        SQL ORC Vectorized with copy               541 /  546         29.1          34.4      63.3X
+        SQL ORC MR                                2166 / 2177          7.3         137.7      15.8X
         */
         sqlBenchmark.run()
 
@@ -294,41 +296,42 @@ object DataSourceReadBenchmark {
         }
 
         /*
-        Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
+        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
+        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
         Single TINYINT Column Scan:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        ParquetReader Vectorized                   187 /  201         84.2          11.9       1.0X
-        ParquetReader Vectorized -> Row            101 /  103        156.4           6.4       1.9X
+        ParquetReader Vectorized                   198 /  202         79.4          12.6       1.0X
+        ParquetReader Vectorized -> Row            119 /  121        132.3           7.6       1.7X
 
 
         Single SMALLINT Column Scan:         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        ParquetReader Vectorized                   272 /  288         57.8          17.3       1.0X
-        ParquetReader Vectorized -> Row            213 /  219         73.7          13.6       1.3X
+        ParquetReader Vectorized                   282 /  287         55.8          17.9       1.0X
+        ParquetReader Vectorized -> Row            246 /  247         64.0          15.6       1.1X
 
 
         Single INT Column Scan:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        ParquetReader Vectorized                   252 /  288         62.5          16.0       1.0X
-        ParquetReader Vectorized -> Row            232 /  246         67.7          14.8       1.1X
+        ParquetReader Vectorized                   258 /  262         60.9          16.4       1.0X
+        ParquetReader Vectorized -> Row            259 /  260         60.8          16.5       1.0X
 
 
         Single BIGINT Column Scan:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        ParquetReader Vectorized                   415 /  454         37.9          26.4       1.0X
-        ParquetReader Vectorized -> Row            407 /  432         38.6          25.9       1.0X
+        ParquetReader Vectorized                   361 /  369         43.6          23.0       1.0X
+        ParquetReader Vectorized -> Row            361 /  371         43.6          22.9       1.0X
 
 
         Single FLOAT Column Scan:            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        ParquetReader Vectorized                   251 /  302         62.7          16.0       1.0X
-        ParquetReader Vectorized -> Row            220 /  234         71.5          14.0       1.1X
+        ParquetReader Vectorized                   253 /  261         62.2          16.1       1.0X
+        ParquetReader Vectorized -> Row            254 /  256         61.9          16.2       1.0X
 
 
         Single DOUBLE Column Scan:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        ParquetReader Vectorized                   432 /  436         36.4          27.5       1.0X
-        ParquetReader Vectorized -> Row            414 /  422         38.0          26.4       1.0X
+        ParquetReader Vectorized                   357 /  364         44.0          22.7       1.0X
+        ParquetReader Vectorized -> Row            358 /  366         44.0          22.7       1.0X
         */
         parquetReaderBenchmark.run()
       }
@@ -382,16 +385,17 @@ object DataSourceReadBenchmark {
         }
 
         /*
-        Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
+        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
+        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
         Int and String Scan:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                 19172 / 19173          0.5        1828.4       1.0X
-        SQL Json                                12799 / 12873          0.8        1220.6       1.5X
-        SQL Parquet Vectorized                    2558 / 2564          4.1         244.0       7.5X
-        SQL Parquet MR                            4514 / 4583          2.3         430.4       4.2X
-        SQL ORC Vectorized                        2561 / 2697          4.1         244.3       7.5X
-        SQL ORC Vectorized with copy              3076 / 3110          3.4         293.4       6.2X
-        SQL ORC MR                                4197 / 4283          2.5         400.2       4.6X
+        SQL CSV                                 27145 / 27158          0.4        2588.7       1.0X
+        SQL Json                                12969 / 13337          0.8        1236.8       2.1X
+        SQL Parquet Vectorized                    2419 / 2448          4.3         230.7      11.2X
+        SQL Parquet MR                            4631 / 4633          2.3         441.7       5.9X
+        SQL ORC Vectorized                        2412 / 2465          4.3         230.0      11.3X
+        SQL ORC Vectorized with copy              2633 / 2675          4.0         251.1      10.3X
+        SQL ORC MR                                4280 / 4350          2.4         408.2       6.3X
         */
         benchmark.run()
       }
@@ -445,16 +449,17 @@ object DataSourceReadBenchmark {
         }
 
         /*
-        Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
+        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
+        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
         Repeated String:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                 10889 / 10924          1.0        1038.5       1.0X
-        SQL Json                                  7903 / 7931          1.3         753.7       1.4X
-        SQL Parquet Vectorized                     777 /  799         13.5          74.1      14.0X
-        SQL Parquet MR                            1682 / 1708          6.2         160.4       6.5X
-        SQL ORC Vectorized                         532 /  534         19.7          50.7      20.5X
-        SQL ORC Vectorized with copy               742 /  743         14.1          70.7      14.7X
-        SQL ORC MR                                1996 / 2002          5.3         190.4       5.5X
+        SQL CSV                                 17345 / 17424          0.6        1654.1       1.0X
+        SQL Json                                  8639 / 8664          1.2         823.9       2.0X
+        SQL Parquet Vectorized                     839 /  854         12.5          80.0      20.7X
+        SQL Parquet MR                            1771 / 1775          5.9         168.9       9.8X
+        SQL ORC Vectorized                         550 /  569         19.1          52.4      31.6X
+        SQL ORC Vectorized with copy               785 /  849         13.4          74.9      22.1X
+        SQL ORC MR                                2168 / 2202          4.8         206.7       8.0X
         */
         benchmark.run()
       }
@@ -574,30 +579,31 @@ object DataSourceReadBenchmark {
         }
 
         /*
-        Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
+        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
+        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
         Partitioned Table:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        Data column - CSV                       25428 / 25454          0.6        1616.7       1.0X
-        Data column - Json                      12689 / 12774          1.2         806.7       2.0X
-        Data column - Parquet Vectorized           222 /  231         70.7          14.1     114.3X
-        Data column - Parquet MR                  3355 / 3397          4.7         213.3       7.6X
-        Data column - ORC Vectorized               332 /  338         47.4          21.1      76.6X
-        Data column - ORC Vectorized with copy     338 /  341         46.5          21.5      75.2X
-        Data column - ORC MR                      2329 / 2356          6.8         148.0      10.9X
-        Partition column - CSV                  17465 / 17502          0.9        1110.4       1.5X
-        Partition column - Json                 10865 / 10876          1.4         690.8       2.3X
-        Partition column - Parquet Vectorized       48 /   52        325.4           3.1     526.1X
-        Partition column - Parquet MR             1695 / 1696          9.3         107.8      15.0X
-        Partition column - ORC Vectorized           49 /   54        319.9           3.1     517.2X
-        Partition column - ORC Vectorized with copy 49 /   52        324.1           3.1     524.0X
-        Partition column - ORC MR                 1548 / 1549         10.2          98.4      16.4X
-        Both columns - CSV                      25568 / 25595          0.6        1625.6       1.0X
-        Both columns - Json                     13658 / 13673          1.2         868.4       1.9X
-        Both columns - Parquet Vectorized          270 /  296         58.3          17.1      94.3X
-        Both columns - Parquet MR                 3501 / 3521          4.5         222.6       7.3X
-        Both columns - ORC Vectorized              377 /  380         41.7          24.0      67.4X
-        Both column - ORC Vectorized with copy     447 /  448         35.2          28.4      56.9X
-        Both columns - ORC MR                     2440 / 2446          6.4         155.2      10.4X
+        Data column - CSV                       32613 / 32841          0.5        2073.4       1.0X
+        Data column - Json                      13343 / 13469          1.2         848.3       2.4X
+        Data column - Parquet Vectorized           302 /  318         52.1          19.2     108.0X
+        Data column - Parquet MR                  2908 / 2924          5.4         184.9      11.2X
+        Data column - ORC Vectorized               412 /  425         38.1          26.2      79.1X
+        Data column - ORC Vectorized with copy     442 /  446         35.6          28.1      73.8X
+        Data column - ORC MR                      2390 / 2396          6.6         152.0      13.6X
+        Partition column - CSV                    9626 / 9683          1.6         612.0       3.4X
+        Partition column - Json                 10909 / 10923          1.4         693.6       3.0X
+        Partition column - Parquet Vectorized       69 /   76        228.4           4.4     473.6X
+        Partition column - Parquet MR             1898 / 1933          8.3         120.7      17.2X
+        Partition column - ORC Vectorized           67 /   74        236.0           4.2     489.4X
+        Partition column - ORC Vectorized with copy 65 /   72        241.9           4.1     501.6X
+        Partition column - ORC MR                 1743 / 1749          9.0         110.8      18.7X
+        Both columns - CSV                      35523 / 35552          0.4        2258.5       0.9X
+        Both columns - Json                     13676 / 13681          1.2         869.5       2.4X
+        Both columns - Parquet Vectorized          317 /  326         49.5          20.2     102.7X
+        Both columns - Parquet MR                 3333 / 3336          4.7         211.9       9.8X
+        Both columns - ORC Vectorized              441 /  446         35.6          28.1      73.9X
+        Both column - ORC Vectorized with copy     517 /  524         30.4          32.9      63.1X
+        Both columns - ORC MR                     2574 / 2577          6.1         163.6      12.7X
         */
         benchmark.run()
       }
@@ -684,41 +690,42 @@ object DataSourceReadBenchmark {
         }
 
         /*
-        Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
+        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
+        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
         String with Nulls Scan:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                 13518 / 13529          0.8        1289.2       1.0X
-        SQL Json                                10895 / 10926          1.0        1039.0       1.2X
-        SQL Parquet Vectorized                    1539 / 1581          6.8         146.8       8.8X
-        SQL Parquet MR                            3746 / 3811          2.8         357.3       3.6X
-        ParquetReader Vectorized                  1070 / 1112          9.8         102.0      12.6X
-        SQL ORC Vectorized                        1389 / 1408          7.6         132.4       9.7X
-        SQL ORC Vectorized with copy              1736 / 1750          6.0         165.6       7.8X
-        SQL ORC MR                                3799 / 3892          2.8         362.3       3.6X
+        SQL CSV                                 14875 / 14920          0.7        1418.6       1.0X
+        SQL Json                                10974 / 10992          1.0        1046.5       1.4X
+        SQL Parquet Vectorized                    1711 / 1750          6.1         163.2       8.7X
+        SQL Parquet MR                            3838 / 3884          2.7         366.0       3.9X
+        ParquetReader Vectorized                  1155 / 1168          9.1         110.2      12.9X
+        SQL ORC Vectorized                        1341 / 1380          7.8         127.9      11.1X
+        SQL ORC Vectorized with copy              1659 / 1716          6.3         158.2       9.0X
+        SQL ORC MR                                3594 / 3634          2.9         342.7       4.1X
 
 
         String with Nulls Scan:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                 10854 / 10892          1.0        1035.2       1.0X
-        SQL Json                                  8129 / 8138          1.3         775.3       1.3X
-        SQL Parquet Vectorized                    1053 / 1104         10.0         100.4      10.3X
-        SQL Parquet MR                            2840 / 2854          3.7         270.8       3.8X
-        ParquetReader Vectorized                   978 / 1008         10.7          93.2      11.1X
-        SQL ORC Vectorized                        1312 / 1387          8.0         125.1       8.3X
-        SQL ORC Vectorized with copy              1764 / 1772          5.9         168.2       6.2X
-        SQL ORC MR                                3435 / 3445          3.1         327.6       3.2X
+        SQL CSV                                 17219 / 17264          0.6        1642.1       1.0X
+        SQL Json                                  8843 / 8864          1.2         843.3       1.9X
+        SQL Parquet Vectorized                    1169 / 1178          9.0         111.4      14.7X
+        SQL Parquet MR                            2676 / 2697          3.9         255.2       6.4X
+        ParquetReader Vectorized                  1068 / 1071          9.8         101.8      16.1X
+        SQL ORC Vectorized                        1319 / 1319          7.9         125.8      13.1X
+        SQL ORC Vectorized with copy              1638 / 1639          6.4         156.2      10.5X
+        SQL ORC MR                                3230 / 3257          3.2         308.1       5.3X
 
 
         String with Nulls Scan:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                   8043 / 8048          1.3         767.1       1.0X
-        SQL Json                                  4911 / 4923          2.1         468.4       1.6X
-        SQL Parquet Vectorized                     206 /  209         51.0          19.6      39.1X
-        SQL Parquet MR                            1528 / 1537          6.9         145.8       5.3X
-        ParquetReader Vectorized                   216 /  219         48.6          20.6      37.2X
-        SQL ORC Vectorized                         462 /  466         22.7          44.1      17.4X
-        SQL ORC Vectorized with copy               568 /  572         18.5          54.2      14.2X
-        SQL ORC MR                                1647 / 1649          6.4         157.1       4.9X
+        SQL CSV                                 13976 / 14053          0.8        1332.8       1.0X
+        SQL Json                                  5166 / 5176          2.0         492.6       2.7X
+        SQL Parquet Vectorized                     274 /  282         38.2          26.2      50.9X
+        SQL Parquet MR                            1553 / 1555          6.8         148.1       9.0X
+        ParquetReader Vectorized                   241 /  246         43.5          23.0      57.9X
+        SQL ORC Vectorized                         476 /  479         22.0          45.4      29.3X
+        SQL ORC Vectorized with copy               584 /  588         17.9          55.7      23.9X
+        SQL ORC MR                                1720 / 1734          6.1         164.1       8.1X
         */
         benchmark.run()
       }
@@ -773,38 +780,39 @@ object DataSourceReadBenchmark {
         }
 
         /*
-        Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
+        OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
+        Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
         Single Column Scan from 10 columns:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                   3663 / 3665          0.3        3493.2       1.0X
-        SQL Json                                  3122 / 3160          0.3        2977.5       1.2X
-        SQL Parquet Vectorized                      40 /   42         26.2          38.2      91.5X
-        SQL Parquet MR                             189 /  192          5.5         180.2      19.4X
-        SQL ORC Vectorized                          48 /   51         21.6          46.2      75.6X
-        SQL ORC Vectorized with copy                49 /   52         21.4          46.7      74.9X
-        SQL ORC MR                                 280 /  289          3.7         267.1      13.1X
+        SQL CSV                                   3478 / 3481          0.3        3316.4       1.0X
+        SQL Json                                  2646 / 2654          0.4        2523.6       1.3X
+        SQL Parquet Vectorized                      67 /   72         15.8          63.5      52.2X
+        SQL Parquet MR                             207 /  214          5.1         197.6      16.8X
+        SQL ORC Vectorized                          69 /   76         15.2          66.0      50.3X
+        SQL ORC Vectorized with copy                70 /   76         15.0          66.5      49.9X
+        SQL ORC MR                                 299 /  303          3.5         285.1      11.6X
 
 
         Single Column Scan from 50 columns:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                 11420 / 11505          0.1       10891.1       1.0X
-        SQL Json                                11905 / 12120          0.1       11353.6       1.0X
-        SQL Parquet Vectorized                      50 /   54         20.9          47.8     227.7X
-        SQL Parquet MR                             195 /  199          5.4         185.8      58.6X
-        SQL ORC Vectorized                          61 /   65         17.3          57.8     188.3X
-        SQL ORC Vectorized with copy                62 /   65         17.0          58.8     185.2X
-        SQL ORC MR                                 847 /  865          1.2         807.4      13.5X
+        SQL CSV                                   9214 / 9236          0.1        8786.7       1.0X
+        SQL Json                                  9943 / 9978          0.1        9482.7       0.9X
+        SQL Parquet Vectorized                      77 /   86         13.6          73.3     119.8X
+        SQL Parquet MR                             229 /  235          4.6         218.6      40.2X
+        SQL ORC Vectorized                          84 /   96         12.5          80.0     109.9X
+        SQL ORC Vectorized with copy                83 /   91         12.6          79.4     110.7X
+        SQL ORC MR                                 843 /  854          1.2         804.0      10.9X
 
 
-        Single Column Scan from 100 columns: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+        Single Column Scan from 100 columns  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         --------------------------------------------------------------------------------------------
-        SQL CSV                                 21278 / 21404          0.0       20292.4       1.0X
-        SQL Json                                22455 / 22625          0.0       21414.7       0.9X
-        SQL Parquet Vectorized                      73 /   75         14.4          69.3     292.8X
-        SQL Parquet MR                             220 /  226          4.8         209.7      96.8X
-        SQL ORC Vectorized                          82 /   86         12.8          78.2     259.4X
-        SQL ORC Vectorized with copy                82 /   90         12.7          78.7     258.0X
-        SQL ORC MR                                1568 / 1582          0.7        1495.4      13.6X
+        SQL CSV                                 16503 / 16622          0.1       15738.9       1.0X
+        SQL Json                                19109 / 19184          0.1       18224.2       0.9X
+        SQL Parquet Vectorized                      99 /  108         10.6          94.3     166.8X
+        SQL Parquet MR                             253 /  264          4.1         241.6      65.1X
+        SQL ORC Vectorized                         107 /  114          9.8         101.6     154.8X
+        SQL ORC Vectorized with copy               107 /  118          9.8         102.1     154.1X
+        SQL ORC MR                                1526 / 1529          0.7        1455.3      10.8X
         */
         benchmark.run()
       }

