[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/23027 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...
GitHub user wangyum reopened a pull request: https://github.com/apache/spark/pull/23027 [SPARK-26049][SQL][TEST] FilterPushdownBenchmark add InMemoryTable case ## What changes were proposed in this pull request? `FilterPushdownBenchmark` add InMemoryTable case. ## How was this patch tested? manual tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-26049 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23027.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23027 commit d0a2a3f4af492fbf69f7774e03d67d4af39cd5c7 Author: Yuming Wang Date: 2018-11-14T00:35:35Z Add InMemoryTable filter benchmark commit 01d01e7995ecb72705d0d610892dc99a6c3f4621 Author: Yuming Wang Date: 2018-11-19T07:59:19Z cache inMemoryTable from file commit b8c54ea5048524f7df0b750a11a8fb109b43f479 Author: Yuming Wang Date: 2018-11-19T12:44:46Z Fix path --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/23027#discussion_r235259340 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -2,669 +2,809 @@ Pushdown for many distinct value case -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row (value IS NULL): Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11405 / 11485 1.4 725.1 1.0X -Parquet Vectorized (Pushdown) 675 / 690 23.3 42.9 16.9X -Native ORC Vectorized 7127 / 7170 2.2 453.1 1.6X -Native ORC Vectorized (Pushdown) 519 / 541 30.3 33.0 22.0X +Parquet Vectorized7823 / 7996 2.0 497.4 1.0X +Parquet Vectorized (Pushdown) 460 / 468 34.2 29.2 17.0X +Native ORC Vectorized 5412 / 5550 2.9 344.1 1.4X +Native ORC Vectorized (Pushdown) 551 / 563 28.6 35.0 14.2X +InMemoryTable Vectorized 6 /6 2859.1 0.31422.0X +InMemoryTable Vectorized (Pushdown) 5 /6 3023.0 0.31503.6X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11457 / 11473 1.4 728.4 1.0X -Parquet Vectorized (Pushdown) 656 / 686 24.0 41.7 17.5X -Native ORC Vectorized 7328 / 7342 2.1 465.9 1.6X -Native ORC Vectorized (Pushdown) 539 / 565 29.2 34.2 21.3X +Parquet Vectorized 8322 / 11160 1.9 529.1 1.0X +Parquet Vectorized (Pushdown) 463 / 472 34.0 29.4 18.0X +Native ORC Vectorized 5622 / 5635 2.8 357.4 1.5X +Native ORC Vectorized (Pushdown) 563 / 595 27.9 35.8 14.8X +InMemoryTable Vectorized 4831 / 4881 3.3 307.2 1.7X +InMemoryTable Vectorized (Pushdown) 1980 / 2027 7.9 125.9 4.2X --- End diff -- I think the reason is [SPARK-22599](https://issues.apache.org/jira/browse/SPARK-22599). But if we cached all data to memory, the result is: ``` Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative Parquet Vectorized6491 / 6716 2.4 412.7 1.0X Parquet Vectorized (Pushdown) 491 / 496 32.0 31.2 13.2X Native ORC Vectorized 5849 / 6103 2.7 371.9 1.1X Native ORC Vectorized (Pushdown) 533 / 572 29.5 33.9 12.2X InMemoryTable Vectorized 2788 / 2854 5.6 177.2 2.3X InMemoryTable Vectorized (Pushdown)370 / 408 42.5 23.5 17.5X ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/23027 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/23027#discussion_r234521489 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala --- @@ -104,6 +107,10 @@ object FilterPushdownBenchmark extends BenchmarkBase with SQLHelper { df.write.mode("overwrite") .option("parquet.block.size", blockSize).parquet(parquetPath) spark.read.parquet(parquetPath).createOrReplaceTempView("parquetTable") + +df.write.mode("overwrite").save(inMemoryTablePath) --- End diff -- Cache `inMemoryTable` from file to avoid the performance issue: https://github.com/apache/spark/pull/23027#pullrequestreview-175054485 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/23027#discussion_r234482766 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -2,669 +2,809 @@ Pushdown for many distinct value case -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row (value IS NULL): Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11405 / 11485 1.4 725.1 1.0X -Parquet Vectorized (Pushdown) 675 / 690 23.3 42.9 16.9X -Native ORC Vectorized 7127 / 7170 2.2 453.1 1.6X -Native ORC Vectorized (Pushdown) 519 / 541 30.3 33.0 22.0X +Parquet Vectorized7823 / 7996 2.0 497.4 1.0X +Parquet Vectorized (Pushdown) 460 / 468 34.2 29.2 17.0X +Native ORC Vectorized 5412 / 5550 2.9 344.1 1.4X +Native ORC Vectorized (Pushdown) 551 / 563 28.6 35.0 14.2X +InMemoryTable Vectorized 6 /6 2859.1 0.31422.0X +InMemoryTable Vectorized (Pushdown) 5 /6 3023.0 0.31503.6X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11457 / 11473 1.4 728.4 1.0X -Parquet Vectorized (Pushdown) 656 / 686 24.0 41.7 17.5X -Native ORC Vectorized 7328 / 7342 2.1 465.9 1.6X -Native ORC Vectorized (Pushdown) 539 / 565 29.2 34.2 21.3X +Parquet Vectorized 8322 / 11160 1.9 529.1 1.0X +Parquet Vectorized (Pushdown) 463 / 472 34.0 29.4 18.0X +Native ORC Vectorized 5622 / 5635 2.8 357.4 1.5X +Native ORC Vectorized (Pushdown) 563 / 595 27.9 35.8 14.8X +InMemoryTable Vectorized 4831 / 4881 3.3 307.2 1.7X +InMemoryTable Vectorized (Pushdown) 1980 / 2027 7.9 125.9 4.2X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 1 string row (value = '7864320'): Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11878 / 11888 1.3 755.2 1.0X -Parquet Vectorized (Pushdown) 630 / 654 25.0 40.1 18.9X -Native ORC Vectorized 7342 / 7362 2.1 466.8 1.6X -Native ORC Vectorized (Pushdown) 519 / 537 30.3 33.0 22.9X +Parquet Vectorized8322 / 8386 1.9 529.1 1.0X +Parquet Vectorized (Pushdown) 434 / 441 36.2 27.6 19.2X +Native ORC Vectorized 5659 / 5944 2.8 359.8 1.5X +Native ORC Vectorized (Pushdown) 535 / 567 29.4 34.0 15.6X +InMemoryTable Vectorized 4784 / 4879 3.3 304.1 1.7X +InMemoryTable Vectorized (Pushdown) 1950 / 1985 8.1 124.0 4.3X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM
[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/23027#discussion_r233689556 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -2,669 +2,809 @@ Pushdown for many distinct value case -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row (value IS NULL): Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11405 / 11485 1.4 725.1 1.0X -Parquet Vectorized (Pushdown) 675 / 690 23.3 42.9 16.9X -Native ORC Vectorized 7127 / 7170 2.2 453.1 1.6X -Native ORC Vectorized (Pushdown) 519 / 541 30.3 33.0 22.0X +Parquet Vectorized7823 / 7996 2.0 497.4 1.0X +Parquet Vectorized (Pushdown) 460 / 468 34.2 29.2 17.0X +Native ORC Vectorized 5412 / 5550 2.9 344.1 1.4X +Native ORC Vectorized (Pushdown) 551 / 563 28.6 35.0 14.2X +InMemoryTable Vectorized 6 /6 2859.1 0.31422.0X +InMemoryTable Vectorized (Pushdown) 5 /6 3023.0 0.31503.6X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11457 / 11473 1.4 728.4 1.0X -Parquet Vectorized (Pushdown) 656 / 686 24.0 41.7 17.5X -Native ORC Vectorized 7328 / 7342 2.1 465.9 1.6X -Native ORC Vectorized (Pushdown) 539 / 565 29.2 34.2 21.3X +Parquet Vectorized 8322 / 11160 1.9 529.1 1.0X +Parquet Vectorized (Pushdown) 463 / 472 34.0 29.4 18.0X +Native ORC Vectorized 5622 / 5635 2.8 357.4 1.5X +Native ORC Vectorized (Pushdown) 563 / 595 27.9 35.8 14.8X +InMemoryTable Vectorized 4831 / 4881 3.3 307.2 1.7X +InMemoryTable Vectorized (Pushdown) 1980 / 2027 7.9 125.9 4.2X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 1 string row (value = '7864320'): Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11878 / 11888 1.3 755.2 1.0X -Parquet Vectorized (Pushdown) 630 / 654 25.0 40.1 18.9X -Native ORC Vectorized 7342 / 7362 2.1 466.8 1.6X -Native ORC Vectorized (Pushdown) 519 / 537 30.3 33.0 22.9X +Parquet Vectorized8322 / 8386 1.9 529.1 1.0X +Parquet Vectorized (Pushdown) 434 / 441 36.2 27.6 19.2X +Native ORC Vectorized 5659 / 5944 2.8 359.8 1.5X +Native ORC Vectorized (Pushdown) 535 / 567 29.4 34.0 15.6X +InMemoryTable Vectorized 4784 / 4879 3.3 304.1 1.7X +InMemoryTable Vectorized (Pushdown) 1950 / 1985 8.1 124.0 4.3X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM
[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/23027#discussion_r233687968 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -2,669 +2,809 @@ Pushdown for many distinct value case -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row (value IS NULL): Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11405 / 11485 1.4 725.1 1.0X -Parquet Vectorized (Pushdown) 675 / 690 23.3 42.9 16.9X -Native ORC Vectorized 7127 / 7170 2.2 453.1 1.6X -Native ORC Vectorized (Pushdown) 519 / 541 30.3 33.0 22.0X +Parquet Vectorized7823 / 7996 2.0 497.4 1.0X +Parquet Vectorized (Pushdown) 460 / 468 34.2 29.2 17.0X +Native ORC Vectorized 5412 / 5550 2.9 344.1 1.4X +Native ORC Vectorized (Pushdown) 551 / 563 28.6 35.0 14.2X +InMemoryTable Vectorized 6 /6 2859.1 0.31422.0X +InMemoryTable Vectorized (Pushdown) 5 /6 3023.0 0.31503.6X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11457 / 11473 1.4 728.4 1.0X -Parquet Vectorized (Pushdown) 656 / 686 24.0 41.7 17.5X -Native ORC Vectorized 7328 / 7342 2.1 465.9 1.6X -Native ORC Vectorized (Pushdown) 539 / 565 29.2 34.2 21.3X +Parquet Vectorized 8322 / 11160 1.9 529.1 1.0X +Parquet Vectorized (Pushdown) 463 / 472 34.0 29.4 18.0X +Native ORC Vectorized 5622 / 5635 2.8 357.4 1.5X +Native ORC Vectorized (Pushdown) 563 / 595 27.9 35.8 14.8X +InMemoryTable Vectorized 4831 / 4881 3.3 307.2 1.7X +InMemoryTable Vectorized (Pushdown) 1980 / 2027 7.9 125.9 4.2X --- End diff -- Yes. This is the current benchmark result. I plan to improve it step by step. Example: [SPARK-26004](https://issues.apache.org/jira/browse/SPARK-26004) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/23027#discussion_r233686986 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala --- @@ -131,6 +134,15 @@ object FilterPushdownBenchmark extends BenchmarkBase with SQLHelper { } } +Seq(false, true).foreach { pushDownEnabled => + val name = s"InMemoryTable Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}" + benchmark.addCase(name) { _ => +withSQLConf(SQLConf.IN_MEMORY_PARTITION_PRUNING.key -> s"$pushDownEnabled") { --- End diff -- I think the InMemoryTable's partition same to Parquet RowGroup(@kiszk please correct if I'm wrong). We put them together and it's easy to compare performance. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23027#discussion_r233603036 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala --- @@ -131,6 +134,15 @@ object FilterPushdownBenchmark extends BenchmarkBase with SQLHelper { } } +Seq(false, true).foreach { pushDownEnabled => + val name = s"InMemoryTable Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}" + benchmark.addCase(name) { _ => +withSQLConf(SQLConf.IN_MEMORY_PARTITION_PRUNING.key -> s"$pushDownEnabled") { --- End diff -- @wangyum . `FilterPushdownBenchmark` is not related to `Partition Pruning`, isn't it? This benchmark case will be misleading. I'd like to have another benchmark for this `IN_MEMORY_PARTITION_PRUNING `. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23027#discussion_r233600662 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -2,669 +2,809 @@ Pushdown for many distinct value case -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row (value IS NULL): Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11405 / 11485 1.4 725.1 1.0X -Parquet Vectorized (Pushdown) 675 / 690 23.3 42.9 16.9X -Native ORC Vectorized 7127 / 7170 2.2 453.1 1.6X -Native ORC Vectorized (Pushdown) 519 / 541 30.3 33.0 22.0X +Parquet Vectorized7823 / 7996 2.0 497.4 1.0X +Parquet Vectorized (Pushdown) 460 / 468 34.2 29.2 17.0X +Native ORC Vectorized 5412 / 5550 2.9 344.1 1.4X +Native ORC Vectorized (Pushdown) 551 / 563 28.6 35.0 14.2X +InMemoryTable Vectorized 6 /6 2859.1 0.31422.0X +InMemoryTable Vectorized (Pushdown) 5 /6 3023.0 0.31503.6X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11457 / 11473 1.4 728.4 1.0X -Parquet Vectorized (Pushdown) 656 / 686 24.0 41.7 17.5X -Native ORC Vectorized 7328 / 7342 2.1 465.9 1.6X -Native ORC Vectorized (Pushdown) 539 / 565 29.2 34.2 21.3X +Parquet Vectorized 8322 / 11160 1.9 529.1 1.0X +Parquet Vectorized (Pushdown) 463 / 472 34.0 29.4 18.0X +Native ORC Vectorized 5622 / 5635 2.8 357.4 1.5X +Native ORC Vectorized (Pushdown) 563 / 595 27.9 35.8 14.8X +InMemoryTable Vectorized 4831 / 4881 3.3 307.2 1.7X +InMemoryTable Vectorized (Pushdown) 1980 / 2027 7.9 125.9 4.2X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 1 string row (value = '7864320'): Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11878 / 11888 1.3 755.2 1.0X -Parquet Vectorized (Pushdown) 630 / 654 25.0 40.1 18.9X -Native ORC Vectorized 7342 / 7362 2.1 466.8 1.6X -Native ORC Vectorized (Pushdown) 519 / 537 30.3 33.0 22.9X +Parquet Vectorized8322 / 8386 1.9 529.1 1.0X +Parquet Vectorized (Pushdown) 434 / 441 36.2 27.6 19.2X +Native ORC Vectorized 5659 / 5944 2.8 359.8 1.5X +Native ORC Vectorized (Pushdown) 535 / 567 29.4 34.0 15.6X +InMemoryTable Vectorized 4784 / 4879 3.3 304.1 1.7X +InMemoryTable Vectorized (Pushdown) 1950 / 1985 8.1 124.0 4.3X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server
[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23027#discussion_r23362 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -2,669 +2,809 @@ Pushdown for many distinct value case -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row (value IS NULL): Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11405 / 11485 1.4 725.1 1.0X -Parquet Vectorized (Pushdown) 675 / 690 23.3 42.9 16.9X -Native ORC Vectorized 7127 / 7170 2.2 453.1 1.6X -Native ORC Vectorized (Pushdown) 519 / 541 30.3 33.0 22.0X +Parquet Vectorized7823 / 7996 2.0 497.4 1.0X +Parquet Vectorized (Pushdown) 460 / 468 34.2 29.2 17.0X +Native ORC Vectorized 5412 / 5550 2.9 344.1 1.4X +Native ORC Vectorized (Pushdown) 551 / 563 28.6 35.0 14.2X +InMemoryTable Vectorized 6 /6 2859.1 0.31422.0X +InMemoryTable Vectorized (Pushdown) 5 /6 3023.0 0.31503.6X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6 +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative -Parquet Vectorized 11457 / 11473 1.4 728.4 1.0X -Parquet Vectorized (Pushdown) 656 / 686 24.0 41.7 17.5X -Native ORC Vectorized 7328 / 7342 2.1 465.9 1.6X -Native ORC Vectorized (Pushdown) 539 / 565 29.2 34.2 21.3X +Parquet Vectorized 8322 / 11160 1.9 529.1 1.0X +Parquet Vectorized (Pushdown) 463 / 472 34.0 29.4 18.0X +Native ORC Vectorized 5622 / 5635 2.8 357.4 1.5X +Native ORC Vectorized (Pushdown) 563 / 595 27.9 35.8 14.8X +InMemoryTable Vectorized 4831 / 4881 3.3 307.2 1.7X +InMemoryTable Vectorized (Pushdown) 1980 / 2027 7.9 125.9 4.2X --- End diff -- Oh, it's slower than Orc/Parquet file in this case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org