Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/23027#discussion_r235259340

--- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt ---
@@ -2,669 +2,809 @@
 Pushdown for many distinct value case
 ================================================================================================

-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------
-Parquet Vectorized                          11405 / 11485          1.4         725.1       1.0X
-Parquet Vectorized (Pushdown)                   675 / 690         23.3          42.9      16.9X
-Native ORC Vectorized                         7127 / 7170          2.2         453.1       1.6X
-Native ORC Vectorized (Pushdown)                519 / 541         30.3          33.0      22.0X
+Parquet Vectorized                            7823 / 7996          2.0         497.4       1.0X
+Parquet Vectorized (Pushdown)                   460 / 468         34.2          29.2      17.0X
+Native ORC Vectorized                         5412 / 5550          2.9         344.1       1.4X
+Native ORC Vectorized (Pushdown)                551 / 563         28.6          35.0      14.2X
+InMemoryTable Vectorized                            6 / 6       2859.1           0.3    1422.0X
+InMemoryTable Vectorized (Pushdown)                 5 / 6       3023.0           0.3    1503.6X

-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------
-Parquet Vectorized                          11457 / 11473          1.4         728.4       1.0X
-Parquet Vectorized (Pushdown)                   656 / 686         24.0          41.7      17.5X
-Native ORC Vectorized                         7328 / 7342          2.1         465.9       1.6X
-Native ORC Vectorized (Pushdown)                539 / 565         29.2          34.2      21.3X
+Parquet Vectorized                           8322 / 11160          1.9         529.1       1.0X
+Parquet Vectorized (Pushdown)                   463 / 472         34.0          29.4      18.0X
+Native ORC Vectorized                         5622 / 5635          2.8         357.4       1.5X
+Native ORC Vectorized (Pushdown)                563 / 595         27.9          35.8      14.8X
+InMemoryTable Vectorized                      4831 / 4881          3.3         307.2       1.7X
+InMemoryTable Vectorized (Pushdown)           1980 / 2027          7.9         125.9       4.2X
--- End diff --

I think the reason is [SPARK-22599](https://issues.apache.org/jira/browse/SPARK-22599). But if we cache all the data in memory, the result is:

```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            6491 / 6716          2.4         412.7       1.0X
Parquet Vectorized (Pushdown)                   491 / 496         32.0          31.2      13.2X
Native ORC Vectorized                         5849 / 6103          2.7         371.9       1.1X
Native ORC Vectorized (Pushdown)                533 / 572         29.5          33.9      12.2X
InMemoryTable Vectorized                      2788 / 2854          5.6         177.2       2.3X
InMemoryTable Vectorized (Pushdown)             370 / 408         42.5          23.5      17.5X
```
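The Rate, Per Row and Relative columns are all derived from the best time. The Python sketch below recomputes them for the fully cached run. It rests on two assumptions. First, each case scans 15 * 1024 * 1024 = 15,728,640 rows; this count is inferred from the Per Row(ns) figures and is not stated in this thread. Second, Relative is the Parquet Vectorized best time divided by each case's best time.

```python
# Hypothetical row count: inferred from the Per Row(ns) values, not taken from the benchmark source.
NUM_ROWS = 15 * 1024 * 1024

# Best times (ms) copied from the fully cached run.
best_ms = {
    "Parquet Vectorized": 6491,
    "Parquet Vectorized (Pushdown)": 491,
    "Native ORC Vectorized": 5849,
    "Native ORC Vectorized (Pushdown)": 533,
    "InMemoryTable Vectorized": 2788,
    "InMemoryTable Vectorized (Pushdown)": 370,
}


def derived_columns(best, baseline):
    """Return (rate in M rows/s, ns per row, speedup vs. baseline), each rounded to 0.1."""
    rate = NUM_ROWS / (best / 1000) / 1e6  # million rows per second
    per_row = best * 1e6 / NUM_ROWS        # nanoseconds spent per row
    relative = baseline / best             # assumed: baseline best time / this best time
    return round(rate, 1), round(per_row, 1), round(relative, 1)


baseline = best_ms["Parquet Vectorized"]
for name, best in best_ms.items():
    rate, per_row, rel = derived_columns(best, baseline)
    print(f"{name:<40} {rate:>8} {per_row:>8} {rel:>8}X")
```

The printed times are rounded to whole milliseconds, so a recomputed value can be off by one in the last digit. For example, the InMemoryTable Vectorized per-row figure comes out as 177.3 instead of 177.2.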