Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21603

Benchmark result:
```
##########################[ Pushdown benchmark for InSet -> InFilters ]##########################

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

InSet -> InFilters (threshold: 10, values count: 5, distribution: 10): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7649 / 7678       2.1       486.3      1.0X
Parquet Vectorized (Pushdown)         316 / 325      49.8        20.1     24.2X
Native ORC Vectorized               6787 / 7353       2.3       431.5      1.1X
Native ORC Vectorized (Pushdown)    1010 / 1020      15.6        64.2      7.6X

InSet -> InFilters (threshold: 10, values count: 5, distribution: 50): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7537 / 7944       2.1       479.2      1.0X
Parquet Vectorized (Pushdown)         297 / 306      52.9        18.9     25.3X
Native ORC Vectorized               6768 / 6779       2.3       430.3      1.1X
Native ORC Vectorized (Pushdown)     998 / 1017      15.8        63.4      7.6X

InSet -> InFilters (threshold: 10, values count: 5, distribution: 90): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7500 / 7592       2.1       476.8      1.0X
Parquet Vectorized (Pushdown)         299 / 306      52.5        19.0     25.1X
Native ORC Vectorized               6758 / 6797       2.3       429.7      1.1X
Native ORC Vectorized (Pushdown)      982 / 993      16.0        62.4      7.6X

InSet -> InFilters (threshold: 10, values count: 10, distribution: 10): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7566 / 8153       2.1       481.1      1.0X
Parquet Vectorized (Pushdown)         319 / 328      49.3        20.3     23.7X
Native ORC Vectorized               6761 / 6812       2.3       429.8      1.1X
Native ORC Vectorized (Pushdown)     995 / 1013      15.8        63.3      7.6X

InSet -> InFilters (threshold: 10, values count: 10, distribution: 50): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7512 / 7581       2.1       477.6      1.0X
Parquet Vectorized (Pushdown)         315 / 322      50.0        20.0     23.9X
Native ORC Vectorized               6712 / 6774       2.3       426.8      1.1X
Native ORC Vectorized (Pushdown)    1001 / 1032      15.7        63.6      7.5X

InSet -> InFilters (threshold: 10, values count: 10, distribution: 90): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7603 / 7689       2.1       483.4      1.0X
Parquet Vectorized (Pushdown)         308 / 317      51.0        19.6     24.7X
Native ORC Vectorized               7011 / 7605       2.2       445.7      1.1X
Native ORC Vectorized (Pushdown)    1038 / 1067      15.2        66.0      7.3X

InSet -> InFilters (threshold: 10, values count: 50, distribution: 10): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7750 / 7796       2.0       492.7      1.0X
Parquet Vectorized (Pushdown)       7855 / 7961       2.0       499.4      1.0X
Native ORC Vectorized               7120 / 7820       2.2       452.7      1.1X
Native ORC Vectorized (Pushdown)    1085 / 1122      14.5        69.0      7.1X

InSet -> InFilters (threshold: 10, values count: 50, distribution: 50): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7920 / 8012       2.0       503.5      1.0X
Parquet Vectorized (Pushdown)       7855 / 8159       2.0       499.4      1.0X
Native ORC Vectorized               7087 / 7105       2.2       450.6      1.1X
Native ORC Vectorized (Pushdown)    1098 / 1118      14.3        69.8      7.2X

InSet -> InFilters (threshold: 10, values count: 50, distribution: 90): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7809 / 7918       2.0       496.5      1.0X
Parquet Vectorized (Pushdown)       7800 / 7857       2.0       495.9      1.0X
Native ORC Vectorized               7089 / 7145       2.2       450.7      1.1X
Native ORC Vectorized (Pushdown)    1102 / 1123      14.3        70.1      7.1X

InSet -> InFilters (threshold: 10, values count: 100, distribution: 10): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7793 / 7823       2.0       495.5      1.0X
Parquet Vectorized (Pushdown)       7765 / 7863       2.0       493.7      1.0X
Native ORC Vectorized               7066 / 7175       2.2       449.2      1.1X
Native ORC Vectorized (Pushdown)    1194 / 1210      13.2        75.9      6.5X

InSet -> InFilters (threshold: 10, values count: 100, distribution: 50): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7782 / 7816       2.0       494.8      1.0X
Parquet Vectorized (Pushdown)       7737 / 7782       2.0       491.9      1.0X
Native ORC Vectorized               7056 / 7100       2.2       448.6      1.1X
Native ORC Vectorized (Pushdown)    1193 / 1264      13.2        75.9      6.5X

InSet -> InFilters (threshold: 10, values count: 100, distribution: 90): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7726 / 8463       2.0       491.2      1.0X
Parquet Vectorized (Pushdown)       8759 / 9317       1.8       556.9      0.9X
Native ORC Vectorized               7067 / 7379       2.2       449.3      1.1X
Native ORC Vectorized (Pushdown)    1352 / 1520      11.6        86.0      5.7X

InSet -> InFilters (threshold: 100, values count: 5, distribution: 10): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                 8694 / 10591       1.8       552.7      1.0X
Parquet Vectorized (Pushdown)         288 / 313      54.5        18.3     30.1X
Native ORC Vectorized               6898 / 7754       2.3       438.6      1.3X
Native ORC Vectorized (Pushdown)    1037 / 1279      15.2        65.9      8.4X

InSet -> InFilters (threshold: 100, values count: 5, distribution: 50): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7584 / 8641       2.1       482.2      1.0X
Parquet Vectorized (Pushdown)         293 / 299      53.7        18.6     25.9X
Native ORC Vectorized               6849 / 6918       2.3       435.5      1.1X
Native ORC Vectorized (Pushdown)     996 / 1020      15.8        63.3      7.6X

InSet -> InFilters (threshold: 100, values count: 5, distribution: 90): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7617 / 7947       2.1       484.3      1.0X
Parquet Vectorized (Pushdown)         311 / 341      50.5        19.8     24.5X
Native ORC Vectorized               7468 / 8006       2.1       474.8      1.0X
Native ORC Vectorized (Pushdown)    1095 / 1173      14.4        69.6      7.0X

InSet -> InFilters (threshold: 100, values count: 10, distribution: 10): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  8364 / 9682       1.9       531.8      1.0X
Parquet Vectorized (Pushdown)         325 / 498      48.4        20.7     25.7X
Native ORC Vectorized               6931 / 7797       2.3       440.7      1.2X
Native ORC Vectorized (Pushdown)    1010 / 1032      15.6        64.2      8.3X

InSet -> InFilters (threshold: 100, values count: 10, distribution: 50): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7647 / 8096       2.1       486.2      1.0X
Parquet Vectorized (Pushdown)         315 / 409      49.9        20.1     24.2X
Native ORC Vectorized               6839 / 7307       2.3       434.8      1.1X
Native ORC Vectorized (Pushdown)    1033 / 1077      15.2        65.7      7.4X

InSet -> InFilters (threshold: 100, values count: 10, distribution: 90): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7653 / 8725       2.1       486.6      1.0X
Parquet Vectorized (Pushdown)         319 / 367      49.3        20.3     24.0X
Native ORC Vectorized               7121 / 8047       2.2       452.7      1.1X
Native ORC Vectorized (Pushdown)    1066 / 1133      14.8        67.8      7.2X

InSet -> InFilters (threshold: 100, values count: 50, distribution: 10): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7804 / 8926       2.0       496.2      1.0X
Parquet Vectorized (Pushdown)         476 / 568      33.0        30.3     16.4X
Native ORC Vectorized               7891 / 8248       2.0       501.7      1.0X
Native ORC Vectorized (Pushdown)    1158 / 1195      13.6        73.6      6.7X

InSet -> InFilters (threshold: 100, values count: 50, distribution: 50): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  8576 / 9488       1.8       545.2      1.0X
Parquet Vectorized (Pushdown)         522 / 593      30.1        33.2     16.4X
Native ORC Vectorized               7199 / 7692       2.2       457.7      1.2X
Native ORC Vectorized (Pushdown)    1180 / 1280      13.3        75.0      7.3X

InSet -> InFilters (threshold: 100, values count: 50, distribution: 90): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                 9142 / 10012       1.7       581.2      1.0X
Parquet Vectorized (Pushdown)         536 / 620      29.3        34.1     17.0X
Native ORC Vectorized               7720 / 9655       2.0       490.9      1.2X
Native ORC Vectorized (Pushdown)    1110 / 1212      14.2        70.6      8.2X

InSet -> InFilters (threshold: 100, values count: 100, distribution: 10): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  8478 / 9150       1.9       539.0      1.0X
Parquet Vectorized (Pushdown)         700 / 900      22.5        44.5     12.1X
Native ORC Vectorized               7427 / 8069       2.1       472.2      1.1X
Native ORC Vectorized (Pushdown)    1185 / 1633      13.3        75.3      7.2X

InSet -> InFilters (threshold: 100, values count: 100, distribution: 50): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7919 / 9670       2.0       503.5      1.0X
Parquet Vectorized (Pushdown)         731 / 750      21.5        46.5     10.8X
Native ORC Vectorized               7205 / 7306       2.2       458.1      1.1X
Native ORC Vectorized (Pushdown)    1191 / 1224      13.2        75.7      6.6X

InSet -> InFilters (threshold: 100, values count: 100, distribution: 90): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                  7845 / 8146       2.0       498.8      1.0X
Parquet Vectorized (Pushdown)         761 / 838      20.7        48.4     10.3X
Native ORC Vectorized               7081 / 7741       2.2       450.2      1.1X
Native ORC Vectorized (Pushdown)    1289 / 1459      12.2        82.0      6.1X
```