Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13591 )
Change subject: KUDU-2846: optimize predicate evaluation for primitives
......................................................................

KUDU-2846: optimize predicate evaluation for primitives

This changes to an optimized unrolled-by-8 predicate evaluation for
primitive columns.

Performance is improved by up to 7.2x depending on the particular
predicate, type, and nullability (average around 4.8x). Branches are
reduced by about 6.5x and branch-misses by about 22x.

It's possible that hand-coded SIMD could improve on this a little bit
but likely not worth the effort.

perf-stat before:

 Performance counter stats for 'build/latest/bin/column_predicate-test --gtest_filter=*Bench*':

      73905.379627      task-clock (msec)         #    0.997 CPUs utilized
   272,810,081,028      cycles                    #    3.691 GHz
   938,488,388,743      instructions              #    3.44  insn per cycle
   148,052,698,322      branches                  # 2003.274 M/sec
       882,311,138      branch-misses             #    0.60% of all branches

perf-stat after:

 Performance counter stats for 'build/latest/bin/column_predicate-test --gtest_filter=*Bench*':

      15354.077654      task-clock (msec)         #    0.992 CPUs utilized
    56,850,629,856      cycles                    #    3.703 GHz
   181,599,095,960      instructions              #    3.19  insn per cycle
    22,496,453,160      branches                  # 1465.178 M/sec
        38,662,626      branch-misses             #    0.17% of all branches

Detailed results before:

  int8   NOT NULL (c = 0)               632.1M evals/sec    4.44 cycles/eval
  int8   NULL     (c = 0)               515.6M evals/sec    5.48 cycles/eval
  int8   NOT NULL (c >= 0)              630.8M evals/sec    4.45 cycles/eval
  int8   NULL     (c >= 0)              426.8M evals/sec    6.64 cycles/eval
  int8   NOT NULL (c >= 0 AND c < 2)    632.6M evals/sec    4.44 cycles/eval
  int8   NULL     (c >= 0 AND c < 2)    384.7M evals/sec    7.38 cycles/eval
  int16  NOT NULL (c = 0)               644.4M evals/sec    4.34 cycles/eval
  int16  NULL     (c = 0)               524.6M evals/sec    5.37 cycles/eval
  int16  NOT NULL (c >= 0)              638.4M evals/sec    4.37 cycles/eval
  int16  NULL     (c >= 0)              458.8M evals/sec    6.17 cycles/eval
  int16  NOT NULL (c >= 0 AND c < 2)    635.3M evals/sec    4.40 cycles/eval
  int16  NULL     (c >= 0 AND c < 2)    335.1M evals/sec    8.50 cycles/eval
  int32  NOT NULL (c = 0)               645.2M evals/sec    4.34 cycles/eval
  int32  NULL     (c = 0)               492.6M evals/sec    5.77 cycles/eval
  int32  NOT NULL (c >= 0)              608.6M evals/sec    4.64 cycles/eval
  int32  NULL     (c >= 0)              440.7M evals/sec    6.48 cycles/eval
  int32  NOT NULL (c >= 0 AND c < 2)    637.8M evals/sec    4.43 cycles/eval
  int32  NULL     (c >= 0 AND c < 2)    348.0M evals/sec    8.22 cycles/eval
  int64  NOT NULL (c = 0)               642.7M evals/sec    4.36 cycles/eval
  int64  NULL     (c = 0)               505.3M evals/sec    5.60 cycles/eval
  int64  NOT NULL (c >= 0)              643.5M evals/sec    4.34 cycles/eval
  int64  NULL     (c >= 0)              472.8M evals/sec    6.00 cycles/eval
  int64  NOT NULL (c >= 0 AND c < 2)    634.2M evals/sec    4.43 cycles/eval
  int64  NULL     (c >= 0 AND c < 2)    396.7M evals/sec    7.21 cycles/eval
  float  NOT NULL (c = 0)               604.6M evals/sec    4.63 cycles/eval
  float  NULL     (c = 0)               406.7M evals/sec    7.05 cycles/eval
  float  NOT NULL (c >= 0)              545.3M evals/sec    5.20 cycles/eval
  float  NULL     (c >= 0)              384.4M evals/sec    7.39 cycles/eval
  float  NOT NULL (c >= 0 AND c < 2)    583.2M evals/sec    4.80 cycles/eval
  float  NULL     (c >= 0 AND c < 2)    312.2M evals/sec    9.12 cycles/eval
  double NOT NULL (c = 0)               614.0M evals/sec    4.56 cycles/eval
  double NULL     (c = 0)               471.5M evals/sec    5.99 cycles/eval
  double NOT NULL (c >= 0)              623.0M evals/sec    4.48 cycles/eval
  double NULL     (c >= 0)              379.9M evals/sec    7.47 cycles/eval
  double NOT NULL (c >= 0 AND c < 2)    599.5M evals/sec    4.67 cycles/eval
  double NULL     (c >= 0 AND c < 2)    415.2M evals/sec    6.82 cycles/eval

Detailed results after:

  int8   NOT NULL (c = 0)              3660.3M evals/sec    0.76 cycles/eval
  int8   NULL     (c = 0)              3657.1M evals/sec    0.76 cycles/eval
  int8   NOT NULL (c >= 0)             3712.0M evals/sec    0.75 cycles/eval
  int8   NULL     (c >= 0)             3618.9M evals/sec    0.78 cycles/eval
  int8   NOT NULL (c >= 0 AND c < 2)   1661.9M evals/sec    1.73 cycles/eval
  int8   NULL     (c >= 0 AND c < 2)   1663.4M evals/sec    1.77 cycles/eval
  int16  NOT NULL (c = 0)              3781.4M evals/sec    0.73 cycles/eval
  int16  NULL     (c = 0)              3738.3M evals/sec    0.74 cycles/eval
  int16  NOT NULL (c >= 0)             3672.9M evals/sec    0.76 cycles/eval
  int16  NULL     (c >= 0)             3767.4M evals/sec    0.75 cycles/eval
  int16  NOT NULL (c >= 0 AND c < 2)   1654.3M evals/sec    1.77 cycles/eval
  int16  NULL     (c >= 0 AND c < 2)   1651.6M evals/sec    1.72 cycles/eval
  int32  NOT NULL (c = 0)              2925.1M evals/sec    0.97 cycles/eval
  int32  NULL     (c = 0)              2844.4M evals/sec    0.97 cycles/eval
  int32  NOT NULL (c >= 0)             2942.7M evals/sec    0.95 cycles/eval
  int32  NULL     (c >= 0)             2900.8M evals/sec    0.98 cycles/eval
  int32  NOT NULL (c >= 0 AND c < 2)   1641.1M evals/sec    1.73 cycles/eval
  int32  NULL     (c >= 0 AND c < 2)   1638.8M evals/sec    1.75 cycles/eval
  int64  NOT NULL (c = 0)              3878.6M evals/sec    0.71 cycles/eval
  int64  NULL     (c = 0)              3763.9M evals/sec    0.76 cycles/eval
  int64  NOT NULL (c >= 0)             2784.4M evals/sec    1.01 cycles/eval
  int64  NULL     (c >= 0)             2782.6M evals/sec    1.01 cycles/eval
  int64  NOT NULL (c >= 0 AND c < 2)   1671.4M evals/sec    1.71 cycles/eval
  int64  NULL     (c >= 0 AND c < 2)   1741.5M evals/sec    1.64 cycles/eval
  float  NOT NULL (c = 0)              3940.8M evals/sec    0.72 cycles/eval
  float  NULL     (c = 0)              3820.9M evals/sec    0.72 cycles/eval
  float  NOT NULL (c >= 0)             4571.4M evals/sec    0.60 cycles/eval
  float  NULL     (c >= 0)             4741.3M evals/sec    0.58 cycles/eval
  float  NOT NULL (c >= 0 AND c < 2)   1318.0M evals/sec    2.18 cycles/eval
  float  NULL     (c >= 0 AND c < 2)   1262.3M evals/sec    2.28 cycles/eval
  double NOT NULL (c = 0)              2813.4M evals/sec    1.01 cycles/eval
  double NULL     (c = 0)              2664.6M evals/sec    1.06 cycles/eval
  double NOT NULL (c >= 0)             3620.8M evals/sec    0.77 cycles/eval
  double NULL     (c >= 0)             3657.2M evals/sec    0.76 cycles/eval
  double NOT NULL (c >= 0 AND c < 2)   1248.8M evals/sec    2.30 cycles/eval
  double NULL     (c >= 0 AND c < 2)   1253.7M evals/sec    2.28 cycles/eval

Change-Id: I9dd062961a3cd2c892997d6aba12684e603628a1
Reviewed-on: http://gerrit.cloudera.org:8080/13591
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/common/CMakeLists.txt
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
3 files changed, 147 insertions(+), 13 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/13591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9dd062961a3cd2c892997d6aba12684e603628a1
Gerrit-Change-Number: 13591
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
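
For readers unfamiliar with the technique named in the commit message, here is a minimal sketch of what an unrolled-by-8 range predicate could look like. It is not the code from column_predicate.cc. The function name EvalRangeUnrolled8, the bitmap layout (one bit per row, least significant bit first), and the omission of null handling are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, NOT Kudu's actual implementation.
// Evaluates (lower <= data[i] && data[i] < upper) for n rows and ANDs the
// result into `sel`, an assumed selection bitmap with one bit per row
// (row i lives in bit i % 8 of sel[i / 8]). Null handling is omitted.
template <typename T>
void EvalRangeUnrolled8(const T* data, size_t n, T lower, T upper,
                        uint8_t* sel) {
  assert(data != nullptr && sel != nullptr);
  size_t i = 0;
  // Main loop: 8 rows per iteration, one selection byte per iteration.
  for (; i + 8 <= n; i += 8) {
    unsigned bits = 0;
    // Fixed trip count of 8, which compilers can typically unroll fully.
    // Bitwise & instead of && keeps the comparison free of branches.
    for (unsigned j = 0; j < 8; ++j) {
      const T v = data[i + j];
      const unsigned match =
          static_cast<unsigned>(v >= lower) & static_cast<unsigned>(v < upper);
      bits |= match << j;
    }
    sel[i / 8] &= static_cast<uint8_t>(bits);
  }
  // Tail: fewer than 8 rows remain, so clear bits one row at a time.
  for (; i < n; ++i) {
    const T v = data[i];
    if (!((v >= lower) & (v < upper))) {
      sel[i / 8] &= static_cast<uint8_t>(~(1u << (i % 8)));
    }
  }
}
```

Packing eight comparison results into one byte replaces eight per-row conditional bit updates with a single AND per byte, which is one plausible source of the reduction in branches and branch-misses reported above.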