Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13591

to look at the new patch set (#4).

Change subject: KUDU-2846 (part 1): optimize predicate evaluation for primitives
......................................................................

KUDU-2846 (part 1): optimize predicate evaluation for primitives

This change switches predicate evaluation for primitive columns to an
optimized, unrolled-by-8 implementation.
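
To illustrate the general idea (this is a minimal sketch, not the code
in this patch): an unrolled-by-8 loop can evaluate eight rows at a time
without per-row branches and fold the results into one byte of a
selection bitmap. The function name, the LSB-first bit order, and the
"n is a multiple of 8" restriction are all assumptions made for the
example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not Kudu's actual implementation.
// Evaluates (lower <= v < upper) for n values and ANDs the result into a
// selection bitmap, where bit (i % 8) of sel[i / 8] corresponds to row i.
// Assumption: n is a multiple of 8 (a real version would need a tail loop).
template <typename T>
void EvalRangeUnrolled8(const T* vals, size_t n, T lower, T upper,
                        uint8_t* sel) {
  assert(n % 8 == 0);
  // Bitwise & (rather than &&) keeps the comparison free of short-circuit
  // branches; the result is 0 or 1.
  auto match = [lower, upper](T x) -> unsigned {
    return static_cast<unsigned>((x >= lower) & (x < upper));
  };
  for (size_t i = 0; i < n; i += 8) {
    const T* v = vals + i;
    // Eight independent comparisons, packed into one byte.
    const unsigned bits =
        (match(v[0]) << 0) | (match(v[1]) << 1) |
        (match(v[2]) << 2) | (match(v[3]) << 3) |
        (match(v[4]) << 4) | (match(v[5]) << 5) |
        (match(v[6]) << 6) | (match(v[7]) << 7);
    // One read-modify-write of the bitmap per 8 rows.
    sel[i / 8] &= static_cast<uint8_t>(bits);
  }
}
```

In a layout like this, a nullable column could be handled by also
ANDing the matching byte of a non-null bitmap into sel, so the per-row
work would not depend on nullability.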

Performance is improved by roughly 1.3-2.8x depending on the particular
predicate, type, and nullability (average around 2x). Per the perf-stat
output below, branches are reduced by about 8x and branch-misses by
about 14.5x.

Looking at the "after" perf-stat results, the instructions-per-cycle are
way down (3.44 to 1.21), which indicates we're probably stalled on
instruction dependencies or port saturation. This is also indicated by
the fact that the smaller ints don't run any faster than the large ints
(which wouldn't be the case if we were limited by load/store bandwidth).
The likely next step is to use SIMD to do comparisons in parallel, as
suggested in the JIRA. Unfortunately, the compiler doesn't seem to
auto-vectorize these loops, so further gains will require hand-written
vectorization code. For now, we'll start with this easy win.
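
For reference, a hand-vectorized comparison might look roughly like the
sketch below. This is an illustration under stated assumptions, not part
of this patch and not the approach from the JIRA: it handles only an
int32 equality predicate, uses SSE2 intrinsics where available (with a
scalar fallback otherwise), and the function names and bitmap layout are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: returns a 4-bit mask with bit i set iff
// vals[i] == target, for i in [0, 4).
inline unsigned EqualsMask4(const int32_t* vals, int32_t target) {
#if defined(__SSE2__)
  // Unaligned load of four 32-bit lanes.
  const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(vals));
  // Each matching lane becomes all-ones, each non-matching lane all-zeros.
  const __m128i eq = _mm_cmpeq_epi32(v, _mm_set1_epi32(target));
  // movemask_ps gathers the top bit of each 32-bit lane into bits 0..3.
  return static_cast<unsigned>(_mm_movemask_ps(_mm_castsi128_ps(eq)));
#else
  // Scalar fallback for targets without SSE2.
  unsigned m = 0;
  for (int i = 0; i < 4; ++i) {
    m |= static_cast<unsigned>(vals[i] == target) << i;
  }
  return m;
#endif
}

// ANDs (vals[i] == target) into a selection bitmap, 8 rows per byte,
// LSB first. Assumption: n is a multiple of 8.
inline void EvalEqualsSimd(const int32_t* vals, size_t n, int32_t target,
                           uint8_t* sel) {
  assert(n % 8 == 0);
  for (size_t i = 0; i < n; i += 8) {
    const unsigned bits = EqualsMask4(vals + i, target) |
                          (EqualsMask4(vals + i + 4, target) << 4);
    sel[i / 8] &= static_cast<uint8_t>(bits);
  }
}
```

Narrower types would fit more lanes per register (for example 16 int8
values in 128 bits), which is where a SIMD version could start to favor
the smaller ints.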

perf-stat before:
 Performance counter stats for 'build/latest/bin/column_predicate-test --gtest_filter=*Bench*':

      73905.379627      task-clock (msec)         #    0.997 CPUs utilized
               504      context-switches          #    0.007 K/sec
                19      cpu-migrations            #    0.000 K/sec
             1,296      page-faults               #    0.018 K/sec
   272,810,081,028      cycles                    #    3.691 GHz
   938,488,388,743      instructions              #    3.44  insn per cycle
   148,052,698,322      branches                  # 2003.274 M/sec
       882,311,138      branch-misses             #    0.60% of all branches

perf-stat after:
 Performance counter stats for 'build/latest/bin/column_predicate-test --gtest_filter=*Bench*':

      38024.082495      task-clock (msec)         #    0.996 CPUs utilized
               252      context-switches          #    0.007 K/sec
                 7      cpu-migrations            #    0.000 K/sec
             1,295      page-faults               #    0.034 K/sec
   142,231,469,257      cycles                    #    3.741 GHz
   172,437,810,470      instructions              #    1.21  insn per cycle
    18,460,117,439      branches                  #  485.485 M/sec
        60,960,125      branch-misses             #    0.33% of all branches
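
As a rough cross-check, the ratios between the two runs above can be
recomputed from the raw counters. A minimal sketch; the struct and
function names are made up for illustration, and the numbers are copied
from the perf-stat output above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the perf-stat counters of one run.
struct PerfCounters {
  double task_clock_msec;
  uint64_t cycles;
  uint64_t instructions;
  uint64_t branches;
  uint64_t branch_misses;
};

// Values copied from the "before" and "after" perf-stat output.
const PerfCounters kBefore{73905.379627, 272810081028ULL, 938488388743ULL,
                           148052698322ULL, 882311138ULL};
const PerfCounters kAfter{38024.082495, 142231469257ULL, 172437810470ULL,
                          18460117439ULL, 60960125ULL};

// Instructions per cycle for one run.
inline double Ipc(const PerfCounters& c) {
  return static_cast<double>(c.instructions) / static_cast<double>(c.cycles);
}

// Reduction factor between two runs: how many times smaller "after" is.
inline double Ratio(double before, double after) {
  assert(after > 0);
  return before / after;
}
```

For example, Ratio(kBefore.task_clock_msec, kAfter.task_clock_msec)
gives the overall wall-clock speedup of the benchmark run, and
Ratio(kBefore.branches, kAfter.branches) gives the reduction in
branches.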

Detailed results before:
  int8   NOT NULL   (c = 0)              632.1M evals/sec   4.44 cycles/eval
  int8   NULL       (c = 0)              515.6M evals/sec   5.48 cycles/eval
  int8   NOT NULL   (c >= 0)             630.8M evals/sec   4.45 cycles/eval
  int8   NULL       (c >= 0)             426.8M evals/sec   6.64 cycles/eval
  int8   NOT NULL   (c >= 0 AND c < 2)   632.6M evals/sec   4.44 cycles/eval
  int8   NULL       (c >= 0 AND c < 2)   384.7M evals/sec   7.38 cycles/eval
  int16  NOT NULL   (c = 0)              644.4M evals/sec   4.34 cycles/eval
  int16  NULL       (c = 0)              524.6M evals/sec   5.37 cycles/eval
  int16  NOT NULL   (c >= 0)             638.4M evals/sec   4.37 cycles/eval
  int16  NULL       (c >= 0)             458.8M evals/sec   6.17 cycles/eval
  int16  NOT NULL   (c >= 0 AND c < 2)   635.3M evals/sec   4.40 cycles/eval
  int16  NULL       (c >= 0 AND c < 2)   335.1M evals/sec   8.50 cycles/eval
  int32  NOT NULL   (c = 0)              645.2M evals/sec   4.34 cycles/eval
  int32  NULL       (c = 0)              492.6M evals/sec   5.77 cycles/eval
  int32  NOT NULL   (c >= 0)             608.6M evals/sec   4.64 cycles/eval
  int32  NULL       (c >= 0)             440.7M evals/sec   6.48 cycles/eval
  int32  NOT NULL   (c >= 0 AND c < 2)   637.8M evals/sec   4.43 cycles/eval
  int32  NULL       (c >= 0 AND c < 2)   348.0M evals/sec   8.22 cycles/eval
  int64  NOT NULL   (c = 0)              642.7M evals/sec   4.36 cycles/eval
  int64  NULL       (c = 0)              505.3M evals/sec   5.60 cycles/eval
  int64  NOT NULL   (c >= 0)             643.5M evals/sec   4.34 cycles/eval
  int64  NULL       (c >= 0)             472.8M evals/sec   6.00 cycles/eval
  int64  NOT NULL   (c >= 0 AND c < 2)   634.2M evals/sec   4.43 cycles/eval
  int64  NULL       (c >= 0 AND c < 2)   396.7M evals/sec   7.21 cycles/eval
  float  NOT NULL   (c = 0)              604.6M evals/sec   4.63 cycles/eval
  float  NULL       (c = 0)              406.7M evals/sec   7.05 cycles/eval
  float  NOT NULL   (c >= 0)             545.3M evals/sec   5.20 cycles/eval
  float  NULL       (c >= 0)             384.4M evals/sec   7.39 cycles/eval
  float  NOT NULL   (c >= 0 AND c < 2)   583.2M evals/sec   4.80 cycles/eval
  float  NULL       (c >= 0 AND c < 2)   312.2M evals/sec   9.12 cycles/eval
  double NOT NULL   (c = 0)              614.0M evals/sec   4.56 cycles/eval
  double NULL       (c = 0)              471.5M evals/sec   5.99 cycles/eval
  double NOT NULL   (c >= 0)             623.0M evals/sec   4.48 cycles/eval
  double NULL       (c >= 0)             379.9M evals/sec   7.47 cycles/eval
  double NOT NULL   (c >= 0 AND c < 2)   599.5M evals/sec   4.67 cycles/eval
  double NULL       (c >= 0 AND c < 2)   415.2M evals/sec   6.82 cycles/eval

Detailed results after:
  int8   NOT NULL   (c = 0)             1053.2M evals/sec   2.74 cycles/eval
  int8   NULL       (c = 0)             1044.6M evals/sec   2.77 cycles/eval
  int8   NOT NULL   (c >= 0)            1044.6M evals/sec   2.77 cycles/eval
  int8   NULL       (c >= 0)            1045.0M evals/sec   2.76 cycles/eval
  int8   NOT NULL   (c >= 0 AND c < 2)   943.8M evals/sec   3.03 cycles/eval
  int8   NULL       (c >= 0 AND c < 2)   933.9M evals/sec   3.07 cycles/eval
  int16  NOT NULL   (c = 0)             1039.2M evals/sec   2.78 cycles/eval
  int16  NULL       (c = 0)             1037.2M evals/sec   2.79 cycles/eval
  int16  NOT NULL   (c >= 0)            1041.2M evals/sec   2.78 cycles/eval
  int16  NULL       (c >= 0)            1049.2M evals/sec   2.76 cycles/eval
  int16  NOT NULL   (c >= 0 AND c < 2)   948.3M evals/sec   3.00 cycles/eval
  int16  NULL       (c >= 0 AND c < 2)   951.1M evals/sec   2.99 cycles/eval
  int32  NOT NULL   (c = 0)             1049.5M evals/sec   2.74 cycles/eval
  int32  NULL       (c = 0)             1050.3M evals/sec   2.74 cycles/eval
  int32  NOT NULL   (c >= 0)            1040.9M evals/sec   2.76 cycles/eval
  int32  NULL       (c >= 0)            1050.1M evals/sec   2.75 cycles/eval
  int32  NOT NULL   (c >= 0 AND c < 2)   944.7M evals/sec   2.99 cycles/eval
  int32  NULL       (c >= 0 AND c < 2)   931.0M evals/sec   3.03 cycles/eval
  int64  NOT NULL   (c = 0)             1040.7M evals/sec   2.75 cycles/eval
  int64  NULL       (c = 0)             1040.8M evals/sec   2.76 cycles/eval
  int64  NOT NULL   (c >= 0)            1036.6M evals/sec   2.77 cycles/eval
  int64  NULL       (c >= 0)            1044.9M evals/sec   2.75 cycles/eval
  int64  NOT NULL   (c >= 0 AND c < 2)   941.2M evals/sec   3.02 cycles/eval
  int64  NULL       (c >= 0 AND c < 2)   930.9M evals/sec   3.04 cycles/eval
  float  NOT NULL   (c = 0)             1040.6M evals/sec   2.77 cycles/eval
  float  NULL       (c = 0)             1035.7M evals/sec   2.78 cycles/eval
  float  NOT NULL   (c >= 0)             960.5M evals/sec   3.00 cycles/eval
  float  NULL       (c >= 0)             955.2M evals/sec   3.01 cycles/eval
  float  NOT NULL   (c >= 0 AND c < 2)   797.5M evals/sec   3.56 cycles/eval
  float  NULL       (c >= 0 AND c < 2)   797.6M evals/sec   3.56 cycles/eval
  double NOT NULL   (c = 0)             1036.4M evals/sec   2.77 cycles/eval
  double NULL       (c = 0)              988.7M evals/sec   2.91 cycles/eval
  double NOT NULL   (c >= 0)             924.2M evals/sec   3.11 cycles/eval
  double NULL       (c >= 0)             930.9M evals/sec   3.10 cycles/eval
  double NOT NULL   (c >= 0 AND c < 2)   800.0M evals/sec   3.55 cycles/eval
  double NULL       (c >= 0 AND c < 2)   802.5M evals/sec   3.52 cycles/eval

Change-Id: I9dd062961a3cd2c892997d6aba12684e603628a1
---
M src/kudu/common/CMakeLists.txt
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
3 files changed, 152 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/13591/4
--
To view, visit http://gerrit.cloudera.org:8080/13591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9dd062961a3cd2c892997d6aba12684e603628a1
Gerrit-Change-Number: 13591
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
