[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

Zach Amsden (Code Review) Thu, 08 Jun 2017 15:18:52 -0700

Zach Amsden has uploaded a new patch set (#17).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
......................................................................


IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status: Runs with ASAN, runs without crashes on ee tests.  Performance
results inconclusive; this may not be worth the complexity unless we
get really selective or really expensive predicates.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

Performance:  Hard to get!  Simple predicates did not show
improvement, in fact regressed.  I used a TPC-H scale 30 dataset,
duplicated 3x into a 'biglineitem' table.

 select count(*) from biglineitem WHERE l_returnflag = 'A';
    1.43s -> 1.53s

 select count(*) from biglineitem WHERE l_shipinstruct = 'DELIVER IN
PERSON';
    1.43s -> 1.53s

 select count(*) from biglineitem WHERE l_quantity > 49;

    0.93s -> 0.93s

 select count(*) from biglineitem WHERE instr(l_shipdate, '1994-11') >
0;

    2.33s -> 1.03s

So this appears to only be a win for expensive predicates.

Update: I added changes to make computed predicate costs visible from
the frontend to the backend, and tried a TPC-DS scale 10 dataset, which
has better queries (lots of IN groups).  Still there is a regression in
raw query performance.

Update again: reversing one branch to UNLIKELY in the ir compiled code
gave these results on TPC-DSx10:

Q46 (limit modified to 100000):
  3.05 -> 2.89 sec
Q27 (limit modified to 100000):
  3.13 -> 3.09 sec

Update (6.5.2017): Switched back to bitset evaluation for scratch batch
rather than using an extra byte per tuple row.  Surprisingly, this is
the best performing diff yet for selective predicates.  Using TPC-DS-10
with a filter:

Query: select count(*) from
  store_sales,
  customer,
  customer_address
where store_sales.ss_customer_sk = customer.c_customer_sk and
      customer.c_current_addr_sk = customer_address.ca_address_sk
 and customer_address.ca_city IN (... list of ~200 cities)

This almost makes parity.  I get 3836140 rows in 1.41-1.45s with this
diff and in 1.39-1.45s with the same caching optimization for batch_size
on top of the last change (Tim's parquest column reader optimizations).
So in a totally fair comparison, we are still losing :(

Update (6.8.2017): Tried Tim's suggestions.  Best I could get on this
query was now only 1.54s.  My host OS kernel did change during this time
so I re-measured baseline and got a best time of 1.53s.  So we seem to
be making parity, but not showcasing a big win.

Maybe runtime filters will pay off better; they already are toggled
dynamically so nothing should be lost.

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.h
M be/src/runtime/tuple.h
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
20 files changed, 559 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/17
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <zams...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com>

[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

Reply via email to