[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-06-20 Thread Zach Amsden (Code Review)
Zach Amsden has abandoned this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Abandoned

Not enough win for the complexity.

-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-06-16 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#19).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status: Runs with ASAN, runs without crashes on ee tests.  Performance
results inconclusive; this may not be worth the complexity unless we
get really selective or really expensive predicates.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

Performance:  Hard to get!  Simple predicates did not show
improvement, in fact regressed.  I used a TPC-H scale 30 dataset,
duplicated 3x into a 'biglineitem' table.

 select count(*) from biglineitem WHERE l_returnflag = 'A';
1.43s -> 1.53s

 select count(*) from biglineitem WHERE l_shipinstruct = 'DELIVER IN
PERSON';
1.43s -> 1.53s

 select count(*) from biglineitem WHERE l_quantity > 49;

0.93s -> 0.93s

 select count(*) from biglineitem WHERE instr(l_shipdate, '1994-11') >
0;

2.33s -> 1.03s

So this appears to only be a win for expensive predicates.

Update: I added changes to make computed predicate costs visible from
the frontend to the backend, and tried a TPC-DS scale 10 dataset, which
has better queries (lots of IN groups).  Still there is a regression in
raw query performance.

Update again: reversing one branch to UNLIKELY in the ir compiled code
gave these results on TPC-DSx10:

Q46 (limit modified to 10):
  3.05 -> 2.89 sec
Q27 (limit modified to 10):
  3.13 -> 3.09 sec

Update (6.5.2017): Switched back to bitset evaluation for scratch batch
rather than using an extra byte per tuple row.  Surprisingly, this is
the best performing diff yet for selective predicates.  Using TPC-DS-10
with a filter:

Query: select count(*) from
  store_sales,
  customer,
  customer_address
where store_sales.ss_customer_sk = customer.c_customer_sk and
  customer.c_current_addr_sk = customer_address.ca_address_sk
 and customer_address.ca_city IN (... list of ~200 cities)

This almost makes parity.  I get 3836140 rows in 1.41-1.45s with this
diff and in 1.39-1.45s with the same caching optimization for batch_size
on top of the last change (Tim's parquest column reader optimizations).
So in a totally fair comparison, we are still losing :(

Update (6.8.2017): Tried Tim's suggestions.  Best I could get on this
query was now only 1.54s.  My host OS kernel did change during this time
so I re-measured baseline and got a best time of 1.53s.  So we seem to
be making parity, but not showcasing a big win.

Maybe runtime filters will pay off better; they already are toggled
dynamically so nothing should be lost.

Update (6.16.2017): Tried runtime filters, simplified the logic as much
as possible.  Everything is still a wash:

Query: select count(*) from
  store_sales,
  customer,
  customer_address
where store_sales.ss_customer_sk = customer.c_customer_sk and
  customer.c_current_addr_sk = customer_address.ca_address_sk
 and customer_address.ca_city IN
 ('Providence', 'Mount Zion', 'Greenville', 'Mount Olive', 'Wildwood',
   'Hillcrest', 'Crossroads', 'Belmont', 'Wilson', 'Riverdale',
'Newport',
   'Springdale', 'Mountain View', 'Forest Hills', 'Bridgeport', 'White
Oak',
   'Oakwood', 'Newtown', 'Macedonia', 'Five Forks', 'Edgewood',
'Arlington',
   'Unionville', 'Red Hill', 'Clinton', 'Woodville', 'Summit',
'Jamestown',
   'Hamilton', 'Clifton', 'Union Hill', 'Sulphur Springs', 'Lincoln',
   'Lebanon', 'Farmington', 'Enterprise', 'Brownsville', 'Ashland',
'Woodland',
   'Waterloo', 'Valley View', 'Oak Ridge', 'Maple Grove', 'Kingston',
   'Jackson', 'Greenfield', 'Green Acres', 'Plainview', 'Marion',
'Florence',
   'Qarth', 'R\'lyeh', 'Mos Eisley', 'King\'s Landing', "Q'onos",

[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-06-16 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#18).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status: Runs with ASAN, runs without crashes on ee tests.  Performance
results inconclusive; this may not be worth the complexity unless we
get really selective or really expensive predicates.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

Performance:  Hard to get!  Simple predicates did not show
improvement, in fact regressed.  I used a TPC-H scale 30 dataset,
duplicated 3x into a 'biglineitem' table.

 select count(*) from biglineitem WHERE l_returnflag = 'A';
1.43s -> 1.53s

 select count(*) from biglineitem WHERE l_shipinstruct = 'DELIVER IN
PERSON';
1.43s -> 1.53s

 select count(*) from biglineitem WHERE l_quantity > 49;

0.93s -> 0.93s

 select count(*) from biglineitem WHERE instr(l_shipdate, '1994-11') >
0;

2.33s -> 1.03s

So this appears to only be a win for expensive predicates.

Update: I added changes to make computed predicate costs visible from
the frontend to the backend, and tried a TPC-DS scale 10 dataset, which
has better queries (lots of IN groups).  Still there is a regression in
raw query performance.

Update again: reversing one branch to UNLIKELY in the ir compiled code
gave these results on TPC-DSx10:

Q46 (limit modified to 10):
  3.05 -> 2.89 sec
Q27 (limit modified to 10):
  3.13 -> 3.09 sec

Update (6.5.2017): Switched back to bitset evaluation for scratch batch
rather than using an extra byte per tuple row.  Surprisingly, this is
the best performing diff yet for selective predicates.  Using TPC-DS-10
with a filter:

Query: select count(*) from
  store_sales,
  customer,
  customer_address
where store_sales.ss_customer_sk = customer.c_customer_sk and
  customer.c_current_addr_sk = customer_address.ca_address_sk
 and customer_address.ca_city IN (... list of ~200 cities)

This almost makes parity.  I get 3836140 rows in 1.41-1.45s with this
diff and in 1.39-1.45s with the same caching optimization for batch_size
on top of the last change (Tim's parquest column reader optimizations).
So in a totally fair comparison, we are still losing :(

Update (6.8.2017): Tried Tim's suggestions.  Best I could get on this
query was now only 1.54s.  My host OS kernel did change during this time
so I re-measured baseline and got a best time of 1.53s.  So we seem to
be making parity, but not showcasing a big win.

Maybe runtime filters will pay off better; they already are toggled
dynamically so nothing should be lost.

Update (6.16.2017): Tried runtime filters, simplified the logic as much
as possible.  Everything is still a wash:

Query: select count(*) from
  store_sales,
  customer,
  customer_address
where store_sales.ss_customer_sk = customer.c_customer_sk and
  customer.c_current_addr_sk = customer_address.ca_address_sk
 and customer_address.ca_city IN
 ('Providence', 'Mount Zion', 'Greenville', 'Mount Olive', 'Wildwood',
   'Hillcrest', 'Crossroads', 'Belmont', 'Wilson', 'Riverdale',
'Newport',
   'Springdale', 'Mountain View', 'Forest Hills', 'Bridgeport', 'White
Oak',
   'Oakwood', 'Newtown', 'Macedonia', 'Five Forks', 'Edgewood',
'Arlington',
   'Unionville', 'Red Hill', 'Clinton', 'Woodville', 'Summit',
'Jamestown',
   'Hamilton', 'Clifton', 'Union Hill', 'Sulphur Springs', 'Lincoln',
   'Lebanon', 'Farmington', 'Enterprise', 'Brownsville', 'Ashland',
'Woodland',
   'Waterloo', 'Valley View', 'Oak Ridge', 'Maple Grove', 'Kingston',
   'Jackson', 'Greenfield', 'Green Acres', 'Plainview', 'Marion',
'Florence',
   'Qarth', 'R\'lyeh', 'Mos Eisley', 'King\'s Landing', "Q'onos",

[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-06-08 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 420:   LIKELY(dictionary_results_.num_bits() > 0)) {
> I think the predicate evaluation on 40,000 values is probably cheap enough 
We can certainly try it.  I was worried the pre-computation might be expensive 
if we have, say, string manipulation in predicates, as opposed to inexpensive, 
simple comparisons.  Still, even if we have the same number of predicate 
evaluations, they end up going through the unoptimized EvalConjuncts() path, as 
opposed to the codegen'd path.

As for IS_FILTERED, that is set when the column reader is created.  
IS_DICT_ENCODED is determined per page.  We are left with no way to remove 
dictionary_results_.num_bits() on a per-row basis, since we can't unset 
IS_FILTERED, and IS_DICT_ENCODED may be true even if the encoding did not cover 
all values.

I'll try precomputing all the values and see what happens.


-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-06-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 430: if (!dictionary_results_.Get(dict_index)) {
I'm thinking that this branch lengthened the critical path through this 
function for the non-selective case, since the two loads can't necessarily 
happen in parallel. Hard to know if that's the source of the regression without 
profiling. We could try to do something like this so that the two loads can 
happen in parallel:
  
  void* slot = tuple->GetSlot(tuple_offset_);
  T* val_ptr = reinterpret_cast(slot);
  dict_decoder_.GetValue(dict_index, val_ptr);
  if (!dictionary_results_.Get(dict_index)) {
filtered_rows_->Set(*num_values + val_count, true);
  }

Or maybe even change Set() so that it uses branch-free code and do something 
like:

   filtered_rows_->Set(!dictionary_results_.Get(dict_index));


-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-06-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 420:   LIKELY(dictionary_results_.num_bits() > 0)) {
> Not liking this branch, but it is unavoidable without pre-computation overh
I think the predicate evaluation on 40,000 values is probably cheap enough to 
just do it. In most cases we would have to evaluate the predicates anyway when 
processing the dictionary-encoded pages.

I don't think I fully understand the inexpensive predicate check that you're 
envisioning. Does it really need to be done per-row instead of per-batch (like 
the IS_DICT_ENCODED branch).


-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-06-05 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 16:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/hdfs-parquet-scanner-ir.cc
File be/src/exec/hdfs-parquet-scanner-ir.cc:

Line 89: idx = scratch_batch_.NextValidTuple(idx);
Surprising, despite apparently being costlier, the bitmap scan appears to be 
fast, especially when predicates are selective.


http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 420:   LIKELY(dictionary_results_.num_bits() > 0)) {
Not liking this branch, but it is unavoidable without pre-computation overhead. 
 When we have a partially dictionary encoding on a filtered column that spills 
to plain encoding, we don't pre-evaluate the dictionary against predicates.  We 
could do so anyway, but it may end up being an unnecessary up-front cost.  
However, the number of dictionary entries before spilling to plain is bounded, 
so the cost should be fixed.

Maybe we can afford to do so if the predicates are "inexpensive", but then we 
are still left with this check for expensive predicate (and have no way to 
determine cost here without another check).


http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 172: // pointer.  We don't need to transfer the extra filter byte.
Comment obsolete with latest diff.


http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 94:   NullIndicatorOffset(int byte_offset = 0, int bit_offset = -1)
This fixes an apparent bug, where byte_offset = -1 (passed in constructor for 
ParquetColumnReader as NullIndicatorOffset(-1, -1) ends up converting into code 
which will execute something like:

tuple_mem[-1] |= 0

Which despite not modifying memory, could be an out of bounds memory access.


http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/util/bitmap-test.cc
File be/src/util/bitmap-test.cc:

Line 81:   for (index = bm.NextSetBit(); index < size; index = 
bm.NextSetBit(index)) {
test needs updating for behavior change: index -> index + 1


-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-06-05 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#16).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status: Runs with ASAN, runs without crashes on ee tests.  Performance
results inconclusive; this may not be worth the complexity unless we
get really selective or really expensive predicates.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

Performance:  Hard to get!  Simple predicates did not show
improvement, in fact regressed.  I used a TPC-H scale 30 dataset,
duplicated 3x into a 'biglineitem' table.

 select count(*) from biglineitem WHERE l_returnflag = 'A';
1.43s -> 1.53s

 select count(*) from biglineitem WHERE l_shipinstruct = 'DELIVER IN
PERSON';
1.43s -> 1.53s

 select count(*) from biglineitem WHERE l_quantity > 49;

0.93s -> 0.93s

 select count(*) from biglineitem WHERE instr(l_shipdate, '1994-11') >
0;

2.33s -> 1.03s

So this appears to only be a win for expensive predicates.

Update: I added changes to make computed predicate costs visible from
the frontend to the backend, and tried a TPC-DS scale 10 dataset, which
has better queries (lots of IN groups).  Still there is a regression in
raw query performance.

Update again: reversing one branch to UNLIKELY in the ir compiled code
gave these results on TPC-DSx10:

Q46 (limit modified to 10):
  3.05 -> 2.89 sec
Q27 (limit modified to 10):
  3.13 -> 3.09 sec

Update (6.5.2017): Switched back to bitset evaluation for scratch batch
rather than using an extra byte per tuple row.  Surprisingly, this is
the best performing diff yet for selective predicates.  Using TPC-DS-10
with a filter:

Query: select count(*) from
  store_sales,
  customer,
  customer_address
where store_sales.ss_customer_sk = customer.c_customer_sk and
  customer.c_current_addr_sk = customer_address.ca_address_sk
 and customer_address.ca_city IN (... list of ~200 cities)

This almost makes parity.  I get 3836140 rows in 1.41-1.45s with this
diff and in 1.39-1.45s with the same caching optimization for batch_size
on top of the last change (Tim's parquest column reader optimizations).
So in a totally fair comparison, we are still losing :(

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.h
M be/src/runtime/tuple.h
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
20 files changed, 552 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/16
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 

[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-06-01 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 14:

I'm getting kind of disheartened by this approach.  The biggest obstacle is 
that we don't know ahead of time what conjuncts will be useful for filtering, 
as we don't know which columns are fully dictionary encoded.  None of the 
approaches I have tried really make parity with the baseline, and only very 
expensive conjuncts on sets of small to mid-size distinct values appear to be a 
win.

It is hard to come up with a data set and query that perform much better 
without artificially making poor queries to begin with.  Nothing reasonable 
I've tried on TPC-Hx30 or TPC-DSx10 have produced great results.

-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-06-01 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#14).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status: Runs with ASAN, runs without crashes on ee tests.  Performance
results inconclusive; this may not be worth the complexity unless we
get really selective or really expensive predicates.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

Performance:  Hard to get!  Simple predicates did not show
improvement, in fact regressed.  I used a TPC-H scale 30 dataset,
duplicated 3x into a 'biglineitem' table.

 select count(*) from biglineitem WHERE l_returnflag = 'A';
1.43s -> 1.53s

 select count(*) from biglineitem WHERE l_shipinstruct = 'DELIVER IN
PERSON';
1.43s -> 1.53s

 select count(*) from biglineitem WHERE l_quantity > 49;

0.93s -> 0.93s

 select count(*) from biglineitem WHERE instr(l_shipdate, '1994-11') >
0;

2.33s -> 1.03s

So this appears to only be a win for expensive predicates.

Update: I added changes to make computed predicate costs visible from
the frontend to the backend, and tried a TPC-DS scale 10 dataset, which
has better queries (lots of IN groups).  Still there is a regression in
raw query performance.

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.h
M be/src/runtime/tuple.h
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
20 files changed, 527 insertions(+), 142 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/14
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-31 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#13).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status: Ready for review.  Runs with ASAN, runs without crashes
on ee tests.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

Performance:  Hard to get!  Simple predicates did not show
improvement, in fact regressed.  I used a TPC-H scale 30 dataset,
duplicated 3x into a 'biglineitem' table.

 select count(*) from biglineitem WHERE l_returnflag = 'A';
1.43s -> 1.53s

 select count(*) from biglineitem WHERE l_shipinstruct = 'DELIVER IN
PERSON';
1.43s -> 1.53s

 select count(*) from biglineitem WHERE l_quantity > 49;

0.93s -> 0.93s

 select count(*) from biglineitem WHERE instr(l_shipdate, '1994-11') >
0;

2.33s -> 1.03s

So this appears to only be a win for expensive predicates.

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.h
M be/src/runtime/tuple.h
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
18 files changed, 527 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/13
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-31 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#12).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status: Ready for review.  Seems to be passing all tests, runs with
ASAN, and gives expected results.  Need to come up with some specific
test cases that exercise this functionality and measure performance.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.h
M be/src/runtime/tuple.h
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
18 files changed, 527 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-30 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 9:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

PS9, Line 567: skippable_map
> skippable_map != nullptr
Done


PS9, Line 586: 0 
> huh ?
Was diverting for testing


PS9, Line 586: (*skippable_map)[i] == true
> (*skippable_map)[i]
Done


PS9, Line 590: bool
> skip_val
Done


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/hdfs-parquet-scanner-ir.cc
File be/src/exec/hdfs-parquet-scanner-ir.cc:

PS9, Line 88: LOG(INFO) << "tuple idx: " << tuple_index;
> debugging output ?
Yeah, removed this


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS9, Line 1050: std::map&
> DictionaryFilterPositionMap
I think it will need to be HdfsScanNodeBase::DictionaryFilterPositionMap - not 
sure even a using declaration will help have the syntax, but I'll try.


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

PS9, Line 167: typedef std::map 
DictFilterConjunctsPositionMap;
> Comment what this map stands for ?
Done


PS9, Line 366: Mapping to original position in conjuncts
> Mapping from SlotId to ...
Done


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS9, Line 206: IS_FILTERED
> Comment what IS_FILTERED stands for.
Done


PS9, Line 240: (filters == nullptr) ^ IS_FILTERED
> Is there a reason to not use DCHECK((filters == nullptr) != IS_FILTERED) ?
1 fewer character?  I can change this though.


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

PS9, Line 390: return false;
> Is this hard coded for testing ? If not, please document why it's always fa
No, it gets overridden in the derived class.


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/util/bitmap.h
File be/src/util/bitmap.h:

PS9, Line 86: (mask & buffer_[word_index])
> (mask & buffer[word_index]) == 0
Done


PS9, Line 99: buffer_.size()
> num_words_
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-26 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#11).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status: Ready for review.  Seems to be passing all tests, runs with
ASAN, and gives expected results.  Need to come up with some specific
test cases that exercise this functionality and measure performance.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.h
M be/src/runtime/tuple.h
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
21 files changed, 511 insertions(+), 148 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-26 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6726/10/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

Line 167:   static const int MAX_BATCH_SIZE = 65536;
Didn't end up needing this.


http://gerrit.cloudera.org:8080/#/c/6726/10/be/src/util/bitmap.h
File be/src/util/bitmap.h:

Line 79:   /// Returns next set bit, or num_bits_ if the end is reached.
This function ended up unused as well.  What shall we do with it?


http://gerrit.cloudera.org:8080/#/c/6726/10/testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test:

Line 25:parquet dictionary predicates: int_col IS NULL, int_col > 1
Bogus test change needs to be backed out.


-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-26 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#10).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status: Ready for review.  Seems to be passing all tests, runs with
ASAN, and gives expected results.  Need to come up with some specific
test cases that exercise this functionality and measure performance.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/descriptors.h
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.h
M be/src/runtime/tuple.h
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
23 files changed, 510 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-24 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#9).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status: ready for review.  Seems to be passing all tests, runs with
ASAN, and gives expected results.  Need to come up with some specific
test cases that exercise this functionality and measure performance.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.h
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
20 files changed, 564 insertions(+), 180 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 7:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6726/7//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4864 Speed up single slot predicates with dictionaries
> There is a bit of a disconnect with the JIRA as specified and the intention
Thanks for the clarification. Agree this is the best path forward for the 
moment.

We should make sure to straighten out the JIRAs though so that the JIRA 
attached to this diff actually reflects the work you're doing, and that we 
track the other optimisation as separate work.


http://gerrit.cloudera.org:8080/#/c/6726/7/be/src/exec/hdfs-parquet-scanner-ir.cc
File be/src/exec/hdfs-parquet-scanner-ir.cc:

Line 77:tuple_index = scratch_batch_->NextValidTuple()) {
> Your intuition and mine are similar.  I did my best to make the "simple" ve
+1 no point optimising it until the code around it has settled down.


http://gerrit.cloudera.org:8080/#/c/6726/7/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 876: RETURN_IF_ERROR(scalar_reader->InitDictionary(dict_filter_tuple));
This is orthogonal, but we really should evaluate single-slot runtime filters 
against the dictionary as well. The payoff there is potentially really big - 
they can be quite selective and are expensive to evaluate. Do we have a JIRA 
for that?


http://gerrit.cloudera.org:8080/#/c/6726/7/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 317:   bool ReadValueBatch(ScratchTupleBatch& scratch, int* num_values) {
> My guess is that inlining the functions from the header should get about as
There's also a design layering question about whether the ColumnReader 
abstraction should know about ScratchTupleBatch. IMO there's a cleaner 
separation if ColumnReaders only care about materialising a column into a 
memory buffer and setting bits based on predicates as fast as possible, not 
about how we're staging tuples in a scratch batch. 

Even after inlining it adds an additional pointer indirection via 'scratch'. We 
can try to coax the compiler into undoing what we want it to do but at some 
point it's more robust and adds less cognitive load if we just write code 
that's closer to what we want the compiler to generate.

On a lot of queries we spend up to 50% of CPU time in this function so the perf 
matters a lot (I'm not just being anal about shaving CPU cycles). We've had a 
couple rounds of attempting to optimise this function and it is *very* 
sensitive. Even then we left a lot of performance on the floor for subtle 
reasons: see https://gerrit.cloudera.org/#/c/6950/. The big problem is that we 
have multiple layers of non-trivial inlined functions inside that loop and it's 
hard to reason about what the compiler and CPU will actually be doing.


PS7, Line 408: IS_FILTERED
> The behavior is different as a class, not just at this function level, so I
It looks like the only perf-critical place that references IS_FILTERED are in 
these two templatised functions, so I'm still skeptical that it makes sense to 
templatise the whole class - otherwise it could just be a member variable. I 
don't see a reason to treat it differently from IS_DICT_ENCODED.


Line 410: return false;
I'm going to retract my previous comment about materializing the indices as a 
separate step. I played around with a prototype of batching the slot 
materialisation and the best design looks like making the dictionary decoder 
RLE-aware so that it can only does one dictionary lookup per run. See 
https://github.com/timarmstrong/incubator-impala/commit/d45d8fae03d2645683f82ece21fce9310f78ac7a#diff-66ee012551a23c734bbe834529dd3b3dR306
 (warning - it's pretty rough).

I think that would mean pushing the bitmap check down into the dictionary 
decoder, which is something very different from what I was talking about.

That requires rearranging a lot of code in a more fundamental so the short-term 
change to ReadFilteredSlot() doesn't really matter either way.


Line 417:   if (IS_FILTERED) batch.InvalidateTuple(val_index);
> True.  (Was not so in original code).
Do you have a repro? If it's a correctness bug we should make sure we're on top 
of it.

I agree we're not strict about validating that the def levels are in the 
expected range and we could accept a technically-invalid parquet file as valid 
input. I think it's reasonable that we don't exhaustively validate the file so 
long as we don't overrun any buffers etc as a result.

I don't see how that leads to any further consequences though. Impala assumes 
that all slots coming out of HDFS scans are nullable, so it can't be 
non-NULLable from Impala's point of view.

If the Parquet file claims that a column is non-NULLable within that file 
(max_def_level == 0) then I don't think we 

[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-22 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 7:

(17 comments)

Thanks for looking at this!

http://gerrit.cloudera.org:8080/#/c/6726/7//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4864 Speed up single slot predicates with dictionaries
> The JIRA seems to be talking about doing something slightly different: taki
There is a bit of a disconnect with the JIRA as specified and the intention 
behind the JIRA, which had me initially working down the wrong path.  I 
realized that (x OP k) type predicates can only be evaluated for OP = or <=>, 
and started working on a way to identify such predicates in sub-expressions of 
the conjuncts.  It all seemed quite speculative and limited in applicability.  
Once can imagine expanding this to ordering predicates with sorted 
dictionaries, but as we don't have that yet, it would be building without a 
foundation.

After clarification from Marcel, it became apparent that what was intended was 
to be able to evaluate arbitrary single slot predicates against the dictionary 
(as we do this anyway for row-group filtering) and save those results to 
eliminate the cost of re-evaluating the same conjuncts again later.

The central difficulty of this problem is that the set of conjuncts which can 
be dictionary evaluated is not known until row group time.

This is the best proposal that I have come up with so far at attempting to 
solve that problem.

I agree we should get the design right on this one.  Doing this wrong could be 
costly in terms of either performance or complexity.


http://gerrit.cloudera.org:8080/#/c/6726/7/be/src/exec/hdfs-parquet-scanner-ir.cc
File be/src/exec/hdfs-parquet-scanner-ir.cc:

Line 40
> I believe this code was deliberately written like this to avoid the indirec
I have no problem reverting this.  Originally I had the two functions merged 
into one, but I didn't like imposing the overhead of checking the bitmap on 
this version.  While the code evolved, I had to put an interface on 
ScratchBatch just for sanity, as too many pieces of code were playing with its 
internals.

I think the interface is actually broad enough now to go back to the original 
code (minus the interference with internals).  I'll simply do this at the end:

scratch_batch_->Advance((scratch_tuple - scratch_tuple_start) / tuple_size);


Line 77:tuple_index = scratch_batch_->NextValidTuple()) {
> It seems like this might be slow for the case when many or all of the tuple
Your intuition and mine are similar.  I did my best to make the "simple" 
version of NextSetBit() as fast as possible, but I believe intrinsics might 
actually be warranted here.

Wanted to get the core ideas sound before going in that direction though.

Oh, and for reference, I also tried various other bitmap implementations - 
boost::dynamic_bitset and std::vector.  I didn't like the boost 
dependency (or the code too much) and the std::vector (specialized 
bit-vector) implementation compiled to assembler sent me running for the hills.

Branches, branches everywhere...


http://gerrit.cloudera.org:8080/#/c/6726/7/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 935
> I'm not sure, but this hoisting may have been deliberate because the compil
It certainly shouldn't hurt to re-hoist


Line 208:   dict_filters_active_.reset(new 
bool[scanner_conjunct_ctxs_->size()]);
> This probably works fine in practice but afaik scoped_ptr will call free in
Will check


PS7, Line 1051: _
> nit: extraneous _
Originally a class member, until I realized this function had no access.  Will 
fix.


http://gerrit.cloudera.org:8080/#/c/6726/7/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 317:   bool ReadValueBatch(ScratchTupleBatch& scratch, int* num_values) {
> I think it would be best to avoid using the ScratchTupleBatch abstraction i
My guess is that inlining the functions from the header should get about as 
optimal as possible.

Maybe hoisting 'max = scratch.capacity()' to tell the compiler it is unchanged 
throughout the loop, but other than that, I don't see much room for improvement 
that has been lost.  Let's see what perf results look like when that is ready.

I don't want to pass the bitmap directly, as I only want the bitmap to be 
touched (or even known about) by those pieces of code which actually are 
running in a filtered column reader.


Line 408:   if (IS_FILTERED && IS_DICT_ENCODED &&
> This patch pushes the predicate evaluation down into ReadFilteredSlot() so 
Perhaps there is a way to do this, but I don't see a clear path to that.  We 
have some choices, but they are all problematic:

Materialize codewords and values.  Where do the codewords go?  There is no 
space in the tuple.

Materialize codewords only.  What do we do when there is no dictionary?

Materialize 

[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 7:

(19 comments)

Looks promising. We're likely to be building more things on top of this, and 
the potential impact is pretty big if we made the right design decisions, so I 
want to make sure that we're making the right choices here - I think there are 
a few decision points we need to think through careful.

http://gerrit.cloudera.org:8080/#/c/6726/7//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4864 Speed up single slot predicates with dictionaries
The JIRA seems to be talking about doing something slightly different: taking 
predicates of form (x OP constant), translating the constant into the codeword, 
then comparing dict_index directly to the codeword. That avoids any kind of 
dictionary or bitmap lookup and can be vectorized so can be very effective. 
There are various papers about this kind of trick - I can find links if you 
haven't seen them. I think maybe we need to split out this work into a separate 
JIRA.

Your current approach applies to a much larger range of predicates so is also 
very useful. It would be good to think about whether the infrastructure will 
support that kind of optimisation in the future. I think doing a more 
column-oriented approach might make it more natural to do multiple passes over 
the column.


Line 41: function per file format, so we would have to simulate a file format
Yeah that codegen'd function map is a bit of a mess.


http://gerrit.cloudera.org:8080/#/c/6726/7/be/src/exec/hdfs-parquet-scanner-ir.cc
File be/src/exec/hdfs-parquet-scanner-ir.cc:

Line 40
I believe this code was deliberately written like this to avoid the indirection 
via this->scratch_batch_ in the tight loop below - I think we should leave this 
alone.


Line 77:tuple_index = scratch_batch_->NextValidTuple()) {
It seems like this might be slow for the case when many or all of the tuples 
are valid, since NextSetBit() is fairly expensive compared to incrementing a 
counter. It's probably worth waiting to see performance numbers, but my 
intuition is that we'll need to make sure that advancing to the next bit is 
cheaper.


http://gerrit.cloudera.org:8080/#/c/6726/7/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 935
I'm not sure, but this hoisting may have been deliberate because the compiler 
couldn't hoist the load via scratch_batch_ out of the loop


Line 208:   dict_filters_active_.reset(new 
bool[scanner_conjunct_ctxs_->size()]);
This probably works fine in practice but afaik scoped_ptr will call free 
instead of free[] on the array. I think unique_ptr does the right thing.


Line 865: // Legacy impala files cannot be eliminated here, because the 
only way to
I think we're missing an opportunity to apply the optimisation to columns with 
a mix of plain and dictionary-encoded pages.

My understanding is that the pre-existing dictionary filtering optimisation 
only works when the whole column is dictionary-encoded, but your new 
optimisation can be applied on a page-by-page basis.

I'm not sure how hard it is to restructure things to get it to work in that 
case. I think it's probably fairly low-impact because most columns will either 
be predominantly plain-encoded or predominantly dict-encoded, but we should add 
a comment so that the limitation is explicit.


PS7, Line 1051: _
nit: extraneous _


http://gerrit.cloudera.org:8080/#/c/6726/7/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

Line 436:   /// XXX Why is this a scoped_ptr ?
IIRC so we don't need to include the full header.


http://gerrit.cloudera.org:8080/#/c/6726/7/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 317:   bool ReadValueBatch(ScratchTupleBatch& scratch, int* num_values) {
I think it would be best to avoid using the ScratchTupleBatch abstraction in 
this code - these functions are perf critical and it's easier to reason about 
performance when the abstraction is stripped away and the code is directly 
operating over contiguous arrays as much as possible.

Maybe just pass in the bitmap directly?


Line 408:   if (IS_FILTERED && IS_DICT_ENCODED &&
This patch pushes the predicate evaluation down into ReadFilteredSlot() so that 
it's done value-by-value as they are materialised.

I'm not convinced this is the best approach - I think we should consider 
peeling predicate evaluation out into a separate loop and doing it columnwise. 
This would mean that ReadSlot() or some variant would materialize the values 
and the codewords into an array, then we'd do another pass over the codeword 
array to evaluate the conjunctions. Or maybe materialize the codewords in one 
pass, then doing another pass to evaluate conjuncts, then another one to lookup 
the surviving values in the dictionary. 

[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-18 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#7).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status of this diff: Compiles and starts.  Bitmap tests for new
functionality pass.  Tests are broken due to some inadvertent
change in the parquet column reader that causing file decode to
break.  Needs debugging, and there are definitely some bugs,
but the exposition of the concept is now fully formed.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

We could codegen another version of the EvalConjuncts function
without this overhead, but because of the complexity involved in
doing so and the pain involved (ScanNodeBase assumes one codegen'd
function per file format, so we would have to simulate a file format
or some other awful hack).

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/common/global-flags.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
19 files changed, 561 insertions(+), 188 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-18 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#6).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status of this diff: Compiles and starts.  Bitmap tests for new
functionality pass.  Tests are broken due to some inadvertent
change in the parquet column reader that causing file decode to
break.

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/common/global-flags.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
19 files changed, 554 insertions(+), 188 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-02 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#5).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status of this diff: Compiles and starts.  Bitmap tests for new
functionality pass.  FE and BE tests passing.  Query test now
passing (run not finished).

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/common/global-flags.cc
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
12 files changed, 446 insertions(+), 185 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-02 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 4:

(2 comments)

A couple quick observations

http://gerrit.cloudera.org:8080/#/c/6726/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS4, Line 1454: );
The front end orders conjuncts by selectivity and cost. When we pull them out 
and attach them to column materialization, the order is not preserved. If the 
conjunct is evaluated using the dictionary, this should be fine. If the 
conjunct is not evaluated from the dictionary, then it might result in a more 
expensive evaluation.

To put numbers on it:
Suppose there are two conjuncts A and B. A is expensive (cost = 10) and super 
selective (eliminates 0.99). B is cheap (cost = 1) and moderately selective 
(eliminates 0.50). The front end might put B first, so if B eliminates 50% of 
the row, then A is called 50% of the time to eliminate the rest. This has an 
amortized cost of 1 + 0.50 * 10 = 6, which is cheaper than calling A 100% of 
the time.

We can reorder the materialization of the columns at runtime using knowledge of 
which columns are dictionary encoded and which aren't.


PS4, Line 1467: endif
It should be possible to do this up in HdfsScanNode. As an example, see 
extractKuduConjuncts in KuduScanNode. This pulls out conjuncts that will be 
evaluated by Kudu.


-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-HasComments: Yes