Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
......................................................................


Patch Set 4:

(2 comments)

A couple quick observations

http://gerrit.cloudera.org:8080/#/c/6726/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS4, Line 1454: );
The front end orders conjuncts by selectivity and cost. When we pull them out 
and attach them to column materialization, the order is not preserved. If the 
conjunct is evaluated using the dictionary, this should be fine. If the 
conjunct is not evaluated from the dictionary, then it might result in a more 
expensive evaluation.

To put numbers on it:
Suppose there are two conjuncts, A and B. A is expensive (cost = 10) and super 
selective (eliminates 0.99 of the rows). B is cheap (cost = 1) and moderately 
selective (eliminates 0.50). The front end might put B first: B runs on every 
row and eliminates 50% of the rows, so A is called only 50% of the time to 
eliminate the rest. This has an expected per-row cost of 1 + 0.50 * 10 = 6, 
which is cheaper than calling A 100% of the time (A first costs 
10 + 0.01 * 1 = 10.01).
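
For anyone who wants to check the arithmetic, here is a minimal sketch of the 
cost model used above. The Conjunct struct and ExpectedCostPerRow are 
hypothetical names for illustration, not Impala code, and the model assumes the 
conjuncts short-circuit and have independent selectivities.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a conjunct (not an Impala type): the per-row cost of
// evaluating it and the fraction of rows it eliminates.
struct Conjunct {
  double cost;
  double eliminated;
};

// Expected per-row cost of evaluating the conjuncts in the given order.
// Assumes short-circuit evaluation and independent selectivities: a conjunct
// only runs on rows that survived every conjunct before it.
double ExpectedCostPerRow(const std::vector<Conjunct>& order) {
  double surviving = 1.0;  // fraction of rows that reach the next conjunct
  double total = 0.0;
  for (const Conjunct& c : order) {
    assert(c.eliminated >= 0.0 && c.eliminated <= 1.0);
    total += surviving * c.cost;
    surviving *= 1.0 - c.eliminated;
  }
  return total;
}
```

With A = {10, 0.99} and B = {1, 0.50}, the order {B, A} gives 6 and the order 
{A, B} gives roughly 10.01.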

We can reorder the materialization of the columns at runtime using knowledge of 
which columns are dictionary encoded and which aren't.
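
One possible shape for that reordering, as a rough sketch only: move the 
conjuncts on dictionary-encoded columns to the front and otherwise keep the 
front end's order. ColumnConjunct and PreferDictionaryColumns are made-up names, 
and "dictionary first" is just one assumed policy, not what the patch does.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record (not an Impala type): a column with an attached
// conjunct, plus whether the column turned out to be dictionary encoded.
struct ColumnConjunct {
  std::string column;
  bool dict_encoded;
};

// Moves dictionary-encoded columns to the front. std::stable_partition keeps
// the relative order inside each group, so the front end's cost/selectivity
// ordering is preserved among the dictionary columns and among the rest.
void PreferDictionaryColumns(std::vector<ColumnConjunct>* conjuncts) {
  assert(conjuncts != nullptr);
  std::stable_partition(
      conjuncts->begin(), conjuncts->end(),
      [](const ColumnConjunct& c) { return c.dict_encoded; });
}
```

The stable partition matters here: a plain sort or std::partition could 
scramble the front end's ordering within each group.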


PS4, Line 1467: endif
It should be possible to do this up in HdfsScanNode. As an example, see 
extractKuduConjuncts in KuduScanNode. This pulls out conjuncts that will be 
evaluated by Kudu.


-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <zams...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-HasComments: Yes
