Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
......................................................................

Patch Set 4:

(2 comments)

A couple of quick observations.

http://gerrit.cloudera.org:8080/#/c/6726/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS4, Line 1454: );
The front end orders conjuncts by selectivity and cost. When we pull them out and attach them to column materialization, that order is not preserved. If the conjunct is evaluated using the dictionary, this should be fine. If the conjunct is not evaluated from the dictionary, the lost ordering might result in a more expensive evaluation.

To put numbers on it: suppose there are two conjuncts, A and B. A is expensive (cost = 10) and super selective (eliminates 0.99 of the rows). B is cheap (cost = 1) and moderately selective (eliminates 0.50 of the rows). The front end might put B first, so B eliminates 50% of the rows and A is called only on the remaining 50%. This has an amortized per-row cost of 1 + 0.50 * 10 = 6, which is cheaper than calling A 100% of the time (10 + 0.01 * 1 = 10.01 if A runs first).

We can reorder the materialization of the columns at runtime using knowledge of which columns are dictionary encoded and which aren't.

PS4, Line 1467: endif
It should be possible to do this up in HdfsScanNode. As an example, see extractKuduConjuncts in KuduScanNode, which pulls out the conjuncts that will be evaluated by Kudu.

--
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <zams...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-HasComments: Yes
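
As a rough illustration of the conjunct-ordering arithmetic in the Line 1454 comment, the expected per-row cost of a short-circuiting conjunct list can be sketched as below. This is not Impala code: the Conjunct struct and ExpectedCostPerRow are hypothetical names introduced only for this example, and the model assumes the conjuncts' selectivities are independent.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a conjunct (not an Impala type): the cost of
// evaluating it on one row, and the fraction of rows that pass it
// (1 - fraction eliminated).
struct Conjunct {
  double cost;
  double pass_fraction;
};

// Expected per-row cost of evaluating the conjuncts in the given order with
// short-circuiting: each conjunct runs only on rows that survived all the
// earlier ones. Assumes selectivities are independent of each other.
double ExpectedCostPerRow(const std::vector<Conjunct>& order) {
  double total = 0.0;
  double surviving = 1.0;  // fraction of rows still alive at this point
  for (const Conjunct& c : order) {
    assert(c.pass_fraction >= 0.0 && c.pass_fraction <= 1.0);
    total += surviving * c.cost;
    surviving *= c.pass_fraction;
  }
  return total;
}
```

With A = {10.0, 0.01} and B = {1.0, 0.50}, ExpectedCostPerRow({B, A}) evaluates 1 + 0.50 * 10 = 6, while ExpectedCostPerRow({A, B}) evaluates 10 + 0.01 * 1 = 10.01, which is the gap the front end's ordering is meant to avoid.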