Hello Alex Behm, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6950

to look at the new patch set (#9).

Change subject: IMPALA-5347: Parquet scanner microoptimizations
......................................................................

IMPALA-5347: Parquet scanner microoptimizations

A mix of microoptimizations that profiling the Parquet scanner revealed.
Each leads to some measurable improvement, and together they add up to
significant speedups for various scans.

* Add ALWAYS_INLINE to hot functions that GCC was mistakenly not inlining
  in all cases.
* Apply __restrict__ in a few places so the compiler knows that it is safe
  to cache values accessed via those pointers.
* memset() the whole batch instead of only the null indicators in cases
  where it is almost certainly cheaper.
* Avoid updating two correlated loop variables in MaterializeValueBatch().
* Avoid unnecessary initialization of the often-unused 'val' in ReadSlot().
* Shave a few instructions off the (still very expensive) bit unpacking and
  dict decoding logic.

Performance:
Some local TPC-H and targeted-perf benchmarks showed average speedups of ~5%.

I did some benchmarks targeted at measuring column materialisation
performance, using a version of lineitem with duplicated data to make it
bigger. These queries all got significantly faster.

Dict-encoded DECIMAL: 2.23s -> 1.23s
  SELECT count(*) FROM biglineitem WHERE l_quantity > 49

Plain-encoded BIGINT: 2.33s -> 1.62s
  SELECT count(*) FROM biglineitem WHERE l_orderkey != 10

Dict-encoded STRING: 2.73s -> 1.72s
  SELECT count(*) FROM biglineitem WHERE l_returnflag = 'A'

Plain-encoded STRING: 7.07s -> 6.08s (most time spent in Snappy)
  SELECT count(*) FROM biglineitem WHERE length(l_comment) > 37

Multiple columns: 5.15s -> 3.74s
  SELECT count(*) FROM biglineitem WHERE l_quantity > 49 and l_partkey != 199 and l_tax < 100

Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/tuple.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/bit-util.h
M be/src/util/dict-encoding.h
M be/src/util/rle-encoding.h
9 files changed, 113 insertions(+), 45 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/6950/9
--
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-imp...@apache.org>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: anujphadke <apha...@cloudera.com>
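The commit message for IMPALA-5347 describes its loop-level changes only in prose. Below is a minimal, self-contained C++ sketch of three of them: forced inlining, `__restrict__`-qualified pointers with a single loop variable, and clearing a whole batch with one `memset()`. It is not code from this patch. The flat batch layout and every name in it (`SKETCH_ALWAYS_INLINE`, `ClearBatch`, `MaterializeSlots`, `tuple_size`, `slot_offset`) are hypothetical. `__restrict__` and `__attribute__((always_inline))` are GCC/Clang extensions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical macro for this sketch only: GCC/Clang attribute asking the
// compiler to inline the function even where its heuristics would not.
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))

// Hypothetical batch layout: 'num_tuples' fixed-size tuples stored back to
// back, each 'tuple_size' bytes, with an int64_t slot at 'slot_offset'.
// Any null-indicator bytes live inside the tuples, so zeroing the whole
// batch also clears them.

// Zero the entire batch with a single memset() instead of looping over the
// tuples and clearing each tuple's null-indicator byte separately.
SKETCH_ALWAYS_INLINE void ClearBatch(uint8_t* batch, int num_tuples,
                                     int tuple_size) {
  std::memset(batch, 0, static_cast<std::size_t>(num_tuples) * tuple_size);
}

// Copy decoded values into the slot of consecutive tuples.
// __restrict__ promises that 'values' and 'tuples' do not alias. Stores
// through 'tuples' therefore cannot change what 'values' points to, and the
// compiler is free to cache, reorder or vectorize the loads. Only one loop
// variable, 'i', is updated: the tuple address is derived from it rather
// than tracked in a second pointer advanced in lockstep.
SKETCH_ALWAYS_INLINE void MaterializeSlots(const int64_t* __restrict__ values,
                                           int num_values,
                                           uint8_t* __restrict__ tuples,
                                           int tuple_size, int slot_offset) {
  assert(tuple_size > 0 && slot_offset >= 0);
  assert(static_cast<std::size_t>(slot_offset) + sizeof(int64_t) <=
         static_cast<std::size_t>(tuple_size));
  for (int i = 0; i < num_values; ++i) {
    uint8_t* tuple = tuples + static_cast<std::size_t>(i) * tuple_size;
    // memcpy avoids undefined behaviour from a possibly unaligned store.
    std::memcpy(tuple + slot_offset, &values[i], sizeof(int64_t));
  }
}
```

Whether each change pays off depends on the compiler and the data layout. That is consistent with the commit message applying them only where profiling of the scanner showed a benefit.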