Balazs Hevele has uploaded this change for review. ( http://gerrit.cloudera.org:8080/24117 )

Change subject: IMPALA-12137: Fix skipping parquet data copy for dict pages
......................................................................

IMPALA-12137: Fix skipping parquet data copy for dict pages

This commit fixes the skipping of the data page copy for
dictionary-encoded data pages read from parquet files. The previous
commit based the decision on the value of a parameter that is an output
parameter.

The optimization skips a memory allocation when reading parquet files in
one specific case: uncompressed data pages of var-len strings with
dictionary encoding. In this case, there is no need to allocate a copy
of the data buffer for the strings to point into, because they will
point into the dictionary.

Measurements:
Measured peak memory with the following query:
  select count(distinct city) from functional_parquet.airports_parquet;
Peak Memory of SCAN HDFS dropped from 425.75KB to 399.79KB.

Change-Id: I3c6dfaeb5d2b7addbcd8ad663271131ec8608003
---
M be/src/exec/parquet/parquet-column-chunk-reader.cc
1 file changed, 5 insertions(+), 3 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/24117/1
--
To view, visit http://gerrit.cloudera.org:8080/24117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3c6dfaeb5d2b7addbcd8ad663271131ec8608003
Gerrit-Change-Number: 24117
Gerrit-PatchSet: 1
Gerrit-Owner: Balazs Hevele <[email protected]>
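
For readers unfamiliar with the issue described in the commit message, the following is a minimal sketch of the pattern, not the actual Impala code. All names (`Encoding`, `PageHeader`, `NeedsDataCopy`, `ReadPageAndDecideCopy`) are hypothetical, and the handling of compressed pages and non-string columns is an assumption made only to keep the example complete. It illustrates deciding the copy from the locally parsed page header rather than from an output parameter that has not been assigned yet.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are not Impala types.
enum class Encoding { PLAIN, DICTIONARY };

struct PageHeader {
  Encoding encoding;
  bool compressed;
};

// Returns true when string values would point into the page buffer itself,
// so the buffer must be copied to stay valid after the read.
// Only the "uncompressed var-len string page" case comes from the commit
// message; returning false for the other cases is an assumption.
bool NeedsDataCopy(const PageHeader& header, bool is_var_len_string) {
  if (!is_var_len_string) return false;
  if (header.compressed) return false;
  // Dictionary-encoded strings point into the dictionary, not the page.
  return header.encoding != Encoding::DICTIONARY;
}

// `encoding_out` is an output parameter: it holds no meaningful value until
// this function assigns it. The decision therefore uses `parsed`, the header
// that was just read, and never dereferences `encoding_out` beforehand.
bool ReadPageAndDecideCopy(const PageHeader& parsed, bool is_var_len_string,
                           Encoding* encoding_out) {
  assert(encoding_out != nullptr);
  const bool copy = NeedsDataCopy(parsed, is_var_len_string);
  *encoding_out = parsed.encoding;  // Assign the output last.
  return copy;
}
```

In this sketch, a call with an uncompressed, dictionary-encoded string page returns `false` (no copy), while the same page with plain encoding returns `true`.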
