Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/24117 )
Change subject: IMPALA-12137: Fix skipping parquet data copy for dict pages
......................................................................

IMPALA-12137: Fix skipping parquet data copy for dict pages

This commit fixes skipping the copying of data pages read from parquet
files when it has dictionary encoding.
The previous commit used a value from a parameter that was an output
parameter.

Skips some memory allocation when reading parquet files in a very
specific case: for uncompressed data pages of var len strings, having
dictionary encoding. In this case, there is no need to allocate a copy
of the data buffer for strings to point into, because they will point
into the dictionary.

Measurements:
Measured peak memory with the following query:
  select count(distinct city)
  from functional_parquet.airports_parquet;
Peak Memory of SCAN HDFS dropped from 425.75KB to 399.79KB.

Change-Id: I3c6dfaeb5d2b7addbcd8ad663271131ec8608003
Reviewed-on: http://gerrit.cloudera.org:8080/24117
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exec/parquet/parquet-column-chunk-reader.cc
1 file changed, 5 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/24117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3c6dfaeb5d2b7addbcd8ad663271131ec8608003
Gerrit-Change-Number: 24117
Gerrit-PatchSet: 3
Gerrit-Owner: Balazs Hevele <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
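
For readers unfamiliar with the change, the skip condition in the commit message can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in parquet-column-chunk-reader.cc: the names Encoding, PageHeader, CanSkipDataCopy and ReadDataPage are hypothetical, and the real reader handles many more cases. The second function shows the general shape of the fix as described, which is to base the decision on an input value and not on an output parameter that has not been assigned yet.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for illustration; not Impala's real types.
enum class Encoding { kPlain, kDictionary };

struct PageHeader {
  Encoding encoding;  // encoding recorded in the page header (an input)
  bool compressed;    // whether the page data is compressed
};

// True only in the narrow case the commit describes: an uncompressed,
// dictionary-encoded data page of variable-length strings. The decoded
// strings point into the dictionary, so no copy of the data buffer is
// needed for them to point into.
bool CanSkipDataCopy(const PageHeader& header, bool is_var_len_string) {
  return !header.compressed && is_var_len_string &&
         header.encoding == Encoding::kDictionary;
}

// Returns true when the rule above allows the copy to be skipped, and
// reports the page encoding through an out-parameter.
// The decision reads header.encoding (an input) and never *out_encoding,
// which holds no meaningful value until this function assigns it.
bool ReadDataPage(const PageHeader& header, bool is_var_len_string,
                  Encoding* out_encoding) {
  assert(out_encoding != nullptr);
  const bool skip_copy = CanSkipDataCopy(header, is_var_len_string);
  *out_encoding = header.encoding;  // assigned only here, after the decision
  return skip_copy;
}
```

In this sketch a caller would pass the address of a local Encoding variable and must not rely on its contents before ReadDataPage returns.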
