[Impala-ASF-CR] IMPALA-12137: Fix skipping parquet data copy for dict pages

Impala Public Jenkins (Code Review) Thu, 19 Mar 2026 11:48:09 -0700

Impala Public Jenkins has submitted this change and it was merged.
( http://gerrit.cloudera.org:8080/24117 )


Change subject: IMPALA-12137: Fix skipping parquet data copy for dict pages
......................................................................

IMPALA-12137: Fix skipping parquet data copy for dict pages

This commit fixes the optimization that skips copying data pages read
from Parquet files when the pages are dictionary encoded.
The previous commit based the decision on the value of a parameter that
is actually an output parameter.

The optimization skips a memory allocation when reading Parquet files in
one very specific case: uncompressed data pages of variable-length
strings that use dictionary encoding. In this case, there is no need to
allocate a copy of the data buffer for the strings to point into,
because they point into the dictionary instead.
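
The condition described above can be sketched roughly as follows. This is
an illustrative sketch only, not the code in
parquet-column-chunk-reader.cc: the PageInfo struct, the Encoding enum and
the NeedsDataCopy function are hypothetical names, and the handling of
compressed and fixed-length pages is an assumption made to keep the
example complete.

```cpp
// Hypothetical, simplified description of a data page (not Impala's types).
enum class Encoding { PLAIN, DICTIONARY };

struct PageInfo {
  bool is_compressed;      // page payload is stored compressed
  bool is_var_len_string;  // column holds variable-length strings
  Encoding encoding;       // encoding used by this data page
};

// Returns true if the reader has to allocate its own copy of the page
// buffer so that string values can keep pointing into it.
bool NeedsDataCopy(const PageInfo& page) {
  // Assumption: compressed pages are decompressed into a separate buffer,
  // so this particular copy is not relevant for them.
  if (page.is_compressed) return false;
  // Assumption: only variable-length strings point into the page buffer.
  if (!page.is_var_len_string) return false;
  // Dictionary-encoded strings point into the dictionary, not the page
  // buffer, so the copy (and its allocation) can be skipped.
  return page.encoding != Encoding::DICTIONARY;
}
```

In this sketch the decision depends only on properties that are known
before the page is processed, which avoids relying on a value that is
filled in later through an output parameter.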

Measurements:
Measured peak memory with the following query:
  select count(distinct city) from functional_parquet.airports_parquet;
Peak Memory of SCAN HDFS dropped from 425.75KB to 399.79KB.

Change-Id: I3c6dfaeb5d2b7addbcd8ad663271131ec8608003
Reviewed-on: http://gerrit.cloudera.org:8080/24117
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exec/parquet/parquet-column-chunk-reader.cc
1 file changed, 5 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/24117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3c6dfaeb5d2b7addbcd8ad663271131ec8608003
Gerrit-Change-Number: 24117
Gerrit-PatchSet: 3
Gerrit-Owner: Balazs Hevele <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
