[Impala-ASF-CR] IMPALA-12137: Fix skipping parquet data copy for dict pages

Balazs Hevele (Code Review) Thu, 19 Mar 2026 05:50:17 -0700

Balazs Hevele has uploaded this change for review.
( http://gerrit.cloudera.org:8080/24117 )


Change subject: IMPALA-12137: Fix skipping parquet data copy for dict pages
......................................................................

IMPALA-12137: Fix skipping parquet data copy for dict pages

This commit fixes the logic that skips copying data pages read from
Parquet files when the pages are dictionary-encoded.
The previous commit based the decision on a value read from a parameter
that is actually an output parameter.
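
As a rough illustration of this class of bug (not the actual Impala code),
the sketch below derives the decision from the function's input and only
writes to the output parameter. The names Page, PlanCopy and bytes_to_copy
are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical page descriptor; not an Impala type.
struct Page {
  bool dict_encoded;
  int64_t size;
};

// Hypothetical helper. `bytes_to_copy` is a pure output parameter: the
// caller may pass in a stale or uninitialized value, so it must never be
// read before it has been assigned.
void PlanCopy(const Page& page, int64_t* bytes_to_copy) {
  assert(bytes_to_copy != nullptr);
  // Decide from the input (the page), not from the incoming *bytes_to_copy.
  const bool skip_copy = page.dict_encoded;
  *bytes_to_copy = skip_copy ? 0 : page.size;
}
```

Because the result depends only on the page, a stale value left in
bytes_to_copy by the caller cannot influence the decision.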

The change skips a memory allocation when reading Parquet files in one
specific case: uncompressed, dictionary-encoded data pages of
variable-length strings. In this case there is no need to allocate a copy
of the data buffer for the strings to point into, because they point into
the dictionary instead.
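
The condition described above can be sketched as a small predicate. This
is a minimal sketch under stated assumptions, not the code in
parquet-column-chunk-reader.cc: the name NeedsDataCopy and its parameters
are hypothetical, and it assumes that compressed pages are decompressed
into a separate buffer and so never need this copy.

```cpp
// Hypothetical predicate; the name and parameters are illustrative only.
// Returns true when the page buffer must be copied so that string values
// can keep pointing into it.
constexpr bool NeedsDataCopy(bool is_compressed, bool is_var_len_string,
                             bool is_dict_encoded) {
  // Assumption: compressed pages are decompressed into a separate buffer,
  // so only uncompressed pages are candidates for the copy.
  // Dictionary-encoded strings point into the dictionary, not the page,
  // so the copy can be skipped for them.
  return !is_compressed && is_var_len_string && !is_dict_encoded;
}

static_assert(!NeedsDataCopy(false, true, true), "dict page: no copy");
static_assert(NeedsDataCopy(false, true, false), "plain strings: copy");
```

Only the second case, uncompressed plain-encoded strings, still pays for
the extra buffer.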

Measurements:
Measured peak memory with the following query:
  select count(distinct city) from functional_parquet.airports_parquet;
Peak Memory of SCAN HDFS dropped from 425.75KB to 399.79KB.

Change-Id: I3c6dfaeb5d2b7addbcd8ad663271131ec8608003
---
M be/src/exec/parquet/parquet-column-chunk-reader.cc
1 file changed, 5 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/24117/1
--
To view, visit http://gerrit.cloudera.org:8080/24117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3c6dfaeb5d2b7addbcd8ad663271131ec8608003
Gerrit-Change-Number: 24117
Gerrit-PatchSet: 1
Gerrit-Owner: Balazs Hevele <[email protected]>
