[jira] [Commented] (IMPALA-12137) Parquet scanner should not copy uncompressed data in dict encoded pages

ASF subversion and git services (Jira) Thu, 19 Mar 2026 22:02:49 -0700


    [ https://issues.apache.org/jira/browse/IMPALA-12137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18066936#comment-18066936 ]


ASF subversion and git services commented on IMPALA-12137:
----------------------------------------------------------

Commit ab22511520f122db617bd08685d6fa11c4b36668 in impala's branch 
refs/heads/master from Balazs Hevele
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ab2251152 ]

IMPALA-12137: Fix skipping parquet data copy for dict pages

This commit fixes skipping the copy of data pages read from Parquet
files when they use dictionary encoding.
The previous commit read a value from a parameter that is actually
an output parameter.

This skips a memory allocation when reading Parquet files in one
specific case: uncompressed, dictionary-encoded data pages of
variable-length strings. In this case there is no need to allocate a
copy of the data buffer for the string values to point into, because
they point into the dictionary instead.
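To make the reasoning above concrete, here is a minimal Python sketch. It is not Impala's (C++) code, and every name in it (`StringRef`, `build_dictionary`, `decode_plain`, `decode_dict`, `can_skip_page_copy`) is hypothetical. A string value is modeled as a non-owning (buffer, offset, length) reference. With plain encoding the references point into the data page, so the page's bytes must stay alive (which is what the copy provides). With dictionary encoding the page holds only indices, and every reference points into the dictionary buffer. For simplicity, plain-encoded lengths are passed in separately instead of being parsed from the page's length prefixes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StringRef:
    """Non-owning view of a string: the buffer it lives in, plus offset and length."""
    buf: bytes
    start: int
    length: int

    def value(self) -> bytes:
        return self.buf[self.start:self.start + self.length]


def build_dictionary(entries: List[bytes]) -> Tuple[bytes, List[StringRef]]:
    """Pack the distinct values into one dictionary buffer and index each entry."""
    buf = b"".join(entries)
    refs: List[StringRef] = []
    offset = 0
    for entry in entries:
        refs.append(StringRef(buf, offset, len(entry)))
        offset += len(entry)
    return buf, refs


def decode_plain(page: bytes, lengths: List[int]) -> List[StringRef]:
    """Plain encoding (simplified): the string bytes live in the data page itself."""
    refs: List[StringRef] = []
    offset = 0
    for n in lengths:
        refs.append(StringRef(page, offset, n))
        offset += n
    return refs


def decode_dict(indices: List[int], dictionary: List[StringRef]) -> List[StringRef]:
    """Dictionary encoding: the page holds only indices, so refs point into the dictionary."""
    return [dictionary[i] for i in indices]


def can_skip_page_copy(uncompressed: bool, var_len_string: bool, dict_encoded: bool) -> bool:
    """Hypothetical predicate for the single case described in the commit message."""
    return uncompressed and var_len_string and dict_encoded


# Usage: dictionary-decoded values never reference the data page buffer.
dict_buf, dictionary = build_dictionary([b"Boston", b"Denver"])
values = decode_dict([1, 0, 1], dictionary)
print([v.value() for v in values])  # [b'Denver', b'Boston', b'Denver']
```

Because every reference returned by `decode_dict` holds `dict_buf`, the data page buffer can be released without first copying it; references from `decode_plain`, by contrast, keep the page buffer alive.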

Measurements:
Measured peak memory with the following query:
  select count(distinct city) from functional_parquet.airports_parquet;
Peak memory of the SCAN HDFS node dropped from 425.75 KB to 399.79 KB.
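A quick check of these figures (derived arithmetic only; it describes this single query and says nothing about other workloads) puts the saving at about 26 KB, roughly 6% of the scan node's previous peak:

```python
# Peak memory of the SCAN HDFS node, in KB, as reported in the commit message.
before_kb = 425.75
after_kb = 399.79

saved_kb = before_kb - after_kb            # absolute reduction
saved_pct = 100.0 * saved_kb / before_kb   # reduction relative to the old peak

print(f"{saved_kb:.2f} KB saved ({saved_pct:.1f}%)")  # 25.96 KB saved (6.1%)
```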

Change-Id: I3c6dfaeb5d2b7addbcd8ad663271131ec8608003
Reviewed-on: http://gerrit.cloudera.org:8080/24117
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Parquet scanner should not copy uncompressed data in dict encoded pages
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-12137
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12137
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Daniel Becker
>            Assignee: Balazs Hevele
>            Priority: Major
>             Fix For: Impala 5.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)