[ https://issues.apache.org/jira/browse/IMPALA-12137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18066936#comment-18066936 ]
ASF subversion and git services commented on IMPALA-12137:
----------------------------------------------------------
Commit ab22511520f122db617bd08685d6fa11c4b36668 in impala's branch
refs/heads/master from Balazs Hevele
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ab2251152 ]
IMPALA-12137: Fix skipping parquet data copy for dict pages
This commit fixes the optimization that skips copying data pages read
from Parquet files when they are dictionary encoded.
The previous commit used a value from a parameter that was in fact an
output parameter.
The optimization skips a memory allocation when reading Parquet files
in one very specific case: uncompressed data pages of variable-length
strings that use dictionary encoding. In this case there is no need to
allocate a copy of the data buffer for strings to point into, because
they will point into the dictionary.
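
The reasoning above can be illustrated with a minimal C++ sketch. It is
not Impala's scanner code: StringRef, DecodedPage, DecodePlain,
DecodeDictionary and AppendPlainString are hypothetical names, the plain
page layout (a host-order 4-byte length before each string) is a
simplification, and the dictionary page is modeled as an already decoded
list of indices rather than RLE/bit-packed data.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical non-owning string reference (pointer + length).
struct StringRef {
  const char* ptr;
  size_t len;
};

// Hypothetical decode result. owned_copy is filled only when the string
// bytes had to be copied out of the page buffer.
struct DecodedPage {
  std::vector<StringRef> values;
  std::vector<char> owned_copy;
};

// Helper for building a simplified plain-encoded page in this sketch.
void AppendPlainString(std::vector<char>& page, const std::string& s) {
  uint32_t len = static_cast<uint32_t>(s.size());
  char len_bytes[sizeof(len)];
  std::memcpy(len_bytes, &len, sizeof(len));
  page.insert(page.end(), len_bytes, len_bytes + sizeof(len));
  page.insert(page.end(), s.begin(), s.end());
}

// Plain encoding: the string bytes live in the page itself, so the page
// is copied into a buffer that the returned references can point into.
DecodedPage DecodePlain(const std::vector<char>& page) {
  DecodedPage out;
  out.owned_copy = page;  // the allocation and copy being discussed
  const std::vector<char>& buf = out.owned_copy;
  size_t pos = 0;
  while (pos + sizeof(uint32_t) <= buf.size()) {
    uint32_t len;
    std::memcpy(&len, buf.data() + pos, sizeof(len));
    pos += sizeof(len);
    if (pos + len > buf.size()) break;  // truncated page, stop parsing
    out.values.push_back(StringRef{buf.data() + pos, len});
    pos += len;
  }
  return out;
}

// Dictionary encoding: the page holds only indices. Every reference
// points into the dictionary, so no copy of the page is needed.
DecodedPage DecodeDictionary(const std::vector<std::string>& dict,
                             const std::vector<uint32_t>& indices) {
  DecodedPage out;
  for (uint32_t idx : indices) {
    const std::string& entry = dict.at(idx);  // throws on a bad index
    out.values.push_back(StringRef{entry.data(), entry.size()});
  }
  return out;  // owned_copy stays empty
}
```

In this sketch the dictionary must outlive the decoded values, which is
the same lifetime assumption that makes the copy unnecessary.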
Measurements:
Measured peak memory with the following query:
select count(distinct city) from functional_parquet.airports_parquet;
Peak Memory of SCAN HDFS dropped from 425.75KB to 399.79KB.
Change-Id: I3c6dfaeb5d2b7addbcd8ad663271131ec8608003
Reviewed-on: http://gerrit.cloudera.org:8080/24117
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Parquet scanner should not copy uncompressed data in dict encoded pages
> -----------------------------------------------------------------------
>
> Key: IMPALA-12137
> URL: https://issues.apache.org/jira/browse/IMPALA-12137
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Daniel Becker
> Assignee: Balazs Hevele
> Priority: Major
> Fix For: Impala 5.0.0
>
>