[
https://issues.apache.org/jira/browse/KYLIN-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaguang Jia updated KYLIN-5693:
-------------------------------
Description:
h2. Dev Design
The vectorized Parquet reader currently always reads the Parquet footer metadata
twice. When the NameNode is under heavy load, the extra read costs additional time.
We can avoid it by reading the metadata for all row groups up front and then
filtering the row groups against the filters that need to be pushed down, so the
footer metadata does not have to be read a second time.
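
The idea can be illustrated with a small, self-contained sketch. It is not the Kylin, Spark, or parquet-mr code: `FooterSource`, `RowGroup`, `might_match_equals`, and `open_reader_single_read` are hypothetical names, and the min/max check only stands in for whatever statistics the pushed-down filters actually use. The sketch reads the footer once, keeps every row group, and applies the filter in memory.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RowGroup:
    """Hypothetical row-group summary: index plus min/max of one column."""
    index: int
    min_value: int
    max_value: int


class FooterSource:
    """Hypothetical stand-in for a Parquet file; counts footer reads."""

    def __init__(self, row_groups: List[RowGroup]) -> None:
        self._row_groups = list(row_groups)
        self.footer_reads = 0

    def read_footer(self) -> List[RowGroup]:
        # Each call models one trip to storage to fetch the footer.
        self.footer_reads += 1
        return list(self._row_groups)


def might_match_equals(value: int) -> Callable[[RowGroup], bool]:
    """Build a predicate that keeps row groups whose range may contain value."""
    def predicate(row_group: RowGroup) -> bool:
        return row_group.min_value <= value <= row_group.max_value
    return predicate


def open_reader_single_read(
    source: FooterSource, predicate: Callable[[RowGroup], bool]
) -> List[RowGroup]:
    # Read the footer once, with no filter, so all row groups are available.
    all_row_groups = source.read_footer()
    # Apply the pushed-down filter in memory instead of re-reading the footer.
    return [rg for rg in all_row_groups if predicate(rg)]


source = FooterSource(
    [RowGroup(0, 0, 99), RowGroup(1, 100, 199), RowGroup(2, 200, 299)]
)
selected = open_reader_single_read(source, might_match_equals(150))
```

In this toy model the footer is fetched exactly once and only the row group whose range covers the requested value is kept; a two-pass design would call `read_footer()` a second time to obtain the filtered list.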
> Reduce the number of times Spark reads Parquet Footer to improve query
> performance
> ----------------------------------------------------------------------------------
>
> Key: KYLIN-5693
> URL: https://issues.apache.org/jira/browse/KYLIN-5693
> Project: Kylin
> Issue Type: Improvement
> Components: Query Engine
> Affects Versions: 5.0-alpha
> Reporter: Yaguang Jia
> Assignee: Yaguang Jia
> Priority: Critical
> Fix For: 5.0-beta
>