[jira] [Updated] (KYLIN-5693) Reduce the number of times Spark reads Parquet Footer to improve query performance

Yaguang Jia (Jira) Sun, 20 Aug 2023 20:23:04 -0700


     [ https://issues.apache.org/jira/browse/KYLIN-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Yaguang Jia updated KYLIN-5693:
-------------------------------
    Description: 
h2. Dev Design

Currently, the vectorized Parquet reader always reads the Parquet footer metadata twice.
When the NameNode is under heavy load, this redundant second read adds query latency.
We can avoid reading the footer twice by reading the metadata of all row groups in
advance and then filtering the row groups with the filters that need to be pushed down,
so the footer metadata does not have to be read a second time.
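To make the proposal concrete, here is a minimal, simplified Python model of the idea. It is not Kylin's or Spark's implementation: `Footer`, `RowGroupMeta`, `CountingFileSystem`, the two stages (`open_file` and the reader builders), and the min/max range filter are all hypothetical stand-ins. The baseline fetches the footer in both stages; the proposed path reads all row groups once in the first stage and filters them in memory with the pushed-down predicate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowGroupMeta:
    """Hypothetical per-row-group metadata: min/max statistics of one column."""
    index: int
    min_value: int
    max_value: int


@dataclass(frozen=True)
class Footer:
    """Hypothetical footer: metadata for every row group in one Parquet file."""
    row_groups: List[RowGroupMeta]


class CountingFileSystem:
    """Hypothetical stand-in for HDFS that counts footer fetches."""

    def __init__(self, footer: Footer) -> None:
        self._footer = footer
        self.footer_reads = 0

    def read_footer(self) -> Footer:
        # Each call models one remote footer read (NameNode lookup plus I/O).
        self.footer_reads += 1
        return self._footer


def might_match(rg: RowGroupMeta, lo: int, hi: int) -> bool:
    """Pushed-down filter `lo <= col <= hi`: keep a row group unless its stats rule it out."""
    return rg.max_value >= lo and rg.min_value <= hi


def filter_row_groups(footer: Footer, lo: int, hi: int) -> List[int]:
    """Return indices of row groups that may contain matching rows."""
    return [rg.index for rg in footer.row_groups if might_match(rg, lo, hi)]


def open_file(fs: CountingFileSystem) -> Footer:
    """Stage 1 (hypothetical): needs the footer, e.g. for schema and row group list."""
    return fs.read_footer()


def build_reader_baseline(fs: CountingFileSystem, lo: int, hi: int) -> List[int]:
    """Stage 2, baseline: fetches the footer again just to filter row groups."""
    return filter_row_groups(fs.read_footer(), lo, hi)


def build_reader_reusing(footer: Footer, lo: int, hi: int) -> List[int]:
    """Stage 2, proposed: filters the row groups obtained in stage 1, no extra I/O."""
    return filter_row_groups(footer, lo, hi)


def plan_read_twice(fs: CountingFileSystem, lo: int, hi: int) -> List[int]:
    open_file(fs)  # the stage 1 result is not handed to stage 2
    return build_reader_baseline(fs, lo, hi)


def plan_read_once(fs: CountingFileSystem, lo: int, hi: int) -> List[int]:
    footer = open_file(fs)  # read all row groups once, up front
    return build_reader_reusing(footer, lo, hi)


if __name__ == "__main__":
    footer = Footer([RowGroupMeta(0, 0, 99), RowGroupMeta(1, 100, 199), RowGroupMeta(2, 200, 299)])
    for plan in (plan_read_twice, plan_read_once):
        fs = CountingFileSystem(footer)
        kept = plan(fs, 150, 180)
        print(plan.__name__, kept, "footer reads:", fs.footer_reads)
```

Both plans keep the same row groups (only row group 1 overlaps the range 150 to 180), but the reusing plan performs one footer read instead of two, which is the saving the design targets when each read is slowed by NameNode pressure.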

> Reduce the number of times Spark reads Parquet Footer to improve query performance
> ----------------------------------------------------------------------------------
>
>                 Key: KYLIN-5693
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5693
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: 5.0-alpha
>            Reporter: Yaguang Jia
>            Assignee: Yaguang Jia
>            Priority: Critical
>             Fix For: 5.0-beta



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
