[jira] [Updated] (SPARK-36527) Implement lazy materialization for the vectorized Parquet reader

Chao Sun (Jira) Fri, 06 Jan 2023 14:25:45 -0800


     [ https://issues.apache.org/jira/browse/SPARK-36527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Chao Sun updated SPARK-36527:
-----------------------------
        Parent:     (was: SPARK-35743)
    Issue Type: Improvement  (was: Sub-task)

> Implement lazy materialization for the vectorized Parquet reader
> ----------------------------------------------------------------
>
>                 Key: SPARK-36527
>                 URL: https://issues.apache.org/jira/browse/SPARK-36527
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Chao Sun
>            Priority: Major
>
> At the moment the Parquet vectorized reader eagerly decodes all the columns
> in the read schema before any filter has been applied to them. This is
> costly: values in rows that the filter later discards are decoded for
> nothing. Instead, it is better to materialize these column vectors only
> when the data are actually needed.
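To make the proposal concrete, here is a minimal Python sketch of the idea. It is not Spark's implementation: `LazyColumn` and `scan_batch` are hypothetical names, and the sketch assumes each column of a batch is decoded all at once by a caller-supplied function. Only the filter column is decoded up front. The remaining columns are decoded on first access, so a batch in which no row passes the filter never pays their decoding cost.

```python
from typing import Callable, List, Optional, Tuple


class LazyColumn:
    """Hypothetical column vector that defers decoding until first access."""

    def __init__(self, name: str, decode: Callable[[], List[int]]):
        self.name = name
        self._decode = decode                  # assumed: decodes the whole batch
        self._values: Optional[List[int]] = None
        self.decode_count = 0                  # how many times decoding ran

    def values(self) -> List[int]:
        # Materialize on first use, then serve the cached vector.
        if self._values is None:
            self.decode_count += 1
            self._values = self._decode()
        return self._values


def scan_batch(filter_col: LazyColumn,
               other_cols: List[LazyColumn],
               predicate: Callable[[int], bool]) -> List[Tuple[int, ...]]:
    # Decode only the column the filter needs and find the surviving rows.
    keep = [i for i, v in enumerate(filter_col.values()) if predicate(v)]
    if not keep:
        # No row survives: the other columns are never decoded.
        return []
    # Some rows survive: materialize the other columns now.
    return [(filter_col.values()[i],) + tuple(c.values()[i] for c in other_cols)
            for i in keep]


if __name__ == "__main__":
    a = LazyColumn("a", lambda: [1, 5, 9])
    b = LazyColumn("b", lambda: [10, 50, 90])
    print(scan_batch(a, [b], lambda v: v > 100), b.decode_count)
    print(scan_batch(a, [b], lambda v: v > 4), b.decode_count)
```

A real reader could go further, for example decoding only the surviving positions instead of the whole vector. The sketch keeps batch granularity for simplicity.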



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

