[ https://issues.apache.org/jira/browse/SPARK-36527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chao Sun updated SPARK-36527:
-----------------------------
    Parent:     (was: SPARK-35743)
    Issue Type: Improvement  (was: Sub-task)

> Implement lazy materialization for the vectorized Parquet reader
> ----------------------------------------------------------------
>
>                 Key: SPARK-36527
>                 URL: https://issues.apache.org/jira/browse/SPARK-36527
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Chao Sun
>            Priority: Major
>
> At the moment the Parquet vectorized reader will eagerly decode all the
> columns that are in the read schema, before any filter has been applied to
> them. This is costly. Instead it's better to only materialize these column
> vectors when the data are actually needed.
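
The contrast between eager decoding and lazy materialization can be illustrated with a small sketch. This is not Spark's implementation and uses no Spark or Parquet API: the names `LazyColumn`, `decode_bytes`, and `scan_batch` are hypothetical, and the "encoding" is simply one byte per value. It only assumes that a filter is evaluated on one column first, and that the remaining columns are decoded only if at least one row survives.

```python
from typing import Callable, Dict, List, Optional


class LazyColumn:
    """Hypothetical column vector that decodes its values on first access."""

    def __init__(self, encoded: bytes, decoder: Callable[[bytes], List[int]]):
        self._encoded = encoded
        self._decoder = decoder
        self._values: Optional[List[int]] = None
        self.decode_count = 0  # instrumentation for this example only

    def values(self) -> List[int]:
        # Decode once, on demand, and cache the result.
        if self._values is None:
            self._values = self._decoder(self._encoded)
            self.decode_count += 1
        return self._values


def decode_bytes(encoded: bytes) -> List[int]:
    # Stand-in for real Parquet decoding (assumption: one byte per value).
    return list(encoded)


def scan_batch(
    batch: Dict[str, LazyColumn],
    filter_col: str,
    predicate: Callable[[int], bool],
    project_col: str,
) -> List[int]:
    """Apply the filter first; touch the projected column only if rows survive."""
    selected = [i for i, v in enumerate(batch[filter_col].values()) if predicate(v)]
    if not selected:
        return []  # the projected column is never decoded for this batch
    projected = batch[project_col].values()
    return [projected[i] for i in selected]


def make_batch() -> Dict[str, LazyColumn]:
    return {
        "id": LazyColumn(bytes([1, 2, 3]), decode_bytes),
        "payload": LazyColumn(bytes([10, 20, 30]), decode_bytes),
    }


# No row passes the filter, so "payload" stays encoded.
skipped = make_batch()
no_rows = scan_batch(skipped, "id", lambda v: v > 100, "payload")

# Some rows pass, so "payload" is decoded exactly once.
kept = make_batch()
some_rows = scan_batch(kept, "id", lambda v: v >= 2, "payload")
```

In the first scan only the `id` column is decoded; an eager reader would have decoded both columns before evaluating the filter. A real reader could go further and decode only the selected rows, which this sketch does not attempt.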