[VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

L. C. Hsieh Mon, 13 Feb 2023 14:53:25 -0800

Hi all,

I'd like to start the vote for SPIP: Lazy Materialization for Parquet
Read Performance Improvement.


The high-level summary of the SPIP is that it proposes improving the
Parquet reader with lazy materialization, which materializes (i.e.,
decompresses, decodes, etc.) only the values that are actually needed.
For Spark SQL filter operations, evaluating the filters first and
lazily materializing only the values that are used avoids wasted
computation and improves read performance.
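
As a rough illustration of the idea (a toy sketch, not the design in the
SPIP doc and not Spark's Parquet reader), the Python snippet below decodes
the filter column first and then decodes the remaining columns only for
the rows that passed the filter. The names EncodedColumn and lazy_scan are
hypothetical, and compressing each value separately is a simplification
made only so the decoding work can be counted per value; real Parquet
compresses and encodes at the page level.

```python
import json
import zlib


class EncodedColumn:
    """Hypothetical stand-in for an encoded column chunk (not a Spark or Parquet API)."""

    def __init__(self, values):
        # Compress each value on its own so decode work can be counted per value.
        self._cells = [zlib.compress(json.dumps(v).encode("utf-8")) for v in values]
        self.decoded_count = 0  # number of values actually materialized

    def __len__(self):
        return len(self._cells)

    def decode(self, row):
        # Materialize (decompress + decode) a single value.
        self.decoded_count += 1
        return json.loads(zlib.decompress(self._cells[row]).decode("utf-8"))


def lazy_scan(filter_col, predicate, projected_cols):
    # Step 1: materialize only the filter column and evaluate the predicate.
    selected = [r for r in range(len(filter_col)) if predicate(filter_col.decode(r))]
    # Step 2: materialize the projected columns only for rows that passed.
    return [tuple(col.decode(r) for col in projected_cols) for r in selected]


ids = EncodedColumn([1, 2, 3, 4, 5, 6])
names = EncodedColumn(["a", "b", "c", "d", "e", "f"])

rows = lazy_scan(ids, lambda v: v > 4, [names])
print(rows)                 # [('e',), ('f',)]
print(names.decoded_count)  # 2 of 6 values were materialized
```

In this sketch the filter column is still decoded in full, while the
projected column is decoded only for the two matching rows; the more
selective the filter, the more decoding work is skipped.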

References:

JIRA ticket: https://issues.apache.org/jira/browse/SPARK-42256
SPIP doc: https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
Discussion thread: https://lists.apache.org/thread/5yf2ylqhcv94y03m7gp3mgf3q0fp6gw6

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Thank you!

Liang-Chi Hsieh

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
