Hi all,

I'd like to start the vote for SPIP: Lazy Materialization for Parquet Read Performance Improvement.

In summary, the SPIP proposes improving the Parquet reader with lazy materialization, which materializes (i.e. decompresses, decodes, etc.) only the values that are actually needed. For Spark SQL filter operations, evaluating the filters first and lazily materializing only the values that are used can avoid wasted computation and improve read performance.

References:

JIRA ticket: https://issues.apache.org/jira/browse/SPARK-42256
SPIP doc: https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
Discussion thread: https://lists.apache.org/thread/5yf2ylqhcv94y03m7gp3mgf3q0fp6gw6

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Thank you!
Liang-Chi Hsieh