Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

kazuyuki tanimura Tue, 31 Jan 2023 14:34:28 -0800

Thank you, Mich. I followed the instructions at 
https://spark.apache.org/improvement-proposals.html 
<https://spark.apache.org/improvement-proposals.html> and used its template.
While we are open to revising our design doc, it seems that you are 
proposing that the community change the instructions themselves?


Kazu

> On Jan 31, 2023, at 11:24 AM, Mich Talebzadeh <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> Hi,
> 
> Thanks for these proposals; they are good suggestions. Is this style of 
> breaking down your approach standard?
> 
> My view is that it perhaps makes more sense to follow the industry-established 
> approach of breaking down your technical proposal into:
> 
> Background
> Objective
> Scope
> Constraints
> Assumptions
> Reporting
> Deliverables
> Timelines
> Appendix
> 
> Your current approach, using the questions below,
> 
> Q1. What are you trying to do? Articulate your objectives using absolutely no 
> jargon. What are you trying to achieve?
> Q2. What problem is this proposal NOT designed to solve? What issues is the 
> suggested proposal not going to address?
> Q3. How is it done today, and what are the limits of current practice?
> Q4. What is new in your approach and why do you think it will be 
> successful?
> Q5. Who cares? If you are successful, what difference will it make? If your 
> proposal succeeds, what tangible benefits will it add?
> Q6. What are the risks?
> Q7. How long will it take?
> Q8. What are the midterm and final “exams” to check for success?
>  
> may not do justice to your proposal.
> 
> HTH
> 
> Mich
> 
> 
>    view my Linkedin profile 
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> 
>  https://en.everybodywiki.com/Mich_Talebzadeh 
> <https://en.everybodywiki.com/Mich_Talebzadeh>
>  
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
> 
> On Tue, 31 Jan 2023 at 17:35, kazuyuki tanimura <[email protected] 
> <mailto:[email protected]>> wrote:
> Hi everyone,
> 
> I would like to start a discussion on "Lazy Materialization for Parquet Read 
> Performance Improvement".
> 
> Chao and I propose a Parquet reader with lazy materialization. For Spark SQL 
> filter operations, evaluating the filters first and lazily materializing only 
> the values that are actually used can avoid wasted computation and improve 
> read performance.
> The current implementation of Spark requires the read values to be 
> materialized (i.e. decompressed, decoded, etc.) into memory before the 
> filters are applied, even though the filters may eventually throw away many 
> of those values.
> 
> Our design doc and the tracking Jira are linked below.
> SPIP Jira: https://issues.apache.org/jira/browse/SPARK-42256 
> <https://issues.apache.org/jira/browse/SPARK-42256> 
> SPIP Doc: 
> https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
>  
> <https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME>
> 
> Liang-Chi was kind enough to shepherd this effort. 
> 
> Thank you
> Kazu
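
For readers skimming the thread, the idea in the original proposal can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the reader described in the SPIP: the names `encode_column`, `DecodeCounter`, `eager_scan` and `lazy_scan` are hypothetical, and JSON encoding stands in for Parquet decompression and decoding. The eager path decodes every value of every column before filtering; the lazy path decodes only the filter column, then materializes the other columns for the surviving rows.

```python
import json


def encode_column(values):
    """Toy stand-in for an encoded Parquet column: one JSON blob per value."""
    return [json.dumps(v).encode("utf-8") for v in values]


class DecodeCounter:
    """Decodes blobs and counts how many values were materialized."""

    def __init__(self):
        self.count = 0

    def decode(self, blob):
        self.count += 1
        return json.loads(blob.decode("utf-8"))


def eager_scan(columns, filter_col, predicate, counter):
    """Decode every value of every column first, then apply the filter."""
    decoded = {
        name: [counter.decode(b) for b in blobs]
        for name, blobs in columns.items()
    }
    keep = [i for i, v in enumerate(decoded[filter_col]) if predicate(v)]
    return [{name: decoded[name][i] for name in columns} for i in keep]


def lazy_scan(columns, filter_col, predicate, counter):
    """Decode only the filter column, then decode other columns for kept rows."""
    filter_values = [counter.decode(b) for b in columns[filter_col]]
    keep = [i for i, v in enumerate(filter_values) if predicate(v)]
    rows = []
    for i in keep:
        row = {filter_col: filter_values[i]}
        for name, blobs in columns.items():
            if name != filter_col:
                # Materialize this value only because the row passed the filter.
                row[name] = counter.decode(blobs[i])
        rows.append(row)
    return rows
```

Both functions return the same rows; the difference is how many values get decoded, which grows with the number of non-filter columns and with how selective the filter is.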
