Thank you Mich. I followed the instructions at https://spark.apache.org/improvement-proposals.html and used its template. While we are open to revising our design doc, it seems you are proposing that the community change the instructions themselves?
Kazu

> On Jan 31, 2023, at 11:24 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi,
>
> Thanks for these proposals. Good suggestions. Is this style of breaking down
> your approach standard?
>
> My view would be that perhaps it makes more sense to follow the
> industry-established approach of breaking down your technical proposal into:
>
> Background
> Objective
> Scope
> Constraints
> Assumptions
> Reporting
> Deliverables
> Timelines
> Appendix
>
> Your current approach, using the questions below:
>
> Q1. What are you trying to do? Articulate your objectives using absolutely no
> jargon. What are you trying to achieve?
> Q2. What problem is this proposal NOT designed to solve? What issues is the
> suggested proposal not going to address?
> Q3. How is it done today, and what are the limits of current practice?
> Q4. What is new in your approach and why do you think it will be successful?
> Q5. Who cares? If you are successful, what difference will it make? If your
> proposal succeeds, what tangible benefits will it add?
> Q6. What are the risks?
> Q7. How long will it take?
> Q8. What are the midterm and final “exams” to check for success?
>
> may not do justice to your proposal.
>
> HTH
>
> Mich
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss,
> damage or destruction of data or any other property which may arise from
> relying on this email's technical content is explicitly disclaimed. The
> author will in no case be liable for any monetary damages arising from such
> loss, damage or destruction.
>
> On Tue, 31 Jan 2023 at 17:35, kazuyuki tanimura <ktanim...@apple.com.invalid> wrote:
> > Hi everyone,
> >
> > I would like to start a discussion on “Lazy Materialization for Parquet Read
> > Performance Improvement”.
> >
> > Chao and I propose a Parquet reader with lazy materialization. For Spark SQL
> > filter operations, evaluating the filters first and lazily materializing only
> > the values that are actually used can avoid wasted computation and improve
> > read performance. The current implementation of Spark requires the read values
> > to be materialized (i.e. decompressed, decoded, etc.) in memory before the
> > filters are applied, even though the filters may eventually throw away many
> > of those values.
> >
> > Our design doc is linked below.
> > SPIP Jira: https://issues.apache.org/jira/browse/SPARK-42256
> > SPIP Doc: https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
> >
> > Liang-Chi was kind enough to shepherd this effort.
> >
> > Thank you
> > Kazu