Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread L. C. Hsieh
Hi Mich, The title of this thread is "[DISCUSS]". We need a public discussion that collects comments on a SPIP proposal before we can move forward and call for a vote on it. On Mon, Feb 13, 2023 at 2:35 PM Mich Talebzadeh wrote: > Hi, > > I thought we already voted to go ahead with this

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread Mich Talebzadeh
Hi, I thought we already voted to go ahead with this proposal!

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread kazuyuki tanimura
Thank you, Liang-Chi! Kazu > On Feb 11, 2023, at 7:12 PM, L. C. Hsieh wrote: > > Thanks all for your feedback. > > Given this positive feedback, if there are no other comments or discussion, I > will start a vote in the next few days. > > Thank you again! > > On Thu, Feb 2, 2023 at 10:12

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-02-11 Thread L. C. Hsieh
Thanks all for your feedback. Given this positive feedback, if there are no other comments or discussion, I will start a vote in the next few days. Thank you again! On Thu, Feb 2, 2023 at 10:12 AM kazuyuki tanimura wrote: > Thank you all for +1s and reviewing the SPIP doc. > > Kazu > > On

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-02-02 Thread kazuyuki tanimura
Thank you all for +1s and reviewing the SPIP doc. Kazu > On Feb 1, 2023, at 1:28 AM, Dongjoon Hyun wrote: > > +1 > > On Wed, Feb 1, 2023 at 12:52 AM Mich Talebzadeh > wrote: > +1

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-02-02 Thread kazuyuki tanimura
Thank you Mich. I addressed your point on the SPIP doc. Kazu > On Feb 1, 2023, at 2:04 AM, Mich Talebzadeh wrote: > > > In your statement on Q2 in SPIP, you mention and I quote > > "... File formats other than Parquet are beyond the scope of this SPIP.." > > It is important that you explain

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-02-01 Thread Mich Talebzadeh
In your statement on Q2 in the SPIP, you mention, and I quote, "... File formats other than Parquet are beyond the scope of this SPIP.." It is important that you explain why you chose Parquet for this work. Apache Parquet is an open source column-oriented data format

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-02-01 Thread Dongjoon Hyun
+1 On Wed, Feb 1, 2023 at 12:52 AM Mich Talebzadeh wrote: > +1

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-02-01 Thread Mich Talebzadeh
+1

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-01-31 Thread huaxin gao
+1 On Tue, Jan 31, 2023 at 6:10 PM DB Tsai wrote: > +1 > > On Jan 31, 2023, at 4:16 PM, Yuming Wang wrote: > > +1. > > On Wed, Feb 1, 2023 at 7:42 AM kazuyuki tanimura > wrote: > >> Great! Much appreciated, Mich! >> >> Kazu >> >> On Jan 31, 2023, at 3:07 PM, Mich

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-01-31 Thread DB Tsai
+1 On Jan 31, 2023, at 4:16 PM, Yuming Wang wrote: > +1. > > On Wed, Feb 1, 2023 at 7:42 AM kazuyuki tanimura wrote: > > Great! Much appreciated, Mich! > > Kazu > > On Jan 31, 2023, at 3:07 PM, Mich Talebzadeh wrote: > > Thanks, Kazu. > > I followed that template link and indeed

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-01-31 Thread Yuming Wang
+1. On Wed, Feb 1, 2023 at 7:42 AM kazuyuki tanimura wrote: > Great! Much appreciated, Mich! > > Kazu > > On Jan 31, 2023, at 3:07 PM, Mich Talebzadeh > wrote: > > Thanks, Kazu. > > I followed that template link and indeed as you pointed out it is a common > template. If it works then it is

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-01-31 Thread kazuyuki tanimura
Great! Much appreciated, Mich! Kazu > On Jan 31, 2023, at 3:07 PM, Mich Talebzadeh > wrote: > > Thanks, Kazu. > > I followed that template link and indeed as you pointed out it is a common > template. If it works then it is what it is. > > I will be going through your design proposals and

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-01-31 Thread Mich Talebzadeh
Thanks, Kazu. I followed that template link and indeed as you pointed out it is a common template. If it works then it is what it is. I will be going through your design proposals and hopefully we can review it. Regards, Mich

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-01-31 Thread kazuyuki tanimura
Thank you, Mich. I followed the instructions at https://spark.apache.org/improvement-proposals.html and used its template. While we are open to revising our design doc, it seems more like you are proposing that the community change the instruction

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-01-31 Thread Mich Talebzadeh
Hi, Thanks for these proposals. Good suggestions. Is this style of breaking down your approach standard? My view is that it perhaps makes more sense to follow the industry-established approach of breaking down your technical proposal into: 1. Background 2. Objective 3. Scope

[DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-01-31 Thread kazuyuki tanimura
Hi everyone, I would like to start a discussion on "Lazy Materialization for Parquet Read Performance Improvement". Chao and I propose a Parquet reader with lazy materialization. For Spark SQL filter operations, evaluating the filters first and lazily materializing only the used values can
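
To illustrate the general idea of "evaluating the filters first and lazily materializing only the used values", here is a minimal sketch in plain Python. It is not the SPIP's design and uses no Spark or Parquet API: `EncodedColumn`, `read_eager`, `read_lazy`, and `make_columns` are hypothetical names invented for this example, and "decoding" is simulated by parsing strings to integers so that the number of materialized values can be counted.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, int]


class EncodedColumn:
    """Hypothetical stand-in for an encoded column chunk (not a Parquet/Spark API)."""

    def __init__(self, raw: Sequence[str]) -> None:
        self._raw = list(raw)
        self.decode_count = 0  # number of values materialized so far

    def __len__(self) -> int:
        return len(self._raw)

    def decode(self, row: int) -> int:
        # Simulated decoding cost: every call materializes one value.
        self.decode_count += 1
        return int(self._raw[row])


def read_eager(cols: Dict[str, EncodedColumn], filter_col: str,
               pred: Callable[[int], bool]) -> List[Row]:
    """Materialize every value of every column, then apply the filter."""
    n = len(cols[filter_col])
    rows = [{name: col.decode(i) for name, col in cols.items()} for i in range(n)]
    return [row for row in rows if pred(row[filter_col])]


def read_lazy(cols: Dict[str, EncodedColumn], filter_col: str,
              pred: Callable[[int], bool]) -> List[Row]:
    """Decode the filter column first, then only the surviving rows of the others."""
    fcol = cols[filter_col]
    filter_values = [fcol.decode(i) for i in range(len(fcol))]
    selected = [i for i, value in enumerate(filter_values) if pred(value)]
    rows: List[Row] = []
    for i in selected:
        row = {filter_col: filter_values[i]}
        for name, col in cols.items():
            if name != filter_col:
                row[name] = col.decode(i)  # materialized only for matching rows
        rows.append(row)
    return rows


def make_columns() -> Dict[str, EncodedColumn]:
    """Small made-up dataset: three columns, four rows."""
    return {
        "id": EncodedColumn(["1", "2", "3", "4"]),
        "price": EncodedColumn(["10", "20", "30", "40"]),
        "qty": EncodedColumn(["5", "6", "7", "8"]),
    }
```

With a selective predicate such as `lambda v: v > 3` on `id`, both functions return the same rows, but the eager path decodes all twelve values while the lazy path decodes the four `id` values plus one value each from `price` and `qty`. How much a real reader saves depends on filter selectivity and encoding, which this sketch does not model.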