Re: [DISCUSS] Spark 4.0.0 release

2024-04-16 Thread Cheng Pan
Will we have a preview release for 4.0.0 like we did for 2.0.0 and 3.0.0? Thanks, Cheng Pan > On Apr 15, 2024, at 09:58, Jungtaek Lim wrote: > > W.r.t. state data source - reader (SPARK-45511), there are several follow-up > tickets, but we don't plan to address them soon. The current

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread huaxin gao
+1 On Tue, Apr 16, 2024 at 6:55 PM Kent Yao wrote: > +1(non-binding) > > Thanks, > Kent Yao > > On Wed, Apr 17, 2024 at 09:49, bo yang wrote: > > > > +1 > > > > On Tue, Apr 16, 2024 at 1:38 PM Hyukjin Kwon > wrote: > >> > >> +1 > >> > >> On Wed, Apr 17, 2024 at 3:57 AM L. C. Hsieh wrote: > >>> > >>> +1 >

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread Kent Yao
+1(non-binding) Thanks, Kent Yao On Wed, Apr 17, 2024 at 09:49, bo yang wrote: > > +1 > > On Tue, Apr 16, 2024 at 1:38 PM Hyukjin Kwon wrote: >> >> +1 >> >> On Wed, Apr 17, 2024 at 3:57 AM L. C. Hsieh wrote: >>> >>> +1 >>> >>> On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan wrote: >>> > >>> > +1 >>> > >>> > On

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread DB Tsai
+1 Sent from my iPhone On Apr 16, 2024, at 3:11 PM, bo yang wrote: +1 On Tue, Apr 16, 2024 at 1:38 PM Hyukjin Kwon wrote: +1 On Wed, Apr 17, 2024 at 3:57 AM L. C. Hsieh wrote: +1 On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan wrote: > > +1 >

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread bo yang
+1 On Tue, Apr 16, 2024 at 1:38 PM Hyukjin Kwon wrote: > +1 > > On Wed, Apr 17, 2024 at 3:57 AM L. C. Hsieh wrote: > >> +1 >> >> On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan wrote: >> > >> > +1 >> > >> > On Mon, Apr 15, 2024 at 12:31 PM Dongjoon Hyun >> wrote: >> >> >> >> I'll start with my

Configuration to disable file exists in DataSource

2024-04-16 Thread Romain Ardiet
Hi community, When using DataFrameReader to read Parquet files located on S3, there is no way to disable the file existence checks done by the driver. My use case is that I have a Spark job reading a list of S3 files generated by an upstream job. This list can contain thousands of files. The process

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread Hyukjin Kwon
+1 On Wed, Apr 17, 2024 at 3:57 AM L. C. Hsieh wrote: > +1 > > On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan wrote: > > > > +1 > > > > On Mon, Apr 15, 2024 at 12:31 PM Dongjoon Hyun > wrote: > >> > >> I'll start with my +1. > >> > >> - Checked checksum and signature > >> - Checked

Re: Which version of spark version supports parquet version 2 ?

2024-04-16 Thread Mich Talebzadeh
Hi Prem, Regrettably this is not my area of speciality. I trust another colleague will have a more informed idea. Alternatively you may raise an SPIP for it. Spark Project Improvement Proposals (SPIP) | Apache Spark HTH Mich Talebzadeh,

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread Gengliang Wang
+1 On Tue, Apr 16, 2024 at 11:57 AM L. C. Hsieh wrote: > +1 > > On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan wrote: > > > > +1 > > > > On Mon, Apr 15, 2024 at 12:31 PM Dongjoon Hyun > wrote: > >> > >> I'll start with my +1. > >> > >> - Checked checksum and signature > >> - Checked

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread L. C. Hsieh
+1 On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan wrote: > > +1 > > On Mon, Apr 15, 2024 at 12:31 PM Dongjoon Hyun wrote: >> >> I'll start with my +1. >> >> - Checked checksum and signature >> - Checked Scala/Java/R/Python/SQL Document's Spark version >> - Checked published Maven artifacts >> -

Re: Which version of spark version supports parquet version 2 ?

2024-04-16 Thread Prem Sahoo
Hello Mich, Thanks for the example. I have the same parquet-mr version, which creates Parquet version 1. We need to create V2 as it is more optimized. We have Dremio, where Parquet V2 is 75% better than Parquet V1 for reads and 25% better for writes, so we are inclined towards

Re: Which version of spark version supports parquet version 2 ?

2024-04-16 Thread Mich Talebzadeh
Well, let us do a test in PySpark. Take this code and create a default Parquet file. My Spark is 3.4. cat parquet_checxk.py from pyspark.sql import SparkSession spark = SparkSession.builder.appName("ParquetVersionExample").getOrCreate() data = [("London", 8974432), ("New York City", 8804348),

Re: Which version of spark version supports parquet version 2 ?

2024-04-16 Thread Prem Sahoo
Hello Community, Could any of you shed some light on the questions below, please? Sent from my iPhone On Apr 15, 2024, at 9:02 PM, Prem Sahoo wrote: Any specific reason Spark does not support, or the community doesn't want to go to, Parquet V2, which is more optimized and whose reads and writes are much faster

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread Wenchen Fan
+1 On Mon, Apr 15, 2024 at 12:31 PM Dongjoon Hyun wrote: > I'll start with my +1. > > - Checked checksum and signature > - Checked Scala/Java/R/Python/SQL Document's Spark version > - Checked published Maven artifacts > - All CIs passed. > > Thanks, > Dongjoon. > > On 2024/04/15 04:22:26