unsubscribe

2023-06-19 Thread Bharat Kul Ratan

Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-19 Thread Dongjoon Hyun
Hi, Herman. This is a series of discussions as I re-summarized here. You can find some context in the previous timeline thread. 2023-05-30 Apache Spark 4.0 Timeframe? https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6 Could you reply there to collect your timeline suggestions? We

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Cheng Pan
This API looks like it is starting from scratch and has no relationship with the existing Java/Scala DataSourceV2 API. Particularly, how can they support SQL? We have been back and forth on the DataSource V2 design since 2.3; I believe there are some things to learn when introducing the Python DataSource

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Hyukjin Kwon
Actually I support this idea in a way that Python developers don't have to learn Scala to write their own source (and separate packaging). This is more crucial especially when you want to write a simple data source that interacts with the Python ecosystem. On Tue, 20 Jun 2023 at 03:08, Denny Lee

Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-19 Thread Jia Fan
+1 On Tue, Jun 20, 2023 at 10:41, Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.4.1. > > The vote is open until June 23rd 1AM (PST) and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 votes. > > [ ] +1 Release this package as Apache

[VOTE] Release Spark 3.4.1 (RC1)

2023-06-19 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 3.4.1. The vote is open until June 23rd 1AM (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.4.1 [ ] -1 Do not release this package because

Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-19 Thread Herman van Hovell
Dongjoon, I am not sure if I follow the line of thought here. Multiple people have asked for clarification on what Spark 4.0 would mean (Holden, Mridul, Jia & Xiao). You can - for the record - also add me to this list. However you choose to single out Xiao because he asks this

Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-19 Thread Dongjoon Hyun
Thank you. I reviewed the threads, vote and result once more. I found that I missed the binding vote mark on Holden in the vote result email. The following should be "-0: Holden Karau *". Sorry for this mistake, Holden and all. > -0: Holden Karau To Hyukjin, I disagree with you at the

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Denny Lee
Slightly biased, but per my conversations - this would be awesome to have! On Mon, Jun 19, 2023 at 09:43 Abdeali Kothari wrote: > I would definitely use it - if it's available :) > > On Mon, 19 Jun 2023, 21:56 Jacek Laskowski, wrote: > >> Hi Allison and devs, >> >> Although I was against this

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Abdeali Kothari
I would definitely use it - if it's available :) On Mon, 19 Jun 2023, 21:56 Jacek Laskowski, wrote: > Hi Allison and devs, > > Although I was against this idea at first sight (probably because I'm a > Scala dev), I think it could work as long as there are people who'd be > interested in such an

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Jacek Laskowski
Hi Allison and devs, Although I was against this idea at first sight (probably because I'm a Scala dev), I think it could work as long as there are people who'd be interested in such an API. Were there any? I'm just curious. I've seen no emails requesting it. I also doubt that Python devs would

Re: Data Contracts

2023-06-19 Thread Deepak Sharma
Sorry for using "simple" in my last email. It’s not going to be simple in any terms. Thanks for sharing the git repo, Phillip. Will definitely go through it. Thanks, Deepak On Mon, 19 Jun 2023 at 3:47 PM, Phillip Henry wrote: > I think it might be a bit more complicated than this (but happy to be >

Re: Data Contracts

2023-06-19 Thread Phillip Henry
I think it might be a bit more complicated than this (but happy to be proved wrong). I have a minimum working example at: https://github.com/PhillHenry/SparkConstraints.git that runs out-of-the-box (mvn test) and demonstrates what I am trying to achieve. A test persists a DataFrame that

Re: Data Contracts

2023-06-19 Thread Deepak Sharma
It can be as simple as adding a function to the Spark session builder, specifically on the read, which can take the YAML file (definition of data contracts to be in YAML) and apply it to the data frame. It can ignore the rows not matching the data contracts defined in the YAML. Thanks, Deepak On
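
A minimal sketch of the idea in the message above, written in plain Python rather than Spark so that it runs standalone. The contract layout (`columns`, `type`, `required`), the function names, and the sample rows are all hypothetical and do not come from the thread; a real version would presumably parse the contract from a YAML file and express the checks as DataFrame filters.

```python
from datetime import date

# Hypothetical contract, shaped like what a parsed YAML file might look like.
CONTRACT = {
    "columns": {
        "id": {"type": int, "required": True},
        "signup": {"type": date, "required": True},
        "note": {"type": str, "required": False},
    }
}


def row_matches(row, contract):
    """Return True if the row satisfies every column rule in the contract."""
    for name, rule in contract["columns"].items():
        value = row.get(name)
        if value is None:
            # A missing value is only a violation for required columns.
            if rule["required"]:
                return False
            continue
        if not isinstance(value, rule["type"]):
            return False
    return True


def apply_contract(rows, contract):
    """Keep only rows that match the contract; other rows are ignored."""
    return [row for row in rows if row_matches(row, contract)]


rows = [
    {"id": 1, "signup": date(2023, 6, 19), "note": "ok"},
    {"id": "2", "signup": date(2023, 6, 19)},  # id has the wrong type
    {"id": 3, "signup": None},                 # required value missing
    {"id": 4, "signup": date(2023, 6, 20)},    # optional note omitted
]
valid = apply_contract(rows, CONTRACT)
```

Silently dropping rows mirrors the "ignore the rows not matching" wording; collecting the rejected rows separately would be an equally plausible design.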

Re: Data Contracts

2023-06-19 Thread Phillip Henry
For my part, I'm not too concerned about the mechanism used to implement the validation as long as it's rich enough to express the constraints. I took a look at JSON Schemas (for which there are a number of JVM implementations) but I don't think it can handle more complex data types like dates.