Re: Data Contracts

2023-06-12 Thread Deepak Sharma
Spark can be used with tools like Great Expectations as well to implement data contracts. I am not sure, though, if Spark alone can do data contracts. I was reading a blog on data mesh and how to glue it together with data contracts; that’s where I came across this Spark and great

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Jungtaek Lim
I concur with Holden and Mridul. Let's build a plan before we call the tentative deadline. I understand setting the tentative deadline would definitely help in pushing back features which "never ever end", but at least we may want to list features and discuss their priority. It is still possible

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Mridul Muralidharan
I agree with Holden, we should have some understanding of what we are targeting for 4.0, given it is a major version bump - and work from there on the release date. Regards, Mridul On Mon, Jun 12, 2023 at 8:53 PM Jia Fan wrote: > By the way, like Holden said, what's the big feature for 4.0.0? I think

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Jia Fan
By the way, like Holden said, what's the big feature for 4.0.0? I think a very big version change always brings some differences. Jia Fan wrote on Tue, Jun 13, 2023 at 08:25: > +1 > > > > Jia Fan > > > > On Jun 13, 2023, at 03:51, Chao Sun wrote: > > +1 > > On Mon, Jun 12, 2023 at 12:50 PM kazuyuki

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Jia Fan
+1 Jia Fan > On Jun 13, 2023, at 03:51, Chao Sun wrote: > > +1 > > On Mon, Jun 12, 2023 at 12:50 PM kazuyuki tanimura > wrote: >> +1 (non-binding) >> >> Thank you! >> Kazu >> >> >>> On Jun 12, 2023, at 11:32 AM, Holden Karau >> > wrote: >>>

Re: Spark on Kube (virtual) coffee/tea/pop times

2023-06-12 Thread Mich Talebzadeh
Hi all, Has there been any progress on the item list summarised by Holden, namely: - Inter-Pod security, Istio + mTLS - Sidecar management - Docker Images - Add links to more related images - - Helm links - Data Locality concerns - Upgrading Spark Versions - Performance

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Chao Sun
+1 On Mon, Jun 12, 2023 at 12:50 PM kazuyuki tanimura wrote: > +1 (non-binding) > > Thank you! > Kazu > > > On Jun 12, 2023, at 11:32 AM, Holden Karau wrote: > > -0 > > I'd like to see more of a doc around what we're planning on for a 4.0 > before we pick a target release date etc. (feels like

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread kazuyuki tanimura
+1 (non-binding) Thank you! Kazu > On Jun 12, 2023, at 11:32 AM, Holden Karau wrote: > > -0 > > I'd like to see more of a doc around what we're planning on for a 4.0 before > we pick a target release date etc. (feels like cart before the horse). > > But it's a weak preference. > > On Mon,

Re: Gauging interest in: ScalaFix + Scala Steward for Spark 4.0

2023-06-12 Thread Dongjoon Hyun
Holden, I agree with you a lot in the sense that this is a chicken-and-egg situation. The Spark v4.0 release is a really big one, isn't it? (1) First, do you think the proposed items are 'BLOCKER'-level Apache Spark 4.0 JIRA items? May I ask why you think so? If I understand more, I can

Re: Data Contracts

2023-06-12 Thread Elliot West
Hi Phillip, While not as fine-grained as your example, there do exist schema systems such as that in Avro that can evaluate compatible and incompatible changes to the schema, from the perspective of the reader, writer, or both. This provides some potential degree of enforcement, and means to

Re: Gauging interest in: ScalaFix + Scala Steward for Spark 4.0

2023-06-12 Thread Holden Karau
Yup, I think building consensus on what goes in 4.X is something we’ll need to do. On Mon, Jun 12, 2023 at 11:56 AM Dongjoon Hyun wrote: > Thank you for sharing those. I'm also interested in taking advantage of > it. Also, I hope `spark-upgrade` can help us in line with Spark 4.0. > > However,

Re: Gauging interest in: ScalaFix + Scala Steward for Spark 4.0

2023-06-12 Thread Dongjoon Hyun
Thank you for sharing those. I'm also interested in taking advantage of it. Also, I hope `spark-upgrade` can help us in line with Spark 4.0. However, we don't need to discuss any of this if we don't build a consensus on both Spark 4.0 and the next Scala version. We don't have a vehicle at all to

Gauging interest in: ScalaFix + Scala Steward for Spark 4.0

2023-06-12 Thread Holden Karau
Myself and a few folks have been working on a spark-upgrade project (focused on getting folks onto current versions of Spark). Since it looks like we're starting the discussion around Spark 4, I was thinking now could be a good time for us to consider if we want to try and integrate auto-upgrade

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Holden Karau
-0 I'd like to see more of a doc around what we're planning on for a 4.0 before we pick a target release date etc. (feels like cart before the horse). But it's a weak preference. On Mon, Jun 12, 2023 at 11:24 AM Xiao Li wrote: > Thanks for starting the vote. > > I do have a concern about the

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Xiao Li
Thanks for starting the vote. I do have a concern about the target release date of Spark 4.0. L. C. Hsieh wrote on Mon, Jun 12, 2023 at 11:09: > +1 > > On Mon, Jun 12, 2023 at 11:06 AM huaxin gao > wrote: > > > > +1 > > > > On Mon, Jun 12, 2023 at 11:05 AM Dongjoon Hyun > wrote: > >> > >> +1 > >> > >>

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread L. C. Hsieh
+1 On Mon, Jun 12, 2023 at 11:06 AM huaxin gao wrote: > > +1 > > On Mon, Jun 12, 2023 at 11:05 AM Dongjoon Hyun wrote: >> >> +1 >> >> Dongjoon >> >> On 2023/06/12 18:00:38 Dongjoon Hyun wrote: >> > Please vote on the release plan for Apache Spark 4.0.0. >> > >> > The vote is open until June

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread huaxin gao
+1 On Mon, Jun 12, 2023 at 11:05 AM Dongjoon Hyun wrote: > +1 > > Dongjoon > > On 2023/06/12 18:00:38 Dongjoon Hyun wrote: > > Please vote on the release plan for Apache Spark 4.0.0. > > > > The vote is open until June 16th 1AM (PST) and passes if a majority +1 > PMC > > votes are cast, with a

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Dongjoon Hyun
+1 Dongjoon On 2023/06/12 18:00:38 Dongjoon Hyun wrote: > Please vote on the release plan for Apache Spark 4.0.0. > > The vote is open until June 16th 1AM (PST) and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 votes. > > [ ] +1 Have a release plan for Apache Spark 4.0.0

[VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Dongjoon Hyun
Please vote on the release plan for Apache Spark 4.0.0. The vote is open until June 16th 1AM (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Have a release plan for Apache Spark 4.0.0 (June 2024) [ ] -1 Do not have a plan for Apache Spark 4.0.0 because

Re: Data Contracts

2023-06-12 Thread Ryan Blue
Hey Phillip, You're right that we can improve tooling to help with data contracts, but I think that a contract still needs to be an agreement between people. Constraints help by ensuring a data producer adheres to the contract and by giving feedback as soon as possible when assumptions are

Data Contracts

2023-06-12 Thread Phillip Henry
Hi, folks. There currently seems to be a buzz around "data contracts". From what I can tell, these mainly advocate a cultural solution. But instead, could big data tools be used to enforce these contracts? My questions really are: are there any plans to implement data constraints in Spark (e.g.,

Re: Apache Spark 3.4.1 Release?

2023-06-12 Thread beliefer
Dongjoon. Thank you. There is an issue that should be fixed. https://issues.apache.org/jira/browse/SPARK-44018 At 2023-06-12 13:22:30, "Dongjoon Hyun" wrote: Thank you all. I'll check and prepare `branch-3.4` for the target date, June 20th. Dongjoon. On Fri, Jun 9, 2023 at 10:47 PM yangjie01

Re: ASF policy violation and Scala version issues

2023-06-12 Thread Dongjoon Hyun
Let me add my answers about a few Scala questions, Jungtaek. > Are we concerned that a library does not release a new version > which bumps the Scala version, which the Scala version is > announced in less than a week? No, we have concerns about the newly introduced disability in the Apache