Migrating the JUnit framework used in Apache Spark 4.0 from 4.x to 5.x

2023-09-25 Thread 杨杰
Hi all, In SPARK-44170 (apache/spark#43074 [1]), I’m trying to migrate the JUnit test framework used in Spark 4.0 from JUnit 4 to JUnit 5. Although this involves a fair amount of code modifications, given that JUnit 4 is still developed based on Java 6 source code and it hasn't released a new

unsubscribe

2023-09-24 Thread Wei Hong
unsubscribe

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-24 Thread Mich Talebzadeh
LOL, hindsight is a very good thing, and often one learns these things through experience. Once you have been told off because strict ordering was not maintained, the lesson will never be forgotten! HTH Mich Talebzadeh, Distinguished Technologist, Solutions Architect & Engineer London United Kingdom view

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-23 Thread Steve Loughran
Now, if you are ruthless, it'd make sense to randomise the order of results if someone left out the ORDER BY, to stop complacency. Like that time Sun changed the order in which methods were returned by a Class.listMethods() call and everyone's JUnit test cases failed if they'd assumed that ordering

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-23 Thread beliefer
AFAIK, the order is not guaranteed, whether it's SQL without a specified ORDER BY clause or a DataFrame without a sort. The behavior is consistent between them. At 2023-09-18 23:47:40, "Nicholas Chammas" wrote: I’ve always considered DataFrames to be logically equivalent to SQL tables or queries. In

[DISCUSS] Porting back SPARK-45178 to 3.5/3.4 version lines

2023-09-20 Thread Jungtaek Lim
Hi devs, I'd like to get some input on dealing with a possible correctness issue we found. The JIRA ticket is SPARK-45178, where I described the issue and the solution I proposed. Context: Source might behave incorrectly leading to correctness

Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Dongjoon Hyun
Instead of that, I believe you are looking for `spark.sql.sources.useV1SourceList` if the question is about "Concretely, is the plan for Spark 4 to continue defaulting to the built-in v1 data sources?". Here is the code.

Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Will Raschkowski
Thank you for linking that, Dongjoon! I found SPARK-44518 in that list which wants to turn Spark’s Hive integration into a data source. To think out loud: The big gaps between built-in v1 and v2 data sources are support for bucketing and

Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Will Raschkowski
Thank you for linking that, Dongjoon! I found SPARK-44518 in that list which wants to turn Spark’s Hive integration into a data source. IIUC, that’s very related but I’m curious if I’m thinking about this correctly: Big gaps between built-in

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Mich Talebzadeh
These are good points. In traditional RDBMSs, SQL query results without an explicit *ORDER BY* clause may vary in order due to optimization, especially when no clustered index is defined. In contrast, systems like Hive and Spark SQL, which are based on distributed file storage, do not rely on

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Mich Talebzadeh
Hi Nicholas, Regarding your point "In SQL, the result order of any query is implementation-dependent without an explicit ORDER BY clause. Technically, you could run `SELECT * FROM table;` 10 times in a row and get 10 different orderings.": yes, I concur; my understanding is the same. In SQL, the result

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Reynold Xin
It should be the same as SQL. Otherwise it takes away a lot of potential future optimization opportunities. On Mon, Sep 18 2023 at 8:47 AM, Nicholas Chammas < nicholas.cham...@gmail.com > wrote: > > I’ve always considered DataFrames to be logically equivalent to SQL tables > or queries. > >

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Sean Owen
I think it's the same, and always has been - yes you don't have a guaranteed ordering unless an operation produces a specific ordering. Could be the result of order by, yes; I believe you would be guaranteed that reading input files results in data in the order they appear in the file, etc. 1:1

Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Nicholas Chammas
I’ve always considered DataFrames to be logically equivalent to SQL tables or queries. In SQL, the result order of any query is implementation-dependent without an explicit ORDER BY clause. Technically, you could run `SELECT * FROM table;` 10 times in a row and get 10 different orderings. I
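The point raised in this thread (no ordering guarantee without an explicit ORDER BY or sort) has a practical consequence for tests. Below is a minimal sketch in plain Python, not Spark code: the helper names `same_rows` and `arbitrary_order` are hypothetical, and the shuffle merely stands in for an engine that returns rows in whatever order it likes. It shows comparing results as multisets, and imposing an order explicitly when order matters.

```python
import random
from collections import Counter


def same_rows(actual, expected):
    """Compare two collections of rows as multisets, ignoring order."""
    return Counter(actual) == Counter(expected)


def arbitrary_order(rows, seed=None):
    """Return the rows in a shuffled order (stand-in for an unordered result)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


expected = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]
result = arbitrary_order(expected, seed=7)

# Holds for every permutation, so the check does not depend on row order.
assert same_rows(result, expected)

# When order matters, impose it explicitly (the analogue of ORDER BY or sort()).
assert sorted(result) == sorted(expected)
```

Shuffling results inside a test harness is one way to act on the "randomise the order" suggestion made elsewhere in the thread: assertions that silently rely on row order then fail early instead of after an optimizer change.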

Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-18 Thread Ruifeng Zheng
Thanks Yuanjian for driving this release, Congratulations! On Mon, Sep 18, 2023 at 2:16 PM Maxim Gekk wrote: > Thank you for the work, Yuanjian! > > On Mon, Sep 18, 2023 at 6:28 AM beliefer wrote: > >> Congratulations! Apache Spark. >> >> >> >> At 2023-09-16 01:01:40, "Yuanjian Li" wrote: >>

Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-18 Thread Maxim Gekk
Thank you for the work, Yuanjian! On Mon, Sep 18, 2023 at 6:28 AM beliefer wrote: > Congratulations! Apache Spark. > > > > At 2023-09-16 01:01:40, "Yuanjian Li" wrote: > > Hi All, > > We are happy to announce the availability of *Apache Spark 3.5.0*! > > Apache Spark 3.5.0 is the sixth release

Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-17 Thread beliefer
Congratulations! Apache Spark. At 2023-09-16 01:01:40, "Yuanjian Li" wrote: Hi All, We are happy to announce the availability of Apache Spark 3.5.0! Apache Spark 3.5.0 is the sixth release of the 3.x line. To download Spark 3.5.0, head over to the download page:

Re: First Time contribution.

2023-09-17 Thread Haejoon Lee
Welcome Ram! :-) I would recommend checking out https://issues.apache.org/jira/browse/SPARK-37935 as a starter task. Refer to https://github.com/apache/spark/pull/41504 and https://github.com/apache/spark/pull/41455 as example PRs. Or you can also add a new sub-task if you find any error

Re: First Time contribution.

2023-09-17 Thread Denny Lee
Hi Ram, We have some good guidance at https://spark.apache.org/contributing.html HTH! Denny On Sun, Sep 17, 2023 at 17:18 ram manickam wrote: > > > > Hello All, > Recently, joined this community and would like to contribute. Is there a > guideline or recommendation on tasks that can be

[ANNOUNCE] Apache Spark 3.5.0 released

2023-09-15 Thread Yuanjian Li
Hi All, We are happy to announce the availability of *Apache Spark 3.5.0*! Apache Spark 3.5.0 is the sixth release of the 3.x line. To download Spark 3.5.0, head over to the download page: https://spark.apache.org/downloads.html (Please note: the PyPi upload is pending due to a size limit

Re: Plans for built-in v2 data sources in Spark 4

2023-09-14 Thread Dongjoon Hyun
Hi, Will. According to the following JIRA, as of now, there is no plan or on-going discussion to switch it. https://issues.apache.org/jira/browse/SPARK-44111 (Prepare Apache Spark 4.0.0) Thanks, Dongjoon. On Wed, Sep 13, 2023 at 9:02 AM Will Raschkowski wrote: > Hey everyone, > > > > I was

Re: Write Spark Connection client application in Go

2023-09-14 Thread bo yang
Thanks Holden and Martin for the nice words and feedback :) On Wed, Sep 13, 2023 at 8:22 AM Martin Grund wrote: > This is absolutely awesome! Thank you so much for dedicating your time to > this project! > > > On Wed, Sep 13, 2023 at 6:04 AM Holden Karau wrote: > >> That’s so cool! Great work

Plans for built-in v2 data sources in Spark 4

2023-09-13 Thread Will Raschkowski
Hey everyone, I was wondering what the plans are for Spark's built-in v2 file data sources in Spark 4. Concretely, is the plan for Spark 4 to continue defaulting to the built-in v1 data sources? And if yes, what are the blockers for defaulting to v2? I see, just as example, that writing

Re: Write Spark Connection client application in Go

2023-09-13 Thread Martin Grund
This is absolutely awesome! Thank you so much for dedicating your time to this project! On Wed, Sep 13, 2023 at 6:04 AM Holden Karau wrote: > That’s so cool! Great work y’all :) > > On Tue, Sep 12, 2023 at 8:14 PM bo yang wrote: > >> Hi Spark Friends, >> >> Anyone interested in using Golang

unsubscribe

2023-09-13 Thread ankur

Re: Write Spark Connection client application in Go

2023-09-12 Thread Holden Karau
That’s so cool! Great work y’all :) On Tue, Sep 12, 2023 at 8:14 PM bo yang wrote: > Hi Spark Friends, > > Anyone interested in using Golang to write Spark application? We created a > Spark > Connect Go Client library . > Would love to hear

unsubscribe

2023-09-12 Thread 杨军
unsubscribe

Write Spark Connection client application in Go

2023-09-12 Thread bo yang
Hi Spark Friends, Anyone interested in using Golang to write Spark applications? We created a Spark Connect Go Client library. Would love to hear feedback/thoughts from the community. Please see the quick start guide

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-12 Thread XiDuo You
+1 (non-binding) On Tue, Sep 12, 2023 at 15:14, Jungtaek Lim wrote: > > +1 (non-binding) > > Thanks for driving this release and the patience on multiple RCs! > > On Tue, Sep 12, 2023 at 10:00 AM Yuanjian Li wrote: >> >> +1 (non-binding) >> >> On Mon, Sep 11, 2023 at 09:36, Yuanjian Li wrote: >>> >>> @Peter Toth I've

[VOTE][RESULT] Release Apache Spark 3.5.0 (RC5)

2023-09-12 Thread Yuanjian Li
The vote passes with 13 +1s (8 binding +1s). Thank you all who helped with the release! (* = binding) +1: - Mridul Muralidharan (*) - Yuanjian Li - Xiao Li (*) - Gengliang Wang (*) - Hyukjin Kwon (*) - Ruifeng Zheng (*) - Jungtaek Lim - Wenchen Fan (*) - Jia Fan - Jie Yang - Yuming Wang (*) -

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-12 Thread Dongjoon Hyun
+1 Dongjoon. On 2023/09/12 03:38:37 Kent Yao wrote: > +1 (non-binding), great work! > > Kent Yao > > On Tue, Sep 12, 2023 at 11:32, Yuming Wang wrote: > > > > +1. > > > > On Tue, Sep 12, 2023 at 10:57 AM yangjie01 > > wrote: > >> > >> +1 > >> > >> > >> > >> From: Jia Fan > >> Date: Tuesday, September 12, 2023 10:08 > >>

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Kent Yao
+1 (non-binding), great work! Kent Yao On Tue, Sep 12, 2023 at 11:32, Yuming Wang wrote: > > +1. > > On Tue, Sep 12, 2023 at 10:57 AM yangjie01 > wrote: >> >> +1 >> >> >> >> From: Jia Fan >> Date: Tuesday, September 12, 2023 10:08 >> To: Ruifeng Zheng >> Cc: Hyukjin Kwon, Xiao Li, >> Mridul Muralidharan, Peter Toth

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Yuming Wang
+1. On Tue, Sep 12, 2023 at 10:57 AM yangjie01 wrote: > +1 > > > > From: Jia Fan > Date: Tuesday, September 12, 2023 10:08 > To: Ruifeng Zheng > Cc: Hyukjin Kwon, Xiao Li, > Mridul Muralidharan, Peter Toth, > Spark dev list, Yuanjian Li > > Subject: Re: [VOTE] Release Apache Spark 3.5.0

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread yangjie01
+1 From: Jia Fan Date: Tuesday, September 12, 2023 10:08 To: Ruifeng Zheng Cc: Hyukjin Kwon, Xiao Li, Mridul Muralidharan, Peter Toth, Spark dev list, Yuanjian Li Subject: Re: [VOTE] Release Apache Spark 3.5.0 (RC5) +1 On Tue, Sep 12, 2023 at 08:46, Ruifeng Zheng wrote: +1 On Tue, Sep 12,

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Jia Fan
+1 On Tue, Sep 12, 2023 at 08:46, Ruifeng Zheng wrote: > +1 > > On Tue, Sep 12, 2023 at 7:24 AM Hyukjin Kwon wrote: > >> +1 >> >> On Tue, Sep 12, 2023 at 7:05 AM Xiao Li wrote: >> >>> +1 >>> >>> Xiao >>> >>> On Mon, Sep 11, 2023 at 10:53, Yuanjian Li wrote: >>> @Peter Toth I've looked into the details of this

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Wenchen Fan
+1 On Tue, Sep 12, 2023 at 9:00 AM Yuanjian Li wrote: > +1 (non-binding) > > On Mon, Sep 11, 2023 at 09:36, Yuanjian Li wrote: > >> @Peter Toth I've looked into the details of this >> issue, and it appears that it's neither a regression in version 3.5.0 nor a >> correctness issue. It's a bug related to a

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Jungtaek Lim
+1 (non-binding) Thanks for driving this release and the patience on multiple RCs! On Tue, Sep 12, 2023 at 10:00 AM Yuanjian Li wrote: > +1 (non-binding) > > On Mon, Sep 11, 2023 at 09:36, Yuanjian Li wrote: > >> @Peter Toth I've looked into the details of this >> issue, and it appears that it's neither a

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Ruifeng Zheng
+1 On Tue, Sep 12, 2023 at 7:24 AM Hyukjin Kwon wrote: > +1 > > On Tue, Sep 12, 2023 at 7:05 AM Xiao Li wrote: > >> +1 >> >> Xiao >> >> On Mon, Sep 11, 2023 at 10:53, Yuanjian Li wrote: >> >>> @Peter Toth I've looked into the details of this >>> issue, and it appears that it's neither a regression in

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Hyukjin Kwon
+1 On Tue, Sep 12, 2023 at 7:05 AM Xiao Li wrote: > +1 > > Xiao > > On Mon, Sep 11, 2023 at 10:53, Yuanjian Li wrote: > >> @Peter Toth I've looked into the details of this >> issue, and it appears that it's neither a regression in version 3.5.0 nor a >> correctness issue. It's a bug related to a new

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Gengliang Wang
+1 On Mon, Sep 11, 2023 at 11:28 AM Xiao Li wrote: > +1 > > Xiao > > On Mon, Sep 11, 2023 at 10:53, Yuanjian Li wrote: > >> @Peter Toth I've looked into the details of this >> issue, and it appears that it's neither a regression in version 3.5.0 nor a >> correctness issue. It's a bug related to a new

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Xiao Li
+1 Xiao On Mon, Sep 11, 2023 at 10:53, Yuanjian Li wrote: > @Peter Toth I've looked into the details of this > issue, and it appears that it's neither a regression in version 3.5.0 nor a > correctness issue. It's a bug related to a new feature. I think we can fix > this in 3.5.1 and list it as a known issue

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Peter Toth
Thanks Yuanjian. Please disregard my -1 then. On Mon, Sep 11, 2023 at 18:36, Yuanjian Li wrote: > @Peter Toth I've looked into the details of this > issue, and it appears that it's neither a regression in version 3.5.0 nor a > correctness issue. It's a bug related to a new feature. I

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Yuanjian Li
+1 (non-binding) On Mon, Sep 11, 2023 at 09:36, Yuanjian Li wrote: > @Peter Toth I've looked into the details of this > issue, and it appears that it's neither a regression in version 3.5.0 nor a > correctness issue. It's a bug related to a new feature. I think we can fix > this in 3.5.1 and list it as a

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Yuanjian Li
@Peter Toth I've looked into the details of this issue, and it appears that it's neither a regression in version 3.5.0 nor a correctness issue. It's a bug related to a new feature. I think we can fix this in 3.5.1 and list it as a known issue of the Scala client of Spark Connect in 3.5.0. Mridul

unsubscribe

2023-09-11 Thread Sairam Natarajan
unsubscribe

unsubscribe

2023-09-10 Thread Cenk Ariöz
unsubscribe

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-10 Thread Mridul Muralidharan
+1 Signatures, digests, etc check out fine. Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes Regards, Mridul On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li wrote: > Please vote on releasing the following candidate(RC5) as Apache Spark > version 3.5.0. > > The vote is open

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-10 Thread Peter Toth
Hi Yuanjian, Sorry, -1 from me. Let's not introduce this bug in 3.5: https://issues.apache.org/jira/browse/SPARK-45109 / https://github.com/apache/spark/pull/42863 Best, Peter On Sun, Sep 10, 2023 at 10:39, Yuanjian Li wrote: > Yes, SPARK-44805 has been included. For the commits

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-10 Thread Yuanjian Li
@ian.a.mann...@gmail.com Thank you for your question. Because the voting period hasn't ended yet and this fix has just been merged, we don't want to release version 3.5.0 with a known correctness bug. We've quickly cut RC5, and we welcome you to continue assisting with the testing. Ian Manning

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-10 Thread Yuanjian Li
Yes, SPARK-44805 has been included. For the commits from RC4 to RC5, please refer to https://github.com/apache/spark/commits/v3.5.0-rc5. On Sat, Sep 9, 2023 at 08:09, Mich Talebzadeh wrote: > Apologies that should read ... release 3.5.0 (RC4) plus .. > > Mich Talebzadeh, > Distinguished Technologist,

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-09 Thread Mich Talebzadeh
Apologies that should read ... release 3.5.0 (RC4) plus .. Mich Talebzadeh, Distinguished Technologist, Solutions Architect & Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-09 Thread Mich Talebzadeh
Hi, Can you please confirm that this cut is release 3.4.0 plus the resolved Jira https://issues.apache.org/jira/browse/SPARK-44805 which was already fixed yesterday? Nothing else I believe? Thanks Mich view my Linkedin profile

[VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-09 Thread Yuanjian Li
Please vote on releasing the following candidate(RC5) as Apache Spark version 3.5.0. The vote is open until 11:59pm Pacific time Sep 11th and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.5.0 [ ] -1 Do not release this
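The passing rule quoted in this vote call ("a majority +1 PMC votes ... with a minimum of 3 +1 votes") can be sketched as a small check. This is one reading of the rule, not an official tally tool: the function name `vote_passes` is hypothetical, only binding (PMC) votes are assumed to count, and "majority" is assumed to mean more binding +1s than binding -1s.

```python
def vote_passes(binding_plus_ones, binding_minus_ones, minimum_plus_ones=3):
    """Return True if binding +1s reach the minimum and outnumber binding -1s.

    Assumption: non-binding votes are informative only and are not counted.
    """
    return (binding_plus_ones >= minimum_plus_ones
            and binding_plus_ones > binding_minus_ones)


# The RC5 result thread reports 8 binding +1s; zero binding -1s is an
# illustrative assumption here, not a figure taken from the archive.
print(vote_passes(binding_plus_ones=8, binding_minus_ones=0))
```

Under this reading, two binding +1s fail the vote even with no -1s, because the minimum of three is not met.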

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-09 Thread Ian Manning
This issue is not a regression and yet we fail the vote? Couldn't this issue have been fixed in 3.5.1? Sorry I am new, so maybe this is how it works? On Sat, 9 Sep 2023, 02:29 Dongjoon Hyun, wrote: > Sorry but I'm -1 because there exists a late-arrival correctness patch > although it's not a

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-08 Thread Yuanjian Li
@Dongjoon Hyun Thank you for reporting this and for your prompt response. The vote has failed. I'll cut RC5 tonight, PST time. On Fri, Sep 8, 2023 at 15:57, Dongjoon Hyun wrote: > Sorry but I'm -1 because there exists a late-arrival correctness patch > although it's not a regression. > > -

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-08 Thread Xinrong Meng
+1 Thank you for driving the release! On Fri, Sep 8, 2023 at 10:12 AM Jungtaek Lim wrote: > +1 (non-binding) > > Thanks for driving this release! > > On Fri, Sep 8, 2023 at 11:29 AM Holden Karau wrote: > >> +1 pip installing seems to function :) >> >> On Thu, Sep 7, 2023 at 7:22 PM Yuming

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-08 Thread Dongjoon Hyun
Sorry but I'm -1 because there exists a late-arrival correctness patch although it's not a regression. - https://issues.apache.org/jira/browse/SPARK-44805 "Data lost after union using spark.sql.parquet.enableNestedColumnVectorizedReader=true" - https://github.com/apache/spark/pull/42850 -

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-08 Thread Jungtaek Lim
+1 (non-binding) Thanks for driving this release! On Fri, Sep 8, 2023 at 11:29 AM Holden Karau wrote: > +1 pip installing seems to function :) > > On Thu, Sep 7, 2023 at 7:22 PM Yuming Wang wrote: > >> +1. >> >> On Thu, Sep 7, 2023 at 10:33 PM yangjie01 >> wrote: >> >>> +1 >>> >>> >>> >>>

Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Dipayan Dev
@Alfie Davidson: Awesome, it worked with "org.elasticsearch.spark.sql". But as soon as I switched to elasticsearch-spark-20_2.12, "es" also worked. On Fri, Sep 8, 2023 at 12:45 PM Dipayan Dev wrote: > > Let me try that and get back. Just wondering, is there a change in the > way we pass

Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Dipayan Dev
Let me try that and get back. Just wondering, is there a change in the way we pass the format in the connector from Spark 2 to 3? On Fri, 8 Sep 2023 at 12:35 PM, Alfie Davidson wrote: > I am pretty certain you need to change the write.format from “es” to > “org.elasticsearch.spark.sql” > > Sent

Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Alfie Davidson
I am pretty certain you need to change the write.format from “es” to “org.elasticsearch.spark.sql”. On 8 Sep 2023, at 03:10, Dipayan Dev wrote: ++ Dev On Thu, 7 Sep 2023 at 10:22 PM, Dipayan Dev wrote: Hi, Can you please elaborate your last response? I

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-07 Thread Holden Karau
+1 pip installing seems to function :) On Thu, Sep 7, 2023 at 7:22 PM Yuming Wang wrote: > +1. > > On Thu, Sep 7, 2023 at 10:33 PM yangjie01 > wrote: > >> +1 >> >> >> >> From: Gengliang Wang >> Date: Thursday, September 7, 2023 12:53 >> To: Yuanjian Li >> Cc: Xiao Li,

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-07 Thread Yuming Wang
+1. On Thu, Sep 7, 2023 at 10:33 PM yangjie01 wrote: > +1 > > > > From: Gengliang Wang > Date: Thursday, September 7, 2023 12:53 > To: Yuanjian Li > Cc: Xiao Li, "her...@databricks.com.invalid" >, Spark dev list > Subject: Re: [VOTE] Release Apache Spark 3.5.0 (RC4) > > > > +1 > > > > On

Re: Making spark plan UI interactive

2023-09-07 Thread Calili dos Santos Silva
I really appreciate the idea. Another inspiration could be Datadog with its line graph and run logs below. Any way to graphically understand the application breakpoint would be great. On Wed, Sep 6, 2023 at 08:04, Santosh Pingale wrote: > Hey community > > Spark UI with the plan

Re: Elasticsearch support for Spark 3.x

2023-09-07 Thread Dipayan Dev
++ Dev On Thu, 7 Sep 2023 at 10:22 PM, Dipayan Dev wrote: > Hi, > > Can you please elaborate your last response? I don’t have any external > dependencies added, and just updated the Spark version as mentioned below. > > Can someone help me with this? > > On Fri, 1 Sep 2023 at 5:58 PM, Koert

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-07 Thread yangjie01
+1 From: Gengliang Wang Date: Thursday, September 7, 2023 12:53 To: Yuanjian Li Cc: Xiao Li, "her...@databricks.com.invalid", Spark dev list Subject: Re: [VOTE] Release Apache Spark 3.5.0 (RC4) +1 On Wed, Sep 6, 2023 at 9:46 PM Yuanjian Li wrote: +1 (non-binding) Xiao Li

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-07 Thread Kent Yao
+1 (Non-binding) Kent On Thu, Sep 7, 2023 at 14:09, Gengliang Wang wrote: > > +1 > > On Wed, Sep 6, 2023 at 9:46 PM Yuanjian Li wrote: >> >> +1 (non-binding) >> >> On Wed, Sep 6, 2023 at 15:27, Xiao Li wrote: >>> >>> +1 >>> >>> Xiao >>> >>> On Wed, Sep 6, 2023 at 22:08, Herman van Hovell wrote: Tested connect, and everything

Re: Making spark plan UI interactive

2023-09-06 Thread 泽民 朴
+1 Making it interactive can boost the productivity of developers who deal with complex plans. On 6 Sep 2023, at 14:39, Abdeali Kothari wrote: I feel this pain frequently. Something more interactive would be great. On Wed, 6 Sep 2023 at 4:34 PM, Santosh Pingale wrote: Hey community Spark

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-06 Thread Gengliang Wang
+1 On Wed, Sep 6, 2023 at 9:46 PM Yuanjian Li wrote: > +1 (non-binding) > > On Wed, Sep 6, 2023 at 15:27, Xiao Li wrote: > >> +1 >> >> Xiao >> >> On Wed, Sep 6, 2023 at 22:08, Herman van Hovell wrote: >> >>> Tested connect, and everything looks good. >>> >>> +1 >>> >>> On Wed, Sep 6, 2023 at 8:11 AM Yuanjian Li >>>

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-06 Thread Yuanjian Li
+1 (non-binding) On Wed, Sep 6, 2023 at 15:27, Xiao Li wrote: > +1 > > Xiao > > On Wed, Sep 6, 2023 at 22:08, Herman van Hovell wrote: > >> Tested connect, and everything looks good. >> >> +1 >> >> On Wed, Sep 6, 2023 at 8:11 AM Yuanjian Li >> wrote: >> >>> Please vote on releasing the following candidate(RC4) as Apache

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-06 Thread Xiao Li
+1 Xiao On Wed, Sep 6, 2023 at 22:08, Herman van Hovell wrote: > Tested connect, and everything looks good. > > +1 > > On Wed, Sep 6, 2023 at 8:11 AM Yuanjian Li wrote: > >> Please vote on releasing the following candidate(RC4) as Apache Spark >> version 3.5.0. >> >> The vote is open until 11:59pm Pacific

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-06 Thread Herman van Hovell
Tested connect, and everything looks good. +1 On Wed, Sep 6, 2023 at 8:11 AM Yuanjian Li wrote: > Please vote on releasing the following candidate(RC4) as Apache Spark > version 3.5.0. > > The vote is open until 11:59pm Pacific time Sep 8th and passes if a > majority +1 PMC votes are cast,

Re: Making spark plan UI interactive

2023-09-06 Thread Abdeali Kothari
I feel this pain frequently. Something more interactive would be great. On Wed, 6 Sep 2023 at 4:34 PM, Santosh Pingale wrote: > Hey community > > Spark UI with the plan visualisation is an excellent resource for finding > out crucial information about how your application is doing and what parts

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-09-06 Thread Mich Talebzadeh
Thanks Allison for your explanation. 1. As a matter of interest, what does "sessionCatalog.resolveProcedure" do? Does it recompile the stored procedure (SP)? 2. If the SP makes a reference to an underlying table and the table schema is changed, then by definition that SP compiled plan will

Making spark plan UI interactive

2023-09-06 Thread Santosh Pingale
Hey community Spark UI with the plan visualisation is an excellent resource for finding out crucial information about how your application is doing and what parts of the execution can still be optimized to fulfill time/resource constraints. The graph in its current form is sufficient for simpler

Re: [DISCUSS] Incremental statistics collection

2023-09-06 Thread Rakesh Raushan
Hi all, I would like to hear more from the community on this topic. I believe it would significantly improve statistics collection in Spark. Thanks Rakesh On Sat, 2 Sep 2023 at 10:36 AM, Rakesh Raushan wrote: > Thanks all for all your insights. > > @Mich > I am not trying to introduce any

Release Note of Apache Spark 3.5.0

2023-09-06 Thread Yuanjian Li
Hi All, Thank you all for your valuable contributions to the Spark 3.5 release so far! I would appreciate your review and feedback on the release note. Please see here for the draft

[VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-06 Thread Yuanjian Li
Please vote on releasing the following candidate(RC4) as Apache Spark version 3.5.0. The vote is open until 11:59pm Pacific time Sep 8th and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.5.0 [ ] -1 Do not release this

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-09-05 Thread Allison Wang
Hi Mich, Thank you for your comments! I've left some comments on the SPIP, but let's continue the discussion here. You've highlighted the potential advantages of Python stored procedures, and I'd like to emphasize two important aspects: 1. *Versatility*: Integrating Python into SQL provides

Re: Feature to restart Spark job from previous failure point

2023-09-05 Thread Mich Talebzadeh
Hi Dipayan, You ought to maintain data source consistency, minimising changes upstream. Spark is not a Swiss Army knife :) Anyhow, we already do this in Spark Structured Streaming with the concept of checkpointing. You can do so by implementing - Checkpointing - Stateful processing in

Feature to restart Spark job from previous failure point

2023-09-04 Thread Dipayan Dev
Hi Team, One of the biggest pain points we're facing is when Spark reads upstream partition data and during Action, the upstream also gets refreshed and the application fails with 'File not exists' error. It could happen that the job has already spent a reasonable amount of time, and re-running

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-09-03 Thread Mich Talebzadeh
On this subject of launching both the driver and the executors using lazy executor IDs, this can introduce complexity but potentially could be a viable strategy in certain scenarios. Basically your mileage varies Pros: 1. Faster Startup: launching the driver and initial executors

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-09-02 Thread Yuanjian Li
Sure, no problem. On Sat, Sep 2, 2023 at 22:10, Holden Karau wrote: > Can we delay the next RC cut until after Labor Day? > > On Sat, Sep 2, 2023 at 9:59 PM Yuanjian Li wrote: > >> Thank you for all the reports! >> The vote has failed. I plan to cut RC4 in two days. >> >> @Dipayan Dev I quickly skimmed

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-09-02 Thread Holden Karau
Can we delay the next RC cut until after Labor Day? On Sat, Sep 2, 2023 at 9:59 PM Yuanjian Li wrote: > Thank you for all the reports! > The vote has failed. I plan to cut RC4 in two days. > > @Dipayan Dev I quickly skimmed through the > corresponding ticket, and it doesn't seem to be a

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-09-02 Thread Yuanjian Li
Thank you for all the reports! The vote has failed. I plan to cut RC4 in two days. @Dipayan Dev I quickly skimmed through the corresponding ticket, and it doesn't seem to be a regression introduced in 3.5. Additionally, someone is asking if this is the same issue as SPARK-35279. @Yuming Wang I

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-09-02 Thread Mich Talebzadeh
I have noticed a worthy discussion in the SPIP comments regarding the definition of "stored procedure" in the context of Spark, and I believe it is an important point to address. To provide some historical context, Sybase, a

Re: [DISCUSS] Incremental statistics collection

2023-09-01 Thread Rakesh Raushan
Thanks all for all your insights. @Mich I am not trying to introduce any sampling model here. This idea is about collecting the task write metrics while writing the data and aggregating it with the existing values present in the catalog(create a new entry if it's a CTAS command). This approach is
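The idea described in this message (collect write metrics while writing, then aggregate them with the values already in the catalog, creating a new entry for CTAS) can be sketched as follows. This is a minimal illustration in plain Python, not Spark's catalog API: `TableStats` and `merge_write_metrics` are hypothetical names, and only row count and size in bytes are tracked, which is an assumption about which statistics would be maintained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableStats:
    """Hypothetical catalog entry holding table-level statistics."""
    row_count: int
    size_in_bytes: int


def merge_write_metrics(existing: Optional[TableStats],
                        rows_written: int,
                        bytes_written: int) -> TableStats:
    """Fold the metrics of one write into the stored statistics.

    With no existing entry (for example a CTAS), the write metrics
    become the initial statistics; otherwise they are added on top.
    """
    if existing is None:
        return TableStats(rows_written, bytes_written)
    return TableStats(existing.row_count + rows_written,
                      existing.size_in_bytes + bytes_written)


# CTAS creates the entry; a later append updates it incrementally.
stats = merge_write_metrics(None, rows_written=1000, bytes_written=64000)
stats = merge_write_metrics(stats, rows_written=250, bytes_written=16000)
print(stats)
```

The sketch covers appends only; overwrites, deletes, and column-level statistics would need their own handling and are outside what the message describes.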

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-09-01 Thread Jungtaek Lim
My apologies, I have to add another ticket for a blocker, SPARK-45045. That said, I'm -1 (non-binding). SPARK-43183 made a behavioral change regarding the StreamingQueryListener as well as

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-31 Thread Wenchen Fan
Sorry for the last-minute bug report, but we found a regression in 3.5: the SQL INSERT command without a column list fills missing columns with NULL while Spark 3.4 does not allow it. According to the SQL standard, this shouldn't be allowed, and it is thus a regression in 3.5. The fix has been merged

Re: [DISCUSS] Updating documentation hosted for EOL and maintenance releases

2023-08-31 Thread Matei Zaharia
It would be great to do this IMO, because there are often usability and formatting fixes needed to docs over time, and people naturally search for docs from their *deployed* version of the project — not the latest version, hoping that it also applies to their release. For example, right now

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-31 Thread Mich Talebzadeh
I concur with the view point raised by @Sean Owen While this might introduce some challenges related to compatibility and environment issues, it is not fundamentally different from how the users currently import and use common code in Python. The main difference is that now this shared code would

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-31 Thread Ian Manning
+1 (non-binding) Using Spark Core, Spark SQL, Structured Streaming. On Tue, Aug 29, 2023 at 8:12 PM Yuanjian Li wrote: > Please vote on releasing the following candidate(RC3) as Apache Spark > version 3.5.0. > > The vote is open until 11:59pm Pacific time Aug 31st and passes if a > majority +1

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-31 Thread Sean Owen
I think you're talking past Hyukjin here. I think the response is: none of that is managed by Pyspark now, and this proposal does not change that. Your current interpreter and environment is used to execute the stored procedure, which is just Python code. It's on you to bring an environment that

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-31 Thread Mich Talebzadeh
These are my initial thoughts: As usual your mileage varies. Depending on the use case, introducing support for stored procedures (SP) in Spark SQL with Python as the procedural language *Pros* - Can potentially provide more flexibility and capabilities in the respective SQL workflows. We

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-31 Thread Mich Talebzadeh
Thanks Allison! Mich Talebzadeh, Distinguished Technologist, Solutions Architect & Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any

[DISCUSS] Updating documentation hosted for EOL and maintenance releases

2023-08-30 Thread Hyukjin Kwon
Hi all, I would like to raise a discussion about updating documentation hosted for EOL and maintenance versions. To provide some context, we currently host the documentation for EOL versions of Apache Spark, which can be found at links like

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Alexander Shorin
> Which Python version will run that stored procedure? > > All Python versions supported in PySpark > Where in stored procedure defines the exact python version which will run the code? That was the question. > How to manage external dependencies? > > Existing way we have >

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Hyukjin Kwon
Which Python version will run that stored procedure? All Python versions supported in PySpark How to manage external dependencies? Existing way we have https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html . In fact, this will use the external dependencies within your

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Alexander Shorin
-1 Great idea to ignore the experience of others and copy bad practices back for nothing. If you are familiar with Python ecosystem then you should answer the questions: 1. Which Python version will run that stored procedure? 2. How to manage external dependencies? 3. How to test it via a common

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-30 Thread Yuming Wang
It seems the signature cannot be checked: yumwang@G9L07H60PK Downloads % gpg --keyserver hkps://keys.openpgp.org --recv-key FC3AE3A7EAA1BAC98770840E7E1ABCC53AAA2216 gpg: key 7E1ABCC53AAA2216: no user ID gpg: Total number processed: 1 yumwang@G9L07H60PK Downloads % gpg --batch --verify
