Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-30 Thread Wenchen Fan
Hi Jason,

Thanks for reporting! https://issues.apache.org/jira/browse/SPARK-32136 looks
like a breaking change and we should investigate.

On Wed, Jul 1, 2020 at 11:31 AM Holden Karau  wrote:

> I can take care of 2.4.7 unless someone else wants to do it.
>
> On Tue, Jun 30, 2020 at 8:29 PM Jason Moore 
> wrote:
>
>> Hi all,
>>
>>
>>
>> Could I get some input on the severity of this one that I found
>> yesterday?  If that’s a correctness issue, should it block this patch?  Let
>> me know under the ticket if there’s more info that I can provide to help.
>>
>>
>>
>> https://issues.apache.org/jira/browse/SPARK-32136
>>
>>
>>
>> Thanks,
>>
>> Jason.
>>
>>
>>
>> *From: *Jungtaek Lim 
>> *Date: *Wednesday, 1 July 2020 at 10:20 am
>> *To: *Shivaram Venkataraman 
>> *Cc: *Prashant Sharma , 郑瑞峰 ,
>> Gengliang Wang , gurwls223 <
>> gurwls...@gmail.com>, Dongjoon Hyun , Jules
>> Damji , Holden Karau ,
>> Reynold Xin , Yuanjian Li ,
>> "dev@spark.apache.org" , Takeshi Yamamuro <
>> linguin@gmail.com>
>> *Subject: *Re: [DISCUSS] Apache Spark 3.0.1 Release
>>
>>
>>
>> SPARK-32130 [1] looks to be a performance regression introduced in Spark
>> 3.0.0, which is ideal to look into before releasing another bugfix version.
>>
>>
>>
>> 1. https://issues.apache.org/jira/browse/SPARK-32130
>>
>>
>>
>> On Wed, Jul 1, 2020 at 7:05 AM Shivaram Venkataraman <
>> shiva...@eecs.berkeley.edu> wrote:
>>
>> Hi all
>>
>>
>>
>> I just wanted to ping this thread to see if all the outstanding blockers
>> for 3.0.1 have been fixed. If so, it would be great if we can get the
>> release going. The CRAN team sent us a note that the version SparkR
>> available on CRAN for the current R version (4.0.2) is broken and hence we
>> need to update the package soon --  it will be great to do it with 3.0.1.
>>
>>
>>
>> Thanks
>>
>> Shivaram
>>
>>
>>
>> On Wed, Jun 24, 2020 at 8:31 PM Prashant Sharma 
>> wrote:
>>
>> +1 for 3.0.1 release.
>>
>> I too can help out as release manager.
>>
>>
>>
>> On Thu, Jun 25, 2020 at 4:58 AM 郑瑞峰  wrote:
>>
>> I volunteer to be a release manager of 3.0.1, if nobody is working on
>> this.
>>
>>
>>
>>
>>
>> -- Original Message --
>>
>> *From:* "Gengliang Wang";
>>
>> *Sent:* Wednesday, June 24, 2020, 4:15 PM
>>
>> *To:* "Hyukjin Kwon";
>>
>> *Cc:* "Dongjoon Hyun";"Jungtaek Lim"<
>> kabhwan.opensou...@gmail.com>;"Jules Damji";"Holden
>> Karau";"Reynold Xin";"Shivaram
>> Venkataraman";"Yuanjian Li"<
>> xyliyuanj...@gmail.com>;"Spark dev list";"Takeshi
>> Yamamuro";
>>
>> *Subject:* Re: [DISCUSS] Apache Spark 3.0.1 Release
>>
>>
>>
>> +1, the issues mentioned are really serious.
>>
>>
>>
>> On Tue, Jun 23, 2020 at 7:56 PM Hyukjin Kwon  wrote:
>>
>> +1.
>>
>> Just as a note,
>> - SPARK-31918 is fixed now, and there's no blocker.
>> - When we build SparkR, we should use the latest R version, at least 4.0.0+.
>>
>>
>>
>> On Wed, Jun 24, 2020 at 11:20 AM, Dongjoon Hyun wrote:
>>
>> +1
>>
>>
>>
>> Bests,
>>
>> Dongjoon.
>>
>>
>>
>> On Tue, Jun 23, 2020 at 1:19 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>> +1 on a 3.0.1 soon.
>>
>>
>>
>> Probably it would be nice if some Scala experts can take a look at
>> https://issues.apache.org/jira/browse/SPARK-32051 and include the fix
>> into 3.0.1 if possible.
>>
>> Looks like APIs designed to work with Scala 2.11 & Java bring
>> ambiguity in Scala 2.12 & Java.
>>
>>
>>
>> On Wed, Jun 24, 2020 at 4:52 AM Jules Damji  wrote:
>>
>> +1 (non-binding)
>>
>>
>>
>> Sent from my iPhone
>>
>> Pardon the dumb thumb typos :)
>>
>>
>>
>> On Jun 23, 2020, at 11:36 AM, Holden Karau  wrote:
>>
>> +1 on a patch release soon
>>
>>
>>
>> On Tue, Jun 23, 2020 at 10:47 AM Reynold Xin  wrote:
>>
>>
>> +1 on doing a new patch release soon. I saw some of these issues when
>> preparing the 3.0 release, and some of them are very serious.
>>
>>
>>
>>
>>
>> On Tue, Jun 23, 2020 at 8:06 AM, Shivaram Venkataraman <
>> shiva...@eecs.berkeley.edu> wrote:
>>
>> +1 Thanks Yuanjian -- I think it'll be great to have a 3.0.1 release
>> soon.
>>
>> Shivaram
>>
>> On Tue, Jun 23, 2020 at 3:43 AM Takeshi Yamamuro 
>> wrote:
>>
>> Thanks for the heads-up, Yuanjian!
>>
>> I also noticed branch-3.0 already has 39 commits after Spark 3.0.0.
>>
>> wow, the updates are so quick. Anyway, +1 for the release.
>>
>> Bests,
>> Takeshi
>>
>> On Tue, Jun 23, 2020 at 4:59 PM Yuanjian Li 
>> wrote:
>>
>> Hi dev-list,
>>
>> I’m writing this to raise the discussion about Spark 3.0.1 feasibility
>> since 4 blocker issues were found after Spark 3.0.0:
>>
>> [SPARK-31990] The broken state store compatibility will cause a
>> correctness issue when a streaming query with `dropDuplicates` uses a
>> checkpoint written by an old Spark version.
>>
>> [SPARK-32038] The regression bug in handling NaN values in
>> COUNT(DISTINCT)
>>
>> [SPARK-31918][WIP] CRAN requires SparkR to work with the latest R
>> 4.0. It makes the 3.0 

Re: [Spark Core] Merging PR #23340 for New Executor Memory Metrics

2020-06-30 Thread Alex Scammon
Thank you!

I just reached out to the original author and we'll get the conflicts sorted 
out amongst us, I'm sure.

Thanks again,

-Alex

From: Dongjoon Hyun 
Sent: Tuesday, June 30, 2020 9:01 PM
To: Alex Scammon 
Cc: Michel Sumbul ; dev@spark.apache.org 

Subject: Re: [Spark Core] Merging PR #23340 for New Executor Memory Metrics

HI, Alex and Michel.

I removed the `Stale` label and reopened it for now. You may want to ping the 
original author because the last update of that PR is one year ago and has many 
conflicts as of today.

Bests,
Dongjoon.

On Tue, Jun 30, 2020 at 10:56 AM Alex Scammon <alex.scam...@ext.gresearch.co.uk> wrote:
Can I buymeacoffee.com for someone to take a look at 
PR#23340?  I'm totally not above 
outright bribery to get some eyes on this PR.

Thanks,

-Alex

From: Michel Sumbul <michelsum...@yahoo.fr>
Sent: Thursday, June 25, 2020 11:48 AM
To: dev@spark.apache.org; Alex Scammon <alex.scam...@ext.gresearch.co.uk>
Subject: Re: [Spark Core] Merging PR #23340 for New Executor Memory Metrics


Hey Dev team,

I agree with Alex; these metrics can be really useful for tuning jobs.
Any chance someone can have a look at it?

Thanks,
Michel
On Monday, 22 June 2020 at 22:48:23 UTC+1, Alex Scammon <alex.scam...@ext.gresearch.co.uk> wrote:


Hi there devs,

Congrats on Spark 3.0.0, that's great to see.

I'm hoping to get some eyes on something old, however:

  *   https://github.com/apache/spark/pull/23340

I'm really just trying to get some eyes on this PR and see if we can still move 
it forward.  I reached out to the reviewers of the PR but haven't heard 
anything back so I thought I'd try here instead.  We're happy to help sort out 
any remaining issues if there are any.

This particular PR is part of a larger story that LinkedIn was working on here:

  *   https://issues.apache.org/jira/browse/SPARK-23206

Any help getting #23340 opened back up and moving again would be very much 
appreciated.

Cheers,

Alex Scammon
Head of Open Source Engineering
G-Research



Re: [Spark Core] Merging PR #23340 for New Executor Memory Metrics

2020-06-30 Thread Dongjoon Hyun
HI, Alex and Michel.

I removed the `Stale` label and reopened it for now. You may want to ping
the original author because the last update of that PR is one year ago and
has many conflicts as of today.

Bests,
Dongjoon.

On Tue, Jun 30, 2020 at 10:56 AM Alex Scammon <
alex.scam...@ext.gresearch.co.uk> wrote:

> Can I buymeacoffee.com for someone to take a look at PR#23340
> ?  I'm totally not above
> outright bribery to get some eyes on this PR.
>
> Thanks,
>
> -Alex
> --
> *From:* Michel Sumbul 
> *Sent:* Thursday, June 25, 2020 11:48 AM
> *To:* dev@spark.apache.org ; Alex Scammon <
> alex.scam...@ext.gresearch.co.uk>
> *Subject:* Re: [Spark Core] Merging PR #23340 for New Executor Memory
> Metrics
>
>
> Hey Dev team,
>
> I agree with Alex; these metrics can be really useful for tuning jobs.
> Any chance someone can have a look at it?
>
> Thanks,
> Michel
> On Monday, 22 June 2020 at 22:48:23 UTC+1, Alex Scammon <alex.scam...@ext.gresearch.co.uk> wrote:
>
>
> Hi there devs,
>
> Congrats on Spark 3.0.0, that's great to see.
>
> I'm hoping to get some eyes on something old, however:
>
>- https://github.com/apache/spark/pull/23340
>
> I'm really just trying to get some eyes on this PR and see if we can still
> move it forward.  I reached out to the reviewers of the PR but haven't
> heard anything back so I thought I'd try here instead.  We're happy to help
> sort out any remaining issues if there are any.
>
> This particular PR is part of a larger story that LinkedIn was working on
> here:
>
>- https://issues.apache.org/jira/browse/SPARK-23206
>
> Any help getting #23340 opened back up and moving again would be very much
> appreciated.
>
> Cheers,
>
> Alex Scammon
> Head of Open Source Engineering
> G-Research
>
>


Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-30 Thread Holden Karau
I can take care of 2.4.7 unless someone else wants to do it.

On Tue, Jun 30, 2020 at 8:29 PM Jason Moore 
wrote:

> Hi all,
>
>
>
> Could I get some input on the severity of this one that I found
> yesterday?  If that’s a correctness issue, should it block this patch?  Let
> me know under the ticket if there’s more info that I can provide to help.
>
>
>
> https://issues.apache.org/jira/browse/SPARK-32136
>
>
>
> Thanks,
>
> Jason.
>
>
>
> *From: *Jungtaek Lim 
> *Date: *Wednesday, 1 July 2020 at 10:20 am
> *To: *Shivaram Venkataraman 
> *Cc: *Prashant Sharma , 郑瑞峰 ,
> Gengliang Wang , gurwls223 <
> gurwls...@gmail.com>, Dongjoon Hyun , Jules
> Damji , Holden Karau , Reynold
> Xin , Yuanjian Li , "
> dev@spark.apache.org" , Takeshi Yamamuro <
> linguin@gmail.com>
> *Subject: *Re: [DISCUSS] Apache Spark 3.0.1 Release
>
>
>
> SPARK-32130 [1] looks to be a performance regression introduced in Spark
> 3.0.0, which is ideal to look into before releasing another bugfix version.
>
>
>
> 1. https://issues.apache.org/jira/browse/SPARK-32130
>
>
>
> On Wed, Jul 1, 2020 at 7:05 AM Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
> Hi all
>
>
>
> I just wanted to ping this thread to see if all the outstanding blockers
> for 3.0.1 have been fixed. If so, it would be great if we can get the
> release going. The CRAN team sent us a note that the version SparkR
> available on CRAN for the current R version (4.0.2) is broken and hence we
> need to update the package soon --  it will be great to do it with 3.0.1.
>
>
>
> Thanks
>
> Shivaram
>
>
>
> On Wed, Jun 24, 2020 at 8:31 PM Prashant Sharma 
> wrote:
>
> +1 for 3.0.1 release.
>
> I too can help out as release manager.
>
>
>
> On Thu, Jun 25, 2020 at 4:58 AM 郑瑞峰  wrote:
>
> I volunteer to be a release manager of 3.0.1, if nobody is working on this.
>
>
>
>
>
> -- Original Message --
>
> *From:* "Gengliang Wang";
>
> *Sent:* Wednesday, June 24, 2020, 4:15 PM
>
> *To:* "Hyukjin Kwon";
>
> *Cc:* "Dongjoon Hyun";"Jungtaek Lim"<
> kabhwan.opensou...@gmail.com>;"Jules Damji";"Holden
> Karau";"Reynold Xin";"Shivaram
> Venkataraman";"Yuanjian Li"<
> xyliyuanj...@gmail.com>;"Spark dev list";"Takeshi
> Yamamuro";
>
> *Subject:* Re: [DISCUSS] Apache Spark 3.0.1 Release
>
>
>
> +1, the issues mentioned are really serious.
>
>
>
> On Tue, Jun 23, 2020 at 7:56 PM Hyukjin Kwon  wrote:
>
> +1.
>
> Just as a note,
> - SPARK-31918 is fixed now, and there's no blocker.
> - When we build SparkR, we should use the latest R version, at least 4.0.0+.
>
>
>
> On Wed, Jun 24, 2020 at 11:20 AM, Dongjoon Hyun wrote:
>
> +1
>
>
>
> Bests,
>
> Dongjoon.
>
>
>
> On Tue, Jun 23, 2020 at 1:19 PM Jungtaek Lim 
> wrote:
>
> +1 on a 3.0.1 soon.
>
>
>
> Probably it would be nice if some Scala experts can take a look at
> https://issues.apache.org/jira/browse/SPARK-32051 and include the fix
> into 3.0.1 if possible.
>
> Looks like APIs designed to work with Scala 2.11 & Java bring ambiguity in
> Scala 2.12 & Java.
>
>
>
> On Wed, Jun 24, 2020 at 4:52 AM Jules Damji  wrote:
>
> +1 (non-binding)
>
>
>
> Sent from my iPhone
>
> Pardon the dumb thumb typos :)
>
>
>
> On Jun 23, 2020, at 11:36 AM, Holden Karau  wrote:
>
> +1 on a patch release soon
>
>
>
> On Tue, Jun 23, 2020 at 10:47 AM Reynold Xin  wrote:
>
>
> +1 on doing a new patch release soon. I saw some of these issues when
> preparing the 3.0 release, and some of them are very serious.
>
>
>
>
>
> On Tue, Jun 23, 2020 at 8:06 AM, Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
> +1 Thanks Yuanjian -- I think it'll be great to have a 3.0.1 release soon.
>
> Shivaram
>
> On Tue, Jun 23, 2020 at 3:43 AM Takeshi Yamamuro 
> wrote:
>
> Thanks for the heads-up, Yuanjian!
>
> I also noticed branch-3.0 already has 39 commits after Spark 3.0.0.
>
> wow, the updates are so quick. Anyway, +1 for the release.
>
> Bests,
> Takeshi
>
> On Tue, Jun 23, 2020 at 4:59 PM Yuanjian Li 
> wrote:
>
> Hi dev-list,
>
> I’m writing this to raise the discussion about Spark 3.0.1 feasibility
> since 4 blocker issues were found after Spark 3.0.0:
>
> [SPARK-31990] The broken state store compatibility will cause a
> correctness issue when a streaming query with `dropDuplicates` uses a
> checkpoint written by an old Spark version.
>
> [SPARK-32038] The regression bug in handling NaN values in COUNT(DISTINCT)
>
> [SPARK-31918][WIP] CRAN requires SparkR to work with the latest R 4.0.
> This makes the 3.0 release unavailable on CRAN, since it only supports R
> [3.5, 4.0).
>
> [SPARK-31967] Downgrade vis.js to fix Jobs UI loading time regression
>
> I also noticed branch-3.0 already has 39 commits after Spark 3.0.0. I
> think it would be great if we have Spark 3.0.1 to deliver the critical
> fixes.
>
> Any comments are appreciated.
>
> Best,
>
> Yuanjian
>
> --
> ---
> Takeshi Yamamuro
>
> 

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-30 Thread Jason Moore
Hi all,

Could I get some input on the severity of this one that I found yesterday?  If 
that’s a correctness issue, should it block this patch?  Let me know under the 
ticket if there’s more info that I can provide to help.

https://issues.apache.org/jira/browse/SPARK-32136

Thanks,
Jason.

From: Jungtaek Lim 
Date: Wednesday, 1 July 2020 at 10:20 am
To: Shivaram Venkataraman 
Cc: Prashant Sharma , 郑瑞峰 , 
Gengliang Wang , gurwls223 
, Dongjoon Hyun , Jules Damji 
, Holden Karau , Reynold Xin 
, Yuanjian Li , 
"dev@spark.apache.org" , Takeshi Yamamuro 

Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release

SPARK-32130 [1] looks to be a performance regression introduced in Spark 3.0.0, 
which is ideal to look into before releasing another bugfix version.

1. https://issues.apache.org/jira/browse/SPARK-32130

On Wed, Jul 1, 2020 at 7:05 AM Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
Hi all

I just wanted to ping this thread to see if all the outstanding blockers for 
3.0.1 have been fixed. If so, it would be great if we can get the release 
going. The CRAN team sent us a note that the version SparkR available on CRAN 
for the current R version (4.0.2) is broken and hence we need to update the 
package soon --  it will be great to do it with 3.0.1.

Thanks
Shivaram

On Wed, Jun 24, 2020 at 8:31 PM Prashant Sharma <scrapco...@gmail.com> wrote:
+1 for 3.0.1 release.
I too can help out as release manager.

On Thu, Jun 25, 2020 at 4:58 AM 郑瑞峰 <ruife...@foxmail.com> wrote:
I volunteer to be a release manager of 3.0.1, if nobody is working on this.


-- Original Message --
From: "Gengliang Wang" <gengliang.w...@databricks.com>;
Sent: Wednesday, June 24, 2020, 4:15 PM
To: "Hyukjin Kwon" <gurwls...@gmail.com>;
Cc: "Dongjoon Hyun" <dongjoon.h...@gmail.com>; "Jungtaek Lim" <kabhwan.opensou...@gmail.com>; "Jules Damji" <dmat...@comcast.net>; "Holden Karau" <hol...@pigscanfly.ca>; "Reynold Xin" <r...@databricks.com>; "Shivaram Venkataraman" <shiva...@eecs.berkeley.edu>; "Yuanjian Li" <xyliyuanj...@gmail.com>; "Spark dev list" <dev@spark.apache.org>; "Takeshi Yamamuro" <linguin@gmail.com>;
Subject: Re: [DISCUSS] Apache Spark 3.0.1 Release

+1, the issues mentioned are really serious.

On Tue, Jun 23, 2020 at 7:56 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
+1.

Just as a note,
- SPARK-31918 is fixed now, and there's no blocker.
- When we build SparkR, we should use the latest R version, at least 4.0.0+.

On Wed, Jun 24, 2020 at 11:20 AM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
+1

Bests,
Dongjoon.

On Tue, Jun 23, 2020 at 1:19 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
+1 on a 3.0.1 soon.

Probably it would be nice if some Scala experts can take a look at 
https://issues.apache.org/jira/browse/SPARK-32051 and include the fix into 
3.0.1 if possible.
Looks like APIs designed to work with Scala 2.11 & Java bring ambiguity in 
Scala 2.12 & Java.

On Wed, Jun 24, 2020 at 4:52 AM Jules Damji <dmat...@comcast.net> wrote:
+1 (non-binding)

Sent from my iPhone
Pardon the dumb thumb typos :)


On Jun 23, 2020, at 11:36 AM, Holden Karau <hol...@pigscanfly.ca> wrote:
+1 on a patch release soon

On Tue, Jun 23, 2020 at 10:47 AM Reynold Xin <r...@databricks.com> wrote:
+1 on doing a new patch release soon. I saw some of these issues when preparing 
the 3.0 release, and some of them are very serious.


On Tue, Jun 23, 2020 at 8:06 AM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:

+1 Thanks Yuanjian -- I think it'll be great to have a 3.0.1 release soon.

Shivaram

On Tue, Jun 23, 2020 at 3:43 AM Takeshi Yamamuro <linguin@gmail.com> wrote:

Thanks for the heads-up, Yuanjian!

I also noticed branch-3.0 already has 39 commits after Spark 3.0.0.

wow, the updates are so quick. Anyway, +1 for the release.

Bests,
Takeshi

On Tue, Jun 23, 2020 at 4:59 PM Yuanjian Li <xyliyuanj...@gmail.com> wrote:

Hi dev-list,

I’m writing this to raise the discussion about Spark 3.0.1 feasibility since 4 
blocker issues were found after Spark 3.0.0:

[SPARK-31990] The broken state store compatibility will cause a correctness 
issue when a streaming query with `dropDuplicates` uses a checkpoint written by 
an old Spark version.

[SPARK-32038] The regression bug in handling NaN values in COUNT(DISTINCT)
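As an aside for anyone triaging this one: the snippet below is a plain-Python sketch, not Spark's implementation, of why NaN is a classic pitfall for DISTINCT-style aggregation. NaN never compares equal to itself, so naive equality-based de-duplication counts every NaN separately; the function names here are illustrative, not Spark APIs.

```python
# Plain-Python illustration (not Spark code) of the NaN / COUNT(DISTINCT) pitfall:
# NaN != NaN, so equality-based de-duplication treats each NaN as a new value.
import math

values = [1.0, float("nan"), float("nan"), 2.0]

print(values[1] == values[2])  # False: two NaNs never compare equal


def count_distinct_naive(xs):
    """Equality-based distinct count; every NaN slips past the equality check."""
    seen = []
    for x in xs:
        if not any(x == s for s in seen):  # NaN == NaN is False, so NaNs pile up
            seen.append(x)
    return len(seen)


def count_distinct_nan_aware(xs):
    """Collapse all NaNs into a single distinct value, as SQL engines should."""
    seen = []
    saw_nan = False
    for x in xs:
        if isinstance(x, float) and math.isnan(x):
            saw_nan = True
        elif x not in seen:
            seen.append(x)
    return len(seen) + (1 if saw_nan else 0)


print(count_distinct_naive(values))      # 4 -- each NaN counted separately
print(count_distinct_nan_aware(values))  # 3 -- NaNs collapsed into one
```

The same comparison semantics are what make NaN handling in a distributed aggregation easy to get subtly wrong.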

[SPARK-31918][WIP] CRAN requires SparkR to work with the latest R 4.0. This 
makes the 3.0 release unavailable on CRAN, since it only supports R [3.5, 4.0).

[SPARK-31967] Downgrade vis.js to fix Jobs UI loading time regression

I also noticed branch-3.0 already has 39 commits after Spark 3.0.0. I think it 
would be great if we have Spark 3.0.1 to deliver the critical fixes.

Any comments are appreciated.

Best,

Yuanjian

--
---
Takeshi Yamamuro


Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-30 Thread Jungtaek Lim
SPARK-32130 [1] looks to be a performance regression introduced in Spark
3.0.0, which would be ideal to look into before releasing another bugfix version.

1. https://issues.apache.org/jira/browse/SPARK-32130

On Wed, Jul 1, 2020 at 7:05 AM Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> Hi all
>
> I just wanted to ping this thread to see if all the outstanding blockers
> for 3.0.1 have been fixed. If so, it would be great if we can get the
> release going. The CRAN team sent us a note that the version SparkR
> available on CRAN for the current R version (4.0.2) is broken and hence we
> need to update the package soon --  it will be great to do it with 3.0.1.
>
> Thanks
> Shivaram
>
> On Wed, Jun 24, 2020 at 8:31 PM Prashant Sharma 
> wrote:
>
>> +1 for 3.0.1 release.
>> I too can help out as release manager.
>>
>> On Thu, Jun 25, 2020 at 4:58 AM 郑瑞峰  wrote:
>>
>>> I volunteer to be a release manager of 3.0.1, if nobody is working on
>>> this.
>>>
>>>
>>> -- Original Message --
>>> *From:* "Gengliang Wang";
>>> *Sent:* Wednesday, June 24, 2020, 4:15 PM
>>> *To:* "Hyukjin Kwon";
>>> *Cc:* "Dongjoon Hyun";"Jungtaek Lim"<
>>> kabhwan.opensou...@gmail.com>;"Jules Damji";"Holden
>>> Karau";"Reynold Xin";"Shivaram
>>> Venkataraman";"Yuanjian Li"<
>>> xyliyuanj...@gmail.com>;"Spark dev list";"Takeshi
>>> Yamamuro";
>>> *Subject:* Re: [DISCUSS] Apache Spark 3.0.1 Release
>>>
>>> +1, the issues mentioned are really serious.
>>>
>>> On Tue, Jun 23, 2020 at 7:56 PM Hyukjin Kwon 
>>> wrote:
>>>
 +1.

 Just as a note,
 - SPARK-31918 is fixed now, and there's no blocker.
 - When we build SparkR, we should use the latest R version, at least 4.0.0+.

 On Wed, Jun 24, 2020 at 11:20 AM, Dongjoon Hyun wrote:

> +1
>
> Bests,
> Dongjoon.
>
> On Tue, Jun 23, 2020 at 1:19 PM Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
>
>> +1 on a 3.0.1 soon.
>>
>> Probably it would be nice if some Scala experts can take a look at
>> https://issues.apache.org/jira/browse/SPARK-32051 and include the
>> fix into 3.0.1 if possible.
>> Looks like APIs designed to work with Scala 2.11 & Java bring
>> ambiguity in Scala 2.12 & Java.
>>
>> On Wed, Jun 24, 2020 at 4:52 AM Jules Damji 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Sent from my iPhone
>>> Pardon the dumb thumb typos :)
>>>
>>> On Jun 23, 2020, at 11:36 AM, Holden Karau 
>>> wrote:
>>>
>>> 
>>> +1 on a patch release soon
>>>
>>> On Tue, Jun 23, 2020 at 10:47 AM Reynold Xin 
>>> wrote:
>>>
 +1 on doing a new patch release soon. I saw some of these issues
 when preparing the 3.0 release, and some of them are very serious.


 On Tue, Jun 23, 2020 at 8:06 AM, Shivaram Venkataraman <
 shiva...@eecs.berkeley.edu> wrote:

> +1 Thanks Yuanjian -- I think it'll be great to have a 3.0.1
> release soon.
>
> Shivaram
>
> On Tue, Jun 23, 2020 at 3:43 AM Takeshi Yamamuro <
> linguin@gmail.com> wrote:
>
> Thanks for the heads-up, Yuanjian!
>
> I also noticed branch-3.0 already has 39 commits after Spark
> 3.0.0.
>
> wow, the updates are so quick. Anyway, +1 for the release.
>
> Bests,
> Takeshi
>
> On Tue, Jun 23, 2020 at 4:59 PM Yuanjian Li <
> xyliyuanj...@gmail.com> wrote:
>
> Hi dev-list,
>
> I’m writing this to raise the discussion about Spark 3.0.1
> feasibility since 4 blocker issues were found after Spark 3.0.0:
>
> [SPARK-31990] The broken state store compatibility will cause a
> correctness issue when a streaming query with `dropDuplicates` uses a
> checkpoint written by an old Spark version.
>
> [SPARK-32038] The regression bug in handling NaN values in
> COUNT(DISTINCT)
>
> [SPARK-31918][WIP] CRAN requires SparkR to work with the
> latest R 4.0. This makes the 3.0 release unavailable on CRAN, since it
> only supports R [3.5, 4.0).
>
> [SPARK-31967] Downgrade vis.js to fix Jobs UI loading time
> regression
>
> I also noticed branch-3.0 already has 39 commits after Spark
> 3.0.0. I think it would be great if we have Spark 3.0.1 to deliver the
> critical fixes.
>
> Any comments are appreciated.
>
> Best,
>
> Yuanjian
>
> --
> ---
> Takeshi Yamamuro
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


>>>
>>> --
>>> Twitter: 

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-30 Thread Shivaram Venkataraman
Hi all

I just wanted to ping this thread to see if all the outstanding blockers
for 3.0.1 have been fixed. If so, it would be great if we can get the
release going. The CRAN team sent us a note that the version SparkR
available on CRAN for the current R version (4.0.2) is broken and hence we
need to update the package soon --  it will be great to do it with 3.0.1.

Thanks
Shivaram

On Wed, Jun 24, 2020 at 8:31 PM Prashant Sharma 
wrote:

> +1 for 3.0.1 release.
> I too can help out as release manager.
>
> On Thu, Jun 25, 2020 at 4:58 AM 郑瑞峰  wrote:
>
>> I volunteer to be a release manager of 3.0.1, if nobody is working on
>> this.
>>
>>
>> -- Original Message --
>> *From:* "Gengliang Wang";
>> *Sent:* Wednesday, June 24, 2020, 4:15 PM
>> *To:* "Hyukjin Kwon";
>> *Cc:* "Dongjoon Hyun";"Jungtaek Lim"<
>> kabhwan.opensou...@gmail.com>;"Jules Damji";"Holden
>> Karau";"Reynold Xin";"Shivaram
>> Venkataraman";"Yuanjian Li"<
>> xyliyuanj...@gmail.com>;"Spark dev list";"Takeshi
>> Yamamuro";
>> *Subject:* Re: [DISCUSS] Apache Spark 3.0.1 Release
>>
>> +1, the issues mentioned are really serious.
>>
>> On Tue, Jun 23, 2020 at 7:56 PM Hyukjin Kwon  wrote:
>>
>>> +1.
>>>
>>> Just as a note,
>>> - SPARK-31918 is fixed now, and there's no blocker.
>>> - When we build SparkR, we should use the latest R version, at least 4.0.0+.
>>>
>>> On Wed, Jun 24, 2020 at 11:20 AM, Dongjoon Hyun wrote:
>>>
 +1

 Bests,
 Dongjoon.

 On Tue, Jun 23, 2020 at 1:19 PM Jungtaek Lim <
 kabhwan.opensou...@gmail.com> wrote:

> +1 on a 3.0.1 soon.
>
> Probably it would be nice if some Scala experts can take a look at
> https://issues.apache.org/jira/browse/SPARK-32051 and include the fix
> into 3.0.1 if possible.
> Looks like APIs designed to work with Scala 2.11 & Java bring
> ambiguity in Scala 2.12 & Java.
>
> On Wed, Jun 24, 2020 at 4:52 AM Jules Damji 
> wrote:
>
>> +1 (non-binding)
>>
>> Sent from my iPhone
>> Pardon the dumb thumb typos :)
>>
>> On Jun 23, 2020, at 11:36 AM, Holden Karau 
>> wrote:
>>
>> 
>> +1 on a patch release soon
>>
>> On Tue, Jun 23, 2020 at 10:47 AM Reynold Xin 
>> wrote:
>>
>>> +1 on doing a new patch release soon. I saw some of these issues
>>> when preparing the 3.0 release, and some of them are very serious.
>>>
>>>
>>> On Tue, Jun 23, 2020 at 8:06 AM, Shivaram Venkataraman <
>>> shiva...@eecs.berkeley.edu> wrote:
>>>
 +1 Thanks Yuanjian -- I think it'll be great to have a 3.0.1
 release soon.

 Shivaram

 On Tue, Jun 23, 2020 at 3:43 AM Takeshi Yamamuro <
 linguin@gmail.com> wrote:

 Thanks for the heads-up, Yuanjian!

 I also noticed branch-3.0 already has 39 commits after Spark 3.0.0.

 wow, the updates are so quick. Anyway, +1 for the release.

 Bests,
 Takeshi

 On Tue, Jun 23, 2020 at 4:59 PM Yuanjian Li 
 wrote:

 Hi dev-list,

 I’m writing this to raise the discussion about Spark 3.0.1
 feasibility since 4 blocker issues were found after Spark 3.0.0:

 [SPARK-31990] The broken state store compatibility will cause a
 correctness issue when a streaming query with `dropDuplicates` uses a
 checkpoint written by an old Spark version.

 [SPARK-32038] The regression bug in handling NaN values in
 COUNT(DISTINCT)

 [SPARK-31918][WIP] CRAN requires SparkR to work with the latest
 R 4.0. This makes the 3.0 release unavailable on CRAN, since it only supports
 R [3.5, 4.0).

 [SPARK-31967] Downgrade vis.js to fix Jobs UI loading time
 regression

 I also noticed branch-3.0 already has 39 commits after Spark 3.0.0.
 I think it would be great if we have Spark 3.0.1 to deliver the 
 critical
 fixes.

 Any comments are appreciated.

 Best,

 Yuanjian

 --
 ---
 Takeshi Yamamuro

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

>>>
>>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>


Re: [Spark Core] Merging PR #23340 for New Executor Memory Metrics

2020-06-30 Thread Alex Scammon
Can I buymeacoffee.com for someone to take a look at 
PR#23340?  I'm totally not above 
outright bribery to get some eyes on this PR.

Thanks,

-Alex

From: Michel Sumbul 
Sent: Thursday, June 25, 2020 11:48 AM
To: dev@spark.apache.org ; Alex Scammon 

Subject: Re: [Spark Core] Merging PR #23340 for New Executor Memory Metrics


Hey Dev team,

I agree with Alex; these metrics can be really useful for tuning jobs.
Any chance someone can have a look at it?

Thanks,
Michel
On Monday, 22 June 2020 at 22:48:23 UTC+1, Alex Scammon wrote:


Hi there devs,

Congrats on Spark 3.0.0, that's great to see.

I'm hoping to get some eyes on something old, however:

  *   https://github.com/apache/spark/pull/23340

I'm really just trying to get some eyes on this PR and see if we can still move 
it forward.  I reached out to the reviewers of the PR but haven't heard 
anything back so I thought I'd try here instead.  We're happy to help sort out 
any remaining issues if there are any.

This particular PR is part of a larger story that LinkedIn was working on here:

  *   https://issues.apache.org/jira/browse/SPARK-23206

Any help getting #23340 opened back up and moving again would be very much 
appreciated.

Cheers,

Alex Scammon
Head of Open Source Engineering
G-Research



Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-30 Thread Tom Graves
 Stage Level Scheduling -  https://issues.apache.org/jira/browse/SPARK-27495

Tom

On Monday, June 29, 2020, 11:07:18 AM CDT, Dongjoon Hyun wrote:
 
Hi, All.

After a short celebration of Apache Spark 3.0, I'd like to ask the community's
opinion on Apache Spark 3.1 feature expectations.

First of all, Apache Spark 3.1 is scheduled for December 2020.
- https://spark.apache.org/versioning-policy.html

I'm expecting the following items:

1. Support Scala 2.13
2. Use Apache Hadoop 3.2 by default for better cloud support
3. Declaring Kubernetes Scheduler GA
   In my perspective, the last main missing piece was dynamic allocation, and
   - Dynamic allocation with shuffle tracking already shipped in 3.0.
   - Dynamic allocation with worker decommission/data migration is targeting 3.1. (Thanks, Holden)
4. DSv2 Stabilization

I'm aware of some more features which are on the way currently, but I'd love to
hear the opinions of the main developers and, moreover, of the main users who
need those features.

Thank you in advance. Any comments are welcome.

Bests,
Dongjoon.

Re: Spark 3 pod template for the driver

2020-06-30 Thread Michel Sumbul
Hi Edeesis,

The goal is to not have these settings in the spark-submit command. If I
specify the same things in a pod template for the executor, I still get the
message:
"Exception in thread "main" org.apache.spark.SparkException: Must specify
the driver container image"

It doesn't even try to start an executor container, as the driver has not
started yet.
Any idea?

Thanks,
Michel

On Tue, Jun 30, 2020 at 00:06, edeesis wrote:

> If I could muster a guess, you still need to specify the executor image. As
> is, this will only specify the driver image.
>
> You can specify it as --conf spark.kubernetes.container.image or --conf
> spark.kubernetes.executor.container.image
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Contribute to Apache Spark

2020-06-30 Thread Takeshi Yamamuro
Hi,

Thanks for your interest!
Please read the contribution guide first:
https://spark.apache.org/contributing.html

We don't have such a permission; you can file issues in Jira yourself,
then open a PR for them.

Enjoy your work!

On Tue, Jun 30, 2020 at 2:34 PM 飘鹅玉雪 <397189...@qq.com> wrote:

> Hi,
> I want to contribute to Apache Spark.
> Would you please give me the contributor permission?
> My JIRA ID is suizhe007.
>


-- 
---
Takeshi Yamamuro