Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread Felix Cheung
Reposting for Shane here:

[SPARK-27178]
https://github.com/apache/spark/commit/342e91fdfa4e6ce5cc3a0da085d1fe723184021b

is problematic too, and it's not in the RC8 cut:

https://github.com/apache/spark/commits/branch-2.4

(Personally, I don't want to delay 2.4.1 either.)


From: Sean Owen 
Sent: Wednesday, March 20, 2019 11:18 AM
To: DB Tsai
Cc: dev
Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

+1 for this RC. The tag is correct, licenses and sigs check out, and tests
of the source with most profiles enabled work for me.
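For anyone reproducing the "licenses and sigs check out" step, here is a hedged sketch of the digest/signature workflow using a local stand-in file (`artifact.tgz` is made up for illustration; substitute the real artifacts from the v2.4.1-rc8-bin area, and note the gpg steps are shown commented out because they need the downloaded files):

```shell
# Illustration of the digest check with a local stand-in file.
echo "release artifact" > artifact.tgz
sha512sum artifact.tgz > artifact.tgz.sha512
sha512sum -c artifact.tgz.sha512          # prints: artifact.tgz: OK
# For the real RC, verify the signature against the committed KEYS file:
#   gpg --import KEYS
#   gpg --verify spark-2.4.1-rc8.tgz.asc spark-2.4.1-rc8.tgz
```

The same pattern applies to each file under v2.4.1-rc8-bin/: fetch the artifact, its `.sha512`, and its `.asc`, then run the check and verify commands.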

On Tue, Mar 19, 2019 at 5:28 PM DB Tsai  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.4.1.
>
> The vote is open until March 23 PST and passes if a majority +1 PMC votes are 
> cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.1-rc8 (commit 
> 746b3ddee6f7ad3464e326228ea226f5b1f39a41):
> https://github.com/apache/spark/tree/v2.4.1-rc8
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1318/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-docs/
>
> The list of bug fixes going into 2.4.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.4.1?
> ===
>
> The current list of open tickets targeted at 2.4.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.4.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
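As a concrete companion to the testing instructions quoted above, here is one hedged sketch of isolating an RC test environment and cleaning the artifact caches afterward. The venv path and the pyspark tarball name are assumptions for illustration; take the actual file from the v2.4.1-rc8-bin area, and the cache paths are the sbt/ivy and Maven defaults:

```shell
# Set up an isolated environment for trying the RC.
python3 -m venv /tmp/spark-rc-test
. /tmp/spark-rc-test/bin/activate
# pip install pyspark-2.4.1.tar.gz   # downloaded from v2.4.1-rc8-bin/
# ... run an existing workload against the RC here ...
deactivate
# Clean the local artifact caches so later builds don't silently resolve
# the stale RC jars (default sbt/ivy and Maven cache locations):
rm -rf ~/.ivy2/cache/org.apache.spark ~/.m2/repository/org/apache/spark
```

For Java/Scala, the staging repository URL from the vote email would go into the project's resolvers (e.g., an sbt `resolvers +=` line or a Maven `<repository>` entry) for the duration of the test.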

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-20 Thread Xiangrui Meng
Steve, the initial work would focus on GPUs, but we will keep the
interfaces general to support other accelerators in the future. This was
mentioned in the SPIP and draft design.

Imran, you should have comment permission now. Thanks for making a pass! I
don't think the proposed 3.0 features should block the Spark 3.0 release
either. It is just an estimate of what we could deliver. I will update the
doc to make it clear.

Felix, it would be great if you can review the updated docs and let us know
your feedback.

** How about setting a tentative vote closing time to next Tue (Mar 26)?

On Wed, Mar 20, 2019 at 11:01 AM Imran Rashid  wrote:

> Thanks for sending the updated docs.  Can you please give everyone the
> ability to comment?  I have some comments, but overall I think this is a
> good proposal and addresses my prior concerns.
>
> My only real concern is that I notice some mention of "must dos" for Spark
> 3.0.  I don't want to make any commitment to holding Spark 3.0 for parts of
> this; I think that is an entirely separate decision.  However, I'm guessing
> this is just a minor wording issue, and you really mean that's a minimal
> set of features you are aiming for, which is reasonable.
>
> On Mon, Mar 18, 2019 at 12:56 PM Xingbo Jiang 
> wrote:
>
>> Hi all,
>>
>> I updated the SPIP doc
>> 
>> and stories
>> ,
>> I hope it now contains a clear scope of the changes and enough detail for
>> the SPIP vote.
>> Please review the updated docs, thanks!
>>
>> Xiangrui Meng wrote on Wed, Mar 6, 2019 at 8:35 AM:
>>
>>> How about letting Xingbo make a major revision to the SPIP doc to make
>>> it clear what is proposed? I like Felix's suggestion to switch to the new
>>> Heilmeier template, which helps clarify what is proposed and what is not.
>>> Then let's review the new SPIP and resume the vote.
>>>
>>> On Tue, Mar 5, 2019 at 7:54 AM Imran Rashid 
>>> wrote:
>>>
 OK, I suppose we are getting bogged down in what a vote on an
 SPIP means anyway, which I guess we can set aside for now.  With the
 level of detail in this proposal, I feel like there is a reasonable chance
 I'd still -1 the design or implementation.

 And the other thing you're implicitly asking the community for is to
 prioritize this feature for continued review and maintenance.  There is
 already work to be done in things like making barrier mode support dynamic
 allocation (SPARK-24942), bugs in failure handling (eg. SPARK-25250), and
 general efficiency of failure handling (eg. SPARK-25341, SPARK-20178).  I'm
 very concerned about getting spread too thin.

>>>
 But if this is really just a vote on (1) is better GPU support
 important for Spark, in some form, in some release? and (2) is it
 *possible* to do this in a safe way?  then I will vote +0.

 On Tue, Mar 5, 2019 at 8:25 AM Tom Graves  wrote:

> So to me, most of the questions here are implementation/design
> questions. I've had this issue in the past with SPIPs, where I expected to
> have more high-level design details but was basically told that belongs in
> the design JIRA follow-on. This makes me think we need to revisit what a
> SPIP really needs to contain, which should be done in a separate thread.
> Personally, I would be for having more high-level details in it.
> But the way I read our documentation on a SPIP right now, that detail
> is all optional; maybe we could argue it's based on what reviewers
> request, but perhaps we should make the wording more clearly
> required.  Thoughts?  We should probably separate that discussion if people
> want to talk about that.
>
> For this SPIP in particular the reason I +1 it is because it came down
> to 2 questions:
>
> 1) Do I think Spark should support this? My answer is yes. I think
> this would improve Spark; users have been requesting both better GPU
> support and support for controlling container requests at a finer
> granularity for a while.  If Spark doesn't support this, users may go
> to something else, so I think we should support it.
>
> 2) Do I think it's possible to design and implement it without causing
> large instabilities? My opinion here, again, is yes. I agree with Imran
> and others that the scheduler piece needs to be looked at very closely,
> as we have had a lot of issues there; that is why I was asking for more
> details in the design JIRA:
> https://issues.apache.org/jira/browse/SPARK-27005.  But I do believe
> it's possible to do.
>
> If others have reservations on similar questions then I think we
> should resolve here or take the discussion of what a SPIP is to a 
> different
> thread and th

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread Sean Owen
+1 for this RC. The tag is correct, licenses and sigs check out, and tests
of the source with most profiles enabled work for me.

On Tue, Mar 19, 2019 at 5:28 PM DB Tsai  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.4.1.
>
> The vote is open until March 23 PST and passes if a majority +1 PMC votes are 
> cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.1-rc8 (commit 
> 746b3ddee6f7ad3464e326228ea226f5b1f39a41):
> https://github.com/apache/spark/tree/v2.4.1-rc8
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1318/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-docs/
>
> The list of bug fixes going into 2.4.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env and install
> the current RC and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.4.1?
> ===
>
> The current list of open tickets targeted at 2.4.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.4.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, 
> Inc
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread DB Tsai
Unfortunately, for the 2.4.1 RC cuts, we ran into a couple of critical bug
fixes unexpectedly, right after each RC was cut; some of the bugs were even
found only after the cut, which makes it hard to know beforehand whether a
blocker remains.

How about we start testing RC8 now, given that the differences between
RC8 and 2.4.0 are big? If an issue is found that justifies failing RC8,
we can include SPARK-27112 and SPARK-27160 in the next cut. That way, even
if we decide to cut another RC, it will be easier to test.

Thanks.

Sincerely,

DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 42E5B25A8F7A82C1




On Wed, Mar 20, 2019 at 9:48 AM dhruve ashar  wrote:
>
> I agree with Imran on this. Since we are already on RC8, we don't want to
> indefinitely hold the release for one more fix, but this one is a severe
> deadlock.
>
> Note to myself and other community members: maybe we can be more proactive
> in checking and reporting in-progress PR reviews/JIRAs for any blockers as
> soon as the first RC is cut, so that we don't hold up the release
> process. I believe in this case, the merge commit and RC were in a very close
> time frame.
>
> On Wed, Mar 20, 2019 at 11:32 AM Imran Rashid  wrote:
>>
>> Even if only the PMC is able to veto a release, I believe all community members
>> are encouraged to vote, even a -1, to express their opinions, right?
>>
>> I am -0.5 on the release because of SPARK-27112.  It is not a regression, so 
>> in that sense I don't think it must hold the release.  But it is fixing a 
>> pretty bad deadlock.
>>
>> That said, I'm only -0.5 because (a) I don't want to keep holding the
>> release indefinitely for "one more fix" and (b) this will probably only hit 
>> users running on large clusters -- probably sophisticated enough users to 
>> apply their own set of patches.  I'd prefer we cut another rc with the fix, 
>> but understand the tradeoffs here.
>>
>> On Wed, Mar 20, 2019 at 10:17 AM Sean Owen  wrote:
>>>
>>> Is it a regression from 2.4.0? That's not the only criterion, but part of it.
>>> The version link is 
>>> https://issues.apache.org/jira/projects/SPARK/versions/12344117
>>>
>>> On Wed, Mar 20, 2019 at 10:15 AM dhruve ashar  wrote:

 A deadlock bug was recently fixed and backported to 2.4, but the rc was 
 cut before that. I think we should include a critical bug like that in the 
 current rc.

 issue: https://issues.apache.org/jira/browse/SPARK-27112
 commit: 
 https://github.com/apache/spark/commit/95e73b328ac883be2ced9099f20c8878e498e297

 I am hitting a dead link while checking:
 https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
 I don't know what the right URL looks like, but we should fix it.

 On Wed, Mar 20, 2019 at 9:14 AM Stavros Kontopoulos 
  wrote:
>
> +1  (non-binding)
>
> On Wed, Mar 20, 2019 at 8:33 AM Sean Owen  wrote:
>>
>> (Only the PMC can veto a release)
>> That doesn't look like a regression. I get that it's important, but I
>> don't see that it should block this release.
>>
>> On Tue, Mar 19, 2019 at 11:00 PM Darcy Shen  
>> wrote:
>> >
>> > -1
>> >
>> > please backport SPARK-27160, a correctness issue about the ORC native
>> > reader.
>> >
>> > see https://github.com/apache/spark/pull/24092
>> >
>> >
>> >  On Wed, 20 Mar 2019 06:21:29 +0800 DB Tsai 
>> >  wrote 
>> >
>> > Please vote on releasing the following candidate as Apache Spark 
>> > version 2.4.1.
>> >
>> > The vote is open until March 23 PST and passes if a majority +1 PMC 
>> > votes are cast, with
>> > a minimum of 3 +1 votes.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.4.1
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.4.1-rc8 (commit 
>> > 746b3ddee6f7ad3464e326228ea226f5b1f39a41):
>> > https://github.com/apache/spark/tree/v2.4.1-rc8
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-bin/
>> >
>> > Signatures used for Spark RCs can be found in this file:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1318/
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-docs/
>> >
>> > The list of bug fixes going into 2.4.1 can be found at the following 
>> > URL:
>> > https://i

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-20 Thread Imran Rashid
Thanks for sending the updated docs.  Can you please give everyone the
ability to comment?  I have some comments, but overall I think this is a
good proposal and addresses my prior concerns.

My only real concern is that I notice some mention of "must dos" for Spark
3.0.  I don't want to make any commitment to holding Spark 3.0 for parts of
this; I think that is an entirely separate decision.  However, I'm guessing
this is just a minor wording issue, and you really mean that's a minimal
set of features you are aiming for, which is reasonable.

On Mon, Mar 18, 2019 at 12:56 PM Xingbo Jiang  wrote:

> Hi all,
>
> I updated the SPIP doc
> 
> and stories
> ,
> I hope it now contains a clear scope of the changes and enough detail for
> the SPIP vote.
> Please review the updated docs, thanks!
>
> Xiangrui Meng wrote on Wed, Mar 6, 2019 at 8:35 AM:
>
>> How about letting Xingbo make a major revision to the SPIP doc to make it
>> clear what is proposed? I like Felix's suggestion to switch to the new
>> Heilmeier template, which helps clarify what is proposed and what is not.
>> Then let's review the new SPIP and resume the vote.
>>
>> On Tue, Mar 5, 2019 at 7:54 AM Imran Rashid  wrote:
>>
>>> OK, I suppose we are getting bogged down in what a vote on an
>>> SPIP means anyway, which I guess we can set aside for now.  With the
>>> level of detail in this proposal, I feel like there is a reasonable chance
>>> I'd still -1 the design or implementation.
>>>
>>> And the other thing you're implicitly asking the community for is to
>>> prioritize this feature for continued review and maintenance.  There is
>>> already work to be done in things like making barrier mode support dynamic
>>> allocation (SPARK-24942), bugs in failure handling (eg. SPARK-25250), and
>>> general efficiency of failure handling (eg. SPARK-25341, SPARK-20178).  I'm
>>> very concerned about getting spread too thin.
>>>
>>
>>> But if this is really just a vote on (1) is better GPU support important
>>> for Spark, in some form, in some release? and (2) is it *possible* to do
>>> this in a safe way?  then I will vote +0.
>>>
>>> On Tue, Mar 5, 2019 at 8:25 AM Tom Graves  wrote:
>>>
 So to me, most of the questions here are implementation/design
 questions. I've had this issue in the past with SPIPs, where I expected to
 have more high-level design details but was basically told that belongs in
 the design JIRA follow-on. This makes me think we need to revisit what a
 SPIP really needs to contain, which should be done in a separate thread.
 Personally, I would be for having more high-level details in it.
 But the way I read our documentation on a SPIP right now, that detail is
 all optional; maybe we could argue it's based on what reviewers request,
 but perhaps we should make the wording more clearly required.
  Thoughts?  We should probably separate that discussion if people want to
 talk about that.

 For this SPIP in particular the reason I +1 it is because it came down
 to 2 questions:

 1) Do I think Spark should support this? My answer is yes. I think
 this would improve Spark; users have been requesting both better GPU
 support and support for controlling container requests at a finer
 granularity for a while.  If Spark doesn't support this, users may go
 to something else, so I think we should support it.

 2) Do I think it's possible to design and implement it without causing
 large instabilities? My opinion here, again, is yes. I agree with Imran and
 others that the scheduler piece needs to be looked at very closely, as we
 have had a lot of issues there; that is why I was asking for more
 details in the design JIRA:
 https://issues.apache.org/jira/browse/SPARK-27005.  But I do believe
 it's possible to do.

 If others have reservations on similar questions, then I think we should
 resolve them here, or take the discussion of what a SPIP is to a different
 thread and then come back to this. Thoughts?

 Note there is already a high-level design for at least the core piece,
 which is what people seem concerned with, so including it in the SPIP
 should be straightforward.

 Tom

 On Monday, March 4, 2019, 2:52:43 PM CST, Imran Rashid <
 im...@therashids.com> wrote:


 On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng  wrote:

 On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung 
 wrote:

 IMO upfront allocation is less useful. Specifically too expensive for
 large jobs.


 This is also an API/design discussion.


 I agree with Felix -- this is more than just an API question.  It has a
 huge impact on the complexity of what you're p

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread Imran Rashid
Even if only the PMC is able to veto a release, I believe all community
members are encouraged to vote, even a -1, to express their opinions, right?

I am -0.5 on the release because of SPARK-27112.  It is not a regression,
so in that sense I don't think it must hold the release.  But it is fixing
a pretty bad deadlock.

That said, I'm only -0.5 because (a) I don't want to keep holding the
release indefinitely for "one more fix" and (b) this will probably only hit
users running on large clusters -- probably sophisticated enough users to
apply their own set of patches.  I'd prefer we cut another rc with the fix,
but understand the tradeoffs here.

On Wed, Mar 20, 2019 at 10:17 AM Sean Owen  wrote:

> Is it a regression from 2.4.0? That's not the only criterion, but part of it.
> The version link is
> https://issues.apache.org/jira/projects/SPARK/versions/12344117
>
> On Wed, Mar 20, 2019 at 10:15 AM dhruve ashar 
> wrote:
>
>> A deadlock bug was recently fixed and backported to 2.4, but the rc was
>> cut before that. I think we should include a critical bug like that in the
>> current rc.
>>
>> issue: https://issues.apache.org/jira/browse/SPARK-27112
>> commit:
>> https://github.com/apache/spark/commit/95e73b328ac883be2ced9099f20c8878e498e297
>>
>> I am hitting a dead link while checking:
>> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>> I don't know what the right URL looks like, but we should fix it.
>>
>> On Wed, Mar 20, 2019 at 9:14 AM Stavros Kontopoulos <
>> stavros.kontopou...@lightbend.com> wrote:
>>
>>> +1  (non-binding)
>>>
>>> On Wed, Mar 20, 2019 at 8:33 AM Sean Owen  wrote:
>>>
 (Only the PMC can veto a release)
 That doesn't look like a regression. I get that it's important, but I
 don't see that it should block this release.

 On Tue, Mar 19, 2019 at 11:00 PM Darcy Shen 
 wrote:
 >
 > -1
 >
 > please backport SPARK-27160, a correctness issue about the ORC native
 reader.
 >
 > see https://github.com/apache/spark/pull/24092
 >
 >
 >  On Wed, 20 Mar 2019 06:21:29 +0800 DB Tsai
  wrote 
 >
 > Please vote on releasing the following candidate as Apache Spark
 version 2.4.1.
 >
 > The vote is open until March 23 PST and passes if a majority +1 PMC
 votes are cast, with
 > a minimum of 3 +1 votes.
 >
 > [ ] +1 Release this package as Apache Spark 2.4.1
 > [ ] -1 Do not release this package because ...
 >
 > To learn more about Apache Spark, please see http://spark.apache.org/
 >
 > The tag to be voted on is v2.4.1-rc8 (commit
 746b3ddee6f7ad3464e326228ea226f5b1f39a41):
 > https://github.com/apache/spark/tree/v2.4.1-rc8
 >
 > The release files, including signatures, digests, etc. can be found
 at:
 > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-bin/
 >
 > Signatures used for Spark RCs can be found in this file:
 > https://dist.apache.org/repos/dist/dev/spark/KEYS
 >
 > The staging repository for this release can be found at:
 >
 https://repository.apache.org/content/repositories/orgapachespark-1318/
 >
 > The documentation corresponding to this release can be found at:
 > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-docs/
 >
 > The list of bug fixes going into 2.4.1 can be found at the following
 URL:
 > https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
 >
 > FAQ
 >
 > =
 > How can I help test this release?
 > =
 >
 > If you are a Spark user, you can help us test this release by taking
 > an existing Spark workload and running it on this release candidate, then
 > reporting any regressions.
 >
 > If you're working in PySpark, you can set up a virtual env and install
 > the current RC and see if anything important breaks; in Java/Scala,
 > you can add the staging repository to your project's resolvers and test
 > with the RC (make sure to clean up the artifact cache before/after so
 > you don't end up building with an out-of-date RC going forward).
 >
 > ===
 > What should happen to JIRA tickets still targeting 2.4.1?
 > ===
 >
 > The current list of open tickets targeted at 2.4.1 can be found at:
 > https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 2.4.1
 >
 > Committers should look at those and triage. Extremely important bug
 > fixes, documentation, and API tweaks that impact compatibility should
 > be worked on immediately. Everything else please retarget to an
 > appropriate release.
 >
 > ==
 > But my bug isn't fixed?
 > ==
 >
 > In order to make timely releases, we will typically not hold the
 > release unless the bug 

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread dhruve ashar
I agree with Imran on this. Since we are already on RC8, we don't want to
indefinitely hold the release for one more fix, but this one is a severe
deadlock.

Note to myself and other community members: maybe we can be more proactive
in checking and reporting in-progress PR reviews/JIRAs for any blockers as
soon as the first RC is cut, so that we don't hold up the release
process. I believe in this case, the merge commit and RC were in a very
close time frame.

On Wed, Mar 20, 2019 at 11:32 AM Imran Rashid  wrote:

> Even if only the PMC is able to veto a release, I believe all community
> members are encouraged to vote, even a -1, to express their opinions, right?
>
> I am -0.5 on the release because of SPARK-27112.  It is not a regression,
> so in that sense I don't think it must hold the release.  But it is fixing
> a pretty bad deadlock.
>
> That said, I'm only -0.5 because (a) I don't want to keep holding the
> release indefinitely for "one more fix" and (b) this will probably only hit
> users running on large clusters -- probably sophisticated enough users to
> apply their own set of patches.  I'd prefer we cut another rc with the fix,
> but understand the tradeoffs here.
>
> On Wed, Mar 20, 2019 at 10:17 AM Sean Owen  wrote:
>
>> Is it a regression from 2.4.0? That's not the only criterion, but part
>> of it.
>> The version link is
>> https://issues.apache.org/jira/projects/SPARK/versions/12344117
>>
>> On Wed, Mar 20, 2019 at 10:15 AM dhruve ashar 
>> wrote:
>>
>>> A deadlock bug was recently fixed and backported to 2.4, but the rc was
>>> cut before that. I think we should include a critical bug like that in the
>>> current rc.
>>>
>>> issue: https://issues.apache.org/jira/browse/SPARK-27112
>>> commit:
>>> https://github.com/apache/spark/commit/95e73b328ac883be2ced9099f20c8878e498e297
>>>
>>> I am hitting a dead link while checking:
>>> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>>> I don't know what the right URL looks like, but we should fix it.
>>>
>>> On Wed, Mar 20, 2019 at 9:14 AM Stavros Kontopoulos <
>>> stavros.kontopou...@lightbend.com> wrote:
>>>
 +1  (non-binding)

 On Wed, Mar 20, 2019 at 8:33 AM Sean Owen  wrote:

> (Only the PMC can veto a release)
> That doesn't look like a regression. I get that it's important, but I
> don't see that it should block this release.
>
> On Tue, Mar 19, 2019 at 11:00 PM Darcy Shen 
> wrote:
> >
> > -1
> >
> > please backport SPARK-27160, a correctness issue about the ORC native
> reader.
> >
> > see https://github.com/apache/spark/pull/24092
> >
> >
> >  On Wed, 20 Mar 2019 06:21:29 +0800 DB Tsai
>  wrote 
> >
> > Please vote on releasing the following candidate as Apache Spark
> version 2.4.1.
> >
> > The vote is open until March 23 PST and passes if a majority +1 PMC
> votes are cast, with
> > a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 2.4.1
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see
> http://spark.apache.org/
> >
> > The tag to be voted on is v2.4.1-rc8 (commit
> 746b3ddee6f7ad3464e326228ea226f5b1f39a41):
> > https://github.com/apache/spark/tree/v2.4.1-rc8
> >
> > The release files, including signatures, digests, etc. can be found
> at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> >
> https://repository.apache.org/content/repositories/orgapachespark-1318/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-docs/
> >
> > The list of bug fixes going into 2.4.1 can be found at the following
> URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running it on this release candidate,
> > then reporting any regressions.
> >
> > If you're working in PySpark, you can set up a virtual env and install
> > the current RC and see if anything important breaks; in Java/Scala,
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 2.4.1?
> > =

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread Sean Owen
Is it a regression from 2.4.0? That's not the only criterion, but part of it.
The version link is
https://issues.apache.org/jira/projects/SPARK/versions/12344117

On Wed, Mar 20, 2019 at 10:15 AM dhruve ashar  wrote:

> A deadlock bug was recently fixed and backported to 2.4, but the rc was
> cut before that. I think we should include a critical bug like that in the
> current rc.
>
> issue: https://issues.apache.org/jira/browse/SPARK-27112
> commit:
> https://github.com/apache/spark/commit/95e73b328ac883be2ced9099f20c8878e498e297
>
> I am hitting a dead link while checking:
> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
> I don't know what the right URL looks like, but we should fix it.
>
> On Wed, Mar 20, 2019 at 9:14 AM Stavros Kontopoulos <
> stavros.kontopou...@lightbend.com> wrote:
>
>> +1  (non-binding)
>>
>> On Wed, Mar 20, 2019 at 8:33 AM Sean Owen  wrote:
>>
>>> (Only the PMC can veto a release)
>>> That doesn't look like a regression. I get that it's important, but I
>>> don't see that it should block this release.
>>>
>>> On Tue, Mar 19, 2019 at 11:00 PM Darcy Shen 
>>> wrote:
>>> >
>>> > -1
>>> >
>>> > please backport SPARK-27160, a correctness issue about the ORC native
>>> reader.
>>> >
>>> > see https://github.com/apache/spark/pull/24092
>>> >
>>> >
>>> >  On Wed, 20 Mar 2019 06:21:29 +0800 DB Tsai
>>>  wrote 
>>> >
>>> > Please vote on releasing the following candidate as Apache Spark
>>> version 2.4.1.
>>> >
>>> > The vote is open until March 23 PST and passes if a majority +1 PMC
>>> votes are cast, with
>>> > a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.4.1
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.4.1-rc8 (commit
>>> 746b3ddee6f7ad3464e326228ea226f5b1f39a41):
>>> > https://github.com/apache/spark/tree/v2.4.1-rc8
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-bin/
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> >
>>> https://repository.apache.org/content/repositories/orgapachespark-1318/
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-docs/
>>> >
>>> > The list of bug fixes going into 2.4.1 can be found at the following
>>> URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> >
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload and running on this release candidate, then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the current RC and see if anything important breaks, in the Java/Scala
>>> > you can add the staging repository to your projects resolvers and test
>>> > with the RC (make sure to clean up the artifact cache before/after so
>>> > you don't end up building with a out of date RC going forward).
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 2.4.1?
>>> > ===
>>> >
>>> > The current list of open tickets targeted at 2.4.1 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 2.4.1
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> >
>>> > In order to make timely releases, we will typically not hold the
>>> > release unless the bug in question is a regression from the previous
>>> > release. That being said, if there is something which is a regression
>>> > that has not been correctly targeted please ping me or a committer to
>>> > help target the issue.
>>> >
>>> >
>>> > DB Tsai | Siri Open Source Technologies [not a contribution] | 
>>> Apple, Inc
>>> >
>>> >
>>> > -
>>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> >
>>> >
>>> >
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>
>> --
>> Stavros Kontopoulos
>>
>> *Senior Software Engineer*
>> *Lightbend, Inc.*
>>
>> *p:  +30 6977967274 <%2B1%20650%20678%200020>*
>> *e: stavros.kontopou...@lightbend.com* 
>>
>>

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread dhruve ashar
A deadlock bug was recently fixed and backported to 2.4, but the RC was
cut before that. I think we should include a fix for a critical bug like
that in the current RC.

issue: https://issues.apache.org/jira/browse/SPARK-27112
commit:
https://github.com/apache/spark/commit/95e73b328ac883be2ced9099f20c8878e498e297

I am hitting a dead link while checking:
https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
I don't know what the right URL looks like, but we should fix it.

-- 
-Dhruve Ashar


Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread Stavros Kontopoulos
+1  (non-binding)

On Wed, Mar 20, 2019 at 8:33 AM Sean Owen  wrote:

> (Only the PMC can veto a release)
> That doesn't look like a regression. I get that it's important, but I
> don't see that it should block this release.
>
> On Tue, Mar 19, 2019 at 11:00 PM Darcy Shen 
> wrote:
> >
> > -1
> >
> > please backport SPARK-27160, a correctness issue in the ORC native reader.
> >
> > see https://github.com/apache/spark/pull/24092
> >
> >
> >  On Wed, 20 Mar 2019 06:21:29 +0800 DB Tsai 
> wrote 
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 2.4.1.
> >
> > The vote is open until March 23 PST and passes if a majority +1 PMC
> votes are cast, with
> > a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 2.4.1
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v2.4.1-rc8 (commit
> 746b3ddee6f7ad3464e326228ea226f5b1f39a41):
> > https://github.com/apache/spark/tree/v2.4.1-rc8
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1318/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-docs/
> >
> > The list of bug fixes going into 2.4.1 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running it on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark, you can set up a virtual env and install
> > the current RC to see if anything important breaks; in Java/Scala, you
> > can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
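For the Java/Scala path, a minimal sbt sketch of what adding the staging resolver could look like (the repository URL is the one quoted in this vote mail; the dependency coordinates assume the standard Spark group/artifact names):

```scala
// build.sbt fragment: resolve the 2.4.1 RC artifacts from the staging
// repository quoted above, then depend on them as usual.
resolvers += "Spark 2.4.1 RC8 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1318/"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.1"
```

After testing, remove the resolver and clear `~/.ivy2/cache/org.apache.spark` so later builds don't pick up the RC artifacts.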
> >
> > ===
> > What should happen to JIRA tickets still targeting 2.4.1?
> > ===
> >
> > The current list of open tickets targeted at 2.4.1 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.1
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
> >
> >
> > DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple,
> Inc
> >
> >
> > -
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
> >
> >
>
>
>

-- 
Stavros Kontopoulos

*Senior Software Engineer*
*Lightbend, Inc.*

*p: +30 6977967274*
*e: stavros.kontopou...@lightbend.com* 


Re: Introduce FORMAT clause to CAST with SQL:2016 datetime patterns

2019-03-20 Thread Maciej Szymkiewicz
One concern here is the introduction of a second formatting convention.

This can not only cause confusion among users, but also result in hard-to-spot
bugs when the wrong format, with a different meaning, is used. This is already
a problem for Python and R users, with week-year and month/minute mix-ups
popping up from time to time.
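To make the risk concrete, here is a small Python illustration of the month/minute variant of this mix-up (strftime directives stand in for the two conventions; the input value is made up for the example):

```python
from datetime import datetime

# "%m" is month-of-year, "%M" is minute-of-hour. Swapping them does not
# always fail loudly: for many inputs the parse succeeds and is silently wrong.
correct = "%Y-%m-%d %H:%M"
swapped = "%Y-%M-%d %H:%m"   # month and minute accidentally exchanged

print(datetime.strptime("2018-03-09 11:04", correct))  # 2018-03-09 11:04:00
print(datetime.strptime("2018-03-09 11:04", swapped))  # 2018-04-09 11:03:00 -- wrong, no error
```

A second formatting convention multiplies the number of such near-miss pattern pairs a user can accidentally cross.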

On Wed, 20 Mar 2019 at 10:53, Gabor Kaszab  wrote:

-- 

Regards,
Maciej


Introduce FORMAT clause to CAST with SQL:2016 datetime patterns

2019-03-20 Thread Gabor Kaszab
Hey Hive and Spark communities,
[dev@impala in cc]

I'm working on an Impala improvement to introduce the FORMAT clause within
CAST() operator and to implement ISO SQL:2016 datetime pattern support for
this new FORMAT clause:
https://issues.apache.org/jira/browse/IMPALA-4018

One example of the new format:
SELECT(CAST("2018-01-02 09:15" as timestamp FORMAT "YYYY-MM-DD HH12:MI"));
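For readers more familiar with strftime-style patterns, here is a hedged Python sketch of how a few SQL:2016-style tokens of this kind could map onto strftime directives. The token table below is a small illustrative guess based on the example, not the proposal's full pattern set:

```python
from datetime import datetime

# Hypothetical subset of SQL:2016-style tokens mapped to strftime directives.
TOKENS = {
    "YYYY": "%Y",  # 4-digit year
    "HH12": "%I",  # hour of half-day (01-12)
    "HH24": "%H",  # hour of day (00-23)
    "MM": "%m",    # month of year
    "DD": "%d",    # day of month
    "MI": "%M",    # minute of hour
    "SS": "%S",    # second of minute
}

def sql2016_to_strftime(fmt: str) -> str:
    # Replace longer tokens first so "HH12" is not partially consumed by "HH24" etc.
    for token in sorted(TOKENS, key=len, reverse=True):
        fmt = fmt.replace(token, TOKENS[token])
    return fmt

fmt = sql2016_to_strftime("YYYY-MM-DD HH12:MI")    # -> "%Y-%m-%d %I:%M"
print(datetime.strptime("2018-01-02 09:15", fmt))  # 2018-01-02 09:15:00
```

Note how the SQL:2016-style tokens spell out field widths and half-day clocks explicitly, which is part of what makes them less ambiguous than single-letter directive conventions.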

I have put together a document for my proposal of how to do this in Impala
and what patterns we plan to support to cover the SQL standard and what
additional patterns we propose to support on top of the standard's
recommendation.
https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/

The reason I share this with the Hive and Spark communities is that I feel it
would be nice if these systems were in line with the Impala implementation. So
I'd like to involve these communities in the planning phase of this task so
that everyone can share their opinion about whether this makes sense in the
proposed form.
Eventually I feel that each of these systems should support the SQL:2016
datetime formats, and I think it would be nice to have that through a newly
introduced CAST(..FORMAT..) clause.

I would like to ask members of both Hive and Spark to take a look at my
proposal and share their opinion from their own component's perspective. If
we get on the same page, I'll eventually open JIRAs to cover this improvement
for each of the mentioned systems.

Cheers,
Gabor