Re: [VOTE] SPARK 3.0.0-preview (RC2)

2019-10-31 Thread Xiao Li
Spark 3.0 will still use the Hadoop 2.7 profile by default, I think. The
Hadoop 2.7 profile is much more stable than the Hadoop 3.2 profile.

On Thu, Oct 31, 2019 at 3:54 PM Sean Owen  wrote:

> This isn't a big thing, but I see that the pyspark build includes
> Hadoop 2.7 rather than 3.2. Maybe later we can change the build to use
> 3.2 by default.
>
> Otherwise, the tests all seem to pass with JDK 8 / 11 with all
> profiles enabled, so I'm +1 on it.
>
>
> On Thu, Oct 31, 2019 at 1:00 AM Xingbo Jiang 
> wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 3.0.0-preview.
> >
> > The vote is open until November 3 PST and passes if a majority of +1 PMC
> > votes are cast, with a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 3.0.0-preview
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v3.0.0-preview-rc2 (commit
> 007c873ae34f58651481ccba30e8e2ba38a692c4):
> > https://github.com/apache/spark/tree/v3.0.0-preview-rc2
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.0.0-preview-rc2-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1336/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v3.0.0-preview-rc2-docs/
> >
> > The list of bug fixes going into 3.0.0 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12339177
> >
> > FAQ
> >
> > =================================
> > How can I help test this release?
> > =================================
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running it on this release candidate,
> > then reporting any regressions.
> >
> > If you're working in PySpark, you can set up a virtual env, install
> > the current RC, and see if anything important breaks. For Java/Scala,
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
> >
> > =========================================================
> > What should happen to JIRA tickets still targeting 3.0.0?
> > =========================================================
> >
> > The current list of open tickets targeted at 3.0.0 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.0.0
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately.
> >
> > =======================
> > But my bug isn't fixed?
> > =======================
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted, please ping me or a committer to
> > help target the issue.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPARK 3.0.0-preview (RC2)

2019-10-31 Thread Sean Owen
This isn't a big thing, but I see that the pyspark build includes
Hadoop 2.7 rather than 3.2. Maybe later we can change the build to use
3.2 by default.

Otherwise, the tests all seem to pass with JDK 8 / 11 with all
profiles enabled, so I'm +1 on it.


On Thu, Oct 31, 2019 at 1:00 AM Xingbo Jiang  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 3.0.0-preview.
>
> The vote is open until November 3 PST and passes if a majority of +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.0.0-preview
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.0.0-preview-rc2 (commit 
> 007c873ae34f58651481ccba30e8e2ba38a692c4):
> https://github.com/apache/spark/tree/v3.0.0-preview-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-preview-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1336/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-preview-rc2-docs/
>
> The list of bug fixes going into 3.0.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12339177
>
> FAQ
>
> =================================
> How can I help test this release?
> =================================
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate,
> then reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks. For Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> =========================================================
> What should happen to JIRA tickets still targeting 3.0.0?
> =========================================================
>
> The current list of open tickets targeted at 3.0.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 3.0.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately.
>
> =======================
> But my bug isn't fixed?
> =======================
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted, please ping me or a committer to
> help target the issue.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Packages to release in 3.0.0-preview

2019-10-31 Thread Cody Koeninger
On Thu, Oct 31, 2019 at 4:30 PM Sean Owen  wrote:
>
> But it'd be cooler to call these major
> releases!


Maybe this is just semantics, but my point is that the Scala project
already does call 2.12 to 2.13 a major release.

e.g. from https://www.scala-lang.org/download/

"Note that different *major* releases of Scala (e.g. Scala 2.11.x and
Scala 2.12.x) are not binary compatible with each other."

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Packages to release in 3.0.0-preview

2019-10-31 Thread Sean Owen
Yep, it's worse than that. Code compiled for 2.x is _not allowed_ to
work with 2.(x+1). I say this with all love for Scala and total
respect for how the big improvements Scala makes necessarily mean
bytecode-level incompatibility. But it'd be cooler to call these major
releases! Even in Java, you stand a chance of Java 6-era code still
running on Java 14.

On Thu, Oct 31, 2019 at 4:14 PM Cody Koeninger  wrote:
>
> On Wed, Oct 30, 2019 at 5:57 PM Sean Owen  wrote:
>
> > Or, frankly, maybe Scala should reconsider the mutual incompatibility
> > between minor releases. These are basically major releases, and
> > indeed, it causes exactly this kind of headache.
> >
>
>
> Not saying binary incompatibility is fun, but 2.12 to 2.13 is a major
> release, not a minor one. Scala predates semantic versioning; the
> second digit is for major releases.
>
> scala 2.13.0 Jun 7, 2019
> scala 2.12.0 Nov 2, 2016
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Packages to release in 3.0.0-preview

2019-10-31 Thread Cody Koeninger
On Wed, Oct 30, 2019 at 5:57 PM Sean Owen  wrote:

> Or, frankly, maybe Scala should reconsider the mutual incompatibility
> between minor releases. These are basically major releases, and
> indeed, it causes exactly this kind of headache.
>


Not saying binary incompatibility is fun, but 2.12 to 2.13 is a major
release, not a minor one. Scala predates semantic versioning; the
second digit is for major releases.

scala 2.13.0 Jun 7, 2019
scala 2.12.0 Nov 2, 2016

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-31 Thread Shane Knapp
i'm currently testing PyPy3.6 v7.2.0 w/this pull request:
https://github.com/apache/spark/pull/26330
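
For reference, a minimal sketch of how a test script can confirm which
interpreter it is running under (illustrative only; not part of the PR above):

```python
# Confirm which interpreter and version a test run picked up; useful
# when a Jenkins matrix mixes CPython and PyPy builds.
import platform
import sys

impl = platform.python_implementation()  # "CPython" or "PyPy"
version = ".".join(map(str, sys.version_info[:3]))
print(impl, version)  # e.g. "PyPy 3.6.9" under PyPy3.6 v7.2.0
```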

On Wed, Oct 30, 2019 at 2:31 PM Maciej Szymkiewicz 
wrote:

> Could we upgrade to PyPy3.6 v7.2.0?
> On 10/30/19 9:45 PM, Shane Knapp wrote:
>
> one quick thing:  we currently test against python2.7, 3.6 *and* pypy2.5.1
> (python2.7).
>
> what are our plans for pypy?
>
>
> On Wed, Oct 30, 2019 at 12:26 PM Dongjoon Hyun 
> wrote:
>
>> Thank you all. I made a PR for that.
>>
>> https://github.com/apache/spark/pull/26326
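
For context, a deprecation like this usually boils down to a runtime
version check that emits a warning. A minimal sketch of the idea (the
wording and placement here are assumptions, not the actual change in the
PR above):

```python
# Illustrative Python-version deprecation check; the message wording and
# placement are assumptions, not the actual patch.
import sys
import warnings

if sys.version_info < (3, 6):
    warnings.warn(
        "Support for Python versions before 3.6 is deprecated as of "
        "Spark 3.0 and may be removed in a future release.",
        DeprecationWarning)
```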
>>
>> On Tue, Oct 29, 2019 at 5:45 AM Takeshi Yamamuro 
>> wrote:
>>
>>> +1, too.
>>>
>>> On Tue, Oct 29, 2019 at 4:16 PM Holden Karau 
>>> wrote:
>>>
 +1 to deprecating but not yet removing support for Python < 3.6

 On Tue, Oct 29, 2019 at 3:47 AM Shane Knapp 
 wrote:

> +1 to testing the absolute minimum number of python variants as
> possible.  ;)
>
> On Mon, Oct 28, 2019 at 7:46 PM Hyukjin Kwon 
> wrote:
>
>> +1 from me as well.
>>
>> On Tue, Oct 29, 2019 at 5:34 AM, Xiangrui Meng wrote:
>>
>>> +1. And we should start testing 3.7 and maybe 3.8 in Jenkins.
>>>
>>> On Thu, Oct 24, 2019 at 9:34 AM Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>>
 Thank you for starting the thread.

 In addition to that, we are currently testing only Python 3.6 in the
 Apache Spark Jenkins environment.

 Given that Python 3.8 is already out and Apache Spark 3.0.0 RC1
 will start next January
 (https://spark.apache.org/versioning-policy.html), I'm +1 for the
 deprecation (Python < 3.6) at Apache Spark 3.0.0.

 It's just a deprecation to prepare the next-step development cycle.
 Bests,
 Dongjoon.


 On Thu, Oct 24, 2019 at 1:10 AM Maciej Szymkiewicz <
 mszymkiew...@gmail.com> wrote:

> Hi everyone,
>
> While deprecation of Python 2 in 3.0.0 has been announced,
> there is no clear statement about continuing support of specific
> Python 3 versions.
>
> Specifically:
>
>- Python 3.4 has been retired this year.
>- Python 3.5 is already in the "security fixes only" mode and
>should be retired in the middle of 2020.
>
> Continued support of these two blocks adoption of many new Python
> features (e.g., PEP 468), and is hard to justify beyond 2020.
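
A minimal sketch of what PEP 468 buys on Python 3.6+: **kwargs preserving
the caller's keyword order, which code that still supports 3.4/3.5 cannot
rely on.

```python
# PEP 468 (Python 3.6+): **kwargs preserves the order in which the
# caller passed the keyword arguments.
def build_schema(**fields):
    return list(fields.items())

print(build_schema(name=str, age=int, active=bool))
# 3.6+: [('name', <class 'str'>), ('age', <class 'int'>), ('active', <class 'bool'>)]
# On 3.4/3.5 this ordering is not guaranteed.
```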
>
> Should these two be deprecated in 3.0.0 as well?
>
> --
> Best regards,
> Maciej
>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>
 --
 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9  
 YouTube Live Streams: https://www.youtube.com/user/holdenkarau

>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>
> --
> Best regards,
> Maciej
>
>

-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-31 Thread Takuya UESHIN
+1

On Thu, Oct 31, 2019 at 11:21 AM Bryan Cutler  wrote:

> +1 for deprecating
>
> On Wed, Oct 30, 2019 at 2:46 PM Shane Knapp  wrote:
>
>> sure.  that shouldn't be too hard, but we've historically given very
>> little support to it.
>>
>> On Wed, Oct 30, 2019 at 2:31 PM Maciej Szymkiewicz <
>> mszymkiew...@gmail.com> wrote:
>>
>>> Could we upgrade to PyPy3.6 v7.2.0?
>>> On 10/30/19 9:45 PM, Shane Knapp wrote:
>>>
>>> one quick thing:  we currently test against python2.7, 3.6 *and*
>>> pypy2.5.1 (python2.7).
>>>
>>> what are our plans for pypy?
>>>
>>>
>>> On Wed, Oct 30, 2019 at 12:26 PM Dongjoon Hyun 
>>> wrote:
>>>
 Thank you all. I made a PR for that.

 https://github.com/apache/spark/pull/26326

 On Tue, Oct 29, 2019 at 5:45 AM Takeshi Yamamuro 
 wrote:

> +1, too.
>
> On Tue, Oct 29, 2019 at 4:16 PM Holden Karau 
> wrote:
>
>> +1 to deprecating but not yet removing support for Python < 3.6
>>
>> On Tue, Oct 29, 2019 at 3:47 AM Shane Knapp 
>> wrote:
>>
>>> +1 to testing the absolute minimum number of python variants as
>>> possible.  ;)
>>>
>>> On Mon, Oct 28, 2019 at 7:46 PM Hyukjin Kwon 
>>> wrote:
>>>
 +1 from me as well.

 On Tue, Oct 29, 2019 at 5:34 AM, Xiangrui Meng wrote:

> +1. And we should start testing 3.7 and maybe 3.8 in Jenkins.
>
> On Thu, Oct 24, 2019 at 9:34 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
>
>> Thank you for starting the thread.
>>
>> In addition to that, we are currently testing only Python 3.6 in the
>> Apache Spark Jenkins environment.
>>
>> Given that Python 3.8 is already out and Apache Spark 3.0.0 RC1
>> will start next January
>> (https://spark.apache.org/versioning-policy.html), I'm +1 for
>> the deprecation (Python < 3.6) at Apache Spark 3.0.0.
>>
>> It's just a deprecation to prepare the next-step development
>> cycle.
>> Bests,
>> Dongjoon.
>>
>>
>> On Thu, Oct 24, 2019 at 1:10 AM Maciej Szymkiewicz <
>> mszymkiew...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> While deprecation of Python 2 in 3.0.0 has been announced,
>>> there is no clear statement about continuing support of specific
>>> Python 3 versions.
>>>
>>> Specifically:
>>>
>>>- Python 3.4 has been retired this year.
>>>- Python 3.5 is already in the "security fixes only" mode
>>>and should be retired in the middle of 2020.
>>>
>>> Continued support of these two blocks adoption of many new
>>> Python features (e.g., PEP 468), and is hard to justify beyond 2020.
>>>
>>> Should these two be deprecated in 3.0.0 as well?
>>>
>>> --
>>> Best regards,
>>> Maciej
>>>
>>>
>>>
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>
>
> --
> ---
> Takeshi Yamamuro
>

>>>
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>>
>>> --
>>> Best regards,
>>> Maciej
>>>
>>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>

-- 
Takuya UESHIN
Tokyo, Japan

http://twitter.com/ueshin


Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-31 Thread Bryan Cutler
+1 for deprecating

On Wed, Oct 30, 2019 at 2:46 PM Shane Knapp  wrote:

> sure.  that shouldn't be too hard, but we've historically given very
> little support to it.
>
> On Wed, Oct 30, 2019 at 2:31 PM Maciej Szymkiewicz 
> wrote:
>
>> Could we upgrade to PyPy3.6 v7.2.0?
>> On 10/30/19 9:45 PM, Shane Knapp wrote:
>>
>> one quick thing:  we currently test against python2.7, 3.6 *and*
>> pypy2.5.1 (python2.7).
>>
>> what are our plans for pypy?
>>
>>
>> On Wed, Oct 30, 2019 at 12:26 PM Dongjoon Hyun 
>> wrote:
>>
>>> Thank you all. I made a PR for that.
>>>
>>> https://github.com/apache/spark/pull/26326
>>>
>>> On Tue, Oct 29, 2019 at 5:45 AM Takeshi Yamamuro 
>>> wrote:
>>>
 +1, too.

 On Tue, Oct 29, 2019 at 4:16 PM Holden Karau 
 wrote:

> +1 to deprecating but not yet removing support for Python < 3.6
>
> On Tue, Oct 29, 2019 at 3:47 AM Shane Knapp 
> wrote:
>
>> +1 to testing the absolute minimum number of python variants as
>> possible.  ;)
>>
>> On Mon, Oct 28, 2019 at 7:46 PM Hyukjin Kwon 
>> wrote:
>>
>>> +1 from me as well.
>>>
>>> On Tue, Oct 29, 2019 at 5:34 AM, Xiangrui Meng wrote:
>>>
 +1. And we should start testing 3.7 and maybe 3.8 in Jenkins.

 On Thu, Oct 24, 2019 at 9:34 AM Dongjoon Hyun <
 dongjoon.h...@gmail.com> wrote:

> Thank you for starting the thread.
>
> In addition to that, we are currently testing only Python 3.6 in the
> Apache Spark Jenkins environment.
>
> Given that Python 3.8 is already out and Apache Spark 3.0.0 RC1
> will start next January
> (https://spark.apache.org/versioning-policy.html), I'm +1 for the
> deprecation (Python < 3.6) at Apache Spark 3.0.0.
>
> It's just a deprecation to prepare the next-step development cycle.
> Bests,
> Dongjoon.
>
>
> On Thu, Oct 24, 2019 at 1:10 AM Maciej Szymkiewicz <
> mszymkiew...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> While deprecation of Python 2 in 3.0.0 has been announced,
>> there is no clear statement about continuing support of specific
>> Python 3 versions.
>>
>> Specifically:
>>
>>- Python 3.4 has been retired this year.
>>- Python 3.5 is already in the "security fixes only" mode and
>>should be retired in the middle of 2020.
>>
>> Continued support of these two blocks adoption of many new Python
>> features (e.g., PEP 468), and is hard to justify beyond 2020.
>>
>> Should these two be deprecated in 3.0.0 as well?
>>
>> --
>> Best regards,
>> Maciej
>>
>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


 --
 ---
 Takeshi Yamamuro

>>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>> --
>> Best regards,
>> Maciej
>>
>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


[VOTE] SPARK 3.0.0-preview (RC2)

2019-10-31 Thread Xingbo Jiang
Please vote on releasing the following candidate as Apache Spark version
3.0.0-preview.

The vote is open until November 3 PST and passes if a majority of +1 PMC
votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.0.0-preview
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.0.0-preview-rc2 (commit
007c873ae34f58651481ccba30e8e2ba38a692c4):
https://github.com/apache/spark/tree/v3.0.0-preview-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.0.0-preview-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1336/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.0.0-preview-rc2-docs/

The list of bug fixes going into 3.0.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12339177

FAQ

=================================
How can I help test this release?
=================================

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running it on this release candidate,
then reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. For Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
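
For the PySpark route, a minimal smoke-test sketch (the venv setup is shown
as shell comments, the pyspark tarball name under the -bin/ directory above
is left as a placeholder rather than guessed, and the tiny job below just
stands in for a real workload):

```python
# Shell setup first (run in a terminal):
#   python3 -m venv rc-test && source rc-test/bin/activate
#   pip install <pyspark tarball from the v3.0.0-preview-rc2-bin/ URL above>
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rc2-smoke-test").getOrCreate()
print(spark.version)  # should report the 3.0.0-preview version

# A trivial stand-in for "an existing Spark workload"; any real job works.
df = spark.range(1000).selectExpr("id % 7 AS bucket")
assert df.distinct().count() == 7
spark.stop()
```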

=========================================================
What should happen to JIRA tickets still targeting 3.0.0?
=========================================================

The current list of open tickets targeted at 3.0.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.0.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately.

=======================
But my bug isn't fixed?
=======================

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted, please ping me or a committer to
help target the issue.