[VOTE] Spark 2.3.0 (RC2)

2018-01-22 Thread Sameer Agarwal
Please vote on releasing the following candidate as Apache Spark version
2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am UTC and
passes if a majority of at least 3 PMC +1 votes are cast.


[ ] +1 Release this package as Apache Spark 2.3.0

[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v2.3.0-rc2:
https://github.com/apache/spark/tree/v2.3.0-rc2
(489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)

List of JIRA tickets resolved in this release can be found here:
https://issues.apache.org/jira/projects/SPARK/versions/12339551

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/

Release artifacts are signed with the following key:
https://dist.apache.org/repos/dist/dev/spark/KEYS
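For anyone who wants to verify the artifacts, the usual check is to import the KEYS file, run `gpg --verify` against each detached `.asc` signature, and cross-check the published digests. Since a sketch here can't download the real artifacts, the example below signs and verifies a throwaway file with a locally generated key; when verifying the RC, substitute the real KEYS file and the files from the -bin directory above:

```shell
# In a real verification you would first import the signing keys:
#   curl -sSL https://dist.apache.org/repos/dist/dev/spark/KEYS | gpg --import
# This offline sketch generates a throwaway key instead.
export GNUPGHOME="$(mktemp -d)"
cat > /tmp/keyparams <<'EOF'
%no-protection
Key-Type: RSA
Key-Length: 2048
Key-Usage: sign
Name-Real: Example RM
Name-Email: rm@example.invalid
Expire-Date: 0
%commit
EOF
gpg --batch --quiet --gen-key /tmp/keyparams

# Stand-in for a downloaded release tarball.
echo "pretend release artifact" > /tmp/artifact.tgz

# The RM publishes a detached .asc signature next to each artifact;
# reviewers run `gpg --verify <file>.asc <file>` and expect "Good signature".
gpg --batch --yes --detach-sign --armor -o /tmp/artifact.tgz.asc /tmp/artifact.tgz
gpg --verify /tmp/artifact.tgz.asc /tmp/artifact.tgz

# Cross-check the published digest file as well.
sha512sum /tmp/artifact.tgz > /tmp/artifact.tgz.sha512
sha512sum -c /tmp/artifact.tgz.sha512
```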

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1262/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/index.html


FAQ

===
What are the unresolved issues targeted for 2.3.0?
===

Please see https://s.apache.org/oXKi. At the time of writing, there are no
known release blockers.

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking an
existing Spark workload, running it on this release candidate, and
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the
current RC, and see if anything important breaks. In Java/Scala, you can
add the staging repository to your project's resolvers and test with the RC
(make sure to clean up the artifact cache before/after so you don't end up
building with an out-of-date RC going forward).
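A sketch of that workflow for PySpark (the pyspark tarball name and URL below are assumptions; take the actual name from the -bin directory above):

```shell
# Create an isolated environment so the RC can't pollute your main install.
python3 -m venv /tmp/spark-rc2-env
. /tmp/spark-rc2-env/bin/activate

# Install the staged PySpark package and run your workload against it,
# e.g. (commented out here so the sketch stays offline):
#   pip install "https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz"
#   python my_workload.py

# For Java/Scala, point your build at the staging repo instead, e.g. in sbt:
#   resolvers += "spark-rc" at
#     "https://repository.apache.org/content/repositories/orgapachespark-1262/"

# Tear the environment down afterwards so a stale RC can't leak into later
# builds (the same reason to clean the ~/.ivy2 / ~/.m2 artifact caches).
deactivate
rm -rf /tmp/spark-rc2-env
```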

===
What should happen to JIRA tickets still targeting 2.3.0?
===

Committers should look at those and triage. Extremely important bug fixes,
documentation, and API tweaks that impact compatibility should be worked on
immediately. Everything else, please retarget to 2.3.1 or 2.4.0 as
appropriate.

===
Why is my bug not fixed?
===

In order to make timely releases, we will typically not hold the release
unless the bug in question is a regression from 2.2.0. That said, if there
is something that is a regression from 2.2.0 and has not been correctly
targeted, please ping me or a committer to help target the issue (you can
see the open issues listed as impacting Spark 2.3.0 at
https://s.apache.org/WmoI).


Regards,
Sameer


Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-22 Thread Marcelo Vanzin
+0

Signatures check out. Code compiles, although I see the errors in [1]
when untarring the source archive; perhaps we should add "use GNU tar"
to the RM checklist?

Also ran our internal tests and they seem happy.

My concern is the list of open bugs targeted at 2.3.0 (ignoring the
documentation ones). It is not long, but it seems some of those need
to be looked at. It would be nice for the committers who are involved
in those bugs to take a look.

[1] 
https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt


-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-22 Thread Wenchen Fan
+1

All the blocking issues are resolved (AFAIK), and the important Data Source
V2 features have been merged.



Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-23 Thread Sean Owen
I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
unpacking it with 'xvzf' and also unzipping it first, and it untarred
without warnings in either case.

I am encountering errors while running the tests (different ones each
time), so I'm still figuring out whether there is a real problem or just
flaky tests.

These issues look like blockers, as they are inherently meant to be
completed before the 2.3 release, and they are mostly not done. I suppose
I'd -1 on behalf of those who say this needs to be done first, though we
can keep testing in the meantime.

SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
SPARK-23114 Spark R 2.3 QA umbrella

Here are the remaining items targeted for 2.3:

SPARK-15689 Data source API v2
SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
SPARK-21646 Add new type coercion rules to compatible with Hive
SPARK-22386 Data Source V2 improvements
SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
SPARK-22735 Add VectorSizeHint to ML features documentation
SPARK-22739 Additional Expression Support for Objects
SPARK-22809 pyspark is sensitive to imports with dots
SPARK-22820 Spark 2.3 SQL API audit



Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-23 Thread Marcelo Vanzin
On Tue, Jan 23, 2018 at 7:01 AM, Sean Owen  wrote:
> I'm not seeing that same problem on OS X and /usr/bin/tar. I tried unpacking
> it with 'xvzf' and also unzipping it first, and it untarred without warnings
> in either case.

The warnings only show up if you unpack using GNU tar. The exit code is
still 0, so aside from the ugly warnings, everything still seems to work.
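The exit-code point is easy to check: GNU tar treats the unrecognized extended pax headers that bsdtar writes on macOS as warnings only, not errors. A minimal offline sketch of checking the extraction status (using an ordinary tarball, since reproducing the bsdtar headers requires macOS):

```shell
# Build a small tarball and re-extract it, capturing the exit status --
# with GNU tar, warnings about unknown headers would not change it from 0.
mkdir -p /tmp/tar-demo/src /tmp/tar-demo/out
echo hello > /tmp/tar-demo/src/file.txt
tar -czf /tmp/tar-demo/archive.tgz -C /tmp/tar-demo src
tar -xzf /tmp/tar-demo/archive.tgz -C /tmp/tar-demo/out
echo "tar exit status: $?"
cat /tmp/tar-demo/out/src/file.txt   # the extracted content survives intact
```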

-- 
Marcelo




Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-23 Thread Xiao Li
+1

Xiao Li



Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-24 Thread Marcelo Vanzin
Given that the bugs I was worried about have been dealt with, I'm
upgrading to +1.


-- 
Marcelo




Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread 蒋星博
I'm sorry to post a -1 on this, but there is a non-trivial correctness
issue that I believe we should fix in 2.3.

TL;DR of the issue: a certain pattern of shuffle + repartition in a query
may produce wrong results if some downstream stages fail and trigger a
retry of the repartition. The root cause is that the current implementation
of `repartition()` doesn't generate deterministic output. The JIRA ticket:
https://issues.apache.org/jira/browse/SPARK-23207

This is NOT a regression, but since it's a non-trivial correctness issue,
we'd better ship the patch along with 2.3.
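To make the failure mode concrete, here is a hypothetical pure-Python simulation (not Spark code) of why position-based round-robin partitioning interacts badly with retries: partition assignment depends on the order in which rows arrive, so a retried upstream task that emits the same rows in a different order reshuffles them into different partitions.

```python
def round_robin_partition(rows, num_partitions):
    """Assign each row to a partition by its *position* in the input.
    Like a round-robin repartition, the result depends on input order,
    not on the row values themselves."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions

rows = [10, 20, 30, 40]

# First attempt: the upstream stage emits rows in one order.
attempt1 = round_robin_partition(rows, 2)
print(attempt1)  # [[10, 30], [20, 40]]

# Retry: the upstream stage recomputes and, being nondeterministic,
# emits the same rows in a different order.
attempt2 = round_robin_partition(list(reversed(rows)), 2)
print(attempt2)  # [[40, 20], [30, 10]]

# If only one downstream partition (say, partition 0) is recomputed from
# the retried output while the other keeps its original data, the final
# result sees 40 and 20 twice and 10 and 30 not at all: duplicated and
# lost rows, i.e. a silently wrong answer rather than a crash.
assert set(attempt1[0]) != set(attempt2[0])
```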



Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Marcelo Vanzin
Sorry, I have to change my vote again. The Hive folks ran into SPARK-23209,
and that's a regression we need to fix. I'll post a patch soon. So -1
(although others have already -1'ed).




-- 
Marcelo




Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Nick Pentreath
I think this has come up before (and Sean mentions it above), but the
sub-items of:

SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella

are actually marked as Blockers, yet are not targeted at 2.3.0. I think
they should be, and I'm not comfortable voting positively on the release
while they remain unresolved.

So I'm -1 too, for that reason.

I think most of those review items are close to done. There is also
https://issues.apache.org/jira/browse/SPARK-22799, which I think should be
in 2.3 (to avoid a behavior change later between 2.3.0 and 2.3.1,
especially since it seems we'll have another RC now).


On Thu, 25 Jan 2018 at 19:28 Marcelo Vanzin  wrote:

> Sorry, have to change my vote again. Hive guys ran into SPARK-23209
> and that's a regression we need to fix. I'll post a patch soon. So -1
> (although others have already -1'ed).
>
> On Wed, Jan 24, 2018 at 11:42 AM, Marcelo Vanzin 
> wrote:
> > Given that the bugs I was worried about have been dealt with, I'm
> > upgrading to +1.
> >
> > On Mon, Jan 22, 2018 at 5:09 PM, Marcelo Vanzin 
> wrote:
> >> +0
> >>
> >> Signatures check out. Code compiles, although I see the errors in [1]
> >> when untarring the source archive; perhaps we should add "use GNU tar"
> >> to the RM checklist?
> >>
> >> Also ran our internal tests and they seem happy.
> >>
> >> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
> >> documentation ones). It is not long, but it seems some of those need
> >> to be looked at. It would be nice for the committers who are involved
> >> in those bugs to take a look.
> >>
> >> [1]
> https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
> >>
> >>
> >> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal 
> wrote:
> >>> Please vote on releasing the following candidate as Apache Spark
> version
> >>> 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am
> UTC and
> >>> passes if a majority of at least 3 PMC +1 votes are cast.
> >>>
> >>>
> >>> [ ] +1 Release this package as Apache Spark 2.3.0
> >>>
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>>
> >>> To learn more about Apache Spark, please see https://spark.apache.org/
> >>>
> >>> The tag to be voted on is v2.3.0-rc2:
> >>> https://github.com/apache/spark/tree/v2.3.0-rc2
> >>> (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
> >>>
> >>> List of JIRA tickets resolved in this release can be found here:
> >>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
> >>>
> >>> The release files, including signatures, digests, etc. can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
> >>>
> >>> Release artifacts are signed with the following key:
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>>
> https://repository.apache.org/content/repositories/orgapachespark-1262/
> >>>
> >>> The documentation corresponding to this release can be found at:
> >>>
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/index.html
> >>>
> >>>
> >>> FAQ
> >>>
> >>> ===
> >>> What are the unresolved issues targeted for 2.3.0?
> >>> ===
> >>>
> >>> Please see https://s.apache.org/oXKi. At the time of writing, there
> are
> >>> currently no known release blockers.
> >>>
> >>> =
> >>> How can I help test this release?
> >>> =
> >>>
> >>> If you are a Spark user, you can help us test this release by taking an
> >>> existing Spark workload and running on this release candidate, then
> >>> reporting any regressions.
> >>>
> >>> If you're working in PySpark, you can set up a virtual env, install the
> >>> current RC, and see if anything important breaks. In Java/Scala, you can
> >>> add the staging repository to your project's resolvers and test with the
> >>> RC (make sure to clean up the artifact cache before/after so you don't
> >>> end up building with an out-of-date RC going forward).
> >>>
> >>> ===
> >>> What should happen to JIRA tickets still targeting 2.3.0?
> >>> ===
> >>>
> >>> Committers should look at those and triage. Extremely important bug
> fixes,
> >>> documentation, and API tweaks that impact compatibility should be
> worked on
> >>> immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
> >>> appropriate.
> >>>
> >>> ===
> >>> Why is my bug not fixed?
> >>> ===
> >>>
> >>> In order to make timely releases, we will typically not hold the
> release
> >>> unless the bug in question is a regression from 2.2.0. That being
> said, if
> >>> there is something which is a regression from 2.2.0 and has not been
> >>> correctly targeted please ping me or a committer to help target the
> issue
> >>> (you can see the open iss
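The artifact-cache cleanup recommended in the testing instructions quoted above can be sketched as follows. The paths are the conventional sbt/ivy and maven cache locations, which is an assumption about a given setup:

```shell
# Remove locally cached Spark artifacts so that later builds resolve fresh
# dependencies instead of silently reusing a stale staged RC.
# These are the default sbt/ivy and maven cache locations; adjust them if
# your build uses a custom cache directory.
rm -rf "$HOME/.ivy2/cache/org.apache.spark"
rm -rf "$HOME/.m2/repository/org/apache/spark"
```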

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sean Owen
Most tests pass on RC2, except I'm still seeing the timeout caused by
https://issues.apache.org/jira/browse/SPARK-23055; the tests never finish.
I followed the thread a bit further and wasn't clear whether it was
subsequently re-fixed for 2.3.0 or not. It says it's resolved along with
https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0, though I am
still seeing these tests fail or hang:

- subscribing topic by name from earliest offsets (failOnDataLoss: false)
- subscribing topic by name from earliest offsets (failOnDataLoss: true)

Will check out the next RC.

On Tue, Jan 23, 2018 at 9:01 AM Sean Owen  wrote:

> I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
> unpacking it with 'xvzf' and also unzipping it first, and it untarred
> without warnings in either case.
>
> I am encountering errors while running the tests, different ones each
> time, so am still figuring out whether there is a real problem or just
> flaky tests.
>
> These issues look like blockers, as they are by definition meant to be
> completed before the 2.3 release, and they are mostly not done. I suppose
> I'd -1 on behalf of those who say this needs to be done first; we can keep
> testing in the meantime.
>
> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
> SPARK-23114 Spark R 2.3 QA umbrella
>
> Here are the remaining items targeted for 2.3:
>
> SPARK-15689 Data source API v2
> SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
> SPARK-21646 Add new type coercion rules to compatible with Hive
> SPARK-22386 Data Source V2 improvements
> SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
> SPARK-22735 Add VectorSizeHint to ML features documentation
> SPARK-22739 Additional Expression Support for Objects
> SPARK-22809 pyspark is sensitive to imports with dots
> SPARK-22820 Spark 2.3 SQL API audit
>
>
> On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin 
> wrote:
>
>> +0
>>
>> Signatures check out. Code compiles, although I see the errors in [1]
>> when untarring the source archive; perhaps we should add "use GNU tar"
>> to the RM checklist?
>>
>> Also ran our internal tests and they seem happy.
>>
>> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>> documentation ones). It is not long, but it seems some of those need
>> to be looked at. It would be nice for the committers who are involved
>> in those bugs to take a look.
>>
>> [1]
>> https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
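The incompatibility behind that link stems from the PAX extended headers that macOS's bsdtar emits, which some GNU tar versions report as "unknown extended header keyword" warnings. A minimal local sketch, not requiring macOS, just shows that a PAX-format archive round-trips cleanly with a modern GNU tar:

```shell
# Create a small tree and archive it in PAX format explicitly (GNU tar can
# write PAX too, so bsdtar itself isn't needed for the demo).
mkdir -p demo/src
echo "hello" > demo/src/file.txt
tar --format=pax -czf demo.tgz -C demo src
# A modern GNU tar extracts the PAX archive without warnings.
mkdir -p demo-out
tar -xzf demo.tgz -C demo-out
cat demo-out/src/file.txt
```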
>>
>>
>> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal 
>> wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am UTC
>> and
>> > passes if a majority of at least 3 PMC +1 votes are cast.
>> >
>> >
>> > [ ] +1 Release this package as Apache Spark 2.3.0
>> >
>> > [ ] -1 Do not release this package because ...
>> >
>> >
>> > To learn more about Apache Spark, please see https://spark.apache.org/
>> >
>> > The tag to be voted on is v2.3.0-rc2:
>> > https://github.com/apache/spark/tree/v2.3.0-rc2
>> > (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
>> >
>> > List of JIRA tickets resolved in this release can be found here:
>> > https://issues.apache.org/jira/projects/SPARK/versions/12339551
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
>> >
>> > Release artifacts are signed with the following key:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1262/
>> >
>> > The documentation corresponding to this release can be found at:
>> >
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/index.html
>> >
>> >
>> > FAQ
>> >
>> > ===
>> > What are the unresolved issues targeted for 2.3.0?
>> > ===
>> >
>> > Please see https://s.apache.org/oXKi. At the time of writing, there are
>> > currently no known release blockers.
>> >
>> > =
>> > How can I help test this release?
>> > =
>> >
>> > If you are a Spark user, you can help us test this release by taking an
>> > existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > If you're working in PySpark, you can set up a virtual env, install the
>> > current RC, and see if anything important breaks. In Java/Scala, you can
>> > add the staging repository to your project's resolvers and test with the
>> > RC (make sure to clean up the artifact cache before/after so you don't
>> > end up building with an out-of-date RC going forward).
>> >
>> > ===
>> > What should happen to JIRA tickets still targeting 2.3.0?
>> > ===
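The PySpark testing suggestion quoted above can be sketched as a few commands. The staged sdist URL/file name in the commented line is an assumption (the exact name isn't given in this thread), so only the environment setup is shown live:

```shell
# Create a throwaway virtual env for RC testing (--without-pip keeps the
# sketch self-contained; drop that flag when you actually need pip).
python3 -m venv --without-pip rc-test
. rc-test/bin/activate
# Hypothetical install of the staged PySpark sdist (file name assumed):
# pip install https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz
# Sanity check that the interpreter now runs inside the env:
python -c "import sys; print(sys.prefix)"
deactivate
```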

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sameer Agarwal
I'm a -1 too. In addition to SPARK-23207, we've recently merged two codegen
fixes (SPARK-23208 and SPARK-21717) that address a major code-splitting bug
and performance regressions, respectively.

Regarding QA tasks, I think it goes without saying that all QA prerequisites
are by definition "release blockers", and an RC will not pass until all of
them are resolved. Traditionally, for every major Spark release, we've seen
that serious QA only starts once an RC is cut, but if the community feels
otherwise, I'm happy to hold off the next RC until all these QA JIRAs are
resolved. Otherwise, I'll follow up with an RC3 once SPARK-23207 and
SPARK-23209 are resolved.

On 25 January 2018 at 10:17, Nick Pentreath 
wrote:

> I think this has come up before (and Sean mentions it above), but the
> sub-items on:
>
> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
>
> are actually marked as Blockers, but are not targeted to 2.3.0. I think
> they should be, and I'm not comfortable with those not being resolved
> before voting positively on the release.
>
> So I'm -1 too for that reason.
>
> I think most of those review items are close to done, and there is also
> https://issues.apache.org/jira/browse/SPARK-22799 that I think should be
> in for 2.3 (to avoid a behavior change later between 2.3.0 and 2.3.1,
> especially since we'll have another RC now it seems).
>
>
> On Thu, 25 Jan 2018 at 19:28 Marcelo Vanzin  wrote:
>
>> Sorry, have to change my vote again. Hive guys ran into SPARK-23209
>> and that's a regression we need to fix. I'll post a patch soon. So -1
>> (although others have already -1'ed).
>>
>> On Wed, Jan 24, 2018 at 11:42 AM, Marcelo Vanzin 
>> wrote:
>> > Given that the bugs I was worried about have been dealt with, I'm
>> > upgrading to +1.
>> >
>> > On Mon, Jan 22, 2018 at 5:09 PM, Marcelo Vanzin 
>> wrote:
>> >> +0
>> >>
>> >> Signatures check out. Code compiles, although I see the errors in [1]
>> >> when untarring the source archive; perhaps we should add "use GNU tar"
>> >> to the RM checklist?
>> >>
>> >> Also ran our internal tests and they seem happy.
>> >>
>> >> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>> >> documentation ones). It is not long, but it seems some of those need
>> >> to be looked at. It would be nice for the committers who are involved
>> >> in those bugs to take a look.
>> >>
>> >> [1] https://superuser.com/questions/318809/linux-os-x-
>> tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
>> >>
>> >>
>> >> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal 
>> wrote:
>> >>> Please vote on releasing the following candidate as Apache Spark
>> version
>> >>> 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am
>> UTC and
>> >>> passes if a majority of at least 3 PMC +1 votes are cast.
>> >>>
>> >>>
>> >>> [ ] +1 Release this package as Apache Spark 2.3.0
>> >>>
>> >>> [ ] -1 Do not release this package because ...
>> >>>
>> >>>
>> >>> To learn more about Apache Spark, please see
>> https://spark.apache.org/
>> >>>
>> >>> The tag to be voted on is v2.3.0-rc2:
>> >>> https://github.com/apache/spark/tree/v2.3.0-rc2
>> >>> (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
>> >>>
>> >>> List of JIRA tickets resolved in this release can be found here:
>> >>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>> >>>
>> >>> The release files, including signatures, digests, etc. can be found
>> at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
>> >>>
>> >>> Release artifacts are signed with the following key:
>> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >>>
>> >>> The staging repository for this release can be found at:
>> >>> https://repository.apache.org/content/repositories/
>> orgapachespark-1262/
>> >>>
>> >>> The documentation corresponding to this release can be found at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-
>> docs/_site/index.html
>> >>>
>> >>>
>> >>> FAQ
>> >>>
>> >>> ===
>> >>> What are the unresolved issues targeted for 2.3.0?
>> >>> ===
>> >>>
>> >>> Please see https://s.apache.org/oXKi. At the time of writing, there
>> are
>> >>> currently no known release blockers.
>> >>>
>> >>> =
>> >>> How can I help test this release?
>> >>> =
>> >>>
>> >>> If you are a Spark user, you can help us test this release by taking
>> an
>> >>> existing Spark workload and running on this release candidate, then
>> >>> reporting any regressions.
>> >>>
>> >>> If you're working in PySpark you can set up a virtual env and install
>> the
>> >>> current RC and see if anything important breaks, in the Java/Scala
>

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sameer Agarwal
> Most tests pass on RC2, except I'm still seeing the timeout caused by
> https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never
> finish. I followed the thread a bit further and wasn't clear whether it was
> subsequently re-fixed for 2.3.0 or not. It says it's resolved along with
> https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0 though I am
> still seeing these tests fail or hang:
>
> - subscribing topic by name from earliest offsets (failOnDataLoss: false)
> - subscribing topic by name from earliest offsets (failOnDataLoss: true)
>

Sean, while some of these tests were timing out on RC1, we're not aware of
any known issues in RC2. The historical maven (
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.6/146/testReport/org.apache.spark.sql.kafka010/history/)
and sbt (
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/123/testReport/org.apache.spark.sql.kafka010/history/)
builds on Jenkins for org.apache.spark.sql.kafka010 look fairly healthy. If
you're still seeing timeouts in RC2, could you create a JIRA with any
applicable build/env info?



> On Tue, Jan 23, 2018 at 9:01 AM Sean Owen  wrote:
>
>> I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
>> unpacking it with 'xvzf' and also unzipping it first, and it untarred
>> without warnings in either case.
>>
>> I am encountering errors while running the tests, different ones each
>> time, so am still figuring out whether there is a real problem or just
>> flaky tests.
>>
>> These issues look like blockers, as they are inherently to be completed
>> before the 2.3 release. They are mostly not done. I suppose I'd -1 on
>> behalf of those who say this needs to be done first, though, we can keep
>> testing.
>>
>> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
>> SPARK-23114 Spark R 2.3 QA umbrella
>>
>> Here are the remaining items targeted for 2.3:
>>
>> SPARK-15689 Data source API v2
>> SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
>> SPARK-21646 Add new type coercion rules to compatible with Hive
>> SPARK-22386 Data Source V2 improvements
>> SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
>> SPARK-22735 Add VectorSizeHint to ML features documentation
>> SPARK-22739 Additional Expression Support for Objects
>> SPARK-22809 pyspark is sensitive to imports with dots
>> SPARK-22820 Spark 2.3 SQL API audit
>>
>>
>> On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin 
>> wrote:
>>
>>> +0
>>>
>>> Signatures check out. Code compiles, although I see the errors in [1]
>>> when untarring the source archive; perhaps we should add "use GNU tar"
>>> to the RM checklist?
>>>
>>> Also ran our internal tests and they seem happy.
>>>
>>> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>>> documentation ones). It is not long, but it seems some of those need
>>> to be looked at. It would be nice for the committers who are involved
>>> in those bugs to take a look.
>>>
>>> [1] https://superuser.com/questions/318809/linux-os-x-
>>> tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
>>>
>>>
>>> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal 
>>> wrote:
>>> > Please vote on releasing the following candidate as Apache Spark
>>> version
>>> > 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am
>>> UTC and
>>> > passes if a majority of at least 3 PMC +1 votes are cast.
>>> >
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.3.0
>>> >
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> >
>>> > To learn more about Apache Spark, please see https://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.3.0-rc2:
>>> > https://github.com/apache/spark/tree/v2.3.0-rc2
>>> > (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
>>> >
>>> > List of JIRA tickets resolved in this release can be found here:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
>>> >
>>> > Release artifacts are signed with the following key:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/
>>> orgapachespark-1262/
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-
>>> docs/_site/index.html
>>> >
>>> >
>>> > FAQ
>>> >
>>> > ===
>>> > What are the unresolved issues targeted for 2.3.0?
>>> > ===
>>> >
>>> > Please see https://s.apache.org/oXKi. At the time of writing, there
>>> are
>>> > currently no known release blockers.
>>> >
>>> > =
>>> > How can I help test this re

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Marcelo Vanzin
On Thu, Jan 25, 2018 at 12:29 PM, Sean Owen  wrote:
> I am still seeing these tests fail or hang:
>
> - subscribing topic by name from earliest offsets (failOnDataLoss: false)
> - subscribing topic by name from earliest offsets (failOnDataLoss: true)

This is something that we are seeing internally on a different version of
Spark, and we're currently investigating with our Kafka people. Not
sure it's the same issue (we have a newer version of Kafka libraries),
but this is just another way of saying that I don't think those hangs
are new in 2.3, at least.

-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Dongjoon Hyun
SPARK-23221 is one of the reasons for the Kafka test-suite deadlock issue.

For the hang issues, it seems they are not correctly marked as failures in
the Apache Spark Jenkins history.


On Thu, Jan 25, 2018 at 1:03 PM, Marcelo Vanzin  wrote:

> On Thu, Jan 25, 2018 at 12:29 PM, Sean Owen  wrote:
> > I am still seeing these tests fail or hang:
> >
> > - subscribing topic by name from earliest offsets (failOnDataLoss: false)
> > - subscribing topic by name from earliest offsets (failOnDataLoss: true)
>
> This is something that we are seeing internally on a different version
> Spark, and we're currently investigating with our Kafka people. Not
> sure it's the same issue (we have a newer version of Kafka libraries),
> but this is just another way of saying that I don't think those hangs
> are new in 2.3, at least.
>
> --
> Marcelo
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Shixiong(Ryan) Zhu
+ Jose

On Thu, Jan 25, 2018 at 2:18 PM, Dongjoon Hyun 
wrote:

> SPARK-23221 is one of the reasons for Kafka-test-suite deadlock issue.
>
> For the hang issues, it seems not to be marked as a failure correctly in
> Apache Spark Jenkins history.
>
>
> On Thu, Jan 25, 2018 at 1:03 PM, Marcelo Vanzin 
> wrote:
>
>> On Thu, Jan 25, 2018 at 12:29 PM, Sean Owen  wrote:
>> > I am still seeing these tests fail or hang:
>> >
>> > - subscribing topic by name from earliest offsets (failOnDataLoss:
>> false)
>> > - subscribing topic by name from earliest offsets (failOnDataLoss: true)
>>
>> This is something that we are seeing internally on a different version
>> Spark, and we're currently investigating with our Kafka people. Not
>> sure it's the same issue (we have a newer version of Kafka libraries),
>> but this is just another way of saying that I don't think those hangs
>> are new in 2.3, at least.
>>
>> --
>> Marcelo
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>


Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Joseph Torres
SPARK-23221 fixes an issue specific
to KafkaContinuousSourceStressForDontFailOnDataLossSuite; I don't think it
could cause other suites to deadlock.

Do note that the previous hang issues we saw caused by SPARK-23055 were
correctly marked as failures.

On Thu, Jan 25, 2018 at 3:40 PM, Shixiong(Ryan) Zhu  wrote:

> + Jose
>
> On Thu, Jan 25, 2018 at 2:18 PM, Dongjoon Hyun 
> wrote:
>
>> SPARK-23221 is one of the reasons for Kafka-test-suite deadlock issue.
>>
>> For the hang issues, it seems not to be marked as a failure correctly in
>> Apache Spark Jenkins history.
>>
>>
>> On Thu, Jan 25, 2018 at 1:03 PM, Marcelo Vanzin 
>> wrote:
>>
>>> On Thu, Jan 25, 2018 at 12:29 PM, Sean Owen  wrote:
>>> > I am still seeing these tests fail or hang:
>>> >
>>> > - subscribing topic by name from earliest offsets (failOnDataLoss:
>>> false)
>>> > - subscribing topic by name from earliest offsets (failOnDataLoss:
>>> true)
>>>
>>> This is something that we are seeing internally on a different version
>>> Spark, and we're currently investigating with our Kafka people. Not
>>> sure it's the same issue (we have a newer version of Kafka libraries),
>>> but this is just another way of saying that I don't think those hangs
>>> are new in 2.3, at least.
>>>
>>> --
>>> Marcelo
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>
>


Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-26 Thread Sameer Agarwal
This vote has failed due to a number of the aforementioned blockers. I'll
follow up with RC3 as soon as the two remaining (non-QA) blockers are
resolved: https://s.apache.org/oXKi


On 25 January 2018 at 12:59, Sameer Agarwal  wrote:

>
> Most tests pass on RC2, except I'm still seeing the timeout caused by
>> https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never
>> finish. I followed the thread a bit further and wasn't clear whether it was
>> subsequently re-fixed for 2.3.0 or not. It says it's resolved along with
>> https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0 though I am
>> still seeing these tests fail or hang:
>>
>> - subscribing topic by name from earliest offsets (failOnDataLoss: false)
>> - subscribing topic by name from earliest offsets (failOnDataLoss: true)
>>
>
> Sean, while some of these tests were timing out on RC1, we're not aware of
> any known issues in RC2. Both maven (https://amplab.cs.berkeley.
> edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/
> spark-branch-2.3-test-maven-hadoop-2.6/146/testReport/org.
> apache.spark.sql.kafka010/history/) and sbt (https://amplab.cs.berkeley.
> edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/
> spark-branch-2.3-test-sbt-hadoop-2.6/123/testReport/org.
> apache.spark.sql.kafka010/history/) historical builds on jenkins
> for org.apache.spark.sql.kafka010 look fairly healthy. If you're still
> seeing timeouts in RC2, can you create a JIRA with any applicable build/env
> info?
>
>
>
>> On Tue, Jan 23, 2018 at 9:01 AM Sean Owen  wrote:
>>
>>> I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
>>> unpacking it with 'xvzf' and also unzipping it first, and it untarred
>>> without warnings in either case.
>>>
>>> I am encountering errors while running the tests, different ones each
>>> time, so am still figuring out whether there is a real problem or just
>>> flaky tests.
>>>
>>> These issues look like blockers, as they are inherently to be completed
>>> before the 2.3 release. They are mostly not done. I suppose I'd -1 on
>>> behalf of those who say this needs to be done first, though, we can keep
>>> testing.
>>>
>>> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
>>> SPARK-23114 Spark R 2.3 QA umbrella
>>>
>>> Here are the remaining items targeted for 2.3:
>>>
>>> SPARK-15689 Data source API v2
>>> SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
>>> SPARK-21646 Add new type coercion rules to compatible with Hive
>>> SPARK-22386 Data Source V2 improvements
>>> SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
>>> SPARK-22735 Add VectorSizeHint to ML features documentation
>>> SPARK-22739 Additional Expression Support for Objects
>>> SPARK-22809 pyspark is sensitive to imports with dots
>>> SPARK-22820 Spark 2.3 SQL API audit
>>>
>>>
>>> On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin 
>>> wrote:
>>>
 +0

 Signatures check out. Code compiles, although I see the errors in [1]
 when untarring the source archive; perhaps we should add "use GNU tar"
 to the RM checklist?

 Also ran our internal tests and they seem happy.

 My concern is the list of open bugs targeted at 2.3.0 (ignoring the
 documentation ones). It is not long, but it seems some of those need
 to be looked at. It would be nice for the committers who are involved
 in those bugs to take a look.

 [1] https://superuser.com/questions/318809/linux-os-x-tar-
 incompatibility-tarballs-created-on-os-x-give-errors-when-unt


 On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal 
 wrote:
 > Please vote on releasing the following candidate as Apache Spark
 version
 > 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am
 UTC and
 > passes if a majority of at least 3 PMC +1 votes are cast.
 >
 >
 > [ ] +1 Release this package as Apache Spark 2.3.0
 >
 > [ ] -1 Do not release this package because ...
 >
 >
 > To learn more about Apache Spark, please see
 https://spark.apache.org/
 >
 > The tag to be voted on is v2.3.0-rc2:
 > https://github.com/apache/spark/tree/v2.3.0-rc2
 > (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
 >
 > List of JIRA tickets resolved in this release can be found here:
 > https://issues.apache.org/jira/projects/SPARK/versions/12339551
 >
 > The release files, including signatures, digests, etc. can be found
 at:
 > https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
 >
 > Release artifacts are signed with the following key:
 > https://dist.apache.org/repos/dist/dev/spark/KEYS
 >
 > The staging repository for this release can be found at:
 > https://repository.apache.org/content/repositories/orgapache
 spark-1262/
 >
 > The documentation corresponding to this release can be found at:
 > https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs
 /_site/index.html
 >
 

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-30 Thread Andrew Ash
I'd like to nominate SPARK-23274 as a potential blocker for the 2.3.0
release as well, due to being a regression from 2.2.0. The ticket has a
simple repro included, showing a query that works in prior releases but now
fails with an exception in the catalyst optimizer.

On Fri, Jan 26, 2018 at 10:41 AM, Sameer Agarwal 
wrote:

> This vote has failed due to a number of aforementioned blockers. I'll
> follow up with RC3 as soon as the 2 remaining (non-QA) blockers are
> resolved: https://s.apache.org/oXKi
>
>
> On 25 January 2018 at 12:59, Sameer Agarwal  wrote:
>
>>
>> Most tests pass on RC2, except I'm still seeing the timeout caused by
>>> https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never
>>> finish. I followed the thread a bit further and wasn't clear whether it was
>>> subsequently re-fixed for 2.3.0 or not. It says it's resolved along with
>>> https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0 though I am
>>> still seeing these tests fail or hang:
>>>
>>> - subscribing topic by name from earliest offsets (failOnDataLoss: false)
>>> - subscribing topic by name from earliest offsets (failOnDataLoss: true)
>>>
>>
>> Sean, while some of these tests were timing out on RC1, we're not aware
>> of any known issues in RC2. Both maven (https://amplab.cs.berkeley.ed
>> u/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-
>> branch-2.3-test-maven-hadoop-2.6/146/testReport/org.apache.
>> spark.sql.kafka010/history/) and sbt (https://amplab.cs.berkeley.ed
>> u/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-
>> branch-2.3-test-sbt-hadoop-2.6/123/testReport/org.apache.
>> spark.sql.kafka010/history/) historical builds on jenkins
>> for org.apache.spark.sql.kafka010 look fairly healthy. If you're still
>> seeing timeouts in RC2, can you create a JIRA with any applicable build/env
>> info?
>>
>>
>>
>>> On Tue, Jan 23, 2018 at 9:01 AM Sean Owen  wrote:
>>>
 I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
 unpacking it with 'xvzf' and also unzipping it first, and it untarred
 without warnings in either case.

 I am encountering errors while running the tests, different ones each
 time, so am still figuring out whether there is a real problem or just
 flaky tests.

 These issues look like blockers, as they are inherently to be completed
 before the 2.3 release. They are mostly not done. I suppose I'd -1 on
 behalf of those who say this needs to be done first, though, we can keep
 testing.

 SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
 SPARK-23114 Spark R 2.3 QA umbrella

 Here are the remaining items targeted for 2.3:

 SPARK-15689 Data source API v2
 SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
 SPARK-21646 Add new type coercion rules to compatible with Hive
 SPARK-22386 Data Source V2 improvements
 SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
 SPARK-22735 Add VectorSizeHint to ML features documentation
 SPARK-22739 Additional Expression Support for Objects
 SPARK-22809 pyspark is sensitive to imports with dots
 SPARK-22820 Spark 2.3 SQL API audit


 On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin 
 wrote:

> +0
>
> Signatures check out. Code compiles, although I see the errors in [1]
> when untarring the source archive; perhaps we should add "use GNU tar"
> to the RM checklist?
>
> Also ran our internal tests and they seem happy.
>
> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
> documentation ones). It is not long, but it seems some of those need
> to be looked at. It would be nice for the committers who are involved
> in those bugs to take a look.
>
> [1] https://superuser.com/questions/318809/linux-os-x-tar-incomp
> atibility-tarballs-created-on-os-x-give-errors-when-unt
>
>
> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal 
> wrote:
> > Please vote on releasing the following candidate as Apache Spark
> version
> > 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am
> UTC and
> > passes if a majority of at least 3 PMC +1 votes are cast.
> >
> >
> > [ ] +1 Release this package as Apache Spark 2.3.0
> >
> > [ ] -1 Do not release this package because ...
> >
> >
> > To learn more about Apache Spark, please see
> https://spark.apache.org/
> >
> > The tag to be voted on is v2.3.0-rc2:
> > https://github.com/apache/spark/tree/v2.3.0-rc2
> > (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
> >
> > List of JIRA tickets resolved in this release can be found here:
> > https://issues.apache.org/jira/projects/SPARK/versions/12339551
> >
> > The release files, including signatures, digests, etc. can be found
> at:
> > https://dist.apache.org/rep

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-31 Thread Sameer Agarwal
Just a quick status update on RC3: SPARK-23274 was resolved yesterday, and
tests have been quite healthy throughout this week and the last. I'll cut
the new RC as soon as the remaining blocker (SPARK-23202) is resolved.


On 30 January 2018 at 10:12, Andrew Ash  wrote:

> I'd like to nominate SPARK-23274
>  as a potential
> blocker for the 2.3.0 release as well, due to being a regression from
> 2.2.0.  The ticket has a simple repro included, showing a query that works
> in prior releases but now fails with an exception in the catalyst optimizer.
>
> On Fri, Jan 26, 2018 at 10:41 AM, Sameer Agarwal 
> wrote:
>
>> This vote has failed due to a number of aforementioned blockers. I'll
>> follow up with RC3 as soon as the 2 remaining (non-QA) blockers are
>> resolved: https://s.apache.org/oXKi
>>
>>
>> On 25 January 2018 at 12:59, Sameer Agarwal 
>> wrote:
>>
>>>
>>> Most tests pass on RC2, except I'm still seeing the timeout caused by
 https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never
 finish. I followed the thread a bit further and wasn't clear whether it was
 subsequently re-fixed for 2.3.0 or not. It says it's resolved along with
 https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0 though I
 am still seeing these tests fail or hang:

 - subscribing topic by name from earliest offsets (failOnDataLoss:
 false)
 - subscribing topic by name from earliest offsets (failOnDataLoss: true)

>>>
>>> Sean, while some of these tests were timing out on RC1, we're not aware
>>> of any known issues in RC2. Both maven (https://amplab.cs.berkeley.ed
>>> u/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-bra
>>> nch-2.3-test-maven-hadoop-2.6/146/testReport/org.apache.spar
>>> k.sql.kafka010/history/) and sbt (https://amplab.cs.berkeley.ed
>>> u/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-bra
>>> nch-2.3-test-sbt-hadoop-2.6/123/testReport/org.apache.spar
>>> k.sql.kafka010/history/) historical builds on jenkins
>>> for org.apache.spark.sql.kafka010 look fairly healthy. If you're still
>>> seeing timeouts in RC2, can you create a JIRA with any applicable build/env
>>> info?
>>>
>>>
>>>
 On Tue, Jan 23, 2018 at 9:01 AM Sean Owen  wrote:

> I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
> unpacking it with 'xvzf' and also unzipping it first, and it untarred
> without warnings in either case.
>
> I am encountering errors while running the tests, different ones each
> time, so am still figuring out whether there is a real problem or just
> flaky tests.
>
> These issues look like blockers, as they inherently need to be
> completed before the 2.3 release. They are mostly not done. I suppose I'd
> -1 on behalf of those who say this needs to be done first, though we can
> keep testing.
>
> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
> SPARK-23114 Spark R 2.3 QA umbrella
>
> Here are the remaining items targeted for 2.3:
>
> SPARK-15689 Data source API v2
> SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
> SPARK-21646 Add new type coercion rules to compatible with Hive
> SPARK-22386 Data Source V2 improvements
> SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
> SPARK-22735 Add VectorSizeHint to ML features documentation
> SPARK-22739 Additional Expression Support for Objects
> SPARK-22809 pyspark is sensitive to imports with dots
> SPARK-22820 Spark 2.3 SQL API audit
>
>
> On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin 
> wrote:
>
>> +0
>>
>> Signatures check out. Code compiles, although I see the errors in [1]
>> when untarring the source archive; perhaps we should add "use GNU tar"
>> to the RM checklist?
>>
>> Also ran our internal tests and they seem happy.
>>
>> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>> documentation ones). It is not long, but it seems some of those need
>> to be looked at. It would be nice for the committers who are involved
>> in those bugs to take a look.
>>
>> [1] https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
>>
>>

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-31 Thread Yin Huai
It seems we are not running tests related to pandas in the pyspark tests (see
my email "python tests related to pandas are skipped in jenkins"). I think we
should fix this test issue and make sure all tests are good before cutting
RC3.

On Wed, Jan 31, 2018 at 10:12 AM, Sameer Agarwal 
wrote:

> Just a quick status update on RC3 -- SPARK-23274
>  was resolved
> yesterday and tests have been quite healthy throughout this week and the
> last. I'll cut the new RC as soon as the remaining blocker (SPARK-23202
> ) is resolved.

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Nick Pentreath
All MLlib QA JIRAs resolved. Looks like SparkR too, so from the ML side
that should be everything outstanding.


Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Michael Heuer
We found two classes new to Spark 2.3.0 that must be registered in Kryo for
our tests to pass on RC2

org.apache.spark.sql.execution.datasources.BasicWriteTaskStats
org.apache.spark.sql.execution.datasources.ExecutedWriteSummary

https://github.com/bigdatagenomics/adam/pull/1897

Perhaps a mention in release notes?

   michael
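
[Editor's note] The registration described above can be sketched with a plain SparkConf. This is a hedged example, not something from the thread: the class names come from the message above, while the serializer and `spark.kryo.registrationRequired` settings are assumptions about a typical strict-Kryo setup (strict registration is what surfaces the error in the first place).

```python
# Sketch: registering the two classes new in 2.3.0 with Kryo via SparkConf.
# Assumed setup, not a prescribed fix from the thread.
from pyspark import SparkConf

conf = (
    SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    # Fail fast on unregistered classes (this is what makes the error visible).
    .set("spark.kryo.registrationRequired", "true")
    .set(
        "spark.kryo.classesToRegister",
        "org.apache.spark.sql.execution.datasources.BasicWriteTaskStats,"
        "org.apache.spark.sql.execution.datasources.ExecutedWriteSummary",
    )
)
```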


On Thu, Feb 1, 2018 at 3:29 AM, Nick Pentreath 
wrote:

> All MLlib QA JIRAs resolved. Looks like SparkR too, so from the ML side
> that should be everything outstanding.
>
>
> On Thu, 1 Feb 2018 at 06:21 Yin Huai  wrote:
>
>> seems we are not running tests related to pandas in pyspark tests (see my
>> email "python tests related to pandas are skipped in jenkins"). I think we
>> should fix this test issue and make sure all tests are good before cutting
>> RC3.
>>
>> On Wed, Jan 31, 2018 at 10:12 AM, Sameer Agarwal 
>> wrote:
>>
>>> Just a quick status update on RC3 -- SPARK-23274
>>>  was resolved
>>> yesterday and tests have been quite healthy throughout this week and the
>>> last. I'll cut the new RC as soon as the remaining blocker (SPARK-23202
>>> ) is resolved.
>>>
>>>
>>> On 30 January 2018 at 10:12, Andrew Ash  wrote:
>>>
 I'd like to nominate SPARK-23274
  as a potential
 blocker for the 2.3.0 release as well, due to being a regression from
 2.2.0.  The ticket has a simple repro included, showing a query that works
 in prior releases but now fails with an exception in the catalyst 
 optimizer.

 On Fri, Jan 26, 2018 at 10:41 AM, Sameer Agarwal >>> > wrote:

> This vote has failed due to a number of aforementioned blockers. I'll
> follow up with RC3 as soon as the 2 remaining (non-QA) blockers are
> resolved: https://s.apache.org/oXKi
>
>
> On 25 January 2018 at 12:59, Sameer Agarwal 
> wrote:
>
>>
>> Most tests pass on RC2, except I'm still seeing the timeout caused by
>>> https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never
>>> finish. I followed the thread a bit further and wasn't clear whether it 
>>> was
>>> subsequently re-fixed for 2.3.0 or not. It says it's resolved along with
>>> https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0 though
>>> I am still seeing these tests fail or hang:
>>>
>>> - subscribing topic by name from earliest offsets (failOnDataLoss:
>>> false)
>>> - subscribing topic by name from earliest offsets (failOnDataLoss:
>>> true)
>>>
>>
>> Sean, while some of these tests were timing out on RC1, we're not
>> aware of any known issues in RC2. Both maven (
>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%
>> 20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-
>> hadoop-2.6/146/testReport/org.apache.spark.sql.kafka010/history/)
>> and sbt (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%
>> 20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-
>> hadoop-2.6/123/testReport/org.apache.spark.sql.kafka010/history/)
>> historical builds on jenkins for org.apache.spark.sql.kafka010 look
>> fairly healthy. If you're still seeing timeouts in RC2, can you create a
>> JIRA with any applicable build/env info?
>>
>>
>>
>>> On Tue, Jan 23, 2018 at 9:01 AM Sean Owen 
>>> wrote:
>>>
 I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
 unpacking it with 'xvzf' and also unzipping it first, and it untarred
 without warnings in either case.

 I am encountering errors while running the tests, different ones
 each time, so am still figuring out whether there is a real problem or 
 just
 flaky tests.

 These issues look like blockers, as they are inherently to be
 completed before the 2.3 release. They are mostly not done. I suppose 
 I'd
 -1 on behalf of those who say this needs to be done first, though, we 
 can
 keep testing.

 SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
 SPARK-23114 Spark R 2.3 QA umbrella

 Here are the remaining items targeted for 2.3:

 SPARK-15689 Data source API v2
 SPARK-20928 SPIP: Continuous Processing Mode for Structured
 Streaming
 SPARK-21646 Add new type coercion rules to compatible with Hive
 SPARK-22386 Data Source V2 improvements
 SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
 SPARK-22735 Add VectorSizeHint to ML features documentation
 SPARK-22739 Additional Expression Support for Objects
 SPARK-22809 pyspark is sensitive to imports with dots
 SPARK-22820 Spark 2.3 SQL API audit


 On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin 
 wrote:
>>>

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Tom Graves
 
Testing with Spark 2.3, I see a difference in the SQL coalesce talking to
Hive vs Spark 2.2. It seems Spark 2.3 ignores the coalesce.

Query:
spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >=
'20170301' AND dt <= '20170331' AND something IS NOT
NULL").coalesce(16).show()

In Spark 2.2 the coalesce works here, but in Spark 2.3 it doesn't. Does anyone
know about this issue, or are there some weird config changes? Otherwise I'll
file a jira.

Note I also see a performance difference when reading cached data: a small
query on 19GB of cached data is about 30% worse on Spark 2.3, 17 seconds vs
13 seconds on Spark 2.2. Straight-up reading from Hive (ORC) seems better
though.

Tom



Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Andrew Ash
I'd like to nominate SPARK-23290
 as a potential blocker
for the 2.3.0 release.  It's a regression from 2.2.0 in that user pyspark
code that works in 2.2.0 now fails in the 2.3.0 RCs: the return type
of date columns changed from object to datetime64[ns].  My understanding of
the Spark Versioning Policy  is
that user code should continue to run in future versions of Spark with the
same major version number.

Thanks!
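
[Editor's note] The dtype change described above can be reproduced with pandas alone. This is a hedged sketch: the column construction is an assumption standing in for what `toPandas()` returns in each version, not code from the thread.

```python
# Sketch of the SPARK-23290 regression in plain pandas: a date column held as
# Python date objects (as in Spark 2.2) has dtype 'object', while one converted
# to pandas timestamps (as in the 2.3 RCs) has dtype 'datetime64[ns]'.
# User code keyed on the old dtype therefore breaks.
import datetime
import pandas as pd

dates = [datetime.date(2018, 2, 1), datetime.date(2018, 2, 2)]

col_22 = pd.Series(dates)                  # Spark 2.2-style column
col_23 = pd.to_datetime(pd.Series(dates))  # 2.3 RC-style column

print(col_22.dtype)  # object
print(col_23.dtype)  # datetime64[ns]
```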


Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Sameer Agarwal
[+ Xiao]

SPARK-23290 does sound like a blocker. On the SQL side, I can confirm that
there were non-trivial changes around repartitioning/coalesce and cache
performance in 2.3 -- we're currently investigating these.


Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Tom Graves
I filed a jira for the coalesce issue: SPARK-23304 (Spark SQL coalesce()
against hive not working).

Tom
On Thursday, February 1, 2018, 12:36:02 PM CST, Sameer Agarwal 
 wrote:  
 
 [+ Xiao]
SPARK-23290  does sound like a blocker. On the SQL side, I can confirm that 
there were non-trivial changes around repartitioning/coalesce and cache 
performance in 2.3 --  we're currently investigating these.
On 1 February 2018 at 10:02, Andrew Ash  wrote:

I'd like to nominate SPARK-23290 as a potential blocker for the 2.3.0 release.  
It's a regression from 2.2.0 in that user pyspark code that works in 2.2.0 now 
fails in the 2.3.0 RCs: the type return type of date columns changed from 
object to datetime64[ns].  My understanding of the Spark Versioning Policy is 
that user code should continue to run in future versions of Spark with the same 
major version number.
Thanks!
On Thu, Feb 1, 2018 at 9:50 AM, Tom Graves  wrote:

 
Testing with spark 2.3 and I see a difference in the sql coalesce talking to 
hive vs spark 2.2. It seems spark 2.3 ignores the coalesce.
Query:spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >= 
'20170301' AND dt <= '20170331' AND something IS NOT 
NULL").coalesce(16).show()

in spark 2.2 the coalesce works here, but in spark 2.3, it doesn't.   Anyone 
know about this issue or are there some weird config changes, otherwise I'll 
file a jira?
Note I also see a performance difference when reading cached data. Spark 2.3. 
Small query on 19GB cached data, spark 2.3 is 30% worse.  This is only 13 
seconds on spark 2.2 vs 17 seconds on spark 2.3.  Straight up reading from hive 
(orc) seems better though.
Tom


On Thursday, February 1, 2018, 11:23:45 AM CST, Michael Heuer 
 wrote:  
 
 We found two classes new to Spark 2.3.0 that must be registered in Kryo for 
our tests to pass on RC2

org.apache.spark.sql.execution .datasources.BasicWriteTaskSta ts
org.apache.spark.sql.execution .datasources.ExecutedWriteSumm ary

https://github.com/bigdatageno mics/adam/pull/1897

Perhaps a mention in release notes?

   michael


On Thu, Feb 1, 2018 at 3:29 AM, Nick Pentreath  wrote:

All MLlib QA JIRAs resolved. Looks like SparkR too, so from the ML side that 
should be everything outstanding.

On Thu, 1 Feb 2018 at 06:21 Yin Huai  wrote:

seems we are not running tests related to pandas in pyspark tests (see my email 
"python tests related to pandas are skipped in jenkins"). I think we should fix 
this test issue and make sure all tests are good before cutting RC3.
On Wed, Jan 31, 2018 at 10:12 AM, Sameer Agarwal  wrote:

Just a quick status update on RC3 -- SPARK-23274 was resolved yesterday and 
tests have been quite healthy throughout this week and the last. I'll cut the 
new RC as soon as the remaining blocker (SPARK-23202) is resolved.

On 30 January 2018 at 10:12, Andrew Ash  wrote:

I'd like to nominate SPARK-23274 as a potential blocker for the 2.3.0 release 
as well, due to being a regression from 2.2.0.  The ticket has a simple repro 
included, showing a query that works in prior releases but now fails with an 
exception in the catalyst optimizer.
On Fri, Jan 26, 2018 at 10:41 AM, Sameer Agarwal  wrote:

This vote has failed due to a number of aforementioned blockers. I'll follow up 
with RC3 as soon as the 2 remaining (non-QA) blockers are resolved: 
https://s.apache.org/oXKi


On 25 January 2018 at 12:59, Sameer Agarwal  wrote:



Most tests pass on RC2, except I'm still seeing the timeout caused by 
https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never finish. I 
followed the thread a bit further and wasn't clear whether it was subsequently 
re-fixed for 2.3.0 or not. It says it's resolved along with 
https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0, though I am still 
seeing these tests fail or hang:
- subscribing topic by name from earliest offsets (failOnDataLoss: false)
- subscribing topic by name from earliest offsets (failOnDataLoss: true)

Sean, while some of these tests were timing out on RC1, we're not aware of any 
known issues in RC2. Both maven 
(https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.6/146/testReport/org.apache.spark.sql.kafka010/history/) 
and sbt 
(https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/123/testReport/org.apache.spark.sql.kafka010/history/) 
historical builds on jenkins for org.apache.spark.sql.kafka010 look fairly 
healthy. If you're still seeing timeouts in RC2, can you create a JIRA with any 
applicable build/env info?
 
On Tue, Jan 23, 2018 at 9:01 AM Sean Owen  wrote:

I'm not seeing that same problem on OS X and /usr/bin/tar. I tried unpacking it 
with 'xvzf' and also unzipping it first, and it untarred without warnings in 
either case.

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-04 Thread Xingbo Jiang
I filed another NPE problem in WebUI; I believe this is a regression in 2.3:
https://issues.apache.org/jira/browse/SPARK-23330

2018-02-01 10:38 GMT-08:00 Tom Graves :

> I filed a jira, SPARK-23304 (Spark SQL coalesce() against hive not
> working), for the coalesce issue.
>
> Tom

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-06 Thread Sameer Agarwal
FYI -- Thanks to a big community-wide effort over the last few days, we're
now down to just one last remaining code blocker again:
https://issues.apache.org/jira/browse/SPARK-23309

I'll cut an RC3 as soon as that's resolved.
